Re: [HACKERS] heap vacuum cleanup locks
On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote:
> On Sun, Jun 5, 2011 at 12:03, Robert Haas robertmh...@gmail.com wrote:
>> If other buffer pins do exist, then we can't defragment the page, but
>> that doesn't mean no useful work can be done: we can still mark used
>> line pointers dead, or dead line pointers unused. We cannot defragment,
>> but that can be done either by the next VACUUM or by a HOT cleanup.
>
> This is just an idea -- Is it possible to have copy-on-write techniques?
> VACUUM allocates a duplicated page for the pinned page, and copy valid
> tuples into the new page. Following buffer readers after the VACUUM will
> see the cloned page instead of the old pinned one.

Heikki suggested the same thing, and it's not a bad idea, but I think it
would be more work to implement than what I proposed. The caller would
need to be aware that, if it tries to re-acquire a content lock on the
same page, the offset of the tuple within the page might change. I'm not
sure how much work would be required to cope with that possibility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum cleanup locks
On Jun 6, 2011, at 1:00 AM, Robert Haas wrote:
> On Mon, Jun 6, 2011 at 12:19 AM, Itagaki Takahiro itagaki.takah...@gmail.com wrote:
>> On Sun, Jun 5, 2011 at 12:03, Robert Haas robertmh...@gmail.com wrote:
>>> If other buffer pins do exist, then we can't defragment the page, but
>>> that doesn't mean no useful work can be done: we can still mark used
>>> line pointers dead, or dead line pointers unused. We cannot
>>> defragment, but that can be done either by the next VACUUM or by a
>>> HOT cleanup.
>>
>> This is just an idea -- Is it possible to have copy-on-write
>> techniques? VACUUM allocates a duplicated page for the pinned page, and
>> copy valid tuples into the new page. Following buffer readers after the
>> VACUUM will see the cloned page instead of the old pinned one.
>
> Heikki suggested the same thing, and it's not a bad idea, but I think it
> would be more work to implement than what I proposed. The caller would
> need to be aware that, if it tries to re-acquire a content lock on the
> same page, the offset of the tuple within the page might change. I'm not
> sure how much work would be required to cope with that possibility.

I've had a related idea that I haven't looked into... if you're scanning
a relation (ie: index scan, seq scan), I've wondered if it would be more
efficient to deal with the entire page at once, possibly by making a copy
of it. This would reduce the number of times you pin the page (often
quite dramatically). I realize that means copying the entire page, but I
suspect that would occur entirely in the L1 cache, which would be fast.
So perhaps instead of copy-on-write we should try for copy-on-read in all
appropriate plan nodes.

On a related note, I've also wondered if it would be useful to allow
nodes to deal with more than one tuple at a time; the idea being that
it's better to execute a smaller chunk of code over a bigger chunk of
data instead of dribbling tuples through an entire execution tree one at
a time. Perhaps that will only be useful if nodes are executing in
parallel...

-- 
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
Re: [HACKERS] heap vacuum cleanup locks
On Sun, Jun 5, 2011 at 8:33 AM, Robert Haas robertmh...@gmail.com wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin on
> the buffer page. In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to acquire
> a heavyweight lock that is in turn blocked by VACUUM. A while back,
> someone (Greg Stark? me?) floated the idea of not waiting for the
> cleanup lock. If we can't get it immediately, or within some short
> period of time, then we just skip the page and continue on.

Do we know if this is really a problem, though? The deadlock, for
example, can happen only when a backend tries to take a conflicting
table-level lock while holding the buffer pin, and I am not sure we do
that. The contention issue would probably make sense for small tables,
because for large to very large tables the probability that a backend and
vacuum would process the same page is quite low. With the current default
for vac_threshold, small tables can get vacuumed very frequently, and if
they are also heavily accessed, the cleanup lock can become a bottleneck.

Another issue that might be worth paying attention to is the single-pass
vacuum that I am currently working on. The design that we agreed upon
assumes that the index vacuum must clear index pointers to all the dead
line pointers. If we skip any page, we must at least collect the existing
dead line pointers and remove those index pointers. If we create dead
line pointers and we want to vacuum them later, we store the LSN in the
page, and that may require defrag. Of course, we can work around that,
but I think it will be useful if we do some tests to show that the
cleanup lock is indeed a major bottleneck.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] storing TZ along timestamps
On Jun 4, 2011, at 3:56 AM, Greg Stark wrote:
> On Thu, Jun 2, 2011 at 8:58 PM, Jim Nasby j...@nasby.net wrote:
>> I'm torn between whether the type should store the original time or the
>> original time converted to GMT.
>
> This is the wrong way to think about it. We *never* store time converted
> to GMT. When we want to represent a point in time we represent it as
> seconds since the epoch.

Right. Sorry, my bad.

> The question here is how to represent more complex concepts than simply
> points in time. I think the two concepts under discussion are a) a
> composite type representing a point in time and a timezone it should be
> interpreted in for operations and display and b) the original input
> provided, which is a text string with the constraint that it's a valid
> input which can be interpreted as a point in time.

My fear with A is that something could change that would make it
impossible to actually get back to the time that was originally entered.
For example, a new version of the timezone database could change
something. Though, that problem also exists for timestamptz today, so
presumably if it was much of an issue we'd have gotten complaints by now.

-- 
Jim C. Nasby, Database Architect
j...@nasby.net
512.569.9461 (cell)
http://jim.nasby.net
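Option (a) above can be illustrated with a small sketch: store the zone-independent instant (seconds since the epoch) together with the zone name used for interpretation and display. This is a Python model with made-up names, not a proposed on-disk format; it also shows why a tz database update could change only the displayed value, never the stored instant.

```python
# Sketch of "point in time + zone" as a composite value (illustrative only).
from dataclasses import dataclass
from datetime import datetime
from zoneinfo import ZoneInfo

@dataclass(frozen=True)
class TimestampWithZone:
    epoch_seconds: float   # the instant itself, zone-independent
    zone: str              # zone used only for operations and display

    def display(self) -> datetime:
        # Interpretation depends on the (mutable) tz database,
        # but epoch_seconds never changes.
        return datetime.fromtimestamp(self.epoch_seconds, ZoneInfo(self.zone))

ts = TimestampWithZone(0.0, "UTC")
print(ts.display().isoformat())  # 1970-01-01T00:00:00+00:00
```

Option (b), by contrast, would keep the original text string and re-derive the instant on demand, which is exactly what breaks if the tz rules change.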
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 07:12, Robert Haas wrote:
> I did some further investigation of this. It appears that more than 99%
> of the lock manager lwlock traffic that remains with this patch applied
> has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement
> runs in a separate transaction, and for each new transaction we run
> VirtualXactLockTableInsert(), which takes a lock on the vxid of that
> transaction, so that other processes can wait for it. That requires
> acquiring and releasing a lock manager partition lock, and we have to do
> the same thing a moment later at transaction end to dump the lock.
>
> A quick grep seems to indicate that the only places where we actually
> make use of those VXID locks are in DefineIndex(), when CREATE INDEX
> CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay
> expires. Considering that these are not commonplace events, it seems
> tremendously wasteful to incur the overhead for every transaction. It
> might be possible to make the lock entry spring into existence on demand
> - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK
> and PROCLOCK objects for that vxid. That presents a few synchronization
> challenges, and plus we have to make sure that the backend that's just
> been given a lock knows that it needs to release it, but those seem like
> they might be manageable problems, especially given the new
> infrastructure introduced by the current patch, which already has to
> deal with some of those issues. I'll look into this further.

Ah, I remember I saw that vxid lock pop up quite high in an oprofile
profile recently. I think it was the case of executing a lot of very
simple prepared queries. So it would be nice to address that, even from a
single-CPU point of view.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] heap vacuum cleanup locks
On 06.06.2011 09:35, Jim Nasby wrote:
> I've had a related idea that I haven't looked into... if you're scanning
> a relation (ie: index scan, seq scan), I've wondered if it would be more
> efficient to deal with the entire page at once, possibly by making a
> copy of it. This would reduce the number of times you pin the page
> (often quite dramatically). I realize that means copying the entire
> page, but I suspect that would occur entirely in the L1 cache, which
> would be fast.

We already do that. When an index scan moves to an index page, the heap
tid pointers of all the matching index tuples are copied to
backend-private memory in one go, and the lock is released. And for a
seqscan, the visibility of all the tuples on the page is checked in one
go while holding the lock, then the lock is released but the pin is kept.
The pin is only released after all the tuples have been read. There's no
repeated pin-unpin for each tuple.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
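As a conceptual sketch (Python, all names mine; not PostgreSQL's actual C code), the seqscan batching Heikki describes looks like this: the content lock is taken once per page to decide visibility for every tuple, and individual tuples are then returned while holding only the pin.

```python
# Toy model of per-page batching: one lock acquisition per page, tuples
# returned afterwards under the pin alone.

class Page:
    def __init__(self, tuples):
        self.tuples = tuples            # list of (data, committed) pairs
        self.lock_acquisitions = 0
        self.pin_count = 0

def scan_page(page, snapshot_sees):
    page.pin_count += 1                 # pin: page can't be evicted under us
    page.lock_acquisitions += 1         # content lock taken once, not per tuple
    visible = [i for i, (_, committed) in enumerate(page.tuples)
               if snapshot_sees(committed)]
    # Content lock released here: visibility already decided for all tuples.
    try:
        for i in visible:
            yield page.tuples[i][0]     # returned while holding only the pin
    finally:
        page.pin_count -= 1             # pin dropped after the last tuple

page = Page([("a", True), ("b", False), ("c", True)])
rows = list(scan_page(page, lambda committed: committed))
print(rows, page.lock_acquisitions)     # ['a', 'c'] 1
```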
[HACKERS] WIP: AuthenticationMD5 protocol documentation clarification
This is my first patch, so I hope I've got the process right for
submitting patches. I'm building a driver that speaks version 3.0 of the
protocol, and generally I've found the documentation to be excellent. One
area I had trouble with was responding to the AuthenticationMD5Password
challenge. After receiving help on IRC, I've attached a patch to the
protocol documentation attempting to clarify what is expected by the
backend, basically:

    concat('md5',
           hex_encode(md5(concat(hex_encode(md5(concat(password, username))),
                                 salt))))

My technical writing skills were not up to wording that in plain English,
and it seems like the rest of the documentation for the protocol steers
clear of anything that looks like code. Is there a policy in this area,
or is the code-esque description ok? No code is changed, only
documentation, so I've left out the code-relevant patch info fields.

Patch info:
Project name: postgresql
Branch: master

Cyan

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
new file mode 100644
index d3de330..ba95241
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***************
*** 294,303 ****
      <listitem>
       <para>
        The frontend must now send a PasswordMessage containing the
!       password encrypted via MD5, using the 4-character salt
!       specified in the AuthenticationMD5Password message. If
!       this is the correct password, the server responds with an
!       AuthenticationOk, otherwise it responds with an ErrorResponse.
       </para>
      </listitem>
     </varlistentry>
--- 294,306 ----
      <listitem>
       <para>
        The frontend must now send a PasswordMessage containing the
!       result of concat('md5',
!       hex_encode(md5(concat(hex_encode(md5(concat(password, username))),
!       salt)))), where salt is the 4-character salt specified in
!       the AuthenticationMD5Password message. Username and password do not
!       include the trailing null byte. If this is the correct password, the
!       server responds with an AuthenticationOk, otherwise it responds with
!       an ErrorResponse.
       </para>
      </listitem>
     </varlistentry>
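For driver authors, the computation above can be sketched as follows (a hedged example, not from the PostgreSQL source; function and variable names are mine):

```python
# PasswordMessage payload for AuthenticationMD5Password:
#   'md5' || hex(md5(hex(md5(password || username)) || salt))
import hashlib

def md5_password_response(user: str, password: str, salt: bytes) -> str:
    inner = hashlib.md5(password.encode() + user.encode()).hexdigest()
    outer = hashlib.md5(inner.encode() + salt).hexdigest()
    return "md5" + outer

digest = md5_password_response("alice", "s3cret", b"\x01\x02\x03\x04")
print(digest)  # 'md5' followed by 32 hex digits (35 characters total)
```

Note that neither the username, the password, nor the salt contributes a trailing null byte to the hash input; the terminating null belongs only to the string framing of the PasswordMessage itself.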
Re: [HACKERS] Vacuum, visibility maps and SKIP_PAGES_THRESHOLD
On 03.06.2011 22:16, Bruce Momjian wrote:
> I realize we just read the pages from the kernel to maintain sequential
> I/O, but do we actually read the contents of the page if we know it
> doesn't need vacuuming?

Yes.

> If so, do we need to?

Not necessarily, but it allows us to freeze old tuples, and doesn't cost
much anyway.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
[HACKERS] gdb with postgres
Hi,

I was using gdb to debug postgres. In order to debug the backend running
a query, I start postgres first and use select * from pg_backend_pid()
to ask for the backend pid. Then I start gdb in another bash window with
gdb postgres, attach the pid obtained above, and set the breakpoint. Then
I run the query from the first window. However, the debugging process,
shown below, does not reach the breakpoint. I tried many different
breakpoints, but it always starts from "305 client_read_ended()".

GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are welcome to change it and/or distribute copies of it under certain
conditions. Type "show copying" to see the conditions. There is
absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as sparc-sun-solaris2.10...
(gdb) b qp_add_paths_to_joinrel
Breakpoint 1 at 0x1a6744: file joinpath.c, line 67.
(gdb) attach 23903
Attaching to program `/usrlocal/pgsql/bin/postgres', process 23903
Retry #1:
Retry #2:
Retry #3:
Retry #4:
[New LWP 1]
0xff0cbaa4 in _rt_boot () from /usr/lib/ld.so.1
(gdb) n
Single stepping until exit from function _rt_boot, which has no line
number information.
secure_read (port=0x4a7760, ptr=0x455948, len=8192) at be-secure.c:305
305             client_read_ended();
(gdb) n
pq_recvbuf () at pqcomm.c:767
767             if (r < 0)
(gdb) n
769                     if (errno == EINTR)
(gdb) n
782             if (r == 0)
(gdb) n
788                     return EOF;
(gdb) n
791             PqRecvLength += r;
(gdb)

Anyone know what is going wrong? BTW, as you can see, the system is
Solaris.

-- 
Best Regards
Huang Qi Victor
Re: [HACKERS] gdb with postgres
On Mon, Jun 6, 2011 at 1:13 PM, HuangQi huangq...@gmail.com wrote:
> Hi, I was using gdb to debug postgres. In order to debug the backend
> running a query, I start postgres first and use select * from
> pg_backend_pid() to ask for the backend pid. Then I start gdb in another
> bash window with gdb postgres, attach the pid obtained above, and set
> the breakpoint. Then I run the query from the first window. However, the
> debugging process, shown below, does not reach the breakpoint. I tried
> many different breakpoints, but it always starts from "305
> client_read_ended()".

Please compile with -O0 -g flags to see all the debug symbols.

CFLAGS="-O0 -g" ./configure --enable-debug

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] gdb with postgres
Sorry, but recompiling with these flags still doesn't work.

On 6 June 2011 15:46, Pavan Deolasee pavan.deola...@gmail.com wrote:
> Please compile with -O0 -g flags to see all the debug symbols.
>
> CFLAGS="-O0 -g" ./configure --enable-debug
>
> Thanks,
> Pavan
>
> --
> Pavan Deolasee
> EnterpriseDB http://www.enterprisedb.com

-- 
Best Regards
Huang Qi Victor
Re: [HACKERS] BLOB support
On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:
> Tom Lane t...@sss.pgh.pa.us writes:
>> Yes. I think the appropriate problem statement is "provide streaming
>> access to large field values, as an alternative to just
>> fetching/storing the entire value at once". I see no good reason to
>> import the entire messy notion of LOBS/CLOBS. (The fact that other
>> databases have done it is not a good reason.)
>
> Spent some time in the archive to confirm a certain "déjà vu"
> impression. Couldn't find it. Had to manually search in closed commit
> fests… but here we are, I think:
>
> https://commitfest.postgresql.org/action/patch_view?id=70
> http://archives.postgresql.org/message-id/17891.1246301...@sss.pgh.pa.us
> http://archives.postgresql.org/message-id/4a4bf87e.7010...@ak.jp.nec.com

I thought more about this in light of the references sent, but I still
have in mind constructs like

    Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); // a bit outdated -- we have Blu-ray now
    conn.prepareStatement("INSERT INTO someonetubevideos VALUES (?)")

where the 1st parameter is myWeddingDvd; or, if someone doesn't like
Java, they may wish to pass a C++ istream or a C FILE. I think (with
respect to the consideration below) this implicitly requires that LOBs be
stored in one centralized place -- it doesn't matter whether that is the
file system, a special table, or something else -- because when the
statement is processed there is no idea which table the LOB will be
associated with. If we want to TOAST, where do we TOAST it? What happens
if the insertion is done by an SQL function that chooses the table
depending on the BLOB content?

A quite interesting idea from the cited patch was the string identifying
the LOB, but with the above it closes the road for JDBC to create a LOB.
I also think that constructs that first insert a small LOB into a table
to obtain some driver-dependent API are a little old-fashioned.

Possible solutions, if we don't want centralized storage, may be:
1. Keep the BLOB in memory, but this may, depending on the
implementation, limit the size of the BLOB.
2. Temporarily back up the blob in a file, then, when the values are
stored, copy the file to the TOAST table; but some changes are still
required to support LOBs for complex types and arrays.

So please give some ideas how to resolve this, or maybe it has low
priority?

Regards,
Radek
Re: [HACKERS] heap vacuum cleanup locks
On Sun, Jun 5, 2011 at 4:03 AM, Robert Haas robertmh...@gmail.com wrote:
> We've occasionally seen problems with VACUUM getting stuck for failure
> to acquire a cleanup lock due to, for example, a cursor holding a pin on
> the buffer page. In the worst case, this can cause an undetected
> deadlock, if the backend holding the buffer pin blocks trying to acquire
> a heavyweight lock that is in turn blocked by VACUUM.
>
> A while back, someone (Greg Stark? me?) floated the idea of not waiting
> for the cleanup lock. If we can't get it immediately, or within some
> short period of time, then we just skip the page and continue on. Today
> I had what might be a better idea: don't try to acquire a cleanup lock
> at all. Instead, acquire an exclusive lock. After having done so,
> observe the pin count. If there are no other buffer pins, that means our
> exclusive lock is actually a cleanup lock, and we proceed as now. If
> other buffer pins do exist, then we can't defragment the page, but that
> doesn't mean no useful work can be done: we can still mark used line
> pointers dead, or dead line pointers unused. We cannot defragment, but
> that can be done either by the next VACUUM or by a HOT cleanup. We can
> even arrange - using existing mechanism - to leave behind a hint that
> the page is a good candidate for a HOT cleanup, by setting pd_prune_xid
> to, say, FrozenXID.
>
> Like the idea of skipping pages on which we can't acquire a cleanup lock
> altogether, this should prevent VACUUM from getting stuck trying to lock
> a heap page. While buffer pins can be held for extended periods of time,
> I don't think there is any operation that holds a buffer content lock
> more than very briefly. Furthermore, unlike the idea of skipping the
> page altogether, we could use this approach even during an
> anti-wraparound vacuum.
>
> Thoughts?

Not waiting seems like a good idea. Not returning to the block while it
is in RAM, or not cleaning the block at all, would cause different
performance issues, which I would wish to avoid.

Hot Standby has specific code to avoid this situation. Perhaps you could
copy that, not sure.

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
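The proposal above can be modeled in a few lines (a toy Python sketch, names mine; the real logic would live in C in VACUUM and the buffer manager): take the exclusive content lock unconditionally, then let the observed pin count decide whether defragmentation is possible, while line-pointer reaping is always safe because it moves no tuple data.

```python
# Toy model: exclusive lock is held throughout; pin_count == 1 means our own
# pin is the only one, i.e. the exclusive lock is effectively a cleanup lock.
UNUSED, NORMAL, DEAD = "unused", "normal", "dead"

def vacuum_page(line_pointers, tuple_is_dead, pin_count):
    """Reap line pointers in place; report whether defragmentation is allowed."""
    for i, state in enumerate(line_pointers):
        if state == NORMAL and tuple_is_dead(i):
            line_pointers[i] = DEAD      # safe: no tuple data moves
        elif state == DEAD:
            line_pointers[i] = UNUSED    # assumes index pointers already removed
    return pin_count == 1                # True: may also defragment the page

lps = [NORMAL, DEAD, NORMAL]
defragged = vacuum_page(lps, tuple_is_dead=lambda i: i == 0, pin_count=2)
print(lps, defragged)  # ['dead', 'unused', 'normal'] False
```

The comment on DEAD -> UNUSED encodes Pavan's caveat from upthread: that transition is only legal once the index vacuum has cleared the corresponding index pointers.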
Re: [HACKERS] BLOB support
2011/6/6 Radosław Smogura rsmog...@softperience.eu:
> On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:
>> Tom Lane t...@sss.pgh.pa.us writes:
>>> Yes. I think the appropriate problem statement is "provide streaming
>>> access to large field values, as an alternative to just
>>> fetching/storing the entire value at once". I see no good reason to
>>> import the entire messy notion of LOBS/CLOBS. (The fact that other
>>> databases have done it is not a good reason.)
>>
>> Spent some time in the archive to confirm a certain "déjà vu"
>> impression. Couldn't find it. Had to manually search in closed commit
>> fests… but here we are, I think:
>> https://commitfest.postgresql.org/action/patch_view?id=70
>> http://archives.postgresql.org/message-id/17891.1246301...@sss.pgh.pa.us
>> http://archives.postgresql.org/message-id/4a4bf87e.7010...@ak.jp.nec.com
>
> I thought more about this in light of the references sent, but I still
> have in mind constructs like
>
>     Blob myWeddingDvd = conn.createBlob(myWeddingStream, size);
>     conn.prepareStatement("INSERT INTO someonetubevideos VALUES (?)")
>
> where the 1st parameter is myWeddingDvd; or, if someone doesn't like
> Java, they may wish to pass a C++ istream or a C FILE. I think (with
> respect to the consideration below) this implicitly requires that LOBs
> be stored in one centralized place, no matter whether this is the file
> system, a special table, or something else, because when the statement
> is processed there is no idea which table the LOB will be associated
> with. If we want to TOAST, where do we TOAST it? What happens if the
> insertion is done by an SQL function that chooses the table depending on
> the BLOB content?
>
> Possible solutions, if we don't want centralized storage, may be:
> 1. Keep the BLOB in memory, but this may, depending on the
> implementation, limit the size of the BLOB.
> 2. Temporarily back up the blob in a file, then, when the values are
> stored, copy the file to the TOAST table; but some changes are still
> required to support LOBs for complex types and arrays.

@1 is useless for multiuser applications. This is a problem of the
current implementation for large TOAST values. You can hold around
work_mem bytes in memory, but any larger content should be forwarded to a
file.

Pavel

> So please give some ideas how to resolve this, or maybe it has low
> priority?
>
> Regards,
> Radek
Re: [HACKERS] SAVEPOINTs and COMMIT performance
On 06.02.2011 23:09, Simon Riggs wrote:
> On Sun, 2011-02-06 at 12:11 -0500, Bruce Momjian wrote:
>> Did this ever get addressed?
>
> Patch attached. Seems like the easiest fix I can come up with.

@@ -2518,7 +2518,7 @@ CommitTransactionCommand(void)
 		case TBLOCK_SUBEND:
 			do
 			{
-				CommitSubTransaction();
+				CommitSubTransaction(true);
 				s = CurrentTransactionState;	/* changed by pop */
 			} while (s->blockState == TBLOCK_SUBEND);
 			/* If we had a COMMIT command, finish off the main xact too */

We also get into this codepath at RELEASE SAVEPOINT, in which case it is
wrong to not reassign the locks to the parent subtransaction.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
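The objection above can be made concrete with a toy model (Python, entirely illustrative; the real code is CommitSubTransaction in xact.c): skipping the lock transfer is only valid when the whole transaction is about to commit, while at RELEASE SAVEPOINT the parent must inherit the subtransaction's locks.

```python
# Toy model of the flag the patch adds to CommitSubTransaction: whether the
# subtransaction's locks are released outright or reassigned to the parent.
def commit_subtransaction(sub_locks, parent_locks, top_level_commit):
    if top_level_commit:
        # COMMIT of the main xact: all locks are released together anyway.
        sub_locks.clear()
    else:
        # RELEASE SAVEPOINT: locks must survive, now owned by the parent.
        parent_locks.extend(sub_locks)
        sub_locks.clear()

parent, sub = ["t1"], ["t2"]
commit_subtransaction(sub, parent, top_level_commit=False)
print(parent, sub)  # ['t1', 't2'] []
```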
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Simon Riggs si...@2ndquadrant.com writes:
>> The approach looks sound to me. It's a fairly isolated patch and we
>> should be considering this for inclusion in 9.1, not wait another year.
>
> That suggestion is completely insane. The patch is only WIP and full of
> bugs, even according to its author. Even if it were solid, it is way too
> late to be pushing such stuff into 9.1. We're trying to ship a release,
> not find ways to cause it to slip more.

In 8.3, you implemented virtual transaction ids days before we produced a
Release Candidate, against my recommendation. At that time, I didn't
start questioning your sanity. In fact we all applauded that because it
was a great performance gain.

The fact that you disagree with me does not make me insane.

Inaction on this point, resulting in a year's delay, will be considered a
gross waste by the majority of objective observers.

-- 
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 12:40, Simon Riggs wrote:
> On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> Simon Riggs si...@2ndquadrant.com writes:
>>> The approach looks sound to me. It's a fairly isolated patch and we
>>> should be considering this for inclusion in 9.1, not wait another
>>> year.
>>
>> That suggestion is completely insane. The patch is only WIP and full of
>> bugs, even according to its author. Even if it were solid, it is way
>> too late to be pushing such stuff into 9.1. We're trying to ship a
>> release, not find ways to cause it to slip more.
>
> In 8.3, you implemented virtual transaction ids days before we produced
> a Release Candidate, against my recommendation.

FWIW, this bottleneck was not introduced by the introduction of virtual
transaction ids. Before that patch, we just took the lock on the real
transaction id instead.

> The fact that you disagree with me does not make me insane.

You are not insane, even if your suggestion is.

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] WIP: Fast GiST index build
On 06.06.2011 10:42, Heikki Linnakangas wrote:
> On 03.06.2011 14:02, Alexander Korotkov wrote:
>> Hackers,
>>
>> WIP patch of fast GiST index build is attached. Code is dirty and
>> comments are lacking, but it works. Now it is ready for first
>> benchmarks, which should prove efficiency of selected technique. It's
>> time to compare fast GiST index build with repeat insert build on
>> large enough datasets (datasets which don't fit to cache). There are
>> following aims of testing:
>> 1) Measure acceleration of index build.
>> 2) Measure change in index quality.
>> I'm going to do first testing using synthetic datasets. Everybody who
>> have interesting real-life datasets for testing are welcome.
>
> I ran another test with a simple table generated with:
>
> CREATE TABLE pointtest (p point);
> INSERT INTO pointtest SELECT point(random(), random()) FROM
> generate_series(1,5000);
>
> Generating a gist index with:
>
> CREATE INDEX i_pointtest ON pointtest USING gist (p);
>
> took about 15 hours without the patch, and 2 hours with it. That's
> quite dramatic.

Oops, that was a rounding error, sorry. The run took about 2.7 hours with
the patch, which of course should be rounded to 3 hours, not 2. Anyway,
it is still a very impressive improvement.

I'm glad you could get the patch ready for benchmarking this quickly. Now
you just need to get the patch into shape so that it can be committed.
That is always the more time-consuming part, so I'm glad you have plenty
of time left for it.

Could you please create a TODO list on the wiki page, listing all the
missing features, known bugs etc. that will need to be fixed? That'll
make it easier to see how much work there is left. It'll also help anyone
looking at the patch to know which issues are known issues.

Meanwhile, it would still be very valuable if others could test this with
different workloads. And Alexander, it would be good if at some point you
could write some benchmark scripts too, and put them on the wiki page,
just to see what kind of workloads have been taken into consideration and
tested already. Do you think there's some worst-case data distributions
where this algorithm would perform particularly badly?

-- 
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] DOMAINs and CASTs
On Tue, 2011-05-17 at 14:11 -0500, Jaime Casanova wrote:
> On Tue, May 17, 2011 at 12:19 PM, Robert Haas robertmh...@gmail.com wrote:
>> The more controversial question is what to do if someone tries to
>> create such a cast anyway. We could just ignore that as we do now, or
>> we could throw a NOTICE, WARNING, or ERROR.
>
> IMHO, not being an error per se but an implementation limitation, I
> would prefer to send a WARNING

Implementation limitations are normally reported as errors. I don't see
why it should be different here.

It's debatable whether it's an implementation restriction anyway. If you
want to create casts from or to a domain, maybe distinct types or type
aliases or something like that would be a more appropriate feature in the
long run.
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 2:54 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
> Ah, I remember I saw that vxid lock pop up quite high in an oprofile
> profile recently. I think it was the case of executing a lot of very
> simple prepared queries. So it would be nice to address that, even from
> a single-CPU point of view.

It doesn't seem too hard to do, although I have to think about the
details. Even though the VXID locks involved are Exclusive locks, they
are actually very much like the weak locks that the current patch
accelerates, because the Exclusive lock is taken only by the VXID owner,
and it can therefore be safely assumed that the initial lock acquisition
won't block anything. Therefore, it's really unnecessary to touch the
primary lock table at transaction start (and to only touch it at the end
if someone's waiting).

However, there's a fly in the ointment: when someone tries to ShareLock a
VXID, we need to determine whether that VXID is still around and, if so,
make an Exclusive lock entry for it in the primary lock table. And,
unlike what I'm doing for strong relation locks, it's probably NOT
acceptable for that to acquire and release every per-backend LWLock,
because every place that waits for VXID locks waits for a list of locks
in sequence, so we could end up with O(n^2) behavior.

Now, in theory that's not a huge problem: the VXID includes the backend
ID, so we ought to be able to figure out which single per-backend LWLock
is of interest and just acquire/release that one. Unfortunately, it
appears that there's no easy way to go from a backend ID to a PGPROC.
The backend IDs are offsets into the ProcState array, so they give us a
pointer to the backend's sinval state, not its PGPROC. And while the
PGPROC has a pointer to the sinval info, there's no pointer in the
opposite direction. Even if there were, we'd probably need to hold
SInvalWriteLock in shared mode to follow it. That might not be the end of
the world, since VXID locks are fairly infrequently used, but it's
certainly a little grotty.

I do rather wonder if we should be trying to reduce the number of
separate places where we list the running processes. We have arrays of
PGPROC structures, and then we have one set of pointers to PGPROCs in the
ProcArray, and then we have the ProcState structures for sinval. I wonder
if there's some way to rearrange all this to simplify the bookkeeping.

BTW, how do you identify from oprofile that *vxid* locks were the
problem? I didn't think it could produce that level of detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] WIP: Fast GiST index build
Hi! On Mon, Jun 6, 2011 at 2:51 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.06.2011 10:42, Heikki Linnakangas wrote: I ran another test with a simple table generated with: CREATE TABLE pointtest (p point); INSERT INTO pointtest SELECT point(random(), random()) FROM generate_series(1,5000); Generating a gist index with: CREATE INDEX i_pointtest ON pointtest USING gist (p); took about 15 hours without the patch, and 2 hours with it. That's quite dramatic. Oops, that was a rounding error, sorry. The run took about 2.7 hours with the patch, which of course should be rounded to 3 hours, not 2. Anyway, it is still a very impressive improvement. I have similar results on 100 million rows: 21.6 hours without the patch and 2 hours with the patch. But I found a problem: index quality is worse. See the following query plans. Here test is a relation whose index was created in the ordinary way, and test2 is a relation where the patch was used. QUERY PLAN --- Bitmap Heap Scan on test (cost=4391.01..270397.31 rows=10 width=20) (actual time=1.257..2.147 rows=838 loops=1) Recheck Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=968 -> Bitmap Index Scan on test_idx (cost=0.00..4366.01 rows=10 width=0) (actual time=1.162..1.162 rows=838 loops=1) Index Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=131 Total runtime: 2.214 ms (7 rows) QUERY PLAN Bitmap Heap Scan on test2 (cost=4370.84..270377.13 rows=10 width=20) (actual time=5.252..6.056 rows=838 loops=1) Recheck Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=1458 -> Bitmap Index Scan on test2_idx (cost=0.00..4345.84 rows=10 width=0) (actual time=5.155..5.155 rows=838 loops=1) Index Cond: (v @ '(0.903,0.203),(0.9,0.2)'::box) Buffers: shared hit=621 Total runtime: 6.121 ms (7 rows) QUERY PLAN --- Bitmap Heap Scan on test (cost=4391.01..270397.31 rows=10 width=20) (actual time=2.148..2.977 rows=850 loops=1) Recheck Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=1099 ->
Bitmap Index Scan on test_idx (cost=0.00..4366.01 rows=10 width=0) (actual time=2.052..2.052 rows=850 loops=1) Index Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=249 Total runtime: 3.033 ms (7 rows) QUERY PLAN Bitmap Heap Scan on test2 (cost=4370.84..270377.13 rows=10 width=20) (actual time=6.806..7.602 rows=850 loops=1) Recheck Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=1615 -> Bitmap Index Scan on test2_idx (cost=0.00..4345.84 rows=10 width=0) (actual time=6.709..6.709 rows=850 loops=1) Index Cond: (v @ '(0.503,0.503),(0.5,0.5)'::box) Buffers: shared hit=773 Total runtime: 7.667 ms (7 rows) We can see that the index scan reads several times more pages. The original paper notes such an effect, explaining it by routing rectangles being laid out in less optimal ways. But this effect wasn't so dramatic in the tests provided in the paper. So, I have the following thoughts about this problem: 1) The number of pages read from the index is too large even with an ordinary index build. Querying a small area requires reading hundreds of pages. It is probably caused by the picksplit implementation. I have a version of the picksplit algorithm which seems to be much more efficient. I'll do some benchmarks with my picksplit algorithm; I hope the difference in index quality will not be so dramatic. 2) I can try to make some enhancements to the fast build algorithm which could improve tree quality. In the original paper the Hilbert heuristic was used to achieve even better tree quality than a tree created in the ordinary way. But since we use GiST we are restricted by its interface (or we have to create new interface function(s), which I'd like to avoid). I would like to try some ordering by penalty value in the buffer emptying process, and buffer relocation on split. 3) Probably, there is some bug which affects tree quality. Could you please create a TODO list on the wiki page, listing all the missing features, known bugs etc. that will need to be fixed?
That'll make it easier to see how much work there is left. It'll also help anyone looking at the patch to
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 07:12, Robert Haas wrote: I did some further investigation of this. It appears that more than 99% of the lock manager lwlock traffic that remains with this patch applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement runs in a separate transaction, and for each new transaction we run VirtualXactLockTableInsert(), which takes a lock on the vxid of that transaction, so that other processes can wait for it. That requires acquiring and releasing a lock manager partition lock, and we have to do the same thing a moment later at transaction end to dump the lock. A quick grep seems to indicate that the only places where we actually make use of those VXID locks are in DefineIndex(), when CREATE INDEX CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay expires. Considering that these are not commonplace events, it seems tremendously wasteful to incur the overhead for every transaction. It might be possible to make the lock entry spring into existence on demand - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK and PROCLOCK objects for that vxid. That presents a few synchronization challenges, and plus we have to make sure that the backend that's just been given a lock knows that it needs to release it, but those seem like they might be manageable problems, especially given the new infrastructure introduced by the current patch, which already has to deal with some of those issues. I'll look into this further. At the moment, the transaction with a given vxid acquires an ExclusiveLock on the vxid, and anyone who wants to wait for it to finish acquires a ShareLock. If we simply reverse that, so that the transaction itself takes ShareLock, and anyone wanting to wait on it takes an ExclusiveLock, will this fastlock patch bust this bottleneck too?
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] heap vacuum cleanup locks
On Mon, Jun 6, 2011 at 2:36 AM, Pavan Deolasee pavan.deola...@gmail.com wrote: Do we know if this is really a problem though? The deadlock, for example, can happen only when a backend tries to get a table-level conflicting lock while holding the buffer pin, and I am not sure if we do that. The deadlock isn't terribly common, because, as you say, you need the process holding the buffer pin to try to take a lock on the relation being vacuumed that is strong enough to conflict with ShareUpdateExclusiveLock. That's a slightly unusual thing to do. But the problem of vacuum stalling out because it can't get the cleanup lock is a very real one. I've seen at least one customer hit this in production, and it was pretty painful. Now, granted, you need some bad application design, too: you have to leave a cursor lying around instead of running it to completion and then stopping. But supposing you do make that mistake, you might hope that it wouldn't cause VACUUM starvation, which is what happens today. IOW, I'm less worried about whether the cleanup lock is slowing vacuum down than I am about eliminating the pathological cases where an autovacuum worker gets pinned down, stuck waiting for a cleanup lock that never arrives. Now the table doesn't get vacuumed (bad) and the system as a whole is one AV worker short of what it's supposed to have (also bad). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
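The skip-instead-of-stall behavior argued for here can be modeled in a few lines. This is a toy, single-threaded sketch under the assumption that a cleanup lock simply means "our pin is the only pin"; real buffer pins live in shared memory and need proper locking, and the names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Grossly simplified model of a shared buffer header: just a pin count. */
typedef struct BufferDesc { int refcount; } BufferDesc;

/* Acquiring a cleanup lock requires that we hold the only pin. */
static bool conditional_cleanup_lock(BufferDesc *buf)
{
    /* caller is assumed to hold one pin already */
    return buf->refcount == 1;
}

/* Sketch of a vacuum loop that skips pinned pages instead of stalling. */
static int vacuum_relation(BufferDesc *pages, int npages)
{
    int cleaned = 0;

    for (int i = 0; i < npages; i++)
    {
        pages[i].refcount++;             /* pin the page */
        if (conditional_cleanup_lock(&pages[i]))
            cleaned++;                   /* prune and defragment */
        /* else: someone else holds a pin; skip the page and let the
         * next VACUUM or a HOT cleanup deal with it */
        pages[i].refcount--;             /* unpin */
    }
    return cleaned;
}

int main(void)
{
    BufferDesc pages[3] = {{0}, {1 /* pinned by an idle cursor */}, {0}};
    int cleaned = vacuum_relation(pages, 3);

    printf("%d\n", cleaned);   /* the pinned page is skipped */
    assert(cleaned == 2);
    return 0;
}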
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:02 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.06.2011 07:12, Robert Haas wrote: I did some further investigation of this. It appears that more than 99% of the lock manager lwlock traffic that remains with this patch applied has locktag_type == LOCKTAG_VIRTUALTRANSACTION. Every SELECT statement runs in a separate transaction, and for each new transaction we run VirtualXactLockTableInsert(), which takes a lock on the vxid of that transaction, so that other processes can wait for it. That requires acquiring and releasing a lock manager partition lock, and we have to do the same thing a moment later at transaction end to dump the lock. A quick grep seems to indicate that the only places where we actually make use of those VXID locks are in DefineIndex(), when CREATE INDEX CONCURRENTLY is in use, and during Hot Standby, when max_standby_delay expires. Considering that these are not commonplace events, it seems tremendously wasteful to incur the overhead for every transaction. It might be possible to make the lock entry spring into existence on demand - i.e. if a backend wants to wait on a vxid entry, it creates the LOCK and PROCLOCK objects for that vxid. That presents a few synchronization challenges, and plus we have to make sure that the backend that's just been given a lock knows that it needs to release it, but those seem like they might be manageable problems, especially given the new infrastructure introduced by the current patch, which already has to deal with some of those issues. I'll look into this further. At the moment, the transaction with given vxid acquires an ExclusiveLock on the vxid, and anyone who wants to wait for it to finish acquires a ShareLock. If we simply reverse that, so that the transaction itself takes ShareLock, and anyone wanting to wait on it take an ExclusiveLock, will this fastlock patch bust this bottleneck too? Not without some further twaddling. 
Right now, the fast path only applies when you are taking a lock weaker than ShareUpdateExclusiveLock on an unshared relation. See also the email I just sent on why using the exact same mechanism might not be such a hot idea. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Error in PQsetvalue
Hello, guys. You wrote: MM On Fri, Jun 3, 2011 at 10:36 PM, Andrew Chernow a...@esilo.com wrote: On 6/3/2011 10:26 PM, Andrew Chernow wrote: I disagree -- I think the fix is a one-liner. line 446: if (tup_num == res->ntups && !res->tuples[tup_num]) should just become if (tup_num == res->ntups) also the memset of the tuple slots when the slot array is expanded can be removed. (in addition, the tuple array expansion should really be abstracted, but that isn't strictly necessary here). All true. This is a cleaner fix to something that was in fact broken ;) Attached a patch that fixes the OP's issue. PQsetvalue now uses pqAddTuple to grow the tuple table and has removed the remnants of an older idea that caused the bug. Sorry, I attached the wrong patch. Here is the correct one. MM This looks good. Pavel, want to test it? Sorry for the delay in answering. Yeah, I'm glad to. Should I apply this patch by myself? MM merlin -- With best wishes, Pavel mailto:pa...@gf.microolap.com
Re: [HACKERS] WIP: Fast GiST index build
On Mon, Jun 6, 2011 at 2:51 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Do you think there's some worst-case data distributions where this algorithm would perform particularly badly? I think there could be some worst-case GiST applications. Right now the GiST fast build algorithm invokes more penalty calls than the ordinary insert algorithm does. If I succeed then it will invoke even more such calls. So, if the penalty function is very slow then GiST fast build will be slower than ordinary insertion. -- With best regards, Alexander Korotkov.
Re: [HACKERS] WIP: Fast GiST index build
On Mon, Jun 6, 2011 at 4:14 PM, Alexander Korotkov aekorot...@gmail.com wrote: If I succeed then it will invoke even more such calls. I meant here that if I succeed with enhancements that improve index quality, then the fast build algorithm will invoke even more such calls. -- With best regards, Alexander Korotkov.
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06.06.2011 14:59, Robert Haas wrote: BTW, how do you identify from oprofile that *vxid* locks were the problem? I didn't think it could produce that level of detail. It can show the call stack of each call, with the --callgraph=n option, where you can see what percentage of the calls to LockAcquire come from VirtualXactLockTableInsert. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Range Types and extensions
Jeff Davis pg...@j-davis.com writes: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. That's a good question :) I think the way things are going to be organised now is that we will have core-blessed extensions: don't mix the mechanism and the policy. non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in range interface, and the range extension could add && as the range interface's overlaps operator for the type ANYRANGE. That means that this is, IMHO, the right approach. Have core support that enables user-defined RANGE types with indexing and planner support, etc., like we have OPERATOR CLASS and FAMILY and all the jazz. And the useful stuff you need to have to benefit from that core support would be an extension. It could be a core-maintained extension, and it could even get installed by default, so that all the users would need to do is 'CREATE EXTENSION timeranges;', for example. So, I see us getting those different kinds of extensions in the future: a. core extensions, shipped by default b. contribs, not shipped by default, maintained by core hackers c. examples, included in the source code only, maintained as contribs d. “trusted network” of extensions (pgdg, pgxn, debian, privates, etc) e. external independent extensions, just as now The other main difference between a core extension and a contrib will be where it's documented. Greg Smith proposed a patch wherein he moved some contribs to a new extension/ place, and had them auto-installed. I think the consensus is to instead add a new chapter (maybe between current chapters 9. Functions and Operators and 10. Type Conversion) and host “core extensions” docs there. The source code organisation is controversial because it is technically not necessary. We have to keep the work Greg did to keep those contribs shipped by default.
Oh, and that is on the 9.1 Open Items, right? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] Bug in XPATH() if expression returns a scalar value
On tis, 2011-05-31 at 16:19 +0200, Florian Pflug wrote: If people deem this to be a problem, we could instead add a separate function XPATH_VALUE() that returns VARCHAR, and make people use that for scalar-value-returning expressions. Why not replicate what contrib/xml2 provides, namely xpath_string() xpath_number() xpath_bool() That way, types are preserved. However, to avoid confusion, XPATH() should then be taught to raise an error if used for scalar-value returning expressions, instead of silently returning an empty array as it does now. Sounds reasonable.
Re: [HACKERS] Domains versus polymorphic functions, redux
On fre, 2011-06-03 at 13:53 -0500, Kevin Grittner wrote: Another long-range nicety would be something which I have seen in some other databases, and which is consistent with the inheritance theme, is that you can't compare or assign dissimilar domains -- an error is thrown. So if you try to join from the eye color column in a person table to the key of a hair color table, you get an error unless you explicitly cast one or both of them to the common type. What you are looking for is the SQL feature called distinct types. The makers of the SQL standard have sort of deprecated domains in favor of distinct types, because distinct types address your sort of use case better, and prescribing the behavior of domains becomes weirder and weirder as the type system becomes more complex. Which is pretty much the same experience we've been having over the years.
[HACKERS] [v9.2] Fix leaky-view problem, part 1
This patch fixes the leaky-view problem for the scenario of functions with tiny cost estimates. The point of this scenario is the criteria used to reorder qualifiers of a scanning plan in order_qual_clauses(). The optimizer may pull up a simple subquery into the upper level, and its qualifiers then get merged with the ones in the upper level. When the executor scans a relation, qualifiers with smaller cost are executed earlier, to minimize the cost of filtering out invisible tuples. However, this has an undesirable side-effect when we use a view for row-level security. Even if a subquery rewritten from a view is meant to enforce row-level security, a function appended from outside the view may be executed earlier than the qualifiers that implement the view's security policy, as long as the view is simple enough and the supplied function has a tiny cost. As a result, this function can see arguments coming from invisible tuples, and leak them somewhere. The solution is quite simple. This patch tracks the original depth of each qualifier and modifies the sort criteria in order_qual_clauses(). Even if a function with tiny cost is supplied from outside a view, the patched optimizer does not prioritize estimated cost over depth. It fixes scenario [1] in the descriptions below. The background of the leaky-view problem is well summarized at: http://wiki.postgresql.org/wiki/RLS We discussed several scenarios during the v9.1 development cycle and at the last developer meeting, and roughly concluded on the following criteria to characterize whether a leaky-view scenario is a problem to be fixed, or not. * If an unprivileged user can directly reference the contents of invisible tuples, it is a problem to be fixed. * As long as the contents of invisible tuples are only consumed by internal stuff (e.g. an index access method), it is not a problem to be fixed. Thus, scenarios [1] and [2] are problems to be fixed, but [3] and [4] are not.
So, I'll try to fix these two scenarios with the part-1 and part-2 patches. [1] unexpected reordering of functions with tiny cost and side-effects. Qualifiers of a WHERE or JOIN ... ON clause are sorted by estimated cost, not by depth of nest level. Thus, this logic can reverse the order when a user-given qualifier has a smaller cost than the qualifiers implementing the security policy inside the view. As a result, these qualifiers can reference both visible and invisible tuples prior to the filtering by the view's row-level security policy, so this behavior can be used to leak the contents of invisible tuples. [2] unexpected push-down of functions with side-effects into a join loop. If the arguments of a qualifier appended outside a join loop reference only one side of the join, it is a good strategy, from a performance viewpoint, to push this qualifier down inside the join loop to minimize the number of tuples to be joined. However, this also reverses the order when the join is part of a view definition that should enforce a row-level security policy. These external qualifiers may then be executed prior to the filtering by the view's row-level security policy, so this behavior can be used to leak the contents of invisible tuples. [3] estimation of hidden values using iterated PK/FK probes. Due to the nature of PK/FK constraints, we can infer the existence of key values stored within invisible tuples, even if we never allow users to reference the contents of invisible tuples. This type of information leak is commonly called a covert channel, and according to previous security research it is basically impossible to prevent; however, its risk is also relatively small because of the slow bandwidth of the leak. We already reached consensus that this scenario is not a problem to be fixed.
[4] estimation of hidden values using statistics. One example is a selectivity-estimator function, which may reference, for optimization, statistical information derived from tables that have invisible tuples. There are two points to consider. The first is that purely internal stuff, such as index access methods, may be able to reference invisible tuples; this is not a problem as long as they are not leaked to end users. The second is statistics or other forms of data derived from invisible tuples. We can set up a table containing data derived from invisible tuples using row-level triggers; however, that is a matter of database administration. Unless the owner of the tables sets up such a leaky configuration, other users cannot reference them. Thanks, -- NEC Europe Ltd, SAP Global Competence Center KaiGai Kohei kohei.kai...@emea.nec.com pgsql-fix-leaky-view-part-1.patch Description: pgsql-fix-leaky-view-part-1.patch
Re: [HACKERS] gdb with postgres
On Mon, Jun 6, 2011 at 3:43 AM, HuangQi huangq...@gmail.com wrote: Hi, I was using gdb to debug postgres. In order to debug the backend of a running query, I start postgres first and use select * from pg_backend_pid() to ask for the backend pid. Then I start gdb in another bash window with gdb postgres, attach the pid obtained above, and set the breakpoint. Then I run the query from the first window. However, the debugging process which is shown below is not going to the breakpoint. I tried many different breakpoints, but it always starts from the 305 client_read_ended(). GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as sparc-sun-solaris2.10... (gdb) b qp_add_paths_to_joinrel Breakpoint 1 at 0x1a6744: file joinpath.c, line 67. (gdb) attach 23903 Attaching to program `/usrlocal/pgsql/bin/postgres', process 23903 Retry #1: Retry #2: Retry #3: Retry #4: [New LWP 1] 0xff0cbaa4 in _rt_boot () from /usr/lib/ld.so.1 (gdb) n Single stepping until exit from function _rt_boot, which has no line number information. secure_read (port=0x4a7760, ptr=0x455948, len=8192) at be-secure.c:305 305 client_read_ended(); (gdb) n pq_recvbuf () at pqcomm.c:767 767 if (r < 0) (gdb) n 769 if (errno == EINTR) (gdb) n 782 if (r == 0) (gdb) n 788 return EOF; (gdb) n 791 PqRecvLength += r; (gdb) Anyone know what is going wrong? BTW, as you can see, the system is Solaris. Perhaps you want c for continue rather than n for next. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] gdb with postgres
If I enter c, gdb will directly finish executing this process and the current query will finish. Furthermore, if I enter the next query, gdb will not debug it and stays in continue status. On 6 June 2011 21:44, Robert Haas robertmh...@gmail.com wrote: On Mon, Jun 6, 2011 at 3:43 AM, HuangQi huangq...@gmail.com wrote: Hi, I was using gdb to debug postgres. In order to debug the backend of a running query, I start postgres first and use select * from pg_backend_pid() to ask for the backend pid. Then I start gdb in another bash window with gdb postgres, attach the pid obtained above, and set the breakpoint. Then I run the query from the first window. However, the debugging process which is shown below is not going to the breakpoint. I tried many different breakpoints, but it always starts from the 305 client_read_ended(). GNU gdb 6.6 Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as sparc-sun-solaris2.10... (gdb) b qp_add_paths_to_joinrel Breakpoint 1 at 0x1a6744: file joinpath.c, line 67. (gdb) attach 23903 Attaching to program `/usrlocal/pgsql/bin/postgres', process 23903 Retry #1: Retry #2: Retry #3: Retry #4: [New LWP 1] 0xff0cbaa4 in _rt_boot () from /usr/lib/ld.so.1 (gdb) n Single stepping until exit from function _rt_boot, which has no line number information. secure_read (port=0x4a7760, ptr=0x455948, len=8192) at be-secure.c:305 305 client_read_ended(); (gdb) n pq_recvbuf () at pqcomm.c:767 767 if (r < 0) (gdb) n 769 if (errno == EINTR) (gdb) n 782 if (r == 0) (gdb) n 788 return EOF; (gdb) n 791 PqRecvLength += r; (gdb) Anyone know what is going wrong? BTW, as you can see, the system is Solaris. Perhaps you want c for continue rather than n for next.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Best Regards Huang Qi Victor
Re: [HACKERS] gdb with postgres
On Mon, Jun 6, 2011 at 9:47 AM, HuangQi huangq...@gmail.com wrote: If I enter c, gdb will directly finish executing this process and current query will finish. Furthermore, if I enter next query, gdb will not debug it and stay in continue status. Hmm, that must mean your breakpoint isn't properly set. Instead of doing gdb postgres and then using attach, try just doing gdb -p PIDNAME, then set your breakpoint, then continue. That's how I've always done it, anyway... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] gdb with postgres
HuangQi huangq...@gmail.com wrote: (gdb) b qp_add_paths_to_joinrel Breakpoint 1 at 0x1a6744: file joinpath.c, line 67. (gdb) attach 23903 If I enter c, gdb will directly finish executing this process and current query will finish. Are you absolutely sure that running your query will result in a call to this function? -Kevin
Re: [HACKERS] WIP: AuthenticationMD5 protocol documentation clarification
On Sun, Jun 5, 2011 at 11:26 AM, Cyan Ogilvie cyan.ogil...@gmail.com wrote: This is my first patch, so I hope I've got the process right for submitting patches. You're doing great. I suspect we do want to either (1) reword what you've done in English, rather than writing it as code, or at least (2) add some SGML markup to the code. Our next CommitFest starts in just over a week, so you should receive some more specific feedback pretty soon. Also, if you'd like to help review someone else's patch, that would be great. http://archives.postgresql.org/pgsql-rrreviewers/2011-06/msg0.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] BLOB support
Radosław Smogura rsmog...@softperience.eu writes: I think more about this in contrast to the sent references, but I still have in my mind the construct Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated, we have Blu-ray conn.prepareStatement(INSERT INTO someonetubevideos values (?)) where the 1st parameter is myWeddingDvd, Yes, if you insist upon designing the API like that, then you come to the conclusion that you need global LOB identifiers. However, there are many ways to design this that don't work that way. One idea to think about is insert into someonetubevideos values('') returning open_for_write(videocolumn) which gives you back some kind of writable stream ID (this is a transient, within-session ID, not global) for the target field in the row you just inserted. BTW, as was noted upthread by Dimitri, this whole subject has been discussed before on pgsql-hackers. You really ought to go re-read the previous threads. regards, tom lane
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 11:19 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.06.2011 12:40, Simon Riggs wrote: On Sat, Jun 4, 2011 at 5:55 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: The approach looks sound to me. It's a fairly isolated patch and we should be considering this for inclusion in 9.1, not wait another year. That suggestion is completely insane. The patch is only WIP and full of bugs, even according to its author. Even if it were solid, it is way too late to be pushing such stuff into 9.1. We're trying to ship a release, not find ways to cause it to slip more. In 8.3, you implemented virtual transactionids days before we produced a Release Candidate, against my recommendation. FWIW, this bottleneck was not introduced by the introduction of virtual transaction ids. Before that patch, we just took the lock on the real transaction id instead. Of course it wasn't. You've misunderstood completely. My point was that we have in the past implemented performance changes to increase scalability at the last minute, and also that our personal risk perspectives are not always set in stone. Robert has highlighted the value of this change and it's clearly not beyond our wit to include it, even if it is beyond our will to do so. The fact that you disagree with me does not make me insane. You are not insane, even if your suggestion is. LOL. Your logic is still poor though. :-) -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
[HACKERS] Different execution time for same plan
Hi, I am using PostgreSQL 8.4.6. I have made an index on my data type, and it is working fine; the output is correct. When I execute a query for the first time, it takes quite a long time, but the second execution of the same query takes much less time, even though the execution plan is the same. This is my first execution of the query: explain analyze select * from vehicle_st1 where '(116.3,39.3,2007/06/11 11:11:11),(117.2,39.8,2007/09/13 11:11:11)' @ stpoint; QUERY PLAN - Index Scan using st1 on vehicle_st1 (cost=0.00..8226.36 rows=2096 width=66) (actual time=65.962..1587.627 rows=9069 loops=1) Index Cond: ('(116.30,39.30,2007-06-11 11:11:11+05:30),(117.20,39.80,2007-09-13 11:11:11+05:30)'::ndpoint @ stpoint) Total runtime: 1594.446 ms (3 rows) Second time: explain analyze select * from vehicle_st1 where '(116.3,39.3,2007/06/11 11:11:11),(117.2,39.8,2007/09/13 11:11:11)' @ stpoint; QUERY PLAN - Index Scan using st1 on vehicle_st1 (cost=0.00..8226.36 rows=2096 width=66) (actual time=0.156..14.316 rows=9069 loops=1) Index Cond: ('(116.30,39.30,2007-06-11 11:11:11+05:30),(117.20,39.80,2007-09-13 11:11:11+05:30)'::ndpoint @ stpoint) Total runtime: 19.525 ms (3 rows) The third time it gives 17.148 ms, and the fourth time 25.102 ms. My postgresql.conf file has the following settings (these are the original defaults; I haven't modified anything): # RESOURCE USAGE (except WAL) # - Memory - shared_buffers = 28MB # min 128kB (change requires restart) #temp_buffers = 8MB # min 800kB #max_prepared_transactions = 0 # zero disables the feature (change requires restart) # Note: Increasing max_prepared_transactions costs ~600 bytes of shared memory per transaction slot, plus lock space (see max_locks_per_transaction). It is not advisable to set max_prepared_transactions nonzero unless you actively intend to use prepared transactions.
#work_mem = 1MB # min 64kB #maintenance_work_mem = 16MB # min 1MB #max_stack_depth = 2MB # min 100kB Why does the same plan give different execution times? (The reason may be that the data gets buffered/cached for the second execution.) Why is there so much difference? If I want to know the correct estimated time of this query, which option is right? 1. The first one (1594 ms), when the application has just started and all buffers are empty. (But in a practical situation they are not completely empty.) 2. Take the stable execution time (19-21 ms). 3. Average these four execution times. Which option is correct? Thanks Nick
Re: [HACKERS] Domains versus polymorphic functions, redux
Peter Eisentraut pete...@gmx.net writes: What you are looking for is the SQL feature called distinct types. The makers of the SQL standard have sort of deprecated domains in favor of distinct types, because distinct types address your sort of use case better, and prescribing the behavior of domains becomes weirder and weirder as the type system becomes more complex. Which is pretty much the same experience we've been having over the years. Yeah ... the one thing that is actually completely broken about (our current interpretation of) domains is that a first-class SQL datatype cannot sanely have a NOT NULL constraint attached to it. That just doesn't work in conjunction with outer joins, to take one glaring example. As I mentioned upthread, a closer look at the standard leads me to think that the committee doesn't actually intend that a domain's constraints follow it through operations --- I now think they only intend that the constraints get checked in the context of a cast to the domain (including assignment casts). In our terminology that would mean that a domain gets downcast to its base type as soon as you do anything at all to the value, even just pass it through a join. There are certainly applications where such a behavior isn't what you want, but trying to force domains to do something else is just not going to lead to desirable results. It's better to invent some other concept, and it sounds like the committee reached the same conclusion. Anyway, I think we're out of time to do anything about the issue for 9.1. I think what we'd better do is force a downcast in the context of matching to an ANYARRAY parameter, and leave the other cases to revisit later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
Tom Lane t...@sss.pgh.pa.us Monday 06 of June 2011 16:13:26 Radosław Smogura rsmog...@softperience.eu writes: I think more about this with contrast to sent references, but I still have in my mind construct Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated we have BlueRay conn.prepareStatement(INSERT INTO someonetubevideos values (?)) where 1st parameter is myWeddingDvd, Yes, if you insist upon designing the API like that, then you come to the conclusion that you need global LOB identifiers. However, there are many ways to design this that don't work that way. One idea to think about is insert into someonetubevideos values('') returning open_for_write(videocolumn) which gives you back some kind of writable stream ID (this is a transient, within-session ID, not global) for the target field in the row you just inserted. I know, but this is a somewhat old-fashioned behaviour. BTW, as was noted upthread by Dimitri, this whole subject has been discussed before on pgsql-hackers. You really ought to go re-read the previous threads. regards, tom lane I have read this, but it may be worth rethinking. Actually, about changes to TOAST (I mean streaming would be just for LOBs): I had written down all the disadvantages I found of using TOAST for LOBs, and it looks like only the performance of the above approach is really a disadvantage; it also rules out some less useful concepts such as copy-on-write for LOBs. Introducing streaming for TOAST in general is of little use. Sorry, just to cite from my above-mentioned document (this is generally about on-demand streaming of TOASTed values; in the context of LOBs it is acceptable, as long as non-transaction-aware LOBs are acceptable).
If we add streaming of TOASTed values, so that the caller gets some reference to the value, we need to ensure that the pointed-to data will be neither changed nor deleted. I think this will require the caller to add FOR UPDATE (or the server to add it silently) to each statement returning pointers to TOASTed values, because a client may run a transactional query while another client, just after the first, removes the record, commits, and runs VACUUM. In that situation, when the first client tries to read the data from the given row, it will get an error. This may be acceptable for LOBs (commonly LOBs need not be transaction-aware), but I would be angry if it happened with VARCHAR. If this is acceptable I will make the following changes: - Add server_max_in_memory_lob_size, a start-time-only server GUC describing the maximum value of the client session parameter max_in_memory_lob. - Add max_in_memory_lob, a session GUC describing how large a LOB may be kept in memory before being backed by a file. - Recursively toast and detoast during insert/update/delete when searching for LOBs (we need this for arrays and composite types). This is for the last stage; an error disallowing LOBs in composites/arrays may be quite enough for a beginning. I want LOBs to be a starting point for LOBing other types (e.g. some big arrays might be LOBbed). - During toasting, the LOB will be toasted and, in place of the LOB, a reference to it will be put and encoded in the LOB datum. - An API for LOB manipulation (a few changes to the current implementation), such that a BLOB is a field type and a union of in_memory, in_file, and in_toast. These few extra bytes per LOB will not noticeably affect the datum size, given the size of a LOB. - An API for maintaining temporary LOB files. We need this as a per-session list of id - file descriptor mappings, to prevent stealing of LOBs by different connections (security). - A streaming API for TOASTed values (based on the COPY protocol, or a modified COPY protocol), or at least function calls; I haven't looked at this in the context of TOASTed LOBs. Is this good?
Regards, Radek -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
Radosław Smogura rsmog...@softperience.eu writes: Introducing streaming for TOAST in general is of little use. Sorry, just to cite from my above-mentioned document (this is generally about on-demand streaming of TOASTed values; in the context of LOBs it is acceptable, as long as non-transaction-aware LOBs are acceptable). If we add streaming of TOASTed values, so that the caller gets some reference to the value, we need to ensure that the pointed-to data will be neither changed nor deleted - I think this will require the caller to add FOR UPDATE (or the server to add it silently) to each statement returning pointers to TOASTed values, because a client may run a transactional query, It's already been explained to you that that's not the case. If this is acceptable I will make the following changes: - Add server_max_in_memory_lob_size, a start-time-only server GUC describing the maximum value of the client session parameter max_in_memory_lob. - Add max_in_memory_lob, a session GUC describing how large a LOB may be kept in memory before being backed by a file. - Recursively toast and detoast during insert/update/delete when searching for LOBs (we need this for arrays and composite types). This is for the last stage; an error disallowing LOBs in composites/arrays may be quite enough for a beginning. I want LOBs to be a starting point for LOBing other types (e.g. some big arrays might be LOBbed). - During toasting, the LOB will be toasted and, in place of the LOB, a reference to it will be put and encoded in the LOB datum. - An API for LOB manipulation (a few changes to the current implementation), such that a BLOB is a field type and a union of in_memory, in_file, and in_toast. These few extra bytes per LOB will not noticeably affect the datum size, given the size of a LOB.
- An API for maintaining temporary LOB files. We need this as a per-session list of id - file descriptor mappings, to prevent stealing of LOBs by different connections (security). - A streaming API for TOASTed values (based on the COPY protocol, or a modified COPY protocol), or at least function calls; I haven't looked at this in the context of TOASTed LOBs. Is this good? This all looks like you decided on a solution first and then started to look for a problem to apply it to. I don't want to see us inventing a pile of GUCs for this, and I don't think there is a need to make any fundamental changes in the TOAST mechanism either. What we do need is a streaming access protocol to read or write wide field values *without* forcing people to change the basic structure of their tables. You keep on wanting to invent weird, IMO unnecessary language features instead. Try to make the feature as transparent as possible, not as visible as possible. regards, tom lane
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Tom Lane's message of sáb jun 04 12:49:05 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com writes: What surprises me is that the open references remain after a database drop. Surely this means that no backends keep open file descriptors to any table in that database, because there are no connections. bgwriter ... Actually you were both wrong, hah. It's not bgwriter, and this doesn't belong on pgsql-general. It's a backend. However, as we mentioned initially, the database to which this file belongs is dropped. What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. Actually, there are 11 processes holding open file descriptors to the table, each to a slightly different subset of the many segments of the table. (There's also one holding a FD to the deleted pg_largeobject_loid_pn_index -- remember, this is a dropped database). All those backends belong to Zabbix, the monitoring platform, which are connected to a different database. I think what we have here is a new bug. (This is running 8.4.8, by the way.) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
On 06.06.2011 17:13, Tom Lane wrote: Radosław Smogura rsmog...@softperience.eu writes: I think more about this with contrast to sent references, but I still have in my mind construct Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated we have BlueRay conn.prepareStatement(INSERT INTO someonetubevideos values (?)) where 1st parameter is myWeddingDvd, Yes, if you insist upon designing the API like that, then you come to the conclusion that you need global LOB identifiers. That's what the JDBC API looks like, but it doesn't mean you need global LOB identifiers. When you create the Blob object (myWeddingDvd), the driver can just keep a reference to the given stream (myWeddingStream) in the Blob object. When you execute the INSERT statement, the driver can read the stream and stream the data to the server. The protocol changes I think Tom and I and others are envisioning would work just fine with that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Different execution time for same plan
First off, this is posted to the wrong list -- this list is for discussion of development of the PostgreSQL product. There is a list for performance questions where this belongs: pgsql-performa...@postgresql.org. I'm moving this to the performance list with a blind copy to the -hackers list so people know where the discussion went. Nick Raj nickrajj...@gmail.com wrote: When i execute the query first time, query takes a quite longer time but second time execution of the same query takes very less time (despite execution plan is same) Why the same plan giving different execution time? (Reason may be data gets buffered (cached) for the second time execution) Why there is so much difference? Because an access to a RAM buffer is much, much faster than a disk access. Which option will be true? It depends entirely on how much of the data needed for the query is cached. Sometimes people will run a set of queries to warm the cache before letting users in. MY postgresql.conf file having setting like this (this is original setting, i haven't modify anything) shared_buffers = 28MB #work_mem = 1MB# min 64kB #maintenance_work_mem = 16MB# min 1MB If you're concerned about performance, these settings (and several others) should probably be adjusted: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
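Kevin's point — that the second run is served from RAM rather than disk — can be illustrated with a toy buffer-cache model. This is purely illustrative Python, not PostgreSQL code; the class and page counts are invented for the example:

```python
class ToyBufferCache:
    """Illustrative model: the first access to a page 'reads from disk'
    (slow, counted), later accesses hit the in-memory cache (fast)."""
    def __init__(self):
        self.pages = {}
        self.disk_reads = 0

    def read_page(self, page_no):
        if page_no not in self.pages:
            self.disk_reads += 1              # simulated disk I/O
            self.pages[page_no] = f"data-{page_no}"
        return self.pages[page_no]

cache = ToyBufferCache()
# "First execution": every page misses the cache.
for p in range(100):
    cache.read_page(p)
first_run_reads = cache.disk_reads
# "Second execution": same pages, all served from memory.
for p in range(100):
    cache.read_page(p)
second_run_reads = cache.disk_reads - first_run_reads
print(first_run_reads, second_run_reads)  # 100 0
```

This is why warming the cache with a preliminary set of queries, as mentioned above, makes subsequent timings stable.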
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 10:49 AM, Simon Riggs si...@2ndquadrant.com wrote: My point was that we have in the past implemented performance changes to increase scalability at the last minute, and also that our personal risk perspectives are not always set in stone. Robert has highlighted the value of this change and its clearly not beyond our wit to include it, even if it is beyond our will to do so. So, at the risk of totally derailing this thread -- what this boils down to is a philosophical disagreement. It seems to me (and, I think, to Tom and Heikki and others as well) that it's not possible to keep on making changes to the release right up until the last minute and then expect the release to be of high quality. If we keep committing new features, then we'll keep introducing new bugs. The only hope of making the bug count go down at some point is to stop making changes that aren't bug fixes. We could come up with some complex procedure for determining whether a patch is important enough and non-invasive enough to bypass the normal deadline, but that would probably lead to a lot more arguing about procedure, and realistically, it's still going to increase the bug count at least somewhat. IMHO, it's better to just have a deadline, and stuff either makes it or it doesn't. I realize we haven't always adhered to the principle in the past, but at least IMV that's not a mistake we want to continue repeating. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: AuthenticationMD5 protocol documentation clarification
On 06.06.2011 16:58, Robert Haas wrote: On Sun, Jun 5, 2011 at 11:26 AM, Cyan Ogilvie cyan.ogil...@gmail.com wrote: This is my first patch, so I hope I've got the process right for submitting patches. You're doing great. I suspect we do want to either (1) reword what you've done in English, rather than writing it as code, or at least (2) add some SGML markup to the code. Our next CommitFest starts in just over a week, so you should receive some more specific feedback pretty soon. That is quite complicated to explain in plain English, so some sort of pseudo-code is probably a good idea. I would recommend not to formulate it as a SQL expression, though. It makes you think you could execute it from psql or something. Even if you know that's not how to do it, it feels confusing. Maybe something like: <literal>md5</literal> || hex_encode(md5(hex_encode(md5(password || username)) || salt)) with some extra markup to make it look pretty. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
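The double-hash computation Heikki is describing can be sketched in a few lines of Python. The credentials and the 4-byte salt below are invented for illustration; the structure ("md5" prefix plus hex-encoded nested MD5) follows the formula above:

```python
import hashlib

def md5_auth_response(password: str, username: str, salt: bytes) -> str:
    """Compute "md5" || hex(md5(hex(md5(password || username)) || salt))."""
    inner = hashlib.md5((password + username).encode()).hexdigest()
    outer = hashlib.md5(inner.encode() + salt).hexdigest()
    return "md5" + outer

# Example with made-up credentials and a fixed 4-byte salt:
resp = md5_auth_response("secret", "alice", b"\x01\x02\x03\x04")
print(resp)  # "md5" followed by 32 hex digits (35 characters total)
```

Note that the inner digest is hex-encoded *before* the salt is appended, which is the detail that is easy to get wrong in plain-English wording.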
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com wrote: What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630...@sss.pgh.pa.us I mention it only because there might be one place to fix both. -Kevin
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com wrote: What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630...@sss.pgh.pa.us I mention it only because there might be one place to fix both Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Kevin Grittner's message of lun jun 06 11:58:51 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com wrote: What we found out after more careful investigation is that the file is kept open by a backend connected to a different database. I have a suspicion that what happened here is that this backend was forced to flush out a page from shared buffers to read some other page; and it was forced to do a fsync of this file. And then it forgets to close the file descriptor. It doesn't forget to close the descriptor; it intentionally keeps it for possible further use. This sounds vaguely similar to what I found with WAL files being held open for days after they were deleted by read-only backends: http://archives.postgresql.org/message-id/15412.1259630...@sss.pgh.pa.us I mention it only because there might be one place to fix both Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
Robert Haas robertmh...@gmail.com wrote: IMHO, it's better to just have a deadline, and stuff either makes it or it doesn't. I realize we haven't always adhered to the principle in the past, but at least IMV that's not a mistake we want to continue repeating. +1 I've said it before, but I think it bears repeating, that deferring this to 9.2 doesn't mean that it comes out in a production release 12 months later -- unless we continue to repeat this mistake endlessly. It means that this release comes out closer to when we said it would -- for the sake of argument let's hypothesize one month. So by holding the line on such inclusions all the current 9.1 features come out one month sooner, and this feature comes out 11 months later than it would have if we'd put it into 9.1. With some feature we consider squeezing in, it would be more like delaying everything which is done by three months so that one feature gets out nine months earlier. Perhaps the best way to describe the suggestion that this be included in 9.1 isn't that it's an insane suggestion; but that it's a suggestion which, if adopted, would be likely to drive those who are striving for a more organized development and release process insane. Or one could look at it in a cost/benefit format -- major features delivered per year go up by holding the line, administrative costs are reduced, and people who are focusing on release stability get more months per year to do development. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] WALInsertLock tuning
In earlier discussions of how to improve WALInsertLock contention, it was observed that we must zero each new page before we advance the WAL insertion point. http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html IMHO the page zeroing is completely unnecessary, and replication works perfectly well without that (as a test of recovery logic). It is unnecessary because we already allow non-zeroed parts of WAL files, and provide a mechanism to detect stale data. The following trivial patch removes the page zeroing, which reduces the lock duration. Comments? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services walinsertlock_avoid_zero.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: Alvaro Herrera alvhe...@commandprompt.com writes: Hmm interesting. I don't think the placement suggested by Tom would be useful, because the Zabbix backends are particularly busy all the time, so they wouldn't run ProcessCatchupEvent at all. Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote: 2011/6/6 Darren Duncan dar...@darrenduncan.net: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? it should be supported by FOREACH statement in PL/pgSQL Oh, good idea. It would only work for discrete ranges though. However, I would need to somehow reintroduce the concept of next, which has some hazards to it (as Tom pointed out, we don't want someone to define the next for a float to be +1.0). I'll have to think about this. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com wrote: That doesn't solve the WAL problem Kevin found, of course ... I wouldn't worry about that too much -- the WAL issue is self-limiting and not likely to ever cause a failure. The biggest risk is that every now and then some new individual will notice it and waste a bit of time investigating. The LO issue, on the other hand, could easily eat enough disk to cause a failure. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? I believe that ranges aka intervals are widely useful generic types, next after relations/tuples/arrays, and they *should* be supported in core, same as arrays are. I think we all agree that ranges are important. I am not suggesting that we sacrifice on the semantics to make it an extension; I'm just trying to see if involving extensions for some of the approximately 5000 lines would be a good idea. Now assuming that a range/interval value is generally defined in terms of a pair of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or {,,=,=} etc or LIMIT makes sense), it will be essential that this value is capable of distinguishing open and closed intervals. Right, it already does that explicitly. I'd appreciate your input on some of the previous discussion though. Also, if Postgres has some concept of type-generic special values -Inf and +Inf (which always sort before or after any other value in the type system), those can be used as endpoints to indicate that the interval is unbounded. I already introduced +/- infinity to range types. They are not generic outside of ranges, however -- therefore you can't select the upper bound of an upper-infinite range. Unless you have some other syntax in mind, I suggest lifting the range literal syntax from Perl 6, where .. is an infix operator building a range between its arguments, and a ^ on either side means that side is open, I think; so there are 4 variants: {..,^..,..^,^..^}. Oh, interesting syntax. That might make a good operator version of a constructor. Unfortunately, . is not valid in an operator name in PG. Maybe I can use tilde or dash? 
Any operation that wants to deal with a range somehow, such as the BETWEEN syntax, could instead use a range/interval; for example, both of: foo in 1..10 I don't know if it's reasonable to introduce syntax like in here. Maybe we could just still use between and it would recognize that the RHS is a range? The LIMIT clause could take a range to specify take and skip count at once. Interesting idea. Array slicing can be done using foo[first..last] or such. I like that, but we already have foo[3:7], so it might be better not to introduce redundancy. Too bad I can't use : as an operator. A random number generator that takes endpoints can take a range argument. Sounds useful because it would make it more explicit whether the endpoints are possible results. An array or relation of these ranges can represent ranges with holes, and the general results of range union operations. Right, that's been brought up before as well. In particular, Scott Bailey has done some thinking/writing on this topic. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
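The open/closed-bound and infinite-bound semantics being discussed can be sketched with a tiny illustrative class. This is not the range-types implementation under discussion — the names and representation are invented — but it shows the four `{..,^..,..^,^..^}` variants and unbounded sides in miniature:

```python
class Range:
    """A range with independently inclusive/exclusive bounds;
    None for a bound means -infinity (lower) or +infinity (upper)."""
    def __init__(self, lower, upper, lower_inc=True, upper_inc=True):
        self.lower, self.upper = lower, upper
        self.lower_inc, self.upper_inc = lower_inc, upper_inc

    def __contains__(self, x):
        if self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_inc):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_inc):
                return False
        return True

closed = Range(1, 10)                      # like 1..10
half_open = Range(1, 10, upper_inc=False)  # like 1..^10
unbounded = Range(5, None)                 # like 5 .. +infinity
print(10 in closed, 10 in half_open, 10**9 in unbounded)  # True False True
```

Under this model, "foo in 1..10" is just a containment test against such a value, and an array of these objects is enough to represent a union with holes.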
Re: [HACKERS] Range Types and extensions
2011/6/6 Jeff Davis pg...@j-davis.com: On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote: 2011/6/6 Darren Duncan dar...@darrenduncan.net: Jeff Davis wrote: I'd like to take another look at Range Types and whether part of it should be an extension. Some of these issues relate to extensions in general, not just range types. First of all, what are the advantages to being in core? it should be supported by FOREACH statement in PL/pgSQL Oh, good idea. It would only work for discrete ranges though. However, I would need to somehow reintroduce the concept of next, which has some hazards to it (as Tom pointed out, we don't want someone to define the next for a float to be +1.0). I'll have to think about this. we can define a step FOREACH x IN RANGE . BY LOOP END LOOP Regards Pavel Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
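Pavel's proposed loop form might read like the following sketch. This is hypothetical syntax only — nothing here exists in PL/pgSQL — and the explicit BY step is what sidesteps the "next value" hazard Jeff mentions, since the step is supplied by the user rather than defined per-type:

```
-- hypothetical syntax sketch, not implemented
FOREACH d IN RANGE lower_bound .. upper_bound BY step LOOP
    -- body sees each value lower_bound, lower_bound + step, ...
    RAISE NOTICE 'value: %', d;
END LOOP;
```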
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... Hmm ... that would help for the specific scenario of dropped databases, but we've also heard complaints about narrower cases such as a single dropped table. A bigger issue is that I don't believe it's very practical to scan the FD array looking for files associated with a particular database (or table). They aren't labeled that way, and parsing the file path to find out the info seems pretty grotty. On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Alvaro Herrera alvhe...@commandprompt.com writes: Excerpts from Tom Lane's message of lun jun 06 12:10:24 -0400 2011: Yeah, I wasn't that thrilled with the suggestion either. But we can't just have backends constantly closing every open FD they hold, or performance will suffer. I don't see any very good place to do this... How about doing something on an sinval message for pg_database? That doesn't solve the WAL problem Kevin found, of course ... Hmm ... that would help for the specific scenario of dropped databases, but we've also heard complaints about narrower cases such as a single dropped table. A bigger issue is that I don't believe it's very practical to scan the FD array looking for files associated with a particular database (or table). They aren't labeled that way, and parsing the file path to find out the info seems pretty grotty. On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 14:42 +0200, Dimitri Fontaine wrote: I think the way things are going to be organised now is that we will have core-blessed extensions: don't mix the mechanism and the policy. I like that idea. That would be a non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in range interface, and the range extension could add && as the range interface's overlaps operator for the type ANYRANGE. That means that this is, IMHO, the right approach. Have core support that enables user defined RANGE types with indexing and planner support, etc, like we have OPERATOR CLASS and FAMILY and all the jazz. If we take the minimal approach, the index support would be the first to be moved to an extension. In order to have index support in core, we need quite a few functions and a significant amount of code. Minimal would be: * CREATE TYPE ... AS RANGE * ANYRANGE * The IO functions * Possibly the constructors and accessors ( range(), range_oc(), lower(), upper(), etc.) Regarding the type interfaces, the only thing that really worries me there is that my future work will depend on them existing, and I haven't really thought through the details. For instance, it just occurred to me recently that it would need to support polymorphic types, which might be a little bit more complicated than a simple lookup. I suppose it's easier to put a few functions in core later if we get stuck than to rip them out later. And the useful stuff you need to have to benefit from that core support would be an extension. It could be a core maintained extension, and it could even get installed by default, so that all the users would need to do is 'CREATE EXTENSION timeranges;', for example. Sounds good to me. However, would the extension be available in pg_regress? If not, I will need to include those constructors/accessors to be able to test anything. I think the consensus is to instead add a new chapter (maybe between current chapters 9.
Functions and Operators and 10. Type Conversion) and host “core extensions” docs there. The source code organisation is controversial because it is technically not necessary. We have to keep the work Greg did to keep those contribs shipped by default. Oh, and that is on the 9.1 Open Items, right? OK, so there are still a few things to be decided around documentation and tests. Both of those things can take a significant amount of time to rework, so I think I'll leave it alone until we have more of a consensus. We still have time before 9.2 to break some of the code out into an extension when we do have the doc/test issues resolved. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
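The minimal in-core pieces sketched above (CREATE TYPE ... AS RANGE, ANYRANGE, the IO functions, and possibly the constructors/accessors) might be exercised like this. This is a sketch only: the syntax and the names range(), range_oc(), lower(), and upper() are taken from the proposal in this thread, not from a shipped release.

```sql
-- Hypothetical minimal range support, using the names proposed above.
CREATE TYPE intrange AS RANGE (subtype = int4);

-- Proposed constructors: closed-open and open-closed variants.
SELECT range(1, 10);      -- would construct [1, 10)
SELECT range_oc(1, 10);   -- would construct (1, 10]

-- Proposed accessors for the range bounds.
SELECT lower(range(1, 10)), upper(range(1, 10));
```

Everything beyond this -- operators, opclasses, index support -- would then live in the extension.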
Re: [HACKERS] Range Types and extensions
On Sun, Jun 5, 2011 at 6:59 PM, Jeff Davis pg...@j-davis.com wrote: There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors. That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. Thoughts? I can see merit to having parts of RANGE implemented in core, along with some of the usage parts implemented as extensions, so that if I'm not actually using (say) INET ranges, then the database isn't cluttered up with all the functions and operators for INET ranges. How to slice it apart into an appropriate admixture of core and extensions is a good question, though it seems pretty likely that having an extension for each data type that is to be mixed into a range is a reasonable way to go. I think this also can make some would-be arguments against RANGE go away... The argument “I hate that this RANGE extension means we have to draw 5000 lines of code into every database, and draws in 275 operator functions” evaporates if the base part is entirely smaller, and if you only draw in all the functions and operators when you request loading of each of the 17 extensions. Per-type extensions offer a pretty natural partitioning of the code for each type, which seems pretty good. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote: we can define a step FOREACH x IN RANGE . BY That wouldn't need any of the range infrastructure at all -- it would be purely syntactic, right? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WALInsertLock tuning
Simon Riggs si...@2ndquadrant.com writes: In earlier discussions of how to improve WALInsertLock contention, it was observed that we must zero each new page before we advance the WAL insertion point. http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html IMHO the page zeroing is completely unnecessary, I don't believe it's completely unnecessary. It does in fact offer additional protection against mistakenly taking stale data as valid. You could maybe argue that the degree of safety increase isn't sufficient to warrant the cost of zeroing the page, but you've not offered any quantification of either the risk or the cost savings. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum cleanup locks
Excerpts from Robert Haas's message of lun jun 06 08:06:10 -0400 2011: But the problem of vacuum stalling out because it can't get the cleanup lock is a very real one. I've seen at least one customer hit this in production, and it was pretty painful. Now, granted, you need some bad application design, too: you have to leave a cursor lying around instead of running it to completion and then stopping. But supposing you do make that mistake, you might hope that it wouldn't cause VACUUM starvation, which is what happens today. IOW, I'm less worried about whether the cleanup lock is slowing vacuum down than I am about eliminating the pathological cases where an autovacuum worker gets pinned down, stuck waiting for a cleanup lock that never arrives. Now the table doesn't get vacuumed (bad) and the system as a whole is one AV worker short of what it's supposed to have (also bad). One of the good things about your proposal is that (AFAICS) you can freeze tuples without the cleanup lock, so the antiwraparound cleanup would still work. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? Hmm, there's already a mechanism for closing temp FDs at the end of a query ... maybe blind writes could use temp-like FDs? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, 2011-06-06 at 16:45 +, Christopher Browne wrote: How to slice it apart into an appropriate admixture of core and extensions is a good question, though it seems pretty likely that having an extension for each data type that is to be mixed into a range is a reasonable way to go. ... Per-type extensions offers a pretty natural partitioning of the code for each type, which seems pretty good. Ideally, most range types can be created with a simple: CREATE TYPE foorange AS RANGE (subtype=foo); There might be a few subtype-specific functions, like the canonical function, but overall it should be a small amount of code per range. However, I'd say just bundle a bunch of rangetypes together in one extension. There's not really much cost -- if you are using one range type, you'll use a few more. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Sun, Jun 5, 2011 at 2:59 PM, Jeff Davis pg...@j-davis.com wrote: So, where on this spectrum should range types fall? I think the most minimalist would be to only support #1 (and the necessary type IO functions); and leave all other functions, operators, and opclasses to an extension. That has a lot of appeal, but I don't think we can ignore the challenges above. On the other hand, trying to make it a complete feature in core has challenges as well. For instance, even with Range Types, Exclusion Constraints aren't practical out-of-the-box unless we also have BTree-GiST in core. So there's a snowball effect. There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors. That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. I don't have a clear feeling on this question in general, but if we're going to break this up into pieces, it's important that they be logical pieces. Putting half the feature in core and half into an extension just because we can will complicate code maintenance to no good end. The snowball effect is something to avoid, and we need to watch out for that, but if the upshot of putting part of it in core is that the core code can no longer be understood or maintained because it depends heavily on a bunch of non-core code, that's not helpful. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] heap vacuum cleanup locks
On Mon, Jun 6, 2011 at 12:49 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of lun jun 06 08:06:10 -0400 2011: But the problem of vacuum stalling out because it can't get the cleanup lock is a very real one. I've seen at least one customer hit this in production, and it was pretty painful. Now, granted, you need some bad application design, too: you have to leave a cursor lying around instead of running it to completion and then stopping. But supposing you do make that mistake, you might hope that it wouldn't cause VACUUM starvation, which is what happens today. IOW, I'm less worried about whether the cleanup lock is slowing vacuum down than I am about eliminating the pathological cases where an autovacuum workers gets pinned down, stuck waiting for a cleanup lock that never arrives. Now the table doesn't get vacuumed (bad) and the system as a whole is one AV worker short of what it's supposed to have (also bad). One of the good things about your proposal is that (AFAICS) you can freeze tuples without the cleanup lock, so the antiwraparound cleanup would still work. Yeah, I think that's a major selling point. VACUUM getting stuck is Bad. Anti-wraparound VACUUM getting stuck is Really Bad. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Allow \dd to show constraint comments
Excerpts from Josh Kupershmidt's message of dom jun 05 16:36:57 -0400 2011: On Tue, May 24, 2011 at 10:31 PM, Josh Kupershmidt schmi...@gmail.com wrote: Attached is a rebased patch. From a quick look, it seems that most of the object types missing from \dd are already covered by pg_comments (cast, constraint, conversion, domain, language, operator class, operator family). A few objects would probably still need to be added (foreign data wrapper, server). Here's another update to this patch. Includes: * rudimentary doc page for pg_comments * 'foreign data wrapper' and 'server' comment types now included in pg_comments; regression test updated Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Maybe you should have a column for the class of object the comment applies to, but as a name and not a regclass. And then a column for names that each comment applies to. (We're still struggling to get a useful pg_locks display). I mean, if OIDs are good for you and you're OK with doing a few joins, why not go to the underlying catalogs in the first place? (IMHO anyway -- what do others think?) -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
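For context, the joins Alvaro alludes to are roughly these. The pg_description columns (objoid, classoid, objsubid, description) are the real catalog layout; the query itself is only a sketch of what a reader of raw OIDs has to write today, covering just tables and columns.

```sql
-- Resolving comment OIDs to readable names for relations and columns.
SELECT c.relname,
       a.attname,        -- NULL when the comment is on the relation itself
       d.description
FROM pg_description d
JOIN pg_class c
  ON c.oid = d.objoid
 AND d.classoid = 'pg_class'::regclass
LEFT JOIN pg_attribute a
  ON a.attrelid = d.objoid
 AND a.attnum = d.objsubid
 AND d.objsubid <> 0;
```

Every other object class (functions, operators, casts, ...) needs its own variant of this join, which is the usability gap a name-based pg_comments view would close.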
Re: [HACKERS] Postmaster holding unlinked files for pg_largeobject table
Excerpts from Tom Lane's message of lun jun 06 12:49:46 -0400 2011: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 12:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: On reflection I think this behavior is probably limited to the case where we've done what we used to call a blind write of a block that is unrelated to our database or tables. For normal SQL-driven accesses, there's a relcache entry, and flushing of that entry will lead to closure of associated files. I wonder whether we should go back to forcibly closing the FD after a blind write. This would suck if a backend had to do many dirty-buffer flushes for the same relation, but hopefully the bgwriter is doing most of those. We'd want to make sure such forced closure *doesn't* occur in the bgwriter. (If memory serves, it has a checkpoint-driven closure mechanism instead.) Instead of closing them immediately, how about flagging the FD and closing all the flagged FDs at the end of each query, or something like that? Hmm, there's already a mechanism for closing temp FDs at the end of a query ... maybe blind writes could use temp-like FDs? OK, I'll have a look at how blind writes work this afternoon and propose something. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 5:14 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Perhaps the best way to describe the suggestion that this be included in 9.1 isn't that it's an insane suggestion; but that it's a suggestion which, if adopted, would be likely to drive those who are striving for a more organized development and release process insane. Kevin, I respect your opinion and thank you for stating your case without insults. In this discussion it should be recognised that I have personally driven the development of a more organized dev and release process. I requested and argued for stated release dates to assist resource planning and suggested commitfests as a mechanism to reduce the feedback times for developers. I also provided the first guide to patch reviews we published. So I am a proponent of planning and organization, though some would like to claim I see things differently. The major problems of the dev process are now solved, yet more organization is still being discussed, as if more == better. What I hear is changed organization and I am not certain that all change == better in what I see as a leading example of how to produce great software. Releasing regularly is important, but not more important than anything. Ever. Period. Trying to force that will definitely make you mad, I can see. I request that people stop trying to enforce a process so strictly that sensible and important change cannot take place when needed. Or one could look at it in a cost/benefit format -- major features delivered per year go up by holding the line, administrative costs are reduced, and people who are focusing on release stability get more months per year to do development. I do look at it in a cost/benefit format. The problem is the above statement has nothing user-centric about it. The cost to us is a few days' work and the benefit is a whole year's worth of increased performance for our user base, which has a hardware equivalent well into the millions of dollars.
And that's ignoring the users that would've switched to using Postgres earlier, and those who might leave because of competitive comparison. I won't say any more about this because I am in no way a beneficiary of this, and even my opinion is given unpaid. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
2011/6/6 Jeff Davis pg...@j-davis.com: On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote: we can define a step FOREACH x IN RANGE . BY That wouldn't need any of the range infrastructure at all -- it would be purely syntactic, right? I don't think so. For a lot of types the specification of a step is necessary -- for date, for float. Pavel Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
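The shape of the loop under discussion might look like this. The first loop uses the integer FOR ... BY step that PL/pgSQL already supports; the FOREACH ... IN RANGE ... BY form is the proposal from this thread and is shown commented out, since it is hypothetical syntax that never shipped in this form.

```sql
DO $$
BEGIN
  -- Existing PL/pgSQL: integer FOR loops already take a BY step.
  FOR i IN 1..10 BY 2 LOOP
    RAISE NOTICE '%', i;
  END LOOP;

  -- Proposed (hypothetical): iterating a range value would need an
  -- explicit step for continuous subtypes such as date or float,
  -- which is Pavel's point above.
  -- FOREACH d IN RANGE daterange('2011-01-01', '2011-02-01') BY '1 week' LOOP
  --   ...
  -- END LOOP;
END $$;
```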
Re: [HACKERS] patch: Allow \dd to show constraint comments
On Mon, Jun 6, 2011 at 1:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Josh Kupershmidt's message of dom jun 05 16:36:57 -0400 2011: On Tue, May 24, 2011 at 10:31 PM, Josh Kupershmidt schmi...@gmail.com wrote: Attached is a rebased patch. From a quick look, it seems that most of the object types missing from \dd are already covered by pg_comments (cast, constraint, conversion, domain, language, operator class, operator family). A few objects would probably still need to be added (foreign data wrapper, server). Here's another update to this patch. Includes: * rudimentary doc page for pg_comments * 'foreign data wrapper' and 'server' comment types now included in pg_comments; regression test updated Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Well, that's basically what this is doing. See the objname/objtype columns. It's intended that the output of this view should match the format that COMMENT takes as input. But propagating the OIDs through is sensible as well, because sometimes people may want to do other joins, filtering, etc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BLOB support
I just started with this because image-as-blob works. And I think this topic will come back, and many other problems will arise as well -- flattening tuples, etc. I will send sketches of streaming done this way, I hope, as I want to go back to the clusterization work. Sorry for the top reply -- Windows Phone 7. Regards, Radek -Original Message- From: Tom Lane Sent: 6 czerwca 2011 17:41 To: Radoslaw Smogura Cc: Alvaro Herrera; Pavel Stehule; Dimitri Fontaine; Robert Haas; Peter Eisentraut; PG Hackers Subject: Re: [HACKERS] BLOB support Radosław Smogura rsmog...@softperience.eu writes: Introducing streaming for TOAST is a little useless; sorry, just to cite from my aforementioned document: (This is generally about on-demand streaming of a TOASTed value; in the context of LOBs it is acceptable, as long as non-transaction-aware LOBs are acceptable). If we add streaming of TOASTed values, so that the caller gets some reference to this value, we need to ensure that the pointed-to data will not be changed or deleted -- I think this will require the caller to add FOR UPDATE (or the server to add it silently) for each statement returning pointers to TOASTed values, as the client may do a transactional query. It's already been explained to you that that's not the case. If this is acceptable I will do the following changes. Add - server_max_in_memory_lob_size - a start-only server GUC describing the maximum value of the client session parameter max_in_memory_lob. - max_in_memory_lob - a session GUC describing how large LOBs may be kept in memory before being backed by a file - recursively toasting and detoasting during insert/update/remove to search for LOBs (we need this for arrays and complex types) - this is for the last stage (an error disallowing LOBs in composites/arrays may be quite enough for a beginning) - I want LOBs to be the starting point for LOBing other types (e.g. some big arrays may be LOBbed). - during toasting, the LOB will be toasted and, in place of the LOB, a reference to it will be put and encoded in the LOB datum.
- an API for LOB manipulation (a few changes to the current implementation), such that BLOB is a field type and a union of in_memory, in_file, in_toast. These few bytes per LOB will not affect the size of the datum, considering the size of a LOB. - an API for maintaining temporary LOB files; we need this as a per-session list of id - file descriptor pairs, to prevent stealing of LOBs by different connections (security) - a streaming API for TOASTed values (based on the COPY protocol, or a changed COPY protocol), or at least function calls - I haven't looked at this in the context of TOASTed LOBs. Is it good? This all looks like you decided on a solution first and then started to look for a problem to apply it to. I don't want to see us inventing a pile of GUCs for this, and I don't think there is a need to make any fundamental changes in the TOAST mechanism either. What we do need is a streaming access protocol to read or write wide field values *without* forcing people to change the basic structure of their tables. You keep on wanting to invent weird, IMO unnecessary language features instead. Try to make the feature as transparent as possible, not as visible as possible. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] permissions of external PID file
On Fri, Jun 03, 2011 at 09:51:45PM +0300, Peter Eisentraut wrote: The external PID file, if configured, is currently generated with 600 permissions, which results from the umask setting in the postmaster. I think it would be nicer if we could make that 644. I have often dealt with scripts and such that look through PID files in /var/run, and it's always annoying when some PID file is not readable for some reason or no reason at all. (One simple benefit is that you don't need extra privileges to check whether the process is alive.) Almost all PID files on my system are world-readable now. +1 for making this world-readable. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Mon, Jun 06, 2011 at 12:53:49PM -0400, Robert Haas wrote: I don't have clear feeling on this question in general, but if we're going to break this up into pieces, it's important that they be logical pieces. Putting half the feature in core and half into an extension just because we can will simplify complicate code maintenance to no good end. The snowball effect is something to avoid, and we need to watch out for that, but if the upshot of putting part of it in core is that the core code can no longer be understood or maintained because it depends heavily on a bunch of non-core code, that's not helpful. And concretely, code paths that cannot be exercised easily from core-only code will not get regression tested, and will therefore rot. Ross -- Ross Reedstrom, Ph.D. reeds...@rice.edu Systems Engineer Admin, Research Scientistphone: 713-348-6166 Connexions http://cnx.orgfax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] heap_hot_search_buffer refactoring
The attached patch refactors heap_hot_search_buffer() so that index_getnext() can use it, and modifies index_getnext() to do so. The idea is based on one of Heikki's index-only scan patches, from 2009: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00676.php I believe, however, that this portion of that patch stands alone, completely independently of index-only scans. At present, we have two copies of the logic to traverse HOT chains floating around, which means they can get out of sync, possibly resulting in bugs. This has already happened once: http://archives.postgresql.org/pgsql-hackers/2011-05/msg00733.php As a nice bonus, the new logic is both simpler and, I believe, more correct than the current logic. In IndexScanDescData, xs_hot_dead, xs_next_hot, and xs_prev_xmax get replaced by a single boolean xs_continue_hot. There is a small behavioral difference: in the current code, when we use a non-MVCC snapshot with index_getnext() and walk a HOT chain, each call remembers the *next* TID that should be examined. With this change, we instead remember the TID that we most recently returned, and compute the next TID when index_getnext() is called again. It seems to me that this is really what we should have been doing all along. Imagine, for example, that we have a HOT chain A -> B, where B has not yet committed. We return A and remember that we next intend to look at B. Before index_getnext() is called, B aborts and a new HOT update produces a HOT chain A -> C. The next call to index_getnext() will nonetheless look at B and conclude that it's reached the end of the HOT chain. This doesn't actually matter a bit for current uses of index_getnext(), because - at least according to Heikki's old notes - the only place we use a non-MVCC snapshot with this function is during CLUSTER. And at that point, we have the whole table locked down, so nothing is happening concurrently.
I'm not sure there's any possibility of us ever using this function in a way that would break under the current implementation, but what I'm proposing here does seem more robust to me. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company heap-hot-search-refactoring.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Allow \dd to show constraint comments
Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 1:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Well, that's basically what this is doing. See the objname/objtype columns. It's intended that the output of this view should match the format that COMMENT takes as input. But propagating the OIDs through is sensible as well, because sometimes people may want to do other joins, filtering, etc. Is it also propagating the catalog OID through? Because joining on OID alone is not to be trusted. I tend to agree with Alvaro's viewpoint here: anybody who wants to deal directly in OIDs is better off joining directly to pg_description, and not going through the rather large overhead that this view is going to impose. So we should just make this a purely user-friendly view and be done with it, not try to create an amalgam that serves neither purpose well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch: Allow \dd to show constraint comments
On Mon, Jun 6, 2011 at 2:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Jun 6, 2011 at 1:03 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Hmm, if we're going to have pg_comments as a syntactic sugar kind of thing, it should output things in format immediately useful to the user, i.e. relation/column/etc names and not OIDs. The OIDs would force you to do lots of joins just to make it readable. Well, that's basically what this is doing. See the objname/objtype columns. It's intended that the output of this view should match the format that COMMENT takes as input. But propagating the OIDs through is sensible as well, because sometimes people may want to do other joins, filtering, etc. Is it also propagating the catalog OID through? Because joining on OID alone is not to be trusted. Yep. I tend to agree with Alvaro's viewpoint here: anybody who wants to deal directly in OIDs is better off joining directly to pg_description, and not going through the rather large overhead that this view is going to impose. So we should just make this a purely user-friendly view and be done with it, not try to create an amalgam that serves neither purpose well. Possibly. On the other hand we have things like pg_tables floating around that are basically useless because you can't get the OID easily. The information_schema suffers from this problem as well. I get what you're saying, but I think we should think two or three times very carefully before throwing away the OID in a situation where it can easily be provided. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
* Dimitri Fontaine (dimi...@2ndquadrant.fr) wrote: That means that this is, IMHO, the right approach. Have core support that enables user defined RANGE types with indexing and planner support, etc, like we have OPERATOR CLASS and FAMILY and all the jazz. Yes, we do, however.. And the useful stuff you need to have to benefit from that core support would be an extension. It could be a core maintained extension, and it could even get installed by default, so that all the users would need to do is 'CREATE EXTENSION timeranges;', for example. I don't like the idea of having a capability which is not utilized in core. We should make it so extensions can *also* have access to define their own, but we should have the basics covered in core. a. core extensions, shipped by default Having it as a core extension might work, but I'm not really 'sold' on it. Thanks, Stephen
Re: [HACKERS] WALInsertLock tuning
On Mon, Jun 6, 2011 at 5:47 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: In earlier discussions of how to improve WALInsertLock contention, it was observed that we must zero each new page before we advance the WAL insertion point. http://postgresql.1045698.n5.nabble.com/Reworking-WAL-locking-td1983647.html IMHO the page zeroing is completely unnecessary, I don't believe it's completely unnecessary. It does in fact offer additional protection against mistakenly taking stale data as valid. You could maybe argue that the degree of safety increase isn't sufficient to warrant the cost of zeroing the page, but you've not offered any quantification of either the risk or the cost savings. If we did ever reference stale data, it would need to have a value of xl_prev that exactly matched what we expected AND a CRC that exactly matched as well. That would be fairly difficult to arrange deliberately, and there is pretty close to zero chance of it happening otherwise. But even that assumes we write the unzeroed data at the end of the buffer. We don't. We only write data up to the end of the WAL record on the current page, unless we write a continuation record, which means that only at replay would we read in another page and check its page headers. So for this to cause an error, we'd have to have an overall matching CRC, a matching xl_prev, and at least one matching page header, which contains a page address. And even that assumes we would read the data in a different way than it was written. So the only way we could ever reference the stale data is if the WAL reading code didn't match the WAL writing code (1) because of a bug or (2) because of a corrupt pg_control record that caused random access to an otherwise unreachable part of WAL, AND the CRC matched, and the xl_prev matched, and the next page header matched. Risk == zero. If it weren't zero I wouldn't even mention it, because this is not a trade-off optimization.
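The layered checks being argued over can be sketched as follows (a toy Python model, not the real xlog.c replay code; zlib.crc32 merely stands in for the WAL CRC, and the field names are simplifications):

```python
import zlib

def would_accept(record, expected_prev_lsn):
    """Toy version of the independent checks replay applies to a WAL
    record: the back-pointer (xl_prev) must match the record we just
    replayed, and the CRC must match the payload. The real code does
    more (page-header checks when crossing pages, etc.); the point is
    that stale bytes are only mistaken for a valid record if *every*
    check passes."""
    if record['xl_prev'] != expected_prev_lsn:
        return False
    if zlib.crc32(record['payload']) != record['crc']:
        return False
    return True

# A genuine record, written after the record at LSN 0x1234.
good = {'xl_prev': 0x1234, 'payload': b'real data',
        'crc': zlib.crc32(b'real data')}

# Stale bytes from a recycled WAL segment: internally self-consistent
# (its CRC matches its own payload), but its back-pointer refers to
# wherever it was originally written, so the first check rejects it.
stale = {'xl_prev': 0x9999, 'payload': b'old data',
         'crc': zlib.crc32(b'old data')}
```

Simon's argument is essentially that for unzeroed bytes both checks (plus the page-header check) would have to pass simultaneously by accident.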
Cost savings are those already identified by yourself in our previous discussion on this, link given upthread. It's the biggest single removable item from within WALInsertLock. We can measure them if you like, but I don't see the value in that. It will clearly be a useful saving on what we all agree is a heavily contended lock. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
On Sun, Jun 5, 2011 at 1:59 PM, Jeff Davis pg...@j-davis.com wrote: In the several talks that I've given, a common question is related to multiranges (ranges with holes). These get a little complex, and I don't have a complete answer. However, multiranges can be approximated with ordered arrays of non-overlapping, non-adjacent ranges. If someone wants to take it upon themselves to develop a set of operators here, that would be great -- but without ANYRANGE the operators would be unmanageable. 2. Documentation and Tests -- Let's say we take a minimalist view, and only have ANYRANGE and CREATE TYPE ... AS RANGE in core; and leave the rest as an extension. What exactly would the documentation say? I think it would be even more hypothetical and abstract than the documentation for Exclusion Constraints. So, there is a certain documentation advantage to having at least enough functionality to allow someone to try out the feature. And the tests for such a minimalist feature would be a significant challenge -- what do we do there? Get pg_regress to load the extension from PGXN? 3. Quality -- PostgreSQL has a great reputation for quality, and for good reason. But extensions don't follow the same quality-control standards; and even if some do, there is no visible stamp of approval. So, to ask someone to use an extension means that they have to evaluate the quality for themselves, which is a pretty high barrier. Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve many of the other issues, quality control is one of the biggest ones remaining. I still get questions about when the temporal type will be in core, and I think this is why. I don't think this is a good excuse to put it in core though. We need to solve this problem, and the best way to start is by getting well-reviewed, high-quality extensions out there. 4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc. 
- There are a few aspects of range types that aren't in the first patch, but are fairly obvious follow-up additions. These will require some knowledge about ranges in the backend, like finding the overlaps operator for a range. The current patch provides this knowledge by providing a built-in overlaps operator for ANYRANGE. This would be a non-issue if we had a good type interface system (that works on polymorphic types) -- we could just have a built-in range interface, and the range extension could register its own operator as the range interface's overlaps operator for the type ANYRANGE. = So, where on this spectrum should range types fall? I think the most minimalist would be to only support #1 (and the necessary type IO functions); and leave all other functions, operators, and opclasses to an extension. That has a lot of appeal, but I don't think we can ignore the challenges above. On the other hand, trying to make it a complete feature in core has challenges as well. For instance, even with Range Types, Exclusion Constraints aren't practical out-of-the-box unless we also have BTree-GiST in core. So there's a snowball effect. There might also be some middle ground, where it's like the minimalist approach, but with a few very basic constructors and accessors. That would at least make it easier to test, but then to be actually useful (with index support, operators, fancy functions, etc.) you'd need the extension. Thoughts? ISTM (I haven't followed all the lead-up so apologies if this is already covered) a range is a 3rd pseudo 'container' type (the other two being composites and arrays). Do you see: *) being able to make arrays of ranges/ranges of arrays? *) range of composites? I vote for at minimum the type itself and ANYRANGE to be in core. From there you could make it like arrays where the range type is automatically generated for each POD type. I would consider that for sure on the basis of simplicity in user-land unless all the extra types and operators are a performance hit.
A clean and highly usable implementation in the type system in the spirit of arrays would be fantastic. I'm particularly interested in hypothetical constructor/destructor and in/out mechanics...an 'unnest' like function, a range(a,b,c) that does as row(a,b,c) does, etc, especially if you can work it out so that everything is not hammered through textual processing. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
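Jeff's "multiranges as ordered arrays of non-overlapping, non-adjacent ranges" approximation can be sketched like this (Python, hypothetical helper names, half-open (lo, hi) ranges assumed):

```python
import bisect

def ranges_overlap(a, b):
    """Half-open (lo, hi) ranges overlap iff each starts before the
    other ends."""
    return a[0] < b[1] and b[0] < a[1]

def multirange_contains(multi, x):
    """multi is a sorted list of non-overlapping, non-adjacent
    half-open (lo, hi) ranges -- the 'multirange as ordered array'
    approximation. Binary search locates the only candidate range."""
    i = bisect.bisect_right(multi, (x, float('inf'))) - 1
    return i >= 0 and multi[i][0] <= x < multi[i][1]

# A "range with holes": covers [1,3), [5,8), [10,12).
holes = [(1, 3), (5, 8), (10, 12)]
```

The invariants (sorted, non-overlapping, non-adjacent) keep the representation canonical, which is what makes a well-defined operator set possible; without ANYRANGE, as noted above, such operators would have to be redefined per range type.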
Re: [HACKERS] Error in PQsetvalue
On Mon, Jun 6, 2011 at 7:09 AM, Pavel Golub pa...@microolap.com wrote: Sorry for delay in answer. Yeah, I'm glad to. Should I apply this patch by myself? sure, run it against your test case and make sure it works. if so, we can pass it up the chain of command (hopefully as context diff). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
That's an improvement of about ~3.5x. According to the vmstat output, when running without the patch, the CPU state was about 40% idle. With the patch, it dropped down to around 6%. Wow! That's fantastic. Jignesh, are you in a position to test any of Robert's work using DBT or other benchmarks? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
Robert Haas robertmh...@gmail.com writes: IMHO, it's better to just have a deadline, Well, that's the fine point we're now talking about. I still think that we should try to make the best release possible. And if that means including changes at beta time because that's when someone got around to doing them, so be it — well, they should really be worth it. So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “should this very change be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] SAVEPOINTs and COMMIT performance
On Mon, Jun 6, 2011 at 10:33 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 06.02.2011 23:09, Simon Riggs wrote: On Sun, 2011-02-06 at 12:11 -0500, Bruce Momjian wrote: Did this ever get addressed? Patch attached. Seems like the easiest fix I can come up with. @@ -2518,7 +2518,7 @@ CommitTransactionCommand(void) case TBLOCK_SUBEND: do { - CommitSubTransaction(); + CommitSubTransaction(true); s = CurrentTransactionState; /* changed by pop */ } while (s-blockState == TBLOCK_SUBEND); /* If we had a COMMIT command, finish off the main xact too */ We also get into this codepath at RELEASE SAVEPOINT, in which case it is wrong to not reassign the locks to the parent subtransaction. Agreed. Thanks for the review. I'll change that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Range Types and extensions
Stephen Frost sfr...@snowman.net writes: I don't like the idea of having a capability which is not utilized in core. We should make it so extensions can *also* have access to define their own, but we should have the basics covered in core. Well if another part of core depends on the feature set, then of course you don't have a choice to make it an extension any more. I think that's where I would draw the line. Having it as a core extension might work, but I'm not really 'sold' on it. Well, core extension means built by default, part of default regression tests and all. The regression test simply begins with the create extension stanza, that's about it. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06/06/2011 09:24 PM, Dave Page wrote: On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. I disagree - the proposed patch may provide a very significant improvement for a certain workload type (nothing less, but nothing more), but it was posted way after -BETA and I'm not sure we yet understand all implications of the changes. We also have to consider that the underlying issues are known problems for multiple years/releases, so I don't think there is a particular rush to force them into a particular release (as in 9.1). Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
* Dave Page (dp...@pgadmin.org) wrote: Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. So, with folks putting up that we should hammer this patch out and force it into 9.1.. What should our new release date for 9.1 be? What about other patches that didn't make it into 9.1? What about the upcoming CommitFest that we've asked people to start working on? If we're going to start putting in changes like this, I'd suggest that we try and target something like September for 9.1 to actually be released. Playing with the lock management isn't something we want to be doing lightly and I think we definitely need to have serious testing of this, similar to what has been done for the SSI changes, before we're going to be able to release it. I don't agree that we should delay 9.1, but if people really want this in, then we need to figure out what the new schedule is going to be. Thanks, Stephen
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:40 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: On 06/06/2011 09:24 PM, Dave Page wrote: On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. I disagree - the proposed patch maybe provides a very significant improvment for a certain workload type(nothing less but nothing more), but it was posted way after -BETA and I'm not sure we yet understand all implications of the changes. We certainly need to be happy with the implications if we were to make such a decision. We also have to consider that the underlying issues are known problems for multiple years^releases so I don't think there is a particular rush to force them into a particular release (as in 9.1). No, there's no *technical* reason we need to do this, as there would be if it were a bug fix for example. I would just like to see us narrow the gap with our competitors sooner rather than later, *if* we're a) happy with the change, and b) we're talking about a minimal delay (which we may be - Robert says he thinks the patch is good, so with another review and beta testing). -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 6/6/11 12:12 PM, Dimitri Fontaine wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. I could not disagree more strongly. We're in *beta* now. It's not like the last CF closed a couple weeks ago. Heck, I'm about to open the first CF for 9.2 in just over a week. Also, a patch like this needs several months of development, discussion and testing in order to fix the issues Robert already identified and make sure it doesn't break something fundamental to concurrency. Which would mean the release would be delayed until at least November, screwing over all the users who don't care about this patch. There will *always* be another really cool patch. If we keep delaying release to get in one more patch, then we never release. At some point you just have to take what you have and call it a release, and we are months past that point. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On 06/06/2011 03:24 PM, Dave Page wrote: On Mon, Jun 6, 2011 at 8:12 PM, Dimitri Fontainedimi...@2ndquadrant.fr wrote: So, to the question “do we want hard deadlines?” I think the answer is “no”, to “do we need hard deadlines?”, my answer is still “no”, and to the question “does this very change should be considered this late?” my answer is yes. Because it really changes the game for PostgreSQL users. Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. I'm not a fan of hard and fast deadlines for releases - it puts too much pressure on us to release before we might be ready. But I'm also not a fan of totally abandoning our established processes, which accepting this would. I don't mind bending the rules a bit occasionally; I do mind throwing them out the door. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 8:44 PM, Stephen Frost sfr...@snowman.net wrote: * Dave Page (dp...@pgadmin.org) wrote: Much as I hate to say it (I too want to keep our schedule as predictable and organised as possible), I have to agree. Assuming the patch is good, I think this is something we should push into 9.1. It really could be a game changer. So, with folks putting up that we should hammer this patch out and force it into 9.1.. What should our new release date for 9.1 be? What about other patches that didn't make it into 9.1? What about the upcoming CommitFest that we've asked people to start working on? If we're going to start putting in changes like this, I'd suggest that we try and target something like September for 9.1 to actually be released. Playing with the lock management isn't something we want to be doing lightly and I think we definitely need to have serious testing of this, similar to what has been done for the SSI changes, before we're going to be able to release it. Completely aside from the issue at hand, aren't we looking at a September release by now anyway (assuming we have to avoid late July/August as we usually do)? -- Dave Page Blog: http://pgsnake.blogspot.com Twitter: @pgsnake EnterpriseDB UK: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 2:49 PM, Josh Berkus j...@agliodbs.com wrote: That's an improvement of about ~3.5x. According to the vmstat output, when running without the patch, the CPU state was about 40% idle. With the patch, it dropped down to around 6%. Wow! That's fantastic. Jignesh, are you in a position to test any of Robert's work using DBT or other benchmarks? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com I missed the discussion. Can you send me the patch (will that work with 9.1 beta?)? I can do a before and after with DBT2 and let you know. And also test it with sysbench read test which also has a relation locking bottleneck. Thanks. Regards, Jignesh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote: The cost to us is a few days work and the benefit is a whole year's worth of increased performance for our user base, which has a hardware equivalent well into the millions of dollars. I doubt that this is an accurate reflection of the cost. What was presented by Robert Haas was a proof of concept, and he pointed out that it had numerous problems. To requote: There are numerous problems with the code as it stands at this point. It crashes if you try to use 2PC, which means the regression tests fail; it probably does horrible things if you run out of shared memory; pg_locks knows nothing about the new mechanism (arguably, we could leave it that way: only locks that can't possibly be conflicting with anything can be taken using this mechanism, but it would be nice to fix, I think); and there are likely some other gotchas as well. Turning this into something ready for production deployment in 9.1 would require a non-trivial amount of additional effort, and would likely have the adverse effect of deferring the release of 9.1, as well as of further deferring all the effects of the patches submitted for the latest commitfest (https://commitfest.postgresql.org/action/commitfest_view?id=10), since this defers release of 9.2, as well. While the patch is a fine one, in that it has interesting effects, it seems like a way wiser idea to me to let it go through the 9.2 process, so that it has 6 months worth of buildfarm runs before it gets deployed for real just like all the other items in the 2011-06 CommitFest. Note that it may lead to further discoveries, so that perhaps, in the 9.2 series, we'd see further improvements due to things that are discovered as further consequence of testing https://commitfest.postgresql.org/action/patch_view?id=572. -- When confronted by a difficult problem, solve it by reducing it to the question, How would the Lone Ranger handle this? 
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
On Mon, Jun 6, 2011 at 3:59 PM, Christopher Browne cbbro...@gmail.com wrote: On Mon, Jun 6, 2011 at 5:13 PM, Simon Riggs si...@2ndquadrant.com wrote: The cost to us is a few days work and the benefit is a whole year's worth of increased performance for our user base, which has a hardware equivalent well into the millions of dollars. I doubt that this is an accurate reflection of the cost. What was presented by Robert Haas was a proof of concept, and he pointed out that it had numerous problems. To requote: There are numerous problems with the code as it stands at this point. It crashes if you try to use 2PC, which means the regression tests fail; it probably does horrible things if you run out of shared memory; pg_locks knows nothing about the new mechanism (arguably, we could leave it that way: only locks that can't possibly be conflicting with anything can be taken using this mechanism, but it would be nice to fix, I think); and there are likely some other gotchas as well. The latest version of the patch is in much better shape: http://archives.postgresql.org/pgsql-hackers/2011-06/msg00403.php But this is not intended as disparagement for the balance of your argument. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] contrib/citext versus collations
I've been looking into bug #6053, in which Regina Obe complains that hash-based DISTINCT queries fail for type citext. The cause is not far to seek: the header comment for execGrouping.c states * Note: we currently assume that equality and hashing functions are not * collation-sensitive, so the code in this file has no support for passing * collation settings through from callers. That may have to change someday. and indeed the failure comes directly from the fact that citext's hash function *does* expect a collation to be passed to it. I'm a bit embarrassed to not have noticed that citext was a counterexample for this assumption, especially since I already fixed one bug that should have clued me in (commit a0b75a41a907e1582acdb8aa6ebb9cacca39d7d8). Now, removing this assumption from execGrouping.c is already a pretty sizable task --- for starters, at least plan node types Agg, Group, SetOp, Unique, and WindowAgg would need collation attributes that they don't have today. But the assumption that equality operators are not collation-sensitive is baked into a number of other places too; for instance nodeAgg.c @ line 600 indxpath.c @ line 2200 prepunion.c @ line 640 ri_triggers.c @ line 3000 and that's just places where there's a comment about it :-(. It's worth noting also that in many of these places, paying attention to collation is not merely going to need more coding; it will directly translate to a performance hit, one that is entirely unnecessary for the normal case where collation doesn't affect equality. So this leaves us between a rock and a hard place. I think there's just about no chance of fixing all these things without a serious fresh slip in the 9.1 schedule. Also, I'm *not* prepared to fix these things personally. I already regret the amount of time I put into collations this past winter/spring, and am not willing to drop another several weeks down that sinkhole right now. 
The most workable alternative that I can see is to lobotomize citext so that it always does lower-casing according to the database's default collation, which would allow us to pretend that its notion of equality is not collation-sensitive after all. We could hope to improve this in future release cycles, but not till we've done the infrastructure work outlined above. One bit of infrastructure that might be a good idea is a flag to indicate whether an equality operator's behavior is potentially collation-dependent, so that we could avoid taking performance hits in the normal case. Comments, other ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
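The invariant the hash machinery needs from a "lobotomized" citext can be sketched as follows (a toy Python stand-in for the C functions, with Python's default str.lower() standing in for lower-casing under the database's default collation):

```python
def citext_eq(a, b):
    """'Lobotomized' citext equality: case-fold with one fixed locale
    (here Python's default str.lower()), never a per-column collation,
    so equality is not collation-sensitive."""
    return a.lower() == b.lower()

def citext_hash(s):
    """The invariant hash-based DISTINCT/grouping depends on:
    citext_eq(a, b) implies citext_hash(a) == citext_hash(b).
    Hashing the same case-folded form used by equality guarantees
    it, without any collation being passed in."""
    return hash(s.lower())
```

Bug #6053 arises precisely because the real citext hash function wants a collation that execGrouping.c never passes; folding with the fixed default collation, as proposed above, removes that dependency at the cost of per-collation case-folding behavior.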
Re: [HACKERS] reducing the overhead of frequent table locks - now, with WIP patch
Stephen Frost sfr...@snowman.net wrote: if people really want this in, then we need to figure out what the new schedule is going to be. I suggest June, 2012. That way we can get a whole bunch more really cool patches in, and the users won't have to wait for 9.2 to get them. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers