Re: [HACKERS] lazy vxid locks, v2

2011-07-14 Thread Jeff Davis
On Tue, 2011-07-05 at 13:15 -0400, Robert Haas wrote:
 On Tue, Jul 5, 2011 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
  Here is an updated version of the lazy vxid locks patch [1], which
  applies over the latest "reduce the overhead of frequent table
  locks" [2] patch.
 
  [1] https://commitfest.postgresql.org/action/patch_view?id=585
  [2] https://commitfest.postgresql.org/action/patch_view?id=572
 
 And then I forgot the attachment.

The patch looks good, and I like the concept.

My only real comment is one that you already made: the
BackendIdGetProc() mechanism is not awesome. However, that seems like
material for a separate patch, if at all.

Big disclaimer: I did not do any performance review, despite the fact
that this is a performance patch.

I see that there are some active performance concerns around this patch,
specifically that it may cause an increase in spinlock contention:

http://archives.postgresql.org/message-id/banlktikp4egbfw9xdx9bq_vk8dqa11w...@mail.gmail.com

Fortunately, there's a subsequent discussion that shows a lot of
promise:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg00293.php

I'll mark this "waiting on author" pending the results of that
discussion.

I like the approach you're taking with this series of patches, so
perhaps we shouldn't set the bar so high that you have to remove all of
the bottlenecks before making any progress. Then again, maybe there's
not a huge cost to leaving these patches on the shelf until we're sure
that they lead somewhere.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: patch review : Add ability to constrain backend temporary file space

2011-07-14 Thread Tatsuo Ishii
 Hackers,
 
 This patch needs a new reviewer, per Cedric.  Please help!

Hi, I am the new reviewer :-)

I have looked into the v6 patches. One thing I would like to suggest
is enhancing the error message emitted when temp_file_limit is exceeded.

ERROR:  aborting due to exceeding temp file limit

Is it possible to add the current temp file limit to the message? For
example,

ERROR:  aborting due to exceeding temp file limit 1kB

I know the current setting of temp_file_limit can be viewed in other
ways, but I think this will make an admin's or application developer's
life a little bit easier.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Full GUID support

2011-07-14 Thread Hiroshi Saito

Hi Thomas-san, Ralf-san.

I appreciate your great work.
Thanks!

CC to Postgres-ML.

Regards,
Hiroshi Saito

(2011/07/14 3:49), Thomas Lotterer wrote:
 Thanks for the hint.
 Our ftp daemon is dumping core.
 We are debugging ...




[HACKERS] help with sending email

2011-07-14 Thread Fernando Acosta Torrelly
Hi everybody: 

 

I am using pgmail to send email in an application, but I would like to use
HTML format.

Does anybody have an example of how to do this, or what do you recommend I
use for doing this?

 

Thanks in advance for your attention. 

 

Best Regards,

 

Fernando Acosta 

Lima - Perú



Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-14 Thread Fujii Masao
On Thu, Jul 14, 2011 at 5:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Jul 13, 2011 at 10:56 PM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 Attached is a patch for the WAL writer that removes its tight polling
 loop (which probably doesn't get hit often in practice, as we just
 sleep if wal_writer_delay is under a second) and, at least
 potentially, reduces power consumption when idle by using a latch.

 I will break all remaining power consumption work down into
 per-auxiliary process patches. I think that this is appropriate - if
 we hit a snag on one of the processes, there is no need to have that
 hold up everything.

 I've commented that we handle all expected signals, and therefore we
 shouldn't worry about having timeout invalidated by signals, just as
 with the archiver. Previously, we didn't even worry about Postmaster
 death within the tight polling loop, presumably because
 wal_writer_delay is typically small enough to avoid that being a
 problem. I thought that WL_POSTMASTER_DEATH might be superfluous here,
 but then again there is a codepath specifically for the case where
 wal_writer_delay exceeds one second, so it is included in this initial
 version.

 Comments?

 ISTM that this in itself isn't enough to reduce power consumption.

 Currently the only people that use WALWriter are asynchronous commits,
 so we should include within RecordTransactionCommit() a SetLatch()
 command for the WALWriter.

 That way we can have WALWriter sleep until it's needed.

+1

Currently walwriter might write out the WAL before a transaction commits.
IOW, walwriter tries to write out the WAL in wal_buffers on every wakeup.
This might be useful for a long transaction which generates lots of WAL
records before commit. So should we call SetLatch() in XLogInsert() instead
of RecordTransactionCommit()? Though I'm not sure how much walwriter
improves the performance of the synchronous commit case...

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] WIP: Fast GiST index build

2011-07-14 Thread Alexander Korotkov
On Thu, Jul 14, 2011 at 12:42 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 Pinning a buffer that's already in the shared buffer cache is cheap, I
 doubt you're gaining much by keeping the private hash table in front of the
 buffer cache.

Yes, I see. Pinning a lot of buffers doesn't give significant benefits but
produces a lot of problems. Also, removing the hash table can simplify the code.

Also, it's possible that not all of the subtree is actually required during
 the emptying, so in the worst case pre-loading them is counter-productive.

What do you think about pre-fetching all of the subtree? It requires actually
loading level_step - 1 levels of it. In some cases it can still be
counter-productive, but it is probably productive on average?


 Well, what do you do if you deem that shared_buffers is too small? Fall
 back to the old method? Also, shared_buffers is shared by all backends, so
 you can't assume that you get to use all of it for the index build. You'd
 need a wide safety margin.

I assumed we would check whether there is enough of shared_buffers before
switching to the buffering method. But concurrent backends make this method
unsafe.

There are other difficulties with concurrent backends: it would be nice to
estimate the effective cache usage of other backends before switching to the
buffering method. If we don't take care of that, we might fail to switch to
the buffering method in cases where it could give a significant benefit.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-14 Thread Simon Riggs
On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masao masao.fu...@gmail.com wrote:

 Currently walwriter might write out the WAL before a transaction commits.
 IOW, walwriter tries to write out the WAL in wal_buffers in every wakeups.
 This might be useful for long transaction which generates lots of WAL
 records before commit. So we should call SetLatch() in XLogInsert() instead
 of RecordTransactionCommit()? Though I'm not sure how much walwriter
 improves the performance of synchronous commit case..

Yeh, we did previously have a heuristic to write out the WAL when it
was more than half full. Not sure I want to put exactly that code back
into such a busy code path.

I suggest that we set latch every time the wal buffers wrap.

So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then
SetLatch on the WALWriter.

That's a simple test, and we only check it when we switch WAL buffer page.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Small patch for GiST: move childoffnum to child

2011-07-14 Thread Alexander Korotkov
On Thu, Jul 14, 2011 at 12:56 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 First, notice that we're setting ptr->parent = top. 'top' is the current
 node we're processing, and ptr represents the node to the right of the
 current node. The current node is *not* the parent of the node to the right.
 I believe that line should be ptr->parent = top->parent.

I think the same.


 Second, we're adding the entry for the right sibling to the end of the list
 of nodes to visit. But when we process entries from the list, we exit
 immediately when we see a leaf page. That means that the right sibling can
 get queued up behind leaf pages, and thus never visited.

I think a possible solution is to visit the right sibling immediately after
the current page. Thus, this code fragment should look like this:


if (top->parent && XLByteLT(top->parent->lsn, GistPageGetOpaque(page)->nsn) &&
    GistPageGetOpaque(page)->rightlink != InvalidBlockNumber /* sanity check */ )
{
    /* page split while we were thinking... */
    ptr = (GISTInsertStack *) palloc0(sizeof(GISTInsertStack));
    ptr->blkno = GistPageGetOpaque(page)->rightlink;
    ptr->childoffnum = InvalidOffsetNumber;
    ptr->parent = top->parent;
    ptr->next = top->next;
    top->next = ptr;
    if (tail == top)
        tail = ptr;
}


--
With best regards,
Alexander Korotkov.


[HACKERS] Understanding GIN posting trees

2011-07-14 Thread Heikki Linnakangas

I have a couple of questions on GIN:

The code seems to assume that it's possible for the same TID to appear 
twice for a single key (see addItemPointersToTuple()). I understand that 
it's possible for a single heap tuple to contain the same key twice. For 
example if you index an array of integers like [1,2,1]. But once you've 
inserted all the keys for a single heap item, you never try to insert 
the same TID again, so no duplicates should occur.


Looking at the history, it looks like pre-8.4 we assumed that no such 
duplicates are possible. Duplicates of a single key for one column are 
eliminated in extractEntriesSU(), but apparently when the multi-column 
support was added, we didn't make the de-duplication run across the 
keys extracted from all columns. Now that the posting tree/list 
insertion code has to deal with duplicates anyway, the de-duplication 
performed in extractEntriesSU() seems pointless. But I wonder if it 
would be better to make extractEntriesSU() remove duplicates across all 
columns, so that the insertion code wouldn't need to deal with duplicates.


Dealing with the duplicates in the insertion code isn't particularly 
difficult. And in fact, now that we only support the getbitmap method, 
we wouldn't really need to eliminate duplicates anyway. But I have an 
ulterior motive:


Why is the posting tree a tree? AFAICS, we never search it using the 
TID, it's always scanned in whole. It would be simpler to store the TIDs 
in a posting list in no particular order. This could potentially make 
insertions cheaper, as you could just append to the last posting list 
page for the key, instead of traversing the posting tree to a particular 
location. You could also pack the tids denser, as you wouldn't need to 
reserve free space for additions in the middle.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-14 Thread Radosław Smogura

On Thu, 14 Jul 2011 15:15:33 +0300, Peter Eisentraut wrote:

On sön, 2011-07-10 at 11:40 -0700, Josh Berkus wrote:

Hackers,

 B. 6. Current behaviour _is intended_ (there is an if to check
node type) and _natural_. In this particular case the user asks for text
content of some node, and this content is actually .


 I don't buy that. The check for the node type is there because
 two different libxml functions are used to convert nodes to
 strings. The if has absolutely *zero* to do with escaping, except
 for that missing escape_xml() call in the else case.

 Secondly, there is little point in having a type XML if we
 don't actually ensure that values of that type can only contain
 well-formed XML.

Can anyone else weigh in on this? Peter?


Looks like a good change to me.
I'll bump it in a few hours, as I can't recall the password for my keyring.
Now my hands are clean and it's not my business to care about this.


Best regards.
Radek.




Re: [HACKERS] proposal: a validator for configuration files

2011-07-14 Thread Alexey Klyukin

On Jul 14, 2011, at 4:38 AM, Alvaro Herrera wrote:

 Excerpts from Florian Pflug's message of Wed Jul 13 20:12:28 -0400 2011:
 On Jul14, 2011, at 01:38 , Alvaro Herrera wrote:
 One strange thing here is that you could get two such messages; say if a
 file has 100 parse errors and there are also valid lines that contain
 bogus settings (foo = bar).  I don't find this to be too problematic,
 and I think fixing it would be excessively annoying.
 
 For example, a bogus run would end like this:
 
 95 LOG:  syntax error in file /pgsql/install/HEAD/data/postgresql.conf 
 line 4, near end of line
 96 LOG:  syntax error in file /pgsql/install/HEAD/data/postgresql.conf 
 line 41, near end of line
 97 LOG:  syntax error in file /pgsql/install/HEAD/data/postgresql.conf 
 line 104, near end of line
 98 LOG:  syntax error in file /pgsql/install/HEAD/data/postgresql.conf 
 line 156, near end of line
 99 LOG:  syntax error in file /pgsql/install/HEAD/data/postgresql.conf 
 line 208, near end of line
 100 LOG:  syntax error in file /pgsql/install/HEAD/data/postgresql.conf 
 line 260, near end of line
 101 LOG:  too many errors found, stopped processing file 
 /pgsql/install/HEAD/data/postgresql.conf
 102 LOG:  unrecognized configuration parameter plperl.err
 103 LOG:  unrecognized configuration parameter this1
 104 LOG:  too many errors found, stopped processing file 
 /pgsql/install/HEAD/data/postgresql.conf
 105 FATAL:  errors detected while parsing configuration files
 
 How about changing ParseConfigFile to say too many *syntax* errors found
 instead? It'd be more precise, and we wouldn't emit exactly the
 same message twice.
 
 Yeah, I thought about doing it that way but refrained because it'd be
 one more string to translate.  That's a poor reason, I admit :-)  I'll
 change it.

This is happening because the check on the total number of errors so far
happens only after coming across at least one unrecognized configuration
option.  What about adding one more check right after ParseConfigFile, so we
can bail out early when overwhelmed with syntax errors? This would save a line
in translation :).

 
 Do you want me to take a closer look at your modified version of the
 patch before you commit, or did you post it more as a FYI, this is
 how it's going to look like?
 
 I know I'd feel more comfortable if you (and Alexey, and Selena) gave it
 another look :-)

I have checked it here and don't see any more problems with it.

--
Command Prompt, Inc.  http://www.CommandPrompt.com
PostgreSQL Replication, Consulting, Custom Development, 24x7 support






Re: [HACKERS] patch for distinguishing PG instances in event log

2011-07-14 Thread Magnus Hagander
2011/5/26 MauMau maumau...@gmail.com:
 Hello,

 I wrote and attached a patch for the TODO item below (which I proposed).

 Allow multiple Postgres clusters running on the same machine to distinguish
 themselves in the event log
 http://archives.postgresql.org/pgsql-hackers/2011-03/msg01297.php
 http://archives.postgresql.org/pgsql-hackers/2011-05/msg00574.php

 I changed two things from the original proposal.

 1. regsvr32.exe needs /n when you specify event source
 I described the reason in src/bin/pgevent/pgevent.c.

 2. I moved the article for event log registration to more suitable place
 The traditional place and what I originally proposed were not best, because
 those who don't build from source won't read those places.

 I successfully tested event log registration/unregistration, event logging
 with/without event_source parameter, and SHOWing event_source parameter with
 psql on Windows Vista (32-bit). I would appreciate it if someone who has a
 64-bit environment could test it on 64-bit Windows.

 I'll add this patch to the first CommitFest of 9.2. Thank you in advance for
 reviewing it.

+<para>
+ On Windows, you need to register an event source
+ and its library with the operating system in order
+ to make use of the <systemitem>eventlog</systemitem> option for
+ <varname>log_destination</>.
+ See <xref linkend="event-log-registration"> for details.
+</para>

* This part is not strictly correct - you don't *need* to do that, it
just makes things look nicer, no?

* Also, what is the use for set_eventlog_parameters()? It's just a
string variable; it should work without it.

* These days we avoid #ifdef'ing GUCs just because they are not available
on a given platform - so the list stays consistent. The GUC should be
available on non-Windows platforms as well.

* The guc also needs to go in postgresql.conf.sample

* We never build in unicode mode, so all those checks are unnecessary.

* Are we really allowed to call MessageBox in DllRegisterServer?
Won't that break badly in cases like silent installs?

Attached is an updated patch, which doesn't work yet. I believe the
changes to the backend are correct, but probably some of the cleanups
and changes in the dll are incorrect, because I seem to be unable to
register either the default or a custom handler so far.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 842558d..583a5c9 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2975,6 +2975,13 @@ local0.*    /var/log/postgresql
  to the <application>syslog</application> daemon's configuration file
  to make it work.
 </para>
+<para>
+ On Windows, you need to register an event source
+ and its library with the operating system in order
+ to make use of the <systemitem>eventlog</systemitem> option for
+ <varname>log_destination</>.
+ See <xref linkend="event-log-registration"> for details.
+</para>
 </note>
    </listitem>
   </varlistentry>
@@ -3221,6 +3228,24 @@ local0.*    /var/log/postgresql
 </listitem>
   </varlistentry>
 
+ <varlistentry id="guc-event-source" xreflabel="event_source">
+  <term><varname>event_source</varname> (<type>string</type>)</term>
+  <indexterm>
+   <primary><varname>event_source</> configuration parameter</primary>
+  </indexterm>
+   <listitem>
+    <para>
+     When logging to <application>event log</> is enabled, this parameter
+     determines the program name used to identify
+     <productname>PostgreSQL</productname> messages in
+     <application>event log</application>. The default is
+     <literal>PostgreSQL</literal>.
+     This parameter can only be set in the <filename>postgresql.conf</>
+     file or on the server command line.
+    </para>
+   </listitem>
+  </varlistentry>
+
   </variablelist>
 </sect2>
  <sect2 id="runtime-config-logging-when">
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 0410cff..41b9009 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1552,19 +1552,6 @@ PostgreSQL, contrib and HTML documentation successfully made. Ready to install.
   </procedure>
 
   <formalpara>
-   <title>Registering <application>eventlog</> on <systemitem
-   class="osname">Windows</>:</title>
-   <para>
-    To register a <systemitem class="osname">Windows</> <application>eventlog</>
-    library with the operating system, issue this command after installation:
-<screen>
-<userinput>regsvr32 <replaceable>pgsql_library_directory</>/pgevent.dll</userinput>
-</screen>
-    This creates registry entries used by the event viewer.
-   </para>
-  </formalpara>
-
-  <formalpara>
   <title>Uninstallation:</title>
   <para>
    To undo the installation use the command <command>gmake
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index ef83206..bfbb641 100644
--- a/doc/src/sgml/runtime.sgml
+++ 
+++ 

Re: [HACKERS] SAVEPOINTs and COMMIT performance

2011-07-14 Thread Simon Riggs
On Mon, Jun 6, 2011 at 10:33 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 06.02.2011 23:09, Simon Riggs wrote:

 On Sun, 2011-02-06 at 12:11 -0500, Bruce Momjian wrote:

 Did this ever get addressed?

 Patch attached.

 Seems like the easiest fix I can come up with.

 @@ -2518,7 +2518,7 @@ CommitTransactionCommand(void)
                case TBLOCK_SUBEND:
                        do
                        {
 -                               CommitSubTransaction();
 +                               CommitSubTransaction(true);
                                 s = CurrentTransactionState;    /* changed by pop */
                        } while (s->blockState == TBLOCK_SUBEND);
                        /* If we had a COMMIT command, finish off the main xact too */

 We also get into this codepath at RELEASE SAVEPOINT, in which case it is
 wrong to not reassign the locks to the parent subtransaction.

Attached patch splits TBLOCK_SUBEND state into two new states:
TBLOCK_SUBCOMMIT and TBLOCK_SUBRELEASE, so that we can do the right
thing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


savepoint_commit_performance.v2.patch
Description: Binary data



Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Pavan Deolasee
On Thu, Jul 14, 2011 at 11:46 AM, Simon Riggs si...@2ndquadrant.com wrote:



 Hi Pavan,

 I'd say that seems way too complex for such a small use case and we've
 only just fixed the bugs from 8.4 vacuum map complexity. The code's
 looking very robust now and I'm uneasy that such changes are really
 worth it.


Thanks Simon for looking at the patch.

I am not sure if the use case is really narrow. Today, we dirty the pages in
both passes and also emit WAL records. Just the heap scan can take a
very long time for large tables, blocking the autovacuum worker threads from
doing useful work on other tables. If I am not wrong, we use ring buffers
for vacuum, which would most likely force those buffers to be written/read
twice to/from the disk.

Which part of the patch do you think is very complex? We can try to simplify
that. Or are you seeing any obvious bugs that I missed? IMHO, taking out a
phase completely from vacuum (as this patch does) can simplify things.



 You're trying to avoid Phase 3, the second pass on the heap. Why not
 avoid the write in Phase 1 if its clear that we'll need to come back
 again in Phase 3? So we either do a write in Phase 1 or in Phase 3,
 but never both? That minimises the writes, which are what hurt the
 most.


You can possibly do the work in Phase 3, but that doesn't avoid the
second scan.


 We can reduce the overall cost simply by not doing Phase 2 and Phase 3
 if the number of rows to remove is too few, say  1%.


If you have set the vacuum parameters such that it kicks in when there are,
say, 5% updates/deletes, you would most likely have that much work to do
anyway.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Simon Riggs
On Tue, Jul 12, 2011 at 9:47 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:

 http://archives.postgresql.org/pgsql-hackers/2011-05/msg01119.php
 PFA a patch which implements the idea with some variation.
 At the start of the first pass, we remember the current LSN. Every page that
 needs some work is HOT-pruned so that dead tuples are truncated to dead line
 pointers. We collect those dead line pointers and mark them as
 dead-vacuumed. Since we don't have any LP flag bits available, we instead
 just use the LP_DEAD flag along with offset value 1 to mark the line pointer
 as dead-vacuumed. The page is defragmented and we store the LSN remembered
 at the start of the pass in the page special area as vacuum LSN. We also
 update the free space at that point because we are not going to do a second
 pass on the page anymore.

 Once we collect all dead line pointers and mark them as dead-vacuumed, we
 clean-up the indexes and remove all index pointers pointing to those
 dead-vacuumed line pointers. If the index vacuum finishes successfully, we
 store the LSN in the pg_class row of the table (needs catalog changes). At
 that point, we are certain that there are no index pointers pointing to
 dead-vacuumed line pointers and they can be reclaimed at the next
 opportunity.

 During normal operations or subsequent vacuum, if the page is chosen for
 HOT-pruning, we check if it has any dead-vacuumed line pointers and if the
 vacuum LSN stored in the page special area is equal to the one stored in the
 pg_class row, and reclaim those dead-vacuumed line pointers (the index
 pointers to these line pointers are already taken care of). If the pg_class
 LSN is not the same, the last vacuum probably did not finish completely and
 we collect the dead-vacuumed line pointers just like other dead line pointers
 and try to clean up the index pointers as usual.
 I ran a few pgbench tests with the patch. I don't see much difference in the
 overall tps, but the vacuum time for the accounts table reduces by nearly
 50%. Nor do I see much difference in the overall bloat, but then pgbench
 uses HOT very nicely and the accounts table got only a couple of vacuum cycles
 in my 7-8 hour run.
 There are a couple of things that probably need more attention. I am not sure
 if we need to teach ANALYZE to treat dead line pointers differently. Since
 they take up much less space than a dead tuple, they should definitely have
 a lower weight, but at the same time, we need to take into account the
 number of indexes on the table. The start-of-first-pass LSN that we are
 remembering is in fact the start of the WAL page, and I think there could be
 some issues with that, especially for very tiny tables. For example, a first
 vacuum may run completely. If another vacuum is started on the same table
 and say it gets the same LSN (because we did not write more than 1 page
 worth of WAL in between) and if the second vacuum aborts after it cleaned up
 a few pages, we might get into some trouble. The likelihood of such things
 happening is very small, but maybe it's worth taking care of. Maybe we
 can get the exact current LSN and not store it in pg_class if we don't
 do anything during the cycle.
 Comments ?

Hi Pavan,

I'd say that seems way too complex for such a small use case and we've
only just fixed the bugs from 8.4 vacuum map complexity. The code's
looking very robust now and I'm uneasy that such changes are really
worth it.

You're trying to avoid Phase 3, the second pass on the heap. Why not
avoid the write in Phase 1 if its clear that we'll need to come back
again in Phase 3? So we either do a write in Phase 1 or in Phase 3,
but never both? That minimises the writes, which are what hurt the
most.

We can reduce the overall cost simply by not doing Phase 2 and Phase 3
if the number of rows to remove is too few, say  1%.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-14 Thread Peter Eisentraut
On ons, 2011-07-13 at 11:58 +0200, Nicolas Barbier wrote:
 2011/6/29, Florian Pflug f...@phlo.org:
 
  Secondly, there is little point in having a type XML if we
  don't actually ensure that values of that type can only contain
  well-formed XML.
 
 +1. The fact that XPATH() must return a type that cannot depend on the
 given expression (even if it is a constant string) may be unfortunate,
 but returning XML-that-is-not-quite-XML sounds way worse to me.

The example given was

XPATH('/*/text()', '<root>&lt;</root>')

This XPath expression returns a node set, and XML is a serialization
format of a node, so returning xml[] in this particular case seems
entirely reasonable to me.




Re: [HACKERS] Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

2011-07-14 Thread Peter Eisentraut
On sön, 2011-07-10 at 11:40 -0700, Josh Berkus wrote:
 Hackers,
 
  B. 6. Current behaviour _is intended_ (there is an if to check node type) 
  and _natural_. In this particular case the user asks for text content of some 
  node, and this content is actually .
  
  I don't buy that. The check for the node type is there because
  two different libxml functions are used to convert nodes to
  strings. The if has absolutely *zero* to do with escaping, except
  for that missing escape_xml() call in the else case.
  
  Secondly, there is little point in having a type XML if we
  don't actually ensure that values of that type can only contain
  well-formed XML.
 
 Can anyone else weigh in on this? Peter?

Looks like a good change to me.




Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 12:42, Simon Riggs wrote:

On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masaomasao.fu...@gmail.com  wrote:


Currently walwriter might write out the WAL before a transaction commits.
IOW, walwriter tries to write out the WAL in wal_buffers in every wakeups.
This might be useful for long transaction which generates lots of WAL
records before commit. So we should call SetLatch() in XLogInsert() instead
of RecordTransactionCommit()? Though I'm not sure how much walwriter
improves the performance of synchronous commit case..


Yeh, we did previously have a heuristic to write out the WAL when it
was more than half full. Not sure I want to put exactly that code back
into such a busy code path.

I suggest that we set latch every time the wal buffers wrap.

So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then
SetLatch on the WALWriter.

That's a simple test and we only check it if we're switch WAL buffer page.


That was my first thought too - but I wonder if that's too aggressive? A 
backend that does for example a large bulk load will cycle through the 
buffers real quick. It seems like a bad idea to wake up walwriter 
between each buffer in that case. Then again, setting a latch that's 
already set is cheap, so maybe it works fine in practice.


Maybe it would be better to do it less frequently, say, every time you 
switch to new WAL segment. Or every 10 buffers or something like that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-14 Thread Simon Riggs
On Thu, Jul 14, 2011 at 10:53 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 14.07.2011 12:42, Simon Riggs wrote:

 On Thu, Jul 14, 2011 at 9:57 AM, Fujii Masao masao.fu...@gmail.com
  wrote:

 Currently walwriter might write out the WAL before a transaction commits.
 IOW, walwriter tries to write out the WAL in wal_buffers on every
 wakeup.
 This might be useful for a long transaction which generates lots of WAL
 records before commit. So we should call SetLatch() in XLogInsert()
 instead
 of RecordTransactionCommit()? Though I'm not sure how much walwriter
 improves the performance of the synchronous commit case...

 Yeh, we did previously have a heuristic to write out the WAL when it
 was more than half full. Not sure I want to put exactly that code back
 into such a busy code path.

 I suggest that we set latch every time the wal buffers wrap.

 So at the bottom of AdvanceXLInsertBuffer(), if nextidx == 0 then
 SetLatch on the WALWriter.

 That's a simple test, and we only check it when we're switching WAL buffer pages.

 That was my first thought too - but I wonder if that's too aggressive? A
 backend that does for example a large bulk load will cycle through the
 buffers real quick. It seems like a bad idea to wake up walwriter between
 each buffer in that case. Then again, setting a latch that's already set is
 cheap, so maybe it works fine in practice.

 Maybe it would be better to do it less frequently, say, every time you
 switch to new WAL segment. Or every 10 buffers or something like that.

Yes, that's roughly what I'm saying. nextidx == 0 occurs just after we've
wrapped wal_buffers, i.e. we only wake up the WALWriter every
wal_buffers pages.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 18:57, Pavan Deolasee wrote:

On Thu, Jul 14, 2011 at 11:46 AM, Simon Riggs si...@2ndquadrant.com wrote:

I'd say that seems way too complex for such a small use case and we've
only just fixed the bugs from 8.4 vacuum map complexity. The code's
looking very robust now and I'm uneasy that such changes are really
worth it.


Thanks Simon for looking at the patch.

I am not sure if the use case is really narrow. Today, we dirty the pages in
both passes and also emit WAL records. Just the heap scan can take a
very long time for large tables, blocking the autovacuum workers from
doing useful work on other tables. If I am not wrong, we use ring buffers
for vacuum, which would most likely force those buffers to be written/read
twice to disk.


Seems worthwhile to me. What bothers me a bit is the need for the new 
64-bit LSN value on each heap page. Also, note that temporary tables are 
not WAL-logged, so there are no LSNs.


How does this interact with the visibility map? If you set the 
visibility map bit after vacuuming indexes, a subsequent vacuum will not 
visit the page. The second vacuum will update relindxvacxlogid/off, but 
it will not clean up the dead line pointers left behind by the first 
vacuum. Now the LSN on the page differs from the one stored in pg_class, 
so subsequent pruning will not remove the dead line pointers either. I 
think you can sidestep that if you check that the page's vacuum LSN <= 
the vacuum LSN in pg_class, instead of testing for equality.


Ignoring the issue stated in the previous paragraph, I think you wouldn't 
actually need a 64-bit LSN. A smaller counter is enough, as wrap-around 
doesn't matter. In fact, a single bit would be enough. After a 
successful vacuum, the counter on each heap page (with dead line 
pointers) is N, and the value in pg_class is N. There are no other 
values on the heap, because vacuum will have cleaned them up. When you 
begin the next vacuum, it will stamp pages with N+1. So at any stage, 
there is only one of two values on any page, so a single bit is enough. 
(But as I said, that doesn't hold if vacuum skips some pages thanks to 
the visibility map)


Is there something in place to make sure that pruning uses an up-to-date 
relindxvacxlogid/off value? I guess it doesn't matter if it's 
out-of-date, you'll just miss the opportunity to remove some dead tuples.


Seems odd to store relindxvacxlogid/off as two int32 columns. Store it 
in one uint64 column, or invent a new datatype for LSNs, or store it as 
text in %X/%X format.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread David E. Wheeler
On Jul 13, 2011, at 12:57 PM, Kevin Grittner wrote:

 create or replace function relistemp(rel pg_class)
  returns boolean language sql immutable strict as
  $$select $1.relpersistence = 't';$$;
 
 Just don't forget to use the table name or alias in front of it... :-)

Oh, nice hack. How far back does that work (pgTAP runs on 8.0 and higher)?

Thanks,

David




Re: [HACKERS] Full GUID support

2011-07-14 Thread David E. Wheeler
On Jul 14, 2011, at 12:05 AM, Hiroshi Saito wrote:

 Hi Thomas-san, Ralf-san.
 
 I appreciate your great work.
 Thanks!
 
 CC to Postgres-ML.
 
 Regards,
 Hiroshi Saito
 
 (2011/07/14 3:49), Thomas Lotterer wrote:
  Thanks for the hint.
  Our ftp daemon is dumping core.
  We are debugging ...

Ah, good news, thanks.

Where should I report stuff like this in the future? I sent a message about 
this to r...@engelschall.com on May 15th and also a Twitter DM but didn't hear 
back…

Thanks,

David




Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 19:51, David E. Wheeler wrote:

On Jul 13, 2011, at 12:57 PM, Kevin Grittner wrote:


create or replace function relistemp(rel pg_class)
  returns boolean language sql immutable strict as
  $$select $1.relpersistence = 't';$$;

Just don't forget to use the table name or alias in front of it... :-)


Oh, nice hack. How far back does that work (pgTAP runs on 8.0 and higher)?


Far back. But you only need it in >= 9.1. Older versions have the 
pg_class.relistemp column anyway.


Not sure how this helps, though. If you modify pgTAP to install that 
function automatically when dealing with a new server version, you might 
as well modify its queries to use relpersistence = 't' directly when 
dealing with a new server version. It works as a manual work-around if 
you can't upgrade pgTAP, I guess.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Kevin Grittner
David E. Wheeler da...@kineticode.com wrote:
 On Jul 13, 2011, at 12:57 PM, Kevin Grittner wrote:
 
 create or replace function relistemp(rel pg_class)
  returns boolean language sql immutable strict as
  $$select $1.relpersistence = 't';$$;
 
 Just don't forget to use the table name or alias in front of
 it... :-)
 
 Oh, nice hack. How far back does that work (pgTAP runs on 8.0 and
 higher)?
 
As far as I know, the technique of creating a function with a record
type as its only parameter to use as a generated column goes way
back.  This particular function won't work prior to 9.1, because you
won't have the relpersistence column, but then, prior to 9.1 you
wouldn't need to run this because you already have a relistemp
column.
 
-Kevin



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread David E. Wheeler
On Jul 14, 2011, at 9:55 AM, Heikki Linnakangas wrote:

 Far back. But you only need it in >= 9.1. Older versions have the 
 pg_class.relistemp column anyway.

Yeah.

 Not sure how this helps, though. If you modify pgTAP to install that 
 automatically in pgTAP when dealing with a new server version, you might as 
 well modify its queries to use relpersistence = 't' directly when dealing 
 with a new server version. It works as a manual work-around if you can't 
 upgrade pgTAP, I guess.

Yeah, that's what I'd rather avoid. I'll probably have to modify the function 
that makes the call to look at the version number. Annoying, but do-able.

  https://github.com/theory/pgtap/blob/master/sql/pgtap.sql.in#L5894

Best,

David




Re: [HACKERS] Need help understanding pg_locks

2011-07-14 Thread Bruce Momjian
Florian Pflug wrote:
 On Jul13, 2011, at 21:08 , Bruce Momjian wrote:
  -   OID of the database in which the object exists, or
  -   zero if the object is a shared object, or
  -   null if the lock object is on a transaction ID
  +   OID of the database in which the lock target exists, or
  +   zero if the lock is a shared object, or
  +   null if the lock is on a transaction ID
 
 This sounds good.
 
  +   OID of the relation lock target, or null if the lock is not
 on a relation or part of a relation
 
 That, however, not so much. relation lock target might easily
 be interpreted as the relation's lock target or the
 relation lock's target - at least by non-native speakers such
 as myself. The same is true for transaction lock target and
 friends.
 
 Can't we simply go with Locked relation, Locked transaction id
 and so on (as in my versions B,C and D up-thread)? I can't really
 get excited about the slight imprecision caused by the fact that some
 rows describe aspiring lock holders instead of current lock holders.
 The existence of the granted column makes the situation pretty clear.
 
 Plus, it's technically not even wrong - a process is waiting because
 somebody else *is* actually holding a lock on the object. So
 the tuple/transaction/... is, in fact, a Locked tuple/transaction/...

I think it will be very confusing to have locked refer to the person
holding the lock while the row is based on who is waiting for it.

I reworded that line to:

+   OID of the relation of the lock target, or null if the lock is not

Update patch attached.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index c5851af..84c2257 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -6928,9 +6928,9 @@
       <entry><type>oid</type></entry>
       <entry><literal><link linkend=catalog-pg-database><structname>pg_database</structname></link>.oid</literal></entry>
       <entry>
-       OID of the database in which the object exists, or
-       zero if the object is a shared object, or
-       null if the lock object is on a transaction ID
+       OID of the database in which the lock target exists, or
+       zero if the lock is a shared object, or
+       null if the lock is on a transaction ID
       </entry>
      </row>
      <row>
@@ -6938,7 +6938,7 @@
       <entry><type>oid</type></entry>
       <entry><literal><link linkend=catalog-pg-class><structname>pg_class</structname></link>.oid</literal></entry>
       <entry>
-       OID of the relation, or null if the lock object is not
+       OID of the relation of the lock target, or null if the lock is not
        on a relation or part of a relation
       </entry>
      </row>
@@ -6947,7 +6947,7 @@
       <entry><type>integer</type></entry>
       <entry></entry>
       <entry>
-       Page number within the relation, or null if the lock object
+       Page number within the relation, or null if the lock
        is not on a tuple or relation page
       </entry>
      </row>
@@ -6956,7 +6956,7 @@
       <entry><type>smallint</type></entry>
       <entry></entry>
       <entry>
-       Tuple number within the page, or null if the lock object is not
+       Tuple number within the page, or null if the lock is not
        on a tuple
       </entry>
      </row>
@@ -6965,7 +6965,7 @@
       <entry><type>text</type></entry>
       <entry></entry>
       <entry>
-       Virtual ID of a transaction lock, or null if the lock object is not
+       Virtual ID of a transaction lock target, or null if the lock is not
        on a virtual transaction ID
       </entry>
      </row>
@@ -6974,7 +6974,7 @@
       <entry><type>xid</type></entry>
       <entry></entry>
       <entry>
-       ID of a transaction lock, or null if the lock object is not on a transaction ID
+       ID of a transaction lock target, or null if the lock is not on a transaction ID
       </entry>
      </row>
      <row>
@@ -6982,8 +6982,8 @@
       <entry><type>oid</type></entry>
       <entry><literal><link linkend=catalog-pg-class><structname>pg_class</structname></link>.oid</literal></entry>
       <entry>
-       OID of the system catalog containing the object, or null if the
-       lock object is not on a general database object.
+       OID of the system catalog containing the lock target, or null if the
+       lock is not on a general database object.
       </entry>
      </row>
      <row>
@@ -6992,7 +6992,7 @@
       <entry>any OID column</entry>
       <entry>
        OID of the object within its system catalog, or null if the
-       lock object is not on a general database object.
+       lock is not on a general database object.
        For advisory locks it is used to distinguish the two key
        spaces (1 for an int8 key, 2 for two int4 keys).
       </entry>
@@ -7005,7 +7005,7 @@
        For a table column, this is the column number (the
        <structfield>classid</> and <structfield>objid</> refer to the
  

Re: [HACKERS] Re: patch review : Add ability to constrain backend temporary file space

2011-07-14 Thread Mark Kirkwood

On 14/07/11 18:48, Tatsuo Ishii wrote:


Hi I am the new reviewer:-)

I have looked into the v6 patches. One thing I would like to suggest
is enhancing the error message when temp_file_limit is exceeded.

ERROR:  aborting due to exceeding temp file limit

Is it possible to add the current temp file limit to the message? For
example,

ERROR:  aborting due to exceeding temp file limit 1kB

I know the current setting of temp_file_limit can be viewed in other
ways, but I think this will make an admin's or application developer's
life a little bit easier.


Hi Tatsuo,

Yeah, good suggestion - I agree that it would be useful to supply extra 
detail; I'll amend and resubmit a new patch (along with whatever review 
modifications we come up with in the meantime)!


Cheers

Mark



Re: [HACKERS] help with sending email

2011-07-14 Thread Robert Haas
On Jul 13, 2011, at 12:29 PM, Fernando Acosta Torrelly fgaco...@gmail.com 
wrote:
 Hi everybody:
 
 I am using pgmail to send email in an application, but I would like to use 
 html format
 
 Does anybody have an example of how to do this? Or what do you recommend I 
 use for doing this?
 
 Thanks in advance for your attention.
 
This is the mailing list for PostgreSQL, not pgmail.  And it is also a 
development mailing list; there are others for user questions.

...Robert

Re: [HACKERS] [BUG] SSPI authentication fails on Windows when server parameter is localhost or domain name

2011-07-14 Thread Magnus Hagander
On Wed, Jun 15, 2011 at 10:53, Ahmed Shinwari ahmed.shinw...@gmail.com wrote:
 Hi All,

 I faced a bug on Windows while connecting via SSPI authentication. I was
 able to find the bug and have attached the patch. Details listed below;

 Postgres Installer: Version 9.0.4
 OS: Windows Server 2008 R2/Windows 7

big snip

Thanks - great analysis!

However, I think there is a better fix for this - simply moving a }
one line. In particular, I'm concerned about passing the same pointer
both as input and output to the function - I couldn't find anything in
the documentation saying this was safe (nor did I find anything saying
it's unsafe, but...). Especially since this code clearly behaves
differently on different versions - I've been completely unable to
reproduce this on any of my test machines, but they are all Windows
Server 2003.

So - attached is a new version of the patch, how does this look to
you? FYI, I've had Thom test this new version and it does appear to
work fine in his scenario.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 7799111..936cfea 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -1349,16 +1349,22 @@ pg_SSPI_recvauth(Port *port)
 					  _("could not accept SSPI security context"), r);
 		}
 
+		/*
+		 * Overwrite the current context with the one we just received.
+		 * If sspictx is NULL it was the first loop and we need to allocate
+		 * a buffer for it. On subsequent runs, we can just overwrite the
+		 * buffer contents since the size does not change.
+		 */
 		if (sspictx == NULL)
 		{
 			sspictx = malloc(sizeof(CtxtHandle));
 			if (sspictx == NULL)
 				ereport(ERROR,
 						(errmsg("out of memory")));
-
-			memcpy(sspictx, &newctx, sizeof(CtxtHandle));
 		}
 
+		memcpy(sspictx, &newctx, sizeof(CtxtHandle));
+
 		if (r == SEC_I_CONTINUE_NEEDED)
 			elog(DEBUG4, "SSPI continue needed");
 



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Josh Berkus
All,

BTW, if we're dumping relistemp, we're going to need to notify every
maker of a PostgreSQL admin interface before we release 9.1.

This is why we should have had a complete set of sysviews ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Bruce Momjian
Josh Berkus wrote:
 All,
 
 BTW, if we're dumping relistemp, we're going to need to notify every
 maker of a PostgreSQL admin interface before we release 9.1.
 
 This is why we should have had a complete set of sysviews ...

Are they not testing our 9.1 betas?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Understanding GIN posting trees

2011-07-14 Thread Teodor Sigaev

I have a couple of questions on GIN:

The code seems to assume that it's possible for the same TID to appear
twice for a single key (see addItemPointersToTuple()). I understand that
it's possible for a single heap tuple to contain the same key twice. For
example if you index an array of integers like [1,2,1]. But once you've
inserted all the keys for a single heap item, you never try to insert
the same TID again, so no duplicates should occur.

Looking at the history, it looks like pre-8.4 we assumed that no such
duplicates are possible. Duplicates of a single key for one column are
eliminated in extractEntriesSU(), but apparently when the multi-column
support was added, we didn't make the de-duplication to run across the
keys extracted from all columns. Now that the posting tree/list
insertion code has to deal with duplicates anyway, the de-duplication
performed in extractEntriesSU() seems pointless. But I wonder if it
would be better to make extractEntriesSU() remove duplicates across all
columns, so that the insertion code wouldn't need to deal with duplicates.


During vacuuming of the pending list we could get a power loss, and then some 
data will be in both the tree and the pending list; after restart pgsql will 
try to insert the same data into the tree.



Dealing with the duplicates in the insertion code isn't particularly
difficult. And in fact, now that we only support the getbitmap method,
we wouldn't really need to eliminate duplicates anyway. But I have an
ulterior motive:



Why is the posting tree a tree? AFAICS, we never search it using the
TID, it's always scanned in whole. It would be simpler to store the TIDs
in a posting list in no particular order. This could potentially make
insertions cheaper, as you could just append to the last posting list
page for the key, instead of traversing the posting tree to a particular
location. You could also pack the tids denser, as you wouldn't need to
reserve free space for additions in the middle.
For the consistentFn call we need to collect all data for the current TID. We 
do that by scanning over logically ordered arrays of TIDs, and a tree allows 
us to do that by scanning its leaf pages.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Pavan Deolasee
On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 14.07.2011 18:57, Pavan Deolasee wrote:

 On Thu, Jul 14, 2011 at 11:46 AM, Simon Riggs si...@2ndquadrant.com
  wrote:

 I'd say that seems way too complex for such a small use case and we've

 only just fixed the bugs from 8.4 vacuum map complexity. The code's
 looking very robust now and I'm uneasy that such changes are really
 worth it.

  Thanks Simon for looking at the patch.

 I am not sure if the use case is really narrow. Today, we dirty the pages
 in
 both the passes and also emit WAL records. Just the heap scan can take a
 very long time for large tables, blocking the autovacuum worker threads
 from
 doing useful work on other tables. If I am not wrong, we use ring buffers
 for vacuum which would most-likely force those buffers to be written/read
 twice to the disk.


 Seems worthwhile to me. What bothers me a bit is the need for the new
 64-bit LSN value on each heap page. Also, note that temporary tables are not
 WAL-logged, so there's no LSNs.


Yeah, the good thing is we store it only when it's needed. Temp and unlogged
tables don't need any special handling because we don't rely on the WAL
logging for the table itself. As you have noted down-thread, any counter
which is guaranteed not to wrap around would have worked. LSN is just very
handy for the purpose (let me think more about whether we can do with just a flag).


 How does this interact with the visibility map? If you set the visibility
 map bit after vacuuming indexes, a subsequent vacuum will not visit the
 page. The second vacuum will update relindxvacxlogid/off, but it will not
 clean up the dead line pointers left behind by the first vacuum. Now the LSN
 on the page differs from the one stored in pg_class, so subsequent pruning
 will not remove the dead line pointers either. I think you can sidestep that
 if you check that the page's vacuum LSN = vacuum LSN in pg_class, instead
 of equality.


Hmm. We need to look at that carefully. As the patch stands, we don't set the
visibility map bit when there are any dead line pointers on the page. I'm
not sure if it's clear, but the ItemIdIsDead() test accounts for dead as well
as dead-vacuum line pointers, so the test in lazy_heap_scan() won't let the VM
bit be set early. The next vacuum cycle would still look at the page and
set the bit if the page appears all-visible at that time. Note that in the
next vacuum cycle, we would first clear the dead-vacuum line pointers if it's
not already done by some intermediate HOT-prune operation.



 Ignoring the issue stated in previous paragraph, I think you wouldn't
 actually need an 64-bit LSN. A smaller counter is enough, as wrap-around
 doesn't matter. In fact, a single bit would be enough. After a successful
 vacuum, the counter on each heap page (with dead line pointers) is N, and
 the value in pg_class is N. There are no other values on the heap, because
 vacuum will have cleaned them up. When you begin the next vacuum, it will
 stamp pages with N+1. So at any stage, there is only one of two values on
 any page, so a single bit is enough. (But as I said, that doesn't hold if
 vacuum skips some pages thanks to the visibility map)


I am not sure if that can take care of the aborted vacuum case with a single
bit or a counter that can wrap around. Let me think more about it.



 Is there something in place to make sure that pruning uses an up-to-date
 relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date,
 you'll just miss the opportunity to remove some dead tuples.


Yeah, exactly. We just rely on the value that was read when the pg_class
tuple was last read. If we don't have the up-to-date value, we might miss
some opportunity to clean up those dead-vacuum line pointers.


 Seems odd to store relindxvacxlogid/off as two int32 columns. Store it in
 one uint64 column, or invent a new datatype for LSNs, or store it as text in
 %X/%X format.


Yeah. I don't remember what issues I faced with a uint64 column; maybe I did
not get around the case where uint64 is not natively supported on the
platform. But %X/%X looks very reasonable. Will change to that.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Need help understanding pg_locks

2011-07-14 Thread Florian Pflug
On Jul14, 2011, at 19:06 , Bruce Momjian wrote:
 Florian Pflug wrote:
 On Jul13, 2011, at 21:08 , Bruce Momjian wrote:
 +   OID of the relation lock target, or null if the lock is not
   on a relation or part of a relation
 
 That, however, not so much. relation lock target might easily
 be interpreted as the relation's lock target or the
 relation lock's target - at least by non-native speakers such
 as myself. The same is true for transaction lock target and
 friends.
 
 Can't we simply go with Locked relation, Locked transaction id
 and so on (as in my versions B,C and D up-thread)? I can't really
 get excited about the slight imprecision caused by the fact that some
 rows describe aspiring lock holders instead of current lock holders.
 The existence of the granted column makes the situation pretty clear.
 
 Plus, it's technically not even wrong - a process is waiting because
 somebody else *is* actually holding a lock on the object. So
 the tuple/transaction/... is, in fact, a Locked tuple/transaction/...
 
 I think it will be very confusing to have locked refer to the person
 holding the lock while the row is based on who is waiting for it.

I still believe the chance of confusion to be extremely small, but since
you feel otherwise, what about Targeted instead of Locked. As in

  OID of the relation targeted by the lock, or null if the lock does not
  target a relation or part of a relation.

  Page number within the relation targeted by the lock, or null if the
  lock does not target a tuple or a relation page.

  Virtual ID of the transaction targeted by the lock, or null if the lock
  does not target a virtual transaction ID.

Protected/protects instead of Targeted/targets would also work.

Both avoid the imprecision of saying Locked, and the ambiguity of on -
which might either mean the physical location of the lock, or the object
it's protecting/targeting. 

 I reworded that line to:
 
 +   OID of the relation of the lock target, or null if the lock is not

I'm not a huge fan of that. IMHO  .. of .. of ..  chains are hard to
read. Plus, there isn't such a thing as the relation of a lock target -
the relation *is* the lock target, not a part thereof.

best regards,
Florian Pflug




Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Alvaro Herrera
Excerpts from Pavan Deolasee's message of Thu Jul 14 13:54:36 -0400 2011:
 On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:

  Seems odd to store relindxvacxlogid/off as two int32 columns. Store it in
  one uint64 column, or invent a new datatype for LSNs, or store it as text in
  %X/%X format.
 
 
 Yeah. I don't remember what issues I faced with uint64 column, may be did
 not get around the case where uint64 is not natively supported on the
 platform. But %X/%X looks very reasonable. Will change to that.

Where is this to be stored?

AFAIR we no longer support platforms that do not have working 64 bit
integer types.

As a column name, relindxvacxlogid seems a bit unfortunate, BTW ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Allow pg_archivecleanup to ignore extensions

2011-07-14 Thread Josh Berkus
On 7/12/11 11:17 AM, Josh Berkus wrote:
 On 7/12/11 7:38 AM, Simon Riggs wrote:
 On Sun, Jul 10, 2011 at 7:13 PM, Josh Berkus j...@agliodbs.com wrote:

 This patch[1] is for some reason marked waiting on Author.  But I
 can't find that there's been any review of it searching the list.
 What's going on with it?  Has it been reviewed?

 Yes, I reviewed it on list. Some minor changes were discussed. I'm
 with Greg now, so we'll discuss and handle it.
 
 I couldn't find the review searching the archives.  Can you please link
 it in the Commitfest application?  Thanks.

Given the total lack of activity on this patch, I'm bumping it.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



[HACKERS] Extension ownership and pg_dump

2011-07-14 Thread Heikki Linnakangas

pg_dump seems to not dump the owner of extensions. Is that intentional?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] Is there a committer in the house?

2011-07-14 Thread Josh Berkus
All,

Currently we have 8 patches Ready for Committer in the current CF.
Some of them have been that status for some time.

From traffic on this list, I'm getting the impression that nobody other
than Robert, Heikki and Tom is committing other people's patches.  I
know we have more committers than that.  Bruce?  Simon?  Itagaki?  Bueller?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Extension ownership and pg_dump

2011-07-14 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 pg_dump seems to not dump the owner of extensions. Is that intentional?

Yeah.  Partly it was a matter of not wanting to implement ALTER
EXTENSION OWNER, but I think there were some definitional issues too.
See the archives from back around February or so.

regards, tom lane



Re: [HACKERS] Understanding GIN posting trees

2011-07-14 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Why is the posting tree a tree? AFAICS, we never search it using the 
 TID, it's always scanned in whole. It would be simpler to store the TIDs 
 in a posting list in no particular order. This could potentially make 
 insertions cheaper, as you could just append to the last posting list 
 page for the key, instead of traversing the posting tree to a particular 
 location. You could also pack the tids denser, as you wouldn't need to 
 reserve free space for additions in the middle.

Surely VACUUM would like to search it by TID for deletion purposes?

regards, tom lane



Re: [HACKERS] per-column generic option

2011-07-14 Thread Josh Berkus
All,

Is the spec for this feature still under discussion?  I don't seem to
see a consensus on this thread.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Josh Berkus wrote:
 BTW, if we're dumping relistemp, we're going to need to notify every
 maker of a PostgreSQL admin interface before we release 9.1.

 Are they not testing our 9.1 betas?

There has never, ever, been a guarantee that the system catalogs don't
change across versions.  Anybody issuing such queries should expect to
have to retest them and possibly change them in each new major release.
I see nothing to sweat about here.

regards, tom lane



Re: [HACKERS] Understanding GIN posting trees

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 22:10, Tom Lane wrote:

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

Why is the posting tree a tree? AFAICS, we never search it using the
TID, it's always scanned in whole. It would be simpler to store the TIDs
in a posting list in no particular order. This could potentially make
insertions cheaper, as you could just append to the last posting list
page for the key, instead of traversing the posting tree to a particular
location. You could also pack the tids denser, as you wouldn't need to
reserve free space for additions in the middle.


Surely VACUUM would like to search it by TID for deletion purposes?


It doesn't; it scans all the tid lists in whole. I guess it could search 
by TID; that could be a win if there's only a few deleted tuples in a 
small range of pages.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] patch: enhanced get diagnostics statement 2

2011-07-14 Thread Alvaro Herrera
A couple items for this patch:

The docs state that the variable to receive each diagnostic value needs
to be of the right data type, but fail to specify what it is.  I think
it'd be good to turn that itemizedlist into a table with three
columns: name, type, description.

This seems poor style:

+   case PLPGSQL_GETDIAG_ERROR_CONTEXT:
+   case PLPGSQL_GETDIAG_ERROR_DETAIL:
+   case PLPGSQL_GETDIAG_ERROR_HINT:
+   case PLPGSQL_GETDIAG_RETURNED_SQLSTATE:
+   case PLPGSQL_GETDIAG_MESSAGE_TEXT:
+                                   if (!$2)
+                                       ereport(ERROR,
+                                           (errcode(ERRCODE_SYNTAX_ERROR),
+                                            errmsg("EXCEPTION_CONTEXT or EXCEPTION_DETAIL or EXCEPTION_HINT or RETURNED_SQLSTATE or MESSAGE_TEXT are not allowed in current diagnostics statement"),
+                                            parser_errposition(@1)));
+


I think we could replace this with something like

+                                   if (!$2)
+                                       ereport(ERROR,
+                                           (errcode(ERRCODE_SYNTAX_ERROR),
+                                            errmsg("diagnostic value %s is not allowed in GET CURRENT DIAGNOSTICS statement", stringify(ditem->kind)),


Other than that, and a few grammar fixes in code comments, this seems
good to me.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread David E. Wheeler
On Jul 14, 2011, at 12:19 PM, Tom Lane wrote:

 Are they not testing our 9.1 betas?
 
 There has never, ever, been a guarantee that the system catalogs don't
 change across versions.  Anybody issuing such queries should expect to
 have to retest them and possibly change them in each new major release.
 I see nothing to sweat about here.

A deprecation cycle at least might be useful.

Best,

David




Re: [HACKERS] Need help understanding pg_locks

2011-07-14 Thread Bruce Momjian
Florian Pflug wrote:
 I still believe the chance of confusion to be extremely small, but since
 you feel otherwise, what about "Targeted" instead of "Locked"? As in
 
   OID of the relation targeted by the lock, or null if the lock does not
   target a relation or part of a relation.
 
   Page number within the relation targeted by the lock, or null if the
   lock does not target a tuple or a relation page.
 
   Virtual ID of the transaction targeted by the lock, or null if the lock
   does not target a virtual transaction ID.
 
 "Protected"/"protects" instead of "Targeted"/"targets" would also work.
 
 Both avoid the imprecision of saying "Locked", and the ambiguity of "on" -
 which might either mean the physical location of the lock, or the object
 it's protecting/targeting. 
 
  I reworded that line to:
  
  +   OID of the relation of the lock target, or null if the lock is not
 
 I'm not a huge fan of that. IMHO  .. of .. of ..  chains are hard to
 read. Plus, there isn't such a thing as the relation of a lock target -
 the relation *is* the lock target, not a part thereof.

Agreed.  I like "targeted by".  New patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index c5851af..6fa6fa9
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 6928,6936 ****
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
        <entry>
!        OID of the database in which the object exists, or
!        zero if the object is a shared object, or
!        null if the lock object is on a transaction ID
        </entry>
       </row>
       <row>
--- 6928,6936 ----
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
        <entry>
!        OID of the database in which the lock target exists, or
!        zero if the lock is a shared object, or
!        null if the lock is on a transaction ID
        </entry>
       </row>
       <row>
***************
*** 6938,6944 ****
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the relation, or null if the lock object is not
         on a relation or part of a relation
        </entry>
       </row>
--- 6938,6944 ----
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the relation targeted by the lock, or null if the lock is not
         on a relation or part of a relation
        </entry>
       </row>
***************
*** 6947,6953 ****
        <entry><type>integer</type></entry>
        <entry></entry>
        <entry>
!        Page number within the relation, or null if the lock object
         is not on a tuple or relation page
        </entry>
       </row>
--- 6947,6953 ----
        <entry><type>integer</type></entry>
        <entry></entry>
        <entry>
!        Page number within the relation targeted by the lock, or null if the lock
         is not on a tuple or relation page
        </entry>
       </row>
***************
*** 6956,6963 ****
        <entry><type>smallint</type></entry>
        <entry></entry>
        <entry>
!        Tuple number within the page, or null if the lock object is not
!        on a tuple
        </entry>
       </row>
       <row>
--- 6956,6963 ----
        <entry><type>smallint</type></entry>
        <entry></entry>
        <entry>
!        Tuple number within the page targeted by the lock, or null if
!        the lock is not on a tuple
        </entry>
       </row>
       <row>
***************
*** 6965,6971 ****
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>
!        Virtual ID of a transaction lock, or null if the lock object is not
         on a virtual transaction ID
        </entry>
       </row>
--- 6965,6971 ----
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>
!        Virtual ID of a transaction targeted by the lock, or null if the lock is not
         on a virtual transaction ID
        </entry>
       </row>
***************
*** 6974,6980 ****
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        ID of a transaction lock, or null if the lock object is not on a transaction ID
        </entry>
       </row>
       <row>
--- 6974,6981 ----
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        ID of a transaction targeted by the lock, or null if the lock
!        is not on a transaction ID
        </entry>
       </row>
       <row>
***************
*** 6982,6989 ****
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the 

Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Kevin Grittner
David E. Wheeler da...@kineticode.com wrote:
 
 A deprecation cycle at least might be useful.
 
How about a relistemp extension on pgxn.org for the generated
column function to provide the backward compatibility?  Is the new
extension mechanism a sane way to help those who need a phase-out
period?
 
-Kevin



Re: [HACKERS] patch: enhanced get diagnostics statement 2

2011-07-14 Thread Pavel Stehule
2011/7/14 Alvaro Herrera alvhe...@commandprompt.com:
 A couple items for this patch:

  The docs state that the variable to receive each diagnostic value needs
  to be of the right data type, but fail to specify what it is.  I think
  it'd be good to turn that itemizedlist into a table with three
  columns: name, type, description.

 This seems poor style:

 +                               case PLPGSQL_GETDIAG_ERROR_CONTEXT:
 +                               case PLPGSQL_GETDIAG_ERROR_DETAIL:
 +                               case PLPGSQL_GETDIAG_ERROR_HINT:
 +                               case PLPGSQL_GETDIAG_RETURNED_SQLSTATE:
 +                               case PLPGSQL_GETDIAG_MESSAGE_TEXT:
 +                                   if (!$2)
 +                                       ereport(ERROR,
 +                                           (errcode(ERRCODE_SYNTAX_ERROR),
 +                                            errmsg("EXCEPTION_CONTEXT or EXCEPTION_DETAIL or EXCEPTION_HINT or RETURNED_SQLSTATE or MESSAGE_TEXT are not allowed in current diagnostics statement"),
 +                                                    parser_errposition(@1)));
 +


 I think we could replace this with something like

 +                                   if (!$2)
 +                                       ereport(ERROR,
 +                                           (errcode(ERRCODE_SYNTAX_ERROR),
 +                                            errmsg("diagnostic value %s is not allowed in GET CURRENT DIAGNOSTICS statement", stringify(ditem->kind)),


 Other than that, and a few grammar fixes in code comments, this seems
 good to me.


it is a good idea

Regards

Pavel

 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support




Re: [HACKERS] patch: enhanced get diagnostics statement 2

2011-07-14 Thread Alvaro Herrera
Excerpts from Pavel Stehule's message of Thu Jul 14 16:25:56 -0400 2011:
 2011/7/14 Alvaro Herrera alvhe...@commandprompt.com:
  A couple items for this patch:

 it is good idea

Thanks ... I expect you're going to resubmit the patch based on David's
last version and my comments?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] WIP: Fast GiST index build

2011-07-14 Thread Alexander Korotkov
Do you think using the rightlink as a pointer to the parent page is possible during
index build? It would allow the code to be simplified significantly, because there
would be no more need to maintain in-memory structures for memorizing parents.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] patch: enhanced get diagnostics statement 2

2011-07-14 Thread Pavel Stehule
2011/7/14 Alvaro Herrera alvhe...@commandprompt.com:
 Excerpts from Pavel Stehule's message of Thu Jul 14 16:25:56 -0400 2011:
 2011/7/14 Alvaro Herrera alvhe...@commandprompt.com:
  A couple items for this patch:

 it is a good idea

 Thanks ... I expect you're going to resubmit the patch based on David's
 last version and my comments?


yes, but tomorrow; time to go to sleep

Regards

Pavel

 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support




Re: [HACKERS] Need help understanding pg_locks

2011-07-14 Thread Florian Pflug
On Jul14, 2011, at 22:18 , Bruce Momjian wrote:
 !OID of the database in which the lock target exists, or
 !zero if the lock is a shared object, or
 !null if the lock is on a transaction ID

For consistency, I think it should now say "target" in the second part
of the sentence as well, instead of "lock ... on".

Updated patch attached. I tried to make the descriptions a
bit more consistent, replaced "object" by "target", and
added "targeted by" after the phrase which describes the
locked (or waited-for) object.

best regards,
Florian Pflug

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d4a1d36..33be5d0 100644
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 6928,6936 ****
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
        <entry>
!        OID of the database in which the object exists, or
!        zero if the object is a shared object, or
!        null if the object is a transaction ID
        </entry>
       </row>
       <row>
--- 6928,6936 ----
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-database"><structname>pg_database</structname></link>.oid</literal></entry>
        <entry>
!        OID of the database in which the lock target exists, or
!        zero if the target is a shared object, or
!        null if the target is a transaction ID
        </entry>
       </row>
       <row>
***************
*** 6938,6944 ****
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the relation, or null if the object is not
         a relation or part of a relation
        </entry>
       </row>
--- 6938,6944 ----
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the relation targeted by the lock, or null if the target is not
         a relation or part of a relation
        </entry>
       </row>
***************
*** 6947,6954 ****
        <entry><type>integer</type></entry>
        <entry></entry>
        <entry>
!        Page number within the relation, or null if the object
!        is not a tuple or relation page
        </entry>
       </row>
       <row>
--- 6947,6954 ----
        <entry><type>integer</type></entry>
        <entry></entry>
        <entry>
!        Page number targeted by the lock within the relation,
!        or null if the target is not a relation page or tuple
        </entry>
       </row>
       <row>
***************
*** 6956,6962 ****
        <entry><type>smallint</type></entry>
        <entry></entry>
        <entry>
!        Tuple number within the page, or null if the object is not a tuple
        </entry>
       </row>
       <row>
--- 6956,6963 ----
        <entry><type>smallint</type></entry>
        <entry></entry>
        <entry>
!        Tuple number targeted by the lock within the page,
!        or null if the target is not a tuple
        </entry>
       </row>
       <row>
***************
*** 6964,6971 ****
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>
!        Virtual ID of a transaction, or null if the object is not a
!        virtual transaction ID
        </entry>
       </row>
       <row>
--- 6965,6972 ----
        <entry><type>text</type></entry>
        <entry></entry>
        <entry>
!        Virtual ID of the transaction targeted by the lock,
!        or null if the target is not a virtual transaction ID
        </entry>
       </row>
       <row>
***************
*** 6973,6979 ****
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        ID of a transaction, or null if the object is not a transaction ID
        </entry>
       </row>
       <row>
--- 6974,6981 ----
        <entry><type>xid</type></entry>
        <entry></entry>
        <entry>
!        ID of the transaction targeted by the lock,
!        or null if the target is not a transaction ID
        </entry>
       </row>
       <row>
***************
*** 6981,6988 ****
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the system catalog containing the object, or null if the
!        object is not a general database object
        </entry>
       </row>
       <row>
--- 6983,6990 ----
        <entry><type>oid</type></entry>
        <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
        <entry>
!        OID of the system catalog containing the lock target, or null if the
!        target is not a general database object
        </entry>
       </row>
       <row>
***************
*** 6990,6997 ****
        <entry><type>oid</type></entry>
        <entry>any OID column</entry>
        <entry>
!        OID of the object within its system catalog, or null if the
!        object is not a general database object.
         For advisory locks it is used to distinguish the two key
         spaces 

Re: [HACKERS] WIP: Fast GiST index build

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 23:41, Alexander Korotkov wrote:

Do you think using the rightlink as a pointer to the parent page is possible during
index build? It would allow the code to be simplified significantly, because there
would be no more need to maintain in-memory structures for memorizing parents.


I guess, but where do you store the rightlink, then? Would you need a 
final pass through the index to fix all the rightlinks?


I think you could use the NSN field. It's used to detect concurrent page 
splits, but those can't happen during index build, so you don't need 
that field during index build. You just have to make it look like an 
otherwise illegal NSN, so that it won't be mistaken for a real NSN after 
the index is built. Maybe add a new flag to mean that the NSN is 
actually invalid.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] WIP: Fast GiST index build

2011-07-14 Thread Alexander Korotkov
On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 One thing that caught my eye is that when you empty a buffer, you load the
 entire subtree below that buffer, down to the next buffered or leaf level,
 into memory. Every page in that subtree is kept pinned. That is a problem;
 in the general case, the buffer manager can only hold a modest number of
 pages pinned at a time. Consider that the minimum value for shared_buffers
 is just 16. That's unrealistically low for any real system, but the default
 is only 32MB, which equals to just 4096 buffers. A subtree could easily be
 larger than that.

With level step = 1 we need only 2 levels in the subtree. With the minimum index
tuple size (12 bytes) each page can hold at most 675 tuples. Thus I think the
default shared_buffers is enough for level step = 1. I believe it's enough
to add a check that we have sufficient shared_buffers, isn't it?


 I don't think you're benefiting at all from the buffering that BufFile does
 for you, since you're reading/writing a full block at a time anyway. You
 might as well use the file API in fd.c directly, ie.
  OpenTemporaryFile/FileRead/FileWrite.

BufFile distributes temporary data across several files. AFAICS
postgres avoids working with files larger than 1GB. The size of tree buffers can
easily be greater. Without BufFile I would need to maintain the set of files manually.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Need help understanding pg_locks

2011-07-14 Thread Bruce Momjian

Looks good to me.

---

Florian Pflug wrote:
 On Jul14, 2011, at 22:18 , Bruce Momjian wrote:
  !OID of the database in which the lock target exists, or
  !zero if the lock is a shared object, or
  !null if the lock is on a transaction ID
 
 For consistency, I think it should say target in the second part
 of the sentence also now, instead of lock ... on.
 
 Updated patch attached. I tried to make the descriptions a
 bit more consistent, replaced object by target, and
 added targeted by after the phrase which describes the
 locked (or waited-for) object.
 
 best regards,
 Florian Pflug
 

Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Josh Berkus

 There has never, ever, been a guarantee that the system catalogs don't
 change across versions.  Anybody issuing such queries should expect to
 have to retest them and possibly change them in each new major release.

I know that's always been our policy.  It is, however,
vendor-unfriendly because we don't supply any interface for many things
(such as temp tables) other than the system catalogs.

So if we're going to break compatibility, then we could stand to make a
little noise about it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] Reduced power consumption in WAL Writer process

2011-07-14 Thread Simon Riggs
On Wed, Jul 13, 2011 at 10:56 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Attached is patch for the WAL writer that removes its tight polling
 loop (which probably doesn't get hit often in practice, as we just
 sleep if wal_writer_delay is under a second), and, at least
 potentially, reduces power consumption when idle by using a latch.

 I will break all remaining power consumption work down into
 per-auxiliary process patches. I think that this is appropriate - if
 we hit a snag on one of the processes, there is no need to have that
 hold up everything.

 I've commented that we handle all expected signals, and therefore we
 shouldn't worry about having timeout invalidated by signals, just as
 with the archiver. Previously, we didn't even worry about Postmaster
 death within the tight polling loop, presumably because
 wal_writer_delay is typically small enough to avoid that being a
 problem. I thought that WL_POSTMASTER_DEATH might be superfluous here,
 but then again there is a codepath specifically for the case where
 wal_writer_delay exceeds one second, so it is included in this initial
 version.

 Comments?

ISTM that this in itself isn't enough to reduce power consumption.

Currently the only people that use WALWriter are asynchronous commits,
so we should include within RecordTransactionCommit() a SetLatch()
command for the WALWriter.

That way we can have WALWriter sleep until it's needed.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Magnus Hagander
On Thu, Jul 14, 2011 at 21:59, Josh Berkus j...@agliodbs.com wrote:

 There has never, ever, been a guarantee that the system catalogs don't
 change across versions.  Anybody issuing such queries should expect to
 have to retest them and possibly change them in each new major release.

 I know that's always been our policy.  It is, however,
 vendor-unfriendly because we don't supply any interface for many things
 (such as temp tables) other than the system catalogs.

 So if we're going to break compatibility, then we could stand to make a
 little noise about it.

We've broken the admin apps in pretty much every single release. And
they generally don't complain. If someone developing an admin app
hasn't been doing extensive testing starting *at the latest* with
beta1 (and recommended per each alpha), they shouldn't expect to
release until quite long after the release.

That said, a stable set of system views certainly wouldn't hurt - but
making extra noise about a simple change like this one would likely
not make a difference.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Is there a committer in the house?

2011-07-14 Thread Simon Riggs
On Thu, Jul 14, 2011 at 8:01 PM, Josh Berkus j...@agliodbs.com wrote:

 Currently we have 8 patches Ready for Committer in the current CF.
 Some of them have been that status for some time.

 From traffic on this list, I'm getting the impression that nobody other
 than Robert, Heikki and Tom are committing other people's patches.  I
 know we have more committers than that.  Bruce?  Simon?  Itagaki?  Bueller?

Someone called Simon Riggs has committed one patch by another author
and reviewed 3 others, as well as spending many days working on bugs
for beta.

It would be sensible if people based their comments on actual events
rather than negative impressions.

ISTM that you are right that there are other committers that have not
done anything. How strange that you name myself, yet not others.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Understanding GIN posting trees

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 17:28, Teodor Sigaev wrote:

Why is the posting tree a tree? AFAICS, we never search it using the
TID, it's always scanned in whole. It would be simpler to store the TIDs
in a posting list in no particular order. This could potentially make
insertions cheaper, as you could just append to the last posting list
page for the key, instead of traversing the posting tree to a particular
location. You could also pack the tids denser, as you wouldn't need to
reserve free space for additions in the middle.

For the consistentFn call we need to collect all data for the current tid. We do
that by scanning over logically ordered arrays of tids, and trees allow us to
do that by scanning the leaf pages.


Oh, I see. You essentially do a merge join of all the posting trees of 
query keys.


Hmm, but we do need to scan all the posting trees of all the matched 
keys in whole anyway. We could collect all TIDs in the posting lists of 
all the keys into separate TIDBitmaps, and then combine the bitmaps, 
calling consistentFn for each TID that was present in at least one 
bitmap. I guess the performance characteristics of that would be 
somewhat different from what we have now, and you'd need to keep a lot 
of in-memory bitmaps if the query contains a lot of keys.



While we're at it, it just occurred to me that if the number of query 
keys is limited, say <= 16, you could build a lookup table for each 
combination of keys either occurring or not. You could then use that 
instead of calling consistentFn for each possible match. You could even 
use the table to detect common cases like "all/any keys must match", 
perhaps opening some optimization opportunities elsewhere.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Dave Page
On Thursday, July 14, 2011, Josh Berkus j...@agliodbs.com wrote:

 There has never, ever, been a guarantee that the system catalogs don't
 change across versions.  Anybody issuing such queries should expect to
 have to retest them and possibly change them in each new major release.

 I know that's always been our policy.  It his, however,
 vendor-unfriendly because we don't supply any interface for many things
 (such as temp tables) other than the system catalogs.

 So if we're going to break compatibility, then we could stand to make a
 little noise about it.

As one of said vendors, I completely disagree. There are a ton of
things that change with each release, and all we do by putting in
hacks for backwards compatibility is add bloat that needs to be
maintained, and encourage vendors to be lazy.

Breaking compatibility is actually something that is important to us - it
forces us to fix obvious issues, and makes it much harder to
inadvertently miss important changes.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-07-14 Thread Heikki Linnakangas

On 14.07.2011 11:33, Alexander Korotkov wrote:

On Wed, Jul 13, 2011 at 5:59 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:


One thing that caught my eye is that when you empty a buffer, you load the
entire subtree below that buffer, down to the next buffered or leaf level,
into memory. Every page in that subtree is kept pinned. That is a problem;
in the general case, the buffer manager can only hold a modest number of
pages pinned at a time. Consider that the minimum value for shared_buffers
is just 16. That's unrealistically low for any real system, but the default
is only 32MB, which equals to just 4096 buffers. A subtree could easily be
larger than that.


With level step = 1 we need only 2 levels in the subtree. With the minimum
index tuple size (12 bytes) each page can hold at most 675 index tuples.
Thus I think the default shared_buffers is enough for level step = 1.


Hundreds of buffer pins is still a lot. And with level step = 2, the 
number of pins required explodes to 675^2 = 455625.


Pinning a buffer that's already in the shared buffer cache is cheap, I 
doubt you're gaining much by keeping the private hash table in front of 
the buffer cache. Also, it's possible that not all of the subtree is 
actually required during the emptying, so in the worst case pre-loading 
them is counter-productive.



I believe it's enough
to add a check that we have sufficient shared_buffers, isn't it?


Well, what do you do if you deem that shared_buffers is too small? Fall 
back to the old method? Also, shared_buffers is shared by all backends, 
so you can't assume that you get to use all of it for the index build. 
You'd need a wide safety margin.



I don't think you're benefiting at all from the buffering that BufFile does
for you, since you're reading/writing a full block at a time anyway. You
might as well use the file API in fd.c directly, ie.
OpenTemporaryFile/FileRead/FileWrite.


BufFile distributes temporary data across several files. AFAICS
postgres avoids working with files larger than 1GB. The size of tree buffers
can easily be greater. Without BufFile I would need to maintain the set of
files manually.


Ah, I see. Makes sense.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Small patch for GiST: move childoffnum to child

2011-07-14 Thread Heikki Linnakangas

I think there's two bugs in the existing gistFindPath code:


	if (top->parent && XLByteLT(top->parent->lsn,
		GistPageGetOpaque(page)->nsn) &&
		GistPageGetOpaque(page)->rightlink != InvalidBlockNumber /* sanity check */ )
	{
		/* page splited while we thinking of... */
		ptr = (GISTInsertStack *) palloc0(sizeof(GISTInsertStack));
		ptr->blkno = GistPageGetOpaque(page)->rightlink;
		ptr->childoffnum = InvalidOffsetNumber;
		ptr->parent = top;
		ptr->next = NULL;
		tail->next = ptr;
		tail = ptr;
	}


First, notice that we're setting ptr->parent = top. 'top' is the 
current node we're processing, and ptr represents the node to the right 
of the current node. The current node is *not* the parent of the node to 
the right. I believe that line should be ptr->parent = top->parent.


Second, we're adding the entry for the right sibling to the end of the 
list of nodes to visit. But when we process entries from the list, we 
exit immediately when we see a leaf page. That means that the right 
sibling can get queued up behind leaf pages, and thus never visited.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Is there a committer in the house?

2011-07-14 Thread Josh Berkus

 ISTM that you are right that there are other committers that have not
 done anything. How strange that you name myself, yet not others.

Touchy today, eh?

And I do name others, read the email again.

Anyway, my question is: the list of committers I know of who have
general knowledge of the codebase and can commit a wide variety of other
people's patches are:

Tom
Robert
Heikki
Bruce
Simon

Who else?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread David E. Wheeler
On Jul 14, 2011, at 2:10 PM, Dave Page wrote:

 Breaking compatibility is actually something that is important to us - it
 forces us to fix obvious issues, and makes it much harder to
 inadvertently miss important changes.

Agreed, but a deprecation cycle would be much appreciated.

David




Re: [HACKERS] Is there a committer in the house?

2011-07-14 Thread Bruce Momjian
Josh Berkus wrote:
 
  ISTM that you are right that there are other committers that have not
  done anything. How strange that you name myself, yet not others.
 
 Touchy today, eh?
 
 And I do name others, read the email again.
 
 Anyway, my question is: the list of committers I know of who have
 general knowledge of the codebase and can commit a wide variety of other
 people's patches are:
 
 Tom
 Robert
 Heikki
 Bruce
 Simon

I have found that my reading of the email lists is too delayed to effectively
commit others' patches.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Josh Berkus

 As one of said vendors, I completely disagree. 

I don't agree that you qualify as a vendor.  You're on the friggin' core
team.

I'm talking about vendors like DBVizualizer or TORA, for which
PostgreSQL is just one of the databases they support.  If stuff breaks
gratuitously, the reaction of some of them is always to either drop
support or delay it for a year or more.  This doesn't benefit our community.

 There are a ton of
 things that change with each release, and all we do by putting in
 hacks for backwards compatibility is add bloat that needs to be
 maintained, and encourage vendors to be lazy.

I don't agree that having comprehensive system views with multi-version
stability would be a hack.

 Breaking compatibility is actually something that is important to us - it
 forces us to fix obvious issues, and makes it much harder to
 inadvertently miss important changes.

What I'm hearing from you is: Breaking backwards compatibility is
something we should do more of because it lets us know which vendors are
paying attention and weeds out the unfit.   Is that what you meant to say?

That seems like a way to ensure that PostgreSQL support continue to be
considered optional, or based on outdated versions, for multi-database
tools.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 There are a ton of
 things that change with each release, and all we do by putting in
 hacks for backwards compatibility is add bloat that needs to be
 maintained, and encourage vendors to be lazy.

 I don't agree that having comprehensive system views with multi-version
 stability would be a hack.

If we had that, it wouldn't be a hack.  Putting in a hack to cover the
specific case of relistemp, on the other hand, is just a hack.

The real question here, IMO, is "how many applications are there that
really need to know about temporary relations, but have no interest in
the related feature of unlogged relations?"  Because only such apps
would be served by a compatibility hack for this.  An app that thinks it
knows the semantics of relistemp, and isn't updated to grok unlogged
tables, may be worse than broken --- it may be silently incorrect.

regards, tom lane



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Josh Berkus

 I don't agree that having comprehensive system views with multi-version
 stability would be a hack.
 
 If we had that, it wouldn't be a hack.  Putting in a hack to cover the
 specific case of relistemp, on the other hand, is just a hack.

I agree.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread David E. Wheeler
On Jul 14, 2011, at 3:05 PM, Tom Lane wrote:

 Josh Berkus j...@agliodbs.com writes:
 There are a ton of
 things that change with each release, and all we do by putting in
 hacks for backwards compatibility is add bloat that needs to be
 maintained, and encourage vendors to be lazy.
 
 I don't agree that having comprehensive system views with multi-version
 stability would be a hack.
 
 If we had that, it wouldn't be a hack.

Is that an endorsement for adding such a feature?

 Putting in a hack to cover the
 specific case of relistemp, on the other hand, is just a hack.

Sure.

 The real question here, IMO, is how many applications are there that
 really need to know about temporary relations, but have no interest in
 the related feature of unlogged relations?.  Because only such apps
 would be served by a compatibility hack for this.  An app that thinks it
 knows the semantics of relistemp, and isn't updated to grok unlogged
 tables, may be worse than broken --- it may be silently incorrect.

So pgTAP creates temporary tables to store result sets so that it can then 
compare the results of two queries. The function in question was getting a list 
of columns in such a temporary table in order to make sure that the types were 
the same between two such tables before comparing results. It checked relistemp 
to make sure it was looking at the temp table rather than some other table that 
might happen to have the same name.

So now the query looks like this:

SELECT pg_catalog.format_type(a.atttypid, a.atttypmod)
  FROM pg_catalog.pg_attribute a
  JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
 WHERE c.relname = $1
-- AND c.relistemp -- 8.3-9.0
   AND c.relpersistence = 't'  -- 9.1
   AND attnum > 0
   AND NOT attisdropped
 ORDER BY attnum

Is that all I need to do, or is there something else I should be aware of with 
regard to unlogged tables?

Thanks,

David




[HACKERS] Patch Review: Collect frequency statistics and selectivity estimation for arrays

2011-07-14 Thread Nathan Boley
Review of patch: https://commitfest.postgresql.org/action/patch_view?id=539

I glanced through the code and played around with the feature and,
in general, it looks pretty good. But I have a some comments/questions.

Design:

First, it makes me uncomfortable that you are using the MCV and histogram slot
kinds in a way that is very different from other data types.

I realize that tsvector uses MCV in the same way that you do but:

1) I don't like that very much either.
2) tsvector is different in that equality (in the btree sense)
doesn't make sense, whereas it does for arrays.

Using the histogram slot for the array lengths is also very surprising to me.

Why not just use a new STA_KIND? It's not like we are running out of
room, and this will be the second 'container' type that splits the container
and stores stats about the elements.


Second, I'm fairly unclear about what the actual underlying statistical
method is and what assumptions it makes. Before I can really review
the method, I will probably need to see some documentation, as I say below.
Do you have any tests/simulations that explore the estimation procedure
that I can look at? When will it fail? In what cases does it improve
estimation?

Documentation:

Seems a bit lacking - especially if you keep the non standard usage of
mcv/histograms. Also, it would be really nice to see some update to
the row-estimation-examples page ( in chapter 55 ).

The code comments are good in general, but there are not enough high-level
comments. It would be really nice to have a comment that gives an overview
of what each of the following functions are supposed to do:

mcelem_array_selec
mcelem_array_contain_overlap_selec

and especially

mcelem_array_contained_selec

Also, in the compute_array_stats you reference compute_tsvector_stats for
the algorithm, but I think that it would be smarter to either copy the
relevant portions of the comment over or to reference a published document.
If compute_tsvector_stats changes, I doubt that anyone will remember to
update the comment.

Finally, it would be really nice to see, explicitly, and in
mathematical notation

1) The data that is being collected and summarized. ( the statistic )
2) The estimate being derived from the statistic ( ie the selectivity )
i) Any assumptions being made ( ie independence of elements within
an array )

For instance, the only note I could find that actually addressed the
estimator and the assumptions underneath it was

+* mult * dist[i] / mcelem_dist[i] gives us probability
+* of qual matching from assumption of independent
+* element occurence with condition that length = i.

and that's pretty cryptic. That should be expanded upon and placed
more prominently.

A couple nit picks:

1) In calc_distr you go to some lengths to avoid round off errors. Since it is
   certainly just the order of the estimate that matters, why not just
   perform the calculation in log space?

2) compute_array_stats is really long. Could you break it up? ( I know that
   the other analyze functions are the same way, but why continue in
that vein? )

Best,
Nathan



Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Simon Riggs
On Thu, Jul 14, 2011 at 4:57 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:

 Thanks Simon for looking at the patch.

Sorry, I didn't notice there was a patch attached; I haven't reviewed it. I
thought we were still just talking.


 I am not sure if the use case is really narrow.

This is a very rare issue, because of all the work you and Heikki
have put in.

It's only important when we have a (1) big table (hence long scan
time), (2) a use case that avoids HOT *and* (3) we have dirtied a
large enough section of table that the vacuum map is ineffective and
we need to scan high % of table. That combination is pretty rare, so
penalising everybody else with 8 bytes per block seems too much to me.

Big VACUUMs are a problem, but my observation would be that those are
typically transaction wraparound VACUUMs and the extra writes are not
caused by row removal. So we do sometimes do Phase 2 and Phase 3 even
when there is a very low number of row removals - since not all
VACUUMs are triggered by changes.


 Today, we dirty the pages in
 both the passes and also emit WAL records.

This is exactly the thing I'm suggesting we avoid.

 Just the heap scan can take a
 very long time for large tables, blocking the autovacuum worker threads from
 doing useful work on other tables. If I am not wrong, we use ring buffers
 for vacuum which would most-likely force those buffers to be written/read
 twice to the disk.

I think the problem comes from dirtying too many blocks. Scanning the
tables using the ring buffer is actually fairly cheap. The second scan
only touches the blocks that need secondary cleaning, so the cost of
it is usually much less.

I'm suggesting we write each block at most once, rather than twice as
we do now. Yes, we have to do both scans.

My idea does exactly the same number of writes as yours. On read-only I/O,
your idea is clearly more efficient, but overall not by enough
to justify the 8-byte-per-block overhead, IMHO.


 Which part of the patch you think is very complex ? We can try to simplify
 that. Or are you seeing any obvious bugs that I missed ? IMHO taking out a
 phase completely from vacuum (as this patch does) can simplify things.

I have great faith in your talents, just not sure about this
particular use of them. I'm sorry to voice them now you've written the
patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Review of patch Bugfix for XPATH() if expression returns a scalar value

2011-07-14 Thread Florian Pflug
Hi Radosław,

Do you plan to comment on this patch (the one that adds support
for XPATH expressions returning scalar values, not the one that
escapes text and attributes nodes which are selected) further,
or should it be marked Ready for Committer?

I'm asking because I just noticed that you marked the other patch
as Ready for Committer, but not this one.

best regards,
Florian Pflug




Re: [HACKERS] Single pass vacuum - take 1

2011-07-14 Thread Pavan Deolasee
On Thu, Jul 14, 2011 at 6:22 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, Jul 14, 2011 at 4:57 PM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:

  Thanks Simon for looking at the patch.

  Sorry, I didn't notice there was a patch attached; I haven't reviewed it. I
  thought we were still just talking.


No problem. Please review it when you have time.



  I am not sure if the use case is really narrow.

 This is a very rare issue, because of all the work yourself and Heikki
 have put in.


I don't think it's a rare case, since vacuum on any table, small or large, would
take two passes today. Avoiding one of the two would help the system in
general. HOT and other things help to a great extent, but unfortunately they
don't make vacuum completely go away. So we still want to do vacuum in the
most efficient way.


 It's only important when we have a (1) big table (hence long scan
 time), (2) a use case that avoids HOT *and* (3) we have dirtied a
 large enough section of table that the vacuum map is ineffective and
 we need to scan high % of table. That combination is pretty rare, so
 penalising everybody else with 8 bytes per block seems too much to me.


The additional space is allocated only for those pages which have dead-vacuum
line pointers, and it stays only until the next HOT-prune operation on the
page. So not everybody pays the penalty, even if we assume it's too much
and if that's what worries you most.



 I have great faith in your talents, just not sure about this
 particular use of them.


Not sure if that's a compliment or a criticism :-)


 I'm sorry to voice them now you've written the
 patch.


Yeah, I would have liked it if you had said this earlier, but I
appreciate comments any time.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread Robert Haas
On Jul 14, 2011, at 5:13 PM, David E. Wheeler da...@kineticode.com wrote:
 On Jul 14, 2011, at 3:05 PM, Tom Lane wrote:
 
 Josh Berkus j...@agliodbs.com writes:
 There are a ton of
 things that change with each release, and all we do by putting in
 hacks for backwards compatibility is add bloat that needs to be
 maintained, and encourage vendors to be lazy.
 
 I don't agree that having comprehensive system views with multi-version
 stability would be a hack.
 
 If we had that, it wouldn't be a hack.
 
 Is that an endorsement for adding such a feature?
 
 Putting in a hack to cover the
 specific case of relistemp, on the other hand, is just a hack.
 
 Sure.
 
 The real question here, IMO, is how many applications are there that
 really need to know about temporary relations, but have no interest in
 the related feature of unlogged relations?.  Because only such apps
 would be served by a compatibility hack for this.  An app that thinks it
 knows the semantics of relistemp, and isn't updated to grok unlogged
 tables, may be worse than broken --- it may be silently incorrect.
 
 So pgTAP creates temporary tables to store result sets so that it can then 
 compare the results of two queries. The function in question was getting a 
 list of columns in such a temporary table in order to make sure that the 
 types were the same between two such tables before comparing results. It 
 checked relistemp to make sure it was looking at the temp table rather than 
 some other table that might happen to have the same name.
 
 So now the query looks like this:
 
SELECT pg_catalog.format_type(a.atttypid, a.atttypmod)
  FROM pg_catalog.pg_attribute a
  JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
 WHERE c.relname = $1
 -- AND c.relistemp -- 8.3-9.0
   AND c.relpersistence = 't'  -- 9.1
   AND attnum > 0
   AND NOT attisdropped
 ORDER BY attnum
 
 Is that all I need to do, or is there something else I should be aware of 
 with regard to unlogged tables?

Probably not, in this case. Just a thought: maybe you could rewrite the query 
to check whether the namespace name starts with "pg_temp". Maybe that would be 
version-independent...

...Robert


Re: [HACKERS] pg_class.relistemp

2011-07-14 Thread David E. Wheeler
On Jul 14, 2011, at 6:43 PM, Robert Haas wrote:

 Is that all I need to do, or is there something else I should be aware of 
 with regard to unlogged tables?
 
 Probably not, in this case. Just a thought: maybe you could rewrite the query 
 to check whether the namespace name starts with pg_temp. Maybe that would be 
 version-independent...

Ah, good idea, I forgot about pg_temp.

David




Re: [HACKERS] per-column generic option

2011-07-14 Thread Shigeru Hanada
(2011/07/15 4:17), Josh Berkus wrote:
 All,
 
 Is the spec for this feature still under discussion?  I don't seem to
 see a consensus on this thread.

Yeah, a remaining concern is whether generic (FDW) options should be
separated from existing attoptions or not.

Indeed, reloptions/attoptions mechanism seems to be also applicable to
generic options, since both need to store multiple key-value pairs, but
IMHO generic options should be separated from reloptions/attoptions for
several reasons:

1) FDW options are handled only by the FDW, but reloptions/attoptions are
handled by PG core modules such as the planner, AMs and autovacuum.  If we
could separate them completely they would be able to share one attribute,
but I worry that some reloptions/attoptions make sense for some FDWs.
For instance, n_distinct might be useful to control the cost of a foreign
table scan.  (Though attoptions can't be set via CREATE/ALTER FOREIGN
TABLE yet.)

2) In the future, a newly added option might conflict with somebody's FDW
option.  Robert Haas pointed out this issue some days ago.  An FDW validator
would reject unknown options, so every FDW would have to know all of the
reloptions/attoptions to avoid this issue.

3) It would be difficult to unify the syntax used to set options, from the
perspective of backward compatibility and syntax consistency.  Other SQL/MED
objects use the syntax OPTIONS (key 'value', ...), but reloptions/attoptions
use the syntax SET (key = value, ...).  Without syntax unification, tools
would have to check relkind before handling attoptions.  For instance,
pg_dump would have to choose the syntax used to dump attoptions.  That
seems like undesired complexity.

Any comments/objections/questions are welcome.

Regards,
-- 
Shigeru Hanada



Re: [HACKERS] Re: patch review : Add ability to constrain backend temporary file space

2011-07-14 Thread Tatsuo Ishii
 I have looked into the v6 patches. One thing I would like to suggest
 is, enhancing the error message when temp_file_limit will be exceeded.

 ERROR:  aborting due to exceeding temp file limit

 Is it possible to add the current temp file limit to the message? For
 example,

 ERROR:  aborting due to exceeding temp file limit 1kB

 I know the current setting of temp_file_limit can be viewed in other
 ways, but I think this will make an admin's or application developer's
 life a little bit easier.
 
 Hi Tatsuo,
 
 Yeah, good suggestion - I agree that it would be useful to supply
 extra detail, I'll amend and resubmit a new patch (along with whatever
 review modifications we come up with in the meantime)!
 
 Cheers
 
 Mark

Maybe we could add more info regarding current usage and requested
amount in addition to the temp file limit value. I mean something
like:

ERROR:  aborting due to exceeding temp file limit. Current usage 9000kB, 
requested size 1024kB, thus it would exceed the temp file limit of 1kB.

What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] Is there a committer in the house?

2011-07-14 Thread Peter Eisentraut
On tor, 2011-07-14 at 12:01 -0700, Josh Berkus wrote:
 Currently we have 8 patches Ready for Committer in the current CF.
 Some of them have been that status for some time.
 
 From traffic on this list, I'm getting the impression that nobody other
 than Robert, Heikki and Tom are committing other people's patches.  I
 know we have more committers than that.  Bruce?  Simon?  Itagaki?  Bueller?

I'm still working on preparing 9.1.  I'm not following 9.2 development
yet.





[HACKERS] ON COMMIT action not catalogued?

2011-07-14 Thread Peter Eisentraut
(Going through some loose ends in the information schema ...)

Is my understanding right that the ON COMMIT action of temporary tables
is not recorded in the system catalogs anywhere?  Would make sense,
wouldn't be a big problem, just want to make sure I didn't miss
anything.





[HACKERS] SSI error messages

2011-07-14 Thread Peter Eisentraut
Some of these new error messages from the SSI code are a mouthful:

not enough elements in RWConflictPool to record a rw-conflict
not enough elements in RWConflictPool to record a potential rw-conflict

These are basically "out of shared memory" conditions; the specifics could be
moved to a DETAIL message.

Canceled on identification as a pivot, during conflict out checking.
Canceled on conflict out to old pivot %u.
Canceled on identification as a pivot, with conflict out to old committed 
transaction %u.
Canceled on conflict out to old pivot.
Canceled on identification as a pivot, during conflict in checking.
Canceled on identification as a pivot, during write.
Canceled on conflict out to pivot %u, during read.
Canceled on identification as a pivot, during commit attempt.
Canceled on commit attempt with conflict in from prepared pivot.

These are DETAIL messages, with the rest of the report saying

ERROR:  could not serialize access due to read/write dependencies among 
transactions
HINT:  The transaction might succeed if retried.

AFAICT, the documentation doesn't mention anything about this "pivot"
that keeps coming up.  Is it useful to have the user face this
information?  Is there anything a user can derive from seeing one of
these messages versus another, in addition to the error and hint,
that would help them address the situation?


