Re: [HACKERS] Re: patch review : Add ability to constrain backend temporary file space

2011-06-24 Thread Mark Kirkwood

On 22/06/11 11:13, Mark Kirkwood wrote:

On 21/06/11 02:39, Cédric Villemain wrote:

2011/6/20 Robert Haas robertmh...@gmail.com:

On Mon, Jun 20, 2011 at 9:15 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com  wrote:

The feature does not work exactly as expected: the write limit
is rounded per 8kB because we write before checking. I believe if one
writes a file of 1GB in one pass (instead of repeated 8kB increments),
and the temp_file_limit is 0, then the server will write the 1GB
before aborting.

Can we rearrange things so that we check first, and then write?

Probably, but it needs more work to catch corner cases. We may be safe
to just document that (and also in the code). The only way I see so
far to get a larger value than 8kB here is to have a plugin doing the
sort instead of the PostgreSQL core sort algorithm.




Thanks guys - will look at moving the check, and adding some 
documentation about the possible impacts of plugins (or new executor 
methods) that might write in chunks bigger than blocksz.


Maybe a few days - I'm home sick ATM, plus looking after these 
http://www.maftet.co.nz/kittens.html





This version moves the check *before* we write the new buffer, so it 
should take care of the issues with really large write buffers, plugins 
etc. Also I *think* that means there is no need to amend the documentation.
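For what it's worth, the check-before-write logic under discussion can be sketched as follows. This is an illustrative model only; the names (temp_file_limit, temp_bytes_written, temp_write_allowed) are hypothetical and not the patch's actual identifiers.

```c
/* Sketch of "check before write" accounting for temporary file space.
 * All names here are illustrative, not the actual patch's identifiers.
 * The limit is in kB, mimicking a GUC-style setting; -1 disables it. */
#include <errno.h>
#include <stddef.h>

static long temp_file_limit = 1024;   /* kB; -1 means no limit */
static size_t temp_bytes_written = 0; /* bytes this backend has written */

/* Return 0 if a write of 'amount' bytes is allowed and account for it,
 * or -1 (with errno = ENOSPC) if it would exceed the limit.  Because the
 * check happens before any bytes hit disk, even a single 1GB write cannot
 * overshoot the limit the way a check-after-write would. */
static int
temp_write_allowed(size_t amount)
{
    if (temp_file_limit >= 0 &&
        temp_bytes_written + amount > (size_t) temp_file_limit * 1024)
    {
        errno = ENOSPC;
        return -1;
    }
    temp_bytes_written += amount;
    return 0;
}
```

With a 1024 kB limit, a write that exactly reaches the limit is allowed, while the next byte is rejected before being written.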


Cheers

Mark

P.S.: Hopefully I've got a context diff this time; just realized that git 
diff -c does *not* produce a context diff (doh).




temp-files-v6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] debugging tools inside postgres

2011-06-24 Thread HuangQi
Hi,
   I'm trying to debug a modification for the query planner. But I found it
seems the data structure of my planned query is incorrect. I was trying to
print out the data structure by use the p command in gdb which is quite
inconvenient and takes time. May I know is there any embedded function in
postgres to print out the node data structures, or any other plan related
data structures? Thanks.

-- 
Best Regards
Huang Qi Victor


Re: [HACKERS] WIP: Fast GiST index build

2011-06-24 Thread Heikki Linnakangas

On 21.06.2011 13:08, Alexander Korotkov wrote:

I've created section about testing in project wiki page:
http://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
Do you have any notes about table structure?


It would be nice to have links to the datasets and scripts used, so that 
others can reproduce the tests.


It's surprising that the search time differs so much between the 
point_ops tests with uniformly random data with 100M and 10M rows. Just 
to be sure I'm reading it correctly: a small search time is good, right? 
You might want to spell that out explicitly.



As you can see I found that CPU usage might be much higher
with gist_trgm_ops.


Yeah, that is a bit worrisome: 6 minutes without the patch and 18 
minutes with it.



I believe it's due to relatively expensive penalty
method in that opclass.


Hmm, I wonder if it could be optimized. I did a quick test, creating a 
gist_trgm_ops index on a list of English words from 
/usr/share/dict/words. oprofile shows that with the patch, 60% of the 
CPU time is spent in the makesign() function.
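To illustrate why the penalty path is so hot: a signature-based opclass ends up computing something like makesign() for every penalty call. The following toy model (illustrative only, far simpler than the real gist_trgm_ops code, with a single 64-bit signature word) shows the cost structure; caching the per-tuple signature instead of recomputing it per call is the obvious optimization target.

```c
/* Toy model of a trigram-signature opclass.  NOT the real gist_trgm_ops
 * code: the real signature is much wider and uses a proper trigram hash. */
#include <stdint.h>
#include <string.h>

#define SIGLEN 64  /* toy signature: one 64-bit word */

/* Toy analogue of makesign(): hash each 3-byte window ("trigram") of the
 * key into a bit of the signature.  If a penalty function calls this for
 * every candidate tuple on every invocation, it dominates CPU time. */
static uint64_t
toy_makesign(const char *key)
{
    uint64_t sig = 0;
    size_t len = strlen(key);
    for (size_t i = 0; i + 3 <= len; i++)
    {
        uint32_t h = (uint32_t) (key[i] * 31 * 31 + key[i + 1] * 31 + key[i + 2]);
        sig |= UINT64_C(1) << (h % SIGLEN);
    }
    return sig;
}

/* Penalty: how many bits inserting the new key would add to an entry's
 * signature -- computed purely from signatures, so the expensive part is
 * producing new_sig, not this popcount loop. */
static int
toy_penalty(uint64_t entry_sig, uint64_t new_sig)
{
    uint64_t added = new_sig & ~entry_sig;
    int count = 0;
    while (added)
    {
        count += (int) (added & 1);
        added >>= 1;
    }
    return count;
}
```

A key whose signature is already covered by the entry gets penalty 0; a 3-byte key sets exactly one bit against an empty entry.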



But, probably index build can be still faster when
index doesn't fit cache even for gist_trgm_ops.


Yep.


Also with that opclass index
quality is slightly worse but the difference is not dramatic.


5-10% difference should be acceptable.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread Shigeru Hanada
(2011/06/24 15:35), HuangQi wrote:
 Hi,
 I'm trying to debug a modification for the query planner. But I found it
 seems the data structure of my planned query is incorrect. I was trying to
 print out the data structure by use the p command in gdb which is quite
 inconvenient and takes time. May I know is there any embedded function in
 postgres to print out the node data structures, or any other plan related
 data structures? Thanks.

I think nodeToString() would help.  This function converts node tree
recursively into a string, and it's applicable for any Node-derived
object, such as Expr, Var and Const.

ex)
elog(DEBUG1, "%s", nodeToString(plan));

Regards,
-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Fast GiST index build

2011-06-24 Thread Alexander Korotkov
On Fri, Jun 24, 2011 at 12:40 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 21.06.2011 13:08, Alexander Korotkov wrote:

 I've created section about testing in project wiki page:
 http://wiki.postgresql.org/**wiki/Fast_GiST_index_build_**
 GSoC_2011#Testing_resultshttp://wiki.postgresql.org/wiki/Fast_GiST_index_build_GSoC_2011#Testing_results
 Do you have any notes about table structure?


 It would be nice to have links to the datasets and scripts used, so that
 others can reproduce the tests.

Done.


 It's surprising that the search time differs so much between the point_ops
 tests with uniformly random data with 100M and 10M rows. Just to be sure I'm
 reading it correctly: a small search time is good, right? You might want to
 spell that out explicitly.

Yes, you're reading this correctly. Detailed explanation was added to the
wiki page. It's surprising for me too. I need some more insight into the
causes of the index quality difference.

Now I found some large enough real-life datasets (thanks to Oleg Bartunov)
and I'm performing tests on them.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread HuangQi
2011/6/24 Shigeru Hanada shigeru.han...@gmail.com

 (2011/06/24 15:35), HuangQi wrote:
  Hi,
  I'm trying to debug a modification for the query planner. But I found it
  seems the data structure of my planned query is incorrect. I was trying to
  print out the data structure by use the p command in gdb which is quite
  inconvenient and takes time. May I know is there any embedded function in
  postgres to print out the node data structures, or any other plan related
  data structures? Thanks.

 I think nodeToString() would help.  This function converts node tree
 recursively into a string, and it's applicable for any Node-derived
 object, such as Expr, Var and Const.

 ex)
   elog(DEBUG1, "%s", nodeToString(plan));

 Regards,
 --
 Shigeru Hanada


Hi,
   I don't know why, but when I am debugging the query evaluation, the elog
function cannot output to the shell.


-- 
Best Regards
Huang Qi Victor


[HACKERS] silent_mode and LINUX_OOM_ADJ

2011-06-24 Thread Heikki Linnakangas
While reviewing Peter Geoghegan's postmaster death patch, I noticed that 
if you turn on silent_mode, the LINUX_OOM_ADJ code in fork_process() 
runs when postmaster forks itself into background. That re-enables the 
OOM killer in postmaster, if you've disabled it in the startup script by 
adjusting /proc/self/oom_adj. That seems like a bug, albeit a pretty 
minor one.


This may be a dumb question, but what is the purpose of silent_mode? 
Can't you just use nohup?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Noah Misch
Hi Florian,

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
 Updated patch attached.

This builds and passes all tests on both test systems I used previously
(CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
2.6.31.dfsg-2ubuntu1.6).  Tested behaviors look good, too.

 On Jun20, 2011, at 22:45 , Noah Misch wrote:
  On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
  On Jun20, 2011, at 19:57 , Noah Misch wrote:
  Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
  available and xmlGenericErrorContext otherwise, I would just add an Autoconf
  check to identify which one to use.
 
 It seems that before xmlStructuredErrorContext was introduced, the structured
 and the generic error handler shared an error context, so doing what you
 suggested looks sensible.
 
 I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
 this the right way. I basically copied a similar check (for setlongjump AFAIR)
 and adapted it to check for xmlStructuredErrorContext instead.

Turned out fine.  Note that, more often than not, committers ask for patches
not to contain the diff of the generated configure itself.  (I personally
don't mind it when the diff portion is small and generated with the correct
version of Autoconf, as in this patch.)

  !XML_PURPOSE_LEGACY /* Save error message only, don't set error 
  flag */,
  
  It's fine to set the flag for legacy users, considering they could just 
  not
  read it.  The important distinction seems to be that we don't emit 
  warnings or
  notices in this case.  Is that correct?  If so, the comment should reflect
  that emphasis.  Then, consider updating the flag unconditionally.
  
  It took me a while to remember why I did it that way, but I think I have 
  now.
  
  I initially wanted to add an Assert(!xml_error_occurred) to catch any
  missing XML_REPORT_ERROR() calls. I seem to have forgotten to do that,
  though...
  
  So I guess I should either refrain from setting the flag as you suggested,
  or add such an Assert(). I think I very slightly prefer the latter, what
  do you think?
  
  I do like the idea of that assert.  How about setting the flag anyway, but
  making it Assert(xml_purpose == XML_PURPOSE_LEGACY || 
  !xml_error_occurred)?
 
 I added the Assert, but didn't make the setting of the error flag 
 unconditional.
 
 If I did that, XML_CHECK_AND_EREPORT() would stop working for the LEGACY
 case. Now, that case currently isn't exercised, but breaking it nevertheless
 seemed wrong to me.

Okay.

  *** a/src/test/regress/expected/xml.out
  --- b/src/test/regress/expected/xml.out
  *** INSERT INTO xmltest VALUES (3, '<wrong');
  *** 8,14 ****
    ERROR:  invalid XML content
    LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                            ^
  ! DETAIL:  Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
  <wrong
   ^
    SELECT * FROM xmltest;
  --- 8,14 ----
    ERROR:  invalid XML content
    LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
                                            ^
  ! DETAIL:  line 1: Couldn't find end of Start Tag wrong line 1
  
  Any reason to change the error text this way?
  
  The "Entity:" prefix is added by libxml only for messages reported
  to the generic error handler. It *doesn't* refer to entities as defined
  in XML, but is instead used in place of the file name if libxml
  doesn't have one at hand (because it's parsing from memory).
  
  So that "Entity:" prefix really doesn't have any meaning whatsoever.
  
  So xmlParserPrintFileContext() sends different content to the generic error
  handler from what natural calls to the handler would send?
 
 xmlParserPrintFileContext() only generates the context part of the error
 message. In the example above, these are the two lines
   <wrong
    ^
 These lines don't contain the "Entity:" prefix - neither with the patch
 attached nor without.

Ah, my mistake.

  I believe that the compatibility risk is pretty low here, and removing
  the meaningless prefix makes the error message a bit nicer to read.
  But if you are concerned that there may be applications out there that
  parse the error text, we could of course add it back. I must say that
  I don't really know what our policy regarding error message stability is...
  
  If the libxml2 API makes it a major pain to preserve exact messages, I
  wouldn't worry.
 
 It'd only require us to prepend "Entity: " to the message string, so
 there's no pain there. The question is rather whether we want to continue
 having a pure noise word in front of every libxml error message for
 the sake of compatibility.

There's also the " parser error : " difference; would that also be easy to
re-add?  (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

 I vote Nay, on the grounds that I estimate the actual breakage from
 such a change to be approximately zero. Plus the fact that 

Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-24 Thread Fujii Masao
On Wed, Jun 22, 2011 at 9:11 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
                rc = WaitForMultipleObjects(numevents, events, FALSE,
                                            (timeout >= 0) ? (timeout / 1000) : INFINITE);
 -               if (rc == WAIT_FAILED)
 +               if ((wakeEvents & WL_POSTMASTER_DEATH) &&
 +                       !PostmasterIsAlive(true))

 After WaitForMultipleObjects() detects the death of the postmaster,
 WaitForSingleObject() is called in PostmasterIsAlive(). In this case, what
 code does WaitForSingleObject() return? I wonder whether WaitForSingleObject()
 returns a code other than WAIT_TIMEOUT and can really detect the death of
 the postmaster.

 As noted up-thread, the fact that the archiver does wake and finish on
 Postmaster death can be clearly observed on Windows. I'm not sure why
 you wonder that, as this is fairly standard use of
 PostmasterIsAlive().

I was afraid that, if PostmasterHandle is an auto-reset event object, its
event state would be automatically reset just after WaitForMultipleObjects()
detects the postmaster death event. But your observation proved that my
concern was not right.

I have another comments:

+#ifndef WIN32
+   /*
+* Initialise mechanism that allows waiting latch clients
+* to wake on postmaster death, to finish their
+* remaining business
+*/
+   InitPostmasterDeathWatchHandle();
+#endif

Calling this function before creating TopMemoryContext looks unsafe. What if
the function calls ereport(FATAL)?

That ereport() can be called before postgresql.conf is read, i.e., before GUCs
for error reporting are set. Is this OK? If not,
InitPostmasterDeathWatchHandle()
should be moved after SelectConfigFiles().

+#ifndef WIN32
+int postmaster_alive_fds[2];
+#endif

postmaster_alive_fds[] should be initialized to {-1, -1}?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-24 Thread Peter Geoghegan
On 24 June 2011 12:30, Fujii Masao masao.fu...@gmail.com wrote:
 +#ifndef WIN32
 +       /*
 +        * Initialise mechanism that allows waiting latch clients
 +        * to wake on postmaster death, to finish their
 +        * remaining business
 +        */
 +       InitPostmasterDeathWatchHandle();
 +#endif

 Calling this function before creating TopMemoryContext looks unsafe. What if
 the function calls ereport(FATAL)?

 That ereport() can be called before postgresql.conf is read, i.e., before GUCs
 for error reporting are set. Is this OK? If not,
 InitPostmasterDeathWatchHandle()
 should be moved after SelectConfigFiles().

I see no reason to take the risk that it might at some point - I've moved it.

 +#ifndef WIN32
 +int postmaster_alive_fds[2];
 +#endif

 postmaster_alive_fds[] should be initialized to {-1, -1}?

Yes, they should. That works better.
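The pipe-based death-detection mechanism under review works roughly as sketched below. This is a simplified illustration, not the actual patch: the postmaster holds the write end of a pipe, so when it exits the read end sees EOF, which a blocking read (or select/poll) in any child immediately observes. Initializing the fds to {-1, -1} makes "not yet set up" unmistakable, as suggested above.

```c
/* Simplified sketch of postmaster-death detection via a pipe.
 * Names mirror the patch's postmaster_alive_fds but the helper
 * functions here are illustrative only. */
#include <unistd.h>

static int postmaster_alive_fds[2] = {-1, -1};  /* {read end, write end} */

/* Postmaster side: create the pipe before forking children.
 * Returns 0 on success, -1 on failure. */
static int
init_death_watch(void)
{
    return pipe(postmaster_alive_fds);
}

/* Child side: a read() returning 0 (EOF) means every copy of the write
 * end has been closed, i.e. the parent is gone.  While the parent lives
 * (and never writes), this read would block -- which is exactly why the
 * read fd is handed to select()/poll() in the real latch code. */
static int
parent_is_alive_fd(int read_fd)
{
    char c;
    ssize_t n = read(read_fd, &c, 1);
    return n != 0;
}
```

Closing the write end (as the kernel does when the postmaster exits) makes the read end report EOF instantly, with no polling interval.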

I think that Heikki is currently taking another look at my work,
because he indicates in a new message to the list a short time ago
that while reviewing my patch, he realised that there may be an
independent problem with silent_mode. I will wait for his remarks
before producing another version of the patch that incorporates those
two small changes.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Online base backup from the hot-standby

2011-06-24 Thread Steve Singer
On 11-06-24 12:41 AM, Jun Ishiduka wrote:

 The logic that does not use pg_stop_backup() would be difficult,
 because pg_stop_backup() is used to identify minRecoveryPoint.


Considering everything that has been discussed on this thread so far.

Do you still think your patch is the best way to accomplish base backups
from standby servers?
If not what changes do you think should be made?


Steve

 
 Jun Ishizuka
 NTT Software Corporation
 TEL:045-317-7018
 E-Mail: ishizuka@po.ntts.co.jp
 





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread Shigeru Hanada
(2011/06/24 19:14), HuangQi wrote:
 2011/6/24 Shigeru Hanada shigeru.han...@gmail.com
 
 (2011/06/24 15:35), HuangQi wrote:
 ex)
  elog(DEBUG1, "%s", nodeToString(plan));
 
 Hi,
 I don't know why but when I am debugging the query evaluation, the elog
 function can not output to shell.

What kind of tool do you use to execute the query to be evaluated?

If you are using an interactive tool such as psql, please check setting
of client_min_messages.  Otherwise, please check settings of
log_destination, logging_collector and log_min_messages to ensure that
elog() prints debugging information into your server log file, or stderr
of the terminal which has been used to start PostgreSQL server.

Regards,
-- 
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Peter Eisentraut
On tor, 2011-06-23 at 21:39 -0400, Bruce Momjian wrote:
 I have created the following patch which uses 25432 as the default port
 number for pg_upgrade.

I don't think we should just steal a port from the reserved range.
Picking a random port from the private/dynamic range seems more
appropriate.

 It also creates two new environment variables,
 OLDPGPORT and NEWPGPORT, to control the port values because we don't
 want to default to PGPORT anymore.

I would prefer that all PostgreSQL-related environment variables start
with PG.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 9:04 AM, Peter Eisentraut pete...@gmx.net wrote:
 It also creates two new environment variables,
 OLDPGPORT and NEWPGPORT, to control the port values because we don't
 want to default to PGPORT anymore.

 I would prefer that all PostgreSQL-related environment variables start
 with PG.

+1

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Florian Pflug
On Jun24, 2011, at 13:24 , Noah Misch wrote:
 On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
 Updated patch attached.
 
 This builds and passes all tests on both test systems I used previously
 (CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
 2.6.31.dfsg-2ubuntu1.6).  Tested behaviors look good, too.
 
 On Jun20, 2011, at 22:45 , Noah Misch wrote:
 On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
 On Jun20, 2011, at 19:57 , Noah Misch wrote:
 Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
 available and xmlGenericErrorContext otherwise, I would just add an Autoconf
 check to identify which one to use.
 
 It seems that before xmlStructuredErrorContext was introduced, the structured
 and the generic error handler shared an error context, so doing what you
 suggested looks sensible.
 
 I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
 this the right way. I basically copied a similar check (for setlongjump AFAIR)
 and adapted it to check for xmlStructuredErrorContext instead.
 
 Turned out fine.  Note that, more often than not, committers ask for patches
 not to contain the diff of the generated configure itself.  (I personally
 don't mind it when the diff portion is small and generated with the correct
 version of Autoconf, as in this patch.)

Ah, OK. I didn't know what project policy was there, so I figured I'd include
it since the configure script is tracked by git too. Now that I know, I'll 
remove
it from the final patch.

 I believe that the compatibility risk is pretty low here, and removing
 the meaningless prefix makes the error message a bit nicer to read.
 But if you are concerned that there may be applications out there that
 parse the error text, we could of course add it back. I must say that
 I don't really know what our policy regarding error message stability is...
 
 If the libxml2 API makes it a major pain to preserve exact messages, I
 wouldn't worry.
 
 It'd only require us to prepend "Entity: " to the message string, so
 there's no pain there. The question is rather whether we want to continue
 having a pure noise word in front of every libxml error message for
 the sake of compatibility.
 
 There's also the " parser error : " difference; would that also be easy to
 re-add?  (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

It's the string representation of the error domain and error level. 
Unfortunately,
the logic which builds that string representation is contained in the static
function xmlReportError() deep within libxml, and that function doesn't seem
to be called at all for errors reported via a structured error handler :-(

So re-adding that would mean duplicating that code within our xml.c, which
seems undesirable...

 I vote Nay, on the grounds that I estimate the actual breakage from
 such a change to be approximately zero. Plus the fact that libxml
 error messages aren't completely stable either. I had to suppress the
 DETAIL output for one of the regression tests to make them work for
 both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
 URI in it's error message in one of these versions but not the other...
 
 I'm not particularly worried about actual breakage; I'm just putting on the
 "one change per patch" lawyer hat.  All other things being equal, I'd prefer
 one patch that changes the mechanism for marshalling errors while minimally
 changing the format of existing errors, then another patch that proposes new
 formatting.  (Granted, the formatting-only patch would probably founder.)  But
 it's a judgement call, and this change is in a grey area.  I'm fine with
 leaving it up to the committer.

I agree with that principle, in principle. In the light of my paragraph above
about how we'd need to duplicate libxml code to keep the error messages the 
same,
however, I'll leave it up to the committer as you suggested.

 A few minor code comments:
 
 *** a/src/backend/utils/adt/xml.c
 --- b/src/backend/utils/adt/xml.c
 
 + static bool xml_error_initialized = false;
 
 Since xml_error_initialized is only used for asserts and now redundant with
 asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

You're absolutely right. Will remove.

 ! /*
 !  * pg_xml_done --- restore libxml state after pg_xml_init().
 !  *
 !  * This must be called if pg_xml_init() was called with a
 !  * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
 !  * (i.e. the structured error handler) to the original state.
 
 The first sentence is obsolete.

Right again. Will remove.

 *** a/src/include/utils/xml.h
 --- b/src/include/utils/xml.h
 *** typedef enum
 *** 68,74 
  XML_STANDALONE_OMITTED
  }   XmlStandaloneType;
 
 ! extern void pg_xml_init(void);
  extern void xml_ereport(int level, int sqlcode, const char *msg);
  extern xmltype *xmlconcat(List *args);
  extern xmltype 

Re: [HACKERS] silent_mode and LINUX_OOM_ADJ

2011-06-24 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of vie jun 24 07:01:57 -0400 2011:
 While reviewing Peter Geoghegan's postmaster death patch, I noticed that 
 if you turn on silent_mode, the LINUX_OOM_ADJ code in fork_process() 
 runs when postmaster forks itself into background. That re-enables the 
 OOM killer in postmaster, if you've disabled it in the startup script by 
 adjusting /proc/self/oom_adj. That seems like a bug, albeit a pretty 
 minor one.
 
 This may be a dumb question, but what is the purpose of silent_mode? 
 Can't you just use nohup?

I think silent_mode is an artifact from when our daemon handling in
general was a lot more primitive (I bet there wasn't even pg_ctl then).
Maybe we could discuss removing it altogether.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade version check improvements and small fixes

2011-06-24 Thread Dan McGee
On Wed, Jun 22, 2011 at 8:54 PM, Bruce Momjian br...@momjian.us wrote:
 0003 is what I really wanted to solve, which was my failure with
 pg_upgrade. The call to pg_ctl didn't succeed because the binaries
 didn't match the data directory, thus resulting in this:

 The error had nothing to do with trust at all; it was simply that I
 tried to use 9.0 binaries with an 8.4 data directory. My patch checks
 for this and ensures that the -D bindir is the correct version, just
 as the -B datadir has to be the correct version.

 I had not thought about testing if the bin and data directory were the
 same major version, but you are right that it generates an odd error if
 they are not.

 I changed your sscanf format string because the version can be just
 9.2dev and there is no need for the minor version.
No problem- you're going to know way more about this than me, and I
was just going off what I saw in my two installed versions and a quick
look over at the code of pg_ctl to make sure that string was never
translated.

On a side note I don't think I ever mentioned to everyone else why
parsing the version from pg_ctl rather than pg_config was done- this
is so we don't introduce any additional binary requirements. Tom Lane
noted in an earlier cleanup series that pg_config is not really needed
at all in the old cluster, so I wanted to keep it that way, but pg_ctl
has always been required.
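The version-parsing detail Bruce mentions (strings like "9.2devel" carrying no patch-level number) can be sketched like this. The format string and helper name are illustrative, not the exact code committed to pg_upgrade:

```c
/* Sketch of scanning only the major version out of "pg_ctl --version"
 * output, so "9.2devel" parses the same way as "9.0.4".  The helper name
 * and return encoding are illustrative. */
#include <stdio.h>

/* Parse a line like "pg_ctl (PostgreSQL) 9.0.4" or
 * "pg_ctl (PostgreSQL) 9.2devel".  Returns 100 * first component +
 * second component (e.g. 9.0 -> 900, 9.2 -> 902), or -1 on mismatch.
 * Deliberately ignores anything after the second number, so a trailing
 * ".4" or "devel" makes no difference. */
static int
parse_pg_version(const char *line)
{
    int major, minor;

    if (sscanf(line, "pg_ctl (PostgreSQL) %d.%d", &major, &minor) != 2)
        return -1;
    return major * 100 + minor;
}
```

This is why scanning for the patch-level component would be wrong: sscanf would still succeed on "9.2devel" with older two-number formats, but a three-number format string would fail on development versions.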

   I saw no reason
 to test if the binary version matches the pg_upgrade version because we
 already test the cluster version, and we test the cluster version is the
 same as the binary version.
Duh. That makes sense. Thanks for getting to these so quickly!

To partially toot my own horn but also show where a user like me
encountered this, after some packaging hacking, anyone running Arch
Linux should be able to do a pg_upgrade from here on out by installing
the postgresql-old-upgrade package
(http://www.archlinux.org/packages/extra/x86_64/postgresql-old-upgrade/)
and possibly consulting the Arch wiki.

-Dan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Pgbuildfarm-members] CREATE FUNCTION hang on test machine polecat on HEAD

2011-06-24 Thread Robert Creager

Got another one (no env since the last changes).  I'll try and run the kept 
install from when I killed 22853 (had to use -9) to see if it reproduces the 
problem with the same binaries.  Is there interest in me removing the perl in 
/opt/local/bin/perl?  I can install ccache elsewhere and rename that directory.

  502   310     1   0   0:00.09 ??  0:00.14 /Library/PostgreSQL/8.3/bin/postgres -D /Library/PostgreSQL/8.3/data
  502   313   310   0   0:00.36 ??  0:00.51 postgres: logger process
  502   315   310   0   0:01.10 ??  0:02.43 postgres: writer process
  502   316   310   0   0:01.03 ??  0:01.62 postgres: wal writer process
  502   317   310   0   0:00.28 ??  0:00.40 postgres: autovacuum launcher process
  502   318   310   0   0:00.29 ??  0:00.33 postgres: stats collector process
  501 22813     1   0   0:00.29 ??  0:00.38 /Volumes/High Usage/usr/local/src/build-farm-4.4/builds/HEAD/inst/bin/postgres -D data-C
  501 22815 22813   0   0:00.57 ??  0:01.31 postgres: writer process
  501 22816 22813   0   0:00.53 ??  0:00.85 postgres: wal writer process
  501 22817 22813   0   0:00.28 ??  0:00.65 postgres: autovacuum launcher process
  501 22818 22813   0   0:01.19 ??  0:01.47 postgres: stats collector process
  501 22853 22813   0  78:13.79 ?? 89:26.32 postgres: Robert pl_regression [local] CREATE FUNCTION

Robert:/usr/local/src/build-farm-4.4/builds/HEAD
% gdb inst/bin/postgres 22853
GNU gdb 6.3.50-20050815 (Apple version gdb-1518) (Sat Feb 12 02:52:12 UTC 2011)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as x86_64-apple-darwin...Reading symbols for shared 
libraries .. done
/Volumes/High Usage/usr/local/src/build-farm-4.4/builds/HEAD/22853: No such 
file or directory

Attaching to program: `/Volumes/High 
Usage/usr/local/src/build-farm-4.4/builds/HEAD/inst/bin/postgres', process 
22853.
Reading symbols for shared libraries .+. done
0x000100a505e4 in Perl_get_hash_seed ()
(gdb) bt
#0  0x000100a505e4 in Perl_get_hash_seed ()
#1  0x000100a69b94 in perl_parse ()
#2  0x0001007c0680 in plperl_init_interp () at plperl.c:781
#3  0x0001007c117a in _PG_init () at plperl.c:443
#4  0x000100304396 in internal_load_library (libname=0x10100d540 
/Volumes/High 
Usage/usr/local/src/build-farm-4.4/builds/HEAD/inst/lib/postgresql/plperl.so) 
at dfmgr.c:284
#5  0x000100304ce5 in load_external_function (filename=value temporarily 
unavailable, due to optimizations, funcname=0x10100d508 plperl_validator, 
signalNotFound=1 '\001', filehandle=0x7fff5fbfd3b8) at dfmgr.c:113
#6  0x000100307200 in fmgr_info_C_lang [inlined] () at /Volumes/High 
Usage/usr/local/src/build-farm-4.4/builds/HEAD/pgsql.4091/src/backend/utils/fmgr/fmgr.c:349
#7  0x000100307200 in fmgr_info_cxt_security (functionId=41362, 
finfo=0x7fff5fbfd410, mcxt=value temporarily unavailable, due to 
optimizations, ignore_security=value temporarily unavailable, due to 
optimizations) at fmgr.c:280
#8  0x0001003083f0 in OidFunctionCall1Coll (functionId=value temporarily 
unavailable, due to optimizations, collation=0, arg1=41430) at fmgr.c:1585
#9  0x00010009f58d in ProcedureCreate 

Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera

Another thing I just noticed is that if you pg_upgrade from a previous
release, all the NOT NULL pg_constraint rows are going to be missing.
I'm not sure what's the best way to deal with this -- should we just
provide a script for the user to run on each database?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 (2011/06/24 15:35), HuangQi wrote:
 I'm trying to debug a modification for the query planner. But I found it
 seems the data structure of my planned query is incorrect. I was trying to
print out the data structure using the "p" command in gdb, which is quite
inconvenient and takes time. May I know whether there is any built-in function in
postgres to print out the node data structures, or any other plan-related
data structures? Thanks.

 I think nodeToString() would help.

For interactive use in gdb, I generally do

call pprint(..node pointer..)

which prints to the postmaster log.

regards, tom lane



Re: [HACKERS] debugging tools inside postgres

2011-06-24 Thread HuangQi
On 24 June 2011 23:21, Tom Lane t...@sss.pgh.pa.us wrote:

 Shigeru Hanada shigeru.han...@gmail.com writes:
  (2011/06/24 15:35), HuangQi wrote:
  I'm trying to debug a modification for the query planner. But I found it
  seems the data structure of my planned query is incorrect. I was trying
 to
  print out the data structure by use the p command in gdb which is
 quite
  inconvenient and takes time. May I know is there any embedded function
 in
  postgres to print out the node data structures, or any other plan
 related
  data structures? Thanks.

  I think nodeToString() would help.

 For interactive use in gdb, I generally do

call pprint(..node pointer..)

 which prints to the postmaster log.

regards, tom lane



Thanks, Tom, this "call pprint()" works very nicely.
-- 
Best Regards
Huang Qi Victor


Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 11:17 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Another thing I just noticed is that if you pg_upgrade from a previous
 release, all the NOT NULL pg_constraint rows are going to be missing.

Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
the system catalogs, so if these entries aren't getting created
automatically, that sounds like a bug in the patch...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Robert Haas
While taking a look at the existing documentation for pg_locks I came
across the following paragraph:

  <para>
   When the <structname>pg_locks</structname> view is accessed, the
   internal lock manager data structures are momentarily locked, and
   a copy is made for the view to display.  This ensures that the
   view produces a consistent set of results, while not blocking
   normal lock manager operations longer than necessary.  Nonetheless
   there could be some impact on database performance if this view is
   frequently accessed.
  </para>

AFAICS, this is no longer quite true.  The view of the data from the
main lock manager will be self-consistent, and the view of the data
from the predicate lock manager will be self-consistent, but they need
not be consistent with each other.  I don't think that's a problem we
need to fix, but we probably ought to make the documentation match the
behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of vie jun 24 12:06:17 -0400 2011:
 On Fri, Jun 24, 2011 at 11:17 AM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Another thing I just noticed is that if you pg_upgrade from a previous
  release, all the NOT NULL pg_constraint rows are going to be missing.
 
 Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
 the system catalogs, so if these entries aren't getting created
 automatically, that sounds like a bug in the patch...

Ah -- we should be fine then.  I haven't tested that yet (actually I
haven't finished tinkering with the code).

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 While taking a look at the existing documentation for pg_locks I
 came across the following paragraph:
 
   <para>
When the <structname>pg_locks</structname> view is accessed,
the internal lock manager data structures are momentarily
locked, and a copy is made for the view to display.  This
ensures that the view produces a consistent set of results,
while not blocking normal lock manager operations longer than
necessary. Nonetheless there could be some impact on database
performance if this view is frequently accessed.
   </para>
 
 AFAICS, this is no longer quite true.  The view of the data from
 the main lock manager will be self-consistent, and the view of the
 data from the predicate lock manager will be self-consistent, but
 they need not be consistent with each other.  I don't think that's
 a problem we need to fix, but we probably ought to make the
 documentation match the behavior.
 
It remains true that the heavyweight locking structures are
momentarily locked to capture a consistent view of those, and it is
also true that the predicate locking structures are momentarily
locked to get a consistent view of that data.  Both are not covered
by some over-arching lock, but that's true at *all* times --
SIReadLock entries can disappear mid-transaction (for READ ONLY
transactions) and can persist past transaction completion (if there
are overlapping transactions with which the completed transaction
can still form a dangerous structure).  So there is never a very
close tie between the timing of the appearance or disappearance for
SIRead locks versus any heavyweight locks.
 
That is covered to some degree in the section on serializable
transactions:
 
http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html#XACT-SERIALIZABLE
 
Any thoughts on what else the docs need to be more clear?
 
-Kevin



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 12:27 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 While taking a look at the existing documentation for pg_locks I
 came across the following paragraph:

   <para>
    When the <structname>pg_locks</structname> view is accessed,
    the internal lock manager data structures are momentarily
    locked, and a copy is made for the view to display.  This
    ensures that the view produces a consistent set of results,
    while not blocking normal lock manager operations longer than
    necessary. Nonetheless there could be some impact on database
    performance if this view is frequently accessed.
   </para>

 AFAICS, this is no longer quite true.  The view of the data from
 the main lock manager will be self-consistent, and the view of the
 data from the predicate lock manager will be self-consistent, but
 they need not be consistent with each other.  I don't think that's
 a problem we need to fix, but we probably ought to make the
 documentation match the behavior.

 It remains true that the heavyweight locking structures are
 momentarily locked to capture a consistent view of those, and it is
 also true that the predicate locking structures are momentarily
 locked to get a consistent view of that data.  Both are not covered
 by some over-arching lock, but...

...but they could be, if we were so inclined.  We could acquire all of
the lock manager partition locks, all of the predicate lock manager
partition locks, and the lock on SerializableXactHashLock - then dump
all the lock state from both tables - then release all the locks.
What we actually do is acquire all of the lock manager locks, snapshot
the state, release those locks, then get the predicate lock manager
locks and SerializableXactHashLock, snapshot the state over there,
release those locks, and then dump everything out.  Since we don't do
that, it's possible that "select * from pg_locks" could see a state
that never really existed.

For example, it's possible that, after doing GetLockStatusData() but
before doing GetPredicateLockStatusData(), some backend might commit a
transaction, releasing a heavyweight lock, begin a new transaction,
and acquire a predicate lock.  Now, the backend looking at the lock
table is going to see both of those locks held at the same time even
though in reality that never happened.  The existing documentation for
pg_locks indicates that such anomalies are possible because it's based
on the (heretofore correct) idea that we're going to grab every single
relevant lwlock at the same time before taking a snapshot of the
system state.

What I think we should do is replace the existing paragraph with
something along these lines:

The <structname>pg_locks</structname> view displays data from both the
regular lock manager and the predicate lock manager, which are
separate systems.  When this view is accessed, the internal data
structures of each lock manager are momentarily locked, and copies are
made for the view to display.  Each lock manager will therefore
produce a consistent set of results, but as we do not lock both lock
managers simultaneously, it is possible for locks to be taken or
released after we interrogate the regular lock manager and before we
interrogate the predicate lock manager.  Each lock manager is only
locked for the minimum possible time so as to reduce the performance
impact of querying this view, but there could nevertheless be some
impact on database performance if it is frequently accessed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Heikki Linnakangas

On 24.06.2011 11:40, Heikki Linnakangas wrote:

On 21.06.2011 13:08, Alexander Korotkov wrote:

I believe it's due to relatively expensive penalty
method in that opclass.


Hmm, I wonder if it could be optimized. I did a quick test, creating a
gist_trgm_ops index on a list of English words from
/usr/share/dict/words. oprofile shows that with the patch, 60% of the
CPU time is spent in the makesign() function.


I couldn't resist looking into this, and came up with the attached 
patch. I tested this with:


CREATE TABLE words (word text);
COPY words FROM '/usr/share/dict/words';
CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);

And then ran REINDEX INDEX i_words a few times with and without the 
patch. Without the patch, reindex takes about 4.7 seconds. With the 
patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes 
even more important with Alexander's fast GiST build patch, which calls 
the penalty function more.


I used the attached showsign-debug.patch to verify that the patched 
makesign function produces the same results as the existing code. I 
haven't tested the big-endian code, however.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index b328a09..1f3d3e3 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -84,17 +84,88 @@ gtrgm_out(PG_FUNCTION_ARGS)
 static void
 makesign(BITVECP sign, TRGM *a)
 {
-	int4		k,
-				len = ARRNELEM(a);
+	int4		len = ARRNELEM(a);
 	trgm	   *ptr = GETARR(a);
-	int4		tmp = 0;
+	char	   *p;
+	char	   *endptr;
+	uint32		w1,
+				w2,
+				w3;
+	uint32		trg1,
+				trg2,
+				trg3,
+				trg4;
+	uint32	   *p32;
 
 	MemSet((void *) sign, 0, sizeof(BITVEC));
 	SETBIT(sign, SIGLENBIT);	/* set last unused bit */
-	for (k = 0; k < len; k++)
+
+	if (len == 0)
+		return;
+
+	endptr = (char *) (ptr + len);
+	/*
+	 * We have to extract each trigram into a uint32, and calculate the HASH.
+	 * This would be a lot easier if each trigram was aligned at a 4-byte
+	 * boundary, but they're not. The simple way would be to copy each
+	 * trigram byte-per-byte, but that is quite slow, and this function is a
+	 * hotspot in penalty calculation.
+	 *
+	 * The first trigram in the array doesn't begin at a 4-byte boundary, the
+	 * flags byte comes first, but the next one does. So we fetch the first
+	 * trigram as a special case, and after that the following four trigrams
+	 * fall onto 4-byte words like this:
+	 *
+	 *  w1   w2   w3
+	 * AAAB BBCC CDDD
+	 *
+	 * As long as there's at least four trigrams left to process, we fetch
+	 * the next three words and extract the trigrams from them with bit
+	 * operations.
+	 *
+	 */
+	p32 = (uint32 *) (((char *) ptr) - 1);
+
+	/* Fetch and extract the initial word */
+	w1 = *(p32++);
+#ifdef WORDS_BIGENDIAN
+	trg1 = w1 << 8;
+#else
+	trg1 = w1 >> 8;
+#endif
+	HASH(sign, trg1);
+
+	while ((char *) p32 <= endptr - 12)
 	{
-		CPTRGM(((char *) &tmp), ptr + k);
-		HASH(sign, tmp);
+		w1 = *(p32++);
+		w2 = *(p32++);
+		w3 = *(p32++);
+
+#ifdef WORDS_BIGENDIAN
+		trg1 = w1 & 0xFFFFFF00;
+		trg2 = (w1 << 24) | ((w2 & 0xFFFF0000) >> 8);
+		trg3 = ((w2 & 0x0000FFFF) << 16) | ((w3 & 0xFF000000) >> 16);
+		trg4 = w3 << 8;
+#else
+		trg1 = w1 & 0x00FFFFFF;
+		trg2 = (w1 >> 24) | ((w2 & 0x0000FFFF) << 8);
+		trg3 = ((w2 & 0xFFFF0000) >> 16) | ((w3 & 0x000000FF) << 16);
+		trg4 = w3 >> 8;
+#endif
+
+		HASH(sign, trg1);
+		HASH(sign, trg2);
+		HASH(sign, trg3);
+		HASH(sign, trg4);
+	}
+
+	/* Handle remaining 1-3 trigrams the slow way */
+	p = (char *) p32;
+	while (p < endptr)
+	{
+		CPTRGM(((char *) &trg1), p);
+		HASH(sign, trg1);
+		p += 3;
 	}
 }
 
 
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index b328a09..b5be800 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -44,6 +44,9 @@ Datum		gtrgm_penalty(PG_FUNCTION_ARGS);
 PG_FUNCTION_INFO_V1(gtrgm_picksplit);
 Datum		gtrgm_picksplit(PG_FUNCTION_ARGS);
 
+PG_FUNCTION_INFO_V1(gtrgm_showsign);
+Datum		gtrgm_showsign(PG_FUNCTION_ARGS);
+
 #define GETENTRY(vec,pos) ((TRGM *) DatumGetPointer((vec)->vector[(pos)].key))
 
 /* Number of one-bits in an unsigned byte */
@@ -98,6 +101,32 @@ makesign(BITVECP sign, TRGM *a)
 	}
 }
 
+static char *
+printsign(BITVECP sign)
+{
+	static char c[200];
+	char *p = c;
+	int i;
+	for (i = 0; i < SIGLEN; i++)
+	{
+		p += snprintf(p, 3, "%02x", (unsigned int) (((unsigned char *) sign)[i]));
+	}
+	return c;
+}
+
+Datum
+gtrgm_showsign(PG_FUNCTION_ARGS)
+{
+	text	   *in = PG_GETARG_TEXT_P(0);
+	BITVEC		sign;
+	TRGM	   *trg;
+
+	trg = generate_trgm(VARDATA(in), VARSIZE(in) - VARHDRSZ);
+	makesign(sign, trg);
+
+	PG_RETURN_TEXT_P(cstring_to_text(printsign(sign)));
+}
+
 Datum
 gtrgm_compress(PG_FUNCTION_ARGS)
 {


Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Bernd Helmle



--On 24. Juni 2011 12:06:17 -0400 Robert Haas robertmh...@gmail.com wrote:


Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
the system catalogs, so if these entries aren't getting created
automatically, that sounds like a bug in the patch...


AFAIR, i've tested it and it worked as expected.

--
Thanks

Bernd



[HACKERS] News on Clang

2011-06-24 Thread Peter Geoghegan
A couple of months back, Greg Stark reported mixed results on getting
Clang/LLVM to build Postgres. It could be done, but there were some
bugs, including a bug that caused certain grammar TUs to take way too
long to compile. 2 bug reports were filed. He said that the long
compile time problem was made a lot better by using SVN-tip as it
existed at the time - it brought compile time down from over a minute
to just 15 seconds for preproc.c, which was of particular concern
then.

Last night, I had a go at getting Postgres to build using my own build
of Clang SVN-tip (2.9). I was successful, but I too found certain
grammar TUs to be very slow to compile. Total times were 3m23s for
Clang, to GCC 4.6's 2m1s. I made a (perhaps duplicate) bug report,
which had a preprocessed gram.c as a testcase. Here was the compile
time that I saw for the file:

[peter@peter postgresql]$ time /home/peter/build/Release/bin/clang -O2
-Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-error -I. -I. -I
/home/peter/postgresql/src/include -D_GNU_SOURCE   -c -o gram.o
/home/peter/postgresql/src/backend/parser/gram.c
In file included from gram.y:12939:
scan.c:16246:23: warning: unused variable 'yyg' [-Wunused-variable]
struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var ...
  ^
1 warning generated.

real	1m5.780s
user	1m4.915s
sys	0m0.086s

The compile time is astronomically high, considering it takes about 2
seconds to compile gram.c on the same machine using GCC 4.6 with the
same flags.

This problem is reportedly a front-end issue. Observe what happens
when I omit the -Wall flag:

[peter@peter postgresql]$ time /home/peter/build/Release/bin/clang -O2
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -Wformat-security -fno-strict-aliasing -fwrapv
-Wno-error -I. -I. -I /home/peter/postgresql/src/include -D_GNU_SOURCE
  -c -o gram.o /home/peter/postgresql/src/backend/parser/gram.c

real	0m2.153s
user	0m2.073s
sys	0m0.065s
[peter@peter postgresql]$

This is very close to the time for GCC. The problem has been further
isolated to the flag -Wuninitialized.

I hacked our configure file to omit -Wall from $CFLAGS. -Wall is just a
flag to have GCC/Clang show all reasonable warnings - it doesn't
affect binaries. I then measured the total build time for Postgres
using Clang SVN-tip. Here's what "time make" outputs:

* SNIP *
real	2m7.102s
user	1m49.015s
sys	0m10.635s

I'm very encouraged by this - Clang is snapping at the heels of GCC
here. I'd really like to see Clang as a better supported compiler,
because the whole LLVM infrastructure seems to have a lot to offer -
potentially improved compile times, better warnings/errors, a good
static analysis tool, a nice debugger, and a nice modular architecture
for integration with third party components. It is still a bit
immature, but it has the momentum.

At a large presentation that I and other PG community members were
present at during FOSDEM, Postgres was specifically cited as an
example of a medium-sized C program that had considerably improved
compile times on Clang. While I was obviously unable to reproduce the
very impressive compile-time numbers claimed (at -O0), I still think
that Clang has a lot of promise. Here are the slides from that
presentation:

http://www.scribd.com/doc/48921683/LLVM-Clang-Advancing-Compiler-Technology

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 It remains true that the heavyweight locking structures are
 momentarily locked to capture a consistent view of those, and it
 is also true that the predicate locking structures are
 momentarily locked to get a consistent view of that data.  Both
 are not covered by some over-arching lock, but...
 
 ...but they could be, if we were so inclined.
 
Yes.
 
 What we actually do is acquire all of the lock manager locks,
 snapshot the state, release those locks, then get the predicate
 lock manager locks and SerializableXactHashLock, snapshot the
 state over there, release those locks, and then dump everything
 out.  Since we don't do that, it's possible that select * from
 pg_locks could see a state that never really existed.
 
 For example, it's possible that, after doing GetLockStatusData()
 but before doing GetPredicateLockStatusData(), some backend might
 commit a transaction, releasing a heavyweight lock, begin a new
 transaction, and acquire a predicate lock.  Now, the backend
 looking at the lock table is going to see both of those locks held
 at the same time even though in reality that never happened.
 
That's a fair point.
 
 What I think we should do is replace the existing paragraph with
 something along these lines:
 
 The structnamepg_locksstructname view displays data from both
 the regular lock manager and the predicate lock manager, which are
 separate systems.  When this view is accessed, the internal data
 structures of each lock manager are momentarily locked, and copies
 are made for the view to display.  Each lock manager will
 therefore produce a consistent set of results, but as we do not
 lock both lock managers simultaneously, it is possible for locks
 to be taken or released after we interrogate the regular lock
 manager and before we interrogate the predicate lock manager. 
 Each lock manager is only locked for the minimum possible time so
 as to reduce the performance impact of querying this view, but
 there could nevertheless be some impact on database performance if
 it is frequently accessed.
 
I agree that it's probably better to document it than to increase
the time that both systems are locked.  If we get complaints about
it, we can always revisit the issue in a later release.
 
The above wording looks fine to me.
 
-Kevin



Re: [HACKERS] News on Clang

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:02 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 [research]

Nice stuff.  Thanks for the update.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith
There's been a steady flow of messages on pgsql-advocacy since last 
month (threads "Crediting sponsors in release notes?" and "Crediting 
reviewers & bug-reporters in the release notes") talking about who should 
be credited for their work on PostgreSQL, and how.  That discussion 
seems to me to be heading in one inevitable direction:  it's not going to 
be possible to make everyone happy unless there's a way to track all of 
these things for each feature added to PostgreSQL:


-Short description for the release notes
-Feature author(s)
-Reviewers and bug reporters
-Sponsors
-Main git commit adding the feature

Now, this is clearly the job for a tool, because the idea that any 
person capable of doing this work will actually do it is 
laughable--everyone qualified is too busy.  It strikes me however that 
the current production of the release notes is itself a time consuming 
and error-prone process that could also be improved by automation.  I 
had an idea for pushing forward both these at once.


Committers here are pretty good at writing terse but clear summaries of 
new features when they are added.  These are generally distilled further 
for the release notes.  It strikes me that a little decoration of commit 
messages might go a long way toward saving time in a few areas here.  
I'll pick a simple easy example I did to demonstrate; I wrote a small 
optimization to commit_delay committed at 
http://archives.postgresql.org/message-id/e1pqp72-0001us...@gemulon.postgresql.org


This made its way into the release notes like this:

  Improve performance of commit_siblings (Greg Smith)
  This allows the use of commit_siblings with less overhead.

What if the commit message had been decorated like this?

  Feature:  Improve performance of commit_siblings

  Optimize commit_siblings in two ways to improve group commit.
  First, avoid scanning the whole ProcArray once we know there...

With that simple addition, two things become possible:

-Generating a first draft of the release notes for a new version could 
turn into a script that parses the git commit logs, which has gotta save 
somebody a whole lot of time each release that goes into the first draft 
of the release notes.
-All of these other ways to analyze the contributors would be much 
easier to maintain.  A little "Author:" decoration to that section of 
each commit would probably be welcome too.


I'm sure someone is going to reply to this suggesting some git metadata 
is the right way to handle this, but that seems like overkill to me.  I 
think there's enough committer time gained in faster release note 
generation for this decoration to pay back its overhead, which is 
important to me--I'd want a change here to net close to zero for 
committers.  And the fact that it would also allow deriving all this 
other data makes it easier to drive the goals rising out of advocacy 
forward too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera
Excerpts from Bernd Helmle's message of vie jun 24 12:56:26 -0400 2011:
 
 --On 24. Juni 2011 12:06:17 -0400 Robert Haas robertmh...@gmail.com wrote:
 
  Uh, really?  pg_upgrade uses SQL commands to recreate the contents of
  the system catalogs, so if these entries aren't getting created
  automatically, that sounds like a bug in the patch...
 
 AFAIR, i've tested it and it worked as expected.

Okay, thanks for the confirmation.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Christopher Browne
On Fri, Jun 24, 2011 at 5:15 PM, Greg Smith g...@2ndquadrant.com wrote:
 -All of these other ways to analyze of the contributors would be much easier
 to maintain.  A little Author: decoration to that section of each commit
 would probably be welcome too.

I think you're quite right, that mining the commit logs for these
sorts of information is very much the right answer.

Initially, the choice has been to not use the Author tag in Git; I
think that came as part of the overall intent that, at least, in the
beginning, the workflow of the PostgreSQL project shouldn't diverge
terribly much from how it had been with CVS.

It makes sense to have some debate about additional features to
consider using to capture useful workflow.  Sadly, I think that would
have been a very useful debate to have held at the Devs meeting in
Ottawa, when a lot of the relevant people were in a single room; it's
a bit harder now.

In any case, having some tooling to rummage through commits to
generate proposals for release notes seems like a fine idea.  Even
without policy changes (e.g. - to start using Author:, and possibly
other explicit metadata), it would surely be possible to propose
release note contents that tries to use what it finds.

For instance, if the tool captured all email addresses that it finds
in a commit message, and stows them in a convenient spot, that makes
it easier for the human to review the addresses and classify which
might indicate authorship.  Maybe a step ahead.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:15 PM, Greg Smith g...@2ndquadrant.com wrote:
 There's been a steady flow of messages on pgsql-advocacy since last month
 (threads "Crediting sponsors in release notes?" and "Crediting reviewers &
 bug-reporters in the release notes") talking about who/how should receive
 credited for their work on PostgreSQL.  That discussion seems to be me
 heading in one inevitable direction:  it's not going to be possible to make
 everyone happy unless there's a way to track all of these things for each
 feature added to PostgreSQL:

I don't read -advocacy regularly, but reviewing the archives, it
doesn't seem to me that we've reached any conclusion about whether
including this information in the release notes is a good idea in the
first place.  It seems to me that one name for each feature is about
at the limit of what we can reasonably do without cluttering the
release notes to the point of unreadability.  I am OK with it the way
it is, but if we must change it I would argue that we ought to have
less credit there, not more.  Which is not to say we shouldn't have
credit.  I think crediting sponsors and reviewers and bug reporters is
a good idea.  But I think the web site is the place to do that, not
the release notes.

As for annotating the commit messages, I think something like:

Reporter: Sam Jones
Author: Beverly Smith
Author: Jim Davids
Reviewer: Fred Block
Reviewer: Pauline Andrews

...would be a useful convention.  I am disinclined to add a "Feature:"
annotation.  I think it is unlikely that will end up being any more
useful than just extracting either the whole commit message or its
first line.

I am not inclined to try to track sponsors in the commit message at
all.  Suppose Jeff Davis submits a patch, Stephen Frost reviews it,
and I commit it.  Besides the three human beings involved,
potentially, you've got three employers who might be considered
sponsors, plus any customers of those employers who might have paid
said employer money to justify the time spent on that patch.  On a big
patch, you could easily have ten companies involved in different
roles, some of whom may have made a far larger real contribution to
the development of the feature than others, and I am 100% opposed to
making it the committer's job to include all that in the commit
message.   Also, unlike individuals (whose names can usually be read
off the thread in a few seconds), it is not necessarily obvious who
the corporate participants are (or which ones even WANT to be
credited).  It is almost certain that the committer will sometimes get
it wrong, and the commit log is a terrible place to record information
that might need to be changed after the fact.

It seems to me that, at least for sponsorship information, it would be
far better to have a separate database that pulls in the commit logs
and then gets annotated by the people who care.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-06-24 Thread Jeff Davis
On Thu, 2011-06-23 at 22:02 -0400, Robert Haas wrote:
  1. INSERT to a new page, marking it with LSN X
  2. WAL flushed to LSN Y (Y > X)
  3. Some persistent snapshot (that doesn't see the INSERT) is released,
  and generates WAL recording that fact with LSN Z (Z > Y)
  4. Lazy VACUUM marks the newly all-visible page with PD_ALL_VISIBLE
  5. page is written out because LSN is still X
  6. crash

 I don't really think that's a separate set of assumptions - if we had
 some system whereby snapshots could survive a crash, then they'd have
 to be WAL-logged (because that's how we make things survive crashes).

In the above example, it is WAL-logged (with LSN Z).

 And anything that is WAL-logged must obey the WAL-before-data rule.
 We have a system already that ensures that when
 synchronous_commit=off, CLOG pages can't be flushed before the
 corresponding WAL record makes it to disk.

In this case, how do you prevent the PD_ALL_VISIBLE from making it to
disk if you never bumped the LSN when it was set? It seems like you just
don't have the information to do so, and it seems like the information
required would be variable in size.

 I guess the point you are driving at here is that a page can only go
 from being all-visible to not-all-visible by virtue of being modified.
  There's no other piece of state (like a persistent snapshot) that can
 be lost as part of a crash that would make us need change our mind and
 decide that an all-visible XID is really not all-visible after all.
 (The reverse is not true: since snapshots are ephemeral, a crash will
 render every row either all-visible or dead.)  I guess I never thought
 about documenting that particular aspect of it because (to me) it
 seems fairly self-evident.  Maybe I'm wrong...

I didn't mean to make this conversation quite so hypothetical. My
primary points are:

1. Sometimes it makes sense to break the typical WAL conventions for
performance reasons. But when we do so, we have to be quite careful,
because things get complicated quickly.

2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
because the conditions under which it may be set are more complex
(having to do with both snapshots and cleanup actions). Other hint bits
are based only on transaction status: either the WAL for that
transaction completion got flushed (and is therefore permanent), and we
set the hint bit; or it didn't get flushed and we don't.
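That flush-before-hint rule can be modeled as a one-line predicate (a toy sketch with invented names, not PostgreSQL code):

```python
# Toy model of the rule described above: a backend may persist a
# "transaction committed" hint bit only if the WAL for that transaction's
# completion has already been flushed at least that far.  Names and the
# integer-LSN representation are invented for illustration.

def may_set_commit_hint(commit_lsn, flushed_lsn):
    # Hint bits are not WAL-logged, so the commit record itself must be
    # durable before the hint may reach disk.
    return flushed_lsn >= commit_lsn

print(may_set_commit_hint(100, 150))  # commit record already flushed
print(may_set_commit_hint(200, 150))  # not yet flushed
```

The point of the comparison is that PD_ALL_VISIBLE cannot be reduced to a single check like this, since it also depends on snapshot state and cleanup actions.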

Just having this discussion has been good enough for me to get a better
idea what's going on, so if you think the comments are sufficient that's
OK with me.

Regards,
Jeff Davis




Re: [HACKERS] crash-safe visibility map, take five

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:43 PM, Jeff Davis pg...@j-davis.com wrote:
 And anything that is WAL-logged must obey the WAL-before-data rule.
 We have a system already that ensures that when
 synchronous_commit=off, CLOG pages can't be flushed before the
 corresponding WAL record makes it to disk.

 In this case, how do you prevent the PD_ALL_VISIBLE from making it to
 disk if you never bumped the LSN when it was set? It seems like you just
 don't have the information to do so, and it seems like the information
 required would be variable in size.

Well, I think that would be a problem for the hypothetical implementer
of the persistent snapshot feature.  :-)

More seriously, Heikki and I previously discussed creating some
systematic method for suppressing FPIs when they are not needed,
perhaps by using a bit in the page header to indicate whether an FPI
has been generated since the last checkpoint.  I think it would be
nice to have such a system, but since we don't have a clear agreement
either that it's a good idea or what we'd do after that, I'm not
inclined to invest time in it.  To really get any benefit out of a
change in that area, we'd probably need to (a) remove the LSN
interlocks that prevent changes from being replayed if the LSN of the
page has already advanced beyond the record LSN and (b) change at
least some of XLOG_HEAP_{INSERT,UPDATE,DELETE} to be idempotent.  But
if we went in that direction then that might help to regularize some
of this and make it a bit less ad-hoc.
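For context, the LSN interlock mentioned in (a) is the usual redo-time guard; schematically (invented names and integer LSNs, not the actual xlog code):

```python
# Toy model of the recovery-time LSN interlock referred to above: a WAL
# record is replayed onto a page only if the page's LSN shows the change
# is not already present.  Truly idempotent records could drop this check,
# since replaying them twice would be harmless.

class Page:
    def __init__(self, lsn=0):
        self.lsn = lsn
        self.data = {}

def replay(page, record_lsn, change):
    """Apply a WAL record's change unless the page already reflects it."""
    if page.lsn >= record_lsn:
        return False            # page LSN already past the record: skip
    page.data.update(change)
    page.lsn = record_lsn       # advance the page LSN past the record
    return True

p = Page(lsn=10)
print(replay(p, 20, {"tuple1": "inserted"}))  # applied
print(replay(p, 20, {"tuple1": "inserted"}))  # skipped on second pass
```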

 I didn't mean to make this conversation quite so hypothetical. My
 primary points are:

 1. Sometimes it makes sense to break the typical WAL conventions for
 performance reasons. But when we do so, we have to be quite careful,
 because things get complicated quickly.

Yes.

 2. PD_ALL_VISIBLE is a little bit more complex than other hint bits,
 because the conditions under which it may be set are more complex
 (having to do with both snapshots and cleanup actions). Other hint bits
 are based only on transaction status: either the WAL for that
 transaction completion got flushed (and is therefore permanent), and we
 set the hint bit; or it didn't get flushed and we don't.

I think the term hint bits really shouldn't be applied to anything
other than HEAP_{XMIN,XMAX}_{COMMITTED,INVALID}.  Otherwise, we get
into confusing territory pretty quickly.  Our algorithm for
opportunistically killing index entries pointing to dead tuples is not
WAL-logged, but it involves more than a single bit.  OTOH, clearing of
the PD_ALL_VISIBLE bit has always been WAL-logged, so lumping that in
with HEAP_XMIN_COMMITTED is pretty misleading.

 Just having this discussion has been good enough for me to get a better
 idea what's going on, so if you think the comments are sufficient that's
 OK with me.

I'm not 100% certain they are, but let's wait and see if anyone else
wants to weigh in...  please do understand I'm not trying to be a pain
in the neck.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 12:51 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 24.06.2011 11:40, Heikki Linnakangas wrote:

 On 21.06.2011 13:08, Alexander Korotkov wrote:

  I believe it's due to a relatively expensive penalty
 method in that opclass.

 Hmm, I wonder if it could be optimized. I did a quick test, creating a
 gist_trgm_ops index on a list of English words from
 /usr/share/dict/words. oprofile shows that with the patch, 60% of the
 CPU time is spent in the makesign() function.

 I couldn't resist looking into this, and came up with the attached patch. I
 tested this with:

 CREATE TABLE words (word text);
 COPY words FROM '/usr/share/dict/words';
 CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);

 And then ran REINDEX INDEX i_words a few times with and without the patch.
 Without the patch, reindex takes about 4.7 seconds. With the patch, 3.7
 seconds. That's a worthwhile gain on its own, but becomes even more
 important with Alexander's fast GiST build patch, which calls the penalty
 function more.

 I used the attached showsign-debug.patch to verify that the patched makesign
 function produces the same results as the existing code. I haven't tested
 the big-endian code, however.
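For readers unfamiliar with gist_trgm_ops: makesign() condenses a value's trigrams into a fixed-size bit signature. Roughly (a Python sketch with an invented padding scheme, hash, and signature width; pg_trgm's actual code works over its own trigram array representation):

```python
# Rough sketch of what a trigram signature does: each trigram of the
# padded word sets one bit in a fixed-size bitmap.  Inner GiST pages can
# then cheaply rule out non-matching subtrees by testing signature bits.
# SIGN_BITS, the padding, and the use of Python's hash() are invented
# for illustration only.

SIGN_BITS = 96  # invented signature width

def trigrams(word):
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def makesign(word):
    sign = 0
    for t in trigrams(word):
        sign |= 1 << (hash(t) % SIGN_BITS)
    return sign

# Every trigram of a word has its bit set in the word's signature.
s = makesign("postgres")
print(bin(s).count("1") <= len(trigrams("postgres")))
```

Since makesign() runs for every penalty call during index build, shaving cycles here multiplies across the whole build, which is why the speedup matters more with the fast GiST build patch.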

Out of curiosity (and because there is no comment or Assert here), how
can you be so sure of the input alignment?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 01:42 PM, Robert Haas wrote:

I am disinclined to add a feature
annotation.  I think it is unlikely that will end up being any more
useful than just extracting either the whole commit message or its
first line.
   


I don't see any good way to extract the list of commits relevant to the 
release notes without something like it.  Right now, you can't just mine 
every commit into the release notes without getting more noise than 
signal.  Something that tags the ones that are adding new features or 
other notable updates, as opposed to bug fixes, doc updates, etc., would 
allow that separation.



I am not inclined to try to track sponsors in the commit message at
all.


I was not suggesting that information be part of the commit.  We've 
worked out a reasonable initial process for the people working on 
sponsored features to record that information completely outside of the 
commit or release notes data.  It turns out that process would be
easier to drive if it were easier to derive a feature-{commit,author}
list--and that would spit out for free with the rest of this.
Improving the ability to do sponsor tracking is more of a helpful 
side-effect of something that's useful for other reasons rather than a 
direct goal.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] News on Clang

2011-06-24 Thread Jeroen Vermeulen

On 2011-06-25 00:02, Peter Geoghegan wrote:


At a large presentation that I and other PG community members were
present at during FOSDEM, Postgres was specifically cited as an
example of a medium-sized C program that had considerably improved
compile times on Clang. While I was obviously unable to reproduce the
very impressive compile-time numbers claimed (at -O0), I still think
that Clang has a lot of promise. Here are the slides from that
presentation:

http://www.scribd.com/doc/48921683/LLVM-Clang-Advancing-Compiler-Technology


I notice that the slide about compilation speed on postgres compares 
only front-end speeds between gcc and clang, not the speeds for 
optimization and code generation.  That may explain why the difference 
is more pronounced on the slide than it is for a real build.


By the way, I was amazed to see such a young compiler build libpqxx with 
no other problems than a few justified warnings or errors that gcc 
hadn't issued.  And that's C++, which is a lot harder than C!  The 
output was also far more helpful than gcc's.  IIRC I found clang just 
slightly faster than gcc on a full configure/build/test.



Jeroen



Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Heikki Linnakangas

On 24.06.2011 21:24, Robert Haas wrote:

Out of curiosity (and because there is no comment or Assert here), how
can you be so sure of the input alignment?


The input TRGM to makesign() is a varlena, so it must be at least 4-byte 
aligned. If it was not for some reason, the existing VARSIZE invocation 
(within GETARR()) would already fail on platforms that are strict about 
alignment.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 2:47 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 06/24/2011 01:42 PM, Robert Haas wrote:
 I am disinclined to add a feature
 annotation.  I think it is unlikely that will end up being any more
 useful than just extracting either the whole commit message or its
 first line.

 I don't see any good way to extract the list of commits relevant to the
 release notes without something like it.  Right now, you can't just mine
 every commit into the release notes without getting more noise than signal.
  Something that tags the ones that are adding new features or other notable
 updates, as opposed to bug fixes, doc updates, etc., would allow that
 separation.

Oh, I see.  There's definitely some fuzziness about which commits make
it into the release notes right now and which do not - sometimes
things get missed, or sometimes Bruce omits something I would have
included or includes something I would have omitted.  OTOH, it's not
clear that making every committer do that on every commit is going to
be better than having one person go through the log and decide
everything all at once.

If I were attacking this problem, I'd be inclined to make a web
application that lists all the commits in a format roughly similar to
the git API, and then lets you tag each commit with tags from some
list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
Some of the tagging (e.g. docs-only) could probably even be done
automatically.  Then somebody could go through once a month and update
all the tags.  I'd be more willing to volunteer to do that than I
would be to try to get the right metadata tag in every commit...
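The automatic part of that tagging could be as simple as classifying a commit by the paths it touches. A sketch (the path prefixes are PostgreSQL's tree layout; the function and rules are invented for illustration):

```python
# Sketch of auto-tagging a commit from its changed-file list, as
# suggested above.  The classification rules here are illustrative;
# a real tool would want a richer rule set and manual override.

def auto_tags(changed_files):
    tags = set()
    if changed_files and all(f.startswith("doc/") for f in changed_files):
        tags.add("docs-only")
    if any(f.startswith("src/test/") for f in changed_files):
        tags.add("touches-tests")
    return tags

print(auto_tags(["doc/src/sgml/ref/alter_table.sgml"]))
print(auto_tags(["src/backend/commands/tablecmds.c",
                 "src/test/regress/sql/alter_table.sql"]))
```

A monthly pass over the untagged remainder would then only need human judgment for the genuinely ambiguous commits.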

 I am not inclined to try to track sponsors in the commit message at
 all.

 I was not suggesting that information be part of the commit.  We've worked
 out a reasonable initial process for the people working on sponsored
 features to record that information completely outside of the commit or
 release notes data.  It turns out though that process would be easier to
 drive if it were easier to derive a feature-{commit,author} list
 though--and that would spit out for free with the rest of this.  Improving
 the ability to do sponsor tracking is more of a helpful side-effect of
 something that's useful for other reasons rather than a direct goal.

OK, that makes sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Optimizing pg_trgm makesign() (was Re: [HACKERS] WIP: Fast GiST index build)

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:01 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 24.06.2011 21:24, Robert Haas wrote:

 Out of curiosity (and because there is no comment or Assert here), how
 can you be so sure of the input alignment?

 The input TRGM to makesign() is a varlena, so it must be at least 4-byte
 aligned. If it was not for some reason, the existing VARSIZE invocation
 (within GETARR()) would already fail on platforms that are strict about
 alignment.

Hmm, OK.  Might be worth adding a comment, anyway...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Christopher Browne
On Fri, Jun 24, 2011 at 6:47 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 06/24/2011 01:42 PM, Robert Haas wrote:
 I am not inclined to try to track sponsors in the commit message at
 all.

 I was not suggesting that information be part of the commit.  We've worked
 out a reasonable initial process for the people working on sponsored
 features to record that information completely outside of the commit or
 release notes data.  It turns out though that process would be easier to
 drive if it were easier to derive a feature-{commit,author} list
 though--and that would spit out for free with the rest of this.  Improving
 the ability to do sponsor tracking is more of a helpful side-effect of
 something that's useful for other reasons rather than a direct goal.

In taking a peek at the documentation and comments out on the interweb
about git amend, I don't think that it's going to be particularly
successful if we try to capture all this stuff in the commit message
and metadata, because it's tough to guarantee that *all* this data
would be correctly captured at commit time, and you can't revise it
after the commit gets pushed upstream.

That applies to anything we might want to track, whether Author: (at
the 'likely to be useful' and 'could readily get it correct the first
time' end of the spectrum) or Sponsor: (which sits at the dubious end
of more than one spectrum).

I expect that the correlation between commit and [various parties] is
something that will need to take place outside git.

The existing CommitFest data goes quite a long ways towards capturing
interesting information (with the likely exception of sponsorship);
what it's missing, at this point, is a capture of what commit or
commits wound up drawing the proposed patch into the official code
base.  That suggests a pretty natural extension, and this is something
I really would like to see.  I'd love to head to the CommitFest page,
and have a list of URLs pointing to the commits that implemented a
particular change.

Given that particular correspondence (e.g. - commitfest request ->
list of commits), that would help associate authors, reviewers, and
such with each commit that was related to CommitFest work.  That doesn't
cover everything, but it's a plenty good start.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] heap_hot_search_buffer refactoring

2011-06-24 Thread Robert Haas
On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas robertmh...@gmail.com wrote:
 New patch attached, with that one-line change.

Jeff, are you planning to review this further?  Do you think it's OK to commit?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Simon Riggs
On Thu, Jun 23, 2011 at 8:59 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 22, 2011 at 11:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
 For people that still think there is still risk, I've added a
 parameter called relaxed_ddl_locking which defaults to off. That
 way people can use this feature in an informed way without exposing us
 to support risks from its use.

 I can't get excited about adding a GUC that says you can turn this
 off if you want but don't blame us if it breaks.  That just doesn't
 seem like good software engineering to me.

Nobody is suggesting we fix the pre-existing bug - we're just going to
leave it. That doesn't sound like good software engineering either,
but I'm not going to complain because I understand.

We're in a bind here and I'm trying to suggest practical ways out, as
well as cater for the levels of risk people are willing to accept. I
don't think we need that parameter at all, but if you do then I'm
willing to tolerate it.


 I think we should be clear that this adds *one* lock/unlock for each
 object for each session, ONCE only after each transaction that
 executed a DDL statement against an object. So with 200 sessions, a
 typical ALTER TABLE statement will cause 400 locks. The current lock
 manager maxes out above 50,000 locks per second, so any comparative
 analysis will show this is a minor blip on even a heavy workload.

 I think that using locks to address this problem is the wrong
 approach.  I think the right fix is to use something other than
 SnapshotNow to do the system catalog scans.

I agree with that, but it's a much bigger job than you think and I
really doubt that you will do it for 9.2, definitely not for 9.1.

There are so many call points that I personally would not bother to
attempt such a critical and widely invasive fix.

So I'd put that down as a Blue Sky will-fix-one-day thing.


 However, if we were going
 to use locking, then we'd need to ensure that:

 (1) No transaction which has made changes to a system catalog may
 commit while some other transaction is in the middle of scanning that
 catalog.
 (2) No transaction which has made changes to a set of system catalogs
 may commit while some other transaction is in the midst of fetching
 interrelated data from a subset of those catalogs.

 It's important to note, I think, that the problem here doesn't occur
 at the time that the table rows are updated, because those rows aren't
 visible yet.  The problem happens at commit time.  So unless I'm
 missing something, ALTER TABLE would only need to acquire the relation
 definition lock after it has finished all of its other work.  In fact,
 it doesn't even really need to acquire it then, either.  It could just
 queue up a list of relation definition locks to acquire immediately
 before commit, which would be advantageous if the transaction went on
 to abort, since in that case we'd skip the work of acquiring the locks
 at all.

That would work if the only thing ALTER TABLE does is write. There are
various places where we read catalogs and all of those catalogs need
to be relationally integrated. So the only safe way, without lots of
incredibly detailed analysis (which you would then find fault with) is
to lock at the top and hold the lock throughout most of the
processing. That's the safe thing to do, at least.

I do already use the point you make in the patch, where it's used  to
unlock before and then relock after the FK constraint check, so that
the RelationDefinition lock is never held for long periods in any code
path.

 In fact, without doing that, this patch would undo much of the point
 of the original ALTER TABLE lock strength reduction:

 begin;
 alter table foo alter column a set storage plain;
 start a new psql session in another window
 select * from foo;

 In master, the SELECT completes without blocking.  With this patch
 applied, the SELECT blocks, just as it did in 9.0, because it can't
 rebuild the relcache entry without getting the relation definition
 lock, which the alter table will hold until commit.

It's not the same at all. Yes, they are both locks; what counts is the
frequency and duration of locking.

With AccessExclusiveLock we prevent all SELECTs from accessing the
table while we queue for the lock, which can be blocked behind a long
running query. So the total outage to the application for one minor
change can be hours - unpredictably. (Which is why I never wrote the
ALTER TABLE set ndistinct myself, even though I suggested it, cos its
mostly unusable).

With the reduced locking level AND this patch the *rebuild* can be
blocked behind the DDL. But the DDL itself is not blocked in the same
way, so the total outage is much reduced and the whole set of actions
goes through fairly quickly.

If you only submit one DDL operation then the rebuild has nothing to
block against, so the whole thing goes through smoothly.


 In fact, the same
 thing happens even if foo has been accessed previously in the same
 session.  

[HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread karavelov

Hello,

I have seen the discussions about the fastlock patch and lazy-vxid
performance degradation, so I decided to test it myself.

The setup:
- hardware
Supermicro blade 
6xSAS @15k on LSI RAID:
 1 disk for system + pg_xlog 
 4 disk RAID 10 for data 
 1 disk for spare 
2 x Xeon E5405 @2GHz (no HT), 8 cores total
8G RAM

- software 
Debian Sid, linux-2.6.39.1
Postgresql 9.1 beta2, compiled by debian sources
incrementally applied the fastlock v3 and lazy-vxid v1 patches. I had to
manually resolve a conflict in src/backend/storage/lmgr/proc.c
Configuration: increased shared_mem to 2G, max_connections to 500

- pgbench
initialized the dataset with scaling factor 100
example command invocation: ./pgbench -h 127.0.0.1 -n -S -T 30 -c 8 -j 8 -M prepared pgtest

Results:

clients  beta2   +fastlock  +lazyvxid  local socket
8        76064   92430      92198      106734
16       64254   90788      90698      105097
32       56629   88189      88269      101202
64       51124   84354      84639      96362
128      45455   79361      79724      90625
256      40370   71904      72737      82434

All runs were executed on a warm cache; I made some runs of 300s with the
same results (tps). I have also done some runs with -M simple, with an
identical distribution across clients.

I post these results because they somehow contradict previous results
posted on the list. In my case the patches not only improve peak
performance but also improve performance under load: without the patches,
throughput with 256 clients is 53% of the peak obtained with 8 clients,
while with the patches throughput with 256 clients is 79% of the
8-client peak.
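Those retention figures follow directly from the table (taking the unpatched beta2 column and the fully patched lazy-vxid column):

```python
# Recomputing the load-retention figures quoted above from the table:
# throughput at 256 clients as a fraction of the 8-client peak.
beta2 = {8: 76064, 256: 40370}        # unpatched 9.1beta2
lazyvxid = {8: 92198, 256: 72737}     # +fastlock +lazy-vxid

retention_unpatched = beta2[256] / beta2[8]
retention_patched = lazyvxid[256] / lazyvxid[8]

print(round(100 * retention_unpatched))  # 53
print(round(100 * retention_patched))    # 79
```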

Best regards
Luben Karavelov

P.S. Excuse me for starting new thread - I am new on the list.



Re: [HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:31 PM,  karave...@mail.bg wrote:
 I post this results because they somehow contradict with previous results
 posted on the list. In
 my case the patches does not only improve peak performance but also improve
 the performance
 under load - without patches the performance with 256 clients is 53% of the
 peak performance
 that is obtained with 8 clients, with patches the performance with 256
 client is 79% of the peak
 with 8 clients.

I think this is strongly related to core count.  The spinlock
contention problems don't become really bad until you get up above 32
CPUs... at least from what I can tell so far.

So I'm not surprised it was just a straight win on your machine... but
thanks for verifying.  It's helpful to have more data points.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Test case please. I don't understand the problem you're describing.

S1: select * from foo;
S2: begin;
S2: alter table foo alter column a set storage plain;
S1: select * from foo;
blocks

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_locks documentation vs. SSI

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 1:08 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 What I think we should do is replace the existing paragraph with
 something along these lines:

 The structnamepg_locksstructname view displays data from both
 the regular lock manager and the predicate lock manager, which are
 separate systems.  When this view is accessed, the internal data
 structures of each lock manager are momentarily locked, and copies
 are made for the view to display.  Each lock manager will
 therefore produce a consistent set of results, but as we do not
 lock both lock managers simultaneously, it is possible for locks
 to be taken or released after we interrogate the regular lock
 manager and before we interrogate the predicate lock manager.
 Each lock manager is only locked for the minimum possible time so
 as to reduce the performance impact of querying this view, but
 there could nevertheless be some impact on database performance if
 it is frequently accessed.

 I agree that it's probably better to document it than to increase
 the time that both systems are locked.  If we get complaints about
 it, we can always revisit the issue in a later release.

 The above wording looks fine to me.

OK, committed that change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Simon Riggs
On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Test case please. I don't understand the problem you're describing.

 S1: select * from foo;
 S2: begin;
 S2: alter table foo alter column a set storage plain;
 S1: select * from foo;
 blocks

Er... yes, that's what locks do. Where is the bug?

We have these choices of behaviour
1. It doesn't error and doesn't block - not possible for 9.1, probably
not for 9.2 either
2. It doesn't block, but may throw an error sometimes - the reported bug
3. It blocks in some cases for short periods where people do repeated
DDL, but never throws errors - this patch
4. Full scale locking - human sacrifice, cats and dogs, living
together, mass hysteria

If you want to avoid the blocking, then don't hold open the transaction.

Do this

S1: select * from foo
S2: alter table   run in its own transaction
S1: select * from foo

Doesn't block, no errors. Which is exactly what most people do on
their production servers. The ALTER TABLE statements we're talking
about are not schema changes. They don't need to be coordinated with
other DDL.

This patch has locking, but its the most reduced form of locking that
is available for a non invasive patch for 9.1

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Bruce Momjian
Peter Eisentraut wrote:
 On tor, 2011-06-23 at 21:39 -0400, Bruce Momjian wrote:
  I have created the following patch which uses 25432 as the default port
  number for pg_upgrade.
 
 I don't think we should just steal a port from the reserved range.
 Picking a random port from the private/dynamic range seems more
 appropriate.

Oh, I didn't know about that.  I will use 50432 instead.

  It also creates two new environment variables,
  OLDPGPORT and NEWPGPORT, to control the port values because we don't
  want to default to PGPORT anymore.
 
 I would prefer that all PostgreSQL-related environment variables start
 with PG.

OK, attached.  I was also using environment variables for PGDATA and
PGBIN, so I renamed those too to begin with 'PG'.

Patch attached.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
new file mode 100644
index 1ee2aca..5c5ce72
*** a/contrib/pg_upgrade/check.c
--- b/contrib/pg_upgrade/check.c
*** output_check_banner(bool *live_check)
*** 29,34 
--- 29,37 
  	if (user_opts.check && is_server_running(old_cluster.pgdata))
  	{
  		*live_check = true;
+ 		if (old_cluster.port == DEF_PGUPORT)
+ 			pg_log(PG_FATAL, "When checking a live old server, "
+ 				   "you must specify the old server's port number.\n");
  		if (old_cluster.port == new_cluster.port)
  			pg_log(PG_FATAL, "When checking a live server, "
  				   "the old and new port numbers must be different.\n");
diff --git a/contrib/pg_upgrade/option.c b/contrib/pg_upgrade/option.c
new file mode 100644
index 4401a81..d29aad0
*** a/contrib/pg_upgrade/option.c
--- b/contrib/pg_upgrade/option.c
*** parseCommandLine(int argc, char *argv[])
*** 58,65 
  	os_info.progname = get_progname(argv[0]);
  
  	/* Process libpq env. variables; load values here for usage() output */
! 	old_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
! 	new_cluster.port = getenv("PGPORT") ? atoi(getenv("PGPORT")) : DEF_PGPORT;
  
  	os_user_effective_id = get_user_info(os_info.user);
  	/* we override just the database user name;  we got the OS id above */
--- 58,65 
  	os_info.progname = get_progname(argv[0]);
  
  	/* Process libpq env. variables; load values here for usage() output */
! 	old_cluster.port = getenv("PGPORTOLD") ? atoi(getenv("PGPORTOLD")) : DEF_PGUPORT;
! 	new_cluster.port = getenv("PGPORTNEW") ? atoi(getenv("PGPORTNEW")) : DEF_PGUPORT;
  
  	os_user_effective_id = get_user_info(os_info.user);
  	/* we override just the database user name;  we got the OS id above */
*** parseCommandLine(int argc, char *argv[])
*** 203,215 
  	}
  
  	/* Get values from env if not already set */
! 	check_required_directory(&old_cluster.bindir, "OLDBINDIR", "-b",
  			"old cluster binaries reside");
! 	check_required_directory(&new_cluster.bindir, "NEWBINDIR", "-B",
  			"new cluster binaries reside");
! 	check_required_directory(&old_cluster.pgdata, "OLDDATADIR", "-d",
  			"old cluster data resides");
! 	check_required_directory(&new_cluster.pgdata, "NEWDATADIR", "-D",
  			"new cluster data resides");
  }
  
--- 203,215 
  	}
  
  	/* Get values from env if not already set */
! 	check_required_directory(&old_cluster.bindir, "PGBINOLD", "-b",
  			"old cluster binaries reside");
! 	check_required_directory(&new_cluster.bindir, "PGBINNEW", "-B",
  			"new cluster binaries reside");
! 	check_required_directory(&old_cluster.pgdata, "PGDATAOLD", "-d",
  			"old cluster data resides");
! 	check_required_directory(&new_cluster.pgdata, "PGDATANEW", "-D",
  			"new cluster data resides");
  }
  
*** For example:\n\
*** 254,270 
  or\n"), old_cluster.port, new_cluster.port, os_info.user);
  #ifndef WIN32
  	printf(_("\
!   $ export OLDDATADIR=oldCluster/data\n\
!   $ export NEWDATADIR=newCluster/data\n\
!   $ export OLDBINDIR=oldCluster/bin\n\
!   $ export NEWBINDIR=newCluster/bin\n\
    $ pg_upgrade\n"));
  #else
  	printf(_("\
!   C:\\> set OLDDATADIR=oldCluster/data\n\
!   C:\\> set NEWDATADIR=newCluster/data\n\
!   C:\\> set OLDBINDIR=oldCluster/bin\n\
!   C:\\> set NEWBINDIR=newCluster/bin\n\
    C:\\> pg_upgrade\n"));
  #endif
  	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));
--- 254,270 
  or\n"), old_cluster.port, new_cluster.port, os_info.user);
  #ifndef WIN32
  	printf(_("\
!   $ export PGDATAOLD=oldCluster/data\n\
!   $ export PGDATANEW=newCluster/data\n\
!   $ export PGBINOLD=oldCluster/bin\n\
!   $ export PGBINNEW=newCluster/bin\n\
    $ pg_upgrade\n"));
  #else
  	printf(_("\
!   C:\\> set PGDATAOLD=oldCluster/data\n\
!   C:\\> set PGDATANEW=newCluster/data\n\
!   C:\\> set PGBINOLD=oldCluster/bin\n\
!   C:\\> set PGBINNEW=newCluster/bin\n\
    C:\\> pg_upgrade\n"));
  #endif
  	printf(_("\nReport bugs to pgsql-b...@postgresql.org.\n"));
diff --git a/contrib/pg_upgrade/pg_upgrade.h b/contrib/pg_upgrade/pg_upgrade.h
new 

Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Bruce Momjian
Robert Haas wrote:
 On Fri, Jun 24, 2011 at 2:47 PM, Greg Smith g...@2ndquadrant.com wrote:
  On 06/24/2011 01:42 PM, Robert Haas wrote:
  I am disinclined to add a feature
  annotation.  I think it is unlikely that will end up being any more
  useful than just extracting either the whole commit message or its
  first line.
 
  I don't see any good way to extract the list of commits relevant to the
  release notes without something like it.  Right now, you can't just mine
  every commit into the release notes without getting more noise than signal.
  Something that tags the ones that are adding new features or other notable
  updates, as opposed to bug fixes, doc updates, etc., would allow that
  separation.
 
 Oh, I see.  There's definitely some fuzziness about which commits make
 it into the release notes right now and which do not - sometimes
 things get missed, or sometimes Bruce omits something I would have
 included or includes something I would have omitted.  OTOH, it's not
 clear that making every committer do that on every commit is going to
 be better than having one person go through the log and decide
 everything all at once.
 
 If I were attacking this problem, I'd be inclined to make a web
 application that lists all the commits in a format roughly similar to
 the git API, and then lets you tag each commit with tags from some
 list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
 Some of the tagging (e.g. docs-only) could probably even be done
 automatically.  Then somebody could go through once a month and update
 all the tags.  I'd be more more willing to volunteer to do that than I
 would be to trying to get the right metadata tag in every commit...

That tagging is basically what I do on my first pass through the release
notes.  For the gory details:

http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Test case please. I don't understand the problem you're describing.

 S1: select * from foo;
 S2: begin;
 S2: alter table foo alter column a set storage plain;
 S1: select * from foo;
 blocks

 Er...yes, that's what locks do. Where is the bug?

I didn't say it was a bug.  I said that the additional locking you
added acted to remove much of the benefit of reducing the lock
strength in the first place.  If a brief exclusive lock (such as would
be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't
acceptable, a brief exclusive lock on a different lock tag that blocks
most of the same operations isn't going to be any better.  You might
still see some improvement in the cases where some complicated
operation like scanning or rewriting the table can be performed
without holding the relation definition lock, and then only get the
relation definition lock afterward.  But for something like the above
example (or the ALTER TABLE SET (n_distinct) feature you mentioned in
your previous email) the benefit is going to be darned close to zero.

 This patch has locking, but it's the most reduced form of locking that
 is available for a non-invasive patch for 9.1

I don't like the extra lock manager traffic this adds to operations
that aren't even performing DDL, I'm not convinced that it is safe,
the additional locking negates a significant portion of the benefit of
the original patch, and Tom's made a fairly convincing argument that
even with this pg_dump will still become significantly less reliable
than heretofore even if we did apply it.  In my view, what we need to
do is revert to using AccessExclusiveLock until some of these other
details are worked out.  I wasn't entirely convinced that that was
necessary when Tom first posted this email, but having thought through
the issues and looked over your patch, it's now my position that that
is the best way forward.  The fact that your proposed patch appears
not to be thinking very clearly about when locks need to be taken and
released only adds to my feeling that we are in for a bad time if we
go down this route.

In short: -1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq SSL with non-blocking sockets

2011-06-24 Thread Steve Singer

On 11-06-15 03:20 PM, Martin Pihlak wrote:


Yes, that sounds like a good idea -- especially considering that COPY
is not the only operation that can cause SSL_write retries.


This is of course still work in progress, needs cleaning up and
definitely more testing. But at this point before going any further,
I'd really appreciate a quick review from resident libpq gurus.



Martin,

I'm not a libpq guru but I've given your patch a look over.

The idea of storing the original buffer in new members of the connection 
structure for later re-use seems like a good way to approach this.


A few things I noticed (that you might be aware of since you mentioned 
it needs cleanup)


- The patch doesn't compile with non-SSL builds; the debug output at the 
bottom of pqSendSome isn't inside an #ifdef.


- I don't think you're handling the return code properly.  Consider this case.

pqSendSome(some data)
  sslRetryBuf = some data
  return 1
pqSendSome(more data)
  it sends all of 'some data'
  returns 0

I think 1 should be returned because all of 'more data' still needs to 
be sent.  I think returning a 0 will break PQsetnonblocking if you call 
it when there is data in both sslRetryBuf and outputBuffer.
We might even want to try sending the data in outputBuffer after we've 
sent all the data sitting in sslRetryBuf.
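The return-value convention being asked for here can be sketched with a toy model. This is an illustrative sketch only -- FakeConn and fake_pqSendSome are made-up stand-ins, not libpq's real structures -- but it captures the rule: report 0 only when both the SSL retry buffer and the regular output buffer are empty, and 1 whenever either still holds data.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the two send buffers involved (illustrative only, not
 * libpq's real data structures).  ssl_retry_left models data held back
 * for an SSL_write retry; out_left models the normal output buffer.
 */
typedef struct
{
	size_t		ssl_retry_left;	/* bytes pending from a prior SSL_write retry */
	size_t		out_left;		/* bytes pending in the regular output buffer */
} FakeConn;

/*
 * Model of the proposed return convention:
 *   0 -> every queued byte (retry buffer AND output buffer) was sent
 *   1 -> some data, from either buffer, is still pending
 * 'writable' caps how many bytes the fake socket accepts on this call.
 */
int
fake_pqSendSome(FakeConn *conn, size_t writable)
{
	size_t		n;

	/*
	 * Drain the SSL retry buffer first: OpenSSL requires a retried
	 * SSL_write to be presented with the same data as before.
	 */
	n = (conn->ssl_retry_left < writable) ? conn->ssl_retry_left : writable;
	conn->ssl_retry_left -= n;
	writable -= n;

	if (conn->ssl_retry_left == 0)
	{
		n = (conn->out_left < writable) ? conn->out_left : writable;
		conn->out_left -= n;
	}

	return (conn->ssl_retry_left > 0 || conn->out_left > 0) ? 1 : 0;
}
```

Under this convention the second pqSendSome call in the trace above would return 1 rather than 0, since 'more data' is still queued even though the retried bytes went out.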



If you close the connection with an outstanding sslRetryBuf you need to 
free it.



Other than running your test program I didn't do any testing with this 
patch.


Steve


regards,
Martin







Re: [HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 3:31 PM,  karave...@mail.bg wrote:
 clients beta2 +fastlock +lazyvzid local socket
 8 76064 92430 92198 106734
 16 64254 90788 90698 105097
 32 56629 88189 88269 101202
 64 51124 84354 84639 96362
 128 45455 79361 79724 90625
 256 40370 71904 72737 82434

I'm having trouble interpreting this table.

Column 1: # of clients
Column 2: TPS using 9.1beta2 unpatched
Column 3: TPS using 9.1beta2 + fastlock patch
Column 4: TPS using 9.1beta2 + fastlock patch + vxid patch
Column 5: ???

At any rate, that is a big improvement on a system with only 8 cores.
I would have thought you would have needed ~16 cores to get that much
speedup.  I wonder if the -M prepared makes a difference ... I wasn't
using that option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Simon Riggs
On Fri, Jun 24, 2011 at 10:08 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 24, 2011 at 4:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Jun 24, 2011 at 9:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Jun 24, 2011 at 3:46 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Test case please. I don't understand the problem you're describing.

 S1: select * from foo;
 S2: begin;
 S2: alter table foo alter column a set storage plain;
 S1: select * from foo;
 blocks

 Er...yes, that's what locks do. Where is the bug?

 I didn't say it was a bug.  I said that the additional locking you
 added acted to remove much of the benefit of reducing the lock
 strength in the first place.  If a brief exclusive lock (such as would
 be taken by ALTER TABLE SET STORAGE in 9.0 and prior releases) wasn't
 acceptable, a brief exclusive lock on a different lock tag that blocks
 most of the same operations isn't going to be any better.  You might
 still see some improvement in the cases where some complicated
 operation like scanning or rewriting the table can be performed
 without holding the relation definition lock, and then only get the
 relation definition lock afterward.  But for something like the above
 example (or the ALTER TABLE SET (n_distinct) feature you mentioned in
 your previous email) the benefit is going to be darned close to zero.

 This patch has locking, but it's the most reduced form of locking that
 is available for a non-invasive patch for 9.1

 I don't like the extra lock manager traffic this adds to operations
 that aren't even performing DDL, I'm not convinced that it is safe,
 the additional locking negates a significant portion of the benefit of
 the original patch, and Tom's made a fairly convincing argument that
 even with this pg_dump will still become significantly less reliable
 than heretofore even if we did apply it.  In my view, what we need to
 do is revert to using AccessExclusiveLock until some of these other
 details are worked out.  I wasn't entirely convinced that that was
 necessary when Tom first posted this email, but having thought through
 the issues and looked over your patch, it's now my position that that
 is the best way forward.  The fact that your proposed patch appears
 not to be thinking very clearly about when locks need to be taken and
 released only adds to my feeling that we are in for a bad time if we
 go down this route.

 In short: -1 from me.

I've explained all of the above points to you already and you're wrong.

I don't think you understand this patch at all, nor even the original feature.

Locking the table is completely different from locking the definition
of a table.

Do you advocate that all ALTER TABLE operations use
AccessExclusiveLock, or just the operations I added? If you see
problems here, then you must also be willing to increase the lock
strength of things such as INHERITS, and back patch them to where the
same problems exist. You'll wriggle out of that, I'm sure. There are,
regrettably, many bugs here and they can't be fixed in the simple
manner you propose.

It's not me you block Robert, I'm not actually a user and I will sleep
well whatever happens, happy that I tried to resolve this. Users watch
and remember.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 5:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I've explained all of the above points to you already and you're wrong.

We're going to have to agree to disagree on that point.

 Do you advocate that all ALTER TABLE operations use
 AccessExclusiveLock, or just the operations I added? If you see
 problems here, then you must also be willing to increase the lock
 strength of things such as INHERITS, and back patch them to where the
 same problems exist. You'll wriggle out of that, I'm sure. There are,
 regrettably, many bugs here and they can't be fixed in the simple
 manner you propose.

I think there is quite a lot of difference between realizing that we
can't fix every problem, and deciding to put out a release that adds a
whole lot more of them that we have no plans to fix.

 It's not me you block Robert, I'm not actually a user and I will sleep
 well whatever happens, happy that I tried to resolve this. Users watch
 and remember.

If you are proposing that I should worry about a posse of angry
PostgreSQL users hunting me down (or abandoning the product) because I
agreed with Tom Lane on the necessity of reverting one of your
patches, then I'm willing to take that chance.  For one thing, there's
a pretty good chance they'll go after Tom first.  For two things,
there's at least an outside chance I might be rescued by an
alternative posse who supports our tradition of putting out high
quality releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Small 9.1 documentation fix (SSPI auth)

2011-06-24 Thread Christian Ullrich
When Magnus fixed and applied my SSPI-via-GSS patch in January, we 
forgot to fix the documentation. Suggested patch attached; should I 
also put that four-liner into any CFs?


--
Christian
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
new file mode 100644
index 575eb3b..e7c7db4
*** a/doc/src/sgml/client-auth.sgml
--- b/doc/src/sgml/client-auth.sgml
*** omicron bryanh  
*** 989,996 
  <literal>negotiate</literal> mode, which will use
  <productname>Kerberos</productname> when possible and automatically
  fall back to <productname>NTLM</productname> in other cases.
! <productname>SSPI</productname> authentication only works when both
! server and client are running <productname>Windows</productname>.
 </para>
  
 <para>
--- 989,998 
  <literal>negotiate</literal> mode, which will use
  <productname>Kerberos</productname> when possible and automatically
  fall back to <productname>NTLM</productname> in other cases.
! In addition to clients running on <productname>Windows</productname>,
! <productname>SSPI</productname> authentication is also supported by
! <application>libpq</application>-based clients on other platforms, 
! through the use of <productname>GSSAPI</productname>.
 </para>
  
 <para>



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera

So remind me ... did we discuss PRIMARY KEY constraints?  Are they
supposed to show up as inherited not null rows in the child?  Obviously,
they do not show up as PKs in the child, but they *are* not null so my
guess is that they need to be inherited as not null as well.  (Right
now, unpatched head of course emits the column as attnotnull).

In this case, the inherited name (assuming that the child declaration
does not explicitly state a constraint name) should be the same as the
PK, correct?

It is unclear to me that primary keys shouldn't be inherited by default.
But I guess that's a separate discussion.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 6:39 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 So remind me ... did we discuss PRIMARY KEY constraints?  Are they
 supposed to show up as inherited not null rows in the child?  Obviously,
 they do not show up as PKs in the child, but they *are* not null so my
 guess is that they need to be inherited as not null as well.  (Right
 now, unpatched head of course emits the column as attnotnull).

 In this case, the inherited name (assuming that the child declaration
 does not explicitly state a constraint name) should be the same as the
 PK, correct?

I would tend to think of the not-null-ness that is required by the
primary constraint as a separate constraint, not an intrinsic part of
the primary key.  IOW, if you drop the primary key constraint, IMV,
that should never cause the column to begin allowing nulls.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread David Christensen
On Jun 24, 2011, at 2:28 PM, Christopher Browne wrote:
 On Fri, Jun 24, 2011 at 6:47 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 06/24/2011 01:42 PM, Robert Haas wrote:
 I am not inclined to try to track sponsors in the commit message at
 all.
 
 I was not suggesting that information be part of the commit.  We've worked
 out a reasonable initial process for the people working on sponsored
 features to record that information completely outside of the commit or
 release notes data.  It turns out that process would be easier to
 drive if it were easier to derive a feature-{commit,author} list,
 though--and that would spit out for free with the rest of this.  Improving
 the ability to do sponsor tracking is more of a helpful side-effect of
 something that's useful for other reasons rather than a direct goal.
 
 In taking a peek at the documentation and comments out on the interweb
 about git amend, I don't think that it's going to be particularly
 successful if we try to capture all this stuff in the commit message
 and metadata, because it's tough to guarantee that *all* this data
 would be correctly captured at commit time, and you can't revise it
 after the commit gets pushed upstream.

Perhaps `git notes` could be something used to annotate these:

http://www.kernel.org/pub/software/scm/git/docs/git-notes.html

Regards,

David
--
David Christensen
End Point Corporation
da...@endpoint.com







Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Peter Eisentraut
On Fri, 2011-06-24 at 16:34 -0400, Bruce Momjian wrote:
   It also creates two new environment variables,
   OLDPGPORT and NEWPGPORT, to control the port values because we
 don't
   want to default to PGPORT anymore.
  
  I would prefer that all PostgreSQL-related environment variables
 start
  with PG.
 
 OK, attached.  I was also using environment variables for PGDATA and
 PGBIN do I renamed those too to begin with 'PG'.

I'm wondering why pg_upgrade needs environment variables at all.  It's a
one-shot operation.  Environment variables are typically used to share
default settings across programs.  I don't see how that applies here.




Re: [HACKERS] Another issue with invalid XML values

2011-06-24 Thread Noah Misch
On Fri, Jun 24, 2011 at 04:15:35PM +0200, Florian Pflug wrote:
 On Jun24, 2011, at 13:24 , Noah Misch wrote:
  On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:

  There's also the " parser error : " difference; would that also be easy to
  re-add?  (I'm assuming it's not a constant but depends on the
  xmlErrorDomain.)
 
 It's the string representation of the error domain and error level. 
 Unfortunately,
 the logic which builds that string representation is contained in the static
 function xmlReportError() deep within libxml, and that function doesn't seem
 to be called at all for errors reported via a structured error handler :-(
 
 So re-adding that would mean duplicating that code within our xml.c, which
 seems undesirable...

Yes, that seems like sufficient trouble to not undertake merely for the sake of
a cosmetic error message match.

  ! typedef enum
  ! {
  ! 	PG_XML_STRICTNESS_NONE		/* No error handling */,
  ! 	PG_XML_STRICTNESS_LEGACY	/* Ignore notices/warnings/errors unless
  ! 								 * function result indicates error condition */,
  ! 	PG_XML_STRICTNESS_WELLFORMED	/* Ignore non-parser notices/warnings/errors */,
  ! 	PG_XML_STRICTNESS_ALL		/* Report all notices/warnings/errors */,
  ! } PgXmlStrictnessType;
  
  We don't generally prefix backend names with Pg/PG.  Also, I think the word
  Type in PgXmlStrictnessType is redundant.  How about just XmlStrictness?
 
 I agree with removing the Type suffix. I'm  not so sure about the Pg
 prefix, though. I've added that after actually hitting a conflict between
 one of this enum's constants and some constant defined by libxml. Admittedly,
 that was *before* I renamed the type to PgXmlStrictnessType, so removing
 the Pg prefix now would probably work. But it seems a bit close for
 comfort - libxml defines a ton of constants named XML_something.
 
 Still, if you feel that the Pg prefix looks to weird and believe that it's
 better to wait until there's an actual clash before renaming things, I won't
 object further. Just wanted to bring the issue to your attention...

I do think it looks weird, but I agree that the potential for conflict is
notably higher than normal.  I'm fine with having the prefix.

Thanks,
nm



Re: [HACKERS] News on Clang

2011-06-24 Thread Peter Eisentraut
On Fri, 2011-06-24 at 18:02 +0100, Peter Geoghegan wrote:
 I'm very encouraged by this - Clang is snapping at the heels of GCC
 here. I'd really like to see Clang as a better supported compiler,

We have a build farm member for Clang:
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=smew&br=HEAD

That doesn't capture the compile time issues that you described, but
several other issues that caused the builds to fail altogether have been
worked out recently, and I'd consider clang 2.9+ to be fully supported
as of PG 9.1.





Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 04:52 PM, Bruce Momjian wrote:

That tagging is basically what I do on my first pass through the release
notes.  For the gory details:

http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009
   


Excellent summary of the process I was trying to suggest might be 
improved; the two most relevant bits:


3   remove insignificant items   2.7k   1 day
4   research and reword items    1k     5 days


Some sort of tagging to identify feature changes should drive down the 
time spent on filtering insignificant items.  And the person doing the 
commit already has the context you are acquiring later as research 
here.  Would suggesting they try to write a short description at commit 
time drive it and the reword phase time down significantly?  Can't say 
for sure, but I wanted to throw the idea out for 
consideration--particularly since solving it well ends up making some of 
this other derived data people would like to see a lot easier to 
generate too.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Bruce Momjian
Peter Eisentraut wrote:
 On Fri, 2011-06-24 at 16:34 -0400, Bruce Momjian wrote:
It also creates two new environment variables,
OLDPGPORT and NEWPGPORT, to control the port values because we
  don't
want to default to PGPORT anymore.
   
   I would prefer that all PostgreSQL-related environment variables
  start
   with PG.
  
  OK, attached.  I was also using environment variables for PGDATA and
  PGBIN so I renamed those too to begin with 'PG'.
 
 I'm wondering why pg_upgrade needs environment variables at all.  It's a
 one-shot operation.  Environment variables are typically used to share
 default settings across programs.  I don't see how that applies here.

They were there in the original EnterpriseDB code, and in some cases
like PGPORT, we _used_ those environment variables.  Also, the
command-line can get pretty long so we actually illustrate environment
variable use in its --help:

For example:
  pg_upgrade -d oldCluster/data -D newCluster/data -b oldCluster/bin -B 
newCluster/bin
or
  $ export OLDDATADIR=oldCluster/data
  $ export NEWDATADIR=newCluster/data
  $ export OLDBINDIR=oldCluster/bin
  $ export NEWBINDIR=newCluster/bin
  $ pg_upgrade

You want the environment variable support removed?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] FWD: fastlock+lazyvzid patch performance

2011-06-24 Thread karavelov
- Quote from Robert Haas (robertmh...@gmail.com), on 25.06.2011 at 00:16 -

 On Fri, Jun 24, 2011 at 3:31 PM,   wrote:
 clients beta2 +fastlock +lazyvzid local socket
 8 76064 92430 92198 106734
 16 64254 90788 90698 105097
 32 56629 88189 88269 101202
 64 51124 84354 84639 96362
 128 45455 79361 79724 90625
 256 40370 71904 72737 82434
 
 I'm having trouble interpreting this table.
 
 Column 1: # of clients
 Column 2: TPS using 9.1beta2 unpatched
 Column 3: TPS using 9.1beta2 + fastlock patch
 Column 4: TPS using 9.1beta2 + fastlock patch + vxid patch
 Column 5: ???

9.1beta2 + fastlock patch + vxid patch, with pgbench run over a unix domain 
socket; the other tests use a local TCP connection.

 At any rate, that is a big improvement on a system with only 8 cores.
 I would have thought you would have needed ~16 cores to get that much
 speedup.  I wonder if the -M prepared makes a difference ... I wasn't
 using that option.
 

Yes, it does make some difference.
Using unpatched beta2 with 8 clients and the simple protocol I get 57059 tps.
With all patches and the simple protocol I get 60707 tps. So the difference
between patched and stock is not so big. I suppose the system gets CPU bound
on parsing and planning every submitted request. With -M extended I 
get even slower results.

Luben

--
Perhaps, there is no greater love than that of a
 revolutionary couple where each of the two lovers is
 ready to abandon the other at any moment if revolution
 demands it.
 Zizek

Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Bruce Momjian
Greg Smith wrote:
 On 06/24/2011 04:52 PM, Bruce Momjian wrote:
  That tagging is basically what I do on my first pass through the release
  notes.  For the gory details:
 
  http://momjian.us/main/blogs/pgblog/2009.html#March_25_2009
 
 
 Excellent summary of the process I was trying to suggest might be 
 improved; the two most relevant bits:
 
  3 remove insignificant items   2.7k   1 day
  4 research and reword items    1k     5 days
 
 
 Some sort of tagging to identify feature changes should drive down the 
 time spent on filtering insignificant items.  And the person doing the 
 commit already has the context you are acquiring later as research 
 here.  Would suggesting they try to write a short description at commit 
 time drive it and the reword phase time down significantly?  Can't say 
 for sure, but I wanted to throw the idea out for 
 consideration--particularly since solving it well ends up making some of 
 this other derived data people would like to see a lot easier to 
 generate too.

Most of those five days is tracking down commits where I can't figure
out the user-visible change, or if there is one, and wording things to
be in a consistent voice.  Git does allow me to look those up much
faster than CVS.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 03:28 PM, Christopher Browne wrote:

I expect that the correlation between commit and [various parties] is
something that will need to take place outside git.
   


Agreed on everything except the Author information that is already 
being placed into each commit.  The right data is already going into 
there, all it would take is some small amount of tagging to make it 
 easier to extract programmatically.



The existing CommitFest data goes quite a long ways towards capturing
interesting information (with the likely exception of sponsorship);
what it's missing, at this point, is a capture of what commit or
commits wound up drawing the proposed patch into the official code
base.


The main problem with driving this from the CommitFest app is that not 
every feature ends up in there.  Committers who commit their own work 
are one source of those.  Commits for bug fixes that end up being 
notable enough to go into the release notes are another.


I agree it would be nice if every entry marked as Committed in the CF 
app included a final link to the message ID of the commit closing it.  
But since I don't ever see that being the complete data set, I find it 
hard to justify enforcing that work.  And the ability to operate 
 programmatically on the output from git log is a slightly easier path 
to walk down than extracting the same from the CF app, you avoid one 
pre-processing step:  extracting the right entries in the database to 
get a list of commit IDs.
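The kind of programmatic pass over git log output described here could start roughly like this; a minimal sketch only, where the helper names and the doc/ prefix convention are assumptions rather than project tooling:

```python
import subprocess

# Path prefixes that mark a commit as documentation-only; "doc/" matches
# the PostgreSQL source tree layout, but this list is an assumption here.
DOC_PREFIXES = ("doc/",)

def classify(paths):
    """Guess a release-note tag from the files a commit touches."""
    if not paths:
        return "empty"
    if all(p.startswith(DOC_PREFIXES) for p in paths):
        return "docs-only"
    return "needs-review"

def touched_files(commit):
    """List the files a commit touched (assumes a git checkout is present)."""
    out = subprocess.run(
        ["git", "show", "--pretty=format:", "--name-only", commit],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]
```

Anything classified `needs-review` would still land in front of a human; the point is only to pre-filter the obvious cases before the manual pass.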


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] News on Clang

2011-06-24 Thread Peter Geoghegan
On 25 June 2011 00:27, Peter Eisentraut pete...@gmx.net wrote:
 On Fri, 2011-06-24 at 18:02 +0100, Peter Geoghegan wrote:
 I'm very encouraged by this - Clang is snapping at the heels of GCC
 here. I'd really like to see Clang as a better supported compiler,

 We have a build farm member for Clang:
 http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=smew&br=HEAD

 That doesn't capture the compile time issues that you described, but
 several other issues that caused the builds to fail altogether have been
 worked out recently, and I'd consider clang 2.9+ to be fully supported
 as of PG 9.1.

I did see the -D_GNU_SOURCE issue that you described a while back with
2.8. My bleeding edge Fedora 15 system's yum repository only has 2.8,
for some reason.

I'm glad that you feel we're ready to officially support Clang -
should this be in the 9.1 release notes?

Anyway, since the problem has been narrowed to a specific compiler
flag, and we have a good test case, I'm optimistic that the bug can be
fixed quickly.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Greg Smith

On 06/24/2011 03:21 PM, Robert Haas wrote:

If I were attacking this problem, I'd be inclined to make a web
application that lists all the commits in a format roughly similar to
the git API, and then lets you tag each commit with tags from some
list (feature, bug-fix, revert, repair-of-previous-commit, etc.).
Some of the tagging (e.g. docs-only) could probably even be done
automatically.  Then somebody could go through once a month and update
all the tags.  I'd be much more willing to volunteer to do that than I
would be to try to get the right metadata tag in every commit...
   


I tend not to think in terms of solutions that involve web applications 
because I never build them, but this seems like a useful approach to 
consider.  Given that the list of tags is pretty static, I could see a 
table with a line for each commit, and a series of check boxes in 
columns for each tag next to it.  Then you just click on each of the 
tags that apply to that line.


Once that was done, increasing the amount of smarts that go into 
pre-populating which boxes are already filled in could then happen, with 
docs only being the easiest one to spot.  A really smart 
implementation here might even eventually make a good guess for bug 
fix too, based on whether a bunch of similar commits happened to other 
branches around the same time.  Everyone is getting better lately at 
noting the original SHA1 when fixing a mistake too, so being able to 
identify repair seems likely when that's observed.
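The back-patch heuristic sketched above (similar commits appearing on several branches at about the same time) could be approximated like this; the input format is a stand-in for pre-parsed git log output, not a real API:

```python
from collections import defaultdict

def guess_backpatched(commits):
    """Return commit subjects that appear on more than one branch.

    `commits` is an iterable of (branch, subject) pairs.  A subject
    committed to master and to one or more back branches around the same
    time is a reasonable guess for a back-patched bug fix.
    """
    branches_by_subject = defaultdict(set)
    for branch, subject in commits:
        # Normalize the subject so trivial whitespace/case differences
        # between branches still group together.
        branches_by_subject[subject.strip().lower()].add(branch)
    return {s for s, b in branches_by_subject.items() if len(b) > 1}
```

A real implementation would also want a time window, since the same subject reused months apart is probably unrelated.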


This approach would pull the work from being at commit time, but it 
would still be easier to do incrementally and to distribute the work 
around than what's feasible right now.  Release note prep takes critical 
project contributors a non-trivial amount of time, this would let anyone 
who felt like tagging things for an hour help with the early stages of 
that.  And it would provide a functional source for the metadata I've 
been searching for too, to drive all this derived data about sponsors etc.


Disclaimer:  as a person who does none of this work currently, my 
suggestions are strictly aimed to inspire those who do in a direction 
that might makes things easier for them.  I can get the sponsor stuff 
I've volunteered to work on finished regardless.  I just noticed what 
seems like it could be a good optimization over here while I was working 
on that.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us





Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Bruce Momjian
Greg Smith wrote:
 I tend not to think in terms of solutions that involve web applications 
 because I never build them, but this seems like a useful approach to 
 consider.  Given that the list of tags is pretty static, I could see a 
 table with a line for each commit, and a series of check boxes in 
 columns for each tag next to it.  Then you just click on each of the 
 tags that apply to that line.
 
 Once that was done, increasing the amount of smarts that go into 
 pre-populating which boxes are already filled in could then happen, with 
 docs only being the easiest one to spot.  A really smart 
 implementation here might even eventually make a good guess for bug 
 fix too, based on whether a bunch of similar commits happened to other 
 branches around the same time.  Everyone is getting better lately at 
 noting the original SHA1 when fixing a mistake too, so being able to 
 identify repair seems likely when that's observed.
 
 This approach would pull the work from being at commit time, but it 
 would still be easier to do incrementally and to distribute the work 
 around than what's feasible right now.  Release note prep takes critical 
 project contributors a non-trivial amount of time, this would let anyone 
 who felt like tagging things for an hour help with the early stages of 
 that.  And it would provide a functional source for the metadata I've 
 been searching for too, to drive all this derived data about sponsors etc.

We could have the items put into release-note categories and have a
button that marks incompatibilities.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Re: starting to review the Extend NOT NULL representation to pg_constraint patch

2011-06-24 Thread Alvaro Herrera
Excerpts from Robert Haas's message of Fri Jun 24 19:01:49 -0400 2011:
 On Fri, Jun 24, 2011 at 6:39 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  So remind me ... did we discuss PRIMARY KEY constraints?  Are they
  supposed to show up as inherited not null rows in the child?  Obviously,
  they do not show up as PKs in the child, but they *are* not null so my
  guess is that they need to be inherited as not null as well.  (Right
  now, unpatched head of course emits the column as attnotnull).
 
  In this case, the inherited name (assuming that the child declaration
  does not explicitly state a constraint name) should be the same as the
  PK, correct?
 
 I would tend to think of the not-null-ness that is required by the
 primary constraint as a separate constraint, not an intrinsic part of
 the primary key.  IOW, if you drop the primary key constraint, IMV,
 that should never cause the column to begin allowing nulls.

Yeah, that is actually what happens.  (I had never noticed this, but it seems
obvious in hindsight.)

alvherre=# create table foo (a int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for
table "foo"
CREATE TABLE
alvherre=# alter table foo drop constraint foo_pkey;
ALTER TABLE
alvherre=# \d foo
     Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 a      | integer | not null

What this says is that this patch needs to be creating pg_constraint
entries for all PRIMARY KEY columns too, which answers my question above
quite nicely.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Latch implementation that wakes on postmaster death on both win32 and Unix

2011-06-24 Thread Peter Geoghegan
Attached is a patch that addresses Fujii's third and most recent set of concerns.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 4952d22..bfe6bcd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -10158,7 +10158,7 @@ retry:
 	/*
 	 * Wait for more WAL to arrive, or timeout to be reached
 	 */
-	WaitLatch(&XLogCtl->recoveryWakeupLatch, 5000000L);
+	WaitLatch(&XLogCtl->recoveryWakeupLatch, WL_LATCH_SET | WL_TIMEOUT, 5000000L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 }
 else
diff --git a/src/backend/port/unix_latch.c b/src/backend/port/unix_latch.c
index 6dae7c9..6d2e3a1 100644
--- a/src/backend/port/unix_latch.c
+++ b/src/backend/port/unix_latch.c
@@ -93,6 +93,7 @@
 #endif
 
 #include "miscadmin.h"
+#include "postmaster/postmaster.h"
 #include "storage/latch.h"
 #include "storage/shmem.h"
 
@@ -188,22 +189,25 @@ DisownLatch(volatile Latch *latch)
  * backend-local latch initialized with InitLatch, or a shared latch
  * associated with the current process by calling OwnLatch.
  *
- * Returns 'true' if the latch was set, or 'false' if timeout was reached.
+ * Returns bit field indicating which condition(s) caused the wake-up.
+ *
+ * Note that there is no guarantee that callers will have all wake-up conditions
+ * returned, but we will report at least one.
  */
-bool
-WaitLatch(volatile Latch *latch, long timeout)
+int
+WaitLatch(volatile Latch *latch, int wakeEvents, long timeout)
 {
-	return WaitLatchOrSocket(latch, PGINVALID_SOCKET, false, false, timeout) > 0;
+	return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET, timeout);
 }
 
 /*
  * Like WaitLatch, but will also return when there's data available in
- * 'sock' for reading or writing. Returns 0 if timeout was reached,
- * 1 if the latch was set, 2 if the socket became readable or writable.
+ * 'sock' for reading or writing.
+ *
+ * Returns same bit mask and makes same guarantees as WaitLatch.
  */
 int
-WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
-  bool forWrite, long timeout)
+WaitLatchOrSocket(volatile Latch *latch, int wakeEvents, pgsocket sock, long timeout)
 {
 	struct timeval tv,
 			   *tvp = NULL;
@@ -211,12 +215,15 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	fd_set		output_mask;
 	int			rc;
 	int			result = 0;
+	bool		found = false;
+
+	Assert(wakeEvents != 0);
 
 	if (latch->owner_pid != MyProcPid)
 		elog(ERROR, "cannot wait on a latch owned by another process");
 
 	/* Initialize timeout */
-	if (timeout >= 0)
+	if (timeout >= 0 && (wakeEvents & WL_TIMEOUT))
 	{
 		tv.tv_sec = timeout / 1000000L;
 		tv.tv_usec = timeout % 1000000L;
@@ -224,7 +231,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 	}
 
 	waiting = true;
-	for (;;)
+	do
 	{
 		int			hifd;
 
@@ -235,16 +242,31 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		 * do that), and the select() will return immediately.
 		 */
 		drainSelfPipe();
-		if (latch->is_set)
+		if (latch->is_set && (wakeEvents & WL_LATCH_SET))
 		{
-			result = 1;
+			result |= WL_LATCH_SET;
+			found = true;
+			/* Leave loop immediately, avoid blocking again.
+			 *
+			 * Don't attempt to report any other reason
+			 * for returning to callers that may have
+			 * happened to coincide.
+			 */
 			break;
 		}
 
 		FD_ZERO(&input_mask);
 		FD_SET(selfpipe_readfd, &input_mask);
 		hifd = selfpipe_readfd;
-		if (sock != PGINVALID_SOCKET && forRead)
+
+		if (wakeEvents & WL_POSTMASTER_DEATH)
+		{
+			FD_SET(postmaster_alive_fds[POSTMASTER_FD_WATCH], &input_mask);
+			if (postmaster_alive_fds[POSTMASTER_FD_WATCH] > hifd)
+				hifd = postmaster_alive_fds[POSTMASTER_FD_WATCH];
+		}
+
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_READABLE))
 		{
 			FD_SET(sock, &input_mask);
 			if (sock > hifd)
@@ -252,7 +274,7 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 		}
 
 		FD_ZERO(&output_mask);
-		if (sock != PGINVALID_SOCKET && forWrite)
+		if (sock != PGINVALID_SOCKET && (wakeEvents & WL_SOCKET_WRITEABLE))
 		{
 			FD_SET(sock, &output_mask);
 			if (sock > hifd)
@@ -268,20 +290,33 @@ WaitLatchOrSocket(volatile Latch *latch, pgsocket sock, bool forRead,
 					(errcode_for_socket_access(),
 					 errmsg("select() failed: %m")));
 		}
-		if (rc == 0)
+		if (rc == 0 && (wakeEvents & WL_TIMEOUT))
 		{
 			/* timeout exceeded */
-			result = 0;
-			break;
+			result |= WL_TIMEOUT;
+			found = true;
 		}
-		if (sock != PGINVALID_SOCKET &&
-			((forRead && FD_ISSET(sock, &input_mask)) ||
-			 (forWrite && FD_ISSET(sock, &output_mask))))
+		if (sock != PGINVALID_SOCKET)
 		{
-			result = 2;
-			break;/* data available in socket */
+			if ((wakeEvents & WL_SOCKET_READABLE) && FD_ISSET(sock, &input_mask))
+			{
+				result |= WL_SOCKET_READABLE;
+				found = true;	/* data available in socket */
+	

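The patch changes WaitLatch's return convention from a single code (timeout/latch-set/socket) to a bit mask of WL_* flags, so one return value can report several coincident wake-up reasons. A toy sketch of that calling pattern, with illustrative flag values rather than PostgreSQL's actual definitions:

```python
# Wake-up reasons as independent bits, mirroring the WL_* flags in the
# patch above (these values are illustrative, not PostgreSQL's).
WL_LATCH_SET        = 1 << 0
WL_TIMEOUT          = 1 << 1
WL_SOCKET_READABLE  = 1 << 2
WL_POSTMASTER_DEATH = 1 << 3

def wait(happened, wake_events):
    """Toy stand-in for WaitLatch: report only the conditions the
    caller asked about, possibly several at once."""
    return happened & wake_events

# A timeout and a readable socket coincide, but the caller asked only
# about latch-set and timeout, so the mask reports exactly WL_TIMEOUT.
rc = wait(WL_TIMEOUT | WL_SOCKET_READABLE, WL_LATCH_SET | WL_TIMEOUT)
```

Under the old 0/1/2 convention a coincident event had to be dropped or arbitrarily prioritized; with a mask each condition is an independent bit the caller tests with `rc & WL_TIMEOUT` and so on.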
Re: [HACKERS] Deriving release notes from git commit messages

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 7:51 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 06/24/2011 03:28 PM, Christopher Browne wrote:
 I expect that the correlation between commit and [various parties] is
 something that will need to take place outside git.

 Agreed on everything except the Author information that is already being
 placed into each commit.  The right data is already going into there, all it
 would take is some small amount of tagging to make it easier to extract
 programmatically.

Yeah, I think we should seriously consider doing something about that.

 The existing CommitFest data goes quite a long ways towards capturing
 interesting information (with the likely exception of sponsorship);
 what it's missing, at this point, is a capture of what commit or
 commits wound up drawing the proposed patch into the official code
 base.

 The main problem with driving this from the CommitFest app is that not every
 feature ends up in there.  Committers who commit their own work are one
 source of those.  Commits for bug fixes that end up being notable enough to
 go into the release notes are another.

Yep.

 I agree it would be nice if every entry marked as Committed in the CF app
 included a final link to the message ID of the commit closing it.  But since
 I don't ever see that being the complete data set, I find it hard to justify
 enforcing that work.  And the ability to operate programatically on the
 output from git log is a slightly easier path to walk down than extracting
 the same from the CF app, you avoid one pre-processing step:  extracting the
 right entries in the database to get a list of commit IDs.

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade defaulting to port 25432

2011-06-24 Thread Robert Haas
On Fri, Jun 24, 2011 at 7:47 PM, Bruce Momjian br...@momjian.us wrote:
 Peter Eisentraut wrote:
 On Fri, 2011-06-24 at 16:34 -0400, Bruce Momjian wrote:
It also creates two new environment variables,
OLDPGPORT and NEWPGPORT, to control the port values because we
  don't
want to default to PGPORT anymore.
  
   I would prefer that all PostgreSQL-related environment variables
  start
   with PG.
 
  OK, attached.  I was also using environment variables for PGDATA and
  PGBIN do I renamed those too to begin with 'PG'.

 I'm wondering why pg_upgrade needs environment variables at all.  It's a
 one-shot operation.  Environment variables are typically used to share
 default settings across programs.  I don't see how that applies here.

 They were there in the original EnterpriseDB code, and in some cases
 like PGPORT, we _used_ those environment variables.  Also, the
 command-line can get pretty long so we actually illustrate environment
 variable use in its --help:

        For example:
          pg_upgrade -d oldCluster/data -D newCluster/data -b oldCluster/bin 
 -B newCluster/bin
        or
          $ export OLDDATADIR=oldCluster/data
          $ export NEWDATADIR=newCluster/data
          $ export OLDBINDIR=oldCluster/bin
          $ export NEWBINDIR=newCluster/bin
          $ pg_upgrade

 You want the environment variable support removed?

I don't.  Its production usefulness is questionable, but it's quite
handy for testing IMO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
