Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-03 Thread Amit Kapila
On Tuesday, April 02, 2013 9:49 PM Peter Eisentraut wrote:
 I'm going to ignore most of the discussion that led up to this and give
 this patch a fresh look.

Thank you.

 + <screen>
 + SET PERSISTENT max_connections To 10;
 + </screen>
 
 The "To" should probably be capitalized.

Okay, shall fix it.
 
 I doubt this example actually works because changing max_connections
 requires a restart.  Try to find a better example.

Any parameter changed with this command takes effect only after a
restart or a SIGHUP.
So I will change the example to use a SIGHUP parameter.

 It's weird that SET LOCAL and SET SESSION actually *set* the value, and
 the second key word determines how long the setting will last.  SET
 PERSISTENT doesn't actually set the value.  I predict that this will be
 a new favorite help-it-doesn't-work FAQ.
 
   <varlistentry>
    <term><literal>SCHEMA</literal></term>
    <listitem>
 !   <para><literal>SET [ PERSISTENT ] SCHEMA
 '<replaceable>value</>'</> is an alias for
  <literal>SET search_path TO <replaceable>value</></>.  Only one
  schema can be specified using this syntax.
 </para>
 
 I don't think [ PERSISTENT ] needs to be added in these and similar
 snippets.  We don't mention LOCAL etc. here either.

Agreed, shall fix this.

 --- 34,41 ----
    <filename>postgresql.conf</filename>, <filename>pg_hba.conf</filename>, and
    <filename>pg_ident.conf</filename> are traditionally stored in
    <varname>PGDATA</> (although in <productname>PostgreSQL</productname> 8.0 and
 !  later, it is possible to place them elsewhere). By default the directory
 !  config is stored in <varname>PGDATA</>, however it should be kept along with
 !  <filename>postgresql.conf</filename>.
    </para>
 
    <table tocentry="1" id="pgdata-contents-table">
 
 This chunk doesn't make any sense to me.  "Should" is always tricky.
 Why should I, and why might I not?

What I mean to say here is that the user needs to move the config directory
and its contents along with
postgresql.conf.
Shall we change it as below:
By default the config directory is stored in PGDATA; however, it needs to be
kept along with postgresql.conf.


     <row>
 +    <entry><filename>config</></entry>
 +    <entry>Subdirectory containing automatically generated configuration files</entry>
 +   </row>
 +
 +   <row>
      <entry><filename>base</></entry>
      <entry>Subdirectory containing per-database subdirectories</entry>
     </row>
 
 Only automatically generated ones?

No, other files can also be present.
How about changing it to:
Subdirectory containing generated configuration files.
Any other suggestions?

This new directory will be used to place generated files.

 COPY_STRING_FIELD(name);
 COPY_NODE_FIELD(args);
 COPY_SCALAR_FIELD(is_local);
 +   COPY_SCALAR_FIELD(is_persistent);
 
 return newnode;
   }
 
 I suggest changing is_local into a new trivalued field that stores
 LOCAL
 or SESSION or PERSISTENT.
 
 n->is_local = false;
 $$ = (Node *) n;
 }

Okay, I will change it.
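
For illustration, a minimal sketch of what that tri-valued field could look
like (a hypothetical sketch, not the committed design; the surrounding
VariableSetStmt fields follow parsenodes.h):

    /* hypothetical sketch only */
    typedef enum VariableSetScope
    {
        VAR_SCOPE_SESSION,          /* plain SET */
        VAR_SCOPE_LOCAL,            /* SET LOCAL */
        VAR_SCOPE_PERSISTENT        /* SET PERSISTENT */
    } VariableSetScope;

    typedef struct VariableSetStmt
    {
        NodeTag         type;
        VariableSetKind kind;
        char           *name;       /* variable to be set */
        List           *args;       /* list of A_Const nodes */
        VariableSetScope scope;     /* replaces is_local/is_persistent */
    } VariableSetStmt;

The copyfuncs.c hunk quoted above would then shrink to a single
COPY_SCALAR_FIELD(scope).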

 +   | SET PERSISTENT set_persistent
 +   {
 +   VariableSetStmt *n = $3;
 +   n->is_persistent = true;
 +   $$ = (Node *) n;
 +   }
 ;
 
   set_rest:
 
 Why can't you use SET PERSISTENT set_rest?

Because SET PERSISTENT cannot be used with some of the syntaxes, for example
SESSION AUTHORIZATION, and it is also not supported inside transaction blocks.

 
 *** a/src/backend/replication/basebackup.c
 --- b/src/backend/replication/basebackup.c
 ***
 *** 755,760  sendDir(char *path, int basepathlen, bool sizeonly)
 --- 755,766 
 strlen(PG_TEMP_FILE_PREFIX)) == 0)
 continue;
 
 +   /* skip auto conf temporary file */
 +   if (strncmp(de->d_name,
 +               PG_AUTOCONF_FILENAME ".",
 +               sizeof(PG_AUTOCONF_FILENAME)) == 0)
 +   continue;
 +
 
 Maybe pg_basebackup should be taught to ignore certain kinds of
 temporary files in general.  The file name shouldn't be hardcoded into
 pg_basebackup.  This would effectively make the configuration file
 naming scheme part of the replication protocol.  See other thread about
 pg_basebackup client/server compatibility.  This needs to be
 generalized.

As we are just trying to ignore the file during backup, why should it affect
the replication protocol?
I mean, the protocol would be affected if we tried to send something new or
didn't send what is expected on the other side.
Also, the same is done for the temp and backup label files.

 (Side thought: Does pg_basebackup copy editor temp and backup files?)

Currently pg_basebackup doesn't copy temp or backup files. The check I made
in code is similar to check for those two.
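
To make the generalization concrete, a hedged sketch of a centralized skip
test (PG_TEMP_FILE_PREFIX and PG_AUTOCONF_FILENAME exist in the tree, though
the fallback values below are assumptions; the helper itself is invented for
illustration and is not actual PostgreSQL code):

    #include <stdbool.h>
    #include <string.h>

    #ifndef PG_TEMP_FILE_PREFIX
    #define PG_TEMP_FILE_PREFIX  "pgsql_tmp"            /* as in the tree */
    #endif
    #ifndef PG_AUTOCONF_FILENAME
    #define PG_AUTOCONF_FILENAME "postgresql.auto.conf" /* assumed value */
    #endif

    /* illustrative only: one place to decide what a base backup skips */
    static bool
    skip_in_basebackup(const char *name)
    {
        /* backend temporary files, e.g. pgsql_tmpNNN.N */
        if (strncmp(name, PG_TEMP_FILE_PREFIX,
                    strlen(PG_TEMP_FILE_PREFIX)) == 0)
            return true;

        /* temporary auto-config files; sizeof() includes the NUL,
         * so this matches the base name plus a trailing "." */
        if (strncmp(name, PG_AUTOCONF_FILENAME ".",
                    sizeof(PG_AUTOCONF_FILENAME)) == 0)
            return true;

        return false;
    }

sendDir() would then have a single test to maintain rather than one
hardcoded comparison per file kind.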

 

Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-03 Thread Amit Kapila
On Wednesday, April 03, 2013 2:23 AM Greg Smith wrote:
 On 4/1/13 5:44 AM, Amit Kapila wrote:
 
  I think in that case we can have 3 separate patches
  1. Memory growth defect fix
  2. Default postgresql.conf to include config directory and SET
 Persistent
  into single file implementation
  3. Rearrangement of GUC validation into validate_conf_option
 function.
 
  As already there is review happened for point 2 and point 1 is an
 existing
  code defect fix, so in my opinion
  Patch 1 and 2 should be considered for 9.3.
 
 They have been considered for 9.3, I just doubt they could get
 committed
 right now.  In order for this to go in as part of the very last 9.3
 feature changes (which is where we're at in the development cycle),
 you'd need to have a committer volunteer to take on the job of doing a
 second level of review on it.  And then that would have to happen
 without any other issues popping up.  That usually is not how it works.
   Normally the first round of committer review finds another round of
 issues, and there's at least one more update before commit.  (Note that
 this is exactly what just happened today, with the review from Peter
 Eisentraut)
 
 I'm not saying it's impossible for this feature to go in to 9.3, but
 I'm
 not seeing any committers volunteer to take on the job either.  I do
 want to see this feature go in--I'll update it for 9.4 even if you
 don't--but we're already into April.  There isn't much time left for
 9.3.  And the security release this week has gobbled up a good chunk of
 committer and packager time unexpectedly, which is just bad luck for
 your submission.
 
  From a process perspective, features that enter the last CF of a
 release that are very close to ready from the start have good odds of
 being committed.  You've done an excellent job of updating this in
 response to feedback, but it has involved a long list of changes so
 far.
   It's fair to say this was still a rough feature at the start of CF
 2013-01, and now it's good but can be usefully polished a bit more.
 
 For something of this size, going from rough feature to commit quality
 normally takes more than one CF.  I don't have any obvious errors to
 point out right now.  But I think there's still some room to improve on
 this before commit.  Andres mentioned on another thread that he thought
 merging some of your ideas with the version Zoltan did was useful to
 look at, and I was thinking of something similar.  This is close to
 being ready, and I hope you won't get discouraged just because it's
 probably going to slip to 9.4.

Thank you for keeping me motivated for this patch now and throughout the
last CF by providing valuable suggestions and comments.


With Regards,
Amit Kapila.





Re: [HACKERS] spoonbill vs. -HEAD

2013-04-03 Thread Stefan Kaltenbrunner
On 04/03/2013 12:59 AM, Tom Lane wrote:
 I wrote:
 I think the simplest fix is to insert PG_SETMASK(UnBlockSig) into
 StatementCancelHandler() and any other handlers that might exit via
 longjmp.  I'm a bit inclined to only do this on platforms where a
 problem is demonstrable, which so far is only OpenBSD.  (You'd
 think that all BSDen would have the same issue, but the buildfarm
 shows otherwise.)
 
 BTW, on further thought it seems like maybe this is an OpenBSD bug,
 at least in part: what is evidently happening is that the temporary
 blockage of SIGINT during the handler persists even after we've
 longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1)
 to establish that longjmp handler --- so why isn't the original signal
 mask reinstalled when we return to the main loop?
 
 If (your version of) OpenBSD is getting this wrong, it'd explain why
 we've not seen similar behavior elsewhere.

hmm trolling the openbsd cvs history brings up this:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/machdep.c?r1=1.143;sortby=date#rev1.143


Stefan




Re: [HACKERS] WIP: index support for regexp search

2013-04-03 Thread Erikjan Rijkers
On Tue, April 2, 2013 23:54, Alexander Korotkov wrote:

 [trgm-regexp-0.15.patch.gz]

Yes, it does look good now; Attached a list of measurements. Most of the 
searches that I put in
that test-program are now speeded up very much.

There still are a few regressions, for example:

HEAD  azjunk6  x[aeiou]{4,5}q  83  Seq Scan  1393.465 ms
trgm_regex15  azjunk6  x[aeiou]{4,5}q  83  Bitmap Heap Scan  1728.319 ms

HEAD  azjunk7  x[aeiou]{1,3}q  190031  Seq Scan 16819.555 ms
trgm_regex15  azjunk7  x[aeiou]{1,3}q  190031  Bitmap Heap Scan 21286.804 ms

Not exactly negligible, and ideally those regressions would be removed but with 
the huge
advantages for other cases I'd say it's worth it.

hth,

Erik Rijkers





re-head-13-15-20130403-0708.txt.bz2
Description: BZip2 compressed data



Re: [HACKERS] WIP: index support for regexp search

2013-04-03 Thread Alexander Korotkov
On Wed, Apr 3, 2013 at 11:10 AM, Erikjan Rijkers e...@xs4all.nl wrote:

 On Tue, April 2, 2013 23:54, Alexander Korotkov wrote:

  [trgm-regexp-0.15.patch.gz]

 Yes, it does look good now; Attached a list of measurements. Most of the
 searches that I put in
 that test-program are now speeded up very much.

 There still are a few regressions, for example:

 HEAD  azjunk6  x[aeiou]{4,5}q  83  Seq Scan
  1393.465 ms
 trgm_regex15  azjunk6  x[aeiou]{4,5}q  83  Bitmap Heap Scan
  1728.319 ms

 HEAD  azjunk7  x[aeiou]{1,3}q  190031  Seq Scan
 16819.555 ms
 trgm_regex15  azjunk7  x[aeiou]{1,3}q  190031  Bitmap Heap Scan
 21286.804 ms

 Not exactly negligible, and ideally those regressions would be removed but
 with the huge
 advantages for other cases I'd say it's worth it.


Thank you for testing!
Exploring the results in more detail, I found version 13 to be buggy. This
version is a dead end; we have quite a different API now. Could you use v12
instead of v13 in the comparison, please?
Sometimes we have a regression in comparison with head for two reasons:
1) We select an index scan in both cases, but with the patch we spend more
time on analysis. That is an inevitable disadvantage of any index. We can
only take care that the analysis doesn't take too long. Current testing
results don't show this reason to be significant.
2) Sometimes we select an index scan while a sequential scan would be faster.
That is also an inevitable disadvantage until we have relevant statistics. We
have a similar situation now, for example, with in-core geometrical search
and LIKE/ILIKE search in pg_trgm. However, the situation could probably be
improved somehow even without such statistics. But I don't think we can draw
such a conclusion based on synthetic testing, because improvements for
synthetic cases could turn out to be a worsening for real-life cases.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] regression test failed when enabling checksum

2013-04-03 Thread Andres Freund
On 2013-04-01 19:51:19 -0700, Jeff Janes wrote:
 On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis pg...@j-davis.com wrote:
 
 
  Patch attached. Only brief testing done, so I might have missed
  something. I will look more closely later.
 
 
  After applying your patch, I could run the stress test described here:
 
  http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php
 
  But altered to make use of initdb -k, of course.
 
  Over 10,000 cycles of crash and recovery, I encountered two cases of
  checksum failures after recovery, example:
  ...
 
 
 
  Unfortunately I already cleaned up the data directory before noticing the
  problem, so I have nothing to post for forensic analysis.  I'll try to
  reproduce the problem.
 
 
 I've reproduced the problem, this time in block 74 of relation
 base/16384/4931589, and a tarball of the data directory is here:
 
 https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing
 
 (the table is in database jjanes under role jjanes, the binary is commit
 9ad27c215362df436f8c)
 
 What I would probably really want is the data as it existed after the crash
 but before recovery started, but since the postmaster immediately starts
 recovery after the crash, I don't know of a good way to capture this.
 
 I guess one thing to do would be to extract from the WAL the most recent
 FPW for block 74 of relation base/16384/4931589  (assuming it hasn't been
 recycled already) and see if it matches what is actually in that block of
 that data file, but I don't currently know how to do that.
 
 11500 SELECT 2013-04-01 12:01:56.926 PDT:WARNING:  page verification
 failed, calculated checksum 54570 but expected 34212
 11500 SELECT 2013-04-01 12:01:56.926 PDT:ERROR:  invalid page in block 74
 of relation base/16384/4931589
 11500 SELECT 2013-04-01 12:01:56.926 PDT:STATEMENT:  select sum(count) from
 foo

I just checked and unfortunately your dump doesn't contain all that much
valid WAL:
rmgr: XLOGlen (rec/tot): 72/   104, tx:  0, lsn: 
7/AB28, prev 7/AA90, bkp: , desc: checkpoint: redo 7/AB28; tli 
1; prev tli 1; fpw true; xid 0/156747297; oid 4939781; multi 1; offset 0; 
oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
rmgr: XLOGlen (rec/tot): 72/   104, tx:  0, lsn: 
7/AB90, prev 7/AB28, bkp: , desc: checkpoint: redo 7/AB90; tli 
1; prev tli 1; fpw true; xid 0/156747297; oid 4939781; multi 1; offset 0; 
oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
pg_xlogdump: FATAL:  error in WAL record at 7/AB90: record with zero length 
at 7/ABF8

So just two checkpoint records.

Unfortunately I fear that won't be enough to diagnose the problem;
could you reproduce it with a higher wal_keep_segments?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-03 Thread Albe Laurenz
David E. Wheeler wrote:
 I am working on scripts to copy data from Oracle via oracle_fdw. They each do 
 something like this:
 
 CREATE SCHEMA migrate_stuff;
 SET search_path TO migrate_stuff,public;
 CREATE EXTENSION oracle_fdw SCHEMA migrate_rules;
 
[...]
 
 Then I run them in parallel:
 
 for file in migrate*.sql; do
 psql -d foo -f $file &
 done
 wait
 
 This works fine except for one thing: the first CREATE EXTENSION statement 
 blocks all the others. Even
 when I create the extension in separate schemas in each script! I have to 
 remove the CREATE EXTENSION
 statement, create it in public before any of the scripts run, then drop it 
 when they're done. I'm okay
 with this workaround, but wasn't sure if the blocking of CREATE EXTENSION was 
 intentional or a known
 issue (I did not see it documented in 
 http://www.postgresql.org/docs/current/static/sql-
 createextension.html).

I'd expect one of the CREATE EXTENSION commands to succeed
and the others to block until the transaction is committed,
then to fail with 'extension oracle_fdw already exists'.

If that is what happens, it's what I'd expect since
extension names are unique (see the unique constraint on
pg_extension).

Yours,
Laurenz Albe




Re: [HACKERS] Extra security measures for next week's releases

2013-04-03 Thread roadrunner6

Am 28.03.2013 18:03, schrieb Tom Lane:

The core committee has decided that one of the security issues due to be
fixed next week is sufficiently bad that we need to take extra measures
to prevent it from becoming public before packages containing the fix
are available.  (This is a scenario we've discussed before, but never
had to actually implement.)



8.3 reached EOL in February 2013; I guess there will be no fix for 
8.3, right?







Re: [HACKERS] Extra security measures for next week's releases

2013-04-03 Thread Magnus Hagander
On Wed, Apr 3, 2013 at 12:09 PM,  roadrunn...@gmx.at wrote:
 Am 28.03.2013 18:03, schrieb Tom Lane:

 The core committee has decided that one of the security issues due to be
 fixed next week is sufficiently bad that we need to take extra measures
 to prevent it from becoming public before packages containing the fix
 are available.  (This is a scenario we've discussed before, but never
 had to actually implement.)


 8.3 has reached EOL in February 2013, I guess there will be no fix for 8.3,
 right?

That is correct.

Some distributions may backpatch fixes manually, but there will be no
official patch for 8.3.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Florian Pflug
On Apr1, 2013, at 13:43 , Robert Haas robertmh...@gmail.com wrote:
 I don't think the current behavior is broken.  I found it
 counterintuitive at first, but then I got used to it.  It's reasonably
 self-consistent: arrays can't have empty dimensions, therefore the
 empty array is unique and dimensionless.  Is that the behavior I would
 have picked if I had designed the type?  No, it isn't.  I wouldn't
 have tried to support one-dimensional arrays and multi-dimensional
 arrays in the same data type either, nor would I have supported
 non-default lower bounds.  But all of those ships have sailed, and the
 time to change them is not after people have spent 10 years building
 applications that work with the current behavior.  If we want to
 introduce a new type with different, perhaps better, behavior, well, I
 think that might be a fine idea.  But I *don't* think imposing a hard
 compatibility break on users of arrays is a good idea.

+1.

Especially since having infinitely many variants of empty is IMHO
hardly an improvement over the existing situation.

If we're going to break compatibility, we should IMHO get rid of
non-zero lower bounds all together. My guess is that the number of
affected users wouldn't be much higher than for the proposed patch,
and it'd allow lossless mapping to most languages' native array types…

best regards,
Florian Pflug





[HACKERS] Typo in documentation for function to_json

2013-04-03 Thread Michael Paquier
Hi all,

While reading some documentation about json functions, I found a typo with
the example used with function to_json. A bracket was missing. Please find
attached the patch correcting that.

Regards,
-- 
Michael


20120403_to_json_typo.patch
Description: Binary data



Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)

2013-04-03 Thread Dimitri Fontaine
Hi,

Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Documentation doesn't build, multiple errors. In addition to the reference
 pages, there should be a section in the main docs about these templates.

 I'm now working on that, setting up the documentation tool set.

Fixed in the attached version 6 of the patch.

I couldn't manage to fix the documentation build tool chain on my main
workstation, so I had to revive a VM where I was lucky enough to find my
old setup was still working, all with shared directories, VPATH build
setup etc. Anyways.

I also expanded a little on the docs we did have, adding a section in
the main extension chapter and some example in the CREATE TEMPLATE FOR
EXTENSION reference page.

That version has no flaw that I'm aware of and implements the design we
reached after almost 2 years on the topic which is more complex than it
would appear at first sight. I hope that you will like the patch if not
the feature itself, as I have big plans for it in the near future.

Many thanks to all involved for helping with the design and the
reviewing, regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



templates.v6.patch.gz
Description: Binary data



Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Andrew Dunstan


On 04/02/2013 02:46 PM, Florian Pflug wrote:

On Apr1, 2013, at 13:43 , Robert Haas robertmh...@gmail.com wrote:

I don't think the current behavior is broken.  I found it
counterintuitive at first, but then I got used to it.  It's reasonably
self-consistent: arrays can't have empty dimensions, therefore the
empty array is unique and dimensionless.  Is that the behavior I would
have picked if I had designed the type?  No, it isn't.  I wouldn't
have tried to support one-dimensional arrays and multi-dimensional
arrays in the same data type either, nor would I have supported
non-default lower bounds.  But all of those ships have sailed, and the
time to change them is not after people have spent 10 years building
applications that work with the current behavior.  If we want to
introduce a new type with different, perhaps better, behavior, well, I
think that might be a fine idea.  But I *don't* think imposing a hard
compatibility break on users of arrays is a good idea.

+1.

Especially since having infinitely many variants of empty is IMHO
hardly an improvement over the existing situation.

If we're going to break compatibility, we should IMHO get rid of
non-zero lower bounds all together. My guess is that the number of
affected users wouldn't be much higher than for the proposed patch,
and it'd allow lossless mapping to most language's native array types…



That would actually break a HUGE number of users, since the default 
lower bound is 1. I have seen any number of pieces of code that rely on 
that.


Doesn't the standard have something to say about all this?

cheers

andrew





Re: [HACKERS] Typo in documentation for function to_json

2013-04-03 Thread Andrew Dunstan


On 04/03/2013 09:22 AM, Michael Paquier wrote:

Hi all,

While reading some documentation about json functions, I found a typo 
with the example used with function to_json. A bracket was missing. 
Please find attached the patch correcting that.





I have a patch for this and other docs errors waiting in the wings. I 
will commit it when the current hiatus is done, some time tomorrow.


cheers

andrew




Re: [HACKERS] Typo in documentation for function to_json

2013-04-03 Thread Michael Paquier
On Wed, Apr 3, 2013 at 10:32 PM, Andrew Dunstan and...@dunslane.net wrote:

 I have a patch for this and other docs errors waiting in the wings. I will
 commit it when the current hiatus is done, some time tomorrow.

OK thanks.
-- 
Michael


[HACKERS] Typo in FDW documentation

2013-04-03 Thread Albe Laurenz
I found a small typo, patch attached.

Yours,
Laurenz Albe


fdw-doc.patch
Description: fdw-doc.patch



Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-03 Thread Robert Haas
On Tue, Apr 2, 2013 at 1:20 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
  I agree in general, though I'm not sure the bgwriter process can
  reasonably handle this need along with what it's already supposed to be
  doing.  We may need another background process that is just responsible
  for keeping the freelist populated.

  What else is the bgwriter actually doing otherwise? Sure, it doesn't put the
  pages on the freelist, but otherwise it's trying to do the above, isn't it?

 I think it will be problematic to tie buffer-undirtying to putting both
 clean and dirty buffers into the freelist.  It might chance to work all
 right to use one scan process for both, but I'm afraid it's more likely
 that we'd end up either overserving one goal or underserving the other.

 Hm. I had imagined that we would only ever put clean buffers into the
 freelist and that we would never write out a buffer that we don't need
 for a new page. I don't see much point in randomly writing out buffers
 that won't be needed for something else soon. Currently we can't do much
 better than basically undirtying random buffers since we don't really know
 which page will reach a usagecount of zero since bgwriter doesn't
 manipulate usagecounts.

 One other scenario I can see is the problem of strategy buffer reusage
 of dirtied pages (hint bits, pruning) during seqscans where we would
 benefit from pages being written out fast, but I can't imagine that that
 could be handled very well by something like the bgwriter?

 Am I missing something?

I've had the same thought.  I think we should consider having a
background process that listens on a queue, sort of like the fsync
absorption queue.  When a process using a buffer access strategy
dirties a buffer, it adds it to that queue and sets the latch for the
background process, which then wakes up and starts cleaning the
buffers that have been added to its queue.  The hope is that, by the
time the ring buffer wraps around, the background process will have
cleaned the buffer, preventing the foreground process from having to
wait for the buffer write (and, perhaps, xlog flush).

The main hesitation I've had about actually implementing such a scheme
is that I find it a bit unappealing to have a background process
dedicated to just this.  But maybe it could be combined with some of
the other ideas presented here.  Perhaps we should have one process
that scans the buffer arena and populates the freelists; as a side
effect, if it runs across a dirty buffer, it kicks it over to the
process described in the previous paragraph (which could still, also,
absorb requests from other backends using buffer access strategies).
Then we'd end up with nothing that looks exactly like the background
writer we have now, but maybe no one would miss it.

I think that as we go through the process of trying to improve this,
we should also look hard at trying to make the algorithms more
self-tuning.  For example, instead of having a fixed delay between
rounds for the buffer-arena-scanning process, I think we should try to
make it adaptive.  If it sticks a bunch of buffers on the freelist and
the freelist then runs dry before it wakes up again, the backend that
notices that the list is empty (or below some low watermark) should
set a latch to wake up the buffer-arena-scanning process; and
the next time that process goes back to sleep, it should sleep for a
shorter period of time.  As things are today, what the background
writer actually does is unhelpful enough that there might not be much
point in fiddling with this, but as we get to having a more sensible
scheme overall, I think it will pay dividends.
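
As a self-contained sketch of that adaptive-delay idea (nothing below is
PostgreSQL code; all names and constants are invented for illustration):

    #include <stdbool.h>

    enum { MIN_DELAY_MS = 10, MAX_DELAY_MS = 1000 };

    /*
     * Called once per scanning round: shrink the sleep when a backend
     * found the freelist empty (or below the low watermark) and woke us
     * early; otherwise drift back toward the maximum delay.
     */
    static int
    next_scan_delay(int cur_delay_ms, bool freelist_ran_dry)
    {
        if (freelist_ran_dry)
        {
            cur_delay_ms /= 2;
            if (cur_delay_ms < MIN_DELAY_MS)
                cur_delay_ms = MIN_DELAY_MS;
        }
        else
        {
            cur_delay_ms += cur_delay_ms / 4 + 1;
            if (cur_delay_ms > MAX_DELAY_MS)
                cur_delay_ms = MAX_DELAY_MS;
        }
        return cur_delay_ms;
    }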

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Florian Pflug
On Apr3, 2013, at 15:30 , Andrew Dunstan and...@dunslane.net wrote:
 On 04/02/2013 02:46 PM, Florian Pflug wrote:
 If we're going to break compatibility, we should IMHO get rid of
 non-zero lower bounds all together. My guess is that the number of
 affected users wouldn't be much higher than for the proposed patch,
 and it'd allow lossless mapping to most language's native array types…
 
 That would actually break a HUGE number of users, since the default lower
 bound is 1. I have seen any number of pieces if code that rely on that.

Uh, yeah, we should make it 1, not 0, then. As long as the bound
is fixed, conversion to native C/Java/Ruby/Python/... arrays would still
be lossless.

best regards,
Florian Pflug





Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 04/02/2013 02:46 PM, Florian Pflug wrote:
 If we're going to break compatibility, we should IMHO get rid of
 non-zero lower bounds all together. My guess is that the number of
 affected users wouldn't be much higher than for the proposed patch,
 and it'd allow lossless mapping to most language's native array types…

 That would actually break a HUGE number of users, since the default 
 lower bound is 1. I have seen any number of pieces if code that rely on 
 that.

I assume he meant non-one lower bounds.  But in any case, removing the
option for other lower bounds would be very clearly a removal of useful
functionality, so I don't see it happening ... certainly not if we're
so tied to bug-compatibility that we can't redefine the number of
dimensions an empty array has.

I think though that the upthread argument that we'd have multiple
interpretations of the same thing is bogus.  To me, the core idea that's
being suggested here is that '{}' should mean a zero-length 1-D array,
not a zero-D array as formerly.  We would still need a way to represent
zero-D arrays, if only because they'd still exist on-disk in existing
databases (assuming we're not willing to break pg_upgrade for this).
I suggest that that ought *not* involve any braces.  Perhaps '[]=' would
be a suitable representation.  In the other direction, ISTM that
'{{},{},{}}' is a zero-by-three array, entirely distinct from '{}' or
'{{}}' in dimensionality if not content.  I haven't worked this out in
complete detail, but if different strings mean the same thing then we
don't have the representation defined right.

regards, tom lane




Re: [HACKERS] Typo in documentation for function to_json

2013-04-03 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 I have a patch for this and other docs errors waiting in the wings. I 
 will commit it when the current hiatus is done, some time tomorrow.

There's no reason not to commit now.  The blockage is on propagation to
the anongit mirror, not on what committers can do.

regards, tom lane




Re: [HACKERS] Typo in documentation for function to_json

2013-04-03 Thread Andrew Dunstan


On 04/03/2013 10:11 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

I have a patch for this and other docs errors waiting in the wings. I
will commit it when the current hiatus is done, some time tomorrow.

There's no reason not to commit now.  The blockage is on propagation to
the anongit mirror, not on what committers can do.





True, but there's no great advantage either. I am not planning to commit 
anything else before then, and I like to check the committers email when 
I commit to make sure everything is kosher.


cheers

andrew




Re: [HACKERS] Typo in FDW documentation

2013-04-03 Thread Tom Lane
Albe Laurenz laurenz.a...@wien.gv.at writes:
 I found a small typo, patch attached.

Drat, thought I'd fixed that before, but obviously not.  Done now, thanks.

regards, tom lane




Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-03 Thread Tom Lane
[ sorry for being slow to respond, things are crazy this week ]

Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Anyway, the immediate takeaway is that this represents a horribly
 expensive way for pg_dump to find out which matviews aren't
 scannable.  The cheap way for it to find out would be if we had a
 boolean flag for that in pg_class.  Would you review the bidding
 as to why things were done the way they are?  Because in general,
 having to ask the kernel something is going to suck in any case,
 so basing it on the size of the disk file doesn't seem to me to
 be basically a good thing.

 The early versions of the patch had a boolean in pg_class to track
 this, but I got complaints from Robert and Noah (and possibly
 others?) that this got too ugly in combination with the support for
 unlogged matviews, and they suggested the current approach.

Meh.  I don't think that we should allow that corner case to drive the
design like this.  I would *far* rather not have unlogged matviews at
all than this boondoggle.  I'm not even convinced that the existing
semantics are desirable: who's to say that having an unlogged matview
revert to empty on crash renders it invalid?  If it's a summary of
underlying unlogged tables, then that's actually the right thing.
(If someone is desperately unhappy with such a behavior, why are they
insisting on it being unlogged, anyway?)

If we go with this implementation, I think we are going to be painting
ourselves into a corner that will be difficult or impossible to get out
of, because the scannability state of a matview is not just a matter of
catalog contents but of on-disk files.  That will make it difficult to
impossible for pg_upgrade to cope reasonably with any future rethinking
of scannability status.

In fact, I'm going to go further and say that I do not like the entire
concept of scannability, either as to design or implementation, and
I think we should just plain rip it out.  We can rethink that for 9.4
or later, but what we've got right now is a kluge, and I don't want
to find ourselves required to be bug-compatible with it forevermore.
To take just the most salient point: assuming that we've somehow
determined that some matview is too out-of-date to be usable, why is the
appropriate response to that to cause queries on it to fail altogether?
That seems like about the least useful feature that could be devised.
Why not, say, have queries fall back to expanding the view definition as
though it were a regular view?

I think matviews without any scannability control are already a pretty
useful feature, and one I'd not be ashamed to ship just like that for
9.3.  But the scannability stuff is clearly going to need to be
revisited.  IMO, driving a stake in the ground at the exact spot that
this stake is placed is not a good plan.  If we simply remove that
aspect altogether for now, I think we'll have more room to maneuver in
future releases.

I apologize for not having griped about this earlier, but I've really
paid no attention to the matviews work up to now; there are only so
many cycles in a day.

regards, tom lane




[HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Heikki Linnakangas
While playing with Alexander's pg_trgm regexp patch, I noticed that the 
regexp library trips an assertion (if enabled) or crashes, when passed 
an input string that contains more than 32k different characters:


select 'foo' ~ (select string_agg(chr(x),'') from generate_series(100, 
35000) x) as nastyregex;


This is because it uses 'short' as the datatype to identify colors. When 
it overflows, -32768 is used as an index into the colordesc array, and you 
get a crash. AFAICS this can't reliably be used for anything more 
sinister than crashing the backend.


A regex with that many different colors is an extreme case, so I think 
it's enough to turn the assertion in newcolor() into a run-time check, 
and throw a "too many colors in regexp" error. Alternatively, we could 
expand 'color' from short to int, but that would double the memory usage 
of sane regexps with fewer distinct characters.
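
A rough sketch of what that run-time check could look like (CERR() and
COLORLESS are existing regex-library internals, but the MAX_COLOR limit
and the REG_ECOLORS error code are illustrative names, not committed ones):

    /* in newcolor(), in place of the assertion -- sketch only */
    if (cm->max == MAX_COLOR)
    {
        CERR(REG_ECOLORS);      /* "too many colors in regexp" */
        return COLORLESS;
    }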


Thoughts?

- Heikki




Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 A regex with that many different colors is an extreme case, so I think 
 it's enough to turn the assertion in newcolor() into a run-time check, 
 and throw a too many colors in regexp error. Alternatively, we could 
 expand 'color' from short to int, but that would double the memory usage 
 of sane regexps with less different characters.

Obviously Henry didn't think that far ahead.  I agree that throwing
an error is the best solution, and that widening color is probably
not what we want to do.  You want to fix that, or shall I?

regards, tom lane




Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Heikki Linnakangas

On 03.04.2013 18:21, Tom Lane wrote:

Heikki Linnakangashlinnakan...@vmware.com  writes:

A regex with that many different colors is an extreme case, so I think
it's enough to turn the assertion in newcolor() into a run-time check,
and throw a too many colors in regexp error. Alternatively, we could
expand 'color' from short to int, but that would double the memory usage
of sane regexps with less different characters.


Obviously Henry didn't think that far ahead.  I agree that throwing
an error is the best solution, and that widening color is probably
not what we want to do.  You want to fix that, or shall I?


I can do it. I assume that Tcl has the same bug, so I'll submit a report 
there, too.


- Heikki




[HACKERS] c language functions

2013-04-03 Thread Rodrigo Barboza
Hello.
I'm trying to create a generic add function.
I have defined a type my_uint and it needs a '+' operator.
This operator should work like normal int + int operation.
The function is defined expecting arguments (my_uint, anyelement).

I'm confused about how to retrieve the anyelement type and value, then do the
add operation and return the correct value and type.
I tried to use PG_GETARG_DATUM, but I don't know how to extract the value
from it (should it be a uint32, uint64, float or double).

Any tips?
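
(A minimal sketch of the usual approach: look up the actual type of the
polymorphic argument with get_fn_expr_argtype() and branch on it. The
function name and error text are illustrative; the my_uint handling itself
is elided:)

    #include "postgres.h"
    #include "fmgr.h"
    #include "catalog/pg_type.h"

    PG_FUNCTION_INFO_V1(my_uint_add_any);

    Datum
    my_uint_add_any(PG_FUNCTION_ARGS)
    {
        /* OID of the type actually passed as the second argument */
        Oid argtype = get_fn_expr_argtype(fcinfo->flinfo, 1);

        switch (argtype)
        {
            case INT4OID:
                /* int32 v = PG_GETARG_INT32(1); ... */
                break;
            case INT8OID:
                /* int64 v = PG_GETARG_INT64(1); ... */
                break;
            case FLOAT8OID:
                /* float8 v = PG_GETARG_FLOAT8(1); ... */
                break;
            default:
                ereport(ERROR,
                        (errcode(ERRCODE_DATATYPE_MISMATCH),
                         errmsg("operator + not defined for this type")));
        }
        /* do the addition, range-check, and return a my_uint datum */
        PG_RETURN_NULL();
    }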


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 03:02, Florian Pflug wrote:

On Apr3, 2013, at 15:30 , Andrew Dunstan and...@dunslane.net wrote:

On 04/02/2013 02:46 PM, Florian Pflug wrote:

If we're going to break compatibility, we should IMHO get rid of
non-zero lower bounds all together. My guess is that the number of
affected users wouldn't be much higher than for the proposed patch,
and it'd allow lossless mapping to most language's native array types…

That would actually break a HUGE number of users, since the default lower
bound is 1. I have seen any number of pieces if code that rely on that.

Uh, yeah, we should make it 1 then, not 0, then. As long as the bound
is fixed, conversion to native C/Java/Ruby/Python/... arrays would still
be lossless.

best regards,
Florian Pflug


Zero as the default lower bound is consistent with most languages 
(especially the common ones like C, C++, Java, and Python), in fact I 
don't remember any language where that is not the case (ignoring SQL) - 
and I've written programs in about 20 languages.


Maybe we should adopt the famous compromise of '0.5'?  :-)


Cheers,
Gavin





Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 03.04.2013 18:21, Tom Lane wrote:
 Obviously Henry didn't think that far ahead.  I agree that throwing
 an error is the best solution, and that widening color is probably
 not what we want to do.  You want to fix that, or shall I?

 I can do it. I assume that Tcl has the same bug, so I'll submit a report 
 there, too.

Yes, definitely.

It occurs to me that at some point it might be useful to convert color
to unsigned short, so that you could have 64K of them --- but we'd still
need the error check anyway, and there's no reason to tackle such a
change today.  (This possible change is, however, another reason to not
want pg_trgm looking directly at the internals of the data structure...)

regards, tom lane




Re: [HACKERS] commit dfda6ebaec67 versus wal_keep_segments

2013-04-03 Thread Jeff Janes
On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 This commit introduced a problem with wal_keep_segments:

 commit dfda6ebaec6763090fb78b458a979b558c50b39b



The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing.  However, I
don't really understand KeepLogSeg.  It seems like segno, and hence recptr,
don't actually serve any purpose.

Cheers,

Jeff

diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f9209f..3643be8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8264,7 +8264,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
XLByteToSeg(recptr, segno);

/* avoid underflow, don't go below 1 */
-   if (segno <= wal_keep_segments)
+   if (*logSegNo <= wal_keep_segments)
segno = 1;
else
segno = *logSegNo - wal_keep_segments;
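
For context, a paraphrase of the function with the fix applied (my reading
of the 9.3-era xlog.c, not a verbatim copy); note that after the fix the
XLByteToSeg() result is immediately overwritten on both branches, which is
presumably why segno and recptr look purposeless:

    /* paraphrased sketch of KeepLogSeg() */
    XLogSegNo   segno;

    XLByteToSeg(recptr, segno);         /* segment containing recptr */

    /* avoid underflow, don't go below 1 */
    if (*logSegNo <= wal_keep_segments)
        segno = 1;
    else
        segno = *logSegNo - wal_keep_segments;

    /* don't let *logSegNo advance past the retention point */
    if (segno < *logSegNo)
        *logSegNo = segno;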


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Pavel Stehule
2013/4/3 Gavin Flower gavinflo...@archidevsys.co.nz

 On 04/04/13 03:02, Florian Pflug wrote:

 On Apr3, 2013, at 15:30 , Andrew Dunstan and...@dunslane.net wrote:

 On 04/02/2013 02:46 PM, Florian Pflug wrote:

 If we're going to break compatibility, we should IMHO get rid of
 non-zero lower bounds all together. My guess is that the number of
 affected users wouldn't be much higher than for the proposed patch,
 and it'd allow lossless mapping to most language's native array types…

 That would actually break a HUGE number of users, since the default lower
 bound is 1. I have seen any number of pieces if code that rely on that.

 Uh, yeah, we should make it 1 then, not 0, then. As long as the bound
 is fixed, conversion to native C/Java/Ruby/Python/... arrays would still
 be lossless.

 best regards,
 Florian Pflug


  Zero as the default lower bound is consistent with most languages
 (especially the common ones like C, C++, Java,  Python), in fact I don't
 remember any language where that is not the case (ignoring SQL) - and I've
 written programs in about 20 languages.


Pascal, Ada, and ALGOL-like languages

Regards

Pavel





 Maybe we should adopt the famous compromise of '0.5'?  :-)


 Cheers,
 Gavin






[HACKERS] track_activities is mostly broken

2013-04-03 Thread Tom Lane
The point of turning off pgstat_track_activities is, IMO, to eliminate
the overhead of updating one's PgBackendStatus entry in main memory.
Commit 4f42b546 broke this, however, because it confused getting into
the reporting disabled state with what should happen once already
in the state.  If you try turning off track_activities in a backend,
you'll find that you can't see its current query any more, but that's
not because it's not reporting it --- it's only because
pg_stat_get_activity() is hiding it from you.  The overhead is still
being paid, as you can confirm by noting that query_start and
state_change continue to update whenever the backend does something.

Will fix.

regards, tom lane




Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-03 Thread David E. Wheeler
On Apr 3, 2013, at 2:37 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 I'd expect one of the CREATE EXTENSION commands to succeed
 and the others to block until the transaction is committed,
 then to fail with 'extension oracle_fdw already exists'.
 
 If that is what happens, it's what I'd expect since
 extension names are unique (see the unique constraint on
 pg_extension).

Oh, they are not unique per schema? I guess they are global to the database,
and their objects are in the specified schema, then.

Thanks,

David





Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 04:58, Pavel Stehule wrote:




2013/4/3 Gavin Flower gavinflo...@archidevsys.co.nz


On 04/04/13 03:02, Florian Pflug wrote:

On Apr3, 2013, at 15:30 , Andrew Dunstan and...@dunslane.net wrote:

On 04/02/2013 02:46 PM, Florian Pflug wrote:

If we're going to break compatibility, we should IMHO
get rid of
non-zero lower bounds all together. My guess is that
the number of
affected users wouldn't be much higher than for the
proposed patch,
and it'd allow lossless mapping to most language's
native array types...

That would actually break a HUGE number of users, since
the default lower
bound is 1. I have seen any number of pieces if code that
rely on that.

Uh, yeah, we should make it 1 then, not 0, then. As long as
the bound
is fixed, conversion to native C/Java/Ruby/Python/... arrays
would still
be lossless.

best regards,
Florian Pflug


Zero as the default lower bound is consistent with most languages
(especially the common ones like C, C++, Java,  Python), in fact
I don't remember any language where that is not the case (ignoring
SQL) - and I've written programs in about 20 languages.


pascal, ADA, and ALGOL like languages

Regards

Pavel

ALGOL 60 was zero-based by default, as I remember deliberately setting 
the lower bound to 1; I managed to avoid PASCAL and I only glanced at ADA.



Cheers,
Gavin


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Pavel Stehule
2013/4/3 Gavin Flower gavinflo...@archidevsys.co.nz

  On 04/04/13 04:58, Pavel Stehule wrote:




 2013/4/3 Gavin Flower gavinflo...@archidevsys.co.nz

 On 04/04/13 03:02, Florian Pflug wrote:

 On Apr3, 2013, at 15:30 , Andrew Dunstan and...@dunslane.net wrote:

 On 04/02/2013 02:46 PM, Florian Pflug wrote:

 If we're going to break compatibility, we should IMHO get rid of
 non-zero lower bounds all together. My guess is that the number of
 affected users wouldn't be much higher than for the proposed patch,
 and it'd allow lossless mapping to most language's native array types…

 That would actually break a HUGE number of users, since the default
 lower
 bound is 1. I have seen any number of pieces if code that rely on that.

 Uh, yeah, we should make it 1 then, not 0, then. As long as the bound
 is fixed, conversion to native C/Java/Ruby/Python/... arrays would still
 be lossless.

 best regards,
 Florian Pflug


  Zero as the default lower bound is consistent with most languages
 (especially the common ones like C, C++, Java,  Python), in fact I don't
 remember any language where that is not the case (ignoring SQL) - and I've
 written programs in about 20 languages.


  pascal, ADA, and ALGOL like languages

  Regards

  Pavel

ALOGOL 60 was zero based by default, as I remember deliberately
 setting the lower bound to 1, I managed to avoid PASCAL and I only glanced
 at ADA.


http://en.wikipedia.org/wiki/Comparison_of_programming_languages_%28array%29

Regards

Pavel




 Cheers,
 Gavin



Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Pavel Stehule

  Pavel

ALOGOL 60 was zero based by default, as I remember deliberately
 setting the lower bound to 1, I managed to avoid PASCAL and I only glanced
 at ADA.



 http://en.wikipedia.org/wiki/Comparison_of_programming_languages_%28array%29


In Pascal and similar languages (the Wirth family) it is possible to define
the lower bound as any value (like PL/pgSQL). It depends on the school - but
a lot of people used 1.

I remember it was one argument in the Pascal/C flame wars.

Regards

Pavel



 Regards



Pavel




 Cheers,
 Gavin





Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Tom Lane
 Zero as the default lower bound is consistent with most languages
 (especially the common ones like C, C++, Java,  Python), in fact
 I don't remember any language where that is not the case (ignoring
 SQL) - and I've written programs in about 20 languages.

Fortran ... Basic ... actually I'd have thought that zero was a
minority position.  Fashions change I guess.

regards, tom lane




Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-03 Thread Robert Haas
On Wed, Apr 3, 2013 at 10:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In fact, I'm going to go further and say that I do not like the entire
 concept of scannability, either as to design or implementation, and
 I think we should just plain rip it out.

This has been my feeling from the beginning, so I'm happy to support
this position.  I think the current version - where scan-ability is
tracked in just one way - is an improvement over the previous version
of the patch - where it was tracked in two different ways with a
confusing shuffle of information from one place to the other.  But my
favorite number of places to track it would be zero.

...Robert




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread David E. Wheeler
On Apr 3, 2013, at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fortran ... Basic ... actually I'd have thought that zero was a
 minority position.  Fashions change I guess.

I say we turn the default lower bound up to 11.

David






Re: [HACKERS] [sepgsql 2/3] Add db_schema:search permission checks

2013-04-03 Thread Robert Haas
On Tue, Apr 2, 2013 at 2:22 PM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 OK, please check the attached ones.

Thanks.  I reviewed the schema-search patch and I think it looks
reasonable, but shouldn't we be calling the event OAT_NAMESPACE_SEARCH
rather than OAT_SCHEMA_SEARCH?  And, similarly,
ObjectAccessNamespaceSearch and InvokeNamespaceSearchHook?  I think
this terminology could be confusing to backend hackers who are used to
seeing the term namespace internally when referring to schemas.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Andrew Dunstan


On 04/03/2013 11:34 AM, Gavin Flower wrote:


Zero as the default lower bound is consistent with most languages 
(especially the common ones like C, C++, Java,  Python), in fact I 
don't remember any language where that is not the case (ignoring SQL) 
- and I've written programs in about 20 languages.


Pascal and Ada are obvious examples.

I'm quite sure there are others, but don't have time to do the research.

cheers

andrew





Re: [HACKERS] regression test failed when enabling checksum

2013-04-03 Thread Jeff Janes
On Wed, Apr 3, 2013 at 2:31 AM, Andres Freund and...@2ndquadrant.comwrote:



 I just checked and unfortunately your dump doesn't contain all that much
 valid WAL:
 ...



 So just two checkpoint records.

 Unfortunately I  fear that won't be enough to diagnose the problem,
 could you reproduce it with a higher wal_keep_segments?


I've been trying, but see the message "commit dfda6ebaec67 versus
wal_keep_segments".


Looking at some of the log files more, I see that vacuum is involved, but
in some way I don't understand.  The crash always happens on a test
cycle immediately after the sleep that allows the autovac to kick in and
finish.  So the sequence of events goes something like this:

...
run the frantic updating of foo until crash
recovery
query foo and verify the results are consistent with expectations
sleep to allow autovac to do its job.
truncate foo and repopulate it.
run the frantic updating of foo until crash
recovery
attempt to query foo but get the checksum failure.

What the vacuum is doing that corrupts the system in a way that survives
the truncate is a mystery to me.

Also, at one point I had the harness itself exit as soon as it detected the
problem, but I failed to have it shut down the server.  So the server kept
running idle and having autovac do its thing, which produced some
interesting log output:

WARNING:  relation foo page 45 is uninitialized --- fixing
WARNING:  relation foo page 46 is uninitialized --- fixing
...
WARNING:  relation foo page 72 is uninitialized --- fixing
WARNING:  relation foo page 73 is uninitialized --- fixing
WARNING:  page verification failed, calculated checksum 54570 but expected
34212
ERROR:  invalid page in block 74 of relation base/16384/4931589

This happened 3 times.  Every time, the warnings started on page 45, and
they continued up until the invalid page was found (which varied, being 74,
86, and 74 again).

I wonder if the bug is in checksums, or if the checksums are doing their
job by finding some other bug.  And why did those uninitialized pages
trigger warnings when they were autovacced, but not when they were seq
scanned in a query?

Cheers,

Jeff


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Robert Haas
On Wed, Apr 3, 2013 at 12:36 PM, David E. Wheeler da...@kineticode.com wrote:
 On Apr 3, 2013, at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fortran ... Basic ... actually I'd have thought that zero was a
 minority position.  Fashions change I guess.

 I say we turn the default lower bound up to 11.

+1.  I think that's clearly a compromise position that will make
everyone equally happy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] regression test failed when enabling checksum

2013-04-03 Thread Andres Freund
On 2013-04-03 09:48:54 -0700, Jeff Janes wrote:
 On Wed, Apr 3, 2013 at 2:31 AM, Andres Freund and...@2ndquadrant.com wrote:
 
 
 
  I just checked and unfortunately your dump doesn't contain all that much
  valid WAL:
  ...
 
 
 
  So just two checkpoint records.
 
  Unfortunately I  fear that won't be enough to diagnose the problem,
  could you reproduce it with a higher wal_keep_segments?
 
 
 I've been trying, but see the message "commit dfda6ebaec67 versus
 wal_keep_segments".

Setting up an archive_command could help if it's affordable from the
space perspective :/

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Heikki Linnakangas

On 03.04.2013 18:41, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 03.04.2013 18:21, Tom Lane wrote:

Obviously Henry didn't think that far ahead.  I agree that throwing
an error is the best solution, and that widening color is probably
not what we want to do.  You want to fix that, or shall I?



I can do it. I assume that Tcl has the same bug, so I'll submit a report
there, too.


Yes, definitely.

It occurs to me that at some point it might be useful to convert color
to unsigned short, so that you could have 64K of them --- but we'd still
need the error check anyway, and there's no reason to tackle such a
change today.


I was just thinking the same. In practice, expanding it to 64k doesn't 
get you much farther. There is this in newdfa():


	d->incarea = (struct arcp *) MALLOC(nss * cnfa->ncolors * 
sizeof(struct arcp));


That's (number of states) * (number of colors) * (constant). The test 
case I posted earlier would require about 40 GB of RAM for that 
allocation alone, and fails with an "out of memory" error. Maybe it 
would be possible to construct a regexp that has a lot of colors but few 
states, but that's an even more marginal use case.



Attached is a patch to add the overflow check. I used the error message 
"too many distinct characters in regex". That's not totally accurate, 
because there isn't a limit on distinct characters per se, but on the 
number of colors. Conceivably, you could have a regexp with more than 
32k different characters, but where most of them are mapped to the same 
color. In practice, it's not helpful to the user to say "too many 
colors"; he will have no clue what a color is.


PS. I was mistaken when I said that this causes an assertion failure; it 
segfaults even with assertions enabled.


- Heikki
diff --git a/src/backend/regex/regc_color.c b/src/backend/regex/regc_color.c
index 1c60566..e6aa899 100644
--- a/src/backend/regex/regc_color.c
+++ b/src/backend/regex/regc_color.c
@@ -247,7 +247,15 @@ newcolor(struct colormap * cm)
 		/* oops, must allocate more */
 		struct colordesc *newCd;
 
+		if (cm->max == MAX_COLOR)
+		{
+			CERR(REG_ECOLORS);
+			return COLORLESS;	/* too many colors */
+		}
+
 		n = cm->ncds * 2;
+		if (n > MAX_COLOR + 1)
+			n = MAX_COLOR + 1;
 		if (cm->cd == cm->cdspace)
 		{
 			newCd = (struct colordesc *) MALLOC(n * sizeof(struct colordesc));
diff --git a/src/include/regex/regerrs.h b/src/include/regex/regerrs.h
index a761371..6516d7e 100644
--- a/src/include/regex/regerrs.h
+++ b/src/include/regex/regerrs.h
@@ -77,3 +77,7 @@
 {
 	REG_ETOOBIG, "REG_ETOOBIG", "nfa has too many states"
 },
+
+{
+	REG_ECOLORS, "REG_ECOLORS", "too many distinct characters in regex"
+},
diff --git a/src/include/regex/regex.h b/src/include/regex/regex.h
index 616c2c6..a7fb14f 100644
--- a/src/include/regex/regex.h
+++ b/src/include/regex/regex.h
@@ -153,6 +153,7 @@ typedef struct
 #define REG_MIXED	17			/* character widths of regex and string differ */
 #define REG_BADOPT	18			/* invalid embedded option */
 #define REG_ETOOBIG 19			/* nfa has too many states */
+#define REG_ECOLORS 20			/* too many distinct characters in regex */
 /* two specials for debugging and testing */
 #define REG_ATOI	101			/* convert error-code name to number */
 #define REG_ITOA	102			/* convert error-code number to name */
diff --git a/src/include/regex/regguts.h b/src/include/regex/regguts.h
index e1e406f..fc7f5ae 100644
--- a/src/include/regex/regguts.h
+++ b/src/include/regex/regguts.h
@@ -148,6 +148,7 @@
 typedef short color;			/* colors of characters */
 typedef int pcolor;			/* what color promotes to */
 
+#define MAX_COLOR	32767		/* max value that fits in 'color' datatype */
 #define COLORLESS	(-1)		/* impossible color */
 #define WHITE		0			/* default color, parent of all others */
 



Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-03 Thread Robert Haas
On Tue, Apr 2, 2013 at 12:19 PM, Peter Eisentraut pete...@gmx.net wrote:
 It's weird that SET LOCAL and SET SESSION actually *set* the value, and
 the second key word determines how long the setting will last.  SET
 PERSISTENT doesn't actually set the value.  I predict that this will be
 a new favorite help-it-doesn't-work FAQ.

I think this is another argument against this particular syntax.  I
have always thought that something along the lines of ALTER SYSTEM
would be more appropriate.  ALTER DATABASE .. SET and ALTER ROLE ..
SET don't change the value immediately either, and nobody gets
confused about that to my knowledge.  But I can see where SET
PERSISTENT could cause that sort of confusion.

...Robert




Re: [HACKERS] c language functions

2013-04-03 Thread Robert Haas
On Wed, Apr 3, 2013 at 11:26 AM, Rodrigo Barboza
rodrigombu...@gmail.com wrote:
 Hello.
 I'm trying to create a generic add function.
 I have defined a type my_uint and it needs a '+' operator.
 This operator should work like normal int + int operation.
 The function is defined expecting arguments (my_uint, anyelement).

 I'm confused in retrieving the anyelement type and value, and then doing
 the add operation and returning the correct value and type.
 I tried to use PG_GETARG_DATUM, but I don't know how to extract the value
 from it (should it be a uint32, uint64, float or double).

 Any tips?

Well, the information the function ends up receiving is going to
depend on how the function is declared at an SQL level.  So if you are
defining the function to take an argument of anyelement (which seems
unlikely to be a useful thing to do, but let's suppose you do it
anyway) then look at the C code for some other function that takes an
anyelement argument and look at how that function unpacks it.

Similarly, if you declare the function to take an int4 argument, then go
look at the C code for a function that takes an int4 argument and see what
it does to unpack it.
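
For concreteness, here is a minimal sketch of the anyelement approach
(hypothetical: it assumes a my_uint that is passed by value as a uint32,
and only accepts int2/int4 on the right). The type actually bound to the
polymorphic argument can be recovered at run time with
get_fn_expr_argtype():

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type.h"
#include "utils/builtins.h"

PG_FUNCTION_INFO_V1(my_uint_add_any);

Datum
my_uint_add_any(PG_FUNCTION_ARGS)
{
	uint32		left = PG_GETARG_UINT32(0);
	Oid			argtype = get_fn_expr_argtype(fcinfo->flinfo, 1);

	switch (argtype)
	{
		case INT2OID:
			PG_RETURN_UINT32(left + (uint32) PG_GETARG_INT16(1));
		case INT4OID:
			PG_RETURN_UINT32(left + (uint32) PG_GETARG_INT32(1));
		default:
			/* hypothetical behavior: reject anything we don't handle */
			ereport(ERROR,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("cannot add my_uint to type %s",
							format_type_be(argtype))));
	}
	PG_RETURN_NULL();			/* not reached; keeps the compiler quiet */
}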

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Apr 3, 2013 at 10:59 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 In fact, I'm going to go further and say that I do not like the entire
 concept of scannability, either as to design or implementation, and
 I think we should just plain rip it out.

 This has been my feeling from the beginning, so I'm happy to support
 this position.  I think the current version - where scan-ability is
 tracked in just one way - is an improvement over the previous version
 of the patch - where it was tracked in two different ways with a
 confusing shuffle of information from one place to the other.  But my
 favorite number of places to track it would be zero.

To be clear, I think we'll end up tracking some notion of scannability
eventually.  I just don't think the current notion is sufficiently baked
to want to promise to be upward-compatible with it in future.

regards, tom lane




Re: [HACKERS] c language functions

2013-04-03 Thread Rodrigo Barboza
Why not useful?
If I don't make it receive anyelement, I will have to create an add
function for each type.
Correct me if I'm wrong.


On Wed, Apr 3, 2013 at 2:27 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Apr 3, 2013 at 11:26 AM, Rodrigo Barboza
 rodrigombu...@gmail.com wrote:
  Hello.
  I'm trying to create a generic add function.
  I have defined a type my_uint and it needs a '+' operator.
  This operator should work like normal int + int operation.
  The function is defined expecting arguments (my_uint, anyelement).
 
  I'm confused in retrieving the anyelement type and value, and then doing
  the add operation and returning the correct value and type.
  I tried to use PG_GETARG_DATUM, but I don't know how to extract the value
  from it (should it be a uint32, uint64, float or double).
 
  Any tips?

 Well, the information the function ends up receiving is going to
 depend on how the function is declared at an SQL level.  So if you are
 defining the function to take an argument of anyelement (which seems
 unlikely to be a useful thing to do, but let's suppose you do it
 anyway) then look at the C code for some other function that takes an
 anyelement argument and look at how that function unpacks it.

 Similarly, if you declare the function to take int4 argument, then go
 look at the C code function that takes an int4 argument and see what
 it does to unpack it.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company



Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 Attached is a patch to add the overflow check. I used the error message 
 "too many distinct characters in regex". That's not totally accurate, 
 because there isn't a limit on distinct characters per se, but on the 
 number of colors. Conceivably, you could have a regexp with more than 
 32k different characters, but where most of them are mapped to the same 
 color. In practice, it's not helpful to the user to say "too many 
 colors"; he will have no clue what a color is.

Patch looks good except perhaps for wordsmithing the message text.

One thought is that we don't need to identify this as a regex error
because the PG code will report it with "invalid regular expression: %s".

I think there's a good argument for saying "too many character colors"
and relying on the "invalid regular expression" context to clue in the
clueless.  After all, most of them don't know what an NFA is either, but
no one has complained about the REG_ETOOBIG message.  I think if you get
to the point where you're triggering this error, you probably know
something about regexes, or even if you don't, the phrase "too many" will
give you a fair idea what's wrong.  There is much to be said for
specifically identifying the implementation limit that's been hit, even
if most users don't know what it is.  So I'd just as soon not fall back
on something imprecise.

regards, tom lane




Re: [HACKERS] commit dfda6ebaec67 versus wal_keep_segments

2013-04-03 Thread Heikki Linnakangas

On 03.04.2013 18:58, Jeff Janes wrote:

On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janes jeff.ja...@gmail.com wrote:


This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b


The problem seems to be that the underflow warned about is happening,
because the check to guard it was checking the wrong thing.  However, I
don't really understand KeepLogSeg.  It seems like segno, and hence recptr,
don't actually serve any purpose.


Hmm, the check is actually correct, but the assignment in the 
else-branch isn't. The idea of KeepLogSeg is to calculate recptr - 
wal_keep_segments, and assign that to *logSegNo. But only if *logSegNo 
is not already smaller than the calculated value. Does the attached look 
correct to you?



At some point when it is over-pruning and recycling, it recycles the log
files that are still needed for recovery, and if the database crashes at
that point it will not recover because it can't find either the primary
or secondary checkpoint records.


So, KeepLogSeg incorrectly sets *logSegNo to 0, and CreateCheckPoint 
decrements it, causing it to underflow to 2^64-1. Now RemoveOldXlogFiles 
feels free to remove every WAL segment.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3227d4c..2f79af6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7518,9 +7518,9 @@ CreateRestartPoint(int flags)
 }
 
 /*
- * Calculate the last segment that we need to retain because of
- * wal_keep_segments, by subtracting wal_keep_segments from
- * the given xlog location, recptr.
+ * Retreat *logSegNo to the last segment that we need to retain because of
+ * wal_keep_segments. This is calculated by subtracting wal_keep_segments
+ * from the given xlog location, recptr.
  */
 static void
 KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
@@ -7536,7 +7536,7 @@ KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo)
 	if (segno <= wal_keep_segments)
 		segno = 1;
 	else
-		segno = *logSegNo - wal_keep_segments;
+		segno = segno - wal_keep_segments;
 
 	/* don't delete WAL segments newer than the calculated segment */
 	if (segno < *logSegNo)
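
To make the arithmetic concrete, here is a standalone sketch of the
corrected rule (not part of the patch; wal_keep_segments is hard-coded to
10 for the example):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogSegNo;

static void
keep_log_seg(XLogSegNo recptr_segno, XLogSegNo *logSegNo)
{
	const XLogSegNo wal_keep_segments = 10;
	XLogSegNo	segno;

	if (recptr_segno <= wal_keep_segments)
		segno = 1;					/* avoid underflowing to 2^64-1 */
	else
		segno = recptr_segno - wal_keep_segments;

	/* only ever retreat *logSegNo, never advance it */
	if (segno < *logSegNo)
		*logSegNo = segno;
}

int
main(void)
{
	XLogSegNo	keep = 85;

	keep_log_seg(90, &keep);		/* 90 - 10 = 80, which is < 85 */
	printf("%llu\n", (unsigned long long) keep);	/* prints 80 */
	return 0;
}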



Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Dean Rasheed
On 3 April 2013 15:10, Tom Lane t...@sss.pgh.pa.us wrote:
 I think though that the upthread argument that we'd have multiple
 interpretations of the same thing is bogus.  To me, the core idea that's
 being suggested here is that '{}' should mean a zero-length 1-D array,
 not a zero-D array as formerly.  We would still need a way to represent
 zero-D arrays, if only because they'd still exist on-disk in existing
 databases (assuming we're not willing to break pg_upgrade for this).
 I suggest that that ought *not* involve any braces.  Perhaps '[]=' would
 be a suitable representation.

Then restoring an old SQL dump would silently convert all old zero-D
arrays to zero-length 1-D arrays. Also, if you failed to completely
rewrite all your application code, and accidentally missed a few
functions, you could very easily end up in a situation where your
tables contained a mixture of zero-D arrays and zero-length 1-D
arrays that wouldn't compare equally, leading to all sorts of nasty
bugs.

 In the other direction, ISTM that
 '{{},{},{}}' is a zero-by-three array,

Isn't that a three-by-zero array rather than a zero-by-three array?
Either way, what is the use-case for it? I certainly can't remember
ever using a 0x3 or 3x0 matrix.

 entirely distinct from '{}' or
 '{{}}' in dimensionality if not content.

And '{{}}' is a one-by-zero array. There is currently no proposed
syntax for representing a zero-by-zero array, or a zero-by-one array.

All I see here is endless opportunity for confusion and code breakage,
with very little real benefit.

I actually don't think the current behaviour is broken. I find it
perfectly logical and consistent to have a single dimensionless
concept of an empty array. It has no elements, so questions like "are
they arranged in 1-D or 2-D" and "what index do they start at" are
meaningless, and so should return NULL. The only counter-intuitive
thing is that array_length() returns NULL instead of 0 for the empty
array, and it doesn't take long to get used to that behaviour.

Much of this seems to have grown out of a desire to fix the confusing
behaviour of array_length(), but seems to have ended up in a much more
confusing place.

Given that array_length() takes 2 arguments, it is somewhat forgivable
for it to return NULL when you ask for the array's length in a
dimension that it doesn't have. Perhaps what might make the API easier
to work with is a new single-argument function --- e.g., array_size()
that would return the number of elements in the array --- 0 for '{}',
3 for '{1,2,3}', 6 for '{{1,2},{3,4},{5,6}}' and so on.
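
(Purely for illustration, a bare-bones C sketch of such a function,
reusing the backend's existing ArrayGetNItems() helper; the SQL-level
declaration would take anyarray and return integer. No such function
exists today.)

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

PG_FUNCTION_INFO_V1(array_size);

Datum
array_size(PG_FUNCTION_ARGS)
{
	ArrayType  *v = PG_GETARG_ARRAYTYPE_P(0);

	/* ArrayGetNItems multiplies out the dimensions, and returns 0 for
	 * a zero-D array, so '{}' would yield 0 as proposed above */
	PG_RETURN_INT32(ArrayGetNItems(ARR_NDIM(v), ARR_DIMS(v)));
}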

Regards,
Dean




Re: [HACKERS] regression test failed when enabling checksum

2013-04-03 Thread Jeff Davis
On Wed, 2013-04-03 at 09:48 -0700, Jeff Janes wrote:

  And why did those uninitialized pages trigger warnings when they were
 autovacced, but not when they were seq scanned in a query?
 
A scan won't trigger that warning. Empty pages are sometimes the
expected result of a crash when the file is extended but the page is not
written yet. So empty pages aren't necessarily an indication of
corruption (though I'd like to fix that eventually, because sometimes
zeroing is corruption).

Regards,
Jeff Davis






Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:

 Fortran ... Basic ... actually I'd have thought that zero was a
 minority position.  Fashions change I guess.

When I started programming the top four languages for business
programming were COBOL, BASIC, RPG II, and assembly language.
(Pascal and C came later, and I never saw much use of Fortran by
anyone other than mathematicians.) Except for assembly language,
the subscripts for arrays either started with 1 always, or that was
the default.  Given when it was first developed, it's not too
surprising that the SQL standard adopted 1 as the first element of
an array.

Which is more natural depends on whether you think of the subscript
in terms of ordinal positions or offsets from the base address.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] c language functions

2013-04-03 Thread Tom Lane
Rodrigo Barboza rodrigombu...@gmail.com writes:
 Why not useful?
 If I don't make it receive anyelement, I will have to create an add
 function for each type.

If you make it anyelement, then you're contracting to be able to add
any datatype whatsoever to a my_uint.  This is nonsensical.

You'd be better off declaring several specific addition functions,
one for each other type.  This will be an order of magnitude easier
to write, and probably run an order of magnitude faster too, because
just checking to see what type you got would already be significantly
more expensive than adding a couple of integers ought to be.

Look at the built-in types and functions for precedent.  There are
indeed separate functions for int2 + int2, int2 + int4, int4 + int2,
int4 + int4, etc etc.  If we were starting from scratch, we might reduce
that to just int4 + int4 and rely on the implicit coercion from int2 to
int4 to handle the other cases; but there's no way we'd put in run-time
type determination.
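
For illustration, one such dedicated function might look like this (a
sketch only: my_uint is again assumed to be a by-value uint32, and the
range check merely stands in for whatever the type's real semantics
require):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(my_uint_add_int4);

Datum
my_uint_add_int4(PG_FUNCTION_ARGS)
{
	uint32		a = PG_GETARG_UINT32(0);
	int32		b = PG_GETARG_INT32(1);

	/* no run-time type dispatch: the parser has already matched the
	 * arguments against this function's declared signature */
	int64		sum = (int64) a + (int64) b;

	if (sum < 0 || sum > (int64) 0xFFFFFFFF)
		ereport(ERROR,
				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
				 errmsg("my_uint out of range")));

	PG_RETURN_UINT32((uint32) sum);
}

Each such function would get its own CREATE FUNCTION and CREATE OPERATOR
declaration, so the parser picks the right variant at parse time.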

regards, tom lane




Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-03 Thread Alvaro Herrera
David E. Wheeler wrote:
 On Apr 3, 2013, at 2:37 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 
  I'd expect one of the CREATE EXTENSION commands to succeed
  and the others to block until the transaction is committed,
  then to fail with 'extension oracle_fdw already exists'.
  
  If that is what happens, it's what I'd expect since
  extension names are unique (see the unique constraint on
  pg_extension).
 
 Oh, they are not unique per-schema? I guess they are global to the database 
 but then their objects are in the specified schema, then.

Right -- an extension is not considered to live within a schema; they
are database-global.  The objects might live in a particular schema (if
the extension is relocatable), and there's support to move those to a
different schema, but this doesn't affect the extension itself.

-- 
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 05:21, Pavel Stehule wrote:




Pavel


ALGOL 60 was zero based by default; I remember deliberately
setting the lower bound to 1. I managed to avoid PASCAL
and I only glanced at ADA.


http://en.wikipedia.org/wiki/Comparison_of_programming_languages_%28array%29


In Pascal and similar languages (the Wirth family) it is possible to 
define the lower bound as any value (like PL/pgSQL). It depends on the 
school, but a lot of people used 1.


I remember it was one argument in the Pascal/C flame wars.

Regards

Pavel


[...]

At the time (just over 40 years ago!) I was adamant that arrays should 
start with an index of one; now I much prefer zero.  Probably I prefer 
zero partly because now I understand what is happening at the machine 
code level, and partly because zero is the default for the main 
languages I use.


Wasting time on Google (I have 'more' important things I 'should' be 
doing!), I find ALGOL 60 did not appear to have a default value for the 
lower index - not only that, but one could make it negative! see: 
http://www.csci.csusb.edu/dick/samples/algol60.syntax.html


Anyhow, I think we should standardise on zero as the initial index to be 
as consistent as practicable.  However, not with a religious zeal at the 
expense of practical considerations!



Cheers,
Gavin


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Andres Freund
On 2013-04-04 07:42:36 +1300, Gavin Flower wrote:
 On 04/04/13 05:21, Pavel Stehule wrote:
 
 
 Pavel
 
 ALGOL 60 was zero based by default; I remember deliberately
 setting the lower bound to 1. I managed to avoid PASCAL
 and I only glanced at ADA.
 
 
 
  http://en.wikipedia.org/wiki/Comparison_of_programming_languages_%28array%29
 
 
 In Pascal and similar languages (the Wirth family) it is possible to define
 the lower bound as any value (like PL/pgSQL). It depends on the school, but
 a lot of people used 1.
 
 I remember it was one argument in the Pascal/C flame wars.
 
 Regards
 
 Pavel
 
 [...]
 
 At the time (just over 40 years ago!) I was adamant that arrays should start
 with an index of one, now I much prefer zero.  Probably I prefer zero in
 part, because now I understand what is happening at the machine code level,
 and partly because zero is the default for the main languages I use.
 
 Wasting time on Google (I have 'more' important things I 'should' be
 doing!), I find ALGOL 60 did not appear to have a default value for the
 lower index - not only that, but one could make it negative! see:
 http://www.csci.csusb.edu/dick/samples/algol60.syntax.html
 
 Anyhow, I think we should standardise on zero as the initial index to be as
 consistent as practicable.  However, not with a religious zeal at the
 expense of practical considerations!

Changing this now, rather than on a green field, strikes me as a pretty
absurd exercise in frustrating users by breaking queries subtly. Since 1
is still a valid index, the breakage won't always be visible.

-1

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 05:30, Tom Lane wrote:

Zero as the default lower bound is consistent with most languages
(especially the common ones like C, C++, Java, & Python), in fact
I don't remember any language where that is not the case (ignoring
SQL) - and I've written programs in about 20 languages.

Fortran ... Basic ... actually I'd have thought that zero was a
minority position.  Fashions change I guess.

regards, tom lane
I had forgotten the indexing in BASIC & FORTRAN; I now recall COBOL's 
TABLEs were equivalent to arrays, and they started at one.



Cheers,
Gavin


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 2, 2013 at 12:19 PM, Peter Eisentraut pete...@gmx.net wrote:
 It's weird that SET LOCAL and SET SESSION actually *set* the value, and
 the second key word determines how long the setting will last.  SET
 PERSISTENT doesn't actually set the value.  I predict that this will be
 a new favorite help-it-doesn't-work FAQ.

 I think this is another argument against this particular syntax.  I
 have always thought that something along the lines of ALTER SYSTEM
 would be more appropriate.  ALTER DATABASE .. SET and ALTER ROLE ..
 SET don't change the value immediately either, and nobody gets
 confused about that to my knowledge.  But I can see where SET
 PERSISTENT could cause that sort of confusion.

Yeah, I think I argued for using the SET syntax to start with, but
I'm coming around to the position that SET PERSISTENT is too much
unlike the behavior of other varieties of SET.  ALTER is sounding
more attractive to me now.  Not sure about ALTER SYSTEM in particular
though --- it's not clear that that has any real merit other than
already existing as a keyword.  (Not that that's negligible.)
ALTER CONFIGURATION is another alternative using an existing keyword
that might be worth considering.

regards, tom lane




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Kevin Grittner
Gavin Flower gavinflo...@archidevsys.co.nz wrote:

 Anyhow, I think we should standardise on zero as the initial
 index to be as consistent as practicable.

If you want to suggest a default of zero for the first subscript of
an array in SQL, please don't confuse the issue by using any form
of the word standard in that proposal.  There are ANSI and ISO
standards for SQL, and they require that the first element of an
array is one.  I'm OK with *extending* the standard by *allowing*
other values, but let's not flout the standard and break existing
code by changing the *default*.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 05:36, David E. Wheeler wrote:

On Apr 3, 2013, at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:


Fortran ... Basic ... actually I'd have thought that zero was a
minority position.  Fashions change I guess.

I say we turn the default lower bound up to 11.

David


In keeping with the level of irrationality in this thread, maybe we 
should set it to an irrational number like the square root of 2, or 
transcend ourselves and make it a transcendental number like pi!  :-)


I suppose using the square root of minus one would be considered too 
imaginative???  :-)



Cheers,
Gavin


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 07:58, Kevin Grittner wrote:

Gavin Flower gavinflo...@archidevsys.co.nz wrote:


Anyhow, I think we should standardise on zero as the initial
index to be as consistent as practicable.

If you want to suggest a default of zero for the first subscript of
an array in SQL, please don't confuse the issue by using any form
of the word standard in that proposal.  There are ANSI and ISO
standards for SQL, and they require that the first element of an
array is one.  I'm OK with *extending* the standard by *allowing*
other values, but let's not flout the standard and break existing
code by changing the *default*.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
You omitted my rider 'However, not with a religious zeal at the expense 
of practical considerations!', which was meant to cover concerns like yours.



Cheers,
Gavin




Re: [HACKERS] regression test failed when enabling checksum

2013-04-03 Thread Jeff Davis
On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:

 I've reproduced the problem, this time in block 74 of relation
 base/16384/4931589, and a tarball of the data directory is here:
 
 
 https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing
 
 
 
 (the table is in database jjanes under role jjanes, the binary is
 commit 9ad27c215362df436f8c)
 
 
 What I would probably really want is the data as it existed after the
 crash but before recovery started, but since the postmaster
 immediately starts recovery after the crash, I don't know of a good
 way to capture this.

Can you just turn off the restart_after_crash GUC? I had a chance to
look at this, and seeing the block before and after recovery would be
nice. I didn't see a log file in the data directory, but it didn't go
through recovery, so I assume it already did that.

The block is corrupt as far as I can tell. The first third is written,
and the remainder is all zeros. The header looks like this:

(These numbers are mostly from pageinspect. The checksum value that
comes from pageinspect needed to be cast back to an unsigned short to
match the error message above -- should that be changed in
pageinspect?).

lsn: 7/252E4080
checksum: 34212
flags: 1
lower: 1188
upper: 5952
special: 8192
pagesize: 8192
version: 4
prune_xid: 156833911

So the header looks good, but most of the page data is missing. I tried
with pg_filedump (the 9.2.0 version, which should be fine), and it
agrees with me that the page is corrupt.

Interestingly, that doesn't result in a user-visible error when
ignore_checksum_failure=on. That's because the item pointers appear to
all be either not normal, or they point to the zeroed region. Testing the
visibility of a zeroed tuple header always returns false, so no problem.

You'd still think this would cause incorrect results, but I think what's
happening is that this is a new page (otherwise it would have been
written with something other than zeroes before). So, the tuples that
are supposed to be there may be uncommitted anyway.

So, the page may be corrupt without checksums as well, but it just
happens to be hidden for the same reason. Can you try to reproduce it
without -k? And on the checkin right before checksums were added?
Without checksums, you'll need to use pg_filedump (or similar) to find
whether an error has happened.

To start speculating about the root cause: something is violating the
"WAL before data" rule, or not writing a FPI when it's supposed to, or not
properly restoring the FPI during recovery, or something sets the wrong
LSN. This could still be caused by the checksums patch, but it seems a
little less likely now. The reason I say that is because it's a new page
with tuples on it, so that means something in the insert/update path
ended up not writing the FPI before writing the page.

Regards,
Jeff Davis





Re: [HACKERS] c language functions

2013-04-03 Thread Rodrigo Barboza
Well, I was checking inside my function the type of the second argument and
switching between the allowed types.
This way kind of does the same thing as many functions, doesn't it?


On Wed, Apr 3, 2013 at 3:39 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Rodrigo Barboza rodrigombu...@gmail.com writes:
  Why not useful?
  If I don't make it receive anyelement, I will have to create an add
  function for each type.

 If you make it anyelement, then you're contracting to be able to add
 any datatype whatsoever to a my_uint.  This is nonsensical.

 You'd be better off declaring several specific addition functions,
 one for each other type.  This will be an order of magnitude easier
 to write, and probably run an order of magnitude faster too, because
 just checking to see what type you got would already be significantly
 more expensive than adding a couple of integers ought to be.

 Look at the built-in types and functions for precedent.  There are
 indeed separate functions for int2 + int2, int2 + int4, int4 + int2,
 int4 + int4, etc etc.  If we were starting from scratch, we might reduce
 that to just int4 + int4 and rely on the implicit coercion from int2 to
 int4 to handle the other cases; but there's no way we'd put in run-time
 type determination.

 regards, tom lane



Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Andres Freund
On 2013-04-04 08:03:03 +1300, Gavin Flower wrote:
 On 04/04/13 07:58, Kevin Grittner wrote:
 Gavin Flower gavinflo...@archidevsys.co.nz wrote:
 
 Anyhow, I think we should standardise on zero as the initial
 index to be as consistent as practicable.
 If you want to suggest a default of zero for the first subscript of
 an array in SQL, please don't confuse the issue by using any form
 of the word standard in that proposal.  There are ANSI and ISO
 standards for SQL, and they require that the first element of an
 array is one.  I'm OK with *extending* the standard by *allowing*
 other values, but let's not flout the standard and break existing
 code by changing the *default*.

That's already possible:

postgres=# SELECT ('[0:3]={e1,e2,e3,e4}'::text[])[0];
 text 
------
 e1
(1 row)

Not too convenient, but ...

 You omitted my rider 'However, not with a religious zeal at the expense of
 practical considerations!' Which was meant to cover concerns like yours.

Given it's already possible, I don't understand what you propose then. A
GUC that allows changing the default? That would need to be included in
dumps and such, and would make old dumps - which won't include an
explicit SET of the current value - ambiguous to load. So that doesn't
seem to be a good idea either.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] commit dfda6ebaec67 versus wal_keep_segments

2013-04-03 Thread Jeff Janes
On Wed, Apr 3, 2013 at 11:14 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 03.04.2013 18:58, Jeff Janes wrote:

 On Tue, Apr 2, 2013 at 10:08 PM, Jeff Janesjeff.ja...@gmail.com  wrote:

  This commit introduced a problem with wal_keep_segments:

  commit dfda6ebaec6763090fb78b458a979b558c50b39b


 The problem seems to be that the underflow warned about is happening,
 because the check to guard it was checking the wrong thing.  However, I
 don't really understand KeepLogSeg.  It seems like segno, and hence
 recptr,
 don't actually serve any purpose.


 Hmm, the check is actually correct, but the assignment in the else-branch
 isn't. The idea of KeepLogSeg is to calculate recptr - wal_keep_segments,
 and assign that to *logSegNo. But only if *logSegNo is not already smaller
 than the calculated value. Does the attached look correct to you?


Let me describe what I think is going on.  My description is: on start,
recptr is the redo location of the just-completed checkpoint, and logSegNo
is the redo location segment of the checkpoint before that one.  We want to
keep the previous-checkpoint redo location, and we also want to keep
wal_keep_segments before the current-checkpoint redo location, so we take
whichever is earlier.

If my understanding is now correct, then I think your patch looks correct.
 (Also, applying it fixed the problem I was having.)

Why do we keep wal_keep_segments before the just-finished checkpoint,
rather than keeping that many before the previous checkpoint?  It seems like
it would be more intuitive (to the DBA) for that parameter to mean "keep
this many more segments than you otherwise would."  I'm not proposing we
change it, I'm just curious about why it is done that way.

Thanks,

Jeff


Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-03 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:

 In fact, I'm going to go further and say that I do not like the
 entire concept of scannability, either as to design or
 implementation, and I think we should just plain rip it out.

 This has been my feeling from the beginning, so I'm happy to
 support this position.  I think the current version - where
 scan-ability is tracked in just one way - is an improvement over
 the previous version of the patch - where it was tracked in two
 different ways with a confusing shuffle of information from one
 place to the other. But my favorite number of places to track it
 would be zero.

 To be clear, I think we'll end up tracking some notion of
 scannability eventually.  I just don't think the current notion
 is sufficiently baked to want to promise to be upward-compatible
 with it in future.

To be honest, I don't think I've personally seen a single use case
for matviews where they could be used if you couldn't count on an
error if attempting to use them without the contents reflecting a
materialization of the associated query at *some* point in time.
There probably are such, but I think removing this entirely takes
the percentage of use cases covered by the implementation in this
release down from 10% to 2%.

Consider where the Wisconsin Courts use home-grown materialized
views currently:

(1) On the public web site for circuit court data, visibility is
based on supreme court rules and the advice of a committee
consisting of judges, representatives of the press, defense
attorneys, prosecuting attorneys, etc.  There are cases in the
database which, for one reason or another, should not show up on
the public web site.  On a weekly basis, where monitoring shows the
lowest usage, the table of cases which are too old to be shown
according to the rules thus determined is regenerated.  If there
was the possibility that a dump and load could fail to fill this,
and the queries would run without error, they could not move from
ad hoc matview techniques to the new feature without excessive
risk.

(2) Individual judges have a dashboard showing such things as the
number of court cases which have gone beyond certain thresholds
without action.  They can drill down to detail so that cases
which have slipped through the cracks can be scheduled for some
appropriate action.  Justice delayed... and all of that.  It
would be much better to get an error which would result in
"information currently unavailable" than to give the impression
that there are no such cases.

Since the main point of this patch is to get the basis for a more
complete matview feature into place while still supporting *some*
use cases, and a high priority needs to be placed on not painting
ourselves into a corner, I agree we should rip this out if we think
it does so.  I have spent some time looking at what we will want to
add in future releases, and a more sophisticated concept of what is
"fresh enough" to allow use of the materialized data is certainly
on the list, and falling back to running the underlying query like
a normal view would be a reasonable option to be able to choose
in some cases; but I think that will *always* need to start with
information about whether data *has* been generated, and an empty
set has to be considered valid data if it has been.  If we come up
with a way to track that which isn't too fragile, I'm confident
that it will remain useful as the feature evolves. Making sure that
the heap has at least one page if data has been generated seems
like a not-entirely-unreasonable way to do that, although there
remains at least one vacuum bug to fix if we keep it, in addition
to Tom's concerns.  It has the advantage of playing nicely with
unlogged tables.  If this is not going to be what we use long term,
do we have a clue what is?

Personally, I think it would be sad to reduce the use cases for
which matviews in 9.3 can be used to those which can tolerate an
error-free reference to a matview for which data has not been
generated, but I'll rip out support for that distinction if that is
the consensus.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] c language functions

2013-04-03 Thread Robert Haas
On Wed, Apr 3, 2013 at 3:25 PM, Rodrigo Barboza rodrigombu...@gmail.com wrote:
 Well, I was checking inside my function the type of the second argument and
 switching between the allowed types.
 This way kind of does the same thing as many functions, doesn't it?

Sure, except that it will also call your function when someone does
my_int + text or my_int + bytea or my_int + box, which are surely a
lot less sensible.  It's probably a good idea to assume that if you
make your type work like the existing types, it's more likely to end
well than if you make it behave completely differently.

...Robert




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-03 Thread Robert Haas
On Wed, Apr 3, 2013 at 2:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 2, 2013 at 12:19 PM, Peter Eisentraut pete...@gmx.net wrote:
 It's weird that SET LOCAL and SET SESSION actually *set* the value, and
 the second key word determines how long the setting will last.  SET
 PERSISTENT doesn't actually set the value.  I predict that this will be
 a new favorite help-it-doesn't-work FAQ.

 I think this is another argument against this particular syntax.  I
 have always thought that something along the lines of ALTER SYSTEM
 would be more appropriate.  ALTER DATABASE .. SET and ALTER ROLE ..
 SET don't change the value immediately either, and nobody gets
 confused about that to my knowledge.  But I can see where SET
 PERSISTENT could cause that sort of confusion.

 Yeah, I think I argued for using the SET syntax to start with, but
 I'm coming around to the position that SET PERSISTENT is too much
 unlike the behavior of other varieties of SET.  ALTER is sounding
 more attractive to me now.  Not sure about ALTER SYSTEM in particular
 though --- it's not clear that that has any real merit other than
 already existing as a keyword.  (Not that that's negligible.)
 ALTER CONFIGURATION is another alternative using an existing keyword
 that might be worth considering.

Yeah, I thought about something like that.  Aside from saving on
keywords, the reason I like ALTER SYSTEM or similar is that I suspect
there will be other system-wide things that we may want to let people
ALTER in the future, so I think that route might avoid an unnecessary
proliferation of top-level commands.  I am not, however, deadly
attached to the idea, if someone's got a good reason for preferring
something else.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] c language functions

2013-04-03 Thread Rodrigo Barboza
I see, that's true.
I'm returning an unknown type, so there is a little more overhead. But it's
working now. =]
Thanks for the help guys!


On Wed, Apr 3, 2013 at 6:17 PM, Robert Haas robertmh...@gmail.com wrote:

 On Wed, Apr 3, 2013 at 3:25 PM, Rodrigo Barboza rodrigombu...@gmail.com
 wrote:
  Well, I was checking inside my function the type of the second argument
 and
  switching between the allowed types.
  This way kind of does the same thing as many functions, doesn't it?

 Sure, except that it will also call your function when someone does
 my_int + text or my_int + bytea or my_int + box, which are surely a
 lot less sensible.  It's probably a good idea to assume that if you
 make your type work like the existing types, it's more likely to end
 well than if you make it behave completely differently.

 ...Robert



Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-03 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 In fact, I'm going to go further and say that I do not like the
 entire concept of scannability, either as to design or
 implementation, and I think we should just plain rip it out.

 To be honest, I don't think I've personally seen a single use case
 for matviews where they could be used if you couldn't count on an
 error if attempting to use them without the contents reflecting a
 materialization of the associated query at *some* point in time.

Well, if we remove the WITH NO DATA clause from CREATE MATERIALIZED
VIEW, that minimum requirement is satisfied, no?

I wouldn't actually want to remove that option, because pg_dump will
need it to avoid circular-reference problems.  But if you simply don't
use it then you have the minimum guarantee.  And I do not see where
the current implementation is giving you any more guarantees.

What it *is* doing is setting a rather dubious behavioral precedent that
we will no doubt hear Robert complaining about when (not if) we decide
we don't want to be backward-compatible with it anymore.

Granting that throwing an error is actually of some use to some people,
I would not think that people would want to turn it on via a command
that throws away the existing view contents altogether, nor turn it off
with a full-throated REFRESH.  There are going to need to be ways to
incrementally update matviews, and ways to disable/enable access that
are not tied to a complete rebuild, not to mention being based on
user-determined rather than hard-wired criteria for what's too stale.
So I don't think this is a useful base to build on.

If you feel that scannability disable is an absolute must for version 0,
let's invent a matview reloption or some such to implement it and let
users turn it on and off as they wish.  That seems a lot more likely
to still be useful two years from now.  And if you're absolutely
convinced that unlogged matviews mustn't work as I suggest, we can
lose those from 9.3, too.

What I'd actually rather see us spending time on right now is making
some provision for incremental updates, which I will boldly propose
could be supported by user-written triggers on the underlying tables
if we only diked out the prohibitions against INSERT/UPDATE/DELETE on
matviews, and allowed them to operate on a matview's contents just like
it was a table.  Now admittedly that would foreclose allowing matviews
to be updatable in the updatable-view sense, but that's a feature I
would readily give up if it meant users could build incremental update
mechanisms this year and not two years down the road.

 (1) On the public web site for circuit court data, visibility is
 based on supreme court rules and the advice of a committee
 consisting of judges, representatives of the press, defense
 attorneys, prosecuting attorneys, etc.  There are cases in the
 database which, for one reason or another, should not show up on
 the public web site.  On a weekly basis, where monitoring shows the
 lowest usage, the table of cases which are too old to be shown
 according to the rules thus determined is regenerated.  If there
 was the possibility that a dump and load could fail to fill this,
 and the queries would run without error, they could not move from
 ad hoc matview techniques to the new feature without excessive
 risk.

Why exactly do you think that what I'm suggesting would cause a dump and
reload to not regenerate the data?  (Personally, I think pg_dump ought
to have an option selecting whether or not to repopulate matviews, but
again, if that's not what you want *don't use it*.)

 (2) Individual judges have a dashboard showing such things as the
 number of court cases which have gone beyond certain thresholds
 without action.  They can drill down to detail so that cases
 which have slipped through the cracks can be scheduled for some
 appropriate action.  Justice delayed... and all of that.  It
 would be much better to get an error which would result in
 information currently unavailable than to give the impression
 that there are no such cases.

If you need 100% accuracy, which is what this scenario appears to be
demanding, matviews are not for you (yet).  The existing implementation
certainly is nowhere near satisfying this scenario.

 ... Making sure that
 the heap has at least one page if data has been generated seems
 like a not-entirely-unreasonable way to do that, although there
 remains at least one vacuum bug to fix if we keep it, in addition
 to Tom's concerns.

No.  This is an absolute disaster.  It's taking something we have always
considered to be an irrelevant implementation detail and making it into
user-visible DDL state, despite the fact that it doesn't begin to satisfy
basic transactional behaviors.  We *need* to get rid of that aspect of
things.  If you must have scannability state in version 0, okay, but
it has to be a catalog property not this.

 It has the advantage of playing nicely with
 

[HACKERS] Clang compiler warning on 9.3 HEAD

2013-04-03 Thread Will Leinweber
On ref 8507907, when compiling with clang on OS X, I got this warning, which
seems like a possible bug.

I thought to report this because I imagine clang isn't frequently used
day-to-day by most.

dependency.c:213:36: warning: implicit conversion from enumeration type
'ObjectClass' (aka 'enum ObjectClass') to different
  enumeration type 'ObjectType' (aka 'enum ObjectType') [-Wconversion]

EventTriggerSupportsObjectType(getObjectClass(thisobj)))
~~
^~~
1 warning generated.
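
The diagnostic is easy to reproduce in isolation; a minimal standalone
sketch (cut-down hypothetical enums, not the real backend definitions),
compiled with clang and -Wconversion:

enum ObjectClass { OCLASS_CLASS, OCLASS_PROC };
enum ObjectType { OBJECT_TABLE, OBJECT_FUNCTION };

static int
supports_object_type(enum ObjectType objtype)
{
	return objtype == OBJECT_TABLE;
}

int
main(void)
{
	enum ObjectClass oclass = OCLASS_PROC;

	/* clang warns here: ObjectClass is implicitly converted to the
	 * unrelated ObjectType, just as in the dependency.c report */
	return supports_object_type(oclass);
}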


Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Jim Nasby

On 4/3/13 10:34 AM, Gavin Flower wrote:

Maybe we should adopt the famous compromise of '0.5'?


+0.5. ;P
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




[HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Jeff Janes
I've changed the subject from "regression test failed when enabling
checksum" because I now know they are totally unrelated.

My test case didn't need to depend on archiving being on, and so with a
simple tweak I rendered the two issues orthogonal.


On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis pg...@j-davis.com wrote:

 On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:

  What I would probably really want is the data as it existed after the
  crash but before recovery started, but since the postmaster
  immediately starts recovery after the crash, I don't know of a good
  way to capture this.

 Can you just turn off the restart_after_crash GUC? I had a chance to
 look at this, and seeing the block before and after recovery would be
 nice. I didn't see a log file in the data directory, but it didn't go
 through recovery, so I assume it already did that.


You don't know that the cluster is in the bad state until after it goes
through recovery because most crashes recover perfectly fine.  So it would
have to make a side-copy of the cluster after the crash, then recover the
original and see how things go, then either retain or delete the side-copy.
 Unfortunately my testing harness can't do this at the moment, because the
perl script storing the consistency info needs to survive over the crash
and recovery.   It might take me awhile to figure out how to make it do
this.

I have the server log just go to stderr, where it gets mingled together
with messages from my testing harness.  I had not uploaded that file.

Here is a new upload. It contains both a data_dir tarball including xlog,
and the mingled stderr (do_new.out)

https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing

The other 3 files in it constitute the harness.  It is quite a mess, with
some hard-coded paths.  The just-posted fix for wal_keep_segments will also
have to be applied.




 The block is corrupt as far as I can tell. The first third is written,
 and the remainder is all zeros. The header looks like this:


Yes, that part is by my design.  Why it didn't get fixed from a FPI is not
by my design, of course.



 So, the page may be corrupt without checksums as well, but it just
 happens to be hidden for the same reason. Can you try to reproduce it
 without -k?


No, things run (seemingly) fine without -k.


 And on the checkin right before checksums were added?
 Without checksums, you'll need to use pg_filedump (or similar) to find
 whether an error has happened.


I'll work on it, but it will take awhile to figure out exactly how to use
pg_filedump to do that, and how to automate that process and incorporate it
into the harness.

In the meantime, I tested the checksum commit itself (96ef3b8ff1c) and the
problem occurs there, so if the problem is not the checksums themselves
(and I agree it probably isn't) it must have been introduced before that
rather than after.

Cheers,

Jeff


Re: [HACKERS] Clang compiler warning on 9.3 HEAD

2013-04-03 Thread Peter Geoghegan
On Wed, Apr 3, 2013 at 11:52 PM, Will Leinweber w...@heroku.com wrote:
 On ref 8507907, when compiling with clang on OS X, I got this warning which
 seems like a possible bug.

 I thought to report this because I imagine clang isn't frequently used
 day-to-day by most.

I actually reported a bug that was thrown up by Clang for similar
reasons over a year ago (i.e. an implicit conversion of one enum type
to another). That bug was fixed (by this commit:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=05e83968929f4ec1eba058fcae755fd2df98864e
), and the diagnostics were considered generally sound and useful at
the time. This looks like a real problem to me.
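
In miniature, the pattern clang is complaining about looks like this (a
contrived sketch with stand-in enum members, not the PostgreSQL code in
question; only the shape matches the warning quoted upthread):

#include <stdbool.h>

/* Stand-in enums: passing an ObjectClass where an ObjectType is
 * expected is an implicit enum-to-enum conversion, which clang flags
 * under -Wconversion while gcc of that era generally stayed quiet. */
typedef enum ObjectClass
{
    OCLASS_CLASS,
    OCLASS_PROC
} ObjectClass;

typedef enum ObjectType
{
    OBJECT_TABLE,
    OBJECT_FUNCTION
} ObjectType;

static bool
EventTriggerSupportsObjectType(ObjectType obtype)
{
    return obtype == OBJECT_TABLE;
}

static bool
example(ObjectClass oclass)
{
    /* clang warns here: implicit conversion 'ObjectClass' -> 'ObjectType' */
    return EventTriggerSupportsObjectType(oclass);
}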


-- 
Peter Geoghegan




Re: [HACKERS] CREATE EXTENSION BLOCKS

2013-04-03 Thread David E. Wheeler
On Apr 3, 2013, at 11:41 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Oh, they are not unique per-schema? I guess they are global to the database 
 but then their objects are in the specified schema, then.
 
 Right -- an extension is not considered to live within a schema, they
 are database-global.  The objects might live in a particular schema (if
 it is relocatable), and there's support to move those to a different
 schema, but this doesn't affect the extension itself.

Thanks. I humbly submit this patch to help prevent silly questions like this in 
the future.

diff --git a/doc/src/sgml/ref/create_extension.sgml 
b/doc/src/sgml/ref/create_extension.sgml
index 4f3b9a5..4ab3dff 100644
--- a/doc/src/sgml/ref/create_extension.sgml
+++ b/doc/src/sgml/ref/create_extension.sgml
@@ -93,6 +93,8 @@ CREATE EXTENSION [ IF NOT EXISTS ] replaceable 
class=parameterextension_name
 relocated.  The named schema must already exist.
 If not specified, and the extension's control file does not specify a
 schema either, the current default object creation schema is used.
+Note that only the extension objects will be placed into the named
+schema; the extension itself is a database-global object.
/para
   /listitem
  /varlistentry

Best,

David





Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-03 Thread Greg Stark
On Wed, Apr 3, 2013 at 3:00 PM, Robert Haas robertmh...@gmail.com wrote:

 The main hesitation I've had about actually implementing such a scheme
 is that I find it a bit unappealing to have a background process
 dedicated to just this.  But maybe it could be combined with some of
 the other ideas presented here.  Perhaps we should have one process
 that scans the buffer arena and populates the freelists; as a side
 effect, if it runs across a dirty buffer, it kicks it over to the
 process described in the previous paragraph (which could still, also,
 absorb requests from other backends using buffer access strategies).
 Then we'd end up with nothing that looks exactly like the background
 writer we have now, but maybe no one would miss it.


I think the general pattern of development has led in the opposite
direction. Every time there's been one daemon responsible for two things
it's ended up causing contorted code and difficult to tune behaviours and
we've ended up splitting the two.

In particular in this case it seems like an especially poor choice. In the
time one buffer write might take the entire freelist might empty out. I
could easily imagine this happening *every* time a write I/O happens. It
seems more likely that you'll need multiple processes running the buffer
cleaning to keep up with a decent I/O subsystem.

I'm still skeptical about the idea of a freelist. That just seems like a
terrible point of contention. However perhaps that's because I'm picturing
an LRU linked list. Perhaps the right thing is to maintain a pool of
buffers in some less contention-prone data structure which lets each
backend pick buffers out more or less independently of the others.

-- 
greg


Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Andres Freund
On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
 I've changed the subject from "regression test failed when enabling
 checksum" because I now know they are totally unrelated.
 
 My test case didn't need to depend on archiving being on, and so with a
 simple tweak I rendered the two issues orthogonal.
 
 
 On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis pg...@j-davis.com wrote:
 
  On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
 
   What I would probably really want is the data as it existed after the
   crash but before recovery started, but since the postmaster
   immediately starts recovery after the crash, I don't know of a good
   way to capture this.
 
  Can you just turn off the restart_after_crash GUC? I had a chance to
  look at this, and seeing the block before and after recovery would be
  nice. I didn't see a log file in the data directory, but it didn't go
  through recovery, so I assume it already did that.
 
 
 You don't know that the cluster is in the bad state until after it goes
 through recovery because most crashes recover perfectly fine.  So it would
 have to make a side-copy of the cluster after the crash, then recover the
 original and see how things go, then either retain or delete the side-copy.
  Unfortunately my testing harness can't do this at the moment, because the
 perl script storing the consistency info needs to survive over the crash
 and recovery.   It might take me a while to figure out how to make it do
 this.
 
 I have the server log just go to stderr, where it gets mingled together
 with messages from my testing harness.  I had not uploaded that file.
 
 Here is a new upload. It contains both a data_dir tarball including xlog,
 and the mingled stderr (do_new.out)
 
 https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
 
 The other 3 files in it constitute the harness.  It is quite a mess, with
 some hard-coded paths.  The just-posted fix for wal_keep_segments will also
 have to be applied.
 
 
 
 
  The block is corrupt as far as I can tell. The first third is written,
  and the remainder is all zeros. The header looks like this:
 
 
 Yes, that part is by my design.  Why it didn't get fixed from a FPI is not
 by my design, of course.

There are no FPIs (if you mean a full page image with that) afaics:

Your logs tell us about recovery:
27692  2013-04-03 10:09:15.621 PDT:LOG:  redo starts at 1/3190
27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data 
checksum in record at 1/31169A68
27692  2013-04-03 10:09:15.647 PDT:LOG:  redo done at 1/31169A38

And then the error:

27698 SELECT 2013-04-03 10:09:16.688 PDT:WARNING:  page verification failed, 
calculated checksum 16296 but expected 49517
27698 SELECT 2013-04-03 10:09:16.688 PDT:ERROR:  invalid page in block 90 of 
relation base/16384/835589


looking at xlog from that time, the first records are:
rmgr: XLOG    len (rec/tot): 72/   104, tx:  0, lsn:
1/3128, prev 1/3090, bkp: 0000, desc: checkpoint: redo 1/3128; tli
1; prev tli 1; fpw true; xid 0/26254997; oid 835589; multi 1; offset 0; oldest
xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
rmgr: XLOG    len (rec/tot):  4/36, tx:  0, lsn:
1/3190, prev 1/3128, bkp: 0000, desc: nextOid: 843781
rmgr: Storage len (rec/tot): 16/48, tx:  0, lsn:
1/31B8, prev 1/3190, bkp: 0000, desc: file create: base/16384/835589
rmgr: Heap    len (rec/tot): 37/  7905, tx:   26254997, lsn:
1/31E8, prev 1/31B8, bkp: 1000, desc: hot_update: rel 1663/16384/12660;
tid 0/47 xmax 26254997 ; new tid 0/33 xmax 0

So the file was created after the last checkpoint, so no full pages need
to be logged. And we theoretically should be able to recover all things
like torn pages from the WAL since that should cover everything that
happened to that file.

The bkp block 1 indicated in the 4th record above is the only one in
that period of WAL btw.

Looking a bit more, the last page that's covered by WAL is:
rmgr: Heap    len (rec/tot): 35/67, tx:   26254998, lsn:
1/311699A8, prev 1/31169960, bkp: 0000, desc: insert: rel 1663/16384/835589;
tid 44/56

which is far earlier than block 90, where the error is found, unless I
misunderstand something. Which is strange, since non-zero content of
those pages shouldn't go to disk until the respective WAL is
flushed. Huh, are we missing something here?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Gavin Flower

On 04/04/13 11:55, Jim Nasby wrote:

On 4/3/13 10:34 AM, Gavin Flower wrote:

Maybe we should adopt the famous compromise of '0.5'?


+0.5. ;P

Far too positive for our grim world!  How about '-0.5' instead?  :-)

I notice you call yourself a 'Data Architect' - never too sure if I 
should call myself

a 'software architect who pretends he can program' or
a 'programmer with pretensions of being a software architect'?  :-)


Re: [HACKERS] patch to add \watch to psql

2013-04-03 Thread Will Leinweber
Here is an updated patch that addresses several of the points brought up so
far, such as the sleep, internationalization banner, and zero wait check,
and it removes the premature input check.

Unfortunately rl_clear_screen() is not included at all in libedit, causing
compilation to fail, and I was completely unable to find a way to
distinguish libedit from readline on OS X. It tries extraordinarily hard to
pretend that it's readline. Instead, falling back to simple control
characters to clear the screen worked very well on both Linux and OS X.
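
For reference, the fallback amounts to something like this (a minimal
sketch, not the patch itself; it assumes an ANSI/VT100-compatible
terminal, which covers both the Linux and OS X cases above):

#include <stdio.h>

/* Clear the screen without readline: ESC[H homes the cursor,
 * ESC[2J erases the display. */
static void
clear_screen(void)
{
    fputs("\033[H\033[2J", stdout);
    fflush(stdout);
}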


On Tue, Mar 26, 2013 at 2:14 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 3/24/13 3:10 PM, Tom Lane wrote:
  I also concur with the complaint here
 
 http://www.postgresql.org/message-id/caazkufzxyj-rt1aec6s0g7zm68tdlfbbm1r6hgrbbxnz80k...@mail.gmail.com
  that allowing a minimum sleep of 0 is rather dangerous

 The original watch command apparently silently corrects a delay of 0
 to 0.1 seconds.

  Another minor question is whether we really need the time-of-day in the
  banner,

 That's also part of the original watch and occasionally useful, I think.




psql-watch-v4.patch
Description: Binary data



Re: [HACKERS] spoonbill vs. -HEAD

2013-04-03 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 04/03/2013 12:59 AM, Tom Lane wrote:
 BTW, on further thought it seems like maybe this is an OpenBSD bug,
 at least in part: what is evidently happening is that the temporary
 blockage of SIGINT during the handler persists even after we've
 longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1)
 to establish that longjmp handler --- so why isn't the original signal
 mask reinstalled when we return to the main loop?
 
 If (your version of) OpenBSD is getting this wrong, it'd explain why
 we've not seen similar behavior elsewhere.

 hmm trolling the openbsd cvs history brings up this:
 http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/machdep.c?r1=1.143;sortby=date#rev1.143

That's about alternate signal stacks, which we're not using.

I put together a simple test program (attached) and tried it on
spoonbill, and it says that the signal *does* get unblocked when control
returns to the sigsetjmp(...,1).  So now I'm really confused.  Somehow
the results we're getting in a full-fledged backend do not match up with
the results gotten by this test program ... but why?

regards, tom lane

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <setjmp.h>

typedef void (*pqsigfunc) (int signo);

sigjmp_buf *PG_exception_stack;

static void
printmask(const char *labl)
{
    sigset_t sigs;
    int i;

    printf("masked signals at %s: ", labl);
    if (sigprocmask(SIG_SETMASK, NULL, &sigs) == 0)
    {
        for (i = 1; i <= 32; i++)
        {
//          if (sigs.sigset[0] & sigmask(i))
            if (sigs & sigmask(i))
                printf("%d, ", i);
        }
    }
    else
        printf("failed!");
    printf("\n");
    fflush(stdout);
}

static void
sigint_handler(int signo)
{
    printmask("sigint_handler");
    siglongjmp(*PG_exception_stack, 1);
    abort();
}

static pqsigfunc
pqsignal(int signo, pqsigfunc func)
{
    struct sigaction act,
                oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo != SIGALRM)
        act.sa_flags |= SA_RESTART;
#ifdef SA_NOCLDSTOP
    if (signo == SIGCHLD)
        act.sa_flags |= SA_NOCLDSTOP;
#endif
    if (sigaction(signo, &act, &oact) < 0)
        return SIG_ERR;
    return oact.sa_handler;
}

void
intermediate_func(int level)
{
    do {
        sigjmp_buf *save_exception_stack = PG_exception_stack;
        sigjmp_buf local_sigjmp_buf;

        if (sigsetjmp(local_sigjmp_buf, 0) == 0)
        {
            PG_exception_stack = &local_sigjmp_buf;

            if (level > 0)
                intermediate_func(level - 1);
            else
            {
                kill(getpid(), SIGINT);
                sleep(1);
                abort();
            }
        }
        else
        {
            PG_exception_stack = save_exception_stack;

            printmask("inner longjmp catcher");
            siglongjmp(*PG_exception_stack, 1);
            abort();
        }
        PG_exception_stack = save_exception_stack;
    } while (0);
}

int
main(int argc, char **argv)
{
    sigjmp_buf  outer_sigjmp_buf;

    printmask("startup");
    pqsignal(SIGINT, sigint_handler);

    if (sigsetjmp(outer_sigjmp_buf, 1) != 0)
    {
        printmask("outer longjmp catcher");
        return 0;
    }
    PG_exception_stack = &outer_sigjmp_buf;

    intermediate_func(2);

    printmask("unexpected exit");
    return 0;
}



Re: [HACKERS] patch to add \watch to psql

2013-04-03 Thread Dickson S. Guedes
2013/4/3 Will Leinweber w...@heroku.com:
 Here is an updated patch that addresses several of the points brought up so
 far, such as the sleep, internationalization banner, and zero wait check,
 and it removes the premature input check.

 Unfortunately rl_clear_screen() is not included at all in libedit, causing
 compilation to fail, and I was completely unable to find a way to
 distinguish libedit from readline on OS X. It tries extraordinarily hard to
 pretend that it's readline. Instead falling back to simple control
 characters to clear the screen worked very well on both linux and OS X.

I don't have access to an OSX box ATM but term_clear_screen(), in
libedit, didn't help?

-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Andres Freund
On 2013-04-04 01:52:41 +0200, Andres Freund wrote:
 On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
  I've changed the subject from "regression test failed when enabling
  checksum" because I now know they are totally unrelated.
  
  My test case didn't need to depend on archiving being on, and so with a
  simple tweak I rendered the two issues orthogonal.
  
  
  On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis pg...@j-davis.com wrote:
  
   On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
  
What I would probably really want is the data as it existed after the
crash but before recovery started, but since the postmaster
immediately starts recovery after the crash, I don't know of a good
way to capture this.
  
   Can you just turn off the restart_after_crash GUC? I had a chance to
   look at this, and seeing the block before and after recovery would be
   nice. I didn't see a log file in the data directory, but it didn't go
   through recovery, so I assume it already did that.
  
  
  You don't know that the cluster is in the bad state until after it goes
  through recovery because most crashes recover perfectly fine.  So it would
  have to make a side-copy of the cluster after the crash, then recover the
  original and see how things go, then either retain or delete the side-copy.
   Unfortunately my testing harness can't do this at the moment, because the
  perl script storing the consistency info needs to survive over the crash
  and recovery.   It might take me a while to figure out how to make it do
  this.
  
  I have the server log just go to stderr, where it gets mingled together
  with messages from my testing harness.  I had not uploaded that file.
  
  Here is a new upload. It contains both a data_dir tarball including xlog,
  and the mingled stderr (do_new.out)
  
  https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
  
  The other 3 files in it constitute the harness.  It is quite a mess, with
  some hard-coded paths.  The just-posted fix for wal_keep_segments will also
  have to be applied.
  
  
  
  
   The block is corrupt as far as I can tell. The first third is written,
   and the remainder is all zeros. The header looks like this:
  
  
  Yes, that part is by my design.  Why it didn't get fixed from a FPI is not
  by my design, of course.
 
 There are no FPIs (if you mean a full page image with that) afaics:
 
 Your logs tell us about recovery:
 27692  2013-04-03 10:09:15.621 PDT:LOG:  redo starts at 1/3190
 27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data 
 checksum in record at 1/31169A68
 27692  2013-04-03 10:09:15.647 PDT:LOG:  redo done at 1/31169A38
 
 And then the error:
 
 27698 SELECT 2013-04-03 10:09:16.688 PDT:WARNING:  page verification failed, 
 calculated checksum 16296 but expected 49517
 27698 SELECT 2013-04-03 10:09:16.688 PDT:ERROR:  invalid page in block 90 of 
 relation base/16384/835589
 
 
 looking at xlog from that time, the first records are:
 rmgr: XLOG    len (rec/tot): 72/   104, tx:  0, lsn:
 1/3128, prev 1/3090, bkp: 0000, desc: checkpoint: redo 1/3128;
 tli 1; prev tli 1; fpw true; xid 0/26254997; oid 835589; multi 1; offset 0;
 oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; online
 rmgr: XLOG    len (rec/tot):  4/36, tx:  0, lsn:
 1/3190, prev 1/3128, bkp: 0000, desc: nextOid: 843781
 rmgr: Storage len (rec/tot): 16/48, tx:  0, lsn:
 1/31B8, prev 1/3190, bkp: 0000, desc: file create: base/16384/835589
 rmgr: Heap    len (rec/tot): 37/  7905, tx:   26254997, lsn:
 1/31E8, prev 1/31B8, bkp: 1000, desc: hot_update: rel
 1663/16384/12660; tid 0/47 xmax 26254997 ; new tid 0/33 xmax 0
 
 So the file was created after the last checkpoint, so no full pages need
 to be logged. And we theoretically should be able to recover all things
 like torn pages from the WAL since that should cover everything that
 happened to that file.
 
 The bkp block 1 indicated in the 4th record above is the only one in
 that period of WAL btw.
 
 Looking a bit more, the last page that's covered by WAL is:
 rmgr: Heap    len (rec/tot): 35/67, tx:   26254998, lsn:
 1/311699A8, prev 1/31169960, bkp: 0000, desc: insert: rel 1663/16384/835589;
 tid 44/56
 
 which is far earlier than block 90, where the error is found, unless I
 misunderstand something. Which is strange, since non-zero content of
 those pages shouldn't go to disk until the respective WAL is
 flushed. Huh, are we missing something here?

Looking at the page lsn's with dd I noticed something peculiar:

page 0:
01 00 00 00 18 c2 00 31 = 1/3100C218
page 1:
01 00 00 00 80 44 01 31 = 1/31014480
page 10:
01 00 00 00 60 ce 05 31 = 1/3105ce60
page 43:
01 00 00 00 58 7a 16 31 = 1/31167a58
page 44:
01 00 00 00 f0 99 16 31 = 1/311699f0
page 45:
00 00 00 00 00 00 00 00 = 0/0
page 90:
01 00 00 00 90 17 1d 32 = 1/321d1790
page 91:
01 00 00 00 38 ef 1b 32 = 1/321bef38

So we have written out pages that are after pages without an LSN that
have an LSN that's *beyond* the point XLOG has successfully been written
to disk (1/31169A38)?
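
For anyone reproducing this, the pd_lsn values above can be read straight
out of the relation file with a few lines of C (a sketch resting on two
assumptions: a little-endian machine, and the 9.3 page layout whose first
eight bytes are {uint32 xlogid; uint32 xrecoff}):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

/* Print the pd_lsn of one 8 kB block of a relation file.  Assumes
 * little-endian byte order and the 9.3 page header layout. */
int
main(int argc, char **argv)
{
    unsigned char page[BLCKSZ];
    uint32_t    xlogid;
    uint32_t    xrecoff;
    long        blkno;
    FILE       *f;

    if (argc != 3)
    {
        fprintf(stderr, "usage: %s relfile blkno\n", argv[0]);
        return 1;
    }
    blkno = atol(argv[2]);
    f = fopen(argv[1], "rb");
    if (f == NULL ||
        fseek(f, blkno * BLCKSZ, SEEK_SET) != 0 ||
        fread(page, 1, BLCKSZ, f) != BLCKSZ)
    {
        perror("read");
        return 1;
    }
    memcpy(&xlogid, page, 4);
    memcpy(&xrecoff, page + 4, 4);
    printf("page %ld: %X/%08X\n", blkno, xlogid, xrecoff);
    return 0;
}

Run against base/16384/835589 with block 90 this should print the same
LSN (1/321D1790) as the dd output above.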

Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Jeff Davis
 On Wed, 2013-04-03 at 15:57 -0700, Jeff Janes wrote:



 You don't know that the cluster is in the bad state until after it
 goes through recovery because most crashes recover perfectly fine.  So
 it would have to make a side-copy of the cluster after the crash, then
 recover the original and see how things go, then either retain or
 delete the side-copy.  Unfortunately my testing harness can't do this
 at the moment, because the perl script storing the consistency info
 needs to survive over the crash and recovery.   It might take
 me a while to figure out how to make it do this.

Hmm... you're right, that is difficult to integrate into a test.

Another approach is to expand PageHeaderIsValid in a pre-checksums
version to do some extra checks (perhaps the same checks as pg_filedump
does). It might be able to at least detect simple cases, like all zeros
between pd_upper and pd_special (when pd_upper < pd_special). Still not
easy, but maybe more practical than saving the directory like that. We
may even want a GUC for that to aid future debugging (obviously it would
have a performance impact).
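
Something along these lines, say (a hypothetical helper, not the real
PageHeaderIsValid; the field offsets follow the 9.3 page header, with
pd_upper at byte 14 and pd_special at byte 16):

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* On a sane page the bytes between pd_upper and pd_special hold tuple
 * data, so finding them all zero hints at a torn or lost write. */
static bool
page_tuple_area_is_zeroed(const unsigned char *page)
{
    uint16_t    pd_upper;
    uint16_t    pd_special;
    size_t      i;

    memcpy(&pd_upper, page + 14, sizeof(pd_upper));
    memcpy(&pd_special, page + 16, sizeof(pd_special));

    if (pd_upper >= pd_special)     /* no tuple data stored yet */
        return false;

    for (i = pd_upper; i < pd_special; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}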

 The other 3 files in it constitute the harness.  It is quite a mess,
 with some hard-coded paths.  The just-posted fix for wal_keep_segments
 will also have to be applied.

I'm still trying to reproduce the problem myself. I keep trying simpler
versions of your test and I don't see the problem. It's a little awkward
to try to run the full version of your test (and I'm jumping between
code inspection and various other ideas).

 I'll work on it, but it will take a while to figure out exactly how to
 use pg_filedump to do that, and how to automate that process and
 incorporate it into the harness.

It might be more productive to try to expand PageHeaderIsValid as I
suggested above.

 In the meantime, I tested the checksum commit itself (96ef3b8ff1c) and
 the problem occurs there, so if the problem is not the checksums
 themselves (and I agree it probably isn't) it must have been
 introduced before that rather than after.

Thank you for all of your work on this. I have also attached another
test patch that disables most of the complex areas of the checksums
patch (VM bits, hint bits, etc.). If you apply different portions of the
patch (or the whole thing), it will help eliminate possibilities. In
particular, eliminating the calls to PageSetAllVisible+visibilitymap_set
could tell us a lot.

Also, the first data directory you posted had a problem with a page that
appeared to be a new page (otherwise I would have expected either old or
new data at the end). Do you think this might be a problem only for new
pages? Or do you think your test just makes it likely that many writes
will happen on new pages?

I did find one potential problem in the checksums code (aside from my
previous patch involving wal_level=archive): visibilitymap_set is called
sometimes before the heap page is marked dirty. I will examine a little
more closely, but I expect that to require another patch. Not sure if
that could explain this problem or not.

Regards,
Jeff Davis

*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***
*** 257,264  heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
  		 * point in repeating the prune/defrag process until something else
  		 * happens to the page.
  		 */
! 		if (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
! 			PageIsFull(page))
  		{
  			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
--- 257,264 
  		 * point in repeating the prune/defrag process until something else
  		 * happens to the page.
  		 */
! 		if (false && (((PageHeader) page)->pd_prune_xid != prstate.new_prune_xid ||
! 	  PageIsFull(page)))
  		{
  			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
*** a/src/backend/commands/vacuumlazy.c
--- b/src/backend/commands/vacuumlazy.c
***
*** 668,674  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  			freespace = PageGetHeapFreeSpace(page);
  
  			/* empty pages are always all-visible */
! 			if (!PageIsAllVisible(page))
  			{
  PageSetAllVisible(page);
  MarkBufferDirty(buf);
--- 668,674 
  			freespace = PageGetHeapFreeSpace(page);
  
  			/* empty pages are always all-visible */
! 			if (false && !PageIsAllVisible(page))
  			{
  PageSetAllVisible(page);
  MarkBufferDirty(buf);
***
*** 901,907  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  		freespace = PageGetHeapFreeSpace(page);
  
  		/* mark page all-visible, if appropriate */
! 		if (all_visible)
  		{
  			if (!PageIsAllVisible(page))
  			{
--- 901,907 
  		freespace = PageGetHeapFreeSpace(page);
  
  		/* mark page all-visible, if appropriate */
! 		if (false && all_visible)
  		{
  			if (!PageIsAllVisible(page))
  			{
***
*** 1149,1155  lazy_vacuum_page(Relation onerel, 

Re: [HACKERS] Regex with 32k different chars causes a backend crash

2013-04-03 Thread Noah Misch
On Wed, Apr 03, 2013 at 08:09:15PM +0300, Heikki Linnakangas wrote:
 --- a/src/include/regex/regguts.h
 +++ b/src/include/regex/regguts.h
 @@ -148,6 +148,7 @@
  typedef short color; /* colors of characters */
  typedef int pcolor;  /* what color promotes to */
  
 +#define MAX_COLOR	32767	/* max value that fits in 'color' datatype */

This should use SHRT_MAX, no?  (Not that any supported platform differs here.)
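
That is, presumably something like (a sketch of the suggestion, not a
patch):

#include <limits.h>

typedef short color;            /* colors of characters */

#define MAX_COLOR   SHRT_MAX    /* max value that fits in 'color' datatype */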

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Andres Freund
On 2013-04-04 02:28:32 +0200, Andres Freund wrote:
 On 2013-04-04 01:52:41 +0200, Andres Freund wrote:
  On 2013-04-03 15:57:49 -0700, Jeff Janes wrote:
   I've changed the subject from "regression test failed when enabling
   checksum" because I now know they are totally unrelated.
   
   My test case didn't need to depend on archiving being on, and so with a
   simple tweak I rendered the two issues orthogonal.
   
   
   On Wed, Apr 3, 2013 at 12:15 PM, Jeff Davis pg...@j-davis.com wrote:
   
On Mon, 2013-04-01 at 19:51 -0700, Jeff Janes wrote:
   
 What I would probably really want is the data as it existed after the
 crash but before recovery started, but since the postmaster
 immediately starts recovery after the crash, I don't know of a good
 way to capture this.
   
Can you just turn off the restart_after_crash GUC? I had a chance to
look at this, and seeing the block before and after recovery would be
nice. I didn't see a log file in the data directory, but it didn't go
through recovery, so I assume it already did that.
   
   
   You don't know that the cluster is in the bad state until after it goes
   through recovery because most crashes recover perfectly fine.  So it would
   have to make a side-copy of the cluster after the crash, then recover the
   original and see how things go, then either retain or delete the 
   side-copy.
Unfortunately my testing harness can't do this at the moment, because the
   perl script storing the consistency info needs to survive over the crash
   and recovery.   It might take me a while to figure out how to make it do
   this.
   
   I have the server log just go to stderr, where it gets mingled together
   with messages from my testing harness.  I had not uploaded that file.
   
   Here is a new upload. It contains both a data_dir tarball including xlog,
   and the mingled stderr (do_new.out)
   
   https://drive.google.com/folderview?id=0Bzqrh1SO9FcEQmVzSjlmdWZvUHc&usp=sharing
   
   The other 3 files in it constitute the harness.  It is quite a mess,
   with
   some hard-coded paths.  The just-posted fix for wal_keep_segments will 
   also
   have to be applied.
   
   
   
   
The block is corrupt as far as I can tell. The first third is written,
and the remainder is all zeros. The header looks like this:
   
   
   Yes, that part is by my design.  Why it didn't get fixed from a FPI is not
   by my design, of course.
  
  There are no FPIs (if you mean a full page image with that) afaics:
  
  Your logs tell us about recovery:
  27692  2013-04-03 10:09:15.621 PDT:LOG:  redo starts at 1/3190
  27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data 
  checksum in record at 1/31169A68
  27692  2013-04-03 10:09:15.647 PDT:LOG:  redo done at 1/31169A38

 Looking at the page lsn's with dd I noticed something peculiar:
 
 page 0:
 01 00 00 00 18 c2 00 31 = 1/3100C218
 page 1:
 01 00 00 00 80 44 01 31 = 1/31014480
 page 10:
 01 00 00 00 60 ce 05 31 = 1/3105ce60
 page 43:
 01 00 00 00 58 7a 16 31 = 1/31167a58
 page 44:
 01 00 00 00 f0 99 16 31 = 1/311699f0
 page 45:
 00 00 00 00 00 00 00 00 = 0/0
 page 90:
 01 00 00 00 90 17 1d 32 = 1/321d1790
 page 91:
 01 00 00 00 38 ef 1b 32 = 1/321bef38
 
 So we have written out pages that are after pages without an LSN that
 have an LSN that's *beyond* the point XLOG has successfully been written
 to disk (1/31169A38)?

By now I am pretty sure the issue is that the end-of-wal is detected too
early. Above the end is detected at 1/31169A38, but the next WAL file
contains valid data:
pg_xlogdump /tmp/tmp/data2/pg_xlog/000100010032 -n 10
rmgr: Heap2   len (rec/tot): 26/58, tx:  0, lsn:
1/3230, prev 1/31D8, bkp: 0000, desc: clean: rel 1663/16384/835589; blk
122 remxid 26328926
rmgr: Heap    len (rec/tot): 51/83, tx:   26328964, lsn:
1/3270, prev 1/3230, bkp: 0000, desc: update: rel 1663/16384/835589;
tid 122/191 xmax 26328964 ; new tid 129/140 xmax 0
rmgr: Heap2   len (rec/tot): 30/62, tx:  0, lsn:
1/32C8, prev 1/3270, bkp: 0000, desc: clean: rel 1663/16384/835589; blk
63 remxid 26328803
rmgr: Btree   len (rec/tot): 34/66, tx:   26328964, lsn:
1/32000108, prev 1/32C8, bkp: 0000, desc: insert: rel 1663/16384/835590;
tid 25/14

That would explain exactly those symptoms, wouldn't it? What's causing
that - I am not sure yet.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-03 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 I'm still skeptical about the idea of a freelist. That just seems like a
 terrible point of contention. However perhaps that's because I'm picturing
 an LRU linked list. Perhaps the right thing is to maintain a pool of
 buffers in some less contention-prone data structure which lets each
 backend pick buffers out more or less independently of the others.

I think the original vision of the clock sweep algorithm included the
idea that different backends could be running the sweep over different
parts of the buffer ring concurrently.  If we could get rid of the need
to serialize that activity, it'd pretty much eliminate the bottleneck
I should think.  The problem is how to manage it to ensure that (a)
backends aren't actually contending on the same buffers as they do this,
and (b) there's a reasonably fair rate of usage_count decrements for
each buffer, rather than possibly everybody ganging up on the same area
sometimes.  Thoughts?

regards, tom lane




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Tom Lane
and...@anarazel.de (Andres Freund) writes:
 Looking at the page lsn's with dd I noticed something peculiar:

 page 0:
 01 00 00 00 18 c2 00 31 = 1/3100C218
 page 1:
 01 00 00 00 80 44 01 31 = 1/31014480
 page 10:
 01 00 00 00 60 ce 05 31 = 1/3105ce60
 page 43:
 01 00 00 00 58 7a 16 31 = 1/31167a58
 page 44:
 01 00 00 00 f0 99 16 31 = 1/311699f0
 page 45:
 00 00 00 00 00 00 00 00 = 0/0
 page 90:
 01 00 00 00 90 17 1d 32 = 1/321d1790
 page 91:
 01 00 00 00 38 ef 1b 32 = 1/321bef38

 So we have written out pages that are after pages without an LSN that
 have an LSN that's *beyond* the point XLOG has successfully been written
 to disk (1/31169A38)?

If you're looking into the FPIs, those would contain the page's older
LSN, not the one assigned by the current WAL record.

regards, tom lane




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Andres Freund
On 2013-04-03 20:45:51 -0400, Tom Lane wrote:
 and...@anarazel.de (Andres Freund) writes:
  Looking at the page lsn's with dd I noticed something peculiar:
 
  page 0:
  01 00 00 00 18 c2 00 31 = 1/3100C218
  page 1:
  01 00 00 00 80 44 01 31 = 1/31014480
  page 10:
  01 00 00 00 60 ce 05 31 = 1/3105ce60
  page 43:
  01 00 00 00 58 7a 16 31 = 1/31167a58
  page 44:
  01 00 00 00 f0 99 16 31 = 1/311699f0
  page 45:
  00 00 00 00 00 00 00 00 = 0/0
  page 90:
  01 00 00 00 90 17 1d 32 = 1/321d1790
  page 91:
  01 00 00 00 38 ef 1b 32 = 1/321bef38
 
  So we have written out pages that are after pages without an LSN that
  have an LSN that's *beyond* the point XLOG has successfully been written
  to disk (1/31169A38)?
 
 If you're looking into the FPIs, those would contain the page's older
 LSN, not the one assigned by the current WAL record.

Nope, that's from the heap, and the LSNs are *newer* than what startup
recovered to. I am pretty sure by now we are missing out on valid WAL, I
am just not sure why.

Unfortunately we can't easily diagnose what happened at:
27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data 
checksum in record at 1/31169A68
since the startup process wrote its end of recovery checkpoint there:
rmgr: XLOG    len (rec/tot): 72/   104, tx:  0, lsn:
1/31169A68, prev 1/31169A38, bkp: 0000, desc: checkpoint: redo 1/31169A68; tli
1; prev tli 1; fpw true; xid 0/26254999; oid 843781; multi 1; offset 0; oldest 
xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown

Starting from some blocks later in that WAL segment:
pg_xlogdump /tmp/tmp/data2/pg_xlog/000100010031 -s 1/3116c000 -n 10
first record is after 1/3116C000, at 1/3116D9D8, skipping over 6616 bytes
rmgr: Heap    len (rec/tot): 51/83, tx:   26254999, lsn:
1/3116D9D8, prev 1/3116BA20, bkp: 0000, desc: update: rel 1663/16384/835589;
tid 38/148 xmax 26254999 ; new tid 44/57 xmax 0
rmgr: Btree   len (rec/tot): 34/66, tx:   26254999, lsn:
1/3116DA30, prev 1/3116D9D8, bkp: 0000, desc: insert: rel 1663/16384/835590;
tid 25/319
rmgr: Heap    len (rec/tot): 51/83, tx:   26255000, lsn:
1/3116DA78, prev 1/3116DA30, bkp: 0000, desc: update: rel 1663/16384/835589;
tid 19/214 xmax 26255000 ; new tid 44/58 xmax 0

the records continue again.

Greetings,


Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




[HACKERS] Minor erratum for 9.2.4 release notes

2013-04-03 Thread Ian Lawrence Barwick
I guess the 9.2.4 release notes haven't been finalized yet; apologies
if this is already addressed, but the following sentence:

   para
Also, if you are upgrading from a version earlier than 9.2.2,
see the release notes for 9.2.2.
   /para

should read:

   para
Also, if you are upgrading from a version earlier than 9.2.3,
see the release notes for 9.2.3.
   /para


Regards


Ian Barwick




Re: [HACKERS] Interesting post-mortem on a near disaster with git

2013-04-03 Thread Jim Nasby

On 3/26/13 6:42 AM, Cédric Villemain wrote:

Le lundi 25 mars 2013 19:35:12, Daniel Farina a écrit :
  On Mon, Mar 25, 2013 at 11:07 AM, Stefan Kaltenbrunner
  ste...@kaltenbrunner.cc wrote:
   Back when we used CVS for quite a few years I kept 7 day rolling
   snapshots of the CVS repo, against just such a difficulty as this. But
   we seem to be much better organized with infrastructure these days so I
   haven't done that for a long time.

   well there is always room for improvement (and for learning from others)
   - but I agree that this proposal seems way overkill. If people think we
   should keep online delayed mirrors we certainly have the resources to
   do that on our own if we want though...

  What about rdiff-backup? I've set it up for personal use years ago
  (via the handy open source bash script backupninja) and it
  has a pretty nice no-frills point-in-time, self-expiring, file-based
  automatic backup program that works well with file synchronization
  like rsync (I rdiff-backup to one disk and rsync the entire
  rdiff-backup output to another disk). I've enjoyed using it quite a
  bit during my own personal-computer emergencies and thought the
  maintenance required from me has been zero, and I have used it from
  time to time to restore, proving it even works. Hardlinks can be used
  to tag versions of a file-directory tree recursively relatively
  compactly.

  It won't be as compact as a git-aware solution (since git tends to
  rewrite entire files, which will confuse file-based incremental
  differential backup), but the amount of data we are talking about is
  pretty small, and as far as a lowest-common-denominator tradeoff for
  use in emergencies, I have to give it a lot of praise. The main
  advantage it has here is it implements point-in-time recovery
  operations that are easy to use and actually seem to work. That said,
  I've mostly done targeted recoveries rather than trying to recover
  entire trees.

 I have the same set up, and same feedback.


I had the same setup, but got tired of how rdiff-backup behaved when a backup 
was interrupted (very lengthy cleanup process). Since then I've switched to an 
rsync setup that does essentially the same thing as rdiff-backup (uses 
hardlinks between multiple copies).

The only downside I'm aware of is that my rsync backups aren't guaranteed to be 
consistent (for however consistent a backup of an active FS would be...).
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Minor erratum for 9.2.4 release notes

2013-04-03 Thread Tom Lane
Ian Lawrence Barwick barw...@gmail.com writes:
 I guess the 9.2.4 release notes haven't been finalized yet; apologies
 if this is already addressed, but the following sentence:

para
 Also, if you are upgrading from a version earlier than 9.2.2,
 see the release notes for 9.2.2.
/para

 should read:

para
 Also, if you are upgrading from a version earlier than 9.2.3,
 see the release notes for 9.2.3.
/para

Um, no, there's no special instructions attached to 9.2.3.  The point of
that para is to thread back to the last minor release that had something
extra for you to do.

regards, tom lane




Re: [HACKERS] Minor erratum for 9.2.4 release notes

2013-04-03 Thread Ian Lawrence Barwick
2013/4/4 Tom Lane t...@sss.pgh.pa.us:
 Ian Lawrence Barwick barw...@gmail.com writes:
 I guess the 9.2.4 release notes haven't been finalized yet; apologies
 if this is already addressed, but the following sentence:

para
 Also, if you are upgrading from a version earlier than 9.2.2,
 see the release notes for 9.2.2.
/para

 should read:

para
 Also, if you are upgrading from a version earlier than 9.2.3,
 see the release notes for 9.2.3.
/para

 Um, no, there's no special instructions attached to 9.2.3.  The point of
 that para is to thread back to the last minor release that had something
 extra for you to do.

Got it, sorry for the noise

Regards

Ian Barwick




Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-03 Thread Brendan Jurd
On 4 April 2013 01:10, Tom Lane t...@sss.pgh.pa.us wrote:
 I think though that the upthread argument that we'd have multiple
 interpretations of the same thing is bogus.  To me, the core idea that's
 being suggested here is that '{}' should mean a zero-length 1-D array,
 not a zero-D array as formerly.  We would still need a way to represent
 zero-D arrays, if only because they'd still exist on-disk in existing
 databases (assuming we're not willing to break pg_upgrade for this).

Tom,

My thought was that on-disk zero-D arrays should be converted into
empty 1-D arrays (with default lower bounds of course) when they are
read by array_recv.  Any SQL operation on your zero-D arrays would
therefore resolve as though they were 1-D.  A pg_dump/restore would
result in the arrays being 1-D on the restore side.  If pg_upgrade
conserves the zero-D array in binary form, that's okay since the
receiving end will just treat it as 1-D out of array_recv anyway.

My intention was that the zero-D array could continue to live
indefinitely in binary form, but would never be observable as such by
any application code.

Cheers,
BJ




Re: [HACKERS] Interesting post-mortem on a near disaster with git

2013-04-03 Thread Daniel Farina
On Wed, Apr 3, 2013 at 6:18 PM, Jim Nasby j...@nasby.net wrote:
   What about rdiff-backup? I've set it up for personal use years ago
   (via the handy open source bash script backupninja) and it
   has a pretty nice no-frills point-in-time, self-expiring, file-based
   automatic backup program that works well with file synchronization
   like rsync (I rdiff-backup to one disk and rsync the entire
   rdiff-backup output to another disk). I've enjoyed using it quite a
   bit during my own personal-computer emergencies and thought the
   maintenance required from me has been zero, and I have used it from
   time to time to restore, proving it even works. Hardlinks can be used
   to tag versions of a file-directory tree recursively relatively
   compactly.

   It won't be as compact as a git-aware solution (since git tends to
   rewrite entire files, which will confuse file-based incremental
   differential backup), but the amount of data we are talking about is
   pretty small, and as far as a lowest-common-denominator tradeoff for
   use in emergencies, I have to give it a lot of praise. The main
   advantage it has here is it implements point-in-time recovery
   operations that are easy to use and actually seem to work. That said,
   I've mostly done targeted recoveries rather than trying to recover
   entire trees.

  I have the same set up, and same feedback.


 I had the same setup, but got tired of how rdiff-backup behaved when a
 backup was interrupted (very lengthy cleanup process). Since then I've
 switched to an rsync setup that does essentially the same thing as
 rdiff-backup (uses hardlinks between multiple copies).

 The only downside I'm aware of is that my rsync backups aren't guaranteed to
 be consistent (for however consistent a backup of an active FS would
 be...).

I forgot to add one more thing to my first mail, although it's very
important to my feeble recommendation: the problem is that blind
synchronization is a great way to propagate destruction.

rdiff-backup (but perhaps others, too) has a file/directory structure
that is, as far as I know, additive, and the pruning can be done
independently at different replicas that can have different
retention...and if done just right (I'm not sure about the case of
concurrent backups being taken) one can write a re-check that no files
are to be modified or deleted by the synchronization as a safeguard.

-- 
fdr




Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-03 Thread Ants Aasma
On Thu, Apr 4, 2013 at 3:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I think the original vision of the clock sweep algorithm included the
 idea that different backends could be running the sweep over different
 parts of the buffer ring concurrently.  If we could get rid of the need
 to serialize that activity, it'd pretty much eliminate the bottleneck
 I should think.  The problem is how to manage it to ensure that (a)
 backends aren't actually contending on the same buffers as they do this,
 and (b) there's a reasonably fair rate of usage_count decrements for
 each buffer, rather than possibly everybody ganging up on the same area
 sometimes.  Thoughts?

Fairness and avoiding each other imply that some coordination is
required. One wild idea I had is to partition buffers into N partitions
and have one clock sweep each, protected by an lwlock.

To reduce contention, the clocksweep runners use something like
sched_getcpu() to determine which partition to use to find their
buffer. Using the CPU to determine the partition makes it necessary
for the process to be scheduled out in the critical section for some
other backend to contend on the lock. And if some backend does contend
on it, it is likely to reside on the same CPU and by sleeping will
make room for the lockholder to run.

To ensure fairness for buffers, every time one of the clocksweeps
wraps around, a global offset counter is incremented. This re-assigns
all CPUs/backends to the next partition, sort of like the mad hatter's
tea party.
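
In code form the idea might look roughly like this (all names are
hypothetical, and pthread_mutex_t merely stands in for an lwlock):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define NUM_SWEEP_PARTITIONS 16

/* Each partition owns a slice of the buffer pool and its own clock
 * hand, protected by its own lock. */
typedef struct SweepPartition
{
    pthread_mutex_t lock;       /* protects nextVictim */
    int         firstBuffer;    /* first buffer id of this slice */
    int         numBuffers;     /* slice size */
    int         nextVictim;     /* clock hand, 0 .. numBuffers - 1 */
} SweepPartition;

static SweepPartition partitions[NUM_SWEEP_PARTITIONS];
static int  sweepOffset;        /* bumped whenever any hand wraps */

/*
 * Pick the partition this backend sweeps: keyed on the current CPU, so
 * contention implies the lock holder was scheduled out, and shifted by
 * sweepOffset so assignments rotate and every buffer sees a fair rate
 * of usage_count decrements.
 */
static SweepPartition *
my_sweep_partition(void)
{
    int         cpu = sched_getcpu();   /* Linux-specific */

    if (cpu < 0)                /* sched_getcpu can fail */
        cpu = 0;
    return &partitions[(cpu + sweepOffset) % NUM_SWEEP_PARTITIONS];
}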

The scenario that I'm most worried about here is what happens when a
process holding the clocksweep lock is migrated to another CPU and
then scheduled out. The processes on the original CPU will sooner or
later block behind the lock while the processes on the CPU where the
lock holder now waits keep hogging the CPU. This will create an
imbalance that the scheduler might try to correct, possibly creating a
nasty feedback loop. It could be that in practice the scenario works
out to be too far fetched to matter, but who knows.

I don't have a slightest idea yet how the background writer would
function in this environment. But if redesigning the bgwriter
mechanism was on the table I thought I would throw the idea out here.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-04-03 Thread Andres Freund
On 2013-04-04 02:58:43 +0200, Andres Freund wrote:
 On 2013-04-03 20:45:51 -0400, Tom Lane wrote:
  and...@anarazel.de (Andres Freund) writes:
   Looking at the page lsn's with dd I noticed something peculiar:
 
   page 0:
   01 00 00 00 18 c2 00 31 = 1/3100C218
   page 1:
   01 00 00 00 80 44 01 31 = 1/31014480
   page 10:
   01 00 00 00 60 ce 05 31 = 1/3105ce60
   page 43:
   01 00 00 00 58 7a 16 31 = 1/31167a58
   page 44:
   01 00 00 00 f0 99 16 31 = 1/311699f0
   page 45:
   00 00 00 00 00 00 00 00 = 0/0
   page 90:
   01 00 00 00 90 17 1d 32 = 1/321d1790
   page 91:
   01 00 00 00 38 ef 1b 32 = 1/321bef38
 
   So we have written out pages that are after pages without an LSN that
   have an LSN that's *beyond* the point XLOG has successfully been written
   to disk (1/31169A38)?
 
  If you're looking into the FPIs, those would contain the page's older
  LSN, not the one assigned by the current WAL record.

 Nope, that's from the heap, and the LSNs are *newer* than what startup
 recovered to. I am pretty sure by now we are missing out on valid WAL, I
 am just not sure why.

Ok, I think I see the bug. And I think it's been introduced in the
checksums patch.

Remember, as explained in other mails, I am pretty sure the end of wal
was erroneously detected, resulting in the following error:
27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data 
checksum in record at 1/31169A68

Note that without a *hardware* crash, end of WAL is normally found first
by the checks for wrong LSNs or wrong record lengths. Not here.

My theory is that a page that was full-page-written changed while it was
being inserted into the WAL, which caused the error. A possible scenario for
that would be e.g. two concurrent calls to MarkBufferDirtyHint().
Note:
 * 2. The caller might have only share-lock instead of exclusive-lock on the
 *buffer's content lock.

	if ((bufHdr->flags & (BM_DIRTY | BM_JUST_DIRTIED)) !=
		(BM_DIRTY | BM_JUST_DIRTIED))
	{
		if (DataChecksumsEnabled() && (bufHdr->flags & BM_PERMANENT))
		{
			MyPgXact->delayChkpt = delayChkpt = true;
			lsn = XLogSaveBufferForHint(buffer); /* PART 1 */
		}
	}

	LockBufHdr(bufHdr);
	Assert(bufHdr->refcount > 0);

	if (!(bufHdr->flags & BM_DIRTY))
	{
		dirtied = true;			/* Means will be dirtied by this action */

		/*
		 * Set the page LSN if we wrote a backup block. We aren't
		 * supposed to set this when only holding a share lock but
		 * as long as we serialise it somehow we're OK. We choose to
		 * set LSN while holding the buffer header lock, which causes
		 * any reader of an LSN who holds only a share lock to also
		 * obtain a buffer header lock before using PageGetLSN().
		 * Fortunately, that's not too many places.
		 *
		 * If checksums are enabled, you might think we should reset the
		 * checksum here. That will happen when the page is written
		 * sometime later in this checkpoint cycle.
		 */
		if (!XLogRecPtrIsInvalid(lsn))
			PageSetLSN(page, lsn); /* PART 2 */
	}

So XLogSaveBufferForHint(buffer) is called for all buffers which are not
yet dirty. Without any locks. It just does:

XLogSaveBufferForHint(Buffer buffer)
{
	rdata[0].data = (char *) (&watermark);
	rdata[0].len = sizeof(int);
	rdata[0].next = &(rdata[1]);

	rdata[1].data = NULL;
	rdata[1].len = 0;
	rdata[1].buffer = buffer;
	rdata[1].buffer_std = true;
	rdata[1].next = NULL;

	return XLogInsert(RM_XLOG_ID, XLOG_HINT, rdata);
}

I think what happens is that one backend grabs WALInsertLock first,
it computes a full page write for the buffer, including a CRC. And
returns a valid LSN. Then it continues towards the PageSetLSN() after
LockBufHdr().
All the while the second backend notices that WALInsertLock is free again
and continues towards the WALInsertLock protected parts of XLogInsert().

Only that it already has done that part:
	/*
	 * Calculate CRC of the data, including all the backup blocks
	 *
	 * Note that the record header isn't added into the CRC initially since we
	 * don't know the prev-link yet.  Thus, the CRC will represent the CRC of
	 * the whole record in the order: rdata, then backup blocks, then record
	 * header.
	 */
	INIT_CRC32(rdata_crc);
	for (rdt = rdata; rdt != NULL; rdt = rdt->next)
		COMP_CRC32(rdata_crc, rdt->data, rdt->len);

It goes on to write all the data to the 

Re: [HACKERS] Multi-pass planner

2013-04-03 Thread Greg Stark
On Fri, Aug 21, 2009 at 6:54 PM, decibel deci...@decibel.org wrote:

 Would it? Risk seems like it would just be something along the lines of
 the high-end of our estimate. I don't think confidence should be that hard
 either. IE: hard-coded guesses have a low confidence. Something pulled
 right out of most_common_vals has a high confidence. Something estimated
 via a bucket is in-between, and perhaps adjusted by the number of tuples.


I used to advocate a similar idea. But when questioned on list I tried to
work out the details and ran into some problem coming up with a concrete
plan.

How do you compare a plan that you think has a 99% chance of running in 1ms
but a 1% chance of taking 1s against a plan that has a 90% chance of 1ms
and a 10% chance of taking 100ms? Which one is actually riskier? They might
even both have the same 95% percentile run-time.
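
(For what it's worth, the expected costs come out almost identical:
0.99 * 1ms + 0.01 * 1000ms = 10.99ms for the first plan versus
0.90 * 1ms + 0.10 * 100ms = 10.9ms for the second, which is exactly why a
single risk number settles so little.)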

And additionally there are different types of unknowns. Do you want to
treat plans where we have a statistical sample that gives us a
probabilistic answer the same as plans where we think our model just has a
10% chance of being wrong? The model is going to either be consistently
right or consistently wrong for a given query but the sample will vary from
run to run. (Or vice versa depending on the situation).


-- 
greg


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-03 Thread Greg Smith

On 4/2/13 11:54 AM, Robert Haas wrote:

But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer allocations don't have
to wait for linear scans of the buffer array.


I was hoping this one would make it to a full six years of being on the 
TODO list before it came up again, missed it by a few weeks.  The 
funniest part is that Amit even submitted a patch on this theme a few 
months ago without much feedback: 
http://www.postgresql.org/message-id/6C0B27F7206C9E4CA54AE035729E9C382852FF97@szxeml509-mbs 
 That stalled where a few things have, on a) needing more regression 
test workloads, and b) wondering just what the deal with large 
shared_buffers setting degrading performance was.


I saw refactoring in this area as waiting behind it being easier to 
experiment with adding new processes, but that barrier has fallen now. 
Maybe it needs a new freelist process, maybe it doesn't, today the code 
needed to try both is relatively cheap.


The other thing that always seemed to stop me was never having a typical 
Linux system big enough to hit some of these problems available all the 
time.  What I did this week on that front was just go buy a 24 core 
server with 64GB of RAM that lives in my house.  I just need to keep it 
two floors away if I want to sleep at night.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



