[HACKERS] Re: [pgeu-general] REMINDER: FOSDEM 2012 - PostgreSQL Devroom: Call for Speakers

2011-12-19 Thread Emanuel Calvo

 Please keep in mind, that the Call for Speakers is open until December 20th.
 Only a few days left.
 Now it's a good time to submit your proposal ;-)


Did anyone apply?



-- 
--
              Emanuel Calvo
              Helpame.com



[HACKERS] ToDo: conditional ALTER TABLE

2011-12-19 Thread Pavel Stehule
Hello

I am working on quiet dumps now. I found a small issue.

pg_dump produces statements like:

ALTER TABLE ONLY public.b DROP CONSTRAINT b_fk_fkey;
ALTER TABLE ONLY public.a DROP CONSTRAINT a_pkey;

DROP TABLE IF EXISTS public.b;
DROP TABLE IF EXISTS public.a;

Currently there is no conditional ALTER. These statements must come
before the DROPs, but then they can fail when the tables are missing.

So some form like ALTER TABLE IF EXISTS ... would be useful.
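
To illustrate, a hedged sketch of what pg_dump's output could look like if
such a form existed (hypothetical syntax at this point, mirroring DROP TABLE
IF EXISTS):

    ALTER TABLE IF EXISTS ONLY public.b DROP CONSTRAINT b_fk_fkey;
    ALTER TABLE IF EXISTS ONLY public.a DROP CONSTRAINT a_pkey;

    DROP TABLE IF EXISTS public.b;
    DROP TABLE IF EXISTS public.a;

With that form, the ALTER statements could stay ahead of the DROPs without
failing when the tables are absent.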

Regards

Pavel Stehule



[HACKERS] [PATCH] Enable min/max optimization for bool_and/bool_or/every

2011-12-19 Thread Marti Raudsepp
Hi list,

As discussed on the pgsql-general list, the bool_and() and bool_or()
aggregate functions behave exactly like min() and max() would over
booleans. While it's not likely that people would have an appropriate
index on a boolean column, it seems it wouldn't cost us anything to
take advantage of this optimization, as it requires no code changes at
all, simply value changes in the pg_aggregate catalog.
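
For concreteness, here is a hedged sketch of the kind of catalog change meant
here (not the actual patch, which presumably adjusts the built-in catalog
entries; the planner's min/max rewrite keys off pg_aggregate.aggsortop, so
enabling it amounts to pointing that column at the boolean comparison
operators):

    UPDATE pg_aggregate
       SET aggsortop = '<(bool,bool)'::regoperator
     WHERE aggfnoid = 'bool_and'::regproc;

    UPDATE pg_aggregate
       SET aggsortop = '>(bool,bool)'::regoperator
     WHERE aggfnoid = 'bool_or'::regproc;

    -- 'every' is an alias for bool_and and would get the same treatment.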

Before:
db=# explain analyze select bool_and(b) from bools;
 Aggregate  (cost=1693.01..1693.02 rows=1 width=1)
   ->  Seq Scan on bools  (cost=0.00..1443.01 rows=11 width=1)
 Total runtime: 29.736 ms

After:
db=# explain analyze select bool_and(b) from bools;
 Result  (cost=0.03..0.04 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..0.03 rows=1 width=1)
           ->  Index Scan using bools_b_idx on bools  (cost=0.00..3300.28 rows=11 width=1)
                 Index Cond: (b IS NOT NULL)
 Total runtime: 0.109 ms

Original discussion here:
http://archives.postgresql.org/message-id/CABRT9RAGwQEP+EFhVpZ6=b4cjecue2-qcpb_zdrnpgqna8x...@mail.gmail.com

PS: It seems that the min/max optimization isn't documented in the
manual (apart from release notes), so I didn't include any doc changes
in this patch.

Regards,
Marti



Re: [HACKERS] JSON for PG 9.2

2011-12-19 Thread Dimitri Fontaine
Merlin Moncure mmonc...@gmail.com writes:
 Getting back on point, I'm curious about your statement: "without
 writing a single line of C."  I took a look at the pl/scheme docs and
 was pretty impressed -- what exactly would be involved to get a
 guile-based ECMAscript working over the pl/scheme implementation?  How
 would that interact exactly with the stated topic -- JSON support?  Do
 you even need a json type if you have strong library based parsing and
 composition features?

My understanding is that JSON is a subset of ECMAscript, so if you get
the latter you already have the former.  Now, someone would have to
check if plscheme still builds with guile-2.0, and given that, how
exactly you get pl/ecmascript (or pl/js) out of that.

I won't be in a position to spend time on that this year…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Page Checksums

2011-12-19 Thread Simon Riggs
On Mon, Dec 19, 2011 at 4:21 AM, Josh Berkus j...@agliodbs.com wrote:
 On 12/18/11 5:55 PM, Greg Stark wrote:
 There is another way to look at this problem. Perhaps it's worth
 having a checksum *even if* there are ways for the checksum to be
 spuriously wrong. Obviously having an invalid checksum can't be a
 fatal error then but it might still be useful information. Right now
 people don't really know if their system can experience torn pages or
 not and having some way of detecting them could be useful. And if you
 have other unexplained symptoms then having checksum errors might be
 enough evidence that the investigation should start with the hardware
 and get the sysadmin looking at hardware logs and running memtest
 sooner.

 Frankly, if I had torn pages, even if it was just hint bits missing, I
 would want that to be logged.  That's expected if you crash, but if you
 start seeing bad CRC warnings when you haven't had a crash?  That means
 you have a HW problem.

 As long as the CRC checks are by default warnings, then I don't see a
 problem with this; it's certainly better than what we have now.

It is an important problem, and also a big one, which is why it still exists.

Throwing WARNINGs for normal events would not help anybody; thousands
of false positives would just make Postgres appear to be less robust
than it really is. That would be a credibility disaster. VMWare
already have their own distro, so if they like this patch they can use
it.

The only sensible way to handle this is to change the page format as
discussed. IMHO the only sensible way that can happen is if we also
support an online upgrade feature. I will take on the online upgrade
feature if others work on the page format issues, but none of this is
possible for 9.2, ISTM.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Page Checksums

2011-12-19 Thread Andres Freund
On Monday, December 19, 2011 12:10:11 PM Simon Riggs wrote:
 The only sensible way to handle this is to change the page format as
 discussed. IMHO the only sensible way that can happen is if we also
 support an online upgrade feature. I will take on the online upgrade
 feature if others work on the page format issues, but none of this is
 possible for 9.2, ISTM.
Totally with you that it's not 9.2 material. But I think if somebody actually
wants to implement that, that person would need to start discussing and
implementing rather soon if it is to be ready for 9.3. Just because it's not
geared towards the next release doesn't mean it's off-topic.


Andres



Re: [HACKERS] Page Checksums

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 6:10 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Throwing WARNINGs for normal events would not help anybody; thousands
 of false positives would just make Postgres appear to be less robust
 than it really is. That would be a credibility disaster. VMWare
 already have their own distro, so if they like this patch they can use
 it.

Agreed on all counts.

It seems to me that it would be possible to plug this hole by keeping
track of which pages in shared_buffers have had unlogged changes to
them since the last FPI.  When you go to evict such a page, you write
some kind of WAL record for it - either an FPI, or maybe a partial
page image containing just the parts that might have been changed
(like all the tuple headers, or whatever).  This would be expensive,
of course.

 The only sensible way to handle this is to change the page format as
 discussed. IMHO the only sensible way that can happen is if we also
 support an online upgrade feature. I will take on the online upgrade
 feature if others work on the page format issues, but none of this is
 possible for 9.2, ISTM.

I'm not sure that I understand the dividing line you are drawing here.
 However, with respect to the implementation of this particular
feature, it would be nice if we could arrange things so that space
cost of the feature need only be paid by people who are using it.  I
think it would be regrettable if everyone had to give up 4 bytes per
page because some people want checksums.  Maybe I'll feel differently
if it turns out that the overhead of turning on checksumming is
modest, but that's not what I'm expecting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] why do we need create tuplestore for each fetch?

2011-12-19 Thread Robert Haas
On Thu, Dec 15, 2011 at 8:30 AM, 高增琦 pgf...@gmail.com wrote:
 I found this several days ago when I tried to debug a fetch from a cursor.
 And I had sent a mail to this list, but no one replied...
 Maybe this is a very simple problem, please help me, thanks a lot...

 Here is the example:
     create table t (a int);
     insert into t values (1),(3),(5),(7),(9);
     insert into t select a+1 from t;
     begin;
     declare c cursor for select * from t order by a;
     fetch 3 in c;
     fetch 3 in c;
     fetch 3 in c;

 In 'PortalRun', a fetch stmt will be treated with PORTAL_UTIL_SELECT,
 and then a tuplestore will be created in 'FillPortalStore' in the
 fetch stmt's portal.

 In 'FillPortalStore', all results will be stored in that tuplestore.
 Then we go back to 'PortalRun'; next, 'PortalRunSelect' will send these
 results to the client...

 My problem is: why do we need to create that tuplestore as
 intermediate storage? Why don't we just send these results to the
 client directly?

Good question.  I wouldn't expect it to matter very much for a
three-row fetch, but maybe it does for larger ones?  What is your
motivation for investigating this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Robert Haas
On Fri, Dec 16, 2011 at 2:06 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yeah.  Nikhil, Alex, this is the merged patch.  Have a look that it
 still works for you (particularly the pg_dump bits) and I'll commit it.
 I adjusted the regression test a bit too.

It seems hard to believe that ATExecDropConstraint() doesn't need any
adjustment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] archive_keepalive_command

2011-12-19 Thread Robert Haas
On Fri, Dec 16, 2011 at 10:01 AM, Simon Riggs si...@2ndquadrant.com wrote:
 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.

This may be stupid of me, but I don't see the point of this.  If you
want keepalives, why use log shipping rather than SR?  Implementing a
really-high-latency method of passing protocol messages through the
archive seems like a complex solution to a non-problem (but, like I
say, I may be missing something).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] archive_keepalive_command

2011-12-19 Thread Greg Smith

On 12/19/2011 08:17 AM, Robert Haas wrote:

If you want keepalives, why use log shipping rather than SR?  Implementing a
really-high-latency method of passing protocol messages through the
archive seems like a complex solution to a non-problem


The problem being addressed is how can people using archiving compute 
time-based lag usefully?  Thinking about an answer to that question 
that made sense for SR drove us toward keepalive timestamp sharing.  
This is trying to introduce a mechanism good enough to do the same thing 
for regular archive recovery.


In the archiving case, the worst case waiting to trip you up is always 
the one where not enough activity happened to generate a new WAL file 
yet.  If people want lag to move correctly in that case anyway, a 
message needs to be transferred from archiver to recovery system.  Simon 
is suggesting that we do that via shipping a new small file in that 
case, rather than trying to muck with putting it into the WAL data or 
something like that.  It's a bit hackish, but a) no more hackish than 
people are used to for PITR, and b) in a way that avoids touching 
database code in the critical path for SR.


This idea might eliminate the last of the reasons I was speculating on 
for adding more timestamps into the WAL stream.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Page Checksums

2011-12-19 Thread Stephen Frost
* Aidan Van Dyk (ai...@highrise.ca) wrote:
 But the scary part is you don't know how long *ago* the crash was.
 Because a hint-bit-only change w/ a torn-page is a non event in
 PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
 and scrub every page in the database.

Fair enough, but, could we distinguish these two cases?  In other words,
would it be possible to detect if a page was torn due to a 'traditional'
crash and not complain in that case, but complain if there's a CRC
failure and it *doesn't* look like a torn page?

Perhaps that's a stretch, but if we can figure out that a page is torn
already, then perhaps it's not so far fetched..

Thanks,

Stephen
(who is no expert on WAL/torn pages/etc)




Re: [HACKERS] Page Checksums

2011-12-19 Thread Stephen Frost
* Aidan Van Dyk (ai...@highrise.ca) wrote:
 #) Anybody investigated putting the CRC in a relation fork, but not
 right in the data block?  If the CRC contains a timestamp, and is WAL
 logged before the write, at least on reading a block with a wrong
 checksum, if a warning is emitted, the timestamp could be looked at by
 whoever is reading the warning and know that the block was written
 shortly before the crash $X $PERIODS ago

I do like the idea of putting the CRC info in a relation fork, if it can
be made to work decently, as we might be able to then support it on a
per-relation basis, and maybe even avoid the on-disk format change..

Of course, I'm sure there's all kinds of problems with that approach,
but it might be worth some thinking about.

Thanks,

Stephen




Re: [HACKERS] Command Triggers

2011-12-19 Thread Greg Smith

On 12/18/2011 09:02 PM, Robert Haas wrote:

If we're actually going to expose the parse tree, I think JSON
(or even XML) would be a far better way to expose that than the
existing nodeToString() output.  Sure, you could make due with the
nodeToString() output for some things, especially in PL/perl or
PL/python.  But JSON would be far better, since it's a standard format
rather than something we just made up, and could be used in PL/pgsql
as well, given proper support functions.


That seems pretty sensible; I wouldn't want to make it a hard 
requirement though.  There are three major ways this could go for 9.2:


1) Nothing is changed in core, the best that can be done is whatever you 
can cram into an extension.


2) 9.2 is released with Command Triggers using nodeToString() output as 
the detailed description.  That part is labeled experimental and the 
API is expected to change completely in the next release


3) 9.2 gets enough JSON support to also have an early nodeToJSON API 
instead.  Now it's labeled early release and some API changes may be 
needed in future releases.


From the perspective of enabling a developer community to spring up 
around working in this area, I don't see a huge difference between (2) 
and (3).  We're going in a direction I don't think is very well explored 
yet, by anyone.  The idea that we're going to learn enough to accumulate 
a comprehensive, stable API before release is rather optimistic.  
There's something to be said for Dimitri's suggested approach from the 
perspective of ship it knowing it works for features A, B, then let's 
see what else the users think to do with it.  We can't determine what 
feature complete looks like from what other people are doing anymore; 
the only way to find out is to see what the world at large thinks of it after 
release.


Think of it as being like the EXPLAIN plan output.  That shipped as just 
text, programs popped up to parse it anyway, they suffered some breaks 
with each new release.  That was still a major win.  Then, new easier to 
parse formats appeared, making the bar to entry for writing such new 
tools much lower.  And the construction of a better API for output 
benefited from seeing what people had actually been struggling with 
before then.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Page Checksums

2011-12-19 Thread Alvaro Herrera

Excerpts from Stephen Frost's message of Mon Dec 19 11:18:21 -0300 2011:
 * Aidan Van Dyk (ai...@highrise.ca) wrote:
  #) Anybody investigated putting the CRC in a relation fork, but not
  right in the data block?  If the CRC contains a timestamp, and is WAL
  logged before the write, at least on reading a block with a wrong
  checksum, if a warning is emitted, the timestamp could be looked at by
  whoever is reading the warning and know that the block was written
  shortly before the crash $X $PERIODS ago
 
 I do like the idea of putting the CRC info in a relation fork, if it can
 be made to work decently, as we might be able to then support it on a
 per-relation basis, and maybe even avoid the on-disk format change..
 
 Of course, I'm sure there's all kinds of problems with that approach,
 but it might be worth some thinking about.

I think the main objection to that idea was that if you lose a single
page of CRCs you have hundreds of data pages which no longer have good
CRCs.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Page Checksums

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote:
 * Aidan Van Dyk (ai...@highrise.ca) wrote:
 But the scary part is you don't know how long *ago* the crash was.
 Because a hint-bit-only change w/ a torn-page is a non event in
 PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
 and scrub every page in the database.

 Fair enough, but, could we distinguish these two cases?  In other words,
 would it be possible to detect if a page was torn due to a 'traditional'
 crash and not complain in that case, but complain if there's a CRC
 failure and it *doesn't* look like a torn page?

No.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2011-12-19 Thread Alvaro Herrera

Excerpts from Dimitri Fontaine's message of Sun Dec 18 16:54:11 -0300 2011:
 
 Tom Lane t...@sss.pgh.pa.us writes:
  Hmm ... I don't think that I *am* ok with that.  ISTM that we'd then
  find ourselves with any changes in utility statement parse trees
  amounting to a user-visible API break, and that's not an acceptable
  situation.
 
 Oh, you mean like exposing the parser for syntax coloring etc.  I failed
 to see it's the same case.  Do we have an acceptable proposal on that
 front yet?

The conclusion that was reached during developer's meeting was that
those interested should use a technique similar to the one used by the
ecpg parser, namely use some sort of tool to transform the gram.y source
code into something else (different productions).  That idea is not
useful to you here, I'm afraid.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Patch to allow users to kill their own queries

2011-12-19 Thread Greg Smith

On 12/18/2011 07:31 AM, Magnus Hagander wrote:

* I restructured the if statements, because I had a hard time
following the comments around that ;) I find this one easier - but I'm
happy to change back if you think your version was more readable.


That looks fine.  I highlighted this because I had a feeling there was 
still some gain to be had here, just didn't see it myself.  This works.



* The error message in pg_signal_backend breaks the abstraction,
because it specifically talks about terminating the other backend -
when it's not supposed to know about that in that function. I think we
either need to get rid of the hint completely, or we need to find a
way to issue it from the caller. Or pass it as a parameter. It's fine
for now since we only have two signals, but we might have more in the
future..


I feel that including a hint in the pg_terminate_backend case is a UI 
requirement.  If someone has made it as far as discovering that function 
exists, tries calling it, and it fails, the friendly thing to do is 
point them toward a direction that might work better.  Little things 
like that make a huge difference in how friendly the software appears to 
its users; this is even more true in cases where version improvements 
actually expand what can and cannot be done.


My quick and dirty side thinks that just documenting the potential 
future issues would be enough:


Due to the limited number of callers of this function, the hint message 
here can be certain that pg_terminate_backend provides the only path to 
reach this point.  If more callers to pg_signal_backend appear, a more 
generic hint mechanism might be needed here.


If you must have this more generic mechanism available, I would accept 
re-factoring to provide it instead.  What I wouldn't want to live with 
is a commit of this where the hint goes away completely.  It's taken a 
long time chopping the specification to get this feature sorted out; we 
might as well make what's left be the best it can be now.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] review: CHECK FUNCTION statement

2011-12-19 Thread Greg Smith

On 12/17/2011 04:00 PM, Pavel Stehule wrote:

 I use it for checking my largest plpgsql project - it is about
 300KB of plpgsql procedures - but this code is not free - and this module
 helps to find a lot of bugs.


Great.  If you continue to check against that regularly, that makes me 
feel better.  I was guessing you had a large body of such source code 
around, and knowing it executed correctly against all of it improves my 
confidence here.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] pgstat wait timeout

2011-12-19 Thread pratikchirania
OS: I am using Windows server 2003

Version upgrade: hmm.. Is there any fix/change related to this issue in
9.0.6?
If yes, I will upgrade during the next scheduled downtime (I am using this as a
production server)...

postgres queries are very occasionally used (a set of calls once in 30
minutes).. so I guess I am not calling my DB component heavily.




Re: [HACKERS] RangeVarGetRelid()

2011-12-19 Thread Robert Haas
On Fri, Dec 16, 2011 at 10:32 AM, Noah Misch n...@leadboat.com wrote:
 I agree, but my point is that so far we have no callbacks that differ
 only in that detail.  I accept that we'd probably want to avoid that.

 To illustrate what I had in mind, here's a version of your patch that has five
 callers sharing a callback.  The patch is against d039fd51f79e, just prior to
 your recent commits.

I don't think RenameRelation() ought to wait until we've taken the
lock before doing this:

+   aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(), ACL_CREATE);
+   if (aclresult != ACLCHECK_OK)
+   aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
+  get_namespace_name(namespaceId));

I'm not sure how we can distinguished in a principled way between
permissions checks that must be conducted before locking the relation
and those that can be done afterwards.  And I don't really want to
make that distinction, anyhow.

But I see your point, otherwise.  What I'm tempted to do is define a
new function RangeVarGetRelidCheckOwner(RangeVar *relation, LOCKMODE
lockmode, bool system_tables_ok) that will arrange to look up the OID,
check that you own it, optionally check that it's not a system table,
and acquire the specified lock.  Internally that will call
RangeVarGetRelidExtended() with a callback; the callback arg can be
used to pass through the system_tables_ok flag.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] pg_upgrade with plpython is broken

2011-12-19 Thread Peter Eisentraut
Upgrading an instance containing plpython from <= 8.4 to >= 9.0 is broken
because the module plpython.so was renamed to plpython2.so, and so the
pg_upgrade check for loadable libraries fails thus:

Your installation references loadable libraries that are missing from the
new installation.  etc.

Installing a symlink fixes the issue.  Should we teach pg_upgrade about
this renaming, or should we install the symlink as part of the standard
installation?




[HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
I recently upgraded my postgres server from 9.0 to 9.1.2 and I am
finding a peculiar problem. I have a program that periodically adds rows
to a table using INSERT. Typically the number of rows is just 1-2
thousand when the table already has 500K rows. Whenever the program is
adding rows, the performance of the search query on the same table is
very bad. The query uses the gin index and the tsearch ranking function
ts_rank_cd. 
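
For context, a hedged reconstruction of the kind of search query described
(the table, column, and index names are hypothetical, not taken from the
report):

    CREATE INDEX docs_tsv_idx ON docs USING gin (tsv);

    SELECT id, ts_rank_cd(tsv, query) AS rank
      FROM docs, to_tsquery('english', 'example & search') AS query
     WHERE tsv @@ query
     ORDER BY rank DESC
     LIMIT 10;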


This never happened earlier with postgres 9.0. Is there a known issue
with Postgres 9.1? Or how should I report this problem?

-Sushant.




Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Euler Taveira de Oliveira
On 19-12-2011 12:30, Sushant Sinha wrote:
 I recently upgraded my postgres server from 9.0 to 9.1.2 and I am
 finding a peculiar problem.I have a program that periodically adds rows
 to this table using INSERT. Typically the number of rows is just 1-2
 thousand when the table already has 500K rows. Whenever the program is
 adding rows, the performance of the search query on the same table is
 very bad. The query uses the gin index and the tsearch ranking function
 ts_rank_cd. 
 
How bad is bad? It seems you are suffering from a doesn't-fit-in-cache problem, no?

 This never happened earlier with postgres 9.0 Is there a known issue
 with Postgres 9.1? Or how to report this problem?
 
Test case? Query times? Query plans? Are you sure the compile options are
the same? What about the configuration parameters? What is the exact version
of both installations?


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] review: CHECK FUNCTION statement

2011-12-19 Thread Pavel Stehule
2011/12/19 Greg Smith g...@2ndquadrant.com:
 On 12/17/2011 04:00 PM, Pavel Stehule wrote:

 I use it for checking of my most large plpgsql project - it is about
 300KB plpgsql procedures - but this code is not free - and this module
 helps to find lot of bugs.


 Great.  If you continue to check against that regularly, that makes me feel
 better.  I was guessing you had a large body of such source code around, and
 knowing it executed correctly against all of it improves my confidence here.


I am not alone.

A subset is used in plpgsql_lint, and I know about some commercial
parties that use it too.

https://github.com/okbob/plpgsql_lint

But the code in CHECK FUNCTION is a little newer. It could be interesting
to test some code written by people with a background in other databases,
because they use different patterns.

I don't use explicit cursors, for example - on the other hand, I used
exceptions intensively in my last project. We could ask the people from
LedgerSMB about testing if somebody has a contact.


Regards

Pavel




 --
 Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
 PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Marti Raudsepp
On Sun, Dec 18, 2011 at 10:28, Gianni Ciolli
gianni.cio...@2ndquadrant.it wrote:
  http://wiki.postgresql.org/wiki/Autonomous_subtransactions

 It is meant to be an ongoing project, requesting comments and
 contributions, rather than a conclusive document.

In addition to what Jim Nasby said, this proposal seems a bit
inflexible. In particular:
1. It limits us to exactly 2 autonomous transactions at any time (the
main one and the subtransaction).

2. There's no reason why two autonomous transactions should have a
main / sub relationship. They are autonomous -- they should not
depend on the state of the outer transaction.

Now, the initial implementation may well have such limits, but I think
you should design your proposal to accommodate the above two features
in the future without having to redesign the syntax.

Maybe look at SAVEPOINTs for inspiration. There can be multiple
savepoints in a single transaction, and they can be released/rolled
back to, at any time.
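
For reference, a minimal sketch of that existing SAVEPOINT behaviour (plain
SQL, nothing new):

    BEGIN;
    INSERT INTO t VALUES (1);
    SAVEPOINT a;
    INSERT INTO t VALUES (2);
    SAVEPOINT b;
    INSERT INTO t VALUES (3);
    ROLLBACK TO SAVEPOINT a;  -- discards the work done after "a" (rows 2 and 3)
    COMMIT;                   -- only the row inserted before "a" is kept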

Regards,
Marti



Re: [HACKERS] Command Triggers

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith g...@2ndquadrant.com wrote:
 That seems pretty sensible; I wouldn't want to make it a hard requirement
 though.  There are three major ways this could go for 9.2:

 1) Nothing is changed in core, the best that can be done is whatever you can
 cram into an extension.

 2) 9.2 is released with Command Triggers using nodeToString() output as the
 detailed description.  That part is labeled experimental and the API is
 expected to change completely in the next release

 3) 9.2 gets enough JSON support to also have an early nodeToJSON API
 instead.  Now it's labeled early release and some API changes may be needed
 in future releases.

 From the perspective of enabling a developer community to spring up around
 working in this area, I don't see a huge difference between (2) and (3).

I do.  Anyone coding in PL/pgsql is going to find the nodeToString()
output unusable, and we can easily provide a built-in JSON datatype
with enough richness to make that problem go away in time for 9.2.
People in PL/python and PL/perl may be a bit better off, but I see no
reason to ship something for 9.2 and then break it for 9.3 when we
could perfectly well make that compatibility break before the release.
 (And, in case it's not clear, yes, I am volunteering to do the work,
if it comes down to that.)

But before we get bogged down in the data representation, I think we
need to make a more fundamental decision: to what extent are we OK
with exporting the parse tree at all, in any form?  Tom is arguing
that we shouldn't, and I see his point: the recent DROP command rework
would have broken everybody's command triggers if we had adopted this
proposal, and that would be a real shame, because I don't particularly
like the idea that we can't continue to improve the code and refactor
things because someone out there might be depending on an older and
less well-considered behavior.

On the flip side, I don't really see any other way to make this
feature work at all.  I think that AFTER CREATE triggers and BEFORE
DROP triggers could potentially be implemented by just passing OIDs
in, and that might be useful enough for many people.  But what about
ALTER?  I don't see that you're going to be able to write any sort of
meaningful triggers around ALTER without passing at least some of the
parse tree information to the trigger function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Command Triggers

2011-12-19 Thread Andres Freund
Hi,

On Monday, December 19, 2011 05:12:13 PM Robert Haas wrote:
 On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith g...@2ndquadrant.com wrote:
  That seems pretty sensible; I wouldn't want to make it a hard requirement
  though.  There are three major ways this could go for 9.2:
  
  1) Nothing is changed in core, the best that can be done is whatever you
  can cram into an extension.
  
  2) 9.2 is released with Command Triggers using nodeToString() output as
  the detailed description.  That part is labeled experimental and the
  API is expected to change completely in the next release
  
  3) 9.2 gets enough JSON support to also have an early nodeToJSON API
  instead.  Now it's labeled early release and some API changes may be
  needed in future releases.
  
  From the perspective of enabling a developer community to spring up
  around working in this area, I don't see a huge difference between (2)
  and (3).
 
 I do.  Anyone coding in PL/pgsql is going to find the nodeToString()
 output unusable, and we can easily provide a built-in JSON datatype
 with enough richness to make that problem go away in time for 9.2.
 People in PL/python and PL/perl may be a bit better off, but I see no
 reason to ship something for 9.2 and then break it for 9.3 when we
 could perfectly well make that compatibility break before the release.
  (And, in case it's not clear, yes, I am volunteering to do the work,
 if it comes down to that.)
I personally think you're underestimating the complexity of providing a
sensible JSON compatibility shim on top of the nodeToString representation. But I
would like to be proven wrong ;)
What's your idea?

 But before we get bogged down in the data representation, I think we
 need to make a more fundamental decision: to what extent are we OK
 with exporting the parse tree at all, in any form?  Tom is arguing
 that we shouldn't, and I see his point: the recent DROP command rework
 would have broken everybody's command triggers if we had adopted this
 proposal, and that would be a real shame, because I don't particularly
 like the idea that we can't continue to improve the code and refactor
 things because someone out there might be depending on an older and
 less well-considered behavior.
I don't see any realistic way to present the data in a way that's abstracted from
the internals for now. The infrastructure is completely new and we don't
really know what it will be used for. I personally have no problem requiring
that any nontrivial node-string access has to be done via C functions for now.
Building a completely new form of parse tree seems way too much effort/flamebait
for me.

Andres




Re: [HACKERS] Command Triggers

2011-12-19 Thread Pavel Stehule
2011/12/19 Robert Haas robertmh...@gmail.com:
 On Mon, Dec 19, 2011 at 9:31 AM, Greg Smith g...@2ndquadrant.com wrote:
 That seems pretty sensible; I wouldn't want to make it a hard requirement
 though.  There are three major ways this could go for 9.2:

 1) Nothing is changed in core, the best that can be done is whatever you can
 cram into an extension.

 2) 9.2 is released with Command Triggers using nodeToString() output as the
 detailed description.  That part is labeled experimental and the API is
 expected to change completely in the next release

 3) 9.2 gets enough JSON support to also have an early nodeToJSON API
 instead.  Now it's labeled early release and some API changes may be needed
 in future releases.

 From the perspective of enabling a developer community to spring up around
 working in this area, I don't see a huge difference between (2) and (3).

 I do.  Anyone coding in PL/pgsql is going to find the nodeToString()
 output unusable, and we can easily provide a built-in JSON datatype
 with enough richness to make that problem go away in time for 9.2.
 People in PL/python and PL/perl may be a bit better off, but I see no
 reason to ship something for 9.2 and then break it for 9.3 when we
 could perfectly well make that compatibility break before the release.
  (And, in case it's not clear, yes, I am volunteering to do the work,
 if it comes down to that.)

Absolutely.

Parsing some document is not a good approach for plpgsql. We could enhance
the diagnostics statement for trigger values instead.

There could be a lot of variables, and it is cheap because we only fetch the
content when it is requested:

STATEMENT_NAME,
OBJECT_SCHEMA,
OBJECT_NAME,
OBJECT_TYPE,
OBJECT_OID,
ATTRIBUT_NAME,
ATTRIBUT_OID,
ATTRIBUT_TYPE,
STATEMENT_PARAMETERS_ARRAY

plpgsql is not good for recursive data - it would be nice if all necessary
data could be flat.

Regards

Pavel


 But before we get bogged down in the data representation, I think we
 need to make a more fundamental decision: to what extent are we OK
 with exporting the parse tree at all, in any form?  Tom is arguing
 that we shouldn't, and I see his point: the recent DROP command rework
 would have broken everybody's command triggers if we had adopted this
 proposal, and that would be a real shame, because I don't particularly
 like the idea that we can't continue to improve the code and refactor
 things because someone out there might be depending on an older and
 less well-considered behavior.

 On the flip side, I don't really see any other way to make this
 feature work at all.  I think that AFTER CREATE triggers and BEFORE
 DROP triggers could potentially be implemented by just passing OIDs
 in, and that might be useful enough for many people.  But what about
 ALTER?  I don't see that you're going to be able to write any sort of
 meaningful triggers around ALTER without passing at least some of the
 parse tree information to the trigger function.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] Command Triggers

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 11:20 AM, Andres Freund and...@anarazel.de wrote:
 I do.  Anyone coding in PL/pgsql is going to find the nodeToString()
 output unusable, and we can easily provide a built-in JSON datatype
 with enough richness to make that problem go away in time for 9.2.
 People in PL/python and PL/perl may be a bit better off, but I see no
 reason to ship something for 9.2 and then break it for 9.3 when we
 could perfectly well make that compatibility break before the release.
  (And, in case it's not clear, yes, I am volunteering to do the work,
 if it comes down to that.)
 I personally think youre underestimating the complexity of providing a
 sensible json compatibility shim ontop the nodestring representation. But I
 would like to be proven wrong ;)
 Whats your idea?

I haven't gotten that far down into the minutiae yet.  :-)

But the reason I think it won't be too bad is because the current
representation is darn close to JSON already:

{VAR :varno 2 :varattno 1 :vartype 25 :vartypmod -1 :varcollid 100
:varlevelsup 0 :varnoold 2 :varoattno 1 :location 9998}

=>

{"_": "VAR", "varno": 2, "varattno": 1, "vartype": 25, "vartypmod": -1, "varcollid": 100, "varlevelsup": 0, "varnoold": 2, "varoattno": 1, "location": 9998}


If you've got something like:

{OPEXPR :opno 98 :opfuncid 67 :opresulttype 16 :opretset false
:opcollid 0 :inputcollid 100 :args ({VAR :varno 2 :varattno 1 :vartype
25 :vartypmod -1 :varcollid 100 :varlevelsup 0 :varnoold 2 :varoattno
1 :location 9998} {VAR :varno 1 :varattno 1 :vartype 25 :vartypmod -1
:varcollid 100 :varlevelsup 0 :varnoold 1 :varoattno 1 :location
10009}) :location 10007}

...then the value for the args label will just be an object rather
than an integer.

 But before we get bogged down in the data representation, I think we
 need to make a more fundamental decision: to what extent are we OK
 with exporting the parse tree at all, in any form?  Tom is arguing
 that we shouldn't, and I see his point: the recent DROP command rework
 would have broken everybody's command triggers if we had adopted this
 proposal, and that would be a real shame, because I don't particularly
 like the idea that we can't continue to improve the code and refactor
 things because someone out there might be depending on an older and
 less well-considered behavior.
 I don't see any realistic way to present the data in way thats abstracted from
 the internals for now. The infrastructure is completely new and we don't
 really know what it will be used for.

That's my feeling as well, but I'm hoping Tom or someone else has a better idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgstat wait timeout

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 10:02 AM, pratikchirania pratik.chira...@hp.com wrote:
 Version upgrade: hmm.. Is there any fix/change related to this issue in
 9.0.6?

You could read the release notes for those minor version upgrades.

Based on a quick look through the commit logs, and a quick grep of
release-9-0.sgml, I don't think so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Escaping : in .pgpass - code or docs bug?

2011-12-19 Thread Robert Haas
On Sat, Dec 17, 2011 at 3:27 AM, Ross Reedstrom reeds...@rice.edu wrote:
 On Fri, Dec 16, 2011 at 02:55:09PM +, Richard Huxton wrote:
 According to the docs [1], you should escape embedded colons in
 .pgpass (fair enough). Below is PG 9.1.1

 user = te:st, db = te:st, password = te:st

     $ cat ~/.pgpass
     *:*:te:st:te:st:te:st
     $ psql91 -U te:st -d te:st
     te:st=

     $ cat ~/.pgpass
     *:*:te\:st:te\:st:te:st
     $ psql91 -U te:st -d te:st
     te:st=

     $ cat ~/.pgpass
     *:*:te\:st:te\:st:te\:st
     $ psql91 -U te:st -d te:st
     psql: FATAL:  password authentication failed for user te:st
     password retrieved from file /home/richardh/.pgpass

 I'm a bit puzzled how it manages without the escaping in the first
 case. There's a lack of consistency though that either needs
 documenting or fixing.

 Hmm, seems the code in fe-connect.c that reads the password out of .pgpass 
 does this:

    if ((t = pwdfMatchesString(t, hostname)) == NULL ||
                        (t = pwdfMatchesString(t, port)) == NULL ||
                        (t = pwdfMatchesString(t, dbname)) == NULL ||
                        (t = pwdfMatchesString(t, username)) == NULL)
  [...]

 pwdfMatchesString 'eats' the stringbuffer until the next unmatched character 
 or
 unescaped colon.  If it falls out the bottom of that, the rest of the line is
 returned as the candidate password.

 Since the code that does the backslash detection is in pwdfMatchesString(), 
 and
 the password never goes through that function, the escapes are not cleaned up.

 This should either be fixed by changing the documentation to say to not escape
 colons or backslashes in the password part, only, or modify this function
 (PasswordFromFile) to silently unescape the password string. It already copies
 it.

My vote is for a doc correction in the back-branches and a behavior
change in master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RangeVarGetRelid()

2011-12-19 Thread Noah Misch
On Mon, Dec 19, 2011 at 10:11:23AM -0500, Robert Haas wrote:
 On Fri, Dec 16, 2011 at 10:32 AM, Noah Misch n...@leadboat.com wrote:
  I agree, but my point is that so far we have no callbacks that differ
  only in that detail.  I accept that we'd probably want to avoid that.
 
  To illustrate what I had in mind, here's a version of your patch that has five
  callers sharing a callback.  The patch is against d039fd51f79e, just prior to
  your recent commits.
 
 I don't think RenameRelation() ought to wait until we've taken the
 lock before doing this:
 
  +   aclresult = pg_namespace_aclcheck(namespaceId, GetUserId(), ACL_CREATE);
  +   if (aclresult != ACLCHECK_OK)
  +       aclcheck_error(aclresult, ACL_KIND_NAMESPACE,
  +                      get_namespace_name(namespaceId));
 
 I'm not sure how we can distinguished in a principled way between
 permissions checks that must be conducted before locking the relation
 and those that can be done afterwards.  And I don't really want to
 make that distinction, anyhow.

Here's the rule I used: the pre-lock checks must prove that the user could, by
some command working as-designed, have taken a lock at least as strong as the
one we're about to acquire.  How does that work out in practice?  Relation
owners can always take AccessExclusiveLock by way of a DROP command, so
ownership is always sufficient as a pre-lock test.

The above namespace check is no exception; the object owner has authority to
take the AccessExclusiveLock independent of his authority to ultimately
complete the rename.  (Changes arising from the recent discussions about
locking namespaces during DDL would complicate the situation in other ways.)

 But I see your point, otherwise.  What I'm tempted to do is define a
 new function RangeVarGetRelidCheckOwner(RangeVar *relation, LOCKMODE
 lockmode, bool system_tables_ok) that will arrange to look up the OID,
 check that you own it, optionally check that it's not a system table,
 and acquire the specified lock.  Internally that will call
 RangeVarGetRelidExtended() with a callback; the callback arg can be
 used to pass through the system_tables_ok flag.

Is the system catalog check special?

Thanks,
nm



Re: [HACKERS] array behavior

2011-12-19 Thread Marti Raudsepp
On Fri, Dec 16, 2011 at 00:15, amit sehas cu...@yahoo.com wrote:
 There is the TOAST mechanism for oversized tuples but that is still
 considered to be a single tuple. Is there any circumstance in which an
 attribute which is an array will be broken up into individual
 tuples which are somehow associated with the main tuple.

I think you've misunderstood TOAST. That's exactly what TOAST does now.
TOAST works by storing individual *attributes* (values) in a separate
table -- not entire tuples. All smaller columns in a row are still stored
together with other rows, and only large fields are split out.

 and additionally it may have undesirable performance impact if the queries
 are not even interested in seeing the array when fetching the object ?

If your query doesn't touch the TOASTed field, it's not even read from the
TOAST table. However, large values that are smaller than the TOAST threshold
(2kB compressed) sometimes cause problems, since they're stored inline in
the heap, bloating the table size and thus increasing the time it takes to
scan it.
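
A hedged illustration of how to see where a value actually ends up (the table
and column names are hypothetical, and the second query assumes the table has
a TOAST table):

    SELECT pg_column_size(big_array) AS stored_bytes  -- on-disk size of one value
      FROM items
     LIMIT 5;

    SELECT pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
      FROM pg_class
     WHERE relname = 'items';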

Regards,
Marti



Re: [HACKERS] pgstat wait timeout

2011-12-19 Thread Andrew Dunstan



On 12/19/2011 11:45 AM, Robert Haas wrote:

On Mon, Dec 19, 2011 at 10:02 AM, pratikchirania pratik.chira...@hp.com wrote:

Version upgrade: hmm.. Is there any fix/change related to this issue in
9.0.6?

You could read the release notes for those minor version upgrades.

Based on a quick look through the commit logs, and a quick grep of
release-9-0.sgml, I don't think so.




Would this be alleviated by setting stats_temp_directory to point to a ramdisk?

cheers

andrew



Re: [HACKERS] Page Checksums

2011-12-19 Thread David Fetter
On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote:
 On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote:
  * Aidan Van Dyk (ai...@highrise.ca) wrote:
  But the scary part is you don't know how long *ago* the crash was.
  Because a hint-bit-only change w/ a torn-page is a non event in
  PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
  and scrub every page in the database.
 
  Fair enough, but, could we distinguish these two cases?  In other words,
  would it be possible to detect if a page was torn due to a 'traditional'
  crash and not complain in that case, but complain if there's a CRC
  failure and it *doesn't* look like a torn page?
 
 No.

Would you be so kind as to elucidate this a bit?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Nikhil Sontakke
 It seems hard to believe that ATExecDropConstraint() doesn't need any
 adjustment.


Yeah, I think we could return early for "only"-type constraints.

Also, shouldn't the systable scan break out of the while loop after a
matching constraint name has been found? As of now, it's doing a full scan
of all the constraints for the given relation.

 @@ -6755,6 +6765,7 @@ ATExecDropConstraint(Relation rel, const char
*constrName,
  HeapTuple   tuple;
  bool        found = false;
  bool        is_check_constraint = false;
 +   bool        is_only_constraint = false;

  /* At top level, permission check was done in ATPrepCmd, else do it */
  if (recursing)
 @@ -6791,6 +6802,12 @@ ATExecDropConstraint(Relation rel, const char
*constrName,
  /* Right now only CHECK constraints can be inherited */
   if (con->contype == CONSTRAINT_CHECK)
  is_check_constraint = true;
 +
  +if (con->conisonly)
 +{
 +Assert(is_check_constraint);
 +is_only_constraint = true;
 +}

  /*
   * Perform the actual constraint deletion
 @@ -6802,6 +6819,9 @@ ATExecDropConstraint(Relation rel, const char
*constrName,
  performDeletion(conobj, behavior);

  found = true;
 +
 +/* constraint found - break from the while loop now */
 +break;
  }

  systable_endscan(scan);
 @@ -6830,7 +6850,7 @@ ATExecDropConstraint(Relation rel, const char
*constrName,
   * routines, we have to do this one level of recursion at a time; we
can't
   * use find_all_inheritors to do it in one pass.
   */
 -if (is_check_constraint)
  +if (is_check_constraint && !is_only_constraint)
  children = find_inheritance_children(RelationGetRelid(rel),
lockmode);
  else
  children = NIL;

PFA, revised version containing the above changes based on Alvaro's v4
patch.

Regards,
Nikhils


non_inh_check_v5.0.patch
Description: Binary data



Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Marti Raudsepp
On Mon, Dec 19, 2011 at 17:30, Sushant Sinha sushant...@gmail.com wrote:
 This never happened earlier with postgres 9.0 Is there a known issue
 with Postgres 9.1? Or how to report this problem?

What *did* change in 9.1 is that there's new GIN cost estimation in
the planner. If you do EXPLAIN ANALYZE for your queries, do the plans
differ between 9.0 and 9.1? E.g. doing an index scan instead of a seq scan
or vice versa.

 The query uses the gin index and the tsearch ranking function
 ts_rank_cd.

Another thought -- have you read about the GIN fast updates feature?
This existed in 9.0 too. Instead of updating the index directly, GIN
appends all changes to a sequential list, which needs to be scanned in
whole for read queries. The periodic autovacuum process has to merge
these values back into the index.

Maybe the solution is to tune autovacuum to run more often on the table.

http://www.postgresql.org/docs/9.1/static/gin-implementation.html
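
Two hedged sketches of the knobs mentioned above (the index and table names
are hypothetical):

    -- Pay the GIN maintenance cost at INSERT time instead of deferring it
    -- to the pending list:
    ALTER INDEX docs_tsv_idx SET (fastupdate = off);

    -- Or make autovacuum process the table (and merge the pending list)
    -- more often:
    ALTER TABLE docs SET (autovacuum_vacuum_scale_factor = 0.01);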

Regards,
Marti



Re: [HACKERS] Page Checksums

2011-12-19 Thread Andres Freund
On Monday, December 19, 2011 03:33:22 PM Alvaro Herrera wrote:
Excerpts from Stephen Frost's message of Mon Dec 19 11:18:21 -0300 2011:
  * Aidan Van Dyk (ai...@highrise.ca) wrote:
   #) Anybody investigated putting the CRC in a relation fork, but not
   right in the data block?  If the CRC contains a timestamp, and is WAL
   logged before the write, at least on reading a block with a wrong
   checksum, if a warning is emitted, the timestamp could be looked at by
   whoever is reading the warning and know tht the block was written
   shortly before the crash $X $PERIODS ago
  
  I do like the idea of putting the CRC info in a relation fork, if it can
  be made to work decently, as we might be able to then support it on a
  per-relation basis, and maybe even avoid the on-disk format change..
  
  Of course, I'm sure there's all kinds of problems with that approach,
  but it might be worth some thinking about.
 
 I think the main objection to that idea was that if you lose a single
 page of CRCs you have hundreds of data pages which no longer have good
 CRCs.
Which I find a pretty non-argument because there is lots of SPOF data in a 
cluster (WAL, control record) anyway...
If recent data starts to fail you have to restore from backup anyway.


Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Stephen Frost
* David Fetter (da...@fetter.org) wrote:
 On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote:
  On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote:
   Fair enough, but, could we distinguish these two cases?  In other words,
   would it be possible to detect if a page was torn due to a 'traditional'
   crash and not complain in that case, but complain if there's a CRC
   failure and it *doesn't* look like a torn page?
  
  No.
 
 Would you be so kind as to elucidate this a bit?

I'm guessing, based on some discussion on IRC, that it's because we
don't really 'detect' torn pages today, when it's due to a hint-bit-only
update.  With all the trouble due to hint-bit updates, and if they're
written out or not, makes me wish we could just avoid doing hint-bit
only updates to disk somehow..  Or log them when we do them.  Both of
those have their own drawbacks, of course.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Page Checksums

2011-12-19 Thread Stephen Frost
* Andres Freund (and...@anarazel.de) wrote:
 On Monday, December 19, 2011 03:33:22 PM Alvaro Herrera wrote:
   I do like the idea of putting the CRC info in a relation fork, if it can
   be made to work decently, as we might be able to then support it on a
   per-relation basis, and maybe even avoid the on-disk format change..
   
  I think the main objection to that idea was that if you lose a single
  page of CRCs you have hundreds of data pages which no longer have good
  CRCs.
 Which I find a pretty non-argument because there is lots of SPOF data in a 
 cluster (WAL, control record) anyway...
 If recent data starts to fail you have to restore from backup anyway.

I agree with Andres on this one..  Also, if we CRC the pages in the CRC
fork themselves, hopefully we'd be able to detect when a bad block
impacted the CRC fork and differentiate that from a whole slew of bad
blocks in the heap..

There might be an issue there with handling locking and having to go
through the page-level lock on the CRC page, which covers a lot more
pages in the heap and therefore reduces scalability..  Don't we have a
similar issue with the visibility map though?

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
On Mon, 2011-12-19 at 19:08 +0200, Marti Raudsepp wrote:
 Another thought -- have you read about the GIN fast updates feature?
 This existed in 9.0 too. Instead of updating the index directly, GIN
 appends all changes to a sequential list, which needs to be scanned in
 whole for read queries. The periodic autovacuum process has to merge
 these values back into the index.
 
 Maybe the solution is to tune autovacuum to run more often on the
 table.
 
 http://www.postgresql.org/docs/9.1/static/gin-implementation.html
 
 Regards,
 Marti 

Probably this is the problem. Is running VACUUM ANALYZE under psql the
same as autovacuum?

-Sushant.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Sushant Sinha
On Mon, 2011-12-19 at 12:41 -0300, Euler Taveira de Oliveira wrote:
 On 19-12-2011 12:30, Sushant Sinha wrote:
  I recently upgraded my postgres server from 9.0 to 9.1.2 and I am
  finding a peculiar problem. I have a program that periodically adds
  rows to this table using INSERT. Typically the number of rows is just
  1-2 thousand when the table already has 500K rows. Whenever the
  program is adding rows, the performance of the search query on the
  same table is very bad. The query uses the gin index and the tsearch
  ranking function ts_rank_cd.
  
 How bad is bad? It seems you are suffering from don't-fit-on-cache
 problem, no? 

The memory is 32GB and the entire database is just 22GB. Even vmstat 1
does not show any disk activity. 

I was not able to isolate the performance numbers since I have observed
this only on the production box, where the number of requests keeps
increasing as the box gets loaded. But a query that normally takes 1 sec
is taking more than 10 secs (not sure whether it got the same number of
CPU cycles). Is there a way to find that?

-Sushant.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2011-12-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Monday, December 19, 2011 05:12:13 PM Robert Haas wrote:
 I do.  Anyone coding in PL/pgsql is going to find the nodeToString()
 output unusable, and we can easily provide a built-in JSON datatype
 with enough richness to make that problem go away in time for 9.2.

I hear the sound of goalposts moving.

 I personally think youre underestimating the complexity of providing a 
 sensible json compatibility shim ontop the nodestring representation. But I 
 would like to be proven wrong ;)

If we were going to do this at all, the way to do it is to flush the
existing nodestring representation and redefine it as being JSON.  No?
If this is as easy as people are claiming, it should not be hard to
revise the lower-level bits of outfuncs/readfuncs to make the text
representation compatible.  And there's no reason we can't change
what is stored in pg_rewrite.

 But before we get bogged down in the data representation, I think we
 need to make a more fundamental decision: to what extent are we OK
 with exporting the parse tree at all, in any form?  Tom is arguing
 that we shouldn't, and I see his point: the recent DROP command rework
 would have broken everybody's command triggers if we had adopted this
 proposal, and that would be a real shame, because I don't particularly
 like the idea that we can't continue to improve the code and refactor
 things because someone out there might be depending on an older and
 less well-considered behavior.

The problem that changing the nodestring representation could help with
is making user-written triggers roughly as robust as C code is, to wit
that you don't have to change it as long as the specific fields it
touches aren't redefined.  The question is whether that's good enough.
The DROP command changes provide a pretty strong clue that it isn't.
Admittedly, that's not the sort of change we make too often.  But I
will be seriously annoyed if we start getting the sort of pushback
on parsetree changes that we've been hearing from certain quarters about
configuration file changes.  Those structures are *internal* and we have
got to have the flexibility to whack them around.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Jesper Krogh

On 2011-12-19 18:08, Marti Raudsepp wrote:

The query uses the gin index and the tsearch ranking function
ts_rank_cd.

Another thought -- have you read about the GIN fast updates feature?
This existed in 9.0 too. Instead of updating the index directly, GIN
appends all changes to a sequential list, which needs to be scanned in
whole for read queries. The periodic autovacuum process has to merge
these values back into the index.

Maybe the solution is to tune autovacuum to run more often on the table.

http://www.postgresql.org/docs/9.1/static/gin-implementation.html


I have to say that I consistently have to turn fastupdate off for
our heavily updated gin-indexes. The overall performance gain
may be measurable, but it's not intolerable without it. The spikes seen
from the applications when cleanup happens, whether in the foreground
or in the background, are not tolerable (multiple seconds).

I may just not have experimented sufficiently with the various sizes of
work_mem, but I would indeed love to have a connection-only fastupdate,
so that within a single transaction it could use the fastupdate
technique without stuffing index updates onto unrelated queries at
random.
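
For the archives, the knob itself is per-index, e.g. (index name is
hypothetical):

    -- Inserts then update the GIN index directly instead of going through
    -- the pending list: slower bulk inserts, but no multi-second cleanup
    -- spikes landing on unrelated queries.
    ALTER INDEX docs_tsv_idx SET (fastupdate = off);
    -- and it can be flipped back on before a planned bulk load:
    -- ALTER INDEX docs_tsv_idx SET (fastupdate = on);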

Jesper
--
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] archive_keepalive_command

2011-12-19 Thread Simon Riggs
On Mon, Dec 19, 2011 at 1:17 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 16, 2011 at 10:01 AM, Simon Riggs si...@2ndquadrant.com wrote:
 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.

 This may be stupid of me, but I don't see the point of this.  If you
 want keepalives, why use log shipping rather than SR?

On Dec 12, you said "It also strikes me that anything
that is based on augmenting the walsender/walreceiver protocol leaves
anyone who is using WAL shipping out in the cold.  I'm not clear from
the comments you or Simon have made how important you think that use
case still is."

Not wanting to leave anyone out in the cold, I proposed something to
enhance file based replication also.

In any case, multiple others have requested this feature, so it's worth
doing even if you have changed your mind.

 Implementing a
 really-high-latency method of passing protocol messages through the
 archive seems like a complex solution to a non-problem (but, like I
 say, I may be missing something).

So a) it is a problem, and b) it's not complex.

The proposed method doesn't necessarily use the archive. Allowing
users to specify how the keepalive will work makes it a flexible
solution to a widely recognised problem.

This proposal doesn't replace the protocol keepalive for streaming
replication, it provides exactly the same thing for file based
replication users. Many people use both streaming and file-based, so
need a way to measure latency that acts similarly no matter which one
is currently in use.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Alvaro Herrera

Excerpts from Alex Hunsaker's message of vie dic 16 18:07:05 -0300 2011:
 
 On Fri, Dec 16, 2011 at 14:01, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
 
  Excerpts from Alex Hunsaker's message of vie dic 16 17:50:12 -0300 2011:
 
  On Fri, Dec 16, 2011 at 12:06, Alvaro Herrera
  alvhe...@commandprompt.com wrote:
 
   Yeah.  Nikhil, Alex, this is the merged patch.  Have a look that it
   still works for you (particularly the pg_dump bits) and I'll commit it.
   I adjusted the regression test a bit too.
 
  Other than the version checks seem to be off by one looks fine.
 
  Uhm ... you're right that convalidated is present in 9.1 [...] So I
  don't think we really need to add a separate branch for 9.1 here, but it
  certainly needs a comment improvement.
 
 Hrm... What am I missing?

I was saying that it should all be >= 9.2.  There are no
convalidated=false check constraints in 9.1, so the extra branch is
useless.  This is sufficient:

@@ -6019,8 +6019,13 @@ getTableAttrs(TableInfo *tblinfo, int numTables)
  tbinfo->dobj.name);
 
resetPQExpBuffer(q);
-   if (g_fout->remoteVersion >= 90100)
+   if (g_fout->remoteVersion >= 90200)
    {
+       /*
+        * conisonly and convalidated are new in 9.2 (actually, the latter
+        * is there in 9.1, but it wasn't ever false for check constraints
+        * until 9.2).
+        */
        appendPQExpBuffer(q, "SELECT tableoid, oid, conname, "
                          "pg_catalog.pg_get_constraintdef(oid) AS consrc, "
                          "conislocal, convalidated, conisonly "


-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2011-12-19 Thread Greg Smith

On 12/19/2011 11:12 AM, Robert Haas wrote:

But before we get bogged down in the data representation, I think we
need to make a more fundamental decision: to what extent are we OK
with exporting the parse tree at all, in any form?  Tom is arguing
that we shouldn't, and I see his point: the recent DROP command rework
would have broken everybody's command triggers if we had adopted this
proposal, and that would be a real shame, because I don't particularly
like the idea that we can't continue to improve the code and refactor
things because someone out there might be depending on an older and
less well-considered behavior.


Any interface here would need to be unstable in the same sense Linux uses 
the term:  subject to change in every major version, and maybe even in a 
minor one if that's the best way to solve a higher priority issue.  An 
example we've been consuming that comes to mind is the API for keeping 
processes from being killed by the OOM killer.  It's far from stable:  
http://archives.postgresql.org/message-id/4ce5e437.7080...@2ndquadrant.com 
but it's still possible for users of it to keep up with new releases, 
and when feasible some work toward backward compatibility is done (but 
not always).


As a tool author, I would expect anything working at the level where the 
data needed is only available from the parse tree would need to be 
re-tested against each new version, and then have version specific 
changes as needed.  Setting the expectations bar any higher for 
consumers of that interface would be unrealistic.  The minority of 
people who'd like to use this feature shouldn't want to see PostgreSQL 
development hamstrung for the majority either, and the standards for 
breakage here should be clear from the beginning--unlike those for, say, 
GUC changes between releases.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-19 Thread Phil Sorber
On Sat, Dec 17, 2011 at 3:52 PM, Phil Sorber p...@omniti.com wrote:
 Attached is a patch that addresses the todo item Improve relation
 size functions such as pg_relation_size() to avoid producing an error
 when called against a no longer visible relation.

 http://archives.postgresql.org/pgsql-general/2010-10/msg00332.php

 Instead of returning an error, they now return NULL if the OID is
 found in pg_class when using SnapshotAny. I only applied it to 4
 functions: select pg_relation_size, pg_total_relation_size,
 pg_table_size and pg_indexes_size. These are the ones that were using
 relation_open(). I changed them to using try_relation_open(). For
 three of them I had to move the try_relation_open() call up one level
 in the call stack and change the parameter types for some support
 functions from Oid to Relation. They now also call a new function
 relation_recently_dead() which is what checks for relation in
 SnapshotAny. All regression tests passed.

 Is SnapshotAny the snapshot I should be using? It seems to get the
 correct results. I can drop a table and I get NULL. Then after a
 vacuumdb it returns an error.
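
To make that concrete (just a sketch; the OID shown is made up):

    CREATE TABLE scratch (id int);
    SELECT 'scratch'::regclass::oid;          -- say it returns 123456
    DROP TABLE scratch;
    -- unpatched: ERROR, relation does not exist
    -- patched:   NULL, until the dead pg_class row is vacuumed away,
    --            after which the error comes back
    SELECT pg_total_relation_size(123456::regclass);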

Something I'd like to point out myself is that I need to be using
ObjectIdAttributeNumber instead of Anum_pg_class_relfilenode. Probably
just luck that I got the intended results before. This will also
change the logic in a few other places.

Still not clear on the semantics of Snapshot{Any|Self|Now|Dirty}.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Simon Riggs
On Sun, Dec 18, 2011 at 4:22 PM, Jim Nasby j...@nasby.net wrote:
 On Dec 18, 2011, at 2:28 AM, Gianni Ciolli wrote:
 I have written some notes about autonomous subtransactions, which have
 already been touched (at least) in two separate threads; please find
 them at

  http://wiki.postgresql.org/wiki/Autonomous_subtransactions

 The document seems to mix the terms subtransaction and autonomous 
 transaction. That's going to generate a ton of confusion, because both terms 
 already have meaning associated with them:

 - Autonomous transaction means you can execute something outside of your 
 current transaction and it is in no way affected by the current transaction 
 (doesn't matter if T0 commits or not).
 - Subtransactions are an alternative to savepoints. They allow you to break a 
 large transaction into smaller chunks, but if T0 doesn't commit then none of 
 the subtransactions do either.

OK, perhaps we should just stick to the term Autonomous Transaction.
That term is in common use, even if the usage is otherwise exactly the
same as a subtransaction i.e. main transaction stops until the
subtransaction is complete.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Simon Riggs
On Mon, Dec 19, 2011 at 3:52 PM, Marti Raudsepp ma...@juffo.org wrote:
 On Sun, Dec 18, 2011 at 10:28, Gianni Ciolli
 gianni.cio...@2ndquadrant.it wrote:
  http://wiki.postgresql.org/wiki/Autonomous_subtransactions

 It is meant to be an ongoing project, requesting comments and
 contributions, rather than a conclusive document.

 In addition to what Jim Nasby said, this proposal seems a bit
 inflexible. In particular:
 1. It limits us to exactly 2 autonomous transactions at any time (the
 main one and the subtransaction).

It's not clear to me why you think there would be a limitation to
exactly 2 autonomous transactions.

The intention would be to have a theoretically unlimited number,
subject to resource limits, just as normal subtransactions already do.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 12:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I personally think youre underestimating the complexity of providing a
 sensible json compatibility shim ontop the nodestring representation. But I
 would like to be proven wrong ;)

 If we were going to do this at all, the way to do it is to flush the
 existing nodestring representation and redefine it as being JSON.  No?
 If this is as easy as people are claiming, it should not be hard to
 revise the lower-level bits of outfuncs/readfuncs to make the text
 representation compatible.  And there's no reason we can't change
 what is stored in pg_rewrite.

I thought about that.  A quick inspection suggests that this would
slightly increase the size of the stored rules, because the node type
would need to become a key, which would add at least a few more
characters, and also because we'd need more quoting.  That is:

{THING :wump 1}

becomes, at a minimum:

{"_": "THING", "wump": 1}

And that's using a somewhat-lame single character label for the node
tag.  Now, I suspect that in practice the cost of the stored rules
becoming slightly larger is negligible, so maybe it's worth it.

 But before we get bogged down in the data representation, I think we
 need to make a more fundamental decision: to what extent are we OK
 with exporting the parse tree at all, in any form?  Tom is arguing
 that we shouldn't, and I see his point: the recent DROP command rework
 would have broken everybody's command triggers if we had adopted this
 proposal, and that would be a real shame, because I don't particularly
 like the idea that we can't continue to improve the code and refactor
 things because someone out there might be depending on an older and
 less well-considered behavior.

 The problem that changing the nodestring representation could help with
 is making user-written triggers roughly as robust as C code is, to wit
 that you don't have to change it as long as the specific fields it
 touches aren't redefined.  The question is whether that's good enough.

Agreed.

 The DROP command changes provide a pretty strong clue that it isn't.
 Admittedly, that's not the sort of change we make too often.  But I
 will be seriously annoyed if we start getting the sort of pushback
 on parsetree changes that we've been hearing from certain quarters about
 configuration file changes.  Those structures are *internal* and we have
 got to have the flexibility to whack them around.

Yes.

Maybe we should try to split the baby here and defer the question of
whether to expose any of the parse tree internals, and if so how much,
to a future release.  It seems to me that we could design a fairly
useful set of functionality around AFTER-CREATE, BEFORE-DROP, and
maybe even AFTER-ALTER triggers without exposing any parse tree
details.  For CREATE and ALTER, that would make it the client's
responsibility to inspect the system catalogs and figure out what had
happened and what to do about it, which is admittedly not ideal, but
it would be more than we have now, and it would then give us the
option to consider requests to expose more information in future
releases on a case-by-case basis, rather than making a blanket
decision about whether to expose the parse tree or not.  I have a
sneaking suspicion that, while we probably can't get by without
exposing any parse tree information ever, the amount we truly need to
expose might end up being only a small percentage of the total...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RangeVarGetRelid()

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 11:55 AM, Noah Misch n...@leadboat.com wrote:
 Here's the rule I used: the pre-lock checks must prove that the user could, by
 some command working as-designed, have taken a lock at least as strong as the
 one we're about to acquire.  How does that work out in practice?  Relation
 owners can always take AccessExclusiveLock by way of a DROP command, so
 ownership is always sufficient as a pre-lock test.

 The above namespace check is no exception; the object owner has authority to
 take the AccessExclusiveLock independent of his authority to ultimately
 complete the rename.  (Changes arising from the recent discussions about
 locking namespaces during DDL would complicate the situation in other ways.)

Yes: and I have it mind to work on that for the current release cycle,
time permitting.  It seems to me that there's no real value in
consolidating this callback if there's a change coming down the pipe
that would require us to split it right back out again.

Also, while I see the point that it wouldn't be a security issue to
defer this particular check until after the lock is taken, I still
don't think it's a good idea.  I basically don't see any point in
splitting up the security checks across multiple functions.  Our
permissions-checking logic is already rather diffuse; I think that
finding more reasons for some of it to be done in one function and
some of it to be done in some other function is going in the wrong
direction.  It makes it easier for future people editing the code to
rearrange things in ways that subtly break security without realizing
it, and it's also one more obstacle for things like sepgsql which
would like to get control whenever we do a security check. (Dimitri's
recent comments about command triggers suggest that MAC isn't the only
application of an option to override the default access control
policy.)  It's also not terribly consistent from a user perspective:
hmm, if I don't have permission to do operation X because of reason A,
then my command fails immediately; if I don't have permission because
of reason B, then it waits for a lock and then fails.  Some amount of
inconsistency there is probably inevitable, because, for example, we
won't know whether you have permission on an entire inheritance
hierarchy until we start walking it.  But I don't really see the point
in going out of our way to add more of it.

I think it's important to keep in mind that most of the code that is
going into those callbacks is just being moved.  We're not really
ending up with any more code overall, except to the extent that there
are a couple of lines of net increase for each callback because, well,
it has a comment, and maybe a declaration, and some curly braces at
the beginning and the end.  The ownership check is two lines of code;
it doesn't matter whether we duplicate that or not.  In many cases, I
think the callbacks are actually increasing readability, by pulling a
bunch of related checks into their own function instead of cluttering
up the main code path with them.  I don't want to end up with a lot of
needless code duplication, but I also don't feel any powerful need to
squeeze the number of callback functions down to an absolute minimum
just because I can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Greg Smith

On 12/19/2011 07:50 AM, Robert Haas wrote:

On Mon, Dec 19, 2011 at 6:10 AM, Simon Riggssi...@2ndquadrant.com  wrote:

The only sensible way to handle this is to change the page format as
discussed. IMHO the only sensible way that can happen is if we also
support an online upgrade feature. I will take on the online upgrade
feature if others work on the page format issues, but none of this is
possible for 9.2, ISTM.

I'm not sure that I understand the dividing line you are drawing here.


There are three likely steps to reaching checksums:

1) Build a checksum mechanism into the database.  This is the 
straightforward part that multiple people have now done.


2) Rework hint bits to make the torn page problem go away.  Checksums go 
elsewhere? More WAL logging to eliminate the bad situations?  Eliminate 
some types of hint bit writes?  It seems every alternative has 
trade-offs that will require serious performance testing to really validate.


3) Finally tackle in-place upgrades that include a page format change.  
One basic mechanism was already outlined:  a page converter that knows 
how to handle two page formats, some metadata to track which pages have 
been converted, a daemon to do background conversions.  Simon has some 
new ideas here too (online upgrade involves two clusters kept in sync 
on different versions, slightly different concept than the current 
in-place upgrade).  My recollection is that the in-place page upgrade 
work was pushed out of the critical path before due to lack of immediate 
need.  It wasn't necessary until a) a working catalog upgrade tool was 
validated and b) a bite-size feature change to test it on appeared.  We 
have (a) now in pg_upgrade, and CRCs could be (b)--if the hint bit 
issues are sorted first.


What Simon was saying is that he's got some interest in (3), but wants 
no part of (2).


I don't know how much time each of these will take.  I would expect that 
(2) and (3) have similar scopes though--many days, possibly a few 
months, of work--which means they both dwarf (1).  The part that's been 
done is the visible tip of a mostly underwater iceberg.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] reprise: pretty print viewdefs

2011-12-19 Thread Andrew Dunstan
A client has again raised with me the ugliness of pretty printed 
viewdefs. We looked at this a couple of years ago, but discussion went 
off into the weeds slightly, so I dropped it, but as requested I'm 
revisiting it.


The problem can be simply seen in the screenshot here: 
http://developer.postgresql.org/~adunstan/view_before.png


This is ugly, unreadable and unusable, IMNSHO. Calling it pretty is 
just laughable.


The simple solution I originally proposed was to put a line feed and some 
space before every target field in pretty print mode. This is a two-line 
patch. The downsides are a) maybe not everyone will like the change and 
b) it will produce superfluous newlines, e.g. before CASE expressions.
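
Roughly the effect I'm after, as a sketch (view and column names 
invented):

    SELECT pg_get_viewdef('some_view'::regclass, true);
    -- would come back with one target per line, e.g.:
    --  SELECT t.id,
    --      t.name,
    --      CASE
    --          WHEN t.active THEN 'yes'::text
    --          ELSE 'no'::text
    --      END AS active_label
    --    FROM some_table t;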


We can avoid the superfluous newlines at the cost of some code 
complexity. Whether it's worth it is a judgement call.


If we don't want to do this unconditionally, we'd need either a new 
function which would make it available or a new flavor of pg_get_viewdef 
- if the latter I'm not really sure what the new signatures should be - 
oid/text | something not boolean, but what? Personally, I'd much rather 
just do it. Does anyone *really* like the present output?


cheers

andrew







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-12-19 Thread Greg Smith
On 12/14/2011 09:02 AM, Shigeru Hanada wrote:
 Here I'd like to propose three incremental patches:

 1) fdw_helper_funcs_v3.patch...:  This is not specific to pgsql_fdw, but
 probably useful for every FDWs which use FDW options...
 2) pgsql_fdw_v5.patch:  This patch provides simple pgsql_fdw
 which does *NOT* support any push-down...
 3) pgsql_fdw_pushdown_v1.patch:  This patch adds limited push-down
 capability to pgsql_fdw which is implemented by previous patch...
 ...
 To implement [expression which uses user-defined function], I added 
 exprFunction to nodefuncs.c which returns Oid
 of function which is used in the expression node, but I'm not sure that
 it should be there.  Should we have it inside pgsql_fdw?

After failing to bring some light onto this during my general update,
will try again here. We now have 3 updated patches that refactor things
from how this was originally presented, with one asked implementation
question. There's also a spawned off Join push-down for foreign tables
patch off in another thread.

I don't think it's really clear to everyone what state this feature
proposal is in. We've gotten bits of review here from KaiGai and Heikki,
big picture comments from Robert and Tom. Given how these are
structured, is fdw_helper_funcs_v3.patch at the point where it should be
considered for committer review? Maybe pgsql_fdw_v5.patch too?

The others seem to be more in flux to me, due to all the recent pushdown
changes.

-- 
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 2) Rework hint bits to make the torn page problem go away. 
 Checksums go elsewhere? More WAL logging to eliminate the bad
 situations? Eliminate some types of hint bit writes?  It seems
 every alternative has trade-offs that will require serious
 performance testing to really validate.
 
I'm wondering whether we're not making a mountain out of a mole-hill
here.  In real life, on one single crash, how many torn pages with
hint-bit-only updates do we expect on average?  What's the maximum
possible?  In the event of a crash recovery, can we force all tables
to be seen as needing autovacuum?  Would there be a way to limit
this to some subset which *might* have torn pages somehow?
 
It seems to me that on a typical production system you would
probably have zero or one such page per OS crash, with zero being
far more likely than one.  If we can get that one fixed (if it
exists) before enough time has elapsed for everyone to forget the OS
crash, the idea that we would be scaring the users and negatively
affecting the perception of reliability seems far-fetched.  The fact
that they can *have* page checksums in PostgreSQL should do a lot to
*enhance* the PostgreSQL reputation for reliability in some circles,
especially those getting pounded with FUD from competing products.
If a site has so many OS or hardware failures that they lose track
-- well, they really should be alarmed.
 
Of course, the fact that you may hit such a torn page in a situation
where all data is good means that it shouldn't be more than a
warning.
 
This seems as though it eliminates most of the work people have been
suggesting as necessary, and makes the submitted patch fairly close
to what we want.
 
-Kevin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why do we need two snapshots per query?

2011-12-19 Thread Greg Smith
This feature has now passed through review by Dimitri with him no longer 
having anything to say about it.  I've marked it ready for committer 
now.  Seems the main decision left here is whether another committer 
wants to take a look at this, or if Robert wants to take a spin on the 
buildfarm wheel by committing it himself.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 12:07 PM, David Fetter da...@fetter.org wrote:
 On Mon, Dec 19, 2011 at 09:34:51AM -0500, Robert Haas wrote:
 On Mon, Dec 19, 2011 at 9:14 AM, Stephen Frost sfr...@snowman.net wrote:
  * Aidan Van Dyk (ai...@highrise.ca) wrote:
  But the scary part is you don't know how long *ago* the crash was.
  Because a hint-bit-only change w/ a torn-page is a non event in
  PostgreSQL *DESIGN*, on crash recovery, it doesn't do anything to try
  and scrub every page in the database.
 
  Fair enough, but, could we distinguish these two cases?  In other words,
  would it be possible to detect if a page was torn due to a 'traditional'
  crash and not complain in that case, but complain if there's a CRC
  failure and it *doesn't* look like a torn page?

 No.

 Would you be so kind as to elucidate this a bit?

Well, basically, Stephen's proposal was pure hand-waving.  :-)

I don't know of any magic trick that would allow us to know whether a
CRC failure looks like a torn page.  The only information we're
going to get is the knowledge of whether the CRC matches or not.  If
it doesn't, it's fundamentally impossible for us to know why.  We know
the page contents are not as expected - that's it!

It's been proposed before that we could examine the page, consider all
the unset hint bits that could be set, and try all combinations of
setting and clearing them to see whether any of them produce a valid
CRC.  But, as Tom has pointed out previously, that has a really quite
large chance of making a page that's *actually* been corrupted look
OK.  If you have 30 or so unset hint bits, odds are very good that
some combination will produce the 32-bit CRC you're expecting.

To put this another way, we currently WAL-log just about everything.
We get away with NOT WAL-logging some things when we don't care about
whether they make it to disk.  Hint bits, killed index tuple pointers,
etc. cause no harm if they don't get written out, even if some other
portion of the same page does get written out.  But as soon as you CRC
the whole page, now absolutely every single bit on that page becomes
critical data which CANNOT be lost.  IOW, it now requires the same
sort of protection that we already need for our other critical updates
- i.e. WAL logging.  Or you could introduce some completely new
mechanism that serves the same purpose, like MySQL's double-write
buffer.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 2:16 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 It seems to me that on a typical production system you would
 probably have zero or one such page per OS crash, with zero being
 far more likely than one.  If we can get that one fixed (if it
 exists) before enough time has elapsed for everyone to forget the OS
 crash, the idea that we would be scaring the users and negatively
 affecting the perception of reliability seems far-fetched.

The problem is that you can't fix them.  If you come to a page with
a bad CRC, you only have two choices: take it seriously, or don't.  If
you take it seriously, then you're complaining about something that
may be completely benign.  If you don't take it seriously, then you're
ignoring something that may be a sign of data corruption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Marti Raudsepp
On Mon, Dec 19, 2011 at 20:34, Simon Riggs si...@2ndquadrant.com wrote:
 It's not clear to me why you think there would be a limitation to
 exactly 2 autonomous transactions.

Sorry my bad, I didn't read the proposal carefully. Nesting does
indeed allow multiple autonomous subtransactions.

Maybe that's just my paranoia, but the fact that subtransactions
aren't named means it's pretty easy to accidentally get out of step
and commit the wrong subtransaction. I see app developers often
messing up BEGIN/COMMIT/ROLLBACK already. This is why I like the
SAVEPOINT style; it's obvious when there's a bug.
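
i.e. the explicit-name style, in plain SQL (names invented):

    BEGIN;
    INSERT INTO audit_log VALUES ('attempting update');
    SAVEPOINT risky_bit;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    -- if the update misbehaves, the application knows exactly what to undo:
    ROLLBACK TO SAVEPOINT risky_bit;
    COMMIT;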

(I do realize that allowing subtransactions to commit out of order
also makes it failure prone)

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping : in .pgpass - code or docs bug?

2011-12-19 Thread Richard Huxton

On 19/12/11 16:48, Robert Haas wrote:

On Sat, Dec 17, 2011 at 3:27 AM, Ross Reedstromreeds...@rice.edu  wrote:

This should either be fixed by changing the documentation to say to not escape
colons or backslashes in the password part, only, or modify this function
(PasswordFromFile) to silently unescape the password string. It already copies
it.


My vote is for a doc correction in the back-branches and a behavior
change in master.


Seems sensible - presumably mentioning this will be corrected in 9.2?

It's clearly not what you'd call urgent since nobody else seems to 
have noticed before now.
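
For reference, the escaped form the docs currently suggest looks like 
this (values made up; the password is meant to be pa:ss\word):

    # hostname:port:database:username:password
    localhost:5432:mydb:alice:pa\:ss\\word

and the report is that PasswordFromFile hands the field through without 
stripping those backslashes back out, hence the mismatch.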


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Escaping : in .pgpass - code or docs bug?

2011-12-19 Thread Alvaro Herrera

Excerpts from Richard Huxton's message of lun dic 19 16:33:31 -0300 2011:
 On 19/12/11 16:48, Robert Haas wrote:
  On Sat, Dec 17, 2011 at 3:27 AM, Ross Reedstromreeds...@rice.edu  wrote:
  This should either be fixed by changing the documentation to say to not 
  escape
  colons or backslashes in the password part, only, or modify this function
  (PasswordFromFile) to silently unescape the password string. It already 
  copies
  it.
 
  My vote is for a doc correction in the back-branches and a behavior
  change in master.
 
 Seems sensible - presumably mentioning this will be corrected in 9.2?
 
 It's clearly not what you'd call urgent since nobody else seems to 
 have noticed before now.

Yeah, it is a pretty old bug -- this code was clearly written by some
rookie that didn't know very well what he was doing.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Alvaro Herrera

Excerpts from Marti Raudsepp's message of lun dic 19 16:32:06 -0300 2011:

 (I do realize that allowing subtransactions to commit out of order
 also makes it failure prone)

Uhm?  You can't commit savepoints out of order.  You can release an
older one, but then all the ones following it disappear and can't be
released separately.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Mon, Dec 19, 2011 at 2:16 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 It seems to me that on a typical production system you would
 probably have zero or one such page per OS crash, with zero being
 far more likely than one.  If we can get that one fixed (if it
 exists) before enough time has elapsed for everyone to forget the
 OS crash, the idea that we would be scaring the users and
 negatively affecting the perception of reliability seems
 far-fetched.
 
 The problem is that you can't fix them.  If you come to a page
 with a bad CRC, you only have two choices: take it seriously, or
 don't. If you take it seriously, then you're complaining about
 something that may be completely benign.  If you don't take it
 seriously, then you're ignoring something that may be a sign of
 data corruption.
 
I was thinking that we would warn when such was found, set hint bits
as needed, and rewrite with the new CRC.  In the unlikely event that
it was a torn hint-bit-only page update, it would be a warning about
something which is a benign side-effect of the OS or hardware crash.
The argument was that it could happen months later, and people
might not remember the crash.  My response to that is: don't let it
wait that long.  By forcing a vacuum of all possibly-affected tables
(or all tables if the there's no way to rule any of them out), you
keep it within recent memory.
 
Of course, it would also make sense to document that such an error
after an OS or hardware crash might be benign or may indicate data
corruption or data loss, and give advice on what to do.  There is
obviously no way for PostgreSQL to automagically fix real
corruption flagged by a CRC failure, under any circumstances. 
There's also *always* a possibility that the CRC error is a false
positive -- if only the bytes in the CRC were damaged.  We're
talking quantitative changes here, not qualitative.
 
I'm arguing that the extreme measures suggested to achieve the
slight quantitative improvements are likely to cause more problems
than they solve.  A better use of resources to improve the false
positive numbers would be to be more aggressive about setting hint
bits -- perhaps when a page is written with any tuples with
transaction IDs before the global xmin, the hint bits should be set
and the CRC calculated before write, for example.  (But that would
be a different patch.)
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Marti Raudsepp
On Mon, Dec 19, 2011 at 21:43, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 (I do realize that allowing subtransactions to commit out of order
 also makes it failure prone)

 Uhm?  You can't commit savepoints out of order.  You can release an
 older one, but then all the ones following it disappear and can't be
 released separately.

We're confused about the terminology already :)

I was talking about autonomous subtransactions as in COMMIT
SUBTRANSACTION from the proposal. Earlier I commented that it would be
nice if the syntax didn't require autonomous transactions to be
strictly nested.

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-12-19 Thread Robert Haas
On Sat, Dec 17, 2011 at 6:11 AM, Simon Riggs si...@2ndquadrant.com wrote:
 A simpler example shows it better. Tom's idea prevents 2 rows being
 returned, but doesn't prevent zero rows.

I don't think it prevents either one.  Suppose we read and return a
tuple, and then someone HOT-updates it and commits.  SnapshotNow is
very happy to also return the updated version of that same tuple on
the next iteration of the scan.  See commit
c0f03aae0469e758964faac0fb741685170c39a5.

 That's a useful improvement, but not the only thing.

 ISTM we can run a scan as we do now, without locking. If scan returns
 zero rows we scan again, but take a definition lock first. ALTER TABLE
 holds the definition lock while running, which won't be a problem
 because we would only use that on shorter AT statements.

 Definition lock being the type of lock described earlier on this
 thread, that we already have code for. So we have code for all the
 parts we need to make this work.

 So that means we'd be able to plug the gaps noted, without reducing
 performance on the common code paths.

 I don't see any objections made so far remain valid with that approach.

I think a retry loop is a possibly useful tool for solving this
problem, but I'm very skeptical about the definition locks, unless we
can arrange to have them taken just before commit (regardless of when
during the transaction ALTER TABLE was executed) and released
immediately afterwards.  What I think might be even better is
something along the lines of what Noah Misch did with RangeVarGetRelid --
don't try to lock out concurrent activity, just detect it and redo the
operation if anything bad happens while we're in process.  In this
case, that would mean having a mechanism to know whether any
concurrent transaction had updated the catalog we're scanning during
the scan.

Yet another option, which I wonder whether we're dismissing too
lightly, is to just call GetSnapshotData() and do the scan using a
plain old MVCC snapshot.  Sure, it's more expensive than SnapshotNow,
but is it expensive enough to worry about?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RangeVarGetRelid()

2011-12-19 Thread Noah Misch
On Mon, Dec 19, 2011 at 01:43:18PM -0500, Robert Haas wrote:
 On Mon, Dec 19, 2011 at 11:55 AM, Noah Misch n...@leadboat.com wrote:
  Here's the rule I used: the pre-lock checks must prove that the user could, 
  by
  some command working as-designed, have taken a lock at least as strong as 
  the
  one we're about to acquire. ?How does that work out in practice? ?Relation
  owners can always take AccessExclusiveLock by way of a DROP command, so
  ownership is always sufficient as a pre-lock test.
 
  The above namespace check is no exception; the object owner has authority to
  take the AccessExclusiveLock independent of his authority to ultimately
  complete the rename. ?(Changes arising from the recent discussions about
  locking namespaces during DDL would complicate the situation in other ways.)
 
 Yes: and I have it mind to work on that for the current release cycle,
 time permitting.  It seems to me that there's no real value in
 consolidating this callback if there's a change coming down the pipe
 that would require us to split it right back out again.
 
 Also, while I see the point that it wouldn't be a security issue to
 defer this particular check until after the lock is taken, I still
 don't think it's a good idea.  I basically don't see any point in
 splitting up the security checks across multiple functions.  Our
 permissions-checking logic is already rather diffuse; I think that
 finding more reasons for some of it to be done in one function and
 some of it to be done in some other function is going in the wrong
 direction.  It makes it easier for future people editing the code to
 rearrange things in ways that subtly break security without realizing
 it, and it's also one more obstacle for things like sepgsql which
 would like to get control whenever we do a security check. (Dimitri's
 recent comments about command triggers suggest that MAC isn't the only
 application of an option to override the default access control
 policy.)  It's also not terribly consistent from a user perspective:
 hmm, if I don't have permission to do operation X because of reason A,
 then my command fails immediately; if I don't have permission because
 of reason B, then it waits for a lock and then fails.  Some amount of
 inconsistency there is probably inevitable, because, for example, we
 won't know whether you have permission on an entire inheritance
 hierarchy until we start walking it.  But I don't really see the point
 in going out of our way to add more of it.
 
 I think it's important to keep in mind that most of the code that is
 going into those callbacks is just being moved.  We're not really
 ending up with any more code overall, except to the extent that there
 are a couple of lines of net increase for each callback because, well,
 it has a comment, and maybe a declaration, and some curly braces at
 the beginning and the end.  The ownership check is two lines of code;
 it doesn't matter whether we duplicate that or not.  In many cases, I
 think the callbacks are actually increasing readability, by pulling a
 bunch of related checks into their own function instead of cluttering
 up the main code path with them.  I don't want to end up with a lot of
 needless code duplication, but I also don't feel any powerful need to
 squeeze the number of callback functions down to an absolute minimum
 just because I can.

I'm satisfied that you and I have a common understanding of the options and
their respective merits.  There's plenty of room for judgement in choosing
which merits to seize at the expense of others.  Your judgements here seem
entirely reasonable.

nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Greg Smith

On 12/19/2011 02:44 PM, Kevin Grittner wrote:

I was thinking that we would warn when such was found, set hint bits
as needed, and rewrite with the new CRC.  In the unlikely event that
it was a torn hint-bit-only page update, it would be a warning about
something which is a benign side-effect of the OS or hardware crash.
The argument was that it could happen months later, and people
might not remember the crash.  My response to that is: don't let it
wait that long.  By forcing a vacuum of all possibly-affected tables
(or all tables if the there's no way to rule any of them out), you
keep it within recent memory.


Cleanup that requires a potentially unbounded-in-size VACUUM to sort out 
doesn't sound like a great path to wander down.  Ultimately any CRC 
implementation is going to want a scrubbing feature like those found 
in RAID arrays, one that wanders through all database pages 
looking for literal bitrot.  And pushing priority requests for things 
to check to the top of its queue may end up being a useful feature 
there.  But if you need all that infrastructure just to get the feature 
launched, that's a bit hard to stomach.


Also, as someone who follows Murphy's Law as my chosen religion, I would 
expect this situation could be exactly how flaky hardware would first 
manifest itself:  server crash and a bad CRC on the last thing written 
out.  And in that case, the last thing you want to do is assume things 
are fine, then kick off a VACUUM that might overwrite more good data 
with bad.  The sort of bizarre "that should never happen" cases are the 
ones I'd most like to see more protection against, rather than excusing 
them and going on anyway.



There's also *always a possibility that CRC error is a false
positive -- if only the bytes in the CRC were damaged.  We're
talking quantitative changes here, not qualitative.


The main way I expect to validate this sort of thing is with an as yet 
unwritten function to grab information about a data block from a standby 
server for this purpose, something like this:


Master:  Computed CRC A, Stored CRC B; error raised because A!=B
Standby:  Computed CRC C, Stored CRC D

If C==D && A==C, the corruption is probably overwritten bits of the CRC B.

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Alvaro Herrera

Excerpts from Marti Raudsepp's message of lun dic 19 16:50:22 -0300 2011:
 
 On Mon, Dec 19, 2011 at 21:43, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  (I do realize that allowing subtransactions to commit out of order
  also makes it failure prone)
 
  Uhm?  You can't commit savepoints out of order.  You can release an
  older one, but then all the ones following it disappear and can't be
  released separately.
 
 We're confused about the terminology already :)

Yeah.  The code talks about savepoints as subtransactions (because
that's the name they had at first), so if you guys are going to talk
about autonomous transactions as subtransactions too, then the code is
going to end up pretty schizo.

 I was talking about autonomous subtransactions as in COMMIT
 SUBTRANSACTION from the proposal. Earlier I commented that it would be
 nice if the syntax didn't require autonomous transactions to be
 strictly nested.

Oh ... Probably.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page Checksums

2011-12-19 Thread Heikki Linnakangas

On 19.12.2011 21:27, Robert Haas wrote:

To put this another way, we currently WAL-log just about everything.
We get away with NOT WAL-logging some things when we don't care about
whether they make it to disk.  Hint bits, killed index tuple pointers,
etc. cause no harm if they don't get written out, even if some other
portion of the same page does get written out.  But as soon as you CRC
the whole page, now absolutely every single bit on that page becomes
critical data which CANNOT be lost.  IOW, it now requires the same
sort of protection that we already need for our other critical updates
- i.e. WAL logging.  Or you could introduce some completely new
mechanism that serves the same purpose, like MySQL's double-write
buffer.


Double-writes would also be a useful option to reduce the size of WAL 
that needs to be shipped in replication.


Or you could just use a filesystem that does CRCs...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Gianni Ciolli
On Mon, Dec 19, 2011 at 05:52:40PM +0200, Marti Raudsepp wrote:
 On Sun, Dec 18, 2011 at 10:28, Gianni Ciolli
 gianni.cio...@2ndquadrant.it wrote:
   http://wiki.postgresql.org/wiki/Autonomous_subtransactions
 
  It is meant to be an ongoing project, requesting comments and
  contributions, rather than a conclusive document.
 
 In addition to what Jim Nasby said, this proposal seems a bit
 inflexible. In particular:
 1. It limits us to exactly 2 autonomous transactions at any time (the
 main one and the subtransaction).

Thanks for pointing this out; it shows me that the only example I
provided was too basic, so I have added a second one, plus one
related remark.

The spec doesn't actually impose a limit of only 2 total transactions:
if we have the capability to pause the current transaction (T0) to
start a new one (T1), and then resume T0 after the conclusion of T1,
then we can apply that capability more than once in the same
transaction, resulting in a sequence of transactions T1, T2, ..., Tn
all subordinate to T0, possibly interspersed by statements from
T0. Also, if T0 is a function then it is possible to peek at the (very)
near future and eliminate any empty interlude of T0 between T_k and
T_(k+1).
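
As an illustrative sketch (keywords taken from the wiki proposal; table
names are made up), such an interleaving could look like this:

    BEGIN;                                  -- T0
        UPDATE accounts SET balance = balance - 10 WHERE id = 1;
        BEGIN SUBTRANSACTION;               -- T1, autonomous
            INSERT INTO audit_log VALUES ('attempt 1');
        COMMIT SUBTRANSACTION;              -- T1 is durable regardless of T0
        BEGIN SUBTRANSACTION;               -- T2, autonomous
            INSERT INTO audit_log VALUES ('attempt 2');
        ROLLBACK SUBTRANSACTION;            -- T2 is discarded
    ROLLBACK;                               -- T0 aborts; the row from T1 survives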

 2. There's no reason why two autonomous transactions should have a
 main / sub relationship. They are autonomous -- they should not
 depend on the state of the outer transaction.

COMMIT/ROLLBACK of any of T1,T2,... does not depend on COMMIT/ROLLBACK
of T0; the parent transaction exists only to simplify the logical
model.  It can happen that T0 contains zero statements, so that it is
very similar to a sequence of independent transactions.

The advantage to me is that we allow autonomous transactions, but
there is always one master transaction, so that it looks more like a
generalisation of the existing model as opposed to something radically
different (and therefore less likely to happen).

The behaviour is indeed similar to the one that can be obtained with
dblink: it allows, for instance, having an umbrella transaction which
doesn't modify the database, and has the only purpose of executing
other transactions, which commit or rollback independently of each
other and of the umbrella transaction.

With respect to dblink, this spec brings a simpler syntax, which
doesn't require managing a second connection to the same db, and one
significant difference: you reuse the current session instead of
opening a new one (with some caveats, as noted in the Wiki).
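
For comparison, a minimal sketch of how an autonomous commit can be
approximated with dblink today (connection string and table are assumptions):

    BEGIN;
        SELECT dblink_connect('aux', 'dbname=mydb');
        SELECT dblink_exec('aux', $$INSERT INTO audit_log VALUES ('kept')$$);
        SELECT dblink_disconnect('aux');
    ROLLBACK;  -- the INSERT committed on the auxiliary session and survives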

Regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous subtransactions

2011-12-19 Thread Gianni Ciolli
On Mon, Dec 19, 2011 at 09:32:06PM +0200, Marti Raudsepp wrote:
 Maybe that's just my paranoia, but the fact that subtransactions
 aren't named means it's pretty easy to accidentally get out of step
 and commit the wrong subtransaction. I see app developers often
 messing up BEGIN/COMMIT/ROLLBACK already. This is why I like the
 SAVEPOINT style; it's obvious when there's a bug.

For each COMMIT/ROLLBACK there is only one possible BEGIN
SUBTRANSACTION, therefore naming is not strictly necessary, unlike in
the SAVEPOINT case where you can have more than one option.

However, ISTM that allowing each subtransaction to be optionally named
could provide a nice safety measure that the careful developer can use
to reduce the chances of human error. If subtransactions are not named,
no check is made; but if they are named, then the Xs on BEGIN
SUBTRANSACTION X and COMMIT/ROLLBACK SUBTRANSACTION X must match,
otherwise an exception is raised (and a bug is detected).
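
A minimal sketch of that optional naming (the syntax is only a suggestion):

    BEGIN SUBTRANSACTION s1;
        UPDATE t SET flag = true;
    COMMIT SUBTRANSACTION s1;     -- names match, accepted
    -- COMMIT SUBTRANSACTION s2;  -- would raise an exception and expose the bug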

Regards,
Dr. Gianni Ciolli - 2ndQuadrant Italia
PostgreSQL Training, Services and Support
gianni.cio...@2ndquadrant.it | www.2ndquadrant.it

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Mon, Dec 19, 2011 at 17:30, Sushant Sinha sushant...@gmail.com wrote:
 This never happened earlier with postgres 9.0 Is there a known issue
 with Postgres 9.1? Or how to report this problem?

 What *did* change in 9.1 is that there's new GIN cost estimation in
 the planner. If you do EXPLAIN ANALYZE for your queries, do the plans
 differ for 9.0 or 9.1?

I trolled the commit log a bit, and AFAICS the only significant GIN
changes between 9.1 and reasonably late-model 9.0 are the cost
estimation patch and this one:
http://git.postgresql.org/gitweb/?p=postgresql.gita=commitdiffh=73912e7fbd1b52c51d914214abbec1cda64595f2

which makes me wonder if maybe the OP has a very large fraction of empty
or null entries in his data.  Previously those would have resulted in no
insertion traffic on a GIN index, but now they do.
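
If so, that fraction should be cheap to measure with something along these
lines (table, column and text search configuration are guesses here):

    SELECT count(*) AS total,
           count(*) - count(nullif(to_tsvector('english', body), ''::tsvector))
               AS empty_or_null
    FROM documents;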

 Another thought -- have you read about the GIN fast updates feature?
 This existed in 9.0 too.

Yeah, so it seems unlikely to be that, or at least not that by itself.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Postgres 9.1: Adding rows to table causing too much latency in other queries

2011-12-19 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes:
 I have to say that I consistently have to turn fastupdate off for
 our heavily updated gin-indexes. The overall performance gain
 may be measurable, but it's not intolerable without it. The spikes seen
 from the applications when cleanup happens, either in the foreground
 or in the background, are not tolerable (multiple seconds).

Well, that's why there's a provision to turn it off: if response time
spikes are a bigger deal to you than overall performance, you probably
don't want bulk updates.

The theory is that you should be able to tune things so that the bulk
updates are done by autovacuum, but if you can't get that to work
sufficiently reliably, fastupdate=off is the best answer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 12:07 PM, Nikhil Sontakke nikkh...@gmail.com wrote:
 It seems hard to believe that ATExecDropConstraint() doesn't need any
 adjustment.

 Yeah, I think we could return early for "only"-type constraints.

It's not just that.  Suppose that C inherits from B which inherits
from A.  We add an only constraint to B and a non-only constraint
to A.   Now, what happens in each of the following scenarios?

1. We drop the constraint from B without specifying ONLY.
2. We drop the constraint from B *with* ONLY.
3. We drop the constraint from A without specifying ONLY.
4. We drop the constraint from A *with* ONLY.

Off the top of my head, I suspect that #1 should be an error; #2
should succeed, leaving only the inherited version of the constraint
on B; #3 should remove the constraint from A and leave it on B but I'm
not sure what should happen to C, and I have no clear vision of what
#4 should do.

As a followup question, if we do #2 followed by #4, or #4 followed by
#2, do we end up with the same final state in both cases?

Here's another scenario.  B inherits from A.  We add a constraint to A
using ONLY, and then drop it without ONLY.  Does that work or fail?
Also, what happens we add matching constraints to B and A, in each
case using ONLY, and then remove the constraint from A without using
ONLY?  Does anything happen to B's constraint?  Why or why not?

Just to be clear, I like the feature.  But I've done some work on this
code before, and it is amazingly easy to screw it up and end up
with bugs... so I think lots of careful thought is in order.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Alvaro Herrera

Excerpts from Nikhil Sontakke's message of lun dic 19 14:07:02 -0300 2011:

 PFA, revised version containing the above changes based on Alvaro's v4
 patch.

Committed with these fixes, and with my fix for the pg_dump issue noted
by Alex, too.  Thanks.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

2011-12-19 Thread Marti Raudsepp
Hi list,

Since PostgreSQL 9.1, GIN has new cost estimation code. This code
assumes that the only expression type it's going to see is OpExpr.
However, ScalarArrayOpExpr has also been possible in earlier versions.
Estimating "col op ANY (array)" queries segfaults in 9.1 if there's
a GIN index on the column.

Case in point:
create table words (word text);
create index on words using gin (to_tsvector('english', word));
explain analyze select * from words where to_tsvector('english', word)
@@ any ('{foo}');

(It seems that RowCompareExpr and NullTest clauses are impossible for
a GIN index -- at least my efforts to find such cases failed)

Attached is an attempted fix for the issue. I split out the code for
the extract call and now run that for each array element, adding
together the average of (partialEntriesInQuals, exactEntriesInQuals,
searchEntriesInQuals) for each array element. After processing all
quals, I multiply the entries by the number of array_scans (which is
the product of all array lengths) to get the total cost.

This required a fair bit of refactoring, but I tried to follow the
patterns for OpExpr pretty strictly -- discounting scans over NULL
elements, returning 0 cost when none of the array elements can match,
accounting for cache effects when there are multiple scans, etc. But
it's also possible that I have no idea what I'm really doing. :)

I also added regression tests for this to tsearch and pg_trgm.

Regards,
Marti
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index e7af7d4..250d853 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -3486,6 +3486,16 @@ explain (costs off)
  Index Cond: (t ~~* '%BCD%'::text)
 (4 rows)
 
+explain (costs off)
+  select * from test2 where t like any ('{%bcd%,qua%}');
+   QUERY PLAN
+-
+ Bitmap Heap Scan on test2
+   Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+   -  Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+(4 rows)
+
 select * from test2 where t like '%BCD%';
  t 
 ---
@@ -3509,6 +3519,13 @@ select * from test2 where t ilike 'qua%';
  quark
 (1 row)
 
+select * from test2 where t like any ('{%bcd%,qua%}');
+   t
+
+ abcdef
+ quark
+(2 rows)
+
 drop index test2_idx_gin;
 create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
 set enable_seqscan=off;
@@ -3528,6 +3545,16 @@ explain (costs off)
Index Cond: (t ~~* '%BCD%'::text)
 (2 rows)
 
+explain (costs off)
+  select * from test2 where t like any ('{%bcd%,qua%}');
+   QUERY PLAN
+-
+ Bitmap Heap Scan on test2
+   Recheck Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+   -  Bitmap Index Scan on test2_idx_gist
+ Index Cond: (t ~~ ANY ('{%bcd%,qua%}'::text[]))
+(4 rows)
+
 select * from test2 where t like '%BCD%';
  t 
 ---
@@ -3551,3 +3578,10 @@ select * from test2 where t ilike 'qua%';
  quark
 (1 row)
 
+select * from test2 where t like any ('{%bcd%,qua%}');
+   t
+
+ abcdef
+ quark
+(2 rows)
+
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index ea902f6..ac969e6 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -47,10 +47,13 @@ explain (costs off)
   select * from test2 where t like '%BCD%';
 explain (costs off)
   select * from test2 where t ilike '%BCD%';
+explain (costs off)
+  select * from test2 where t like any ('{%bcd%,qua%}');
 select * from test2 where t like '%BCD%';
 select * from test2 where t like '%bcd%';
 select * from test2 where t ilike '%BCD%';
 select * from test2 where t ilike 'qua%';
+select * from test2 where t like any ('{%bcd%,qua%}');
 drop index test2_idx_gin;
 create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
 set enable_seqscan=off;
@@ -58,7 +61,10 @@ explain (costs off)
   select * from test2 where t like '%BCD%';
 explain (costs off)
   select * from test2 where t ilike '%BCD%';
+explain (costs off)
+  select * from test2 where t like any ('{%bcd%,qua%}');
 select * from test2 where t like '%BCD%';
 select * from test2 where t like '%bcd%';
 select * from test2 where t ilike '%BCD%';
 select * from test2 where t ilike 'qua%';
+select * from test2 where t like any ('{%bcd%,qua%}');
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index d06809e..64b732e 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -6591,6 +6591,70 @@ find_index_column(Node *op, IndexOptInfo *index)
 }
 
 /*
+ * Estimate gin costs for a single search value. Returns false if no match
+ * is possible, true otherwise. All pointer arguments must be initialized by
+ * the caller.
+ */
+static bool
+gin_costestimate_search(Oid extractProcOid, Datum value, int strategy_op,

Re: [HACKERS] Page Checksums

2011-12-19 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 But if you need all that infrastructure just to get the feature 
 launched, that's a bit hard to stomach.
 
Triggering a vacuum or some hypothetical scrubbing feature?
 
 Also, as someone who follows Murphy's Law as my chosen religion,
 
If you don't think I pay attention to Murphy's Law, I should recap
our backup procedures -- which involve three separate forms of
backup, each to multiple servers in different buildings, real-time,
plus idle-time comparison of the databases of origin to all replicas
with reporting of any discrepancies.  And off-line snapshot
backups on disk at a records center controlled by a different
department.  That's in addition to RAID redundancy and hardware
health and performance monitoring.  Some people think I border on
the paranoid on this issue.
 
 I would expect this situation could be exactly how flaky hardware
 would first manifest itself:  server crash and a bad CRC on the
 last thing written out.  And in that case, the last thing you want
 to do is assume things are fine, then kick off a VACUUM that might
 overwrite more good data with bad.
 
Are you arguing that autovacuum should be disabled after crash
recovery?  I guess if you are arguing that a database VACUUM might
destroy recoverable data when hardware starts to fail, I can't
argue.  And certainly there are way too many people who don't ensure
that they have a good backup before firing up PostgreSQL after a
failure, so I can see not making autovacuum more aggressive than
usual, and perhaps even disabling it until there is some sort of
confirmation (I have no idea how) that a backup has been made.  That
said, a database VACUUM would be one of my first steps after
ensuring that I had a copy of the data directory tree, personally.
I guess I could even live with that as recommended procedure rather
than something triggered through autovacuum and not feel that the
rest of my posts on this are too far off track.
 
 The main way I expect to validate this sort of thing is with an as
 yet unwritten function to grab information about a data block from
 a standby server for this purpose, something like this:
 
 Master:  Computed CRC A, Stored CRC B; error raised because A!=B
 Standby:  Computed CRC C, Stored CRC D
 
 If C==D && A==C, the corruption is probably overwritten bits of
 the CRC B.
 
Are you arguing we need *that* infrastructure to get the feature
launched?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2011-12-19 Thread David E. Wheeler
On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote:

 We can hopefully get around this for the extensions in contrib (and
 reasonably well has already), but few large companies are going to be
 happy to go to pgxn and download an extension that has a single
 maintainer (not the team, and in most cases not even a team),
 usually no defined lifecycle, no support, etc. (I'm pretty sure you
 won't get support included for random pgxn modules when you buy a
 contract from EDB, or CMD, or us, or PGX, or anybody really - whereas
 if the datatype is in core, you *will* get this)

I support having a JSON type in core, but question the assertions here. *Some* 
organizations won’t use PGXN, usually because they require things through a 
different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. 
There are a *lot* of companies out there that use CPAN, easy_install, and Gem. 
The same sorts of places will use PGXN.

Oh, and at PGX, we’ll happily provide support for random modules, so long as 
you pay for our time. We’re not picky (and happy to send improvements back 
upstream), though we might recommend you switch to something better. But such 
evaluations are based on quality, not simply on what ecosystem it came from.

 If we can find a way to have a stable part in core and then have
 addons that can provide these tons of interesting features (which I
 agree there are) until such time that they can be considered stable
 enough for core, I think that's the best compromise.

+1, though I think the core type will at least need some basic operators and 
indexing support.

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Lots of FSM-related fragility in transaction commit

2011-12-19 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 08.12.2011 08:20, Tom Lane wrote:
 I thought that removing the unreadable file would let me restart,
 but after rm 52860_fsm and trying again to start the server,
 there's a different problem:
 LOG:  consistent recovery state reached at 0/5D71BA8
 LOG:  redo starts at 0/5100050
 WARNING:  page 18 of relation base/27666/52860 is uninitialized
 CONTEXT:  xlog redo visible: rel 1663/27666/52860; blk 18
 PANIC:  WAL contains references to invalid pages

 The bug here is that we consider having immediately reached consistency 
 during crash recovery.

That fixes only part of the problem I was on about, though: I don't
believe that this type of situation should occur in the first place.
We should not be throwing ERROR during post-commit attempts to unlink
the physical files of deleted relations.

After some investigation I believe that the problem was introduced
by the changes to support multiple forks in a relation.  Specifically,
instead of just doing smgrdounlink() during post-commit, we now
do something like this:

    for (fork = 0; fork <= MAX_FORKNUM; fork++)
    {
        if (smgrexists(srel, fork))
            smgrdounlink(srel, fork, false);
    }

and if you look into mdexists() you'll find that it throws ERROR for any
unexpected errno.  This makes smgrdounlink's attempts to be robust
rather pointless.

I'm not entirely sure whether that behavior of mdexists is a good thing,
but it strikes me that the smgrexists call is pretty unnecessary here in
the first place.  We should just get rid of it and do smgrdounlink
unconditionally.  The only reason it's there is to keep smgrdounlink
from whinging about ENOENT on non-primary forks, which is simply stupid
for smgrdounlink to be doing anyway --- it is actually *more* willing to
complain about ENOENT on non-primary than primary forks, which has no
sense that I can detect.

Accordingly I propose the attached patch for HEAD, and similar changes
in all branches that have multiple forks.  Note that this makes the
warning-report conditions identical for both code paths in mdunlink.

regards, tom lane


diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d2fecb1ecb9171fc2c6dd2887d846c8a16779bb0..c5a2541807e148c29c790b0fe12a52c0389f4626 100644
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
*** FinishPreparedTransaction(const char *gi
*** 1366,1373 
  
  		for (fork = 0; fork = MAX_FORKNUM; fork++)
  		{
! 			if (smgrexists(srel, fork))
! smgrdounlink(srel, fork, false);
  		}
  		smgrclose(srel);
  	}
--- 1366,1372 
  
  		for (fork = 0; fork = MAX_FORKNUM; fork++)
  		{
! 			smgrdounlink(srel, fork, false);
  		}
  		smgrclose(srel);
  	}
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c383011b5f6538fa2b90fa5f7778da7ff59fa679..4177a6c5cf95d5e1a1b49b9f11c17098b2158529 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*** xact_redo_commit_internal(TransactionId 
*** 4595,4605 
  
  		for (fork = 0; fork = MAX_FORKNUM; fork++)
  		{
! 			if (smgrexists(srel, fork))
! 			{
! XLogDropRelation(xnodes[i], fork);
! smgrdounlink(srel, fork, true);
! 			}
  		}
  		smgrclose(srel);
  	}
--- 4595,4602 
  
  		for (fork = 0; fork = MAX_FORKNUM; fork++)
  		{
! 			XLogDropRelation(xnodes[i], fork);
! 			smgrdounlink(srel, fork, true);
  		}
  		smgrclose(srel);
  	}
*** xact_redo_abort(xl_xact_abort *xlrec, Tr
*** 4738,4748 
  
  		for (fork = 0; fork = MAX_FORKNUM; fork++)
  		{
! 			if (smgrexists(srel, fork))
! 			{
! XLogDropRelation(xlrec-xnodes[i], fork);
! smgrdounlink(srel, fork, true);
! 			}
  		}
  		smgrclose(srel);
  	}
--- 4735,4742 
  
  		for (fork = 0; fork = MAX_FORKNUM; fork++)
  		{
! 			XLogDropRelation(xlrec-xnodes[i], fork);
! 			smgrdounlink(srel, fork, true);
  		}
  		smgrclose(srel);
  	}
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index fff933d403e7ca5e7b551b987e1684cd6a6e98a7..450e292f4013f9be5418f482654cd33c14c6d344 100644
*** a/src/backend/catalog/storage.c
--- b/src/backend/catalog/storage.c
*** smgrDoPendingDeletes(bool isCommit)
*** 361,368 
  srel = smgropen(pending-relnode, pending-backend);
  for (i = 0; i = MAX_FORKNUM; i++)
  {
! 	if (smgrexists(srel, i))
! 		smgrdounlink(srel, i, false);
  }
  smgrclose(srel);
  			}
--- 361,367 
  srel = smgropen(pending-relnode, pending-backend);
  for (i = 0; i = MAX_FORKNUM; i++)
  {
! 	smgrdounlink(srel, i, false);
  }
  smgrclose(srel);
  			}
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 

Re: [HACKERS] JSON for PG 9.2

2011-12-19 Thread David E. Wheeler
On Dec 19, 2011, at 3:39 PM, David E. Wheeler wrote:

 Well, no, JSON is formally “a lightweight data-interchange format.” It’s 
 derived from JavaScript syntax, but it is not a programming language, so I 
 wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.

Bah, it says “It is based on a subset of the JavaScript Programming Language, 
Standard ECMA-262 3rd Edition - December 1999.” But my point still holds: it is 
not a programming language, and one does not need a PL to have a JSON data type.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2011-12-19 Thread David E. Wheeler
On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:

 My understanding is that JSON is a subset of ECMAscript

Well, no, JSON is formally “a lightweight data-interchange format.” It’s 
derived from JavaScript syntax, but it is not a programming language, so I 
wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.

  http://json.org/

IOW, one does not need a new PL to get this type.

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PATCH: tracking temp files in pg_stat_database

2011-12-19 Thread Tomas Vondra
Hello everybody,

this patch adds two counters to pg_stat_database to track temporary
files - number of temp files and number of bytes. I see this as a useful
feature, as temporary files often cause a lot of IO (because of low
work_mem etc.). The log_temp_files setting is useful, but you have to parse the
log, and only temp files exceeding a size limit are logged, so the actual
amount of I/O is unknown.

The patch is rather simple:

1) two new fields in PgStat_StatDBEntry (n_temp_files, n_temp_bytes)
2) report/recv function in pgstat.c
3) FileClose modified to log stats for all temp files (see below)
4) two new fields added to pg_stat_database (temp_files, temp_bytes)

I had to modify FileClose to call stat() on each temp file as this
should log all temp files (not just when log_temp_files = 0). But the
impact of this change should be negligible, considering that the user is
already using temp files.

I haven't updated the docs yet - let's see if the patch is acceptable at
all first.
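
With the patch applied, the counters can be read per database like this:

    SELECT datname, temp_files, temp_bytes
      FROM pg_stat_database
     ORDER BY temp_bytes DESC;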

Tomas
diff --git a/src/backend/catalog/system_views.sql 
b/src/backend/catalog/system_views.sql
index 2253ca8..55d20dc 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -574,6 +574,8 @@ CREATE VIEW pg_stat_database AS
 pg_stat_get_db_tuples_updated(D.oid) AS tup_updated,
 pg_stat_get_db_tuples_deleted(D.oid) AS tup_deleted,
 pg_stat_get_db_conflict_all(D.oid) AS conflicts,
+pg_stat_get_db_temp_files(D.oid) AS temp_files,
+pg_stat_get_db_temp_bytes(D.oid) AS temp_bytes,
 pg_stat_get_db_stat_reset_time(D.oid) AS stats_reset
 FROM pg_database D;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 24f4cde..97c7004 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -286,7 +286,7 @@ static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, 
int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
 static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
 static void pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict *msg, int 
len);
-
+static void pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len);
 
 /* 
  * Public functions called from postmaster follow
@@ -1339,6 +1339,29 @@ pgstat_report_recovery_conflict(int reason)
pgstat_send(msg, sizeof(msg));
 }
 
+
+/* 
+ * pgstat_report_tempfile() -
+ *
+ * Tell the collector about a temporary file.
+ * 
+ */
+void
+pgstat_report_tempfile(size_t filesize)
+{
+   PgStat_MsgTempFile msg;
+
+   if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+   return;
+
+   pgstat_setheader(msg.m_hdr, PGSTAT_MTYPE_TEMPFILE);
+   msg.m_databaseid = MyDatabaseId;
+   msg.m_filesize = filesize;
+   pgstat_send(msg, sizeof(msg));
+}
+
+;
+
 /* --
  * pgstat_ping() -
  *
@@ -3185,6 +3208,10 @@ PgstatCollectorMain(int argc, char *argv[])

pgstat_recv_recoveryconflict((PgStat_MsgRecoveryConflict *) msg, len);
break;
 
+   case PGSTAT_MTYPE_TEMPFILE:
+   
pgstat_recv_tempfile((PgStat_MsgTempFile *) msg, len);
+   break;
+
default:
break;
}
@@ -3266,6 +3293,8 @@ pgstat_get_db_entry(Oid databaseid, bool create)
result-n_conflict_snapshot = 0;
result-n_conflict_bufferpin = 0;
result-n_conflict_startup_deadlock = 0;
+   result-n_temp_files = 0;
+   result-n_temp_bytes = 0;
 
result-stat_reset_timestamp = GetCurrentTimestamp();
 
@@ -4177,6 +4206,8 @@ pgstat_recv_resetcounter(PgStat_MsgResetcounter *msg, int 
len)
dbentry-n_tuples_updated = 0;
dbentry-n_tuples_deleted = 0;
dbentry-last_autovac_time = 0;
+   dbentry-n_temp_bytes = 0;
+   dbentry-n_temp_files = 0;
 
dbentry-stat_reset_timestamp = GetCurrentTimestamp();
 
@@ -4403,6 +4434,24 @@ pgstat_recv_recoveryconflict(PgStat_MsgRecoveryConflict 
*msg, int len)
 }
 
 /* --
+ * pgstat_recv_tempfile() -
+ *
+ * Process as PGSTAT_MTYPE_TEMPFILE message.
+ * --
+ */
+static void
+pgstat_recv_tempfile(PgStat_MsgTempFile *msg, int len)
+{
+   PgStat_StatDBEntry *dbentry;
+
+   dbentry = pgstat_get_db_entry(msg-m_databaseid, true);
+
+   dbentry-n_temp_bytes += msg-m_filesize;
+   dbentry-n_temp_files += 1;
+
+}
+
+/* --
  * pgstat_recv_funcstat() -
  *
  * Count what the backend has done.
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 9540279..5e40d12 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ 

Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Nikhil Sontakke
  PFA, revised version containing the above changes based on Alvaro's v4
  patch.

 Committed with these fixes, and with my fix for the pg_dump issue noted
 by Alex, too.  Thanks.


Thanks a lot Alvaro!

Regards,
Nikhils


 --
 Álvaro Herrera alvhe...@commandprompt.com
 The PostgreSQL Company - Command Prompt, Inc.
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Alvaro Herrera

Excerpts from Nikhil Sontakke's message of lun dic 19 22:32:57 -0300 2011:
   PFA, revised version containing the above changes based on Alvaro's v4
   patch.
 
  Committed with these fixes, and with my fix for the pg_dump issue noted
  by Alex, too.  Thanks.
 
 Thanks a lot Alvaro!

You're welcome.

But did you see Robert Haas' response upthread?  It looks like there's
still some work to do -- at least some research to determine that we're
correctly handling all those cases.  As the committer I'm responsible
for it, but I don't have resources to get into it any time soon.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: WIP patch: Improve relation size functions such as pg_relation_size() to avoid producing an error when called against a no longer visible relation

2011-12-19 Thread Phil Sorber
On Mon, Dec 19, 2011 at 1:27 PM, Phil Sorber p...@omniti.com wrote:
 On Sat, Dec 17, 2011 at 3:52 PM, Phil Sorber p...@omniti.com wrote:
 Attached is a patch that addresses the todo item Improve relation
 size functions such as pg_relation_size() to avoid producing an error
 when called against a no longer visible relation.

 http://archives.postgresql.org/pgsql-general/2010-10/msg00332.php

 Instead of returning an error, they now return NULL if the OID is
 found in pg_class when using SnapshotAny. I only applied it to 4
 functions: select pg_relation_size, pg_total_relation_size,
 pg_table_size and pg_indexes_size. These are the ones that were using
 relation_open(). I changed them to using try_relation_open(). For
 three of them I had to move the try_relation_open() call up one level
 in the call stack and change the parameter types for some support
 functions from Oid to Relation. They now also call a new function
 relation_recently_dead(), which checks whether the relation is still
 visible to SnapshotAny. All regression tests passed.

 Is SnapshotAny the snapshot I should be using? It seems to get the
 correct results. I can drop a table and I get NULL. Then after a
 vacuumdb it returns an error.

 Something I'd like to point out myself is that I need to be using
 ObjectIdAttributeNumber instead of Anum_pg_class_relfilenode. Probably
 just luck that I got the intended results before. This will also
 change the logic in a few other places.

 Still not clear on the semantics of Snapshot{Any|Self|Now|Dirty}.

I've attached the updated version of my patch with the changes
mentioned in the previous email.
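
To make the intended behaviour concrete, a quick sketch (the OID shown is
made up):

    CREATE TABLE t (x int);
    SELECT 't'::regclass::oid;             -- note the OID, say 73344
    DROP TABLE t;
    SELECT pg_relation_size(73344::oid);   -- with the patch: NULL, not an error
    -- once VACUUM removes the dead pg_class row, the call errors out again
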
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 2ee5966..bc2c862
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***
*** 15,20 
--- 15,21 
  #include sys/stat.h
  
  #include access/heapam.h
+ #include access/sysattr.h
  #include catalog/catalog.h
  #include catalog/namespace.h
  #include catalog/pg_tablespace.h
***
*** 24,32 
--- 25,35 
  #include storage/fd.h
  #include utils/acl.h
  #include utils/builtins.h
+ #include utils/fmgroids.h
  #include utils/rel.h
  #include utils/relmapper.h
  #include utils/syscache.h
+ #include utils/tqual.h
  
  
  /* Return physical size of directory contents, or 0 if dir doesn't exist */
*** calculate_relation_size(RelFileNode *rfn
*** 281,286 
--- 284,322 
  	return totalsize;
  }
  
+ /*
+  * relation_recently_dead
+  *
+  * Check to see if a relation exists in SnapshotAny
+  */
+ static bool
+ relation_recently_dead(Oid relOid)
+ {
+ 	Relation		classRel;
+ 	ScanKeyData		key[1];
+ 	HeapScanDesc	scan;
+ 	bool			status;
+ 
+ 	classRel = heap_open(RelationRelationId, AccessShareLock);
+ 
+ 	ScanKeyInit(key[0],
+ ObjectIdAttributeNumber,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(relOid));
+ 
+ 	scan = heap_beginscan(classRel, SnapshotAny, 1, key);
+ 
+ 	if (heap_getnext(scan, ForwardScanDirection) != NULL)
+ 		status = true;
+ 	else
+ 		status = false;
+ 
+ 	heap_endscan(scan);
+ 	heap_close(classRel, AccessShareLock);
+ 
+ 	return status;
+ }
+ 
  Datum
  pg_relation_size(PG_FUNCTION_ARGS)
  {
*** pg_relation_size(PG_FUNCTION_ARGS)
*** 289,295 
  	Relation	rel;
  	int64		size;
  
! 	rel = relation_open(relOid, AccessShareLock);
  
  	size = calculate_relation_size((rel-rd_node), rel-rd_backend,
  			  forkname_to_number(text_to_cstring(forkName)));
--- 325,339 
  	Relation	rel;
  	int64		size;
  
! 	rel = try_relation_open(relOid, AccessShareLock);
! 
! 	if (rel == NULL)
! 	{
! 		if (relation_recently_dead(relOid))
! 			PG_RETURN_NULL();
! 		else
! 			elog(ERROR, could not open relation with OID %u, relOid);
! 	}
  
  	size = calculate_relation_size((rel-rd_node), rel-rd_backend,
  			  forkname_to_number(text_to_cstring(forkName)));
*** calculate_toast_table_size(Oid toastreli
*** 339,352 
   * those won't have attached toast tables, but they can have multiple forks.
   */
  static int64
! calculate_table_size(Oid relOid)
  {
  	int64		size = 0;
- 	Relation	rel;
  	ForkNumber	forkNum;
  
- 	rel = relation_open(relOid, AccessShareLock);
- 
  	/*
  	 * heap size, including FSM and VM
  	 */
--- 383,393 
   * those won't have attached toast tables, but they can have multiple forks.
   */
  static int64
! calculate_table_size(Relation rel)
  {
  	int64		size = 0;
  	ForkNumber	forkNum;
  
  	/*
  	 * heap size, including FSM and VM
  	 */
*** calculate_table_size(Oid relOid)
*** 360,367 
  	if (OidIsValid(rel-rd_rel-reltoastrelid))
  		size += calculate_toast_table_size(rel-rd_rel-reltoastrelid);
  
- 	relation_close(rel, AccessShareLock);
- 
  	return size;
  }
  
--- 401,406 
*** calculate_table_size(Oid relOid)
*** 371,382 
   * Can be applied safely to an index, but you'll just get zero.
   */
  static int64
! calculate_indexes_size(Oid 

Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-12-19 Thread Noah Misch
On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote:
 Yet another option, which I wonder whether we're dismissing too
 lightly, is to just call GetSnapshotData() and do the scan using a
 plain old MVCC snapshot.  Sure, it's more expensive than SnapshotNow,
 but is it expensive enough to worry about?

Good point.  For the most part, we already regard a catalog scan as too
expensive for bulk use, hence relcache and catcache.  That's not license to
slow them down recklessly, but it's worth discovering how much of a hit we'd
actually face.  I created a function that does this in a loop:

HeapTuple t;

CatalogCacheFlushCatalog(ProcedureRelationId);
t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
if (!HeapTupleIsValid(t))
elog(ERROR, "cache lookup failed for function 42");
ReleaseSysCache(t);

Then, I had pgbench call the function once per client with various numbers of
clients and a loop iteration count such that the total number of scans per run
was always 1920.  Results for master and for a copy patched to use MVCC
snapshots in catcache.c only:

 2 clients, master: 4:30.66elapsed
 4 clients, master: 4:26.82elapsed
32 clients, master: 4:25.30elapsed
 2 clients, master: 4:25.67elapsed
 4 clients, master: 4:26.58elapsed
32 clients, master: 4:26.40elapsed
 2 clients, master: 4:27.54elapsed
 4 clients, master: 4:26.60elapsed
32 clients, master: 4:27.20elapsed
 2 clients, mvcc-catcache: 4:35.13elapsed
 4 clients, mvcc-catcache: 4:30.40elapsed
32 clients, mvcc-catcache: 4:37.91elapsed
 2 clients, mvcc-catcache: 4:28.13elapsed
 4 clients, mvcc-catcache: 4:27.06elapsed
32 clients, mvcc-catcache: 4:32.84elapsed
 2 clients, mvcc-catcache: 4:32.47elapsed
 4 clients, mvcc-catcache: 4:24.35elapsed
32 clients, mvcc-catcache: 4:31.54elapsed

I see roughly a 2% performance regression.  However, I'd expect any bulk
losses to come from increased LWLock contention, which just doesn't
materialize in a big way on this 2-core box.  If anyone would like to rerun
this on a larger machine, I can package it up for reuse.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ToDo: conditional ALTER TABLE

2011-12-19 Thread Noah Misch
On Mon, Dec 19, 2011 at 10:25:34AM +0100, Pavel Stehule wrote:
 I am working on quiet dumps now. i found a small issue.
 
 pg_dump produces a statements
 
 ALTER TABLE ONLY public.b DROP CONSTRAINT b_fk_fkey;
 ALTER TABLE ONLY public.a DROP CONSTRAINT a_pkey;
 
 DROP TABLE IF EXISTS public.b;
 DROP TABLE IF EXISTS public.a;
 
 Actually there is no a conditional ALTER. These statements must be
 before DROPs, but then it can fails when these tables are missing.
 
 So some form like ALTER TABLE IF EXISTS ... should be usefull

If ALTER TABLE is the only ALTER command you'd need to change this way, I think
your proposal is good.

nm (who has never used pg_dump --clean)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe

2011-12-19 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 On Mon, Dec 19, 2011 at 12:05:09PM -0500, Robert Haas wrote:
 Yet another option, which I wonder whether we're dismissing too
 lightly, is to just call GetSnapshotData() and do the scan using a
 plain old MVCC snapshot.  Sure, it's more expensive than SnapshotNow,
 but is it expensive enough to worry about?

That might actually be workable ...

 I created a function that does this in a loop:

   HeapTuple t;

   CatalogCacheFlushCatalog(ProcedureRelationId);
   t = SearchSysCache1(PROCOID, ObjectIdGetDatum(42) /* int4in */);
   if (!HeapTupleIsValid(t))
   elog(ERROR, "cache lookup failed for function 42");
   ReleaseSysCache(t);

... but this performance test seems to me to be entirely misguided,
because it's testing a situation that isn't going to occur much in the
field, precisely because the syscache should prevent constant reloads of
the same syscache entry.

Poking around a bit, it looks to me like one of the bigger users of
non-cache-fronted SnapshotNow scans is dependency.c.  So maybe testing
the speed of large cascaded drops would be a more relevant test case.
For instance, create a schema containing a few thousand functions, and
see how long it takes to drop the schema.
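
One cheap way to set that up (sizes and names are arbitrary; the DROP may
need a higher max_locks_per_transaction):

    CREATE SCHEMA dropbench;
    DO $$
    BEGIN
        FOR i IN 1 .. 2000 LOOP
            EXECUTE format(
                'CREATE FUNCTION dropbench.f%s() RETURNS int LANGUAGE sql AS $f$ SELECT 1 $f$',
                i);
        END LOOP;
    END $$;
    -- then time this on both builds:
    DROP SCHEMA dropbench CASCADE;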

Another thing that would be useful to know is what effect such a change
would have on the time to run the regression tests with
CLOBBER_CACHE_ALWAYS.  That has nothing to do with any real-world
performance concerns, but it would be good to see if we're going to
cause a problem for the long-suffering buildfarm member that does that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Nikhil Sontakke
 But did you see Robert Haas' response upthread?  It looks like there's
 still some work to do -- at least some research to determine that we're
 correctly handling all those cases.  As the committer I'm responsible
 for it, but I don't have resources to get into it any time soon.


Yeah, am definitely planning to test out those scenarios and will respond
some time late today.

Regards,
Nikhils


Re: [HACKERS] RangeVarGetRelid()

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 3:12 PM, Noah Misch n...@leadboat.com wrote:
 I'm satisfied that you and I have a common understanding of the options and
 their respective merits.  There's plenty of room for judgement in choosing
 which merits to seize at the expense of others.  Your judgements here seem
 entirely reasonable.

Thanks.  I'm not trying to be difficult.

After staring at this for quite a while longer, it seemed to me that
the logic for renaming a relation was similar enough to the logic for
changing a schema that the two calbacks could reasonably be combined
using a bit of conditional logic; and that, further, the same callback
could be used, with a small amount of additional modification, for
ALTER TABLE.  Here's a patch to do that.

I also notice that cluster() - which doesn't have a callback - has
exactly the same needs as ReindexRelation() - which does.  So that
case can certainly share code; though I'm not quite sure what to call
the shared callback, or which file to put it in.
RangeVarCallbackForStorageRewrite?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


rangevargetrelid-callback-round3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2011-12-19 Thread Robert Haas
On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler da...@kineticode.com wrote:
 +1, though I think the core type will at least need some basic operators and 
 indexing support.

And I'm willing to do that, but I thought it best to submit a bare
bones patch first, in the hopes of minimizing the number of
objectionable things therein.  For example, if you want to be able to
index a JSON column, you have to decide on some collation order that
is consistent with JSON's notion of equality, and it's not obvious
what is most logical.  Heck, equality itself isn't 100% obvious.  If
there's adequate support for including JSON in core, and nobody
objects to my implementation, then I'll throw some ideas for those
things up against the wall and see what sticks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

2011-12-19 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 Since PostgreSQL 9.1, GIN has new cost estimation code. This code
 assumes that the only expression type it's going to see is OpExpr.
 However, ScalarArrayOpExpr has also been possible in earlier versions.
 Estimating col op ANY (array) queries segfaults in 9.1 if there's
 a GIN index on the column.

Ugh.  I think we subconsciously assumed that ScalarArrayOpExpr couldn't
appear here because GIN doesn't set amsearcharray, but of course that's
wrong.

 (It seems that RowCompareExpr and NullTest clauses are impossible for
 a GIN index -- at least my efforts to find such cases failed)

No, those are safe for the moment --- indxpath.c has a hard-wired
assumption that RowCompareExpr is only usable with btree, and NullTest
is only considered to be indexable if amsearchnulls is set.  Still,
it'd likely be better if this code ignored unrecognized qual expression
types rather than Assert'ing they're not there.  It's not like the cases
it *does* handle are done so perfectly that ignoring an unknown qual
could be said to bollix the estimate unreasonably.

 Attached is an attempted fix for the issue. I split out the code for
 the extract call and now run that for each array element, adding
 together the average of (partialEntriesInQuals, exactEntriesInQuals,
 searchEntriesInQuals) for each array element. After processing all
 quals, I multiply the entries by the number of array_scans (which is
 the product of all array lengths) to get the total cost.

 This required a fair bit of refactoring, but I tried to follow the
 patterns for OpExpr pretty strictly -- discounting scans over NULL
 elements, returning 0 cost when none of the array elements can match,
 accounting for cache effects when there are multiple scans, etc. But
 it's also possible that I have no idea what I'm really doing. :)

Hmm.  I am reminded of how utterly unreadable diff -u format is for
anything longer than single-line changes :-( ... but I think I don't
like this refactoring much.  Will take a closer look tomorrow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: Non-inheritable check constraints

2011-12-19 Thread Nikhil Sontakke
Hi Robert,

First of all, let me state that this ONLY feature has not messed around
with existing inheritance semantics. It allows attaching a constraint to
any table (which can be part of any hierarchy) without the possibility of
it ever playing any part in future or existing inheritance hierarchies. It
is specific to that table, period.

It's not just that.  Suppose that C inherits from B which inherits
 from A.  We add an only constraint to B and a non-only constraint
 to A.   Now, what happens in each of the following scenarios?


An example against latest HEAD should help here:

create table A(ff1 int);
create table B () inherits (A);
create table C () inherits (B);

alter table A add constraint Achk check (ff1 < 10);

   The above will attach Achk to A, B and C

alter table only B add constraint Bchk check (ff1 > 0);

  The above will attach Bchk ONLY to table B

1. We drop the constraint from B without specifying ONLY.


postgres=# alter table B drop constraint Achk;
ERROR:  cannot drop inherited constraint achk of relation b

  The above is existing inheritance based behaviour.

Now let's look at the ONLY constraint:

postgres=# alter table B drop constraint Bchk;
ALTER TABLE

 Since this constraint is not part of any hierarchy, it can be removed.

postgres=# alter table only B add constraint bchk check (ff1 > 0);
ALTER TABLE
postgres=# alter table only B drop constraint Bchk;
ALTER TABLE

So "only" constraints do not need the "only B" qualification to be
deleted. They work both ways and can always be deleted without any issues.

2. We drop the constraint from B *with* ONLY.



postgres=# alter table only B drop constraint Achk;
ERROR:  cannot drop inherited constraint achk of relation b

  The above is existing inheritance-based behavior. So regardless of
ONLY, an inherited constraint cannot be removed from the middle of the
hierarchy.


 3. We drop the constraint from A without specifying ONLY.


postgres=# alter table A drop constraint Achk;
ALTER TABLE

This removes the constraint from the entire hierarchy across A, B and
C. Again existing inheritance behavior.


 4. We drop the constraint from A *with* ONLY.


postgres=# alter table only A drop constraint Achk;
ALTER TABLE

This converts the Achk constraint belonging to B into a local one. C
still has it as an inherited constraint from B. We can now delete those
constraints as per existing inheritance semantics. However, I hope the
difference between these and ONLY constraints is clear. The Achk
constraint associated with B can get inherited in the future whereas only
constraints will not be.


 Off the top of my head, I suspect that #1 should be an error;


It's an error for inherited constraints, but not for only constraints.


 #2
 should succeed, leaving only the inherited version of the constraint
 on B;


Yeah, removal of "only" constraints succeeds, whereas inherited constraints
cannot be removed.


 #3 should remove the constraint from A and leave it on B but I'm
 not sure what should happen to C,


This removes the constraint from the entire hierarchy.


 and I have no clear vision of what
 #4 should do.


This removes the constraint from A, but maintains the inheritance
relationship between B and C. Again standard existing inheritance semantics.

As a followup question, if we do #2 followed by #4, or #4 followed by
 #2, do we end up with the same final state in both cases?


Yeah. #2 is not able to do much really because we do not allow inherited
constraints to be removed from the middle of the hierarchy.


 Here's another scenario.  B inherits from A.  We add a constraint to A
 using ONLY, and then drop it without ONLY.  Does that work or fail?


The constraint gets added to A and since it is an only constraint, its
removal both with and without "only A" works just fine.


 Also, what happens we add matching constraints to B and A, in each
 case using ONLY, and then remove the constraint from A without using
 ONLY?  Does anything happen to B's constraint?  Why or why not?


Again, the key differentiation here is that only constraints are bound to
that table and won't ever be inherited. So this works just fine.

postgres=# alter table only A add constraint A2chk check (ff1 < 10);
ALTER TABLE
postgres=# alter table only B add constraint A2chk check (ff1 < 10);
ALTER TABLE

Just to be clear, I like the feature.  But I've done some work on this
 code before, and it is amazingly easy to screw it up and end up
 with bugs... so I think lots of careful thought is in order.


Agreed. I just tried out the scenarios laid out by you both with and
without the committed patch and AFAICS, normal inheritance semantics have
been preserved properly even after the commit.

Regards,
Nikhils