Re: [HACKERS] Makefiles don't seem to remember to rebuild everything anymore

2012-12-16 Thread Pavan Deolasee
On Sun, Dec 16, 2012 at 12:23 AM, Pavan Deolasee
pavan.deola...@gmail.com wrote:


 When a file is secondary, make will not create the file merely
 because it does not already exist, but make does not automatically
 delete the file.
 (link: 
 ftp://ftp.gnu.org/old-gnu/Manuals/make-3.79.1/html_chapter/make_10.html#SEC97)


And here is another relevant piece of info from
http://www.gnu.org/software/make/manual/html_node/Special-Targets.html

.SECONDARY with no prerequisites causes all targets to be treated as
secondary (i.e., no target is removed because it is considered
intermediate).

Reading that along with the other comment explains what we are seeing
with a .SECONDARY without any prerequisites. BTW I also tried with a
very simple Makefile to rule out any role that implicit rules might
be playing. I created a Makefile like below:

final: sfinal
	cp sfinal final
sfinal: qfinal
	cp qfinal sfinal
qfinal:
	touch qfinal

If I build fully and then remove the file sfinal, a subsequent make will
recreate that file. But if I add a .SECONDARY: target without any
prerequisites, sfinal will not be recreated.

Thanks,
Pavan
-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Pavel Stehule
Hello

I understand the motivation, but the proposed syntax is not very intuitive or robust.

Could you do it in one function, and run the imports only on the first call?

Regards

Pavel


2012/12/16 Peter Eisentraut pete...@gmx.net:
 I'm going to use PL/Python as an example, but I would also like to know
 if this could be applicable to other languages.

 When you do

 CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
 AS $$
 source code here
 $$;

 it internally creates a source file that contains

 ---
 def __plpython_procedure_foo_12345():
     source code here
 ---

 It would be useful to be able to do something like this instead:

 ---
 some code here

 def __plpython_procedure_foo_12345():
     some more code here
 ---

 This would especially be useful for placing imports into the first part.
 While you can have them in the function definition, that means they are
 executed every time the function is called, which makes it much slower.
 Also, future imports are not possible this way.

 CREATE FUNCTION already supports multiple AS items.  Currently, multiple
 AS items are rejected for all languages but C.  I'd imagine lifting that
 restriction and leaving it up to the validator to check it.  Then any
 language can accept two AS items if it wants and paste them together in
 whichever way it needs.  (The probin/prosrc naming will then become more
 obsolete, but it's perhaps not worth changing anything about that.)

 So in practice this might look like this:

 CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
 AS $$
 import x
 import y
 $$,
 $$
 real code here
 $$;

 Comments?








Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Simon Riggs
On 16 December 2012 07:53, Pavan Deolasee pavan.deola...@gmail.com wrote:
 On Sun, Dec 16, 2012 at 3:18 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 Doing that only makes sense when we're running a SELECT. Setting the
 all visible bit immediately prior to an UPDATE that clears it again is
 pointless effort, generating extra work for no reason.

 On the other hand, the HOT prune operation itself is worthless when
 we're running a SELECT.  The only reason we do it that way is that we
 have to prune before the query starts to use the page, else pruning
 might invalidate pointers-to-tuples that are being held within the
 query plan tree.


 Right. HOT prune may, and often would, be called in the SELECT path or in a
 heap/index scan leading to UPDATE/DELETE. But whenever it's called, it
 looks like a good idea to set the visibility map bit.

As explained above, I disagree that it looks like a good idea, and
you've shown no evidence it would be or is true.

I agree with Tom that cleaning on SELECT can be worthless and I'd
definitely like to be able to turn it off conditionally or
permanently.

 There is some
 additional overhead in checking whether a LIVE tuple is all-visible or not,
 but that doesn't look like too much. I did run some pgbench tests on fully
 cached tables and did not see any difference in tps.

 Another idea could have been to NOT clear the visibility bit when a
 HOT update happens. Such tuple can get pruned by HOT prune, so we
 don't need vacuum per se, and the index-only scans are not affected
 because the update was a HOT update, so the index keys did not change
 either. So index-only scans would continue to return the same result.
 Don't know if this would work with hot standby, probably not.

 Maybe it's time to look at what it'd take for the low-level scan
 operations to know whether they're scanning the target relation of
 an UPDATE query, so that we could skip pruning altogether except
 when a HOT update could conceivably ensue.

 Yeah, we could do that. We may not be able to predict whether the
 coming update is HOT or not, but I don't think that matters.

We can predict that an update is HOT in advance, if none of the index
columns are touched in the UPDATE. Yes, there are some cases where it
might not be, but we could probably store that in the statement cache.

Making that check at run time must cost some block contention, so it
would be good to remove it from every update.

 OTOH it's
 probably not too bad to prune in any scan (like we do today) because
 there is a fairly high chance that the page will be dirtied for hint bit
 updates. So maybe it's better to just prune as well. We have already
 put in several optimisations to do so only when required and without
 any unnecessary contention. Of course, benchmarks can prove me wrong.

I think we could use some measurement/stats there so we can check.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-16 Thread Jan Wieck

On 12/14/2012 3:20 PM, Robert Haas wrote:

On Fri, Dec 14, 2012 at 2:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:

...  In more
than ten years of working with PostgreSQL, I've never encountered a case
where the restriction at issue here prevented a bug.  It's only
annoyed me and broken my application code (when moving from PostgreSQL
8.2 to PostgreSQL 8.3, never mind any other database!).


There are quite a few examples in our archives of application bugs that
would have been prevented, or later were prevented, by the 8.3 changes
that reduced the system's willingness to apply implicit casts to text.
I recall for instance cases where people got wrong/unexpected answers
because the system was sorting what-they-thought-were-timestamp values
textually.

So I find such sweeping claims to be demonstrably false, and I'm
suspicious of behavioral changes that are proposed with such arguments
backing them.


I think you're mixing apples and oranges.  The whole point of the
patch on the table - which, if you recall, was designed originally by
you and merely implemented by me - was to change the behavior only in
the cases where there's only one function with the appropriate name
and argument count.  The ambiguous cases that 8.3+ helpfully prevent
are those where overloading is in use and the choice of which function
to call is somewhat arbitrary and perhaps incorrectly-foreseen by the
user.  Those changes also have the side-effect of preventing a
straightforward function call from working without casts even in cases
where no overloading is in use - and making that case work is
completely different from making the ambiguous case arbitrarily pick
one of the available answers.


FWIW I for one thought that casting more liberally in the case at hand, 
where there is only one function with that name and number of arguments, 
would be a good thing. My only concern with the patch presented was that 
changing make_fn_arguments() in that way may have some unwanted side 
effects, because it is called from different locations and the usage of 
COERCION_IMPLICIT may actually guard against something that we don't 
want to allow. I don't have any evidence that it does, just the concern 
that it may.


Perhaps make_fn_arguments() needs to receive that coercion context as an 
argument and the caller hands in COERCION_ASSIGNMENT only in the case at 
hand?



Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin




Re: [HACKERS] Fix fmgroids.h not regenerated after clean (not clean dist) on Windows

2012-12-16 Thread Magnus Hagander
On Wed, Dec 12, 2012 at 8:07 AM, Craig Ringer cr...@2ndquadrant.com wrote:
 Hi all

 There's an issue with MSVC builds on Windows where clean.bat deletes
 src\include\utils\fmgroids.h (as it should) but build.pl doesn't
 re-create it reliably.

 It's created fine on the first build because Gen_fmgrtab.pl is called if
 src\backend\utils\fmgrtab.c is missing, which it is on first build.
 Generating it also generates src\backend\utils\fmgroids.h, which is
 copied to src\include. However, this copy isn't repeated after clean
 deletes src\include\utils\fmgroids.h .

 clean dist is fine, since it deletes fmgrtab.c too, causing the whole
 thing to be re-generated.

 The attached patch fixes the issue.

Looks good to me. Applied and backpatched to 9.2 - the logic appears
slightly different before that.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




[HACKERS] XLByte* usage

2012-12-16 Thread Andres Freund
Hi,

Now that XLogRecPtrs are plain 64-bit integers, what are we supposed to use
in code comparing and manipulating them? There are already plenty of examples
of both, but I would like new code to go in one direction, not two...

I personally find direct comparisons/manipulations far easier to read
than the XLByte* equivalents.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] MySQL search query is not executing in Postgres DB

2012-12-16 Thread Simon Riggs
On 27 November 2012 22:41, Tom Lane t...@sss.pgh.pa.us wrote:
 Merlin Moncure mmonc...@gmail.com writes:
 ... I think if you relaxed
 the function sigs of a few functions on this page
 (http://www.postgresql.org/docs/9.2/interactive/functions-string.html),
 most reported problems would go away.

 That's an interesting way of approaching it.  Do we have any data on
 exactly which functions people do complain about?

 One thing that worries me is introducing ambiguous cases where
 previously there weren't any though.

 Right, but at least we'd be confining the ambiguity to a small number
 of function names.  Tweaking the casting rules could have a lot of
 unforeseen consequences.

There have been many good points made on this thread.

We all agree, or reluctantly admit, that being sloppy in all cases is a
bad thing; what is needed is the ability for a user to more
closely define what they mean by such conversions, so that application
SQL can be made to work.

It certainly isn't easy to say that COLUMN LIKE '1%' would work in all
cases, since the preferred format of that data might be (xxx)
xxx-xxxx, or $x.xx, or €x,xx (i.e. with a comma as the
decimal separator). The format comes from the meaning of the data,
which we cannot know.

What would be useful is to be able to define default format models for
each column. If not defined, there is no implicit cast. If FORMAT is
defined then we know to apply it in the absence of a global cast.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Andres Freund
On 2012-12-16 13:23:56 +0530, Pavan Deolasee wrote:
 Another idea could have been to NOT clear the visibility bit when a
 HOT update happens. Such tuple can get pruned by HOT prune, so we
 don't need vacuum per se, and the index-only scans are not affected
 because the update was a HOT update, so the index keys did not change
 either. So index-only scans would continue to return the same result.
 Don't know if this would work with hot standby, probably not.

For IOSs that sounds like an interesting and in itself easy to implement
idea; you basically only would need to add a single !use_hot_update in
the if blocks doing the PageClearAllVisible in heap_update.
This probably could make IOSs far more likely in some scenarios.

The complicated bit seems to be the heapgetpage() logic around
all_visible, because HOT updates are obviously relevant in normal heap
scans. It seems to me that would require the vm bit continuing to be set
while the page-level bit gets unset.
I *think* that's actually ok because whenever we set/clear the
visibilitymap we will still log it properly, so the crash safety
guarantees seem to hold true. Obviously we would have to change the
escape hatch for exactly that condition in vacuumlazy, but that's not a
problem...

I don't immediately see a problem with HS, your logic seems to hold
equally true there. The replay logic would need to be refined slightly,
but it looks possible. That is without having checked the code...

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Andres Freund
On 2012-12-15 16:48:08 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  Doing that only makes sense when we're running a SELECT. Setting the
  all visible bit immediately prior to an UPDATE that clears it again is
  pointless effort, generating extra work for no reason.

 On the other hand, the HOT prune operation itself is worthless when
 we're running a SELECT.  The only reason we do it that way is that we
 have to prune before the query starts to use the page, else pruning
 might invalidate pointers-to-tuples that are being held within the
 query plan tree.

 Maybe it's time to look at what it'd take for the low-level scan
 operations to know whether they're scanning the target relation of
 an UPDATE query, so that we could skip pruning altogether except
 when a HOT update could conceivably ensue.  I think this was discussed
 back when HOT went in, but nobody wanted to make the patch more invasive
 than it had to be.

FWIW I think that would be a pretty worthwhile optimization - I have seen
workloads where HOT pruning led to considerable contention.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding - GetOldestXmin

2012-12-16 Thread Andres Freund
On 2012-12-15 01:19:26 +0100, Andres Freund wrote:
 On 2012-12-14 14:01:30 -0500, Robert Haas wrote:
  On Fri, Dec 14, 2012 at 6:46 AM, Andres Freund and...@2ndquadrant.com 
  wrote:
   Just moving that tidbit inside the lock seems to be the pragmatic
   choice. GetOldestXmin is called
  
   * once per checkpoint
   * once per index build
   * once in analyze
   * twice per vacuum
   * once for HS feedback messages
  
   Nothing of that occurs frequently enough that 5 instructions will make a
   difference. I would be happy to go an alternative path, but right now I
   don't see any nice one. An already_locked parameter to GetOldestXmin
   seems to be a cure worse than the disease.
 
  I'm not sure that would be so bad, but I guess I question the need to
  do it this way at all.  Most of the time, if you need to advertise
  your global xmin, you use GetSnapshotData(), not GetOldestXmin(), and
  I guess I'm not seeing why that wouldn't also work here.  Am I dumb?

 I wondered upthread whether that would be better:

 On 2012-12-13 21:03:44 +0100, Andres Freund wrote:
  Another alternative to this would be to get a snapshot with
  GetSnapshotData(), copy the xmin to the logical slot, then call
  ProcArrayEndTransaction(). But that doesn't really seem to be nicer to
  me.

 Not sure why I considered it ugly anymore, but it actually has a
 noticeable disadvantage. GetOldestXmin is nicer than GetSnapshotData,
 as the latter sets a fairly recent xid as xmin, whereas GetOldestXmin
 returns the actual current xmin horizon. That's preferable because it
 allows us to start up more quickly. snapbuild.c can only start building
 a snapshot once it has seen an xl_running_xacts record with
 oldestRunningXid >= own_xmin. Otherwise we cannot be sure that no
 relevant catalog tuples have been removed.

Hm. One way that could work with fewer changes is to exploit the fact
that a) it seems to be possible to acquire a shared lwlock twice in the
same backend and b) both GetOldestXmin and GetSnapshotData acquire only a
shared lwlock.

Are we willing to guarantee that recursive acquisition of shared lwlocks
continues to work?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Assert for frontend programs?

2012-12-16 Thread Andrew Dunstan


On 12/16/2012 01:29 AM, Peter Eisentraut wrote:

On Fri, 2012-12-14 at 17:03 -0500, Tom Lane wrote:

Having the layer is a good thing, eg so that USE_ASSERT_CHECKING
can control it, or so that somebody can inject a different behavior
if they want.

You could also (or at least additionally) map !USE_ASSERT_CHECKING to
NDEBUG.  This would also help with imported code that calls assert()
directly.



We should probably do that for both frontend and backend code, no? That 
would get rid of potential problems we already have, like the one in 
inet_net_pton.c that I noted the other day.


cheers

andrew




Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Pavan Deolasee
On Sun, Dec 16, 2012 at 3:10 PM, Simon Riggs si...@2ndquadrant.com wrote:


 As explained above, I disagree that it looks like a good idea, and
 you've shown no evidence it would be or is true.


Let's separate out these two issues. What you are suggesting as a
follow-up to Tom's idea - I've no objection to that, and it might be a
worthwhile optimisation to try out. But this patch itself does not
attempt to deal with that; it's a separate work item and will
require invasive changes and tests.

*Whenever* we HOT prune, either in the SELECT path or the UPDATE path, what
I'm suggesting is: let's try to set the visibility map bit if the
conditions are favorable. The only extra work that we are doing (as in
the submitted patch) is to check a few additional things for LIVE tuples,
such as whether xmin precedes OldestXmin. That by itself does not
seem too costly. What we gain is: 1. the next vacuum may skip that page
because it's marked all-visible, and 2. an index-only scan will not visit
the heap page. These two improvements avoid a useless heap page
IO and may justify a little more work in HOT prune.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee




Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Simon Riggs
On 16 December 2012 14:41, Andres Freund and...@2ndquadrant.com wrote:
 On 2012-12-16 13:23:56 +0530, Pavan Deolasee wrote:
 Another idea could have been to NOT clear the visibility bit when a
 HOT update happens. Such tuple can get pruned by HOT prune, so we
 don't need vacuum per se, and the index-only scans are not affected
 because the update was a HOT update, so the index keys did not change
 either. So index-only scans would continue to return the same result.
 Don't know if this would work with hot standby, probably not.

  For IOSs that sounds like an interesting and in itself easy to implement
  idea; you basically only would need to add a single !use_hot_update in
  the if blocks doing the PageClearAllVisible in heap_update.
  This probably could make IOSs far more likely in some scenarios.

Doing that would completely change the meaning of the visibility map
from a heap visibility map into an index-only map.

IndexOnly scans would still work, but nothing else ever would, and it
would be hard to confirm the validity of the vm.

-1

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Serious problem: media recovery fails after system or PostgreSQL crash

2012-12-16 Thread Tomas Vondra
On 8.12.2012 03:08, Jeff Janes wrote:
 On Thu, Dec 6, 2012 at 3:52 PM, Tomas Vondra t...@fuzzy.cz wrote:
 Hi,

 On 6.12.2012 23:45, MauMau wrote:
 From: Tom Lane t...@sss.pgh.pa.us
 Well, that's unfortunate, but it's not clear that automatic recovery is
 possible.  The only way out of it would be if an undamaged copy of the
 segment was in pg_xlog/ ... but if I recall the logic correctly, we'd
 not even be trying to fetch from the archive if we had a local copy.

 No, PG will try to fetch the WAL file from pg_xlog when it cannot get it
 from archive.  XLogFileReadAnyTLI() does that.  Also, PG manual contains
 the following description:

 http://www.postgresql.org/docs/9.1/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL


 WAL segments that cannot be found in the archive will be sought in
 pg_xlog/; this allows use of recent un-archived segments. However,
 segments that are available from the archive will be used in preference
 to files in pg_xlog/.

 So why don't you use an archive command that does not create such
 incomplete files? I mean something like this:

  archive_command = 'cp %p /arch/%f.tmp && mv /arch/%f.tmp /arch/%f'

 Until the file is renamed, it's considered 'incomplete'.
 
 Wouldn't having the incomplete file be preferable over having none of it at 
 all?
 
 It seems to me you need considerable expertise to figure out how to do
 optimal recovery (i.e. losing the least transactions) in this
 situation, and that that expertise cannot be automated.  Do you trust
 a partial file from a good hard drive, or a complete file from a
 partially melted pg_xlog?

It clearly is a rather complex issue, no doubt about that. And yes,
reliability of the devices with pg_xlog on them is an important detail.
Although if the WAL is not written in a reliable way, you're hosed
anyway, I guess.

The recommended archive command is based on the assumption that the
local pg_xlog is intact (e.g. because it's located on a reliable RAID1
array), which seems to be the assumption of the OP too.

In my opinion it's more likely to encounter an incomplete copy of a WAL
segment in the archive than a corrupted local WAL. And if it really is
corrupted, it would be identified during replay.

Tomas




Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Andres Freund
On 2012-12-16 16:25:03 +, Simon Riggs wrote:
 On 16 December 2012 14:41, Andres Freund and...@2ndquadrant.com wrote:
  On 2012-12-16 13:23:56 +0530, Pavan Deolasee wrote:
  Another idea could have been to NOT clear the visibility bit when a
  HOT update happens. Such tuple can get pruned by HOT prune, so we
  don't need vacuum per se, and the index-only scans are not affected
  because the update was a HOT update, so the index keys did not change
  either. So index-only scans would continue to return the same result.
  Don't know if this would work with hot standby, probably not.
 
   For IOSs that sounds like an interesting and in itself easy to implement
   idea; you basically only would need to add a single !use_hot_update in
   the if blocks doing the PageClearAllVisible in heap_update.
   This probably could make IOSs far more likely in some scenarios.

 Doing that would completely change the meaning of the visibility map
 from a heap visibility map into an index-only map.

 IndexOnly scans would still work, but nothing else would ever and it
 would be hard to confirm the validity of the vm.

I don't think it would change the meaning that much - the visibilitymap
would still work for vacuum, as normal heap updates would still unset the
all-visible flag. Vacuum would skip pages that already were all-visible
and then only got HOT updated, true, but that seems like an
acceptable tradeoff, as that dead space can be fully cleaned up by HOT
pruning.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding - GetOldestXmin

2012-12-16 Thread Simon Riggs
On 13 December 2012 20:03, Andres Freund and...@2ndquadrant.com wrote:

 Does anybody have an opinion on the attached patches? Especially 0001,
 which contains the procarray changes?

 It moves a computation of the sort of:

 result -= vacuum_defer_cleanup_age;
 if (!TransactionIdIsNormal(result))
      result = FirstNormalTransactionId;

 inside ProcArrayLock. But I can't really imagine that to be relevant...

I don't see why this is hard.

Just make the lock acquisition/release conditional on another parameter.

That way the only thing you'll be moving inside the lock is an if test
on a constant boolean.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical decoding - GetOldestXmin

2012-12-16 Thread Andres Freund
On 2012-12-16 16:44:04 +, Simon Riggs wrote:
 On 13 December 2012 20:03, Andres Freund and...@2ndquadrant.com wrote:

  Does anybody have an opinion on the attached patches? Especially 0001,
  which contains the procarray changes?
 
  It moves a computation of the sort of:
 
  result -= vacuum_defer_cleanup_age;
  if (!TransactionIdIsNormal(result))
       result = FirstNormalTransactionId;
 
  inside ProcArrayLock. But I can't really imagine that to be relevant...

 I don't see why this is hard.

 Just make the lock acquisition/release conditional on another parameter.

 That way the only thing you'll be moving inside the lock is an if test
 on a constant boolean.

That's not really cheaper. Two branches + an additional parameter
passed/pushed vs. one branch, one subtraction, two assignments is a
close call.
As I don't think either really matters in the GetOldestXmin case, I
would be happy with that as well. If people prefer an additional
parameter + adjusting the few call sites vs. a separate function, I will
go that way.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] small pg_basebackup display bug

2012-12-16 Thread Magnus Hagander
On Sat, Dec 15, 2012 at 2:24 PM, Erik Rijkers e...@xs4all.nl wrote:
 On Sat, December 15, 2012 14:10, Magnus Hagander wrote:
 On Sat, Dec 15, 2012 at 11:39 AM, Erik Rijkers e...@xs4all.nl wrote:
 from 9.3devel (this morning):


 The truncated name in parentheses only shows up during the filling of the
 new PGDATA; when 100% is reached, that part of the displayed name is
 deleted.  That's another bug in itself, I think: shouldn't the end-state
 of that line also display the parenthesized name?

 Those are both intentional. It may be argued that the intention should
 be changed :), but it's not a bug in itself - it's acting as
 intended.


 Fair enough.  How about just changing the possible length, to enable display
 of somewhat longer names like I used?  It truncates to 30 characters;
 perhaps something like 100 would be more reasonable?

The general idea with the selection was that it shouldn't truncate on
a normal terminal. 30 might be a bit on the aggressive side, but 100
would be guaranteed to almost always do it (since there is also other
text on the line than just the filename). I'm not sure if we can
figure out the size of the screen in a platform independent way
without going through a lot of extra work though - but maybe we can?

 That would make such a truncation less frequent, and after all a truncated
 display is not particularly useful.

Agreed - it's useful during testing, but not in typical production
use. It might actually be more useful if it's truncated at the
other end (keeping the last 30 instead of the first 30 chars)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Set visibility map bit after HOT prune

2012-12-16 Thread Tom Lane
Pavan Deolasee pavan.deola...@gmail.com writes:
 On Sun, Dec 16, 2012 at 3:10 PM, Simon Riggs si...@2ndquadrant.com wrote:
 As explained above, I disagree that it looks like a good idea, and
 you've shown no evidence it would be or is true.

 Let's separate out these two issues. What you are suggesting as a
 follow-up to Tom's idea - I've no objection to that, and it might be a
 worthwhile optimisation to try out. But this patch itself does not
 attempt to deal with that; it's a separate work item and will
 require invasive changes and tests.

 *Whenever* we HOT prune, either in the SELECT path or the UPDATE path, what
 I'm suggesting is: let's try to set the visibility map bit if the
 conditions are favorable.

I don't believe it's clear at all that this is a good idea.  If we
restrict pruning to occur only when there's a fairly good chance of
an ensuing HOT update, then Simon's original objection (that we're
probably going to have to clear the bit again right away) has
considerable force.  And I agree with him that your proposed
redefinition of the bit's meaning to avoid that is pretty horrid;
it's ugly, complicates the invariant quite a lot, and breaks some
existing usages of the bit.

If we decide that we don't want to restrict pruning like that, then
this patch probably has merit.  But we can't evaluate the two issues
independently.

Another thing that would need to be considered, if we do want to
restrict when pruning happens, is whether it is worth introducing some
other path altogether for setting the all-visible bit.  Or perhaps we
should modify autovacuum's rules so that it will fire on relations for
which there might be lots of unmarked all-visible pages.

regards, tom lane




Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Peter Eisentraut
On Sun, 2012-12-16 at 10:20 +0100, Pavel Stehule wrote:
 Hello
 
 I understand the motivation, but the proposed syntax is not very intuitive
 or robust.
 
 Can you do it in one function and call the import only on the first call?

Sometimes, but it's even less intuitive and robust.





Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 When you do

 CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
 AS $$
 source code here
 $$;

 it internally creates a source file that contains

 ---
 def __plpython_procedure_foo_12345():
 source code here
 ---

 It would be useful to be able to do something like this instead:

 ---
 some code here

 def __plpython_procedure_foo_12345():
 some more code here
 ---

 This would especially be useful for placing imports into the first part.

Sure, but wouldn't it be cleaner to do that via some language-specific
syntax inside the function string?  I'm imagining some syntax like

CREATE FUNCTION ... AS $$
global[ some definitions here ]
function code here
$$;

where the PL would be responsible for pulling off the global chunk
and structuring what it outputs accordingly.
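To illustrate the kind of splitting a PL validator could do for such a syntax (a minimal sketch only — the `global[ ... ]` convention and the helper name are hypothetical, following Tom's example above):

```python
import re

def split_global_chunk(src):
    """Split a function body into (global_chunk, code), assuming the
    hypothetical 'global[ ... ]' convention sketched above.  The global
    chunk would be emitted at module level; the remainder becomes the
    function body, as today."""
    m = re.match(r'\s*global\[(.*?)\]\s*\n(.*)', src, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2)
    return '', src

print(split_global_chunk("global[ import x ]\nreturn x.f(1)\n"))
```

Sources without a global chunk pass through unchanged, so the scheme stays backward compatible.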

 CREATE FUNCTION already supports multiple AS items.  Currently, multiple
 AS items are rejected for all languages but C.  I'd imagine lifting that
 restriction and leaving it up to the validator to check it.  Then any
 language can accept two AS items if it wants and paste them together in
 whichever way it needs.  (The probin/prosrc naming will then become more
 obsolete, but it's perhaps not worth changing anything about that.)

I think doing it this way is a bad idea, mainly because (1) it won't
scale to more than two items (at least not without great rearrangement
of pg_proc) and (2) having two otherwise-unlabeled AS items isn't at all
understandable or readable.  For instance, which of the two is the
global part, and why?  The fact that C functions do it like that is a
legacy syntax we're stuck with, not a good model to copy for other
languages.

regards, tom lane




Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 07:37 AM, Peter Eisentraut wrote:

I'm going to use PL/Python as an example, but I would also like to know
if this could be applicable to other languages.

When you do

CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
AS $$
source code here
$$;

it internally creates a source file that contains

---
def __plpython_procedure_foo_12345():
 source code here
---

It would be useful to be able to do something like this instead:

---
some code here

def __plpython_procedure_foo_12345():
 some more code here
---

This would especially be useful for placing imports into the first part.
While you can have them in the function definition, that means they are
executed every time the function is called, which makes it much slower.
Also, future imports are not possible this way.

CREATE FUNCTION already supports multiple AS items.  Currently, multiple
AS items are rejected for all languages but C.  I'd imagine lifting that
restriction and leaving it up to the validator to check it.  Then any
language can accept two AS items if it wants and paste them together in
whichever way it needs.  (The probin/prosrc naming will then become more
obsolete, but it's perhaps not worth changing anything about that.)

So in practice this might look like this:

CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
AS $$
import x
import y
$$,
$$
real code here
$$;

Comments?

As an idea it seems quite good, but maybe the run-once part could use its
own keyword in the future, something like PREPARE or REQUIRE?

Or maybe WITH, to reuse a keyword:

CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
WITH -- this part is evaluated only once, in PLy_procedure_create
$$
import x
import y
$$
AS -- this is compiled in the same namespace as above
$$
 function body here
$$;

While at it, why not also fix the functions to be real functions
_with_ _real_ _arguments_, not arguments-passed-in-as-globals?

And at least we could then call such a function by its real name inside its own
module (its stored global namespace), so we could easily do recursion:

CREATE FUNCTION factorial(n bigint) returns bigint LANGUAGE plpythonu
AS $$
if n==0: return 1
return factorial(n-1) * n
$$;
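For reference, here is a simplified plain-Python sketch of the source munging PL/Python performs today (the real code is C; details are glossed over), which shows why the recursive call above fails:

```python
def wrap_plpython_body(name, oid, body):
    """Simplified sketch of what PL/Python does today: indent the body
    and wrap it in a def with a mangled name, so the SQL-level name
    (e.g. 'factorial') is never bound in the namespace."""
    indented = '\n'.join('\t' + line for line in body.splitlines())
    return 'def __plpython_procedure_%s_%d():\n%s' % (name, oid, indented)

src = wrap_plpython_body('factorial', 12345,
                         'if n == 0: return 1\nreturn factorial(n-1) * n')
ns = {'n': 2}   # arguments are passed in as globals
exec(src, ns)
# Calling ns['__plpython_procedure_factorial_12345']() raises NameError
# on the inner factorial() call - the recursion problem in question.
```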

--
Hannu






Re: [HACKERS] small pg_basebackup display bug

2012-12-16 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Sat, Dec 15, 2012 at 2:24 PM, Erik Rijkers e...@xs4all.nl wrote:
 That would make such truncation less frequent, and after all a truncated
 display is not particularly useful.

 Agreed - it's useful during testing, but not in typical production
 use. It might actually be more useful if it were truncated at the
 other end (keeping the last 30 instead of the first 30 chars).

+1 for truncating from the left.  I think pg_upgrade already does that
in its progress messages.
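A sketch of such left-truncation (illustration only — the real progress display is C code in pg_basebackup, and the width is an assumption):

```python
def truncate_left(name, width=30):
    """Keep the last `width` characters of a long name, marking the
    cut with '...' so the most distinguishing suffix survives."""
    if len(name) <= width:
        return name
    return '...' + name[-(width - 3):]

print(truncate_left('base/16384/some_rather_long_relation_filename.2'))
```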

regards, tom lane




Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 07:03 PM, Tom Lane wrote:

Peter Eisentraut pete...@gmx.net writes:

When you do
CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
AS $$
source code here
$$;
it internally creates a source file that contains
---
def __plpython_procedure_foo_12345():
 source code here
---
It would be useful to be able to do something like this instead:
---
some code here
def __plpython_procedure_foo_12345():
 some more code here
---
This would especially be useful for placing imports into the first part.

Sure, but wouldn't it be cleaner to do that via some language-specific
syntax inside the function string?  I'm imagining some syntax like

CREATE FUNCTION ... AS $$
global[ some definitions here ]
function code here
$$;

where the PL would be responsible for pulling off the global chunk
and structuring what it outputs accordingly.

I was going to suggest some special function name to be pulled out of the code
passed to CREATE FUNCTION, along the lines of:

CREATE FUNCTION foo(a,b,c) AS $$
import x
from __future__ import new_cool_feature

def helper_function(x):
   ...

def __pg_main__(a,b,c):
defined function body here

$$;

so that the whole text gets compiled into a module at the first call, and
__pg_main__ will be the function that gets called as foo(a,b,c) from
PostgreSQL,

but this would not be backwards compatible, at least not in any obvious way.

---
Hannu
 








Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 07:20 PM, Hannu Krosing wrote:

On 12/16/2012 07:03 PM, Tom Lane wrote:

Peter Eisentraut pete...@gmx.net writes:

When you do
CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
AS $$
source code here
$$;
it internally creates a source file that contains
---
def __plpython_procedure_foo_12345():
 source code here
---
It would be useful to be able to do something like this instead:
---
some code here
def __plpython_procedure_foo_12345():
 some more code here
---
This would especially be useful for placing imports into the first 
part.

Sure, but wouldn't it be cleaner to do that via some language-specific
syntax inside the function string?  I'm imagining some syntax like

CREATE FUNCTION ... AS $$
global[ some definitions here ]
function code here
$$;

where the PL would be responsible for pulling off the global chunk
and structuring what it outputs accordingly.
I was going to suggest some special function name to be pulled out of 
code

passed to CREATE FUNCTION in line with

CREATE FUNCTION foo(a,b,c) AS $$
import x
from __future__ import new_cool_feature

def helper_function(x):
   ...

def __pg_main__(a,b,c):
defined function body here

$$;

so that the whole text gets compiled into module at first call and the 
__pg_main__ will

be the function that gets called as foo(a,b,c) from postgresql
On further thought, the function name should just be the one it is defined
with in PostgreSQL, like this:


CREATE FUNCTION foo(a,b,c) AS $$
import x
from __future__ import new_cool_feature

def helper_function(x):
   ...

def foo(a,b,c):
defined function body here

def bar(i,j):
function body for bar(i,j)
$$ language plpythonu;

If the above definition saved the whole compiled unit as the module
pg_functions.foo, then we could define the PostgreSQL/PL/Python function
bar() by importing the same module:


CREATE FUNCTION bar(a,b,c) AS $$
from pg_functions.foo import bar
$$ language plpythonu;

It is not quite as simple as that, as we still need to find the source for
foo in case bar() gets called first and module foo is not yet saved, but this
could be one approach to having Python modules without introducing extra
syntax at the PostgreSQL level.

but this would not be backwards compatible, at least not in any 
obvious way.

This is still unfortunately true :(


Hannu





Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 On further thought the function name should just be what it is defined 
 in postgresql, like this

 CREATE FUNCTION foo(a,b,c) AS $$
  import x
  from __future__ import nex_cool_feature

  def helper_function(x):
 ...

  def foo(a,b,c):
  defined function body here

  def bar(i,j):
  function body for bar(i,j)
 $$ language plpythonu;

 but this would not be backwards compatible, at least not in any 
 obvious way.

 This is still unfortunately true :(

Could we say that *if* the function text contains a line beginning
def function_name then we interpret it as above, otherwise oldstyle?
I'm not sure how big a risk of false positives there'd be.
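A rough sketch of that heuristic (illustration only — whether it should require column 0, and how to handle `def` inside string literals, are open questions):

```python
import re

def looks_newstyle(funcname, src):
    """Treat the source as new-style iff some line begins (at column 0)
    with 'def <funcname>(' -- the heuristic being discussed."""
    pat = re.compile(r'^def\s+%s\s*\(' % re.escape(funcname), re.MULTILINE)
    return bool(pat.search(src))

looks_newstyle('factorial', 'return fact(n)\n')              # old-style body
looks_newstyle('factorial', 'def factorial(n):\n    ...\n')  # new-style body
```

Note that an old-style body which happens to define an inner helper named like the function at column 0 would be a false positive.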

regards, tom lane




Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 On further thought the function name should just be what it is defined 
 in postgresql, like this

 CREATE FUNCTION foo(a,b,c) AS $$
  def foo(a,b,c):

BTW, how well will that play with overloaded function names?  I don't
particularly care for saying that PL/Python fails if you overload a
function name across multiple schemas or argument lists ...

regards, tom lane




Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 07:44 PM, Tom Lane wrote:

Hannu Krosing ha...@2ndquadrant.com writes:

On further thought the function name should just be what it is defined
in postgresql, like this
CREATE FUNCTION foo(a,b,c) AS $$
def foo(a,b,c):

BTW, how well will that play with overloaded function names?  I don't
particularly care for saying that PL/Python fails if you overload a
function name across multiple schemas or argument lists ...

Currently each PL/Python function gets compiled in its own Python
module namespace, so this should not be a problem.

--
Hannu







Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 07:37 PM, Tom Lane wrote:

Hannu Krosing ha...@2ndquadrant.com writes:

On further thought the function name should just be what it is defined
in postgresql, like this
CREATE FUNCTION foo(a,b,c) AS $$
  import x
  from __future__ import new_cool_feature
  def helper_function(x):
 ...
  def foo(a,b,c):
  defined function body here
  def bar(i,j):
  function body for bar(i,j)
$$ language plpythonu;

but this would not be backwards compatible, at least not in any
obvious way.

This is still unfortunately true :(

Could we say that *if* the function text contains a line beginning
def function_name then we interpret it as above, otherwise oldstyle?
I'm not sure how big a risk of false positives there'd be.

One could be inclined to define a recursive function like this under the
current PL/Python:


CREATE FUNCTION factorial(n bigint) returns bigint LANGUAGE plpythonu
AS $$

def factorial(n):
if n==0: return 1
return factorial(n-1) * n

return factorial(n)

$$;

but at least for functions returning a non-null value, an old-style
definition usually ends with a line of the form

return something


Hannu





Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Andrew Dunstan


On 12/16/2012 01:37 AM, Peter Eisentraut wrote:

So in practice this might look like this:

CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
AS $$
import x
import y
$$,
$$
real code here
$$;




Bleah.

It seems obscure to say the least.

Why not have something along the lines of plperl's on_init setting to 
load libraries? Among other things that would give you the advantage of 
being able to preload them, and also of some consistency among PLs.


cheers

andrew




Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 08:44 PM, Andrew Dunstan wrote:


On 12/16/2012 01:37 AM, Peter Eisentraut wrote:

So in practice this might look like this:

CREATE FUNCTION foo(...) ... LANGUAGE plpythonu
AS $$
import x
import y
$$,
$$
real code here
$$;




Bleah.

It seems obscure to say the least.

Why not have something along the lines of plperl's on_init setting to 
load libraries? Among other things that would give you the advantage 
of being able to preload them, and also of some consistency among PLs.




While plpython.on_init is a much-needed feature, it is orthogonal to
what is discussed here.

AIUI, Peter's proposal aimed at adding per-function preparation /
initialisation, not something to be run when initialising the whole
interpreter.

Also - to make plpython.on_init really useful - there should be some way
to have Python _modules_ inside PostgreSQL.

If we redefined PL/Python functions to define their own _visible_ modules
(currently each has its own module, but there is no way to reference it from
others) and stored the function in this module, we could not only solve the
problem of PL/Python modules but also allow calling other PL/Python functions
directly from Python.

For example, doing

CREATE FUNCTION foo(i int) RETURNS int LANGUAGE plpythonu AS $$
def foo(i):
return i+1
$$;

would also make this function available as plpy.modules.foo_int.foo (meaning
its global namespace would be saved as plpy.modules.foo_int);

then other plpy functions could call it directly by doing

from plpy.modules.foo_int import foo

I'll try to come up with a more detailed proposal along these lines.
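A toy sketch of that registration in plain Python (the helper name and the plpy.modules.foo_int path are hypothetical, following the example above — the real implementation would live inside PL/Python's C code):

```python
import sys
import types

def register_function_module(qualname, namespace):
    """Publish a compiled function's global namespace as an importable
    module (e.g. plpy.modules.foo_int), creating parent package entries
    in sys.modules so normal import syntax works."""
    parts = qualname.split('.')
    parent = None
    for i in range(1, len(parts) + 1):
        name = '.'.join(parts[:i])
        mod = sys.modules.setdefault(name, types.ModuleType(name))
        if parent is not None:
            setattr(parent, parts[i - 1], mod)   # wire up package attribute
        parent = mod
    mod.__dict__.update(namespace)               # leaf gets the function's globals
    return mod

ns = {}
exec("def foo(i):\n    return i + 1", ns)
register_function_module('plpy.modules.foo_int', ns)

from plpy.modules.foo_int import foo   # now works like a normal import
```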

--
Hannu









Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Peter Eisentraut
On Sun, 2012-12-16 at 19:13 +0100, Hannu Krosing wrote:
 As an idea seems quite good, but maybe the run once part could use
 its
 own keyword in the future, something like PREPARE or REQUIRE?

Well, either we do it in a language independent way, in which case this
would be too prescriptive, or we do it in a Python-specific way (less
likely), but prepare or require are not Python concepts.

 WHile at it, why not also fix the functions to be real function
 _with_ _real_ _arguments_ , not arguments-passed-in-as-globals
 
 and at least we could call this function with its real name inside its
 own module
 (stored global namespace) so we could easily do recursion
 
 CREATE FUNCTION factorial(n bigint) returns bigint LANGUAGE plpythonu
 AS $$
  if n==0: return 1
  return factorial(n-1) * n
 $$;
 
These are also good things to fix, but are they related?  Could they not
be fixed independently?





Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Peter Eisentraut
On Sun, 2012-12-16 at 13:03 -0500, Tom Lane wrote:
 Sure, but wouldn't it be cleaner to do that via some language-specific
 syntax inside the function string?  I'm imagining some syntax like
 
 CREATE FUNCTION ... AS $$
 global[ some definitions here ]
 function code here
 $$;
 
 where the PL would be responsible for pulling off the global chunk
 and structuring what it outputs accordingly.

But then the language text wouldn't be Python anymore.





Re: [HACKERS] WIP: index support for regexp search

2012-12-16 Thread Alexander Korotkov
On Fri, Dec 14, 2012 at 1:34 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Mon, Dec 3, 2012 at 4:31 PM, Alexander Korotkov 
  aekorot...@gmail.com wrote:

 Actually, I generally dislike path matrix for same reasons. But:
 1) Output graphs could contain trigrams which are completely useless for
 search. For example, for regex /(abcdefgh)*ijk/ we need only ijk trigram
  while the graph would contain much more. The path matrix is a method to get rid of
 all of them.
 2) If we use color trigrams then we need some criteria for which color
 trigrams to expand into trigrams. Simultaneously, we shouldn't allow path
 from initial state to the final by unexpanded trigrams. It seems much
 harder to do with graph than with matrix.


 Now, I have an idea about doing some not comprehensive but simple and fast
 simplification of graph. I'm doing experiments now. In case of success we
 could get rid of path matrix.


The attached patch has the following changes:
1) Postpone expansion of colors. The graph is built on color trigrams.
2) Selective expansion of color trigrams into simple trigrams. All
non-expanded color trigrams are removed. Such removal leads to a union of all
state pairs connected by the corresponding arcs. Naturally, this must not lead
to a union of the initial and final states: that would render all the previous
work senseless.

--
With best regards,
Alexander Korotkov.


trgm-regexp-0.8.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Doc patch, further describe and-mask nature of the permission system

2012-12-16 Thread Karl O. Pinc
On 12/16/2012 12:56:22 AM, Peter Eisentraut wrote:
 On Mon, 2012-12-10 at 20:48 -0600, Karl O. Pinc wrote:
  On 11/14/2012 02:35:54 PM, Karl O. Pinc wrote:
   On 11/13/2012 08:50:55 PM, Peter Eisentraut wrote:
On Sat, 2012-09-29 at 01:16 -0500, Karl O. Pinc wrote:
 This patch makes some sweeping statements.

Unfortunately, they are wrong.
   
   I will see if anything can be salvaged.
  
  Here's another try.
  (I bundled changes to both paragraphs into a single
  patch.)
  
  grants-of-roles-are-additive_v3.patch
 
 I don't get the point of this change, especially why you are trying 
 to
 liken the roles system to the object hierarchy, when they are clearly
 different and unrelated.

It seems to me that the permission system follows the object system
hierarchy in those cases where different levels of the object
hierarchy may have identical permissions.  The exceptions are
permissions like USAGE, which seems to be a convenient common lexical
token but means (and needs to mean) something entirely different
at each level of the object hierarchy.  ALL also confuses the
issue, since it means all permissions which work at this level
of the object hierarchy, not all permissions; so, say,
granting ALL on a database says nothing about INSERT permission.

I'm (clearly) not steeped in the pg permission system, but it
does seem that where permissions are shared between levels
of the object hierarchy there is a consistency in the
resulting interaction when granting/revoking at different
levels of the object hierarchy.  Perhaps this is ipso facto
(counterexamples being automatically designated as
not shared by nature of the premise :)
or perhaps more an artifact of my attention than the
result of any sort of design.  Anyway, my intent is to point
out this consistency.  Since the way permissions set at different
levels of the object hierarchy interact is sometimes useful, I go on
to describe how to replicate the behavior and apply it outside the
object hierarchy.

In any case I thought the elaboration would be helpful.  
I had a few minutes and cooked it up.  If you don't think
it should go in then reject it.  As noted already in the
docs, permissions are different at different levels of the
object hierarchy, but similar enough to describe in one place.
I was hoping to provide a possible framework for thinking
about permission interactions between object hierarchy levels 
where such occur.  Without any sort of framework everything
becomes a special case and it's hard to keep track of.

Thanks for spending time on it.  If there's anything about
it that appeals then I will continue to work under
your direction.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein






Re: [HACKERS] multiple CREATE FUNCTION AS items for PLs

2012-12-16 Thread Hannu Krosing

On 12/16/2012 10:23 PM, Peter Eisentraut wrote:

On Sun, 2012-12-16 at 19:13 +0100, Hannu Krosing wrote:

As an idea seems quite good, but maybe the run once part could use
its
own keyword in the future, something like PREPARE or REQUIRE?

Well, either we do it in a language independent way, in which case this
would be too prescriptive, or we do it in a Python-specific way (less
likely), but prepare or require are not Python concepts.

The proposed keyword names are meant to be language-independent
and to signify the part that is meant for initialisation or
requirements.


The multiple AS $$...$$ sections have to mean something to be useful at all.

My final choice of WITH seems to fit the run-once/require/init meaning,
and it is already a keyword.

But I ended up much preferring the approach of putting a whole module in
the function's source code and returning, as the PL/Python function, the
item matching the defined function name - which can be a function or any
other callable.

The main problem is staying backwards compatible with the existing
implementation.



WHile at it, why not also fix the functions to be real function
_with_ _real_ _arguments_ , not arguments-passed-in-as-globals

and at least we could call this function with its real name inside its
own module
(stored global namespace) so we could easily do recursion

CREATE FUNCTION factorial(n bigint) returns bigint LANGUAGE plpythonu
AS $$
  if n==0: return 1
  return factorial(n-1) * n
$$;


These are also good things to fix, but are they related?  Could they not
be fixed independently?
They could, but fixing these together will probably result in a cleaner 
design :)


Even with your original multiple-code-strings design you end up manipulating
function-global namespaces (which seem really close to modules) to put the
first $$...$$ there as run-once, pre-def code.

Using the function's real name (instead of __plpython_funcname_oid) in its
module namespace is a one-line fix, but to be really useful the mess with
arguments-as-globals needs to be rectified.

If we move to the function-code-as-module approach we will no longer need to
munge the code (adding 'def ...' before it and '\t' at the beginning of each
line), which makes everything much cleaner.

The main thing to solve is the different model for passing function
arguments at call time.


---
Hannu





Re: [HACKERS] Add big fat caution to pg_restore docs regards partial db restores

2012-12-16 Thread Karl O. Pinc
On 12/16/2012 12:51:06 AM, Peter Eisentraut wrote:

 I'm going to set this patch as returned with feedback for now.

Ok.  At this point I don't have a vision for improving it
so it might sit there untouched.   Maybe someone else
will step forward and make it better.

Regards,

Karl k...@meme.com
Free Software:  You don't pay back, you pay forward.
 -- Robert A. Heinlein





Re: [HACKERS] PATCH: optimized DROP of multiple tables within a transaction

2012-12-16 Thread Tomas Vondra
Hi,

I've updated the patch to include the optimization described in the
previous post, i.e. if the number of relations is below a certain
threshold, use a simple for loop, for large numbers of relations use
bsearch calls.

This is done by a new constant BSEARCH_LIMIT, which is set to 10 in the
patch. Then I've modified the 'drop-test' script to take yet another
argument - number of indexes for each table. So for example this

./drop-test.py 1 100 3 'dbname=test'

means create 1 tables, 3 indexes for each of them, and then drop
them in batches of 100 tables.

Then I've run the test with 0, 1, 2, ... 11 indexes for each table for

(a) unpatched HEAD
(b) patch v3.1 (without the optimization)
(c) patch v3.3 (with BSEARCH_LIMIT=10)

and I got these results:


1) dropping one-by-one
--

This is the really interesting part - the issue with the v3.1 is that
for a single table, it's ~2x slower than unpatched PostgreSQL.

 0   1   2   3   4   5   6   789   10   11
--
unpatched   16  28  40  52  63  75  87  99  110  121  135  147
 v3.1   33  43  46  56  58  60  63  72   75   76   79   80
 v3.3   16  20  23  25  29  33  36  40   44   47   79   82

The values are durations in seconds, rounded to integer values. I've run
the test repeatedly and there's very small variance in the numbers.

The v3.3 improves that and it's actually even faster than unpatched
PostgreSQL. How can this happen? I believe it's because the code is
rewritten from

   for each relation (r) in the drop list
       DropRelFileNodeAllBuffers(r)
           for each shared buffer
               check and invalidate

to

   copy the relations from drop list to array (a)
   DropRelFileNodeAllBuffers(a)
       for each shared buffer
           for each relation (r) in the array
               check and invalidate

At least that's the only explanation I was able to come up with.
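That explanation matches a toy model of the loop inversion (plain Python standing in for the C shared-buffer scan; relation tags are just integers here, and the counters tally buffer-header visits):

```python
import bisect

def drop_one_by_one(buffers, drop_list):
    """Original pattern: one full scan of the buffer array per dropped
    relation -- O(buffers * relations) buffer-header visits."""
    visits = 0
    for rel in drop_list:
        for i, tag in enumerate(buffers):
            visits += 1
            if tag == rel:
                buffers[i] = -1          # invalidate
    return visits

def drop_batched(buffers, drop_list):
    """Patched pattern: a single scan, probing a sorted array of the
    dropped relations (bsearch in the C patch) for each buffer."""
    dropped = sorted(drop_list)
    visits = 0
    for i, tag in enumerate(buffers):
        visits += 1
        j = bisect.bisect_left(dropped, tag)
        if j < len(dropped) and dropped[j] == tag:
            buffers[i] = -1              # invalidate
    return visits

bufs = [i % 50 for i in range(200)]      # toy "shared buffers" tagged by relation
print(drop_one_by_one(list(bufs), [3, 7, 42]))   # 600 buffer visits
print(drop_batched(list(bufs), [3, 7, 42]))      # 200 buffer visits
```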

Yet another interesting observation is that even the v3.1 is about as
fast as the unpatched code once there are 3 or more indexes (or TOAST
tables).

So my opinion is that the optimizated patch works as expected, and that
even without the optimization the performance would be acceptable for
most real-world cases.

2) dropping in transaction
--

This is mostly to verify that the code did not break anything, because
the optimization should not kick-in in this case at all. And that seems
to be the case:

 0   1   2   3   4   5   6   789   10   11
--
unpatched   13  24  35  46  58  69  81  92  103  115  127  139
 v3.13   5   7   8  10  12  14  15   16   18   20   21
 v3.33   4   6   7   8  10  11  13   14   15   18   20

The differences between v3.1 and v3.3 are mostly due to rounding etc.


Attached is the v3.3 patch and the testing script I've been using for
the tests above. Feel free to run the tests on your hardware, with your
hardware, shared buffers size etc. I've run that on a 4-core i5-2500 CPU
with 2GB shared buffers.

Tomas


#!/bin/env python

import datetime
import psycopg2
import sys

if __name__ == '__main__':
	
	if len(sys.argv) < 5:
		print "ERROR: not enough parameters"
		print "HINT:  drop-test.py num-of-tables drop-num num-of-indexes 'connection string'"
		sys.exit(1)
	
	ntables = int(sys.argv[1])
	nlimit  = int(sys.argv[2])
	nindex  = int(sys.argv[3])
	connstr = str(sys.argv[4])
	debug   = False
	
	conn = psycopg2.connect(connstr)
	cur  = conn.cursor()
	
	# print 'creating %s tables' % (ntables,)
	start = datetime.datetime.now()
	for i in range(ntables):
		cur.execute('CREATE TABLE tab_%s (id INT)' % (i,))
		for j in range(nindex):
			cur.execute('CREATE INDEX idx_%s_%s ON tab_%s(id)' % (i,j,i))
		conn.commit();
		if (i % 1000 == 0) and debug:
			print '  tables created: %s' % (i,)
	conn.commit()
	end = datetime.datetime.now()
	# print '  all tables created in %s seconds' % ((end-start).total_seconds(),)

	# set to autocommit mode
	conn.autocommit = True
	start = datetime.datetime.now()
	# print 'dropping %s tables one by one ...' % (ntables,)
	for i in range(ntables):
		cur.execute('DROP TABLE tab_%s' % (i,))
		if (i % 1000 == 0) and debug:
			print '  tables dropped: %s' % (i,)

	end = datetime.datetime.now()
	print 'dropped one-by-one in %s seconds' % ((end-start).total_seconds(),)

	# cancel the autocommit mode
	conn.autocommit = False
	
	# recreate tables
	# print 'creating %s tables' % (ntables,)
	start = datetime.datetime.now()
	for i in range(ntables):
		cur.execute('CREATE TABLE tab_%s (id INT)' % (i,))
		for j in range(nindex):
			cur.execute('CREATE INDEX idx_%s_%s ON tab_%s(id)' % (i,j,i))
		conn.commit();
		if (i % 1000 == 0) and debug:
			print '  tables created: %s' % (i,)

	conn.commit()
	end = datetime.datetime.now()
	# print '  all tables created in %s seconds' % ((end-start).total_seconds(),)
	
	# drop the tables in batches
	

Re: [HACKERS] too much pgbench init output

2012-12-16 Thread Tomas Vondra
Hi,

attached is a new version of the patch that

(a) converts the 'log_step_seconds' variable to a constant (and does
not allow changing it using a command-line option etc.)

(b) keeps the current logging as a default

(c) adds a -q switch that enables the new logging with a 5-second
interval

I'm still not convinced there should be yet another knob for tuning the
log interval - opinions?
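The fixed 5-second interval is simple to implement; the following is an illustration of the approach rather than the patch itself (the function and variable names here are made up):

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#define LOG_STEP_SECONDS 5      /* constant, not a command-line knob */

/*
 * Emit a progress line at most once per LOG_STEP_SECONDS, no matter how
 * fast rows are being loaded.  *last_logged carries the time of the
 * previous message between calls; returns 1 if a line was printed.
 */
static int
maybe_log_progress(long done, long total, time_t *last_logged)
{
    time_t now = time(NULL);

    if (now - *last_logged < LOG_STEP_SECONDS)
        return 0;               /* too soon, stay quiet */

    fprintf(stderr, "%ld of %ld tuples (%d%%) done\n",
            done, total, (int) (done * 100 / total));
    *last_logged = now;
    return 1;
}
```

The point of gating on elapsed time instead of row count is that the number of log lines no longer depends on the scale factor or the speed of the machine.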


Tomas

On 11.12.2012 10:23, Jeevan Chalke wrote:
 
 
 
 On Sun, Dec 9, 2012 at 8:11 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
 On 20.11.2012 08:22, Jeevan Chalke wrote:
  Hi,
 
 
   On Tue, Nov 20, 2012 at 12:08 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  On 19.11.2012 11:59, Jeevan Chalke wrote:
   Hi,
  
   I gone through the discussion for this patch and here is my
 review:
  
   The main aim of this patch is to reduce the number of log
 lines. It is
   also suggested to use an options to provide the interval but
 few of us
   are not much agree on it. So final discussion ended at
 keeping 5 sec
   interval between each log line.
  
   However, I see, there are two types of users here:
   1. Who likes these log lines, so that they can troubleshoot some
   slowness and all
   2. Who do not like these log lines.
 
  Who likes these lines / needs them for something useful?
 
 
  No idea. I fall in second category.
 
  But from the discussion, I believe some people may need detailed
 (or lot
  more) output.
 
 I've read the thread again and my impression is that no one really needs
 or likes those lines, but
 
   (1) it's rather pointless to print a message every 100k rows, as it
    usually fills the console with garbage
 
   (2) it's handy to have regular updates of the progress
 
 I don't think there are people (in the thread) who require keeping the
 current amount of log messages.
 
 But there might be users who actually use the current logs to do
 something (although I can't imagine what). If we want to do this in a
 backwards compatible way, we should probably use a new option (e.g.
 -q) to enable the new (less verbose) logging.
 
 Do we want to allow both types of logging, or shall we keep only the new
 one? If both, which one should be the default one?
 
 
 Both the options are fine with me, but the default should be the current
 behaviour.
 
 
   So keeping these in mind, I rather go for an option which
 will control
   this. People falling in category one can set this option to
 very low
   where as users falling under second category can keep it high.
 
  So what option(s) would you expect? Something that tunes the
 interval
  length or something else?
 
 
  Interval length.
 
 Well, I can surely imagine something like --interval N.
 
 
 +1
  
 
 
  A switch that'd choose between the old and new behavior might
 be a good
  idea, but I'd strongly vote against automagic heuristics. It
 makes the
  behavior very difficult to predict and I really don't want to
 force the
  users to wonder whether the long delay is due to general
 slowness of the
  machine or some clever logic that causes long delays between log
  messages.
 
  That's why I choose a very simple approach with constant time
 interval.
  It does what I was aiming for (less logs) and it's easy to
 predict.
  Sure, we could choose different interval (or make it an option).
 
 
  I am preferring an option for choosing an interval, say from 1
 second to
  10 seconds.
 
 U, why not allow an arbitrary integer? Why say 1 to 10 seconds?
 
 
 Hmm.. actually, I have no issues with any number there. I just suggested
 1..10 since we hard-coded it to 5. No particular reason as such.
  
 
 
  BTW, what if, we put one log message every 10% (or 5%) with time taken
  (time taken for last 10% (or 5%) and cumulative) over 5 seconds ?
  This will have only 10 (or 20) lines per pgbench initialisation.
  And since we are showing time taken for each block, if any slowness
  happens, one can easily find a block by looking at the timings and
  troubleshoot it.
  Though 10% or 5% is again a debatable number, but keeping it constant
  will eliminate the requirement of an option.
 
 That's what I originally proposed in September (see the messages from
 17/9), and Alvaro was not really excited about this.
 
 Attached is a patch with fixed whitespace / indentation errors etc.
 Otherwise it's the same as the previous version.
 
 
 OK. Looks good now.
 
 Any other views / suggestions are welcome.
 
 Thanks
 
 

Re: [HACKERS] Adjusting elog behavior in bootstrap/standalone mode

2012-12-16 Thread Noah Misch
On Fri, Dec 14, 2012 at 04:20:30PM -0500, Tom Lane wrote:
 /* Determine whether message is enabled for server log output */
 if (IsPostmasterEnvironment)
 output_to_server = is_log_level_output(elevel, log_min_messages);
 else
 /* In bootstrap/standalone case, do not sort LOG out-of-order */
 output_to_server = (elevel >= log_min_messages);
 
 In view of the confusion this caused just now, I wondered if we shouldn't
 get rid of the special case and always follow the is_log_level_output
 rule.  I tried modifying the code that way, and soon found that it made
 initdb rather noisy:

I can see why it was done that way, but agreed.

 creating configuration files ... ok
 creating template1 database in /home/postgres/data/base/1 ... LOG:  bogus 
 data in postmaster.pid
 LOG:  database system was shut down at 2012-12-14 15:55:35 EST
 LOG:  shutting down
 LOG:  database system is shut down

 One possibility is to tweak the elog call sites for these specific
 messages so that they are, say, NOTICE not LOG level when not
 IsPostmasterEnvironment.  That seems like a bit of a hack, but
 I don't see another answer that doesn't involve behind-the-scenes
 decisions in elog.c ... which is exactly what I want to get rid of.

Your proposed hack seems decent.  The second and third lines, in particular,
are rather useless here despite their utility under normal logging.
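For reference, the special case being debated is the LOG-promotion logic in is_log_level_output(). A simplified sketch (COMMERROR handling omitted; the severity values mirror elog.h but are reproduced here only for illustration):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified severity ordering, mirroring elog.h */
enum { DEBUG1 = 14, LOG = 15, INFO = 17, NOTICE = 18,
       WARNING = 19, ERROR = 20, FATAL = 21, PANIC = 22 };

/*
 * For server-log purposes LOG is promoted to sort just below FATAL, so a
 * plain "elevel >= log_min_messages" test gives different answers for
 * LOG-level messages than this function does.
 */
static bool
is_log_level_output(int elevel, int log_min_level)
{
    if (elevel == LOG)
    {
        if (log_min_level == LOG || log_min_level <= ERROR)
            return true;
    }
    else if (log_min_level == LOG)
    {
        /* elevel != LOG: only FATAL and above get through */
        if (elevel >= FATAL)
            return true;
    }
    else if (elevel >= log_min_level)
        return true;

    return false;
}
```

This is why the bootstrap/standalone branch quoted above behaves differently: with the plain numeric comparison, LOG-level messages are suppressed whenever log_min_messages is stricter than LOG.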


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unresolved error 0xC0000409 on Windows Server

2012-12-16 Thread Noah Misch
On Sun, Dec 09, 2012 at 02:09:21PM -0500, Matthew Gerber wrote:
 In this situation, st_transform throws an internal_error, which my
 function catches and returns NULL for. The error / crash is not caused by a
 NULL argument; rather, it is caused by the final value in the attached
 script's INSERT statement, which contains a lat/lon pair that is beyond
 PostGIS's range. I'm not questioning whether this value is actually outside
 the legal range, but I do not think such an input should cause the server
 to crash completely.

The server should not crash, no.  However, the facts that PostGIS reported an
internal error and that the crash is responsive to your choice of geographic
inputs increase the chance that the problem lies in PostGIS code, not
PostgreSQL core code.

 Here are the steps to reproduce the crash:
 
 1) Create a new instance of a 9.2 server (Windows 64-bit), and a new
 database (call it test) with the PostGIS extension.
 
 2) Run the script:
 
 psql -U postgres -d test -f C:\server_crash.sql
 
 You should see the following:
 
 psql:C:/server_crash.sql:31: server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.
 psql:C:/server_crash.sql:31: connection to server was lost
 
 3) Check your log for the error.

I tried this test procedure, but I could not reproduce the crash.  PostgreSQL:
one click installer postgresql-9.2.2-1-windows-x64.exe; PostGIS: v2.0.1 from
Stack Builder; OS: Windows Server 2008 R2 Datacenter.  I needed the workaround
in the last comment of this[1] bug report to get a working installation.  At
that point, your test procedure completes without error.  What specific
versions are involved in your installation?

Could you try again to get a minidump and stack trace?  Connect to your test
database with psql; run SELECT pg_backend_pid(); then open Visual Studio; select
Debug -> Attach to process...; select the postgres.exe process with matching
ID.  Run your test case; when the exception window pops up, select Break.
If the stack trace does not contain full symbol information, right-click on
some of the incomplete lines and select Load Symbols From -> Symbol Path;
navigate to the location of postgres.pdb.  You can select Save Dump As...
from the Debug menu to create the minidump.

Thanks,
nm

[1] http://trac.osgeo.org/postgis/ticket/1824


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange errors from 9.2.1 and 9.2.2 (I hope I'm missing something obvious)

2012-12-16 Thread Dan Scott
On Dec 11, 2012 9:28 PM, David Gould da...@sonic.net wrote:

 Thank you. I got the example via cut and paste from email and pasted it
 into psql on different hosts. od tells me it ends each line with:

   \n followed by 0xC2 0xA0 and then normal spaces. The C2A0 thing is
   apparently NO-BREAK SPACE. Invisible, silent, odorless but still deadly.

 Which will teach me not to accept text files from the sort of people who
 write code in Word I guess.

It's not just Word... I was bitten by this last week by a WYSIWYG HTML
widget I was using to write some documentation. When I copied the examples
I had created out of said environment during a final technical accuracy
pass and they failed to run in psql, I panicked for a few minutes.

I eventually determined that, rather than just wrapping my code in <pre>
tags, the widget had created &nbsp; entities that were faithfully converted
into Unicode non-breaking spaces in the psql input.
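A quick way to catch these before psql does is to scan the input for U+00A0 directly; a minimal sketch:

```python
def find_nonbreaking_spaces(text):
    """Return 1-based (line, column) pairs for every U+00A0 in `text`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == '\u00a0':
                hits.append((lineno, col))
    return hits

sql = 'SELECT 1;\n\u00a0 SELECT 2;'
print(find_nonbreaking_spaces(sql))   # [(2, 1)]
```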