Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Heikki Linnakangas

On 26.02.2011 07:55, Tom Lane wrote:

So we really need some refactoring here.  I dislike adding another
fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
but there may not be a better way.


Could you keep the sequence unchanged for non-EXPLAIN callers with some 
refactoring? Add an exposed function like ExecutorFinishRun() that 
Explain calls explicitly in the EXPLAIN ANALYZE case, and modify 
ExecutorEnd to call it too, if it hasn't been called yet and the 
explain-only flag isn't set.
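
Something like this, as a rough sketch (ExecutorFinishRun, es_finished and
es_top_eflags are names I'm making up for illustration, not existing
executor fields):

void
ExecutorFinishRun(QueryDesc *queryDesc)
{
    EState *estate = queryDesc->estate;

    if (estate->es_finished)
        return;        /* EXPLAIN ANALYZE already called us explicitly */

    /* run ModifyTable nodes in wCTEs to completion, etc. */

    estate->es_finished = true;
}

void
ExecutorEnd(QueryDesc *queryDesc)
{
    /* non-EXPLAIN callers see no API change */
    if (!queryDesc->estate->es_finished &&
        !(queryDesc->estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY))
        ExecutorFinishRun(queryDesc);

    /* ... existing ExecutorEnd processing ... */
}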


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pl/python tracebacks

2011-02-26 Thread Jan Urbański
- Original message -
 On Thu, Feb 24, 2011 at 9:03 AM, Jan Urbański wulc...@wulczer.org
 wrote:
  On 24/02/11 14:10, Peter Eisentraut wrote:
  Hm, perhaps, I put it in the details, because it sounded like the place
  to put information that is not that important, but still helpful. It's
  kind of natural to think of the traceback as the detail of the error
  message. But if you prefer context, I'm fine with that. You want me to
  update the patch to put the traceback in the context?
 
 I don't see a response to this question from Peter, but I read his
 email to indicate that he was hoping you'd rework along these lines.

I can do that, but not until Monday evening.

Jan



Re: [HACKERS] pl/python explicit subtransactions

2011-02-26 Thread Jan Urbański
- Original message -
 On Wed, Feb 9, 2011 at 5:22 PM, Peter Eisentraut pete...@gmx.net wrote:
  On Tue, 2011-02-08 at 00:32 -0500, Steve Singer wrote:
   The documentation could probably still use more word-smithing but
   that can happen later.  I'm marking this as ready for a committer.
  
  Is it necessarily a good idea that an explicit subtransaction disables
  the implicit sub-subtransactions?  It might be conceivable that you'd
  still want to do some try/catch within explicit subtransactions.
 
 Is this still an open question, or what is the remaining issue that
 needs to be addressed with regards to this patch?

The docs are included in the latest patch, and it turned out that disabling 
implicit subxacts inside explicit subxacts is not a good idea, so it's been 
fixed in the last patch. There are no unresolved issues AFAICT.

Jan



Re: [HACKERS] pl/python do not delete function arguments

2011-02-26 Thread Jan Urbański
- Original message -
 On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański wulc...@wulczer.org
 wrote:
  On 15/02/11 20:39, Peter Eisentraut wrote:
   On Tue, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
[a bug that we don't know how to fix]

 From this discussion I gather that we have a problem here that we
 don't exactly know how to fix, so I'm inclined to suggest that we mark
 this Returned with Feedback in the CommitFest and instead add it to
 the TODO.   Since this is a pre-existing bug and not a new regression,
 it should not be something we hold up beta for.

I'm officially at a loss on how to fix that bug without some serious gutting of 
how PL/Python arguments work. If someone comes up with a brilliant way to solve 
this problem, we can commit it after beta, or even during the 9.2 cycle (should 
the brilliant solution be backpatcheable).

Jan



Re: [HACKERS] Keywords in pg_hba.conf should be field-specific

2011-02-26 Thread Brendan Jurd
On 26 February 2011 18:06, Bruce Momjian br...@momjian.us wrote:

 Any progress on this?


I ended up doing most of the work, but never got around to finishing
it off.  Thanks for the reminder, though.  I'll get that one ready and
drop it onto the next CF.

Cheers,
BJ



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Grzegorz Jaskiewicz

On 25 Feb 2011, at 13:18, Robert Haas wrote:

  People coming from Oracle are not favorably
 impressed either by the amount of monitoring data PostgreSQL can
 gather or by the number of knobs that are available to fix problems
 when they occur.  We don't need to have as many knobs as Oracle and we
 probably don't want to, and for that matter we probably couldn't if we
 did want to for lack of manpower, but that doesn't mean we should have
 none.

Still, having more data a user can probe would be nice.

I wonder why everyone avoids Microsoft's approach to the subject. Apparently,
they go in the 'auto-tune as much as possible' direction.
And tests we did a while ago, asking a team from Microsoft and a team from
Oracle to optimise a set of queries for the same set of data (bookies' data,
loads of it), showed that the auto-tuning Microsoft has in their SQL Server
performed much better than a team of over-sweating Oracle DBAs.

In my current workplace we have many deployments of the same system, over
different types of machines, each with different customer data that varies so
much that queries need to be rather generic.
PostgreSQL shows its strength with the planner doing a good job for different
variants of data, yet we do very little tweaking of the configuration
parameters, simply because it is too hard to keep track of all of them.
I guess the systems could behave much better, but no one is going to tweak
settings for 50 different installations over 50 different types of data and 50
different sets of hardware.
If there was even a tiny amount of automation provided in PostgreSQL, I would
welcome it with open arms.






Re: [HACKERS] wCTE: about the name of the feature

2011-02-26 Thread Marko Tiikkaja

On 2011-02-26 4:41 AM +0200, Tom Lane wrote:

Marko Tiikkaja marko.tiikk...@cs.helsinki.fi  writes:

One thing bothers me though: what was the reason for requiring a
RETURNING clause for data-modifying statements in WITH?


That test was in your patch, no?  I moved the code to another place
but it's still enforcing the same thing, namely that you can't reference
the output of an INSERT/UPDATE/DELETE that hasn't got RETURNING.


Oh, right.  I misunderstood.


Regards,
Marko Tiikkaja



Re: [HACKERS] Keywords in pg_hba.conf should be field-specific

2011-02-26 Thread Magnus Hagander
On Sat, Feb 26, 2011 at 10:57, Brendan Jurd dire...@gmail.com wrote:
 On 26 February 2011 18:06, Bruce Momjian br...@momjian.us wrote:

 Any progress on this?


 I ended up doing most of the work, but never got around to finishing
 it off.  Thanks for the reminder, though.  I'll get that one ready and
 drop it onto the next CF.

Just as a point, which is probably moot if you've done most of the work
already :) But at some point it might be worthwhile to see if our life
would be made easier by using flex (with or without bison) to parse
pg_hba.conf as well, since we do it for postgresql.conf and other
things. If it makes maintenance easier, it's probably worth it.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Greg Stark
On Sat, Feb 26, 2011 at 5:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So we really need some refactoring here.  I dislike adding another
 fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
 but there may not be a better way.  The only way I see to fix this
 without changing that API is to have ExecutorRun do the cleanup
 processing just after the top plan node returns a null tuple, and that
 seems a bit ugly as well.


How would that handle the case of a cursor which isn't read to
completion? Should it still execute the CTEs to completion?

-- 
greg



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Rod Taylor
On Fri, Feb 25, 2011 at 14:26, Alvaro Herrera alvhe...@commandprompt.com wrote:

 Excerpts from Rod Taylor's message of Fri Feb 25 14:03:58 -0300 2011:

  How practical would it be for analyze to keep a record of response times
 for
  given sections of a table as it randomly accesses them and generate some
  kind of a map for expected response times for the pieces of data it is
  analysing?

 I think what you want is random_page_cost that can be tailored per
 tablespace.


Yes, that can certainly help but does nothing to help with finding typical
hot-spots or cached sections of the table and sending that information to
the planner.

Between ANALYZE's random sampling and perhaps some metric gathered during the
actual IO of queries, we should be able to determine and record which pieces of
data tend to be hot/in cache, or readily available, and which data tends not
to be.


If the planner knew that the value 1 tends to have a much lower cost to
fetch than any other value in the table (it is cached or otherwise readily
available), it can choose a plan better suited toward that.


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Dean Rasheed
On 26 February 2011 05:55, Tom Lane t...@sss.pgh.pa.us wrote:
 Further experimentation has reminded me of why I didn't want to put such
 processing in ExecutorEnd :-(.  There are some nasty interactions with
 EXPLAIN:

 1. EXPLAIN ANALYZE fails to include the execution cycles associated with
 running the ModifyTable nodes to completion.  In the worst case, such as
 WITH t AS (INSERT ...) SELECT 1, it will claim the INSERT subplan is
 never executed, even though rows certainly got inserted.  This is
 because EXPLAIN extracts all the counts from the execution state tree
 before shutting it down with ExecutorEnd.

 2. But it gets worse.  Try the same query *without* ANALYZE.  You'll
 find the INSERT executes anyway!  That's because EXPLAIN still calls
 ExecutorEnd to clean up the execution state tree, and ExecutorEnd
 doesn't realize it's not supposed to run any of the plan.


There's a third problem: AfterTriggerEndQuery() is called before
ExecutorEnd(), and so if the post-processing is done in ExecutorEnd()
and it attempts to queue up any AFTER triggers, it fails (ERROR:
AfterTriggerSaveEvent() called outside of query).


 So we really need some refactoring here.  I dislike adding another
 fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
 but there may not be a better way.  The only way I see to fix this
 without changing that API is to have ExecutorRun do the cleanup
 processing just after the top plan node returns a null tuple, and that
 seems a bit ugly as well.

 Thoughts?


Could the post-processing not be done at the end of ExecutePlan()?

Regards,
Dean



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz
g...@pointblue.com.pl wrote:

 On 25 Feb 2011, at 13:18, Robert Haas wrote:

  People coming from Oracle are not favorably
 impressed either by the amount of monitoring data PostgreSQL can
 gather or by the number of knobs that are available to fix problems
 when they occur.  We don't need to have as many knobs as Oracle and we
 probably don't want to, and for that matter we probably couldn't if we
 did want to for lack of manpower, but that doesn't mean we should have
 none.

 Still, having more data a user can probe would be nice.

 I wonder why everyone avoids Microsoft's approach to the subject. Apparently,
 they go in the 'auto-tune as much as possible' direction.
 And tests we did a while ago, asking a team from Microsoft and a team
 from Oracle to optimise a set of queries for the same set of data (bookies'
 data, loads of it), showed that the auto-tuning Microsoft has in their
 SQL Server performed much better than a team of over-sweating Oracle DBAs.

I don't think *anyone* is avoiding that approach.  There is almost
universal consensus here that auto-tuning is better than manual
tuning, even to the extent of being unwilling to add knobs to allow
manual tuning of settings we have no idea how to auto-tune and no
plans to auto-tune.

 In my current workplace we have many deployments of the same system,
 over different types of machines, each with different customer data that varies
 so much that queries need to be rather generic.
 PostgreSQL shows its strength with the planner doing a good job for different
 variants of data, yet we do very little tweaking of the configuration
 parameters, simply because it is too hard to keep track of all of them.
 I guess the systems could behave much better, but no one is going to
 tweak settings for 50 different installations over 50 different types of data
 and 50 different sets of hardware.
 If there was even a tiny amount of automation provided in PostgreSQL, I
 would welcome it with open arms.

What do you have in mind?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 1:57 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  Actually, we *do* have some idea which tables are hot.  Or at least, we
  could.  Currently, pg_stats for tables are timeless; they just
  accumulate from the last reset, which has always been a problem in
  general for monitoring.  If we could make top-level table and index
  stats time-based, even in some crude way, we would know which tables
  were currently hot.  That would also have the benefit of making server
  performance analysis and autotuning easier.

 I think there would be value in giving the DBA an easier way to see
 which tables are hot, but I am really leery about the idea of trying
 to feed that directly into the query planner.  I think this is one of
 those cases where we let people tune it manually for starters, and
 then wait for feedback.  Eventually someone will say "oh, I never tune
 that by hand any more, ever since I wrote this script which does the
 following computation... and I just run it out of cron."  And then we
 will get out the party hats.  But we will never get the experience we
 need to say what that auto-tuning algorithm will be unless we first
 provide the knob for someone to fiddle with manually.

 It is also possible we will implement a manual way and never get around
 to automating it.   :-(

You make it sound as if we know how but are just too lazy to write the
code.  That is not one of the weaknesses that this community has.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pl/python do not delete function arguments

2011-02-26 Thread Robert Haas
2011/2/26 Jan Urbański wulc...@wulczer.org:
 - Original message -
 On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański wulc...@wulczer.org
 wrote:
  On 15/02/11 20:39, Peter Eisentraut wrote:
   On Tue, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
[a bug that we don't know how to fix]

 From this discussion I gather that we have a problem here that we
 don't exactly know how to fix, so I'm inclined to suggest that we mark
 this Returned with Feedback in the CommitFest and instead add it to
 the TODO.   Since this is a pre-existing bug and not a new regression,
 it should not be something we hold up beta for.

 I'm officially at a loss on how to fix that bug without some serious gutting 
 of how PL/Python arguments work. If someone comes up with a brilliant way to 
 solve this problem, we can commit it after beta, or even during the 9.2 cycle 
 (should the brilliant solution be backpatcheable).

Is this discussion related to the following todo item:

Create a new restricted execution class that will allow passing
function arguments in as locals. Passing them as globals means
functions cannot be called recursively.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pl/python explicit subtransactions

2011-02-26 Thread Robert Haas
2011/2/26 Jan Urbański wulc...@wulczer.org:
 The docs are included in the latest patch, and it turned out that disabling 
 implicit subxacts inside explicit subxacts is not a good idea, so it's been 
 fixed in the last patch. There are no unresolved issues AFAICT.

OK.  Peter, are you planning to commit this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] REVIEW: PL/Python table functions

2011-02-26 Thread Peter Eisentraut
Committed the last version.

I updated the documentation which previously claimed that what you
implemented wasn't supported.





Re: [HACKERS] Change pg_last_xlog_receive_location not to move backwards

2011-02-26 Thread Robert Haas
On Wed, Feb 16, 2011 at 7:06 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 16, 2011 at 12:59 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Feb 15, 2011 at 9:41 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Feb 15, 2011 at 12:34 AM, Fujii Masao masao.fu...@gmail.com wrote:
 You suggest that the shared variable Stream tracks the WAL write location,
 after it's set to the replication starting position? I don't think
 that the write
 location needs to be tracked in the shmem because other processes than
 walreceiver don't use it.

 Well, my proposal was to expose it, on the theory that it's useful.
 As we stream the WAL, we write it, so I think for all intents and
 purposes write == stream.  But using it to convey the starting
 position makes more sense if you call it stream than it does if you
 call it write.

 Umm.. I could not find any use case for exposing the WAL write location
 besides the flush one. So I'm not sure if it's really useful to track the
 write location in the shmem besides the walreceiver-local memory.
 What use case do you think of?

 Well, we're currently exposing that on the master via
 pg_stat_replication.  I guess we could rip that out, but I think that
 if nothing else we're imagining eventually supporting a sync rep mode
 where the standby acknowledges WAL upon receipt rather than upon
 write.  And the lag between the write and flush positions can be up to
 16MB, so it doesn't seem entirely academic.  Basically, the write
 position is the most WAL that could be on disk on standby and the
 flush is the most WAL that we're SURE is on disk on the standby.

 Personally the term "stream" sounds more ambiguous than "write".
 I cannot imagine what location pg_last_xlog_stream_location or
 stream_location would actually return, from the function name: WAL
 location that has been received? written? flushed? replayed?
 Since "write" sounds cleaner, I like it.

 Well, the problem with receivedUpto is that it's really being used for
 two different things, neither of which is how much WAL has been
 received.  One is where streaming is to start (hence, stream) and the
 other is how much we've flushed to disk (hence, flush).  So you might
 think there were four positions: streaming start, write, flush, apply.
  But I think the first two are really the same: once you've received
 at least one byte, the position that you're streaming from and the
 write position are the same, so I think the name stream can span both
 concepts.  OTOH, stream-start and flush are clearly NOT the same -
 there is a small but potentially significant delay between
 stream/write and flush.

It sounds like the only thing we have definite agreement about from
all this is that apply_location should be renamed to replay_location
in pg_stat_replication, a point fairly incidental to what this
patch is about.  It seems fairly unsatisfying to just change that and
punt the rest of this, but I'm not sure what the alternative is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pl/python explicit subtransactions

2011-02-26 Thread Peter Eisentraut
On Sat, 2011-02-26 at 09:49 -0500, Robert Haas wrote:
 OK.  Peter, are you planning to commit this?

Yes.




Re: [HACKERS] pl/python tracebacks

2011-02-26 Thread Peter Eisentraut
On Sat, 2011-02-26 at 09:34 +0100, Jan Urbański wrote:
 - Original message -
  On Thu, Feb 24, 2011 at 9:03 AM, Jan Urbański wulc...@wulczer.org
  wrote:
   On 24/02/11 14:10, Peter Eisentraut wrote:
   Hm, perhaps, I put it in the details, because it sounded like the place
   to put information that is not that important, but still helpful. It's
   kind of natural to think of the traceback as the detail of the error
   message. But if you prefer context, I'm fine with that. You want me to
   update the patch to put the traceback in the context?
  
  I don't see a response to this question from Peter, but I read his
  email to indicate that he was hoping you'd rework along these lines.
 
 I can do that, but not until Monday evening.

Well, I was hoping for some other opinion, but I guess my request
stands.




Re: [HACKERS] pg_basebackup and wal streaming

2011-02-26 Thread Magnus Hagander
On Sun, Feb 20, 2011 at 21:37, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Hi,

 Magnus Hagander mag...@hagander.net writes:
 Better late than never (or?), here's the final cleanup of
 pg_streamrecv for moving into the main distribution, per discussion
 back in late Dec or early Jan. It also includes the "stream logs in
 parallel to backup" part that was not completed in pg_basebackup.

 And that's something I've been so interested in!  It's only fair
 that I spend time reviewing after my insisting on having it :)

 The programs (pg_basebackup and pg_receivexlog) both work as expected,
 and show in the pg_stat_replication system view.

 The pg_basebackup -x option should be revised so that it's easier to tell
 the difference between streaming WAL while the base backup is ongoing
 and fetching it at the end, which risks corrupting the whole
 backup as soon as wal_keep_segments is undersized.

  -x, --xlog[=stream]       include required WAL files in backup

 It could be --xlog=stream|fetch or something that reads better.

Yeah, that's probably true. I wanted to avoid making it mandatory, but
it's actually easier this way. Will change it to that.


 Now, on to the code itself.

 I wonder if the segment_callback() routine would be better as a while {}
 loop rather than a recursive construct.  Also, it looks like a lib
 function but it's doing exit(1)…

Actually, it's even better to just reorder the checks in the other
order - that way we don't need a loop *or* a self-call.


 Unfortunately I can't comment (or won't risk learning enough details
 tonight to try to be smart here) on FindStreamingStart() implementation,
 that seems crucial.

It is - so if you can find the time to, that would be great...


 Will the server refrain from recycling a WAL file when the receivers'
 sent_location values are not all known to be past the positions contained in
 it? If that's the case, the documentation should talk about pg_receivexlog
 as an alternative to archiving, relying on libpq. If that's not the
 case, is there a good reason for it not being the case? (even if
 that's not on this patch to fix)


No, not at this point. It would be nice to have that option in the future...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Parallel restore checks wrong thread return value?

2011-02-26 Thread Magnus Hagander
in spawn_restore:


child = (HANDLE) _beginthreadex(NULL, 0, (void *) parallel_restore,
                                args, 0, NULL);
if (child == 0)


But from my reading of the docs, _beginthreadex() returns -1 on error, not 0.

Bug, or am I reading it wrong?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Sat, Feb 26, 2011 at 5:55 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 So we really need some refactoring here.  I dislike adding another
 fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
 but there may not be a better way.  The only way I see to fix this
 without changing that API is to have ExecutorRun do the cleanup
 processing just after the top plan node returns a null tuple, and that
 seems a bit ugly as well.

 How would that handle the case of a cursor which isn't read to
 completion? Should it still execute the CTEs to completion?

Right at the moment we dodge that issue by disallowing wCTEs in cursors.
If we did allow them, then I would say that the wCTEs have to be run to
completion when the cursor is closed.

regards, tom lane



Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Greg Stark
On Sat, Feb 26, 2011 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Right at the moment we dodge that issue by disallowing wCTEs in cursors.
 If we did allow them, then I would say that the wCTEs have to be run to
 completion when the cursor is closed.


Does that really dodge anything? Isn't it just the same as running a
query from a client and closing the result without reading to the end?

ExecutorEnd would be called but ExecutorRun would never be called to
the end of the scan.

-- 
greg



Re: [HACKERS] Parallel restore checks wrong thread return value?

2011-02-26 Thread Magnus Hagander
On Sat, Feb 26, 2011 at 16:51, Magnus Hagander mag...@hagander.net wrote:
 in spawn_restore:


         child = (HANDLE) _beginthreadex(NULL, 0, (void *) parallel_restore,
                                         args, 0, NULL);
         if (child == 0)


 But from my reading of the docs, _beginthreadex() returns -1 on error, not 0.

 Bug, or am I reading it wrong?

I was.

_beginthread() returns -1 on error, and _beginthreadex() returns 0.

That's just brilliant.
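
For the record, the two conventions side by side (a minimal sketch; the
worker functions are just illustrative):

#include <process.h>
#include <stdint.h>
#include <stdio.h>

static unsigned __stdcall worker_ex(void *arg) { return 0; }
static void __cdecl worker(void *arg) { }

int
main(void)
{
    /* _beginthreadex: 0 means failure */
    uintptr_t th1 = _beginthreadex(NULL, 0, worker_ex, NULL, 0, NULL);

    if (th1 == 0)
        fprintf(stderr, "_beginthreadex failed\n");

    /* _beginthread: -1L means failure */
    uintptr_t th2 = _beginthread(worker, 0, NULL);

    if (th2 == (uintptr_t) -1L)
        fprintf(stderr, "_beginthread failed\n");

    return 0;
}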

Sorry about the noise.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 26.02.2011 07:55, Tom Lane wrote:
 So we really need some refactoring here.  I dislike adding another
 fundamental step to the ExecutorStart/ExecutorRun/ExecutorEnd sequence,
 but there may not be a better way.

 Could you keep the sequence unchanged for non-EXPLAIN callers with some 
 refactoring? Add an exposed function like ExecutorFinishRun() that 
 Explain calls explicitly in the EXPLAIN ANALYZE case, and modify 
 ExecutorEnd to call it too, if it hasn't been called yet and the 
 explain-only flag isn't set.

I had been toying with the same idea, but it doesn't work, as Dean
Rasheed points out nearby.  The full monty for running the executor
these days is really

CreateQueryDesc(...);
AfterTriggerBeginQuery(...);
ExecutorStart(...);
ExecutorRun(...);   // zero or more times
AfterTriggerEndQuery(...);
ExecutorEnd(...);
FreeQueryDesc(...);

ExecutorEnd can *not* do anything that might fire triggers, because it's
too late: AfterTriggerEndQuery has already been called.

BTW, there are various places that believe they can skip
AfterTriggerBeginQuery/AfterTriggerEndQuery if operation == CMD_SELECT,
an assumption that no longer holds if the query has wCTEs.  So we have
some work to do on the callers no matter what.

I'm inclined to think that it would be best to move the responsibility
for calling AfterTriggerBeginQuery/AfterTriggerEndQuery into the
executor.  That would get us down to

CreateQueryDesc(...);
ExecutorStart(...); // now includes AfterTriggerBeginQuery
ExecutorRun(...);   // zero or more times
ExecutorFinish(...);// ModifyTable cleanup, AfterTriggerEndQuery
ExecutorEnd(...);   // just does what it's always done
FreeQueryDesc(...);

where EXPLAIN without ANALYZE would skip ExecutorRun and ExecutorFinish.

The RI triggers have a requirement for being able to run this sequence
without the AfterTriggerBeginQuery/AfterTriggerEndQuery calls (cf
SPI_execute_snapshot's fire_triggers parameter), but we could support
that by adding an ExecutorStart flag that tells it to suppress those
trigger calls.

IMO the major disadvantage of a refactoring like this is the possibility
of sins of omission in third-party code, in particular somebody not
noticing the added requirement to call ExecutorFinish.  We could help
them out by adding an Assert in ExecutorEnd to verify that
ExecutorFinish had been called (unless explain-only mode).  A variant of
that problem is an auto_explain-like add-on not noticing that they
probably want to hook into ExecutorFinish if they'd previously been
hooking ExecutorRun.  I don't see any simple check for that though.
The other possible failure mode is forgetting to remove calls to the two
trigger functions, but we could encourage getting that right by renaming
those two functions.
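
As a rough illustration of that Assert (es_finished being a hypothetical
flag that the new ExecutorFinish would set; the es_top_eflags test is
likewise illustrative):

void
ExecutorEnd(QueryDesc *queryDesc)
{
    EState *estate = queryDesc->estate;

    /* catch callers that forgot the new ExecutorFinish step;
     * EXPLAIN without ANALYZE legitimately skips it */
    Assert(estate->es_finished ||
           (estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));

    /* ... shutdown as before ... */
}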

Comments?

regards, tom lane



Re: [HACKERS] TODO: You can alter it, but you can't view it

2011-02-26 Thread Bruce Momjian
bruce wrote:
 Josh Berkus wrote:
  
   Right now pg_options_to_table() is not documented.  Should it be?
  
  Yes, I think so.
 
 Done, with the attached, applied patch.

Oh, here is an example usage:

test=> select pg_options_to_table(reloptions) from pg_class;
       pg_options_to_table
----------------------------------
 (fillfactor,50)
 (autovacuum_freeze_table_age,10)
(2 rows)

It also works for pg_attribute.attoptions, so I documented that too.
I also fixed the documented return type in my previous patch.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edb7795..9c8e19f 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** SELECT pg_type_is_visible('myschema.widg
*** 13385,13392 ****
      </row>
      <row>
       <entry><literal><function>pg_options_to_table(<parameter>reloptions</parameter>)</function></literal></entry>
!      <entry><type>name, option</type></entry>
!      <entry>get the set of option name/value pairs from <structname>pg_class</>.<structfield>reloptions</></entry>
      </row>
      <row>
       <entry><literal><function>pg_tablespace_databases(<parameter>tablespace_oid</parameter>)</function></literal></entry>
--- 13385,13392 ----
      </row>
      <row>
       <entry><literal><function>pg_options_to_table(<parameter>reloptions</parameter>)</function></literal></entry>
!      <entry><type>setof record</type></entry>
!      <entry>get the set of storage option name/value pairs</entry>
      </row>
      <row>
       <entry><literal><function>pg_tablespace_databases(<parameter>tablespace_oid</parameter>)</function></literal></entry>
*** SELECT pg_type_is_visible('myschema.widg
*** 13484,13492 ****
    </para>

    <para>
!     <function>pg_options_to_table</function> returns the set of option
     name/value pairs when passed
!     <structname>pg_class</>.<structfield>reloptions</>.
    </para>

    <para>
--- 13484,13493 ----
    </para>

    <para>
!     <function>pg_options_to_table</function> returns the set of storage option
     name/value pairs when passed
!     <structname>pg_class</>.<structfield>reloptions</> or
!     <structname>pg_attribute</>.<structfield>attoptions</>.
    </para>

    <para>



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread PostgreSQL - Hans-Jürgen Schönig
 
 
 Still, having more data a user can probe would be nice.
 
 I wonder why everyone avoids Microsoft's approach to the subject.
 Apparently, they go in the 'auto-tune as much as possible' direction.
 And tests we did a while ago, asking a team from Microsoft and a
 team from Oracle to optimise a set of queries for the same set of data
 (bookies' data, loads of it), showed that the auto-tuning Microsoft has in
 their SQL Server performed much better than a team of over-sweating Oracle DBAs.
 
 I don't think *anyone* is avoiding that approach.  There is almost
 universal consensus here that auto-tuning is better than manual
 tuning, even to the extent of being unwilling to add knobs to allow
 manual tuning of settings we have no idea how to auto-tune and no
 plans to auto-tune.
 
 In my current workplace we have many deployments of the same system,
 over different types of machines, each with different customer data that
 varies so much that queries need to be rather generic.
 PostgreSQL shows its strength with the planner doing a good job for different
 variants of data, yet we do very little tweaking of the configuration
 parameters, simply because it is too hard to keep track of all of them.
 I guess the systems could behave much better, but no one is going to
 tweak settings for 50 different installations over 50 different types of data
 and 50 different sets of hardware.
 If there was even a tiny amount of automation provided in PostgreSQL, I
 would welcome it with open arms.
 
 What do you have in mind?
 



what we are trying to do is to explicitly store column correlations. so, a
histogram for (a, b) correlation and so on.
the planner code then goes through its restrictions in the query and finds the
best / longest combination it can find which has some statistics defined.
it seems we can also do this for join selectivity and expressions. the planner
code for raw column correlation without expressions (cos(id) or so) and
joins is there (WIP, no ANALYZE support and so on so far).

i think auto tuning is a good thing to have, and the door to actually do it is
wide open with our approach.
all it takes is a mechanism to see which conditions are used how often, and
somebody could write a job which automatically tells the system which stats to
collect / sample.
i think for an average user this is the most simplistic thing then. but, to
get there we have to get the bloody sampling and the rest of the planner code
right in the first place.
auto tuning in this area is still far in the future - but at
least the road to it is clear.

some people suggested approaches dealing with effective_cache_size and so on
... there are many good approaches here, but they don't address the actual
problem of wrong size estimates.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de




Re: [HACKERS] pl/python do not delete function arguments

2011-02-26 Thread Jan Urbański
   On Tue, Feb 15, 2011 at 6:04 PM, Jan Urbański wulc...@wulczer.org
   wrote:
On 15/02/11 20:39, Peter Eisentraut wrote:
 On Tue, 2011-02-15 at 09:58 +0100, Jan Urbański wrote:
  [a bug that we don't know how to fix]
  
   From this discussion I gather that we have a problem here that we
   don't exactly know how to fix, so I'm inclined to suggest that we
   mark this Returned with Feedback in the CommitFest and instead add
   it to the TODO.   Since this is a pre-existing bug and not a new
   regression, it should not be something we hold up beta for.
  
  I'm officially at a loss on how to fix that bug without some serious
  gutting of how PL/Python arguments work. If someone comes up with a
  brilliant way to solve this problem, we can commit it after beta, or
  even during the 9.2 cycle (should the brilliant solution be
  backpatcheable).
 
 Is this discussion related to the following todo item:
 
 Create a new restricted execution class that will allow passing
 function arguments in as locals. Passing them as globals means
 functions cannot be called recursively.

Yep, the bug is more or less an emanation of this problem.



[HACKERS] Spatio-Temporal Functions

2011-02-26 Thread Nick Raj
Hi,
I am writing some spatio-temporal functions in PostGIS,
like ST_Enters(obj_geom, boundary_geom, time t1, time t2).

For example: does a vehicle enter ABC between times t1 and t2?
Suppose I take only one object geometry at a time and compare it; then I
cannot produce the output, because Enters is a continuous function of time,
i.e. it is made up of Disjoint -> Touches -> Inside.
So, to create this function, I need all of the object's geometries before
they go into the function, so that it is called only once.

My first problem is the signature:
create or replace function ST_Enters(..., geometry, timestamp, timestamp)
returns boolean

What data type should be used for the first argument? I have to catch more
than 1000 rows, and geometry[] doesn't work for it.
So, can anyone tell me which data type can be used to catch this?

Second,
how can I use this function in SQL? Suppose I write a query like
select ST_Enters(obj_geom, (select boundary_geom from boundary),
'25/2/2011 12:23:32','26/2/2011') from vehicledata where vehicleid = 'XYZ';
and again it comes down to one object geometry being matched against the
boundary geometry.
If I store the in-between data in some temp table, that creates a lot of
computation on the table, and if the query ends partway through, the table may
contain some wrong data.
So, how can I use this function in SQL in a continuous manner?

Thanks in advance.


Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Greg Stark
2011/2/26 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at:
 what we are trying to do is to explicitly store column correlations. so, a 
 histogram for (a, b) correlation and so on.


The problem is that we haven't figured out how to usefully store a
histogram for (a, b). Consider the oft-quoted example of
(city, postal-code) -- or (city, zip code) for Americans. A histogram
of the tuple is just the same as a histogram on the city. It doesn't
tell you how much extra selectivity the postal code or zip code gives
you. And if you happen to store a histogram of (postal code, city) by
mistake then it doesn't tell you anything at all.

We need a data structure that lets us answer the bayesian question
"given a city of New York, how selective is zip-code = 02139?".  I don't
know what that data structure would be.
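
In symbols that's just the conditional-selectivity identity (a textbook
identity, nothing more); the hard part is a structure that can supply both
terms on the right:

    P(zip = z | city = c) = P(city = c, zip = z) / P(city = c)

A 2D histogram whose buckets degenerate to city alone gives you only the
denominator.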

Heikki and I had a wacky hand-crafted 2D histogram data structure that
I suspect doesn't actually work. And someone else did some research on
the list and came up with a fancy-sounding name for a statistics concept
that might be what we want.

-- 
greg



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Martijn van Oosterhout
On Sat, Feb 26, 2011 at 06:44:52PM +, Greg Stark wrote:
 2011/2/26 PostgreSQL - Hans-Jürgen Schönig postg...@cybertec.at:
  what we are trying to do is to explicitly store column correlations. so, a 
  histogram for (a, b) correlation and so on.
 
 The problem is that we haven't figured out how to usefully store a
 histogram for (a, b). Consider the oft-quoted example of
 (city, postal-code) -- or (city, zip code) for Americans. A histogram
 of the tuple is just the same as a histogram on the city.

But there are cases where it can work. Frankly the example you mention
is odd because we can't even build useful 1D histograms for city
and zip code, so the fact that 2D is hard is not surprising.

The histograms we do build work fine for < and >, just not equality. The
2D ones will handle the same.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 Patriotism is when love of your own people comes first; nationalism,
 when hate for people other than your own comes first. 
   - Charles de Gaulle




Re: [HACKERS] pg_basebackup and wal streaming

2011-02-26 Thread Yeb Havinga

On 2011-02-26 18:19, Magnus Hagander wrote:

Attached is an updated version of the patch that includes these
changes, as well as Windows support and an initial cut at a ref page
for pg_receivexlog (needs some more detail still).
I'm testing a bit more (with the previous version, sorry) and got the 
following while doing a stream backup from a cluster that was at that 
moment doing a pgbench run with 1 synchronous standby.


mgrid@mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 15/72C8
pg_basebackup: starting background WAL receiver
pg_basebackup: got WAL data offset 14744, expected 16791960424)
5148915/5148026 kb g(100%) 1/1 tablespaces
xlog end point: 15/80568878
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: child process exited with error 1

The fetch variant worked ok.

mgrid@mg79:~$ pg_basebackup --xlog -D /data -vP -h mg73 -U repuser
Password:
xlog start point: 15/A220
5482029/5153458 kb g(100%) 1/1 tablespaces
xlog end point: 15/B51D0230
pg_basebackup: base backup completed.

I'm in total monkey test mode here, so I don't even know if I'm not 
supposed to do the streaming variant while other stuff is going on.


regards,
Yeb Havinga




Re: [HACKERS] pg_basebackup and wal streaming

2011-02-26 Thread Magnus Hagander
On Sat, Feb 26, 2011 at 20:48, Yeb Havinga yebhavi...@gmail.com wrote:
 On 2011-02-26 18:19, Magnus Hagander wrote:

 Attached is an updated version of the patch that includes these
 changes, as well as Windows support and an initial cut at a ref page
 for pg_receivexlog (needs some more detail still).

 I'm testing a bit more (with the previous version, sorry) and got the
 following while doing a stream backup from a cluster that was at that moment
 doing a pgbench run with 1 synchronous standby.

 mgrid@mg79:~$ pg_basebackup --xlog=stream -D /data -vP -h mg73 -U repuser
 Password:
 xlog start point: 15/72C8
 pg_basebackup: starting background WAL receiver
 pg_basebackup: got WAL data offset 14744, expected 16791960424        )
 5148915/5148026 kb g(100%) 1/1 tablespaces
 xlog end point: 15/80568878
 pg_basebackup: waiting for background process to finish streaming...
 pg_basebackup: child process exited with error 1

Hmm, strange. What platform are you on?

I saw something similar *once* on Windows, but it then passed my tests
a lot of times in a row so I figured it was just a "didn't clean up
properly" thing. Clearly there's a bug around.

What's the size of the latest WAL file that it did work on? Is it
16791960424 bytes? That's way, way too large, but perhaps it's not
switching segments properly? (that value is supposedly the current
write position in the file..)


 I'm in total monkey test mode here, so I don't even know if I'm not supposed
 to do the streaming variant while other stuff is going on.

Oh yes, that's one of the main reasons to use it, so you should
definitely be able to do that!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Grzegorz Jaskiewicz

On 26 Feb 2011, at 14:45, Robert Haas wrote:

 On Sat, Feb 26, 2011 at 4:33 AM, Grzegorz Jaskiewicz g...@pointblue.com.pl wrote:
 
 
 I don't think *anyone* is avoiding that approach.  There is almost
 universal consensus here that auto-tuning is better than manual
 tuning, even to the extent of being unwilling to add knobs to allow
 manual tuning of settings we have no idea how to auto-tune and no
 plans to auto-tune.
 
Perhaps one step further is required: change some settings so that the rest can be
auto-tuned better. There are some even more drastic steps that would have to be
taken - steps back, in fact - and I believe Microsoft's engineers had to take them. For
instance, if there is an issue with the inability to find out how much of a table
is in the cache, perhaps PostgreSQL should
have an option to turn off cached reads/writes completely and thus allow the DBA to
regulate that using the shared_buffers setting. It doesn't sound great, but if
you think about it,
I'm sure there are people willing to use it, if that adds a bit more
auto-tuning to the server. I would even go a step further, and say that I
believe some people will
embrace it on the basis that they can constrain the amount of memory
PostgreSQL uses on their server as a whole, and that includes caches.


 In my current workplace we have many deployments of the same system,
 over different types of machines, each with different customer data that
 varies so much that queries need to be rather generic.
 PostgreSQL shows its strength with the planner doing a good job for different
 variants of data, yet we do very little tweaking of the configuration
 parameters, simply because it is too hard to keep track of all of them.
 I guess the systems could behave much better, but no one is going to
 tweak settings for 50 different installations over 50 different types of data
 and 50 different sets of hardware.
 If there was even a tiny amount of automation provided in PostgreSQL, I
 would welcome it with open arms.
 
 What do you have in mind?

All I'm trying to say is that whilst you guys focus mostly on single database
server installations, PostgreSQL also has a great user base that uses it as part
of a product that is deployed on differently sized machines,
and with the same model but different data variation. We don't sell the product to
people and let them take care of it, but rather sell the service - you
would say. But we also don't have a DBA per customer who would look solely
at the knob-tweaking side of things. So my argument here is that there isn't
always a person who knows the tables and databases by their characteristics
and is thus able to tweak settings manually.
That is probably just one of many examples where it makes sense, and probably
their primary property is that there's no DBA overseeing the whole database and
thus being able to tune it.





Re: [HACKERS] disposition of remaining patches

2011-02-26 Thread Josh Berkus

 That would be nice; I'm basically abusing syncrep for this purpose. At
 the same time, someone may need to be notified of such a switchover
 occurring, and in the event of failure, it'd be nice to bounce back to the
 primary. Tangentially relevant, Virtual IP is not always an option,
 such as on Amazon EC2.

Well, let's comprehensively address replication in a cloud environment
for 9.2.  You can start a wiki page.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Kevin Grittner
 Grzegorz Jaskiewicz  wrote:
 
 I guess the systems could behave much better, but no one is
 going to tweak settings for 50 different installations over 50
 different types of data and 50 different sets of hardware.
 If there was even a tiny amount of automation provided in
 PostgreSQL, I would welcome it with open arms.
 
Hmmm...  Well, we have about 100 pieces of hardware with about 200
databases, and we *do* tune them individually, but it's not as
onerous as it might seem.  For our 72 production circuit court
servers, for example, we have one standard configuration which has as
its last line an include file for overrides.  For some counties that
override file is empty.  For many we override effective_cache_size
based on the RAM installed in the machine.  Since most of these
servers have the database fully cached, the standard file uses
equal, low settings for seq_page_cost and random_page_cost, but we
override that where necessary.  We don't generally tune anything else
differently among these servers.  (Maybe work_mem, I'd have to
check.)
 
Which leads me to think that these might be the key items to
autotune.  It's not actually that hard for me to imagine timing a
small percentage of randomly selected page accesses and developing
costing factors for the page costs on the fly.  It might be a bit
trickier to autotune effective_cache_size, but I can think of two or
three heuristics which might work.  Automatically generating sane
values for these three things would eliminate a significant fraction
of problems posted to the performance list.
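 
To make the first of those concrete, a purely hypothetical sketch (nothing
like this exists in PostgreSQL): time a sampled fraction of page fetches,
keep moving averages for sequential vs. random reads, and derive
random_page_cost as the ratio, with seq_page_cost as the unit:
 
#include <stdbool.h>

typedef struct PageCostSampler
{
    double seq_avg_us;      /* EMA of sequential page fetch time */
    double rand_avg_us;     /* EMA of random page fetch time */
    double alpha;           /* smoothing factor, e.g. 0.01 */
} PageCostSampler;

static void
sample_fetch(PageCostSampler *s, double elapsed_us, bool sequential)
{
    double *avg = sequential ? &s->seq_avg_us : &s->rand_avg_us;

    *avg += s->alpha * (elapsed_us - *avg);
}

static double
derived_random_page_cost(const PageCostSampler *s)
{
    if (s->seq_avg_us <= 0.0)
        return 4.0;         /* fall back to the stock default */
    return s->rand_avg_us / s->seq_avg_us;
}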
 
-Kevin



[HACKERS] Generalized edit function?

2011-02-26 Thread fork
Hi hackers,

I am interested in extending Postgres with a generalized edit function like
SAS's compged[1], which is basically levenshtein distance with transposes (ab
-> ba) and LOTS of different weights for certain ops (like insert a blank
versus delete from the end versus insert a regular character).  

Compged seems to work really well for us when trying to match addresses (MUCH
better than pure levenshtein), and it would be a great tool for data miners.

I have a number of questions:

1.  Does anybody else care? I would love to see this in contrib, but if the
chances are slim, then I would like to know that too.

2.  Has anybody else done something like this and can give ideas or source?  It
seems to me that the code will have to be a mess of pointers and indexes, but if
there is some theory that simplifies it I haven't heard about it.  (Levenshtein
without transposes is theoretically clean, but I think the fact that we have
transposes means we look ahead 2 chars and lose all the nice dynamic programming
stuff.)  A sketch of one tractable variant follows after this list.

3.  I will probably implement this for ascii characters -- if anyone has any
thoughts on other encodings, please share.
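
On point 2: the restricted "optimal string alignment" recurrence keeps the
classic DP shape even with adjacent transposes, so it may be a starting
point. A minimal weighted sketch (weights and names are illustrative; none
of compged's fancier ops are modeled):

#include <stdlib.h>
#include <string.h>

static int
min3(int a, int b, int c)
{
    int m = (a < b) ? a : b;

    return (m < c) ? m : c;
}

/* Levenshtein plus adjacent transpositions, with per-operation weights. */
int
weighted_osa(const char *s, const char *t,
             int w_ins, int w_del, int w_sub, int w_swap)
{
    int m = (int) strlen(s);
    int n = (int) strlen(t);
    int *d = malloc((size_t) (m + 1) * (n + 1) * sizeof(int));
    int i, j, res;

    if (d == NULL)
        return -1;

#define D(i,j) d[(i) * (n + 1) + (j)]
    for (i = 0; i <= m; i++)
        D(i, 0) = i * w_del;
    for (j = 0; j <= n; j++)
        D(0, j) = j * w_ins;

    for (i = 1; i <= m; i++)
        for (j = 1; j <= n; j++)
        {
            int sub = (s[i - 1] == t[j - 1]) ? 0 : w_sub;

            D(i, j) = min3(D(i - 1, j) + w_del,
                           D(i, j - 1) + w_ins,
                           D(i - 1, j - 1) + sub);
            /* adjacent transposition: ab -> ba */
            if (i > 1 && j > 1 &&
                s[i - 1] == t[j - 2] && s[i - 2] == t[j - 1] &&
                D(i - 2, j - 2) + w_swap < D(i, j))
                D(i, j) = D(i - 2, j - 2) + w_swap;
        }

    res = D(m, n);
#undef D
    free(d);
    return res;
}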

Thanks for everyone's time.  I will try to implement a command line version and
put that on pastebin for people to look at while I port it to the postgres
environment.


[1]
(http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm)




Re: [HACKERS] WIP: cross column correlation ...

2011-02-26 Thread Kevin Grittner
Greg Stark  wrote:
 
 Consider the oft-quoted example of a (city, postal-code) -- or
 (city, zip code) for Americans.
 
I'm not sure everyone realizes just how complicated this particular
issue is.  If we can do a good job with U.S. city, state, zip code we
will have something which will handle a lot of cases.
 
Consider:
 
(1)  Municipality name isn't unique in the U.S.  Many states besides
Wisconsin have a municipality called Madison (I seem to remember
there were over 20 of them).  So city without state doesn't
necessarily get you anywhere near having a unique zip code or range.
 
(2)  A large city has a set of zip codes, all starting with the same
first three digits.  So identifying the municipality doesn't always
identify the zip code, although for small cities it often does. 
Madison, Wisconsin has thirty-some zip codes, some of which are
rather specialized and don't see much use.
 
(3)  Small municipalities surrounded by or adjacent to a large city
may not get their own zip code.  53704 not only covers a large swath
of the northern end of the City of Madison, but is also the zip code
for the Village of Maple Bluff and at least parts of the Township of
Westport.
 
I guess what I'm saying is that this use case has enough complexity
to make an interesting problem to solve.  It may even be more
challenging than you would want for an initial trial of a technique.
 
-Kevin



Re: [HACKERS] wal_sender_delay is still required?

2011-02-26 Thread Heikki Linnakangas

On 07.12.2010 05:51, Fujii Masao wrote:

On Tue, Dec 7, 2010 at 12:22 PM, Robert Haas robertmh...@gmail.com wrote:

Fair enough. How about increasing the default to 10 seconds?
Since bgwriter is already using 10s as a nap time if there is no
configured activity, I think that 10s is a no-nonsense default value.


What do we get out of making this non-configurable?


That would make the setup of replication simpler, I think.
But I agree with just increasing the default value of wal_sender_delay
rather than dropping it.


I dropped the ball on this one..

For comparison, the archiver process and autovacuum launcher wake up 
once a second to check if postmaster is still alive. bgwriter, when 
bgwriter_lru_maxpages and archive_timeout are set to 0 to disable it, 
checks for dead postmaster every 10 seconds.


I'll bump the default for wal_sender_delay to 1 second. Maybe an even 
higher value would be good, but it also seems good to kill replication 
connections in a timely fashion if postmaster dies.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Generalized edit function?

2011-02-26 Thread Josh Berkus
Fork,

 1.  Does anybody else care? I would love to see this in contrib, but if the
 chances are slim, then I would like to know that too.

That really depends on how well it works, and how much code it is.  It's
way too early for anyone to have a viewpoint on this.  For example, a
few years ago I'd have said that trigrams were hopelessly specialized
for mainstream PostgreSQL, but now they're not.

The path for this is to create an external project (on pgfoundry,
github, sourceforge, code.google.com, etc.) and then an Extension for
it.  Once it's production-quality, and if contrib is even relevant at
that point, you can propose it.

 3.  I will probably implement this for ascii characters -- if anyone has any
 thoughts on other encodings, please share.

Why would the implementation for ASCII be different from Unicode?
Surely any distance algorithm is encoding-neutral.

Anyway, if it's ASCII-only, that's a guaranteed way to make sure it
isn't taken seriously.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Generalized edit function?

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 4:19 PM, Josh Berkus j...@agliodbs.com wrote:
 Anyway, if it's ASCII-only, that's a guaranteed way to make sure it
 isn't taken seriously.

Pre-9.1 levenshtein is ASCII-only, and I think some of the other stuff
in contrib/fuzzystrmatch still is.  We had to work pretty hard to
avoid a massive performance loss when we made it multi-byte aware, and
IIRC there is still a pretty significant performance loss if any
multibyte characters are actually present.  But at least now it
doesn't return totally bogus answers.

So I have some sympathy with the OP's desire not to burden himself
with the non-ASCII case if he doesn't need it for his application, but
I also agree with your point that we probably wouldn't accept code
into contrib that doesn't handle it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: why not finish sub-updates at the end, not the beginning?

2011-02-26 Thread Marko Tiikkaja

On 2011-02-26 7:18 PM, Tom Lane wrote:

IMO the major disadvantage of a refactoring like this is the possibility
of sins of omission in third-party code, in particular somebody not
noticing the added requirement to call ExecutorFinish.  We could help
them out by adding an Assert in ExecutorEnd to verify that
ExecutorFinish had been called (unless explain-only mode).  A variant of
that problem is an auto_explain-like add-on not noticing that they
probably want to hook into ExecutorFinish if they'd previously been
hooking ExecutorRun.  I don't see any simple check for that though.
The other possible failure mode is forgetting to remove calls to the two
trigger functions, but we could encourage getting that right by renaming
those two functions.


While I don't really like the possibility of breaking third party 
modules, I think the idea is good.  Also +1 for adding checks where 
possible.
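
For illustration, a minimal sketch of the Assert Tom suggests.  The
es_finished flag (set by the new ExecutorFinish step) and the
es_top_eflags field are assumptions of this sketch, not committed API;
EXEC_FLAG_EXPLAIN_ONLY is the existing executor flag.

#include "postgres.h"
#include "executor/executor.h"

void
ExecutorEnd(QueryDesc *queryDesc)
{
	EState	   *estate = queryDesc->estate;

	/*
	 * Verify that ExecutorFinish was called, unless the query was only
	 * started for EXPLAIN (without ANALYZE), in which case skipping it
	 * is legitimate.  This catches third-party code that forgot the
	 * newly required call.
	 */
	Assert(estate->es_finished ||
		   (estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));

	/* ... normal executor shutdown continues here ... */
}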



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wCTE: about the name of the feature

2011-02-26 Thread David Fetter
On Sat, Feb 26, 2011 at 12:52:40AM -0500, Robert Haas wrote:
 On Sat, Feb 26, 2011 at 12:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  Yay!  I'm excited about this, particularly the possible pipelining
  stuff, where you can do WITH (DELETE .. RETURNING ..) INSERT ...  and
  have it be like cool and fast and stuff.
 
  Or at least I hope you can do that.
 
  It's gonna need some work yet.  As things stand, the tuples are indeed
  pipelined through, but the CteScan nodes *also* stash them aside into
  tuplestores, just in case somebody demands a rescan.  Fixing that will
  require revisiting the exec flags (EXEC_FLAG_REWIND etc).  We don't
  currently distinguish "it's unlikely you'll have to rescan" from
  "you're guaranteed not to have to rescan", but a CteScan that's covering
  a ModifyTable has to know the latter to not have to keep hold of copies
  of the RETURNING tuples.
 
  It might be a small enough change to do after alpha starts, but I don't
  have time for it right now.
 
 Well, if nothing else, the potential is there for a future release.

That's kinda where I am on this one.  In the grand PostgreSQL
tradition, we can have something that works before we have something
that works fast :)

 I'm probably not quite as excited about this feature as David Fetter

Heh!

 (and my 100-Watt lightbulb is not quite as bright as the sun at high
 noon in midsummer) but I do think it's pretty cool, and I appreciate
 you getting it in, even in a somewhat basic form.

Me, too!

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Native XML

2011-02-26 Thread Anton
Hello,
I've been playing with 'native XML' for a while and am now wondering if
further development of such a feature makes sense for Postgres.
(By not having brought this up earlier I'm taking the chance that the
effort will be wasted, but that's not something you should worry about.)

The code is available here:
https://github.com/ahouska/postgres/commit/bde3d3ab05915e91a0d831a8877c2fed792693c7

For anyone interested in my suggestions, I recommend starting with the
test (it needs to be executed standalone; pg_regress is not aware of it
yet):

src/test/regress/sql/xmlnode.sql
src/test/regress/expected/xmlnode.out

In a few words, 'xmlnode' is a structured type that stores an XML
document in the form of a tree, as opposed to plain text.
Parsing is only performed on insert or update (for update it would also
make sense to implement functions that add/remove nodes at the low
level, without dumping & parsing).

Unlike 'libxml2', the parser uses palloc()/pfree(). The output format is
independent of any 3rd-party code.
The binary (parsed) XML node is a single chunk of memory, independent of
the address where it was allocated.
The parser does not yet fully conform to the XML standard, and some
functionality is still missing (DTD, PI, etc.; see comments in the code
if you're interested in details).
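
(To illustrate the position-independent layout: one way to keep a whole
tree in a single relocatable chunk is to link nodes with chunk-relative
offsets instead of pointers.  A hedged sketch follows; all names here
are mine, not from the patch.)

#include <stdint.h>

/* One node header inside the chunk; all references are offsets. */
typedef struct XNodeHdr
{
	uint8_t		kind;			/* element, attribute, text, ... */
	uint32_t	name_off;		/* offset of the node name in the chunk */
	uint32_t	nchildren;		/* number of children */
	uint32_t	children_off;	/* offset of an array of child offsets */
} XNodeHdr;

/*
 * Resolve the i'th child.  Because only offsets are stored, the chunk
 * can be copied or mapped at any address without pointer fix-ups.
 */
static inline XNodeHdr *
xnode_child(char *chunk, const XNodeHdr *node, uint32_t i)
{
	const uint32_t *kids = (const uint32_t *) (chunk + node->children_off);

	return (XNodeHdr *) (chunk + kids[i]);
}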

The 'xquery()' function evaluates (so far just simple) XMLPath
expressions and for each document returns a set of matching
nodes/subtrees.  'xmlpath' is a parsed XMLPath expression (i.e. the
expression plus some metadata); it helps avoid repeated parsing of the
XMLPath expressions by the xquery() function.

I don't pretend that I invented this concept: DB2, Oracle, and
probably some other commercial databases have had it for years.
Even though the mission of Postgres is not simply to copy features
from other DBMSs, I think structured XML makes sense in its own right.
It allows for better integration of relational and XML data -- especially
joining relational columns with XML node sets.

In the future, interesting features could be built on it. For example,
an XML node/subtree can be located quickly within an xmlnode value, and
as such it could be indexed (even though the existing index access
methods might not be appropriate for that).

When reviewing my code, please focus on the ideas rather than the code
quality :-) I'm aware that some refactoring will have to be done if
this subproject goes on.

Thanks in advance for any feedback,
Tony.





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] psql: \dg off by one error

2011-02-26 Thread Josh Kupershmidt
Hi all,

I noticed an off-by-one error in psql's verbose-mode display for \dg
and \du. In verbose mode the query returns an extra column, which
shifts the replication flag from index 8 to index 9, so \dg and \du
fail to display the Replication attribute:

test=# \dg rep
List of roles
 Role name | Attributes  | Member of
---+-+---
 rep   | Replication | {}

test=# \dg+ rep
  List of roles
 Role name | Attributes | Member of | Description
---++---+-
 rep   || {}|


Attached is a one line patch to fix.

Josh
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 9dc02a1..fc860cc 100644
*** a/src/bin/psql/describe.c
--- b/src/bin/psql/describe.c
*** describeRoles(const char *pattern, bool
*** 2342,2348 
  			add_role_attribute(buf, _("Cannot login"));
  
  		if (pset.sversion >= 90100)
! 			if (strcmp(PQgetvalue(res, i, 8), "t") == 0)
  				add_role_attribute(buf, _("Replication"));
  
  		conns = atoi(PQgetvalue(res, i, 6));
--- 2342,2348 
  			add_role_attribute(buf, _("Cannot login"));
  
  		if (pset.sversion >= 90100)
! 			if (strcmp(PQgetvalue(res, i, (verbose ? 9 : 8)), "t") == 0)
  				add_role_attribute(buf, _("Replication"));
  
  		conns = atoi(PQgetvalue(res, i, 6));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Generalized edit function?

2011-02-26 Thread fork
Robert Haas robertmhaas at gmail.com writes:

 
 On Sat, Feb 26, 2011 at 4:19 PM, Josh Berkus josh at agliodbs.com wrote:
  Anyway, if it's ASCII-only, that's a guaranteed way to make sure it
  isn't taken seriously.
 
 Pre-9.1 levenshtein is ASCII-only, and I think some of the other stuff
 in contrib/fuzzystrmatch still is.

I am only looking at 9.0.3 for levenshtein, so I don't have any thoughts
yet on multi-byteness.  I will have to figure out the multibyte character
work once I get the basic algorithm working -- any thoughts on that?  Any
pitfalls in porting?

 So I have some sympathy with the OP's desire not to burden himself
 with the non-ASCII case if he doesn't need it for his application, but
 I also agree with your point that we probably wouldn't accept code
 into contrib that doesn't.

Good to know.  I will try to avoid backing myself into an ASCII corner.
 





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Native XML

2011-02-26 Thread Josh Berkus
On 2/26/11 3:40 PM, Anton wrote:
 I've been playing with 'native XML' for a while and am now wondering if
 further development of such a feature makes sense for Postgres.
 (By not having brought this up earlier I'm taking the chance that the
 effort will be wasted, but that's not something you should worry about.)

Nah, just if you don't get any feedback, bring it up again in June when
9.2 development officially starts.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication server timeout patch

2011-02-26 Thread Fujii Masao
On Fri, Feb 18, 2011 at 12:10 PM, Robert Haas robertmh...@gmail.com wrote:
 IMHO, that's so broken as to be useless.

 I would really like to have a solution to this problem, though.
 Relying on TCP keepalives is weak.

Agreed.

I updated the replication timeout patch which I submitted before.
http://archives.postgresql.org/message-id/AANLkTinSvcdAYryNfZqd0wepyh1Pf7YX6Q0KxhZjas6a%40mail.gmail.com

Since the patch also implements non-blocking send functions,
the timeout can work properly even when the send buffer has
filled up.

 There are two things that I think are pretty clear.  If the receiver
 has wal_receiver_status_interval=0, then we should ignore
 replication_timeout for that connection.

The patch still doesn't check that wal_receiver_status_interval
is set up properly. I'll implement that later.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
***
*** 2015,2020  SET ENABLE_SEQSCAN TO OFF;
--- 2015,2042 
     </para>
    </listitem>
   </varlistentry>
+ 
+  <varlistentry id="guc-replication-timeout" xreflabel="replication_timeout">
+   <term><varname>replication_timeout</varname> (<type>integer</type>)</term>
+   <indexterm>
+    <primary><varname>replication_timeout</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     Specifies the maximum time, in milliseconds, to wait for the reply
+     from the standby before terminating replication.  This is useful for
+     the primary server to detect a standby crash or network outage.
+     A value of zero (the default) turns this off.  This parameter can
+     only be set in the <filename>postgresql.conf</> file or on the server
+     command line.
+    </para>
+    <para>
+     To make the timeout work properly, <xref linkend="guc-wal-receiver-status-interval">
+     must be enabled on the standby, and its value must be less than the
+     value of <varname>replication_timeout</>.
+    </para>
+   </listitem>
+  </varlistentry>
   </variablelist>
  </sect2>
  
***
*** 2125,2130  SET ENABLE_SEQSCAN TO OFF;
--- 2147,2157 
     the <filename>postgresql.conf</> file or on the server command line.
     The default value is 10 seconds.
    </para>
+   <para>
+    When <xref linkend="guc-replication-timeout"> is enabled on the primary,
+    <varname>wal_receiver_status_interval</varname> must be enabled, and its
+    value must be less than the value of <varname>replication_timeout</>.
+   </para>
    </listitem>
   </varlistentry>
  
*** a/src/backend/libpq/pqcomm.c
--- b/src/backend/libpq/pqcomm.c
***
*** 56,61 
--- 56,63 
   *		pq_putbytes		- send bytes to connection (not flushed until pq_flush)
   *		pq_flush		- flush pending output
   *		pq_getbyte_if_available - get a byte if available without blocking
+  *		pq_putbytes_if_writable	- send bytes to connection if writable without blocking
+  *		pq_flush_if_writable	- flush pending output if writable without blocking
   *
   * message-level I/O (and old-style-COPY-OUT cruft):
   *		pq_putmessage	- send a normal message (suppressed in COPY OUT mode)
***
*** 112,117  static char sock_path[MAXPGPATH];
--- 114,120 
  
  static char PqSendBuffer[PQ_BUFFER_SIZE];
  static int	PqSendPointer;		/* Next index to store a byte in PqSendBuffer */
+ static int	PqSendStart;		/* Next index to send a byte in PqSendBuffer */
  
  static char PqRecvBuffer[PQ_BUFFER_SIZE];
  static int	PqRecvPointer;		/* Next index to read a byte from PqRecvBuffer */
***
*** 128,133  static bool DoingCopyOut;
--- 131,137 
  static void pq_close(int code, Datum arg);
  static int	internal_putbytes(const char *s, size_t len);
  static int	internal_flush(void);
+ static int	internal_flush_if_writable(void);
  
  #ifdef HAVE_UNIX_SOCKETS
  static int	Lock_AF_UNIX(unsigned short portNumber, char *unixSocketName);
***
*** 1153,1158  internal_putbytes(const char *s, size_t len)
--- 1157,1212 
  }
  
  /* 
+  *		pq_putbytes_if_writable - send bytes to connection (not flushed
+  *			until pq_flush), if writable
+  *
+  * Returns the number of bytes written without blocking, or EOF if trouble.
+  * 
+  */
+ int
+ pq_putbytes_if_writable(const char *s, size_t len)
+ {
+ 	size_t		amount;
+ 	size_t		nwritten = 0;
+ 
+ 	/* Should not be called by old-style COPY OUT */
+ 	Assert(!DoingCopyOut);
+ 	/* No-op if reentrant call */
+ 	if (PqCommBusy)
+ 		return 0;
+ 	PqCommBusy = true;
+ 
+ 	while (len > 0)
+ 	{
+ 		/* If buffer is full, then flush it out */
+ 		if (PqSendPointer >= PQ_BUFFER_SIZE)
+ 		{
+ 			int		r;
+ 
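+ 			/*
+ 			 * Try to flush without blocking: a zero result means the
+ 			 * socket cannot accept more data right now, so stop and
+ 			 * report how much was buffered; EOF means the connection
+ 			 * is broken.
+ 			 */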
+ 			r = internal_flush_if_writable();
+ 			if (r == 0)
+ 				break;
+ 			if (r == EOF)
+ 			{
+ 				PqCommBusy = false;
+ 				return r;
+ 			}
+ 		}
+ 		

Re: [HACKERS] Generalized edit function?

2011-02-26 Thread Robert Haas
On Sat, Feb 26, 2011 at 7:40 PM, fork forkandw...@gmail.com wrote:
 Pre-9.1 levenshtein is ASCII-only, and I think some of the other stuff
 in contrib/fuzzystrmatch still is.

 I am only looking at 9.0.3 for levenshtein, so I don't have any thoughts
 yet on multi-byteness.  I will have to figure out the multibyte character
 work once I get the basic algorithm working -- any thoughts on that?  Any
 pitfalls in porting?

The main thing with levenshtein() is that if you're working with
single byte characters then you can reference the i'th character as
x[i], whereas if you have multi-byte characters then you need to build
an offset table and look at length[i] bytes beginning at
x[offset[i]].  That turns out to be significantly more expensive.  As
initially proposed, the patch to add multi-byte awareness built this
lookup table for any multi-byte encoding and used the faster technique
for single-byte encodings, but that wasn't actually so hot, because
the most widely used encoding these days is probably UTF-8, which of
course is a multi-byte encoding.  What we ended up with is a fast-path
for the case where both strings contain only single-byte characters,
which will always be true in a single-byte encoding but might easily
also be true in a multi-byte encoding, especially for English
speakers.  I don't know if that's exactly right for what you're trying
to do - you'll probably need to try some different things and
benchmark.  I would however recommend that you look at the
master-branch implementation of levenshtein() rather than the old
9.0.x one, because it's significantly different, and forward-porting
your changes will probably be hard.
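
As a concrete sketch of the offset-table idea (my own illustration,
using PostgreSQL's pg_mblen() to measure character widths; array sizing
and error handling are the caller's problem here):

#include "postgres.h"
#include "mb/pg_wchar.h"

/*
 * Fill caller-provided arrays (each with room for at least slen
 * entries) so that character i of s occupies lengths[i] bytes starting
 * at s + offsets[i].  Returns the number of characters found.
 */
static int
build_char_offsets(const char *s, int slen, int *offsets, int *lengths)
{
	int		pos = 0;
	int		i = 0;

	while (pos < slen)
	{
		int		w = pg_mblen(s + pos);	/* byte length of this character */

		offsets[i] = pos;
		lengths[i] = w;
		pos += w;
		i++;
	}
	return i;
}

With such tables, the inner-loop character comparison becomes a length
check plus a short memcmp, which is where the extra cost Robert
describes comes from.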

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sync rep design architecture (was disposition of remaining patches)

2011-02-26 Thread Daniel Farina
On Fri, Feb 25, 2011 at 8:40 AM, Greg Smith g...@2ndquadrant.com wrote:
 I didn't get the Streaming Rep + Hot Standby features I wanted in 9.0 either. 
  But committing what was reasonable to include in that version let me march 
 forward with very useful new code, doing another year of development on my 
 own projects and getting some new things get fixed in core.  And so far it 
 looks like 9.1 will sort out all of the kinks I was unhappy about.  The same 
 sort of thing will need to happen to get Sync Rep committed and then 
 appropriate for more use cases.  There isn't any margin left for discussions 
 of scope creep left here; really it's "is this subset useful for some 
 situations and stable enough to commit now".

I mostly wanted to raise the issue not as a blocker, but as an attempt
to avoid boxing ourselves in for growing such a feature in 9.2.  If
9.1 ships with the syncrep patch as conceived, it'll just mean that
it'll be hard or impossible to offer syncrep to users as well as at the
infrastructure service provider level...which is, actually, quite
fine -- most current users likely don't want to take the performance
hit of syncrep all the time, but living with it during a switchover
is quite fine.  I just wanted to make a reasonable effort to ensure
its possibility in a 9.2-like timeframe.

 2. The unprivileged user can disable syncrep, in any situation. This
 flexibility is *great*, but you don't really want people to do it when
 one is performing the switchover.

 For the moment you may have to live with a situation where user connections 
 must be blocked during the brief moment of switchover to eliminate this 
 issue.  That's what I end up doing with 9.0 production systems to get a 
 really clean switchover, there's a second of hiccup even in the best case.  
 I'm not sure yet of the best way to build a UI to make that more 
 transparent in the sync rep case.  It's sure not a problem that's going to 
 get solved in this release though.

I'm totally okay with killing all backends during the switchover between
9.1 and 9.2 releases, unless I get super clever with pgbouncer... which
I will have to do anyway.

--
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers