Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Darren Duncan
Your proposal makes me think of something similar which might be useful,
INclusion constraints.  Just as exclusion constraints might be thought of as a
generalization of unique/key constraints, inclusion constraints are like a
generalization of foreign key constraints.  An inclusion constraint basically
allows some comparison operator other than is-equal to test whether values
in one table match values in another table, and the constraint is satisfied if
the test results in true.  An example of such an inclusion test is whether the
range in one table is contained in a range in another table.  I assume/hope
that, similarly, now that we have range types in 9.2, the existing
exclusion constraints can be used with range comparison operators.
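
For reference, a 9.2 exclusion constraint over a range type might look roughly
like this (a minimal sketch; the table and column names are only illustrative):

  CREATE TABLE reservation (
      during tsrange,
      EXCLUDE USING gist (during WITH &&)  -- no two rows may have overlapping ranges
  );

The hypothetical inclusion constraint would then be the converse of a foreign
key: require that, say, each row's range be contained in (<@) some range in
another table.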


As to your actual proposal, it sounds like a generalization of the relational
join or set intersection operator where, instead of comparing sets defined in
terms of an enumeration of discrete values, we are comparing sets defined by a
range, which conceptually have infinitely many values depending on the data type
the range is defined over.  But if we're doing this, then it would seem to make
sense to go further and see if we have set analogies for all of our relational
or set operators, should we want to work with non-discrete sets.


Now this sounds interesting in theory, but I would also assume that these could
be implemented by an extension in terms of the existing normal relational
operators, where each range value is a discrete value, combined with operators
for unioning, differencing, etc. over ranges.  A relation of ranges can
effectively represent a discontinuous range; in that case, the empty
discontinuous range is also canonically representable by a relation with zero
tuples.
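
For example (the names here are just illustrative), the discontinuous range
[1,3) union [5,8) could be stored as a relation of ranges:

  CREATE TABLE discontinuous_range (piece int4range);
  INSERT INTO discontinuous_range VALUES ('[1,3)'), ('[5,8)');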


Jeff, I get the impression your proposal is partly about helping performance by
supporting this internally, rather than just defining it as a SQL function.
Am I right?


-- Darren Duncan

Jeff Davis wrote:

I hope this is not an inappropriate time for 9.3 discussions. The flip
side of asking for submissions in the first couple of commitfests is
that I need to submit proposals now.

What is a Range Join?

See attached SQL for an example. The key difference is that the join
condition is not equality, but overlaps (&&).
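
Roughly, the kind of query I mean (a minimal sketch; table and column names
are only illustrative):

  CREATE TABLE a (period tsrange);
  CREATE TABLE b (period tsrange);

  SELECT * FROM a JOIN b ON a.period && b.period;  -- joined on overlap, not equality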

Problem statement: slow. Nested loops are the only option, although they
can benefit from an inner GiST index if available. But if the join is
happening up in the plan tree somewhere, then it's impossible for any
index to be available.

Proposed solution: a modified merge join that can handle ranges.

 1. Order the ranges on both sides by the lower bound, then upper bound.
Empty ranges can be excluded entirely.
 2. Left := first range on left, Right := first range on right
 3. If Left or Right is empty, terminate.
 4. If lower(Left) > upper(Right), discard Right, goto 2
 5. If lower(Right) > upper(Left), discard Left, goto 2
 6. return (Left, Right) as joined tuple
 7. Right := next range on right
 8. goto 3

If step 4 or step 5 keeps getting triggered, and a btree index is
available (ordered by lower bound), we can re-probe to go to the correct
position, and consider that the new top range on that side. This is an
optimization for the case where there are large numbers of ranges with
no match on the other side.

Thanks to Nathan Boley for helping me devise this algorithm. However,
any bugs are mine alone ;)

Weaknesses: I haven't thought through the optimization, but I suspect it
will be hard to be very accurate in the costing. That might be OK,
because there aren't very many options anyway, but I'll need to think it
through.

Questions:

 * Is this idea sane? -- that is, are ranges important enough that
people are willing to maintain a new operator?
 * The more general problem might be spatial joins which can operate
in N dimensions, and I doubt this would work very well in that case.
Does someone know of a spatial join algorithm (without IP claims) that
would be as good as this one for ranges?
 * Other thoughts?

Regards,
Jeff Davis




Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Heikki Linnakangas

On 16.04.2012 08:40, Jeff Davis wrote:

Does someone know of a spatial join algorithm (without IP claims) that
would be as good as this one for ranges?


I'd be happy with an algorithm that's specific to ranges, too, but my 
gut feeling is that there has to be a lot of research on spatial join
algorithms out there. Implementing one of them would be more widely 
applicable, so I think we should look into spatial join algorithms first 
and see how far that gets us, before we write something specific to ranges.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Proposed solution: a modified merge join that can handle ranges.

  1. Order the ranges on both sides by the lower bound, then upper bound.
 Empty ranges can be excluded entirely.
  2. Left := first range on left, Right := first range on right
  3. If Left or Right is empty, terminate.
  4. If lower(Left) > upper(Right), discard Right, goto 2
  5. If lower(Right) > upper(Left), discard Left, goto 2
  6. return (Left, Right) as joined tuple
  7. Right := next range on right
  8. goto 3

This is surely not correct in detail.  As written, it will be impossible
for any tuple on the right side to be joined to more than one left-side
tuple.  You will need something analogous to the mark-and-restore
rewinding logic in standard mergejoin to make this work.

I can't escape the feeling that we could do this with more-or-less
standard mergejoin logic paying attention to some fuzzy notion of
equality, with the question of whether a given pair of ranges actually
overlap being deferred to a filter condition.  Not quite sure how to
make that work though.

  * Is this idea sane? -- that is, are ranges important enough that
 people are willing to maintain a new operator?

Dunno.  It might be easier to sell the idea of adding support for range
joins in a couple of years, after we've seen how much use ranges get.

I will note that mergejoin is not only one of the more complicated
executor modules, but it accounts for a huge percentage of the planner's
complexity; which doesn't exactly make me sympathize with the notion of
let's-copy-and-paste-all-that.  It'd be a lot better if we could build
it as a small tweak to mergejoin rather than an independent thing.

(Having said that, I have no idea how the planner support could work
at all, because its treatment of mergejoins is intimately tied to
mergejoinable equality and EquivalenceClasses and PathKeys, which don't
seem readily extensible to the range situation.)

regards, tom lane



Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-04-16 Thread Nikhil Sontakke
  Displace yes. It would error out if someone says
 
  ALTER TABLE ONLY... CHECK ();
 
  suggesting to use the ONLY with the CHECK.

 I'd say the behavior for that case can revert to the PostgreSQL 9.1
 behavior.
 If the table has children, raise an error.  Otherwise, add an inheritable
 CHECK constraint, albeit one lacking inheritors at that moment.


Ok, that sounds reasonable.

Another thing that we should consider is that if we are replacing ONLY with
NO INHERIT, then instead of just making a cosmetic syntactic change, we
should also replace all the is*only-style field names with noinherit for
the sake of completeness and uniformity.

Regards,
Nikhils


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-16 Thread Amit Kapila
Another way to look at this is that if we have

  select ... from a,b,c,d where a.x = b.y + c.z

we want to consider a cross-join of b and c, in the hopes that we can do
something useful with the join clause at the next level where it can
join to a.  From b's perspective there is no percentage in joining to d.

For this kind of query, currently (referring to the 9.0.3 code) it also
considers the joins b,c and b,d.
As there is no join clause among b, c, and d, it will go down the path of
make_rels_by_clauseless_joins(), where it considers the joins b,c and b,d.

In this kind of query, if my suggestion in the mail below is followed,
then it will consider joining a,b a,c a,d at level-2 in the function
make_rels_by_clause_joins(), which it currently doesn't do, and which may
generate useless join paths.
However, in real-world scenarios, queries of this kind, where two columns of
different tables are used in a one-sided expression (b.y + c.z) of the where
clause, may be rare.

On the other hand, when we come to consider d, it will have no join
clauses so we will consider joining it to each other rel in turn.  

When it comes to considering d, at level-2 it only considers later rels, so
it should not consider joining with each other rel.

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Monday, April 16, 2012 9:57 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics 

Amit Kapila amit.kap...@huawei.com writes:
 That case is handled by make_rels_by_clauseless_joins
 It will be handled by make_rels_by_clauseless_joins() if the given rel old_rel
 doesn't have any join clause.
 However, if it has a join clause but isn't able to join with any other rels --
 as in the example you provided, where relation c is not able to join
 with the other rel d --
 then in such cases it can do a cross-join with d, because it has not found any
 relation to join with.
 Wouldn't that address the problem you mentioned?

Sounds to me like that's going in the wrong direction, ie, joining to
exactly the wrong relations.  If you have to cross-join it's better to
cross-join against relations that are included in one of the join
clauses that does mention the current relation.

Another way to look at this is that if we have

select ... from a,b,c,d where a.x = b.y + c.z

we want to consider a cross-join of b and c, in the hopes that we can do
something useful with the join clause at the next level where it can
join to a.  From b's perspective there is no percentage in joining to d.
On the other hand, when we come to consider d, it will have no join
clauses so we will consider joining it to each other rel in turn.  So
if there is any value in joining d early, we will find that out.

Or at least that's the theory.  Now that I look at it this way, I think
there is a bug here: when we are at level 2, we only consider later rels
in the list for forced clauseless joins.  That would be okay if the
condition were symmetrical, but it isn't.  This makes for a bogus
FROM-list ordering dependency in handling of clauseless joins.  Not
sure how much that matters in the real world, but it still seems wrong.

regards, tom lane




Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Noah Misch
On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
 In the department of query cancellations, I believe Noah argued
 previously that this wasn't really going to cause a problem.  And,
 indeed, if the master has a mix of inserts, updates, and deletes, then
 it seems likely that any recovery conflicts generated this way would
 be hitting about the same place in the XID space as the ones caused by
 pruning away dead tuples.  What will be different, if we go this
 route, is that an insert-only master, which right now only generates
 conflicts at freezing time, will instead generate conflicts much
 sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
 enough to know whether this is a problem, and I'm not trying to block
 this approach, but I do want to make sure that everyone agrees that
 it's acceptable before we go do it, because inevitably somebody is
 going to get bit.

Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC off would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.

 As to correctness, after further review of lazy_scan_heap(), I think
 there are some residual bugs here that need to be fixed whatever we
 decide to do about the Hot Standby case:
 
 1. I noticed that when we do PageSetAllVisible() today we follow it up
 with SetBufferCommitInfoNeedsSave().  PD_ALL_VISIBLE is not merely a
 hint, so I think these should be changed to MarkBufferDirty(), which
 shouldn't make much difference given current code, but proposals to
 handle hint changes differently than non-hint changes abound, so it's
 probably not good to rely on those meaning the same thing forever.

Do you refer to PD_ALL_VISIBLE as not merely a hint due to the requirement
to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
in its role as a witness of tuple visibility, but it is not a hint in its role
as a witness of the visibility map status?  In any event, I agree that those
call sites should use MarkBufferDirty().

The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
systems with weaker memory ordering, the FlushBuffer() process's clearing of
BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
process until some time after the former retrieved buffer contents from shared
memory.  Memory barriers could make the comment true, but we should probably
just update the comment to explain that a race condition may eat the update
entirely.  Incidentally, is there a good reason for MarkBufferDirty() to
increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
not to do so?  Looks like an oversight.

 2. More seriously, lazy_scan_heap() releases the buffer lock before
 setting the all-visible bit on the heap.  This looks just plain wrong
 (and it's my fault, to be clear).  Somebody else can come along after
 we've released the buffer lock and mark the page not-all-visible.
 That concurrent process will check the visibility map bit, find it
 already clear, and go on its merry way.  Then, we set the visibility
 map bit and go on our merry way.  Now we have a not-all-visible page
 with the all-visible bit set in the visibility map.   Oops.

Hmm, indeed.  This deserves its own open item.

 I think
 this probably needs to be rewritten so that lazy_scan_heap() acquires
 a pin on the visibility map page before locking the buffer for cleanup
 and holds onto that pin until either we exit the range of heap pages
 covered by that visibility map page or trigger index vac due to
 maintenance_work_mem exhaustion.  With that infrastructure in place,
 we can set the visibility map bit at the same time we set the
 page-level bit without risking holding the buffer lock across a buffer
 I/O (we might have to hold the buffer lock across a WAL flush in the
 worst case, but that hazard exists in a number of other places as well
 and fixing it here is well out of scope).

Looks reasonable at a glance.

 1. vacuum on master sets the page-level bit and the visibility map bit
 2. the heap page with the bit is written to disk BEFORE the WAL entry
 generated in (1) gets flushed; this is allowable, since it's not an
 error for the page-level bit to be set while the visibility-map bit is
 unset, only the other way around
 3. crash (still before the WAL entry is flushed)
 4. some operation on the master emits an FPW for the page without
 first rendering it not-all-visible
 
 At present, I'm not sure there's any real problem here, since normally
 an all-visible heap page is only going to get another WAL entry if
 somebody does an insert, update, or delete on it, in which case the
 visibility map bit is going to get cleared anyway.  Freezing the page
 might emit a new FPW, but that's going to generate conflicts anyway,
 so no problem there.  So I think there's no real problem here, but it
 

Re: [HACKERS] not null validation option in contrib/file_fdw

2012-04-16 Thread Etsuro Fujita
Thank you for the review.

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew Dunstan
 Sent: Friday, April 13, 2012 9:16 PM
 To: Shigeru HANADA
 Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
 
 
 
 On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
  (2012/04/13 16:59), Etsuro Fujita wrote:
  I updated the patch added to CF 2012-Next [1].  Attached is the
  updated version of the patch.
  I applied the patch and ran regression tests of file_fdw, and I got
  SIGSEGV X-(
 
  The failure occurs in fileGetOptions, and it is caused by
  list_delete_cell being used inside a foreach loop; the ListCell pointing to
  the delete target has been freed by list_delete_cell, but foreach accesses
  it to get the next element.
 
  Some backend functions which use list_delete_cell in a loop use a for
  loop instead of foreach, and other functions exit the loop after
  calling list_delete_cell.  Since we can't stop searching for non-COPY
  options until we reach the end of the options list, we would need to
  choose the former (a for loop), or create another list which contains only
  valid COPY options and return it via the other_options parameter.
 
 
 Yes, the code in fileGetOptions() appears to be bogus.

Sorry, I will fix it.

 Also, validate is a terrible name for the option (and in the code) IMNSHO.
 It's far too generic. validate_not_null or some such would surely be better.

I thought it would be used for not only NOT NULL but also CHECK and foreign
key constraints.  That is, when a user sets the option to 'true', file_fdw
verifies that each tuple meets all kinds of constraints.  So, how about
validate_data_file or simply validate_file?

Best regards,
Etsuro Fujita

 cheers
 
 andrew
 
 





Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch n...@leadboat.com wrote:
 On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
 In the department of query cancellations, I believe Noah argued
 previously that this wasn't really going to cause a problem.  And,
 indeed, if the master has a mix of inserts, updates, and deletes, then
 it seems likely that any recovery conflicts generated this way would
 be hitting about the same place in the XID space as the ones caused by
 pruning away dead tuples.  What will be different, if we go this
 route, is that an insert-only master, which right now only generates
 conflicts at freezing time, will instead generate conflicts much
 sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
 enough to know whether this is a problem, and I'm not trying to block
 this approach, but I do want to make sure that everyone agrees that
 it's acceptable before we go do it, because inevitably somebody is
 going to get bit.

 Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
 A standby with the GUC off would ignore all-visible indicators and also
 decline to generate conflicts when flipping them on.

I'm against adding a new GUC, especially for that fairly thin reason.


 1. vacuum on master sets the page-level bit and the visibility map bit
 2. the heap page with the bit is written to disk BEFORE the WAL entry
 generated in (1) gets flushed; this is allowable, since it's not an
 error for the page-level bit to be set while the visibility-map bit is
 unset, only the other way around
 3. crash (still before the WAL entry is flushed)
 4. some operation on the master emits an FPW for the page without
 first rendering it not-all-visible

 At present, I'm not sure there's any real problem here, since normally
 an all-visible heap page is only going to get another WAL entry if
 somebody does an insert, update, or delete on it, in which case the
 visibility map bit is going to get cleared anyway.  Freezing the page
 might emit a new FPW, but that's going to generate conflicts anyway,
 so no problem there.  So I think there's no real problem here, but it
 doesn't seem totally future-proof - any future type of WAL record that
 might modify the page without rendering it not-all-visible would
 create a very subtle hazard for Hot Standby.  We could dodge the whole
 issue by leaving the hack in heapgetpage() intact and just be happy
 with making index-only scans work, or maybe somebody has a more clever
 idea.

 Good point.  We could finagle things so the standby notices end-of-recovery
 checkpoints and blocks the optimization for older snapshots.  For example,
 maintain an integer count of end-of-recovery checkpoints seen and store that
 in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
 global value is greater than the snapshot's value, disable the optimization
 for that snapshot.  I don't know whether the optimization is powerful enough
 to justify that level of trouble, but it's an option to consider.

 For a different approach, targeting the fragility, we could add assertions to
 detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
 full-page image.

We don't need a future proof solution, especially not at this stage of
the release cycle. Every time you add a WAL record, we need to
consider the possible conflict impact.

We just need a simple and clear solution. I'll work on that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 7:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Dunno.  It might be easier to sell the idea of adding support for range
 joins in a couple of years, after we've seen how much use ranges get.

Once we've started the journey towards range types we must complete it
reasonably quickly.

Having partially implemented, unoptimised features is the same as not
having the feature at all, so it will remain unused until it really
works. We could say that about many features, but if Jeff is
championing this, I'd say go for it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Last gasp

2012-04-16 Thread Simon Riggs
On Sun, Apr 15, 2012 at 11:31 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 04/15/2012 05:46 AM, Simon Riggs wrote:

 Our problem is not lack of resource, it is ineffective
 delegation. As Hannu points out, he didn't know the patch would be
 rejected, so he didn't know help was needed to save something useful.
 I considered that the job of the CF manager, but perhaps it is was
 not.


 Note that one of the influences on the death march here was lack of a
 fully devoted CF manager for the full duration.  I did some rabble rousing
 to get things started in the usual way, but my time for this only budgeted
 for six weeks in that role.  And that was quite optimistic for this one.

 Trying to quantify how much time investment the CF manager role really
 involves is one of my important projects to chew on.  Whoever ends up doing
 that should at least have an idea what scale of problem they're getting
 into.

Again, if the CF manager role is too big, then we delegate. No reason
why we can't have a CF team, with different people responsible for
different areas.

The key thing is working out how to mobilise the resources we have to
best effect.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Different gettext domain needed for error context

2012-04-16 Thread Heikki Linnakangas

On 15.04.2012 00:54, Tom Lane wrote:

I really think we need to change errcontext itself to pass the correct
domain.  If we are going to require a domain to be provided (and this
does require that, for correct operation), then we need to break any
code that doesn't provide it in a visible fashion.

A possibly more attractive alternative is to redefine errcontext
with a macro that allows TEXTDOMAIN to be passed in behind-the-scenes,
thus keeping source-level compatibility.  We can do this with the same
type of hack we've used for many years for elog():

#define errcontext  set_errcontext_domain(TEXTDOMAIN), errcontext_msg

where the actual message-passing function is now called errcontext_msg.


Ok then, here's a patch using that approach.

I had to rename a few local variables called errcontext, because the
macro now tries to expand those and you get an error.


Note: If you want to test this at home, the original test case I posted
doesn't currently work, because the text of the context messages in
PL/pgSQL has been slightly changed since I posted the original test
case, but the translations have not been updated yet. Until then, you
can manually remove the double quotes in messages like 'function \"%s\"'
in the .po file to test this.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0c301b2..a10b569 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6504,7 +6504,7 @@ StartupXLOG(void)
 			bool		recoveryContinue = true;
 			bool		recoveryApply = true;
 			bool		recoveryPause = false;
-			ErrorContextCallback errcontext;
+			ErrorContextCallback errcallback;
 			TimestampTz xtime;
 
 			InRedo = true;
@@ -6566,10 +6566,10 @@ StartupXLOG(void)
 }
 
 /* Setup error traceback support for ereport() */
-errcontext.callback = rm_redo_error_callback;
-errcontext.arg = (void *) record;
-errcontext.previous = error_context_stack;
-error_context_stack = &errcontext;
+errcallback.callback = rm_redo_error_callback;
+errcallback.arg = (void *) record;
+errcallback.previous = error_context_stack;
+error_context_stack = &errcallback;
 
 /*
  * ShmemVariableCache->nextXid must be beyond record's xid.
@@ -6614,7 +6614,7 @@ StartupXLOG(void)
  RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
 
 /* Pop the error context stack */
-error_context_stack = errcontext.previous;
+error_context_stack = errcallback.previous;
 
  if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
  	XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 95fec8d..cb7e67a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1854,7 +1854,7 @@ CopyFrom(CopyState cstate)
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
-	ErrorContextCallback errcontext;
+	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
 	BulkInsertState bistate;
@@ -1998,10 +1998,10 @@ CopyFrom(CopyState cstate)
 	econtext = GetPerTupleExprContext(estate);
 
 	/* Set up callback to identify error line number */
-	errcontext.callback = CopyFromErrorCallback;
-	errcontext.arg = (void *) cstate;
-	errcontext.previous = error_context_stack;
-	error_context_stack = &errcontext;
+	errcallback.callback = CopyFromErrorCallback;
+	errcallback.arg = (void *) cstate;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
 
 	for (;;)
 	{
@@ -2116,7 +2116,7 @@ CopyFrom(CopyState cstate)
 			nBufferedTuples, bufferedTuples);
 
 	/* Done, clean up */
-	error_context_stack = errcontext.previous;
+	error_context_stack = errcallback.previous;
 
 	FreeBulkInsertState(bistate);
 
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 80dbdd1..33966ee 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -144,10 +144,10 @@ setup_parser_errposition_callback(ParseCallbackState *pcbstate,
 	/* Setup error traceback support for ereport() */
  	pcbstate->pstate = pstate;
  	pcbstate->location = location;
-	pcbstate->errcontext.callback = pcb_error_callback;
-	pcbstate->errcontext.arg = (void *) pcbstate;
-	pcbstate->errcontext.previous = error_context_stack;
-	error_context_stack = &pcbstate->errcontext;
+	pcbstate->errcallback.callback = pcb_error_callback;
+	pcbstate->errcallback.arg = (void *) pcbstate;
+	pcbstate->errcallback.previous = error_context_stack;
+	error_context_stack = &pcbstate->errcallback;
 }
 
 /*
@@ -157,7 +157,7 @@ void
 cancel_parser_errposition_callback(ParseCallbackState *pcbstate)
 {
 	/* Pop the error context stack */
-	error_context_stack = pcbstate->errcontext.previous;
+	error_context_stack = pcbstate->errcallback.previous;
 }
 
 /*
diff --git 

Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Heikki Linnakangas

On 16.04.2012 10:38, Simon Riggs wrote:

On Mon, Apr 16, 2012 at 8:02 AM, Noah Mischn...@leadboat.com  wrote:

On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:

In the department of query cancellations, I believe Noah argued
previously that this wasn't really going to cause a problem.  And,
indeed, if the master has a mix of inserts, updates, and deletes, then
it seems likely that any recovery conflicts generated this way would
be hitting about the same place in the XID space as the ones caused by
pruning away dead tuples.  What will be different, if we go this
route, is that an insert-only master, which right now only generates
conflicts at freezing time, will instead generate conflicts much
sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
enough to know whether this is a problem, and I'm not trying to block
this approach, but I do want to make sure that everyone agrees that
it's acceptable before we go do it, because inevitably somebody is
going to get bit.


Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC off would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.


I'm against adding a new GUC, especially for that fairly thin reason.


Same here.

Can we have a soft hot standby conflict that doesn't kill the query, 
but disables index-only-scans?


In the long run, perhaps we need to store XIDs in the visibility map 
instead of just a bit. If we only stored one xid per 100 pages or
something like that, the storage overhead would not be much higher than 
what we have at the moment. But that's obviously not going to happen for 
9.2...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Last gasp

2012-04-16 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 I've used Redmine a lot, as you know, and I only keep using it because
 it's a requirement at work.  It is certainly not close to usable for
 general pgsql stuff.  (Trac, which we used to use prior to Redmine, was
 certainly much worse, though).

Same story here, still using redmine a lot, all with custom reports etc.

 I can't say that it's all that slow, or that there's a problem with the
 code, or that the search doesn't work right (and I've never had a wiki
 edit disappear, either, and I've used that a lot).  It's just the wrong
 tool altogether.

It's indeed slow here, and I agree that's not the problem. Not the tool
we need, +1.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] man pages for contrib programs

2012-04-16 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes:
 Good question.  I guess we could keep the original name ... Modules
 for that chapter.

Those are a kind of server application in my mind. I think we want to
keep using “module” to mean the shared library file we load at runtime,
be it a .so, a .dylib or a .dll.

That said, that non-extension stuff is most often just a module.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-16 Thread Heikki Linnakangas

On 13.04.2012 19:17, Guillaume Lelarge wrote:

On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:

On 08.04.2012 11:59, Guillaume Lelarge wrote:

There could be a good reason which would explain why we can't (or don't
want to) do this, but I don't see it right now.


Me neither, except a general desire to keep internals hidden. I propose
the attached.


Sounds good to me. I would love to see this happening in 9.2.


Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed 
MSVC install.pm to also copy the header file.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] nodes/*funcs.c inconsistencies

2012-04-16 Thread Noah Misch
I observed these inconsistencies in node support functions:

- _copyReassignOwnedStmt() uses COPY_SCALAR_FIELD() on the string field
  newrole, and _equalReassignOwnedStmt() uses COMPARE_NODE_FIELD().

- _outCreateForeignTableStmt() calls _outCreateStmt() directly.  This produces
  the label CREATEFOREIGNTABLESTMTCREATESTMT.  The attached patch splits
  things out the way we normally do in outfuncs.c.  There's no readfuncs.c
  support, so this is strictly cosmetic.

- _outColumnDef() uses WRITE_INT_FIELD for the storage field, a char.
  Again, no readfuncs.c support to create a compatibility problem.

- _copyRenameStmt() and _equalRenameStmt() ignore the relationType field,
  but I can't see a good reason to do so.  PostgreSQL 9.1 added this field,
  but only recent master (after commit 38b9693f of 3 April 2012) references
  the field beyond setting it in the parser.

- _copyViewStmt() and _equalViewStmt() ignore the options field, and
  _equalColumnDef() ignores fdwoptions.  These are new in PostgreSQL 9.2,
  and I see no good reason to ignore them.

I'd suggest backpatching the ReassignOwnedStmt() bits; the wrong code could
produce crashes.  The rest are for master only.

Thanks,
nm
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c94799b..eed0ea4 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***
*** 2891,2896  _copyRenameStmt(const RenameStmt *from)
--- 2891,2897 
RenameStmt *newnode = makeNode(RenameStmt);
  
COPY_SCALAR_FIELD(renameType);
+   COPY_SCALAR_FIELD(relationType);
COPY_NODE_FIELD(relation);
COPY_NODE_FIELD(object);
COPY_NODE_FIELD(objarg);
***
*** 3047,3052  _copyViewStmt(const ViewStmt *from)
--- 3048,3054 
COPY_NODE_FIELD(aliases);
COPY_NODE_FIELD(query);
COPY_SCALAR_FIELD(replace);
+   COPY_NODE_FIELD(options);
  
return newnode;
  }
***
*** 3650,3656  _copyReassignOwnedStmt(const ReassignOwnedStmt *from)
ReassignOwnedStmt *newnode = makeNode(ReassignOwnedStmt);
  
COPY_NODE_FIELD(roles);
!   COPY_SCALAR_FIELD(newrole);
  
return newnode;
  }
--- 3652,3658 
ReassignOwnedStmt *newnode = makeNode(ReassignOwnedStmt);
  
COPY_NODE_FIELD(roles);
!   COPY_STRING_FIELD(newrole);
  
return newnode;
  }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9564210..c06b068 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***
*** 1306,1311  static bool
--- 1306,1312 
  _equalRenameStmt(const RenameStmt *a, const RenameStmt *b)
  {
COMPARE_SCALAR_FIELD(renameType);
+   COMPARE_SCALAR_FIELD(relationType);
COMPARE_NODE_FIELD(relation);
COMPARE_NODE_FIELD(object);
COMPARE_NODE_FIELD(objarg);
***
*** 1438,1443  _equalViewStmt(const ViewStmt *a, const ViewStmt *b)
--- 1439,1445 
COMPARE_NODE_FIELD(aliases);
COMPARE_NODE_FIELD(query);
COMPARE_SCALAR_FIELD(replace);
+   COMPARE_NODE_FIELD(options);
  
return true;
  }
***
*** 1945,1951  static bool
  _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt 
*b)
  {
COMPARE_NODE_FIELD(roles);
!   COMPARE_NODE_FIELD(newrole);
  
return true;
  }
--- 1947,1953 
  _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt 
*b)
  {
COMPARE_NODE_FIELD(roles);
!   COMPARE_STRING_FIELD(newrole);
  
return true;
  }
***
*** 2182,2187  _equalColumnDef(const ColumnDef *a, const ColumnDef *b)
--- 2184,2190 
COMPARE_NODE_FIELD(collClause);
COMPARE_SCALAR_FIELD(collOid);
COMPARE_NODE_FIELD(constraints);
+   COMPARE_NODE_FIELD(fdwoptions);
  
return true;
  }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 594b3fd..f713773 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***
*** 1927,1937  _outPlannerParamItem(StringInfo str, const PlannerParamItem 
*node)
   *
   
*/
  
  static void
! _outCreateStmt(StringInfo str, const CreateStmt *node)
  {
-   WRITE_NODE_TYPE(CREATESTMT);
- 
WRITE_NODE_FIELD(relation);
WRITE_NODE_FIELD(tableElts);
WRITE_NODE_FIELD(inhRelations);
--- 1927,1938 
   *
   
*/
  
+ /*
+  * print the basic stuff of all nodes that inherit from CreateStmt
+  */
  static void
! _outCreateStmtInfo(StringInfo str, const CreateStmt *node)
  {
WRITE_NODE_FIELD(relation);
WRITE_NODE_FIELD(tableElts);
WRITE_NODE_FIELD(inhRelations);
***
*** 1944,1954  _outCreateStmt(StringInfo str, const CreateStmt *node)
  }
  
  static void
  _outCreateForeignTableStmt(StringInfo str, 

[HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Kyotaro HORIGUCHI
Hello, this is bug report and a patch for it.

The first patch in the attachments is for 9.2dev and next one is
for 9.1.3.

On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does
not check progress against WAL segments written. This makes the
checkpointer always run at the speed determined by checkpoint_timeout,
regardless of the rate at which WAL advances.

This leads to an unexpected imbalance in the number of WAL segment
files between the master and the standby(s) when WAL advances at a
high rate. And what is worse, the master would have a much higher
chance of removing some WAL segments before the standby receives
them.

XLogPageRead()@xlog.c triggers a checkpoint based on WAL segment
advance. So I think this is a bug of the bgwriter in 9.1. The
attached patches fix it on 9.2dev and 9.1.3 respectively.

In the backported version to 9.1.3, bgwriter.c is modified
instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
used as the equivalent of GetStandbyFlushRecPtr() in 9.2.

By the way, GetStandbyFlushRecPtr() acquires a spinlock internally. It
might be enough to read XLogCtl->recoveryLastRecPtr without the lock
to make a rough estimate, but I can't tell whether that is safe or
not. The same discussion applies to GetWalRcvWriteRecPtr() on
9.1.3.


However, it seems to work fine on a simple test.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c9473f7..f86e9b9 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -491,8 +491,8 @@ CheckpointerMain(void)
 			 * Initialize checkpointer-private variables used during checkpoint.
 			 */
 			ckpt_active = true;
-			if (!do_restartpoint)
-ckpt_start_recptr = GetInsertRecPtr();
+			ckpt_start_recptr =
+do_restartpoint ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
 			ckpt_start_time = now;
 			ckpt_cached_elapsed = 0;
 
@@ -731,28 +731,29 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed before calling CreateCheckPoint. The code in XLogInsert that
-	 * actually triggers a checkpoint when checkpoint_segments is exceeded
-	 * compares against RedoRecptr, so this is not completely accurate.
-	 * However, it's good enough for our purposes, we're only calculating an
-	 * estimate anyway.
+	 * computed before calling CreateCheckPoint. The code in
+	 * XLogInsert that actually triggers a checkpoint when
+	 * checkpoint_segments is exceeded compares against RedoRecptr.
+	 * Similarly, we consult WAL flush location instead on hot
+	 * standbys and XLogPageRead compares it against RedoRecptr, too.
+	 * Although these are not completely accurate, it's good enough for
+	 * our purposes, we're only calculating an estimate anyway.
 	 */
-	if (!RecoveryInProgress())
+	
+	recptr = RecoveryInProgress() ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
+	elapsed_xlogs =
+		(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
+		 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
+		CheckPointSegments;
+
+	if (progress < elapsed_xlogs)
 	{
-		recptr = GetInsertRecPtr();
-		elapsed_xlogs =
-			(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
-			 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
-			CheckPointSegments;
-
-		if (progress < elapsed_xlogs)
-		{
-			ckpt_cached_elapsed = elapsed_xlogs;
-			return false;
-		}
+		ckpt_cached_elapsed = elapsed_xlogs;
+		return false;
 	}
 
 	/*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5643ec8..0ce9945 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -56,6 +56,7 @@
 #include pgstat.h
 #include postmaster/bgwriter.h
 #include replication/syncrep.h
+#include replication/walreceiver.h
 #include storage/bufmgr.h
 #include storage/fd.h
 #include storage/ipc.h
@@ -489,8 +490,8 @@ BackgroundWriterMain(void)
 			 * Initialize bgwriter-private variables used during checkpoint.
 			 */
 			ckpt_active = true;
-			if (!do_restartpoint)
-ckpt_start_recptr = GetInsertRecPtr();
+			ckpt_start_recptr =	do_restartpoint ?
+GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr();
 			ckpt_start_time = now;
 			ckpt_cached_elapsed = 0;
 
@@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed 

Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 1:05 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello, this is bug report and a patch for it.

 The first patch in the attachments is for 9.2dev and next one is
 for 9.1.3.

 On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does
 not check against WAL segments written. This makes checkpointer
 always run at the speed according to checkpoint_timeout
 regardless of WAL advancing rate.

Thanks, I'll take a look.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Alexander Korotkov
On Mon, Apr 16, 2012 at 10:29 AM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 16.04.2012 08:40, Jeff Davis wrote:

 Does someone know of a spatial join algorithm (without IP claims) that
 would be as good as this one for ranges?


 I'd be happy with an algorithm that's specific to ranges, too, but my gut
 geeling is that there has to be a lot of research on spatial join
 algorithms out there. Implementing one of them would be more widely
 applicable, so I think we should look into spatial join algorithms first
 and see how far that gets us, before we write something specific to ranges.


There is a good overview article about spatial joins.
http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf
It shows that there are a lot of methods based on building temporary
indexes. It might mean that building a temporary GiST index is not a bad
idea.
Also there are methods which look like extensions of Jeff Davis's proposal to
the multidimensional case, namely the Plane Sweep and External Plane Sweep
methods. However, those might use sophisticated data structures for the
sweep, and I believe they should, for good performance.

--
With best regards,
Alexander Korotkov.


[HACKERS] A typo fix in a comment in xlog.c

2012-04-16 Thread Kyotaro HORIGUCHI
Hello, I found a duplicated word in the comment of
StartupXLOG@xlog.c and the attached patch fixes it.


Essentially the fix is in one line as follows,

-  * We're in recovery, so unlogged relations relations may be trashed
+  * We're in recovery, so unlogged relations may be trashed

But I did fill-paragraph for the fixed comment so the patch
replaces a little bit more.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0c301b2..04dec4e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6354,10 +6354,10 @@ StartupXLOG(void)
 		CheckRequiredParameterValues();
 
 		/*
-		 * We're in recovery, so unlogged relations relations may be trashed
-		 * and must be reset.  This should be done BEFORE allowing Hot Standby
-		 * connections, so that read-only backends don't try to read whatever
-		 * garbage is left over from before.
+		 * We're in recovery, so unlogged relations may be trashed and must
+		 * be reset.  This should be done BEFORE allowing Hot Standby
+		 * connections, so that read-only backends don't try to read
+		 * whatever garbage is left over from before.
 		 */
 		ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
 



Re: [HACKERS] A typo fix in a comment in xlog.c

2012-04-16 Thread Peter Eisentraut
On mån, 2012-04-16 at 21:28 +0900, Kyotaro HORIGUCHI wrote:
 Hello, I found a duplicated word in the comment of
 StartupXLOG@xlog.c and the attached patch fixes it.
 
 
 Essentially the fix is in one line as follows,
 
 -  * We're in recovery, so unlogged relations relations may be trashed
 +  * We're in recovery, so unlogged relations may be trashed

Fixed.

 But I did fill-paragraph for the fixed comment so the patch
 replaces a little bit more.

You might want to adjust your fill-column setting to 79, so pgindent
doesn't reformat that again.  Compare to what I just committed.






Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Hannu Krosing
On Tue, 2012-01-31 at 12:58 -0500, Andrew Dunstan wrote:
 
 On 01/30/2012 10:37 AM, Andrew Dunstan wrote:
 
 
  Aside: is query_to_json really necessary? It seems rather ugly and
  easily avoidable using row_to_json.
 
 
  I started with this, again by analogy with query_to_xml(). But I agree 
  it's a bit ugly. If we're not going to do it, then we definitely need 
  to look at caching the output funcs in the function info. A closer 
  approximation is actually:
 
 SELECT array_to_json(array_agg(q))
 FROM ( your query here ) q;
 
 
  But then I'd want the ability to break that up a bit with line feeds, 
  so we'd need to adjust the interface slightly. (Hint: don't try the 
  above with select * from pg_class.)
 
 
  I'll wait on further comments, but I can probably turn these changes 
  around very quickly once we're agreed.
 
 
 
 
 based on Abhijit's feeling and some discussion offline, the consensus 
 seems to be to remove query_to_json.

The only comment I have here is that query_to_json could have been
replaced with json_agg, so that you don't need to do double-buffering
for the results of array(yourquery) call in 

SELECT array_to_json(array(yourquery));

Or is there some other way to avoid it, except to wrap the row_to_json()
calls in your own aggregate function which adds the enclosing brackets and
comma separators (like this: '[' row1 [, rowN] ']')?
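
For illustration only -- a rough sketch of such an aggregate built on 9.2's
row_to_json(), with made-up names (not something I'm proposing for core):

CREATE FUNCTION json_collect_step(state text, elem text) RETURNS text AS $$
    -- append one already-serialized row, adding the comma separator
    SELECT CASE WHEN state IS NULL THEN elem
                ELSE state || ',' || elem END
$$ LANGUAGE sql IMMUTABLE;

CREATE FUNCTION json_collect_final(state text) RETURNS json AS $$
    -- add the enclosing brackets and cast the result to json
    SELECT ('[' || coalesce(state, '') || ']')::json
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE json_collect(text) (
    SFUNC = json_collect_step,
    STYPE = text,
    FINALFUNC = json_collect_final
);

-- usage, streaming row_to_json() output row by row:
-- SELECT json_collect(row_to_json(q)::text) FROM (yourquery) q;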

 cheers
 
 andrew
 





Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Andrew Dunstan



On 04/16/2012 09:34 AM, Hannu Krosing wrote:

based on Abhijit's feeling and some discussion offline, the consensus
seems to be to remove query_to_json.

The only comment I have here is that query_to_json could have been
replaced with json_agg, so thet you don't need to do double-buffering
for the results of array(yourquery) call in

SELECT array_to_json(array(yourquery));

Or is there some other way to avoid it except to wrap row_to_json()
calls in own aggregate function which adds enclosing brackets and comma
separator ( like this : '['row1[,rowN]']' ?




The way I usually write this is:

select array_to_json(array_agg(q))
from (yourquery) q;

It's a pity you didn't make this comment back in January when we were 
talking about this. I think it's too late now in this release cycle to 
be talking about adding the aggregate function.


cheers

andrew



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Can we have a soft hot standby conflict that doesn't kill the query, 
 but disables index-only-scans?

Well, there wouldn't be any way for the planner to know whether an
index-only scan would be safe or not.  I think this would have to look
like a run-time fallback.  Could it be structured as return that the
page's all-visible bit is not set, instead of failing?  Or am I
confused about where the conflict is coming from?

regards, tom lane



Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Merlin Moncure
On Mon, Apr 16, 2012 at 9:10 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 04/16/2012 09:34 AM, Hannu Krosing wrote:

 based on Abhijit's feeling and some discussion offline, the consensus
 seems to be to remove query_to_json.

 The only comment I have here is that query_to_json could have been
  replaced with json_agg, so that you don't need to do double-buffering
 for the results of array(yourquery) call in

 SELECT array_to_json(array(yourquery));

 Or is there some other way to avoid it except to wrap row_to_json()
 calls in own aggregate function which adds enclosing brackets and comma
 separator ( like this : '['row1[,rowN]']' ?



 The way I usually write this is:

    select array_to_json(array_agg(q))
    from (yourquery) q;

 It's a pity you didn't make this comment back in January when we were
 talking about this. I think it's too late now in this release cycle to be
 talking about adding the aggregate function.

I find array_agg to be pretty consistently slower than
array()...although not much, say around 5-10%.  I use array_agg only
when grouping.  try timing
select array_to_json(array_agg(v)) from (select v from
generate_series(1,100) v) q;
vs
select array_to_json(array(select v from generate_series(1,100) v));

I agree with Hannu, but as things stand, if I'm trying to avoid the
extra buffer I've found myself doing the final aggregation on the
client -- it's easy enough.  BTW, I'm using the json stuff heavily and
it's just absolutely fantastic.  Finally I can write web applications
without wondering exactly where it was that computer science went off
the rails.

I've already demoed a prototype app that integrates pg directly with
the many high quality js libraries out there and it makes things very
easy and quick by making writing data services trivial.  Data pushes
are still quite a pain but I figure something can be worked out.

merlin



Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Michael Meskes
On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:
 Yes, just like when the readahead window set to 256, FETCH 1024
 will iterate through 4 windows or FETCH 64 iterates through the
 same window 4 times. This is the idea behind the readahead window.

Really? It's definitely not the idea behind FETCH 1024. Using the same window 4
times for FETCH 64 is the idea though, I agree.

 How about allowing the readahead window to be resized for the
 non-decorated case if the runtime encounters FETCH N and N is
 greater than the previous window?

To be resized by what? IMO a FETCH N should always be a FETCH N, no matter
what, i.e. if the readahead window is larger, use it, but even if it's smaller
we should still fetch N at the same time.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Hannu Krosing
On Mon, 2012-04-16 at 10:10 -0400, Andrew Dunstan wrote:
 
 On 04/16/2012 09:34 AM, Hannu Krosing wrote:
  based on Abhijit's feeling and some discussion offline, the consensus
  seems to be to remove query_to_json.
  The only comment I have here is that query_to_json could have been
   replaced with json_agg, so that you don't need to do double-buffering
  for the results of array(yourquery) call in
 
  SELECT array_to_json(array(yourquery));
 
  Or is there some other way to avoid it except to wrap row_to_json()
  calls in own aggregate function which adds enclosing brackets and comma
  separator ( like this : '['row1[,rowN]']' ?
 
 
 
 The way I usually write this is:
 
  select array_to_json(array_agg(q))
  from (yourquery) q;
 
 It's a pity you didn't make this comment back in January when we were 
 talking about this. I think it's too late now in this release cycle to 
 be talking about adding the aggregate function.

My comment is not meant to propose changing anything in 9.2.

I think what we have here is absolutely fantastic :)

If doing something in 9.3 then what I would like is some way to express
multiple queries. Basically a variant of 

query_to_json(query text[])

where queries would be evaluated in order and then all the results
aggregated into one json object.
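
For illustration only, a call might look something like this (purely
hypothetical -- no such function exists, and the table names are invented):

select query_to_json(array[
    'select * from users where active',
    'select count(*) from orders'
]);
-- hypothetical output: a single json value wrapping both result sets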

But aggregation on client as suggested by Merlin may be a better way
to do it for larger result(set)s. 

Especially as it could enable streaming of the resultsets without having
to first buffer everything on the server.


If we can add something, then perhaps a deeper "pretty_print" feature

samples:

hannu=# \d test
                               Table "public.test"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test_id_seq'::regclass)
 data   | text                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test from test),true);
-[ RECORD 1 ]-+---------------------------------------------------------------------------
array_to_json | [{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},
              |  {"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"}]

This is OK


hannu=# \d test2
                               Table "public.test2"
 Column |            Type             |                     Modifiers
--------+-----------------------------+-----------------------------------------------------
 id     | integer                     | not null default nextval('test2_id_seq'::regclass)
 data2  | test                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test2 from test2),true);
-[ RECORD 1 ]-+---------------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}]

This is kind of OK

hannu=# \d test3
                               Table "public.test3"
 Column |            Type             |                     Modifiers
--------+-----------------------------+-----------------------------------------------------
 id     | integer                     | not null default nextval('test3_id_seq'::regclass)
 data3  | test2[]                     |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test3_pkey" PRIMARY KEY, btree (id)


hannu=# select array_to_json(array(select test3 from test3),true);
-[ RECORD 1 ]-+-------------------------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],"tstamp":"2012-04-16 14:40:15.795947"}]

but this would be nicer if printed like Python's pprint:

>>> pprint.pprint(d)
[{'data3': [{'data2': {'data': '0.262814193032682',
                       'id': 1,
                       'tstamp': '2012-04-05 13:21:03.235204'},
             'id': 1,
             'tstamp': '2012-04-05 13:25:03.644497'},
            {'data2': {'data': '0.157406373415142',
                       'id': 2,
                       'tstamp': '2012-04-05 13:21:05.2033'},
             'id': 2,

Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-16 Thread Fujii Masao
On Sun, Apr 15, 2012 at 12:13 AM, Thom Brown t...@linux.com wrote:
 On 14 April 2012 15:58, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Apr 14, 2012 at 4:16 AM, Thom Brown t...@linux.com wrote:
 I have a question though.  What happens when this is set to 'write'
 (or 'remote_write' as proposed) but it's being used on a standalone
 primary?  At the moment it's not documented what level of guarantee
 this would provide.

 http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
 -
 Commits made when synchronous_commit is set to on or write will
 wait until the synchronous standby responds. The response may
 never occur if the last, or only, standby should crash.
 -

 Is this description not enough? If not enough, how should we change
 the document?

 No, that's not what I was referring to.  If you don't have a standby
 (i.e. a single, isolated database cluster with no replication), and
 its synchronous_commit is set to 'remote_write', what effect does that
 have?

It has the same effect as 'on' and 'local', i.e., transaction commit waits
only for local WAL flush. This behavior is not documented explicitly...
How should we change the document? What about adding the following
to the explanation of the synchronous_commit parameter (maybe at the end
of its second paragraph)?

-
If synchronous_standby_names is not set, on, remote_write and local
provide the same synchronization level; transaction commit only waits for
local flush.
-
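
For example, on a standalone primary all three of these end up meaning the
same thing -- the commit waits only for the local WAL flush:

SET synchronous_commit = 'on';            -- waits for local flush only
SET synchronous_commit = 'local';         -- same
SET synchronous_commit = 'remote_write';  -- same, there is no standby to wait for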

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bug tracker tool we need (was: Last gasp)

2012-04-16 Thread Alex

Dimitri Fontaine dimi...@2ndquadrant.fr writes:

 Alvaro Herrera alvhe...@commandprompt.com writes:
 I've used Redmine a lot, as you know, and I only keep using it because
 it's a requirement at work.  It is certainly not close to usable for
 general pgsql stuff.  (Trac, which we used to use prior to Redmine, was
 certainly much worse, though).

 Same story here, still using redmine a lot, all with custom reports etc.

 I can't say that it's all that slow, or that there's a problem with the
 code, or that the search doesn't work right (and I've never had a wiki
 edit disappear, either, and I've used that a lot).  It's just the wrong
 tool altogether.

 It's indeed slow here, and I agree that's not the problem. Not the tool
 we need, +1.

I still fail to see how Redmine doesn't fit the requirements summarized
at that wiki page[1], so the objection must be something other than the
formal requirement of being free/open software and running postgres behind
it (some sort of feeling, maybe?)

Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
please describe your ideal tool for the task?

Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch.

Or maybe there isn't really a need for a tracker?  The core team have
managed to live without one for so long after all...

--
Regards,
Alex

[1] http://wiki.postgresql.org/wiki/TrackerDiscussion


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need (was: Last gasp)

2012-04-16 Thread Magnus Hagander
On Mon, Apr 16, 2012 at 18:24, Alex a...@commandprompt.com wrote:

 Dimitri Fontaine dimi...@2ndquadrant.fr writes:

 Alvaro Herrera alvhe...@commandprompt.com writes:
 I've used Redmine a lot, as you know, and I only keep using it because
 it's a requirement at work.  It is certainly not close to usable for
 general pgsql stuff.  (Trac, which we used to use prior to Redmine, was
 certainly much worse, though).

 Same story here, still using redmine a lot, all with custom reports etc.

 I can't say that it's all that slow, or that there's a problem with the
 code, or that the search doesn't work right (and I've never had a wiki
 edit disappear, either, and I've used that a lot).  It's just the wrong
 tool altogether.

 It's indeed slow here, and I agree that's not the problem. Not the tool
 we need, +1.

 I still fail to see how Redmine doesn't fit the requirements summarized
 at that wiki page[1], so the objection must be something other than the
 formal requirement of being free/open software and running postgres behind
 it (some sort of feeling, maybe?)

One thing to note is that the referenced wiki page is over a year old.
And that many more things have been said on email lists than are
actually in that page.

But as one note - I don't believe you can drive redmine completely
from email, which is certainly a requirement that has been discussed,
but is not entirely listed on that page.


 Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
 please describe your ideal tool for the task?

 Given that every other existing tool has likely pissed off someone
 already, I guess our best bet is writing one from scratch.

FWIW, I think the closest thing we've found so far would be debbugs -
which IIRC doesn't have any kind of reasonable database backend, which
would be a strange choice for a project like ours :) And makes many
things harder...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Merlin Moncure
On Mon, Apr 16, 2012 at 11:19 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 If doing something in 9.3 then what I would like is some way to express
 multiple queries. Basically a variant of

 query_to_json(query text[])

 where queries would be evaluated in order and then all the results
  aggregated into one json object.

I personally don't like variants of to_json that push the query in as
text. They defeat parameterization and have other issues.  Another
point for client side processing is the new row level processing in
libpq, so I'd argue that if the result is big enough to warrant
worrying about buffering (and it'd have to be a mighty big json doc),
the best bet is to extract it as rows.  I'm playing around with
node.js for the json serving and the sending code looks like this:

  var first = true;

  query.on('row', function(row) {
if(first) {
  first = false;
  response.write('[');
}
else response.write(',');
response.write(row.jsondata);
  });
  query.on('end', function() {
response.write(']');
response.end();
  });

 -- not too bad

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-04-16 Thread Alvaro Herrera

Excerpts from Nikhil Sontakke's message of Mon Apr 16 03:56:06 -0300 2012:
   Displace yes. It would error out if someone says
  
   ALTER TABLE ONLY... CHECK ();
  
   suggesting to use the ONLY with the CHECK.
 
  I'd say the behavior for that case can revert to the PostgreSQL 9.1
  behavior.
  If the table has children, raise an error.  Otherwise, add an inheritable
  CHECK constraint, albeit one lacking inheritors at that moment.
 
 Ok, that sounds reasonable.

Good, I agree with that too.

Are you going to submit an updated patch?  I started working on your
original a couple of days ago but got distracted by some family news
here.  I'll send it to you so that you can start from there, to avoid
duplicate work.

 Another thing that we should consider is that if we are replacing ONLY with
 NO INHERIT, then instead of just making a cosmetic syntactic change, we
 should also replace all the is*only type of field names with noinherit for
 the sake of completeness and uniformity.

Yeah, I was considering the same thing.  conisonly isn't a very good
name on its own (it only made sense because the ONLY came from ALTER
TABLE ONLY).
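
To make the discussion concrete, the spelling under discussion would look
roughly like this (table, column and constraint names invented for
illustration):

-- proposed spelling: mark the constraint non-inheritable at creation time
ALTER TABLE parent ADD CONSTRAINT parent_val_chk CHECK (val > 0) NO INHERIT;

-- versus the 9.1-era spelling it would replace
ALTER TABLE ONLY parent ADD CONSTRAINT parent_val_chk CHECK (val > 0);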

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Alex

Magnus Hagander mag...@hagander.net writes:

 One thing to note is that the referenced wiki page is over a year old.
 And that many more things have been said on email lists than are
 actually in that page.

Yeah, I went through it briefly and rather important concern seem to
have been raised by Tom Lane in this msg:
http://archives.postgresql.org/pgsql-hackers/2011-05/msg01480.php

This paragraph:

 The real question is, who is going to keep it up to date?  GSM has the
 right point of view here: we need at least a couple of people who are
 willing to invest substantial amounts of time, or it's not going to go
 anywhere.  Seeing that we can barely manage to keep the mailing list
 moderator positions staffed, I'm not hopeful.

 But as one note - I don't believe you can drive redmine completely
 from email, which is certainly a requirement that has been discussed,
 but is not entirely listed on that page.

Ugh, what do you mean by that?  You can change any attribute (like
status, priority, assigned person, etc.) of a ticket via email.
Anything else?

 FWIW, I think the closest thing we've found so far would be debbugs -
 which IIRC doesn't have any kind of reasonable database backend, which
 would be a strange choice for a project like ours :) And makes many
 things harder...

What stops us from writing a postgres backend for debbugs if it is so
brilliant on handling email and stuff?

--
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Boszormenyi Zoltan

On 2012-04-16 18:04, Michael Meskes wrote:

On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:

Yes, just like when the readahead window is set to 256, FETCH 1024
will iterate through 4 windows or FETCH 64 iterates through the
same window 4 times. This is the idea behind the readahead window.

Really? It's definitely not the idea behind FETCH 1024. Using the same window 4
times for FETCH 64 is the idea though, I agree.


OK. I would like to stretch your agreement a little. :-)

Can we state that caching means the cache should serve
the incoming request(s) until a request spills out of it?

If your answer to the above is yes, then please consider this case:
- readahead window is 256 (via ECPGFETCHSZ)
- FETCH 64 was executed twice, so you are in the middle of the cache
- FETCH 1024 is requested

So, if I understand you correctly, you expect this scenario:
- set a one-time readahead window size ( N - # of rows that can be served
  = 1024 - 128 = 768) so the next FETCH by the runtime will fulfill
  this request fully
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 768 will be executed
- all subsequent requests use the old readahead size


How about allowing the readahead window to be resized for the
non-decorated case if the runtime encounters FETCH N and N is
greater than the previous window?

To be resized by what?


By the new FETCH request. Instead of the above, I imagined this:
- the runtime notices that the new request is larger than the current
  readahead window size, modifies the readahead window size upwards,
  so the next FETCH will use it
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 1024 will be executed and the remaining
  768 rows will be served from the new cache
- all subsequent requests use the new readahead size, 1024



  IMO a FETCH N should always be a FETCH N, no matter
what


This part of your statement contradicts caching. :-)


, i.e. if the readahead window is larger, use it, but even if it's smaller
we should still fetch N at the same time.


So, there can be occasional one-time larger requests but
smaller ones should apply the set window size, right?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Fujii Masao
On Mon, Apr 16, 2012 at 9:05 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 In the backported version to 9.1.3, bgwriter.c is modified
 instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
 used as the equivalent of GetStandbyFlushRecPtr() in 9.2.

In 9.2, GetXLogReplayRecPtr() should be used instead of GetStandbyFlushRecPtr().
A restartpoint is scheduled to finish before the next restartpoint is triggered.
A restartpoint is triggered if too many WAL files have been replayed since the
last restartpoint, so a restartpoint should be scheduled according to the replay
location, not the receive location.

-* computed before calling CreateCheckPoint. The code in XLogInsert that
-* actually triggers a checkpoint when checkpoint_segments is exceeded
-* compares against RedoRecptr, so this is not completely accurate.
-* However, it's good enough for our purposes, we're only calculating an
-* estimate anyway.
+* computed before calling CreateCheckPoint. The code in
+* XLogInsert that actually triggers a checkpoint when
+* checkpoint_segments is exceeded compares against RedoRecptr.
+* Similarly, we consult WAL flush location instead on hot
+* standbys and XLogPageRead compares it aganst RedoRecptr, too.
+* Altough these are not completely accurate, it's good enough for
+* our purposes, we're only calculating an estimate anyway.

I think that basically it's better not to change the comments (i.e., not to add
the line feed) if their contents are the same as previous ones, to highlight
what you actually changed in the patch.

Typo: RedoRecptr should be RedoRecPtr?

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Mon, 2012-04-16 at 02:52 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
   1. Order the ranges on both sides by the lower bound, then upper bound.
  Empty ranges can be excluded entirely.
   2. Left := first range on left, Right := first range on right
   3. If Left or Right is empty, terminate.
   4. If lower(Left) > upper(Right), discard Right, goto 2
   5. If lower(Right) > upper(Left), discard Left, goto 2
   6. return (Left, Right) as joined tuple
   7. Right := next range on right
   8. goto 3
 
 This is surely not correct in detail.  As written, it will be impossible
 for any tuple on the right side to be joined to more than one left-side
 tuple.  You will need something analogous to the mark-and-restore
 rewinding logic in standard mergejoin to make this work.

Every time you discard a tuple on the left, you go to step 2, which
rewinds the right side back to the first non-discarded tuple.

So, implemented using mark and restore:

  * start off with the marks at the first tuple on each side
  * discard means move the mark down a tuple
  * setting it back to the first range means restoring to the mark
  * going to the next range (step 7) just means getting another
tuple, without changing the mark

Only one side really needs the mark and restore logic, but it was easier
to write the pseudocode in a symmetrical way (except step 7).
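
(For context, the shape of query this node is meant to serve is a plain
range-overlap join, something like the following -- table and column names
invented here:)

select a.id, b.id
  from sessions a
  join maintenance_windows b on a.during && b.during;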

 I will note that mergejoin is not only one of the more complicated
 executor modules, but it accounts for a huge percentage of the planner's
 complexity; which doesn't exactly make me sympathize with the notion of
 let's-copy-and-paste-all-that.  It'd be a lot better if we could build
 it as a small tweak to mergejoin rather than an independent thing.
 
 (Having said that, I have no idea how the planner support could work
 at all, because its treatment of mergejoins is intimately tied to
 mergejoinable equality and EquivalenceClasses and PathKeys, which don't
 seem readily extensible to the range situation.)

I think this is the more important problem area. It's clear that I'll
need to do some more investigation into the planning.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 3:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Can we have a soft hot standby conflict that doesn't kill the query,
 but disables index-only-scans?

 Well, there wouldn't be any way for the planner to know whether an
 index-only scan would be safe or not.  I think this would have to look
 like a run-time fallback.  Could it be structured as return that the
 page's all-visible bit is not set, instead of failing?  Or am I
 confused about where the conflict is coming from?

The starting point is that HS now offers 4 different mechanisms for
avoiding conflicts, probably the best of which is hot standby
feedback. So we only need to improve things if those techniques don't
work for people. So initially, my attitude is let's throw a conflict
and wait for feedback during beta.

If we do need to do something, then introduce the concept of a visibility conflict.

On replay:
If feedback not set, set LSN of visibility conflict on PROCs that
conflict, if not already set.

On query:
If feedback not set, check conflict LSN against page, if page is
later, check visibility.

In terms of optimisation, I wouldn't expect to have to adjust costs at
all. The difference would only show for long running queries and
typically they don't touch too much new data as a fraction of total.
The cost model for index only is pretty crude anyhow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-16 Thread Guillaume Lelarge
On Mon, 2012-04-16 at 13:09 +0300, Heikki Linnakangas wrote:
 On 13.04.2012 19:17, Guillaume Lelarge wrote:
  On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:
  On 08.04.2012 11:59, Guillaume Lelarge wrote:
  There could be a good reason which would explain why we can't (or don't
  want to) do this, but I don't see it right now.
 
  Me neither, except a general desire to keep internals hidden. I propose
  the attached.
 
  Sounds good to me. I would love to see this happening in 9.2.
 
 Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed 
 MSVC install.pm to also copy the header file.
 

Thanks.


-- 
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 4:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Can we have a soft hot standby conflict that doesn't kill the query, but
 disables index-only-scans?

Yeah, something like that seems possible.

For example, suppose the master includes, in each
mark-heap-page-all-visible record, the newest XID on the page.  On the
standby, we keep track of the most recent such XID ever seen in shared
memory.  After noting that a page is all-visible, the standby
cross-checks its snapshot against this XID and does the heap fetch
anyway if it's too new.  Conceivably this can be done with memory
barriers but without locks: we must update the XID in shared
memory *before* marking the page all-visible, and we must check the
visibility map or page-level bit *before* fetching the XID from shared
memory - but the actual reads and writes of the XID are atomic.

Now, if an index-only scan suffers a very high number of these soft
conflicts and consequently does a lot more heap fetches than
expected, performance might suck.  But that should be rare, and could
be mitigated by turning on hot_standby_feedback.  Also, there might be
some hit for repeatedly reading that XID from memory.

Alternatively, we could try to forbid index-only scan plans from being
created in the first place *only when* the underlying snapshot is too
old.  But then what happens if a conflict happens later, after the
plan has been created but before it's fully executed?  At that point
it's too late to switch gears, so we'd still need something like the
above.  And the above might be adequate all by itself, since in
practice index-only scans are mostly going to be useful when the data
isn't changing all that fast, so most of the queries that would be
cancelled by hard conflicts wouldn't have actually had a problem
anyway.

 In the long run, perhaps we need to store XIDs in the visibility map instead
 of just a bit. If we only stored one xid per 100 pages or something like
 that, the storage overhead would not be much higher than what we have at the
 moment. But that's obviously not going to happen for 9.2...

Well, that would also have the undesirable side effect of greatly
reducing the granularity of the map.  As it is, updating a tiny
fraction of the tuples in the table could result in the entire table
ceasing to be all-visible if you happen to update exactly one
tuple per page through the entire heap.  Or more generally, updating
X% of the rows in the table can cause Y% of the rows in the table to
no longer be visible for index-only-scan purposes, where Y > X.  What
you're proposing would make that much worse.

I think we're actually going to find that the current system isn't
tight enough; my suspicion is that users are going to complain that
we're not aggressive enough about marking pages all-visible, because
autovac won't kick in until updates+deletes > 20%, but (1) index-only
scans also care about inserts and (2) by the time we've got 20% dead
tuples index-only scans may well be worthless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 3:02 AM, Noah Misch n...@leadboat.com wrote:
 Do you refer to PD_ALL_VISIBLE as not merely a hint due to the requirement
 to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
 positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
 in its role as a witness of tuple visibility, but it is not a hint in its role
 as a witness of the visibility map status?

Exactly.

 The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
 systems with weaker memory ordering, the FlushBuffer() process's clearing of
 BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
 process until some time after the former retrieved buffer contents from shared
 memory.

True.

 Memory barriers could make the comment true, but we should probably
 just update the comment to explain that a race condition may eat the update
 entirely.

Agreed.

 Incidentally, is there a good reason for MarkBufferDirty() to
 increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
 not to do so?  Looks like an oversight.

Also agreed.

 2. More seriously, lazy_scan_heap() releases the buffer lock before
 setting the all-visible bit on the heap.  This looks just plain wrong
 (and it's my fault, to be clear).  Somebody else can come along after
 we've released the buffer lock and mark the page not-all-visible.
 That concurrent process will check the visibility map bit, find it
 already clear, and go on its merry way.  Then, we set the visibility
 map bit and go on our merry way.  Now we have a not-all-visible page
 with the all-visible bit set in the visibility map.   Oops.

 Hmm, indeed.  This deserves its own open item.

Also agreed.

 Good point.  We could finagle things so the standby notices end-of-recovery
 checkpoints and blocks the optimization for older snapshots.  For example,
 maintain an integer count of end-of-recovery checkpoints seen and store that
 in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
 global value is greater than the snapshot's value, disable the optimization
 for that snapshot.  I don't know whether the optimization is powerful enough
 to justify that level of trouble, but it's an option to consider.

I suspect not.  It seems we can make index-only scans work without
doing something like this; it's only the sequential-scan optimization
we might lose.  But nobody's ever really complained about not having
that, to my knowledge.

 For a different approach, targeting the fragility, we could add assertions to
 detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
 full-page image.

The holes are narrow enough that I fear such cases would be detected
only on high-velocity production systems, which is not exactly where
you want to find out about such problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote:

 There is a good overview article about spatial joins.
 http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf 

Thank you, that's exactly the kind of overview I was looking for.

 It shows that there are a lot of methods based on building temporary
 indexes. It might mean that building a temporary GiST index is not a
 bad idea.

That had occurred to me, but I was hesitant to only use temp indexes. It
still doesn't really offer a good solution when both sides of the join
are relatively large (because of random I/O). Also the build speed of
the index would be more important than it is now.

On the other hand, if I tackle the problem using temp indexes, it would
be a more general solution useful for many problems (for instance, we
really need a better solution when the result of a subquery doesn't fit
in a work_mem-sized hashtable).

We may end up with some kind of combination (as the sweep seems to be;
see below).

 Also there are methods which look like extensions of Jeff Davis's
 proposal to the multidimensional case. These are the Plane Sweep and
 External Plane Sweep methods. However, they may use sophisticated data
 structures for the sweep, and I believe they should, for good
 performance.

That method looks closer to what I had in mind.

My Range Join proposal is essentially the same as the sweep method
except that it only joins on one dimension, and the rest of the
dimensions have to be checked with RECHECK. So, this still works for an
N-dimensional join, as long as a single dimension is fairly selective.
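
In SQL terms, the shape would be something like this (invented names; the
second condition is the part that would be handled by RECHECK):

select *
  from boxes a
  join boxes b on a.x_range && b.x_range   -- the selective, sorted dimension
              and a.y_range && b.y_range;  -- rechecked for each candidate pair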

The sweep method seems to do the first dimension with the sweep method,
and maintains a changing index that it uses to do the lookup. In other
words, it's essentially just a way to do the RECHECK on the other
dimensions in less than O(n) time. So, if the sweep dimension is not
selective at all, this degenerates to the temporary index method (or
something close, anyway).

The fact that, in the sweep method, there is still one special
dimension, makes me think that it will be easier to make the API work
based on ranges (which is a big win because postgres already knows about
ranges). If so, that makes it easier to divide the project into stages,
because we could get it working for ranges and then extend it to support
other dimensions more efficiently later.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 1:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 If we do need to do something, then introduce concept of a visibility 
 conflict.

 On replay:
 If feedback not set, set LSN of visibility conflict on PROCs that
 conflict, if not already set.

 On query:
 If feedback not set, check conflict LSN against page, if page is
 later, check visibility.

Hmm, I should have read the whole thread before replying.  This is similar
to what I just proposed in response to Heikki's message, but using LSN
in lieu of (or maybe you mean in addition to) XID.

I don't think we can ignore the need to throw conflicts just because
hot_standby_feedback is set; there are going to be corner cases, for
example, when it's just recently been turned on and the master has
already done cleanup; or if the master and standby have recently
gotten disconnected for even just a few seconds.

But fundamentally we all seem to be converging on some variant of the
soft conflict idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-16 Thread Greg Smith

On 04/15/2012 12:01 PM, Tom Lane wrote:

Where I think we have been fooling ourselves is in failing to tell
the difference between a patch that is committable in the current fest,
versus one that is still WIP and is going to need more development time.


I wonder if this bit of state might be worth extending the UI to 
include.  Just a little toggle box with the options "WIP" and "Commit 
Submission" there.  [I am unattached to those particular terms]


I think everyone is clear that Command Triggers is an example that 
reflects a more general problem seen many times before; I'll continue 
using it as a fresh example here without meaning to pick on Dimitri in 
particular.  If Dimitri had submitted that in January while ticking 
Commit Submission, it might have sparked a talk about the difference 
in expectations earlier.  If we use Robert as the bad guy watching the 
CF progress (again, just as an example, not trying to paint him with 
that title), I think it would have been easier for him to write an 
e-mail like this:


It's March now, and this feature has been under heavy review and 
development for 6 weeks.  This looks more like a WIP feature to me, not 
one that arrived as a Commit Submission.  Is there any useful subset to 
consider instead?


Compared to the current way such things happen, that's a more factual 
style of message without as much emotion or judging, and one that can be 
raised much earlier in the CF cycle.  I think it will be easier for 
people to write those, compared with having to be the person saying 
"this isn't ready to commit" only at the end.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-16 Thread Tom Lane
Amit Kapila amit.kap...@huawei.com writes:
 For this kind of query, currently (referring to the 9.0.3 code) it also
  considers joins of b,c and b,d.
  As there is no join clause among b,c,d, it will go down the path of
  make_rels_by_clauseless_joins(), where it considers joins of b,c and b,d.

 In this kind of query, if the suggestion by me in below mail is followed,
 then it will consider joining a,b a,c a,d at level-2 in function
 make_rels_by_clause_joins() which it currently doesn't do which may generate
 useless join paths.
  However, in real-world scenarios this kind of query, where two columns of
  different tables are used in a single expression (b.y + c.z) in the where
  clause, may be uncommon.

 On the other hand, when we come to consider d, it will have no join
 clauses so we will consider joining it to each other rel in turn.  

  When it comes to considering d, at level-2 it only considers later rels, so
  it should not consider joining with each other rel.

I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

I can't really get excited about doing it that way instead of the
current way.  In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old SELECT * FROM a, b. So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.  In the second
place, instead of N tests to see whether a rel lacks any join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.
So it seems to make things slower for little obvious benefit.

In general, queries with join-clause-less rels are pretty uncommon,
so I don't want to introduce extra work into make_rels_by_clause_joins
to handle the case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Sun, 2012-04-15 at 23:18 -0700, Darren Duncan wrote:
 Your proposal makes me think of something similar which might be useful, 
 INclusion constraints.  As exclusion constraints might be thought of like a 
 generalization of unique/key constraints, inclusion constraints are like a 
 generalization of foreign key constraints.  The inclusion constraints 
 basically allow some comparison operator other than is-equal to test if 
 values 
 in one table match values in another table, and the constraint allows the 
 former 
 if the test results in true.  An example of said inclusion test is whether 
 the 
 range in one table is contained in a range in another table.  I assume/hope 
 that, similarly, now that we have range types in 9.2, that the existing 
 exclusion constraints can be used with range comparison operators.

Yes, Range Foreign Keys are one of the loose ends for Range Types that
I'd like to wrap up.
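
And yes, the existing exclusion constraints do accept the range operators in
9.2, along the lines of (illustrative table):

create table reservation (
    during tsrange,
    exclude using gist (during with &&)  -- no two rows may have overlapping ranges
);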

 As to your actual proposal, it sounds like a generalization of the relational 
 join or set intersection operator where instead of comparing sets defined in 
 terms of an enumeration of discrete values we are comparing sets defined by a 
 range, which conceptually have infinite values depending on the data type the 
 range is defined over.  But if we're doing this, then it would seem to make 
 sense to go further and see if we have set analogies for all of our 
 relational 
 or set operators, should we want to do work with non-discrete sets.
 
 Now this sounds interesting in theory, but I would also assume that these 
 could 
 be implemented by an extension in terms of existing normal relational 
 operators, 
 where each range value is a discrete value, combined with operators for 
 unioning 
 or differencing etc ranges.  A relation of ranges effectively can represent a 
 discontinuous range; in that case, the empty discontinuous range is also 
 canonically representable by a relation with zero tuples.
 
 Jeff, I get the impression your proposal is partly about helping performance 
 by 
 supporting this internally, rather than one just defining it as a SQL 
 function, 
 am I right?

Per my proposal, the problem statement is that it's slow, so speeding
it up is really the entire proposal ;)

Extending the syntax might be interesting as well, but I don't have a
proposal for that.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-16 Thread Jay Levitt

Alex wrote:

Jay Levitt jay.lev...@gmail.com writes:


Alex wrote:

I didn't follow this whole thread, but have we considered Redmine[1]?

As the resident "Ruby is shiny, let's do everything in Rails on my
MacBook" guy, I'd like to make a statement against interest: I've
tried Redmine a few times and it's been painful.  Much of the codebase
is deprecated, it's slow, it has no meaningful search (in 2012?!),
I've seen wiki edits disappear, and at the moment pulling up its own
FAQ page at redmine.org times out.


Yay, that's totally FUD to me.


You're right, it was. My bad. Someday I will find the balance between 
precision and concision.



Could you please elaborate a bit on your points?

Deprecated codebase?  Let me guess...

It runs on an outdated version of Rails (2.3) but only because Rails is
changing so rapidly, I believe.  There is work in progress[1] to move to
the supported branch Rails-3.x.


I wasn't even thinking of that; I know many production systems still run on 
Rails 2.3, and in fact it probably even performs better for some workloads. 
3.x is a mixed bag. I don't hold that against Redmine.


But it's still FUD, because I can't remember where I saw this information. 
So: withdrawn.




Slow?  Do you have any data to back this point up?


No measurable data; just a sigh of relief when switching from Redmine to 
Github - and GitHub ain't a speed demon. In general, I've seen multi-second 
page load times on crazy-simple things like wiki edits; this was on a hosted 
provider (sourcerepo.com), but they also hosted our git repo and we had no 
speed problems there.



No meaningful search, eh?  Works for me.


Redmine searches return partial-word matches, and there's no way to disable 
that.  Searching for "test" finds "latest". To me, that's broken.


Also, the UI is very 5 years ago; e.g., compare revisions uses the same 
columns-of-radio-buttons approach as MediaWiki. If the goal is a tool to 
reduce friction and increase involvement, you want a smoother UX.


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Aggressive memory consumption in {ts,array}_typanalyze

2012-04-16 Thread Noah Misch
Both $SUBJECT functions pass to hash_create() an expected hash table size of
1 * attstattarget.  Based on header comments, this represents a near-worst
case.  These typanalyze functions scan the hash tables sequentially, thereby
visiting the entire allocation.  Per the recommendation in comments at
hash_create(), we should be more conservative.  On my system, this tiny test
case runs in 2.8s and dirties 1.0 GiB of local memory:

  CREATE TEMP TABLE t AS SELECT '{}'::int[];
  SET default_statistics_target = 1;
  ANALYZE t;

Rather arbitrarily, I reduced the hash_create() size hint by 99.9%, to the
width of the histograms destined for pg_statistic.  This streamlined the test
case to 20ms runtime and 2 MiB of memory.

To verify that nothing awful happens when the hash table sees considerable
dynamic growth, I used a subject table entailing a 9M-element hash table at
ANALYZE time:

  CREATE UNLOGGED TABLE t AS SELECT array[n,300+n,600+n]
FROM generate_series(1,300) t(n);

Unpatched master takes 15s and dirties 2.1 GiB; patched takes 15s and dirties
1.2 GiB.  The timing noise overlapped any systematic difference, but the
patched version might have been around 500ms slower.  Based on that, I'm
comfortable trusting that improving smaller cases in this way will not greatly
harm larger cases.

The lack of field complaints about ts_typanalyze() resource usage does argue
against the need for a change here, but I think adding array_typanalyze() in
PostgreSQL 9.2 significantly increases our risk exposure.  Sites may have
cranked up the statistics target on array columns to compensate for the lack
of explicit statistical support.  Every cluster has several array columns in
the system catalogs.

The size hint I chose is fairly arbitrary.  Any suggestions for principled
alternatives?

Thanks,
nm
*** a/src/backend/tsearch/ts_typanalyze.c
--- b/src/backend/tsearch/ts_typanalyze.c
***
*** 186,192  compute_tsvector_stats(VacAttrStats *stats,
hash_ctl.match = lexeme_match;
hash_ctl.hcxt = CurrentMemoryContext;
lexemes_tab = hash_create(Analyzed lexemes table,
! bucket_width * 7,
  hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  
--- 186,192 
hash_ctl.match = lexeme_match;
hash_ctl.hcxt = CurrentMemoryContext;
lexemes_tab = hash_create(Analyzed lexemes table,
! num_mcelem,
  hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
***
*** 282,288  compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc 
fetchfunc,
elem_hash_ctl.match = element_match;
elem_hash_ctl.hcxt = CurrentMemoryContext;
elements_tab = hash_create(Analyzed elements table,
!  bucket_width * 7,
   elem_hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  
--- 282,288 
elem_hash_ctl.match = element_match;
elem_hash_ctl.hcxt = CurrentMemoryContext;
elements_tab = hash_create(Analyzed elements table,
!  num_mcelem,
   elem_hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Alexander Korotkov
On Tue, Apr 17, 2012 at 12:12 AM, Jeff Davis pg...@j-davis.com wrote:

 On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote:

  There is a good overview article about spatial joins.
  http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf

 Thank you, that's exactly the kind of overview I was looking for.

  It shows that there are a lot of methods based on building temporary
  indexes. It might mean that building a temporary GiST index is not a
  bad idea.

 That had occurred to me, but I was hesitant to only use temp indexes. It
 still doesn't really offer a good solution when both sides of the join
 are relatively large (because of random I/O). Also the build speed of
 the index would be more important than it is now.

Note that the amount of random I/O during GiST index build decreased
significantly after my GSoC project on buffered GiST index build. However,
GiST index build is still quite CPU-expensive. This problem could be avoided
by supporting other methods of index build (other than producing a lot
of penalty and picksplit calls). Hilbert curve and similar methods could
help here. Implementation of such methods would require an extension of the
GiST interface.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Memory usage during sorting

2012-04-16 Thread Peter Geoghegan
On 14 April 2012 14:34, Peter Geoghegan pe...@2ndquadrant.com wrote:
 FWIW, I started playing with adding timsort to Postgres last night:

 https://github.com/Peter2ndQuadrant/postgres/tree/timsort

I've fixed this feature-branch so that every qsort_arg call site
(including the tuplesort specialisations thereof) calls timsort_arg
instead. All but 4 regression tests pass, but they don't really count
as failures, since they're down to an assumption in the tests that the
order in which certain tuples appear should be the same as the order in which
our current quicksort implementation returns them, even though, in these
problematic cases, that is partially dictated by implementation - our
quicksort isn't stable, but timsort is. I think that the original
tests are faulty, but I understand that they're only expected to
provide smoke testing.

I did have to fix one bug in the implementation that I found, which
was an assumption that comparators would only return one of {-1, 0,
1}. The C standard requires only that they return negative, zero or
positive values, as required. I also polished the code a bit, and
added more comments.

I haven't actually quantified the benefits yet, and have no numbers to
show just yet, but I suspect it will prove to be a fairly compelling
win in certain situations.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 9:12 PM, Jeff Davis pg...@j-davis.com wrote:

 That had occurred to me, but I was hesitant to only use temp indexes. It
 still doesn't really offer a good solution when both sides of the join
 are relatively large (because of random I/O). Also the build speed of
 the index would be more important than it is now.

The thing I like most about temp indexes is that they needn't be temporary.

I'd like to see something along the lines of demand-created optional
indexes, whose space/maintenance overhead we reclaim according to
some cache management scheme. The more space you have, the more of the
important ones hang around. Roughly the same idea applies to
materialised views.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Jay Levitt

Alex wrote:

I still fail to see how Redmine doesn't fit the requirements summarized
at that wiki page[1], so the objection must be something other than the
formal requirement of being free/open software and running postgres behind
it (some sort of feeling, maybe?)


Well, if those requirements are in fact requirements, Redmine could suit the 
purpose (perhaps with some custom extensions).  But yes, Redmine feels 
wrong to me, though I have never been particularly happy with *any* 
self-hosted bug tracker.


Of those I've used - Redmine, Mantis, JIRA, Trac, Bugzilla, GForge, RT - 
Redmine feels the least-worst to me; it's about equal to Trac, and it's in 
Ruby so I could theoretically(!) improve it.


I think the biggest missing pieces in Redmine aside from custom CF stuff 
are: better search, single sign-on (it requires Yet Another Login), a better 
UX (AJAX, syntax highlighting) and better git integration (a la pull 
requests, where private git commits = patches). Those are some pretty big 
pieces. I don't think Redmine out-of-the-box would improve either CFs or 
community involvement.



Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
please describe your ideal tool for the task?


My opinion isn't all that important, since I currently have an infinite 
opinion-to-contribution ratio, but in my unicorniverse: We'd accept that 
open source hasn't always produced great UX, we'd use GitHub's issue 
tracker, allow volunteers to do bug wrangling triage via tags, use GitHub 
hooks to integrate with the existing CF app, and write archiving tools that 
would let us easily export everything off of GitHub for when (a) something 
better comes along or (b) GitHub pops out of existence or adds egregious 
licensing terms like BitKeeper.


Reasons:

- Familiarity: Many developers already have a GitHub account and use it
- Discoverability: GitHub has great SEO
- Tight integration of git with patch and issue management (pull requests, 
fork networks, etc); eliminates ceremony rather than adding to it

- Readable UI with syntax highlighting, etc
- Patch commenting and git integration encourage actual review-resubmit 
cycles instead of "Here, look, I fixed it for you" reviews

- Two-way email/web integration
- Meets Tom's "would be sort of neat" criteria[1]
- Could easily implement Simon's pony criteria[2] with tags and API
- Easily extensible with API and hooks
- Subjectively: Its design encourages better community and core interactions 
than any I've seen in 25 years.


GitHub could well be a non-starter, but if third-party-dependence is really 
the holdup, I'd volunteer to write the tools - in fact, a google of [export 
issues from github] shows a few that might already suffice.



Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch.


ISTR there's a great "Writing your own bug tracker is an anti-pattern" blog, 
but I can't find it anymore.



Or maybe there isn't really a need for a tracker?  The core team have
managed to live without one for so long after all...


As an end-user, I've reported exactly one bug in a release version of 
Postgres, and it was fixed (and back-ported!) the next day.  So I really 
can't complain about the tracking of actual bugs.


Sounds like we do need something better for CF/patch workflow, tho.

Jay

[1] Tom wrote:


Now what would be sort of neat is if we had a way to keep all the versions
of patch X plus author and reviewer information, links to reviews and
discussion, etc. in some sort of centralized place


[2] Simon wrote:


My I-Want-a-Pony idea is some kind of rating system that allows us all
to judge patches in terms of importance/popularity, complexity and
maturity. I guess a Balanced Scorecard for the development process. So
we can all see whats going on.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jay Levitt

Simon Riggs wrote:

I'd like to see something along the lines of demand-created optional
indexes, whose space/maintenance overhead we reclaim according to
some cache management scheme. The more space you have, the more of the
important ones hang around. Roughly the same idea applies to
materialised views.


+10; this sort of demand-driven optimization could be the database 
equivalent of Java's HotSpot (which accomplished the amazing task of making 
Java kinda fastish sometimes).


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Robert Haas
On Apr 16, 2012, at 1:40 AM, Jeff Davis pg...@j-davis.com wrote:
 See attached SQL for example.
 Problem statement: slow. Nested loops are the only option, although they
 can benefit from an inner GiST index if available. But if the join is
 happening up in the plan tree somewhere, then it's impossible for any
 index to be available.

Hmm. This sounds like something that Tom's recent work on parameterized plans 
ought to have fixed, or if not, it seems closely related. And by this I mean 
specifically the ability to use a GiST index to drive a nested loop that is 
higher up in the plan tree than the immediate parent of the index scan.

This is not an argument against your proposal, just an observation.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Joshua D. Drake


On 04/16/2012 09:24 AM, Alex wrote:

Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
please describe your ideal tool for the task?

Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch.

Or maybe there isn't really a need for a tracker?  The core team have
managed to live without one for so long after all...
I believe the biggest hurdle for many hackers is that in redmine, email 
is not a first class citizen. The majority of hackers are never going to 
want to go into a web interface to get something done; they live in 
Vi/Emacs and the command line.


One thing that redmine definitely breaks is proper handling of 
attachments in email; thus the first thing moving to redmine would break 
would be patch submission.


JD



--
Regards,
Alex

[1] http://wiki.postgresql.org/wiki/TrackerDiscussion





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Apr 16, 2012, at 1:40 AM, Jeff Davis pg...@j-davis.com wrote:
  See attached SQL for example.
 Problem statement: slow. Nested loops are the only option, although they
 can benefit from an inner GiST index if available. But if the join is
 happening up in the plan tree somewhere, then it's impossible for any
 index to be available.

 Hmm. This sounds like something that Tom's recent work on
 parameterized plans ought to have fixed, or if not, it seems closely
 related.

Not really.  It's still going to be a nestloop, and as such not terribly
well suited for queries where there are a lot of matchable rows on both
sides.  The work I've been doing is really about making nestloops usable
in cases where join order restrictions formerly prevented it --- but
Jeff's complaint has nothing to do with that.  (This thought also makes
me a bit dubious about the nearby suggestions that more indexes will
fix it.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need (was: Last gasp)

2012-04-16 Thread Michael Banck
On Mon, Apr 16, 2012 at 06:29:47PM +0200, Magnus Hagander wrote:
 FWIW, I think the closest thing we've found so far would be debbugs -
 which IIRC doesn't have any kind of reasonable database backend, which
 would be a strange choice for a project like ours :) And makes many
 things harder...

FWIW, Don Armstrong (the main debbugs hacker these days, I believe)
recently posted a blog post about a more obscure feature of debbugs
(forcemerge), where he finishes with "so we can eventually keep a
postgresql database updated in addition to the flatfile database":

http://www.donarmstrong.com/posts/debbugs_forcemerge/

I don't know whether this implies a full PostgreSQL backend or just a
helper DB for some part of the functionality, but it might be something
to keep watching.


Regards,

Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Slow temporary tables when using sync rep

2012-04-16 Thread Thom Brown
Hi,

I've noticed that when using synchronous replication (on 9.2devel at
least), temporary tables become really slow:

thom@test=# create temporary table temp_test (a text, b text);
CREATE TABLE
Time: 16.812 ms
thom@test=# SET synchronous_commit = 'local';
SET
Time: 2.739 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 3.911 ms
thom@test=# SET synchronous_commit = 'remote_write';
SET
Time: 2.826 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 831.384 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 1700.154 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 4976.853 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 5294.213 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');

It appears to be taking almost exactly 6 seconds per transaction, as if
it's only attempting to complete every 6 seconds, like a
heartbeat:

thom@test=# insert into temp_test (a, b) values ('one', 'two');insert
into temp_test (a, b) values ('one', 'two');insert into temp_test (a,
b) values ('one', 'two');insert into temp_test (a, b) values ('one',
'two');
INSERT 0 1
Time: 141.586 ms
INSERT 0 1
Time: 6009.059 ms
INSERT 0 1
Time: 6009.305 ms
INSERT 0 1
Time: 6009.610 ms

thom@test=# begin;
BEGIN
Time: 0.469 ms
thom@test=*# insert into temp_test (a, b) values ('one',
'two');insert into temp_test (a, b) values ('one', 'two');insert into
temp_test (a, b) values ('one', 'two');insert into temp_test (a, b)
values ('one', 'two');
INSERT 0 1
Time: 0.841 ms
INSERT 0 1
Time: 0.395 ms
INSERT 0 1
Time: 0.354 ms
INSERT 0 1
Time: 0.419 ms
thom@test=*# end;
COMMIT
Time: 5649.679 ms


This doesn't apply to regular tables:

thom@test=# create table test (a text, b text);
CREATE TABLE
Time: 18.806 ms
thom@test=# SET synchronous_commit = 'local';
SET
Time: 2.751 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 6.546 ms
thom@test=# SET synchronous_commit = 'remote_write';
SET
Time: 2.713 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 7.257 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 6.308 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 8.871 ms

Is this a known problem?
-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slow temporary tables when using sync rep

2012-04-16 Thread Michael Nolan
On Mon, Apr 16, 2012 at 6:27 PM, Thom Brown t...@linux.com wrote:

 Hi,

 I've noticed that when using synchronous replication (on 9.2devel at
 least), temporary tables become really slow:

 Since temporary tables are only present until the session ends (or
possibly only until a commit), why are they replicated at all?

BTW, should we have an entry in the index for 'temporary tables'?
--
Mike Nolan


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-16 Thread Amit Kapila
I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

It is along similar lines, but the only difference is that it will try to join
old_rel with the other_rel list in case
old_rel cannot be joined with any of the other_rels in the list using a proper
join clause between them.

In the second place, instead of N tests to see whether a rel lacks any
join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.

In the typical case where most base rels are directly joined to only a few
other rels, it will not go down the path where it has to try new combinations;
the idea is that only if, in the first pass, old_rel cannot find any suitable
rel to join will it go to a second pass and try to join with the other_rel
list.

In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old SELECT * FROM a, b. So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.

I believe this will not be related to clause-less joins, because we have
entered the function make_rels_by_clause_joins() for a relation which has a
join clause and which can be joined to one of the other_rels. So in most cases
I believe it will find one of the members of the other_rel list with a proper
join clause. Only in some cases, when it cannot find any other_rel with a
proper join clause between them, will it go down the new path. So, seen that
way, handling this case is different from handling the clause-less join case.

The other way, as you said, is to ask: is there any join clause that both
these relations participate in?

Does this mean that we will evaluate the join list to see whether there can be
any valid join between the two relations?
Is it possible that in all scenarios we will be able to detect the existence
of a valid join, one which may be possible not at this level but at the next
level?
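
For illustration only (my own sketch; table and column names are made up), the
two cases in question might look like this:

-- c has no join clause at all, so only a clauseless join can bring it in:
SELECT * FROM a JOIN b ON a.id = b.a_id, c;

-- the only clause mentioning b and c references three relations (a, b, c),
-- so it cannot be applied when joining just two of them at level-2:
SELECT * FROM a, b, c WHERE a.x = b.y + c.z;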

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Tuesday, April 17, 2012 2:05 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics 

Amit Kapila amit.kap...@huawei.com writes:
 For this kind of query, currently (referring to 9.0.3 code) it also considers
 the joins of b,c and b,d.
 As there is no join clause between b, c and d, it will go down the path of
 make_rels_by_clauseless_joins(), where it considers the joins of b,c and b,d.

 In this kind of query, if the suggestion in my mail below is followed,
 then it will consider joining a,b a,c a,d at level-2 in the function
 make_rels_by_clause_joins(), which it currently doesn't do and which may
 generate useless join paths.
 However, in real-world scenarios this kind of query, where two columns of
 different tables are used in a one-sided expression (b.y + c.z) of the where
 clause, may be uncommon.

 On the other hand, when we come to consider d, it will have no join
 clauses so we will consider joining it to each other rel in turn.  

 When it comes to considering d: because at level-2 it only considers later
 rels, it should not consider joining with each other rel.

I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

I can't really get excited about doing it that way instead of the
current way.  In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old SELECT * FROM a, b. So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.  In the second
place, instead of N tests to see whether a rel lacks any join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.
So it seems to make things slower for little obvious benefit.

In general, queries with join-clause-less rels are pretty uncommon,
so I don't want to introduce extra work into make_rels_by_clause_joins
to handle the case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A typo fix in a comment in xlog.c

2012-04-16 Thread Kyotaro HORIGUCHI
  But I did fill-paragraph for the fixed comment so the patch
  replaces a little bit more.
 
 You might want to adjust your fill-column setting to 79, so pgindent
 doesn't reformat that again.  Compare to what I just committed.

Thank you for the suggestion. I couldn't decide on a fill-column that
fits all code and comments.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Mon, 2012-04-16 at 22:20 +0100, Simon Riggs wrote:
 On Mon, Apr 16, 2012 at 9:12 PM, Jeff Davis pg...@j-davis.com wrote:
 
  That had occurred to me, but I was hesitant to only use temp indexes. It
  still doesn't really offer a good solution when both sides of the join
  are relatively large (because of random I/O). Also the build speed of
  the index would be more important than it is now.
 
 The thing I like most about temp indexes is that they needn't be temporary.
 
 I'd like to see something along the lines of demand-created optional
 indexes, that we reclaim space/maintenance overhead on according to
 some cache management scheme. The more space you have, the more of the
 important ones hang around. Roughly the same idea applies to
 materialised views.

I think to make an informed decision, the next thing I need to do is
hack up a prototype version of my merge join idea, and see how well it
performs against an index nestloop today.

It seems to me that both approaches may have merit independently.
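
As a minimal sketch of such a comparison (assuming the 9.2 range types and the
&& overlaps operator; table names and sizes below are only illustrative):

CREATE TABLE r1 (r int4range);
CREATE TABLE r2 (r int4range);
INSERT INTO r1 SELECT int4range(i, i + 10) FROM generate_series(1, 100000) i;
INSERT INTO r2 SELECT int4range(i, i + 10) FROM generate_series(1, 100000) i;
CREATE INDEX r2_r_idx ON r2 USING gist (r);
ANALYZE r1;
ANALYZE r2;
-- today this is expected to be a nested loop, ideally with a GiST index
-- scan on r2 as the inner side:
EXPLAIN ANALYZE SELECT count(*) FROM r1 JOIN r2 ON r1.r && r2.r;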

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Alex


 I believe the biggest hurdle for many hackers is that in redmine,
 email is not a first class citizen. The majority of hackers are never
 going to want to go into a web interface to get something done, they
 live in VI/Emacs and the command line.

 One thing that redmine definitely breaks is proper handling of
 attachments in email, thus the first thing moving to redmine would
 break would be patch submission.

Right.  This is not too hard to fix, however, and I didn't suggest we
move to vanilla Redmine.  It's hackable, so we can just bash it into the
shape we need (and we need a few Ruby hackers, so there's always someone
in the postgres community to fix it when it breaks, of course.)

--
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

  In the backported version to 9.1.3, bgwriter.c is modified
  instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
  used as the equivalent of GetStandbyFlushRecPtr() in 9.2.
 
 In 9.2, GetXLogReplayRecPtr() should be used instead of
 GetStandbyFlushRecPtr().  A restartpoint is scheduled to finish
 before the next restartpoint is triggered. A restartpoint is
 triggered if too many WAL files have been replayed since the last
 restartpoint. So a restartpoint should be scheduled according
 to the replay location, not the receive location.

I basically agree with that. But I got confused looking into
GetStandbyFlushRecPtr().

|   if (XLByteLT(receivePtr, replayPtr))
| return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr;
|   else
| return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr;

This seems to imply that receivePtr may be behind replayPtr. I don't
understand what condition makes that happen, but anyway the bottom line,
I think, is that a restartpoint should be based on WAL that has surely
been synced. So I chose GetStandbyFlushRecPtr() to read the
location.

If receivePtr/restorePtr always precede or are equal to
replayPtr, I prefer GetXLogReplayRecPtr() as you suggest.

(And some comment about the ordering among these pointers should
 perhaps be supplied for that part.)

 I think that basically it's better not to change the comments
 (i.e., not to add the line feed) if their contents are the same
 as previous ones, to highlight what you actually changed in the
 patch.

Hmm. It is a matter of priority between highlighting the changes (or
keeping the patch compact) and consistency in the result. I think the
latter takes precedence over the former.

Although I could have found a description with a better balance.  But
more than that, I've found the fill-column for this comment to be too
short...

 Typo: RedoRecptr should be RedoRecPtr?

I think that's right. I've unconsciously brought that spelling
from the original comment.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Michael Meskes
On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:
 OK. I would like to stretch your agreement a little. :-)
 ...

Yeah, you got a point here.

 By the new FETCH request. Instead of the above, I imagined this:
 - the runtime notices that the new request is larger than the current
   readahead window size, modifies the readahead window size upwards,
   so the next FETCH will use it
 - serve the request's first 128 rows from the current cache
 - for the 129th row, FETCH 1024 will be executed and the remaining
   768 rows will be served from the new cache

That means window size goes up to 1024-128 for that one case?

 - all subsequent requests use the new readahead size, 1024

Sounds reasonable to me.

 So, there can be occasional one-time larger requests but
 smaller ones should apply the set window size, right?

Yes. I do agree that FETCH N cannot fetch N all the time, but please make it
work like what you suggested to make sure people don't have to recompile.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Boszormenyi Zoltan

On 2012-04-17 05:52, Michael Meskes wrote:

On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:

OK. I would like to stretch your agreement a little. :-)
...

Yeah, you got a point here.


By the new FETCH request. Instead of the above, I imagined this:
- the runtime notices that the new request is larger than the current
   readahead window size, modifies the readahead window size upwards,
   so the next FETCH will use it
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 1024 will be executed and the remaining
   768 rows will be served from the new cache

That means window size goes up to 1024-128 for that one case?


I listed two scenarios.
1. occasional bump of the readahead window for large requests,
   for smaller requests it uses the originally set size
2. permanent bump of the readahead window for large requests
   (larger than previously seen), all subsequent requests use
   the new size

Both can be implemented easily, which one do you prefer?
If you always use very large requests, 1) behaves like 2)
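
In plain cursor terms (my own illustration; big_table is a made-up name, and
the readahead window itself is ECPG client-side state, not anything the server
sees), the difference between the two would be:

BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM big_table;
FETCH 128 FROM cur;   -- within the configured window (say 256 rows): served from the cache
FETCH 1024 FROM cur;  -- bigger than the window: the window is enlarged to 1024 for this request
FETCH 128 FROM cur;   -- scenario 1: back to the original 256; scenario 2: keeps using 1024
COMMIT;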




- all subsequent requests use the new readahead size, 1024

Sounds reasonable to me.


So, there can be occasional one-time larger requests but
smaller ones should apply the set window size, right?

Yes. I do agree that FETCH N cannot fetch N all the time, but please make it
work like what you suggested to make sure people don't have to recompile.

Michael



--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Michael Meskes
On Tue, Apr 17, 2012 at 06:02:34AM +0200, Boszormenyi Zoltan wrote:
 I listed two scenarios.
 1. occasional bump of the readahead window for large requests,
for smaller requests it uses the originally set size
 2. permanent bump of the readahead window for large requests
(larger than previously seen), all subsequent requests use
the new size
 
 Both can be implemented easily, which one do you prefer?
 If you always use very large requests, 1) behaves like 2)

I'd say let's go for #2. #1 is probably more efficient but not what the
programmer asked us to do. After all it's easy to increase the window size
accordingly if you want so as a programmer.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-16 Thread Pavel Stehule
2012/4/16 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 13.04.2012 19:17, Guillaume Lelarge wrote:

 On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:

 On 08.04.2012 11:59, Guillaume Lelarge wrote:

 There could be a good reason which would explain why we can't (or don't
 want to) do this, but I don't see it right now.


 Me neither, except a general desire to keep internals hidden. I propose
 the attached.


 Sounds good to me. I would love to see this happening in 9.2.


 Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed MSVC
 install.pm to also copy the header file.


Hello,

it doesn't work for modules from the contrib directory

pavel ~/src/postgresql/contrib/check_plpgsql $ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include -D_GNU_SOURCE   -c -o check_plpgsql.o
check_plpgsql.c
check_plpgsql.c:16:21: fatal error: plpgsql.h: No such file or directory
compilation terminated.


Regards

Pavel


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] not null validation option in contrib/file_fdw

2012-04-16 Thread Etsuro Fujita
I updated the patch; attached is the updated version.

Changes:
  * fix a bug in fileGetOptions()
  * rename the validation option and its code to validate_data_file (a usage
    sketch follows below)
  * clean up
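
A usage sketch of the option as renamed in this version (this is only my
reading of the patch; the file path and table definition are made up, while
filename and format are the existing file_fdw options):

CREATE EXTENSION file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE words (w text NOT NULL)
  SERVER file_server
  OPTIONS (filename '/tmp/words.csv', format 'csv', validate_data_file 'true');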

Best regards,
Etsuro Fujita

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita
 Sent: Monday, April 16, 2012 4:09 PM
 To: 'Andrew Dunstan'; 'Shigeru HANADA'
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
 
 Thank you for the review.
 
  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew
  Dunstan
  Sent: Friday, April 13, 2012 9:16 PM
  To: Shigeru HANADA
  Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
 
 
 
  On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
   (2012/04/13 16:59), Etsuro Fujita wrote:
   I updated the patch added to CF 2012-Next [1].  Attached is the
   updated version of the patch.
   I applied the patch and ran regression tests of file_fdw, and I got
   SIGSEGV X-(
  
   The failure occurs in fileGetOptions, and it is caused by
   list_delete_cell used in a foreach loop; the ListCell pointing to the
   delete target has been freed by list_delete_cell, but foreach accesses
   it to get the next element.
  
   Some backend functions which use list_delete_cell in a loop use a for
   loop instead of foreach, and other functions exit the loop right after
   calling list_delete_cell.  Since we can't stop searching for non-COPY
   options until reaching the end of the options list, we would need to
   choose the former (a for loop), or create another list which contains
   only the valid COPY options and return it via the other_options parameter.
  
 
  Yes, the code in fileGetOptions() appears to be bogus.
 
 Sorry, I will fix it.
 
  Also, validate is a terrible name for the option (and in the code)
 IMNSHO.
  It's far too generic. validate_not_null or some such would surely be
  better.
 
 I thought it would be used for not only NOT NULL but also CHECK and
foreign
 key constraints.  That is, when a user sets the option to 'true', file_fdw
 verifies that each tuple meets all kinds of constraints.  So, how about
 validate_data_file or simply validate_file?
 
 Best regards,
 Etsuro Fujita
 
  cheers
 
  andrew
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
  make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 
 
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
 changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



file_fdw_notnull_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Magnus Hagander
On Mon, Apr 16, 2012 at 23:48, Jay Levitt jay.lev...@gmail.com wrote:
 Alex wrote:

 I still fail to see how Redmine doesn't fit into requirements summarized
 at that wiki page[1], so that must be something other than formal
 requirement of being free/open software and running postgres behind
 (some sort of feeling maybe?)


 Well, if those requirements are in fact requirements, Redmine could suit the
 purpose (perhaps with some custom extensions).  But yes, Redmine feels
 wrong to me, though I have never been particularly happy with *any*
 self-hosted bug tracker.

That's probably one reason people aren't jumping on this. Because
there is no tracker out there that people actually *like*...


 Of those I've used - Redmine, Mantis, JIRA, Trac, Bugzilla, GForge, RT -
 Redmine feels the least-worst to me; it's about equal to Trac, and it's in
 Ruby so I could theoretically(!) improve it.

It being in ruby is actually generally a liability in the
postgresql.org world - we have very few people who actually know it.
And some of those who know it no longer want to work with it. So if
you're good with Ruby, and want to contribute, feel free to contact me
off-list because we have some things we need help with :-)


 I think the biggest missing pieces in Redmine aside from custom CF stuff
 are: better search, single sign-on (it requires Yet Another Login), a better

We already have redmine working with single password (though not
single signon) for a limited number of projects such as pgweb.


 UX (AJAX, syntax highlighting) and better git integration (a la pull
 requests, where private git commits = patches). Those are some pretty big
 pieces. I don't think Redmine out-of-the-box would improve either CFs or
 community involvement.

I think we can all agree on that :-)

 Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
 please describe your ideal tool for the task?


 My opinion isn't all that important, since I currently have an infinite
 opinion-to-contribution ratio, but in my unicorniverse: We'd accept that
 open source hasn't always produced great UX, we'd use GitHub's issue
 tracker, allow volunteers to do bug wrangling triage via tags, use GitHub
 hooks to integrate with the existing CF app, and write archiving tools that
 would let us easily export everything off of GitHub for when (a) something
 better comes along or (b) GitHub pops out of existence or adds egregious
 licensing terms like BitKeeper.

 Reasons:

 - Familiarity: Many developers already have a GitHub account and use it

Most of the more senior developers don't use github. Other than
possibly as a place to store a plain git repository. So that's not
really relevant.

 - Discoverability: GitHub has great SEO

I'm willing to bet that postgresql.org has better SEO when it comes to postgres.

 - Tight integration of git with patch and issue management (pull requests,
 fork networks, etc); eliminates ceremony rather than adding to it

There are some things that help there, certainly. Pull requests is
basically email, but fork network tracking and such can be darn useful
sometimes.

 - Readable UI with syntax highlighting, etc

Yes, this part is very nice.

 - Patch commenting and git integration encourage actual review-resubmit
 cycles instead of Here, look, I fixed it for you reviews

The amount of spam coming through that system, and the
inability/unwillingness of github to even care about it is a killer
argument *against* github.

We have working antispam for email. The github antispam is somewhere
around where email antispam was in 1994.

 - Two-way email/web integration
 - Meets Tom's would be sort of neat criteria[1]
 - Could easily implement Simon's pony criteria[2] with tags and API
 - Easily extensible with API and hooks
 - Subjectively: Its design encourages better community and core interactions
 than any I've seen in 25 years.

 GitHub could well be a non-starter, but if third-party-dependence is really
 the holdup, I'd volunteer to write the tools - in fact, a google of [export
 issues from github] shows a few that might already suffice.

It *is* a non-starter, because (a) it's a third party dependency, and
(b) AFAIK they don't provide *data access* to the issue trackers.

If the actual issue trackers were stored in git, like the webpages are
for example, it would be a different story...


 Given that every other existing tool has likely pissed off someone
 already, I guess our best bet is writing one from scratch.

 ISTR there's a great "Writing your own bug tracker is an anti-pattern" blog
 post, but I can't find it anymore.

I'm sure there's more than one :-)

If it was that easy we would've written one already. But doing that is
likely going to be a lot of work, and it needs to be maintained.

(Not that picking another one means it doesn't need to be maintained -
I've seen so many things broken by upgrading redmine or bugzilla that
it's silly...)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: