Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Darren Duncan
Your proposal makes me think of something similar which might be useful,
INclusion constraints.  Just as exclusion constraints might be thought of as a
generalization of unique/key constraints, inclusion constraints are like a
generalization of foreign key constraints.  An inclusion constraint basically
allows some comparison operator other than is-equal to test whether values
in one table match values in another table, and the constraint is satisfied if
the test results in true.  An example of such an inclusion test is whether the
range in one table is contained in a range in another table.  I assume/hope
that, similarly, now that we have range types in 9.2, the existing
exclusion constraints can be used with range comparison operators.
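
For reference, a 9.2 exclusion constraint over a range type might look roughly
like this (a minimal sketch; the table and column names are only illustrative):

  CREATE TABLE reservation (
      during tsrange,
      EXCLUDE USING gist (during WITH &&)  -- no two rows may have overlapping ranges
  );

The hypothetical inclusion constraint would then be the converse of a foreign
key: require that, say, each row's range be contained in (<@) some range in
another table.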


As to your actual proposal, it sounds like a generalization of the relational
join or set intersection operator where, instead of comparing sets defined in
terms of an enumeration of discrete values, we are comparing sets defined by a
range, which conceptually have infinitely many values depending on the data type
the range is defined over.  But if we're doing this, then it would seem to make
sense to go further and see if we have set analogies for all of our relational
or set operators, should we want to work with non-discrete sets.


Now this sounds interesting in theory, but I would also assume that these could
be implemented by an extension in terms of the existing normal relational
operators, where each range value is a discrete value, combined with operators
for unioning, differencing, etc. over ranges.  A relation of ranges can
effectively represent a discontinuous range; in that case, the empty
discontinuous range is also canonically representable by a relation with zero
tuples.
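
For example (the names here are just illustrative), the discontinuous range
[1,3) union [5,8) could be stored as a relation of ranges:

  CREATE TABLE discontinuous_range (piece int4range);
  INSERT INTO discontinuous_range VALUES ('[1,3)'), ('[5,8)');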


Jeff, I get the impression your proposal is partly about helping performance by
supporting this internally, rather than just defining it as a SQL function.
Am I right?


-- Darren Duncan

Jeff Davis wrote:

I hope this is not an inappropriate time for 9.3 discussions. The flip
side of asking for submissions in the first couple of commitfests is
that I need to submit proposals now.

What is a Range Join?

See attached SQL for an example. The key difference is that the join
condition is not equality, but overlaps (&&).
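
Roughly, the kind of query I mean (a minimal sketch; table and column names
are only illustrative):

  CREATE TABLE a (period tsrange);
  CREATE TABLE b (period tsrange);

  SELECT * FROM a JOIN b ON a.period && b.period;  -- joined on overlap, not equality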

Problem statement: slow. Nested loops are the only option, although they
can benefit from an inner GiST index if available. But if the join is
happening up in the plan tree somewhere, then it's impossible for any
index to be available.

Proposed solution: a modified merge join that can handle ranges.

 1. Order the ranges on both sides by the lower bound, then upper bound.
Empty ranges can be excluded entirely.
 2. Left := first range on left, Right := first range on right
 3. If Left or Right is empty, terminate.
 4. If lower(Left) > upper(Right), discard Right, goto 2
 5. If lower(Right) > upper(Left), discard Left, goto 2
 6. return (Left, Right) as joined tuple
 7. Right := next range on right
 8. goto 3

If step 4 or step 5 keeps getting triggered, and a btree index is
available (ordered by lower bound), we can re-probe to go to the correct
position, and consider that the new top range on that side. This is an
optimization for the case where there are large numbers of ranges with
no match on the other side.

Thanks to Nathan Boley for helping me devise this algorithm. However,
any bugs are mine alone ;)

Weaknesses: I haven't thought through the optimization, but I suspect it
will be hard to be very accurate in the costing. That might be OK,
because there aren't very many options anyway, but I'll need to think it
through.

Questions:

 * Is this idea sane? -- that is, are ranges important enough that
people are willing to maintain a new operator?
 * The more general problem might be spatial joins which can operate
in N dimensions, and I doubt this would work very well in that case.
Does someone know of a spatial join algorithm (without IP claims) that
would be as good as this one for ranges?
 * Other thoughts?

Regards,
Jeff Davis




Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Heikki Linnakangas

On 16.04.2012 08:40, Jeff Davis wrote:

Does someone know of a spatial join algorithm (without IP claims) that
would be as good as this one for ranges?


I'd be happy with an algorithm that's specific to ranges, too, but my 
gut feeling is that there has to be a lot of research on spatial join
algorithms out there. Implementing one of them would be more widely 
applicable, so I think we should look into spatial join algorithms first 
and see how far that gets us, before we write something specific to ranges.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Proposed solution: a modified merge join that can handle ranges.

  1. Order the ranges on both sides by the lower bound, then upper bound.
 Empty ranges can be excluded entirely.
  2. Left := first range on left, Right := first range on right
  3. If Left or Right is empty, terminate.
  4. If lower(Left) > upper(Right), discard Right, goto 2
  5. If lower(Right) > upper(Left), discard Left, goto 2
  6. return (Left, Right) as joined tuple
  7. Right := next range on right
  8. goto 3

This is surely not correct in detail.  As written, it will be impossible
for any tuple on the right side to be joined to more than one left-side
tuple.  You will need something analogous to the mark-and-restore
rewinding logic in standard mergejoin to make this work.

I can't escape the feeling that we could do this with more-or-less
standard mergejoin logic paying attention to some fuzzy notion of
equality, with the question of whether a given pair of ranges actually
overlap being deferred to a filter condition.  Not quite sure how to
make that work though.

  * Is this idea sane? -- that is, are ranges important enough that
 people are willing to maintain a new operator?

Dunno.  It might be easier to sell the idea of adding support for range
joins in a couple of years, after we've seen how much use ranges get.

I will note that mergejoin is not only one of the more complicated
executor modules, but it accounts for a huge percentage of the planner's
complexity; which doesn't exactly make me sympathize with the notion of
let's-copy-and-paste-all-that.  It'd be a lot better if we could build
it as a small tweak to mergejoin rather than an independent thing.

(Having said that, I have no idea how the planner support could work
at all, because its treatment of mergejoins is intimately tied to
mergejoinable equality and EquivalenceClasses and PathKeys, which don't
seem readily extensible to the range situation.)

regards, tom lane



Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-04-16 Thread Nikhil Sontakke
  Displace yes. It would error out if someone says
 
  ALTER TABLE ONLY... CHECK ();
 
  suggesting to use the ONLY with the CHECK.

 I'd say the behavior for that case can revert to the PostgreSQL 9.1
 behavior.
 If the table has children, raise an error.  Otherwise, add an inheritable
 CHECK constraint, albeit one lacking inheritors at that moment.


Ok, that sounds reasonable.

Another thing that we should consider is that if we are replacing ONLY with
NO INHERIT, then instead of just making a cosmetic syntactic change, we
should also replace all the is*only-style field names with noinherit for
the sake of completeness and uniformity.

Regards,
Nikhils


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-16 Thread Amit Kapila
Another way to look at this is that if we have

  select ... from a,b,c,d where a.x = b.y + c.z

we want to consider a cross-join of b and c, in the hopes that we can do
something useful with the join clause at the next level where it can
join to a.  From b's perspective there is no percentage in joining to d.

For this kind of query, currently (referring to the 9.0.3 code) it also
considers the joins b,c and b,d.
As there is no join clause among b, c, and d, it will go down the path of
make_rels_by_clauseless_joins(), where it considers the joins b,c and b,d.

In this kind of query, if my suggestion in the mail below is followed,
then it will consider joining a,b a,c a,d at level-2 in the function
make_rels_by_clause_joins(), which it currently doesn't do, and which may
generate useless join paths.
However, in real-world scenarios, queries of this kind, where two columns of
different tables are used in a one-sided expression (b.y + c.z) of the where
clause, may be rare.

On the other hand, when we come to consider d, it will have no join
clauses so we will consider joining it to each other rel in turn.  

When it comes to considering d, at level-2 it only considers later rels, so
it should not consider joining with each other rel.

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Monday, April 16, 2012 9:57 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics 

Amit Kapila amit.kap...@huawei.com writes:
 That case is handled by make_rels_by_clauseless_joins
 It will be handled by make_rels_by_clauseless_joins() if the given rel old_rel
 doesn't have any join clause.
 However, if it has a join clause but isn't able to join with any other rels --
 as in the example you provided, where relation c is not able to join
 with the other rel d --
 then in such cases it can do a cross-join with d, because it has not found any
 relation to join with.
 Wouldn't that address the problem you mentioned?

Sounds to me like that's going in the wrong direction, ie, joining to
exactly the wrong relations.  If you have to cross-join it's better to
cross-join against relations that are included in one of the join
clauses that does mention the current relation.

Another way to look at this is that if we have

select ... from a,b,c,d where a.x = b.y + c.z

we want to consider a cross-join of b and c, in the hopes that we can do
something useful with the join clause at the next level where it can
join to a.  From b's perspective there is no percentage in joining to d.
On the other hand, when we come to consider d, it will have no join
clauses so we will consider joining it to each other rel in turn.  So
if there is any value in joining d early, we will find that out.

Or at least that's the theory.  Now that I look at it this way, I think
there is a bug here: when we are at level 2, we only consider later rels
in the list for forced clauseless joins.  That would be okay if the
condition were symmetrical, but it isn't.  This makes for a bogus
FROM-list ordering dependency in handling of clauseless joins.  Not
sure how much that matters in the real world, but it still seems wrong.

regards, tom lane




Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Noah Misch
On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
 In the department of query cancellations, I believe Noah argued
 previously that this wasn't really going to cause a problem.  And,
 indeed, if the master has a mix of inserts, updates, and deletes, then
 it seems likely that any recovery conflicts generated this way would
 be hitting about the same place in the XID space as the ones caused by
 pruning away dead tuples.  What will be different, if we go this
 route, is that an insert-only master, which right now only generates
 conflicts at freezing time, will instead generate conflicts much
 sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
 enough to know whether this is a problem, and I'm not trying to block
 this approach, but I do want to make sure that everyone agrees that
 it's acceptable before we go do it, because inevitably somebody is
 going to get bit.

Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC off would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.

 As to correctness, after further review of lazy_scan_heap(), I think
 there are some residual bugs here that need to be fixed whatever we
 decide to do about the Hot Standby case:
 
 1. I noticed that when we do PageSetAllVisible() today we follow it up
 with SetBufferCommitInfoNeedsSave().  PD_ALL_VISIBLE is not merely a
 hint, so I think these should be changed to MarkBufferDirty(), which
 shouldn't make much difference given current code, but proposals to
 handle hint changes differently than non-hint changes abound, so it's
 probably not good to rely on those meaning the same thing forever.

Do you refer to PD_ALL_VISIBLE as not merely a hint due to the requirement
to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
in its role as a witness of tuple visibility, but it is not a hint in its role
as a witness of the visibility map status?  In any event, I agree that those
call sites should use MarkBufferDirty().

The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
systems with weaker memory ordering, the FlushBuffer() process's clearing of
BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
process until some time after the former retrieved buffer contents from shared
memory.  Memory barriers could make the comment true, but we should probably
just update the comment to explain that a race condition may eat the update
entirely.  Incidentally, is there a good reason for MarkBufferDirty() to
increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
not to do so?  Looks like an oversight.

 2. More seriously, lazy_scan_heap() releases the buffer lock before
 setting the all-visible bit on the heap.  This looks just plain wrong
 (and it's my fault, to be clear).  Somebody else can come along after
 we've released the buffer lock and mark the page not-all-visible.
 That concurrent process will check the visibility map bit, find it
 already clear, and go on its merry way.  Then, we set the visibility
 map bit and go on our merry way.  Now we have a not-all-visible page
 with the all-visible bit set in the visibility map.   Oops.

Hmm, indeed.  This deserves its own open item.

 I think
 this probably needs to be rewritten so that lazy_scan_heap() acquires
 a pin on the visibility map page before locking the buffer for cleanup
 and holds onto that pin until either we exit the range of heap pages
 covered by that visibility map page or trigger index vac due to
 maintenance_work_mem exhaustion.  With that infrastructure in place,
 we can set the visibility map bit at the same time we set the
 page-level bit without risking holding the buffer lock across a buffer
 I/O (we might have to hold the buffer lock across a WAL flush in the
 worst case, but that hazard exists in a number of other places as well
 and fixing it here is well out of scope).

Looks reasonable at a glance.

 1. vacuum on master sets the page-level bit and the visibility map bit
 2. the heap page with the bit is written to disk BEFORE the WAL entry
 generated in (1) gets flushed; this is allowable, since it's not an
 error for the page-level bit to be set while the visibility-map bit is
 unset, only the other way around
 3. crash (still before the WAL entry is flushed)
 4. some operation on the master emits an FPW for the page without
 first rendering it not-all-visible
 
 At present, I'm not sure there's any real problem here, since normally
 an all-visible heap page is only going to get another WAL entry if
 somebody does an insert, update, or delete on it, in which case the
 visibility map bit is going to get cleared anyway.  Freezing the page
 might emit a new FPW, but that's going to generate conflicts anyway,
 so no problem there.  So I think there's no real problem here, but it
 

Re: [HACKERS] not null validation option in contrib/file_fdw

2012-04-16 Thread Etsuro Fujita
Thank you for the review.

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew Dunstan
 Sent: Friday, April 13, 2012 9:16 PM
 To: Shigeru HANADA
 Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
 
 
 
 On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
  (2012/04/13 16:59), Etsuro Fujita wrote:
  I updated the patch added to CF 2012-Next [1].  Attached is the
  updated version of the patch.
  I applied the patch and ran regression tests of file_fdw, and I got
  SIGSEGV X-(
 
  The failure occurs in fileGetOptions, and it is caused by
  list_delete_cell being used inside a foreach loop; the ListCell pointing to
  the delete target has been freed by list_delete_cell, but foreach accesses
  it to get the next element.
 
  Some backend functions which use list_delete_cell in a loop use a for
  loop instead of foreach, and other functions exit the loop after
  calling list_delete_cell.  Since we can't stop searching for non-COPY
  options until we reach the end of the options list, we would need to
  choose the former (a for loop), or create another list which contains only
  valid COPY options and return it via the other_options parameter.
 
 
 Yes, the code in fileGetOptions() appears to be bogus.

Sorry, I will fix it.

 Also, validate is a terrible name for the option (and in the code) IMNSHO.
 It's far too generic. validate_not_null or some such would surely be better.

I thought it would be used for not only NOT NULL but also CHECK and foreign
key constraints.  That is, when a user sets the option to 'true', file_fdw
verifies that each tuple meets all kinds of constraints.  So, how about
validate_data_file or simply validate_file?

Best regards,
Etsuro Fujita

 cheers
 
 andrew
 
 





Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch n...@leadboat.com wrote:
 On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
 In the department of query cancellations, I believe Noah argued
 previously that this wasn't really going to cause a problem.  And,
 indeed, if the master has a mix of inserts, updates, and deletes, then
 it seems likely that any recovery conflicts generated this way would
 be hitting about the same place in the XID space as the ones caused by
 pruning away dead tuples.  What will be different, if we go this
 route, is that an insert-only master, which right now only generates
 conflicts at freezing time, will instead generate conflicts much
 sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
 enough to know whether this is a problem, and I'm not trying to block
 this approach, but I do want to make sure that everyone agrees that
 it's acceptable before we go do it, because inevitably somebody is
 going to get bit.

 Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
 A standby with the GUC off would ignore all-visible indicators and also
 decline to generate conflicts when flipping them on.

I'm against adding a new GUC, especially for that fairly thin reason.


 1. vacuum on master sets the page-level bit and the visibility map bit
 2. the heap page with the bit is written to disk BEFORE the WAL entry
 generated in (1) gets flushed; this is allowable, since it's not an
 error for the page-level bit to be set while the visibility-map bit is
 unset, only the other way around
 3. crash (still before the WAL entry is flushed)
 4. some operation on the master emits an FPW for the page without
 first rendering it not-all-visible

 At present, I'm not sure there's any real problem here, since normally
 an all-visible heap page is only going to get another WAL entry if
 somebody does an insert, update, or delete on it, in which case the
 visibility map bit is going to get cleared anyway.  Freezing the page
 might emit a new FPW, but that's going to generate conflicts anyway,
 so no problem there.  So I think there's no real problem here, but it
 doesn't seem totally future-proof - any future type of WAL record that
 might modify the page without rendering it not-all-visible would
 create a very subtle hazard for Hot Standby.  We could dodge the whole
 issue by leaving the hack in heapgetpage() intact and just be happy
 with making index-only scans work, or maybe somebody has a more clever
 idea.

 Good point.  We could finagle things so the standby notices end-of-recovery
 checkpoints and blocks the optimization for older snapshots.  For example,
 maintain an integer count of end-of-recovery checkpoints seen and store that
 in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
 global value is greater than the snapshot's value, disable the optimization
 for that snapshot.  I don't know whether the optimization is powerful enough
 to justify that level of trouble, but it's an option to consider.

 For a different approach, targeting the fragility, we could add assertions to
 detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
 full-page image.

We don't need a future proof solution, especially not at this stage of
the release cycle. Every time you add a WAL record, we need to
consider the possible conflict impact.

We just need a simple and clear solution. I'll work on that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 7:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Dunno.  It might be easier to sell the idea of adding support for range
 joins in a couple of years, after we've seen how much use ranges get.

Once we've started the journey towards range types we must complete it
reasonably quickly.

Having partially implemented, unoptimised features is the same as not
having the feature at all, so it will remain unused until it really
works. We could say that about many features, but if Jeff is
championing this, I'd say go for it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Last gasp

2012-04-16 Thread Simon Riggs
On Sun, Apr 15, 2012 at 11:31 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 04/15/2012 05:46 AM, Simon Riggs wrote:

 Our problem is not lack of resource, it is ineffective
 delegation. As Hannu points out, he didn't know the patch would be
 rejected, so he didn't know help was needed to save something useful.
 I considered that the job of the CF manager, but perhaps it is was
 not.


 Note that one of the influences on the death march here was lack of a
 fully devoted CF manager for the full duration.  I did some rabble rousing
 to get things started in the usual way, but my time for this only budgeted
 for six weeks in that role.  And that was quite optimistic for this one.

 Trying to quantify how much time investment the CF manager role really
 involves is one of my important projects to chew on.  Whoever ends up doing
 that should at least have an idea what scale of problem they're getting
 into.

Again, if the CF manager role is too big, then we delegate. No reason
why we can't have a CF team, with different people responsible for
different areas.

The key thing is working out how to mobilise the resources we have to
best effect.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Different gettext domain needed for error context

2012-04-16 Thread Heikki Linnakangas

On 15.04.2012 00:54, Tom Lane wrote:

I really think we need to change errcontext itself to pass the correct
domain.  If we are going to require a domain to be provided (and this
does require that, for correct operation), then we need to break any
code that doesn't provide it in a visible fashion.

A possibly more attractive alternative is to redefine errcontext
with a macro that allows TEXTDOMAIN to be passed in behind-the-scenes,
thus keeping source-level compatibility.  We can do this with the same
type of hack we've used for many years for elog():

#define errcontext  set_errcontext_domain(TEXTDOMAIN), errcontext_msg

where the actual message-passing function is now called errcontext_msg.


Ok then, here's a patch using that approach.

I had to rename a few local variables called errcontext, because the
macro now tries to expand those and you get an error.


Note: If you want to test this at home, the original test case I posted
doesn't currently work, because the text of the context messages in
PL/pgSQL has been slightly changed since I posted the original test
case, but the translations have not been updated yet. Until then, you
can manually remove the double quotes in messages like 'function \"%s\"'
in the .po file to test this.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0c301b2..a10b569 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6504,7 +6504,7 @@ StartupXLOG(void)
 			bool		recoveryContinue = true;
 			bool		recoveryApply = true;
 			bool		recoveryPause = false;
-			ErrorContextCallback errcontext;
+			ErrorContextCallback errcallback;
 			TimestampTz xtime;
 
 			InRedo = true;
@@ -6566,10 +6566,10 @@ StartupXLOG(void)
 }
 
 /* Setup error traceback support for ereport() */
-errcontext.callback = rm_redo_error_callback;
-errcontext.arg = (void *) record;
-errcontext.previous = error_context_stack;
-error_context_stack = &errcontext;
+errcallback.callback = rm_redo_error_callback;
+errcallback.arg = (void *) record;
+errcallback.previous = error_context_stack;
+error_context_stack = &errcallback;
 
 /*
  * ShmemVariableCache->nextXid must be beyond record's xid.
@@ -6614,7 +6614,7 @@ StartupXLOG(void)
  RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
 
 /* Pop the error context stack */
-error_context_stack = errcontext.previous;
+error_context_stack = errcallback.previous;
 
  if (!XLogRecPtrIsInvalid(ControlFile->backupStartPoint) &&
  	XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 95fec8d..cb7e67a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -1854,7 +1854,7 @@ CopyFrom(CopyState cstate)
 	TupleTableSlot *myslot;
 	MemoryContext oldcontext = CurrentMemoryContext;
 
-	ErrorContextCallback errcontext;
+	ErrorContextCallback errcallback;
 	CommandId	mycid = GetCurrentCommandId(true);
 	int			hi_options = 0; /* start with default heap_insert options */
 	BulkInsertState bistate;
@@ -1998,10 +1998,10 @@ CopyFrom(CopyState cstate)
 	econtext = GetPerTupleExprContext(estate);
 
 	/* Set up callback to identify error line number */
-	errcontext.callback = CopyFromErrorCallback;
-	errcontext.arg = (void *) cstate;
-	errcontext.previous = error_context_stack;
-	error_context_stack = &errcontext;
+	errcallback.callback = CopyFromErrorCallback;
+	errcallback.arg = (void *) cstate;
+	errcallback.previous = error_context_stack;
+	error_context_stack = &errcallback;
 
 	for (;;)
 	{
@@ -2116,7 +2116,7 @@ CopyFrom(CopyState cstate)
 			nBufferedTuples, bufferedTuples);
 
 	/* Done, clean up */
-	error_context_stack = errcontext.previous;
+	error_context_stack = errcallback.previous;
 
 	FreeBulkInsertState(bistate);
 
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 80dbdd1..33966ee 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -144,10 +144,10 @@ setup_parser_errposition_callback(ParseCallbackState *pcbstate,
 	/* Setup error traceback support for ereport() */
  	pcbstate->pstate = pstate;
  	pcbstate->location = location;
-	pcbstate->errcontext.callback = pcb_error_callback;
-	pcbstate->errcontext.arg = (void *) pcbstate;
-	pcbstate->errcontext.previous = error_context_stack;
-	error_context_stack = &pcbstate->errcontext;
+	pcbstate->errcallback.callback = pcb_error_callback;
+	pcbstate->errcallback.arg = (void *) pcbstate;
+	pcbstate->errcallback.previous = error_context_stack;
+	error_context_stack = &pcbstate->errcallback;
 }
 
 /*
@@ -157,7 +157,7 @@ void
 cancel_parser_errposition_callback(ParseCallbackState *pcbstate)
 {
 	/* Pop the error context stack */
-	error_context_stack = pcbstate->errcontext.previous;
+	error_context_stack = pcbstate->errcallback.previous;
 }
 
 /*
diff --git 

Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Heikki Linnakangas

On 16.04.2012 10:38, Simon Riggs wrote:

On Mon, Apr 16, 2012 at 8:02 AM, Noah Mischn...@leadboat.com  wrote:

On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:

In the department of query cancellations, I believe Noah argued
previously that this wasn't really going to cause a problem.  And,
indeed, if the master has a mix of inserts, updates, and deletes, then
it seems likely that any recovery conflicts generated this way would
be hitting about the same place in the XID space as the ones caused by
pruning away dead tuples.  What will be different, if we go this
route, is that an insert-only master, which right now only generates
conflicts at freezing time, will instead generate conflicts much
sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
enough to know whether this is a problem, and I'm not trying to block
this approach, but I do want to make sure that everyone agrees that
it's acceptable before we go do it, because inevitably somebody is
going to get bit.


Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC off would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.


I'm against adding a new GUC, especially for that fairly thin reason.


Same here.

Can we have a soft hot standby conflict that doesn't kill the query, 
but disables index-only-scans?


In the long run, perhaps we need to store XIDs in the visibility map 
instead of just a bit. If we only stored one xid per 100 pages or
something like that, the storage overhead would not be much higher than 
what we have at the moment. But that's obviously not going to happen for 
9.2...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Last gasp

2012-04-16 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@commandprompt.com writes:
 I've used Redmine a lot, as you know, and I only keep using it because
 it's a requirement at work.  It is certainly not close to usable for
 general pgsql stuff.  (Trac, which we used to use prior to Redmine, was
 certainly much worse, though).

Same story here, still using redmine a lot, all with custom reports etc.

 I can't say that it's all that slow, or that there's a problem with the
 code, or that the search doesn't work right (and I've never had a wiki
 edit disappear, either, and I've used that a lot).  It's just the wrong
 tool altogether.

It's indeed slow here, and I agree that's not the problem. Not the tool
we need, +1.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] man pages for contrib programs

2012-04-16 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes:
 Good question.  I guess we could keep the original name ... Modules
 for that chapter.

Those are a kind of server application in my mind. I think we want to
keep using “module” to mean the shared library file we load at runtime,
be it a .so, a .dylib or a .dll.

That said, that non-extension stuff is most often just a module.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-16 Thread Heikki Linnakangas

On 13.04.2012 19:17, Guillaume Lelarge wrote:

On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:

On 08.04.2012 11:59, Guillaume Lelarge wrote:

There could be a good reason which would explain why we can't (or don't
want to) do this, but I don't see it right now.


Me neither, except a general desire to keep internals hidden. I propose
the attached.


Sounds good to me. I would love to see this happening in 9.2.


Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed 
MSVC install.pm to also copy the header file.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



[HACKERS] nodes/*funcs.c inconsistencies

2012-04-16 Thread Noah Misch
I observed these inconsistencies in node support functions:

- _copyReassignOwnedStmt() uses COPY_SCALAR_FIELD() on the string field
  newrole, and _equalReassignOwnedStmt() uses COMPARE_NODE_FIELD().

- _outCreateForeignTableStmt() calls _outCreateStmt() directly.  This produces
  the label CREATEFOREIGNTABLESTMTCREATESTMT.  The attached patch splits
  things out the way we normally do in outfuncs.c.  There's no readfuncs.c
  support, so this is strictly cosmetic.

- _outColumnDef() uses WRITE_INT_FIELD for the storage field, a char.
  Again, no readfuncs.c support to create a compatibility problem.

- _copyRenameStmt() and _equalRenameStmt() ignore the relationType field,
  but I can't see a good reason to do so.  PostgreSQL 9.1 added this field,
  but only recent master (after commit 38b9693f of 3 April 2012) references
  the field beyond setting it in the parser.

- _copyViewStmt() and _equalViewStmt() ignore the options field, and
  _equalColumnDef() ignores fdwoptions.  These are new in PostgreSQL 9.2,
  and I see no good reason to ignore them.

I'd suggest backpatching the ReassignOwnedStmt() bits; the wrong code could
produce crashes.  The rest are for master only.

Thanks,
nm
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index c94799b..eed0ea4 100644
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***
*** 2891,2896  _copyRenameStmt(const RenameStmt *from)
--- 2891,2897 
RenameStmt *newnode = makeNode(RenameStmt);
  
COPY_SCALAR_FIELD(renameType);
+   COPY_SCALAR_FIELD(relationType);
COPY_NODE_FIELD(relation);
COPY_NODE_FIELD(object);
COPY_NODE_FIELD(objarg);
***
*** 3047,3052  _copyViewStmt(const ViewStmt *from)
--- 3048,3054 
COPY_NODE_FIELD(aliases);
COPY_NODE_FIELD(query);
COPY_SCALAR_FIELD(replace);
+   COPY_NODE_FIELD(options);
  
return newnode;
  }
***
*** 3650,3656  _copyReassignOwnedStmt(const ReassignOwnedStmt *from)
ReassignOwnedStmt *newnode = makeNode(ReassignOwnedStmt);
  
COPY_NODE_FIELD(roles);
!   COPY_SCALAR_FIELD(newrole);
  
return newnode;
  }
--- 3652,3658 
ReassignOwnedStmt *newnode = makeNode(ReassignOwnedStmt);
  
COPY_NODE_FIELD(roles);
!   COPY_STRING_FIELD(newrole);
  
return newnode;
  }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9564210..c06b068 100644
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***
*** 1306,1311  static bool
--- 1306,1312 
  _equalRenameStmt(const RenameStmt *a, const RenameStmt *b)
  {
COMPARE_SCALAR_FIELD(renameType);
+   COMPARE_SCALAR_FIELD(relationType);
COMPARE_NODE_FIELD(relation);
COMPARE_NODE_FIELD(object);
COMPARE_NODE_FIELD(objarg);
***
*** 1438,1443  _equalViewStmt(const ViewStmt *a, const ViewStmt *b)
--- 1439,1445 
COMPARE_NODE_FIELD(aliases);
COMPARE_NODE_FIELD(query);
COMPARE_SCALAR_FIELD(replace);
+   COMPARE_NODE_FIELD(options);
  
return true;
  }
***
*** 1945,1951  static bool
  _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt 
*b)
  {
COMPARE_NODE_FIELD(roles);
!   COMPARE_NODE_FIELD(newrole);
  
return true;
  }
--- 1947,1953 
  _equalReassignOwnedStmt(const ReassignOwnedStmt *a, const ReassignOwnedStmt 
*b)
  {
COMPARE_NODE_FIELD(roles);
!   COMPARE_STRING_FIELD(newrole);
  
return true;
  }
***
*** 2182,2187  _equalColumnDef(const ColumnDef *a, const ColumnDef *b)
--- 2184,2190 
COMPARE_NODE_FIELD(collClause);
COMPARE_SCALAR_FIELD(collOid);
COMPARE_NODE_FIELD(constraints);
+   COMPARE_NODE_FIELD(fdwoptions);
  
return true;
  }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 594b3fd..f713773 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***
*** 1927,1937  _outPlannerParamItem(StringInfo str, const PlannerParamItem 
*node)
   *
   
*/
  
  static void
! _outCreateStmt(StringInfo str, const CreateStmt *node)
  {
-   WRITE_NODE_TYPE(CREATESTMT);
- 
WRITE_NODE_FIELD(relation);
WRITE_NODE_FIELD(tableElts);
WRITE_NODE_FIELD(inhRelations);
--- 1927,1938 
   *
   
*/
  
+ /*
+  * print the basic stuff of all nodes that inherit from CreateStmt
+  */
  static void
! _outCreateStmtInfo(StringInfo str, const CreateStmt *node)
  {
WRITE_NODE_FIELD(relation);
WRITE_NODE_FIELD(tableElts);
WRITE_NODE_FIELD(inhRelations);
***
*** 1944,1954  _outCreateStmt(StringInfo str, const CreateStmt *node)
  }
  
  static void
  _outCreateForeignTableStmt(StringInfo str, 

[HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Kyotaro HORIGUCHI
Hello, this is bug report and a patch for it.

The first patch in the attachments is for 9.2dev and next one is
for 9.1.3.

On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does
not check progress against WAL segments written. This makes the
checkpointer always run at the speed determined by checkpoint_timeout,
regardless of the rate at which WAL advances.

This leads to an unexpected imbalance in the number of WAL segment
files between the master and the standby(s) when WAL advances at a
high rate. And what is worse, the master would have a much higher
chance of removing some WAL segments before the standby receives
them.

XLogPageRead()@xlog.c triggers a checkpoint based on WAL segment
advance. So I think this is a bug of the bgwriter in 9.1. The
attached patches fix it on 9.2dev and 9.1.3 respectively.

In the backported version to 9.1.3, bgwriter.c is modified
instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
used as the equivalent of GetStandbyFlushRecPtr() in 9.2.

By the way, GetStandbyFlushRecPtr() acquires a spinlock internally. It
might be enough to read XLogCtl->recoveryLastRecPtr without the lock
to make a rough estimate, but I can't tell whether that is safe or
not. The same discussion applies to GetWalRcvWriteRecPtr() on
9.1.3.


However, it seems to work fine on a simple test.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index c9473f7..f86e9b9 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -491,8 +491,8 @@ CheckpointerMain(void)
 			 * Initialize checkpointer-private variables used during checkpoint.
 			 */
 			ckpt_active = true;
-			if (!do_restartpoint)
-ckpt_start_recptr = GetInsertRecPtr();
+			ckpt_start_recptr =
+do_restartpoint ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
 			ckpt_start_time = now;
 			ckpt_cached_elapsed = 0;
 
@@ -731,28 +731,29 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed before calling CreateCheckPoint. The code in XLogInsert that
-	 * actually triggers a checkpoint when checkpoint_segments is exceeded
-	 * compares against RedoRecptr, so this is not completely accurate.
-	 * However, it's good enough for our purposes, we're only calculating an
-	 * estimate anyway.
+	 * computed before calling CreateCheckPoint. The code in
+	 * XLogInsert that actually triggers a checkpoint when
+	 * checkpoint_segments is exceeded compares against RedoRecptr.
+	 * Similarly, we consult WAL flush location instead on hot
+	 * standbys and XLogPageRead compares it against RedoRecptr, too.
+	 * Although these are not completely accurate, it's good enough for
+	 * our purposes, we're only calculating an estimate anyway.
 	 */
-	if (!RecoveryInProgress())
+	
+	recptr = RecoveryInProgress() ? GetStandbyFlushRecPtr() : GetInsertRecPtr();
+	elapsed_xlogs =
+		(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
+		 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
+		CheckPointSegments;
+
+	if (progress < elapsed_xlogs)
 	{
-		recptr = GetInsertRecPtr();
-		elapsed_xlogs =
-			(((double) (int32) (recptr.xlogid - ckpt_start_recptr.xlogid)) * XLogSegsPerFile +
-			 ((double) recptr.xrecoff - (double) ckpt_start_recptr.xrecoff) / XLogSegSize) /
-			CheckPointSegments;
-
-		if (progress < elapsed_xlogs)
-		{
-			ckpt_cached_elapsed = elapsed_xlogs;
-			return false;
-		}
+		ckpt_cached_elapsed = elapsed_xlogs;
+		return false;
 	}
 
 	/*
diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5643ec8..0ce9945 100644
--- a/src/backend/postmaster/bgwriter.c
+++ b/src/backend/postmaster/bgwriter.c
@@ -56,6 +56,7 @@
 #include pgstat.h
 #include postmaster/bgwriter.h
 #include replication/syncrep.h
+#include replication/walreceiver.h
 #include storage/bufmgr.h
 #include storage/fd.h
 #include storage/ipc.h
@@ -489,8 +490,8 @@ BackgroundWriterMain(void)
 			 * Initialize bgwriter-private variables used during checkpoint.
 			 */
 			ckpt_active = true;
-			if (!do_restartpoint)
-ckpt_start_recptr = GetInsertRecPtr();
+			ckpt_start_recptr =	do_restartpoint ?
+GetWalRcvWriteRecPtr(NULL) : GetInsertRecPtr();
 			ckpt_start_time = now;
 			ckpt_cached_elapsed = 0;
 
@@ -764,30 +765,32 @@ IsCheckpointOnSchedule(double progress)
 		return false;
 
 	/*
-	 * Check progress against WAL segments written and checkpoint_segments.
+	 * Check progress against WAL segments written, or replayed for
+	 * hot standby, and checkpoint_segments.
 	 *
 	 * We compare the current WAL insert location against the location
-	 * computed 

Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 1:05 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:

 Hello, this is bug report and a patch for it.

 The first patch in the attachments is for 9.2dev and next one is
 for 9.1.3.

 On the current 9.2dev, IsCheckpointOnSchedule@checkpointer.c does
 not check against WAL segments written. This makes checkpointer
 always run at the speed according to checkpoint_timeout
 regardless of WAL advancing rate.

Thanks, I'll take a look.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Alexander Korotkov
On Mon, Apr 16, 2012 at 10:29 AM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 16.04.2012 08:40, Jeff Davis wrote:

 Does someone know of a spatial join algorithm (without IP claims) that
 would be as good as this one for ranges?


 I'd be happy with an algorithm that's specific to ranges, too, but my gut
 geeling is that there has to be a lot of research on spatial join
 algorithms out there. Implementing one of them would be more widely
 applicable, so I think we should look into spatial join algorithms first
 and see how far that gets us, before we write something specific to ranges.


There is a good overview article about spatial joins.
http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf
It shows that there are a lot of methods based on building temporary
indexes. It might mean that building a temporary GiST index is not a bad
idea.
Also there are methods which look like extensions of Jeff Davis's proposal to
the multidimensional case, namely the Plane Sweep and External Plane Sweep
methods. However, those might use sophisticated data structures for the
sweep, and I believe they should, for good performance.

--
With best regards,
Alexander Korotkov.


[HACKERS] A typo fix in a comment in xlog.c

2012-04-16 Thread Kyotaro HORIGUCHI
Hello, I found a duplicated word in the comment of
StartupXLOG@xlog.c and the attached patch fixes it.


Essentially the fix is in one line as follows,

-  * We're in recovery, so unlogged relations relations may be trashed
+  * We're in recovery, so unlogged relations may be trashed

But I did fill-paragraph for the fixed comment so the patch
replaces a little bit more.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 0c301b2..04dec4e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6354,10 +6354,10 @@ StartupXLOG(void)
 		CheckRequiredParameterValues();
 
 		/*
-		 * We're in recovery, so unlogged relations relations may be trashed
-		 * and must be reset.  This should be done BEFORE allowing Hot Standby
-		 * connections, so that read-only backends don't try to read whatever
-		 * garbage is left over from before.
+		 * We're in recovery, so unlogged relations may be trashed and must
+		 * be reset.  This should be done BEFORE allowing Hot Standby
+		 * connections, so that read-only backends don't try to read
+		 * whatever garbage is left over from before.
 		 */
 		ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP);
 



Re: [HACKERS] A typo fix in a comment in xlog.c

2012-04-16 Thread Peter Eisentraut
On mån, 2012-04-16 at 21:28 +0900, Kyotaro HORIGUCHI wrote:
 Hello, I found a duplicated word in the comment of
 StartupXLOG@xlog.c and the attached patch fixes it.
 
 
 Essentially the fix is in one line as follows,
 
 -  * We're in recovery, so unlogged relations relations may be trashed
 +  * We're in recovery, so unlogged relations may be trashed

Fixed.

 But I did fill-paragraph for the fixed comment so the patch
 replaces a little bit more.

You might want to adjust your fill-column setting to 79, so pgindent
doesn't reformat that again.  Compare to what I just committed.






Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Hannu Krosing
On Tue, 2012-01-31 at 12:58 -0500, Andrew Dunstan wrote:
 
 On 01/30/2012 10:37 AM, Andrew Dunstan wrote:
 
 
  Aside: is query_to_json really necessary? It seems rather ugly and
  easily avoidable using row_to_json.
 
 
  I started with this, again by analogy with query_to_xml(). But I agree 
  it's a bit ugly. If we're not going to do it, then we definitely need 
  to look at caching the output funcs in the function info. A closer 
  approximation is actually:
 
 SELECT array_to_json(array_agg(q))
 FROM ( your query here ) q;
 
 
  But then I'd want the ability to break that up a bit with line feeds, 
  so we'd need to adjust the interface slightly. (Hint: don't try the 
  above with select * from pg_class.)
 
 
  I'll wait on further comments, but I can probably turn these changes 
  around very quickly once we're agreed.
 
 
 
 
 based on Abhijit's feeling and some discussion offline, the consensus 
 seems to be to remove query_to_json.

The only comment I have here is that query_to_json could have been
replaced with json_agg, so that you don't need to do double-buffering
for the results of array(yourquery) call in 

SELECT array_to_json(array(yourquery));

Or is there some other way to avoid it, except to wrap the row_to_json()
calls in your own aggregate function which adds the enclosing brackets and
comma separators (like this: '[' row1 [, rowN] ']')?
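
For illustration only -- a rough sketch of such an aggregate built on 9.2's
row_to_json(), with made-up names (not something I'm proposing for core):

CREATE FUNCTION json_collect_step(state text, elem text) RETURNS text AS $$
    -- append one already-serialized row, adding the comma separator
    SELECT CASE WHEN state IS NULL THEN elem
                ELSE state || ',' || elem END
$$ LANGUAGE sql IMMUTABLE;

CREATE FUNCTION json_collect_final(state text) RETURNS json AS $$
    -- add the enclosing brackets and cast the result to json
    SELECT ('[' || coalesce(state, '') || ']')::json
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE json_collect(text) (
    SFUNC = json_collect_step,
    STYPE = text,
    FINALFUNC = json_collect_final
);

-- usage, streaming row_to_json() output row by row:
-- SELECT json_collect(row_to_json(q)::text) FROM (yourquery) q;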

 cheers
 
 andrew
 





Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Andrew Dunstan



On 04/16/2012 09:34 AM, Hannu Krosing wrote:

based on Abhijit's feeling and some discussion offline, the consensus
seems to be to remove query_to_json.

The only comment I have here is that query_to_json could have been
replaced with json_agg, so thet you don't need to do double-buffering
for the results of array(yourquery) call in

SELECT array_to_json(array(yourquery));

Or is there some other way to avoid it except to wrap row_to_json()
calls in own aggregate function which adds enclosing brackets and comma
separator ( like this : '['row1[,rowN]']' ?




The way I usually write this is:

select array_to_json(array_agg(q))
from (yourquery) q;

It's a pity you didn't make this comment back in January when we were 
talking about this. I think it's too late now in this release cycle to 
be talking about adding the aggregate function.


cheers

andrew



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Can we have a soft hot standby conflict that doesn't kill the query, 
 but disables index-only-scans?

Well, there wouldn't be any way for the planner to know whether an
index-only scan would be safe or not.  I think this would have to look
like a run-time fallback.  Could it be structured as return that the
page's all-visible bit is not set, instead of failing?  Or am I
confused about where the conflict is coming from?

regards, tom lane



Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Merlin Moncure
On Mon, Apr 16, 2012 at 9:10 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 04/16/2012 09:34 AM, Hannu Krosing wrote:

 based on Abhijit's feeling and some discussion offline, the consensus
 seems to be to remove query_to_json.

 The only comment I have here is that query_to_json could have been
  replaced with json_agg, so that you don't need to do double-buffering
 for the results of array(yourquery) call in

 SELECT array_to_json(array(yourquery));

 Or is there some other way to avoid it except to wrap row_to_json()
 calls in own aggregate function which adds enclosing brackets and comma
 separator ( like this : '['row1[,rowN]']' ?



 The way I usually write this is:

    select array_to_json(array_agg(q))
    from (yourquery) q;

 It's a pity you didn't make this comment back in January when we were
 talking about this. I think it's too late now in this release cycle to be
 talking about adding the aggregate function.

I find array_agg to be pretty consistently slower than
array()...although not much, say around 5-10%.  I use array_agg only
when grouping.  try timing
select array_to_json(array_agg(v)) from (select v from
generate_series(1,100) v) q;
vs
select array_to_json(array(select v from generate_series(1,100) v));

I agree with Hannu, but as things stand, if I'm trying to avoid the
extra buffer I've found myself doing the final aggregation on the
client -- it's easy enough.  BTW, I'm using the json stuff heavily and
it's just absolutely fantastic.  Finally I can write web applications
without wondering exactly where it was that computer science went off
the rails.

I've already demoed a prototype app that integrates pg directly with
the many high quality js libraries out there and it makes things very
easy and quick by making writing data services trivial.  Data pushes
are still quite a pain but I figure something can be worked out.

merlin



Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Michael Meskes
On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:
 Yes, just like when the readahead window set to 256, FETCH 1024
 will iterate through 4 windows or FETCH 64 iterates through the
 same window 4 times. This is the idea behind the readahead window.

Really? It's definitely not the idea behind FETCH 1024. Using the same window 4
times for FETCH 64 is the idea though, I agree.

 How about allowing the readahead window to be resized for the
 non-decorated case if the runtime encounters FETCH N and N is
 greater than the previous window?

To be resized by what? IMO a FETCH N should always be a FETCH N, no matter
what, i.e. if the readahead window is larger, use it, but even if it's smaller
we should still fetch N at the same time.

Michael

-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL



Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Hannu Krosing
On Mon, 2012-04-16 at 10:10 -0400, Andrew Dunstan wrote:
 
 On 04/16/2012 09:34 AM, Hannu Krosing wrote:
  based on Abhijit's feeling and some discussion offline, the consensus
  seems to be to remove query_to_json.
  The only comment I have here is that query_to_json could have been
   replaced with json_agg, so that you don't need to do double-buffering
  for the results of array(yourquery) call in
 
  SELECT array_to_json(array(yourquery));
 
  Or is there some other way to avoid it except to wrap row_to_json()
  calls in own aggregate function which adds enclosing brackets and comma
  separator ( like this : '['row1[,rowN]']' ?
 
 
 
 The way I usually write this is:
 
  select array_to_json(array_agg(q))
  from (yourquery) q;
 
 It's a pity you didn't make this comment back in January when we were 
 talking about this. I think it's too late now in this release cycle to 
 be talking about adding the aggregate function.

My comment is not meant to propose changing anything in 9.2.

I think what we have here is absolutely fantastic :)

If doing something in 9.3 then what I would like is some way to express
multiple queries. Basically a variant of 

query_to_json(query text[])

where queries would be evaluated in order and then all the results
aggregated into one json object.
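
For illustration only, a call might look something like this (purely
hypothetical -- no such function exists, and the table names are invented):

select query_to_json(array[
    'select * from users where active',
    'select count(*) from orders'
]);
-- hypothetical output: a single json value wrapping both result sets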

But aggregation on client as suggested by Merlin may be a better way
to do it for larger result(set)s. 

Especially as it could enable streaming of the resultsets without having
to first buffer everything on the server.


If we can add something, then perhaps a deeper "pretty_print" feature

samples:

hannu=# \d test
                               Table "public.test"
 Column |            Type             |                     Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test_id_seq'::regclass)
 data   | text                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test from test),true);
-[ RECORD 1 ]-+---------------------------------------------------------------------------
array_to_json | [{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},
              |  {"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"}]

This is OK


hannu=# \d test2
                               Table "public.test2"
 Column |            Type             |                     Modifiers
--------+-----------------------------+-----------------------------------------------------
 id     | integer                     | not null default nextval('test2_id_seq'::regclass)
 data2  | test                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test2 from test2),true);
-[ RECORD 1 ]-+---------------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}]

This is kind of OK

hannu=# \d test3
                               Table "public.test3"
 Column |            Type             |                     Modifiers
--------+-----------------------------+-----------------------------------------------------
 id     | integer                     | not null default nextval('test3_id_seq'::regclass)
 data3  | test2[]                     |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test3_pkey" PRIMARY KEY, btree (id)


hannu=# select array_to_json(array(select test3 from test3),true);
-[ RECORD 1 ]-+-------------------------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],"tstamp":"2012-04-16 14:40:15.795947"}]

but this would be nicer if printed like Python's pprint:

>>> pprint.pprint(d)
[{'data3': [{'data2': {'data': '0.262814193032682',
                       'id': 1,
                       'tstamp': '2012-04-05 13:21:03.235204'},
             'id': 1,
             'tstamp': '2012-04-05 13:25:03.644497'},
            {'data2': {'data': '0.157406373415142',
                       'id': 2,
                       'tstamp': '2012-04-05 13:21:05.2033'},
             'id': 2,

Re: [HACKERS] [COMMITTERS] pgsql: Add new replication mode synchronous_commit = 'write'.

2012-04-16 Thread Fujii Masao
On Sun, Apr 15, 2012 at 12:13 AM, Thom Brown t...@linux.com wrote:
 On 14 April 2012 15:58, Fujii Masao masao.fu...@gmail.com wrote:
 On Sat, Apr 14, 2012 at 4:16 AM, Thom Brown t...@linux.com wrote:
 I have a question though.  What happens when this is set to 'write'
 (or 'remote_write' as proposed) but it's being used on a standalone
 primary?  At the moment it's not documented what level of guarantee
 this would provide.

 http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION-HA
 -
 Commits made when synchronous_commit is set to on or write will
 wait until the synchronous standby responds. The response may
 never occur if the last, or only, standby should crash.
 -

 Is this description not enough? If not enough, how should we change
 the document?

 No, that's not what I was referring to.  If you don't have a standby
 (i.e. a single, isolated database cluster with no replication), and
 its synchronous_commit is set to 'remote_write', what effect does that
 have?

It has the same effect as 'on' and 'local', i.e., transaction commit waits
only for local WAL flush. This behavior is not documented explicitly...
How should we change the document? What about adding the following
to the explanation of the synchronous_commit parameter (maybe at the end
of its second paragraph)?

-
If synchronous_standby_names is not set, on, remote_write and local
provide the same synchronization level; transaction commit only waits for
local flush.
-
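
For example, on a standalone primary all three of these end up meaning the
same thing -- the commit waits only for the local WAL flush:

SET synchronous_commit = 'on';            -- waits for local flush only
SET synchronous_commit = 'local';         -- same
SET synchronous_commit = 'remote_write';  -- same, there is no standby to wait for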

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bug tracker tool we need (was: Last gasp)

2012-04-16 Thread Alex

Dimitri Fontaine dimi...@2ndquadrant.fr writes:

 Alvaro Herrera alvhe...@commandprompt.com writes:
 I've used Redmine a lot, as you know, and I only keep using it because
 it's a requirement at work.  It is certainly not close to usable for
 general pgsql stuff.  (Trac, which we used to use prior to Redmine, was
 certainly much worse, though).

 Same story here, still using redmine a lot, all with custom reports etc.

 I can't say that it's all that slow, or that there's a problem with the
 code, or that the search doesn't work right (and I've never had a wiki
 edit disappear, either, and I've used that a lot).  It's just the wrong
 tool altogether.

 It's indeed slow here, and I agree that's not the problem. Not the tool
 we need, +1.

I still fail to see how Redmine doesn't fit the requirements summarized
at that wiki page[1], so the objection must be something other than the
formal requirement of being free/open software and running postgres behind
it (some sort of feeling, maybe?)

Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
please describe your ideal tool for the task?

Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch.

Or maybe there isn't really a need for a tracker?  The core team have
managed to live without one for so long after all...

--
Regards,
Alex

[1] http://wiki.postgresql.org/wiki/TrackerDiscussion


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need (was: Last gasp)

2012-04-16 Thread Magnus Hagander
On Mon, Apr 16, 2012 at 18:24, Alex a...@commandprompt.com wrote:

 Dimitri Fontaine dimi...@2ndquadrant.fr writes:

 Alvaro Herrera alvhe...@commandprompt.com writes:
 I've used Redmine a lot, as you know, and I only keep using it because
 it's a requirement at work.  It is certainly not close to usable for
 general pgsql stuff.  (Trac, which we used to use prior to Redmine, was
 certainly much worse, though).

 Same story here, still using redmine a lot, all with custom reports etc.

 I can't say that it's all that slow, or that there's a problem with the
 code, or that the search doesn't work right (and I've never had a wiki
 edit disappear, either, and I've used that a lot).  It's just the wrong
 tool altogether.

 It's indeed slow here, and I agree that's not the problem. Not the tool
 we need, +1.

 I still fail to see how Redmine doesn't fit the requirements summarized
 at that wiki page[1], so the objection must be something other than the
 formal requirement of being free/open software and running postgres behind
 it (some sort of feeling, maybe?)

One thing to note is that the referenced wiki page is over a year old.
And that many more things have been said on email lists than are
actually in that page.

But as one note - I don't believe you can drive redmine completely
from email, which is certainly a requirement that has been discussed,
but is not entirely listed on that page.


 Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
 please describe your ideal tool for the task?

 Given that every other existing tool has likely pissed off someone
 already, I guess our best bet is writing one from scratch.

FWIW, I think the closest thing we've found so far would be debbugs -
which IIRC doesn't have any kind of reasonable database backend, which
would be a strange choice for a project like ours :) And makes many
things harder...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JSON for PG 9.2

2012-04-16 Thread Merlin Moncure
On Mon, Apr 16, 2012 at 11:19 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 If doing something in 9.3 then what I would like is some way to express
 multiple queries. Basically a variant of

 query_to_json(query text[])

 where queries would be evaluated in order and then all the results
  aggregated into one json object.

I personally don't like variants of to_json that push the query in as
text. They defeat parameterization and have other issues.  Another
point for client side processing is the new row level processing in
libpq, so I'd argue that if the result is big enough to warrant
worrying about buffering (and it'd have to be a mighty big json doc),
the best bet is to extract it as rows.  I'm playing around with
node.js for the json serving and the sending code looks like this:

  var first = true;

  query.on('row', function(row) {
if(first) {
  first = false;
  response.write('[');
}
else response.write(',');
response.write(row.jsondata);
  });
  query.on('end', function() {
response.write(']');
response.end();
  });

 -- not too bad

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to create a non-inherited CHECK constraint in CREATE TABLE

2012-04-16 Thread Alvaro Herrera

Excerpts from Nikhil Sontakke's message of Mon Apr 16 03:56:06 -0300 2012:
   Displace yes. It would error out if someone says
  
   ALTER TABLE ONLY... CHECK ();
  
   suggesting to use the ONLY with the CHECK.
 
  I'd say the behavior for that case can revert to the PostgreSQL 9.1
  behavior.
  If the table has children, raise an error.  Otherwise, add an inheritable
  CHECK constraint, albeit one lacking inheritors at that moment.
 
 Ok, that sounds reasonable.

Good, I agree with that too.

Are you going to submit an updated patch?  I started working on your
original a couple of days ago but got distracted by some family news
here.  I'll send it to you so that you can start from there, to avoid
duplicate work.

 Another thing that we should consider is that if we are replacing ONLY with
 NO INHERIT, then instead of just making a cosmetic syntactic change, we
 should also replace all the is*only type of field names with noinherit for
 the sake of completeness and uniformity.

Yeah, I was considering the same thing.  conisonly isn't a very good
name on its own (it only made sense because the ONLY came from ALTER
TABLE ONLY).
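
To make the discussion concrete, the spelling under discussion would look
roughly like this (table, column and constraint names invented for
illustration):

-- proposed spelling: mark the constraint non-inheritable at creation time
ALTER TABLE parent ADD CONSTRAINT parent_val_chk CHECK (val > 0) NO INHERIT;

-- versus the 9.1-era spelling it would replace
ALTER TABLE ONLY parent ADD CONSTRAINT parent_val_chk CHECK (val > 0);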

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Alex

Magnus Hagander mag...@hagander.net writes:

 One thing to note is that the referenced wiki page is over a year old.
 And that many more things have been said on email lists than are
 actually in that page.

Yeah, I went through it briefly and rather important concern seem to
have been raised by Tom Lane in this msg:
http://archives.postgresql.org/pgsql-hackers/2011-05/msg01480.php

This paragraph:

 The real question is, who is going to keep it up to date?  GSM has the
 right point of view here: we need at least a couple of people who are
 willing to invest substantial amounts of time, or it's not going to go
 anywhere.  Seeing that we can barely manage to keep the mailing list
 moderator positions staffed, I'm not hopeful.

 But as one note - I don't believe you can drive redmine completely
 from email, which is certainly a requirement that has been discussed,
 but is not entirely listed on that page.

Ugh, what do you mean by that?  You can change any attribute (like
status, priority, assigned person, etc.) of a ticket via email.
Anything else?

 FWIW, I think the closest thing we've found so far would be debbugs -
 which IIRC doesn't have any kind of reasonable database backend, which
 would be a strange choice for a project like ours :) And makes many
 things harder...

What stops us from writing a postgres backend for debbugs if it is so
brilliant on handling email and stuff?

--
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Boszormenyi Zoltan

On 2012-04-16 18:04, Michael Meskes wrote:

On Mon, Apr 16, 2012 at 06:24:57AM +0200, Boszormenyi Zoltan wrote:

Yes, just like when the readahead window is set to 256, FETCH 1024
will iterate through 4 windows or FETCH 64 iterates through the
same window 4 times. This is the idea behind the readahead window.

Really? It's definitely not the idea behind FETCH 1024. Using the same window 4
times for FETCH 64 is the idea though, I agree.


OK. I would like to stretch your agreement a little. :-)

Can we state that caching means the cache should serve
the incoming request(s) until a request spills out of it?

If your answer to the above is yes, then please consider this case:
- readahead window is 256 (via ECPGFETCHSZ)
- FETCH 64 was executed twice, so you are in the middle of the cache
- FETCH 1024 is requested

So, if I understand you correctly, you expect this scenario:
- set a one-time readahead window size ( N - # of rows that can be served
  = 1024 - 128 = 768) so the next FETCH by the runtime will fulfill
  this request fully
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 768 will be executed
- all subsequent requests use the old readahead size


How about allowing the readahead window to be resized for the
non-decorated case if the runtime encounters FETCH N and N is
greater than the previous window?

To be resized by what?


By the new FETCH request. Instead of the above, I imagined this:
- the runtime notices that the new request is larger than the current
  readahead window size, modifies the readahead window size upwards,
  so the next FETCH will use it
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 1024 will be executed and the remaining
  768 rows will be served from the new cache
- all subsequent requests use the new readahead size, 1024



  IMO a FETCH N should always be a FETCH N, no matter
what


This part of your statement contradicts caching. :-)


, i.e. if the readahead window is larger, use it, but even if it's smaller
we should still fetch N at the same time.


So, there can be occasional one-time larger requests but
smaller ones should apply the set window size, right?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Fujii Masao
On Mon, Apr 16, 2012 at 9:05 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 In the backported version to 9.1.3, bgwriter.c is modified
 instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
 used as the equivalent of GetStandbyFlushRecPtr() in 9.2.

In 9.2, GetXLogReplayRecPtr() should be used instead of GetStandbyFlushRecPtr().
A restartpoint is scheduled to finish before the next restartpoint is triggered.
A restartpoint is triggered if too many WAL files have been replayed since the
last restartpoint, so a restartpoint should be scheduled according to the replay
location, not the receive location.

-* computed before calling CreateCheckPoint. The code in XLogInsert that
-* actually triggers a checkpoint when checkpoint_segments is exceeded
-* compares against RedoRecptr, so this is not completely accurate.
-* However, it's good enough for our purposes, we're only calculating an
-* estimate anyway.
+* computed before calling CreateCheckPoint. The code in
+* XLogInsert that actually triggers a checkpoint when
+* checkpoint_segments is exceeded compares against RedoRecptr.
+* Similarly, we consult WAL flush location instead on hot
+* standbys and XLogPageRead compares it aganst RedoRecptr, too.
+* Altough these are not completely accurate, it's good enough for
+* our purposes, we're only calculating an estimate anyway.

I think that basically it's better not to change the comments (i.e., not to add
the line feed) if their contents are the same as previous ones, to highlight
what you actually changed in the patch.

Typo: RedoRecptr should be RedoRecPtr?

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Mon, 2012-04-16 at 02:52 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
   1. Order the ranges on both sides by the lower bound, then upper bound.
  Empty ranges can be excluded entirely.
   2. Left := first range on left, Right := first range on right
   3. If Left or Right is empty, terminate.
   4. If lower(Left) > upper(Right), discard Right, goto 2
   5. If lower(Right) > upper(Left), discard Left, goto 2
   6. return (Left, Right) as joined tuple
   7. Right := next range on right
   8. goto 3
 
 This is surely not correct in detail.  As written, it will be impossible
 for any tuple on the right side to be joined to more than one left-side
 tuple.  You will need something analogous to the mark-and-restore
 rewinding logic in standard mergejoin to make this work.

Every time you discard a tuple on the left, you go to step 2, which
rewinds the right side back to the first non-discarded tuple.

So, implemented using mark and restore:

  * start off with the marks at the first tuple on each side
  * discard means move the mark down a tuple
  * setting it back to the first range means restoring to the mark
  * going to the next range (step 7) just means getting another
tuple, without changing the mark

Only one side really needs the mark and restore logic, but it was easier
to write the pseudocode in a symmetrical way (except step 7).
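
(For context, the shape of query this node is meant to serve is a plain
range-overlap join, something like the following -- table and column names
invented here:)

select a.id, b.id
  from sessions a
  join maintenance_windows b on a.during && b.during;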

 I will note that mergejoin is not only one of the more complicated
 executor modules, but it accounts for a huge percentage of the planner's
 complexity; which doesn't exactly make me sympathize with the notion of
 let's-copy-and-paste-all-that.  It'd be a lot better if we could build
 it as a small tweak to mergejoin rather than an independent thing.
 
 (Having said that, I have no idea how the planner support could work
 at all, because its treatment of mergejoins is intimately tied to
 mergejoinable equality and EquivalenceClasses and PathKeys, which don't
 seem readily extensible to the range situation.)

I think this is the more important problem area. It's clear that I'll
need to do some more investigation into the planning.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 3:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Can we have a soft hot standby conflict that doesn't kill the query,
 but disables index-only-scans?

 Well, there wouldn't be any way for the planner to know whether an
 index-only scan would be safe or not.  I think this would have to look
 like a run-time fallback.  Could it be structured as return that the
 page's all-visible bit is not set, instead of failing?  Or am I
 confused about where the conflict is coming from?

The starting point is that HS now offers 4 different mechanisms for
avoiding conflicts, probably the best of which is hot standby
feedback. So we only need to improve things if those techniques don't
work for people. So initially, my attitude is let's throw a conflict
and wait for feedback during beta.

If we do need to do something, then introduce the concept of a visibility conflict.

On replay:
If feedback not set, set LSN of visibility conflict on PROCs that
conflict, if not already set.

On query:
If feedback not set, check conflict LSN against page, if page is
later, check visibility.

In terms of optimisation, I wouldn't expect to have to adjust costs at
all. The difference would only show for long running queries and
typically they don't touch too much new data as a fraction of total.
The cost model for index only is pretty crude anyhow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-16 Thread Guillaume Lelarge
On Mon, 2012-04-16 at 13:09 +0300, Heikki Linnakangas wrote:
 On 13.04.2012 19:17, Guillaume Lelarge wrote:
  On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:
  On 08.04.2012 11:59, Guillaume Lelarge wrote:
  There could be a good reason which would explain why we can't (or don't
  want to) do this, but I don't see it right now.
 
  Me neither, except a general desire to keep internals hidden. I propose
  the attached.
 
  Sounds good to me. I would love to see this happening in 9.2.
 
 Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed 
 MSVC install.pm to also copy the header file.
 

Thanks.


-- 
Guillaume
http://blog.guillaume.lelarge.info
http://www.dalibo.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 4:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Can we have a soft hot standby conflict that doesn't kill the query, but
 disables index-only-scans?

Yeah, something like that seems possible.

For example, suppose the master includes, in each
mark-heap-page-all-visible record, the newest XID on the page.  On the
standby, we keep track of the most recent such XID ever seen in shared
memory.  After noting that a page is all-visible, the standby
cross-checks its snapshot against this XID and does the heap fetch
anyway if it's too new.  Conceivably this can be done with memory
barriers but without locks: we must update the XID in shared
memory *before* marking the page all-visible, and we must check the
visibility map or page-level bit *before* fetching the XID from shared
memory - but the actual reads and writes of the XID are atomic.

Now, if an index-only scan suffers a very high number of these soft
conflicts and consequently does a lot more heap fetches than
expected, performance might suck.  But that should be rare, and could
be mitigated by turning on hot_standby_feedback.  Also, there might be
some hit for repeatedly reading that XID from memory.

Alternatively, we could try to forbid index-only scan plans from being
created in the first place *only when* the underlying snapshot is too
old.  But then what happens if a conflict happens later, after the
plan has been created but before it's fully executed?  At that point
it's too late to switch gears, so we'd still need something like the
above.  And the above might be adequate all by itself, since in
practice index-only scans are mostly going to be useful when the data
isn't changing all that fast, so most of the queries that would be
cancelled by hard conflicts wouldn't have actually had a problem
anyway.

 In the long run, perhaps we need to store XIDs in the visibility map instead
 of just a bit. If we only stored one xid per 100 pages or something like
 that, the storage overhead would not be much higher than what we have at the
 moment. But that's obviously not going to happen for 9.2...

Well, that would also have the undesirable side effect of greatly
reducing the granularity of the map.  As it is, updating a tiny
fraction of the tuples in the table could result in the entire table
ceasing to be all-visible if you happen to update exactly one
tuple per page through the entire heap.  Or more generally, updating
X% of the rows in the table can cause Y% of the rows in the table to
no longer be visible for index-only-scan purposes, where Y > X.  What
you're proposing would make that much worse.

I think we're actually going to find that the current system isn't
tight enough; my suspicion is that users are going to complain that
we're not aggressive enough about marking pages all-visible, because
autovac won't kick in until updates+deletes > 20%, but (1) index-only
scans also care about inserts and (2) by the time we've got 20% dead
tuples index-only scans may well be worthless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 3:02 AM, Noah Misch n...@leadboat.com wrote:
 Do you refer to PD_ALL_VISIBLE as not merely a hint due to the requirement
 to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
 positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
 in its role as a witness of tuple visibility, but it is not a hint in its role
 as a witness of the visibility map status?

Exactly.

 The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
 systems with weaker memory ordering, the FlushBuffer() process's clearing of
 BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
 process until some time after the former retrieved buffer contents from shared
 memory.

True.

 Memory barriers could make the comment true, but we should probably
 just update the comment to explain that a race condition may eat the update
 entirely.

Agreed.

 Incidentally, is there a good reason for MarkBufferDirty() to
 increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
 not to do so?  Looks like an oversight.

Also agreed.

 2. More seriously, lazy_scan_heap() releases the buffer lock before
 setting the all-visible bit on the heap.  This looks just plain wrong
 (and it's my fault, to be clear).  Somebody else can come along after
 we've released the buffer lock and mark the page not-all-visible.
 That concurrent process will check the visibility map bit, find it
 already clear, and go on its merry way.  Then, we set the visibility
 map bit and go on our merry way.  Now we have a not-all-visible page
 with the all-visible bit set in the visibility map.   Oops.

 Hmm, indeed.  This deserves its own open item.

Also agreed.

 Good point.  We could finagle things so the standby notices end-of-recovery
 checkpoints and blocks the optimization for older snapshots.  For example,
 maintain an integer count of end-of-recovery checkpoints seen and store that
 in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
 global value is greater than the snapshot's value, disable the optimization
 for that snapshot.  I don't know whether the optimization is powerful enough
 to justify that level of trouble, but it's an option to consider.

I suspect not.  It seems we can make index-only scans work without
doing something like this; it's only the sequential-scan optimization
we might lose.  But nobody's ever really complained about not having
that, to my knowledge.

 For a different approach, targeting the fragility, we could add assertions to
 detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
 full-page image.

The holes are narrow enough that I fear such cases would be detected
only on high-velocity production systems, which is not exactly where
you want to find out about such problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote:

 There is a good overview article about spatial joins.
 http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf 

Thank you, that's exactly the kind of overview I was looking for.

 It shows that there are a lot of methods based on building temporary
 indexes. It might mean that building a temporary GiST index is not a
 bad idea.

That had occurred to me, but I was hesitant to only use temp indexes. It
still doesn't really offer a good solution when both sides of the join
are relatively large (because of random I/O). Also the build speed of
the index would be more important than it is now.

On the other hand, if I tackle the problem using temp indexes, it would
be a more general solution useful for many problems (for instance, we
really need a better solution when the result of a subquery doesn't fit
in a work_mem-sized hashtable).

We may end up with some kind of combination (as the sweep seems to be;
see below).

 Also there are methods which look like extensions of Jeff Davis's
 proposal to the multidimensional case. These are the Plane Sweep and
 External Plane Sweep methods. However, they may use sophisticated data
 structures for the sweep, and I believe they should, for good
 performance.

That method looks closer to what I had in mind.

My Range Join proposal is essentially the same as the sweep method
except that it only joins on one dimension, and the rest of the
dimensions have to be checked with RECHECK. So, this still works for an
N-dimensional join, as long as a single dimension is fairly selective.
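
In SQL terms, the shape would be something like this (invented names; the
second condition is the part that would be handled by RECHECK):

select *
  from boxes a
  join boxes b on a.x_range && b.x_range   -- the selective, sorted dimension
              and a.y_range && b.y_range;  -- rechecked for each candidate pair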

The sweep method seems to do the first dimension with the sweep method,
and maintains a changing index that it uses to do the lookup. In other
words, it's essentially just a way to do the RECHECK on the other
dimensions in less than O(n) time. So, if the sweep dimension is not
selective at all, this degenerates to the temporary index method (or
something close, anyway).

The fact that, in the sweep method, there is still one special
dimension, makes me think that it will be easier to make the API work
based on ranges (which is a big win because postgres already knows about
ranges). If so, that makes it easier to divide the project into stages,
because we could get it working for ranges and then extend it to support
other dimensions more efficiently later.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 1:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 If we do need to do something, then introduce concept of a visibility 
 conflict.

 On replay:
 If feedback not set, set LSN of visibility conflict on PROCs that
 conflict, if not already set.

 On query:
 If feedback not set, check conflict LSN against page, if page is
 later, check visibility.

Hmm, I should have read the whole thread before replying.  This is similar
to what I just proposed in response to Heikki's message, but using LSN
in lieu of (or maybe you mean in addition to) XID.

I don't think we can ignore the need to throw conflicts just because
hot_standby_feedback is set; there are going to be corner cases, for
example, when it's just recently been turned on and the master has
already done cleanup; or if the master and standby have recently
gotten disconnected for even just a few seconds.

But fundamentally we all seem to be converging on some variant of the
soft conflict idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-16 Thread Greg Smith

On 04/15/2012 12:01 PM, Tom Lane wrote:

Where I think we have been fooling ourselves is in failing to tell
the difference between a patch that is committable in the current fest,
versus one that is still WIP and is going to need more development time.


I wonder if this bit of state might be worth extending the UI to 
include.  Just a little toggle box with the options "WIP" and "Commit 
Submission" there.  [I am unattached to those particular terms]


I think everyone is clear that Command Triggers is an example that 
reflects a more general problem seen many times before; I'll continue 
using it as a fresh example here without meaning to pick on Dimitri in 
particular.  If Dimitri had submitted that in January while ticking 
Commit Submission, it might have sparked a talk about the difference 
in expectations earlier.  If we use Robert as the bad guy watching the 
CF progress (again, just as an example, not trying to paint him with 
that title), I think it would have been easier for him to write an 
e-mail like this:


It's March now, and this feature has been under heavy review and 
development for 6 weeks.  This looks more like a WIP feature to me, not 
one that arrived as a Commit Submission.  Is there any useful subset to 
consider instead?


Compared to the current way such things happen, that's a more factual 
style of message without as much emotion or judging, and one that can be 
raised much earlier in the CF cycle.  I think it will be easier for 
people to write those, compared with having to be the person saying 
"this isn't ready to commit" only at the end.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-16 Thread Tom Lane
Amit Kapila amit.kap...@huawei.com writes:
 For this kind of query, currently (referring to the 9.0.3 code) it also
  considers joins of b,c and b,d.
  As there is no join clause among b,c,d, it will go down the path of
  make_rels_by_clauseless_joins(), where it considers joins of b,c and b,d.

 In this kind of query, if the suggestion by me in below mail is followed,
 then it will consider joining a,b a,c a,d at level-2 in function
 make_rels_by_clause_joins() which it currently doesn't do which may generate
 useless join paths.
  However, in real-world scenarios this kind of query, where two columns of
  different tables are used in a single expression (b.y + c.z) in the where
  clause, may be uncommon.

 On the other hand, when we come to consider d, it will have no join
 clauses so we will consider joining it to each other rel in turn.  

  When it comes to considering d, at level-2 it only considers later rels, so
  it should not consider joining with each other rel.

I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

I can't really get excited about doing it that way instead of the
current way.  In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old SELECT * FROM a, b. So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.  In the second
place, instead of N tests to see whether a rel lacks any join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.
So it seems to make things slower for little obvious benefit.

In general, queries with join-clause-less rels are pretty uncommon,
so I don't want to introduce extra work into make_rels_by_clause_joins
to handle the case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Sun, 2012-04-15 at 23:18 -0700, Darren Duncan wrote:
 Your proposal makes me think of something similar which might be useful, 
 INclusion constraints.  As exclusion constraints might be thought of like a 
 generalization of unique/key constraints, inclusion constraints are like a 
 generalization of foreign key constraints.  The inclusion constraints 
 basically allow some comparison operator other than is-equal to test if 
 values 
 in one table match values in another table, and the constraint allows the 
 former 
 if the test results in true.  An example of said inclusion test is whether 
 the 
 range in one table is contained in a range in another table.  I assume/hope 
 that, similarly, now that we have range types in 9.2, that the existing 
 exclusion constraints can be used with range comparison operators.

Yes, Range Foreign Keys are one of the loose ends for Range Types that
I'd like to wrap up.
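
And yes, the existing exclusion constraints do accept the range operators in
9.2, along the lines of (illustrative table):

create table reservation (
    during tsrange,
    exclude using gist (during with &&)  -- no two rows may have overlapping ranges
);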

 As to your actual proposal, it sounds like a generalization of the relational 
 join or set intersection operator where instead of comparing sets defined in 
 terms of an enumeration of discrete values we are comparing sets defined by a 
 range, which conceptually have infinite values depending on the data type the 
 range is defined over.  But if we're doing this, then it would seem to make 
 sense to go further and see if we have set analogies for all of our 
 relational 
 or set operators, should we want to do work with non-discrete sets.
 
 Now this sounds interesting in theory, but I would also assume that these 
 could 
 be implemented by an extension in terms of existing normal relational 
 operators, 
 where each range value is a discrete value, combined with operators for 
 unioning 
 or differencing etc ranges.  A relation of ranges effectively can represent a 
 discontinuous range; in that case, the empty discontinuous range is also 
 canonically representable by a relation with zero tuples.
 
 Jeff, I get the impression your proposal is partly about helping performance 
 by 
 supporting this internally, rather than one just defining it as a SQL 
 function, 
 am I right?

Per my proposal, the problem statement is that it's slow, so speeding
it up is really the entire proposal ;)

Extending the syntax might be interesting as well, but I don't have a
proposal for that.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-16 Thread Jay Levitt

Alex wrote:

Jay Levitt jay.lev...@gmail.com writes:


Alex wrote:

I didn't follow this whole thread, but have we considered Redmine[1]?

As the resident "Ruby is shiny, let's do everything in Rails on my
MacBook" guy, I'd like to make a statement against interest: I've
tried Redmine a few times and it's been painful.  Much of the codebase
is deprecated, it's slow, it has no meaningful search (in 2012?!),
I've seen wiki edits disappear, and at the moment pulling up its own
FAQ page at redmine.org times out.


Yay, that's totally FUD to me.


You're right, it was. My bad. Someday I will find the balance between 
precision and concision.



Could you please elaborate a bit on your points?

Deprecated codebase?  Let me guess...

It runs on an outdated version of Rails (2.3) but only because Rails is
changing so rapidly, I believe.  There is work in progress[1] to move to
the supported branch Rails-3.x.


I wasn't even thinking of that; I know many production systems still run on 
Rails 2.3, and in fact it probably even performs better for some workloads. 
3.x is a mixed bag. I don't hold that against Redmine.


But it's still FUD, because I can't remember where I saw this information. 
So: withdrawn.




Slow?  Do you have any data to back this point up?


No measurable data; just a sigh of relief when switching from Redmine to 
Github - and GitHub ain't a speed demon. In general, I've seen multi-second 
page load times on crazy-simple things like wiki edits; this was on a hosted 
provider (sourcerepo.com), but they also hosted our git repo and we had no 
speed problems there.



No meaningful search, eh?  Works for me.


Redmine searches return partial-word matches, and there's no way to disable 
that.  Searching for "test" finds "latest". To me, that's broken.


Also, the UI is very 5 years ago; e.g., compare revisions uses the same 
columns-of-radio-buttons approach as MediaWiki. If the goal is a tool to 
reduce friction and increase involvement, you want a smoother UX.


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Aggressive memory consumption in {ts,array}_typanalyze

2012-04-16 Thread Noah Misch
Both $SUBJECT functions pass to hash_create() an expected hash table size of
1 * attstattarget.  Based on header comments, this represents a near-worst
case.  These typanalyze functions scan the hash tables sequentially, thereby
visiting the entire allocation.  Per the recommendation in comments at
hash_create(), we should be more conservative.  On my system, this tiny test
case runs in 2.8s and dirties 1.0 GiB of local memory:

  CREATE TEMP TABLE t AS SELECT '{}'::int[];
  SET default_statistics_target = 1;
  ANALYZE t;

Rather arbitrarily, I reduced the hash_create() size hint by 99.9%, to the
width of the histograms destined for pg_statistic.  This streamlined the test
case to 20ms runtime and 2 MiB of memory.

To verify that nothing awful happens when the hash table sees considerable
dynamic growth, I used a subject table entailing a 9M-element hash table at
ANALYZE time:

  CREATE UNLOGGED TABLE t AS SELECT array[n,300+n,600+n]
FROM generate_series(1,300) t(n);

Unpatched master takes 15s and dirties 2.1 GiB; patched takes 15s and dirties
1.2 GiB.  The timing noise overlapped any systematic difference, but the
patched version might have been around 500ms slower.  Based on that, I'm
comfortable trusting that improving smaller cases in this way will not greatly
harm larger cases.

The lack of field complaints about ts_typanalyze() resource usage does argue
against the need for a change here, but I think adding array_typanalyze() in
PostgreSQL 9.2 significantly increases our risk exposure.  Sites may have
cranked up the statistics target on array columns to compensate for the lack
of explicit statistical support.  Every cluster has several array columns in
the system catalogs.

The size hint I chose is fairly arbitrary.  Any suggestions for principled
alternatives?

Thanks,
nm
*** a/src/backend/tsearch/ts_typanalyze.c
--- b/src/backend/tsearch/ts_typanalyze.c
***
*** 186,192  compute_tsvector_stats(VacAttrStats *stats,
hash_ctl.match = lexeme_match;
hash_ctl.hcxt = CurrentMemoryContext;
lexemes_tab = hash_create(Analyzed lexemes table,
! bucket_width * 7,
  hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  
--- 186,192 
hash_ctl.match = lexeme_match;
hash_ctl.hcxt = CurrentMemoryContext;
lexemes_tab = hash_create(Analyzed lexemes table,
! num_mcelem,
  hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  
*** a/src/backend/utils/adt/array_typanalyze.c
--- b/src/backend/utils/adt/array_typanalyze.c
***
*** 282,288  compute_array_stats(VacAttrStats *stats, AnalyzeAttrFetchFunc 
fetchfunc,
elem_hash_ctl.match = element_match;
elem_hash_ctl.hcxt = CurrentMemoryContext;
elements_tab = hash_create(Analyzed elements table,
!  bucket_width * 7,
   elem_hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  
--- 282,288 
elem_hash_ctl.match = element_match;
elem_hash_ctl.hcxt = CurrentMemoryContext;
elements_tab = hash_create(Analyzed elements table,
!  num_mcelem,
   elem_hash_ctl,
HASH_ELEM | HASH_FUNCTION | 
HASH_COMPARE | HASH_CONTEXT);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Alexander Korotkov
On Tue, Apr 17, 2012 at 12:12 AM, Jeff Davis pg...@j-davis.com wrote:

 On Mon, 2012-04-16 at 16:22 +0400, Alexander Korotkov wrote:

  There is a good overview article about spatial joins.
  http://www.cs.umd.edu/users/hjs/pubs/jacoxtods07.pdf

 Thank you, that's exactly the kind of overview I was looking for.

  It shows that there are a lot of methods based on building temporary
  indexes. It might mean that building a temporary GiST index is not a
  bad idea.

 That had occurred to me, but I was hesitant to only use temp indexes. It
 still doesn't really offer a good solution when both sides of the join
 are relatively large (because of random I/O). Also the build speed of
 the index would be more important than it is now.

Note that the amount of random I/O during GiST index build decreased
significantly after my GSoC project on buffered GiST index build. However,
GiST index build is still quite CPU-expensive. This problem could be avoided
by supporting other methods of index build (other than producing a lot
of penalty and picksplit calls). Hilbert curve and similar methods could
help here. Implementation of such methods would require an extension of the
GiST interface.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Memory usage during sorting

2012-04-16 Thread Peter Geoghegan
On 14 April 2012 14:34, Peter Geoghegan pe...@2ndquadrant.com wrote:
 FWIW, I started playing with adding timsort to Postgres last night:

 https://github.com/Peter2ndQuadrant/postgres/tree/timsort

I've fixed this feature-branch so that every qsort_arg call site
(including the tuplesort specialisations thereof) calls timsort_arg
instead. All but 4 regression tests pass, but they don't really count
as failures, since they're down to an assumption in the tests that the
order in which certain tuples appear should be the same as the order in which
our current quicksort implementation returns them, even though, in these
problematic cases, that is partially dictated by implementation - our
quicksort isn't stable, but timsort is. I think that the original
tests are faulty, but I understand that they're only expected to
provide smoke testing.

I did have to fix one bug in the implementation that I found, which
was an assumption that comparators would only return one of {-1, 0,
1}. The C standard requires only that they return negative, zero or
positive values, as required. I also polished the code a bit, and
added more comments.

I haven't actually quantified the benefits yet, and have no numbers to
show just yet, but I suspect it will prove to be a fairly compelling
win in certain situations.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 9:12 PM, Jeff Davis pg...@j-davis.com wrote:

 That had occurred to me, but I was hesitant to only use temp indexes. It
 still doesn't really offer a good solution when both sides of the join
 are relatively large (because of random I/O). Also the build speed of
 the index would be more important than it is now.

The thing I like most about temp indexes is that they needn't be temporary.

I'd like to see something along the lines of demand-created optional
indexes, whose space/maintenance overhead we reclaim according to
some cache management scheme. The more space you have, the more of the
important ones hang around. Roughly the same idea applies to
materialised views.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Jay Levitt

Alex wrote:

I still fail to see how Redmine doesn't fit the requirements summarized
at that wiki page[1], so the objection must be something other than the
formal requirement of being free/open software and running postgres behind
it (some sort of feeling, maybe?)


Well, if those requirements are in fact requirements, Redmine could suit the 
purpose (perhaps with some custom extensions).  But yes, Redmine feels 
wrong to me, though I have never been particularly happy with *any* 
self-hosted bug tracker.


Of those I've used - Redmine, Mantis, JIRA, Trac, Bugzilla, GForge, RT - 
Redmine feels the least-worst to me; it's about equal to Trac, and it's in 
Ruby so I could theoretically(!) improve it.


I think the biggest missing pieces in Redmine aside from custom CF stuff 
are: better search, single sign-on (it requires Yet Another Login), a better 
UX (AJAX, syntax highlighting) and better git integration (a la pull 
requests, where private git commits = patches). Those are some pretty big 
pieces. I don't think Redmine out-of-the-box would improve either CFs or 
community involvement.



Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
please describe your ideal tool for the task?


My opinion isn't all that important, since I currently have an infinite 
opinion-to-contribution ratio, but in my unicorniverse: We'd accept that 
open source hasn't always produced great UX, we'd use GitHub's issue 
tracker, allow volunteers to do bug wrangling triage via tags, use GitHub 
hooks to integrate with the existing CF app, and write archiving tools that 
would let us easily export everything off of GitHub for when (a) something 
better comes along or (b) GitHub pops out of existence or adds egregious 
licensing terms like BitKeeper.


Reasons:

- Familiarity: Many developers already have a GitHub account and use it
- Discoverability: GitHub has great SEO
- Tight integration of git with patch and issue management (pull requests, 
fork networks, etc); eliminates ceremony rather than adding to it

- Readable UI with syntax highlighting, etc
- Patch commenting and git integration encourage actual review-resubmit 
cycles instead of "Here, look, I fixed it for you" reviews

- Two-way email/web integration
- Meets Tom's "would be sort of neat" criteria[1]
- Could easily implement Simon's pony criteria[2] with tags and API
- Easily extensible with API and hooks
- Subjectively: Its design encourages better community and core interactions 
than any I've seen in 25 years.


GitHub could well be a non-starter, but if third-party-dependence is really 
the holdup, I'd volunteer to write the tools - in fact, a google of [export 
issues from github] shows a few that might already suffice.



Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch.


ISTR there's a great "Writing your own bug tracker is an anti-pattern" blog, 
but I can't find it anymore.



Or maybe there isn't really a need for a tracker?  The core team have
managed to live without one for so long after all...


As an end-user, I've reported exactly one bug in a release version of 
Postgres, and it was fixed (and back-ported!) the next day.  So I really 
can't complain about the tracking of actual bugs.


Sounds like we do need something better for CF/patch workflow, tho.

Jay

[1] Tom wrote:


Now what would be sort of neat is if we had a way to keep all the versions
of patch X plus author and reviewer information, links to reviews and
discussion, etc. in some sort of centralized place


[2] Simon wrote:


My I-Want-a-Pony idea is some kind of rating system that allows us all
to judge patches in terms of importance/popularity, complexity and
maturity. I guess a Balanced Scorecard for the development process. So
we can all see whats going on.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jay Levitt

Simon Riggs wrote:

I'd like to see something along the lines of demand-created optional
indexes, whose space/maintenance overhead we reclaim according to
some cache management scheme. The more space you have, the more of the
important ones hang around. Roughly the same idea applies to
materialised views.


+10; this sort of demand-driven optimization could be the database 
equivalent of Java's HotSpot (which accomplished the amazing task of making 
Java kinda fastish sometimes).


Jay

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Robert Haas
On Apr 16, 2012, at 1:40 AM, Jeff Davis pg...@j-davis.com wrote:
 See attached SQL for example.
 Problem statement: slow. Nested loops are the only option, although they
 can benefit from an inner GiST index if available. But if the join is
 happening up in the plan tree somewhere, then it's impossible for any
 index to be available.

Hmm. This sounds like something that Tom's recent work on parameterized plans 
ought to have fixed, or if not, it seems closely related. And by this I mean 
specifically the ability to use a GiST index to drive a nested loop that is 
higher up in the plan tree than the immediate parent of the index scan.

This is not an argument against your proposal, just an observation.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Joshua D. Drake


On 04/16/2012 09:24 AM, Alex wrote:

Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
please describe your ideal tool for the task?

Given that every other existing tool has likely pissed off someone
already, I guess our best bet is writing one from scratch.

Or maybe there isn't really a need for a tracker?  The core team have
managed to live without one for so long after all...
I believe the biggest hurdle for many hackers is that in redmine, email 
is not a first class citizen. The majority of hackers are never going to 
want to go into a web interface to get something done; they live in 
Vi/Emacs and the command line.


One thing that redmine definitely breaks is proper handling of 
attachments in email; thus the first thing moving to redmine would break 
would be patch submission.


JD



--
Regards,
Alex

[1] http://wiki.postgresql.org/wiki/TrackerDiscussion





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Apr 16, 2012, at 1:40 AM, Jeff Davis pg...@j-davis.com wrote:
  See attached SQL for example.
 Problem statement: slow. Nested loops are the only option, although they
 can benefit from an inner GiST index if available. But if the join is
 happening up in the plan tree somewhere, then it's impossible for any
 index to be available.

 Hmm. This sounds like something that Tom's recent work on
 parameterized plans ought to have fixed, or if not, it seems closely
 related.

Not really.  It's still going to be a nestloop, and as such not terribly
well suited for queries where there are a lot of matchable rows on both
sides.  The work I've been doing is really about making nestloops usable
in cases where join order restrictions formerly prevented it --- but
Jeff's complaint has nothing to do with that.  (This thought also makes
me a bit dubious about the nearby suggestions that more indexes will
fix it.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need (was: Last gasp)

2012-04-16 Thread Michael Banck
On Mon, Apr 16, 2012 at 06:29:47PM +0200, Magnus Hagander wrote:
 FWIW, I think the closest thing we've found so far would be debbugs -
 which IIRC doesn't have any kind of reasonable database backend, which
 would be a strange choice for a project like ours :) And makes many
 things harder...

FWIW, Don Armstrong (the main debbugs hacker these days, I believe)
recently posted a blog post about a more obscure feature of debbugs
(forcemerge), where he finishes with "so we can eventually keep a
postgresql database updated in addition to the flatfile database":

http://www.donarmstrong.com/posts/debbugs_forcemerge/

I don't know whether this implies a full PostgreSQL backend or just a
helper DB for some part of the functionality, but it might be something
to keep watching.


Regards,

Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Slow temporary tables when using sync rep

2012-04-16 Thread Thom Brown
Hi,

I've noticed that when using synchronous replication (on 9.2devel at
least), temporary tables become really slow:

thom@test=# create temporary table temp_test (a text, b text);
CREATE TABLE
Time: 16.812 ms
thom@test=# SET synchronous_commit = 'local';
SET
Time: 2.739 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 3.911 ms
thom@test=# SET synchronous_commit = 'remote_write';
SET
Time: 2.826 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 831.384 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 1700.154 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 4976.853 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');
INSERT 0 1
Time: 5294.213 ms
thom@test=# insert into temp_test (a, b) values ('one', 'two');

It appears to be taking almost exactly 6 seconds per transaction, as if
it's only attempting to complete every 6 seconds, like a
heartbeat:

thom@test=# insert into temp_test (a, b) values ('one', 'two');insert
into temp_test (a, b) values ('one', 'two');insert into temp_test (a,
b) values ('one', 'two');insert into temp_test (a, b) values ('one',
'two');
INSERT 0 1
Time: 141.586 ms
INSERT 0 1
Time: 6009.059 ms
INSERT 0 1
Time: 6009.305 ms
INSERT 0 1
Time: 6009.610 ms

thom@test=# begin;
BEGIN
Time: 0.469 ms
thom@test=*# insert into temp_test (a, b) values ('one',
'two');insert into temp_test (a, b) values ('one', 'two');insert into
temp_test (a, b) values ('one', 'two');insert into temp_test (a, b)
values ('one', 'two');
INSERT 0 1
Time: 0.841 ms
INSERT 0 1
Time: 0.395 ms
INSERT 0 1
Time: 0.354 ms
INSERT 0 1
Time: 0.419 ms
thom@test=*# end;
COMMIT
Time: 5649.679 ms


This doesn't apply to regular tables:

thom@test=# create table test (a text, b text);
CREATE TABLE
Time: 18.806 ms
thom@test=# SET synchronous_commit = 'local';
SET
Time: 2.751 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 6.546 ms
thom@test=# SET synchronous_commit = 'remote_write';
SET
Time: 2.713 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 7.257 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 6.308 ms
thom@test=# insert into test (a, b) values ('one', 'two');
INSERT 0 1
Time: 8.871 ms

Is this a known problem?
-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Slow temporary tables when using sync rep

2012-04-16 Thread Michael Nolan
On Mon, Apr 16, 2012 at 6:27 PM, Thom Brown t...@linux.com wrote:

 Hi,

 I've noticed that when using synchronous replication (on 9.2devel at
 least), temporary tables become really slow:

 Since temporary tables are only present until the session ends (or
possibly only until a commit), why are they replicated at all?

BTW, should we have an entry in the index for 'temporary tables'?
--
Mike Nolan


Re: [HACKERS] Improving our clauseless-join heuristics

2012-04-16 Thread Amit Kapila
I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

It is along similar lines, but the only difference is that it will try to join
old_rel with the other_rel list in case
old_rel cannot be joined with any of the other_rels in the list using a proper
join clause between them.

In the second place, instead of N tests to see whether a rel lacks any
join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.

In the typical case where most base rels are directly joined to only a few
other rels, it will not go down the path where it has to try new combinations;
the idea is that only if, in the first pass, old_rel cannot find any suitable
rel to join will it go to a second pass and try to join with the other_rel
list.

In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old SELECT * FROM a, b. So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.

I believe this will not be related to clause-less joins, because we have
entered the function make_rels_by_clause_joins() for a relation which has a
join clause and which can be joined to one of the other_rels. So in most cases
I believe it will find one of the members of the other_rel list with a proper
join clause. Only in some cases, when it cannot find any other_rel with a
proper join clause between them, will it go down the new path. So, seen that
way, handling this case is different from handling the clause-less join case.

The other way, as you said, is to ask: is there any join clause that both
these relations participate in?

Does this mean that we will evaluate the join list to see whether there can be
any valid join between the two relations?
Is it possible that in all scenarios we will be able to detect the existence
of a valid join, one which may be possible not at this level but at the next
level?
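
For illustration only (my own sketch; table and column names are made up), the
two cases in question might look like this:

-- c has no join clause at all, so only a clauseless join can bring it in:
SELECT * FROM a JOIN b ON a.id = b.a_id, c;

-- the only clause mentioning b and c references three relations (a, b, c),
-- so it cannot be applied when joining just two of them at level-2:
SELECT * FROM a, b, c WHERE a.x = b.y + c.z;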

-Original Message-
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Tuesday, April 17, 2012 2:05 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Improving our clauseless-join heuristics 

Amit Kapila amit.kap...@huawei.com writes:
 For this kind of query, currently (referring to 9.0.3 code) it also considers
 the joins of b,c and b,d.
 As there is no join clause between b, c and d, it will go down the path of
 make_rels_by_clauseless_joins(), where it considers the joins of b,c and b,d.

 In this kind of query, if the suggestion in my mail below is followed,
 then it will consider joining a,b a,c a,d at level-2 in the function
 make_rels_by_clause_joins(), which it currently doesn't do and which may
 generate useless join paths.
 However, in real-world scenarios this kind of query, where two columns of
 different tables are used in a one-sided expression (b.y + c.z) of the where
 clause, may be uncommon.

 On the other hand, when we come to consider d, it will have no join
 clauses so we will consider joining it to each other rel in turn.  

 When it comes to considering d: because at level-2 it only considers later
 rels, it should not consider joining with each other rel.

I might still be misunderstanding, but I think what you are suggesting
is that in the loop in make_rels_by_clause_joins, if we find that the
old_rel doesn't have a join clause/restriction with the current
other_rel, we check to see whether other_rel has any join clauses at
all, and force the join to occur anyway if it doesn't.

I can't really get excited about doing it that way instead of the
current way.  In the first place, it seems to miss the need to
clauseless-join two relations when neither of them have any join
clauses, for instance plain old SELECT * FROM a, b. So you still need
something like the make_rels_by_clauseless_joins code path, and it's
not entirely clear how to avoid duplicated work there.  In the second
place, instead of N tests to see whether a rel lacks any join clauses,
we'd now have something approaching O(N^2) such tests, in the typical
case where most base rels are directly joined to only a few other rels.
So it seems to make things slower for little obvious benefit.

In general, queries with join-clause-less rels are pretty uncommon,
so I don't want to introduce extra work into make_rels_by_clause_joins
to handle the case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A typo fix in a comment in xlog.c

2012-04-16 Thread Kyotaro HORIGUCHI
  But I did fill-paragraph for the fixed comment so the patch
  replaces a little bit more.
 
 You might want to adjust your fill-column setting to 79, so pgindent
 doesn't reformat that again.  Compare to what I just committed.

Thank you for the suggestion. I couldn't decide on a fill-column that
fits all code and comments.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.3 Pre-proposal: Range Merge Join

2012-04-16 Thread Jeff Davis
On Mon, 2012-04-16 at 22:20 +0100, Simon Riggs wrote:
 On Mon, Apr 16, 2012 at 9:12 PM, Jeff Davis pg...@j-davis.com wrote:
 
  That had occurred to me, but I was hesitant to only use temp indexes. It
  still doesn't really offer a good solution when both sides of the join
  are relatively large (because of random I/O). Also the build speed of
  the index would be more important than it is now.
 
 The thing I like most about temp indexes is that they needn't be temporary.
 
 I'd like to see something along the lines of demand-created optional
 indexes, that we reclaim space/maintenance overhead on according to
 some cache management scheme. The more space you have, the more of the
 important ones hang around. Roughly the same idea applies to
 materialised views.

I think to make an informed decision, the next thing I need to do is
hack up a prototype version of my merge join idea, and see how well it
performs against an index nestloop today.

It seems to me that both approaches may have merit independently.
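
As a minimal sketch of such a comparison (assuming the 9.2 range types and the
&& overlaps operator; table names and sizes below are only illustrative):

CREATE TABLE r1 (r int4range);
CREATE TABLE r2 (r int4range);
INSERT INTO r1 SELECT int4range(i, i + 10) FROM generate_series(1, 100000) i;
INSERT INTO r2 SELECT int4range(i, i + 10) FROM generate_series(1, 100000) i;
CREATE INDEX r2_r_idx ON r2 USING gist (r);
ANALYZE r1;
ANALYZE r2;
-- today this is expected to be a nested loop, ideally with a GiST index
-- scan on r2 as the inner side:
EXPLAIN ANALYZE SELECT count(*) FROM r1 JOIN r2 ON r1.r && r2.r;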

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Alex


 I believe the biggest hurdle for many hackers is that in redmine,
 email is not a first class citizen. The majority of hackers are never
 going to want to go into a web interface to get something done, they
 live in VI/Emacs and the command line.

 One thing that redmine definitely breaks is proper handling of
 attachments in email, thus the first thing moving to redmine would
 break would be patch submission.

Right.  This is not too hard to fix, however, and I didn't suggest we
move to vanilla Redmine.  It's hackable, so we can just bash it into the
shape we need (and we need a few Ruby hackers, so there's always someone
in the postgres community to fix it when it breaks, of course.)

--
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUG] Checkpointer on hot standby runs without looking checkpoint_segments

2012-04-16 Thread Kyotaro HORIGUCHI
Hello, thank you for the comment.

  In the backported version to 9.1.3, bgwriter.c is modified
  instead of checkpointer.c in 9.2. And GetWalRcvWriteRecPtr() is
  used as the equivalent of GetStandbyFlushRecPtr() in 9.2.
 
 In 9.2, GetXLogReplayRecPtr() should be used instead of
 GetStandbyFlushRecPtr().  A restartpoint is scheduled to finish
 before the next restartpoint is triggered. A restartpoint is
 triggered if too many WAL files have been replayed since the last
 restartpoint. So a restartpoint should be scheduled according
 to the replay location, not the receive location.

I basically agree with that. But I got confused looking into
GetStandbyFlushRecPtr().

|   if (XLByteLT(receivePtr, replayPtr))
| return XLByteLT(replayPtr, restorePtr) ? restorePtr : replayPtr;
|   else
| return XLByteLT(receivePtr, restorePtr) ? restorePtr : receivePtr;

This seems to imply that receivePtr may be behind replayPtr. I don't
understand what condition makes that happen, but anyway the bottom line,
I think, is that a restartpoint should be based on WAL that has surely
been synced. So I chose GetStandbyFlushRecPtr() to read the
location.

If receivePtr/restorePtr always precede or are equal to
replayPtr, I prefer GetXLogReplayRecPtr() as you suggest.

(And some comment about the ordering among these pointers should
 perhaps be supplied for that part.)

 I think that basically it's better not to change the comments
 (i.e., not to add the line feed) if their contents are the same
 as previous ones, to highlight what you actually changed in the
 patch.

Hmm. It is a matter of priority between highlighting the changes (or
keeping the patch compact) and consistency in the result. I think the
latter takes precedence over the former.

Although I could have found a description with a better balance.  But
more than that, I've found the fill-column for this comment to be too
short...

 Typo: RedoRecptr should be RedoRecPtr?

I think that's right. I've unconsciously brought that spelling
from the original comment.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Michael Meskes
On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:
 OK. I would like to stretch your agreement a little. :-)
 ...

Yeah, you got a point here.

 By the new FETCH request. Instead of the above, I imagined this:
 - the runtime notices that the new request is larger than the current
   readahead window size, modifies the readahead window size upwards,
   so the next FETCH will use it
 - serve the request's first 128 rows from the current cache
 - for the 129th row, FETCH 1024 will be executed and the remaining
   768 rows will be served from the new cache

That means window size goes up to 1024-128 for that one case?

 - all subsequent requests use the new readahead size, 1024

Sounds reasonable to me.

 So, there can be occasional one-time larger requests but
 smaller ones should apply the set window size, right?

Yes. I do agree that FETCH N cannot fetch N all the time, but please make it
work like what you suggested to make sure people don't have to recompile.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Boszormenyi Zoltan

On 2012-04-17 05:52, Michael Meskes wrote:

On Mon, Apr 16, 2012 at 07:18:07PM +0200, Boszormenyi Zoltan wrote:

OK. I would like to stretch your agreement a little. :-)
...

Yeah, you got a point here.


By the new FETCH request. Instead of the above, I imagined this:
- the runtime notices that the new request is larger than the current
   readahead window size, modifies the readahead window size upwards,
   so the next FETCH will use it
- serve the request's first 128 rows from the current cache
- for the 129th row, FETCH 1024 will be executed and the remaining
   768 rows will be served from the new cache

That means window size goes up to 1024-128 for that one case?


I listed two scenarios.
1. occasional bump of the readahead window for large requests,
   for smaller requests it uses the originally set size
2. permanent bump of the readahead window for large requests
   (larger than previously seen), all subsequent requests use
   the new size

Both can be implemented easily, which one do you prefer?
If you always use very large requests, 1) behaves like 2)
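
In plain cursor terms (my own illustration; big_table is a made-up name, and
the readahead window itself is ECPG client-side state, not anything the server
sees), the difference between the two would be:

BEGIN;
DECLARE cur CURSOR FOR SELECT * FROM big_table;
FETCH 128 FROM cur;   -- within the configured window (say 256 rows): served from the cache
FETCH 1024 FROM cur;  -- bigger than the window: the window is enlarged to 1024 for this request
FETCH 128 FROM cur;   -- scenario 1: back to the original 256; scenario 2: keeps using 1024
COMMIT;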




- all subsequent requests use the new readahead size, 1024

Sounds reasonable to me.


So, there can be occasional one-time larger requests but
smaller ones should apply the set window size, right?

Yes. I do agree that FETCH N cannot fetch N all the time, but please make it
work like what you suggested to make sure people don't have to recompile.

Michael



--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ECPG FETCH readahead

2012-04-16 Thread Michael Meskes
On Tue, Apr 17, 2012 at 06:02:34AM +0200, Boszormenyi Zoltan wrote:
 I listed two scenarios.
 1. occasional bump of the readahead window for large requests,
for smaller requests it uses the originally set size
 2. permanent bump of the readahead window for large requests
(larger than previously seen), all subsequent requests use
the new size
 
 Both can be implemented easily, which one do you prefer?
 If you always use very large requests, 1) behaves like 2)

I'd say let's go for #2. #1 is probably more efficient but not what the
programmer asked us to do. After all it's easy to increase the window size
accordingly if you want so as a programmer.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at googlemail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why can't I use pgxs to build a plpgsql plugin?

2012-04-16 Thread Pavel Stehule
2012/4/16 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 13.04.2012 19:17, Guillaume Lelarge wrote:

 On Thu, 2012-04-12 at 12:28 +0300, Heikki Linnakangas wrote:

 On 08.04.2012 11:59, Guillaume Lelarge wrote:

 There could be a good reason which would explain why we can't (or don't
 want to) do this, but I don't see it right now.


 Me neither, except a general desire to keep internals hidden. I propose
 the attached.


 Sounds good to me. I would love to see this happening in 9.2.


 Ok, committed. I fixed the .PHONY line as Tom pointed out, and changed MSVC
 install.pm to also copy the header file.


Hello,

it doesn't work for modules from the contrib directory

pavel ~/src/postgresql/contrib/check_plpgsql $ make
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -g -fpic -I. -I.
-I../../src/include -D_GNU_SOURCE   -c -o check_plpgsql.o
check_plpgsql.c
check_plpgsql.c:16:21: fatal error: plpgsql.h: No such file or directory
compilation terminated.


Regards

Pavel


 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] not null validation option in contrib/file_fdw

2012-04-16 Thread Etsuro Fujita
I updated the patch; attached is the updated version.

Changes:
  * fix a bug in fileGetOptions()
  * rename the validation option and its code to validate_data_file (a usage
    sketch follows below)
  * clean up
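
A usage sketch of the option as renamed in this version (this is only my
reading of the patch; the file path and table definition are made up, while
filename and format are the existing file_fdw options):

CREATE EXTENSION file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE words (w text NOT NULL)
  SERVER file_server
  OPTIONS (filename '/tmp/words.csv', format 'csv', validate_data_file 'true');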

Best regards,
Etsuro Fujita

 -Original Message-
 From: pgsql-hackers-ow...@postgresql.org
 [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Etsuro Fujita
 Sent: Monday, April 16, 2012 4:09 PM
 To: 'Andrew Dunstan'; 'Shigeru HANADA'
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
 
 Thank you for the review.
 
  -Original Message-
  From: pgsql-hackers-ow...@postgresql.org
  [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andrew
  Dunstan
  Sent: Friday, April 13, 2012 9:16 PM
  To: Shigeru HANADA
  Cc: Etsuro Fujita; pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] not null validation option in contrib/file_fdw
 
 
 
  On 04/13/2012 07:21 AM, Shigeru HANADA wrote:
   (2012/04/13 16:59), Etsuro Fujita wrote:
   I updated the patch added to CF 2012-Next [1].  Attached is the
   updated version of the patch.
   I applied the patch and ran regression tests of file_fdw, and I got
   SIGSEGV X-(
  
   The failure occurs in fileGetOptions, and it is caused by
   list_delete_cell used in a foreach loop; the ListCell pointing to the
   delete target has been freed by list_delete_cell, but foreach accesses
   it to get the next element.
  
   Some backend functions which use list_delete_cell in a loop use a for
   loop instead of foreach, and other functions exit the loop right after
   calling list_delete_cell.  Since we can't stop searching for non-COPY
   options until reaching the end of the options list, we would need to
   choose the former (a for loop), or create another list which contains
   only the valid COPY options and return it via the other_options parameter.
  
 
  Yes, the code in fileGetOptions() appears to be bogus.
 
 Sorry, I will fix it.
 
  Also, validate is a terrible name for the option (and in the code)
 IMNSHO.
  It's far too generic. validate_not_null or some such would surely be
  better.
 
 I thought it would be used for not only NOT NULL but also CHECK and
foreign
 key constraints.  That is, when a user sets the option to 'true', file_fdw
 verifies that each tuple meets all kinds of constraints.  So, how about
 validate_data_file or simply validate_file?
 
 Best regards,
 Etsuro Fujita
 
  cheers
 
  andrew
 
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To
  make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 
 
 
 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
 changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



file_fdw_notnull_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug tracker tool we need

2012-04-16 Thread Magnus Hagander
On Mon, Apr 16, 2012 at 23:48, Jay Levitt jay.lev...@gmail.com wrote:
 Alex wrote:

 I still fail to see how Redmine doesn't fit into requirements summarized
 at that wiki page[1], so that must be something other than formal
 requirement of being free/open software and running postgres behind
 (some sort of feeling maybe?)


 Well, if those requirements are in fact requirements, Redmine could suit the
 purpose (perhaps with some custom extensions).  But yes, Redmine feels
 wrong to me, though I have never been particularly happy with *any*
 self-hosted bug tracker.

That's probably one reason people aren't jumping on this. Because
there is no tracker out there that people actually *like*...


 Of those I've used - Redmine, Mantis, JIRA, Trac, Bugzilla, GForge, RT -
 Redmine feels the least-worst to me; it's about equal to Trac, and it's in
 Ruby so I could theoretically(!) improve it.

It being in ruby is actually generally a liability in the
postgresql.org world - we have very few people who actually know it.
And some of those who know it no longer want to work with it. So if
you're good with Ruby, and want to contribute, feel free to contact me
off-list because we have some things we need help with :-)


 I think the biggest missing pieces in Redmine aside from custom CF stuff
 are: better search, single sign-on (it requires Yet Another Login), a better

We already have redmine working with single password (though not
single signon) for a limited number of projects such as pgweb.


 UX (AJAX, syntax highlighting) and better git integration (a la pull
 requests, where private git commits = patches). Those are some pretty big
 pieces. I don't think Redmine out-of-the-box would improve either CFs or
 community involvement.

I think we can all agree on that :-)

 Jay, Alvaro, Dimitri (and whoever else wants to speak up) could you
 please describe your ideal tool for the task?


 My opinion isn't all that important, since I currently have an infinite
 opinion-to-contribution ratio, but in my unicorniverse: We'd accept that
 open source hasn't always produced great UX, we'd use GitHub's issue
 tracker, allow volunteers to do bug wrangling triage via tags, use GitHub
 hooks to integrate with the existing CF app, and write archiving tools that
 would let us easily export everything off of GitHub for when (a) something
 better comes along or (b) GitHub pops out of existence or adds egregious
 licensing terms like BitKeeper.

 Reasons:

 - Familiarity: Many developers already have a GitHub account and use it

Most of the more senior developers don't use github. Other than
possibly as a place to store a plain git repository. So that's not
really relevant.

 - Discoverability: GitHub has great SEO

I'm willing to bet that postgresql.org has better SEO when it comes to postgres.

 - Tight integration of git with patch and issue management (pull requests,
 fork networks, etc); eliminates ceremony rather than adding to it

There are some things that help there, certainly. Pull requests is
basically email, but fork network tracking and such can be darn useful
sometimes.

 - Readable UI with syntax highlighting, etc

Yes, this part is very nice.

 - Patch commenting and git integration encourage actual review-resubmit
 cycles instead of Here, look, I fixed it for you reviews

The amount of spam coming through that system, and the
inability/unwillingness of github to even care about it is a killer
argument *against* github.

We have working antispam for email. The github antispam is somewhere
around where email antispam was in 1994.

 - Two-way email/web integration
 - Meets Tom's would be sort of neat criteria[1]
 - Could easily implement Simon's pony criteria[2] with tags and API
 - Easily extensible with API and hooks
 - Subjectively: Its design encourages better community and core interactions
 than any I've seen in 25 years.

 GitHub could well be a non-starter, but if third-party-dependence is really
 the holdup, I'd volunteer to write the tools - in fact, a google of [export
 issues from github] shows a few that might already suffice.

It *is* a non-starter, because (a) it's a third party dependency, and
(b) AFAIK they don't provide *data access* to the issue trackers.

If the actual issue trackers were stored in git, like the webpages are
for example, it would be a different story...


 Given that every other existing tool has likely pissed off someone
 already, I guess our best bet is writing one from scratch.

 ISTR there's a great "Writing your own bug tracker is an anti-pattern" blog
 post, but I can't find it anymore.

I'm sure there's more than one :-)

If it was that easy we would've written one already. But doing that is
likely going to be a lot of work, and it needs to be maintained.

(Not that picking another one means it doesn't need to be maintained -
I've seen so many things broken by upgrading redmine or bugzilla that
it's silly...)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: