Re: [HACKERS] spoonbill vs. -HEAD
On 04/03/2013 12:59 AM, Tom Lane wrote:
> I wrote:
>> I think the simplest fix is to insert "PG_SETMASK(&UnBlockSig)" into
>> StatementCancelHandler() and any other handlers that might exit via
>> longjmp.  I'm a bit inclined to only do this on platforms where a
>> problem is demonstrable, which so far is only OpenBSD.  (You'd
>> think that all BSDen would have the same issue, but the buildfarm
>> shows otherwise.)
>
> BTW, on further thought it seems like maybe this is an OpenBSD bug,
> at least in part: what is evidently happening is that the temporary
> blockage of SIGINT during the handler persists even after we've
> longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1)
> to establish that longjmp handler --- so why isn't the original signal
> mask reinstalled when we return to the main loop?
>
> If (your version of) OpenBSD is getting this wrong, it'd explain why
> we've not seen similar behavior elsewhere.

hmm - trolling the OpenBSD CVS history brings up this:

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/sparc64/sparc64/machdep.c?r1=1.143;sortby=date#rev1.143


Stefan
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Wednesday, April 03, 2013 2:23 AM Greg Smith wrote:
> On 4/1/13 5:44 AM, Amit Kapila wrote:
>
>> I think in that case we can have 3 separate patches
>> 1. Memory growth defect fix
>> 2. Default postgresql.conf to include config directory and SET Persistent
>>    into single file implementation
>> 3. Rearrangement of GUC validation into validate_conf_option function.
>>
>> As already there is review happened for point 2 and point 1 is an existing
>> code defect fix, so in my opinion
>> Patch 1 and 2 should be considered for 9.3.
>
> They have been considered for 9.3, I just doubt they could get committed
> right now.  In order for this to go in as part of the very last 9.3
> feature changes (which is where we're at in the development cycle),
> you'd need to have a committer volunteer to take on the job of doing a
> second level of review on it.  And then that would have to happen
> without any other issues popping up.  That usually is not how it works.
> Normally the first round of committer review finds another round of
> issues, and there's at least one more update before commit.  (Note that
> this is exactly what just happened today, with the review from Peter
> Eisentraut)
>
> I'm not saying it's impossible for this feature to go in to 9.3, but I'm
> not seeing any committers volunteer to take on the job either.  I do
> want to see this feature go in--I'll update it for 9.4 even if you
> don't--but we're already into April.  There isn't much time left for
> 9.3.  And the security release this week has gobbled up a good chunk of
> committer and packager time unexpectedly, which is just bad luck for
> your submission.
>
> From a process perspective, features that enter the last CF of a
> release that are very close to ready from the start have good odds of
> being committed.  You've done an excellent job of updating this in
> response to feedback, but it has involved a long list of changes so far.
> It's fair to say this was still a rough feature at the start of CF
> 2013-01, and now it's good but can be usefully polished a bit more.
>
> For something of this size, going from rough feature to commit quality
> normally takes more than one CF.  I don't have any obvious errors to
> point out right now.  But I think there's still some room to improve on
> this before commit.  Andres mentioned on another thread that he thought
> merging some of your ideas with the version Zoltan did was useful to
> look at, and I was thinking of something similar.  This is close to
> being ready, and I hope you won't get discouraged just because it's
> probably going to slip to 9.4.

Thank you for keeping me motivated for this patch, now and throughout the last CF, by providing valuable suggestions and comments.

With Regards,
Amit Kapila.
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On Tuesday, April 02, 2013 9:49 PM Peter Eisentraut wrote:
> I'm going to ignore most of the discussion that led up to this and give
> this patch a fresh look.

Thank you.

> +
> +   SET PERSISTENT max_connections To 10;
> +
>
> The To should probably be capitalized.

Okay, shall fix it.

> I doubt this example actually works because changing max_connections
> requires a restart.  Try to find a better example.

Any parameter changed with this command only becomes effective after a restart or SIGHUP, so I will change the example to use a SIGHUP parameter.

> It's weird that SET LOCAL and SET SESSION actually *set* the value, and
> the second key word determines how long the setting will last.  SET
> PERSISTENT doesn't actually set the value.  I predict that this will be
> a new favorite help-it-doesn't-work FAQ.
>
>   SCHEMA
> ! SET [ PERSISTENT ] SCHEMA 'value' is an alias for
>   SET search_path TO value.  Only one
>   schema can be specified using this syntax.
>
> I don't think [ PERSISTENT ] needs to be added in these and similar
> snippets.  We don't mention LOCAL etc. here either.

Agreed, shall fix this.

> --- 34,41
>   postgresql.conf, pg_hba.conf, and
>   pg_ident.conf are traditionally stored in
>   PGDATA (although in PostgreSQL 8.0 and
> ! later, it is possible to place them elsewhere).  By default the
> ! directory config is stored in PGDATA, however it should be kept
> ! along with postgresql.conf.
>
> This chunk doesn't make any sense to me.  "Should" is always tricky.
> Why should I, and why might I not?

I mean to say here that the user needs to move the config directory and its contents along with postgresql.conf.  Shall we change it as below:

    By default the config directory is stored in PGDATA, however it needs
    to be kept along with postgresql.conf.

> + config
> + Subdirectory containing automatically generated configuration files
> +
>   base
>   Subdirectory containing per-database subdirectories
>
> Only automatically generated ones?

No; other files can also be present.  How about changing it to:

    Subdirectory containing generated configuration files.

Any other suggestions?  This new directory will be used to place generated files.

>   COPY_STRING_FIELD(name);
>   COPY_NODE_FIELD(args);
>   COPY_SCALAR_FIELD(is_local);
> + COPY_SCALAR_FIELD(is_persistent);
>
>   return newnode;
> }
>
> I suggest changing is_local into a new trivalued field that stores LOCAL
> or SESSION or PERSISTENT.
>
>   n->is_local = false;
>   $$ = (Node *) n;
> }

Okay, I will change it.

> + | SET PERSISTENT set_persistent
> +   {
> +       VariableSetStmt *n = $3;
> +       n->is_persistent = true;
> +       $$ = (Node *) n;
> +   }
> ;
>
> set_rest:
>
> Why can't you use SET PERSISTENT set_rest?

Because SET PERSISTENT cannot be used with some of those syntaxes (for example SESSION AUTHORIZATION), and it is also not supported inside transaction blocks.

> *** a/src/backend/replication/basebackup.c
> --- b/src/backend/replication/basebackup.c
> ***
> *** 755,760 sendDir(char *path, int basepathlen, bool sizeonly)
> --- 755,766
>                   strlen(PG_TEMP_FILE_PREFIX)) == 0)
>                   continue;
>
> +         /* skip auto conf temporary file */
> +         if (strncmp(de->d_name,
> +                     PG_AUTOCONF_FILENAME ".",
> +                     sizeof(PG_AUTOCONF_FILENAME)) == 0)
> +             continue;
> +
>
> Maybe pg_basebackup should be taught to ignore certain kinds of
> temporary files in general.  The file name shouldn't be hardcoded into
> pg_basebackup.  This would effectively make the configuration file
> naming scheme part of the replication protocol.  See other thread about
> pg_basebackup client/server compatibility.  This needs to be
> generalized.

As we are just trying to ignore the file during the backup, why should it affect the replication protocol?  I mean the protocol would be affected if we tried to send something new, or didn't send what is expected on the other side.  Also, the same is done for the temp and backup label files.

> (Side thought: Does pg_basebackup copy editor temp and backup files?)

Currently pg_basebackup doesn't copy temp or backup files.  The check I made in the code is similar to the check for those two.

> + ereport(elevel,
> +         (errmsg("Configuration parameters changed with SET PERSISTENT command before start of server "
> +                 "will not be effective as \"%s\" file is not accessible.", PG_AUTOCONF_FILENAME)));
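[Editor's illustration of the SIGHUP-parameter example discussed above --- a hedged sketch only, using the patch's proposed SET PERSISTENT syntax (not yet committed) and autovacuum_naptime as an assumed SIGHUP-context parameter:]

    -- Record the new value in the auto-generated configuration file.
    SET PERSISTENT autovacuum_naptime TO '5min';

    -- The value only takes effect once the server re-reads its
    -- configuration, e.g. after a reload (SIGHUP):
    SELECT pg_reload_conf();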
[HACKERS] commit dfda6ebaec67 versus wal_keep_segments
This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b
Author: Heikki Linnakangas
Date:   Sun Jun 24 18:06:38 2012 +0300

    Don't waste the last segment of each 4GB logical log file.

In a side window do:

    watch "ls -lrt /tmp/data/pg_xlog"

    dfda6ebaec/bin/initdb -D /tmp/data
    dfda6ebaec/bin/pg_ctl -D /tmp/data -l logfile restart -o "--fsync=off --wal_keep_segments=20"
    createdb
    pgbench -i -s10
    pgbench -T3600

xlogs accumulate until there are about 26 of them.  Then all of a sudden they drop down to 11 of them.  Then it builds back up to around 26, and seems to stay there permanently.

At some point when it is over-pruning and recycling, it recycles the log files that are still needed for recovery, and if the database crashes at that point it will not recover, because it can't find either the primary or secondary checkpoint records.

Cheers,

Jeff
[HACKERS] psql crash fix
I found that psql will crash if given a PSQLRC value containing a tilde:

    $ PSQLRC="~/x" psql test
    *** glibc detected *** psql: free(): invalid pointer: 0x7fffb7c933ec ***

This is on Debian Squeeze 6.0.7.

The fix is to pstrdup() the value returned by getenv(), so it can be free()'ed later --- you can't free getenv()-returned values:

    As typically implemented, getenv() returns a pointer to a string
    within the environment list.  The caller must take care not to modify
    this string, since that would change the environment of the process.

This bug exists in 9.2 and git head.  I also removed the return value from expand_tilde() as no caller was using it.

--
  Bruce Momjian          http://momjian.us
  EnterpriseDB           http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
new file mode 100644
index be5e34a..3dea92c
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*** session_username(void)
*** 1645,1655 ****
   * substitute '~' with HOME or '~username' with username's home dir
   *
   */
! char *
  expand_tilde(char **filename)
  {
  	if (!filename || !(*filename))
! 		return NULL;
  
  	/*
  	 * WIN32 doesn't use tilde expansion for file names. Also, it uses tilde
--- 1645,1655 ----
   * substitute '~' with HOME or '~username' with username's home dir
   *
   */
! void
  expand_tilde(char **filename)
  {
  	if (!filename || !(*filename))
! 		return;
  
  	/*
  	 * WIN32 doesn't use tilde expansion for file names. Also, it uses tilde
*** expand_tilde(char **filename)
*** 1697,1701 ****
  	}
  #endif
  
! 	return *filename;
  }
--- 1697,1701 ----
  	}
  #endif
  
! 	return;
  }
diff --git a/src/bin/psql/common.h b/src/bin/psql/common.h
new file mode 100644
index d8bb093..db645da
*** a/src/bin/psql/common.h
--- b/src/bin/psql/common.h
*** extern bool is_superuser(void);
*** 44,49 ****
  extern bool standard_strings(void);
  extern const char *session_username(void);
  
! extern char *expand_tilde(char **filename);
  
  #endif   /* COMMON_H */
--- 44,49 ----
  extern bool standard_strings(void);
  extern const char *session_username(void);
  
! extern void expand_tilde(char **filename);
  
  #endif   /* COMMON_H */
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
new file mode 100644
index 5cb6b5f..5d7fe6e
*** a/src/bin/psql/startup.c
--- b/src/bin/psql/startup.c
*** process_psqlrc(char *argv0)
*** 610,616 ****
  	char		rc_file[MAXPGPATH];
  	char		my_exec_path[MAXPGPATH];
  	char		etc_path[MAXPGPATH];
! 	char	   *envrc;
  
  	find_my_exec(argv0, my_exec_path);
  	get_etc_path(my_exec_path, etc_path);
--- 610,616 ----
  	char		rc_file[MAXPGPATH];
  	char		my_exec_path[MAXPGPATH];
  	char		etc_path[MAXPGPATH];
! 	char	   *envrc = getenv("PSQLRC");
  
  	find_my_exec(argv0, my_exec_path);
  	get_etc_path(my_exec_path, etc_path);
*** process_psqlrc(char *argv0)
*** 618,629 ****
  	snprintf(rc_file, MAXPGPATH, "%s/%s", etc_path, SYSPSQLRC);
  	process_psqlrc_file(rc_file);
  
- 	envrc = getenv("PSQLRC");
- 
  	if (envrc != NULL && strlen(envrc) > 0)
  	{
! 		expand_tilde(&envrc);
! 		process_psqlrc_file(envrc);
  	}
  	else if (get_home_path(home))
  	{
--- 618,630 ----
  	snprintf(rc_file, MAXPGPATH, "%s/%s", etc_path, SYSPSQLRC);
  	process_psqlrc_file(rc_file);
  
  	if (envrc != NULL && strlen(envrc) > 0)
  	{
! 		/* might need to free() this */
! 		char	   *envrc_alloc = pstrdup(envrc);
! 
! 		expand_tilde(&envrc_alloc);
! 		process_psqlrc_file(envrc_alloc);
  	}
  	else if (get_home_path(home))
  	{
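[Editor's note: a minimal standalone sketch of the rule the patch relies on --- not part of the patch itself.  free() may only be given pointers obtained from the malloc family, while getenv() returns a pointer into the process environment; copying first makes it freeable:]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        char   *envrc = getenv("PSQLRC");   /* points into the environment list */

        if (envrc == NULL)
            return 0;

        /* Copy first if the value will later be rewritten or freed. */
        char   *copy = strdup(envrc);

        printf("PSQLRC = %s\n", copy);
        free(copy);     /* fine: strdup()'d memory */
        /* free(envrc) here would be undefined behavior -- the reported crash */
        return 0;
    }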
Re: [HACKERS] spoonbill vs. -HEAD
I wrote:
> I think the simplest fix is to insert "PG_SETMASK(&UnBlockSig)" into
> StatementCancelHandler() and any other handlers that might exit via
> longjmp.  I'm a bit inclined to only do this on platforms where a
> problem is demonstrable, which so far is only OpenBSD.  (You'd
> think that all BSDen would have the same issue, but the buildfarm
> shows otherwise.)

BTW, on further thought it seems like maybe this is an OpenBSD bug, at least in part: what is evidently happening is that the temporary blockage of SIGINT during the handler persists even after we've longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1) to establish that longjmp handler --- so why isn't the original signal mask reinstalled when we return to the main loop?

If (your version of) OpenBSD is getting this wrong, it'd explain why we've not seen similar behavior elsewhere.

This theory doesn't really exonerate our code completely, because we use sigsetjmp(..., 0) in PG_TRY, which means that in a code path where we catch a longjmp and don't PG_RE_THROW it, we could be left with the signal disabled.  I don't believe that is happening in the isolation-test cases, though.

			regards, tom lane
[HACKERS] CREATE EXTENSION BLOCKS
Hackers,

I am working on scripts to copy data from Oracle via oracle_fdw.  They each do something like this:

    CREATE SCHEMA migrate_stuff;
    SET search_path TO migrate_stuff,public;
    CREATE EXTENSION oracle_fdw SCHEMA migrate_stuff;

    CREATE SERVER oracle_stuff FOREIGN DATA WRAPPER oracle_fdw
    OPTIONS (dbserver :'oracle_uri');

    CREATE USER MAPPING FOR postgres SERVER oracle_stuff
    OPTIONS (user :'oracle_user', password :'oracle_pass');

    CREATE FOREIGN TABLE migrate_stuff (
        stuff_id integer,
        name text
    ) SERVER oracle_stuff OPTIONS(table 'STUFF');

    INSERT INTO my.stuff SELECT * FROM migrate_stuff;
    DROP SCHEMA migrate_stuff CASCADE;
    COMMIT;

Then I run them in parallel:

    for file in migrate*.sql; do
        psql -d foo -f $file &
    done
    wait

This works fine except for one thing: the first CREATE EXTENSION statement blocks all the others.  Even when I create the extension in separate schemas in each script!  I have to remove the CREATE EXTENSION statement, create it in public before any of the scripts run, then drop it when they're done.

I'm okay with this workaround, but wasn't sure if the blocking of CREATE EXTENSION was intentional or a known issue (I did not see it documented in http://www.postgresql.org/docs/current/static/sql-createextension.html).

Thanks,

David
Re: [HACKERS] spoonbill vs. -HEAD
Stefan Kaltenbrunner writes:
> On 03/26/2013 11:30 PM, Tom Lane wrote:
>> A different line of thought is that the cancel was received by the
>> backend but didn't succeed in cancelling the query for some reason.

> I added the "pgcancel failed" codepath you suggested but it does not
> seem to get triggered at all so the above might actually be what is
> happening...

Stefan was kind enough to grant me access to spoonbill, and after some experimentation I found out the problem.  It seems that OpenBSD blocks additional deliveries of a signal while the signal handler is in progress, and that this is implemented by just calling sigprocmask() before and after calling the handler.  Therefore, if the handler doesn't return normally --- like, say, it longjmps --- the restoration of the previous mask never happens.  So we're left with the signal still blocked, meaning second and subsequent attempts to interrupt the backend don't work.

This isn't revealed by the regular regression tests because they don't exercise PQcancel, but several recently-added isolation tests do attempt to PQcancel the same backend more than once.

It's a bit surprising that it's taken us this long to recognize the problem.  Typical use of PQcancel doesn't necessarily cause a failure: StatementCancelHandler() won't exit through longjmp unless ImmediateInterruptOK is true, which is only the case while waiting for a heavyweight lock.  But still, you'd think somebody would've run into the case in normal usage.

I think the simplest fix is to insert "PG_SETMASK(&UnBlockSig)" into StatementCancelHandler() and any other handlers that might exit via longjmp.  I'm a bit inclined to only do this on platforms where a problem is demonstrable, which so far is only OpenBSD.  (You'd think that all BSDen would have the same issue, but the buildfarm shows otherwise.)

BTW, this does not seem to explain the symptoms shown at http://www.postgresql.org/message-id/4fe4d89a.8020...@kaltenbrunner.cc because what we were seeing there was that *all* signals appeared to be blocked.  However, after this round of debugging I no longer have a lot of faith in OpenBSD's ps, because it was lying to me about whether the process had signals blocked or not (or at least, it couldn't see the effects of the interrupt signal disable, although when I added debugging code to print the active signal mask according to sigprocmask() I got told the truth).  So I'm not sure how much trust to put in those older ps results.  It's possible that the previous failures were a manifestation of something related to this bug.

			regards, tom lane
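[Editor's illustration of the mechanism described above --- a standalone sketch, not PostgreSQL code; the real fix is PG_SETMASK(&UnBlockSig), approximated here with raw sigprocmask():]

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static sigjmp_buf mainloop_buf;

    static void
    cancel_handler(int signo)
    {
        sigset_t    unblock;

        /*
         * While this handler runs, SIGINT is blocked.  Normally the kernel
         * restores the old mask when the handler returns -- but we leave
         * via siglongjmp(), so (per the OpenBSD behavior described above)
         * the mask may never be restored.  Unblocking explicitly before
         * jumping mirrors the proposed PG_SETMASK(&UnBlockSig) fix.
         */
        sigemptyset(&unblock);
        sigaddset(&unblock, SIGINT);
        sigprocmask(SIG_UNBLOCK, &unblock, NULL);

        siglongjmp(mainloop_buf, 1);
    }

    int
    main(void)
    {
        struct sigaction sa;

        sa.sa_handler = cancel_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;        /* SIGINT is auto-blocked during the handler */
        sigaction(SIGINT, &sa, NULL);

        /* sigsetjmp(..., 1) saves the mask, as in the backend main loop */
        if (sigsetjmp(mainloop_buf, 1) != 0)
            printf("canceled; a second Ctrl-C should still be deliverable\n");

        for (;;)
            pause();            /* stand-in for the backend main loop */
    }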
Re: [HACKERS] WIP: index support for regexp search
On Wed, Apr 3, 2013 at 12:36 AM, Erikjan Rijkers wrote:
> On Mon, April 1, 2013 23:15, Alexander Korotkov wrote:
>> [trgm-regexp-0.14.patch.gz]
>
> Hi Alexander,

Hi Erik!

> Something went wrong in this version of the patch: many (most) queries
> that were earlier spectacularly fast have become slow, often slower than
> a seqscan or only marginally faster.  See the attached numbers; it
> compares head(seqscan) with trgm-regex patch versions 13 and 14.
>
> I did not even complete the test-run because version 14 is so clearly
> inferior to 13 (and earlier, as far as I can remember).
>
> (let me know if you want the whole result, I can run it overnight)

Yes, there was a bug in the new color map scan implementation.  It is fixed in the attached version.  Could you re-run the tests, please?

The attached version of the patch also addresses the problem that, with the API we are now using, it's not correct to use special values of color number.  It introduces an additional structure, ExtendedColor, which contains an ExtendedColorClass field for handling the additional "virtual" colors introduced by the analysis.

--
With best regards,
Alexander Korotkov.

[Attachment: trgm-regexp-0.15.patch.gz --- GNU Zip compressed data]
Re: [HACKERS] Getting to 9.3 beta
On 29 March 2013 15:05, Tom Lane wrote:
> Bruce Momjian writes:
>> Our final 9.3 commit-fest has exceeded the two-month mark, so it is
>> time to start targeting a date to close it and get to 9.3 beta.  I see
>> 25 items still needing attention before we can close it:
>
>> https://commitfest.postgresql.org/action/commitfest_view?id=17
>
>> What is a reasonable timeframe to target for completion of these items?
>
> TBH, once Andrew commits the JSON patch, I wouldn't have a problem with
> moving all the rest to Returned With Feedback or the next CF.  None of
> the others seem to me to be close-to-committable (with the possible
> exception of the Max LSN patch, which I've not looked at), and April is
> not the time to be doing development.

Agreed.

I completely agree with the need to end this CF here/now/soon.  As someone who has normally tried to stretch the limit on what is possible, I do recognise the need for this to come to an end.  (Certainly my own available time is running out.)

Thanks very much to everybody that tried to get so much into the release.

> Next week is going to be tied up with the back-branch releases, but
> maybe we could target beta for the week after?  The main gating factor
> at this point really would be how quickly we could write some draft
> release notes, so people know what to test.

I think we probably need a little more time for open items discussion and lists (not more feature commits, just docs and I-said-I'd-change-thats).  In the past that has taken a couple of weeks to work through, so I'd guess there's a few things to sort out there.

So I suggest about 23 April for Beta1.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
On 4/1/13 5:44 AM, Amit Kapila wrote:
> I think in that case we can have 3 separate patches
> 1. Memory growth defect fix
> 2. Default postgresql.conf to include config directory and SET Persistent
>    into single file implementation
> 3. Rearrangement of GUC validation into validate_conf_option function.
>
> As already there is review happened for point 2 and point 1 is an existing
> code defect fix, so in my opinion
> Patch 1 and 2 should be considered for 9.3.

They have been considered for 9.3, I just doubt they could get committed right now.  In order for this to go in as part of the very last 9.3 feature changes (which is where we're at in the development cycle), you'd need to have a committer volunteer to take on the job of doing a second level of review on it.  And then that would have to happen without any other issues popping up.  That usually is not how it works.  Normally the first round of committer review finds another round of issues, and there's at least one more update before commit.  (Note that this is exactly what just happened today, with the review from Peter Eisentraut)

I'm not saying it's impossible for this feature to go in to 9.3, but I'm not seeing any committers volunteer to take on the job either.  I do want to see this feature go in--I'll update it for 9.4 even if you don't--but we're already into April.  There isn't much time left for 9.3.  And the security release this week has gobbled up a good chunk of committer and packager time unexpectedly, which is just bad luck for your submission.

From a process perspective, features that enter the last CF of a release that are very close to ready from the start have good odds of being committed.  You've done an excellent job of updating this in response to feedback, but it has involved a long list of changes so far.  It's fair to say this was still a rough feature at the start of CF 2013-01, and now it's good but can be usefully polished a bit more.

For something of this size, going from rough feature to commit quality normally takes more than one CF.  I don't have any obvious errors to point out right now.  But I think there's still some room to improve on this before commit.  Andres mentioned on another thread that he thought merging some of your ideas with the version Zoltan did was useful to look at, and I was thinking of something similar.  This is close to being ready, and I hope you won't get discouraged just because it's probably going to slip to 9.4.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.com
Re: [HACKERS] WIP: index support for regexp search
On Mon, April 1, 2013 23:15, Alexander Korotkov wrote:
> [trgm-regexp-0.14.patch.gz]

Hi Alexander,

Something went wrong in this version of the patch: many (most) queries that were earlier spectacularly fast have become slow, often slower than a seqscan or only marginally faster.  See the attached numbers; it compares head (seqscan) with trgm-regex patch versions 13 and 14.

I did not even complete the test-run because version 14 is so clearly inferior to 13 (and earlier, as far as I can remember).

(let me know if you want the whole result, I can run it overnight)

Thanks,

Erik Rijkers

[Attachment: re-head-13-14-20130402-2219.txt.bz2 --- BZip2 compressed data]
Re: [HACKERS] Spin Lock sleep resolution
On Tue, 2 Apr 2013 09:01:36 -0700 Jeff Janes wrote:
> Sorry.  I triple checked that the patch was there, but it seems like if you
> save a draft with an attachment, when you come back later to finish and
> send it, the attachment may not be there anymore.  The Gmail Offline team
> still has a ways to go.  Hopefully it is actually there this time.

I'll give the patch a try.  I have a workload that is impacted by spinlocks fairly heavily sometimes, and this might help or at least give me more information.

Thanks!

-dg

--
David Gould      da...@sonic.net
If simplicity worked, the world would be overrun with insects.
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
Kevin Grittner wrote:
> Tom Lane wrote:
>> Another reason why I don't like this code is that
>> pg_relation_is_scannable is broken by design:
>>
>>     relid = PG_GETARG_OID(0);
>>     relation = RelationIdGetRelation(relid);
>>     result = relation->rd_isscannable;
>>     RelationClose(relation);
>>
>> You can't do that: if the relcache entry doesn't already exist,
>> this will try to construct one while not holding any lock on the
>> relation, which is subject to all sorts of race conditions.
>
> Hmm.  I think I had that covered earlier but messed up in
> rearranging to respond to review comments.  Will review both new
> calling locations.

For the SQL-level function, does this look OK?:

diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..94e55f0 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -850,9 +850,13 @@ pg_relation_is_scannable(PG_FUNCTION_ARGS)
 	bool		result;
 
 	relid = PG_GETARG_OID(0);
-	relation = RelationIdGetRelation(relid);
+	relation = try_relation_open(relid, AccessShareLock);
+
+	if (relation == NULL)
+		PG_RETURN_BOOL(false);
+
 	result = relation->rd_isscannable;
-	RelationClose(relation);
+	relation_close(relation, AccessShareLock);
 
 	PG_RETURN_BOOL(result);
 }

I think the call in ExecCheckRelationsScannable() is safe because it comes after the tables are all already locked.  I put it there so that the appropriate lock strength should be used, based on whether it was locked by ExecInitNode() or something before it.  Am I missing something?  Can I not count on the lock being held at that point?  Would the right level of API here be relation_open() with NoLock rather than RelationIdGetRelation()?  Or is there some other call which is more appropriate there?

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Page replacement algorithm in buffer cache
Sent from my iPad

On 02-Apr-2013, at 23:41, Merlin Moncure wrote:
> On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma wrote:
>> On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas wrote:
>>
>>> One thought I had for fiddling with usage_count is to make it grow
>>> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
>>> sure the idea is any good, but one problem with the current system is
>>> that it's pretty trivial for a buffer to accumulate five touches, and
>>> after that we lose all memory of what the frequency of access is, so
>>> pages of varying different levels of "hotness" can all have usage
>>> count 5.  This might allow a little more refinement without letting
>>> the time to degrade the usage count get out of control.
>>
>> This is just off the top of my head, but one possible solution could
>> be to quantize the levels of hotness.  Specifically, we could
>> categorize buffers based on hotness.  All buffers start in level 1 and
>> usage_count 0.  Whichever buffer reaches usage_count of 5, and the next
>> clock sweep which wants to increment its usage_count (hence taking it
>> above 5) sees that, it promotes the buffer to the next level, and
>> resets its usage_count to 0.  Same logic applies for each level.  When
>> we decrement usage_count and see that it is zero (for some buffer), if
>> it is in a level > 1, we demote the buffer to the next lower level.  If
>> the buffer is in level 1, it is a potential candidate for replacement.
>>
>> This will allow us to have a loose idea about the hotness of a page,
>> without actually storing the usage_count for a buffer.  We can still
>> update usage_count without locking, as buffers in high contention
>> which miss an update in their usage_count won't be affected by that
>> missed update, in accordance with all the discussion upthread.
>
> how is that different from usage_count itself?  usage_count *is* a
> measure of hotness.  the arbitrary cap at 5 is paranoia to prevent the
> already considerable damage that occurs in the situation Andres is
> talking about (where everything is marked 'hot' so you have to sweep a
> lot more).
>
> also, any added complexity in terms of manipulating usage_count is a
> move away from the lockless maintenance I'm proposing.  maybe my idea
> is a non-starter on that basis alone, but the mechanic should be kept
> as simple as possible.  the idea to move it to the bgwriter is to
> pre-emptively do the work that backends are now doing: try and keep
> ahead of the allocations being done so that buffer requests are
> satisfied quickly.

I agree, we want to reduce the complexity of usage_count.  I was only musing on the point Robert raised, if we want to continue using usage_count and refine it for more accurate tracking of hotness of a buffer.

Regards,

Atri
Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD
Tom Lane wrote:
> Kevin Grittner writes:
>> Tom Lane wrote:
>>> So maybe I'm nuts to care about the performance of an
>>> assert-enabled backend, but I don't really find a 4X runtime
>>> degradation acceptable, even for development work.  Does anyone
>>> want to fess up to having caused this, or do I need to start
>>> tracking down what changed?
>
>> I checked master HEAD for a dump of regression and got about 4
>> seconds.  I checked right after my initial push of matview code
>> and got 2.5 seconds.  I checked just before that and got 1
>> second.  There was some additional pg_dump work for matviews
>> after the initial push which may or may not account for the rest
>> of the time.
>
> I poked at this a bit, and eventually found that the performance
> differential goes away if I dike out the
> pg_relation_is_scannable() call in getTables()'s table-collection
> query.  What appears to be happening is that those calls cause a
> great many more relcache entries to be made than used to happen,
> plus many entries are made earlier in the run than they used to
> be.  Many of those entries have subsidiary memory contexts, which
> results in an O(N^2) growth in the time spent in AllocSetCheck,
> since postgres.c does MemoryContextCheck(TopMemoryContext) once
> per received command, and pg_dump will presumably issue O(N)
> commands in an N-table database.
>
> So one question that brings up is whether we need to dial back
> the amount of work done in memory context checking, but I'm loath
> to do so in development builds.  That code has fingered an awful
> lot of bugs.  OTOH, if the number of tables in the regression DB
> continues to grow, we may have little choice.
>
> Anyway, the immediate takeaway is that this represents a horribly
> expensive way for pg_dump to find out which matviews aren't
> scannable.  The cheap way for it to find out would be if we had a
> boolean flag for that in pg_class.  Would you review the bidding
> as to why things were done the way they are?  Because in general,
> having to ask the kernel something is going to suck in any case,
> so basing it on the size of the disk file doesn't seem to me to
> be basically a good thing.

The early versions of the patch had a boolean in pg_class to track this, but I got complaints from Robert and Noah (and possibly others?) that this got too ugly in combination with the support for unlogged matviews, and they suggested the current approach.

For an unlogged matview we need to replace the heap (main fork) with the init fork before the database is up and running, so there would need to be some way for that to result in forcing the flag in pg_class.  I was attempting to do that when a matview which was flagged as scannable in pg_class was opened, but I gotta agree that it wasn't pretty.  Noah suggested a function based on testing the heap to see if it looked like the init fork, and the current state of affairs is my attempt to make it work that way.

We could definitely minimize the overhead by only testing this for matviews.  It seemed potentially useful for the function to have some purpose for other types of relations, so I was trying to avoid that.  Maybe that was a bad idea.

> We could alleviate the pain by changing pg_dump's query to
> something like
>
>   (case when c.relkind = 'm'
>    then pg_relation_is_scannable(c.oid)
>    else false end)

Yeah, that's the sort of thing I was thinking of.  If we're going to go that way, we may want to touch one or two other spots.

> but TBH this feels like bandaging a bad design.

So far nobody has been able to suggest a better way to support both unlogged matviews and some way to prevent a matview from being used if it does not contain materialized data for its query from *some* point in time.  Suggestions welcome.

> Another reason why I don't like this code is that
> pg_relation_is_scannable is broken by design:
>
>     relid = PG_GETARG_OID(0);
>     relation = RelationIdGetRelation(relid);
>     result = relation->rd_isscannable;
>     RelationClose(relation);
>
> You can't do that: if the relcache entry doesn't already exist,
> this will try to construct one while not holding any lock on the
> relation, which is subject to all sorts of race conditions.

Hmm.  I think I had that covered earlier but messed up in rearranging to respond to review comments.  Will review both new calling locations.

> In general, direct calls on RelationIdGetRelation are probably
> broken.

If valid contexts for use of the function are that limited, it might merit a comment where the function is defined.  I'm not sure what a good alternative to this would be.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma wrote:
> On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas wrote:
>
>> One thought I had for fiddling with usage_count is to make it grow
>> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
>> sure the idea is any good, but one problem with the current system is
>> that it's pretty trivial for a buffer to accumulate five touches, and
>> after that we lose all memory of what the frequency of access is, so
>> pages of varying different levels of "hotness" can all have usage
>> count 5.  This might allow a little more refinement without letting
>> the time to degrade the usage count get out of control.
>
> This is just off the top of my head, but one possible solution could
> be to quantize the levels of hotness.  Specifically, we could
> categorize buffers based on hotness.  All buffers start in level 1 and
> usage_count 0.  Whichever buffer reaches usage_count of 5, and the next
> clock sweep which wants to increment its usage_count (hence taking it
> above 5) sees that, it promotes the buffer to the next level, and
> resets its usage_count to 0.  Same logic applies for each level.  When
> we decrement usage_count and see that it is zero (for some buffer), if
> it is in a level > 1, we demote the buffer to the next lower level.  If
> the buffer is in level 1, it is a potential candidate for replacement.
>
> This will allow us to have a loose idea about the hotness of a page,
> without actually storing the usage_count for a buffer.  We can still
> update usage_count without locking, as buffers in high contention
> which miss an update in their usage_count won't be affected by that
> missed update, in accordance with all the discussion upthread.

how is that different from usage_count itself?  usage_count *is* a measure of hotness.  the arbitrary cap at 5 is paranoia to prevent the already considerable damage that occurs in the situation Andres is talking about (where everything is marked 'hot' so you have to sweep a lot more).

also, any added complexity in terms of manipulating usage_count is a move away from the lockless maintenance I'm proposing.  maybe my idea is a non-starter on that basis alone, but the mechanic should be kept as simple as possible.  the idea to move it to the bgwriter is to pre-emptively do the work that backends are now doing: try and keep ahead of the allocations being done so that buffer requests are satisfied quickly.

merlin
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas wrote:
> One thought I had for fiddling with usage_count is to make it grow
> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
> sure the idea is any good, but one problem with the current system is
> that it's pretty trivial for a buffer to accumulate five touches, and
> after that we lose all memory of what the frequency of access is, so
> pages of varying different levels of "hotness" can all have usage
> count 5.  This might allow a little more refinement without letting
> the time to degrade the usage count get out of control.

This is just off the top of my head, but one possible solution could be to quantize the levels of hotness.  Specifically, we could categorize buffers based on hotness.  All buffers start in level 1 and usage_count 0.  Whichever buffer reaches usage_count of 5, and the next clock sweep which wants to increment its usage_count (hence taking it above 5) sees that, it promotes the buffer to the next level, and resets its usage_count to 0.  Same logic applies for each level.  When we decrement usage_count and see that it is zero (for some buffer), if it is in a level > 1, we demote the buffer to the next lower level.  If the buffer is in level 1, it is a potential candidate for replacement.

This will allow us to have a loose idea about the hotness of a page, without actually storing the usage_count for a buffer.  We can still update usage_count without locking, as buffers in high contention which miss an update in their usage_count won't be affected by that missed update, in accordance with all the discussion upthread.

Thoughts/Comments?

Regards,

Atri

--
Regards,

Atri
l'apprenant
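[Editor's illustration of Robert's additive-increase/exponential-decay idea quoted above --- a toy sketch under assumed parameters (e.g. the higher cap), not actual bufmgr code:]

    #include <stdint.h>

    #define MAX_USAGE_COUNT 15      /* assumed cap, higher than today's 5 */

    typedef struct BufferDescSketch
    {
        uint32      usage_count;    /* hotness estimate */
    } BufferDescSketch;

    /* On buffer access: grow additively, saturating at the cap. */
    static void
    touch_buffer(BufferDescSketch *buf)
    {
        if (buf->usage_count < MAX_USAGE_COUNT)
            buf->usage_count += 1;          /* x = x + 1 */
    }

    /* On each clock-sweep pass: decay exponentially instead of by 1,
     * so frequently-touched buffers keep a distinguishable count. */
    static int
    sweep_buffer(BufferDescSketch *buf)
    {
        buf->usage_count >>= 1;             /* x = x >> 1 */
        return buf->usage_count == 0;       /* eviction candidate at zero */
    }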
Re: [HACKERS] citext like searches using index
On Apr 2, 2013, at 10:16 AM, Tom Lane wrote:
>> Is this knowledge encapsulated in a to-do?
>
> I added an item to the "Indexes" section of the TODO page.

Great, thanks.

David
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 18:26:23 +0100, Greg Stark wrote:
> I'm confused by this thread.  We *used* to maintain an LRU.  The whole
> reason for the clock-sweep algorithm is precisely to avoid maintaining
> a linked list of least recently used buffers since the head of that
> list is a point of contention.

I don't think anybody is proposing to put the LRU back into a linked list; given the frequency of usagecount manipulations, that would probably end pretty badly.

What I think Robert, Tom and I are talking about is putting *some* buffers with usagecount = 0 into a linked list, so that when a backend requires a new page it can take one buffer from the freelist instead of

a) possibly touching quite a few pages (I have seen sweeps visit *every* existing buffer header 4 times) to find one with usagecount = 0
b) having to write the page out itself

If everything is going well, that would mean only the bgwriter (or a bgfreelist process or whatever) performs the clock sweep.  Others take *new* pages from the freelist instead of performing part of the sweep themselves.

Makes more sense?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
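[Editor's sketch of the division of labor Andres describes --- a background process refilling a freelist that backends drain.  All names, the single-lock scheme, and the clock_sweep_next_victim() helper are illustrative assumptions, not bufmgr code:]

    #include <pthread.h>

    /* A toy freelist of clean, usagecount==0 buffers behind one lock. */
    typedef struct FreeListSketch
    {
        pthread_mutex_t lock;
        int         ids[128];       /* buffer ids ready for immediate reuse */
        int         nfree;
    } FreeListSketch;

    /* Stub standing in for the clock sweep; a real one would decay
     * usage counts and write out dirty victims before handing them over. */
    static int
    clock_sweep_next_victim(void)
    {
        static int  next = 0;
        return next++ % 128;
    }

    /* Background-writer side: run the sweep and keep the list topped up. */
    static void
    refill_freelist(FreeListSketch *fl, int target)
    {
        pthread_mutex_lock(&fl->lock);
        while (fl->nfree < target)
            fl->ids[fl->nfree++] = clock_sweep_next_victim();
        pthread_mutex_unlock(&fl->lock);
    }

    /* Backend side: grab a ready buffer; sweep only if the list is empty. */
    static int
    get_free_buffer(FreeListSketch *fl)
    {
        int         buf = -1;

        pthread_mutex_lock(&fl->lock);
        if (fl->nfree > 0)
            buf = fl->ids[--fl->nfree];
        pthread_mutex_unlock(&fl->lock);

        if (buf < 0)
            buf = clock_sweep_next_victim();    /* slow path, as today */
        return buf;
    }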
Re: [HACKERS] Page replacement algorithm in buffer cache
I'm confused by this thread.  We *used* to maintain an LRU.  The whole reason for the clock-sweep algorithm is precisely to avoid maintaining a linked list of least recently used buffers, since the head of that list is a point of contention.
Re: [HACKERS] citext like searches using index
"David E. Wheeler" writes: > On Apr 2, 2013, at 8:03 AM, Tom Lane wrote: >>> Are there any widely known non-built-in cases besides citext? >> Well, indxpath.c knows about text LIKE and network subset operators, >> and it would be nice if it knew how to do the same type of optimization >> for range inclusion, ie btree_indexed_col <@ range_constant. The latter >> doesn't seem implementable in the current infrastructure because ranges >> aren't all built-in types. I agree that the citext case isn't too >> compelling in isolation, but surely the range case is interesting. On further reflection, I withdraw the claim that range-inclusion couldn't be implemented in the current design. Although the various range types might not be built-in, the "anyelement <@ anyrange" operator *is* built-in, so its OID could be added to the switch statements in indxpath.c. I don't think it'd be terribly difficult to then add datatype-agnostic code to pry apart the range value and construct a derived "btree_indexed_col >= lowbound AND btree_indexed_col <= highbound" indexclause. But this could not be extended to citext or other plugin extensions, because their operators don't have hard-wired OIDs. Anyway, even if the specific claim about ranges is bogus, I still think there are enough data points to justify the idea that a more extensible mechanism would be worth having. At the same time, there's a reason why it's not yet got to the top of anyone's priority list. > Is this knowledge encapsulated in a to-do? I added an item to the "Indexes" section of the TODO page. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
> Andres Freund writes:
>> On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
>>> I agree in general, though I'm not sure the bgwriter process can
>>> reasonably handle this need along with what it's already supposed to be
>>> doing.  We may need another background process that is just responsible
>>> for keeping the freelist populated.
>
>> What else is the bgwriter actually doing otherwise?  Sure, it doesn't put
>> the pages on the freelist, but otherwise it's trying to do the above,
>> isn't it?
>
> I think it will be problematic to tie buffer-undirtying to putting both
> clean and dirty buffers into the freelist.  It might chance to work all
> right to use one scan process for both, but I'm afraid it's more likely
> that we'd end up either overserving one goal or underserving the other.

Hm.  I had imagined that we would only ever put clean buffers into the freelist, and that we would never write out a buffer that we don't need for a new page.  I don't see much point in randomly writing out buffers that won't be needed for something else soon.  Currently we can't do much better than basically undirtying random buffers, since we don't really know which page will reach a usagecount of zero, because the bgwriter doesn't manipulate usagecounts.

One other scenario I can see is the problem of strategy buffer reuse of dirtied pages (hint bits, pruning) during seqscans, where we would benefit from pages being written out fast, but I can't imagine that that could be handled very well by something like the bgwriter?  Am I missing something?

> Also note the entire design of BgBufferSync right now is predicated on
> the assumption that the rate of motion of the scan strategy point
> reflects the rate at which new buffers are needed.  If backends are
> supposed to always get new buffers from the freelist, that logic becomes
> circular: the bgwriter would be watching itself.  Perhaps we can
> refactor the feedback control loop logic so that the buffer scan rate is
> driven by changes in the length of the freelist, but I'm not sure
> exactly what the consequences would be.

Yea, that will definitely need refactoring.  What I am imagining is that the pacing is basically built on top of a few StrategyControl variables like:

* number of buffers taken from the freelist
* number of buffers acquired by a backend because the freelist was empty
* number of buffers written out by a backend because the freelist was empty

Those should be pretty cheap to maintain and should be enough to implement sensible pacing for the bgwriter (see the sketch after this message).

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
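[Editor's sketch of what those pacing counters might look like --- field names and the control rule are assumptions, not patch code; the idea is that bgwriter drives its scan rate from deltas of these counters rather than from the strategy point:]

    #include <stdint.h>

    /* Assumed additions to shared strategy state for freelist pacing. */
    typedef struct StrategyPacingCounters
    {
        uint64      bufs_from_freelist;     /* allocations served from freelist */
        uint64      bufs_alloc_fallback;    /* backend had to sweep: list empty */
        uint64      bufs_written_by_backend;/* backend wrote a dirty victim */
    } StrategyPacingCounters;

    /*
     * One possible bgwriter control rule: if fallback allocations grew
     * since the last cycle, the freelist is draining faster than it is
     * refilled, so raise the refill target; decay slowly otherwise.
     */
    static int
    next_refill_target(const StrategyPacingCounters *now,
                       const StrategyPacingCounters *prev,
                       int cur_target)
    {
        uint64      misses = now->bufs_alloc_fallback - prev->bufs_alloc_fallback;

        if (misses > 0)
            return cur_target * 2;      /* ramp up aggressively on misses */
        if (cur_target > 16)
            return cur_target - 1;      /* decay slowly when keeping up */
        return cur_target;
    }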
Re: [HACKERS] Page replacement algorithm in buffer cache
Andres Freund writes:
> On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
>> I agree in general, though I'm not sure the bgwriter process can
>> reasonably handle this need along with what it's already supposed to be
>> doing.  We may need another background process that is just responsible
>> for keeping the freelist populated.

> What else is the bgwriter actually doing otherwise?  Sure, it doesn't put
> the pages on the freelist, but otherwise it's trying to do the above,
> isn't it?

I think it will be problematic to tie buffer-undirtying to putting both clean and dirty buffers into the freelist.  It might chance to work all right to use one scan process for both, but I'm afraid it's more likely that we'd end up either overserving one goal or underserving the other.

Also note the entire design of BgBufferSync right now is predicated on the assumption that the rate of motion of the scan strategy point reflects the rate at which new buffers are needed.  If backends are supposed to always get new buffers from the freelist, that logic becomes circular: the bgwriter would be watching itself.  Perhaps we can refactor the feedback control loop logic so that the buffer scan rate is driven by changes in the length of the freelist, but I'm not sure exactly what the consequences would be.

			regards, tom lane
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
> Robert Haas writes:
>> But, having said that, I still think the best idea is what Andres
>> proposed, which pretty much matches my own thoughts: the bgwriter
>> needs to populate the free list, so that buffer allocations don't have
>> to wait for linear scans of the buffer array.  That's just plain too
>> slow.
>
> I agree in general, though I'm not sure the bgwriter process can
> reasonably handle this need along with what it's already supposed to be
> doing.  We may need another background process that is just responsible
> for keeping the freelist populated.

What else is the bgwriter actually doing otherwise?  Sure, it doesn't put the pages on the freelist, but otherwise it's trying to do the above, isn't it?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-02 11:54:32 -0400, Robert Haas wrote:
> On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure wrote:
>> That's a very fair point, although not being able to evict pinned
>> buffers is a highly mitigating aspect.  Also CLOG is a different beast
>> entirely -- it's much more dense (2 bits!) vs a tuple, so a single page
>> can hold a lot of high priority things.  But you could be right anyways.
>>
>> Given that, I wouldn't feel very comfortable with forced eviction
>> without knowing for sure high priority buffers were immune from that.
>> Your nailing idea is maybe the ideal solution.  Messing around with
>> the usage_count mechanic is tempting (like raising the cap and making
>> the sweeper more aggressive as it iterates), but probably really
>> difficult to get right, and, hopefully, ultimately moot.
>
> One thought I had for fiddling with usage_count is to make it grow
> additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
> sure the idea is any good, but one problem with the current system is
> that it's pretty trivial for a buffer to accumulate five touches, and
> after that we lose all memory of what the frequency of access is, so
> pages of varying different levels of "hotness" can all have usage
> count 5.  This might allow a little more refinement without letting
> the time to degrade the usage count get out of control.
>
> But, having said that, I still think the best idea is what Andres
> proposed, which pretty much matches my own thoughts: the bgwriter
> needs to populate the free list, so that buffer allocations don't have
> to wait for linear scans of the buffer array.  That's just plain too
> slow.

That way the usagecount should go down far more slowly, which essentially makes it more granular.  And I think fiddling on that level before we have a more sensible buffer acquisition implementation is pretty premature, since that will change too much.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Page replacement algorithm in buffer cache
Robert Haas writes:
> But, having said that, I still think the best idea is what Andres
> proposed, which pretty much matches my own thoughts: the bgwriter
> needs to populate the free list, so that buffer allocations don't have
> to wait for linear scans of the buffer array.  That's just plain too
> slow.

I agree in general, though I'm not sure the bgwriter process can reasonably handle this need along with what it's already supposed to be doing.  We may need another background process that is just responsible for keeping the freelist populated.

			regards, tom lane
Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]
I'm going to ignore most of the discussion that led up to this and give this patch a fresh look. + + SET PERSISTENT max_connections To 10; + The To should probably be capitalized. I doubt this example actually works because changing max_connections requires a restart. Try to find a better example. It's weird that SET LOCAL and SET SESSION actually *set* the value, and the second key word determines how long the setting will last. SET PERSISTENT doesn't actually set the value. I predict that this will be a new favorite help-it-doesn't-work FAQ. SCHEMA ! SET [ PERSISTENT ] SCHEMA 'value' is an alias for SET search_path TO value. Only one schema can be specified using this syntax. I don't think [ PERSISTENT ] needs to be added in these and similar snippets. We don't mention LOCAL etc. here either. --- 34,41 postgresql.conf, pg_hba.conf, and pg_ident.conf are traditionally stored in PGDATA (although in PostgreSQL 8.0 and ! later, it is possible to place them elsewhere). By default the directory config is stored ! in PGDATA, however it should be kept along with postgresql.conf. This chunk doesn't make any sense to me. "Should" is always tricky. Why should I, and why might I not? + config + Subdirectory containing automatically generated configuration files + + + base Subdirectory containing per-database subdirectories Only automatically generated ones? COPY_STRING_FIELD(name); COPY_NODE_FIELD(args); COPY_SCALAR_FIELD(is_local); + COPY_SCALAR_FIELD(is_persistent); return newnode; } I suggest changing is_local into a new trivalued field that stores LOCAL or SESSION or PERSISTENT. n->is_local = false; $$ = (Node *) n; } + | SET PERSISTENT set_persistent + { + VariableSetStmt *n = $3; + n->is_persistent = true; + $$ = (Node *) n; + } ; set_rest: Why can't you use SET PERSISTENT set_rest? *** a/src/backend/replication/basebackup.c --- b/src/backend/replication/basebackup.c *** *** 755,760 sendDir(char *path, int basepathlen, bool sizeonly) --- 755,766 strlen(PG_TEMP_FILE_PREFIX)) == 0) continue; + /* skip auto conf temporary file */ + if (strncmp(de->d_name, + PG_AUTOCONF_FILENAME ".", + sizeof(PG_AUTOCONF_FILENAME)) == 0) + continue; + Maybe pg_basebackup should be taught to ignore certain kinds of temporary files in general. The file name shouldn't be hardcoded into pg_basebackup. This would effectively make the configuration file naming scheme part of the replication protocol. See other thread about pg_basebackup client/server compatibility. This needs to be generalized. (Side thought: Does pg_basebackup copy editor temp and backup files?) + ereport(elevel, + (errmsg("Configuration parameters changed with SET PERSISTENT command before start of server " + "will not be effective as \"%s\" file is not accessible.", PG_AUTOCONF_FILENAME))); I'm having trouble interpreting this: How can you change something with SET PERSISTENT before the server starts? Also: errmsg messages should start with a lowercase letter and should generally be much shorter. Please review other cases in your patch as well. + appendStringInfoString(&buf, "# Do not edit this file manually! " + "It is overwritten by the SET PERSISTENT command \n"); There is some punctuation or something missing at the end. I suggest you break this into two lines. It's pretty long. I think the naming of these files is suboptimal: + #define PG_AUTOCONF_DIR "config" "config" would seem to include pg_hba.conf and others. Or postgresql.conf for that matter. Maybe I should move postgresql.conf into config/. Another great new FAQ. 
The normal convention for this is "postgresql.conf.d". I don't see any reason not to use that. One reason that was brought up is that this doesn't match other things currently in PGDATA, but (a) actually it does (hint: postgresql.conf), and (b) neither does "config".

+ #define PG_AUTOCONF_FILENAME "persistent.auto.conf"

"auto" sounds like the result of an automatic tuning tool. Why not just persistent.conf? Or set-persistent.conf? That makes the association clear enough.

I don't know why the concept of a configuration directory is being introduced here. I mean, I'm all for that, but it seems completely independent of this. Some people
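Returning to the copyfuncs/gram.y comments above: a minimal C sketch of the suggested trivalued field (the names here are hypothetical; the actual change would land in parsenodes.h and is up to the patch author) might look like this:

/* Hypothetical sketch only -- not the patch's actual definition. */
typedef enum VariableSetScope
{
	VAR_SET_SCOPE_SESSION,		/* SET SESSION ... (the default) */
	VAR_SET_SCOPE_LOCAL,		/* SET LOCAL ... */
	VAR_SET_SCOPE_PERSISTENT	/* SET PERSISTENT ... */
} VariableSetScope;

typedef struct VariableSetStmtSketch
{
	char		   *name;		/* name of the variable to be set */
	struct List	   *args;		/* list of A_Const nodes */
	VariableSetScope scope;		/* replaces is_local + is_persistent */
} VariableSetStmtSketch;

With a single scope field, copyfuncs.c would carry one COPY_SCALAR_FIELD(scope) instead of one boolean per flavor, and adding further scopes later would not require another flag.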
Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)
Heikki Linnakangas writes:
> I'm quite worried about the security ramifications of this patch. Today, if
> you're not sure if a system has e.g sslinfo installed, you can safely just
> run "CREATE EXTENSION sslinfo". With this patch, that's no longer true,
> because "foo" might not be the extension you're looking for. Mallory

With the attached patch, you can't create a template for an extension that is already available, so to protect against your scenario you only have to make "sslinfo" available. Please also note that when actually installing the "sslinfo" extension, the one from the system will get preferred over the one from the templates, should you have both available.

Now, I can see why you would still think it's not enough. Barring better ideas, the current patch restricts the feature to superusers.

> Documentation doesn't build, multiple errors. In addition to the reference
> pages, there should be a section in the main docs about these templates.

I'm now working on that, setting up the documentation tool set.

>> postgres=# create template for extension myextension version '1.0' with ()
>> as 'foobar';
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=# create extension myextension;
>> ERROR:  syntax error at or near "foobar"
>> LINE 1: create extension myextension;
>>         ^
>
> Confusing error message.

Introducing a syntax error in hstore--1.1.sql leads to the same message. Even if we agree that it must be changed, I think that's for another patch.

  ~# create extension hstore;
  ERROR:  syntax error at or near "foobar" at character 115
  STATEMENT:  create extension hstore;
  ERROR:  syntax error at or near "foobar"

>> postgres=# create template for extension myextension version '1.0' with ()
>> as $$create table foobar(i int4) $$;
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=# create extension myextension;
>> CREATE EXTENSION
>> postgres=# select * from foobar;
>> ERROR:  relation "foobar" does not exist
>> LINE 1: select * from foobar;
>>                       ^
>
> Where did that table go?

The control->schema was not properly initialized when not given by the user. That's fixed, and I added a regression test.

> Ah, here... Where did that "1.0" schema come from?

Fixed too, was the same bug.

>> postgres=> create template for extension myextension version '1.0' with
>> (schema public) as $$ create function evilfunc() returns int4 AS
>> 'evilfunc' language internal; $$;
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=> create extension myextension version '1.0';
>> ERROR:  permission denied for language internal
>> postgres=> drop template for extension myextension version '1.0';
>> ERROR:  extension with OID 16440 does not exist
>
> Something wrong with catalog caching.

Works for me; I couldn't reproduce the bug here, working from Álvaro's version 4 of the patch. Maybe he did already fix it, and you tested my version 3?

>> $ make -s install
>> /usr/bin/install: cannot stat `./hstore--1.0.sql': No such file or directory
>> make: *** [install] Error 1
>
> Installing hstore fails.

Fixed in the attached. Seeing that makes me think that you actually used Álvaro's version 4, though.

>> postgres=> create template for extension sslinfo version '1.0' with
>> (schema public) as $$ create function evilfunc() returns int4 AS
>> 'evilfunc' language internal; $$;
>> ERROR:  extension "sslinfo" is already available

Expected.
>> postgres=> create template for extension sslinfo2 version '1.0' with
>> (schema public) as $$ create function evilfunc() returns int4 AS
>> 'evilfunc' language internal; $$;
>> CREATE TEMPLATE FOR EXTENSION
>> postgres=> alter template for extension sslinfo2 rename to sslinfo;
>> ALTER TEMPLATE FOR EXTENSION
>
> If we check for an existing extension at CREATE, should also check for that
> in ALTER ... RENAME TO.

Indeed, good catch. Fixed in the attached version 5 of the patch.

I didn't add a regression test for that case, because we need to know which extensions are available when we try to obtain this error, and I don't know how to do that. We could create a template for the extension foobar, then foobar2, then rename foobar2 to foobar, but that wouldn't exercise the same code path.

Thanks again for your review,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

templates.v5.patch.gz
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Spin Lock sleep resolution
On Monday, April 1, 2013, Tom Lane wrote:
> Jeff Janes writes:
>> The problem is that the state is maintained only to an integer number of
>> milliseconds starting at 1, so it can take a number of attempts for the
>> random increment to jump from 1 to 2, and then from 2 to 3.
>
> Hm ... fair point, if you assume that the underlying OS has a sleep
> resolution finer than 1ms.  Otherwise it would not matter.

Let's say you get a long stretch of increments that are all a ratio of <1.5 fold; for simplicity, let's say they are all 1.3 fold. When you do intermediate truncations of the state variable, it never progresses at all.

perl -le '$foo=1; foreach (1..10) {$foo*=1.3; print int $foo}'
perl -le '$foo=1; foreach (1..10) {$foo*=1.3; $foo=int $foo; print int $foo}'

Obviously the true stochastic case is not quite that stark.

> No patch seen here ...

Sorry. I triple checked that the patch was there, but it seems like if you save a draft with an attachment, when you come back later to finish and send it, the attachment may not be there anymore. The Gmail Offline team still has a ways to go.

Hopefully it is actually there this time.

Cheers,

Jeff

spin_delay_microsecond_v1.patch
Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
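Jeff's perl one-liners above translate directly into C. This is a sketch of the arithmetic only, with made-up variable names, not the actual s_lock.c code: a multiplicative backoff whose state is truncated to whole milliseconds after every step can get stuck, while a finer-grained state progresses normally.

#include <stdio.h>

/*
 * Demonstrate the truncation pathology: grow a delay by 1.3x per step,
 * once keeping the state precise and once truncating it to whole
 * milliseconds after each step.
 */
int
main(void)
{
	double	precise = 1.0;		/* state kept finer than 1 ms */
	int		truncated = 1;		/* state truncated to whole ms each step */

	for (int i = 1; i <= 10; i++)
	{
		precise *= 1.3;
		truncated = (int) (truncated * 1.3);	/* 1 * 1.3 -> 1, forever */
		printf("step %2d: precise=%2d ms  truncated=%d ms\n",
			   i, (int) precise, truncated);
	}
	return 0;
}

The precise state prints 1, 1, 2, 2, 3, ... while the truncated state stays at 1 indefinitely, which is presumably the effect the attached microsecond-resolution patch is meant to remove.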
Re: [HACKERS] citext like searches using index
On Apr 2, 2013, at 8:03 AM, Tom Lane wrote:
>> Are there any widely known non-built-in cases besides citext?
>
> Well, indxpath.c knows about text LIKE and network subset operators,
> and it would be nice if it knew how to do the same type of optimization
> for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
> doesn't seem implementable in the current infrastructure because ranges
> aren't all built-in types.  I agree that the citext case isn't too
> compelling in isolation, but surely the range case is interesting.

Is this knowledge encapsulated in a to-do?

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure wrote:
> That's a very fair point, although not being able to evict pinned
> buffers is a highly mitigating aspect.  Also CLOG is a different beast
> entirely -- it's much more dense (2 bits!) vs a tuple, so a single page
> can hold a lot of high priority things.  But you could be right anyways.
>
> Given that, I wouldn't feel very comfortable with forced eviction
> without knowing for sure high priority buffers were immune from that.
> Your nailing idea is maybe the ideal solution.  Messing around with
> the usage_count mechanic is tempting (like raising the cap and making
> the sweeper more aggressive as it iterates), but probably really
> difficult to get right, and, hopefully, ultimately moot.

One thought I had for fiddling with usage_count is to make it grow additively (x = x + 1) and decay exponentially (x = x >> 1). I'm not sure the idea is any good, but one problem with the current system is that it's pretty trivial for a buffer to accumulate five touches, and after that we lose all memory of what the frequency of access is, so pages of varying levels of "hotness" can all have usage count 5. This might allow a little more refinement without letting the time to degrade the usage count get out of control.

But, having said that, I still think the best idea is what Andres proposed, which pretty much matches my own thoughts: the bgwriter needs to populate the free list, so that buffer allocations don't have to wait for linear scans of the buffer array. That's just plain too slow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
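For concreteness, the additive-increase / exponential-decay idea above could look like this minimal C sketch. The names and the cap are hypothetical; the real logic would live in the buffer manager:

#include <stdint.h>

#define USAGE_COUNT_CAP 64		/* hypothetical higher cap; current code uses 5 */

/* Called when a backend touches (pins) the buffer. */
static inline void
usage_on_access(uint8_t *usage_count)
{
	if (*usage_count < USAGE_COUNT_CAP)
		(*usage_count)++;		/* additive increase: x = x + 1 */
}

/* Called when the clock sweep passes over the buffer. */
static inline void
usage_on_sweep(uint8_t *usage_count)
{
	*usage_count >>= 1;			/* exponential decay: x = x >> 1 */
}

At steady state, halving per sweep pass plus one increment per touch settles the count near twice the number of touches per sweep cycle (up to the cap), so buffers of different hotness end up with distinguishable counts instead of all saturating at 5.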
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 9:55 AM, Robert Haas wrote:
> On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure wrote:
>> That seems pretty unlikely because of A) sheer luck of hitting that
>> page for the dropout (if your buffer count is N the chances of losing
>> it would seem to be 1/N at most) and B) highly used pages are much more
>> likely to be pinned and thus immune from eviction.  But my issue with
>> this whole line of analysis is that I've never been able to turn it
>> up in simulated testing.  Probably to do it you'd need very very fast
>> storage.
>
> Well, if you have shared_buffers=8GB, that's a million buffers.  One
> in a million events happen pretty frequently on a heavily loaded
> server, which, on recent versions of PostgreSQL, can support several
> hundred thousand queries per second, each of which accesses multiple
> buffers.
>
> I've definitely seen evidence that poor choices of which CLOG buffer
> to evict can result in a noticeable system-wide stall while everyone
> waits for it to be read back in.  I don't have any similar evidence
> for shared buffers, but I wouldn't be very surprised if the same
> danger exists there, too.

That's a very fair point, although not being able to evict pinned buffers is a highly mitigating aspect. Also CLOG is a different beast entirely -- it's much more dense (2 bits!) vs a tuple, so a single page can hold a lot of high priority things. But you could be right anyways.

Given that, I wouldn't feel very comfortable with forced eviction without knowing for sure high priority buffers were immune from that. Your nailing idea is maybe the ideal solution. Messing around with the usage_count mechanic is tempting (like raising the cap and making the sweeper more aggressive as it iterates), but probably really difficult to get right, and, hopefully, ultimately moot.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
Peter Eisentraut writes:
> On 4/2/13 10:26 AM, Tom Lane wrote:
>> The issue with the LIKE special case is that left-anchored patterns
>> are (to some extent) indexable with ordinary btree indexes, and so we
>> want to exploit that rather than tell people they have to have a whole
>> other index.

> In practice, you need an index specifically for pattern matching anyway.

Well, you do if you want to search for general patterns, but there's been enough interest in the LIKE optimization as-is to make me think we could not get away with removing it.

>> So it's a pretty special case, but there are just enough instances of it
>> to wish for some not-so-hard-wired way to deal with it.

> Are there any widely known non-built-in cases besides citext?

Well, indxpath.c knows about text LIKE and network subset operators, and it would be nice if it knew how to do the same type of optimization for range inclusion, ie btree_indexed_col <@ range_constant. The latter doesn't seem implementable in the current infrastructure because ranges aren't all built-in types. I agree that the citext case isn't too compelling in isolation, but surely the range case is interesting.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure wrote:
> That seems pretty unlikely because of A) sheer luck of hitting that
> page for the dropout (if your buffer count is N the chances of losing
> it would seem to be 1/N at most) and B) highly used pages are much more
> likely to be pinned and thus immune from eviction.  But my issue with
> this whole line of analysis is that I've never been able to turn it
> up in simulated testing.  Probably to do it you'd need very very fast
> storage.

Well, if you have shared_buffers=8GB, that's a million buffers. One in a million events happen pretty frequently on a heavily loaded server, which, on recent versions of PostgreSQL, can support several hundred thousand queries per second, each of which accesses multiple buffers.

I've definitely seen evidence that poor choices of which CLOG buffer to evict can result in a noticeable system-wide stall while everyone waits for it to be read back in. I don't have any similar evidence for shared buffers, but I wouldn't be very surprised if the same danger exists there, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On Tue, Apr 2, 2013 at 6:32 AM, Andres Freund wrote:
> On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
>> On 3/23/13 7:41 AM, Ants Aasma wrote:
>>> Yes, having bgwriter do the actual cleaning up seems like a good idea.
>>> The whole bgwriter infrastructure will need some serious tuning. There
>>> are many things that could be shifted to background if we knew it
>>> could keep up, like hint bit setting on dirty buffers being flushed
>>> out. But again, we have the issue of having good tests to see where
>>> the changes hurt.
>>
>> I think at some point we need to stop depending on just bgwriter for all
>> this stuff. I believe it would be much cleaner if we had separate procs for
>> everything we needed (although some synergies might exist; if we wanted to
>> set hint bits during write then bgwriter *is* the logical place to put that).
>>
>> In this case, I don't think keeping stuff on the free list is close enough
>> to checkpoints that we'd want bgwriter to handle both. At most we might want
>> them to pass some metrics back and forth.
>
> bgwriter isn't doing checkpoints anymore; there's the checkpointer since 9.2.
>
> In my personal experience and measurement, bgwriter is pretty close to
> useless right now. I think - pretty similar to what Amit has done - it
> should perform part of a real clock sweep, instead of just looking ahead
> of the current position without changing usagecounts and the sweep
> position, and put enough buffers on the freelist to sustain the need till
> its next activity phase. I hacked around that one night in a hotel and
> got impressive speedups (and quite some breakage) for bigger than s_b
> workloads.
>
> That would reduce quite a few pain points:
> - fewer different processes/cpus looking at buffer headers ahead in the cycle
> - fewer cpus changing usagecounts
> - dirty pages are far more likely to be flushed out already when a new
>   page is needed
> - stuff like the relation extension lock, which right now frequently has
>   to search far and wide for new pages while holding the extension lock
>   exclusively, should finish quite a bit faster
>
> If the freelist lock is separated from the lock protecting the clock
> sweep, this should get quite a bit of a scalability boost without the
> potential unfairness you can have with partitioning the lock or such.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
On 4/2/13 10:26 AM, Tom Lane wrote:
> The issue with the LIKE special case is that left-anchored patterns
> are (to some extent) indexable with ordinary btree indexes, and so we
> want to exploit that rather than tell people they have to have a whole
> other index.

In practice, you need an index specifically for pattern matching anyway. The difference would be that instead of using a different operator class that has no recorded association with the original operator, you'd use a different access method that is associated with the operator in normal ways.

> So it's a pretty special case, but there are just enough instances of it
> to wish for some not-so-hard-wired way to deal with it.

Are there any widely known non-built-in cases besides citext?

Presumably the reason you use citext is that you use a lot of letters. So I kind of doubt that there are going to be a lot of cases of pattern searches on citext with left-anchored patterns starting with (a sufficient number of) non-letters. It's possible, of course, but doesn't seem important enough by itself.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] citext like searches using index
Peter Eisentraut writes:
> On 3/30/13 11:35 PM, Tom Lane wrote:
>> The LIKE index optimization is hard-wired into
>> match_special_index_operator(), which never heard of citext's ~~
>> operators.
>>
>> I've wanted for years to replace that mechanism with something that
>> would support plug-in extensions, but have no very good idea how to
>> do it.

> I have been thinking there should be a GiST index that associates with
> the pattern matching operators.  I haven't worked out the details, but
> at least this seems like it might be the right framework to solve this
> problem.

No, not really. You can already build things like that with GiST; see pg_trgm (which already does LIKE and will soon be able to do regexes).

The issue with the LIKE special case is that left-anchored patterns are (to some extent) indexable with ordinary btree indexes, and so we want to exploit that rather than tell people they have to have a whole other index. It doesn't make sense IMO to try to mark ~~ as a btree operator, because btree doesn't really have any ability for operator-specific code to do what would have to be done. What does make sense is for the planner to understand how to extract a btree-indexable clause out of what's in the query, as that (a) keeps the complexity out of the run-time machinery, and (b) provides an opportunity for the planner to estimate whether the whole thing is worth the trouble or not, which it frequently isn't in LIKE cases.

So it's a pretty special case, but there are just enough instances of it to wish for some not-so-hard-wired way to deal with it.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
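To make the "extract a btree-indexable clause" idea concrete, here is a heavily simplified C sketch of deriving range bounds from a left-anchored pattern. Everything here is illustrative: it handles ASCII bytes only and increments the last byte naively, whereas the real machinery (match_special_index_operator() and make_greater_string()) must also cope with collations, multibyte encodings, escape characters, and byte overflow.

#include <stdio.h>
#include <string.h>

/*
 * Given a left-anchored LIKE pattern, extract its fixed prefix and build
 * the btree-indexable range it implies: col >= prefix AND col < upper.
 * Returns 0 if the pattern has no fixed prefix (not indexable this way).
 */
static int
like_prefix_bounds(const char *pattern, char *lower, char *upper, size_t buflen)
{
	size_t		n = 0;

	/* copy bytes up to the first wildcard */
	while (pattern[n] && pattern[n] != '%' && pattern[n] != '_' &&
		   n < buflen - 1)
	{
		lower[n] = pattern[n];
		n++;
	}
	lower[n] = '\0';
	if (n == 0)
		return 0;

	/* upper bound: the prefix with its last byte incremented */
	memcpy(upper, lower, n + 1);
	upper[n - 1]++;
	return 1;
}

int
main(void)
{
	char		lo[64], hi[64];

	if (like_prefix_bounds("foo%", lo, hi, sizeof(lo)))
		printf("col >= '%s' AND col < '%s'\n", lo, hi);	/* foo .. fop */
	return 0;
}

The derived quals become index conditions, while the original LIKE is still applied as a filter, which is also why the planner gets to weigh whether the derived range is selective enough to be worth it.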
Re: [HACKERS] citext like searches using index
On 3/30/13 11:35 PM, Tom Lane wrote:
> The LIKE index optimization is hard-wired into
> match_special_index_operator(), which never heard of citext's ~~
> operators.
>
> I've wanted for years to replace that mechanism with something that
> would support plug-in extensions, but have no very good idea how to
> do it.

I have been thinking there should be a GiST index that associates with the pattern matching operators. I haven't worked out the details, but at least this seems like it might be the right framework to solve this problem.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page replacement algorithm in buffer cache
On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
> On 3/23/13 7:41 AM, Ants Aasma wrote:
>> Yes, having bgwriter do the actual cleaning up seems like a good idea.
>> The whole bgwriter infrastructure will need some serious tuning. There
>> are many things that could be shifted to background if we knew it
>> could keep up, like hint bit setting on dirty buffers being flushed
>> out. But again, we have the issue of having good tests to see where
>> the changes hurt.
>
> I think at some point we need to stop depending on just bgwriter for all this
> stuff. I believe it would be much cleaner if we had separate procs for
> everything we needed (although some synergies might exist; if we wanted to
> set hint bits during write then bgwriter *is* the logical place to put that).
>
> In this case, I don't think keeping stuff on the free list is close enough to
> checkpoints that we'd want bgwriter to handle both. At most we might want
> them to pass some metrics back and forth.

bgwriter isn't doing checkpoints anymore; there's the checkpointer since 9.2.

In my personal experience and measurement, bgwriter is pretty close to useless right now. I think - pretty similar to what Amit has done - it should perform part of a real clock sweep, instead of just looking ahead of the current position without changing usagecounts and the sweep position, and put enough buffers on the freelist to sustain the need till its next activity phase. I hacked around that one night in a hotel and got impressive speedups (and quite some breakage) for bigger than s_b workloads.

That would reduce quite a few pain points:
- fewer different processes/cpus looking at buffer headers ahead in the cycle
- fewer cpus changing usagecounts
- dirty pages are far more likely to be flushed out already when a new page is needed
- stuff like the relation extension lock, which right now frequently has to search far and wide for new pages while holding the extension lock exclusively, should finish quite a bit faster

If the freelist lock is separated from the lock protecting the clock sweep, this should get quite a bit of a scalability boost without the potential unfairness you can have with partitioning the lock or such.

Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
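As a rough illustration of that division of labor, here is a pseudo-C sketch. All names here (ClockSweepNext, FreelistPush, and friends) are assumptions made up for this sketch, not PostgreSQL APIs; the real code would live around freelist.c and the bgwriter loop.

#include <stdbool.h>

/* A stand-in for the parts of a buffer header the sweep cares about. */
typedef struct SketchBufferDesc
{
	int		buf_id;
	int		usage_count;
	int		refcount;			/* pin count */
	bool	dirty;
} SketchBufferDesc;

/* assumed primitives, hypothetical for this sketch */
extern SketchBufferDesc *ClockSweepNext(void);	/* advances the shared sweep hand */
extern void	FlushSketchBuffer(SketchBufferDesc *buf);
extern void	FreelistPush(SketchBufferDesc *buf);
extern int	FreelistSize(void);

/*
 * Run by the bgwriter during each activity phase: do the clock sweep
 * work (decay usage counts, flush dirty victims) so backends find
 * ready-to-reuse buffers on the freelist instead of sweeping themselves.
 */
static void
BgWriterStockFreelist(int target)
{
	while (FreelistSize() < target)
	{
		SketchBufferDesc *buf = ClockSweepNext();

		if (buf->refcount > 0)
			continue;			/* pinned; leave it alone */

		if (buf->usage_count > 0)
		{
			buf->usage_count--;	/* decay, exactly as a backend sweep would */
			continue;
		}

		if (buf->dirty)
			FlushSketchBuffer(buf);	/* write out ahead of need */

		FreelistPush(buf);		/* clean victim, ready for the next allocation */
	}
}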
Re: [HACKERS] regression test failed when enabling checksum
On 2013-04-01 19:51:19 -0700, Jeff Janes wrote:
> On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes wrote:
>> On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis wrote:
>>> Patch attached. Only brief testing done, so I might have missed
>>> something. I will look more closely later.
>>
>> After applying your patch, I could run the stress test described here:
>>
>> http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php
>>
>> But altered to make use of initdb -k, of course.
>>
>> Over 10,000 cycles of crash and recovery, I encountered two cases of
>> checksum failures after recovery, example:
>> ...
>>
>> Unfortunately I already cleaned up the data directory before noticing the
>> problem, so I have nothing to post for forensic analysis. I'll try to
>> reproduce the problem.
>
> I've reproduced the problem, this time in block 74 of relation
> base/16384/4931589, and a tarball of the data directory is here:
>
> https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing
>
> (the table is in database jjanes under role jjanes, the binary is commit
> 9ad27c215362df436f8c)
>
> What I would probably really want is the data as it existed after the crash
> but before recovery started, but since the postmaster immediately starts
> recovery after the crash, I don't know of a good way to capture this.
>
> I guess one thing to do would be to extract from the WAL the most recent
> FPW for block 74 of relation base/16384/4931589 (assuming it hasn't been
> recycled already) and see if it matches what is actually in that block of
> that data file, but I don't currently know how to do that.

Since I bragged somewhere else recently that it should be easy to do now that we have pg_xlogdump, I hacked it up so it dumps all the full page writes into the directory specified by --dump-bkp=PATH. It currently overwrites previous full page writes to the same page, but that should be trivial to change if you want, by adding %X.%X for the LSN into the path sprintf.
Greetings,

Andres Freund

--
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

>From 09cdce611dc74082901ca1a646135a5ea1af709c Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Tue, 2 Apr 2013 11:43:23 +0200
Subject: [PATCH] pg_xlogdump: add option for dumping full page writes into a
 directory

---
 contrib/pg_xlogdump/pg_xlogdump.c | 56 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index d6d5498..a5e4186 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -40,6 +40,7 @@ typedef struct XLogDumpConfig
 	bool		bkp_details;
 	int			stop_after_records;
 	int			already_displayed_records;
+	char	   *dump_bkp;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -373,6 +374,46 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
 				   bkpb.block, bkpb.hole_offset, bkpb.hole_length);
 		}
 	}
+
+	if (config->dump_bkp != NULL)
+	{
+		int			bkpnum;
+		char	   *blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+		for (bkpnum = 0; bkpnum < XLR_MAX_BKP_BLOCKS; bkpnum++)
+		{
+			BkpBlock	bkpb;
+			char		bkpbdata[BLCKSZ];
+			int			outfd;
+			char		outpath[MAXPGPATH];
+
+			if (!(XLR_BKP_BLOCK(bkpnum) & record->xl_info))
+				continue;
+
+			memcpy(&bkpb, blk, sizeof(BkpBlock));
+			blk += sizeof(BkpBlock);
+
+			memcpy(bkpbdata, blk, bkpb.hole_offset);
+			memset(bkpbdata + bkpb.hole_offset, 0, bkpb.hole_length);
+			memcpy(bkpbdata + bkpb.hole_offset + bkpb.hole_length,
+				   blk + bkpb.hole_offset,
+				   BLCKSZ - bkpb.hole_offset - bkpb.hole_length);
+
+			blk += BLCKSZ - bkpb.hole_length;
+
+			sprintf(outpath, "%s/%u-%u-%u:%u_%s", config->dump_bkp,
+					bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode,
+					bkpb.block, forkNames[bkpb.fork]);
+			outfd = open(outpath, O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR);
+			if (outfd < 0)
+				fatal_error("could not open output file %s: %s",
+							outpath, strerror(errno));
+			if (write(outfd, bkpbdata, BLCKSZ) != BLCKSZ)
+				fatal_error("could not successfully write output block %s: %s",
+							outpath, strerror(errno));
+			close(outfd);
+		}
+	}
 }
 
 static void
@@ -387,6 +428,7 @@ usage(void)
 	printf("  -?, --help             show this help, then exit\n");
 	printf("\nContent options:\n");
 	printf("  -b, --bkp-details      output detailed information about backup blocks\n");
+	printf("  -d, --dump-bkp=path    dump all full page images into PATH\n");
 	printf("  -e, --end=RECPTR       stop reading at log position RECPTR\n");
 	printf("  -n, --limit=N          number of records to display\n");
 	printf("  -p, --path=PATH        directory in which to find log segment files\n");
@@ -413,6 +455,7 @@ main(int argc, char **argv)
 	static struct option l
Re: [HACKERS] regression test failed when enabling checksum
On 2 April 2013 02:53, Jeff Davis wrote:
>> Any idea
>> what is going on?
>
> Not right now.

Since I'm now responsible for the quality of this patch, I need to say this before someone else does: we have until the end of the week to fix this conclusively, or I will need to consider whether to revert the patch in this release.

Thanks for your time and trouble to locate problems. It may be that the patch is revealing an underlying bug, so we don't yet have evidence for the source of the bug.

> My primary suspect is what's going on in
> visibilitymap_set() and heap_xlog_visible(), which is more complex than
> some of the other code. That would require some VACUUM activity, which
> isn't in your workload -- do you think autovacuum may kick in sometimes?

Other candidates might be:

* the page hole is non-empty, so the overwrite by the full page write may leave the block in an intermediate state. Tom recently fixed RestoreBkpBlock() to avoid it zeroing the contents first before applying the page.

* something to do with access to GetPageLSN()

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: [HACKERS] Should array_length() Return NULL)
On 29/03/13 13:12, Brendan Jurd wrote:
> On 28 March 2013 20:34, Dean Rasheed wrote:
>> Is the patch also going to allow empty arrays in higher dimensions
>> where not just the last dimension is empty? It doesn't allow that at
>> present. It seems as though, if it's allowing 1-by-0 arrays like '{{}}'
>> and '[4:4][8:7]={{}}', it should also allow 0-by-0 arrays like
>> '[4:3][8:7]={}', and 0-by-3 arrays like '[4:3][11:13]={}'.
>
> I think the array literal parser would reject this on the grounds that
> the brace structure doesn't match the dimension specifiers. You could
> modify that check to respect zero length in dimensions other than the
> innermost one, but it's hard to say whether it would be worth the
> effort.
>
> It might be instructive to hear from somebody who does (or intends to)
> use multidim arrays for some practical purpose, but I don't even know
> whether such a person exists within earshot of this list.
>
> Cheers,
> BJ

Multiple dimension arrays 'might' be something I may need, or at least usefully use, in my current project. I am not ready to do the database design to that level of detail yet.

However, I know I will need to do 'clever'(TM) things with multiple image areas on images, and possibly group several images together (and still access their individual image areas). So potentially arrays of dimension 2 or greater may be useful. Certainly, some images will not have any image areas defined, and some arrays might be empty.

Cheers,
Gavin