Re: [HACKERS] Triggers on foreign tables

2013-10-10 Thread Ronan Dunklau
On Sunday 6 October 2013 at 22:33:23, Kohei KaiGai wrote:
 2013/9/10 Ronan Dunklau rdunk...@gmail.com:
  For row-level triggers, it seems more complicated. From what I understand,
  OLD/NEW tuples are fetched from the heap using their ctid (except for
  BEFORE INSERT triggers). How could this be adapted for foreign tables ?
 
 It seems to me the origin of the difficulty in supporting row-level triggers
 is that foreign tables do not have a back-door to see the older tuple to be
 updated, unlike regular tables.
 Core PostgreSQL knows the on-disk format used to store tuples within regular
 tables, thus GetTupleForTrigger() can fetch a tuple identified by tupleid.
 On the other hand, a writable foreign table is also designed to identify a
 remote tuple with tupleid, as long as the FDW module takes responsibility
 for that.

It is my understanding that the tupleid is one way for an FDW to identify a 
tuple, but the AddForeignUpdateTargets hook allows arbitrary resjunk 
columns to be added. It is these columns that are then used to identify the 
tuple to update. I don't think the tupleid itself can be used to identify a 
tuple for the trigger.
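
For reference, a rough sketch of such a hook, modeled on what postgres_fdw
does with ctid (the function name here is illustrative; an FDW could instead
inject whatever columns identify a remote row):

#include "postgres.h"
#include "access/sysattr.h"		/* SelfItemPointerAttributeNumber */
#include "catalog/pg_type.h"	/* TIDOID */
#include "nodes/makefuncs.h"	/* makeVar, makeTargetEntry */
#include "utils/rel.h"

static void
exampleAddForeignUpdateTargets(Query *parsetree,
							   RangeTblEntry *target_rte,
							   Relation target_relation)
{
	/* Var referencing the ctid system column of the target relation */
	Var		   *var = makeVar(parsetree->resultRelation,
							  SelfItemPointerAttributeNumber,
							  TIDOID, -1, InvalidOid, 0);

	/* Wrap it in a resjunk target entry with a recognizable name... */
	TargetEntry *tle = makeTargetEntry((Expr *) var,
									   list_length(parsetree->targetList) + 1,
									   pstrdup("ctid"),
									   true);

	/* ...and append it to the query's target list. */
	parsetree->targetList = lappend(parsetree->targetList, tle);
}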

 So, a straightforward idea is adding a new FDW interface to get an older
 image of the tuple to be updated. That makes it possible to implement
 something similar to GetTupleForTrigger() on foreign tables, even though it
 takes a secondary query to fetch a record from the remote server. Probably,
 it is a headache for us.

 One thing to pay attention to is that the tuple to be updated has already
 been fetched from the remote server, and the most recently fetched tuple is
 (always?) the target of the update or delete. It is not a heavy job for the
 ForeignScanState node to hold a copy of the latest tuple. Isn't referencing
 the relevant ForeignScanState an available way to get the older image?

It is however possible to have only an incomplete tuple.

The FDW may have only fetched the tupleid-equivalent, and we would have to 
ensure that the reltargetlist contains every attribute that the trigger could 
need, or perform a second round-trip to the foreign data store to fetch the 
whole tuple.

 If my assumption is right, all we need to enhance are (1) add a store on
 ForeignScanState to hold the last tuple on the scan stage, (2) add
 GetForeignTupleForTrigger() to reference the store above on the relevant
 ForeignScanState.

I would add a 3): have a GetTupleForTrigger() hook that would get called with 
the stored tuple.

I personally don't like this approach: there are already (n+1) round trips to 
the foreign store (one during the scan phase, and one per tuple during the 
update/delete phase); we don't need another one per tuple for the triggers.



Re: [HACKERS] PSQL return coder

2013-10-10 Thread Tom Lane
James Sewell james.sew...@lisasoft.com writes:
 My question is: in a rollback scenario, is it possible to get psql to return
 a non-zero exit status?

Maybe you could use -c instead of -f?

$ psql -c 'select 1; select 1/0' regression
ERROR:  division by zero
$ echo $?
1

You won't need explicit BEGIN/END because this is already a single
transaction.

regards, tom lane




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-10-10 Thread Andres Freund
On 2013-10-02 13:16:06 +0900, Michael Paquier wrote:
 Each patch applied with its parents compiles, has no warnings AFAIK
 and passes regression/isolation tests. Working on 0004 by the end of
 the CF seems out of the way IMO, so I'd suggest focusing on 0002 and
 0003 now, and I can put some time to finalize them for this CF. I
 think that we should perhaps split 0003 into 2 pieces, with one patch
 for the introduction of index_concurrent_build, and another for
 index_concurrent_set_dead. Comments are welcome about that though, and
 if people agree on that I'll do it once 0002 is finalized.

FWIW I don't think splitting off index_concurrent_build is worthwhile...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fast gin cache performance improvement

2013-10-10 Thread Etsuro Fujita
Ian Link wrote:
  Although I asked this question, I've reconsidered these
  parameters, and it seems that they not only make the code
  rather complex but are also a little confusing to users.  So I'd like to
  propose introducing only one parameter: fast_cache_size.  While users
  that give weight to update performance for the fast update technique set
  this parameter to a large value, users that give weight not only to
  update performance but also to search performance set this parameter to
  a small value.  What do you think about this?
 I think it makes sense to maintain this separation. If the user doesn't need
 a per-index setting, they don't have to use the parameter. Since the parameter
 is off by default, they don't even need to worry about it.
 There might as well be one parameter for users that don't need fine-grained
 control. We can document this and I don't think it will be confusing to the
 user.

OK, though I'd like to hear the opinion of others.

  4. In my understanding, a small value of
  gin_fast_limit/fast_cache_size increases GIN search performance but
  decreases GIN update performance.  Am I right?  If so, I think the
  tradeoff should be noted in the documentation.
 I believe this is correct.

  5. The following documents in Chapter 57. GIN Indexes need to be
  updated: * 57.3.1. GIN Fast Update Technique * 57.4. GIN Tips and
  Tricks
 Sure, I can add something.

  6. I would like to see the results for the additional test cases
  (tsvectors).
 I don't really have any good test cases for this available, and have very
 limited time for postgres at the moment. Feel free to create a test case, but
 I don't believe I can at the moment. Sorry!

Unfortunately, I don't have much time to do such a thing either, though I think
we should do it.  (In addition, we should do another performance test to
clarify the influence of introducing these parameters on fast update
performance, which would be required to determine their default values.)

  7. The commented-out elog() code should be removed.
 Sorry about that, I shouldn't have submitted the patch with those still there.

 I should have a new patch soonish, hopefully. Thanks for your feedback!

I think it is desirable that this patch be resubmitted for the next
CommitFest for the further review and testing mentioned above.  So I'd like to
mark this patch as Returned with Feedback.  Is that OK?

Thanks,

Best regards,
Etsuro Fujita






Re: [HACKERS] CommitFest progress

2013-10-10 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
   The CommitFest is supposed to be a time to
 *commit the patches that are ready to be committed*, not to wait
 indefinitely for them to become ready to be committed.

I beg to differ. Commit Fests are the time when patch authors know they
can get feedback from the community and in particular committers.

So as a patch author it's best if you can arrange your schedule and be
ready to submit new versions as asked, or comment on your design choices
and trade-offs, etc.

Patch commits can happen whenever in the cycle at the discretion of the
committer. Commit Fests are all about *review* and *feedback*.

 I therefore propose that we start by marking all of the patches that
 are currently Waiting on Author as Returned with Feedback.  Most of
 them have been that way for a long time.

That seems fair.

 Then, I think all of the people who are listed as reviewers need to
 take a look at the current state of their patches and decide whether
 or not they are reasonably ready to be committed.  If they are, then

I've been distracted away from this commit fest but should be able to
get back to it now. Will post soon about the patches I enrolled myself
with.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70

The V7 patch applied cleanly and I found no issues in my first tests.

The change from the column session_start to a function seems very reasonable 
to me.


Concerning usability, I would like to suggest a minor change that massively 
increases the usefulness of the patch for beginners, who often use this view as 
a first approach to optimizing index structure.


The history of this tool contains a first version without normalization.
That wasn't useful enough except for prepared queries.
The current version has normalized queries, so calls get
summarized to give a glimpse of bad queries.
But the drawback of this approach is that it is impossible to run
EXPLAIN ANALYZE without further substitutions.

The identification patch provides the possibility to summarize calls
by query_id, so that the normalized query string itself no longer needs to 
be exposed in the view for everyone.


I suggest adding a parameter to recover the possibility of
displaying real queries. The following very minor change
(based on V7) exposes the first real query getting a given
query_id if normalization of the exposed string is deactivated (the current 
behaviour remains the default). This new option is not always the best starting 
point to discover index shortfalls, but a huge gain for beginners because it 
serves the needs in more than 90% of the normal use cases.

What do you think?

Arne

P.S. This message is being resent because it was stalled for two days, so sorry 
for a possible duplicate.

Date:   Mon Oct 7 17:54:08 2013 +

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int	pgss_max;		/* max # statements to track */
 static int	pgss_track;		/* tracking level */
 static bool	pgss_track_utility; /* whether to track utility commands */
 static bool	pgss_save;		/* whether to save stats across shutdown */
-
+static bool	pgss_normalize;	/* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */
 
 #define pgss_enabled() \
 	(pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
 							 NULL,
 							 NULL);
 
+	DefineCustomBoolVariable("pg_stat_statements.normalize",
+			"Selects whether the view column contains the query strings in a normalized form.",
+							 NULL,
+							 &pgss_normalize,
+							 true,
+							 PGC_SUSET,
+							 0,
+							 NULL,
+							 NULL,
+							 NULL);
+
 	EmitWarningsOnPlaceholders("pg_stat_statements");
 
 	/*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,
 
 	query_len = strlen(query);
 
-	if (jstate)
+	if (jstate && pgss_normalize)
 	{
-		/* Normalize the string if enabled */
+		/* Normalize the string if jstate is not NULL and normalized query strings are enabled */
 		norm_query = generate_normalized_query(jstate, query,
 											   query_len,
 											   &key.encoding);
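
For completeness, here is how the new setting would presumably be used (a
sketch based only on the patch above; nothing beyond the GUC it defines is
assumed):

# postgresql.conf -- sketch, assuming the patch above is applied
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.normalize = off	# show the first real query seen per
					# query_id (default is on, i.e. the
					# current normalized behaviour)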








Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Sameer Thakur
Please find patch attached which adds documentation for session_start
and introduced fields and corrects documentation for queryid to be
query_id. session_start remains in the view as agreed.
regards
Sameer


pg_stat_statements-identification-v8.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 3:11 AM,  pilum...@uni-muenster.de wrote:
 But the drawback of this approach is impossibility to use
 explain analyze without further substitutions.

You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally
executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.


-- 
Peter Geoghegan




Re: [HACKERS] CommitFest progress

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 5:21 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
   The CommitFest is supposed to be a time to
 *commit the patches that are ready to be committed*, not to wait
 indefinitely for them to become ready to be committed.

 I beg to differ. Commit Fests are the time when patch authors know they
 can get feedback from the community and in particular committers.

 So as a patch author it's best if you can arrange your schedule and be
 ready to submit new versions as asked, or comment on your design choices
 and trade-offs, etc.

 Patch commit can happen whenever in the cycle at the discretion of the
 committer. Commit Fest are all about *review* and *feedback*.

Sure, I don't disagree with any of that.

 I therefore propose that we start by marking all of the patches that
 are currently Waiting on Author as Returned with Feedback.  Most of
 them have been that way for a long time.

 That seems fair.

 Then, I think all of the people who are listed as reviewers need to
 take a look at the current state of their patches and decide whether
 or not they are reasonably ready to be committed.  If they are, then

 I've been distracted away from this commit fest but should be able to
 get back to it now. Will post soon about the patches I enrolled myself
 with.

Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Wed, Oct 9, 2013 at 11:35 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Oct 9, 2013 at 8:20 PM, Bruce Momjian br...@momjian.us wrote:
 I am not sure that having that external to the backend really makes
 sense because I am concerned people will not use it.  We can certainly
 add it to change our defaults, of course.  Also consider many installs
 are automated.

 Sure.

 I was imagining that we'd want to write the tool with the idea in mind
 that it was usually run immediately after initdb. We'd reach out to
 packagers to have them push it into the hands of users where that's
 practical.

 If you think that sounds odd, consider that on at least one popular
 Linux distro, installing MySQL will show an ncurses interface where the
 mysql password is set. We wouldn't need anything as fancy as that.

I actually had the thought that it might be something we'd integrate
*into* initdb.  So you'd do initdb --system-memory 8GB or something
like that and it would do the rest.  That'd be slick, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 1:23 AM, Magnus Hagander mag...@hagander.net wrote:
 I think it would be even simpler, and more reliable, to start with the
 parameter to initdb - I like that. But instead of having it set a new
 variable based on that and then autotune off that, just have *initdb*
 do these calculations you're suggesting, and write new defaults to the
 files (preferably with a comment).

 That way if the user *later* comes in and, say, changes shared_buffers,
 we don't dynamically resize work_mem into a value that might cause
 his machine to die from swapping, which would definitely violate the
 principle of least surprise.

+1 for all of that.  I completely agree.
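
To illustrate the idea (a purely hypothetical sketch; neither the option nor
these particular numbers exist today), initdb might then write something like
this into postgresql.conf:

# Hypothetical output of: initdb --system-memory 8GB
shared_buffers = 2GB            # set by initdb from --system-memory;
                                # re-run the calculation if RAM changes
work_mem = 16MB                 # set by initdb from --system-memory
maintenance_work_mem = 512MB    # set by initdb from --system-memory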

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-10 Thread Heikki Linnakangas

On 09.10.2013 02:04, Tomas Vondra wrote:

On 8.10.2013 21:59, Heikki Linnakangas wrote:

On 08.10.2013 17:47, Alexander Korotkov wrote:

Hi, Tomas!

On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondra t...@fuzzy.cz wrote:


I've attempted to rerun the benchmark tests I did a few weeks ago, but
I got repeated crashes when loading the data (into a table with a
tsvector+gin index).

Right before a crash, there's this message in the log:

    PANIC:  not enough space in leaf page!



Thanks for testing. Heikki's version of the patch doesn't work for me either,
even on much simpler examples. I can try to get it working if he answers
my question about the GinDataLeafPageGetPostingList* macros.


The new macros in that patch version were quite botched. Here's a new
attempt.


Nope, still the same errors :-(

PANIC:  not enough space in leaf page!
LOG:  server process (PID 29722) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages


I've continued hacking away at the patch, here's yet another version. 
I've done a lot of cleanup and refactoring to make the code more 
readable (I hope). I'm not sure what caused the panic you saw, but it's 
probably fixed now.  Let me know if not.


Some notable changes since my last patch version:

* I changed the ginbtree interface so that the isEnoughSpace method is no 
more. Instead, placeToPage returns false without doing anything if the 
item doesn't fit. It was slightly simpler this way when working with the 
compressed posting lists, as placeToPage had to calculate tuple sizes 
anyway to decide how many items fit on the page. (A sketch follows after 
this list.)


* I rewrote incrUpdateItemIndexes(). It now operates in two stages: 1. 
adjust the pageOffsets and 2. split the bin. I found it more readable 
this way.


* I changed the WAL format of insert records so that there is a separate 
struct for data-leaf, data-internal, and entry pages. The information 
recorded for each of those was so different that it was confusing to 
cram them all into the same struct.
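
To make the first change above concrete, an illustrative sketch of the merged
method (names, signature, and the computeItemSize helper are mine, not the
patch's actual code; GinBtree comes from gin_private.h):

/*
 * Illustrative only: placeToPage() now returns false, without touching
 * the page, when the incoming item does not fit; the caller then splits
 * the page.  The separate isEnoughSpace() step is gone.
 */
static bool
dataPlaceToPage(GinBtree btree, Buffer buf, void *insertdata)
{
	Page	page = BufferGetPage(buf);
	Size	itemsize = computeItemSize(insertdata);	/* hypothetical helper */

	if (itemsize > PageGetExactFreeSpace(page))
		return false;			/* does not fit: caller must split */

	/* ... compress and place the item on the page ... */
	return true;
}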



I'm going to step back a bit from the details of the patch now. I think 
there's enough work for you, Alexander, until the next commitfest:


* Try to make the code also work with the old page format, so that you 
don't need to REINDEX after pg_upgrade.


* Documentation. The new compressed posting list format needs to be 
explained somewhere. At the top of ginpostinglist.c would be a good 
place. The README should be improved too.


* Fix whatever I broke (sorry)

Are we committed to going ahead with this patch in the 9.4 timeframe, in one 
form or another? If we are, then I'd like to start committing 
refactoring patches that just move code around in preparation for the 
patch, to make the review of the core of the patch easier. For example, 
merging the isEnoughSpace and placeToPage methods in the ginbtree 
interface. Without the patch, it's unnecessary code churn - it won't do 
any harm but it won't do any good either. I'm pretty confident that this 
patch will land in the 9.4 timeframe, so barring objections I'll start 
committing such refactorings.


- Heikki


gin-packed-postinglists-10-heikki.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70

Thx for your reply.

On Thu, 10 Oct 2013, Peter Geoghegan wrote:


On Thu, Oct 10, 2013 at 3:11 AM,  pilum...@uni-muenster.de wrote:

But the drawback of this approach is impossibility to use
explain analyze without further substitutions.


You can fairly easily disable the swapping of constants with '?'
symbols, so that the query text stored would match the full originally


I thought I did?! I introduced an additional user parameter
to disable the normalization in the patch shown in my last mail.

If there is already an easier way in the current distribution, 
I simply missed it. 
Where is this behaviour documented?




executed query. Why would you want to, though? There could be many
actual plans whose costs are aggregated as one query. Seeing one of
them is not necessarily useful at all, and could be misleading.



Yeah, I mentioned that, I think (thinking of parameter ranges, for example),
but in the majority of cases beginners can easily deduce missing indexes by
executing EXPLAIN ANALYZE, because the queries that are aggregated and
displayed under one query_id have very similar (or simply the same) query
plans.

It's also only an option, disabled by default: you can simply do nothing if 
you don't like it :-)

Kind regards,

Arne Scheffer




[HACKERS] strange behavior of pg_trgm's similarity function

2013-10-10 Thread Fujii Masao
Hi,

The behavior of pg_trgm's similarity function seems strange. Is this
intentional?

I was thinking that the following three calls of the similarity function would
return the same number because the second argument is just three characters
contained in the first argument in every call.

=# SELECT similarity('12345', '123');
=# SELECT similarity('12345', '234');
=# SELECT similarity('12345', '345');

But that's not true. Each returns a different number.

=# SELECT similarity('12345', '123');
 similarity
------------
   0.428571
(1 row)

=# SELECT similarity('12345', '234');
 similarity
------------
   0.111111
(1 row)

=# SELECT similarity('12345', '345');
 similarity
------------
       0.25
(1 row)

This happens because, for example, similarity('12345', '123') returns
the similarity number of '**12345*' and '**123*' (* means the blank character),
NOT of '12345' and '123'. IOW, two and one blank characters are added at
the head and tail of each argument, respectively. I wonder why
pg_trgm's similarity function works this way. Should we change this
so that no blank characters are added to the arguments?
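
The padding is easy to see with pg_trgm's show_trgm() function, and it
accounts for the numbers above:

CREATE EXTENSION pg_trgm;

SELECT show_trgm('12345');  -- {"  1"," 12",123,234,345,"45 "}  (6 trigrams)
SELECT show_trgm('123');    -- {"  1"," 12",123,"23 "}          (4 trigrams)

-- similarity = shared trigrams / distinct total:
--   '123': "  1", " 12", "123" shared => 3 / (6 + 4 - 3) = 3/7 = 0.428571
--   '234': only "234" shared          => 1 / (6 + 4 - 1) = 1/9 = 0.111111
--   '345': "345" and "45 " shared     => 2 / (6 + 4 - 2) = 2/8 = 0.25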

Regards,

-- 
Fujii Masao




Re: [HACKERS] strange behavior of pg_trgm's similarity function

2013-10-10 Thread Heikki Linnakangas

On 10.10.2013 15:03, Fujii Masao wrote:

Hi,

The behavior of pg_trgm's similarity function seems strange. Is this
intentional?

I was thinking that the following three calls of the similarity function would
return the same number because the second argument is just three characters
contained in the first argument in every call.

=# SELECT similarity('12345', '123');
=# SELECT similarity('12345', '234');
=# SELECT similarity('12345', '345');

But that's not true. Each returns a different number.

=# SELECT similarity('12345', '123');
  similarity
 ------------
    0.428571
(1 row)

=# SELECT similarity('12345', '234');
  similarity
 ------------
    0.111111
(1 row)

=# SELECT similarity('12345', '345');
  similarity
 ------------
        0.25
(1 row)

This happens because, for example, similarity('12345', '123') returns
the similarity number of '**12345*' and '**123*' (* means the blank character),
NOT of '12345' and '123'. IOW, two and one blank characters are added at
the head and tail of each argument, respectively. I wonder why
pg_trgm's similarity function works this way. Should we change this
so that no blank characters are added to the arguments?


Well, you could also argue that '11' and '22' are quite similar, 
even though pg_trgm's similarity will not think so. It comes down to the 
definition of similarity, and how well that definition matches your 
intuition.

FWIW, it feels right to me that a match at the beginning of a word is worth 
more than one in the middle of a string. -1 on changing that.


- Heikki




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
 On Wed, Oct 9, 2013 at 7:11 PM, Stephen Frost sfr...@snowman.net wrote:
  There is definitely something to be said for simplicity, and just upping
  the default would have a more dramatic impact with a setting like
  work_mem than it would with shared_buffers, imv.
 
 Simplicity for us or for our users? 

My thinking was 'both', really.

 I wonder if we should just ship something like pgtune (in /bin, not in
 /contrib) that can be optionally used at initdb time. Making something
 like wal_buffers self-tuning is really compelling, but work_mem is
 quite different.

I'm coming around to agree with this also- doing this at initdb time
really makes more sense than during server start-up based on some
(mostly) unrelated value.

 I hear a lot of complaints about the first 15 minutes experience of
 Postgres. It's easy to scoff at this kind of thing, but I think we
 could do a lot better there, and at no real cost - the major blocker
 to doing something like that has been fixed (of course, I refer to the
 SysV shared memory limits). Is the person on a very small box where
 our current very conservative defaults are appropriate? Why not ask a
 few high-level questions like that to get inexperienced users started?

There are certainly challenges here wrt asking questions during install,
as was mentioned elsewhere, but I agree that we could do better.

 The tool could even have a parameter that allows a packager to pass
 total system memory without bothering the user with that, and without
 bothering us with having to figure out a way to make that work
 correctly and portably.

Agreed.

Thanks,

Stephen




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
* Magnus Hagander (mag...@hagander.net) wrote:
 I think it would be even simpler, and more reliable, to start with the
 parameter to initdb - I like that. But instead of having it set a new
 variable based on that and then autotune off that, just have *initdb*
 do these calculations you're suggesting, and write new defaults to the
 files (preferably with a comment).

Agreed; I especially like having the comment included.

  That way if the user *later* comes in and, say, changes shared_buffers,
  we don't dynamically resize work_mem into a value that might cause
  his machine to die from swapping, which would definitely violate the
  principle of least surprise.

+1

Thanks,

Stephen




Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-10 Thread Robert Haas
On Tue, Oct 8, 2013 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote:
 Do you have a better alternative? Making the computation unconditionally
 64bit will have a runtime overhead and adding a StaticAssert in the
 existing macro doesn't work because we use it in array sizes where gcc
 balks.
 We could try using inline functions, but that's not going to be pretty
 either.

  I don't really see that many further use cases that will align 64bit
  values on 32bit platforms, so I think we're ok for now.

I'd be inclined to make the computation unconditionally 64-bit.  I
doubt the speed penalty is enough to worry about, and I think we're
going to have more and more cases where optimizing for 32-bit
platforms is just not the right decision.
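
For reference, the macro in question is defined in src/include/c.h roughly as
below; the 64-bit variant after it is what current sources carry as
TYPEALIGN64/MAXALIGN64:

/* From c.h: the arithmetic is done in uintptr_t, which is only
 * 32 bits wide on a 32-bit platform. */
#define TYPEALIGN(ALIGNVAL,LEN)  \
	(((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))
#define MAXALIGN(LEN)		TYPEALIGN(MAXIMUM_ALIGNOF, (LEN))

/* 64-bit-safe variant: force the arithmetic to 64 bits on all platforms. */
#define TYPEALIGN64(ALIGNVAL,LEN)  \
	(((uint64) (LEN) + ((ALIGNVAL) - 1)) & ~((uint64) ((ALIGNVAL) - 1)))
#define MAXALIGN64(LEN)		TYPEALIGN64(MAXIMUM_ALIGNOF, (LEN))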

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch: FORCE_NULL option for copy COPY in CSV mode

2013-10-10 Thread Andrew Dunstan


On 10/09/2013 11:47 PM, Amit Kapila wrote:


One of the advantages I could see in using the NULL For .. syntax is
that we already have one syntax with which the user can specify what
strings can be replaced with NULL; now, just to handle quoted empty
strings, why add a different syntax?

FORCE_NULL has the advantage that it can be used for some columns rather
than all columns, but I think the existing syntax could also be
modified to support it.






I think it's badly designed on its face. We don't need and shouldn't 
provide a facility for different NULL markers. A general facility for 
that would be an ugly and quite pointless addition to code complexity. 
What we need is simply a way of altering one specific behaviour, namely 
the treatment of quoted NULL markers. We should not do that by allowing 
the NULL marker to be munged per column, but by a syntactical mechanism that 
directly addresses the change in behaviour. If you don't like FORCE 
NULL then let's pick something else, but not this ugly and unnecessary 
NULL FOR gadget.


cheers

andrew




Re: [HACKERS] space reserved for WAL record does not match what was written: panic on windows

2013-10-10 Thread Andres Freund
On 2013-10-10 08:59:47 -0400, Robert Haas wrote:
 On Tue, Oct 8, 2013 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote:
  Do you have a better alternative? Making the computation unconditionally
  64bit will have a runtime overhead and adding a StaticAssert in the
  existing macro doesn't work because we use it in array sizes where gcc
  balks.
  We could try using inline functions, but that's not going to be pretty
  either.
 
  I don't really see that many further use cases that will align 64bit
  values on 32bit platforms, so I think we're ok for now.
 
 I'd be inclined to make the computation unconditionally 64-bit.  I
 doubt the speed penalty is enough to worry about, and I think we're
 going to have more and more cases where optimizing for 32-bit
 platforms is just not the right decision.

MAXALIGN is used in several of PG's hottest functions in many
scenarios. att_align_nominal is used in slot_deform_tuple,
heap_deform_tuple, nocachegetattr, etc. So I don't think that's viable
yet. At least not with much more benefit than this...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] PSQL return coder

2013-10-10 Thread Merlin Moncure
On Thu, Oct 10, 2013 at 1:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 James Sewell james.sew...@lisasoft.com writes:
 My question is: in a rollback scenario, is it possible to get psql to return
 a non-zero exit status?

 Maybe you could use -c instead of -f?

 $ psql -c 'select 1; select 1/0' regression
 ERROR:  division by zero
 $ echo $?
 1

 You won't need explicit BEGIN/END because this is already a single
 transaction.

According to the man page,
EXIT STATUS
   psql returns 0 to the shell if it finished normally, 1 if a fatal error
   of its own (out of memory, file not found) occurs, 2 if the  connection
   to the server went bad and the session was not interactive, and 3 if an
   error occurred in a script and the variable ON_ERROR_STOP was set.

So for a longer script ON_ERROR_STOP might be the ticket (which is
usually a good idea anyways).
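
For example (script.sql here stands for any file whose statements can fail;
exit status 3 is the documented result when ON_ERROR_STOP is set):

$ psql -v ON_ERROR_STOP=1 --single-transaction -f script.sql regression
ERROR:  division by zero
$ echo $?
3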

merlin




Re: [HACKERS] Backup throttling

2013-10-10 Thread Antonin Houska
On 10/09/2013 08:56 PM, Robert Haas wrote:
 There seem to be several review comments made since you posted this
 version.  I'll mark this Waiting on Author in the CommitFest
 application, since it seems that the patch needs to be further
 updated.

Since it was waiting for a reviewer, I was not sure whether I should wait
for his findings and fix everything in a single transaction.
Nevertheless, the next version is attached here.

It fixes findings reported in
http://www.postgresql.org/message-id/20130903165652.gc5...@eldon.alvh.no-ip.org

As for the units
http://www.postgresql.org/message-id/20130903224127.gd7...@awork2.anarazel.de
I'm not convinced about MB at the moment. Unfortunately I couldn't
find any other command-line PG utility requiring an amount of data as an
option. Thus I use a single-letter suffix, just as wget does for the same
purpose.

As for this
http://www.postgresql.org/message-id/20130903125155.ga18...@awork2.anarazel.de
there eventually seems to be a consensus (I notice now that the discussion
was off-list):

 On 2013-09-03 23:21:57 +0200, Antonin Houska wrote:
 On 09/03/2013 02:51 PM, Andres Freund wrote:


  It's probably better to use latches for the waiting; those have properly
  defined interruption semantics. Whether pg_usleep will be interrupted is
  platform dependent...

  I noticed a comment about interruptions around the definition of the
  pg_usleep() function, but concluded that the sleeps are rather frequent
  in this application (typically in the order of tens to hundreds per
  second, although the maximum of 256 might need to be decreased).
  Therefore occasional interruptions shouldn't distort the actual rate
  much. I'll think about it again. Thanks,
 
  The issue is rather that you might not get woken up when you want to
  be. Signal handlers in postgres tend to do a
  SetLatch(&MyProc->procLatch); which then will interrupt sleeps done via
  WaitLatch(). It's probably not that important with the duration of the
  sleeps you have.
 
 Greetings,
 
 Andres Freund

// Antonin Houska (Tony)




diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index eb0c1d6..882a26b 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -189,6 +189,22 @@ PostgreSQL documentation
  </varlistentry>
 
  <varlistentry>
+  <term><option>-r</option></term>
+  <term><option>--max-rate</option></term>
+  <listitem>
+   <para>
+    The maximum amount of data transferred from the server per second.
+    The purpose is to limit the impact of <application>pg_basebackup</application>
+    on a running master server while transferring the data directory.
+   </para>
+   <para>
+    Suffixes <literal>k</literal> (kilobytes) and <literal>M</literal>
+    (megabytes) are accepted. For example: <literal>10M</literal>
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-R</option></term>
   <term><option>--write-recovery-conf</option></term>
   <listitem>
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ba8d173..b2f10c3 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -33,6 +33,7 @@
 #include "utils/builtins.h"
 #include "utils/elog.h"
 #include "utils/ps_status.h"
+#include "utils/timestamp.h"
 #include "pgtar.h"
 
 typedef struct
@@ -42,6 +43,7 @@ typedef struct
 	bool		fastcheckpoint;
 	bool		nowait;
 	bool		includewal;
+	uint32		maxrate;
 } basebackup_options;
 
 
@@ -59,6 +61,7 @@ static void perform_base_backup(basebackup_options *opt, DIR *tblspcdir);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
 static int	compareWalFileNames(const void *a, const void *b);
+static void throttle(size_t increment);
 
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
@@ -68,6 +71,41 @@ static bool backup_started_in_recovery = false;
  */
 #define TAR_SEND_SIZE 32768
 
+
+/*
+ * The maximum amount of data per second - bounds of the user input.
+ *
+ * If the maximum should be increased to more than 4 GB, uint64 must
+ * be introduced for the related variables. However such high values have
+ * little to do with throttling.
+ */
+#define MAX_RATE_LOWER	32768
+#define MAX_RATE_UPPER	(1024 << 20)
+
+/*
+ * Transfer rate is only measured when this number of bytes has been sent.
+ * (Too frequent checks would impose too high CPU overhead.)
+ *
+ * The default value is used unless it'd result in too frequent checks.
+ */
+#define THROTTLING_SAMPLE_MIN	32768
+
+/* The maximum number of checks per second.  */
+#define THROTTLING_MAX_FREQUENCY	128
+
+/* The actual value, transfer of which may cause sleep. */
+static uint32 throttling_sample;
+
+/* Amount of data already transferred but not yet throttled.  */
+static int32 throttling_counter;
+
+/* The minimum time required to transfer throttling_sample bytes. */

[HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-10 Thread Amit Kapila
One of the users of PostgreSQL has reported that if the tablespace path
is long, it leads to a hang, and the hang is unbreakable.

A simple test case to reproduce the hang:
a. initdb -D E:\WorkSpace\PostgreSQL\master\RM30253_Data\aa\db
b. Create tablespace tbs location 'E:\WorkSpace\PostgreSQL\master\Data\idb';
c. Drop tablespace tbs;

In this test the path length used is 174, but I observed that the hang
occurs if the length is greater than 130 (approx.).

I have tested this on a few different Windows platforms (Windows XP
32-bit, Windows 7 64-bit). The hang occurs on Windows 7 64-bit. The user
has reported it on Windows 2008 64-bit.

On further analysis, I found that the hang occurs in some Windows
APIs (FindFirstFile, RemoveDirectory) when the symlink path
(pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
APIs. For the above test case, it will hang in the path
destroy_tablespace_directories -> ReadDir -> readdir -> FindFirstFile

I have tried using mklink /J (a utility in Windows 7 and above) to
create a junction point instead of the current way in pgsymlink; it still
hangs in a similar way.

Some ways to resolve the problem are described below:

1. I found that if the link path is accessed as a full path during
readdir or stat, it works fine.

For example, in function destroy_tablespace_directories(), the path
used to access the tablespace directory is of the form
pg_tblspc/16235/PG_9.4_201309051, built by the sprintf below:
sprintf(linkloc_with_version_dir, "pg_tblspc/%u/%s", tablespaceoid,
TABLESPACE_VERSION_DIRECTORY);
Now when it tries to access this path, the code assumes that the
corresponding OS API will take care of resolving this path w.r.t. the
current working directory, which is right as per the specs;
however, the OS API (FindFirstFile) hangs if the path length is > 130 for
a symlink, whereas if we use the full path instead of one starting with
pg_tblspc, it works fine.
So one way to resolve this issue is to use the full path for symbolic link
path access instead of relying on the OS to use the full path. (A sketch
follows after the list below.)

2. Resolve the symbolic link to the actual path in code whenever we try to
access it, using pgreadlink. This is already used in pg_basebackup.

3. Another way is to add a check in code (initdb and CREATE TABLESPACE)
to not allow paths longer than about 100 or 120 characters.
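
A minimal sketch of approach 1 (illustrative only, using standard backend
facilities):

	char		cwd[MAXPGPATH];
	char		linkloc_abs[MAXPGPATH];

	/* Build an absolute path for the tablespace symlink instead of relying
	 * on the Windows APIs to resolve it against the current directory. */
	if (!getcwd(cwd, sizeof(cwd)))
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not determine current directory: %m")));

	snprintf(linkloc_abs, sizeof(linkloc_abs), "%s/pg_tblspc/%u/%s",
			 cwd, tablespaceoid, TABLESPACE_VERSION_DIRECTORY);

	/* ... then pass linkloc_abs to ReadDir(), stat(), rmdir(), etc. ... */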

Kindly let me know your suggestions regarding the above approaches to
resolving the problem, or if you think there is any better way
to address it.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Patch for reserved connections for replication users

2013-10-10 Thread Mike Blackwell
I'd received an email from Gibheer suggesting it be moved due to lack of time 
to work on it.  I can certainly move it back if that's no longer the case.





On Oct 9, 2013, at 23:25, Amit Kapila amit.kapil...@gmail.com wrote:

 On Thu, Oct 10, 2013 at 3:17 AM, Gibheer gibh...@zero-knowledge.org wrote:
 On Mon, 7 Oct 2013 11:39:55 +0530
 Amit Kapila amit.kapil...@gmail.com wrote:
 Robert Haas wrote:
 On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
 andres(at)2ndquadrant(dot)com wrote:
  Hmm.  It seems like this patch is making MaxConnections no longer
  mean the maximum number of connections, but rather the maximum
  number of non-replication connections.  I don't think I support
  that definitional change, and I'm kinda surprised if this is
  sufficient to implement it anyway (e.g. see InitProcGlobal()).
 
 I don't think the implementation is correct, but why don't you
 like the definitional change? The set of things you can do from
 replication connections are completely different from a normal
 connection. So using separate pools for them seems to make sense.
 That they end up allocating similar internal data seems to be an
 implementation detail to me.
 
 Because replication connections are still connections.  If I tell
 the system I want to allow 100 connections to the server, it should
 allow 100 connections, not 110 or 95 or any other number.
 
 I think that to reserve connections for replication, a mechanism similar
 to superuser_reserved_connections should be used rather than that of
 autovacuum workers or background workers.
 This won't change the definition of MaxConnections. Another thing is
 that rather than introducing a new parameter for replication reserved
 connections, it is better to use max_wal_senders as it can serve the
 purpose.
 
 Review for replication_reserved_connections-v2.patch, considering we
 are going to use a mechanism similar to superuser_reserved_connections
 and won't allow the definition of MaxConnections to change.
 
 1. /* the extra unit accounts for the autovacuum launcher */
  MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
 - + max_worker_processes;
 + + max_worker_processes + max_wal_senders;
 
 Above changes are not required.
 
 2.
 + if ((!am_superuser && !am_walsender) &&
  ReservedBackends > 0 &&
  !HaveNFreeProcs(ReservedBackends))
 
 Change the check as you have in patch-1 for
 ReserveReplicationConnections:
 
 if (!am_superuser &&
 (max_wal_senders > 0 || ReservedBackends > 0) &&
 !HaveNFreeProcs(max_wal_senders + ReservedBackends))
 ereport(FATAL,
 (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
 errmsg("remaining connection slots are reserved for replication and
 superuser connections")));
 
 3. In guc.c, change description of max_wal_senders similar to
 superuser_reserved_connections at below place:
   {"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING,
 gettext_noop("Sets the maximum number of simultaneously running WAL
 sender processes."),
 
 4. With this approach, there is no need to change InitProcGlobal(), as
 it is used to keep track of bgworkerFreeProcs and autovacFreeProcs; for
 others it uses freeProcs.
 
 5. The below description in config.sgml needs to be changed for
 superuser_reserved_connections to include the effect of max_wal_senders
 on reserved connections.
 Whenever the number of active concurrent connections is at least
 max_connections minus superuser_reserved_connections, new connections
 will be accepted only for superusers, and no new replication
 connections will be accepted.
 
 6. Also similar description should be added to max_wal_senders
 configuration parameter.
 
 7. Below comment needs to be updated, as it assumes only superuser
 reserved connections as part of MaxConnections limit.
   /*
  * ReservedBackends is the number of backends reserved for superuser use.
  * This number is taken out of the pool size given by MaxBackends so the
  * number of backend slots available to non-superusers is
  * (MaxBackends - ReservedBackends).  Note what this really means is
  * "if there are <= ReservedBackends connections available, only superusers
  * can make new connections" --- pre-existing superuser connections don't
  * count against the limit.
  */
  int ReservedBackends;
 
 8. Also we can add a comment on top of the definition of max_wal_senders
 similar to ReservedBackends.
 
 
 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com
 
 Hi,
 
 I took the time and reworked the patch with the feedback till now.
 Thank you very much Amit!

  Is there any specific reason why you moved this patch to the next CommitFest?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread MauMau

From: Bruce Momjian br...@momjian.us

I will work on auto-tuning temp_buffers next.  Any other suggestions?
wal_buffers is already auto-tuned.


Great work.  I'm looking forward to being able to fully utilize system 
resources right after initdb.


Although this is not directly related to memory, could you set 
max_prepared_transactions = max_connections at initdb time?  People must 
feel frustrated when they can't run applications on a Java or .NET 
application server and notice that they have to set 
max_prepared_transactions and restart PostgreSQL.  This is far from 
friendly.


Regards
MauMau







Re: [HACKERS] strange behavior of pg_trgm's similarity function

2013-10-10 Thread Merlin Moncure
On Thu, Oct 10, 2013 at 7:12 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 10.10.2013 15:03, Fujii Masao wrote:

 Hi,

 The behavior of pg_trgm's similarity function seems strange. Is this
 intentional?

 I was thinking that the following three calls of the similarity function
 would return the same number because the second argument is just three
 characters contained in the first argument in every call.

 =# SELECT similarity('12345', '123');
 =# SELECT similarity('12345', '234');
 =# SELECT similarity('12345', '345');

 But that's not true. Each returns a different number.

 =# SELECT similarity('12345', '123');
   similarity
  ------------
     0.428571
 (1 row)

 =# SELECT similarity('12345', '234');
   similarity
  ------------
     0.111111
 (1 row)

 =# SELECT similarity('12345', '345');
   similarity
  ------------
         0.25
 (1 row)

 This happens because, for example, similarity('12345', '123') returns
 the similarity number of '**12345*' and '**123*' (* means the blank
 character),
 NOT of '12345' and '123'. IOW, two and one blank characters are added at
 the head and tail of each argument, respectively. I wonder why
 pg_trgm's similarity function works this way. Should we change this
 so that no blank characters are added to the arguments?


 Well, you could also argue that '11' and '22' are quite similar,
 even though pg_trgm's similarity will not think so. It comes down to the
 definition of similarity, and how well that definition matches your
 intuition.

 FWIW, it feels right to me that a match at the beginning of a word is worth
 more than one in the middle of a string. -1 on changing that.

I'm not so sure that the assumption that leading trigrams should
effectively weigh ~3x is a good one to build into the library.
However, the behavior is clearly documented and can't be changed.   I
think you'd need to improvise an alternate set of trigram ops if you
wanted to rig an alternate matching behavior.

merlin




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:

 I actually had the thought that it might be something we'd integrate
 *into* initdb.  So you'd do initdb --system-memory 8GB or something
 like that and it would do the rest.  That'd be slick, at least IMHO.

How would you handle the case that the machine (whether physical or
a VM) later gets more RAM?  That's certainly not unheard of with
physical servers, and with VMs I'm not sure that the database
server would necessarily go through a stop/start cycle for it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao
On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote:
 Please find patch attached which adds documentation for session_start
 and introduced fields and corrects documentation for queryid to be
 query_id. session_start remains in the view as agreed.

Thanks for updating the document!

I'm not clear about the use case of 'session_start' and 'introduced' yet.
Could you elaborate on it? Maybe we should add that explanation to
the document?

In my test, I found that pg_stat_statements.total_time always indicates zero.
I guess that the patch might handle pg_stat_statements.total_time wrongly.

+values[i++] = DatumGetTimestamp(
+instr_get_timestamptz(pgss->session_start));
+values[i++] = DatumGetTimestamp(
+instr_get_timestamptz(entry->introduced));

These should be executed only when detected_version >= PGSS_TUP_LATEST?

+  entrystructfieldsession_start/structfield/entry
+  entrytypetext/type/entry
+  entry/entry
+  entryStart time of a statistics session./entry
+ /row
+
+ row
+  entrystructfieldintroduced/structfield/entry
+  entrytypetext/type/entry

The data type of these columns is not text.

+instr_time  session_start;
+uint64      private_stat_session_key;

It's better to add comments explaining these fields.

+microsec = INSTR_TIME_GET_MICROSEC(t) -
+((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);

HAVE_INT64_TIMESTAMP should be considered here.
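
I.e. something along these lines (a sketch: with float timestamps, the
microsecond count must be scaled down to seconds):

#ifdef HAVE_INT64_TIMESTAMP
	/* TimestampTz is int64 microseconds since the Postgres epoch */
	result = (TimestampTz) (INSTR_TIME_GET_MICROSEC(t) -
		((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY));
#else
	/* TimestampTz is double seconds since the Postgres epoch */
	result = (INSTR_TIME_GET_MICROSEC(t) -
		((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY)) / 1000000.0;
#endif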

+if (log_cannot_read)
+ereport(LOG,
+(errcode_for_file_access(),
+ errmsg("could not read pg_stat_statement file \"%s\": %m",
+PGSS_DUMP_FILE)));

Is it better to use WARNING instead of LOG as the log level here?

+/*
+ * The role calling this function is unable to see
+ * sensitive aspects of this tuple.
+ *
+ * Nullify everything except the insufficient privilege
+ * message for this entry
+ */
+memset(nulls, 1, sizeof nulls);
+
+nulls[i]  = 0;
+values[i] = CStringGetTextDatum("insufficient privilege");

Why do we need to hide *all* the fields in pg_stat_statements when
it's accessed by an unprivileged user? This is a big change in
pg_stat_statements behavior, and it might break compatibility.

 This is not directly related to the patch itself, but why does the queryid
 need to be calculated based also on the statistics session?

   If we expose the hash value of the query tree without using the statistics
 session, it is possible that users might make the wrong assumption that this
 value remains stable across version upgrades, when in reality it depends on
 whether the version has made changes to query tree internals. So to
 explicitly ensure that users do not make this wrong assumption, hash value
 generation uses the statistics session id, which is newly created under the
 situations described above.

So, ISTM that we can use, for example, the version number to calculate
the query_id. Right?

Regards,

-- 
Fujii Masao




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread MauMau

From: Robert Haas robertmh...@gmail.com
On Thu, Oct 10, 2013 at 1:23 AM, Magnus Hagander mag...@hagander.net 
wrote:

I think it would be even simpler, and more reliable, to start with the
parameter to initdb - I like that. But instead of having it set a new
variable based on that and then autotune off that, just have *initdb*
do these calculations you're suggesting, and write new defaults to the
files (preferably with a comment).

That way if the user *later* comes in and, say, changes shared_buffers,
we don't dynamically resize work_mem into a value that might cause
his machine to die from swapping, which would definitely violate the
principle of least surprise.


+1 for all of that.  I completely agree.


I vote for this idea completely, too.  It's nice to be able to specify 
usable RAM with something like initdb --system-memory 8GB, because it 
provides flexibility for memory allocation --- use the whole machine for one 
PostgreSQL instance, or run multiple instances on one machine with 50% of 
RAM for instance A and 25% of RAM for instances B and C, etc.  But what is 
the default value of --system-memory?  I would like it to be the whole RAM.


I hope something like pgtune will be incorporated into the core, absorbing 
the ideas in:

- pgtune
- https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
- the book "PostgreSQL 9.0 High Performance" by Greg Smith

Then initdb calls the tool.  Of course, DBAs can use the tool later.  Like 
pgtune, it would be nice if the tool and initdb could accept --system-type 
or --workload with arguments {OLTP | DW | mixed}.
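
A hypothetical invocation combining these ideas (to be clear, none of these
options exist today):

$ initdb -D /usr/local/pgsql/data --system-memory 16GB --workload OLTP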


Regards
MauMau






Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 11:01:52PM +0900, MauMau wrote:
 From: Bruce Momjian br...@momjian.us
 I will work on auto-tuning temp_buffers next.  Any other suggestions?
 wal_buffers is already auto-tuned.
 
 Great work.  I'm looking forward to becoming able to fully utilize
 system resources right after initdb.
 
 Although this is not directly related to memory, could you set
 max_prepared_transactions = max_connections at initdb time?  People
 must feel frustrated when they can't run applications on a Java or
 .NET application server and notice that they have to set
 max_prepared_transactions and restart PostgreSQL.  This is far from
 friendly.

I think the problem is that many users don't need prepared transactions
and therefore don't want the overhead.  Is that still accurate?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [COMMITTERS] pgsql: Revive line type

2013-10-10 Thread Robert Haas
On Wed, Oct 9, 2013 at 10:43 PM, Peter Eisentraut pete...@gmx.net wrote:
 Revive line type

Kevin just pointed out to me that there are a bunch of buildfarm
failures.  I'm looking at the ones that are attributable to shared
memory, but there seem to be some problems with this patch as well.
Check out brolga, for example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Dimitri Fontaine
MauMau maumau...@gmail.com writes:
 Although this is not directly related to memory, could you set
 max_prepared_transactions = max_connections at initdb time?  People must

You really need to have a transaction manager around when issuing
prepared transactions, as failing to commit/rollback them will prevent
VACUUM and quickly lead you to an interesting situation.

It used to have a default of 5 and it now properly defaults to 0.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 07:24:26AM -0700, Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
 
  I actually had the thought that it might be something we'd integrate
  *into* initdb.  So you'd do initdb --system-memory 8GB or something
  like that and it would do the rest.  That'd be slick, at least IMHO.
 
 How would you handle the case that the machine (whether physical or
 a VM) later gets more RAM?  That's certainly not unheard of with
 physical servers, and with VMs I'm not sure that the database
 server would necessarily go through a stop/start cycle for it.

Yes, going from a non-dedicated to a dedicated database server, adding
RAM, or moving the cluster to another server could all require an initdb
to change auto-tuned values.  This is why I think we will need to
auto-tune in the backend, rather than via initdb.  I do think an
available_mem parameter for initdb would help though, to be set in
postgresql.conf.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Thu, Oct 10, 2013 at 07:24:26AM -0700, Kevin Grittner wrote:
  Robert Haas robertmh...@gmail.com wrote:
   I actually had the thought that it might be something we'd integrate
   *into* initdb.  So you'd do initdb --system-memory 8GB or something
   like that and it would do the rest.  That'd be slick, at least IMHO.
  
  How would you handle the case that the machine (whether physical or
  a VM) later gets more RAM?  That's certainly not unheard of with
  physical servers, and with VMs I'm not sure that the database
  server would necessarily go through a stop/start cycle for it.
 
 Yes, going from a non-dedicated to a dedicated database server, adding
 RAM, or moving the cluster to another server could all require an initdb
 to change auto-tuned values.  This is why I think we will need to
 auto-tune in the backend, rather than via initdb.  I do think an
 available_mem parameter for initdb would help though, to be set in
 postgresql.conf.

For this case, I think the suggestion made by MauMau would be better-
tell the user (in the postgresql.conf comments) a command they can run
with different memory settings to see what the auto-tuning would do.
Perhaps even have a way to enable use of those new variables, but I
don't really care for the idea of making a GUC that isn't anything
except a control for defaults of *other* GUCs.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 7:20 PM, Sameer Thakur samthaku...@gmail.com wrote:
 Please find patch attached which adds documentation for session_start
 and introduced fields and corrects documentation for queryid to be
 query_id. session_start remains in the view as agreed.

 Thanks for updating the document!

 I'm not clear about the use case of 'session_start' and 'introduced' yet.
 Could you elaborate it? Maybe we should add that explanation into
 the document?

Probably.

The idea is that without those fields it's, to wit, impossible to
explain non-monotonic movement in metrics of those queries for precise
use in tools that insist on monotonicity of the fields, which are all
cumulative to date I think.

pg_stat_statements_reset() or crashing loses the session, so one
expects ncalls may decrease.  In addition, a query that is bouncing
in and out of the hash table will have its statistics be lost, so its
statistics will also decrease.  This can cause un-resolvable artifacts
in analysis tools.

The two fields allow for precise understanding of when the statistics
for a given query_id are continuous since the last time a program
inspected it.

 In my test, I found that pg_stat_statements.total_time always indicates a 
 zero.
 I guess that the patch might handle pg_stat_statements.total_time wrongly.

 +values[i++] = DatumGetTimestamp(
 +instr_get_timestamptz(pgss->session_start));
 +values[i++] = DatumGetTimestamp(
 +instr_get_timestamptz(entry->introduced));

 These should be executed only when detected_version >= PGSS_TUP_LATEST?

Yes. Oversight.

 +  <entry><structfield>session_start</structfield></entry>
 +  <entry><type>text</type></entry>
 +  <entry></entry>
 +  <entry>Start time of a statistics session.</entry>
 + </row>
 +
 + <row>
 +  <entry><structfield>introduced</structfield></entry>
 +  <entry><type>text</type></entry>

 The data type of these columns is not text.

Oops

 +instr_time  session_start;
 +uint64      private_stat_session_key;

 it's better to add the comments explaining these fields.

Yeah.


 +microsec = INSTR_TIME_GET_MICROSEC(t) -
 +((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);

 HAVE_INT64_TIMESTAMP should be considered here.

That's not a bad idea.
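
Something like the sketch below, perhaps; the float-timestamp branch is
only my guess at what handling HAVE_INT64_TIMESTAMP would look like, not
code from the patch:

    #ifdef HAVE_INT64_TIMESTAMP
        /* integer timestamps: shift the Unix epoch to the Postgres epoch, in usec */
        microsec = INSTR_TIME_GET_MICROSEC(t) -
            ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * USECS_PER_DAY);
    #else
        /* float timestamps: the same epoch shift, expressed in seconds */
        seconds = INSTR_TIME_GET_DOUBLE(t) -
            ((POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) * (double) SECS_PER_DAY);
    #endif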

 +if (log_cannot_read)
 +ereport(LOG,
 +(errcode_for_file_access(),
 + errmsg("could not read pg_stat_statement file \"%s\": %m",
 +PGSS_DUMP_FILE)));

 Is it better to use WARNING instead of LOG as the log level here?

Is this new code?  Why did I add it again?  Seems reasonable
though to call it a WARNING.

 +/*
 + * The role calling this function is unable to see
 + * sensitive aspects of this tuple.
 + *
 + * Nullify everything except the insufficient privilege
 + * message for this entry
 + */
 +memset(nulls, 1, sizeof nulls);
 +
 +nulls[i]  = 0;
 +values[i] = CStringGetTextDatum("insufficient privilege");

 Why do we need to hide *all* the fields in pg_stat_statements, when
 it's accessed by improper user? This is a big change of pg_stat_statements
 behavior, and it might break the compatibility.

It seems like an information leak that grows more major if query_id is
exposed and, at any point, one can determine the query_id for a query
text.

 This is not directly related to the patch itself, but why does the
  queryid
 need to be calculated based on also the statistics session?

   If we expose the hash value of the query tree without using the statistics
 session, it is possible that users might make the wrong assumption that this
 value remains stable across version upgrades, when in reality it depends on
 whether the version has made changes to query tree internals. So to
 explicitly ensure that users do not make this wrong assumption, hash value
 generation uses the statistics session id, which is newly created under the
 situations described above.

 So, ISTM that we can use, for example, the version number to calculate
 the query_id. Right?

That does seem like it may be a more reasonable stability vs. once per
statistics session, because then the use case with standbys works somewhat
better.

That said, the general concern was accidental contracts being assumed
by consuming code, so this approach is tuned for making the query_id
as unstable as possible while still being useful: stable, but only in
one statistics gathering session.

I did not raise the objection about over-aggressive contracts being
exposed although I think the concern has merit...but given the use
case involving standbys, I am for now charitable to increasing the
stability to the level you indicate: a guaranteed break every point
release.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Wed, Oct  9, 2013 at 09:34:16PM -0400, Robert Haas wrote:
 But your auto-tuned value can easily be too low or too high, too.
 Consider someone with a system that has 64GB of RAM.   EnterpriseDB
 has had customers who have found that with, say, a 40GB database, it's
 best to set shared_buffers to 40GB so that the database remains fully
 cached.  Your latest formula will auto-tune work_mem to roughly 100MB.
  On the other hand, if the same customer has a 400GB database, which
 can't be fully cached no matter what, a much lower setting for
 shared_buffers, like maybe 8GB, is apt to perform better.  Your
 formula will auto-tune work_mem to roughly 20MB.
 
 In other words, when there's only 24GB of memory available for
 everything-except-shared-buffers, your formula sets work_mem five
 times higher than when there's 48GB of memory available for
 everything-except-shared-buffers.  That surely can't be right.

Let me walk through the idea of adding an available_mem setting, that
Josh suggested, and which I think addresses Robert's concern about
larger shared_buffers and Windows servers.

The idea is that initdb would allow you to specify an available_mem
parameter, which would set a corresponding value in postgresql.conf. 
This could be later changed by the user.  (See my other email about why
we shouldn't do the tuning in initdb.)

shared_buffers would auto-tune to 25% of that, except on Windows, and
perhaps capped at 8GB.  Here is another case where not tuning
directly on shared_buffers is a win.

All other calculations would be based on available_mem - shared_buffers,
so if shared_buffers is manually or auto-tuned high or low, other tuning
would still be accurate.

work_mem would tune to (available_mem - shared_buffers) / 16 /
max_connections, so even if you used all max_connections, and 3x of
work_mem in each, you would still only match the size of shared_buffers.
maintenance_work_mem would key on autovacuum_max_workers.

effective_cache_size would be available_mem minus all of the values
above.
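
To make the arithmetic concrete, here is a standalone sketch of the
computation (available_mem is the proposed parameter, not an existing
GUC, and the divisor for maintenance_work_mem is a placeholder
assumption):

    #include <stdio.h>

    int
    main(void)
    {
        long    available_mem = 8L * 1024 * 1024;   /* in kB: 8GB */
        int     max_connections = 100;
        int     autovacuum_max_workers = 3;

        /* 25% of available memory; the Windows case and 8GB cap are ignored */
        long    shared_buffers = available_mem / 4;

        /* everything else keys on available_mem - shared_buffers */
        long    other_mem = available_mem - shared_buffers;
        long    work_mem = other_mem / 16 / max_connections;
        long    maintenance_work_mem = other_mem / 16 / autovacuum_max_workers;
        long    effective_cache_size = other_mem;   /* roughly, what is left */

        printf("shared_buffers = %ldkB\n", shared_buffers);
        printf("work_mem = %ldkB\n", work_mem);
        printf("maintenance_work_mem = %ldkB\n", maintenance_work_mem);
        printf("effective_cache_size = %ldkB\n", effective_cache_size);
        return 0;
    }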

Now, how to handle changes?  available_mem could only be changed by a
server restart, because shared_buffers is based on it, and the rest of
the parameters are based on available_mem - shared_buffers.  Though
users can change work_mem in postgresql.conf and per-session,
auto-tuning would not be affected by these changes.  Calculating only
with available_mem - shared_buffers would give stability and
predictability to the auto-tuning system.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 11:18:46AM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Thu, Oct 10, 2013 at 07:24:26AM -0700, Kevin Grittner wrote:
   Robert Haas robertmh...@gmail.com wrote:
I actually had the thought that it might be something we'd integrate
*into* initdb.  So you'd do initdb --system-memory 8GB or something
like that and it would do the rest.  That'd be slick, at least IMHO.
   
   How would you handle the case that the machine (whether physical or
   a VM) later gets more RAM?  That's certainly not unheard of with
   physical servers, and with VMs I'm not sure that the database
   server would necessarily go through a stop/start cycle for it.
  
  Yes, going from a non-dedicated to a dedicated database server, adding
  RAM, or moving the cluster to another server could all require an initdb
  to change auto-tuned values.  This is why I think we will need to
  auto-tune in the backend, rather than via initdb.  I do think an
  available_mem parameter for initdb would help though, to be set in
  postgresql.conf.
 
 For this case, I think the suggestion made by MauMau would be better-
 tell the user (in the postgresql.conf comments) a command they can run
 with different memory settings to see what the auto-tuning would do.
 Perhaps even have a way to enable use of those new variables, but I
 don't really care for the idea of making a GUC that isn't anything
 except a control for defaults of *other* GUCs.

Well, you then have two places you are doing the tuning --- one in
initdb, and another in the tool, and you can have cases where they are
not consistent.  You could have a mode where initdb re-writes
postgresql.conf, but that has all sorts of oddities about changing a
config file.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 On Thu, Oct 10, 2013 at 11:18:46AM -0400, Stephen Frost wrote:
  For this case, I think the suggestion made by MauMau would be better-
  tell the user (in the postgresql.conf comments) a command they can run
  with different memory settings to see what the auto-tuning would do.
  Perhaps even have a way to enable use of those new variables, but I
  don't really care for the idea of making a GUC that isn't anything
  except a control for defaults of *other* GUCs.
 
 Well, you then have two places you are doing the tuning --- one in
 initdb, and another in the tool, and you can have cases where they are
 not consistent.  You could have a mode where initdb re-writes
 postgresql.conf, but that has all sorts of oddities about changing a
 config file.

Not necessarily..  Have initdb call the tool to get the values to use
when first writing out the config file (or make the logic into a library
that initdb and the tool both use, or just both #include the same .h;
it's not like it's going to be terribly complicated), and then the tool
would be responsible for later changes to the postgresql.conf file, or
we just tell the user how to make the changes recommended by the tool.
I would *not* have initdb doing that, that's not its job.

If the user is expected to be modifying the postgresql.conf file in this
scenario anyway, I hardly see that having one-parameter-to-rule-them-all
is actually better than having 3 or 4.  If we're trying to get away from
the user modifying postgresql.conf, then we're going to need a tool to
do that (or use ALTER SYSTEM WHATEVER).  For my part, I'm really much
more interested in the first 15 minutes, as was mentioned elsewhere,
than how to help users who have been using PG for a year and then
discover they need to tune it a bit.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 11:45:41AM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  On Thu, Oct 10, 2013 at 11:18:46AM -0400, Stephen Frost wrote:
   For this case, I think the suggestion made by MauMau would be better-
   tell the user (in the postgresql.conf comments) a command they can run
   with different memory settings to see what the auto-tuning would do.
   Perhaps even have a way to enable use of those new variables, but I
   don't really care for the idea of making a GUC that isn't anything
   except a control for defaults of *other* GUCs.
  
  Well, you then have two places you are doing the tuning --- one in
  initdb, and another in the tool, and you can have cases where they are
  not consistent.  You could have a mode where initdb re-writes
  postgresql.conf, but that has all sorts of oddities about changing a
  config file.
 
 Not necessarily..  Have initdb call the tool to get the values to use
 when first writing out the config file (or make the logic into a library
 that initdb and the tool both use, or just both #include the same .h;
 it's not like it's going to be terribly complicated), and then the tool
 would be responsible for later changes to the postgresql.conf file, or
 we just tell the user how to make the changes recommended by the tool.
 I would *not* have initdb doing that, that's not its job.
 
 If the user is expected to be modifying the postgresql.conf file in this
 scenario anyway, I hardly see that having one-parameter-to-rule-them-all
 is actually better than having 3 or 4.  If we're trying to get away from
 the user modifying postgresql.conf, then we're going to need a tool to
 do that (or use ALTER SYSTEM WHATEVER).  For my part, I'm really much
 more interested in the first 15 minutes, as was mentioned elsewhere,
 than how to help users who have been using PG for a year and then
 discover they need to tune it a bit.

Well, I like the idea of initdb calling the tool, though the tool then
would need to be in C probably as we can't require python for initdb. 
The tool would not address Robert's issue of someone increasing
shared_buffers on their own.  In fact, the other constants would still
be hard-coded in from initdb, which isn't good.

I think the big win for a tool would be to query the user about how they
are going to be using Postgres, and that can then spit out values the
user can add to postgresql.conf, or to a config file that is included at
the end of postgresql.conf.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 Well, I like the idea of initdb calling the tool, though the tool then
 would need to be in C probably as we can't require python for initdb. 
 The tool would not address Robert's issue of someone increasing
 shared_buffers on their own.

I'm really not impressed with this argument.  Either the user is going
to go and modify the config file, in which case I would hope that they'd
at least glance around at what they should change, or they're going to
move off PG because it's not performing well enough for them- which is
really what I'm trying to avoid happening during the first 15m.

 In fact, the other constants would still
 be hard-coded in from initdb, which isn't good.

Actually, it *is* good, as Magnus pointed out.  Changing a completely
unrelated parameter shouldn't make all of your plans suddenly change.
This is mollified, but only a bit, if you have a GUC that's explicitly
"this changes other GUCs", but I'd much rather have a tool that can do a
better job to begin with and which helps the user understand what
parameters are available to change and why there's more than one.

 I think the big win for a tool would be to query the user about how they
 are going to be using Postgres, and that can then spit out values the
 user can add to postgresql.conf, or to a config file that is included at
 the end of postgresql.conf.

Agreed.

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
Since, as has been previously discussed in this forum on multiple
occasions [citation needed], the default System V shared memory limits
are absurdly low on many systems, the dynamic shared memory patch
defaults to POSIX shared memory, which has often been touted as a
superior alternative [citation needed].  Unfortunately, the buildfarm
isn't entirely happy with this decision.  On buildfarm member anole
(HP-UX B.11.31), allocation of dynamic shared memory fails with a
Permission denied error, and on smew (Debian GNU/Linux 6.0), it
fails with Function not implemented, which according to a forum
post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
on that box.

What shall we do about this?  I see a few options.

(1) Define the issue as not our problem.  IOW, as of now, if you
want to use PostgreSQL, you've got to either make POSIX shared memory
work on your machine, or change the GUC that selects the type of
dynamic shared memory used.

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.

(3) Add a new setting that auto-probes for a type of shared memory
that works.  Try POSIX first, then System V.  Maybe even fall back to
mmap'd files if neither of those works.

(4) Remove the option to use POSIX shared memory.  System V FTW!

After some consideration, I think my vote is for option #2.  Option #1
seems too user-hostile, especially for a facility that has no in-core
users yet, though I can imagine we might want to go that way
eventually, especially if we at some point try to dump System V shared
memory altogether, as has been proposed.  Option #4 seems sad; we know
that System V shared memory limits are a problem for plenty of people,
so it'd be a shame not to have a way to use the POSIX facilities if
they're available.  Option #3 is fine as far as it goes, but it adds a
significant amount of complexity I'd rather not deal with.
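
For what it's worth, the auto-probe of option #3 would boil down to
something like this standalone sketch (not code from the patch; link
with -lrt on older glibc):

    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ipc.h>
    #include <sys/mman.h>
    #include <sys/shm.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Try POSIX shared memory first; fall back to System V when it fails
     * the way anole (EACCES) or smew (ENOSYS, no tmpfs on /dev/shm) do. */
    static const char *
    probe_shmem_type(void)
    {
        int     fd = shm_open("/pg_dsm_probe", O_RDWR | O_CREAT | O_EXCL, 0600);

        if (fd >= 0)
        {
            close(fd);
            shm_unlink("/pg_dsm_probe");
            return "posix";
        }
        if (errno == ENOSYS || errno == EACCES)
        {
            int     shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

            if (shmid >= 0)
            {
                shmctl(shmid, IPC_RMID, NULL);
                return "sysv";
            }
        }
        return "mmap";      /* last-ditch: mmap'd files */
    }

    int
    main(void)
    {
        printf("dynamic_shared_memory_type = %s\n", probe_shmem_type());
        return 0;
    }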

Other votes?  Other ideas?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://forums.justlinux.com/showthread.php?142429-shm_open-problem


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Dimitri Fontaine
Hi,

I did a partial review of this patch, wherein I focused on the patch and
the code itself, as I saw other contributors already did some testing on
it, so that we know it applies cleanly and works to some good extent.

Fujii Masao masao.fu...@gmail.com writes:
 In this patch, full_page_writes accepts three values: on, compress, and off.
 When it's set to compress, the full page image is compressed before it's
 inserted into the WAL buffers.

Code review :

In full_page_writes_str(), why are you returning "unrecognized" rather
than doing an elog(ERROR, …) for this unexpected situation?

The code switches to compression (or trying to) when the following
condition is met:

+   if (fpw >= FULL_PAGE_WRITES_COMPRESS)
+   {
+   rdt->data = CompressBackupBlock(page, BLCKSZ - 
bkpb->hole_length, &(rdt->len));

We have

+ typedef enum FullPageWritesLevel
+ {
+   FULL_PAGE_WRITES_OFF = 0,
+   FULL_PAGE_WRITES_COMPRESS,
+   FULL_PAGE_WRITES_ON
+ } FullPageWritesLevel;

+ #define FullPageWritesIsNeeded(fpw)   (fpw >= FULL_PAGE_WRITES_COMPRESS)

I don't much like using the >= test against an ENUM and I'm not sure I
understand the intention you have here. It somehow looks like a typo and
disagrees with the macro. What about using the FullPageWritesIsNeeded
macro, and maybe rewriting the macro as

#define FullPageWritesIsNeeded(fpw) \
   (fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON)


Also, having "on" imply "compress" is a little funny to me. Maybe we
should just finish our testing and be happy to always compress the full
page writes. What would the downside be, exactly? (On a busy IO system,
writing less data even if needing more CPU will be the right trade-off.)

I like that you're checking the savings of the compressed data with
respect to the uncompressed data and cancel the compression if there's
no gain. I wonder if your test accounts for enough padding and headers
though given the results we saw in other tests made in this thread.

Why do we have both the static function full_page_writes_str() and the
macro FullPageWritesStr, with two different implementations issuing
either "true" and "false" or "on" and "off"?

!   unsigned    hole_offset:15, /* number of bytes before hole */
!               flags:2,        /* state of a backup block, see below */
!               hole_length:15; /* number of bytes in hole */

I don't understand that. I wanted to use that patch as a leverage to
smoothly discover the internals of our WAL system but won't have the
time to do that here. That said, I don't even know that C syntax.
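
For the record, that syntax declares C bit-fields: the three members
share a single 32-bit unit (15 + 2 + 15 bits). A standalone
illustration of the same layout:

    #include <stdio.h>

    typedef struct
    {
        unsigned    hole_offset:15, /* number of bytes before hole */
                    flags:2,        /* state of a backup block */
                    hole_length:15; /* number of bytes in hole */
    } BkpBlockDemo;

    int
    main(void)
    {
        BkpBlockDemo b = {.hole_offset = 100, .flags = 1, .hole_length = 8092};

        /* all three fields are packed into one 4-byte word */
        printf("sizeof = %zu\n", sizeof(b));
        printf("%u %u %u\n", b.hole_offset, b.flags, b.hole_length);
        return 0;
    }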

+ #define BKPBLOCK_UNCOMPRESSED 0   /* uncompressed */
+ #define BKPBLOCK_COMPRESSED   1   /* comperssed */

There's a typo in the comment above.


   [time required to replay WAL generated during running pgbench]
   61s (on)  1209911 transactions were replayed,
 recovery speed: 19834.6 transactions/sec
   39s (compress)   1445446 transactions were replayed,
 recovery speed: 37062.7 transactions/sec
   37s (off)  1629235 transactions were replayed,
 recovery speed: 44033.3 transactions/sec

How did you get those numbers?  pg_basebackup before the test and
archiving, then a PITR maybe? Is it possible to do the same test with
the same number of transactions to replay, I guess using the -t
parameter rather than the -T one for this testing.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add json_typeof() and json_is_*() functions.

2013-10-10 Thread Andrew Dunstan


On 08/06/2013 08:42 AM, Robert Haas wrote:

On Fri, Aug 2, 2013 at 8:22 AM, Andrew Tipton and...@kiwidrew.com wrote:

But without json_is_scalar(), the choice is one of these two forms:
   json_typeof() NOT IN ('object', 'array')
   json_typeof() IN ('string', 'number', 'boolean', 'null')

The first of those is what seemed to make sense to me.  The user can
always define their own convenience function if they so desire.  I
don't think we need to bloat the default contents of pg_proc for that.




I agree. I have committed a version with just the one function.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  Well, I like the idea of initdb calling the tool, though the tool then
  would need to be in C probably as we can't require python for initdb. 
  The tool would not address Robert's issue of someone increasing
  shared_buffers on their own.
 
 I'm really not impressed with this argument.  Either the user is going
 to go and modify the config file, in which case I would hope that they'd
 at least glance around at what they should change, or they're going to
 move off PG because it's not performing well enough for them- which is
 really what I'm trying to avoid happening during the first 15m.

Well, they aren't going around and looking at other parameters now or we
would not feel a need to auto-tune many of our defaults.

How do we handle the Python dependency, or is this all to be done in
some other language?  I certainly am not ready to take on that job.

One nice thing about a tool is that you can see your auto-tuned defaults
right away, while doing this in the backend, you have to start the
server to see the defaults.  I am not even sure how I could allow users
to preview their defaults for different available_mem settings.

  In fact, the other constants would still
  be hard-coded in from initdb, which isn't good.
 
 Actually, it *is* good, as Magnus pointed out.  Changing a completely
 unrelated parameter shouldn't make all of your plans suddenly change.
 This is mollified, but only a bit, if you have a GUC that's explicitly
 this changes other GUCs, but I'd much rather have a tool that can do a
 better job to begin with and which helps the user understand what
 parameters are available to change and why there's more than one.

Well, the big question is how many users are going to use the tool, as
we are not setting this up for experts, but for novices.

I think one big risk is someone changing shared_buffers and not having
an accurate available_memory.  That might lead to some very inaccurate
defaults.  Also, what happens if available_memory is not supplied at
all?  Do we auto-tune just from shared_buffers, or not autotune at all,
and then what are the defaults?  We could certainly throw an error if
shared_buffers > available_memory.  We might just ship with
available_memory defaulting to 256MB and auto-tune everything from
there.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-10 Thread Gibheer
On Thu, 10 Oct 2013 09:55:24 +0530
Amit Kapila amit.kapil...@gmail.com wrote:

 On Thu, Oct 10, 2013 at 3:17 AM, Gibheer gibh...@zero-knowledge.org
 wrote:
  On Mon, 7 Oct 2013 11:39:55 +0530
  Amit Kapila amit.kapil...@gmail.com wrote:
  Robert Haas wrote:
  On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
  andres(at)2ndquadrant(dot)com wrote:
   Hmm.  It seems like this match is making MaxConnections no
   longer mean the maximum number of connections, but rather the
   maximum number of non-replication connections.  I don't think
   I support that definitional change, and I'm kinda surprised if
   this is sufficient to implement it anyway (e.g. see
   InitProcGlobal()).
  
   I don't think the implementation is correct, but why don't you
   like the definitional change? The set of things you can do from
   replication connections are completely different from a normal
   connection. So using separate pools for them seems to make
   sense. That they end up allocating similar internal data seems
   to be an implementation detail to me.
 
   Because replication connections are still connections.  If I
   tell the system I want to allow 100 connections to the server,
   it should allow 100 connections, not 110 or 95 or any other
   number.
 
  I think that to reserve connections for replication, mechanism
  similar to superuser_reserved_connections be used rather than auto
  vacuum workers or background workers.
  This won't change the definition of MaxConnections. Another thing
  is that rather than introducing new parameter for replication
  reserved connections, it is better to use max_wal_senders as it
  can serve the purpose.
 
  Review for replication_reserved_connections-v2.patch, considering
  we are going to use mechanism similar to
  superuser_reserved_connections and won't allow definition of
  MaxConnections to change.
 
  1. /* the extra unit accounts for the autovacuum launcher */
MaxBackends = MaxConnections + autovacuum_max_workers + 1 +
  - + max_worker_processes;
  + + max_worker_processes + max_wal_senders;
 
  Above changes are not required.
 
  2.
  + if ((!am_superuser && !am_walsender) &&
    ReservedBackends > 0 &&
    !HaveNFreeProcs(ReservedBackends))
 
  Change the check as you have in patch-1 for
  ReserveReplicationConnections
 
  if (!am_superuser &&
  (max_wal_senders > 0 || ReservedBackends > 0) &&
  !HaveNFreeProcs(max_wal_senders + ReservedBackends))
  ereport(FATAL,
  (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
  errmsg("remaining connection slots are reserved for replication and
  superuser connections")));
 
  3. In guc.c, change description of max_wal_senders similar to
  superuser_reserved_connections at below place:
 {"max_wal_senders", PGC_POSTMASTER, REPLICATION_SENDING,
 gettext_noop("Sets the maximum number of simultaneously running WAL
 sender processes."),
 
  4. With this approach, there is no need to change
  InitProcGlobal(), as it is used to keep track bgworkerFreeProcs
  and autovacFreeProcs, for others it use freeProcs.
 
  5. Below description in config.sgml needs to be changed for
  superuser_reserved_connections to include the effect of
  max_wal_enders in reserved connections.
  Whenever the number of active concurrent connections is at least
  max_connections minus superuser_reserved_connections, new
  connections will be accepted only for superusers, and no new
  replication connections will be accepted.
 
  6. Also similar description should be added to max_wal_senders
  configuration parameter.
 
  7. Below comment needs to be updated, as it assumes only superuser
  reserved connections as part of MaxConnections limit.
 /*
   * ReservedBackends is the number of backends reserved for
  superuser use.
   * This number is taken out of the pool size given by MaxBackends
  so
   * number of backend slots available to non-superusers is
   * (MaxBackends - ReservedBackends).  Note what this really means
  is
   * if there are <= ReservedBackends connections available, only
  superusers
   * can make new connections --- pre-existing superuser connections
  don't
   * count against the limit.
   */
  int ReservedBackends;
 
  8. Also we can add comment on top of definition for max_wal_senders
  similar to ReservedBackends.
 
 
  With Regards,
  Amit Kapila.
  EnterpriseDB: http://www.enterprisedb.com
 
 
  Hi,
 
  I took the time and reworked the patch with the feedback till now.
  Thank you very much Amit!
 
Is there any specific reason why you moved this patch to next
 CommitFest?
 
 With Regards,
 Amit Kapila.
 EnterpriseDB: http://www.enterprisedb.com
 

Mike asked me about the status of the patch a couple days back and I
didn't think I would be able to work on the patch so soon again. That's
why I told him to just move the patch into the next commitfest.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add record_image_ops opclass for matview concurrent refresh.

2013-10-10 Thread Kevin Grittner
Kevin Grittner kgri...@postgresql.org wrote:

 Add record_image_ops opclass for matview concurrent refresh.

The buildfarm pointed out that I had not handled pass-by-value data
types correctly.  Fixed based on advice from Robert.  We'll see
whether that clears up the part of the buildfarm breakage
attributed to this patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index 0bcbb5d..7925ce2 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1464,9 +1464,8 @@ record_image_cmp(PG_FUNCTION_ARGS)
 			}
 			else if (tupdesc1->attrs[i1]->attbyval)
 			{
-cmpresult = memcmp(&(values1[i1]),
-   &(values2[i2]),
-   tupdesc1->attrs[i1]->attlen);
+if (values1[i1] != values2[i2])
+	cmpresult = (values1[i1] < values2[i2]) ? -1 : 1;
 			}
 			else
 			{
@@ -1695,9 +1694,7 @@ record_image_eq(PG_FUNCTION_ARGS)
 			}
 			else if (tupdesc1->attrs[i1]->attbyval)
 			{
-result = (memcmp(&(values1[i1]),
- &(values2[i2]),
- tupdesc1->attrs[i1]->attlen) == 0);
+result = (values1[i1] == values2[i2]);
 			}
 			else
 			{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 12:28 PM, Bruce Momjian wrote:


How do we handle the Python dependency, or is this all to be done in
some other language?  I certainly am not ready to take on that job.



Without considering any wider question here, let me just note this:

Anything that can be done in this area in Python should be doable in 
Perl fairly simply. I don't think we should be adding any Python 
dependencies. For good or ill Perl has been used for pretty much all our 
complex scripting (pgindent, MSVC build system etc.)



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote:
 
 On 10/10/2013 12:28 PM, Bruce Momjian wrote:
 
 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.
 
 
 Without considering any wider question here, let me just note this:
 
 Anything that can be done in this area in Python should be doable in
 Perl fairly simply. I don't think we should be adding any Python
 dependencies. For good or ill Perl has been used for pretty much all
 our complex scripting (pgindent, MSVC build system etc.)

Yes, but this is a run-time requirement, not build-time, and we have not
used Perl in that regard.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist

2013-10-10 Thread Andrew Dunstan


On 09/19/2013 06:12 PM, Pavel Stehule wrote:



2013/9/16 Satoshi Nagayasu sn...@uptime.jp







I'm looking at this patch, and I have a question here.

Should DROP TRIGGER IF EXISTS ignore error for non-existing trigger
and non-existing table? Or just only for non-existing trigger?


My opinion is that both variants should be ignored - it should be fully 
fault tolerant in this use case.






This thread seems to have gone cold, but I'm inclined to agree with 
Pavel. If the table doesn't exist, neither does the trigger, and the 
whole point of the 'IF EXISTS' variants is to provide the ability to 
issue DROP commands that don't fail if their target is missing.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 12:45 PM, Bruce Momjian wrote:

On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote:

On 10/10/2013 12:28 PM, Bruce Momjian wrote:

How do we handle the Python dependency, or is this all to be done in
some other language?  I certainly am not ready to take on that job.


Without considering any wider question here, let me just note this:

Anything that can be done in this area in Python should be doable in
Perl fairly simply. I don't think we should be adding any Python
dependencies. For good or ill Perl has been used for pretty much all
our complex scripting (pgindent, MSVC build system etc.)

Yes, but this is a run-time requirement, not build-time, and we have not
used Perl in that regard.




Nor Python. If we want to avoid added dependencies, we would need to use C.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote:
 Probably.

 The idea is that without those fields it's, to wit, impossible to
 explain non-monotonic movement in metrics of those queries for precise
 use in tools that insist on monotonicity of the fields, which are all
 cumulative to date I think.

 pg_stat_statements_reset() or crashing loses the session, so one
 expects ncalls may decrease.  In addition, a query that is bouncing
 in and out of the hash table will have its statistics be lost, so its
 statistics will also decrease.  This can cause un-resolvable artifacts
 in analysis tools.

 The two fields allow for precise understanding of when the statistics
 for a given query_id are continuous since the last time a program
 inspected it.

Thanks for elaborating on them! Since 'introduced' is reset even when
the statistics are reset, maybe we can do without 'session_start' for
that purpose?

 +/*
 + * The role calling this function is unable to see
 + * sensitive aspects of this tuple.
 + *
 + * Nullify everything except the insufficient privilege
 + * message for this entry
 + */
 +memset(nulls, 1, sizeof nulls);
 +
 +nulls[i]  = 0;
 +values[i] = CStringGetTextDatum("insufficient privilege");

 Why do we need to hide *all* the fields in pg_stat_statements, when
 it's accessed by improper user? This is a big change of pg_stat_statements
 behavior, and it might break the compatibility.

 It seems like an information leak that grows more major if query_id is
 exposed and, at any point, one can determine the query_id for a query
 text.

So hiding only query and query_id is enough?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 12:59:39PM -0400, Andrew Dunstan wrote:
 
 On 10/10/2013 12:45 PM, Bruce Momjian wrote:
 On Thu, Oct 10, 2013 at 12:39:04PM -0400, Andrew Dunstan wrote:
 On 10/10/2013 12:28 PM, Bruce Momjian wrote:
 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.
 
 Without considering any wider question here, let me just note this:
 
 Anything that can be done in this area in Python should be doable in
 Perl fairly simply. I don't think we should be adding any Python
 dependencies. For good or ill Perl has been used for pretty much all
 our complex scripting (pgindent, MSVC build system etc.)
 Yes, but this is a run-time requirement, not build-time, and we have not
 used Perl in that regard.
 
 
 
 Nor Python. If we want to avoid added dependencies, we would need to use C.

Yeah.  :-(  My crazy idea would be, because setting
available_mem without a restart would not be supported, to allow the
backend to output suggestions, e.g.:

test=> SET available_mem = '24GB';
Auto-tuning values:
shared_buffers = 6GB
work_mem = 10MB
...
ERROR:  parameter "available_mem" cannot be changed without restarting
the server

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus

 Because 'maintenance' operations were rarer, so we figured we could use
 more memory in those cases.

Once we brought Autovacuum into core, though, we should have changed that.

However, I agree with Magnus that the simple course is to have an
autovacuum_worker_memory setting which overrides maint_work_mem if set.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
On 10/09/2013 02:15 PM, Bruce Momjian wrote:
 and for shared_buffers of 2GB:
 
   test=> show shared_buffers;
    shared_buffers
   ----------------
    2GB
   (1 row)
   
   test=> SHOW work_mem;
    work_mem
   ----------
    6010kB
   (1 row)

Huh?  Only 6MB work_mem for 8GB RAM?  How'd you get that?

That's way low, and frankly it's not worth bothering with this if all
we're going to get is an incremental increase.  In that case, let's just
set the default to 4MB like Robert suggested.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Bruce Momjian
On Thu, Oct 10, 2013 at 10:20:02AM -0700, Josh Berkus wrote:
 On 10/09/2013 02:15 PM, Bruce Momjian wrote:
  and for shared_buffers of 2GB:
  
  test=> show shared_buffers;
   shared_buffers
  ----------------
   2GB
  (1 row)
  
  test=> SHOW work_mem;
   work_mem
  ----------
   6010kB
  (1 row)
 
 Huh?  Only 6MB work_mem for 8GB RAM?  How'd you get that?

 That's way low, and frankly it's not worth bothering with this if all
 we're going to get is an incremental increase.  In that case, let's just
 set the default to 4MB like Robert suggested.

Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses
3x work_mem, that gives us 1.8GB for total work_mem.  This was based on
Andrew's concerns about possible over-commit of work_mem.  I can of
course adjust that.

For comparison, the auto-tuned work_mem for 8GB of shared memory is 21MB.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Tue, Oct 8, 2013 at 10:07 PM, KONDO Mitsumasa
kondo.mitsum...@lab.ntt.co.jp wrote:
 Hi,

 I tested dbt-2 benchmark in single instance and synchronous replication.

Thanks!

 Unfortunately, my benchmark results did not show many differences...


 * Test server
Server: HP Proliant DL360 G7
CPU:Xeon E5640 2.66GHz (1P/4C)
Memory: 18GB(PC3-10600R-9)
Disk:   146GB(15k)*4 RAID1+0
RAID controller: P410i/256MB

 * Result
 ** Single instance**
             | NOTPM     | 90%tile     | Average | S.Deviation
 ------------+-----------+-------------+---------+-------------
 no-patched  | 3322.93   | 20.469071   | 5.882   | 10.478
 patched     | 3315.42   | 19.086105   | 5.669   | 9.108


 ** Synchronous Replication **
             | NOTPM     | 90%tile     | Average | S.Deviation
 ------------+-----------+-------------+---------+-------------
 no-patched  | 3275.55   | 21.332866   | 6.072   | 9.882
 patched     | 3318.82   | 18.141807   | 5.757   | 9.829

 ** Detail of result
 http://pgstatsinfo.projects.pgfoundry.org/DBT-2_Fujii_patch/


 I set full_page_writes = compress with Fujii's patch in DBT-2.  But it
 does not seem to be effective at eliminating WAL files.

Could you let me know how many WAL records were generated
during each benchmark?

I think that this benchmark result clearly means that the patch
has only limited effects in the reduction of WAL volume and
the performance improvement unless the database contains
highly-compressible data like pgbench_accounts.filler. But if
we can use other compression algorithm, maybe we can reduce
WAL volume very much. I'm not sure what algorithm is good
for WAL compression, though.

It might be better to introduce the hook for compression of FPW
so that users can freely use their compression module, rather
than just using pglz_compress(). Thought?
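
Such a hook might look like the sketch below; the type and variable
names are invented here for illustration, following the pattern of
existing hooks, and nothing like this exists in the patch yet:

    /* Return a palloc'd compressed image and set *comp_len, or return
     * NULL to fall back to the built-in pglz_compress() path. */
    typedef char *(*fpw_compress_hook_type) (const char *page,
                                             int orig_len,
                                             int *comp_len);

    extern fpw_compress_hook_type fpw_compress_hook;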

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Wed, Oct 9, 2013 at 1:35 PM, Haribabu kommi
haribabu.ko...@huawei.com wrote:
 On 08 October 2013 18:42 KONDO Mitsumasa wrote:
(2013/10/08 20:13), Haribabu kommi wrote:
 I will test with sync_commit=on mode and provide the test results.
OK. Thanks!

 Pgbench test results with synchronous_commit mode as on.

Thanks!

                      Thread-1                    Threads-2
                      Head code    FPW compress   Head code    FPW compress
 Pgbench-org 5min     138(0.24GB)  131(0.04GB)    160(0.28GB)  163(0.05GB)
 Pgbench-1000 5min    140(0.29GB)  128(0.03GB)    160(0.33GB)  162(0.02GB)
 Pgbench-org 15min    141(0.59GB)  136(0.12GB)    160(0.65GB)  162(0.14GB)
 Pgbench-1000 15min   138(0.81GB)  134(0.11GB)    159(0.92GB)  162(0.18GB)

 Pgbench-org - original pgbench
 Pgbench-1000 - changed pgbench with a record size of 1000.

This means that you changed the data type of pgbench_accounts.filler
to char(1000)?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
All,

We can't reasonably require user input at initdb time, because most
users don't run initdb by hand -- their installer does it for them.  So
any tuning which initdb does needs to be fully automated.

So, the question is: can we reasonably determine, at initdb time, how
much RAM the system has?
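
On most Unix-like systems it can at least be approximated with
sysconf(); a sketch (_SC_PHYS_PAGES is widespread but not required by
POSIX, and Windows would need GlobalMemoryStatusEx() instead):

    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        long    pages = sysconf(_SC_PHYS_PAGES);
        long    page_kb = sysconf(_SC_PAGE_SIZE) / 1024;

        if (pages < 0 || page_kb <= 0)
        {
            fprintf(stderr, "could not determine physical memory\n");
            return 1;
        }
        printf("physical memory: %ld MB\n", pages * page_kb / 1024);
        return 0;
    }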

I also think this is where the much-debated ALTER SYSTEM SET suddenly
becomes valuable.  With it, it's reasonable to run a tune-up tool on
the client side.  I do think it's reasonable to tell a user:

"Just installed PostgreSQL?  Run this command to tune your system:"

Mind you, the tune-up tool I'm working on makes use of Python, the
configuration directory, and Jinja2, so it's not even relevant to the
preceding.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Alvaro Herrera
Daniel Farina escribió:
 On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

  In my test, I found that pg_stat_statements.total_time always indicates a 
  zero.
  I guess that the patch might handle pg_stat_statements.total_time wrongly.
 
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(pgss->session_start));
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(entry->introduced));
 
  These should be executed only when detected_version >= PGSS_TUP_LATEST?
 
 Yes. Oversight.

Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?  I mean, if
later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
becomes latest, somebody running the current definition with the updated
.so will no longer get these values.  Or is the intention that
PGSS_TUP_LATEST will never be updated again, and future versions will
get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
I mean, surely we can always come up with new symbols to use, but it
seems hard to follow.

There's one other use of PGSS_TUP_LATEST here which I think should also
be changed to = SOME_SPECIFIC_VERSION:

+   if (detected_version >= PGSS_TUP_LATEST)
+   {
+   uint64 qid = pgss->private_stat_session_key;
+ 
+   qid ^= (uint64) entry->query_id;
+   qid ^= ((uint64) entry->query_id) << 32;
+ 
+   values[i++] = Int64GetDatumFast(qid);
+   }


This paragraph reads a bit strange to me:

+  A statistics session is the time period when statistics are gathered by 
statistics collector 
+  without being reset. So a statistics session continues across normal 
shutdowns, 
+  but whenever statistics are reset, like during a crash or upgrade, a new 
time period 
+  of statistics collection commences i.e. a new statistics session. 
+  The query_id value generation is linked to statistics session to emphasize 
the fact 
+  that whenever statistics are reset,the query_id for the same queries will 
also change.

"time period when"?  Shouldn't that be "time period during which".
Also, doesn't a new statistics session start when a stats reset is
invoked by the user?  The bit after "commences" appears correct (to me,
not a native by any means) but seems also a bit strange.

I just noticed that the table describing the output indicates that
session_start and introduced are of type text, but the SQL defines
timestamptz.


The instr_time thingy being used for these times maps to
QueryPerformanceCounter() on Windows, and I'm not sure how useful this
is for long-term time tracking; see
http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
for instance.  I think it'd be better to use TimestampTz and
GetCurrentTimestamp() for this.
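
Concretely, that would amount to something like this sketch of the
suggestion (not patch code):

    TimestampTz session_start = GetCurrentTimestamp();

which is microseconds since the Postgres epoch, survives long uptimes,
and converts to timestamptz without the instr_time detour.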

Just noticed that you changed the timer to struct Instrumentation.  Not
really sure about that change.  Since you seem to be using only the
start time and counter, wouldn't it be better to store only those?
Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Stephen Frost
Bruce,

* Bruce Momjian (br...@momjian.us) wrote:
 On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
  I'm really not impressed with this argument.  Either the user is going
  to go and modify the config file, in which case I would hope that they'd
  at least glance around at what they should change, or they're going to
  move off PG because it's not performing well enough for them- which is
  really what I'm trying to avoid happening during the first 15m.
 
 Well, they aren't going around and looking at other parameters now or we
 would not feel a need to auto-tune many of our defaults.

I think you're confusing things here.  There's a huge difference between
"didn't configure anything and got our defaults" and "went and changed
only one thing in postgresql.conf".  For one thing, we have a ton of the
former.  Perhaps there are some of the latter as well, but I would argue
it's a pretty small group.

 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.

I agree that we can't add a Python (or really, perl) dependency, but I
don't think there's anything terribly complicated in what pgtune is
doing that couldn't be pretty easily done in C..

 One nice thing about a tool is that you can see your auto-tuned defaults
 right away, while doing this in the backend, you have to start the
 server to see the defaults.  I am not even sure how I could allow users
 to preview their defaults for different available_mem settings.

Agreed.

  Actually, it *is* good, as Magnus pointed out.  Changing a completely
  unrelated parameter shouldn't make all of your plans suddenly change.
  This is mollified, but only a bit, if you have a GUC that's explicitly
  "this changes other GUCs", but I'd much rather have a tool that can do a
  better job to begin with and which helps the user understand what
  parameters are available to change and why there's more than one.
 
 Well, the big question is how many users are going to use the tool, as
 we are not setting this up for experts, but for novices.

The goal would be to have the distros and/or initdb use it for the
initial configuration..  Perhaps by using debconf or similar to ask the
user, perhaps by just running it and letting it do whatever it wants.

 I think one big risk is someone changing shared_buffers and not having
 an accurate available_memory.  That might lead to some very inaccurate
 defaults.  Also, what happens if available_memory is not supplied at
 all?  Do we auto-tune just from shared_buffers, or not autotune at all,
 and then what are the defaults?  We could certainly throw an error if
 shared_buffers > available_memory.  We might just ship with
 available_memory defaulting to 256MB and auto-tune everything from
 there.

These questions are less of an issue if we simply don't have this
available_memory GUC (which strikes me as just adding more confusion
for users anyway, not less).  If no '--available-memory' (or whatever)
option is passed to initdb then we should have it assume some default,
yes, but my view on what that is depends on what the specific results
are.  It sounds like --avail-memory=256MB would end up setting things to
about what we have now for defaults, which is alright for
shared_buffers, imv, but not for a default work_mem (1MB is *really*
small...).

Thanks,

Stephen


signature.asc
Description: Digital signature


[HACKERS] Re: dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread David Johnston
Robert Haas wrote
 Unfortunately, the buildfarm
 isn't entirely happy with this decision.  On buildfarm member anole
 (HP-UX B.11.31), allocation of dynamic shared memory fails with a
 Permission denied error, and on smew (Debian GNU/Linux 6.0), it
 fails with Function not implemented, which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.
 
 What shall we do about this?  I see a few options.

Is this something that rightly falls into being a distro/package-specific
setting?  If so, then the first goal should be to ensure the maximum number
of successful basic installation scenarios - namely, someone installing
PostgreSQL and connecting to the running postgres database without
encountering an error.

As a default, I would presume the current System V behavior is sufficient to
accomplish this goal.  If package maintainers can then guarantee that
changing the default will improve the user experience, they should be
supported and encouraged to do so; but if they are at all unsure, they
should leave the default in place.

As long as a new user is able to get a running database on their machine,
then if/when they run up against the low System V memory defaults they will
at least be able to focus on that single problem, as opposed to having a
failed initial install and being unsure exactly what they may have done wrong.

Thus option #2 seems sufficient.  I do think that having some kind of
shared-memory-manager utility could have value, but I'd rather see that be a
standalone utility as opposed to something magical done inside the bowels of
the database.  While probably harder to code and learn, such a utility would
provide for a much greater UX if implemented well.

David J.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/dynamic-shared-memory-wherein-I-am-punished-for-good-intentions-tp5774055p5774080.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Jeff Janes
On Wed, Oct 9, 2013 at 8:06 AM, Andrew Dunstan and...@dunslane.net wrote:


 On 10/09/2013 10:45 AM, Bruce Momjian wrote:

 On Wed, Oct  9, 2013 at 04:40:38PM +0200, Pavel Stehule wrote:

  Effectively, if every session uses one full work_mem, you end up
 with
  total work_mem usage equal to shared_buffers.

  We can try a different algorithm to scale up work_mem, but it seems
 wise
  to auto-scale it up to some extent based on shared_buffers.


 In my experience an optimal value of work_mem depends on data and load,
 so I prefer work_mem as an independent parameter.

 But it still is an independent parameter.  I am just changing the default.


 The danger with work_mem especially is that setting it too high can lead
 to crashing postgres or your system at some stage down the track, so
 autotuning it is kinda dangerous, much more dangerous than autotuning
 shared buffers.



Is this common to see?  I ask because in my experience, having 100
connections all decide to do large sorts simultaneously is going to make
the server fall over, regardless of whether it tries to do them in memory
(OOM) or whether it does them with tape sorts (stuck spin locks, usually).



 The assumption that each connection won't use lots of work_mem is also
 false, I think, especially in these days of connection poolers.


I don't follow that.  Why would using a connection pooler change the
multiples of work_mem that a connection would use?

Cheers,

Jeff


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Claudio Freire
On Thu, Oct 10, 2013 at 1:13 PM, Robert Haas robertmh...@gmail.com wrote:
 (1) Define the issue as "not our problem".  IOW, as of now, if you
 want to use PostgreSQL, you've got to either make POSIX shared memory
 work on your machine, or change the GUC that selects the type of
 dynamic shared memory used.

 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 (3) Add a new setting that auto-probes for a type of shared memory
 that works.  Try POSIX first, then System V.  Maybe even fall back to
 mmap'd files if neither of those works.

 (4) Remove the option to use POSIX shared memory.  System V FTW!

 After some consideration, I think my vote is for option #2.  Option #1
 seems too user-hostile, especially for a facility that has no in-core
 users yet, though I can imagine we might want to go that way
 eventually, especially if we at some point try to dump System V shared
 memory altogether, as has been proposed.  Option #4 seems sad; we know
 that System V shared memory limits are a problem for plenty of people,
 so it'd be a shame not to have a way to use the POSIX facilities if
 they're available.  Option #3 is fine as far as it goes, but it adds a
 significant amount of complexity I'd rather not deal with.

 Other votes?  Other ideas?


I believe option 2 is not only good for now, but also a necessary
first step on the way to option 3, which I believe should be the
goal.

So, ship a version with option 2, then implement 3, and make it the
default when enough people (using option 2) have successfully tested
pg's implementation of POSIX shared memory.
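
For what it's worth, the probe in #3 need not be complicated. A rough
sketch of the idea (illustrative only: the DSM_IMPL_* names and the probe
order here are my assumptions, not from any posted patch):

#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <unistd.h>

enum { DSM_IMPL_POSIX, DSM_IMPL_SYSV, DSM_IMPL_MMAP }; /* made-up names */

static int
probe_dynamic_shmem_type(void)
{
    /* Try POSIX shared memory first. */
    int fd = shm_open("/pg_dsm_probe", O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd >= 0)
    {
        close(fd);
        shm_unlink("/pg_dsm_probe");
        return DSM_IMPL_POSIX;
    }

    /* Fall back to System V. */
    {
        int shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

        if (shmid >= 0)
        {
            shmctl(shmid, IPC_RMID, NULL);
            return DSM_IMPL_SYSV;
        }
    }

    /* Last resort: mmap'd files should work anywhere, just more slowly. */
    return DSM_IMPL_MMAP;
}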


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
Bruce,

 That's way low, and frankly it's not worth bothering with this if all
 we're going to get is an incremental increase.  In that case, let's just
 set the default to 4MB like Robert suggested.
 
 Uh, well, 100 backends at 6MB gives us 600MB, and if each backend uses
 3x work_mem, that gives us 1.8GB for total work_mem.  This was based on
 Andrew's concerns about possible over-commit of work_mem.  I can of
 course adjust that.

That's worst-case-scenario planning -- the 3X work_mem per backend was:
a) Solaris and
b) data warehousing

In a normal OLTP application each backend averages something like 0.25 *
work_mem, since many queries use no work_mem at all.

It also doesn't address my point that, if we are worst-case-scenario
default-setting, we're going to end up with defaults which aren't
materially different from the current defaults.  In which case, why even
bother with this whole exercise?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 12:13 PM, Robert Haas wrote:

Since, as has been previously discussed in this forum on multiple
occasions [citation needed], the default System V shared memory limits
are absurdly low on many systems, the dynamic shared memory patch
defaults to POSIX shared memory, which has often been touted as a
superior alternative [citation needed].  Unfortunately, the buildfarm
isn't entirely happy with this decision.  On buildfarm member anole
(HP-UX B.11.31), allocation of dynamic shared memory fails with a
"Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
fails with "Function not implemented", which according to a forum
post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
on that box.

What shall we do about this?  I see a few options.

(1) Define the issue as "not our problem".  IOW, as of now, if you
want to use PostgreSQL, you've got to either make POSIX shared memory
work on your machine, or change the GUC that selects the type of
dynamic shared memory used.

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.

(3) Add a new setting that auto-probes for a type of shared memory
that works.  Try POSIX first, then System V.  Maybe even fall back to
mmap'd files if neither of those works.

(4) Remove the option to use POSIX shared memory.  System V FTW!

After some consideration, I think my vote is for option #2.  Option #1
seems too user-hostile, especially for a facility that has no in-core
users yet, though I can imagine we might want to go that way
eventually, especially if we at some point try to dump System V shared
memory altogether, as has been proposed.  Option #4 seems sad; we know
that System V shared memory limits are a problem for plenty of people,
so it'd be a shame not to have a way to use the POSIX facilities if
they're available.  Option #3 is fine as far as it goes, but it adds a
significant amount of complexity I'd rather not deal with.

Other votes?  Other ideas?



5) test and set it in initdb.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus

 It also doesn't address my point that, if we are worst-case-scenario
 default-setting, we're going to end up with defaults which aren't
 materially different from the current defaults.  In which case, why even
 bother with this whole exercise?

Oh, and let me reiterate: the way to optimize work_mem is through an
admission control mechanism.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan and...@dunslane.net wrote:
 Other votes?  Other ideas?

 5) test and set it in initdb.

Are you advocating for that option, or just calling out that it's
possible?  I'd say that's closely related to option #3, except at
initdb time rather than run-time - and it might be preferable to #3
for some of the same reasons discussed on the thread about tuning
work_mem, namely, that having it change from one postmaster lifetime
to the next might lead to user astonishment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:
 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 After some consideration, I think my vote is for option #2.

Wouldn't that become the call of packagers? Wasn't there already some
reason why it was advantageous for FreeBSD to continue to use System V
shared memory?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  Well, I like the idea of initdb calling the tool, though the tool then
  would need to be in C probably as we can't require python for initdb.
  The tool would not address Robert's issue of someone increasing
  shared_buffers on their own.

 I'm really not impressed with this argument.  Either the user is going
 to go and modify the config file, in which case I would hope that they'd
 at least glance around at what they should change, or they're going to
 move off PG because it's not performing well enough for them- which is
 really what I'm trying to avoid happening during the first 15m.

 Well, they aren't going around and looking at other parameters now or we
 would not feel a need to auto-tune many of our defaults.

 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.

I don't see why it can't be done in C.  The server is written in C,
and so is initdb.  So no matter where we do this, it's gonna be in C.
Where does Python enter into it?

What I might propose is that we add a new binary, tunedb, maybe
compiled out of the src/bin/initdb directory.  So you can say:

initdb --available-memory=32GB

...and it will initialize the cluster with appropriate settings.  Or
you can say:

tunedb --available-memory=32GB

...and it will print out a set of proposed configuration settings.  If
we want a mode that rewrites the configuration file, we could have:

tunedb --available-memory=32GB --rewrite-config-file=$PATH

...but that might be overkill, at least for version 1.

 One nice thing about a tool is that you can see your auto-tuned defaults
 right away, while doing this in the backend, you have to start the
 server to see the defaults.  I am not even sure how I could allow users
 to preview their defaults for different available_mem settings.

Yep, agreed.  And agreed that not being able to preview settings is a problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 1:37 PM, Josh Berkus j...@agliodbs.com wrote:
 So, the question is: can we reasonably determine, at initdb time, how
 much RAM the system has?

As long as you are willing to write platform-dependent code, yes.
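
For instance, a minimal POSIX-flavored sketch (illustrative only;
_SC_PHYS_PAGES is widespread but not universal, and Windows would need
GlobalMemoryStatusEx() instead):

#include <unistd.h>

/* Returns physical memory in bytes, or -1 if it cannot be determined. */
static long long
detect_physical_memory(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGE_SIZE);

    if (pages < 0 || page_size < 0)
        return -1;          /* caller falls back to conservative defaults */
    return (long long) pages * page_size;
}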

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compression of full-page-writes

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 1:20 AM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Hi,

 I did a partial review of this patch, wherein I focused on the patch and
 the code itself, as I saw other contributors already did some testing on
 it, so that we know it applies cleanly and works to some good extent.

Thanks a lot!

 In full_page_writes_str(), why are you returning "unrecognized" rather
 than doing an ELOG(ERROR, …) for this unexpected situation?

It's because the similar functions 'wal_level_str' and 'dbState' also return
'unrecognized' in the unexpected situation. I just implemented
full_page_writes_str()
in the same manner.

If we do an elog(ERROR) in that case, pg_xlogdump would fail to dump
the 'broken' (i.e., unrecognized fpw is set) WAL file. I think that some
users want to use pg_xlogdump to investigate the broken WAL file, so
doing an elog(ERROR) does not seem good to me.

 The code switches to compression (or trying to) when the following
 condition is met:

 +   if (fpw >= FULL_PAGE_WRITES_COMPRESS)
 +   {
 +       rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length,
 +                                       &(rdt->len));

 We have

 + typedef enum FullPageWritesLevel
 + {
 +   FULL_PAGE_WRITES_OFF = 0,
 +   FULL_PAGE_WRITES_COMPRESS,
 +   FULL_PAGE_WRITES_ON
 + } FullPageWritesLevel;

 + #define FullPageWritesIsNeeded(fpw)   (fpw >= FULL_PAGE_WRITES_COMPRESS)

 I don't much like using the >= test against an ENUM and I'm not sure I
 understand the intention you have here. It somehow looks like a typo and
 disagrees with the macro.

I thought that FPW should be compressed only when full_page_writes is
set to 'compress' or 'off'. That is, 'off' implies compression. When it's set
to 'off', FPW is basically not generated, so there is no need to call
CompressBackupBlock() in that case. But during online base backup only,
FPW is forcibly generated even when it's set to 'off'. So I used the check
fpw >= FULL_PAGE_WRITES_COMPRESS there.

 What about using the FullPageWritesIsNeeded
 macro, and maybe rewriting the macro as

 #define FullPageWritesIsNeeded(fpw) \
(fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON)

I'm OK with changing the macro so that the >= test is not used.

 Also, having 'on' imply 'compress' is a little funny to me. Maybe we
 should just finish our testing and be happy to always compress the full
 page writes. What would the downside be exactly? (On a busy IO system,
 writing less data even if needing more CPU will be the right trade-off.)

'on' doesn't imply 'compress'. When full_page_writes is set to 'on',
FPW is not compressed at all.

 I like that you're checking the savings of the compressed data with
 respect to the uncompressed data and cancel the compression if there's
 no gain. I wonder if your test accounts for enough padding and headers
 though given the results we saw in other tests made in this thread.

I'm afraid that the patch has only a limited effect on WAL reduction and
performance improvement unless the database contains highly-compressible
data, like columns full of blank characters. It really depends on the
contents of the database. So, obviously, FPW compression should not be the
default. Maybe we can treat it as just a tuning knob.

 Why do we have both the static function full_page_writes_str() and the
 macro FullPageWritesStr, with two different implementations issuing
 either 'true' and 'false' or 'on' and 'off'?

First I was thinking to use 'on' and 'off' because they are often used
as the setting values of boolean GUCs. But unfortunately the existing
pg_xlogdump uses 'true' and 'false' to show the value of full_page_writes
in WAL. To avoid breaking backward compatibility, I implemented
the true/false version of the function. I'm really not sure how many people
want such compatibility in pg_xlogdump, though.

 !   unsigned    hole_offset:15, /* number of bytes before hole */
 !               flags:2,        /* state of a backup block, see below */
 !               hole_length:15; /* number of bytes in hole */

 I don't understand that. I wanted to use that patch as a leverage to
 smoothly discover the internals of our WAL system but won't have the
 time to do that here.

We need the flag indicating whether each FPW is compressed or not.
If no such a flag exists in WAL, the standby cannot determine whether
it should decompress each FPW or not, and then cannot replay
the WAL containing FPW properly. That is, I just used a 'space' in
the header of FPW to have such a flag.

 That said, I don't even know that C syntax.

The struct 'ItemIdData' uses the same C syntax.
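
For reference, ItemIdData in src/include/storage/itemid.h is declared like
this (three bit-fields packed into one 32-bit word; quoted from memory, so
check the header for the exact comments):

typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of item pointer, see below */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;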

 + #define BKPBLOCK_UNCOMPRESSED 0   /* uncompressed */
 + #define BKPBLOCK_COMPRESSED   1   /* comperssed */

 There's a typo in the comment above.

Yep.

   [time required to replay WAL generated during running pgbench]
   61s (on)  1209911 transactions were replayed,
 recovery speed: 19834.6 transactions/sec
   39s 

Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 11:41 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't see why it can't be done in C.  The server is written in C,
 and so is initdb.  So no matter where we do this, it's gonna be in C.
 Where does Python enter into it?

I mentioned that pgtune was written in Python, but as you say that's
wholly incidental. An equivalent C program would only be slightly more
verbose.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Geoghegan
On Thu, Oct 10, 2013 at 11:43 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 1:37 PM, Josh Berkus j...@agliodbs.com wrote:
 So, the question is: can we reasonably determine, at initdb time, how
 much RAM the system has?

 As long as you are willing to write platform-dependent code, yes.

That's why trying to give the responsibility to a packager is compelling.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Eisentraut
On 10/10/13 11:31 AM, Bruce Momjian wrote:
 Let me walk through the idea of adding an available_mem setting, that
 Josh suggested, and which I think addresses Robert's concern about
 larger shared_buffers and Windows servers.

I think this is a promising idea.  available_mem could even be set
automatically by packages.  And power users could just set available_mem
= -1 to turn off all the magic.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 2:45 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/10/2013 11:41 AM, Robert Haas wrote:
 tunedb --available-memory=32GB

 ...and it will print out a set of proposed configuration settings.  If
 we want a mode that rewrites the configuration file, we could have:

 tunedb --available-memory=32GB --rewrite-config-file=$PATH

 ...but that might be overkill, at least for version 1.

 Given that we are talking currently about ALTER SYSTEM SET *and*
 configuration directories, we should not be rewriting any existing
 config file.  We should be adding an auto-generated one, or using ALTER
 SYSTEM SET.

 In fact, why don't we just do this though ALTER SYSTEM SET?  add a
 plpgsql function called pg_tune().

That's another way to do it, for sure.  It does require the ability to
log in to the database.  I imagine that could be less convenient in
some scripting environments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Fujii Masao
On Fri, Oct 11, 2013 at 2:49 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Daniel Farina escribió:
 On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

  In my test, I found that pg_stat_statements.total_time always indicates a 
  zero.
  I guess that the patch might handle pg_stat_statements.total_time wrongly.
 
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(pgss->session_start));
  +values[i++] = DatumGetTimestamp(
  +instr_get_timestamptz(entry->introduced));
 
  These should be executed only when detected_version >= PGSS_TUP_LATEST?

 Yes. Oversight.

 Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?

I was just thinking the same thing. Agreed.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Magnus Hagander
On Thu, Oct 10, 2013 at 8:41 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Oct 10, 2013 at 12:00:54PM -0400, Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  Well, I like the idea of initdb calling the tool, though the tool then
  would need to be in C probably as we can't require python for initdb.
  The tool would not address Robert's issue of someone increasing
  shared_buffers on their own.

 I'm really not impressed with this argument.  Either the user is going
 to go and modify the config file, in which case I would hope that they'd
 at least glance around at what they should change, or they're going to
 move off PG because it's not performing well enough for them- which is
 really what I'm trying to avoid happening during the first 15m.

 Well, they aren't going around and looking at other parameters now or we
 would not feel a need to auto-tune many of our defaults.

 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.

 I don't see why it can't be done in C.  The server is written in C,
 and so is initdb.  So no matter where we do this, it's gonna be in C.
 Where does Python enter into it?

 What I might propose is that we add a new binary, tunedb, maybe
 compiled out of the src/bin/initdb directory.  So you can say:

 initdb --available-memory=32GB

 ...and it will initialize the cluster with appropriate settings.  Or
 you can say:

 tunedb --available-memory=32GB

 ...and it will print out a set of proposed configuration settings.  If
 we want a mode that rewrites the configuration file, we could have:

 tunedb --available-memory=32GB --rewrite-config-file=$PATH

 ...but that might be overkill, at least for version 1.

I like this. And I agree that the edit-in-place might be overkill. But
then, if/when we get the ability to programmatically modify the config
files, that's probably not a very complicated thing to add once the
rest is done.


 One nice thing about a tool is that you can see your auto-tuned defaults
 right away, while doing this in the backend, you have to start the
 server to see the defaults.  I am not even sure how I could allow users
 to preview their defaults for different available_mem settings.

 Yep, agreed.  And agreed that not being able to preview settings is a problem.

I'd even say it would be a *big* problem.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Josh Berkus
On 10/10/2013 11:41 AM, Robert Haas wrote:
 tunedb --available-memory=32GB
 
 ...and it will print out a set of proposed configuration settings.  If
 we want a mode that rewrites the configuration file, we could have:
 
 tunedb --available-memory=32GB --rewrite-config-file=$PATH
 
 ...but that might be overkill, at least for version 1.

Given that we are talking currently about ALTER SYSTEM SET *and*
configuration directories, we should not be rewriting any existing
config file.  We should be adding an auto-generated one, or using ALTER
SYSTEM SET.

In fact, why don't we just do this though ALTER SYSTEM SET?  add a
plpgsql function called pg_tune().

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:
 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 After some consideration, I think my vote is for option #2.

 Wouldn't that become the call of packagers?

Packagers can certainly override whatever we do, but we still need to
make the buildfarm green again.

 Wasn't there already some
 reason why it was advantageous for FreeBSD to continue to use System V
 shared memory?

Yes, but this code doesn't affect the main shared memory segment, so I
think that's sort of a separate point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Geoghegan
On Wed, Oct 9, 2013 at 10:21 PM, Magnus Hagander mag...@hagander.net wrote:
 Well, the Postgres defaults won't really change, because the default
 vacuum_work_mem will be -1, which will have vacuum defer to
 maintenance_work_mem. Under this scheme, vacuum only *prefers* to get
 bound working memory size from vacuum_work_mem. If you don't like
 vacuum_work_mem, you can just ignore it.

 While unrelated to the main topic of this thread, I think this is very
 important as well. I often have to advice people to remember to cap
 their maintenance_work_mem because of autovacuum, and to remember to
 re-tune maintenance_work_mem when they change the number of autovacuum
 workers.

I'll code that up at some point, then.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Peter Eisentraut
On 10/10/13 11:45 AM, Bruce Momjian wrote:
 I think the big win for a tool would be to query the user about how they
 are going to be using Postgres, and that can then spit out values the
 user can add to postgresql.conf, or to a config file that is included at
 the end of postgresql.conf.

I think such a tool would actually make the initial experience worse for
many people.  Quick, how are you going to use your PostgreSQL server:

- OLTP
- web
- mixed

Uh, all of the above?  This sort of thing can quickly turn the first 15
minutes into the first 2 hours, as users are forced to analyze the
different settings, have second thoughts, wonder about how to change
them back, etc.  The fewer decisions people have to make initially, the
better.  The initdb phase already has too many required decisions that
can cause regret later (e.g., locale, encoding, checksums).



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange behavior of pg_trgm's similarity function

2013-10-10 Thread Fujii Masao
On Thu, Oct 10, 2013 at 11:00 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 7:12 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 On 10.10.2013 15:03, Fujii Masao wrote:

 Hi,

 The behavior of pg_trgm's similarity function seems strange. Is this
 intentional?

 I was thinking that the following three calls of the similarity function
 return
 the same number because the second argument is just the three characters
  contained in the first argument in every call.

 =# SELECT similarity('12345', '123');
 =# SELECT similarity('12345', '234');
 =# SELECT similarity('12345', '345');

 But that's not true. Each returns the different number.

 =# SELECT similarity('12345', '123');
  similarity
 ------------
    0.428571
 (1 row)

 =# SELECT similarity('12345', '234');
  similarity
 ------------
    0.111111
 (1 row)

 =# SELECT similarity('12345', '345');
  similarity
 ------------
        0.25
 (1 row)

 This happens because, for example, similarity('12345', '123') returns
 the similarity number of '**12345*' and '**123*' (* means the blank
 character), NOT '12345' and '123'. IOW, two and one blank characters are
 added at the head and tail of each argument, respectively. I wonder why
 pg_trgm's similarity function works in this way. We should change this
 so that no blank characters are added into the arguments?


 Well, you could also argue that 11 and 22 are quite similar,
 even though pg_trgm's similarity will not think so. It comes down to the
 definition of similarity, and how well that definition matches your
 intuition.

 FWIW, it feels right to me that a match in the beginning of a word is worth
 more than one in the middle of a string. -1 on changing that.

Okay, understood.
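
For concreteness, here is the padded-trigram arithmetic behind the first
result, worked out by hand (the sets below are mine, not from the thread):

'12345' pads to '  12345 ' -> {'  1', ' 12', '123', '234', '345', '45 '}  (6)
'123'   pads to '  123 '   -> {'  1', ' 12', '123', '23 '}                (4)

shared = 3, union = 6 + 4 - 3 = 7, similarity = 3/7 = 0.428571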

 I'm not so sure that the assumption that leading trigrams should
 effectively weight  3x is a good one to build into the library.
 However, the behavior is clearly documented and can't be changed.   I
 think you'd need to improvise an alternate set of trigram ops if you
 wanted to rig an alternate matching behavior.

Yeah, this makes sense.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.4] row level security

2013-10-10 Thread Marc Munro
On Wed, 2013-09-04 at 14:35 +, Robert Haas wrote:
 
 On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think it's entirely sensible to question whether we should reject
 (not
  hold up) RLS if it has major covert-channel problems.
 
 We've already had this argument before, about the security_barrier
[ . . . ]

Sorry for following up on this so late, I have just been trying to catch
up with the mailing lists.

I am the developer of Veil, which this thread mentioned a number of
times.  I wanted to state/confirm a number of things:

Veil is not up to date wrt Postgres versions.  I didn't release a new
version for 9.2, and when no-one complained I figured no-one other than
me was using it.  I'll happily update it if anyone wants it.

Veil makes no attempt to avoid covert channels.  It can't.

Veil is a low-level toolset designed for optimising queries about
privileges.  It allows you to build RLS with reasonable performance, but
it is not in itself a solution for RLS.

I wish the Postgres RLS project well and look forward to its release in
Postgres 9.4.  

__
Marc




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Magnus Hagander
On Thu, Oct 10, 2013 at 8:46 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 2:45 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/10/2013 11:41 AM, Robert Haas wrote:
 tunedb --available-memory=32GB

 ...and it will print out a set of proposed configuration settings.  If
 we want a mode that rewrites the configuration file, we could have:

 tunedb --available-memory=32GB --rewrite-config-file=$PATH

 ...but that might be overkill, at least for version 1.

 Given that we are talking currently about ALTER SYSTEM SET *and*
 configuration directories, we should not be rewriting any existing
 config file.  We should be adding an auto-generated one, or using ALTER
 SYSTEM SET.

 In fact, why don't we just do this though ALTER SYSTEM SET?  add a
 plpgsql function called pg_tune().

 That's another way to do it, for sure.  It does require the ability to
 log in to the database.  I imagine that could be less convenient in
 some scripting environments.

I think that would also make it much harder to automate for packagers.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 02:35 PM, Robert Haas wrote:

On Thu, Oct 10, 2013 at 2:21 PM, Andrew Dunstan and...@dunslane.net wrote:

Other votes?  Other ideas?

5) test and set it in initdb.

Are you advocating for that option, or just calling out that it's
possible?  I'd say that's closely related to option #3, except at
initdb time rather than run-time - and it might be preferable to #3
for some of the same reasons discussed on the thread about tuning
work_mem, namely, that having it change from one postmaster lifetime
to the next might lead to user astonishment.




Mainly just to throw it into the mix. But, like you, I think it's probably
a better option than #3 for the reason you give. It also has the
advantage of keeping any probing code out of the backend.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
 On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan p...@heroku.com wrote:
  On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:
  (2) Default to using System V shared memory.  If people want POSIX
  shared memory, let them change the default.
 
  After some consideration, I think my vote is for option #2.
 
  Wouldn't that become the call of packagers?
 
 Packagers can certainly override whatever we do, but we still need to
 make the buildfarm green again.

While I agree that making the buildfarm green is valuable, I really
wonder about a system where /dev/shm is busted.

Personally, I like Andrew's suggestion to test and set accordingly, with
the default being POSIX if it's available and a fall-back to SysV (maybe
with a warning..).  Going back to the situation where our default set-up
limits us to the ridiculously small SysV value would *really* suck; even
if we don't have any users today, we're certainly going to have some
soon and I don't think they'll be happy with a 24MB (or whatever) limit.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Christopher Browne
On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian br...@momjian.us wrote:
 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.

I should think it possible to reimplement it in C.  It was considerably
useful to start by implementing in Python, as that evades various sorts
of efforts needed in C (e.g. - memory allocation, picking a hash table
implementation), and allows someone to hack on it without needing to
run through a recompile every time something is touched.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 10:12 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Oct 11, 2013 at 12:19 AM, Daniel Farina dan...@heroku.com wrote:
 Probably.

 The idea is that without those fields it's, to wit, impossible to
 explain non-monotonic movement in metrics of those queries for precise
 use in tools that insist on monotonicity of the fields, which are all
 cumulative to date I think.

 pg_stat_statements_reset() or crashing loses the session, so one
 expects ncalls may decrease.  In addition, a query that is bouncing
 in and out of the hash table will have its statistics be lost, so its
 statistics will also decrease.  This can cause unresolvable artifacts
 in analysis tools.

 The two fields allow for precise understanding of when the statistics
 for a given query_id are continuous since the last time a program
 inspected it.

 Thanks for elaborating them! Since 'introduced' is reset even when
 the statistics are reset, maybe we can do without 'session_start' for
 that purpose?

There is a small loss of precision.

The original reason was to let one know, given two samples of
pg_stat_statements, whether the query_id is going to remain stable for
a given query.

For example:

If the session changes on account of a reset, then it is known that
all query_ids one's external program is tracking are no longer going
to be updated, and the program can take account for the fact that the
same query text is going to have a new query_id.

Given the new idea of mixing in the point release number:

If the code is changed to instead mix in the full version of
Postgres, as you suggested recently, this can probably be removed.
The caveat there is then the client is going to have to do something a
bit weird like ask for the point release and perhaps even compile
options of Postgres to know when the query_id is going to have a
different value for a given query text.  But, maybe this is an
acceptable compromise to remove one field.

 +/*
 + * The role calling this function is unable to see
 + * sensitive aspects of this tuple.
 + *
 + * Nullify everything except the insufficient privilege
 + * message for this entry
 + */
 +memset(nulls, 1, sizeof nulls);
 +
 +nulls[i]  = 0;
 +values[i] = CStringGetTextDatum("<insufficient privilege>");

 Why do we need to hide *all* the fields in pg_stat_statements, when
 it's accessed by an improper user? This is a big change in the behavior
 of pg_stat_statements, and it might break compatibility.

 It seems like an information leak that grows more major if query_id is
 exposed and, at any point, one can determine the query_id for a query
 text.

 So hiding only query and query_id is enough?

Yeah, I think so.  The other fields feel a bit weird to leave hanging
around as well, so I thought I'd just fix it in one shot, but doing
the minimum or only applying this idea to new fields is safer.  It's
shorter to hit all the fields, though, which is why I was tempted to
do that.

Perhaps not a good economy for potential subtle breaks in the next version.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 3:41 PM, Christopher Browne cbbro...@gmail.com wrote:
 On Thu, Oct 10, 2013 at 12:28 PM, Bruce Momjian br...@momjian.us wrote:
 How do we handle the Python dependency, or is this all to be done in
 some other language?  I certainly am not ready to take on that job.

 I should think it possible to reimplement it in C.  It was considerably
 useful to start by implementing in Python, as that evades various sorts
 of efforts needed in C (e.g. - memory allocation, picking a hash table
 implementation), and allows someone to hack on it without needing to
 run through a recompile every time something is touched.

Also, the last time I saw that tool, it output recommendations for
work_mem that I would never, ever recommend to anyone on a production
server - they were VERY high.

More generally, Josh has made repeated comments that various proposed
value/formulas for work_mem are too low, but obviously the people who
suggested them didn't think so.  So I'm a bit concerned that we don't
all agree on what the end goal of this activity looks like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Merlin Moncure
On Thu, Oct 10, 2013 at 11:13 AM, Robert Haas robertmh...@gmail.com wrote:
 Since, as has been previously discussed in this forum on multiple
 occasions [citation needed], the default System V shared memory limits
 are absurdly low on many systems, the dynamic shared memory patch
 defaults to POSIX shared memory, which has often been touted as a
 superior alternative [citation needed].  Unfortunately, the buildfarm
 isn't entirely happy with this decision.  On buildfarm member anole
 (HP-UX B.11.31), allocation of dynamic shared memory fails with a
 "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
 fails with "Function not implemented", which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.

 What shall we do about this?  I see a few options.

  (1) Define the issue as "not our problem".  IOW, as of now, if you
 want to use PostgreSQL, you've got to either make POSIX shared memory
 work on your machine, or change the GUC that selects the type of
 dynamic shared memory used.

 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

Doesn't #2 negate all advantages of this effort?  Bringing sysv
management back on the table seems like a giant step backwards -- or
am I missing something?

http://www.postgresql.org/docs/9.3/interactive/kernel-resources.html#SYSVIPC

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 10:49 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Daniel Farina escribió:
 On Thu, Oct 10, 2013 at 7:40 AM, Fujii Masao masao.fu...@gmail.com wrote:

  In my test, I found that pg_stat_statements.total_time always indicates a 
  zero.
  I guess that the patch might handle pg_stat_statements.total_time wrongly.
 
   +values[i++] = DatumGetTimestamp(
   +instr_get_timestamptz(pgss->session_start));
   +values[i++] = DatumGetTimestamp(
   +instr_get_timestamptz(entry->introduced));
 
   These should be executed only when detected_version >= PGSS_TUP_LATEST?

 Yes. Oversight.

 Hmm, shouldn't this be conditional on a new PGSS_TUP_V1_2?  I mean, if
 later somebody patches pgss to have a PGSS_TUP_V2 or V1_3 and that
 becomes latest, somebody running the current definition with the updated
 .so will no longer get these values.  Or is the intention that
 PGSS_TUP_LATEST will never be updated again, and future versions will
 get PGSS_TUP_REALLY_LATEST and PGSS_TUP_REALLY_LATEST_HONEST and so on?
 I mean, surely we can always come up with new symbols to use, but it
 seems hard to follow.

 There's one other use of PGSS_TUP_LATEST here which I think should also
 be changed to >= SOME_SPECIFIC_VERSION:

 +   if (detected_version >= PGSS_TUP_LATEST)
 +   {
 +       uint64 qid = pgss->private_stat_session_key;
 +
 +       qid ^= (uint64) entry->query_id;
 +       qid ^= ((uint64) entry->query_id) << 32;
 +
 +       values[i++] = Int64GetDatumFast(qid);
 +   }

I made some confusing mistakes here in using the newer tuple
versioning.  Let me try to explain what the motivation was:

I was adding new fields to pg_stat_statements and was afraid that it'd
be hard to get a very clear view that all the set fields are in
alignment and there were no accidental overruns, with the problem
getting more convoluted as more versions are added.

The idea of PGSS_TUP_LATEST is useful to catch common programmer errors
by testing some invariants, and it'd be nice not to have to thrash the
invariant-checking code every release, which would probably defeat the
point of such oops-prevention code.  But the fact that I went on to
rampantly do questionable things with PGSS_TUP_LATEST is a bad sign.

By example, here are the two uses that have served me very well:

/* Perform version detection */
if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_0)
    detected_version = PGSS_TUP_V1_0;
else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS_V1_1)
    detected_version = PGSS_TUP_V1_1;
else if (tupdesc->natts == PG_STAT_STATEMENTS_COLS)
detected_version = PGSS_TUP_LATEST;
else
{
/*
 * Couldn't identify the tuple format.  Raise error.
 *
 * This is an exceptional case that may only happen in bizarre
 * situations, since it is thought that every released version
 * of pg_stat_statements has a matching schema.
 */
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 errmsg("pg_stat_statements schema is not supported "
        "by its installed binary")));
}

And

#ifdef USE_ASSERT_CHECKING
/* Check that every column appears to be filled */
switch (detected_version)
{
case PGSS_TUP_V1_0:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_0);
break;
case PGSS_TUP_V1_1:
Assert(i == PG_STAT_STATEMENTS_COLS_V1_1);
break;
case PGSS_TUP_LATEST:
Assert(i == PG_STAT_STATEMENTS_COLS);
break;
default:
Assert(false);
}
#endif

Given that, perhaps a way to fix this is something like this patch-fragment:


 {
  PGSS_TUP_V1_0 = 1,
  PGSS_TUP_V1_1,
- PGSS_TUP_LATEST
+ PGSS_TUP_V1_2
 } pgssTupVersion;

+#define PGSS_TUP_LATEST PGSS_TUP_V1_2


This way when a programmer is making new tuple versions, they are much
more likely to see the immediate need to teach those two sites about
the new tuple size.  But, the fact that one does not get the
invariants updated in a completely compulsory way may push the value
of this checking under water.

I'd be sad to see it go, it has saved me a lot of effort when
returning to the code after a while.

What do you think?

 The instr_time thingy being used for these times maps to
 QueryPerformanceCounter() on Windows, and I'm not sure how useful this
 is for long-term time tracking; see
 http://stackoverflow.com/questions/5296990/queryperformancecounter-and-overflows#5297163
 for instance.  I think it'd be better to use TimestampTz and
 GetCurrentTimestamp() for this.

Ah. I was going to do that, but thought it'd be nice to merely push
down the already-extant Instr struct in most cases, so as to get the
'start' time stored there for free.  But if it can't 

Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Alvaro Herrera
Daniel Farina escribió:

 Given that, perhaps a way to fix this is something like this patch-fragment:
 
 
  {
   PGSS_TUP_V1_0 = 1,
   PGSS_TUP_V1_1,
 - PGSS_TUP_LATEST
 + PGSS_TUP_V1_2
  } pgssTupVersion;
 
 +#define PGSS_TUP_LATEST PGSS_TUP_V1_2
 

This sounds good.  I have seen other places that have the LATEST
definition as part of the enum, assigning the previous value to it.  I'm
not really sure which of these is harder to miss when updating the code.
I'm happy with either.
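
(For illustration, the in-enum style would look something like this -- a
sketch only, reusing the names from the fragment above:

typedef enum pgssTupVersion
{
    PGSS_TUP_V1_0 = 1,
    PGSS_TUP_V1_1,
    PGSS_TUP_V1_2,
    PGSS_TUP_LATEST = PGSS_TUP_V1_2 /* alias must track the newest member */
} pgssTupVersion;

so the alias still exists, but lives next to the values it must track.)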

Make sure to use the PGSS_TUP_V1_2 in the two places mentioned, in any
case.

  Just noticed that you changed the timer to struct Instrumentation.  Not
  really sure about that change.  Since you seem to be using only the
  start time and counter, wouldn't it be better to store only those?
  Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().
 
 Yeah, I was unsure about that too.
 
 The motivation was that I need one more piece of information in
 pgss_store (the absolute start time).  I was going to widen the
 argument list, but it was looking pretty long, so instead I was
 thinking it'd be more concise to push the entire, typically extant
 Instrumentation struct pointer down.

Would it work to define your own struct to pass around?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Andrew Dunstan


On 10/10/2013 02:45 PM, Robert Haas wrote:

On Thu, Oct 10, 2013 at 2:36 PM, Peter Geoghegan p...@heroku.com wrote:

On Thu, Oct 10, 2013 at 9:13 AM, Robert Haas robertmh...@gmail.com wrote:

(2) Default to using System V shared memory.  If people want POSIX
shared memory, let them change the default.
After some consideration, I think my vote is for option #2.

Wouldn't that become the call of packagers?

Packagers can certainly override whatever we do, but we still need to
make the buildfarm green again.





I really dislike throwing things over the wall to packagers like this, 
anyway. Quite apart from anything else, not everyone uses pre-built 
packages, and we should make things as easy as possible for those who 
don't, too.



cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread David Fetter
On Thu, Oct 10, 2013 at 12:13:20PM -0400, Robert Haas wrote:
 Since, as has been previously discussed in this forum on multiple
 occasions [citation needed], the default System V shared memory limits
 are absurdly low on many systems, the dynamic shared memory patch
 defaults to POSIX shared memory, which has often been touted as a
 superior alternative [citation needed].  Unfortunately, the buildfarm
 isn't entirely happy with this decision.  On buildfarm member anole
 (HP-UX B.11.31), allocation of dynamic shared memory fails with a
 "Permission denied" error, and on smew (Debian GNU/Linux 6.0), it
 fails with "Function not implemented", which according to a forum
 post[1] I found probably indicates that /dev/shm doesn't mount a tmpfs
 on that box.
 
 What shall we do about this?  I see a few options.
 
  (1) Define the issue as "not our problem".  IOW, as of now, if you
 want to use PostgreSQL, you've got to either make POSIX shared memory
 work on your machine, or change the GUC that selects the type of
 dynamic shared memory used.

+1 for this.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread pilum . 70

The V7 patch applied cleanly and I hit no issues in my first tests.

The change from a session_start column to a function seems very reasonable to me.

Concerning usability, I would like to suggest a minor change that massively
increases the usefulness of the patch for beginners, who often use this view
as a first approach to optimizing index structure.


The history of this tool contains a first version without normalization,
which wasn't useful enough except for prepared queries.
The current version normalizes queries, so calls get summarized to give a
glimpse of bad queries. But the drawback of this approach is that it is
impossible to run EXPLAIN ANALYZE on the displayed string without further
substitutions.

The identification patch makes it possible to summarize calls by query_id,
so that the normalized query string itself no longer needs to be exposed
in the view for everyone.


I suggest adding a parameter to restore the ability to display real queries.
The following very minor change (based on V7) exposes the first real query
seen for a given query_id when normalization of the exposed string is
deactivated (the current behaviour remains the default).
This new option is not always the best starting point for discovering index
shortcomings, but it is a huge gain for beginners because it serves their
needs in more than 90% of the normal use cases.

What do you think?

Arne

Date:   Mon Oct 7 17:54:08 2013 +

Switch to disable normalized query strings

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index e50dfba..6cc9244 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -234,7 +234,7 @@ static int  pgss_max;           /* max # statements to track */
 static int  pgss_track;         /* tracking level */
 static bool pgss_track_utility; /* whether to track utility commands */
 static bool pgss_save;          /* whether to save stats across shutdown */
-
+static bool pgss_normalize;     /* whether to normalize the query representation shown in the view (otherwise show the first query executed with this query_id) */
 
 #define pgss_enabled() \
     (pgss_track == PGSS_TRACK_ALL || \
@@ -356,6 +356,17 @@ _PG_init(void)
                              NULL,
                              NULL);
 
+    DefineCustomBoolVariable("pg_stat_statements.normalize",
+                             "Selects whether the view column contains the query strings in a normalized form.",
+                             NULL,
+                             &pgss_normalize,
+                             true,
+                             PGC_SUSET,
+                             0,
+                             NULL,
+                             NULL,
+                             NULL);
+
     EmitWarningsOnPlaceholders("pg_stat_statements");
 
     /*
@@ -1084,9 +1095,9 @@ pgss_store(const char *query, uint32 queryId,
 
     query_len = strlen(query);
 
-    if (jstate)
+    if (jstate && pgss_normalize)
     {
-        /* Normalize the string if enabled */
+        /* Normalize only if jstate is set and normalized strings are enabled */
         norm_query = generate_normalized_query(jstate, query,
                                                query_len,
                                                &key.encoding);
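
With this applied, a superuser could then flip the behaviour per session,
e.g. SET pg_stat_statements.normalize = off; (the sketch registers the GUC
as PGC_SUSET, so ordinary users cannot change it).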






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Pattern matching operators and indexes

2013-10-10 Thread soroosh sardari
Hi

I'm developing a new character string type, similar to varchar. I wrote
the operators for btree and so forth.
I wonder how pattern matching operators can use a btree index: a btree
operator class only knows about the <, <=, =, >= and > operators, and
pattern matching operators such as LIKE are not known to the btree
access method.

Now my question is:
does PostgreSQL use a btree index for pattern matching queries on
varchar or other character string types?

If it does, how do I implement it for my new type?
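
For context: the stock answer for varchar/text is a second operator
class (text_pattern_ops) whose comparison function sorts strictly
byte-wise, which generally lets the planner turn an anchored pattern
such as col LIKE 'abc%' into a btree range scan. Below is a minimal
sketch of an analogous comparison support function for a new string
type; it uses text as a stand-in for the new type, and the name
mytype_pattern_cmp is hypothetical, not an existing function.

#include "postgres.h"
#include "fmgr.h"

/*
 * Sketch of a btree comparison support function for a "pattern ops"
 * opclass.  It compares raw bytes with memcmp (no locale rules), since
 * only a byte-wise sort order lines up with anchored LIKE prefixes.
 */
PG_FUNCTION_INFO_V1(mytype_pattern_cmp);

Datum
mytype_pattern_cmp(PG_FUNCTION_ARGS)
{
    text       *a = PG_GETARG_TEXT_PP(0);
    text       *b = PG_GETARG_TEXT_PP(1);
    int         lena = VARSIZE_ANY_EXHDR(a);
    int         lenb = VARSIZE_ANY_EXHDR(b);
    int         cmp;

    /* Byte-wise comparison over the common prefix... */
    cmp = memcmp(VARDATA_ANY(a), VARDATA_ANY(b), Min(lena, lenb));

    /* ...with the shorter string sorting first on a tie. */
    if (cmp == 0)
        cmp = (lena < lenb) ? -1 : (lena > lenb) ? 1 : 0;

    PG_RETURN_INT32(cmp);
}

An opclass built on byte-wise ordering like this is what the LIKE
prefix optimization requires; a locale-aware default opclass generally
does not qualify, which is exactly why text_pattern_ops exists
alongside it.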

Regards,
Soroosh


Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Robert Haas
On Thu, Oct 10, 2013 at 4:00 PM, Merlin Moncure mmonc...@gmail.com wrote:
 (2) Default to using System V shared memory.  If people want POSIX
 shared memory, let them change the default.

 Doesn't #2 negate all advantages of this effort?  Bringing sysv
 management back on the table seems like a giant step backwards -- or
 am I missing something?

Not unless there's no difference between the default and the only option.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_stat_statements: calls under-estimation propagation

2013-10-10 Thread Daniel Farina
On Thu, Oct 10, 2013 at 1:30 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
  Just noticed that you changed the timer to struct Instrumentation.  Not
  really sure about that change.  Since you seem to be using only the
  start time and counter, wouldn't it be better to store only those?
  Particularly unsure about passing INSTRUMENT_ALL to InstrAlloc().

 Yeah, I was unsure about that too.

 The motivation was that I need one more piece of information in
 pgss_store (the absolute start time).  I was going to widen the
 argument list, but it was looking pretty long, so instead I was
 thinking it'd be more concise to push the entire, typically extant
 Instrumentation struct pointer down.

 Would it work to define your own struct to pass around?

Absolutely, I was just hoping to spare the code another abstraction if
an existing one was a precise superset.

Looks like that isn't going to happen, though, so a pgss-oriented
struct is likely what it will have to be.
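
For what it's worth, a minimal sketch of what such a struct could look
like, assuming pgss_store() really only needs the absolute start time
plus the one counter mentioned above (the pgssExecInfo name and its
fields are illustrative, not from the patch):

#include "postgres.h"
#include "utils/timestamp.h"

/*
 * Illustrative pgss-private container, standing in for the borrowed
 * Instrumentation pointer: only the fields pgss_store() consumes.
 */
typedef struct pgssExecInfo
{
    TimestampTz stmt_start;     /* absolute start time of the statement */
    double      counter;        /* the single counter pgss_store() reads */
} pgssExecInfo;

Passing a pointer to one of these keeps the pgss_store() argument list
short without dragging INSTRUMENT_ALL along.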




Re: [HACKERS] dynamic shared memory: wherein I am punished for good intentions

2013-10-10 Thread Josh Berkus
Robert,

 Doesn't #2 negate all advantages of this effort?  Bringing sysv
 management back on the table seems like a giant step backwards -- or
 am I missing something?
 
 Not unless there's no difference between the default and the only option.

Well, per our earlier discussion about the first 15 minutes, there
actually isn't a difference.

We got rid of SHMMAX for the majority of our users for 9.3.  We should
NOT revert that just so we can support older platforms -- especially
since, if anything, Kernel.org is going to cripple SysV support even
further in the future.  The platforms where POSIX shared memory does
work represent the vast majority of our users, and their share is only
increasing.

I can only see two reasonable alternatives:

This one:
 (3) Add a new setting that auto-probes for a type of shared memory
 that works.  Try POSIX first, then System V.  Maybe even fall back to
 mmap'd files if neither of those works.

Or:

(5) Default to POSIX, and allow for SysV as a compile-time option for
platforms with poor POSIX memory support.
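
For what it's worth, a rough sketch of what the auto-probe in (3) could
look like, purely illustrative and not taken from any posted patch:
create and immediately destroy a throwaway segment of each kind,
preferring POSIX.

#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <unistd.h>

typedef enum
{
    DSM_IMPL_POSIX,             /* shm_open() and friends */
    DSM_IMPL_SYSV,              /* shmget() and friends */
    DSM_IMPL_NONE               /* fall back to mmap'd files, or fail */
} dsm_impl_kind;

/*
 * Probe for a usable dynamic shared memory implementation.  A real
 * implementation would use a unique probe name and retry on EEXIST;
 * on some platforms shm_open() also requires linking with -lrt.
 */
static dsm_impl_kind
probe_dsm_implementation(void)
{
    int         fd;
    int         shmid;

    /* POSIX first: create the probe segment, then unlink it right away. */
    fd = shm_open("/pg_dsm_probe", O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd >= 0)
    {
        close(fd);
        shm_unlink("/pg_dsm_probe");
        return DSM_IMPL_POSIX;
    }

    /* System V as fallback: a private segment, removed immediately. */
    shmid = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);
    if (shmid >= 0)
    {
        shmctl(shmid, IPC_RMID, NULL);
        return DSM_IMPL_SYSV;
    }

    return DSM_IMPL_NONE;
}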

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Re: Request for Patch Feedback: Lag Lead Window Functions Can Ignore Nulls

2013-10-10 Thread Alvaro Herrera
We have this block:

+   {
+   /*
+* This is the window we want - but we have to tweak the
+* definition slightly (e.g. to support the IGNORE NULLS frame
+* option) as we're not using the default (i.e. parent) frame
+* options.
+*
+* We'll create a 'child' (using refname to inherit everything
+* from the parent) that just overrides the frame options
+* (assuming it doesn't already exist):
+*/
+   WindowDef  *clone = makeNode(WindowDef);

... then it goes on to populate the clone.  When this is done, we use the
clone to walk the list of existing WindowDefs, and if there's a match we
free this one and use that one.  Wouldn't it be better to walk the
existing list first looking for a match, and only create a clone if
none is found?  This would avoid the memory leak problems; I originally
pointed out that you're leaking the bits created by this makeNode() call
above, but now that I look at it again, I think you're also leaking the
bits created by the two copyObject() calls to create the clone.  It
appears to me that it's simpler to not allocate any memory in the first
place, unless necessary.
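
To make that concrete, here is a sketch of the search-first shape; the
function name, the exact matching condition, and which fields get
copied are all illustrative, not the patch's actual code:

#include "postgres.h"
#include "nodes/parsenodes.h"
#include "parser/parse_node.h"

/*
 * Walk the existing window definitions first; only build the clone
 * (the sole allocation on this path) when nothing matches.
 */
static WindowDef *
find_or_clone_windowdef(ParseState *pstate, WindowDef *parent,
                        int frameOptions)
{
    ListCell   *lc;
    WindowDef  *clone;

    foreach(lc, pstate->p_windowdefs)
    {
        WindowDef  *wd = (WindowDef *) lfirst(lc);

        if (wd->frameOptions == frameOptions &&
            wd->refname != NULL &&
            strcmp(wd->refname, parent->name) == 0)
            return wd;          /* reuse the existing one; nothing leaked */
    }

    /* No match: only now pay for makeNode() and the copyObject() calls. */
    clone = makeNode(WindowDef);
    clone->name = NULL;
    clone->refname = pstrdup(parent->name);
    clone->partitionClause = NIL;   /* inherited from parent via refname */
    clone->orderClause = copyObject(parent->orderClause);
    clone->frameOptions = frameOptions;
    clone->startOffset = copyObject(parent->startOffset);
    clone->endOffset = copyObject(parent->endOffset);
    clone->location = parent->location;

    pstate->p_windowdefs = lappend(pstate->p_windowdefs, clone);
    return clone;
}

This way the match path allocates nothing at all, which sidesteps the
leak rather than papering over it with pfree().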


Also, in parsenodes.h, you had the [MANDATORY] and such tags.  Three
things about that: 1) it looks a lot uglier than the original, so how
about the modified version below?  and 2) what does "MANDATORY value of
NULL" mean?  Maybe you mean "MANDATORY value or NULL" instead?  3)
Exactly what case does the "in this case" phrase refer to?  I think the
comment should be more explicit.  Also, I think this should be its own
paragraph instead of being mixed with the "For entries" paragraph.

/*
 * WindowDef - raw representation of WINDOW and OVER clauses
 *
 * For entries in a WINDOW list, name is the window name being defined.
 * For OVER clauses, we use name for the OVER window syntax, or refname
 * for the OVER (window) syntax, which is subtly different --- the latter
 * implies overriding the window frame clause.
 * 
 * In this case, the per-field indicators determine what the semantics
 * are:
 *  [V]irtual
 *  If NULL, then the parent's (refname) value is used.
 *  [M]andatory
 *  Never inherited from the parent, so must be specified; may be NULL.
 *  [S]uper
 *  Always inherited from parent, any local version ignored.
 */
typedef struct WindowDef
{
    NodeTag     type;
    char       *name;            /* [M] window's own name */
    char       *refname;         /* [M] referenced window name, if any */
    List       *partitionClause; /* [V] PARTITION BY expression list */
    List       *orderClause;     /* [M] ORDER BY (list of SortBy) */
    int         frameOptions;    /* [M] frame_clause options, see below */
    Node       *startOffset;     /* [M] expression for starting bound, if any */
    Node       *endOffset;       /* [M] expression for ending bound, if any */
    int         location;        /* parse location, or -1 if none/unknown */
} WindowDef;

In gram.y there are some spurious whitespaces at end-of-line.  You
should be able to see them with git diff --check.  (I don't think we
support running pgindent on .y files, which would have otherwise cleaned
this up.)

A style issue.  You have this:

+   /*
+* We can process a constant offset much more efficiently; initially
+* we'll scan through the first offset non-null rows, and store that
+* index. On subsequent rows we'll decide whether to push that index
+* forwards to the next non-null value, or just return it again.
+*/
+   leadlag_const_context *context = WinGetPartitionLocalMemory(
+   winobj,
+ sizeof(leadlag_const_context));
+   int count_forward = 0;

I think it'd be better to put the declarations above the comment, and
assignment to context below the comment.  This way, the indentation of
the assignment is not so odd.  So it'd look like

+   leadlag_const_context *context;
+   int count_forward = 0;
+
+   /*
+* We can process a constant offset much more efficiently; initially
+* we'll scan through the first offset non-null rows, and store that
+* index. On subsequent rows we'll decide whether to push that index
+* forwards to the next non-null value, or just return it again.
+*/
+   context = WinGetPartitionLocalMemory(winobj,
+sizeof(leadlag_const_context));


And a final style comment.  You have a block like this:

if (ignore_nulls && !const_offset)
{
    long block;
}
else if (ignore_nulls /* && const_offset */)
{
another long block;
}
else
{
more stuff;
}

I think this 
