Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-03 Thread Atri Sharma


Sent from my iPad

On 04-Sep-2013, at 10:54, Satoshi Nagayasu  wrote:

> Hi,
> 
> (2013/09/04 12:52), Atri Sharma wrote:
>> On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu  wrote:
>>> Hi,
>>> 
>>> I'm considering overhauling pgstat.stat, and would like to know how many
>>> people are interested in this topic.
>>> 
>>> As you may know, this file can be hundreds of MB in size, because
>>> pgstat.stat holds all access statistics for each database, and it needs
>>> to read/write the entire pgstat.stat frequently.
>>> 
>>> As a result, pgstat.stat often generates massive I/O operations,
>>> particularly when there are a large number of tables in the database.
>>> 
>>> To support multi-tenancy or just a large number of tables (up to 10k
>>> tables in a single database), I think pgstat.stat needs to be overhauled.
>>> 
>>> I think using a heap and btree in pgstat.stat would be preferred, to
>>> reduce read/write and to allow updating access statistics for specific
>>> tables in the pgstat.stat file.
>>> 
>>> Is this good for us?
>> 
>> Hi,
>> 
>> Nice thought. I/O reduction in pgstat can be really helpful.
>> 
>> I am trying to think of our aim here. Would we be looking to split
>> pgstat per table, so that the I/O write happens for only a portion of
>> pgstat? Or reduce the I/O in general?
> 
> I prefer the latter.
> 
> Under the current implementation, DBAs need to split a single database
> into many smaller databases, taking the access locality of the
> tables into account. That's difficult, and the locality could change
> in the future.
> 
> And splitting the statistics data into many files (per table,
> for example) would cause another performance issue when
> collecting/showing all the statistics at once. Just my guess though.
> 
> So, I'm looking for a new way to reduce I/O for the statistics data
> in general.
> 
> Regards,
> 
>> 
>> If the latter, how would using a BTree help us? I would rather go for a
>> range tree or something. But again, I may be completely wrong.
>> 
>> Please elaborate a bit more on the solution we are trying to
>> achieve. It seems really interesting.
>> 
>> Regards,
>> 
>> Atri
>> 
>> 
> 
> 

Right, thanks.

How would using a heap and BTree help here? Are we looking at a priority queue 
that backs the main storage system for the stats?

Regards,

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-03 Thread Satoshi Nagayasu

Hi,

(2013/09/04 12:52), Atri Sharma wrote:

On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu  wrote:

Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and it needs
to read/write the entire pgstat.stat frequently.

As a result, pgstat.stat often generates massive I/O operations,
particularly when there are a large number of tables in the database.

To support multi-tenancy or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and btree in pgstat.stat would be preferred, to
reduce read/write and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?


Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?


I prefer the latter.

Under the current implementation, DBAs need to split a single database
into many smaller databases, taking the access locality of the
tables into account. That's difficult, and the locality could change
in the future.

And splitting the statistics data into many files (per table,
for example) would cause another performance issue when
collecting/showing all the statistics at once. Just my guess though.

So, I'm looking for a new way to reduce I/O for the statistics data
in general.

Regards,



If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri




--
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-03 Thread Satoshi Nagayasu

Hi,

(2013/09/04 13:07), Alvaro Herrera wrote:

Satoshi Nagayasu wrote:


As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and it needs
to read/write the entire pgstat.stat frequently.

As a result, pgstat.stat often generates massive I/O operations,
particularly when there are a large number of tables in the database.


We already changed it:

>
> commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
> Author: Alvaro Herrera 
> Date:   Mon Feb 18 17:56:08 2013 -0300
>
>  Split pgstat file in smaller pieces

Thanks for the comments. I forgot to mention that.

Yes, we have already split the single pgstat.stat file into
several pieces.

However, we still need to read/write a large amount of statistics
data when we have a large number of tables in a single database,
or when multiple databases are being accessed. Right?

I think the issue here is that it is necessary to write/read
statistics data even if it hasn't actually changed.

So, I'm wondering how we can minimize read/write operations
on these statistics data files using a heap and btree.

Regards,



commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
Author: Alvaro Herrera 
Date:   Mon Feb 18 17:56:08 2013 -0300

 Split pgstat file in smaller pieces

 We now write one file per database and one global file, instead of
 having the whole thing in a single huge file.  This reduces the I/O that
 must be done when partial data is required -- which is all the time,
 because each process only needs information on its own database anyway.
 Also, the autovacuum launcher does not need data about tables and
 functions in each database; having the global stats for all DBs is
 enough.

 Catalog version bumped because we have a new subdir under PGDATA.

 Author: Tomas Vondra.  Some rework by Álvaro
 Testing by Jeff Janes
 Other discussion by Heikki Linnakangas, Tom Lane.


I don't oppose further tweaking, of course, but I wonder if you are
considering these changes.



--
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?

2013-09-03 Thread Tomonari Katsumata

(2013/09/04 2:34), Josh Berkus wrote:
>
 I've checked the PostgreSQL License,
 and had a question.
 http://opensource.org/licenses/PostgreSQL
>>
>> I don't know who opensource.org are, but their idea of "the PostgreSQL
>> license" doesn't appear to match the actual license text.
>
> The OSI, which is the organization which defines whether licenses are
> open source or not.  We requested that they list The PostgreSQL License
> on their site after Red Hat determined that our license had sufficient
> textual differences from the BSD License to be a different license.
>
> I've requested that the spurious "file name" reference be removed.
>
Thank you for checking it so quickly.

I understand that "the PostgreSQL License" does not require
any specific file name, and that a missing "LICENSE" file is not a problem.

I'm sorry for the stupid question.

regards,

Tomonari Katsumata






[HACKERS] [tiny doc fix] statistics are not retained across immediate shutdown

2013-09-03 Thread Tsunakawa, Takayuki
Hi,

According to the following page, statistics are kept across server restarts:

http://www.postgresql.org/docs/current/static/monitoring-stats.html

"When the server shuts down, a permanent copy of the statistics data is stored 
in the global subdirectory, so that statistics can be retained across server 
restarts."


However, statistics are not retained after an immediate shutdown (pg_ctl stop 
-mi).  You may say "pg_ctl stop -mi is not a shutdown but an abort, so the 
sentence is not wrong", but it is called an "immediate shutdown" and is one 
mode of shutdown.

I propose a tiny fix to clarify this.  Please find the attached patch.

I'd like this to be backported to at least 9.2.  Thanks.


Regards, Takayuki Tsunakawa


stats_reset_in_recovery.patch
Description: stats_reset_in_recovery.patch



[HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-03 Thread wangshuo

Hi, Hackers!

I find that it takes a long time when I increase the scale of a numeric 
column.
By checking the code, I found that's because it needs to rewrite the 
table's file.
After checking the table's data file, I found that only the header 
parameter (n_header) changed.

Also, I found that the data in the numeric field itself never changed.
So I think it's not necessary to rewrite the table's file in this case.

If anyone has more ideas about this, please come and discuss!

Best Regards!

 Yours,
 Wang Shuo
 HighGo Software Co.,Ltd.
 September 4, 2013




Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-03 Thread Alvaro Herrera
Satoshi Nagayasu wrote:

> As you may know, this file can be hundreds of MB in size, because
> pgstat.stat holds all access statistics for each database, and it needs
> to read/write the entire pgstat.stat frequently.
> 
> As a result, pgstat.stat often generates massive I/O operations,
> particularly when there are a large number of tables in the database.

We already changed it:

commit 187492b6c2e8cafc5b39063ca3b67846e8155d24
Author: Alvaro Herrera 
Date:   Mon Feb 18 17:56:08 2013 -0300

Split pgstat file in smaller pieces

We now write one file per database and one global file, instead of
having the whole thing in a single huge file.  This reduces the I/O that
must be done when partial data is required -- which is all the time,
because each process only needs information on its own database anyway.
Also, the autovacuum launcher does not need data about tables and
functions in each database; having the global stats for all DBs is
enough.

Catalog version bumped because we have a new subdir under PGDATA.

Author: Tomas Vondra.  Some rework by Álvaro
Testing by Jeff Janes
Other discussion by Heikki Linnakangas, Tom Lane.


I don't oppose further tweaking, of course, but I wonder if you are
considering these changes.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-03 Thread Atri Sharma
On Wed, Sep 4, 2013 at 6:40 AM, Satoshi Nagayasu  wrote:
> Hi,
>
> I'm considering overhauling pgstat.stat, and would like to know how many
> people are interested in this topic.
>
> As you may know, this file can be hundreds of MB in size, because
> pgstat.stat holds all access statistics for each database, and it needs
> to read/write the entire pgstat.stat frequently.
>
> As a result, pgstat.stat often generates massive I/O operations,
> particularly when there are a large number of tables in the database.
>
> To support multi-tenancy or just a large number of tables (up to 10k
> tables in a single database), I think pgstat.stat needs to be overhauled.
>
> I think using a heap and btree in pgstat.stat would be preferred, to
> reduce read/write and to allow updating access statistics for specific
> tables in the pgstat.stat file.
>
> Is this good for us?

Hi,

Nice thought. I/O reduction in pgstat can be really helpful.

I am trying to think of our aim here. Would we be looking to split
pgstat per table, so that the I/O write happens for only a portion of
pgstat? Or reduce the I/O in general?

If the latter, how would using a BTree help us? I would rather go for a
range tree or something. But again, I may be completely wrong.

Please elaborate a bit more on the solution we are trying to
achieve. It seems really interesting.

Regards,

Atri


-- 
Regards,

Atri
l'apprenant




Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Bruce Momjian
On Tue, Sep  3, 2013 at 09:44:33PM -0500, Merlin Moncure wrote:
> It gets worse and worse.  The IS NULL operator is already pretty much
> special-cased -- in just about all other cases concerning rowtypes (for
> example coalesce), null-containing rowtypes are *not* considered to be
> null, as the container itself has a null bit independent of its
> elements, which is expressly contrary to the SQL standard.  This is
> tragic; postgres's way of doing it (except with IS NULL) makes an
> awful lot of sense to me, but here we are.  I think before making any
> behavior changes at all, we need to ask ourselves:
> 
> 1. Are we willing to break compatibility in order to move to
> spec-compliant behavior?
> 2. And if so, what mountains do we have to cross to get there?
> 
> Your proposed change (implementation details aside) seems OK in the
> sense that it doesn't seem to have a lot of obvious side effects, but
> the elephant in the room is #1; if the answer is 'no', then I'd say
> the best course of action is to let things be.

Yes, these are the right questions.  If we leave things unchanged, can
we document the behavior we currently have?  What I _think_ we have now
is that IS NULL in queries checks two levels deep for nulls, but checks
only one level deep for PL/pgSQL and constraints.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Bruce Momjian
On Tue, Sep  3, 2013 at 10:27:38PM -0400, Tom Lane wrote:
> I wrote:
> > And I will say once more that a patch that affects only the behavior of
> > eval_const_expressions can be rejected on its face.  That code has to be
> > kept in sync with the behavior of execQual.c, not just whacked around by
> > itself.  And then there are the NOT NULL constraint cases to worry about.
> 
> Hmm ... actually, it's already not in sync, because:
> 
> regression=# create table tt (x int);
> CREATE TABLE
> regression=# insert into tt values(null);
> INSERT 0 1
> regression=# select row(x) from tt;
>  row 
> -
>  ()
> (1 row)
> 
> regression=# select row(row(x)) from tt;
>   row   
> 
>  ("()")
> (1 row)
> 
> regression=# select row(row(row(x))) from tt;
>  row  
> --
>  ("(""()"")")
> (1 row)


Uh, I see the same output you show for a NULL constant:

SELECT ROW(NULL);
 row
-
 ()

SELECT ROW(ROW(NULL));
  row

 ("()")

SELECT ROW(ROW(ROW(NULL)));
 row
--
 ("(""()"")")

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Bruce Momjian
On Tue, Sep  3, 2013 at 09:43:14PM -0400, Tom Lane wrote:
> Bruce Momjian  writes:
> > This has made me adjust my goal and change it so SELECT ROW(NULL) IS
> > NULL returns true, and any further nesting returns false.
> 
> AFAICS, the only good argument for breaking backwards compatibility here
> is if you can convince people that the new behavior is more conformant to
> the SQL spec.  Where's the chapter and verse that argues for this
> interpretation?

The SQL03 standard in section 8.7, table 14, says "degree 1: null" and
"degree > 1: all null".  Does that mean they are considering nested rows
as degree > 1, or is that the number of values in the row? A footnote
says:

For all R, "R IS NOT NULL" has the same
result as "NOT R IS NULL" if and only if R is of
degree 1.

which seems to support the idea that degree is the number of values,
meaning they don't discuss nesting.

> And I will say once more that a patch that affects only the behavior of
> eval_const_expressions can be rejected on its face.  That code has to be
> kept in sync with the behavior of execQual.c, not just whacked around by
> itself.  And then there are the NOT NULL constraint cases to worry about.

I thought execQual.c was already not recursing so I didn't see a need to
change that.

I could not figure out how to test a NOT NULL constraint for nesting.

What is driving my research here is that our current behavior is really
not documentable.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Merlin Moncure
On Tue, Sep 3, 2013 at 8:32 PM, Bruce Momjian  wrote:
> On Fri, Jul  5, 2013 at 10:21:19AM -0400, Bruce Momjian wrote:
>> On Thu, Jul  4, 2013 at 04:29:20PM -0400, Tom Lane wrote:
>> > Bruce Momjian  writes:
>> > > I developed the attached patch which properly recurses into ROW()
>> > > records checking for NULLs;  you can see it returns the right answer in
>> > > all cases (and constant folds too):
>> >
>> > My recollection of the previous discussion is that we didn't have
>> > consensus on what the "right" behavior is, so I'm not sure you can just
>> > assert that this patch is right.  In any case this is only touching the
>> > tip of the iceberg.  If we intend that rows of nulls should be null,
>> > then we have got issues with, for example, NOT NULL column constraint
>> > checks, which don't have any such recursion built into them.  I think
>> > the same is true for plpgsql variable NOT NULL restrictions, and there
>> > are probably some other places.
>>
>> Well we have three cases:
>>
>>   1  SELECT ROW(NULL) IS NULL;
>>   2  SELECT ROW(ROW(NULL)) IS NULL;
>>   3  SELECT ROW(ROW(ROW(NULL))) IS NULL;
>>
>> I think we could have them all return false, or all true, or the first
>> one true, and the rest false.  What I don't think we can justify is 1
>> and 2 as true, and 3 false.
>
> I have done some more research on this, and was able to verify Tom's
> concern that PL/pgSQL's IS NULL doesn't recurse into ROW expressions:
>
> DO LANGUAGE plpgsql $$
> DECLARE
> r RECORD;
> BEGIN
>
> SELECT NULL INTO r;
>   IF (r IS NULL)
> THEN RAISE NOTICE 'true';
> ELSE RAISE NOTICE 'false';
> END IF;
> END;
> $$;
> NOTICE:  true
> DO
>
> In this test, SELECT NULL (which internally would produce SELECT
> ROW(NULL)) returns TRUE, while SELECT ROW(NULL) and further nesting
> return false.
>
> This has made me adjust my goal and change it so SELECT ROW(NULL) IS
> NULL returns true, and any further nesting returns false.

It gets worse and worse.  The IS NULL operator is already pretty much
special-cased -- in just about all other cases concerning rowtypes (for
example coalesce), null-containing rowtypes are *not* considered to be
null, as the container itself has a null bit independent of its
elements, which is expressly contrary to the SQL standard.  This is
tragic; postgres's way of doing it (except with IS NULL) makes an
awful lot of sense to me, but here we are.  I think before making any
behavior changes at all, we need to ask ourselves:

1. Are we willing to break compatibility in order to move to
spec-compliant behavior?
2. And if so, what mountains do we have to cross to get there?

Your proposed change (implementation details aside) seems OK in the
sense that it doesn't seem to have a lot of obvious side effects, but
the elephant in the room is #1; if the answer is 'no', then I'd say
the best course of action is to let things be.

merlin




[HACKERS] getting rid of maintainer-check

2013-09-03 Thread Peter Eisentraut
The maintainer-check target never really caught on, I think.  Most
people don't run it, and that in turn annoys those who do.  Also, it
doesn't provide much functionality.

I propose that we get rid of it and roll the functionality into the
regular build.

Specifically:

- Running duplicate_oids during the regular build was already discussed
elsewhere recently.  There are some details to be resolved there, but
it's doable.

- Checking for tabs in SGML files can be run during the regular
documentation build without problems.

- The NLS checks can also be run during the regular NLS-enabled build.

That's it.  Any concerns?






Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Tom Lane
I wrote:
> And I will say once more that a patch that affects only the behavior of
> eval_const_expressions can be rejected on its face.  That code has to be
> kept in sync with the behavior of execQual.c, not just whacked around by
> itself.  And then there are the NOT NULL constraint cases to worry about.

Hmm ... actually, it's already not in sync, because:

regression=# create table tt (x int);
CREATE TABLE
regression=# insert into tt values(null);
INSERT 0 1
regression=# select row(x) from tt;
 row 
-
 ()
(1 row)

regression=# select row(row(x)) from tt;
  row   

 ("()")
(1 row)

regression=# select row(row(row(x))) from tt;
 row  
--
 ("(""()"")")
(1 row)

There's certainly no excuse for this behaving differently from the cases
with a simple constant NULL.  So I'm a bit inclined to say that we should
rip out the special case in eval_const_expressions, not make it even less
self-consistent.  It's possible to argue that existing applications won't
be too sensitive to the behavior of the constant cases, but they surely
must be depending on the behavior in the non-constant cases.

regards, tom lane




Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Tom Lane
Bruce Momjian  writes:
> This has made me adjust my goal and change it so SELECT ROW(NULL) IS
> NULL returns true, and any further nesting returns false.

AFAICS, the only good argument for breaking backwards compatibility here
is if you can convince people that the new behavior is more conformant to
the SQL spec.  Where's the chapter and verse that argues for this
interpretation?

And I will say once more that a patch that affects only the behavior of
eval_const_expressions can be rejected on its face.  That code has to be
kept in sync with the behavior of execQual.c, not just whacked around by
itself.  And then there are the NOT NULL constraint cases to worry about.

regards, tom lane




Re: [HACKERS] strange IS NULL behaviour

2013-09-03 Thread Bruce Momjian
On Fri, Jul  5, 2013 at 10:21:19AM -0400, Bruce Momjian wrote:
> On Thu, Jul  4, 2013 at 04:29:20PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > I developed the attached patch which properly recurses into ROW()
> > > records checking for NULLs;  you can see it returns the right answer in
> > > all cases (and constant folds too):
> > 
> > My recollection of the previous discussion is that we didn't have
> > consensus on what the "right" behavior is, so I'm not sure you can just
> > assert that this patch is right.  In any case this is only touching the
> > tip of the iceberg.  If we intend that rows of nulls should be null,
> > then we have got issues with, for example, NOT NULL column constraint
> > checks, which don't have any such recursion built into them.  I think
> > the same is true for plpgsql variable NOT NULL restrictions, and there
> > are probably some other places.
> 
> Well we have three cases:
> 
>   1  SELECT ROW(NULL) IS NULL;
>   2  SELECT ROW(ROW(NULL)) IS NULL;
>   3  SELECT ROW(ROW(ROW(NULL))) IS NULL;
> 
> I think we could have them all return false, or all true, or the first
> one true, and the rest false.  What I don't think we can justify is 1
> and 2 as true, and 3 false.

I have done some more research on this, and was able to verify Tom's
concern that PL/pgSQL's IS NULL doesn't recurse into ROW expressions:

DO LANGUAGE plpgsql $$
DECLARE
r RECORD;
BEGIN

SELECT NULL INTO r;
  IF (r IS NULL)
THEN RAISE NOTICE 'true';
ELSE RAISE NOTICE 'false';
END IF;
END;
$$;
NOTICE:  true
DO

In this test, SELECT NULL (which internally would produce SELECT
ROW(NULL)) returns TRUE, while SELECT ROW(NULL) and further nesting
return false.

This has made me adjust my goal and change it so SELECT ROW(NULL) IS
NULL returns true, and any further nesting returns false.

Attached is a patch which accomplishes this, and a documentation update.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 00f8ffb..51bfe31
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 351,357 
  for both tests.
  This definition conforms to the SQL standard, and is a change from the
  inconsistent behavior exhibited by PostgreSQL
! versions prior to 8.2.
 

  
--- 351,359 
  for both tests.
  This definition conforms to the SQL standard, and is a change from the
  inconsistent behavior exhibited by PostgreSQL
! versions prior to 8.2.  Row expressions inside row 
! expressions are processed as non-null, even if the sub-row contains only
! nulls.
 

  
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
new file mode 100644
index 76c032c..ce7047f
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 3100,3109 
  return makeBoolConst(false, false);
  			continue;
  		}
  		newntest = makeNode(NullTest);
  		newntest->arg = (Expr *) relem;
  		newntest->nulltesttype = ntest->nulltesttype;
! 		newntest->argisrow = type_is_rowtype(exprType(relem));
  		newargs = lappend(newargs, newntest);
  	}
  	/* If all the inputs were constants, result is TRUE */
--- 3100,3124 
  return makeBoolConst(false, false);
  			continue;
  		}
+ 		/* Do we have a ROW() expression in the row? */
+ 		if (type_is_rowtype(exprType(relem)))
+ 		{
+ 			/*
+ 			 * ROW values in ROW values are not null, even if
+ 			 * those sub-ROW values contain only nulls.  This
+ 			 * makes the code match PL/pgSQL and constraint
+ 			 * IS NULL checks which don't probe into sub-ROWs
+ 			 * for NULL values.
+ 			 */			
+ 			if (ntest->nulltesttype == IS_NULL)
+ return makeBoolConst(false, false);
+ 			continue;
+ 		}
+ 
  		newntest = makeNode(NullTest);
  		newntest->arg = (Expr *) relem;
  		newntest->nulltesttype = ntest->nulltesttype;
! 		newntest->argisrow = false;
  		newargs = lappend(newargs, newntest);
  	}
  	/* If all the inputs were constants, result is TRUE */



[HACKERS] [rfc] overhauling pgstat.stat

2013-09-03 Thread Satoshi Nagayasu
Hi,

I'm considering overhauling pgstat.stat, and would like to know how many
people are interested in this topic.

As you may know, this file can be hundreds of MB in size, because
pgstat.stat holds all access statistics for each database, and it needs
to read/write the entire pgstat.stat frequently.

As a result, pgstat.stat often generates massive I/O operations,
particularly when there are a large number of tables in the database.

To support multi-tenancy or just a large number of tables (up to 10k
tables in a single database), I think pgstat.stat needs to be overhauled.

I think using a heap and btree in pgstat.stat would be preferred, to
reduce read/write and to allow updating access statistics for specific
tables in the pgstat.stat file.

Is this good for us?

Any comments or suggestions?

Regards,
-- 
Satoshi Nagayasu 
Uptime Technologies, LLC. http://www.uptime.jp




Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-03 Thread Boguk, Maksym

>> 1) Addition of new string data types NATIONAL CHARACTER and NATIONAL
>> CHARACTER VARYING.
>> These types differ from the char/varchar data types in one important
>> respect:  NATIONAL string types always have UTF8 encoding
>> (independent of the database encoding).

> I don't like the approach of adding a new data type for this. The
> encoding used for a text field should be an implementation detail, not
> something that's exposed to users at the schema level. A separate data
> type makes an nvarchar field behave slightly differently from text, for
> example when it's passed to and from functions. It will also require
> drivers and client applications to know about it.

Hi, my task is implementing the ANSI NATIONAL character string types as
part of PostgreSQL core.
And the requirement to "require drivers and client applications to know
about it" is the reason why it could not be done as an add-on (these new
types should have a fixed OID, for most drivers, in my experience).
Implementing them as a UTF8 data type is the first step, which allows
having NATIONAL characters with an encoding different from the database
encoding (and might even support multiple encodings for common string
types in the future).


>> 1) Full set of string functions and operators for the NATIONAL types
>> (we could not use the generic text functions, because they assume that
>> the strings have the database encoding).
>> Currently only a basic set is implemented.
>> 2) Need to implement some way to define a default collation for the
>> NATIONAL types.
>> 3) Need to implement some way to input UTF8 characters into NATIONAL
>> types via SQL (there is a serious open problem... it will be described
>> later in the text).

> Yeah, all of these issues stem from the fact that the NATIONAL types
> are separate from text.
> I think we should take a completely different approach to this. Two
> alternatives spring to mind:

> 1. Implement a new encoding.  The new encoding would be some variant of
> UTF-8 that encodes languages like Russian more efficiently. Then just
> use that in the whole database. Something like SCSU
> (http://www.unicode.org/reports/tr6/) should do the trick, although I'm
> not sure if SCSU can be used as a server encoding. A lot of code relies
> on the fact that a server encoding must have the high bit set in all
> bytes that are part of a multi-byte character. That's why SJIS, for
> example, can only be used as a client encoding. But surely you could
> come up with some subset or variant of SCSU which satisfies that
> requirement.

> 2. Compress the column. Simply do "ALTER TABLE foo ALTER COLUMN bar SET
> STORAGE MAIN". That will make Postgres compress that field. That might
> not be very efficient for compressing short Cyrillic text encoded in
> UTF-8 today, but that could be improved. There has been discussion on
> supporting more compression algorithms in the past, and one such
> algorithm could be again something like SCSU.

Both of these approaches require a dump/restore of the whole database,
which is not always an option.
Implementing a UTF8 NATIONAL character type as a new datatype will make
it possible to pg_upgrade to the latest version and get the required
functionality without prolonged downtime.

PS: is it possible to reserve some narrow type OID range in PostgreSQL
core for the future use?

Kind Regards,
Maksym






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] operator precedence issues

2013-09-03 Thread Merlin Moncure
On Tue, Sep 3, 2013 at 9:13 AM, Tom Lane  wrote:
> Andres Freund  writes:
>> On 2013-09-03 08:59:53 -0500, Merlin Moncure wrote:
>>> While playing around with Andres's trick, I noticed that it works but
>>> will not match against operators taking "any" although those will
>>> match with explicit schema declaration (FWICT it goes through the
>>> search_path trying to explicitly match int/int operator then goes
>>> again matches "any").  That's pretty weird:
>
>> Not surprising. We look for the best match for an operator and
>> explicitly matching types will be that. If there were no operator(int,
>> int) your anyelement variant should get called.
>
> Yeah, this has exactly nothing to do with operator precedence.
> Precedence is about which operator binds tighter in cases like "A+B*C".

That all makes perfect sense -- thanks guys.  For posterity, Andres's trick
worked and did end up saving me some coding after all -- in my case I have
to eval() some externally generated fairly complex expressions in SQL (via
pl/pgsql EXECUTE) in the context of a much larger query.  My de-parsing
code ended up having bugs and it was much easier to tip-toe around the
search_path (via SET LOCAL) and force the modified operator.

merlin


Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)

2013-09-03 Thread Andres Freund
On 2013-09-03 23:25:01 +0200, Hannu Krosing wrote:
> On 09/03/2013 05:27 PM, Robert Haas wrote:
> > On Thu, Aug 29, 2013 at 8:22 PM, Ants Aasma  wrote:
> >> I might give it a shot later this cycle as I have familiarized my self
> >> with the problem domain anyway. I understand the appeal of staying
> >> with what we have, but this would cap the speedup at 4x and has large
> >> caveats with the extra lookup tables. A 28x speedup might be worth the
> >> extra effort.
> > I agree.  However, on the flip side, a bird in the hand is worth two
> > in the bush.  A patch that just does the same thing faster is likely
> > to be less controversial than changing the algorithm, and does not
> > preclude changing the algorithm later.
> >
> Has anybody looked at the hardware support for CRC32 added in recent
> i5/i7 CPUs?
> 
> http://www.strchr.com/crc32_popcnt
> 
> From the numbers there it is probably still slower than xxhash, but it
> might be worth doing as an interim optimisation.

Yes. There are two problems: for one, they use a different polynomial
(Castagnoli, AFAIR), so the results aren't compatible with the ones we
have computed so far. For another, you need runtime detection of whether
the instruction is available, plus some compiler-specific bits to use
it.
Neither is too hard to solve, but I think that amount of work is better
spent on using a different hash.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v5

2013-09-03 Thread Andres Freund
On 2013-09-03 15:56:15 -0400, Robert Haas wrote:
> On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund  wrote:
> >> And why is every 15 seconds good enough?
> >
> > Waiting 15s to become consistent instead of checkpoint_timeout seems to
> > be ok to me and to be a good tradeoff between overhead and waiting. We
> > can certainly discuss other values or making it configurable. The latter
> > seemed to be unnecessary to me, but I don't have a problem
> > implementing it. I just don't want to document it :P
> 
> I don't think it particularly needs to be configurable, but I wonder
> if we can't be a bit smarter about when we do it.  For example,
> suppose we logged it every 15 s but only until we log a non-overflowed
> snapshot.

There are actually more benefits than just overflowed snapshots (pruning
of the known-xids machinery, exclusive lock cleanup).

> I realize that the overhead of a WAL record every 15
> seconds is fairly small, but the load on some systems is all but
> nonexistent.  It would be nice not to wake up the HD unnecessarily.

The patch as-is only writes the record if there has been WAL written
since the last time it logged a running_xacts. I don't think it's worth
building more smarts than that.

> >> The WAL writer is supposed to call XLogBackgroundFlush() every time
> >> WalWriterDelay expires.  Yeah, it can hibernate, but if it's
> >> hibernating, then we should respect that decision for this WAL record
> >> type also.
> >
> > Why should we respect it?
> 
> Because I don't see any reason to believe that this WAL record is any
> more important or urgent than any other WAL record that might get
> logged.

I can't follow the logic behind that statement. Just about all WAL
records are either pretty immediately flushed afterwards or are done in
the context of a transaction where we flush (or do a
XLogSetAsyncXactLSN) at transaction commit.

XLogBackgroundFlush() won't necessarily flush the running_xacts
record. Unless you've set the async xact LSN it will only flush complete
blocks. So what can happen (I've seen it more than once in testing, and
it took me a while to debug) is that a checkpoint is started in a busy
period but nothing happens after it finishes. Since the
checkpoint-triggered running_xacts record is logged *before* we do the
smgr flush, it is still overflowed. Then, after activity has died down,
the bgwriter issues the running_xacts record, but it only partially
fills a block and thus never gets flushed.

To me the alternatives are to do a XLogSetAsyncXactLSN() or an
XLogFlush(). The latter is more aggressive and can block for a
measurable amount of time, which is why I don't want to do it in the
bgwriter.

> >> >> I understand why logical replication needs to connect to a database,
> >> >> but I don't understand why any other walsender would need to connect
> >> >> to a database.
> >> >
> >> > Well, logical replication actually streams out data using the walsender,
> >> > so that's the reason why I want to add it there. But there have been
> >> > cases in the past where we wanted to do stuff in the walsender that need
> >> > database access, but we couldn't do so because you cannot connect to
> >> > one.
> >
> >> Could you be more specific?
> >
> > I only remember 3959.1349384...@sss.pgh.pa.us but I think it has come up
> > before.
> 
> It seems we need some more design there.  Perhaps entering replication
> mode could be triggered by writing either dbname=replication or
> replication=yes.  But then, do the replication commands simply become
> SQL commands?  I've certainly seen hackers use them that way.  And I
> can imagine that being a sensible approach, but this patch seems like
> it's only covering a fairly small fraction of what really ought to be
> a single commit.

Yes. I think you're right that we need more input/design here. I've
previously started threads about it, but nobody replied :(.

The problem with using dbname=replication as a trigger for anything is
that we actually allow databases to be created with that name. Perhaps
that was a design mistake.

I wondered about turning replication from a boolean into something like
off|0, on|1, database, where dbname= would only be used in the last
variant. That would be compatible with previous versions and would even
support using old tools (since all of them seem to do replication=1).

> But then, do the replication commands simply become
> SQL commands?  I've certainly seen hackers use them that way.

I don't think it's a good idea at this point to turn them into plain
SQL. There is more infrastructure (signal handlers, permissions,
different timeouts) and memory required for walsenders, so using plain
SQL there seems beyond the scope of this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Backup throttling

2013-09-03 Thread Andres Freund
On 2013-09-03 12:56:53 -0400, Alvaro Herrera wrote:
> Antonin Houska wrote:
> 
> > +   
> > +Suffixes k (kilobytes) and m
> > +(megabytes) are accepted. For example: 10m
> > +   
> 
> "m" is for meters, or milli.  Please use "M" here.

Shouldn't it be MB? Consistent with GUCs?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Bruce Momjian
On Tue, Sep  3, 2013 at 07:47:22PM +0200, Magnus Hagander wrote:
> 
> On Sep 3, 2013 6:23 PM, "Josh Berkus"  wrote:
> >
> > Peter,
> >
> >
> > > I wonder, is anyone really running in production with full_page_writes
> > > off?
> >
> > We're doing so with clients on Joyent (ZFS).  And also for "ephemeral"
> > postgres instances -- ones where we've also turned off fsync.
> >
> > It's also quite reasonable to assume that future filesystems will also
> > be able to make better torn page guarantees.
> >
> > So it should remain a setting.  I see no reason why it should be a
> > SIGHUP setting, though, if there's any code maintenance required to
> > support it -- frankly, I only change this setting at first system
> > deployment.
> >
> 
> This matches my experience as well - people certainly use it, but don't change
> it during runtime.
> 
> Not having looked at the code, but doesn't pg_start_backup use at least parts
> of the same code path? That one will definitely still be able to modify it at
> runtime

Yes, this was already mentioned in the thread, but was missed by a few
folks.  In summary, no changes are needed here.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)

2013-09-03 Thread Hannu Krosing
On 09/03/2013 05:27 PM, Robert Haas wrote:
> On Thu, Aug 29, 2013 at 8:22 PM, Ants Aasma  wrote:
>> I might give it a shot later this cycle as I have familiarized myself
>> with the problem domain anyway. I understand the appeal of staying
>> with what we have, but this would cap the speedup at 4x and has large
>> caveats with the extra lookup tables. A 28x speedup might be worth the
>> extra effort.
> I agree.  However, on the flip side, a bird in the hand is worth two
> in the bush.  A patch that just does the same thing faster is likely
> to be less controversial than changing the algorithm, and does not
> preclude changing the algorithm later.
>
Has anybody looked at the hardware support for CRC32 added in recent
i5/i7 CPUs?

http://www.strchr.com/crc32_popcnt

From the numbers there it is probably still slower than xxhash, but it
might be worth doing as an interim optimisation.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] Backup throttling

2013-09-03 Thread Antonin Houska
On 09/03/2013 06:56 PM, Alvaro Herrera wrote:

>> +/*
>> + * Only the following suffixes are allowed. It's not 
>> too useful to
>> + * restrict the rate to gigabytes: such a rate will 
>> probably bring
>> + * significant impact on the master anyway, so the 
>> throttling
>> + * won't help much.
>> + */
>> +case 'g':
>> +factor <<= 10;
> 
> I don't understand why you allow a 'g' here, given the comment above ...
> but in any case it should be G.
> 

This reflects my hesitation about whether GB should be accepted as a
unit or not. I'll probably remove this suffix.

(Will fix the other findings too.)

Thanks,
Tony




Re: [HACKERS] logical changeset generation v5

2013-09-03 Thread Robert Haas
On Tue, Sep 3, 2013 at 12:57 PM, Andres Freund  wrote:
>> To my way of thinking, it seems as though we ought to always begin
>> replay at a checkpoint, so the standby ought always to see one of
>> these records immediately.  Obviously that's not good enough, but why
>> not?
>
> We always see one after the checkpoint (well, actually before the
> checkpoint record, but ...), correct. The problem is just that reading a
> single xact_running record doesn't automatically make you consistent. If
> there's a single suboverflowed transaction running on the primary when
> the xl_running_xacts record is logged, we won't be able to switch to
> consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
> and some optimizations.
> Since the only place where we currently have the information to
> potentially become consistent is ProcArrayApplyRecoveryInfo() we will
> have to wait checkpoint_timeout time till we get consistent. Which
> sucks as there are good arguments to set that to 1h.
> That especially sucks as you lose consistency every time you restart the
> standby...

Right, OK.

>> And why is every 15 seconds good enough?
>
> Waiting 15s to become consistent instead of checkpoint_timeout seems to
> be ok to me and to be a good tradeoff between overhead and waiting. We
> can certainly discuss other values or making it configurable. The latter
> seemed to be unnecessary to me, but I don't have a problem
> implementing it. I just don't want to document it :P

I don't think it particularly needs to be configurable, but I wonder
if we can't be a bit smarter about when we do it.  For example,
suppose we logged it every 15 s but only until we log a non-overflowed
snapshot.  I realize that the overhead of a WAL record every 15
seconds is fairly small, but the load on some systems is all but
nonexistent.  It would be nice not to wake up the HD unnecessarily.

>> The WAL writer is supposed to call XLogBackgroundFlush() every time
>> WalWriterDelay expires.  Yeah, it can hibernate, but if it's
>> hibernating, then we should respect that decision for this WAL record
>> type also.
>
> Why should we respect it?

Because I don't see any reason to believe that this WAL record is any
more important or urgent than any other WAL record that might get
logged.

>> >> I understand why logical replication needs to connect to a database,
>> >> but I don't understand why any other walsender would need to connect
>> >> to a database.
>> >
>> > Well, logical replication actually streams out data using the walsender,
>> > so that's the reason why I want to add it there. But there have been
>> > cases in the past where we wanted to do stuff in the walsender that need
>> > database access, but we couldn't do so because you cannot connect to
>> > one.
>
>> Could you be more specific?
>
> I only remember 3959.1349384...@sss.pgh.pa.us but I think it has come up
> before.

It seems we need some more design there.  Perhaps entering replication
mode could be triggered by writing either dbname=replication or
replication=yes.  But then, do the replication commands simply become
SQL commands?  I've certainly seen hackers use them that way.  And I
can imagine that being a sensible approach, but this patch seems like
it's only covering a fairly small fraction of what really ought to be
a single commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Backup throttling

2013-09-03 Thread Alvaro Herrera
Antonin Houska wrote:

> +   
> +Suffixes k (kilobytes) and m
> +(megabytes) are accepted. For example: 10m
> +   

"m" is for meters, or milli.  Please use "M" here.

> +static uint32
> +parse_max_rate(char *src)
> +{
> + int factor;
> + char   *after_num;
> + int64   result;
> + int errno_copy;
> +
> + result = strtol(src, &after_num, 0);
> + errno_copy = errno;
> + if (src == after_num)
> + {
> + fprintf(stderr, _("%s: transfer rate %s is not a valid integer 
> value\n"), progname, src);
> + exit(1);
> + }

Please add quotes to the invalid value.

> +
> + /*
> +  * Evaluate (optional) suffix.
> +  *
> +  * after_num should now be right behind the numeric value.
> +  */
> + factor = 1;
> + switch (tolower(*after_num))
> + {
> + /*
> +  * Only the following suffixes are allowed. It's not 
> too useful to
> +  * restrict the rate to gigabytes: such a rate will 
> probably bring
> +  * significant impact on the master anyway, so the 
> throttling
> +  * won't help much.
> +  */
> + case 'g':
> + factor <<= 10;

I don't understand why you allow a 'g' here, given the comment above ...
but in any case it should be G.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Robert Haas
On Mon, Sep 2, 2013 at 10:10 PM, Jeff Davis  wrote:
> There is a significant amount of code supporting the changing of
> full_page_writes while the server is running, including an
> XLOG_FPW_CHANGE wal record. But I don't see the use-case; surely almost
> nobody changes this on a running server, because either your filesystem
> guarantees atomic page writes for you or not.

Although this is true, the administrator's estimate of whether that
guarantee is or is not provided might not be as consistent as the
hardware behavior itself.  I am generally of the feeling that having
to restart the server to change a setting sucks, and while it surely
sucks less for this setting than for some others, mostly because few
people change this setting in the first place, I'm still not convinced
that this is moving in the right direction.

> I'd like to submit a patch to just make it into a PGC_POSTMASTER and
> remove the code to support changing it.
>
> Then, I intend to write another patch to make the full-page writes for
> checksums honor the full_page_writes setting. That will be easier to
> write once it's a PGC_POSTMASTER.

I don't think I know exactly what this means.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions

2013-09-03 Thread Alvaro Herrera
Robert Haas escribió:

> The patch file turned out to be sorta garbled.  I'm not sure if a
> broken version of diff was used to generate this or whether MauMau
> hand-edited it after the fact, but the number of lines that were
> indicated in the control lines didn't match the actual hunks, and
> patch threw up.  So it took me 20 minutes to do what should have taken
> 5, but now it's done.

If this ever happens to you in the future, I suggest giving recountdiff
a look (in the patchutils package).  It is supposed to fix offsets and
line counts of hand-edited diffs.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Further XLogInsert scaling tweaking

2013-09-03 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Mon, Sep  2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote:
> > diff --git a/src/backend/access/transam/xlog.c 
> > b/src/backend/access/transam/xlog.c
> > index 39c58d0..28e62ea 100644

> > -   XLogInsertSlotPadded *insertSlots;
> > +   /*
> > +* Make sure the above heavily-contended spinlock and byte positions are
> > +* on their own cache line. In particular, the RedoRecPtr and full page
> > +* write variables below should be on a different cache line. They are
> > +* read on every WAL insertion, but updated rarely, and we don't want
> > +* those reads to steal the cache line containing Curr/PrevBytePos.
> > +*/
> > +   charpad[128];
> 
> Do we adjust for cache line lengths anywhere else?  PGPROC?  Should it
> be a global define?

We have LWLockPadded in lwlock.c; it only adjusts the size of the struct
to be a power of 2.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump

2013-09-03 Thread Tom Lane
Robert Haas  writes:
> On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner
>  wrote:
>> attached is a rough patch that does exactly that, if we are worried
>> about an api change we could simple do a ProcessGUCArrayNotice() in the
>> backbranches if that approach is actually sane.

> This patch has some definite coding-style issues, but those should be
> easy to fix.  The bigger thing I worry about is whether distributing
> the decision as to what elevel ought to be used here all over the code
> base is indeed sane.  Perhaps that ship has already sailed, though.

I don't particularly care for this approach, for that reason and others.
One such issue is that it would result in duplicate messages, because
functioncmds.c will already have issued warnings thanks to the GUCArrayAdd
calls it makes.  (Those calls are made with context PGC_S_TEST, which is
why they only throw warnings not errors, cf validate_option_array_item.)

But here's the big-picture consideration: the reason that we call
ProcessGUCArray in this place in ProcedureCreate is that we are trying to
set up the correct GUC environment for the function validator.  As an
example, if the SET clause for a SQL-language function includes a setting
for search_path, we had better adopt that search path while checking the
function body, or we will come to incorrect conclusions.

If we use this patch, we'd be saying that we're okay with failing to
apply the SET clause before calling the validator, any time the requested
GUC value happens to be invalid.  That's not terribly comfortable: in
general we'd be assuming that validators' results don't depend on the GUC
environment, which is clearly false.  Now we can probably get away with
that in the context of check_function_bodies = false, because the
validator shouldn't be doing much of anything then (although a few of them
check some things anyway...).  But if we're going to assume that the
validator doesn't depend on GUC environment when check_function_bodies =
false, why apply the SET clause at all?

So I think a more reasonable fix is just to skip the ProcessGUCArray
call altogether when check_function_bodies = false, and to document
that validators mustn't assume anything about the GUC environment
when check_function_bodies = false.

If no objections, I'll go fix it that way.

BTW, I notice the various comments about PGC_S_TEST context are out of
date, because they don't mention that it is used for CREATE FUNCTION SET
cases ...

regards, tom lane




Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump

2013-09-03 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> So I think a more reasonable fix is just to skip the ProcessGUCArray
> call altogether when check_function_bodies = false, and to document
> that validators mustn't assume anything about the GUC environment
> when check_function_bodies = false.
> 
> If no objections, I'll go fix it that way.

Works for me.

Thanks!

Stephen




Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Peter Geoghegan
On Tue, Sep 3, 2013 at 9:23 AM, Josh Berkus  wrote:
> So it should remain a setting.

Sure. I wasn't suggesting deprecating it or anything.

-- 
Peter Geoghegan




Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?

2013-09-03 Thread Josh Berkus

>>> I've checked the PostgreSQL License,
>>> and had a question.
>>> http://opensource.org/licenses/PostgreSQL
> 
> I don't know who opensource.org are, but their idea of "the PostgreSQL
> license" doesn't appear to match the actual license text.

The OSI, which is the organization which defines whether licenses are
open source or not.  We requested that they list The PostgreSQL License
on their site after Red Hat determined that our license had sufficient
textual differences from the BSD License to be a different license.

I've requested that the spurious "file name" reference be removed.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Magnus Hagander
On Sep 3, 2013 6:23 PM, "Josh Berkus"  wrote:
>
> Peter,
>
>
> > I wonder, is anyone really running in production with full_page_writes
> > off?
>
> We're doing so with clients on Joyent (ZFS).  And also for "ephemeral"
> postgres instances -- ones where we've also turned off fsync.
>
> It's also quite reasonable to assume that future filesystems will also
> be able to make better torn page guarantees.
>
> So it should remain a setting.  I see no reason why it should be a
> SIGHUP setting, though, if there's any code maintenance required to
> support it -- frankly, I only change this setting at first system
> deployment.
>

This matches my experience as well - people certainly use it, but don't
change it during runtime.

Not having looked at the code, but doesn't pg_start_backup use at least
parts of the same code path? That one will definitely still be able to
modify it at runtime

/Magnus


Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?

2013-09-03 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 3, 2013 at 12:16 PM, Tomonari Katsumata
>  wrote:
>> PostgreSQL is released under "the PostgreSQL License".
>> http://www.postgresql.org/about/licence/

The actual license terms can be found in the COPYRIGHT file in our
distribution.  There is nothing in there mentioning a LICENSE file.

>> I've checked the PostgreSQL License,
>> and had a question.
>> http://opensource.org/licenses/PostgreSQL

I don't know who opensource.org are, but their idea of "the PostgreSQL
license" doesn't appear to match the actual license text.

The real question here is why our website is linking to theirs.  Surely
we should not be setting things up in such a way that people might think
that opensource.org defines the authoritative version of the text.

regards, tom lane




Re: [HACKERS] logical changeset generation v5

2013-09-03 Thread Andres Freund
On 2013-09-03 12:22:22 -0400, Robert Haas wrote:
> On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund  wrote:
> >> 1. I think more comments are needed here to explain why we need this.
> >> I don't know if the comments should go into the functions modified by
> >> this patch or in some other location, but I don't find what's here now
> >> adequate for understanding.
> >
> > Hm. What information are you actually missing? I guess the
> > XLogSetAsyncXactLSN() needs a bit more context based on your question,
> > what else?
> > Not sure if it makes sense to explain in detail why it helps us to get
> > into a consistent state faster?

> Well, we must have had some idea in mind when the original Hot Standby
> patch went in that doing this once per checkpoint was good enough.
> Now we think we need it every 15 seconds, but not more or less often.
> So, why the change of heart?

I think the primary reason for that was that it was a pretty
complicated patchset and we needed to start somewhere. By now we do
have reports of standbys taking their time to get consistent.

> To my way of thinking, it seems as though we ought to always begin
> replay at a checkpoint, so the standby ought always to see one of
> these records immediately.  Obviously that's not good enough, but why
> not?

We always see one after the checkpoint (well, actually before the
checkpoint record, but ...), correct. The problem is just that reading a
single xact_running record doesn't automatically make you consistent. If
there's a single suboverflowed transaction running on the primary when
the xl_running_xacts record is logged, we won't be able to switch to
consistent. Check procarray.c:ProcArrayApplyRecoveryInfo() for some fun
and some optimizations.
Since the only place where we currently have the information to
potentially become consistent is ProcArrayApplyRecoveryInfo() we will
have to wait checkpoint_timeout time till we get consistent. Which
sucks as there are good arguments to set that to 1h.
That especially sucks as you lose consistency every time you restart the
standby...

> And why is every 15 seconds good enough?

Waiting 15s to become consistent instead of checkpoint_timeout seems to
be ok to me and to be a good tradeoff between overhead and waiting. We
can certainly discuss other values or making it configurable. The latter
seemed to be unnecessary to me, but I don't have a problem
implementing it. I just don't want to document it :P

> >> 3. Why does LogCurrentRunningXacts() need to call
> >> XLogSetAsyncXactLSN()?  Hopefully any WAL record is going to get
> >> sync'd in a reasonably timely fashion; I can't see off-hand why this
> >> one should need special handling.
> >
> > No, we don't force writing out wal records in a timely fashion if
> > there's no pressure in wal_buffers, basically only on commits and
> > various XLogFlush()es. It doesn't make much of a difference if the
> > entire system is busy, but if it's not the wal writer will sleep. The
> > alternative would be to XLogFlush() the record, but that would actually
> > block, which isn't really what we want/need.

> The WAL writer is supposed to call XLogBackgroundFlush() every time
> WalWriterDelay expires.  Yeah, it can hibernate, but if it's
> hibernating, then we should respect that decision for this WAL record
> type also.

Why should we respect it? There is work to be done, and the wal writer
has no way of knowing that without us telling it. Normally we rely on
commit records and XLogFlush()es to trigger the wal writer.
Alternatively we can start a transaction and set synchronous_commit =
off, but that seems like a complication to me.

> >> I understand why logical replication needs to connect to a database,
> >> but I don't understand why any other walsender would need to connect
> >> to a database.
> >
> > Well, logical replication actually streams out data using the walsender,
> > so that's the reason why I want to add it there. But there have been
> > cases in the past where we wanted to do stuff in the walsender that need
> > database access, but we couldn't do so because you cannot connect to
> > one.

> Could you be more specific?

I only remember 3959.1349384...@sss.pgh.pa.us but I think it has come up
before.

> >>  Absent a clear use case for such a thing, I don't
> >> think we should allow it.  Ignorant suggestion: perhaps the database
> >> name could be stored in the logical replication slot.
> >
> > The problem is that you need to InitPostgres() with a database. You
> > cannot do that again, after connecting with an empty database which we
> > do in a plain walsender.
> 
> Are you saying that the logical replication slot can't be read before
> calling InitPostgres()?

The slot can be read just fine, but we won't know that we should do
that. Walsender accepts commands via PostgresMain()'s main loop, which
has done an InitPostgres(dbname) before. Which we need, because we need
the environment it sets up.

The database is stored in the slots btw (as oid, not a

Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump

2013-09-03 Thread Stefan Kaltenbrunner
On 09/03/2013 06:15 PM, Robert Haas wrote:
> On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner
>  wrote:
>>> It would seem that a simple solution would be to add an elevel argument
>>> to ProcessGUCArray and then call it with NOTICE in the case that
>>> check_function_bodies is true.  None of the contrib modules call
>>> ProcessGUCArray, but should we worry that some external module does?
>>
>> attached is a rough patch that does exactly that; if we are worried
>> about an API change we could simply do a ProcessGUCArrayNotice() in the
>> back branches, if that approach is actually sane.
> 
> This patch has some definite coding-style issues, but those should be
> easy to fix.  The bigger thing I worry about is whether distributing
> the decision as to what elevel ought to be used here all over the code
> base is indeed sane.  Perhaps that ship has already sailed, though.

I can certainly fix the coding style thing up - but it was declared as
"rough" mostly because I'm not entirely sure the way this is going is
actually the right way to attack this...

This whole area seems a bit messy and bolted-on in some ways.
There is ProcessGUCArray(), but also set_config_option() and its
external "wrapper" SetConfigOption() - the division of labour between
the caller deciding what it wants and what the function does internally
with some combination of elevel and source is inconsistent at best.

I also note that a lot of places are actually calling
set_config_option() directly, so maybe there is an opportunity to unify
here as well.


Stefan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?

2013-09-03 Thread Josh Berkus
On 09/03/2013 11:16 AM, Tomonari Katsumata wrote:
> It seems the contents of the "COPYRIGHT" file are all required by the
> license.
> So I am confused about why PostgreSQL has a "COPYRIGHT" file instead of
> "LICENSE".
> Does anybody know the reason?
> And is this not a problem?

The first paragraph on the OSI page isn't part of the licences, but
rather is general advice by OSI.

I'm not clear on why they specified a particular filename, since they
don't do that for other licenses; I'll ask.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] CREATE FUNCTION .. SET vs. pg_dump

2013-09-03 Thread Robert Haas
On Sun, Sep 1, 2013 at 10:36 AM, Stefan Kaltenbrunner
 wrote:
>> It would seem that a simple solution would be to add an elevel argument
>> to ProcessGUCArray and then call it with NOTICE in the case that
>> check_function_bodies is true.  None of the contrib modules call
>> ProcessGUCArray, but should we worry that some external module does?
>
> attached is a rough patch that does exactly that; if we are worried
> about an API change we could simply do a ProcessGUCArrayNotice() in the
> back branches, if that approach is actually sane.

This patch has some definite coding-style issues, but those should be
easy to fix.  The bigger thing I worry about is whether distributing
the decision as to what elevel ought to be used here all over the code
base is indeed sane.  Perhaps that ship has already sailed, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The PostgreSQL License requires "LICENSE" file?

2013-09-03 Thread Robert Haas
On Tue, Sep 3, 2013 at 12:16 PM, Tomonari Katsumata
 wrote:
> Hi,
>
> PostgreSQL is released under "the PostgreSQL License".
> http://www.postgresql.org/about/licence/
>
> I've checked the PostgreSQL License,
> and had a question.
> http://opensource.org/licenses/PostgreSQL
>
> It seems that the license requires a "LICENSE" file
> in the target product.
> [quote]
>> Then put the license into a file called "LICENSE" in
>> your software distribution.
>
> But I couldn't find "LICENSE" file in current PostgreSQL source tree,
> instead found a "COPYRIGHT" file.
>
> It seems the contents of the "COPYRIGHT" file are all required by the
> license.
> So I am confused about why PostgreSQL has a "COPYRIGHT" file instead of
> "LICENSE".
> Does anybody know the reason?
> And is this not a problem?

I don't think it matters very much what the file is called, as long as
it's in a file that can be found easily by people who want to know
what the license is.  I imagine that either LICENSE or COPYRIGHT would
suffice for that purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Freezing without write I/O

2013-09-03 Thread Robert Haas
On Mon, Sep 2, 2013 at 3:16 PM, Jeff Davis  wrote:
> On Fri, 2013-08-30 at 20:34 +0200, Andres Freund wrote:
>> I have a quick question: The reason I'd asked about the status of the
>> patch was that I was thinking about the state of the "forensic freezing"
>> patch. After a quick look at your proposal, we still need to freeze in
>> some situations (old & new data on the same page basically), so I'd say
>> it still makes sense to apply the forensic freezing patch, right?
>>
>> Differing Opinions?
>
> The Freeze Forensically patch is nice because (if I understand it
> correctly) it allows us to freeze at the same time as we mark
> PD_ALL_VISIBLE, which avoids the potential extra page write.

The patch itself doesn't actually make that change, but it removes one
major objection to such a change.

> But that's
> not such a big advantage if we don't ordinarily have to write again for
> freezing, anyway.

That was my thought as well.

> However, there are still some cases where we'd be able to preserve the
> forensic information. If nothing else, that might help debug this patch,
> if necessary. There might also be cases where we can freeze more eagerly
> to avoid the case where very old (but unfrozen) and very new tuples mix
> on the same page. Perhaps Robert has some thoughts here, as well.

I basically agree.  I think if we adopt Heikki's patch forensic
freezing becomes much less important, but we might find there's still
a reason to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] The PostgreSQL License requires "LICENSE" file?

2013-09-03 Thread Tomonari Katsumata
Hi,

PostgreSQL is released under "the PostgreSQL License".
http://www.postgresql.org/about/licence/

I've checked the PostgreSQL License,
and had a question.
http://opensource.org/licenses/PostgreSQL

It seems that the license requires a "LICENSE" file
in the target product.
[quote]
> Then put the license into a file called "LICENSE" in
> your software distribution.

But I couldn't find "LICENSE" file in current PostgreSQL source tree,
instead found a "COPYRIGHT" file.

It seems the contents of the "COPYRIGHT" file are all required by the
license.
So I am confused about why PostgreSQL has a "COPYRIGHT" file instead of
"LICENSE".
Does anybody know the reason?
And is this not a problem?

regards,
-
Tomonari Katsumata


Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Josh Berkus
Peter,


> I wonder, is anyone really running in production with full_page_writes
> off? 

We're doing so with clients on Joyent (ZFS).  And also for "ephemeral"
postgres instances -- ones where we've also turned off fsync.

It's also quite reasonable to assume that future filesystems will
be able to make better torn-page guarantees.

So it should remain a setting.  I see no reason why it should be a
SIGHUP setting, though, if there's any code maintenance required to
support it -- frankly, I only change this setting at first system
deployment.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] logical changeset generation v5

2013-09-03 Thread Robert Haas
On Tue, Sep 3, 2013 at 12:05 PM, Andres Freund  wrote:
>> 1. I think more comments are needed here to explain why we need this.
>> I don't know if the comments should go into the functions modified by
>> this patch or in some other location, but I don't find what's here now
>> adequate for understanding.
>
> Hm. What information are you actually missing? I guess the
> XLogSetAsyncXactLSN() needs a bit more context based on your question,
> what else?
> Not sure if it makes sense to explain in detail why it helps us to get
> into a consistent state faster?

Well, we must have had some idea in mind when the original Hot Standby
patch went in that doing this once per checkpoint was good enough.
Now we think we need it every 15 seconds, but not more or less often.
So, why the change of heart?  To my way of thinking, it seems as
though we ought to always begin replay at a checkpoint, so the standby
ought always to see one of these records immediately.  Obviously
that's not good enough, but why not?  And why is every 15 seconds good
enough?

>> 3. Why does LogCurrentRunningXacts() need to call
>> XLogSetAsyncXactLSN()?  Hopefully any WAL record is going to get
>> sync'd in a reasonably timely fashion; I can't see off-hand why this
>> one should need special handling.
>
> No, we don't force writing out wal records in a timely fashion if
> there's no pressure in wal_buffers, basically only on commits and
> various XLogFlush()es. It doesn't make much of a difference if the
> entire system is busy, but if it's not the wal writer will sleep. The
> alternative would be to XLogFlush() the record, but that would actually
> block, which isn't really what we want/need.

The WAL writer is supposed to call XLogBackgroundFlush() every time
WalWriterDelay expires.  Yeah, it can hibernate, but if it's
hibernating, then we should respect that decision for this WAL record
type also.

>> > 0003 wal_decoding: Allow walsender's to connect to a specific database
>> > * biggest problem is how to specify the connection we connect
>> >   to. Currently with the patch walsender connects to a database if it's
>> >   not named "replication" (via dbname). Perhaps it's better to invent a
>> >   replication_dbname parameter?
>
>> I understand why logical replication needs to connect to a database,
>> but I don't understand why any other walsender would need to connect
>> to a database.
>
> Well, logical replication actually streams out data using the walsender,
> so that's the reason why I want to add it there. But there have been
> cases in the past where we wanted to do stuff in the walsender that need
> database access, but we couldn't do so because you cannot connect to
> one.

Could you be more specific?

>>  Absent a clear use case for such a thing, I don't
>> think we should allow it.  Ignorant suggestion: perhaps the database
>> name could be stored in the logical replication slot.
>
> The problem is that you need to InitPostgres() with a database. You
> cannot do that again, after connecting with an empty database which we
> do in a plain walsender.

Are you saying that the logical replication slot can't be read before
calling InitPostgres()?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v5

2013-09-03 Thread Andres Freund
On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
> On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund  
> wrote:
> > 0005 wal_decoding: Log xl_running_xact's at a higher frequency than 
> > checkpoints are done
> > * benefits hot standby startup
> 
> Review:
> 
> 1. I think more comments are needed here to explain why we need this.
> I don't know if the comments should go into the functions modified by
> this patch or in some other location, but I don't find what's here now
> adequate for understanding.

Hm. What information are you actually missing? I guess the
XLogSetAsyncXactLSN() needs a bit more context based on your question,
what else?
Not sure if it makes sense to explain in detail why it helps us to get
into a consistent state faster?

> 2. I think the variable naming could be better.  If nothing else, I'd
> spell out "snapshot" rather than abbreviating it to "snap".  I'd also
> add comments explaining what each of those variables does.

Ok.

> And why
> isn't log_snap_interval_ms a #define rather than a variable?  (Don't
> even talk to me about using gdb on a running instance.  If you're even
> thinking about that, this needs to be a GUC.)

Ugh. It certainly doesn't have anything to do with wanting to change it
on a running system using gdb. Brrr.

I think I wanted it to be a constant variable but forgot the const. I
personally prefer 'static const' to #defines if it's legal C, but I
guess the project's style differs, so I'll change that.

> 3. Why does LogCurrentRunningXacts() need to call
> XLogSetAsyncXactLSN()?  Hopefully any WAL record is going to get
> sync'd in a reasonably timely fashion; I can't see off-hand why this
> one should need special handling.

No, we don't force writing out wal records in a timely fashion if
there's no pressure in wal_buffers, basically only on commits and
various XLogFlush()es. It doesn't make much of a difference if the
entire system is busy, but if it's not the wal writer will sleep. The
alternative would be to XLogFlush() the record, but that would actually
block, which isn't really what we want/need.

> > 0003 wal_decoding: Allow walsender's to connect to a specific database
> > * biggest problem is how to specify the connection we connect
> >   to. Currently with the patch walsender connects to a database if it's
> >   not named "replication" (via dbname). Perhaps it's better to invent a
> >   replication_dbname parameter?

> I understand why logical replication needs to connect to a database,
> but I don't understand why any other walsender would need to connect
> to a database.

Well, logical replication actually streams out data using the walsender,
so that's the reason why I want to add it there. But there have been
cases in the past where we wanted to do stuff in the walsender that needs
database access, but we couldn't, because you cannot connect to
one.

>  Absent a clear use case for such a thing, I don't
> think we should allow it.  Ignorant suggestion: perhaps the database
> name could be stored in the logical replication slot.

The problem is that you need to InitPostgres() with a database. You
cannot do that again after having connected without a database, which
is what a plain walsender does.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Add database to PGXACT / per database vacuuming

2013-09-03 Thread Robert Haas
On Fri, Aug 30, 2013 at 2:29 PM, Andres Freund  wrote:
>> I don't know how big an impact adding the database oid would have, on the
>> case that the PGPROC/PGXACT split was done in the first place. In the worst
>> case it will make taking a snapshot 1/3 slower under contention. That needs
>> to be tested.
>
> Yes, definitely. I am basically wondering whether somebody has/sees
> fundamental problems with it that would make it pointless to investigate.

I expect there will be a measurable performance degradation, though
I'm willing to be proven wrong.  I think the question is whether we
get enough bang for the buck out of it to eat that.  It seems quite
likely that users with many databases will come out ahead, as such
systems seem likely to be shared hosting environments where the
machine is lightly loaded most of the time anyway, but where
cross-database interactions cause headaches.  But many users have One
Big Database, and AFAICS this is just overhead for them.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [9.3 doc fix] clarification of Solaris versions

2013-09-03 Thread Robert Haas
On Fri, Aug 30, 2013 at 5:01 AM, Bjorn Munch  wrote:
> On 29/08 21.17, MauMau wrote:
>>
>> Thanks.  I believe PostgreSQL runs successfully on Solaris 10 and later,
>> because the binaries are published on the community site:
>>
>> http://ftp.postgresql.org/pub/binary/v9.3beta2/solaris/
>
> Sorry, I didn't notice this thread earlier. Yes, I am building those
> binaries and I have had no problems building PostgreSQL and running
> the regression tests on Solaris 11. It would be quite unusual for
> something to work on Solaris 10 but not 11, but of course it could
> happen. There may have been a case or two which I now can't remember.

OK, patch committed and back-patched to 9.3.

The patch file turned out to be sorta garbled.  I'm not sure if a
broken version of diff was used to generate this or whether MauMau
hand-edited it after the fact, but the number of lines that were
indicated in the control lines didn't match the actual hunks, and
patch threw up.  So it took me 20 minutes to do what should have taken
5, but now it's done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL CPU overhead/optimization (was Master-slave visibility order)

2013-09-03 Thread Robert Haas
On Thu, Aug 29, 2013 at 8:22 PM, Ants Aasma  wrote:
> I might give it a shot later this cycle as I have familiarized my self
> with the problem domain anyway. I understand the appeal of staying
> with what we have, but this would cap the speedup at 4x and has large
> caveats with the extra lookup tables. A 28x speedup might be worth the
> extra effort.

I agree.  However, on the flip side, a bird in the hand is worth two
in the bush.  A patch that just does the same thing faster is likely
to be less controversial than changing the algorithm, and does not
preclude changing the algorithm later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v5

2013-09-03 Thread Robert Haas
On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund  wrote:
> 0005 wal_decoding: Log xl_running_xact's at a higher frequency than 
> checkpoints are done
> * benefits hot standby startup

Review:

1. I think more comments are needed here to explain why we need this.
I don't know if the comments should go into the functions modified by
this patch or in some other location, but I don't find what's here now
adequate for understanding.

2. I think the variable naming could be better.  If nothing else, I'd
spell out "snapshot" rather than abbreviating it to "snap".  I'd also
add comments explaining what each of those variables does.  And why
isn't log_snap_interval_ms a #define rather than a variable?  (Don't
even talk to me about using gdb on a running instance.  If you're even
thinking about that, this needs to be a GUC.)

3. Why does LogCurrentRunningXacts() need to call
XLogSetAsyncXactLSN()?  Hopefully any WAL record is going to get
sync'd in a reasonably timely fashion; I can't see off-hand why this
one should need special handling.

> 0003 wal_decoding: Allow walsender's to connect to a specific database
> * biggest problem is how to specify the connection we connect
>   to. Currently with the patch walsender connects to a database if it's
>   not named "replication" (via dbname). Perhaps it's better to invent a
>   replication_dbname parameter?

I understand why logical replication needs to connect to a database,
but I don't understand why any other walsender would need to connect
to a database.  Absent a clear use case for such a thing, I don't
think we should allow it.  Ignorant suggestion: perhaps the database
name could be stored in the logical replication slot.

> 0006 wal_decoding: copydir: move fsync_fname to fd.[c.h] and make it public
> * Pretty trivial and boring.

Seems fine.

> 0007 wal_decoding: Add information about a tables primary key to struct 
> RelationData
> * Could be used in the matview refresh code

I think you and Kevin should discuss whether this is actually the
right way to do this.  ISTM that if logical replication and
materialized views end up selecting different approaches to this
problem, everybody loses.

> 0002 wal_decoding: Introduce InvalidCommandId and declare that to be the new 
> maximum for CommandCounterIncrement

I'm still unconvinced we want this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Bugfix and new feature for PGXS

2013-09-03 Thread Andrew Dunstan


On 09/03/2013 04:02 AM, Cédric Villemain wrote:

Simple one, attached.
I didn't document USE_VPATH, not sure how to explain that clearly.

Just a reminder that the doc is written and is waiting to be committed.

There is also an issue spotted by Christoph with the installdirs prerequisite;
the attached patch fixes that.

Also, the bugfixes were supposed to be backported. The behavior is not changed,
nothing new, just fixing problems for some kinds of extension builds. (However,
USE_VPATH is new and is not needed in <9.3.)



Darn, I knew I had forgotten something. Mea maxima culpa.

Well, the 9.3 tarballs have been cut, unfortunately, but I'll get to 
this as soon as I can.


cheers

andrew




Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> I wonder, is anyone really running in production with full_page_writes
> off? 

The simple answer to this is "yes, definitely."

I don't know if they change it on-the-fly, but I doubt it.

Thanks,

Stephen




Re: [HACKERS] Hstore: Query speedups with Gin index

2013-09-03 Thread Blake Smith
Thanks for the feedback everyone. I've attached the patch that we are now
running in production to service our hstore include queries. We rebuilt the
index to account for the on-disk incompatibility. I've submitted the patch
to commitfest here:
https://commitfest.postgresql.org/action/patch_view?id=1203

Michael: I don't have a formal benchmark, but several of our worst queries
went from 10-20 seconds per query down to 50-400 ms. These are numbers
we've seen when testing real production queries against our production
dataset with real world access patterns.
Oleg: Thanks for your thoughts on this change. As for the spgist / gin work
you're doing, is there anything you need help with or are you still in the
research phase? I'd love to help get something more robust merged into
mainline if you think there's collaborative work to be done (even if it's
only user testing).

Thanks,

Blake




On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund wrote:

> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> > > Michael Paquier  writes:
> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith 
> wrote:
> > > >> The combined entry is used to support "contains (@>)" queries, and
> the key
> > > >> only item is used to support "key contains (?)" queries. This
> change seems
> > > >> to help especially with hstore keys that have high cardinalities.
> Downsides
> > > >> of this change is that it requires an index rebuild, and the index
> will be
> > > >> larger in size.
> > >
> > > > Index rebuild would be a problem only for minor releases,
> > >
> > > That's completely false; people have expected major releases to be
> > > on-disk-compatible for several years now.  While there probably will be
> > > future releases in which we are willing to break storage compatibility,
> > > a contrib module doesn't get to dictate that.
> > >
> > > What might be a practical solution, especially if this isn't always a
> > > win (which seems likely given the index-bloat risk), is to make hstore
> > > offer two different GIN index opclasses, one that works the traditional
> > > way and one that works this way.
> > >
> > > Another thing that needs to be taken into account here is Oleg and
> > > Teodor's in-progress work on extending hstore:
> > > https://www.pgcon.org/2013/schedule/events/518.en.html
> > > I'm not sure if this patch would conflict with that at all, but it
> > > needs to be considered.
> >
> > We can disallow in-place upgrades for clusters that use certain contrib
> > modules --- we have done that in the past.
>
> But that really cannot be acceptable for hstore. The probably most
> widely used extension there is.
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>



-- 
Blake Smith
http://blakesmith.me
@blakesmith


hstore_gin_speedup.patch
Description: Binary data



Re: [HACKERS] operator precedence issues

2013-09-03 Thread Tom Lane
Andres Freund  writes:
> On 2013-09-03 08:59:53 -0500, Merlin Moncure wrote:
>> While playing around with Andres's trick, I noticed that it works but
>> will not match against operators taking "any", although those will
>> match with an explicit schema qualification (FWICT it goes through the
>> search_path trying to match an explicit int/int operator first, then
>> goes through again and matches "any").  That's pretty weird:

> Not surprising. We look for the best match for an operator and
> explicitly matching types will be that. If there were no operator(int,
> int) your anyelement variant should get called.

Yeah, this has exactly nothing to do with operator precedence.
Precedence is about which operator binds tighter in cases like "A+B*C".

regards, tom lane




Re: [HACKERS] operator precedence issues

2013-09-03 Thread Andres Freund
On 2013-09-03 08:59:53 -0500, Merlin Moncure wrote:
> While playing around with Andres's trick, I noticed that it works but
> will not match against operators taking "any", although those will
> match with an explicit schema qualification (FWICT it goes through the
> search_path trying to match an explicit int/int operator first, then
> goes through again and matches "any").  That's pretty weird:

Not surprising. We look for the best match for an operator and
explicitly matching types will be that. If there were no operator(int,
int) your anyelement variant should get called.

> Ideally though you could specify operator precedence in the operator
> name itself though, in such a way that bison picks it up.  I don't know
> if that's possible since so many operator names have been given out
> without any thought to reserving characters for precedence, or if it
> would be worth the extra parsing time even if you could do it.
> Overriding stock operator behaviors is a really dodgy practice with
> the limited but important exception of handling certain classes of
> mathematical errors.

I have to say, even then it seems like its primary advantage is
making it harder to read the code, but YMMV.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] operator precedence issues

2013-09-03 Thread Merlin Moncure
On Fri, Aug 30, 2013 at 5:48 PM, Andres Freund  wrote:
> Hi,
>
> On 2013-08-30 17:35:04 -0500, Merlin Moncure wrote:
>> "When a schema-qualified operator name is used in the OPERATOR syntax,
>> as for example in:
>> SELECT 3 OPERATOR(pg_catalog.+) 4;
>> the OPERATOR construct is taken to have the default precedence shown
>> in Table 4-2 for "any other" operator. This is true no matter which
>> specific operator appears inside OPERATOR()."
>>
>> That rule seems intentionally designed to make it impossible to to
>> override mathematical behaviors.  Mainly curious -- was that
>> intentional?
>
> You can change your search_path to include your schema before an
> explicitly listed pg_catalog afair. Not nice, but should work...

hurk -- wish I had known that last week, but that's a nifty trick!  It
satisfies my particular problem (safe division), since in this case the
problem is handled 'in function' and I can temporarily hack the
search_path.

Ideally you could specify operator precedence in the operator
name itself, in such a way that bison picks it up.  I don't know
if that's possible, since so many operator names have been given out
without any thought to reserving characters for precedence, or if it
would be worth the extra parsing time even if you could do it.
Overriding stock operator behaviors is a really dodgy practice with
the limited but important exception of handling certain classes of
mathematical errors.

While playing around with Andres's trick, I noticed that it works but
will not match against operators taking "any", although those will
match with an explicit schema qualification (from what I can tell it
goes through the search_path trying to find an exact int/int operator
match, and only then goes through it again matching "any").  That's
pretty weird:

postgres=# CREATE OR REPLACE FUNCTION SafeDiv(
postgres(#   anyelement,
postgres(#   anyelement) RETURNS anyelement AS
postgres-# $$
postgres$#   SELECT $1 OPERATOR(pg_catalog./)  NULLIF($2, 0);
postgres$# $$ LANGUAGE SQL;
CREATE FUNCTION

postgres=# set search_path to safediv, pg_catalog, public;
SET

postgres=# CREATE OPERATOR safediv./
postgres-# (
postgres(#   PROCEDURE = SafeDiv,
postgres(#   LEFTARG = anyelement,
postgres(#   RIGHTARG = anyelement,
postgres(#   COMMUTATOR = /
postgres(# );
CREATE OPERATOR

postgres=# select 1/0;
ERROR:  division by zero
postgres=# select 1 operator(safediv./) 0;
 ?column?
--

(1 row)

postgres=# CREATE OR REPLACE FUNCTION SafeDiv(
postgres(#   int4,
postgres(#   int4) RETURNS int4 AS
postgres-# $$
postgres$#   SELECT $1 OPERATOR(pg_catalog./) NULLIF($2, 0);
postgres$# $$ LANGUAGE SQL;
CREATE FUNCTION

postgres=#
postgres=# CREATE OPERATOR safediv./
postgres-# (
postgres(#   PROCEDURE = SafeDiv,
postgres(#   LEFTARG = int4,
postgres(#   RIGHTARG = int4,
postgres(#   COMMUTATOR = /
postgres(# );
CREATE OPERATOR

postgres=# select 1/0;
 ?column?
--

(1 row)

merlin




Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-03 Thread Tom Lane
Heikki Linnakangas  writes:
> On 03.09.2013 05:28, Boguk, Maksym wrote:
>> Target usage:  ability to store UTF8 national characters in some
>> selected fields inside a single-byte encoded database.

> I think we should take a completely different approach to this. Two 
> alternatives spring to mind:

> 1. Implement a new encoding.  The new encoding would be some variant of 
> UTF-8 that encodes languages like Russian more efficiently.

+1.  I'm not sure that SCSU satisfies the requirement (which I read as
that Russian text should be pretty much 1 byte/character).  But surely
we could devise a variant that does.  For instance, it could look like
koi8r (or any other single-byte encoding of your choice) with one byte
value, say 255, reserved as a prefix.  255 means that a UTF8 character
follows.  The main complication here is that you don't want to allow more
than one way to represent a character --- else you break text hashing,
for instance.  So you'd have to take care that you never emit the 255+UTF8
representation for a character that can be represented in the single-byte
encoding.  In particular, you'd never encode ASCII that way, and thus this
would satisfy the all-multibyte-chars-must-have-all-high-bits-set rule.

Ideally we could make a variant like this for each supported single-byte
encoding, and thus you could optimize a database for "mostly but not
entirely LATIN1 text", etc.
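As an illustrative sanity check of the scheme described above (not an authoritative implementation), here is a Python sketch using koi8-r as the base single-byte encoding and 0xFF as the reserved prefix. Note the canonicalization rule: a character representable in koi8-r is never written in escaped form, and byte 0xFF itself (normally 'Ъ' in koi8-r) must always be escaped:

```python
PREFIX = 0xFF  # reserved escape byte; in plain koi8-r this byte maps to 'Ъ'

def hybrid_encode(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        try:
            b = ch.encode("koi8_r")
            if b[0] != PREFIX:
                # Canonical form: a character representable in the
                # single-byte encoding is never written as PREFIX+UTF-8.
                out += b
                continue
        except UnicodeEncodeError:
            pass
        # Escape: the reserved prefix followed by the UTF-8 bytes.
        out.append(PREFIX)
        out += ch.encode("utf-8")
    return bytes(out)

def hybrid_decode(data: bytes) -> str:
    chars = []
    i = 0
    while i < len(data):
        if data[i] == PREFIX:
            lead = data[i + 1]
            # Derive the UTF-8 sequence length from its lead byte.
            n = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
            chars.append(data[i + 1:i + 1 + n].decode("utf-8"))
            i += 1 + n
        else:
            chars.append(bytes([data[i]]).decode("koi8_r"))
            i += 1
    return "".join(chars)
```

Since ASCII is part of koi8-r it is never escaped, and both the prefix and all UTF-8 continuation bytes have the high bit set, so the all-multibyte-chars-must-have-all-high-bits-set rule holds. Russian text stays at one byte per character (except 'Ъ', which pays the escape penalty).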

regards, tom lane




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Andres Freund
On 2013-09-02 21:59:42 -0700, Peter Geoghegan wrote:
> > I can't really agree with this part. First off, a heap_insert()
> > certainly isn't lightweight operation. There's several things that can
> > take extended time:
> > * toasting of rows (writing a gigabyte worth of data...)
> > * extension of the heap/toast table
> > * writeout of a dirty victim buffer if the page isn't in s_b
> > * reading in the target page if the page isn't in s_b
> > * waiting for an exclusive lock on the target pages
>
> These are all valid concerns. I'll need to better assess just how bad
> this can get (hopefully with help from others). But some of this stuff
> can surely be done separately for my case. For example, surely we
> could make TOASTing occur after index insertion proper when locks are
> released, without too much work.

I don't think those are that easy to solve, but I'd like to be surprised.

E.g. the toasting cannot really happen after the main table insert,
since the inserted row needs to refer to the toast id. And you don't
even know the final size of a row without doing the toasting, since how
much it gets compressed depends on the amount of free space on the
current target buffer.

> > I'd expect that with the current implementation strategy if you
> > e.g. made pgbench use INSERT .. ON DUPLICATE IGNORE without ever
> > actually hitting duplicates, you'd see a very, very noticeable hit in
> > scalability. I wouldn't expect a single session to get much slower but
> > trying a dozen or two should show a very noticeable dip.
>
> I do not accept the premise of your questions regarding the patch's
> performance, which appears to be that it's fair to expect the same
> level of performance of INSERT ... ON DUPLICATE IGNORE as it is to
> expect of regular INSERT. No scheme is going to get that, including
> your scheme, which necessitates WAL logging everything related to
> index tuple insertion twice for each unique index. We certainly didn't
> require SSI to perform as well as the current REPEATABLE READ, and
> certainly not as well as READ COMMITTED, and for good reasons.

I think it's ok that INSERT .. IGNORE itself is noticeably slower than a
normal INSERT. What I am concerned about is it negatively impacting
overall scalability.

> I'm not going to bother benchmarking the patch myself until I give it
> another whack. There is some really obvious low-hanging performance
> fruit. Obviously by this I mean doing as little as possible when
> holding the exclusive lock.

Sounds fair.

> > I am pretty sure it would be better. We know that nbtree works pretty
> > damn well under concurrency. And the "promise" logic wouldn't change
> > anything of that.

> It would have multiple extra steps. Including WAL-logging of the
> initial insertion, and clean-up for the IGNORE case. And WAL-logging
> of the cleanup. Instead of doing exactly nothing extra in the IGNORE
> case, which, under my proposal, has an additional duration of the
> exclusive lock being held of zero (instead of the ereport() call we
> just immediately release the lock and report the duplicate). The
> IGNORE case matters too, and there is going to be considerably more
> exclusive locking for that case with your proposal, before we even
> talk about other overhead.

What I mean is that none of the individual proposed steps require
changing the locking semantics/granularity. I.e. we will never hold
locks on more pages at once than we do for a normal insert. Yes, one
page will get locked twice, but for shorter than a normal insert (all
you need to do is to rewrite the target of the item) and without having
any other pages locked.

> > Furthermore, I am pretty sure that locking an index page *and* related
> > heap pages exclusively at the same time will be a deadlock
> > nightmare. Even in the case of only having a single unique index.

> It's seems like what you're saying here is "I don't know, this
> extended index locking thing seems pretty sketchy to me". Which is a
> perfectly reasonable thing to say in the absence of better analysis
> from either of us - it is sketchy.

Yes, that's basically what I am saying.

I'd really like some input from others on this. Perhaps you could write
up an email/thread just about the locking semantics after the next
version of the patch (since that presumably will change the semantics)?

> I have taken a certain amount of reassurance from the fact that I was
> not able to observe any deadlocks when pushing the patch hard -
> perhaps that's naive. I mean hammering a table with inserts using
> pgbench for 3 hours, with 64 INSERT...DUPLICATE KEY IGNORE sessions.
> PK values were over a range of values from 115 million. This was
> with shared_buffers set to 512MB and assertions enabled. It stood up
> without any deadlocks or other manifestations of bugs. Granted, I did
> that before fixing the bug you've highlighted (but after you pointed
> it out), and so I haven't tested the multi-index case in the same way.

I can readil

Re: [HACKERS] Further XLogInsert scaling tweaking

2013-09-03 Thread Merlin Moncure
On Mon, Sep 2, 2013 at 10:32 PM, Bruce Momjian  wrote:
> On Mon, Sep  2, 2013 at 10:14:03AM +0300, Heikki Linnakangas wrote:
>> diff --git a/src/backend/access/transam/xlog.c 
>> b/src/backend/access/transam/xlog.c
>> index 39c58d0..28e62ea 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -428,8 +428,14 @@ typedef struct XLogCtlInsert
>>   uint64  CurrBytePos;
>>   uint64  PrevBytePos;
>>
>> - /* insertion slots, see above for details */
>> - XLogInsertSlotPadded *insertSlots;
>> + /*
>> +  * Make sure the above heavily-contended spinlock and byte positions 
>> are
>> +  * on their own cache line. In particular, the RedoRecPtr and full page
>> +  * write variables below should be on a different cache line. They are
>> +  * read on every WAL insertion, but updated rarely, and we don't want
>> +  * those reads to steal the cache line containing Curr/PrevBytePos.
>> +  */
>> + charpad[128];
>
> Do we adjust for cache line lengths anywhere else?  PGPROC?  Should it
> be a global define?

+1 -- that is, I think it should be.
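As a rough illustration of what such a global define buys (Python/ctypes standing in for the C struct; the 64-byte line size is an assumption, typical for x86_64, and the 128-byte pad is taken from the hunk above):

```python
import ctypes

PG_CACHE_LINE_SIZE = 64  # hypothetical global define; typical for x86_64

class XLogCtlInsertSketch(ctypes.Structure):
    """Rough stand-in for XLogCtlInsert: hot fields, pad, cold fields."""
    _fields_ = [
        # Heavily contended on every WAL insertion:
        ("insertpos_lck", ctypes.c_int),   # stand-in for the spinlock
        ("CurrBytePos", ctypes.c_uint64),
        ("PrevBytePos", ctypes.c_uint64),
        # Padding, as in the patch hunk:
        ("pad", ctypes.c_char * (2 * PG_CACHE_LINE_SIZE)),
        # Read on every insertion but updated rarely:
        ("RedoRecPtr", ctypes.c_uint64),
        ("fullPageWrites", ctypes.c_bool),
    ]

# The pad guarantees the rarely-updated fields start at least a full
# cache line past the contended byte positions, so reads of RedoRecPtr
# don't steal the line containing Curr/PrevBytePos.
assert (XLogCtlInsertSketch.RedoRecPtr.offset
        - (XLogCtlInsertSketch.PrevBytePos.offset + 8)) >= PG_CACHE_LINE_SIZE
```

A named constant would at least document the intent wherever padding like this recurs, even though the right line size varies by platform.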

merlin




Re: [HACKERS] Backup throttling

2013-09-03 Thread Andres Freund
Hi,

On 2013-09-03 14:35:18 +0200, Antonin Houska wrote:

> + /*
> +  * THROTTLING_SAMPLE_MIN / MAX_RATE_LOWER (in seconds) should 
> be the
> +  * longest possible time to sleep.
> +  */
> + pg_usleep((long) sleep);
> + else
> +
> + /*
> +  * The actual transfer rate is below the limit. Negative value 
> would
> +  * distort the adjustment of throttled_last.
> +  */
> + sleep = 0;
> +
> + /*
> +  * Only the whole multiples of throttling_sample processed. The rest 
> will
> +  * be done during the next call of this function.
> +  */
> + throttling_counter %= throttling_sample;
> + /* Once the (possible) sleep ends, new period starts. */
> + throttled_last += elapsed + sleep;
> +}

It's probably better to use latches for the waiting; those have properly
defined interruption semantics. Whether pg_usleep will be interrupted is
platform dependent...

Greetings,

Andres Freund




Re: [HACKERS] Backup throttling

2013-09-03 Thread Antonin Houska
On 07/24/2013 09:20 AM, Antonin Houska wrote:
> Hello,
> the purpose of this patch is to limit impact of pg_backup on running 
> server.

Attached is a new version. Server-side implementation this time.

Antonin Houska (Tony)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index eb0c1d6..f58eb60 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -189,6 +189,22 @@ PostgreSQL documentation
  
 
  
+  -r
+  --max-rate
+  
+   
+The maximum amount of data transferred from server per second.
+The purpose is to limit impact of pg_basebackup
+on a running master server while transfering data directory.
+   
+   
+Suffixes k (kilobytes) and m
+(megabytes) are accepted. For example: 10m
+   
+  
+ 
+
+ 
   -R
   --write-recovery-conf
   
diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ba8d173..08e4f26 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -33,6 +33,7 @@
 #include "utils/builtins.h"
 #include "utils/elog.h"
 #include "utils/ps_status.h"
+#include "utils/timestamp.h"
 #include "pgtar.h"
 
 typedef struct
@@ -42,6 +43,7 @@ typedef struct
 	bool		fastcheckpoint;
 	bool		nowait;
 	bool		includewal;
+	uint32		maxrate;
 } basebackup_options;
 
 
@@ -59,6 +61,7 @@ static void perform_base_backup(basebackup_options *opt, DIR *tblspcdir);
 static void parse_basebackup_options(List *options, basebackup_options *opt);
 static void SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli);
 static int	compareWalFileNames(const void *a, const void *b);
+static void throttle(size_t increment);
 
 /* Was the backup currently in-progress initiated in recovery mode? */
 static bool backup_started_in_recovery = false;
@@ -68,6 +71,41 @@ static bool backup_started_in_recovery = false;
  */
 #define TAR_SEND_SIZE 32768
 
+
+/*
+ * The maximum amount of data per second - bounds of the user input.
+ *
+ * If the maximum should be increased to more than 4 GB, uint64 must
+ * be introduced for the related variables. However such high values have
+ * little to do with throttling.
+ */
+#define MAX_RATE_LOWER	32768
+#define MAX_RATE_UPPER	(1024 << 20)
+
+/*
+ * Transfer rate is only measured when this number of bytes has been sent.
+ * (Too frequent checks would impose too high CPU overhead.)
+ *
+ * The default value is used unless it'd result in too frequent checks.
+ */
+#define THROTTLING_SAMPLE_MIN	32768
+
+/* The maximum number of checks per second.  */
+#define THROTTLING_MAX_FREQUENCY	256
+
+/* The actual value, transfer of which may cause sleep. */
+static uint32 throttling_sample;
+
+/* Amount of data already transfered but not yet throttled.  */
+static int32 throttling_counter;
+
+/* The minimum time required to transfer throttling_sample bytes. */
+static int64 elapsed_min_unit;
+
+/* The last check of the transfer rate. */
+static int64 throttled_last;
+
+
 typedef struct
 {
 	char	   *oid;
@@ -171,6 +209,31 @@ perform_base_backup(basebackup_options *opt, DIR *tblspcdir)
 		/* Send tablespace header */
 		SendBackupHeader(tablespaces);
 
+		if (opt->maxrate > 0)
+		{
+			throttling_sample = opt->maxrate / THROTTLING_MAX_FREQUENCY;
+
+			/* Don't measure too small pieces of data. */
+			if (throttling_sample < THROTTLING_SAMPLE_MIN)
+throttling_sample = THROTTLING_SAMPLE_MIN;
+
+			/*
+			 * opt->maxrate is bytes per seconds. Thus the expression in
+			 * brackets is microseconds per byte.
+			 */
+			elapsed_min_unit = throttling_sample *
+((double) USECS_PER_SEC / opt->maxrate);
+
+			/* Enable throttling. */
+			throttling_counter = 0;
+
+			/* The 'real data' starts now (header was ignored). */
+			throttled_last = GetCurrentIntegerTimestamp();
+		}
+		else
+			/* Disable throttling. */
+			throttling_counter = -1;
+
 		/* Send off our tablespaces one by one */
 		foreach(lc, tablespaces)
 		{
@@ -468,6 +531,7 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 	bool		o_fast = false;
 	bool		o_nowait = false;
 	bool		o_wal = false;
+	bool		o_maxrate = false;
 
 	MemSet(opt, 0, sizeof(*opt));
 	foreach(lopt, options)
@@ -519,6 +583,29 @@ parse_basebackup_options(List *options, basebackup_options *opt)
 			opt->includewal = true;
 			o_wal = true;
 		}
+		else if (strcmp(defel->defname, "maxrate") == 0)
+		{
+			long		maxrate;
+
+			if (o_maxrate)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("duplicate option \"%s\"", defel->defname)));
+			maxrate = intVal(defel->arg);
+
+			opt->maxrate = (uint32) maxrate;
+			if (opt->maxrate > 0 &&
+			(opt->maxrate < MAX_RATE_LOWER || opt->maxrate > MAX_RATE_UPPER))
+			{
+ereport(ERROR,
+		(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+  errmsg("transfer rate %u bytes per second is out of range",
+		 opt->maxrate),
+ errhint("The accepte

Re: [HACKERS] [9.4] Make full_page_writes only settable on server start?

2013-09-03 Thread Andres Freund
On 2013-09-02 19:10:57 -0700, Jeff Davis wrote:
> There is a significant amount of code supporting the changing of
> full_page_writes while the server is running, including an
> XLOG_FPW_CHANGE wal record. But I don't see the use-case; surely almost
> nobody changes this on a running server, because either your filesystem
> guarantees atomic page writes for you or not.
>
> I'd like to submit a patch to just make it into a PGC_POSTMASTER and
> remove the code to support changing it.

I've wondered about that before, but always decided that we wouldn't
gain all that much by making it PGC_POSTMASTER: the effective value of
full_page_writes still needs to be changeable at runtime, because we
need to force-enable full page writes during a pg_start/stop_backup().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Peter Geoghegan
On Tue, Sep 3, 2013 at 2:11 AM, Peter Geoghegan  wrote:
> and it would be totally unacceptable if that meant that lots of people
> blocked on the page lock that an upserter held while it tried to
> acquire locks on tuples


By which I mean: it seems shaky that I'd then be assuming that to lock
all of a row's unique index values was to lock the row itself, and so
that there was no need to worry about blocking on acquiring an actual
row lock while holding all those exclusive/write buffer locks.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Peter Geoghegan
On Tue, Sep 3, 2013 at 12:52 AM, Craig Ringer  wrote:
> That'd certainly be the ideal, but as you say is far from simple, and
> IIRC prior discussions here have suggested the SSI / predicate locking
> stuff won't help.

I think that by far the strongest argument for what Andres suggested
(that has, incidentally, been suggested by a number of others), which
is to insert what he termed promise index tuples before a heap tuple,
is that there is no sensible way to lock the tuples while holding an
exclusive buffer lock on an index.

It could take basically any amount of time to acquire those locks (see
GetTupleForTrigger() to get a rough idea of how that ought to work),
and it would be totally unacceptable if that meant that lots of people
blocked on the page lock that an upserter held while it tried to
acquire locks on tuples (we could only unlock the buffer lock/locked
value when the rows were locked). So while I think what I've done here
is probably workable for the ON DUPLICATE KEY IGNORE case, perhaps I'm
being short-sighted in focusing on that. I'm surprised Andres didn't
call me out on this himself, actually.

-- 
Peter Geoghegan




Re: [HACKERS] Bugfix and new feature for PGXS

2013-09-03 Thread Cédric Villemain
> Simple one, attached.
> I didn't document USE_VPATH, not sure how to explain that clearly.

Just a reminder that the doc is written and is waiting to be committed.

There is also an issue spotted by Christoph with the installdirs
prerequisite; the attached patch fixes that.

Also, the bugfixes were supposed to be backported. The behavior is not
changed, nothing new, just fixing problems for some kinds of extension
builds. (However, USE_VPATH is new and is not needed in <9.3.)

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

>From fb9d5ed99397b40e7fc33224fa31cd5c54c7b5d2 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Villemain?= 
Date: Tue, 3 Sep 2013 09:55:05 +0200
Subject: [PATCH] Add installdirs to PGXS order-only-prerequisites

prerequisites are not ordered but installdirs is required by others.
This patch is the simplest one, another solution being to remove completely
installdirs prerequisite.

Spotted by Christoph Berg with plr build.
---
 src/makefiles/pgxs.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/makefiles/pgxs.mk b/src/makefiles/pgxs.mk
index 8618aa1..ac12f7d 100644
--- a/src/makefiles/pgxs.mk
+++ b/src/makefiles/pgxs.mk
@@ -124,7 +124,7 @@ all: all-lib
 endif # MODULE_big
 
 
-install: all installdirs installcontrol installdata installdatatsearch installdocs installscripts
+install: all installcontrol installdata installdatatsearch installdocs installscripts | installdirs
 ifdef MODULES
 	$(INSTALL_SHLIB) $(addsuffix $(DLSUFFIX), $(MODULES)) '$(DESTDIR)$(pkglibdir)/'
 endif # MODULES
-- 
1.8.4.rc3





Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-03 Thread Craig Ringer
On 09/03/2013 06:18 AM, Peter Geoghegan wrote:
> On Mon, Sep 2, 2013 at 6:25 AM, Craig Ringer  wrote:
>> It'll be yet another way for people to get upsert wrong, of course.
>> They'll use a wCTE with RETURNING REJECTS to do an UPDATE of the rejects
>> w/o locking the table against writes first. Documenting this pitfall
>> should be enough, though.
> 
> My preferred solution is to actually provide a variant to lock the
> rows implicated in the would-be unique constraint violation. Obviously
> that's a harder problem to solve.

That'd certainly be the ideal, but as you say is far from simple, and
IIRC prior discussions here have suggested the SSI / predicate locking
stuff won't help.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.

2013-09-03 Thread Heikki Linnakangas

On 03.09.2013 05:28, Boguk, Maksym wrote:

> Target usage:  ability to store UTF8 national characters in some
> selected fields inside a single-byte encoded database.
> For example, if I have a ru-RU.koi8r encoded database with mostly Russian
> text inside, it would be nice to be able to store Japanese text in one
> field without converting the whole database to UTF8 (converting such a
> database to UTF8 could almost double the database size even if only one
> field in the whole database uses any symbols outside of the ru-RU.koi8r
> encoding).


Ok.


> What has been done:
>
> 1) Addition of new string data types NATIONAL CHARACTER and NATIONAL
> CHARACTER VARYING.
> These types differ from the char/varchar data types in one important
> respect: NATIONAL string types always have UTF8 encoding, independent
> of the database encoding.


I don't like the approach of adding a new data type for this. The 
encoding used for a text field should be an implementation detail, not 
something that's exposed to users at the schema-level. A separate data 
type makes an nvarchar field behave slightly differently from text, for 
example when it's passed to and from functions. It will also require 
drivers and client applications to know about it.



> What needs to be done:
>
> 1) A full set of string functions and operators for NATIONAL types (we
> cannot use the generic text functions because they assume that the
> strings have the database encoding). Only a basic set is implemented
> so far.
> 2) Some way to define a default collation for NATIONAL types.
> 3) Some way to input UTF8 characters into NATIONAL types via SQL (there
> is a serious open problem here... it will be described later in the
> text).


Yeah, all of these issues stem from the fact that the NATIONAL types are 
separate from text.


I think we should take a completely different approach to this. Two 
alternatives spring to mind:


1. Implement a new encoding.  The new encoding would be some variant of 
UTF-8 that encodes languages like Russian more efficiently. Then just 
use that in the whole database. Something like SCSU 
(http://www.unicode.org/reports/tr6/) should do the trick, although I'm 
not sure if SCSU can be used as a server-encoding. A lot of code relies 
on the fact that a server encoding must have the high bit set in all 
bytes that are part of a multi-byte character. That's why SJIS for 
example can only be used as a client-encoding. But surely you could come 
up with some subset or variant of SCSU which satisfies that requirement.


2. Compress the column. Simply do "ALTER TABLE foo ALTER COLUMN bar SET 
STORAGE MAIN". That will make Postgres compress that field. That might 
not be very efficient for compressing short cyrillic text encoded in 
UTF-8 today, but that could be improved. There has been discussion on 
supporting more compression algorithms in the past, and one such 
algorithm could be again something like SCSU.


- Heikki




Re: [HACKERS] ENABLE/DISABLE CONSTRAINT NAME

2013-09-03 Thread wangshuo

On 2013-09-03 08:15, David Johnston wrote:

> Jeff Davis-8 wrote:
>> Is there any semantic difference between marking a constraint as
>> DISABLED and simply dropping it? Or does it just make it easier to
>> re-add it later?
>
> I cannot answer the question, but if there is none then the main
> concern I'd have is capturing "meta-information" about WHY such a
> constraint has been disabled instead of dropped.


Dropping/rebuilding and disabling/enabling a constraint have no
fundamental difference, and can achieve the same purpose. What I did is
also more convenient for the user: recording the disabled constraints is
easier than recording all the constraints.

What's more, a lot of people have asked about turning off constraints,
and SQL:2008 supports this. So I think it's necessary in some ways.

> I guess this whole feature extends from the trigger disable feature
> that already exists.  Given we have the one, adding this seems
> symmetrical...
>
> I cannot really see using either feature on a production system (if
> following best practices) but I can imagine where they could both be
> helpful during development.  Note with this usage pattern the
> meta-information about "why" becomes considerably less important.
>
> David J.




 Wang Shuo
 HighGo Software Co.,Ltd.
 September 3, 2013

