Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-18 Thread Cédric Villemain
On Saturday 17 November 2012 22:57:49, Tom Lane wrote:
 Fujii Masao masao.fu...@gmail.com writes:
  Do we really need to store the settings in a system table?
  Since WAL would be generated when storing the settings
  in a system table, this approach seems to prevent us from
  changing the settings in the standby.
 
 That's a really good point: if we try to move all GUCs into a system
 table, there's no way for a standby to have different values; and for
 some of them different values are *necessary*.
 
 I think that shoots down this line of thought entirely.  Can we go
 back to the plain write a file approach now?  I think a SET
 PERSISTENT command that's disallowed in transaction blocks and just
 writes the file immediately is perfectly sensible.

I was justifying the use of a table structure, not to keep it in sync (just
using it to hide the complexity of locks).

Anyway, those were just comments.
-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Development, Expertise and Training




Re: [HACKERS] Materialized views WIP patch

2012-11-18 Thread Simon Riggs
On 16 November 2012 11:25, Kevin Grittner kgri...@mail.com wrote:

 16. To get new data into the MV, the command is LOAD MATERIALIZED
 VIEW matview_name. This seemed more descriptive to me than the
 alternatives and avoids declaring any new keywords beyond
 MATERIALIZED. If the MV is flagged as relisvalid == false, this
 will change it to true.

 UPDATE MATERIALIZED VIEW was problematic?

 Not technically, really, but I saw two reasons that I preferred LOAD MV:

 1. It seems to me to better convey that the entire contents of the MV
will be built from scratch, rather than incrementally adjusted.
 2. We haven't hashed out the syntax for more aggressive maintenance of
an MV, and it seemed like UPDATE MV might be syntax we would want to
use for something which updated selected parts of an MV when we do.

 Does LOAD automatically TRUNCATE the view before reloading it? If not,
 why not?

 It builds a new heap and moves it into place. When the transaction
 running LMV commits, the old heap is deleted. In implementation it is
 closer to CLUSTER or the new VACUUM FULL than TRUNCATE followed by
 creating a new table. This allows all permissions, etc., to stay in
 place.

This seems very similar to the REPLACE command we discussed earlier,
except this is restricted to Mat Views.

If we're going to have this, I would prefer a whole command.

e.g. REPLACE matviewname REFRESH

that would also allow

REPLACE tablename AS query

Same thing under the covers, just more widely applicable and thus more useful.



Either way, I don't much like overloading the use of LOAD, which
already has a very different meaning.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Parser - Query Analyser

2012-11-18 Thread Craig Ringer
On 11/17/2012 10:18 PM, Michael Giannakopoulos wrote:
 Hello guys,

 My name is Michail Giannakopoulos and I am a graduate student at
 University of Toronto. I have no previous experience in developing a
 system like PostgreSQL before.

 What I am trying to explore is if it is possible to extend postgreSQL
 in order to accept queries of the form:

 Select function(att1, att2, att3) AS output(out1, out2, ..., outk)
 FROM [database_name];

I think you meant FROM [table_name]. It certainly seems like it, as
you describe database_name as a relation a little later.

If you really meant FROM [database_name], you're not going to have
much luck. PostgreSQL backends are associated with a single database.
They cannot query across databases without hacks like dblink, which
internally opens a connection to another backend. So you really can't
query FROM [database_name], you must connect to a database then issue
queries against it.

If you meant FROM relation_name: it sounds like you are describing a
stored procedure that returns SETOF RECORD. If so, you can already do
this, though the syntax is a little different. You have to pass the
relation *name* or regclass oid into the procedure, where it builds a
dynamic SQL statement to SELECT from the table and return the result.

Alternately: Are you trying to describe a *row filter function*? Like a
WHERE clause wrapped up in a function?

It would really help if you could show some mock examples of what you're
trying to achieve. Inputs as CREATE TABLE and INSERT statements, mock
output, explanation of how you'd get that output, what problem you're
trying to solve, etc.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] Do we need so many hint bits?

2012-11-18 Thread Simon Riggs
On 17 November 2012 21:20, Jeff Davis pg...@j-davis.com wrote:

 ISTM that we should tune that specifically by performing a VM lookup
 for the next 32 pages (or more), so we reduce the lookups well below 1 per
 page. That way the overhead of using the VM will be similar to using
 the PD_ALL_VISIBLE.

 That's another potential way to mitigate the effects during a scan, but
 it does add a little complexity. Right now, it share locks a buffer, and
 uses an array with one element for each tuple in the page. If
 PD_ALL_VISIBLE is set, then it marks all of the tuples *currently
 present* on the page as visible in the array, and then releases the
 share lock. Then, when reading the page, if another tuple is added
 (because we released the share lock and only have a pin), it doesn't
 matter because it's already invisible according to the array.

 With this approach, we'd need to keep a larger array to represent many
 pages. And it sounds like we'd need to share lock the pages ahead, and
 find out which items are currently present, in order to properly fill in
 the array. Not quite sure what to do there, but would require some more
 thought.

Hmm, that's too much and not really what I was thinking, but I concede
that was a little vague. No need for bigger arrays etc.

If we check the VM for the next N blocks, then we know that all completed
transactions are committed. Yes, the VM can change, but that is not a
problem.

What I mean is that we keep an array of boolean[N] that simply tracks
what the VM said last time we checked it. If that is true for a block
then we do special processing, similar to the current all-visible path
and yet different, described below.

What we want is to do a HeapTupleVisibility check that does not rely
on tuple hints AND yet avoids all clog access. So when we scan a
buffer in page mode and we know the VM said it was all visible, we
still check each tuple's visibility. If the xid is below the snapshot
xmin then the xid is known committed and the tuple is visible to this
scan (not necessarily all scans). We know this because the VM said this
page was all-visible AFTER our snapshot was taken. If the tuple xid is
within the snapshot or greater than the snapshot xmax then the tuple is
invisible to our snapshot and we don't need to check clog. So once we
know the VM said the page was all visible, we do not need to check clog
to establish visibility; we only need to check the tuple xmin against
our snapshot xmin.

So the VM can change under us and it doesn't matter. We don't need a
pin or lock on the VM, we just read it and let go. No race conditions,
no fuss.

The difference here is that we still need to check visibility of each
tuple, but that can be a very cheap check and never involves clog, nor
does it dirty the page. Tuple access is reasonably expensive in
comparison with a clog-less check on tuple xmin against snapshot xmin,
so the extra work is negligible.
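
Roughly, a sketch of that check (this is an illustration only, not the
attached patch; the function name is invented, and only the xmin side is
shown):

#include "postgres.h"
#include "access/transam.h"
#include "utils/snapshot.h"

/*
 * Sketch: visibility for a tuple on a page the VM reported all-visible
 * at some point after our snapshot was taken.  No clog access, no hint
 * bits: an xmin before snap->xmin must be committed (or the page could
 * never have become all-visible); anything else is invisible to us.
 */
static bool
tuple_visible_without_clog(TransactionId xmin, Snapshot snap)
{
    if (TransactionIdPrecedes(xmin, snap->xmin))
        return true;        /* committed before our snapshot began */
    return false;           /* in-snapshot or later: invisible to us */
}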

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Do we need so many hint bits?

2012-11-18 Thread Andres Freund
On Sunday, November 18, 2012 03:07:01 AM Jeff Davis wrote:
 Process A (process that clears a VM bit for a data page):
   1. Acquires exclusive lock on data buffer
   2. Acquires exclusive lock on VM buffer
   3. clears VM bit
   4. Releases VM buffer lock
   5. Releases data buffer lock

Well, right, but this is a rather big difference. If VM pages get
unconditionally locked all the time, we will have a huge source of new
contention, as they are shared between so many heap pages.

Andres
-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Do we need so many hint bits?

2012-11-18 Thread Simon Riggs
On 18 November 2012 08:52, Simon Riggs si...@2ndquadrant.com wrote:

 The difference here is that we still need to check visibility of each
 tuple, but that can be a very cheap check and never involves clog, nor
 does it dirty the page. Tuple access is reasonably expensive in
 comparison with a clog-less check on tuple xmin against snapshot xmin,
 so the extra work is negligible.

The attached *partial* patch shows how the tuple checks would work.

This should fit in neatly with the vismap skip code you've got already.

It looks possible to skip the all-vis hint completely, as you
suggest, plus avoid repeated checks of the VM or clog.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


all_visible_cheap_check.v1.patch



[HACKERS] 9.3 pg_archivecleanup broken?

2012-11-18 Thread Erik Rijkers
(In a test setup) I can't get pg_archivecleanup to remove WAL files in
9.3devel. (A very similar setup in 9.2 works fine.)

In 9.3 pg_archivecleanup just keeps repeating lines like:

pg_archivecleanup: keep WAL file /home/aardvark/pg_stuff/archive_dir93/ and later

(and does not delete any files.)

Configuration:

# master  pgsql.93_1/data/postgresql.conf:
data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_1/data'
listen_addresses = '*'
max_connections = 100
shared_buffers = 128MB
wal_level = hot_standby
synchronous_commit = on
checkpoint_segments = 3
archive_mode = on
archive_command = 'cp %p /home/aardvark/pg_stuff/archive_dir93/%f > /dev/null'
max_wal_senders = 3
synchronous_standby_names = '*'

# slave   pgsql.93_2/data/postgresql.conf:
data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_2/data'
listen_addresses = '*'
port = 6665
max_connections = 100
shared_buffers = 128MB
wal_level = hot_standby
synchronous_commit = on
checkpoint_segments = 3
max_wal_senders = 3
synchronous_standby_names = ''
hot_standby = on
wal_receiver_status_interval = 59

# pgsql.93_2/data/recovery.conf
primary_conninfo = 'host=127.0.0.1 port=6664 user=aardvark password=sekr1t
application_name=wal_receiver_01'
standby_mode = 'on'
restore_command = 'cp /home/aardvark/pg_stuff/archive_dir93/%f %p > /dev/null'
archive_cleanup_command = 'pg_archivecleanup -d 
/home/aardvark/pg_stuff/archive_dir93 %r'


Seeing that the same setup in 9.2 has pg_archivecleanup deleting files, it
would seem that some bug exists, but I haven't followed changes regarding
WAL too closely.

(My apologies if it's a config mistake on my part after all; but in that
case I cannot find it and would be thankful for a hint.)


Thanks,

Erik Rijkers







[HACKERS] Re: [PATCH 05/14] Add a new relmapper.c function RelationMapFilenodeToOid that acts as a reverse of RelationMapOidToFilenode

2012-11-18 Thread Andres Freund
On 2012-11-17 19:14:06 +0900, Michael Paquier wrote:
 On Fri, Nov 16, 2012 at 7:58 PM, Andres Freund and...@2ndquadrant.com wrote:
 
  Hi,
 
  On 2012-11-16 13:44:45 +0900, Michael Paquier wrote:
   This patch looks OK.
  
   I got 3 comments:
   1) Why changing the OID of pg_class_tblspc_relfilenode_index from 3171 to
   3455? It does not look necessary.
 
  It's a mismerge and should have happened in "Add a new RELFILENODE
  syscache to fetch a pg_class entry via (reltablespace, relfilenode)", but
  it seems I squashed the wrong two commits.
  I had originally used 3171, but that since got used up for lo_tell64...
 
   2) You should perhaps change the header of RelationMapFilenodeToOid so as
   not to mention it as the opposite operation of RelationMapOidToFilenode,
   but as an operation that looks for the OID of a relation based on its
   relfilenode. Both functions are opposite but independent.
 
  I described it as the opposite because RelationMapOidToFilenode is the
  relmapper's stated goal and RelationMapFilenodeToOid is just some
  side business.
 
   3) Both functions are doing similar operations. Could it be possible
   to wrap them in the same central function?
 
  I don't really see how without making both quite a bit more
  complicated. The amount of if's needed seems to be too large to me.
 
 OK, thanks for your answers.
 As this patch only adds a new function and is not that complicated, I
 think there is no problem in committing it. The only thing to remove is the
 diff in indexing.h. Could someone take care of that?

I pushed a rebase to the git repository that fixed it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-18 Thread Magnus Hagander
On Tue, Oct 23, 2012 at 5:08 PM, Magnus Hagander mag...@hagander.net wrote:

 On Oct 23, 2012 4:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Boszormenyi Zoltan wrote:

  Also, the check for conflict between -R and -x/-X is now removed.
 
  The documentation for option -R has changed to reflect this and
  there is no different error code 2 now: it would make the behaviour
  inconsistent between -Fp and -Ft.
 
  The PQconninfo patch is also attached but didn't change since the
   last mail.

 Magnus,

 This patch is all yours to handle.  I'm guessing nothing will happen
 until pgconf.eu is done and over, but hopefully you can share a few
 beers with Zoltan over the whole subject (and maybe with Peter about the
 PQconninfo stuff?)

 I'm not closing this just yet, but if you're not able to handle this
 soon, maybe it'd be better to punt it to the November commitfest.

 It's on my to do list for when I get back, but correct, won't get to it
 until after the conference.

I finally got around to looking at this patch now. Sorry about the way
too long delay.

A few thoughts:

First, on the libpq patch:

I'm not sure I like the for_replication flag to PQconninfo(). It seems
to me it's quite a limited API, and not very future-proof. What's to
say that an app would only be interested in filtering based on
for_replication? One idea might be to have a bitmap field there, and
assign *all* conninfo options to a category. We could then have
categories for NORMAL and REPLICATION. But we could also for example
have a category for PASSWORD (or similar), so that you could get with
and without those. Passing in a 32-bit integer would allow us to have
32 different categories, and we could then use a bitmask to pick
things out.

It might sound a bit like overengineering, but it's also an API, and
it's a PITA to change it in the future if more needs show up.
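
To make that concrete, a rough sketch of the bitmask idea (every name
below is hypothetical, not an existing libpq API):

/* Hypothetical category bits; each conninfo option would carry a mask. */
#define PG_CONNINFO_NORMAL      0x01    /* ordinary connection options */
#define PG_CONNINFO_REPLICATION 0x02    /* relevant for replication */
#define PG_CONNINFO_PASSWORD    0x04    /* password-like, sensitive */

/* Hypothetical call: replication-relevant options, passwords excluded. */
PQconninfoOption *opts =
    PQconninfo(conn, PG_CONNINFO_NORMAL | PG_CONNINFO_REPLICATION);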


Second, I wonder if we really need to add the code for requiressl=1,
or if we should just remove it. The spelling requiressl=1 was
deprecated back in 7.4 - which has obviously been out of support for a
long time now.


Third, in fillPGconn: if mapping has both conninfoValue and connvalue,
it does a free() on the old value in memberaddr, but if it doesn't, it
just overwrites memberaddr with strdup(). Shouldn't those two paths be
the same, meaning shouldn't the "if (*memberaddr) free(*memberaddr);"
check be outside the if block?

Fourth, I'm not sure I like the memberaddr term. It's not wrong, but
it threw me off a couple of times while reading it. It's not all that
important, and I'm not sure about another idea for it though - maybe
just connmember?


Then, about the pg_basebackup patch:

What's the reason for the () around 512 for TARCHUNKSZ?

We have a lot of hardcoded occurrences of 512 for tar in that file. We
should either keep using 512 as a constant, or change all those to
*also* use the #define. Right now, the code will obviously break if
you change the #define (for example, it compares to 512, but then uses
"hdrleft = TARCHUNKSZ - tarhdrsz;" to do the calculation).

The name choice of "basebackup" for the bool in ReceiveTarFile() is
unfortunate, since we use the term base backup to refer to the
complete thing, not just the main tablespace. Probably
"basetablespace" instead. And it should then be assigned at the top of
the function (to the result of PQgetisnull()), and used in the main
if() branch as well.

Should we really print the status message even if not in verbose mode?
We only print the base backup complete messages in verbose mode, for
example.

It seems wrong to free() recoveryconf in ReceiveTarFile(). It's
allocated globally at the beginning. While that loop should only be
called once (since only one tablespace can be the main one), it's a
confusing location for the free.

The whole tar writing part of the code needs a lot more comments. It's
entirely unclear what the code does there. Why does the recovery.conf
writing code need to be split up in multiple locations inside
ReceiveTarFile(), for example? It either needs to be simplified, or
explained why it can't be simplified in comments.

_tarCreateHeader() is really badly named, since it specifically
creates a tar header for recovery.conf only. Either that needs to be a
parameter, or it needs to have a name that indicates this is the only
thing it does. The former is probably better.

Much of the tar stuff is very similar (I haven't looked to see if it's
identical) to the stuff in backend/replication/basebackup.c. Perhaps
we should have a src/port/tarutil.c?

escape_string() - already exists as escape_quotes() in initdb, AFAICT.
We should either move it to src/port/, or at least copy it so it's
exactly the same.

CreateRecoveryConf() should just use PQExpBuffer (in libpq), I think -
that does away with a lot of code. We already use this from e.g.
pg_dump, so there's a precedent for using internal code from libpq in
frontends.



Again, my apologies for this review taking so long. I will try to be
more 

Re: [HACKERS] Do we need so many hint bits?

2012-11-18 Thread Jeff Davis
On Sun, 2012-11-18 at 15:19 +0100, Andres Freund wrote:
 On Sunday, November 18, 2012 03:07:01 AM Jeff Davis wrote:
  Process A (process that clears a VM bit for a data page):
1. Acquires exclusive lock on data buffer
2. Acquires exclusive lock on VM buffer
3. clears VM bit
4. Releases VM buffer lock
5. Releases data buffer lock
 
 Well, right, but this is a rather big difference. If VM pages get
 unconditionally locked all the time, we will have a huge source of new
 contention, as they are shared between so many heap pages.

No, that is only for the process *clearing* the bit, and this already
happens. I am not planning on introducing any new locks, aside from the
buffer header lock when acquiring a pin. And I plan to keep those pins
for long enough that those don't matter, either.

Regards,
Jeff Davis





Re: [HACKERS] [PATCH] Make pg_basebackup configure and start standby [Review]

2012-11-18 Thread Boszormenyi Zoltan

Hi,

On 2012-11-18 17:20, Magnus Hagander wrote:

On Tue, Oct 23, 2012 at 5:08 PM, Magnus Hagander mag...@hagander.net wrote:

On Oct 23, 2012 4:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

Boszormenyi Zoltan wrote:


Also, the check for conflict between -R and -x/-X is now removed.

The documentation for option -R has changed to reflect this and
there is no different error code 2 now: it would make the behaviour
inconsistent between -Fp and -Ft.


The PQconninfo patch is also attached but didn't change since the
last mail.

Magnus,

This patch is all yours to handle.  I'm guessing nothing will happen
until pgconf.eu is done and over, but hopefully you can share a few
beers with Zoltan over the whole subject (and maybe with Peter about the
PQconninfo stuff?)

I'm not closing this just yet, but if you're not able to handle this
soon, maybe it'd be better to punt it to the November commitfest.

It's on my to do list for when I get back, but correct, won't get to it
until after the conference.

I finally got around to looking at this patch now. Sorry about the way
too long delay.


No problem, thanks for looking at it.


A few thoughts:

First, on the libpq patch:

I'm not sure I like the for_replication flag to PQconninfo(). It seems
to me it's quite a limited API, and not very future-proof. What's to
say that an app would only be interested in filtering based on
for_replication? One idea might be to have a bitmap field there, and
assign *all* conninfo options to a category. We could then have
categories for NORMAL and REPLICATION. But we could also for example
have a category for PASSWORD (or similar), so that you could get with
and without those. Passing in a 32-bit integer would allow us to have
32 different categories, and we could then use a bitmask to pick
things out.

It might sound a bit like overengineering, but it's also an API, and
it's a PITA to change it in the future if more needs show up.


You are right, I will change this accordingly.


Second, I wonder if we really need to add the code for requiressl=1,
or if we should just remove it. The spelling requiressl=1 was
deprecated back in 7.4 - which has obviously been out of support for a
long time now.


This needs opinions from more people; I am not the one to decide it.
The code would definitely be cleaner without processing this extra
non-1:1 mapping.


Third, in fillPGconn: if mapping has both conninfoValue and connvalue,
it does a free() on the old value in memberaddr, but if it doesn't, it
just overwrites memberaddr with strdup(). Shouldn't those two paths be
the same, meaning shouldn't the "if (*memberaddr) free(*memberaddr);"
check be outside the if block?


Yes, and set it to NULL too. Otherwise there might be a case where
the free() leaves a stale pointer value if the extra mapping fails
the strcmp() check. This is all unnecessary if the extra mapping
for requiressl=1 is removed; the code would be straightforward.
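
Something like this, then (a sketch using the variable names from the
patch under review, so treat them as assumptions):

/*
 * Always free any old value first, and clear the pointer so a skipped
 * or failed mapping cannot leave a stale pointer behind.
 */
if (*memberaddr)
{
    free(*memberaddr);
    *memberaddr = NULL;
}
if (connvalue)
    *memberaddr = strdup(connvalue);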


Fourth, I'm not sure I like the memberaddr term. It's not wrong, but
it threw me off a couple of times while reading it. It's not all that
important, and I'm not sure about another idea for it though - maybe
just connmember?


I am not attached to this variable name; I will change it.


Then, about the pg_basebackup patch:

What's the reason for the () around 512 for TARCHUNKSZ?


It's simply a habit, so as not to forget it for more complex macros.


We have a lot of hardcoded occurrences of 512 for tar in that file. We
should either keep using 512 as a constant, or change all those to
*also* use the #define. Right now, the code will obviously break if
you change the #define (for example, it compares to 512, but then uses
"hdrleft = TARCHUNKSZ - tarhdrsz;" to do the calculation).


Yes, I left five occurrences of the hardcoded value 512, because
I (maybe erroneously) distinguished between a file header
and file chunks inside a tar file; they are all 512 bytes.

Is it okay to change every hardcoded 512 to TARCHUNKSZ,
perhaps adding a comment to the #define that it must never
be modified?
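
Perhaps something like this (the comment wording is only a suggestion):

/*
 * Size of a tar block (both the file header and each data chunk).
 * This is fixed by the tar format and must never be changed.
 */
#define TARCHUNKSZ  512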


The name choice of "basebackup" for the bool in ReceiveTarFile() is
unfortunate, since we use the term base backup to refer to the
complete thing, not just the main tablespace. Probably
"basetablespace" instead. And it should then be assigned at the top of
the function (to the result of PQgetisnull()), and used in the main
if() branch as well.


Will change it.


Should we really print the status message even if not in verbose mode?
We only print the base backup complete messages in verbose mode, for
example.


OK.


It seems wrong to free() recoveryconf in ReceiveTarFile(). It's
allocated globally at the beginning. While that loop should only be
called once (since only one tablespace can be the main one), it's a
confusing location for the free.

The whole tar writing part of the code needs a lot more comments. It's
entirely unclear what the code does there. Why does the recovery.conf
writing code need to be split up in multiple locations inside

[HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Tom Lane
The discussion of bug #7670 showed that what's happening there is that
if you specify a log_rotation_age of more than 25 days (2^31 msec),
WaitLatch will sometimes be passed a timeout of more than 2^31 msec,
leading to unportable behavior.  At least some kernels will return
EINVAL for that, and it's not very clear what will happen on others.

After some thought about this, I think the best thing to do is to tweak
syslogger.c to clamp the requested sleep to INT_MAX msec.  The fact
that a couple of people have tried to set log_rotation_age to 30 days or
more suggests that it's useful, so reducing the GUC's upper limit isn't
a desirable fix.  This should be an easy change since the logic in that
loop will already behave correctly if it's woken up before the requested
rotation time.
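
In other words, something along these lines in the syslogger loop (a
sketch; the variable names and types here are placeholders, not the
actual code):

/*
 * Clamp the computed sleep so WaitLatch() never sees more than INT_MAX
 * msec.  If we wake up early, the loop re-evaluates and sleeps again.
 */
long    cur_timeout = next_rotation_time - now_msec;    /* may exceed 2^31 */

if (cur_timeout > (long) INT_MAX)
    cur_timeout = INT_MAX;
(void) WaitLatch(latch, WL_LATCH_SET | WL_TIMEOUT, cur_timeout);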

I went looking for other timeout-related GUC variables that might have
overoptimistic upper limits, and found these cases:

PostAuthDelay is converted to microseconds, and therefore had better
have an upper bound of INT_MAX/1000000 seconds.  (Larger values would
work on machines with 64-bit long, but it doesn't seem worth
complicating the code to allow for that.)

contrib/auth_delay's auth_delay_milliseconds is likewise converted
to microseconds, so its max had better be INT_MAX/1000 msec.  (Likewise,
I don't see any value in supporting larger limits.)

XLogArchiveTimeout, although it's never multiplied by anything,
is compared to 
((int) (now - last_xlog_switch_time))
which means that its nominal maximum of INT_MAX is problematic:
if you actually set it to that it would be quite possible for
the system to miss the timeout occurrence if it didn't chance
to service the timeout signal in exactly the last second of the
interval.  (After that second, the above-quoted expression would
overflow and lead to the wrong comparison result.)  So it seems
to me we'd better back it off some.  I'm inclined to propose
dropping it to INT_MAX/2 seconds.
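
To spell out that failure mode (types assumed; the pg_time_t arithmetic
is 64-bit, and the cast is where it goes wrong):

/*
 * With XLogArchiveTimeout = INT_MAX, the comparison can only succeed
 * during the one second in which the elapsed time is exactly INT_MAX;
 * one second later the cast wraps negative and it never fires again.
 */
pg_time_t   now = last_xlog_switch_time + (pg_time_t) INT_MAX + 1;

if ((int) (now - last_xlog_switch_time) >= XLogArchiveTimeout)
    ;   /* never reached: the cast has overflowed to a negative value */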

Comments, objections?

regards, tom lane




Re: [HACKERS] Enabling Checksums

2012-11-18 Thread Jeff Davis
On Wed, 2012-11-14 at 17:40 -0800, Jeff Davis wrote:
 I'll do another pass to make sure I update all of the comments, and try
 to self review it.

Updated patches attached (the TLI patch wasn't changed though, only the
main checksums patch).

Changes:
  * A lot of cleanup
  * More testing
  * Added check during pg_upgrade to make sure the checksum settings
    match.
  * Fixed output of pg_resetxlog to include information about checksums.
  * fixed contrib/pageinspect, and included upgrade script for it
  * removed code to skip the page hole during the checksum calculation.
    We can reconsider if we think performance will be a real problem.
  * I added the header bits back in, because we will need them when we
    want to support enabling/disabling checksums when the system is online.

I also did quite a bit more testing, although it could use some
performance testing. I'll also probably do another review pass myself,
but I think it's in good shape.

Also, if performance of the checksum calculation itself turns out to be
a problem, we might consider modifying the algorithm to do multiple
bytes at a time.

One purpose of this patch is to establish the on-disk format for
checksums, so we shouldn't defer decisions that would affect that (e.g.
doing checksum calculation in larger chunks, ignoring the page hole, or
using a different scheme for the bits in the header).

Regards,
Jeff Davis


replace-tli-with-checksums-20121118.patch.gz


checksums-20121118.patch.gz



Re: [HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Andres Freund
On 2012-11-18 14:57:51 -0500, Tom Lane wrote:
 The discussion of bug #7670 showed that what's happening there is that
 if you specify a log_rotation_age of more than 25 days (2^31 msec),
 WaitLatch will sometimes be passed a timeout of more than 2^31 msec,
 leading to unportable behavior.  At least some kernels will return
 EINVAL for that, and it's not very clear what will happen on others.

 After some thought about this, I think the best thing to do is to tweak
 syslogger.c to clamp the requested sleep to INT_MAX msec.  The fact
 that a couple of people have tried to set log_rotation_age to 30 days or
 more suggests that it's useful, so reducing the GUC's upper limit isn't
 a desirable fix.  This should be an easy change since the logic in that
 loop will already behave correctly if it's woken up before the requested
 rotation time.

Cool. Agreed.

 I went looking for other timeout-related GUC variables that might have
 overoptimistic upper limits, and found these cases:

 [sensible stuff]

Lowering the maximum of those seems sensible to me. Anybody using that
large value for those already had a problem even if it worked.

I think at least wal_sender_timeout and wal_receiver_timeout are also
problematic.

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I think at least wal_sender_timeout and wal_receiver_timeout are also
 problematic.

I looked at those and didn't see a problem --- what are you worried
about exactly?

regards, tom lane




Re: [HACKERS] Avoiding overflow in timeout-related calculations

2012-11-18 Thread Andres Freund
On 2012-11-18 15:21:34 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  I think at least wal_sender_timeout and wal_receiver_timeout are also
  problematic.
 
 I looked at those and didn't see a problem --- what are you worried
 about exactly?

Forget it, too hungry to read the code properly...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] WIP: store additional info in GIN index

2012-11-18 Thread Alexander Korotkov
Hackers,

The attached patch enables GIN to store additional information with item
pointers in posting lists and trees.
Such additional information could be positions of words, positions of
trigrams, lengths of arrays, and so on.
This is the first and largest patch of a series of GIN improvements which
was presented at PGConf.EU:
http://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf

The patch modifies the GIN interface as follows:
1) Two arguments are added to extractValue:
Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent:
Datum addInfo[], bool addInfoIsNull[]
3) A new method, config, is introduced which returns the datatype OID of
additional information (by analogy with the SP-GiST config method).

The patch completely changes the storage in posting lists and leaf pages of
posting trees. It uses varbyte encoding for BlockNumber and OffsetNumber.
BlockNumbers are stored incrementally within a page. Additionally, one bit
of the OffsetNumber is reserved for the additional-information NULL flag.
To be able to find a position in a leaf data page quickly, the patch
introduces a small index at the end of the page.
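
For reference, a generic varbyte (base-128) encoder of the kind meant
here; this sketches the general technique only, and is not the patch's
actual encoding (which also folds in the OffsetNumber and the NULL-flag
bit):

#include <stdint.h>

/*
 * Encode one unsigned delta with 7 payload bits per byte; the high bit
 * marks continuation.  Small deltas between consecutive block numbers
 * therefore take a single byte.
 */
static int
encode_varbyte(uint32_t delta, unsigned char *out)
{
    int         len = 0;

    while (delta >= 0x80)
    {
        out[len++] = (unsigned char) (delta | 0x80);
        delta >>= 7;
    }
    out[len++] = (unsigned char) delta;
    return len;
}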

--
With best regards,
Alexander Korotkov.


ginaddinfo.1.patch.gz



Re: [HACKERS] autovacuum stress-testing our system

2012-11-18 Thread Tomas Vondra
Hi!

On 26.9.2012 19:18, Jeff Janes wrote:
 On Wed, Sep 26, 2012 at 9:29 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Excerpts from Euler Taveira's message of Wed Sep 26 11:53:27 -0300 2012:
 On 26-09-2012 09:43, Tomas Vondra wrote:
 5) splitting the single stat file into multiple pieces - e.g. per database,
 written separately, so that the autovacuum workers don't need to read all
 the data even for databases that don't need to be vacuumed. This might be
 combined with (4).

 IMHO that's the definitive solution. It would be one file per database plus
 a global one. That way, the check would only read the global.stat and process
 those databases that were modified. Also, an in-memory map could store that
 information to speed up the checks.

 +1

 That would help for the case of hundreds of databases, but how much
 does it help for lots of tables in a single database?
 
 It doesn't help that case, but that case doesn't need much help.  If
 you have N statistics-kept objects in total spread over M databases,
 of which T objects need vacuuming per naptime, the stats file traffic
 is proportional to N*(M+T).  If T is low, then there is generally is
 no problem if M is also low.  Or at least, the problem is much smaller
 than when M is high for a fixed value of N.

I've done some initial hacking on splitting the stat file into multiple
smaller pieces over the weekend, and it seems promising (at least with
respect to the issues we're having).

See the patch attached, but be aware that this is a very early WIP (or
rather a proof of concept), so it has many rough edges (read: sloppy
coding). I haven't even added it to the commitfest yet ...

The two main changes are these:

(1) The stats file is split into a common db file, containing all the
DB entries, and per-database files with tables/functions. The common
file is still called pgstat.stat; the per-db files have the
database OID appended, so for example pgstat.stat.12345 etc.

This was a trivial hack to the pgstat_read_statsfile/pgstat_write_statsfile
functions, introducing two new functions:

   pgstat_read_db_statsfile
   pgstat_write_db_statsfile

that do the trick of reading/writing stat file for one database.

(2) The pgstat_read_statsfile has an additional parameter, onlydbs, that
says you don't need table/func stats - just the list of db
entries. This is used for the autovacuum launcher, which does not need
to read the table/func stats (if I'm reading the code in autovacuum.c
correctly - it seems to be working as expected).

So what are the benefits?

(a) When a launcher asks for info about databases, something like this
is called in the end:

   pgstat_read_db_statsfile(InvalidOid, false, true)

which means all databases (InvalidOid) and only db info (true). So
it reads only the one common file with db entries, not the
table/func stats.

(b) When a worker asks for stats for a given DB, something like this is
called in the end:

   pgstat_read_db_statsfile(MyDatabaseId, false, false)

which reads only the common stats file (with db entries) and only
one file for the one database.

With the current implementation (a single pgstat.stat file), all
the data had to be read (and silently skipped) in both cases.
That's a lot of CPU time, and we're seeing ~60% of CPU spent on
doing just this (writing/reading a huge statsfile).

So with a lot of databases/objects, this pgstat.stat split saves
us a lot of CPU ...

(c) This should lower the space requirements too - with a single file,
you actually need at least 2x the disk space (or RAM, if you're
using tmpfs as we are), because you need to keep two versions of
the file at the same time (pgstat.stat and pgstat.tmp).

Thanks to this split you only need additional space for a copy of
the largest piece (with some reasonable safety reserve).


Well, it's a very early patch, so there are rough edges too:

(a) It does not solve the many-schema scenario at all - that'll need
a completely new approach I guess :-(

(b) It does not solve the writing part at all - the current code uses a
single timestamp (last_statwrite) to decide if a new file needs to
be written.

That clearly is not enough for multiple files - there should be one
timestamp for each database/file. I'm thinking about how to solve
this and how to integrate it with pgstat_send_inquiry etc.

One way might be adding the timestamp(s) into PgStat_StatDBEntry
and the other one is using an array of inquiries for each database.

And yet another one I'm thinking about is using a fixed-length
array of timestamps (e.g. 256), indexed by mod(dboid,256). That
would mean stats for all databases with the same mod(oid,256) would
be written at the same time. Seems like over-engineering, though.

(c) I'm a bit worried about the number of files - right now there's one

Re: [HACKERS] pg_dump --split patch

2012-11-18 Thread Marko Tiikkaja

Hi,

On 16/11/2012 15:52, Dimitri Fontaine wrote:

Marko Tiikkaja pgm...@joh.to writes:

The general output scheme looks like this:
schemaname/OBJECT_TYPES/object_name.sql,


I like this feature; I actually did have to code it myself in the past,
and several other people did so, so we already have at least 3 copies of
`getddl` variants around. I really think this feature should be shipped
by default with PostgreSQL.

I don't much care for the all-uppercase formatting of the object type
directories in your patch, though.


*shrug*  I have no real preference one way or the other.


Overloaded functions are dumped into the same file.  Object names are
encoded into the POSIX Portable Filename Character Set ([a-z0-9._-]) by
replacing any characters outside that set with an underscore.


What happens if you have a table "foo" and another table "FoO"?


They would go to the same file.  If you think there are technical issues 
behind that decision (e.g. the dump would not restore), I would like to 
hear an example case.
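
For what it's worth, the encoding rule described above reads as roughly
this sketch (a hypothetical helper, not the patch's actual function; it
assumes names are case-folded first, which is what makes foo and FoO
land in the same file):

#include <ctype.h>

/*
 * Map an object name into the POSIX portable filename set [a-z0-9._-]:
 * lower-case each character and replace anything outside the set with
 * an underscore.
 */
static void
encode_filename(char *name)
{
    char       *p;

    for (p = name; *p; p++)
    {
        char        c = (char) tolower((unsigned char) *p);

        if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
            c == '.' || c == '_' || c == '-')
            *p = c;
        else
            *p = '_';
    }
}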


On the other hand, some people might find it preferable to have them in
different files (for example foo, foo.1, foo.2, etc.).  Or some might
prefer some other naming scheme.  One of the problems with this patch is
exactly that people prefer different things, and providing switches for
all of the different options people come up with would mean a lot of
switches. :-(



Restoring the dump is supported through an index.sql file containing
statements which include (through \i) the actual object files in the dump
directory.


I think we should be using \ir now that we have that.


Good point, will have to get that fixed.


Any thoughts?  Objections on the idea or the implementation?


As far as the implementation goes, someone with more experience on the
Archiver Handles should have a look. To me, it looks like you are trying
to shoehorn your feature in the current API and that doesn't feel good.


It feels a bit icky to me too, but I didn't feel comfortable with 
putting in a lot of work to refactor the API because of how 
controversial this feature is.



The holy grail here that we've been speaking about in the past would be
to separate out tooling and formats so that we have:

pg_dump | pg_restore
pg_export | psql

In that case we would almost certainly need libpgdump to share the code,
and we maybe could implement a binary output option for pg_dump too
(yeah, last time it was proposed we ended up with bytea_output = 'hex').


While I agree that this idea - when implemented - would be nicer in 
practically every way, I'm not sure I want to volunteer to do all the 
necessary work.



That libpgdump idea basically means we won't have the --split feature in
9.3, and that's really bad, as we already are some releases late on
delivering that, in my opinion.

Maybe the pg_export and pg_dump tools could share code by just #include
magic rather than a full-blown lib in a first incarnation?


That's one idea..


Regards,
Marko Tiikkaja




Re: [HACKERS] autovacuum stress-testing our system

2012-11-18 Thread Tomas Vondra
On 26.9.2012 18:29, Tom Lane wrote:
 What seems to me like it could help more is fixing things so that the
 autovac launcher needn't even launch a child process for databases that
 haven't had any updates lately.  I'm not sure how to do that, but it
 probably involves getting the stats collector to produce some kind of
 summary file.

Couldn't we use the PgStat_StatDBEntry for this? By splitting the
pgstat.stat file into multiple pieces (see my other post in this thread)
there's a file with StatDBEntry items only, so maybe it could be used as
the summary file ...

I've been thinking about this:

 (a) add a needs_autovacuuming flag to PgStat_(TableEntry|StatDBEntry)

 (b) when table stats are updated, run a quick check to decide whether
     the table needs to be processed by autovacuum (vacuumed or
     analyzed), and if yes then set needs_autovacuuming=true, both
     for the table and the database (see the sketch below)
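
The quick check in (b) could presumably reuse the usual autovacuum
trigger formula; a sketch (the function itself is illustrative, while the
two GUC variables are the existing ones from autovacuum.c):

/*
 * True if the table's dead tuples exceed the standard autovacuum
 * trigger: base threshold + scale factor * reltuples.
 */
static bool
table_needs_autovacuum(float4 dead_tuples, float4 reltuples)
{
    return dead_tuples >
        autovacuum_vac_thresh + autovacuum_vac_scale * reltuples;
}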

The worker may read the DB entries from the file and act only on those
that need to be processed (those with needs_autovacuuming=true).

Maybe the DB-level field might be a counter of tables that need to be
processed, and the autovacuum daemon might act on those first? Although
the simpler the better I guess.

Or did you mean something else?

regards
Tomas




[HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-18 Thread Xi Wang
The INT_MIN / -1 crash problem was partially addressed in 2006, in
commit 9fc6f4e1ae107f44807c4906105e1f7eb052ecb1:

http://archives.postgresql.org/pgsql-patches/2006-06/msg00102.php

However, the fix is incomplete and incorrect for some cases.

64-bit crash
============

Below is an example that crashes PostgreSQL on Windows, using the
64-bit binary from http://www.postgresql.org/download/windows/.

SELECT ((-9223372036854775808)::int8) % (-1);

Note that the previous discussion concluded that int8 (64-bit)
division didn't crash, which is incorrect.

http://archives.postgresql.org/pgsql-patches/2006-06/msg00104.php

My guess is that the previous test was carried out on a 32-bit
Windows, where there's no 64-bit division instruction.  In that
case, the generated code calls a runtime function (e.g., lldiv),
which doesn't trap.  However, on a 64-bit system, the compiler
generates the idivq instruction, which leads to a crash.

Note that the problem is not limited to division.  The following
multiplication crashes as well:

SELECT ((-9223372036854775808)::int8) * ((-1)::int8);

The reason is that the multiplication overflow check uses a division.
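
That is, the existing check has roughly this shape (a simplified sketch,
not the exact source): with arg1 = INT64_MIN and arg2 = -1 the wrapped
product is INT64_MIN again, so the verifying division is itself
INT64_MIN / -1 and traps before the check can report anything.

    int64       result = arg1 * arg2;  /* wraps for INT64_MIN * -1 */

    /* Postcondition check: verify the product by dividing it back. */
    if (arg2 != 0 && result / arg2 != arg1)     /* this division crashes */
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("bigint out of range")));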

32-bit crash
============

The previous fix doesn't fix all possible crashes, even on a 32-bit
Windows.  Below is an example to crash a 32-bit PostgreSQL:

SELECT ((-2147483648)::int4) % ((-1)::int2);

Portability
===========

The previous fix uses #ifdef WIN32 ... #endif, which is not portable,
as noted below:

http://archives.postgresql.org/pgsql-patches/2006-06/msg00103.php

Using a SIGFPE handler is also dangerous (e.g., causing an infinite
loop sometimes):

https://www.securecoding.cert.org/confluence/display/seccode/SIG35-C.+Do+not+return+from+SIGSEGV,+SIGILL,+or+SIGFPE+signal+handlers

Strictly speaking, using postcondition checking to detect signed
division overflows (actually, all signed integer overflows) violates
the C language standard, because signed integer overflow is undefined
behavior in C and we cannot first compute the result and then check
for overflow.  Compilers can do a lot of funny and crazy things
(e.g., generating traps or even optimizing away overflow checks).

BTW, PostgreSQL currently uses gcc's workaround option -fwrapv to
disable offending optimizations based on integer overflows.  The
problems are: (1) it doesn't always work (e.g., for division), and
(2) many other C compilers do not even support this workaround
option and can perform offending optimizations.

Patching
========

Below is a patch that fixes division crashes.  It removes #ifdef WIN32
and tries to use portable checks.  I'll send more (e.g., for fixing
multiplication crashes) if this looks good.

I understand that the existing integer overflow checks tried to
avoid dependency on constants like INT64_MIN.  But I'm not sure how
to perform simpler and portable overflow checks without using such
constants.

Also, I could use SHRT_MIN rather than introducing INT16_MIN; I just
feel that using INT16_MIN with int16 is clearer and less confusing.

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 9f752ed..d7867cb 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -732,30 +732,18 @@ int4div(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
 
-#ifdef WIN32
-
    /*
-    * Win32 doesn't throw a catchable exception for SELECT -2147483648 /
-    * (-1); -- INT_MIN
+    * Overflow check.  The only possible overflow case is for arg1 =
+    * INT32_MIN, arg2 = -1, where the correct result is -INT32_MIN, which
+    * can't be represented on a two's-complement machine.
     */
-   if (arg2 == -1 && arg1 == INT_MIN)
+   if (arg1 == INT32_MIN && arg2 == -1)
        ereport(ERROR,
                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
                 errmsg("integer out of range")));
-#endif

    result = arg1 / arg2;

-   /*
-    * Overflow check.  The only possible overflow case is for arg1 = INT_MIN,
-    * arg2 = -1, where the correct result is -INT_MIN, which can't be
-    * represented on a two's-complement machine.  Most machines produce
-    * INT_MIN but it seems some produce zero.
-    */
-   if (arg2 == -1 && arg1 < 0 && result <= 0)
-       ereport(ERROR,
-               (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
-                errmsg("integer out of range")));
    PG_RETURN_INT32(result);
 }
 
@@ -877,18 +865,17 @@ int2div(PG_FUNCTION_ARGS)
PG_RETURN_NULL();
}
 
-   result = arg1 / arg2;
-
-   /*
-    * Overflow check.  The only possible overflow case is for arg1 =
-    * SHRT_MIN, arg2 = -1, where the correct result is -SHRT_MIN, which can't
-    * be represented on a two's-complement machine.  Most machines produce
-    * SHRT_MIN but it seems some produce zero.
+   /* 

Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-18 Thread Tom Lane
Xi Wang xi.w...@gmail.com writes:
 [ patch adding a bunch of explicit INT_MIN/MAX constants ]

I was against this style of coding before, and I still am.
For one thing, it's just about certain to introduce conflicts
against system headers.

regards, tom lane




Re: [HACKERS] 9.3 pg_archivecleanup broken?

2012-11-18 Thread Fujii Masao
On Mon, Nov 19, 2012 at 12:43 AM, Erik Rijkers e...@xs4all.nl wrote:
 (In a test setup) I can't get pg_archivecleanup to remove WAL files in
 9.3devel. (A very similar setup in 9.2 works fine.)

 In 9.3 pg_archivecleanup just keeps repeating lines like:

 pg_archivecleanup: keep WAL file /home/aardvark/pg_stuff/archive_dir93/ and later

 (and does not delete any files.)

 Configuration:

 # master  pgsql.93_1/data/postgresql.conf:
 data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_1/data'
 listen_addresses = '*'
 max_connections = 100
 shared_buffers = 128MB
 wal_level = hot_standby
 synchronous_commit = on
 checkpoint_segments = 3
 archive_mode = on
 archive_command = 'cp %p /home/aardvark/pg_stuff/archive_dir93/%f > /dev/null'
 max_wal_senders = 3
 synchronous_standby_names = '*'

 # slave   pgsql.93_2/data/postgresql.conf:
 data_directory = '/home/aardvark/pg_stuff/pg_installations/pgsql.93_2/data'
 listen_addresses = '*'
 port = 6665
 max_connections = 100
 shared_buffers = 128MB
 wal_level = hot_standby
 synchronous_commit = on
 checkpoint_segments = 3
 max_wal_senders = 3
 synchronous_standby_names = ''
 hot_standby = on
 wal_receiver_status_interval = 59

 # pgsql.93_2/data/recovery.conf
 primary_conninfo = 'host=127.0.0.1 port=6664 user=aardvark password=sekr1t
 application_name=wal_receiver_01'
 standby_mode = 'on'
 restore_command = 'cp /home/aardvark/pg_stuff/archive_dir93/%f %p > /dev/null'
 archive_cleanup_command = 'pg_archivecleanup -d 
 /home/aardvark/pg_stuff/archive_dir93 %r'


 Seeing that the same setup in 9.2 has pg_archivecleanup deleting files, it
 would seem that some bug exists, but I haven't followed changes regarding
 WAL too closely.

Thanks for the report! I was able to reproduce this problem.

What's broken is not pg_archivecleanup itself but %r in archive_cleanup_command,
which is replaced by the name of the file containing the last valid
restart point.
In 9.3dev, %r is always wrongly replaced by an invalid WAL filename
(i.e., an empty string).

This bug is derived from commit d5497b95f3ca2fc50c6eef46d3394ab6e6855956.
This commit changed ExecuteRecoveryCommand() so that it calculates the
last valid restart file by using GetOldestRestartPoint(), even though
GetOldestRestartPoint() only works in the startup process and only while
WAL replay is in progress (i.e., InRedo = true).
In archive_cleanup_command, ExecuteRecoveryCommand() is executed by the
checkpointer process, so the problem happened.

I found recovery_end_command also has the same bug, because it calls
ExecuteRecoveryCommand() after WAL replay is completed.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Parser - Query Analyser

2012-11-18 Thread Craig Ringer
On 11/18/2012 09:57 PM, Michael Giannakopoulos wrote:
 Hi guys,

 Thanks for your answers. Yes, what I meant is to create a function
 that takes as input rows of a specific relation, does something, and
 returns as output rows with different attributes. I am
 experimenting right now with the 'CREATE TYPE' and 'CREATE FUNCTION'
 commands in order to see what I can get out of them!
I have replied on pgsql-general.

Please do not top-post, and reply to the list not directly to me.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services





Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-18 Thread Jeff Janes
On Sun, Oct 21, 2012 at 12:59 AM, Amit kapila amit.kap...@huawei.com wrote:
 On Saturday, October 20, 2012 11:03 PM Jeff Janes wrote:

Run the modes in reciprocating order?
 Sorry, I didn't understand this. What do you mean by modes in reciprocating
 order?

Sorry for the long delay.  In your scripts, it looks like you always
run the unpatched first, and then the patched second.

By reciprocating, I mean to run them in the reverse order, or in random order.

Also, for the select only transactions, I think that 20 minutes is
much longer than necessary.  I'd rather see many more runs, each one
being shorter.

Because you can't restart the server without wiping out the
shared_buffers, what I would do is make a test patch which introduces
a new guc.c setting which allows the behavior to be turned on and off
with a SIGHUP (pg_ctl reload).
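
A sketch of such a test-only knob (as a custom GUC; all names here are
invented, only DefineCustomBoolVariable itself is the real API):

/* In the patched file: guard the new behavior with a flag. */
static bool optimize_freelist = true;

/* At startup, register it so pg_ctl reload (SIGHUP) can flip it. */
DefineCustomBoolVariable("test.optimize_freelist",
                         "Enables the patched buffer eviction behavior.",
                         NULL,
                         &optimize_freelist,
                         true,
                         PGC_SIGHUP,
                         0,
                         NULL,      /* check_hook */
                         NULL,      /* assign_hook */
                         NULL);     /* show_hook */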



I haven't been able to detect any reliable difference in performance
with this patch.  I've been testing with 150 scale factor with 4GB of
ram and 4 cores, over a variety of shared_buffers and concurrencies.

 I think the main reason for this is that when shared buffers are small,
 there is no performance gain; I observed the same when I ran this test
 with shared_buffers=2G, where there was no performance gain either.
 Please see the results for shared_buffers=2G in the mail below:
 http://archives.postgresql.org/pgsql-hackers/2012-09/msg00422.php

True, but I think that testing with shared_buffers=2G when RAM is 4GB
(and the pgbench scale is also lower) should behave differently than doing
so when RAM is 24 GB.


 The reason I can think of is that when shared buffers are small, the clock
 sweep runs very fast and there is no bottleneck.
 Only when shared buffers increase above some threshold does it spend
 considerable time in the clock sweep.

I am rather skeptical of this.  When the working set doesn't fit in
memory under a select-only workload, then about half the buffers will
be evictable at any given time, half will have usagecount=1, and a
handful will have usagecount=4 (index meta, root and branch blocks).  This
will be the case over a wide range of shared_buffers, as long as it is
big enough to hold all index branch blocks but not big enough to hold
everything.  Given this state of affairs, the average clock sweep
should be about 2, regardless of the exact size of shared_buffers.

The one wrinkle I could think of is if all the usagecount=1 buffers
are grouped into a contiguous chunk, and all the usagecount=0 are in
another chunk.  The average would still be 2, but the average would be
made up of N/2 runs of length 1, followed by one run of length N/2.
Now if 1 process is stuck in the N/2 stretch and all other processes
are waiting on that, maybe that somehow escalates the waits so that
they are larger when N is larger, but I still don't see how the math
works on that.

Are you working on this just because it was on the ToDo List, or
because you have actually run into a problem with it?  I've never seen
freelist lock contention be a problem on machines with fewer than 8
CPUs, but both of us are testing on smaller machines.  I think we
really need to test this on something bigger.

Cheers,

Jeff




Re: [HACKERS] logical changeset generation v3

2012-11-18 Thread Michael Paquier
On Fri, Nov 16, 2012 at 5:16 PM, Andrea Suisani sick...@opinioni.net wrote:

 On 16/11/2012 05:34, Michael Paquier wrote:

  Do you have a git repository or something where all the 14 patches are
 applied? I would like to test the feature globally.
 Sorry, I recall that you put a link somewhere but I cannot remember which
 email it was in...


 http://archives.postgresql.org/pgsql-hackers/2012-11/msg00686.php

Thanks Andrea.
I am pretty sure I will be able to provide some feedback by Friday.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [WIP PATCH] for Performance Improvement in Buffer Management

2012-11-18 Thread Jeff Janes
On Mon, Oct 22, 2012 at 10:51 AM, Amit kapila amit.kap...@huawei.com wrote:


 Today I have again collected the data for the configuration shared_buffers
 = 7G, along with vmstat. The data and vmstat information (bi) are attached
 with this mail. It is observed from the vmstat info that I/O is happening
 in both cases; however, after running for a long time, the I/O is
 comparatively less with the new patch.

What I see in the vmstat report is that it takes 5.5 runs to get
really good and warmed up, and so it crawls for the first 5.5
benchmarks and then flies for the last 0.5 benchmark.  The way you
have your runs ordered, that last 0.5 of a benchmark is for the
patched version, and this drives up the average tps for the patched
case.

Also, there is no theoretical reason to think that your patch would
decrease the amount of IO needed (in fact, by invalidating buffers
early, it could be expected to increase the amount of IO).  So this
also argues that the increase in performance is caused by the decrease
in IO, but that the patch isn't causing that decrease; it merely
benefits from it due to an accident of timing.

Cheers,

Jeff




Re: [HACKERS] [WIP] pg_ping utility

2012-11-18 Thread Michael Paquier
On Sat, Nov 17, 2012 at 2:48 AM, Phil Sorber p...@omniti.com wrote:

 On Thu, Nov 15, 2012 at 10:55 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Fri, Nov 16, 2012 at 12:34 PM, Phil Sorber p...@omniti.com wrote:
  On Thu, Nov 15, 2012 at 9:23 PM, Michael Paquier
  michael.paqu...@gmail.com wrote:
   3) Having an output close to what ping actually does would also be nice,
   the current output like Accepting/Rejecting Connections are not that
 
  Could you be more specific? Are you saying you don't want to see
  accepting/rejecting info output?
 
  OK sorry.
 
  I meant something like that for an accessible server:
  $ pg_ping -c 3 -h server.com
  PING server.com (192.168.1.3)
  accept from 192.168.1.3: icmp_seq=0 time=0.241 ms
  accept from 192.168.1.3: icmp_seq=0 time=0.240 ms
  accept from 192.168.1.3: icmp_seq=0 time=0.242 ms
 
  Like that for a rejected connection:
  reject from 192.168.1.3: icmp_seq=0 time=0.241 ms
 
  Like that for a timeout:
  timeout from 192.168.1.3: icmp_seq=0
  Then print 1 line for each ping taken to stdout.

 How does icmp_seq fit into this? Or was that an oversight?

Yes, sorry it doesn't fit in this model. Please forget about it.


 Also, in standard ping if you don't pass -c it will continue to loop
 until interrupted. Would you suggest that pg_ping mimic that, or that
 we add an additional flag for that behavior?

By treating pg_ping as a clone of ping, yes, it would mean that we make it
loop indefinitely if no -c flag is given.


 FWIW, I would use 'watch' with the existing output for cases that I
 would need something like that.

watch allows you to run a program repeatedly at a given interval. I am not
sure this covers what pinging a server does.
When pinging a server, what you are looking for is not only the
connectivity to it but also the latency you have with it, no?
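
As a minimal sketch of such a ping-style loop built on libpq's existing
PQping() (the host name, output format, and 1-second interval are
illustrative; a real tool would also handle -c, timeouts, and signals):

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>
#include "libpq-fe.h"

int
main(void)
{
    const char *conninfo = "host=server.com port=5432";

    for (int seq = 0;; seq++)       /* loop until interrupted, like ping */
    {
        struct timeval s, e;
        PGPing      rc;
        double      ms;

        gettimeofday(&s, NULL);
        rc = PQping(conninfo);      /* PQPING_OK / REJECT / NO_RESPONSE */
        gettimeofday(&e, NULL);
        ms = (e.tv_sec - s.tv_sec) * 1000.0 +
             (e.tv_usec - s.tv_usec) / 1000.0;

        if (rc == PQPING_OK)
            printf("accept from server.com: seq=%d time=%.3f ms\n", seq, ms);
        else if (rc == PQPING_REJECT)
            printf("reject from server.com: seq=%d time=%.3f ms\n", seq, ms);
        else
            printf("timeout from server.com: seq=%d\n", seq);
        sleep(1);
    }
}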
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [WIP] pg_ping utility

2012-11-18 Thread Michael Paquier
On Fri, Nov 16, 2012 at 2:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Maybe I missed something here, but I believe it's standard that
 program --help should result in exit(0), no matter what the program's
 exitcode conventions are for live-fire exercises.

Yes indeed you are right. Thanks.
-- 
Michael Paquier
http://michael.otacoo.com


Re: [HACKERS] [RFC] Fix div/mul crash and more undefined behavior

2012-11-18 Thread Xi Wang
On 11/18/12 6:47 PM, Tom Lane wrote:
 Xi Wang xi.w...@gmail.com writes:
 [ patch adding a bunch of explicit INT_MIN/MAX constants ]
 
 I was against this style of coding before, and I still am.
 For one thing, it's just about certain to introduce conflicts
 against system headers.

I totally agree.

I would be happy to rewrite the integer overflow checks without
using these explicit constants, but it seems extremely tricky to
do so.  One possible check without using INTn_MIN is:

if (arg1 < 0 && -arg1 < 0 && arg2 == -1) { ... }

Compared to (arg1 == INTn_MIN && arg2 == -1), the above check is
not only more confusing and difficult to understand, but it also
invokes undefined behavior (-INT_MIN overflow), which is dangerous:
many C compilers will optimize away the check.  I've tried gcc,
clang, PathScale, and AMD's Open64, all of which perform such
optimizations.

Since INTn_MIN and INTn_MAX are standard macros from the C library,
can we assume that every C compiler should provide them in stdint.h?
At least this is true for gcc, clang, and Visual C++.  Then we don't
have to define them and worry about possible conflicts (though I
think using #ifndef...#endif should be able to avoid conflicts).
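
For illustration, the same check written with the stdint.h constants is
short and free of undefined behavior (a sketch only; the function name is
mine, not PostgreSQL's):

#include <stdint.h>
#include <stdbool.h>

/* true if arg1 / arg2 would trap or overflow for 32-bit ints */
static bool
int32_div_overflows(int32_t arg1, int32_t arg2)
{
    if (arg2 == 0)
        return true;    /* division by zero */
    /* INT32_MIN / -1 = 2^31, which is not representable in int32 */
    return arg1 == INT32_MIN && arg2 == -1;
}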

- xi




Re: [HACKERS] pg_trgm partial-match

2012-11-18 Thread Tomas Vondra
On 15.11.2012 20:39, Fujii Masao wrote:
 Hi,
 
 I'd like to propose to extend pg_trgm so that it can compare a partial-match
 query key to a GIN index. IOW, I'm thinking to implement the 'comparePartial'
 GIN method for pg_trgm.
 
 Currently, when the query key is less than three characters, we cannot use
 a GIN index (+ pg_trgm) efficiently, because pg_trgm doesn't support a
 partial-match method. In this case, seq scan or index full scan would be
 executed, and its response time would be very slow. I'd like to alleviate this
 problem.
 
 Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
 i.e., if the query key contains multibyte characters. In this case, the byte
 length of the trigram string might be larger than three, and its CRC is used
 as the trigram key instead of the trigram string itself. Because of using the
 CRC, we cannot do a partial-match. The attached patch extends pg_trgm so
 that it compares a partial-match query key only when KEEPONLYALNUM is
 enabled.
 
 Attached patch is WIP yet. What I should do next is:
 
 * version up pg_trgm from 1.0 to 1.1, i.e., create pg_trgm--1.1.sql, etc.
 * write the regression test
 
 Comments? Review? Objection?

Hi,

I've done a quick review of the current patch:

(1) It applies cleanly on the current master and builds fine.

(2) In pg_trgm--1.0.sql the gin_trgm_compare_partial is indented
differently (using tabs instead of spaces).

(3) In trgm_gin.c, function gin_extract_value_trgm contains #ifdef
KEEPONLYALNUM, although trgm_pmatch is not used at all.

(4) The patch removes some commented-out variables, but various
commented-out variables still remain. Will this be cleaned up too?

(5) I've done basic functionality testing of the patch; it really seems to work:

CREATE EXTENSION pg_trgm ;
CREATE TABLE TEST (val TEXT);
INSERT INTO test
   SELECT md5(i::text) FROM generate_series(1,100) s(i);
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;

EXPLAIN SELECT * FROM test WHERE val LIKE '%aa%';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=1655.96..11757.63 rows=141414 width=33)
   Recheck Cond: (val ~~ '%aa%'::text)
   ->  Bitmap Index Scan on trgm_idx  (cost=0.00..1620.61 rows=141414 width=0)
         Index Cond: (val ~~ '%aa%'::text)
(4 rows)

Without the patch, this gives a seq scan (as expected).

Do you expect to update the docs too? IMHO it's worth mentioning that
pg_trgm can now handle even patterns shorter than three chars ...
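
For readers unfamiliar with it, the GIN comparePartial contract (per the
GIN extensibility docs) is: return 0 for a match, a negative value for no
match but continue the scan, and a positive value to stop the scan. A rough
sketch of a prefix-style comparator, purely illustrative and not the
patch's actual code:

#include <string.h>

/* Compare a short (partial) pattern against an index key.  Index keys
 * arrive in sorted order starting at the pattern, so the first key that
 * no longer begins with the pattern ends the scan. */
static int
trgm_compare_partial_sketch(const char *partial, int plen,
                            const char *key, int klen)
{
    if (klen >= plen && memcmp(partial, key, plen) == 0)
        return 0;   /* key begins with the pattern: match */
    return 1;       /* no later key can match: stop the scan */
}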


regards
Tomas Vondra




Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-18 Thread Amit Kapila
 On Sunday, November 18, 2012 3:08 AM Fujii Masao wrote:
 On Sat, Nov 17, 2012 at 10:25 PM, Amit Kapila amit.kap...@huawei.com
 wrote:
  1. have a system table pg_global_system_settings(key,value)
 
 Do we really need to store the settings in a system table?
 Since WAL would be generated when storing the settings
 in a system table, this approach seems to prevent us from
 changing the settings in the standby.

Thanks for your feedback.

  It is one of the approaches we were trying to evaluate for this feature.
  However, this feature can also be implemented by operating directly on the file.

With Regards,
Amit Kapila.





[HACKERS] [PATCH] Patch to fix missing libecpg_compat.lib and libpgtypes.lib.

2012-11-18 Thread JiangGuiqing
Hi,

I installed postgresql-9.1.5 from source code on Windows successfully,
but libecpg_compat.lib and libpgtypes.lib are not in the installed lib
directory. If someone uses functions from the pgtypes or ecpg_compat
libraries in ecpg programs, the ecpg program build will fail.

So I modified src/tools/msvc/Install.pm to resolve this problem.
The diff is attached as Install.pm.patch.


Regards,
Jiang Guiqing




diff --git a/src/tools/msvc/Install.pm b/src/tools/msvc/Install.pm
index 6036714..91e6a53 100644
--- a/src/tools/msvc/Install.pm
+++ b/src/tools/msvc/Install.pm
@@ -86,7 +86,7 @@ sub Install
 		'Import libraries', $target . '/lib/',
 		"$conf\\", "postgres\\postgres.lib",
 		"libpq\\libpq.lib", "libecpg\\libecpg.lib",
-		"libpgport\\libpgport.lib");
+		"libpgport\\libpgport.lib", "libpgtypes\\libpgtypes.lib", "libecpg_compat\\libecpg_compat.lib");
 	CopySetOfFiles(
 		'timezone names',
 		[ glob('src\timezone\tznames\*.txt') ],



Re: [HACKERS] pg_trgm partial-match

2012-11-18 Thread Alexander Korotkov
Hi!

On Thu, Nov 15, 2012 at 11:39 PM, Fujii Masao masao.fu...@gmail.com wrote:

 Note that we cannot do a partial-match if KEEPONLYALNUM is disabled,
 i.e., if the query key contains multibyte characters. In this case, the byte
 length of the trigram string might be larger than three, and its CRC is used
 as the trigram key instead of the trigram string itself. Because of using the
 CRC, we cannot do a partial-match. The attached patch extends pg_trgm so
 that it compares a partial-match query key only when KEEPONLYALNUM is
 enabled.


I didn't get this point. How does KEEPONLYALNUM guarantee that each trigram
character is single-byte?

CREATE TABLE test (val TEXT);
INSERT INTO test VALUES ('aa'), ('aaa'), ('шaaш');
CREATE INDEX trgm_idx ON test USING gin (val gin_trgm_ops);
ANALYZE test;
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
--
 aa
 aaa
 шaaш
(3 rows)
test=# set enable_seqscan = off;
SET
test=# SELECT * FROM test WHERE val LIKE '%aa%';
 val
-
 aa
 aaa
(2 rows)

I think we can use partial match only for single-byte encodings. Or, at
most, in cases where all alphanumeric characters are single-byte (I have no
idea how to check this).

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] [PATCH 13/14] Introduce pg_receivellog, the pg_receivexlog equivalent for logical changes

2012-11-18 Thread Michael Paquier
Hi,

I am just looking at this patch and will provide some comments.
By the way, you forgot the installation part for pg_receivellog; please see
the attached patch.
Thanks,

On Thu, Nov 15, 2012 at 10:17 AM, Andres Freund and...@2ndquadrant.comwrote:

 ---
  src/bin/pg_basebackup/Makefile |   7 +-
  src/bin/pg_basebackup/pg_receivellog.c | 717
 +
  src/bin/pg_basebackup/streamutil.c |   3 +-
  src/bin/pg_basebackup/streamutil.h |   1 +
  4 files changed, 725 insertions(+), 3 deletions(-)
  create mode 100644 src/bin/pg_basebackup/pg_receivellog.c



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Michael Paquier
http://michael.otacoo.com


20121119_pg_receivellog_install.patch
Description: Binary data



Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-18 Thread Amit Kapila
On Sunday, November 18, 2012 3:28 AM Tom Lane wrote:
 Fujii Masao masao.fu...@gmail.com writes:
  Do we really need to store the settings in a system table?
  Since WAL would be generated when storing the settings
  in a system table, this approach seems to prevent us from
  changing the settings in the standby.
 
 That's a really good point: if we try to move all GUCs into a system
 table, there's no way for a standby to have different values; and for
 some of them different values are *necessary*.
 
 I think that shoots down this line of thought entirely.  Can we go
 back to the plain write a file approach now?  

Sure.

 I think a SET
 PERSISTENT command that's disallowed in transaction blocks and just
 writes the file immediately is perfectly sensible.

I got the point that we can disallow this inside transaction blocks.
Just to clarify: by the above, do you mean that the file write (the rename
from .conf.lock to .conf) should be done as soon as command execution ends,
or at transaction end?
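
For what it's worth, a minimal sketch of the write-then-rename sequence
under discussion (the file names and the helper are illustrative only, not
the proposed implementation):

#include <stdio.h>
#include <unistd.h>

/* Write the new contents to the .lock file, flush it to disk, then
 * atomically rename it over the real config file. */
static int
write_conf_file(const char *contents)
{
    FILE   *fp = fopen("postgresql.auto.conf.lock", "w");

    if (fp == NULL)
        return -1;
    fputs(contents, fp);
    fflush(fp);
    fsync(fileno(fp));      /* data must be durable before the swap */
    fclose(fp);
    return rename("postgresql.auto.conf.lock", "postgresql.auto.conf");
}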

Still, I think we are not able to completely rule out one option or the
other from a syntax perspective.

We have discussed the 3 different syntaxes below for this command:
 
1. ALTER SYSTEM
2. SET PERSISTENT
3. pg_change_conf()

I think that, to conclude, we need to evaluate which syntax has more
advantages. Comparison of the above syntaxes:

1. If we want to allow multiple configuration parameters, now or in the
future, to be updated in a single command. Example:
   a. ALTER SYSTEM SET port = 5435, max_connections = 100;
   b. SELECT pg_change_conf('port', 5435),
      pg_change_conf('max_connections', 100);

   I think the command syntax might be more convenient for the user.

2. If we provide a built-in function, the user can try to use it in
   complicated expressions:
   SELECT 1/0 FROM tbl WHERE a = pg_change_conf('max_connections', 100);
   The design and tests need to take care of such usage, so that it doesn't
   create any problems.

3. Using the SET command syntax can create some confusion for the user, as
   the SET SESSION | LOCAL options work in transaction blocks, but this
   feature may not be required to work in transaction blocks, since it
   changes the config file, which can take effect only on restart or SIGHUP.


I believe some more thought and suggestions are required before we can conclude.

Thoughts/Suggestions/Comments?


With Regards,
Amit Kapila.







Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL

2012-11-18 Thread Amit Kapila
On Sunday, November 18, 2012 3:22 PM Cédric Villemain wrote:
 On Saturday, November 17, 2012, at 22:57:49, Tom Lane wrote:
  Fujii Masao masao.fu...@gmail.com writes:
   Do we really need to store the settings in a system table?
   Since WAL would be generated when storing the settings in a system
   table, this approach seems to prevent us from changing the settings
   in the standby.
 
  That's a really good point: if we try to move all GUCs into a system
  table, there's no way for a standby to have different values; and for
  some of them different values are *necessary*.
 
  I think that shoots down this line of thought entirely.  Can we go
  back to the plain write a file approach now?  I think a SET
  PERSISTENT command that's disallowed in transaction blocks and just
  writes the file immediately is perfectly sensible.
 
 I was justifying the usage of a table structure, not to keep it in sync
 (just use it to hide the complexity of locks).
 
 Anyway that was just comments.
Thanks.
Your comments are thought-provoking. I was able to make progress on the
table-based approach based on your suggestions.

With Regards,
Amit Kapila.





Re: [HACKERS] logical changeset generation v3

2012-11-18 Thread Michael Paquier
Hi Andres,

I have been able to fetch your code (thanks Andrea!) and test some of it.
For the time being I am spending some time reading the code and
understanding the whole set of features you are trying to implement inside
core, even if I got some background from what you presented at PGCon and
from the hackers ML.
Btw, as a first step, I tried to run the logical log receiver plugged into
a postgres server, and I am not able to make it work.

Well, I am using settings similar to yours.
# Run master
rm -r ~/bin/pgsql/master/
initdb -D ~/bin/pgsql/master/
echo "local replication $USER trust" >> ~/bin/pgsql/master/pg_hba.conf
postgres -D ~/bin/pgsql/master \
  -c wal_level=logical \
  -c max_wal_senders=10 \
  -c max_logical_slots=10 \
  -c wal_keep_segments=100 \
  -c log_line_prefix="[%p %x] "
# Logical log receiver
pg_receivellog -f $HOME/output.txt -d postgres -v

After launching some SQL statements, the logical receiver gets stuck just
after sending INIT_LOGICAL_REPLICATION; please see the backtrace of the
waiting process:
(gdb) bt
#0  0x7f1bbc13b170 in __poll_nocancel () from /usr/lib/libc.so.6
#1  0x7f1bbc43072d in pqSocketPoll (sock=3, forRead=1, forWrite=0,
end_time=-1) at fe-misc.c:1089
#2  0x7f1bbc43060d in pqSocketCheck (conn=0x1dd0290, forRead=1,
forWrite=0, end_time=-1) at fe-misc.c:1031
#3  0x7f1bbc4304dd in pqWaitTimed (forRead=1, forWrite=0,
conn=0x1dd0290, finish_time=-1) at fe-misc.c:963
#4  0x7f1bbc4304af in pqWait (forRead=1, forWrite=0, conn=0x1dd0290) at
fe-misc.c:946
#5  0x7f1bbc42c64c in PQgetResult (conn=0x1dd0290) at fe-exec.c:1709
#6  0x7f1bbc42cd62 in PQexecFinish (conn=0x1dd0290) at fe-exec.c:1974
#7  0x7f1bbc42c9c8 in PQexec (conn=0x1dd0290, query=0x406c60
INIT_LOGICAL_REPLICATION 'test_decoding') at fe-exec.c:1808
#8  0x00402370 in StreamLog () at pg_receivellog.c:263
#9  0x004030c9 in main (argc=6, argv=0x7fff44edb288) at
pg_receivellog.c:694
So I am not able to output any results using pg_receivellog.

Also, I noticed 2 errors in your set of tests.

On Thu, Nov 15, 2012 at 9:27 AM, Andres Freund and...@anarazel.de wrote:

 -- wrapped in a transaction
 BEGIN;
 INSERT INTO replication_example(somedata, text) VALUES (1, 1);
 UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT
 currval('replication_example_id_seq'));

In SET clause, the column name is *somedata* and not *somedate*


 -- dont write out aborted data
 BEGIN;
 INSERT INTO replication_example(somedata, text) VALUES (2, 1);
 UPDATE replication_example SET somedate = - somedata WHERE id = (SELECT
 currval('replication_example_id_seq'));

Same error here: *somedata* and not *somedate*. Not a big deal; it made the
transactions fail, though.
-- 
Michael Paquier
http://michael.otacoo.com