Re: [HACKERS] Proposal: First step towards Intelligent, integrated database

2010-12-02 Thread Vaibhav Kaushal
I know the world's population. No one thinks alike, and still many
people's goals can be the same. Nothing is virgin in this world; if
someone thinks like that then it is a mistake.  My aim is to prove that
PostgreSQL can be the great leader if we put natural intelligence into the
database, which is missing all over.

And what is that intelligence?




Re: [HACKERS] build problem

2010-12-02 Thread Peter Eisentraut
On ons, 2010-12-01 at 21:27 -0800, Jeff Davis wrote:
 On Wed, 2010-12-01 at 22:16 -0500, Tom Lane wrote:
  Jeff Davis pg...@j-davis.com writes:
   When I build HEAD using just make -s install, everything works fine.
   But when I add -j12, it appears to cause problems. This problem
   appeared very recently. Output below.
  
  Platform?
 
 Oops, sorry:
 
 $ uname -a
 Linux jdavis-ux 2.6.32-24-generic #43-Ubuntu SMP Thu Sep 16 14:58:24 UTC
 2010 x86_64 GNU/Linux

I'll look into it.





Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 01.12.2010 20:51, Heikki Linnakangas wrote:

Another approach would be to revisit the way the running-xacts snapshot
is taken. Currently, we first take a snapshot, and then WAL-log it.
There is a small window between the steps where backends can begin/end
transactions, and recovery has to deal with that. When this was
designed, there was long discussion on whether we should instead grab
WALInsertLock and ProcArrayLock at the same time, to ensure that the
running-xacts snapshot represents an up-to-date situation at the point
in WAL where it's inserted.

We didn't want to do that because both locks can be heavily contended.
But maybe we should after all. It would make the recovery code simpler.

If we want to get fancy, we wouldn't necessarily need to hold both locks
for the whole duration. We could first grab ProcArrayLock and construct
the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
finally write the WAL record and release WALInsertLock. But that would
require small changes to XLogInsert.


I took a look at that approach. We don't actually need to hold 
ProcArrayLock while the WAL record is written; we need to hold 
XidGenLock. I believe that's less severe than holding ProcArrayLock, 
as there's already precedent for writing a WAL record while holding 
that lock: we do it when we advance to a new clog page and write a 
zero-clog-page record.


So this is what we should do IMHO.
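
For reference, a condensed sketch of the proposed ordering (hypothetical
code, not the attached patch; it assumes GetRunningTransactionData() is
changed to return with XidGenLock still held):

	/*
	 * Sketch: insert the running-xacts record before XidGenLock is
	 * released, so no new xid can be assigned between building the
	 * snapshot and the record's position in WAL.  Details simplified.
	 */
	RunningTransactions running;

	/* assumed change: returns with XidGenLock still held */
	running = GetRunningTransactionData();

	/* existing helper in standby.c that builds the rdata chain and
	 * calls XLogInsert() with XLOG_RUNNING_XACTS */
	LogCurrentRunningXacts(running);

	/* only now may new transaction ids be handed out again */
	LWLockRelease(XidGenLock);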

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 6095,6102  StartupXLOG(void)
  			StartupSUBTRANS(oldestActiveXID);
  			StartupMultiXact();
  
- 			ProcArrayInitRecoveryInfo(oldestActiveXID);
- 
  			/*
  			 * If we're beginning at a shutdown checkpoint, we know that
  			 * nothing was running on the master at this point. So fake-up an
--- 6095,6100 
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***
*** 435,453  ProcArrayClearTransaction(PGPROC *proc)
  }
  
  /*
-  * ProcArrayInitRecoveryInfo
-  *
-  * When trying to assemble our snapshot we only care about xids after this value.
-  * See comments for LogStandbySnapshot().
-  */
- void
- ProcArrayInitRecoveryInfo(TransactionId oldestActiveXid)
- {
- 	latestObservedXid = oldestActiveXid;
- 	TransactionIdRetreat(latestObservedXid);
- }
- 
- /*
   * ProcArrayApplyRecoveryInfo -- apply recovery info about xids
   *
   * Takes us through 3 states: Initialized, Pending and Ready.
--- 435,440 
***
*** 519,533  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * OK, we need to initialise from the RunningTransactionsData record
! 	 */
! 
! 	/*
! 	 * Remove all xids except xids later than the snapshot. We don't know
! 	 * exactly which ones that is until precisely now, so that is why we allow
! 	 * xids to be added only to remove most of them again here.
  	 */
- 	ExpireOldKnownAssignedTransactionIds(running->nextXid);
  	StandbyReleaseOldLocks(running->nextXid);
  
  	/*
--- 506,514 
  	Assert(standbyState == STANDBY_INITIALIZED);
  
  	/*
! 	 * Release any locks belonging to old transactions that are not
! 	 * running according to the running-xacts record.
  	 */
  	StandbyReleaseOldLocks(running->nextXid);
  
  	/*
***
*** 536,544  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
- 	 * Combine the running xact data with already known xids, if any exist.
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * combine them first, sort them and then re-add to KnownAssignedXids.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
--- 517,524 
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  
  	/*
  	 * KnownAssignedXids is sorted so we cannot just add new xids, we have to
! 	 * sort them first.
  	 *
  	 * Some of the new xids are top-level xids and some are subtransactions.
  	 * We don't call SubtransSetParent because it doesn't matter yet. If we
***
*** 547,597  ProcArrayApplyRecoveryInfo(RunningTransactions running)
  	 * xids to subtrans. If RunningXacts is overflowed then we don't have
  	 * enough information to correctly update subtrans anyway.
  	 */
  
  	/*
! 	 * Allocate a temporary array so we can combine xids. The total of both
! 	 * arrays should never normally exceed TOTAL_MAX_CACHED_SUBXIDS.
  	 */
! 	xids = palloc(sizeof(TransactionId) * TOTAL_MAX_CACHED_SUBXIDS);
  
  	/*
! 	 * Get the remaining KnownAssignedXids. In most cases there won't be any
! 	 * at all since this exists only to catch a theoretical race condition.
! 	 */
! 	nxids = KnownAssignedXidsGet(xids, InvalidTransactionId);
! 	if (nxids > 0)
! 		

Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Simon Riggs
On Thu, 2010-12-02 at 10:39 +0200, Heikki Linnakangas wrote:
 On 01.12.2010 20:51, Heikki Linnakangas wrote:
  Another approach would be to revisit the way the running-xacts snapshot
  is taken. Currently, we first take a snapshot, and then WAL-log it.
  There is a small window between the steps where backends can begin/end
  transactions, and recovery has to deal with that. When this was
  designed, there was long discussion on whether we should instead grab
  WALInsertLock and ProcArrayLock at the same time, to ensure that the
  running-xacts snapshot represents an up-to-date situation at the point
  in WAL where it's inserted.
 
  We didn't want to do that because both locks can be heavily contended.
  But maybe we should after all. It would make the recovery code simpler.
 
  If we want to get fancy, we wouldn't necessarily need to hold both locks
  for the whole duration. We could first grab ProcArrayLock and construct
  the snapshot. Then grab WALInsertLock and release ProcArrayLock, and
  finally write the WAL record and release WALInsertLock. But that would
  require small changes to XLogInsert.
 
 I took a look at that approach. We don't actually need to hold 
 ProcArrayLock while the WAL-record is written, we need to hold 
 XidGenLock. I believe that's less severe than holding the ProcArrayLock 
 as there's already precedence for writing a WAL record while holding 
 that: we do that when we advance to a new clog page and write a 
 zero-clog-page record.
 
 So this is what we should do IMHO.

Oh, thanks for looking at this. I've been looking at this also and, as we
might expect, had a slightly different design.

First, your assessment of the locking above is better than mine. I agree
with your analysis, so we should do it that way. The locking issue was
the reason I hadn't patched this yet, so I'm glad you've improved this.

In terms of the rest of the patch, it seems we have different designs; I
think I have a much simpler, less invasive solution:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there were no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WALInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the "grab the lock"
code with the insert of the RunningXacts WAL record (when wal_level is
set), so that recovery always starts with that record type.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Hi- How frequently Postgres Poll for trigger file

2010-12-02 Thread aaliya zarrin
Hey!
I am able to change the switchover time. Thanks a lot.
2010/12/2 aaliya zarrin aaliya.zar...@gmail.com

 Ok.. Thanks I will try!!!

 2010/12/2 Heikki Linnakangas heikki.linnakan...@enterprisedb.com

 Oh, and there's another value for the case we're polling the archive, not
 streaming from master. Search for 5000000 in XLogPageRead function to catch them
 all.


 On 02.12.2010 08:21, Heikki Linnakangas wrote:

 It's the 5000000L argument in the WaitLatch call. It's expressed in
 microseconds.

 On 02.12.2010 06:39, aaliya zarrin wrote:

 Where is this 5 seconds time defined in the code?



 On Wed, Dec 1, 2010 at 5:53 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 On 01.12.2010 13:27, aaliya zarrin wrote:

 I want to know how frequently postgres searches for the trigger file to
 switch over.


 In 9.0, every 100ms while streaming replication is active and
 connected. 5
 seconds otherwise. In current git master branch, it's always 5 s.


 Can this switch over time be reduced?



 Not without hacking the sources and compiling.

 Although, on many platforms, Linux included I believe, sending a signal to
 the startup process should wake it up from the sleep and make it check the
 trigger file immediately. pg_ctl reload for example should do it. So if
 you send a signal to the startup process immediately after creating the
 trigger file, it should take notice sooner.


 Please let me know where postgres polls for the trigger file. I could find
 it in backend/access/transam/xlog.c? Am I right?


 Yes. Search for callers of the CheckForStandbyTrigger() function.


 --
 Heikki Linnakangas
 EnterpriseDB http://www.enterprisedb.com








 --
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com




 --
 Thanks & Regards,

 Aaliya Zarrin
 (+91)-9160665888




-- 
Thanks & Regards,

Aaliya Zarrin
(+91)-9160665888


Re: [HACKERS] FK's to refer to rows in inheritance child

2010-12-02 Thread Yeb Havinga

On 2010-12-01 16:05, Tom Lane wrote:


Perhaps I should have said possibly workable proposal.  What you wrote
doesn't even begin to cover the interesting part of the problem, namely
how to ensure uniqueness is preserved in the face of concurrent
insertions.
The usual page locking done by nbtree doesn't work in this case, since a 
conflict can be two inserts into two different indexes. This means that 
one way or another, some extra kind of lock is needed. The question is 
what current lock mechanism is suitable for this purpose; the answer to 
that may depend on the way conflicts are checked for. Checking for 
conflicts can be done at row insert time or commit time.


With 'inheritance chain' I mean any set of more than one relation that 
is the transitive closure of the inheritance parent and child relation 
on a given relation.


- Check at row insert time:
transaction A
-- before insert of row into an inheritance chain, suppose with key value 10
-- lock X exclusively (this is to be a fine-grained lock; X could be a 
name derived from the inserted key value 10)

-- query inheritance chain for existing values with SnapshotNow
-- on existing value, release 'X locks' taken by current backend and error
-- else normal processing continues (actual insert etc..)
-- after commit, release 'X locks' taken by current backend

If another transaction B wants to insert the same key value 10, it will 
block on the lock taken by A, which A releases after commit; then B 
continues, and when it queries the inheritance chain, it sees the 
committed record of A and will then report an error.


The drawback here is lock proliferation: taking a lock for each inserted row seems undesirable.
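
To make the insert-time variant above concrete, a rough backend-level
sketch could look like the following (hypothetical code; it borrows the
advisory-lock tag space for the 'X lock', and chain_contains_key() is an
imaginary helper standing in for the SnapshotNow scan of the chain):

	static void
	check_unique_across_chain(Oid parentRelid, uint32 keyHash)
	{
		LOCKTAG		tag;

		/* borrow the advisory-lock tag space; a real patch would define its own */
		SET_LOCKTAG_ADVISORY(tag, MyDatabaseId, parentRelid, keyHash, 2);

		/* blocks until any concurrent inserter of the same key commits or aborts */
		(void) LockAcquire(&tag, ExclusiveLock, false, false);

		/*
		 * Scan the parent and all children with SnapshotNow; if the key is
		 * already present, error out.  The lock is released automatically
		 * at transaction end, matching the scheme above.
		 */
		if (chain_contains_key(parentRelid))	/* imaginary helper */
			ereport(ERROR,
					(errcode(ERRCODE_UNIQUE_VIOLATION),
					 errmsg("key already present in inheritance chain")));
	}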

- Check at commit time
transaction A
-- inserts row into inheritance chain
-- record row and tuple for later check
-- at commit time, if any check was recorded
--- for every inheritance chain with inserts, ordered by lowest relation oid
 lock Y exclusively (where Y is a name derived from the inheritance 
chain, e.g. lowest relation oid)
 for every inserted value, query for duplicate key values in 
inheritance chain with SnapshotNow

 on error, release Y and abort transaction (serialization error)
--- after commit / heap tuples marked committed, release all Y locks.

transaction B
May insert duplicate values into the inheritance chain, but at commit 
time will either get the lock or block on it. It can continue after A 
releases its locks, and then it will see the changes committed by A.


Though this solution takes fewer locks than the per-row case, a deadlock can 
occur if the inheritance chains are not locked in order. This is avoided 
by ordering the chains by the lowest oid of their relations. Perhaps the 
biggest drawback is that a serialization error must be thrown at commit, 
even when the user did not set the transaction isolation level to 
SERIALIZABLE, and that might violate POLA.

(My current feelings about this are that a general-purpose solution
would probably cost more than it's worth.  What people really care
about is FK to a partitioned table, which is a structure in which
we don't have to solve the general problem: if we know that the
partitioning prevents the same key from appearing in multiple
partitions, then we only have to look into one partition.  So this
is just another item that's pending introduction of real partitioning
infrastructure.)
While this is true for the partitioning use case, it will never be for 
'true' inheritance use cases, and therefore a real partitioning 
structure doesn't help the inheritance use cases. (Such as ours, which is 
an implementation of HL7's Reference Information Model: 
http://www.hl7.org/v3ballot/html/infrastructure/rim/rim.html) A lot can 
be said about object model implementations in relational databases. To 
keep it short: Codd stressed the value of the relational model in his 
ACM Turing Award lecture: a practical foundation for productivity. It is 
not productive that users must implement foreign key checking in 
user space instead of using a database primitive.


regards,
Yeb Havinga







Re: [HACKERS] FK's to refer to rows in inheritance child

2010-12-02 Thread Yeb Havinga

On 2010-12-02 11:27, Yeb Havinga wrote:
With 'inheritance chain' I mean any set of more than one relation that 
is the transitive closure of the inheritance parent and child relation 
on a given relation.

s/transitive closure/transitive reflexive closure






Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the 
running-xacts snapshot and writing it to the log is sufficient, I don't 
see what moving the running-xacts record buys us. Does it allow some 
further simplifications somewhere?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pg_execute_from_file review

2010-12-02 Thread Dimitri Fontaine
Hi,

Please find attached the v8 version of the patch, which fixes the following:

Itagaki Takahiro itagaki.takah...@gmail.com writes:
 * pg_read_binary_file_internal() should return not only the contents
   as char * but also the length, because the file might contain 0x00.
   In addition, null-terminations for the contents buffer is useless.

 * The 1st argument of pg_convert must be bytea rather than cstring in
   pg_convert_and_execute_sql_file(). I think you can fix both the bug
   and the above one if pg_read_binary_file_internal() returns bytea.

I've changed pg_read_binary_file_internal() to return bytea*, which is
much cleaner, thanks for the suggestion!

 * pg_read_file() has stronger restrictions than pg_read_binary_file().
   (absolute path not allowed) and -1 length is not supported.
   We could fix pg_read_file() to behave as like as pg_read_binary_file().

It's now using the _internal function directly, so that there's only one
code definition to care about here.

 * (It was my suggestion, but) Is it reasonable to use -1 length to read
   the whole file? It might fit with C, but NULL might be better for SQL.

Well, thinking about it, omitting the length parameter altogether seems
like the more natural SQL-level API to me, so I've made it happen:

=# \df pg_read_*|replace|pg_exe*
                                      List of functions
   Schema   |         Name          | Result data type |       Argument data types        |  Type
------------+-----------------------+------------------+----------------------------------+--------
 pg_catalog | pg_execute_sql_file   | void             | text                             | normal
 pg_catalog | pg_execute_sql_file   | void             | text, name                       | normal
 pg_catalog | pg_execute_sql_file   | void             | text, name, VARIADIC text        | normal
 pg_catalog | pg_execute_sql_string | void             | text                             | normal
 pg_catalog | pg_execute_sql_string | void             | text, VARIADIC text              | normal
 pg_catalog | pg_read_binary_file   | bytea            | text, bigint                     | normal
 pg_catalog | pg_read_binary_file   | bytea            | text, bigint, bigint             | normal
 pg_catalog | pg_read_file          | text             | text, bigint                     | normal
 pg_catalog | pg_read_file          | text             | text, bigint, bigint             | normal
 pg_catalog | replace               | text             | text, text, text                 | normal
 pg_catalog | replace               | text             | text, text, text, VARIADIC text  | normal
(11 rows)


 * The doc says pg_execute_sql_string() is restricted for superusers,
   but is not restricted actually. I think we don't have to.

Agreed, fixed the doc.

 * In docs, the example of replace_placeholders() is
   replace('abcdefabcdef', 'cd', 'XX', 'ef', 'YY').
   ~~~
   BTW, I think we can call it just replace because parser can handle
   them correctly even if we have both replace(text, text, text) and
   replace(text, VARIADIC text[]). We will need only one doc for them.
   Note that if we call replace() with 3 args, the non-VARIADIC version
   is called. So, there is no performance penalty.

The same idea occurred to me last night when reading through the patch
before sending it. It's now done in the way you can see above. The idea is not to
change the existing behavior at all, so I've not changed the
non-VARIADIC version of the function.

 * We might rename pg_convert_and_execute_sql_file() to
   pg_execute_sql_file_with_encoding() or so to have the same prefix.

Well, I think I prefer reading the verbs in the order that things
happen in the code; it's actually convert then execute. But again,
maybe some native speaker will have a say here, or maybe your proposed
name fits better in PostgreSQL. I'd leave it to the committer :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 1840,1846 
  </indexterm>
  <literal><function>replace(<parameter>string</parameter> <type>text</type>,
  <parameter>from</parameter> <type>text</type>,
! <parameter>to</parameter> <type>text</type>)</function></literal>
 </entry>
 <entry><type>text</type></entry>
 <entry>Replace all occurrences in <parameter>string</parameter> of substring
--- 1840,1849 
  </indexterm>
  <literal><function>replace(<parameter>string</parameter> <type>text</type>,
  <parameter>from</parameter> <type>text</type>,
! <parameter>to</parameter> <type>text</type>
! [, <parameter>from</parameter> <type>text</type>,
!    <parameter>to</parameter> <type>text</type>,
! [, ...] ])</function></literal>
 </entry>
 <entry><type>text</type></entry>
 <entry>Replace all occurrences in <parameter>string</parameter> of substring
***
*** 14449,14466  postgres=# SELECT * FROM 

Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 07:39, Joachim Wieland wrote:

On Sun, Nov 14, 2010 at 6:52 PM, Joachim Wielandj...@mcknight.de  wrote:

You would add a regular parallel dump with

$ pg_dump -j 4 -Fd -f out.dir dbname


So this is an updated series of patches for my parallel pg_dump WIP
patch. Most importantly it now runs on Windows once you get it to
compile there (I have added the new files to the respective project of
Mkvcbuild.pm but I wondered why the other archive formats do not need
to be defined in that file...).

So far nobody has volunteered to review this patch. It would be great
if people could at least check it out, run it and let me know if it
works and if they have any comments.


That's a big patch..

I don't see the point of the sort-by-relpages code. The order the 
objects are dumped should be irrelevant, as long as you obey the 
restrictions dictated by dependencies. Or is it only needed for the 
multiple-target-dirs feature? Frankly I don't see the point of that, so 
it would be good to cull it out at least in this first stage.



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Simon Riggs
On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:
 On 02.12.2010 11:02, Simon Riggs wrote:
  The cause of the issue is that replay starts at one LSN and there is a
  delay until the RunningXacts WAL record occurs. If there was no delay,
  there would be no issue at all. In CreateCheckpoint() we start by
  grabbing the WAInsertLock and later recording that pointer as part of
  the checkpoint record. My proposal is to replace the grab the lock
  code with the insert of the RunningXacts WAL record (when wal_level
  set), so that recovery always starts with that record type.
 
 Oh, interesting idea. But AFAICS closing the gap between acquiring the 
 running-xacts snapshot and writing it to the log is sufficient, I don't 
 see what moving the running-xacts record buys us. Does it allow some 
 further simplifications somewhere?

Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.

Earlier on this thread, we discussed:

On Wed, 2010-11-24 at 15:19 +, Simon Riggs wrote: 
 On Wed, 2010-11-24 at 12:48 +0200, Heikki Linnakangas wrote:
  When recovery starts, we fetch the oldestActiveXid from the checkpoint
  record. Let's say that it's 100. We then start replaying WAL records 
  from the Redo pointer, and the first record (heap insert in your case)
  contains an Xid that's much larger than 100, say 1. We call 
  RecordKnownAssignedXids() to make note that all xids between that
  range are in-progress, but there isn't enough room in the array for
  that.
 
 Agreed.

The current code fails because of the gap between the redo pointer and
the XLOG_RUNNING_XACTS WAL record. If there is no gap, there is no
problem.

So my preferred solution would be to:
* Log XLOG_RUNNING_XACTS while holding XidGenLock, as you suggest
* Move logging to occur at the Redo pointer

That is a much smaller patch with a smaller footprint.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Hot Standby: too many KnownAssignedXids

2010-12-02 Thread Heikki Linnakangas

On 02.12.2010 13:25, Simon Riggs wrote:

On Thu, 2010-12-02 at 12:41 +0200, Heikki Linnakangas wrote:

On 02.12.2010 11:02, Simon Riggs wrote:

The cause of the issue is that replay starts at one LSN and there is a
delay until the RunningXacts WAL record occurs. If there was no delay,
there would be no issue at all. In CreateCheckpoint() we start by
grabbing the WAInsertLock and later recording that pointer as part of
the checkpoint record. My proposal is to replace the grab the lock
code with the insert of the RunningXacts WAL record (when wal_level
set), so that recovery always starts with that record type.


Oh, interesting idea. But AFAICS closing the gap between acquiring the
running-xacts snapshot and writing it to the log is sufficient, I don't
see what moving the running-xacts record buys us. Does it allow some
further simplifications somewhere?


Your patch is quite long and you do a lot more than just alter the
locking. I don't think we need those changes at all and especially would
not wish to backpatch that.


Most of the changes to procarray.c were about removing code that's no 
longer necessary when we close the gap between acquiring and writing the 
running-xacts WAL record. You can leave it as it is as a historical 
curiosity, but I'd prefer to simplify it, given that we now know that it 
doesn't actually work correctly if the gap is not closed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Or maybe I do.  One other thing I've been thinking about with regard
 to hint bit updates is that we might choose to mark pages that are
 hint-bit-updated as untidy rather than dirty.  The background

Please review archives, you'll find the idea discussed and some patches
to implement it, by Simon. I suppose you could begin here:

  http://archives.postgresql.org/pgsql-patches/2008-06/msg00113.php

Regards,  
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I don't see the point of the sort-by-relpages code. The order the objects
 are dumped should be irrelevant, as long as you obey the restrictions
 dictated by dependencies. Or is it only needed for the multiple-target-dirs
 feature? Frankly I don't see the point of that, so it would be good to cull
 it out at least in this first stage.

From the talk at CHAR(10), and provided memory serves, it's an
optimisation so that you're dumping the largest file in one process and all the
little files in other processes. In lots of cases the total pg_dump
duration is then reduced to about the time to dump the biggest files.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



[HACKERS] When will 8.1 be EOLed?

2010-12-02 Thread Devrim GÜNDÜZ

IIRC we announced that we will EOL 8.1 in 2010 -- given that we are now
in December, will it be done soon?

(As a packager, I'll be happy to drop it ASAP, so that I can use those
VMs for new OSes).

Regards,

-- 
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org  Twitter: http://twitter.com/devrimgunduz




Re: [HACKERS] When will 8.1 be EOLed?

2010-12-02 Thread Magnus Hagander
2010/12/2 Devrim GÜNDÜZ dev...@gunduz.org:

 IIRC we announced that we will EOL 8.1 in 2010 -- given that we are now
 in December, will it be done soon?

 (As a packager, I'll be happy to drop is ASAP, so that I can use those
 VMs for new OSes).

We even said November 2010 - see
http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy.

The usual policy is we'll put out one more minor release and that's
the final one. So the answer is whenever we put out the next batch of
minor releases.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Joachim Wieland
On Thu, Dec 2, 2010 at 6:19 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I don't see the point of the sort-by-relpages code. The order the objects
 are dumped should be irrelevant, as long as you obey the restrictions
 dictated by dependencies. Or is it only needed for the multiple-target-dirs
 feature? Frankly I don't see the point of that, so it would be good to cull
 it out at least in this first stage.

A guy called Dimitri Fontaine actually proposed the
several-directories feature here and other people liked the idea.

http://archives.postgresql.org/pgsql-hackers/2008-02/msg01061.php  :-)

The code doesn't change much with or without it, and if people are no
longer in favour of it, I have no problem with taking it out.

As Dimitri has already pointed out, the relpage sorting thing is there
to start with the largest table(s) first.


Joachim



Re: [HACKERS] When will 8.1 be EOLed?

2010-12-02 Thread Alvaro Herrera
Excerpts from Devrim GÜNDÜZ's message of jue dic 02 10:10:28 -0300 2010:
 
 IIRC we announced that we will EOL 8.1 in 2010 -- given that we are now
 in December, will it be done soon?
 
 (As a packager, I'll be happy to drop is ASAP, so that I can use those
 VMs for new OSes).

My guess is that the next update will be the last one for 8.1.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] When will 8.1 be EOLed?

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 08:10 AM, Devrim GÜNDÜZ wrote:

IIRC we announced that we will EOL 8.1 in 2010 -- given that we are now
in December, will it be done soon?

(As a packager, I'll be happy to drop is ASAP, so that I can use those
VMs for new OSes).




EOL does not mean we simply drop the branch on the floor. The policy at 
http://wiki.postgresql.org/wiki/PostgreSQL_Release_Support_Policy 
states: "The first version update released after the EOL date will 
normally be the final one for that version. Note that this can extend 
support shortly past the listed date."  That "shortly" could amount to 
several months, and has in the past.


(I'm not sure I understand why you need a separate VM for each branch, 
as your post seems to imply.)


cheers

andrew



Re: [HACKERS] FK's to refer to rows in inheritance child

2010-12-02 Thread Yeb Havinga

On 2010-12-01 16:58, Florian Pflug wrote:

On Dec1, 2010, at 15:27 , Tom Lane wrote:

Indeed.  This isn't even worth the time to review, unless you have a
proposal for fixing the unique-index-across-multiple-tables problem.


I've wondered in the past if a unique index is really necessary on the columns 
referenced by a FK in all cases, though.
Me too, but Tom wrote, in one of the archive threads linked to from the 
todo wiki, that it was in the SQL spec, so..

Another idea might be to allow this if tableoid is part of the FK. In that 
case, unique indices on the remaining columns in the individual child table would be 
enough to guarantee global uniqueness.
Yes, that would work, but if the database user must add the tableoid 
column and include it in the PK and FKs (i.e. not transparent for the 
user), then it would still be a workaround.



BTW, my serializable_lock_consistency patch would allow you to do this purely 
within pl/pgsql in a race-condition-free way. So if that patch should get applied you 
might want to consider this as a workaround. Whether it will get applied is yet to be 
seen, though - currently there doesn't seem to be a unanimous vote one way or the other.

Thanks for the reference, I will try and see how to use it for this purpose.

regards,
Yeb Havinga




Re: [HACKERS] is cachedFetchXid ever invalidated?

2010-12-02 Thread Alvaro Herrera
Excerpts from Jeff Davis's message of mié dic 01 22:48:36 -0300 2010:
 I can't see any place that cachedFetchXid is ever invalidated.
 Shouldn't it be invalidated before transaction ID wraparound?
 
 It's difficult to construct a test case to show whether this is a
 problem or not,

Couldn't you just create a C function that advances the Xid counter by,
say, 100k?
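
For instance, a throwaway test module along these lines might do it (a
hypothetical sketch, nothing that exists in the tree; it simply calls
GetNewTransactionId() in a loop):

	/* hypothetical test-only helper: consume n transaction ids in one call */
	#include "postgres.h"
	#include "access/transam.h"
	#include "fmgr.h"

	PG_MODULE_MAGIC;

	PG_FUNCTION_INFO_V1(burn_xids);

	Datum
	burn_xids(PG_FUNCTION_ARGS)
	{
		int32		n = PG_GETARG_INT32(0);

		while (n-- > 0)
			(void) GetNewTransactionId(true);	/* assign and discard a subxact xid */

		PG_RETURN_VOID();
	}

Exposed with CREATE FUNCTION burn_xids(integer) RETURNS void AS
'$libdir/burn_xids' LANGUAGE C, calling it between runs would fast-forward
the counter far enough to exercise wraparound behaviour.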

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Dimitri Fontaine
Joachim Wieland j...@mcknight.de writes:
 A guy called Dimitri Fontaine actually proposed the
 serveral-directories feature here and other people liked the idea.

Hehe :)

Reading that now, it could be that I didn't know at the time that, given
a powerful enough disk subsystem, there's no way to saturate it with one
CPU. So the use case of parallel dump to a bunch of user-given locations
would be to use different mount points (disk subsystems) at the same
time.  Not sure how relevant it is.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] improving foreign key locks

2010-12-02 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mié dic 01 14:44:03 -0300 2010:
 Florian Pflug f...@phlo.org writes:
  On Dec1, 2010, at 17:17 , Tom Lane wrote:
  There's not enough space in the infomask to record which columns (or
  which unique index) are involved.  And if you're talking about data that
  could remain on disk long after the unique index is gone, that's not
  going to be good enough.
 
  We'd distinguish two cases
  A) The set of locked columns is a subset of the set of columns mentioned in
     *any* unique index. (In other words, for every locked column there is a
     unique index which includes that column, though not necessarily one index
     which includes them all)
  B) The set of locked columns does not satisfy (A)
 
 How's that fix it?  The on-disk flags are still falsifiable by
 subsequent index changes.
 
  Creating indices shouldn't pose a problem, since it would only enlarge the 
  set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEY set.
 
 Not with that definition.  I could create a unique index that doesn't
 contain some column that every previous unique index contained.

This is not a problem, because a later UPDATE that tried to modify a
locked tuple on a column only indexed by the new index would consider
it locked.  Which is OK.  There would only be a problem if we were
allowed to drop an index (which would reduce the set of locked columns),
but we don't allow this because DROP INDEX requires an exclusive lock on
the table and thus there cannot be anyone with key-locked tuples in it.

AFAICT Florian's definition is good.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] FK's to refer to rows in inheritance child

2010-12-02 Thread Yeb Havinga

On 2010-12-02 01:18, Jim Nasby wrote:

On Dec 1, 2010, at 8:07 AM, Yeb Havinga wrote:

FK's cannot refer to rows in inheritance childs.

We have partially solved this issue at work. In our scenario, we're not using inheritance for 
partitioning, we're using it for, well, inheriting. As part of that, we have a field in the parent 
table that tells you what type of object each row is, and constraints on the child 
tables that enforce that. We've created triggers that perform the same operations that the built-in 
RI triggers do, namely grabbing a share lock on the target row. The difference is that our trigger 
looks at the type field to determine exactly what table it needs to try and grab shared 
locks on (we need to do this because the backend doesn't allow you to SELECT ... FROM parent FOR 
SHARE).
That part is exactly what the current WIP patch takes care of: grabbing 
share locks on the right relation.

Our solution is not complete though. Offhand, I know it doesn't support 
cascade, but I think there's more stuff it doesn't do. AFAIK all of those 
shortcomings could be handled with what's available at a user level though, so 
someone with enough motivation could produce an entire RI framework that worked 
with inheritance (though the framework would need a way to work around the 
uniqueness issue).
But is 'it can be solved on the user level' enough reason to not 
implement it in the server code? Foreign key and unique constraint 
checking are features the server should provide.


regards,
Yeb Havinga




Re: [HACKERS] V3: Idle in transaction cancellation

2010-12-02 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 On Tuesday 19 October 2010 16:18:29 Kevin Grittner wrote:
 
 For SSI purposes, it would be highly desirable to be able to set
 the SQLSTATE and message generated when the canceled transaction
 terminates.
 
 Ok, I implemented that capability, but the patch feels somewhat
 wrong to me, 
 
 so its a separate patch on top the others:
 
The patch is in context diff format, applies with minor offsets,
compiles and passes regression tests.
 
I have to admit that after reading the patch, I think I previously
misunderstood the scope of it.  Am I correct in reading that the
main thrust of this is to improve error handling on standbys?  Is
there any provision for one backend to cause a *different* backend
which is idle in a transaction to terminate cleanly when it attempts
to process its next statement?  (That is what I was hoping to find,
for use in the SSI patch.)
 
Anyway, if the third patch file is only there because of my request,
I think it might be best to focus on the first two as a solution for
the standby issues this was originally meant to address, and then to
look at an API for the usage I have in mind after that is settled.
 
-Kevin



Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 6:37 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 Or maybe I do.  One other thing I've been thinking about with regard
 to hint bit updates is that we might choose to mark pages that are
 hint-bit-updated as untidy rather than dirty.  The background

 Please review archives, you'll find the idea discussed and some patches
 to implement it, by Simon. I suppose you could begin here:

  http://archives.postgresql.org/pgsql-patches/2008-06/msg00113.php

Thanks for the pointer.  I guess that demonstrates that good ideas
will keep floating back up to the surface.  It seems like the idea was
met with generally positive feedback, except that most people seemed
to want a slightly simpler system than what Simon was proposing.  I
suspect that this is mostly suffering from a lack of round tuits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] V3: Idle in transaction cancellation

2010-12-02 Thread Andres Freund
On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote:
 Andres Freund and...@anarazel.de wrote:
  On Tuesday 19 October 2010 16:18:29 Kevin Grittner wrote:
  For SSI purposes, it would be highly desirable to be able to set
  the SQLSTATE and message generated when the canceled transaction
  terminates.
  
  Ok, I implemented that capability, but the patch feels somewhat
  wrong to me,
 
  so its a separate patch on top the others:
 The patch is in context diff format, applies with minor offsets,
 compiles and passes regression tests.
 
 I have to admit that after reading the patch, I think I previously
 misunderstood the scope of it.  Am I correct in reading that the
 main thrust of this is to improve error handling on standbys?  Is
 there any provision for one backend to cause a *different* backend
 which is idle in a transaction to terminate cleanly when it attempts
 to process its next statement?  (That is what I was hoping to find,
 for use in the SSI patch.)
Do you want to terminate it immediately or on the next statement?

You might want to check out SendProcSignal() et al.
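
Roughly, the sending side of that mechanism looks like this (a sketch
only; the reason code shown is one of the existing recovery-conflict
reasons, not a proposed new API):

	#include "postgres.h"
	#include "storage/procsignal.h"

	/*
	 * Sketch: ask another backend, identified by (pid, backendId), to
	 * process an interrupt.  The target's signal handler only sets a
	 * flag; the backend acts on it at the next safe point, e.g. when it
	 * starts processing its next statement.
	 */
	static void
	nudge_backend(pid_t pid, BackendId backendId)
	{
		if (SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT_SNAPSHOT, backendId) < 0)
			elog(WARNING, "could not signal backend with PID %d", (int) pid);
	}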

 Anyway, if the third patch file is only there because of my request,
 I think it might be best to focus on the first two as a solution for
 the standby issues this was originally meant to address, and then to
 look at an API for the usage I have in mind after that is settled.
Besides the fact that I don't like the implementation very much, I think it's generally a 
good idea...



Re: [HACKERS] twitter_fdw-0.1

2010-12-02 Thread Josh Berkus

Hitoshi,


I've written twitter_fdw, which creates a twitter table that fetches
Twitter search results via a SELECT statement. This is done against the WIP SQL/MED
version of PostgreSQL (git sha-1 is
c5e027c32b8faf0e2d25b8deecb4d3c05399411c). To play with it, run make and
make install, run the install script as you usually do with contrib
modules, connect to the server, then,


This is way cool!  Will have to try it out.

--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Proposal: First step towards Intelligent, integrated database

2010-12-02 Thread Kevin Grittner
ghatpa...@vsnl.net wrote:
 
 If possible can you provide glimpses of History.
 
http://en.wikipedia.org/wiki/Navigational_database
 
http://en.wikipedia.org/wiki/MARK_IV_%28software%29
 
I don't want to go back there, myself.  YMMV.
 
-Kevin



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Josh Berkus

On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:

So the use case of parralel dump in a bunch or user given locations
would be to use different mount points (disk subsystems) at the same
time.  Not sure how releveant it is.


I think it will complicate this feature unnecessarily for 9.1. 
Personally, I need this patch so much I'm thinking of backporting it. 
However, having all the data go to one directory/mount wouldn't trouble 
me at all.


Now, if only I could think of some way to write a parallel dump to a set 
of pipes, I'd be in heaven.


--
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Proposal: First step towards Intelligent, integrated database

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 12:44 PM, Kevin Grittner wrote:

ghatpa...@vsnl.net  wrote:


If possible can you provide glimpses of History.


http://en.wikipedia.org/wiki/Navigational_database

http://en.wikipedia.org/wiki/MARK_IV_%28software%29

I don't want to go back there, myself.  YMMV.




Me either. BTDTGTTS. :-)

cheers

andrew



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 12:56 PM, Josh Berkus wrote:

On 12/02/2010 05:50 AM, Dimitri Fontaine wrote:

So the use case of parralel dump in a bunch or user given locations
would be to use different mount points (disk subsystems) at the same
time.  Not sure how releveant it is.


I think it will complicate this feature unnecessarily for 9.1. 
Personally, I need this patch so much I'm thinking of backporting it. 
However, having all the data go to one directory/mount wouldn't 
trouble me at all.


Now, if only I could think of some way to write a parallel dump to a 
set of pipes, I'd be in heaven.


The only way I can see that working sanely would be to have a program 
gathering stuff at the other end of the pipes, and ensuring it was all 
coherent. That would be a huge growth in scope for this, and I seriously 
doubt it's worth it.


cheers

andrew



Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Josh Berkus

 Now, if only I could think of some way to write a parallel dump to a
 set of pipes, I'd be in heaven.
 
 The only way I can see that working sanely would be to have a program
 gathering stuff at the other end of the pipes, and ensuring it was all
 coherent. That would be a huge growth in scope for this, and I seriously
 doubt it's worth it.

Oh, no question.  And there's workarounds ... sshfs, for example.  I'm
just thinking of the ad-hoc parallel backup I'm running today, which
relies heavily on pipes.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] V3: Idle in transaction cancellation

2010-12-02 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 
 Do you wan't to terminate it immediately or on next statement?
 
I want to have one backend terminate the transaction on another
backend as soon as practicable.  If a query is active, it would be
best if it was canceled.  It appears that if it is idle in
transaction there is a need to wait for the next request.  It would
be a big plus for the backend requesting the cancellation to be able
to specify the SQLSTATE, message, etc., used by the other backend on
failure.
 
 You might want to check out SendProcSignal() et al.
 
Will take a look.
 
 Besides that I dont like the implementation very much, I think its
 generally a good idea...
 
OK.  While browsing around, I'll try to think of an alternative
approach, but this is new territory for me -- I've been learning
about areas in the code at need so far
 
-Kevin



Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Jeff Davis
On Wed, 2010-12-01 at 23:22 -0500, Robert Haas wrote:
 Well, let's think about what we'd need to do to make CRCs work
 reliably.  There are two problems.
 
 1. [...]  If we CRC the entire page, the torn pages are never
 acceptable, so every action that modifies the page must be WAL-logged.
 
 2. Currently, we allow hint bits on a page to be updated while holding

[...]

The way I see it, here are the rules we are breaking, and why:

* We don't get an exclusive lock when dirtying a page with hint bits
- Why: we write while reading, and we want good concurrency.
- Why': because after a bulk load, we don't have any hint bits, and the
only way to get them set without VACUUM is to write while reading. I've
never been entirely sure why VACUUM isn't good enough in this case,
aside from the fact that a user might not run VACUUM (and autovacuum
might not either, if it was only a bulk load and no updates/deletes).

* We don't WAL log setting hint bits (which dirties a page)
- Why: because after a bulk load, we don't want to write the data a 4th
time

Hypothetically, if we had a bulk loading strategy, these problems would
go away, and we could follow the rules. Right? Is there a case other
than bulk loading which demands that we break these rules?

And, if we had a bulk loading path, we could probably get away with
writing the data only twice (today, we write it 3 times including the
hint bits) or maybe once if WAL archiving is off.

So, is there a case other than bulk loading for which we need to break
these rules? If not, perhaps we should consider bulk loading a different
problem, and simplify the design of all of these other features (and
allow new storage-touching features to come about, like CRCs, without
exponentially increasing the complexity with each one).

Regards,
Jeff Davis




Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Joachim Wieland
On Thu, Dec 2, 2010 at 12:56 PM, Josh Berkus j...@agliodbs.com wrote:
 Now, if only I could think of some way to write a parallel dump to a set of
 pipes, I'd be in heaven.

What exactly are you trying to accomplish with the pipes?

Joachim



Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 And, if we had a bulk loading path, we could probably get away
 with writing the data only twice (today, we write it 3 times
 including the hint bits) or maybe once if WAL archiving is off.
 
If you're counting WAL writes, you're low.  If you don't go out of
your way to avoid it, you are likely to write the data to the table
once during the bulk load, a second time on first read to set the
hint bits, and a third time to freeze data to prevent wrap-around. 
The initial write may or may not be WAL-logged.  The freezing
typically is WAL-logged.
 
So, you can easily write the data to disk four or five times.  With
luck these are spread out uniformly or happen during off-peak
periods.  Unmanaged, a WAL-logged freeze of bulk-loaded data is
somewhat more likely to occur, however, during hours of heavy OLTP
load, because transaction numbers are consumed so quickly.
 
Currently, a VACUUM FREEZE after a bulk load collapses at least two
of those writes to one.  With luck, some pages might still be dirty
in cache, and you can save two of the writes.
 
-Kevin



Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Greg Stark
On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith g...@2ndquadrant.com wrote:
 I ask because I don't have a mental model of how the pause can help.
 Given that this dirty data has been hanging around for many minutes
 already, what is a 3 second pause going to heal?


 The difference is that once an fsync call is made, dirty data is much more
 likely to be forced out.  It's the one thing that bypasses all other ways
 the kernel might try to avoid writing the data

I had always assumed the problem was that other I/O had been done to
the files in the meantime. I.e. the fsync is not just syncing the
checkpoint but any other blocks that had been flushed since the
checkpoint started. The longer the checkpoint is spread over, the more
other I/O is included as well.

Using sync_file_range you can specify the set of blocks to sync and
then block on them only after some time has passed. But there's no
documentation on how this relates to the I/O scheduler, so it's not
clear it would have any effect on the problem. We might still have to
delay the beginning of the sync to allow the dirty blocks to be synced
naturally, and then, when we issue it, still end up catching a lot of
other I/O as well.
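
For reference, the two-phase usage being described looks roughly like
this on Linux (a sketch, not PostgreSQL code; error handling omitted):

	#define _GNU_SOURCE
	#include <fcntl.h>

	/* phase 1: start writeback of just this range, without blocking */
	static void
	start_writeback(int fd, off_t offset, off_t nbytes)
	{
		sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
	}

	/* phase 2, issued later: wait for exactly that range to reach disk;
	 * blocks dirtied again since phase 1 are written and waited for too */
	static void
	finish_writeback(int fd, off_t offset, off_t nbytes)
	{
		sync_file_range(fd, offset, nbytes,
						SYNC_FILE_RANGE_WAIT_BEFORE |
						SYNC_FILE_RANGE_WRITE |
						SYNC_FILE_RANGE_WAIT_AFTER);
	}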




-- 
greg



Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Josh Berkus

 Using sync_file_range you can specify the set of blocks to sync and
 then block on them only after some time has passed. But there's no
 documentation on how this relates to the I/O scheduler so it's not
 clear it would have any effect on the problem. We might still have to
 delay the begining of the sync to allow the dirty blocks to be synced
 naturally and then when we issue it still end up catching a lot of
 other i/o as well.

This *really* sounds like we should be working with the FS geeks on
making the OS do this work for us.  Greg, you wanna go to LinuxCon next
year?

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] directory archive format for pg_dump

2010-12-02 Thread Heikki Linnakangas

Ok, committed, with some small cleanup since the last patch I posted.

Could you update the directory-format patch on top of the committed 
version, please?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Another proposal for table synonyms

2010-12-02 Thread Robert Haas
On Wed, Dec 1, 2010 at 3:55 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Josh Berkus's message of mié dic 01 17:13:35 -0300 2010:

  Well, porting applications from other database systems that support 
  synonyms
  (i.e. Oracle, DB2, SQL Server).

 SQL Server supports synonyms?  If it's not Oracle-only, it's a more
 powerful argument to have the feature.

 I think it's worth mentioning that in SQL Server, synonyms are not
 schema-qualified; they're global objects.

Seems like they have more than one kind.

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7001.htm

The section entitled "notes on public synonyms" is particularly
interesting, as it seems to imply that under some but not all
conditions synonyms get materialized inside schemas that use them.
The list of objects for which they support synonyms is also
interesting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another proposal for table synonyms

2010-12-02 Thread Alvaro Herrera
Excerpts from Robert Haas's message of jue dic 02 17:27:01 -0300 2010:
 On Wed, Dec 1, 2010 at 3:55 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Josh Berkus's message of mié dic 01 17:13:35 -0300 2010:
 
   Well, porting applications from other database systems that support 
   synonyms
   (i.e. Oracle, DB2, SQL Server).
 
  SQL Server supports synonyms?  If it's not Oracle-only, it's a more
  powerful argument to have the feature.
 
  I think it's worth mentioning that in SQL Server, synonyms are not
  schema-qualified; they're global objects.
 
 Seems like they have more than one kind.
 
 http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7001.htm

Yeah, the Oracle system is a lot more complex than SQL Server's, but I
was only talking about the latter, for which see here:

http://www.databasejournal.com/features/mssql/article.php/3635426/SYNONYM-in-SQL-Server-2005.htm

 The list of objects for which they support synonyms is also
 interesting.

The bit that allows a synonym to reference another synonym seems
worth considering further (either reject them altogether, or have some
way to deal with possible cycles).  I think the original proposal in
this thread didn't mention them at all.

(I don't think we should consider synonyms for either functions or
stored procedures; that would make the current mess of function
resolution rules a lot messier.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] V3: Idle in transaction cancellation

2010-12-02 Thread Kevin Grittner
Andres Freund and...@anarazel.de wrote:
 On Thursday 02 December 2010 00:48:53 Kevin Grittner wrote:

 Is there any provision for one backend to cause a *different*
 backend which is idle in a transaction to terminate cleanly when
 it attempts to process its next statement?
 
 You might want to check out SendProcSignal() et al.
 
Yeah, that was the missing link for me.  Thanks!
 
 Anyway, if the third patch file is only there because of my
 request, I think it might be best to focus on the first two as a
 solution for the standby issues this was originally meant to
 address, and then to look at an API for the usage I have in mind
 after that is settled.
 
 Besides that I don't like the implementation very much, I think it's
 generally a good idea...
 
Is it sane to leave the implementation of this for the specific
areas which need it (like SSI), or do you think a generalized API
for it is needed?
 
I'll look at it more closely tonight, but at first scan it appears
that just reserving one flag for PROCSIG_SERIALIZATION_FAILURE (or
PROCSIG_SSI_CANCELLATION?) would allow me to code up the desired
behavior in a function called from procsignal_sigusr1_handler.  I
can arrange for passing any needed detail through the SSI-controlled
structures somehow.  Would that allow you to skip the parts you
didn't like?
 
It looks as though this is something which could easily be split off
as a separate patch within the SSI effort.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] directory archive format for pg_dump

2010-12-02 Thread Alvaro Herrera
Excerpts from Heikki Linnakangas's message of jue dic 02 16:52:27 -0300 2010:
 Ok, committed, with some small cleanup since the last patch I posted.

I think the comments on _ReadBuf and friends need to be updated, since
they are not just for headers and TOC stuff anymore.  I'm not sure if
they were already outdated before your patch ...

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-12-02 Thread Alvaro Herrera
Excerpts from Andres Freund's message of sáb oct 30 05:49:21 -0300 2010:

 I'll set this up for the next commitfest; I don't think I can do much more
 without further input.

Are you reserving about 20 bits for levels, and 12 for flags?  Given the
relatively scarce growth of levels, I think we could live with about 6
or 7 bits for level, rest for flags.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] V3: Idle in transaction cancellation

2010-12-02 Thread Andres Freund
On Thursday 02 December 2010 22:21:37 Alvaro Herrera wrote:
 Excerpts from Andres Freund's message of sáb oct 30 05:49:21 -0300 2010:
  I'll set this up for the next commitfest; I don't think I can do much
  more without further input.
 
 Are you reserving about 20 bits for levels, and 12 for flags?  Given the
 relatively scarce growth of levels, I think we could live with about 6
 or 7 bits for level, rest for flags.
The number I picked was absolutely arbitrary, I admit. Neither did I think it
likely that we'd see more levels, nor did I foresee many flags, so I just
chose some number I liked at that particular moment ;-)

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 2:01 PM, Jeff Davis pg...@j-davis.com wrote:
 * We don't get an exclusive lock when dirtying a page with hint bits
 - Why: we write while reading, and we want good concurrency.
 - Why': because after a bulk load, we don't have any hint bits, and the
 only way to get them set without VACUUM is to write while reading. I've
 never been entirely sure why VACUUM isn't good enough in this case,
 aside from the fact that a user might not run VACUUM (and autovacuum
 might not either, if it was only a bulk load and no updates/deletes).

 * We don't WAL log setting hint bits (which dirties a page)
 - Why: because after a bulk load, we don't want to write the data a 4th
 time

 Hypothetically, if we had a bulk loading strategy, these problems would
 go away, and we could follow the rules. Right? Is there a case other
 than bulk loading which demands that we break these rules?

I'm not really convinced that this problem is confined to bulk
loading.  Every INSERT or UPDATE results in a new tuple that may need
hint bits set and eventually to be frozen.  A bulk load is just a time
when you do lots of inserts all at once; it seems to me that a large
update would cause all the same problems, plus bloat.  The triple I/O
problem exists for small transactions as well (and isn't desirable
there either); it's just less noticeable because the second and third
writes are, like the first one, small.

 And, if we had a bulk loading path, we could probably get away with
 writing the data only twice (today, we write it 3 times including the
 hint bits) or maybe once if WAL archiving is off.

It seems to me that a COPY command executed in a transaction with no
other open snapshots writing to a table created or truncated within
the same transaction should be able to write frozen tuples from the
get-go, regardless of anything else we do.

 So, is there a case other than bulk loading for which we need to break
 these rules? If not, perhaps we should consider bulk loading a different
 problem, and simplify the design of all of these other features (and
 allow new storage-touching features to come about, like CRCs, without
 exponentially increasing the complexity with each one).

I don't think we're exponentially increasing complexity - I think
we're incrementally improving our algorithms.  If you want to propose
a bulk loading path, great.  Propose away!  But without something a
bit more concrete, I don't think it would be appropriate to hold off
making the visibility map crash-safe, on the off chance that our
design for so doing might complicate something else we want to do
later.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 That's a big patch..

Not nearly big enough :-(

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 05:01 PM, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

That's a big patch..

Not nearly big enough :-(

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.




Well, snapshot cloning should allow that objection to be overcome, no?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 12/02/2010 05:01 PM, Tom Lane wrote:
 In the past, proposals for this have always been rejected on the grounds
 that it's impossible to assure a consistent dump if different
 connections are used to read different tables.  I fail to understand
 why that consideration can be allowed to go by the wayside now.

 Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first not second.

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's.  But the
snapshot issue is definitely a showstopper.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Bruce Momjian
Dimitri Fontaine wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  I don't see the point of the sort-by-relpages code. The order the objects
  are dumped should be irrelevant, as long as you obey the restrictions
  dictated by dependencies. Or is it only needed for the multiple-target-dirs
  feature? Frankly I don't see the point of that, so it would be good to cull
  it out at least in this first stage.
 
  From the talk at CHAR(10), and provided memory serves, it's an
  optimisation so that you're doing the largest file in one process and all the
  little files in other processes. In lots of cases the total pg_dump
  duration is then reduced to about the time to dump the biggest files.

Seems there should be a comment in the code explaining why this is being
done.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Jeff Davis
On Thu, 2010-12-02 at 17:00 -0500, Robert Haas wrote:
 I'm not really convinced that this problem is confined to bulk
 loading.  Every INSERT or UPDATE results in a new tuple that may need
 hint bits set and eventually to be frozen.  A bulk load is just a time
 when you do lots of inserts all at once; it seems to me that a large
 update would cause all the same problems, plus bloat.

A big UPDATE does a lot of work, I don't see any near-term approach for
solving that. Avoiding WAL (and full page writes) for the updates to
PD_ALL_VISIBLE, hint bits, VM bits, etc., would probably be the least of
my concerns in that case.

   The triple I/O
 problem exists for small transactions as well (and isn't desirable
 there either); it's just less noticeable because the second and third
 writes are, like the first one, small.

Bulk loading poses some unique challenges because there is no
opportunity to set PD_ALL_VISIBLE or hint bits before the loading is
complete; and by that time, many checkpoints will have already happened,
and the pages have already hit disk. That means we need to re-read them,
modify them, and write them again (plus WAL, if we were following the
rules).

Small transactions don't suffer from the same problems. They generally
begin and end without an intervening checkpoint. That means that you
have an opportunity to set PD_ALL_VISIBLE or hint bits before the
checkpoint happens, thus avoiding unnecessary extra writes.

Additionally, small transaction workloads will generally have, to some
approximation, some working set of pages. So, even if you do a read of a
single tuple, write PD_ALL_VISIBLE and hint bits (thus dirtying the
page), there is a reasonable chance that someone will come by later and
do an insert/update/delete (thus forcing WAL anyway).

And if the small transaction workload is completely random and you
aren't touching the same pages between checkpoints, then setting hint
bits one-at-a-time is not a good strategy anyway. It would be much
better to do it in bulk with a VACUUM. And if VACUUM does anything
significant to a page, it's going to WAL anyway.

I'm having trouble seeing a case other than bulk-loading which causes a
real problem. Maybe a small-transaction workload with a few long-running
transactions? Even that doesn't seem so bad.

 It seems to me that a COPY command executed in a transaction with no
 other open snapshots writing to a table created or truncated within
 the same transaction should be able to write frozen tuples from the
 get-go, regardless of anything else we do.

Well, some transaction might pick up a snapshot between the time you
begin the copy and the time it commits. We'd need to prevent such a
transaction from actually reading the table.

 I don't think it would be appropriate to hold off
 making the visibility map crash-safe, on the off chance that our
 design for so doing might complicate something else we want to do
 later.

I'm not suggesting we hold off on it at all. To the contrary, I'm
suggesting that we simply log updates of PD_ALL_VISIBLE as well as VM
bits, at least until a performance problem presents itself. That will
_simplify_ the design.

Then, when a performance problem does present itself for a certain use
case, we can see how to fix it. If many cases are affected, then we
might choose one of these more creative solutions that breaks the rules
in controlled ways, understanding the trade-offs. If only bulk loading
is affected, we might choose to address that case directly.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 05:32 PM, Tom Lane wrote:

Andrew Dunstanand...@dunslane.net  writes:

On 12/02/2010 05:01 PM, Tom Lane wrote:

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first not second.


Yes, I agree with that.


(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's.  But the
snapshot issue is definitely a showstopper.)





Why is that more an issue with parallel pg_dump?

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] We really ought to do something about O_DIRECT and data=journalled on ext4

2010-12-02 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 On 11/30/2010 11:17 PM, Tom Lane wrote:
  Andrew Dunstanand...@dunslane.net  writes:
  On 11/30/2010 10:09 PM, Tom Lane wrote:
  We should wait for the outcome of the discussion about whether to change
  the default wal_sync_method before worrying about this.
  we've just had a significant PGX customer encounter this with the latest
  Postgres on Redhat's freshly released flagship product. Presumably the
  default wal_sync_method will only change prospectively.
  I don't think so.  The fact that Linux is changing underneath us is a
  compelling reason for back-patching a change here.  Our older branches
  still have to be able to run on modern OS versions.  I'm also fairly
  unclear on what you think a fix would look like if it's not effectively
  a change in the default.
 
  (Hint: this *will* be changing, one way or another, in Red Hat's version
  of 8.4, since that's what RH is shipping in RHEL6.)
 
  
 
 Well, my initial idea was that if PG_O_DIRECT is non-zero, we should 
 test at startup time if we can use it on the WAL file system and inhibit 
 its use if not.
 
 Incidentally, I notice it's not used at all in test_fsync.c - should it 
 not be?

test_fsync certainly should be using PG_O_DIRECT in the same places the
backend does.  Once we decide how to handle PG_O_DIRECT, I will modify
test_fsync to match.
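
(For what it's worth, the startup-time test Andrew describes could be
roughly this simple -- hypothetical sketch only, with an invented probe
file name; a real check would probably also need to attempt a write,
since some filesystems only reject O_DIRECT at write time:

#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>
/* PG_O_DIRECT comes from the backend headers; it is 0 where unsupported */

static bool
wal_o_direct_usable(const char *probe_path)
{
    int     fd;

    fd = open(probe_path, O_RDWR | O_CREAT | PG_O_DIRECT, 0600);
    if (fd < 0)
        return (errno != EINVAL);   /* only EINVAL clearly means "no O_DIRECT here" */
    close(fd);
    unlink(probe_path);
    return true;
}
)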

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] should we set hint bits without dirtying the page?

2010-12-02 Thread Robert Haas
In a sleepy email late last night on the crash-safe visibility map
thread, I proposed introducing a new buffer state BM_UNTIDY.  When a
page is dirtied by a hint bit update, we mark it untidy but not dirty.
 Untidy buffers would be treated as dirty by the background writer
cleaning scan, but as clean by checkpoints and by backends doing
emergency buffer cleaning to feed new allocations.  This would have
the effect of rate-limiting the number of buffers that we write just
for hint-bit updates.  With default settings, we'd write at most
bgwriter_lru_maxpages * (1000 ms/second / bgwriter_delay) untidy pages
per second, which works out to 4MB/second of write traffic with
default settings.  That seems like it might be enough to prevent the
"bulk load followed by SELECT" access pattern from totally swamping
the machine with write traffic, while still ensuring that all the hint
bits eventually do get set.
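
(Spelling out that arithmetic with the stock defaults,
bgwriter_lru_maxpages = 100 and bgwriter_delay = 200 ms:

    100 pages * (1000 ms / 200 ms) = 500 pages/s * 8 kB/page ~= 4 MB/s

assuming the default 8 kB block size.)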

I then got to wondering whether we should even go a step further, and
simply decree that a page with only hint bit updates is not dirty and
won't be written, period.  If your working set fits in RAM, this isn't
really a big deal because you'll read the pages in once, set the hint
bits, and those pages will just stick around.  Where it's a problem is
when you have a huge table that you're scanning over and over again,
especially if data in that table was loaded by many different, widely
spaced XIDs that require looking at many different CLOG pages.  But
maybe we could ameliorate that problem by freezing more aggressively.
As soon as all tuples on the page are all-visible, VACUUM will freeze
every tuple on the page (setting a HEAP_XMIN_FROZEN bit rather than
actually overwriting XMIN, to preserve forensic information) and mark
it all-visible in a single WAL-logged operation.  Also, we could have
the background writer (!) try to perform this same operation on pages
evicted during the cleaning scan.  This would impose the same sort of
I/O cap as the previous idea, although it would generate not only page
writes but also WAL activity.

The result would be not only to reduce the number of times we write
the page (which, right now, can be as much as 3 * number_of_tuples, if
we insert, hint-bit update, and then freeze each tuple separately),
but also to make the freezing happen gradually over time rather than
in a sudden spike when the XID age cut-off is reached.  This would
also be advantageous for index-only scans, because a large insert only
table would gradually accumulate frozen pages without ever being
vacuumed.  The gradual freezing wouldn't apply in all cases - in
particular, if you have a large insert-only table that you never
actually read anything out of, you'd still get a spike when the XID
age cut-off is reached.  I'm inclined to think it would still be a big
improvement over the status quo - you'd write the table twice instead
of three times, and the second one would often be spread out rather
than all at once.

I foresee various objections.  One is that freezing will force FPIs,
so you'll still be writing the data three times.  Of course, if you
count FPIs, we're now writing the data four times, but under this
scheme much more data would stick around long enough to get frozen, so
the objection has merit.  However, I think we can avoid this too, by
allocating an additional bit in pd_flags, PD_FPI.  Instead of emitting
an FPI when the old LSN precedes the redo pointer, we'll emit an FPI
when the FPI bit is set (in which case we'll also clear the bit) OR
when the old LSN precedes the redo pointer.  Upon emitting a WAL
record that is torn-page safe (such as a freeze or all-visible
record), we'll pass a flag to XLogInsert that arranges to suppress
FPIs, bump the LSN, and set PD_FPI.  That way, if the page is touched
again before the next checkpoint by an operation that does NOT
suppress FPI, one will be emitted then.
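
In pseudocode, the modified decision might look something like this
(PD_FPI is the flag proposed above; the helper name and details are
invented for illustration, this is not existing code):

static bool
fpi_needed(Page page, XLogRecPtr RedoRecPtr, bool torn_page_safe)
{
    PageHeader  ph = (PageHeader) page;

    if (torn_page_safe)
    {
        /* suppress the FPI now, but remember that one is still owed */
        ph->pd_flags |= PD_FPI;
        return false;
    }

    /* ordinary record: FPI if one is owed, or if not written since redo */
    if ((ph->pd_flags & PD_FPI) || XLByteLE(PageGetLSN(page), RedoRecPtr))
    {
        ph->pd_flags &= ~PD_FPI;    /* the FPI we emit now satisfies it */
        return true;
    }
    return false;
}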

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 6:37 PM, Jeff Davis pg...@j-davis.com wrote:
 It seems to me that a COPY command executed in a transaction with no
 other open snapshots writing to a table created or truncated within
 the same transaction should be able to write frozen tuples from the
 get-go, regardless of anything else we do.

 Well, some transaction might pick up a snapshot between the time you
 begin the copy and the time it commits. We'd need to prevent such a
 transaction from actually reading the table.

Oh, hmm.  That's awkward.  I guess under present semantics it can see
the table - but not its contents - once the inserting transaction has
committed.  That stinks.

 I don't think it would be appropriate to hold off
 making the visibility map crash-safe, on the off chance that our
 design for so doing might complicate something else we want to do
 later.

 I'm not suggesting we hold off on it at all. To the contrary, I'm
 suggesting that we simply log updates of PD_ALL_VISIBLE as well as VM
 bits, at least until a performance problem presents itself. That will
 _simplify_ the design.

 Then, when a performance problem does present itself for a certain use
 case, we can see how to fix it. If many cases are affected, then we
 might choose one of these more creative solutions that breaks the rules
 in controlled ways, understanding the trade-offs. If only bulk loading
 is affected, we might choose to address that case directly.

I don't think that you can seriously suggest that emitting that volume
of FPIs isn't going to be a problem immediately.  We have to have some
solution to that problem out of the gate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another proposal for table synonyms

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 3:43 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Robert Haas's message of jue dic 02 17:27:01 -0300 2010:
 On Wed, Dec 1, 2010 at 3:55 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Josh Berkus's message of mié dic 01 17:13:35 -0300 2010:
 
   Well, porting applications from other database systems that support 
   synonyms
   (i.e. Oracle, DB2, SQL Server).
 
  SQL Server supports synonyms?  If it's not Oracle-only, it's a more
  powerful argument to have the feature.
 
  I think it's worth mentioning that in SQL Server, synonyms are not
  schema-qualified; they're global objects.

 Seems like they have more than one kind.

 http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_7001.htm

 Yeah, the Oracle system is a lot more complex than SQL Server's, but I
 was only talking about the latter, for which see here:

 http://www.databasejournal.com/features/mssql/article.php/3635426/SYNONYM-in-SQL-Server-2005.htm

Well, that seems primarily designed to cut down on three and four part
names.  We don't have that problem anyway.

 The list of objects for which they support synonyms is also
 interesting.

 The bit that allows a synonym to reference another synonym seems like
 worth considering further (either reject them altogether, or have some
 way to deal with possible cycles).

It's pretty trivial to do cycle-detection at runtime.
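For instance, a hop counter while resolving the chain is enough (sketch
only; the lookup helpers here are hypothetical, since no synonym code
exists yet):

#define MAX_SYNONYM_HOPS    32      /* arbitrary cap; also guards against cycles */

static Oid
resolve_synonym(Oid relid)
{
    int     hops = 0;

    while (object_is_synonym(relid))        /* hypothetical lookup */
    {
        if (++hops > MAX_SYNONYM_HOPS)
            elog(ERROR, "synonym chain too long (possible cycle)");
        relid = get_synonym_target(relid);  /* hypothetical lookup */
    }
    return relid;
}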

 I think the original proposal in
 this thread didn't mention them at all.

 (I don't think we should consider synonyms for either functions or
 stored procedures; that would make the current mess of function
 resolution rules a lot messier.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spread checkpoint sync

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 2:24 PM, Greg Stark gsst...@mit.edu wrote:
 On Wed, Dec 1, 2010 at 4:25 AM, Greg Smith g...@2ndquadrant.com wrote:
 I ask because I don't have a mental model of how the pause can help.
 Given that this dirty data has been hanging around for many minutes
 already, what is a 3 second pause going to heal?


 The difference is that once an fsync call is made, dirty data is much more
 likely to be forced out.  It's the one thing that bypasses all other ways
 the kernel might try to avoid writing the data

 I had always assumed the problem was that other I/O had been done to
 the files in the meantime. I.e. the fsync is not just syncing the
 checkpoint but any other blocks that had been flushed since the
 checkpoint started.

It strikes me that we might start the syncs of the files that the
checkpoint isn't going to dirty further at the start of the
checkpoint, and do the rest at the end.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DELETE with LIMIT (or my first hack)

2010-12-02 Thread Bruce Momjian
Peter Eisentraut wrote:
 On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote:
  I agree, that argument is completely misconceived. If the DBA is
  paying enough attention to use LIMIT, s/he should be paying enough
  attention not to do damage in the first place. If that were the only
  argument in its favor I'd be completely against the feature.
 
 I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could
 be very useful if you are doing full-table updates and you don't have
 enough space so you do it in chunks.

So should this now be a TODO item?  Text?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 5:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andrew Dunstan and...@dunslane.net writes:
 On 12/02/2010 05:01 PM, Tom Lane wrote:
 In the past, proposals for this have always been rejected on the grounds
 that it's impossible to assure a consistent dump if different
 connections are used to read different tables.  I fail to understand
 why that consideration can be allowed to go by the wayside now.

 Well, snapshot cloning should allow that objection to be overcome, no?

 Possibly, but we need to see that patch first not second.

Yes, by all means let's allow the perfect to be the enemy of the good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DELETE with LIMIT (or my first hack)

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 7:12 PM, Bruce Momjian br...@momjian.us wrote:
 Peter Eisentraut wrote:
 On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote:
  I agree, that argument is completely misconceived. If the DBA is
  paying enough attention to use LIMIT, s/he should be paying enough
  attention not to do damage in the first place. If that were the only
  argument in its favor I'd be completely against the feature.

 I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could
 be very useful if you are doing full-table updates and you don't have
 enough space so you do it in chunks.

 So should this now be a TODO item?  Text?

Allow DELETE and UPDATE to be used with LIMIT and ORDER BY.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another proposal for table synonyms

2010-12-02 Thread Alvaro Herrera
Excerpts from Robert Haas's message of jue dic 02 21:10:48 -0300 2010:
 On Thu, Dec 2, 2010 at 3:43 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Robert Haas's message of jue dic 02 17:27:01 -0300 2010:

  Yeah, the Oracle system is a lot more complex than SQL Server's, but I
  was only talking about the latter, for which see here:
 
  http://www.databasejournal.com/features/mssql/article.php/3635426/SYNONYM-in-SQL-Server-2005.htm
 
 Well, that seems primarily designed to cut down on three and four part
 names.  We don't have that problem anyway.

Right.  (My point here is that SQL Server is not good guidance on what
the synonym system should do.)

  The list of objects for which they support synonyms is also
  interesting.
 
  The bit that allows a synonym to reference another synonym seems like
  worth considering further (either reject them altogether, or have some
  way to deal with possible cycles).
 
 It's pretty trivial to do cycle-detection at runtime.

No disagreement on that, but something needs to be decided.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] should we set hint bits without dirtying the page?

2010-12-02 Thread Josh Berkus
On 12/2/10 4:00 PM, Robert Haas wrote:
 As soon as all tuples on the page are all-visible, VACUUM will freeze
 every tuple on the page (setting a HEAP_XMIN_FROZEN bit rather than
 actually overwriting XMIN, to preserve forensic information) and mark
 it all-visible in a single WAL-logged operation.  Also, we could have
 the background writer (!) try to perform this same operation on pages
 evicted during the cleaning scan.  This would impose the same sort of
 I/O cap as the previous idea, although it would generate not only page
 writes but also WAL activity.

I would love this.  It would also help considerably with the "freezing
already cold data" problem ... if we were allowed to treat the frozen
bit as canonical and not update any of the tuples.  While never needing
to touch pages at all for freezing is my preference, updating them while
they're in memory anyway is a close second.

Hmm.  That doesn't work, though; the page can contain tuples which are
attached to rolledback XIDs.  Also, autovacuum would have no way of
knowing which pages are frozen without reading them.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 07:13 PM, Robert Haas wrote:

On Thu, Dec 2, 2010 at 5:32 PM, Tom Lanet...@sss.pgh.pa.us  wrote:

Andrew Dunstanand...@dunslane.net  writes:

On 12/02/2010 05:01 PM, Tom Lane wrote:

In the past, proposals for this have always been rejected on the grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first not second.

Yes, by all means let's allow the perfect to be the enemy of the good.



That seems like a bit of an easy shot. Requiring that parallel pg_dump 
produce a dump that is as consistent as non-parallel pg_dump currently 
produces isn't unreasonable. It's not stopping us moving forward, it's 
just not wanting to go backwards.


And it shouldn't be terribly hard. IIRC Joachim has already done some 
work on it.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DELETE with LIMIT (or my first hack)

2010-12-02 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Dec 2, 2010 at 7:12 PM, Bruce Momjian br...@momjian.us wrote:
  Peter Eisentraut wrote:
  On tis, 2010-11-30 at 14:20 -0500, Andrew Dunstan wrote:
   I agree, that argument is completely misconceived. If the DBA is
   paying enough attention to use LIMIT, s/he should be paying enough
   attention not to do damage in the first place. If that were the only
   argument in its favor I'd be completely against the feature.
 
  I don't have any use for DELETE with LIMIT, but UPDATE with LIMIT could
  be very useful if you are doing full-table updates and you don't have
  enough space so you do it in chunks.
 
  So should this now be a TODO item?  Text?
 
 Allow DELETE and UPDATE to be used with LIMIT and ORDER BY.

Done:

Allow DELETE and UPDATE to be used with LIMIT and ORDER BY

* http://archives.postgresql.org/pgsql-hackers/2010-11/msg01997.php
* http://archives.postgresql.org/pgsql-hackers/2010-12/msg00021.php 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] should we set hint bits without dirtying the page?

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 7:19 PM, Josh Berkus j...@agliodbs.com wrote:
 On 12/2/10 4:00 PM, Robert Haas wrote:
 As soon as all tuples on the page are all-visible, VACUUM will freeze
 every tuple on the page (setting a HEAP_XMIN_FROZEN bit rather than
 actually overwriting XMIN, to preserve forensic information) and mark
 it all-visible in a single WAL-logged operation.  Also, we could have
 the background writer (!) try to perform this same operation on pages
 evicted during the cleaning scan.  This would impose the same sort of
 I/O cap as the previous idea, although it would generate not only page
 writes but also WAL activity.

 I would love this.  It would also help considerably with the freezing
 already cold data problem ... if we were allowed to treat the frozen
 bit as canonical and not update any of the tuples.  While never needing
 to touch pages at all for freezing is my preference, updating them while
 they're in memory anyway is a close second.

 Hmm.  That doesn't work, though; the page can contain tuples which are
 attached to rolledback XIDs.

Sure, well, any pages that are not all-visible will need to get
vacuumed before they get marked all-visible.  I can't fix that
problem.  But the more we freeze opportunistically before vacuum, the
less painful vacuum will be when it finally kicks in.  I don't
anticipate this is going to be perfect; I'd be happy if we could
achieve better.

 Also, autovacuum would have no way of
 knowing which pages are frozen without reading them.

Well, reading them is still better than reading them and then writing
them.  But in the long term I imagine we can avoid even doing that
much.  If we have a crash-safe visibility map and an aggressive
freezing policy that freezes all tuples on the page before marking it
all-visible, then even an anti-wraparound vacuum needn't scan
all-visible pages.  We might not feel confident to rely on that right
away, but I think over the long term we can hope to get there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take three

2010-12-02 Thread Jeff Davis
On Thu, 2010-12-02 at 19:06 -0500, Robert Haas wrote:
 I don't think that you can seriously suggest that emitting that volume
 of FPIs isn't going to be a problem immediately.  We have to have some
 solution to that problem out of the gate.

Fair enough. I think you understand my point, and it's easy enough to
remove complexity later if it makes sense to do so.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Robert Haas
On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstan and...@dunslane.net wrote:
 In the past, proposals for this have always been rejected on the
 grounds
 that it's impossible to assure a consistent dump if different
 connections are used to read different tables.  I fail to understand
 why that consideration can be allowed to go by the wayside now.
 Well, snapshot cloning should allow that objection to be overcome, no?
 Possibly, but we need to see that patch first not second.
 Yes, by all means let's allow the perfect to be the enemy of the good.


 That seems like a bit of an easy shot. Requiring that parallel pg_dump
 produce a dump that is as consistent as non-parallel pg_dump currently
 produces isn't unreasonable.  It's not stopping us moving forward, it's just
 not wanting to go backwards.

I certainly agree that would be nice.  But if Joachim thought the
patch were useless without that, perhaps he wouldn't have bothered
writing it at this point.  In fact, he doesn't think that, and he
mentioned the use cases he sees in his original post.  But even
supposing you wouldn't personally find this useful in those
situations, how can you possibly say that HE wouldn't find it useful
in those situations?  I understand that people sometimes show up here
and ask for ridiculous things, but I don't think we should be too
quick to attribute ridiculousness to regular contributors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 07:48 PM, Robert Haas wrote:

On Thu, Dec 2, 2010 at 7:21 PM, Andrew Dunstanand...@dunslane.net  wrote:

In the past, proposals for this have always been rejected on the
grounds
that it's impossible to assure a consistent dump if different
connections are used to read different tables.  I fail to understand
why that consideration can be allowed to go by the wayside now.

Well, snapshot cloning should allow that objection to be overcome, no?

Possibly, but we need to see that patch first not second.

Yes, by all means let's allow the perfect to be the enemy of the good.


That seems like a bit of an easy shot. Requiring that parallel pg_dump
produce a dump that is as consistent as non-parallel pg_dump currently
produces isn't unreasonable.  It's not stopping us moving forward, it's just
not wanting to go backwards.

I certainly agree that would be nice.  But if Joachim thought the
patch were useless without that, perhaps he wouldn't have bothered
writing it at this point.  In fact, he doesn't think that, and he
mentioned the use cases he sees in his original post.  But even
supposing you wouldn't personally find this useful in those
situations, how can you possibly say that HE wouldn't find it useful
in those situations?  I understand that people sometimes show up here
and ask for ridiculous things, but I don't think we should be too
quick to attribute ridiculousness to regular contributors.



Umm, nobody has attributed ridiculousness to anyone. Please don't put 
words in my mouth. But I think this is a perfectly reasonable discussion 
to have. Nobody gets to come along and get the features they want 
without some sort of consensus, not me, not you, not Joachim, not Tom.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Instrument checkpoint sync calls

2010-12-02 Thread Jeff Janes
On Tue, Nov 30, 2010 at 12:15 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Tue, Nov 30, 2010 at 8:38 AM, Greg Smith g...@2ndquadrant.com wrote:


 Hi Greg,

 Thanks for the update.



 This might be ready for some proper review now.  I know there's at least one
 blatant bug still in here I haven't found yet, related to how the averages
 are computed.

 Yes, the blatant bug:

 average_sync_time = CheckpointStats.ckpt_longest_sync /
 CheckpointStats.ckpt_sync_rels;

 That should clearly be ckpt_agg_sync_time, not ckpt_longest_sync.

I've attached a tiny patch to apply over yours, to deal with this and
with the case where no files are synced.
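
I.e., roughly this (sketch only; the real change is in the attached patch):

    if (CheckpointStats.ckpt_sync_rels > 0)
        average_sync_time = CheckpointStats.ckpt_agg_sync_time /
                            CheckpointStats.ckpt_sync_rels;
    else
        average_sync_time = 0;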

Combining this instrumentation patch with the backend sync one already
committed, the net result under log_min_messages=debug1 is somewhat
undesirable in that I can now see the individual sync times for the
syncs done by the checkpoint writer, but I do not get times for the
syncs done by the backend (I only get informed of their existence).

I don't know what I would propose to fix this.  Having the reportage
of sync time of backend syncs be controlled by log_checkpoints seems
somewhat perverse, but the only alternative I see is to have
log_min_messages=debug1 always report times for the backend syncs.  Or
to have them go unreported altogether.

Cheers,

Jeff


patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Author names in source files

2010-12-02 Thread Itagaki Takahiro
I found there are some author names in some of the source files.
Will we keep their names? And will we add the names of new authors
who write patches for those files?

src/backend/postmaster/syslogger.c(15)  : * Author: Andreas Pflug
src/backend/utils/adt/genfile.c(9)  : * Author: Andreas Pflug
src/backend/utils/adt/oracle_compat.c(7): * Author: Edmund Mergl
src/backend/utils/adt/txid.c(14): * Author: Jan Wieck, Afilias USA 
INC.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hi- How frequently Postgres Poll for trigger file

2010-12-02 Thread Fujii Masao
On Thu, Dec 2, 2010 at 12:16 AM, Euler Taveira de Oliveira
eu...@timbira.com wrote:
 As you said, there are platforms that a signal doesn't wake up a
 process, so I suggest (ii) but I'm fine to include (i) at docs too.

Can we use pg_ctl kill to send the signal on such platforms?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Robert Haas
On Dec 2, 2010, at 8:11 PM, Andrew Dunstan and...@dunslane.net wrote:
 Umm, nobody has attributed ridiculousness to anyone. Please don't put words 
 in my mouth. But I think this is a perfectly reasonable discussion to have. 
 Nobody gets to come along and get the features they want without some sort of 
 consensus, not me, not you, not Joachim, not Tom.

I'm not disputing that we COULD reject the patch. I AM disputing that we've 
made a cogent argument for doing so.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 12/02/2010 05:32 PM, Tom Lane wrote:
 (I'm not actually convinced that snapshot cloning is the only problem
 here; locking could be an issue too, if there are concurrent processes
 trying to take locks that will conflict with pg_dump's.  But the
 snapshot issue is definitely a showstopper.)

 Why is that more an issue with parallel pg_dump?

The scenario that bothers me is

1. pg_dump parent process AccessShareLocks everything to be dumped.

2. somebody else tries to acquire AccessExclusiveLock on table foo.

3. pg_dump child process is told to dump foo, tries to acquire
AccessShareLock.

Now, process 3 is blocked behind process 2 is blocked behind process 1
which is waiting for 3 to complete.  Can you say undetectable deadlock?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 09:09 PM, Tom Lane wrote:

Andrew Dunstanand...@dunslane.net  writes:

On 12/02/2010 05:32 PM, Tom Lane wrote:

(I'm not actually convinced that snapshot cloning is the only problem
here; locking could be an issue too, if there are concurrent processes
trying to take locks that will conflict with pg_dump's.  But the
snapshot issue is definitely a showstopper.)

Why is that more an issue with parallel pg_dump?

The scenario that bothers me is

1. pg_dump parent process AccessShareLocks everything to be dumped.

2. somebody else tries to acquire AccessExclusiveLock on table foo.
hmm.
3. pg_dump child process is told to dump foo, tries to acquire
AccessShareLock.

Now, process 3 is blocked behind process 2 is blocked behind process 1
which is waiting for 3 to complete.  Can you say undetectable deadlock?




Hmm. Yeah. Maybe we could get around it if we prefork the workers and 
they all acquire locks on everything to be dumped up front in nowait 
mode, right after the parent, and if they can't the whole dump fails. Or 
something along those lines.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Umm, nobody has attributed ridiculousness to anyone. Please don't put 
 words in my mouth. But I think this is a perfectly reasonable discussion 
 to have. Nobody gets to come along and get the features they want 
 without some sort of consensus, not me, not you, not Joachim, not Tom.

In particular, this issue *has* been discussed before, and there was a
consensus that preserving dump consistency was a requirement.  I don't
think that Joachim gets to bypass that decision just by submitting a
patch that ignores it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 12/02/2010 09:09 PM, Tom Lane wrote:
 Now, process 3 is blocked behind process 2 is blocked behind process 1
 which is waiting for 3 to complete.  Can you say undetectable deadlock?

 Hmm. Yeah. Maybe we could get around it if we prefork the workers and 
 they all acquire locks on everything to be dumped up front in nowait 
 mode, right after the parent, and if they can't the whole dump fails. Or 
 something along those lines.

[ thinks for a bit... ]  Actually it might be good enough if a child
simply takes the lock it needs in nowait mode, and reports failure on
error.  We know the parent already has that lock, so the only way that
the child's request can fail is if something conflicting with
AccessShareLock is queued up behind the parent's lock.  So failure to
get the child lock immediately proves that the deadlock case applies.
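
In other words, each worker could do something like this for every table
it is told to dump (illustrative libpq sketch; the helper name is
invented, and real code would quote/escape the table name properly):

#include <stdbool.h>
#include <stdio.h>
#include <libpq-fe.h>

static bool
lock_table_nowait(PGconn *conn, const char *qualified_name)
{
    char        sql[1024];
    PGresult   *res;
    bool        ok;

    snprintf(sql, sizeof(sql),
             "LOCK TABLE %s IN ACCESS SHARE MODE NOWAIT", qualified_name);
    res = PQexec(conn, sql);
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    PQclear(res);
    return ok;      /* false => the deadlock case above; abort the dump */
}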

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] should we set hint bits without dirtying the page?

2010-12-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I then got to wondering whether we should even go a step further, and
 simply decree that a page with only hint bit updates is not dirty and
 won't be written, period.

This sort of thing has been discussed before.  It seems fairly clear to
me that any of these variations represents a performance tradeoff: some
cases will get better and some will get worse.  I think we are not going
to get far unless we can agree on a set of benchmark cases that we'll
use to decide whether the tradeoff is a win or not.  How can we arrive
at that?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 09:41 PM, Tom Lane wrote:

Andrew Dunstanand...@dunslane.net  writes:

On 12/02/2010 09:09 PM, Tom Lane wrote:

Now, process 3 is blocked behind process 2 is blocked behind process 1
which is waiting for 3 to complete.  Can you say undetectable deadlock?

Hmm. Yeah. Maybe we could get around it if we prefork the workers and
they all acquire locks on everything to be dumped up front in nowait
mode, right after the parent, and if they can't the whole dump fails. Or
something along those lines.

[ thinks for a bit... ]  Actually it might be good enough if a child
simply takes the lock it needs in nowait mode, and reports failure on
error.  We know the parent already has that lock, so the only way that
the child's request can fail is if something conflicting with
AccessShareLock is queued up behind the parent's lock.  So failure to
get the child lock immediately proves that the deadlock case applies.





Yeah, that would be a whole lot simpler. It would avoid the deadlock, 
but it would have lots more chances for failure. But it would at least 
be a good place to start.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-02 Thread Bruce Momjian
Itagaki Takahiro wrote:
 I found there are some author names in some of source files.
 Will we keep their names? And, will we add new author names
 who wrote patches for those files?
 
 src/backend/postmaster/syslogger.c(15)  : * Author: Andreas Pflug
 src/backend/utils/adt/genfile.c(9)  : * Author: Andreas Pflug
 src/backend/utils/adt/oracle_compat.c(7): *   Author: Edmund Mergl
 src/backend/utils/adt/txid.c(14): *   Author: Jan Wieck, Afilias USA 
 INC.

When someone writes a file from scratch, we usually keep their name on
it so we remember the original author.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-12-02 Thread Andy Colson



2nd) I can get the data to stick around after restart. Though not reliably. In 
general:

create and fill a table, vacuum it (not sure if it's important, I do it because
that's what I'd done in my pgbench testing where I noticed the data stuck
around), wait an hour (I usually left it for 12-24 hours, but recreated it with 
as little as a half hour), then restart pg. Sometimes the data is there... 
sometimes not.

I also filled my table with more data than memory would hold so it would spill 
to disk, again, because it recreates my pgbench setup.

I'm still working on finding the exact steps, but I wanted to get you #1 above.

-Andy







Ok, forget the time thing.  Has nothing to do with it.  (Which everyone already 
assumed I imagine).

Its truncate.

Create an unlogged table, fill it, truncate it, fill it again, restart pg, and the 
data will still be there.

-Andy

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-02 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of vie dic 03 00:15:28 -0300 2010:
 Itagaki Takahiro wrote:
  I found there are some author names in some of source files.
  Will we keep their names? And, will we add new author names
  who wrote patches for those files?
  
  src/backend/postmaster/syslogger.c(15)  : * Author: Andreas Pflug
  src/backend/utils/adt/genfile.c(9)  : * Author: Andreas Pflug
  src/backend/utils/adt/oracle_compat.c(7): *Author: Edmund Mergl
  src/backend/utils/adt/txid.c(14): *Author: Jan Wieck, Afilias 
  USA INC.
 
 When someone writes a file from scratch, we usually keep their name on
 it so we remember the original author.

The reason Takahiro-san is asking the question is because one of
Dimitri's patches adds a function to genfile.c, and adds his name to the
Author field while at it.  So answering the second part of the
question is important.  I take it your answer to that is no.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] unlogged tables

2010-12-02 Thread Alvaro Herrera
Excerpts from Andy Colson's message of Fri Dec 03 00:37:17 -0300 2010:

 Ok, forget the time thing.  Has nothing to do with it.  (Which everyone 
 already assumed I imagine).
 
 It's truncate.
 
 Create an unlogged table, fill it, truncate it, fill it again, restart pg, and 
 the data will still be there.

Hmm, presumably the table rewrite thing in truncate is not preserving
the unlogged state (perhaps it's the swap-relfilenode business).  Does
CLUSTER have a similar effect?  What about VACUUM FULL?  If so, you know
where the bug is.
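Something along these lines would tell you, assuming an unlogged table t like
the one in the reproduction (a sketch, not a tested recipe):

  CREATE INDEX t_id_idx ON t (id);
  CLUSTER t USING t_id_idx;
  -- restart and check whether the data survived
  VACUUM FULL t;
  -- restart and check again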

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-02 Thread Andrew Dunstan



On 12/02/2010 10:15 PM, Bruce Momjian wrote:

Itagaki Takahiro wrote:

I found there are some author names in some of source files.
Will we keep their names? And, will we add new author names
who wrote patches for those files?

src/backend/postmaster/syslogger.c(15)  : * Author: Andreas Pflug
src/backend/utils/adt/genfile.c(9)  : * Author: Andreas Pflug
src/backend/utils/adt/oracle_compat.c(7): * Author: Edmund Mergl
src/backend/utils/adt/txid.c(14): * Author: Jan Wieck, Afilias USA 
INC.

When someone writes a file from scratch, we usually keep their name on
it so we remember the original author.




That's probably reasonable for a while. But the code drifts. I certainly 
wouldn't object to my name being removed from initdb.c, after 7 years, 
although probably half the code or more is still mine.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-02 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Bruce Momjian's message of Fri Dec 03 00:15:28 -0300 2010:
 When someone writes a file from scratch, we usually keep their name on
 it so we remember the original author.

 The reason Takahiro-san is asking the question is because one of
 Dimitri's patches adds a function to genfile.c, and add his name to the
 Author field while at it.  So answering the second part of the
 question is important.  I take it your answer to that is no.

If everybody who had added any code to a file got to put their name on
it, the author lists would be ridiculously long (and some of us would
have our names plastered in hundreds of places).  My answer is no,
and I'm not sure those things should be there in the first place.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-02 Thread Jaime Casanova
On Thu, Dec 2, 2010 at 10:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
[...]
 My answer is no,
 and I'm not sure those things should be there in the first place.


i agree. after all the committers give the appropriate credit in the
commit message, so if we want the history it's there

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Author names in source files

2010-12-02 Thread Itagaki Takahiro
On Fri, Dec 3, 2010 at 13:28, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Thu, Dec 2, 2010 at 10:57 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 [...]
 My answer is no,
 and I'm not sure those things should be there in the first place.

 i agree. after all the committers give the appropriate credit in the
 commit message, so if we want the history it's there

Hi, folks. Thanks for all of your replies.

Dimitri, your name cannot be added to the source files,
but it will be recorded in the commit logs ;-)

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch for parallel pg_dump

2010-12-02 Thread Joachim Wieland
On Thu, Dec 2, 2010 at 9:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 In particular, this issue *has* been discussed before, and there was a
 consensus that preserving dump consistency was a requirement.  I don't
 think that Joachim gets to bypass that decision just by submitting a
 patch that ignores it.

I am not trying to bypass anything here :)  Regarding the locking
issue, I probably haven't done sufficient research; at least I managed
to miss the emails that mentioned it. Anyway, that fortunately seems to
be solved now; I'm going to implement your idea over the weekend.

Regarding snapshot cloning and dump consistency, I already brought this
up several months ago and asked whether the feature would be considered
useful even without snapshot cloning. And actually it was you who
motivated me to work on it even without snapshot consistency...

http://archives.postgresql.org/pgsql-hackers/2010-03/msg01181.php

In my patch pg_dump emits a warning when called with -j; if you feel
better with an extra option
--i-know-that-i-have-no-synchronized-snapshots, that's fine with me :-)

In the end we provide a tool with limitations; it might not serve all
use cases, but there are use cases that would benefit a lot. I
personally think this is better than providing no tool at all...


Joachim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] should we set hint bits without dirtying the page?

2010-12-02 Thread Heikki Linnakangas

On 03.12.2010 04:54, Tom Lane wrote:

Robert Haas robertmh...@gmail.com  writes:

I then got to wondering whether we should even go a step further, and
simply decree that a page with only hint bit updates is not dirty and
won't be written, period.


This sort of thing has been discussed before.  It seems fairly clear to
me that any of these variations represents a performance tradeoff: some
cases will get better and some will get worse.  I think we are not going
to get far unless we can agree on a set of benchmark cases that we'll
use to decide whether the tradeoff is a win or not.  How can we arrive
at that?


It's pretty easy to come up with a test case where that would be a win. 
I'd like to see some benchmark results of the worst case, to see how 
much loss we're talking about at most. Robert described the worst case:



Where it's a problem is
when you have a huge table that you're scanning over and over again,
especially if data in that table was loaded by many different, widely
spaced XIDs that require looking at many different CLOG pages.


I'd like to add to that: and the table is big enough to not fit in 
shared_buffers, but small enough to fit in OS cache.
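A rough sketch of how one might construct that case (table name and sizes
are made up; the point is a table bigger than shared_buffers but smaller
than the OS cache, filled by many separate transactions so the rows carry
widely spaced XIDs):

  CREATE TABLE hintbench (id int, payload text);
  -- run this repeatedly from a driver script, one transaction per call,
  -- with other XID-consuming activity in between to spread out the XIDs:
  INSERT INTO hintbench
    SELECT g, repeat('x', 200) FROM generate_series(1, 10000) g;
  -- then time repeated sequential scans, which have to consult clog
  -- for every row until the hint bits are set:
  SELECT count(*) FROM hintbench;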


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_execute_from_file review

2010-12-02 Thread Itagaki Takahiro
On Thu, Dec 2, 2010 at 20:00, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Please find attached the v8 version of the patch, that fixes the following:

I fixed and cleaned up some of the code in it; v9 patch attached. Please check
my modifications, and set the status to Ready for Committer if you find
no problems. I think the documentation and code comments might need to be
checked by native English speakers.

 Well thinking about it, omitting the length parameter altogether seems
 like the more natural SQL-level API to me, so I've made it happen:

Good idea. I re-added the negative-length checks in the pg_read_file functions;
a negative length is used internally, but it is not exposed in the SQL functions.
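For reference, the call forms under discussion (the one-argument form is the
patch's proposal, so treat its exact shape as an assumption):

  SELECT pg_read_file('postmaster.pid', 0, 100);  -- first 100 bytes
  SELECT pg_read_file('postmaster.pid');          -- whole file, per the patch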

   BTW, I think we can call it just replace
 The same idea occurred to me last night when reading through the patch
 to send. It's now done in the way you can see above. The idea is not to
 change the existing behavior at all, so I've not changed the
 non-VARIADIC version of the function.

You added replace(text, text, text, VARIADIC text), but I think
replace(text, VARIADIC text) would also work. If we have the short form,
we can use it easily from the execute functions with placeholders.
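As a sketch of why the short form is handy for placeholder substitution
(the variadic signatures are the ones proposed in the patch, so this is an
assumption, not existing API):

  -- built-in three-argument replace, one pair at a time:
  SELECT replace('CREATE SCHEMA @extschema@;', '@extschema@', 'myext');
  -- proposed variadic short form, several placeholder/value pairs at once:
  SELECT replace('@owner@ owns @extschema@',
                 '@owner@', 'alice', '@extschema@', 'myext');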

Other changes:
* Added some regression tests.
* Int64GetDatum((int64) fst.st_size) was broken.
* An error check for "could not read file" didn't work.
* Read file contents into bytea buffer directly to avoid memcpy.
* Fixed bad usages of text and bytea values
  because they are not null-terminated.
* I don't want to expose ArrayType to builtins.h.
  So I call replace_text_variadic() from execute functions.
* pg_execute_sql_file(path, NULL) won't work because it's a STRICT function.
  It returns NULL without doing any work when at least one of the arguments is NULL.

BTW, we have many text-to/from-cstring conversions in the new code.
It's not an item for now, but we would need to improve them
if those functions are heavily used, especially replace_text_variadic().

 * We might rename pg_convert_and_execute_sql_file() to
   pg_execute_sql_file_with_encoding() or so to have the same prefix.

 Well, I think I prefer reading the verbs in the order that things are
 happening in the code; it's actually convert then execute. But again,
 maybe some native speaker will have a say here, or maybe your proposed
 name fits better in PostgreSQL. I'd leave it to the committer :)

Agreed. I also prefer pg_read_file_all rather than pg_read_whole_file :P

-- 
Itagaki Takahiro


pg_execute_from_file.v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers