Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-29 Thread Stefan Kaltenbrunner

Gregory Stark wrote:

Kris Jurka [EMAIL PROTECTED] writes:


On Mon, 28 Jan 2008, Jeff Davis wrote:


I think that pg_dump is a reasonable use case for synchoronized scans
when the table has not been clustered. It could potentially make pg_dump
have much less of a performance impact when run against an active
system.

One of the advantages I see with maintaining table dump order is that rsyncing
backups to remote locations will work better.


I can't see what scenario you're talking about here. pg_dump your live
database, restore it elsewhere, then shut down the production database and run
rsync from the live database to the restored one? Why not just run rsync for
the initial transfer?


take a dump (maybe in plaintext format) save it to disk and use rsync to 
copy it elsewhere. the more similiar the dumps the more efficient 
rsync can copy the data over.



Stefan

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-29 Thread Gregory Stark

Kris Jurka [EMAIL PROTECTED] writes:

 On Mon, 28 Jan 2008, Jeff Davis wrote:

 I think that pg_dump is a reasonable use case for synchoronized scans
 when the table has not been clustered. It could potentially make pg_dump
 have much less of a performance impact when run against an active
 system.

 One of the advantages I see with maintaining table dump order is that rsyncing
 backups to remote locations will work better.

I can't see what scenario you're talking about here. pg_dump your live
database, restore it elsewhere, then shut down the production database and run
rsync from the live database to the restored one? Why not just run rsync for
the initial transfer?

I can't see that working well for a real database and a database loaded from a
pg_dump anyways. Every dead record will introduce skew, plus page headers, and
every record will have a different system data such as xmin for one.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-28 Thread Russell Smith

Guillaume Smet wrote:

Hi Russell,

On Jan 28, 2008 7:27 AM, Russell Smith [EMAIL PROTECTED] wrote:
  

Can somebody explain why it's important to load with
synchronized_scanning off?

do_sql_command(g_conn, SET synchronized_scanning TO off);



It's the start point of this patch. See this thread [
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php ]
for more information

Sorry, total brain fade in interpreting the patch.

g_conn is our connection to the database, not the command we are dumping.

Sorry

Russell

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-28 Thread Jeff Davis
On Sun, 2008-01-27 at 15:07 -0500, Tom Lane wrote:
 Per today's -hackers discussion, add a GUC variable to allow clients to
 disable the new synchronized-scanning behavior, and make pg_dump disable
 sync scans so that it will reliably preserve row ordering.  This is a
 pretty trivial patch, but seeing how late we are in the 8.3 release
 cycle, I thought I'd better post it for comment anyway.

I apologize for the late reply, but I have one comment I'd like to add.

 + if (g_fout-remoteVersion = 80300)
 + do_sql_command(g_conn, SET synchronized_scanning TO off);
 + 
 + /*
* Start serializable transaction to dump consistent data.
*/

I think that pg_dump is a reasonable use case for synchoronized scans
when the table has not been clustered. It could potentially make pg_dump
have much less of a performance impact when run against an active
system.

I think it's worth considering enabling sync scans for non-clustered
tables if it would not interfere with the release. Of course, a painless
8.3 release is the top priority.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


[PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Tom Lane
Per today's -hackers discussion, add a GUC variable to allow clients to
disable the new synchronized-scanning behavior, and make pg_dump disable
sync scans so that it will reliably preserve row ordering.  This is a
pretty trivial patch, but seeing how late we are in the 8.3 release
cycle, I thought I'd better post it for comment anyway.

regards, tom lane

Index: doc/src/sgml/config.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.162
diff -c -r1.162 config.sgml
*** doc/src/sgml/config.sgml27 Jan 2008 19:12:28 -  1.162
--- doc/src/sgml/config.sgml27 Jan 2008 20:00:16 -
***
*** 4611,4616 
--- 4611,4638 
/listitem
   /varlistentry
  
+  varlistentry id=guc-synchronized-scanning 
xreflabel=synchronized_scanning
+   termvarnamesynchronized_scanning/varname 
(typeboolean/type)/term
+   indexterm
+primaryvarnamesynchronized_scanning/ configuration 
parameter/primary
+   /indexterm
+   listitem
+para
+ This allows sequential scans of large tables to synchronize with each
+ other, so that concurrent scans read the same block at about the
+ same time and hence share the I/O workload.  When this is enabled,
+ a scan might start in the middle of the table and then quotewrap
+ around/ the end to cover all rows, so as to synchronize with the
+ activity of scans already in progress.  This can result in
+ unpredictable changes in the row ordering returned by queries that
+ have no literalORDER BY/ clause.  Setting this parameter to
+ literaloff/ ensures the pre-8.3 behavior in which a sequential
+ scan always starts from the beginning of the table.  The default
+ is literalon/.
+/para
+   /listitem
+  /varlistentry
+ 
   /variablelist
  /sect2
  
Index: src/backend/access/heap/heapam.c
===
RCS file: /cvsroot/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.248
diff -c -r1.248 heapam.c
*** src/backend/access/heap/heapam.c14 Jan 2008 01:39:09 -  1.248
--- src/backend/access/heap/heapam.c27 Jan 2008 20:00:18 -
***
*** 59,64 
--- 59,68 
  #include utils/syscache.h
  
  
+ /* GUC variable */
+ bool  synchronized_scanning = true;
+ 
+ 
  static HeapScanDesc heap_beginscan_internal(Relation relation,
Snapshot snapshot,
int nkeys, ScanKey key,
***
*** 104,110 
 * the thresholds for these features could be different, we make them 
the
 * same so that there are only two behaviors to tune rather than four.
 * (However, some callers need to be able to disable one or both of
!* these behaviors, independently of the size of the table.)
 *
 * During a rescan, don't make a new strategy object if we don't have 
to.
 */
--- 108,115 
 * the thresholds for these features could be different, we make them 
the
 * same so that there are only two behaviors to tune rather than four.
 * (However, some callers need to be able to disable one or both of
!* these behaviors, independently of the size of the table; also there
!* is a GUC variable that can disable synchronized scanning.)
 *
 * During a rescan, don't make a new strategy object if we don't have 
to.
 */
***
*** 129,135 
scan-rs_strategy = NULL;
}
  
!   if (allow_sync)
{
scan-rs_syncscan = true;
scan-rs_startblock = ss_get_location(scan-rs_rd, 
scan-rs_nblocks);
--- 134,140 
scan-rs_strategy = NULL;
}
  
!   if (allow_sync  synchronized_scanning)
{
scan-rs_syncscan = true;
scan-rs_startblock = ss_get_location(scan-rs_rd, 
scan-rs_nblocks);
Index: src/backend/utils/misc/guc.c
===
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.431
diff -c -r1.431 guc.c
*** src/backend/utils/misc/guc.c27 Jan 2008 19:12:28 -  1.431
--- src/backend/utils/misc/guc.c27 Jan 2008 20:00:18 -
***
*** 110,115 
--- 110,116 
  extern intCommitSiblings;
  extern char *default_tablespace;
  extern char *temp_tablespaces;
+ extern bool synchronized_scanning;
  extern bool fullPageWrites;
  
  #ifdef TRACE_SORT
***
*** 1053,1058 
--- 1054,1068 
},
  
{
+   {synchronized_scanning, PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,
+   gettext_noop(Enable synchronized scans.),
+   NULL
+

Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Jonah H. Harris
On Jan 27, 2008 3:07 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Per today's -hackers discussion, add a GUC variable to allow clients to
 disable the new synchronized-scanning behavior, and make pg_dump disable
 sync scans so that it will reliably preserve row ordering.  This is a
 pretty trivial patch, but seeing how late we are in the 8.3 release
 cycle, I thought I'd better post it for comment anyway.

+1

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Gregory Stark
Jonah H. Harris [EMAIL PROTECTED] writes:

 On Jan 27, 2008 3:07 PM, Tom Lane [EMAIL PROTECTED] wrote:
 Per today's -hackers discussion, add a GUC variable to allow clients to
 disable the new synchronized-scanning behavior, and make pg_dump disable
 sync scans so that it will reliably preserve row ordering.  This is a
 pretty trivial patch, but seeing how late we are in the 8.3 release
 cycle, I thought I'd better post it for comment anyway.

 +1

I liked the synchronized_sequential_scans idea myself.

But otherwise, sure.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Gevik Babakhani
 
  Per today's -hackers discussion, add a GUC variable to 
 allow clients 
  to disable the new synchronized-scanning behavior, and 
 make pg_dump 
  disable sync scans so that it will reliably preserve row 
 ordering.  
  This is a pretty trivial patch, but seeing how late we are 
 in the 8.3 
  release cycle, I thought I'd better post it for comment anyway.
 
  +1
 
 I liked the synchronized_sequential_scans idea myself.
 
 But otherwise, sure.
 

this will save some hackwork I have to do for a import/export project I am
doing.

+1, please

Regards,
Gevik Babakhani

PostgreSQL NL   http://www.postgresql.nl
TrueSoftware BV http://www.truesoftware.nl



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 I liked the synchronized_sequential_scans idea myself.

The name is still open for discussion --- it's an easy
search-and-replace in the patch ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Neil Conway
On Sun, 2008-01-27 at 21:54 +, Gregory Stark wrote:
 I liked the synchronized_sequential_scans idea myself.

I think that's a bit too long. How about synchronized_scans, or
synchronized_seqscans?

-Neil



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Russell Smith

Tom Lane wrote:

Per today's -hackers discussion, add a GUC variable to allow clients to
disable the new synchronized-scanning behavior, and make pg_dump disable
sync scans so that it will reliably preserve row ordering.  This is a
pretty trivial patch, but seeing how late we are in the 8.3 release
cycle, I thought I'd better post it for comment anyway.

regards, tom lane
  
  
Index: src/bin/pg_dump/pg_dump.c

===
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_dump.c,v
retrieving revision 1.481
diff -c -r1.481 pg_dump.c
*** src/bin/pg_dump/pg_dump.c   1 Jan 2008 19:45:55 -   1.481
--- src/bin/pg_dump/pg_dump.c   27 Jan 2008 20:00:18 -
***
*** 553,558 
--- 553,572 
do_sql_command(g_conn, SET DATESTYLE = ISO);
  
  	/*

+* If supported, set extra_float_digits so that we can dump float data
+* exactly (given correctly implemented float I/O code, anyway)
+*/
+   if (g_fout-remoteVersion = 70400)
+   do_sql_command(g_conn, SET extra_float_digits TO 2);
+ 
+ 	/*

+* If synchronized scanning is supported, disable it, to prevent
+* unpredictable changes in row ordering across a dump and reload.
+*/
+   if (g_fout-remoteVersion = 80300)
+   do_sql_command(g_conn, SET synchronized_scanning TO off);
+ 
+ 	/*

 * Start serializable transaction to dump consistent data.
 */
do_sql_command(g_conn, BEGIN);
  

Hi,

Can somebody explain why it's important to load with 
synchronized_scanning off?


do_sql_command(g_conn, SET synchronized_scanning TO off);

Thanks

Russell Smith


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Neil Conway
On Mon, 2008-01-28 at 17:27 +1100, Russell Smith wrote:
 Can somebody explain why it's important to load with 
 synchronized_scanning off?

*Loading* with synchronized scanning off is not important (and is not
implemented by the patch).

*Dumping* with synchronized scanning off is necessary to ensure that the
order of the rows in the pg_dump matches the on-disk order of the rows
in the table, which is important if you want to preserve the clustering
of the table data on restore.

See the -hackers thread:

http://markmail.org/message/qbytsco6oj2qkxsa

-Neil



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

2008-01-27 Thread Guillaume Smet
Hi Russell,

On Jan 28, 2008 7:27 AM, Russell Smith [EMAIL PROTECTED] wrote:
 Can somebody explain why it's important to load with
 synchronized_scanning off?

 do_sql_command(g_conn, SET synchronized_scanning TO off);

It's the start point of this patch. See this thread [
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00987.php ]
for more information.

--
Guillaume

---(end of broadcast)---
TIP 6: explain analyze is your friend