Re: [HACKERS] Streaming base backups

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 01:28, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2011/1/10 Magnus Hagander mag...@hagander.net:
 On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 I've committed the backend side of this, without that. Still working
 on the client, and on cleaning up Heikki's patch for grammar/parser
 support.

 attached is a small patch fixing -d basedir when it's called with an
 absolute path.
 maybe we can use pg_mkdir_p() instead of mkdir?

Heh, that was actually a hack to be able to run pg_basebackup on the
same machine as the database with the tablespaces. It will be removed
before commit :-) (It was also in the wrong place to work, I realize I
managed to break it in a refactor) I've put in a big ugly comment to
make sure it gets removed :-)

And yes, using pg_mkdir_p() is good. I used to do that, I think I
removed it by mistake when it was supposed to be removed elsewhere.
I've put it back.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] Re: [COMMITTERS] pgsql: Set process title to indicate base backup is running

2011-01-11 Thread Heikki Linnakangas

On 10.01.2011 22:54, Magnus Hagander wrote:

Set process title to indicate base backup is running


It would be good to reset it after the base backup is finished..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [COMMITTERS] pgsql: Set process title to indicate base backup is running

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 09:50, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 10.01.2011 22:54, Magnus Hagander wrote:

 Set process title to indicate base backup is running

 It would be good to reset it after the base backup is finished..

Good point, fixed.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] WIP: RangeTypes

2011-01-11 Thread Jeff Davis
Ok, I have made some progress. This is still a proof-of-concept patch,
but the important pieces are working together.

Synopsis:

  CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, 
SUBTYPE_CMP=numeric_cmp);

  SELECT range_eq('[1,2.2)'::numrange,'[1,2.2]');
  SELECT range_lbound('(3.7,9]'::numrange);
  SELECT range(6.7);
  SELECT '-'::numrange; -- empty
  SELECT '[1, NULL]'::numrange; -- ] will become )
  SELECT '(INF, 3)'::numrange;

I haven't completed many of the other generic functions, because I'd
like to make sure I'm on the right track first. The important thing
about the functions above is that they show ANYRANGE working in
conjunction with ANYELEMENT in various combinations, which was a
significant part of this patch.

Here are the open items:

1. Generic functions -- most of which are fairly obvious. However, I
want to make sure I'm on the right track first.

2. GiST -- I'll need a mechanism to implement the penalty function,
and perhaps I'll also need additional support for the picksplit
function. For the penalty function, I think I'll need to require a
function to convert the subtype into a float, and I can use that to find
a distance (which can be the penalty). That should also satisfy anything
that picksplit might need.

3. Typmod -- There is still one annoyance about typmod remaining. I need
to treat it like an array in find_typmod_coercion_function(), and then
create a coercion expression. Is it worth it? Would typmod on a range be
confusing, or should I just finish this item up?

4. Docs

5. Tests

6. pg_dump -- should be pretty easy; I just want to settle some of the
other stuff first.

7. Right now the parse function is quite dumb. Is there some example
code I should follow to make sure I get this right?

8. In order to properly support the various combinations of ANYRANGE and
ANYELEMENT in a function definition (which are all important), we need
to be able to determine the range type given a subtype. That means that
each subtype can only have one associated range, which sounds somewhat
limiting, but it can be worked around by using domains (a sketch follows
this list). I don't think this is a major limitation. Comments?

9. Representation -- right now I store the OID of the range type in the
range itself, much like arrays, in order to find the functions that
operate on the subtype. Robert has some justifiable concerns about
that 4-byte overhead. Possible ideas:

  * Forget about ANYRANGE altogether, and generate new catalog entries
for the generic functions for each new range type defined. I don't
particularly like this approach because it makes it very difficult to
define new generic functions.

  * Somehow fix the type system so that we know the specific types of
arguments in all situations. I don't know if this is feasible.

  * Store an 8- or 16-bit unique number in pg_range, and store that
number in the representation. That would be pretty ugly, and limit the
total number of range types that can be defined at once, but it saves a
couple of bytes per value.

  * Try to somehow mimic what records do. Records use a global array and
use the typmod as an index into that array. It looks like a hack to me,
but might be worth borrowing anyway.

Also related to representation:

  * Right now I always align the subtypes within the range according to
typalign. I could avoid that by packing the bytes tightly, and then
copying them around later. Suggestions? And what should the overall
alignment of the range type be?

  * If it's a fixed-length type, we can save the varlena header byte on
the overall range; but we lose the ability to save space when one of the
boundaries of the range is missing (NULL or INF), and it would
complicate the code a little. Thoughts?
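
Regarding the domain workaround mentioned in item 8, a hypothetical sketch
(names are made up; a domain over the same base type gets its own range type):

  CREATE DOMAIN numeric_alt AS numeric;
  CREATE TYPE numrange_alt AS RANGE (SUBTYPE=numeric_alt, SUBTYPE_CMP=numeric_cmp);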

Regards,
Jeff Davis


rangetypes-20110110.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] SQL/MED - file_fdw

2011-01-11 Thread Shigeru HANADA
On Mon, 10 Jan 2011 19:26:11 -0500
Tom Lane t...@sss.pgh.pa.us wrote:
 Shigeru HANADA han...@metrosystems.co.jp writes:
  For the purpose of file_fdw, additional ResetCopyFrom() would be
  necessary. I'm planning to include such changes in file_fdw patch. 
  Please find attached partial patch for ResetCopyFrom(). Is there
  anything else which should be done at reset?
 
 Seems like it would be smarter to close and re-open the copy operation.
 Adding a reset function is just creating an additional maintenance
 burden and point of failure, for what seems likely to be a negligible
 performance benefit.

Agreed. fileReScan can be implemented with close/re-open, storing
some additional information in the FDW private area. I'll withdraw
the proposal.

 If you think it's not negligible, please show some proof of that before
 asking us to support such code.

Anyway, I've measured the overhead of re-open by executing a query
including an inner join between foreign tables copied from the pgbench schema.
I used the SELECT statement below:

EXPLAIN (ANALYZE) SELECT count(*) FROM csv_accounts a JOIN
csv_branches b ON (b.bid = a.bid);

Based on the average of (Nested Loop - (Foreign Scan * 2)), the overhead of
re-open is around 0.048 ms per tuple (average of 3 measurements).

After implementing file_fdw, I'm going to measure again. If
ResetCopyFrom significantly improves the performance of ReScan, I'll
propose it as a separate patch.

=

The results of EXPLAIN ANALYZE are:

[using ResetCopyFrom]
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=11717.02..11717.03 rows=1 width=0) (actual time=73357.655..73357.657 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11717.01 rows=1 width=0) (actual time=0.209..71424.059 rows=100 loops=1)
         ->  Foreign Scan on public.csv_accounts a  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.144..6998.497 rows=100 loops=1)
         ->  Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.008..0.037 rows=10 loops=100)
 Total runtime: 73358.135 ms
(11 rows)

[using EndCopyFrom + BeginCopyFrom]
                                  QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=11717.02..11717.03 rows=1 width=0) (actual time=120724.138..120724.140 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..11717.01 rows=1 width=0) (actual time=0.321..118583.681 rows=100 loops=1)
         ->  Foreign Scan on public.csv_accounts a  (cost=0.00..11717.00 rows=1 width=4) (actual time=0.156..7208.968 rows=100 loops=1)
         ->  Foreign Scan on public.csv_branches b  (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.046 rows=10 loops=100)
 Total runtime: 121118.792 ms
(11 rows)

Time: 121122.205 ms

=

Regards,
--
Shigeru Hanada





Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Simon Riggs
On Sun, 2011-01-09 at 16:59 -0500, Noah Misch wrote:

 This begins the patch series for the design I recently proposed[1] for 
 avoiding
 some table rewrites in ALTER TABLE ... ALTER COLUMN ... TYPE.  I'm posting 
 these
 patches today:

These sound very good.

I have a concern that by making ALTER TABLE more complex we might not
be able to easily tell whether a rewrite happens or not.

Perhaps we should add a WITHOUT REWRITE clause? That would allow a user
to specify that they do not wish a rewrite, so if the ALTER TABLE requires
one it would then fail.

You might point out I didn't do anything like that for my ALTER TABLE
patch, and I would agree with you. WITHOUT ACCESS EXCLUSIVE LOCK might
be an option here also.
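
For concreteness, a purely illustrative sketch of what such a clause might
look like (not implemented; table and column names are hypothetical):

  ALTER TABLE parent ALTER COLUMN keycol TYPE numeric(10,0) WITHOUT REWRITE;
  -- would raise an error if this conversion still required a table rewrite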

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Tatsuo Ishii
 On Fri, Jan 7, 2011 at 6:28 PM, Florian Pflug f...@phlo.org wrote:
 I forgot about sequences earlier. If we dump while someone deletes all
 rows and resets the sequence the dump might contain rows and still
 reset the sequence. This could cause duplicate key errors on restore.
 I haven't checked if this is really possible though - I guess it would
 depend on the exact order of these events...
 
 To fix this, you'd need a lock that allowed getting values from the
 sequence but prevented resetting it, and...
 
 I wonder how such locks would work. Would such locks prevent accessing
 these objects? Or just modifications? For example, if I locked a function,
 could someone else execute it while I held the lock?
 
 ...in fact we do very little locking of objects other than tables.
 DROP takes an AccessExclusiveLock on whatever it's dropping, and
 COMMENT and SECURITY LABEL take ShareUpdateExclusiveLocks to avoid
 orphaning pg_{sh,}description or pg_seclabel entries in the face of a
 concurrent drop, but most operations on non-table objects don't AFAIK
 take any lock at all.  We probably don't want to make too many changes
 in this area, because there are already workloads where the
 heavyweight lock manager can become a bottleneck, and one can easily
 imagine that locking operators or namespaces could make that problem
 much worse.

For query-based replication tools like pgpool-II (I don't know whether any
other tools, for example Postgres XC, fall in this category...), we need
to be able to lock sequences. Fortunately it is allowed to:

SELECT 1 FROM foo_sequence FOR UPDATE;

but LOCK foo_sequence looks like more appropriate syntax to me.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Koichi Suzuki
At present, XC does not seem to need locks to maintain cluster-wide
integrity because it is maintained centrally in GTM.  If an application
needs to do this, for example to synchronize between different
clusters, XC obviously needs this capability.
--
Koichi Suzuki



2011/1/11 Tatsuo Ishii is...@postgresql.org:
 On Fri, Jan 7, 2011 at 6:28 PM, Florian Pflug f...@phlo.org wrote:
 I forgot about sequences earlier. If we dump while someone deletes all
 rows and resets the sequence the dump might contain rows and still
 reset the sequence. This could cause duplicate key errors on restore.
 I haven't checked if this is really possible though - I guess it would
 depend on the exact order of these events...

 To fix this, you'd need a lock that allowed getting values from the
 sequence but prevented resetting it, and...

 I wonder how such locks would work. Would such locks prevent accessing
 these objects? Or just modifications? For example, if I locked a function,
 could someone else execute it while I held the lock?

 ...in fact we do very little locking of objects other than tables.
 DROP takes an AccessExclusiveLock on whatever it's dropping, and
 COMMENT and SECURITY LABEL take ShareUpdateExclusiveLocks to avoid
 orphaning pg_{sh,}description or pg_seclabel entries in the face of a
 concurrent drop, but most operations on non-table objects don't AFAIK
 take any lock at all.  We probably don't want to make too many changes
 in this area, because there are already workloads where the
 heavyweight lock manager can become a bottleneck, and one can easily
 imagine that locking operators or namespaces could make that problem
 much worse.

 For query-based replication tools like pgpool-II (I don't know whether any
 other tools, for example Postgres XC, fall in this category...), we need
 to be able to lock sequences. Fortunately it is allowed to:

 SELECT 1 FROM foo_sequence FOR UPDATE;

 but LOCK foo_sequence looks like more appropriate syntax to me.
 --
 Tatsuo Ishii
 SRA OSS, Inc. Japan
 English: http://www.sraoss.co.jp/index_en.php
 Japanese: http://www.sraoss.co.jp





Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 02:24, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jan 10, 2011 at 10:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
  I think we need a status enum. ('BACKUP', 'CATCHUP', 'STREAM') for the 3
  phases of replication.
 
  That seems reasonable. But if we keep BACKUP in there, should we
  really have it called pg_stat_replication? (yeah, I know, I'm not
  giving up :P)
 
  (You'd need a 4th mode for WAITING or so, to indicate it's waiting for
  a command)
 
  That's something different.
 
  The 3 phases are more concrete.
 
  BACKUP -->  CATCHUP --->  STREAM
 
  When you connect you either do BACKUP or CATCHUP. Once in CATCHUP mode
  you never issue a BACKUP. Once we have caught up we move to STREAM. That
  has nothing to do with idle/active.

 So how does a walsender that's waiting for a command from the client
 show up? Surely it's not in catchup mode yet?

 There is a trivial state between connect and first command. If you think
 that is worth publishing, feel free. STARTING?

 I think it's worth publishing.  STARTING would be OK, or maybe STARTUP
 to parallel the other two -UP states.

Here's a patch for this. I chose IDLE, because that's what we call
other backends that are waiting for commands...

Does this seem correct?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***
*** 298,305  postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re
entrystructnamepg_stat_replication/indextermprimarypg_stat_replication/primary/indexterm/entry
entryOne row per WAL sender process, showing process acronymID/,
user OID, user name, application name, client's address and port number,
!   time at which the server process began execution, and transaction log
!   location.
   /entry
   /row
  
--- 298,305 
entrystructnamepg_stat_replication/indextermprimarypg_stat_replication/primary/indexterm/entry
entryOne row per WAL sender process, showing process acronymID/,
user OID, user name, application name, client's address and port number,
!   time at which the server process began execution, current WAL sender
!   state and transaction log location.
   /entry
   /row
  
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***
*** 501,506  CREATE VIEW pg_stat_replication AS
--- 501,507 
  S.client_addr,
  S.client_port,
  S.backend_start,
+ W.state,
  W.sent_location
  FROM pg_stat_get_activity(NULL) AS S, pg_authid U,
  pg_stat_get_wal_senders() AS W
*** a/src/backend/replication/basebackup.c
--- b/src/backend/replication/basebackup.c
***
*** 24,29 
--- 24,30 
  #include libpq/pqformat.h
  #include nodes/pg_list.h
  #include replication/basebackup.h
+ #include replication/walsender.h
  #include storage/fd.h
  #include storage/ipc.h
  #include utils/builtins.h
***
*** 83,88  SendBaseBackup(const char *options)
--- 84,91 
  		   ALLOCSET_DEFAULT_MAXSIZE);
  	old_context = MemoryContextSwitchTo(backup_context);
  
+ 	WalSndSetState(WalSndState_BACKUP);
+ 
  	if (backup_label == NULL)
  		ereport(FATAL,
  (errcode(ERRCODE_PROTOCOL_VIOLATION),
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***
*** 179,184  WalSndHandshake(void)
--- 179,185 
  	{
  		int			firstchar;
  
+ 		WalSndSetState(WalSndState_IDLE);
  		set_ps_display("idle", false);
  
  		/* Wait for a command to arrive */
***
*** 482,487  WalSndLoop(void)
--- 483,491 
  			if (!XLogSend(output_message, caughtup))
  break;
  		}
+ 
+ 		/* Update our state to indicate if we're behind or not */
+ 		WalSndSetState(caughtup ? WalSndState_STREAMING : WalSndState_CATCHUP);
  	}
  
  	/*
***
*** 533,538  InitWalSnd(void)
--- 537,543 
  			 */
  			walsnd->pid = MyProcPid;
  			MemSet(&walsnd->sentPtr, 0, sizeof(XLogRecPtr));
+ 			walsnd->state = WalSndState_IDLE;
  			SpinLockRelease(&walsnd->mutex);
  			/* don't need the lock anymore */
  			OwnLatch((Latch *) walsnd-latch);
***
*** 960,965  WalSndWakeup(void)
--- 965,999 
  		SetLatch(&WalSndCtl->walsnds[i].latch);
  }
  
+ /* Set state for current walsender (only called in walsender) */
+ void WalSndSetState(WalSndState state)
+ {
+ 	/* use volatile pointer to prevent code rearrangement */
+ 	volatile WalSnd *walsnd = MyWalSnd;
+ 
+ 	SpinLockAcquire(&walsnd->mutex);
+ 	walsnd->state = state;
+ 	SpinLockRelease(&walsnd->mutex);
+ }
+ 
+ /*
+  * Return a string constant representing the state. This is used
+  * in system views, and should *not* be translated.
+  */
+ static const char *
+ WalSndGetStateString(WalSndState state)
+ {
+ 	switch (state)
+ 	{
+ 		case WalSndState_IDLE: return "IDLE";
+ 	

Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 4:46 AM, Tatsuo Ishii is...@postgresql.org wrote:
 On Fri, Jan 7, 2011 at 6:28 PM, Florian Pflug f...@phlo.org wrote:
 I forgot about sequences earlier. If we dump while someone deletes all
 rows and resets the sequence the dump might contain rows and still
 reset the sequence. This could cause duplicate key errors on restore.
 I haven't checked if this is really possible though - I guess it would
 depend on the exact order of these events...

 To fix this, you'd need a lock that allowed getting values from the
 sequence but prevented resetting it, and...

 I wonder how such locks would work. Would such locks prevent accessing
 these objects? Or just modifications? For example, if I locked a function,
 could someone else execute it while I held the lock?

 ...in fact we do very little locking of objects other than tables.
 DROP takes an AccessExclusiveLock on whatever it's dropping, and
 COMMENT and SECURITY LABEL take ShareUpdateExclusiveLocks to avoid
 orphaning pg_{sh,}description or pg_seclabel entries in the face of a
 concurrent drop, but most operations on non-table objects don't AFAIK
 take any lock at all.  We probably don't want to make too many changes
 in this area, because there are already workloads where the
 heavyweight lock manager can become a bottleneck, and one can easily
 imagine that locking operators or namespaces could make that problem
 much worse.

 For query-based replication tools like pgpool-II (I don't know whether any
 other tools, for example Postgres XC, fall in this category...), we need
 to be able to lock sequences. Fortunately it is allowed to:

 SELECT 1 FROM foo_sequence FOR UPDATE;

 but LOCK foo_sequence looks like more appropriate syntax to me.

Those aren't doing the same thing.  The first is locking the one and
only tuple that is contained within the sequence, while the second is
locking the sequence object itself.

At this point, I'm inclined to think that the pg_dump comment is just
wrong, and we ought to fix it to say that we don't really want to be
able to lock other relations after all, and call it good.

As a side note, locking a sequence for replication seems like it could
have pretty dire effects on performance in certain workloads.  Why do
you need to do that, anyway?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 5:28 AM, Magnus Hagander mag...@hagander.net wrote:
 Does this seem correct?

It looks reasonable, except that the way you've chosen to capitalize
the wal sender states makes me want to shoot myself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Compatibility GUC for serializable

2011-01-11 Thread Florian Pflug
On Jan10, 2011, at 23:56 , Kevin Grittner wrote:
 The proposed GUC would suppress the monitoring in SERIALIZABLE
 mode and avoid the new serialization failures, thereby providing
 legacy behavior -- anomalies and all.
 
 After posting that I realized that there's no technical reason that
 such a GUC couldn't be set within each session as desired, as long
 as we disallowed changes after the first snapshot of a transaction
 was acquired.  The IsolationIsSerializable() macro could be modified
 to use that along with XactIsoLevel.

From a security point of view, it seems dangerous to allow
such a GUC to be set by non-superusers. It might allow users to 
e.g. circumvent some access control scheme by exploiting a race
condition that only exists without true serializability.

The risk of confusion is also much higher if such a thing can be
set per-session.

So, if we need such a GUC at all, which I'm not sure we do, I
believe it should be settable only from postgresql.conf and the
command line.

best regards,
Florian Pflug




Re: [HACKERS] pl/python custom exceptions for SPI

2011-01-11 Thread Jan Urbański
On 11/01/11 01:27, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
 On 10.1.2011 17:20, Jan Urbański wrote:
 I changed that patch to use Perl instead of sed to generate the
 exceptions, which should be more portable.
 
 Why not python ?
 
 Because we're not adding even more different tool requirements to the
 build process.  Perl is what we've chosen to depend on, and there is no
 reason to use a different tool here.

Yep, exactly.

While looking at it again I fixed a problem with spiexceptions.h not
being distributed (and IIRC we don't require Perl for tarball builds)
and added generating it to the MSVC build system.

Updated patch attached.

Cheers,
Jan
diff --git a/src/pl/plpython/Makefile b/src/pl/plpython/Makefile
index 33dddc6..1fe386b 100644
*** a/src/pl/plpython/Makefile
--- b/src/pl/plpython/Makefile
*** rpathdir = $(python_libdir)
*** 38,44 
  
  NAME = plpython$(python_majorversion)
  OBJS = plpython.o
! 
  
  # Python on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
--- 38,44 
  
  NAME = plpython$(python_majorversion)
  OBJS = plpython.o
! SPIEXCEPTIONS = spiexceptions.h
  
  # Python on win32 ships with import libraries only for Microsoft Visual C++,
  # which are not compatible with mingw gcc. Therefore we need to build a
*** PSQLDIR = $(bindir)
*** 86,93 
  
  include $(top_srcdir)/src/Makefile.shlib
  
  
! all: all-lib
  
  install: all installdirs install-lib
  ifeq ($(python_majorversion),2)
--- 86,102 
  
  include $(top_srcdir)/src/Makefile.shlib
  
+ .PHONY: gen-spiexceptions
  
! # Generate spiexceptions.h from utils/errcodes.h
! spiexceptions.h: $(top_srcdir)/src/include/utils/errcodes.h
! 	$(PERL) $(srcdir)/generate-spiexceptions.pl $^ > $(SPIEXCEPTIONS)
! 
! gen-spiexceptions: $(SPIEXCEPTIONS)
! 
! all: gen-spiexceptions all-lib
! 
! distprep: gen-spiexceptions
  
  install: all installdirs install-lib
  ifeq ($(python_majorversion),2)
*** endif
*** 134,143 
  submake:
  	$(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
  
! clean distclean maintainer-clean: clean-lib
  	rm -f $(OBJS)
  	rm -rf results
  	rm -f regression.diffs regression.out
  ifeq ($(PORTNAME), win32)
  	rm -f python${pytverstr}.def
  endif
--- 143,157 
  submake:
  	$(MAKE) -C $(top_builddir)/src/test/regress pg_regress$(X)
  
! clean distclean: clean-lib
  	rm -f $(OBJS)
  	rm -rf results
  	rm -f regression.diffs regression.out
+ 
+ # since we distribute spiexceptions.h, only remove it in maintainer-clean
+ maintainer-clean: clean distclean
+ 	rm -f $(SPIEXCEPTIONS)
+ 
  ifeq ($(PORTNAME), win32)
  	rm -f python${pytverstr}.def
  endif
diff --git a/src/pl/plpython/expected/plpython_error.out b/src/pl/plpython/expected/plpython_error.out
index 7fc8337..718ebce 100644
*** a/src/pl/plpython/expected/plpython_error.out
--- b/src/pl/plpython/expected/plpython_error.out
*** CREATE FUNCTION sql_syntax_error() RETUR
*** 8,14 
  'plpy.execute("syntax error")'
  LANGUAGE plpythonu;
  SELECT sql_syntax_error();
! ERROR:  plpy.SPIError: syntax error at or near syntax
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
   */
--- 8,14 
  'plpy.execute("syntax error")'
  LANGUAGE plpythonu;
  SELECT sql_syntax_error();
! ERROR:  spiexceptions.SyntaxError: syntax error at or near syntax
  CONTEXT:  PL/Python function sql_syntax_error
  /* check the handling of uncaught python exceptions
   */
*** CREATE FUNCTION exception_index_invalid_
*** 27,33 
  return rv[0]'
  	LANGUAGE plpythonu;
  SELECT exception_index_invalid_nested();
! ERROR:  plpy.SPIError: function test5(unknown) does not exist
  CONTEXT:  PL/Python function exception_index_invalid_nested
  /* a typo
   */
--- 27,33 
  return rv[0]'
  	LANGUAGE plpythonu;
  SELECT exception_index_invalid_nested();
! ERROR:  spiexceptions.UndefinedFunction: function test5(unknown) does not exist
  CONTEXT:  PL/Python function exception_index_invalid_nested
  /* a typo
   */
*** return None
*** 43,49 
  '
  	LANGUAGE plpythonu;
  SELECT invalid_type_uncaught('rick');
! ERROR:  plpy.SPIError: type test does not exist
  CONTEXT:  PL/Python function invalid_type_uncaught
  /* for what it's worth catch the exception generated by
   * the typo, and return None
--- 43,49 
  '
  	LANGUAGE plpythonu;
  SELECT invalid_type_uncaught('rick');
! ERROR:  spiexceptions.UndefinedObject: type test does not exist
  CONTEXT:  PL/Python function invalid_type_uncaught
  /* for what it's worth catch the exception generated by
   * the typo, and return None
*** SELECT valid_type('rick');
*** 109,111 
--- 109,149 
   
  (1 row)
  
+ /* Check catching specific types of exceptions
+  */
+ CREATE TABLE specific (
+ i integer PRIMARY KEY
+ );
+ NOTICE:  CREATE TABLE / PRIMARY KEY will 

Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Robert Haas
On Mon, Jan 10, 2011 at 8:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 10, 2011 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 My point is that this isn't a bug fix, it's more like moving the
 goalposts on what getObjectDescription is supposed to do.

 I think that adding the types to the description string is a pretty
 sensible thing to do.

 Not really.  AFAIR, there are two cases that exist in practice,
 depending on which AM you're talking about:

 1. The recorded types match the input types of the operator/function
   (btree & hash).
 2. The recorded types are always the same as the opclass's input type
   (gist & gin).

 In neither case does printing those types really add much information.
 That's why it's not there now.

I don't get it.  If two different items that exist in the system out
of the box have the same description, it seems clear that the relevant
piece of disambiguating information exists nowhere in the description
string.
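
For instance, one way to see the collisions being discussed is to group the
descriptions of the built-in pg_amop entries and look for duplicates
(illustrative query, using pg_describe_object()'s (classid, objid, objsubid)
form):

  SELECT pg_describe_object('pg_amop'::regclass, oid, 0) AS description,
         count(*)
    FROM pg_amop
   GROUP BY 1
  HAVING count(*) > 1;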

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Error code for terminating connection due to conflict with recovery

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 1:30 AM, Tatsuo Ishii is...@postgresql.org wrote:
 On Sat, Jan 8, 2011 at 9:52 AM, Tatsuo Ishii is...@postgresql.org wrote:
 While looking at the backend code, I realized that error code for
 terminating connection due to conflict with recovery is
 ERRCODE_ADMIN_SHUTDOWN.

 I thought the error code was for somewhat of a human interruption, such as
 a shutdown command issued by pg_ctl. Is the usage of the error code
 appropriate?

 That doesn't sound right to me.  I'd have thought something in class 40.

 What about:

 40004 CONFLICT WITH RECOVERY conflict_with_recovery

We should respect the following convention, from errcodes.h:

 * The convention is that new error codes defined by PostgreSQL in a
 * class defined by the standard have a subclass value that begins
 * with 'P'. In addition, error codes defined by PostgreSQL clients
 * (such as ecpg) have a class value that begins with 'Y'.

And don't forget there are three places where the new error code would
need to be added.

Otherwise, +1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Simon Riggs
On Tue, 2011-01-11 at 11:28 +0100, Magnus Hagander wrote:
  
   (You'd need a 4th mode for WAITING or so, to indicate it's waiting for
   a command)
  
   That's something different.
  
   The 3 phases are more concrete.
  
   BACKUP -->  CATCHUP --->  STREAM
  
   When you connect you either do BACKUP or CATCHUP. Once in CATCHUP mode
   you never issue a BACKUP. Once we have caught up we move to STREAM. That
   has nothing to do with idle/active.
 
  So how does a walsender that's waiting for a command from the client
  show up? Surely it's not in catchup mode yet?
 
  There is a trivial state between connect and first command. If you think
  that is worth publishing, feel free. STARTING?
 
  I think it's worth publishing.  STARTING would be OK, or maybe STARTUP
  to parallel the other two -UP states.
 
 Here's a patch for this. I chose IDLE, because that's what we call
 other backends that are waiting for commands...
 
 Does this seem correct?

No

You can be idle yet in STREAMING mode. What mode we are in has nothing
to do with idle/active. Either STARTING/STARTUP/NULL but not IDLE.

If you want that as well, then we need a second column, but personally
I think it's too much information and it's too hard to say what it actually
means. For example, with sync rep, the WALSender might be idle, yet
there might still be backends waiting for a reply.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Tatsuo Ishii
 On Tue, Jan 11, 2011 at 4:46 AM, Tatsuo Ishii is...@postgresql.org wrote:
 On Fri, Jan 7, 2011 at 6:28 PM, Florian Pflug f...@phlo.org wrote:
 I forgot about sequences earlier. If we dump while someone deletes all
 rows and resets the sequence the dump might contain rows and still
 reset the sequence. This could cause duplicate key errors on restore.
 I haven't checked if this is really possible though - I guess it would
 depend on the exact order of these events...

 To fix this, you'd need a lock that allowed getting values from the
 sequence but prevented resetting it, and...

 I wonder how such locks would work. Would such locks prevent accessing
 these objects? Or just modifications? For example, if I locked a function,
 could someone else execute it while I held the lock?

 ...in fact we do very little locking of objects other than tables.
 DROP takes an AccessExclusiveLock on whatever it's dropping, and
 COMMENT and SECURITY LABEL take ShareUpdateExclusiveLocks to avoid
 orphaning pg_{sh,}description or pg_seclabel entries in the face of a
 concurrent drop, but most operations on non-table objects don't AFAIK
 take any lock at all.  We probably don't want to make too many changes
 in this area, because there are already workloads where the
 heavyweight lock manager can become a bottleneck, and one can easily
 imagine that locking operators or namespaces could make that problem
 much worse.

 For query-based replication tools like pgpool-II (I don't know whether any
 other tools, for example Postgres XC, fall in this category...), we need
 to be able to lock sequences. Fortunately it is allowed to:

 SELECT 1 FROM foo_sequence FOR UPDATE;

 but LOCK foo_sequence looks like more appropriate syntax to me.
 
 Those aren't doing the same thing.  The first is locking the one and
 only tuple that is contained within the sequence, while the second is
 locking the sequence object itself.

But a sequence relation contains only one tuple, so there's no
difference between the two, no?

 At this point, I'm inclined to think that the pg_dump comment is just
 wrong, and we ought to fix it to say that we don't really want to be
 able to lock other relations after all, and call it good.
 
 As a side node, locking a sequence for replication seems like it could
 have pretty dire effects on performance in certain workloads.  Why do
 you need to do that, anyway?

Pgpool not only needs to replicate sequences, it also replicates tuples
updated by DML that uses sequence values (I am talking about
SERIAL data types). For this purpose, pgpool issues nextval() to the master
DB server first, then uses the value for the subsequent INSERT/UPDATE. This
guarantees that inserted/updated values using sequences are
identical on the master and slave DB servers. The problem is that if this
process happens in concurrent sessions, inserted/updated tuples might
not have identical values across DB servers. So I need a sequence lock
here. This is the price statement-based replication tools have to pay
for :-(
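
For illustration, the flow described above for a SERIAL column looks roughly
like this (a sketch only; the table and sequence names are hypothetical, and
the row lock is the FOR UPDATE workaround mentioned earlier in the thread):

  BEGIN;
  SELECT 1 FROM accounts_aid_seq FOR UPDATE;  -- serialize concurrent sessions on this sequence
  SELECT nextval('accounts_aid_seq');         -- obtain the value on the master first
  -- pgpool then rewrites the statement to use that explicit value on every node:
  INSERT INTO accounts (aid, abalance) VALUES (101, 0);
  COMMIT;
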
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp



Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Robert Haas
On Sun, Jan 9, 2011 at 4:59 PM, Noah Misch n...@leadboat.com wrote:
 This begins the patch series for the design I recently proposed[1] for 
 avoiding
 some table rewrites in ALTER TABLE ... ALTER COLUMN ... TYPE.  I'm posting 
 these
 patches today:

 0 - new test cases

This doesn't look right.  You might be building it, but you sure
aren't rebuilding it.

+CREATE TABLE parent (keycol numeric PRIMARY KEY);
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
parent_pkey for table parent
+DEBUG:  Rebuilding index parent_pkey

In general, I think this is six kinds of overkill.  I don't think we
really need 2000 lines of new regression tests for this feature.  I'd
like to see that chopped down by at least 10x.

I don't like this bit:

+   ereport(IsToastRelation(indexRelation) ? DEBUG2 : DEBUG1,

I see no reason to set the verbosity differently depending on whether
or not something's a toast relation; that seems more likely to be
confusing than helpful.  I guess my vote would be to make all of these
messages DEBUG2, period.  A quick test suggests that doesn't produce
too much noise executing DDL commands.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 6:31 AM, Tatsuo Ishii is...@postgresql.org wrote:
 For query-based replication tools like pgpool-II (I don't know whether any
 other tools, for example Postgres XC, fall in this category...), we need
 to be able to lock sequences. Fortunately it is allowed to:

 SELECT 1 FROM foo_sequence FOR UPDATE;

 but LOCK foo_sequence looks like more appropriate syntax to me.

 Those aren't doing the same thing.  The first is locking the one and
 only tuple that is contained within the sequence, while the second is
 locking the sequence object itself.

 But a sequence relation contains only 1 tuple and there's no
 difference among them, no?

No, not really.  It's still a different object.

 As a side node, locking a sequence for replication seems like it could
 have pretty dire effects on performance in certain workloads.  Why do
 you need to do that, anyway?

 Pgpool not only needs to replicate sequences but replicates tuples
 updated by DMLs which are using sequence value(I am talking about
 SERIAL data types). For this purpose, pgpool issue nextval() to master
 DB server first, then use the value for subsequent INSERT/UPDATE. This
 will guarantee that inserted/updated values using sequences are
 identical among master and slave DB servers. Problem is, if this
 process happens in concurrent sessions, inserted/updated tuples might
 not have identical value among DB servers. So I need sequence lock
 here. This is the price statement based replication tools have to pay
 for:-

Ouch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 12:17, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jan 11, 2011 at 5:28 AM, Magnus Hagander mag...@hagander.net wrote:
 Does this seem correct?

 It looks reasonable, except that the way you've chosen to capitalize
 the wal sender states makes me want to shoot myself.

Hah, I was waiting for that. I can certainly change that :-)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 12:23, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2011-01-11 at 11:28 +0100, Magnus Hagander wrote:
  
   (You'd need a 4th mode for WAITING or so, to indicate it's waiting for
   a command)
  
   That's something different.
  
   The 3 phases are more concrete.
  
    BACKUP -->  CATCHUP --->  STREAM
  
   When you connect you either do BACKUP or CATCHUP. Once in CATCHUP mode
   you never issue a BACKUP. Once we have caught up we move to STREAM. 
   That
   has nothing to do with idle/active.
 
  So how does a walsender that's waiting for a command from the client
  show up? Surely it's not in catchup mode yet?
 
  There is a trivial state between connect and first command. If you think
  that is worth publishing, feel free. STARTING?
 
  I think it's worth publishing.  STARTING would be OK, or maybe STARTUP
  to parallel the other two -UP states.

 Here's a patch for this. I chose IDLE, because that's what we call
 other backends that are waiting for commands...

 Does this seem correct?

 No

 You can be idle yet in STREAMING mode. What mode we are in has nothing
 to do with idle/active. Either STARTING/STARTUP/NULL but not IDLE.

 If you want that as well, then we need a second column, but personally
 I think it's too much information and it's too hard to say what it actually
 means. For example, with sync rep, the WALSender might be idle, yet
 there might still be backends waiting for a reply.

That's a good point.

Just to be clear, you're objecting to the *name* of the state, right,
not how/where it's set? In particular, how the catchup/streaming
things are set?

I agree that using a second column for it is unnecessary.
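
For reference, with the single state column from the posted patch, the
information would simply be read as part of the view, e.g.:

  SELECT state, sent_location FROM pg_stat_replication;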


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Compatibility GUC for serializable

2011-01-11 Thread Florian Pflug
On Jan10, 2011, at 20:29 , Josh Berkus wrote:
 The only reason I'm ambivalent about
 this is I'm unsure that there are more than a handful of people using
 SERIALIZABLE in production applications, precisely because it's been so
 unintuitive in the past.

I've used it quite extensively in the past. Usually either to run
two consecutive queries with the same snapshot, or to run an
UPDATE .. FROM (since that can be quite a foot-gun in READ COMMITTED
mode). 
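
For illustration, the kind of usage meant here (table name hypothetical; the
point is only that both queries see the same snapshot):

  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  SELECT count(*) FROM accounts;
  SELECT sum(abalance) FROM accounts;
  COMMIT;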

In retrospect, I should have used REPEATABLE READ instead of
SERIALIZABLE, of course. But I didn't, and part of the reason
for that is our very own documentation. The way it is written
gives the impression that SERIALIZABLE is the real name of the
isolation level while REPEATABLE READ is a compatibility synonym,
and it also leads one to believe that true serializability isn't
something that will ever be implemented.

The two sections of 13.2 "Transaction Isolation" dealing with
the isolation levels READ COMMITTED and REPEATABLE READ/SERIALIZABLE
are, for example, 13.2.1 "Read Committed Isolation Level" and
13.2.2 "Serializable Isolation Level".

And, at the end of 13.2, in the section about SERIALIZABLE vs.
true serializability, we say:

"To guarantee true mathematical serializability, it is necessary
for a database system to enforce predicate locking, ...
Such a locking system is complex to implement and extremely expensive
in execution, ... And this large expense is mostly wasted, since in
practice most applications do not do the sorts of things that could
result in problems. ... For these reasons, PostgreSQL does not
implement predicate locking."

I'd be very surprised if nearly all of our users used only the READ
COMMITTED isolation level. And given the wording of our documentation,
I'm quite certain I'm not the only one who used to spell that other
isolation level SERIALIZABLE, not REPEATABLE READ.

The question thus (again) comes down to whether we believe that for
virtually all of these users, true serializability is either an improvement
or at least not a regression. I cannot come up with a case where that wouldn't
hold *in theory*. In practice, however, Kevin's patch includes some
performance vs. false-positives trade-offs, and these *have* the potential
to bite people.

So, to summarize, I believe we need to look at the trade-offs - for
example the way conflict information is summarized to prevent memory
overflow - to judge whether there are any realistic workloads where
those might cause problems.

best regards,
Florian Pflug




Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Simon Riggs
On Tue, 2011-01-11 at 12:41 +0100, Magnus Hagander wrote:

 Just to be clear, you're objecting to the *name* of the state, right,
 not how/where it's set? 

Yes

 In particular, how the catchup/streaming
 things are set?

You've set it in the right places.

I would personally constrain the state transitions, so that we can't
ever make illegal changes, such as CATCHUP -> BACKUP.

I would also check the state locally before grabbing the spinlock, as is
typical in other xlog code. Continually resetting shared state to
exactly what it is already seems strange, to me. If we make the rule
that the state can only be changed by the WALSender itself, it won't
need to grab spinlock. If you don't like that, a local variable works.

Also, mixing upper camel case and uppercase for the STATe NamES looKS
weIRD. Uppercase makes them look more clearly like enum/states as used
elsewhere in similar code.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 12:58, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2011-01-11 at 12:41 +0100, Magnus Hagander wrote:

 Just to be clear, you're objecting to the *name* of the state, right,
 not how/where it's set?

 Yes

 In particular, how the catchup/streaming
 things are set?

 You've set it in the right places.

 I would personally constrain the state transitions, so that we can't
  ever make illegal changes, such as CATCHUP -> BACKUP.

Well, once we enter the walsender loop we can never get out of it, so
it simply cannot happen...


 I would also check the state locally before grabbing the spinlock, as is
 typical in other xlog code. Continually resetting shared state to
 exactly what it is already seems strange, to me. If we make the rule
 that the state can only be changed by the WALSender itself, it won't
 need to grab spinlock. If you don't like that, a local variable works.

Hmm. I don't see why anybody other than the walsender should change
it, so yeah. You mean I can just drop the spinlock calls completely,
and then do an
if (walsnd->state != state)
   walsnd->state = state;


? Do I need to keep the volatile pointer, or should I drop that as well?



 Also, mixing upper camel case and uppercase for the STATe NamES looKS
 weIRD. Uppercase makes them look more clearly like enum/states as used
 elsewhere in similar code.

Yeah, I'll change that.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 09:24:46AM +, Simon Riggs wrote:
 On Sun, 2011-01-09 at 16:59 -0500, Noah Misch wrote:
 
  This begins the patch series for the design I recently proposed[1] for 
  avoiding
  some table rewrites in ALTER TABLE ... ALTER COLUMN ... TYPE.  I'm posting 
  these
  patches today:
 
 These sound very good.

Thanks.

 I have a concern that by making the ALTER TABLE more complex that we
 might not be able to easily tell if a rewrite happens, or not.
 
 Perhaps we should add a WITHOUT REWRITE clause? That would allow a user
 to specify that they do not wish a rewrite, so if the AT requires them
 to have one it would then fail.

These changes do make it harder to guess how much work the ALTER TABLE will do.
Indeed, about 1/4 of my own guesses prior to writing were wrong.  Something like
WITHOUT REWRITE might be the way to go, though there are more questions: if it
does not rewrite, does it scan the table?  Which indexes, if any, does it
rebuild?  Which foreign key constraints, if any, does it recheck?  With patch 0,
you can answer all these questions by enabling DEBUG1 messages and trying the
command on your test system.  For this reason, I did consider adding a VERBOSE
clause to show those messages at DETAIL, rather than unconditionally showing
them at DEBUG1.  In any case, if a WITHOUT REWRITE like you describe covers the
important question, it's certainly easy enough to implement.
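
For example, checking a single conversion on a test system with patch 0
applied would look roughly like this (the target type here is hypothetical;
the message text follows the patch's expected output):

  SET client_min_messages = DEBUG1;
  ALTER TABLE parent ALTER COLUMN keycol TYPE numeric(8,2);
  -- e.g. DEBUG:  Rebuilding index parent_pkey  (as in patch 0's regression output)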

 You might point out I didn't do anything like that for my ALTER TABLE
 patch, and I would agree with you. WITHOUT ACCESS EXCLUSIVE LOCK might
 be an option here also.

True.  At least we could completely document the lock choices on the ALTER TABLE
reference page.  The no-rewrite cases are defined at arms length from ALTER
TABLE, and users can define their own, so it's a tougher fit.



Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Simon Riggs
On Tue, 2011-01-11 at 13:04 +0100, Magnus Hagander wrote:
 On Tue, Jan 11, 2011 at 12:58, Simon Riggs si...@2ndquadrant.com wrote:
  On Tue, 2011-01-11 at 12:41 +0100, Magnus Hagander wrote:
 
  Just to be clear, you're objecting to the *name* of the state, right,
  not how/where it's set?
 
  Yes
 
  In particular, how the catchup/streaming
  things are set?
 
  You've set it in the right places.
 
  I would personally constrain the state transitions, so that we can't
   ever make illegal changes, such as CATCHUP -> BACKUP.
 
 Well, once we enter the walsender loop we can never get out of it, so
 it simply cannot happen...

Accidentally it can, so I would like to protect against that.

Putting those checks in are like Asserts(), they help document what is
and is not possible by design.

  I would also check the state locally before grabbing the spinlock, as is
  typical in other xlog code. Continually resetting shared state to
  exactly what it is already seems strange, to me. If we make the rule
  that the state can only be changed by the WALSender itself, it won't
  need to grab spinlock. If you don't like that, a local variable works.
 
 Hmm. I don't see why anybody other than the walsender should change
 it, so yeah. You mean I can just drop the spinlock calls completely,
 and then do an
  if (walsnd->state != state)
     walsnd->state = state;
 
 
 ? Do I need to keep the volatile pointer, or should I drop that as well?

No, do this at the top:

if (walsnd->state == state)
  return;

Keep the spinlock when actually setting it.

  Also, mixing upper camel case and uppercase for the STATe NamES looKS
  weIRD. Uppercase makes them look more clearly like enum/states as used
  elsewhere in similar code.
 
 Yeah, I'll change that.
 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] system views for walsender activity

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 13:18, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2011-01-11 at 13:04 +0100, Magnus Hagander wrote:
 On Tue, Jan 11, 2011 at 12:58, Simon Riggs si...@2ndquadrant.com wrote:
  On Tue, 2011-01-11 at 12:41 +0100, Magnus Hagander wrote:
 
  Just to be clear, you're objecting to the *name* of the state, right,
  not how/where it's set?
 
  Yes
 
  In particular, how the catchup/streaming
  things are set?
 
  You've set it in the right places.
 
  I would personally constrain the state transitions, so that we can't
   ever make illegal changes, such as CATCHUP -> BACKUP.

 Well, once we enter the walsender loop we can never get out of it, so
 it simply cannot happen...

 Accidentally it can, so I would like to protect against that.

 Putting those checks in are like Asserts(), they help document what is
 and is not possible by design.

  I would also check the state locally before grabbing the spinlock, as is
  typical in other xlog code. Continually resetting shared state to
  exactly what it is already seems strange, to me. If we make the rule
  that the state can only be changed by the WALSender itself, it won't
  need to grab spinlock. If you don't like that, a local variable works.

 Hmm. I don't see why anybody other than the walsender should change
 it, so yeah. You mean I can just drop the spinlock calls completely,
 and then do an
  if (walsnd->state != state)
     walsnd->state = state;


 ? Do I need to keep the volatile pointer, or should I drop that as well?

 No, do this at top

  if (walsnd->state == state)
  return;

 Keep spinlocks when actually setting it.


Aha. Thanks for the pointers, pfa a new version.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***
*** 298,305  postgres: replaceableuser/ replaceabledatabase/ replaceablehost/ re
entrystructnamepg_stat_replication/indextermprimarypg_stat_replication/primary/indexterm/entry
entryOne row per WAL sender process, showing process acronymID/,
user OID, user name, application name, client's address and port number,
!   time at which the server process began execution, and transaction log
!   location.
   /entry
   /row
  
--- 298,305 
entrystructnamepg_stat_replication/indextermprimarypg_stat_replication/primary/indexterm/entry
entryOne row per WAL sender process, showing process acronymID/,
user OID, user name, application name, client's address and port number,
!   time at which the server process began execution, current WAL sender
!   state and transaction log location.
   /entry
   /row
  
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***
*** 501,506  CREATE VIEW pg_stat_replication AS
--- 501,507 
  S.client_addr,
  S.client_port,
  S.backend_start,
+ W.state,
  W.sent_location
  FROM pg_stat_get_activity(NULL) AS S, pg_authid U,
  pg_stat_get_wal_senders() AS W
*** a/src/backend/replication/basebackup.c
--- b/src/backend/replication/basebackup.c
***
*** 24,29 
--- 24,30 
  #include libpq/pqformat.h
  #include nodes/pg_list.h
  #include replication/basebackup.h
+ #include replication/walsender.h
  #include storage/fd.h
  #include storage/ipc.h
  #include utils/builtins.h
***
*** 83,88  SendBaseBackup(const char *options)
--- 84,91 
  		   ALLOCSET_DEFAULT_MAXSIZE);
  	old_context = MemoryContextSwitchTo(backup_context);
  
+ 	WalSndSetState(WALSNDSTATE_BACKUP);
+ 
  	if (backup_label == NULL)
  		ereport(FATAL,
  (errcode(ERRCODE_PROTOCOL_VIOLATION),
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
***
*** 179,184  WalSndHandshake(void)
--- 179,185 
  	{
  		int			firstchar;
  
+ 		WalSndSetState(WALSNDSTATE_IDLE);
  		set_ps_display("idle", false);
  
  		/* Wait for a command to arrive */
***
*** 482,487  WalSndLoop(void)
--- 483,491 
  			if (!XLogSend(output_message, caughtup))
  break;
  		}
+ 
+ 		/* Update our state to indicate if we're behind or not */
+ 		WalSndSetState(caughtup ? WALSNDSTATE_STREAMING : WALSNDSTATE_CATCHUP);
  	}
  
  	/*
***
*** 533,538  InitWalSnd(void)
--- 537,543 
  			 */
  			walsnd->pid = MyProcPid;
  			MemSet(&walsnd->sentPtr, 0, sizeof(XLogRecPtr));
+ 			walsnd->state = WALSNDSTATE_IDLE;
  			SpinLockRelease(&walsnd->mutex);
  			/* don't need the lock anymore */
  			OwnLatch((Latch *) &walsnd->latch);
***
*** 960,965  WalSndWakeup(void)
--- 965,1004 
  		SetLatch(&WalSndCtl->walsnds[i].latch);
  }
  
+ /* Set state for current walsender (only called in walsender) */
+ void WalSndSetState(WalSndState state)
+ {
+ 	/* use volatile pointer to prevent code rearrangement */
+ 	
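
For reference, once this is applied the walsender state becomes visible straight
from SQL; a minimal sketch, using only columns present in the system_views.sql
hunk above:

-- watch each WAL sender's state (idle/backup/catchup/streaming, per the
-- constants above) alongside how far it has sent WAL
SELECT client_addr, backend_start, state, sent_location
  FROM pg_stat_replication;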

Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 06:37:33AM -0500, Robert Haas wrote:
 On Sun, Jan 9, 2011 at 4:59 PM, Noah Misch n...@leadboat.com wrote:
  This begins the patch series for the design I recently proposed[1] for avoiding
  some table rewrites in ALTER TABLE ... ALTER COLUMN ... TYPE.  I'm posting
  these patches today:
 
  0 - new test cases
 
 This doesn't look right.  You might be building it, but you sure
 aren't rebuilding it.
 
 +CREATE TABLE parent (keycol numeric PRIMARY KEY);
 +NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 parent_pkey for table parent
 +DEBUG:  Rebuilding index parent_pkey

Yes.  I considered saying "Building" unconditionally.  Differentiating the debug
message by passing down the fact that the index recently existed seemed like
overkill.  What do you think?

 In general, I think this is six kinds of overkill.  I don't think we
 really need 2000 lines of new regression tests for this feature.  I'd
 like to see that chopped down by at least 10x.

I just included all the tests I found useful to check my own work.  If 200 lines
of SQL and expected output is the correct amount, I'll make it so.

 I don't like this bit:
 
 +   ereport(IsToastRelation(indexRelation) ? DEBUG2 : DEBUG1,
 
 I see no reason to set the verbosity differently depending on whether
 or not something's a toast relation; that seems more likely to be
 confusing than helpful.  I guess my vote would be to make all of these
 messages DEBUG2, period.  A quick test suggests that doesn't produce
 too much noise executing DDL commands.

The theoretical basis is that users can do little to directly change the need to
rebuild a TOAST index.  We'll rebuild the TOAST index if and only if we rewrote
the table.  The practical basis is that the TOAST relation names contain parent
relation OIDs, so we don't want them mentioned in regression test output.

Thanks for reviewing.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [TESTERS] [TEST REPORT] 9.1Alpha3 Feature E.1.4.7.2 in release notes.

2011-01-11 Thread Peter Eisentraut
On tis, 2011-01-11 at 12:30 +0900, Itagaki Takahiro wrote:
 On Tue, Jan 11, 2011 at 11:10, Tom Lane t...@sss.pgh.pa.us wrote:
  Itagaki Takahiro itagaki.takah...@gmail.com writes:
  It was reported by a tester that we don't have casts of money from/to integer
  types even though we have from/to numeric type.
 
  In most locales, the idea isn't sensible.
 
 The documentation says:
 | Input is accepted in a variety of formats,
 | including integer and floating-point literals
 
 If we won't add accepting integers for money, we should fix the docs.
 | integer and floating-point string literals
 |~~~
 Will it get better?

I think adding a cast from integer to money is probably quite
reasonable.  The other way around, maybe not, or only an explicit cast.
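
For illustration, the explicit route that already exists (assuming the new
numeric/money casts this thread is about, and a locale that accepts the plain
numeric string):

SELECT 12::numeric::money;            -- integer to money, via numeric
SELECT '12.34'::money::numeric::int;  -- money to integer, explicit and lossy
-- SELECT 12::money;                  -- no direct cast today; this is the gap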


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Robert Haas
On Tue, Jan 11, 2011 at 7:14 AM, Noah Misch n...@leadboat.com wrote:
 I have a concern that by making the ALTER TABLE more complex we
 might not be able to easily tell if a rewrite happens, or not.

 Perhaps we should add a WITHOUT REWRITE clause? That would allow a user
 to specify that they do not wish a rewrite, so if the AT requires them
 to have one it would then fail.

 These changes do make it harder to guess how much work the ALTER TABLE will 
 do.
 Indeed, about 1/4 of my own guesses prior to writing were wrong.  Something 
 like
 WITHOUT REWRITE might be the way to go, though there are more questions: if it
 does not rewrite, does it scan the table?  Which indexes, if any, does it
 rebuild?  Which foreign key constraints, if any, does it recheck?  With patch 
 0,
 you can answer all these questions by enabling DEBUG1 messages and trying the
 command on your test system.  For this reason, I did consider adding a VERBOSE
 clause to show those messages at DETAIL, rather than unconditionally showing
 them at DEBUG1.  In any case, if a WITHOUT REWRITE like you describe covers 
 the
 important question, it's certainly easy enough to implement.

I'm doubtful that it's worth complicating the parser logic.  Our DDL
is transactional, so someone can always begin a transaction, increase
client_min_messages, and then look at the output from these added
debug messages to see what is happening.  It should be clear within a
second or two if it's not what is wanted, and they can just hit ^C.

 True.  At least we could completely document the lock choices on the ALTER TABLE
 reference page.  The no-rewrite cases are defined at arm's length from ALTER
 TABLE, and users can define their own, so it's a tougher fit.

I don't think it's prohibitively tough, though, and I think it would
be very valuable.  We may have to scratch our heads over exactly where
to put the information, but we can figure out something that works.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Anssi Kääriäinen

On 01/10/2011 06:03 PM, Kevin Grittner wrote:

Due to popular request (Hey, David's popular, right?), I'm posting a
patch for Serializable Snapshot Isolation (SSI), although I don't
yet have everything in it that I was planning on submitting before
the CF.  I will probably be submitting another version before the
deadline with the following, but there should be plenty here for
people to test and benchmark.  We're done with the major refactoring
needed to address concerns raised in earlier reviews, and I don't
expect the remaining work to destabilize what's there or to have a
significant impact on performance.

I think I found a problem. This is using SSI v8. The table definition:

create table test_t (id integer, val1 text, val2 integer);

create index test_idx on test_t(id) where val2 = 1;

The data:

insert into test_t (select generate_series(0, 1), 'a', 2);
insert into test_t (select generate_series(0, 10), 'a', 1);

The transactions:
T1:
hot2= begin transaction isolation level serializable;
BEGIN
hot2= select * from test_t where val2 = 1;
 id | val1 | val2
+--+--
  0 | a|1
  1 | a|1
  2 | a|1
  3 | a|1
  4 | a|1
  5 | a|1
  6 | a|1
  7 | a|1
  8 | a|1
  9 | a|1
 10 | a|1
(11 rows)

hot2= update test_t set val2 = 2 where val2 = 1 and id = 10;
UPDATE 1
-- The concurrent transaction:
T2:
hot2= begin transaction isolation level serializable;
BEGIN
hot2= select * from test_t where val2 = 1;
 id | val1 | val2
+--+--
  0 | a|1
  1 | a|1
  2 | a|1
  3 | a|1
  4 | a|1
  5 | a|1
  6 | a|1
  7 | a|1
  8 | a|1
  9 | a|1
 10 | a|1
(11 rows)

hot2= update test_t set val2 = 2 where val2 = 1 and id = 9;
UPDATE 1
hot2= commit;
COMMIT
-- Now, t1 can commit, too. Even though there is a serialization anomaly
T1:
hot2= commit;
COMMIT

If the test_idx is changed:
(outside any transaction...)
hot2= drop index test_idx;
DROP INDEX
hot2= create index test_idx on test_t(id, val2);
CREATE INDEX


T1:
hot2= begin transaction isolation level serializable;
BEGIN
hot2= select * from test_t where val2 = 1;
 id | val1 | val2
+--+--
  0 | a|1
  1 | a|1
  2 | a|1
  3 | a|1
  4 | a|1
  5 | a|1
  6 | a|1
  7 | a|1
  8 | a|1
(9 rows)

hot2= update test_t set val2 = 2 where val2 = 1 and id = 8;
UPDATE 1

T2:
hot2= select * from test_t where val2 = 1;
 id | val1 | val2
+--+--
  0 | a|1
  1 | a|1
  2 | a|1
  3 | a|1
  4 | a|1
  5 | a|1
  6 | a|1
  7 | a|1
  8 | a|1
(9 rows)

hot2= update test_t set val2 = 2 where val2 = 1 and id = 7;
UPDATE 1
hot2= commit;
ERROR:  could not serialize access due to read/write dependencies among 
transactions

HINT:  The transaction might succeed if retried.
T1:
hot2= commit;
COMMIT

So, something seems to be broken when using partial indexes.

 - Anssi


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Simon Riggs
On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:

 These changes do make it harder to guess how much work the ALTER TABLE
 will do. Indeed, about 1/4 of my own guesses prior to writing were
 wrong.  Something like WITHOUT REWRITE might be the way to go, though
 there are more questions: if it does not rewrite, does it scan the
 table?  Which indexes, if any, does it rebuild?  Which foreign key
 constraints, if any, does it recheck?  With patch 0, you can answer
 all these questions by enabling DEBUG1 messages and trying the command
 on your test system.  For this reason, I did consider adding a VERBOSE
 clause to show those messages at DETAIL, rather than unconditionally
 showing them at DEBUG1.  In any case, if a WITHOUT REWRITE like you
 describe covers the important question, it's certainly easy enough to
 implement.

Trouble is, only superusers can set DEBUG1.

You're right, it's more complex than I made out, though that strengthens
the feeling that we need a way to check what it does before it does it,
or a way to limit your expectations. Ideally that would be a
programmatic way, so that pgAdmin et al can issue a warning.

Given your thoughts above, my preference would be for 
EXPLAIN ALTER TABLE to describe the actions that will take place.

Or other ideas... 
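
To make that concrete, a purely hypothetical sketch (neither the syntax nor the
output exists anywhere today; table and index names are made up):

EXPLAIN ALTER TABLE accounts ALTER COLUMN balance TYPE numeric(12,2);
-- imagined report:
--   Rewrite table:        no
--   Scan table:           yes (revalidate CHECK constraints)
--   Rebuild indexes:      accounts_balance_idx
--   Recheck foreign keys: none
--   Lock level:           ACCESS EXCLUSIVE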

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PGCon 2011 Call for Papers - reminder

2011-01-11 Thread Dan Langille

A reminder about PGCon 2011; the deadline is 19 January 2011.

PGCon 2011 will be held 19-20 May 2011, in Ottawa at the University of
Ottawa.  It will be preceded by two days of tutorials on 17-18 May 2011.

We are now accepting proposals for talks.  Proposals can be quite 
simple. We do not require academic-style papers.


If you are doing something interesting with PostgreSQL, please submit
a proposal.  You might be one of the backend hackers or work on a
PostgreSQL related project and want to share your know-how with
others. You might be developing an interesting system using PostgreSQL
as the foundation. Perhaps you migrated from another database to
PostgreSQL and would like to share details.  These, and other stories
are welcome. Both users and developers are encouraged to share their
experiences.

Here are some ideas to jump-start your proposal process:

- novel ways in which PostgreSQL is used
- migration of production systems from another database
- data warehousing
- tuning PostgreSQL for different work loads
- replication and clustering
- hacking the PostgreSQL code
- PostgreSQL derivatives and forks
- applications built around PostgreSQL
- benchmarking and performance engineering
- case studies
- location-aware and mapping software with PostGIS
- PostgreSQL 9.1 and 9.2 features in development
- research and teaching with PostgreSQL
- things the PostgreSQL project could do better
- integrating PostgreSQL with 3rd-party software

Both users and developers are encouraged to share their experiences.

The schedule is:

 1 Dec 2010 Proposal acceptance begins
19 Jan 2011 Proposal acceptance ends
19 Feb 2011 Confirmation of accepted proposals

See also http://www.pgcon.org/2011/papers.php

Instructions for submitting a proposal to PGCon 2011 are available
from: http://www.pgcon.org/2011/submissions.php

--
Dan Langille - http://langille.org/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 12:37:28PM +, Simon Riggs wrote:
 On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
 
  These changes do make it harder to guess how much work the ALTER TABLE
  will do. Indeed, about 1/4 of my own guesses prior to writing were
  wrong.  Something like WITHOUT REWRITE might be the way to go, though
  there are more questions: if it does not rewrite, does it scan the
  table?  Which indexes, if any, does it rebuild?  Which foreign key
  constraints, if any, does it recheck?  With patch 0, you can answer
  all these questions by enabling DEBUG1 messages and trying the command
  on your test system.  For this reason, I did consider adding a VERBOSE
  clause to show those messages at DETAIL, rather than unconditionally
  showing them at DEBUG1.  In any case, if a WITHOUT REWRITE like you
  describe covers the important question, it's certainly easy enough to
  implement.
 
 Trouble is, only superusers can set DEBUG1.

Setting client_min_messages in one's session works, as does ALTER ROLE myself
SET client_min_messages = debug1.  Still, indeed, DEBUG1 is not the usual place
to send a user for information.

 You're right, it's more complex than I made out, though that strengthens
 the feeling that we need a way to check what it does before it does it,
 or a way to limit your expectations. Ideally that would be a
 programmatic way, so that pgAdmin et al can issue a warning.
 
 Given your thoughts above, my preference would be for 
 EXPLAIN ALTER TABLE to describe the actions that will take place.

That does seem like the best UI.  Offhand, I would guess that's a project larger
than the patch series I have here.  We'd need to restructure ALTER TABLE into
clear planning and execution stages, if not use the actual planner and executor.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add function dependencies

2011-01-11 Thread Peter Eisentraut
On mån, 2011-01-10 at 23:59 +0100, Joel Jacobson wrote:
 It would be equally useful if it warned you when trying to drop a
 function other functions might depend on.

This would only work for a small subset of cases, so the argument can be
made that it is less surprising to say that we don't track dependencies of
the function source code at all.

Making it work for language SQL would be nice, though.
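
For instance (function names made up), nothing records the dependency today:

CREATE FUNCTION add_tax(numeric) RETURNS numeric
    LANGUAGE sql AS $$ SELECT $1 * 1.25 $$;

CREATE FUNCTION order_total(numeric) RETURNS numeric
    LANGUAGE sql AS $$ SELECT add_tax($1) $$;

-- succeeds, because no dependency on add_tax() is stored for the body,
-- and order_total() only breaks the next time it is called:
DROP FUNCTION add_tax(numeric);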


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 07:27:46AM -0500, Robert Haas wrote:
 On Tue, Jan 11, 2011 at 7:14 AM, Noah Misch n...@leadboat.com wrote:
  True.  At least we could completely document the lock choices on the ALTER TABLE
  reference page.  The no-rewrite cases are defined at arm's length from ALTER
  TABLE, and users can define their own, so it's a tougher fit.
 
 I don't think it's prohibitively tough, though, and I think it would
 be very valuable.  We may have to scratch our heads over exactly where
 to put the information, but we can figure out something that works.

Agreed.  I don't mean to suggest giving up on documenting anything, just that
some behaviors are clear enough by documentation alone, while others benefit
from additional syntax/messages to constrain/see what actually happens.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Simon Riggs
On Tue, 2011-01-11 at 08:06 -0500, Noah Misch wrote:
 On Tue, Jan 11, 2011 at 12:37:28PM +, Simon Riggs wrote:
  On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
  
   These changes do make it harder to guess how much work the ALTER TABLE
   will do. Indeed, about 1/4 of my own guesses prior to writing were
   wrong.  Something like WITHOUT REWRITE might be the way to go, though
   there are more questions: if it does not rewrite, does it scan the
   table?  Which indexes, if any, does it rebuild?  Which foreign key
   constraints, if any, does it recheck?  With patch 0, you can answer
   all these questions by enabling DEBUG1 messages and trying the command
   on your test system.  For this reason, I did consider adding a VERBOSE
   clause to show those messages at DETAIL, rather than unconditionally
   showing them at DEBUG1.  In any case, if a WITHOUT REWRITE like you
   describe covers the important question, it's certainly easy enough to
   implement.
  
  Trouble is, only superusers can set DEBUG1.
 
 Setting client_min_messages in one's session works, as does ALTER ROLE myself
 SET client_min_messages = debug1.  Still, indeed, DEBUG1 is not the usual 
 place
 to send a user for information.
 
  You're right, it's more complex than I made out, though that strengthens
  the feeling that we need a way to check what it does before it does it,
  or a way to limit your expectations. Ideally that would be a
  programmatic way, so that pgAdmin et al can issue a warning.
  
  Given your thoughts above, my preference would be for 
  EXPLAIN ALTER TABLE to describe the actions that will take place.
 
 That does seem like the best UI.  Offhand, I would guess that's a project 
 larger
 than the patch series I have here.  We'd need to restructure ALTER TABLE into
 clear planning and execution stages, if not use the actual planner and 
 executor.

Please do something that works in this release, whatever that is. I will
follow your lead in putting a similar mechanism in for judging lock
levels.

I don't want to be looking through the docs each time I run this,
sweating between it taking 5 secs and 5 hours on a big server.
We need to be able to run stuff overnight, with certainty that we know
what will happen.

And I want a clear answer when the "but how can you be certain?"
question gets asked.

Thank you for the rest of the patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Joel Jacobson
2011/1/11 Robert Haas robertmh...@gmail.com:
 I don't get it.  If two different items that exist in the system out
 of the box have the same description, it seems clear that relevant
 piece of disambiguating information exists nowhere in the description
 string.

I guess it is a question of prioritization.
If backwards compatibility is to be guaranteed even for functions
returning text intended to be read by humans, then the function cannot
be modified without violating that golden rule, assuming such a rule
exists within the PostgreSQL development project at all.

If it's not a golden rule, then it's a totally different story, and
there is no excuse for returning the same description for two
different objects.
Any other reasoning is just silly.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Noah Misch
On Tue, Jan 11, 2011 at 01:17:23PM +, Simon Riggs wrote:
 On Tue, 2011-01-11 at 08:06 -0500, Noah Misch wrote:
  On Tue, Jan 11, 2011 at 12:37:28PM +, Simon Riggs wrote:
   Given your thoughts above, my preference would be for 
   EXPLAIN ALTER TABLE to describe the actions that will take place.
  
  That does seem like the best UI.  Offhand, I would guess that's a project 
  larger
  than the patch series I have here.  We'd need to restructure ALTER TABLE 
  into
  clear planning and execution stages, if not use the actual planner and 
  executor.
 
 Please do something that works in this release, whatever that is. I will
 follow your lead in putting a similar mechanism in for judging lock
 levels.

Okay; I'll see what I can come up with.  The other part I was going to try to
finish before the last commitfest begins is avoiding unnecessary rebuilds of
indexes involving changed columns.  Is that more or less important than having
an EXPLAIN ALTER TABLE?

 I don't want to be looking through the docs each time I run this,
 sweating between it taking 5 secs and 5 hours on a big server.
 We need to be able to run stuff overnight, with certainty that we know
 what will happen.
 
 And I want a clear answer when the but how can you be certain?
 question gets asked.

Just to clarify: does Robert's suggestion of starting the command in a
transaction block and cancelling it after messages appear (on any other DB with
the same schema, if need be) give too little certainty?  You could check
pg_locks to see what lock was taken, too.  It's surely not the ideal user
experience, but it seems dependable at least.
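
In other words, something along these lines (names are placeholders):

BEGIN;
SET LOCAL client_min_messages = debug1;
ALTER TABLE payments ALTER COLUMN amount TYPE numeric(12,2);
-- the DEBUG lines about rewriting/scanning and index rebuilds show up here
SELECT locktype, mode, granted
  FROM pg_locks
 WHERE relation = 'payments'::regclass;
ROLLBACK;  -- nothing is kept; this was only a dry run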

Thanks,
nm

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Robert Haas
On Jan 11, 2011, at 8:25 AM, Joel Jacobson j...@gluefinance.com wrote:
 2011/1/11 Robert Haas robertmh...@gmail.com:
 I don't get it.  If two different items that exist in the system out
 of the box have the same description, it seems clear that relevant
 piece of disambiguating information exists nowhere in the description
 string.
 
 I guess it is a question of prioritization.
 If backwards compatibility is to be guaranteed, even for functions
 returning text intended to be read by humans, then the function cannot
 be modified, without violating that golden rule, if such a rule exists
 within the PostgreSQL development project?
 
 If it's not a golden rule, then it's a totally different story and
 there is no excuse why it should return the same descriptions for the
 same objects.
 Any other reasoning is just silly.

Well, we shouldn't change them randomly or arbitrarily, but improving them is 
another thing altogether.  I think the contention that any user or application 
anywhere is depending on the exact textual representation of a pg_amproc entry 
is exceedingly dubious.  And I think the current messages are flat-out 
confusing.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Robert Haas
On Jan 11, 2011, at 8:50 AM, Noah Misch n...@leadboat.com wrote:
 Okay; I'll see what I can come up with.  The other part I was going to try to
 finish before the last commitfest begins is avoiding unnecessary rebuilds of
 indexes involving changed columns.  Is that more or less important than having
 an EXPLAIN ALTER TABLE?

I think something like EXPLAIN ALTER TABLE would be #%^* awesome (and kudos to 
Simon for thinking of it), but your chances of getting that into 9.1 are not very 
good.  So I'd focus on the index rebuild issue, which is more clear-cut.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Anssi Kääriäinen

On 01/10/2011 06:03 PM, Kevin Grittner wrote:

Due to popular request (Hey, David's popular, right?), I'm posting a
patch for Serializable Snapshot Isolation (SSI), although I don't
yet have everything in it that I was planning on submitting before
the CF.  I will probably be submitting another version before the
deadline with the following, but there should be plenty here for
people to test and benchmark.  We're done with the major refactoring
needed to address concerns raised in earlier reviews, and I don't
expect the remaining work to destabilize what's there or to have a
significant impact on performance.

A speed test showing a significant drop in performance when using SSI:

hot2= create table test_t2 as (select generate_series(0, 100));
hot2= \timing
begin transaction isolation level repeatable read;
Time: 0.185 ms
hot2= select count(*) from test_t2;
Time: 134.521 ms
hot2= select count(*) from test_t2;
Time: 142.132 ms
hot2= select count(*) from test_t2;
Time: 147.469 ms
hot2= commit;
hot2= begin transaction isolation level serializable;
Time: 0.165 ms
hot2= select count(*) from test_t2;
Time: 349.219 ms
hot2= select count(*) from test_t2;
Time: 354.010 ms
hot2= select count(*) from test_t2;
Time: 355.960 ms
hot2= commit;

So, count(*) queries are more than twice as slow compared to the old 
serializable transaction isolation level. The relative speed difference 
stays the same if testing with more rows. Also, the same speed 
difference is there if testing with more columns:


create table test_t4 as (select g g1, g g2, g g3, g g4, g g5, g g6 from 
(select generate_series as g from generate_series(0, 100)) as t1);


repeatable read times:
140.113 ms 134.628 ms 140.113 ms 156.166 ms

serializable:
353.257 ms 366.558 ms 373.914 ms 359.682 ms

 - Anssi

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TYPE 0: Introduction; test cases

2011-01-11 Thread Csaba Nagy
On Tue, 2011-01-11 at 07:14 -0500, Noah Misch wrote:
 On Tue, Jan 11, 2011 at 09:24:46AM +, Simon Riggs wrote:
  I have a concern that by making the ALTER TABLE more complex we
  might not be able to easily tell if a rewrite happens, or not.

What about adding EXPLAIN support to it? Then whoever wants to know what it
does should just run explain on it. I have no idea how hard that would
be to implement and if it makes sense at all, but I sure would like to
see a plan for DDLs too to estimate how long it would take.

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Kevin Grittner
Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
 
 A speed test showing a significant drop in performance when using
SSI:
 
 hot2= create table test_t2 as (select generate_series(0, 100));
 hot2= \timing
 begin transaction isolation level repeatable read;
 Time: 0.185 ms
 hot2= select count(*) from test_t2;
 Time: 134.521 ms
 hot2= select count(*) from test_t2;
 Time: 142.132 ms
 hot2= select count(*) from test_t2;
 Time: 147.469 ms
 hot2= commit;
 hot2= begin transaction isolation level serializable;
 Time: 0.165 ms
 hot2= select count(*) from test_t2;
 Time: 349.219 ms
 hot2= select count(*) from test_t2;
 Time: 354.010 ms
 hot2= select count(*) from test_t2;
 Time: 355.960 ms
 hot2= commit;
 
 So, count(*) queries are more than twice as slow compared to the old
 serializable transaction isolation level.
 
Thanks for the report.  I'll profile that and see what can be done about
it.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Kevin Grittner
Anssi Kääriäinen anssi.kaariai...@thl.fi wrote:
 
 I think I found a problem. This is using SSI v8.
 
Thanks much for testing.  You're managing to exercise some code
paths I didn't think to test, which is great!  I guess this is the
up side of having posted yesterday.  :-)
 
 So, something seems to be broken when using partial indexes.
 
Yes there is.  The predicate locks against the heap tuples should
have prevented that, but obviously we're missing a call to
PredicateLockTuple in the code path for partial indexes that is
present in the code path for complete indexes.  I'll go looking for
the spot to add that line of code.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Jan Urbański
On 28/12/10 12:25, Jan Urbański wrote:
 Here's the basic errcodes.txt file and three scripts to generate
 errcodes.h, plerrcodes.h and part of errcodes.sgml.
 
 I tried wiring it into the build system, but failed, I can't figure out
 which Makefiles should be updated in order to make errcodes.h and
 plerrcodes.h generated headers. Could someone help with that?

Trying a bit harder to make src/include/utils/errcodes.h a generated
file I found that it's included so early it makes the task not trivial
at all. postgres.h includes elog.h, which includes errcodes.h (after
defining the MAKE_SQLSTATE macro). I'm not sure how to proceed:

 1) just give up on it, and keep on maintaining 3 places where new error
codes have to go
 2) do not include errcodes.h in elog.h, move the MAKE_SQLSTATE
definition to the top of errcodes.h, guarded by a #ifndef MAKE_SQLSTATE,
fix every place that included elog.h and assumed it has the sqlstate
values to explicitly include errcodes.h. This means that you can still
define your own MAKE_SQLSTATE macro and then include errcodes.h if you
want to redefine what it does.
 3) try to wire generating errcodes.h somewhere very early in the makefiles

My preference is 2) followed by 1), because 3) seems too horrible. But I
can understand if people will want to keep things as they are and just
forget about generating errcodes.h

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Joel Jacobson j...@gluefinance.com writes:
 I instead propose we introduce a new function named
 pg_get_object_unique_identifier( classid oid, objid oid, objsubid
 integer ) returns text.

Seems like concatenating the OIDs would accomplish that.  (If you
think not, well, you still haven't explained what problem you're trying
to solve...)

 The name would make sense since we already have a
 pg_get_function_identity_arguments( func_oid ), for a similar purpose
 but solely for functions.

No, that does not exist for the purpose you claim it does.  And it's far
from obvious what the extension of that to all object classes would
look like.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Alvaro Herrera
Excerpts from Robert Haas's message of mar ene 11 10:52:12 -0300 2011:

 Well, we shouldn't change them randomly or arbitrarily, but improving them is 
 another thing altogether.  I think the contention that any user or 
 application anywhere is depending on the exact textual representation of a 
 pg_amproc entry is exceedingly dubious.  And I think the current messages are 
 flat-out confusing.

Tom also mentioned that the descriptions are translated, so an
application built to parse anything out of the strings is already
broken.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Joel Jacobson
2011/1/11 Tom Lane t...@sss.pgh.pa.us:
 Seems like concatenating the OIDs would accomplish that.  (If you
 think not, well, you still haven't explained what problem you're trying
 to solve...)

The OIDs can be different in two different databases sharing the same
original schema but of two different versions.
In this case it is better to compare textual strings describing the
objects than to compare based on OIDs.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LOCK for non-tables

2011-01-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 11, 2011 at 4:46 AM, Tatsuo Ishii is...@postgresql.org wrote:
 For query-based replication tools like pgpool-II (I don't know whether any
 other tools, for example Postgres-XC, fall in this category...), we need to
 be able to lock sequences. Fortunately it is allowed to:
 
 SELECT 1 FROM foo_sequence FOR UPDATE;
 
 but LOCK foo_sequence looks like more appropriate syntax to me.

 Those aren't doing the same thing.  The first is locking the one and
 only tuple that is contained within the sequence, while the second is
 locking the sequence object itself.

 At this point, I'm inclined to think that the pg_dump comment is just
 wrong, and we ought to fix it to say that we don't really want to be
 able to lock other relations after all, and call it good.

The reason that pg_dump tries to acquire locks at all is to ensure that
it dumps a consistent view of the database.  The general excuse for not
locking non-table objects is that (at least in most cases) they are
defined by single catalog entries and so there's no way to see a
non-self-consistent view of them.  Tables, being defined by a collection
of rows in different catalogs, are *very* risky to dump without any
lock.  This doesn't get noticeably better for non-table relation types.
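
Concretely, this is roughly what pg_dump issues for every table it is about to
dump (the table name is a placeholder):

BEGIN;
LOCK TABLE public.some_table IN ACCESS SHARE MODE;
-- ... catalog lookups and COPY of the data happen under that lock ...
COMMIT;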

An example of the sort of risk I'm thinking about is dumping a view
without any lock while someone else does a CREATE OR REPLACE VIEW on
it.  You could very easily see a set of attributes (in pg_attribute)
that don't agree with the view rules you pulled from pg_rewrite.  The
risk is minimal now since we don't allow C.O.R.V. to change the column
set, but as soon as somebody creates a patch that allows that, pg_dump
will have a problem.

Note that using a serializable transaction (with or without true
serializability) doesn't fix this issue, since pg_dump depends so
heavily on backend-side support functions that work in SnapshotNow mode.
It really needs locks to ensure that the support functions see a view
consistent with its own catalog reads.

In the SEQUENCE example above, SELECT ... FOR UPDATE is certainly not
adequate to protect the sequence against DDL-level changes.  Fortunately
sequences don't have too many DDL commands, but still an ALTER RENAME
might be enough to confuse pg_dump.

(By the way, does that SELECT ... FOR UPDATE actually accomplish
anything at all?  nextval() doesn't go through heap_update, and neither
does ALTER SEQUENCE, so I'd be a bit surprised if it really manages to
block changes to the sequence.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI patch version 8

2011-01-11 Thread Anssi Kääriäinen

On 01/11/2011 04:53 PM, Kevin Grittner wrote:

Thanks much for testing.  You're managing to exercise some code
paths I didn't think to test, which is great!  I guess this is the
up side of having posted yesterday.  :-)


Glad that I can help. This feature is something that is very important
to our usage of PostgreSQL.

Just to add some positive feedback: I tried this patch with our in-house
in-development EAV database. Everything was working perfectly, and the time
to import current and history data for 7000 employees (total of 95000
attributes) to the DB using stored procedures went from 4 minutes 20
seconds to 4 minutes 30 seconds (the procedures are doing a lot of
validation...). So in this case the real-world performance difference
was barely noticeable. SSI was also able to block serialization
anomalies and I did not have any false rollbacks in my (very limited)
testing. So, thank you for the work on this feature. And, of course, I
will try to break it some more. :-)

 - Anssi



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 16:12 , Tom Lane wrote:
 Joel Jacobson j...@gluefinance.com writes:
 I instead propose we introduce a new function named
 pg_get_object_unique_identifier( classid oid, objid oid, objsubid
 integer ) returns text.
 
 Seems like concatenating the OIDs would accomplish that.  (If you
 think not, well, you still haven't explained what problem you're trying
 to solve...)

I think the OP wants a unique identifier which changes neither on dump and
reload nor on major version upgrades. I can see the value in that if you
e.g. want to compare the structures of two different postgres databases.

It seems impossible to guarantee the identifier to not change between
major versions, though - if the structure of that catalog change, so will
the identifier.

@OP: Wouldn't it be sufficient to provide such a thing for structure
objects that people are actually going to modify on a regular basis?
I suggest restricting this to

tables (including SQL/MED foreign tables)
views
sequences
types
servers (SQL/MED)
tsearch configurations
tsearch dictionaries

for which schema.name suffices,

triggers
constraints
columns

for which schema.table.name should work, and

functions
operators

which need to include the argument types.

The reg* types already solve this (see the quick example below) for

tables, views, sequences (regclass)
tsearch configurations (regconfig)
tsearch dictionaries (regdictionary)
types (regtype)
functions (regprocedure)
operators (regoperator)

which leaves

servers
triggers
constraints
columns
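
For the classes they do cover, the reg* types give stable, OID-independent
names directly; a quick example using only built-in objects:

SELECT 'pg_class'::regclass,              -- relation
       'english'::regconfig,              -- text search configuration
       'simple'::regdictionary,           -- text search dictionary
       'numeric'::regtype,                -- type
       'lower(text)'::regprocedure,       -- function, with argument types
       '=(integer,integer)'::regoperator; -- operator, with argument types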

best regards,
Florian Pflug






-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] casts: max double precision -> text -> double precision fails with out of range error

2011-01-11 Thread Alvaro Herrera
Excerpts from Maciej Sakrejda's message of mar ene 11 03:28:13 -0300 2011:
 Tried asking this in pgsql-general but I got no response, so I thought
 I'd give hackers a shot:
 
 postgres=# select (((1.7976931348623157081e+308)::double
 precision)::text)::double precision;
 ERROR:  1.79769313486232e+308 is out of range for type double precision
 
 I'm working on a pg driver and in my float data decoder functional
 tests, I ran into some errors that I eventually traced back to this
 behavior. Essentially, postgres seems to cast the max normal double
 (i.e., the bits of ~(1ULL52 | 1ULL63)) to text in such a manner
 that it's rounded up, and the reverse cast, text-to-double-precision,
 does not recognize it as being in range. Curiously, pg_dump seems to
 print doubles with more precision (in both COPY and INSERT modes),
 avoiding this issue.

Yeah, it sets the extra_float_digits parameter.

alvherre=# set extra_float_digits to 3;
SET
alvherre=# select (((1.7976931348623157081e+308)::double 
precision)::text)::double precision;
  float8  
──
 1.79769313486231571e+308
(1 fila)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
Has anyone written an in-depth description of how to traverse the pg_depend tree?
The 'a' and 'i' deptypes really make it hard to figure out the
dependency order; a plain topological sort does not work.

My latest attempt involved grouping together all objects connected to
each other via deptype 'a' or 'i', and replacing all such nodes in the
tree with the source node, i.e. the node which, according to the
topological order, could be created first.

Am I on the right path, trying to fuse the internal/auto objects
together and replacing them with the topmost object in the tree?
Or is there a simpler way to figure out the order in which objects
can be created?

I need a general solution, not a custom-made query for each regclass;
that would be quite trivial but feels far from bulletproof. I want
something relying only on pg_depend, since it should be the safest
method of them all.
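
For reference, the raw edges are at least easy to eyeball with nothing but
pg_depend and pg_describe_object; a sketch of my starting point:

SELECT pg_describe_object(classid, objid, objsubid)          AS dependent,
       deptype,
       pg_describe_object(refclassid, refobjid, refobjsubid) AS referenced
  FROM pg_depend
 WHERE deptype IN ('n', 'a', 'i')
 ORDER BY refclassid, refobjid, refobjsubid;

but turning that edge list into a safe creation order is exactly where I am stuck.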

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Garick Hamlin
On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
 On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
  2011/1/7 Magnus Hagander mag...@hagander.net:
  On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
  cedric.villemain.deb...@gmail.com wrote:
  2011/1/5 Magnus Hagander mag...@hagander.net:
  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine dimi...@2ndquadrant.fr 
  wrote:
  Magnus Hagander mag...@hagander.net writes:
  * Stefan mentiond it might be useful to put some
  posix_fadvise(POSIX_FADV_DONTNEED)
    in the process that streams all the files out. Seems useful, as long 
  as that
    doesn't kick them out of the cache *completely*, for other backends 
  as well.
    Do we know if that is the case?
 
  Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
  not already in SHM?
 
  I think that's way more complex than we want to go here.
 
 
  DONTNEED will remove the block from OS buffer everytime.
 
  Then we definitely don't want to use it - because some other backend
  might well want the file. Better leave it up to the standard logic in
  the kernel.
 
  Looking at the patch, it is (very) easy to add the support for that in
  basebackup.c
  That supposed allowing mincore(), so mmap(), and so probably switch
  the fopen() to an open() (or add an open() just for mmap
  requirement...)
 
  Let's go ?
 
 Per above, I still don't think we *should* do this. We don't want to
 kick things out of the cache underneath other backends, and since we
 can't control that. Either way, it shouldn't happen in the beginning,
 and if it does, should be backed with proper benchmarks.

Another option that occurs to me is an option to use direct IO (or another
means as needed) to bypass the cache.  So rather than kicking it out of 
the cache, we attempt just not to pollute the cache by bypassing it for cold
pages and use either normal io for 'hot pages', or use a 'read()' to heat 
the cache afterward.

Garick

 
 I've committed the backend side of this, without that. Still working
 on the client, and on cleaning up Heikki's patch for grammar/parser
 support.
 
 -- 
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add function dependencies

2011-01-11 Thread Dimitri Fontaine
Peter Eisentraut pete...@gmx.net writes:
 Making it work for language SQL would be nice, though.

Please consider a new DEPENDENCY_XXX constant for that though, because
otherwise I think it could cause problems in the extension's dependency
tracking.  Even with a new DEPENDENCY_FUNCALL or other constant, the
extension code would need to be aware of this new kind of not-to-follow
dependencies somehow, I guess.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jan 10, 2011 at 8:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not really.  AFAIR, there are two cases that exist in practice,
 depending on which AM you're talking about:
 
 1. The recorded types match the input types of the operator/function
   (btree  hash).
 2. The recorded types are always the same as the opclass's input type
   (gist  gin).
 
 In neither case does printing those types really add much information.
 That's why it's not there now.

 I don't get it.  If two different items that exist in the system out
 of the box have the same description, it seems clear that relevant
 piece of disambiguating information exists nowhere in the description
 string.

The relevant piece of disambiguating information is the function
name+parameters in the first case, or the opclass name in the second.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [TESTERS] [TEST REPORT] 9.1Alpha3 Feature E.1.4.7.2 in release notes.

2011-01-11 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On tis, 2011-01-11 at 12:30 +0900, Itagaki Takahiro wrote:
  If we won't add accepting integers for money, we should fix the docs.
 | integer and floating-point string literals
 |~~~
 Will it get better?

 I think adding a cast from integer to money is probably quite
 reasonable.  The other way around, maybe not, or only an explicit cast.

As near as I can tell, this entire thread started because someone
thought that the reference to "numeric" in the release notes implied
any numerical type, not the type named "numeric".  We explicitly
rejected the idea of providing direct casts to/from floating point
types, on the grounds of not wanting any roundoff error; so I don't
think this is a point that should be revisited.  Perhaps it would be
sufficient to clarify the release-note item.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Tom Lane
Jan Urbański wulc...@wulczer.org writes:
 I tried wiring it into the build system, but failed, I can't figure out
 which Makefiles should be updated in order to make errcodes.h and
 plerrcodes.h generated headers. Could someone help with that?

 Trying a bit harder to make src/include/utils/errcodes.h a generated
 file I found that it's included so early it makes the task not trivial
 at all. postgres.h includes elog.h, which includes errcodes.h (after
 defining the MAKE_SQLSTATE macro). I'm not sure how to proceed:

Huh?  Why in the world would the specific location of the #include have
anything to do with the problem?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 @OP: Wouldn't it be sufficient to provide such a thing for structure
 objects that people are actually going to modify on a regular basis?

Yeah, I still don't see the need to argue over whether the elements of
an operator class are uniquely identifiable or not.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Tom Lane
Joel Jacobson j...@gluefinance.com writes:
 Has anyone written an in-depth description of how to traverse the pg_depend 
 tree?

Try reading the code in src/backend/catalog/dependency.c.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Cédric Villemain
2011/1/11 Garick Hamlin gham...@isc.upenn.edu:
 On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
 On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
  2011/1/7 Magnus Hagander mag...@hagander.net:
  On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
  cedric.villemain.deb...@gmail.com wrote:
  2011/1/5 Magnus Hagander mag...@hagander.net:
  On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine dimi...@2ndquadrant.fr 
  wrote:
  Magnus Hagander mag...@hagander.net writes:
  * Stefan mentiond it might be useful to put some
  posix_fadvise(POSIX_FADV_DONTNEED)
    in the process that streams all the files out. Seems useful, as 
  long as that
    doesn't kick them out of the cache *completely*, for other backends 
  as well.
    Do we know if that is the case?
 
  Maybe have a look at pgfincore to only tag DONTNEED for blocks that are
  not already in SHM?
 
  I think that's way more complex than we want to go here.
 
 
  DONTNEED will remove the block from OS buffer everytime.
 
  Then we definitely don't want to use it - because some other backend
  might well want the file. Better leave it up to the standard logic in
  the kernel.
 
  Looking at the patch, it is (very) easy to add the support for that in
  basebackup.c
  That supposed allowing mincore(), so mmap(), and so probably switch
  the fopen() to an open() (or add an open() just for mmap
  requirement...)
 
  Let's go ?

 Per above, I still don't think we *should* do this. We don't want to
 kick things out of the cache underneath other backends, and since we
 can't control that. Either way, it shouldn't happen in the beginning,
 and if it does, should be backed with proper benchmarks.

 Another option that occurs to me is an option to use direct IO (or another
 means as needed) to bypass the cache.  So rather than kicking it out of
 the cache, we attempt just not to pollute the cache by bypassing it for cold
 pages and use either normal io for 'hot pages', or use a 'read()' to heat
 the cache afterward.

AFAIR, even Linus rejects the idea of using it seriously, unless I'm
misremembering.


 Garick


 I've committed the backend side of this, without that. Still working
 on the client, and on cleaning up Heikki's patch for grammar/parser
 support.

 --
  Magnus Hagander
  Me: http://www.hagander.net/
  Work: http://www.redpill-linpro.com/

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
2011/1/11 Tom Lane t...@sss.pgh.pa.us:
 Try reading the code in src/backend/catalog/dependency.c.

I've tried but failed to figure it out anyway. The focus in
dependency.c is to find out dependencies of a given object.
What I want to do is something slighly different.
I need to figure out the order of creation of all objects, not just
the dependencies for a single object.

Basically, I want to do order by oid, but since you cannot trust the
oid order (like we did in pre-7.3), I need to get the sorted list
using pg_depend somehow.

I guess I could make findDependentObjects() callable from sql and call
it for each and every object, but that's a quite big project, I was
hoping it was possible to do it in plain sql, or at least only by
relying on plpgsql/plperl.

I've implemented tsort() in plperl, but that didn't really help me
much, since you need to jump around in the tree when you encounter
'internal' and 'auto' nodes.

My last resort is to sort by oid, but I really don't want to do that
since it would render the entire solution unreliable and I would never
feel comfortable using it in the production environment.

This is the last component I need in order to complete the work on the
pov-project http://www.github.com/gluefinance/pov

I would highly appreciate your help, I feel a bit lame since I've
spent over two days working on this. It's not difficult if you are
allowed to build specialized queries for each class, but my goal is a
general query only relying on pg_depend.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Joel Jacobson j...@gluefinance.com writes:
 2011/1/11 Tom Lane t...@sss.pgh.pa.us:
 Seems like concatenating the OIDs would accomplish that.  (If you
 think not, well, you still haven't explained what problem you're trying
 to solve...)

 The can be different in two different databases sharing the same
 original schema, but of two different versions.
 In this case it is better to compare textual strings describing the
 objects than to compare based on oids.

If that's what you're after, getObjectDescription is entirely
unsuitable, because of the fact that its results are dependent
on search path and language settings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Tom Lane
Joel Jacobson j...@gluefinance.com writes:
 I need to figure out the order of creation of all objects, not just
 the dependencies for a single object.

In that case try pg_dump's pg_dump_sort.c.  You will never get the
order of creation of objects, because that isn't tracked; but you can
find out what a safe order would be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [TESTERS] [TEST REPORT] 9.1Alpha3 Feature E.1.4.7.2 in release notes.

2011-01-11 Thread Peter Eisentraut
On tis, 2011-01-11 at 11:03 -0500, Tom Lane wrote:
 We explicitly
 rejected the idea of providing direct casts to/from floating point
 types, on the grounds of not wanting any roundoff error; so I don't
 think this is a point that should be revisited.

We also explicitly chose floating point as the result of the money/money
operator over numeric.  Seems a bit inconsistent.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Garick Hamlin
On Tue, Jan 11, 2011 at 11:39:20AM -0500, Cédric Villemain wrote:
 2011/1/11 Garick Hamlin gham...@isc.upenn.edu:
  On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
  On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
  cedric.villemain.deb...@gmail.com wrote:
   2011/1/7 Magnus Hagander mag...@hagander.net:
   On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
   cedric.villemain.deb...@gmail.com wrote:
   2011/1/5 Magnus Hagander mag...@hagander.net:
   On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine 
   dimi...@2ndquadrant.fr wrote:
   Magnus Hagander mag...@hagander.net writes:
   * Stefan mentiond it might be useful to put some
   posix_fadvise(POSIX_FADV_DONTNEED)
     in the process that streams all the files out. Seems useful, as 
   long as that
     doesn't kick them out of the cache *completely*, for other 
   backends as well.
     Do we know if that is the case?
  
   Maybe have a look at pgfincore to only tag DONTNEED for blocks that 
   are
   not already in SHM?
  
   I think that's way more complex than we want to go here.
  
  
   DONTNEED will remove the block from OS buffer everytime.
  
   Then we definitely don't want to use it - because some other backend
   might well want the file. Better leave it up to the standard logic in
   the kernel.
  
   Looking at the patch, it is (very) easy to add the support for that in
   basebackup.c
   That supposed allowing mincore(), so mmap(), and so probably switch
   the fopen() to an open() (or add an open() just for mmap
   requirement...)
  
   Let's go ?
 
  Per above, I still don't think we *should* do this. We don't want to
  kick things out of the cache underneath other backends, and since we
  can't control that. Either way, it shouldn't happen in the beginning,
  and if it does, should be backed with proper benchmarks.
 
  Another option that occurs to me is an option to use direct IO (or another
  means as needed) to bypass the cache.  So rather than kicking it out of
  the cache, we attempt just not to pollute the cache by bypassing it for cold
  pages and use either normal io for 'hot pages', or use a 'read()' to heat
  the cache afterward.
 
 AFAIR, even Linus is rejecting the idea to use it seriously, except if
 I shuffle in my memory.

Direct IO is generally a pain.

POSIX_FADV_NOREUSE is an alternative (I think).  Realistically I wasn't
sure which way(s) actually worked.  My gut was that direct io would
likely work right on Linux and Solaris, at least.  If POSIX_FADV_NOREUSE
works then maybe that is the answer instead, but I haven't tested either.
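
Just to make it concrete, the kind of untested hint I have in mind after
streaming each file would be something like this (sketch only; the
function name is made up, and it assumes the file is opened with open(2)
rather than fopen()):

#include <sys/types.h>
#include <fcntl.h>

/*
 * Untested sketch: tell the kernel we will not re-read this file
 * ourselves.  Whether DONTNEED also evicts pages other backends still
 * want is exactly the open question above; NOREUSE would be nicer if it
 * actually did anything.
 */
static void
done_with_file(int fd, off_t len)
{
	(void) posix_fadvise(fd, 0, len, POSIX_FADV_DONTNEED);
}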

Garick


 
 
  Garick
 
 
  I've committed the backend side of this, without that. Still working
  on the client, and on cleaning up Heikki's patch for grammar/parser
  support.
 
  --
   Magnus Hagander
   Me: http://www.hagander.net/
   Work: http://www.redpill-linpro.com/
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 
 
 
 
 -- 
 Cédric Villemain               2ndQuadrant
 http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 16:54 , Joel Jacobson wrote:
 Has anyone written a in-depth description on how to traverse the pg_depend 
 tree?
 The 'a' and 'i' deptype really makes it hard to figure out the
 dependency order, a topological sort does not work.

Could you give an example of the kind of trouble you're experiencing trying
to use a topological sort?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 18:09 , Garick Hamlin wrote:
 My gut was that direct io would likely work right on Linux
 and Solaris, at least.

Didn't we discover recently that O_DIRECT fails for ext4 on linux
if ordered=data, or something like that?

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Cédric Villemain
2011/1/11 Garick Hamlin gham...@isc.upenn.edu:
 On Tue, Jan 11, 2011 at 11:39:20AM -0500, Cédric Villemain wrote:
 2011/1/11 Garick Hamlin gham...@isc.upenn.edu:
  On Mon, Jan 10, 2011 at 09:09:28AM -0500, Magnus Hagander wrote:
  On Sun, Jan 9, 2011 at 23:33, Cédric Villemain
  cedric.villemain.deb...@gmail.com wrote:
   2011/1/7 Magnus Hagander mag...@hagander.net:
   On Fri, Jan 7, 2011 at 01:47, Cédric Villemain
   cedric.villemain.deb...@gmail.com wrote:
   2011/1/5 Magnus Hagander mag...@hagander.net:
   On Wed, Jan 5, 2011 at 22:58, Dimitri Fontaine 
   dimi...@2ndquadrant.fr wrote:
   Magnus Hagander mag...@hagander.net writes:
   * Stefan mentiond it might be useful to put some
   posix_fadvise(POSIX_FADV_DONTNEED)
     in the process that streams all the files out. Seems useful, as 
   long as that
     doesn't kick them out of the cache *completely*, for other 
   backends as well.
     Do we know if that is the case?
  
   Maybe have a look at pgfincore to only tag DONTNEED for blocks that 
   are
   not already in SHM?
  
   I think that's way more complex than we want to go here.
  
  
   DONTNEED will remove the block from OS buffer everytime.
  
   Then we definitely don't want to use it - because some other backend
   might well want the file. Better leave it up to the standard logic in
   the kernel.
  
   Looking at the patch, it is (very) easy to add the support for that in
   basebackup.c
   That supposed allowing mincore(), so mmap(), and so probably switch
   the fopen() to an open() (or add an open() just for mmap
   requirement...)
  
   Let's go ?
 
  Per above, I still don't think we *should* do this. We don't want to
  kick things out of the cache underneath other backends, and since we
  can't control that. Either way, it shouldn't happen in the beginning,
  and if it does, should be backed with proper benchmarks.
 
  Another option that occurs to me is an option to use direct IO (or another
  means as needed) to bypass the cache.  So rather than kicking it out of
  the cache, we attempt just not to pollute the cache by bypassing it for 
  cold
  pages and use either normal io for 'hot pages', or use a 'read()' to heat
  the cache afterward.

 AFAIR, even Linus is rejecting the idea to use it seriously, except if
 I shuffle in my memory.

 Direct IO is generally a pain.

 POSIX_FADV_NOREUSE is an alternative (I think).  Realistically I wasn't sure 
 which
 way(s) actually worked.  My gut was that direct io would likely work right on 
 Linux
 and Solaris, at least.  If POSIX_FADV_NOREUSE works than maybe that is the 
 answer
 instead, but I haven't tested either.

Yes, it would be the best option, but unfortunately it is a ghost flag:
it doesn't do anything.
At some point there were a libprefetch library and a Linux fincore()
syscall in the air.  Unfortunately the actors behind those items stopped
communicating with open source, AFAICS.  (I didn't get answers myself,
and neither did the Linux ML.)



 Garick



 
  Garick
 
 
  I've committed the backend side of this, without that. Still working
  on the client, and on cleaning up Heikki's patch for grammar/parser
  support.
 
  --
   Magnus Hagander
   Me: http://www.hagander.net/
   Work: http://www.redpill-linpro.com/
 
  --
  Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
  To make changes to your subscription:
  http://www.postgresql.org/mailpref/pgsql-hackers
 



 --
 Cédric Villemain               2ndQuadrant
 http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Jeff Davis
On Mon, 2011-01-10 at 11:50 -0600, Kevin Grittner wrote:
 I'm trying not to panic here, but I haven't looked at 2PC before
 yesterday and am just dipping into the code to support it, and time
 is short.  Can anyone give me a pointer to anything I should read
 before I dig through the 2PC code, which might accelerate this?

I don't see much about 2PC outside of twophase.c.

Regarding the original post, I agree that we should have two-phase
commit support for SSI. We opted not to support it for
notifications, but there was a fairly reasonable argument why users
wouldn't value the combination of 2PC and NOTIFY.

I don't expect this to be a huge roadblock for the feature though. It
seems fairly contained. I haven't read the 2PC code either, but I don't
expect that you'll need to change the rest of your algorithm just to
support it.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Jan Urbański
On 11/01/11 17:11, Tom Lane wrote:
Jan Urbański wulc...@wulczer.org writes:
 I tried wiring it into the build system, but failed, I can't figure out
 which Makefiles should be updated in order to make errcodes.h and
 plerrcodes.h generated headers. Could someone help with that?
 
 Trying a bit harder to make src/include/utils/errcodes.h a generated
 file I found that it's included so early it makes the task not trivial
 at all. postgres.h includes elog.h, which includes errcodes.h (after
 defining the MAKE_SQLSTATE macro). I'm not sure how to proceed:
 
 Huh?  Why in the world would the specific location of the #include have
 anything to do with the problem?

I'm having a hard time convincing make to generate errcodes.h before
compiling any .c file that includes postgres.h. The only way I found was
to make src/include/errcodes.h a dependency of the all target.

For instance, I tried to copy the way we generate fmgroids.h and it
turned out that it doesn't work (it tries to compile things in src/port
before entering src/include, and things in src/port include postgres.h).

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Streaming base backups

2011-01-11 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Jan11, 2011, at 18:09 , Garick Hamlin wrote:
 My gut was that direct io would likely work right on Linux
 and Solaris, at least.

 Didn't we discover recently that O_DIRECT fails for ext4 on linux
 if ordered=data, or something like that?

Quite.  Blithe assertions that something like this should work aren't
worth the electrons they're written on.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Tom Lane
Jan Urbański wulc...@wulczer.org writes:
 On 11/01/11 17:11, Tom Lane wrote:
 Huh?  Why in the world would the specific location of the #include have
 anything to do with the problem?

 I'v having a hard time convincing make to generate errcodes.h before
 compiling any .c file that includes postgres.h. The only way I found was
 to make src/include/errcodes.h a dependancy of the all target.

Peter would probably be a better person than me to answer that, but I
imagine that what you want is similar to what src/backend/Makefile does
for parser/gram.h, only applied at the src/ level or maybe even the
root.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Florian Pflug
On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
 I'm trying not to panic here, but I haven't looked at 2PC before
 yesterday and am just dipping into the code to support it, and time
 is short.  Can anyone give me a pointer to anything I should read
 before I dig through the 2PC code, which might accelerate this?


It roughly works as follows

Upon PREPARE, the locks previously held by the transaction are transferred
to a kind of virtual backend which only consists of a special proc array
entry. The transaction will thus still appear to be running, and will still
be holding its locks, even after the original backend is gone. The information
necessary to reconstruct that proc array entry is also written to the 2PC state,
and used to recreate the virtual backend after a restart or crash.

There are also some additional pieces of transaction state which are stored
in the 2PC state file, like the full list of subtransaction xids (the proc array
entry may not contain all of them if it overflowed).

Upon COMMIT PREPARED, the information in the 2PC state file is used to write
a COMMIT wal record and to update the clog. The transaction is then committed,
and the special proc array entry is removed and all lockmgr locks it held are
released.
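
For reference, the user-visible side of all this is just (table and gid
names made up):

BEGIN;
UPDATE t SET x = x + 1 WHERE id = 1;
PREPARE TRANSACTION 'tx1';    -- state is persisted, the backend lets go of it
-- possibly much later, possibly after a restart, from any backend:
COMMIT PREPARED 'tx1';        -- or ROLLBACK PREPARED 'tx1'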

For 2PC to work for SSI transactions, I guess you must check for conflicts
during PREPARE - at any later point the COMMIT may only fail transiently,
not permanently. Any transaction that adds a conflict with an already
prepared transaction must check if that conflict completes a dangerous
structure, and abort if this is the case, since the already PREPAREd transaction
can no longer be aborted. COMMIT PREPARED then probably doesn't need to do
anything special for SSI transactions, apart from some cleanup actions maybe.

Unfortunately, it seems that doing things this way will undermine the guarantee
that retrying a failed SSI transaction won't fail due to the same conflict as
it did originally. Consider

T1 BEGIN TRANSACTION ISOLATION SERIALIZABLE
T1 SELECT * FROM T
T1 UPDATE T ...
T1 PREPARE TRANSACTION

T2 BEGIN TRANSACTION ISOLATION SERIALIZABLE
T2 SELECT * FROM T
T2 UPDATE T ...
-> Serialization Error

Retrying T2 won't help as long as T1 isn't COMMITTED.

There doesn't seem to be a way around that, however - any correct implementation
of 2PC for SSI will have to behave that way, I fear :-(

Hope this helps & best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas
Now that we have a basic over-the-wire base backup capability in 
walsender, it would be nice to allow taking multiple base backups at the 
same time. It might not seem very useful at first, but it makes it 
easier to set up standbys for small databases. At the moment, if you 
want to set up two standbys, you have to either take a single base 
backup and distribute it to both standbys, or somehow coordinate that 
they don't try to take the base backup at the same time. Also, you don't 
want initializing a standby to conflict with a nightly backup cron script.


So, this patch modifies the internal do_pg_start/stop_backup functions
so that, in addition to the traditional mode of operation where a
backup_label file is created in the data directory and backed up along
with all other files, the backup label contents can be returned to the
caller, and the caller is responsible for including them in the backup.
The code in replication/basebackup.c includes it in the tar file that's
streamed to the client, as backup_label.


To make that safe, I've changed forcePageWrites into an integer. 
Whenever a backup is started, it's incremented, and when one ends, it's 
decremented. When forcePageWrites == 0, there's no backup in progress.


The user-visible pg_start_backup() function is not changed. You can only
have one backup started that way in progress at a time. But you can run
streaming base backups at the same time as a traditional pg_start_backup().
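
For clarity, the traditional exclusive sequence stays exactly as it is
today, e.g.:

SELECT pg_start_backup('label', true);
-- copy the data directory with tar/rsync/...
SELECT pg_stop_backup();

it just no longer conflicts with base backups streamed by walsender.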


I implemented this in two ways, and can't decide which I like better:

1. The contents of the backup label file are returned to the caller of 
do_pg_start_backup() as a palloc'd string.


2. do_pg_start_backup() creates a temporary file that the backup label 
is written to (instead of backup_label).


Implementation 1 changes more code, as pg_start/stop_backup() need to be 
changed to write/read from memory instead of file, but the result isn't 
any more complicated. Nevertheless, I somehow feel more comfortable with 2.


Patches for both approaches attached. They're also available in my 
github repository at g...@github.com:hlinnaka/postgres.git.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5b6a230..cc4d46e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -338,7 +338,8 @@ typedef struct XLogCtlInsert
 	XLogPageHeader currpage;	/* points to header of block in cache */
 	char	   *currpos;		/* current insertion point in cache */
 	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
-	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
+	int			forcePageWrites;	/* forcing full-page writes for PITR? */
+	bool		exclusiveBackup;	/* a backup was started with pg_start_backup() */
 } XLogCtlInsert;
 
 /*
@@ -8313,7 +8314,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
 
 	backupidstr = text_to_cstring(backupid);
 
-	startpoint = do_pg_start_backup(backupidstr, fast);
+	startpoint = do_pg_start_backup(backupidstr, fast, true, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
 			 startpoint.xlogid, startpoint.xrecoff);
@@ -8321,7 +8322,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
 }
 
 XLogRecPtr
-do_pg_start_backup(const char *backupidstr, bool fast)
+do_pg_start_backup(const char *backupidstr, bool fast, bool exclusive, char **labelfile)
 {
 	XLogRecPtr	checkpointloc;
 	XLogRecPtr	startpoint;
@@ -8332,6 +8333,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	uint32		_logSeg;
 	struct stat stat_buf;
 	FILE	   *fp;
+	StringInfoData labelfbuf;
 
 	if (!superuser() && !is_authenticated_user_replication_role())
 		ereport(ERROR,
@@ -8368,15 +8370,19 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	 * ensure adequate interlocking against XLogInsert().
 	 */
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
-	if (XLogCtl->Insert.forcePageWrites)
+	if (exclusive)
 	{
-		LWLockRelease(WALInsertLock);
-		ereport(ERROR,
-(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("a backup is already in progress"),
- errhint("Run pg_stop_backup() and try again.")));
+		if (XLogCtl->Insert.exclusiveBackup)
+		{
+			LWLockRelease(WALInsertLock);
+			ereport(ERROR,
+	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+	 errmsg("a backup is already in progress"),
+	 errhint("Run pg_stop_backup() and try again.")));
+		}
+		XLogCtl->Insert.exclusiveBackup = true;
 	}
-	XLogCtl->Insert.forcePageWrites = true;
+	XLogCtl->Insert.forcePageWrites++;
 	LWLockRelease(WALInsertLock);
 
 	/*
@@ -8393,7 +8399,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	RequestXLogSwitch();
 
 	/* Ensure we release forcePageWrites if fail below */
-	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
 	{
 		/*
 		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
@@ -8420,54 

Re: [HACKERS] Streaming base backups

2011-01-11 Thread Garick Hamlin
On Tue, Jan 11, 2011 at 12:45:02PM -0500, Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
  On Jan11, 2011, at 18:09 , Garick Hamlin wrote:
  My gut was that direct io would likely work right on Linux
  and Solaris, at least.
 
  Didn't we discover recently that O_DIRECT fails for ext4 on linux
  if ordered=data, or something like that?
 
 Quite.  Blithe assertions that something like this should work aren't
 worth the electrons they're written on.

Indeed.  I wasn't making such a claim in case that wasn't clear.  I believe,
in fact, there is no single way that will work everywhere.  This isn't
needed for correctness of course, it is merely a tweak for performance as
long as the 'not working' case on platform + filesystem X degrades to
something close to what would have happened if we didn't try.  I expected
POSIX_FADV_NOREUSE not to work on Linux, but haven't looked at it recently
and not all systems are Linux so I mentioned it.  This was why I thought
direct io might be more realistic.

I did not have a chance to test before I wrote this email so I attempted to 
make my uncertainty clear.  I _know_ it will not work in some environments,
but I thought it was worth looking at if it worked on more than one sane 
common setup, but I can understand if you feel differently about that.

Garick

 
   regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Kevin Grittner
Florian Pflug f...@phlo.org wrote:
 On Jan10, 2011, at 18:50 , Kevin Grittner wrote:
 I'm trying not to panic here, but I haven't looked at 2PC before
 yesterday and am just dipping into the code to support it, and
 time is short.  Can anyone give me a pointer to anything I should
 read before I dig through the 2PC code, which might accelerate
 this?
 
 
 It roughly works as follows
 
 Upon PREPARE, the locks previously held by the transaction are
 transferred to a kind of virtual backend which only consists of a
 special proc array entry. The transaction will thus still appear
 to be running, and will still be holding its locks, even after the
 original backend is gone. The information necessary to reconstruct
 that proc array entry is also written to the 2PC state, and used
 to recreate the virtual backend after a restart or crash.
 
 There are also some additional pieces of transaction state which
 are stored in the 2PC state file like the full list of
 subtransaction xids (The proc array entry may not contain all of
 them if it overflowed). 
 
 Upon COMMIT PREPARED, the information in the 2PC state file is
 used to write a COMMIT wal record and to update the clog. The
 transaction is then committed, and the special proc array entry is
 removed and all lockmgr locks it held are released.
 
 For 2PC to work for SSI transaction, I guess you must check for
 conflicts during PREPARE - at any later point the COMMIT may only
 fail transiently, not permanently. Any transaction that adds a
 conflict with an already prepared transaction must check if that
 conflict completes a dangerous structure, and abort if this is the
 case, since the already PREPAREd transaction can no longer be
 aborted. COMMIT PREPARED then probably doesn't need to do anything
 special for SSI transactions, apart from some cleanup actions
 maybe.
 
Thanks; that all makes sense.  The devil, as they say, is in the
details.  As far as I've worked it out, the PREPARE must persist
both the predicate locks and any conflict pointers which point to
other prepared transactions.  That leaves some fussy work around the
coming and going of prepared transactions, because on recovery you
need to be prepared to ignore conflict pointers with prepared
transactions which committed or rolled back.
 
What I haven't found yet is the right place and means to persist and
recover this stuff, but that's just a matter of digging through
enough source code.  Any tips regarding that may save time.  I'm
also not clear on what, if anything, needs to be written to WAL. I'm
really fuzzy on that, still.
 
 Unfortunately, it seems that doing things this way will undermine
 the guarantee that retrying a failed SSI transaction won't fail
 due to the same conflict as it did originally.
 
I hadn't thought of that, but you're right.  Of course, I can't
enforce that guarantee, anyway, without some other patch first being
there to allow me to cancel other transactions with
serialization_failure, even if they are idle in transaction.
 
 There doesn't seems a way around that, however - any correct
 implementation of 2PC for SSI will have to behave that way I fear
 :-(
 
I think you're right.
 
 Hope this helps & best regards,
 
It does.  Even the parts which just confirm my tentative conclusions
save me time in not feeling like I need to cross-check so much.  I
can move forward with more confidence.  Thanks.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Kevin Grittner
Jeff Davis pg...@j-davis.com wrote:
 
 I don't expect this to be a huge roadblock for the feature though.
 It seems fairly contained. I haven't read the 2PC code either, but
 I don't expect that you'll need to change the rest of your
 algorithm just to support it.
 
Agreed; but I am starting to get concerned about whether this
particular area can be completed by the start of the CF.  I might
run a few days over on 2PC support.  Unless ... Dan?  Could you look
into this while I chase down the issue Anssi raised?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:08, Florian Pflug wrote:

Unfortunately, it seems that doing things this way will undermine the guarantee
that retrying a failed SSI transaction won't fail due to the same conflict as
it did originally. Consider

T1  BEGIN TRANSACTION ISOLATION SERIALIZABLE
T1  SELECT * FROM T
T1  UPDATE T ...
T1  PREPARE TRANSACTION

T2  BEGIN TRANSACTION ISOLATION SERIALIZABLE
T2  SELECT * FROM T
T2  UPDATE T ...
 -  Serialization Error

Retrying T2 won't help as long as T1 isn't COMMITTED.


T2 should block until T1 commits. I would be very surprised if it 
doesn't behave like that already. In general, a prepared transaction 
should be treated like an in-progress transaction - it might still abort 
too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 On 11.01.2011 20:08, Florian Pflug wrote:
 Unfortunately, it seems that doing things this way will undermine
 the guarantee that retrying a failed SSI transaction won't fail
 due to the same conflict as it did originally. Consider

 T1  BEGIN TRANSACTION ISOLATION SERIALIZABLE
 T1  SELECT * FROM T
 T1  UPDATE T ...
 T1  PREPARE TRANSACTION

 T2  BEGIN TRANSACTION ISOLATION SERIALIZABLE
 T2  SELECT * FROM T
 T2  UPDATE T ...
  -  Serialization Error

 Retrying T2 won't help as long as T1 isn't COMMITTED.
 
 T2 should block until T1 commits. I would be very surprised if it 
 doesn't behave like that already. In general, a prepared
 transaction should be treated like an in-progress transaction - it
 might still abort too.
 
It shouldn't block if the updates were to different rows, which is
what I took Florian to mean; otherwise this would be handled by SI
and would have nothing to do with the SSI patch.  SSI doesn't
introduce any new blocking (with the one exception of the READ ONLY
DEFERRABLE style we invented to support long-running reports and
backups, and all blocking there is at the front -- once it's
running, it's going full speed ahead).
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Andreas Karlsson
On Tue, 2011-01-11 at 11:43 -0500, Tom Lane wrote:
 If that's what you're after, getObjectDescription is entirely
 unsuitable, because of the fact that its results are dependent
 on search path and language settings.
 
   regards, tom lane

Agreed, and as long as the additional information added to the
description by my patch is not useful for any other purpose I see no
reason for applying it.

So would anyone be confused by a description of pg_amproc not including
the types? I personally have no idea since I have not had to work with
indexes enough to say.

Andreas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSI and 2PC

2011-01-11 Thread Florian Pflug
On Jan11, 2011, at 19:41 , Heikki Linnakangas wrote:
 On 11.01.2011 20:08, Florian Pflug wrote:
 Unfortunately, it seems that doing things this way will undermine the 
 guarantee
 that retrying a failed SSI transaction won't fail due to the same conflict as
 it did originally. Consider
 
 T1  BEGIN TRANSACTION ISOLATION SERIALIZABLE
 T1  SELECT * FROM T
 T1  UPDATE T ...
 T1  PREPARE TRANSACTION
 
 T2  BEGIN TRANSACTION ISOLATION SERIALIZABLE
 T2  SELECT * FROM T
 T2  UPDATE T ...
 -  Serialization Error
 
 Retrying T2 won't help as long as T1 isn't COMMITTED.
 
 T2 should block until T1 commits.

The serialization error will occur even if T1 and T2 update *different* rows.
This is due to the SELECTs in the interleaved schedule above returning the
state of T prior to both T1 and T2, which of course is never the case for a
serial schedule.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I implemented this in two ways, and can't decide which I like better:

 1. The contents of the backup label file are returned to the caller of 
 do_pg_start_backup() as a palloc'd string.

 2. do_pg_start_backup() creates a temporary file that the backup label 
 is written to (instead of backup_label).

 Implementation 1 changes more code, as pg_start/stop_backup() need to be 
 changed to write/read from memory instead of file, but the result isn't 
 any more complicated. Nevertheless, I somehow feel more comfortable with 2.

Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.  How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 19:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I implemented this in two ways, and can't decide which I like better:

 1. The contents of the backup label file are returned to the caller of
 do_pg_start_backup() as a palloc'd string.

 2. do_pg_start_backup() creates a temporary file that the backup label
 is written to (instead of backup_label).

 Implementation 1 changes more code, as pg_start/stop_backup() need to be
 changed to write/read from memory instead of file, but the result isn't
 any more complicated. Nevertheless, I somehow feel more comfortable with 2.

 Seems like either one of these is fairly problematic in that you have to
 have some monstrous kluge to get the backup_label file to appear with
 the right name in the tarfile.  How badly do we actually need this?
 I don't think the use-case for concurrent base backups is all that large
 in practice given the I/O hit it's going to involve.

I think it can be done cleaner in the tar file injection - I've been
chatting with Heikki offlist about that. Not sure, but I have a
feeling it can.

As for the use-case, it doesn't necessarily involve a huge I/O hit if
either the entire DB is in RAM (not all that uncommon - though then
the backup finishes quickly as well), or if the *client* is slow, thus
making the server backlogged on sending the base backup.

Having it possible to do it concurrently also makes for *much* easier
use of the feature. It might be just about overlapping with a few
seconds, for example - which would probably have no major effect, but
takes a lot more code on the other end to work around.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Tom Lane
Andreas Karlsson andr...@proxel.se writes:
 So would anyone be confused by a description of pg_amproc not including
 the types?

It really shouldn't be useful to include those.  Attend what it says in
the fine manual for CREATE OPERATOR CLASS:

In a FUNCTION clause, the operand data type(s) the function is
intended to support, if different from the input data type(s) of
the function (for B-tree and hash indexes) or the class's data
type (for GIN and GiST indexes). These defaults are always
correct, so there is no point in specifying op_type in a
FUNCTION clause in CREATE OPERATOR CLASS, but the option is
provided for consistency with the comparable syntax in ALTER
OPERATOR FAMILY.

The reason the ALTER OPERATOR FAMILY DROP syntax has to include operand
types is that it lacks the full name/types of the referenced function.
Since getObjectDescription *does* provide those, it doesn't serve any
real purpose to repeat the information.
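
That is, the DROP form necessarily looks like (illustration only, not
something you would actually run against a built-in family):

    ALTER OPERATOR FAMILY integer_ops USING btree
        DROP FUNCTION 1 (int4, int2);

where the operand types are all that identify which support function is
meant, whereas the FUNCTION clause of CREATE OPERATOR CLASS names the
function and its argument types explicitly.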

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:51, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

I implemented this in two ways, and can't decide which I like better:



1. The contents of the backup label file are returned to the caller of
do_pg_start_backup() as a palloc'd string.



2. do_pg_start_backup() creates a temporary file that the backup label
is written to (instead of backup_label).



Implementation 1 changes more code, as pg_start/stop_backup() need to be
changed to write/read from memory instead of file, but the result isn't
any more complicated. Nevertheless, I somehow feel more comfortable with 2.


Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.


Oh. I'm surprised you feel that way - that part didn't feel ugly or 
kludgey at all to me.



 How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.


It makes it very convenient to set up standbys, without having to worry 
that you'll conflict e.g. with a nightly backup. I don't imagine people
will use streaming base backups for very large databases anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Josh Berkus

 It makes it very convenient to set up standbys, without having to worry
 that you'll conflict e.g with a nightly backup. I don't imagine people
 will use streaming base backups for very large databases anyway.

Also, imagine that you're provisioning a 10-node replication cluster on
EC2.  This would make that worlds easier.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Jan 11, 2011 at 19:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Seems like either one of these is fairly problematic in that you have to
 have some monstrous kluge to get the backup_label file to appear with
 the right name in the tarfile.  How badly do we actually need this?
 I don't think the use-case for concurrent base backups is all that large
 in practice given the I/O hit it's going to involve.

 I think it can be done cleaner in the tar file injection - I've been
 chatting with Heikki offlist about that. Not sure, but I have a
 feeling it does.

One point I'm particularly interested to see is how you'll kluge
ensuring that the tarball contains only the desired temp data and not
also the real $PGDATA/backup_label, should there be a normal base
backup being done concurrently with the streamed one.

The whole thing just seems too fragile and dangerous to be worth dealing
with given that actual usage will be a corner case.  *I* sure wouldn't
trust it to work when the chips were down.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:17, Heikki Linnakangas wrote:

Patches for both approaches attached. They're also available in my
github repository at g...@github.com:hlinnaka/postgres.git.


Just so people won't report the same issues again, a couple of bugs have 
already cropped up (thanks Magnus):


* a backup_label file in the data directory should now not be tarred up 
- we're injecting a different backup_label file there.


* pg_start_backup_callback needs to be updated to just decrement 
forcePageWrites, not reset it to 'false'


* pg_stop_backup() decrements forcePageWrites twice. oops.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: RangeTypes

2011-01-11 Thread David Fetter
On Tue, Jan 11, 2011 at 01:16:47AM -0800, Jeff Davis wrote:
 Ok, I have made some progress. This is still a proof-of-concept patch,
 but the important pieces are working together.
 
 Synopsis:
 
   CREATE TYPE numrange AS RANGE (SUBTYPE=numeric, 
 SUBTYPE_CMP=numeric_cmp);
 
   SELECT range_eq('[1,2.2)'::numrange,'[1,2.2]');
   SELECT range_lbound('(3.7,9]'::numrange);
   SELECT range(6.7);
   SELECT '-'::numrange; -- empty
   SELECT '[1, NULL]'::numrange; -- ] will become )
   SELECT '(INF, 3)'::numrange;
 
 I haven't completed many of the other generic functions, because I'd
 like to make sure I'm on the right track first. The important thing
 about the functions above is that they show ANYRANGE working in
 conjunction with ANYELEMENT in various combinations, which was a
 significant part of this patch.
 
 Here are the open items:
 
 1. Generic functions -- most of which are fairly obvious. However, I
 want to make sure I'm on the right track first.
 
 2. GiST -- I'll need a mechanism to implement the penalty function,
 and perhaps I'll also need additional support for the picksplit
 function. For the penalty function, I think I'll need to require a
 function to convert the subtype into a float, and I can use that to find
 a distance (which can be the penalty). That should also satisfy anything
 that picksplit might need.
 
 3. Typmod -- There is still one annoyance about typmod remaining. I need
 to treat it like an array in find_typmod_coercion_function(), and then
 create a coercion expression. Is it worth it? Would typmod on a range be
 confusing, or should I just finish this item up?

Probably not worth it for the first round.

 4. Docs

Happy to help evenings this week :)

 5. Tests

Same.  What do you have so far?

 6. pg_dump -- should be pretty easy; I just want to settle some of the
 other stuff first.
 
 7. Right now the parse function is quite dumb. Is there some example
 code I should follow to make sure I get this right?

KISS is a fine principle.  Do you really need it smart on the first
round? :)

 8. In order to properly support the various combinations of ANYRANGE and
 ANYELEMENT in a function definition (which are all important), we need
 to be able to determine the range type given a subtype. That means that
 each subtype can only have one associated range, which sounds somewhat
 limiting, but it can be worked around by using domains. I don't think
 this is a major limitation. Comments?

As we get a more nuanced type system, this is one of the things that
will need to get reworked, so I'd say it's better not to put too much
effort into things that a refactor of the type system
http://wiki.postgresql.org/wiki/Refactor_Type_System would make much
better, at least right now.

 Also related to representation:
 
   * Right now I always align the subtypes within the range according to
 typalign. I could avoid that by packing the bytes tightly, and then
 copying them around later. Suggestions? And what should the overall
 alignment of the range type be?

For the first cut, the simplest possible.

   * If it's a fixed-length type, we can save the varlena header byte on
 the overall range; but we lose the ability to save space when one of the
 boundaries of the range is missing (NULL or INF), and it would
 complicate the code a little. Thoughts?

Probably not worth complicating the code at this stage.  KISS again :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

 Now that we have a basic over-the-wire base backup capability in walsender,
 it would be nice to allow taking multiple base backups at the same time.

I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DISCARD ALL ; stored procedures

2011-01-11 Thread Alvaro Herrera
Excerpts from Stephen Frost's message of vie ene 07 15:29:52 -0300 2011:
 * Stephen Frost (sfr...@snowman.net) wrote:
  * Robert Haas (robertmh...@gmail.com) wrote:
Making it part of DISCARD PLANS; and back-patching it to 8.3 where
DISCARD was introduced would be awesome for me. :)
   
   I'd need to look at this more closely before committing anything, but
   at first blush I think that's reasonable.  Have a patch?
  
  To be honest, I was fully expecting a response of that's hard to do.
 
 Soo, yeah, I found the "this is hard" part.  Basically, the plan
 caching, etc, is all handled by the stored procedures themselves, and
 we haven't got any way to tell a PL "destroy all your plans".  We might
 be able to hack up fmgr to throw away all references to functions, but
 that wouldn't release the memory they use up, making 'discard plans;'
 leak like a sieve.

What this discussion suggests to me is that cached plans need to be
tracked by a resource owner that's linked to the function.  The problem
I see with this idea is figuring out what module would keep track of
resowners that need to be reset ...  Other than that I think it should
be straightforward :-)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 21:50, Dimitri Fontaine wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:


Now that we have a basic over-the-wire base backup capability in walsender,
it would be nice to allow taking multiple base backups at the same time.


I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)


Yeah, those are bigger projects..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:
 So, this patch modifies the internal do_pg_start/stop_backup functions 
 so that in addition to the traditional mode of operation, where a 
 backup_label file is created in the data directory where it's backed up 
 along with all other files, the backup label file is be returned to the 
 caller, and the caller is responsible for including it in the backup. 
 The code in replication/basebackup.c includes it in the tar file that's 
 streamed the client, as backup_label.

Perhaps we can use this more intelligent form of base backup to
differentiate between:
 a. a primary that has crashed while a backup was in progress; and
 b. an online backup that is being restored.

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That led to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

 1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
 2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autogenerating error code lists (was Re: [COMMITTERS] pgsql: Add foreign data wrapper error code values for SQL/MED.)

2011-01-11 Thread Jan Urbański
On 11/01/11 18:59, Tom Lane wrote:
Jan Urbański wulc...@wulczer.org writes:
 On 11/01/11 17:11, Tom Lane wrote:
 Huh?  Why in the world would the specific location of the #include have
 anything to do with the problem?
 
 I'v having a hard time convincing make to generate errcodes.h before
 compiling any .c file that includes postgres.h. The only way I found was
 to make src/include/errcodes.h a dependancy of the all target.
 
 Peter would probably be a better person than me to answer that, but I
 imagine that what you want is similar to what src/backend/Makefile does
 for parser/gram.h, only applied at the src/ level or maybe even the
 root.

OK, that was a nudge in the right direction. Basing the rules on
src/backend/Makefile, I changed src/Makefile to rebuild
src/include/errcodes.h before building the subdirectories and... it
failed miserably. There's some trickery beyond my understanding here.
There's a rule like this in src/backend/Makefile:

$(SUBDIRS:%=%-recursive): $(top_builddir)/src/include/parser/gram.h

that triggers checking whether gram.h needs to be rebuilt before
recursing into each SUBDIR.

A similar trick in src/Makefile doesn't work, because it's
src/backend/common.mk that is responsible for the SUBDIR-recursive
calls, and src/Makefile does not include it.

From what I gathered by reading Makefile.global, the $(recurse) call
enters each SUBDIR and builds a target called target-subdir-recurse.
And actually, if I change my rule to read:

$(SUBDIRS:%=all-%-recurse): $(top_builddir)/src/include/utils/errcodes.h

it works. Now whether that's acceptable or not is another thing entirely...

Cheers,
Jan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 22:16, Jeff Davis wrote:

On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:

So, this patch modifies the internal do_pg_start/stop_backup functions
so that in addition to the traditional mode of operation, where a
backup_label file is created in the data directory where it's backed up
along with all other files, the backup label file is be returned to the
caller, and the caller is responsible for including it in the backup.
The code in replication/basebackup.c includes it in the tar file that's
streamed the client, as backup_label.


Perhaps we can use this more intelligent form of base backup to
differentiate between:
  a. a primary that has crashed while a backup was in progress; and
  b. an online backup that is being restored.

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That lead to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

  1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
  2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?


It won't change the situation for pg_start_backup(), but with the patch 
the base backups done via streaming won't have those issues, because 
backup_label is not created (with that name) in the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add function dependencies

2011-01-11 Thread Peter Eisentraut
On tis, 2011-01-11 at 16:57 +0100, Dimitri Fontaine wrote:
 Peter Eisentraut pete...@gmx.net writes:
  Making it work for language SQL would be nice, though.
 
 Please consider a new DEPENDENCY_XXX constant for that though, because
 otherwise I think it could cause problems in the extension's dependency
 tracking.  Even with a new DEPENDENCY_FUNCALL or other constant, the
 extension code would need to be aware of this new kind of not-to-follow
 dependencies somehow, I guess.

What's a not-to-follow dependency?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] pgbench to the MAXINT

2011-01-11 Thread Euler Taveira de Oliveira

On 10-01-2011 05:25, Greg Smith wrote:

Euler Taveira de Oliveira wrote:

On 07-01-2011 22:59, Greg Smith wrote:

setrandom: invalid maximum number -2147467296


It is failing at atoi() circa pgbench.c:1036. But it just the first
one. There are some variables and constants that need to be converted
to int64 and some functions that must speak 64-bit such as getrand().
Are you working on a patch?


http://archives.postgresql.org/pgsql-hackers/2010-01/msg02868.php
http://archives.postgresql.org/message-id/4c326f46.4050...@2ndquadrant.com

Greg, I just improved your patch. I tried to work around the problems pointed 
out in the above threads. Also, I want to raise some points:


(i) If we want to support a scale factor greater than 21474 we have to
convert some columns to bigint; it will change the test. From the portability
point of view it is a pity, but as we have never supported it I'm not too
worried about it. Why? Because it will use bigint columns only if the scale
factor is greater than 21474. Is it a problem? I don't think so, because
generally people compare tests with the same scale factor.
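
To make (i) concrete: with 100000 accounts per scale unit, a scale factor
of 21475 already needs aid values above 2^31-1, so for such scales the
initialization would have to create something like (sketch of the changed
DDL only):

create table pgbench_accounts(
	aid bigint not null,	-- int is sufficient when scale <= 21474
	bid int,
	abalance int,
	filler char(84)
);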


(ii) From the performance perspective, we need to test that the modifications
don't impact performance. I didn't create another code path for the 64-bit
modifications (it is too ugly) and I'm afraid some of them affect the
32-bit performance. I'm in no position to test it, though, because I don't have
a big machine ATM. Greg, could you lead these tests?


(iii) I decided to copy scanint8() (called strtoint64 there) from the backend
(Robert's suggestion [1]) because Tom pointed out that strtoll() has portability
issues. I replaced atoi() with strtoint64() but didn't do any performance tests.


Comments?


[1] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00173.php


--
  Euler Taveira de Oliveira
  http://www.timbira.com/
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 7c2ca6e..e9eb720 100644
*** a/contrib/pgbench/pgbench.c
--- b/contrib/pgbench/pgbench.c
***
*** 60,65 
--- 60,67 
  #define INT64_MAX	INT64CONST(0x7FFFFFFFFFFFFFFF)
  #endif
  
+ #define MAX_RANDOM_VALUE64	INT64_MAX
+ 
  /*
   * Multi-platform pthread implementations
   */
*** usage(const char *progname)
*** 364,378 
  		   progname, progname);
  }
  
  /* random number generator: uniform distribution from min to max inclusive */
! static int
! getrand(int min, int max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
--- 366,451 
  		   progname, progname);
  }
  
+ /*
+  * strtoint64 -- convert a string to 64-bit integer
+  *
+  * this function is a modified version of scanint8() from
+  * src/backend/utils/adt/int8.c.
+  *
+  * XXX should it have a return value?
+  *
+  */
+ static int64
+ strtoint64(const char *str)
+ {
+ 	const char *ptr = str;
+ 	int64		result = 0;
+ 	int			sign = 1;
+ 
+ 	/*
+ 	 * Do our own scan, rather than relying on sscanf which might be broken
+ 	 * for long long.
+ 	 */
+ 
+ 	/* skip leading spaces */
+ 	while (*ptr && isspace((unsigned char) *ptr))
+ 		ptr++;
+ 
+ 	/* handle sign */
+ 	if (*ptr == '-')
+ 	{
+ 		ptr++;
+ 
+ 		/*
+ 		 * Do an explicit check for INT64_MIN.	Ugly though this is, it's
+ 		 * cleaner than trying to get the loop below to handle it portably.
+ 		 */
+ 		if (strncmp(ptr, "9223372036854775808", 19) == 0)
+ 		{
+ 			result = -INT64CONST(0x7fffffffffffffff) - 1;
+ 			ptr += 19;
+ 			goto gotdigits;
+ 		}
+ 		sign = -1;
+ 	}
+ 	else if (*ptr == '+')
+ 		ptr++;
+ 
+ 	/* require at least one digit */
+ 	if (!isdigit((unsigned char) *ptr))
+ 		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ 
+ 	/* process digits */
+ 	while (*ptr && isdigit((unsigned char) *ptr))
+ 	{
+ 		int64		tmp = result * 10 + (*ptr++ - '0');
+ 
+ 		if ((tmp / 10) != result)		/* overflow? */
+ 			fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
+ 		result = tmp;
+ 	}
+ 
+ gotdigits:
+ 
+ 	/* allow trailing whitespace, but not other trailing chars */
+ 	while (*ptr != '\0' && isspace((unsigned char) *ptr))
+ 		ptr++;
+ 
+ 	if (*ptr != '\0')
+ 		fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
+ 
+ 	return ((sign < 0) ? -result : result);
+ }
+ 
  /* random number generator: uniform distribution from min to max inclusive */
! static int64
! getrand(int64 min, int64 max)
  {
  	/*
  	 * Odd coding is so that min and max have approximately the same chance of
  	 * being selected as do numbers between them.
  	 */
! 	return min + (int64) (((max - min + 1) * (double) random()) / (MAX_RANDOM_VALUE64 + 1.0));
  }
  
  /* call PQexec() and exit() on failure */
*** top:
*** 887,893 
  		if (commands[st-state] == NULL)
  		{
! 			st->state = 0;
! 			st->use_file = getrand(0, 

Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
1. If it's a primary recovering from a crash, and there is a
  backup_label file, and the WAL referenced in the backup_label exists,
  then it does a bunch of extra work during recovery; and
2. In the same situation, if the WAL referenced in the backup_label
  does not exist, then it PANICs with a HINT to tell you to remove the
  backup_label.
 
  Is this an opportunity to solve these problems and simplify the code?
 
 It won't change the situation for pg_start_backup(), but with the patch 
 the base backups done via streaming won't have those issues, because 
 backup_label is not created (with that name) in the master.

Do you think we should change the backup protocol for normal base
backups to try to fix this? Or do you think that the simplicity of
unrestricted file copy is worth these problems?

We could probably make some fairly minor changes, like making a file on
the primary and telling users to exclude it from any base backup. The
danger, of course, is that they do copy it, and their backup is
compromised.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

