Re: [PATCHES] [HACKERS] WITH RECURSIVE patch V0.1

2008-05-19 Thread Hannu Krosing
On Sun, 2008-05-18 at 22:17 -0700, David Fetter wrote:
 On Mon, May 19, 2008 at 12:21:20AM -0400, Gregory Stark wrote:
  Zoltan Boszormenyi [EMAIL PROTECTED] writes:
   Also, it seems there are no infinite recursion detection:
  
    # with recursive x(level, parent, child) as (
        select 1::integer, * from test_connect_by where parent is null
        union all
        select x.level + 1, base.* from test_connect_by as base, x
        where base.child = x.child
    ) select * from x;
    ... it waits and waits and waits ...
  
  Well, psql might wait and wait but it's actually receiving rows.  A
  cleverer client should be able to deal with infinite streams of
  records. 
 
 That would be a very good thing for libpq (and its descendants) to
 have :)
 
  I think DB2 does produce a warning if there is no clause it can
  determine will bound the results.  But that's not actually reliable.
 
 I'd think not, as it's (in some sense) a Halting Problem.
 
  It's quite possible to have clauses which will limit the output but
  not in a way the database can determine.  Consider for example a
  tree-traversal for a binary tree stored in a recursive table
  reference.  The DBA might know that the data contains no loops but
  the database doesn't.
 
 I seem to recall Oracle's implementation can do this traversal on
 write operations, but maybe that's just their marketing.

It may be possible to solve at least some of it by doing something
similar to the hash version of DISTINCT: keep a hashtable of tuples
already returned and don't descend branches where you have already
been.
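
A SQL-level sketch of the same idea, assuming the test_connect_by(parent,
child) table from the example above and the usual parent-to-child join:
carry the set of visited keys along in an array and refuse to descend into
a node that is already on the current path. This detects cycles per branch
rather than globally, but needs no server-side hashtable:

with recursive x(level, parent, child, path) as (
    select 1::integer, parent, child, array[child]
    from test_connect_by where parent is null
    union all
    select x.level + 1, base.parent, base.child, x.path || base.child
    from test_connect_by as base, x
    where base.parent = x.child
      and base.child <> all (x.path)  -- skip already-visited branches
) select * from x;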

 Cheers,
 David.
 -- 
 David Fetter [EMAIL PROTECTED] http://fetter.org/
 Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
 Skype: davidfetter  XMPP: [EMAIL PROTECTED]
 
 Remember to vote!
 Consider donating to Postgres: http://www.postgresql.org/about/donate
 




[PATCHES] fix for pl/python's named OUT parameter handling

2008-04-30 Thread Hannu Krosing
Before this patch, pl/python did not do the right thing when OUT
parameters are present.
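
A hypothetical example of the case being fixed (function and argument names
made up for illustration): with an OUT parameter present, proargnames carries
both the IN and the OUT names, so the argument-name mapping must skip the OUT
entries:

CREATE FUNCTION add_one(x in integer, result out integer) AS $$
# with the fix, the single IN argument is visible by its name 'x'
return x + 1
$$ LANGUAGE plpythonu;

SELECT * FROM add_one(41);  -- result = 42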


Hannu

Index: plpython/plpython.c
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpython/plpython.c,v
retrieving revision 1.106
diff -c -r1.106 plpython.c
*** plpython/plpython.c	2 Jan 2008 03:10:27 -	1.106
--- plpython/plpython.c	29 Apr 2008 20:40:53 -
***
*** 1159,1167 
  	bool		isnull;
  	int			i,
  rv;
- 	Datum		argnames;
- 	Datum	   *elems;
- 	int			nelems;
  
  	procStruct = (Form_pg_proc) GETSTRUCT(procTup);
  
--- 1159,1164 
***
*** 1250,1304 
  		 * now get information required for input conversion of the
  		 * procedure's arguments.
  		 */
! 		proc->nargs = procStruct->pronargs;
! 		if (proc->nargs)
! 		{
! 			argnames = SysCacheGetAttr(PROCOID, procTup, Anum_pg_proc_proargnames, &isnull);
! 			if (!isnull)
! 			{
! 				/* XXX this code is WRONG if there are any output arguments */
! 				deconstruct_array(DatumGetArrayTypeP(argnames), TEXTOID, -1, false, 'i',
! 								  &elems, NULL, &nelems);
! 				if (nelems != proc->nargs)
! 					elog(ERROR,
! 						 "proargnames must have the same number of elements "
! 						 "as the function has arguments");
! 				proc->argnames = (char **) PLy_malloc(sizeof(char *) * proc->nargs);
! 				memset(proc->argnames, 0, sizeof(char *) * proc->nargs);
! 			}
! 		}
! 		for (i = 0; i < proc->nargs; i++)
! 		{
  			HeapTuple	argTypeTup;
  			Form_pg_type argTypeStruct;
  
! 			argTypeTup = SearchSysCache(TYPEOID,
! 										ObjectIdGetDatum(procStruct->proargtypes.values[i]),
! 										0, 0, 0);
! 			if (!HeapTupleIsValid(argTypeTup))
! 				elog(ERROR, "cache lookup failed for type %u",
! 					 procStruct->proargtypes.values[i]);
! 			argTypeStruct = (Form_pg_type) GETSTRUCT(argTypeTup);
  
! 			/* Disallow pseudotype argument */
! 			if (argTypeStruct->typtype == TYPTYPE_PSEUDO)
! 				ereport(ERROR,
! 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						 errmsg("plpython functions cannot take type %s",
! 								format_type_be(procStruct->proargtypes.values[i]))));
! 
! 			if (argTypeStruct->typtype != TYPTYPE_COMPOSITE)
! 				PLy_input_datum_func(&(proc->args[i]),
! 									 procStruct->proargtypes.values[i],
! 									 argTypeTup);
! 			else
! 				proc->args[i].is_rowtype = 2;	/* still need to set I/O funcs */
  
! 			ReleaseSysCache(argTypeTup);
  
! 			/* Fetch argument name */
! 			if (proc->argnames)
! 				proc->argnames[i] = PLy_strdup(DatumGetCString(DirectFunctionCall1(textout, elems[i])));
  		}
  
  		/*
--- 1247,1309 
  		 * now get information required for input conversion of the
  		 * procedure's arguments.
  		 */
! 
! 		if (procStruct->pronargs) {
  			HeapTuple	argTypeTup;
  			Form_pg_type argTypeStruct;
  
! 			Oid		*types;
! 			char   **names,
! 					*modes;
! 			int		 i,
! 					 pos,
! 					 total;
! 
! 			total = get_func_arg_info(procTup, &types, &names, &modes);
! 			if (modes == NULL)
! 				proc->nargs = procStruct->pronargs;
! 			else
! 				/* count the number of non-OUT args into proc->nargs */
! 				for (i = 0; i < total; i++) {
! 					if (modes[i] != 'o') (proc->nargs)++;
! 				}
  
! 			proc->argnames = (char **) PLy_malloc(sizeof(char *) * proc->nargs);
! 			for (i = pos = 0; i < total; i++) {
! 				if (modes && modes[i] == 'o') /* skip OUT arguments */
! 					continue;
! 
! 				Assert(types[i] == procStruct->proargtypes.values[i]);
! 
! 				argTypeTup = SearchSysCache(TYPEOID, ObjectIdGetDatum(types[i]), 0, 0, 0);
! 				if (!HeapTupleIsValid(argTypeTup))
! 					elog(ERROR, "cache lookup failed for type %u",
! 						 procStruct->proargtypes.values[i]);
! 				argTypeStruct = (Form_pg_type) GETSTRUCT(argTypeTup);
! 
! 				switch (argTypeStruct->typtype) {
! 					case TYPTYPE_PSEUDO:
! 						ereport(ERROR,
! 								(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 								 errmsg("plpython functions cannot take type %s",
! 										format_type_be(types[i]))));
! 					case TYPTYPE_COMPOSITE:
! 						proc->args[pos].is_rowtype = 2; /* set I/O funcs at first call */
! 						break;
! 					default:
! 						PLy_input_datum_func(&(proc->args[pos]),
! 											 types[i],
! 											 argTypeTup);
! 				}
! 
! 				ReleaseSysCache(argTypeTup);
! 
! 				/* get argument name */
! 				proc->argnames[pos] = names ? PLy_strdup(names[i]) : NULL;
! 
! 				pos++;
! 			}
  		}
  
  		/*
Index: plpython/expected/plpython_function.out
===
RCS file: /projects/cvsroot/pgsql/src/pl/plpython/expected/plpython_function.out,v
retrieving revision 1.10
diff -c -r1.10 plpython_function.out
*** plpython/expected/plpython_function.out	16 Oct 2006 21:13:57 -	1.10
--- plpython/expected/plpython_function.out	29 Apr 2008 20:40:59 -
***
*** 436,438 
--- 436,447 
  	type_record.second = second
  	return type_record
  $$ LANGUAGE plpythonu;
+ CREATE FUNCTION test_in_out_params(first in text, second out text) AS $$
+ return first + 

Re: [PATCHES] [HACKERS] Full page writes improvement, code update

2007-04-10 Thread Hannu Krosing
On Tue, 2007-04-10 at 18:17, Joshua D. Drake wrote:
  In terms of idle time for gzip and other commands to archive WAL offline,
  no difference in the environment was given other than the command to
  archive.   My guess is that because the user time is very large in gzip,
  the scheduler has more chances to give resources to other processes.   In
  the case of cp, idle time is more than 30 times longer than user time.
  Pg_compresslog uses seven times more idle time than user time.  On the
  other hand, gzip uses less idle time than user time.   Considering the
  total amount of user time, I think it's a reasonable measure.
  
  Again, in my proposal, it is not the issue to increase run time
  performance.   Issue is to decrease the size of archive log to save the
  storage.
 
 Considering the relatively little amount of storage a transaction log
 takes, it would seem to me that the performance angle is more appropriate.

As I understand it, it's not about the transaction log but about the
write-ahead log.

And the amount of data in WAL can become very important once you have to
keep standby servers in different physical locations (cities, countries
or continents), where channel throughput and cost come into play.

With simple cp (scp/rsync), the amount of WAL data needing to be copied
is about 10x more than the data collected by trigger-based solutions
(Slony/pgQ). With pg_compresslog, WAL shipping seems to involve roughly
the same amount and thus becomes a viable alternative again.

 Is it more efficient in other ways besides negligible tps? Possibly more
 efficient memory usage? Better restore times for a crashed system?

I think that TPS is affected more by the number of writes than by the
size of each block written, so there is probably not that much to gain
in TPS, except perhaps from better disk cache usage.

For me pg_compresslog seems to be a winner even if it just does not
degrade performance.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [PATCHES] [HACKERS] LIKE optimization in UTF-8 and locale-C

2007-03-22 Thread Hannu Krosing
On Thu, 2007-03-22 at 11:08, Tom Lane wrote:
 ITAGAKI Takahiro [EMAIL PROTECTED] writes:
  I found LIKE operators are slower on multi-byte encoding databases
  than single-byte encoding ones. It comes from difference between
  MatchText() and MBMatchText().
 
  We've had an optimization for single-byte encodings using 
  pg_database_encoding_max_length() == 1 test. I'll propose to extend it
  in UTF-8 with locale-C case.
 
 If this works for UTF8, won't it work for all the backend-legal
 encodings?

I guess it works well for % but not for _ ; the latter has to know how
many bytes the current (multibyte) character covers.

The length is still easy to find out for the UTF-8 encoding, so it may
be feasible to write a UTF8MatchText() that is still faster than
MBMatchText().
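
A quick illustration of why _ needs the character length (a UTF-8
database assumed): a non-ASCII character is several bytes but must still
match a single _:

SELECT 'ä' LIKE '_';   -- true: one character, although two bytes in UTF-8
SELECT 'ab' LIKE '_';  -- false: two characters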

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia






Re: [PATCHES] [HACKERS] HOT WIP Patch - version 2

2007-02-19 Thread Hannu Krosing
On Tue, 2007-02-20 at 12:08, Pavan Deolasee wrote:
 
 Reposting - looks like the message did not get through in the first
 attempt. My apologies if multiple copies are received.
 
 
 This is the next version of the HOT WIP patch. Since the last patch
 that 
 I sent out, I have implemented the HOT-update chain pruning mechanism.
 
 When following a HOT-update chain from the index fetch, if we notice
 that the root tuple is dead and it is HOT-updated, we try to prune the
 chain to the smallest possible length. To do that, the share lock is
 upgraded to an exclusive lock and the tuple chain is followed till we
 find a live/recently-dead tuple. At that point, the root t_ctid is made
 to point to that tuple. In order to preserve the xmax/xmin chain, the
 xmax of the root tuple is also updated to the xmin of the found tuple.
 Since this xmax is also < RecentGlobalXmin and is a committed
 transaction, the visibility of the root tuple still remains the same.

What do you do if there are no live tuples on the page? Will this
un-HOTify the root and free all other tuples in the HOT chain?

 
 The intermediate heap-only tuples are removed from the HOT-update
 chain. The HOT-updated status of these tuples is cleared and their
 respective t_ctids are made to point to themselves. These tuples are
 not reachable now and are ready for vacuuming.

Does this mean that they are now indistinguishable from ordinary
tuples?

Maybe they could be freed right away instead of changing the HOT-updated
status and ctid?

  This entire action is logged in a single
 WAL record.
 
 During vacuuming, we keep track of the number of root tuples vacuumed.
 If this count is zero, then the index cleanup step is skipped. This
 avoids unnecessary index scans whenever possible.
 
 This patch should apply cleanly on current CVS head and pass all
 regression
 tests. I am still looking for review comments from the first WIP
 patch. If anyone 
 has already looked through it and is interested in the incremental
 changes,
 please let me know. I can post that.
 
 What's Next?
 ------------
 
 ISTM that the basic HOT-updates and the ability to prune the HOT-update
 chain should help us reduce index bloat, limit the overhead of ctid
 following in index fetches, and efficiently vacuum heap-only tuples.
 IMO the next important but rather less troublesome thing to tackle is
 to reuse space within a block without a complete vacuum of the table.
 This would help us do many more HOT-updates and thus further reduce
 index/heap bloat.
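
For reference, the kind of statement HOT targets (hypothetical table): an
UPDATE that touches no indexed column can place the new version as a
heap-only tuple on the same page, reachable from the old version via
t_ctid, with no new index entry.

CREATE TABLE accounts (id integer PRIMARY KEY, balance numeric);
-- only the non-indexed column changes, so this can be a HOT update
UPDATE accounts SET balance = balance + 1 WHERE id = 42;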
 
 I am thinking of reusing the DEAD heap-only tuples which get removed
 from the HOT-update chain as part of the pruning operation. Since these
 tuples, once removed from the chain, are neither reachable nor have any
 index references, they could readily be used for storing newer versions
 of the same or other rows in the block. How about setting LP_DELETE on
 these tuples as part of the prune operation? LP_DELETE is unused for
 heap tuples, if I am not mistaken. Other information like length and
 offset is maintained as it is.

Seems like a good idea.

 When we run out of space for update-within-the-block, we traverse
 through all the line pointers looking for LP_DELETEd items. If any of
 these items has space large enough to store the new tuple, that item is
 reused. Does anyone see any issue with doing this? Also, any
 suggestions about doing it in a better way?

IIRC the size is determined by the next tuple pointer, so you can store
new data without changing the tuple pointer only if it is exactly the
same size.

 If the page gets really fragmented, we can try to grab a
 VACUUM-strength lock on the page and de-fragment it. The lock is tried
 conditionally to avoid any deadlocks. This is done in the heap_update()
 code path, so it would add some overhead, but may still prove better
 than putting the tuple in a different block and having the
 corresponding index insert(s). Also, since we are more concerned about
 the large tables, the chances of being able to upgrade the exclusive
 lock to a vacuum-strength lock are high. Comments?

I'm not sure about the "we are more concerned about the large tables"
part. I see it more as a device for high-update tables. This may not
always be the same as large, so there should be some fallback for the
case where you can't get the lock. Maybe just give up and move to
another page?

 If there are no objections, I am planning to work on the first part
 while Nikhil would take up the second task of block level
 retail-vacuum.
 Your comments on these issues and the patch are really appreciated.
 
 Thanks,
 Pavan
 
 -- 
 
 EnterpriseDB http://www.enterprisedb.com 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia


Re: [PATCHES] Forcing current WAL file to be archived

2006-08-12 Thread Hannu Krosing
On Sat, 2006-08-12 at 10:59, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  On Wed, 2006-08-09 at 10:57, Tom Lane wrote:
  Insert points to the next byte to be written within the internal WAL
  buffers.  The byte(s) preceding it haven't necessarily gotten out of
  those buffers yet.  Write points to the end of what we've actually
  written to the kernel,
 
  I assume that it also points to the byte after what is written to the
  kernel, or is it the last byte written?
 
 Right, it's really first-unwritten-byte for all three pointers.
 The two newly added functions to convert WAL locations to filenames
 use XLByteToPrevSeg(), so they should do the right thing here
 (see comments in src/include/access/xlog_internal.h).

How do they behave exactly at the file boundary?

That is, will it point one byte past the end of the old file, or to
byte 0 of the new one?

   regards, tom lane
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia







Re: [PATCHES] Forcing current WAL file to be archived

2006-08-12 Thread Hannu Krosing
On Wed, 2006-08-09 at 10:57, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  On Wed, 2006-08-09 at 12:56, Simon Riggs wrote:
  Methinks it should be the Write pointer all of the time, since I can't
  think of a valid reason for wanting to know where the Insert pointer is
  *before* we've written to the xlog file. Having it be the Insert pointer
  could lead to some errors.
 
  What is the difference ?
 
 Insert points to the next byte to be written within the internal WAL
 buffers.  The byte(s) preceding it haven't necessarily gotten out of
 those buffers yet.  Write points to the end of what we've actually
 written to the kernel,

I assume that it also points to the byte after what is written to the
kernel, or is it the last byte written?

  and there's also a Flush pointer that points
 to the end of what we believe is down on disk.
 
 Simon's point is that if you're going to use pg_current_xlog_location()
 to control partial shipping of xlog files, you probably want to know
 about the Write location, because that indicates the limit of what
 is visible to an external process.

Yes, that is what I need.
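
A sketch of that usage; the function names below are the ones the patch
under discussion introduces (treat them as illustrative):

SELECT pg_current_xlog_location();                    -- the Write pointer
SELECT pg_xlogfile_name(pg_current_xlog_location());  -- segment file holding it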

   regards, tom lane
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia






Re: [HACKERS] [PATCHES] [PATCH] Provide 8-byte transaction IDs to

2006-07-26 Thread Hannu Krosing
On Wed, 2006-07-26 at 13:41, Darcy Buskermolen wrote:
 On Wednesday 26 July 2006 13:04, Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   I am sure you worked hard on this, but I don't see the use case, nor
   have I heard people in the community requesting such functionality.
   Perhaps pgfoundry would be a better place for this.
 
  The part of this that would actually be useful to put in core is
  maintaining a 64-bit XID counter, ie, keep an additional counter that
  bumps every time XID wraps around.  This cannot be done very well from
  outside core but it would be nearly trivial, and nearly free, to add
  inside.  Everything else in the patch could be done just as well as an
  extension datatype.
 
  (I wouldn't do it like this though --- TransactionIdAdvance itself is
  the place to bump the secondary counter.)
 
  The question though is if we did that, would Slony actually use it?

It seems that the Slony people still hope to circumvent the known
brokenness of xxid btree indexes by dropping and recreating them often
enough and/or trying other workarounds.

 If it made sense to do it, then yes we would do it. The problem ends up being
 that Slony is designed to work across a multitude of versions of PG, and unless
 this was backported to at least 7.4, it would take a while (i.e. when we
 stopped supporting versions older than the one it was ported into) before we
 would make use of it.

We already have an external implementation, which requires a function
call to be executed at an interval of a few hundred million
transactions to pump up the higher int4 when needed.

It would probably be easy to backport it to any version of postgres
which is supported by slony.

Being in core just makes the overflow accounting part more robust.

The function to retrieve the 8-byte trx id will look exactly the same
from userland in both cases.

 
  regards, tom lane
 
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia





Re: [HACKERS] [PATCHES] [PATCH] Provide 8-byte transaction IDs to

2006-07-26 Thread Hannu Krosing
On Wed, 2006-07-26 at 13:35, Bruce Momjian wrote:
 I am sure you worked hard on this, but I don't see the use case, 

The use case is any Slony-like replication system or queueing system
which needs a consistent means of knowing which batches of transactions
have finished during some period.

You can think of this as a core component for building a Slony that does
*not* break at 2G transactions.

 nor
 have I heard people in the community requesting such functionality. 

You will, once more Slony users reach the 2-billion-transaction limit and start
silently losing data. And find out a few weeks later.

 Perhaps pgfoundry would be a better place for this.

At least the part that manages the epoch should be in core.

The rest can actually be on pgfoundry as a separate project, or inside
skytools/pgQ.

 ---
 
 Marko Kreen wrote:
  
  Intro
  -
  
  The following patch exports an 8-byte txid and snapshot to user level,
  allowing their use in regular SQL.  It is based on the Slony-I xxid
  module.  It provides a special 'snapshot' type for snapshots but
  uses regular int8 for transaction IDs.
  
  Exported API
  
  
  Type: snapshot
  
  Functions:
  
current_txid()returns int8
current_snapshot()returns snapshot
snapshot_xmin(snapshot)   returns int8
snapshot_xmax(snapshot)   returns int8
snapshot_active_list(snapshot)returns setof int8
snapshot_contains(snapshot, int8) returns bool
pg_sync_txid(int8)returns int8
  
  Operation
  -
  
  Extension to 8 bytes is done by keeping track of the wraparound count
  in pg_control.  On every checkpoint, nextxid is compared to the one
  stored in pg_control.  If the value is smaller, a wraparound happened
  and the epoch is increased.
  
  When a long txid or snapshot is requested, pg_control is locked with
  LW_SHARED for retrieving the epoch value from it.  The patch does not
  affect core functionality in any other way.
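  
  Put concretely (an illustration, not part of the patch text): the
  exported int8 is effectively (epoch << 32) | xid, so epoch 3 and
  xid 1000 combine as
  
    SELECT (3::int8 << 32) | 1000;  -- 12884902888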
  
  Backup/restore of txid data
  ---
  
  Currently I made pg_dumpall output the following statement:
  
    SELECT pg_sync_txid(%d), current_txid()
  
  then on the target database, pg_sync_txid checks whether the current
  (epoch + GetTopTransactionId()) is larger than the given argument.
  If not, it bumps the epoch until it is, thus guaranteeing that
  newly issued txids are larger than in the source database.  If
  restored into the same database instance, nothing will happen.
  
  
  Advantages of 8-byte txids
  --
  
  * Indexes won't break silently.  No need for a mandatory periodic
    truncate which may not happen for various reasons.
  * Allows keeping values from different databases in one table/index.
  * Ability to bring data into a different server and continue there.
  
  Advantages in being in core
  ---
  
  * Core code can guarantee that the wraparound check happens within 2G
    transactions.
  * Core code can update pg_control non-transactionally.  A module
    needs to operate inside a user transaction when updating the epoch
    row, which brings various problems (READ COMMITTED vs. SERIALIZABLE,
    long transactions, locking, etc).
  * Core code has only one place where it needs to update; a module
    needs to have an epoch table in each database.
  
  Todo, tothink
  -
  
  * Flesh out the documentation.  Probably needs some background.
  * Better names for some functions?
  * pg_sync_txid allows use of pg_dump for moving a database,
    but also adds the possibility to shoot yourself in the foot by
    allowing an epoch wraparound to happen.  Is "Don't do it then" enough?
  * Currently txid keeps its own copy of nextxid in pg_control;
    this makes the data dependencies clear.  It's possible to drop it
    and use ->checkPointCopy->nextXid directly, thus saving 4 bytes.
  * Should the pg_sync_txid() be issued by pg_dump instead of pg_dumpall?
  
  -- 
  marko
  
 
 [ Attachment, skipping... ]
 
  
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia





Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Hannu Krosing
On Wed, 2006-07-26 at 23:02, Martijn van Oosterhout wrote:
 On Wed, Jul 26, 2006 at 12:47:57PM -0400, Greg Stark wrote:
  Tom Lane [EMAIL PROTECTED] writes:
  
   So far, the case hasn't been made for retail vacuum even ignoring the
   not-so-immutable-function risk.
  
   Well the desire for it comes from a very well established need for dealing
   with extremely large tables with relatively small hot spots. The basic
   problem is that currently the cost of vacuum is proportional to the size
   of the table rather than the amount of dead space. There's no link between
   those variables (at least in one direction) and any time they're far out
   of whack it means excruciating pain for the DBA.
 
 I thought the suggested solution for that was the dead space map. That
 way vacuum can ignore parts of the table that haven't changed...

It can ignore parts of the *table* but still has to scan full *indexes*.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia





Re: [HACKERS] [PATCHES] [PATCH] Provide 8-byte transaction IDs to

2006-07-26 Thread Hannu Krosing
On Wed, 2006-07-26 at 14:27, Darcy Buskermolen wrote:
 On Wednesday 26 July 2006 14:03, Tom Lane wrote:
  Darcy Buskermolen [EMAIL PROTECTED] writes:
   The question though is if we did that, would Slony actually use it?
  
   If it made sense to do it, then yes we would do it. The problem ends up
   being that Slony is designed to work across a multitude of versions of PG, and
   unless this was backported to at least 7.4, it would take a while (i.e.
   when we stopped supporting versions older than the one it was ported into)
   before we would make use of it.
 
  [ shrug... ]  That's not happening; for one thing the change requires a
  layout change in pg_control and we have no mechanism to do that without
  initdb.
 
 I'll take a bit more of a look through the patch and see if it is a real boot 
 to use it on those platforms that support it, and that we have a suitable way 
 around it on those that don't.

This patch is actually 2 things together:

1) fixing the xid wraparound and related btree brokenness by moving to
8-byte txids represented as int8

2) cleaning up and exposing Slony's snapshot usage.

Slony stored snapshots in tables as separate xmin, xmax and
list-of-running-transactions and then constructed the snapshot struct
and used it internally.

This patch exposes the snapshot by providing a single snapshot type
and operators for telling whether any int8 transaction is committed
before or after this snapshot.

This makes it possible to use txids and snapshots in a query that does

SELECT records FROM logtable WHERE txid BETWEEN snap1 AND snap2;

that is, it gets all records which are committed between two snapshots.
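
Spelled out with the exported functions (assuming snap1 and snap2 were
saved earlier with current_snapshot(), that logtable stores current_txid()
in its txid column, and that snapshot_contains(snap, txid) means "txid is
visible, i.e. committed, in snap"):

SELECT * FROM logtable
WHERE NOT snapshot_contains(snap1, txid)  -- not yet committed at snap1
  AND snapshot_contains(snap2, txid);     -- committed by snap2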

  But at this point I wouldn't hold my breath on that

Well, switching to using stuff from this patch would fix the
data-corruption-after-2G problem for Slony.

That is, unless there are some bugs or thinkos of its own in this
patch :)

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia





Re: [PATCHES] Iterating generator from C (PostgreSQL's pl/python RETUN

2006-05-15 Thread Hannu Krosing
Sorry for cross-posting, but this IS a cross-platform issue.

Christian Tismer tismer at stackless.com  wrote:
 Sven Suursoho wrote:
 
   Is there any way to rewrite the following program to handle the returned
  generator without hitting this bug?
 
 The only way I can think of getting around this is to make
 sure that there is always a running python frame.
 This would be possible if you can control how the
 extension is called.

What would be the easiest way to hold an always-running Python frame?

The actual calls to the iterator come through a pl/python framework,
which can hold some state - like some inter-call dictionaries - so
keeping a simple outer frame there as well may be possible.

What danger (apart from general ugliness) may lurk in keeping this frame?

  Unfortunately, this is not an option because I can't control the
  environment used: I'm trying to improve PostgreSQL's stored procedure
  language PL/Python and this software can be used everywhere.
 
 Then there is no other way than to report the bug, get the
 fix back-ported and nagging the PL/Python maintainers
 to update things after the fix.

Unfortunately there is no 'after the fix', as the fix must happen outside 
postgresql/plpython development and should also run on oldish systems.

The reason we want to get some workaround for that bug is the need
to overcome resistance from core PostgreSQL developers to the inclusion
of our pl/python enhancements into PostgreSQL's bundled pl/python, due to
one specific use of our generic enhancement (using a built-in generator,
usually a function with yield) causing crashes on Red Hat's buggy bundled
python.so.

Also, PL/Python has been in minimal maintenance mode for many years, with 
basically only immediate bugfixing going on. 

We at Skype (that is, currently Sven Suursoho) are the first ones doing
active development on it since Andrew Bosma dropped development many
years ago, once he got just the very basic stuff working.

 Also a test should be added, which has probably been missing for a while.

Test where? In the python.so build process, so that Red Hat will spot it
and fix their RPM builds?

As for testing in the actual pl/python build environment, we had
objections from leading PostgreSQL developer Tom Lane that even if we do
test it at build time, a determined DBA may substitute a buggy python.so
later and still crash her DB instance.

Do you have any ideas on how to test for buggy asserts in the Python
runtime environment without actually triggering the assert?

Then we could introduce some (uglyish) special handling for generators in
pl/python's iterator.

 I'd put a warning somewhere that generators are broken in
 debug mode, file an issue as well, but avoid trying to hack
 around this. It would make the bug even more resistant :-)

We have been trying to advocate such an approach, but so far with modest results :(

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia






Re: [PATCHES] [HACKERS] Iterating generator from C (PostgreSQL's

2006-05-15 Thread Hannu Krosing
On Mon, 2006-05-15 at 17:21, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Sven Suursoho wrote:
  As for testing in the actual pl/python build environment, we had
  objections from leading PostgreSQL developer Tom Lane that even if we
  do test it at build time, a determined DBA may substitute a buggy
  python.so later and still crash her DB instance.
 
 The above is a straw-man depiction of my point.

Sure ;)

 What I said was that just
 because python is up-to-date on the system where plpython is *compiled*
 does not mean it'll be up-to-date on the system where plpython is *used*.

Would running an external program at PL init time and testing for its
crash status be a good broken-lib test for pl/python?

 With the increasing popularity of prebuilt packages (rpm, deb, etc),
 I think it's folly to use a build-time check for this.

I guess most packaging systems can require some minimal version of
dependencies. 

And in general newer versions are less buggy than old ones.

So I guess that some combination of build-time/run-time tests,
documentation and packager education should take care of 99% of the
concerns?

Hopefully we can just ignore the "determined DBA" failure mode and move
forward with including the patch. Or do you think that trying to hack in
the extra Python frame for all/some builds to ward off potentially
broken libs would still be something to go for?

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia






Re: [PATCHES] plpython improvements

2006-05-06 Thread Hannu Krosing
On Fri, 2006-05-05 at 09:20, Joshua D. Drake wrote:
  
  I think that a less confusing way of saying it would be:
  
   Generators crash if the Python version used is 2.4.x and it is compiled
   with asserts.
  
   Currently the only known Linux distributions to distribute such python.so
   files are Fedora and possibly other Red Hat distributions, while
   Gentoo, Ubuntu and SuSE are OK.
 
 Ubuntu ships 2.4; I don't know about SuSE. 2.4 has been out for some time
 and it would be a mistake to assume that we won't run into this.

Marko Kreen has tested the patch on Ubuntu and it is OK there.

The problem is not using 2.4; it is using 2.4 compiled with a specific
set of flags.

---
Hannu




Re: [PATCHES] plpython improvements

2006-05-05 Thread Hannu Krosing
On Thu, 2006-05-04 at 18:21, Sven Suursoho wrote:
 Hi,
 
 
 Sun, 30 Apr 2006 19:14:28 +0300, Tom Lane [EMAIL PROTECTED]:
 
  Sven Suursoho [EMAIL PROTECTED] writes:
  Unfortunately, there is still one problem when using unpatched python,
  caused by too aggressive assert.
  http://mail.python.org/pipermail/python-checkins/2005-August/046571.html.
 
  I don't think we are going to be able to accept a patch that causes the
  server to crash when using any but a bleeding-edge copy of Python.

Actually not bleeding-edge, but just version 2.4.x as distributed in
Fedora Core (and possibly in RHAS), which has assert() enabled in
python.so.

The assert there is buggy (bug
http://sourceforge.net/tracker/index.php?func=detail&aid=1257960&group_id=5470&atid=105470)
 

 Did a complete rewrite for SETOF functions: now accepts any Python object
 for which iter(object) returns an iterable object. This way we don't have
 to deal with specific containers but can use the unified Python iterator
 API. It means that plpython is future-proof -- whenever Python introduces
 a new container, stored procedures can already use it without recompiling
 the language handler.
 
 Also integrated with regression tests and updated existing tests to use  
 named parameters.
 
 When using a Python interpreter with asserts enabled, generators still
 crash. But I don't think that we should drop this feature because of that.
 Reasons:
 1) this is someone else's bug; we are using the documented API correctly
 2) it doesn't concern the majority of users because there are probably no
 asserts in production packages (tested with Gentoo, Ubuntu, SuSE). This is
 true even for older Python versions that are not patched.

From reading the bug report, it seems that older versions of Python
don't have this bug, only 2.4.

 And after all, we can document using sets, lists, tuples, iterators etc.
 and explicitly state that returning a generator is undefined.

I think that a less confusing way of saying it would be:

 Generators crash if the Python version used is 2.4.x and it is compiled
 with asserts.

 Currently the only known Linux distributions to distribute such python.so
 files are Fedora and possibly other Red Hat distributions, while
 Gentoo, Ubuntu and SuSE are OK.

 If you need to use generators on such a platform, compile your own
 Python from source and make sure that configure uses your version.


I think the patch should be committed so we can collect data about where
else the buggy version of Python exists.

And if some buildfarm machines start crashing, Python should be fixed
there.


Hannu






Re: [PATCHES] plpython improvements

2006-05-03 Thread Hannu Krosing
On Sun, 2006-04-30 at 14:43, Tom Lane wrote:
 Sven Suursoho [EMAIL PROTECTED] writes:
  So, what about this in configure:
  if --with-python && test_iterator_app_crashes
     # errcode(FEATURE_NOT_SUPPORTED), errmsg("patch your python")
     disable_iterator_feature
  fi
 
 Testing it in configure is wrong, because there's no guarantee the same
 python library will be used at runtime.

As it is a crash bug, I can see two other ways to test:

1) do the test in a startup script (maybe have pg_ctl run something)

2) test it in the postmaster by running an external test program and
seeing if it crashes.


How do we handle other buggy library routines, e.g. if some system
library crashes on divide-by-zero or similar?


Hannu




Re: [PATCHES] plpython improvements

2006-04-27 Thread Hannu Krosing
On Thu, 2006-04-27 at 10:17, Bruce Momjian wrote:
 Sorry, I have to revert this patch because it is causing crashes in the
 plpython regression tests.  Would you please run those tests, fix the
 bug, and resubmit.  Thanks.

Where exactly does it crash?

Please tell us the version (and build flags) of Python you are using.

There is a superfluous assert in some versions of python that has
troubled us as well.

---
Hannu




Re: [PATCHES] plpython improvements

2006-04-17 Thread Hannu Krosing
On Sat, 2006-04-15 at 17:59, Sven Suursoho wrote:
 1) named parameters additionally to args[]
 2) return composite-types from plpython as dictionary
 3) return result-set from plpython as list, iterator or generator
 
 Test script attached (patch-test.sql) but not integrated into the plpython
 test-suite.
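
A minimal sketch of feature 3), under the patch's semantics that any
iterable Python object can be returned from a SETOF function:

CREATE FUNCTION three_ints() RETURNS SETOF integer AS $$
return [1, 2, 3]
$$ LANGUAGE plpythonu;

SELECT * FROM three_ints();  -- three rows: 1, 2, 3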

If you wonder why you can't apply the patch, it is against postgres
8.0.7.


Hannu




Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-08-24 Thread Hannu Krosing
On Wed, 2005-08-17 at 15:40 -0400, Tom Lane wrote:
From: Tom Lane [EMAIL PROTECTED]
To: Bruce Momjian pgman@candle.pha.pa.us, Hannu Krosing [EMAIL PROTECTED],
    Neil Conway [EMAIL PROTECTED], pgsql-[EMAIL PROTECTED]
Subject: Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each
Date: Wed, 17 Aug 2005 15:40:53 -0400 (22:40 EEST)
 
 Just for the archives, attached is as far as I'd gotten with cleaning up
 Hannu's patch before I realized that it wasn't doing what it needed to
 do.  This fixes an end-of-transaction race condition (can't unset
 inVacuum before xact end, unless you want OldestXmin going backwards
 from the point of view of other people) and improves the documentation
 of what's going on.  But unless someone can convince me that it's safe
 to mess with GetSnapshotData, it's unlikely this'll ever get applied.
 
 
 

Attached is a patch, based on your last one, which messes with
GetSnapshotData in what I think is a safe way.

It introduces another attribute to PROC, proc->nonInVacuumXmin, and
computes this in addition to proc->xmin inside GetSnapshotData.

When (and only when) GetOldestXmin is called with ignoreVacuum=true,
proc->nonInVacuumXmin is checked instead of proc->xmin.

I believe that this will make this change invisible to all other places
where GetSnapshotData or GetOldestXmin is used.

-- 
Hannu Krosing [EMAIL PROTECTED]





Index: src/backend/access/transam/twophase.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/twophase.c,v
retrieving revision 1.10
diff -c -r1.10 twophase.c
*** src/backend/access/transam/twophase.c	20 Aug 2005 23:26:10 -	1.10
--- src/backend/access/transam/twophase.c	24 Aug 2005 12:01:17 -
***
*** 280,285 
--- 280,287 
  	gxact->proc.pid = 0;
  	gxact->proc.databaseId = databaseid;
  	gxact->proc.roleId = owner;
+ 	gxact->proc.inVacuum = false;
+ 	gxact->proc.nonInVacuumXmin = InvalidTransactionId;
  	gxact->proc.lwWaiting = false;
  	gxact->proc.lwExclusive = false;
  	gxact->proc.lwWaitLink = NULL;
Index: src/backend/access/transam/xact.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.214
diff -c -r1.214 xact.c
*** src/backend/access/transam/xact.c	20 Aug 2005 23:45:08 -	1.214
--- src/backend/access/transam/xact.c	24 Aug 2005 12:01:17 -
***
*** 1516,1521 
--- 1516,1523 
  		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  		MyProc->xid = InvalidTransactionId;
  		MyProc->xmin = InvalidTransactionId;
+ 		MyProc->inVacuum = false;	/* must be cleared with xid/xmin */
+ 		MyProc->nonInVacuumXmin = InvalidTransactionId; /* this too */
  
  		/* Clear the subtransaction-XID cache too while holding the lock */
  		MyProc->subxids.nxids = 0;
***
*** 1752,1757 
--- 1754,1761 
  	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  	MyProc->xid = InvalidTransactionId;
  	MyProc->xmin = InvalidTransactionId;
+ 	MyProc->inVacuum = false;	/* must be cleared with xid/xmin */
+ 	MyProc->nonInVacuumXmin = InvalidTransactionId; /* this too */
  
  	/* Clear the subtransaction-XID cache too while holding the lock */
  	MyProc->subxids.nxids = 0;
***
*** 1915,1920 
--- 1919,1926 
  		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
  		MyProc->xid = InvalidTransactionId;
  		MyProc->xmin = InvalidTransactionId;
+ 		MyProc->inVacuum = false;	/* must be cleared with xid/xmin */
+ 		MyProc->nonInVacuumXmin = InvalidTransactionId; /* this too */
  
  		/* Clear the subtransaction-XID cache too while holding the lock */
  		MyProc->subxids.nxids = 0;
Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.218
diff -c -r1.218 xlog.c
*** src/backend/access/transam/xlog.c	22 Aug 2005 23:59:04 -	1.218
--- src/backend/access/transam/xlog.c	24 Aug 2005 12:01:18 -
***
*** 5303,5309 
  	 * mustn't do this because StartupSUBTRANS hasn't been called yet.
  	 */
  	if (!InRecovery)
! 		TruncateSUBTRANS(GetOldestXmin(true));
  
  	if (!shutdown)
  		ereport(DEBUG2,
--- 5303,5309 
  	 * mustn't do this because StartupSUBTRANS hasn't been called yet.
  	 */
  	if (!InRecovery)
! 		TruncateSUBTRANS(GetOldestXmin(true, false));
  
  	if (!shutdown)
  		ereport(DEBUG2,
Index: src/backend/catalog/index.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.259
diff -c -r1.259 index.c
*** src/backend/catalog/index.c	12 Aug 2005 01:35:56 -	1.259
--- src/backend/catalog/index.c	24 Aug 2005 12:01:18 -
***
*** 1433,1439

Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-08-13 Thread Hannu Krosing
On Fri, 2005-08-12 at 15:47 -0400, Bruce Momjian wrote:
 This has been saved for the 8.2 release:
 
   http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Is there any particular reason for not putting it in 8.1 ?

-- 
Hannu Krosing [EMAIL PROTECTED]



Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-07-04 Thread Hannu Krosing
On Sun, 2005-07-03 at 12:19 -0400, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  Ok, this is a new version of the vacuum patch with the following changes
  following some suggestions in this thread.
 
 The more I look at this, the uglier it looks ... and I still haven't
 seen any convincing demonstration that it *works*, ie doesn't have
 bad side-effects on the transaction-is-in-progress logic.

The function GetOldestXmin is used *only* when determining oldest xmin
for transactions.

 I'm particularly concerned about what happens to the RecentXmin horizon
 for pg_subtrans and pg_multixact operations.

How are they affected by this change? They should still see the vacuum
as the oldest transaction, unless they


Oh, now I see. I'm pretty sure that at the time of the original patch,
the *only* uses of GetOldestXmin were from VACUUM and catalog/index.c,
and both for the same purpose, but now I also see a call from
access/transam/xlog.c.

Perhaps I should separate the function used by vacuum into another
function, say GetOldestDataChangingXmin(), to keep the possible impact
as localised as possible.

Do you have any specific concerns related to this patch after that ?

Or should I just back off for now and maybe start a separate project for
ironing out patches related to running PostgreSQL in a real-world 24/7
OLTP environment (similar to what Bizgres does for OLAP)?

-- 
Hannu Krosing [EMAIL PROTECTED]





Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-07-04 Thread Hannu Krosing
On Mon, 2005-07-04 at 10:24 +0300, Hannu Krosing wrote:
 On Sun, 2005-07-03 at 12:19 -0400, Tom Lane wrote:
  Hannu Krosing [EMAIL PROTECTED] writes:
   Ok, this is a new version of the vacuum patch with the following changes
   following some suggestions in this thread.
  
  The more I look at this, the uglier it looks ... and I still haven't
  seen any convincing demonstration that it *works*, ie doesn't have
  bad side-effects on the transaction-is-in-progress logic.

Ok, I changed GetOldestXmin() to use proc->inVacuum only when
determining the oldest visible xid for vacuum and index (i.e. which
tuples are safe to delete and which tuples there is no need to index).

The third use of GetOldestXmin(), in xlog.c, is changed to use the old
GetOldestXmin() logic.


My reasoning for why the patch should work is as follows:

1) the only transaction during which inVacuum is set is the 2nd
transaction (of originally 3, now 4) of lazy VACUUM, which does simple
heap scanning and old-tuple removal (lazy_vacuum_rel()) and makes no
externally visible changes to the data. It only removes tuples which are
already invisible to all running transactions.

2) That transaction never deletes, updates or inserts any tuples on its
own.

3) As it can't add any tuples or change any existing tuples to have its
xid as either xmin or xmax, it already runs logically outside of
transactions.

4) The only use made of proc->inVacuum is when determining which
tuples are safe to remove (in vacuum.c) or not worth indexing (in
index.c), and thus it can't affect anything else.



I can easily demonstrate that it works in the sense that it allows
several concurrent vacuums to clean out old tuples, and I have thus far
been unable to construct a counterexample where it does anything bad.
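
The demonstration is just two overlapping lazy vacuums (hypothetical
table names):

-- session 1: a long-running lazy vacuum of a big table
VACUUM big_table;
-- session 2, started while session 1 is still running: with the patch,
-- its OldestXmin ignores session 1's inVacuum transaction, so it can
-- still remove tuples that became dead after session 1 began
VACUUM small_table;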

Could you tell me which part of my reasoning is wrong, or what else I
have overlooked.

-- 
Hannu Krosing [EMAIL PROTECTED]
Index: src/backend/access/transam/xact.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.209
diff -c -r1.209 xact.c
*** src/backend/access/transam/xact.c	29 Jun 2005 22:51:53 -	1.209
--- src/backend/access/transam/xact.c	4 Jul 2005 10:57:06 -
***
*** 1402,1407 
--- 1402,1416 
  	AfterTriggerBeginXact();
  
  	/*
+ 	 * mark the transaction as not being a VACUUM (vacuum_rel will set
+ 	 * inVacuum to true directly after calling BeginTransactionCommand())
+ 	 *
+ 	 * this can probably be moved to be done only once when establishing
+ 	 * the connection, as this is now guaranteed to be reset to false in
+ 	 * vacuum.c
+ 	 */
+ 	MyProc->inVacuum = false;
+ 
+ 	/*
  	 * done with start processing, set current transaction state to in
  	 * progress
  	 */
Index: src/backend/access/transam/xlog.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.205
diff -c -r1.205 xlog.c
*** src/backend/access/transam/xlog.c	30 Jun 2005 00:00:50 -	1.205
--- src/backend/access/transam/xlog.c	4 Jul 2005 10:57:07 -
***
*** 5161,5167 
  	 * mustn't do this because StartupSUBTRANS hasn't been called yet.
  	 */
  	if (!InRecovery)
! 		TruncateSUBTRANS(GetOldestXmin(true));
  
  	if (!shutdown)
  		ereport(DEBUG2,
--- 5161,5167 
  	 * mustn't do this because StartupSUBTRANS hasn't been called yet.
  	 */
  	if (!InRecovery)
! 		TruncateSUBTRANS(GetOldestXmin(true, false));
  
  	if (!shutdown)
  		ereport(DEBUG2,
Index: src/backend/catalog/index.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/catalog/index.c,v
retrieving revision 1.258
diff -c -r1.258 index.c
*** src/backend/catalog/index.c	25 Jun 2005 16:53:49 -	1.258
--- src/backend/catalog/index.c	4 Jul 2005 10:57:07 -
***
*** 1420,1426 
  	else
  	{
  		snapshot = SnapshotAny;
! 		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared);
  	}
  
  	scan = heap_beginscan(heapRelation, /* relation */
--- 1420,1426 
  	else
  	{
  		snapshot = SnapshotAny;
! 		OldestXmin = GetOldestXmin(heapRelation->rd_rel->relisshared, true);
  	}
  
  	scan = heap_beginscan(heapRelation, /* relation */
Index: src/backend/commands/vacuum.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.310
diff -c -r1.310 vacuum.c
*** src/backend/commands/vacuum.c	14 Jun 2005 22:15:32 -	1.310
--- src/backend/commands/vacuum.c	4 Jul 2005 10:57:08 -
***
*** 37,42 
--- 37,43 
  #include "miscadmin.h"
  #include "storage/freespace.h"
  #include "storage/procarray.h"
+ #include "storage/proc.h"
  #include "storage/smgr.h"
  #include "tcop/pquery.h"
  #include "utils/acl.h"
***
*** 571,577 
  {
  	TransactionId limit;
  
! 	*oldestXmin = GetOldestXmin(sharedRel);
  
  	Assert

Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-07-03 Thread Hannu Krosing
On Mon, 2005-05-23 at 11:42 -0400, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  I can't think of any other cases where it could matter, as at least the
  work done inside vacuum_rel() itself seems non-rollbackable.
 
 VACUUM FULL's tuple-moving is definitely roll-back-able, so it might be
 prudent to only do this for lazy VACUUM.  But on the other hand, VACUUM
 FULL holds an exclusive lock on the table so no one else is going to see
 its effects concurrently anyway.

Ok, this is a new version of the vacuum patch with the following
changes, following some suggestions in this thread:

* changed the patch to affect only lazy vacuum
* moved inVacuum handling to use PG_TRY
* moved vac_update_relstats() out of lazy_vacuum_rel into a separate
  transaction. The code to do this may not be the prettiest; maybe it
  should use a separate struct.

-- 
Hannu Krosing [EMAIL PROTECTED]
Index: src/backend/access/transam/xact.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/access/transam/xact.c,v
retrieving revision 1.209
diff -c -r1.209 xact.c
*** src/backend/access/transam/xact.c	29 Jun 2005 22:51:53 -	1.209
--- src/backend/access/transam/xact.c	3 Jul 2005 15:59:09 -
***
*** 1402,1407 
--- 1402,1416 
  	AfterTriggerBeginXact();
  
  	/*
+ 	 * mark the transaction as not being a VACUUM (vacuum_rel will set
+ 	 * inVacuum to true directly after calling BeginTransactionCommand())
+ 	 *
+ 	 * this can probably be moved to be done only once when establishing
+ 	 * the connection, as this is now guaranteed to be reset to false in
+ 	 * vacuum.c
+ 	 */
+ 	MyProc->inVacuum = false;
+ 
+ 	/*
  	 * done with start processing, set current transaction state to in
  	 * progress
  	 */
Index: src/backend/commands/vacuum.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/vacuum.c,v
retrieving revision 1.310
diff -c -r1.310 vacuum.c
*** src/backend/commands/vacuum.c	14 Jun 2005 22:15:32 -	1.310
--- src/backend/commands/vacuum.c	3 Jul 2005 15:59:15 -
***
*** 37,42 
--- 37,43 
  #include "miscadmin.h"
  #include "storage/freespace.h"
  #include "storage/procarray.h"
+ #include "storage/proc.h"
  #include "storage/smgr.h"
  #include "tcop/pquery.h"
  #include "utils/acl.h"
***
*** 903,908 
--- 904,913 
  	Oid			toast_relid;
  	bool		result;
  
+ 	BlockNumber	stats_rel_pages = 0;
+ 	double		stats_rel_tuples = 0;
+ 	bool		stats_hasindex = false;
+ 
  	/* Begin a transaction for vacuuming this relation */
  	StartTransactionCommand();
  	/* functions in indexes may want a snapshot set */
***
*** 1016,1039 
  	 */
  	toast_relid = onerel->rd_rel->reltoastrelid;
  
! 	/*
! 	 * Do the actual work --- either FULL or lazy vacuum
! 	 */
! 	if (vacstmt->full)
! 		full_vacuum_rel(onerel, vacstmt);
! 	else
! 		lazy_vacuum_rel(onerel, vacstmt);
  
! 	result = true;/* did the vacuum */
  
! 	/* all done with this class, but hold lock until commit */
! 	relation_close(onerel, NoLock);
  
! 	/*
! 	 * Complete the transaction and free all temporary memory used.
! 	 */
! 	StrategyHintVacuum(false);
! 	CommitTransactionCommand();
  
  	/*
  	 * If the relation has a secondary toast rel, vacuum that too while we
--- 1021,1071 
  	 */
  	toast_relid = onerel->rd_rel->reltoastrelid;
  
! 	PG_TRY();
! 	{
! 		/*
! 		 * Do the actual work --- either FULL or lazy vacuum
! 		 */
! 		if (vacstmt->full)
! 			full_vacuum_rel(onerel, vacstmt);
! 		else
! 		{
! 			/* mark this transaction as being a lazy vacuum */
! 			MyProc->inVacuum = true;
! 
! 			lazy_vacuum_rel(onerel, vacstmt, &stats_rel_pages,
! 							&stats_rel_tuples, &stats_hasindex);
! 		}
  
! 		result = true;/* did the vacuum */
  
! 		/* all done with this class, but hold lock until commit */
! 		relation_close(onerel, NoLock);
! 
! 		/*
! 		 * Complete the transaction and free all temporary memory used.
! 		 */
! 		StrategyHintVacuum(false);
! 		CommitTransactionCommand();
! 	}
! 	PG_CATCH();
! 	{
! 		/* make sure the in-vacuum flag is cleared */
! 		MyProc->inVacuum = false;
! 		PG_RE_THROW();
! 	}
! 	PG_END_TRY();
! 	MyProc->inVacuum = false;
! 
! 
! 	/* use yet another transaction for saving the stats from lazy_vacuum_rel into pg_class */
! 	if (stats_rel_pages > 0) {
! 		StartTransactionCommand();
! 		ActiveSnapshot = CopySnapshot(GetTransactionSnapshot()); /* is this needed ? */		
! 		/* Update statistics in pg_class */
! 		vac_update_relstats(RelationGetRelid(onerel),
! 			stats_rel_pages,
! 			stats_rel_tuples,
! 			stats_hasindex);
! 		CommitTransactionCommand();
! 	}
  
  	/*
  	 * If the relation has a secondary toast rel, vacuum that too while we
Index: src/backend/commands/vacuumlazy.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.54
diff -c -r1.54 vacuumlazy.c
*** src

Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-05-23 Thread Hannu Krosing
On Mon, 2005-05-23 at 10:16 -0400, Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  I'm a little worried about having this set to true after a VACUUM is 
  executed, and only reset to false when the next transaction is begun: it 
  shouldn't affect correctness right now, but it seems like asking for 
  trouble. Resetting the flag to false after processing a transaction 
  would probably be worth doing.
 
 These days I'd be inclined to use a PG_TRY construct to guarantee the
 flag is cleared, rather than loading another cleanup operation onto
 unrelated code.

Ok, I will check out PG_TRY. I had hoped that there is some way to not
set inVacuum to false at each transaction start and still be sure that
it will be reverted after vacuum_rel.

So I'll set it once at the start of connection and then maintain it in
vacuum_rel() using PG_TRY.

 The MyProc != NULL tests are a waste of code space.  You can't even
 acquire an LWLock without MyProc being set, let alone access tables.

Thanks, I'll get rid of them.

 The real issue here though is whether anyone can blow a hole in the
 xmin assumptions: is there any situation where ignoring a vacuum
 transaction breaks things?  I haven't had time to think about it
 in any detail, but it definitely needs to be thought about.

There may be a need to exclude vacuum/analyse on system relations from
being ignored by vacuum_rel(), as I suspect the info they both write to
pg_class, pg_attribute, and possibly other tables may be vulnerable to
a crash at the wrong moment.

Also, it may be prudent not to ignore other vacuums when vacuum_rel()
itself is run on a system relation.

I'm not sure which way it is, as my head gets all thick each time I try
to figure it out ;p.

I can't think of any other cases where it could matter, as at least the
work done inside vacuum_rel() itself seems non-rollbackable.

-- 
Hannu Krosing [EMAIL PROTECTED]


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] PATCH to allow concurrent VACUUMs to not lock each

2005-05-23 Thread Hannu Krosing
On Mon, 2005-05-23 at 11:42 -0400, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  I can't think of any other cases where it could matter, as at least the
  work done inside vacuum_rel() itself seema non-rollbackable.
 
 VACUUM FULL's tuple-moving is definitely roll-back-able, so it might be
 prudent to only do this for lazy VACUUM.  But on the other hand, VACUUM
 FULL holds an exclusive lock on the table so no one else is going to see
 its effects concurrently anyway.

I'm not interested in VACUUM FULL at all. This is an improvement mainly
for heavy-update OLAP databases, where I would not even think of running
VACUUM FULL.

I'll check if there's an easy way to exclude VACUUM FULL.

 As I said, it needs more thought than I've been able to spare for it yet
 ...

Ok, thanks for the comments so far.

-- 
Hannu Krosing [EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] COPY Fillfactor patch

2005-04-20 Thread Hannu Krosing
On Tue, 2005-04-12 at 18:38 +0100, Simon Riggs wrote:
 On Tue, 2005-04-12 at 09:56 -0400, Tom Lane wrote:

(Neil, I added you to CC: to show you at least two more places where
sparse heaps can be generally useful and not just tweaks for a single
benchmark)

  Simon Riggs [EMAIL PROTECTED] writes:
   During recent tuning of the TPC-C workload, I produced the following
   patch to force COPY to leave some space in each data block when it loads
   data into heap relations.

When I contemplated a similar idea some time ago, my primary goal was
reducing disk head movement during massive updates.

At that time it seemed to me cheaper not to leave space in each page,
but to leave each Nth page empty, as more new tuples will be put on the
same page and thus cause fewer WAL writes.

Warning: this may be a false assumption - I did not check the code to
see if this actually holds for any, or even a significant number of,
cases.
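
For illustration, a tiny standalone sketch of the every-Nth-page rule
(plain C, invented names, not the posted patch):

#include <stdbool.h>

/*
 * Hypothetical sketch: during bulk load, leave every Nth heap page
 * completely empty so later UPDATEs can place new tuple versions close
 * to the old ones; 0 disables the feature.
 */
static bool
bulk_load_may_use_page(unsigned int blkno, unsigned int leave_every_nth)
{
	if (leave_every_nth == 0)
		return true;		/* fill every page, as COPY does today */
	return ((blkno + 1) % leave_every_nth) != 0;	/* skip each Nth page */
}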

  Put the info into the Relation
  structure instead of cluttering heap_insert's API.  (This would mean
  that *every* heap_insert is affected, not only COPY, which is what you
  want anyway I would think.)
 
 Well, I intended it to work *only* for copy, not for insert, which is
 why I did it that way. 

To be more generally useful, a similar thing should be added to VACUUM
FULL and CLUSTER.

And perhaps some weird LAZY version of VACUUM EXPAND could be written
as well, for keeping the holes from filling up in constantly growing
databases.

Having these holes is also essential if we want a cheap way to keep
data in CLUSTER order after the initial CLUSTER command - if we have
free space everywhere in the file, we can just put each new tuple on
the first page with free space on or after its preceding tuple in the
cluster index.
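
A minimal sketch of that placement rule, with an in-memory array of
per-page free bytes standing in for the real free space map (invented
names, plain C):

#include <stddef.h>

/*
 * Hypothetical sketch: take the first page at or after the previous
 * tuple's page that still has enough room, so new tuples land near
 * their cluster-order neighbours; -1 means the relation must be
 * extended.
 */
static long
first_page_with_room(const size_t *free_bytes, size_t npages,
					 size_t start_page, size_t needed)
{
	for (size_t blk = start_page; blk < npages; blk++)
		if (free_bytes[blk] >= needed)
			return (long) blk;
	return -1;
}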

 Anyway, when I get time, I'll finish off the patch. Unless other readers
 would like to follow on.

I hope you will get that time before 8.1 ships :)

-- 
Hannu Krosing [EMAIL PROTECTED]

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [pgsql-hackers-win32] [PATCHES] SRA Win32 sync() code

2003-11-17 Thread Hannu Krosing
Bruce Momjian wrote on Mon, 17.11.2003 at 03:58:

 
 OK, let me give you my logic and you can tell me where I am wrong.
 
 First, how many backends can a single write process support if all the
 backends are doing insert/update/deletes?  5?  10?  Let's assume 10. 
 Second, once we change write to write/fsync, how much slower will that
 be?  100x, 1000x?  Let's say 10x.
 
 So, by my logic, if we have 100 backends all doing updates, we will need
 10 * 100 or 1000 writer processes or threads to keep up with that load. 
 That seems quite excessive to me from a context switching and process
 overhead perspective.
 
 Where am I wrong?

Maybe you meant 100/10 instead of 100*10 ;) If one writer keeps up with
10 backends, then 100 backends need 100/10 = 10 writers to start with,
and even the assumed 10x write/fsync slowdown only scales that to 100
writers, not 1000.


Hannu

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] ALTER TABLE modifications

2003-11-16 Thread Hannu Krosing
Rod Taylor wrote on Sat, 08.11.2003 at 18:55:
 A general re-organization of Alter Table. Node-wise, it is an
 AlterTableStmt with a list of AlterTableCmds.  The Cmds are the
 individual actions to be completed (Add constraint, drop constraint, add
 column, etc.)
 
 Processing is done in 2 phases. The first phase updates the system
 catalogs and creates a work queue for the table scan. The second phase
 is to conduct the actual table scan evaluating all constraints and other
 per tuple processing simultaneously, as required. This has no effect on
 single step operations, but has a large benefit for combinational logic
 where multiple table scans would otherwise be required.

...

 ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...; 
 Currently migrates indexes, check constraints, defaults, and the
 column definition to the new type with optional transform. If
 the transform is not supplied, a standard assignment cast is
 attempted.

Do you have special cases for type changes which don't need data
transforms?

I mean things like changing VARCHAR(10) to VARCHAR(20), dropping the NOT
NULL constraint, or changing CHECK (A < 3) to CHECK (A < 4).

All these could be done with no data migration or extra checking.
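
As a toy illustration of the kind of automatic test this would need
(hypothetical, plain C, not part of any patch):

#include <stdbool.h>

/*
 * Hypothetical sketch: a VARCHAR length change needs a table scan only
 * when the new limit is tighter than the old one; a negative limit
 * stands for "unlimited".
 */
static bool
varchar_change_needs_scan(int old_limit, int new_limit)
{
	if (new_limit < 0)
		return false;	/* dropping the limit is always safe */
	if (old_limit >= 0 && new_limit >= old_limit)
		return false;	/* widening: existing rows still fit */
	return true;		/* narrowing: existing data must be re-checked */
}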

So how much of this should PG attempt to detect automatically, and
should there be a NOSCAN option for when the programmer knows better
(e.g. changing CHECK (ABS(A) < 3) into CHECK (9 > A*A))?


Hannu


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] ALTER TABLE modifications

2003-11-14 Thread Hannu Krosing
Alvaro Herrera wrote on Fri, 14.11.2003 at 16:17:
 On Fri, Nov 14, 2003 at 08:59:05AM -0500, Dave Cramer wrote:
 
  I tried the current patch on an RC2 release, and I noticed one
  undesirable side effect.
  
  Modifying a column moves it to the end. In high availability situations
  this would not be desirable; I would imagine it would break lots of
  code.
 
 This is expected.  Doing otherwise would incur a much bigger
 performance hit.

Not necessarily, but it would make the patch much bigger ;)

IIRC there was discussion about splitting column numbers into physical
and logical numbers back when DROP COLUMN was done.
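
In a tiny hypothetical sketch (invented names, not from any actual
patch), the split would look like:

/*
 * Hypothetical sketch of the physical/logical split: the stored tuple
 * layout is addressed by attphysnum, while SELECT *, psql's \d and
 * INSERT column order follow attlognum, which can be renumbered freely
 * without rewriting any on-disk data.
 */
typedef struct ColumnNumbering
{
	int		attphysnum;		/* position of the field on disk */
	int		attlognum;		/* position shown to users */
} ColumnNumbering;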

 Anyway, IMHO no code should use SELECT * in any case, which is the only
 scenario where one would expect physical column order to matter, isn't
 it?

and this could also break when just changing the column type.


Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] ALTER TABLE modifications

2003-11-13 Thread Hannu Krosing
Peter Eisentraut wrote on Wed, 12.11.2003 at 21:02:
 Rod Taylor writes:
 
  ALTER TABLE tab ADD COLUMN col DEFAULT 3, ADD CHECK (anothercol < 3);
  The above combinational syntax is commented out in gram.y. The
  support framework is used in both the above and below items, but
  arbitrary statements probably have some issues -- I've not
  tested enough to determine.
 
  If it is useful, it will be submitted at a later date.
 
 I think it's perfectly fine to write two separate ALTER TABLE statements.

I guess the difference is that each pass (i.e. each separate ALTER
TABLE) needs to do another scan and copy of the table. Putting them in
one statement allows all the alterations to be done in one pass.
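
Roughly, as a hypothetical standalone sketch (invented names, not the
actual patch):

#include <stddef.h>

/*
 * Hypothetical sketch of the two-phase scheme: phase 1 queues one work
 * item per subcommand, phase 2 applies every queued item to each tuple
 * during a single sequential scan of the table.
 */
typedef struct WorkItem
{
	void		(*apply) (void *tuple);	/* check, default, cast, ... */
	struct WorkItem *next;
} WorkItem;

static void
rewrite_table_one_pass(void **tuples, size_t ntuples, WorkItem *queue)
{
	for (size_t i = 0; i < ntuples; i++)	/* one scan of the table */
		for (WorkItem *w = queue; w != NULL; w = w->next)
			w->apply(tuples[i]);			/* all subcommands per tuple */
}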

 No need to introduce this nonstandard syntax.
 
  ALTER TABLE tab ALTER COLUMN col TYPE text TRANSFORM ...;
  Currently migrates indexes, check constraints, defaults, and the
  column definition to the new type with optional transform. If
  the transform is not supplied, a standard assignment cast is
  attempted.
 
 Please don't use the term transform.  It is used by the SQL standard for
 other purposes. 

Is the other use conflicting with this syntax?

I think we have preferred reusing existing keywords to adding new ones
in the past.

--
Hannu


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])