Re: [HACKERS] Horizontal scalability/sharding

2015-08-31 Thread Marc Munro
On Mon, 2015-08-31 at 22:21 +,  Robert Haas wrote:

> It seems to me that sharding consists of (1) breaking your data set up
> into shards, (2) possibly replicating some of those shards onto
> multiple machines, and then (3) being able to access the remote data
> from local queries. [...]

I believe there is another aspect to sharding that I have not yet seen
mentioned, which is one of connection routing.  

One use case that I have been involved in, is to simply partition the
application into entirely, or almost entirely, separate datasets running
on separate databases with little or no need for queries to access
remote data.

This allows each database to deal only with connections from clients
that actually want its local data, greatly reducing the number of
connections on any individual database.  If this works for your
application, your ability to scale is great.

The pain point comes from trying to route queries to the correct
database.  Inevitably, everyone taking this route builds custom
connection-selection layers into their apps.

It seems to me that even with the more sophisticated types of sharding
being discussed here, the ability to conditionally route a
query/connection to a suitable starting database could be quite
beneficial.

Although this is probably a job for the pgbouncer/pgpool developers
rather than the hackers on this list, this thread seems to be a good
place to mention it.

__
Marc




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [BUGS] BUG #11867: Strange behaviour with composite types after resetting database tablespace

2014-11-04 Thread Marc Munro
On Mon, 2014-11-03 at 22:08 -0500, Tom Lane wrote:
 spockFascinating./spock

:-)

 I believe what is happening is:

[ . . . ]

  This is not mission-critical for me but I'd be grateful for suggestions for
  work-arounds.
 
 I don't see any workaround that's much easier than fixing the bug.
 But what's your use case that involves flipping databases from one
 tablespace to another and then back again within the life expectancy of
 unused shared-buffers pages?  It doesn't seem terribly surprising that
 nobody noticed this before ...

It's a completely unreal use case.  I am working on a diff tool, and
this was found as part of my test suite: build db x, drop db x,  build
db y, apply diff y-x, compare with original x, apply diff x-y, compare
with original y.

So this is a fairly minor inconvenience for me.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [v9.4] row level security

2013-10-10 Thread Marc Munro
On Wed, 2013-09-04 at 14:35 +, Robert Haas wrote:
 
 On Fri, Aug 30, 2013 at 3:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  I think it's entirely sensible to question whether we should reject
 (not
  hold up) RLS if it has major covert-channel problems.
 
 We've already had this argument before, about the security_barrier
[ . . . ]

Sorry for following up on this so late, I have just been trying to catch
up with the mailing lists.

I am the developer of Veil, which this thread mentioned a number of
times.  I wanted to state/confirm a number of things:

Veil is not up to date wrt Postgres versions.  I didn't release a new
version for 9.2, and when no-one complained I figured no-one other than
me was using it.  I'll happily update it if anyone wants it.

Veil makes no attempt to avoid covert channels.  It can't.

Veil is a low-level toolset designed for optimising queries about
privileges.  It allows you to build RLS with reasonable performance, but
it is not in itself a solution for RLS.

I wish the Postgres RLS project well and look forward to its release in
Postgres 9.4.  

__
Marc




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] leakproof

2012-02-22 Thread Marc Munro
On Wed, 2012-02-22 at 12:44 -0400, Andrew Dunstan wrote:

 Returning to the original point, I've come to the conclusion that
 pure 
 isn't the right way to go. The trouble with leakproof is that it 
 doesn't point to what it is that's not leaking, which is information 
 rather than memory, as many might imagine (and I did) without further 
 hints. I'm not sure any single English word would be as descriptive as
 I'd like.

As the developer of veil I feel marginally qualified to bikeshed here:
how about silent?  A silent function being one that will not blab.

There are also quite a few synonyms in the thesaurus for trustworthy.  I
kind of like honorable or righteous myself.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Schema grants for creating and dropping objects

2011-10-09 Thread Marc Munro
On Sun, 2011-10-09 at 11:58 -0400, Tom Lane wrote:
 Marc Munro m...@bloodnok.com writes:
  It seems that in order to create an object in a given schema, I must
  have been granted create privilege on the schema.  But in order to drop
  that object I require usage privilege.  
 
  This means that with the right privilege settings I can create objects
  that I cannot subsequently drop, or can drop an object that I cannot
  recreate.
 
 Yeah.  So?  You can get similar effects with read-only or write-only
 directories in Unix filesystems.  Don't see why you find this surprising.

It's just that ordinarily as the owner of an object, I can do what I
like with it.  In this case, I can't.  In fact, once I've created the
table I can't access it.  I guess the surprising thing to me is that I'm
allowed to create it without usage privilege.

However, it is what it is, and as it is intended behaviour I will
happily work with it.  Perhaps some extra notes in the documentation of
the sql-grants section might be useful.

As a side note: creating a file in a write-only directory on Unix
doesn't work for me.

   regards, tom lane

Thanks for the response.

__
Marc



signature.asc
Description: This is a digitally signed message part


[HACKERS] Schema grants for creating and dropping objects

2011-10-08 Thread Marc Munro
It seems that in order to create an object in a given schema, I must
have been granted create privilege on the schema.  But in order to drop
that object I require usage privilege.  

This means that with the right privilege settings I can create objects
that I cannot subsequently drop, or can drop an object that I cannot
recreate.

I assume this is a bug but if it's intended behaviour I'd love to hear
the rationale.

I checked this on 8.3, 8.4, 9.0 and 9.1 all with the same results.

Best regards.
__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [GENERAL] Dropping extensions

2011-07-23 Thread Marc Munro
On Sat, 2011-07-23 at 11:08 -0400, Tom Lane wrote:
  If I drop the extension veil_demo, I am left with the veil_demo version
  of veil_init().
 
  Is this a feature or a bug?  Is there a work-around?
 
 Hmm.  I don't think we have any code in there to prohibit the same
 object from being made a member of two different extensions ... but this
 example suggests that maybe we had better check that.
 
 In general, though, it is not intended that extension creation scripts
 use CREATE OR REPLACE, which I gather you must be doing.

That's right.  Ultimately I'd like to be able to create a number of
extensions, all further extending the base functionality of veil, with
each one further extending veil_init().  I could consider a more
generalised callback mechanism but that adds more complexity and
overhead without really buying me anything functionally.  I will look
into it though.

While it would be great to be able to return functions to their previous
definition automatically, other simpler mechanisms might suffice.  For
me, being able to run a post-drop script would probably be adequate.
For now, I will just add some notes to the documentation.

Thanks for the response.

__
Marc



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [GENERAL] Date conversion using day of week

2011-03-31 Thread Marc Munro
On Thu, 2011-03-31 at 08:00 -0700, Adrian Klaver wrote:
 On Wednesday, March 30, 2011 8:39:25 pm Brendan Jurd wrote:
  On 31 March 2011 03:15, Steve Crawford scrawf...@pinpointresearch.com 
  wrote:
   On 03/29/2011 04:24 PM, Adrian Klaver wrote:
   ...
   Well the strange part is only fails for SUN:...
[. . .]
  
  We *could* make the OP's query return the Sunday of ISO week 2011-13,
  which would be properly written 2011-13-7, but I think the right move
  here would be to throw the error for illegal mixture of format tokens.
   This is a trivial change -- just a matter of changing the from_date
  type on the DAY, Day, day, DY, Dy, dy keys.
[. . .]
 Just to play Devils advocate here, but why not? The day name is the same 
 either 
 way, it is the index that changes. I am not sure why that could not be 
 context 
 specific?

Just to be clear, the reason I was mixing things in this way was that I
wanted to validate that the dayname being passed was valid for the
current locale, and I could find no easier way of doing it.  FTR, I have
now resorted to finding the given dayname in the results of this query:

select day, to_char(day, 'dy') as dayname, 
   extract('dow' from day) as dayno
  from (
select current_date + n as day 
  from generate_series(0, 6) as n) d;

If there is an easier way of doing this, please let me know.  As far as
the postgres API goes, exposing a function that would validate a dayname
returning a day number would resolve all of this for considerably less
complexity.  Also throwing an error in the to_date function for
unexpectedly mixed input formats seems quite reasonable.

Thanks for your time and attention.  The commercial RDBMS vendors could
learn a lot about customer support from this forum.

__
Marc Munro


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [PATCH] Fix leaky VIEWs for RLS

2010-06-04 Thread Marc Munro
On Fri, 2010-06-04 at 10:33 -0400, Tom Lane wrote:
 Hmm ... that's a mighty interesting example, because it shows that any
 well-meaning change in error handling might render seemingly-unrelated
 functions unsafe.  And we're certainly not going to make error
 messages stop showing relevant information just because of this.

Although that looks like a show-stopper, I think it can be worked
around.  Errors in operations on security views could simply be caught
and conditionally rewritten.  The original error could still appear in
the logs but the full details need not be reported to unprivileged
users.

If that can be done, then we would still need to be able to identify
trusted functions and views used for security purposes, and ensure that
(please excuse my terminology if it is incorrect) untrusted quals do not
get pushed down inside secured views.  If all of that can be done along
with the error trapping, then we may have a solution.

My big concern is still about performance, particularly when joining
between multiple security views and other objects.  I don't expect to
get security for free but I don't want to see unnecessary barriers to
optimisation.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [pgsql-hackers] Daily digest v1.10705 (13 messages)

2010-06-03 Thread Marc Munro
On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-ow...@postgresql.org
wrote:
 [ . . . ]

 In my current idea, when a qual-node that contains FuncExpr tries to
 reference a part of relations within a security view, its referencing
 relations will be expanded to whole of the security view at
 distribute_qual_to_rels().
 [ . . . ]

I may be missing something here but this seems a bit too simplistic and,
I think, fails to deal with an important use case.

Security views, that do anything useful at all, tend to introduce
performance issues.  I am concerned that placing a conceptual barrier
between the secured and unsecured parts of queries is going to
unnecessarily restrict what the optimiser can do.

For example consider that we have three secured views, each of the form:

  create view s_x as select * from x where i_can_see(x.key);

and consider the query:

  select stuff from s_x 
inner join s_y on s_y.key = s_x.key
inner join s_z on s_z.key = s_x.key  
  where fn(s_x.a) = 3;

The optimiser ought to be able to spot the fact that i_can_see() need
only be called once for each joined result.  By placing a barrier (if I
understand your proposal correctly) between the outermost joins and the
inner views, doesn't this optimisation become impossible?

I think a simpler solution may be possible here.  If you can tag the
function i_can_see() as a security function, at least in the context of
its use in the security views, and then create the rule that security
functions are always considered to be lower cost than user-defined
non-security functions, don't we achieve the result of preventing the
insecure function from seeing rows that it shouldn't?

I guess my concern is that a query may be constructed a=out of secured
and unsecured parts and the optimiser should be free to group all of the
secured parts together before considering the unsecured parts.

Sorry for the imprecise language and terminolgy, and also if I have
completely misunderstood the implications.

Best Wishes

__
Marc (the veil guy)



signature.asc
Description: This is a digitally signed message part


[HACKERS] Hot Standby query cancellation and Streaming Replication integration

2010-03-02 Thread Marc Munro
On Mon, 2010-03-01 at 16:12 -0400, pgsql-hackers-ow...@postgresql.org
wrote:
 . . . 
 However there is a concern with max_standby_age. If you set it to,
 say, 300s. Then run a 300s query on the slave which causes the slave
 to fall 299s behind. Now you start a new query on the slave -- it gets
 a snapshot based on the point in time that the slave is currently at.
 If it hits a conflict it will only have 1s to finish before the
 conflict causes the query to be cancelled.
 
 In short in the current setup I think there is no safe value of
 max_standby_age which will prevent query cancellations short of -1. If
 the slave has a constant stream of queries and always has at least one
 concurrent query running then it's possible that the slave will run
 continuously max_standby_age-epsilon behind the master and cancel
 queries left and right, regardless of how large max_standby_age is.
 
 To resolve this I think you would have to introduce some chance for
 the slave to catch up. Something like refusing to use a snapshot older
 than max_standby_age/2  and instead wait until the existing queries
 finish and the slave gets a chance to catch up and see a more recent
 snapshot. The problem is that this would result in very unpredictable
 and variable response times from the slave. A single long-lived query
 could cause replay to pause for a big chunk of max_standby_age and
 prevent any new query from starting.
 
 Does anyone see any way to guarantee that the slave gets a chance to
 replay and new snapshots will become visible without freezing out new
 queries for extended periods of time?

At the risk of looking foolish, I have a hand-wavy,
unlikely-to-be-possible, not-going-to-make-it-for-9.0, and maybe
unoriginal idea that I'll share.

As Greg has identified, no matter what max_standby_age you select, a
sequence of overlapping queries will eventually exceed the
max_standby_delay threshold, and tuples that your query depends on would
then be modified underneath you.

IIUC this is only a problem for WAL from HOT updates and vacuums.  If no
vacuums or HOT updates have been performed, there is no risk of
returning bad data.  So WAL that does not contain HOT updates or vacuums
could be applied on the standby without risk, even if there are
long-running queries in play.  This is not a complete solution but may
reduce the likelihood of queries having to be cancelled.  I guess the
approach here would be to check WAL before applying it, and only cancel
queries if the WAL contains HOT updates or vacuums.

Taking the idea further, if WAL records contained the tid of the latest
tuples that were overwritten, even more WAL could be applied without
having to cancel queries.

To take it further still, if vacuum on the master could be prevented
from touching records that are less than max_standby_delay seconds old,
it would be safe to apply WAL from the very latest vacuum.  I guess HOT
could be handled similarly though that may eliminate much of the
advantage of HOT updates.

Apologies if this has already been covered, some of this discussion went
over my head.

/action Puts on asbestos underwear

__
Marc


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using views for row-level access control is leaky

2009-10-22 Thread Marc Munro
Just to intoduce myself, I'm Marc Munro the developer of Veil, a
postgres add-in that allows you to implement virtual private databases
using views.

The problem we are discussing here is the existence of covert or
side-channels being available from functions that can leak information
about the rows they see, even though the end-user may not see those
rows.  These can be built-ins such as set_config() (thanks, Heikki) or
user-defined.

I assert that any attempt to use a secured-view's where-clause
conditions before any other conditions are applied will lead to poor
performance.

Here is a typical veil secured view definition:

create view parties as
SELECT party_id, client_id, party_type_id, username, party_name
FROM parties.parties
WHERE api.user_has_client_or_personal_privilege(client_id,
party_id, 'select parties') 
OR api.user_has_client_privilege(party_id, 'select clients');

A typical query against this would be:

select * from parties where party_id = 42;

The conditions in the view's where clause cannot generally be indexed.
Applying those conditions before the user-supplied conditions would mean
that a full-table scan would be required and performance would suck.  In
fact, this very suckiness also exposes a covert channel in that now we
can use the performance of the query to estimate the number of party
records.

The most acceptable solution I have heard so far for this issue, is to
identify those functions which can leak information as 'insecure', and 
those views which are for security purpose as 'secured'.  Then it is
simply (hah!) a matter of planning the query of secured views so that
all insecure functions are called after all secure functions.  In this
way, they will only be able to leak what the user is entitled to see,
and performance will only be as badly affected as is strictly necessary.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] RFE: Transparent encryption on all fields

2009-04-23 Thread Marc Munro
On Thu, 2009-04-23 at 16:08 -0300, pgsql-hackers-ow...@postgresql.org
wrote:
 On Thu, Apr 23, 2009 at 10:38:55AM -0400, Bill Moran wrote:
 
 [...]
 
  It's possible that this could be accomplished by something like
 Veil,
 
 Veil? Care to share an URL?

http://veil.projects.postgresql.org/curdocs/index.html

Veil is intended to enable implementation of virtual private databases.
It provides a bunch of primitives for managing bitmaps of privileges.
These privileges can be used to control access to individual rows within
a table.

In principle it could be used in the way that Bill Moran suggests though
I have never used it that way.  I am somewhat suspicious of passing
encryption keys to the database server as there is always the potential
for them to be leaked.  It is generally much safer to keep keys and the
decryption process on a separate server.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [BUGS] Status of issue 4593

2009-01-12 Thread Marc Munro
On Mon, 2009-01-12 at 14:35 -0400, Jeff Davis wrote:
 ate: Mon, 12 Jan 2009 09:52:00 -0800

 On Mon, 2009-01-12 at 08:32 -0500, Tom Lane wrote:
  That code has been working like this for eight or ten years now and
 this
  is the first complaint, so taking away functionality on the grounds
 that
  someone might happen to update the ordering column doesn't seem like
 the
  answer to me.
  
 
 If they are using FOR UPDATE, they clearly expect concurrent updates.
 If they're using ORDER BY, they clearly expect the results to be in
 order.
 
 So who is the target user of this functionality we're trying to
 protect?

If by the question above you are asking for a rational use-case, I think
I can give you one.

In my Oracle days I used to use select for update order by, in order to
reduce the likelihood of deadlocks.  If locks are always taken in the
same order, deadlocks become considerably less likely.

Unfortunately, I took this construct, which as far as I know works fine
in Oracle, and continued to use it in Postgres just assuming that it
would work.

At least now the source of some of my more mysterious deadlocks is
apparent :-)

I'd second the request for at least a warning to be issued.
__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Patch to eliminate duplicate b64 code from pgcrypto

2008-12-18 Thread Marc Munro
Oops, forgot to cc my reply to hackers:

On Thu, 2008-12-18 at 01:49 +0200, Marko Kreen wrote:
 On 12/16/08, Bruce Momjian br...@momjian.us wrote:
   Would someone who understand pgcrypto please review this?
 
   Marc Munro wrote:
   I am attaching a patch to eliminate duplicate b64_encode and decode
functions from pgcrypto, and to expose those functions for use by
add-ins (I want to use them in Veil).
 
 Although your patch achieves the goal, it may make more sense to export
 pg_find_encoding() and struct pg_encoding, thus making all encoding
 algorithms available externally.

I had a very specific use-case in mind which was to eliminate the need
for Veil to re-implement b64_encode and b64_decode, so I took the path
of least effort.

The patch was made against CVS head today.  It compiles and tests
successfully.  Though I was unable to run pgrypto regression tests, I
did give the b64 encoding a sanity check.
   
I also added a b64_char() function that is now used from pgcrypto.  This
allowed me to remove the duplicate _base64 character array.
 
 I think this can be avoided by using plain b64_encode() for those 3 bytes.
 That step is really not speed critical.

You are probably right: I was being somewhat timid and tried to make the
smallest set of changes that were possible.

Unfortunately I don't have the time right now to revisit the patch.  If
you or anyone else would like to re-implement it as you suggest I will
still be a happy camper.   If not, Veil has already re-implemented the
functions, for compatibility with current and older versions of
postgres, so I can live with that too.

Thanks very much for looking at it.  To be honest, I thought that it had
got lost in the transition to the new improved patch process so I was
expecting to have to resubmit it later when I have more time.  Kudos to
Bruce for tracking it down.

__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] Re: Do we really want to migrate plproxy and citext into PG core distribution?

2008-07-21 Thread Marc Munro
On Mon, 2008-07-21 at 17:03 -0300, Tom Lane wrote:
 [. . .]  I think it
 would be a good idea to be open to reviewing pgfoundry code with the
 same standards we'd use if we were going to integrate it.  Perhaps
 commitfest is not the right venue for that, though, if only because
 of the possibility of confusion over what's supposed to happen.

I think this would be a great idea.  I would be overjoyed to have veil
http://pgfoundry.org/projects/veil/ reviewed by postgres developers.


__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] Patch to eliminate duplicate b64 code from pgcrypto

2008-07-17 Thread Marc Munro
I am attaching a patch to eliminate duplicate b64_encode and decode
functions from pgcrypto, and to expose those functions for use by
add-ins (I want to use them in Veil).

The patch was made against CVS head today.  It compiles and tests
successfully.  Though I was unable to run pgrypto regression tests, I
did give the b64 encoding a sanity check.

I also added a b64_char() function that is now used from pgcrypto.  This
allowed me to remove the duplicate _base64 character array.

I hope this is the correct place to submit the patch.  The wiki
(http://wiki.postgresql.org/wiki/Submitting_a_Patch) refers to
pgsql-patches but I believe that is now deprecated.


__
Marc
Index: contrib/pgcrypto/pgp-armor.c
===
RCS file: /projects/cvsroot/pgsql/contrib/pgcrypto/pgp-armor.c,v
retrieving revision 1.3
diff -c -r1.3 pgp-armor.c
*** contrib/pgcrypto/pgp-armor.c	15 Oct 2005 02:49:06 -	1.3
--- contrib/pgcrypto/pgp-armor.c	17 Jul 2008 22:37:27 -
***
*** 30,181 
   */
  
  #include postgres.h
  
  #include px.h
  #include mbuf.h
  #include pgp.h
  
  /*
-  * BASE64 - duplicated :(
-  */
- 
- static const unsigned char _base64[] =
- ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/;
- 
- static int
- b64_encode(const uint8 *src, unsigned len, uint8 *dst)
- {
- 	uint8	   *p,
- 			   *lend = dst + 76;
- 	const uint8 *s,
- 			   *end = src + len;
- 	int			pos = 2;
- 	unsigned long buf = 0;
- 
- 	s = src;
- 	p = dst;
- 
- 	while (s  end)
- 	{
- 		buf |= *s  (pos  3);
- 		pos--;
- 		s++;
- 
- 		/*
- 		 * write it out
- 		 */
- 		if (pos  0)
- 		{
- 			*p++ = _base64[(buf  18)  0x3f];
- 			*p++ = _base64[(buf  12)  0x3f];
- 			*p++ = _base64[(buf  6)  0x3f];
- 			*p++ = _base64[buf  0x3f];
- 
- 			pos = 2;
- 			buf = 0;
- 		}
- 		if (p = lend)
- 		{
- 			*p++ = '\n';
- 			lend = p + 76;
- 		}
- 	}
- 	if (pos != 2)
- 	{
- 		*p++ = _base64[(buf  18)  0x3f];
- 		*p++ = _base64[(buf  12)  0x3f];
- 		*p++ = (pos == 0) ? _base64[(buf  6)  0x3f] : '=';
- 		*p++ = '=';
- 	}
- 
- 	return p - dst;
- }
- 
- /* probably should use lookup table */
- static int
- b64_decode(const uint8 *src, unsigned len, uint8 *dst)
- {
- 	const uint8 *srcend = src + len,
- 			   *s = src;
- 	uint8	   *p = dst;
- 	char		c;
- 	unsigned	b = 0;
- 	unsigned long buf = 0;
- 	int			pos = 0,
- end = 0;
- 
- 	while (s  srcend)
- 	{
- 		c = *s++;
- 		if (c = 'A'  c = 'Z')
- 			b = c - 'A';
- 		else if (c = 'a'  c = 'z')
- 			b = c - 'a' + 26;
- 		else if (c = '0'  c = '9')
- 			b = c - '0' + 52;
- 		else if (c == '+')
- 			b = 62;
- 		else if (c == '/')
- 			b = 63;
- 		else if (c == '=')
- 		{
- 			/*
- 			 * end sequence
- 			 */
- 			if (!end)
- 			{
- if (pos == 2)
- 	end = 1;
- else if (pos == 3)
- 	end = 2;
- else
- 	return PXE_PGP_CORRUPT_ARMOR;
- 			}
- 			b = 0;
- 		}
- 		else if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
- 			continue;
- 		else
- 			return PXE_PGP_CORRUPT_ARMOR;
- 
- 		/*
- 		 * add it to buffer
- 		 */
- 		buf = (buf  6) + b;
- 		pos++;
- 		if (pos == 4)
- 		{
- 			*p++ = (buf  16)  255;
- 			if (end == 0 || end  1)
- *p++ = (buf  8)  255;
- 			if (end == 0 || end  2)
- *p++ = buf  255;
- 			buf = 0;
- 			pos = 0;
- 		}
- 	}
- 
- 	if (pos != 0)
- 		return PXE_PGP_CORRUPT_ARMOR;
- 	return p - dst;
- }
- 
- static unsigned
- b64_enc_len(unsigned srclen)
- {
- 	/*
- 	 * 3 bytes will be converted to 4, linefeed after 76 chars
- 	 */
- 	return (srclen + 2) * 4 / 3 + srclen / (76 * 3 / 4);
- }
- 
- static unsigned
- b64_dec_len(unsigned srclen)
- {
- 	return (srclen * 3)  2;
- }
- 
- /*
   * PGP armor
   */
  
--- 30,42 
   */
  
  #include postgres.h
+ #include utils/builtins.h
  
  #include px.h
  #include mbuf.h
  #include pgp.h
  
  /*
   * PGP armor
   */
  
***
*** 215,234 
  	memcpy(pos, armor_header, n);
  	pos += n;
  
! 	n = b64_encode(src, len, pos);
  	pos += n;
  
  	if (*(pos - 1) != '\n')
  		*pos++ = '\n';
  
  	*pos++ = '=';
! 	pos[3] = _base64[crc  0x3f];
  	crc = 6;
! 	pos[2] = _base64[crc  0x3f];
  	crc = 6;
! 	pos[1] = _base64[crc  0x3f];
  	crc = 6;
! 	pos[0] = _base64[crc  0x3f];
  	pos += 4;
  
  	n = strlen(armor_footer);
--- 76,95 
  	memcpy(pos, armor_header, n);
  	pos += n;
  
! 	n = b64_encode((char *) src, len, (char *) pos);
  	pos += n;
  
  	if (*(pos - 1) != '\n')
  		*pos++ = '\n';
  
  	*pos++ = '=';
! 	pos[3] = b64_char(crc);
  	crc = 6;
! 	pos[2] = b64_char(crc);
  	crc = 6;
! 	pos[1] = b64_char(crc);
  	crc = 6;
! 	pos[0] = b64_char(crc);
  	pos += 4;
  
  	n = strlen(armor_footer);
***
*** 356,367 
  		goto out;
  
  	/* decode crc */
! 	if (b64_decode(p + 1, 4, buf) != 3)
  		goto out;
  	crc = (((long) buf[0])  16) + (((long) buf[1])  8) + (long) buf[2];
  
  	/* decode data */
! 	res = b64_decode(base64_start, base64_end - base64_start, dst);
  
  	/* check crc */
  	if (res = 0  crc24(dst, res) != crc)
--- 217,229 

[HACKERS] b64_encode and decode

2008-06-12 Thread Marc Munro
I require base64 or some similar encoding scheme from a C language
extension and need it to be as fast as reasonably possible.  In 
src/backend/utils/adt/encode.c there are functions b64_encode and
b64_decode which would be ideal but these are defined static and so are
not available to my code.

I know I could call these functions indirectly by calling binary_ecncode
through DirectFunctionCalln() but this is a whole lot more complexity
and overhead than I'd like.

I note that /contrib/pgcrypto/pgp-armor.c appears to have its own copies
of these 2 functions and now I have elected to do the same.

So, would there be any chance of redefining the base64 functions in
encode.c as extern to eliminate this redundancy?

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] b64_encode and decode

2008-06-12 Thread Marc Munro
On Thu, 2008-06-12 at 19:07 -0400, Andrew Dunstan wrote:
 Marc Munro wrote:
  I require base64 or some similar encoding scheme from a C language. . .
 
  I know I could call these functions indirectly by calling binary_ecncode
  through DirectFunctionCalln() but this is a whole lot more complexity
  and overhead than I'd like. . .
 
 Just how much complexity do you think calling binary_encode involves? 
 You can probably do the whole thing in one or two lines of code.

I'm sure that's true once I've got my head around the mechanism, but it
adds two levels of indirection that seem quite unnecessary, and given
that the author of pgcrypto has also wound up copying the functions I
guess I'm not the only one who'd rather avoid it.

If there are good reasons not to expose the functions, or if the hackers
just don't want to do it I'm fine with that.  For dealing with 8.3 and
earlier I will have to live with the redundancy.  For 8.4 I'd like not
to, but it's really not a big deal.

Thanks for the response though.  Being able to get a response from
developers means a lot.  I used to be an Oracle DBA and I have to say
the response I get from this group is light years ahead of what I used
to have to pay for.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] b64_encode and decode

2008-06-12 Thread Marc Munro
On Thu, 2008-06-12 at 19:10 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  So, would there be any chance of redefining the base64 functions in
  encode.c as extern to eliminate this redundancy?
 
 It'd only last until the next time Bruce runs his script that
 static-izes things that aren't used outside their own module ...

Hmmm.  Does that script look in contrib?  If so I'd be happy to provide
a patch to eliminate the redundancy there.  If not, maybe it could be
persuaded to be more inclusive?

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [0/4] Proposal of SE-PostgreSQL patches

2008-05-07 Thread Marc Munro
In a digest for Tue, 2008-05-06 at 22:57 -0300, Tom Lane wrote:
...[discussion of SE-PostgreSQL patch deleted]...
 (And of course the next question after that is why we should want to
 depend on SELinux at all, rather than implementing row filtering
 in the framework of SQL permissions...)

I would love to see something like this to replace Veil which I develop
and support.  What sort of row filtering did you have in mind?

As a database application developer I have found being able to define
access controls in terms of data relationships to be tremendously useful
and I would love to see something like this built into postgres.

As far as I have been able to understand SE-PostgreSQL, it is aimed at a
very security-conscious, and expert, customer base but it cannot offer
the sort of relationally-defined security access that Veil is intended
for (I'd be happy to be wrong about this).  On the other hand Veil is
not going to be able to provide the degree of certainty (provability?)
of SE-PostgreSQL.

As an example of relationally-defined security, suppose I want only the
members of a project team to be able to see project information:

In the veil demo application (
http://veil.projects.postgresql.org/curdocs/demo-model.html ) we can
assign a developer to a project by inserting into the assignments table
a record for that developer, the given project and a specific role.
Once assigned, the developer can see project_details for that project
that previously were unavailable.

If row filtering is to be implemented directly within Postgres, I would
like this sort of capability to be considered.  For the record, Veil was
written to provide similar functionality to Oracle's Virtual Private
Databases.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] function body actors (was: [PERFORM] viewing source code)

2007-12-21 Thread Marc Munro
On Fri, 2007-21-12 at 18:05 -0400, Andrew Sullivan [EMAIL PROTECTED]
wrote:

   2.  Protect the content of a field from _some_ users on a given
 system,
  
  I would argue that (2) is reasonably well served today by setting up
  separate databases for separate users. 
 
 I thought actually this was one of the use-cases we were hearing.
 Different people using the same database (because the same data), with
 rules about the different staff being able to see this or that
 function body.  I can easily imagine such a case, for instance, in a
 large organization with different departments and different
 responsibilities.  It seems a shame that the only answer we have there
 is, Give them different databases.  

There is also Veil:
http://veil.projects.postgresql.org/curdocs/index.html


__
Marc



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Marc Munro
On Wed, 2007-02-05 at 08:27 -0400, Andrew Dunstan wrote:
 
 Naz Gassiep wrote:
  Andrew Dunstan wrote:

  Naz Gassiep wrote:
  
  I believe the suggestion was to have an automated process that only ran
  on known, sane patches.

  How do we know in advance of reviewing them that they are sane?
  
  Same way as happens now. 

 
 The question was rhetorical ... there is no list of certified sane but 
 unapplied patches. You are proceeding on the basis of a faulty 
 understanding of how our processes work.

Why do we need to know the patch is sane?  If it does not apply cleanly
or causes regression tests to fail, the process would figure that out
quickly and cheaply.  There is little cost in attempting to apply a
non-sane patch.

I am not sure that I have explained exactly what I was suggesting.  Some
people seem to grok it, others seem to be talking something slightly
different.  To clarify, here it is in pseudo-code:

for each patch in the queue
  regression_success := false
  patch_success := attempt to apply patch to head
  if patch_success
regression_success := attempt to run regression tests
-- (On one machine only, not on the buildfarm)
  end if
  if this is a new patch
maybe mail the author and tell them patch_success and
regression_success
  else
if status is different from last time
  mail the author and tell them their patch has changed status
end
  end
  record the status for this patch
end loop

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Feature freeze progress report

2007-05-01 Thread Marc Munro
On Tue, 2007-01-05 at 02:54 +0100, Gregory Stark wrote:
 Naz Gassiep [EMAIL PROTECTED] writes:
 
  Even if the patch inventory wasn't kept right up to date, this system
  could potentially help many regression issues or bugs to surface sooner,
 
 I just don't understand what this would accomplish. People run regressions
 before submitting anyways. They can't run them on all architectures but bugs
 that only affect some architectures are uncommon.

The intention is to help keep patches, which may remain in the queue for
extended lengths of time, reasonably current.   The mechanism aims to
help with these specific problems:

- patches accumulates bitrot and are consequently harder to apply and
understand
- the author, by the time review occurs, no longer has the details of
the patch uppermost in their mind

If the author can be automatically prodded when the patch no longer
cleanly applies, or when it suddenly breaks regression tests, they will
be able to keep the patch current, may discover bugs in it themselves
prior to review, and it will remain more fresh in their minds.

For sure, there will be classes of patch for which this mechanism
provides no benefit.  For instance, where a patch contains code that is
for discussion only, or a patch that is dependant on another patch.  In
these cases, the mechanism would simply note that they don't apply
cleanly, or don't pass tests, and would do nothing further.  I can see
no harm here.

 This seems to be merely institutionalizing having a large backlog of patches
 which survive for long periods of time. But even in that situation I don't see
 what it buys us. Detecting bitrot isn't terribly helpful and it doesn't help
 us actually deal with the bitrot once it's happened.

I hope that it would not encourage reviewers to leave things in the
patch queue.  I don't see why it would, so don't think this would
institutionalize a backlog.  

I also disagree that detecting bitrot is not helpful.  If I had eagerly
submitted a patch, I would definitely want to fix any bitrot that
occurred and would be thankful to be automatically informed.

And to clarify, I do not think the buildfarm should be involved in this
process.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Feature freeze progress report

2007-04-30 Thread Marc Munro
On Mon, 2007-30-04 at 08:56 -0300, Heikki Linnakangaspgsql wrote:
 Date: Mon, 30 Apr 2007 09:18:36 +0100
 From: Heikki Linnakangas [EMAIL PROTECTED]
 To: Tom Lane [EMAIL PROTECTED]
 Cc: Dave Page [EMAIL PROTECTED], Simon Riggs
 [EMAIL PROTECTED], 
  Bruce Momjian [EMAIL PROTECTED],
  PostgreSQL-development pgsql-hackers@postgresql.org
 Subject: Re: Feature freeze progress report
 Message-ID: [EMAIL PROTECTED]

 If we had a 1-2 lines status blurp attached to each patch in the
 queue, 
 like waiting for review, author is fixing issue XX, etc., that
 might 
 help. Bruce would need to do that if we keep the current patch queue 
 system unmodified otherwise, or we'd need to switch to something else.

Would it be possible to also automatically determine some sort of
bit-rot status?  What I had in mind was an automated process that would
apply each patch to HEAD on a daily basis and report whether the patch
still applies cleanly and still allows all regression tests to pass on
at least one platform.  If and when the result of these tests changes
from pass to fail, the patch submitter would be automatically
notified.  

The patch status could then also show the last time at which the patch
applied cleanly, and the last time that regression tests ran
successfully.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-13 Thread Marc Munro
On Mon, 2007-12-02 at 00:10 -0500, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  Consider a table C containing 2 child records C1 and C2, of parent P.
  If transaction T1 updates C1 and C2, the locking order of the the
  records will be C1, P, C2.  Another transaction, T2, that attempts to
  update only C2, will lock the records in order C2, P.
 
  The locks on C2 and P are taken in different orders by the two
  transactions, leading to the possibility of deadlock.
 
 But the lock on P is shared, hence no deadlock.

Doh!  Yes, you are right.  It is not that simple.

For deadlock to occur, we need a transaction that takes an exclusive
lock on P as well as on one of the children.  Let us replace T2 with a
new transaction, T3, which is going to update P and only one of its
children.

If T3 is going to update P and C1 without the possibility of deadlock
against T1, then it must take out the locks in the order C1, P.  If, on
the other hand, it is going to update P and C2, then the locks must be
taken in the order P, C2.

This means that there is no single strategy we can apply to T3 that will
guarantee to avoid deadlocks with transactions that update only C (ie
transactions, which to a developers point of view do nothing to P, and
so should be unable to deadlock with T3).

From an application developer's standpoint there are few options, none
of them ideal:

1) Insist on a locking policy that requires updates to first lock their
parent records.

This is horrible for so many reasons.  It should be unnecessary; it
causes exclusive locking on parent records, thereby eliminating the
gains made by introducing row share locks in 8.1; it is onerous on the
developers; it is error-prone; etc

2) Remove FK constraints to eliminate the possibility of RI-triggered
deadlocks.

Ugh.

3) Encapsulate all transactions in some form of retry mechanism that
traps deadlocks and retries those transactions.

This may not be practicable, and incurs all of the overhead of
encountering and trapping deadlocks in the first place.  Also, as each
deadlock occurs, a number of locks will be left active before deadlock
detection kicks in, increasing the window for further deadlocks.  On a
busy system, the first deadlock may well trigger a cascade of further
deadlocks.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-13 Thread Marc Munro
On Tue, 2007-13-02 at 11:38 -0500, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  From an application developer's standpoint there are few options, none
  of them ideal:
 
 How about
 
 4) Make all the FK constraints deferred, so that they are only checked
 at end of transaction.  Then the locking order of transactions that only
 modify C is always C1, C2, ..., P.

Excellent suggestion.  Thank you.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-11 Thread Marc Munro
On Sun, 2007-11-02 at 12:21 -0600, Jim C. Nasby wrote:
 On Thu, Feb 08, 2007 at 08:47:42AM -0800, Marc Munro wrote:
  One of the causes of deadlocks in Postgres is that its referential
  integrity triggers can take locks in inconsistent orders.  Generally a
  child record will be locked before its parent, but not in all cases.
 
 Where would PostgreSQL lock the parent before the child? AFAIK the
 behavior should be consistent...

Consider a table C containing 2 child records C1 and C2, of parent P.
If transaction T1 updates C1 and C2, the locking order of the the
records will be C1, P, C2.  Another transaction, T2, that attempts to
update only C2, will lock the records in order C2, P.

The locks on C2 and P are taken in different orders by the two
transactions, leading to the possibility of deadlock.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
Oops, forgot to include pgsql-hackers when I responded to this the first
time.

 On Tue, 2007-06-02 at 20:53 -0500, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  The RI triggers currently fire when a record is updated.  Under my
  proposal they would fire in the same way but before the record is
locked
  rather than after.  Or am I missing your point?
 
 IOW, some other transaction could update or delete the tuple
meanwhile?
 Doesn't seem very promising.
 

That other transaction, T1, would have run the same RI triggers and so
would have the same parent records locked.  The blocked transaction, T2,
once T1 has committed, would fail.

I don't see this as being much different from the current case, where T1
locks and deletes or updates a row, and T2 then tries to manipulate the
same row.  In both cases, locks manage the race for the row, and MVCC
ensures that T2 fails.

__
Marc



signature.asc
Description: This is a digitally signed message part


[HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-08 Thread Marc Munro
I am going to restate my earlier proposal, to clarify it and in the hope
of stimulating more discussion.

One of the causes of deadlocks in Postgres is that its referential
integrity triggers can take locks in inconsistent orders.  Generally a
child record will be locked before its parent, but not in all cases.

My proposal modifies the order in which locks are taken by referential
integrity triggers, so that parents are always locked before their
children.

The proposal is, that referential integrity triggers should fire before
locking the tuple from which they are triggered.  I guess a new sort of
trigger is required for this, a before-lock trigger.

If this is a dumb idea, please tell me why.  If it would cause more
problems than it solves, ditto.  If it would be difficult to implement,
let's discuss and try to find solutions.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)

2007-02-08 Thread Marc Munro
On Thu, 2007-08-02 at 18:06 +0100, Csaba Nagy wrote:

 The problem is that eliminating the deadlock is still not the complete
 cake... the interlocking still remains, possibly leading to degraded
 performance on high contention on very common parent rows. The real
 solution would be when an update on the parent table's non-referenced
 fields is not interlocking at all with updates of the child rows... and
 I think there were some proposals to do that.

Agreed.  There are two issues here, unnecessary blocking and deadlock.
These can be tackled separately.  My proposal deals only with the
deadlock issue. 

Even if if contention is reduced, for instance by implementing
column-level locking, there will still be the potential for deadlock
arising from inconsistent ordering of locks.  I continue to stand by my
proposal.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
On Thu, 2007-08-02 at 10:06 -0800, Stephan Szabo wrote:
 On Thu, 8 Feb 2007, Marc Munro wrote:
. . .
 
  That other transaction, T1, would have run the same RI triggers and so
  would have the same parent records locked.
 
 That's not true in the case of delete, since the referencing table
 triggers are on insert and update. . . .

Let me see if I have this scenario right: 

Transaction T1 updates child record C1, with RI causing the parent P1 to
be locked before the child.

In the meantime transaction T2, successfully deletes C1 as it has not
yet been locked.

(Please tell me if I have misunderstood what you are saying)

Yes in this case, T1 must abort because the record it was going to
update has disappeared from underneath it.  I don't see how this is
significantly different from the same race for the record if the table
had no RI constraints.  The only difference that I can see, is that T1
now has some locks that it must relinquish as the transaction aborts.

 . . .  Second, the parent record locks are not
 exclusive which means that both can be granted, so I don't see how this
 stops the second from continuing before the first.

I don't think this does stop the second from continuing before the
first.  What will stop it, is the eventual lock that is taken on the
child (triggering) record.  I am not proposing reducing the number of
locks taken, but rather changing the order in which the locks are taken.

concerned frown What am I missing? /concerned frown

__
Marc



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  Yes in this case, T1 must abort because the record it was going to
  update has disappeared from underneath it.  I don't see how this is
  significantly different from the same race for the record if the table
  had no RI constraints.  The only difference that I can see, is that T1
  now has some locks that it must relinquish as the transaction aborts.
 
 No, the difference is there would have been no error at all before;
 if the record were deleted before T1 got to it then it wouldn't have
 attempted to update it.  I really don't think you can make it work
 to perform updates or deletes on a record you have not yet locked.

The record would be locked before the update or delete is attempted,
however it would not be locked until the referential integrity
constraints have succeeded in acquiring their locks.

It is becoming clear to me that I am missing something but I still don't
know what it is.  If anyone can see it and explain it I'd really
appreciate it.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-08 Thread Marc Munro
On Thu, 2007-08-02 at 12:24 -0800, Stephan Szabo wrote:
 On Thu, 8 Feb 2007, Marc Munro wrote:
 
  I don't think this does stop the second from continuing before the
  first.  What will stop it, is the eventual lock that is taken on the
  child (triggering) record.
 
 But at that point, you've already had to compose the new row in order to
 call the trigger for the ri check, right? So, one of those new rows will
 be out of date by the time it actually gets the lock. Presumably that
 means that you need to recalculate the new row, but you've already done a
 check and gotten a lock based on the old new row.

Yes.  That is tricky.  For my proposed scheme to work, I guess we'd have
to be able to drop those locks which were just acquired by the RI
triggers.  Not too simple, I guess.

 Also, another big problem is the fact that SQL requires that the action
 already have happened before the check in cases where the constraint
 references the same table.  The row being updated or inserted might
 reference a row that will be updated or inserted by a later action of the
 same statement.

Hmmm.  That does seem to be the final nail in the coffin.  Consider the
proposal withdrawn, and thanks for explaining it all to me.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-06 Thread Marc Munro
Simon Riggs started this thread with the question:

  . . .
  Why do we need a SHARE lock at all, on the **referenc(ed)** table?
  . . .

The root problem addressed by this thread seems to be that using share
locks in this way increases the likelihood of deadlock, and causes
blocking when no blocking is actually needed.

I would like to make a few observations leading to two alternative
proposals for dealing with this issue.

Deadlocks arise because of differences in the order in which locks are
taken.  If we have a parent table P, and a child C, and we modify two
children of the same P, locks will be taken in the order C1, P, C2.
Another process modifying only C2, will cause locks to be taken in the
order C2, P, leading to the possibility of deadlock.  With the current
system of RI, this sort of deadlock arises far too easily with the
result that RI is often disabled.

It is solely the order in which the locks are taken that causes the
problem.  If the RI constraints could lock the parent records before
locking the child, the possibility of deadlock would be much reduced.  

Proposal 1: Alter the way RI triggers fire, so that they complete before
locking the row against which they fire.


Having a background in Oracle, I found myself considering how this is
not usually a problem with Oracle databases.  If I understand it
correctly, in Oracle the referential integrity constraints are
implemented by locking the index associated with the constraint, rather
than the records themselves.

Proposal 2: Lock the index associated with the parent record, rather
than the parent record itself.  Updates to indexed fields, and deletions
of records would need to also take such locks, but this should be enough
to allow non-referenced fields to be updated in a parent, even while
transactions are modifying its children.


__
Marc




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] referential Integrity and SHARE locks

2007-02-06 Thread Marc Munro
On Tue, 2007-06-02 at 23:47 +, Gregory Stark wrote:
 Marc Munro [EMAIL PROTECTED] writes:
 
  Proposal 1: Alter the way RI triggers fire, so that they complete before
  locking the row against which they fire.
 
 It's kind of hard to know what records the user will choose to update before
 he actually does the update...

The RI triggers currently fire when a record is updated.  Under my
proposal they would fire in the same way but before the record is locked
rather than after.  Or am I missing your point?

  Proposal 2: Lock the index associated with the parent record, rather
  than the parent record itself.  
 
 That doesn't help in our case because each version of a record has an index
 entry. So even updates to unrelated fields imply index modifications. Worse,
 deleting and updating don't remove the old index entries so even if you've
 locked them you won't prevent people from doing exactly those operations
 you're trying to avoid.

I guess my proposal was incomplete.  Obviously, before deleting, or
updating an indexed column, a lock would have to be taken on the index.
I believe this would suffice to guarantee referential integrity without
blocking updates that leave the referred indexes unchanged.

What you say about each version of a record having an index entry
confuses me.  I thought there was one index entry that lead to a chain
of tuples.  If this is not the case, I don't see how the current
exclusive locks on indexes work to enforce uniqueness.  Could you point
me to somewhere in the code or the documentation that explains this?

It still seems to me that if we can lock an index entry to guarantee
uniqueness, we can also lock it to implement RI constraints.

__
Marc




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [PATCHES] New shared memory hooks proposal (was Re:

2006-10-15 Thread Marc Munro
On Sat, 2006-10-14 at 14:55 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  The attached patch provides add-ins with the means to register for
  shared memory and LWLocks.
 
 I finally got around to reviewing this patch, and realized that it's got
 a pretty fundamental design flaw: it isn't useful under Windows (or any
 other EXEC_BACKEND platform), because there isn't any provision for
 backends to locate structs allocated by other backends by means of
 searching in shared memory.  AFAICS the code can only do something
 useful in a platform where allocations made in the postmaster process
 can be found by backends via fork inheritance of pointers.

Rats!  You are right.  I had quite overlooked the windows case.

 The right way to handle shmem allocations is to use ShmemInitStruct
 to either allocate a shared struct for the first time or attach to a
 previously made instance of the struct.  (This struct could be a
 memory allocation arena itself, but that's not the core code's problem.)
 Now we could extend the patch so that each addin has its own
 ShmemIndex within its private workspace, but I think that's probably
 overkill.  My inclination is to rip out ShmemAllocFromContext and expect
 addins to use ShmemInitStruct the same as everyone else.  The hook
 callable at shared_preload_libraries time should just serve to add
 the necessary amount to the computed size of the shared memory segment.

I think that works for me.

 RegisterAddinLWLock is broken in the same way: it could only be used
 safely if the registered lock ID were remembered in shared memory,
 but since shared memory doesn't exist at the time it's supposed to be
 called, there's no way to do that.  Again, it'd seem to work as long as
 the lock ID value were inherited via fork, but that's gonna fail on
 EXEC_BACKEND builds.  I think we should probably take this out in favor
 of something that just increments a counter that replaces
 NUM_USER_DEFINED_LWLOCKS, and expect people to use LWLockAssign() at an
 appropriate time while initializing their shared memory areas.

Agreed.

 It strikes me that there's a race condition here, which we've not seen
 in previous use because backends expect all standard shared memory
 structs to have already been initialized by the postmaster.  An add-on
 will instead have to operate under the regime of first backend wanting
 to use the struct must init it.  Although ShmemInitStruct returns a
 found bool telling you you've got to init it, there's no interlock
 ensuring that you can do so before someone else comes along and tries to
 use the struct (which he'll assume is done because he sees found = true).
 And, per above discussion, an add-on can't solve this for itself using
 an add-on LWLock, because it really has to acquire its add-on locks
 while initializing that same shmem struct, which is where it's going to
 keep the locks' identity :-(
 
 So I think we need to provide a standard LWLock reserved for the purpose
 of synchronizing first-time use of a shmem struct.  The coding rules for
 an add-on would then look like:
 
 * in the shared_preload_libraries hook:
 
   RequestAddinShmemSpace(size);
   RequestAddinLWLocks(n);
 
 * in a backend, to access a shared memory struct:
 
   static mystruct *ptr = NULL;
 
   if (!ptr)
   {
   boolfound;
 
   LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
   ptr = ShmemInitStruct(my struct name, size, found);
   if (!ptr)
   elog(ERROR, out of shared memory);
   if (!found)
   {
   initialize contents of shmem area;
   acquire any requested LWLocks using:
   ptr-mylockid = LWLockAssign();
   }
   LWLockRelease(AddinShmemInitLock);
   }
 
 
 Thoughts?

I am content that what you suggest is the right way to go.  I will work
on a new patch immediately, unless you would prefer to do this yourself.

It seems to me that these coding rules should be documented in the main
documentation, I guess in the section that describes Dynamic Loading.
Would you like me to take a stab at that?

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [pgsql-hackers] Daily digest v1.6352 (22 messages)

2006-10-03 Thread Marc Munro
On Mon, 2006-10-02 at 12:02 -0300, Shaunak Godbole wrote:
 Hi,
 
 We are trying to introduce access control. For this we have to rewrite
 the
 input query by replacing each relation by its corresponding authorized
 view.

I assume from this that you are trying to implement something like
Oracle's Virtual Private Database.  

If you need access controls at the row or column level, and do not want
to implement this whole thing yourself, please take a look at Veil 
(http://veil.projects.postgresql.org/) which can already accomplish the
same sort of thing, though in a different way.

If Veil is insufficient for your needs in some way, I'd be interested to
hear about it.

__
Marc




signature.asc
Description: This is a digitally signed message part


[HACKERS] Making config file parser available to add-ins

2006-07-24 Thread Marc Munro
I want Veil (http://pgfoundry.org/projects/veil/) to be able to read
configuration details from a configuration file.  Rather than implement
my own config file parser, I'd like to be able to re-use the parser
defined in guc-file.l

If this is not contentious, I will submit a patch to make the parser
available to add-ins.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Making config file parser available to add-ins

2006-07-24 Thread Marc Munro
On Mon, 2006-07-24 at 20:31 +0200, Peter Eisentraut wrote:
 Marc Munro wrote:
  I want Veil (http://pgfoundry.org/projects/veil/) to be able to read
  configuration details from a configuration file.
 
 What kind of details?  By the time any server-side module is loaded, the 
 configuration file has already been read, so why would you need to read 
 it again?
 
I want to read a veil config file rather than the postgres config file.
Basically, I just want access to the rather nice config file parser that
already exists.

As for the kind of details:
- how much shared memory to allocate for veil
- the size of the hash tables for veil's shared variables

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Making config file parser available to add-ins

2006-07-24 Thread Marc Munro
On Mon, 2006-07-24 at 14:44 -0400, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  Marc Munro wrote:
  I want Veil (http://pgfoundry.org/projects/veil/) to be able to read
  configuration details from a configuration file.
 
  What kind of details?  By the time any server-side module is loaded, the 
  configuration file has already been read, so why would you need to read 
  it again?
 
 Probably the correct question is whether Marc's problem isn't already
 solved by the custom GUC variable mechanism --- that is, whatever he
 wants to configure should be defined as custom GUCs within the
 existing configuration file.

I don't think it's already solved but I may be missing the point.  The
Veil shared library will be loaded by process_preload_libraries only
after the postgresql config file has been read.  I was assuming that at
this point it would be too late to specify custom GUCs.

Instead I want to load my own veil.conf file which would support the
same sort of syntax as postgresql.conf.

My proposal is to simply expose a new function processAddinConfigFile()
which would mimic ProcessConfigFile but would call a user-supplied
function to deal with each entry.  Obviously, this would be usable by
future add-ins and not just Veil.

If there is a better way to do this please tell me.
__
Marc




signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Making config file parser available to add-ins

2006-07-24 Thread Marc Munro
On Mon, 2006-07-24 at 15:17 -0400, Tom Lane wrote:
 No, being able to do that is exactly the point of the custom-GUC
 mechanism.

Excellent.  I shall study this and hope to bother you no further :-)

Thanks.
__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-07-19 Thread Marc Munro
For the record, here are the results of our (ongoing) inevstigation into
the index/heap corruption problems I reported a couple of weeks ago.

We were able to trigger the problem with kernels 2.6.16, 2.6.17 and
2.6.18.rc1, with 2.6.16 seeming to be the most flaky.

By replacing the NFS-mounted netapp with a fibre-channel SAN, we have
eliminated the problem on all kernels.

From this, it would seem to be an NFS bug introduced post 2.6.14, though
we cannot rule out a postgres bug exposed by unusual timing issues.

Our starting systems are: 

Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux (and others)
kernel boot option: elevator=deadline
16 Gigs of RAM
postgresql-8.0.8-1PGDG
Bonded e1000/tg3 NICs with 8192 MTU.
Slony 1.1.5

NetApp FAS270 OnTap 7.0.3
Mounted with the NFS options
rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
Jumbo frames 8192 MTU.

All postgres data and logs are stored on the netapp.

All tests results were reproduced with postgres 8.0.8

__
Marc

On Fri, 2006-06-30 at 23:20 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  We tried all of these suggestions and still get the problem.  Nothing
  interesting in the log file so I guess the Asserts did not fire.
 
 Not surprising, it was a long shot that any of those things were really
 broken.  But worth testing.
 
  We are going to try experimenting with different kernels now.  Unless
  anyone has any other suggestions.
 
 Right at the moment I have no better ideas :-(
 
   regards, tom lane
 


signature.asc
Description: This is a digitally signed message part


[HACKERS] New shared memory hooks proposal (was Re: pre_load_libraries)

2006-07-13 Thread Marc Munro
On Thu, 2006-07-13 at 01:13 -0300, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  ...  A better solution from my point of view would be
  to simply move the call to process_preload_libraries to a point
 after
  shared memory has been set up.  Is there some reason this could not
 be
  done?
 
 That would make it impossible for a preloaded library to request any
 shared memory of its own --- something that admittedly we don't have a
 hook to support, but do we want to foreclose it permanently?

That does sound like the right way to go.  Here is my new proposal:

Add-ins register their requirement for shared memory using a new
function: RegisterShmemRequirement(char *context_name, int size).  This
would be called by the init function called from
process_preload_libraries.

When shared memory is initialised, extra space is allocated for each
registered add-in.  Each add-in's registered allocation is a separate
memory context identified by the context_name parameter provided during
registration.

Add-ins allocate shared memory from their own context using a new
function ShemAddinAlloc(), which adds the context_name parameter to the
normal ShemAlloc parameter list.

This would save add-ins from having to manage their own shared memory
segements while providing a degree of separation and isolation so that
one add-in could not exhaust the shared memory of another or of the
backend.

If this is acceptable, I think it is within my skill level to implement.
Comments?

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] pre_load_libraries

2006-07-12 Thread Marc Munro
On Wed, 2006-07-12 at 02:18 -0300, I wrote:
 I am trying to create an initialisation function that is called using
 the  preload_libraries option.
 
 The purpose of this is to set up shared memory for Veil, independant
 of postgres' own shared memory.  Simple init functions work fine, but
 as soon as I place calls to ShemAlloc, or LWLockAssign, the server
 startup simply halts.

To answer my own question, yes I am being unreasonable.  Shared memory
has not been set up at the time that process_preload_libraries is
called.  

Since lwlocks are allocated from postgres shared memory, I cannot assign
an lwlock from Veil's initialisation code.  Instead, I can make the
allocation of the lwlock the responsibility of the first session that
uses a Veil function but I need a lock to prevent a race.

My current thinking is to simply subvert one of the existing lwlocks
while I assign my own.  A better solution from my point of view would be
to simply move the call to process_preload_libraries to a point after
shared memory has been set up.  Is there some reason this could not be
done?

On a related note, I can see no way to release Veil's shared memory
segment when postgres is shut down.   Perhaps I should be thinking about
making the management of such shared memory segments something that
postgres does on behalf of its add-ins, though that seems presumptious.

I'd appreciate any suggestions, pointers, random thoughts, etc.
__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] pre_load_libraries

2006-07-11 Thread Marc Munro
I am trying to create an initialisation function that is called using
the  preload_libraries option.

The purpose of this is to set up shared memory for Veil, independant of
postgres' own shared memory.  Simple init functions work fine, but as
soon as I place calls to ShemAlloc, or LWLockAssign, the server startup
simply halts.

Am I being unreasonable in trying to call these functions at this point
of the server startup, or is this just some stupid bug in my code?

I wish to call ShmemAlloc in order to simply create a shared reference
to the Veil shared memory segments that I will set up separately.

Thanks
__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-30 Thread Marc Munro
On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
 One easy thing that would be worth trying is to build with
 --enable-cassert and see if any Asserts get provoked during the

 A couple other things to try, given that you can provoke the failure
 fairly easily:
 . . .
 1. In studying the code, it bothers me a bit that P_NEW is the same as
 InvalidBlockNumber.  The intended uses of P_NEW appear to be adequately
 . . .
 2. I'm also eyeing this bit of code in hio.c:
 
 If someone else has just extended the relation, it's possible that this
 will allow a process to get to the page before the intended extender has
 . . . 

We tried all of these suggestions and still get the problem.  Nothing
interesting in the log file so I guess the Asserts did not fire.

We are going to try experimenting with different kernels now.  Unless
anyone has any other suggestions.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-29 Thread Marc Munro
On Tom Lane's advice, we upgraded to Postgres 8.0.8.  We also upgraded
slony to 1.1.5, due to some rpm issues.  Apart from that everything is
as described below.

We were able to corrupt the index within 90 minutes of starting up our
cluster.  A slony-induced vacuum was under way on the provider at the
time the subscriber failed.

What can we do to help identify the cause of this?  We have a test
system that seems able to reproduce this fairly easily.

__
Marc

On Wed, 2006-06-28 at 09:28 -0700, Marc Munro wrote:
 We have now experienced index corruption on two separate but identical
 slony clusters.  In each case the slony subscriber failed after
 attempting to insert a duplicate record.  In each case reindexing the
 sl_log_1 table on the provider fixed the problem.
 
 The latest occurrence was on our production cluster yesterday.  This has
 only happened since we performed kernel upgrades and we are uncertain
 whether this represents a kernel bug, or a postgres bug exposed by
 different timings in the new kernel.
 
 Our systems are:
 
 Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
 Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux
 kernel boot option: elevator=deadline
 16 Gigs of RAM
 postgresql-8.0.3-1PGDG
 Bonded e1000/tg3 NICs with 8192 MTU.
 Slony 1.1.0
 
 NetApp FAS270 OnTap 7.0.3
 Mounted with the NFS options
 rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
 Jumbo frames 8192 MTU.
 
 All postgres data and logs are stored on the netapp.
 
 In the latest episode, the index corruption was coincident with a
 slony-induced vacuum.  I don't know if this was the case with our test
 system failures.
 

 __
 Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-29 Thread Marc Munro
On Thu, 2006-06-29 at 12:11 -0400, Tom Lane wrote:
 OK, so it's not an already-known problem.
 
  We were able to corrupt the index within 90 minutes of starting up our
  cluster.  A slony-induced vacuum was under way on the provider at the
  time the subscriber failed.
 
 You still haven't given any details.  What do you mean by corrupt the
 index --- what actual symptoms are you seeing?

We have a two node slony cluster with load tests running againt the
provider node.

After resyncing the subscriber, and running load tests for about an
hour, the slony subscriber begins to fail.  From the log file:

2006-06-28 17:58:43 PDT ERROR  remoteWorkerThread_1: insert into 
public.table_trans_attribute (table_transaction_id,attribute_type,value) 
values ('374520943','22001','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22002','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22003','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22004','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22005','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22006','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22007','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520943','22007','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520942','22009','0');
insert into public.table_trans_attribute 
(table_transaction_id,attribute_type,value) values ('374520942','22010','0');
 ERROR:  duplicate key violates unique constraint table_trans_attr_pk

As you see, slony is attempting to enter one tuple
('374520943','22007','0') two times.

Each previous time we have had this problem, rebuilding the indexes on
slony log table (sl_log_1) has fixed the problem.  I have not reindexed
the table this time as I do not want to destroy any usable evidence.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-29 Thread Marc Munro
We have reproduced the problem again.  This time it looks like vacuum is
not part of the picture.  From the provider's log:

2006-06-29 14:02:41 CST DEBUG2 cleanupThread:  101.057 seconds for vacuuming

And from the subscriber's:

2006-06-29 13:00:43 PDT ERROR  remoteWorkerThread_1: insert into 
public.table_trans_attribute (table_transaction_id,attribute_type,value) 
values ('374740387','22008','4000');

If my maths is correct and the logs are honest, the vacuum would have
started at 14:01:00 CST (13:01:PDT), about 20 seconds after we first
encounter the problem.  The clocks on the two machines, though in
different timezones, are currently synced.

Tom, I will create another tarball of the sl_log_1 table and indexes.
Should be quite a bit smaller than the previous one.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-29 Thread Marc Munro
On Fri, 2006-06-30 at 00:37 +0300, Hannu Krosing wrote:
 Marc: do you have triggers on some replicated tables ?
 
We have a non-slony trigger on only 2 tables, neither of them involved
in this transaction.  We certainly have no circular trigger structures.

 I remember having some corruption in a database with weird circular
 trigger structures, some of them being slony log triggers. 
 
 The thing that seemed to mess up something inside there, was when change
 on parent rownt fired a trigger that changes child table rows and there
 rows fired another trigger that changed the same parent row again.
 

__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] [Re: Index corruption]

2006-06-29 Thread Marc Munro
Ooops: forgot to cc this to the list.

On Thu, 2006-06-29 at 19:27 -0400, Tom Lane wrote:
 Are you *certain* this slave isn't running 8.0.2 or older?  If you can
 verify that, then I guess we need to look for another mechanism that
 could cause the same kind of thing.

Certain.  We built new rpms for 8.0.8 and installed them on both
database servers (and I just checked them again to make sure).

If there's anything we can do to help debug this we will.  We can apply
patches, different build options, etc.

__
Marc



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] index corruption

2006-06-29 Thread Marc Munro
On Thu, 2006-06-29 at 19:59 -0400, Tom Lane wrote:

 Ummm ... you did restart the server?  select version(); would be
 the definitive test.

Can't say I blame you for the skepticism but I have confirmed it again.

test=# select version();
 version 
-
 PostgreSQL 8.0.8 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
3.4.6 20060404 (Red Hat 3.4.6-3)
(1 row)

__
Marc



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-29 Thread Marc Munro
On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
 One easy thing that would be worth trying is to build with
 --enable-cassert and see if any Asserts get provoked during the
 failure case.  I don't have a lot of hope for that, but it's
 something that would require only machine time not people time.

I'll try this tomorrow.

 A couple other things to try, given that you can provoke the failure
 fairly easily:
 
 1. In studying the code, it bothers me a bit that P_NEW is the same as
 InvalidBlockNumber.  The intended uses of P_NEW appear to be adequately
 interlocked, but it's fairly easy to see how something like this could
 happen if there are any places where InvalidBlockNumber is
 unintentionally passed to ReadBuffer --- that would look like a P_NEW
 call and it *wouldn't* be interlocked.  So it would be worth changing
 P_NEW to (-2) (this should just take a change in bufmgr.h and
 recompile) and adding an Assert(blockNum != InvalidBlockNumber)
 at the head of ReadBufferInternal().  Then rebuild with asserts enabled
 and see if the failure case provokes that assert.

I'll try this too.

 2. I'm also eyeing this bit of code in hio.c:
 
 /*
  * If the FSM knows nothing of the rel, try the last page before
  * we give up and extend.  This avoids one-tuple-per-page syndrome
  * during bootstrapping or in a recently-started system.
  */
 if (targetBlock == InvalidBlockNumber)
 {
 BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
 
 if (nblocks  0)
 targetBlock = nblocks - 1;
 }
 
 If someone else has just extended the relation, it's possible that this
 will allow a process to get to the page before the intended extender has
 finished initializing it.  AFAICT that's not harmful because the page
 will look like it has no free space ... but it seems a bit fragile.
 If you dike out the above-mentioned code, can you still provoke the
 failure?

By dike out, you mean remove?  Please confirm and I'll try it.

 A different line of attack is to see if you can make a self-contained
 test case so other people can try to reproduce it.  More eyeballs on the
 problem are always better.

Can't really see this being possible.  This is clearly a very unusual
problem and without similar hardware I doubt that anyone else will
trigger it.  We ran this system happily for nearly a year on the
previous kernel without experiencing this problem (tcp lockups are a
different matter).  Also the load is provided by a bunch of servers and
robots simulating rising and falling load.

 Lastly, it might be interesting to look at the WAL logs for the period
 leading up to a failure.  This would give us an idea of what was
 happening concurrently with the processes that seem directly involved.

Next time we reproduce it, I'll take a copy of the WAL files too.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Index corruption

2006-06-29 Thread Marc Munro
On Thu, 2006-06-29 at 21:59 -0400, Tom Lane wrote:
 [ back to the start of the thread... ]
 
 BTW, a couple of thoughts here:
 
 * If my theory about the low-level cause is correct, then reindexing
 sl_log_1 would make the duplicate key errors go away, but nonetheless
 you'd have lost data --- the overwritten rows would be gone.  I suppose
 that this would result in the slave missing some rows that are present
 on the master.  Have you tried comparing slave and master databases to
 see if you can find any discrepancies?

Haven't done that yet - in test we tend to restart the old subscriber as
the new provider and rebuild the cluster.  I'll check the logs from our
production failure to figure out what to compare and see what I can
discover. 

 * One way that the problem could happen would be if a race condition in
 the kernel allowed an lseek(fd, 0, SEEK_END) to return a value less than
 the true end-of-file (as determined by another process' write()
 extending the EOF just slightly earlier --- ie, lseek fails to note the
 effects of the just-completed write, and returns the prior EOF value).
 PG does have internal locking that should guarantee that the lseek is
 not done until after the write completes ... but could there be a bug in
 the kernel allowing stale data to be returned?  The SMP hardware is
 relevant (maybe one processor sees different data than the other) and
 frankly I don't trust NFS very far at all for questions such as this.
 It'd be interesting to see if you can reproduce the problem in a
 database on local storage.

Unfortunately we haven't got any local storage that can stand the sort
of loads we are putting through.  With slower storage the CPUs mostly
sit idle and we are very unlikely to trigger a timing-based bug if
that's what it is.

I'll get back to you with kernel build information tomorrow.  We'll also
try to talk to some kernel hackers about this.

Many thanks for your efforts so far.
__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] Index corruption

2006-06-28 Thread Marc Munro
We have now experienced index corruption on two separate but identical
slony clusters.  In each case the slony subscriber failed after
attempting to insert a duplicate record.  In each case reindexing the
sl_log_1 table on the provider fixed the problem.

The latest occurrence was on our production cluster yesterday.  This has
only happened since we performed kernel upgrades and we are uncertain
whether this represents a kernel bug, or a postgres bug exposed by
different timings in the new kernel.

Our systems are:

Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux
kernel boot option: elevator=deadline
16 Gigs of RAM
postgresql-8.0.3-1PGDG
Bonded e1000/tg3 NICs with 8192 MTU.
Slony 1.1.0

NetApp FAS270 OnTap 7.0.3
Mounted with the NFS options
rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
Jumbo frames 8192 MTU.

All postgres data and logs are stored on the netapp.

In the latest episode, the index corruption was coincident with a
slony-induced vacuum.  I don't know if this was the case with our test
system failures.

What can we do to help identify the cause of this?  I believe we will be
able to reproduce this on a test system if there is some useful
investigation we can perform.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] New feature proposal

2006-05-23 Thread Marc Munro
On Fri, 2006-05-19 at 12:35 -0700, Marc Munro wrote:
 On Fri, 2006-05-19 at 14:44 -0400, Tom Lane wrote:
  This could all be solved in a cleaner, more bulletproof way if you
  simply require such add-ins to be preloaded into the postmaster process
  using the existing preload_libraries hook.  Then, such an add-in would
  allocate its own shmem segment independent of the main Postgres one.
  This totally eliminates worries about one chunk of code eating the other
  one's memory, which otherwise we'd have to have additional mechanism to
  deal with.
 
 This is an interesting idea that I had not previously considered.  I
 will give it some thought.

I have give this idea some further thought and I agree; Tom's solution
is more bulletproof and is the right way to go.  My original proposal is
withdrawn.

I am going to look into the best way to implement this but my gut
feeling is that I would like the support infrastructure for this to be
in Postgres rather than in Veil.

By support infrastructure, I mean APIs to create and access new shared
memory segments, and allocate chunks of memory from those shared
segments.

I think this code is better placed in Postgres rather than in specific
add-ins because: it is functionality that could benefit many add-ins; it
can make use of existing postgres code; and it can be easily tested in
the regression suite using the buildfarm.

I don't want to start working on this without knowing there is a chance
of the patch being acceptable, so feedback is invited.

Thanks.
__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] New feature proposal

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 13:41 -0300, [EMAIL PROTECTED]
wrote:
 On Thu, 2006-05-18 at 17:39 -0700, Marc Munro wrote:
 
  For Postgres 8.2 I would like Veil to be a better citizen and use
 only
  what shared memory has been reserved for postgres add-ins.
 
 How would Postgres ask the add-in how much memory it needs? How would
 the add-in know how much has been reserved for it? How would an add-in
 know whether it was the only add-in and whether it could take all of
 the
 allocation?

Postgres would not ask any add-ins how much they need, it would simply
allocate the extra amount defined in a GUC and not make that available
through the normal shared memory allocation mechanism.

The add-in would not know how much had been allocated to it, but could
be told through it's own config file.  I envisage something like:

in postgresql.conf

# add_in_shmem = 0# Amount of shared mem to set aside for add-ins
  # in KBytes
add_in_shem = 64


in veil.conf

veil_shmem = 32   # Amount of shared memory we can use from
  # the postgres add-ins shared memory pool

I think this is better than add-ins simply stealing from, and contending
for, postgres shared memory which is the only real alternative right
now.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [pgsql-hackers] Daily digest v1.5943 (21 messages)

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 13:41 -0300, [EMAIL PROTECTED]
wrote:

 Marc Munro wrote:
  Veil http://pgfoundry.org/projects/veil is currently not a very good
  Postgres citizen.  It steals what little shared memory it needs from
  postgres' shared memory using ShmemAlloc().
  
  For Postgres 8.2 I would like Veil to be a better citizen and use
 only
  what shared memory has been reserved for postgres add-ins.
 
 Why should this be individually restricted? AFAICS Veil's
 functionality 
 would be essential to access row level ACL controlled tables, so if
 it 
 fails for low mem conditions it's much like a backend failure.

The problem is that postgres currently has no way of knowing how much
shared memory Veil is likely to require, and so will not allocate space
for it.  Consequently if Veil needs a lot of shared memory, it is
unlikely to be able to get it.

My intention is to allow add-ins to use only shared memory that has been
allocated for their own use, to ensure that enough has been allocated,
to isolate postgres shared memory from abuse by add-ins, and to avoid
the need for add-ins to manage their own shared memory.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] New feature proposal

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 10:05 -0700, Josh Berkus wrote:
 Marc,
 
  The add-in would not know how much had been allocated to it, but could
  be told through it's own config file.  I envisage something like:
 
  in postgresql.conf
 
  # add_in_shmem = 0# Amount of shared mem to set aside for add-ins
# in KBytes
  add_in_shem = 64
 
 
  in veil.conf
 
  veil_shmem = 32   # Amount of shared memory we can use from
# the postgres add-ins shared memory pool
 
  I think this is better than add-ins simply stealing from, and contending
  for, postgres shared memory which is the only real alternative right
  now.
 
 H ... what would happen if I did:
 
 add_in_shmem = 64
 veil_shmem = 128
 
 or even:
 
 add_in_shmem = 128
 veil_shmem = 64
 plperl_shmem = 64
 pljava_shmem = 64
 

If that happens, one of the add-ins will be sadly disappointed when it
tries to use its allocation.  The same as would happen now, if Veil
attempted to allocate too large a chunk of shared memory.

My proposal makes it possible for properly configured add-ins to have a
guaranteed amount of shared memory available.  It allows add-ins to be
well-behaved in their use of shared memory, and it prevents them from
being able to exhaust postgres' own shared memory.

It doesn't prevent add-ins from over-allocating from the add-in memory
context, nor do I think it can or should do this.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] New feature proposal

2006-05-19 Thread Marc Munro
On Fri, 2006-05-19 at 14:44 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  My proposal makes it possible for properly configured add-ins to have a
  guaranteed amount of shared memory available.
 
 This could all be solved in a cleaner, more bulletproof way if you
 simply require such add-ins to be preloaded into the postmaster process
 using the existing preload_libraries hook.  Then, such an add-in would
 allocate its own shmem segment independent of the main Postgres one.
 This totally eliminates worries about one chunk of code eating the other
 one's memory, which otherwise we'd have to have additional mechanism to
 deal with.

This is an interesting idea that I had not previously considered.  I
will give it some thought.

I'm not convinced that we actually do need to prevent add-ins from
eating each other's memory.  Just as existing add-ins that use palloc
are expected to use the appropriate memory context and behave
themselves, I would expect the same to be true for add-ins that require
shared memory.

 In a Unix environment, such a thing would Just Work because pointers to
 the new segment would be inherited through fork().  In the Windows port
 you'd need to do more pushups --- perhaps allocate a small amount of
 memory in the main Postgres shmem segment containing the ID of the other
 shmem segment, which a backend would use to reattach.
 

For me, adding windows-specific code to Veil is highly unappealling - I
have no easy way to build or test for windows, and no experience of
doing so, so the more I can leverage the existing functionality, the
better.  

I had hoped to simply piggyback on Postgres' existing memory management
with a very small change to effectively add an add-in shared memory
context.

On the other hand, if this is the way we have to go, then perhaps it
could be added to Postgres as part of its api, rather than having Veil,
and perhaps other add-ins, implement it for themselves.

Thoughts?

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] audit table containing Select statements submitted

2006-05-14 Thread Marc Munro
You could do this using Veil, http://pgfoundry.org/projects/veil/, or
something like it.  A Veil access function,
http://veil.projects.postgresql.org/curdocs/overview-page.html, could be
used to record every row returned within a query to the user that
requested it.  Note that this operates at the level of fetches and not
the resultset, meaning that queries like:

select stuff from a where exists (select 1 from b where);

would record a fetch against b.

The basic trick is to replace table_that_you_want_audited with a view
that does something like: 

select * from table_that_you_want_audited 
where  audit_this_fetch(row_identifier);

You will also need instead-of triggers for insert, update and delete of
the view.

__
Marc

On Fri, 2006-05-12 at 14:19 -0300, [EMAIL PROTECTED]
wrote:
 From: Hogan, James F. Jr. 
 Sent: Thursday, May 04, 2006 12:46 PM
 To: 'pgsql-sql@postgresql.org'; 'pgsql-general@postgresql.org'
 Subject: audit table containing Select statements submitted
 
 No response from the pgsql-admin list so I though I would try cross
 posting here:
 pgsql-sql@postgresql.org
 pgsql-general@postgresql.org
 
 
 
 I just know I am not the first to try and do this 
 
 Jim
 
 *
 Can anyone point me in a direction that may help me populate in real
 time a table that holds?
 
 Current_user
 Timestamp
 The Select Statement Submitted by the User
 
 I need to be able to determine who viewed what and when they viewed
 it.
 
 I have considered the fact that the result from SELECT yesterday may
 be
 different than the result set returned by the SAME SELECT statement
 today, but when used in conjunction with the INSERT, UPDATE, DELETE
 audit logging I have already created, the answers to who viewed, what
 and when would be readily available. 
 
 I have been searching all morning and...
 
 The only thing I find on logging of Select statements is that the
 information can be held in the Log Files...if Logging is enabled.  
 
 As I am only interested in the statements presented against certain
 tables...
 
 Turning on logging gives me more than I need or care to look through. 
 
 I could write a script to parses the Log Files into a Database Table
 but
 would prefer to avoid enabling the file logging of statements if
 possible.
 
 Thanks for any reference or help you may be able to provide.
 
 Jim
 


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] [pgsql-hackers] Daily digest v1.5568 (24 messages)

2005-11-21 Thread Marc Munro
I wonder if this idea might be taken a little further, to allow
read-only tablespaces?

This would allow old partitions in very large databases to be kept on
read-only media, and would allow normal backups to ignore this
unchanging set of data.

It also allows for certain specific optimisations for this type of data,
as the MVCC rules are now relaxed.  Inclusion of a row in a read-only
index is now enough to guarantee the visibility of that row to all
backends, and fetches based solely on the index now become possible.

FWIW, Oracle does something like this.  Just a thought.

__
Marc

On Mon, 2005-11-21 at 09:53 -0400, [EMAIL PROTECTED]
wrote:
 Date: Sun, 20 Nov 2005 22:39:49 +0200 (EET)
 From: Heikki Linnakangas [EMAIL PROTECTED]
 To: pgsql-hackers@postgresql.org
 Subject: Data directory on read-only media
 Message-ID: [EMAIL PROTECTED]
 
 Hi,
 
 I've been thinking about running postgres from read-only media. It's 
 handy for creating demo CDs etc. I hacked together a patch that
 allows 
 you to run Postgres without write permissions to the data directory.
 


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Supporting NULL elements in arrays

2005-11-09 Thread Marc Munro
How about representing null array elements with a special-case cast-like
null?  Something like {::null}

__
Marc

On Tue, 2005-11-08 at 23:30 -0400, [EMAIL PROTECTED]
wrote:
 Date: Tue, 08 Nov 2005 19:21:34 -0500
 From: Tom Lane [EMAIL PROTECTED]
 To: pgsql-hackers@postgreSQL.org
 Subject: Re: Supporting NULL elements in arrays 
 Message-ID: [EMAIL PROTECTED]
 
 I wrote:
  ... the most straightforward thing to do is define an empty element
  as meaning a NULL.  But this might be objected to on a couple of
 grounds:
 
 I just thought of another, potentially fatal objection: it's ambiguous
 whether '{}'::text[] should be taken to mean an empty (zero-length)
 array or an array containing a single NULL element.
 
 For backwards compatibility it should mean an empty array, but then
 there's no way to represent ARRAY(NULL) in data dumps, which won't
 do either.
 
 The only workaround that comes to mind is to allow explicit
 specification of what's meant:  '[1:1]{}' would be needed to represent
 the one-null case.  Ugly.
 
 Ideas anyone?
 
 regards, tom lane
 


signature.asc
Description: This is a digitally signed message part


[HACKERS] Odd db lockup - investigation advice wanted

2005-11-07 Thread Marc Munro
Last week I managed to lock-up and then crash a development database.
I'm going to try to reproduce it today and would like to know what I can
do to further investigate the problem.

I am running Linux 2.6.9-11.ELsmp X86_64 on a Quad Dual-Core Opteron

I have the following postgres RPMs installed:

postgresql-libs-7.4.7-2.RHEL4.1
postgresql-contrib-8.0.3-1PGDG
postgresql-libs-8.0.3-1PGDG
postgresql-server-8.0.3-1PGDG
postgresql-8.0.3-1PGDG

PGDATA is installed on a Netapp network storage device.

We are using slony 1.1.0 for replication.

The (provider) database locked-up after I killed a slony client process
(kill -9) on the subscriber.  Psql connections would not respond to \d
and simply locked up.  I was able to run a query to check for blocking
locks - this returned no rows.  There was a significant test load on the
database at the time.

I stopped the database but was unable to restart it.  I was unable to
kill a number of postgres processes and could not release postgres
shared memory.  Having decided that the database was toast, I discovered
that I could not even delete the database files, and eventually the only
solution was a full reboot.

This hardware was destined to be put into production in the next two
weeks but this crash has shaken our confidence somewhat.  Any advice on
how to further investigate this would be much appreciated

Here is an exerpt from the logs at the time of the failure:
LOG:  duration: 4143.996 ms  statement: execute wibble_transaction
( '2012416', '3410660', '2005-11-04 17:39:49 -0600', '20005', '3', '0',
'{22000,22011,22001,22002,22003,22004,22005,22006,22007,22008,22009,22010}', 
'{8,0,0,0,2,1,0,7,2,23,26,0}' )
LOG:  duration: 4814.012 ms  statement: execute write_wibble
( '2048847',
'{{15000,17660,0},{15001,3522,0},{15002,0,0},{15003,3851,0},{15004,0,0},{15005,0,0},{15006,0,0},{15007,0,0},{15011,0,0},{15012,0,0},{15013,0,0},{15014,0,0},{15015,0,0},{15016,0,0},{15017,0,0},{15018,0,0},{15019,0,0},{15020,0,0},{15021,0,0},{15022,0,0},{15023,0,0},{15024,0,0},{15025,0,0},{15026,0,0},{15048,0,0},{15030,0,0},{15031,0,0},{15032,0,0},{15033,0,0},{15035,0,0},{15036,0,0},{15037,0,0},{15038,0,0},{15040,0,0},{15041,1,0},{15042,0,0},{15043,0,0},{15046,0,0},{15047,0,0},{15049,1000,0},{15050,0,0},{15051,0,0},{15052,0,0}}'
 )
LOG:  unexpected EOF on client connection
LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  received immediate shutdown request
WARNING:  terminating connection because of crash of another server
process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
WARNING:  terminating connection because of crash of another server
process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.


My lock checking query:

select d.datname || '.' || c.relname as object,
   l.transaction as trans, l.pid, l.mode, 
   case when b.blocked then b.blocker else null end as blocker
from  (
select w.pid as pid, h.pid as blocker, 't'::bool as blocked
from   pg_locks h, pg_locks w
where  h.granted 
andnot w.granted 
and   (   (h.relation = w.relation and h.database = w.database) 
   or  h.transaction = w.transaction)
union
select h.pid, null, 'f'::bool as blocked
from   pg_locks h, pg_locks w
where  h.granted 
andnot w.granted 
and   (   (h.relation = w.relation and h.database = w.database) 
   or  h.transaction = w.transaction)
   ) b,
   pg_locks l
left outer join pg_database d
 on d.oid = l.database
left outer join  pg_class c
 on c.oid = l.relation
where   l.pid = b.pid
order by l.pid;

Thanks for any suggestions.

__
Marc


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] dynamic loading of .so (originally from

2005-10-24 Thread Marc Munro
Bruce,
There are two problems, though maybe I came to the wrong solution.  I'm
not averse to changing it.

1) Veil starts from a user process and not from the postmaster.  This
means that any shared memory segments created can not necessarily be
mapped to the same address space in each process, which makes using such
shared memory a little more painful than just referencing pointers.

2) There is no simple way to close the shared memory segement when
postgres shuts down.

In an earlier version of Veil I did allocate my own shared memory and
could revive the code if we can overcome problem number 2.  I'd like to
also overcome problem 1 as it makes the code extremely ugly but it's no
show stopper.

Any thoughts?

__
Marc

On Mon, 2005-10-24 at 09:24 -0400, Bruce Momjian wrote:
 Uh, why does Veil have to allocate space from the backend shared memory
 pool rather than allocating its own shared memory segment?
 
 ---
 
 Marc Munro wrote:
 -- Start of PGP signed section.
  Tom,
  My project, Veil, steals some of this shared memory for itself.  I wan't
  aware of the slop factor and was pleased that it just worked.  I guess
  I should have been asking questions of this group.  Sorry.
  
  I would like Veil to be a good citizen and place a legitimate request
  for its shared memory (probably about 16K normally).
  
  Veil is loaded as a shared library, which I would assume means that it
  is not present during database startup, making contributing to the
  sizing calculation and racing the lockmgr a little tricky.
  
  I see a number of possibilities:
  
  - Have a GUC to allocate shmem space for add-ons
  - Have add-ons register themselves and provide hooks for sizing and
  initial space allocation
  - Let add-ons race with the lockmgr for the slop (ie leave as it is)
  
  Thoughts?
  
  I would be happy to work on this and provide whatever patches are
  necessary.
  
  Thanks.
  __
  Marc Munro
  
  On Mon, 2005-10-17 at 10:34 -0300, [EMAIL PROTECTED]
  wrote:
   Date: Mon, 17 Oct 2005 00:42:17 -0400
   From: Tom Lane [EMAIL PROTECTED]
   To: Douglas McNaught [EMAIL PROTECTED]
   Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
   pgsql-general@postgresql.org pgsql-general@postgresql.org
   Subject: Re: dynamic loading of .so 
   Message-ID: [EMAIL PROTECTED]
   
   Douglas McNaught [EMAIL PROTECTED] writes:
[EMAIL PROTECTED] writes:
are there any way to make them global for all the instances?
   
Put them in shared memory.  This probably isn't trival, as all the
shared memory is allocated up front and used for existing purposes
   (at
least, as I understand it). 
   
   There's a slop factor of 100KB or so in the shared memory size
   calculation, which means that an add-on library that requests space
   soon
   enough could probably get away with allocating up to a few KB without
   causing any problems.  (The slop is not totally useless, since for
   instance the lock manager might eat it up if more locks get requested
   than expected.)
   
   In the long run it might be interesting to add hooks that allow
   preloaded libraries to contribute to the shmem sizing calculation and
   then to snarf up the space they've requested before it can get eaten
   by
   the lockmgr.
   
   regards, tom lane
   
 -- End of PGP section, PGP failed!
 


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] dynamic loading of .so (originally from pgsql-general)

2005-10-17 Thread Marc Munro
Tom,
My project, Veil, steals some of this shared memory for itself.  I wan't
aware of the slop factor and was pleased that it just worked.  I guess
I should have been asking questions of this group.  Sorry.

I would like Veil to be a good citizen and place a legitimate request
for its shared memory (probably about 16K normally).

Veil is loaded as a shared library, which I would assume means that it
is not present during database startup, making contributing to the
sizing calculation and racing the lockmgr a little tricky.

I see a number of possibilities:

- Have a GUC to allocate shmem space for add-ons
- Have add-ons register themselves and provide hooks for sizing and
initial space allocation
- Let add-ons race with the lockmgr for the slop (ie leave as it is)

Thoughts?

I would be happy to work on this and provide whatever patches are
necessary.

Thanks.
__
Marc Munro

On Mon, 2005-10-17 at 10:34 -0300, [EMAIL PROTECTED]
wrote:
 Date: Mon, 17 Oct 2005 00:42:17 -0400
 From: Tom Lane [EMAIL PROTECTED]
 To: Douglas McNaught [EMAIL PROTECTED]
 Cc: [EMAIL PROTECTED], [EMAIL PROTECTED],
 pgsql-general@postgresql.org pgsql-general@postgresql.org
 Subject: Re: dynamic loading of .so 
 Message-ID: [EMAIL PROTECTED]
 
 Douglas McNaught [EMAIL PROTECTED] writes:
  [EMAIL PROTECTED] writes:
  are there any way to make them global for all the instances?
 
  Put them in shared memory.  This probably isn't trival, as all the
  shared memory is allocated up front and used for existing purposes
 (at
  least, as I understand it). 
 
 There's a slop factor of 100KB or so in the shared memory size
 calculation, which means that an add-on library that requests space
 soon
 enough could probably get away with allocating up to a few KB without
 causing any problems.  (The slop is not totally useless, since for
 instance the lock manager might eat it up if more locks get requested
 than expected.)
 
 In the long run it might be interesting to add hooks that allow
 preloaded libraries to contribute to the shmem sizing calculation and
 then to snarf up the space they've requested before it can get eaten
 by
 the lockmgr.
 
 regards, tom lane
 


signature.asc
Description: This is a digitally signed message part


Re: User-assigned LWLocks (was Re: [HACKERS] Announcing Veil)

2005-10-09 Thread Marc Munro
For my part, I don't see any current need for extra locking here.

Veil ensures that only one session ever calls LWLockAssign(), and as the
Veil LWLock is allocated on the first piece of user-invoked SQL to call
a Veil function, I see no scope for races between Veil and the rest of
Postgres.

Maybe the correct thing to do is only allow 1 user-defined LWLock for
now, and place a comment with the definition of NUM_USER_DEFINED_LWLOCKS
to warn that locking should be implemented if more than 1 is ever
needed.

__
Marc

On Fri, 2005-10-07 at 16:21 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  On the use of LWLockAssign():can anyone tell me if I should protect the
  call using the ShmemLock spinlock?
 
 Hmm ... the comment for LWLockAssign is not meant to imply that the
 caller could do that; in the event of being out of LWLocks, the code
 would elog(FATAL) without releasing the spinlock, which would lock up
 the whole database.  If we were to do it that way we'd need the spinlock
 handling to be done inside LWLockAssign.  This would not be that bad,
 just a marginal slowdown during database startup, but given the low
 demand for this feature I'm not very eager to do it.
 
 The alternative though would seem to be to adopt some convention about
 another LWLock to take while trying to assign a new LWLock post-startup.
 None of the existing locks seem very appropriate for this, and putting
 the responsibility on callers might be unwise anyway.
 
 Thoughts?
 
   regards, tom lane


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Announcing Veil

2005-10-07 Thread Marc Munro
In response to both Bruce and Tom,
Thanks for this.  I am very happy that this patch will be going in.
Thanks also for pointing out the correct header to use.

As Tom points out, this will do nothing for users of 7.4 and 8.0.  For
these versions I propose to continue to use MMCacheLock.  As far as I
can see, this is safe in 7.4, and otherwise unused in 8.0.

On the use of LWLockAssign():can anyone tell me if I should protect the
call using the ShmemLock spinlock?

__
Marc Munro

On Fri, 2005-10-07 at 00:10 -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian pgman@candle.pha.pa.us writes:
   Tom Lane wrote:
   With only one known request for a user-allocate d lock, it's hard to
   justify the overhead of a GUC variable.
  
   True, but are people going to recompile PostgreSQL to use this feature?
   Seems they would have to.
  
  How you figure that?  The proposed default value was 4, which seems
  fine to me, given that the known worldwide demand amounts to 1.
 
 Oh, so you are going to give him a few slots.  I thought we were going
 to default to 0 and he was going to have to bump it up to use his
 software.  That sounds fine to me.
 


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Announcing Veil

2005-10-05 Thread Marc Munro
Tom,
Thanks for your reponse.  Unless I am missing your point, to add more
locks we require a minor code change to the postgres server.  I am happy
to submit a patch but this will not help Veil work with existing
versions of Postgres.  I am aiming for compatibility with 7.4 onward.
Your views on this would be appreciated.

Assuming that simply allocating a few extra LWLocks for user-defined
functions is acceptable, here are some patches:

--cut---
*** ./src/backend/storage/lmgr/lwlock.c Sat Aug 20 16:26:24 2005
--- lwlock.cWed Oct  5 08:20:31 2005
***
*** 120,126 
 */
numLocks += 2 * NUM_SLRU_BUFFERS;

!   /* Perhaps create a few more for use by user-defined modules? */

return numLocks;
  }
--- 120,127 
 */
numLocks += 2 * NUM_SLRU_BUFFERS;

!   /* Create a few more for use by user-defined modules. */
!   numLocks += NUM_USER_DEFINED_LWLOCKS;

return numLocks;
  }
--cut---
*** ./src/include/storage/lwlock.h  Sat Aug 20 16:26:34 2005
--- lwlock.hWed Oct  5 08:22:26 2005
***
*** 53,58 
--- 53,62 
MaxDynamicLWLock = 10
  } LWLockId;
 
+ /*
+  * Allocate a few LWLocks for user-defined functions.
+  */
+ #define NUM_USER_DEFINED_LWLOCKS 4

  typedef enum LWLockMode
  {
--cut---


__
Marc Munro

On Tue, 2005-10-04 at 22:51 -0400, Tom Lane wrote:
 Marc Munro [EMAIL PROTECTED] writes:
  Since I was unable to dynamically assign a LWLock using
  LWLockAssign (none available), I have fairly arbitrarily overloaded the
  use of existing LWLocks.  When the flames die down perhaps we can
  discuss making a small number (one would be enough for me) of LWLocks
  available.
 
 Perhaps you missed the comment in NumLWLocks()?
 
   regards, tom lane


signature.asc
Description: This is a digitally signed message part


[HACKERS] Announcing Veil

2005-10-04 Thread Marc Munro
This is to announce the first, Alpha, release of Veil, a database
security add-on for Postgres. It allows databases to be created with
efficient row-level access controls; different users have access to
different subsets of data.

Veil is hosted on pgfoundry at http://veil.projects.postgresql.org/
Documentation is here:
http://veil.projects.postgresql.org/curdocs/index.html

If there is sufficient interest, I would ultimately like Veil to be
added to contrib - we will have to see how things pan out.

I am announcing this to the postgres hackers list as I hope for review
comments from a qualified community.  All comments will be welcomed.

If you would like somewhere controversial to start your review, take a
look at veil_shmem.c which hooks into the postgres shared memory
subsystem.  Since I was unable to dynamically assign a LWLock using
LWLockAssign (none available), I have fairly arbitrarily overloaded the
use of existing LWLocks.  When the flames die down perhaps we can
discuss making a small number (one would be enough for me) of LWLocks
available.

Thank you.

__
Marc Munro


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] Incremental backup

2003-02-15 Thread Marc Munro
Kevin,
Hi.  I was looking into PITR for PostgreSQL myself about a year back but
life intervened.  I am an Oracle DBA so may be able to help you with an
understanding of how Oracle does this.

You wrote:
 Oracle has something they call rollback segments which I assume are
 separate bits of data that have enough information to reverse changes
 that were made to the database during a transaction, and I figured
 PITR would (or could) apply particular saved rollback segments to the
 current state in order to roll back a table, tablespace, or database
 to the state it was in at a particular point in time.
 
 As it is, it sounds like PITR is a bit less refined than I expected.

Actually Oracle uses its redo logs (like PostgreSQL' WAL) and
archivelogs (copies of old redo logs) rather than rollback segments.

The way it works, more or less, is that you start from a hot backup, and
roll-forward using archivelogs and redo logs to the point in time to
which you wish to recover.  The whole point of this is not so much that
we can restore our database to some point in the past, as that we can
recover to just before some disaster struck, from a starting point of a
previous hot backup.  In fact, the whole PITR thing for Oracle seems to
be simply an enabling technology for hot backups.  This seems to me to
be one of the killer enterprise features that PostgreSQL currently
lacks.

 So the relevant question is: how is *our* PITR going to work?  In
 particular, how is it going to interact with our WAL files and the
 table store?  If I'm not mistaken, right now (well, as of 7.2 anyway)
 we round robin through a fixed set of WAL files.  For PITR, I assume
 we'd need an archivelog function that would copy the WAL files as
 they're checkpointed to some other location (with destination names
 that reflect their order in time), just for starters.

I believe that three things are needed:
1) filesystem-based hot backups.  It may be possible to simply copy the
database files even though they would be open.  Oracle places each
tablespace into a hot-backup mode prior to taking the copy. In this
mode, updates to the files are queued-up to be applied once the
tablespace is taken out of hot-backup mode.  This all seems quite tricky
and will slow the database down.

2) Auto-archiving of WAL files.  Just as you suggest.

3) A recovery controller that can figure out the state of the restored
database, and manage the reapplication of archived and current WAL
files.
 
 It'd be *awfully* nice if you could issue a command to roll a table
 (or, perhaps, a tablespace, if you've got a bunch of foreign keys and
 such) back to a particular point in time, from the command line, with
 no significant advance preparation (so long as the required files are
 still around, and if they're not then abort the operation with the
 appropriate error message).  But it doesn't sound like that's what
 we're talking about when we talk about PITR...

Nice but lots of work for questionable benefit.  If your application
needs to keep track of history there are better ways to do it.  If not,
then PITR should be seen as an exceptional circumstance and so need not
be so easy to manage.
 
 I wouldn't expect the O* docs to be particularly revealing about
 how the database manages PITR at the file level, but if it does, would
 you happen to know where so I can look at it?  What I've seen so far
 is very basic and not very revealing at all...

The best Oracle documentation on this is available through Oracle
technet (technet.oracle.com) for which you will have to need to
register.  Look for documentation-Oracle 9i Documentation-list of
books-Backup and Recovery Concepts.

-- 
Marc[EMAIL PROTECTED]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html