Re: [HACKERS] Ad-hoc table type?
On Sun, Sep 28, 2008 at 09:24:48PM -0700, David E. Wheeler wrote:
> On Sep 28, 2008, at 17:46, Tom Lane wrote:
>> BTW, I think it is (or should be) possible to create an index on hstore->'mycol', so at least one of the reasons why you should *need* to switch to a real database column seems bogus.
> [...] I'm not sure what that means. Can you create normal btree or hash indexes on hstore columns? And is the index useful for both `@` and `?`?

That means that those operations are supported by a GiST (or GIN) index; that is, "find the records where col contains 'foo = 1, bar = 2'" is supported by the index. Likewise for "is contained in" and "has key". It's a bit like having mini-indexes on all keys (although I guess not that efficient). Pretty cool, I'd say.

Regards
-- tomás

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
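A minimal sketch of the index support being described (table and column names invented; the operator names here follow the contrib/hstore module of that era, where `@` is "contains" and `?` is "has key"):

```sql
CREATE TABLE items (id serial PRIMARY KEY, attrs hstore);

-- a GiST (or GIN) index over the whole hstore column
CREATE INDEX items_attrs_idx ON items USING gist (attrs);

-- "find the records where attrs contains foo=>1, bar=>2" -- indexable
SELECT id FROM items WHERE attrs @ 'foo=>1, bar=>2';

-- "has key" -- also indexable
SELECT id FROM items WHERE attrs ? 'foo';
```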
Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl
Hi,

Stephen Frost wrote:
> * Markus Wanner ([EMAIL PROTECTED]) wrote:
>> What does the subobject column for pg_shdepend buy us?
> Tracking column-level ACL dependencies rather than having those dependencies only be at the table-level. This complicates pg_shdepend some, but simplifies the dependency handling in the ACL area and in handling table/column drops.

With a separate table? Or as part of pg_attribute (which can handle NULL and VARLENA attributes now)?

Regards

Markus Wanner
Re: [HACKERS] Null row vs. row of nulls in plpgsql
2008/9/29 Tom Lane [EMAIL PROTECTED]:
> Greg Stark [EMAIL PROTECTED] writes:
>> On 27 Sep 2008, at 09:56 PM, Tom Lane [EMAIL PROTECTED] wrote:
>>> ISTM that the fundamental problem is that plpgsql doesn't distinguish properly between a null row value (eg, null::somerowtype) and a row of null values (eg, row(null,null,...)::somerowtype). When that code was designed, our main SQL engine was pretty fuzzy about the difference too, but now there is a clear semantic distinction.
>> IIRC the reason for this fuzziness came from the SQL spec definition of IS NULL for rows. As long as you maintain that level of spec-compliance I don't think there are any other important constraints on pg behaviour.
> I started to poke into this and found out that it was a bit subtler than I thought. It'd be possible to associate a rowisnull state value with a row variable, but the problem is that plpgsql treats the row fields as independent variables that can be accessed without touching the row. In particular you can assign null or nonnull values to individual fields. So consider:
>
>   -- presumably, this'll set rowisnull to TRUE:
>   rowvar := NULL;
>   -- this had better cause rowisnull to become FALSE:
>   rowvar.field1 := 42;
>   -- does this cause it to become TRUE again?
>   rowvar.field1 := NULL;

This sequence is wrong. In SQL, a row variable behaves like a pointer: when you want to fill in rowvar, you should call a constructor first. Something like:

  rowvar := NULL;              -- null value
  rowvar := constructor(null);
  rowvar := constructor();
  rowvar.field := 42;

regards
Pavel Stehule

> There are a bunch of implementation problems with making any such behavior happen, since the row field variables don't currently know that they are members of a row, and indeed it's possible for the same variable to be a member of more than one row. But the core issue is that this interaction seems to fuzz the distinction between "row is null" and "all the row's elements are null".
> In particular, if you think that rowisnull should be TRUE after the above sequence, then I think you are saying they are the same thing. So maybe the spec authors are smarter than we are. Thoughts? What would a consistent behavior look like?
>
> regards, tom lane
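The spec rule being referenced can be sketched in plain SQL (the composite type here is invented for illustration):

```sql
CREATE TYPE tworow AS (a int, b int);

SELECT (NULL::tworow) IS NULL;            -- true: a null row value
SELECT (ROW(NULL, NULL)::tworow) IS NULL; -- also true per the SQL spec,
                                          -- which is exactly the fuzziness
                                          -- under discussion
SELECT (ROW(42, NULL)::tworow) IS NULL;     -- false
SELECT (ROW(42, NULL)::tworow) IS NOT NULL; -- also false: IS NOT NULL
                                            -- requires every field non-null
```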
Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl
* Markus Wanner ([EMAIL PROTECTED]) wrote:
> Stephen Frost wrote:
>> * Markus Wanner ([EMAIL PROTECTED]) wrote:
>>> What does the subobject column for pg_shdepend buy us?
>> Tracking column-level ACL dependencies rather than having those dependencies only be at the table-level. This complicates pg_shdepend some, but simplifies the dependency handling in the ACL area and in handling table/column drops.
> With a separate table? Or as part of pg_attribute (which can handle NULL and VARLENA attributes now)?

As part of pg_attribute.. Having a separate table would be an alternative to adding a column to pg_shdepend.

Stephen
Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl
Hi,

Stephen Frost wrote:
> As part of pg_attribute.. Having a separate table would be an alternative to adding a column to pg_shdepend.

Aha. Hm... I thought tracking dependencies between tables and attributes complicates DROP TABLE? Why doesn't that concern apply here?

And why do we keep the attribute defaults in their own table with their own OID, instead of merging them into pg_attribute? (Or put the other way around: why do these need their own dependency tracking, while the ACLs don't?)

Or do we just want to keep the column-level privileges patch simple here and defer other work to another patch?

Regards

Markus Wanner
Re: [HACKERS] Ad-hoc table type?
> What you're talking about is a document-based database like StrokeDB or CouchDB. With hstore you don't need to parse the content of the 'aggregate' column; it provides the necessary methods. Also, we tried to speed up selects using indexes. Probably we need to refresh our interest in hstore. Do you have any actual proposals?

Proposals, not at this point. I'm trying to decide (a) if I have the time and (b) whether I do it with Postgres or SQLite. The hstore module, as I said, looks really cool; I've contemplated something like it. I have a module that provides a set of accessors for an XML text column that works similarly, but it parses the XML on each access and the application has to create the XML. (I have XML creation modules for Java, PHP, C++, and standard C bindings.)

It is more a conflict of data ideology, IMHO. There is a class of data that is logically on the same level as other data, but is forced into a secondary storage methodology. It isn't a pressing need, as there are workarounds, but don't you think a cleaner interface makes sense? Also, what is the overhead of the secondary storage mechanism? I think it would make the life of application developers easier.

Oleg

On Sun, 28 Sep 2008, [EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] writes:
>> Something like this:
>>   create adhoc table foo ();
>>   insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>> In an ad-hoc table type, when an insert is made and a column is not found, then a new varchar column is added. I know the idea has a lot of holes, and is probably a bad idea, but it answers an important problem of easily mapping programmatic types to a database.
> Seems like a table with one contrib/hstore column might be more relevant to this guy's idea of how to do database design.

That's actually a very cool module, I hadn't seen it before. I've considered writing something like it, but more XML-centric, but I'm not sure it answers the concept.
I'm not sure if you have dealt with web site sessions and object persistence crap, but it's a pain to get up and running, and improving performance is a drag. Web guys tend to know very little about databases and tend, sadly, not to be very inquisitive about such things. Web session and user attribute objects are typically stored in a database as XML, JSON, or some other aggregated format in a single column (hstore). That works great when you just need to access the data by key, but if you want to use the data outside the web application for something like OLAP, you have to decide which attributes reside in the aggregate column and which get promoted to a full-fledged column. That's why you'll see tables with username, passwdhash, email, etc. in addition to an aggregated column of things like screen template, age, etc.

So, how do you have a table of a generally arbitrary number of columns without creating some sort of aggregate column? With an aggregate column, the data isn't on the same level as real column data, so you need to parse the aggregate to extract a value, and you have to do that for each value. On top of that, you then have to explain your aggregate strategy to the web guys. Being able to insert arbitrary named values, and extract them similarly, IMHO works better and more naturally than some external aggregate system built on a column. I know it is a little outside-the-box thinking; what do you think?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
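The "aggregate column" pattern being debated might look like this with hstore (a sketch; table and key names are invented, and the `->` fetch operator is per the contrib module):

```sql
CREATE TABLE sessions (
    username   text PRIMARY KEY,
    passwdhash text,
    email      text,
    attrs      hstore    -- the "aggregate" column: template, age, ...
);

INSERT INTO sessions (username, passwdhash, email, attrs)
VALUES ('joe', '...', 'joe@example.com', 'template=>dark, age=>42');

-- extracting one attribute goes through the hstore accessor, which is
-- the "not on the same level as real column data" complaint above:
SELECT username, attrs -> 'template' FROM sessions;
```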
Re: [HACKERS] parallel pg_restore - WIP patch
Andrew Dunstan wrote:
> Andrew Dunstan wrote:
>>> this works better but there is something fishy still - using the same dump file I get a proper restore using pg_restore normally. If I however use -m for a parallel one I only get parts (in this case only 243 of the 709 tables) of the database restored ...
>> Yes, there are several funny things going on, including some stuff with dependencies. I'll have a new patch tomorrow with luck. Thanks for testing.
> OK, in this version a whole heap of bugs are fixed, mainly those to do with dependencies and saved state. I get identical row counts in the source and destination now, quite reliably.

this looks much better (for a restore that usually takes 180min I can get down to 72min using -m 4) - however especially with higher concurrency I'm sometimes running into restore failures due to deadlocks happening during constraint restoration (slightly redacted):

pg_restore: [archiver (db)] Error from TOC entry 7765; 2606 1460743180 FK CONSTRAINT fk_av_relations_av db_owner
pg_restore: [archiver (db)] could not execute query: ERROR: deadlock detected
DETAIL: Process 18100 waits for AccessExclusiveLock on relation 1460818342 of database 1460815284; blocked by process 18103. Process 18103 waits for AccessExclusiveLock on relation 1460818336 of database 1460815284; blocked by process 18100.
HINT: See server log for query details.
ALTER TABLE ONLY foo ADD CONSTRAINT fk_av_relations_av FOREIGN KEY (vs_id) REFERENCES bar ...
Re: [HACKERS] parallel pg_restore - WIP patch
Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
> pg_restore: [archiver (db)] could not execute query: ERROR: deadlock detected
> DETAIL: Process 18100 waits for AccessExclusiveLock on relation 1460818342 of database 1460815284; blocked by process 18103. Process 18103 waits for AccessExclusiveLock on relation 1460818336 of database 1460815284; blocked by process 18100.
> HINT: See server log for query details.
> ALTER TABLE ONLY foo ADD CONSTRAINT fk_av_relations_av FOREIGN KEY (vs_id) REFERENCES bar ...

Hmm, I'll bet the restore code doesn't realize that this can't run in parallel with index creation on either table ...

regards, tom lane
Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl
Markus Wanner [EMAIL PROTECTED] writes:
> And why do we keep the attribute defaults in their own table with their own OID, instead of merging them into pg_attribute?

That has already been explained multiple times in this thread, but: the default expression is a separate entity from the attribute itself, so there needs to be some different representation for it in pg_depend. Otherwise we couldn't handle the concept that dropping some entity (like a function) forces discarding of the default, not the whole column the default is attached to. Now admittedly giving it its own OID and classid = pg_attrdef is probably not the only way to do that. But merging it into the pg_attribute row leaves no obvious way to do it within the object-identity representation that's been chosen for pg_depend.

> (Or put the other way around: why do these need their own dependency tracking, while the ACLs don't?)

pg_shdepend is already designed to track ACLs: an ACL dependency says that there's some privilege that this role has been granted on this object. So as long as you can identify the object you're okay; you don't need a separate identity for the ACL.

> Or do we just want to keep the column-level privileges patch simple here and defer other work to another patch?

Stephen was arm-waving about getting rid of pg_attrdef, but trying to hold the column privileges patch hostage to that would be a serious error. It's an independent problem, so it ought to be addressed in a separate patch; and it has no clear solution, so it's not entirely obvious that it can or should be done at all.

regards, tom lane
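The "dropping some entity forces discarding of the default, not the column" behavior can be illustrated directly (a sketch; the function and table names are invented):

```sql
CREATE FUNCTION next_code() RETURNS int LANGUAGE sql AS 'SELECT 1';
CREATE TABLE t (id int, code int DEFAULT next_code());

-- The default expression -- a pg_attrdef entry with its own OID --
-- depends on next_code(), so dropping the function with CASCADE
-- discards only the default, not the column it was attached to:
DROP FUNCTION next_code() CASCADE;
-- t.code still exists, but now has no default
```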
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Sun, 2008-09-28 at 21:16 -0400, Tom Lane wrote:
> Simon Riggs [EMAIL PROTECTED] writes:
> It does nothing AFAICS for the problem that when restarting archive recovery from a restartpoint, it's not clear when it is safe to start letting in backends. You need to get past the highest LSN that has made it out to disk, and there is no good way to know what that is.
>> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in the controlfile, so whenever we restart we can see that it has been set previously and now we are beyond it.
> Right ...
>> So if we crash during recovery and then restart *after* we reached minRecoveryLoc then we resume in safe mode almost immediately.
> Wrong.

OK, I see where you're coming from now. A solution is needed, I agree.

> What minRecoveryLoc is is an upper bound for the LSNs that might be on-disk in the filesystem backup that an archive recovery starts from. (Defined as such, it never changes during a restartpoint crash/restart.) Once you pass that, the on-disk state as modified by any dirty buffers inside the recovery process represents a consistent database state. However, the on-disk state alone is not guaranteed consistent. As you flush some (not all) of your shared buffers you enter other not-certainly-consistent on-disk states. If we crash in such a state, we know how to use the last restartpoint plus WAL replay to recover to another state in which disk + dirty buffers are consistent. However, we reach such a state only when we have read WAL to beyond the highest LSN that has reached disk --- and in recovery mode there is no clean way to determine what that was. Perhaps a solution is to make XLogFlush not be a no-op in recovery mode, but have it scribble a highest-LSN somewhere on stable storage (maybe scribble on pg_control itself, or maybe better someplace else). I'm not totally sure about that. But I am sure that doing nothing will be unreliable.

No need to write highest LSN to disk constantly...
If we restart from a restartpoint then initially the current apply LSN will be potentially/probably earlier than the latest on-disk LSN, as you say. But once we have completed the next restartpoint *after* the value pg_control says, then we will be guaranteed that the two LSNs are the same, since otherwise we would have restarted at a later point.

That kinda works, but the problem is that restartpoints are time-based, not log-based. We need them to be deterministic for us to rely upon them in the above way. If we crash and then replay, we can only be certain we are safe when we have found a restartpoint that the previous recovery will definitely have reached. So we must have log-based restartpoints, using either a constant LSN offset, or a parameter like checkpoint_segments. But if it is changeable then it needs to be written into the control file, so we don't make a mistake about it.

So we need to:
* add an extra test to delay the safe point if required
* write the restart_segments value to the control file
* force a restartpoint on the first valid checkpoint WAL record after we have passed restart_segments worth of log

-- Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Re: [HACKERS] parallel pg_restore - WIP patch
Tom Lane wrote:
> Stefan Kaltenbrunner [EMAIL PROTECTED] writes:
>> pg_restore: [archiver (db)] could not execute query: ERROR: deadlock detected
>> DETAIL: Process 18100 waits for AccessExclusiveLock on relation 1460818342 of database 1460815284; blocked by process 18103. Process 18103 waits for AccessExclusiveLock on relation 1460818336 of database 1460815284; blocked by process 18100.
>> HINT: See server log for query details.
>> ALTER TABLE ONLY foo ADD CONSTRAINT fk_av_relations_av FOREIGN KEY (vs_id) REFERENCES bar ...
> Hmm, I'll bet the restore code doesn't realize that this can't run in parallel with index creation on either table ...

Yeah. Of course, it's never needed to bother with stuff like that till now. The very simple fix is probably to run a separate parallel cycle just for FKs, after the index creation. A slightly more elegant fix would probably be to add dependencies from each index that might cause this to the FK constraint. I'll work on the first for now.

Is there any chance that the locks we're taking here are too strong? Intuitively it looks a bit like it.

cheers

andrew
Re: [HACKERS] parallel pg_restore - WIP patch
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Hmm, I'll bet the restore code doesn't realize that this can't run in parallel with index creation on either table ...
> Yeah. Of course, it's never needed to bother with stuff like that till now. The very simple fix is probably to run a separate parallel cycle just for FKs, after the index creation.

Um, FKs could conflict with each other too, so that by itself isn't gonna fix anything.

regards, tom lane
Re: [HACKERS] [REVIEW] Prototype: In-place upgrade v02
Abbas napsal(a):
> Hi, I have gone through the following stuff:
> 1) previous emails on the patch
> 2) http://wiki.postgresql.org/wiki/In-place_upgrade
> 3) http://www.pgcon.org/2008/schedule/attachments/57_pg_upgrade_2008.pdf
> 4) http://wiki.postgresql.org/wiki/In-place_upgrade:Storage
> Here is what I have understood so far (correct me if I am wrong): The on-disk representation of data has changed from version to version over the years. For some strange reason (performance, maybe) the newer versions of pg were not backwards compatible, meaning that the newer version would not read data written by an older version if the on-disk representation had changed in between. The end user would be required to port the data stored using an older version to the newer version's format using offline import/export. This project aims at upgrades from older to newer versions on the fly. The on-disk representation is not the only change that the system should accommodate; it should also accommodate catalog changes, conf file changes etc.

That is correct.

> Of the available design choices I think you have chosen to go with on-line data conversion, meaning that pg would now be aware of all the previous page layouts and, based on a switch on page version, would handle each page layout. This will only be done to read old data; newer data will be written in the newer format.

Yes.

> I am supposed to test the patch, and for that I have downloaded pg versions 7.4, 8.0, 8.1, 8.2 and 8.3. I plan to create a data directory using each of the versions and then try to read the same using 8.4 with your patch applied.

That does not work. The patch is only a prototype. It contains a framework for implementing old page layout versions, and it contains a partial version 3. The main purpose of this prototype is to make a decision on whether this approach is acceptable or not, or whether some part of it is acceptable -- it contains, for example, a useful page API rework and implementation which is (in my opinion) useful in general.
> What database objects should I create in the test database? Should I just create objects of my choice? Do the sizes (both length and breadth) of tables matter?

These tests do not make sense at this moment. I have a test script (created by Nidhi) for the catalog upgrade already done. However, it currently uses Sun's internal framework. I will modify it and release it.

> Do I have to perform performance tests too?

Yes, please. My colleague tested it and got a 5% performance drop, but that was not the complete version. I tested the full patch on Friday and it was a surprise for me ... I got slightly better throughput (about 0.5%) with the patch. I'm going to retest it, but it would be good to get results from others too.

thanks

Zdenek

> Regards Abbas
>
> On Fri, 2008-09-19 at 14:28 +0200, Zdenek Kotala wrote:
>> thanks
>> Abbas napsal(a):
>>> Even with that, a hunk failed for bufpage.c, but I applied that part manually to move on. Regards Abbas
>>> On Thu, 2008-09-18 at 12:17 +0200, Zdenek Kotala wrote:
>>>> Abbas napsal(a):
>>>>> Hi, I downloaded the latest postgresql source code from git clone git://git.postgresql.org/git/postgresql.git and tried to apply the patch http://archives.postgresql.org/pgsql-hackers/2008-09/gza1fGXLvf3L.gz It does not apply cleanly, see the failures in the attached file.
>>>> It clashes with the hash index patch which was committed four days ago. Try to use a slightly older revision from git (without the hash index modification).
>>>> Zdenek

-- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes:
> ... That kinda works, but the problem is that restartpoints are time-based, not log-based. We need them to be deterministic for us to rely upon them in the above way.

Right, but the performance disadvantages of making them strictly log-distance-based are pretty daunting. We don't really want slaves doing that while they're in catchup mode.

regards, tom lane
Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl
Hi,

thank you for your patience in explaining. Rest assured that I've read the relevant messages multiple times.

Tom Lane wrote:
> the default expression is a separate entity from the attribute itself,

That was the point I didn't understand...

> .. Otherwise we couldn't handle the concept that dropping some entity (like a function) forces discarding of the default, not the whole column the default is attached to.

...and that finally provided the missing piece for my puzzle: it's not the dependency between the attribute and its default which matters here, but the possible dependencies of the default (and not the attribute itself) on other entities.

> Now admittedly giving it its own OID and classid = pg_attrdef is probably not the only way to do that. But merging it into the pg_attribute row leaves no obvious way to do it within the object identity representation that's been chosen for pg_depend.

Understood, makes sense.

> pg_shdepend is already designed to track ACLs: an ACL dependency says that there's some privilege that this role has been granted on this object. So as long as you can identify the object you're okay, you don't need a separate identity for the ACL.

Sure.

> Stephen was arm-waving about getting rid of pg_attrdef, but trying to hold the column privileges patch hostage to that would be a serious error. It's an independent problem, so it ought to be addressed in a separate patch; and it has no clear solution so it's not entirely obvious that it can or should be done at all.

Agreed.

Regards

Markus Wanner
Re: [HACKERS] parallel pg_restore - WIP patch
Tom Lane wrote:
> Andrew Dunstan [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> Hmm, I'll bet the restore code doesn't realize that this can't run in parallel with index creation on either table ...
> Um, FKs could conflict with each other too, so that by itself isn't gonna fix anything.

Good point. Looks like we'll need to make a list of "can't run in parallel with" items as well as strict dependencies.

cheers

andrew
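A toy sketch of the scheduling idea (this is not pg_restore's actual code; item names, lock sets, and the batch-based approach are all invented for illustration): each restore item carries both hard dependencies and a set of relations it must lock exclusively, and items sharing a lock target never run in the same parallel batch.

```python
def schedule(items, deps, locks):
    """Group restore items into parallel batches.

    items: list of item names, in dump order.
    deps:  name -> set of prerequisite names (strict dependencies).
    locks: name -> set of relations the item locks exclusively
           ("can't run in parallel with" information).
    Items within one returned batch may safely run concurrently.
    """
    done = set()
    batches = []
    remaining = list(items)
    while remaining:
        batch, held = [], set()
        for item in remaining:
            ready = deps.get(item, set()) <= done
            # refuse to co-schedule items whose exclusive locks collide
            conflict_free = locks.get(item, set()).isdisjoint(held)
            if ready and conflict_free:
                batch.append(item)
                held |= locks.get(item, set())
        if not batch:
            raise RuntimeError("unsatisfiable dependencies")
        batches.append(batch)
        done |= set(batch)
        remaining = [i for i in remaining if i not in done]
    return batches
```

With this rule, the FK on (foo, bar) from the deadlock report above would simply be deferred until no index build holds either table, and two FKs touching the same table would land in different batches.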
Re: [HACKERS] parallel pg_restore - WIP patch
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Um, FKs could conflict with each other too, so that by itself isn't gonna fix anything.
> Good point. Looks like we'll need to make a list of "can't run in parallel with" items as well as strict dependencies.

Yeah, I was just thinking about that. The current archive format doesn't really carry enough information for this. I think there are two basic solutions we could adopt:

* Extend the archive format to provide some indication that "restoring this object requires exclusive access to these dependencies".

* Hardwire knowledge into pg_restore that certain types of objects require exclusive access to their dependencies.

The former seems more flexible, as well as more in tune with the basic design assumption that pg_restore shouldn't have a lot of knowledge about individual archive object types. But it would mean that you couldn't use parallel restore with any pre-8.4 dumps. In the long run that's no big deal, but in the short run it's annoying.

Another angle is that it's not clear what happens if the need for exclusive access changes over time. You were just speculating about reducing the lock strength required for ALTER TABLE ADD FOREIGN KEY. I don't know if that's workable or not, but certainly reducing the lock strength for some types of ALTER TABLE might be in our future. Contrarily, we don't currently try hard to lock any non-table objects (schemas, functions, etc) while building dependent objects; but that's obviously not really right, and someday we might decide to fix it. So having pg_dump prepare the list of exclusive dependencies at dump time might be the wrong thing --- it would reflect the behavior of the source server version, not the target, which is what matters.

Thoughts?

regards, tom lane
Re: [HACKERS] FSM rewrite: doc changes
Tom Lane wrote:
> FreeSpaceMapTruncateRel seems to have a bug in its early-exit test: in the case where the number of FSM blocks stays the same, it fails to zero out slots in the last block. I also think it's got an off-by-one problem in figuring the number of FSM blocks: for the normal case where the new heap end is in the middle of a FSM block, shouldn't new_nfsmblocks be one larger than it is? The case where nblocks is an exact multiple of SlotsPerFSMPage would need to be special-cased to be exactly correct, though I see no real harm in letting the FSM be left one page too big in that case. fsm_truncate_avail seems quite broken: it's clearing the whole page always.

Yep, I noticed these myself on Friday after sending the patch.

> I do not like the kluge in heap_xlog_clean one bit, and think it's unnecessary anyway since we are not relying on the FSM to be accurate. Suggest reverting the heapam.c changes except for heap_sync().

The point is to have a more up-to-date FSM after recovery, PITR recovery in a warm standby server in particular. I'll take it out for now, but it needs more discussion. In fact, I think we should update the FSM even more aggressively, on inserts and updates as well as vacuums. Probably not on all inserts and updates, though, to keep the overhead minimal, but there's a tradeoff somewhere in between.

> The new search algorithm in fsm_search_avail still doesn't work. Consider what happens when the target is the rightmost slot on the page; it certainly won't wrap properly.

Crap, you're right.

> In fsm_rebuild_page, surely we needn't check if (lchild < NodesPerPage).

Yes, we do. There can be a number of completely unused upper nodes on the right. The upper levels of the tree are complete, but the bottom level is missing enough nodes to make room for the page header. So the tree looks something like this:

        X
    X       X
  X   X   X   X
 X X X X X X . . .

where "." is a missing node. The parents that are missing both children will always be zero.
> This test in fsm_space_needed_to_cat:
>
>   if (needed >= (FSM_CATEGORIES - 1) * FSM_CAT_STEP)
>       elog(ERROR, "invalid FSM request size");
>
> reveals a rather fundamental problem: it is clearly possible for this test to fail on valid request sizes, because the page header overhead is less than FSM_CAT_STEP (especially if BLCKSZ is more than 8K). I'm not sure about a really clean solution here. We could offset the needed_to_cat and avail_to_cat calculations so that category 255 corresponds exactly to the maximum possible free space, but that requires assuming that FSM knows exactly what that is, which is a bit unpleasant. Thoughts?

Hmph. The other alternative is to use 2 bytes instead of one per page, and track the free space exactly. But I'd rather not do that just to deal with the very special case of huge requests.

Or we could just return -1 instead of throwing an error. Requests higher than the limit would then always have to extend the heap. That's not good, but I think we already have that problem for tuples of exactly MaxHeapTupleSize bytes. Since PageGetFreeSpace subtracts the size of a new line pointer, only newly extended pages that have never had any tuples on them have enough space, as determined by PageGetFreeSpace, to fit a tuple of MaxHeapTupleSize bytes.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
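The quantization scheme under discussion can be sketched as follows (an illustrative model only, not the patch's actual code; the constants are plausible values for an 8K page, and the rounding directions are the important part -- free space rounds down, requests round up, so a page's category never overstates what it can hold):

```python
BLCKSZ = 8192
FSM_CATEGORIES = 256
FSM_CAT_STEP = BLCKSZ // FSM_CATEGORIES   # 32 bytes per category

def avail_to_cat(avail):
    """Map actual free bytes on a page to a one-byte category,
    rounding down (never promise more than is there)."""
    return min(avail // FSM_CAT_STEP, FSM_CATEGORIES - 1)

def needed_to_cat(needed):
    """Map a request size to the smallest category certain to satisfy
    it, rounding up.  Requests at or above the top category cannot be
    represented -- the overflow case debated above; here we return -1
    (caller extends the relation) instead of raising an error."""
    if needed >= (FSM_CATEGORIES - 1) * FSM_CAT_STEP:
        return -1
    return -(-needed // FSM_CAT_STEP)     # ceiling division
```

Note the conservative gap this creates: a page with, say, 100 free bytes lands in category 3, while a 100-byte request asks for category 4, so small deltas of free space never flip a page's category and force parent updates.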
Re: [HACKERS] FSM rewrite: doc changes
Heikki Linnakangas [EMAIL PROTECTED] writes: Tom Lane wrote: In fsm_rebuild_page, surely we needn't check if (lchild < NodesPerPage). Yes, we do.

But the loop starting point is such that you must be visiting a parent with at least one child, no?

reveals a rather fundamental problem: it is clearly possible for this test to fail on valid request sizes, because the page header overhead is less than FSM_CAT_STEP (especially if BLCKSZ is more than 8K). I'm not sure about a really clean solution here.

Hmph. The other alternative is to use 2 bytes instead of one per page, and track the free space exactly. But I'd rather not do that just to deal with the very special case of huge requests.

Yeah, I thought about that too. It's got another problem besides the sheer space cost: it would result in a whole lot more update traffic for upper levels of the tree. The quantization of possible values in the current design is good because it avoids updates of parents for relatively small deltas of free space.

Or we could just return -1 instead of throwing an error. Requests higher than the limit would then always have to extend the heap. That's not good, but I think we already have that problem for tuples of exactly MaxHeapTupleSize bytes. Since PageGetFreeSpace subtracts the size of a new line pointer, only newly extended pages that have never had any tuples on them have enough space, as determined by PageGetFreeSpace, to fit a tuple of MaxHeapTupleSize bytes.

That seems like something we'll want to fix sometime, rather than hardwiring into the FSM design. I suppose an alternative possibility is to set MaxHeapTupleSize at 255/256's of a block by definition, so that no request will ever exceed what the FSM stuff can handle. But I'm sure that'd make somebody unhappy --- somewhere out there is a table with tuples wider than that. Probably the least bad alternative here is to allow FSM's category scaling to depend on MaxHeapTupleSize. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Mon, 2008-09-29 at 08:46 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: ... That kinda works, but the problem is that restartpoints are time based, not log based. We need them to be deterministic for us to rely upon them in the above way. Right, but the performance disadvantages of making them strictly log-distance-based are pretty daunting. We don't really want slaves doing that while they're in catchup mode. I don't think we need to perform restartpoints actually, now I think about it. It's only the LSN that is important. I think we can get away with writing the LSN value to disk, as you suggested, but only every so often. No need to do it after every WAL record, just consistently every so often, so it gives us a point at which we know we are safe. We will need to have Startup process block momentarily while the value is written. Propose Startup process writes/flushes LSN to pg_control every time we change xlogid. That's independent of WAL file size and fairly clear. When we reach that LSN + 1 we will know that no LSNs higher than that value can have reached disk. OK? -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel pg_restore - WIP patch
On Monday 29 September 2008, Tom Lane wrote: * Extend the archive format to provide some indication that restoring this object requires exclusive access to these dependencies. * Hardwire knowledge into pg_restore that certain types of objects require exclusive access to their dependencies.

Well, it seems to me that currently the FK needs in terms of existing indexes and locks, and some other object lock needs, are all hardwired. Is it even safe to consider having the locks needed for certain commands not be hardwired? Provided I'm not all wrong here, I don't see how having something more flexible at restore time than at build time is a win. The drawback is that whenever you change a lock need in commands, you have to remember teaching pg_restore about it too. So my vote here is in favor of hardwired knowledge in pg_restore, matching target server code assumptions and needs.

Regards, -- dim
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes: I think we can get away with writing the LSN value to disk, as you suggested, but only every so often. No need to do it after every WAL record, just consistently every so often, so it gives us a point at which we know we are safe. Huh? How does that make you safe? What you need to know is the max LSN that could possibly be on disk. Hmm, actually we could get away with tying this to fetching WAL files from the archive. When switching to a new WAL file, write out the *ending* WAL address of that file to pg_control. Then process the WAL records in it. Whether or not any of the affected pages get to disk, we know that there is no LSN on disk exceeding what we already put in pg_control. If we crash and restart, we'll have to get to the end of this file before we start letting backends in; which might be further than we actually got before the crash, but not too much further because we already know the whole WAL file is available. Or is that the same thing you were saying? The detail about using the end address seems fairly critical, and you didn't mention it... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
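The scheme Tom sketches above reduces to a small invariant: record the fetched segment's *ending* address before replaying it, and refuse to admit backends until replay has passed that recorded point. The following is a self-contained toy of that idea; the type, names, and 16 MB segment size are assumptions for illustration, not PostgreSQL's actual control-file code:

```c
#include <stdint.h>

/* Sketch of the safe-restart-point scheme under discussion.  Before
 * replaying a WAL segment fetched from the archive, the slave records the
 * segment's *ending* address as the minimum recovery point: no page
 * flushed while replaying that segment can carry an LSN beyond it. */
#define WAL_SEG_SIZE (16 * 1024 * 1024)   /* assumed 16 MB segments */

typedef struct
{
    uint64_t    min_recovery_point;   /* stand-in for a pg_control field */
} ControlFileSketch;

/* A segment's end address is the start of the next segment. */
static uint64_t
segment_end_lsn(uint64_t seg_no)
{
    return (seg_no + 1) * (uint64_t) WAL_SEG_SIZE;
}

/* Called when a new segment is fetched, before any of it is replayed. */
static void
note_segment_fetched(ControlFileSketch *cf, uint64_t seg_no)
{
    cf->min_recovery_point = segment_end_lsn(seg_no);
    /* ...here the real code would write and fsync pg_control... */
}

/* After a crash and restart, backends may only be admitted once replay
 * has reached the recorded point, even if the previous run crashed
 * somewhere in the middle of the segment. */
static int
ok_to_admit_backends(const ControlFileSketch *cf, uint64_t replayed_up_to)
{
    return replayed_up_to >= cf->min_recovery_point;
}
```

This also shows why the end address (rather than the start) matters: recording the start would leave a window where flushed pages carry LSNs beyond what pg_control promises.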
[HACKERS] Fatal Errors
Is it possible to have a FATAL error that crashes a backend and for it to *not* have written an abort WAL record for any previously active transaction? I think yes, but haven't managed to create this situation while testing for it. If we either *always* write a WAL record, or PANIC then that makes some coding easier, so seems sensible to check. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel pg_restore - WIP patch
Dimitri Fontaine wrote: On Monday 29 September 2008, Tom Lane wrote: * Extend the archive format to provide some indication that restoring this object requires exclusive access to these dependencies. * Hardwire knowledge into pg_restore that certain types of objects require exclusive access to their dependencies. Well, it seems to me that currently the FK needs in terms of existing indexes and locks, and some other object lock needs, are all hardwired. Is it even safe to consider having the locks needed for certain commands not be hardwired? Provided I'm not all wrong here, I don't see how having something more flexible at restore time than at build time is a win. The drawback is that whenever you change a lock need in commands, you have to remember teaching pg_restore about it too. So my vote here is in favor of hardwired knowledge in pg_restore, matching target server code assumptions and needs.

Well, I've had to use some knowledge of various item types already, and I have been trying not to disturb pg_dump also, so I'm inclined to build this knowledge into pg_restore. ISTM that things that will have lock conflicts are different and more target-version-dependent than things that logically depend on other things, so we can still rely on pg_dump to some extent to provide the latter while building the former at restore time.

cheers

andrew

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fatal Errors
Simon Riggs [EMAIL PROTECTED] writes: Is it possible to have a FATAL error that crashes a backend and for it to *not* have written an abort WAL record for any previously active transaction? Well, a FATAL error will still go through transaction abort before exiting, IIRC. The problem case is a PANIC or an actual core dump. If we either *always* write a WAL record, or PANIC then that makes some coding easier, Like what? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fatal Errors
On Mon, 2008-09-29 at 10:30 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Is it possible to have a FATAL error that crashes a backend and for it to *not* have written an abort WAL record for any previously active transaction? Well, a FATAL error will still go through transaction abort before exiting, IIRC. The problem case is a PANIC or an actual core dump. If we either *always* write a WAL record, or PANIC then that makes some coding easier, Like what? For constructing snapshots during standby. I need a data structure where emulated-as-running transactions can live. If backend birth/death is intimately tied to WAL visible events then I can use dummy PGPROC structures. If not, then I will have to create a special area that can expand to cater for the possibility that a backend dies and WAL replay won't know about it - which also means I would need to periodically dump a list of running backends into WAL. PANIC isn't a problem case because we'll end up generating a shutdown checkpoint which shows the backends have been terminated. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I think we can get away with writing the LSN value to disk, as you suggested, but only every so often. No need to do it after every WAL record, just consistently every so often, so it gives us a point at which we know we are safe. Huh? How does that make you safe? What you need to know is the max LSN that could possibly be on disk. Hmm, actually we could get away with tying this to fetching WAL files from the archive. When switching to a new WAL file, write out the *ending* WAL address of that file to pg_control. Then process the WAL records in it. Whether or not any of the affected pages get to disk, we know that there is no LSN on disk exceeding what we already put in pg_control. If we crash and restart, we'll have to get to the end of this file before we start letting backends in; which might be further than we actually got before the crash, but not too much further because we already know the whole WAL file is available. Or is that the same thing you were saying? The detail about using the end address seems fairly critical, and you didn't mention it...

Same! I just said the safe point was LSN + 1, and the end is the next start. Looks like we've got a solution, no matter how it's described. (I actually have a more detailed proof of safety using snapshot/MVCC considerations, so I wasn't overly worried, but what we've discussed is much easier to understand and agree on. Proof of safety is all we need, and this simpler proof is more secure.)

Don't want to make it per file though. Big systems can whizz through WAL files very quickly, so we either make it a big number, e.g. 255 files per xlogid, or we make it settable (and recorded in pg_control).

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fatal Errors
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2008-09-29 at 10:30 -0400, Tom Lane wrote: Like what? For constructing snapshots during standby. I need a data structure where emulated-as-running transactions can live. If backend birth/death is intimately tied to WAL visible events then I can use dummy PGPROC structures. If not, then I will have to create a special area that can expand to cater for the possibility that a backend dies and WAL replay won't know about it - which also means I would need to periodically dump a list of running backends into WAL. Mph. I find the idea of assuming that there must be an abort record to be unacceptably fragile. Consider the possibility that the transaction gets an error while trying to run AbortTransaction. Some of that code is a CRITICAL_SECTION, but I don't think I like the idea that all of it has to be one. PANIC isn't a problem case because we'll end up generating a shutdown checkpoint which shows the backends have been terminated. Thought you were trying to get rid of the shutdown checkpoint during restart? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote: ... If we crash and restart, we'll have to get to the end of this file before we start letting backends in; which might be further than we actually got before the crash, but not too much further because we already know the whole WAL file is available. Don't want to make it per file though. Big systems can whizz through WAL files very quickly, so we either make it a big number e.g. 255 files per xlogid, or we make it settable (and recorded in pg_control). I think you are missing the point I made above. If you set the okay-to-resume point N files ahead, and then the master stops generating files so quickly, you've got a problem --- it might be a long time until the slave starts letting backends in after a crash/restart. Fetching a new WAL segment from the archive is expensive enough that an additional write/fsync per cycle doesn't seem that big a problem to me. There's almost certainly a few fsync-equivalents going on in the filesystem to create and delete the retrieved segment files. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fatal Errors
On Mon, 2008-09-29 at 11:18 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2008-09-29 at 10:30 -0400, Tom Lane wrote: Like what? For constructing snapshots during standby. I need a data structure where emulated-as-running transactions can live. If backend birth/death is intimately tied to WAL-visible events then I can use dummy PGPROC structures. If not, then I will have to create a special area that can expand to cater for the possibility that a backend dies and WAL replay won't know about it - which also means I would need to periodically dump a list of running backends into WAL. Mph. I find the idea of assuming that there must be an abort record to be unacceptably fragile. Consider the possibility that the transaction gets an error while trying to run AbortTransaction. Some of that code is a CRITICAL_SECTION, but I don't think I like the idea that all of it has to be one.

Aware of the possible fragility, hence the post. A few thoughts:

* Is it close enough that we can get away with having a few spare slots to cater for that possibility?
* Might we make AbortTransaction critical just as far as the END_CRIT_SECTION after XLogInsert in RecordTransactionAbort(), but no further?

I don't expect a yes, but it seems worth recording the thoughts.

PANIC isn't a problem case because we'll end up generating a shutdown checkpoint which shows the backends have been terminated.

Thought you were trying to get rid of the shutdown checkpoint during restart?

Yes, but if I do there would still be a WAL record of some kind there to allow us to confirm the change of tli. Anyway, I thought you wanted me to keep it now?

-- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [PATCHES] [HACKERS] Infrastructure changes for recovery
On Mon, 2008-09-29 at 11:24 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Mon, 2008-09-29 at 10:13 -0400, Tom Lane wrote: ... If we crash and restart, we'll have to get to the end of this file before we start letting backends in; which might be further than we actually got before the crash, but not too much further because we already know the whole WAL file is available. Don't want to make it per file though. Big systems can whizz through WAL files very quickly, so we either make it a big number e.g. 255 files per xlogid, or we make it settable (and recorded in pg_control). I think you are missing the point I made above. If you set the okay-to-resume point N files ahead, and then the master stops generating files so quickly, you've got a problem --- it might be a long time until the slave starts letting backends in after a crash/restart. Fetching a new WAL segment from the archive is expensive enough that an additional write/fsync per cycle doesn't seem that big a problem to me. There's almost certainly a few fsync-equivalents going on in the filesystem to create and delete the retrieved segment files. Didn't miss yer point, just didn't agree. :-) I'll put it at one (1) and then wait for any negative perf reports. No need to worry about things like that until later. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] CTE patch versus UNION type determination rules
Currently, the CTE patch assumes (without checking) that the output rowtype of a recursive WITH item is the same as the output rowtype determined by inspecting its non-recursive term. Unfortunately this is not always the case. Consider

WITH RECURSIVE q AS (
    SELECT int4_col FROM sometable
    UNION ALL
    SELECT int8_col FROM ... something referencing q ...
);

The output of this UNION will in fact be int8. However I see no way to determine that without performing parse analysis of the recursive term, and we can't do that without having assigned an output rowtype for q (else we have no idea what to do with the recursive reference to q). So it seems like we have to throw an error for this, and insist that the user explicitly do

WITH RECURSIVE q AS (
    SELECT int4_col::int8 FROM sometable
    UNION ALL
    SELECT int8_col FROM ... something referencing q ...
);

Can anyone see a way around that?

I'm inclined to go a bit further and have the code assume that the output typmods are all -1, even if some more-specific typmod can be determined from the non-recursive term. Otherwise you'd need to explicitly cast in situations like

WITH RECURSIVE q AS (
    SELECT varchar_10_col FROM sometable
    UNION ALL
    SELECT varchar_12_col FROM ... something referencing q ...
);

On the other hand this rule would lose typmod information even in cases where both UNION arms emit the same typmod, so maybe it's debatable. Comments?

Another point is that the patch assumes that the non-recursive term must be the left child of the topmost UNION operator. In SQL2008, 7.13 syntax rule 2.g.i.3 appears to allow either child to be the non-recursive term. However, rule 2.g.v.3 seems to say that a recursive query is considered expandable only if the left child is the non-recursive term, and in any case it'd be a bit odd to write a recursive query the other way; it seems more likely to be a mistake than intentional. Is everybody happy with making this restriction? 
If we don't make it, then we also have some issues with the output column names of the UNION not necessarily being what we derive from inspecting just the non-recursive term. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Ad-hoc table type?
On Sep 28, 2008, at 23:46, [EMAIL PROTECTED] wrote: I'm not sure what that means. Can you create normal btree or hash indexes on hstore columns? And is the index useful for both `@` and `?`? That means that those operations are supported by a GiST (or GIN) index, that is: find the records where col contains 'foo = 1, bar = 2' is supported by the index. Likewise for is contained in and has key. It's a bit like having mini-indexes on all keys (although I guess not that efficient). Pretty cool, I'd say. Yeah, that does sound good. I look forward to having an excuse for playing with this type… Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fatal Errors
Simon Riggs [EMAIL PROTECTED] writes: * Might we make AbortTransaction critical just as far as the END_CRIT_SECTION after XLogInsert in RecordTransactionAbort(), but no further? Don't expect yes, but seems worth recording thoughts. The problem is that pretty much everything that proc_exit runs would have to become critical, AFAICS. And a lot of that code is explicitly intended not to be critical --- that's why we split it up into multiple proc_exit callbacks. If one fails we pick up with the next, after a recursive call to elog(). In any case it is clear that there will be failure cases where an abort record cannot be written --- out of disk space for WAL being one obvious example. Are we sure that we can, or want to, guarantee that those all result in PANIC? (We do already PANIC on out of disk space for WAL, but I'm not so sure about generalizing that to any possible failure.) Thought you were trying to get rid of the shutdown checkpoint during restart? Yes, but if I do there would still be a WAL record of some kind there to allow us to confirm the change of tli. Anyway, I thought you wanted me to keep it now? No, I don't have a strong opinion one way or the other on that bit. But an ordinary crash and restart shouldn't generate a tli change. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
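The split-into-callbacks design described above (a failure in one exit callback must not stop the remaining ones from running) can be mimicked with setjmp/longjmp. This is only a rough sketch of the control flow, with invented names; it is not how PostgreSQL's proc_exit or elog actually look:

```c
#include <setjmp.h>

/* Toy model of exit processing split into independent callbacks: a
 * callback that "errors out" longjmps back to the dispatch loop, which
 * simply continues with the next callback, mimicking the recovery path
 * after a recursive call to elog(). */
#define MAX_EXIT_CB 8

typedef void (*exit_cb)(void);

static exit_cb callbacks[MAX_EXIT_CB];
static int ncallbacks = 0;
static jmp_buf exit_error_ctx;      /* stand-in for elog's error recovery */

static void
register_exit_cb(exit_cb cb)
{
    callbacks[ncallbacks++] = cb;
}

/* Run callbacks LIFO; a longjmp from a failing callback lands back here
 * and we pick up with the next one. */
static void
proc_exit_sketch(void)
{
    while (ncallbacks > 0)
    {
        exit_cb cb = callbacks[--ncallbacks];

        if (setjmp(exit_error_ctx) == 0)
            cb();
        /* on longjmp: fall through and continue with the next callback */
    }
}

/* Demo: three callbacks, the middle one fails. */
static int completed = 0;
static void cb_ok(void)   { completed++; }
static void cb_fail(void) { longjmp(exit_error_ctx, 1); }

static int
run_demo(void)
{
    register_exit_cb(cb_ok);
    register_exit_cb(cb_fail);
    register_exit_cb(cb_ok);
    proc_exit_sketch();
    return completed;       /* both cb_ok calls survive the failure */
}
```

The point of the structure, as Tom notes, is precisely that nothing here has to be a critical section: each callback's failure is contained by the dispatch loop.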
Re: [HACKERS] Fatal Errors
On Mon, 2008-09-29 at 12:14 -0400, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: * Might we make AbortTransaction critical just as far as the END_CRIT_SECTION after XLogInsert in RecordTransactionAbort(), but no further? Don't expect yes, but seems worth recording thoughts. The problem is that pretty much everything that proc_exit runs would have to become critical, AFAICS. And a lot of that code is explicitly intended not to be critical --- that's why we split it up into multiple proc_exit callbacks. If one fails we pick up with the next, after a recursive call to elog(). OK...next idea. If we didn't PANIC, then Postmaster knows about child death and fumbles around with the ProcArray. Will it be OK to simply WAL log ProcArrayAdd() and ProcArrayRemove()? Methinks postmaster can't do this. But might be able to ask somebody else to do it for him? The next person to run ProcArrayAdd() could be left a message to say last user of this proc index didn't clean up and we need to log it. That way we can WAL log the ProcArrayRemove() and the ProcArrayAdd() in one message. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] pg_upgrade performance test
I ran a performance test on the in-place patch prototype which I sent for review, and I got a nice result:

Original:
---------
MQThL (Maximum Qualified Throughput LIGHT):  2202.12 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 4706.60 tpm
MQThH (Maximum Qualified Throughput HEAVY):  3956.64 tpm

TRANSACTION MIX
Total number of transactions = 330457
TYPE         TX. COUNT    MIX
----         ---------    ---
Light:       55053        16.66%
Medium:      117665       35.61%
DSS:         36825        11.14%
Heavy:       98916        29.93%
Connection:  21998         6.66%

RESPONSE TIMES   AVG.    MAX.    90TH
Light            0.093   1.080   0.400
Medium           0.096   1.663   0.400
DSS              0.092   0.990   0.020
Heavy            0.095   1.262   2.000
Connections      0.092   1.030   0.400

Number of users = 60
Sum of Avg. RT * TPS for all Tx. Types = 17.154543

pg_upgrade:
-----------
MQThL (Maximum Qualified Throughput LIGHT):  2185.16 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 4674.04 tpm
MQThH (Maximum Qualified Throughput HEAVY):  3915.28 tpm

TRANSACTION MIX
Total number of transactions = 327811
TYPE         TX. COUNT    MIX
----         ---------    ---
Light:       54629        16.66%
Medium:      116851       35.65%
DSS:         36367        11.09%
Heavy:       97882        29.86%
Connection:  22082         6.74%

RESPONSE TIMES   AVG.    MAX.    90TH
Light            0.095   1.073   0.400
Medium           0.098   1.167   0.400
DSS              0.093   1.049   0.010
Heavy            0.098   1.173   2.000
Connections      0.097   1.115   0.400

Number of users = 60
Sum of Avg. RT * TPS for all Tx. Types = 17.385362

It looks like there is only about a 1% performance gap, and it is good to mention that this is not an optimized version. It would be good if somebody ran a different performance test on it and verified my results. I used the iGen OLTP test with 60 concurrent users and ran it for 30 minutes.

Zdenek

-- Zdenek Kotala Sun Microsystems Prague, Czech Republic http://sun.com/postgresql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
KaiGai Kohei wrote: As I have repeated several times, SE-PostgreSQL applies the security policy of SELinux to achieve consistency in access controls. This feature enables restricting a client's privileges on accesses to database objects, as if it were accessing filesystem objects. Its background is our nightmare of web application flaws. The major purpose of this feature is to provide the most important component to run enterprise-class web applications with a least privilege set which is consistent across the whole system.

How important is this consistency goal in reality? We typically recommend that database applications run entirely in the database, for transaction integrity reasons and so on. Unless you are doing wild and fun things with server-side copy or untrusted procedural languages, there really shouldn't be that much use for consistency of access control between PostgreSQL and something else. In fact, on top of the transactional integrity criterion, having consistent access control is one of the reasons to have all your production data in the database system and nowhere else. Of course, this is an ideal state, and we all have to break it once in a while. But hence the honest question: how often does that really happen and to what extent, and does that justify the significant investment that is being proposed here?

In recent years, web application flaws have been a nightmare for us. A recent report said that web application flaws accounted for 95% of significant incidents in the first half of 2008 in Japan, and 75% of them were SQL injection. My ultimate goal is to help the situation with mandatory access control and a least privilege set for the whole of the LAPP stack.

As I had previously mentioned, we have to distinguish these two goals: consistent access controls and mandatory access controls. Then, how does MAC help with SQL injections? Using the existing role-based system you can already define least-privilege users that are essentially powerless even if SQL injections were to happen. 
I am not aware that important flaws or gaps in our role-based access control system have been pointed out that would make it impossible to create applications with security levels similar to those achievable with a MAC system. Now, if you come back to your original goal of consistency in access control, then it may in fact turn out that an FLASK/SELinux/SE-PostgreSQL-based system is the best common ground for such a system, but I am unconvinced that MAC by itself is necessary. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] parallel pg_restore - WIP patch
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Um, FKs could conflict with each other too, so that by itself isn't gonna fix anything. Good point. Looks like we'll need to make a list of can't run in parallel with items as well as strict dependencies. Yeah, I was just thinking about that. The current archive format doesn't really carry enough information for this. I think there are two basic solutions we could adopt: * Extend the archive format to provide some indication that restoring this object requires exclusive access to these dependencies. * Hardwire knowledge into pg_restore that certain types of objects require exclusive access to their dependencies. The former seems more flexible, as well as more in tune with the basic design assumption that pg_restore shouldn't have a lot of knowledge about individual archive object types. But it would mean that you couldn't use parallel restore with any pre-8.4 dumps. In the long run that's no big deal, but in the short run it's annoying. hmm not sure how much of a problem that really is - we usually recommend to use the pg_dump version of the target database anyway. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Peter, How important is this consistency goal in reality? It's actually the primary point of SE-Linux. Its audience wants a centralized policy manager which applies access policies to everything on the network, regardless of whether it's a file, a port, or a database. Oracle has not been consistent, and as a result the security geeks have been very frustrated with Oracle. --Josh -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Peter Eisentraut wrote: The major purpose of this feature is to provide the most important component to run enterprise-class web applications with a least privilege set which is consistent across the whole system. How important is this consistency goal in reality? We typically recommend that database applications run entirely in the database, for transaction integrity reasons and so on. Unless you are doing wild and fun things with server-side copy or untrusted procedural languages, there really shouldn't be that much use for consistency of access control between PostgreSQL and something else. In fact, on top of the transactional integrity criterion, having consistent access control is one of the reasons to have all your production data in the database system and nowhere else. Of course, this is an ideal state, and we all have to break it once in a while. But hence the honest question: how often does that really happen and to what extent, and does that justify the significant investment that is being proposed here? Then, how does MAC help with SQL injections? Using the existing role-based system you can already define least-privilege users that are essentially powerless even if SQL injections were to happen. I am not aware that important flaws or gaps in our role-based access control system have been pointed out that would make it impossible to create applications with security levels similar to those achievable with a MAC system.

I think there are two goals here. At the SQL level, we will have per-role row and column permissions (which seem valuable on their own), and SE-PostgreSQL allows those permissions to be controlled at the operating system level rather than at the database level. I think your major question is how often you have users that you need to control at both the SQL _and_ operating system level. I guess the answer is that security policy suggests controlling things at the lowest level, and bubbling that security up into the database and applications. 
-- Bruce Momjian [EMAIL PROTECTED]http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
In the past I considered implementing polyinstantiated tables as part of SE-PostgreSQL, but it required an unacceptable scale of changes, so I dropped the idea. The TrustedSolaris folks would like polyinstantiation, but I don't know if they actually have anyone working on Postgres anymore. --Josh
Re: [HACKERS] parallel pg_restore - WIP patch
Stefan Kaltenbrunner wrote: Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Tom Lane wrote: Um, FKs could conflict with each other too, so that by itself isn't gonna fix anything. Good point. Looks like we'll need to make a list of can't-run-in-parallel-with items as well as strict dependencies. Yeah, I was just thinking about that. The current archive format doesn't really carry enough information for this. I think there are two basic solutions we could adopt: * Extend the archive format to provide some indication that restoring this object requires exclusive access to these dependencies. * Hardwire knowledge into pg_restore that certain types of objects require exclusive access to their dependencies. The former seems more flexible, as well as more in tune with the basic design assumption that pg_restore shouldn't have a lot of knowledge about individual archive object types. But it would mean that you couldn't use parallel restore with any pre-8.4 dumps. In the long run that's no big deal, but in the short run it's annoying. Hmm, not sure how much of a problem that really is - we usually recommend using the pg_dump version of the target database anyway. We don't really need a huge amount of hardwiring as it turns out. Here is a version of the patch that tries to do what's needed in this area. 
cheers

andrew

Index: pg_backup.h
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** pg_backup.h	13 Apr 2008 03:49:21 -0000	1.47
--- pg_backup.h	29 Sep 2008 23:34:57 -0000
***************
*** 123,128 ****
--- 123,130 ----
  	int			suppressDumpWarnings;	/* Suppress output of WARNING entries
  										 * to stderr */
  	bool		single_txn;
+ 	int			number_of_threads;
+ 	bool		truncate_before_load;
  	bool	   *idWanted;		/* array showing which dump IDs to emit */
  } RestoreOptions;
***************
*** 165,170 ****
--- 167,173 ----
  extern void CloseArchive(Archive *AH);
  
  extern void RestoreArchive(Archive *AH, RestoreOptions *ropt);
+ extern void RestoreArchiveParallel(Archive *AH, RestoreOptions *ropt);
  
  /* Open an existing archive */
  extern Archive *OpenArchive(const char *FileSpec, const ArchiveFormat fmt);

Index: pg_backup_archiver.c
===================================================================
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.158
diff -c -r1.158 pg_backup_archiver.c
*** pg_backup_archiver.c	5 Sep 2008 23:53:42 -0000	1.158
--- pg_backup_archiver.c	29 Sep 2008 23:34:58 -0000
***************
*** 27,38 ****
--- 27,51 ----
  
  #include <unistd.h>
  
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ 
  #ifdef WIN32
  #include <io.h>
  #endif
  
  #include "libpq/libpq-fs.h"
  
+ typedef struct _parallel_slot
+ {
+ 	pid_t		pid;
+ 	TocEntry   *te;
+ 	DumpId		dumpId;
+ 	DumpId		tdeps[2];
+ } ParallelSlot;
+ 
+ #define NO_SLOT (-1)
  
  const char *progname;
***************
*** 70,76 ****
--- 83,100 ----
  static void dumpTimestamp(ArchiveHandle *AH, const char *msg, time_t tim);
  static OutputContext SetOutput(ArchiveHandle *AH, char *filename, int compression);
  static void ResetOutput(ArchiveHandle *AH, OutputContext savedContext);
+ static bool work_is_being_done(ParallelSlot *slot, int n_slots);
+ static int	get_next_slot(ParallelSlot *slots, int n_slots);
+ static TocEntry *get_next_work_item(ArchiveHandle *AH, ParallelSlot *slots, int n_slots);
+ static void prestore(ArchiveHandle *AH, TocEntry *te);
+ static void mark_work_done(ArchiveHandle *AH, pid_t worker, ParallelSlot *slots, int n_slots);
+ static int	_restore_one_te(ArchiveHandle *ah, TocEntry *te, RestoreOptions *ropt, bool is_parallel);
+ static void _reduce_dependencies(ArchiveHandle *AH, TocEntry *te);
+ static void _fix_dependency_counts(ArchiveHandle *AH);
+ static void _inhibit_data_for_failed_table(ArchiveHandle *AH, TocEntry *te);
+ 
+ static ArchiveHandle *GAH;
  
  /*
   * Wrapper functions.
***************
*** 125,137 ****
  /* Public */
  void
  RestoreArchive(Archive *AHX, RestoreOptions *ropt)
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
  	teReqs		reqs;
  	OutputContext sav;
- 	bool		defnDumped;
  
  	AH->ropt = ropt;
  	AH->stage = STAGE_INITIALIZING;
--- 149,633 ----
  /* Public */
  void
+ RestoreArchiveParallel(Archive *AHX, RestoreOptions *ropt)
+ {
+ 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ 	ParallelSlot *slots;
+ 	int			next_slot;
+ 	TocEntry   *next_work_item = NULL;
+ 	int			work_status;
+ 	pid_t		ret_child;
+ 	int			n_slots = ropt->number_of_threads;
+ 	TocEntry   *te;
+ 	teReqs		reqs;
+ 
+ 	AH->debugLevel = 99;
+ 	/* some routines that use ahlog() don't get passed AH */
+ 	GAH = AH;
+ 
+ 	ahlog(AH, 1, "entering RestoreARchiveParallel\n");
+ 
+ 	slots = (ParallelSlot *) calloc(sizeof(ParallelSlot), n_slots);
+ 	AH->ropt = ropt;
+ 
+ 	/*
+
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Josh Berkus wrote: In the past I considered implementing polyinstantiated tables as part of SE-PostgreSQL, but it required an unacceptable scale of changes, so I dropped the idea. The TrustedSolaris folks would like polyinstantiation, but I don't know if they actually have anyone working on Postgres anymore. I think polyinstantiation is an issue that will need much more research before deciding on implementing it. -- Bruce Momjian
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Peter Eisentraut wrote: KaiGai Kohei wrote: As I have repeated several times, SE-PostgreSQL applies the security policy of SELinux to achieve consistency in access controls. This feature makes it possible to restrict a client's privileges on accesses to database objects, just as for accesses to filesystem objects. The background is our nightmare of web application flaws. The major purpose of this feature is to provide the most important component for running enterprise-class web applications with a least-privilege set that is consistent across the whole system. How important is this consistency goal in reality? We typically recommend that database applications run entirely in the database, for transaction integrity reasons and so on. That misses the point a bit. The consistency in this sense also applies to references to static web content, like JPEG images, PDF documents and so on. These are not accessed via the web application and database, because httpd serves them directly, but most web applications are built from both dynamic and static content. Have you seen the example? http://kaigai.myhome.cx/index.php (id: foo/var/baz pass: sepgsql) It returns different query results based on the security context associated with the HTTP-authenticated user. Likewise, your accesses to the three listed JPEG files are also restricted. In both cases, the user foo cannot access anything except objects labeled FooData or unlabeled. Of course, this is an ideal state, and we all have to break it once in a while. But hence the honest question: how often does that really happen and to what extent, and does that justify the significant investment that is being proposed here? I think it is a very common case. For example, it is useful for a web-based document management system to apply a consistent access control policy to database tuples and to PDF documents on the filesystem. Then, how does MAC help with SQL injections? 
Using the existing role-based system you can already define least-privilege users that are essentially powerless even if SQL injections were to happen. I am not aware that important flaws or gaps in our role-based access control system have been pointed out that would make it impossible to create applications with security levels similar to those achievable with a MAC system. This feature is mainly aimed at fine-grained access controls. If 50% of the tuples are invisible, the scope of the damage is limited. However, please consider the case where a connection string is hijacked and malicious SQL is injected. As is widely known, it is almost impossible to eliminate all bugs within the various kinds of web applications. Now, if you come back to your original goal of consistency in access control, then it may in fact turn out that a FLASK/SELinux/SE-PostgreSQL-based system is the best common ground for such a system, but I am unconvinced that MAC by itself is necessary. The meaning of "MAC by itself" is unclear to me. Are you asking about the mandatory access control policy as applied by SE-PostgreSQL, rather than by the Linux kernel? If so, please consider the relationship between a client, an object manager, and a method of access. In the operating system case, a client (process) invokes a system call to access a filesystem object managed by the operating system. That is similar to a client sending an SQL query to access a database object managed by the database management system. # Accesses to database objects are entirely invisible to the operating system. :) Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
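The per-tuple decision KaiGai describes can be sketched conceptually. This is an illustration only, not SE-PostgreSQL's actual implementation (which consults the SELinux policy through the kernel); the function and label names here are hypothetical:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Conceptual sketch: a tuple is visible when its security label
 * matches the client's security context, or when it is unlabeled.
 * With such a filter applied by the server itself, even injected
 * SQL on a hijacked connection can only reach the rows that the
 * client's security context permits.
 */
static bool
tuple_visible(const char *client_label, const char *tuple_label)
{
    if (tuple_label == NULL)    /* unlabeled: visible to everyone */
        return true;
    return strcmp(client_label, tuple_label) == 0;
}
```

In the demo above, a client running as FooData would see tuples labeled FooData and unlabeled tuples, but nothing labeled BarData, regardless of what SQL it sends.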
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Bruce Momjian wrote: I think there are two goals here. At the SQL level, we will have per-role row and column permissions (which seem valuable on their own), and SE-PostgreSQL allows those permissions to be controlled at the operating system level rather than at the database level. Yes, that is correct. As someone noted, SQL-level fine-grained access controls are also a useful feature; I understand that. SE-PostgreSQL makes its decisions based on the security policy stored in the operating system, for the sake of consistency. However, database objects are invisible to the operating system, so we have to add an option to the RDBMS. I think your major question is how often you have users that you need to control at both the SQL _and_ operating system level. I guess the answer is that security policy suggests controlling things at the lowest level and bubbling that security up into the database and applications. As I mentioned in the previous message, it is a very frequent case that a single web application accesses both filesystem objects and database objects at the same time. The important thing is to turn it off at the main. A smaller amount of security-sensitive code is better for consistency and completeness in security. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Bruce Momjian wrote: Josh Berkus wrote: In the past I considered implementing polyinstantiated tables as part of SE-PostgreSQL, but it required an unacceptable scale of changes, so I dropped the idea. The TrustedSolaris folks would like polyinstantiation, but I don't know if they actually have anyone working on Postgres anymore. I think polyinstantiation is an issue that will need much more research before deciding on implementing it. As of this April, they didn't have a plan to implement polyinstantiation at the row level, but were considering it at per-table granularity. Of course, they also understood that row-level polyinstantiation is tough work. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches
Have you seen the example? http://kaigai.myhome.cx/index.php (id: foo/var/baz pass: sepgsql) ^ This means you can choose one of foo, var or baz, and they all have the same password. I'm sorry if that was a confusing way of putting it. It returns different query results based on the security context associated with the HTTP-authenticated user. Likewise, your accesses to the three listed JPEG files are also restricted. In both cases, the user foo cannot access anything except objects labeled FooData or unlabeled. -- OSS Platform Development Division, NEC KaiGai Kohei [EMAIL PROTECTED]
Re: [HACKERS] parallel pg_restore - WIP patch
+ 			if (strcmp(te->desc, "CONSTRAINT") == 0 ||
+ 				strcmp(te->desc, "FK CONSTRAINT") == 0 ||
+ 				strcmp(te->desc, "CHECK CONSTRAINT") == 0 ||
+ 				strcmp(te->desc, "TRIGGER") == 0 ||
+ 				strcmp(slots[i].te->desc, "CONSTRAINT") == 0 ||
+ 				strcmp(slots[i].te->desc, "FK CONSTRAINT") == 0 ||
+ 				strcmp(slots[i].te->desc, "CHECK CONSTRAINT") == 0 ||
+ 				strcmp(slots[i].te->desc, "TRIGGER") == 0)

Really just an observation from the peanut gallery here, but every time pg_restore hard-codes this kind of thing, it introduces yet another possible side-effect bug when someone, eg, adds a new TOC type. Would it substantially decrease the benefits of the patch to skip *any* TOC entry that shares dependencies with another? (rather than just those listed above)
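The maintenance worry about repeated strcmp chains could be reduced with a table-driven predicate; a minimal sketch (the function and table names here are hypothetical, not part of the patch):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: TOC entry types whose restore takes locks that can
 * conflict with other work on the same dependencies. Supporting a
 * new type becomes one new table row instead of another strcmp
 * chain scattered through pg_restore.
 */
static const char *const lock_heavy_types[] = {
    "CONSTRAINT",
    "FK CONSTRAINT",
    "CHECK CONSTRAINT",
    "TRIGGER",
    NULL                        /* sentinel */
};

static bool
is_lock_heavy(const char *desc)
{
    for (int i = 0; lock_heavy_types[i] != NULL; i++)
        if (strcmp(desc, lock_heavy_types[i]) == 0)
            return true;
    return false;
}
```

With this, the quoted condition collapses to `is_lock_heavy(te->desc) || is_lock_heavy(slots[i].te->desc)`, and the type list lives in exactly one place.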
Re: [HACKERS] parallel pg_restore - WIP patch
Philip Warner wrote:

+ 			if (strcmp(te->desc, "CONSTRAINT") == 0 ||
+ 				strcmp(te->desc, "FK CONSTRAINT") == 0 ||
+ 				strcmp(te->desc, "CHECK CONSTRAINT") == 0 ||
+ 				strcmp(te->desc, "TRIGGER") == 0 ||
+ 				strcmp(slots[i].te->desc, "CONSTRAINT") == 0 ||
+ 				strcmp(slots[i].te->desc, "FK CONSTRAINT") == 0 ||
+ 				strcmp(slots[i].te->desc, "CHECK CONSTRAINT") == 0 ||
+ 				strcmp(slots[i].te->desc, "TRIGGER") == 0)

Really just an observation from the peanut gallery here, but every time pg_restore hard-codes this kind of thing, it introduces yet another possible side-effect bug when someone, eg, adds a new TOC type. Would it substantially decrease the benefits of the patch to skip *any* TOC entry that shares dependencies with another? (rather than just those listed above) Unfortunately, it quite possibly would. You would not be able to build two indexes on the same table in parallel, even though they wouldn't have conflicting locks. cheers andrew
Re: [HACKERS] parallel pg_restore - WIP patch
Andrew Dunstan wrote: Unfortunately, it quite possibly would. You would not be able to build two indexes on the same table in parallel, even though they wouldn't have conflicting locks. I suppose so, but: 1. By the same logic it might speed things up; it might build two completely separate indexes and thereby avoid (some kind of) contention. In any case, it would most likely do *something* else. It should only reduce performance if (a) it can do nothing or (b) there is a benefit in building multiple indexes on the same table at the same time. 2. Perhaps if there is a limited number of items that share dependencies but are known to be OK (i.e. indexes), maybe list them in the inner loop as exceptions and allow them to run in parallel. This would mean that failing to list a new TOC item type would result in worse performance rather than a crash.
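Point 2 above could look something like the following sketch, with default-deny scheduling and an explicit allow-list. The names are hypothetical, not from the patch; the idea is simply that an unlisted new TOC type degrades to serial restore instead of crashing:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical allow-list: TOC entry types known to be safe to
 * restore concurrently even when they depend on the same table.
 * Two CREATE INDEX commands on one table take only self-compatible
 * share locks, so indexes qualify.
 */
static bool
is_parallel_safe(const char *desc)
{
    static const char *const safe_types[] = {"INDEX", NULL};

    for (int i = 0; safe_types[i] != NULL; i++)
        if (strcmp(desc, safe_types[i]) == 0)
            return true;
    return false;
}

/*
 * Default-deny: two entries sharing a dependency may run together
 * only if both are on the allow-list. A TOC type nobody thought to
 * list therefore runs serially rather than failing.
 */
static bool
can_run_together(const char *desc_a, const char *desc_b,
                 bool share_dependency)
{
    if (!share_dependency)
        return true;
    return is_parallel_safe(desc_a) && is_parallel_safe(desc_b);
}
```

Under this scheme two INDEX entries on the same table run in parallel, while an FK CONSTRAINT sharing a dependency with anything waits its turn, which matches the failure mode Philip asks for: worse performance, never a crash.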