Re: [HACKERS] Null row vs. row of nulls in plpgsql

2008-09-28 Thread Greg Stark
Iirc the reason for this fuzziness came from the SQL spec definition
of IS NULL for rows. As long as you maintain that level of
spec-compliance I don't think there are any other important constraints
on pg behaviour.


greg

-- sorry for the top posting, but the phone makes it hard to do anything
else.


On 27 Sep 2008, at 09:56 PM, Tom Lane [EMAIL PROTECTED] wrote:


I looked a bit at the bug report here:
http://archives.postgresql.org/pgsql-bugs/2008-09/msg00164.php

ISTM that the fundamental problem is that plpgsql doesn't distinguish
properly between a null row value (eg, null::somerowtype) and a
row of null values (eg, row(null,null,...)::somerowtype).  When that
code was designed, our main SQL engine was pretty fuzzy about the
difference too, but now there is a clear semantic distinction.
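
For instance (the composite type here is invented just for illustration):

create type somerowtype as (a int, b int);
select null::somerowtype;              -- a plain null row; displays as empty
select row(null,null)::somerowtype;    -- a row of nulls; displays as "(,)"
-- yet both satisfy the SQL spec's IS NULL test:
select null::somerowtype is null, row(null,null)::somerowtype is null;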

For plpgsql's RECORD variables this doesn't seem hard to fix: just
take out the code in exec_move_row() that manufactures a row of nulls
when the input is null, and maybe make a few small adjustments
elsewhere.  For ROW variables there's a bigger problem, because those
are represented by a list of per-field variables, which doesn't
immediately offer any way to represent overall nullness.  I think it
could be dealt with by adding an explicit "the row as a whole is null"
flag to struct PLpgSQL_row.  I haven't tried to code it though, so I'm
not sure if there are gotchas or unreasonably large code changes needed
to make it happen.

I thought for a little bit about whether we couldn't get rid of ROW
variables entirely, or at least make them work more like RECORD variables
by storing a HeapTuple instead of a list of per-field variables.  But
I soon found out that the reason to have them is to be able to describe
the assignment target of SQL statements that assign to multiple scalar
variables, eg SELECT ... INTO x,y,z.

Comments?

   regards, tom lane





Re: [HACKERS] [REVIEW] Prototype: In-place upgrade v02

2008-09-28 Thread Abbas
Hi,

I have gone through the following stuff

1) previous emails on the patch
2) http://wiki.postgresql.org/wiki/In-place_upgrade
3) http://www.pgcon.org/2008/schedule/attachments/57_pg_upgrade_2008.pdf
4) http://wiki.postgresql.org/wiki/In-place_upgrade:Storage

Here is what I have understood so far (correct me if I am wrong).

The on-disk representation of data has changed from version to version
over the years. For some reason (performance, maybe) the newer versions
of pg were not backwards compatible, meaning that a newer version would
not read data written by an older version if the on-disk representation
had changed in between.
The end user would be required to port data stored with an older version
to the newer version's format using an offline export/import.
This project aims to support upgrades from an older version to a newer
one on the fly.
The on-disk representation is not the only change the system should
accommodate; it should also accommodate catalog changes, conf file
changes, etc.

Of the available design choices, I think you have chosen to go with
on-line data conversion, meaning that pg would now be aware of all
previous page layouts and, based on a switch on the page version, would
handle each layout. This will only be done to read old data; new data
will be written in the new format.

I am supposed to test the patch and for that I have downloaded pg
versions 7.4, 8.0, 8.1, 8.2 and 8.3.

I plan to create a data directory using each of the versions and then
try to read the same using the 8.4 with your patch applied.

What database objects should I create in the test database? Should I
just create objects of my choice?

Do the sizes (both length and breadth) of tables matter?

Do I have to perform performance tests too?
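
For instance, would something like this be a reasonable starting point?
(Just my guess at a mix that touches heap, index and TOAST pages;
generate_series() only exists in 8.0 and later, so for 7.4 I would fill
the tables from a loop instead.)

create table t_plain (id int primary key, payload text);
insert into t_plain select g, repeat('x', 10) from generate_series(1, 10000) g;
create index t_plain_payload_idx on t_plain (payload);

create table t_toast (id int, blob text);
-- force out-of-line TOAST storage, whose layout also changed between versions:
alter table t_toast alter column blob set storage external;
insert into t_toast select g, repeat('y', 100000) from generate_series(1, 100) g;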

Regards
Abbas




On Fri, 2008-09-19 at 14:28 +0200, Zdenek Kotala wrote:
> thanks
>
> Abbas napsal(a):
> > Even with that a hunk failed for bufpage.c, but I applied that part
> > manually to move on.
> > Regards
> > Abbas
> >
> > On Thu, 2008-09-18 at 12:17 +0200, Zdenek Kotala wrote:
> > > Abbas napsal(a):
> > > > Hi,
> > > > I downloaded the latest postgresql source code from
> > > >     git clone git://git.postgresql.org/git/postgresql.git
> > > > and tried to apply the patch
> > > >     http://archives.postgresql.org/pgsql-hackers/2008-09/gza1fGXLvf3L.gz
> > > > It does not apply cleanly, see the failures in the attached file.
> > > It clashes with the hash index patch which was committed four days ago.
> > > Try to use a little bit older revision from git (without the hash index
> > > modification).
>
> Zdenek



Re: [HACKERS] planned maintenance downtime - tribble.postgresql.org

2008-09-28 Thread Stefan Kaltenbrunner

Stefan Kaltenbrunner wrote:
> The sysadmin team would like to announce a planned maintenance window
> for OS-related updates on tribble.postgresql.org starting Sunday Sep 28
> 07:00 GMT (expected to last for an hour), affecting the following
> publicly visible services:
>
> cvs.postgresql.org
> wwwmaster.postgresql.org
> www.pgadmin.org
> doxygen.postgresql.org
> wiki.postgresql.org
>
> I would ask people to hold off on any changes or commits to the affected
> services during that time period until you see an explicit "it's done".

all done and services should be up again - if you notice any problems
please report back.



Stefan



Re: [HACKERS] Null row vs. row of nulls in plpgsql

2008-09-28 Thread Hannu Krosing
On Sun, 2008-09-28 at 04:03 +0300, Greg Stark wrote:
> Iirc the reason for this fuzziness came from the SQL spec definition
> of IS NULL for rows. As long as you maintain that level of
> spec-compliance I don't think there are any other important constraints
> on pg behaviour.

What does the SQL spec say about recursive IS NULL for rows?

Should we check that IS NULL is true for each row element, or must they
actually be NULLs?

hannu=# select row(null, null) is NULL;
 ?column? 
----------
 t
(1 row)

hannu=# select row(null, row(null, null)) is NULL;
 ?column? 
----------
 f
(1 row)

--
Hannu





Re: [HACKERS] FSM rewrite: doc changes

2008-09-28 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
> To keep everyone who's interested up-to-date, attached is the latest
> patch. ...
> I find it a bit disturbing that a documentation patch actually removes
> more lines from the manual than it adds, but it's quite understandable
> because it's no longer necessary to explain the two GUC options that
> used to be quite important :-). Comments welcome.

Well, this patch isn't actually supposed to have user-visible impact
other than eliminating a couple of troublesome configuration settings.
So it's entirely expected for the docs to get shorter ;-)

I did another pass of code-reading, and found a lot of nitpicks and
some not-so-trivial issues.  In no particular order:


Copyright in indexfsm.c is a year off.

InitIndexFreeSpaceMap should have a comment.

The comment for RecordFreeIndexPage gives the function's name incorrectly.

InitFreeSpaceMap() should be explicitly declared as taking void in its
definition.

FreeSpaceMapTruncateRel seems to have a bug in its early-exit test: in the
case where the number of FSM blocks stays the same, it fails to zero out slots
in the last block.  I also think it's got an off-by-one problem in figuring
the number of FSM blocks: for the normal case where the new heap end is in
the middle of a FSM block, shouldn't new_nfsmblocks be one larger than it
is?  The case where nblocks is an exact multiple of SlotsPerFSMPage would
need to be special-cased to be exactly correct, though I see no real harm in
letting the FSM be left one page too big in that case.

The patch shouldn't be touching bufmgr.c at all any more --- or at least, none
of the diffs there are improvements.

Docs for contrib/pageinspect still need work: the 3-parameter form of
get_raw_page isn't documented, nor the fork behavior of the 2-parameter form.

In gistvacuum.c, you've removed the code that adjusts totFreePages to not
count pages truncated away.  I think you could just subtract the number of
truncated pages from it, since they must have been counted in it earlier.
(ginvacuum.c seems to get this right.)

I do not like the kluge in heap_xlog_clean one bit, and think it's unnecessary
anyway since we are not relying on the FSM to be accurate.  Suggest reverting
the heapam.c changes except for heap_sync().

rd_fsm_nblocks_cache should be reset in the places where rd_targblock is.
You seem to have tracked the clearings of rd_smgr which is not the right
thing at all.

I see you renamed "next", which is good, but the README isn't up to speed on
it and a lot of the comments aren't either.

Since fp_next_slot is signed, the sanity check in fsm_search_avail had better
include target < 0.

The new search algorithm in fsm_search_avail still doesn't work.  Consider
what happens when the target is the rightmost slot on the page; it certainly
won't wrap properly.

fsm_truncate_avail seems quite broken: it's clearing the whole page always.

In fsm_rebuild_page, surely we needn't check if (lchild < NodesPerPage).
Also you probably ought to make it
	if (fsmpage->fp_nodes[nodeno] != newvalue)
	{
		fsmpage->fp_nodes[nodeno] = newvalue;
		changed = true;
	}
to avoid useless write traffic into a shared buffer.

I think DEPTH should be a macro not a static int; it's certainly
reducible to a compile-time constant.  Also I wonder whether you
really need the SlotsPerFSMPagePowers[] array at all (and if not,
you could get rid of InitFreeSpaceMap).  It's used in only one
place and it seems a bit hard to argue that a multiplication loop
really needs to be avoided there --- the division loop that comes
after it will cost a lot more, and in any case both are negligible
compared to the shared buffer fetch that's about to occur.

This test in fsm_space_needed_to_cat:
	if (needed >= (FSM_CATEGORIES - 1) * FSM_CAT_STEP)
		elog(ERROR, "invalid FSM request size");
reveals a rather fundamental problem: it is clearly possible
for this test to fail on valid request sizes, because the page
header overhead is less than FSM_CAT_STEP (especially if BLCKSZ
is more than 8K).  I'm not sure about a really clean solution
here.  We could offset the needed_to_cat and avail_to_cat
calculations so that category 255 corresponds exactly to the
maximum possible free space, but that requires assuming that FSM
knows exactly what that is, which is a bit unpleasant.  Thoughts?

It seems a bit schizophrenic that fsm_search_avail takes a Buffer
when all the other functions in fsmpage.c take Page arguments.
I see why fsm_search_avail needs to do that, but maybe it'd be
better if the other functions did too?

fsm_search() should not take addr as an argument, since it has a
built-in assumption that it is started at the root.

I find the use of "eof" as both a local variable and a parameter in
fsm_vacuum_page to be pretty poor programming practice.  Maybe call
the parameter "eof_p"?

Shouldn't fsm_redo include a FreeFakeRelcacheEntry call?

Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-28 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> On Thu, 2008-09-25 at 18:28 -0400, Tom Lane wrote:
> > After reading this for awhile, I realized that there is a rather
> > fundamental problem with it: it switches into consistent recovery
> > mode as soon as it's read WAL beyond ControlFile->minRecoveryPoint.
> > In a crash recovery situation that typically is before the last
> > checkpoint (if indeed it's not still zero), and what that means is
> > that this patch will activate the bgwriter and start letting in
> > backends instantaneously after a crash, long before we can have any
> > certainty that the DB state really is consistent.
> >
> > In a normal crash recovery situation this would be easily fixed by
> > simply not letting it go to consistent recovery state at all, but
> > what about recovery from a restartpoint?  We don't want a slave that's
> > crashed once to never let backends in again.  But I don't see how to
> > determine that we're far enough past the restartpoint to be consistent
> > again.  In crash recovery we assume (without proof ;-)) that we're
> > consistent once we reach the end of valid-looking WAL, but that rule
> > doesn't help for a slave that's following a continuing WAL sequence.
> >
> > Perhaps something could be done based on noting when we have to pull in
> > a WAL segment from the recovery_command, but it sounds like a pretty
> > fragile assumption.

> Seems like we just say we only signal the postmaster if
> InArchiveRecovery. Archive recovery from a restartpoint is still archive
> recovery, so this shouldn't be a problem in the way you mention. The
> presence of recovery.conf overrides all other cases.

What that implements is my comment that we don't have to let anyone in
at all during a plain crash recovery.  It does nothing AFAICS for the
problem that when restarting archive recovery from a restartpoint,
it's not clear when it is safe to start letting in backends.  You need
to get past the highest LSN that has made it out to disk, and there is
no good way to know what that is.

Unless we can get past this problem the whole thing seems a bit dead in
the water :-(

> > * I'm a bit uncomfortable with the fact that the
> > IsRecoveryProcessingMode flag is read and written with no lock.

> It's not a dynamic state, so I can fix that inside
> IsRecoveryProcessingMode() with a local state to make the check faster.

Erm, this code doesn't look like it can allow IsRecoveryProcessingMode
to become locally true in the first place?  I guess you could fix it
by initializing IsRecoveryProcessingMode to true, but that seems likely
to break other places.  Maybe better is to have an additional local
state variable showing whether the flag has ever been fetched from
shared memory.

The other issues don't seem worth arguing about ...

regards, tom lane



Re: [HACKERS] Proposal: move column defaults into pg_attribute along with attacl

2008-09-28 Thread Stephen Frost
Markus,

* Markus Wanner ([EMAIL PROTECTED]) wrote:
> What does the subobject column for pg_shdepend buy us?

Tracking column-level ACL dependencies rather than having those
dependencies only be at the table-level.  This complicates
pg_shdepend some, but simplifies the dependency handling in the
ACL area and in handling table/column drops.

I'm still not a fan of having column-level deps handled
differently between pg_shdepend and pg_depend, but that's not
something which has to be addressed directly by the column-level
privs patch.  Perhaps once it's done I'll do a proof-of-concept
for removing pg_attdef.

Thanks,

Stephen




[HACKERS] Ad-hoc table type?

2008-09-28 Thread pgsql
I was in a discussion with someone about the difference between ad-hoc
storage systems and SQL. Yes, I know, I was rolling my eyes as well. One
thing that did strike me, though, was the idea that a table could contain
a variable number of columns.

Something like this:

create adhoc table foo ();

insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

In an ad-hoc table type, when an insert is made, and a column is not
found, then a new varchar column is added.

I know the idea has a lot of holes, and is probably a bad idea, but it
answers an important problem of easily mapping programmatic types to a
database.

Anyone think it's interesting?




Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Tom Lane
[EMAIL PROTECTED] writes:
> Something like this:
>
> create adhoc table foo ();
>
> insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
>
> In an ad-hoc table type, when an insert is made, and a column is not
> found, then a new varchar column is added.
>
> I know the idea has a lot of holes, and is probably a bad idea, but it
> answers an important problem of easily mapping programmatic types to a
> database.

Seems like a table with one contrib/hstore column might be more relevant
to this guy's idea of how to do database design.
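
Along the lines of this untested sketch (assuming contrib/hstore is
installed; the table and key names are invented):

create table foo (attrs hstore);
insert into foo values ('name=>joe, rank=>sargent, serial=>42');
select attrs -> 'rank' from foo;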

regards, tom lane



Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread pgsql
> [EMAIL PROTECTED] writes:
> > Something like this:
> >
> > create adhoc table foo ();
> >
> > insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
> >
> > In an ad-hoc table type, when an insert is made, and a column is not
> > found, then a new varchar column is added.
> >
> > I know the idea has a lot of holes, and is probably a bad idea, but it
> > answers an important problem of easily mapping programmatic types to a
> > database.
>
> Seems like a table with one contrib/hstore column might be more relevant
> to this guy's idea of how to do database design.


That's actually a very cool module, I hadn't seen it before. I've
considered writing something like it, but more XML-centric; I'm not sure
it answers the concept, though.

I'm not sure if you have dealt with web site sessions and object
persistence crap, but it's a pain to get up and running, and improving
performance is a drag. Web guys tend to know very little about databases
and tend, sadly, not to be very inquisitive about such things.

Web session and user attribute objects are typically stored in a database
as XML, JSON, or some other aggregated format in a single column (hstore).
That works great for when you just need to access the data by the key, but
if you want to use the data outside the web application for something
like OLAP, you have to decide which attributes reside in the aggregate
column or get promoted to a full-fledged column. That's why you'll see
tables with username, passwdhash, email, etc. in addition to an aggregated
column of things like screen template, age, etc.

So, how do you have a table with a generally arbitrary number of columns
without creating some sort of aggregate column?  With an aggregate column,
the data isn't on the same level as real column data, so you need to parse
the aggregate to extract a value, and you have to do that for each value.
On top of that, you then have to explain your aggregate strategy to the
web guys.

Being able to insert arbitrary named values, and extracting them
similarly, IMHO works better and more naturally than some external
aggregate system built on a column. I know it is a little outside-the-box
thinking, what do you think?



Re: [HACKERS] Null row vs. row of nulls in plpgsql

2008-09-28 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
> On 27 Sep 2008, at 09:56 PM, Tom Lane [EMAIL PROTECTED] wrote:
> > ISTM that the fundamental problem is that plpgsql doesn't distinguish
> > properly between a null row value (eg, null::somerowtype) and a
> > row of null values (eg, row(null,null,...)::somerowtype).  When that
> > code was designed, our main SQL engine was pretty fuzzy about the
> > difference too, but now there is a clear semantic distinction.
>
> Iirc the reason for this fuzziness came from the SQL spec definition
> of IS NULL for rows. As long as you maintain that level of
> spec-compliance I don't think there are any other important constraints
> on pg behaviour.

I started to poke into this and found out that it was a bit subtler than
I thought.  It'd be possible to associate a "rowisnull" state value
with a row variable, but the problem is that plpgsql treats the row
fields as independent variables that can be accessed without touching
the row.  In particular you can assign null or nonnull values to
individual fields.  So consider

-- presumably, this'll set rowisnull to TRUE:
rowvar := NULL;
-- this had better cause rowisnull to become FALSE:
rowvar.field1 := 42;
-- does this cause it to become TRUE again?
rowvar.field1 := NULL;
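
(For experimenting, the sequence can be wrapped up as a function like
this, using an invented two-field type; what the notices print is
exactly what's at issue:)

create type tworow as (field1 int, field2 int);

create function rowisnull_demo() returns void as $$
declare
    rowvar tworow;
begin
    rowvar := NULL;
    raise notice 'rowvar is null: %', rowvar is null;
    rowvar.field1 := 42;
    raise notice 'rowvar is null: %', rowvar is null;
    rowvar.field1 := NULL;
    raise notice 'rowvar is null: %', rowvar is null;
end;
$$ language plpgsql;

select rowisnull_demo();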

There are a bunch of implementation problems with making any such
behavior happen, since the row field variables don't currently know
that they are members of a row, and indeed it's possible for the same
variable to be a member of more than one row.  But the core issue is
that this interaction seems to fuzz the distinction between "row is
null" and "all the row's elements are null".  In particular, if you
think that rowisnull should be TRUE after the above sequence, then
I think you are saying they are the same thing.  So maybe the spec
authors are smarter than we are.

Thoughts?  What would a consistent behavior look like?

regards, tom lane



Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Tom Lane
[EMAIL PROTECTED] writes:
> Being able to insert arbitrary named values, and extracting them
> similarly, IMHO works better and more naturally than some external
> aggregate system built on a column. I know it is a little
> outside-the-box thinking, what do you think?

I'm failing to see the point.  Allowing columns to spring into existence
without any forethought seems to me to be all minuses and no pluses
worth mentioning.

* What if the column name is just a typo?

* What datatype should it have?  (Always varchar is just lame.)

* Should it have an index?  If so, should it be unique?

* If you keep doing this, you'll soon find yourself reading out
unbelievably wide tables (lots of columns), which won't be especially
easy or efficient to process on either the backend or the client side.
Plus you might run into the max-columns-per-tuple limit.

If you've expended enough thought to be sure that the column is not just
a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
command to tell the database the results of your genius.

I do see the point that switching from "member of an hstore column" to
"real database column" is pretty painful, but I don't see that allowing
columns to spring into existence solves that in any meaningful way.
Is there some other way we could address such conversions?

BTW, I think it is (or should be) possible to create an index on
hstore->'mycol', so at least one of the reasons why you should *need*
to switch to a real database column seems bogus.
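
Something like this, I'd think (an untested sketch; the table and key
names are invented):

create table sessions (id int primary key, attrs hstore);
create index sessions_ttl_idx on sessions ((attrs -> 'time_to_live'));
select id from sessions where attrs -> 'time_to_live' = '3600';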

regards, tom lane



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-28 Thread Simon Riggs

On Sun, 2008-09-28 at 14:02 -0400, Tom Lane wrote:
> It does nothing AFAICS for the
> problem that when restarting archive recovery from a restartpoint,
> it's not clear when it is safe to start letting in backends.  You need
> to get past the highest LSN that has made it out to disk, and there is
> no good way to know what that is.
>
> Unless we can get past this problem the whole thing seems a bit dead in
> the water :-(

I agree with the importance of the problem, but I don't fully understand
the circumstances under which you see it arising.

AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
the controlfile, so whenever we restart we can see that it has been set
previously and that we are now beyond it. So if we crash during recovery
and then restart *after* we reached minRecoveryLoc, then we resume in
safe mode almost immediately. If we crash during recovery before we
reached minRecoveryLoc, then we continue until we find it.

There is a loophole, as described in a separate post, but that can be
plugged by offering explicit setting of minRecoveryLoc from
recovery.conf. Most people use pg_start_backup(), so they do not
experience the need for that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Mark Mielke

Not that I'm agreeing with the direction, but just as a thinking experiment:

Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > Being able to insert arbitrary named values, and extracting them
> > similarly, IMHO works better and more naturally than some external
> > aggregate system built on a column. I know it is a little
> > outside-the-box thinking, what do you think?
>
> I'm failing to see the point.  Allowing columns to spring into existence
> without any forethought seems to me to be all minuses and no pluses
> worth mentioning.
>
> * What if the column name is just a typo?

If it's a field in a data structure from a language such as Java, it's
not a typo.

> * What datatype should it have?  (Always varchar is just lame.)

SQLite uses always-varchar and it doesn't seem to be a problem. For
simpler numbers like 0, the text form can be more compact, and the
database may be portable across different hardware architectures.

> * Should it have an index?  If so, should it be unique?

It might be cool for indexes to automatically appear as they become
beneficial (and be removed as they become problematic). Unique is a
constraint which should be considered separately from whether it should
be an index or not. I don't know if it would be useful or not.

> * If you keep doing this, you'll soon find yourself reading out
> unbelievably wide tables (lots of columns), which won't be especially
> easy or efficient to process on either the backend or the client side.
> Plus you might run into the max-columns-per-tuple limit.

Introduce variable field-order for tuples? Only provide values if
non-null? :-)

> If you've expended enough thought to be sure that the column is not just
> a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
> command to tell the database the results of your genius.
>
> I do see the point that switching from "member of an hstore column" to
> "real database column" is pretty painful, but I don't see that allowing
> columns to spring into existence solves that in any meaningful way.
> Is there some other way we could address such conversions?
>
> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a real database column seems bogus.

I find the Oracle nested table and data structure support enticing,
although I do not have experience with it. It seems like it might be a
more mature implementation of hstore? If hstore had everything that was
required in terms of performance or flexibility, we wouldn't need fixed
columns at all?

But yes - I tend to agree that the object persistence layer can be hidden
away behind something like the Java object persistence model,
automatically doing ALTER TABLE or providing a configured mapping from a
description file. This isn't a problem that needs to be solved at the
database layer.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-28 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
> > It does nothing AFAICS for the
> > problem that when restarting archive recovery from a restartpoint,
> > it's not clear when it is safe to start letting in backends.  You need
> > to get past the highest LSN that has made it out to disk, and there is
> > no good way to know what that is.

> AFAICS when we set minRecoveryLoc we *never* unset it. It's recorded in
> the controlfile, so whenever we restart we can see that it has been set
> previously and now we are beyond it.

Right ...

> So if we crash during recovery and
> then restart *after* we reached minRecoveryLoc then we resume in safe
> mode almost immediately.

Wrong.

What minRecoveryLoc is is an upper bound for the LSNs that might be
on-disk in the filesystem backup that an archive recovery starts from.
(Defined as such, it never changes during a restartpoint crash/restart.)
Once you pass that, the on-disk state as modified by any dirty buffers
inside the recovery process represents a consistent database state.
However, the on-disk state alone is not guaranteed consistent.  As you
flush some (not all) of your shared buffers you enter other
not-certainly-consistent on-disk states.  If we crash in such a state,
we know how to use the last restartpoint plus WAL replay to recover to
another state in which disk + dirty buffers are consistent.  However,
we reach such a state only when we have read WAL to beyond the highest
LSN that has reached disk --- and in recovery mode there is no clean
way to determine what that was.

Perhaps a solution is to make XLogFLush not be a no-op in recovery mode,
but have it scribble a highest-LSN somewhere on stable storage (maybe
scribble on pg_control itself, or maybe better someplace else).  I'm
not totally sure about that.  But I am sure that doing nothing will
be unreliable.

regards, tom lane



Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread pgsql
> [EMAIL PROTECTED] writes:
> > Being able to insert arbitrary named values, and extracting them
> > similarly, IMHO works better and more naturally than some external
> > aggregate system built on a column. I know it is a little
> > outside-the-box thinking, what do you think?
>
> I'm failing to see the point.  Allowing columns to spring into existence
> without any forethought seems to me to be all minuses and no pluses
> worth mentioning.
>
> * What if the column name is just a typo?

In an automated system like PHP, Java, etc. that's not too likely.


> * What datatype should it have?  (Always varchar is just lame.)

varchar or text is not just lame; SQLite used to do that exclusively.
One could argue that XML is nothing more than text.


> * Should it have an index?  If so, should it be unique?

The answer to that is, well, no, not unless the dba generates one or it
is declared, just like any other column. All the rules that apply to
CREATE TABLE and ALTER TABLE ADD COLUMN just apply naturally, as would
be expected.

create adhoc table userdata(username varchar, email varchar, primary
key(email));


> * If you keep doing this, you'll soon find yourself reading out
> unbelievably wide tables (lots of columns), which won't be especially
> easy or efficient to process on either the backend or the client side.
> Plus you might run into the max-columns-per-tuple limit.

Well, I fully understand that it is not a general-purpose unlimited-width
sort of thing. In a programming environment, the target environment for
this type of feature, it is unlikely to be a run-away problem.


> If you've expended enough thought to be sure that the column is not just
> a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
> command to tell the database the results of your genius.

Like I said, if you've never dealt with a live web site, maintained by a
team of web dudes working furiously to keep their jobs and get paid, your
only hope of keeping up with "Oh! I needed to add the 'time to live' of
the session into the session data" is to use an aggregate storage system.


> I do see the point that switching from "member of an hstore column" to
> "real database column" is pretty painful, but I don't see that allowing
> columns to spring into existence solves that in any meaningful way.
> Is there some other way we could address such conversions?

Every other solution creates a second tier of data storage. You either
deal with data elements at the table level, or you create a
roll-your-own aggregate mechanism, or you make a HUGE (user, name, value)
table and force a join and an index scan for every select.  (A million
users with 5-10 attributes each is an expensive join.)
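
That is, the classic shape (table and column names invented here, and it
assumes a separate users table):

create table user_attrs (
    username varchar not null,
    name     varchar not null,
    value    varchar,
    primary key (username, name)
);
-- every attribute fetch becomes a join back into this table:
select u.username, a.value as age
  from users u
  join user_attrs a on a.username = u.username and a.name = 'age';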

> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a real database column seems bogus.

Oh, yeah, functional indexes work great. I think you did that right?

For what it's worth, I don't expect you to jump all over this. It really
is a divergence from classic SQL design. I'm not even sure I like it. In
fact, I don't like it, but the argument that you are being forced to
create a second-class data storage mechanism, or a relational join for
data that is logically in a single relation, does cause one to ponder the
problem.



[HACKERS] Operation needed for datfrozenxid bug?

2008-09-28 Thread ITAGAKI Takahiro
Hello,

The datfrozenxid bug is fixed in 8.3.4, but do I need any additional
operations after upgrading the postgres binaries?

| Fix potential miscalculation of datfrozenxid (Alvaro)
|   This error may explain some recent reports of failure
|   to remove old pg_clog data. 

I assume that pg_database.datfrozenxid is wrongly set to 3
(FirstNormalTransactionId) if the bug has occurred.

  1. Can I use the following SQL to check whether the bug has occurred?
 SELECT * FROM pg_database WHERE datfrozenxid = 3
 (it will return some rows if the bug has occurred)

  2. What is needed to fix the corrupted system catalog,
 something like the following SQL?
UPDATE pg_database SET datfrozenxid = relfrozenxid
  FROM (SELECT relfrozenxid FROM pg_class
 WHERE relkind IN ('r', 't')
 ORDER BY age(relfrozenxid) DESC LIMIT 1) AS tmp
 WHERE datname = current_database();

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [HACKERS] parallel pg_restore - WIP patch

2008-09-28 Thread Andrew Dunstan



Andrew Dunstan wrote:
> > this works better but there is something fishy still - using the same
> > dump file I get a proper restore using pg_restore normally. If I
> > however use -m for a parallel one I only get parts (in this case only
> > 243 of the 709 tables) of the database restored ...
>
> Yes, there are several funny things going on, including some stuff
> with dependencies. I'll have a new patch tomorrow with luck. Thanks
> for testing.

OK, in this version a whole heap of bugs are fixed, mainly those to do
with dependencies and saved state. I get identical row counts in the
source and destination now, quite reliably.

cheers

andrew
Index: pg_backup.h
===
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup.h,v
retrieving revision 1.47
diff -c -r1.47 pg_backup.h
*** pg_backup.h	13 Apr 2008 03:49:21 -	1.47
--- pg_backup.h	29 Sep 2008 02:43:51 -
***
*** 123,128 
--- 123,130 
  	int			suppressDumpWarnings;	/* Suppress output of WARNING entries
  		 * to stderr */
  	bool		single_txn;
+ 	int			number_of_threads;
+ 	bool		truncate_before_load;
  
  	bool	   *idWanted;		/* array showing which dump IDs to emit */
  } RestoreOptions;
***
*** 165,170 
--- 167,173 
  extern void CloseArchive(Archive *AH);
  
  extern void RestoreArchive(Archive *AH, RestoreOptions *ropt);
+ extern void RestoreArchiveParallel(Archive *AH, RestoreOptions *ropt);
  
  /* Open an existing archive */
  extern Archive *OpenArchive(const char *FileSpec, const ArchiveFormat fmt);
Index: pg_backup_archiver.c
===
RCS file: /cvsroot/pgsql/src/bin/pg_dump/pg_backup_archiver.c,v
retrieving revision 1.158
diff -c -r1.158 pg_backup_archiver.c
*** pg_backup_archiver.c	5 Sep 2008 23:53:42 -	1.158
--- pg_backup_archiver.c	29 Sep 2008 02:43:52 -
***
*** 27,38 
--- 27,50 
  
  #include <unistd.h>
  
+ #include <sys/types.h>
+ #include <sys/wait.h>
+ 
+ 
  #ifdef WIN32
  #include <io.h>
  #endif
  
  #include "libpq/libpq-fs.h"
  
+ typedef struct _parallel_slot 
+ {
+ 	pid_t   pid;
+ 	TocEntry *te;
+ 	DumpId  dumpId;
+ } ParallelSlot;
+ 
+ #define NO_SLOT (-1)
  
  const char *progname;
  
***
*** 70,76 
--- 82,99 
  static void dumpTimestamp(ArchiveHandle *AH, const char *msg, time_t tim);
  static OutputContext SetOutput(ArchiveHandle *AH, char *filename, int compression);
  static void ResetOutput(ArchiveHandle *AH, OutputContext savedContext);
+ static bool work_is_being_done(ParallelSlot *slot, int n_slots);
+ static int get_next_slot(ParallelSlot *slots, int n_slots);
+ static TocEntry *get_next_work_item(ArchiveHandle *AH);
+ static void prestore(ArchiveHandle *AH, TocEntry *te);
+ static void mark_work_done(ArchiveHandle *AH, pid_t worker, ParallelSlot *slots, int n_slots);
+ static int _restore_one_te(ArchiveHandle *ah, TocEntry *te, RestoreOptions *ropt,bool is_parallel);
+ static void _reduce_dependencies(ArchiveHandle * AH, TocEntry *te);
+ static void _fix_dependency_counts(ArchiveHandle *AH);
+ static void _inhibit_data_for_failed_table(ArchiveHandle *AH, TocEntry * te);
+ 
  
+ static ArchiveHandle *GAH;
  
  /*
   *	Wrapper functions.
***
*** 125,137 
  
  /* Public */
  void
  RestoreArchive(Archive *AHX, RestoreOptions *ropt)
  {
  	ArchiveHandle *AH = (ArchiveHandle *) AHX;
  	TocEntry   *te;
  	teReqs		reqs;
  	OutputContext sav;
- 	bool		defnDumped;
  
  	AH->ropt = ropt;
  	AH->stage = STAGE_INITIALIZING;
--- 148,579 
  
  /* Public */
  void
+ RestoreArchiveParallel(Archive *AHX, RestoreOptions *ropt)
+ {
+ 
+ 	ArchiveHandle *AH = (ArchiveHandle *) AHX;
+ 	ParallelSlot  *slots;
+ 	int next_slot;
+ 	TocEntry *next_work_item = NULL;
+ 	int work_status;
+ 	pid_t ret_child;
+ 	int n_slots = ropt->number_of_threads;
+ 	TocEntry *te;
+ 	teReqs	reqs;
+ 	
+ 
+ 	/* 	AH->debugLevel = 99; */
+ 	/* some routines that use ahlog() don't get passed AH */
+ 	GAH = AH;
+ 
+ 	ahlog(AH, 1, "entering RestoreArchiveParallel\n");
+ 
+ 
+ 	slots = (ParallelSlot *) calloc(sizeof(ParallelSlot), n_slots);
+ 	AH->ropt = ropt;
+ 
+ /*
+ 	if (ropt->create)
+ 		die_horribly(AH, modulename,
+ 					 "parallel restore is incompatible with --create\n");
+ */
+ 
+ 
+ 	if (ropt->dropSchema)
+ 		die_horribly(AH, modulename,
+ 					 "parallel restore is incompatible with --clean\n");
+ 
+ 	if (!ropt->useDB)
+ 		die_horribly(AH, modulename,
+ 					 "parallel restore requires direct database connection\n");
+ 
+ 
+ #ifndef HAVE_LIBZ
+ 
+ 	/* make sure we won't need (de)compression we haven't got */
+ 	if (AH->compression != 0 && AH->PrintTocDataPtr != NULL)
+ 	{
+ 		for (te = AH->toc->next; te != AH->toc; te = te->next)
+ 		{
+ 			reqs = _tocEntryRequired(te, ropt, false);
+ 			if (te->hadDumper && (reqs & REQ_DATA) != 0)
+ 				die_horribly(AH, modulename,
+ 							 "cannot restore from compressed archive (compression not

Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread David E. Wheeler

On Sep 28, 2008, at 17:46, Tom Lane wrote:


> BTW, I think it is (or should be) possible to create an index on
> hstore->'mycol', so at least one of the reasons why you should *need*
> to switch to a real database column seems bogus.


The docs say:

  <title>Indexes</title>

  <para>
   <type>hstore</> has index support for <literal>@&gt;</> and
   <literal>?</> operators.  You can use either GiST or GIN index types.
   For example:
  </para>
  <programlisting>
CREATE INDEX hidx ON testhstore USING GIST(h);

CREATE INDEX hidx ON testhstore USING GIN(h);
  </programlisting>

I'm not sure what that means. Can you create normal btree or hash
indexes on hstore columns? And is the index useful for both `@>` and
`?`?


Thanks,

David



Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Oleg Bartunov

What you're talking about is a document-based database like StrokeDB or
CouchDB. With hstore you don't need to parse the content of the
'aggregate' column; it provides the necessary methods. Also, we tried
to speed up selects using indexes. Probably we need to refresh our
interest in hstore; do you have any actual proposals?


Oleg

On Sun, 28 Sep 2008, [EMAIL PROTECTED] wrote:

> > [EMAIL PROTECTED] writes:
> > > Something like this:
> > >
> > > create adhoc table foo ();
> > >
> > > insert into foo (name, rank, serial) values ('joe', 'sargent', '42');
> > >
> > > In an ad-hoc table type, when an insert is made, and a column is not
> > > found, then a new varchar column is added.
> > >
> > > I know the idea has a lot of holes, and is probably a bad idea, but it
> > > answers an important problem of easily mapping programmatic types to a
> > > database.
> >
> > Seems like a table with one contrib/hstore column might be more relevant
> > to this guy's idea of how to do database design.
>
> That's actually a very cool module, I hadn't seen it before. I've
> considered writing something like it, but more XML-centric; I'm not sure
> it answers the concept, though.
>
> I'm not sure if you have dealt with web site sessions and object
> persistence crap, but it's a pain to get up and running, and improving
> performance is a drag. Web guys tend to know very little about databases
> and tend, sadly, not to be very inquisitive about such things.
>
> Web session and user attribute objects are typically stored in a database
> as XML, JSON, or some other aggregated format in a single column (hstore).
> That works great for when you just need to access the data by the key, but
> if you want to use the data outside the web application for something
> like OLAP, you have to decide which attributes reside in the aggregate
> column or get promoted to a full-fledged column. That's why you'll see
> tables with username, passwdhash, email, etc. in addition to an aggregated
> column of things like screen template, age, etc.
>
> So, how do you have a table with a generally arbitrary number of columns
> without creating some sort of aggregate column?  With an aggregate column,
> the data isn't on the same level as real column data, so you need to parse
> the aggregate to extract a value, and you have to do that for each value.
> On top of that, you then have to explain your aggregate strategy to the
> web guys.
>
> Being able to insert arbitrary named values, and extracting them
> similarly, IMHO works better and more naturally than some external
> aggregate system built on a column. I know it is a little outside-the-box
> thinking, what do you think?




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
