[HACKERS] New btree_gist code has a few problems

2004-06-01 Thread Tom Lane
I tried running 'make installcheck' in contrib just now, and didn't
get past btree_gist :-(

The interval test fails with the attached diffs.  text, varchar, char,
bytea, bit, varbit and numeric either dump core or go into infinite
loops during CREATE INDEX.  (It's probably significant that these are
all varlena datatypes...)

This is on HPUX 10.20 using gcc.  Let me know if you have any thoughts
about tracking it down.

regards, tom lane

*** ./expected/interval.out Fri May 28 06:43:27 2004
--- ./results/interval.out  Tue Jun  1 01:35:07 2004
***
*** 49,66 
  SELECT count(*) FROM intervaltmp WHERE a  = '199 days 21:21:23'::interval;
   count 
  ---
!  1
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >= '199 days 21:21:23'::interval;
   count 
  ---
!271
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >  '199 days 21:21:23'::interval;
   count 
  ---
!270
  (1 row)
  
--- 49,66 
  SELECT count(*) FROM intervaltmp WHERE a  = '199 days 21:21:23'::interval;
   count 
  ---
!  0
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >= '199 days 21:21:23'::interval;
   count 
  ---
!  8
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >  '199 days 21:21:23'::interval;
   count 
  ---
!  8
  (1 row)


Core dump in varchar test looks like:

Program terminated with signal 10, Bus error.
#0  0xc0c6de40 in gbt_var_key_copy (u=0x7b03e4f0, force_node=5 '\005')
at btree_utils_var.c:31
31              r = (GBT_VARKEY *) palloc(VARSIZE(u->lower) + VARSIZE(u->upper) + VARHDRSZ );
(gdb) bt
#0  0xc0c6de40 in gbt_var_key_copy (u=0x7b03e4f0, force_node=5 '\005')
at btree_utils_var.c:31
#1  0xc0c6e3f8 in gbt_var_bin_union (u=0x7b03d920, e=0x40110cd8,
tinfo=0x7afff594) at btree_utils_var.c:220
[ gdb gets confused here, possibly stack is smashed ]

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] CVS tip problems

2004-06-01 Thread Oliver Elphick
On Tue, 2004-06-01 at 01:33, Tom Lane wrote:
 First you might want to check which flavor of strerror_r() your platform
 has --- does it return int or char* ?  

I made the following change to the strerror_r call, which makes it work
correctly with threading enabled:

--- src/port/thread.c   23 Apr 2004 18:15:55 -  1.20
+++ src/port/thread.c   1 Jun 2004 07:18:26 -
@@ -71,7 +71,8 @@
 #if defined(FRONTEND) && defined(ENABLE_THREAD_SAFETY) && defined(HAVE_STRERROR_R)
/* reentrant strerror_r is available */
/* some early standards had strerror_r returning char * */
-   strerror_r(errnum, strerrbuf, buflen);
+   char buf[256];
+   StrNCpy(strerrbuf, strerror_r(errnum, buf, 256), buflen);
return strerrbuf;

 #else


(I realise this is not sufficient for a patch to correct the problem.)
-- 
Oliver Elphick  [EMAIL PROTECTED]
Isle of Wight  http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
 
 Thou will show me the path of life; in thy presence 
  is fullness of joy; at thy right hand there are  
  pleasures for evermore. Psalms 16:11 


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] New btree_gist code has a few problems

2004-06-01 Thread Teodor Sigaev
Ok, I'll have a look.
Tom Lane wrote:
I tried running 'make installcheck' in contrib just now, and didn't
get past btree_gist :-(
The interval test fails with the attached diffs.  text, varchar, char,
bytea, bit, varbit and numeric either dump core or go into infinite
loops during CREATE INDEX.  (It's probably significant that these are
all varlena datatypes...)
This is on HPUX 10.20 using gcc.  Let me know if you have any thoughts
about tracking it down.
regards, tom lane
*** ./expected/interval.out	Fri May 28 06:43:27 2004
--- ./results/interval.out	Tue Jun  1 01:35:07 2004
***
*** 49,66 
  SELECT count(*) FROM intervaltmp WHERE a  = '199 days 21:21:23'::interval;
   count 
  ---
!  1
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >= '199 days 21:21:23'::interval;
   count 
  ---
!271
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >  '199 days 21:21:23'::interval;
   count 
  ---
!270
  (1 row)
  
--- 49,66 
  SELECT count(*) FROM intervaltmp WHERE a  = '199 days 21:21:23'::interval;
   count 
  ---
!  0
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >= '199 days 21:21:23'::interval;
   count 
  ---
!  8
  (1 row)
  
  SELECT count(*) FROM intervaltmp WHERE a >  '199 days 21:21:23'::interval;
   count 
  ---
!  8
  (1 row)

Core dump in varchar test looks like:
Program terminated with signal 10, Bus error.
#0  0xc0c6de40 in gbt_var_key_copy (u=0x7b03e4f0, force_node=5 '\005')
at btree_utils_var.c:31
31              r = (GBT_VARKEY *) palloc(VARSIZE(u->lower) + VARSIZE(u->upper) + VARHDRSZ );
(gdb) bt
#0  0xc0c6de40 in gbt_var_key_copy (u=0x7b03e4f0, force_node=5 '\005')
at btree_utils_var.c:31
#1  0xc0c6e3f8 in gbt_var_bin_union (u=0x7b03d920, e=0x40110cd8,
tinfo=0x7afff594) at btree_utils_var.c:220
[ gdb gets confused here, possibly stack is smashed ]
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
   http://www.postgresql.org/docs/faqs/FAQ.html
--
Teodor Sigaev  E-mail: [EMAIL PROTECTED]
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [HACKERS] pg_dump --comment?

2004-06-01 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Chris Campbell [EMAIL PROTECTED] writes:

 Harald Fuchs wrote:
 Why don't you just do
 ( echo -- This is my comment
 pg_dump whatever
 ) > dumpfile
 ?

 How could I dump using the custom format, and then use dumpfile with
 pg_restore to restore the dump? If I just prepend the comment to the
 file, then pg_restore will choke, since the file won't be in the
 proper custom format. I would have to remove the comment before
 sending the file to pg_restore. Is there an easy way to do that? That
 can be easily automated, and not take a huge amount of time given a 4
 gig dump file that must be modified?

Since pg_restore is able to read from standard input, that should not
be a problem: instead of

  pg_restore options dumpfile

just do

  sed 1d dumpfile | pg_restore options
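
Putting the two halves of the thread together, the whole round trip might look roughly like this (a sketch only; the database name, file name, and pg_restore target are placeholders, and it assumes sed passes the binary custom-format payload through untouched on your platform):

    # prepend a one-line text comment to a custom-format dump
    ( echo "-- nightly dump of mydb"
      pg_dump -Fc mydb
    ) > mydb.dump

    # strip the comment line again and restore from standard input
    sed 1d mydb.dump | pg_restore -d mydb_restored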


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Simon Riggs
On Tue, 2004-06-01 at 01:24, Tom Lane wrote:
 I was just about to commit a patch that revises the btree index build
 procedure as discussed here:
 http://archives.postgresql.org/pgsql-general/2004-05/msg00480.php
 specifically, not using shared buffers during index build and bypassing
 WAL-logging in favor of just fsyncing the index file before commit.
 
 I was actually writing the commit message when it occurred to me that
 this would seriously break PITR.  If the WAL datastream doesn't contain
 enough info to rebuild the index then rolling forward from a past backup
 isn't gonna work.
 
 I thought for a little bit about a magic reconstruct the index WAL
 entry that would invoke the index build procedure in toto, but that
 doesn't look like it will fly either.  (Two problems: during crash
 recovery, you couldn't be sure that what's on disk for the underlying
 table exactly matches the index you need to build --- it could be a
 later state of the table; and besides, the environment of the WAL replay
 process isn't capable of running user-defined functions, so it couldn't
 work for functional indexes.)
 
 So AFAICS, we've got to dump the index contents into WAL to support
 PITR.  This is a tad annoying.
 
 What I'm thinking about right now is tweaking the index-build code to
 write to WAL only if it sees that PITR is actually in use.  It would
 have to look at the GUC variables to determine whether WAL archiving
 is enabled.  If archiving isn't turned on, then we could assume that
 rollforward from a past backup isn't needed in this installation, and
 use the WAL-less index build method.
 
 Comments?

The mechanism you suggest would also break crash recovery, not just PITR
- though the avoidance of shared buffers seems like a gain either way.


..You raise the whole subject of UNRECOVERABLE data objects. Also known
as NOT LOGGED etc.

There are many significant performance gains to be had by turning off
recoverability for certain large operations. Oracle and Teradata make
extensive use of such features, i.e. especially in Data Warehousing.

Examples of such operations might be:
- index builds
- INSERT SELECTs into previously empty tables

This is an important area for performance...not just index builds.


A suggestion would be:
- add the dont send to xlog functionality as a user option on each
statement, default=LOGGING - this could be Oracle compatible, or not,
but concept is similar. Put the hooks in now and we can add this to all
appropriate statement syntax later.

e.g. 
CREATE INDEX blah ... NO LOGGING... ;
INSERT INTO blah ... NO LOGGING  SELECT... ;

- if conf file says dont use fsync, then dont write to log - clearly
they dont mind losing data in the event of a crash...
i.e. default=NOLOGGING on all statements



Best regards, Simon Riggs


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Converting postgresql.conf parameters to kilobytes

2004-06-01 Thread Shridhar Daithankar
On Monday 31 May 2004 22:00, Tom Lane wrote:
 Shridhar Daithankar [EMAIL PROTECTED] writes:
  Right now following are measured in pages
  wal_buffers
  shared_buffers
  effective_cachesize
  while rest of the memory parameters are in kb. I thought being uniform
  would be good. Besides it will make it independent of page size as well.

 It would probably be reasonable to change effective_cache_size, since we
 really do not know what the kernel's unit of buffering is (except on
 Linux, where we *do* know that it ain't 8K ;-)).  Personally I'd opt for
 measuring it in MB not KB, though; that would be a much more convenient
 unit on modern machines.  We could easily make it a float for anyone who
 thinks they know the cache size to sub-MB accuracy.

I have no problems with MB. The only thing I want to see is a bit more 
user-friendly and consistent configuration.

Initially I thought of bytes, as Oracle does, but said 'let's be a little more 
practical' and put KB..:-) MB is perfectly OK. It would just change the 
multiplier.

And I don't think specifying a float is such a good idea. It is just so 
counter-intuitive. I mean, how many people would care about a fraction of an MB 
wasted?

 As for the others, I'll side with Emerson: a foolish consistency is the
 hobgoblin of little minds.  We know very well what the unit of
 allocation of those is, and it's pages.  There's no advantage to using
 KB except making it harder to work out what's really happening.  We
 could measure max_connections in KB too if we had a mind to: there's
 a very definite shared-mem cost per connection slot.  Or the FSM
 parameters, or checkpoint_segments, or max_locks_per_transaction.
 The fact that they have quantifiable space costs doesn't mean that space
 is the most useful way to measure them.

Agreed. This is not to change things left and right. It is only to put some 
consistency in place.

 BTW, were you intending to convert KB to NBuffers by charging exactly 8K
 per buffer, or were you intending to allow for the additional shmem
 costs such as buffer headers, per-buffer LWLocks, etc?  If not the
 latter, then what are you really measuring?  For sure it's not shared
 memory size --- charging an artificial number isn't going to help anyone
 who's trying to pick shared_buffers to arrive at a particular actual
 shmem size.  But if it is the latter then it'll become even more
 impossible to tell what's really happening, and we'll be forced to
 invent some way of reading out how many buffers really got allocated.

Well, for the purpose, this is beyond what I am trying to do. As of now there 
is an int value sitting in postgresql.conf which is in page blocks. Now there 
will be a conversion before it is used, so that it is a bit more user 
friendly. The change should be skin deep so as to be low impact.

As for putting a prefix such as K or M, I don't know how much work that would be. 
Does that mean we need to convert shared_buffers to a string parameter and 
parse it? [EMAIL PROTECTED] (I would gladly write a real name but alas) 
said yesterday that there is a patch pending with Bruce for such a framework. 
I don't know what it does or how.

For simplicity, I would convert all memory parameters to either KB or MB and 
state so in postgresql.conf. No floats no suffixes. This is my opinion of 
course. Any suggestions are always welcome..

Actually I need to find out a few more things about it. It is not as simple as 
adding an assign_hook. When I tried to initdb with the changes, it demanded 64MB 
of shared buffers, which I (now) think means that NBuffers is used somewhere 
before postgresql.conf is parsed. So 8192*8000=64MB. But this is just guesswork; 
I haven't looked into it there.

If this seems reasonably OK, then I would spend some more time on it. We would 
need quite some documentation update then.

 So I disagree with the premise.  Measuring these things in KB is not an
 improvement.

As I said, KBs or MBs is not the issue. Not having it in terms of pagesize is.

 Shridhar


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Converting postgresql.conf parameters to kilobytes

2004-06-01 Thread Shridhar Daithankar
On Tuesday 01 June 2004 14:12, Shridhar Daithankar wrote:
 Actually I need to find out few more things about it. It is not as simple
 as adding a assign_hook. When I tried to initdb with changes, it demanded
 64MB of shared buffers which I (now) think that somewhere NBuffers are used
 before postgresql.conf is parsed. So 8192*8000=64MB. But this is just
 guesswork. Haven't looked in it there.

Found it. Following is the code that is causing the problem.

guc.c:2998
---
                if (conf->assign_hook)
                    if (!(*conf->assign_hook) (newval, changeVal,
                                               source))
                    {
                        ereport(elevel,
                                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                                 errmsg("invalid value for parameter \"%s\": %d",
                                        name, newval)));
                        return false;
                    }

                if (changeVal || makeDefault)
                {
                    if (changeVal)
                    {
                        *conf->variable = newval;
                        conf->gen.source = source;
                    }
---

So even if the assign_hook is executed, the value of the variable is overwritten 
in the next step, which nullifies any factoring/change in value done in the assign hook.

I find this convention in many other places in guc.c: call the assign_hook and 
then overwrite the value. So is the assign_hook called only to validate the value?  
How do I modify the value of the variable without getting specific?

I tried 

if (changeVal && !(conf->assign_hook))

and it worked. However that is just for int variables. I am not sure if that 
is a design decision. What should I do?
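
For reference, here is a minimal sketch of such a hook, assuming the int assign-hook shape implied by the guc.c excerpt above ((newval, doit, source) returning bool); the function name is hypothetical:

    /*
     * Hypothetical assign hook for a shared_buffers value given in KB.
     * It can only accept or reject newval: as the guc.c excerpt shows,
     * *conf->variable is overwritten with the original newval right after
     * the hook returns, so a KB->pages conversion (newval * 1024 / BLCKSZ)
     * computed here would simply be lost.
     */
    static bool
    assign_shared_buffers_kb(int newval, bool doit, GucSource source)
    {
        if (newval < 16)
            return false;       /* reject obviously bogus settings */
        return true;
    }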

Regards,
 Shridhar

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-06-01 Thread Manfred Koizar
On Fri, 28 May 2004 14:47:01 -0400, Tom Lane [EMAIL PROTECTED] wrote:
If putting back xmax is the price we must pay for nested transactions,
then we *will* pay that price.  Maybe not in this release, but it will
inevitably happen.

we = every Postgres user, even those that do not use subtransactions.

price = 2% to 5% performance loss for databases where the working set
is larger than main memory.

Don't bother hollering veto ;-)

Ok, I'll shut up till I have something concrete to support my opinion.

Servus
 Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-06-01 Thread Manfred Koizar
On Fri, 28 May 2004 21:59:53 -0400, Alvaro Herrera
[EMAIL PROTECTED] wrote:
So the assumption was that when we see that this has
happenned, the Cmin is no longer important (== every future command can
already see the tuple)

If it was that simple, we wouldn't have had to invent the CMIN_IS_CMAX
flag.  This has been discussed two years ago.  Did you follow the link I
posted last week?  Every future command is not enough.  You have to
consider the current command and even commands started before this.

Servus
 Manfred

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] SELECT * FROM table LIMIT 1; is really slow

2004-06-01 Thread Alvaro Herrera
On Tue, Jun 01, 2004 at 02:37:37PM +0200, Manfred Koizar wrote:
 On Fri, 28 May 2004 21:59:53 -0400, Alvaro Herrera
 [EMAIL PROTECTED] wrote:
 So the assumption was that when we see that this has
 happenned, the Cmin is no longer important (== every future command can
 already see the tuple)
 
 If it was that simple, we wouldn't have had to invent the CMIN_IS_CMAX
 flag.  This has been discussed two years ago.  Did you follow the link I
 posted last week?  Every future command is not enough.  You have to
 consider the current command and even commands started before this.

Yes, I did follow it (you mean XMAX_IS_XMIN, right?  I suppose no tuple
can really have Cmin == Cmax).  I'm not claiming I understood it fully
though.  But as you see, since the assumption is not valid we have to
drop the idea and put back the Xmax as a field on its own on
HeapTupleHeader (which is what I had done before Bruce persuaded me not
to).  I don't really like this idea but I don't see other way out.

A couple of days ago I was going to propose putting Xmax as a separate
field only as needed, in a way similar to the way Oid is handled ---
thus we would enlarge the tuple if and only if the creating transaction
deletes it.  This would be nice because one would expect that there are
not that many tuples created and deleted by the same transaction, so
we'd localize the inefficiency of storing both fields (Cmin and Xmax)
only on tuples that need it.  While I was writing the proposal I
realised that it'd mean enlarging tuples that are already on disk, and
there's no way we can do that.
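
To make the tradeoff concrete, here is a rough sketch of the two header layouts under discussion (field and struct names are illustrative, not the real HeapTupleHeaderData definition):

    #include <stdint.h>

    typedef uint32_t TransactionId;     /* stand-ins for the real typedefs */
    typedef uint32_t CommandId;

    /* Overlaid variant: cmin and xmax share storage, so a tuple deleted by
     * its own creating transaction cannot keep both values at once. */
    typedef struct SketchOverlaidHeader
    {
        TransactionId   t_xmin;
        union
        {
            CommandId       t_cmin;     /* while the tuple is only inserted */
            TransactionId   t_xmax;     /* once another xact deletes it */
        }               t_field2;
        /* ... rest of the header ... */
    } SketchOverlaidHeader;

    /* Separate-field variant: four bytes larger per tuple, but cmin and
     * xmax can coexist, which nested transactions appear to need. */
    typedef struct SketchSeparateHeader
    {
        TransactionId   t_xmin;
        TransactionId   t_xmax;
        CommandId       t_cmin;
        /* ... rest of the header ... */
    } SketchSeparateHeader;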


If you have other ideas I'm all ears.  I'm the last one to want that
nested xacts make everything else work slower.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
The eagle never lost so much time, as
when he submitted to learn of the crow. (William Blake)


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Zeugswetter Andreas SB SD

 I think an actually implementable version of this would be:
 
 1. Don't log any index operations at all in WAL.
 
 2. When recovering from WAL, restore all the table contents by WAL
 replay.  (This would of course include the system catalog contents that
 describe the indexes.)  Then sit there and do a global REINDEX to
 rebuild all the indexes.
 
 This would gain a reduction of some percentage in WAL traffic, at the
 cost of a hugely expensive recovery cycle any time you actually needed
 to use the WAL.  I guess this could be attractive to some installations,
 but I'm not sure very many people would want it ...

I think only the global part of it is not really acceptable. If we had a flag
for each index that marks it inconsistent, reindexing only those that are
marked would be great.

Could we log a WAL record that basically only marks an index for deferred reindex
after WAL recovery ? During WAL replay all records for this index could be 
ignored (this is not a must because of the post update page images in WAL, 
the index would still stay inconsistent until reindex of course).

I think such a reindex step could also be responsible for those non-btree 
indexes that don't fully support WAL (gist?).

Andreas

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Nested xacts: looking for testers and review

2004-06-01 Thread Bob . Henkel





This may be out of scope but I'm going to mention it. Would error trapping
help any of these issues? In Oracle PL/SQL you have an exception section to
handle any known or unknown errors.  Is this for the future, or does the
nested xacts code include this at all?


From: Bruce Momjian [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 05/28/2004 03:05 PM
To: Alvaro Herrera [EMAIL PROTECTED]
Cc: Tatsuo Ishii [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [HACKERS] Nested xacts: looking for testers and review




Alvaro Herrera wrote:
 On Fri, May 28, 2004 at 01:43:16PM -0400, Bruce Momjian wrote:

  In this case, I want to try all of the inserts, but any of them can
  fail, then I want the bottom part done.

 I wonder where everyone was when I asked this question a long time
 ago.  I said I thought the behavior should be like I described, and no
 one objected.

Sorry, I didn't understand the question at the time, or wasn't paying
attention.

 Personally I think it would be a mistake to allow the COMMIT for the
 subtransaction to ignore the fact that the subxact was aborted.  However
 I realize what you are proposing, and maybe this can be implemented
 using a parameter to COMMIT (indicating to not propagate the error if
 it's in aborted state, but commit normally otherwise).

 However if everyone disagrees, I can take that part out, and the code
 would be simpler.  IMHO however, it would be less reliable.

Imagine this case used in a script:

 BEGIN;
 DROP TABLE test;
 CREATE TABLE test(x int);
 COMMIT;

This will not work because the drop might fail.  However you could use
this:

 BEGIN;
 BEGIN;
 DROP TABLE test;
 COMMIT;
 CREATE TABLE test(x int);
 COMMIT;

It is done in a transaction so the table replace is an atomic operation.

One interesting idea would be for COMMIT to affect the outer
transaction, and END not affect the outer transaction.  Of course that
kills the logic that COMMIT and END are the same, but it is an
interesting idea, and doesn't affect backward compatibility because
END/COMMIT behave the same in non-nested transactions.

If this is the type of issue we are dealing with for the patch, I feel
very good.  Good job Alvaro.

--
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania
19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org








---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Proposed Query Planner TODO items

2004-06-01 Thread markw
On 12 Feb, Tom Lane wrote:
 [EMAIL PROTECTED] writes:
 Ok, I have EXPLAIN ANALYZE results for both the power and throughput
 tests:
  http://developer.osdl.org/markw/dbt3-pgsql/
 
 Thanks.  I just looked at Q9 and Q21, since those are the slowest
 queries according to your chart.  (Are all the queries weighted the same
 for evaluation purposes, or are some more important than others?)
 
[snip]

 The estimate for the part/partsupp join is close enough (60K vs 90K
 rows), but why is it estimating 92 rows out of the join to lineitem when
 the true figure is 681518?  With a more accurate estimate the planner
 would probably have chosen different join methods above this point.
 
 Can you show us the pg_stats rows for the columns p_partkey, l_partkey,
 ps_suppkey, and l_suppkey?
 
 It would also be interesting to see whether a better estimate emerges
 if you increase default_statistics_target (try 100 or so).

http://developer.osdl.org/markw/dbt3-pgsql/62/

This run changes default_statistics_target to 1000, and I have the p_partkey,
l_partkey, ps_suppkey, and l_suppkey pg_stats here at 1 min intervals
(there are no links on the web page; the direct URLs are below).  Pretty significant performance change.

Power:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/power.ps_suppkey.out

Throughput:

http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.l_suppkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.p_partkey.out
http://developer.osdl.org/markw/dbt3-pgsql/62/db_stat/thuput.ps_suppkey.out


Something went wrong when I tried to run another test with the Q21
changes overnight, so I'll have to get back to you on that one.

Mark

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 The mechanism you suggest would also break crash recovery, not just PITR

No it wouldn't.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] dblink - custom datatypes NOW work :)

2004-06-01 Thread Joe Conway
Mark Gibson wrote:
   I've found the problem, although I'm still a bit confused by it.
I hate to do this to you now, but after some thought I think I have a 
better approach -- I'd be interested in opinions on that assessment.

The attached eliminates pgresultGetTupleDesc() entirely and instead 
depends on the TupleDesc passed as rsinfo->expectedDesc from the 
executor. What this means is that the string representation of the 
remote value (from the out function on the remote side, as provided by 
libpq) will get fed into the in function corresponding to the local 
type you assign in your SQL statement. Assuming the types on the two 
sides are the same (or at least compatible), it should work well.

Please give this a try and let me know what you think.
Joe
Index: contrib/dblink/dblink.c
===
RCS file: /opt/src/cvs/pgsql-server/contrib/dblink/dblink.c,v
retrieving revision 1.29
diff -c -r1.29 dblink.c
*** contrib/dblink/dblink.c 28 Nov 2003 05:03:01 -  1.29
--- contrib/dblink/dblink.c 13 Feb 2004 18:23:49 -
***
*** 82,88 
  static int16 get_attnum_pk_pos(int16 *pkattnums, int16 pknumatts, int16 key);
  static HeapTuple get_tuple_of_interest(Oid relid, int16 *pkattnums, int16 pknumatts, 
char **src_pkattvals);
  static Oidget_relid_from_relname(text *relname_text);
- static TupleDesc pgresultGetTupleDesc(PGresult *res);
  static char *generate_relation_name(Oid relid);
  
  /* Global */
--- 82,87 
***
*** 395,400 
--- 394,400 
StringInfo  str = makeStringInfo();
char   *curname = NULL;
int howmany = 0;
+   ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
if (PG_NARGS() == 3)
{
***
*** 457,463 
if (functyptype == 'c')
tupdesc = TypeGetTupleDesc(functypeid, NIL);
	else if (functyptype == 'p' && functypeid == RECORDOID)
!   tupdesc = pgresultGetTupleDesc(res);
else
/* shouldn't happen */
		elog(ERROR, "return type must be a row type");
--- 457,472 
if (functyptype == 'c')
tupdesc = TypeGetTupleDesc(functypeid, NIL);
	else if (functyptype == 'p' && functypeid == RECORDOID)
!   {
!   if (!rsinfo)
!   ereport(ERROR,
!   (errcode(ERRCODE_SYNTAX_ERROR),
!					 errmsg("returning setof record is not " \
!							"allowed in this context")));
! 
!   /* get the requested return tuple description */
!   tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
!   }
else
/* shouldn't happen */
		elog(ERROR, "return type must be a row type");
***
*** 550,555 
--- 559,565 
char   *sql = NULL;
char   *conname = NULL;
remoteConn *rcon = NULL;
+   ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();
***
*** 620,626 
if (functyptype == 'c')
tupdesc = TypeGetTupleDesc(functypeid, NIL);
	else if (functyptype == 'p' && functypeid == RECORDOID)
!   tupdesc = pgresultGetTupleDesc(res);
else
/* shouldn't happen */
		elog(ERROR, "return type must be a row type");
--- 630,645 
if (functyptype == 'c')
tupdesc = TypeGetTupleDesc(functypeid, NIL);
	else if (functyptype == 'p' && functypeid == RECORDOID)
!   {
!   if (!rsinfo)
!   ereport(ERROR,
!   (errcode(ERRCODE_SYNTAX_ERROR),
!					 errmsg("returning setof record is not " \
!							"allowed in this context")));
! 
!   /* get the requested return tuple description */
!   tupdesc = CreateTupleDescCopy(rsinfo->expectedDesc);
!   }
else
/* shouldn't happen */
		elog(ERROR, "return type must be a row type");
***
*** 

Re: [HACKERS] CVS tip problems

2004-06-01 Thread Tom Lane
Oliver Elphick [EMAIL PROTECTED] writes:
 -   strerror_r(errnum, strerrbuf, buflen);
 +   char buf[256];
 +   StrNCpy(strerrbuf, strerror_r(errnum, buf, 256), buflen);
 return strerrbuf;

Easier and safer would be

 -   strerror_r(errnum, strerrbuf, buflen);
 -   return strerrbuf;
 +   return strerror_r(errnum, strerrbuf, buflen);

The real point here is that we need to code differently depending on
which flavor of strerror_r we have.
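
For illustration, a minimal sketch of that flavor-dependent coding, assuming a configure-supplied macro along the lines of autoconf's STRERROR_R_CHAR_P (not necessarily what thread.c will end up using):

    #ifdef STRERROR_R_CHAR_P
        /* GNU flavor: returns char *, which may or may not be strerrbuf */
        return strerror_r(errnum, strerrbuf, buflen);
    #else
        /* SUSv3 flavor: returns int and always fills strerrbuf */
        strerror_r(errnum, strerrbuf, buflen);
        return strerrbuf;
    #endif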

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [pgsql-hackers-win32] [HACKERS] select like...not using index

2004-06-01 Thread Merlin Moncure
  Sometime between yesterday and today queries in the form of
  select * from t where t.f like 'k%' have been broken so that they
  never use the index (on win32, not sure about others).
 
  On win32, at least, they have been broken for a while but this was
due
  to a known issue based on the locales.  AFAICT, the current cvs has
  addressed this issue and (show lc_collate returns C) there seems to
be
  no reason why the queries aren't working properly.
 
  Merlin
 
 Did you do an ANALYZE on the table?
Yes.  Just for kicks, I also drop/rc the index...no help.  Following
that, I ran a fresh initdb which reported:
The database cluster will be initialized with locale English_United
States.1252.

I then ran I just recently had the same issue (due to locale problems).
This was recently fixed in cvs and replaced the hack I was using to work
around the problem.  The index search no longer works and I am very
suspicious about a locale related issue.  This is all off of a fresh
copy of 7.5devel from the anonymous cvs server.

 Are there a lot of duplicate keys?
 How big is the table?
About 250k with less than 1% duplication.
 What does the explain look like?

cpc=# explain select * from hchassis where vin_no = '2FTZX08W8WCA24365';
  QUERY PLAN

--
 Index Scan using hchassis_vin_no_idx on hchassis  (cost=0.00..8.94
rows=2 width=437)
   Index Cond: (vin_no = '2FTZX08W8WCA24365'::bpchar)
(2 rows)

cpc=# explain select * from hchassis where vin_no like
'2FTZX08W8WCA24365%';
  QUERY PLAN
--
 Seq Scan on hchassis  (cost=0.00..19577.70 rows=1 width=437)
   Filter: (vin_no ~~ '2FTZX08W8WCA24365%'::text)
(2 rows)

cpc=#

Merlin



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [pgsql-hackers-win32] [HACKERS] select like...not using index

2004-06-01 Thread pgsql
  Sometime between yesterday and today queries in the form of
  select * from t where t.f like 'k%' have been broken so that they
  never use the index (on win32, not sure about others).
 
  On win32, at least, they have been broken for a while but this was
 due
  to a known issue based on the locales.  AFAICT, the current cvs has
  addressed this issue and (show lc_collate returns C) there seems to
 be
  no reason why the queries aren't working properly.
 
  Merlin

 Did you do an ANALYZE on the table?
 Yes.  Just for kicks, I also drop/rc the index...no help.  Following
 that, I ran a fresh initdb which reported:
 The database cluster will be initialized with locale English_United
 States.1252.

 I then ran I just recently had the same issue (due to locale problems).
 This was recently fixed in cvs and replaced the hack I was using to work
 around the problem.  The index search no longer works and I am very
 suspicious about a locale related issue.  This is all off of a fresh
 copy of 7.5devel from the anonymous cvs server.

 Are there a lot of duplicate keys?
 How big is the table?
 About 250k with less than 1% duplication.
 What does the explain look like?

 cpc=# explain select * from hchassis where vin_no = '2FTZX08W8WCA24365';
   QUERY PLAN
 
 --
  Index Scan using hchassis_vin_no_idx on hchassis  (cost=0.00..8.94
 rows=2 width=437)
Index Cond: (vin_no = '2FTZX08W8WCA24365'::bpchar)
 (2 rows)

 cpc=# explain select * from hchassis where vin_no like
 '2FTZX08W8WCA24365%';
   QUERY PLAN
 --
  Seq Scan on hchassis  (cost=0.00..19577.70 rows=1 width=437)
Filter: (vin_no ~~ '2FTZX08W8WCA24365%'::text)
 (2 rows)

 cpc=#

It looks to me like you have an index of type bpchar but are searching
with type text. I find type conversions very limited with LIKE.

I would create an index on 'vin_no' using a cast to TEXT. This should work
on both queries.
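
As a concrete sketch of that suggestion (the index name is made up, and whether the planner actually uses it for a LIKE prefix search still depends on the locale, as the rest of the thread shows):

    -- expression index on the text cast; the query must use the same
    -- expression for the index to be considered
    CREATE INDEX hchassis_vin_no_text_idx ON hchassis ((vin_no::text));
    ANALYZE hchassis;

    EXPLAIN SELECT * FROM hchassis WHERE vin_no::text LIKE '2FTZX08W8WCA24365%';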


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [pgsql-hackers-win32] [HACKERS] select like...not using index

2004-06-01 Thread Merlin Moncure
 It looks to me like you have an index of type bpchar but are
searching
 with type text. I find type conversions very limited with LIKE.
 
 I would create an index on 'vin_no' using a cast to TEXT. This should
work
 on both queries.

Not in this case.  Just to be sure, I created a new column of text type,
created an index, analyzed, and searched, and got the same behavior.

Furthermore, I did this:
cpc=# show lc_collate;
 lc_collate

 C
(1 row)

cpc=# show lc_ctype;
 lc_ctype
--
 C
(1 row)

followed by this:
C:\postgres\pgsql\src\test\locale>pg_controldata
[...]
LC_COLLATE:   English_United States.1252
LC_CTYPE: English_United States.1252


At this point I'm about 90% sure I've turned up a locale related
bug...initdb warned me wrt the locale but psql is still reporting 'C'.
Plus, my queries don't work where they used to about a week ago.  My
next step is to initdb --locale=C to confirm this.  I've informed Magnus
about this and he is looking into it.

Merlin


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Bob . Henkel





Sounds like a project manager type should be put into place to organize
this information into a straight stream instead of 50 random mists of water.




From: Andrew Dunstan [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 06/01/2004 11:10 AM
To: [EMAIL PROTECTED]
Subject: Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004




Marc G. Fournier wrote:


 Just so that everyone is aware, we are going to push the freeze date
 for 7.5 to July 1st.

 Although we feel that there are enough improvements and features
 already in place for 7.5, Tom's felt that if we gave it that extra
 month, we could also have PITR in place for 7.5 ...

 If anyone is working on other features that they feel can be polished
 off before the July 1st deadline, we would be most happy to
 incorporate those as well, but do recommend submitting patches for
 review *sooner*, rather than later, so that any recommended
 corrections can be addressed before the deadline.


I welcome this, as I always thought June 1 was too soon. However, I
think that the process by which the date was eventually arrived at was
unfortunate.

I would modestly suggest that there should be a minimum period of notice
of a feature freeze - 6 weeks or 2 months seems about right to me, given
the development cycle we seem to have, and the fact that many of the
critical things people are working on are quite large.

(I'd also like to see someone who would get regular progress reports
from people who have undertaken to work on large/critical items, so that
we don't get into a position of thinking they will make a cutoff date
and then finding out late in the piece that they will not, but maybe
that's a discussion for another day).

cheers

andrew

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster








---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Andrew Dunstan
Marc G. Fournier wrote:
Just so that everyone is aware, we are going to push the freeze date 
for 7.5 to July 1st.

Although we feel that there are enough improvements and features 
already in place for 7.5, Tom's felt that if we gave it that extra 
month, we could also have PITR in place for 7.5 ...

If anyone is working on other features that they feel can be polished 
off before the July 1st deadline, we would be most happy to 
incorporate those as well, but do recommend submitting patches for 
review *sooner*, rather than later, so that any recommended 
corrections can be addressed before the deadline.

I welcome this, as I always thought June 1 was too soon. However, I 
think that the process by which the date was eventually arrived at was 
unfortunate.

I would modestly suggest that there should be a minimum period of notice 
of a feature freeze - 6 weeks or 2 months seems about right to me, given 
the development cycle we seem to have, and the fact that many of the 
critical things people are working on are quite large.

(I'd also like to see someone who would get regular progress reports 
from people who have undertaken to work on large/critical items, so that 
we don't get into a position of thinking they will make a cutoff date 
and then finding out late in the piece that they will not, but maybe 
that's a discussion for another day).

cheers
andrew
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-06-01 Thread Greg Stark
Christopher Browne [EMAIL PROTECTED] writes:

 PITR may turn out to be a don't care item if Slony1 winds up
 providing its own approach to PITR.  (e.g. - if you  write out to
 disk the sets of SQL statements that are to be applied to a replica,
 then the spooled sets of these statements represent a history of
 updates that can be used to do PITR.)

In the long run nothing can substitute for PITR. It's the only way to get a
backup that is guaranteed to restore you to exactly what you had before.

Logical dumps a la pg_dump suffer from having to unparse and parse all the
data. Any data types that don't accurately store themselves as text (such as
arrays currently, due to the index lower bound) get corrupted.

SQL level replication, a la Slony wouldn't serve if you have any
non-deterministic behaviour. And given SQL's set theoretic roots
non-deterministic behaviour can creep in in places where it's not expected,
such as any query that doesn't have an ORDER BY clause.

Both of these tools have their uses, but they don't provide a rock solid
guarantee that you can restore a machine to exactly what you had previously.
To do that you really want something that works at the storage level and
doesn't try to re-interpret data or reapply any logic.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 I thought for a little bit about a magic reconstruct the index WAL
 entry that would invoke the index build procedure in toto, but that
 doesn't look like it will fly either.  (Two problems: during crash
 recovery, you couldn't be sure that what's on disk for the underlying
 table exactly matches the index you need to build --- it could be a
 later state of the table; and besides, the environment of the WAL replay
 process isn't capable of running user-defined functions, so it couldn't
 work for functional indexes.)

Could you just mark the index as unusable? Have the optimizer ignore such
indexes and PITR recovery can notify the user of these indexes and/or invoke a
rebuild automatically?

It wouldn't happen unless the user had done an index rebuild since the last
complete backup, so it wouldn't even be a performance issue. Restoring the
index from the WAL replay of an index rebuild must take a long time anyways.

-- 
greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Marc G. Fournier
On Tue, 1 Jun 2004, Andrew Dunstan wrote:
Marc G. Fournier wrote:
Just so that everyone is aware, we are going to push the freeze date for 
7.5 to July 1st.

Although we feel that there are enough improvements and features already 
in place for 7.5, Tom's felt that if we gave it that extra month, we could 
also have PITR in place for 7.5 ...

If anyone is working on other features that they feel can be polished off 
before the July 1st deadline, we would be most happy to incorporate those 
as well, but do recommend submitting patches for review *sooner*, rather 
than later, so that any recommended corrections can be addressed before 
the deadline.

I welcome this, as I always thought June 1 was too soon. However, I think 
that the process by which the date was eventually arrived at was unfortunate.

I would modestly suggest that there should be a minimum period of notice of a 
feature freeze - 6 weeks or 2 months seems about right to me, given the
Oh, you mean the original freeze date that was set at the start of the dev 
cycle 6 months ago?


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Andrew Dunstan
Marc G. Fournier wrote:
On Tue, 1 Jun 2004, Andrew Dunstan wrote:
Marc G. Fournier wrote:
Just so that everyone is aware, we are going to push the freeze date 
for 7.5 to July 1st.

Although we feel that there are enough improvements and features 
already in place for 7.5, Tom's felt that if we gave it that extra 
month, we could also have PITR in place for 7.5 ...

If anyone is working on other features that they feel can be 
polished off before the July 1st deadline, we would be most happy to 
incorporate those as well, but do recommend submitting patches for 
review *sooner*, rather than later, so that any recommended 
corrections can be addressed before 
the deadline.
I welcome this, as I always thought June 1 was too soon. However, I 
think that the process by which the date was eventually arrived at 
was unfortunate.

I would modestly suggest that there should be a minimum period of 
notice of a feature freeze - 6 weeks or 2 months seems about right to 
me, given the

Oh, you mean the original freeze date that was set at the start of the 
dev cycle 6 months ago?

I am far from being the only person to whom this was less than clear. I 
also know that when I discussed this with one or two members of the core 
team *they* were not clear about it either.

Maybe I missed something in an email somewhere ...
In any case, I think a target date should be set at the beginning of a 
dev cycle and a hard date should be set closer to the end of the cycle. 
Trying to adhere rigidly to a date set nine or twelve months previously 
doesn't strike me as good practice.

cheers
andrew
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Alvaro Herrera
On Tue, Jun 01, 2004 at 12:52:32PM -0400, Greg Stark wrote:
 
 Tom Lane [EMAIL PROTECTED] writes:
 
  I thought for a little bit about a magic reconstruct the index WAL
  entry that would invoke the index build procedure in toto, but that
  doesn't look like it will fly either.  (Two problems: during crash
  recovery, you couldn't be sure that what's on disk for the underlying
  table exactly matches the index you need to build --- it could be a
  later state of the table; and besides, the environment of the WAL replay
  process isn't capable of running user-defined functions, so it couldn't
  work for functional indexes.)
 
 Could you just mark the index as unusable? Have the optimizer ignore such
 indexes and PITR recovery can notify the user of these indexes and/or invoke a
 rebuild automatically?

The big problem I see with this kind of approaches is that building an
index from scratch can take a huge amount of time, because you have to
sort the data.  Building from WAL does not have this problem, so it can
be much faster.  Of course, when you are restoring using a PITR
approach you probably want it to be very fast, and have the DB running
with as little quirks as possible, as soon as possible.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
El realista sabe lo que quiere; el idealista quiere lo que sabe (Anónimo)


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Bob . Henkel





I really hate seeing all the developers wasting time and brain cycles on
this type of stuff.  I would much rather that time and brain cycles be put
to the design and development of the code.

Would a project manager type position be of any value, to take some of this
off the developers and onto the project manager? They would be the focal
point for this type of stuff and responsible for getting updates from the
developers and checking statuses and things of that nature.


From: Andrew Dunstan [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 06/01/2004 12:26 PM
To: [EMAIL PROTECTED]
Subject: Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004




Marc G. Fournier wrote:

 On Tue, 1 Jun 2004, Andrew Dunstan wrote:

 Marc G. Fournier wrote:


 Just so that everyone is aware, we are going to push the freeze date
 for 7.5 to July 1st.

 Although we feel that there are enough improvements and features
 already in place for 7.5, Tom's felt that if we gave it that extra
 month, we could also have PITR in place for 7.5 ...

 If anyone is working on other features that they feel can be
 polished off before the July 1st deadline, we would be most happy to
 incorporate those as well, but do recommend submitting patches for
 review *sooner*, rather than later, so that any recommended
 corrections can be addressed before the deadline.


 I welcome this, as I always thought June 1 was too soon. However, I
 think that the process by which the date was eventually arrived at
 was unfortunate.

 I would modestly suggest that there should be a minimum period of
 notice of a feature freeze - 6 weeks or 2 months seems about right to
 me, given the


 Oh, you mean the original freeze date that was set at the start of the
 dev cycle 6 months ago?


I am far from being the only person to whom this was less than clear. I
also know that when I discussed this with one or two members of the core
team *they* were not clear about it either.

Maybe I missed something in an email somewhere ...

In any case, I think a target date should be set at the beginning of a
dev cycle and a hard date should be set closer to the end of the cycle.
Trying to adhere rigidly to a date set nine or twelve months previously
doesn't strike me as good practice.

cheers

andrew

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]








---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Greg Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 The big problem I see with this kind of approaches is that building an
 index from scratch can take a huge amount of time, because you have to
 sort the data.  Building from WAL does not have this problem, so it can
 be much faster.  

I'm not clear that building from WAL is really going to be that much faster.
A) algorithmically it's only the factor of log(n) that you're talking about.
and B) the WAL will have records for every write, not just the final product,
so it might potentially have a lot more writes to do.

I thought part of the original problem was specifically that going through WAL
slowed down the index rebuild much more than a factor of 2, which would tend
to imply that in fact rebuilding from WAL isn't going to be as fast as you
might expect.

Another possibility is doing the complete index build without going through
WAL and then inserting a complete copy of the index into the WAL without
syncing or activating the rebuilt index until the copy to WAL is done. That
kind of sucks since it's equivalent to just taking another backup of the data
files immediately after the rebuild, but might be a more direct solution using
the existing tools.

 Of course, when you are restoring using a PITR approach you probably want it
 to be very fast, and have the DB running with as little quirks as possible,
 as soon as possible.

This is certainly true.

-- 
greg


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Mike Benoit
Having a good hard copy (not having to search mailing list archives)
of release dates would be really nice, not just for developers, but
users too. Even if they are subject to change without notice. 

I think Mozilla has a great concept with their Milestone Schedule, the
gray table at: http://www.mozilla.org/roadmap.html#milestone-schedule.

I'm sure having just a small table like what Mozilla uses on the
PostgreSQL developers page would work wonders to eliminate much of the
confusion in the future. 


On Tue, 2004-06-01 at 13:26 -0400, Andrew Dunstan wrote:
 Marc G. Fournier wrote:
 
  On Tue, 1 Jun 2004, Andrew Dunstan wrote:
 
  Marc G. Fournier wrote:
 
 
  Just so that everyone is aware, we are going to push the freeze date 
  for 7.5 to July 1st.
 
  Although we feel that there are enough improvements and features 
  already in place for 7.5, Tom's felt that if we gave it that extra 
  month, we could also have PITR in place for 7.5 ...
 
  If anyone is working on other features that they feel can be 
  polished off before the July 1st deadline, we would be most happy to 
  incorporate those as well, but do recommend submitting patches for 
  review *sooner*, rather than later, so that any recommended
  corrections can be addressed before the deadline.
 
 
  I welcome this, as I always thought June 1 was too soon. However, I 
  think that the process by which the date was eventually arrived at 
  was unfortunate.
 
  I would modestly suggest that there should be a minimum period of 
  notice of a feature freeze - 6 weeks or 2 months seems about right to 
  me, given the
 
 
  Oh, you mean the original freeze date that was set at the start of the 
  dev cycle 6 months ago?
 
 
 I am far from being the only person to whom this was less than clear. I 
 also know that when I discussed this with one or two members of the core 
 team *they* were not clear about it either.
 
 Maybe I missed something in an email somewhere ...
 
 In any case, I think a target date should be set at the beginning of a 
 dev cycle and a hard date should be set closer to the end of the cycle. 
 Trying to adhere rigidly to a date set nine or twelve months previously 
 doesn't strike me as good practice.
 
 cheers
 
 andrew
 
 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
-- 
Mike Benoit [EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 I'm not clear that building from WAL is really going to be that much faster.
 A) algorithmically it's only the factor of log(n) that you're talking about.
 and B) the WAL will have records for every write, not just the final product,
 so it might potentially have a lot more writes to do.

Wrong ... what we log in WAL for a btree index build is just the series
of completed index page images.  Recreation of the index would proceed
at whatever your disk read/write bandwidth is.

Like Alvaro, I suspect that people who are using PITR will be concerned
about recovery time, and would not be thrilled with any scenario that
involves REINDEX to get the system back on its feet.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Alvaro Herrera
On Tue, Jun 01, 2004 at 01:55:38PM -0400, Greg Stark wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 
  The big problem I see with this kind of approaches is that building an
  index from scratch can take a huge amount of time, because you have to
  sort the data.  Building from WAL does not have this problem, so it can
  be much faster.  
 
 I'm not clear that building from WAL is really going to be that much faster.
 A) algorithmically it's only the factor of log(n) that you're talking about.
 and B) the WAL will have records for every write, not just the final product,
 so it might potentially have a lot more writes to do.

Maybe it is log(n) algorithmically, but the constants are big because
there's a lot of non-sequential I/O involved.  With the WAL approach you
only read the pages from the log and copy them somewhere else.  It's not
nearly the same amount of I/O, and it's mostly sequential.  And if I
understood correctly what Tom said, after index construction the whole
index pages are dropped, so there's no need to redo each node split
operation.


 I thought part of the original problem was specifically that going through WAL
 slowed down the index rebuild much more than a factor of 2, which would tend
 to imply that in fact rebuilding from WAL isn't going to be as fast as you
 might expect.

I think there was more than one problem in the code.  I expect at least
those not related (such as some locking issue apparently) are solved.
And I'd expect WAL construction to be heavier than recovery from WAL,
because some WAL writes have to be fsync()ed, and this is a heavy
burden, while recovery does not need fsync on the new files AFAIK (but I
could be wrong on this).

One of the things that bothered me was that for some reason you couldn't
get the whole performance benefit you would expect from simultaneous
index builds when using a multiprocessor machine.


 Another possibility is doing the complete index build without going through
 WAL and then inserting a complete copy of the index into the WAL without
 syncing or activating the rebuilt index until the copy do WAL is done. That
 kind of sucks since it's equivalent to just taking another backup of the data
 files immediately after the rebuild, but might be a more direct solution using
 the existing tools.

Apparently this is the current state of affairs, though I'm not sure.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
The person who did not want to sin / was obliged to sit
on hard, steep chairs / devoid, to be sure,
of soft mitigations  (Patricio Vogel)


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Matthew T. O'Connor
Or take KDE as an example: they have both a document on the website for
the release schedule and another one listing features that are desired
for the next release, are being worked on, or have been completed.

http://developer.kde.org/development-versions/


 Having a good hard copy (not having to search mailing list archives)
 of release dates would be really nice, not just for developers, but
 users too. Even if they are subject to change without notice.

 I think Mozilla has a great concept with their Milestone Schedule, the
 gray table at: http://www.mozilla.org/roadmap.html#milestone-schedule.

 I'm sure having just a small table like what Mozilla uses on the
 PostgreSQL developers page would work wonders to eliminate much of the
 confusion in the future.


 On Tue, 2004-06-01 at 13:26 -0400, Andrew Dunstan wrote:
 Marc G. Fournier wrote:

  On Tue, 1 Jun 2004, Andrew Dunstan wrote:
 
  Marc G. Fournier wrote:
 
 
  Just so that everyone is aware, we are going to push the freeze date
  for 7.5 to July 1st.
 
  Although we feel that there are enough improvements and features
  already in place for 7.5, Tom's felt that if we gave it that extra
  month, we could also have PITR in place for 7.5 ...
 
  If anyone is working on other features that they feel can be
  polished off before the July 1st deadline, we would be most happy to
  incorporate those as well, but do recommend submitting patches for
   review *sooner*, rather than later, so that any recommended
   corrections can be addressed before the deadline.
 
 
  I welcome this, as I always thought June 1 was too soon. However, I
  think that the process by which the date was eventually arrived at
  was unfortunate.
 
  I would modestly suggest that there should be a minimum period of
  notice of a feature freeze - 6 weeks or 2 months seems about right to
  me, given the
 
 
  Oh, you mean the original freeze date that was set at the start of the
  dev cycle 6 months ago?
 

 I am far from being the only person to whom this was less than clear. I
 also know that when I discussed this with one or two members of the core
 team *they* were not clear about it either.

 Maybe I missed something in an email somewhere ...

 In any case, I think a target date should be set at the beginning of a
 dev cycle and a hard date should be set closer to the end of the cycle.
 Trying to adhere rigidly to a date set nine or twelve months previously
 doesn't strike me as good practice.

 cheers

 andrew

 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
 --
 Mike Benoit [EMAIL PROTECTED]


 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-06-01 Thread Heikki Linnakangas
On Thu, 27 May 2004, Christopher Browne wrote:

 [EMAIL PROTECTED] (Bruce Momjian) wrote:
  Win32 has 98% of its code in CVS, so it will make it
  Tablespaces - Christopher says it is ready, and has run tests
  PITR - some functionality might be in 7.5, but we aren't sure
  Nested transactions - Alvaro thinks it is close, but we don't know

 Does this mean that 2PC is likely to be deferred?  I believe that
 integration work on that was awaiting having nested transactions in
 the codebase...

I'm not waiting for the nested transactions to land anymore. I have simply
been too busy to finish up 2PC. :(

When I said I'd wait for nested transactions, some weeks ago, I thought
that Alvaro's patch was going to be checked in within a couple of days or a
week. That impression was wrong. Anyway, I don't believe the patches
conflict very badly, so there's not really a need for either one to wait.

Pushing the freeze date to 1. July just might be enough for me to finish
the 2PC patch, but don't get your hopes up. I don't know how much time I
can spend on it in June.

The issue that I'm currently pondering is what to do with SET SESSION
variables on 2PC. Thus far my thinking has been that the PREPARE
TRANSACTION command (1st phase) detaches the transaction from the backend.
You can do whatever you want with the backend, run other transactions,
drop the connection etc. The prepared transaction acts just like a
running transaction in another backend. You can also use a different
backend to commit the transaction. The question is, what happens to
session variables when the first transaction commits? The original
backend could be in the middle of another transaction, surely you can't
just modify the variables. But you can't really ignore them either.
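
To make the mechanics concrete (leaving the session-variable question
aside), here is a rough sketch of how this might look from psql; the
"accounts" table is just an example, and the commit-side syntax is only my
working guess at what the patch will end up calling it:

    BEGIN;
    INSERT INTO accounts VALUES (1, 100.00);
    PREPARE TRANSACTION 'txn-42';    -- 1st phase: detach from this backend
    -- ... later, from this backend or any other one:
    COMMIT PREPARED 'txn-42';        -- or ROLLBACK PREPARED 'txn-42'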

There is also a lot of other functionality that throws "not implemented"
errors if you try to use 2PC with it. For example, DDL and
notifications. I believe they are quite straightforward, I just
haven't had the time to tackle them yet.

You can always check the latest version of the patch at
http://www.hut.fi/~hlinnaka/pgsql/, I try to update it whenever there is
progress.

- Heikki


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Simon Riggs
On Tue, 2004-06-01 at 18:26, Andrew Dunstan wrote:
 Marc G. Fournier wrote:
 
  On Tue, 1 Jun 2004, Andrew Dunstan wrote:
 
  Marc G. Fournier wrote:
 
 
  Just so that everyone is aware, we are going to push the freeze date 
  for 7.5 to July 1st.
 
  Although we feel that there are enough improvements and features 
  already in place for 7.5, Tom's felt that if we gave it that extra 
  month, we could also have PITR in place for 7.5 ...
 
  If anyone is working on other features that they feel can be 
  polished off before the July 1st deadline, we would be most happy to 
  incorporate those as well, but do recommend submitting patches for 
   review *sooner*, rather than later, so that any recommended 
   corrections can be addressed before the deadline.
 
 
  I welcome this, as I always thought June 1 was too soon. However, I 
  think that the process by which the date was eventually arrived at 
  was unfortunate.
 
  I would modestly suggest that there should be a minimum period of 
  notice of a feature freeze - 6 weeks or 2 months seems about right to 
  me, given the
 
 
  Oh, you mean the original freeze date that was set at the start of the 
  dev cycle 6 months ago?
 
 
 I am far from being the only person to whom this was less than clear. I 
 also know that when I discussed this with one or two members of the core 
 team *they* were not clear about it either.
 
 Maybe I missed something in an email somewhere ...
 

The June 1st date was first mentioned on list in mid-March (to me), but
wasn't generally announced until May under a specific heading. If it was
set in January, I was never knowingly party to that info.

Major architectural changes notwithstanding, xlog archiving was
originally completed in late April, having started in Feb.

A published schedule might have helped all of us to understand the
impact of an extra week's discussion, etc.

Personally, I feel I had good notice, but that doesn't mean it was
possible for me to finish by that time...

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Bruce Momjian
Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  I'm not clear that building from WAL is really going to be that much faster.
  A) algorithmically it's only the factor of log(n) that you're talking about.
  and B) the WAL will have records for every write, not just the final product,
  so it might potentially have a lot more writes to do.
 
 Wrong ... what we log in WAL for a btree index build is just the series
 of completed index page images.  Recreation of the index would proceed
 at whatever your disk read/write bandwidth is.
 
 Like Alvaro, I suspect that people who are using PITR will be concerned
 about recovery time, and would not be thrilled with any scenario that
 involves REINDEX to get the system back on its feet.

Agreed.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Simon Riggs
On Tue, 2004-06-01 at 03:21, Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian [EMAIL PROTECTED] writes:
   I assume if someone turns on archiving in postgresql.conf, sighups the
   postmaster, then does a tar backup, they should be able to do archiving,
   no?
  
  I would have zero problem with labeling the archive parameter as
  changeable only at postmaster start.
 
 I guess the question is whether it would be possible to start/stop it at
 other times.  And what process are we going to use to do a tar backup? 
 Do they turn off archiving before doing the tar, or tell the system to
 tar to another location?  
 

This situation has been discussed and agreed twice already. First we
agreed it would be SUSET, then SIGHUP; now we are talking about postmaster
startup.

I'm not sure I'm too bothered either way, but the code has now been
written to make it a SIGHUP operation. Making it SIGHUP affects the way
we invoke the archiver process at postmaster startup, so if we want to
change things again we must do so real soon.

Postmaster startup is the simplest scenario at run-time, so I'd suggest
we move to that NOW and then MAYBE back to SIGHUP at a later time, when
we are more certain everything works in production.

 And you brought up the issue of how we feed multiple archive files
 back into the xlog directory during restore if they don't all fit on the
 disk.
 

That has already been requested by Tom and agreed to be on the PITR feature
list as an embellishment of the general recover-to-a-point scenario. It
*MIGHT* make it into this release, if we get the other stuff done first.

 I think we need to explore the procedures we are going to use for PITR.
 

Much of that has already been discussed.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Fast index build vs. PITR

2004-06-01 Thread Simon Riggs
On Tue, 2004-06-01 at 03:13, Alvaro Herrera wrote:
 A completely different idea would be to log a logical index creation,
 so that during normal recovery those entries are saved somewhere; after
 the rest of WAL recovery is done, the system is taken into a more normal
 post-recovery pre-usable state, on which those indexes are recreated
 from user data.  This would be cheapest in WAL traffic, but probably
 it'll also require more code and new hooks in the startup mechanism.
 Also, it'd require examining later WAL entries that refer to the index
 and act accordingly (e.g. ignore the entry if it modifies the index, and
 forget the creation if it's a DROP INDEX command.)
 

There will be many ways to optimise recovery once we have PITR
working...

The current code does a straight replay of all changes. We can imagine
lots of different multi-pass or lookahead strategies for replaying xlog
records, but please let's wait awhile...

 Not that I like either of those ideas really ... issuing normal WAL
 index creation traffic if PITR is active is certainly the easiest way.

I agree, certainly for now.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Converting postgresql.conf parameters to kilobytes

2004-06-01 Thread Josh Berkus
Shridhar, Tom,

As long as we're messing around with PostgreSQL.conf, I propose that we 
comment the file much more thoroughly -- in the style of Apache's httpd.conf
and our own pg_hba.conf (though maybe not quite as long as hba).   Someone 
proposed this for 7.4 and we ran out of time, and as I've thought about the 
idea over the last 6 months I've come to like it.

I'm happy to do the work for this ... it should be relatively easy if 
everyone's Doc patches are up to date.  I can't see any drawback to it; is
there something I'm missing?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Converting postgresql.conf parameters to kilobytes

2004-06-01 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 As long as we're messing around with PostgreSQL.conf, I propose that we 
 comment the file much more thoroughly -- in the style of Apache's httpd.conf
 and our own pg_hba.conf (though maybe not quite as long as hba).

ISTM we had decided that putting vast amounts of documentation into the
file comments was exactly the thing *not* to do.  It duplicates the SGML
documentation, thereby doubling the maintenance effort, to very little
purpose.  pg_hba's comments in particular had gotten quite out of hand
at one point, and have been pruned back severely.  Let's not follow the
same wheel of fate for postgresql.conf.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Win32, PITR, nested transactions, tablespaces

2004-06-01 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 There is also a lot of other functionality that throws "not implemented"
 errors if you try to use 2PC with it. For example, DDL and
 notifications.

Why would DDL be different from any other query?

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] ACLs versus ALTER OWNER

2004-06-01 Thread Tom Lane
I've noticed yet another hole in our handling of object permissions,
which is that ALTER OWNER doesn't modify the object ACL list at all.
This leads to unpleasant results.  For example, in CVS tip:

regression=# \c - alice
You are now connected as new user alice.
regression= create table atable (f1 int);
CREATE TABLE
regression= grant select on atable to public;
GRANT
regression= \z atable
   Access privileges for database regression
 Schema |  Name  | Type  |   Access privileges
--------+--------+-------+--------------------------------
 public | atable | table | {alice=arwdRxt/alice,=r/alice}
(1 row)
 
regression= \c - postgres
You are now connected as new user postgres.
regression=# alter table atable owner to bob;
ALTER TABLE
regression=# \c - bob
You are now connected as new user bob.
regression= insert into atable values(1);
ERROR:  permission denied for relation atable

Bob hasn't got insert permissions on his own table ... the ACL says so.
Well, since Bob is now the owner he can fix that:

regression= grant all on atable to bob;
GRANT
regression= insert into atable values(1);
INSERT 154991 1

but he's not out of the woods yet.  The ACL now looks like this:

regression= \z atable
   Access privileges for database regression
 Schema |  Name  | Type  |   Access privileges
--------+--------+-------+------------------------------------------------
 public | atable | table | {alice=arwdRxt/alice,=r/alice,bob=arwdRxt/bob}
(1 row)

Alice still has all permissions, and PUBLIC still has select
permissions, and there isn't a darn thing Bob can do about it
because he didn't grant those permissions:
 
regression= revoke all on atable from alice;
REVOKE
regression= revoke all on atable from public;
REVOKE
regression= \z atable
   Access privileges for database regression
 Schema |  Name  | Type  |   Access privileges
--------+--------+-------+------------------------------------------------
 public | atable | table | {alice=arwdRxt/alice,=r/alice,bob=arwdRxt/bob}
(1 row)
 
Even more interesting, the superuser can't fix it either, at least not
without manual hacking of the ACL entry, because any GRANT/REVOKE the
superuser issues on the object will be treated as issued by Bob.
The *only* way to get rid of those rights is to persuade Alice to revoke
them.  (Or for the superuser to revert the ownership change, revoke the
rights as-if-Alice, and then give the table back to Bob.  Blech.)

ISTM that reasonable behavior for ALTER OWNER would include doing
surgery on the object's ACL to replace references to the old owner by
references to the new owner.  A simplistic approach would just be to do
that everywhere in both the grantor and grantee fields.  If there are
existing entries mentioning the new owner then this could produce
duplicate ACL entries, which would need to be merged together.

I think there are corner cases where the merging might produce
unintuitive results, but it couldn't be as spectacularly bad as
doing nothing is.
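
To illustrate, with the simplistic replace-everywhere rule the example above
would end up roughly like this (a sketch of the proposed behavior, not what
CVS tip does today):

    regression=# alter table atable owner to bob;
    ALTER TABLE
    regression=# \z atable
     Schema |  Name  | Type  |   Access privileges
    --------+--------+-------+--------------------------
     public | atable | table | {bob=arwdRxt/bob,=r/bob}
    (1 row)

Bob then holds exactly what Alice held, and the PUBLIC grant is his to
revoke.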

Comments?

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Bruce Momjian
Alvaro Herrera wrote:
 
 Yes, I did follow it (you mean XMAX_IS_XMIN, right?  I suppose no tuple
 can really have Cmin == Cmax).  I'm not claiming I understood it fully
 though.  But as you see, since the assumption is not valid we have to
 drop the idea and put back the Xmax as a field on its own on
 HeapTupleHeader (which is what I had done before Bruce persuaded me not
 to).  I don't really like this idea but I don't see other way out.
 
 A couple of days ago I was going to propose putting Xmax as a separate
 field only as needed, in a way similar to the way Oid is handled ---
 thus we would enlarge the tuple if and only if the creating transaction
 deletes it.  This would be nice because one would expect that there are
 not that many tuples created and deleted by the same transaction, so
 we'd localize the inefficiency of storing both fields (Cmin and Xmax)
 only on tuples that need it.  While I was writing the proposal I
 realised that it'd mean enlarging tuples that are already on disk, and
 there's no way we can do that.

I have read the archives and I think I understand the issue.  Before
subtransactions, the only transaction that could see and hence delete a
tuple created by an open transaction was the transaction itself, and to
keep the cmin and cmax, we created a separate tuple bit which indicated
the xmin and xmax were the same.

With subtransactions, other xids (subtransaction ids) can see and delete
tuples created by earlier parts of the main transaction, and the tuple
bit xmax=xmin doesn't work.

So, we need a way to record the xmin and xmax while keeping cmin and
cmax in the tuple header.  My idea is for subtransactions to create
additional xid's that represent the opposite of the commit state for
changing tuples created by earlier subtransactions.

BEGIN;  xid=1
INSERT a;
BEGIN;  xid=2
INSERT b;
DELETE a; xid=3
COMMIT;
COMMIT;

When DELETE a happens, we remove the xmin=1 from the tuple header and
replace it with xmin=3.  xid=3 will be marked as committed if xid2
aborts, and will be marked as aborted if xid2 commits.

So, if xid2 aborts, the insert of xid1 should be honored, and xid3 is
marked as committed, and the opposite if xid2 commits.

We would have to use pg_subtrans so these phantom xids could point to
the base xid and a list would have to be maintained so higher-level
subtransactions aborting would trigger changes in these phantom xids,
that is, if xid1 aborts, xid2 should abort as well.

Anyway, this is more of a sketch of a possible way to do this without
extending the tuple header for all transactions.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Alvaro Herrera
On Tue, Jun 01, 2004 at 06:40:07PM -0400, Bruce Momjian wrote:

 So, we need a way to record the xmin and xmax while keeping cmin and
 cmax in the tuple header.  My idea is for subtransactions to create
 additional xid's that represent the opposite of the commit state for
 changing tuples created by earlier subtransactions.

Hmm, interesting idea.  What seems more interesting is that the change
seems to be confined to HeapTupleHeaderSetXmax.  Every transaction and
subtransaction will need two Xids (I think we can even optimize it so
the abort xid is taken only as needed).

I don't see anything immediately that would invalidate this idea.  I'll
marinate it while I write the trigger stuff, and wait for other
comments.  If nothing bad arises I'll try an implementation and report
back.

Thanks,

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
I will never trust a traitor.  Not even if the traitor is one I created myself
(Barón Vladimir Harkonnen)


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
   BEGIN;  xid=1
   INSERT a;
   BEGIN;  xid=2
   INSERT b;
   DELETE a; xid=3
   COMMIT;
   COMMIT;

 When DELETE a happens, we remove the xmin=1 from the tuple header and
 replace it with xmin=3.

You can't change xmin --- this would break visibility tests.  Consider
a cursor opened in the outer transaction after the INSERT a.  It should
be able to see the a row (note that this depends on recognizing xid
equality and then comparing cid's within the outer transaction).  If the
subtransaction mangles xmin then it is no longer possible to make this
test correctly.

This is exactly the same argument as not being able to overwrite cmin.
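
Spelled out (the table name is illustrative, and the nested BEGIN is just
the notation used earlier in this thread for a subtransaction):

    BEGIN;                                  -- outer xact, xid=1
    INSERT INTO t VALUES ('a');
    DECLARE c CURSOR FOR SELECT * FROM t;
    BEGIN;                                  -- subxact, xid=2
    DELETE FROM t;                          -- would overwrite xmin with the
                                            -- phantom xid=3
    COMMIT;
    FETCH ALL FROM c;                       -- must still return 'a', but with
                                            -- the original xmin gone the
                                            -- xid-equality-plus-cmin test
                                            -- cannot be made
    COMMIT;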

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Converting postgresql.conf parameters to kilobytes

2004-06-01 Thread Neil Conway
Tom Lane wrote:
ISTM we had decided that putting vast amounts of documentation into the
file comments was exactly the thing *not* to do.  It duplicates the SGML
documentation, thereby doubling the maintenance effort, to very little
purpose.
I agree. If people really think that adding more comments to 
pg_hba.conf is a good idea, I think the best way to do that is to 
automatically extract that information from the main SGML docs.

-Neil
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [pgsql-hackers-win32] [HACKERS] select like...not using index

2004-06-01 Thread pgsql
 It looks to me like you have an index of type bpchar but are searching
 with type text. I find type conversions very limited with LIKE.

 I would create an index on 'vin_no' using a cast to TEXT. This should work
 on both queries.

 Not in this case.  Just to be sure, I created a new column as text type,
 created index, analyzed, and searched and got the same behavior.

Hmmm, snipped from your reply was the explain plan from the query where it
was clear you were using two different character data types: bpchar and
text. That, alone, may have been a problem.
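
For reference, the cast-to-text index suggested earlier would look something
like this on 7.4 (the table name here is made up; vin_no is from your
report).  If the locale turns out not to be C, the text_pattern_ops opclass
is the usual way to get LIKE to use an index:

    CREATE INDEX cars_vin_text_idx ON cars ((vin_no::text));
    CREATE INDEX cars_vin_like_idx ON cars ((vin_no::text) text_pattern_ops);
    ANALYZE cars;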


Looking at your defaults, did you do:

initdb --locale=C somepath
?

I found that 'C' used to be the default at some point; now it seems initdb
wants to fish out what locale your system is using.

Personally, I think, if I do not specify a locale, I don't want a specific
locale. Period. I haven't been paying too close attention to the hackers
list to say when this happened, but it bit me a couple times.


 Furthermore, I did this:
 cpc=# show lc_collate;
  lc_collate
 
  C
 (1 row)

 cpc=# show lc_ctype;
  lc_ctype
 --
  C
 (1 row)

 followed by this:
 C:\postgres\pgsql\src\test\localepg_controldata
 [...]
 LC_COLLATE:   English_United States.1252
 LC_CTYPE: English_United States.1252


 At this point I'm about 90% sure I've turned up a locale-related
 bug...initdb warned me wrt the locale but psql is still reporting 'C'.
 Plus, my queries don't work where they used to about a week ago.  My
 next step is to initdb --locale=C to confirm this.  I've informed Magnus
 about this and he is looking into it.

 Merlin



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Persistent main memory Storage Manager

2004-06-01 Thread Tom Lane
Diego Montenegro [EMAIL PROTECTED] writes:
 Also, in released versions, MM.c is included but not used; does anyone
 know if it should work if we define STABLE_MEMORY_STORAGE, or does a
 lot of coding have to be done for it to work?

Actually, I removed that file a few days ago because I didn't see any
point in updating it to match the recent smgr API changes.  Seems very
unlikely that it'd still work, given that it hasn't been maintained for
so many years.

It's unclear to me why anyone would bother with it anyway.  Putting
$PGDATA on a RAM disk should have approximately the same effect.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Lamar Owen
On Tuesday 01 June 2004 16:08, Simon Riggs wrote:
 The June 1st date was first mentioned on list in mid-March (to me), but
 wasn't generally announced until May under a specific heading. If it was
 set in January, I was never knowingly party to that info.

Well, it should not have surprised anyone.  We have targeted June 1 as a beta 
freeze date for several versions, not just 7.5.  In fact, looking back 
through last year's pre-7.4 discussion, it's deja vu all over again

Please read the thread "Release cycle
length" ( http://archives.postgresql.org/pgsql-hackers/2003-11/msg00889.php )
and follow it through.  We're following the same track we did with 7.4.  Are 
we going to be a full year this time?  (4.5 months from freeze to release 
last time)

But, searching the archives, I could not find the date June 1 (except in
relation to the 7.4 freeze on 6/1/2003).

The closest to such a discussion would have been in the thread 
http://archives.postgresql.org/pgsql-hackers/2004-01/msg00273.php, at least 
that's all I could find.
-- 
Lamar Owen
Director of Information Technology
Pisgah Astronomical Research Institute
1 PARI Drive
Rosman, NC  28772
(828)862-5554
www.pari.edu

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[HACKERS] Participate to translation

2004-06-01 Thread m.jjoe
Hello,
I just saw that several documents in the PostgreSQL project are not translated.
So I want to participate in translating them from English to French; I am
just a French guy who has been using Linux for 8 years. It's time for me to give
some of my time to the Open Source Community.

So, if it's possible:
-- I want to start a new translation or continue an existing one, and
want to avoid duplicating work that someone else is already doing,

-- I am looking for other people who want to work with me on translating.
Best regards.
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[HACKERS] Restore live backup from postgres/data/base/* gives PHPPGADMIN errors, works fine otherwise

2004-06-01 Thread Jo Voordeckers
Hi there fellow admins,

One of our databases had to be restored from a backup tape. Since there
weren't any dumps made, all I had was a live backup (I know, I know) of
/var/lib/postgres/data/base/number. The number for the corresponding
database I managed to discover after some searching with strings and grep.
First of all: isn't there an easy way or a file to look up the number
for the corresponding databases??
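(My guess is that the number is simply the database's OID, so on a working
cluster something like

    SELECT oid, datname FROM pg_database;

would list the mapping -- but a pointer to the documented way would still be
nice.)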

Second question... what could have gone wrong while restoring just the
corresponding directory for the database? In the log files I only found
out that the database wasn't shut down properly (when I started
postgres again after the restore), which I assume is correct because it was
restored from a live system backup. One CLOG file turned out to be missing;
however, no database transactions are used in this database application.
Recovery was done successfully by the postmaster. The database is now up and
running. The website running from the content works fine again. All
records including BLOBS are there as well. In PSQL no strange warnings or
errors...

HOWEVER when I browse the DB with PHPPGADMIN (version 2.4.1-2) I get
all kinds of resultset errors.

For example: Warning: Supplied argument is not a valid PostgreSQL result
resource in /usr/share/phppgadmin/db_details.php on line 344

And this is ONLY for tables from the restored DB, the
other (non-restored) DB's give no PHPPGADMIN DB-errors.

Has anyone a clue what could be wrong? Or are there tools I can run to do
more consistency checks / rebuilds / whatever?

Postgres version: 7.2.1-2woody4

Thanks in advance!!

-- 
Kind regards,

Jo Voordeckers

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Andrew Dunstan

Lamar Owen wrote:
On Tuesday 01 June 2004 16:08, Simon Riggs wrote:
 

The June 1st date was first mentioned on list in mid-March (to me), but
wasn't generally announced until May under a specific heading. If it was
set in January, I was never knowingly party to that info.
   

Well, it should not have surprised anyone.  We have targeted June 1 as a beta 
freeze date for several versions, not just 7.5.  In fact, looking back 
through last year's pre-7.4 discussion, it's deja vu all over again

Here's what Tom said on 31 March:
--
Simon Riggs [EMAIL PROTECTED] writes:
[ expecting to finish PITR by early June ]
 

Is this all still OK for 7.5? (My attempts at cataloguing changes has
fallen by the wayside in concentrating on the more important task of
PITR.) Do we have a planned freeze month yet?
 

There's not really a plan at the moment, but I had June in the back of
my head as a good time; it looks to me like the Windows port will be
stable enough for beta in another month or two, and it'd be good if
PITR were ready to go by then.
--
That seems to indicate that at that stage, barely 2 months ago, the 
month was not definite, let alone the day.

I confess that as a newcomer I was not around before the 7.4 cycle, so 
saying that people should have known the freeze date because it is 
following past patterns doesn't help me much. Are people supposed to 
obtain this info by trawling mailing list archives years back, or by 
some sort of divine revelation? Other OS projects manage this whole 
process better, IMNSHO. I'm not trying to point fingers, but to get 
future improvement.

cheers
andrew


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[HACKERS] Why repalloc() != realloc() ?

2004-06-01 Thread Alvaro Herrera
Hackers,

Is there a reason why repalloc() does not behave the same as realloc?
realloc(NULL, size) behaves the same as malloc(size), and it seems
useful behavior -- I wonder why repalloc() chooses to Assert() against
this exact condition?

I assume this is because the NULL pointer would not know what context it
belongs to, but the obvious answer is CurrentMemoryContext just like
palloc() does.  So there must be another reason.

Can this behavior be changed?

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
To have more, you must desire less


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Converting postgresql.conf parameters to kilobytes

2004-06-01 Thread Bruce Momjian
Neil Conway wrote:
 Tom Lane wrote:
  ISTM we had decided that putting vast amounts of documentation into the
  file comments was exactly the thing *not* to do.  It duplicates the SGML
  documentation, thereby doubling the maintenance effort, to very little
  purpose.
 
 I agree. If people really think that adding more comments to 
 pg_hba.conf is a good idea, I think the best way to do that is to 
 automatically extract that information from the main SGML docs.

And if folks want more info in postgresql.conf, we can perhaps pull it
from the docs too, or at least the descriptions we added.

Also, it seems postgres --describe-config isn't working.  It outputs
nothing here.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  BEGIN;  xid=1
  INSERT a;
  BEGIN;  xid=2
  INSERT b;
  DELETE a; xid=3
  COMMIT;
  COMMIT;
 
  When DELETE a happens, we remove the xmin=1 from the tuple header and
  replace it with xmin=3.
 
 You can't change xmin --- this would break visibility tests.  Consider
 a cursor opened in the outer transaction after the INSERT a.  It should
 be able to see the a row (note that this depends on recognizing xid
 equality and then comparing cid's within the outer transaction).  If the
 subtransaction mangles xmin then it is no longer possible to make this
 test correctly.
 
 This is exactly the same argument as not being able to overwrite cmin.

Basically the phantom xid's are a shorthand for saying the tuple was
created by xid1 and deleted by xid2, both part of the same main
transaction.

A cursor looking at the rows has to recognize the xid is a phantom (via
pg_subtrans) and look up the creation xid.

Also, we will need a phantom xid for every xid1/xid2 pair.  You can't
just create one phantom xid per subtransaction because you must be able
to independently control commit/rollback of rows based on the status of the
insert transaction.

In this case:

BEGIN;
BEGIN;  xid=1
INSERT a;
BEGIN;  xid=2
INSERT b;
BEGIN;  xid=3
DELETE a; xid=4
DELETE b; xid=5
COMMIT;
COMMIT;
COMMIT;
COMMIT;

xid4 and xid5 have to be adjusted based on the status of xid1 and xid2.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Official Freeze Date for 7.5: July 1st, 2004

2004-06-01 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 That seems to indicate that at that stage, barely 2 months ago, the 
 month was not definite, let alone the day.
 
 I confess that as a newcomer I was not around before the 7.4 cycle, so 
 saying that people should have known the freeze date because it is 
 following past patterns doesn't help me much. Are people supposed to 
 obtain this info by trawling mailing list archives years back, or by 
 some sort of divine revelation? Other OS projects manage this whole 
 process better, IMNSHO. I'm not trying to point fingers, but to get 
 future improvement.

I sent this email on April 16th asking for a status on the big 7.5
features:

---

From pgman Fri Apr 16 14:22:42 2004
Subject: PITR, nested transactions, tablespaces, 2-phase commit:  Status request
To: PostgreSQL-development [EMAIL PROTECTED]
Date: Sat, 17 Apr 2004 21:16:20 -0400 (EDT)

Would folks report on the current status of these projects:

o nested transactions (Alvaro Herrera)
o tablespaces (Gavin Sherry)
o PITR (Simon Riggs)
o 2-phase commit (Heikki Linnakangas)

---

I got no replies, except for Simon, I think.  Without replies, it is
very hard for us to adjust feature freeze timing for these features.

I should add I have been proposing a longer development period for a
long time because our features are getting more complex.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Why repalloc() != realloc() ?

2004-06-01 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Is there a reason why repalloc() does not behave the same as realloc?
 realloc(NULL, size) behaves the same as malloc(size), and it seems
 useful behavior -- I wonder why repalloc() chooses to Assert() against
 this exact condition?

We don't allow palloc(0) either, and we don't return NULL on failure,
and for that matter we don't allow pfree(NULL).  Please don't argue that
we ought to be just like libc.

 I assume this is because the NULL pointer would not know what context it
 belongs to,

That's a sufficient reason from my point of view.  One of the main
properties of repalloc is that the alloc'd memory stays in the same
context it was first allocated in.  I'm not excited about allowing
a special exception that makes that behavior less predictable.

To give a concrete example of why this sort of corner-case exception is
bad, imagine an aggregate function or similar thing that concatenates
strings.  It starts with a palloc() in a specific context and then
assumes that repalloc's will stay in that context without prodding.
The code works fine --- unless there are corner cases for palloc(0)
returning NULL and repalloc() accepting NULL.

 Can this behavior be changed?

Not without a significantly better argument than you've offered.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 You can't change xmin --- this would break visibility tests.

 Basically the phantom xid's are a shorthand for saying the tuple was
 created by xid1 and deleted by xid2, both part of the same main
 transaction.

That would be fine if the shorthand were readable, but it's not.

 A cursor looking at the rows has to recognize the xid is a phantom (via
 pg_subtrans) and look up the creation xid.

And it will find that how?  Imagine that the creating transaction is
itself a subtransaction, and the deleting one is a few nesting levels
further down.  I don't see how the tuple is going to carry enough
information to let you determine what's what, if the deleting subxact
overwrites the creating one's XID.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  You can't change xmin --- this would break visibility tests.
 
  Basically the phantom xid's are a shorthand for saying the tuple was
  created by xid1 and deleted by xid2, both part of the same main
  transaction.
 
 That would be fine if the shorthand were readable, but it's not.
 
  A cursor looking at the rows has to recognize the xid is a phantom (via
  pg_subtrans) and look up the creation xid.
 
 And it will find that how?  Imagine that the creating transaction is
 itself a subtransaction, and the deleting one is a few nesting levels
 further down.  I don't see how the tuple is going to carry enough
 information to let you determine what's what, if the deleting subxact
 overwrites the creating one's XID.

The backend that created _and_ expired the tuple has to do a lookup to
find the creation or expire xid.

We need two things.  First, we need to control the visibility of the
tuple once the entire transaction is done.  This does that. 

Second, we need to be able to find the creation and expire xid, and that
information is only required to be visible to the main transaction and
its subtransactions.  On commit we can adjust these xids' status to show
the proper visibility.

How do they do the lookup?  Well, one idea would be to just create a
local backend hash of these xids and their creation/expire xids. 
Another idea is that pg_subtrans already points to a parent xid.  We
could use the same method and point to both creation and expire xids.

Why does this not work?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Alvaro Herrera
On Tue, Jun 01, 2004 at 11:17:40PM -0400, Bruce Momjian wrote:

 Basically the phantom xid's are a shorthand for saying the tuple was
 created by xid1 and deleted by xid2, both part of the same main
 transaction.

Hmm... this would spread the ugliness elsewhere (hopefully only
HeapTupleHeaderGetXmin).


 A cursor looking at the rows has to recognize the xid is a phantom (via
 pg_subtrans) and look up the creation xid.

No need to look at pg_subtrans because we know all the Xids of our
transaction tree.  I think this can be kept in local memory.


 Also, we will need a phantom xid for every xid1/xid2 pair.  You can't
 just create one phantom xid per subtransaction because you must be able
  to independently control commit/rollback of rows based on the status of the
 insert transaction.

Oh, sure.  This could get huge pretty fast.

We still need to think about the effects this could have on crash recovery
though -- we'd have to write the phantom Xids to Xlog somehow
(indicating which ones are committed and which are aborted).  And we
still don't know what effect it would have on CPU cost for every
visibility check.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
The gods do not protect fools.  Fools receive protection from
other, better endowed fools (Luis Wu, Ringworld)


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Why repalloc() != realloc() ?

2004-06-01 Thread Alvaro Herrera
On Tue, Jun 01, 2004 at 11:39:57PM -0400, Tom Lane wrote:

  I assume this is because the NULL pointer would not know what context it
  belongs to,
 
 That's a sufficient reason from my point of view.

Right, you've convinced me.  Just wanted to know if I could save three
lines of code.  Probably not a compelling reason to change the behavior.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
The realist knows what he wants; the idealist wants what he knows (Anonymous)


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Nested transactions and tuple header info

2004-06-01 Thread Bruce Momjian
Alvaro Herrera wrote:
  Also, we will need a phantom xid for every xid1/xid2 pair.  You can't
  just create one phantom xid per subtransaction because you must be able
  to control independently commit/rollback rows based on the status of the
  insert transaction.
 
 Oh, sure.  This could get huge pretty fast.
 
 We still need to think on the effects this could have on crash recovery
 though -- we'd have to write the phantom Xids to Xlog somehow
 (indicating which ones are committed and which are aborted).  And we
 still don't know what effect it would have on CPU cost for every
 visibility check.

As I understand it, this overhead would only be needed for subtransactions.
I also don't think there will be a lot of them because it is only for
creation/expire in the same main transaction, and it is only needed for
unique creation/expire combinations, the number of which should be pretty small.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]