Re: [HACKERS] string_to_array eats too much memory?

2006-11-09 Thread Michael Paesold

Tom Lane writes:

Tatsuo Ishii [EMAIL PROTECTED] writes:

string_to_array() consumes too much memory. For example, to make
~70k array elements, string_to_array seems to eat several gigabytes
of memory.


I'd argue that the problem comes from enlarging the work arrays only
64 elements at a time in accumArrayResult(). Most of the rest of the
code deals with resizing arrays using a "double it each time it has
to grow" approach; I wonder why this is different?


Without reading the code, I guess that simply means O(n^2) runtime. This 
should be fixed, then, right?
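The O(n^2) behaviour is easy to demonstrate by counting element copies under each growth policy. The following is a hypothetical counting sketch, not PostgreSQL source; the function names are invented:

```c
/* Count how many element copies each growth policy performs when
 * appending n items to an array that starts with room for 64. */
#include <assert.h>
#include <stddef.h>

/* Copies performed if the array grows by a fixed step of 64 elements. */
static size_t copies_fixed_step(size_t n)
{
    size_t cap = 64, copies = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == cap) {          /* array full: repalloc and copy i elements */
            copies += i;
            cap += 64;
        }
    }
    return copies;               /* grows like n^2/128, i.e. O(n^2) */
}

/* Copies performed if the capacity doubles each time it fills. */
static size_t copies_doubling(size_t n)
{
    size_t cap = 64, copies = 0;
    for (size_t i = 0; i < n; i++) {
        if (i == cap) {
            copies += i;
            cap *= 2;
        }
    }
    return copies;               /* always below 2n, i.e. O(n) amortized */
}
```

For ~70k elements the fixed-step policy performs tens of millions of element copies, while doubling performs fewer than 2n.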


Best Regards,
Michael Paesold


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Introducing an advanced Frequent Update Optimization

2006-11-09 Thread Andrew Sullivan
On Mon, Nov 06, 2006 at 09:50:53PM +, Simon Riggs wrote:
 - There are specific issues with the optimizer's ability to understand 
 dead row numbers, which can in some cases lead to SeqScan plans that are
 inappropriate when tables grow because of updates. This is a red-herring
 that can lead to people thinking the situation is worse than it is; that
 needs fixing, but the core issues mentioned above remain.

I don't disagree with much of what you say, but I'm slightly
concerned about the wave-away answer that you give here.  In my
experience on high-update tables -- especially ones with few rows,
but lots of open transactions over the lifetime of the
row -- accurate understanding of dead rows would be a _dramatic_
improvement (perhaps at least as significant as the improvement being
discussed).

That said, I'm not opposed to the line you're taking.  I just don't
want this problem to sink forever, because it's a big problem.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism. 
--Brad Holland



Re: [HACKERS] 8.2 Beta 3 Now Available for Download / Testing ...

2006-11-09 Thread Andrew Dunstan

Heikki Linnakangas wrote:

Andrew Dunstan wrote:

Marc G. Fournier wrote:



As a result of there being two *known* outstanding bugs, we have 
just bundled up a Beta3, to allow for testing of the recent patch 
concerning WAL replay ...



What are the bugs?


AFAIK:

1. Tuple freezing and hint bits were not WAL-logged, which could lead
to data corruption if the database crashed after VACUUM. See the recent
thread titled "WAL logging freezing".


2. B-tree page deletion bug, which could lead to VACUUM erroring with a
"failed to re-find parent key" message. See the thread "Nasty btree
deletion bug":
http://archives.postgresql.org/pgsql-hackers/2006-10/msg01373.php




I assume that we also want to fix the plperl/plperlu interaction issue, 
especially as it will result in changed behaviour on some platforms, at 
least.


I have asked a couple of people to look over and test the patch I posted 
on Sunday, but I have not had any reaction so far and feel slightly 
nervous about applying it. I will make some time to do more testing 
myself this weekend.


cheers

andrew



Re: [HACKERS] Introducing an advanced Frequent Update Optimization

2006-11-09 Thread Simon Riggs
On Thu, 2006-11-09 at 04:09 -0500, Andrew Sullivan wrote:
 On Mon, Nov 06, 2006 at 09:50:53PM +, Simon Riggs wrote:
  - There are specific issues with the optimizer's ability to understand 
  dead row numbers, which can in some cases lead to SeqScan plans that are
  inappropriate when tables grow because of updates. This is a red-herring
  that can lead to people thinking the situation is worse than it is; that
  needs fixing, but the core issues mentioned above remain.
 
 I don't disagree with much of what you say, but I'm slightly
 concerned about the wave-away answer that you give here.  In my
 experience on high-update tables -- especially ones with few rows,
 but lots of open transactions over the lifetime of the
 row -- accurate understanding of dead rows would be a _dramatic_
 improvement (perhaps at least as significant as the improvement being
 discussed).

Understood. I just wanted to make sure people understand that the
underlying problem would still be there even if we fix that.

 That said, I'm not opposed to the line you're taking.  I just don't
 want this problem to sink forever, because it's a big problem.

So, yeh, we should still fix that. The current prototype has a different
cost model for SeqScans as a result. Summary arriving anytime now.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





[HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Simon Riggs
Design Overview of HOT Updates
--

The objective is to increase the speed of the UPDATE case, while
minimizing the overall negative effects of the UPDATE. We refer to the
general requirement as *Frequent Update Optimization*, though this
design proposal is for Heap Overflow Tuple (HOT) Updates. It is similar
in some ways to the design for SITC already proposed, though it has a
number of additional features drawn from other designs to make it a
practical and effective implementation.

EnterpriseDB have a working, performant prototype of this design. There
are still a number of issues to resolve and the intention is to follow
an open community process to find the best way forward. All required
detail will be provided for the work conducted so far.

Current PostgreSQL behaviour is for an UPDATE to create a new tuple version
within the heap, so it acts from many perspectives as if it were an INSERT.
All of the tuple versions are chained together, so that whichever of the
tuples is visible to your Snapshot, you can walk the chain to find the
most recent tuple version to update.

The HOT proposal splits the heap into two areas: the main heap and an
overflow relation, both of which are regular, transactional relations.
INSERT and DELETE affect only the main heap, exactly as they do now.
UPDATE places new tuple versions into the overflow relation in most
cases, maintaining all of the current capabilities of the MVCC model:
all tuple versions remain accessible, plus the chain of updated tuple
versions is maintained. 

UPDATEs do not insert into indexes at all - no index pointers into the
overflow relation exist. So during a stream of UPDATEs the main
heap and indexes stay the same size, while the indexes are used
read-only, avoiding additional database writes and the need for eventual
index rebuilding.

The current tuple version within the main heap is referred to as the
root tuple from now on. When reading the main heap, if the root tuple
is not visible we walk the chain into the overflow relation until we
find a tuple version that is visible - we follow the t_ctid pointer from
tuple to tuple, testing for MVCC visibility as we go.
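The walk described above amounts to a linked-list traversal with an MVCC visibility test at each step. Here is a toy model with invented names and a deliberately simplified snapshot rule; the real visibility logic in PostgreSQL is far more involved:

```c
/* Toy model of walking a HOT-style version chain from the root tuple
 * until a version visible to the caller's snapshot is found. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct VersionTuple {
    int xmin;                   /* transaction id that created this version */
    int xmax;                   /* transaction id that superseded it, 0 if none */
    struct VersionTuple *next;  /* stand-in for the t_ctid link to the
                                 * next (older) version in the chain */
} VersionTuple;

/* Simplified visibility: created before the snapshot, and not yet
 * superseded as of the snapshot. */
static bool visible(const VersionTuple *t, int snapshot_xid)
{
    return t->xmin < snapshot_xid &&
           (t->xmax == 0 || t->xmax >= snapshot_xid);
}

/* Follow the chain from the root until a visible version is found,
 * or NULL if no version is visible to this snapshot. */
static const VersionTuple *fetch_visible(const VersionTuple *root, int snap)
{
    for (const VersionTuple *t = root; t != NULL; t = t->next)
        if (visible(t, snap))
            return t;
    return NULL;
}
```

The cost concern discussed later in the thread is exactly this loop: each `next` hop may be a random page fetch in the overflow relation.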

As more UPDATEs take place these tuple chains would grow, making
locating the latest tuple take progressively longer. Intuitively this
sounds like a bad design, though an additional feature turns that from
bad to good performance:

- when anybody walks the chain into the overflow relation, if we find
that the root tuple is vacuumable we copy back the first visible tuple
version over the now-dead root tuple.

This allows the length of a typical tuple chain to be extremely short in
practice. For a single connection issuing a stream of UPDATEs the chain
length will be no more than 1 at any time. Even under heavy concurrent
UPDATEs the modal chain length remains 1, with the chain length following
an approximately Poisson distribution.

The overflow relation is similar in concept to a TOAST table, so we
might describe this approach as TOAST-for-updated-versions. The code
implementation is similar also, with reasonably similar modularity. HOT
does sound radical, but no more so than TOAST was when first discussed.

We can locate any tuple version and thereby allow MVCC to work
correctly, while at the same time preserving the crucial property of the
Postgres non-overwriting storage manager. This works very well for
indexed tuple access since the index and main heap never grow, thus
maintaining access speeds. HOT supports both Read Committed and
Serializable transaction isolation levels, with transactions of any
length, just as with current MVCC.

Performance results against the previously described test cases are
approximately:

1. pgbench (TPC-B) - Around 200-300% better

2. DBT-2 (TPC-C) - Significant improvement - possibly 10% improvement,
somewhat difficult to measure exactly because of the way the test itself
works.

3. truckin - Around 500% improvement

The performance gain remains better than REL8_2 base even in the
presence of longer running transactions.

[There is also considerable synergy with changes to b-tree indexes that
are both useful in their own right, and even better when HOT is added,
more on that on a separate thread.]

We expect that people will want to measure this for themselves, so we
don't publish all of the detailed internal tests here since YMMV.

Using HOT
-

HOT can only work in cases where a tuple does not modify one of the
columns defined in an index on the table, and when we do not alter the
row length of the tuple. [We'll be able to do that more efficiently when
we have plan invalidation]

If we perform an update that meets the HOT criteria then we put the
new version into the overflow relation; we describe this as a HOT
UPDATE. If we perform an update that does not meet the criteria, then we
carry on with the existing/old MVCC behaviour; we describe this as a
non-HOT UPDATE.

So with HOT we must support both HOT and non-HOT UPDATEs. 
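The decision between the two paths can be sketched as a simple predicate. This is a toy model with invented names and a fixed four-column row; the actual patch would compare binary column values against real index metadata:

```c
/* Toy check of the two stated HOT criteria: no indexed column is
 * modified, and the row length is unchanged. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t len;            /* total row length in bytes */
    int    cols[4];        /* toy row of four integer columns */
} Row;

/* indexed[i] flags whether column i appears in any index on the table. */
static bool
update_is_hot(const Row *oldrow, const Row *newrow, const bool indexed[4])
{
    if (oldrow->len != newrow->len)
        return false;                  /* row length altered: non-HOT UPDATE */
    for (int i = 0; i < 4; i++)
        if (indexed[i] && oldrow->cols[i] != newrow->cols[i])
            return false;              /* indexed column modified: non-HOT */
    return true;                       /* qualifies for the HOT path */
}
```

Tom Lane's reply later in the thread suggests the length test is really a page-space question ("is there room on the page"), so the second criterion here should be read as a simplification of the prototype's rule.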

Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Martijn van Oosterhout
Nice idea, just one question:

On Thu, Nov 09, 2006 at 05:13:16PM +, Simon Riggs wrote:
 Behavioural Characteristics
 ---
 
 With HOT, it is easily possible that the chain of prior versions spans
 many blocks. The chain always starts with the block of the root tuple
 but possibly includes many overflow relation blocks.
 
 A SeqScan of a HOT table will turn into a large number of
 random accesses to the overflow relation, which could be considerably
 more expensive than sequential I/O. Clearly very large tables would not
 be able to be HOT enabled without additional cost, so we make HOT a
 table-level option: WITH (freqUPDATE = on)   [discuss...]

It seems to me that bitmap index scans will get these same
characteristics also, right? The bitmap scan will have to follow the
chain of any possibly matching tuple in any of the blocks that are in
the bitmap, right?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-09 Thread Simon Riggs
On Thu, 2006-11-09 at 18:49 +0100, Martijn van Oosterhout wrote:
 Nice idea, just one question:

 It seems to me that bitmap index scans will get these same
 characteristics also, right? The bitmap scan will have to follow the
 chain of any possibly matching tuple in any of the blocks that are in
 the bitmap, right?

Yes, they would identify the root tuples. The whole chain has matching
index values, by definition, so the re-evaluation for lossy bitmaps will
work just the same before the actual tuple is retrieved by walking the
chain (if required).

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Introducing an advanced Frequent Update Optimization

2006-11-09 Thread Simon Riggs
On Tue, 2006-11-07 at 15:00 +1300, Mark Kirkwood wrote:
 Simon Riggs wrote:
  EnterpriseDB has been running a research project to improve the
  performance of heavily updated tables. We have a number of approaches
  prototyped and we'd like to discuss the best of these now on -hackers
  for community input and patch submission to PostgreSQL core.
  
 
 Excellent! It would certainly be good for use cases like:
 
 - session data from web (or similar) applications
 - real time summary tables maintained by triggers
 
 to just work, as (certainly in the case of the first one) quite a few 
 folks have been bitten by exactly the issue you describe.

...and of course it would be good if LISTEN/NOTIFY were able to use this
concept also, to help Slony along also.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Josh Berkus
Simon,

 If we perform an update that meets the HOT criteria then we put the
 new version into the overflow relation; we describe this as a HOT
 UPDATE. If we perform an update that does not meet the criteria, then we
 carry on with the existing/old MVCC behaviour; we describe this as a
 non-HOT UPDATE.

Making the essential performance analysis question, "Am I HOT or Not?"

Sorry, but someone had to say it.  ;-)

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Introducing an advanced Frequent Update Optimization

2006-11-09 Thread Christopher Browne
Quoth [EMAIL PROTECTED] (Simon Riggs):
 On Tue, 2006-11-07 at 15:00 +1300, Mark Kirkwood wrote:
 Simon Riggs wrote:
  EnterpriseDB has been running a research project to improve the
  performance of heavily updated tables. We have a number of approaches
  prototyped and we'd like to discuss the best of these now on -hackers
  for community input and patch submission to PostgreSQL core.
  
 
 Excellent! It would certainly be good for use cases like:
 
 - session data from web (or similar) applications
 - real time summary tables maintained by triggers
 
 to just work, as (certainly in the case of the first one) quite a few 
 folks have been bitten by exactly the issue you describe.

 ...and of course it would be good if LISTEN/NOTIFY were able to use this
 concept also, to help Slony along also.

That should be much less relevant as people migrate to version 1.2, as
1.2 uses LISTEN/NOTIFY a whole lot less than earlier versions:

 1.  Number of events generated is cut in 1/2 because we don't
 generate NOTIFIES for confirmations anymore

 2.  When a thread is busy, it shuts off LISTEN, and polls.  That
 will cut pg_listener bloat further...  
-- 
let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];;
http://cbbrowne.com/info/internet.html
Ah,  fall  - when  leaves  turn  to  burnished colors  upon  darkling
branches,  collars are  turned  up  against a  wind  which murmurs  of
winter, and homework assignments appear on Usenet.  sigh
-- Bob Jarvis



Re: [HACKERS] plperl/plperlu interaction

2006-11-09 Thread Mark Dilger

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
Anyway, it is probably not expected by many users that loading a module 
in plperlu makes it available to plperl  - I was slightly surprised 
myself to see it work and I am probably more aware than most of perl and 
plperl subtleties.


I think that is a bug and needs to be fixed.  We have the precedent of
pltcl, which uses separate interpreters for pltcl and pltclu for exactly
this reason.


If this is fixed, what becomes the mechanism for an administrator to make a perl 
module available to plperl functions?  I didn't see any other way to do this 
documented.  Thanks,


mark



Re: [HACKERS] Frequent Update Project: Design Overview of HOTUpdates

2006-11-09 Thread Simon Riggs
On Thu, 2006-11-09 at 13:21 -0800, Josh Berkus wrote:
 Simon,
 
  If we perform an update that meets the HOT criteria then we put the
  new version into the overflow relation; we describe this as a HOT
  UPDATE. If we perform an update that does not meet the criteria, then we
  carry on with the existing/old MVCC behaviour; we describe this as a
  non-HOT UPDATE.
 
 Making the essential performance analysis question, "Am I HOT or Not?"

Very good. ;-)

Well, we had Overflow Update CHaining as an alternative name... :-)

The naming sounds silly, but we had a few alternate designs, so we
needed to be able to tell them apart sensibly. We've had TVR, SITC, UIP
and now HOT. Software research...

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Introducing an advanced Frequent Update Optimization

2006-11-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 ...and of course it would be good if LISTEN/NOTIFY were able to use this
 concept also, to help Slony along also.

LISTEN/NOTIFY are overdue to be rewritten to not use a table at all,
so I'm not particularly worried about whether this idea is applicable
to them.

regards, tom lane



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 As more UPDATEs take place these tuple chains would grow, making
 locating the latest tuple take progressively longer.

This is the part that bothers me --- particularly the random-access
nature of the search.  I wonder whether you couldn't do something
involving an initial table fill-factor of less than 50%, and having
the first updated version living on the same heap page as its parent.
Only when the active chain length was more than one (which you
hypothesize is rare) would it actually be necessary to do a random
access into the overflow table.

More generally, do we need an overflow table at all, rather than having
these overflow tuples living in the same file as the root tuples?  As
long as there's a bit available to mark a tuple as being this special
not-separately-indexed type, you don't need a special location to know
what it is.  This might break down in the presence of seqscans though.

Actually, you omitted to mention the locking aspects of moving tuples
around --- exactly how are you going to make that work without breaking
concurrent scans?

 This allows the length of a typical tuple chain to be extremely short in
 practice. For a single connection issuing a stream of UPDATEs the chain
 length will be no more than 1 at any time.

Only if there are no other transactions being held open, which makes
this claim a lot weaker.

 HOT can only work in cases where a tuple does not modify one of the
 columns defined in an index on the table, and when we do not alter the
 row length of the tuple.

Seems like altering the row length isn't the issue, it's just "is
there room on the page for the new version?".  Again, a generous
fillfactor would give you more flexibility.

 [We'll be able to do that more efficiently when
 we have plan invalidation]

Uh, what's that got to do with it?

regards, tom lane



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Josh Berkus
Tom,

 Actually, you omitted to mention the locking aspects of moving tuples
 around --- exactly how are you going to make that work without breaking
 concurrent scans?

I believe that's the unsolved technical issue in the prototype, unless 
Pavan has solved it in the last two weeks.   Pavan?

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



[HACKERS] Various breakages in new contrib/isn module

2006-11-09 Thread Tom Lane
It occurred to me to run the regression tests type_sanity and opr_sanity
over the contrib/isn code.  (The easy way to do this is to copy the isn
install script into the regress/sql directory and then add it to
serial_schedule just before those two tests.)  It turned up a couple
of moderately serious problems:

* There are a whole bunch of shell operators created; try
select oid::regoperator from pg_operator where oprcode = 0;
after loading isn.  I didn't track it down in detail, but it looked
like most or all of these come from dangling oprcom links, ie, there's
an operator that claims to have a commutator but you never supplied one.
This is very bad, because the planner *will* try to use those operators
given the right kind of query.

* There are hash opclasses for these datatypes but the corresponding
equality operators are not marked hashable.  This is not quite as bad,
but should be fixed.

Please submit a patch that fixes these.

Note to hackers: it might be worth trying the same thing with the other
contrib modules; I don't have time right now though.

regards, tom lane



Re: [HACKERS] plperl/plperlu interaction

2006-11-09 Thread Andrew Dunstan
Mark Dilger wrote:
 Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:
 Anyway, it is probably not expected by many users that loading a module
 in plperlu makes it available to plperl - I was slightly surprised
 myself to see it work and I am probably more aware than most of perl and
 plperl subtleties.
 I think that is a bug and needs to be fixed.  We have the precedent of
 pltcl, which uses separate interpreters for pltcl and pltclu for exactly
 this reason.

 If this is fixed, what becomes the mechanism for an administrator to
 make a perl module available to plperl functions?  I didn't see any
 other way to do this documented.  Thanks,

This isn't documented either :-)

I discovered this when I was working on a way of doing this nicely and
safely. I hope to have that for 8.3.

cheers

andrew







Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Pavan Deolasee
On 11/10/06, Josh Berkus josh@agliodbs.com wrote:
 Tom,

  Actually, you omitted to mention the locking aspects of moving tuples
  around --- exactly how are you going to make that work without breaking
  concurrent scans?

 I believe that's the unsolved technical issue in the prototype, unless
 Pavan has solved it in the last two weeks.  Pavan?

When an overflow tuple is copied back to the main heap, the overflow
tuple is marked with a special flag (HEAP_OVERFLOW_MOVEDBACK).
Subsequently, when a backend tries to lock the overflow version of the
tuple, it checks the flag and jumps to the main heap if the flag is set.

We use the same technique to update the correct version of a tuple when
a tuple is moved back to the main heap.

Regards,
Pavan


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 On 11/10/06, Josh Berkus josh@agliodbs.com wrote:
 I believe that's the unsolved technical issue in the prototype, unless
 Pavan has solved it in the last two weeks.   Pavan?
 
 When an overflow tuple is copied back to the main heap, the overflow tuple
 is
 marked with a special flag (HEAP_OVERFLOW_MOVEDBACK). Subsequently,
 when a backend tries to lock the overflow version of the tuple, it checks
 the flag
 and jumps to the main heap if the flag is set.

(1) How does it jump to the main heap?  The links go the other
direction.

(2) Isn't this full of race conditions?

(3) I thought you already used up the one remaining t_infomask bit.

regards, tom lane



Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Pavan Deolasee
On 11/10/06, Tom Lane [EMAIL PROTECTED] wrote:

 Pavan Deolasee [EMAIL PROTECTED] writes:
  On 11/10/06, Josh Berkus josh@agliodbs.com wrote:
   I believe that's the unsolved technical issue in the prototype, unless
   Pavan has solved it in the last two weeks.  Pavan?

  When an overflow tuple is copied back to the main heap, the overflow
  tuple is marked with a special flag (HEAP_OVERFLOW_MOVEDBACK).
  Subsequently, when a backend tries to lock the overflow version of the
  tuple, it checks the flag and jumps to the main heap if the flag is set.

 (1) How does it jump to the main heap?  The links go the other
 direction.

The overflow tuple has a special header to store the back pointer to the
main heap. This increases the tuple header size by 6 bytes, but the
overhead is restricted only to the overflow tuples.

 (2) Isn't this full of race conditions?

I agree, there could be race conditions. But IMO we can handle those.
When we follow the tuple chain, we hold a SHARE lock on the main heap
buffer. Also, when the root tuple is vacuumable and needs to be
overwritten, we acquire and keep an EXCLUSIVE lock on the main heap
buffer. This reduces the race conditions to a great extent.

 (3) I thought you already used up the one remaining t_infomask bit.

Yes. The last bit in the t_infomask is used up to mark the presence of
the overflow tuple header. But I believe there are a few more bits that
can be reused. There are three bits available in the t_ctid field as
well (since ip_posid needs at most 13 bits). One bit is used to identify
whether a given tid points to the main heap or the overflow heap. This
helps when tids are passed around in the code.

Since the back pointer from the overflow tuple always points to the main
heap, the same bit can be used to mark copied-back tuples (we are doing
it in a slightly different way in the current prototype though).

Regards,
Pavan
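The bit-reuse idea can be illustrated with a small packing sketch. This is hypothetical and all names are invented; note that Tom Lane's follow-up disputes whether these t_ctid bits are actually free, since BLCKSZ is not limited to 8K:

```c
/* Toy illustration: if an offset field only ever needs 13 bits, the top
 * bits of a 16-bit value could carry flags such as "this tid points into
 * the overflow heap". */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POSID_BITS    13
#define POSID_MASK    ((uint16_t)((1u << POSID_BITS) - 1))  /* 0x1FFF */
#define FLAG_OVERFLOW ((uint16_t)(1u << POSID_BITS))        /* overflow-heap flag */

/* Pack a 13-bit offset together with the overflow flag. */
static uint16_t pack_posid(uint16_t posid, bool overflow)
{
    assert(posid <= POSID_MASK);       /* must fit in 13 bits */
    return (uint16_t)(posid | (overflow ? FLAG_OVERFLOW : 0));
}

static uint16_t unpack_posid(uint16_t packed) { return packed & POSID_MASK; }
static bool     is_overflow(uint16_t packed)  { return (packed & FLAG_OVERFLOW) != 0; }
```

The 13-bit assumption holds only for the default 8K block size, which is exactly the objection raised in the next message.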




Re: [HACKERS] Frequent Update Project: Design Overview of HOT

2006-11-09 Thread Hannu Krosing
On Thu, 2006-11-09 at 18:28, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  As more UPDATEs take place these tuple chains would grow, making
  locating the latest tuple take progressively longer.
 
 This is the part that bothers me --- particularly the random-access
 nature of the search.  I wonder whether you couldn't do something
 involving an initial table fill-factor of less than 50%, and having
 the first updated version living on the same heap page as its parent.
 Only when the active chain length was more than one (which you
 hypothesize is rare) would it actually be necessary to do a random
 access into the overflow table.
 
 More generally, do we need an overflow table at all, rather than having
 these overflow tuples living in the same file as the root tuples?  As
 long as there's a bit available to mark a tuple as being this special
 not-separately-indexed type, you don't need a special location to know
 what it is.  This might break down in the presence of seqscans though.

And why do you need to mark it as not-separately-indexed at all?

We already cope with missing index pointers in VACUUM and I can't see
any other reason to have it.

What are the advantages of HOT over SITC (other than the cool name)?

Maybe just make HOT an extended SITC which can span pages.

In the case of HOT together with reusing index tuples with the DELETED
bit set, we don't actually need copy-back; the same index pointer will
follow the head of the live data automatically, maybe lagging by only a
small number of versions.

 Actually, you omitted to mention the locking aspects of moving tuples
 around --- exactly how are you going to make that work without breaking
 concurrent scans?
 
  This allows the length of a typical tuple chain to be extremely short in
  practice. For a single connection issuing a stream of UPDATEs the chain
  length will be no more than 1 at any time.
 
 Only if there are no other transactions being held open, which makes
 this claim a lot weaker.
 
  HOT can only work in cases where a tuple does not modify one of the
  columns defined in an index on the table, and when we do not alter the
  row length of the tuple.
 
 Seems like altering the row length isn't the issue, it's just "is
 there room on the page for the new version?".  Again, a generous
 fillfactor would give you more flexibility.

Maybe they hoped to take very light locks when the new chain head is
copied over the old one in the same-length case.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





[HACKERS] beta3 CFLAGS issue on openbsd

2006-11-09 Thread Jeremy Drake
I was trying to compile 8.2beta3 on openbsd, and ran into an interesting
issue.  My account on the particular openbsd box has some restrictive
ulimit settings, so I don't have a lot of memory to work with.  I was
getting an out of memory issue linking postgres, while I did not before.
I figured out that the -g flag was being surreptitiously added to my
CFLAGS.  It was like pulling teeth trying to get the -g flag out.  I tried
--disable-debug to configure, which did not work.  I had to do
CFLAGS=-O2 ./configure ...

Is this a known feature in the betas to get people running with -g in case
things break, or is this a configure bug, or expected?

Here is the first bit from configure, note the -g in the using CFLAGS line
at the end.

[EMAIL PROTECTED](~/build/postgres/postgresql-8.2beta3)$ ./configure 
--prefix=/home/jeremyd/progs/pg82 --with-perl --with-openssl --with-pgport=54322
checking build system type... x86_64-unknown-openbsd3.9
checking host system type... x86_64-unknown-openbsd3.9
checking which template to use... openbsd
checking whether to build with 64-bit integer date/time support... no
checking whether NLS is wanted... no
checking for default port number... 54322
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether cc accepts -g... yes
checking for cc option to accept ANSI C... none needed
checking if cc supports -Wdeclaration-after-statement... no
checking if cc supports -Wendif-labels... yes
checking if cc supports -fno-strict-aliasing... yes
configure: using CFLAGS=-O2 -g -pipe -Wall -Wmissing-prototypes -Wpointer-arith 
-Winline -Wendif-labels -fno-strict-aliasing


-- 
It's odd, and a little unsettling, to reflect upon the fact that
English is the only major language in which I is capitalized; in many
other languages You is capitalized and the i is lower case.
-- Sydney J. Harris

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates

2006-11-09 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 On 11/10/06, Tom Lane [EMAIL PROTECTED] wrote:
 (2) Isn't this full of race conditions?

 I agree, there could be race conditions. But IMO we can handle those.

Doubtless you can prevent races by introducing a bunch of additional
locking.  The question was really directed to how much concurrent
performance is left, once you get done locking it down.

 (3) I thought you already used up the one remaining t_infomask bit.

 Yes. The last bit in the t_infomask is used up to mark the presence of the
 overflow tuple header. But I believe there are a few more bits that can be
 reused. There are three bits available in the t_ctid field as well (since
 ip_posid needs at most 13 bits).

No, you cannot have those bits --- BLCKSZ is not limited to 8K, and even
if it were, we will not hold still for sticking flag bits into an
unrelated datatype.

You can probably fix this by inventing multiple context-dependent
interpretations of t_infomask bits, but that strikes me as a mighty
ugly and fragile way to proceed.

(Actually, the assumption that you can throw an additional back-pointer
into overflow tuple headers is the worst feature of this proposal in
that regard --- it's really not that easy to support multiple header
formats.)

regards, tom lane

---(end of broadcast)---


Re: [HACKERS] Frequent Update Project: Design Overview of HOT

2006-11-09 Thread Hannu Krosing
On Fri, 2006-11-10 at 09:06, Hannu Krosing wrote:
 On Thu, 2006-11-09 at 18:28, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   As more UPDATEs take place these tuple chains would grow, making
   locating the latest tuple take progressively longer.
  
  This is the part that bothers me --- particularly the random-access
  nature of the search.  I wonder whether you couldn't do something
  involving an initial table fill-factor of less than 50%, and having
  the first updated version living on the same heap page as its parent.
  Only when the active chain length was more than one (which you
  hypothesize is rare) would it actually be necessary to do a random
  access into the overflow table.
  
  More generally, do we need an overflow table at all, rather than having
  these overflow tuples living in the same file as the root tuples?  As
  long as there's a bit available to mark a tuple as being this special
  not-separately-indexed type, you don't need a special location to know
  what it is.  This might break down in the presence of seqscans though.
 
 And why do you need to mark it as not-separately-indexed at all ?
 
 We already cope with missing index pointers in VACUUM and I can't see
 any other reason to have it.

Ok, now I see it - we can't VACUUM a tuple if later versions of it are
accessible only through the t_ctid chain. That is, VACUUM must not free
tuples whose t_ctid points to a tuple that has the not-separately-indexed
bit set. This seems to make VACUUM quite complicated, as it has to examine
t_ctid chains to decide whether it can free a tuple, and what's even
worse, it has to examine these chains backwards.

 What are the advantages of HOT over SITC (other than the cool name)?

Still wondering about this - is it just the ability to span multiple pages?

 Maybe just make HOT an extended SITC which can span pages.
 
 With HOT, combined with reusing index tuples that have the DELETED bit set,
 we don't actually need copy-back: the same index pointer will follow
 the head of live data automatically, maybe lagging by only a small number
 of versions.

  Actually, you omitted to mention the locking aspects of moving tuples
  around --- exactly how are you going to make that work without breaking
  concurrent scans?
  
   This allows the length of a typical tuple chain to be extremely short in
   practice. For a single connection issuing a stream of UPDATEs the chain
   length will be no more than 1 at any time.
  
  Only if there are no other transactions being held open, which makes
  this claim a lot weaker.
  
   HOT can only work in cases where a tuple does not modify one of the
   columns defined in an index on the table, and when we do not alter the
   row length of the tuple.
  
  Seems like altering the row length isn't the issue, it's just "is
  there room on the page for the new version".  Again, a generous
  fillfactor would give you more flexibility.
 
 Maybe they hoped to take very light locks when the new chain head is copied
 over the old one in the same-length case.
 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---


Re: [HACKERS] beta3 CFLAGS issue on openbsd

2006-11-09 Thread Tom Lane
Jeremy Drake [EMAIL PROTECTED] writes:
 I figured out that the -g flag was being surreptitiously added to my
 CFLAGS.  It was like pulling teeth trying to get the -g flag out.

I believe that this is a default behavior of autoconf scripts.
I remember having done some ugly hacks years ago to prevent an autoconf
configure script from adding -g by default to libjpeg builds... and
the argument for not having -g has gotten ever weaker since then,
so I really doubt you'll get far complaining to the autoconf maintainers
about it.

regards, tom lane

---(end of broadcast)---