Re: [HACKERS] An idle thought

2010-03-22 Thread Bruce Momjian
Jeff Davis wrote:
 On Tue, 2010-03-16 at 15:29 +, Greg Stark wrote:
  I'm picturing storing a bit in the visibility map indicating that *no*
  records are visible in a given page.
 
 I've been thinking for a while that we could store the visibility
 information in a structure separate from the heap -- sort of like the
 visibility map, but per-tuple and authoritative rather than a per-page
 hint.
 
 There are all kinds of challenges there, but it might be worth thinking
 about. Visibility information is highly compressible, and requires
 constant maintenance (updates, deletes, freezing, etc.). It also might
 make it possible to move to 64-bit xids, if we wanted to.

I don't think we want to move to 64-bit xids becasue we would still need
to do vacuum freeze to trim the clog.  In fact we do vacuum freeze much
more frequently than required for 32-bit xids for this very reason.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-22 Thread Jeff Davis
On Mon, 2010-03-22 at 16:48 -0400, Bruce Momjian wrote:
 I don't think we want to move to 64-bit xids becasue we would still need
 to do vacuum freeze to trim the clog.  In fact we do vacuum freeze much
 more frequently than required for 32-bit xids for this very reason.

Good point. I think there are a lot of issues like this, which would
make storing the visibility information separate from the heap a huge
undertaking. Still worth thinking about, though.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-19 Thread Gokulakannan Somasundaram
 Surely the VM is already update-friendly. If you update a tuple in a
 page with the visibility bit set, the bit must be unset or you will get
 wrong results.



I was referring in the context of index only scans to skip visibility
checks. I doubt, whether the visibility map feature to skip visibility
checks at the heap can be created without any extra cost to updates/inserts.
When a data is compressed then there is more contention for the same block
and hence would likely affect DMLs. I hope that's what Tom was also
referring to, but not in the visibility map context.

Gokul.


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
On Thu, Mar 18, 2010 at 2:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Davis pg...@j-davis.com writes:
  There are all kinds of challenges there, but it might be worth thinking
  about. Visibility information is highly compressible, and requires
  constant maintenance (updates, deletes, freezing, etc.). It also might
  make it possible to move to 64-bit xids, if we wanted to.

 If you want it to be cheaply updatable (or even cheaply readable),
 compression is not what you're going to do.

regards, tom lane

 +1..


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram

  Secondly there's the whole retail vacuum problem -- any
 index entries referring to this page would be left dangling unless
 there's some kind of retail vacuum or perhaps a page version number.


The issue, we can divide into two
a)volatile functions
b)broken datatypes

For a) I think volatile function issue can be resolved by using hidden
columns in the heap itself. This will create a duplication of data, but
since the index will get created on it, it will always point to the right
heap tuple

For b) We are already suffering with this issue in any index lookups, index
based updates/deletes, unique constraints, referential integrity maintenance
etc. So hopefully one day we can extend this list to include more :))

Gokul.


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram


 I didn't mean that we'd want to compress it to the absolute minimum
 size. I had envisioned that it would be a simple scheme designed only to
 eliminate long runs of identical visibility information (perhaps only
 the frozen and always visible regions would be compressed).

 The extra level of indirection would be slower, but if we freeze tuples
 more aggressively (which would be much cheaper if we didn't have to go
 to the heap), there might be a small number of tuples with interesting
 visibility information at any particular time.


 This should be achievable with current proposal of Heikki, but I think it
is useful only for tables which won't have more concurrent operations and on
databases without any long running queries. So if we have an option to
create a visibiliy map ( on the lines of materialized view ), whenever we
want for a table, it would help a good number of use cases, i suppose.

Thanks,
Gokul.


Re: [HACKERS] An idle thought

2010-03-18 Thread Chris Browne
si...@2ndquadrant.com (Simon Riggs) writes:
 On Tue, 2010-03-16 at 15:29 +, Greg Stark wrote:

 big batch delete

 Is one of the reasons for partitioning, allowing the use of truncate.

Sure, but it would be even nicer if DELETE could be thus made cheaper
without needing to interfere with the schema.

The concurrency issue might be resolved (*might!*) by the following
complication...

 - A delete request is looking at a page, and concludes, oh, all the
   tuples here are now marked dead!.

 - It flags the page as *possibly* dead.  Almost what Greg suggests for
   the visibility map, but this is just marking it as proposed dead.

 - It throws the page number, along with xid, into a side map.

When something wants to do something with the page (e.g. - vacuum), it
sees that it's possibly dead, and looks at the side map for the list
of xids that wanted to mark the page dead.

for each xid:
   if xid is still active
   do nothing with it
   else 
   remove xid entry from the map

if all xids were failed
   remove flag from page
if any xid committed
   empty the page; the tuples are all dead

I'm less confident about that last clause - I *think* that if *any*
page-clearing XID is found, that means the page is well and truly clear,
doesn't it?

The extra map mayn't be a nice thing.

It's food for thought, anyways.
-- 
let name=cbbrowne and tld=linuxfinances.info in String.concat @ 
[name;tld];;
The real  problem with the  the year 2000  is that there are  too many
zero bits and that adversely affects the global bit density.
-- Boyd Roberts b...@france3.fr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 14:29 +0530, Gokulakannan Somasundaram wrote:

 If you want it to be cheaply updatable (or even cheaply
 readable),
 compression is not what you're going to do.
 
regards, tom lane
 
 
 
 +1.. 

The visibility map itself is already an example of compression. If
visibility information were randomly distributed among tuples, the
visibility map would be nearly useless.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
 The visibility map itself is already an example of compression. If
 visibility information were randomly distributed among tuples, the
 visibility map would be nearly useless.


 I believe it is very difficult to make visibility map update friendly
without compromising durability.  But such a functionality is very much
wanted in PG still.


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Fri, 2010-03-19 at 01:59 +0530, Gokulakannan Somasundaram wrote:
 
 The visibility map itself is already an example of
 compression. If
 visibility information were randomly distributed among tuples,
 the
 visibility map would be nearly useless.
 
 
 I believe it is very difficult to make visibility map update friendly
 without compromising durability.  But such a functionality is very
 much wanted in PG still.

Surely the VM is already update-friendly. If you update a tuple in a
page with the visibility bit set, the bit must be unset or you will get
wrong results.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Fri, 2010-03-19 at 01:59 +0530, Gokulakannan Somasundaram wrote:
 I believe it is very difficult to make visibility map update friendly
 without compromising durability.  But such a functionality is very
 much wanted in PG still.

 Surely the VM is already update-friendly. If you update a tuple in a
 page with the visibility bit set, the bit must be unset or you will get
 wrong results.

The VM is (a) not compressed and (b) not correctness-critical.
Wrong bit values don't do any serious damage.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 16:50 -0400, Tom Lane wrote:
 The VM is (a) not compressed and (b) not correctness-critical.
 Wrong bit values don't do any serious damage.

The VM cause wrong results if a bit is set that's not supposed to be --
right? Am I missing something? How does a seq scan skip visibility
checks and still produce right results, if it doesn't rely on the bit?

The visibility map would obviously not be very useful if visibility
information was randomly distributed among tuples. Whether that
qualifies as compression or not was not my primary point. The point is
that it may be possible to use some structure that is significantly
smaller than holding xmin/xmax for every tuple in the heap, and at the
same time may be acceptably fast to update.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Alex Hunsaker
On Thu, Mar 18, 2010 at 15:07, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2010-03-18 at 16:50 -0400, Tom Lane wrote:
 The VM is (a) not compressed and (b) not correctness-critical.
 Wrong bit values don't do any serious damage.

 The VM cause wrong results if a bit is set that's not supposed to be --
 right? Am I missing something? How does a seq scan skip visibility
 checks and still produce right results, if it doesn't rely on the bit?

Isn't it only really used for VACUUM at this point?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Thu, 2010-03-18 at 16:50 -0400, Tom Lane wrote:
 The VM is (a) not compressed and (b) not correctness-critical.
 Wrong bit values don't do any serious damage.

 The VM cause wrong results if a bit is set that's not supposed to be --
 right? Am I missing something? How does a seq scan skip visibility
 checks and still produce right results, if it doesn't rely on the bit?

It doesn't.  The only thing we currently rely on the VM for is deciding
whether a page needs vacuuming --- and even that we don't trust it for
when doing anti-wraparound vacuuming.  The worst-case consequence of a
wrong bit is failure to free some dead tuples until the vacuum freeze
limit expires.

In order to do things like not visiting a page during scans, we'll have
to solve the reliability issues.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Greg Stark
On Thu, Mar 18, 2010 at 9:07 PM, Jeff Davis pg...@j-davis.com wrote:
 The VM cause wrong results if a bit is set that's not supposed to be --
 right? Am I missing something? How does a seq scan skip visibility
 checks and still produce right results, if it doesn't rely on the bit?


There's also a PD_ALL_VISIBLE flag on the page header. We wal log when
we clear that bit so we can trust if it's set then all the tuples
really are visible. I forget whether we can trust it if it's *not* set
but there's not much point -- all tuples could become visible
spontaneously even the page is sitting on disk.
-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 17:17 -0400, Tom Lane wrote:
  The VM cause wrong results if a bit is set that's not supposed to be --
  right? Am I missing something? How does a seq scan skip visibility
  checks and still produce right results, if it doesn't rely on the bit?
 
 It doesn't.  The only thing we currently rely on the VM for is deciding
 whether a page needs vacuuming

Oh, my mistake. I misremembered the discussion and I thought the seq
scan optimization made it in.

 In order to do things like not visiting a page during scans, we'll have
 to solve the reliability issues.

Yeah, and also for the index-only scans.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Jeff Davis
On Thu, 2010-03-18 at 14:48 -0700, Jeff Davis wrote:
 On Thu, 2010-03-18 at 17:17 -0400, Tom Lane wrote:
   The VM cause wrong results if a bit is set that's not supposed to be --
   right? Am I missing something? How does a seq scan skip visibility
   checks and still produce right results, if it doesn't rely on the bit?
  
  It doesn't.  The only thing we currently rely on the VM for is deciding
  whether a page needs vacuuming
 
 Oh, my mistake. I misremembered the discussion and I thought the seq
 scan optimization made it in.
 

Hmm...

From heapgetpage() in heapam.c, CVS HEAD:
/*  

* If the all-visible flag indicates that all tuples on the page
are
* visible to everyone, we can skip the per-tuple visibility tests.
But 

I tested in gdb, and it calls HeapTupleSatisfiesMVCC, until I VACUUM a
few times, and then it doesn't call it any more. So, apparently the seq
scan optimization _is_ there. And that means it is correctness-critical.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-18 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 I tested in gdb, and it calls HeapTupleSatisfiesMVCC, until I VACUUM a
 few times, and then it doesn't call it any more. So, apparently the seq
 scan optimization _is_ there. And that means it is correctness-critical.

The page header bit is critical.  Not the VM.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-17 Thread Simon Riggs
On Tue, 2010-03-16 at 15:29 +, Greg Stark wrote:

 big batch delete

Is one of the reasons for partitioning, allowing the use of truncate.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-17 Thread Jeff Davis
On Tue, 2010-03-16 at 15:29 +, Greg Stark wrote:
 I'm picturing storing a bit in the visibility map indicating that *no*
 records are visible in a given page.

I've been thinking for a while that we could store the visibility
information in a structure separate from the heap -- sort of like the
visibility map, but per-tuple and authoritative rather than a per-page
hint.

There are all kinds of challenges there, but it might be worth thinking
about. Visibility information is highly compressible, and requires
constant maintenance (updates, deletes, freezing, etc.). It also might
make it possible to move to 64-bit xids, if we wanted to.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-17 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 There are all kinds of challenges there, but it might be worth thinking
 about. Visibility information is highly compressible, and requires
 constant maintenance (updates, deletes, freezing, etc.). It also might
 make it possible to move to 64-bit xids, if we wanted to.

If you want it to be cheaply updatable (or even cheaply readable),
compression is not what you're going to do.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-17 Thread Jeff Davis
On Wed, 2010-03-17 at 17:20 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  There are all kinds of challenges there, but it might be worth thinking
  about. Visibility information is highly compressible, and requires
  constant maintenance (updates, deletes, freezing, etc.). It also might
  make it possible to move to 64-bit xids, if we wanted to.
 
 If you want it to be cheaply updatable (or even cheaply readable),
 compression is not what you're going to do.

I didn't mean that we'd want to compress it to the absolute minimum
size. I had envisioned that it would be a simple scheme designed only to
eliminate long runs of identical visibility information (perhaps only
the frozen and always visible regions would be compressed).

The extra level of indirection would be slower, but if we freeze tuples
more aggressively (which would be much cheaper if we didn't have to go
to the heap), there might be a small number of tuples with interesting
visibility information at any particular time.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-17 Thread Simon Riggs
On Wed, 2010-03-17 at 14:09 -0700, Jeff Davis wrote:

 I've been thinking for a while that we could store the visibility
 information in a structure separate from the heap -- sort of like the
 visibility map, but per-tuple and authoritative rather than a per-page
 hint.

A lot of people have been thinking that for a while, but I've yet to see
a fully costed assessment of whether its worth doing. It might be.

It seems easier to focus on outcomes rather than specifics, then order
the possible solutions to those outcomes in cost-benefit sequence.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] An idle thought

2010-03-16 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 However then I started thinking about this case and wondered if it
 wouldn't be possible to optimize. One of the suggested optimizations
 was to look at using TRUNCATE. But I wonder why it's necessary to use
 a dedicated command. Shouldn't it be possible for the system to notice
 this situation and do effectively the same thing itself?

Not unless you'd like DELETE to always acquire exclusive lock...

 There are a couple problems with the way I've described this idea
 here.

Precisely because of the lack of lock.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers