[HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Royce Ausburn
Hi all,

I wonder, could the recent work on index only scans pave the way for auto 
clustered tables?  Consider a wide, mostly insert table with some subset of 
columns that I'd like to cluster on.  I'm after locality of tuples that are 
very frequently fetched together, but not keen on the downtime for a cluster, 
nor the maintenance that it requires.  Would it be a stretch to have an index 
that branches on the subset of cluster columns, but still stores all the 
columns, making it a covering index?  Given that we can already index 
concurrently, such an index would not require downtime, and would be self 
maintaining.  From my understanding of the index-only scan implementation, I 
suspect that such an index would effectively give locality, with some caveats… 

I'd expect the overhead of inserting in to such a table would be high, perhaps 
prohibitive.  Perhaps better ways have been discussed.  Stupid idea?

--Royce


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 7:08 AM, Royce Ausburn royce...@inomial.com wrote:
 I wonder, could the recent work on index only scans pave the way for auto 
 clustered tables?  Consider a wide, mostly insert table with some subset of 
 columns that I'd like to cluster on.  I'm after locality of tuples that are 
 very frequently fetched together, but not keen on the downtime for a cluster, 
 nor the maintenance that it requires.  Would it be a stretch to have an index 
 that branches on the subset of cluster columns, but still stores all the 
 columns, making it a covering index?  Given that we can already index 
 concurrently, such an index would not require downtime, and would be self 
 maintaining.  From my understanding of the index-only scan implementation, I 
 suspect that such an index would effectively give locality, with some caveats…

I guess we could do that, but I'm not convinced there would be much
benefit.  The only thing you'd be saving would be the cost of keeping
the tuples sorted by only the high-order columns rather than all of
them, and I doubt that's significant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 
 [implement clustered index as a covering index with all columns
 which are present in the heap]

 I guess we could do that, but I'm not convinced there would be
 much benefit. 
 
The traditional way to implement a clustered index is to have the
leaf level of the index contain the tuples rather than pointers to
the tuples.  If we're going to do clustered tables, we might want to
jump all the way to that, rather than a half-way solution which
stores everything twice.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 3:02 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Robert Haas robertmh...@gmail.com wrote:
 [implement clustered index as a covering index with all columns
 which are present in the heap]
 I guess we could do that, but I'm not convinced there would be
 much benefit.

 The traditional way to implement a clustered index is to have the
 leaf level of the index contain the tuples rather than pointers to
 the tuples.  If we're going to do clustered tables, we might want to
 jump all the way to that, rather than a half-way solution which
 stores everything twice.

Not a bad thought.

Actually, I've been thinking for a while now that we might need a
pluggable heapam, similar to the pluggable indexams we already have.
Our current heap format has served us pretty well, but there are any
number of things that we can't really do without changing it.  Of
course, if we came up with a new format that was better in every case,
across the board, then perhaps we'd be willing to just replace the
current format outright -- though even then, that would break
pg_upgrade, which would be painful, to put it mildly.  And it seems to
me that there could easily be format changes that would make sense for
particular cases, but not across the board, like:

- index-organized tables (heap is a btree, and secondary indexes
reference the PK rather than the TID; this is how MySQL does it, and
Oracle offers it as an option)
- WORM tables (no updates or deletes, and no inserts after creating
transaction commits, allowing a much smaller tuple header)
- non-transactional tables (tuples visible as soon as they're written,
again allowing for smaller tuple header; useful for internal stuff and
perhaps for insert-only log tables)

Alternatively, we could try to graft the concept of a self-clustering
table on top of the existing heap implementation.  But I'm having
trouble seeing how that would work.  The TODO describes it as
something like maintain CLUSTER ordering, but that's a gross
oversimplification, because we have no structure that would allow us
to sensibly do any such thing...  the current heap implementation is
just that: a pile of stuff.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Florian Pflug
On Oct11, 2011, at 21:27 , Robert Haas wrote:
 Alternatively, we could try to graft the concept of a self-clustering
 table on top of the existing heap implementation.  But I'm having
 trouble seeing how that would work.  The TODO describes it as
 something like maintain CLUSTER ordering, but that's a gross
 oversimplification, because we have no structure that would allow us
 to sensibly do any such thing...  the current heap implementation is
 just that: a pile of stuff.

We could still be smarter about where we insert new rows in a clustered
table, though.

Upon INSERT and UPDATE, we'd need to lookup the leaf page where the new
tuple will eventually go in the index we're supposed to maintain CLUSTER
for. Then we'd check if any of the pages referenced there contains enough
space, and if so place the new tuple there. If not it'd go at the end.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Kääriäinen Anssi
Robert Haas wrote:

And it seems to me that there could easily be format changes that
would make sense for particular cases, but not across the board,
like:

- index-organized tables (heap is a btree, and secondary indexes
reference the PK rather than the TID; this is how MySQL does it, and
Oracle offers it as an option)
- WORM tables (no updates or deletes, and no inserts after creating
transaction commits, allowing a much smaller tuple header)
- non-transactional tables (tuples visible as soon as they're written,
again allowing for smaller tuple header; useful for internal stuff and
perhaps for insert-only log tables)


This is probably a silly idea, but I have been wondering about the
following idea: Instead of having visibility info in the row header,
have a couple of row visibility slots in the page header. These slots
could be shared between rows in the page, so that if you do a bulk
insert/update/delete you would only use one slot. If the slots
overflow, you would use external slots buffer.

When the row is all visible, no slot would be used at all.

The xmin, xmax and cid would be in the slots. ctid would have its
current meaning, except when the external slots would be used,
then ctid would point to the external slot, and it would have the real
row header. I don't know if there would be any other row header
parts which could be shared.

The external slots buffer would then contain xmin, xmax, cid and
the real ctid.

Updates would write the new rows to another page in the heap,
and old rows would stay in place, just as now. So there would not
be any redo log like configuration. Also, the external slots buffer
would be small (18 bytes per row), so it would not get out of
cache too easily.

The performance would suck if you had lots of small updates, or
long running transactions. On the other hand in data warehousing,
where bulk loads are normal, and there are a lot of small rows,
this could actually work.

As said, this is probably a silly idea. But as pluggable heap types
came up, I thought to ask if this could actually work. If this kind of
wondering posts are inappropriate for this list, please tell me so
that I can avoid these in the future.

 - Anssi Kääriäinen

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Alternatively, we could try to graft the concept of a self-clustering
 table on top of the existing heap implementation.  But I'm having
 trouble seeing how that would work.  The TODO describes it as
 something like maintain CLUSTER ordering, but that's a gross
 oversimplification, because we have no structure that would allow us
 to sensibly do any such thing...  the current heap implementation is
 just that: a pile of stuff.

I currently think that's the way to go, with some coarser granularity
than tuple or page.  Picture HOT inserts, if you will.  That would be at
the page level, but do we need that level of precision?

I'm thinking that we need something more like segment based here, or
maybe some intermediate value would be good between a page of 8Kb and a
segment of 1GB, but I'm not so sure.  We would have to track the bounds
of each segment for the indexed columns, and maintain them, and the
planner would have to exercise pruning at the segment level.

So going down too much in granularity would have negative impacts on
planning performances (too many data to play with), and anyway a server
that needs that kind of optimization can certainly handle a couple of GB
in its file system cache.

So, it's quite hand wavy still, but Segment Exclusion has been discussed
here already, and it seems to me that's the next thing we need. Call it
partial Seq Scan and HOT inserts if new names are your thing :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 4:00 PM, Florian Pflug f...@phlo.org wrote:
 On Oct11, 2011, at 21:27 , Robert Haas wrote:
 Alternatively, we could try to graft the concept of a self-clustering
 table on top of the existing heap implementation.  But I'm having
 trouble seeing how that would work.  The TODO describes it as
 something like maintain CLUSTER ordering, but that's a gross
 oversimplification, because we have no structure that would allow us
 to sensibly do any such thing...  the current heap implementation is
 just that: a pile of stuff.

 We could still be smarter about where we insert new rows in a clustered
 table, though.

 Upon INSERT and UPDATE, we'd need to lookup the leaf page where the new
 tuple will eventually go in the index we're supposed to maintain CLUSTER
 for. Then we'd check if any of the pages referenced there contains enough
 space, and if so place the new tuple there. If not it'd go at the end.

That's an interesting idea, but my guess is that the if not clause
would trigger frequently enough to make this not work very well.

Of course, we'd need to test it to know for sure

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scan paving the way for auto clustered tables?

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 4:21 PM, Kääriäinen Anssi
anssi.kaariai...@thl.fi wrote:
 This is probably a silly idea, but I have been wondering about the
 following idea: Instead of having visibility info in the row header,
 have a couple of row visibility slots in the page header. These slots
 could be shared between rows in the page, so that if you do a bulk
 insert/update/delete you would only use one slot. If the slots
 overflow, you would use external slots buffer.

 When the row is all visible, no slot would be used at all.

I've thought about stuff like this, too.  One idea would be to make
the slots just be items on the page.  Each tuple includes a 2-byte
field which points to the item number (on the same page) where the
XMAX, CID, and CTID information for the update/delete of that tuple is
stored.  If the tuple isn't updated or deleted, it points back to the
tuple's own item ID.  When a tuple is updated or deleted, add an item
to the page with the necessary XMAX/CMAX/CTID and set the pointer to
the new item ID.  If the page is full, allocate an overflow block
and store the overflow block number in the page header.  Add the new
item to the overflow block and set the 2-byte field to point to it,
setting the high bit or something like that to indicate that the data
is in the overflow block rather than the data block.  The main
objection to this idea I see is that if the overflow blocks get much
use, the overall performance might end up being poor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers