Re: [HACKERS] vacuum, performance, and MVCC, and compression

2006-06-26 Thread PFC


There were some talks lately about compression.
	With a bit of lateral thinking I guess this can be used to contain the  
bloat induced by updates.

Of course this is just my hypothesis.

Compression in indexes :

	Instead of storing (value, tuple identifier) keys in the indexes, store  
(value, [tuple identifier list]) ; ie. all tuples which have the same  
indexed value are referenced by the same index tuple, instead of having  
one index tuple per actual tuple.
	The length of the list would of course be limited to the space actually  
available on an index page ; if many rows have the same indexed value,  
several index tuples would be generated so that index tuples fit on index  
pages.
	This would make the index smaller (more likely to fit in RAM) at the cost  
of a little CPU overhead for index modifications, but would make the index  
scans actually use less CPU (no need to compare the indexed value on each  
table tuple).


Compression in data pages :

	The article that circulated on the list suggested several types of  
compression, offset, dictionary, etc. The point is that several row  
versions on the same page can be compressed well because these versions  
probably have similar column values.


Just a thought...

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC, and compression

2006-06-26 Thread Bruce Momjian
PFC wrote:
 
   There were some talks lately about compression.
   With a bit of lateral thinking I guess this can be used to contain the  
 bloat induced by updates.
   Of course this is just my hypothesis.
 
   Compression in indexes :
 
   Instead of storing (value, tuple identifier) keys in the indexes, store 
  
 (value, [tuple identifier list]) ; ie. all tuples which have the same  
 indexed value are referenced by the same index tuple, instead of having  
 one index tuple per actual tuple.
   The length of the list would of course be limited to the space actually 
  
 available on an index page ; if many rows have the same indexed value,  
 several index tuples would be generated so that index tuples fit on index  
 pages.
   This would make the index smaller (more likely to fit in RAM) at the 
 cost  
 of a little CPU overhead for index modifications, but would make the index  
 scans actually use less CPU (no need to compare the indexed value on each  
 table tuple).

What about increasing the size of an existing index entry?  Can that be
done easily when a new row is added?

   Compression in data pages :
 
   The article that circulated on the list suggested several types of  
 compression, offset, dictionary, etc. The point is that several row  
 versions on the same page can be compressed well because these versions  
 probably have similar column values.
 
   Just a thought...

I would be worried about the overhead of doing that on compression and
decompression.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC, and compression

2006-06-26 Thread PFC



What about increasing the size of an existing index entry?  Can that be
done easily when a new row is added?


I'd say it looks pretty much like inserting a new index tuple...
Say value is the indexed column.

Find first page in the index featuring value.
1   If there is space on the page,
		add the tuple id to the list of the corresponding index entry (just like  
creating a new index tuple, but uses less space).

else
look at next page.
If next page has an index tuple with the same indexed value,
goto 1
else
insert new page and create an index tuple on it


I would be worried about the overhead of doing that on compression and
decompression.


	The compression methods mentioned in the article which was passed on the  
list seemed pretty fast. From IO-limited, the test database became  
CPU-limited (and a lot faster).



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match