Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-22 Thread Heikki Linnakangas
ph...@apra.asso.fr wrote:
 Hi Jeff,
 If you can help (either benchmark work or C coding), try reviving the
 features by testing them and merging them with the current tree.
 OK, that's the rule of the game in such a community.
 I am not a good C programmer, but I will see what I can do.

The FSM rewrite in 8.4 opened up more options for implementing this. The
patch used to check the index for the block the nearest key is stored
in, read that page in, and insert there if there's enough free space on
it. With the new FSM, you can check how much space there is on that
particular page before fetching it. And if it's full, the new FSM data
structure can be searched for a page with enough free space as close as
possible to the old page, although there's no interface to do that yet.
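
As a rough illustration of the lookup described above, here is a minimal
Python sketch. It is a toy model, not PostgreSQL code: the real FSM is a
tree of coarse free-space categories, whereas this assumes a flat list of
free bytes per block. The name choose_insert_block and its parameters are
hypothetical and exist only for this example.

```python
from typing import Optional, Sequence


def choose_insert_block(free_space: Sequence[int],
                        target_block: int,
                        needed: int) -> Optional[int]:
    """Pick a block for a new tuple without reading any heap page.

    free_space[i] is the (assumed) number of free bytes recorded for
    block i. The target block is tried first; if it is full, blocks
    are probed at increasing distance from it.
    """
    n = len(free_space)
    if not 0 <= target_block < n:
        raise ValueError("target_block out of range")
    for distance in range(n):
        lower = target_block - distance
        upper = target_block + distance
        # At distance 0 this is the target block itself.
        if lower >= 0 and free_space[lower] >= needed:
            return lower
        if distance > 0 and upper < n and free_space[upper] >= needed:
            return upper
    return None  # no block has room; the caller would extend the table


if __name__ == "__main__":
    fsm = [0, 100, 0, 0, 500, 0]
    print(choose_insert_block(fsm, target_block=3, needed=200))
```

On a tie in distance, this sketch prefers the lower-numbered block; that
is an arbitrary choice made for the example.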

A completely different line of attack would be to write a daemon that
concurrently moves tuples in order to keep the table clustered. It would
interfere with UPDATEs and DELETEs, and ctids of the tuples would
change, but for many use cases it would be just fine. We discussed a
utility like that as a replacement for VACUUM FULL on hackers a while
ago; see the thread "Feedback on getting rid of VACUUM FULL". A similar
approach would work here; the logic for deciding which tuples to move
and where would just be different.
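
To make the "which tuples to move and where" part concrete, here is a
minimal Python sketch of one possible planning rule. It is only an
assumption for illustration: blocks are modelled as lists of integer
keys, the ideal block of a tuple is taken to be its rank in key order
divided by a fixed number of tuples per block, and locking, visibility,
and free space are ignored entirely. The name plan_moves and its
parameters are hypothetical.

```python
from typing import List, Tuple


def plan_moves(blocks: List[List[int]],
               tuples_per_block: int,
               tolerance: int = 0) -> List[Tuple[int, int, int]]:
    """Return (key, current_block, ideal_block) for misplaced tuples.

    A tuple counts as misplaced when its current block is more than
    `tolerance` blocks away from the block it would occupy in a
    densely packed, fully clustered table.
    """
    placed = [(key, blk) for blk, keys in enumerate(blocks) for key in keys]
    placed.sort()  # order by key, then by current block
    moves = []
    for rank, (key, current) in enumerate(placed):
        ideal = rank // tuples_per_block
        if abs(current - ideal) > tolerance:
            moves.append((key, current, ideal))
    return moves


if __name__ == "__main__":
    table = [[1, 2, 9], [4, 5, 6], [7, 8, 3]]
    for key, src, dst in plan_moves(table, tuples_per_block=3):
        print(f"move key {key}: block {src} -> block {dst}")
```

A non-zero tolerance corresponds to keeping the table only mostly
clustered, so the daemon would leave nearly-in-place tuples alone.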

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-21 Thread ph...@apra.asso.fr
Hi Jeff,

 Hi all,
 
 The current discussion about "Indexes on low cardinality columns" led
 me to discover the "grouped index tuples" patch
 (http://community.enterprisedb.com/git/) and its associated
 "maintain cluster order" patch
 (http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch).
 
 This last patch seems to cover the TODO item named "Automatically
 maintain clustering on a table".

The TODO item isn't clear about whether the order should be strictly
maintained, or whether it should just make an effort to keep the table
mostly clustered. The patch mentioned above makes an effort, but does
not guarantee cluster order.

You are right, there are two different visions: a strictly maintained order or a
possibly maintained order.
The latter is already a good enhancement, as it greatly lengthens the time
interval between two CLUSTER operations, in particular if the FILLFACTOR is
properly set. In terms of performance, having 99% of rows in the right page is
not really worse than having totally optimized storage.
The only benefit of a strictly maintained order is that there is no need for
CLUSTER at all, which could be very interesting for very large databases with a
round-the-clock access constraint.
For our needs, the possibly maintained order is enough.

 As this patch is not so new (2007), I would like to know why it has
 not yet been integrated into a standard version of PG (not well
 finalized? not totally sure? not corresponding to the way the core
 team would like to address this item?) and whether there is a good
 chance of seeing it committed in the near future.

Search the archives on -hackers for discussion. I don't think either of
these features was rejected, but some of the work and benchmarking has
not been completed.
OK, I will have a look.

If you can help (either benchmark work or C coding), try reviving the
features by testing them and merging them with the current tree.
OK, that's the rule of the game in such a community.
I am not a good C programmer, but I will see what I can do.

I recommend reading the discussion first, to see if there are any major
problems.


Personally, I'd like to see the GIT feature finished as well. When I
have time, I plan to take a look at it.

Regards,
   Jeff Davis





[PERFORM] maintain_cluster_order_v5.patch

2009-10-19 Thread ph...@apra.asso.fr
Hi all,

The current discussion about "Indexes on low cardinality columns" led me to
discover the "grouped index tuples" patch
(http://community.enterprisedb.com/git/) and its associated
"maintain cluster order" patch
(http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch).

This last patch seems to cover the TODO item named "Automatically maintain
clustering on a table".
As this patch is not so new (2007), I would like to know why it has not yet
been integrated into a standard version of PG (not well finalized? not totally
sure? not corresponding to the way the core team would like to address this
item?) and whether there is a good chance of seeing it committed in the near
future.

I currently work for a large customer who is migrating a lot of databases used
by an application that benefits greatly from well-clustered tables, especially
for batch processing. The migration brings a lot of benefits. In fact, the only
regression compared to the old RDBMS is that the tables' level of organisation
degrades more quickly, requiring more frequent heavy CLUSTER operations.

So this maintain cluster order patch (and maybe GIT as well) should fill the
gap. But straying from standard PG is not very attractive...

Regards. 
Philippe Beaudoin.







Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-19 Thread Jeff Davis
On Mon, 2009-10-19 at 21:32 +0200, ph...@apra.asso.fr wrote:
 Hi all,
 
 The current discussion about "Indexes on low cardinality columns" led
 me to discover the "grouped index tuples" patch
 (http://community.enterprisedb.com/git/) and its associated
 "maintain cluster order" patch
 (http://community.enterprisedb.com/git/maintain_cluster_order_v5.patch).
 
 This last patch seems to cover the TODO item named "Automatically
 maintain clustering on a table".

The TODO item isn't clear about whether the order should be strictly
maintained, or whether it should just make an effort to keep the table
mostly clustered. The patch mentioned above makes an effort, but does
not guarantee cluster order.

 As this patch is not so new (2007), I would like to know why it has
 not yet been integrated into a standard version of PG (not well
 finalized? not totally sure? not corresponding to the way the core
 team would like to address this item?) and whether there is a good
 chance of seeing it committed in the near future.

Search the archives on -hackers for discussion. I don't think either of
these features was rejected, but some of the work and benchmarking has
not been completed.

If you can help (either benchmark work or C coding), try reviving the
features by testing them and merging them with the current tree. I
recommend reading the discussion first, to see if there are any major
problems.

Personally, I'd like to see the GIT feature finished as well. When I
have time, I plan to take a look at it.

Regards,
Jeff Davis

