Re: Proposal: implement Postgres CLUSTER USING by default for MtM tables

2013-05-22 Thread Donald Stufft

On May 21, 2013, at 9:28 PM, bry...@obviously.com wrote:

> The tables created by django for MtM fields are unordered.  Loading all the 
> relationships may result in a number of I/O operations approaching the number 
> of relationships:
> 
> # select * from basetable;
>  id |basetable_id| mtm_id 
> ++--
>   6 |  3 |1
>   7 |  2 |2
>  15 |  1 |1
>  16 |  1 |2
>  18 |  3 |5
> 
> Postgres has a feature to 'defragment' such tables packing data that gets 
> accessed at the same time into the same disk blocks.  The number of I/O 
> operations can be as small as one:
> 
> # \pset pager off
> # cluster basetable using table_mem_id;
> # select * from basetable;
>  id |basetable_id| mtm_id 
> ++--
>  15 |  1 |1
>  16 |  1 |2
>   7 |  2 |2
>   6 |  3 |1
>  18 |  3 |5
> 
> On a large table this can put hundreds of MtM relationships sequentially on 
> disk.  This makes for less I/O operations and less cache space used.
> 
> 
> This feature is postgres specific.
> And it would involve django making a guess about the use pattern of the 
> relationship.  Either the forward or reverse is optimized.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>  
>  



What do you mean by "clustering by default". CLUSTER is a one time operation. 
You use to to arrange the orders of the row in a table but after they have been 
arranged it has no long lasting affects past that.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA



signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: Proposal: implement Postgres CLUSTER USING by default for MtM tables

2013-05-22 Thread Russell Keith-Magee
Hi Bryce,

An interesting idea. How are you proposing that this feature be exposed as
an API?

Is there any reason that this couldn't be handled as a DB-admin operation?
i.e., Django just does what it normally does, and if a DB admin notices
that table clustering is a problem, they apply the cluster command at the
database level. Given that this is a very low level optimisation, is there
any reason that this needs to be exposed as a Django-level API optimisation
option?

Yours,
Russ Magee %-)

On Wed, May 22, 2013 at 9:28 AM,  wrote:

> The tables created by django for MtM fields are unordered.  Loading all
> the relationships may result in a number of I/O operations approaching the
> number of relationships:
>
> # select * from basetable;
>  id |basetable_id| mtm_id
> ++--
>   6 |  3 |1
>   7 |  2 |2
>  15 |  1 |1
>  16 |  1 |2
>  18 |  3 |5
>
>
> Postgres has a feature to 'defragment' such tables packing data that gets
> accessed at the same time into the same disk blocks.  The number of I/O
> operations can be as small as one:
>
> # \pset pager off
> # cluster basetable using table_mem_id;
> # select * from basetable;
>
>  id |basetable_id| mtm_id
> ++--
>  15 |  1 |1
>  16 |  1 |2
>   7 |  2 |2
>   6 |  3 |1
>  18 |  3 |5
>
> On a large table this can put hundreds of MtM relationships sequentially
> on disk.  This makes for less I/O operations and less cache space used.
>
> 
> This feature is postgres specific.
> And it would involve django making a guess about the use pattern of the
> relationship.  Either the forward or reverse is optimized.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en
> .
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.




Proposal: implement Postgres CLUSTER USING by default for MtM tables

2013-05-21 Thread bryce2
The tables created by django for MtM fields are unordered.  Loading all the 
relationships may result in a number of I/O operations approaching the 
number of relationships:

# select * from basetable;
 id |basetable_id| mtm_id 
++--
  6 |  3 |1
  7 |  2 |2
 15 |  1 |1
 16 |  1 |2
 18 |  3 |5


Postgres has a feature to 'defragment' such tables packing data that gets 
accessed at the same time into the same disk blocks.  The number of I/O 
operations can be as small as one:

# \pset pager off
# cluster basetable using table_mem_id;
# select * from basetable;

 id |basetable_id| mtm_id 
++--
 15 |  1 |1
 16 |  1 |2
  7 |  2 |2
  6 |  3 |1
 18 |  3 |5

On a large table this can put hundreds of MtM relationships sequentially on 
disk.  This makes for less I/O operations and less cache space used.


This feature is postgres specific.
And it would involve django making a guess about the use pattern of the 
relationship.  Either the forward or reverse is optimized.

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.