Hello to all of you, I'm running into a performance problem when considering the following scenario: I have a fairly large table (1mio rows) related to other smaller tables (between 100 and 10000 rows) and would like to retrieve the joined data (through a view) in random order. In order to do so, the main table contains a 'Random' field (which is updated on a regular basis, in order to re-randomize the data set), on which an index is created:
***** * SCHEMA ***** CREATE TABLE tb_Small AS ( PK integer UNIQUE NOT NULL PRIMARY KEY (PK) ); CREATE TABLE tb_Large AS ( Random integer DEFAULT( CAST( 1000000*random() AS integer ) ) NOT NULL, FK_Small integer NOT NULL, PK integer UNIQUE NOT NULL, FOREIGN KEY (FK_Small) REFERENCES tb_Small (PK), PRIMARY KEY (PK) ); CREATE INDEX ix_Large__Random ON tb_Large (Random); CREATE TABLE tb_Foo AS ( FK_Large integer NOT NULL, PK integer UNIQUE NOT NULL, FOREIGN KEY (FK_Large) REFERENCES tb_Large (PK) DEFERRABLE, PRIMARY KEY (PK) ); CREATE VIEW vw_Large AS SELECT tb_Small.*, tb_Large.* FROM tb_Small INNER JOIN tb_Large ON ( tb_Small.PK = tb_Large.FK_Small ); NOTA BENE: My production view actually involves much more inner- or left-joined tables that this simple example Here are the various querying scenario and the related performance problem (my production view actually involves much more inner- or left-joined tables that the scenarios below, simplified for the sake of clarity) ***** * 1. ***** CREATE VIEW vw_Large AS SELECT * FROM tb_Small AS Small INNER JOIN tb_Large AS Large ON ( Small.PK = Large.FK_Small ); SELECT * FROM vw_Large ORDER BY Random LIMIT 50; -> The slowest way (~60 time units), since the entire view is evaluated before it is sorted properly (the index on the 'Random' being ignored) SELECT * FROM vw_Large WHERE Small.PK = <any> ORDER BY Random LIMIT 50; -> Quicker (~5 time units), since the evaluated view is smaller (cf. the WHERE clause) before it is sorted properly ***** * 2. ***** CREATE VIEW vw_Large AS SELECT * FROM tb_Small AS Small INNER JOIN ( SELECT * FROM tb_Large ORDER BY Random ) AS Large ON ( Small.PK = Large.FK_Small ); SELECT * FROM vw_Large LIMIT 50; -> Much quicker (~15 time units), since the ordering is achieved on the table itself, using the index, before the view is evaluated SELECT * FROM vw_Large WHERE Small.PK = <any> LIMIT 50; -> Slow (~15 time units), since the ordering is achieved on the entire table, despite the WHERE clause ***** * POSSIBLE SOLUTIONS AND PROBLEMS ***** Since the second approach seems to give better results, the idea was to reorder (cluster) the 'Large' table regurlarly - like once a day -, so as to have a randomized data set and avoid the ORDER BY clause (this is the way I achieved VERY GOOD performance on MS SQL Server [~1 time unit], for exactly the same scenario). In order to do so, one might: ***** * 1. ***** CLUSTER ix_Large__Random TABLE tb_Large; -> Would be ideal... but OIDS, CONTRAINTS AND INDEXES ARE LOST AND ALL RELATED VIEWS/FUNCTIONS WON'T WORK ANYLONGER... ***** * 2. ***** BEGIN; SET CONSTRAINTS ALL DEFERRED; CREATE TEMP TABLE tmp_Large AS SELECT * FROM tb_Table; DELETE FROM tb_Large; -- won't work; RI violation on foreign key 'tb_Foo(FK_Large)' INSERT INTO tb_Large SELECT * FROM tb_Table ORDER BY Random; DROP TABLE tmp_Large; COMMIT; -> Would preserve oids, constraints and indexes... BUT DELETE IS IMPOSSIBLE, BECAUSE REFERENTIAL INTEGRITY IS VIOLATED ON FOREIGN KEY 'FK_Large' IN TABLE 'tb_Foo', despite the SET CONSTRAINTS ALL DEFERRED clause ***** * HELP !!! ***** Would anyone have a solution to this (general) randomization problem ? Is there a way to turn RI off during the transaction ? Is there another way to reorder (cluster) the table without having oids/constraints/indexes or RI problems ? Any clues would be very much appreciated ;-) Cédric Dufour ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]