On Wed, Mar 23, 2011 at 11:58 AM, Adarsh Sharma <adarsh.sha...@orkash.com>wrote:

>  Dear all,
>
> I have 2 tables in my database name clause2( 4900 MB) & page_content(1582
> MB).
>
> My table definations are as :
>
> *page_content :-
>
> *CREATE TABLE page_content
> (
>   content_id integer,
>   wkb_geometry geometry,
>   link_level integer,
>   isprocessable integer,
>   isvalid integer,
>   isanalyzed integer,
>   islocked integer,
>   content_language character(10),
>   url_id integer,
>   publishing_date character(40),
>   heading character(150),
>   category character(150),
>   crawled_page_url character(500),
>   keywords character(500),
>   dt_stamp timestamp with time zone,
>   "content" character varying,
>   crawled_page_id bigint,
>   id integer
> )
> WITH (
>   OIDS=FALSE
> );
>
> *Indexes on it :-*
> CREATE INDEX idx_page_id  ON page_content  USING btree  (crawled_page_id);
> CREATE INDEX idx_page_id_content   ON page_content  USING btree
> (crawled_page_id, content_language, publishing_date, isprocessable);
> CREATE INDEX pgweb_idx  ON page_content   USING gin
> (to_tsvector('english'::regconfig, content::text));
>
> *clause 2:-
> *CREATE TABLE clause2
> (
>   id bigint NOT NULL DEFAULT nextval('clause_id_seq'::regclass),
>   source_id integer,
>   sentence_id integer,
>   clause_id integer,
>   tense character varying(30),
>   clause text,
>   CONSTRAINT pk_clause_demo_id PRIMARY KEY (id)
> )WITH ( OIDS=FALSE);
>
> *Indexes on it :
>
> *CREATE INDEX idx_clause2_march10
>   ON clause2
>   USING btree
>   (id, source_id);*
>
> *I perform a join query on it as :
>
> * explain analyze select distinct(p.crawled_page_id) from page_content p ,
> clause2  c where p.crawled_page_id != c.source_id ;
>
> *What it takes more than 1 hour to complete. As I issue the explain
> analyze command and cannot able to wait for output but I send my explain
> output as :
>                                              QUERY
> PLAN
>
> --------------------------------------------------------------------------------------------------------
>  Unique  (cost=927576.16..395122387390.13 rows=382659 width=8)
>    ->  Nested Loop  (cost=927576.16..360949839832.15 rows=13669019023195
> width=8)
>          Join Filter: (p.crawled_page_id <> c.source_id)
>          ->  Index Scan using idx_page_id on page_content p
> (cost=0.00..174214.02 rows=428817 width=8)
>          ->  Materialize  (cost=927576.16..1370855.12 rows=31876196
> width=4)
>                ->  Seq Scan on clause2 c  (cost=0.00..771182.96
> rows=31876196 width=4)
> (6 rows)
>
>
> Please guide me how to make the above query run faster as I am not able to
> do that.
>
>
> Thanks, Adarsh
>
> *
>
> *
>

Could you try just explaining the below query:
explain  select distinct(p.crawled_page_id) from page_content p where NOT
EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);

The idea here is to avoid directly using NOT operator.



Regards,
Chetan

-- 
Chetan Suttraway
EnterpriseDB <http://www.enterprisedb.com/>, The Enterprise
PostgreSQL<http://www.enterprisedb.com/>
 company.

Reply via email to