[PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Thanks Chetan, After my Lunch Break, I tried the below steps :

*My original query was :
*explain analyze select distinct(p.crawled_page_id) from page_content p 
, clause2  c where p.crawled_page_id != c.source_id


which hangs because it is wrong query to fetch the desired output .

*Next Updated Query be Chetan Suttraway :*

explain analyze select distinct(p.crawled_page_id) from page_content p
where NOT EXISTS (select 1 from  clause2 c where c.source_id = 
p.crawled_page_id);


   
QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7192.843..7195.923 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.040..7192.426 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.009..395.599 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.014..0.014 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 7199.748 ms
(6 rows)

I think it is very much faster but I don't understand the query :

*explain select distinct(b) from t1,t2 where t1.b t2.d union all  
select distinct(b) from t1,t2 where  t1.b t2.d;


*As i transform it into my format as:

explain select distinct(p.crawled_page_id) from page_content p , clause2 
c where p.crawled_page_id  c.source_id union all  select 
distinct(p.crawled_page_id) from page_content p,clause2 c where 
p.crawled_page_id  c.source_id;


QUERY 
PLAN 
-

Append  (cost=0.00..296085951076.34 rows=765318 width=8)
  -  Unique  (cost=0.00..148042971711.58 rows=382659 width=8)
-  Nested Loop  (cost=0.00..136655213119.84 rows=4555103436696 
width=8)
  -  Index Scan using idx_page_id on page_content p  
(cost=0.00..174214.02 rows=428817 width=8)
  -  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..185898.05 rows=10622488 width=4)

Index Cond: (p.crawled_page_id  c.source_id)
  -  Unique  (cost=0.00..148042971711.58 rows=382659 width=8)
-  Nested Loop  (cost=0.00..136655213119.84 rows=4555103436696 
width=8)
  -  Index Scan using idx_page_id on page_content p  
(cost=0.00..174214.02 rows=428817 width=8)
  -  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..185898.05 rows=10622488 width=4)

Index Cond: (p.crawled_page_id  c.source_id)
(11 rows)

I don't think this is correct because it produce 11 rows output.

Any further suggestions, Please guide.

Thanks  best Regards,
Adarsh Sharma



Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Vitalii Tymchyshyn

23.03.11 11:17, Adarsh Sharma ???(??):


I think it is very much faster but I don't understand the query :

*explain select distinct(b) from t1,t2 where t1.b t2.d union all  
select distinct(b) from t1,t2 where  t1.b t2.d;

*

I don't understand it too. What are you trying to get? Is it
select distinct(b) from t1 where  b  (select min(d) from t2)**or b  
(select max(d) from t2)

?

Can you explain in words, not SQL, what do you expect do retrieve?

Best regards, Vitalii Tymchyshyn


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma
I just want to retrieve that id 's from page_content which do not have 
any entry in clause2 table.


Thanks , Adarsh

Vitalii Tymchyshyn wrote:

23.03.11 11:17, Adarsh Sharma ???(??):


I think it is very much faster but I don't understand the query :

*explain select distinct(b) from t1,t2 where t1.b t2.d union all  
select distinct(b) from t1,t2 where  t1.b t2.d;

*

I don't understand it too. What are you trying to get? Is it
select distinct(b) from t1 where  b  (select min(d) from t2)* *or b  
 (select max(d) from t2)

?

Can you explain in words, not SQL, what do you expect do retrieve?

Best regards, Vitalii Tymchyshyn




Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Vitalii Tymchyshyn

23.03.11 12:10, Adarsh Sharma ???(??):
I just want to retrieve that id 's from page_content which do not have 
any entry in clause2 table.



Then
select distinct(p.crawled_page_id) from page_content p
 where NOT EXISTS (select 1 from  clause2 c where c.source_id = 
p.crawled_page_id);

is correct query.

Best regards, Vitalii Tymchyshyn.


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Vitalii Tymchyshyn wrote:

23.03.11 12:10, Adarsh Sharma ???(??):
I just want to retrieve that id 's from page_content which do not 
have any entry in clause2 table.



Then
select distinct(p.crawled_page_id) from page_content p
 where NOT EXISTS (select 1 from  clause2 c where c.source_id = 
p.crawled_page_id);

is correct query.



I can't understand how* select 1 from  clause2 c where c.source_id = 
p.crawled_page_id works too, *i get my output .


What is the significance of 1 here.

Thanks , Adarsh
**

Best regards, Vitalii Tymchyshyn.




Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Vitalii Tymchyshyn

23.03.11 12:19, Adarsh Sharma ???(??):

Vitalii Tymchyshyn wrote:

23.03.11 12:10, Adarsh Sharma ???(??):
I just want to retrieve that id 's from page_content which do not 
have any entry in clause2 table.



Then
select distinct(p.crawled_page_id) from page_content p
 where NOT EXISTS (select 1 from  clause2 c where c.source_id = 
p.crawled_page_id);

is correct query.



I can't understand how*select 1 from  clause2 c where c.source_id = 
p.crawled_page_id works too, *i get my output .


What is the significance of 1 here.
No significance. You can put anything there. E.g. *. Simply arbitrary 
constant. Exists checks if there were any rows, it does not matter which 
columns are there or what is in this columns.


Best regards, Vitalii Tymchyshyn



Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread tv
 I just want to retrieve that id 's from page_content which do not have
 any entry in clause2 table.

In that case the query probably does not work (at least the query you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition). At least
that's how I understand it.

So instead of this

select distinct(p.crawled_page_id)
from page_content p, clause2 c where p.crawled_page_id != c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)
from page_content p left join clause2 c on (p.crawled_page_id =
c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Chetan Suttraway
On Wed, Mar 23, 2011 at 3:49 PM, Adarsh Sharma adarsh.sha...@orkash.comwrote:

  Vitalii Tymchyshyn wrote:

 23.03.11 12:10, Adarsh Sharma написав(ла):

 I just want to retrieve that id 's from page_content which do not have any
 entry in clause2 table.

  Then
 select distinct(p.crawled_page_id) from page_content p
  where NOT EXISTS (select 1 from  clause2 c where c.source_id =
 p.crawled_page_id);
 is correct query.


 I can't understand how* select 1 from  clause2 c where c.source_id =
 p.crawled_page_id works too, *i get my output .

 What is the significance of 1 here.

 Thanks , Adarsh
 **

 Best regards, Vitalii Tymchyshyn.



Its the inverted logic for finding crawled_page_id not matching with
source_id.
Actually, the idea was to force index scan on clause2 though.


-- 
Regards,
Chetan Suttraway
EnterpriseDB http://www.enterprisedb.com/, The Enterprise
PostgreSQLhttp://www.enterprisedb.com/
 company.


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Chetan Suttraway
On Wed, Mar 23, 2011 at 4:08 PM, t...@fuzzy.cz wrote:

  I just want to retrieve that id 's from page_content which do not have
  any entry in clause2 table.

 In that case the query probably does not work (at least the query you've
 sent in the first post) as it will return even those IDs that have at
 least one other row in 'clause2' (not matching the != condition). At least
 that's how I understand it.

 true.


 So instead of this

 select distinct(p.crawled_page_id)
 from page_content p, clause2 c where p.crawled_page_id != c.source_id ;

 you should probably do this

 select distinct(p.crawled_page_id)
 from page_content p left join clause2 c on (p.crawled_page_id =
 c.source_id) where (c.source_id is null);

 I guess this will be much more efficient too.


This looks like to give expected results. Also note that the where clause
is null is really required and is not an
optional predicate.



 regards
 Tomas


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




-- 
Regards,
Chetan Suttraway
EnterpriseDB http://www.enterprisedb.com/, The Enterprise
PostgreSQLhttp://www.enterprisedb.com/
 company.


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Thank U all, for U'r Nice Support.

Let me Conclude the results, below results are obtained after finding 
the needed queries :


*First Option :

*pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);

QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=87927.000..87930.084 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.191..87926.546 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 
loops=428467)

  Index Cond: (p.crawled_page_id = c.source_id)
Total runtime: 87933.882 ms :-(
(6 rows)

*Second Option :

*pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id 
= p.crawled_page_id);

QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7047.259..7050.261 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.039..7046.826 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 7054.074 ms :-)
(6 rows)


Thanks  best Regards,
Adarsh Sharma




Chetan Suttraway wrote:



On Wed, Mar 23, 2011 at 4:08 PM, t...@fuzzy.cz mailto:t...@fuzzy.cz wrote:

 I just want to retrieve that id 's from page_content which do
not have
 any entry in clause2 table.

In that case the query probably does not work (at least the query
you've
sent in the first post) as it will return even those IDs that have at
least one other row in 'clause2' (not matching the != condition).
At least
that's how I understand it.

true.
 


So instead of this

select distinct(p.crawled_page_id)
from page_content p, clause2 c where p.crawled_page_id !=
c.source_id ;

you should probably do this

select distinct(p.crawled_page_id)
from page_content p left join clause2 c on (p.crawled_page_id =
c.source_id) where (c.source_id is null);

I guess this will be much more efficient too.


This looks like to give expected results. Also note that the where 
clause is null is really required and is not an

optional predicate.

 


regards
Tomas


--
Sent via pgsql-performance mailing list
(pgsql-performance@postgresql.org
mailto:pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




--
Regards,
Chetan Suttraway
EnterpriseDB http://www.enterprisedb.com/, The Enterprise PostgreSQL 
http://www.enterprisedb.com/ company.








Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Vitalii Tymchyshyn

23.03.11 13:21, Adarsh Sharma ???(??):

Thank U all, for U'r Nice Support.

Let me Conclude the results, below results are obtained after finding 
the needed queries :


*First Option :

*pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(# c.source_id) where (c.source_id is null);
 
QUERY PLAN

-
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) 
(actual time=87927.000..87930.084 rows=72 loops=1)
   -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 
width=8) (actual time=0.191..87926.546 rows=74 loops=1)
 -  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
 -  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 
loops=428467)

   Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 87933.882 ms:-(
(6 rows)

*Second Option :

*pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where 
c.source_id = p.crawled_page_id);
 
QUERY PLAN

-
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) 
(actual time=7047.259..7050.261 rows=72 loops=1)
   -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 
width=8) (actual time=0.039..7046.826 rows=74 loops=1)
 -  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
 -  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 
loops=428467)

   Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 7054.074 ms :-)
(6 rows)



Actually the plans are equal, so I suppose it depends on what were run 
first :). Slow query operates with data mostly on disk, while fast one 
with data in memory.


Best regards, Vitalii Tymchyshyn


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Chetan Suttraway
On Wed, Mar 23, 2011 at 4:51 PM, Vitalii Tymchyshyn tiv...@gmail.comwrote:

  23.03.11 13:21, Adarsh Sharma написав(ла):

 Thank U all, for U'r Nice Support.

 Let me Conclude the results, below results are obtained after finding the
 needed queries :

 *First Option :

 *pdc_uima=# explain analyze select distinct(p.crawled_page_id)
 pdc_uima-# from page_content p left join clause2 c on (p.crawled_page_id =
 pdc_uima(# c.source_id) where (c.source_id is null);
  QUERY
 PLAN

 -
  HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
 time=87927.000..87930.084 rows=72 loops=1)
-  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8)
 (actual time=0.191..87926.546 rows=74 loops=1)
  -  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817
 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
  -  Index Scan using idx_clause2_source_id on clause2 c
 (cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1
 loops=428467)
Index Cond: (p.crawled_page_id = c.source_id)
  Total runtime: 87933.882 ms :-(
 (6 rows)

 *Second Option :

 *pdc_uima=# explain analyze select distinct(p.crawled_page_id) from
 page_content p
 pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id =
 p.crawled_page_id);
  QUERY
 PLAN

 -
  HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
 time=7047.259..7050.261 rows=72 loops=1)
-  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8)
 (actual time=0.039..7046.826 rows=74 loops=1)
  -  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817
 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
  -  Index Scan using idx_clause2_source_id on clause2 c
 (cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1
 loops=428467)
Index Cond: (c.source_id = p.crawled_page_id)
  Total runtime: 7054.074 ms :-)
 (6 rows)


 Actually the plans are equal, so I suppose it depends on what were run
 first :). Slow query operates with data mostly on disk, while fast one with
 data in memory.

 yeah. maybe the easiest way, is to start a fresh session and fire the
queries.


  Best regards, Vitalii Tymchyshyn




-- 
Regards,
Chetan Suttraway
EnterpriseDB http://www.enterprisedb.com/, The Enterprise
PostgreSQLhttp://www.enterprisedb.com/
 company.


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

Vitalii Tymchyshyn wrote:

23.03.11 13:21, Adarsh Sharma ???(??):

Thank U all, for U'r Nice Support.

Let me Conclude the results, below results are obtained after finding 
the needed queries :


*First Option :

*pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p left join clause2 c on (p.crawled_page_id = 
c.source_id) where (c.source_id is null);
 
QUERY 
PLAN  


-
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) 
(actual time=87927.000..87930.084 rows=72 loops=1)
   -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 
width=8) (actual time=0.191..87926.546 rows=74 loops=1)
 -  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.027..528.978 rows=428467 loops=1)
 -  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.202..0.202 rows=1 
loops=428467)

   Index Cond: (p.crawled_page_id = c.source_id)
 Total runtime: 87933.882 ms :-(
(6 rows)

*Second Option :

*pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where 
c.source_id = p.crawled_page_id);
 
QUERY 
PLAN  


-
 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) 
(actual time=7047.259..7050.261 rows=72 loops=1)
   -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 
width=8) (actual time=0.039..7046.826 rows=74 loops=1)
 -  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.008..388.976 rows=428467 loops=1)
 -  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.013..0.013 rows=1 
loops=428467)

   Index Cond: (c.source_id = p.crawled_page_id)
 Total runtime: 7054.074 ms :-)
(6 rows)



Actually the plans are equal, so I suppose it depends on what were run 
first :). Slow query operates with data mostly on disk, while fast one 
with data in memory.


Yes U 'r absolutely right, if I run it again, it display the output as :

pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p left join clause2 c on (p.crawled_page_id = c.source_id) 
where (c.source_id is null);


   
QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7618.452..7621.427 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.131..7618.043 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.020..472.811 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.015..0.015 rows=1 
loops=428467)

  Index Cond: (p.crawled_page_id = c.source_id)
Total runtime: 7637.132 ms
(6 rows)

I let U know after a fresh start (session ).
Then the true result comes and if further tuning required can be performed.

Best Regards, Adarsh


Best regards, Vitalii Tymchyshyn




Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma



Actually the plans are equal, so I suppose it depends on what were
run first :). Slow query operates with data mostly on disk, while
fast one with data in memory.

yeah. maybe the easiest way, is to start a fresh session and fire the 
queries.



After the fresh start , the results obtained are :

pdc_uima=# explain analyze select distinct(p.crawled_page_id)
pdc_uima-#  from page_content p left join clause2 c on (p.crawled_page_id =
pdc_uima(#  c.source_id) where (c.source_id is null);

QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=7725.132..7728.341 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.115..7724.713 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.021..472.199 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.015..0.015 rows=1 
loops=428467)

  Index Cond: (p.crawled_page_id = c.source_id)
Total runtime: 7731.840 ms
(6 rows)

pdc_uima=#  explain analyze select distinct(p.crawled_page_id) 
from page_content p
pdc_uima-#   where NOT EXISTS (select 1 from  clause2 c where 
c.source_id = p.crawled_page_id);

QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=6192.249..6195.368 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.036..6191.838 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.008..372.489 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.012..0.012 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 6198.567 ms
(6 rows)


This seems a slight upper hand of the second query .


Would it be possible to tune it further.
My postgresql.conf parameters are as follows : ( Total RAM = 16 GB )

shared_buffers = 4GB
max_connections=700
effective_cache_size = 6GB
work_mem=16MB
maintenance_mem=64MB

I think to change

work_mem=64MB
maintenance_mem=256MB

Does it has some effects now.


Thanks  best Regards,
Adarsh Sharma



Best regards, Vitalii Tymchyshyn




--
Regards,
Chetan Suttraway
EnterpriseDB http://www.enterprisedb.com/, The Enterprise PostgreSQL 
http://www.enterprisedb.com/ company.








Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread tv

 Actually the plans are equal, so I suppose it depends on what were
 run first :). Slow query operates with data mostly on disk, while
 fast one with data in memory.

 yeah. maybe the easiest way, is to start a fresh session and fire the
 queries.


 After the fresh start , the results obtained are :

As Chetan Suttraway already pointed out, the execution plans are exactly
the same. And by excactly I mean there's no difference in evaluating
those two queries.

The difference is due to cached data - not just in shared buffers (which
will be lost of postgres restart) but also in filesystem cache (which is
managed by kernel, not postgres).

So the first execution had to load (some of) the data into shared buffers,
while the second execution already had a lot of data in shared buffers.
That's why the first query run in 7.7sec while the second 6.2sec.

 This seems a slight upper hand of the second query .

Again, there's no difference between those two queries, they're exactly
the same. It's just a matter of which of them is executed first.

 Would it be possible to tune it further.

I don't think so. The only possibility I see is to add a flag into
page_content table, update it using a trigger (when something is
inserted/deleted from clause2). Then you don't need to do the join.

 My postgresql.conf parameters are as follows : ( Total RAM = 16 GB )

 shared_buffers = 4GB
 max_connections=700
 effective_cache_size = 6GB
 work_mem=16MB
 maintenance_mem=64MB

 I think to change

 work_mem=64MB
 maintenance_mem=256MB

 Does it has some effects now.

Generally a good idea, but we don't know if there are other processes
running on the same machine and what kind of system is this (how many
users are there, what kind of queries do they run). If there's a lot of
users, keep work_mem low. If there's just a few users decrease
max_connections and bump up work_mem and consider increasing
shared_buffers.

Maintenance_work_mem is used for vacuum/create index etc. so it really
does not affect regular queries.

Some of those values (e.g. work_mem/maintenance_work_mem) are dynamic, so
you can set them for the current connection and see how it affects the
queries.

Just do something like

db=# SET work_mem='32MB'
db=# EXPLAIN ANALYZE SELECT ...

But I don't think this will improve the query we've been talking about.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Shaun Thomas

On 03/23/2011 04:17 AM, Adarsh Sharma wrote:


explain analyze select distinct(p.crawled_page_id) from page_content
p where NOT EXISTS (select 1 from clause2 c where c.source_id =
p.crawled_page_id);


You know... I'm surprised nobody has mentioned this, but DISTINCT is 
very slow unless you have a fairly recent version of Postgres that 
replaces it with something faster. Try this:


EXPLAIN ANALYZE
SELECT p.crawled_page_id
  FROM page_content p
 WHERE NOT EXISTS (
 SELECT 1
   FROM clause2 c
  WHERE c.source_id = p.crawled_page_id
   )
 GROUP BY p.crawled_page_id;

Or if you like the cleaner query without a sub-select:

EXPLAIN ANALYZE
SELECT p.crawled_page_id
  FROM page_content p
  LEFT JOIN clause2 c ON (c.source_id = p.crawled_page_id)
 WHERE c.source_id IS NULL
 GROUP BY p.crawled_page_id;

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread tv
 On 03/23/2011 04:17 AM, Adarsh Sharma wrote:

 explain analyze select distinct(p.crawled_page_id) from page_content
 p where NOT EXISTS (select 1 from clause2 c where c.source_id =
 p.crawled_page_id);

 You know... I'm surprised nobody has mentioned this, but DISTINCT is
 very slow unless you have a fairly recent version of Postgres that
 replaces it with something faster. Try this:

Nobody mentioned that because the explain plan already uses hash aggregate
(instead of the old sort)

 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
time=7047.259..7050.261 rows=72 loops=1)

which means this is at least 8.4. Plus the 'distinct' step uses less than
1% of total time, so even if you improve it the impact will be minimal.

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Shaun Thomas

On 03/23/2011 09:16 AM, t...@fuzzy.cz wrote:


which means this is at least 8.4. Plus the 'distinct' step uses less than
1% of total time, so even if you improve it the impact will be minimal.


Haha. Noted. I guess I'm still on my original crusade against DISTINCT. 
I was pulling it out of so much old code it's been fused to my DNA. 
Actually, we're still on 8.2 so... :)


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com

__

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re-Reason of Slowness of Query

2011-03-23 Thread Adarsh Sharma

t...@fuzzy.cz wrote:

On 03/23/2011 04:17 AM, Adarsh Sharma wrote:



explain analyze select distinct(p.crawled_page_id) from page_content
p where NOT EXISTS (select 1 from clause2 c where c.source_id =
p.crawled_page_id);
  

You know... I'm surprised nobody has mentioned this, but DISTINCT is
very slow unless you have a fairly recent version of Postgres that
replaces it with something faster. Try this:



Nobody mentioned that because the explain plan already uses hash aggregate
(instead of the old sort)

 HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual
time=7047.259..7050.261 rows=72 loops=1)

which means this is at least 8.4. Plus the 'distinct' step uses less than
1% of total time, so even if you improve it the impact will be minimal.

  


Yes, Ur absolutely right I am using Version 8.4SS and i am satisfied 
with the below query results:


pdc_uima=# explain analyze select distinct(p.crawled_page_id) from 
page_content p
pdc_uima-#  where NOT EXISTS (select 1 from  clause2 c where c.source_id 
= p.crawled_page_id);

QUERY 
PLAN 
-
HashAggregate  (cost=100278.16..104104.75 rows=382659 width=8) (actual 
time=5149.308..5152.251 rows=72 loops=1)
  -  Nested Loop Anti Join  (cost=0.00..99320.46 rows=383079 width=8) 
(actual time=0.119..5148.954 rows=74 loops=1)
-  Seq Scan on page_content p  (cost=0.00..87132.17 
rows=428817 width=8) (actual time=0.021..444.487 rows=428467 loops=1)
-  Index Scan using idx_clause2_source_id on clause2 c  
(cost=0.00..18.18 rows=781 width=4) (actual time=0.009..0.009 rows=1 
loops=428467)

  Index Cond: (c.source_id = p.crawled_page_id)
Total runtime: 5155.874 ms
(6 rows)

I don't think that the above results are optimized further.


Thanks  best Regards,
Adarsh Sharma

regards
Tomas