Thanks Marshall, would I need to change the data type of *content
*column to tsvector and create a Gist Index on it.
Best Regards,
Adarsh
Kenneth Marshall wrote:
On Wed, Mar 16, 2011 at 02:43:38PM +0530, Adarsh Sharma wrote:
Dear all,
I am facing a problem while creating the index to make the below query run
faster. My table size is near about 1065 MB and 428467 rows.
explain analyze select count(*) from page_content where publishing_date
like '%2010%' and content_language='en' and content is not null and
isprocessable = 1 and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content like
'%terrorist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like
'%police%' OR content like '%defence%' OR content like '%cops%' OR content
like '%crpf%' OR content like '%dsf%' OR content like '%ssb%') AND (content
like '%kill%' or content like '%injure%');
*Output:
* Aggregate (cost=107557.78..107557.79 rows=1 width=0) (actual
time=18564.631..18564.631 rows=1 loops=1)
-> Seq Scan on page_content (cost=0.00..107466.82 rows=36381 width=0)
(actual time=0.146..18529.371 rows=59918 loops=1)
Filter: ((content IS NOT NULL) AND (publishing_date ~~
'%2010%'::text) AND (content_language = 'en'::bpchar) AND (isprocessable =
1) AND (((content)
::text ~~ '%kill%'::text) OR ((content)::text ~~ '%injure%'::text)) AND
(((content)::text ~~ '%Militant%'::text) OR ((content)::text ~~
'%jihad%'::text) OR (
(content)::text ~~ '%Mujahid%'::text) OR ((content)::text ~~
'%fedayeen%'::text) OR ((content)::text ~~ '%insurgent%'::text) OR
((content)::text ~~ '%terrori
st%'::text) OR ((content)::text ~~ '%cadre%'::text) OR ((content)::text ~~
'%civilians%'::text) OR ((content)::text ~~ '%police%'::text) OR
((content)::text
~~ '%defence%'::text) OR ((content)::text ~~ '%cops%'::text) OR
((content)::text ~~ '%crpf%'::text) OR ((content)::text ~~ '%dsf%'::text)
OR ((content)::text
~~ '%ssb%'::text)))
Total runtime: 18564.673 ms
*Index on that Table :
*CREATE INDEX idx_page_id
ON page_content
USING btree
(crawled_page_id);
*Index I create :*
CREATE INDEX idx_page_id_content
ON page_content
USING btree
(crawled_page_id,content_language,publishing_date,isprocessable);
*Index that fail to create:
*CREATE INDEX idx_page_id_content1
ON page_content
USING btree
(crawled_page_id,content);
Error :-ERROR: index row requires 13240 bytes, maximum size is 8191
********** Error **********
ERROR: index row requires 13240 bytes, maximum size is 8191
SQL state: 54000
How to resolve this error
Please give any suggestion to tune the query.
Thanks & best Regards,
Adarsh Sharma
You should probably be looking at using full-text indexing:
http://www.postgresql.org/docs/9.0/static/textsearch.html
or limit the size of content for the index.
Cheers,
Ken