Thanks, it works now. :-)
Here is the output:
pdc_uima=# SELECT count(*) from page_content WHERE publishing_date like
'%2010%' and
pdc_uima-# content_language='en' and content is not null and
isprocessable = 1 and
pdc_uima-# to_tsvector('english',content) @@
to_tsquery('english','Mujahid' || ' | '
pdc_uima(# || 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen' || ' | '
pdc_uima(# || 'insurgent' || ' | ' || 'terrORist' || ' | ' || 'cadre' ||
' | '
pdc_uima(# || 'civilians' || ' | ' || 'police' || ' | ' || 'cops' ||
'crpf' || ' | '
pdc_uima(# || 'defence' || ' | ' || 'dsf' || ' | ' || 'ssb' );
count
--------
137193
(1 row)
Time: 195441.894 ms
But my original query also uses an AND condition, i.e.
select count(*) from page_content where publishing_date like '%2010%'
and content_language='en' and content is not null and isprocessable = 1
and (content like '%Militant%'
OR content like '%jihad%' OR content like '%Mujahid%' OR
content like '%fedayeen%' OR content like '%insurgent%' OR content
like '%terrORist%' OR
content like '%cadre%' OR content like '%civilians%' OR content like
'%police%' OR content like '%defence%' OR content like '%cops%' OR
content like '%crpf%' OR content like '%dsf%' OR content like '%ssb%')
AND (content like '%kill%' OR content like '%injure%');
count
-------
57061
(1 row)
Time: 19423.087 ms
Now I also have to add the AND condition ( AND (content like '%kill%' OR
content like '%injure%') ) to the full-text query.
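One possible way to express that extra condition with full-text search is to build two OR groups and join them with '&'. In a tsquery, '&' binds more tightly than '|', so each group needs its own parentheses. The Python sketch below is only an illustration: the helper name or_group, the variable names, and the %s placeholder style (as used by drivers such as psycopg2) are assumptions, not something from this thread. Note also that to_tsquery matches stemmed words rather than substrings, so the count need not equal the LIKE result (for example, LIKE '%kill%' also matches "skill").

```python
# Sketch only: or_group and the variable names are hypothetical.
def or_group(terms):
    """Join terms with the tsquery OR operator '|' and parenthesize the group."""
    return "(" + " | ".join(terms) + ")"


# Terms taken from the LIKE query above.
actors = [
    "Militant", "jihad", "Mujahid", "fedayeen", "insurgent", "terrORist",
    "cadre", "civilians", "police", "defence", "cops", "crpf", "dsf", "ssb",
]
actions = ["kill", "injure"]

# '&' (AND) joins the two parenthesized OR groups.
tsquery_text = or_group(actors) + " & " + or_group(actions)

# The text would be passed as a bound parameter to a condition like this one
# (the %s placeholder style is an assumption about the database driver).
condition = "to_tsvector('english', content) @@ to_tsquery('english', %s)"

print(tsquery_text)
```

The remaining filters (publishing_date, content_language, isprocessable) would stay in the WHERE clause unchanged.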
Thanks & Regards,
Adarsh Sharma
t...@fuzzy.cz wrote:
Yes, I think we caught the problem, but it results in the error below:
SELECT count(*) from page_content
WHERE publishing_date like '%2010%' and content_language='en' and
content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid ' ||
'jihad ' || 'Militant ' || 'fedayeen ' || 'insurgent ' || 'terrORist '
|| 'cadre ' || 'civilians ' || 'police ' || 'defence ' || 'cops ' ||
'crpf ' || 'dsf ' || 'ssb');
ERROR: syntax error in tsquery: "Mujahid jihad Militant fedayeen
insurgent terrORist cadre civilians police defence cops crpf dsf ssb"
The text passed to to_tsquery has to be a proper query, i.e. single tokens
separated by boolean operators. In your case, you should put '|' (which
means OR) between the tokens to get something like this
'Mujahid | jihad | Militant | ...'
or you can use plainto_tsquery(), as that accepts simple text, but it puts
'&' (AND) between the tokens and I guess that's not what you want.
Tomas
What should I do to make it satisfy the OR condition, i.e. match any of the
to_tsquery values, as we got right with like '%Mujahid%' or ... or ...?
You can't force plainto_tsquery to use OR instead of AND.
You need to modify the piece of code that produces the search text so that
it inserts '|' characters. So do something like this
SELECT count(*) from page_content WHERE publishing_date like '%2010%' and
content_language='en' and content is not null and isprocessable = 1 and
to_tsvector('english',content) @@ to_tsquery('english','Mujahid' || ' | '
|| 'jihad' || ' | ' || 'Militant' || ' | ' || 'fedayeen');
Not sure where this text comes from, but you can do this in a higher-level
language, e.g. in PHP. Something like this
$words = implode(' | ', explode(' ',$text));
and then pass the $words into the query. Or something like that.
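For comparison, the same idea as the PHP line above might look like this in Python. This is only a sketch, with to_or_query as a hypothetical helper name; it assumes the input is plain words separated by whitespace, with no characters that are special in a tsquery.

```python
def to_or_query(text):
    """Turn 'a b c' into 'a | b | c' for use with to_tsquery."""
    # split() with no argument drops repeated or leading/trailing whitespace,
    # so no empty tokens end up between the '|' operators.
    return " | ".join(text.split())


words = to_or_query("Mujahid jihad Militant fedayeen")
print(words)  # Mujahid | jihad | Militant | fedayeen
```

The resulting string is best passed to the query as a bound parameter rather than concatenated into the SQL text.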
Tomas