Ahmad,
On Mon, 26 Sep 2005, Ahmad Fajar wrote:
Hi Oleg,
what king of garbage ? Probably you index not needed token types, for
example, email address, file names....
do you need proximity ? If no, use strip(tsvector) function to remove
coordinate information from tsvector.
I need proximity. Some time I have to rank my article and make a chart for
that.
don't index default configuration and index only needed tokens, for
example, to index only 3 type of tokens, first create 'qq' configuration
and specify tokens to index.
insert into pg_ts_cfg values('qq','default','en_US');
-- tokens to index
insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');
I still don't understand about tsearch2 configuration, so until now I just
use default configuration. I will try your suggestion. But how can I get the
en_ispell? Does my system will know if I use: ....,'{en_ispell,en_stem}';
From default configuration I only see: ..., '{en_stem}';
I think you should read documentation. I couldn't explain you things already
written.
Beside that, I still have problem, if I do a simple query like:
Select ids, keywords from dict where keywords='blabla' ('blabla' is a
single
word); The table have 200 million rows, I have index the keywords field.
On
the first time my query seem to slow to get the result, about 15-60 sec to
get the result. I use latest pgAdmin3 to test all queries. But if I repeat
the query I will get fast result. My question is why on the first time the
query seem to slow.
because index pages should be readed from disk into shared buffers, so
next query will benefit from that. You need enough shared memory to get
real benefit. You may get postgresql stats and look on cache hit ration.
btw, how does your query ( keywords='blabla') relates to tsearch2 ?
(Keywords='blabla') isn't related to tsearch2, I just got an idea from
tsearch2 and try different approach. But I stuck on the query result speed.
Very slow to get result on the first query.
And how to see postgresql stats and look on cache hit ratio? I still don't
know how to get it.
learn from http://www.postgresql.org/docs/8.0/static/monitoring-stats.html
I try to cluster the table base on keyword index, but after 15 hours
waiting and it doesn't finish I stop clustering.
don't use cluster for big tables ! simple
select * into clustered_foo from foo order by indexed_field
would be faster and does the same job.
What the use of clustered_foo table? And how to use it?
I think it will not distinct duplicate rows. And the clustered_foo table
still not have an index, so if query to this table, I think the query will
be very slow to get a result.
oh guy, you certainly need to read documentation
http://www.postgresql.org/docs/8.0/static/sql-cluster.html
Regards,
ahmad fajar
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings