Re: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

2004-06-20 Thread Aaron
Title: Message



The words for the keyword can be made up of a 
sentace, ie 10 or more keywords to one entry.
Also incase I didnt answer before, we are using 
TSearch2 and all tables have been fully analyzed and indexed.
Any other suggestions?
How long do searches take when 10 000 rows are 
returned?
We can not use a limit of 100 because we need to 
analyze the entire data set returned.
Thanks,

  - Original Message - 
  From: 
  Jeremy 
  Dunn 
  To: 'borajetta' 
  Cc: Postgresql Performance 
  
  Sent: Tuesday, June 15, 2004 5:43 
AM
  Subject: RE: [PERFORM] 50 000 000 Table 
  entries and I have to do a keyword search HELP NEEDED
  
  One 
  option that does not take advantage of any fancy indexing methods is to create 
  a trigger on the table, on insert/update/delete, which extracts each 
  individual word from the field you care about, and creates an entry in another 
  'keyword' table, id = 'word', value = pk of your original table. then 
  index the keyword table on the 'keyword' field, and do your searches from 
  there. this should improve performance substantially, even on very large 
  return sets, because the keyword table rows are very small and thus a lot of 
  them fit in a disk block.
  
  - 
  Jeremy
  

-Original Message-From: 
[EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of 
borajettaSent: Monday, June 07, 2004 5:47 PMTo: 
[EMAIL PROTECTED]Subject: [PERFORM] 50 000 000 
Table entries and I have to do a keyword search HELP 
NEEDED

So I have a table with about 50 million entries in it, I have to do a 
keyword search.

The keyword search is done on the title of the entry. For example a entry could be "This 
is a title string which could be searched"

I have tried a few ways to search but I get horrible search 
times. Some keywords will come 
up with matches as big as .25 million but most are around 
1000-5000.

I use an index which narrows the table down to about 1.5-2million 
entries.

I used 2 tables which had a 1:1 correspondence.
One held a gist index which was on a int field which searched the for 
the keyword. Then I would join 
the table to another to retrieve the rest of the information about the items 
it matched.

This was slow even for returning 100 entries. About 10 seconds, sometimes 5. But when I start getting 1xxx 
entries its about 30-50 seconds. 
The rest is just horrible.

How should I set up my indexes and or tables.
We were thinking of putting the index inside one table then the join 
would not have to be done but this still returns rather slow 
results.

I have not fully tested this method but it looks like when run for 
just the keyword search on the title and no joining it can return in about 
10 seconds or less.
This is a great improvement but I am currently going 
to make the table all in one and see how long it will take. I believe it will not be much more 
as there will be no join needed only the returning of some attribute 
fields. 

This is still not the kind of time I would like to see, I wanted 
something around 2 seconds or less. 
I know there is a lot of information especially if .25 million rows 
are to be returned but if there is only 1xxx-9xxx rows to be returned I 
believe 2 seconds seems about right.

How do search engines do it?
Any suggestions are welcome,

Thanks


Re: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

2004-06-15 Thread Jeremy Dunn
Title: Message



One 
option that does not take advantage of any fancy indexing methods is to create a 
trigger on the table, on insert/update/delete, which extracts each individual 
word from the field you care about, and creates an entry in another 'keyword' 
table, id = 'word', value = pk of your original table. then index the 
keyword table on the 'keyword' field, and do your searches from there. 
this should improve performance substantially, even on very large return sets, 
because the keyword table rows are very small and thus a lot of them fit in a 
disk block.

- 
Jeremy

  
  -Original Message-From: 
  [EMAIL PROTECTED] 
  [mailto:[EMAIL PROTECTED] On Behalf Of 
  borajettaSent: Monday, June 07, 2004 5:47 PMTo: 
  [EMAIL PROTECTED]Subject: [PERFORM] 50 000 000 Table 
  entries and I have to do a keyword search HELP NEEDED
  
  So I have a table with about 50 million entries in it, I have to do a 
  keyword search.
  
  The keyword search is done on the title of the entry. For example a entry could be "This is 
  a title string which could be searched"
  
  I have tried a few ways to search but I get horrible search times. Some keywords will come up with 
  matches as big as .25 million but most are around 1000-5000.
  
  I use an index which narrows the table down to about 1.5-2million 
  entries.
  
  I used 2 tables which had a 1:1 correspondence.
  One held a gist index which was on a int field which searched the for 
  the keyword. Then I would join 
  the table to another to retrieve the rest of the information about the items 
  it matched.
  
  This was slow even for returning 100 entries. About 10 seconds, sometimes 5. But when I start getting 1xxx entries 
  its about 30-50 seconds. The rest 
  is just horrible.
  
  How should I set up my indexes and or tables.
  We were thinking of putting the index inside one table then the join 
  would not have to be done but this still returns rather slow 
  results.
  
  I have not fully tested this method but it looks like when run for just 
  the keyword search on the title and no joining it can return in about 10 
  seconds or less.
  This is a great improvement but I am currently going to 
  make the table all in one and see how long it will take. I believe it will not be much more as 
  there will be no join needed only the returning of some attribute fields. 
  
  This is still not the kind of time I would like to see, I wanted 
  something around 2 seconds or less. 
  I know there is a lot of information especially if .25 million rows are 
  to be returned but if there is only 1xxx-9xxx rows to be returned I believe 2 
  seconds seems about right.
  
  How do search engines do it?
  Any suggestions are welcome,
  
  Thanks