Miles Barr wrote:
Has anyone tried to remove similar documents from their search results?
It looks like Google does some on the fly filtering of the results,
hiding pages which it thinks are too similar, i.e. when you see:
"In order to show you the most relevant results, we have omitted some
entrie
to be returned it
might be used to normalize the scores however...
Otis
--- David Spencer <[EMAIL PROTECTED]> wrote:
Miles Barr wrote:
Has anyone tried to remove similar documents from their search
results?
It looks like Google does some on the fly filtering of the results,
hiding pages whic
Miles Barr wrote:
On Mon, 2005-03-14 at 20:48 +0100, Dawid Weiss wrote:
I think what they do at Google is a fancy heuristic -- as David Spencer
mentioned, suburls of a given page, identical snippets, or titles... My
idea was more towards providing a 'realistic overview' of subjects in
Robert Watkins wrote:
The reason your suggestion is not practical is scalability. In a production
environment you might have, for example, 10,000 stored queries and 10 new
documents a minute. That's a fair bit of load on the system for only one
aspect of a much larger search application.
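To put rough numbers on that load, here is a back-of-the-envelope sketch. It assumes the brute-force case where every new document is tested against every stored query; the function name is purely illustrative and not taken from the thread or from Lucene.

```python
def query_evaluations_per_minute(stored_queries: int, new_docs_per_minute: int) -> int:
    """Brute-force cost: each new document is run against every stored query."""
    return stored_queries * new_docs_per_minute


# Figures quoted above: 10,000 stored queries and 10 new documents a minute.
per_minute = query_evaluations_per_minute(10_000, 10)
per_second = per_minute / 60

print(per_minute, round(per_second))
```

Any scheme that narrows the set of candidate queries before matching each document would lower this figure; the sketch only covers the naive case.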
Fun, inter
ut you have
certainly given me some good ideas. Answers to your questions are
below.
-- Robert
On Thu, 17 Mar 2005, David Spencer wrote:
Fun, interesting question - maybe you could elaborate on the
requirements a bit.
We deliver on-line content -- journals, reference works and the like.
Users ca
Daniel Herlitz wrote:
Hi everybody,
We have been using Lucene for about one year now with great success.
Recently though the index has grown noticeably and so has the number of
searches. I was wondering if anyone would like to comment on these
figures and say if it works for them?
Index size: ~
Yura Smolsky wrote:
Hello, mark.
mh> 2) My app uses long queries, some of which include
mh> very common terms. Using the "MoreLikeThis" query to
mh> drop common terms drastically improved performance. If
mh> your "killer queries" are long ones you could spot
mh> them and service them with a MoreLik
Daniel Naber wrote:
On Wednesday 20 April 2005 18:22, Paul Elschot wrote:
Has anyone tried an index based on n-grams?
Nutch has bigrams for phrases with frequently occurring words.
Also the spell checker in SVN uses n-grams I think.
SVN here:
http://svn.apache.org/repos/asf/lucene/java/trunk/co
Pablo Gomes Ludermir wrote:
Hello all,
I know that we can expand a word to get its synonyms with Wordnet. I
was wondering if we could reduce the index size by including a synonym
instead of a word on the synonym list.
For instance, if "screen" shows up, I would like to replace it by
"monitor" (it i
You could try downloading a copy of the wikipedia and processing the
entries yourself. I don't know how well represented other languages are
but there's a lot of English.
Ahmet Aksoy wrote:
Hi,
I have a project which will be used in order to supply automatic
dictionary helps in different language
Anna Bing wrote:
Firstly the Lucene in Action Book is great. It really helped me with
implementing search for a project.
Sorry if this is the wrong forum, but as you are all search people, I
wondered if you could recommend any good books about search
theory/algorithms, readable if that is possible
Chris Conrad wrote:
I know I've been asked before for a description of how SourceForge.net
is using Lucene. I wrote a blog entry about it and thought people
might be interested in seeing at a high level how it was designed.
Take a look at http://blog.dev.sf.net. Any comments are welcome.
Chris Fraschetti wrote:
I've got an application that performs millions of searches against a
If the results are not, say, "personalized", then I suggest some kind of
web container cache - I use and like OSCache - and it can even cache
page fragments.
http://www.opensymphony.com/oscache/
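For non-personalized results the idea can be sketched as a small least-recently-used cache keyed on the query string. This is only an illustration of the caching idea, not OSCache's API (OSCache is a Java library); the class name, the `search_fn` callback and the `max_entries` limit are all assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable, List


class SearchResultCache:
    """Tiny LRU cache in front of a search function (hypothetical, not OSCache)."""

    def __init__(self, search_fn: Callable[[str], List[str]], max_entries: int = 1000):
        self._search_fn = search_fn
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, List[str]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def search(self, query: str) -> List[str]:
        if query in self._entries:
            self.hits += 1
            self._entries.move_to_end(query)  # mark as most recently used
            return self._entries[query]
        self.misses += 1
        results = self._search_fn(query)  # the expensive call to the index
        self._entries[query] = results
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return results


# Usage with a stand-in backend that records how often it is really called.
backend_calls: List[str] = []


def fake_backend(query: str) -> List[str]:
    backend_calls.append(query)
    return [query + "-result"]


cache = SearchResultCache(fake_backend, max_entries=2)
cache.search("lucene")
cache.search("lucene")  # served from the cache, the backend is not called again
print(len(backend_calls), cache.hits, cache.misses)
```

Keying on the query string alone only works because the results do not depend on the user; a real deployment would also need to expire entries when the index changes.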
Otis Gospodnetic wrote:
--- "Kevin L. Cobb" <[EMAIL PROTECTED]> wrote:
Open Source C/C++ only? When are you going to include Open Source
Java? We demand fair treatment ;)
There are several related sites:
http://www.searchmorph.com/
Thanks for ref Otis. I run this site, and primarily inde
Nice write up.
One other nice thing I noticed is you seem to sort numeric attributes
numerically instead of alphabetically e.g. here:
http://reviews.cnet.com/4566-3156_7-0.html?filter=500193_5314692_
see the 3rd col, "Find by max speed", and note that it has choices in
this order:
< 2