A while back we had some problems with the Visvo index. For instance, if
you searched for dallas, Google was returned first. This was due to
inbound link text: one link to Google had inbound link text that said
dallas. My response (looking back, not a very good one) was to test the
Wikia index without inbound link text.
I think the answer lies in finding the right links. I was going to
start with a filter that computes some similarity measure over the
links and orders them from most clustered (most similar) to least, the
idea being that the truly relevant links will be the most numerous and
the most similar to each other (who knows if that is true). The one
thing I am worried about with this is Googlebombing. Any ideas on how
they get around that?
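
A rough sketch of the similarity filter described above, assuming
Jaccard overlap on word tokens as the similarity measure (the thread
does not name one). The function names and sample anchor texts are made
up for illustration; this is not Nutch code.

```python
def tokens(text):
    """Lowercase an anchor text and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_anchors(anchors, keep):
    """Order anchor texts by total similarity to the others, keep the top ones."""
    token_sets = [tokens(a) for a in anchors]
    scored = []
    for i, anchor in enumerate(anchors):
        # Score = summed similarity to every other anchor text.
        score = sum(jaccard(token_sets[i], other)
                    for j, other in enumerate(token_sets) if j != i)
        scored.append((score, anchor))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [anchor for _, anchor in scored[:keep]]


# Hypothetical inbound link texts for one page.
anchors = ["Google", "google search", "Google", "dallas", "search engine google"]
print(rank_anchors(anchors, keep=3))
```

With this sample the lone "dallas" anchor scores zero and is dropped.
Note that the same scoring rewards many identical anchors, so on its
own it does nothing against Googlebombing.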
Dennis Kubes
Otis Gospodnetic wrote:
Hm, I didn't see that comment before. I think indexing incoming text is super
obvious, the equivalent to human annotation/tagging of web pages, no?
As for which anchor texts not to index.... hm, not sure. Nothing from spam
pages? Nothing from non-authoritative pages even if they are not spam? bah,
frozen brain, that's all I can think of now. :(
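
A minimal sketch of the two exclusions suggested above (nothing from
spam pages, nothing from non-authoritative pages). The `Inlink` record,
its `is_spam` flag, its `authority` score and the 0.5 threshold are all
assumptions made for illustration; how spam or authority would actually
be determined is left open here.

```python
from dataclasses import dataclass


@dataclass
class Inlink:
    """One inbound link; every field is hypothetical for this sketch."""
    source_url: str
    anchor_text: str
    is_spam: bool      # assumed to come from some spam classifier
    authority: float   # assumed 0..1 score for the linking page


def indexable_anchors(inlinks, min_authority=0.5):
    """Keep anchor text only from non-spam pages at or above min_authority."""
    return [link.anchor_text for link in inlinks
            if not link.is_spam and link.authority >= min_authority]


inlinks = [
    Inlink("http://example.com/a", "Google", False, 0.9),
    Inlink("http://example.org/b", "dallas", True, 0.8),          # spam page
    Inlink("http://example.net/c", "search engine", False, 0.1),  # low authority
]
print(indexable_anchors(inlinks))
```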
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Dennis Kubes <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, January 10, 2008 12:17:21 PM
Subject: Inbound Link Text
So a while back I said I thought it might be better to NOT have inbound
link text indexed. One of the great things about Search Wikia and Visvo
is that we now have two comparable indexes to play with. The current
Search Wikia index was created without indexing inbound link text, while
the current Visvo index was created with inbound link text indexed. Here
are two comparable queries:
http://re.search.wikia.com/search#pydev
Try this query and note the results, then click on the Visvo index link
on the right-hand side and note the results there. There are explain
links for both. The major difference is inbound link text being indexed.
So I was way wrong before: indexing inbound link text is essential to
web search. Now we just need to find a better way to filter which
inbound link text is included and which is not. Right now the first x
links are included. Any ideas on how to improve that filtering so we
know which links to include?
Dennis Kubes