Hm, I didn't see that comment before. I think indexing inbound link text is the obvious thing to do: it's the equivalent of human annotation/tagging of web pages, no?
As for which anchor texts not to index... hm, not sure. Nothing from spam pages? Nothing from non-authoritative pages, even if they are not spam? Bah, frozen brain, that's all I can think of now. :(

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Dennis Kubes <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, January 10, 2008 12:17:21 PM
Subject: Inbound Link Text

So a while back I said I thought it might be better to NOT have inbound link text indexed. One of the great things about Search Wikia and Visvo is that we now have two comparable indexes to play with. The current Search Wikia index was created without indexing inbound link text, while the current Visvo index was created with inbound link text indexed.

Here are two comparable queries:

http://re.search.wikia.com/search#pydev

Try this query, notice the results, then click on the Visvo index link on the right-hand side and notice the results. There are explain links for both. The major difference is inbound link text being indexed.

So I was way wrong before: indexing inbound link text is essential to web search. Now we just need to find a better way to filter which inbound link text is included and which is not. Right now the first x links are included. Any ideas on ways to make that filtering better, to know which links to include?

Dennis Kubes
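
The filtering ideas raised in this thread (nothing from spam pages, nothing from non-authoritative pages, then a cap instead of "the first x") could be sketched roughly as below. This is only an illustration in Python, not how Nutch, Search Wikia, or Visvo actually select anchors. The `InboundLink` type, the `source_score` and `is_spam` fields, and the `max_anchors` and `min_score` parameters are all hypothetical names; where a spam flag or authority score would come from is left open.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class InboundLink:
    """One inbound link to a page (hypothetical structure for this sketch)."""
    source_url: str
    anchor_text: str
    source_score: float      # assumed authority score of the linking page
    is_spam: bool = False    # assumed output of some spam classifier


def select_anchor_texts(links: Iterable[InboundLink],
                        max_anchors: int = 10,
                        min_score: float = 0.5) -> List[str]:
    """Pick which inbound anchor texts to index for one target page."""
    # Drop links from spam pages, low-authority pages, and empty anchors.
    candidates = [
        link for link in links
        if not link.is_spam
        and link.source_score >= min_score
        and link.anchor_text.strip()
    ]
    # Prefer anchors from the most authoritative sources rather than
    # whichever links happened to be seen first.
    candidates.sort(key=lambda link: link.source_score, reverse=True)

    seen = set()
    selected: List[str] = []
    for link in candidates:
        # Treat anchors that differ only in case or whitespace as duplicates.
        key = " ".join(link.anchor_text.lower().split())
        if key in seen:
            continue
        seen.add(key)
        selected.append(link.anchor_text.strip())
        if len(selected) >= max_anchors:
            break
    return selected


example_links = [
    InboundLink("http://a.example/", "PyDev", 0.9),
    InboundLink("http://b.example/", "pydev", 0.8),
    InboundLink("http://spam.example/", "cheap pills", 0.95, is_spam=True),
    InboundLink("http://c.example/", "Eclipse Python plugin", 0.6),
    InboundLink("http://d.example/", "click here", 0.1),
]
selected_example = select_anchor_texts(example_links)
```

With the example data, the spam link and the low-scoring "click here" link are dropped, and "pydev" is treated as a duplicate of "PyDev". Whether deduplication is desirable is a judgment call, since repeated anchors may themselves be a useful signal.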
