Hi,

Because most of the internet is garbage, I'd like not to index garbage. There 
is a huge number of pages that consist of just links and almost no text. 

To filter these pages out I intend to build an indexing filter. The problem is 
how to detect whether a page should be considered a link page. From what I've 
seen, such pages should show a distinct ratio between the amount of text and 
the number of outlinks, both to the same domain and to other domains.
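
To make the idea concrete, here is a minimal sketch in Python (standard 
library only) of the kind of ratio I have in mind. Everything in it is an 
assumption for illustration: the names LinkTextCounter, text_per_link and 
is_link_page are made up, text is measured as non-whitespace characters 
outside anchors, and the threshold of 50 characters per link is an untuned 
guess, not a value taken from any literature.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkTextCounter(HTMLParser):
    """Hypothetical helper: counts non-anchor text and outlinks in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.text_chars = 0      # non-whitespace chars outside <a>/<script>/<style>
        self.internal_links = 0  # http(s) outlinks to the same host
        self.external_links = 0  # http(s) outlinks to other hosts
        self._skip_depth = 0     # > 0 while inside <script> or <style>
        self._anchor_depth = 0   # > 0 while inside <a>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._anchor_depth += 1
            href = dict(attrs).get("href")
            if not href:
                return
            target = urlparse(urljoin(self.base_url, href))
            if target.scheme not in ("http", "https"):
                return  # ignore mailto:, javascript:, etc.
            if target.netloc == self.base_host:
                self.internal_links += 1
            else:
                self.external_links += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._anchor_depth:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Anchor text is not counted, so a page of pure links scores zero.
        if self._skip_depth == 0 and self._anchor_depth == 0:
            self.text_chars += len("".join(data.split()))


def text_per_link(html, base_url):
    """Return non-anchor text characters per outlink (links floored at 1)."""
    counter = LinkTextCounter(base_url)
    counter.feed(html)
    counter.close()
    links = counter.internal_links + counter.external_links
    return counter.text_chars / max(links, 1)


def is_link_page(html, base_url, min_chars_per_link=50.0):
    """Flag a page as a link page; the default threshold is an untuned guess."""
    return text_per_link(html, base_url) < min_chars_per_link
```

The sketch lumps internal and external links together; the counter keeps 
them separate so that they could be weighted differently. Note that a page 
with almost no text and no links at all would also be flagged, which may or 
may not be what one wants.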

My question: has anyone come across literature on this topic? Or has someone 
already defined such a ratio?

Thanks!
