Hi,

Because most of the internet is garbage, I'd like to avoid indexing garbage. A huge number of pages consist of just links and almost no text.
To filter these pages out, I intend to build an indexing filter. The problem is how to detect whether a page should be considered a link page. From what I've seen, there should be a distinct ratio between the amount of text and the number of outlinks to the same and other domains. My question: has anyone come across literature on this topic? Or has anyone already defined such a ratio? Thanks!
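
To make the idea concrete, here is a rough Python sketch of the kind of check I have in mind. It counts visible text characters outside of anchors and divides by the number of outlinks, split into same-domain and other-domain links. All names here (LinkTextCounter, looks_like_link_page, min_chars_per_link) are made up for illustration, and the threshold of 50 characters per link is an arbitrary placeholder, not a value taken from any literature.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkTextCounter(HTMLParser):
    """Counts outlinks and visible text characters that sit outside anchors."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.internal_links = 0   # links to the same host as the page
        self.external_links = 0   # links to any other host
        self.text_chars = 0       # non-anchor text, whitespace-trimmed
        self._anchor_depth = 0
        self._skip_depth = 0      # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._anchor_depth += 1
            href = dict(attrs).get("href")
            if not href:
                return
            target = urlparse(urljoin(self.base_url, href))
            if target.scheme not in ("http", "https"):
                return  # ignore mailto:, javascript:, and similar
            if target.netloc == self.base_host:
                self.internal_links += 1
            else:
                self.external_links += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._anchor_depth:
            self._anchor_depth -= 1

    def handle_data(self, data):
        # Anchor text is excluded so link labels do not count as content.
        if self._skip_depth == 0 and self._anchor_depth == 0:
            self.text_chars += len(data.strip())


def count_page(html, base_url):
    """Parse the page and return the filled-in counter."""
    counter = LinkTextCounter(base_url)
    counter.feed(html)
    counter.close()
    return counter


def looks_like_link_page(html, base_url, min_chars_per_link=50.0):
    """Guess whether a page is mostly links.

    min_chars_per_link is a placeholder threshold (an assumption);
    it would need tuning against real crawl data.
    """
    counter = count_page(html, base_url)
    links = counter.internal_links + counter.external_links
    if links == 0:
        return False  # a page without outlinks cannot be a link page
    return counter.text_chars / links < min_chars_per_link
```

Calling looks_like_link_page(html, "http://example.com/") on a fetched page would then give a yes/no answer. Whether same-domain and other-domain links should be weighted differently is exactly the part I am unsure about.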

