> I think algorithm #1 is what Google uses.
> Google ignores content that does not change from page to page, as well
> as content that isn't part of a block of text.

Are you sure?
Take a look at these search results:
http://www.google.com/search?hl=en&hs=otT&lr=&c2coff=1&safe=off&client=firefox-a&rls=org.mozilla:en-US:official&pwst=1&q=+site:gamingalmanac.com+global+gaming+almanac
... and you will notice that menus are indexed by Google and displayed in
the result summaries.

But if you can contribute an HtmlParseFilter that is able to remove menus and
navigation, it will be a real improvement.
A first step, which I developed in a previous project many years ago, is
to remove pages that contain textual content only in links: this avoids
indexing frames or iframes that contain only some navigation text...
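
To make that first step concrete, here is a minimal sketch of the heuristic. It is
written in standalone Python using only the standard library's html.parser, not as
a Nutch HtmlParseFilter, and it is not the code from that earlier project. The names
LinkTextCounter and is_navigation_only are hypothetical, and two choices are
assumptions of mine: only letters and digits count as text (so separators such as
"|" between links are ignored), and the contents of script, style and title are
skipped.

```python
from html.parser import HTMLParser


class LinkTextCounter(HTMLParser):
    """Counts alphanumeric characters inside and outside <a> elements."""

    # Assumption: text in these elements is not page content.
    IGNORED = {"script", "style", "title"}

    def __init__(self):
        super().__init__()
        self.link_depth = 0      # > 0 while inside an <a> element
        self.ignored_depth = 0   # > 0 while inside an ignored element
        self.link_chars = 0      # alphanumeric characters found in links
        self.other_chars = 0     # alphanumeric characters found elsewhere

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in self.IGNORED:
            self.ignored_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1
        elif tag in self.IGNORED and self.ignored_depth > 0:
            self.ignored_depth -= 1

    def handle_data(self, data):
        if self.ignored_depth:
            return
        # Assumption: only letters and digits count as textual content.
        n = sum(1 for ch in data if ch.isalnum())
        if self.link_depth:
            self.link_chars += n
        else:
            self.other_chars += n


def is_navigation_only(html):
    """Return True if the page has text, and all of it sits inside links."""
    counter = LinkTextCounter()
    counter.feed(html)
    counter.close()
    return counter.link_chars > 0 and counter.other_chars == 0
```

A page for which is_navigation_only() returns True would be a candidate for
removal before indexing. A real filter would probably want a ratio threshold
rather than this strict all-or-nothing test.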

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
