Re: Avoid indexing common html to all pages, promoting page titles.

Andrzej Bialecki Fri, 12 Mar 2010 05:56:06 -0800

On 2010-03-12 12:52, Pedro Bezunartea López wrote:

Hi,


I'm developing a site that shows the dynamic content in a <div
id="content">; the rest of the page doesn't really change. I'd like to store
and index only the contents of this <div>, basically to avoid re-indexing
the same content (header, footer, menu) over and over.

I've checked the WritingPluginExample-0.9 howto, but I couldn't figure out a
couple of things:

1.- Should I extend the parse-html plugin, or should I just replace it?

You should write an HtmlParseFilter and extract only the portions that you care about, and then replace the output parseText with your extracted text.
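Here is a minimal sketch of the extraction step. It assumes the Nutch 1.x arrangement in which an HtmlParseFilter receives the parsed page as a W3C DOM `DocumentFragment`. The class `ContentDivExtractor` and its methods are hypothetical, and only standard JDK DOM classes are used, so the walk can be tried outside Nutch. Inside a real filter you would call `extractText` on the fragment and, if it returns non-null, store the result as the new parse text. Check the `ParseResult` and `ParseText` javadoc for your Nutch version to find the exact calls.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Hypothetical helper (not part of Nutch): finds the element with a given id
 * and returns its text. In an HtmlParseFilter the root would be the
 * DocumentFragment handed to the filter; here any DOM Node works.
 */
public class ContentDivExtractor {

  /** Depth-first search for the first element whose "id" attribute equals targetId. */
  static Element findById(Node node, String targetId) {
    if (node.getNodeType() == Node.ELEMENT_NODE) {
      Element el = (Element) node;
      // Assumes the attribute name is the lowercase "id"; getAttribute returns "" if absent.
      if (targetId.equals(el.getAttribute("id"))) {
        return el;
      }
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Element found = findById(children.item(i), targetId);
      if (found != null) {
        return found;
      }
    }
    return null;
  }

  /** Returns the whitespace-collapsed text under the element, or null if it is missing. */
  public static String extractText(Node root, String targetId) {
    Element el = findById(root, targetId);
    if (el == null) {
      return null; // caller keeps the parse text it already has
    }
    return el.getTextContent().trim().replaceAll("\\s+", " ");
  }

  /** Test convenience: parses well-formed XHTML with the JDK parser, then extracts. */
  public static String extractFromXml(String xhtml, String targetId) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
      return extractText(doc, targetId);
    } catch (Exception e) {
      throw new IllegalArgumentException("could not parse input", e);
    }
  }

  public static void main(String[] args) {
    String page = "<html><body><div id=\"menu\">Home About</div>"
        + "<div id=\"content\"><p>Dynamic   part</p> <p>more</p></div>"
        + "<div id=\"footer\">Copyright</div></body></html>";
    System.out.println(extractFromXml(page, "content"));
  }
}
```

Running `main` prints `Dynamic part more`. When the div is missing, the method returns null. The filter can then fall back to the text that parse-html already produced, so pages that lack the expected layout are still indexed.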

2.- The example talks about finding a meta tag, extracting some information
from it, and adding a field to the index. I think I just need to get rid of
all HTML except the <div id="content"> tag and index its content. Can someone
point me in the right direction?


See above.

And just one more thing: I'd like to give a higher score to pages in which the
search terms appear in the title. Right now, pages that contain the terms in
the body rank higher than those that contain them in the title.
How could I modify this behaviour?

You can define these weights in the configuration; look for the query boost properties.
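The following toy model shows why raising a field weight changes the ordering. It is deliberately simplified and is not the scoring formula that Nutch or Lucene actually uses. In this model, a page's score is the sum of per-field hit counts, each multiplied by a per-field boost. In Nutch 1.x, these weights are read from properties such as `query.title.boost`. See nutch-default.xml for the full list and the defaults in your version; overrides normally go in nutch-site.xml. The `Page` class, the hit counts, and the boost values below are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model of per-field query boosts. This is NOT the scoring formula used by
 * Nutch or Lucene; it only shows how a larger title weight can reorder results.
 */
public class FieldBoostDemo {

  /** Hypothetical summary of one page: query-term hits in the title and in the body. */
  public static final class Page {
    final String url;
    final int titleHits;
    final int bodyHits;

    public Page(String url, int titleHits, int bodyHits) {
      this.url = url;
      this.titleHits = titleHits;
      this.bodyHits = bodyHits;
    }
  }

  /** Linear combination: each field's hit count is multiplied by that field's boost. */
  static double score(Page p, double titleBoost, double bodyBoost) {
    return titleBoost * p.titleHits + bodyBoost * p.bodyHits;
  }

  /** Returns page URLs ordered from highest to lowest toy score. */
  public static List<String> rank(List<Page> pages, double titleBoost, double bodyBoost) {
    List<Page> sorted = new ArrayList<>(pages);
    sorted.sort(Comparator.comparingDouble((Page p) -> score(p, titleBoost, bodyBoost)).reversed());
    List<String> urls = new ArrayList<>();
    for (Page p : sorted) {
      urls.add(p.url);
    }
    return urls;
  }

  public static void main(String[] args) {
    List<Page> pages = List.of(
        new Page("title-match.html", 1, 0),
        new Page("body-match.html", 0, 3));
    // Weak title boost: 1.5 * 1 = 1.5 loses to 1.0 * 3 = 3.0.
    System.out.println(rank(pages, 1.5, 1.0));
    // Stronger title boost: 4.0 * 1 = 4.0 beats 3.0.
    System.out.println(rank(pages, 4.0, 1.0));
  }
}
```

With a title boost of 1.5, three body hits (3.0) outrank one title hit (1.5). Raising the title boost to 4.0 reverses the order. Real Lucene scores also depend on term frequency, length normalisation, and other factors, so the boost you need in practice has to be found by experiment.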


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
