Re: Indexing parts of an HTML file differently

Michael Clivot Thu, 27 Mar 2014 01:29:23 -0700

Thanks for your answer Jack.
@Gora:

> How are you fetching the HTML content, and indexing it into Solr?

We are using Solr with the OpenText Delivery Server. The Delivery Server 
generates HTML representations of the published pages and writes them to a 
directory, which Solr reads to get the content to index.

> It is probably best to handle this requirement at that point. Haven't used 
> Nutch ( http://nutch.apache.org/) recently, but you might be able to use it 
> for this.

Do you mean the web crawler approach? At first glance, it does not fit us very 
well. In that case we would need to implement the OpenText search layer 
ourselves. Theoretically, we could try to teach the Delivery Server to 
understand external indexes. But crawling itself is not our preferred solution: 
it is not as responsive as the Delivery Server approach, and where 
authorization restrictions exist, we would need a separate crawler user for 
every role, etc.
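
One way to handle this at indexing time, without a crawler, would be to split each generated HTML file on the country markers before it reaches Solr, and index one document per country. The following is only a minimal sketch under assumptions: the field names `id`, `country`, and `content`, the function names, and the idea of a preprocessing step between the Delivery Server output directory and Solr are all hypothetical, not part of our current setup.

```python
import re

# Matches "<!-- Country: FR, BE -->" ... "<!-- Country End -->".
# The closing "-+>" tolerates extra dashes ("--->"), as in the example markup.
BLOCK_RE = re.compile(
    r"<!--\s*Country:\s*(?P<codes>[A-Za-z,\s]+?)\s*-+>"
    r"(?P<body>.*?)"
    r"<!--\s*Country End\s*-+>",
    re.DOTALL,
)


def split_by_country(html):
    """Return ({country_code: text}, shared_text) for one HTML page."""
    parts = {}
    for match in BLOCK_RE.finditer(html):
        body = match.group("body").strip()
        for code in match.group("codes").split(","):
            code = code.strip().upper()
            if code:
                parts.setdefault(code, []).append(body)
    # Everything outside the country blocks is shared by all countries.
    shared = BLOCK_RE.sub("", html).strip()
    by_country = {code: "\n".join(chunks) for code, chunks in parts.items()}
    return by_country, shared


def build_docs(page_id, html):
    """Build one document per country (hypothetical field names)."""
    by_country, shared = split_by_country(html)
    return [
        {
            "id": f"{page_id}_{code}",
            "country": code,
            "content": (shared + "\n" + body).strip(),
        }
        for code, body in sorted(by_country.items())
    ]
```

With documents shaped like this, the search for the current country could add a filter query such as `fq=country:FR`, so that only the matching parts appear in the result list. Pages without any country markers would need separate handling, which this sketch leaves out.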

-----Original Message-----
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Tuesday, 25 March 2014 11:32
To: solr-user@lucene.apache.org
Subject: Re: Indexing parts of an HTML file differently

On 25 March 2014 15:59, Michael Clivot <cli...@netmedia.de> wrote:
> Hello,
>
> I have the following issue and need help:
>
> One HTML file has different parts for different countries.
> For example:
>
> <!-- Country: FR, BE --->
> ....
> Address for France and Benelux
> ....
> <!-- Country End -->
> <!-- Country: CH -->
> ....
> Address for Switzerland
> ....
> <!-- Country End -->
>
> Depending on a parameter, I show or hide the parts on the website.
> Logically, all parts are in the index and therefore all items are found by 
> Solr.
> My question is: how can I have only the items for the current country in my 
> result list?

How are you fetching the HTML content, and indexing it into Solr?
It is probably best to handle this requirement at that point. Haven't used 
Nutch ( http://nutch.apache.org/ ) recently, but you might be able to use it 
for this.

Regards,
Gora
