Can you get Delivery Server to generate a Solr-style XML or JSON update file? That might be easier than generating and then re-parsing HTML.
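
To illustrate the idea, here is a minimal sketch in Python. It is not Delivery Server code: it only shows the kind of per-country documents an update file could contain. The marker syntax is taken from the example in the original question; the function name build_solr_docs, the page id "contact", and the field names id, page_id, country and content are all hypothetical and would have to match your own schema. Text outside the country markers is ignored here.

```python
import json
import re

# One country-specific section, delimited by the comment markers from the
# original question. "-+>" also tolerates the "--->" variant seen there.
SECTION_RE = re.compile(
    r"<!--\s*Country:\s*(?P<codes>[A-Z, ]+?)\s*-+>"
    r"(?P<body>.*?)"
    r"<!--\s*Country End\s*-+>",
    re.DOTALL,
)
# Crude tag stripper; good enough for a sketch, not a real HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def build_solr_docs(page_id, html):
    """Return one dict per country section (field names are assumptions)."""
    docs = []
    for n, match in enumerate(SECTION_RE.finditer(html)):
        countries = [c.strip() for c in match.group("codes").split(",") if c.strip()]
        # Drop tags and collapse whitespace to get plain text for indexing.
        text = " ".join(TAG_RE.sub(" ", match.group("body")).split())
        docs.append(
            {
                "id": f"{page_id}#{n}",   # hypothetical unique key
                "page_id": page_id,       # hypothetical field
                "country": countries,     # assumed multi-valued field
                "content": text,          # hypothetical field
            }
        )
    return docs


if __name__ == "__main__":
    sample = (
        "<!-- Country: FR, BE --->\n"
        "<p>Address for France and Benelux</p>\n"
        "<!-- Country End -->\n"
        "<!-- Country: CH -->\n"
        "<p>Address for Switzerland</p>\n"
        "<!-- Country End -->\n"
    )
    # A JSON array of documents is one form of update file Solr accepts.
    print(json.dumps(build_solr_docs("contact", sample), indent=2))
```

With documents shaped like this, restricting results to the current country could be a filter query such as fq=country:CH, assuming the country field is indexed and multi-valued in your schema.
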
Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Thu, Mar 27, 2014 at 3:28 PM, Michael Clivot <cli...@netmedia.de> wrote:
> Thanks for your answer Jack.
> @Gora:
>
>> How are you fetching the HTML content, and indexing it into Solr?
>
> We are using Solr with the OpenText Delivery Server. The Delivery Server
> generates HTML representations of the published pages and writes them to the
> directory, which is used by Solr to get data content.
>
>> It is probably best to handle this requirement at that point. Haven't used
>> Nutch ( http://nutch.apache.org/) recently, but you might be able to use it
>> for this.
>
> Do you mean the web crawler way? At first glance, it does not fit us very
> well. In this case we would need to implement the OpenText Search layer
> ourselves. Theoretically, we can try to teach Delivery Server to understand
> external indexes. But crawling itself is not the preferred solution: it is
> not as responsive as the DS way; where authorization restrictions exist,
> there would have to be a separate crawler user for every role; etc...
>
> -----Original Message-----
> From: Gora Mohanty [mailto:g...@mimirtech.com]
> Sent: Tuesday, 25 March 2014 11:32
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing parts of an HTML file differently
>
> On 25 March 2014 15:59, Michael Clivot <cli...@netmedia.de> wrote:
>> Hello,
>>
>> I have the following issue and need help:
>>
>> One HTML file has different parts for different countries.
>> For example:
>>
>> <!-- Country: FR, BE --->
>> ....
>> Address for France and Benelux
>> ....
>> <!-- Country End -->
>> <!-- Country: CH -->
>> ....
>> Address for Switzerland
>> ....
>> <!-- Country End -->
>>
>> Depending on a parameter, I show or hide the parts on the website.
>> Logically, all parts are in the index and therefore all items are found by
>> Solr.
>> My question is: how can I have only the items for the current country in my
>> result list?
>
> How are you fetching the HTML content, and indexing it into Solr?
> It is probably best to handle this requirement at that point. Haven't used
> Nutch ( http://nutch.apache.org/ ) recently, but you might be able to use it
> for this.
>
> Regards,
> Gora