--- On Thu, 6/9/11, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:
From: Bryan Loofbourrow bloofbour...@knowledgemosaic.com
Subject: Displaying highlights in formatted HTML document
To: solr-user@lucene.apache.org
Date: Thursday, June 9, 2011, 2:14 AM
Here is my use case:
Hi Bryan,
How do you index your HTML files? I mean, do you create fields for different
parts of your document (for different stop word lists, stemming, etc.)?
With DIH, SolrJ, or something else?
iorixxx, could you please explain your solution a bit more? I don't
see how it could give exact highlighting, I mean with the different
analysis for each field.
It does not work with your use case (e.g. different synonyms applied to different
parts of the HTML/XML
Ludovic,
How do you index your HTML files? I mean, do you create fields for different
parts of your document (for different stop word lists, stemming, etc.)?
With DIH, SolrJ, or something else?
We are sending them over HTTP, and using Tika to strip the HTML, at present.
We do not split
I am not (yet) a Tika user; perhaps iorixxx's solution is good for you.
We will share the highlighter module and 2 other developments soon (I have
to see how to do that).
Ludovic.
-
Jouve
France.
-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, June 08, 2011 11:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Displaying highlights in formatted HTML document
--- On Thu, 6/9/11, Bryan Loofbourrow bloofbour...@knowledgemosaic.com wrote:
OK, I think I see what you're up to. Might be pretty viable for me as well.
Can you talk about anything in your mappings.txt files that is an
important part of the solution?
It is not important. I just copied it. Plus, the HTML strip char filter does not
have a mappings parameter. It was a copy
Yes, I asked the wrong question. What I was subconsciously getting at is
this: how are you avoiding the possibility of getting hits in the HTML
elements? Is that accomplished by putting tag names in your stopwords, or
by some other mechanism?
HtmlStripCharFilter removes HTML tags. After it
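
To illustrate the idea discussed in this thread (tags are dropped before tokenizing, so tag and attribute names can never produce hits, while the highlight still lands in the original formatted HTML), here is a minimal Python sketch. It is not Solr's HtmlStripCharFilter and not anyone's actual setup from this thread; the function names `text_tokens` and `highlight` are hypothetical, and the regex-based tag matching is a deliberate simplification that ignores entities, comments, and scripts. The only point it demonstrates is that tokens keep their offsets into the original markup.

```python
import re

# Simplified tag and word patterns (assumption: well-formed tags, no '>' inside attributes).
TAG = re.compile(r"<[^>]*>")
WORD = re.compile(r"\w+")


def text_tokens(html):
    """Yield (term, start, end) for words outside tags.

    Offsets refer to positions in the ORIGINAL html string, so a match
    can be wrapped in place without disturbing the markup.
    """
    pos = 0
    for tag in TAG.finditer(html):
        # Tokenize only the text between the previous tag and this one.
        for m in WORD.finditer(html, pos, tag.start()):
            yield m.group().lower(), m.start(), m.end()
        pos = tag.end()
    # Trailing text after the last tag.
    for m in WORD.finditer(html, pos):
        yield m.group().lower(), m.start(), m.end()


def highlight(html, query, pre="<em>", post="</em>"):
    """Wrap every text token equal to `query` (case-insensitive) in pre/post."""
    wanted = query.lower()
    out, last = [], 0
    for term, start, end in text_tokens(html):
        if term == wanted:
            out.append(html[last:start])
            out.append(pre + html[start:end] + post)
            last = end
    out.append(html[last:])
    return "".join(out)


doc = '<p class="note">A note about <b>Solr</b> notes.</p>'
print(highlight(doc, "note"))
```

In this sketch the word "note" inside the `class` attribute is never tokenized, so only the occurrence in the visible text is wrapped, and the surrounding tags come through unchanged.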