problems with PhraseHighlighter
Hello everyone,

I am having problems with highlighting the complete text of a field. I have an xml field, and I am running proximity searches on it:

xml: ( proximity1 AND/OR proximity2 AND/OR …)

Results are returned successfully and satisfy the proximity query. However, when I request highlighting, it sometimes returns nothing and sometimes returns snippets in which some of the proximity terms are missing.

I set maxFieldLength to Integer.MAX_VALUE in solrconfig.xml:

<maxFieldLength>2147483647</maxFieldLength>

I am using these highlighting parameters:

hl.maxAnalyzedChars=2147483647
hl.fragsize=2147483647
hl.usePhraseHighlighter=true
hl.requireFieldMatch=true
hl.fl=xml
hl=true

I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false, but that didn't help. When I set hl.usePhraseHighlighter=false, highlighting is returned, but every query term is highlighted, not only the ones that satisfy the proximity constraint.

My questions:

What value of hl.fragsize should I use to highlight the complete text of a field: 0 or 2147483647?

What is the highest value that I can set for hl.maxAnalyzedChars and hl.fragsize?

I am querying the same field that I request in highlighting. Although a document matches the query, no highlighting comes back. What could be the reason? If a document matches a query, some highlighting should be returned, right?

Any help or pointers are really appreciated.
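For anyone who wants to reproduce the request, here is a minimal Python sketch that assembles the highlighting parameters listed above into a select URL. The host, port, handler path, the helper name build_highlight_url, and the sample query text are assumptions for illustration and do not come from the actual setup; only the hl.* parameter names and values are taken from the message.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration; the real host, port and handler path may differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_url(query, field="xml", fragsize="2147483647"):
    """Build a select URL carrying the highlighting parameters from the post.

    `build_highlight_url` is a hypothetical helper, not part of Solr.
    """
    params = {
        "q": query,
        "hl": "true",
        "hl.fl": field,
        "hl.usePhraseHighlighter": "true",
        "hl.requireFieldMatch": "true",
        "hl.maxAnalyzedChars": "2147483647",
        "hl.fragsize": fragsize,
    }
    # urlencode percent-encodes quotes, colons and spaces in the query string.
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Sample proximity query (assumed text) against the xml field.
url = build_highlight_url('xml:"term1 term2"~5')
print(url)
```

Passing fragsize="0" produces the other variant that was tried, so the two responses can be compared side by side.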
Re: problems with PhraseHighlighter
Copy-paste your field definition for the field you are trying to highlight/search on.

Cheers
Avlesh

On Sun, Nov 1, 2009 at 8:24 PM, AHMET ARSLAN iori...@yahoo.com wrote:
> Hello everyone, I am having problems with highlighting the complete text of a field. [...]
Re: problems with PhraseHighlighter
> Copy-paste your field definition for the field you are trying to highlight/search on.
> Cheers
> Avlesh

Thank you for your interest, Avlesh.

My field type mostly consists of custom filters and tokenizers:

    <fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="XMLStripStandardTokenizerFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt" ignoreCase="true" expand="true" />
        <filter class="CustomStemFilterFactory" protected="protwords.txt" />
        <filter class="LowerCaseFilterFactory" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="CustomTokenizerFactory" />
        <filter class="CustomDeasciifyFilterFactory" />
        <filter class="CustomStemFilterFactory" protected="protwords.txt" />
        <filter class="LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>

At first I tried solr.HTMLStripCharFilterFactory to strip the xml tags. Searching works fine with it, but in highlighting the <em> tags are placed at incorrect positions. The same happens with solr.HTMLStripStandardTokenizerFactory. Interestingly, the <em> tags are inserted exactly one character before the actual term.

So I added a new token definition to StandardTokenizer's jflex file that recognizes xml tags and ignores them. I confirmed with some test cases that it works. It strips xml tags at the tokenizer level. I am doing this because I display the original documents with xml + xslt, so I need highlighted xml files to display.

I am also using ComplexPhraseQueryParser [1], but I reproduced the problem with the standard parser:

defType=lucene&q="term1 term2"~5

I can see that term1 and term2 are within 5 terms of each other in the document, which is why it is returned. But highlighting is empty. There are no xml tags (which the tokenizer would strip) between those terms in the original document.

The hl.maxAnalyzedChars parameter applies to the original document, right? I mean, in my case, the text including the xml tags.

[1] http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html
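To make the offset issue concrete, here is a small Python sketch of the idea behind stripping tags at the tokenizer level: tags are skipped, but every token keeps its start and end offsets into the original text, so the <em> markers can be inserted into the unmodified xml. This is only an illustration under simplifying assumptions; the regex, the function names tokenize_skipping_tags and highlight, and the sample document are all made up here, and it is not the JFlex grammar, the Lucene tokenizer, or the Solr highlighter.

```python
import re

# Matches either a whole tag (to be skipped) or a run of word characters.
# This is a simplification and does not handle comments, CDATA or entities.
TAG_OR_WORD = re.compile(r"<[^>]*>|\w+")


def tokenize_skipping_tags(text):
    """Return (term, start, end) tuples; offsets refer to the ORIGINAL text."""
    tokens = []
    for match in TAG_OR_WORD.finditer(text):
        if match.group().startswith("<"):
            continue  # a tag: emit no token, but offsets keep counting it
        tokens.append((match.group().lower(), match.start(), match.end()))
    return tokens


def highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap matching terms in pre/post markers using the original offsets."""
    wanted = {term.lower() for term in terms}
    pieces = []
    last = 0
    for term, start, end in tokenize_skipping_tags(text):
        if term in wanted:
            pieces.append(text[last:start])
            pieces.append(pre + text[start:end] + post)
            last = end
    pieces.append(text[last:])
    return "".join(pieces)


doc = "<doc><p>Term1 is <b>near</b> term2</p></doc>"
print(highlight(doc, ["term1", "term2"]))
# prints: <doc><p><em>Term1</em> is <b>near</b> <em>term2</em></p></doc>
```

If the offsets were instead computed against the stripped text, every marker after the first tag would land at the wrong place in the original, which is the kind of shift described above.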