problems with PhraseHighlighter

2009-11-01 Thread AHMET ARSLAN

Hello everyone,

I am having problems highlighting the complete text of a field. I have an
xml field, and I run proximity searches against it:

xml:  ( proximity1 AND/OR proximity2 AND/OR …)

Results are returned correctly and satisfy the proximity query. However, when
I request highlighting, it sometimes returns nothing and sometimes returns
highlighting in which some of the proximity terms are missing.

I set my maxFieldLength to Integer.MAX_VALUE in solrconfig.xml:
<maxFieldLength>2147483647</maxFieldLength>

I am using these highlighting parameters:

hl.maxAnalyzedChars=2147483647
hl.fragsize=2147483647
hl.usePhraseHighlighter=true
hl.requireFieldMatch=true
hl.fl=xml
hl=true
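
For readers who want to reproduce the request, here is a minimal sketch of how the parameters above could be assembled into a query string using only the Java standard library. The class name, the base URL, and the sample query passed in main are hypothetical placeholders, not taken from the original setup.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class HighlightRequestSketch {

    // Builds the query-string part of a select request from the
    // highlighting parameters listed in the message. The query text
    // is supplied by the caller.
    static String buildQueryString(String query) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("q", query);
        params.put("hl", "true");
        params.put("hl.fl", "xml");
        params.put("hl.usePhraseHighlighter", "true");
        params.put("hl.requireFieldMatch", "true");
        params.put("hl.fragsize", String.valueOf(Integer.MAX_VALUE));
        params.put("hl.maxAnalyzedChars", String.valueOf(Integer.MAX_VALUE));

        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                // URL-encode both key and value so quotes, colons and
                // spaces in the query survive the trip.
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical base URL and sample query; adjust to your setup.
        String base = "http://localhost:8983/solr/select";
        System.out.println(base + "?" + buildQueryString("xml:\"term1 term2\"~5"));
    }
}
```

Encoding the parameters programmatically avoids the quotes and the & separators getting lost, which is easy to do when a request is pasted by hand.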

I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false, but that
didn't help. When I set hl.usePhraseHighlighter=false, highlighting is
returned, but all query terms are highlighted, regardless of proximity.

What value of hl.fragsize should I use to highlight the complete text of a
field: 0 or 2147483647?

What is the highest value that I can set for hl.maxAnalyzedChars and
hl.fragsize?

I am querying and requesting highlighting on the same field. Although a
document matches the query, no highlighting is returned. What could be the
reason?

If a document matches a query, some highlighting should be returned, right?

Any help or pointers are really appreciated.

Re: problems with PhraseHighlighter

2009-11-01 Thread Avlesh Singh

Copy-paste your field definition for the field you are trying to
highlight/search on.

Cheers
Avlesh


Re: problems with PhraseHighlighter

2009-11-01 Thread AHMET ARSLAN

Avlesh Singh wrote:
> Copy-paste your field definition for the field you are trying to
> highlight/search on.

Thank you for your interest, Avlesh.

My field type consists mostly of custom filters and tokenizers:

<fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="XMLStripStandardTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="CustomTokenizerFactory" />
    <filter class="CustomDeasciifyFilterFactory" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

At first I tried solr.HTMLStripCharFilterFactory to strip the XML tags. It
works fine, but when it comes to highlighting, the <em> tags are placed at the
wrong position. The same happens with solr.HTMLStripStandardTokenizerFactory.
Interestingly, the <em> tags are inserted exactly one character before the
actual term. So I added a new token definition to StandardTokenizer's JFlex
file that recognizes XML tags and ignores them, which strips the tags at the
tokenizer level. I confirmed that it works with some test cases. I am doing
this because I display the original documents with XML + XSLT, so I need to
highlight the XML files themselves.
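
To make the "one character before the actual term" symptom concrete: a highlighter inserts its tags at character offsets into the original stored text, so any stage that reports offsets shifted by one produces exactly this kind of misplacement. The sketch below only illustrates that mechanism with plain strings. It does not use Solr or Lucene, the class and method names are made up, and it does not claim to identify the actual cause in the filters mentioned above.

```java
class OffsetHighlightSketch {

    // Wraps original[start, end) in <em> tags, the way a highlighter
    // applies a token's character offsets to the ORIGINAL stored text.
    static String highlight(String original, int start, int end) {
        return original.substring(0, start)
                + "<em>" + original.substring(start, end) + "</em>"
                + original.substring(end);
    }

    public static void main(String[] args) {
        String original = "<p>solr rocks</p>";

        // Offsets of "solr" measured in the original text, markup included.
        int start = original.indexOf("solr");
        int end = start + "solr".length();
        System.out.println(highlight(original, start, end));

        // The same token with offsets shifted left by one character
        // (assumed here purely for illustration).
        System.out.println(highlight(original, start - 1, end - 1));
    }
}
```

With correct offsets the term is wrapped cleanly. With the shifted offsets the opening tag lands inside the preceding markup and the last letter of the term falls outside the highlight.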

And I am using ComplexPhraseQueryParser [1].

But I reproduced the problem with defType=lucene&q="term1 term2"~5. I can see
that term1 and term2 are within 5 terms of each other, so the document is
returned, but the highlighting is empty. And there are no XML tags (which the
tokenizer strips) between those terms in the original document.
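
As a rough way to check by hand whether two terms should satisfy a proximity query like the one above, here is a simplified model of a two-term sloppy phrase. It assumes one token per position with no synonyms, stemming, or position gaps, and it reflects my understanding of how slop is counted for two terms (a reversed pair costs two extra moves), not anything stated in this thread. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class ProximitySketch {

    // Simplified two-term model of a sloppy phrase "first second"~slop.
    // With first at position p and second at position q, the pair is
    // accepted when |(q - 1) - p| <= slop.
    static boolean matches(List<String> tokens, String first, String second, int slop) {
        for (int p = 0; p < tokens.size(); p++) {
            if (!tokens.get(p).equals(first)) {
                continue;
            }
            for (int q = 0; q < tokens.size(); q++) {
                if (q != p && tokens.get(q).equals(second)
                        && Math.abs((q - 1) - p) <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("term1", "a", "b", "c", "term2");
        System.out.println(matches(doc, "term1", "term2", 5));
        System.out.println(matches(doc, "term1", "term2", 2));
    }
}
```

If a check like this passes on the analyzed tokens but highlighting is still empty, the positions or offsets seen at highlight time are a reasonable next thing to compare against those seen at index time.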

The hl.maxAnalyzedChars parameter refers to the original document, right? In
my case, that would mean the XML tags are counted too.

[1]
http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html
