problems with PhraseHighlighter

2009-11-01 Thread AHMET ARSLAN
Hello everyone,

I am having problems highlighting the complete text of a field. I have an 
xml field, and I run proximity queries against it.

xml:  ( proximity1 AND/OR proximity2 AND/OR …)

Results are returned successfully and satisfy the proximity query. However, 
when I request highlighting, it sometimes returns nothing and sometimes 
returns highlights with some of the proximity terms missing.

I set maxFieldLength to Integer.MAX_VALUE in solrconfig.xml:
<maxFieldLength>2147483647</maxFieldLength>

I am using these highlighting parameters:

hl.maxAnalyzedChars=2147483647
hl.fragsize=2147483647
hl.usePhraseHighlighter=true
hl.requireFieldMatch=true
hl.fl=xml
hl=true
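
As a rough sketch of how these parameters end up in a request, the Python snippet below assembles a select URL. The endpoint (http://localhost:8983/solr/select), the helper name build_highlight_url, and the sample query string are assumptions made for illustration; only the hl.* values and the field name xml come from the list above.

```python
from urllib.parse import urlencode

INT_MAX = 2**31 - 1  # same value as Java's Integer.MAX_VALUE (2147483647)


def build_highlight_url(query, base="http://localhost:8983/solr/select"):
    """Return a select URL carrying the highlighting parameters listed above.

    `base` is a hypothetical endpoint; adjust it to the actual installation.
    """
    params = [
        ("q", query),
        ("hl", "true"),
        ("hl.fl", "xml"),
        ("hl.requireFieldMatch", "true"),
        ("hl.usePhraseHighlighter", "true"),
        ("hl.fragsize", str(INT_MAX)),
        ("hl.maxAnalyzedChars", str(INT_MAX)),
    ]
    # urlencode percent-encodes quotes, colons and spaces in the query string
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # hypothetical sample query, not taken from a real index
    print(build_highlight_url('xml:"term1 term2"~5'))
```

Building the URL through urlencode avoids hand-escaping the quotes and the tilde of a proximity query, which is an easy source of requests that silently differ from what was intended.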

I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false, but that 
didn't help. When I set hl.usePhraseHighlighter=false, highlighting is 
returned, but then all query terms are highlighted.

Which value of hl.fragsize should I use to highlight the complete text of a 
field: 0 or 2147483647?

What is the highest value that I can set for hl.maxAnalyzedChars and 
hl.fragsize?

I am querying and highlighting the same field. Although a document matches the 
query, no highlighting is returned. What could be the reason?

If a document matches a query, some highlighting should be returned, right?

Any help or pointers would be really appreciated.






Re: problems with PhraseHighlighter

2009-11-01 Thread Avlesh Singh
Please paste the definition of the field you are trying to
highlight/search on.

Cheers
Avlesh








Re: problems with PhraseHighlighter

2009-11-01 Thread AHMET ARSLAN
> Copy-paste your field definition for
> the field you are trying to
> highlight/search on.
>
> Cheers
> Avlesh

Thank you for your interest, Avlesh.

My field type consists mostly of custom filters and tokenizers.

<fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="XMLStripStandardTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
            ignoreCase="true" expand="true" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="CustomTokenizerFactory" />
    <filter class="CustomDeasciifyFilterFactory" />
    <filter class="CustomStemFilterFactory" protected="protwords.txt" />
    <filter class="LowerCaseFilterFactory" />
  </analyzer>
</fieldType>


First I tried solr.HTMLStripCharFilterFactory to strip the xml tags. The 
stripping works fine, but when it comes to highlighting, the <em> tags are 
placed at incorrect positions. The same happens with 
solr.HTMLStripStandardTokenizerFactory. Interestingly, the <em> tags are 
inserted exactly one character before the actual term. So I added a new token 
definition to StandardTokenizer's JFlex file that recognizes xml tags and 
ignores them. I confirmed with some test cases that it works. It strips xml 
tags at the tokenizer level. I am doing this because I display the original 
documents with xml + xslt, so I need to highlight the xml files themselves.
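
To illustrate why the offsets matter here, below is a minimal Python sketch of highlighting against the original xml text: tags produce no tokens, but every token keeps its start and end offset in the untouched document, so the <em> tags land exactly around the term. This is not the JFlex grammar change described above and not Solr's highlighter; the regular expression, the function names, and the sample document are all assumptions for illustration.

```python
import re

# Matches either an XML tag (skipped) or a run of word characters (a token).
_TAG_OR_WORD = re.compile(r"<[^>]*>|\w+")


def tokenize_with_offsets(xml_text):
    """Return (term, start, end) tuples; offsets point into the original text."""
    tokens = []
    for m in _TAG_OR_WORD.finditer(xml_text):
        if m.group().startswith("<"):
            continue  # a tag yields no token, but later offsets still count it
        tokens.append((m.group().lower(), m.start(), m.end()))
    return tokens


def highlight(xml_text, terms):
    """Wrap every token whose lowercased form is in `terms` with <em> tags."""
    wanted = {t.lower() for t in terms}
    out, last = [], 0
    for term, start, end in tokenize_with_offsets(xml_text):
        if term in wanted:
            out.append(xml_text[last:start])  # text and tags before the hit
            out.append("<em>" + xml_text[start:end] + "</em>")
            last = end
    out.append(xml_text[last:])  # remainder after the final hit
    return "".join(out)


if __name__ == "__main__":
    doc = "<doc><p>Solr <b>phrase</b> highlighting</p></doc>"
    print(highlight(doc, ["phrase"]))
```

If a tag-stripping step reported offsets relative to the stripped text, or off by one, the same insertion logic would put the <em> tag in the wrong place, which matches the symptom described above.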

And I am using ComplexPhraseQueryParser [1].

But I reproduced the problem with defType=lucene&q="term1 term2"~5. I can see 
that term1 and term2 are within 5 terms of each other, so the document is 
returned, but the highlighting is empty. There are no xml tags (which the 
tokenizer would strip) between those terms in the original document.

The hl.maxAnalyzedChars parameter refers to the original document, right? I 
mean, in my case, it counts the xml tags too.

[1] http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html