RE: Document match with no highlight

2011-05-13 Thread Pierre GOSSE
In your WordDelimiterFilter, the parameters catenateWords and catenateNumbers
are set to 1 at index time (catenateAll is 0). These parameters add
overlapping tokens, which could explain why you are hitting the bug described
in the JIRA issue I mentioned.

As I understand WordDelimiterFilter:
"0176 R3 1.5 TO" should be tokenized with 15 overlapping with 1 and 5 (and
with R3 overlapping with R and 3 when catenateAll is enabled).

These parameters are set to 0 for the query, but setting them to 1 there
should not fix your problem unless you search for "R3 1.5".
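
The overlap described above can be illustrated with a small simulation. This is only a rough sketch, not the real WordDelimiterFilter: `analyze` and `phrase_matches` are hypothetical helpers, the splitting rules are simplified, and it assumes that a catenated token is placed at the position of the last part it was built from. Under those assumptions, the indexed text "0176 R3 1.5 TO" contains the consecutive positions 3, 1, 15, so the phrase query matches even though the phrase never appears literally in the text.

```python
import re
from itertools import groupby


def analyze(text, catenate):
    """Return (term, position) pairs, loosely imitating WordDelimiterFilter.

    Simplifications (assumptions, not the real filter): words are split on
    whitespace, then into runs of letters or digits; a catenated token shares
    the position of the last part of its run.
    """
    tokens = []
    pos = 0
    for word in text.split():
        parts = re.findall(r"[A-Za-z]+|[0-9]+", word)
        # Group adjacent parts of the same kind (all digits or all letters).
        for _is_digit, group in groupby(parts, key=str.isdigit):
            group = list(group)
            for part in group:
                pos += 1
                tokens.append((part.lower(), pos))
            if catenate and len(group) > 1:
                # Overlapping token, e.g. "15" on top of "5".
                tokens.append(("".join(group).lower(), pos))
    return tokens


def phrase_matches(tokens, phrase):
    """True if the phrase terms occur at consecutive positions (slop 0)."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    starts = positions.get(phrase[0], set())
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(phrase))
        for start in starts
    )


indexed = analyze("0176 R3 1.5 TO", catenate=True)
print(indexed)
print(phrase_matches(indexed, ["3", "1", "15"]))
```

With `catenate=False` the token 15 is never produced, so the same phrase no longer matches, which is the trade-off of setting the catenate parameters to 0 at index time.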

I think you have to either
 - set these parameters to 0 at index time, but then your query won't match anymore,
 - wait for the fix to be released in a new Solr version,
 - use Solr trunk,
 - or backport the changes to the lucene-highlighter version you use.

I did a backport for Solr 1.4.1, since I won't move to 3.0 for some time, so
please ask if you have questions about how to do this.

Pierre



Re: Document match with no highlight

2011-05-13 Thread Phong Dais
Pierre,

Thank you very much, Pierre. :)

You saved me a lot of time and headache.

> As I understand WordDelimiterFilter :
> 0176 R3 1.5 TO should be tokenized with tokens R3 overlapping with R
> and 3, and 15 overlapping with 1 and 5
>
> These parameters are set to 0 for query, but having them set to 1 should
> not correct your problem unless you search for R3 1.5.

You are correct.

> I think you have to either
>  - set these parameters to 0 in index, but your query won't match anymore
>  - wait for correction to be released in a new solr version,
>  - use solr trunk,
>  - or backport the modifications in the lucene-highlighter version you
>    use.

For what I need, using 0 for the index should do the trick.  I did not want
my query to match.

> I did a backport for solr 1.4.1 since I won't move to 3.0 until some time,
> so please ask if you have question about how to do this.

I don't anticipate needing a backport, but is there a wiki page out there
that outlines this process?

Regards,
Phong




Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

Ok, here it is.  Please note that I had to type everything.  I did double
and triple check for typos.
I do not use term vectors.  I also left out the timing section.

Thanks for all the help.
P.

URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1

XML:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">DOC_TEXT</str>
      <str name="wt">standard</str>
      <str name="hl.maxAnalyzedChars">-1</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="fl">DOC_TEXT,score</str>
      <str name="start">0</str>
      <str name="q">DOC_TEXT:"3 1 15"</str>
      <str name="qt">standard</str>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.035959315">
    <doc>
      <float name="score">0.035959315</float>
      <arr name="DOC_TEXT"><str> ... </str></arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="123456"/>
  </lst>
  <lst name="debug">
    <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
    <str name="querystring">DOC_TEXT:"3 1 15"</str>
    <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
    <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
    <lst name="explain">
      <str name="123456">
0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
  1.0 = tf(phraseFreq=1.0)
  0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
  0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries">
      <str/>
    </arr>
    <arr name="parsed_filter_queries"/>
    <lst name="timing">
      ...
    </lst>
  </lst>
</response>
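
For anyone reproducing this outside the admin form, the request above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_highlight_url` is a hypothetical helper, and the parameter names and values are the ones shown in this thread. The optional `slop` argument produces the `"3 1 15"~1` variant of the phrase query.

```python
from urllib.parse import urlencode


def build_highlight_url(base, field, phrase, slop=None):
    """Build a Solr select URL for a quoted phrase query with highlighting."""
    query = f'{field}:"{phrase}"'
    if slop is not None:
        query += f"~{slop}"  # phrase slop, e.g. "3 1 15"~1
    params = {
        "q": query,
        "fl": f"{field},score",
        "hl": "on",
        "hl.fl": field,
        "hl.maxAnalyzedChars": "-1",
        "debugQuery": "on",
        "rows": "10",
    }
    # urlencode percent-encodes the quotes and colon and turns spaces into "+".
    return base + "?" + urlencode(params)


url = build_highlight_url(
    "http://localhost:8983/solr/select", "DOC_TEXT", "3 1 15"
)
print(url)
```

Comparing the `debug` section of the response for the plain phrase and for the slop variant is a quick way to confirm which query form Solr actually parsed.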


On Wed, May 11, 2011 at 1:54 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

> > I can upload the search URL and part of the output, but not all of it.
> > Company trade secrets do not allow me to upload the content of the
> > DOC_TEXT field.  I can upload the debug output section and whatever else
> > is needed, but I cannot upload the actual document data.
> >
> > Please let me know if any of this will help without the actual data.
>
> Sure, they will help. Seeing the complete list of parameters would be useful.
> Do you store term vectors?


Re: Document match with no highlight

2011-05-12 Thread Ahmet Arslan

Nothing looks suspicious.

Can you provide two more things:
 - the fieldType of DOC_TEXT
 - the field definition of DOC_TEXT

Also, do you get a snippet from the same doc when you remove the quotes from
your query?



RE: Document match with no highlight

2011-05-12 Thread Bob Sandiford
Don't you need to include your unique id field in your 'fl' parameter?  It will 
be needed anyways so you can match up the highlight fragments with the result 
docs once highlighting is working...
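
To illustrate the matching step mentioned above: the highlighting section is keyed by the unique key value, so the key field has to be present in each result doc to join the two. The sketch below is only an illustration under assumptions: `highlights_by_doc` is a hypothetical helper, the sample XML is hand-written in the shape of the response posted in this thread (with DOC_ID added to the returned fields), and the highlighting entry is empty as in the reported problem.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a Solr XML response (not real output).
SAMPLE = """<response>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="DOC_ID">123456</str>
      <float name="score">0.035959315</float>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="123456"/>
  </lst>
</response>"""


def highlights_by_doc(xml_text, key_field="DOC_ID"):
    """Pair each result doc's unique key with its highlight fragments."""
    root = ET.fromstring(xml_text)
    highlights = {}
    for entry in root.findall("./lst[@name='highlighting']/lst"):
        # Each entry is named after the unique key value of one document.
        highlights[entry.get("name")] = {
            arr.get("name"): [s.text or "" for s in arr.findall("str")]
            for arr in entry.findall("arr")
        }
    merged = []
    for doc in root.findall("./result/doc"):
        key = doc.findtext(f"./str[@name='{key_field}']")
        merged.append((key, highlights.get(key, {})))
    return merged


print(highlights_by_doc(SAMPLE))
```

An empty dictionary for a document that matched is exactly the symptom discussed in this thread: the document is found, but no fragment is returned for it.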

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 







Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

<field name="DOC_TEXT" type="text" indexed="true" stored="true"/>

The type "text" is the default one that came with the default Solr 1.4
install, without any modifications.

If I remove the quotes, I do get snippets.  In fact, if I search for
"3 1 15"~1, I also get a snippet.

Hope that helps.

P.





Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

I use
<uniqueKey>DOC_ID</uniqueKey>
in schema.xml.

I think this is the default unique id that is used for matching.  Someone
correct me if I am wrong.

P.








RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
> In fact if I did 3 1 15~1 I do get snipet also.

Strange, I had a very similar problem, but with overlapping tokens. Since
you're using the standard text field, this should be your case.

Maybe you could have a look at this issue, since it sounds very familiar to me:
https://issues.apache.org/jira/browse/LUCENE-3087

Pierre





RE: Document match with no highlight

2011-05-12 Thread Pierre GOSSE
> Since you're using the standard text field, this should NOT be your case.

Sorry for the missing NOT in my previous sentence. You should not have the
same issue given what you said, but still, it sounds very similar.

Are you sure your fieldtype "text" has nothing special? A tokenizer or filter
that could add tokens to your indexed text but not to your query, for
example a WordDelimiterFilter present at index time and not at query time?

Pierre





Re: Document match with no highlight

2011-05-12 Thread Phong Dais
Hi,

I read the link provided and I'll need some time to digest what it is
saying.

Here's my text fieldtype.

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldtype>

Also, I figured out what value in DOC_TEXT causes this issue to occur.
With a DOC_TEXT of (without the quotes):
"0176 R3 1.5 TO"

Searching for "3 1 15" returns a match with an empty highlight.
Searching for "3 1 15"~1 returns a match with a highlight.

Can anyone see anything that I'm missing?

Thanks,
P.



Document match with no highlight

2011-05-11 Thread Phong Dais
Hi,

I am having a problem with highlighting which I cannot comprehend.
I'm using solr/admin/form.jsp (the full interface) to submit a search for
"3 1 15" (with the quotes).
I have Enable Highlighting checked and I have specified the field to
highlight, in my case DOC_TEXT.  Everything else is left at its default value.
On the result page (after hitting the Search button), I get the following in
the highlight section:

<lst name="highlighting">
  <lst name="1234" />
</lst>

1234 is the DOC_ID (fieldtype string) field of my document.
The DOC_TEXT field is nowhere to be found in the highlighting section even
though the

<result name="response" numFound="1" start="0" maxScore="0.03432435">

shows that there was 1 match.

I get this problem with only a very small number of my documents.  Most of
my documents are correctly found along with the correct highlights.
For the ones that matched without the highlight, I did a manual search
within the document for "3 1 15" (i.e. token 3 followed by token 1 followed
by token 15) and found no match.

The DOC_TEXT field is of <fieldtype name="text" class="solr.TextField"
positionIncrementGap="100">.  DOC_TEXT is stored.
This is the same canned fieldtype that comes pre-defined with the solr 1.4
standard install.

I am totally stumped and I do not even know where to begin to resolve this
issue.

Thanks for any help.

Phong


Re: Document match with no highlight

2011-05-11 Thread Ahmet Arslan


What is your default search field?

Increasing hl.maxAnalyzedChars may help.

http://wiki.apache.org/solr/HighlightingParameters#hl.maxAnalyzedChars
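
As a rough illustration of the suggestion above, the sketch below builds a select URL that sets hl.maxAnalyzedChars explicitly. The host and handler path are taken from the URL posted elsewhere in this thread; the helper name build_highlight_url and the value 1000000 are assumptions chosen only for the example.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_highlight_url(base, query, field, max_chars):
    """Build a Solr select URL with highlighting enabled on one field.

    build_highlight_url is a hypothetical helper, not part of Solr.
    """
    params = {
        "q": query,
        "hl": "on",
        "hl.fl": field,
        # Upper bound on how many characters the highlighter analyzes.
        "hl.maxAnalyzedChars": str(max_chars),
    }
    return base + "?" + urlencode(params)

# 1000000 is an arbitrary "large" value picked for illustration.
url = build_highlight_url(
    "http://localhost:8983/solr/select",
    'DOC_TEXT:"3 1 15"',
    "DOC_TEXT",
    1000000,
)
print(url)
```

If raising the limit changes nothing, as turned out to be the case here, the limit is probably not what is suppressing the snippet.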


Re: Document match with no highlight

2011-05-11 Thread Phong Dais
Hi,

Already tried that.  Tried a ridiculously huge number and -1.  Same result.

Some clarification.  I submitted the search string:

DOC_TEXT:"3 1 15"

Thanks,
P.




Re: Document match with no highlight

2011-05-11 Thread Jan-Eirik B . Nævdal
Have you checked that the search phrase is in the field you use as the
highlight field?
By default, if there are no hits in the defined highlight field, the
highlighter returns an empty result.

A way around this problem is to add more fields to highlight, or to merge the
searchable text into a single text field and highlight that.

Jan Eirik






-- 
Jan Eirik B. Nævdal
Solutions Engineer | +47 982 65 347
Iterate AS | www.iterate.no
The Lean Software Development Consultancy


Re: Document match with no highlight

2011-05-11 Thread Phong Dais
Hi,

When I eyeball the highlighted field, I do not find the search phrase in
the document that was returned as a match.
The search field is DOC_TEXT, the highlighted field is DOC_TEXT, and the
search query is DOC_TEXT:"3 1 15".
I get a match with an empty highlight, but it looks to me like it shouldn't be
a match.  And this only happens to a small number of docs.

P.
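
One possible reason an eyeball search finds nothing while the phrase query still matches is that the index analyzer splits and recombines tokens. The sketch below is a deliberately simplified stand-in for that behaviour, not the actual WordDelimiterFilter code: it splits each whitespace token on letter/digit boundaries and punctuation, and, as an assumption, places a catenated number at the position of its last part. The sample text and the function names analyze and phrase_matches are made up for illustration.

```python
import re

def analyze(text):
    """Return a list of token sets, one set per token position.

    Simplified model (assumption): every letter run or digit run gets its
    own position, and a token made only of several digit runs (e.g. "1.5")
    also emits the catenated number ("15") at the position of its last part.
    """
    positions = []
    for word in text.split():
        parts = [p.lower() for p in re.findall(r"[A-Za-z]+|[0-9]+", word)]
        for part in parts:
            positions.append({part})
        if len(parts) > 1 and all(p.isdigit() for p in parts):
            positions[-1].add("".join(parts))
    return positions

def phrase_matches(positions, phrase):
    """True if the phrase terms occur at consecutive positions (no slop)."""
    terms = phrase.split()
    for start in range(len(positions) - len(terms) + 1):
        if all(term in positions[start + k] for k, term in enumerate(terms)):
            return True
    return False

text = "0176 R3 1.5 TO"  # illustrative sample, not real document data
positions = analyze(text)
print(positions)
print(phrase_matches(positions, "3 1 15"))  # phrase match on analyzed tokens
print("3 1 15" in text)                     # literal substring search
```

Under these assumptions the analyzed tokens contain 3, 1 and 15 at consecutive positions even though the literal string "3 1 15" never appears in the text, which would be consistent with a hit that cannot be found by reading the stored field.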




Re: Document match with no highlight

2011-05-11 Thread Ahmet Arslan
 Already tried that.  Tried a ridiculously huge number
 and -1.  Same result.
 
 Some clarification.  I submitted the search string:
 
 DOC_TEXT:"3 1 15"

Can you append debugQuery=on and give us its output? And the complete search 
URL will also help.
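
For anyone following along, a minimal sketch of adding debugQuery=on to an existing search URL is shown below. The starting URL is a shortened, assumed variant of the one used in this thread, and with_debug_query is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_debug_query(url):
    """Return the same URL with debugQuery=on set exactly once."""
    parts = urlsplit(url)
    # keep_blank_values preserves empty parameters such as "fq=".
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debugQuery"
    ]
    params.append(("debugQuery", "on"))
    return urlunsplit(parts._replace(query=urlencode(params)))

search_url = (
    "http://localhost:8983/solr/select"
    "?q=DOC_TEXT%3A%223+1+15%22&hl=on&hl.fl=DOC_TEXT"
)
debug_url = with_debug_query(search_url)
print(debug_url)
```

The response to such a request should then carry a debug section with the parsed query and the score explanation, which is the output being asked for here.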


Re: Document match with no highlight

2011-05-11 Thread Phong Dais
Hi,

I can upload the search URL and part of the output, but not all of it.
Company trade secrets do not allow me to upload the content of the
DOC_TEXT field.  I can upload the debug output section and whatever else
is needed, but I cannot upload the actual document data.

Please let me know if any of this will help without the actual data.

Thanks,
P.





Re: Document match with no highlight

2011-05-11 Thread Ahmet Arslan
 I can upload the search URL and part of the output but not
 all of it.
  Company trade secrets does not allow me to upload the
 content of the
 DOC_TEXT field.  I can upload the debug output
 section and whatever else
 is needed but I cannot upload the actual document data.
 
 Please let me know if any of this will help without the
 actual data.

Sure, they will help. Seeing the complete list of parameters would be useful.
Do you store term vectors?