Re: Weird behaviour with phrase queries

2011-01-26 Thread Jerome Renard
Hi Erick,

On Tue, Jan 25, 2011 at 1:38 PM, Erick Erickson erickerick...@gmail.com wrote:

 Frankly, this puzzles me. It *looks* like it should be OK. One warning, the
 analysis page sometimes is a bit misleading, so beware of that.

 But the output of your queries makes it look like the query is parsing as
 you expect, which leaves the question of whether your index contains what
 you think it does. You might get a copy of Luke, which allows you to
 examine what's actually in your index instead of what you think is in there.
 Sometimes there are surprises here!


Bingo! Some data was not in the index. Indexing it obviously fixed the
problem.


 I didn't mean that you should re-index your whole corpus; I was thinking that you could
 just index a few documents in a test index so you have something small to
 look at.

 Sorry I can't spot what's happening right away.


No worries, thanks for your support :)

-- 
Jérôme


Re: Weird behaviour with phrase queries

2011-01-25 Thread Erick Erickson
Frankly, this puzzles me. It *looks* like it should be OK. One warning, the
analysis page sometimes is a bit misleading, so beware of that.

But the output of your queries makes it look like the query is parsing as you
expect, which leaves the question of whether your index contains what
you think it does. You might get a copy of Luke, which allows you to examine
what's actually in your index instead of what you think is in there.
Sometimes there are surprises here!

I didn't mean that you should re-index your whole corpus; I was thinking that you could
just index a few documents in a test index so you have something small to
look at.
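
A small test index of this kind can be fed with a hand-built update message. The following is a minimal Python sketch, not something taken from this thread: it builds a Solr XML `<add>` message for a handful of documents. The `id` field, the document values and the helper name `build_add_message` are assumptions made for illustration; only `meta_text` comes from the schema discussed here.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of {field: value-or-list} dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A multiValued field is written as one <field> element per value.
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Hypothetical test document; "id" is an assumed unique key field.
docs = [{"id": "test-1", "meta_text": ["académie des charpentiers"]}]
print(build_add_message(docs))
```

The resulting string would then be POSTed to the update handler of the test index, followed by a commit, before re-running the phrase query that returns nothing.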

Sorry I can't spot what's happening right away.

Good luck!
Erick

On Tue, Jan 25, 2011 at 2:45 AM, Jerome Renard jerome.ren...@gmail.com wrote:

 Erick,

 On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, I don't see any screen shots. Several things:
 1. If your stopword file has comments, I'm not sure what the effect would
 be.


 Ha, I thought comments were supported in stopwords.txt


 2. Something's not right here, or I'm being fooled again. Your withresults
 xml has this line:
 <str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>
 and your noresults has this line:
 <str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)</str>

 the empty () in the first one often means you're NOT going to your
 configured dismax parser in solrconfig.xml. Yet that doesn't square with
 your custom qt, so I'm puzzled.

 Could we see your raw query string on the way in? It's almost as if you
 defined qt in one and defType in the other, which are not equivalent.


 You are right I fixed this problem (my bad).

 3. It may take 12 hours to index, but you could experiment with a smaller
 subset. You say you know that the noresults one should return documents;
 what proof do you have? If there's a single document that you know should
 match this, just index it and a few others and you should be able to make
 many runs until you get to the bottom of this...


 I could, but I always thought I had to fully re-index after updating
 schema.xml. If I update only a few documents, will that take the changes
 into account without breaking the rest?


 And obviously your stemming is happening on the query, are you sure it's
 happening at index time too?


 Since you did not get the screenshots you will find attached the full
 output of the analysis
 for a phrase that works and for another that does not.

 Thanks for your support

 Best Regards,

 --
 Jérôme



Re: Weird behaviour with phrase queries

2011-01-24 Thread Em

Hi Jerome,

Does your fieldtype contain a stopword filter?
That could be the root of all evil :-).

Could you provide us with the fieldtype definition and the explain output of an
example query?
Did you check analysis.jsp to have a look at the produced results?

Regards,
Em


Jerome Renard wrote:
 
 Hi,
 
 I have a problem with phrase queries: from time to time I do not get any
 results, whereas I know I should get something back.
 
 The search is run against a field of type text whose definition is
 available at the following URL:
 - http://pastebin.com/Ncem7M8z
 
 This field is defined with the following configuration:
 <field name="meta_text" type="text" indexed="true" stored="true"
        multiValued="true" termVectors="true"/>
 
 I use the following request handler:
 <requestHandler name="custom" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">meta_text</str>
     <str name="pf">meta_text</str>
     <str name="bf"/>
     <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
   </lst>
 </requestHandler>
 
 Depending on the kind of phrase query I use I get either exactly what I am
 looking for or nothing.
 
 The index content is all French, so I thought about a possible problem with
 accents, but I got queries working
 with phrase queries containing é and è characters, like académie or
 ingénieur.
 
 As you will see, the filter used in the text type uses the
 SnowballPorterFilterFactory for the English language.
 I plan to fix that by using the correct language for the index (French)
 and the following protwords: http://bit.ly/i8JeX6 .
 
 But apart from this mistake with the stemmer, did I do something (else) wrong?
 Did I overlook something? What could
 explain why I do not always get results for my phrase queries?
 
 Thanks in advance for your feedback.
 
 Best Regards,
 
 --
 Jérôme
 
 



Re: Weird behaviour with phrase queries

2011-01-24 Thread Erick Erickson
Try submitting your query from the admin page with debugQuery=on and see
if that helps. The output is pretty dense, so feel free to cut and paste the
results here for help.
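
The same debug request can also be built outside the admin page. Here is a minimal Python sketch that only assembles the URL; the base URL and the example phrase are assumptions, while `qt=custom` matches the request handler name posted in this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and path to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(phrase, handler="custom"):
    """Return a select URL that runs `phrase` as a quoted phrase query with debug output."""
    params = {
        "qt": handler,         # request handler name from solrconfig.xml
        "q": '"%s"' % phrase,  # double quotes make it a phrase query
        "debugQuery": "on",    # asks Solr to include the debug section
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Illustrative phrase only; substitute one that fails for you.
print(build_debug_url("académie charpentier"))
```

Opening the printed URL in a browser should return the normal response plus a debug section containing the parsedquery entries.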

Your stemmers have English as the language, which could also be
interesting.

As Em says, the analysis page may help here, but I'd start by taking out
WordDelimiterFilterFactory, SnowballPorterFilterFactory and
StopFilterFactory, and build back up if you really need them. Although, again,
the analysis page that's accessible from the admin page may help greatly
(check debug in both index and query).

Oh, and you MUST re-index after changing your schema to have a true test.

Best
Erick

On Mon, Jan 24, 2011 at 12:31 PM, Jerome Renard jerome.ren...@gmail.com wrote:

 Hi,

 I have a problem with phrase queries: from time to time I do not get any
 results, whereas I know I should get something back.

 The search is run against a field of type text whose definition is
 available at the following URL:
 - http://pastebin.com/Ncem7M8z

 This field is defined with the following configuration:
 <field name="meta_text" type="text" indexed="true" stored="true"
        multiValued="true" termVectors="true"/>

 I use the following request handler:
 <requestHandler name="custom" class="solr.DisMaxRequestHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">meta_text</str>
     <str name="pf">meta_text</str>
     <str name="bf"/>
     <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
   </lst>
 </requestHandler>

 Depending on the kind of phrase query I use I get either exactly what I am
 looking for or nothing.

 The index content is all French, so I thought about a possible problem with
 accents, but I got queries working
 with phrase queries containing é and è characters, like académie or
 ingénieur.

 As you will see, the filter used in the text type uses the
 SnowballPorterFilterFactory for the English language.
 I plan to fix that by using the correct language for the index (French) and
 the following protwords: http://bit.ly/i8JeX6 .

 But apart from this mistake with the stemmer, did I do something (else) wrong?
 Did I overlook something? What could
 explain why I do not always get results for my phrase queries?

 Thanks in advance for your feedback.

 Best Regards,

 --
 Jérôme



Re: Weird behaviour with phrase queries

2011-01-24 Thread Erick Erickson
Hmmm, I don't see any screen shots. Several things:
1. If your stopword file has comments, I'm not sure what the effect would
be.
2. Something's not right here, or I'm being fooled again. Your withresults
xml has this line:
<str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>
and your noresults has this line:
<str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)</str>

the empty () in the first one often means you're NOT going to your
configured dismax parser in solrconfig.xml. Yet that doesn't square with
your custom qt, so I'm puzzled.

Could we see your raw query string on the way in? It's almost as if you
defined qt in one and defType in the other, which are not equivalent.
3. It may take 12 hours to index, but you could experiment with a smaller
subset. You say you know that the noresults one should return documents;
what proof do you have? If there's a single document that you know should
match this, just index it and a few others and you should be able to make
many runs until you get to the bottom of this...

And obviously your stemming is happening on the query, are you sure it's
happening at index time too?
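
The empty `()` check from point 2 can be automated when comparing many debug responses. This is a rough Python sketch, assuming the debug output is saved as XML; the sample fragment is trimmed down from the lines quoted above, and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parsed_queries(debug_xml):
    """Return the text of every <str name="parsedquery"> element in a response."""
    root = ET.fromstring(debug_xml)
    return [el.text or "" for el in root.iter("str")
            if el.get("name") == "parsedquery"]


def has_empty_clause(parsed_query):
    """True when the parsed query ends with an empty '()' clause."""
    return parsed_query.rstrip().endswith(" ()")


# Trimmed-down response fragment; a real response has many more elements.
SAMPLE = """<response><lst name="debug">
<str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>
</lst></response>"""

for pq in parsed_queries(SAMPLE):
    print(has_empty_clause(pq), pq)
```

A `True` here only flags the symptom described above; it does not by itself say which of qt or defType was used.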

Best
Erick

On Mon, Jan 24, 2011 at 1:51 PM, Jerome Renard jerome.ren...@gmail.com wrote:

 Hi Em, Erick

 thanks for your feedback.

 Em: yes. Here is the stopwords.txt I use:
 - http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt

 On Mon, Jan 24, 2011 at 6:58 PM, Erick Erickson erickerick...@gmail.com wrote:

 Try submitting your query from the admin page with debugQuery=on and see
 if that helps. The output is pretty dense, so feel free to cut-paste the
 results for
 help.

 Your stemmers have English as the language, which could also be
 interesting.


 Yes, I noticed that; this will be fixed.


 As Em says, the analysis page may help here, but I'd start by taking out
 WordDelimiterFilterFactory, SnowballPorterFilterFactory and
 StopFilterFactory
 and build back up if you really need them. Although, again, the analysis
 page
 that's accessible from the admin page may help greatly (check debug in
 both
 index and query).


 You will find attached two XML files: one with no results (noresult.xml.gz)
 and one with a lot of results (withresults.xml.gz). You will also find
 attached two screenshots showing that there is a highlighted section in the
 Index Analyzer section when analysing text.


 Oh, and you MUST re-index after changing your schema to have a true test.


 Yes, the problem is that reindexing takes around 12 hours which makes it
 really hard
 for testing :/


 Thanks in advance for your feedback.

 Best Regards,

 --
 Jérôme



Re: Weird behaviour with phrase queries

2011-01-24 Thread Jerome Renard
Erick,

On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, I don't see any screen shots. Several things:
 1. If your stopword file has comments, I'm not sure what the effect would
 be.


Ha, I thought comments were supported in stopwords.txt


 2. Something's not right here, or I'm being fooled again. Your withresults
 xml has this line:
 <str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>
 and your noresults has this line:
 <str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)</str>

 the empty () in the first one often means you're NOT going to your
 configured dismax parser in solrconfig.xml. Yet that doesn't square with
 your custom qt, so I'm puzzled.

 Could we see your raw query string on the way in? It's almost as if you
 defined qt in one and defType in the other, which are not equivalent.


You are right I fixed this problem (my bad).

 3. It may take 12 hours to index, but you could experiment with a smaller
 subset. You say you know that the noresults one should return documents;
 what proof do you have? If there's a single document that you know should
 match this, just index it and a few others and you should be able to make
 many runs until you get to the bottom of this...


I could, but I always thought I had to fully re-index after updating
schema.xml. If I update only a few documents, will that take the changes
into account without breaking the rest?


 And obviously your stemming is happening on the query, are you sure it's
 happening at index time too?


Since you did not get the screenshots you will find attached the full output
of the analysis
for a phrase that works and for another that does not.

Thanks for your support

Best Regards,

--
Jérôme


analysis-noresults.html.gz
Description: GNU Zip compressed data


analysis-withresults.html.gz
Description: GNU Zip compressed data