Re: Weird behaviour with phrase queries
Hi Erick,

On Tue, Jan 25, 2011 at 1:38 PM, Erick Erickson erickerick...@gmail.com wrote:

> Frankly, this puzzles me. It *looks* like it should be OK. One warning, the analysis page sometimes is a bit misleading, so beware of that.
>
> But the output of your queries makes it look like the query is parsing as you expect, which leaves the question of whether your index contains what you think it does. You might get a copy of Luke, which allows you to examine what's actually in your index instead of what you think is in there. Sometimes there are surprises here!

Bingo! Some data were not in the index. Indexing them obviously fixed the problem.

> I didn't mean to re-index your whole corpus, I was thinking that you could just index a few documents in a test index so you have something small to look at.
>
> Sorry I can't spot what's happening right away.

No worries, thanks for your support :)

--
Jérôme
Re: Weird behaviour with phrase queries
Frankly, this puzzles me. It *looks* like it should be OK. One warning, the analysis page sometimes is a bit misleading, so beware of that.

But the output of your queries makes it look like the query is parsing as you expect, which leaves the question of whether your index contains what you think it does. You might get a copy of Luke, which allows you to examine what's actually in your index instead of what you think is in there. Sometimes there are surprises here!

I didn't mean to re-index your whole corpus, I was thinking that you could just index a few documents in a test index so you have something small to look at.

Sorry I can't spot what's happening right away.

Good luck!
Erick

On Tue, Jan 25, 2011 at 2:45 AM, Jerome Renard jerome.ren...@gmail.com wrote:

> Erick,
>
> On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson erickerick...@gmail.com wrote:
>
>> Hmmm, I don't see any screen shots. Several things:
>>
>> 1. If your stopword file has comments, I'm not sure what the effect would be.
>
> Ha, I thought comments were supported in stopwords.txt
>
>> 2. Something's not right here, or I'm being fooled again. Your withresults xml has this line:
>>
>> <str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>
>>
>> and your noresults has this line:
>>
>> <str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)</str>
>>
>> The empty () in the first one often means you're NOT going to your configured dismax parser in solrconfig.xml. Yet that doesn't square with your custom qt, so I'm puzzled. Could we see your raw query string on the way in? It's almost as if you defined qt in one and defType in the other, which are not equivalent.
>
> You are right, I fixed this problem (my bad).
>
>> 3. It may take 12 hours to index, but you could experiment with a smaller subset. You say you know that the noresults one should return documents, what proof do you have? If there's a single document that you know should match this, just index it and a few others and you should be able to make many runs until you get to the bottom of this...
>
> I could, but I always thought I had to fully re-index after updating schema.xml. If I update only a few documents, will that take the changes into account without breaking the rest?
>
>> And obviously your stemming is happening on the query, are you sure it's happening at index time too?
>
> Since you did not get the screenshots, you will find attached the full output of the analysis for a phrase that works and for another that does not.
>
> Thanks for your support
>
> Best Regards,
>
> --
> Jérôme
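The "index a few documents in a test index" suggestion can be sketched as follows. This is only an illustration, not something taken from the thread: it builds Solr's `<add><doc>…</doc></add>` update message for a handful of documents. The `id` field name and the sample texts are assumptions; only `meta_text` comes from the configuration quoted in this thread. Posting the result to the test core's update handler, followed by a commit, is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(docs):
    """Serialize a list of dicts into a Solr <add><doc>...</doc></add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list value becomes repeated <field> elements (multiValued fields).
            values = value if isinstance(value, list) else [value]
            for v in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(v)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical documents: "id" as the unique key is an assumption about
    # the schema, and the texts are made up for illustration.
    sample_docs = [
        {"id": "test-1", "meta_text": ["first test sentence", "second value"]},
        {"id": "test-2", "meta_text": ["another small document"]},
    ]
    print(build_add_xml(sample_docs))
```

Keeping the test set this small means a change to schema.xml can be followed by a full re-index of the test core in seconds rather than hours.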
Re: Weird behaviour with phrase queries
Hi Jerome,

does your fieldtype contain a stopword filter? Probably this could be the root of all evil :-).

Could you provide us the fieldtype definition and the explain content of an example query? Did you check analysis.jsp to have a look at the produced results?

Regards,
Em

Jerome Renard wrote:

> Hi,
>
> I have a problem with phrase queries: from time to time I do not get any results whereas I know I should get something returned.
>
> The search is run against a field of type text, whose definition is available at the following URL:
> - http://pastebin.com/Ncem7M8z
>
> This field is defined with the following configuration:
>
> <field name="meta_text" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
>
> I use the following request handler:
>
> <requestHandler name="custom" class="solr.DisMaxRequestHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">meta_text</str>
>     <str name="pf">meta_text</str>
>     <str name="bf"/>
>     <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
>     <int name="ps">100</int>
>     <str name="q.alt">*:*</str>
>   </lst>
> </requestHandler>
>
> Depending on the kind of phrase query I use, I get either exactly what I am looking for or nothing.
>
> The index content is all French, so I thought about a possible problem with accents, but I got results for phrase queries containing é and è characters, like académie or ingénieur.
>
> As you will see, the filter used in the text type uses the SnowballPorterFilterFactory for the English language. I plan to fix that by using the correct language for the index (French) and the following protwords: http://bit.ly/i8JeX6 .
>
> But apart from this mistake with the stemmer, did I do something (else) wrong? Did I overlook something? What could explain that I do not always get results for my phrase queries?
>
> Thanks in advance for your feedback.
>
> Best Regards,
>
> --
> Jérôme

--
View this message in context: http://lucene.472066.n3.nabble.com/Weird-behaviour-with-phrase-queries-tp2321241p2321362.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Weird behaviour with phrase queries
Try submitting your query from the admin page with debugQuery=on and see if that helps. The output is pretty dense, so feel free to cut-paste the results for help.

Your stemmers have English as the language, which could also be interesting.

As Em says, the analysis page may help here, but I'd start by taking out WordDelimiterFilterFactory, SnowballPorterFilterFactory and StopFilterFactory and build back up if you really need them. Although, again, the analysis page that's accessible from the admin page may help greatly (check debug in both index and query).

Oh, and you MUST re-index after changing your schema to have a true test.

Best
Erick

On Mon, Jan 24, 2011 at 12:31 PM, Jerome Renard jerome.ren...@gmail.com wrote:

> Hi,
>
> I have a problem with phrase queries: from time to time I do not get any results whereas I know I should get something returned.
>
> The search is run against a field of type text, whose definition is available at the following URL:
> - http://pastebin.com/Ncem7M8z
>
> This field is defined with the following configuration:
>
> <field name="meta_text" type="text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
>
> I use the following request handler:
>
> <requestHandler name="custom" class="solr.DisMaxRequestHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <float name="tie">0.01</float>
>     <str name="qf">meta_text</str>
>     <str name="pf">meta_text</str>
>     <str name="bf"/>
>     <str name="mm">1&lt;1 2&lt;-1 5&lt;-2 7&lt;60%</str>
>     <int name="ps">100</int>
>     <str name="q.alt">*:*</str>
>   </lst>
> </requestHandler>
>
> Depending on the kind of phrase query I use, I get either exactly what I am looking for or nothing.
>
> The index content is all French, so I thought about a possible problem with accents, but I got results for phrase queries containing é and è characters, like académie or ingénieur.
>
> As you will see, the filter used in the text type uses the SnowballPorterFilterFactory for the English language. I plan to fix that by using the correct language for the index (French) and the following protwords: http://bit.ly/i8JeX6 .
>
> But apart from this mistake with the stemmer, did I do something (else) wrong? Did I overlook something? What could explain that I do not always get results for my phrase queries?
>
> Thanks in advance for your feedback.
>
> Best Regards,
>
> --
> Jérôme
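For anyone who prefers to script the debugQuery=on check rather than use the admin page, here is a minimal sketch that only builds the request URL. The base URL (host, port, path) is an assumption and must be adapted to your installation; `custom` is the request handler name from the configuration quoted above, and the helper name `build_debug_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust host, port and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_debug_url(phrase, handler="custom", base_url=SOLR_SELECT_URL):
    """Return a select URL that sends `phrase` as a quoted phrase query
    with debug output enabled."""
    params = {
        "q": f'"{phrase}"',   # surrounding quotes make it a phrase query
        "qt": handler,        # request handler defined in solrconfig.xml
        "debugQuery": "on",   # adds parsedquery and explain sections
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical phrase, used only to show the URL encoding of accents.
    print(build_debug_url("académie ingénieur"))
```

Building the URL in one place makes it easier to confirm that a working and a non-working phrase are sent with exactly the same parameters, so that only the phrase differs between the two runs.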
Re: Weird behaviour with phrase queries
Hmmm, I don't see any screen shots. Several things:

1. If your stopword file has comments, I'm not sure what the effect would be.

2. Something's not right here, or I'm being fooled again. Your withresults xml has this line:

<str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>

and your noresults has this line:

<str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)</str>

The empty () in the first one often means you're NOT going to your configured dismax parser in solrconfig.xml. Yet that doesn't square with your custom qt, so I'm puzzled. Could we see your raw query string on the way in? It's almost as if you defined qt in one and defType in the other, which are not equivalent.

3. It may take 12 hours to index, but you could experiment with a smaller subset. You say you know that the noresults one should return documents, what proof do you have? If there's a single document that you know should match this, just index it and a few others and you should be able to make many runs until you get to the bottom of this...

And obviously your stemming is happening on the query, are you sure it's happening at index time too?

Best
Erick

On Mon, Jan 24, 2011 at 1:51 PM, Jerome Renard jerome.ren...@gmail.com wrote:

> Hi Em, Erick,
>
> thanks for your feedback.
>
> Em: yes. Here is the stopwords.txt I use:
> - http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/french_stop.txt
>
> On Mon, Jan 24, 2011 at 6:58 PM, Erick Erickson erickerick...@gmail.com wrote:
>
>> Try submitting your query from the admin page with debugQuery=on and see if that helps. The output is pretty dense, so feel free to cut-paste the results for help.
>>
>> Your stemmers have English as the language, which could also be interesting.
>
> Yes, I noticed that, this will be fixed.
>
>> As Em says, the analysis page may help here, but I'd start by taking out WordDelimiterFilterFactory, SnowballPorterFilterFactory and StopFilterFactory and build back up if you really need them. Although, again, the analysis page that's accessible from the admin page may help greatly (check debug in both index and query).
>
> You will find attached two xml files, one with no results (noresult.xml.gz) and one with a lot of results (withresults.xml.gz). You will also find attached two screenshots showing there is a highlighted section in the Index analyzer section when analysing text.
>
>> Oh, and you MUST re-index after changing your schema to have a true test.
>
> Yes, the problem is that reindexing takes around 12 hours, which makes it really hard for testing :/
>
> Thanks in advance for your feedback.
>
> Best Regards,
>
> --
> Jérôme
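The comparison of the two parsedquery lines can also be done mechanically. The sketch below is only an illustration: it pulls the `<str name="parsedquery">` value out of a debugQuery response and flags a trailing empty `()` clause. The response wrapper is trimmed to the minimum and its exact shape is an assumption; the two parsedquery strings are the ones quoted in this thread. As noted above, an empty clause only *often* points to a handler or parser mismatch, so treat the flag as a hint, not a diagnosis.

```python
import xml.etree.ElementTree as ET


def parsed_query(response_xml):
    """Return the text of <str name="parsedquery"> from a response, or None."""
    root = ET.fromstring(response_xml)
    node = root.find(".//str[@name='parsedquery']")
    return node.text if node is not None else None


def has_empty_clause(parsedquery):
    """True if the parsed query ends with an empty '()' clause."""
    return parsedquery.rstrip().endswith(" ()")


# Trimmed, assumed response shape; the query strings come from the thread.
WITH_RESULTS = (
    '<response><lst name="debug"><str name="parsedquery">'
    "+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()"
    "</str></lst></response>"
)
NO_RESULTS = (
    '<response><lst name="debug"><str name="parsedquery">'
    "+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) "
    "DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)"
    "</str></lst></response>"
)

if __name__ == "__main__":
    for label, xml_text in (("withresults", WITH_RESULTS), ("noresults", NO_RESULTS)):
        print(label, has_empty_clause(parsed_query(xml_text)))
```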
Re: Weird behaviour with phrase queries
Erick,

On Mon, Jan 24, 2011 at 9:57 PM, Erick Erickson erickerick...@gmail.com wrote:

> Hmmm, I don't see any screen shots. Several things:
>
> 1. If your stopword file has comments, I'm not sure what the effect would be.

Ha, I thought comments were supported in stopwords.txt

> 2. Something's not right here, or I'm being fooled again. Your withresults xml has this line:
>
> <str name="parsedquery">+DisjunctionMaxQuery((meta_text:ecol d ingenieur)~0.01) ()</str>
>
> and your noresults has this line:
>
> <str name="parsedquery">+DisjunctionMaxQuery((meta_text:academi charpenti)~0.01) DisjunctionMaxQuery((meta_text:academi charpenti~100)~0.01)</str>
>
> The empty () in the first one often means you're NOT going to your configured dismax parser in solrconfig.xml. Yet that doesn't square with your custom qt, so I'm puzzled. Could we see your raw query string on the way in? It's almost as if you defined qt in one and defType in the other, which are not equivalent.

You are right, I fixed this problem (my bad).

> 3. It may take 12 hours to index, but you could experiment with a smaller subset. You say you know that the noresults one should return documents, what proof do you have? If there's a single document that you know should match this, just index it and a few others and you should be able to make many runs until you get to the bottom of this...

I could, but I always thought I had to fully re-index after updating schema.xml. If I update only a few documents, will that take the changes into account without breaking the rest?

> And obviously your stemming is happening on the query, are you sure it's happening at index time too?

Since you did not get the screenshots, you will find attached the full output of the analysis for a phrase that works and for another that does not.

Thanks for your support

Best Regards,

--
Jérôme

analysis-noresults.html.gz
Description: GNU Zip compressed data

analysis-withresults.html.gz
Description: GNU Zip compressed data