Re: analysis tool vs. reality
: Maybe, separate from analysis.jsp (showing only how text is analyzed),
: Solr needs a debug page showing the steps the field's QueryParser goes
: through on a given query, to debug such tricky QueryParser/Analyzer
: interactions?

As mentioned earlier in this thread, i set out to build something exactly like this a while back, but as part of the DebugComponent instead of a standalone page. I ran into a lot of problems i couldn't figure out any way around, so i posted my thoughts in Jira for future reference in case other folks wanted to follow up with alternate suggestions on how to work around them (or mitigate the maintenance headaches involved)...

https://issues.apache.org/jira/browse/SOLR-1749

-Hoss
Re: analysis tool vs. reality
: > even if you change the Lucene QueryParser so that whitespace isn't a meta
: > character it doesn't affect the underlying issue: analysis.jsp is agnostic
: > about QueryParsers.
:
: analysis.jsp isn't agnostic about queryparsers, it's ignorant of them, and
: your default queryparser is actually a de-facto whitespace tokenizer, don't
: try to sugarcoat it.

If it makes you feel better to use the word ignorant instead of agnostic, fine -- but i'm not sugarcoating anything. analysis.jsp's query analyzer output is ignorant of all the QueryParsers that might be used at query time in the same way that its index analyzer output is ignorant of the UpdateProcessors that might be used at index time -- in both cases it only focuses on analysis, and tells you that given input X, the analyzer produces output Y.

if you want to change the Lucene QueryParser then go fight that battle in another thread -- i'm trying to have a meaningful conversation about how we can better educate users about the distinction between Query Parsing and Analysis, and about how we can make it more clear what analysis.jsp is doing.

Even if you convince folks to make every change you think should be made to the Lucene QueryParser (again: please take that up in a separate thread) it won't change the fact that people using analysis.jsp should understand the distinction between Query Parsing and Analysis -- unless you plan on getting rid of every metacharacter that the Lucene QueryParser uses to decide what types of Query to build (ie: '+', '-', '"', '*', '?') and unless you plan on forcing Solr users to only ever use that one QueryParser, then no matter what the Lucene QueryParser does with whitespace, users still need to understand the distinction between Query Parsing and Analysis so they don't type 'Foo*' into analysis.jsp and then ask why it says that will match "food" but it doesn't actually match at query time.

(surprise surprise: Query Parsing is not the same as analysis, and when the QueryParser sees wildcards it doesn't use the analyzer)

-Hoss
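
To make the 'Foo*' case concrete, here is a minimal toy sketch in Python. It is not Solr or Lucene code: the `analyze`, `parse_term`, and `matches` functions are hypothetical stand-ins, and the "analyzer" is assumed to do nothing more than whitespace splitting plus lowercasing. It only illustrates the point that a wildcard term which skips the analyzer can fail to match an indexed token that was analyzed.

```python
import fnmatch

def analyze(text):
    """Toy analyzer (assumption): split on whitespace and lowercase."""
    return [tok.lower() for tok in text.split()]

def parse_term(raw):
    """Toy query parser step: wildcard terms bypass the analyzer entirely."""
    if "*" in raw or "?" in raw:
        return ("wildcard", raw)       # passed through untouched
    return ("term", analyze(raw)[0])   # ordinary terms are analyzed

def matches(clause, indexed_tokens):
    """Check a parsed clause against the tokens stored for one document."""
    kind, value = clause
    if kind == "wildcard":
        # Case-sensitive glob match against each indexed token.
        return any(fnmatch.fnmatchcase(tok, value) for tok in indexed_tokens)
    return value in indexed_tokens

indexed = analyze("Food")                            # index-time analysis lowercases
upper_hit = matches(parse_term("Foo*"), indexed)     # wildcard was never lowercased
lower_hit = matches(parse_term("foo*"), indexed)     # user lowercased it by hand
plain_hit = matches(parse_term("Food"), indexed)     # analyzed term query
```

In this toy model `upper_hit` is false while `lower_hit` and `plain_hit` are true, which is the kind of mismatch an analysis-only view cannot reveal.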
Re: analysis tool vs. reality
On Mon, Aug 16, 2010 at 4:20 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

> Even if you convince folks to make every change you think should be made to the Lucene QueryParser (again: please take that up in a separate thread) it won't change the fact that people using analysis.jsp should understand the distinction between Query Parsing and Analysis -- unless you plan on getting rid of every metacharacter that the Lucene QueryParser uses to decide what types of Query to build (ie: '+', '-', '"', '*', '?') and unless you plan on forcing Solr users to only ever use that one QueryParser, then no matter what the Lucene QueryParser does with whitespace, users still need to understand the distinction between Query Parsing and Analysis so they don't type 'Foo*' into analysis.jsp and then ask why it says that will match "food" but it doesn't actually match at query time.
>
> (surprise surprise: Query Parsing is not the same as analysis, and when the QueryParser sees wildcards it doesn't use the analyzer)

Maybe for once your argument isn't completely bogus: the surprise is actually key here. There's really nothing documenting the various hacks/limitations in the queryparsers, such as auto-tokenizing on whitespace. I think the 'expanded terms' not being analyzed is similar; it's not really documented well. That's probably why it comes up on the mailing list seemingly at least every week [at this point you have to admit, there is a problem].

If you want to say the analysis tool is agnostic about queryparsers, that's fine, you can keep saying that. I'm saying it shouldn't be.

--
Robert Muir
rcm...@gmail.com
RE: analysis tool vs. reality
Hi Robert,

You wrote in response to Hoss:

> Maybe for once your argument isn't completely bogus

Attacking people here is really uncalled for.

-1 from me.

Steve
Re: analysis tool vs. reality
On Mon, Aug 16, 2010 at 5:23 PM, Steven A Rowe sar...@syr.edu wrote:

> Hi Robert,
>
> You wrote in response to Hoss:
>
> > Maybe for once your argument isn't completely bogus
>
> Attacking people here is really uncalled for.

actually, he asked for it:

> > you're right, we should just fix the bug that the queryparser tokenizes on whitespace first. then analysis.jsp will be significantly less confusing.
>
> dude .. not trying to get into a holy war here

> -1 from me.

well, that might be your opinion, but it doesn't change the facts.

--
Robert Muir
rcm...@gmail.com
Re: analysis tool vs. reality
Maybe, separate from analysis.jsp (showing only how text is analyzed), Solr needs a debug page showing the steps the field's QueryParser goes through on a given query, to debug such tricky QueryParser/Analyzer interactions?

We could make a wrapper around the analyzer that records each text fragment sent to it by the QueryParser, as a start.

It'd be great to also see it spelled out how that then resulted in a particular part of the query. So for query ABC12 FOO you'd see that ABC12 was sent to the analyzer, it returned two tokens (ABC, 12), and then QueryParser made a PhraseQuery from that, and then FOO was sent, and that turned into a TermQuery, and default op was AND and so a top-level BooleanQuery with 2 MUST clauses was created...

Mike

On Thu, Aug 12, 2010 at 8:39 PM, Robert Muir rcm...@gmail.com wrote:

> On Thu, Aug 12, 2010 at 8:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
>
> > : > You say it's bogus because the qp will divide on whitespace first -- but
> > : > you're assuming you know what query parser will be used ... the field
> > : > query parser (to name one) doesn't split on whitespace first. That's my
> > : > point: analysis.jsp doesn't make any assumptions about what query parser
> > : > *might* be used, it just tells you what your analyzers do with strings.
> > :
> > : you're right, we should just fix the bug that the queryparser tokenizes on
> > : whitespace first. then analysis.jsp will be significantly less confusing.
> >
> > dude .. not trying to get into a holy war here
>
> actually I'm suggesting the practical solution: that we fix the primary problem that makes it confusing.
>
> > even if you change the Lucene QueryParser so that whitespace isn't a meta character it doesn't affect the underlying issue: analysis.jsp is agnostic about QueryParsers.
>
> analysis.jsp isn't agnostic about queryparsers, it's ignorant of them, and your default queryparser is actually a de-facto whitespace tokenizer, don't try to sugarcoat it.
>
> --
> Robert Muir
> rcm...@gmail.com
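
Mike's idea of a recording wrapper, and the ABC12 FOO walk-through, can be sketched as a toy model in Python. Nothing here is Lucene or Solr API: `word_delimiter_analyzer`, `RecordingAnalyzer`, and `parse` are hypothetical names, the analyzer is assumed to split only on letter/digit boundaries, and the "parser" is assumed to split on whitespace and turn multi-token results into a phrase, as described in the message.

```python
import re

def word_delimiter_analyzer(text):
    """Toy analyzer (assumption): split on letter/digit boundaries."""
    return re.findall(r"[A-Za-z]+|[0-9]+", text)

class RecordingAnalyzer:
    """Wraps an analyzer and records every fragment the parser sends to it."""
    def __init__(self, analyzer):
        self.analyzer = analyzer
        self.calls = []

    def __call__(self, text):
        tokens = self.analyzer(text)
        self.calls.append((text, tokens))
        return tokens

def parse(query, analyzer, default_op="AND"):
    """Toy parser: split on whitespace, analyze each fragment separately."""
    occur = "MUST" if default_op == "AND" else "SHOULD"
    clauses = []
    for fragment in query.split():
        tokens = analyzer(fragment)
        if not tokens:
            continue                    # fragment analyzed away to nothing
        if len(tokens) == 1:
            clauses.append((occur, ("TermQuery", tokens[0])))
        else:
            clauses.append((occur, ("PhraseQuery", tokens)))
    return ("BooleanQuery", clauses)

recorder = RecordingAnalyzer(word_delimiter_analyzer)
parsed = parse("ABC12 FOO", recorder)
# recorder.calls now lists each fragment and the tokens it produced, in order.
```

Printing `recorder.calls` next to `parsed` gives roughly the trace described above: which fragments reached the analyzer, what came back, and which clause each one became.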
RE: analysis tool vs. reality
+1

I just had occasion to debug something where the interaction between the queryparser and the analyzer produced *interesting* results. Having a separate jsp that includes the whole chain (i.e. analyzer/tokenizer/filter and qp) would be great!

Tom

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, August 13, 2010 5:19 AM
To: solr-user@lucene.apache.org
Subject: Re: analysis tool vs. reality

Maybe, separate from analysis.jsp (showing only how text is analyzed), Solr needs a debug page showing the steps the field's QueryParser goes through on a given query, to debug such tricky QueryParser/Analyzer interactions?

We could make a wrapper around the analyzer that records each text fragment sent to it by the QueryParser, as a start.

It'd be great to also see it spelled out how that then resulted in a particular part of the query. So for query ABC12 FOO you'd see that ABC12 was sent to the analyzer, it returned two tokens (ABC, 12), and then QueryParser made a PhraseQuery from that, and then FOO was sent, and that turned into a TermQuery, and default op was AND and so a top-level BooleanQuery with 2 MUST clauses was created...

Mike
Re: analysis tool vs. reality
: Furthermore, I would like to add it's not just the highlight matches
: functionality that is horribly broken here, but the output of the analysis
: itself is misleading.
:
: let's say i take 'textTight' from the example, and add the following synonym:
:
: this is broken => broke
:
: the query time analysis is wrong, as it clearly shows synonymfilter
: collapsing "this is broken" to "broke", but in reality with the qp for that
: field, you are gonna get 3 separate tokenstreams and this will never
: actually happen (because the qp will divide it up on whitespace first)
:
: So really the output from 'Query Analyzer' is completely bogus.

analysis.jsp is only intended to explain *analysis* ... it accurately tells you what the <analyzer type="query" ...> for the specified field (or fieldType) is going to produce given a hunk of text. That is what it does, that is all that it does, that is all it has ever done, and all it has ever purported to do.

You say it's bogus because the qp will divide on whitespace first -- but you're assuming you know what query parser will be used ... the field query parser (to name one) doesn't split on whitespace first. That's my point: analysis.jsp doesn't make any assumptions about what query parser *might* be used, it just tells you what your analyzers do with strings.

Saying the output of analysis.jsp is bogus because it doesn't take into account QueryParsing is like saying the output of stats.jsp is bogus because those are only the stats of the local solr instance on that machine, and it doesn't do distributed stats -- yeah that would be nice to have, but stats.jsp never implies that's what it's giving you.

If there are ways we can make the purpose of analysis.jsp more obvious, and less misleading for people who don't understand the distinction between query parsing and analysis, then i am all for it. if you really believe getting rid of the highlight check box is going to help, then fine -- but i have yet to see any evidence that people who don't understand the relationship between query parsing and analysis are confused by the blue boxes. what people seem to be confused by is when they see the same tokens ultimately produced by both the index analyzer and the query analyzer -- it doesn't matter if those tokens are in blue or not, if they see that the tokens in the index analyzer output are a superset of the tokens in the query analyzer output then they tend to assume that means searching for the string in the query box will match documents containing the string in the index text box. Getting rid of the blue table cell is just going to make it harder to notice matching tokens in the output -- not reduce the confusion when those matching tokens exist in the output.

My question is: What can we do to make it more clear what the *purpose* of analysis.jsp is? is there verbiage we can add to the page to make it more obvious?

NOTE: I'm not just asking Robert, this is a question for the solr-user community as a whole. I *know* what analysis.jsp is for, i've never been confused -- for people who have been confused in the past (or are still confused) please help us understand what type of changes we could make to the output of analysis.jsp to make its functionality more understandable.

-Hoss
Re: analysis tool vs. reality
On Thu, Aug 12, 2010 at 7:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

> You say it's bogus because the qp will divide on whitespace first -- but you're assuming you know what query parser will be used ... the field query parser (to name one) doesn't split on whitespace first. That's my point: analysis.jsp doesn't make any assumptions about what query parser *might* be used, it just tells you what your analyzers do with strings.

you're right, we should just fix the bug that the queryparser tokenizes on whitespace first. then analysis.jsp will be significantly less confusing.

--
Robert Muir
rcm...@gmail.com
Re: analysis tool vs. reality
: > You say it's bogus because the qp will divide on whitespace first -- but
: > you're assuming you know what query parser will be used ... the field
: > query parser (to name one) doesn't split on whitespace first. That's my
: > point: analysis.jsp doesn't make any assumptions about what query parser
: > *might* be used, it just tells you what your analyzers do with strings.
:
: you're right, we should just fix the bug that the queryparser tokenizes on
: whitespace first. then analysis.jsp will be significantly less confusing.

dude .. not trying to get into a holy war here

even if you change the Lucene QueryParser so that whitespace isn't a meta character it doesn't affect the underlying issue: analysis.jsp is agnostic about QueryParsers. Some other QParser the user uses might have other special behavior, and if people don't understand the distinction between QueryParsing and analysis they can still be confused -- hell, even if the only QParser anyone ever uses is the lucene QParser, and even if you get the QueryParser changed so that whitespace isn't a metacharacter, we are still going to be left with the fact that *other* characters (like '+' and '-' and '"' and '*' and ...) are metacharacters for that query parser, and have special meaning. analysis.jsp isn't going to know about those, or do anything special for them -- so people can still be easily confused when analysis.jsp says one thing about how the string "+foo* -bar" gets analyzed, but that string as a query means something completely different.

Hence my point: leave arguments about QueryParser out of it -- how do we make the function of analysis.jsp more clear?

-Hoss
Re: analysis tool vs. reality
On Thu, Aug 12, 2010 at 8:07 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

> : > You say it's bogus because the qp will divide on whitespace first -- but
> : > you're assuming you know what query parser will be used ... the field
> : > query parser (to name one) doesn't split on whitespace first. That's my
> : > point: analysis.jsp doesn't make any assumptions about what query parser
> : > *might* be used, it just tells you what your analyzers do with strings.
> :
> : you're right, we should just fix the bug that the queryparser tokenizes on
> : whitespace first. then analysis.jsp will be significantly less confusing.
>
> dude .. not trying to get into a holy war here

actually I'm suggesting the practical solution: that we fix the primary problem that makes it confusing.

> even if you change the Lucene QueryParser so that whitespace isn't a meta character it doesn't affect the underlying issue: analysis.jsp is agnostic about QueryParsers.

analysis.jsp isn't agnostic about queryparsers, it's ignorant of them, and your default queryparser is actually a de-facto whitespace tokenizer, don't try to sugarcoat it.

--
Robert Muir
rcm...@gmail.com
analysis tool vs. reality
Erik: Yes, I did re-index, if that means adding the document again. Here are the exact steps I took:

1. analysis.jsp: ABC12 does NOT match title ABC12 (however, ABC or 12 does)
2. changed schema.xml: WordDelimiterFilterFactory catenate-all
3. restarted tomcat
4. deleted the document with title ABC12
5. added the document with title ABC12
6. query ABC12 does NOT result in the document with title ABC12
7. analysis.jsp: ABC12 DOES match that document now

Is there any way to see, given an ID, how something is indexed internally?

Lance: I understand the index/query sections of analysis.jsp. However, it operates on text that you enter into the form, not on actual index data. Since all my documents have a unique ID, I'd like to supply an ID and a query, and get back the same index/query sections, using what's actually in the index.

---------- Forwarded message ----------
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 22:43:17 -0400
Subject: Re: analysis tool vs. reality

Did you reindex after changing the schema?

On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

> Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed. Is there a way to see how things are stored internally in the index?
>
> My query is ABC12. There is a document whose title field is ABC12. However, I can only get it to match if I search for ABC or 12. This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool ABC12 matches ABC12. However, when doing an actual query, it does not match.
>
> Thank you for any help,
> Justin
>
> ---------- Forwarded message ----------
> From: Erik Hatcher erik.hatc...@gmail.com
> To: solr-user@lucene.apache.org
> Date: Tue, 3 Aug 2010 16:50:06 -0400
> Subject: Re: analysis tool vs. reality
>
> The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are, to a misunderstanding about some text not being analyzed (like wildcard clauses).
>
> Erik
>
> On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:
>
> > Hello,
> >
> > I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results.
> >
> > Does anyone have any tips for debugging this sort of issue? What is different between what I see in analysis tool and new documents added to the index?
> >
> > Thanks,
> > Justin
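
Erik's suggestion in the quoted message, adding debugQuery=true to the request, can be sketched in Python as a small URL builder. The helper name `debug_query_url`, the host, port, and `/solr/select` path are assumptions for illustration only; adjust them to wherever your Solr instance actually answers queries. The `title` field and the ABC12 query come from this thread.

```python
from urllib.parse import urlencode

def debug_query_url(base_url, query, **extra_params):
    """Build a select URL that asks Solr to include debug output."""
    params = {"q": query, "debugQuery": "true"}
    params.update(extra_params)        # e.g. rows=10, fl="id,title"
    return base_url + "?" + urlencode(params)

# Hypothetical local instance; the base URL is an assumption.
url = debug_query_url("http://localhost:8983/solr/select", "title:ABC12")
print(url)
```

Fetching that URL and reading the parsed query in the debug section of the response shows what the query parser actually built, which is the piece the analysis page does not show.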
Re: analysis tool vs. reality
I think I agree with Justin here, I think the way the analysis tool highlights 'matches' is extremely misleading, especially considering it completely ignores queryparsing.

it would be better if it put your text in a memoryindex and actually parsed the query w/ queryparser, ran it, and used the highlighter to try to show any matches.

On Wed, Aug 4, 2010 at 10:14 AM, Justin Lolofie jta...@gmail.com wrote:

> Erik: Yes, I did re-index, if that means adding the document again. Here are the exact steps I took:
>
> 1. analysis.jsp: ABC12 does NOT match title ABC12 (however, ABC or 12 does)
> 2. changed schema.xml: WordDelimiterFilterFactory catenate-all
> 3. restarted tomcat
> 4. deleted the document with title ABC12
> 5. added the document with title ABC12
> 6. query ABC12 does NOT result in the document with title ABC12
> 7. analysis.jsp: ABC12 DOES match that document now
>
> Is there any way to see, given an ID, how something is indexed internally?
>
> Lance: I understand the index/query sections of analysis.jsp. However, it operates on text that you enter into the form, not on actual index data. Since all my documents have a unique ID, I'd like to supply an ID and a query, and get back the same index/query sections, using what's actually in the index.

--
Robert Muir
rcm...@gmail.com
analysis tool vs. reality
Wow, I got to work this morning and my query results now include the 'ABC12' document. I'm not sure what that means. Either I made a mistake in the process I described in the last email (I don't think this is the case) or there is some kind of caching of query results going on that doesn't get flushed on a restart of tomcat.
Re: analysis tool vs. reality
On Wed, Aug 4, 2010 at 7:52 PM, Robert Muir rcm...@gmail.com wrote:

> I think I agree with Justin here, I think the way analysis tool highlights 'matches' is extremely misleading, especially considering it completely ignores queryparsing.
>
> it would be better if it put your text in a memoryindex and actually parsed the query w/ queryparser, ran it, and used the highlighter to try to show any matches.

+1

--
Regards,
Shalin Shekhar Mangar.
Re: analysis tool vs. reality
: I think I agree with Justin here, I think the way analysis tool highlights
: 'matches' is extremely misleading, especially considering it completely
: ignores queryparsing.

it really only attempts to identify when there is overlap between analysis at query time and at indexing time so you can easily spot when one analyzer or the other breaks things so that they no longer line up (or when it fixes things so they start to line up)

Even if we eliminated that highlighting as misleading, people would still do it in their minds, it would just be harder -- it doesn't change the underlying fact that analysis is only part of the picture.

: it would be better if it put your text in a memoryindex and actually parsed
: the query w/ queryparser, ran it, and used the highlighter to try to show
: any matches.

That level of query explanation really only works if the user gives us a full document (all fields, not just one) and a full query string, and all of the possible query params -- because the query parser (either implicit because of config, or explicitly specified by the user) might change its behavior based on those other params.

I agree with you: debugging functionality along the lines of what you are describing would be *VASTLY* more useful than what we've got right now, and is something i briefly looked into doing before as an extension of the existing DebugComponent...

https://issues.apache.org/jira/browse/SOLR-1749

...the problems i encountered trying to do it as a debug component on a real Solr request seem like they would also be problems for a MemoryIndex based admin tool approach like what you suggest -- but if you've got ideas on working around them i am 100% interested.

Independent of how we might create a better QueryParser + Analysis Explanation tool / debug component is the question of what we can do to make it more clear what exactly the analysis.jsp page is doing and what people can infer from that page.

As i said, i don't think removing the match highlighting will actually reduce confusion, but perhaps there is verbiage/disclaimers that could be added to make it more clear?

-Hoss
Re: analysis tool vs. reality
Furthermore, I would like to add it's not just the highlight matches functionality that is horribly broken here, but the output of the analysis itself is misleading.

let's say i take 'textTight' from the example, and add the following synonym:

this is broken => broke

the query time analysis is wrong, as it clearly shows synonymfilter collapsing "this is broken" to "broke", but in reality with the qp for that field, you are gonna get 3 separate tokenstreams and this will never actually happen (because the qp will divide it up on whitespace first)

So really the output from 'Query Analyzer' is completely bogus.

On Wed, Aug 4, 2010 at 1:57 PM, Robert Muir rcm...@gmail.com wrote:

> On Wed, Aug 4, 2010 at 1:45 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
>
> > it really only attempts to identify when there is overlap between analysis at query time and at indexing time so you can easily spot when one analyzer or the other breaks things so that they no longer line up (or when it fixes things so they start to line up)
>
> It attempts badly, because it only works in the most trivial of cases (e.g. doesn't reflect the interaction of queryparser with multiword synonyms or worddelimiterfilter).
>
> Since Solr includes these non-trivial analysis components *in the example* it means that this 'highlight matches' doesn't actually even really work at all.
>
> Someone is gonna use this thing when they don't understand why analysis isn't doing what they want, i.e. the cases like I outlined above.
>
> For the trivial cases where it does work the 'highlight matches' isn't useful anyway, so in its current state it's completely unnecessary.
>
> > Even if we eliminated that highlighting as misleading, people would still do it in their minds, it would just be harder -- it doesn't change the underlying fact that analysis is only part of the picture.
>
> I'm not suggesting that. I'm suggesting fixing the highlighting so it's not misleading. There are really only two choices:
>
> 1. remove the current highlighting
> 2. fix it.
>
> in its current state it's completely useless and misleading, except for very trivial cases, in which you don't need it anyway.
>
> > : it would be better if it put your text in a memoryindex and actually parsed
> > : the query w/ queryparser, ran it, and used the highlighter to try to show
> > : any matches.
> >
> > That level of query explanation really only works if the user gives us a full document (all fields, not just one) and a full query string, and all of the possible query params -- because the query parser (either implicit because of config, or explicitly specified by the user) might change its behavior based on those other params.
>
> that's true, but I don't see why the user couldn't be allowed to provide just this. I'd bet money a lot of people are using this thing with a specific query/document in mind anyway!
>
> > people can infer from that page. As i said, i don't think removing the match highlighting will actually reduce confusion, but perhaps there is verbiage/disclaimers that could be added to make it more clear?
>
> As i said before, I think i disagree with you. I think for stuff like this the technicals are less important, what's important is this is a misleading checkbox that really confuses users.
>
> I suggest disabling it entirely, you are only going to remove confusion.
>
> --
> Robert Muir
> rcm...@gmail.com

--
Robert Muir
rcm...@gmail.com
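
The multiword synonym example at the top of this message can be illustrated with a small Python toy. This is not Solr's SynonymFilter: `SYNONYMS` and `synonym_analyzer` are hypothetical, and the analyzer is assumed to do only whitespace tokenizing, lowercasing, and phrase-to-token replacement. It compares analyzing the whole string at once, as the analysis page does, with analyzing each whitespace-separated fragment on its own, as a whitespace-splitting query parser would.

```python
# Hypothetical synonym rule, mirroring "this is broken => broke".
SYNONYMS = {("this", "is", "broken"): ["broke"]}

def synonym_analyzer(text):
    """Toy analyzer: lowercase, split on whitespace, apply phrase synonyms."""
    tokens = text.lower().split()
    out, i = [], 0
    while i < len(tokens):
        for phrase, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.extend(replacement)   # whole phrase collapses to the synonym
                i += len(phrase)
                break
        else:
            out.append(tokens[i])         # no rule matched at this position
            i += 1
    return out

text = "this is broken"

# What an analysis-only view shows: the analyzer sees the full string.
whole_string = synonym_analyzer(text)

# What happens when the parser splits on whitespace first: three separate
# analyzer calls, none of which ever sees the full phrase.
per_fragment = [tok for frag in text.split() for tok in synonym_analyzer(frag)]
```

Here `whole_string` collapses to the single synonym while `per_fragment` keeps all three original words, so the rule never fires in the second case.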
Re: analysis tool vs. reality
there is some kind of caching of query results going on that doesnt get flushed on a restart of tomcat. Yes. Solr by default has http caching on if there is no configuration, and the example solrconfig.xml has it configured on. You should edit solrconfig.xml to use the alternative described in the comments. On Wed, Aug 4, 2010 at 7:55 AM, Justin Lolofie jta...@gmail.com wrote: Wow, I got to work this morning and my query results now include the 'ABC12' document. I'm not sure what that means. Either I made a mistake in the process I described in the last email (I dont think this is the case) or there is some kind of caching of query results going on that doesnt get flushed on a restart of tomcat. Erik: Yes, I did re-index if that means adding the document again. Here are the exact steps I took: 1. analysis.jsp ABC12 does NOT match title ABC12 (however, ABC or 12 does) 2. changed schema.xml WordDelimeterFilterFactory catenate-all 3. restarted tomcat 4. deleted the document with title ABC12 5. added the document with title ABC12 6. query ABC12 does NOT result in the document with title ABC12 7. analysis.jsp ABC12 DOES match that document now Is there any way to see, given an ID, how something is indexed internally? Lance: I understand the index/query sections of analysis.jsp. However, it operates on text that you enter into the form, not on actual index data. Since all my documents have a unique ID, I'd like to supply an ID and a query, and get back the same index/query sections- using whats actually in the index. -- Forwarded message -- From: Erik Hatcher erik.hatc...@gmail.com To: solr-user@lucene.apache.org Date: Tue, 3 Aug 2010 22:43:17 -0400 Subject: Re: analysis tool vs. reality Did you reindex after changing the schema? On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote: Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed- is there a way to see how things are stored internally in the index? 
My query is ABC12. There is a document whose title field is ABC12. However, I can only get it to match if I search for ABC or 12. This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool, ABC12 matches ABC12. However, when doing an actual query, it does not match.

Thank you for any help,
Justin

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index?

Thanks,
Justin

--
Lance Norskog
goks...@gmail.com
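The reply above recommends changing the HTTP caching setting in solrconfig.xml. As a rough aid, here is a minimal Python sketch that reports which way a given solrconfig.xml is set. It assumes the setting lives in a `requestDispatcher/httpCaching` element and that the alternative mentioned in the file's comments is the `never304="true"` attribute; both are recalled from the example config of that era, so check them against the comments in your own file. The sample XML is a hypothetical, trimmed-down config, not a real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down solrconfig.xml used only for illustration.
SAMPLE_SOLRCONFIG = """
<config>
  <requestDispatcher handleSelect="true">
    <httpCaching lastModifiedFrom="openTime" etagSeed="Solr"/>
  </requestDispatcher>
</config>
"""


def http_caching_status(solrconfig_xml):
    """Classify the httpCaching setting found in a solrconfig.xml string.

    Returns "default" when no httpCaching element is present,
    "disabled" when never304="true" is set, and "enabled" otherwise.
    The element and attribute names are assumptions (see the note above).
    """
    root = ET.fromstring(solrconfig_xml)
    node = root.find("./requestDispatcher/httpCaching")
    if node is None:
        return "default"
    if node.get("never304", "false").lower() == "true":
        return "disabled"
    return "enabled"


print(http_caching_status(SAMPLE_SOLRCONFIG))
```

Running it against the contents of your real solrconfig.xml before and after the edit is a quick way to confirm the change was saved to the file you think Solr is reading.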
analysis tool vs. reality
Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in analysis tool and new documents added to the index? Thanks, Justin
Re: analysis tool vs. reality
The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index?

Thanks,
Justin
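For readers who want to try the debugQuery suggestion from a script, here is a minimal Python sketch. It only builds the request URL and picks the parsed query out of an already-decoded JSON response; it does not contact a server. The base URL `http://localhost:8983/solr/select`, the `title:ABC12` query, and the use of `wt=json` are assumptions for illustration, and the sample response is a hand-written placeholder rather than real Solr output.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Build a select URL that asks Solr to include debug information."""
    params = [("q", query), ("debugQuery", "true"), ("wt", "json")]
    return base_url + "?" + urlencode(params)


def parsed_query(response):
    """Return the parsed query string from a decoded JSON response, if any."""
    return response.get("debug", {}).get("parsedquery")


# Assumed host and handler path; adjust to your own installation.
url = debug_query_url("http://localhost:8983/solr/select", "title:ABC12")
print(url)

# Placeholder response shape, not actual output from a server.
sample_response = {"debug": {"parsedquery": "<parsed query string from Solr>"}}
print(parsed_query(sample_response))
```

Comparing the parsed query against the field you meant to search is usually the fastest way to spot the "not querying the fields you think you are" case mentioned above.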
analysis tool vs. reality
Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed; is there a way to see how things are stored internally in the index?

My query is ABC12. There is a document whose title field is ABC12. However, I can only get it to match if I search for ABC or 12. This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool, ABC12 matches ABC12. However, when doing an actual query, it does not match.

Thank you for any help,
Justin

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index?

Thanks,
Justin
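To make the effect of the catenate-all option discussed above more concrete, here is a toy Python model of the splitting behaviour. It is a deliberate simplification and not Lucene's implementation: it only splits on letter/digit boundaries and non-alphanumeric characters, ignores case-change splitting and token positions, and the function names are made up for this illustration.

```python
import re


def split_word_parts(token):
    """Split a token into runs of ASCII letters and runs of digits."""
    return re.findall(r"[A-Za-z]+|[0-9]+", token)


def word_delimiter_tokens(token, catenate_all=False):
    """Toy model: emit the sub-parts, plus the joined form if requested."""
    parts = split_word_parts(token)
    tokens = list(parts)
    if catenate_all and len(parts) > 1:
        # With catenation on, the re-joined parts are indexed as well.
        tokens.append("".join(parts))
    return tokens


print(word_delimiter_tokens("ABC12"))                     # ['ABC', '12']
print(word_delimiter_tokens("ABC12", catenate_all=True))  # ['ABC', '12', 'ABC12']
```

In this simplified picture, the joined term ABC12 only exists in the index for documents that were analyzed after the option was turned on, which is why the thread goes on to ask about re-indexing.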
Re: analysis tool vs. reality
This is the 'index' part of the analysis.jsp page. You can ask how the text is indexed as well as how it is turned into a query.

On Tue, Aug 3, 2010 at 4:35 PM, Justin Lolofie jta...@gmail.com wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed; is there a way to see how things are stored internally in the index?

My query is ABC12. There is a document whose title field is ABC12. However, I can only get it to match if I search for ABC or 12. This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool, ABC12 matches ABC12. However, when doing an actual query, it does not match.

Thank you for any help,
Justin

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index?

Thanks,
Justin

--
Lance Norskog
goks...@gmail.com
Re: analysis tool vs. reality
Did you reindex after changing the schema?

On Aug 3, 2010, at 7:35 PM, Justin Lolofie wrote:

Hi Erik, thank you for replying. So, turning on debugQuery shows information about how the query is processed; is there a way to see how things are stored internally in the index?

My query is ABC12. There is a document whose title field is ABC12. However, I can only get it to match if I search for ABC or 12. This was also true in the analysis tool up until recently. However, I changed schema.xml and turned on catenate-all in WordDelimiterFilterFactory for the title fieldtype. Now, in the analysis tool, ABC12 matches ABC12. However, when doing an actual query, it does not match.

Thank you for any help,
Justin

-- Forwarded message --
From: Erik Hatcher erik.hatc...@gmail.com
To: solr-user@lucene.apache.org
Date: Tue, 3 Aug 2010 16:50:06 -0400
Subject: Re: analysis tool vs. reality

The analysis tool is merely that, but during querying there is also a query parser involved. Adding debugQuery=true to your request will give you the parsed query in the response, offering insight into what might be going on. Could be lots of things, from not querying the fields you think you are to a misunderstanding about some text not being analyzed (like wildcard clauses).

Erik

On Aug 3, 2010, at 4:43 PM, Justin Lolofie wrote:

Hello, I have found the analysis tool in the admin page to be very useful in understanding my schema. I've made changes to my schema so that a particular case I'm looking at matches properly. I restarted solr, deleted the document from the index, and added it again. But still, when I do a query, the document does not get returned in the results. Does anyone have any tips for debugging this sort of issue? What is different between what I see in the analysis tool and new documents added to the index?

Thanks,
Justin
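For reference, re-indexing a single document after a schema change amounts to deleting it, adding it again, and committing. Here is a minimal Python sketch that builds the three XML update messages for those steps; it only constructs the message strings and leaves posting them to the update handler up to you. The `id` field name and the `doc1` value are hypothetical placeholders; only the `title` field and its ABC12 value come from the thread.

```python
from xml.sax.saxutils import escape, quoteattr


def delete_by_id(doc_id):
    """Build an XML update message that deletes one document by unique ID."""
    return "<delete><id>%s</id></delete>" % escape(doc_id)


def add_document(fields):
    """Build an XML update message that adds one document."""
    body = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(value))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % body


# Changes are generally not visible to searches until a commit is sent.
COMMIT = "<commit/>"

messages = [
    delete_by_id("doc1"),                            # hypothetical ID
    add_document({"id": "doc1", "title": "ABC12"}),  # "id" is an assumed field name
    COMMIT,
]
for message in messages:
    print(message)
```

The document is re-analyzed with the current schema at the moment the add message is processed, so the add has to happen after the schema change and restart for the new analysis chain to apply.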