[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067437#comment-13067437 ] Hoss Man commented on SOLR-2477: Having just looked at this code in SOLR-2663, I'm realizing that as we add more types of analyzers, we should really clean up the semantics of how analyzers w/o type attributes are treated, and how each of the analyzers defaults if it isn't specified. Consider the following (contrived) example...
{code}
<fieldType name="hoss" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}
Right now (on trunk and with this patch) that config will result in all of the analyzers (index/query[/phrase]) using KeywordTokenizerFactory, because the type-less analyzer is ignored if there is an analyzer with type=index. I don't think that makes much sense, and as we add more types of analyzers it makes even less sense: an analyzer w/o a type attribute should really be the default for every other type. I think we should change the overall flow to be (pseudo-code) ...
{code}
// exactly what is in the config
Analyzer defaultA = readAnalyzer(xpath("./analyzer[not(@type)]"));
Analyzer indexA  = readAnalyzer(xpath("./analyzer[@type='index']"));
Analyzer queryA  = readAnalyzer(xpath("./analyzer[@type='query']"));
Analyzer phraseA = readAnalyzer(xpath("./analyzer[@type='phrase']"));

if (null != defaultA) {
  // we have an explicit default
  if (null == indexA) indexA = defaultA;
  if (null == queryA) queryA = defaultA;
  if (null == phraseA) phraseA = defaultA;
} else {
  // implicit defaults, either historical or common sense
  if (null == queryA) queryA = indexA;
  if (null == phraseA) phraseA = queryA;
}
{code}
add analyzer type=phrase -- Key: SOLR-2477 URL: https://issues.apache.org/jira/browse/SOLR-2477 Project: Solr Issue Type: Improvement Reporter: Robert Muir Fix For: 4.0 Attachments: SOLR-2477.patch This is just exposing LUCENE-2892, so you can easily configure things so that if users put things in double quotes they get a more precise search. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
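The resolution order in Hoss's pseudo-code can be checked with a small, self-contained Java sketch. It is not Solr's schema-loading code: analyzers are modeled as plain strings, and AnalyzerDefaults, Type, and resolve are hypothetical names. Applied to the contrived "hoss" fieldType, the type-less Whitespace analyzer becomes the query and phrase default while the index analyzer stays Keyword.

{code:java}
import java.util.EnumMap;
import java.util.Map;

/**
 * Sketch of the proposed analyzer-defaulting rules.
 * Analyzers are modeled as plain strings; this is NOT Solr's schema code.
 */
public class AnalyzerDefaults {

    enum Type { INDEX, QUERY, PHRASE }

    /**
     * @param defaultA the type-less analyzer, or null if the fieldType has none
     * @param declared analyzers declared with an explicit type attribute
     * @return the analyzer each type ends up using (null if nothing applies)
     */
    static Map<Type, String> resolve(String defaultA, Map<Type, String> declared) {
        String indexA = declared.get(Type.INDEX);
        String queryA = declared.get(Type.QUERY);
        String phraseA = declared.get(Type.PHRASE);

        if (defaultA != null) {
            // explicit default: fills in every type that was not declared
            if (indexA == null) indexA = defaultA;
            if (queryA == null) queryA = defaultA;
            if (phraseA == null) phraseA = defaultA;
        } else {
            // implicit defaults: query falls back to index, phrase to query
            if (queryA == null) queryA = indexA;
            if (phraseA == null) phraseA = queryA;
        }

        Map<Type, String> resolved = new EnumMap<>(Type.class);
        resolved.put(Type.INDEX, indexA);
        resolved.put(Type.QUERY, queryA);
        resolved.put(Type.PHRASE, phraseA);
        return resolved;
    }

    public static void main(String[] args) {
        // the contrived "hoss" fieldType: type-less Whitespace + type="index" Keyword
        Map<Type, String> declared = new EnumMap<>(Type.class);
        declared.put(Type.INDEX, "Keyword");
        System.out.println(resolve("Whitespace", declared));
        // prints {INDEX=Keyword, QUERY=Whitespace, PHRASE=Whitespace}
    }
}
{code}

Without a type-less analyzer the else branch keeps the historical behavior: resolve(null, declared) for the same declared map gives Keyword for all three types.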
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067441#comment-13067441 ] Robert Muir commented on SOLR-2477: +1. If we decide to implement this or SOLR-219 via 'types of analyzers', I don't want to think about all the combinations we'd have to handle if we did it any other way. I would even go so far as to say: don't call it defaultA, call it globalA instead, and if you declare it and then also declare some type-specific analyzer, we throw an exception.
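Robert's stricter variant, where a type-less "globalA" analyzer may not be mixed with typed ones, might look like the following sketch. As in the previous sketch, analyzers are strings and all names are hypothetical; the choice of IllegalArgumentException is an assumption, since the thread only says "we throw an exception".

{code:java}
import java.util.EnumMap;
import java.util.Map;

/** Sketch of the stricter "globalA" rule; analyzers are plain strings, names are hypothetical. */
public class StrictAnalyzerConfig {

    enum Type { INDEX, QUERY, PHRASE }

    /**
     * @param globalA  the type-less analyzer, or null if absent
     * @param declared only the analyzers declared with a type attribute
     */
    static Map<Type, String> resolve(String globalA, Map<Type, String> declared) {
        Map<Type, String> resolved = new EnumMap<>(Type.class);
        if (globalA != null) {
            // a global analyzer must stand alone: mixing it with typed ones is a config error
            if (!declared.isEmpty()) {
                throw new IllegalArgumentException(
                        "type-less <analyzer> cannot be combined with typed analyzers " + declared.keySet());
            }
            for (Type t : Type.values()) resolved.put(t, globalA);
            return resolved;
        }
        // no global analyzer: keep the fallback chain index -> query -> phrase
        String indexA = declared.get(Type.INDEX);
        String queryA = declared.get(Type.QUERY) != null ? declared.get(Type.QUERY) : indexA;
        String phraseA = declared.get(Type.PHRASE) != null ? declared.get(Type.PHRASE) : queryA;
        resolved.put(Type.INDEX, indexA);
        resolved.put(Type.QUERY, queryA);
        resolved.put(Type.PHRASE, phraseA);
        return resolved;
    }

    public static void main(String[] args) {
        Map<Type, String> declared = new EnumMap<>(Type.class);
        declared.put(Type.INDEX, "Keyword");
        try {
            resolve("Whitespace", declared);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
{code}

Under this rule the contrived "hoss" fieldType would be rejected when the schema is loaded instead of silently using one of its two analyzers everywhere.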
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050642#comment-13050642 ] Hoss Man commented on SOLR-2477: At first glance this looks great to me... but we should seriously consider whether FieldQParser should also be using getPhraseAnalyzer. Given the semantics, I think the answer is yes, but either way it should be clearly documented. We should also make sure analysis.jsp and the Analysis RequestHandler(s?) have options for using this.
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050646#comment-13050646 ] Robert Muir commented on SOLR-2477:
{quote}
but we should seriously consider whether FieldQParser should also be using getPhraseAnalyzer.
{quote}
Looking at how this is described, it seems to me it should use the phrase analyzer... we can document that it does this, and of course the change is backwards compatible (because if you don't define a phrase analyzer, it's your query analyzer).
{quote}
we should also make sure analysis.jsp and the Analysis RequestHandler(s?) have options for using this.
{quote}
I agree... hopefully this isn't too bad.
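The backwards-compatibility argument rests on a simple fallback: when no type=phrase analyzer is declared, the phrase analyzer is the query analyzer, so a parser that switches to it behaves exactly as before. A minimal sketch, with analyzers as strings; FieldTypeSketch and analyzerForFieldParser are hypothetical stand-ins for the field type's getPhraseAnalyzer and a parser such as FieldQParser, not Solr's actual classes.

{code:java}
/**
 * Hypothetical sketch of the getPhraseAnalyzer() fallback.
 * Analyzers are modeled as strings naming an analysis chain.
 */
public class PhraseFallbackSketch {

    static final class FieldTypeSketch {
        private final String queryAnalyzer;
        private final String phraseAnalyzer; // null when the schema declares no phrase analyzer

        FieldTypeSketch(String queryAnalyzer, String phraseAnalyzer) {
            this.queryAnalyzer = queryAnalyzer;
            this.phraseAnalyzer = phraseAnalyzer;
        }

        String getQueryAnalyzer() {
            return queryAnalyzer;
        }

        // falls back to the query analyzer, which is what keeps old schemas unchanged
        String getPhraseAnalyzer() {
            return phraseAnalyzer != null ? phraseAnalyzer : queryAnalyzer;
        }
    }

    /** Stand-in for a field-level parser that has been switched to the phrase analyzer. */
    static String analyzerForFieldParser(FieldTypeSketch ft) {
        return ft.getPhraseAnalyzer();
    }

    public static void main(String[] args) {
        FieldTypeSketch legacy = new FieldTypeSketch("queryChain", null);
        FieldTypeSketch withPhrase = new FieldTypeSketch("queryChain", "phraseChain");
        System.out.println(analyzerForFieldParser(legacy));     // queryChain
        System.out.println(analyzerForFieldParser(withPhrase)); // phraseChain
    }
}
{code}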
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024944#comment-13024944 ] Yonik Seeley commented on SOLR-2477: Interesting idea, having a separate analyzer to expose this. It's probably important to come up with a good example for the example schema, because I could see this being error-prone if people set it up themselves. For example, if they tried your test example (which may look reasonable to someone at first blush), they wouldn't get any matches for anything that the WDF (WordDelimiterFilter) would normally split, would they?
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024946#comment-13024946 ] Robert Muir commented on SOLR-2477: Well, we could maybe add something to the example; I thought it was sort of an expert feature. In my example, though, they would get matches for things that WDF normally splits, but only if the punctuation is exactly as they entered it. Assume doc 3 is 'foo bar' and doc 4 is 'foo-bar':
{noformat}
/**
 * test punctuation, we preserve the original for this purpose
 */
public void testPunctuation() {
  assertQ("normal query: ",
      req("fl", "id", "q", "foo-bar", "sort", "id asc"),
      "//*[@numFound='2']",
      "//result/doc[1]/int[@name='id'][.=3]",
      "//result/doc[2]/int[@name='id'][.=4]"
  );
  assertQ("phrase query: ",
      req("fl", "id", "q", "\"foo-bar\"", "sort", "id asc"),
      "//*[@numFound='1']",
      "//result/doc[1]/int[@name='id'][.=4]"
  );
}
{noformat}
But this was just an example; you don't have to involve WDF to take advantage of this (stopwords, synonyms, or decompounders are probably the simplest way). I was just coming up with an example to have some unit tests.
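To see why the test expects two hits for the normal query and one for the quoted one, here is a toy Java model of the analysis. It is not Lucene code: the WDF step is a crude hyphen split, the query analyzer is assumed to split without preserving the original, and the phrase analyzer is assumed not to split at all. The split query terms are matched as a phrase here, which is also an assumption; the real parser's behavior depends on the field configuration.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy model (not Lucene code) of the testPunctuation example.
 * Index side: whitespace split, then a crude WDF stand-in that splits on '-'
 * and, like preserveOriginal, also keeps the unsplit token.
 */
public class PunctuationDemo {

    /** Index tokens as "term@position"; the original shares its first part's position. */
    static Set<String> indexPostings(String text) {
        Set<String> postings = new HashSet<>();
        int pos = 0;
        for (String tok : text.trim().split("\\s+")) {
            String[] parts = tok.split("-");
            if (parts.length > 1) postings.add(tok + "@" + pos); // preserveOriginal
            for (String part : parts) postings.add(part + "@" + pos++);
        }
        return postings;
    }

    /** Assumed query analyzer: splits on '-' without keeping the original. */
    static List<String> queryTerms(String q) {
        List<String> terms = new ArrayList<>();
        for (String tok : q.trim().split("\\s+")) terms.addAll(Arrays.asList(tok.split("-")));
        return terms;
    }

    /** Assumed phrase analyzer: whitespace only, punctuation is kept. */
    static List<String> phraseTerms(String q) {
        return Arrays.asList(q.trim().split("\\s+"));
    }

    /** True if the terms occur at consecutive positions (a phrase match). */
    static boolean phraseMatch(Set<String> postings, List<String> terms) {
        for (int start = 0; start < postings.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < terms.size() && ok; i++) {
                ok = postings.contains(terms.get(i) + "@" + (start + i));
            }
            if (ok) return true;
        }
        return false;
    }

    /** Returns matching doc ids in ascending order (docs is a sorted map). */
    static List<Integer> search(Map<Integer, String> docs, String q, boolean quoted) {
        List<String> terms = quoted ? phraseTerms(q) : queryTerms(q);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            if (phraseMatch(indexPostings(doc.getValue()), terms)) hits.add(doc.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new TreeMap<>();
        docs.put(3, "foo bar");
        docs.put(4, "foo-bar");
        System.out.println("normal query: " + search(docs, "foo-bar", false)); // [3, 4]
        System.out.println("phrase query: " + search(docs, "foo-bar", true));  // [4]
    }
}
{code}

Doc 4 matches the quoted query only because its unsplit token foo-bar was kept at index time, which is the role preserveOriginal plays in the test.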
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024954#comment-13024954 ] Yonik Seeley commented on SOLR-2477:
bq. Well in my example, they would get matches for things that WDF normally splits, but only if the punctuation is exactly as they entered it
Ah, I had missed the preserveOriginal on the index analyzer.
[jira] [Commented] (SOLR-2477) add analyzer type=phrase
[ https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13024958#comment-13024958 ] Robert Muir commented on SOLR-2477: Yeah, but even then, if we want something for the example, maybe it's enough to just exclude the SynonymFilter?
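Robert's suggestion, deriving the example's phrase analyzer from its query analyzer by dropping the SynonymFilter, can be illustrated with a toy chain of token steps. This is not Lucene's analysis API: the steps, the one-entry synonym table, and all names are hypothetical, and a real SynonymFilter would emit synonyms at the same position rather than simply appending them.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Toy sketch (not Lucene code): the phrase chain is the query chain with the
 * synonym step removed, so quoted input is not synonym-expanded.
 */
public class ExcludeSynonymsDemo {

    // hypothetical synonym table standing in for a synonyms file
    static final Map<String, List<String>> SYNONYMS = Map.of("tv", List.of("tv", "television"));

    static final UnaryOperator<List<String>> LOWERCASE = toks -> {
        List<String> out = new ArrayList<>();
        for (String t : toks) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    };

    static final UnaryOperator<List<String>> SYNONYMS_STEP = toks -> {
        List<String> out = new ArrayList<>();
        for (String t : toks) out.addAll(SYNONYMS.getOrDefault(t, List.of(t)));
        return out;
    };

    static final List<UnaryOperator<List<String>>> QUERY_CHAIN = List.of(LOWERCASE, SYNONYMS_STEP);

    // same steps in the same order, minus the synonym step
    static final List<UnaryOperator<List<String>>> PHRASE_CHAIN = without(QUERY_CHAIN, SYNONYMS_STEP);

    static <T> List<T> without(List<T> chain, T step) {
        List<T> copy = new ArrayList<>(chain);
        copy.remove(step); // removes by identity, since lambdas do not override equals
        return copy;
    }

    /** Whitespace-tokenizes the text, then applies each step in order. */
    static List<String> analyze(List<UnaryOperator<List<String>>> chain, String text) {
        List<String> toks = Arrays.asList(text.trim().split("\\s+"));
        for (UnaryOperator<List<String>> step : chain) toks = step.apply(toks);
        return toks;
    }

    public static void main(String[] args) {
        System.out.println(analyze(QUERY_CHAIN, "TV"));  // [tv, television]
        System.out.println(analyze(PHRASE_CHAIN, "TV")); // [tv]
    }
}
{code}

Deriving one chain from the other keeps the query and phrase analyzers from drifting apart: every other step is shared, and only the synonym expansion differs.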