[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-07-18 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067437#comment-13067437
 ] 

Hoss Man commented on SOLR-2477:


Having just looked at this code in SOLR-2663, I'm realizing that as we add more 
types of analyzers, we should really clean up the semantics of how analyzers 
w/o type attributes are treated, and how each of the analyzers defaults if 
it isn't specified.

Consider the following (contrived) example...

{code}
<fieldType name="hoss" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}

Right now (on trunk and with this patch) that config will result in all of the 
analyzers (index/query[/phrase]) using KeywordTokenizerFactory, because the 
type-less analyzer is ignored if there is an analyzer with type=index.  I 
don't think that makes much sense, and as we add more types of analyzers it 
makes even less sense: an analyzer w/o a type attribute should really be the 
default for every other type.

I think we should change the overall flow to be (pseudo-code) ...

{code}

// exactly what is in the config
Analyzer defaultA = readAnalyzer(xpath("./analyzer[not(@type)]"));
Analyzer indexA = readAnalyzer(xpath("./analyzer[@type='index']"));
Analyzer queryA = readAnalyzer(xpath("./analyzer[@type='query']"));
Analyzer phraseA = readAnalyzer(xpath("./analyzer[@type='phrase']"));

if (null != defaultA) {
  // we have an explicit default
  if (null == indexA) indexA = defaultA;
  if (null == queryA) queryA = defaultA;
  if (null == phraseA) phraseA = defaultA;
} else {
  // implicit defaults, either historical or common sense
  if (null == queryA) queryA = indexA;
  if (null == phraseA) phraseA = queryA;
}
{code}
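
As a rough, runnable sketch of that flow (not the actual IndexSchema code), the resolution step could look like the following. Analyzers are stood in for by plain strings, and the class name `AnalyzerDefaults` and method `resolve` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AnalyzerDefaults {

    /**
     * Resolves the effective analyzer for each type. Any argument may be
     * null, meaning that analyzer was not declared in the fieldType.
     * Strings stand in for real Analyzer instances (an assumption made
     * only to keep the sketch self-contained).
     */
    static Map<String, String> resolve(String defaultA, String indexA,
                                       String queryA, String phraseA) {
        if (defaultA != null) {
            // we have an explicit default: it fills every missing type
            if (indexA == null) indexA = defaultA;
            if (queryA == null) queryA = defaultA;
            if (phraseA == null) phraseA = defaultA;
        } else {
            // implicit defaults, either historical or common sense
            if (queryA == null) queryA = indexA;
            if (phraseA == null) phraseA = queryA;
        }
        Map<String, String> effective = new LinkedHashMap<>();
        effective.put("index", indexA);
        effective.put("query", queryA);
        effective.put("phrase", phraseA);
        return effective;
    }

    public static void main(String[] args) {
        // The contrived "hoss" fieldType: a type-less Whitespace analyzer
        // plus an index-only Keyword analyzer.
        System.out.println(resolve("Whitespace", "Keyword", null, null));
        // prints {index=Keyword, query=Whitespace, phrase=Whitespace}
    }
}
```

With this flow, the contrived config above would use KeywordTokenizerFactory only at index time, and the type-less analyzer everywhere else.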

 add analyzer type=phrase
 --

 Key: SOLR-2477
 URL: https://issues.apache.org/jira/browse/SOLR-2477
 Project: Solr
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: SOLR-2477.patch


 This is just exposing LUCENE-2892, so you can easily configure things
 so that if users put things in double quotes they get a more precise search.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-07-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067441#comment-13067441
 ] 

Robert Muir commented on SOLR-2477:
---

+1

If we decide to implement this or SOLR-219 via 'types of analyzers', I don't 
want to think of all the combinations if we do it any other way.

I would even go so far as to say, don't call it defaultA, but instead globalA, 
and if you declare this thing and then also declare some specific analyzer, 
we throw an exception.
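
A minimal sketch of that stricter rule, again with strings standing in for analyzers. The class name `GlobalAnalyzerCheck`, the method `validate`, and the choice of IllegalArgumentException are assumptions for illustration, not anything in the patch.

```java
public class GlobalAnalyzerCheck {

    /**
     * Rejects a fieldType that declares a type-less ("global") analyzer
     * together with any type-specific analyzer. A null argument means
     * that analyzer was not declared.
     */
    static void validate(String globalA, String indexA,
                         String queryA, String phraseA) {
        boolean hasSpecific = indexA != null || queryA != null || phraseA != null;
        if (globalA != null && hasSpecific) {
            // hypothetical error type; real code might use a Solr-specific exception
            throw new IllegalArgumentException(
                "fieldType declares a global analyzer and a type-specific analyzer");
        }
    }

    public static void main(String[] args) {
        validate("Whitespace", null, null, null);      // global only: accepted
        validate(null, "Keyword", "Whitespace", null); // specific only: accepted
        try {
            validate("Whitespace", "Keyword", null, null); // the contrived mix
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this rule there are no combinations to reason about: a fieldType is either fully global or fully explicit.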




[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-06-16 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050642#comment-13050642
 ] 

Hoss Man commented on SOLR-2477:


At first glance this looks great to me ... but we should seriously consider 
whether FieldQParser should also be using getPhraseAnalyzer.  I think given the 
semantics the answer is yes -- but either way it should be clearly documented.

We should also make sure analysis.jsp and the Analysis RequestHandler(s?) have 
options for using this.






[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-06-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050646#comment-13050646
 ] 

Robert Muir commented on SOLR-2477:
---

{quote}
but we should seriously consider whether FieldQParser should also be using 
getPhraseAnalyzer. 
{quote}

Looking at how this is described, it seems to me it should use the phrase 
analyzer... we can document that it does this, and of course the change is 
backwards compatible (because if you don't define it, it's your query analyzer).

{quote}
we should also make sure analysis.jsp and the Analysis RequestHandler(s?) have 
options for using this.
{quote}

I agree... hopefully this isn't too bad.





[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-04-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024944#comment-13024944
 ] 

Yonik Seeley commented on SOLR-2477:


Interesting idea having a separate analyzer to expose this.
It's probably important to come up with a good example for the example schema, 
because I could see it being error-prone if people do it themselves.  For 
example, if they tried your test example (which may look reasonable to someone 
at first blush) they wouldn't get any matches for anything that the WDF would 
normally split?





[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-04-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024946#comment-13024946
 ] 

Robert Muir commented on SOLR-2477:
---

Well, we could maybe add something to the example; I thought it was sort of 
expert.

In my example, they would get matches for things that WDF normally splits, 
but only if the punctuation is exactly as they entered it 
(assume doc 3 is 'foo bar' and doc 4 is 'foo-bar'):
{noformat}
  /** 
   * test punctuation, we preserve the original for this purpose
   */
  public void testPunctuation() {
    assertQ("normal query: ",
        req("fl", "id", "q", "foo-bar", "sort", "id asc"),
        "//*[@numFound='2']",
        "//result/doc[1]/int[@name='id'][.=3]",
        "//result/doc[2]/int[@name='id'][.=4]"
    );

    assertQ("phrase query: ",
        req("fl", "id", "q", "\"foo-bar\"", "sort", "id asc"),
        "//*[@numFound='1']",
        "//result/doc[1]/int[@name='id'][.=4]"
    );
  }
{noformat}

But this was just an example; you don't have to involve WDF to take advantage 
of this (probably stopwords/synonyms/decompounders are the simplest way). I was 
just coming up with an example to have some unit tests.
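
To make the expected hit counts in that test concrete, here is a toy model in plain Java. It is not Lucene or Solr code. It assumes the index analyzer splits on '-' and preserves the original token, the normal query analyzer splits on '-' without preserving it, and the phrase analyzer only splits on whitespace. It also ignores token positions, so it is only a sketch of the matching idea. All names (`PhraseAnalyzerToy`, `indexTokens`, etc.) are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhraseAnalyzerToy {

    /** Toy index analyzer: whitespace split, '-' split, original word kept. */
    static Set<String> indexTokens(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : text.split("\\s+")) {
            tokens.add(word);                              // preserve the original
            tokens.addAll(Arrays.asList(word.split("-"))); // plus the split parts
        }
        return tokens;
    }

    /** Toy normal-query analyzer: split on whitespace and '-'. */
    static Set<String> queryTokens(String text) {
        return new LinkedHashSet<>(Arrays.asList(text.split("[\\s\\-]+")));
    }

    /** Toy phrase analyzer: split on whitespace only, punctuation kept. */
    static Set<String> phraseTokens(String text) {
        return new LinkedHashSet<>(Arrays.asList(text.split("\\s+")));
    }

    /** Returns ids of docs whose index tokens contain every query term. */
    static List<Integer> search(Map<Integer, String> docs, Set<String> terms) {
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, String> doc : docs.entrySet()) {
            if (indexTokens(doc.getValue()).containsAll(terms)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, String> docs = new LinkedHashMap<>();
        docs.put(3, "foo bar");
        docs.put(4, "foo-bar");
        System.out.println(search(docs, queryTokens("foo-bar")));  // prints [3, 4]
        System.out.println(search(docs, phraseTokens("foo-bar"))); // prints [4]
    }
}
```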





[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-04-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024954#comment-13024954
 ] 

Yonik Seeley commented on SOLR-2477:


bq. Well in my example, they would get matches for things that WDF normally 
splits, but only if the punctuation is exactly as they entered it

Ah, I had missed the preserveOriginal on the index analyzer.





[jira] [Commented] (SOLR-2477) add analyzer type=phrase

2011-04-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024958#comment-13024958
 ] 

Robert Muir commented on SOLR-2477:
---

Yeah, still even then, if we want something for the example, maybe it's enough 
to just exclude the SynonymFilter?

