RE: Complexphrase treats wildcards differently than other query parsers

2017-10-09 Thread Allison, Timothy B.
  Right.  Sorry.

Despite appearances to the contrary, I'm not a bot designed to lead you down 
the garden path of debugging for yourself with the goal of increasing the size 
of the Solr contributor pool...

I confirmed the failure in 6.x, but all seems to work in 7.x and trunk.  I 
opened SOLR-11450 and attached a unit test based on your correction of mine. 

Thank you, again!


-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] 
Sent: Monday, October 9, 2017 8:39 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks again, Tim,
following your recipe, I was able to write a failing test:

assertQ(req("q", "{!complexphrase} iso-latin1:cr\u00E6zy*")
    , "//result[@numFound='1']"
    , "//doc[./str[@name='id']='1']"
);

Notice how cr\u00E6zy* is used as the query term. This mimics the behaviour I 
originally reported: CPQP (ComplexPhraseQueryParser) does not analyse the term 
because of the wildcard, so the charfilter is never applied on the query side.


2017-10-06 20:54 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> That could be it.  I'm not able to reproduce this with trunk.  More 
> next week.
>
> In trunk, if I add this to schema15.xml:
>   [XML stripped by the archive; surviving fragments:]
>   ... mapping="mapping-ISOLatin1Accent.txt"/>
>   ... stored="true"/>
>
> This test passes.
>
>   @Test
>   public void testCharFilter() {
> assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
> assertU(commit());
> assertU(optimize());
>
> assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:traen")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>   }
>
>
>
> -----Original Message-----
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Friday, October 6, 2017 6:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other 
> query parsers
>
> Thanks a lot for your effort, Tim.
>
> Looking at it from the Solr side, I see some use of local classes. The 
> snippet below in particular caught my eye (in 
> solr/core/src/java/org/apache/ solr/search/ComplexPhraseQParserPlugin.java).
> The instance of ComplexPhraseQueryParser is not the clean one from 
> Lucene, but a modified one. If any of the modifications messes with 
> the analysis logic, well then that might answer it.
>
> What do you make of it?
>
> lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().
>

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-09 Thread Bjarke Buur Mortensen
Thanks again, Tim,
following your recipe, I was able to write a failing test:

assertQ(req("q", "{!complexphrase} iso-latin1:cr\u00E6zy*")
    , "//result[@numFound='1']"
    , "//doc[./str[@name='id']='1']"
);

Notice how cr\u00E6zy* is used as the query term. This mimics the behaviour I
originally reported: CPQP (ComplexPhraseQueryParser) does not analyse the term
because of the wildcard, so the charfilter is never applied on the query side.
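
To make the expectation concrete, here is a minimal sketch in plain Java of what I would expect query-side analysis to do with a wildcard term. It is only an illustration: WildcardCharMapping and its two-entry mapping table are hypothetical stand-ins for the char filter and are not Lucene or Solr code.

```java
import java.util.Map;

/** Toy stand-in for a mapping char filter; not Lucene's MappingCharFilter. */
final class WildcardCharMapping {

    // Hypothetical mapping table with the two characters discussed in this thread.
    private static final Map<Character, String> MAPPING = Map.of(
            '\u00E4', "a",    // a-umlaut -> a
            '\u00E6', "ae");  // ae ligature -> ae

    /** Maps the literal characters of a term and copies the wildcards * and ? unchanged. */
    static String normalize(String wildcardTerm) {
        StringBuilder out = new StringBuilder();
        for (char c : wildcardTerm.toCharArray()) {
            if (c == '*' || c == '?') {
                out.append(c); // wildcard operators are not mapped
            } else {
                out.append(MAPPING.getOrDefault(c, String.valueOf(c)));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("funktionsneds\u00E4ttning*")); // funktionsnedsattning*
        System.out.println(normalize("cr\u00E6zy*"));                // craezy*
    }
}
```

With a mapping like this applied on the query side, the failing test above would look for craezy* and find the document; without it, the raw term is compared against the already mapped index terms and nothing matches.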


2017-10-06 20:54 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> That could be it.  I'm not able to reproduce this with trunk.  More next
> week.
>
> In trunk, if I add this to schema15.xml:
>   [XML stripped by the archive; surviving fragment:]
>   ... stored="true"/>
>
> This test passes.
>
>   @Test
>   public void testCharFilter() {
> assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
> assertU(commit());
> assertU(optimize());
>
> assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:traen")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>
> assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
> , "//result[@numFound='1']"
> , "//doc[./str[@name='id']='1']"
> );
>   }
>
>
>
> -----Original Message-----
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Friday, October 6, 2017 6:46 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other query
> parsers
>
> Thanks a lot for your effort, Tim.
>
> Looking at it from the Solr side, I see some use of local classes. The
> snippet below in particular caught my eye (in solr/core/src/java/org/apache/
> solr/search/ComplexPhraseQParserPlugin.java).
> The instance of ComplexPhraseQueryParser is not the clean one from Lucene,
> but a modified one. If any of the modifications messes with the analysis
> logic, well then that might answer it.
>
> What do you make of it?
>
> lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer()) {
>   protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
>     try {
>       org.apache.lucene.search.Query wildcardQuery =
>           reverseAwareParser.getWildcardQuery(t.field(), t.text());
>       setRewriteMethod(wildcardQuery);
>       return wildcardQuery;
>     } catch (SyntaxError e) {
>       throw new RuntimeException(e);
>     }
>   }
>
>   private Query setRewriteMethod(org.apache.lucene.search.Query query) {
>     if (query instanceof MultiTermQuery) {
>       ((MultiTermQuery) query).setRewriteMethod(
>           org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
>     }
>     return query;
>   }
>
>   protected Query newRangeQuery(String field, String part1, String part2,
>       boolean startInclusive, boolean endInclusive) {
>     boolean reverse =
>         reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
>     return super.newRangeQuery(field,
>         reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
>         part2,
>         startInclusive || reverse,
>         endInclusive);
>   }
> };
>
> Thanks,
> Bjarke
>
>
>


RE: Complexphrase treats wildcards differently than other query parsers

2017-10-06 Thread Allison, Timothy B.
That could be it.  I'm not able to reproduce this with trunk.  More next week.

In trunk, if I add this to schema15.xml:

  [XML stripped by the archive]


This test passes.

  @Test
  public void testCharFilter() {
    assertU(adoc("iso-latin1", "cr\u00E6zy tr\u00E6n", "id", "1"));
    assertU(commit());
    assertU(optimize());

    assertQ(req("q", "{!complexphrase} iso-latin1:craezy")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:traen")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:caezy~1")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:crae*")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:*aezy")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:crae*y")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:\"craezy traen\"")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:\"caezy~1 traen\"")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:\"craez* traen\"")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:\"*aezy traen\"")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );

    assertQ(req("q", "{!complexphrase} iso-latin1:\"crae*y traen\"")
        , "//result[@numFound='1']"
        , "//doc[./str[@name='id']='1']"
    );
  }



-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] 
Sent: Friday, October 6, 2017 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks a lot for your effort, Tim.

Looking at it from the Solr side, I see some use of local classes. The snippet 
below in particular caught my eye (in 
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
The instance of ComplexPhraseQueryParser is not the clean one from Lucene, but 
a modified one. If any of the modifications messes with the analysis logic, 
well then that might answer it.

What do you make of it?

lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer()) {
  protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
    try {
      org.apache.lucene.search.Query wildcardQuery =
          reverseAwareParser.getWildcardQuery(t.field(), t.text());
      setRewriteMethod(wildcardQuery);
      return wildcardQuery;
    } catch (SyntaxError e) {
      throw new RuntimeException(e);
    }
  }

  private Query setRewriteMethod(org.apache.lucene.search.Query query) {
    if (query instanceof MultiTermQuery) {
      ((MultiTermQuery) query).setRewriteMethod(
          org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
    }
    return query;
  }

  protected Query newRangeQuery(String field, String part1, String part2,
      boolean startInclusive, boolean endInclusive) {
    boolean reverse =
        reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
    return super.newRangeQuery(field,
        reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
        part2,
        startInclusive || reverse,
        endInclusive);
  }
};

Thanks,
Bjarke




Re: Complexphrase treats wildcards differently than other query parsers

2017-10-06 Thread Bjarke Buur Mortensen
Thanks a lot for your effort, Tim.

Looking at it from the Solr side, I see some use of local classes. The
snippet below in particular caught my eye (in
solr/core/src/java/org/apache/solr/search/ComplexPhraseQParserPlugin.java).
The instance of ComplexPhraseQueryParser is not the clean one from Lucene,
but a modified one. If any of the modifications messes with the analysis
logic, well then that might answer it.

What do you make of it?

lparser = new ComplexPhraseQueryParser(defaultField, getReq().getSchema().getQueryAnalyzer()) {
  protected Query newWildcardQuery(org.apache.lucene.index.Term t) {
    try {
      org.apache.lucene.search.Query wildcardQuery =
          reverseAwareParser.getWildcardQuery(t.field(), t.text());
      setRewriteMethod(wildcardQuery);
      return wildcardQuery;
    } catch (SyntaxError e) {
      throw new RuntimeException(e);
    }
  }

  private Query setRewriteMethod(org.apache.lucene.search.Query query) {
    if (query instanceof MultiTermQuery) {
      ((MultiTermQuery) query).setRewriteMethod(
          org.apache.lucene.search.MultiTermQuery.SCORING_BOOLEAN_REWRITE);
    }
    return query;
  }

  protected Query newRangeQuery(String field, String part1, String part2,
      boolean startInclusive, boolean endInclusive) {
    boolean reverse =
        reverseAwareParser.isRangeShouldBeProtectedFromReverse(field, part1);
    return super.newRangeQuery(field,
        reverse ? reverseAwareParser.getLowerBoundForReverse() : part1,
        part2,
        startInclusive || reverse,
        endInclusive);
  }
};
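
To show the kind of effect I have in mind, here is a toy sketch in plain Java. ToyParser and OverrideDemo are made-up names and the lower-casing step is only a placeholder for query-time analysis; none of this is the actual Lucene or Solr code, and I am not claiming this is what happens there.

```java
import java.util.Locale;

/** Toy base parser; it is not the Lucene ComplexPhraseQueryParser. */
class ToyParser {
    /** Base behaviour: normalise the term before building the "query" string. */
    public String newWildcardQuery(String field, String term) {
        return field + ":" + normalize(term);
    }

    /** Placeholder for query-time analysis (here: plain lower-casing). */
    public String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }
}

final class OverrideDemo {
    /** Builds a parser whose override never calls normalize(). */
    static ToyParser modifiedParser() {
        return new ToyParser() {
            @Override
            public String newWildcardQuery(String field, String term) {
                // The override builds the query itself and skips the base-class step.
                return field + ":" + term;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(new ToyParser().newWildcardQuery("name", "CRAE*"));  // name:crae*
        System.out.println(modifiedParser().newWildcardQuery("name", "CRAE*")); // name:CRAE*
    }
}
```

If an override in the anonymous subclass replaces a method that normally performs the analysis, the term reaches the index untouched, which would look exactly like the behaviour I reported.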

Thanks,
Bjarke

2017-10-05 21:15 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> After some more digging, I'm wrong even at the Lucene level.
>
> When I use the CustomAnalyzer and make my UC vowel mock filter
> MultitermAware, I get this with Lucene in trunk:
>
> "the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck
>
> So, there's room for improvement with phrases, but the regular multiterms
> should be ok.
>
> Still no answer for you...
>
> 2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:
>
> > There's every chance that I'm missing something at the Solr level, but
> > it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still
> > not applying analysis to multiterms.
> >
> > When I call this on 7.0.0:
> >    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
> >    return qp.parse(qString);
> >
> > where the analyzer is a mock "uppercase vowel" analyzer[1] and the
> > qString is:
> >
> > "the* quick~" the* quick~ the quick
> >
> > I get this:
> > "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
>


RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
After some more digging, I'm wrong even at the Lucene level.

When I use the CustomAnalyzer and make my UC vowel mock filter MultitermAware, 
I get this with Lucene in trunk:

"the* quick~" name:thE* name:qUIck~2 name:thE name:qUIck

So, there's room for improvement with phrases, but the regular multiterms 
should be ok.
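
For anyone following along, the difference between the two outputs can be sketched in plain Java. This is a toy, not the Lucene parser: UpperCaseVowelDemo and its render method are made-up names, and the analyzeMultiterm flag merely imitates whether the mock filter is applied to wildcard terms.

```java
/** Toy illustration of the mock "uppercase vowel" analysis; not Lucene code. */
final class UpperCaseVowelDemo {

    /** Mock analysis step: upper-cases the vowels a, e, i, o, u. */
    static String upperCaseVowels(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            out.append("aeiou".indexOf(c) >= 0 ? Character.toUpperCase(c) : c);
        }
        return out.toString();
    }

    /** Renders field:term; wildcard terms are only analysed when analyzeMultiterm is true. */
    static String render(String field, String term, boolean analyzeMultiterm) {
        boolean isMultiterm = term.indexOf('*') >= 0 || term.indexOf('?') >= 0;
        String text = (!isMultiterm || analyzeMultiterm) ? upperCaseVowels(term) : term;
        return field + ":" + text;
    }

    public static void main(String[] args) {
        System.out.println(render("name", "the*", false)); // name:the*
        System.out.println(render("name", "the*", true));  // name:thE*
        System.out.println(render("name", "quick", false)); // name:qUIck
    }
}
```

The first line corresponds to what I saw before making the filter MultitermAware, the second to the output above.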

Still no answer for you...

2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> There's every chance that I'm missing something at the Solr level, but 
> it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still 
> not applying analysis to multiterms.
>
> When I call this on 7.0.0:
>    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>    return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the 
> qString is:
>
> "the* quick~" the* quick~ the quick
>
> I get this:
> "the* quick~" name:the* name:quick~2 name:thE name:qUIck



RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
Probably the usual reasons: no one has submitted a patch yet, or it could be a 
regression after LUCENE-7355.

See also:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201407.mbox/%3c1d06a081892adf4589bd83ee24b9dc3025971...@imcmbx02.mitre.org%3E

I'll take a look.


-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] 
Sent: Thursday, October 5, 2017 8:52 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

Thanks Tim,
that might be what I'm experiencing. I'm actually quite certain of it :-)

Do you remember any reason that multi term analysis is not happening in 
ComplexPhraseQueryParser?

I'm on 6.6.1, so latest on the 6.x branch.

2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> There's every chance that I'm missing something at the Solr level, but 
> it _looks_ at the Lucene level, like ComplexPhraseQueryParser is still 
> not applying analysis to multiterms.
>
> When I call this on 7.0.0:
>    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>    return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the 
> qString is:
>
> "the* quick~" the* quick~ the quick
>
> I get this:
> "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
>
> [1] https://github.com/tballison/lucene-addons/blob/master/
> lucene-5205/src/test/java/org/apache/lucene/queryparser/
> spans/TestAdvancedAnalyzers.java#L117
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Thursday, October 5, 2017 8:02 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Complexphrase treats wildcards differently than other 
> query parsers
>
> What version of Solr are you using?
>
> I thought this had been fixed fairly recently, but I can't quickly 
> find the JIRA.  Let me take a look.
>
> Best,
>
>  Tim
>
> This was one of my initial reasons for my SpanQueryParser 
> LUCENE-5205[1] and [2], which handles analysis of multiterms even in phrases.
>
> [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
> [2] https://mvnrepository.com/artifact/org.tallison.lucene/
> lucene-5205/6.6-0.1
>
> -----Original Message-
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Thursday, October 5, 2017 6:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other 
> query parsers
>
> 2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
>
> > Hi Bjarke,
> > You are right - I jumped to a wrong/old conclusion as the simplest 
> > answer to your question.
>
>
>  No problem :-)
>
> > I guess looking at the code could give you an answer.
> >
>
> This is what I would like to avoid out of fear that my head would 
> explode
> ;-)
>
>
> >
> > Thanks,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> > Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen 
> > > <morten...@eluence.com>
> > wrote:
> > >
> > > Well, according to
> > > https://lucidworks.com/2011/11/29/whats-with-lowercasing-
> > wildcard-multiterm-queries-in-solr/
> > > multiterm means
> > >
> > > wildcard
> > > range
> > > prefix
> > >
> > > so that is how I'm using the word. That same article explains 
> > > how analysis will be performed with wildcards if the analyzers are 
> > > multi-term aware.
> > > Furthermore, both lucene and dismax do the correct analysis, so I 
> > > don't think you are right in your statement about the majority of 
> > > QPs skipping analysis for wildcards.
> > >
> > > So I'm still confused as to why complexphrase does things differently.
> > >
> > > Thanks,
> > > /Bjarke
> > >
> > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović 
> > ><emir.arnauto...@sematext.com
> > >:
> > >
> > >> Hi Bjarke,
> > >> It is not multiterm that is causing query parser to skip analysis 
> > >> chain but wildcard. The majority of query parsers do not analyse 
> > >> query string
> > if
> > >> there are wildcards.
> > >>
> > >> HTH
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 

Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Bjarke Buur Mortensen
Thanks Tim,
that might be what I'm experiencing. I'm actually quite certain of it :-)

Do you remember any reason that multi term analysis is not happening in
ComplexPhraseQueryParser?

I'm on 6.6.1, so latest on the 6.x branch.

2017-10-05 14:34 GMT+02:00 Allison, Timothy B. <talli...@mitre.org>:

> There's every chance that I'm missing something at the Solr level, but it
> _looks_ at the Lucene level, like ComplexPhraseQueryParser is still not
> applying analysis to multiterms.
>
> When I call this on 7.0.0:
>    QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
>    return qp.parse(qString);
>
> where the analyzer is a mock "uppercase vowel" analyzer[1] and the
> qString is:
>
> "the* quick~" the* quick~ the quick
>
> I get this:
> "the* quick~" name:the* name:quick~2 name:thE name:qUIck
>
>
> [1] https://github.com/tballison/lucene-addons/blob/master/
> lucene-5205/src/test/java/org/apache/lucene/queryparser/
> spans/TestAdvancedAnalyzers.java#L117
>
> -----Original Message-----
> From: Allison, Timothy B. [mailto:talli...@mitre.org]
> Sent: Thursday, October 5, 2017 8:02 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Complexphrase treats wildcards differently than other query
> parsers
>
> What version of Solr are you using?
>
> I thought this had been fixed fairly recently, but I can't quickly find
> the JIRA.  Let me take a look.
>
> Best,
>
>  Tim
>
> This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1]
> and [2], which handles analysis of multiterms even in phrases.
>
> [1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
> [2] https://mvnrepository.com/artifact/org.tallison.lucene/
> lucene-5205/6.6-0.1
>
> -----Original Message-
> From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
> Sent: Thursday, October 5, 2017 6:28 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Complexphrase treats wildcards differently than other query
> parsers
>
> 2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:
>
> > Hi Bjarke,
> > You are right - I jumped to a wrong/old conclusion as the simplest
> > answer to your question.
>
>
>  No problem :-)
>
> > I guess looking at the code could give you an answer.
> >
>
> This is what I would like to avoid out of fear that my head would explode
> ;-)
>
>
> >
> > Thanks,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> > Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> > > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen
> > > <morten...@eluence.com>
> > wrote:
> > >
> > > Well, according to
> > > https://lucidworks.com/2011/11/29/whats-with-lowercasing-
> > wildcard-multiterm-queries-in-solr/
> > > multiterm means
> > >
> > > wildcard
> > > range
> > > prefix
> > >
> > > so that is how I'm using the word. That same article explains how
> > > analysis will be performed with wildcards if the analyzers are
> > > multi-term aware.
> > > Furthermore, both lucene and dismax do the correct analysis, so I
> > > don't think you are right in your statement about the majority of
> > > QPs skipping analysis for wildcards.
> > >
> > > So I'm still confused as to why complexphrase does things differently.
> > >
> > > Thanks,
> > > /Bjarke
> > >
> > > 2017-10-05 10:16 GMT+02:00 Emir Arnautović
> > ><emir.arnauto...@sematext.com
> > >:
> > >
> > >> Hi Bjarke,
> > >> It is not multiterm that is causing query parser to skip analysis
> > >> chain but wildcard. The majority of query parsers do not analyse
> > >> query string
> > if
> > >> there are wildcards.
> > >>
> > >> HTH
> > >> Emir
> > >> --
> > >> Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> > >> Elasticsearch Consulting Support Training - http://sematext.com/
> > >>
> > >>
> > >>
> > >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen
> > >>> <morten...@eluence.com>
> > >> wrote:
> > >>>
> > >>> Hi list,
> > >>>
> > >>> I'm trying to search for the term funktionsnedsättning* In my
> > >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.

RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
There's every chance that I'm missing something at the Solr level, but it 
_looks_ at the Lucene level, like ComplexPhraseQueryParser is still not 
applying analysis to multiterms.

When I call this on 7.0.0:
   QueryParser qp = new ComplexPhraseQueryParser(defaultFieldName, analyzer);
   return qp.parse(qString);

where the analyzer is a mock "uppercase vowel" analyzer[1] and the qString is:

"the* quick~" the* quick~ the quick

I get this:
"the* quick~" name:the* name:quick~2 name:thE name:qUIck


[1] 
https://github.com/tballison/lucene-addons/blob/master/lucene-5205/src/test/java/org/apache/lucene/queryparser/spans/TestAdvancedAnalyzers.java#L117

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Thursday, October 5, 2017 8:02 AM
To: solr-user@lucene.apache.org
Subject: RE: Complexphrase treats wildcards differently than other query parsers

What version of Solr are you using?

I thought this had been fixed fairly recently, but I can't quickly find the 
JIRA.  Let me take a look.

Best,

 Tim

This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and 
[2], which handles analysis of multiterms even in phrases.

[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1 

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com]
Sent: Thursday, October 5, 2017 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:

> Hi Bjarke,
> You are right - I jumped to a wrong/old conclusion as the simplest 
> answer to your question.


 No problem :-)

> I guess looking at the code could give you an answer.
>

This is what I would like to avoid out of fear that my head would explode
;-)


>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen 
> > <morten...@eluence.com>
> wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-
> wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so that is how I'm using the word. That same article explains how 
> > analysis will be performed with wildcards if the analyzers are 
> > multi-term aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I 
> > don't think you are right in your statement about the majority of 
> > QPs skipping analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović 
> ><emir.arnauto...@sematext.com
> >:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing query parser to skip analysis 
> >> chain but wildcard. The majority of query parsers do not analyse 
> >> query string
> if
> >> there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> >> Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen 
> >>> <morten...@eluence.com>
> >> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning* In my 
> >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to 
> >>> funktionsnedsattning*.
> >>>
> >>> If I use e.g. the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives 
> >>> me "rawquerystring":"funktionsnedsättning*", "querystring":
> >>> "funktionsnedsättning*", "parsedquery":"content_ol:
> >> funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase gives me:
> >>> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning
> >>> *
> >> gives me
> >>> "rawquerystring":"funktionsnedsättning*", "querystring":
> >>> "funktionsnedsättning*", "parsedquery":"content_ol:
> >> funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis 
> >>> chain
> for
> >>> multiterms, even though components and in particular 
> >>> MappingCharFilterFactory are Multi-term aware
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
> >>
> >>
>
>


RE: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Allison, Timothy B.
What version of Solr are you using?

I thought this had been fixed fairly recently, but I can't quickly find the 
JIRA.  Let me take a look.

Best,

 Tim

This was one of my initial reasons for my SpanQueryParser LUCENE-5205[1] and 
[2], which handles analysis of multiterms even in phrases.

[1] https://github.com/tballison/lucene-addons/tree/master/lucene-5205
[2] https://mvnrepository.com/artifact/org.tallison.lucene/lucene-5205/6.6-0.1 

-----Original Message-----
From: Bjarke Buur Mortensen [mailto:morten...@eluence.com] 
Sent: Thursday, October 5, 2017 6:28 AM
To: solr-user@lucene.apache.org
Subject: Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 11:29 GMT+02:00 Emir Arnautović <emir.arnauto...@sematext.com>:

> Hi Bjarke,
> You are right - I jumped to a wrong/old conclusion as the simplest 
> answer to your question.


 No problem :-)

> I guess looking at the code could give you an answer.
>

This is what I would like to avoid out of fear that my head would explode
;-)


>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen 
> > <morten...@eluence.com>
> wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-
> wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so that is how I'm using the word. That same article explains how 
> > analysis will be performed with wildcards if the analyzers are 
> > multi-term aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I 
> > don't think you are right in your statement about the majority of 
> > QPs skipping analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović 
> ><emir.arnauto...@sematext.com
> >:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing query parser to skip analysis 
> >> chain but wildcard. The majority of query parsers do not analyse 
> >> query string
> if
> >> there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> >> Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen 
> >>> <morten...@eluence.com>
> >> wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning* In my 
> >>> analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to 
> >>> funktionsnedsattning*.
> >>>
> >>> If I use e.g. the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives 
> >>> me "rawquerystring":"funktionsnedsättning*", "querystring":
> >>> "funktionsnedsättning*", "parsedquery":"content_ol:
> >> funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase gives me:
> >>> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*", "querystring":
> >>> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis
> >>> chain for multiterms, even though components and in particular
> >>> MappingCharFilterFactory are Multi-term aware
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
> >>
> >>
>
>


Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Bjarke Buur Mortensen
2017-10-05 11:29 GMT+02:00 Emir Arnautović :

> Hi Bjarke,
> You are right - I jumped into wrong/old conclusion as the simplest answer
> to your question.


No problem :-)

> I guess looking at the code could give you an answer.

This is what I would like to avoid out of fear that my head would explode ;-)


>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen wrote:
> >
> > Well, according to
> > https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> > multiterm means
> >
> > wildcard
> > range
> > prefix
> >
> > so it is that way i'm using the word. That same article explains how
> > analysis will be performed with wildcards if the analyzers are multi-term
> > aware.
> > Furthermore, both lucene and dismax do the correct analysis, so I don't
> > think you are right in your statement about the majority of QPs skipping
> > analysis for wildcards.
> >
> > So I'm still confused as to why complexphrase does things differently.
> >
> > Thanks,
> > /Bjarke
> >
> > 2017-10-05 10:16 GMT+02:00 Emir Arnautović:
> >
> >> Hi Bjarke,
> >> It is not multiterm that is causing query parser to skip analysis chain
> >> but wildcard. The majority of query parsers do not analyse query string if
> >> there are wildcards.
> >>
> >> HTH
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
> >>>
> >>> Hi list,
> >>>
> >>> I'm trying to search for the term funktionsnedsättning*
> >>> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> >>> So I would expect that funktionsnedsättning* would translate to
> >>> funktionsnedsattning*.
> >>>
> >>> If I use e.g. the lucene query parser, this is indeed what happens:
> >>> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*", "querystring":
> >>> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> >>> and 15 documents returned.
> >>>
> >>> Trying the same with complexphrase gives me:
> >>> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
> >>> "rawquerystring":"funktionsnedsättning*", "querystring":
> >>> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> >>> and 0 documents. Notice how ä has not been changed to a.
> >>>
> >>> How can this be? Is complexphrase somehow skipping the analysis chain for
> >>> multiterms, even though components and in particular
> >>> MappingCharFilterFactory are Multi-term aware
> >>>
> >>> Are there any configuration gotchas that I'm not aware of?
> >>>
> >>> Thanks for the help,
> >>> Bjarke Buur Mortensen
> >>> Senior Software Engineer, Eluence A/S
> >>
> >>
>
>


Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Emir Arnautović
Hi Bjarke,
You are right - I jumped to a wrong, outdated conclusion as the simplest answer
to your question. I guess looking at the code could give you an answer.

Thanks,
Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 5 Oct 2017, at 10:44, Bjarke Buur Mortensen  wrote:
> 
> Well, according to
> https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
> multiterm means
> 
> wildcard
> range
> prefix
> 
> so it is that way i'm using the word. That same article explains how
> analysis will be performed with wildcards if the analyzers are multi-term
> aware.
> Furthermore, both lucene and dismax do the correct analysis, so I don't
> think you are right in your statement about the majority of QPs skipping
> analysis for wildcards.
> 
> So I'm still confused as to why complexphrase does things differently.
> 
> Thanks,
> /Bjarke
> 
> 2017-10-05 10:16 GMT+02:00 Emir Arnautović :
> 
>> Hi Bjarke,
>> It is not multiterm that is causing query parser to skip analysis chain
>> but wildcard. The majority of query parsers do not analyse query string if
>> there are wildcards.
>> 
>> HTH
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
>>> 
>>> Hi list,
>>> 
>>> I'm trying to search for the term funktionsnedsättning*
>>> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
>>> So I would expect that funktionsnedsättning* would translate to
>>> funktionsnedsattning*.
>>> 
>>> If I use e.g. the lucene query parser, this is indeed what happens:
>>> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives me
>>> "rawquerystring":"funktionsnedsättning*", "querystring":
>>> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
>>> and 15 documents returned.
>>> 
>>> Trying the same with complexphrase gives me:
>>> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
>>> "rawquerystring":"funktionsnedsättning*", "querystring":
>>> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
>>> and 0 documents. Notice how ä has not been changed to a.
>>> 
>>> How can this be? Is complexphrase somehow skipping the analysis chain for
>>> multiterms, even though components and in particular
>>> MappingCharFilterFactory are Multi-term aware
>>> 
>>> Are there any configuration gotchas that I'm not aware of?
>>> 
>>> Thanks for the help,
>>> Bjarke Buur Mortensen
>>> Senior Software Engineer, Eluence A/S
>> 
>> 



Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Bjarke Buur Mortensen
Well, according to
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
multiterm means

wildcard
range
prefix

so that is the way I'm using the word. That same article explains how
analysis will be performed with wildcards if the analyzers are multi-term
aware.
Furthermore, both lucene and dismax do the correct analysis, so I don't
think you are right in your statement about the majority of QPs skipping
analysis for wildcards.
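
To make the expectation concrete, here is a minimal plain-Java sketch of what multi-term-aware analysis of a wildcard term amounts to: the character mapping is applied to the literal characters while the wildcard operators are left alone. This is not Lucene or Solr code. The class name, the method name and the two-entry mapping table are assumptions made purely for illustration (the thread itself only states the ä to a mapping; æ to ae is taken from the test case discussed elsewhere in the thread).

```java
import java.util.Map;

// Illustrative only: a stand-in for what a multi-term-aware char filter
// is expected to do to a wildcard term. Not Solr's implementation.
class WildcardCharMapping {

    // Hypothetical mapping table: \u00E4 is ä, \u00E6 is æ.
    private static final Map<Character, String> MAPPING =
            Map.of('\u00E4', "a", '\u00E6', "ae");

    // Maps every literal character through the table, but passes the
    // wildcard operators '*' and '?' through unchanged.
    static String normalizeWildcardTerm(String term) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '*' || c == '?') {
                out.append(c);
            } else {
                String mapped = MAPPING.get(c);
                out.append(mapped != null ? mapped : String.valueOf(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "funktionsnedsattning*"
        System.out.println(normalizeWildcardTerm("funktionsneds\u00E4ttning*"));
    }
}
```

With lucene and dismax the parsed query looks like the output of this sketch; with complexphrase it looks like the untouched input.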

So I'm still confused as to why complexphrase does things differently.

Thanks,
/Bjarke

2017-10-05 10:16 GMT+02:00 Emir Arnautović :

> Hi Bjarke,
> It is not multiterm that is causing query parser to skip analysis chain
> but wildcard. The majority of query parsers do not analyse query string if
> there are wildcards.
>
> HTH
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen wrote:
> >
> > Hi list,
> >
> > I'm trying to search for the term funktionsnedsättning*
> > In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> > So I would expect that funktionsnedsättning* would translate to
> > funktionsnedsattning*.
> >
> > If I use e.g. the lucene query parser, this is indeed what happens:
> > ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives me
> > "rawquerystring":"funktionsnedsättning*", "querystring":
> > "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> > and 15 documents returned.
> >
> > Trying the same with complexphrase gives me:
> > ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
> > "rawquerystring":"funktionsnedsättning*", "querystring":
> > "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> > and 0 documents. Notice how ä has not been changed to a.
> >
> > How can this be? Is complexphrase somehow skipping the analysis chain for
> > multiterms, even though components and in particular
> > MappingCharFilterFactory are Multi-term aware
> >
> > Are there any configuration gotchas that I'm not aware of?
> >
> > Thanks for the help,
> > Bjarke Buur Mortensen
> > Senior Software Engineer, Eluence A/S
>
>


Re: Complexphrase treats wildcards differently than other query parsers

2017-10-05 Thread Emir Arnautović
Hi Bjarke,
It is not multiterm that is causing the query parser to skip the analysis
chain, but the wildcard. The majority of query parsers do not analyse the
query string if there are wildcards.

HTH
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 4 Oct 2017, at 22:08, Bjarke Buur Mortensen  wrote:
> 
> Hi list,
> 
> I'm trying to search for the term funktionsnedsättning*
> In my analyzer chain I use a MappingCharFilterFactory to change ä to a.
> So I would expect that funktionsnedsättning* would translate to
> funktionsnedsattning*.
> 
> If I use e.g. the lucene query parser, this is indeed what happens:
> ...debugQuery=on=lucene=funktionsneds%C3%A4ttning* gives me
> "rawquerystring":"funktionsnedsättning*", "querystring":
> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsattning*"
> and 15 documents returned.
> 
> Trying the same with complexphrase gives me:
> ...debugQuery=on=complexphrase=funktionsneds%C3%A4ttning* gives me
> "rawquerystring":"funktionsnedsättning*", "querystring":
> "funktionsnedsättning*", "parsedquery":"content_ol:funktionsnedsättning*"
> and 0 documents. Notice how ä has not been changed to a.
> 
> How can this be? Is complexphrase somehow skipping the analysis chain for
> multiterms, even though components and in particular
> MappingCharFilterFactory are Multi-term aware
> 
> Are there any configuration gotchas that I'm not aware of?
> 
> Thanks for the help,
> Bjarke Buur Mortensen
> Senior Software Engineer, Eluence A/S
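
For anyone reproducing the requests quoted above, the following small sketch shows how the term ends up as funktionsneds%C3%A4ttning* in the URL: ä is percent-encoded as its two UTF-8 bytes, while the trailing * is left as-is by Java's URLEncoder. The class and method names are hypothetical, and the sketch only covers the encoding of the term, not the rest of the request.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical names) for encoding a query term.
class QueryTermEncoding {

    // Percent-encodes the term as UTF-8; URLEncoder leaves '*' unescaped.
    static String encodeTerm(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Reverses encodeTerm, e.g. to check what the server receives.
    static String decodeTerm(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00E4 is ä; prints "funktionsneds%C3%A4ttning*"
        System.out.println(encodeTerm("funktionsneds\u00E4ttning*"));
    }
}
```

The Charset overloads of encode and decode used here require Java 10 or later.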