Re: query bag of word with negation

2018-04-22 Thread Erick Erickson
Queries of the form *some* can be _quite_ expensive; make sure you
test on a realistic corpus.

N-grams are often used to solve that problem.
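Here is a minimal Python sketch of the n-gram idea. It assumes character n-grams of 3 to 5 characters, which are illustrative sizes loosely analogous to the minGramSize/maxGramSize settings of Solr's NGramFilterFactory. Every token is indexed under all of its substrings of those lengths, so an infix query such as *izz* becomes a single exact-term lookup instead of a scan of the whole term dictionary. The function names are hypothetical, and the price is a much larger index.

```python
from collections import defaultdict


def char_ngrams(token, min_gram=3, max_gram=5):
    """Return every character n-gram of `token` with min_gram <= n <= max_gram."""
    return {
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    }


def build_gram_index(docs, min_gram=3, max_gram=5):
    """Map each n-gram to the ids of the documents whose tokens contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for gram in char_ngrams(token, min_gram, max_gram):
                index[gram].add(doc_id)
    return index


def infix_search(index, infix, min_gram=3, max_gram=5):
    """Answer the wildcard query *infix* with one exact lookup, no term scan."""
    infix = infix.lower()
    if not min_gram <= len(infix) <= max_gram:
        # Longer infixes would need several grams intersected; not shown here.
        raise ValueError("infix length must be within the indexed gram sizes")
    return set(index.get(infix, ()))
```

For example, with documents {1: "wonderful pizza", 2: "peperoni pasta"}, infix_search(build_gram_index(docs), "izz") returns {1}.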

If you mean *some (a leading wildcard only), then you may want to include
ReversedWildcardFilterFactory in your index-time analyzer.
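Here is a minimal Python sketch of why reversing tokens helps. It assumes a simplified term dictionary kept as a sorted list. Each token is stored as-is and also reversed behind a marker character. Solr's filter also marks reversed tokens, but the \u0001 marker used here is an assumption for illustration. A leading-wildcard query *some then becomes a cheap prefix scan for the reversed suffix "emos". All names are hypothetical.

```python
import bisect

# Assumed marker that keeps reversed terms apart from normal ones in the
# sorted term dictionary; illustrative, not a claim about Solr internals.
MARKER = "\u0001"


def index_terms(tokens):
    """Index each token as-is and reversed (behind MARKER), as a sorted term list."""
    terms = set()
    for token in tokens:
        terms.add(token)
        terms.add(MARKER + token[::-1])
    return sorted(terms)


def prefix_scan(sorted_terms, prefix):
    """Return all terms starting with `prefix` via binary search on the sorted list."""
    i = bisect.bisect_left(sorted_terms, prefix)
    hits = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        hits.append(sorted_terms[i])
        i += 1
    return hits


def leading_wildcard(sorted_terms, suffix):
    """Rewrite *suffix into a prefix scan over the reversed terms."""
    hits = prefix_scan(sorted_terms, MARKER + suffix[::-1])
    # Strip the marker and un-reverse to report the original terms.
    return sorted(hit[len(MARKER):][::-1] for hit in hits)
```

With the terms "wonderful", "awesome", "handsome" and "pizza", leading_wildcard(terms, "some") returns ["awesome", "handsome"] without visiting the other entries.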

Best,
Erick

On Sun, Apr 22, 2018 at 12:33 PM, Nicolas Paris <nipari...@gmail.com> wrote:
> Hello Markus
>
> Thanks!
>
> The ComplexPhraseQueryParser syntax:
> q={!complexphrase inOrder=false}collector:"wonderful pizza -peperoni"~5
> answers my needs.
>
> BTW,
> apparently it accepts both leading and trailing wildcards; that looks like
> a powerful feature.
>
> Any chance it would support "sow=false", so that it can be combined with
> multi-word synonyms?


Re: query bag of word with negation

2018-04-22 Thread Nicolas Paris
Hello Markus

Thanks!

The ComplexPhraseQueryParser syntax:
q={!complexphrase inOrder=false}collector:"wonderful pizza -peperoni"~5
answers my needs.
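For reference, here is a minimal Python sketch that builds the same request. The host, port and core name in SOLR_SELECT are placeholders, and the helper names are hypothetical. The point is that the local-params prefix {!complexphrase inOrder=false}, the quotes and the spaces all need URL-encoding when the query is sent over HTTP.

```python
from urllib.parse import urlencode

# Placeholder endpoint: host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def complexphrase_params(field, terms, slop, in_order=False):
    """Build the q parameter for a complexphrase query like the one above."""
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    phrase = " ".join(terms)  # a leading '-' on a term negates it inside the phrase
    return {"q": '%s%s:"%s"~%d' % (local_params, field, phrase, slop)}


def select_url(params):
    """Percent-encode the parameters ({, !, =, :, quotes) and join them to the endpoint."""
    return SOLR_SELECT + "?" + urlencode(params)
```

Calling select_url(complexphrase_params("collector", ["wonderful", "pizza", "-peperoni"], 5)) yields the encoded form of the query above. Passing in_order=True would produce inOrder=true instead, which forces the positive terms to match in the order given.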

BTW,
apparently it accepts both leading and trailing wildcards; that looks like
a powerful feature.

Any chance it would support "sow=false", so that it can be combined with
multi-word synonyms?


2018-04-22 21:11 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:

> Hello Nicolas,
>
> Yes you can! Check out ComplexPhraseQParser
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser
>
> Regards,
> Markus


Re: query bag of word with negation

2018-04-22 Thread Nicolas Paris
> 1. Query terms containing other than just letters or digits may be placed
> within double quotes so that those other characters do not separate a term
> into many terms. A dot (period) and white space are neither letter nor
> digit. Examples: "Now is the time for all good men" (spaces; quotes impose
> ordering too), "goods.doc" (a dot).
>
> 2. Mode button "or" (the default) means match one or more terms, perhaps
> scattered about. Mode button "and" means all terms must match, scattered or
> not.
>
> 3. A one-word query term may be prefixed by title: or url: to search on
> those fields. A space must follow the colon, and the search term is case
> sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
> formal internal title field, so the title: prefix may not work.
>
> 4. Compound queries can be built by joining terms with and, or, or -, and by
> grouping items with ( ). Not is expressed as a minus sign prefixing a term.
> A bare space means use the Mode (or, and). Example: Nancy and Mary and -Jane
> and -(Robert Daniel), which means both of the first two, not Jane, and
> neither of the two guys.
>
> 5. A query of asterisk/star (*) means match everything. Examples: * for
> everything (zero or more characters); to be fussy and show everything
> without the term .pdf: * and -".pdf". For normal queries the program uses
> the edismax interface. A few, such as url: foobar, use the Lucene
> interface. This is selected by the qagent= parameter in a search request,
> set to edismax or left empty respectively. Thus regular facilities can do
> most of this work.
>
> What this example does not address is your distance-5 criterion. However,
> the NOT facility may do the trick for you, though a minus sign is taken as
> a literal minus sign or word separator if located within a quoted string.

Indeed, sadly, the words can be anywhere in the document (there is no notion
of distance).

> Thanks, Joe D.

Thanks for the 5 details anyway.


RE: query bag of word with negation

2018-04-22 Thread Markus Jelsma
Hello Nicolas,

Yes you can! Check out ComplexPhraseQParser
https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser

Regards,
Markus

 
 


Re: query bag of word with negation

2018-04-22 Thread Joe Doupnik

    Golly, that was well and truly munged by the receiver. Let me try
again:

    A partial answer to your question is contained in this Help screen
text from my Solr query program:

Some hints about using this facility:

1. Query terms containing other than just letters or digits may be placed
   within double quotes so that those other characters do not separate a
   term into many terms. A dot (period) and white space are neither letter
   nor digit. Examples: "Now is the time for all good men" (spaces; quotes
   impose ordering too), "goods.doc" (a dot).

2. Mode button "or" (the default) means match one or more terms, perhaps
   scattered about. Mode button "and" means all terms must match,
   scattered or not.

3. A one-word query term may be prefixed by title: or url: to search on
   those fields. A space must follow the colon, and the search term is
   case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not
   have a formal internal title field, so the title: prefix may not work.

4. Compound queries can be built by joining terms with and, or, or -, and
   by grouping items with ( ). Not is expressed as a minus sign prefixing
   a term. A bare space means use the Mode (or, and). Example:
   Nancy and Mary and -Jane and -(Robert Daniel)
   which means both of the first two, not Jane, and neither of the two
   guys.

5. A query of asterisk/star (*) means match everything. Examples: * for
   everything (zero or more characters); to be fussy and show everything
   without the term .pdf: * and -".pdf". For normal queries the program
   uses the edismax interface. A few, such as url: foobar, use the Lucene
   interface. This is selected by the qagent= parameter in a search
   request, set to edismax or left empty respectively. Thus regular
   facilities can do most of this work.

What this example does not address is your distance-5 criterion. However,
the NOT facility may do the trick for you, though a minus sign is taken as
a literal minus sign or word separator if located within a quoted string.

    Hopefully that will be more readable.
    Thanks,
    Joe D.
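Here is a toy Python model of hints 2 and 4 in Joe's help text. It assumes case-insensitive, whole-document matching, and the function is hypothetical: it is not part of Joe's program or of Solr. The example Nancy and Mary and -Jane and -(Robert Daniel) maps to the terms ["Nancy", "Mary"] with mode "and" and the excluded terms ["Jane", "Robert", "Daniel"]. Under the default "or" mode, -(Robert Daniel) rules out both names. Because the model looks at the whole document, it also shows why this approach cannot express the distance-5 requirement.

```python
def compound_match(doc_terms, terms, mode="or", excluded=()):
    """Toy model of the Mode button plus minus-prefixed exclusions.

    doc_terms -- the terms of one document (compared case-insensitively)
    terms     -- positive terms, combined according to `mode`
    mode      -- "or": at least one term must occur; "and": all must occur
    excluded  -- terms written with a leading minus; none may occur
    """
    doc = {t.lower() for t in doc_terms}
    wanted = [t.lower() for t in terms]
    if mode == "and":
        positive = all(t in doc for t in wanted)
    elif mode == "or":
        positive = any(t in doc for t in wanted)
    else:
        raise ValueError("mode must be 'or' or 'and'")
    # A single excluded term anywhere in the document rejects it.
    return positive and not any(t.lower() in doc for t in excluded)
```

For example, compound_match({"Nancy", "Mary", "Daniel"}, ["Nancy", "Mary"], "and", ["Jane", "Robert", "Daniel"]) is False because Daniel is present.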


Re: query bag of word with negation

2018-04-22 Thread Joe Doupnik

    A partial answer to your question is contained in this Help screen
text from my Solr query program:

Some hints about using this facility:

1. Query terms containing other than just letters or digits may be placed
   within double quotes so that those other characters do not separate a
   term into many terms. A dot (period) and white space are neither letter
   nor digit. Examples: "Now is the time for all good men" (spaces; quotes
   impose ordering too), "goods.doc" (a dot).

2. Mode button "or" (the default) means match one or more terms, perhaps
   scattered about. Mode button "and" means all terms must match,
   scattered or not.

3. A one-word query term may be prefixed by title: or url: to search on
   those fields. A space must follow the colon, and the search term is
   case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not
   have a formal internal title field, so the title: prefix may not work.

4. Compound queries can be built by joining terms with and, or, or -, and
   by grouping items with ( ). Not is expressed as a minus sign prefixing
   a term. A bare space means use the Mode (or, and). Example:
   Nancy and Mary and -Jane and -(Robert Daniel)
   which means both of the first two, not Jane, and neither of the two
   guys.

5. A query of asterisk/star (*) means match everything. Examples: * for
   everything (zero or more characters); to be fussy and show everything
   without the term .pdf: * and -".pdf". For normal queries the program
   uses the edismax interface. A few, such as url: foobar, use the Lucene
   interface. This is selected by the qagent= parameter in a search
   request, set to edismax or left empty respectively. Thus regular
   facilities can do most of this work.

What this example does not address is your distance-5 criterion. However,
the NOT facility may do the trick for you, though a minus sign is taken as
a literal minus sign or word separator if located within a quoted string.

Thanks, Joe D.




query bag of word with negation

2018-04-22 Thread Nicolas Paris
Hello

I wonder if there is a plain text query syntax to say:
give me all documents that match:

wonderful pizza NOT peperoni

with all those words within a distance of 5 words (a bag of words),
so that:

pizza are wonderful -> would match
I made a wonderful pasta and pizza -> would match
Peperoni pizza are so wonderful -> would not match

I tested:
"wonderful pizza - peperoni"~5
without success

Thanks
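To pin down the behavior being asked for, here is a simplified Python model checked against the three example sentences. It makes three assumptions. Words are compared after lower-casing. "Distance" is the gap between the first and last matched positions. An excluded word counts only if the matched span, stretched to cover it, would still fit within that distance. This is not Lucene's exact slop arithmetic, and the function names are hypothetical.

```python
import itertools
import re


def tokenize(text):
    """Lower-cased alphanumeric tokens; a stand-in for a real Solr analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_match(text, required, excluded, distance):
    """True if all `required` words occur within `distance` positions of each
    other, in any order, and no `excluded` word falls close enough to that span."""
    tokens = tokenize(text)
    required = [w.lower() for w in required]
    excluded = {w.lower() for w in excluded}
    positions = [[i for i, tok in enumerate(tokens) if tok == w] for w in required]
    if not all(positions):
        return False  # some required word is missing entirely
    vetoes = [i for i, tok in enumerate(tokens) if tok in excluded]
    for combo in itertools.product(*positions):
        start, end = min(combo), max(combo)
        if end - start > distance:
            continue
        # An excluded word spoils this span if stretching the span to cover it
        # would still keep everything within `distance`.
        if any(max(end, p) - min(start, p) <= distance for p in vetoes):
            continue
        return True
    return False
```

With required=["wonderful", "pizza"], excluded={"peperoni"} and distance=5, the first two example sentences match and "Peperoni pizza are so wonderful" does not.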