Re: query bag of word with negation
Queries of the form *some* can be _quite_ expensive, so make sure you test on a realistic corpus. N-grams are often used to solve that problem.

If you mean *some, then you may want to include ReverseWildcardFilterFactory.

Best,
Erick

On Sun, Apr 22, 2018 at 12:33 PM, Nicolas Paris <nipari...@gmail.com> wrote:
> Hello Markus
>
> Thanks!
>
> The ComplexPhraseQueryParser syntax:
> q={!complexphrase inOrder=false}collector:"wonderful pizza -peperoni"~5
> answers my needs.
>
> BTW,
> Apparently it accepts both leading and trailing wildcards; that looks like a
> powerful feature.
>
> Any chance it would support "sow=false" in order to combine with
> multi-word synonyms?
>
> 2018-04-22 21:11 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:
>
>> Hello Nicolas,
>>
>> Yes you can! Check out ComplexPhraseQParser
>> https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser
>>
>> Regards,
>> Markus
>>
>> -----Original message-----
>> > From:Nicolas Paris <nipari...@gmail.com>
>> > Sent: Sunday 22nd April 2018 20:04
>> > To: solr-user@lucene.apache.org
>> > Subject: query bag of word with negation
>> >
>> > Hello
>> >
>> > I wonder if there is a plain text query syntax to say:
>> > give me all documents that match:
>> >
>> > wonderful pizza NOT peperoni
>> >
>> > all those in a 5 distance word bag
>> > then
>> >
>> > pizza are wonderful -> would match
>> > I made a wonderful pasta and pizza -> would match
>> > Peperoni pizza are so wonderful -> would not match
>> >
>> > I tested:
>> > "wonderful pizza - peperoni"~5
>> > without success
>> >
>> > Thanks
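To illustrate why a reversed-token filter helps with a leading wildcard such as *some, here is a toy Python sketch. It is not how Solr or ReverseWildcardFilterFactory is implemented; the function names are made up for this example. The idea it shows is only that, once each term is also stored reversed, a suffix search becomes a prefix search over a sorted term list, which avoids scanning every term.

```python
from bisect import bisect_left


def build_reversed_index(terms):
    """Return a sorted list of the reversed forms of the distinct terms."""
    return sorted(t[::-1] for t in set(terms))


def leading_wildcard(reversed_index, suffix):
    """Find terms matching '*<suffix>' via a prefix scan on reversed terms."""
    prefix = suffix[::-1]
    # Jump to the first reversed term that could start with the prefix.
    start = bisect_left(reversed_index, prefix)
    hits = []
    for rev in reversed_index[start:]:
        if not rev.startswith(prefix):
            break  # sorted order: no later entry can match
        hits.append(rev[::-1])
    return sorted(hits)


if __name__ == "__main__":
    index = build_reversed_index(
        ["awesome", "handsome", "some", "somewhere", "pizza"]
    )
    print(leading_wildcard(index, "some"))
```

A query with wildcards on both sides (*some*) cannot be turned into a prefix scan this way, which is one reason n-grams are suggested for that case.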
Re: query bag of word with negation
Hello Markus

Thanks!

The ComplexPhraseQueryParser syntax:

q={!complexphrase inOrder=false}collector:"wonderful pizza -peperoni"~5

answers my needs.

BTW, apparently it accepts both leading and trailing wildcards; that looks like a powerful feature.

Any chance it would support "sow=false" in order to combine with multi-word synonyms?

2018-04-22 21:11 GMT+02:00 Markus Jelsma <markus.jel...@openindex.io>:

> Hello Nicolas,
>
> Yes you can! Check out ComplexPhraseQParser
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser
>
> Regards,
> Markus
>
> -----Original message-----
> > From:Nicolas Paris <nipari...@gmail.com>
> > Sent: Sunday 22nd April 2018 20:04
> > To: solr-user@lucene.apache.org
> > Subject: query bag of word with negation
> >
> > Hello
> >
> > I wonder if there is a plain text query syntax to say:
> > give me all documents that match:
> >
> > wonderful pizza NOT peperoni
> >
> > all those in a 5 distance word bag
> > then
> >
> > pizza are wonderful -> would match
> > I made a wonderful pasta and pizza -> would match
> > Peperoni pizza are so wonderful -> would not match
> >
> > I tested:
> > "wonderful pizza - peperoni"~5
> > without success
> >
> > Thanks
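For anyone sending this query from a script: the local-params prefix, quotes, and tilde all need URL encoding. The sketch below only builds the request URL with the Python standard library; the host, port, and collection name in the usage part are placeholders, not values from this thread, and the helper name is made up.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def complexphrase_url(select_url, field, phrase, slop, in_order=False):
    """Build a select URL whose q parameter uses the complexphrase parser.

    select_url is the /select endpoint of a collection (an assumption here).
    """
    q = '{!complexphrase inOrder=%s}%s:"%s"~%d' % (
        str(in_order).lower(),  # Solr expects true/false in lower case
        field,
        phrase,
        slop,
    )
    # urlencode percent-escapes braces, quotes, spaces and the tilde as needed.
    return select_url + "?" + urlencode({"q": q})


if __name__ == "__main__":
    # Hypothetical endpoint; replace with your own host and collection.
    url = complexphrase_url(
        "http://localhost:8983/solr/mycollection/select",
        "collector",
        "wonderful pizza -peperoni",
        5,
    )
    print(url)
    # Decoding the URL gives back the query string from the message above.
    print(parse_qs(urlsplit(url).query)["q"][0])
```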
Re: query bag of word with negation
> 1. Query terms containing other than just letters or digits may be placed
> within double quotes so that those other characters do not separate a term
> into many terms. A dot (period) and white space are neither letter nor
> digit. Examples: "Now is the time for all good men" (spaces, quotes impose
> ordering too), "goods.doc" (a dot).
>
> 2. Mode button "or" (the default) means match one or more terms, perhaps
> scattered about. Mode button "and" means must match all terms, scattered or
> not.
>
> 3. A one word query term may be prefixed by title: or url: to search on
> those fields. A space must follow the colon, and the search term is case
> sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
> formal internal title field, thus prefix title: may not work.
>
> 4. Compound queries can be built by joining terms with "and", "or", or "-",
> and grouping items with ( ). Not is expressed as a minus sign prefixing a
> term. A bare space means use the Mode (or, and). Example: Nancy and Mary and
> -Jane and -(Robert Daniel) which means both the first two and not Jane and
> neither of the two guys.
>
> 5. A query of asterisk/star (*) means match everything. Examples: * for
> everything (zero or more characters). Fussy, show all without term .pdf:
> * and -".pdf"
>
> For normal queries the program uses the edismax interface. A few, such as
> url: foobar, reference the Lucene interface. This is specified by the
> qagent= parameter, of edismax or empty respectively, in a search request.
>
> Thus regular facilities can do most of this work. What this example does
> not address is your distance 5 criterion. However, the NOT facility may do
> the trick for you, though a minus sign is taken as a literal minus sign or
> word separator if located within a quoted string.

Indeed, sadly, words can be anywhere in the document (no notion of distance).

> Thanks,
> Joe D.

Thanks for the 5 details anyway.
RE: query bag of word with negation
Hello Nicolas,

Yes you can! Check out ComplexPhraseQParser
https://lucene.apache.org/solr/guide/6_6/other-parsers.html#OtherParsers-ComplexPhraseQueryParser

Regards,
Markus

-----Original message-----
> From:Nicolas Paris <nipari...@gmail.com>
> Sent: Sunday 22nd April 2018 20:04
> To: solr-user@lucene.apache.org
> Subject: query bag of word with negation
>
> Hello
>
> I wonder if there is a plain text query syntax to say:
> give me all documents that match:
>
> wonderful pizza NOT peperoni
>
> all those in a 5 distance word bag
> then
>
> pizza are wonderful -> would match
> I made a wonderful pasta and pizza -> would match
> Peperoni pizza are so wonderful -> would not match
>
> I tested:
> "wonderful pizza - peperoni"~5
> without success
>
> Thanks
Re: query bag of word with negation
On 22/04/2018 19:26, Joe Doupnik wrote:
> On 22/04/2018 19:04, Nicolas Paris wrote:
>> Hello
>>
>> I wonder if there is a plain text query syntax to say:
>> give me all documents that match:
>>
>> wonderful pizza NOT peperoni
>>
>> all those in a 5 distance word bag
>> then
>>
>> pizza are wonderful -> would match
>> I made a wonderful pasta and pizza -> would match
>> Peperoni pizza are so wonderful -> would not match
>>
>> I tested:
>> "wonderful pizza - peperoni"~5
>> without success
>>
>> Thanks
>
> ---
> A partial answer to your question is contained in this Help screen text
> from my Solr query program:
>
> Some hints about using this facility:
>
> 1. Query terms containing other than just letters or digits may be placed
> within double quotes so that those other characters do not separate a term
> into many terms. A dot (period) and white space are neither letter nor
> digit. Examples: "Now is the time for all good men" (spaces, quotes impose
> ordering too), "goods.doc" (a dot).
>
> 2. Mode button "or" (the default) means match one or more terms, perhaps
> scattered about. Mode button "and" means must match all terms, scattered or
> not.
>
> 3. A one word query term may be prefixed by title: or url: to search on
> those fields. A space must follow the colon, and the search term is case
> sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a
> formal internal title field, thus prefix title: may not work.
>
> 4. Compound queries can be built by joining terms with "and", "or", or "-",
> and grouping items with ( ). Not is expressed as a minus sign prefixing a
> term. A bare space means use the Mode (or, and). Example: Nancy and Mary and
> -Jane and -(Robert Daniel) which means both the first two and not Jane and
> neither of the two guys.
>
> 5. A query of asterisk/star (*) means match everything. Examples: * for
> everything (zero or more characters). Fussy, show all without term .pdf:
> * and -".pdf"
>
> For normal queries the program uses the edismax interface. A few, such as
> url: foobar, reference the Lucene interface. This is specified by the
> qagent= parameter, of edismax or empty respectively, in a search request.
>
> Thus regular facilities can do most of this work. What this example does
> not address is your distance 5 criterion. However, the NOT facility may do
> the trick for you, though a minus sign is taken as a literal minus sign or
> word separator if located within a quoted string.
>
> Thanks,
> Joe D.

--
Golly, that was well and truly munged by the receiver. Let me try again -

A partial answer to your question is contained in this Help screen text from my Solr query program:

Some hints about using this facility:

1. Query terms containing other than just letters or digits may be placed within double quotes so that those other characters do not separate a term into many terms. A dot (period) and white space are neither letter nor digit. Examples: "Now is the time for all good men" (spaces, quotes impose ordering too), "goods.doc" (a dot).

2. Mode button "or" (the default) means match one or more terms, perhaps scattered about. Mode button "and" means must match all terms, scattered or not.

3. A one word query term may be prefixed by title: or url: to search on those fields. A space must follow the colon, and the search term is case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a formal internal title field, thus prefix title: may not work.

4. Compound queries can be built by joining terms with "and", "or", or "-", and grouping items with ( ). Not is expressed as a minus sign prefixing a term. A bare space means use the Mode (or, and). Example: Nancy and Mary and -Jane and -(Robert Daniel) which means both the first two and not Jane and neither of the two guys.

5. A query of asterisk/star (*) means match everything. Examples: * for everything (zero or more characters). Fussy, show all without term .pdf: * and -".pdf"

For normal queries the program uses the edismax interface. A few, such as url: foobar, reference the Lucene interface. This is specified by the qagent= parameter, of edismax or empty respectively, in a search request.

Thus regular facilities can do most of this work. What this example does not address is your distance 5 criterion. However, the NOT facility may do the trick for you, though a minus sign is taken as a literal minus sign or word separator if located within a quoted string.

Hopefully that will be more readable.

Thanks,
Joe D.
Re: query bag of word with negation
On 22/04/2018 19:04, Nicolas Paris wrote:
> Hello
>
> I wonder if there is a plain text query syntax to say:
> give me all documents that match:
>
> wonderful pizza NOT peperoni
>
> all those in a 5 distance word bag
> then
>
> pizza are wonderful -> would match
> I made a wonderful pasta and pizza -> would match
> Peperoni pizza are so wonderful -> would not match
>
> I tested:
> "wonderful pizza - peperoni"~5
> without success
>
> Thanks

---
A partial answer to your question is contained in this Help screen text from my Solr query program:

Some hints about using this facility:

1. Query terms containing other than just letters or digits may be placed within double quotes so that those other characters do not separate a term into many terms. A dot (period) and white space are neither letter nor digit. Examples: "Now is the time for all good men" (spaces, quotes impose ordering too), "goods.doc" (a dot).

2. Mode button "or" (the default) means match one or more terms, perhaps scattered about. Mode button "and" means must match all terms, scattered or not.

3. A one word query term may be prefixed by title: or url: to search on those fields. A space must follow the colon, and the search term is case sensitive. Examples: url: .ppt or title: Goodies. Many docs do not have a formal internal title field, thus prefix title: may not work.

4. Compound queries can be built by joining terms with "and", "or", or "-", and grouping items with ( ). Not is expressed as a minus sign prefixing a term. A bare space means use the Mode (or, and). Example: Nancy and Mary and -Jane and -(Robert Daniel) which means both the first two and not Jane and neither of the two guys.

5. A query of asterisk/star (*) means match everything. Examples: * for everything (zero or more characters). Fussy, show all without term .pdf: * and -".pdf"

For normal queries the program uses the edismax interface. A few, such as url: foobar, reference the Lucene interface. This is specified by the qagent= parameter, of edismax or empty respectively, in a search request.

Thus regular facilities can do most of this work. What this example does not address is your distance 5 criterion. However, the NOT facility may do the trick for you, though a minus sign is taken as a literal minus sign or word separator if located within a quoted string.

Thanks,
Joe D.
query bag of word with negation
Hello

I wonder if there is a plain text query syntax to say:
give me all documents that match:

wonderful pizza NOT peperoni

all those in a 5 distance word bag
then

pizza are wonderful -> would match
I made a wonderful pasta and pizza -> would match
Peperoni pizza are so wonderful -> would not match

I tested:
"wonderful pizza - peperoni"~5
without success

Thanks
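To pin down the semantics being asked for, here is a small Python sketch of one possible reading of "word bag within distance 5, with negation". It is an interpretation for testing expectations offline, not Solr's matching algorithm: the function name, the tokenizer, and the rule that a negated word disqualifies a window when it lies within the same 5-position span are all assumptions of this sketch.

```python
import re
from itertools import product


def bag_match(text, required, forbidden, slop=5):
    """True if all required words occur within `slop` positions of each
    other, in any order, with no forbidden word inside that same span."""
    if not required:
        raise ValueError("at least one required word is needed")
    tokens = re.findall(r"\w+", text.lower())
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)

    required_pos = [positions.get(w.lower(), []) for w in required]
    if not all(required_pos):
        return False  # some required word is missing entirely
    forbidden_pos = [p for w in forbidden for p in positions.get(w.lower(), [])]

    # Try every combination of one position per required word.
    for combo in product(*required_pos):
        lo, hi = min(combo), max(combo)
        if hi - lo > slop:
            continue  # required words too far apart
        # Reject if a forbidden word fits in a slop-wide span with them.
        if any(max(hi, p) - min(lo, p) <= slop for p in forbidden_pos):
            continue
        return True
    return False


if __name__ == "__main__":
    for doc in (
        "pizza are wonderful",
        "I made a wonderful pasta and pizza",
        "Peperoni pizza are so wonderful",
    ):
        print(doc, "->", bag_match(doc, ["wonderful", "pizza"], ["peperoni"]))
```

Under these assumptions the three example sentences above give match, match, and no match, in that order.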