Hi Matt, I think your concern about performance is spot-on, though.
The combinatorial explosion would be at query time, not at index time - my solution has a single token indexed per document.

My suggested query-time filter would generate the following number of output terms, where C(n,k) is the combination of n things taken k at a time, n is the number of input query terms, and k is the number of concatenated input query terms forming one output query term:

   C(n,1) + C(n,2) + ... + C(n,n-1) + C(n,n)

For small queries this would not be a problem:

   1 input query term -> 1 output query term
   2 input query terms -> 3 output query terms
   3 input query terms -> 7 output query terms
   4 input query terms -> 15 output query terms

But for larger queries, it could be fairly expensive:

   10 input query terms -> 1,023 output query terms
   ...
   15 input query terms -> 32,767 output query terms

This is exactly (2^n - 1) output query terms, where n is the number of input terms. 32k query terms might be too slow to be functional.

Steve

> -----Original Message-----
> From: Matthew Hall [mailto:mh...@informatics.jax.org]
> Sent: Tuesday, October 26, 2010 3:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How do I this in Solr?
>
> Bah.. nope this would miss documents that only match a subset of the
> given terms.
>
> I'm going to have to go with Steven's approach as the right choice here.
>
> Matt
>
> On 10/26/2010 3:44 PM, Matthew Hall wrote:
> > Indeed, I'd missed the second part of his requirements, my AND
> > solution is sadly insufficient to this task.
> >
> > The combinatorial part of your solution worries me a bit though Steven,
> > because his documents that are on the larger side of his corpus would
> > likely slow down query performance a bit while the filter calculates
> > all of the possibilities for a given document.
> >
> > I'm wondering if a slightly hybrid approach would be valid:
> >
> > Have a filter that calculates the total number of terms for a given
> > document.
> > And then add a clause into your query at runtime that would
> > match what the filter would come up with:
> >
> > So:
> >
> > text:"Nokia" AND text:"Mobile" AND text:"GPS" AND termCount: 3
> >
> > Something like that anyhow.
> >
> > Matt
> >
> > On 10/26/2010 3:35 PM, Dennis Gearon wrote:
> >> I'm the LAST person anyone will ever need to worry about flame
> >> baiting. You did notice that I retracted what I said and supported
> >> your point of view?
> >>
> >> Sorry if my cryptic comment sounded critical. I was wrong, you were
> >> right :-)
> >> Dennis Gearon
> >>
> >> Signature Warning
> >> ----------------
> >> It is always a good idea to learn from your own mistakes. It is
> >> usually a better idea to learn from others’ mistakes, so you do not
> >> have to make them yourself. from
> >> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >>
> >> EARTH has a Right To Life,
> >> otherwise we all die.
> >>
> >>
> >> --- On Tue, 10/26/10, Steven A Rowe<sar...@syr.edu> wrote:
> >>
> >>> From: Steven A Rowe<sar...@syr.edu>
> >>> Subject: RE: How do I this in Solr?
> >>> To: "solr-user@lucene.apache.org"<solr-user@lucene.apache.org>
> >>> Date: Tuesday, October 26, 2010, 12:27 PM
> >>> Hi Dennis,
> >>>
> >>> You wrote:
> >>>> If Solr is like Google, once documents matching only the ANDed items
> >>>> in the query ran out, then those that had only two of the terms, then
> >>>> only 1 of the terms, and then those close to it would start showing up.
> >>> [...]
> >>>> Plus, if he wants terms that contain ONLY those words, and no others, an
> >>>> ANDed query would not do that, right? ANDed queries return results that
> >>>> must have ALL the terms listed, and could have lots of other words, right?
> >>>
> >>> This is *exactly* what I just said: ANDed queries (i.e.,
> >>> requiring all query terms) will not satisfy Varun's
> >>> requirements.
> >>>
> >>> Your participation in this thread looks an awful lot like
> >>> flame-baiting: Someone else asks a question, I answer with a
> >>> possible solution, you give a one-word "overkill" response,
> >>> I say why it's not overkill. You then ask if anybody
> >>> knows the answer to the original question, and then parrot
> >>> my response to your "overkill" statement. Really????
> >>>
> >>> Get your shit together or shut up. Please.
> >>>
> >>> Steve
> >>>
> >>>> -----Original Message-----
> >>>> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> >>>> Sent: Tuesday, October 26, 2010 3:14 PM
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: RE: How do I this in Solr?
> >>>>
> >>>>
> >>>>
> >>>> Dennis Gearon
> >>>>
> >>>> Signature Warning
> >>>> ----------------
> >>>> It is always a good idea to learn from your own mistakes. It is usually a
> >>>> better idea to learn from others’ mistakes, so you do not have to make
> >>>> them yourself. from
> >>>> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >>>> EARTH has a Right To Life,
> >>>> otherwise we all die.
> >>>>
> >>>>
> >>>> --- On Tue, 10/26/10, Steven A Rowe<sar...@syr.edu> wrote:
> >>>>> From: Steven A Rowe<sar...@syr.edu>
> >>>>> Subject: RE: How do I this in Solr?
> >>>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >>>>> Date: Tuesday, October 26, 2010, 12:10 PM
> >>>>> Dennis,
> >>>>>
> >>>>> Do you mean to say that you read my earlier post, and
> >>>>> disagree that it would solve the problem? Or have you
> >>>>> simply not read it?
> >>>>>
> >>>>> Steve
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Dennis Gearon [mailto:gear...@sbcglobal.net]
> >>>>>> Sent: Tuesday, October 26, 2010 3:00 PM
> >>>>>> To: solr-user@lucene.apache.org
> >>>>>> Subject: RE: How do I this in Solr?
> >>>>>>
> >>>>>> Good point. Since I might need such a query myself someday, how *IS* that
> >>>>>> done?
> >>>>>>
> >>>>>>
> >>>>>> Dennis Gearon
> >>>>>>
> >>>>>> Signature Warning
> >>>>>> ----------------
> >>>>>> It is always a good idea to learn from your own mistakes. It is usually a
> >>>>>> better idea to learn from others’ mistakes, so you do not have to make
> >>>>>> them yourself. from
> >>>>>> 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
> >>>>>> EARTH has a Right To Life,
> >>>>>> otherwise we all die.
> >>>>>>
> >>>>>>
> >>>>>> --- On Tue, 10/26/10, Steven A Rowe<sar...@syr.edu> wrote:
> >>>>>>> From: Steven A Rowe<sar...@syr.edu>
> >>>>>>> Subject: RE: How do I this in Solr?
> >>>>>>> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >>>>>>> Date: Tuesday, October 26, 2010, 11:46 AM
> >>>>>>> Um, maybe I'm way off base, but when
> >>>>>>> Varun said:
> >>>>>>>
> >>>>>>>> If I search with the text "samsung andriod GPS",
> >>>>>>>> search results should only conain "samsung", "GPS",
> >>>>>>>> "andriod" and "samsung andriod".
> >>>>>>> I interpreted that to mean that hit documents should
> >>>>>>> contain terms from the query, and nothing else. Making
> >>>>>>> all terms required doesn't do this.
> >>>>>>>
> >>>>>>> Steve
> >>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Matthew Hall [mailto:mh...@informatics.jax.org]
> >>>>>>>> Sent: Tuesday, October 26, 2010 2:30 PM
> >>>>>>>> To: solr-user@lucene.apache.org
> >>>>>>>> Subject: Re: How do I this in Solr?
> >>>>>>>> Um.. you could change your default clause to AND rather than or.
> >>>>>>>> That should do the trick.
> >>>>>>>>
> >>>>>>>> Matt
> >>>>>>>>
> >>>>>>>> On 10/26/2010 2:26 PM, Dennis Gearon wrote:
> >>>>>>>>> Overkill?
> >>>>>>>>>
> >>>>>>>>> Dennis Gearon
> >>>>>>>>>> I can't think of a way to do it without writing new
> >>>>>>>>>> analysis filters.
> >>>>>>>>>>
> >>>>>>>>>> But I think you could do what you want with two filters
> >>>>>>>>>> (this is untested):
> >>>>>>>>>>
> >>>>>>>>>> 1. An index-time filter that outputs a single token
> >>>>>>>>>> consisting of all of the input tokens, sorted in a
> >>>>>>>>>> consistent way, e.g.:
> >>>>>>>>>>
> >>>>>>>>>>    "mobile with GPS" -> "GPS mobile with"
> >>>>>>>>>>    "samsung android" -> "android samsung"
> >>>>>>>>>>
> >>>>>>>>>> 2. A query-time filter that outputs one token per input
> >>>>>>>>>> term combination, sorted in the same consistent way as the
> >>>>>>>>>> index-time filter, e.g.:
> >>>>>>>>>>
> >>>>>>>>>>    "samsung andriod GPS"
> >>>>>>>>>>    ->
> >>>>>>>>>>    "samsung","android","GPS",
> >>>>>>>>>>    "android samsung","GPS samsung","android GPS"
> >>>>>>>>>>    "android GPS samsung"
> >>>>>>>>>>
> >>>>>>>>>> Steve
> >>>>>>>>>>
> >>>>>>>>>>> -----Original Message-----
> >>>>>>>>>>> From: Varun Gupta [mailto:varun.vgu...@gmail.com]
> >>>>>>>>>>> Sent: Tuesday, October 26, 2010 9:08 AM
> >>>>>>>>>>> To: solr-user@lucene.apache.org
> >>>>>>>>>>> Subject: How do I this in Solr?
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I have lot of small documents (each containing 1 to 15
> >>>>>>>>>>> words) indexed in Solr.
> >>>>>>>>>>> For the search query, I want the search results to contain only
> >>>>>>>>>>> those documents that satisfy this criteria "All of the words of
> >>>>>>>>>>> the search result document are present in the search query"
> >>>>>>>>>>> For example:
> >>>>>>>>>>> If I have the following documents indexed: "nokia n95", "GPS", "android",
> >>>>>>>>>>> "samsung", "samsung andriod", "nokia andriod", "mobile with GPS"
> >>>>>>>>>>> If I search with the text "samsung andriod GPS", search results should
> >>>>>>>>>>> only conain "samsung", "GPS", "andriod" and "samsung andriod".
> >>>>>>>>>>> Is there a way to do this in Solr.
> >>>>>>>>>>> --
> >>>>>>>>>>> Thanks
> >>>>>>>>>>> Varun Gupta
> >>>>>>>
> >
> >
>
>
> --
> Matthew Hall
> Software Engineer
> Mouse Genome Informatics
> mh...@informatics.jax.org
> (207) 288-6012
>
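
For anyone who wants to see the numbers at the top of this message in action, here is a minimal sketch of the two-filter idea in plain Python. It is not a Solr analysis filter and uses no Solr API: the names index_token and query_tokens are hypothetical, terms are split on whitespace, sorted with Python's default string ordering, and de-duplicated (my assumption, not something stated in the thread). The example uses the spelling "android" throughout.

```python
from itertools import combinations


def index_token(text):
    """Index-time side: collapse a document into ONE token made of its
    distinct terms, sorted in a consistent order (hypothetical helper)."""
    return " ".join(sorted(set(text.split())))


def query_tokens(text):
    """Query-time side: one token per non-empty combination of the distinct
    query terms, each built in the same sorted order as index_token."""
    terms = sorted(set(text.split()))
    return [
        " ".join(combo)  # combinations() of a sorted list stays sorted
        for k in range(1, len(terms) + 1)
        for combo in combinations(terms, k)
    ]


# Documents from the original question, with "android" spelled consistently.
docs = ["nokia n95", "GPS", "android", "samsung", "samsung android",
        "nokia android", "mobile with GPS"]
query = "samsung android GPS"

# A document is a hit when its single index token equals one of the
# 2^n - 1 tokens generated from the n distinct query terms.
wanted = set(query_tokens(query))
hits = [d for d in docs if index_token(d) in wanted]
# hits == ['GPS', 'android', 'samsung', 'samsung android']
```

len(query_tokens(...)) is 7 for this 3-term query and grows as 2^n - 1, so 15 distinct terms produce 32,767 tokens, which is the query-time cost discussed above.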