Re: Percolate feature?

Roman Chyla Fri, 09 Aug 2013 09:51:28 -0700

On Fri, Aug 9, 2013 at 11:29 AM, Mark <static.void....@gmail.com> wrote:


> > *All* of the terms in the field must be matched by the query....not
> > vice-versa.
>
> Exactly. This is why I was trying to explain it as a reverse search.
>
> I just realized I described it as a *large* list of known keywords when
> really it's small; no more than 1000. Forgetting about performance, how hard
> do you think this would be to implement? How should I even start?
>

not hard: index all terms into a field (make sure there are no duplicates,
as you want to count them), then I can imagine at least two options: save
the number of terms as a payload together with the terms, or, in a second
step (in a collector, for example), load the document and count the terms in
the field; if the count matches the query size, you are done
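To make the counting idea concrete, here is a minimal sketch in plain Java that models it in memory rather than inside Lucene. Everything in it is hypothetical: the class name `ReverseKeywordIndex`, the "analysis" (lower-casing plus whitespace splitting), and keeping the per-keyword term count in a list instead of a payload. Each keyword is indexed once with its distinct terms; at match time every distinct title term bumps a counter for the keywords that contain it, and a keyword is reported only when its counter reaches its stored term count.

```java
import java.util.*;

// Hypothetical in-memory model of the "count the terms" approach (no Lucene).
final class ReverseKeywordIndex {
    private final List<String> keywords = new ArrayList<>();
    private final List<Integer> termCounts = new ArrayList<>();          // distinct terms per keyword
    private final Map<String, List<Integer>> postings = new HashMap<>(); // term -> keyword ids

    // Assumed analysis: lower-case, split on whitespace, drop duplicates.
    static Set<String> terms(String text) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    void add(String keyword) {
        int id = keywords.size();
        Set<String> ts = terms(keyword);
        keywords.add(keyword);
        termCounts.add(ts.size()); // the per-keyword count that would live in a payload
        for (String t : ts) {
            postings.computeIfAbsent(t, k -> new ArrayList<>()).add(id);
        }
    }

    // A keyword matches when the number of its terms found in the title
    // equals its total term count, i.e. all of its terms are present.
    List<String> match(String title) {
        Map<Integer, Integer> hits = new TreeMap<>(); // keyword id -> matched term count
        for (String t : terms(title)) {
            for (int id : postings.getOrDefault(t, Collections.emptyList())) {
                hits.merge(id, 1, Integer::sum);
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : hits.entrySet()) {
            if (e.getValue().intValue() == termCounts.get(e.getKey())) {
                out.add(keywords.get(e.getKey()));
            }
        }
        return out;
    }
}
```

With "Sony" and "Samsung Galaxy" added, matching "Galaxy Samsung S4" reports only "Samsung Galaxy" (term order does not matter), and "Samsung 52inch LC" reports nothing because only one of the two keyword terms is present.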

a trivial, naive implementation (as you say 'forget performance') could be:

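// assumes: query is a BooleanQuery with one clause per distinct term, and
// fieldToLoad (final) is stored with one value per distinct keyword term
final int numClauses = ((BooleanQuery) query).clauses().size();
final List<ScoreDoc> hits = new ArrayList<ScoreDoc>();
searcher.search(query, null, new Collector() {
  private Scorer scorer;
  private AtomicReader reader;
  private int docBase;
  public void setScorer(Scorer s) { scorer = s; }
  public void setNextReader(AtomicReaderContext ctx) { reader = ctx.reader(); docBase = ctx.docBase; }
  public boolean acceptsDocsOutOfOrder() { return true; }
  public void collect(int doc) throws IOException {
    Document d = reader.document(doc, Collections.singleton(fieldToLoad)); // load only that field
    if (d.getValues(fieldToLoad).length == numClauses) {
      hits.add(new ScoreDoc(docBase + doc, scorer.score()));
    }
  }
});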

so if your query contains no duplicates and all terms must match, you can
be sure that you are collecting docs only when the number of terms in the
field matches the number of clauses in the query
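The condition in the collector can be restated with plain sets, which makes it easy to see exactly which documents it keeps. This is a hypothetical sketch (the class `ConjunctionCountCheck` and its method are illustrative names only), assuming every distinct query term is a required clause and the field holds the document's distinct terms.

```java
import java.util.*;

// Hypothetical restatement of the collector condition using plain sets.
final class ConjunctionCountCheck {
    // fieldTerms: distinct terms stored for the document
    // queryTerms: distinct query terms, each one a required (MUST) clause
    static boolean collected(Set<String> fieldTerms, Set<String> queryTerms) {
        boolean allClausesMatch = fieldTerms.containsAll(queryTerms); // every clause hits the doc
        boolean countsEqual = fieldTerms.size() == queryTerms.size();  // the extra count test
        return allClausesMatch && countsEqual; // together: the two sets are equal
    }
}
```

Because a set that contains another set of the same size must equal it, this combination keeps a document only when its stored terms are exactly the query terms; the keyword "sony" would not be collected for the query "sony experia". If extra title terms should be allowed, the comparison has to be between the number of query terms that actually hit the document and the document's own stored term count.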

roman


> Thanks for the input
>
> On Aug 9, 2013, at 6:56 AM, Yonik Seeley <yo...@lucidworks.com> wrote:
>
> > *All* of the terms in the field must be matched by the query....not
> > vice-versa.
> > And no, we don't have a query for that out of the box.  To implement,
> > it seems like it would require the total number of terms indexed for a
> > field (for each document).
> > I guess you could also index start and end tokens and then use query
> > expansion to all possible combinations... messy though.
> >
> > -Yonik
> > http://lucidworks.com
> >
> > On Fri, Aug 9, 2013 at 8:19 AM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >> This _looks_ like simple phrase matching (no slop) and highlighting...
> >>
> >> But whenever I think the answer is really simple, it usually means
> >> that I'm missing something....
> >>
> >> Best
> >> Erick
> >>
> >>
> >> On Thu, Aug 8, 2013 at 11:18 PM, Mark <static.void....@gmail.com>
> >> wrote:
> >>
> >>> Ok forget the mention of percolate.
> >>>
> >>> We have a large list of known keywords we would like to match against.
> >>>
> >>> Product keyword:  "Sony"
> >>> Product keyword:  "Samsung Galaxy"
> >>>
> >>> We would like to be able to detect, given a product title, whether or
> >>> not it matches any known keywords. For a keyword to be matched, all of
> >>> its terms must be present in the given product title.
> >>>
> >>> Product Title: "Sony Experia"
> >>> Matches and returns a highlight: "<em>Sony</em> Experia"
> >>>
> >>> Product Title: "Samsung 52inch LC"
> >>> Does not match
> >>>
> >>> Product Title: "Samsung Galaxy S4"
> >>> Matches and returns a highlight: "<em>Samsung Galaxy</em>"
> >>>
> >>> Product Title: "Galaxy Samsung S4"
> >>> Matches and returns a highlight: "<em>Galaxy Samsung</em>"
> >>>
> >>> What would be the best way to approach this?
> >>>
> >>>
> >>>
> >>>
> >>> On Aug 5, 2013, at 7:02 PM, Chris Hostetter <hossman_luc...@fucit.org>
> >>> wrote:
> >>>
> >>>>
> >>>> : Subject: Percolate feature?
> >>>>
> >>>> can you give a more concrete, realistic example of what you are trying
> >>>> to do? your synthetic hypothetical example is kind of hard to make sense
> >>>> of.
> >>>>
> >>>> your Subject line and comment that the "percolate" feature of
> >>>> Elasticsearch sounds like what you want seem to have led some people
> >>>> down a path of assuming you want to run these types of queries as
> >>>> documents are indexed, but other than that, nothing in the way you
> >>>> worded your question makes that clear to me.
> >>>>
> >>>> it's also not clear what aspect of the "results" you really care about:
> >>>> are you only looking for the *number* of documents that "match" according
> >>>> to your concept of matching, or are you looking for a list of matches?
> >>>> if multiple documents have all of their terms in the query string, how
> >>>> should they score relative to each other?  what if a document contains the
> >>>> same term multiple times, do you expect it to be a match of a query only
> >>>> if that term appears in the query multiple times as well?  do you care
> >>>> about the ordering of the terms in the query? the ordering of the terms in
> >>>> the document?
> >>>>
> >>>> Ideally: describe for us what you want to do, w/o assuming
> >>>> solr/elasticsearch/anything specific about the implementation; just
> >>>> describe your actual use case for us, with several real document/query
> >>>> examples.
> >>>>
> >>>>
> >>>>
> >>>> https://people.apache.org/~hossman/#xyproblem
> >>>> XY Problem
> >>>>
> >>>> Your question appears to be an "XY Problem" ... that is: you are dealing
> >>>> with "X", you are assuming "Y" will help you, and you are asking about "Y"
> >>>> without giving more details about the "X" so that we can understand the
> >>>> full issue.  Perhaps the best solution doesn't involve "Y" at all?
> >>>> See Also: http://www.perlmonks.org/index.pl?node_id=542341
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> -Hoss
> >>>
> >>>
>
>
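For the example titles in Mark's message quoted above, here is a naive sketch, again plain Java with no Lucene, of the matching rule plus highlighting. The class and method names are hypothetical, matching is assumed to be case-insensitive on whitespace-separated tokens, and each matched token gets its own `<em>` tag, so the output is `<em>Samsung</em> <em>Galaxy</em> S4` rather than the single-span form written in the examples. Checking every keyword per title is linear in the number of keywords, which seems tolerable for a list of about 1000.

```java
import java.util.*;

// Hypothetical naive matcher: a keyword matches when all of its terms occur
// somewhere in the title, in any order; matched title tokens are wrapped in <em>.
final class KeywordHighlighter {
    // Returns the highlighted title, or null when no keyword matched.
    static String highlight(String title, List<String> keywords) {
        String[] tokens = title.trim().split("\\s+");
        Set<String> titleTerms = new HashSet<>();
        for (String tok : tokens) titleTerms.add(tok.toLowerCase(Locale.ROOT));

        Set<String> toMark = new HashSet<>(); // terms of every fully matched keyword
        for (String kw : keywords) {
            List<String> kwTerms = Arrays.asList(kw.toLowerCase(Locale.ROOT).trim().split("\\s+"));
            if (titleTerms.containsAll(kwTerms)) toMark.addAll(kwTerms);
        }
        if (toMark.isEmpty()) return null;

        StringBuilder sb = new StringBuilder();
        for (String tok : tokens) {
            if (sb.length() > 0) sb.append(' ');
            boolean hit = toMark.contains(tok.toLowerCase(Locale.ROOT));
            sb.append(hit ? "<em>" + tok + "</em>" : tok);
        }
        return sb.toString();
    }
}
```

With the keywords "Sony" and "Samsung Galaxy", "Samsung 52inch LC" returns null, while "Galaxy Samsung S4" returns `<em>Galaxy</em> <em>Samsung</em> S4`.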
