Re: Filtered join in Solr?

Edward Ribeiro Tue, 04 Feb 2020 15:00:29 -0800

Just for the sake of an imagined scenario, you could use the [subquery] doc
transformer. A query like the one below:


/select?q=family: Smith&fq=watched_movies:[* TO *]&fl=*,
movies:[subquery]&movies.q={!terms f=id v=$row.watched_movies}

Would bring back the results below:

{ "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "movies.q":"{!terms f=id v=$row.watched_movies}",
      "q":"family: Smith",
      "fl":"*, movies:[subquery]",
      "fq":"watched_movies:[* TO *]"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"user_1",
        "name":["Jane"],
        "family":["Smith"],
        "born":["1990-01-01T00:00:00Z"],
        "watched_movies":["1",
          "3"],
        "_version_":1657646162820202496,
        "movies":{"numFound":2,"start":0,"docs":[
            {
              "id":"1",
              "title":["Rambo 1"],
              "release_date":["1978-01-01T00:00:00Z"],
              "_version_":1657646123722997760},
            {
              "id":"3",
              "title":["300 Spartaaaaaans"],
              "release_date":["2005-01-01T00:00:00Z"],
              "_version_":1657646123726143488}]
        }},
      {
        "id":"user_2",
        "title":["Joe"],
        "family":["Smith"],
        "born":["1970-01-01T00:00:00Z"],
        "watched_movies":["2"],
        "_version_":1657646162827542528,
        "movies":{"numFound":1,"start":0,"docs":[
            {
              "id":"2",
              "title":["Rambo 5"],
              "release_date":["1998-01-01T00:00:00Z"],
              "_version_":1657646123725094912}]
        }}]
  }}

But I wasn't able to filter on date (I could filter a specific date
using movies.fq={!term f=release_date v=2005-01-01T00:00:00Z} but not on
range) nor could I perform facets in the children of the above example. It
probably only works on a single node too. Finally, there a couple of
parameters that can be important but that I ommited for the sake of brevity
and clarity: movies.limit=100 and movies.sort=release_date DESC


Best,
Edward



On Tue, Feb 4, 2020 at 11:17 AM Radu Gheorghe <radu.gheor...@sematext.com>
wrote:
>
> Hello Solr users,
>
> How would you design a filtered join scenario?
>
> Say I have a bunch of movies (excuse any inaccuracies, this is an
> imagined scenario):
>
> curl -XPOST -H 'Content-Type: application/json'
> 'localhost:8983/solr/test/update?commitWithin=1000' --data-binary '
> [{
> "id": "1",
> "title": "Rambo 1",
> "release_date": "1978-01-01"
> },
> {
> "id": "2",
> "title": "Rambo 5",
> "release_date": "1998-01-01"
> },
> {
> "id": "3",
>     "title": "300 Spartaaaaaans",
> "release_date": "2005-01-01"
> }]'
>
> And a bunch of users of certain families who watched those movies:
>
> curl -XPOST -H 'Content-Type: application/json'
> 'localhost:8983/solr/test/update?commitWithin=1000' --data-binary '
> [{
> "id": "user_1",
> "name": "Jane",
> "family": "Smith",
> "born": "1990-01-01",
> "watched_movies": ["1", "3"]
> },
> {
> "id": "user_2",
> "title": "Joe",
> "family": "Smith",
> "born": "1970-01-01",
> "watched_movies": ["2"]
> },
> {
> "id": "user_3",
> "title": "Radu",
> "family": "Gheorghe,
> "born": "1985-01-01",
> "watched_movies": ["1", "2", "3"]
> }]'
>
> They don't have to be in the same collection. The important question
> is how to get:
> - movies watched by user of family Smith
> - after they were born
> - including the matching users
> - I'd like to be able to facet on movie metadata, but I don't need to
> facet on user metadata, just to be able to retrieve those fields
>
> The above query should bring back Rambo 5 and 300, with Joe and Jane
> respectively. I wouldn't get Rambo 1, because although Jane watched
> it, the movie was released before she was born.
>
> Here are some options that I have in mind:
> 1) using the join query parser (or the newer XCJF) to do the join
> itself. Then have some sort of plugin pull the "born" value or each
> corresponding user (via some subquery) and filter movies afterwards.
> Normalized, but likely painfully slow
>
> 2) similar approach with 1), in a streaming expression. Again,
> normalized, but slow (we're talking billions of movies, millions of
> users). And limited support for facets.
>
> 3) have some sort of denormalization. For example, pre-compute
> matching users for every movie, then just use join/XCJF to do the
> actual join. This makes indexing/updates expensive and potentially
> complicated
>
> 4) normalization with nested documents. This is best for searches, but
> pretty much a no-go for indexing/updates. In this imaginary use-case,
> there are binge-watchers who might watch a billion movies in a week,
> making us reindex everything
>
> Do you see better ways?
>
> Thanks in advance and best regards,
> Radu

Re: Filtered join in Solr?

Reply via email to