Just for the sake of an imagined scenario, you could use the [subquery] doc transformer. A query like the one below:

/select?q=family: Smith&fq=watched_movies:[* TO *]&fl=*, movies:[subquery]&movies.q={!terms f=id v=$row.watched_movies}

Would bring back the results below:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "movies.q":"{!terms f=id v=$row.watched_movies}",
      "q":"family: Smith",
      "fl":"*, movies:[subquery]",
      "fq":"watched_movies:[* TO *]"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"user_1",
        "name":["Jane"],
        "family":["Smith"],
        "born":["1990-01-01T00:00:00Z"],
        "watched_movies":["1", "3"],
        "_version_":1657646162820202496,
        "movies":{"numFound":2,"start":0,"docs":[
            {
              "id":"1",
              "title":["Rambo 1"],
              "release_date":["1978-01-01T00:00:00Z"],
              "_version_":1657646123722997760},
            {
              "id":"3",
              "title":["300 Spartaaaaaans"],
              "release_date":["2005-01-01T00:00:00Z"],
              "_version_":1657646123726143488}]
        }},
      {
        "id":"user_2",
        "title":["Joe"],
        "family":["Smith"],
        "born":["1970-01-01T00:00:00Z"],
        "watched_movies":["2"],
        "_version_":1657646162827542528,
        "movies":{"numFound":1,"start":0,"docs":[
            {
              "id":"2",
              "title":["Rambo 5"],
              "release_date":["1998-01-01T00:00:00Z"],
              "_version_":1657646123725094912}]
        }}]
  }}

But I wasn't able to filter on a date range (I could filter on a specific date using movies.fq={!term f=release_date v=2005-01-01T00:00:00Z} but not on a range), nor could I facet on the children in the above example. It probably only works on a single node too.

Finally, there are a couple of parameters that can be important but that I omitted for the sake of brevity and clarity: movies.limit=100 and movies.sort=release_date DESC.

Best,
Edward

On Tue, Feb 4, 2020 at 11:17 AM Radu Gheorghe <radu.gheor...@sematext.com> wrote:
>
> Hello Solr users,
>
> How would you design a filtered join scenario?
>
> Say I have a bunch of movies (excuse any inaccuracies, this is an
> imagined scenario):
>
> curl -XPOST -H 'Content-Type: application/json' \
>   'localhost:8983/solr/test/update?commitWithin=1000' --data-binary '
> [{
>     "id": "1",
>     "title": "Rambo 1",
>     "release_date": "1978-01-01"
> },
> {
>     "id": "2",
>     "title": "Rambo 5",
>     "release_date": "1998-01-01"
> },
> {
>     "id": "3",
>     "title": "300 Spartaaaaaans",
>     "release_date": "2005-01-01"
> }]'
>
> And a bunch of users of certain families who watched those movies:
>
> curl -XPOST -H 'Content-Type: application/json' \
>   'localhost:8983/solr/test/update?commitWithin=1000' --data-binary '
> [{
>     "id": "user_1",
>     "name": "Jane",
>     "family": "Smith",
>     "born": "1990-01-01",
>     "watched_movies": ["1", "3"]
> },
> {
>     "id": "user_2",
>     "title": "Joe",
>     "family": "Smith",
>     "born": "1970-01-01",
>     "watched_movies": ["2"]
> },
> {
>     "id": "user_3",
>     "title": "Radu",
>     "family": "Gheorghe",
>     "born": "1985-01-01",
>     "watched_movies": ["1", "2", "3"]
> }]'
>
> They don't have to be in the same collection. The important question
> is how to get:
> - movies watched by users of family Smith
> - after they were born
> - including the matching users
> - I'd like to be able to facet on movie metadata, but I don't need to
> facet on user metadata, just to be able to retrieve those fields
>
> The above query should bring back Rambo 5 and 300, with Joe and Jane
> respectively. I wouldn't get Rambo 1, because although Jane watched
> it, the movie was released before she was born.
>
> Here are some options that I have in mind:
> 1) using the join query parser (or the newer XCJF) to do the join
> itself. Then have some sort of plugin pull the "born" value of each
> corresponding user (via some subquery) and filter movies afterwards.
> Normalized, but likely painfully slow
>
> 2) similar approach with 1), in a streaming expression. Again,
> normalized, but slow (we're talking billions of movies, millions of
> users).
> And limited support for facets.
>
> 3) have some sort of denormalization. For example, pre-compute
> matching users for every movie, then just use join/XCJF to do the
> actual join. This makes indexing/updates expensive and potentially
> complicated
>
> 4) normalization with nested documents. This is best for searches, but
> pretty much a no-go for indexing/updates. In this imaginary use-case,
> there are binge-watchers who might watch a billion movies in a week,
> making us reindex everything
>
> Do you see better ways?
>
> Thanks in advance and best regards,
> Radu
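
Since the [subquery] approach in the reply could not apply a date-range filter inside the subquery, one possible workaround is to post-filter the response on the client. The sketch below is only an illustration, not a Solr feature: the function names are hypothetical, it assumes the multi-valued field layout shown in the response above, and it assumes each user's full movie list fits within movies.limit. It would not help with faceting, and it would not scale to the sizes mentioned in the original question.

```python
from datetime import datetime


def parse_solr_date(value):
    """Parse a Solr UTC timestamp such as 1990-01-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def movies_watched_after_birth(user_docs):
    """Group users by movie, keeping movies released after the user's birth.

    Hypothetical helper: expects docs shaped like the [subquery] response,
    where each user doc has "born" and a nested "movies" result.
    """
    by_movie = {}
    for user in user_docs:
        born = parse_solr_date(user["born"][0])
        for movie in user.get("movies", {}).get("docs", []):
            if parse_solr_date(movie["release_date"][0]) > born:
                entry = by_movie.setdefault(
                    movie["id"], {"title": movie["title"][0], "users": []}
                )
                entry["users"].append(user["id"])
    return by_movie


# Trimmed copy of the "docs" array from the response in the reply.
docs = [
    {
        "id": "user_1",
        "born": ["1990-01-01T00:00:00Z"],
        "movies": {"docs": [
            {"id": "1", "title": ["Rambo 1"],
             "release_date": ["1978-01-01T00:00:00Z"]},
            {"id": "3", "title": ["300 Spartaaaaaans"],
             "release_date": ["2005-01-01T00:00:00Z"]},
        ]},
    },
    {
        "id": "user_2",
        "born": ["1970-01-01T00:00:00Z"],
        "movies": {"docs": [
            {"id": "2", "title": ["Rambo 5"],
             "release_date": ["1998-01-01T00:00:00Z"]},
        ]},
    },
]

result = movies_watched_after_birth(docs)
for movie_id, info in result.items():
    print(movie_id, info["title"], info["users"])
```

On the sample data this keeps Rambo 5 (with user_2) and 300 Spartaaaaaans (with user_1) and drops Rambo 1, which matches the result the original question asked for.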