Here is the documentation for fetch: https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#fetch
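
For the collections in the question, a fetch call might look roughly like the expression built below. This is only a sketch: the collection and field names (deletedItems, items, deletedItemId, id, name) come from the original message, while the q="*:*" query, the /export handler, the sort and the batchSize value are assumptions to adapt. It also assumes "id" is unique in "items", so each deletedItemId matches at most one document, which is the one-to-one case fetch supports. How tuples with no match in "items" come back is worth checking against the documentation above before relying on this as an intersection.

```python
# Sketch: build a Solr "fetch" streaming expression as a plain string.
# Names taken from the original question: deletedItems, items,
# deletedItemId, id, name. Everything else (q, qt, sort, batchSize)
# is an assumed placeholder, not a recommended setting.

def build_fetch_expression(batch_size: int = 50) -> str:
    # Inner stream: the small side of the join (a few thousand tuples).
    inner = (
        'search(deletedItems, q="*:*", qt="/export", '
        'fl="deletedItemId", sort="deletedItemId asc")'
    )
    # Outer fetch: look up matching documents in the large collection,
    # in batches, instead of streaming the whole of "items" to a worker.
    # on="fieldInTuple=fieldInCollection"
    return (
        f'fetch(items, {inner}, fl="name", '
        f'on="deletedItemId=id", batchSize={batch_size})'
    )


expr = build_fetch_expression()
print(expr)
```

The resulting string would be sent as the expr parameter to the /stream handler of a collection. Only the deletedItems side is streamed in full; "items" is queried by key in batches.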

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Nov 23, 2020 at 3:22 PM Joel Bernstein <joels...@gmail.com> wrote:

> There are two streams that behave like that.
>
> One is the "nodes" expression, which is not going to work for this use
> case because it does everything in memory.
>
> The second one is the "fetch" expression which behaves like a nested loop
> join with some limitations. Unfortunately the main limitation is likely to
> be a blocker for you which is that it doesn't support one-to-many joins yet.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz <uyil...@vivaldi.net.invalid>
> wrote:
>
>> Hi all,
>>
>> I’m looking for a way to query two collections and find documents that
>> exist in both, I know this can be done with innerJoin streaming expression
>> but I want to avoid it, since one of the collection streams can possibly
>> have billions of results:
>>
>> Let’s say two collections are:
>>
>> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
>> items = [
>>   {
>>     id: 1,
>>     name: "a"
>>   },
>>   {
>>     id: 2,
>>     name: "b"
>>   },
>>   {
>>     id: 3,
>>     name: "c"
>>   }.....
>> ]
>>
>> “deletedItems” contain a few documents compared to “items” collection
>> (1mil vs 2-3 bil). If I query them both with a typical query in our system,
>> deletedItems gives a few thousand results but items give tens/hundreds of
>> millions. To use innerJoin, I have to stream the whole items result to
>> worker node over network.
>>
>> Is there a way to avoid this, something like using “deletedItems” result
>> as a query to “items” stream?
>>
>> Thanks in advance for the help
>>
>> Sent from Mail for Windows 10