Re: Use stream result like a query (alternative to innerJoin)

Joel Bernstein Mon, 23 Nov 2020 12:23:32 -0800

Here is the documentation for fetch:

https://lucene.apache.org/solr/guide/8_4/stream-decorator-reference.html#fetch
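
To make the "nested loop join" behaviour concrete, here is a rough Python sketch of the idea applied to the collections from the original question. It is an in-memory simulation, not Solr code: the helper names (fetch_like, lookup_items), the batch size, and the choice to drop unmatched tuples are all assumptions made for illustration. FETCH_EXPR is an untested guess at what the expression might look like for these collections, following the parameter layout in the reference guide linked above.

```python
from itertools import islice

# Hypothetical, untested expression for the collections in the question.
# Parameter layout (collection, stream, fl, on) follows the reference guide.
FETCH_EXPR = (
    'fetch(items, '
    'search(deletedItems, q="*:*", fl="deletedItemId", sort="deletedItemId asc"), '
    'fl="id,name", on="deletedItemId=id")'
)


def fetch_like(left_tuples, lookup, left_key, batch_size=2):
    """Simulate a batched nested-loop lookup.

    Reads the small left stream in batches, asks `lookup` for the documents
    matching the keys of each batch, and yields the merged tuples.
    Only one document per key is supported (no one-to-many), and tuples
    without a match are dropped (an assumption that suits the
    "exists in both" use case).
    """
    it = iter(left_tuples)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        keys = {t[left_key] for t in batch}
        found = lookup(keys)  # one round trip per batch, not per tuple
        for t in batch:
            doc = found.get(t[left_key])
            if doc is not None:
                yield {**t, **doc}


# Stand-in for the large "items" collection, indexed by id.
items_by_id = {
    1: {"id": 1, "name": "a"},
    2: {"id": 2, "name": "b"},
    3: {"id": 3, "name": "c"},
}


def lookup_items(keys):
    """Stand-in for a query against "items" restricted to the given ids."""
    return {k: items_by_id[k] for k in keys if k in items_by_id}


# Stand-in for the small "deletedItems" result; 99 has no match in items.
deleted_items = [{"deletedItemId": 1}, {"deletedItemId": 2}, {"deletedItemId": 99}]

matches = list(fetch_like(deleted_items, lookup_items, "deletedItemId"))
print(matches)
```

The point of the sketch is that only the small side is streamed, and the large side is queried by key in batches, so the full items result never has to cross the network.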



Joel Bernstein
http://joelsolr.blogspot.com/


On Mon, Nov 23, 2020 at 3:22 PM Joel Bernstein <joels...@gmail.com> wrote:

> There are two streams that behave like that.
>
> One is the "nodes" expression, which is not going to work for this use
> case because it does everything in memory.
>
> The second one is the "fetch" expression, which behaves like a nested loop
> join with some limitations. Unfortunately, the main limitation is likely to
> be a blocker for you: it doesn't support one-to-many joins yet.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Sun, Nov 22, 2020 at 10:37 AM ufuk yılmaz <uyil...@vivaldi.net.invalid>
> wrote:
>
>> Hi all,
>>
>> I’m looking for a way to query two collections and find documents that
>> exist in both. I know this can be done with the innerJoin streaming
>> expression, but I want to avoid it, since one of the collection streams
>> can have billions of results:
>>
>> Let’s say two collections are:
>>
>> deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}, ...]
>> items = [
>>         {
>>                 id: 1,
>>                 name: "a"
>>         },
>>         {
>>                 id: 2,
>>                 name: "b"
>>         },
>>         {
>>                 id: 3,
>>                 name: "c"
>>         },
>>         ...
>> ]
>>
>> The “deletedItems” collection contains few documents compared to the
>> “items” collection (about 1 million vs. 2-3 billion). If I query them both
>> with a typical query in our system, deletedItems gives a few thousand
>> results but items gives tens or hundreds of millions. To use innerJoin, I
>> have to stream the whole items result to the worker node over the network.
>>
>> Is there a way to avoid this, something like using the “deletedItems”
>> result as a query against the “items” stream?
>>
>> Thanks in advance for the help
