Hi all,
I’m looking for a way to query two collections and find the documents that exist in
both. I know this can be done with the innerJoin streaming expression, but I want to
avoid it, since one of the collection streams can potentially return billions of
results.
Let’s say the two collections are:
deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}, ...]
items = [
  {id: 1, name: "a"},
  {id: 2, name: "b"},
  {id: 3, name: "c"},
  ...
]
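To make "exist in both" concrete, here is a minimal in-memory sketch of the intended result on the toy data above. It is a semi-join: keep the items whose id appears as a deletedItemId. The function name `items_in_both` is hypothetical and this runs locally, not against Solr.

```python
# Toy data mirroring the two collections described above (illustrative values only).
deleted_items = [{"deletedItemId": 1}, {"deletedItemId": 2}]
items = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]


def items_in_both(deleted, all_items):
    """Return the items whose id also appears as a deletedItemId (a semi-join)."""
    # Build a set from the small side so each membership check is O(1) on average.
    deleted_ids = {d["deletedItemId"] for d in deleted}
    return [item for item in all_items if item["id"] in deleted_ids]


print(items_in_both(deleted_items, items))
# prints [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```

The set is built from the small side (deletedItems), which is the same asymmetry that makes streaming the whole large side undesirable.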
“deletedItems” contains far fewer documents than the “items” collection (1 million vs.
2-3 billion). If I query both with a typical query in our system, deletedItems
returns a few thousand results but items returns tens or hundreds of millions. To use
innerJoin, I have to stream the entire items result to the worker node over the network.
Is there a way to avoid this, something like using “deletedItems” result as a
query to “items” stream?
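One way to picture "using the deletedItems result as a query" is to take the few thousand IDs it returns and turn them into filter queries against items, using Solr's terms query parser (`{!terms f=...}a,b,c`). The sketch below only builds those filter strings; it assumes the join field in items is `id`, that IDs contain no commas (the parser's default separator), and that `batch_size` is a hypothetical tuning knob. It does not contact Solr.

```python
from typing import Iterable, List


def build_terms_filters(ids: Iterable[int], field: str = "id", batch_size: int = 1000) -> List[str]:
    """Split IDs into batches and render each batch as a Solr terms-parser filter query.

    Assumes the target field is `field` and that no ID contains a comma,
    since the terms parser splits its value on commas by default.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    id_list = [str(i) for i in ids]
    filters = []
    # Each batch becomes one bounded filter string instead of one huge query.
    for start in range(0, len(id_list), batch_size):
        batch = id_list[start:start + batch_size]
        filters.append("{!terms f=%s}%s" % (field, ",".join(batch)))
    return filters


print(build_terms_filters([1, 2, 3, 4, 5], batch_size=2))
# prints ['{!terms f=id}1,2', '{!terms f=id}3,4', '{!terms f=id}5']
```

Each string could be sent as an `fq` on a query to items, so only matching items would travel over the network rather than the full items result; batching keeps any single request from growing too large.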
Thanks in advance for the help