Hi all,

I’m looking for a way to query two collections and find documents that exist in
both. I know this can be done with the innerJoin streaming expression, but I want
to avoid it, since one of the collection streams can potentially return billions
of results.

Let’s say the two collections are:

deletedItems = [{deletedItemId: 1}, {deletedItemId: 2}...]
items = [
        {
                id: 1,
                name: "a"
        },
        {
                id: 2,
                name: "b"
        },
        {
                id: 3,
                name: "c"
        }.....
]

“deletedItems” contains only a few documents compared to the “items” collection
(1 million vs. 2-3 billion). If I query both with a typical query in our system,
deletedItems returns a few thousand results, but items returns tens or hundreds of
millions. To use innerJoin, I have to stream the whole items result set to the
worker node over the network.
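Conceptually, joining two streams that are both sorted on the join key works like the merge join below. This is a plain-Python sketch of the general technique, not Solr's implementation; the function name `inner_join` is hypothetical, and it assumes the left-hand keys (deletedItemId here) are unique. It shows why the size of the items result matters: every items tuple up to the last matching key has to be read and then discarded, even when only a few thousand of them actually match.

```python
from typing import Iterable, Iterator


def inner_join(left: Iterable[dict], right: Iterable[dict],
               left_key: str, right_key: str) -> Iterator[dict]:
    # Merge join (hypothetical helper): both inputs must already be sorted
    # ascending on their key, and left keys are assumed to be unique.
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l[left_key] < r[right_key]:
            l = next(left_it, None)    # left is behind: advance left
        elif l[left_key] > r[right_key]:
            r = next(right_it, None)   # right is behind: this tuple was read for nothing
        else:
            yield {**l, **r}           # keys match: emit the merged tuple
            r = next(right_it, None)


deleted = [{"deletedItemId": 1}, {"deletedItemId": 2}]
items = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
print([t["id"] for t in inner_join(deleted, items, "deletedItemId", "id")])
```

With deleted IDs spread across the whole key range, the right-hand loop ends up touching nearly every items tuple, which is the network cost described above.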

Is there a way to avoid this, for example by using the “deletedItems” result as a
query against the “items” stream?
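One way to picture "using the deletedItems result as a query" is to collect the deleted IDs and turn them into batched filter queries for the items collection, so that only matching items ever leave the shards. The sketch below assumes Solr's `terms` query parser syntax (`{!terms f=field}v1,v2,...`), the field name `id` from the example above, and a batch size of 1000; the helper names `chunked` and `terms_filters` and the batch size are hypothetical, and it does not issue any requests.

```python
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    # Split an ID sequence into lists of at most `size` elements.
    batch: List[int] = []
    for i in ids:
        batch.append(i)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def terms_filters(ids: Iterable[int], field: str = "id",
                  batch_size: int = 1000) -> Iterator[str]:
    # Build one terms-query filter string per batch of deleted IDs
    # (assumption: each string is used as an fq on a search of "items").
    for batch in chunked(ids, batch_size):
        yield "{!terms f=%s}%s" % (field, ",".join(str(i) for i in batch))


for fq in terms_filters([1, 2, 3], batch_size=2):
    print(fq)
```

Batching keeps each query string bounded in size; the trade-off is one round trip per batch instead of one full export of the items result.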

Thanks in advance for the help
