You run n scans in parallel. 
You want a single result set in sort order. 

How do you do that? 
(Rhetorical) 

That merge is extra work that you don’t have when a single scan gives you a
single result set.
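To make that extra work concrete, here is a minimal sketch. It assumes each
parallel scan hands back its rows already sorted by row key as (row_key, value)
pairs. The name merge_scans and the row shape are hypothetical, not an HBase
API. A min-heap holds the current head of each of the n scans, so the client
has to keep all n streams open and pays O(log n) comparisons for every row it
emits. Python's standard heapq.merge does the same job.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # hypothetical row shape: (row_key, value)


def merge_scans(scans: List[Iterable[Row]]) -> Iterator[Row]:
    """K-way merge of n individually sorted scan results into one sorted stream.

    Each input must already be sorted by row key. The heap holds one entry
    (the current head) per scan, so each emitted row costs O(log n).
    """
    iterators = [iter(s) for s in scans]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            # idx breaks ties on equal keys, so values are never compared
            heap.append((first[0], idx, first))
    heapq.heapify(heap)
    while heap:
        _, idx, row = heapq.heappop(heap)
        yield row
        # Refill the heap from the scan that just supplied the smallest row.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], idx, nxt))


if __name__ == "__main__":
    scans = [[("a", "1"), ("d", "4")], [("b", "2"), ("e", "5")], [("c", "3")]]
    print([key for key, _ in merge_scans(scans)])  # ['a', 'b', 'c', 'd', 'e']
```

With a single scan none of this is needed: the rows already arrive in order and
can be passed straight through to the caller.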

This goes into why the work of associating secondary indexes with the base
table won’t scale, or won’t work at all, once you have to consider joins.

Remember to think beyond your specific use case to the general one. You said
you had considered not caring about the RS order, but in the general case you
have to consider it.

Think of it this way… 

In many RDBMSs you have two ways to handle parallelism.
You can partition your data in round-robin fashion, or you can partition your
data by range.
In one use case, the client used a date-range partition. That is, they
partitioned the data by month instead of just storing it in round-robin
fashion.

With one, you get a high degree of parallelism because your query runs against
data that’s spread across all the nodes in the database.
With the other, your data is segmented, so you’re only going after a subset of
your data that’s local to a single system.
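A toy model of the two layouts, assuming a hypothetical 4-node cluster where
each row is identified only by its date. Month partitions are mapped to nodes
by month number modulo the node count; that placement rule is an illustrative
assumption, not how any particular RDBMS assigns partitions. The sketch counts
how many nodes a query for a single month has to visit under each scheme.

```python
from datetime import date
from typing import Dict, List

NUM_NODES = 4  # hypothetical cluster size

Partitions = Dict[int, List[date]]


def round_robin_partition(rows: List[date]) -> Partitions:
    """Deal rows out to nodes in turn, ignoring their values."""
    parts: Partitions = {n: [] for n in range(NUM_NODES)}
    for i, d in enumerate(rows):
        parts[i % NUM_NODES].append(d)
    return parts


def month_range_partition(rows: List[date]) -> Partitions:
    """Place every row of a given month on the same node.

    Assumption: month partitions map to nodes by month number modulo
    NUM_NODES; real systems use their own placement rules.
    """
    parts: Partitions = {n: [] for n in range(NUM_NODES)}
    for d in rows:
        parts[(d.month - 1) % NUM_NODES].append(d)
    return parts


def nodes_touched(parts: Partitions, year: int, month: int) -> List[int]:
    """Nodes holding at least one row for the requested month (the fan-out)."""
    return sorted(
        n for n, days in parts.items()
        if any(d.year == year and d.month == month for d in days)
    )


if __name__ == "__main__":
    # Eight days of April and eight days of May 2014.
    rows = [date(2014, m, day) for m in (4, 5) for day in range(1, 9)]
    print(nodes_touched(round_robin_partition(rows), 2014, 5))  # [0, 1, 2, 3]
    print(nodes_touched(month_range_partition(rows), 2014, 5))  # [0]
```

Round-robin spreads one month’s rows over every node, so the query fans out and
the per-node results have to be combined; the monthly range partition confines
the same query to one node.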

Which is better? 
Which is more efficient? 

;-) 



On May 19, 2014, at 2:00 PM, Mike Axiak <[email protected]> wrote:

> On Mon, May 19, 2014 at 8:53 AM, Michael Segel
> <[email protected]> wrote:
>> While in each scan, the RS is in sort order, the overall set of RS needs to 
>> be merged into one RS and that’s where you start to have issues.
> 
> What issues? As I said, in multiple tests we saw performance
> improvements across the board with this strategy.
> 
