Re: Optimizing Collocated Join

kellan Wed, 19 Dec 2018 12:07:27 -0800

I haven't increased parallelism yet, but that's not a solution to my problem.


I was able to speed up the query by running a ComputeTask that distributes
work to the nodes in my cluster based on the affinity key parentS2CellId, and
then runs this local query for each matching parentS2CellId and s2CellId range:

SELECT EventTheta.theta
FROM EventTheta
WHERE parentS2CellId = ?
  AND s2CellId BETWEEN ? AND ?
  AND eventDate BETWEEN ? AND ?
  AND eventHour BETWEEN ? AND ?;
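Here is a rough Python model of the dispatch step described above. It groups the per-cell local queries by the node that owns each parentS2CellId, so every query runs against local data only. This is not Ignite's API. `partition_for` (a CRC32 hash mod a partition count) and `node_for` (round-robin partition ownership) are hypothetical stand-ins for the cluster's real affinity function and topology, and `CellQuery` is an illustrative name.

```python
from __future__ import annotations

import zlib
from collections import defaultdict
from dataclasses import dataclass

NUM_PARTITIONS = 1024  # hypothetical partition count


@dataclass(frozen=True)
class CellQuery:
    """One local query: parentS2CellId = ? AND s2CellId BETWEEN ? AND ?."""
    parent_s2_cell_id: int  # the affinity key
    s2_min: int
    s2_max: int


def partition_for(affinity_key: int) -> int:
    # Hypothetical stand-in for the affinity function: stable hash mod partitions.
    return zlib.crc32(str(affinity_key).encode()) % NUM_PARTITIONS


def node_for(partition: int, nodes: list[str]) -> str:
    # Hypothetical ownership map: partitions assigned round-robin to nodes.
    return nodes[partition % len(nodes)]


def plan_jobs(queries: list[CellQuery], nodes: list[str]) -> dict[str, list[CellQuery]]:
    """Group queries by owning node; equal affinity keys always land together."""
    jobs: dict[str, list[CellQuery]] = defaultdict(list)
    for q in queries:
        jobs[node_for(partition_for(q.parent_s2_cell_id), nodes)].append(q)
    return dict(jobs)


if __name__ == "__main__":
    queries = [CellQuery(p, p * 10, p * 10 + 9) for p in range(1, 7)]
    for node, work in sorted(plan_jobs(queries, ["node-a", "node-b", "node-c"]).items()):
        print(node, [q.parent_s2_cell_id for q in work])
```

The model only shows the property the ComputeTask relies on: all work for a given parentS2CellId maps to one node, so the local query never has to reach another node's data.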

With seven days of data in the database I get results back in about 750 ms,
which is on target. But when I increase the data set to thirty days and run
the same query (over both a 7-day and a 30-day date range), I'm up to 2-3 s:

7 Days: 1913 ms => 8312 rows
30 Days: 1965 ms => 39038 rows
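The timings above can be turned into growth ratios. This sketch reads the 1913 ms figure as the same 7-day query run after loading 30 days of data, which is an interpretation of the message. It also takes the approximate 750 ms baseline as given.

```python
"""Growth ratios computed from the timings quoted in the message."""


def growth(before: float, after: float) -> float:
    """Return after / before rounded to two decimals."""
    return round(after / before, 2)


BASELINE_MS = 750       # 7-day query, 7 days loaded ("about 750ms")
SEVEN_DAY_MS = 1913     # 7-day query, 30 days loaded (assumed reading)
THIRTY_DAY_MS = 1965    # 30-day query, 30 days loaded
SEVEN_DAY_ROWS = 8312
THIRTY_DAY_ROWS = 39038

data_growth = growth(7, 30)                               # 4.29x more data stored
same_query_growth = growth(BASELINE_MS, SEVEN_DAY_MS)     # 2.55x slower, same query
result_growth = growth(SEVEN_DAY_ROWS, THIRTY_DAY_ROWS)   # 4.7x more rows returned
range_growth = growth(SEVEN_DAY_MS, THIRTY_DAY_MS)        # 1.03x slower for the wider range

print(data_growth, same_query_growth, result_growth, range_growth)
# 4.29 2.55 4.7 1.03
```

Under that reading, the same 7-day query got about 2.55x slower as stored data grew about 4.29x. Widening the date range on the same data returned about 4.7x the rows for only about 3% more time. That pattern is consistent with latency tracking how much data the query scans rather than how many rows it returns. It is also more growth than a logarithmic index lookup alone would suggest.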

The query execution time seems to grow roughly as O(n) rather than O(log n)
in the size of the data set. I need a way to preserve my affinity key
(parentS2CellId) while growing the data set. Is the problem the column order
of the index, the range queries on the index, or something else?
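One way to reason about the index-order question is a generic B-tree model. In that model, a seek narrows its range using equality on leading columns plus one range column. Range predicates on later columns are only checked row by row inside that range. The sketch compares two hypothetical column orders for the query above, using synthetic data: one parent cell, ten child cells, and 24 hours per day. It is a simplified model, not Ignite's or H2's actual planner, and the message does not say which index EventTheta actually has.

```python
from bisect import bisect_left, bisect_right

INF = float("inf")
ORDER_A = ("parent", "s2", "date", "hour")  # hypothetical: cell range first
ORDER_B = ("parent", "date", "hour", "s2")  # hypothetical: time range first


def build_index(days: int, cells: int, order: tuple) -> list:
    """Sorted composite-index keys for synthetic rows of one parent cell."""
    rows = [{"parent": 1, "s2": c, "date": d, "hour": h}
            for d in range(days) for h in range(24) for c in range(cells)]
    return sorted(tuple(r[col] for col in order) for r in rows)


def rows_scanned(index: list, parent: int, lo: int, hi: int) -> int:
    """Entries visited by a seek on first column = parent and second column
    BETWEEN lo AND hi. Predicates on later columns only filter rows inside
    this range, so they do not reduce the count."""
    start = bisect_left(index, (parent, lo))
    end = bisect_right(index, (parent, hi, INF))
    return end - start


for days in (7, 30):
    scanned_a = rows_scanned(build_index(days, 10, ORDER_A), 1, 2, 5)  # s2CellId 2..5
    scanned_b = rows_scanned(build_index(days, 10, ORDER_B), 1, days - 7, days - 1)  # last 7 days
    print(days, scanned_a, scanned_b)
# 7 672 1680
# 30 2880 1680
```

In this model, the cell-range-first order scans rows for every stored day, so its cost grows with the data set. The time-first order scans a fixed amount for a fixed date window but reads every child cell in that window. Which trade-off wins depends on the real data distribution.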





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
