Re: Optimizing Collocated Join

Ilya Kasnacheev Thu, 20 Dec 2018 05:30:53 -0800

Hello!

As I have already said, the likely problem is index selectivity.
The SQL engine has to walk every record where coveringId = 166, perform a
join for each such record using the index on parentS2CellId, and only then
can it filter by date.
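
The same selectivity effect applies to the local EventTheta query quoted below. The following is a toy model in Python, not Ignite's SQL engine. It assumes a sorted, B-tree-style composite index in which only the leading equality columns plus the first range column after them narrow the seek, and every other predicate is checked entry by entry. The row layout, the bind values, and the two index orders (CELL_FIRST, DATE_FIRST) are hypothetical.

```python
import bisect
import math
from itertools import product


# Hypothetical EventTheta-like rows: column names come from the query in this
# thread, all values are synthetic.
def make_rows(parents, cells_per_parent, days, hours):
    rows = []
    for p, c, d, h in product(range(parents), range(cells_per_parent),
                              range(days), range(hours)):
        rows.append({"parentS2CellId": p, "s2CellId": p * 1000 + c,
                     "eventDate": d, "eventHour": h})
    return rows


def build_index(rows, columns):
    # Model a composite index as the rows' key tuples in sorted order.
    return sorted(tuple(r[c] for c in columns) for r in rows)


def index_scan(index, columns, eq, ranges):
    """Return (matching_rows, entries_scanned).

    Only the leading equality columns plus the first range column after them
    bound the seek; all other conditions are checked entry by entry.
    """
    prefix = []
    for col in columns:
        if col not in eq:
            break
        prefix.append(eq[col])
    prefix = tuple(prefix)
    seek_col = columns[len(prefix)] if len(prefix) < len(columns) else None
    if seek_col in ranges:
        lo, hi = ranges[seek_col]
        start = bisect.bisect_left(index, prefix + (lo,))
        end = bisect.bisect_right(index, prefix + (hi, math.inf))
    else:
        start = bisect.bisect_left(index, prefix)
        end = bisect.bisect_right(index, prefix + (math.inf,))
    pos = {c: i for i, c in enumerate(columns)}
    matches = sum(
        1 for key in index[start:end]
        if all(key[pos[c]] == v for c, v in eq.items())
        and all(a <= key[pos[c]] <= b for c, (a, b) in ranges.items())
    )
    return matches, end - start


# Predicates shaped like the local query; the bind values are made up.
EQ = {"parentS2CellId": 1}
RANGES = {"s2CellId": (1002, 1005), "eventDate": (0, 6), "eventHour": (8, 11)}

# Two hypothetical column orders for a composite index.
CELL_FIRST = ["parentS2CellId", "s2CellId", "eventDate", "eventHour"]
DATE_FIRST = ["parentS2CellId", "eventDate", "eventHour", "s2CellId"]


def compare(days_stored):
    rows = make_rows(parents=4, cells_per_parent=10, days=days_stored, hours=24)
    return {name: index_scan(build_index(rows, cols), cols, EQ, RANGES)
            for name, cols in [("cell_first", CELL_FIRST),
                               ("date_first", DATE_FIRST)]}


for days in (7, 30):
    print(days, compare(days))
# 7 {'cell_first': (112, 672), 'date_first': (112, 1680)}
# 30 {'cell_first': (112, 2880), 'date_first': (112, 1680)}
```

With the cell-first order, the number of entries scanned grows with the number of days stored (672 vs 2880) while the result stays at 112 rows, which resembles the growth described below. The date-first order keeps the scan bounded by the queried date range but scans more when little history is stored. Which order wins depends on the real data distribution, so the actual plan should be checked with EXPLAIN rather than taken from this model.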


Regards,
-- 
Ilya Kasnacheev


On Wed, 19 Dec 2018 at 23:07, kellan <[email protected]> wrote:

> I haven't increased parallelism yet, but that's not a solution to my
> problem.
>
> I was able to speed up the query by running a ComputeTask that distributes
> work to the nodes in my cluster based on the affinity key parentS2CellId, and
> then runs this local query for each matching parentS2CellId and s2CellId:
>
> SELECT EventTheta.theta
> FROM EventTheta
> WHERE parentS2CellId = ?
>   AND s2CellId BETWEEN ? AND ?
>   AND eventDate BETWEEN ? AND ?
>   AND eventHour BETWEEN ? AND ?;
>
> With seven days of data in the database I get results back in about 750 ms,
> which is on target, but when I increase my data set to thirty days and run
> the same query (over both a 7-day and a 30-day date range), I'm up to
> 2-3 s.
>
> 7 Days: 1913 ms => 8312 rows
> 30 Days: 1965 ms => 39038 rows
>
> The query execution time seems to grow roughly as O(n) rather than O(log(n))
> with the size of the data set. I need to find a way to preserve my affinity
> key (parentS2CellId) while growing the size of my data set. Is the problem
> the order of the columns in the index, the range queries on the index, or
> something else?
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>
