Re: Optimizing Collocated Join

Ilya Kasnacheev Mon, 17 Dec 2018 06:03:18 -0800

Hello!

PUBLIC.EVENTTHETA_PARENTS2CELLID_ASC_S2CELLID_ASC_EVENTDATE_ASC_EVENTHOUR_ASC_IDX
Looks like this index is not selective enough for efficient filtering by
EVENTDATE. So you have to scan a lot of entries regardless of date range.


You can set local=true on SQL query to only return results from local node.

Compute tasks can't use SQL indexes outside of SQL queries so it might not
be so efficient.

Regards,

Regards,
-- 
Ilya Kasnacheev


пн, 17 дек. 2018 г. в 07:02, kellan <[email protected]>:

> I'm trying to understand the mechanics of collocated joins in Ignite (I'm
> running 2.7). I have two queries that are only differentiated by the date
> range that return in the same amount of time, although one pulls data for
> thirty days and the other for only one.
>
>
>     SELECT S2CellCovering.coveringId, COUNT(*)
>     FROM S2CellCovering
>     INNER JOIN EventTheta
>       ON EventTheta.parentS2CellId = S2CellCovering.parentS2CellId
>       AND EventTheta.s2CellId BETWEEN S2CellCovering.minS2CellId AND
> S2CellCovering.maxS2CellId
>       AND EventTheta.eventDate BETWEEN '2018-10-02' AND '2018-10-02'
>       AND EventTheta.eventHour BETWEEN -1 AND -1
>     WHERE S2CellCovering.coveringId = 166
>     GROUP BY S2CellCovering.coveringId;
>
>     COVERINGID  166
>     COUNT(*)    5629
>
>     COVERINGID  166
>     COUNT(*)    6407
>
>     COVERINGID  166
>     COUNT(*)    9030
>
>     COVERINGID  166
>     COUNT(*)    8965
>
>     4 rows selected (12.389 seconds)
>
>
>     SELECT S2CellCovering.coveringId, COUNT(*)
>     FROM S2CellCovering
>     INNER JOIN EventTheta
>       ON EventTheta.parentS2CellId = S2CellCovering.parentS2CellId
>       AND EventTheta.s2CellId BETWEEN S2CellCovering.minS2CellId AND
> S2CellCovering.maxS2CellId
>       AND EventTheta.eventDate BETWEEN '2018-10-02' AND '2018-10-31'
>       AND EventTheta.eventHour BETWEEN -1 AND -1
>     WHERE S2CellCovering.coveringId = 166
>     GROUP BY S2CellCovering.coveringId;
>
>     COVERINGID  166
>     COUNT(*)    149690
>
>     COVERINGID  166
>     COUNT(*)    161104
>
>     COVERINGID  166
>     COUNT(*)    215174
>
>     COVERINGID  166
>     COUNT(*)    218183
>
>     4 rows selected (12.578 seconds)
>
>
> When I run EXPLAIN it looks like my rows are all properly indexed, as far
> as
> I can tell:
>
>
> PLAN  SELECT
>     __Z0.COVERINGID AS __C0_0,
>     COUNT(*) AS __C0_1
> FROM PUBLIC.S2CELLCOVERING __Z0
>     /* PUBLIC.S2CELLCOVERING_COVERINGID_ASC_IDX: COVERINGID = 166 */
>     /* WHERE __Z0.COVERINGID = 166
>     */
> INNER JOIN PUBLIC.EVENTTHETA __Z1
>     /*
>
> PUBLIC.EVENTTHETA_PARENTS2CELLID_ASC_S2CELLID_ASC_EVENTDATE_ASC_EVENTHOUR_ASC_IDX:
> EVENTHOUR >= -1
>         AND EVENTHOUR <= -1
>         AND EVENTDATE >= DATE '2018-10-02'
>         AND EVENTDATE <= DATE '2018-10-31'
>         AND PARENTS2CELLID = __Z0.PARENTS2CELLID
>         AND S2CELLID >= __Z0.MINS2CELLID
>         AND S2CELLID <= __Z0.MAXS2CELLID
>      */
>     ON 1=1
> WHERE (__Z0.COVERINGID = 166)
>     AND (((__Z1.EVENTHOUR >= -1)
>     AND (__Z1.EVENTHOUR <= -1))
>     AND (((__Z1.EVENTDATE >= DATE '2018-10-02')
>     AND (__Z1.EVENTDATE <= DATE '2018-10-31'))
>     AND ((__Z1.PARENTS2CELLID = __Z0.PARENTS2CELLID)
>     AND ((__Z0.MINS2CELLID <= __Z1.S2CELLID)
>     AND (__Z0.MAXS2CELLID >= __Z1.S2CELLID)))))
> GROUP BY __Z0.COVERINGID
> /* group sorted */
>
> PLAN  SELECT
>     __C0_0 AS COVERINGID,
>     __C0_1 AS __C0_1
> FROM PUBLIC.__T0
>     /* PUBLIC."merge_scan" */
>
>
> The EVENTTHETA and S2CELLCOVERING keys should be collocated on the same
> nodes. Here are my keys:
>
>
>     case class EventThetaKey(
>         @(AffinityKeyMapped@field)
>         @(QuerySqlField@field)(
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_eventtheta",
>                     order = 4,
>                     descending = true
>                 )
>             )
>         ) parentS2CellId: Long,
>         @(QuerySqlField@field)(
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_eventtheta",
>                     order = 3,
>                     descending = true
>                 )
>             )
>         ) s2CellId: Long,
>         @(QuerySqlField@field)(
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_eventtheta",
>                     order = 2,
>                     descending = true
>                 )
>             )
>         ) eventDate: Date,
>         @(QuerySqlField@field)(
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_eventtheta",
>                     order = 1,
>                     descending = true
>                 )
>             )
>         ) eventHour: Byte
>     )
>
>
>     case class S2CellCoveringKey(
>         @(AffinityKeyMapped@field)
>         @(QuerySqlField@field)(
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_s2cellcovering",
>                     order = 3,
>                     descending = true
>                 )
>             )
>         ) parentS2CellId: Long,
>         @(QuerySqlField@field)(
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_s2cellcovering",
>                     order = 2,
>                     descending = true
>                 )
>             )
>         ) s2CellId: Long,
>         @(QuerySqlField@field)(
>             index = true,
>             orderedGroups = Array(
>                 new (QuerySqlField.Group@field)(
>                     name = "pk_public_s2cellcovering",
>                     order = 1,
>                     descending = true
>                 )
>             )
>         ) coveringId: Long
>     )
>
>
> Based on the fact that I am able to scan 30x more rows in roughly the same
> amount of time, I gather that at one of my indices is set up properly, but
> both queries are taking over 12s to run with plenty of compute available.
> What am I missing here? I'm assuming that since the tables are joined on a
> shared affinity key the join is collocated, but maybe I'm wrong.
>
> On another note, is there a way to run this query in a compute task such
> that I'm only returning results from the local node?
>
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Optimizing Collocated Join

Reply via email to