Hello!

Why don't you diff the results of those two queries, tell us what the
difference is?

Regards,
-- 
Ilya Kasnacheev


пн, 17 сент. 2018 г. в 16:08, eugene miretsky <[email protected]>:

> Hello,
>
> Just wanted to see if anybody had time to look into this.
>
> Cheers,
> Eugene
>
> On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky <[email protected]>
> wrote:
>
>> Thanks!
>>
>> Tried joining with an inlined table instead of IN as per the second
>> suggestion, and it didn't quite work.
>>
>> Query1:
>>
>>    - Select COUNT(*) FROM( Select customer_id from GATABLE3  use Index(
>>    ) where category_id in (9005, 175930, 175930, 175940,175945,101450, 6453)
>>    group by customer_id having SUM(product_views_app) > 2 OR
>>    SUM(product_clicks_app) > 1 )
>>    - exec time = 17s
>>    - *Result: 3105868*
>>    - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>    customer_id index
>>    - Using an index on category_id increases the query time 33s
>>
>> Query2:
>>
>>    - Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use
>>    index (PUBLIC."_key_PK") inner join table(category_id int = (9005, 175930,
>>    175930, 175940,175945,101450, 6453)) cats on cats.category_id =
>>    ga.category_id   group by customer_id having SUM(product_views_app) > 2 OR
>>    SUM(product_clicks_app) > 1 )
>>    - exec time = 38s
>>    - *Result: 3113921*
>>    - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>>    customer_id index or category_id index
>>    - Using an index on category_id doesnt change the run time
>>
>> Query plans are attached.
>>
>> 3 questions:
>>
>>    1. Why is the result differnt for the 2 queries - this is quite
>>    concerning.
>>    2. Why is the 2nd query taking longer
>>    3. Why  category_id index doesn't work in case of query 2.
>>
>>
>> On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev <[email protected]>
>> wrote:
>>
>>> Hello!
>>>
>>> I don't think that we're able to use index with IN () clauses. Please
>>> convert it into OR clauses.
>>>
>>> Please see
>>> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>>>
>>> Regards,
>>> --
>>> Ilya Kasnacheev
>>>
>>>
>>> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov <
>>> [email protected]>:
>>>
>>>> Hi
>>>>
>>>> Actually, first query uses index on affinity key which looks more
>>>> efficient than index on category_id column.
>>>> The first query can process groups one by one and stream partial
>>>> results from map phase to reduce phase as it use sorted index lookup,
>>>> while second query should process full dataset on map phase before pass
>>>> it for reducing.
>>>>
>>>> Try to use composite index (customer_id, category_id).
>>>>
>>>> Also, SqlQueryFields.setCollocated(true) flag can help Ignite to build
>>>> more efficient plan when group by on collocated column is used.
>>>>
>>>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky <
>>>> [email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Schema:
>>>>>
>>>>>    -
>>>>>
>>>>>    PUBLIC.GATABLE2.CUSTOMER_ID
>>>>>
>>>>>    PUBLIC.GATABLE2.DT
>>>>>
>>>>>    PUBLIC.GATABLE2.CATEGORY_ID
>>>>>
>>>>>    PUBLIC.GATABLE2.VERTICAL_ID
>>>>>
>>>>>    PUBLIC.GATABLE2.SERVICE
>>>>>
>>>>>    PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>>>
>>>>>    PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>>>
>>>>>    PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>>>
>>>>>    PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>>>
>>>>>    PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>>>
>>>>>    PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>>>    - pkey = customer_id,dt
>>>>>    - affinityKey = customer
>>>>>
>>>>> Query:
>>>>>
>>>>>    - select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>>>    category_id in (175925, 101450, 9005, 175930, 175930, 
>>>>> 175940,175945,101450,
>>>>>    6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>>    SUM(product_clicks_app) > 1 )
>>>>>
>>>>> The table has 600M rows.
>>>>> At first, the query took 1m, when we added an index on category_id the
>>>>> query started taking 3m.
>>>>>
>>>>> The SQL execution plan for both queries is attached.
>>>>>
>>>>> We are using a single x1.16xlarge insntace with query parallelism set
>>>>> to 32
>>>>>
>>>>> Cheers,
>>>>> Eugene
>>>>>
>>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey V. Mashenkov
>>>>
>>>

Reply via email to