Hello,

Just wanted to see if anybody had time to look into this.

Cheers,
Eugene

On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky <[email protected]>
wrote:

> Thanks!
>
> Tried joining with an inlined table instead of IN as per the second
> suggestion, and it didn't quite work.
>
> Query1:
>
>    - Select COUNT(*) FROM( Select customer_id from GATABLE3  use Index( )
>    where category_id in (9005, 175930, 175930, 175940,175945,101450, 6453)
>    group by customer_id having SUM(product_views_app) > 2 OR
>    SUM(product_clicks_app) > 1 )
>    - exec time = 17s
>    - *Result: 3105868*
>    - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>    customer_id index
>    - Using an index on category_id increases the query time 33s
>
> Query2:
>
>    - Select COUNT(*) FROM( Select customer_id from GATABLE3 ga  use index
>    (PUBLIC."_key_PK") inner join table(category_id int = (9005, 175930,
>    175930, 175940,175945,101450, 6453)) cats on cats.category_id =
>    ga.category_id   group by customer_id having SUM(product_views_app) > 2 OR
>    SUM(product_clicks_app) > 1 )
>    - exec time = 38s
>    - *Result: 3113921*
>    - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or
>    customer_id index or category_id index
>    - Using an index on category_id doesnt change the run time
>
> Query plans are attached.
>
> 3 questions:
>
>    1. Why is the result differnt for the 2 queries - this is quite
>    concerning.
>    2. Why is the 2nd query taking longer
>    3. Why  category_id index doesn't work in case of query 2.
>
>
> On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev <[email protected]>
> wrote:
>
>> Hello!
>>
>> I don't think that we're able to use index with IN () clauses. Please
>> convert it into OR clauses.
>>
>> Please see
>> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov <[email protected]
>> >:
>>
>>> Hi
>>>
>>> Actually, first query uses index on affinity key which looks more
>>> efficient than index on category_id column.
>>> The first query can process groups one by one and stream partial results
>>> from map phase to reduce phase as it use sorted index lookup,
>>> while second query should process full dataset on map phase before pass
>>> it for reducing.
>>>
>>> Try to use composite index (customer_id, category_id).
>>>
>>> Also, SqlQueryFields.setCollocated(true) flag can help Ignite to build
>>> more efficient plan when group by on collocated column is used.
>>>
>>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky <
>>> [email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Schema:
>>>>
>>>>    -
>>>>
>>>>    PUBLIC.GATABLE2.CUSTOMER_ID
>>>>
>>>>    PUBLIC.GATABLE2.DT
>>>>
>>>>    PUBLIC.GATABLE2.CATEGORY_ID
>>>>
>>>>    PUBLIC.GATABLE2.VERTICAL_ID
>>>>
>>>>    PUBLIC.GATABLE2.SERVICE
>>>>
>>>>    PUBLIC.GATABLE2.PRODUCT_VIEWS_APP
>>>>
>>>>    PUBLIC.GATABLE2.PRODUCT_CLICKS_APP
>>>>
>>>>    PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB
>>>>
>>>>    PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB
>>>>
>>>>    PUBLIC.GATABLE2.PDP_SESSIONS_APP
>>>>
>>>>    PUBLIC.GATABLE2.PDP_SESSIONS_WEB
>>>>    - pkey = customer_id,dt
>>>>    - affinityKey = customer
>>>>
>>>> Query:
>>>>
>>>>    - select COUNT(*) FROM( Select customer_id from GATABLE2 where
>>>>    category_id in (175925, 101450, 9005, 175930, 175930, 
>>>> 175940,175945,101450,
>>>>    6453) group by customer_id having SUM(product_views_app) > 2 OR
>>>>    SUM(product_clicks_app) > 1 )
>>>>
>>>> The table has 600M rows.
>>>> At first, the query took 1m, when we added an index on category_id the
>>>> query started taking 3m.
>>>>
>>>> The SQL execution plan for both queries is attached.
>>>>
>>>> We are using a single x1.16xlarge insntace with query parallelism set
>>>> to 32
>>>>
>>>> Cheers,
>>>> Eugene
>>>>
>>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>

Reply via email to