Hello! Why don't you diff the results of those two queries, tell us what the difference is?
Regards, -- Ilya Kasnacheev пн, 17 сент. 2018 г. в 16:08, eugene miretsky <[email protected]>: > Hello, > > Just wanted to see if anybody had time to look into this. > > Cheers, > Eugene > > On Wed, Sep 12, 2018 at 6:29 PM eugene miretsky <[email protected]> > wrote: > >> Thanks! >> >> Tried joining with an inlined table instead of IN as per the second >> suggestion, and it didn't quite work. >> >> Query1: >> >> - Select COUNT(*) FROM( Select customer_id from GATABLE3 use Index( >> ) where category_id in (9005, 175930, 175930, 175940,175945,101450, 6453) >> group by customer_id having SUM(product_views_app) > 2 OR >> SUM(product_clicks_app) > 1 ) >> - exec time = 17s >> - *Result: 3105868* >> - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or >> customer_id index >> - Using an index on category_id increases the query time 33s >> >> Query2: >> >> - Select COUNT(*) FROM( Select customer_id from GATABLE3 ga use >> index (PUBLIC."_key_PK") inner join table(category_id int = (9005, 175930, >> 175930, 175940,175945,101450, 6453)) cats on cats.category_id = >> ga.category_id group by customer_id having SUM(product_views_app) > 2 OR >> SUM(product_clicks_app) > 1 ) >> - exec time = 38s >> - *Result: 3113921* >> - Same exec time if using AFFINITY_KEY index or "_key_PK_hash or >> customer_id index or category_id index >> - Using an index on category_id doesnt change the run time >> >> Query plans are attached. >> >> 3 questions: >> >> 1. Why is the result differnt for the 2 queries - this is quite >> concerning. >> 2. Why is the 2nd query taking longer >> 3. Why category_id index doesn't work in case of query 2. >> >> >> On Wed, Sep 5, 2018 at 8:31 AM Ilya Kasnacheev <[email protected]> >> wrote: >> >>> Hello! >>> >>> I don't think that we're able to use index with IN () clauses. Please >>> convert it into OR clauses. >>> >>> Please see >>> https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-sql-performance-and-usability-considerations >>> >>> Regards, >>> -- >>> Ilya Kasnacheev >>> >>> >>> пн, 3 сент. 2018 г. в 12:46, Andrey Mashenkov < >>> [email protected]>: >>> >>>> Hi >>>> >>>> Actually, first query uses index on affinity key which looks more >>>> efficient than index on category_id column. >>>> The first query can process groups one by one and stream partial >>>> results from map phase to reduce phase as it use sorted index lookup, >>>> while second query should process full dataset on map phase before pass >>>> it for reducing. >>>> >>>> Try to use composite index (customer_id, category_id). >>>> >>>> Also, SqlQueryFields.setCollocated(true) flag can help Ignite to build >>>> more efficient plan when group by on collocated column is used. >>>> >>>> On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky < >>>> [email protected]> wrote: >>>> >>>>> Hello, >>>>> >>>>> Schema: >>>>> >>>>> - >>>>> >>>>> PUBLIC.GATABLE2.CUSTOMER_ID >>>>> >>>>> PUBLIC.GATABLE2.DT >>>>> >>>>> PUBLIC.GATABLE2.CATEGORY_ID >>>>> >>>>> PUBLIC.GATABLE2.VERTICAL_ID >>>>> >>>>> PUBLIC.GATABLE2.SERVICE >>>>> >>>>> PUBLIC.GATABLE2.PRODUCT_VIEWS_APP >>>>> >>>>> PUBLIC.GATABLE2.PRODUCT_CLICKS_APP >>>>> >>>>> PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB >>>>> >>>>> PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB >>>>> >>>>> PUBLIC.GATABLE2.PDP_SESSIONS_APP >>>>> >>>>> PUBLIC.GATABLE2.PDP_SESSIONS_WEB >>>>> - pkey = customer_id,dt >>>>> - affinityKey = customer >>>>> >>>>> Query: >>>>> >>>>> - select COUNT(*) FROM( Select customer_id from GATABLE2 where >>>>> category_id in (175925, 101450, 9005, 175930, 175930, >>>>> 175940,175945,101450, >>>>> 6453) group by customer_id having SUM(product_views_app) > 2 OR >>>>> SUM(product_clicks_app) > 1 ) >>>>> >>>>> The table has 600M rows. >>>>> At first, the query took 1m, when we added an index on category_id the >>>>> query started taking 3m. >>>>> >>>>> The SQL execution plan for both queries is attached. >>>>> >>>>> We are using a single x1.16xlarge insntace with query parallelism set >>>>> to 32 >>>>> >>>>> Cheers, >>>>> Eugene >>>>> >>>>> >>>> >>>> -- >>>> Best regards, >>>> Andrey V. Mashenkov >>>> >>>
