Hi Actually, first query uses index on affinity key which looks more efficient than index on category_id column. The first query can process groups one by one and stream partial results from map phase to reduce phase as it use sorted index lookup, while second query should process full dataset on map phase before pass it for reducing.
Try to use composite index (customer_id, category_id). Also, SqlQueryFields.setCollocated(true) flag can help Ignite to build more efficient plan when group by on collocated column is used. On Sun, Sep 2, 2018 at 2:02 AM eugene miretsky <[email protected]> wrote: > Hello, > > Schema: > > - > > PUBLIC.GATABLE2.CUSTOMER_ID > > PUBLIC.GATABLE2.DT > > PUBLIC.GATABLE2.CATEGORY_ID > > PUBLIC.GATABLE2.VERTICAL_ID > > PUBLIC.GATABLE2.SERVICE > > PUBLIC.GATABLE2.PRODUCT_VIEWS_APP > > PUBLIC.GATABLE2.PRODUCT_CLICKS_APP > > PUBLIC.GATABLE2.PRODUCT_VIEWS_WEB > > PUBLIC.GATABLE2.PRODUCT_CLICKS_WEB > > PUBLIC.GATABLE2.PDP_SESSIONS_APP > > PUBLIC.GATABLE2.PDP_SESSIONS_WEB > - pkey = customer_id,dt > - affinityKey = customer > > Query: > > - select COUNT(*) FROM( Select customer_id from GATABLE2 where > category_id in (175925, 101450, 9005, 175930, 175930, 175940,175945,101450, > 6453) group by customer_id having SUM(product_views_app) > 2 OR > SUM(product_clicks_app) > 1 ) > > The table has 600M rows. > At first, the query took 1m, when we added an index on category_id the > query started taking 3m. > > The SQL execution plan for both queries is attached. > > We are using a single x1.16xlarge insntace with query parallelism set to > 32 > > Cheers, > Eugene > > -- Best regards, Andrey V. Mashenkov
