Here is the result of EXPLAIN for the afermantioned query: SELECT __Z0.CUSTOMER_ID AS __C0_0, SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1, MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2 FROM PUBLIC.GAL3EC1 __Z0 /* PUBLIC.DT_IDX: DT > '2018-05-12' */ WHERE __Z0.DT > '2018-05-12' GROUP BY __Z0.CUSTOMER_ID SELECT COUNT(*) FROM ( SELECT __C0_0 AS CUSTOMER_ID FROM PUBLIC.__T0 GROUP BY __C0_0 HAVING (SUM(__C0_1) > 2) AND (MAX(__C0_2) < 1) ) _0__Z1
On Tue, Aug 21, 2018 at 8:18 PM, eugene miretsky <[email protected]> wrote: > Hi, > > We have a cache called GAL3EC1, it has > > 1. A composite pKey consisting of customer_id and date > 2. An Index on the date column > 3. 300 sparse columns > > We are running a single EC2 4x8xlarge node. > > The following query takes 8min to finish > Select COUNT (*) FROM (SELECT customer_id FROM GAl3ec1 where dt > > '2018-05-12' GROUP BY customer_id having SUM(ec1_bknt_total_product_views_app) > > 2 AND MAX(ec1_hnk_total_product_clicks_app) < 1) > > I have a few questions: > > 1. 'top' command shows %100 cpu utilization (i.e only one of the 32 > CPUs is used). How can I get the query to use all 32 CPUs? I have tried > setting Query Parallelism to 32, but it didn't help, > 2. Adding the index on date column seems to have slowed down the > query. The 8min time from above was without the index, with the index the > query doesn't finish (I gave up after 30min). A similar query on a > smaller date range showed a 10x slow down with the index. Why? > 3. Our loads from Spark are very slow as well, and also seem to not > use the system resource properly, can that be related? > 4. What are some good tools and techniques to troubleshoot these > problems in Ignite? > > > All the relevant info is attached (configs, cache stats, node stats, > etc.). > > Cheers, > Eugene > > > > > > >
