Here is the result of EXPLAIN for the afermantioned query:

SELECT
__Z0.CUSTOMER_ID AS __C0_0,
SUM(__Z0.EC1_BKNT_TOTAL_PRODUCT_VIEWS_APP) AS __C0_1,
MAX(__Z0.EC1_HNK_TOTAL_PRODUCT_CLICKS_APP) AS __C0_2
FROM PUBLIC.GAL3EC1 __Z0
/* PUBLIC.DT_IDX: DT > '2018-05-12' */
WHERE __Z0.DT > '2018-05-12'
GROUP BY __Z0.CUSTOMER_ID
SELECT
COUNT(*)
FROM (
SELECT
__C0_0 AS CUSTOMER_ID
FROM PUBLIC.__T0
GROUP BY __C0_0
HAVING (SUM(__C0_1) > 2)
AND (MAX(__C0_2) < 1)
) _0__Z1



On Tue, Aug 21, 2018 at 8:18 PM, eugene miretsky <[email protected]>
wrote:

> Hi,
>
> We have a cache called GAL3EC1, it has
>
>    1. A composite pKey consisting of customer_id and date
>    2. An Index on the date column
>    3. 300 sparse columns
>
> We are running a single EC2 4x8xlarge node.
>
> The following query takes 8min to finish
> Select COUNT (*) FROM (SELECT customer_id FROM GAl3ec1 where dt >
> '2018-05-12' GROUP BY customer_id having SUM(ec1_bknt_total_product_views_app)
> > 2 AND MAX(ec1_hnk_total_product_clicks_app) < 1)
>
> I have a few questions:
>
>    1. 'top' command shows %100 cpu utilization (i.e only one of the 32
>    CPUs is used). How can I get the query to use all 32 CPUs? I have tried
>    setting Query Parallelism to 32, but it didn't help,
>    2. Adding the index on date column seems to have slowed down the
>    query. The 8min time from above was without the index, with the index the
>    query doesn't finish (I gave up after 30min). A similar query on a
>    smaller date range showed a 10x slow down with the index. Why?
>    3. Our loads from Spark are very slow as well, and also seem to not
>    use the system resource properly, can that be related?
>    4. What are some good tools and techniques to troubleshoot these
>    problems in Ignite?
>
>
> All the relevant info is attached (configs, cache stats, node stats,
> etc.).
>
> Cheers,
> Eugene
>
>
>
>
>
>
>

Reply via email to