Re: GroupBy with index is really slow.

Andrey Mashenkov Thu, 11 May 2017 04:50:11 -0700

Hi,

Seems, full index scan is needed for this query.
Adding more nodes and\or using queryParallelism feature [1] should reduce
query time by utilizing multiple threads, when each thread will process
with smaller index.


[1]
https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism

On Wed, May 10, 2017 at 9:55 PM, Guillermo Ortiz <[email protected]>
wrote:

> Yes,
>
> ******Result SELECT distinct(age) FROM PERSONWITHINDEX
> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
> Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms
> [SELECT DISTINCT
>     __Z0.AGE AS __C0_0
> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>     /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */]
> [SELECT DISTINCT
>     __C0_0 AS AGE
> FROM PUBLIC.__T0
>     /* "PersonWithIndex"."merge_scan" */]
>
>
>
> ******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX
> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
> Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms
> [SELECT DISTINCT
>     __Z0.AGE AS __C0_0
> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
> [SELECT DISTINCT
>     __C0_0 AS AGE
> FROM PUBLIC.__T0
>     /* "PersonWithoutIndex"."merge_scan" */]
>
> Age is a value between 0-100. I tried with a value between 0-1000 with the
> same result.
> It seems that it's not using any index.
>
> I'm not sure if in memory a random access is slower than a sequencial
> access. I guess, but I'm not sure.
>
>
> 2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <[email protected]>:
>
>> Hi,
>>
>> I've just think you can achive same result with query without groupBy.
>> Select distinct age from ..
>>
>> Does it workable for you?
>>
>> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <[email protected]>
>> написал:
>>
>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
>>> and it takes about 30sec.
>>> I have tried to execute the query with the same collection with and
>>> without indices with the same time results.
>>>
>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>> rows, 7gb) but it takes so long that query is aborted.
>>>
>>>
>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[email protected]>
>>> :
>>>
>>>> Hi,
>>>>
>>>> Why do you think query is slow?
>>>> What its execution time? What is expected time?
>>>> How many nodes do you use? How much data does cache contains?
>>>>
>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[email protected]>
>>>> wrote:
>>>>
>>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>>
>>>>> I have created indeces by id and age to try some examples but queries
>>>>> with "group by" go really slow.
>>>>>
>>>>> I'm trying this query:
>>>>> SELECT age
>>>>> FROM
>>>>> PersonWithindex
>>>>> group by age
>>>>>
>>>>>
>>>>> SELECT
>>>>>     AGE AS __C0
>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>> GROUP BY AGE
>>>>> /* group sorted */
>>>>>
>>>>> SELECT
>>>>>     __C0 AS AGE
>>>>> FROM PUBLIC.__T0
>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>> GROUP BY __C0
>>>>>
>>>>> Although it seems that it uses index, why is it going so slow? I think
>>>>> that it should be pretty fast with an index.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Andrey V. Mashenkov
>>>>
>>>
>>>
>


-- 
Best regards,
Andrey V. Mashenkov

Re: GroupBy with index is really slow.

Reply via email to