Re: GroupBy with index is really slow.

Guillermo Ortiz Wed, 10 May 2017 11:56:35 -0700

Yes,

******Result SELECT distinct(age) FROM PERSONWITHINDEX
97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms
[SELECT DISTINCT
    __Z0.AGE AS __C0_0
FROM "PersonWithIndex".PERSONWITHINDEX __Z0
    /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */]
[SELECT DISTINCT
    __C0_0 AS AGE
FROM PUBLIC.__T0
    /* "PersonWithIndex"."merge_scan" */]




******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX
97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,
20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,
40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,
60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,
80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms
[SELECT DISTINCT
    __Z0.AGE AS __C0_0
FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
    /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
[SELECT DISTINCT
    __C0_0 AS AGE
FROM PUBLIC.__T0
    /* "PersonWithoutIndex"."merge_scan" */]

Age is a value between 0-100. I tried with a value between 0-1000 with the
same result.
It seems that it's not using any index.

I'm not sure if in memory a random access is slower than a sequencial
access. I guess, but I'm not sure.


2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <[email protected]>:

> Hi,
>
> I've just think you can achive same result with query without groupBy.
> Select distinct age from ..
>
> Does it workable for you?
>
> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <[email protected]>
> написал:
>
> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB)
>> and it takes about 30sec.
>> I have tried to execute the query with the same collection with and
>> without indices with the same time results.
>>
>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>> and 48cores each one. We tried to execute there with a real dataset(22M
>> rows, 7gb) but it takes so long that query is aborted.
>>
>>
>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[email protected]>:
>>
>>> Hi,
>>>
>>> Why do you think query is slow?
>>> What its execution time? What is expected time?
>>> How many nodes do you use? How much data does cache contains?
>>>
>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[email protected]>
>>> wrote:
>>>
>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>
>>>> I have created indeces by id and age to try some examples but queries
>>>> with "group by" go really slow.
>>>>
>>>> I'm trying this query:
>>>> SELECT age
>>>> FROM
>>>> PersonWithindex
>>>> group by age
>>>>
>>>>
>>>> SELECT
>>>>     AGE AS __C0
>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>     /* "personCacheWithIndex"."age_idx" */
>>>> GROUP BY AGE
>>>> /* group sorted */
>>>>
>>>> SELECT
>>>>     __C0 AS AGE
>>>> FROM PUBLIC.__T0
>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>> GROUP BY __C0
>>>>
>>>> Although it seems that it uses index, why is it going so slow? I think
>>>> that it should be pretty fast with an index.
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>
>>

Re: GroupBy with index is really slow.

Reply via email to