Re: GroupBy with index is really slow.

Guillermo Ortiz Thu, 11 May 2017 05:01:28 -0700

Okay, I'll try with that.

2017-05-11 13:49 GMT+02:00 Andrey Mashenkov <[email protected]>:


> Hi,
>
> Seems, full index scan is needed for this query.
> Adding more nodes and\or using queryParallelism feature [1] should reduce
> query time by utilizing multiple threads, when each thread will process
> with smaller index.
>
> [1] https://apacheignite.readme.io/docs/sql-performance-and-debugging#
> query-parallelism
>
> On Wed, May 10, 2017 at 9:55 PM, Guillermo Ortiz <[email protected]>
> wrote:
>
>> Yes,
>>
>> ******Result SELECT distinct(age) FROM PERSONWITHINDEX
>> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
>> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
>> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
>> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
>> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
>> Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms
>> [SELECT DISTINCT
>>     __Z0.AGE AS __C0_0
>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>>     /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */]
>> [SELECT DISTINCT
>>     __C0_0 AS AGE
>> FROM PUBLIC.__T0
>>     /* "PersonWithIndex"."merge_scan" */]
>>
>>
>>
>> ******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX
>> 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2
>> 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4
>> 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6
>> 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8
>> 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,
>> Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms
>> [SELECT DISTINCT
>>     __Z0.AGE AS __C0_0
>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>>     /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */]
>> [SELECT DISTINCT
>>     __C0_0 AS AGE
>> FROM PUBLIC.__T0
>>     /* "PersonWithoutIndex"."merge_scan" */]
>>
>> Age is a value between 0-100. I tried with a value between 0-1000 with
>> the same result.
>> It seems that it's not using any index.
>>
>> I'm not sure if in memory a random access is slower than a sequencial
>> access. I guess, but I'm not sure.
>>
>>
>> 2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <[email protected]>:
>>
>>> Hi,
>>>
>>> I've just think you can achive same result with query without groupBy.
>>> Select distinct age from ..
>>>
>>> Does it workable for you?
>>>
>>> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <
>>> [email protected]> написал:
>>>
>>> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects
>>>> (1,2GB) and it takes about 30sec.
>>>> I have tried to execute the query with the same collection with and
>>>> without indices with the same time results.
>>>>
>>>> This is a test before to run in the real cluster with 6 nodes of 512Gb
>>>> and 48cores each one. We tried to execute there with a real dataset(22M
>>>> rows, 7gb) but it takes so long that query is aborted.
>>>>
>>>>
>>>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[email protected]
>>>> >:
>>>>
>>>>> Hi,
>>>>>
>>>>> Why do you think query is slow?
>>>>> What its execution time? What is expected time?
>>>>> How many nodes do you use? How much data does cache contains?
>>>>>
>>>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I have a simple cache where I have Person object (id, age and so on..)
>>>>>>
>>>>>> I have created indeces by id and age to try some examples but queries
>>>>>> with "group by" go really slow.
>>>>>>
>>>>>> I'm trying this query:
>>>>>> SELECT age
>>>>>> FROM
>>>>>> PersonWithindex
>>>>>> group by age
>>>>>>
>>>>>>
>>>>>> SELECT
>>>>>>     AGE AS __C0
>>>>>> FROM "personCacheWithIndex".PERSONWITHINDEX
>>>>>>     /* "personCacheWithIndex"."age_idx" */
>>>>>> GROUP BY AGE
>>>>>> /* group sorted */
>>>>>>
>>>>>> SELECT
>>>>>>     __C0 AS AGE
>>>>>> FROM PUBLIC.__T0
>>>>>>     /* "personCacheWithIndex"."merge_scan" */
>>>>>> GROUP BY __C0
>>>>>>
>>>>>> Although it seems that it uses index, why is it going so slow? I
>>>>>> think that it should be pretty fast with an index.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards,
>>>>> Andrey V. Mashenkov
>>>>>
>>>>
>>>>
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: GroupBy with index is really slow.

Reply via email to