Hi, Seems, full index scan is needed for this query. Adding more nodes and\or using queryParallelism feature [1] should reduce query time by utilizing multiple threads, when each thread will process with smaller index.
[1] https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism On Wed, May 10, 2017 at 9:55 PM, Guillermo Ortiz <[email protected]> wrote: > Yes, > > ******Result SELECT distinct(age) FROM PERSONWITHINDEX > 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2 > 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4 > 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6 > 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8 > 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96, > Elapsed time SELECT distinct(age) FROM PERSONWITHINDEX:2880ms > [SELECT DISTINCT > __Z0.AGE AS __C0_0 > FROM "PersonWithIndex".PERSONWITHINDEX __Z0 > /* "PersonWithIndex".PERSONWITHINDEX_AGE_IDX */] > [SELECT DISTINCT > __C0_0 AS AGE > FROM PUBLIC.__T0 > /* "PersonWithIndex"."merge_scan" */] > > > > ******Result SELECT distinct(age) FROM PERSONWITHOUTINDEX > 97,98,99,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,2 > 0,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,4 > 0,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,6 > 0,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,8 > 0,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96, > Elapsed time SELECT distinct(age) FROM PERSONWITHOUTINDEX:2976ms > [SELECT DISTINCT > __Z0.AGE AS __C0_0 > FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0 > /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */] > [SELECT DISTINCT > __C0_0 AS AGE > FROM PUBLIC.__T0 > /* "PersonWithoutIndex"."merge_scan" */] > > Age is a value between 0-100. I tried with a value between 0-1000 with the > same result. > It seems that it's not using any index. > > I'm not sure if in memory a random access is slower than a sequencial > access. I guess, but I'm not sure. > > > 2017-05-10 19:50 GMT+02:00 Andrey Mashenkov <[email protected]>: > >> Hi, >> >> I've just think you can achive same result with query without groupBy. >> Select distinct age from .. >> >> Does it workable for you? >> >> 10 мая 2017 г. 17:56 пользователь "Guillermo Ortiz" <[email protected]> >> написал: >> >> It's my laptop. 16gb, i7.. The collection size it's 5Mill objects (1,2GB) >>> and it takes about 30sec. >>> I have tried to execute the query with the same collection with and >>> without indices with the same time results. >>> >>> This is a test before to run in the real cluster with 6 nodes of 512Gb >>> and 48cores each one. We tried to execute there with a real dataset(22M >>> rows, 7gb) but it takes so long that query is aborted. >>> >>> >>> 2017-05-08 12:12 GMT+02:00 Andrey Mashenkov <[email protected]> >>> : >>> >>>> Hi, >>>> >>>> Why do you think query is slow? >>>> What its execution time? What is expected time? >>>> How many nodes do you use? How much data does cache contains? >>>> >>>> On Mon, May 8, 2017 at 1:47 AM, Guillermo Ortiz <[email protected]> >>>> wrote: >>>> >>>>> I have a simple cache where I have Person object (id, age and so on..) >>>>> >>>>> I have created indeces by id and age to try some examples but queries >>>>> with "group by" go really slow. >>>>> >>>>> I'm trying this query: >>>>> SELECT age >>>>> FROM >>>>> PersonWithindex >>>>> group by age >>>>> >>>>> >>>>> SELECT >>>>> AGE AS __C0 >>>>> FROM "personCacheWithIndex".PERSONWITHINDEX >>>>> /* "personCacheWithIndex"."age_idx" */ >>>>> GROUP BY AGE >>>>> /* group sorted */ >>>>> >>>>> SELECT >>>>> __C0 AS AGE >>>>> FROM PUBLIC.__T0 >>>>> /* "personCacheWithIndex"."merge_scan" */ >>>>> GROUP BY __C0 >>>>> >>>>> Although it seems that it uses index, why is it going so slow? I think >>>>> that it should be pretty fast with an index. >>>>> >>>> >>>> >>>> >>>> -- >>>> Best regards, >>>> Andrey V. Mashenkov >>>> >>> >>> > -- Best regards, Andrey V. Mashenkov
