I tried both to get more variety of values and index should be work better
to get the max(id) than max(age), but I got the same result.
Elapsed time SELECT MONTH, MAX(ID) FROM PERSONWITHINDEX GROUP BY
MONTH:2621ms
[SELECT
__Z0.MONTH AS __C0_0,
MAX(__Z0.ID) AS __C0_1
FROM "PersonWithIndex".PERSONWITHINDEX __Z0
/* "PersonWithIndex".*INDEX_GROUP_MONTH_ID **/
GROUP BY __Z0.MONTH]
[SELECT
__C0_0 AS MONTH,
MAX(__C0_1) AS __C0_1
FROM PUBLIC.__T0
/* "PersonWithIndex"."merge_scan" */
GROUP BY __C0_0]
Although it seems that it uses it when I get times, seems the opposite.
2017-05-11 13:55 GMT+02:00 Andrey Mashenkov <[email protected]>:
> Query use PERSONWITHINDEX_MONTH_IDX instead of index_group_month_age.
>
> Do you mean query "SELECT MONTH, MAX(*AGE*)"?
>
> On Wed, May 10, 2017 at 6:21 PM, Guillermo Ortiz <[email protected]>
> wrote:
>
>> Is there any limitation like databases which if it has to read more than
>> x% of data it do a full scan?
>>
>> I tried this:
>>
>> Result *SELECT MONTH, MAX(ID) FROM PERSONWITHINDEX GROUP BY MONTH *
>> September,1999868,February,1999869,March,1999871,August,1999
>> 844,June,1999867,April,1999854,July,1999866,January,1999857,
>> May,1999862,November,1999830,October,1999864,
>> Elapsed time SELECT MONTH, MAX(ID) FROM PERSONWITHINDEX GROUP BY MONTH:
>> *2743ms*
>> [SELECT
>> __Z0.MONTH AS __C0_0,
>> MAX(__Z0.ID) AS __C0_1
>> FROM "PersonWithIndex".PERSONWITHINDEX __Z0
>> /* "PersonWithIndex".PERSONWITHINDEX_MONTH_IDX */
>> GROUP BY __Z0.MONTH
>> /* group sorted */]
>> [SELECT
>> __C0_0 AS MONTH,
>> MAX(__C0_1) AS __C0_1
>> FROM PUBLIC.__T0
>> /* "PersonWithIndex"."merge_scan" */
>> GROUP BY __C0_0]
>>
>>
>> ******Result *SELECT MONTH, MAX(ID) FROM PERSONWITHOUTINDEX GROUP BY
>> MONTH* September,1999860,November,1999870,February,1999869,March,
>> 1999866,May,1999864,April,1999865,July,1999859,August,
>> 1999871,January,1999867,June,1999868,October,1999854,
>> Elapsed time SELECT MONTH, MAX(ID) FROM PERSONWITHOUTINDEX GROUP BY MONTH:
>> *2418ms*
>> [SELECT
>> __Z0.MONTH AS __C0_0,
>> MAX(__Z0.ID) AS __C0_1
>> FROM "PersonWithoutIndex".PERSONWITHOUTINDEX __Z0
>> /* "PersonWithoutIndex".PERSONWITHOUTINDEX.__SCAN_ */
>> GROUP BY __Z0.MONTH]
>> [SELECT
>> __C0_0 AS MONTH,
>> MAX(__C0_1) AS __C0_1
>> FROM PUBLIC.__T0
>> /* "PersonWithoutIndex"."merge_scan" */
>> GROUP BY __C0_0]
>>
>>
>> I tried to execute many times same query and times are similar all the
>> times, I guess that indices aren't being useful. Am I doing something bad?
>>
>> public class PersonWithIndex implements Serializable {
>> @QuerySqlField(index = true)
>> public int id;
>> @QuerySqlField(index = true, orderedGroups =
>> {@QuerySqlField.Group(name="index_group_month_age", order=1)})
>> public int age;
>> @QuerySqlField
>> public String dni;
>> @QuerySqlField(index = true, orderedGroups =
>> {@QuerySqlField.Group(name="index_group_month_age", order=0)})
>> public String month;
>>
>> ....
>>
>>
>>
>> public class PersonWithoutIndex implements Serializable {
>> @QuerySqlField
>> public int id;
>> @QuerySqlField
>> public int age;
>> @QuerySqlField
>> public String dni;
>> @QuerySqlField
>> public String month;
>>
>>
>> 2017-05-08 12:32 GMT+02:00 Andrey Mashenkov <[email protected]>:
>>
>>> Group indices description can be found here [1]. Please, check if you
>>> read docs for your ignite version, as configuration for ignite 1.x and 2.0
>>> can differs.
>>> Group indices in Ignite a similar to composite or multi-column indixes
>>> in databases.
>>>
>>> Yes, it is work that way. Group index for (groupFiled, timestampField)
>>> should be helpful.
>>> You can try to use EXPLAIN comand to check if correct index is in use.
>>>
>>>
>>> [1] https://apacheignite.readme.io/docs/indexes#section-group-indexes
>>>
>>> On Fri, May 5, 2017 at 8:17 PM, Guillermo Ortiz <[email protected]>
>>> wrote:
>>>
>>>>
>>>>
>>>> If I have two single indices to query:
>>>>
>>>> select * table a,b where a=1 and b=2
>>>>
>>>> it doesn't work pretty good and I have to create a group index, how is
>>>> that possible? how does group indices work?
>>>>
>>>> Similar to this:
>>>>
>>>> select max(timestampField)
>>>> from myTable
>>>> group by groupField
>>>>
>>>> I have a index by groupField what it means that it should be really
>>>> fast to make the group and later make a partial map to find in each node
>>>> the max of the subset of data to merge the final submax and get the final
>>>> result in a "reduce" task after sending data through network. Is it how
>>>> does it work?
>>>> This query it's really slow, I don't know if I created an index by
>>>> timestampField I would get better performance, but it doesn't too much
>>>> sense to create a index with a timestampField where there aren't too row
>>>> with the same value in a table with 250M rows....
>>>> Any advice about this query?
>>>>
>>>
>>>
>>>
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>>
>>
>>
>
>
> --
> Best regards,
> Andrey V. Mashenkov
>