Hello!

I'm not sure why you think it will not scale. If this field is indexed, then taking a random sample is basically one B-tree walk away.
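To illustrate the indexed random field idea, here is a minimal sketch. It makes several assumptions: SQLite (via Python's sqlite3 module) stands in for Ignite, the table name `items` and the helper names are made up for the example, and the random threshold is drawn on the client instead of with RAND(). An ORDER BY is added so that the query returns the row nearest to the threshold, which an index on rand_field can satisfy with a single range seek.

```python
import random
import sqlite3


def build_table(conn, n_rows, seed=0):
    """Create a hypothetical `items` table with an indexed random column."""
    rng = random.Random(seed)
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, rand_field REAL NOT NULL)"
    )
    # The random value is assigned once, at insert time, not per query.
    conn.executemany(
        "INSERT INTO items (id, rand_field) VALUES (?, ?)",
        ((i, rng.random()) for i in range(n_rows)),
    )
    conn.execute("CREATE INDEX idx_items_rand ON items (rand_field)")
    conn.commit()


def sample_one(conn, rng):
    """Pick the row whose rand_field is the smallest value >= a random threshold."""
    threshold = rng.random()
    row = conn.execute(
        "SELECT id, rand_field FROM items "
        "WHERE rand_field >= ? ORDER BY rand_field LIMIT 1",
        (threshold,),
    ).fetchone()
    if row is None:
        # Threshold fell above the largest stored value: wrap around to the start.
        row = conn.execute(
            "SELECT id, rand_field FROM items ORDER BY rand_field LIMIT 1"
        ).fetchone()
    return row


conn = sqlite3.connect(":memory:")
build_table(conn, 10_000)
picked = sample_one(conn, random.Random(42))
print(picked)
```

Note that picking the nearest row slightly favours rows that sit after a large gap in rand_field, so this is an approximation of uniform sampling, not an exact one.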
I guess you will have to store random numbers; if you rely on a non-random field, it might introduce bias.

Regards,
-- 
Ilya Kasnacheev


Mon, 20 May 2019 at 13:28, Stéphane Thibaud <snthib...@gmail.com>:

> Hello Ilya,
>
> Thank you for that suggestion. On a traditional database I know that
> approach does not scale well, since a random number is first assigned to
> all rows (it scales linearly with the number of rows if I am not mistaken).
> Do you think this would be different for Ignite?
>
>
> Kind regards,
>
> Stéphane Thibaud
>
> Mon, 20 May 2019, 15:53 Ilya Kasnacheev <ilya.kasnach...@gmail.com>:
>
>> Hello!
>>
>> You can have a random indexed field in your table and do queries like
>> SELECT * FROM table WHERE rand_field < RAND() LIMIT 1; to sample a random
>> item.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> Mon, 20 May 2019 at 04:50, Stéphane Thibaud <snthib...@gmail.com>:
>>
>>> As a small addition: it would really help if Ignite had a hashing
>>> function for this, but I only see AES encryption.
>>>
>>>
>>> Kind regards,
>>>
>>> Stéphane Thibaud
>>>
>>> Sun, 19 May 2019, 20:59 Stéphane Thibaud <snthib...@gmail.com>:
>>>
>>>> Hello Ignite users,
>>>>
>>>> I am considering to sample randomly on large amounts of data, but I was
>>>> wondering what would be the most efficient way for this. Right now, I think
>>>> I might need cluster-based randomness using a MOD function as described
>>>> here: https://www.alandix.com/academic/topics/random/sampling-SQL.html
>>>>
>>>> I currently have a UUID column (uuid4), which I think can be used for
>>>> it, but I might need some bit manipulation to get the non-random parts out
>>>> of the UUID.
>>>> Do you think this is indeed the most straightforward way to do it?
>>>>
>>>>
>>>> Kind regards,
>>>>
>>>> Stéphane Thibaud
>>>>