Hello Ilya, Yes, you are right. I sent my first response too quickly. With an extra column it will work. I will go with the approach you suggest. :-)
Kind regards, Stéphane Thibaud 2019年5月20日(月) 20:26 Ilya Kasnacheev <ilya.kasnach...@gmail.com>: > Hello! > > I'm not sure why you think it will not scale. If this field is indexed > then taking a random sample is basically one b-tree walk away. > > I guess you will have to store random numbers, if you rely on non-random > field it might introduce bias. > > Regards, > -- > Ilya Kasnacheev > > > пн, 20 мая 2019 г. в 13:28, Stéphane Thibaud <snthib...@gmail.com>: > >> Hello Ilya, >> >> Thank you for that suggestion. On a traditional database I know that >> approach does not scale well, since a random number is first assigned to >> all rows (it scales linearly with the number of rows if I am not mistaken). >> Do you think this would be different for Ignite? >> >> >> Kind regards, >> >> Stéphane Thibaud >> >> 2019年5月20日(月) 15:53 Ilya Kasnacheev <ilya.kasnach...@gmail.com>: >> >>> Hello! >>> >>> You can have a random indexed field in your table and do queries like >>> SELECT * FROM table WHERE rand_field < RAND() LIMIT 1; to sample random >>> item. >>> >>> Regards, >>> -- >>> Ilya Kasnacheev >>> >>> >>> пн, 20 мая 2019 г. в 04:50, Stéphane Thibaud <snthib...@gmail.com>: >>> >>>> As a small addition: it would really help if Ignite had a hashing >>>> function for this, but I only see AES encryption. >>>> >>>> >>>> Kind regards, >>>> >>>> Stéphane Thibaud >>>> >>>> 2019年5月19日(日) 20:59 Stéphane Thibaud <snthib...@gmail.com>: >>>> >>>>> Hello Ignite users, >>>>> >>>>> I am considering to sample randomly on large amounts of data, but I >>>>> was wondering what would be the most efficient way for this. Right now, I >>>>> think I might need cluster-based randomness using a MOD function as >>>>> described here: >>>>> https://www.alandix.com/academic/topics/random/sampling-SQL.html >>>>> >>>>> I currently have a UUID column (uuid4), which I think can be used for >>>>> it, but I might need some bit manipulation to get the non-random parts out >>>>> of the UUID. >>>>> Do you think this is indeed the most straightforward way to do it? >>>>> >>>>> >>>>> Kind regards, >>>>> >>>>> Stéphane Thibaud >>>>> >>>>