Good idea, but if there is no AUTOINCREMENT you can simply get the
min/max directly.
I think the following works (and for both positive and negative rowids).
NB: MaxRowID itself is not needed:
with Rows(MinRowID, NumRows) as (
       select min(RowID) as MinRowID,
              max(RowID) - min(RowID) + 1 as NumRows
         from tab),
     Potential(Candidate) as (
       -- abs(random() % NumRows) yields 0 .. NumRows-1, so add MinRowID
       -- (not MinRowID - 1) to cover the full range MinRowID .. MaxRowID
       select abs(random() % (select NumRows from Rows))
              + (select MinRowID from Rows) as Candidate
         from generate_series
        where start = 1
          and stop = (select 2 * NumRows from Rows))
select distinct tab.*
  from Potential
 cross join tab
    on tab.RowID = Potential.Candidate
 limit 200;
NB: this depends on the series.c extension for generate_series.
Selecting 200 random rows from tab containing 1 million rows takes
about 0.2 seconds; from tab with 10 million rows it takes about 2
seconds:
SQLite version 3.24.0 2018-06-01 13:30:45
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .version
SQLite 3.24.0 2018-06-01 13:30:45
8cadaf587dc96370f9c8a1dccc366b93021e8cfe4526da9368a088828fd1alt2
zlib version 1.2.11
gcc-7.3.0
sqlite> create table tab (RowID integer primary key, uuid text not null);
sqlite> insert into tab (RowID, uuid) select value-5000000,
uuidStringCreateV4() from generate_series where start=1 and stop=10000000;
sqlite> select min(rowid), max(rowid) from tab;
-4999999|5000000
sqlite> with Rows(MinRowID, NumRows) as (select min(RowID) as MinRowID,
   ...> max(RowID) - min(RowID) + 1 as NumRows
   ...> from tab),
   ...> Potential(Candidate) as (select abs(random() % (select NumRows from Rows)) + (select MinRowID - 1 from Rows) as Candidate
   ...> from generate_series
   ...> where start=1
   ...> and stop=(select 2*NumRows from Rows))
...> select distinct tab.*
...> from Potential
...> cross join tab
...> on tab.RowID == Potential.Candidate
...> limit 200;
4021141|7234943b-9831-48fe-9558-da59367fba61
-3898799|428d5d0a-f48e-4e01-bb69-1ad427fa64c8
-552665|cc3c6570-4bfc-4650-a32e-1cd0bd9b4450
4629777|1edcba39-3253-488a-953b-bbdd4a870e21
4901776|72ba152f-6de2-46f1-8702-2a0f688096d4
...
-3575561|30cf7391-55ae-4b37-be3a-bc318fe5d94e
-1672703|251ea02e-4a8f-4ad6-8466-29ebcc42e11e
-3032716|f547c0f1-3f74-4f8b-a79c-434219e4c2ab
3654895|b861d1b9-34f7-42d8-91de-b028d49d70c9
-1880955|26d771c0-8ea8-4b54-9aa1-6171e5b00249
4578402|33966112-1fdf-472d-aa78-697be0775848
Run Time: real 1.767 user 1.765625 sys 0.000000
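For anyone whose SQLite build lacks the generate_series extension, the same idea -- oversample random rowid candidates in the min..max range, then do point lookups on the primary key -- can be sketched with Python's bundled sqlite3 module. This is not the author's exact query (the candidate list is built in Python, and an IN list replaces the cross join), and the table contents here are illustrative:

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table tab (RowID integer primary key, uuid text not null)")
# Small demo table; rowids may be negative, as in the example above.
conn.executemany("insert into tab values (?, ?)",
                 [(i, f"row-{i}") for i in range(-500, 500)])

def random_rows(conn, n):
    lo, hi = conn.execute("select min(RowID), max(RowID) from tab").fetchone()
    # Oversample 2x, as the query above does, to compensate for duplicate
    # candidates (and, on sparse tables, for rowids that do not exist).
    candidates = [random.randint(lo, hi) for _ in range(2 * n)]
    placeholders = ",".join("?" * len(candidates))
    # Each IN-list probe is a point lookup on the integer primary key.
    return conn.execute(
        f"select distinct * from tab where RowID in ({placeholders}) limit ?",
        candidates + [n]).fetchall()

rows = random_rows(conn, 200)
```

On a dense rowid sequence like this one the 2x oversampling almost always yields the full 200 rows; a sparser table would need a larger factor.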
---
The fact that there's a Highway to Hell but only a Stairway to Heaven
says a lot about anticipated traffic volume.
>>-----Original Message-----
>>From: sqlite-users [mailto:sqlite-users-
>>[email protected]] On Behalf Of Hick Gunter
>>Sent: Friday, 1 June, 2018 12:09
>>To: 'SQLite mailing list'
>>Subject: Re: [sqlite] [EXTERNAL] Re: random rows
>>
>>Just an idea:
>>
>>If the table has an INTEGER PRIMARY KEY AUTOINCREMENT then the
>>maximum rowid is available in the sqlite_sequence table.
>>To generate a random sequence of rowids, this can be used, somewhat
>>like
>>
>>SELECT DISTINCT rnd() * (SELECT seq FROM sqlite_sequence WHERE
>>name='<table_name>') as rowid;
>>
>>Maybe as a view or a CTE.
>>
>>Then use this as the LHS of a join
>>
>>SELECT t.* from (...) random cross join <table_name> t on
>>random.rowid = t.rowid LIMIT x;
>>
>>Creating random rowids should be blindingly fast, even with storing
>>the already created rowids in a temp btree. Reading the table via
>>rowid is as fast as possible. Any "misses" in the rowid generation
>>(i.e. rowids not present) are compensated by the number of rowids
>>generated. Speed increases with the density of rowids (i.e.
>><current_row_count>/<max_inserted_rowid>). Unless of course SQLite
>>chooses to create all the rowids before providing the first one.
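Hick's sketch can be made concrete. Note that SQLite's random() returns a signed 64-bit integer rather than a 0..1 float, so abs(random() % seq) + 1 stands in for rnd() * seq below; the table name t and the oversampling factor are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key autoincrement, val text)")
conn.executemany("insert into t (val) values (?)",
                 [(chr(97 + i % 26),) for i in range(1000)])

# With AUTOINCREMENT, the highest rowid ever assigned is recorded here.
(max_id,) = conn.execute(
    "select seq from sqlite_sequence where name = 't'").fetchone()

# Draw candidate rowids in SQL and join back on the primary key;
# oversampling (100 candidates for 20 rows) compensates for duplicate
# candidates and for rowids deleted from the table.
rows = conn.execute("""
    with cand(id) as (
        select abs(random() % (select seq from sqlite_sequence
                               where name = 't')) + 1
          from t limit 100)
    select distinct t.* from cand cross join t on t.id = cand.id
    limit 20
""").fetchall()
```

As Hick notes, each candidate lookup is a rowid point access, so the cost is dominated by how many candidates must be generated to cover the misses.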
>>
>>-----Original Message-----
>>From: sqlite-users [mailto:sqlite-users-
>>[email protected]] On Behalf Of Stephen Chrzanowski
>>Sent: Friday, 01 June 2018 18:47
>>To: SQLite mailing list <[email protected]>
>>Subject: [EXTERNAL] Re: [sqlite] random rows
>>
>>Here's my two cents. Don't spend it all in one place...
>>
>>CREATE TABLE [RandomTable](
>> [PriID] INTEGER PRIMARY KEY AUTOINCREMENT,
>> [DataCol] CHAR);
>>
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(1, 'a');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(2, 'b');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(3, 'c');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(4, 'd');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(5, 'e');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(6, 'f');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(7, 'g');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(8, 'h');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(9, 'i');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(10, 'j');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(11, 'k');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(12, 'l');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(13, 'm');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(14, 'n');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(15, 'o');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(16, 'p');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(17, 'q');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(18, 'r');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(19, 's');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(20, 't');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(21, 'u');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(22, 'v');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(23, 'w');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(24, 'x');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(25, 'y');
>>INSERT INTO [RandomTable]([PriID], [DataCol]) VALUES(26, 'z');
>>
>>select random() as R,* from RandomTable order by R limit 10;
>>
>>Using 3.23.1
>>
>>
>>On Fri, Jun 1, 2018 at 10:22 AM, Don V Nielsen
>><[email protected]>
>>wrote:
>>
>>> ??
>>>
>>> SELECT * FROM table WHERE id IN (SELECT id FROM table ORDER BY
>>> RANDOM() LIMIT x)
>>>
>>> Maybe. It is more memory efficient than trying to sort the entire
>>> lot of data.
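The memory saving in Don's form comes from the inner ORDER BY random() sorting only the id column, so the temp b-tree holds narrow rows; the outer query then fetches the full rows by primary-key lookup. A minimal sketch, with illustrative table name and sizes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer primary key, payload text)")
# Wide payload so the difference between sorting ids and sorting
# whole rows is visible in principle.
conn.executemany("insert into t values (?, ?)",
                 [(i, "x" * 100) for i in range(10000)])

# Inner query: shuffle just the ids and keep 5 of them.
# Outer query: fetch those 5 full rows by primary-key lookup.
rows = conn.execute("""
    select * from t
    where id in (select id from t order by random() limit 5)
""").fetchall()
```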
>>>
>>>
>>> On Thu, May 31, 2018 at 7:13 PM Torsten Curdt <[email protected]>
>>wrote:
>>>
>>> > I need to get some random rows from a large(ish) table.
>>> >
>>> > The following seems to be the most straightforward - but not the
>>> > fastest.
>>> >
>>> > SELECT * FROM table ORDER BY random() limit 200
>>> >
>>> > Is there a faster/better approach?
>>> >
>>> > cheers,
>>> > Torsten
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users