Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Richard Damon
On 11/7/19 5:13 PM, Doug Currie wrote:
> On Thu, Nov 7, 2019 at 4:23 PM Richard Damon 
> wrote:
>
>> One thought would be to generate a ‘hash’ from part of the record, maybe
>> the record ID, and select records based on that value. The simplest would
>> be something like id%100 == 0 would get you 1% of the records. That
>> admittedly isn’t that random.
>>
>> Put the ID through a linear congruential generator, something like
>>
>> mod(a * Id + b, c) % 100 == 0
>>
>> And you will pretty well scramble the selection
>>
>>
> Yes, and if a, b, and c come from a randomization table, they can be
> modified to obtain a different pseudo-random set.
>
> e
a, b, and c should be chosen to give a reasonable random number
generator (there are tables of good values), not be arbitrary values.
The 100 and the 0 can be changed (or use some other test on the random
number) to get different selections.

-- 
Richard Damon

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] database is empty but intact

2019-11-07 Thread Simon Slavin
On 8 Nov 2019, at 12:40am, Roman Fleysher  
wrote:

> I am using linux, I think this analyzer is windows only.



"Precompiled Binaries for Linux"

> Maybe I should ask a different question: Would knowing answer to my question 
> (is it possible for DB to become empty?) help me single out FLoM as the cause 
> of the problem?

Could you try it for a while with locking enabled ?  See whether the problem 
reoccurs ?  I'm sure it'll be slower but apart from the above analyzer it's the 
ony diagnostic which occurs to me.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] database is empty but intact

2019-11-07 Thread Roman Fleysher
There are multiple programs working in parallel. 750 of them from several 
compute nodes.

I use command line sqlite3, so,  sqlite itself it not a suspect.

FLoM is like a file lock:

flom argumentsToFLoMIncludingTimeout  --  sqlite3 "sqlStatement;"

I am using linux, I think this analyzer is windows only.

There several things that can fail: FLoM mismanaged locks is my primary 
suspect. But I would think database would be corrupt, not empty.

Maybe I should ask a different question: Would knowing answer to my question 
(is it possible for DB to become empty?) help me single out FLoM as the cause 
of the problem?

Roman


From: sqlite-users [sqlite-users-boun...@mailinglists.sqlite.org] on behalf of 
Simon Slavin [slav...@bigfraud.org]
Sent: Thursday, November 07, 2019 7:28 PM
To: SQLite mailing list
Subject: Re: [sqlite] database is empty but intact

On 8 Nov 2019, at 12:03am, Roman Fleysher  
wrote:

> I am using command line sqlite3 with -vfs unix-none. This disables locking 
> within SQLite. Instead, locking is provided externally by FLoM (distributed 
> file lock manager). I asked questions in a thread "disable file locking 
> mechanism over the network".

Is SQLite telling FLoM when it should place or remove a lock ?  If not, how 
does FLoM know ?

Do you have more than one process which might write to the database ?

Do all your processes which write to the database close their connections 
correctly ?  Do they check the result codes from their calls to see if SQlite 
successfully closed the database ?

> It is possible that FLoM has bugs and mismanaged locks. As a result, the 
> database is now empty. PRAGMA integrity check shows the database is intact. I 
> would expect it to be corrupt. The only DELETE operation in the queries was 
> to delete a single row:
>
> DELETE FROM jobs WHERE rowID = XXX;
>
> Could this DELETE actually delete entire content of a table if lock is 
> mismanaged?

It could, though it is unsual.  Have you run the analysis program on the file 
to find out more about it ?



It could allow you to distinguish between a table with no rows in, and a table 
with lots of deleted rows in.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
https://nam02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fmailinglists.sqlite.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fsqlite-usersdata=02%7C01%7Croman.fleysher%40einstein.yu.edu%7C885cd07bd2e8437eb7df08d763e29b88%7C04c70eb48f2648079934e02e89266ad0%7C1%7C0%7C637087697235291714sdata=kCZumJpJuTsUpSw4wl2XncG7aATqbPFqDpaXXFc9ecs%3Dreserved=0
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] database is empty but intact

2019-11-07 Thread Simon Slavin
On 8 Nov 2019, at 12:03am, Roman Fleysher  
wrote:

> I am using command line sqlite3 with -vfs unix-none. This disables locking 
> within SQLite. Instead, locking is provided externally by FLoM (distributed 
> file lock manager). I asked questions in a thread "disable file locking 
> mechanism over the network".

Is SQLite telling FLoM when it should place or remove a lock ?  If not, how 
does FLoM know ?

Do you have more than one process which might write to the database ?

Do all your processes which write to the database close their connections 
correctly ?  Do they check the result codes from their calls to see if SQlite 
successfully closed the database ?

> It is possible that FLoM has bugs and mismanaged locks. As a result, the 
> database is now empty. PRAGMA integrity check shows the database is intact. I 
> would expect it to be corrupt. The only DELETE operation in the queries was 
> to delete a single row:
> 
> DELETE FROM jobs WHERE rowID = XXX;
> 
> Could this DELETE actually delete entire content of a table if lock is 
> mismanaged?

It could, though it is unsual.  Have you run the analysis program on the file 
to find out more about it ?



It could allow you to distinguish between a table with no rows in, and a table 
with lots of deleted rows in.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] database is empty but intact

2019-11-07 Thread Roman Fleysher
Dear SQLiters,

I am using command line sqlite3 with -vfs unix-none. This disables locking 
within SQLite. Instead, locking is provided externally by FLoM (distributed 
file lock manager). I asked questions in a thread "disable file locking 
mechanism over the network".

It is possible that FLoM has bugs and mismanaged locks. As a result, the 
database is now empty. PRAGMA integrity check shows the database is intact. I 
would expect it to be corrupt. The only DELETE operation in the queries was to 
delete a single row:

DELETE FROM jobs WHERE rowID = XXX;

Could this DELETE actually delete entire content of a table if lock is 
mismanaged? I am trying to figure out of this is a disk related issue. Or some 
other issue.



Roman
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: (VALUES (1), (2), (3)) AS t(n)

2019-11-07 Thread Keith Medcalf
How about something like:

with t(a, b) as (values (1, 1), (2, 2)) select a, b from t;

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.

>-Original Message-
>From: sqlite-users  On
>Behalf Of Ainar Garipov
>Sent: Thursday, 7 November, 2019 13:44
>To: sqlite-users@mailinglists.sqlite.org
>Subject: [sqlite] Feature request: (VALUES (1), (2), (3)) AS t(n)
>
>Good day!  It's my first time writing to a mail list for a very long
>time, so I may not get this right the first time.
>
>I would like to request the ability to write:
>
>  SELECT *
>FROM (VALUES (1, 1), (2, 2)) AS t(a, b)
>  ;
>
>Instead of:
>
>  SELECT t.column1 AS a, t.column2 AS b
>FROM (VALUES (1, 1), (2, 2)) AS t
>  ;
>
>That is, support column names in AS aliases.  Currently I can do that in
>PostgreSQL but not in SQLite.  The latter gives me this error:
>
>  Error: near "(": syntax error
>___
>sqlite-users mailing list
>sqlite-users@mailinglists.sqlite.org
>http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Doug Currie
On Thu, Nov 7, 2019 at 4:23 PM Richard Damon 
wrote:

>
> One thought would be to generate a ‘hash’ from part of the record, maybe
> the record ID, and select records based on that value. The simplest would
> be something like id%100 == 0 would get you 1% of the records. That
> admittedly isn’t that random.
>
> Put the ID through a linear congruential generator, something like
>
> mod(a * Id + b, c) % 100 == 0
>
> And you will pretty well scramble the selection
>
>
Yes, and if a, b, and c come from a randomization table, they can be
modified to obtain a different pseudo-random set.

e
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Richard Damon
> On Nov 7, 2019, at 2:15 PM, Merijn Verstraaten  wrote:
> 
> 
>> On 7 Nov 2019, at 19:16, David Raymond  wrote:
>> 
>> Along those lines SQLite includes the reverse_unordered_selects pragma
>> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
>> which will flip the order it sends rows in queries that don't explicitly 
>> specify an ordering. It's there to assist you in finding spots in your code 
>> where you might be relying on implicit ordering when you really shouldn't be.
> 
> Like the rest of this threads, this is just pointing out why the things in my 
> initial email don't work, but I already knew that. Which is why I asked for 
> help to see if there is a way to do what I want that *does* work. I don't 
> care particularly about the details of "can I control the order the condition 
> is evaluated", it's just that all reasonable ways to sample large streams 
> that I know would require a deterministic order.
> 
> If someone has a different/better idea on how to return just a random sample 
> from a query in a repeatable way, I'm all ears.
> 
> So far the only suggestion was "use some non-deterministic random sampling 
> method and store the result", but since my samples are large and I have lots 
> of them, this would balloon my storage by >100x and I don't have the 
> available storage to make that work.
> 
> - Merijn
> 

One thought would be to generate a ‘hash’ from part of the record, maybe the 
record ID, and select records based on that value. The simplest would be 
something like id%100 == 0 would get you 1% of the records. That admittedly 
isn’t that random.

Put the ID through a linear congruential generator, something like

mod(a * Id + b, c) % 100 == 0

And you will pretty well scramble the selection
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Feature request: (VALUES (1), (2), (3)) AS t(n)

2019-11-07 Thread Ainar Garipov
Good day!  It's my first time writing to a mail list for a very long
time, so I may not get this right the first time.

I would like to request the ability to write:

  SELECT *
FROM (VALUES (1, 1), (2, 2)) AS t(a, b)
  ;

Instead of:

  SELECT t.column1 AS a, t.column2 AS b
FROM (VALUES (1, 1), (2, 2)) AS t
  ;

That is, support column names in AS aliases.  Currently I can do that in
PostgreSQL but not in SQLite.  The latter gives me this error:

  Error: near "(": syntax error
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread David Raymond
We went off on a tangent, apologies.

If you have contiguous integer primary keys, you could randomly sample that 
range of integers, then pull the records with those keys.

Or in your external language of choice, sample the integers from 1 to the 
record count deterministically, select ordered by the primary key, and take the 
ones with the sampled offsets. (Stepping through 1 query and NOT doing a bunch 
of ...order by pk limit 1 offset n... queries)

Making something quick in Python I might do something like:


import random
import sqlite3

conn = sqlite3.connect(dbFile, isolation_level = None)
cur = conn.cursor()
cur.execute("select count(*) from foo;")
numRecords = cur.fetchone()[0]
sampleSize = 10
random.seed(5) #Your deterministic seed here
SampleOffsets = random.sample(range(1, numRecords + 1), sampleSize)
SampleOffsets.sort()
cur.execute("select * from foo order by primary_key;")
currentOffset = 0
for selectedOffset in SampleOffsets:
for _ in range(selectedOffset - currentOffset - 1):
cur.fetchone()
nextSampleRecord = cur.fetchone()
currentOffset = selectedOffset
doSomethingWithSample(nextSampleRecord)



-Original Message-
From: sqlite-users  On Behalf Of 
Merijn Verstraaten
Sent: Thursday, November 7, 2019 2:16 PM
To: SQLite mailing list 
Subject: Re: [sqlite] Deterministic random sampling via SELECT


> On 7 Nov 2019, at 19:16, David Raymond  wrote:
> 
> Along those lines SQLite includes the reverse_unordered_selects pragma
> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
> which will flip the order it sends rows in queries that don't explicitly 
> specify an ordering. It's there to assist you in finding spots in your code 
> where you might be relying on implicit ordering when you really shouldn't be.

Like the rest of this threads, this is just pointing out why the things in my 
initial email don't work, but I already knew that. Which is why I asked for 
help to see if there is a way to do what I want that *does* work. I don't care 
particularly about the details of "can I control the order the condition is 
evaluated", it's just that all reasonable ways to sample large streams that I 
know would require a deterministic order.

If someone has a different/better idea on how to return just a random sample 
from a query in a repeatable way, I'm all ears.

So far the only suggestion was "use some non-deterministic random sampling 
method and store the result", but since my samples are large and I have lots of 
them, this would balloon my storage by >100x and I don't have the available 
storage to make that work.

- Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Chris Peachment
In the very old days before computers were common, a random number
table appeared at the back of many statistical texts. This was used
to select a series of random numbers which would then be used as
look-up indices into some other data set.

You could do the same:

  1. generate a list of pseudo-random numbers, using a pre-defined
 seed value, over the range 1 .. count(*) of records in table,

  2. use that list as record id values to select the desired subset
 of the data in the table.

This would be done in two separate operations, possibly with a
storage of the generated numbers in a separate table which could
be used in the query of the main data.

Since it is a pseudo-random number series, you can repeat it as
often as needed using the same seed value.

Chris


On Thu, 7 Nov 2019, at 15:15, Merijn Verstraaten wrote:
> 
> > On 7 Nov 2019, at 19:16, David Raymond  wrote:
> > 
> > Along those lines SQLite includes the reverse_unordered_selects pragma
> > https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
> > which will flip the order it sends rows in queries that don't explicitly 
> > specify an ordering. It's there to assist you in finding spots in your code 
> > where you might be relying on implicit ordering when you really shouldn't 
> > be.
> 
> Like the rest of this threads, this is just pointing out why the things 
> in my initial email don't work, but I already knew that. Which is why I 
> asked for help to see if there is a way to do what I want that *does* 
> work. I don't care particularly about the details of "can I control the 
> order the condition is evaluated", it's just that all reasonable ways 
> to sample large streams that I know would require a deterministic order.
> 
> If someone has a different/better idea on how to return just a random 
> sample from a query in a repeatable way, I'm all ears.
> 
> So far the only suggestion was "use some non-deterministic random 
> sampling method and store the result", but since my samples are large 
> and I have lots of them, this would balloon my storage by >100x and I 
> don't have the available storage to make that work.
> 
> - Merijn
> 
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
> 
> Attachments:
> * signature.asc
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Donald Griggs
>
> Regarding: "So far the only suggestion was "use some non-deterministic
> random sampling method and store the result", but since my samples are
> large and I have lots of them, this would balloon my storage by >100x and I
> don't have the available storage to make that work."


But Keith Medcalf was suggesting your table KeySet store only the *keys*,
not the data itself, right?
Are you saying storing the ROWIDs would still be prohibitively expensive?

>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Merijn Verstraaten

> On 7 Nov 2019, at 19:16, David Raymond  wrote:
> 
> Along those lines SQLite includes the reverse_unordered_selects pragma
> https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
> which will flip the order it sends rows in queries that don't explicitly 
> specify an ordering. It's there to assist you in finding spots in your code 
> where you might be relying on implicit ordering when you really shouldn't be.

Like the rest of this threads, this is just pointing out why the things in my 
initial email don't work, but I already knew that. Which is why I asked for 
help to see if there is a way to do what I want that *does* work. I don't care 
particularly about the details of "can I control the order the condition is 
evaluated", it's just that all reasonable ways to sample large streams that I 
know would require a deterministic order.

If someone has a different/better idea on how to return just a random sample 
from a query in a repeatable way, I'm all ears.

So far the only suggestion was "use some non-deterministic random sampling 
method and store the result", but since my samples are large and I have lots of 
them, this would balloon my storage by >100x and I don't have the available 
storage to make that work.

- Merijn


signature.asc
Description: Message signed with OpenPGP
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite with branching

2019-11-07 Thread Jens Alfke


> On Nov 7, 2019, at 9:02 AM, Bernardo Ramos  wrote:
> 
> If you are interested in just the performance without the branching feature, 
> there are at least 3 options:
> 
> 1. SQLigthning: I was thinking in updating it to the last version of SQLite

That would be awesome! I have looked at it a few times, but it's based on such 
an old version that it's useless to me in its current state.

> 2. Modified version of LiteTree, without branches

I'm curious how performance of LiteTree (w/o branching) compares to 
SQLightning, i.e. whether storing rows or pages is more efficient. Have you 
measured?

> 3. SQLite with mmap

IIRC, this is not nearly as fast as either of the LMDB-based approaches. I 
don't know why; presumably SQLite doesn't make as efficient use of 
memory-mapping.

—Jens
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] How to tell sqlite to only use tcl from the system?

2019-11-07 Thread Dennis Clarke



I am seeing confusion here on a few systems wherein this :

#include "sqliteInt.h"
#if defined(INCLUDE_SQLITE_TCL_H)
#  include "sqlite_tcl.h"
#else
#  include "tcl.h"
#endif

is in a pile of the test code.

For some reasons I am seeing undefined symbols ( sched_yield ) and
problems after getting past a build of 326 :

.
.
.

/usr/local/build/sqlite-src-326_Oracle_sparc64vii+.002/src/tclsqlite.c:
sqlite3.c:
"sqlite3.c", line 21319: warning: implicit function declaration: 
localtime_r (E_NO_IMPLICIT_DECL_ALLOWED)
"sqlite3.c", line 53307: warning: statement not reached 
(E_STATEMENT_NOT_REACHED)

ld: warning: option -Q appears more than once, first setting taken
Undefined   first referenced
 symbol in file
sched_yield test4.o
ld: fatal: symbol referencing errors. No output written to testfixture
gmake: *** [Makefile:1199: testfixture] Error 2

Current version does not build at all.

Still locked in battle here on a few systems trying to get current
release to build anywhere and thus far Solaris and RHEL both fail with
references to tcl stuff in sqlite.


--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread David Raymond
Along those lines SQLite includes the reverse_unordered_selects pragma
https://www.sqlite.org/pragma.html#pragma_reverse_unordered_selects
which will flip the order it sends rows in queries that don't explicitly 
specify an ordering. It's there to assist you in finding spots in your code 
where you might be relying on implicit ordering when you really shouldn't be.

Also available as a compile time option: SQLITE_REVERSE_UNORDERED_SELECTS


-Original Message-
From: sqlite-users  On Behalf Of 
Simon Slavin
Sent: Thursday, November 7, 2019 12:16 PM
To: SQLite mailing list 
Subject: Re: [sqlite] Deterministic random sampling via SELECT

On 7 Nov 2019, at 1:56pm, David Raymond  wrote:

> Others will correct me if I'm wrong on that.

No correction, but I wanted to add something.

According to the theory of how SQL (not just SQLite, SQL) works, tables have no 
order.  You can, in theory, query a table of 100 rows with

SELECT a,b FROM MyTable LIMIT 5

ten times and get ten different answers, including different rows and/or the 
same rows in different orders.  Given some of the text in the post that started 
this thread, I just wanted to make sure this was understood.

In practise I have never seen a SQL engine which does this.  Each SQL 
implementation seems to return the same result every time you repeat the same 
query.  Though different SQL implementations can return different results.

You'd have thought that at least one server/client system would return five 
rows which happened to be in the cache, but I've never seen this.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] database disk image is malformed

2019-11-07 Thread Shawn Wagner
This line stood out:

> The main process opens the databases and then forks the other processes
which can then perform database operations using the already opened
databases.

From
https://sqlite.org/howtocorrupt.html#_carrying_an_open_database_connection_across_a_fork_
:

> Do not open an SQLite database connection, then fork(), then try to use
that database connection in the child process. All kinds of locking
problems will result and you can easily end up with a corrupt database.
SQLite is not designed to support that kind of behavior. Any database
connection that is used in a child process must be opened in the child
process, not inherited from the parent.

In this kind of situation, I usually use pthread_atfork() callbacks to
automate closing databases and then re-opening them in the parent and child.


On Thu, Nov 7, 2019 at 9:18 AM Jukka Marin  wrote:

> Dear List,
>
> I'm developing software which keeps parameters and real-time data in
> SQLite databases on a x86_64/linux system.  I am getting "database
> disk image is malformed" errors from SQLite when using select.  Some
> select operations succeed, some fail.  This happens on multiple systems.
>
> I would like to know if I'm doing something that is known not to work
> with SQLite.  Background information:
>
> The software runs on a xen virtual machine.  The database files are
> on a separate disk (actually, a separate partition of the host linux)
> which is directly mounted on the virtual machine (not using NFS)
> and uses ext4 filesystem.  The whole system runs off a single SSD disk.
>
> SQLite version is 3.28.0 and it is currently built and statically
> linked against the application binary.
>
> The databases are originally created by a script and the sqlite3
> command.  "pragma journal_mode=wal;" is set for all databases.
>
> The software consists of multiple processes.  The main process opens
> the databases and then forks the other processes which can then perform
> database operations using the already opened databases.  No locking is
> done in the processes, they rely on the SQLite internal mechanisms.
> sqlite3_busy_timeout is set to 2500 for all databases.
>
> The processes mostly read (select) data from the databases, but also
> occasionally add new rows or update existing rows.
>
>
> Is this system with multiple processes accessing the databases
> safe with SQLite?  If it is, does anyone have ideas of what could
> be corrupting the database files?
>
> These systems are running 24/7, but of course, it _may_ be possible
> that someone has disconnected the power supply without shutting down
> the system first and causing errors in the database files.
>
> Is there something I could do to (try to) prevent database corruption
> if power is lost?  (In software - a UPS is a hardware solution, of
> course).
>
> If the database is corrupted, is there any way of recovering it?
> Maybe I should use sqlite3 to dump and recreate the database (of
> course, some or all of the data might get lost)?
>
> Experienced SQLite users, please share your ideas - thank you! :-)
>
>   Jukka Marin
>   (a beginner with SQLite)
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Shawn Wagner
Another option (Which the devs might consider) is to replace use of the asm
keyword with __asm__, which is a form of the extension understood even in
strict mode.

(I have no idea why gcc doesn't offer intrinsics for bit rotation, as it's
a common task and shouldn't require hacks like this to reliably do
efficiently).

On Thu, Nov 7, 2019 at 8:23 AM Dennis Clarke  wrote:

> On 2019-11-07 11:15, Shawn Wagner wrote:
> > Does that toolchain use gcc, or a different compiler? If gcc, are you
> using
> > the same CFLAGS as on the redhat box (you're turning on a bunch of extra
> > non-default options there)?
> >
>
> I don't see how --with-threads can be a concern. The other options do
> not qualify as a "bunch" at all.
>
> Regardless it looks like the codebase does strange GNU extensions.
>
> There must be a way to switch all that off.
>
> Dennis
>
> ps : don't top post.
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] database disk image is malformed

2019-11-07 Thread Jukka Marin
Dear List,

I'm developing software which keeps parameters and real-time data in
SQLite databases on a x86_64/linux system.  I am getting "database
disk image is malformed" errors from SQLite when using select.  Some
select operations succeed, some fail.  This happens on multiple systems.

I would like to know if I'm doing something that is known not to work
with SQLite.  Background information:

The software runs on a xen virtual machine.  The database files are
on a separate disk (actually, a separate partition of the host linux)
which is directly mounted on the virtual machine (not using NFS)
and uses ext4 filesystem.  The whole system runs off a single SSD disk.

SQLite version is 3.28.0 and it is currently built and statically
linked against the application binary.

The databases are originally created by a script and the sqlite3
command.  "pragma journal_mode=wal;" is set for all databases.

The software consists of multiple processes.  The main process opens
the databases and then forks the other processes which can then perform
database operations using the already opened databases.  No locking is
done in the processes, they rely on the SQLite internal mechanisms.
sqlite3_busy_timeout is set to 2500 for all databases.

The processes mostly read (select) data from the databases, but also
occasionally add new rows or update existing rows.


Is this system with multiple processes accessing the databases
safe with SQLite?  If it is, does anyone have ideas of what could
be corrupting the database files?

These systems are running 24/7, but of course, it _may_ be possible
that someone has disconnected the power supply without shutting down
the system first and causing errors in the database files.

Is there something I could do to (try to) prevent database corruption
if power is lost?  (In software - a UPS is a hardware solution, of
course).

If the database is corrupted, is there any way of recovering it?
Maybe I should use sqlite3 to dump and recreate the database (of
course, some or all of the data might get lost)?

Experienced SQLite users, please share your ideas - thank you! :-)

  Jukka Marin
  (a beginner with SQLite)
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Dennis Clarke

On 2019-11-07 11:44, Shawn Wagner wrote:

... Just don't use strict c99 mode when compiling with gcc? Drop the -std
argument from your CFLAGS to use the default (gnu11 since gcc 5) or
explicitly use gnu99, which gives you that version of the C standard + gcc
extensions.

(Not that they have anything to do with the problem, but compiling with -O0
and -fno-builtin are strange unless you're planning on spending some
quality time in a debugger stepping through code, and -malign-double is
already the default on x86-64 so kind of pointless)



Debugger .. yes. That will happen and I build on a multitude of
platforms.

OKay so the code fails on Solaris sparc with c99 whereas in the recent
past it all builds fine :

libtool: compile:  /opt/developerstudio12.6/bin/c99 -I/usr/local/include 
-D_TS_ERRNO -D_POSIX_PTHREAD_SEMANTICS -D_LARGEFILE64_SOURCE -Xc -m64 
-xarch=sparc -g -errfmt=error -errshort=full -xstrconst -xildoff 
-xmemalign=8s -xnolibmil -xcode=pic32 -xregs=no%appl -xlibmieee -mc 
-ftrap=%none -xbuiltin=%none -xunroll=1 -xs -xdebugformat=dwarf 
-errtags=yes -errwarn=%none -erroff=%none -DSQLITE_OS_UNIX=1 -I. 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/rtree 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/icu 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/fts3 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/async 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/session 
-I/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/ext/userauth 
-D_HAVE_SQLITE_CONFIG_H -DBUILD_sqlite -DNDEBUG -I/usr/local/include 
-DSQLITE_THREADSAFE=1 -DSQLITE_HAVE_ZLIB=1 -DUSE_TCL_STUBS=1 -c 
/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c 
 -KPIC -DPIC -o .libs/tclsqlite.o
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2346: error: undefined symbol: SQLITE_DBCONFIG_ENABLE_VIEW
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2346: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2351: error: undefined symbol: SQLITE_DBCONFIG_TRIGGER_EQP
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2351: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2352: error: undefined symbol: SQLITE_DBCONFIG_RESET_DATABASE
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2352: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2353: error: undefined symbol: SQLITE_DBCONFIG_DEFENSIVE
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2353: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2354: error: undefined symbol: SQLITE_DBCONFIG_WRITABLE_SCHEMA
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2354: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2355: error: undefined symbol: SQLITE_DBCONFIG_LEGACY_ALTER_TABLE
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2355: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2356: error: undefined symbol: SQLITE_DBCONFIG_DQS_DML
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2356: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2357: error: undefined symbol: SQLITE_DBCONFIG_DQS_DDL
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2357: error: non-constant initializer: op "NAME"
"/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c", 
line 2855: error: undefined symbol: SQLITE_DIRECTONLY
c99: acomp failed for 
/usr/local/build/sqlite-src-3300100_Oracle_sparc64vii+.001/src/tclsqlite.c

gmake: *** [Makefile:1029: tclsqlite.lo] Error 1



On Red Hat Enterprise Linux 7.4 the code actually does compile and then
core dumps with a segfault from with that same source file :

Time: walshared.test 24 ms
# WARNING: This next test takes around 12 seconds
gmake: *** [Makefile:1256: tcltest] Segmentation fault (core dumped)
boe13$ pwd
/opt/bw/build/sqlite-src-3300100_rhel_74_3.10.0-693.el7.x86_64.001
boe13$ find . | grep -i 'core'
./testdir/core.43494
boe13$
boe13$ file ./testdir/core.43494
./testdir/core.43494: ELF 64-bit LSB core file x86-64, version 1 (SYSV), 
SVR4-style, from './testfixture 
/opt/bw/build/sqlite-src-3300100_rhel_74_3.10.0-693.el7.x86_64.00', 

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Simon Slavin
On 7 Nov 2019, at 1:56pm, David Raymond  wrote:

> Others will correct me if I'm wrong on that.

No correction, but I wanted to add something.

According to the theory of how SQL (not just SQLite, SQL) works, tables have no 
order.  You can, in theory, query a table of 100 rows with

SELECT a,b FROM MyTable LIMIT 5

ten times and get ten different answers, including different rows and/or the 
same rows in different orders.  Given some of the text in the post that started 
this thread, I just wanted to make sure this was understood.

In practise I have never seen a SQL engine which does this.  Each SQL 
implementation seems to return the same result every time you repeat the same 
query.  Though different SQL implementations can return different results.

You'd have thought that at least one server/client system would return five 
rows which happened to be in the cache, but I've never seen this.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite with branching

2019-11-07 Thread Bernardo Ramos

Hi!

I am the creator of LiteTree (also LiteReplica, LiteSync and 3 new 
products that will be released soon).


When I was planning to add branching I discovered many ways to implement 
it. I selected the one that satisfied performance over disk usage. It 
can also be implemented the other way around, with low disk space and 
slower. Having both high performance and low disk usage is really hard.


The performance comes from using LMDB and from fine-tuning it to reach 
this goal (in a safe way).


But it comes with a disadvantage: it uses a lot of disk space when 
compared with a normal SQLite db file.


The reason: all past states of the database must be stored if we want to 
be able to create new branches from any place, as well as to navigate 
the database at any previous point in time.


It does not store the entire db state at each point-in-time, just the 
modified pages compared to the previous point.


It also requires a considerable amount of virtual memory space, as (I 
guess) in any memory mapped solution.


If you are interested in just the performance without the branching 
feature, there are at least 3 options:


1. SQLigthning: I was thinking in updating it to the last version of 
SQLite

2. Modified version of LiteTree, without branches
3. SQLite with mmap

I confess that I have not tried SQLite with mmap yet. So maybe it is as 
fast as LiteTree, or even faster. IDK


Do not forget that all these 3 options use memory mapping. Consider this 
on IoT devices and 32-bit processors.


Options 1 and 2 were in my list, but now I have more important products 
being implemented. And option 3 may solve the requirement anyway.



Now let me uncover some differences here:

SQLigthning: Stores SQLite db's rows on LMDB

LiteTree: Stores SQLite db's entire pages on LMDB


Or, showing by SQLite interface level:

SQLigthning: B-tree level

LiteTree: Pager/WAL level


You may wonder how storing an entire db page on another db could be 
fast... one trick is to use the SQLite's reserved space feature on each 
page, matching the size of the header for overflow pages on LMDB. In 
this way a SQLite db page is stored using exactly 4096 bytes on LMDB! 
(not counting the required b-tree index)


The same trick could be applied for another WAL file format in a way 
that each db page would be stored exactly at disk sector boundaries, off 
course having the WAL header using an entire page. The reserved space on 
this case would be the same size of a WAL page header. This would not 
change the write speed but could make the read of random pages on WAL a 
little faster.


This also comes with a disadvantage of using a little more disk space 
than normal, and it is not compatible with existing dbs (a new file with 
reserved space on each page should be created). So it could only exist 
as an option (extension?) or separate product.


In some of my projects I modified the WAL module so the interface is 
pluggable like VFS.


But yeah, I do not know whether the results would compensate the effort 
on the main trunk.


In some cases it is better to implement a virtual table instead.

Anyway, all of these modifications and derived products are only 
possible due to the spectacular work of Richard, as well as Dan and 
Mistachkin.


Thank you so much!
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Shawn Wagner
... Just don't use strict c99 mode when compiling with gcc? Drop the -std
argument from your CFLAGS to use the default (gnu11 since gcc 5) or
explicitly use gnu99, which gives you that version of the C standard + gcc
extensions.

(Not that they have anything to do with the problem, but compiling with -O0
and -fno-builtin are strange unless you're planning on spending some
quality time in a debugger stepping through code, and -malign-double is
already the default on x86-64 so kind of pointless)

On Thu, Nov 7, 2019 at 8:23 AM Dennis Clarke  wrote:

> On 2019-11-07 11:15, Shawn Wagner wrote:
> > Does that toolchain use gcc, or a different compiler? If gcc, are you
> using
> > the same CFLAGS as on the redhat box (you're turning on a bunch of extra
> > non-default options there)?
> >
>
> I don't see how --with-threads can be a concern. The other options do
> not qualify as a "bunch" at all.
>
> Regardless it looks like the codebase does strange GNU extensions.
>
> There must be a way to switch all that off.
>
> Dennis
>
> ps : don't top post.
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Dennis Clarke

On 2019-11-07 11:15, Shawn Wagner wrote:

Does that toolchain use gcc, or a different compiler? If gcc, are you using
the same CFLAGS as on the redhat box (you're turning on a bunch of extra
non-default options there)?



I don't see how --with-threads can be a concern. The other options do
not qualify as a "bunch" at all.

Regardless it looks like the codebase does strange GNU extensions.

There must be a way to switch all that off.

Dennis

ps : don't top post.

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Shawn Wagner
For that matter, is the Solaris box x86 or sparc? If the latter, this
particular code path obviously wouldn't ever be used on it.

On Thu, Nov 7, 2019, 8:10 AM Dennis Clarke  wrote:

> On 2019-11-07 11:01, Shawn Wagner wrote:
> > Compiling with -std=iso9899:1999 is the culprit. Strict c99 mode disables
> > gcc extensions like inline asm.
>
> OKay but I have no issues doing a compile on a Solaris server using the
> Oracle Studio 12.6 tools and strict C99.  So how is C99 the issue?
>
> Dennis
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Shawn Wagner
Does that toolchain use gcc, or a different compiler? If gcc, are you using
the same CFLAGS as on the redhat box (you're turning on a bunch of extra
non-default options there)?

On Thu, Nov 7, 2019, 8:10 AM Dennis Clarke  wrote:

> On 2019-11-07 11:01, Shawn Wagner wrote:
> > Compiling with -std=iso9899:1999 is the culprit. Strict c99 mode disables
> > gcc extensions like inline asm.
>
> OKay but I have no issues doing a compile on a Solaris server using the
> Oracle Studio 12.6 tools and strict C99.  So how is C99 the issue?
>
> Dennis
>
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Dennis Clarke

On 2019-11-07 11:01, Shawn Wagner wrote:

Compiling with -std=iso9899:1999 is the culprit. Strict c99 mode disables
gcc extensions like inline asm.


OKay but I have no issues doing a compile on a Solaris server using the
Oracle Studio 12.6 tools and strict C99.  So how is C99 the issue?

Dennis

___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Shawn Wagner
Compiling with -std=iso9899:1999 is the culprit. Strict c99 mode disables
gcc extensions like inline asm.

On Thu, Nov 7, 2019, 7:57 AM Dennis Clarke  wrote:

>
> While building sqlite from the zip file sqlite-src-3300100.zip I saw
> what seems to be a normal configure and then madness :
>
>
> boe13$ pwd
> /opt/bw/build/sqlite-src-3300100_rhel_74_3.10.0-693.el7.x86_64.001
> boe13$ ./configure --prefix=/opt/bw --enable-shared --enable-static
> --enable-readline --enable-threadsafe 2>&1 | tee
> ../sqlite-src-3300100_rhel_74_3.10.0-693.el7.x86_64.001.config.log
> checking build system type... x86_64-pc-linux-gnu
> checking host system type... x86_64-pc-linux-gnu
> checking for gcc... /opt/bw/gcc9/bin/gcc
> checking whether the C compiler works... yes
> checking for C compiler default output file name... a.out
> checking for suffix of executables...
> checking whether we are cross compiling... no
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether /opt/bw/gcc9/bin/gcc accepts -g... yes
> checking for /opt/bw/gcc9/bin/gcc option to accept ISO C89... none needed
> checking for a sed that does not truncate output... /usr/bin/sed
> checking for grep that handles long lines and -e... /usr/bin/grep
> checking for egrep... /usr/bin/grep -E
> checking for fgrep... /usr/bin/grep -F
> checking for ld used by /opt/bw/gcc9/bin/gcc... /usr/bin/ld
> checking if the linker (/usr/bin/ld) is GNU ld... yes
> checking for BSD- or MS-compatible name lister (nm)...
> /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-nm -p
> checking the name lister (/opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-nm
> -p) interface... BSD nm
> checking whether ln -s works... yes
> checking the maximum length of command line arguments... 1572864
> checking whether the shell understands some XSI constructs... yes
> checking whether the shell understands "+="... yes
> checking for /usr/bin/ld option to reload object files... -r
> checking for objdump... /usr/bin/objdump
> checking how to recognize dependent libraries... pass_all
> checking for ar... /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-ar
> checking for strip... /usr/bin/strip
> checking for ranlib... /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-ranlib
> checking command to parse /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-nm -p
> output from /opt/bw/gcc9/bin/gcc object... ok
> checking how to run the C preprocessor... /opt/bw/gcc9/bin/gcc -E
> checking for ANSI C header files... yes
> checking for sys/types.h... yes
> checking for sys/stat.h... yes
> checking for stdlib.h... yes
> checking for string.h... yes
> checking for memory.h... yes
> checking for strings.h... yes
> checking for inttypes.h... yes
> checking for stdint.h... yes
> checking for unistd.h... yes
> checking for dlfcn.h... yes
> checking for objdir... .libs
> checking if /opt/bw/gcc9/bin/gcc supports -fno-rtti -fno-exceptions... no
> checking for /opt/bw/gcc9/bin/gcc option to produce PIC... -fPIC -DPIC
> checking if /opt/bw/gcc9/bin/gcc PIC flag -fPIC -DPIC works... yes
> checking if /opt/bw/gcc9/bin/gcc static flag -static works... no
> checking if /opt/bw/gcc9/bin/gcc supports -c -o file.o... yes
> checking if /opt/bw/gcc9/bin/gcc supports -c -o file.o... (cached) yes
> checking whether the /opt/bw/gcc9/bin/gcc linker (/usr/bin/ld -m
> elf_x86_64) supports shared libraries... yes
> checking whether -lc should be explicitly linked in... no
> checking dynamic linker characteristics... GNU/Linux ld.so
> checking how to hardcode library paths into programs... immediate
> checking whether stripping libraries is possible... yes
> checking if libtool supports shared libraries... yes
> checking whether to build shared libraries... yes
> checking whether to build static libraries... yes
> checking for a BSD-compatible install... /usr/bin/install -c
> checking for special C compiler options needed for large files... no
> checking for _FILE_OFFSET_BITS value needed for large files... no
> checking for int8_t... yes
> checking for int16_t... yes
> checking for int32_t... yes
> checking for int64_t... yes
> checking for intptr_t... yes
> checking for uint8_t... yes
> checking for uint16_t... yes
> checking for uint32_t... yes
> checking for uint64_t... yes
> checking for uintptr_t... yes
> checking for sys/types.h... (cached) yes
> checking for stdlib.h... (cached) yes
> checking for stdint.h... (cached) yes
> checking for inttypes.h... (cached) yes
> checking malloc.h usability... yes
> checking malloc.h presence... yes
> checking for malloc.h... yes
> checking for fdatasync... yes
> checking for gmtime_r... yes
> checking for isnan... yes
> checking for localtime_r... yes
> checking for localtime_s... no
> checking for malloc_usable_size... yes
> checking for strchrnul... yes
> checking for usleep... yes
> checking for utime... yes
> checking for pread... yes
> checking for pread64... yes
> checking for pwrite... yes
> checking for pwrite64... yes
> checking for tclsh8.7... tclsh8.7
> 

[sqlite] sqlite-src-3300100 on RHEL 7.4 toss mad errors about 'asm'

2019-11-07 Thread Dennis Clarke


While building sqlite from the zip file sqlite-src-3300100.zip I saw
what seems to be a normal configure and then madness :


boe13$ pwd
/opt/bw/build/sqlite-src-3300100_rhel_74_3.10.0-693.el7.x86_64.001
boe13$ ./configure --prefix=/opt/bw --enable-shared --enable-static 
--enable-readline --enable-threadsafe 2>&1 | tee 
../sqlite-src-3300100_rhel_74_3.10.0-693.el7.x86_64.001.config.log

checking build system type... x86_64-pc-linux-gnu
checking host system type... x86_64-pc-linux-gnu
checking for gcc... /opt/bw/gcc9/bin/gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /opt/bw/gcc9/bin/gcc accepts -g... yes
checking for /opt/bw/gcc9/bin/gcc option to accept ISO C89... none needed
checking for a sed that does not truncate output... /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by /opt/bw/gcc9/bin/gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... 
/opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-nm -p
checking the name lister (/opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-nm 
-p) interface... BSD nm

checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... /usr/bin/objdump
checking how to recognize dependent libraries... pass_all
checking for ar... /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-ar
checking for strip... /usr/bin/strip
checking for ranlib... /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-ranlib
checking command to parse /opt/bw/gcc9/bin/x86_64-redhat-linux-gcc-nm -p 
output from /opt/bw/gcc9/bin/gcc object... ok

checking how to run the C preprocessor... /opt/bw/gcc9/bin/gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if /opt/bw/gcc9/bin/gcc supports -fno-rtti -fno-exceptions... no
checking for /opt/bw/gcc9/bin/gcc option to produce PIC... -fPIC -DPIC
checking if /opt/bw/gcc9/bin/gcc PIC flag -fPIC -DPIC works... yes
checking if /opt/bw/gcc9/bin/gcc static flag -static works... no
checking if /opt/bw/gcc9/bin/gcc supports -c -o file.o... yes
checking if /opt/bw/gcc9/bin/gcc supports -c -o file.o... (cached) yes
checking whether the /opt/bw/gcc9/bin/gcc linker (/usr/bin/ld -m 
elf_x86_64) supports shared libraries... yes

checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking for int8_t... yes
checking for int16_t... yes
checking for int32_t... yes
checking for int64_t... yes
checking for intptr_t... yes
checking for uint8_t... yes
checking for uint16_t... yes
checking for uint32_t... yes
checking for uint64_t... yes
checking for uintptr_t... yes
checking for sys/types.h... (cached) yes
checking for stdlib.h... (cached) yes
checking for stdint.h... (cached) yes
checking for inttypes.h... (cached) yes
checking malloc.h usability... yes
checking malloc.h presence... yes
checking for malloc.h... yes
checking for fdatasync... yes
checking for gmtime_r... yes
checking for isnan... yes
checking for localtime_r... yes
checking for localtime_s... no
checking for malloc_usable_size... yes
checking for strchrnul... yes
checking for usleep... yes
checking for utime... yes
checking for pread... yes
checking for pread64... yes
checking for pwrite... yes
checking for pwrite64... yes
checking for tclsh8.7... tclsh8.7
configure: Version set to 3.30
configure: Release set to 3.30.1
configure: Version number set to 3030001
checking whether to support threadsafe operation... yes
checking for library containing pthread_create... -lpthread
checking for library containing pthread_mutexattr_init... none required
checking whether to support shared library linked as release mode or 

Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Keith Medcalf
On Thursday, 7 November, 2019 04:55, Merijn Verstraaten 
 wrote:

>I'm trying sample a (deterministically) random subset of a SELECT query,

You cannot have something that is both RANDOM and DETERMINISTIC at the same 
time.

>the most common solution on the internet to get random samples seems to
>be "SELECT * FROM (...) ORDER BY RANDOM() LIMIT n;" (this already has
>some question marks, since it relies on seeding RANDOM and knowing the
>RANDOM function is always evaluated in the same order every query run),

This must be from StackOverflow because it is a terrible idea.

>but looking at the query plans this materialises the entire result set in
>memory for the sort (no surprise, I can't think of anyway that could work
>otherwise) which is rather undesirable if the sample size becomes large
>(i.e. several million rows).

Exactly.

>Now, I already know different ways to implement a predicate function that
>can deterministically keep elements from a stream, however that relies on
>having a deterministic order for the stream. Which brings us to SQLite. I
>can easily write something like:

>SELECT *
>FROM (...)
>WHERE my_predicate_fun()
>ORDER BY column1, column2,...

>And this *seems* to evaluate the where clause for each row in the order
>determined by ORDER BY, but this doesn't seem at all guaranteed by the
>SQL spec. 

That is correct.  ORDER BY is executed as the LAST STEP in returning the query 
results.  However, based on availability of indexes, the optimizer my decide to 
utilize those indexes in order to obtain results in already sorted or partially 
sorted order.  The various WHERE clause terms will be checked at the 
appropriate place to minimize the number of candidates at each step.  Since 
my_predicate_fun() is not dependent on any item it will be executed at the last 
step on each candidate (that has passed all the other where clause constraints) 
before that candidate is passed to the sorter.

>So, is this behaviour documented/guaranteed somewhere? If not,
>is there some way to guarantee my where clause is evaluated for each row
>in a deterministic order?

Yes, and no.  For a given (constant) query of a given set of (constant) data 
with a given set of (constant) indexes on that data, the (same version) 
optimizer will always arrive at the same method of answering your query.  If 
you change the query or the data or the indexes available or the version of the 
optimizer, then the most efficient plan for obtaining the results for which you 
asked will be used, which will thereafter be constant.  That is, if the input 
is constant, and the software is constant, then the output and the how the 
software arrives at that output, will be constant.

See https://www.sqlite.org/queryplanner-ng.html#qpstab

You might want to read the whole page.

>In the simple case like above I could always just evaluate the query
>without the ORDER BY, step through the entire query, and evaluate the
>predicate in the application, but if I want to use this random selection
>as a subquery, then that doesn't work.

You have changed the query.  Each query is optimized individually.

>And while I'm asking questions: What if I want to do the above, but
>selecting groups of rows? So, sort of like:

>SELECT *
>FROM (...)
>GROUP BY groupColumn
>HAVING my_predicate_fun();

>But where I want to return all rows in the group, rather than an
>aggregate.

Then you must use the group result to select the data.  Use of GROUP BY/HAVING 
implies returning GROUPs, not rows.  You can of course use the selected groups 
to find the matching members of the group and return those.  Example:

select 
  from 
  join (select a, b 
  from  
 where  
  group by a, b 
having my_predicate_function()) as g
on .a = g.a AND .b = g.b
 where 
;

If you wish to generate a "static random sample" of the data, then why do you 
not just do it once and record the result, thereafter using that keyset to 
re-retrieve the same data whenever you need it?  Eg:

create table KeySet as
select a.RowID as aRowID, b.RowID as bRowID, c.RowID as cRowID
  from a, b, c
 where ;

and thereafter

select 
  from KeySet
  join a on a.rowid == KeySet.aRowID
  join b on b.rowid == KeySet.bRowID
  join c on c.rowid == KeySet.cRowID

-- 
The fact that there's a Highway to Hell but only a Stairway to Heaven says a 
lot about anticipated traffic volume.



___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread David Raymond
"So, is this behaviour documented/guaranteed somewhere?"
Short version is: Nope. The engine is free to do whatever it wants as long as 
it gives the correct result in the end.

Consider a simple
select * from foo where predicate order by non_indexed_field;
Since there is no nice ordering of the data already it's going to have to sort 
it. In which case it's probably going to check the predicate against records on 
the way _in_ to the sorter rather than _after_ sorting. Think "I might only 
have to sort 5 things instead of 5 million, so let's filter out as much as we 
can as soon as possible." And since the same data could be on the disk with its 
pages in any order you could conceive a situation where the same data could be 
processed differently depending on the specific file that it's in. The same 
query could process the data in a different order before and after a vacuum for 
example.

Or maybe it does sort first then check. But that's an internal detail which 
could change every version. And all that is all considered fine, as the end 
result of the query will still be correct and in the order specified.

The closest thing you can do is limit a query to using a specific index during 
a query, but even then you're basically relying on implementation details, and 
not a guarantee.

Others will correct me if I'm wrong on that.



-Original Message-
From: sqlite-users  On Behalf Of 
Merijn Verstraaten
Sent: Thursday, November 7, 2019 6:55 AM
To: sqlite-users@mailinglists.sqlite.org
Subject: [sqlite] Deterministic random sampling via SELECT

I'm trying sample a (deterministically) random subset of a SELECT query, the 
most common solution on the internet to get random samples seems to be "SELECT 
* FROM (...) ORDER BY RANDOM() LIMIT n;" (this already has some question marks, 
since it relies on seeding RANDOM and knowing the RANDOM function is always 
evaluated in the same order every query run), but looking at the query plans 
this materialises the entire result set in memory for the sort (no surprise, I 
can't think of anyway that could work otherwise) which is rather undesirable if 
the sample size becomes large (i.e. several million rows).

Now, I already know different ways to implement a predicate function that can 
deterministically keep elements from a stream, however that relies on having a 
deterministic order for the stream. Which brings us to SQLite. I can easily 
write something like:

SELECT *
FROM (...)
WHERE my_predicate_fun()
ORDER BY column1, column2,...

And this *seems* to evaluate the where clause for each row in the order 
determined by ORDER BY, but this doesn't seem at all guaranteed by the SQL 
spec. So, is this behaviour documented/guaranteed somewhere? If not, is there 
some way to guarantee my where clause is evaluated for each row in a 
deterministic order?

In the simple case like above I could always just evaluate the query without 
the ORDER BY, step through the entire query, and evaluate the predicate in the 
application, but if I want to use this random selection as a subquery, then 
that doesn't work.

And while I'm asking questions: What if I want to do the above, but selecting 
groups of rows? So, sort of like:

SELECT *
FROM (...)
GROUP BY groupColumn
HAVING my_predicate_fun();

But where I want to return all rows in the group, rather than an aggregate.

Thanks in advance,
Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Deterministic random sampling via SELECT

2019-11-07 Thread Merijn Verstraaten
I'm trying sample a (deterministically) random subset of a SELECT query, the 
most common solution on the internet to get random samples seems to be "SELECT 
* FROM (...) ORDER BY RANDOM() LIMIT n;" (this already has some question marks, 
since it relies on seeding RANDOM and knowing the RANDOM function is always 
evaluated in the same order every query run), but looking at the query plans 
this materialises the entire result set in memory for the sort (no surprise, I 
can't think of anyway that could work otherwise) which is rather undesirable if 
the sample size becomes large (i.e. several million rows).

Now, I already know different ways to implement a predicate function that can 
deterministically keep elements from a stream, however that relies on having a 
deterministic order for the stream. Which brings us to SQLite. I can easily 
write something like:

SELECT *
FROM (...)
WHERE my_predicate_fun()
ORDER BY column1, column2,...

And this *seems* to evaluate the where clause for each row in the order 
determined by ORDER BY, but this doesn't seem at all guaranteed by the SQL 
spec. So, is this behaviour documented/guaranteed somewhere? If not, is there 
some way to guarantee my where clause is evaluated for each row in a 
deterministic order?

In the simple case like above I could always just evaluate the query without 
the ORDER BY, step through the entire query, and evaluate the predicate in the 
application, but if I want to use this random selection as a subquery, then 
that doesn't work.

And while I'm asking questions: What if I want to do the above, but selecting 
groups of rows? So, sort of like:

SELECT *
FROM (...)
GROUP BY groupColumn
HAVING my_predicate_fun();

But where I want to return all rows in the group, rather than an aggregate.

Thanks in advance,
Merijn
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users