Re: [sqlite] UPDATE statement without FROM clause

2016-06-06 Thread James K. Lowden
On Sat, 4 Jun 2016 18:18:36 +0200
skywind mailing lists  wrote:

> At the moment I have to run something like:
> 
> UPDATE A SET item1=(SELECT B.item FROM B WHERE B.ID=A.ID),...
> itemN=... WHERE EXISTS (SELECT 1 FROM B WHERE B.ID=A.ID);
> 
> Using a FROM clause I just need one scan through B (at least in
> principle). Now, I need N+1 scans.

Nonsense.  SQL provides no instruction to the implementation on how to
organize or traverse the data.  SQLite is free to scan B once, twice,
or not at all.  

Syntax has nothing to do with performance.  A correlated subquery is an
expression of logic; it's not meant to be taken literally, and often
isn't.  This particular form "just" needs to be recognized by the
optimizer.  
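For the record, the pattern under discussion runs as-is in SQLite. A minimal sketch (table contents invented for illustration) showing why the EXISTS guard matters -- without it, rows of A with no match in B would be set to NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY, item1 TEXT);
    CREATE TABLE B (ID INTEGER PRIMARY KEY, item TEXT);
    INSERT INTO A VALUES (1, NULL), (2, NULL), (3, 'keep');
    INSERT INTO B VALUES (1, 'x'), (2, 'y');
""")
# Correlated-subquery UPDATE; the EXISTS clause restricts the update
# to rows of A that actually have a counterpart in B.
conn.execute("""
    UPDATE A SET item1 = (SELECT B.item FROM B WHERE B.ID = A.ID)
    WHERE EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)
""")
rows = conn.execute("SELECT * FROM A ORDER BY ID").fetchall()
print(rows)  # [(1, 'x'), (2, 'y'), (3, 'keep')]
```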

--jkl
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQL / SQLite for Beginners

2016-06-06 Thread James K. Lowden
On Thu, 26 May 2016 10:54:30 -0400
r.a.n...@gmail.com wrote:

> FWIW, since it's inception, S.Q.L has been pronounced allot like
> CICS. 

This may be more true than you know.  It's not too hard to find
old-timers who pronounce it "kicks".  

--jkl



Re: [sqlite] Is there equivalent to MySQL IF() function?

2014-10-07 Thread James K. Lowden
On Tue, 7 Oct 2014 12:15:09 +0300
"Tony Papadimitriou"  wrote:

> Is there any an equivalent function to the MySQL 
> IF(condition,true_expr,false_expr) function?
> 
> For example, SELECT AGE,IF(AGE < 3,"BABY",IF(AGE <
> 18,"CHILD","ADULT"));
> 
> If not, please add to wish list :)

You can always solve problems like this by putting the data in a table:

create table stages
( age int primary key
, stage string not null
);
insert into stages values (3, 'baby'), (18, 'child'), (999, 'adult');

create view vstages as
select A.age, stage from (
   select a.age, min(s.age) as threshold
   from ages as a join stages as s
   on a.age <= s.age
   group by a.age
) as A join stages as s
on A.threshold = s.age;

$ sqlite3 -echo db <<< 'select * from vstages;'
select * from vstages;
age stage 
--  --
1   baby  
2   baby  
3   baby  
4   child 
5   child 
6   child 
7   child 
8   child 
9   child 
10  child 
11  child 
12  child 
13  child 
14  child 
15  child 
16  child 
17  child 
18  child 
19  adult 
20  adult 
21  adult 

Another form of the query joins the "stages" table to itself to create
a range, and uses BETWEEN to choose the label.  
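One way to sketch that BETWEEN form (the post above assumes an "ages" table holding the ages to classify; the one-subquery range derivation here is mine, not the original poster's):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stages (age INT PRIMARY KEY, stage TEXT NOT NULL);
    INSERT INTO stages VALUES (3, 'baby'), (18, 'child'), (999, 'adult');
    CREATE TABLE ages (age INT PRIMARY KEY);   -- assumed by the post above
    INSERT INTO ages VALUES (1), (10), (19);
""")
# Derive each label's lower bound from the previous threshold, then
# use BETWEEN to pick the label whose range contains the age.
rows = conn.execute("""
    SELECT a.age, s.stage
    FROM ages AS a
    JOIN stages AS s
      ON a.age BETWEEN (SELECT coalesce(max(p.age) + 1, 0)
                        FROM stages AS p WHERE p.age < s.age)
               AND s.age
    ORDER BY a.age
""").fetchall()
print(rows)  # [(1, 'baby'), (10, 'child'), (19, 'adult')]
```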

HTH.  

--jkl


Re: [sqlite] Detecting multiple CHECK failures

2014-10-08 Thread James K. Lowden
On Wed, 08 Oct 2014 15:01:39 +0200
Clemens Ladisch  wrote:

> SQL constraints were designed to catch _programming_ errors, not
> _user_ errors.

Neither and both, actually.  Database theory doesn't distinguish between
different sources of invalid input.  

Constraints enforce consistency. They prevent the database from
reaching a state that's "out of bounds" as defined by its own rules.  

One way to think about a database is as one giant variable of a
single type. All changes have to continue to meet the rules of that
type, else it wouldn't be that type anymore.  It would be NaD: not a
database.  

Sometimes the user could know.  There's no February 31 (except in
MySQL).  Sometimes the program could know, as when the foreign key
referent must be inserted before the reference.  Sometimes the program
can't know, as when another user reserved seat 32B after it was
displayed as available.  
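The point that the database rejects out-of-bounds states regardless of who asks can be demonstrated with a small sketch (schema invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE seats (
    seat  TEXT PRIMARY KEY,
    taken INT NOT NULL CHECK (taken IN (0, 1))
)""")
conn.execute("INSERT INTO seats VALUES ('32B', 1)")

# Both a duplicate key and a CHECK violation are refused the same way;
# SQLite does not care whether the bad value came from a user or a bug.
rejected = []
for stmt in ("INSERT INTO seats VALUES ('32B', 1)",    # duplicate key
             "INSERT INTO seats VALUES ('14C', 2)"):   # CHECK violation
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as e:
        rejected.append(str(e))
print(len(rejected))  # 2
```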

--jkl


Re: [sqlite] Possible automatic ON CONFLICT resolution via DEFAULTS

2014-10-08 Thread James K. Lowden
On Wed, 8 Oct 2014 00:14:51 -0400
Stephen Chrzanowski  wrote:

> When adding a NULL value to a table that has the NOT NULL flag set on
> that field, instead of raising an exception, if the field definition
> were to have the word "USE" between "ON CONFLICT" and "DEFAULT" in
> its declaration, it'd use whatever the fields default value was set
> to.  If USE is included, the DEFAULT value must be included,
> otherwise the table isn't created.

I think what you want is usually provided by the DML, not the DDL.  

> update ModGroups set ActiveOnServer=null where GroupID = 1;

becomes

update ModGroups set ActiveOnServer=DEFAULT where GroupID = 1;

which is more direct and IMO clearer.  

The problem I see with your suggestion is that I can't think of another
situation, with or without NULL, with or without defaults, where

insert into T (t) value (X) 

results in T.t = Y.  You should get what you asked for, or an error,
not a magical transformation.  
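As an aside, SQLite itself accepts "INSERT INTO T DEFAULT VALUES" but, unlike the standard, not DEFAULT as a value expression; today the default is had by omitting the column. A minimal sketch of both behaviors (names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, t TEXT NOT NULL DEFAULT 'x')")

conn.execute("INSERT INTO T (id) VALUES (1)")   # column omitted: default applies
row = conn.execute("SELECT t FROM T").fetchone()
print(row)  # ('x',)

try:
    # An explicit NULL is an error, not a silent substitution of the default.
    conn.execute("INSERT INTO T (id, t) VALUES (2, NULL)")
    err = None
except sqlite3.IntegrityError as e:
    err = e
print(err is not None)  # True
```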

--jkl


Re: [sqlite] Possible automatic ON CONFLICT resolution via DEFAULTS

2014-10-10 Thread James K. Lowden
On Thu, 9 Oct 2014 11:16:25 -0400
Stephen Chrzanowski <pontia...@gmail.com> wrote:
> On Wed, Oct 8, 2014 at 8:38 PM, James K. Lowden
> <jklow...@schemamania.org> wrote:
> 
> >
> > The problem I see with your suggestion is that I can't think of
> > another situation, with or without NULL, with or without defaults,
> > where
> >
> > insert into T (t) value (X)
> >
> > results in T.t = Y.  You should get what you asked for, or an error,
> > not a magical transformation.

> I wouldn't call it 'magical' if the definition is right on the field
> declaration

Perhaps "magical" wasn't the best term.  My simple point is that in no
other case does inserting a value X into a column result in a
different value Y appearing there.  

The "value (DEFAULT)" syntax does not suffer from that problem.  

I hear someone saying "triggers".  Sure, you can abuse the system.
Anyone burned by triggers that do anything other than ensure
referential integrity quickly learns to be twice shy.  The system is
not improved when it subverts stated intentions.  

--jkl


Re: [sqlite] Make a database read-only?

2014-10-15 Thread James K. Lowden
On Tue, 14 Oct 2014 18:21:27 -0400
Ross Altman  wrote:

> Yeah, that's actually a really good point. Oh well, I guess I'll just
> have to hope that people decide to use the database responsibly...
> haha

You can advertise your database with the tagline, "Please compute
responsibly".  

The first rule of security is that there's no security without physical
security.  If you don't control the physical thing itself -- usually
hardware, but in this case a file -- then you don't control what
can be done with it.  

--jkl


Re: [sqlite] Search query alternatives.

2014-10-17 Thread James K. Lowden
On Thu, 16 Oct 2014 09:05:51 +1100
Michael Falconer  wrote:

> we just wonder if there is a better way to perform this search in
> SQL. Is there a general technique which is superior either in speed,
> efficiency or load bearing contexts?

The simple answer is No, because SQL is a specification, not an
implementation.  Different systems implement it differently and
therefore perform differently.  Any "general technique" affecting
performance belongs to the implementation per se, not the SQL, which is
a logical construction. SQLite itself has changed its performance
characteristics over the course of its development.  

For that reason, any question of performance has to be answered in
terms of a particular implementation, even its specific version, and
the OS and hardware it's running on.  

That said, there is reason to suppose that a single-table design would
be more efficient.  If the queries can be expressed with recursion and
the indexes lead to efficient searches, the query optimizer has less
work to do.  It has fewer permutations to consider, and the search is
apt to touch fewer pages.  The analysis tools of the system you're
using should be able to confirm or deny that supposition.  

I would remind your fellows, though, that efficiency is not all.  The
utility of a model (that is, the database design) is measured by how
well, to its purpose, it describes the real world.  Any model that must
be changed as that reality changes in predictable ways isn't really
much of a model; it turns the designer into a component of the model.
By recognizing all trees as one, you generalize your model and make it
do work you are now doing yourself (manually, or in application
logic).  By any measure, that makes it a better model.  

HTH.  

--jkl


Re: [sqlite] unicode case insensitive

2014-10-24 Thread James K. Lowden
On Fri, 24 Oct 2014 21:44:50 +0400
dd  wrote:

> >>Convert everything to upper (or lower) case brute force.
>Sorry. I am not clear. Can you please elaborate this.

The standard function tolower(3) is locale-dependent. If your locale is
set to match the data's single-byte encoding, 

tolower('Ö') == tolower('ö') .

If you are using Unicode, you have towlower(3) as defined by C99.  If
you're using utf-8, you'll want to call iconv(3) first to convert the
string to an array of wint_t. 

Plug that into the xCompare function of a custom collation, and you
have your own case-insensitive matching capability.  
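In the C API the hook is sqlite3_create_collation(); Python's driver exposes the same xCompare idea as create_collation(), which makes for a compact sketch. Here str.lower() stands in for the towlower()/iconv() pipeline described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def ci_compare(a, b):
    # xCompare contract: negative, zero, or positive, like strcmp().
    a, b = a.lower(), b.lower()
    return (a > b) - (a < b)

conn.create_collation("ci", ci_compare)
conn.execute("CREATE TABLE t (s TEXT COLLATE ci)")
conn.executemany("INSERT INTO t VALUES (?)", [("Ö",), ("a",), ("B",)])
ordered = [r[0] for r in conn.execute("SELECT s FROM t ORDER BY s")]
print(ordered)  # ['a', 'B', 'Ö']
```

Once the collation is attached to the column, ORDER BY, comparisons, and indexes on that column all use it.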

--jkl


Re: [sqlite] Finding second occurrence of character in string

2014-10-26 Thread James K. Lowden
On Sun, 26 Oct 2014 15:27:24 +0300
Baruch Burstein  wrote:

> I need to get the path with the
> first 2 parts stripped off. Currently I am doing:
> 
> substr(path, 4+instr(substr(path,4),'/'))
> 
> But that seems long and probably inefficient.
> What is the best/simplest way to find the second occurrence of the
> '/' in a string?

Fast is fast enough.  If your SQL solves your problem acceptably fast,
you're done.  

If you need something faster, you could implement support for
regular expressions.  Posix filename rules are very strict: the only
disallowed characters are '/' and NUL.  A simple regex returns the Nth
occurrence of a pattern.  I would expect such a solution to move the
performance constraint from the SQL interpreter (if that's where it is)
to I/O.  

My example implementation is at
http://www.schemamania.org/sql/sqlite/udf/regex.pdf  Note however I
wrote it as a user-defined function before I understood that REGEXP is a
supported keyword in the SQLite syntax.  If you implement a regexp()
user function, you have access to it as an SQL operator, not just as a
UDF.  
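To illustrate: SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X), so registering a two-argument regexp() user function is all it takes to enable the operator. The path-stripping helper below is my own invention for the question at hand, not from the referenced implementation:

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# Pattern comes first because X REGEXP Y is rewritten as regexp(Y, X).
conn.create_function("regexp", 2,
                     lambda pat, s: re.search(pat, s) is not None)
matched = conn.execute("SELECT '/a/b/c/d' REGEXP '^(/[^/]*){2}/'").fetchone()[0]
print(matched)  # 1

# Hypothetical helper: strip the first two path components.
conn.create_function("strip2", 1,
                     lambda p: re.sub(r'^(/[^/]*){2}/', '', p, count=1))
stripped = conn.execute("SELECT strip2('/a/b/c/d')").fetchone()[0]
print(stripped)  # c/d
```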


--jkl



Re: [sqlite] Keeping -wal and -shm files

2014-10-27 Thread James K. Lowden
On Mon, 27 Oct 2014 12:33:59 +0100
Steinar Midtskogen  wrote:

> Is there a way to prevent the -wal and -shm files from being deleted
> after use, so that I can have them always have the right group?  Or is
> there a way to tell Linux to observe the setgid flag on a directory
> (ext4)?

See -o grpid in mount(8).  I think that's what you want.  

--jkl



Re: [sqlite] quasi-bug related to locking, and attached databases....

2014-10-27 Thread James K. Lowden
On Mon, 27 Oct 2014 16:49:42 -0500
Nico Williams  wrote:

> If it's not too much to ask for then SQLite3 ought to: a) check for
> duplicates by canonicalized path (but keep in mind that this can be
> difficult to do portably, or without obnoxious length limitations on
> Windows), 

The name is not the file.  Even Windows has hard links these days.  :-(

> then b) check for duplicates by st_dev/st_ino where available.

Not portable, but at least reliable.  :-)

The Googles reveal the BY_HANDLE_FILE_INFORMATION structure
returned by GetFileInformationByHandle
(http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952
(v=vs.85).aspx) includes "a unique identifier that is associated with a
file".  Why that value couldn't be returned in st_ino by fstat(2) would
seem to rest less on technical and more on commercial concerns.  The
"File ID" discussion (Remarks,
http://msdn.microsoft.com/en-us/library/windows/desktop/aa363788
(v=vs.85).aspx) is fascinating in a morbid way.  

On reflection, I don't think a file uniqueness function is something
SQLite need provide in any form.  The application supplies the filename
of the database to attach.  It is capable of calling fstat or
GetFileInformationByHandle as appropriate if required, without help
from SQLite.  

--jkl




Re: [sqlite] Keeping -wal and -shm files

2014-10-28 Thread James K. Lowden
On Mon, 27 Oct 2014 17:41:53 +0100
Steinar Midtskogen <stei...@latinitas.org> wrote:

> "James K. Lowden" <jklow...@schemamania.org> writes:
> 
> > See -o grpid in mount(8).  I think that's what you want.  
> 
> Thanks.  It works!

Hmm, I'm glad, but as David Woodhouse pointed out, it shouldn't have
been necessary.  Your question was: 

> On Mon, 27 Oct 2014 12:33:59 +0100
> Steinar Midtskogen <stei...@latinitas.org> wrote:
> 
> > is there a way to tell Linux to observe the setgid flag on a
> > directory (ext4)?

and -o grpid changes the directory's *gid* effect, not setgid.  Are you
sure that the directory is setgid?  

$ ls -ld steinar
drwxrwxr-x  2 jklowden  wheel  512 Oct 28 09:54 steinar
$ chmod 2775 steinar
$ ls -ld steinar
drwxrwsr-x  2 jklowden  wheel  512 Oct 28 09:55 steinar
 ^--- note "s" in permission bit

You may be more used to traditional BSD behavior, which -o grpid
restores.  

--jkl


Re: [sqlite] Clarification on sqlite handling of mixed aggregate and non-aggregate columns

2014-10-29 Thread James K. Lowden
On Wed, 29 Oct 2014 20:38:07 +0200
Baruch Burstein  wrote:

> If I have a table, "t", with 2 columns, "a" and "b". Assuming that
> "a" is a unique number, will the following query always return the
> whole row (that is, with the correct "b" column) where "a" is the
> highest number below 50?
> 
> SELECT max(a), b FROM t WHERE a<50;

Standard syntax would be 

select * from t where a = (select max(a) from t where a < 50);

If several rows meet the criteria (have the same value of "a"), you'll
get all of them.  If that's not what you want, you need more criteria,
e.g., 

select a, min(b) as "first b" from t
where a = (select max(a) from t where a < 50)
group by a;

Interestingly, your query should be expressible as an existence
test, 

select * from t as T where exists (
select 1 from t 
where a < 50
having max(a) = T.a
);

but in SQLite that produces a syntax error, 

Error: near line 15: a GROUP BY clause is required before HAVING

I suggest that's an unnecessary requirement.  Lack of a GROUP BY --
combined with a lack of column names in the SELECT clause -- is a truth
test: it either produces a row or not.  In fact, the very strange

select min(b) from t having max(a) = 50;

is permissible afaik by the standard.  (Just an illustration, not your
query.)
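The standard subquery form above can be checked in a couple of lines (sample data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INT, b TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(10, 'x'), (49, 'y'), (60, 'z')])

# Whole row where "a" is the highest number below 50.
row = conn.execute(
    "SELECT * FROM t WHERE a = (SELECT max(a) FROM t WHERE a < 50)"
).fetchone()
print(row)  # (49, 'y')
```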

--jkl


Re: [sqlite] man page bug

2014-10-31 Thread James K. Lowden
On Thu, 30 Oct 2014 17:37:54 +0100 (CET)
Carsten Kunze  wrote:

> the man page sqlite3.1 contains the .cc request which is not
> compatible with the man macro package (and hence must not be used in
> a man page).  

The below patch rectifies that problem and clears up some others
besides:  

1.  replace blank lines with .sp requests or .PP  
2.  remove blank-line request prior to and after .SH
3.  number the initialization steps 
4.  mark paragraphs with .PP instead of .sp
5.  make option list more compact

I took it upon myself to clarify the wording slightly.  

The patch might not apply cleanly to the trunk because it's based on an
older version (the latest I have handy).  Hope it's useful anyway.  

--jkl

[snip]
--- projects/database/sqlite3/sqlite3.1	2012-07-25 12:32:57.0 -0400
+++ /usr/local/share/man/man1/sqlite3.1	2014-10-31 17:34:06.0 -0400
@@ -18,13 +18,13 @@
 .SH NAME
 .B sqlite3 
 \- A command line interface for SQLite version 3
-
+.
 .SH SYNOPSIS
 .B sqlite3
 .RI [ options ]
 .RI [ databasefile ]
 .RI [ SQL ]
-
+.
 .SH SUMMARY
 .PP
 .B sqlite3
@@ -33,7 +33,7 @@
 .B sqlite3
 can also be used within shell scripts and other applications to provide
 batch processing features.
-
+.
 .SH DESCRIPTION
 To start a
 .B sqlite3
@@ -42,10 +42,11 @@
 command and optionally provide the name of a database file.  If the
 database file does not exist, it will be created.  If the database file
 does exist, it will be opened.
-
+.
+.PP
 For example, to create a new database file named "mydata.db", create
 a table named "memos" and insert a couple of records into that table:
-.sp
+.PP
 $ 
 .B sqlite3 mydata.db
 .br
@@ -70,29 +71,28 @@
 lunch with Christine|100
 .br
 sqlite>
-.sp
-
+.
+.PP
 If no database name is supplied, the ATTACH sql command can be used
 to attach to existing or create new database files.  ATTACH can also
 be used to attach to multiple databases within the same interactive
 session.  This is useful for migrating data between databases,
 possibly changing the schema along the way.
-
+.
+.PP
 Optionally, a SQL statement or set of SQL statements can be supplied as
 a single argument.  Multiple statements should be separated by
-semi-colons.
-
-For example:
-.sp
+semi-colons. For example:
+.PP
 $ 
-.B sqlite3 -line mydata.db 'select * from memos where priority > 20;'
+.B
+sqlite3 -line mydata.db 'select * from memos where priority > 20;'
 .br
 text = lunch with Christine
 .br
 priority = 100
 .br
-.sp
-
+.
 .SS SQLITE META-COMMANDS
 .PP
 The interactive interpreter offers a set of meta-commands that can be
@@ -100,24 +100,26 @@
 database files, or perform administrative operations upon the
 attached databases (such as rebuilding indices).   Meta-commands are
 always prefixed with a dot (.).
-
+.
+.PP
 A list of available meta-commands can be viewed at any time by issuing
-the '.help' command.  For example:
-.sp
+the
+.B .help
+command:
+.PP
 sqlite>
 .B .help
 .nf
-.cc |
-.databases List names and files of attached databases
-.dump ?TABLE? ...  Dump the database in an SQL text format
-.echo ON|OFF   Turn command echo on or off
-.exit  Exit this program
-.explain ON|OFFTurn output mode suitable for EXPLAIN on or off.
-.header(s) ON|OFF  Turn display of headers on or off
-.help  Show this message
-.import FILE TABLE Import data from FILE into TABLE
-.indices TABLE Show names of all indices on TABLE
-.mode MODE ?TABLE? Set output mode where MODE is one of:
+\&.databases List names and files of attached databases
+\&.dump ?TABLE? ...  Dump the database in an SQL text format
+\&.echo ON|OFF   Turn command echo on or off
+\&.exit  Exit this program
+\&.explain ON|OFFTurn output mode suitable for EXPLAIN on or off.
+\&.header(s) ON|OFF  Turn display of headers on or off
+\&.help  Show this message
+\&.import FILE TABLE Import data from FILE into TABLE
+\&.indices TABLE Show names of all indices on TABLE
+\&.mode MODE ?TABLE? Set output mode where MODE is one of:
  csv  Comma-separated values
  column   Left-aligned columns.  (See .width)
  html HTML  code
@@ -126,27 +128,25 @@
  list Values delimited by .separator string
  tabs Tab-separated values
  tcl  TCL list elements
-.nullvalue STRING  Print STRING in place of NULL values
-.output FILENAME   Send output to FILENAME
-.output stdout Send output to the screen
-.prompt MAIN CONTINUE  Replace the standard prompts
-.quit  Exit this program
-.read FILENAME Execute SQL in FILENAME
-.schema ?TABLE?Show the CREATE statements
-.separator STRING  Change separator used by output mode and .import
-.show  Show the current 

Re: [sqlite] Index without backing table

2014-11-01 Thread James K. Lowden
On Sat, 01 Nov 2014 11:06:51 +0200
Paul  wrote:

> Would be nice to have ability to store both key and payload in the
> index. (Let's call it index-only table)
> This could be a feature that sets some limitations on a table, like
> being unable to have more than one index or inefficient table scans,
> but it will also give some advantage in special cases like mine.

What you're describing sounds very much to me like a SQLite table.  See
http://www.sqlite.org/fileformat2.html, section 1.5, the reference to
"index b-tree".  

You're in good company.  The technique of storing key and value together
"in order" in some sense is as old as databases.  

--jkl


Re: [sqlite] Index without backing table

2014-11-03 Thread James K. Lowden
On Mon, 03 Nov 2014 11:50:17 +0200
Paul  wrote:

> > > Would be nice to have ability to store both key and payload in the
> > > index. (Let's call it index-only table)
> > > This could be a feature that sets some limitations on a table,
> > > like being unable to have more than one index or inefficient
> > > table scans, but it will also give some advantage in special
> > > cases like mine.
> > 
> > What you're describing sounds very much to me like a SQLite table.
> > See http://www.sqlite.org/fileformat2.html, section 1.5, the
> > reference to "index b-tree".  
> 
> So, to be clear, WITHOUT ROWID table will have it's PRIMARY KEY 
> as a replacement for ROWID and table itself is an index?

Yes, approximately.  http://www.sqlite.org/withoutrowid.html says: 

> CREATE TABLE IF NOT EXISTS wordcount(
>   word TEXT PRIMARY KEY,
>   cnt INTEGER
> ) WITHOUT ROWID;
> 
> In this latter table, there is only a single B-Tree which uses the
> "word" column as its key and the "cnt" column as its data.  

That is, the table is stored as a B-tree with the declared primary key
as the B-tree key.  

I wouldn't say, "the table is an index".  I reserve the word "index" in
this sense to mean "something that speeds up searching in something
else", and in this case there's no "else".  

The table is stored on disk as a tree.  A tree can be used as an index,
and an index may be implemented as a tree.  But not every tree is an
index.  
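The "single B-tree" claim is easy to verify: a WITHOUT ROWID table has no hidden rowid to select, because the primary-key B-tree is the table (example table borrowed from the cited page):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE IF NOT EXISTS wordcount (
    word TEXT PRIMARY KEY,
    cnt  INTEGER
) WITHOUT ROWID""")
conn.execute("INSERT INTO wordcount VALUES ('tree', 1)")

try:
    # Ordinary tables expose a rowid; WITHOUT ROWID tables do not.
    conn.execute("SELECT rowid FROM wordcount")
    err = None
except sqlite3.OperationalError as e:
    err = e
print(err is not None)  # True
```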

HTH.  

--jkl


Re: [sqlite] sqlite3.8.7 crashes when creating an index with WAL mode disabled

2014-11-04 Thread James K. Lowden
On Tue, 4 Nov 2014 12:06:35 +
Simon Davies  wrote:

> > And watch as it crashes when creating the index.
> 
> From https://www.sqlite.org/compile.html:
> 
> Important Note: The SQLITE_OMIT_* options do not work with the
> amalgamation or with pre-packaged C code files. SQLITE_OMIT_*
> compile-time options only work correctly when SQLite is built from
> canonical source files.

Perhaps this could be caught at compile time.  The process that creates
the amalgamation could add a preprocessor macro defining e.g.
_SQLITE_AMALGAMATION_.  The SQLITE_OMIT_* macros could test for that
definition and #pragma error if found.  

--jkl


Re: [sqlite] x64 vs x32 DLL

2014-11-04 Thread James K. Lowden
On Tue, 04 Nov 2014 22:20:23 +0200
RSmith  wrote:

> The best way to think of the 64 bit upgrade in normal programming is:
> "Able to do larger accuracy calculations at more or less the same
> speed".

Eh, more accurate how?  Every 32-bit architecture I compiled for supported
64-bit long integers.  Floating point is unchanged.  

The change is bigger address space, and consequently somewhat larger
and slower code.  "Able to address more memory" [virtual or real] is
how I'd put it.  

That makes some problems feasible that formerly were not.  And I'm not
so sure such problems are rare.  I remember when allocating 1 GB of
memory (virtual or not) was pure fantasy.  Nowadays laptops with 8 GB
of RAM are pretty pedestrian.  

--jkl


Re: [sqlite] How to check if a record exists

2014-11-04 Thread James K. Lowden
On Tue, 4 Nov 2014 21:47:20 +
"Drago, William @ CSG - NARDAEAST"  wrote:

> I've been pulling my hair out trying to figure how to use EXISTS.

I have several examples at
http://www.schemamania.org/sql/#missing.table.  

> Is there a better/recommended way in SQLite to check if a record
> exists?
> 
> static bool IDisDuplicate(string dbFileName, int id)

Yes.  Assert, don't check!  

Your function is checking for duplicates.  I could be wrong of course,
but I suspect that you're checking for duplicates to do something about
it in the database.  If so, between the time you check and the time you
"do something" that duplicate could have gone away and another arisen.
Cache coherency folks call that a write-after-read error.  

The way to avoid that is to put everything on the broad shoulders of
SQLite, viz, 

insert/update/delete T as t
-- ... --
where exists ( 
select 1 from T
where something = t.something
group by something
having count(*) > 1 
);
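A concrete version of that check-and-act-in-one-statement idea, with invented table and column names (this one deletes all but one copy of each duplicated value; a surrogate id picks the survivor):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER PRIMARY KEY, something TEXT)")
conn.executemany("INSERT INTO T (something) VALUES (?)",
                 [("a",), ("a",), ("b",)])

# One atomic statement: no window between "check" and "do something"
# in which another writer can change the answer.
conn.execute("""
    DELETE FROM T
    WHERE EXISTS (SELECT 1 FROM T AS u
                  WHERE u.something = T.something AND u.id < T.id)
""")
remaining = conn.execute("SELECT something FROM T ORDER BY something").fetchall()
print(remaining)  # [('a',), ('b',)]
```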

HTH.  Exists is confusing at first, and consequently
under-appreciated.  

(BTW, I dis duplicates, too.  Despise them, in fact.) 

--jkl


Re: [sqlite] LEFT joins affect query plan for semantically equal queries

2014-11-05 Thread James K. Lowden
On Wed, 05 Nov 2014 08:24:47 -0700
"Keith Medcalf"  wrote:

> The two queries are different.  They may end up with the same result,
> but you are asking different questions.  In the first you are
> returning only matching rows.  In the later you are requesting a
> projection (outer join) then applying the conditional (where) to the
> projection.  

I'm surprised to see you say that, Keith, because it's not true.  

An SQL query is one piece, integral and whole.  There's no 
before & after, no first this then that. Projection and selection are
independent operations and can be applied in either order, at the
discretion of the implementation. The only requirement, as you know, is
that the criteria be met. 

> An index on profiles.userid will speed it up, but you are still
> asking a different question.  You cannot change the table visitation
> order around a left outer join because it changes the question.

Actually, he can't change the visitation order because he doesn't
*control* the visitation order.  That's up to SQLite.  He can only ask a
question.  

Equivalent to an outer join is a union.  Let's look at it that way
(simplified slightly for clarity):

>SELECT * FROM bugs b LEFT JOIN profiles p ON p.userid=b.assigned_to
>WHERE p.login_name='vita...@yourcmc.ru'

select b.*, p.login_name 
from bugs as b join profiles as p 
on p.userid=b.assigned_to 
where login_name='vita...@yourcmc.ru'
UNION
select *, NULL
from bugs
where assigned_to not in (select userid from profiles)
and NULL = 'vita...@yourcmc.ru'

Same question, differently expressed.  How much work would you suggest
the system do to answer the second part of that query?  ;-)  
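The collapse can be observed directly: because the WHERE clause tests a column of the right-hand table against a non-NULL value, the NULL-extended rows are rejected and the outer join returns exactly the inner join's rows. A sketch with an invented schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs (id INTEGER PRIMARY KEY, assigned_to INT);
    CREATE TABLE profiles (userid INTEGER PRIMARY KEY, login_name TEXT);
    INSERT INTO profiles VALUES (1, 'vitali'), (2, 'other');
    INSERT INTO bugs VALUES (10, 1), (11, 2), (12, NULL);
""")
outer = conn.execute("""
    SELECT b.id FROM bugs AS b
    LEFT JOIN profiles AS p ON p.userid = b.assigned_to
    WHERE p.login_name = 'vitali'""").fetchall()
inner = conn.execute("""
    SELECT b.id FROM bugs AS b
    JOIN profiles AS p ON p.userid = b.assigned_to
    WHERE p.login_name = 'vitali'""").fetchall()
# Bug 12, unassigned, survives the LEFT JOIN but not the WHERE.
print(outer == inner, outer)  # True [(10,)]
```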

--jkl




Re: [sqlite] Does using e.g. LTRIM function remove collation?

2014-11-07 Thread James K. Lowden
On Thu, 6 Nov 2014 17:02:26 -0500
Richard Hipp  wrote:

> > sqlite> INSERT INTO test VALUES ('b'), ('A'), ('B'), ('a');
> > sqlite> SELECT * FROM test;
> > b
> > A
> > B
> > a
...
> > sqlite> SELECT * FROM test WHERE LTRIM(col)<'b';
> > A
> > B
> > A
...
> Works as designed.  See
> https://www.sqlite.org/datatype3.html#collation and in particular the
> three rules under section 6.1.

Aren't you overlooking the fact that the WHERE clause is changing not
only which rows are selected, but the *values* of those rows?  

I don't understand how any WHERE clause can convert 'a' to 'A' in the
database.  I'm unable to find any suggestion of such in the collation
documentation.  
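For what it's worth, the rule itself is easy to demonstrate: wrapping the column in LTRIM() strips its NOCASE collation, so the comparison falls back to BINARY. That changes which rows match, never the stored values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (col TEXT COLLATE NOCASE)")
conn.executemany("INSERT INTO test VALUES (?)",
                 [("b",), ("A",), ("B",), ("a",)])

# Bare column: the column's NOCASE collation governs ('B' == 'b').
plain = [r[0] for r in conn.execute("SELECT col FROM test WHERE col < 'b'")]
print(plain)    # ['A', 'a']

# Function result: no collation attached, so BINARY applies ('B' < 'b').
trimmed = [r[0] for r in conn.execute("SELECT col FROM test WHERE LTRIM(col) < 'b'")]
print(trimmed)  # ['A', 'B', 'a']
```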

--jkl



Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?

2014-11-08 Thread James K. Lowden
On Sat, 08 Nov 2014 22:55:46 +0900
Tristan Van Berkom  wrote:

> So I would have to say, the "right way to do it" is the most efficient
> way, the one which provides SQLite with the best indications of how
> to plot an efficient query plan.

Keith is suggesting that the right way to do it is neither "any way that
works" nor necessarily "whatever is fastest" but "the clearest
formulation of the query".  Clarity has the salutary property of being
most likely to be correct (because understood by the human) and stands
a better than fair chance of being executed efficiently (because it
translates easily to a good query plan).  

Most of time -- not every time, but most of the time -- indexes
and table design matter much more to efficient execution than query
syntax. When a clearly expressed query is not executed efficiently in
the presence of useful indexes, and especially when a slightly different
one does, that's usually considered a defect of the query planner.  

--jkl
 


Re: [sqlite] What is the best alternative to this RIGHT OUTER JOIN ?

2014-11-08 Thread James K. Lowden
On Sun, 09 Nov 2014 00:45:16 +0900
Tristan Van Berkom  wrote:

> While I do understand SQL as a functional language, most functional
> programming I've done still has rather explicit syntax/rules, so I get
> particularly uncomfortable with writing vague statements, such as
> JOIN tableA, tableB WHERE ... without being exactly specific on the
> heart/nature of the join which is happening.

Oh, I think you'll agree SQL has explicit syntax.  I think what you
mean by "vague" is "nondeterministic with respect to the physical data
structures and algorithms".  About that you're right; it's considered a
feature.  :-)  

The idea is that the engine interpreting your SQL might not be
absolutely as fast as the most optimal solution for your hardware and
data at any one point in time.  But it will be nearly so, much more
adaptable as hardware and data change, and thereby not trouble your
application with issues outside its problem domain.  And of course
there are other advantages besides, as you know.  

Partly it's a matter of trust.  You trust the OS to schedule your job
fairly, to supply virtual memory, to deliver your TCP packet.  You
trust SQLite to deliver on its ACID contract, and to produce logically
correct result from queries.  Why not trust it to find the shortest
path to your data?  

> Also what I've found in my limited experience is that nesting SELECT
> statements, at least in SQLite, completely throws off the planner,
> as in it has no opportunity to make a good guess and has to deal
> with the result set of a nested SELECT as an opaque table, 

A good SQL rule of thumb: if you can think of a way, so can the
DBMS.  "... no opportunity to make a good guess" is not true.  In some
sense, SQLite has had 10 years to make a good guess, and often does.  

A nested select need not be materialized as a "table", opaque or
otherwise.  It can be converted, or "flattened" under some
circumstances.  SQLite's query planner isn't the most sophisticated; no
one suggests otherwise.  It does not follow that every JOIN will
outperform every EXISTS or vice versa.  

> is generally not an indexed table (or perhaps it is, but I wouldn't
> know because those indexes don't seem to be declarative in any way).

I don't know what you're mean by indexes that "don't seem to be
declarative".  

> So indeed, I am not comfortable with 'leaving it up to chance',
> and if there is a way to get higher specificity, I try to achieve
> that.

If I may be so bold, beware of your assumptions.  The fastest plan with
100 rows may perform very poorly with 100 million rows.  Be careful
what you optimize for.  :-)  

My fans on this list (both of them) will be surprised, though, that I
partly agree with you.  In terms of technology, there's very little
middle ground between fopen(3) and SQL's "I give query you give data"
contract.  SQL experts, no matter the platform, spend a fair amount of
time coercing the system into using an efficient query plan.  They
cannot say "apply criteria X to index Y and join to table T"; they must
work by indirection, creating indexes and using "query hints" (or,
sometimes, other query formations) until the planner does the "right
thing", however defined.  I sometimes wish for a system that let me
express the query algebraically and the order of operation explicitly,
but afaik no such system exists except partially and grudgingly within
SQL.  
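Short of such a system, the nearest thing available is asking the planner what it decided and then nudging it with indexes, which is exactly the indirect coercion described above.  A sketch (Python's built-in sqlite3 module; the table and index names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t(a INTEGER PRIMARY KEY, b TEXT);
    CREATE INDEX t_b ON t(b);
""")

# Ask the planner how it would evaluate the query.  The answer can
# change as indexes are added or ANALYZE statistics are gathered,
# without any change to the SQL itself.
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT a FROM t WHERE b = 'x'").fetchall()
plan = " ".join(r[-1] for r in rows)   # last column holds the plan text
```

Dropping the index and re-running the EXPLAIN shows the plan silently switch to a full scan; same query, different algorithm.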

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Does using e.g. LTRIM function remove collation?

2014-11-10 Thread James K. Lowden
On Mon, 10 Nov 2014 08:43:24 +
Hick Gunter <h...@scigames.at> wrote:

> I get the following results for the second select:
> 
> A
> B
> a (lowercase!!!)
> 
> Are you sure you ran the exact query stated?

I didn't run it.  Your mail showed 3 uppercase letters: 

> > sqlite> SELECT * FROM test WHERE LTRIM(col)<'b';
> > A
> > B
> > A

--jkl

> 
> -Original Message-
> From: James K. Lowden [mailto:jklow...@schemamania.org]
> Sent: Saturday, 08 November 2014 01:52
> To: sqlite-users@sqlite.org
> Subject: Re: [sqlite] Does using e.g. LTRIM function remove collation?
> 
> On Thu, 6 Nov 2014 17:02:26 -0500
> Richard Hipp <d...@sqlite.org> wrote:
> 
> > > sqlite> INSERT INTO test VALUES ('b'), ('A'), ('B'), ('a');
> > > sqlite> SELECT * FROM test;
> > > b
> > > A
> > > B
> > > a
> ...
> > > sqlite> SELECT * FROM test WHERE LTRIM(col)<'b';
> > > A
> > > B
> > > A
> ...
> > Works as designed.  See
> > https://www.sqlite.org/datatype3.html#collation and in particular
> > the three rules under section 6.1.
> 
> Aren't you overlooking the fact that the WHERE clause is changing not
> only which rows are selected, but the *values* of those rows?
> 
> I don't understand how any WHERE clause can convert 'a' to 'A' in the
> database.  I'm unable to find any suggestion of such in the collation
> documentation.
> 
> --jkl
> 
> 
> 
> ___
>  Gunter Hick
> Software Engineer
> Scientific Games International GmbH
> FN 157284 a, HG Wien
> Klitschgasse 2-4, A-1130 Vienna, Austria
> Tel: +43 1 80100 0
> E-Mail: h...@scigames.at
> 
> 
> 


Re: [sqlite] Triggers and CTE's

2014-11-12 Thread James K. Lowden
On Tue, 11 Nov 2014 17:15:53 -0600
Ben Newberg  wrote:

> CREATE TRIGGER t_populate_zweeks
> AFTER UPDATE ON zSPs WHEN new.Procedure = 6 AND new.Flag = 1
> BEGIN
> DELETE FROM zWeeks;
> WITH RECURSIVE Weeks(wk) as (select 1 union all select wk + 1 from
> Weeks limit 10)
> INSERT INTO zWeeks (Week) select wk from Weeks;
> END;
> 
> When I run this, I get an error message from sqlite3_errmsg() saying
> syntax error near "INSERT". However, when I run the DELETE, WITH and
> INSERT statements above separately without the Create Trigger DDL,
> the query runs successfully and populates my zWeeks table with values
> 1 through 10.
> 
> Do triggers not support this behavior, or is my syntax incorrect?

You really don't want to use triggers this way.  I know you think you
do.  :-)  Everyone faces the temptation sooner or later.  But it is a
misuse of triggers, and you'll have the devil to pay in due time.  

Rather than lifting everything back up into the application, I would
suggest moving the SQL into a prepared statement inside a transaction.
It is an example of why I'd like to see SQLite support user-defined
functions (not "stored procedures") for DML purposes.  Before I did
that, though, I would make sure I actually need to persist weeks.  I'm
pretty sure weeks can be computed in a view, and it's not obvious that
performance would be worse.  

If persistence is required, consider maintaining a single offset that
can be added to each week and incremented separately, instead of
incrementing every week.  For example, you could have the first row be
week 0, and each successive row as max(week) - 1, keeping e.g. 0, -1,
-2 (where zero is oldest) and an offset of 2, making 2+0, 2+(-1),
2+(-2), i.e. 2, 1, 0.
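A sketch of that offset scheme (Python's built-in sqlite3 module; table and column names invented).  The point is that "bump every week" collapses to a one-row update:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE weeks(delta INTEGER PRIMARY KEY);   -- 0 is the oldest row
    CREATE TABLE offsets(offset INTEGER NOT NULL);
    INSERT INTO weeks VALUES (0), (-1), (-2);
    INSERT INTO offsets VALUES (2);
""")

def effective_weeks(conn):
    # Each row's week is offset + delta; the stored deltas never change.
    return [r[0] for r in conn.execute(
        "SELECT o.offset + w.delta FROM weeks w, offsets o ORDER BY 1")]

before = effective_weeks(conn)
# "Increment every week" touches exactly one row:
conn.execute("UPDATE offsets SET offset = offset + 1")
after = effective_weeks(conn)
```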

Best advice is to restrict the use of triggers to enforcement of
referential integrity.  For the most part they are obsolete now that
we have DRI (e.g. declared foreign keys) support.  

Why only that?  Triggers fire regardless of application, regardless of
business rule.  They can maintain the internal consistency of the
database.  If you put business rules in triggers, you cannot change
them without affecting all applications and future uses.  Business
rules are less logical and more volatile than RI rules, too, and thus
more prone to change.  

In your case you have some incrementation of week if Procedure = 6 AND
new.Flag = 1.  The database would not be *inconsistent* if that rule
were violated; it would only be inaccurate insofar as it failed to
reflect the world it models.  A different interpretation of world and
model could be used without changing the database or requiring that
particular row relationship. 

HTH.  

--jkl


Re: [sqlite] [SQLite] Support for 23,10 Precision number format

2014-11-13 Thread James K. Lowden
On Thu, 13 Nov 2014 14:38:10 +
Simon Slavin  wrote:

> In summary, if you need ultimate precision, use integers.  If not,
> use 64-bit IEEE 754 like everyone else does without being sued.  If
> you somehow really need 23,10 maths, then you're going to have to
> write your own mathematical library anyway, because I'm not aware of
> any usable libraries which actually support 23,10 outside the world
> of physics.

http://www.mpfr.org/#free-sw

I'm sure you're aware of such things.  I don't believe the OP's problem
statement, but if he's bound and determined to go that route, I don't
see how he's obliged to write the math functions.  Why are none of
these suitable?  

--jkl



Re: [sqlite] Column name as a variable

2014-11-17 Thread James K. Lowden
On Mon, 17 Nov 2014 12:00:06 +
Hick Gunter  wrote:

> SELECT table_name FROM sqlite_master;
> 
> And then, in your programming language of choice, execute

Or, with some determination, you can do it in two steps in pure SQL:
Use SQL to produce SQL, and execute the result, 

SELECT 'select count(*), ''' || name || ''' from '
    || name
    || ' union '
FROM sqlite_master
WHERE type = 'table';

To replace the last 'union' in the result with a semicolon, you could
sling a string in the application, or use a correlated subquery (and
ORDER BY) to supply ';' when e.g. name is max(name).  

To do it in one fell swoop in SQL, you need a virtual table that will
execute SQL for you.  Supply the above as input, and get two columns of
output.  
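Lacking such a virtual table, the two-step route is easy to drive from a host language.  A sketch with Python's built-in sqlite3 module (the sample tables are invented; the table name is quoted into the generated SQL as a literal so each count stays labeled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a(x); CREATE TABLE b(x);
    INSERT INTO a VALUES (1), (2); INSERT INTO b VALUES (3);
""")

# Step 1: use SQL to produce SQL, one SELECT per table.
parts = [r[0] for r in conn.execute(
    "SELECT 'select count(*), ''' || name || ''' from ' || name "
    "FROM sqlite_master WHERE type = 'table'")]

# Step 2: splice the pieces together and execute the result.
counts = dict((name, n)
              for n, name in conn.execute(" union all ".join(parts)))
```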

--jkl



Re: [sqlite] Column name as a variable

2014-11-20 Thread James K. Lowden
On Tue, 18 Nov 2014 12:06:02 +
Simon Slavin  wrote:

> > my requirement, which is using a
> > table name as a variable
> 
> This is deliberately made very difficult in SQL.  I think it's for
> security reasons.

That may be part of it, but it's really all about values.  

SQL has value semantics.  That there are no pointers is a deliberate
choice, made because value semantics are easier to reason about.  

A table is an SQL variable; the name of a table is  a name of a
variable. Not many languages interpolate variable names the way e.g.
Perl does.  When you think of it that way, it's unsurprising that

select T = 'foo';
select count(*) from tableof(T);

doesn't work, nor does anything similar.  Even though it doesn't look
like a C pointer, T above is a pointer in semantic terms: it is not a
table, but putatively a "tablename variable", something that holds the
name of a table (which in SQL is the only way to reference any
variable).  

Given the obscure SQL I've seen, I'm glad not to deal with that kind of
indirection.  

IMO the most confusing aspect is parameterized queries, which IIRC
aren't defined by the SQL standard (not that that matters much).  Once
you can replace data with @param or similar, it seems a small step to
parameterize metadata, too.  But the designers were thinking along
functional lines, 

table = function(parameter)

and again, within that framework, it's pretty rare to find a language
in which the parameter could itself be the name of a variable.  

Note that parameterized metadata would add no value to SQL per se.
Parameterized data yields efficiency and even accuracy gains because
numeric (and other nontext) data can be sent to the server in binary
form without being converted to strings. (In that way parameterized
queries mirror plain old SELECT, which returns binary data.)  Metadata
on the other hand are always text and can be readily parameterized
with any macro processor without reference to datatypes in the
database.  
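The asymmetry shows up in any client library.  A sketch with Python's built-in sqlite3 module (the whitelist is an invented stand-in for whatever validation the application does):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices(amount REAL)")

# Data can be parameterized: the value travels as a binding, not text.
conn.execute("INSERT INTO prices VALUES (?)", (19.99,))

# Metadata cannot: the parser rejects a placeholder where a table
# name belongs, because a table name is the name of a variable.
try:
    conn.execute("SELECT count(*) FROM ?", ("prices",))
    bindable = True
except sqlite3.OperationalError:
    bindable = False

# So metadata is spliced as text, validated by the application.
table = "prices"
assert table in {"prices"}             # crude whitelist, for illustration
n = conn.execute("SELECT count(*) FROM %s" % table).fetchone()[0]
```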

--jkl


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-22 Thread James K. Lowden
On Fri, 21 Nov 2014 14:01:39 -0500 (EST)
Joseph Fernandes  wrote:

> 4) Therefore, we are looking at a datastore that can give us a very
> quick write(almost zero latency, as the recording is done inline
> w.r.t file IO) and that as good data querying facilities(Slight
> latency in the read is fine but the fresh of that record data should
> be spot on).

This strikes me as a classic case for "record, then analyze".  I would
capture the data more cheaply and use SQLite to decide what to do.  

You don't really care if the recorded times are exactly right; a few
missed updates wouldn't affect the cold/hot status very much.  You
should be willing to lose a few if you improve write latency.  OTOH
the maintenance operation isn't *very* time critical; you just can't
afford to walk the whole tree first.  

That suggests two possibilities for capture: 

1.  Keep a sequential file of {name,time} or {inode,time} pairs
(whichever is more convenient to use).  By using O_APPEND you get
atomic writes and perfect captures across threads.  fsync(2) as
desired.  

2.  If in practice the above file grows too large, use a primitive
hashing store such as BerkeleyDB to capture counts by name/inode.  It's
not even obvious you need an external store; you might be able to get
away with std::hash_map in C++ and periodically serialize that.  ISTM
you don't need to worry about concurrency because a few missed updates
here and there won't change much.  

At maintenance time, scoop the file into a SQLite table, and you're
back where you started, except you already have zero
write-time latency.  
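A sketch of that capture-then-scoop cycle (Python; the file name and record layout are invented).  Fixed-size records plus O_APPEND keep each write whole even across writers:

```python
import os, sqlite3, struct, tempfile, time

log = os.path.join(tempfile.mkdtemp(), "io.log")

# Capture side: append fixed-size {inode, time} records atomically.
fd = os.open(log, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
rec = struct.Struct("<QQ")                    # inode, unix timestamp
for inode in (101, 102, 101):
    os.write(fd, rec.pack(inode, int(time.time())))
os.close(fd)

# Maintenance side: scoop the file into a table and query at leisure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events(inode INTEGER, t INTEGER)")
with open(log, "rb") as f:
    data = f.read()
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    (rec.unpack_from(data, i) for i in range(0, len(data), rec.size)))

# "Hottest" file is now an ordinary aggregate query.
hot = conn.execute("SELECT inode FROM events "
                   "GROUP BY inode ORDER BY count(*) DESC").fetchone()[0]
```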

HTH.  

--jkl


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-23 Thread James K. Lowden
On Sat, 22 Nov 2014 23:25:16 -0500 (EST)
Joseph Fernandes  wrote:

> 2) Using the changelog to feed the db has another issue i.e freshness
> of data in the DB w.r.t the IO. Few of our data maintainer scanners
> would require the freshness of the feed to be close to real. [...]
> Your thoughts on this. 

If your in-memory LRU structure suffices to describe all "hot" files,
you're in good shape.  Rather than dumping periodically, I would
consider placing it in shared memory and write a virtual table
function for it in SQLite, such that it can be queried directly as
needed.  

To me based on your description your choice isn't how best to use
SQLite in line with I/O, but how best to capture the data such that
they can be aggregated with SQLite at time of update.  That choice is
one of two: 1) capture each I/O event in a sequential file, always
appending, or 2) maintain per-file counts in a hash or map.  Which is
better depends on how much you're willing to pay for each I/O.  By
paying the lookup cost of #2 each time, the total space is smaller and
the maintenance-time computation less.  

> 3) Now that we would use Sqlite3(with WAL) to be direcly feed by the
> IO path(in the absence of changelog) we are looking to get the best
> performance from it. 

Metadata updates to Posix filesystems are seen as so costly that 
fsync(2) on the datafile descriptor doesn't update them.  A separate
sync on the directory is required.  Compared to an in-memory update
(of metadata, in kernel space) and a single fsync call, the price of a
SQLite transaction is enormous, at a guess an order of magnitude more.
Bear in mind that WAL buys you not efficiency but consistency, the very
thing you don't really need.  The data are written sequentially to the
log and then inserted into the table.  You can expect no better than
O(n log n) performance.  Filesystems generally would never tolerate
that, but for your application you'd be the judge. 

--jkl


Re: [sqlite] Implementing per-value custom types

2014-11-26 Thread James K. Lowden
Darko, 

I have rather a long answer for you.  I'm not confused about the
difference between logical and physical types, but I am confused about
what you meant and what you're hoping to accomplish.  

On Wed, 26 Nov 2014 03:22:03 -0800
Darko Volaric  wrote:

> A 64 bit floating point number and an 64 bit integer can be
> represented by the exact same 64 bit pattern, and no-one would
> suggest they're the same value. 

Because they're different types.  

> You can have those two differently typed although identical bit
> values in the same SQLite column.  

Because SQLite doesn't enforce type-checking.  

> The data identifying the representation of those datums is integral
> to that value and doesn't belong in a different column as you say. 

"The data identifying the representation" is type information.  

> You're assuming that because columns have a domain or type, then that
> domain must have a fixed representation in the database
> implementation. 

I think it's a reasonable design choice for the machines we have.  In
any event, the DBMS's choice of representation is its own.  

Looking at your other replies, I'm not sure I understand what you're
hoping to accomplish.  I answered your assertion that you'd like to
have multi-type columns:

> > I have a need to implement per-value custom typing in SQLite. In my
> > case I may have the integer value 1234 appear in two rows in the
> > same column, but they are different in that they have completely
> > different meanings because they have different types although they
> > have the same representation in the database.

Where you lose me is the idea that two rows in one column might have
identical bit patterns but mean different things. Computers and
databases generally represent values as bits.  Surely two identical
values of one type are equal.  You have to store the distinction
somewhere, right?  

Elsewhere you suggested you want to *extend* the type system, to define
more complex types (for which identical bit patterns are equal).  You
could define them in terms of contraints, 

> A 1 bit integer is a subtype of a 64 bit integer since the set of
> numbers allowed by the 1 bit integer is {0, 1} are contained with the
> set of numbers allowed by 64 bit integers, ie {0..2^64-1}. 

This as you know can already be done using CHECK constraints.   
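For instance, the {0, 1} "subtype" above is one CHECK away (a minimal sketch, via Python's built-in sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Constrain the column's domain declaratively.
conn.execute("CREATE TABLE flags(f INTEGER CHECK (f IN (0, 1)))")
conn.execute("INSERT INTO flags VALUES (0)")
conn.execute("INSERT INTO flags VALUES (1)")

try:
    conn.execute("INSERT INTO flags VALUES (2)")   # outside the domain
    violated = False
except sqlite3.IntegrityError:
    violated = True

count = conn.execute("SELECT count(*) FROM flags").fetchone()[0]
```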

> There is no logical or formal reason why this can't be extended
> further to allow arbitrary subtypes according to the user's wishes.

I think I would argue that's true, provided:

1.  the constraints can be expressed to the evaluation system 
2.  further-constrained types don't necessarily represent a "subtype"

After all, a "1-bit integer" might as well be a Boolean or for that
matter might represent red and green.  You can't add red and green in a
1-bit regime, nor divide, nor increment.  It's a "subtype" only in some
physical sense, the very sense you're saying is irrelevant. 

I also think such constraints can be expressed now using DRI.  Is there
something you'd like to enforce that you cannot?  

Aggregate types are a different matter.  For example, one might want to
have a PDF type.  PDFs can contain certain metadata, such as author or
date.  Comparison could be defined in terms of those metadata;
constraints could require them. If you had a way to signify the type
and comparitor to SQLite, you could extend its type system and
distinguish PDFs from ordinary blobs.  

We can go one step further: a column may hold tables.  Nothing in RM
prevents that either; it's just not supported by SQL.  

I don't think you're going to find much appetite for a user-extensible
type system in SQLite.  It's harder than it seems on both sides,
implementation and use.  And without very deep support -- in the
constraint system, in the SQL syntax -- it would probably do more harm
than good.  

I'm not sure where that leaves us.  It's one thing to distinguish
between logical and physical types.  It's another to extend the type
system.  It's a third to argue that because a logical type could be
represented in more than one way physically, that it should be, in one
system, in one column.  And it's yet a fourth to say that a column
should be able to represent more than one type, be it physical or
logical.  I'm not sure which of those you mean.  

--jkl


Re: [sqlite] Bug report: USBAN failure

2014-12-02 Thread James K. Lowden
On Tue, 02 Dec 2014 15:58:47 +0100
Abramo Bagnara  wrote:

> The point is not about overzealousness, but about the declaration of
> memcpy/memset on your machine.
> 
> If it contains the nonnull attribute then (correctly) UBSan detect
> that such constraint is not respected.

Hmm, I guess you mean this (from my handy LTS Ubuntu box):  

/* Copy N bytes of SRC to DEST.  */
extern void *memcpy (void *__restrict __dest,
 __const void *__restrict __src, size_t __n)
 __THROW __nonnull ((1, 2));

(Documentation at 
https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#index-g_t_0040code_007bnonnull_007d-function-attribute-2263.)

IIUC the declaration specifies the pointer cannot be NULL and the
compiler generates a diagnostic if it notices that it can be.  But the
declaration is strictly stricter than the standard defines.  

--jkl


Re: [sqlite] Bug report: USBAN failure

2014-12-03 Thread James K. Lowden
On Wed, 03 Dec 2014 08:56:44 +0100
Clemens Ladisch <clem...@ladisch.de> wrote:

> James K. Lowden wrote:
> > /* Copy N bytes of SRC to DEST.  */
> > extern void *memcpy (void *__restrict __dest,
> >  __const void *__restrict __src, size_t __n)
> >  __THROW __nonnull ((1, 2));
> >
> > IIUC the declaration specifies the pointer cannot be NULL and the
> > compiler generates a diagnostic if it notices that it can be.  But
> > the declaration is strictly stricter than the standard defines.
> 
> Do you have a standard that allows NULL?  The one I quoted does not.

I'm sure you're right about the standard.  I merely observed that my
not-too-recent copy of string.h includes a nonnull attribute.
Strictly speaking, that attribute may be "wrong" in the sense that the
standard does permit the pointer to be NULL if the length is zero.  A
compiler may nevertheless emit a diagnostic when it comes across one.
For gcc that's triggered with -Wnonnull.  I think that might explain the
message the OP saw.  

What to do is another question.  SQLite can surely ignore it.  If I
felt strongly about it, I'd submit a bug report to GCC because IIUC
the nonnull attribute syntax provides no way to express the constraint
defined by the standard, i.e. "__n > 0 || __nonnull ((1, 2))".  

--jkl


Re: [sqlite] sqlite bugreport : unique index causes valid updates to fail

2014-12-09 Thread James K. Lowden
On Mon, 08 Dec 2014 22:01:15 +0700
Dan Kennedy  wrote:

> On 12/08/2014 09:55 PM, Nico Williams wrote:
> > Ideally there would be something like DEFERRED foreign key checking
> > for uniqueness constraints...
> 
> You could hack SQLite to do enforce unique constraints the same way
> as FKs. When adding an entry to a UNIQUE index b-tree, you check for
> a duplicate. If one exists, increment a counter. Do the opposite when 
> removing entries - decrement the counter if there are two or more 
> duplicates of the entry you are removing. If your counter is greater 
> than zero at commit time, a UNIQUE constraint has failed.

It's not *deferred* constraint checking.  It's constraint checking.
Best to honor the transaction first.  

Rather than adding to the syntax, perhaps a pragma could cause updates
to happen in a transaction: 

1.  Create a temporary table to hold the after-image of the updated
rows. 
2.  begin transaction
3.  Delete the rows from the target table. 
4.  Insert the updated rows from the temporary table.  
5.  commit
6.  drop temporary table. 

Of course there are more efficient answers available deeper in the
update logic, affecting only the partcular columns at the time the
constraint is enforced.  I guess they all involve deleting the
prior set from the index and inserting the new one.  

--jkl


Re: [sqlite] sqlite bugreport : unique index causes valid updates to fail

2014-12-09 Thread James K. Lowden
On Mon, 08 Dec 2014 15:48:41 +0200
RSmith  wrote:

> > UPDATE pages SET position = position + 1 WHERE book_id = 0 AND
> > position >= 1;
> 
> NOT a bug...  the moment you SET position to position +1 for the
> first iteration of the query, it tries to make that entry look like
> (0,2) and there is of course at this point in time already an entry
> like (0,2).

Yes, that's how SQLite works, or doesn't.  Whether or not it's a bug
depends on how you define the term.  

The issue has come up here before: contrary to the SQL standard, SQLite
does not support constraint enforcement with transaction semantics.
I've never heard of another SQL DBMS that behaves that way.  

sqlite> create table T (t int primary key);
sqlite> insert into T values (1), (2);
sqlite> update T set t = t+1;
Error: column t is not unique

As the OP discovered, the one recourse is to relieve the constraint
during the update.  Another is to update a temporary table, and then
delete & insert the rows in a transaction.  I would say "must implement
one's own transaction semantics" is, if not a bug, at least a
misfeature.  
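The temporary-table workaround, sketched against the same two-row table (Python's built-in sqlite3 module):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(t INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
conn.commit()

# The direct update fails row by row, as described above.
try:
    conn.execute("UPDATE t SET t = t + 1")
    failed = False
except sqlite3.IntegrityError:
    failed = True

# Workaround: stage the after-image, then delete & insert together.
conn.execute("CREATE TEMP TABLE staging AS SELECT t + 1 AS t FROM t")
conn.execute("DELETE FROM t")
conn.execute("INSERT INTO t SELECT t FROM staging")
conn.commit()
conn.execute("DROP TABLE staging")

result = [r[0] for r in conn.execute("SELECT t FROM t ORDER BY t")]
```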

--jkl


Re: [sqlite] How do I update multiple rows in a single sql statement

2014-12-09 Thread James K. Lowden
On Mon, 08 Dec 2014 20:57:00 -0500
Igor Tandetnik  wrote:

> Yes, there are workarounds (a view; or REPLACE INTO may sometimes be 
> pressed into service). But I, for one, kinda miss UPDATE ... FROM.

Be careful what you wish for.  :-)  

The only implementation of UPDATE...FROM that I know is on SQL Server
(Sybase & Microsoft).  If the join criteria are satisfied by more than
one row, each successive value is applied.  The result is
nondeterministic, produces no diagnostic, and cannot be prevented.  

The standard SQL syntax -- while verbose, granted -- at least gets the
right answer.  If the subquery to the right of the SET clause produces
more than one row, the statement fails.  With SQL Server's syntax, it
succeeds with the target holding the "last" value, whatever that was.  

--jkl


Re: [sqlite] Bugreport - slowdown in sqlite after the ANALYZE statement

2014-12-09 Thread James K. Lowden
On Tue, 09 Dec 2014 12:06:20 +0100
Jan Staněk  wrote:

>  INSERT INTO CoreCache (ModelID, ItemID)
>  SELECT
... 
>  ORDER BY Year

Why ORDER BY on INSERT?  Does it work better?  I would expect the
unnecessary sort to be pure overhead.  

--jkl


Re: [sqlite] Bugreport - slowdown in sqlite after the ANALYZE statement

2014-12-11 Thread James K. Lowden
On Wed, 10 Dec 2014 08:49:21 +0100
Eduardo Morras  wrote:

> > Why ORDER BY on INSERT?  Does it work better?  I would expect the
> > unnecessary sort to be pure overhead.  
> 
> If you insert in correct index order, the index update phase is
> faster because it doesn't need to rebalance the b-tree so often
> after each insert.

OK, but at the cost of sorting the input first.  Rebalancing a tree
requires diddling a few pointers.  Sorting in the average case is 
O(N log N), and possibly worse, plus the attendant I/O.  

--jkl



Re: [sqlite] How do I update multiple rows in a single sql statement

2014-12-11 Thread James K. Lowden
On Tue, 09 Dec 2014 10:46:23 -0500
Igor Tandetnik <i...@tandetnik.org> wrote:

> On 12/9/2014 10:38 AM, James K. Lowden wrote:
> > If the subquery to the right of the SET clause produces
> > more than one row, the statement fails.
> 
> Are you sure? Normally, a scalar subquery doesn't fail when the 
> resultset contains more than one row - it just silently produces the 
> value from the first row of the first column. 

Well, I *was* sure.  I don't know about "normally", but you're right
that SQLite gets it wrong, see below.  I'm pretty sure the standard
calls for a diagnostic anywhere a scalar is required and not provided.  

There is a workaround worth knowing: if you add, 

group by k having count(*) = 1

to the UPDATE statement below, it works correctly in the sense that
it becomes deterministic.  A separate check is required of course to
determine if there were any count(*) > 1.  

[snip]
create table T ( k int primary key, v string );
create table S ( k int, v string, primary key( k,v) );
insert into T values (1, 'a'), (2, 'b');
insert into S values (1, 'y'), (1, 'z');
select * from T;
k   v 
--  --
1   a 
2   b 
select * from S;
k   v 
--  --
1   y 
1   z 
select * from T join S on T.k = S.k;
k   v   k   v 
--  --  --  --
1   a   1   y 
1   a   1   z 
update T 
set v = (select v from S where k = T.k)
where exists (
  select 1 
  from S where k = T.k
);
select * from T;
k   v 
--  --
1   y 
2   b 
[pins]

--jkl


Re: [sqlite] Feature Request - RowCount

2014-12-13 Thread James K. Lowden
On Sat, 13 Dec 2014 14:15:15 +0200
RSmith  wrote:

> Most DB Admin tools out there displays the number of rows in a table
> when you select it or open it, so too the one I am working on and
> after testing stuff on Simon's question about the row counting, I
> realised that selecting a large table always pauses somewhat (with
> suitable progress bar) to simply open and report the usual
> statistics. The culprit of course being row-counting.  

Every DB Admin tool I've ever used proved to be more hindrance than
help.  They seem to be written by the moderately competent to help the
novice, and run out of gas or fall over when faced with anything
complex.  So the number of tools with feature X is no measure of the
value of X.  (Notable example: the tool should keep every query and
result in a time-sequenced transcript log, so that prior results can be
re-examined and prior queries modified.  Most tools disassociate query
from output and invite the user to modify the query in-place,
destroying the prior.)  

My first question, then, is whether or not the rowcount is so
interesting that it must be known before a table can be operated on.
I suggest the answer is No.  The relative & approximate sizes of the
tables is known to the admin in most cases and, when it is not, the
information is readily discovered on a case-by-case basis.  

Would a proxy figure do?  Is it enough to know the number of pages or
bytes allocated to a table?  I don't know if such is available, but if
it is perhaps that would serve your purpose.  Otherwise your user
might be satisfied by lazy evaluation: run your count(*) query on a
fork and provide the answer through a pipe or similar, in such a way
that the user can cancel it if it's placing too high a load on the
system.  
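One such proxy, if the table is insert-mostly and uses ordinary rowids: max(rowid) is a single b-tree descent rather than a scan.  A sketch (Python's built-in sqlite3 module; note it becomes an overestimate once rows are deleted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big(x)")
conn.executemany("INSERT INTO big VALUES (?)", ((i,) for i in range(1000)))

# Exact count: a full scan of the table or an index, O(n).
exact = conn.execute("SELECT count(*) FROM big").fetchone()[0]

# Cheap proxy: one descent to the rightmost leaf of the rowid b-tree.
estimate = conn.execute("SELECT max(rowid) FROM big").fetchone()[0]

# After deletions the proxy only bounds the count from above.
conn.execute("DELETE FROM big WHERE x < 10")
still = conn.execute("SELECT max(rowid) FROM big").fetchone()[0]
```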

That said, I'm puzzled why rowcount isn't maintained and exposed in
SQLite as part of a table's metadata, particularly when indexes/keys are
present.  The cost of maintaining a rowcount is small, in terms of
computation and complexity.  ISTM it is valuable information to the
system itself in evaluating query-plan costs.  The "because it's lite"
argument doesn't hold water insofar as applications are paying a high
cost (even if somewhat optimized) for a common query.  

--jkl


Re: [sqlite] Client/Server Best Practices

2015-01-02 Thread James K. Lowden
On Thu, 25 Dec 2014 05:32:45 -0700 (MST)
Rick Kelly  wrote:

> All SELECT type requests are wrapped with BEGIN TRANSACTION/COMMIT 

That shouldn't be necessary and afaik isn't necessary.  SELECT does not
modify the database.  To "commit a select" is to apply the nonchanges.  

A common misconception is that BEGIN TRANSACTION "takes a lock" in some
sense.  It doesn't; it marks a point in logical time that will be
concluded with COMMIT/ROLLBACK.  Locks, if any, are implicit in the
SELECT itself.  

--jkl


Re: [sqlite] New column in select will not mask column of the same name in having clause and sqlite won't warn

2015-01-02 Thread James K. Lowden
On Sun, 28 Dec 2014 17:46:08 +0100
Tomas Telensky  wrote:

> select kvadrat, datum, count(distinct kontrola) as pocet
> from b
> group by kvadrat, datum
> having pocet > 1
> 
> The problem was that pocet was actually a column in table b and I
> didn't notice, and the having clause was using the table column
> instead of the newly derived column specified in select clause.
> 
> So far so good, but sqlite should at least issue any warning, right?

I would say it should raise an error.  The HAVING clause should include
at least one aggregate.  Comparing a column to a constant is the job of
WHERE.  

The accepted syntax is ambiguous.  Was the HAVING applied before or
after the aggregation?  IOW, did you get 

1.  the count for each {kvadrat, datum} pair for which pocet > 1, or
2.  the count of {kvadrat, datum} pairs that have at least one pocet > 1

?

In the first case the counts would be smaller by the number of rows for
which pocet <= 1.  In the second case results rows would be eliminated
for pairs that contain only rows for which pocet <= 1.  

--jkl


Re: [sqlite] Is this date supposed to be less than or more than the other?

2015-01-02 Thread James K. Lowden
On Fri, 2 Jan 2015 16:12:23 -0800
J Decker  wrote:

> I understand it's kept as a string... 

It might be more helpful to think of it not in terms of how it's "kept"
but as what its type is.  How it's kept is up to the DBMS to decide.
But the column is of a type: one of text, integer, and double.  Even
though that type isn't enforced in SQLite, it creeps in from time to
time.  You're comparing two values whose type is 'text', and that's how
they're being compared.  

> and there's no internal functions for this

There are in fact several functions, as you know, among them 
datetime().  (I consider them "internal", since they come with the
base system.)  Once you use one to convert your strings to another type
-- perhaps datetime -- then you're not comparing strings anymore.  

ISTM what you really want is to define the column as a datetime type,
regardless of how it's "kept", so that comparisons would be based on
the represented time, not as strings.  That's not a feature of SQLite.  
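
A short demonstration of the point (table and values invented for the
example): ISO-8601 text happens to sort in date order, which is why it's
the usual recommendation, while other formats compared as strings do
not.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events(d TEXT)")
db.executemany("INSERT INTO events VALUES (?)",
               [("2014-12-31",), ("2015-03-01",)])

# ISO-8601 text compares in date order...
iso_ok = db.execute("SELECT '2015-03-01' > '2014-12-31'").fetchone()[0]

# ...but US-style text does not: '03/...' sorts before '12/...'
# even though March 2015 is later than December 2014.
us_wrong = db.execute("SELECT '03/01/2015' > '12/31/2014'").fetchone()[0]

print(iso_ok, us_wrong)   # 1 0
```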

--jkl


Re: [sqlite] New column in select will not mask column of the same name in having clause and sqlite won't warn

2015-01-07 Thread James K. Lowden
On Mon, 5 Jan 2015 06:39:42 +
Hick Gunter  wrote:

> This is completely legal and well defined.
> 
> HAVING is applied to the RESULT set of a SELECT.

I beg to differ.  It's both invalid SQL and (therefore) undefined.
Furthermore, it's illogical.  Consider:

create table T (a int, b int, c int);
insert into T values (1, 2, 1);

select count(*), a from T group by a having b > 0;

Surely "select count(*), a from T" produces 1 row, 

count(*)    a
--------    -
       1    1

but what does "having b > 0" mean if "HAVING is applied to the RESULT
set of a SELECT"?  There is no B!  

If ad argumentum we say B refers to the B in the table, the question
remains: how to interpret the having clause?  Is it TRUE if 

1.  there exists a row for a given value of A for which B > 0? 
2.  all rows for a given A have B > 0? 

It so happens (per http://sqlfiddle.com/#!5/34e8d/1) that sqlite
accepts the query and uses interpretation #2.  But that is arbitrary:
HAVING is not a FORALL or EXISTS quantifier.  It has a single, clunky
job: to test aggregates.  

Most SQL DBMSs reject the query outright.  As well they should, because
it is nonsensical.  
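
The example above is easy to check with Python's sqlite3 module (a
sketch; note that with a single row both interpretations coincide, which
is exactly why the ambiguity is easy to miss):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE T (a INT, b INT, c INT)")
db.execute("INSERT INTO T VALUES (1, 2, 1)")

# SQLite accepts a bare, unaggregated column in HAVING rather than
# raising an error; b is taken from some row of the group.
rows = db.execute(
    "SELECT count(*), a FROM T GROUP BY a HAVING b > 0").fetchall()
print(rows)   # [(1, 1)]
```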

--jkl


Re: [sqlite] Client/Server Best Practices

2015-01-07 Thread James K. Lowden
On Fri, 02 Jan 2015 21:41:02 -0700
"Keith Medcalf" <kmedc...@dessus.com> wrote:
> On Friday, 2 January, 2015 16:26, James K. Lowden
> <jklow...@schemamania.org> said:
> 
> >On Thu, 25 Dec 2014 05:32:45 -0700 (MST) Rick Kelly
> ><rpke...@gci.net> wrote:
> 
> >> All SELECT type requests are wrapped with BEGIN TRANSACTION/COMMIT
> 
> >That shouldn't be necessary and afaik isn't necessary.  SELECT does
> >not modify the database.  To "commit a select" is to apply the
> >nonchanges.
> 
> It does not matter whether it modifies the database.  "reading" the
> database requires a lock -- a shared lock.  "updating" the database
> requires a "write" lock, which precludes obtaining a "shared" lock.

When I first read your answer my reaction was, "yes, yes, of course".
But I'm not sure where that leaves the OP.  Are you suggesting select
statements work "better" in some sense with autocommit turned off?  

In passing I suggest *requires* is not true in general.  It may be that
SQLite and other implementations use locks to mediate access and
implement ACID semantics.  Locks are just one popular way to accomplish
that, not the only one.  

Your description of transaction implementation is illuminating.  If I
understand correctly, you're describing an odd set of
design choices.  

> BEGIN TRANSACTION does not acquire a lock -- BEGIN IMMEDIATE does
> that -- BEGIN TRANSACTION merely turns off autocommit, meaning that
> the lock will not be released magically, but rather by an explicit
> COMMIT (which itself does not do anything -- it merely turns
> autocommit back on so that the next statement will commit the
> transaction before magically acquiring a new lock).  

I find this very peculiar.  You aren't saying that

begin transaction;
insert into T values (1);
commit;
[sqlite3_close]

leaves the database unchanged (because there's no "next statement")?
If not, and there's a power outage between (successful) commit and
closing the connection, what will be the state of the database on
restart?  Is the transaction still open or, if not, is it rolled
forward and completed, or rolled back?  

> However, execution of a SELECT statement does cause a lock to be
> obtained (a shared lock) and a COMMIT does cause that shared lock to
> be released.  

Again, I find this surprising.  I would expect SELECT to 

1.  establish a shared lock
2.  select the data
3.  release the lock

whether or not BEGIN is called.  If I understand what commit does per
your description above, in a "transaction", the effect would be

1.  BEGIN TRANSACTION (autocommit off)
2.  SELECT (take shared lock)
3.  data data data
4.  [SELECT  done] (release shared lock)
5.  COMMIT (autocommit on)

which leaves steps #1 and #5 redundant.  

> Executing an UPDATE after a SELECT -- in the same connection -- (or
> while a select is in progress) will escalate the SHARED lock to a
> WRITE lock.  

OK, this is typical. 

> COMMIT will release "the lock" -- "the lock" is now a WRITE lock, not
> a shared lock.  Therefore the next _step() will be executing without
> any lock at all leading to apparently undefined results (really an
> error should be thrown "SQLITE_DATABASE_UNLOCKED_FROM_UNDER_ME" or a
> MISUSE error, but is not). 

Good to know.  I hope we agree this is unlovely, and intersects with
SQLite's unfortunate property of not isolating SELECT as atomic.  (By
which I mean: if there are 10 rows in a table and connection 1 issues
"SELECT * FROM T" and between calls 4 & 5 to sqlite3_step another
connection 2 issues "DELETE FROM T", the first process may get 4 or 5
rows, or 10, depending on luck of the draw.)  

The right behavior is not a misuse error.  How is it "misuse" for two
connections to read and update the same table in overlapping time?  

The right behavior is to isolate SELECT from UPDATE.  Let the reader see
what was in the database at the time the SELECT was issued, unaffected
by UPDATE.  The COMMIT associated with UPDATE should not affect the
SELECT's lock; rather it should pend until SELECT completes.  Then its
own lock acquires rights to the resources it needs, and is released when
the work is done.  

> This is inherent in how WAL works.  Just because WAL is not in effect
> does not alter the fundamental workings of the transaction system.

Not sure how to parse that.  I think you mean it's inherent in how
transactions work, whether or not WAL is used?  

> I do not believe that there is a way to specify "COMMIT BUT MAINTAIN
> THE SHARED LOCK", (that is, to commit the changes only and
> un-escalate the lock back to a shared lock) 

No such syntax is needed if ACID semantics are respected. In SQL

Re: [sqlite] Client/Server Best Practices

2015-01-09 Thread James K. Lowden
On Wed, 07 Jan 2015 21:47:24 -0700
"Keith Medcalf"  wrote:

> >As I said, your description (which I trust is accurate) is very
> >helpful to someone who wants to understand how SQLite will act on
> >the SQL provided to it.  But it also protrays problematic choices
> >that stray from SQL's defined behavior.
> 
> Not really.  It only shows that updating the database on the same
> connection as is being used to run a select, and attempting to commit
> the update before the select is complete results in chaos.  I think
> you would agree that the sequence:
> 
> _prepare('select ...')
> do while _step() != DONE
>_step(_prepare('BEGIN'))
>_step(_prepare('UPDATE ...'))
>_step(_prepare('COMMIT'))
>continue
> 
> should not be valid.  The BEGIN COMMIT should be moved outside the
> loop unless the connection on which the select is processed is
> separate from that on which the BEGIN/UPDATE/COMMIT is performed.

Yes, I agree that shouldn't be valid.  Or, if it is, then it should
work correctly!  ;-)   

Thanks for the clarification.  The salient point, as you emphasized, is

> There is only one connection.  

I tend to forget that SQLite behaves oddly sometimes when misused.  We
had a discussion months ago about approximately the same thing, where
threads were sharing a connection.  

Sybase works similarly.  The server insists that each query be handled
in its entirety before it accepts the next.  Attempting to issue an
UPDATE per your example above would result in the dread "pending
results" error.  

Along the same lines, since you mentioned it, 

>  in an SQLite database you can do:
> 
> BEGIN
> SELECT ...
> SELECT ...
> SELECT ...
> SELECT ...
> COMMIT
> 
> and the view of the database seen by this connection will be
> consistent even though "some other process" modified the tables used
> in query 3 while query 2 was executing.  Even if those changes are
> commited by the writer process, the above process will not see them
> until the COMMIT releases the locks.  I expect other databases do
> this as well.  

In other DBMSs that I know of -- and afaik standard SQL -- BEGIN
TRANSACTION has no effect on SELECT. Each SELECT is atomic, of course,
but there's no way to hook two SELECTs together to make them see one
database state. That's what JOIN is for.  :-)  

A common error among inexperienced programmers is to assume that BEGIN
TRANSACTION works as you describe above: that it isolates SELECT from
competing updates.  But a transaction is not a lock!  A better mental
model is the one defined in the standard: BEGIN TRANSACTION defines a
unit of work that either will or will not be applied to the database in
its entirety (depending on COMMIT or ROLLBACK, of course).  It has a
role to play in maintaining database integrity.  It really has nothing
to do with isolation.  

Interesting discussion, Keith.  Thanks for the clarification.  

--jkl


Re: [sqlite] New column in select will not mask column of the same name in having clause and sqlite won't warn

2015-01-09 Thread James K. Lowden
On Thu, 8 Jan 2015 08:42:25 +
Hick Gunter  wrote:

> It is legal and well defined in SQLite. See the explain output below.
> This is because of the well-documented feature of SQLite that columns
> that are neither GROUPED BY nor aggregated will have a defined value.

OK, understood, "in SQLite".  Thanks.  IIUC, I wouldn't call it "well"
defined because of the effect of the ungrouped column: 

> the b value returned for each a will be (an arbitrary one) 

so two SELECTs taking advantage of this aspect of SQLite may return
different results for the same data.  

--jkl


Re: [sqlite] Client/Server Best Practices

2015-01-16 Thread James K. Lowden
On Sat, 10 Jan 2015 00:58:25 -0700
"Keith Medcalf"  wrote:

> > there's no way to hook two SELECTs together to make them see one
> >database state. That's what JOIN is for.  :-)
> 
> Yes, it is a part of the SQL Standard isolation levels in excess of
> the default default of READ COMMITTED.  Different vendors call the
> isolation levels by different names -- MS SQL Server calls them Read
> Uncomitted, Read Committed, Repeatable Read, Snapshot, and
> Serializable; DB/2 calls them No Commit (similar to autocommit),
> Uncommitted Read, Cursor Stability, Read Stability, Repeatable Read.
> DB/2 Repeatable Read = MS Serializable (which is the SQL Standard
> Serializable isolation level).  Cursor Stability = Read Committed.

Keith, I just want to thank you for relieving me of my
misunderstanding.  I think some years ago I must have decided that for
my purposes Repeatable Read & friends don't exist because of their
effect on performance and concurrency.  I've always thought the
standard was paradoxical in the sense that the more you need isolation
(because of concurrent updates), the less useful it is to the extent
that restricts the system's abiltity to support concurrent updates.  

To bring it back to SQLite, we have two logging modes and two
isolation levels.  IIUC, given two connections it can be boiled down
thusly:

mode      transaction    isolation
journal   implicit       Read Committed
WAL       implicit       Read Committed
journal   explicit       Repeatable Read
WAL       explicit       Repeatable Read

with the difference being concurrency: 

* in journal mode a writer is blocked by a reader, whereas 
* in WAL mode a writer can commit while a reader continues to work,
in isolation, unaffected.  
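
A sketch of the WAL case with two connections, using Python's sqlite3
module (autocommit disabled via isolation_level=None so BEGIN/COMMIT are
explicit; the file path is a throwaway):

```python
import os, sqlite3, tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t(x)")
writer.execute("INSERT INTO t VALUES (1)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")                        # explicit transaction
before = reader.execute("SELECT count(*) FROM t").fetchone()[0]

writer.execute("INSERT INTO t VALUES (2)")     # commits while reader works

during = reader.execute("SELECT count(*) FROM t").fetchone()[0]
reader.execute("COMMIT")                       # snapshot released here
after = reader.execute("SELECT count(*) FROM t").fetchone()[0]

print(before, during, after)   # 1 1 2
```

The reader's second SELECT still sees one row even though the writer has
committed a second; only after COMMIT does the new row appear.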

--jkl



[sqlite] sqlite3 shell in windows

2015-01-16 Thread James K. Lowden
Hello all, 

I had the bright idea yesterday of trying to use an extension module in
Windows.  I found myself a bit confused, and the messages and
documentation were not as helpful as they might have been.  I suspect I
had a 32/64 bit mismatch in one case, and that sqlite3 wasn't compiled
with extension-module support, but I was unable to prove that to
myself.  

I downloaded and built my own code and the csvfile extension.  I was
using the windows binary downloaded from sqlite.org. The only messages
I managed to produce using the ".load" command were:

Error: The specified module could not be found
Error: The specified procedure could not be found

IIUC these are not SQLite messages; they are OS messages.  I certainly
hope so, because they have that patented Microsoft je ne sais quoi:
something went vaguely wrong.  Very vaguely.  

At first I thought I had a pathname problem, but eventually convinced
myself I was trying to load a 64-bit extension in a 32-bit process.  It
was the growing pile of broken pointers under my desk that clued me
in.  

When I managed to coerce Visual Studio to build a 32-bit DLL, it
elicited the second message, also from the OS.  At the time I thought
the shell as compiled might not support extensions. ISTR that being
mentioned on this very list once upon a time.  I might also have not
really produced a DLL; maybe it was just an executable named .dll. I
don't have the compile steps accessible as I write this, and it's easy
enough to verify when I'm back in the office.  

Things I would have liked to have been able to display on the screen:

1.  The architecture of an executable file, x86 or x64.  
2.  The compilation options used when building binaries posted on the
download page.  In particular, SQLITE_OMIT_LOAD_EXTENSION.  
3.  The name of the module not found. 
4.  The name of the procedure not found. 

Things I would like to verify here:

5.  The 32-bit windows sqlite3 shell supports extensions? 
6.  The above messages come from the OS, and result from LoadLibrary
failing?  IOW, I didn't build a proper extension, windowswise?  
7.  How does the shell behave if SQLITE_OMIT_LOAD_EXTENSION is used?
Obviously the function would fail, but I wonder if ".load" is dropped
from the help and parseable syntax?  
8.  Where is $HOME in windows, wrt .sqliterc?  I tried %appdata%; that
didn't seem to be it.  

Many thanks for clues, instructions, and pointers.  

--jkl


Re: [sqlite] sqlite3 tool bug

2015-01-16 Thread James K. Lowden
On Fri, 16 Jan 2015 10:38:54 -0800
Dave Dyer  wrote:

> [$] sqlite3 po.sqlite .dump | sqlite3 po2.sqlite
> Error: incomplete SQL: INSERT INTO "imageblob" VALUES(1,'G:\share

Perhaps try -echo, to display the incomplete SQL?  

I'm skeptical of the notion that cmd.exe is diddling with your data en
route to the pipe.  I can't think of a time Windows munged my data in
that particular way despite more years using that lousy tool than I
care to remember.  Quotes and escapes, sure, don't get me started. 

Shot in the dark: maybe a string is being continued by ending the
line with a backslash.  If the output handle is opened as text with
fopen, the sequence would be 

5c 0d 0a

which the escape-reader wouldn't recognize, expecting only 

5c 0a

leading to a noncontinued, incomplete line.  

But that doesn't explain the difference between redirecting to a file
and redirecting to a pipe.  

HTH.  

--jkl




Re: [sqlite] sqlite3 shell in windows

2015-01-17 Thread James K. Lowden
On Fri, 16 Jan 2015 14:31:40 -0800
Random Coder  wrote:

> If you're seeing the "Error: The specified procedure could not be
> found." error, and you're not specifying an entry point in the .load
> command, then no doubt the sqlite3_load_extension symbol isn't
> properly exported.  I'd verify that your DLL has this symbol exported
> using a tool like depends (http://www.dependencywalker.com/).  I'm
> guessing you're missing an entry in your .def file, or a similar
> location.

Bingo, and thanks for the pointer.   I forgot that DLL symbols have to
be exported explicitly.  Up until now my module was compiled only for
NetBSD, and lacks any dllexport notation.  

--jkl


Re: [sqlite] sqlite3 shell in windows

2015-01-17 Thread James K. Lowden
On Fri, 16 Jan 2015 16:24:21 -0700
"Keith Medcalf"  wrote:

> >1.  The architecture of an executable file, x86 or x64.
> 
> dumpbin -- comes with the dev kit

I would have thought so, but I didn't find an option that reports it.  

> Importantly make sure you are exporting "C" names.  

I'm pretty sure, but thanks for the reminder.  IIRC files ending in .c
are compiled as C by default.  Definitely I forgot an explicit export.  

> Telling you the name of the service and the name of the file (and
> where it expected to find it) would be useful, but I think rule 1
> prevails "Everything Useful is Prohibited".  

Don't I know it.  Amen, brother, say it!  

> The compiler is called CL and it can give you its help with "cl -?".
> There is a silly batch file somewhere in the VS install directories
> that set all the environment variables properly so you can then use a
> real editor and compile from the command line.

So silly it is that when I choose the "x86 command-line" option in VS,
it doesn't DTRT.  I had to reverse engineer which vsvars32.bat to run.  

I decided to give http://mxe.cc a try.  News at 11.  

--jkl


[sqlite] OT: cmd.exe (was: sqlite3 tool bug)

2015-01-18 Thread James K. Lowden
On Sat, 17 Jan 2015 20:20:06 +
Graham Holden  wrote:

> > I'm skeptical of the notion that cmd.exe is diddling with your data
> 
> I would echo this: it's good at mangling the command-line, but I've
> not been aware of it ever mangling data sent to a file/stream (other
> than the binary/text mode conversions).

cmd.exe holds the camel-straw that convinced me to give up on Windows as
a development platform.

Say you're a Perl programmer and associate .PL files with perl.exe so
that you can put .pl files on your PATH.  (The documentation for assoc
and ftype specifically mentions Perl as an example.)  When launched
through file-extension association, cmd.exe starts the process with all
files closed (instead of three open). There is absolutely nothing you
can do about it.  

Of course I'm not the first or only one to realize this, nor the only
one inconvenienced.  Of course the documentation doesn't mention that
if invoked in this way your program ceases to be a filter,
functionality every Perl programmer understands (and most expect).
And of course the problem has remained unaddressed for decades.  

Thanks for listening.  

--jkl


Re: [sqlite] Invalid column prefix returned in SELECT with joined subquery

2015-01-25 Thread James K. Lowden
On Sun, 25 Jan 2015 23:18:05 +0200
RSmith  wrote:

> There is no documentation in either SQLite or the SQL standard 
> that would lead anyone to believe that behavior is expected - in fact
> it is very clear about the returned column names being
> non-deterministic if not explicitly requested by aliasing.

I no longer think that is entirely true.  In SQLite's documentation,
it's not in the description of the SELECT statement that I can find
(http://www.sqlite.org/lang_select.html).  It is mentioned in the
sqlite3_column_name description, so, OK, it's not a bug.  

But I'm going to say that a function that "returns the name" of a
column really should return the column's *name*.  Especially if that
name would suffice to address the column syntactically if the same
statement were wrapped one more level, i.e.: 

select * from (<subquery>) as T

Otherwise there's a weird mismatch between the names SQL itself uses and
the ones presented to the user.  

My reading of what I can find of the standard makes me think that if
this is left up to the implementation, it's a molehill we can't use to
shout from.  Definitely it's a POLA pain point from the user's
perspective.  

The standard is proprietary, but the old SQL-92 draft is online at
http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt.  Part 6.4
describes a "column reference", and distinguishes a column *reference*
from its name: 

  <column reference> ::= [ <qualifier> <period> ] <column name>

  <qualifier> ::=
        <table name>
      | <correlation name>

So there's no such thing (in case anyone thinks there is) as a "fully
qualified name" in SQL.  There are names and references.  Sometimes a 
reference is letter-for-letter identical to the name; sometimes the
reference is longer to distinguish one name from another belonging to a
differerent entity.  

What should the "name" be that is returned to the user?  Going back to
my example, 

select <select list>
from (<subquery>) as T

each name in <select list> should be the shortest possible syntactically
correct column reference.  A <select list> that designates a
unique set of names (by whatever means) thus yields an unambiguous,
reasonable result. One that doesn't is more problematic to the user,
but then he's playing a little fast and loose in the first place.  If
there is no syntactically correct reference for a particular column
then, sure, make one up.  No one can complain about that.  

Implementation of such a policy would not break any existing
applications.  It might even yield improvements to SQLite internally
by making names more dependable, and by distinguishing more strongly
between the column's name and its syntactic reference.  

Humbly submitted.  

--jkl

P.S.  I tried testing this with sqlfiddle to get a vote, too.  But
they must have a problem with their output column logic; afaict
same-named columns are ignored after the first one (as it were).  If
you cross-join a table to itself, you get only one set of columns, not
two.


Re: [sqlite] Invalid column prefix returned in SELECT with joined subquery

2015-01-25 Thread James K. Lowden
On Mon, 26 Jan 2015 02:28:33 +
Simon Slavin  wrote:

> > each name in <select list> should be the shortest possible syntactically
> > correct column reference.
> 
> While you're discussing possibilties and alternatives, what should be
> returned as the name for the following column
> 
> SELECT 1*2 FROM myTable

To quote me,  ;-)  

>> If there is no syntactically correct reference for a particular
>> column then, sure, make one up.  No one can complain about that.  

Acknowledged, not every selected column will have a name in the catalog
or designated alias.  For those that do, though, I think both the
standard and common sense, not to mention logical consistency, indicate
the column ought to be known by its name.  That unifies how the user
references it in SQL and in the API.  What could be simpler?  
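
What SQLite actually reports is easy to inspect: Python's sqlite3 module
surfaces sqlite3_column_name through cursor.description.  A sketch
(table invented for the example; I make no claim about what name the
unaliased expression gets, only that named and aliased columns come back
by name):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(a INT)")

# One plain column reference, one bare expression, one alias.
cur = db.execute("SELECT t.a, 1*2, a AS alias FROM t")
names = [d[0] for d in cur.description]
print(names)
```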

--jkl


Re: [sqlite] Invalid column prefix returned in SELECT with joined subquery

2015-01-26 Thread James K. Lowden
On Mon, 26 Jan 2015 09:05:32 +0200
RSmith  wrote:

> Understand, I do not think these are insurmountable problems, but two
> questions arise:
> - Who decides the rules for handling it so that it may become
> "trusted" by DB users/admins/programmers, if not the SQL standard? 

My reading of the standard is that the column *name* is what's
returned.  As such, SQLite would be correct to return a name where
there is one, and none where there is not, without distinguishing
duplicates.  (FWIW that's what SQL Server does.)  If SQLite did that in
the next release, no applications would break, and new applications
would benefit from deterministic, standard column names.  

My suggestion is a little more complex.  I offered it because it seemed
to be in the spirit of SQLite's current behavior (assign some kind of
name to every column) with the benefit of being deterministic.  

It's perfectly fine for SQLite to choose nonstandard behavior, and
there's no question that the documentation in this case is clear.  I
think it's a stretch, though, to claim the current behavior is
permitted by the standard.  

I doubt anyone considers it a great feature.  It is clearly confusing,
witness the topic arising here about once a month.  It's more of a
quirk: predictable and easy to deal with if you know about it.  

> how many cpu cycles might be spent to finding suitable column names

I'm not convinced there's any real performance issue at stake here.
No one's said so.  My impression is the current behavior is more an
engineering artifact. Because it has a pretty simple workaround, it's
being left as-is for now.  Which is also understandable.  

--jkl


Re: [sqlite] Converting *.sqlite3 files

2015-01-27 Thread James K. Lowden
On Mon, 26 Jan 2015 19:26:49 +
Luke Niewiadomski  wrote:

> I am looking to translate *.sqlite3 files into *.csv, or similar
> format.  Would you be willing to point me in the right direction?  I
> appreciate any help on this.  

for T in $(sqlite3 -noheader "$DB" \
        "select name from SQLITE_MASTER where type = 'table'")
do 
    sqlite3 -csv "$DB" "select * from $T" > "$DB.$T.csv"
done

That's the command you want, subsituting your database name for "$DB".
I can appreciate that it's not easy to read if you don't know Bourne
shell syntax, but if you can find someone who does, that will produce
one csv file for every table in the database.  
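
If the shell isn't an option, the same thing can be sketched in Python
with the standard library (the <db>.<table>.csv naming below just
mirrors the shell version; adjust to taste):

```python
import csv, os, sqlite3, tempfile

def dump_csv(db_path):
    """Write one CSV file per table, named <db_path>.<table>.csv."""
    db = sqlite3.connect(db_path)
    tables = [r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        cur = db.execute('SELECT * FROM "%s"' % t)
        with open("%s.%s.csv" % (db_path, t), "w", newline="") as f:
            w = csv.writer(f)
            w.writerow([d[0] for d in cur.description])  # header row
            w.writerows(cur)
    return tables

# Tiny demonstration with a throwaway database.
demo = os.path.join(tempfile.mkdtemp(), "demo.db")
con = sqlite3.connect(demo)
con.execute("CREATE TABLE pets(name, legs)")
con.execute("INSERT INTO pets VALUES ('rex', 4)")
con.commit()
con.close()
print(dump_csv(demo))   # ['pets']
```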

HTH.  

--jkl


Re: [sqlite] Truncation of floating point numbers in SQLite?

2015-01-28 Thread James K. Lowden
On Wed, 28 Jan 2015 23:09:21 +
Simon Slavin  wrote:

> > This is a bit of a speculative question related to a problem I'm
> > having - are there legal values of a C++ double that would get
> > truncated when written into and read from an SQLite database?
> 
> In theory there should be no problem here.
> 
> C doubles have 15 to 16 digits of precision.
> 
> In SQLite databases, numbers which can't be stored as integers are
> stored as IEEE 754-2008 64-bit floating point numbers, sometimes
> known as 'binary64'.  These give 15 to 17 digits of precision.

We can make stronger statements than that, can't we?  It's not like
there's some mystery about it: database and memory share a single
floating-point format.  

If the caller uses _bind_double and _column_double, he's storing and
fetching a 64-bit double.  SQLite's only job is to collect those 8
bytes, store them, and return them unmolested.  It has no private
definition of "double", offers no conversion between different
representations of double.  The bit pattern -- IEEE 754-2008, as you
say -- is defined by the *hardware*.  That format is shared up and down
the stack, regardless of whether or not SQLite and the application were
compiled using the same compiler.  

(That's not 100% true, of course.  Has anyone seen SQLite running on a
VAX?  Has it ever been done?)  

So I think it's a bit disingenuous to speak in terms of digits of
precision.  Application and library alike share a single IEEE
double-precision floating point representation.  There's no reason to
think, if the data are provided in binary form, that they won't be
returned in the identical form absent an explicit conversion.  If
that's not so, I'd sure like to know why.  
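
That round trip is easy to check bit-for-bit (a sketch in Python,
comparing the 8-byte IEEE 754 encodings directly rather than decimal
renderings):

```python
import sqlite3, struct

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t(v REAL)")

x = 0.1 + 0.2          # a value with no short exact decimal form
db.execute("INSERT INTO t VALUES (?)", (x,))
(y,) = db.execute("SELECT v FROM t").fetchone()

# Identical bit patterns: the value came back unmolested.
print(struct.pack("<d", x) == struct.pack("<d", y))
```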

I'm faintly surprised NaNs can't be stored, too.  Why should SQLite
interpret them if they're bound to a double?  

--jkl


Re: [sqlite] "database disk image is malformed" error occurs more

2015-01-31 Thread James K. Lowden
On Fri, 30 Jan 2015 13:17:26 -0500
Stephen Chrzanowski  wrote:

> 2.1 Filesystems with broken or missing lock implementations
> 
> SQLite depends on the underlying filesystem to do locking as the
> documentation says it will. But some filesystems contain bugs in their
> locking logic such that the locks do not always behave as advertised. 

The problem is even deeper than that.  NFS does not implement Posix
semantics.  The actual behavior is hard to reason about and far outside
SQLite's scope.  

On a local file system the kernel guarantees filebuffer cache
coherency.  If process A reads a block from the disk, and process B
modifies the same block, the next time process A consults that block it
will see B's changes.  Note this happens whether or not locking is
involved, and regardless of the state of the disk.  It's a by-product of
a unified buffer cache.  

On a network filesystem there is no unified buffer cache.  Writes by B
are not seen when A consults its cached block.  NFS does not promise
that a second read by A will reflect changes made by B.  Even if all
locks are implemented correctly and honored, A stands to read invalid
data unless steps are taken to manage the cache, something SQLite
doesn't do afaik.  

The subject has been discussed here before, as it turns out.  The
Googles returned
http://sqlite.1065341.n5.nabble.com/SQLite-on-NFS-cache-coherency-td33697.html,
which contains much more detail and references.  

DBMS implementations are always about arbitrating access to shared
data.  They require a single, unified view of the data.  Getting that
view over a remote filesystem is difficult in the best of circumstances
and requires explicit measures be taken.  SQLite doesn't attempt to
do so, by design.  (A reasonable choice IMO.)  If you want multi-node
access to a database over a network, there are many other options.
Unsurprisingly, none of them use a network filesystem either.  

HTH.  

--jkl


Re: [sqlite] Truncation of floating point numbers in SQLite?

2015-01-31 Thread James K. Lowden
On Fri, 30 Jan 2015 10:39:31 +0200
RSmith  wrote:

> At first I thought SQLite (or any RDBMS) should really strore
> whatever you give and return it untouched, but that cannot be true
> for an RDBMS because it has to interpret the data, it isn't just a
> binary store. It has to answer questions like SELECT ("colA"+3),
> (7/"colB");  or sort by colA or use a collation on strings etc. etc.
> - all of which means it must care about what the value relates to and
> cannot simply ignore it unless stored as an ignorant type (i.e Blob).

Yes, that's true.  Whenever a column value is interpreted by SQLite,
that interpretation might differ from the one used by the application
that inserted it.  

For example, I might have an 8-bit unsigned integer stored in column A
with the value 0xFF (all bits on).  The clause "WHERE A > 128" would
not return that row because SQLite interprets the column as signed and
the value as -1.  Similarly ORDER BY would not work in accordance with
the application's interpretation.  
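A hypothetical sketch of that mismatch (table and values invented): the
application's unsigned 0xFF is SQLite's signed -1, so the comparison
misses the row:

```python
import sqlite3
import struct

# The application's view: 0xFF is the unsigned value 255.
raw = 0xFF
# The same 8 bits reinterpreted as a signed byte: -1.
signed = struct.unpack("b", struct.pack("B", raw))[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")
conn.execute("INSERT INTO t VALUES (?)", (signed,))

# SQLite compares column A as a signed integer, so "A > 128" finds nothing.
rows = conn.execute("SELECT a FROM t WHERE a > 128").fetchall()
print(signed, rows)  # -1 []
```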

That said, https://www.sqlite.org/datatype3.html says REAL is "stored
as an 8-byte IEEE floating point number".  I can't see why a
non-signalling NaN couldn't be stored and interpreted.  You'd need some
convention for collation, and a function like is_nan() would be helpful
for WHERE clauses.  

It could be argued that SQLite should *not* do that, in the interest of
simplicity.  NaN represents an invalid output, say sqrt(-2).  For any
function F, NaN = F (NaN), and NaN <> NaN.  Because SQL NULL works
similarly, ISTM a defensible alternative would be to say all NaNs
become NULL in the database, or raise an error.  That loses the
distinction between "missing" and "not computable", but that probably
doesn't matter in most practical senses.  

On the third hand, many numerical and stats packages, e.g. R, use NaN
(mistakenly, in my view) to represent missing data.  Integration with
such packages might be facilitated by "NaN fidelity" if we can call it
that, so that they can retrieve what they saved.  

> > Trying to retrieve a stored qNaN or sNaN returns a column type of
> > NULL and a value of 0.

That's consistent with division-by-zero yielding zero, and is an
impediment to using SQLite for scientific work.   

--jkl


Re: [sqlite] sqlite3 fails due to too long path (MAX_PATHNAME)

2015-02-02 Thread James K. Lowden
On Mon, 02 Feb 2015 12:59:55 +0200
Török Edwin  wrote:

> Would it be possible to raise that limit, or output a better error
> message that says why it failed to open the file?

Maybe.  open(2) should return ENAMETOOLONG.  It is possible, though
unlikely these days, that the shell's command-line limit is less than
PATH_MAX, in which case the string passed to the sqlite3 binary would
be truncated (and invalid).  
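The kernel's side of that is easy to demonstrate: on POSIX systems a
single path component longer than NAME_MAX makes open(2) fail with
ENAMETOOLONG. A sketch:

```python
import errno

# 5000 characters is far beyond NAME_MAX (commonly 255).
too_long = "x" * 5000

try:
    open(too_long)
    err = None
except OSError as e:
    err = e.errno

print(err == errno.ENAMETOOLONG)  # True on POSIX systems
```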

--jkl



Re: [sqlite] Truncation of floating point numbers in SQLite?

2015-02-02 Thread James K. Lowden
On Sun, 1 Feb 2015 02:13:15 +0100
Stephan Beal  wrote:

> On Sun, Feb 1, 2015 at 2:07 AM, Simon Slavin 
> wrote:
> 
> > So, having established that NaN and -0 do not make the round trip
> > from a C variable through a database and back into a C variable ...
> > at least I think we have ...
> 
> If you're assuming C89 (which sqlite3 is, by and large), it's a
> technical fact that there is no standard representation of either
> negative zero, NaN, or Infinity. Any such support would be
> non-C-standard.

As you know, C quite intentionally does not define bit-patterns for any
numeric type.  It doesn't specify endianism, twos-complement negatives,
or IEEE floating point format.  It doesn't even specify the number of
bits used.  That is part of what makes C code portable.  

IEEE floating point is implemented in hardware.  The format has been
universally adopted for floating-point units for 25 years or so.  There
are processors that lack floating point support, but I've never heard
of one that provides floating point in some other format.  

The compiler is free to decide what the bit pattern for 

double x = 1.2;

would be.  As a practical matter, normally the obvious choice would be
to implement IEEE format and rely on the processor's floating point
support.  But that's not an obligation; it's implementation-defined.  

SQLite's documentation states that REAL is "stored as an 8-byte IEEE
floating point number".  I suspect that's actually an incidental
by-product of being compiled exclusively on modern machines, all of
which use IEEE floating point (if any).  I suspect a more accurate (but
obscure) description would be "stored in the format of a C double as
implemented by the compiler that compiled SQLite (normally IEEE 754)".  

If the following are not true, they should be, and we should understand
why not:

1.  For storage and retrieval as REAL with binary bindings, SQLite
simply copies the bits to and from the database.  There's no reason the
64 bits presented to the database can't be kept and returned on
demand.  

2.  For interpretation -- sorting, SQL computation, user-defined
functions -- SQLite again relies on the compiler and perhaps math
routines in the standard library.  Want to know how rounding works, or
how your single-precision variable is widened to double-precision?
RTFM!  
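Claim 1 is easy to spot-check through a binding that uses
sqlite3_bind_double; a sketch from Python:

```python
import sqlite3
import struct

x = 1.2  # no exact binary representation, so any re-parsing would show
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL)")
conn.execute("INSERT INTO t VALUES (?)", (x,))
y = conn.execute("SELECT x FROM t").fetchone()[0]

# The 64 bits bound into the database come back bit-for-bit.
same_bits = struct.pack("<d", x) == struct.pack("<d", y)
print(same_bits)  # True
```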

I mentioned collation before, but I don't think SQLite need have any
rule about e.g. "how to sort NaN".  The processor surely has a rule for
comparing NaNs.  The compiler will cause the processor to make the
comparison and report the result, which SQLite can use without further
assessment.  

It was alleged earlier that denormal numbers, -0, and NaN cannot be
bound and stored to a column with REAL affinity.  If that's true,
SQLite is interpreting the values or applying rules that afaik aren't
explicitly stated and aren't strictly necessary.  

--jkl


Re: [sqlite] ordinary CTE containing sum()

2015-02-07 Thread James K. Lowden
On Sat, 7 Feb 2015 12:31:37 -0500
Doug Currie  wrote:

> In response to this SO question:
> 
> http://stackoverflow.com/questions/28377210/how-to-retrieve-rank-based-on-total-mark-in-sqlite-table
> 
> I tried to formulate a query without temp tables using an ordinary
> CTE, but received an error "misuse of aggregate: sum()".

My standard answer is http://www.schemamania.org/sql/#rank.rows. 

You don't need a CTE; it's just a syntactic convenience.  Substituting
his table in my example, something like this should work: 

select S.id, S.total, count(lesser.total) as RANK
from (
 select id, sum(cal1 + cal2 + exam) as total 
 from T group by id 
) as S
join (
 select id, sum(cal1 + cal2 + exam) as total 
 from T group by id 
) as lesser
on   S.total <= lesser.total
group by S.id, S.total
order by RANK;

Someone will doubtless be tempted to suggest that's inefficient in
SQLite, and that may be so.  (I haven't checked.)  It needn't be,
though.  The system could detect the repeated subquery and evaluate it
once.  Not that it matters for any dataset of students and exams on the
planet!  :-)  
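A runnable sketch of that self-join ranking, with invented sample data
(table and column names taken from the SO question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (id INTEGER, cal1 INTEGER, cal2 INTEGER, exam INTEGER)")
conn.executemany("INSERT INTO T VALUES (?,?,?,?)",
                 [(1, 10, 10, 50), (2, 20, 20, 60), (3, 10, 10, 50)])

sql = """
select S.id, S.total, count(lesser.total) as rank
from (select id, sum(cal1 + cal2 + exam) as total from T group by id) as S
join (select id, sum(cal1 + cal2 + exam) as total from T group by id) as lesser
on   S.total <= lesser.total   -- count everyone at or above me
group by S.id, S.total
order by rank, S.id
"""
rows = list(conn.execute(sql))
print(rows)  # [(2, 100, 1), (1, 70, 3), (3, 70, 3)]
```

With this inner-join form, tied totals share the lower (worse) rank,
because each counts the other as "at or above" itself.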

--jkl


Re: [sqlite] ordinary CTE containing sum()

2015-02-08 Thread James K. Lowden
On Sun, 8 Feb 2015 09:57:54 -0500
Doug Currie  wrote:

> tonypdmtr  on SO
> posted a CTE solution; it is something like this, which works for me:
> 
> with tt (S_id, total) as
>(select S_id, sum(ca1) + sum(ca2) + sum(exam) as total
>from t group by S_id
>union values (NULL, 0))
> select s.S_id, s.total,
>(select count(*)+1 from tt as r where r.total > s.total) as rank
> from tt as s where S_id is not NULL;
> 
> But my question remains, why is the UNION necessary in the  CTE?
> 
> why doesn't this work? ...
> 
> with tt (S_id, total) as
>(select S_id, sum(ca1) + sum(ca2) + sum(exam) as total
>from t group by S_id)
> select s.S_id, s.total,
>(select count(*)+1 from tt as r where r.total > s.total) as rank
> from tt as s;

I don't know the answer to your questions, except to say that IMO the
SO answer is incorrect for at least two reasons:

1.  Last I checked, SELECT in a column position in the SELECT clause as
in 

select foo, (select ...)

is not permitted by the SQL standard.  

2.  In principle, adding data in a query to "make it work" is the wrong
approach.  Here "and S_id is not NULL" is needed to compensate for that
UNION.  

Why it's "needed", in the sense that you get "misuse of aggregate", I
don't know.  It looks like valid SQLite syntax to me.  But I like my
corrected syntax better:

with tt (S_id, total) as
   (select S_id, sum(ca1 + ca2 + exam) as total
   from t group by S_id)
select S.S_id, S.total, 1+count(lesser.total) as RANK
from tt as S
left join tt as lesser
on   S.total < lesser.total
group by S.S_id, S.total
order by S.total desc;
S_idtotal   RANK  
--  --  --
2   198 1 
4   198 1 
5   183 3 
3   165 4 
1   143 5 

because it produces the desired result using standard syntax.  I would
argue that's a clearer expression of the problem, too.  
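With one row per student invented to reproduce those totals, the
left-join version checks out in a current SQLite; a sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (S_id INTEGER, ca1 INTEGER, ca2 INTEGER, exam INTEGER)")
conn.executemany("INSERT INTO t VALUES (?,?,?,?)",
                 [(1, 43, 50, 50),   # total 143
                  (2, 66, 66, 66),   # total 198
                  (3, 55, 55, 55),   # total 165
                  (4, 66, 66, 66),   # total 198
                  (5, 61, 61, 61)])  # total 183

sql = """
with tt (S_id, total) as
   (select S_id, sum(ca1 + ca2 + exam) as total
    from t group by S_id)
select S.S_id, S.total, 1 + count(lesser.total) as rank
from tt as S
left join tt as lesser on S.total < lesser.total
group by S.S_id, S.total
order by S.total desc
"""
ranks = {s_id: rank for s_id, total, rank in conn.execute(sql)}
print(ranks == {1: 5, 2: 1, 3: 4, 4: 1, 5: 3})  # True
```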

I have a couple of efficiency questions for those who know:

1.  Is the left-join on a CTE apt to be more effecient than the version
that uses a correlated subquery in the SELECT clause?  

2.  Is there any performance difference between 

sum(ca1 + ca2 + exam) 
and 
sum(ca1) + sum(ca2) + sum(exam) 

I would expect the left join is faster than a correlated subquery, and
that fewer aggregates is better than more. 

--jkl


Re: [sqlite] equivalent for JOIN LATERAL

2015-02-08 Thread James K. Lowden
On Sun, 8 Feb 2015 23:52:43 +0100
Big Stone  wrote:

> I fall over this presentation of LATERAL, from postgresql guys.
> 
> Does it exist in SQLITE ?

Syntactically, no.  Functionally, in part.  

> If not, would it be possible too much effort ?

I'm guessing the answer is No because the prerequisites are missing.  

Something like LATERAL (or APPLY in SQL Server) arises around
table-valued functions, which really should be called parameterized
views.  You think you'd like to be able to say, 

SELECT S.*
FROM T join F(T.t) as S on T.t < S.x

where F is some function that produces a table for a scalar/row
input.  

However, perfectly nothing new is really needed to express the idea: 

SELECT S.*
FROM (select F(t) as x from T) as S
WHERE EXISTS (select 1 from T where S.x > T.t)

I suspect that new syntax like this is usually added to SQL for the
wrong reasons.  

1.  Marketing.  Now with LATERAL added!
2.  User-imagined need, because don't know SQL
3.  Punt on query optimization, invent keyword as hint

In each case, they have added complexity without power.  The "improved"
system is harder to use and to develop.  But, hey, it's progress.   

"Perfection is achieved not when there is nothing
left to add, but when there is nothing left to take away."
-- Antoine de Saint-Exupery

--jkl


[sqlite] SQLite to SQL Server

2015-04-13 Thread James K. Lowden
On Mon, 13 Apr 2015 21:38:25 +
"Drago, William @ CSG - NARDA-MITEQ"  wrote:

> Is there a convenient way to transfer data from SQLite to SQL Server? 

If I were doing it, I'd produce tab-delimited files with sqlite3 and
use the SQL Server bcp utility to upload the output.  

The current trunk of the FreeTDS freebcp utility accepts input on a
pipe, such that

$ sqlite3 -separator "$(printf '\t')" db "select ... " | freebcp [...]

will work.  

HTH.  

--jkl


[sqlite] JSON expressions for records and synchoronisation

2015-04-13 Thread James K. Lowden
On Fri, 10 Apr 2015 14:15:43 -0700
Roger Binns  wrote:

> On 04/10/2015 01:51 PM, Simon Slavin wrote:
> > With a cursory glance I do see important incompatibilities with
> > SQLite.
> 
> I use JSON as the data format for $work stuff (startups) for years,
> and these JSON schemas etc miss why some of us use JSON.  If you are
> using a statically typed language like Java or C++, then arbitrary
> data is very difficult to deal with.  XML is very popular in that
> world because DTDs mean you can make it statically typed.  The mindset
> is that if everything can be specified and constrained in advance,
> then successfully compiling the code means it is mostly correct.

I beg to differ, Roger.  The programming language of choice doesn't
affect the correctness of a function.  A function has preconditions and
postconditions.  They define the domain of the input and the range of
the output.  Failure to meet a precondition invalidates the function,
irrespective of the language used.  Lack of a compiler doesn't change
that.  

It's not a matter of "mindset", and type-correctness isn't eliminated
by uncompiled languages.  I program in Python and C++. In both
languages, if I try to take the substring of an integer, I get an
error.  The difference is that in C++ the error comes from the
complier.  I suspect lack of compilation in popular languages like
Python is driving the current mania for test-driven development.
Instead of letting the compiler find type errors systematically, we use
an ad-hoc framework to discover them manually. This we call
"progress".  

> JSON like several other languages (eg Python, Ruby) is dynamically
> typed.  You can stick anything you want in variables/fields, since the
> type is associated with the value, not the variable name.  

That's fine as far as it goes.  If your language lets you use
variable foo to refer to value X in one place and to value Y somewhere
else, that's OK with me.  

C lets you do the same, by the way, in a limited sense.  One difference
is that in C the name denotes storage, and C won't let you re-use a name
within the same scope to refer to different storage. Another difference
is that C checks to ensure the storage (i.e. value) is used
consistently according to its declared type *before* the program is
executed.  In Python, that work is left to the interpreter, and errors
are consequently discovered later.  

I don't see why name-value association and static type-checking are
necessarily related.  I see that static type-checking in Python is
difficult, but not why it's undesirable.  

> Importing code does not mean it is correct - running it is how you
> figure that out.

Exactly.  

> This allows for far greater flexibility, and especially means you do
> not have to decide in advance exactly what you will do with data.  

I think you know better.  When you write a program, you *have*
decided "in advance" (of running it) "exactly what you will do" with
the data.  What you have not done, in a language like Python, is use
the computer to verify to any extent the correctness of your program
before you run it.  

There are two contradictory language trends in our industry today, one
scientific and one not.  Rooted in computer science are languages like
Haskell that enforce type-checking rigorously and, to a limited extent,
let the programmer write programs that are provably correct.   At the
other end of the spectrum are lightweight languages like Python and
Javascript that offer no typechecking and rely on programmer
discipline and testing to reduce the error count to an acceptable
nonzero number.  

Coming back to SQLite and JSON, the issues of type, domain, and range
continue to be relevant regardless of storage format and rule
enforcement.  JSON won't enforce uniqueness constraints or referential
integrity.  Lack of such support places the burden of correctness on
the programmer, precisely the situation that the relational model was
invented to obviate.  Any apparent "flexibility" afforded by lack of
constraint enforcement is a chimera.  The constraints must be met; it's
only a question of who will make sure they are, and when, and how.  

--jkl


[sqlite] Request: Metadata about C API constants and functions

2015-04-16 Thread James K. Lowden
On Thu, 16 Apr 2015 00:40:28 +0100
Simon Slavin  wrote:

> > It is a very productive time when you
> > get to delete code :-)
> 
> 

"[If] we wish to count lines of code, we should not regard
them as "lines produced" but as "lines spent": the current conventional
wisdom is so foolish as to book that count on the wrong side of the
ledger."
--EWD, On the cruelty of really teaching computing science
http://www.cs.utexas.edu/users/EWD/transcriptions/EWD10xx/EWD1036.html


[sqlite] How do non-SQLite DBMS communicate?

2015-04-23 Thread James K. Lowden
On Wed, 22 Apr 2015 22:28:57 +
"Drago, William @ CSG - NARDA-MITEQ"  wrote:

> When using SQLite the application program accesses the SQLite DBMS
> via its .dll file. 

The DLL is a function-call library.  A function is a named bit of
code.  To "call a function" is to jump to that named bit of code.  The
SQLite functions are part of your application.  That means they execute
in your process's virtual address space.  

> When using something like Oracle Express (a local DBMS) the
> application program is communicating with Oracle Express via some
> sort of network protocol even though there's no network involved.
> What is that called?

See "loopback" in wikipedia.  Oracle is a separate process from yours;
you application communicates with it via TCP/IP.  If the Oracle process
resides on a machine other than the one your process is running on, it
connects to that machine with TCP/IP (probably using a hostname and a
port number).  If the Oracle process is running on the same machine as
yours, your process connects to it, still using TCP/IP, but via the
loopback network, usually by the name "localhost".  

From the point of view of Oracle and your process, there *is* a "network
involved" either way.  In the case of the loopback address, the network
is emulated by the operating system.  Many parts are missing (wires,
for one, at least ones you can see) but the ends look the same to both
players.  

--jkl


[sqlite] Destroy all evidence of a database

2015-04-23 Thread James K. Lowden
On Wed, 22 Apr 2015 16:56:07 +0100
Simon Slavin  wrote:

> You have made me realise, however, that a nice attack against
> encrypted SQLite databases might be to crash a SQLite application
> while it's processing and examine any journal files, shared memory
> file and temporary index files.  It might be interesting to review
> the various encryption systems widely available for SQLite and figure
> out which of them would be vulnerable to such an attack.

Encryption found its way into DBMS featuredom about 10 years ago, I
guess.  I've always thought it was patently stupid.  A DBMSs job is
store data.  Encryption probably should be done in the application.
Failing that, whole-filesystem encryption solves the problem in a
general way.  For SQLite, an encrypted loopback filesystem would solve
your problem neatly, except for that "no virtual filesystem"
stipulation.  

Coming back to the problem at hand, Scott Hess suggested that you modify
the SQLite VFS to trap all calls to the underlying open(2).  In that
way ISTM you could add each opened filename to a list processed by
a function whose address is passed to atexit(3).  Assuming the task
terminates normally (unsignalled), all registered files would be deleted
by that function. If signals also need to be dealt with (IIUC they do
not) then I would fork the process that uses SQLite and arrange for the
parent to delete the files when the child terminates.  

What I like about the VFS idea is that it's minimally dependent on
SQLite particulars.  However the code changes in the future, its
relationship to the VFS will be quite stable.  You don't need to know
how many OS files are opened, or by what name.  Just trap and record
each one.  
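The record-and-delete pattern itself can be sketched in a few lines of
Python rather than a C VFS (all names invented):

```python
import atexit
import os
import tempfile

opened_paths = []          # every file our stand-in "VFS" saw opened

def tracked_open(path, mode="w+b"):
    """Stand-in for trapping the VFS's calls to open(2)."""
    opened_paths.append(path)
    return open(path, mode)

def delete_all():
    """Registered with atexit: destroy every recorded file."""
    for p in opened_paths:
        try:
            os.remove(p)
        except FileNotFoundError:
            pass           # already gone; nothing to do

atexit.register(delete_all)

# Usage: any file opened through tracked_open vanishes at normal exit.
path = os.path.join(tempfile.mkdtemp(), "scratch.db")
tracked_open(path).close()
```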

HTH.  

--jkl


[sqlite] json_* functions in sqlite

2015-04-23 Thread James K. Lowden
On Tue, 21 Apr 2015 18:09:33 -0700
Ashish Sharma  wrote:

> Many times I store JSON data in sqlite. It will be useful if sqlite
> came with functions which understand JSON. Presto has a nice set
> https://prestodb.io/docs/current/functions/json.html

In case you don't know, you could implement functions such as Presto
provides yourself with a set of user-defined functions, without any
help from the SQLite project.  
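For instance, a minimal json_get scalar function takes only a few lines
with Python's sqlite3 bindings (a sketch; the function name is invented,
not anything SQLite provides):

```python
import json
import sqlite3

def json_get(doc, key):
    """Return the value at `key` in a JSON object, or NULL."""
    try:
        value = json.loads(doc).get(key)
    except (TypeError, ValueError, AttributeError):
        return None
    # SQLite can't store a nested object; re-serialize it as text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value

conn = sqlite3.connect(":memory:")
conn.create_function("json_get", 2, json_get)

row = conn.execute(
    """SELECT json_get('{"name": "widget", "qty": 3}', 'qty')"""
).fetchone()
print(row[0])  # 3
```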

As to whether JSON should be a supported datatype in SQLite, the answer
is clearly No.  SQLite, let us note, barely recognizes datatypes at
all, and lacks even a date type.  But more than that, JSON as "native"
datatype has two fundamental problems: definition and complexity.  

Definitionally, we should require any datatype have meaningful
operations comparable to those of numbers and strings.  What does it
mean to "add" two JSON objects?  When is one less than another?  Do you
seriously want to propose as a *type* a thing that can't be a primary
key?  

The problem of complexity is that everything in the JSON blob can,
fundamentally, be represented as tables in SQLite.  I realize "modern"
tools read/write JSON, that it's the OODBMS of the Javascript set.  But
that doesn't change the fact that the JSON tree is a graph, and we know
every graph can be represented with tables. 

Why does that matter?  Because a tree-as-datatype would add a whole new
theoretical structure (graph theory) that is 100% redundant to the
relational model embodied in SQLite.  You get a bunch of new functions
and ways to get at the data.  What you do *not* get is additional query
power.  In fact you get less, because graph theory gives you less: no
subsets and no joins, to name just two.  

That's not to say there should be some rule preventing you from storing
JSON in your SQLite database.  You may find it convenient, especially if
supported with some functions that make it possible to compare (or
perhaps update) components of it, because it represents some giant
state-property that for most purposes can be treated as an integral
unit.  It is to say that every JSON-specific feature you add duplicates 
one already present (in a different form) in SQLite.  Add enough of
them and you'll replace the DBMS with itself, if you see what I mean.  

--jkl


[sqlite] Thoughts on storing arrays of complex numbers

2015-04-24 Thread James K. Lowden
On Fri, 24 Apr 2015 13:37:40 +
"Drago, William @ CSG - NARDA-MITEQ"  wrote:

> I'm trying to avoid re-inventing the wheel. Is there a best or
> generally accept way to store arrays of complex numbers? 

A table in First Normal Form has no repeating groups.  That means no
row has an array of any kind.  Arrays in the relational model are
represented in columns, one element per row.  

A column whose type is nonrepeating is said to be "atomic", but that's
something of a convention.  An atom is supposed to be indivisible, but
we can take substrings of string, parts of dates, and exponents of
floating point numbers.  So nonrepeating datatypes aren't necessarily
atomic, exactly.  They're just not repeating.  ;-)  

The question of your complex array then comes down to two apects: how
to represent the complex number, and how to represent the array.  The
case for the array is simple: keep one complex number per row.  The
case for the "divisible atomic" complex number depends on a choice: how
you want the DBMS to treat the components of the complex type.  

The most general solution -- and therefore probably the best one -- is
to keep the real and complex component each in its own REAL column.
That lets you sort and select the complex numbers using SQLite's
built-in functions without limitation.  For example, if we call those
components "a" and "b", you could say, 

select * from T where "a" between 1.0 and 2.0

Such a table would be 

create table complex_array
( name TEXT not null
, ordinal INTEGER not null
, real_part REAL not null
, imaginary REAL not null
, primary key( name, ordinal )
);

That's the textbook solution on a DBMS without user-defined types.  
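A sketch of using that textbook schema from Python: store a small array,
select on one component, and reassemble the complex values in ordinal
order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
create table complex_array
( name TEXT not null
, ordinal INTEGER not null
, real_part REAL not null
, imaginary REAL not null
, primary key( name, ordinal )
)""")

s11 = [1.0 + 2.0j, 3.0 - 0.5j, -0.25 + 0.0j]
conn.executemany(
    "INSERT INTO complex_array VALUES ('s11', ?, ?, ?)",
    [(n, z.real, z.imag) for n, z in enumerate(s11)])

# Selection works on either component with ordinary SQL...
n = conn.execute("SELECT count(*) FROM complex_array"
                 " WHERE real_part BETWEEN 1.0 AND 2.0").fetchone()[0]

# ...and the array reassembles in ordinal order.
rebuilt = [complex(r, i) for r, i in conn.execute(
    "SELECT real_part, imaginary FROM complex_array"
    " WHERE name = 's11' ORDER BY ordinal")]
print(n, rebuilt == s11)  # 1 True
```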

An alternative is to conceive of the complex type as a datatype, and
represent it in SQLite as BLOB or TEXT.  That severely limits
SQLite's ability to compare and select the values, although that
limitation can be somewhat alleviated with user-defined functions e.g., 

select * from T where "C" = complex(1.0, -0.5)

If 1) you're not interested in letting the DBMS inspect the data, and
2) you have some convenient C function to losslessly convert your
complex type to a string or bit-string, then a single-column
representation might be more convenient.  

--jkl




[sqlite] SQLite and Scientific Computing: Arrays and Complex Numbers

2015-04-27 Thread James K. Lowden
On Mon, 27 Apr 2015 16:35:11 -0400
Jim Callahan  wrote:

> So, the application of arrays and complex numbers go far beyond the
> one question posed to this list with arrays of integers and reals
> being far more common than arrays of complex numbers.
> 
> Complex numbers are included as types in C, C#, FORTRAN and Python
> and are being added to server databases (Oracle and Microsoft SQL
> Server) as user defined types, so programmers and server database
> administrators may develop an expectation that complex types or user
> defined types will be supported SQLite3.

Useful as complex numbers and arrays are to scientific work, SQLite is
deficient in ordinary math.  Division by zero yields NULL, for
example.  The usual suite of mathematical functions such as declared in
math.h are absent, never mind linear algebra.  Even enforcing the rule
that a column always hold a number is inconvenient at best.  

> I would expect that users of complex numbers would be an
> extremely small subset of data base users

Yes, and database users are a small subset of scientific programmers.
As some of the papers you cited allude to, there is some work (thanks
in part to Jim Gray) to answer the needs of that subset.  Most of that
work, though -- already a subset of subset -- focuses on giant
datasets.  "SQLite for science" is thus a small population by almost any
measure.  

IMO a "scientific DBMS" comprises two separate good ideas.  Neither one
has been solved.  

The first is efficient computation over "array-like" stuff, alerting
the computation engine that it is working with arrays, not sets, and/or
directly invoking existing libraries e.g. BLAS.  A major hurdle is the
memory model: because array-programming libraries normally mandate the
data be in contiguous memory, there's a cost to converting to/from the
DBMS's B+ tree.  The more array-like the physical storage of the DBMS,
the more it cedes transactional and update capability.  

The second is expression of mathematics in a database language.  SciQL
goes some distance in that direction by making tables and arrays
interchangable, but the battle is not yet won.  I suspect a better
answer would arise from APL-for-databases than SQL-for-math. But, see
above, not exactly a popular topic.  

--jkl


[sqlite] SQLite and Scientific Computing: Arrays and Complex Numbers

2015-04-29 Thread James K. Lowden
On Tue, 28 Apr 2015 09:24:56 +0200
Dominique Devienne  wrote:

> On Tue, Apr 28, 2015 at 4:16 AM, James K. Lowden
>  wrote:
> 
> > A major hurdle is the memory model: because array-programming
> > libraries normally mandate the data be in contiguous memory,
> > there's a cost to converting to/from the DBMS's B+ tree.  The more
> > array-like the physical storage of the DBMS, the more it cedes
> > transactional and update capability.
> 
> Well, just look at how Oracle solved that problem. The row data
> doesn't hold the blob itself, like in SQLite, but an index to
> separate blob pages. This proverbial level of indirection brings
> tremendous benefits, because you can then update a blob w/o having to
> rewrite the whole blob (you copy the "blob" page(s) being updated
> only, and copy the row into a new "data" page with an updated "blob
> index" with just a few entries changed to point to the updated pages. 

Does Oracle provide access to the BLOB as a contiguous memory image in
the Oracle server's virtual address space?

The benefit you're describing reduces the I/O required to update the
BLOB and to traverse the table without referencing it.  That's a *very*
old technique; it was available in Sybase in the 1980s.  To support use of 
arbitrary linear algebra functions on those BLOBs, they would have to
be supplied to user-defined functions in linear memory with known
addresses.  

--jkl


[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread James K. Lowden
On Wed, 29 Apr 2015 02:39:50 -0600
Scott Robison  wrote:

> On linux, malloc may return a non null yet invalid pointer and only
> fail when the memory is accessed because it wasn't really available.

Citation needed.  I believe SIGSEGV is possible with anonymous mmap and
overcommitted memory.  ISTR I've read as much.  But I was unable to
find a definitive reference saying as much when I looked for one in
answering this message.  

It's not quite accurate to say the pointer is "invalid".  It's valid;
it just refers to memory the system may not be able to supply when
committed_memory > memory.  

--jkl



[sqlite] Segfault during FTS index creation from huge data

2015-04-29 Thread James K. Lowden
On Wed, 29 Apr 2015 01:27:03 +0300
Artem  wrote:

> >> Error: near line 1: out of memory
> 
> > That's not a segfault, though, is it.
> 
> When I did the same in linux version of SQLite - I saw
> the "Segmentation Fault" error.

The above message indicates that SQLite trapped an errno of ENOMEM
after malloc returned NULL.  Otherwise it would not report "near line
1", something the kernel could never do.  

One possible way you could see SIGSEGV on Linux -- even if every call
to malloc is handled correctly and no out-of-bounds reference is ever
made -- is if the process runs out of stack space.[1]  That seems within
the realm of possibility if we're talking about giant allocations.  

It's also plausible that Windows would report ENOMEM while LInux does
not.  If, as is commonly done, Linux is configured to overcommit
memory, malloc will seldom or never (depending
on /proc/sys/vm/overcommit_memory) return NULL.  Given the same RAM and
swap space, Windows would report ENOMEM when Linux would not.  In the
Linux case, the process could plausibly continue (and not be the OOM
victim) up until the point that it exhausts its stack.  

You could check the value of your overcommit_memory and set it to 2 if
that's not already the case.  That might shed some light.  

Developers: What value of overcommit_memory is used in testing?  

---jkl

[1] http://www.win.tue.nl/~aeb/linux/lk/lk-9.html


[sqlite] Segfault during FTS index creation from huge data

2015-04-30 Thread James K. Lowden
On Wed, 29 Apr 2015 20:29:07 -0600
Scott Robison  wrote:

> > That code can fail on a system configured to overcommit memory. By
> > that standard, the pointer is invalid.
> >
> 
> Accidentally sent before I was finished. In any case, by "invalid
> pointer" I did not mean to imply "it returns a bit pattern that could
> never represent a valid pointer". I mean "if you dereference a
> pointer returned by malloc that is not null or some implementation
> defined value, it should not result in an invalid memory access".

Agreed.  And I don't think that will happen with malloc.  It might, and
I have a plausible scenario, but I don't think that's what happened.  

In the bizarre context of the Linux OOM killer, the OS may promise more
memory than it can supply.  This promise is made by malloc and
materialized by writes to memory allocated through the returned
pointer, because at the time of writing the OS must actually (and may
fail to) allocate the memory from RAM or swap.  

Exhaustion of overcommitted memory does *not* result in SIGSEGV,
however.  The OOM killer selects a process for SIGKILL, and the
straw-on-the-camel's-back process that triggered the OOM condition is
not necessarily the one that is selected.  

As far as "invalid" goes, I don't see how we can single out pointers
from malloc.  In the presence of overcommitted memory, *all* addresses,
including that of the program text, are invalid in the sense that they
are undependable.  The process may be killed through no fault of its
own by virtue of a heuristic.  I think it's fair to say it makes the
machine nondeterministic, or at least adds to the machine's
nondeterminism.  

Can writing through a pointer returned by malloc (within the
allocated range) ever result in SIGSEGV?  Maybe.   I have a plausible
scenario in the context of sparse files and mmap, which malloc uses.  

Let us say that you have two processes on a 64-bit machine, and a 1 TB
filesystem.  Each process opens a new file, seeks to the position 1 TB -
1, and writes 1 byte.  Each process now owns a file whose "size" is 1 TB
and whose block count is 1.  Most of the filesystem is empty, yet the
two files have allocated 200% of the available space.  These are known
as "sparse" files; the unwritten locations are called "holes".  

Now each process calls mmap(2) on its file for the entire 1 TB.  Still
OK.  mmap will not fail.  The holes in the files return 0 when read.
When written to, the OS allocates a block from filesystem and maps it
to a page of memory.  As each process begins writing 1's sequentially
to its memory, successive blocks are allocated.  Soon enough the last
block is allocated and the filesystem will be really and truly full.  

At the next allocation, no block can be allocated and no page mapped.
What to do?  When calling write(2) on a full filesystem we expect
ENOSPC, but there's nowhere to return an error condition when writing
to memory.  Consequently the OS has no choice but to signal the
process.  That signal will be, yes, SIGSEGV.  

What does that have to do with malloc?  GNU malloc uses mmap for large
allocations; the pointer it returns is supplied by mmap for an
anonymous mapping to blocks in the swap partition.  If malloc creates
sparse files, writes through malloc'd pointers could result in SIGSEGV.
However, I do not know that that's what malloc does.  

I do not think that's what's happening in the OP's case.  I suspect the
OP's process sailed past any memory-allocation constraints because of
the overcommitted memory configuration, and eventually ran aground when
the stack was exhausted.  Others have already suggested fixing the
overcommit setting as a first step.  Others might be:

1.  Examine the core dump to determine if the SIGSEGV was triggered by
a write to heap or stack memory.  Or not, as the case may be.  ;-)  

2.  Investigate the malloc algorithm and/or replace it with one that
does not use sparse files.  

3.  Increase the stack space allocated to the process.  

It's an interesting problem.  I hope we learn the answer.   

--jkl


[sqlite] Does column order matter for performance?

2015-04-30 Thread James K. Lowden
On Thu, 30 Apr 2015 09:45:14 -0700
Pol-Online  wrote:

> I wasn't able to find the answer to this question online: does the
> column order matter for SQLite performance? E.g. should you put fixed
> width columns like INTEGER before TEXT or BLOB?  

I'm going to check that box as No.  

SQLite keeps its rows in pages and reads whole pages.  Within a page it
references rows and columns by offset.  Every location within a page is
accessed in constant time.  So column order doesn't matter because
page-offset doesn't matter.  
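A quick way to convince yourself (a sketch; the table and column names
are made up): scan the first and the last of 50 columns and observe that
both are plain offset lookups returning correct data, with timings of
the same order.

```python
# Compare scanning the first vs. the last column of a wide table.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cols = ", ".join(f"c{i} INTEGER" for i in range(50))
conn.execute(f"CREATE TABLE wide ({cols})")
conn.executemany(
    f"INSERT INTO wide VALUES ({', '.join('?' * 50)})",
    [tuple(range(50))] * 1000,
)

def scan(col):
    t0 = time.perf_counter()
    total = sum(row[0] for row in conn.execute(f"SELECT {col} FROM wide"))
    return total, time.perf_counter() - t0

first_total, t_first = scan("c0")    # first column in the row
last_total, t_last = scan("c49")     # last column in the row
print(first_total, last_total)       # 0 49000
```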

Most database performance problems are design problems.  If you pay
attention to cardinality and index definition, the rest will usually
take care of itself.   

--jkl


[sqlite] pragma temp_store_directory is deprecated, what is the alternative?

2015-08-14 Thread James K. Lowden
On Mon, 27 Jul 2015 20:35:30 +0100
Simon Slavin  wrote:

> On 27 Jul 2015, at 8:03pm, Zsbán Ambrus  wrote:
> 
> > Does this work with the same environment variable name on both unix
> > and windows?  I'm asking because unix and windows programs each use
> > different conventions for what environment variable to care about
> > when determining the temporary directory, and these conventions
> > differ on unix and windows, though I believe TEMP is the customary
> > variable on windows
...
> If it's going to be documented, it would also be nice to see it on
> 
> 
> 
> which, according to my browser, doesn't mention the word
> 'environment'.

It would be nice to have clearly documented the means by which the
location of temporary files is controlled.  I assumed some flavor of
tmpfile(3) was used, but evidently SQLite doesn't rely on that
function.  

BTW, Posix is almost silent on the question.  It says TMPDIR will
define the location of a temporary store, but not how.  The C standard
library doesn't mention it for functions like tmpfile.   BSDs such
as OS X do, but according to the docs GNU glibc doesn't.  

The issue of thead safety when using environment variables is
spurious.  In a few places I can think of in libc (time zone, locale),
it consults the environment once on startup (or first use) and uses that
variable henceforward.  SQLite could do something similar, or otherwise
ensure that whenever the environment is read all threads get a
consistent view.  

--jkl


[sqlite] pragma temp_store_directory is deprecated, what is the alternative?

2015-08-18 Thread James K. Lowden
On Sat, 15 Aug 2015 01:17:28 +0100
Simon Slavin  wrote:

> > BTW, Posix is almost silent on the question.  It says TMPDIR will
> > define the location of a temporary store, but not how. 
> 
> I'm okay if the documentation simply says something like ...
> 
> For Darwin (Mac), it's always /tmp/
> For Linux, see the TMPDIR environment variable
> For Windows see [whatever it is]

Agreed, although IIUC it could be simpler than that, see next.  

> However I suspect things may be more complicated than that.  For
> instance, does .NET respect the OS's choice of temporary directory no
> matter which OS it's running under ?  I have no idea.

What I'm suggesting is that there is no "OS's choice", really.  

There are a few functions in the C standard library, e.g. tmpfile(3),
that may consult the environment.  The variable's name varies by
implementation.  Some implementations, notably GNU's (if the
documentation is correct), do not consult the environment.  

I would guess .NET is written atop the Win32 API and uses
GetTempFileName or somesuch.  That uses GetTempPath, whose return value
is affected by TMP and TEMP.
(https://msdn.microsoft.com/en-us/library/windows/desktop/aa364992
(v=vs.85).aspx).  

GetTempPath and tmpnam(3) on Windows both honor TMP, but the fallback
policies differ.  So it's not really a question of what the OS's choice
is, because the *OS* offers no "temporary file" function.  It's really a
question of which library function is called, and how that function is
implemented.  

But none of that matters unless those functions are used.  An
application -- or library, as in SQLite's case -- need not use them,
which in any case aren't all that helpful.  AIUI SQLite does *not* use
those functions, but rather has its own way to determine where temporary
files go.  In that case the rule could be quite simple and
OS-independent.  For instance, 

1.  Use "SQLITE_TMPDIR" if defined 
2.  Use current working directory otherwise

where the value is set by sqlite3_initialize and cannot be changed
thereafter. 
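The proposed rule is small enough to sketch (this is the post's
proposal, not SQLite's actual code; SQLITE_TMPDIR is a variable the unix
VFS does consult, but the fallback shown here is hypothetical):

```python
# Sketch of the proposed rule: SQLITE_TMPDIR if defined, else the cwd.
import os

def temp_dir():
    return os.environ.get("SQLITE_TMPDIR") or os.getcwd()

os.environ["SQLITE_TMPDIR"] = "/tmp"
print(temp_dir())  # /tmp
```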

--jkl



[sqlite] Boolean constraint regression between 3.7.7.1 and 3.8.6?

2015-08-25 Thread James K. Lowden
On Mon, 17 Aug 2015 17:19:49 +0200
Olivier Barthelemy  wrote:

> CREATE TABLE VariableDetails (dtl_id INTEGER PRIMARY KEY ASC
> AUTOINCREMENT, storage_implicit BOOLEAN CHECK (storage_implicit =
> 'true' OR storage_implicit = 'false'), storage_type TEXT NOT NULL);
> 
> Insert statement
> INSERT INTO VariableDetails (storage_implicit, storage_type) VALUES
> (true, INT_64);
> ( Values are not passed dirrectly in the statement. The boolean in
> particular is passed using sqlite3_bind_int() )

I think you got to the right place with this, but I'm not sure it was
made clear that the reason is evident in the above text.  

The constraint is against the strings 'true' and 'false'.  The insert
statement inserts "true", no quotes; as you say, it's a C symbol
interpreted by the compiler.  It's bound to the prepared statement with
sqlite3_bind_int.  The value in the bound location will be interpreted
as an integer, not as a pointer to a character array!  
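The mismatch is easy to reproduce (a sketch using Python's sqlite3
module in place of the C API; the binding semantics are the same):
binding the integer 1 fails the CHECK against the strings, while binding
the string 'true' passes.

```python
# Bound integer vs. string CHECK constraint, per the post above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE VariableDetails (
    dtl_id INTEGER PRIMARY KEY,
    storage_implicit BOOLEAN
        CHECK (storage_implicit = 'true' OR storage_implicit = 'false'),
    storage_type TEXT NOT NULL)""")

try:
    # integer 1 (C's "true") is never equal to the text 'true' or 'false'
    conn.execute("INSERT INTO VariableDetails "
                 "(storage_implicit, storage_type) VALUES (?, ?)",
                 (1, "INT_64"))
    failed = False
except sqlite3.IntegrityError:
    failed = True

# the string form satisfies the constraint
conn.execute("INSERT INTO VariableDetails "
             "(storage_implicit, storage_type) VALUES (?, ?)",
             ("true", "INT_64"))
print(failed)  # True
```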

--jkl


[sqlite] order by not working in combination with random()

2015-08-25 Thread James K. Lowden
On Mon, 17 Aug 2015 12:01:58 +0200
Clemens Ladisch  wrote:

> Just because the ORDER BY clause refers to a column of the
> SELECT clause does not mean that the value is not computed
> a second time.

Let's at least recognize that as a bug.  ORDER BY shouldn't interpret
SQL or invoke functions.  It's not even properly part of the SELECT
statement, but is rather just a post-SELECT filter.  Any changes to the
behavior of ORDER BY effected by changing the syntax in semantically
equivalent ways is only further evidence of the bug.  

ORDER BY should evaluate the *data* only.  In the case of random(),
like any other function it gets invoked once, by SELECT, and produces a
value.  That value is the one that ORDER BY should operate on.  
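Until that is the behavior, one dependable workaround (a sketch, mine)
is to materialize the random value first, so the sort key is the stored
value rather than a re-invocation of the function:

```python
# Materialize random() once, then sort the stored value.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

# random() is evaluated exactly once per row, at insert time
conn.execute("CREATE TEMP TABLE shuffled AS SELECT x, random() AS r FROM t")
rows = conn.execute("SELECT x, r FROM shuffled ORDER BY r").fetchall()

rs = [r for _, r in rows]
print(rs == sorted(rs))  # True: each row is ordered by its own r value
```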

--jkl


[sqlite] order by not working in combination with random()

2015-08-26 Thread James K. Lowden
On Wed, 26 Aug 2015 13:39:09 +0100
Simon Slavin  wrote:

> 
> On 26 Aug 2015, at 11:27am, tonyp at acm.org wrote:
> 
> > Plus, it apparently recognizes if the random() expression in the
> > ORDER BY is the same as the SELECT one and again sort correctly
> > (without re-evaluating) and without needing an alias.
> 
> Ah, but I would call /that/ a bug !

I think I can convince you Postgres is right and SQLite is not.  :-)
I'll give you 4 reasons:

1.  Theory
2.  Authority
3.  Atomicity
4.  Consistency

Theory.  Like the relation of relational algebra, a SQL table has no
order. (Not only tables, but any table-like thing: view, CTE,
subquery.)  If you take a table and permute its order, it's still the
same table. ORDER BY permutes the order in a particular way.  It does
not change the table.  Relationally, the input and output of ORDER BY
must be identical.  

Authority.  As a matter of fact, CJ Date says the output of ORDER BY is
*not* a table, but a cursor.  That makes sense, and also explains why
ORDER BY can, according to the SQL standard, appear only once in a
query, and only at the end.  You could think of ORDER BY not as a sort
operator but as a row-returning operator that magically returns the
"next" row per a specification.  

The SQL-92 standard distinguishes between the SELECT statement with
all its many clauses, and ORDER BY:

"""
 Format

 <direct select statement: multiple rows> ::=
      <query expression> [ <order by clause> ]
"""

SQLite would do well to adopt this notion in its query processor.  It
would simplify the system, and would have avoided this error.  

Atomicity.  The rule in SQL is that all functions act *as if* they were
invoked only once.  We had this confusion some time back with 
"date(now)".  There's only one "now" in a query, and there's only one
random() in a row-column.  

The column-specification of SELECT is applied *once* to each row.  The
column-specification of ORDER BY (and GROUP BY) is *not* applied; it
merely *designates* the column by using the same specification text.
Consider this example:

sqlite> select 2+2 order by '2+2';
2+2   
--
4 

'2+2' here is only the column name.  It's not an instruction to create
data.  
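The same experiment run through Python's sqlite3 module (a sketch;
nothing new is computed for the sort, and the result is unchanged):

```python
# ORDER BY a quoted string creates no data; the result is still 4.
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT 2+2 ORDER BY '2+2'").fetchone()
print(row)  # (4,)
```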

Consistency.  Any two semantically equivalent queries that produce
different outputs from the same input constitute prima facie evidence
of a bug.  However you restate a query, if it is logically equivalent,
it should produce the same results.  The fact that you can rejigger
queries with RANDOM() and get different answers tells you immediately
that something is wrong.  

It's sometimes useful to think about how a DBMS works, and imagine
traversing trees, sorting, etc.  It helps in understanding how to
express a query, and in how it will be processed.  But you have to
throw all that away when thinking about query semantics.  The user has
*one* chance only to state the query; he has no control over how its
executed.  As he is constrained, so is the system: it cannot let
implementation details affect query semantics.  That's the way it is
and should be, because it best serves the interests of both parties.  

--jkl


[sqlite] ABOUT ROWID

2015-12-11 Thread James K. Lowden
On Fri, 11 Dec 2015 05:14:24 -0700
"Keith Medcalf"  wrote:

> Far better is to run the queries multiple times in succession (say,
> 1000) and then average the result.

Good advice.  Sometimes it seems like caching is "cheating": we
don't know the performance of something if we're using the cache.  But
the cache is part of reality; it will affect performance during
real-world use of the software.  Pretending it doesn't exist will lead
to optimizing the wrong thing.  
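A minimal harness for that advice might look like this (a sketch; the
table and query are stand-ins for whatever you are measuring):

```python
# Run the same query N times and average, so the warm cache is included
# in the measurement, as it would be in real-world use.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])

N = 1000
t0 = time.perf_counter()
for _ in range(N):
    (cnt,) = conn.execute("SELECT count(*) FROM t").fetchone()
avg = (time.perf_counter() - t0) / N
print(cnt)  # 10000; avg holds the mean per-query time
```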

OTOH, testing can leave the cache "hotter" than it would be under
normal use, depending on what "normal" is.  if the OS is answering to a
lot of disparate requests, SQLite will suffer more cache misses than if
it's the only application running.  If the data are large enough, even
one table scan could remove more interesting bits from the cache.  

If you want to test the effect of caching, one suggestion that's been
missing from this thread is umount(8).  Unmounting the device is bound
to release any kernel resources associated with it.  Remounting it will
bring only the barest information from the filesystem into memory.  

One difference between unmounting and using the Big Red Switch is
that the drive's own cache is in play.  But that cache is relatively
small and afaik has a write lifetime measured in milliseconds.  Unless
you're reading the same 8 MB over and over, the effect of the drive
cache won't skew your test results.  

--jkl


[sqlite] Bug with DATETIME('localtime')

2015-12-13 Thread James K. Lowden
On Thu, 10 Dec 2015 06:34:44 -0700
"Keith Medcalf"  wrote:

> The only way to convert datetime data on windows is to use a
> third-party package that does it properly, or write it yourself.
> Using the WinAPI functions is equivalent to "writing it yourself"
> because they do not actually do anything -- you have to manage all
> the meaningful data and deal with the vagaries of the
> under-documented closed proprietary function implementations in
> windows (which Microsoft even admits do not work properly).

Keith, your answer was so disheartening that I was impelled to revisit
where the C standard is on time zones.  I remembered it was messy, but
thought it had surely been fixed.  

It's not fixed, although glacial progress is being made.  Even though
we've had the TZ database & Posix datetime functions since 1986, 30
years later we're still struggling with it, and not only on Windows.  

The C standard library defines functions for "local time", defined
globally with tzset(3).  To work with two time zones (even if one of
them is UTC) requires manipulating the TZ environment variable,
and "setting the time" with tzset(3) for each each one. 

The latest version of the tz database[1] incorporates changes that
originated with NetBSD in 2014.  NetBSD introduced some new functions
e.g. mktime_z(3) that add a timezone argument to the traditional time
functions of the C standard library.  This lets you allocate a timezone
variable for each zone you're interested in, and thereby to handle the
two time zones without touching global structures.  

I don't know the status of these functions beyond NetBSD.  From the
docs[2] they don't seem to have yet been incorporated in GNU libc.
Perhaps they would be possible to incorporate them in SQLite?  

Below are two programs -- one Posix, one NetBSD 7.0 -- that produce the
output the OP expects.  Neither one requires any heavy lifting,
although I think you'll agree the second is more interesting.  

--jkl

[Posix]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[] )
{
  // > 2014-10-25 20:00:00
  struct tm tm = { .tm_hour = 20,
   .tm_mday = 25,
   .tm_mon = 9,
   .tm_year = 114 };

  putenv("TZ=no time like the present");
  tzset();

  time_t time = mktime(&tm);
  printf( "time is %lu\n", (long)time );

  struct tm *greenwich_tm = gmtime(&time);

  printf( "time in %s is %s",
  greenwich_tm->tm_zone, asctime(greenwich_tm) );

  putenv("TZ=:Europe/Moscow");
  tzset();

  struct tm *moscow_tm;
  moscow_tm = localtime(&time);

  printf( "time in %s is %s",
  moscow_tm->tm_zone, asctime(moscow_tm) );

  return EXIT_SUCCESS;
}
[xisoP]

make && ./moscow
c99 -D_XOPEN_SOURCE=600 -D_BSD_SOURCE -g -o moscow main.c
time is 1414267200
time in GMT is Sat Oct 25 20:00:00 2014
time in MSK is Sun Oct 26 00:00:00 2014


[NetBSD]
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char *argv[] )
{
  // > 2014-10-25 20:00:00
  struct tm tm = { .tm_hour = 20,
   .tm_mday = 25,
   .tm_mon = 9,
   .tm_year = 114 };

  time_t time = mktime_z(NULL, &tm);
  printf( "time is %lu\n", (long)time );

  struct tm *greenwich_tm = gmtime(&time);

  timezone_t moscow_tz = tzalloc("Europe/Moscow");
  const char *name = tzgetname(moscow_tz, 1);
  printf("Moscow time zone name is '%s'\n", name);

  struct tm *moscow_tm, data;
  moscow_tm = localtime_rz(moscow_tz, &time, &data);

  printf( "time in %s is %s",
  greenwich_tm->tm_zone, asctime(greenwich_tm) );

  printf( "time in %s is %s",
  moscow_tm->tm_zone, asctime(moscow_tm) );

  tzfree(moscow_tz);

  return EXIT_SUCCESS;
}
[DSBteN]

make && ./moscow
c99 -g -o moscow main.c
time is 1414267200
Moscow time zone name is 'MST'
time in GMT is Sat Oct 25 20:00:00 2014
time in MSK is Sun Oct 26 00:00:00 2014

[1] https://www.iana.org/time-zones/repository/tz-link.html

[2]https://www.gnu.org/software/libc/manual/html_node/Broken_002ddown-Time.html#Broken_002ddown-Time


[sqlite] Problem with accumulating decimal values

2015-12-15 Thread James K. Lowden
On Fri, 11 Dec 2015 16:21:30 +0200
"Frank Millman"  wrote:

> sqlite> UPDATE fmtemp SET balance = balance + 123.45;
> sqlite> SELECT bal FROM fmtemp;
> 5925.599

To a question like that you'll receive a lot of answers about numerical
accuracy.  And it's true that there are ways to "do the math" without
using floating point representation.  It's also true that it's rarely
necessary, which is why floating point representation exists and *is*
widely used.  You may find it works for you too, unless you have to
adhere to a specific rounding policy.  

Per your example, you're working with 2 decimal places of precision.
5925.599 rounds off to 5925.60; it even rounds off to
5925.60, not too shabby.  If you keep adding 123.45 to it,
you'll find you can go on forever before the answer is wrong in the
second decimal place.  
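The arithmetic is easy to check (a sketch in plain IEEE doubles; 48
additions of 123.45 is what produces the OP's 5925.599... figure):

```python
# Accumulate 123.45 repeatedly in a double; the drift stays far below
# the second decimal place, so rounding recovers the exact amount.
s = 0.0
for _ in range(48):
    s += 123.45
print(round(s, 2))  # 5925.6
```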

IEEE 754 is a solid bit of engineering.  It's capable of representing
15 decimal digit of precision.  That's good enough to measure the
distance to the moon ... in millimeters.  

You could have an exceptional situation, but that would be
exceptional.  Usually, double-precision math works just fine, provided
you have some form of round(3) at your disposal when it comes time to
render the value in decimal form.  

--jkl


[sqlite] Bug with DATETIME('localtime')

2015-12-16 Thread James K. Lowden
On Sun, 13 Dec 2015 20:11:32 -0700
Scott Robison  wrote:

> > It's not fixed, although glacial progress is being made.  Even though
> > we've had the TZ database & Posix datetime functions since 1986, 30
> > years later we're still struggling with it, and not only on Windows.
> 
> The problem would be that SQLite could not depend on the presence of
> TZ functions even if they were added to the standard:

I think the time when "the standard" mattered regarding *libraries* has
passed. Nowadays, new functions do or don't get added to libc largely
based on what GNU does, and to a lesser extent on the BSD projects.  

> 1. SQLite generally avoids non ANSI C so as to be compatible with the
> largest possible number of platforms. ANSI C (aka C89 or C90 for the
> ISO version) will never be updated to add new requirements.

SQLite maintains its own implementation of not a few functions for the
sake of compatibility.  I don't know whether this should be one of
them, but there is more than enough precedent.  

> 2. Let's say that that the next version of the C standard does add TZ
> functionality. 

I haven't peeked to find out how SQLite implements date arithmetic.  I
assume it parses strings in the database, calls mktime(3), and
subtracts time_t values.  That's pretty vanilla, and doesn't *require*
the TZ database.  

The downside of using mktime is that it locks you into a "time zone
perspective", if you will.  The timezone that will be used to convert
a (UTC-based) time_t value to "local time" is set globally.  If you
want to compare two local times, you have to manipulate that global
variable between conversions.  

The new mktime_z(3) function from NetBSD unglobalizes the timezone: it
adds a timezone parameter.  That makes it much more convenient to use
(if that's what you need!)  It's been accepted afaict by IANA, but I
found no discussion of it at GNU.  

While the NetBSD (and IANA, obviously) implementation uses the TZ
database, that's not a requirement.  The function's definition makes no
reference to its implementation.  

mktime_z could be emulated on Windows without IANA's help.  Which it
would have to be, because Windows doesn't use the TZ database:  

save TZ
set TZ to something
_tzset() // Microsoft! 
mktime
restore TZ
_tzset
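That dance can be sketched with Python's time module on a unix system
(time.tzset is POSIX-only; this reproduces the Moscow example from
earlier in the thread):

```python
# Save TZ, set it, convert, restore TZ -- the same sequence as above.
import calendar
import os
import time

# 2014-10-25 20:00:00 UTC as a time_t
utc = calendar.timegm((2014, 10, 25, 20, 0, 0, 0, 0, 0))

saved = os.environ.get("TZ")
os.environ["TZ"] = "Europe/Moscow"
time.tzset()
moscow = time.localtime(utc)          # convert in the Moscow zone
if saved is None:
    del os.environ["TZ"]
else:
    os.environ["TZ"] = saved
time.tzset()                          # restore the original zone

print(time.strftime("%Y-%m-%d %H:%M", moscow))  # 2014-10-26 00:00
```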

A quick glance at the documentation suggests
TzSpecificLocalTimeToSystemTimeEx might be useful, too.  

Someone will complain that would be slow, and something about
threads.  My understanding is that the OP got the wrong answer, and I
would say slow is better than broken.  And it won't be slow: there's no
I/O; not even a context switch.  

As Keith said, as of now you have to roll your own.  SQLite does not
support date arithmetic across time zones.  Should it?  Should it as an
extension?  I don't know.  I was just trying to understand (and
explain) what the C foundation looks like, why/how it's broken, and
what would be required to fix it.  

--jkl


[sqlite] Problem with accumulating decimal values

2015-12-16 Thread James K. Lowden
On Wed, 16 Dec 2015 20:33:40 +0200
R Smith  wrote:

> > Ok this does not work of any scale of numbers. But a solution with 
> > integers neither does
> 
> I think the bit that Keith tried to highlight is that we should
> always refrain from storing errors. 

Keith recommended against storing *rounded* values.  If you store $0.30
in SQLite as REAL, you store a binary approximation.  It's a fine thing
to keep unless you care about picodollars.  

> Calculated errors are fine because we can at any time revisit the
> calculation procedures, we can refine and perhaps opt for more
> significant digits - but we can ALWAYS guarantee the accuracy-level
> of the calculated result. However, storing wrong values (or let's
> call them "approximate" values if you like) is pure evil. 

I'm not sure what you mean.  

There's no problem storing a C double from memory and later fetching
it.  The same 64 bits pass through the interface unchanged.  (Well,
maybe not the *same* bits, but who can tell?!)  Once replaced back in C
memory, the computation can resume where it left off unaffected.  
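That round trip is easy to verify (a sketch through Python's sqlite3
module; REAL is an 8-byte IEEE double in the file):

```python
# A double stored as REAL and fetched back is bit-for-bit unchanged.
import sqlite3
import struct

x = 0.30000000000000004          # an "ugly" double, on purpose
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v REAL)")
conn.execute("INSERT INTO t VALUES (?)", (x,))
(y,) = conn.execute("SELECT v FROM t").fetchone()

print(struct.pack("<d", x) == struct.pack("<d", y))  # True: same 64 bits
```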

What you usually don't want to do is compute based on rounded numbers.
If you store a rounded number to the database, you may lose
information.  Even if you don't -- even when the rounded number is the
right one -- such errors as accumulate at the edge of accuracy normally
wind up not mattering.  That's why C does all computation in
double precision, even when the operands are single-precision.  

The opposite mistake -- losing information --
can easily lead to results that are spectacularly wrong.  

--jkl




[sqlite] Problem with accumulating decimal values

2015-12-16 Thread James K. Lowden
On Wed, 16 Dec 2015 15:05:34 +0100
"E.Pasma"  wrote:

> 16 dec 2015, Keith Medcalf:
> >> BEGIN;
> >> UPDATE fmtemp SET bal = ROUND(bal,2) + 123.45;
> >> (repeat a 1.000.001 times
> >> END;
> >> SELECT bal FROM fmtemp;
> >> 123450123.45
> >
> > You should NEVER round as you have done above.  You may get lucky  
> > and the errors may cancel each other out, or you may get more
> > usual results where the error equals the theoretical max of the sum
> > of the absolute value of all the truncated values, which can be
> > quite significant depending on the scale of the number you are
> > dealing with (and theior scales relative to each other).
> 
> Hello, I was only trying to digest JKL's post and the result looks  
> good. The example prints the value as it is in the database and
> shows that there is no accumulated error there. I do not see a
> counter example (not yet).
> 
> Ok this does not work of any scale of numbers. But a solution with  
> integers neither does

Keith's advice is well taken.  Keep the real number; round for
presentation.  I always say, "store what you know".  

Yes, starting from zero you can add 123.45 to a double-precision
floating point number for a very, very long time, about 81,004,455,245
times, before the error will appear in pennies. When it does, you'll
have 13 digits left of the decimal.  That's on the order of the US
GDP.  We don't measure things like that down to the penny, so no one
will know if you're wrong.  ;-)  

The thing to keep in mind is that you get ~15 decimal places of
precision.  The decimal floats.  You can put it way on the left, and
measure tiny things accurately.  You can put it on the right, and
measure astronomical things accurately.  Unless you care about
millimeters to the moon, it will do the job.  

Whole books have been written on numerical accuracy.  I suspect if that
mattered to your application you'd know about it.  My advice is to let
the engineers worry about it -- they did, years ago -- and accept
rounded output unless and until you have an example of a computation
for which that doesn't work.  

--jkl




[sqlite] Behaviour when subquery behind comparison operator in where-clause returns empty result

2015-02-24 Thread James K. Lowden
On Wed, 18 Feb 2015 08:55:37 +0100
gunnar  wrote:

instead of 

> SELECT *
> FROM ordercallback
> WHERE account=@ACCOUNT
>AND cb_seq_num>(
>  SELECT cb_seq_num
>  FROM ordercallback
>  WHERE cb_uuid=@CBUUID);

SELECT *
FROM ordercallback ocb
WHERE account=@ACCOUNT
   AND exists (
 SELECT 1 
 FROM ordercallback
 WHERE cb_uuid=@CBUUID
and ocb.cb_seq_num > cb_seq_num );

No union, no limit, no order.  Just existential quantification.  :-)
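Run against a toy version of the table (a sketch; the column values are
made up), the EXISTS form behaves sensibly when the subquery matches
nothing:

```python
# The EXISTS form: an empty subquery is simply false, not a NULL comparison.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ordercallback "
             "(account TEXT, cb_uuid TEXT, cb_seq_num INTEGER)")
conn.executemany("INSERT INTO ordercallback VALUES (?, ?, ?)",
                 [("A1", "u1", 1), ("A1", "u2", 2), ("A1", "u3", 3)])

sql = """SELECT *
         FROM ordercallback ocb
         WHERE account = ?
           AND EXISTS (SELECT 1
                       FROM ordercallback
                       WHERE cb_uuid = ?
                         AND ocb.cb_seq_num > cb_seq_num)"""

print(len(conn.execute(sql, ("A1", "u1")).fetchall()))       # 2: seq 2 and 3
print(len(conn.execute(sql, ("A1", "missing")).fetchall()))  # 0: no rows
```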

--jkl


[sqlite] Complex insert query to normalised database

2015-02-24 Thread James K. Lowden
On Wed, 18 Feb 2015 14:16:32 +0100
Staffan Tylen  wrote:

> I suspect that this is wrong as nobody has suggested it but isn't
> this what triggers are meant to solve?

Triggers were invented before SQL defined what we now call Declarative
Referential Integrity (DRI).  It is (I'm going to say) always
preferable to declare your integrity constraints rather than
programming them imperatively.  

If you find yourself in a situation in which DRI is insufficient,
it's a good idea to look for a normalization problem or consider if the
trigger is being "abused", i.e., being used for something other than
referential integrity enforcement.  There are exceptions, but not
many.  
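A minimal example of DRI doing what a hand-written trigger would
otherwise do (a sketch; note that SQLite requires PRAGMA foreign_keys =
ON per connection before it enforces the declaration):

```python
# A declared FOREIGN KEY rejects the orphan row -- no trigger needed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, "
             "parent_id INTEGER NOT NULL REFERENCES parent(id))")
conn.execute("INSERT INTO parent VALUES (1)")
conn.execute("INSERT INTO child VALUES (10, 1)")       # OK: parent 1 exists

try:
    conn.execute("INSERT INTO child VALUES (11, 99)")  # no parent 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```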

--jkl


[sqlite] Appropriate Uses For SQLite

2015-02-24 Thread James K. Lowden
On Thu, 19 Feb 2015 07:21:17 -0800
Roger Binns  wrote:

> There is a lot that would have to be done with it:
> 
> - - make the IR stable across releases
> - - add more general instructions beyond only what is needed for SQL
> - - expose an API that takes the IR
> - - possibly redesign it to make static verification possible, like
> Java bytecode did.  

As it happens, I'm working on a project exactly like that, and can
attest that it's plenty of work!  

You might be tempted to think -- I was -- that it would be nice to have
a C library of relational algebra operators on a transactional store.
No SQL, just C functions like ra_project, ra_select, and ra_join.  What
could be nicer than, 

t3 = ra_join(db, t1, t2)

?   Well, there's the little matter of "join on what?", so you get 

t3 = ra_join(db, t1, t2, how)

and then there's the question of "what goes in 'how'?".  IOW, how to
specify the join criteria without syntax, using only data?  In fact,
consider a simpler problem, 

t2 = ra_select(db, t1, "where id > 2")

and suddenly you're back in Parserville whether you wanted to be or
not.  You're inventing a language, a very limited, incomplete version
of SQL.  Limited, that is, until you want to say something like "where
exists (select 1 from t5 where t1.id < t5.id)".  Before you know it,
you've reinvented more than half of SQL, except that the ratio of SQL
speakers to your-language speakers approaches infinity.  

Now, you could avoid adding complexity to your language by moving more
of the relational algebra stuff out of it, 

t3 = ra_semijoin( db, t1, 
ra_join(db, t1, t2, ra_natural) )

or something like that.  To Richard's point, the more you do that, the
*less* there is for the optimizer to do, and the more the application
comes to control (for good and ill) the order of operation.  It's very
difficult to maintain the relational abstraction *and* be efficient,
because the C language defines an order of operation, specifically
"greedy evaluation".  That means inner functions are evaluated before
outer ones, inviting production of temporaries that a query optimizer
would just laugh at.  

None of this is news.  Query optimizers have (for decades) been
"lowering the filter" to minimize the processed data.  DBMSs and vector
languages (R, but also APL, NumPy, Matlab, inter alia) are *languages*
precisely so that they can *effect* high-level instructions without
interpreting them literally.  That's not an option in C.  (It's also
what makes things like BLAS such a bear to write *and* use.)  

I'm happy to discuss this off-list in more detail with anyone
interested.  Bringing it back to the topic, I would only say, "be
careful what you wish for".  You want SQL, or something very like it,
to provide abtraction and execution efficiency, because C can't
do that kind of interpretation.  

--jkl


[sqlite] recurrent failure mode

2015-02-25 Thread James K. Lowden
On Wed, 25 Feb 2015 16:26:45 -0800
Dave Dyer  wrote:

> >Do you have any multi-access things going on ?  Two or more
> >computers, applications, processes or threads trying to access the
> >database at the same time ?
> 
> No, but it would be normal for the database to be on a different
> computer than the sqlite client, and be using whatever networked 
> file system is common.  The culprit clients seem to be macs, we're
> still seeking more information about the specifics.

You might want to read my message on the topic from the list archives,
dated Sat, 31 Jan 2015.  

Although "bugs" are frequently blamed, in fact the semantics of
networked filesystems are different from that of local filesystems.
Making a database work on a network filesystem might be possible, but
requires considerable work to support cache coherency.  Expecting
SQLite to do that is not too different from expecting to read a letter
the moment it's dropped in the mailbox across town.  

Just consider this scenario:  two clients open the same database, where
the file is on a fileserver somewhere on the network.  Each one does,
say, "select * from T" and peruses the data.  Then, 

client A inserts record 17:
lock record
insert row
free lock

All good.  Now,

client B inserts record 17:
lock record
insert row
free lock

Should result in a primary key violation, right?  No.  

Client A has updated his *local* cache, his in-memory image of some
part of the database.  That's not a SQLite cache; that's the kernel's
filebuffer cache representing the state of some part of the filesystem.
It's completely correct from A's point of view.  SQLite depends on that
cache being correct, and it is: from A's point of view.  

If B is on the same machine as A, they share a single, kernel-provided
filebuffer cache, and when B attempts to insert the duplicate record,
SQLite will see A's record and reject the insert.  

N machines is N caches.  When A flushes his cache with sync(2), how
long before B learns of the change?  With N=1 (same machine), B
learns instantaneously.  With N > 1?  NFS promises only that B will
see A's changes after closing and reopening the file.  

When B is on a different machine, the local representation of the state
of the filesystem does not include A's update.  SQLite examines the
"file", sees no record 17, and updates its local image.  When the
kernel eventually flushes B's update, A's local cache becomes stale.  

With some "luck", you can actually go on like this for a while with no
one noticing.  As long as different clients are updating different
parts of the database and fortuitously refreshing their caches (by
re-reading updated parts that the network isn't helpfully caching,
too), it can seem to sort of work.  Failure is guaranteed in the most
important scenario: when SQLite requires a coherent cache and doesn't
have one.  
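The failure mode can be sketched in a few lines.  In this toy model
(plain Python, no real filesystem), the "server" dict stands in for the
database file, and each Client's private snapshot stands in for a
machine's kernel filebuffer cache:

```python
# Toy model of the scenario above: two clients, each trusting its own
# stale snapshot of the shared "file".
server = {}

class Client:
    def __init__(self):
        self.cache = dict(server)       # snapshot taken at open time

    def insert(self, rowid, value):
        if rowid in self.cache:         # "primary key" check: local only
            raise KeyError(rowid)
        self.cache[rowid] = value

    def flush(self):                    # like sync(2): push dirty pages
        server.update(self.cache)

a, b = Client(), Client()
a.insert(17, "from A")
a.flush()
b.insert(17, "from B")   # no error: B's cache predates A's flush
b.flush()
print(server[17])        # "from B" -- A's record 17 silently clobbered
```

Each client behaved correctly by its own lights; the duplicate key got
in because neither cache reflected the other's update.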

So it really doesn't matter if the locking mechanism on network
filesystems are perfect.  By *design*, they make weaker promises than
Posix.  Expecting them to do something they're documented not to do is
asking for trouble.  Which is to say, unfortunately, that you got what
you asked for.  :-/

HTH.  

--jkl




[sqlite] Could frequent 'database is locked' errors mean SQLite is not a good fit for my application?

2015-07-02 Thread James K. Lowden
On Thu, 2 Jul 2015 10:09:12 -0400
Kathleen Alexander  wrote:

> Essentially, I have written an application in C++ that interfaces
> (reads and writes) with a SQLite database, and I am getting lots of
> 'database is locked' errors. [...]
> 
> My application runs on Linux (ubuntu 13.10), and is driven by a bash
> script that spawns many (~60 on a 64 core workstation) instances of a
> serial, C++ program, each of which opens its own connection to the
> database and performs reads and writes.

It may be that SQLite is not a good fit for your application.
Concurrent update is SQLite's Achilles heel.  

Each insert/update/delete in SQLite requires exclusive write access.
In WAL mode, a writer takes the database's single write lock but
readers may proceed; in rollback mode, a writer excludes readers too.
Either way, only one process can update the database at a time.  If
you have 60 processes, 59 will wait.  

Depending on your requirements, that may still be OK.  If the updates
run quickly enough for your purposes, then increasing the timeout may
do the trick.  Estimate the processing time and number of updates to
compute an overall throughput.  If that's acceptable, problem solved.  
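In Python's sqlite3 module, for instance, the relevant knobs are the
connection timeout and the journal mode (a sketch; the file path is a
throwaway):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")

# timeout=30.0 tells sqlite3 to retry a locked database for up to
# 30 seconds before raising "database is locked".
conn = sqlite3.connect(path, timeout=30.0)

# WAL mode lets readers proceed while the (single) writer works.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)   # wal
```

If raising the timeout makes the errors disappear and throughput stays
acceptable, the 59-waiters design may be perfectly serviceable.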

Most DBMSs, the heavy kind, devote extensive resources to support
concurrent update.  Usually contention is managed at the page or row
level, and a whole section of the manual covers how the system
implements SQL's "isolation levels".  Just exactly how many processes
can update the database at once is a function of almost everything:
DBMS configuration, table design, index definition, and isolation
level, not to mention hardware capacity and the OS I/O subsystem.  Your
mileage will most certainly vary.  

> B. If SQLite is not a good fit for my program, do you have any
> suggestions of an alternative database engine that is free or might
> be free or inexpensive for academic use?

*If* that's the case, your best option IMO is Postgres.  If you want to
handle ~1000 concurrent updates, though, you will have to pay attention
to the details, and may have to arrange to minimize resource contention
in the DBMS.  It all depends on the throughput and response-time
requirements.  

HTH.  

--jkl


[sqlite] Slow real world performance - Any suggestions please (warning long)

2015-07-05 Thread James K. Lowden
On Fri, 3 Jul 2015 11:35:21 +0100
Rob Willett  wrote:

> It basically has taken no time to run 10,551 selects. This implies
> that we had issues on the database before with either fragmentation
> or indexes being poorly setup.  

Perhaps you said so and I missed it, but to me it's clear that almost
all the improvement comes from using RAG_Idx1.  Because it indexes the
Text column, you're now doing a binary search instead of a linear
scan.  

By changing the collation, you went from scanning every row to
searching a binary tree.  That took you from O(n) to O(log2 n).  If
each row in your 16 GB database is 100 bytes, allowing for overhead you
might have 80 million rows?  To satisfy your query, that would be an
improvement from an average of 40,000,000 I/O operations to about 27.  
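A back-of-the-envelope check of those numbers (a sketch; 80 million
rows assumed, plain binary search over a sorted key):

```python
import math

n = 80_000_000
linear = n // 2                    # average probes for a linear scan
binary = math.ceil(math.log2(n))   # worst-case probes, binary search
print(linear, binary)              # 40000000 27
```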

When something that used to take 122 seconds starts going a million
times faster, it begins to look pretty much instantaneous.   :-)  

--jkl


[sqlite] Suggestions for Fast Set Logic?

2015-07-10 Thread James K. Lowden
On Thu, 9 Jul 2015 22:28:04 +0100
Simon Slavin  wrote:

> If you're going to do it properly you have a keywords column which
> contains strings like
> 
> ,
> ,animal,
> ,animal,predator,
> ,animal,predator,bird,

Why do it that way?  I would recommend a schema like Keith's, with one
keyword per row in a separate table.  

The OP mentioned that searching 250,000 rows took 2 seconds.  Because a
LIKE clause with a leading wildcard can't use an index, every query of
a column such as you suggest will require a linear scan, plus
pattern-matching overhead.  A
normalized plan like Keith's permits indexing by keyword, which should
be blazing fast when testing for equality to string constants.  
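Concretely (a sketch with Python's sqlite3 module; table and index
names invented here), the query plans tell the story:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword (item INTEGER, kw TEXT);
    CREATE INDEX kw_ix ON keyword(kw);
""")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

indexed = plan("SELECT item FROM keyword WHERE kw = 'bird'")
print(indexed)   # SEARCH ... USING INDEX kw_ix (kw=?)

scanned = plan("SELECT item FROM keyword WHERE kw LIKE '%bird%'")
print(scanned)   # SCAN ... (no index applies)
```

One row per keyword gets the indexed SEARCH; the comma-delimited
column forces the SCAN every time.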

--jkl


[sqlite] Suggestions for Fast Set Logic?

2015-07-11 Thread James K. Lowden
On Fri, 10 Jul 2015 09:54:27 -0600
Scott Robison  wrote:

> As described, the user will be able to create arbitrarily
> complex queries. Since it is impossible to know in advance what
> indexes might be required, breaking it down to individual sub queries
> with simple where clauses that can be virtually guaranteed to use a
> simple index

I don't follow you.  A complex query is an assemblage of clauses.
Whether or not broken down "to individual sub queries", the search
arguments are the same.  They are few in kind, even though they may be
large in number.  The analysis for what indexes to create will be the
same either way.  In fact, Keith already proposed a plausible design.  

My concerns would be two:

1.  OR clauses are essentially UNION.  I don't know how sophisticated
SQLite is about using different indexes for different OR-separated
criteria or parts of a UNION.  Maybe separate queries would be
better.  

2.  LIKE clauses usually don't engage indexes.  The OP should
understand that pattern-matching queries will often entail a table scan
unless ANDed to an indexed term.  
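For what it's worth on the first point, a quick experiment (Python's
sqlite3, invented names) suggests SQLite can use a separate index for
each OR-separated term:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER);
    CREATE INDEX idx_a ON t(a);
    CREATE INDEX idx_b ON t(b);
""")

# With an index on each column, the planner can search both and
# union the resulting row sets rather than scanning the table.
rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM t WHERE a = 1 OR b = 2").fetchall()
plan = " ".join(r[-1] for r in rows)
print(plan)   # MULTI-INDEX OR ... idx_a ... idx_b ...
```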

--jkl


[sqlite] Suggestions for Fast Set Logic?

2015-07-12 Thread James K. Lowden
On Sat, 11 Jul 2015 19:02:59 -0600
Scott Robison  wrote:

> > I don't follow you.  A complex query is an assemblage of clauses.
> > Whether or not broken down "to individual sub queries", the search
> > arguments are the same.  They are few in kind, even though they may
> > be large in number.  The analysis for what indexes to create will
> > be the same either way.  In fact, Keith already proposed a
> > plausible design.
> 
> I'm just saying that knowing which index to create in advance is
> impossible if the user can specify arbitrarily complex where clauses.

In that case, I can make your job easier, starting today.  You don't
need to consider query complexity to design your indexes.  :-) 

The complexity of the query per se is immaterial.   If we had a
database with one table with only one column of integers, the
*complexity* of the queries against it could still be arbitrarily
complex, because that's what SQL allows.  But the choice of what to
index would be trivial.  

The database in question is nearly that simple.  It consists of just a
few tables, and AIUI the search terms are limited to keywords and rating
values. Maybe there are a few others.  The combinations in which those
could be used in a query are limitless.  The number of indexes appears
to be about 3.  

Determination of indexes is always driven by search arguments.  What
will be in the WHERE clause?  What will be in the JOIN?  The number of
ANDs and ORs doesn't matter, nor does the number of tables joined.  No
matter how complex the queries, the number of indexes is always limited
by the searchable columns.  
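As a sketch of that claim (Python's sqlite3; the schema is invented for
illustration), the same handful of indexes serves both a trivial query
and an arbitrarily compound one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photo (id INTEGER PRIMARY KEY, rating INTEGER);
    CREATE TABLE tag (photo_id INTEGER, kw TEXT);
    CREATE INDEX photo_rating ON photo(rating);
    CREATE INDEX tag_kw ON tag(kw);
""")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail text for a query."""
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

simple = plan("SELECT id FROM photo WHERE rating = 5")
fancy = plan("""
    SELECT id FROM photo
    WHERE rating >= 4
      AND id IN (SELECT photo_id FROM tag WHERE kw = 'bird')
      AND id NOT IN (SELECT photo_id FROM tag WHERE kw = 'captive')
""")
# However many ANDs and ORs pile up, the searchable columns -- and so
# the useful indexes -- stay the same.
print("photo_rating" in simple, "tag_kw" in fancy)
```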

--jkl


[sqlite] User-defined types

2015-06-05 Thread James K. Lowden
On Fri, 5 Jun 2015 13:07:59 -0400
Stephen Chrzanowski  wrote:

> If N-Tier software development is 'annoying' and you are not happy,
> either get help by other members of your team, or, find a different
> hobby, because anything less than 3-tier programming dealing with
> multiple languages, technologies and functionality just isn't going
> away.  Quite honestly, and this is just my opinion, but I think it is
> absolutely wrong of you to go into a place of employment angry or
> annoyed at ANY level because of the multiple languages and
> technologies used to bring your product to life. Get mad at the
> politics, not the tools being used.  

Whew!  Most days this list is as well behaved as a mature poodle, and
then once in a while someone writes a rant that segfaults on the first
paragraph.  

You're saying complexity here to stay, it's inevitable.  Simon is
saying it's unnecessary (ergo annoying).  The more you know about
inherent and accidental complexity, the lower your threshold for being
annoyed by the latter.  There's no reason your 6-tier application
couldn't be written in a single language.  You could have one syntax
for data structures, one representation (and semantics) for missing
data, one binary representation for each type.  You could throw an
exception at tier-0 and catch it at tier-5.  

The jumble of technologies we use is very much an accident
of industrial history.  No one designed it that way.  Why else would we
have no less than two completely different abstractions of the machine
-- the OS and the browser -- connected by nothing more sophisticated
than a stream of bytes?

> Second... Come on... Really?  This "switching...requires a lot of
> effort" comment is a pretty weak sauce excuse coming from any
> developer

It's not an excuse, "sauce" (whatever that means) or otherwise.  It's a
fact.  It's called "cognitive load" and it's been recognized since the
dawn of software.  

--jkl

