Re: [sqlite] bug: failure to write journal reported as "disk I/O error"

2017-09-26 Thread Nico Williams
On Tue, Sep 26, 2017 at 01:37:42PM -0700, Jens Alfke wrote:
> > On Sep 26, 2017, at 1:17 PM, Guy Harris  wrote:
> > A user wouldn't know what to do with "you've exceeded your stored data
> > quota"?
> 
> A Turkish or Chinese user likely wouldn’t. (SQLite’s error messages
> are not localized.) And there are plenty of messages that are much
> less understandable to a lay user than the one you picked out.

They could be.  And regardless, more detail in the error _code_ is
better for the application developer.

EIO is definitely an I/O error, but it could be all sorts of things.
E.g., you're using iSCSI and the network is timing out.

ENOSPC is very, very different.  Reporting ENOSPC as an I/O error means
that the app or the user must now use df(1) or strace(1) or similar to
work it out, when SQLite3 could just have reported that the FS is full.
Ditto EDQUOT.

EROFS is also very different.

And so on.

These are ancient error codes.

> > The *number* might annoy the support staff; right off the top of
> > your head, what's the error number for "file system quota exceeded"
> > or "I/O error"?  (No cheating by looking it up in a man page or
> > include file!)
> 
> On the contrary, error numbers are a lot easier for support. They’re
> independent of locale, they don’t get re-worded from one version of
> the app to the next, and they’re very short and easy to dictate over
> the phone. Of course, these shouldn’t be the primary error information
> given to the user! But the user-level error message should be
> something specific to the application, like “an unexpected database
> error occurred (19)” instead of "Abort due to constraint violation”.
> The number would appear only for support purposes.

As long as you can resolve them to symbolic names and/or messages.

Nico


Re: [sqlite] [EXTERNAL] Common subexpression optimization of deterministic functions

2017-09-14 Thread Nico Williams
On Thu, Sep 14, 2017 at 1:10 PM Simon Slavin  wrote:

> On 14 Sep 2017, at 5:55pm, R Smith  wrote:
>
> > Richard, wouldn't it be possible to supply a wrapping function (perhaps
> a hint function, like the likelihood() function), that takes another
> function as a parameter and then ensuring that THAT gets calculated only
> once?
>
> Can you not do it with WITH ?  I don’t really understand how WITH works
> but it would seem to evaluate its terms just once for each iteration.


In PostgreSQL CTEs function as an optimizer barrier: the engine always
evaluates each CTE in order and stores their results in temporary tables,
then it runs the main statement with those tables as sources.

But in SQLite3 CTEs are not materialized first -- they are like VIEWs,
and so they do not function as optimizer barriers; therefore they cannot
be used for common subexpression elimination (CSE).

I would much prefer to be able to specify which CTEs must be materialized,
and which may be left as internal views.  That would give the user a great
deal of control.  Something like WITH x AS MATERIALIZED (...).  Can we get
that?
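
A sketch of how that might read (hypothetical syntax; my_expensive_func
and t are placeholder names):

  -- Hypothetical syntax: evaluate the CTE once, into a temp table;
  -- non-MATERIALIZED CTEs would stay view-like.
  WITH x AS MATERIALIZED (
    SELECT my_expensive_func(a) AS v FROM t
  )
  SELECT v, v + 1 FROM x;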



Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread Nico Williams
On Sat, Sep 09, 2017 at 11:26:35PM +0200, R Smith wrote:
> I think you are missing something or my explanation was not clear.
> When I say "first test" I mean of THIS test suite, not the previous set from
> 3 days ago.

I meant the opposite.



Re: [sqlite] JOIN vs. INTERSECT vs. WHERE IN (...) - speed and efficiency differences

2017-09-09 Thread Nico Williams
On Sat, Sep 09, 2017 at 05:56:03PM +0200, R Smith wrote:
> *Findings* pertaining to JOIN, INTERSECT and WHERE IN (...) type queries on
> tables *WITHOUT Row_ids*:
> (This is the full test posted below because it is the one that matters most)
> INTERSECT AND WHERE IN (...) queries posted similar times here with WHERE IN
> (...) being the slightly faster (similar to the above findings), but BOTH
> were a good 10 - 15% faster than on the ROW-ID tables, so WITHOUT ROWID
> tables seem to have a definite advantage here (it is currently unclear to me
> why this is).
> 
> A troubling test is the JOIN on WITHOUT ROWID tables - it took several
> orders of magnitude longer than any other test in the entire experiment.

In your first test you were ordering by PK; now you're not, and you
don't have a [covering] index on the columns you're ordering by, so,
yeah, "orders of magnitude" slower is to be expected.  You're comparing
apples and oranges.

Also, in your first test you have one column.  Now you have lots.
That's no way to isolate the performance numbers you're trying to get.

The only way to have an ORDER BY speed up a query is when there's an
index on the columns in question that can be scanned to produce the
order you're looking for without additional sorting, or when the engine
can create such an index as a temporary index (this, of course, has a
cost, so it won't always work well).
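
A minimal illustration of both cases (hypothetical table and index
names):

  CREATE TABLE t(id INTEGER PRIMARY KEY, a TEXT, b TEXT);
  CREATE INDEX t_b ON t(b);
  SELECT b FROM t ORDER BY b; -- satisfied by scanning t_b; no sort step
  SELECT a FROM t ORDER BY a; -- no index on [a]: needs a sort or a
                              -- temporary index/b-tree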

Nico


Re: [sqlite] Proof that a line has been modified

2017-09-07 Thread Nico Williams
On Thu, Sep 07, 2017 at 10:16:15AM +0200, Paxdo wrote:
> For security reasons, a customer wants to be sure that a database line
> cannot be modified after its initial insertion (or at least that it
> cannot be modified invisibly, i.e., without proof that the line has
> been modified).  Including by technicians who can open the database
> (SQLITE of course).
> 
> Is there a solution to that?

You have these choices:

 - hash the whole SQLite3 file and record or sign such hash values for
   approved DB files (this will let you detect all changes)

 - something like what you described (hash every row of every table and
   bind them all somehow, then save or sign this; this too will let you
   detect all changes for all tables that you apply this to)

   (this is DB-agnostic)

 - switch to a DB that uses a Merkle hash tree (see below)

Generally speaking, the best approach for this sort of thing is to use
something called a Merkle Hash Tree, such that for each database/
filesystem/datastore you always have available a single, small (e.g.,
256 bits) cryptographic hash value for the entire thing.

In order to make such hash values usable for this purpose you'll need
the system to be "content-addressed" storage (CAS) if at all possible
(more on that below).

A good example of a system that comes close to this is ZFS.  ZFS is a
filesystem that actually is a Merkle Hash Tree on-disk, but it's not
content-addressed, which means that if the locations of data on-disk
change, then the root hash also changes, even if none of the actual
data changed.

What this actually means in practice is that any time you have a
"pointer" from a database/filesystem/datastore page to another, what
must actually be stored is not just the address of the target page, but
the hash of its contents.  If you apply this rule rigorously, and if you
have a root page (filesystems generally do, and databases can too, at
least per-table, and often for the entire DB), you necessarily end up
with a root page whose cryptographic hash *is* the cryptographic hash of
the entire DB/FS.

To get CAS you also need to not include block/page addresses in the
cryptographic hash computations (but still must include the hashes of
pointed-to pages/blocks).  Then you can say that the hash of a page's
content *is* its address (it's not really).

One reason that Merkle hash trees are best is that you don't have to
read every page of a DB/FS to verify the root hash.  You need only hash
the root page and you're done -- if, anyways, any errors verifying other
page hashes can be handled at run-time.  Another is that they let you
compute hashes for sub-trees.  Another is that they're easy to build.

SQLite3 is NOT a Merkle hash tree, however.  You *can* build a Merkle
hash tree with SQLite3 though.  Fossil is a version control system that
does exactly that, but that's not a technique you're likely to apply to
your use case (I'm guessing).

Given a Merkle hash tree, you can digitally sign (or save in a remote,
secure system) root hash values of approved DB/FS states.  This is
interesting, for example, for secure-boot/TPM applications.

Given that you can't easily use a Merkle hash tree with SQLite3
without building a DB on top of a DB (like Fossil basically does) or
switching to one that uses a Merkle hash tree (and exposes the root hash
value to you), you could hash every row, XOR the hash values (since
there's no defined order for the rows, or else you can hash the
concatenation of the hashes in some order you define), and sign that.
You could apply this for every table and XOR all the table hashes, or
just those tables that are of interest to you.  You'll want to do this
for all interesting rows in sqlite_master as well.
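
A sketch of the per-table row hashing, assuming the sha3() function from
SQLite3's "shathree" extension is available (any cryptographic hash UDF
would do; t, a, and b are placeholder names):

  SELECT sha3(group_concat(hex(h), '')) AS table_hash
    FROM (SELECT sha3(a || '|' || b) AS h
            FROM t
           ORDER BY a, b);
  -- Caveat: SQLite3 does not guarantee group_concat() ordering, so a
  -- robust implementation would concatenate the row hashes in
  -- application code instead.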

Lastly, as others have pointed out, the best you can do with a DB hash
is cryptographically prove that the FS/DB has approved content, for some
value of "approved content".

You cannot prove that the DB/FS hasn't been reset to an earlier approved
state without adding a revocation system.

Nor can you prove that the DB/FS has no malicious content in it -- only
that an approved entity signed it as "approved".

Nico


Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

2017-09-07 Thread Nico Williams
On Thu, Sep 07, 2017 at 09:51:07PM +0200, R Smith wrote:
> INTERSECT will happily match however many columns you desire (and specify),
> there is no need to match full records or single keys specifically.

But the two queries on either side of the set operator must have the
same number of columns (and in strongly-typed RDBMSes, the same types).

That's a huge constraint.
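
For example (t1 and t2 hypothetical):

  SELECT a, b FROM t1
  INTERSECT
  SELECT a, b FROM t2; -- fine: both sides produce two columns

  SELECT a, b FROM t1
  INTERSECT
  SELECT a FROM t2;    -- error: the column counts differ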


Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

2017-09-07 Thread Nico Williams
On Wed, Sep 06, 2017 at 07:43:07PM -0600, Keith Medcalf wrote:
> Try the same test using 147 columns in each table.
> 
> 1 column is rather trivial.  Even a kindergarten kid could do it in no
> time using crayons and the wall.
> 
> [...]
> 
> In other words except in very trivial cases (like having only one
> column that is not nullable) it will be very difficult to write a
> "correct" JOIN or correlated subquery that emulates an INTERSECT.

Yup.  But that doesn't mean that the engine couldn't internally build a
result-set from the query without some filtering JOIN, then implement
the same strategy as an INTERSECT.  You can't do this in SQL if the
filter table has a different shape than the result set, but the engine
might be able to do it.

On the other hand, building a complete result set first is... not online
behavior.  If the result set size is enormous, then the INTERSECT
approach is going to make the user very unhappy!

I do think OP's tests point out a case where SQLite3 is pessimally
picking a table scan over a covering index scan... though scanning the
index means that there will be no rowid column in the result, which
might actually be a compatibility issue when using rowid tables, so
maybe SQLite3 is doing exactly the right thing?

I don't think that pessimization is too consequential as users can
improve the situation by adding ORDER BY clauses or using WITHOUT ROWID.

Nico


Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

2017-09-06 Thread Nico Williams
On Wed, Sep 06, 2017 at 11:54:35PM +0200, R Smith wrote:
> It's still remarkable that in both tests 5 and 6 I've used the very same PK
> setup, yet Test 6 was significantly faster with the added ORDER BY clause.
> In tests 1 through 4 I did not use a PK at all, just plain INT data field,
> but then I did not test the ORDER BY in those tests.

It's an optimizer issue.  It didn't occur to the optimizer that scanning
a covering index was better than scanning the table because the covering
index has (in this case, and always, for rowid tables anyways) strictly
less content to read and decode.

Scanning the covering index has the happy side-effect (if you wanted it) of
producing ordered results and making an equivalent ORDER BY free.

(Whereas scanning the table will produce results in rowid order, which is
almost certainly not useful unless you explicitly wanted an INTEGER
PRIMARY KEY.)

Note that INTERSECT could have used a hash table, thus producing
unordered results (most likely).  But SQLite3 only knows b-trees.

All of this explains the accidental ordering / non-ordering.  And also
why you shouldn't count on it: it's all implementation details!

But just because you know to add an ORDER BY doesn't mean you shouldn't
think to make it match some suitable index...  The optimizer is nice,
but you still have to think a little bit like an optimizer yourself :(

> It might turn out to be a wild goose chase, but that will be easily evident
> when testing without the PK and with more realistic real-world data. I'll do
> that this weekend.

You can't have a PK-less table -- SQLite3 always wants some PK, even if
it's a hidden rowid column.  WITHOUT ROWID tables make this clearer:

  sqlite> create table t(a text) without rowid;
  Error: PRIMARY KEY missing on table t
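
Adding a PRIMARY KEY satisfies it:

  sqlite> create table t(a text primary key) without rowid;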

Adding an explicit PK implicitly added the covering index that sped up
the JOIN (once you forced the optimizer to use it).  But you should just
always have had an explicit PK and WITHOUT ROWID.

You still found something interesting about using JOINs to filter result
sets (as opposed to adding columns to the result set).  Something to
keep in mind...  I do a lot of queries where some of the JOINed tables
are used only for filtering.  It's not always possible to convert such
queries to INTERSECT, but it might be possible for SQLite3 to learn how
to perform the equivalent optimization internally, and when to do it.

Nico


Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

2017-09-06 Thread Nico Williams
On Wed, Sep 06, 2017 at 10:57:41PM +0200, R Smith wrote:
> On 2017/09/06 8:26 PM, Nico Williams wrote:
> >On Wed, Sep 06, 2017 at 08:05:57PM +0200, R Smith wrote:
> >>-- Another interesting thing to note: The INTERSECT test produces ORDERED
> >>-- output, which suggests that an ORDER-BY addition to the query would
> >>-- favour the INTERSECT method.
> >Nothing about INTERSECT requires it to produce ordered output.
> 
> No, and it was not suggested, it was just noted that it does, suggesting
> that it could be the more performant choice when adding an ORDER BY clause,
> which turned out to not only be true in terms of being the better choice,
> but also that it itself sped up by simply adding the ORDER BY clause as was
> demonstrated in Test 6.

I point this out only because users should know not to assume result set
order without an ORDER BY.

> >Nothing about the JOIN case makes it not possible to produce ordered
> >output by accident.
> 
> Yet it doesn't seem to be by accident, which would suggest that an ORDER BY
> clause when added to the JOIN statements would incur an additional time

See my explanation below.

> penalty for having to actually order the results - Yet, as again
> demonstrated in Test 6, the ORDER BY actually sped up the JOIN query too
> (perhaps via forcing the Index earlier or used in a different way) - which
> was most interesting, and, as you noted, there is nothing about the JOIN
> that precludes it from having ordered output, so this optimization might be
> worthwhile.
> 
> >You'll want to re-measure with an ORDER BY added.
> 
> I did. It was done in Test 6. It showed significantly interesting results.
> Was my explanation lacking in clarity or did it fall down the TLDR; rabbit
> hole? :)

I saw it.

I think the ORDER BY helped the JOIN because it caused SQLite3 to scan a
covering index (the primary key) instead of scanning the table.  That
without it SQLite3 didn't use that index is rather inefficient, though
it may not be a win in real-world use-cases to fix that.

Of course, IF you had used WITHOUT ROWIDs you would have found (I'm
sure) that the JOIN also produced ordered results by default and was as
fast as in your 6th test.

In fact, INTERSECT does an implicit ordering step by building a b-tree
that the JOIN with the index scan optimization does not have to build at
all, so JOIN has a leg up on INTERSECT in that sense.

> I agree, and not only the INTERSECT optimization but the tests suggest
> adding a silent ORDER BY would also be an optimization, though not sure if

I think here the ORDER BY merely forced SQLite3 to pick the more
efficient query plan, and that it's probably a (rather minor) optimizer
bug that it didn't do so to begin with without the ORDER BY.

> the effort-to-pleasure ratio is low enough yet. Perhaps if re-doing the
> tests with tables using several more non-Integer columns to see if the
> optimization could be generalized across all kinds of data in some way. I
> might pursue this later when I have some time.

If you'll redo this I'd urge you to use WITHOUT ROWIDS.  First, that's
almost always the right thing to do anyways.  Second, it won't perform
worse but likely will perform better.  Third, write performance
definitely should improve with WITHOUT ROWIDS.  Fourth, I think users
are starting to use WITHOUT ROWIDS more, so testing that seems more
useful.

Nico


Re: [sqlite] JOIN vs. INTERSECT speed and efficiency

2017-09-06 Thread Nico Williams
On Wed, Sep 06, 2017 at 08:05:57PM +0200, R Smith wrote:
> -- Another interesting thing to note: The INTERSECT test produces ORDERED
> -- output, which suggests that an ORDER-BY addition to the query would
> -- favour the INTERSECT method.

Nothing about INTERSECT requires it to produce ordered output.

Nothing about the JOIN case makes it not possible to produce ordered
output by accident.

You'll want to re-measure with an ORDER BY added.

In any case, this is quite interesting.  Many uses of JOIN are not
merely to filter results, but to construct joined result rows -- such
uses of JOIN cannot be optimized by using INTERSECT.  But for
filter-uses of JOIN... this might be a useful optimization for the
engine to learn.

Nico


Re: [sqlite] Enforcing uniqueness from multiple indexes

2017-08-24 Thread Nico Williams
On Sun, Aug 20, 2017 at 05:17:16PM +, Wout Mertens wrote:
> Oh wow, I didn't know about ON CONFLICT, thanks!
> 
> Unfortunately the behavior on UPDATE (actually INSERT OR REPLACE in my
> case) is not the desired behavior, it removes the row with the same k but
> different id.

PostgreSQL has some unwieldy but very general "UPSERT" syntax where you
can specify what to do ON CONFLICT in an INSERT for a given constraint
or set of colliding columns, with a WHERE clause too (PostgreSQL allows
only one ON CONFLICT clause per INSERT):

  INSERT INTO .. (id, k, thing1, thing2) VALUES (..)
  ON CONFLICT (id) DO UPDATE SET thing1 = .. WHERE id != 5;

That's very generic, but for the common case I'd really like:

  -- Insert or else update conflicting rows' columns to all the new
  -- values from the insert.
  INSERT OR UPDATE INTO .. (..) VALUES (..);

and

  INSERT INTO .. (..) VALUES (..)
  ON CONFLICT (..) DO UPDATE; -- change all the columns of the
                              -- conflicting row to the new values
                              -- from the INSERT

Nico


Re: [sqlite] Enforcing uniqueness from multiple indexes

2017-08-18 Thread Nico Williams
An INSERT OR UPDATE sure would be nice.  What I often do in cases like this
is: UPDATE .. WHERE; INSERT .. SELECT .. WHERE NOT EXISTS (SELECT ..);.
That's two statements -- if that's a problem, then you should use D. R.
Hipp's trigger-based solution.
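
A concrete sketch of that pattern, with a placeholder table kv(k, v),
run inside one transaction:

  -- Hypothetical schema: CREATE TABLE kv(k TEXT PRIMARY KEY, v TEXT);
  UPDATE kv SET v = :v WHERE k = :k;
  INSERT INTO kv (k, v)
  SELECT :k, :v
  WHERE NOT EXISTS (SELECT 1 FROM kv WHERE k = :k);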


Re: [sqlite] Packing integer primary key with field bits

2017-08-10 Thread Nico Williams
On Thu, Aug 10, 2017 at 03:36:31PM +0200, Clemens Ladisch wrote:
> x wrote:
> > I’m thinking about this more from the gain in speed rather than saving 
> > space.
> 
> Database performance is usually limited by I/O, i.e., you gain speed by
> saving space.

Then proper compression (e.g., with zlib) will be much better for you
than what you're trying to do.

I don't know how to get SQLite3 DBs compressed though, short of writing
your own VFS (one might already exist) or hosting on ZFS (which works
very well).

Nico


Re: [sqlite] Mailinglist question

2017-08-09 Thread Nico Williams
On Wed, Aug 09, 2017 at 03:48:34PM -0600, Keith Medcalf wrote:
> There is a solution since about 1984 ... it is called NNTP (Usenet
> News).  Google Groups is basically Usenet News with a (so some people
> think -- but not I -- I detest so-called web-forums) purty front end
> to the news reader.  Gateways between mailman, bitnet, usenet, and a
> bunch of other stuff existed for a long time (and still do, mostly).

I... would like to use NNTP, but I'd need someone to provide the
service.  Though I'd be annoyed at having to use two different UAs (I
use mutt primarily; it doesn't support NNTP).

On the other hand, anyone running a mailing list could easily provide a
read-only IMAP service given a suitable implementation.  That would be
easy enough for users, but the MUA bits are missing.  (Sadly, very
little work is being done on MUAs.)  Providing a mostly-read-only IMAP
service would be possible, but it requires at least a modicum of account
provisioning.

What a mess.

Another thing I've done in the past is: download list archives, run a
script to fix the From lines so the archive is then a proper mbox
format, then use mutt.  :)  It helps if the archives keep Message-ID
headers.

Nico


Re: [sqlite] Tutorials, books, video about SQLite

2017-08-09 Thread Nico Williams
By far the best generic SQL book, IMO, is the O'Reilly "SQL" Pocket
Guide.  It's very small and yet fairly comprehensive.  It covers Oracle,
DB2, SQL Server, and MySQL.  But it's very general and brief, and
everything it has to teach you is generally applicable to PostgreSQL and
SQLite3.

Nico


Re: [sqlite] Mailinglist question

2017-08-09 Thread Nico Williams
On Wed, Aug 09, 2017 at 12:38:42PM -0700, Jens Alfke wrote:
> > On Aug 9, 2017, at 12:31 PM, Nico Williams <n...@cryptonector.com> wrote:
> > (It'd be great to have mostly-read-only public IMAP servers serving
> > mailing list archives.  I say mostly-read-only because one should want
> > to keep track of what one has read, responded to, flagged, and
> > "deleted".  But this concept appears to be a pipe dream at this time.)
> 
> I literally spent years exploring concepts in that vein, and almost
> got a product like that built while I was at Apple ten years ago.

Supposedly the IETF was going to contract out setting up such a service
for IETF lists.  IDK what became of that.  I should check.

> The closest existing solution I’ve found is groups.io
> <http://groups.io/>, which is a pretty good integration of
> mailing-lists and web-forums. You can get posts as emails or in a web
> interface, every email has a link to a view of the thread, and of
> course you can browse and search the archives. (Yeah, it’s a lot like
> Google Groups, but IMHO the functionality is better and the UI
> cleaner.)

I'm not fond of web fora, but a good integration would be nice, for
sure.  An IMAP service would be fantastic for users like me.  The need
to keep some state (unless we enhance MUAs to do it for read-only IMAP
mailboxes) requires user accounts, which requires more of a business
model :(

Nico


Re: [sqlite] Mailinglist question

2017-08-09 Thread Nico Williams
On Wed, Aug 09, 2017 at 09:10:58PM +0200, Lars Frederiksen wrote:
> Is it possible to read the mails directly on a server instead of receiving
> the mails constantly?

There are several archives online, and if you're subscribed you can use
your list subscription password to use the mailman archives.

(It'd be great to have mostly-read-only public IMAP servers serving
mailing list archives.  I say mostly-read-only because one should want
to keep track of what one has read, responded to, flagged, and
"deleted".  But this concept appears to be a pipe dream at this time.)

Nico


Re: [sqlite] calculated-value indexes are not covering?

2017-08-09 Thread Nico Williams
On Wed, Aug 09, 2017 at 06:59:18PM +, Wout Mertens wrote:
> but… index s is covering and only includes the field s? I thought a
> covering index was one where all the data needed to satisfy the query is
> in the index? I would say that all the indexes here conform to that
> definition?

No, "covering" means that the columns listed in the index include all
the columns from the source table that you need for a given query:

  CREATE TABLE t(j TEXT, s TEXT, foo TEXT);
  SELECT s, j FROM t WHERE s = 'foo'; -- full table scan bc [s] is not
                                      -- indexed, is not a PRIMARY KEY,
                                      -- and is not UNIQUE

  CREATE INDEX t1 ON t(s);
  SELECT s FROM t WHERE s = 'foo';    -- uses index; index covers column
                                      -- selection (just [s])

  SELECT s, j FROM t WHERE s = 'foo'; -- t1 doesn't cover [j], so the
                                      -- table must be visited too

  CREATE INDEX t2 ON t(j, s);
  SELECT s, j FROM t WHERE s = 'foo'; -- t2 can't be searched by [s]:
                                      -- we need an index where [s] is
                                      -- a prefix

  CREATE INDEX t3 ON t(s, j);
  SELECT s, j FROM t WHERE s = 'foo'; -- uses covering index t3, finally

  SELECT s, j, foo FROM t WHERE s = 'foo'; -- t3 does not cover [foo] ->
                                           -- the table must be visited

Usually you should have a PRIMARY KEY, and if [s] were one here, then
none of these would need full table scans, but only two of these would
use only an index and not also access the table via the PK.

  -- truly covering index, but only usable in queries by [s] or [s],
  -- [j], or [s], [j], [foo]:
  CREATE INDEX t4 ON t(s, j, foo);

Nico


Re: [sqlite] calculated-value indexes are not covering?

2017-08-09 Thread Nico Williams
On Wed, Aug 09, 2017 at 06:48:51PM +, Wout Mertens wrote:
> sqlite> create table t(j json, s string);
> sqlite> create index s on t(s);
> sqlite> create index j on t(json_extract(j, '$.foo'));
> sqlite> create index l on t(length(s));

In order for any of these indices to be covering indices you need to add
all the columns of the table t to them:

sqlite> create table t(j json, s string);
sqlite> create index s on t(s, j);
sqlite> create index j on t(json_extract(j, '$.foo'), j, s);
sqlite> create index l on t(length(s), s, j);
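
To check that a query really does use one of them as a covering index,
EXPLAIN QUERY PLAN helps (exact output varies by SQLite3 version); a
plan line saying "USING COVERING INDEX" means the table itself is never
visited:

  sqlite> explain query plan select s, j from t where s = 'foo';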

Nico


Re: [sqlite] LSM1 extension

2017-08-09 Thread Nico Williams
On Wed, Aug 09, 2017 at 08:26:51AM -0500, Charles Leifer wrote:
> sqlite4's future is uncertain. It's not, as I understand, meant to replace
> sqlite3 any time soon. I think it was more of a place to try out new ideas
> for implementations.

I must say though, SQLite4 is *brilliant*.  I do hope that it either
gets completed and replaces SQLite3, or has all its good ideas folded
into SQLite3.  The LSM1 work in SQLite3 seems to point towards the
latter.


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-08 Thread Nico Williams
On Tue, Aug 08, 2017 at 10:30:33AM -0700, Jens Alfke wrote:
> > On Aug 4, 2017, at 11:28 AM, Nico Williams <n...@cryptonector.com> wrote:
> > Imagine a mode where there is only a WAL, and to checkpoint is to write
> > a new WAL with only live contents and... rename(2) into place.
> 
> What you’re describing is exactly how CouchDB’s storage engine works,
> as well as descendants like Couchbase Server’s CouchStore and
> ForestDB. (Note: I work for Couchbase.)

Also how ZFS works.

> Efficient lookups in a file like this require the existence of a bunch
> of extraneous metadata like interior B-tree nodes. This metadata
> changes all the time as records are written*, so a lot of it has to be
> written out too along with every transaction, resulting in substantial
> write amplification.

Yes.  It's called write amplification.  ZFS deals with this by first
committing to an intent log (ZIL) without writing all those interior
nodes, then it eventually writes a proper transaction.

One could do the same sort of thing for a single-file DB: write a number
of transactions as intents, then once in a while "checkpoint" them by
writing a b-tree transaction that includes all those updates.  For
readers this means always processing all intent log entries since the
last b-tree-updating transaction.

LSMs are... a lot like this, IIUC.  Are they not?

> The other big drawback is that compaction (the checkpoint step you
> describe) is very expensive in terms of I/O. I’ve known of CouchDB
> systems that took many hours to compact their databases, and since
> every write that occurs during a compaction has to be replayed onto
> the new file after the copy before compaction completes, one can get
> into a state where a busy database either never actually finishes
> compacting, or has to temporarily block all writers just so it can get
> the damn job done without interruption. (It’s a similar problem to GC
> thrash.)

There's definitely trade-offs.  This is already the case for SQLite3's
WAL.  You have to checkpoint often, but when you do you lose read
concurrency.  When you value read concurrency highly you might tune the
WAL max size up to reduce checkpoint frequency, and now you slow down
checkpointing.

Another thing that can be done is to write to the "next" DB as soon as
possible, possibly synchronously with writes to the "current" DB.  Then
the checkpoint process is very simple: write the "close" TX to the
"current" DB, rename the "next" into place, create the next "next".

> We’ve also seen that, on low-end hardware like mobile devices, I/O
> bandwidth is limited enough that a running compaction can really harm
> the responsiveness of the _entire OS_, as well as cause significant
> battery drain.

Yes.  But there you don't really need read concurrency, so SQLite3
without a WAL (or with a WAL but frequent checkpointing) will suffice.

> * Modifying/rewriting a single record requires rewriting the leaf node
> that points to it, which requires rewriting the parent node that
> points to the leaf, and this ripples all the way up to the root node.

I'm well aware :)  Others on the list might not, naturally.

One of the very nice features of an append-only single-file DB file
format is that you can just "tail -f" (well, a binary tail -f) to
replicate.  If you add in the intent log concept, it's not so bad in
terms of I/O, but you slow down readers somewhat.

Another very nice feature of an append-only single-file DB file is high
read concurrency.  And yet another is fast recovery (throw away a
truncated transaction and "checkpoint").

Nico


Re: [sqlite] hex and char functions

2017-08-07 Thread Nico Williams
On Mon, Aug 07, 2017 at 11:45:50AM -0400, Richard Hipp wrote:
> On 8/7/17, Nico Williams <n...@cryptonector.com> wrote:
> > Internally SQLite3 uses UTF-8.  The SQLite3 API lets you deal with
> > UTF-16, but this just transcodes to/from UTF-8 internally.
> 
> That is not quite correct.
> 
> SQL statements are always converted into UTF8 for parsing and code
> generation.  But data can be stored in the database file and processed
> as UTF8, UTF16be, or UTF16le.  All text content for a single database
> must use the same encoding.  When creating the database file, set the
> encoding before adding any content by running one of:
> 
> PRAGMA encoding('utf-8');
> PRAGMA encoding('utf-16be');
> PRAGMA encoding('utf-16le');
> 
> See https://www.sqlite.org/pragma.html#pragma_encoding for additional
> information.

Ah, OK, thanks for the correction!


Re: [sqlite] hex and char functions

2017-08-07 Thread Nico Williams
On Mon, Aug 07, 2017 at 03:29:41PM +, x wrote:
> Apologies, I should have said I was using c++ builder Berlin on
> windows 10 and that UnicodeString was UTF16.
> 
> I thought I had learned enough about this string lunacy to get by but
> finding out that the UTF8 code for the UTF16 code \u0085 is in fact
> \uc285 has tipped me over the edge. I assumed they both used the same
> codes but UTF16 allowed some characters UTF8 didn’t have.

Internally SQLite3 uses UTF-8.  The SQLite3 API lets you deal with
UTF-16, but this just transcodes to/from UTF-8 internally.

So when you SELECT hex(x) on a text value x, the hex() function sees
UTF-8, not UTF-16.
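
For example, in a database using the default UTF-8 encoding:

  sqlite> SELECT hex(char(0x85)); -- U+0085, encoded as UTF-8
  C285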

Nico


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 10:09:04PM +0200, Sylvain Pointeau wrote:
> On Fri, Aug 4, 2017 at 10:01 PM, Nico Williams <n...@cryptonector.com>
> wrote:
> > The main issue is that you can't tell when a transaction has begun or
> > ended, so you can't tell when curr_val() should raise an error.  You can
> > only tell that next_val() has been called on this connection, and that's
> > not the same semantics as H2's.
> 
> actually the temp table is better in case of rollback.

Right.

> however normally it should be executed under an autonomous transaction,
> a rollback should not affect the sequence.

Correct.

> > I would just not bother with curr_val() for now.
> 
> I need it, but the current implementation should be enough.

OK.

> => I would really really like if a real implementation of the sequence
> could be done in sqlite, behaving like oracle.
> (very fast, reliable etc)
> 
> Thanks for all your help!

You're welcome.  Thanks for posting this on github.  Make sure there's a
license, preferably a nice and friendly one such as a BSD license, or
else put it in the public domain like SQLite3 is -- but it's your code,
so you do what you like with it.

Cheers,

Nico


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 09:55:03PM +0200, Sylvain Pointeau wrote:
> On Fri, Aug 4, 2017 at 9:16 PM, Nico Williams <n...@cryptonector.com> wrote:
> > In general I would say: pthread_once() (Unix) or InitOnceExecuteOnce()
> > (WIN32).  But here, a global in combination with CREATE TEMP TABLE IF
> > NOT EXISTS is probably good enough.
> 
> I wonder if a memory structure would not be better?

Doesn't matter.  You already have a solution.

The main issue is that you can't tell when a transaction has begun or
ended, so you can't tell when curr_val() should raise an error.  You can
only tell that next_val() has been called on this connection, and that's
not the same semantics as H2's.

I would just not bother with curr_val() for now.


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 07:20:00PM +, Hick Gunter wrote:
> A "temp table" would only be visible in the session that creates it
> and would not live past the end of the session. Sequences should be
> persistent...

This is about the H2 curr_val() semantics -- that it only works if
you've already done a next_val() in the same tx.


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 09:09:10PM +0200, Sylvain Pointeau wrote:
> I programmed the currval using a temp table, but the performance dropped
> slightly
> 
> sqlite> WITH T(i) AS (SELECT (1)i UNION ALL SELECT i+1 FROM T WHERE
> i<100) INSERT INTO seq_test(seq_num) SELECT seq_nextval('seq1') from T;
> Run Time: real 25.837 user 23.446950 sys 0.171601
> 
> I create the temp table only once with the help of a global variable. I
> believe it is safe (per session), am I right?

In general I would say: pthread_once() (Unix) or InitOnceExecuteOnce()
(WIN32).  But here, a global in combination with CREATE TEMP TABLE IF
NOT EXISTS is probably good enough.
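
A sketch of the guarded creation (hypothetical table name):

  CREATE TEMP TABLE IF NOT EXISTS seq_currval (
    seq_name TEXT PRIMARY KEY,
    val      INTEGER NOT NULL
  );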


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 07:05:06PM +, Peter Da Silva wrote:
> Step 2 seems rather expensive, even if you’re filtering out dead blocks in 
> the process.

It's no more expensive than WAL checkpointing is today.

You could always do what LMDB does to reuse free blocks in a DB and
avoid having to checkpoint-and-rename-into-place, but that greatly
complicates locking because readers have to lock the transaction that
they are reading.

Nico


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 01:59:07PM -0500, Nico Williams wrote:
> The checkpoint process would look like this:
> 
>  - make a new file in the same directory
>  - copy the DB to the new file

The copy would basically be copying all the live data as a single
transaction on the new DB/WAL file.  At the end there should be at most
two transactions in the file: one for the schema, one for the data.

>  - rename the new file into place
>  - write the "closed, renamed" marker into the old file (which is still
>open)
> 
> This works on POSIX, and if you use the right CreateFileEx() options, it
> works on WIN32.

I'm referring here to readers that have the first file open...
continuing to be able to read from it as long as they retain the open
file handle, even after that file is deleted by the rename.


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 06:49:42PM +, Peter Da Silva wrote:
> On 8/4/17, 1:45 PM, "sqlite-users on behalf of Nico Williams" 
> <sqlite-users-boun...@mailinglists.sqlite.org on behalf of 
> n...@cryptonector.com> wrote:
> > SQLite3's WAL is already log-structured.  The main DB file isn't.
> > So SQLite3 is a hybrid.  But it doesn't have to be a hybrid.
> 
> One issue I see with this is you’ll have to retain the old WALs as
> long as they have any live data, or the checkpoint operation will have
> to copy all the unmodified data in the log to the new WAL, or you’ll
> have to keep a non-log structure containing all the relatively static
> data that hasn’t been modified in the last “N” checkpoints.

The checkpoint process would look like this:

 - make a new file in the same directory
 - copy the DB to the new file
 - rename the new file into place
 - write the "closed, renamed" marker into the old file (which is still
   open)

This works on POSIX, and if you use the right CreateFileEx() options, it
works on WIN32.

Nico


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 06:43:08PM +, Peter Da Silva wrote:
> On 8/4/17, 1:28 PM, "sqlite-users on behalf of Nico Williams" wrote:
> > [...]
> 
> A log-structured database, like a log-structured file system?

Yes.  Filesystems and databases are each other's dual.

SQLite3's WAL is already log-structured.  The main DB file isn't.  So
SQLite3 is a hybrid.  But it doesn't have to be a hybrid.

Nico


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 11:53:05AM -0500, Nico Williams wrote:
> WAL mode still ends up having no read concurrency when it's time to
> checkpoint the WAL.  The same would happen with this concept.  I don't
> think this works well.

Speaking of which... and I know I've mentioned this before and I risk
being a broken record...

Imagine a mode where there is only a WAL, and to checkpoint is to write
a new WAL with only live contents and... rename(2) into place.  Such a
mode would a) be a 100% Copy-on-Write (CoW) mode, whereas currently WAL
is only CoW until a checkpoint operation comes along, b) have better
read concurrency.  A special marker could be used to denote "this WAL is
closed and replaced with a checkpointed one", that way readers only have
to stat/re-open when they see this.

It probably wouldn't take much to write such a thing.

One nice thing about this is that block-level replication would be easy
and fast because all writes would be append-writes: just send missing
blocks until the file ID (st_dev, st_ino, and inode gen number) changes,
then send the whole new file.  (Not that block-level replication is an
end-all-be-all.  But that for some use cases it's very nice.  Logical
replication is better for other use-cases.)

Nico


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 08:04:38PM +0200, Sylvain Pointeau wrote:
> On Fri, Aug 4, 2017 at 6:24 PM, Nico Williams <n...@cryptonector.com> wrote:
> > On Fri, Aug 04, 2017 at 06:06:55PM +0200, Sylvain Pointeau wrote:
> > > Now I would like to implement the seq_currval:
> 
> we could implement it like (in pseudo code):
> 
> var currvalues = map[string,number]
> 
> function seq_nextval(seq_name) {
>   update sequence set val = val + inc where name = seq_name
>   nextval = select val from sequence where name = seq_name
>   currvalues[seq_name] = nextval
>   return nextval
> }
> 
> function seq_currval(seq_name) {
>   if currvalues[seq_name] is not set => raise error
>   return currvalues[seq_name]
> }

Sure, but the currvalues[] map has to be stored... in the DB handle.  You
could use a TEMP TABLE, but you'd have to make sure to DELETE all its
rows when a transaction ends.

> however I noticed that because I am retrieving the params as int64:
> '10a' becomes 10,
> 'abc' becomes 0
> null becomes 0
> 
> and I think that is ok, so the SEQ_INIT INTEGER NOT NULL can never be null
> and the SEQ_INC can never be 0 due to the check

Sure.

Nico


Re: [sqlite] Feature request: check for 'lib' prefix for load_extension()

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 10:55:57AM -0500, Bob Friesenhahn wrote:
> Sqlite does not really have a way to know if a module in the current
> directory (the directory which just happened to be current when the request
> was made) should be trusted.  To be secure, sqlite should insist that the
> load request be something other than a bare module name because then the
> responsibility is put on the user of sqlite.

You can always load an absolute path.  That said, using relative paths
and depending on the caller's run-path is not bad per se -- just
something to be aware of.


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 08:50:05AM +0200, Luc DAVID wrote:
> sqlite has WAL mode for better concurrency and this could maybe be used to
> extend the number of writers:
> 
> Do you think it would be possible to create a
> MyDb.WAL001...MyDb.WAL.002...MyDb.WAL.nnn when a write operation is
> currently running in order to allow more writers?
> 
> The sqlite engine would then take care of dealing with all the WAL files
> when reading data, backup...etc

WAL mode still ends up having no read concurrency when it's time to
checkpoint the WAL.  The same would happen with this concept.  I don't
think this works well.

Other approaches will work better.  One would be to have a server
process, with the SQLite3 API transparently using IPC to talk to it.  Or
else a single-process DB (connections from only one process allowed at
any time) where internal thread synchronization could be used to manage
locks (on rows/pages and a log used to ultimately put together a final
transaction in the background so that other threads can keep making
progress).

Doing this sort of thing with multiple processes gets tricky fast,
requiring shared memory, mutexes on shared memory, ...  IPC to a server
process makes everything much simpler, but there is an IPC performance
penalty.  The server process model works very well for PostgreSQL
though...

Nico


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 12:58:43PM +0100, Simon Slavin wrote:
> The problem you’re trying to fix is one of the big problems with
> distributed databases.  Nobody has found a good solution for it yet.

It's impossible to solve for the eventually-consistent types.  You just
have to explicitly handle... eventual conflicts... eventually.

For the rest, higher write concurrency is not the end of the world, but
you have to be aware of it and you have to be careful.  The RDBMS can
only do so much to protect you automatically -- the more it does, the
worse it will perform, but the less it does the more you have to be
aware and code defensively.

Nico


Re: [sqlite] Thinking about a way to extend the number of writers in WAL mode

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 09:33:31AM +0200, Eric Grange wrote:
> The main problem with multiple writers would be in conflict resolution,
> locking and deadlocks.

There is plenty in the literature about this.

You have to code more defensively, you may need things like "FOR
UPDATE", etc.

> Imagine a simple accounting transactions "update accounts set value = value
> + 1 where ..." if run at the same time from 2 threads (or close enough),
> then if you do not have conflict resolution in place you may end up
> increasing value by only 1

I write that like so:

  SELECT value
  FROM accounts
  WHERE ...; -- store this in a variable

  UPDATE accounts
  SET value = value + 1
  WHERE ... AND value = :value_read_earlier;

This is how one is supposed to get atomicity in LDAP too, incidentally.
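
In SQLite3 the application can tell whether its guarded UPDATE won by
checking changes(), and retry otherwise; a sketch:

  UPDATE accounts
     SET value = value + 1
   WHERE ... AND value = :value_read_earlier;
  SELECT changes(); -- 0 means a concurrent writer got there first:
                    -- re-read the value and try again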

So, if you were creating a SEQUENCE kind of thing (see the "sequencer"
thread this week too) then I'd return 1 + value_read_earlier.

This approach works well and is portable!  But it does require that you
think about concurrency :(

> 1) alice starts transaction
> 2) bob starts transaction
> 3) alice reads value, sees 10
> 4) bob reads value, sees 10
> 5) alice writes 10+1
> 6) bob writes 10+1 ooops

Yeah, "don't do that"; see above.

> And once you have more than one query, things become even more complicated.

Things are already this complicated if you use PostgreSQL, SQL Server,
Oracle, ...

There's nothing new here, except that plenty of code written with
SQLite3 in mind may need to be modified to be safe with higher write
concurrency enabled (so it probably shouldn't be enabled by default).

> Even among full relational DBs, few manage these correctly. IME only
> PostgreSQL and Firebird handle these correctly by default, for Oracle or
> MSSQL you have to use special locking modes and transaction options with
> significant performance penalties.

Yes, you have to be aware of varying synchronization/locking features
(e.g., "FOR UPDATE") and write concurrency semantics.

Nico


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 06:06:55PM +0200, Sylvain Pointeau wrote:
> Please find below the last source code, I removed the check on the table
> (NOT NULL on both sql_val and seq_inc)

Yeah, I saw.  I think this is another argument for SQLite3 needing a
strict-type mode!

(I very much prefer strong typing, though for EAV schemas it's very
nice to have a duck-typed V column.  But if CHECK constraints kill
performance, then strong typing via CHECK constraints is not viable.
I know, CHECK constraints don't get you strong typing in VIEWs and
queries in general, but it's good enough for my purposes.)

> I find that the performance is not that bad now.

Excellent.

> Now I would like to implement the seq_currval:
> 
> Oracle raises an exception because seq_nextval must be called at least one
> time for the session

I'm not sure how to implement this.  Maybe with the sessions extension.

But I don't think this is an important semantic in the context of
SQLite3 -- after all, it's "lite".

> H2 returns init minus seq (making nextval returning the init value) => this
> is also the trick I used so the update is simpler so more efficient

I don't follow.  What's H2?

> do you agree that we must use a variable mapping sequenceName => last
> nextval
>
> should I use a very simple structure (fixed sized array and O(n) access)?
> or use a map? but then I would rather prefer to switch to C++, unless you
> have a better idea?

I don't understand this.

> #include "sqlite3ext.h"
> SQLITE_EXTENSION_INIT1
> 
> void sp_seq_init(sqlite3_context *context, int argc, sqlite3_value **argv) {
>   int rc = 0;
>   sqlite3_stmt *stmt;
>   sqlite3 *db = sqlite3_context_db_handle(context);
> 
>   const unsigned char* seq_name = sqlite3_value_text(argv[0]);
>   sqlite3_int64 seq_init_val = sqlite3_value_int64(argv[1]);
>   sqlite3_int64 seq_inc_val = argc < 3 ? 1 : sqlite3_value_int64(argv[2]);
> 
>   rc = sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS SP_SEQUENCE ( " \
>   " SEQ_NAME TEXT PRIMARY KEY, " \
>   " SEQ_VAL INTEGER, " \
>   " SEQ_INC INTEGER " \
>   " )", 0, 0, 0);
> 
>   if( rc != SQLITE_OK ) {
> sqlite3_result_error(context, sqlite3_errmsg(db), -1);
> return;
>   }

You could still check the types here...  It won't kill performance --
after all, this is only at sequence init time.

Also, you could keep the CHECK constraints on the name and increment
columns, and not on the current columns.  Not sure it's worth it.

The rest looks good.

Also, I saw D. R. Hipp's discussion of increased write concurrency.
Ooops!  (Also, great news!)  Maybe you should change the code for the
next value to a loop like I described earlier (get the current value,
update to add 1 where the value hasn't changed, repeat until the update
statement updates one row, not zero).

Nico


Re: [sqlite] Feature request: check for 'lib' prefix for load_extension()

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 08:20:10AM -0700, J Decker wrote:
> On Fri, Aug 4, 2017 at 8:11 AM, Nico Williams <n...@cryptonector.com> wrote:
> > No, see, the ".so"/".dll" suffix is used in all cases, and it varies by
> > platform, so it's best if SQLite3 adds it so you can keep your code more
> > portable.  While the "lib" prefix is only ever needed if you want a
> > link-edit to find the thing as -l<name> -- which you almost never ever
> > want when <name> is a loadable module.
> 
> so does the lib prefix.  You cannot package a <name>.so in an android
> .apk, but you can include lib<name>.so.

Really?!  Oy.  But still, that would just mean that when building for an
Android platform SQLite3 *must* add the "lib" prefix, not that it should
try it on every Unix-like platform.

> At which point, because I apparently missed that too.  if one is using
> CMake, you get a lib<name>.so without setting additional options.

So?

Nico


Re: [sqlite] Feature request: check for 'lib' prefix for load_extension()

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 10:17:33AM +0200, Dominique Devienne wrote:
> On Fri, Aug 4, 2017 at 2:46 AM, Nico Williams <n...@cryptonector.com> wrote:
> > You're mistaken.
> >
> > lib<name>.so is NOT "the default naming scheme on many *nix platforms".
> >
> > lib<name>.so is the naming scheme when you want the link-editor (e.g.,
> > ld(1)) to find a library using -l<name>.
> >
> > But loadable modules are almost never meant to be used that way.
> > They're usually meant to be used only through dlopen() and friends.
> 
> While you're technically correct, Matt's request seems completely
> reasonable to me.

Not if it's a result of being mistaken.  Now that OP knows about object
naming, he can reconsider and restate his request.

> If SQLite wasn't doing *any* changes to the filename, not adding the
> extension for example, you may have had a point, but since it does,
> trying with the lib prefix on Linux, which is undeniably a common

No, see, the ".so"/".dll" suffix is used in all cases, and it varies by
platform, so it's best if SQLite3 adds it so you can keep your code more
portable.  While the "lib" prefix is only ever needed if you want a
link-edit to find the thing as -l<name> -- which you almost never ever
want when <name> is a loadable module.

> naming convention on Linux, seems like a worthwhile addition. [...]

You didn't understand.

> I really don't see what's controversial with Matt's request :)

a) it's borne of a mistake
b) it's not necessary
c) it's more code and more docs
d) it's more likely to lead to accidents

> It's not like load-extension is a performance-critical operation, that
> trying an extra load is that expensive.

This is true, but also irrelevant :)

> And the security consideration that an "attacker" could make it load
> his own library instead, but using a different name tried before the
> actual one is moot IMHO, since extension loading is by nature unsafe.
> 
> In short, I support Matt's request and I hope DRH considers it seriously.
> FWIW :). --DD

What problems does this solve?  None.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: check for 'lib' prefix for load_extension()

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 06:05:53AM +, Hick Gunter wrote:
> >From: sqlite-users [mailto:sqlite-users-boun...@mailinglists.sqlite.org]
> >On Behalf Of Nico Williams
> >But loadable modules are almost never meant to be used that way.
> >They're usually meant to be used only through dlopen() and friends.
> 
> Which other method apart from dlopen() would you recommend for
> dynamically loaded extensions?

There's only two methods for loading an object dynamically: because you
demanded it at link-edit-time, or because you used dlopen() or similar.

There are no others, full stop.

(You could write your own run-time loader, but you'd still be
implementing a dlopen().)

> We are using virtual tables to interface with diverse data stores and
> make them queryable with SQL. The general interactive shell needs
> access to a certain subset of functions, and some speciality tools
> have dedicated loadable extensions that should not be accessible
> outside of that scope.

So?

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-04 Thread Nico Williams
On Fri, Aug 04, 2017 at 03:35:36AM +, Sylvain Pointeau wrote:
> On Fri, Aug 4, 2017 at 02:42, Nico Williams <n...@cryptonector.com> wrote:
> > >   sqlite3_prepare_v2(db, "insert or replace into sp_sequence (seq_name,
> > > seq_val, seq_inc) values (?, ?, ?)", -1, &stmt, 0);
> >
> > Should this init function re-initialize sequences if called more than
> > once for the same sequence name?
> 
> yes I did it on purpose

I think it would be too easy to reset sequences, but that's not
something one wants to do often.

> > Anyways, no change needed here.
> 
> should I use the mutex lock to ensure atomicity?

No.  It can't help.  As I said, no change is needed here.

> > Nice!  Did you test it?
> 
> yes! I tested it with mingw compiler (and cmake), and the sqlite3 shell to
> test the seq functions.
> 
> I will do the code update and will probably create a github repository

Excellent!

> thanks a lot for your review.

Thank you for taking this on!

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-03 Thread Nico Williams
On Thu, Aug 03, 2017 at 07:42:12PM -0500, Nico Williams wrote:
> On Fri, Aug 04, 2017 at 12:35:56AM +0200, Sylvain Pointeau wrote:
> I think some type checking should be done.
> 
> You could just take the argv[] values and bind them directly to the
> insert below, and use CHECK constraints on the SP_SEQUENCE table.

This seems best, actually, partly because you can't hide this table and
with CHECK constraints you get to make sure that users use the right
types in these columns.

I do wish SQLite3 had a strict typing mode where the declared type of a
column is automatically made into a CHECK constraint.
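
Something like this is what I'd want it to be equivalent to (a sketch
using typeof() in CHECK constraints, on the table from this thread):

  CREATE TABLE IF NOT EXISTS SP_SEQUENCE (
    SEQ_NAME TEXT NOT NULL PRIMARY KEY CHECK (typeof(SEQ_NAME) = 'text'),
    SEQ_VAL INTEGER NOT NULL CHECK (typeof(SEQ_VAL) = 'integer'),
    SEQ_INC INTEGER NOT NULL CHECK (typeof(SEQ_INC) = 'integer'));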

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request: check for 'lib' prefix for load_extension()

2017-08-03 Thread Nico Williams
On Tue, Aug 01, 2017 at 10:56:47AM -0700, Matt Chambers wrote:
> load_extension() has the very sensible behavior of:
> > So for example, if "samplelib" cannot be loaded, then names like
> > "samplelib.so" or "samplelib.dylib" or "samplelib.dll" might be tried
> > also.
> 
> I would like to see that extended to include "libsamplelib.so" since that is
> the default naming scheme on many *nix platforms. This simple change would
> allow me to use the same base library name for my extension on both Windows
> and Linux. Otherwise I have to modify my build system to override its
> default behavior of adding the lib prefix on Linux.

You're mistaken.

lib<name>.so is NOT "the default naming scheme on many *nix platforms".

lib<name>.so is the naming scheme when you want the link-editor (e.g.,
ld(1)) to find a library using -l<name>.

But loadable modules are almost never meant to be used that way.
They're usually meant to be used only through dlopen() and friends.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-03 Thread Nico Williams
On Fri, Aug 04, 2017 at 12:35:56AM +0200, Sylvain Pointeau wrote:
> void sp_seq_init(sqlite3_context *context, int argc, sqlite3_value **argv) {
>   int rc = 0;
>   sqlite3_stmt *stmt;
>   sqlite3 *db = sqlite3_context_db_handle(context);

If you use sqlite3_create_function*() with nArg == -1 then you can have
just one UDF from sp_seq_init().  You'd have to check argc here though.

>   const unsigned char* seq_name = sqlite3_value_text(argv[0]);
>   long seq_init_val = sqlite3_value_int64(argv[1]);

seq_init_val could be optional and default to 0 or maybe 1.

>   long seq_inc_val = argc < 3 ? 1 : sqlite3_value_int64(argv[2]);

I think some type checking should be done.

You could just take the argv[] values and bind them directly to the
insert below, and use CHECK constraints on the SP_SEQUENCE table.

Or you could use sqlite3_value_type() to check that each argument is of
the expected value.

I like the idea of using a CHECK constraint.  It simplifies the C code
by having SQLite3 do more of the work.

>   rc = sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS SP_SEQUENCE ( " \
>   " SEQ_NAME TEXT NOT NULL PRIMARY KEY, " \
>   " SEQ_VAL INTEGER NOT NULL, " \
>   " SEQ_INC INTEGER NOT NULL " \
>   " )", 0, 0, 0);

You might want to add WITHOUT ROWID here (unless you want this to work
with older versions of SQLite3).  You could make this conditional on a C
pre-processor macro.

>   if( rc != SQLITE_OK ) {
> sqlite3_result_error(context, sqlite3_errmsg(db), -1);
> return;
>   }
> 
> 
>   sqlite3_prepare_v2(db, "insert or replace into sp_sequence (seq_name,
> seq_val, seq_inc) values (?, ?, ?)", -1, &stmt, 0);

Should this init function re-initialize sequences if called more than
once for the same sequence name?

Because that's what INSERT OR REPLACE will do (since on conflict it will
delete the old row and insert a new one).  If you don't mean to
re-initialize then use INSERT OR IGNORE.

(When I want INSERT OR UPDATE I just run two statements:

UPDATE .. WHERE ..;
INSERT OR IGNORE ..;)
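
For instance, a concrete sketch for the sequences case (parameter names
illustrative):

  UPDATE sp_sequence SET seq_val = :init, seq_inc = :inc
   WHERE seq_name = :name;
  INSERT OR IGNORE INTO sp_sequence (seq_name, seq_val, seq_inc)
  VALUES (:name, :init, :inc);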

>   sqlite3_bind_text(stmt, 1, seq_name, -1, SQLITE_STATIC);
>   sqlite3_bind_int64(stmt, 2, seq_init_val);
>   sqlite3_bind_int64(stmt, 3, seq_inc_val);
> 
>   rc = sqlite3_step(stmt);
> 
>   sqlite3_finalize(stmt);
> 
>   if (rc != SQLITE_DONE) {
> sqlite3_result_error(context, sqlite3_errmsg(db), -1);
> return;
>   }
> 
>   sqlite3_result_int64( context, seq_init_val );
> }
> 
> void sp_seq_nextval(sqlite3_context *context, int argc, sqlite3_value
> **argv) {
>   int rc = 0;
>   sqlite3_stmt *stmt;
>   sqlite3 *db = sqlite3_context_db_handle(context);
>   long seq_val = 0;
> 
>   const unsigned char* seq_name = sqlite3_value_text(argv[0]);
> 
>   sqlite3_prepare_v2(db, "update sp_sequence set seq_val = seq_val +
> seq_inc where seq_name = ?", -1, &stmt, 0);
> 
>   sqlite3_bind_text(stmt, 1, seq_name, -1, SQLITE_STATIC);
> 
>   rc = sqlite3_step(stmt);
> 
>   sqlite3_finalize(stmt);
> 
>   if (rc != SQLITE_DONE) {
> sqlite3_result_error(context, sqlite3_errmsg(db), -1);
> return;
>   }
> 
>   sqlite3_prepare_v2(db, "select seq_val from sp_sequence where seq_name =
> ?", -1, &stmt, 0);
> 
>   sqlite3_bind_text(stmt, 1, seq_name, -1, SQLITE_STATIC);



If ever SQLite3 got higher write concurrency (it almost certainly won't,
but derivatives might), then a simple way to ensure atomicity here might
be to do the SELECT first then an UPDATE with a WHERE seq_val =
:the_value_just_read, and loop as necessary.

do {
    current_value = <SELECT the current seq_val>;
    <UPDATE ... WHERE ... AND seq_val = current_value>;
} while (<the UPDATE changed zero rows>);

Though if one is going to have higher write concurrency then one ought
to have SELECT .. FOR UPDATE.

Anyways, no change needed here.



>   rc = sqlite3_step(stmt);
> 
>   if( rc == SQLITE_ROW) {
> seq_val = sqlite3_column_int64(stmt, 0);
>   }
> 
>   sqlite3_finalize(stmt);
> 
>   if (rc != SQLITE_ROW) {
> if( rc == SQLITE_DONE ) sqlite3_result_error(context, "sequence name
> does not exist", -1);
> else sqlite3_result_error(context, sqlite3_errmsg(db), -1);
> return;
>   }
> 
>   sqlite3_result_int64( context, seq_val );
> }
> 
> 
> int sqlite3_extension_init(
>   sqlite3 *db,
>   char **pzErrMsg,
>   const sqlite3_api_routines *pApi
> ){
>  SQLITE_EXTENSION_INIT2(pApi)

I'd create just one UDF using sp_seq_init(), make the nArg -1, and check
the argc counts in sp_seq_init().

>  sqlite3_create_function(db, "seq_init_inc", 3, SQLITE_UTF8, 0,
> sp_seq_init, 0, 0);
>  sqlite3_create_function(db, "seq_init", 2, SQLITE_UTF8, 0, sp_seq_init, 0,
> 0);

If you made seq_nextval() have nArg == -1 then you could have a default
sequence for the no-arguments case...  You don't have to -- you could
say that's just creeping featuritis!

>  sqlite3_create_function(db, "seq_nextval", 1, SQLITE_UTF8, 0,
> sp_seq_nextval, 0, 0);
>  return 0;
> }

Nice!  Did you test it?
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org

Re: [sqlite] Problem with the new pointer-passing interface

2017-08-03 Thread Nico Williams
On Thu, Aug 03, 2017 at 08:33:31PM +0200, Ulrich Telle wrote:
> > Rhetorical: Why not use the pointer value itself then instead of the
> > contents of the string?  After all, the string should just be a .text
> > section constant string...
> 
> The SQLite developer team chose a string representation for the
> pointer type to make naming collisions less likely.

So, that was rhetorical :)

> > Now, suppose that you're building FFI bindings for SQLite3 for some
> > programming language, e.g., Python.  So you have a string from the
> > run-time for this language, but you can't ensure that it will have the
> > necessary lifetime.  So now you have to make a copy.  It's a bit of a
> > pain.  SQLite3 could have made a copy itself.
> 
> I would have preferred that, too, but I learned that this would have
> imposed a runtime penalty even for applications not using the new
> pointer interface. 

That seems unlikely.  Basically the connection has to have an array/list
of types, which can be empty/null when the pointer passing interface is
not used.

> > On the flip side, it's possible that you have to do some conversions
> > anyways because SQLite3 deals with C strings and the target language
> > run-time deals with counted-codeunit strings or whatever.
> 
> The conversion is not the problem. The problem is that SQLite requires
> that the pointer type string has a life span at least as long as the
> prepared statement referring to it. And this requires that the
> application takes care of that.

I meant that the author of some bindings for SQLite3 might have to
perform conversions at their layer.  If you have to do that, then
allocating your own copy (and keeping track of it so you can release it
atexit() or whatever) is not that big a deal, which is why I concluded
with this:

> > So even if SQLite3 made a copy, the FFI bindings might have to make
> > their own copy anyways.  Thus: who cares :)
> 
> The problem is to keep the copy alive long enough. However, the
> problem is not that big. It can be solved in less than 50 lines of
> code.

It's easier for SQLite3 to do that since it has a suitable structure for
that: the connection associated with the statement.

Whereas for an FFI bindings of SQLite3 it can be much harder, since now
you'll need a list/table of these strings to free atexit() or when the
DLL is unloaded, or whatever is appropriate.

> > Still, for many cases it will be easier to write code to this API if
> > SQLite3 makes its own copy.
> 
> Yes, but Richard Hipp made it quite clear that the latter is unlikely
> to happen.

Maybe I've made a strong enough argument that he might revisit that
decision.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-03 Thread Nico Williams
On Thu, Aug 03, 2017 at 06:59:44PM +0300, Alek Paunov wrote:
> On 2017-08-02 20:24, Nico Williams wrote:
> >I've implemented "inheritance" with triggers to map DMLs on "derived"
> >tables onto "base" tables.  That works and is much more general.  If you
> >need a rowid, however, the triggers have to do more work, first acquring
> >the rowid from the base table, then setting it on the derived table
> >rows, and this can get tricky.
> 
> Yes. I meant exactly that usecase - where many logically "subclass" tables
> share common "address" (rowid) space, so that, any object (or the rest of
> the DB) could "reference" objects in the whole hierarchy (like unified value
> references in script languages data-models).

It's so darned helpful that one might expect SQL to support this.  But
no :(

PG's INHERIT fails to inherit PRIMARY KEY and UNIQUE constraints!
Inheriting those would have to also share the same namespaces, else
INHERIT would be useless still.

FOREIGN KEYs generally compile into triggers internally.  INHERIT could
as well, though there is more scope for space optimization here, as the
trigger-based scheme I've used ends up creating a fair bit of
duplication.  It's "just a matter of code" :)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Problem with the new pointer-passing interface

2017-08-03 Thread Nico Williams
On Thu, Aug 03, 2017 at 12:33:05PM +0300, Paul wrote:
> To me it seems like that string is very tightly coupled with the
> actual pointer being bound.  I think it's a good idea, in case you
> cannot make it a literal or static, to keep it with an object whose
> pointer you bind.

Rhetorical: Why not use the pointer value itself then instead of the
contents of the string?  After all, the string should just be a .text
section constant string...

Now, suppose that you're building FFI bindings for SQLite3 for some
programming language, e.g., Python.  So you have a string from the
run-time for this language, but you can't ensure that it will have the
necessary lifetime.  So now you have to make a copy.  It's a bit of a
pain.  SQLite3 could have made a copy itself.

On the flip side, it's possible that you have to do some conversions
anyways because SQLite3 deals with C strings and the target language
run-time deals with counted-codeunit strings or whatever.

So even if SQLite3 made a copy, the FFI bindings might have to make
their own copy anyways.  Thus: who cares :)

Still, for many cases it will be easier to write code to this API if
SQLite3 makes its own copy.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-03 Thread Nico Williams
On Thu, Aug 03, 2017 at 12:25:00PM +0200, Sylvain Pointeau wrote:
> On Thu, 3 Aug 2017 at 08:04, Hick Gunter  wrote:
> > A sequence is very easily implemented as a virtual table that keeps the
> > current values in a separate table my_sequences (name text primary key,
> > initial integer, current integer, increment integer).
> >
> > (...) Or whatever else tickles your fancy. Just think table <=> class,
> > virtual field <=> method
> 
> What do you all recommend? UDF or Virtual tables?

I recommend a UDF because that's how it works in PG, so your code will
be more portable that way.  That said, the two approaches are basically
equivalent.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_prepare16_v3 and prepFlags

2017-08-02 Thread Nico Williams
On Wed, Aug 02, 2017 at 11:01:07PM +0100, Bart Smissaert wrote:
> Using 3.20.0 now on Windows and wonder when exactly I should use the
> SQLITE_PREPARE_PERSISTENT flag instead of a zero. I have tried both
> options with a plain select statement producing some 10 rows. Both
> worked and SQLITE_PREPARE_PERSISTENT seemed a bit faster. Are there
> any clear guidelines when to use either option?
> Note I am not using FTS3, FTS5 or the R-Tree extension.

I thought the docs were clear: if you'll be executing the statement
repeatedly, then use SQLITE_PREPARE_PERSISTENT.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Pointers in 3.20.0 are great! Dreaming of better

2017-08-02 Thread Nico Williams
Another thing is that with serializers the shell could automatically
serialize on output if desired.

Also, the callbacks could be attached to a "type", and perhaps that way
you could preserve the APIs you already added for pointer values.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-02 Thread Nico Williams
On Wed, Aug 02, 2017 at 06:46:55PM +, Sylvain Pointeau wrote:
> if I do
> 
> insert into mytable(f)
> select nextval("myseq") from T
> 
> is it really possible to make an update into that nextval function? I don't
> think so since only one statement can be run at the same time if sqlite is
> in serialized mode.

Yes, UDFs can run arbitrary SQL statements on the same connection.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-02 Thread Nico Williams
On Wed, Aug 02, 2017 at 06:10:52PM +, Sylvain Pointeau wrote:
> for a general case, I would need to persist the counter into a table (for a
> specified sequencer) and doing the nextval inside a mutex lock
> 
> Is it possible to insert/ select from a UDF if the statements are
> serialized? or should I use the virtual tables? (should we store the
> sequencers in a different DB?)

In SQLite3 all writes in transactions are serialized.  No locks needed
because there's just one big lock around the entire DB.  I recommend you
read up on SQLite3's transactions and locking.

A next_serial() UDF would basically be a C-coded (or Perl, or whatever)
function that uses the SQLite3 API to first run an UPDATE on the
sequence then a SELECT to get the now-next value, and would return that.

If you use the INSTEAD OF trigger approach, then the same applies,
except that the triggers will be SQL-coded (which is nice, IMO).

This is all perfectly safe in the current SQLite3 concurrency model
(just one writer at a time).  I don't think SQLite3's write concurrency
will ever get better, but I suppose one never knows!

If you were using an RDBMS with higher write concurrency then you'd need
to be more careful and arrange for synchronization.  Usually such
RDBMSes provide builtin next_serial()-style functions anyways.

As to your last question, I'd put the sequences table in the same DB,
unless you need to attach multiple DBs and have all of them share
sequences, in which case I'd make a DB just for the sequences, or else
put them in the main DB.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-02 Thread Nico Williams
On Wed, Aug 02, 2017 at 05:41:38PM +0100, Simon Slavin wrote:
> On 2 Aug 2017, at 5:35pm, Nico Williams <n...@cryptonector.com> wrote:
> > On Wed, Aug 02, 2017 at 04:56:34PM +0100, Simon Slavin wrote:
> >> Can someone explain ?
> > 
> > They make it easy to have N tables with the same rowid namespace, for
> > example.  So you could do something like:
> > 
> >  ...
> 
> Okay, I understand that.  Thanks, Nico.
> 
> So the reason I didn’t understand the need for sequences is that
> someone who had been using SQLite for a long time would never come up
> with the concept.  It’s just not needed.  You’d just create individual
> tables, each with their own AUTOINC key as normal, then key into them
> using a VIEW or FOREIGN KEY.  The problem solved by SEQUENCEs never
> arises.

Sometimes you have external constraints on your schema and so don't have
the freedom to do what you propose.  I guess it can happen that you've
never had to experience that, but I have.

> This gives us a problem with Peter’s original question, because it
> seems unlike that implmenting sequences with SQLite is common enough
> that we have a ready solution.

Well, if no one's done the UDF thing, maybe OP can do it and maybe open
source it.  I mean, it's pretty simple.  Alternatively OP can use
triggers as discussed.

Another thing I often do with SQLite3 is to create VIEWs with INSTEAD OF
triggers that do the right thing.  Something like:

  CREATE TABLE sequences (...); -- see previous post
  CREATE TABLE real_thing (...);
  CREATE VIEW  thing AS
  SELECT * FROM real_thing;
  CREATE TRIGGER thing_ins INSTEAD OF INSERT ON thing
  FOR EACH ROW
  BEGIN
UPDATE sequences
SET last = last + 1
WHERE name = 'foo';
INSERT INTO real_thing (...)
SELECT (SELECT last FROM sequences WHERE name = 'foo'),
   NEW.column_1, ...;
  END;
  CREATE TRIGGER thing_upd INSTEAD OF UPDATE ON thing
  FOR EACH ROW
  BEGIN
UPDATE real_thing
SET ...
WHERE ...;
  END;
  CREATE TRIGGER thing_del INSTEAD OF DELETE ON thing
  FOR EACH ROW
  BEGIN
DELETE FROM real_thing
WHERE ...;
  END;

I often basically use VIEWs in SQLite3 as a form of SQL-coded stored
procedures, with NEW.* / OLD.* being the "function"'s arguments.

This can get very verbose and somewhat tedious though.  I've used SQL to
generate all of these VIEWs and INSTEAD OF TRIGGERs in order to reduce
the amount of code to hand-maintain.  (I also do similar things with PG:
https://github.com/twosigma/postgresql-contrib .)

Since it's not possible to have an INSERT output a result, nor can you
have a SELECT on such a VIEW have a side-effect, it's not possible to
write a sequences VIEW that you can then use like this:

  -- Doesn't work because SELECT on VIEWs can't have side-effects (not
  -- without side-effect-having UDFs in the VIEW's definition):
  INSERT INTO real_thing (...)
  SELECT (SELECT next FROM seq WHERE name = 'foo'), ...;

Without a next_serial() UDF one has to resort to the triggers discussed
earlier.  So there's a limit to what one can do with this technique if
you also appreciate and want elegance (which I do).

An actual next_serial()-type function would be very nice.

In short, SQLite3 basically has stored procedures.  It's just missing
syntactic sugar for them.

It really does help to not be afraid of using SQL as a programming
language.  Many are allergic to SQL as a programming language -- those
poor souls often end up using ORMs and pay the price for it later.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-02 Thread Nico Williams
On Wed, Aug 02, 2017 at 07:48:52PM +0300, Alek Paunov wrote:
> On 2017-08-02 18:23, Sylvain Pointeau wrote:
> ...
> >
> >CREATE SEQUENCE IF NOT EXISTS SEQ_1 START WITH 12123;
> >
> >insert into MYTABLE(SPECIFIED_NUMBER, OTHERINFO) values (seq_1.nextval,
> >'other info')
> >
> 
> BTW, your request is somewhat related with the brand new union-vtab SQLite
> extension [1] (for optimized union view of identical tables) - if we have
> the basic PostreSQL nextval in SQLite, the following pattern would become
> possible:

I find the union vtab thing mostly not useful because it requires
constraining the tables to be union'ed to have distinct ranges of
rowids.  This is of too narrow utility.

PostgreSQL-style sequence support would be much more general.

If the union vtab thing is aiming to make it easier to implement
something like table inheritance, I'll warn you right off that
PostgreSQL's INHERIT is utterly useless -- do not copy it.

I've implemented "inheritance" with triggers to map DMLs on "derived"
tables onto "base" tables.  That works and is much more general.  If you
need a rowid, however, the triggers have to do more work, first acquiring
the rowid from the base table, then setting it on the derived table
rows, and this can get tricky.
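
A minimal sketch of that scheme, for the simple case (names are
illustrative):

  CREATE TABLE base (id INTEGER PRIMARY KEY, kind TEXT);
  CREATE TABLE derived_data (id INTEGER PRIMARY KEY REFERENCES base(id),
                             extra TEXT);
  CREATE VIEW derived AS
  SELECT b.id, d.extra FROM base b JOIN derived_data d USING (id);
  CREATE TRIGGER derived_ins INSTEAD OF INSERT ON derived
  FOR EACH ROW
  BEGIN
    INSERT INTO base (kind) VALUES ('derived');    -- acquire the rowid
    INSERT INTO derived_data (id, extra)
    VALUES (last_insert_rowid(), NEW.extra);       -- set it on the derived row
  END;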

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sequencer

2017-08-02 Thread Nico Williams
On Wed, Aug 02, 2017 at 04:56:34PM +0100, Simon Slavin wrote:
> On 2 Aug 2017, at 4:54pm, Peter Da Silva  
> wrote:
> > Can’t you do the same basic logic then use (SELECT value FROM
> > super_sequences WHERE id=’SEQ_1’) instead of SEQ_1.nextval?
> 
> Actually, I don’t understand how sequences are superior to normal use
> of an AUTOINC column.  Can someone explain ?

Think of sequences as non-deterministic functions.  (They are actually
deterministic, but using hidden state, so from the engine's perspective
they are non-deterministic.)

They make it easy to have N tables with the same rowid namespace, for
example.  So you could do something like:

  CREATE SEQUENCE foo START WITH 0 INCREMENT BY 5;
  CREATE TABLE t1 (rowid integer primary key default (next_serial(foo)), ...);
  CREATE TABLE t2 (rowid integer primary key default (next_serial(foo)), ...);
  CREATE TABLE t3 (rowid integer primary key default (next_serial(foo)), ...);

and have those three tables share a rowid namespace -- there will be no
collisions between them.  The way one might handle this use case in
SQLite3 is to create one master table with autoincrement and then have
foreign key references onto it from the others; you'd insert into the
master table first then into the referring table using
last_insert_rowid() or a sub-query that returns the same.
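
Roughly like so (a sketch; names illustrative):

  CREATE TABLE ids (id INTEGER PRIMARY KEY AUTOINCREMENT);
  CREATE TABLE t1 (id INTEGER PRIMARY KEY REFERENCES ids(id), a TEXT);
  CREATE TABLE t2 (id INTEGER PRIMARY KEY REFERENCES ids(id), b TEXT);

  INSERT INTO ids DEFAULT VALUES;
  INSERT INTO t1 (id, a) VALUES (last_insert_rowid(), 'x');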

There are other workarounds too, as others have mentioned (triggers,
...).

One could always create a UDF that does an UPDATE and SELECT behind the
scenes.  Then one would create a table like so:

  CREATE TABLE sequences (name TEXT PRIMARY KEY,
  start INTEGER DEFAULT (0),
  increment INTEGER DEFAULT (1),
  last INTEGER DEFAULT (0)) WITHOUT ROWID;

and one would create sequences by INSERTing rows into that table, and
the UDF would just do the obvious

  UPDATE sequences SET last = last + 1 WHERE name = _name_argument;

then

  SELECT last FROM sequences WHERE name = _name_argument;

and return that.

Such a UDF probably exists as open source soemwhere, so OP should look
around for it.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Pointers in 3.20.0 are great! Dreaming of better

2017-08-01 Thread Nico Williams
http://sqlite.org/bindptr.html

Great stuff.  This will make it possible to, e.g., have JSON functions
that keep parsed JSON texts... parsed for as long as possible.  The
user's SQL will have to invoke an encoder function to encode parsed
JSON texts back to actual text, sadly, but that's not a very big deal.

I'd like to propose making this slightly (a lot) better as follows:

a) add a serializer callback function (not just a destructor)

When SQLite3 would need to store a pointer value, or when a CAST is
used on a pointer value, SQLite3 would invoke the serializer to
produce a value of the normal types supported by SQLite3 (e.g, text,
blob, ...).

b) allow one to define new SQL types as an existing type with an
associated parser function that produces a pointer value.

This would entail a sqlite3 C API for creating SQL "types" and
associating with them: a pointer type name, a parser, a serializer,
and a destructor.  (This would also enable a simpler
sqlite3_bind/result/value_pointer() API that doesn't need the
destructor to be provided.)

c) maybe it'd be nice to have a void * context data argument for the
parser, serializer, and destructor functions.

These two enhancements would make it possible to transparently use
JSON in SQLite3 with JSON texts kept parsed as long as possible, thus
reducing the amount of parsing and encoding.

A particularly interesting use of this for me would be SQLite3
bindings for https://stedolan.github.io/jq.  Imagine a jq() function
(including aggregate and table-valued kinds of functions).  One could:

-- select the first foo key's value anywhere in the given JSON
SELECT t.name, jq('..?|.foo', t.json_column) AS result FROM my_table t;

-- select all the foo keys' values anywhere in the given JSON (here
jq() is a table-valued function)
SELECT r.name, r.result FROM (SELECT t.name AS name, jq('..?|.foo',
t.json_column) AS result FROM my_table t) r;

-- aggregate into a JSON array (yes, SQLite3 already has this, I'm
aware, but this is just a simple example)
SELECT t.name, jq('[inputs]', t.foo) FROM my_table t GROUP BY t.name;

I hope it's not too late for this idea.

Thanks,

Nico
--
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Datatype for prices (1,500)

2016-11-30 Thread Nico Williams
On Wed, Nov 30, 2016 at 02:22:42PM -0600, John McKown wrote:
> On Wed, Nov 30, 2016 at 9:46 AM, Chris Locke  wrote:
> > Storing as integers is the way to go.
> 
> At present, I think that is the best way. Of course, we could argue that
> every RDMS "should" implement Decimal Floating Point.

You could argue that every programming language needs that.  What makes
SQL more special than the others in this regard?

There are reasons we use IEEE754: it's fixed-sized, it's built-in pretty
much everywhere, and it's usually implemented in hardware, so it's fast.
Those are fantastic selling points for IEEE754.  There are reasons we
all dislike it too...

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Datatype for prices (1,500)

2016-11-30 Thread Nico Williams
On Wed, Nov 30, 2016 at 04:39:25PM +0100, Werner Kleiner wrote:
> yes thats right, but it would be nice to have same behavior between
> MySQL and sqlite

SQLite3 has very few types.  Adding types is very difficult, so I doubt
new types will be added anytime soon.

Price data and datetime data are the ones most commonly requested.
SQLite3 does have suitable types for representing them, but not quite
the native types that you and others want.  It's a fact you must accept
or else not use SQLite3.
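
E.g., the usual workaround for prices is INTEGER cents, rendering only
on output (a sketch):

  CREATE TABLE items (name TEXT, price_cents INTEGER NOT NULL);
  INSERT INTO items VALUES ('widget', 1500);  -- 15.00
  SELECT name, printf('%d.%02d', price_cents / 100, price_cents % 100)
    FROM items;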

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-28 Thread Nico Williams
On Mon, Nov 28, 2016 at 07:45:16PM +, Howard Chu wrote:
> Nico Williams wrote:
> >One of the problems with LMDB's locks is that readers need write
> >permission to the lock file, IIRC :( but at least it's a separate file.
> 
> We developed an alternate locking protocol for Postfix. It just uses
> standard fcntl() locks.

Yeah, I'm aware.

> http://www.postfix.org/lmdb_table.5.html
> 
> Of course nothing comes for free - with this approach, writers don't block
> readers, but readers block writers. That's the best you're going to get
> without custom lock protocols like LMDB uses natively.

TANSTAAFL, I know.  A proper COW DB with coalesce-and-rename-into-place
would need no locking of any kind for readers, at the price of needing
more storage and more I/O to do the vacuuming.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Read-only access which does not block writers

2016-11-28 Thread Nico Williams
On Fri, Nov 25, 2016 at 09:08:37AM +0100, Florian Weimer wrote:
> I think you misunderstood what I was asking (see the start of the thread).
> I need to support Byzantine readers which do not follow the locking
> protocol.  Based on the documentation, LMDB uses locks to implement MVCC and
> prevent premature page reuse.

There's a good use case for an LMDB-like COW DB such that readers need
never lock.  That would require vacuuming by writing a new file and
renaming it into place, which is a trade-off.

One of the problems with LMDB's locks is that readers need write
permission to the lock file, IIRC :( but at least it's a separate file.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner fails to recognise efficient strategy when '=' condition gives a strong hint

2016-11-17 Thread Nico Williams
On Thu, Nov 17, 2016 at 04:48:20PM +0200, Paul wrote:
> Replacing JOIN does not help either:
> 
> sqlite> EXPLAIN QUERY PLAN SELECT foo.id FROM bar JOIN foo ON bar.foo = 
> foo.id ORDER BY id DESC LIMIT 0, 40;
> selectid  order  from  detail
> --------  -----  ----  -----------------------------------------------------
> 0         0      1     SCAN TABLE foo
> 0         1      0     SEARCH TABLE bar USING INTEGER PRIMARY KEY (rowid=?)
> 
> My guess is it's because there are no sqlite_stat* tables and SQLite
> doesn't know that bar scan is more efficient.  That's why there was a
> LEFT JOIN in the first place, but as it seems, it wasn't that good
> idea.

If you want the ORDER BY to not sort (either as results are generated or
after, but either way before the application sees a single result) then
you kinda need to scan foo first.  Is there any other way?  I don't
think so, but correct me!
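
One way to ask for that plan explicitly is CROSS JOIN, which SQLite3
will not reorder (a sketch):

  SELECT foo.id FROM foo CROSS JOIN bar ON bar.foo = foo.id
   ORDER BY foo.id DESC LIMIT 0, 40;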

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner fails to recognise efficient strategy when '=' condition gives a strong hint

2016-11-17 Thread Nico Williams
On Thu, Nov 17, 2016 at 09:54:01AM -0500, Richard Hipp wrote:
> Standard SQL requires an implied NOT NULL on all PRIMARY KEY columns.
> But due to a coding error, early versions of SQLite did not enforce
> that, and so we have taken care not to enforce it on all subsequent
> versions of SQLite to preserve backwards compatibility.
> 
> WITHOUT ROWID tables were added later, and so NOT NULL is properly
> enforced on all PRIMARY KEY columns in WITHOUT ROWID tables.

I've noticed this, and I'm quite happy about it.

A [persistent] pragma to make SQLite3 adhere more strictly to the
standard would be nice.  You could slowly add more strictness.  Anyone
who turns on strictness would be getting what they asked for as
subsequent add more.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing ID's to UUID

2016-11-16 Thread Nico Williams
On Wed, Nov 16, 2016 at 02:51:35PM -0600, Jeffrey Mattox wrote:
> I think this discussion is about apples and oranges.  UUID stands for
> universally UNIQUE identifier, so there won't be any collisions.  It
> looks random, but it never repeats.  [...]

No, DRH is right that this depends on how good your entropy source (and,
typically, PRNG fed that entropy) is.  Nothing about putting
"universally unique" in the name makes it so -- only the details of how
the sausage is made can take care of it.

Regardless, using UUIDs to make a distributed DB is not that great.  It
does work though.  Microsoft's Active Directory (AD), for example, uses
96-bit UUID-like values to form "domain SIDs", with user, group, and
other SIDs being formed by adding a 32-bit "relative ID" to the domain
SID.  This has worked rather well for MSFT, and it has allowed the
creation of "forests" of domains and forests of forests.  I do think AD
checks SID uniqueness within each forest, and IIRC there's a way to
handle SID collisions in forests of forests.

Uniqueness checks are not too expensive when they are feasible at all.

In the AD forest case they are feasible, while in the forest of forests
case they are not.

The alternative to randomly-generated IDs would be to have a global
registry (perhaps hierarchical), not unlike DNS, or ASN.1 OID arcs, but
there is a real cost to having to have a global registry.

So in a distributed system roughly like SPARQL, or AD, say, UUIDs will
do.  You might store them as blobs to avoid having to waste space, but,
whatever.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Uninitialized memory reads (not likely false positives)

2016-11-16 Thread Nico Williams
On Wed, Nov 16, 2016 at 12:06:39PM -0600, Nico Williams wrote:
> On Tue, Nov 15, 2016 at 09:38:11PM -0200, Bernardo Sulzbach wrote:
> >   if( s1>7 && s2>7 ){
> >     res = s1 - s2;
> >   }else{
> >     if( s1==s2 ){
> >   // Accesses to aLen as mentioned above
> > 
> > If s1 > 7 && s2 > 7 is false, then at least one of s1 and s2 is not above 7.
> > If they are equal, then neither s1 nor s2 is above 7.
> 
> 7 is past the end of the array.
> 
> > > and we also know that either s1 or s2 can be 8 or 9,
> > 
> > This is false, unless I am mistaken. See my reasoning above.
> > 
> > The issue is valid, and the message your analyzer (or compiler) wrote is
> > correct: it is not guaranteed to be < 7, which it should be.
> 
> Right.

Er, no, wrong: it has to be < 7 given the assertions and the if
conditions.  I should have read Dan Kennedy's response first.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Uninitialized memory reads (not likely false positives)

2016-11-16 Thread Nico Williams
On Wed, Nov 16, 2016 at 10:52:13PM +0700, Dan Kennedy wrote:
> On 11/16/2016 05:53 AM, Nico Williams wrote:
> > [...]
> >
> > Anyways, the analysis from here is non-trivial, and I can't convince
> > myself that sNC.pNext will not be dereferenced.
> 
> Thanks for taking the time to look into these.
> 
> Some kind of assert() could be helpful there I think. The reason sNC.pNext
> will not be accessed is that generateColumnNames() is only called (a) on a
> top-level SELECT statement and (b) after all references have already
> resolved successfully. Implying that this:
> 
>   http://www.sqlite.org/src/artifact/672b1af237ad2?ln=1406
> 
> is always true.

That's... a bit too convoluted for my liking.  I'd rather have
sNC.pNext initialized than an assertion that might get compiled out.

This is a function invoked on statement preparation, so I think
initializing sNC.pNext can't negatively affect performance in any
terribly meaningful way.

> > Another one that I find difficult to analyze is a possible
> > out-of-bounds read in vdbeSorterCompareInt():
> >
> > [...]
> >
> > At 85715 we know that (s1 <= 7 || s2 <= 7) && s1 == s2, and we also
> > know that either s1 or s2 can be 8 or 9, so aLen[s1] at 85715 could
> > very well have s1 > 6, which would read past the bounds of aLen[].
> 
> I think ( ( s1<=7 || s2<=7) && s1==s2 ) implies that s1<=7. And we assume
> s1!=7 because there is an assert() that says so.

Oh, I see it now.  Yeah, OK.  This is definitely a false positive.

Thanks for the response,

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Uninitialized memory reads (not likely false positives)

2016-11-16 Thread Nico Williams
On Tue, Nov 15, 2016 at 09:38:11PM -0200, Bernardo Sulzbach wrote:
>   if( s1>7 && s2>7 ){
>     res = s1 - s2;
>   }else{
>     if( s1==s2 ){
>       // Accesses to aLen as mentioned above
> 
> If s1 > 7 && s2 > 7 is false, then at least one of s1 and s2 is not above 7.
> If they are equal, then neither s1 nor s2 is above 7.

7 is past the end of the array.

> > and we also know that either s1 or s2 can be 8 or 9,
> 
> This is false, unless I am mistaken. See my reasoning above.
> 
> The issue is valid, and the message your analyzer (or compiler) wrote is
> correct: it is not guaranteed to be < 7, which it should be.

Right.

> I am unsure whether or not this is actually a bug, but it almost certainly
> is a mistake.

That's how it seems to me, yes.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Uninitialized memory reads (not likely false positives)

2016-11-15 Thread Nico Williams
I don't normally pay attention to warnings when compiling SQLite3, nor
to Coverity or other static analysis tools' output either, as I'm quite
aware that most of these are false positives and thus unwelcome noise
here.

However, I do sample them occasionally, and though usually such reports
are false positives, here are two that don't quite look like false
positives to me.  I get these from building the SQLite3 3.15.1
amalgamation.

Uninitialized pointer dereference:

115861 static void generateColumnTypes(
115862   Parse *pParse,  /* Parser context */
115863   SrcList *pTabList,  /* List of tables */
115864   ExprList *pEList/* Expressions defining the result set */
115865 ){
115866 #ifndef SQLITE_OMIT_DECLTYPE
115867   Vdbe *v = pParse->pVdbe;
115868   int i;

1. var_decl: Declaring variable sNC without initializer.
115869   NameContext sNC;
115870   sNC.pSrcList = pTabList;
115871   sNC.pParse = pParse;

2. Condition i < pEList->nExpr, taking true branch
115872   for(i=0; i<pEList->nExpr; i++){
115873 Expr *p = pEList->a[i].pExpr;
115874 const char *zType;
115875 #ifdef SQLITE_ENABLE_COLUMN_METADATA
115876 const char *zOrigDb = 0;
115877 const char *zOrigTab = 0;
115878 const char *zOrigCol = 0;
115879 zType = columnType(&sNC, p, &zOrigDb, &zOrigTab, &zOrigCol, 0);
115880 
115881 /* The vdbe must make its own copy of the column-type and other
115882 ** column specific strings, in case the schema is reset before this
115883 ** virtual machine is deleted.
115884 */
115885 sqlite3VdbeSetColName(v, i, COLNAME_DATABASE, zOrigDb, SQLITE_TRANSIENT);
115886 sqlite3VdbeSetColName(v, i, COLNAME_TABLE, zOrigTab, SQLITE_TRANSIENT);
115887 sqlite3VdbeSetColName(v, i, COLNAME_COLUMN, zOrigCol, SQLITE_TRANSIENT);
115888 #else

CID 12301 (#1 of 1): Uninitialized pointer read (UNINIT)
3. uninit_use_in_call: Using uninitialized value sNC.pNext when calling
columnTypeImpl.
115889 zType = columnType(&sNC, p, 0, 0, 0, 0);

columnType() is a macro expanding to a call to columnTypeImpl().

Anyways, the analysis from here is non-trivial, and I can't convince
myself that sNC.pNext will not be dereferenced.

The obvious fix is to initialize sNC a bit more before the loop at
115872.  At least setting sNC.pNext = 0 seems like the right thing to
do.

Another one that I find difficult to analyze is a possible out-of-bounds
read in vdbeSorterCompareInt():

 85712 static const u8 aLen[] = {0, 1, 2, 3, 4, 6, 8 };
 85713 int i;
 85714 res = 0;
 85715 for(i=0; i<aLen[s1]; i++){

 85700   assert( (s1>0 && s1<7) || s1==8 || s1==9 );
 85701   assert( (s2>0 && s2<7) || s2==8 || s2==9 );
 85702
 85703   if( s1>7 && s2>7 ){
 85704     res = s1 - s2;
 85705   }else{
 85706     if( s1==s2 ){
At 85715 we know that (s1 <= 7 || s2 <= 7) && s1 == s2, and we also know
that either s1 or s2 can be 8 or 9, so aLen[s1] at 85715 could very well
have s1 > 6, which would read past the bounds of aLen[].

In both of these cases very detailed knowledge of the VDBE being handled
might show that these uninitialized reads do not happen.  If so, I don't
have that knowledge.

I'll hold off on other reports for the time being.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-19 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:22:51PM -0700, Keith Medcalf wrote:
> On Wednesday, 18 November, 2015 20:36, Nico Williams <n...@cryptonector.com> said:
> > On Thu, Nov 19, 2015 at 12:39:41AM +, Simon Slavin wrote:
> > > On 19 Nov 2015, at 12:26am, Nico Williams  
> > > wrote:
> > > > two concurrent scans of the same table should be able to go faster
> > > > than the same two scans in series.
> >
> > > SQLite is not processor-bound, it's file-bound.  Both
> > > threads/processes need to continually read blocks from disk and a disk
> > > can only answer one request at a time.
>  
> > Two table scans of the same data in sequence are going to cost more than
> > two table scans of the same data concurrently.
> 
> The original was taking about the "same table".  "same data" is
> somewhat sloppy terminology.  Assuming that "same data" == "same

It was sloppy.  In fact, it was wrong: it's not the same data because
it's a UNION ALL of two queries each of which will be scanning a
different table, not the "same data".

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 09:11:13PM -0500, Richard Hipp wrote:
> On 11/18/15, Nico Williams  wrote:
> >
> > Now, diff(1) (i.e., the Unix command) can do this very efficiently
> > because it just moves a "cursor" in both files looking for different
> > records (lines).
> >
> 
> Umm..  Diff is a little more complicated than that.  See

Yes, quite right, but for this particular case (only the PK columns
selected) the algorithm I described suffices and is fast because the
two table sources are already sorted.  I know I neglected to mention
explicitly that these are PK columns.

> https://en.wikipedia.org/wiki/Longest_common_subsequence_problem
> further niformation.  Basically, any efficient implementation of
> diff(1) needs to have both files completely in memory, because the
> computation does lots of jumping around and searching within both
> files.  I wrote the diff implementation for Fossil, from scratch, so I
> have some familiarity with these kinds of things.  Would that it were
> so simple as a couple of moving cursors...

I quite believe that.  This isn't a generic diff.

> That said, given sorted tables like this:
> 
> CREATE TABLE t1(a,b,c,PRIMARY KEY(a,b));
> CREATE TABLE t2(x,y,z,PRIMARY KEY(x,y));
> 
> Then the following query:
> 
> SELECT a, b FROM t1 EXCEPT SELECT x, y FROM t2 ORDER BY 1, 2;
> 
> Works by running both SELECT statements in parallel - walking cursors
> down each table - and "merging" the results together in a single pass.
> Sadly, SQLite currrently does *not* do this if you omit the ORDER BY
> clause:
> 
> SELECT a, b FROM t1 EXCEPT SELECT x, y FROM t2;
> 
> So here is a case where (paradoxically) adding an ORDER BY clause
> makes a query go faster.  I noticed this a few weeks ago and was
> thinking that I need to do something about it.

The data is already sorted (in the primary key index), thus the ORDER BY
costs nothing.  Then the ORDER BY is just a clue that SQLite3 needed?

The difference (for those following) is this:

1|0|0|SCAN TABLE t1 USING COVERING INDEX sqlite_autoindex_t1_1
2|0|0|SCAN TABLE t2 USING COVERING INDEX sqlite_autoindex_t2_1
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (EXCEPT)

vs.

1|0|0|SCAN TABLE t1 USING COVERING INDEX sqlite_autoindex_t1_1
2|0|0|SCAN TABLE t2 USING COVERING INDEX sqlite_autoindex_t2_1
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (EXCEPT)

The improvement is on the EXCEPT, which gets one half the diff (either
the additions or the deletions).  That's nice, and it's not nothing.

The same query plan would work for the UNION ALL of the two EXCEPT
queries.  This is partly why I think that either having a DIFF operator
or the optimizer recognize the diff-style query would help.  And if
there was such an operator/optimization, then the above optimization
would indeed be icing on the cake.
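
That is, the diff written so each half can use the merging strategy (a
sketch; the inner ORDER BYs are there only to trigger the non-sorting
EXCEPT plan):

  SELECT 'added', * FROM
    (SELECT a, b FROM t1 EXCEPT SELECT x, y FROM t2 ORDER BY 1, 2)
  UNION ALL
  SELECT 'deleted', * FROM
    (SELECT x, y FROM t2 EXCEPT SELECT a, b FROM t1 ORDER BY 1, 2);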

Anyways, this is good, this makes the SQL diff only twice as slow as it
could be.

I also noticed that UNIQUE indices aren't given the same treatment as
PKs.  That seems strange (though maybe not a big deal, not for me
anyways).

Thanks,

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Thu, Nov 19, 2015 at 12:39:41AM +, Simon Slavin wrote:
> On 19 Nov 2015, at 12:26am, Nico Williams  wrote:
> > two concurrent scans of the same table should be able to go faster
> > than the same two scans in series.
> 
> SQLite is not processor-bound, it's file-bound.  Both
> threads/processes need to continually read blocks from disk and a disk
> can only answer one request at a time.

Two table scans of the same data in sequence are going to cost more than
two table scans of the same data concurrently.

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Thu, Nov 19, 2015 at 12:43:18AM +, Stadin, Benjamin wrote:
> Sqlitediff diffs a same db (one db changed over time). If you are
> looking for diffing data using the same db scheme that has been
> generated at different times (producing different instances of the db
> scheme, e.g. you import data from somewhere and create a same db scheme
> from it, but different instance therefore) you need your own tool. It
> is application specific. 

The query produced by sqldiff does the trick.

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:56:50PM +, Simon Slavin wrote:
> On 18 Nov 2015, at 10:58pm, Nico Williams  wrote:
> > Can SQLite3 run the sub-queries of a UNION ALL with any degree of
> > concurrency?  E.g., with its co-routines?
> 
> SQLite always has the bottleneck of access to the file storage medium.
> Many posts to this list have complained that implementing
> multi-threading or multi-processing does not speed up SQLite
> significantly, and in some cases makes things worse.

But two concurrent scans of the same table should be able to go faster
than the same two scans in series.

> The problem here is not that SQLite does this job inefficiently, but
> that the job is a bit weird.  If you know you're going to have to diff
> two versions of the same database, you have your program write a log
> of the modifications it makes.

So weird that sqldiff exists.  Huh.

Writing a log is a nice idea that doesn't always work.  There's no need
to go into that in detail.  Suffice to say that a) I do use logs in some
cases, b) diff'ing two tables is not such a crazy thing to want to do.

There's a whole class of problems that SQL query optimizers don't do as
well with as a programmer writing a general purpose programming language
can do.  This is, IMO, an interesting problem.  I'm not just after "how
can I do this faster", but "how can SQLite3 do this faster".  I
understand and accept, of course, that the SQLite3 dev team may not be
interested in the latter.

Thanks,

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:52:14PM +, Simon Slavin wrote:
> On 18 Nov 2015, at 8:11pm, Nico Williams  wrote:
> > SELECT 'added', a.* FROM
> >  (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a
> > UNION ALL
> > SELECT 'deleted', d.* FROM
> >  (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d;
> > 
> > In other words, the SQL analog of a straightforward diff(1) of two
> > [sorted] files.  I think this is the canonical way to do this in SQL,
> > though there are variants.
> 
> I don't know why anyone would want to do that in SQL.  If you need to
> diff two tables you also need to know about UPDATE, not just INSERT
> and DELETE.

I implied that all the columns were as if part of the primary key.  See
the follow-up posts.

> > Now, diff(1) (i.e., the Unix command) can do this very efficiently
> > because it just moves a "cursor" in both files looking for different
> > records (lines).
> 
> Text files have the concept of order.  SQL tables do not.  You are
> imposing an arbitrary idea of order on table rows which doesn't really
> make sense for data.

There was no need to be patronizing.  There's no need for CREATE INDEX
in pure SQL.  Over here in real life we have them because they *can*
make queries go faster (at some price, of course).  And in SQLite3 the
indices happen to support ordered traversal, which is useful for certain
optimizations, in particular the one we're discussing.  If SQLite3 had
only supported hash tables for indexing then I would not have bothered
posting this thread at all, naturally.

Why make me say any of this?  Especially when the rest of the thread was
actually quite useful.  Are you traumatized by newcomers to SQL?

> > SQLite3, on the other hand, does four table scans and further scans two
> > sub-queries' result row-sets.
> 
> Instead of the above try the following.  Suppose that the primary key
> (i.e. a significant unique value like a company name) for the above
> tables is a column 'pk', and suppose further that if this is not
> literally the primary key that there's a UNIQUE INDEX on it ...

Yes, one would think.  But actually SQLite3 doesn't treat UNIQUE INDEX
as the same as PRIMARY KEY for some of these queries (see the rest of
this thread).

> SELECT 'added., a.* FROM
>  (SELECT a.pk FROM a EXCEPT SELECT b.pk FROM b) a
> UNION ALL
> SELECT 'deleted', d.pk FROM
>  (SELECT b.pk FROM b EXCEPT SELECT a.pk FROM a) d;

After correcting the syntax and adapting as

SELECT 'added', a.* FROM
 (SELECT a.a, a.b FROM main.toy a EXCEPT SELECT b.a, b.b FROM aux.toy b) a
UNION ALL
SELECT 'deleted', d.* FROM
 (SELECT b.a, b.b FROM aux.toy b EXCEPT SELECT a.a, a.b FROM main.toy a) d;

I get:

3|0|0|SCAN TABLE toy AS a USING COVERING INDEX sqlite_autoindex_toy_1
4|0|0|SCAN TABLE toy AS b USING COVERING INDEX sqlite_autoindex_toy_1
2|0|0|COMPOUND SUBQUERIES 3 AND 4 USING TEMP B-TREE (EXCEPT)
1|0|0|SCAN SUBQUERY 2 AS a
7|0|0|SCAN TABLE toy AS b USING COVERING INDEX sqlite_autoindex_toy_1
8|0|0|SCAN TABLE toy AS a USING COVERING INDEX sqlite_autoindex_toy_1
6|0|0|COMPOUND SUBQUERIES 7 AND 8 USING TEMP B-TREE (EXCEPT)
5|0|0|SCAN SUBQUERY 6 AS d
0|0|0|COMPOUND SUBQUERIES 1 AND 5 (UNION ALL)

Which... is not as good as the query produced by sqldiff (see the rest
of the thread).

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
Can SQLite3 run the sub-queries of a UNION ALL with any degree of
concurrency?  E.g., with its co-routines?  That might help three
union'ed scans of the same table go pretty fast, maybe not much slower
than one scan.  In which case this might compare quite well to diff(1).

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 11:10:47PM +0100, E.Pasma wrote:
> op 18-11-2015 21:23 schreef Richard Hipp op drh at sqlite.org:
> > The "sqldiff.exe" command-line utility has an (undocumented and
> > unsupported) "--debug 2" option that will show you the SQL that it
> > uses to do a diff between two tables.  Maybe look at that and tweak it
> > for your use?
> > 
> Does that look like this:
> SELECT * FROM
> (
> SELECT 'added' AS what, a.* FROM a
> UNION ALL 
> SELECT 'deleted', b.* FROM b
> )
> GROUP BY 
> HAVING COUNT (*)<>2
> ;

No, it looks like this when the columns are all in the primary key:

  SELECT A.a, A.b, 2 -- deleted row
    FROM main.toy A
   WHERE NOT EXISTS(SELECT 1 FROM aux.toy B
                     WHERE A.a=B.a AND A.b=B.b)
   UNION ALL
  SELECT B.a, B.b, 3 -- inserted row
    FROM aux.toy B
   WHERE NOT EXISTS(SELECT 1 FROM main.toy A
                     WHERE A.a=B.a AND A.b=B.b)
   ORDER BY 1, 2;

Else it looks like this when not all of them are in the PK:

  SELECT B.a, 1, -- changed row
         A.b IS NOT B.b, B.b
    FROM main.toy A, aux.toy B
   WHERE A.a=B.a
     AND (A.b IS NOT B.b)
   UNION ALL
  SELECT A.a, 2, -- deleted row
         NULL, NULL
    FROM main.toy A
   WHERE NOT EXISTS(SELECT 1 FROM aux.toy B
                     WHERE A.a=B.a)
   UNION ALL
  SELECT B.a, 3, -- inserted row
         1, B.b
    FROM aux.toy B
   WHERE NOT EXISTS(SELECT 1 FROM main.toy A
                     WHERE A.a=B.a)
   ORDER BY 1;

It's one UNION ALL when all columns are part of the PK, two UNION ALLs
otherwise.  sqldiff wasn't smart enough to recognize that a UNIQUE INDEX
exists (other than the PK) on all columns when it does exist, in which
case is should use the first query (one UNION ALL).

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
On Wed, Nov 18, 2015 at 03:23:33PM -0500, Richard Hipp wrote:
> If it's so common, why are you the first to request it :-)  And, yeah,
> according to the WWPD principal, if Postgres doesn't do it, we
> probably won't be in a big hurry to do it either...

I've written this sort of query many times.  I imagined I couldn't have
been the first to do so.  Maybe I am, but I doubt it.  I am surprised to
be the first to request it.  I suspect most people who need this either
accept the performance they get, or write the diff in application code.

I've wanted better performance for this kind of query for two years,
and I'm only just now getting around to asking for it.

> Note that you can get efficient performance by tweaking the query
> slightly.

There's several ways of writing this, and I've not tried all of them.

> The "sqldiff.exe" command-line utility has an (undocumented and
> unsupported) "--debug 2" option that will show you the SQL that it
> uses to do a diff between two tables.  Maybe look at that and tweak it
> for your use?

Well hey!  I'm not the first to need to diff two table sources after
all!  :)

Here's what the explain query plan for sqldiff's query looks like, both
with and without indices:

2|0|1|SCAN TABLE toy AS B
2|1|0|SEARCH TABLE toy AS A USING INTEGER PRIMARY KEY (rowid=?)
3|0|0|SCAN TABLE toy AS A
3|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 4
4|0|0|SEARCH TABLE toy AS B USING INTEGER PRIMARY KEY (rowid=?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 (UNION ALL)
5|0|0|SCAN TABLE toy AS B
5|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 6
6|0|0|SEARCH TABLE toy AS A USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 5 (UNION ALL)

on a table created as:

CREATE TABLE toy(a,b);

So that's three table scans, each with a keyed lookup into the other
table.

That *is* faster than what I was getting, so I'll probably be using this
(I have to check what Postgres does too, since I need and use this kind
of query on Postgres as well).

It's not as fast as diff(1), but it may compare well to a

  $ diff <(sort ...) <(sort ...)

Of course, *with* sorted sources the sort is unnecessary and then
diff(1) gets faster again; with indices the diff query ought to get fast
too.

Now, if all the columns are part of the PRIMARY KEY and we use WITHOUT
ROWID

  CREATE TABLE TOY (a,b, PRIMARY KEY (a,b)) WITHOUT ROWID;

then the SQL emitted by sqldiff --debug 2 gets down to just two scans:

1|0|0|SCAN TABLE toy AS A USING COVERING INDEX sqlite_autoindex_toy_1
1|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE toy AS B USING COVERING INDEX sqlite_autoindex_toy_1 (a=? AND b=?)
3|0|0|SCAN TABLE toy AS B USING COVERING INDEX sqlite_autoindex_toy_1
3|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 4
4|0|0|SEARCH TABLE toy AS A USING COVERING INDEX sqlite_autoindex_toy_1 (a=? AND b=?)
0|0|0|COMPOUND SUBQUERIES 1 AND 3 (UNION ALL)

And that's starting to look pretty good!

In the tables I happen to care about most for this kind of query I can
make all the columns part of the primary key (if they aren't already),
so I may well be able to get diff(1)-like performance now -- I haven't
tested it yet, but I'll let you know.

The RDBMS could do better even with not all columns being in the primary
key: the RDBMS should be able to do as well as diff(1) when there's
suitable indices, and as good as diff(1) + sort(1) of inputs when there
aren't.

Thanks for the reference to sqldiff!

Nico
-- 


[sqlite] Diff two tables as fast as diff(1) (of sorted files)

2015-11-18 Thread Nico Williams
Consider two tables with the same columns and primary keys, and then
consider this query:

SELECT 'added', a.* FROM
  (SELECT a.* FROM a EXCEPT SELECT b.* FROM b) a
UNION ALL
SELECT 'deleted', d.* FROM
  (SELECT b.* FROM b EXCEPT SELECT a.* FROM a) d;

In other words, the SQL analog of a straightforward diff(1) of two
[sorted] files.  I think this is the canonical way to do this in SQL,
though there are variants.

Now, diff(1) (i.e., the Unix command) can do this very efficiently
because it just moves a "cursor" in both files looking for different
records (lines).

SQLite3, on the other hand, does four table scans and further scans two
sub-queries' result row-sets.  If only the optimizer could recognize
this pattern, then it could surely do as well as diff(1).  This sort of
query strikes me as a common enough kind of query that implementing such
an optimization might be worthwhile, though the SQLite3 team will have
better knowledge of that.  I'd certainly appreciate such an
optimization.  (No, Postgres 9.4.4 doesn't seem to optimize this
substantially better either.)

A difference UDF would be one way to handle this.  Table-valued
functions are half the battle, but one still needs table-valued
arguments to UDFs to make it clean looking:

  -- Outputs two-column rows where the first column denotes addition vs
  -- deletion and the second is a row-valued column with the result
  -- columns of the corresponding row in the old or new table source:
  SELECT diff((SELECT ...), (SELECT ...));

Using "eval" and JSON one could define a diff() UDF that takes two
strings, compile the two strings as SQL statements and execute them,
stepping over the results of each just like diff(1) would:

  -- Outputs JSON arrays of addition/deletion marker and column array or
  -- object.
  SELECT diff('SELECT ...', 'SELECT ...');

and/or

  SELECT diff('old_view_or_table', 'new_view_or_table');

Perhaps SQL could grow a DIFFERENCE operator for this, though there are
syntactic/semantic issues to deal with.  Maybe:

  SELECT d.diff, o.c0, o.c1, .., o.cN
  FROM old_source o
  DIFFERENCE (/*deletion*/ '<', '>' /*addition*/) d
  SELECT d.diff, n.c0, n.c1, .., n.cN
  FROM new_source n;

One might get a comm(1) equivalent too:

  SELECT d.diff, o.c0, o.c1, .., o.cN
  FROM old_source o
  DIFFERENCE (/*deletion*/ '<', '>' /*addition*/, '=' /*same*/) d
  SELECT d.diff, n.c0, n.c1, .., n.cN
  FROM new_source n;

Nico
-- 


[sqlite] SQLite, fork(), and exec()

2015-08-10 Thread Nico Williams
On Mon, Aug 10, 2015 at 11:02:04AM -0400, Richard Hipp wrote:
> On 8/10/15, Felipe Gasper  wrote:
> > Does SQLite need to have nothing at all to do with fork()? Even when
> > the fork() immediately precedes an exec(), and the exec()ed command has
> > nothing to do with SQLite?
> >
> > Or is it only a problem if the child process *uses* SQLite?
> >
> 
> I think it is only a problem if the child process tries to use SQLite.

If SQLite3 doesn't use pthread_atfork() nor getpid() to detect forks
then that should be the case, though it could also be the case that only
one of the parent or child can continue to use SQLite3 after a fork().
(The latter can be useful for "detaching" daemons, but is a bad pattern.
It's best to do the setup work in the child and signal the parent via a
pipe when the child is ready.)

Nico
-- 


[sqlite] sqlite error code 1034

2015-06-23 Thread Nico Williams
On Tue, Jun 23, 2015 at 09:12:31PM +0100, Simon Slavin wrote:
> On 23 Jun 2015, at 8:49pm, Mayank Kumar (mayankum) wrote:
> > Thanks all for the useful inputs. One more question I have is can
> > the 1034 error code ever mean that disk is full or nvram is full and
> > hence it cannot fsync ?
> 
> The error for 'disk full' should never happen at fsync().  It should
> happen at the operation which requests the additional disk space,
> which is probably fwrite().

It could happen as a result of a bug in a COW-style filesystem.  The
space needed to write metadata needs to be acquired at write()-time, not
fsync()-time, but it wouldn't surprise me to see filesystems that don't
check until sync time.

As others have pointed out, fsync() errors are essentially like EIO.

Except for "oops, someone tripped over the cable" or "the user removed
the removable storage device before we were done" type errors, these are
not recoverable.  You can't really know if the error is recoverable
though.  Simply retrying is not appropriate.  If the application is
running on a modern phone, say, then the error is probably not retriable
(most new phones don't have micro-SDcard slots anymore, right?).  If the
application is running on a server, then the error is not retriable.  If
it's running on a desktop, it might be retriable.

Nico
-- 


[sqlite] UDTs

2015-06-05 Thread Nico Williams
On Fri, Jun 05, 2015 at 09:36:09PM +0100, Simon Slavin wrote:
> One advantage I can think of of having a DateTime type -- enforcement
> of storing the correct thing in the correct column -- won't work in
> SQLite anyway, because rather than enforce column types is uses only
> affinities.
> 
> How would your code profit from having a DateTime type in SQLite, as
> opposed to you just standardising on strings of the format described
> in ISO8601:
> 
> -MM-DDThh:mm:ssTZD
> 
> ?  Would you require another format too -- TimeInterval -- so that you
> could subtract one DateTime from another ?  Would you want automatic
> reformatting of dates to and from your preferred date format (which
> would require a list of locales to be added to SQLite) ?

Ideally numeric types (i.e., ones for whose values there are arithmetic
operators) could have optional (but strongly typed and algebraic)
dimensions and units, and an absolute vs. scalar (relative) form.  This
would help prevent many accidents (though unit conversions in a limited
precision environment is a source of errors, so care is still required).

Time should behave like a numeric type with "time" dimension and some
unit such as seconds, say, or microseconds.  The internal representation
of time wouldn't have to be numeric.  Type conversions (casts) to/from
external representations (seconds since Unix epoch, seconds since
Windows epoch, ISO8601 time strings, ...) should be available; some
might be the identity function applied to the internal representation.

(Also, obviously, there should be no time unit like "month".)  A variant
of the same type, coercing the internal representation to whatever is
best for the user, would also be nice.
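
For comparison, here's what one does today in stock SQLite3, where
datetimes are just strings or numbers:

  -- ISO8601 text -> seconds since the Unix epoch:
  SELECT strftime('%s', '2015-06-05 12:00:00');
  -- and back:
  SELECT datetime(1433505600, 'unixepoch');

A typed time would subsume these conversions as casts.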

> Strings of the above format can be searched and sorted.  As long as
> the programmer for a particular database is consistent about whether
> they use TimeZones or not, the NOCASE collation works fine for
> date/times.  It's a solution that works fine without needing any extra
> code in SQLite.

I agree that it works well enough.  Datetime is not really a great
source of motivation for UDTs.  If anything datetime is a poor
motivation for UDTs because it is so special (having so many possible
representations).

If anything, UDTs are probably not as interesting as adding something
like type-tagged blobs and maybe bitstrings.

Nico
-- 


[sqlite] User-defined types

2015-06-04 Thread Nico Williams
On Thu, Jun 04, 2015 at 03:36:34PM -0700, Darko Volaric wrote:
> I now regret using JSON as an example since everyone wants me to convert
> [...]

So you don't like the SQL language, but if you're after UDTs and your
first stop is to design a different language (or merely make it easier
for you to add such a language later) then your priorities are kinda
wrong (I'm being generous).  The way you're approaching an alternative
front-end won't make any aspect of adding UDTs any easier; at best it's
a get-to-know-the-core project.


[sqlite] User-defined types

2015-06-04 Thread Nico Williams
On Thu, Jun 04, 2015 at 02:16:22PM -0700, Darko Volaric wrote:
> {
>   operation: "insert"
>   table: "blah"
>   columns: ["a", "b", "c"]
>   values: [1.3, 2.0, 3.1]
>   on-conflict: "replace"
> }

I do this all the time.  It's trivial enough to generate SQL from that
sort of thing.  If you have an AST then you can trivially map between
the AST and a JSON/XML/ASN.1/whatever schema.

But I don't think ease of alternative representation is the winning
argument for wanting the engine core to use an AST.  It's only
convenient.

The winning argument is that working with an AST makes some tasks easy
that are otherwise hard (e.g., common sub-expression elimination).

>  [...]  Why are people who come from the websphere
> learning SQL syntax? [...]

Because it's standard.

> The feature I'm working on now, as a first step, basically feeds the parser
> tokens so I don't have to generate a query string. [...]

That seems rather basic, not really good enough.  It must save some
allocations, but is that worth forking SQLite3 for?!  Forking is quite
hard, so every change has to justify the maintenance effort.

Switching to an AST is going to require more allocations (and much more
developer effort), that's for sure.  But then, this is in statement
compilation, which should not be the most critical component.

Anyways, this is all very far afield from UDTs.  If you want to fork to
add UDTs, focus on that first.

Nico
-- 


[sqlite] User-defined types

2015-06-04 Thread Nico Williams
On Thu, Jun 04, 2015 at 11:45:28AM -0700, Darko Volaric wrote:
> Which sort of leads me to my next feature, which is bypassing the SQL
> language. [...]

I like SQL, but sure, if the compiler worked by first parsing into an
AST, and if the AST were enough of an interface (versioned, though not
necessarily backward-compatible between versions), then one could:

 - write different front-end languages (though an AST isn't needed for
   this: you can always generate SQL)

 - write powerful macro languages

 - write alternative/additional optimizers (and linters, syntax
   highlighters, ...) that work at the AST level

If the VDBE bytecode were also a versioned interface then one could
write peep-hole optimizers as well.

One might even want to generate IR code for LLVM, or use a JIT-er,
though for SQL I don't think that would pay off.  I suspect that most of
the CPU cycles go to data-intensive tasks such as I/O, cache thrashing,
and encoding/decoding.  I'd be much more interested in SQLite4 being
finished than an LLVM backend for SQLite3, and I'd be very interested in
seeing if word-optimized variable-length encoding would have better
performance than byte-optimized variable-length encoding.  The point
though is that using an AST would make the system more modular.

>[...]. Why use that crusty old syntax when it's equally expressible in
> JSON, XML or something else. Again I see it just as an API, [...]

Now I'm confused.  JSON and XML are not query languages.  There exist
query languages for them (e.g., XPath/XSLT for XML).

I suppose you might have meant that SQL itself is a data representation
language like JSON and XML are, and it is (data being expressed as
INSERT .. VALUES ..; statements).

Nico
-- 


[sqlite] User-defined types

2015-06-04 Thread Nico Williams
On Thu, Jun 04, 2015 at 10:54:16AM +0200, Dominique Devienne wrote:
> On Thu, Jun 4, 2015 at 10:20 AM, Christopher Vance wrote:
> > If you really want your own types, you could always bundle with ASN.1 and
> > store the result as a blob.

FYI, Heimdal has a very nice, small, simple, featureful, and easily
separable ASN.1 compiler with a BSD-type license.

https://github.com/heimdal/heimdal/tree/master/lib/asn1

> Or Protobuf, or ... But you're back to option 1, you must store somewhere
> that knowledge, and it's an app-convention, SQL and SQLite don't cooperate.

I've never understood the appeal of Protocol Buffers.  Its authors made
all the same sorts of design choices that make DER a lousy encoding for
ASN.1...

Anyways, the biggest problem with any kind of encoding (ASN.1, JSON,
whatever), is that if you want to be able to use decoded values in SQL
statements, you end up having to decode and re-encode in every UDF.
This could be avoided if SQLite3 had a sort of UDT that's just an opaque
handle for a value and which includes an interface for coercing it to a
native SQLite3 type (e.g., blobs): this way re-encoding can be avoided
as much as possible.
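
For instance (json_modify() being a hypothetical UDF over JSON-in-TEXT,
named here only for illustration):

  -- Each call must parse its TEXT argument and re-serialize its
  -- result, only for the enclosing call to parse it all over again:
  SELECT json_modify(json_modify(doc, 'a', 1), 'b', 2) FROM t;

With an opaque handle type the inner result could flow to the outer
call still decoded.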

> You can also use a self-describing format like AVRO (many others I'm sure),
> but you still need to know "a-priori" which blobs are avro encoded, and
> even assuming that, that doesn't enforce a "schema" (structure) on the
> column (i.e. "static typing"), which I think is necessary. Actually, maybe
> a CHECK constraint works here. I've never tried to use a
> custom-SQL-function-based check constraint in SQLite. Does that work? --DD

Yes, it does.
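
A minimal sketch (my_is_valid() being an application-registered UDF; it
must be registered on any connection that writes the table, else
statements touching the table fail to prepare):

  CREATE TABLE t(doc TEXT CHECK (my_is_valid(doc)));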

Nico
-- 


[sqlite] User-defined types

2015-06-04 Thread Nico Williams
On Wed, Jun 03, 2015 at 06:04:29PM -0700, Darko Volaric wrote:
> Yep, references a another one. Just like the functions, you have to join on
> the user type information, add it to constraints, etc.

Once you're extending SQLite3 proper the referential integrity problem
goes away (being no different than the "problem" SQLite3 has of tracking
value types internally).

> In my case I'm already modifying and maintaining my own version of SQLite.
> My project is basically a database with a lot of extensions. Submitting
> patches is not an issue. The last time I brought these ideas up I was
> practically chased off by a mob waving pitchforks and torches. Apparently
> almost no-one thinks user defined types is a good idea so there is no point
> sharing it. I don't expect anyone to help me maintain the code. The
> critical parts of SQLite (like the record read/write) are very stable and
> updates hardly ever affect me.

I'm not at all sure that UDTs are a bad idea.  I do want JSON support,
preferably using jq for expressing transformations (or merely addressing
specific values in a JSON text) of JSON texts.  And I'd like bigint (and
bigfloat?) support.  Those two types should be enough for easily dealing
with a great many needs for UDTs without necessarily having support for
arbitrary UDTs.  A bitstring type would also be convenient for things
like IP addresses and CIDR.

Given that SQLite3 is in the public domain, you're not obliged to share
your development.  But who knows, in spite of all the reasons that using
your "fork" would not be advisable, you might succeed in forming a
community of users.  So unless you have reasons not to share your work,
I'd encourage you to share it.  If you do share it, you'll want to give
it a distinct name (it wouldn't be SQLite3, would it), and you'll want
to be very careful whose patches you accept: if you want to ever be able to
contribute your changes back into the mainline, you'll need to be able
to show that each contribution is permitted by the contributor's
employer and so on.

I have a long wishlist of features or changes that I don't think the
SQLite3 developers are going to be very interested in at this time.  You
might not be either, but if there was a community of external developers
that could pool its resources to make contributions that might be
welcomed by the SQLite3 developers...

Nico
-- 


[sqlite] User-defined types

2015-06-03 Thread Nico Williams
On Wed, Jun 03, 2015 at 03:55:04PM -0700, Darko Volaric wrote:
> I've tackled this problem from a couple of different angles. My goal was to
> allow arbitrary user defined types, based on the builtin types (essentially
> subtypes of the existing types), with a minimum of work and minimum
> disruption of the normal/existing use of the database and API.
> 
> The approaches I considered were:

A UDT can't be first-class without modifying SQLite3.  Without
first-class UDT support the application has to be responsible to some
degree for adhering to whatever encoding conventions it chooses to use.

User-defined functions, collations, and virtual tables can be used to
move some of the burden from the application to the UDFs and VTs, but
you can't move all of it (e.g., because whatever SQLite3 type you use to
encode UDTs will often have other uses, leading to an aliasing problem
that requires the application to avoid it).

> - encoding the user type codes for each data column in a separate column
> dedicated to the purpose. This is a low impact but cumbersome, for instance
> using a function that interprets the user type would have to have the user
> type passed in for each argument, along with the actual data.

It's cumbersome because it creates a referential integrity problem.

> - modifying the data file format to carry user type information. There is
> space in the record/row header where you can encode this information in a
> backwards compatible way, but the source code for data record access is not
> friendly, basically a dense blob of code with a lot of integer literals
> which are all very important, but it's hard to be sure what they entail and
> that you haven't introduced a subtle bug and ultimately data corruption.
> Additionally the user type would have to be passed around internally - for
> example in the sqlite3_value object - and tracking down all of those
> reliably is a bit of work.

Right, you'd basically be talking about adding new first-class types to
SQLite3.  That's quite an undertaking and not for the faint of heart.
Even if you tackle this, chances are it'd be very difficult to get the
SQLite3 dev team to accept the changes -- one would be forking SQLite3,
and that requires serious (read: lots of experienced software engineer
time) effort to develop and maintain.

> - using blobs. Although using text representation is friendly when looking
> at the data with standard tools, it's slower and takes up more memory in
> various places. I found that encoding some user types as blobs with a type
> marker at their start (a single byte with extensions) and interpreting them
> was a simple and low impact approach. [...]

Encoding as text or blobs is about your only realistic option.  Enums
can be encoded as numbers too, as can small bitsets.
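
E.g., a sketch of the tagged-blob convention (the 0x01 tag byte meaning
"IPv4 address" is invented for illustration):

  CREATE TABLE hosts(
    addr BLOB CHECK (typeof(addr) = 'blob'
                     AND hex(substr(addr, 1, 1)) = '01')  -- type tag
  );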

> The first option isn't very practical. The second option is the fastest and
> most robust solution and my long term approach which I will be going back
> to after development has progressed a bit more. Currently I'm using the
> third approach as an interim measure. I'm supporting arbitrary prec ints
> and reals, arrays and tuples and other types this way.

At that point why not just switch to Postgres?

Nico
-- 


[sqlite] unreached code in sqlite3.c?

2015-02-23 Thread Nico Williams
On Thu, Feb 12, 2015 at 8:58 AM, Doug Currie  wrote:
> Well, if TERM_VNULL is 0, then the code is truly unreachable, so I wouldn't
> call it a compiler bug.

Agreed.  But since this usage is so common, the compiler ought to have
a switch to turn off warnings for this particular sort of case, which
is (exp) & 0.  Or perhaps even not warn about this sub-case of
unreachable code ever.

The problem for the compiler writer is that it's relatively easy to
generally detect unreachable code using an AST or intermediate
representation, but by then it might be too late to note that this is
a use-case that should not be warned about.

Nico
--


Re: [sqlite] Whish List for 2015

2014-12-23 Thread Nico Williams
On Tue, Dec 23, 2014 at 05:26:00PM +0200, Tony Papadimitriou wrote:
> Proposal:
> 
> Minimal stored procedure functionality -- or, maybe call it stored

It's already there.  It's called TRIGGER.

I use triggers to hold "stored procedures", and WHERE conditions on DMLs
in the trigger body as IF..THEN..ELSEs.

CREATE VIEW foo AS
SELECT null AS arg0, null AS arg1, .., null as argN LIMIT 0;
CREATE TRIGGER foo_ins INSTEAD OF INSERT ON foo
FOR EACH ROW
BEGIN
-- INSERT/UPDATE/DELETE/SELECT RAISE(...) statements here
INSERT INTO .. SELECT .. WHERE <condition>;
INSERT INTO .. SELECT .. WHERE <condition>;
..
INSERT INTO .. SELECT .. WHERE <condition>;
...
END;

Invoking a stored procedure requires an INSERT INTO the view, with
the columns of the inserted row(s) being the arguments.  You can't
SELECT from a stored procedure, but it can store results in a TEMP TABLE
you can SELECT from.
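
A concrete sketch (the table and all names are invented for
illustration):

CREATE TABLE accounts(id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE VIEW transfer AS SELECT null AS src, null AS dst, null AS amt LIMIT 0;
CREATE TRIGGER transfer_ins INSTEAD OF INSERT ON transfer
FOR EACH ROW
BEGIN
-- the IF..THEN: refuse unless the source account covers the amount
SELECT RAISE(ABORT, 'insufficient funds')
 WHERE (SELECT balance FROM accounts WHERE id = NEW.src) < NEW.amt;
UPDATE accounts SET balance = balance - NEW.amt WHERE id = NEW.src;
UPDATE accounts SET balance = balance + NEW.amt WHERE id = NEW.dst;
END;

-- "call" the procedure:
INSERT INTO transfer (src, dst, amt) VALUES (1, 2, 100);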

Need looping?  Use recursive queries and/or recursive triggers.

Need local variables?  Use TEMP TABLEs (which you have to clean up), or
pass them as arguments to yet more stored procedures (recursive
triggers).

There's no need for a trigger rollback journal (I think!) for an insert
into a VIEW with an INSTEAD OF trigger, so it's not too bad.

There's no syntactic sugar for this.  It could be added; not much
machinery would be needed beyond the sugar, just some naming conventions
for the views and triggers.

I also use virtual tables as table-valued user-defined functions.  A bit
of syntactic sugar for that too would be nice.

> CREATE PROC sample AS (
> SELECT table1.* FROM table1,...,tableN
>  WHERE ... possibly complicated join ...
>  AND last_name like :1
>  ORDER BY :2;
> );
> 
> @sample 'Smith%',tax_id
> 
> would become:
> 
> SELECT table1.* FROM table1,...,tableN
>  WHERE ... possibly complicated join ...
>  AND last_name like 'Smith%'
>  ORDER BY last_name,tax_id;

Yes, this would be nice: UDFs written in SQL.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] ALTER TABLE .. ADD COLUMN .. IF NOT EXISTS ..?

2014-12-16 Thread Nico Williams
On Tue, Dec 16, 2014 at 11:40:22PM +0000, Simon Slavin wrote:
> If 'ALTER TABLE ... ADD COLUMN ...' fails it fails harmlessly, with

But it doesn't fail so harmlessly:

$ sqlite3 db 'alter table toy add column foo text; select 5;' || echo fail
SQL Error: duplicate column name: foo
fail
$ 

Note that:

 - the second statement was not executed
 - the noise on stderr
 - the non-zero exit status for the sqlite3 shell (which means that one
   of sqlite3_prepare_v2() or sqlite3_step() failed)

Yes, I can work around this.

Compare to a CREATE TABLE .. IF NOT EXISTS, which is silent and does not
cause the shell to exit with a non-zero exit status (because it doesn't
cause the sqlite3_exec() nor sqlite3_step() to return an error), and
does not stop evaluation of remaining input to the shell.

Ideally I could just have schema SQL in a file, doing DROP .. IF EXISTS
for some schema elements, CREATE .. IF NOT EXISTS for all of them, and
ALTER TABLE .. IF NOT EXISTS to upgrade schemas by just evaluating this
one file.
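
E.g., a sketch of such a file (the ALTER line uses the wished-for
syntax, which does not exist today):

DROP VIEW IF EXISTS toy_v;                      -- recreated below
CREATE TABLE IF NOT EXISTS toy(a, b);
ALTER TABLE toy ADD COLUMN c IF NOT EXISTS;     -- wished-for syntax
CREATE VIEW toy_v AS SELECT a, b, c FROM toy;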

Executing a schema setup/upgrade file like this via the shell is extremely
convenient.

(I do that all the time, but not with ALTER TABLE.)

> its work already having been done.  The ALTER command came along a
> long time after original SQL.  By that time software could handle
> cases where a single SQL command failed without the software having to
> crash at that point.

Yes, there are workarounds, I'm well aware.

> I would value far more the ability to do
> 
> ALTER TABLE ... DROP COLUMN ...
> 
> in SQLite, difficult though it would be to implement in SQLite3.

I would like this too, yes.

It wouldn't be that difficult: all that's needed is to arrange for the
dropped column to remain on-disk but otherwise be ignored (hidden, but
really well hidden), but still be added (with null value) for INSERTs
and UPDATEs.  For SQLite3 that would mean something like extending the
sqlite_master table to list the on-disk columns, with dropped columns
marked-up as such.  One would have to vacuum to have them truly
disappear.

(For extra credit fail if triggers/FKs retain dangling references to the
dropped column, and even better, defer this check to commit time, since
subsequent statements might remediate this.)

Some things are easier than others.  ALTER .. IF NOT EXISTS surely would
be easier to add than DROP COLUMN.  Whether that's enough to recommend
it is a different story; I leave it to the SQLite3 team to decide that.

Even better, I'd like a normalized form of the schema stored in sqlite_*
tables, so that I could create/alter/drop schema elements with normal
CREATE/UPDATE/DELETE statements with WHERE clauses (so that I could
express conditional schema changes in SQL).  It'd be better than any
pragmas like table_info(table_name).  Much of the schema manipulation
statement logic could later be re-implemented by mapping those to DMLs
and then executing them, with many constraints (e.g., new columns must
allow NULL or otherwise have a default value, ...) implemented as
triggers.
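
E.g. (sqlite_columns being a purely hypothetical writable catalog
table), a conditional ADD COLUMN would be just:

  INSERT INTO sqlite_columns(table_name, column_name, declared_type)
  SELECT 'toy', 'foo', 'TEXT'
   WHERE NOT EXISTS (SELECT 1 FROM sqlite_columns
                      WHERE table_name = 'toy' AND column_name = 'foo');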

Today I'm just asking for IF NOT EXISTS.  If it's not adopted, no big
deal.  I think it has a couple of things to recommend it (utility,
relative ease of implementation), but I'm not paying for it.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] ALTER TABLE .. ADD COLUMN .. IF NOT EXISTS ..?

2014-12-16 Thread Nico Williams
I have a habit of putting schema definitions in a file that's always
safe to read and execute against a DB connection.  This means that I
DROP some things IF EXISTS and CREATE all things IF NOT EXISTS.

But if I have to ALTER TABLE... there's no IF NOT EXISTS .. equivalent
for ALTER TABLE.

Funny that, or that I only just noticed this absence.

Looking at other SQL databases I see that this is actually a common
question/request, and it seems that where this is implemented it looks
like this:

  ALTER TABLE [IF EXISTS] <table> ADD COLUMN <column-def> [IF NOT EXISTS] ..;

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Query Planner for Virtual Tables: link table evaluation & transitive property of constraints not used

2014-12-15 Thread Nico Williams
On Mon, Dec 15, 2014 at 06:23:31PM +0700, Dan Kennedy wrote:
> It's tricky. As you say, xBestIndex() will currently be invoked
> twice - once with no constraints usable and once with both "b.id=?"
> and "b.linkid=?" usable. I guess the reason it is not invoked in the
> other ways you suggest is that that strategy might conceivably
> require a huge number of xBestIndex() calls if there were more than
> a few other tables in the join.

Perhaps there should be a method by which distinct indices could be
expressed by the virtual table to the SQLite3 query optimizer.

One possible compromise would be to call xBestIndex() with each
constraint, then once more with all of them.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite bugreport : unique index causes valid updates to fail

2014-12-08 Thread Nico Williams
On Mon, Dec 8, 2014 at 9:01 AM, Dan Kennedy  wrote:
> You could hack SQLite to do enforce unique constraints the same way as FKs.
> When adding an entry to a UNIQUE index b-tree, you check for a duplicate. If
> one exists, increment a counter. Do the opposite when removing entries -
> decrement the counter if there are two or more duplicates of the entry you
> are removing. If your counter is greater than zero at commit time, a UNIQUE
> constraint has failed.
>
> I suspect there would be a non-trivial increase in the CPU use of UPDATE
> statements though.

Well, it'd be an option which, when not used, ought to cost very few
additional unlikely branches.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite bugreport : unique index causes valid updates to fail

2014-12-08 Thread Nico Williams
Ideally there would be something like DEFERRED foreign key checking
for uniqueness constraints...  You can get something like that by
using non-unique indexes (though then you lose PRIMARY KEY enforcement
too) and then checking that there are no duplicates before you COMMIT.
(Doing
this reliably would require something like transaction triggers, which
IIRC exists in a "sessions" branch.)
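
A sketch of that workaround, with a plain (non-UNIQUE) index on t(k):

BEGIN;
-- ... UPDATEs that may transiently collide on k ...
-- the application checks that this returns no rows, and rolls back
-- otherwise:
SELECT k FROM t GROUP BY k HAVING count(*) > 1;
COMMIT;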

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
BTW, the experience with dedup is that doing things off-line means
never catching up, while doing them online means going slow.

You might cache as much as you can in memory then go slow when you
miss the cache...

In practice I think it's best to separate data and metadata devices so
that you can make metadata as fast as you can, writing COW-style, like
ZFS does, and caching in memory on clients and servers as much as
possible for all data writes during the writes until they are
committed.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
My advice is to borrow from other clustered filesystems' experience.

If you want to adhere to POSIX semantics then st_mtime and st_size
visibility will be a particular headache, especially since you don't
know when it's OK to lie (i.e., which callers of stat() are using
st_mtime/st_size for synchronization).

Ideally we'd split stat() into metastat() and datastat()...

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Using Sqlite3 as a Change Time Recording Data Store in Glusterfs

2014-11-24 Thread Nico Williams
On Sun, Nov 23, 2014 at 4:26 PM, James K. Lowden wrote:
> Metadata updates to Posix filesystems are seen as so costly that
> fsync(2) on the datafile descriptor doesn't update them.  A separate
> sync on the directory is required.  Compared to an in-memory update
> (of metadata, in kernel space) and a single fsync call, the price of a
> SQLite transaction is enormous, at a guess an order of magnitude more.

This.  Updates of mtime and atime in particular are expensive.

Another problem (for Lustre-style clusters) is stat(), since it
returns information that only a metadata service might know (e.g.,
file type, file ID (st_dev and st_ino), and link count) but also
information that it might not (file size, various timestamps), which
then presents enormous headaches for implementors.  There are also
write visibility rules as to stat() that some applications depend
on...  This is why "ls -l" can be slow on some such clusters.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite as a meta database

2014-11-05 Thread Nico Williams
IIRC there's a GNOME interface to databases, libgda, that works this way.

The fundamental problem with using SQLite3 virtual tables to access
remote/high-latency resources is that the SQLite3 VM and the generated
bytecode are very serial, therefore the VM can't make progress on
other parts of a query while waiting for a virtual table operation to
complete.

In principle nothing stops one from adding support for using
co-routines in the VM to better exploit concurrency in a query whose
plan could evince concurrency.  But that doesn't exist yet in SQLite3.

IMO this would be a very good thing to add to SQLite3, and it would
make it a great meta-DB.  But I'm not sure that the SQLite3 developers
would welcome it.

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Ordering of group_concat values using subselect in 3.8.7

2014-10-28 Thread Nico Williams
On Tue, Oct 28, 2014 at 2:33 PM, James Earl  wrote:
> After upgrading from SQLite 3.8.6 to 3.8.7, I am no longer able to
> order group_concat values by using a subselect.  For example the
> following query with 3.8.6 will give me an ordered string of
> items.image values based on items.position (which contains integers):

Simon told you why this happened.  IIRC the common solution to this in
other RDBMSes is to have an ORDER BY clause in group_concat().  If
SQLite3 had an opaque-but-coercible-to-SQLite3-type type, then one
could build a UDF group_concat() that applies whatever order one
wishes.  (Please let us have this! :)
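
E.g., roughly as MySQL spells it (not SQLite3 syntax at the time):

  SELECT group_concat(image ORDER BY position) FROM items;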

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] quasi-bug related to locking, and attached databases....

2014-10-27 Thread Nico Williams
On Mon, Oct 27, 2014 at 5:12 PM, Simon Slavin <slav...@bigfraud.org> wrote:
> On 27 Oct 2014, at 9:49pm, Nico Williams <n...@cryptonector.com> wrote:
>> If it's not too much to ask for then SQLite3 ought to: a) check for
>> duplicates by canonicalized path (but keep in mind that this can be
>> difficult to do portably, or without obnoxious length limitations on
>> Windows), then b) check for duplicates by st_dev/st_ino where
>> available.
>
> Checking for canonical path would seem to be important to improving this 
> functionality.  There's a function to do this under OS X (resolve links, then 
> do "stringByStandardizingPath") and Windows ("PathCchCanonicalize").  BSD has 
> "realpath(3)".  It should be possible for most VFSes.

Yes, but there's portability concerns.  For example,
PathCchCanonicalize() is not available before Windows 8 / Windows
Server 2012.  (Also, use PathCchCanonicalizeEx() instead...)

(Windows also has an inode number on NTFS, as well as hardlinks, which is nice.)

Nico
--
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

