[sqlite] Auto Reply: Auto Reply: Re: sqlite-users Digest, Vol 38, Issue 4

2011-02-04 Thread nicolas . williams
This is an auto-replied message. I have left Oracle.  My new e-mail address is 
n...@cryptonector.com.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Auto Reply: Re: sqlite-users Digest, Vol 38, Issue 4

2011-02-04 Thread nicolas . williams
This is an auto-replied message. I have left Oracle.  My new e-mail address is 
n...@cryptonector.com.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Nicolas Williams
On Wed, Feb 02, 2011 at 04:45:16PM -0800, Jeff Rogers wrote:
> Nicolas Williams wrote:
> >On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote:
> >
> >>SQLite seems to do quite poorly performance-wise with fully-normalized
> >>attribute tables like this, when you want to query against multiple
> >>attributes.  My timing comparisons with postgres show sqlite to be as
> >>much as 10x-15x slower than pg.
> >>
> >>My timing code is at http://paste.tclers.tk/2346
> >
> >You need an index on props(id) -- you always need an index on columns
> >that form a foreign key.  SQLite3 requires that if you want to cascade
> >deletions.
> 
> Ok, that caught me by surprise, but it improved things a lot.  With
> that index sqlite is now significantly faster than pg on 3 of the
> tests, but still similar to or slower than pg on the remaining 2 -
> in my code, "sx" and "sx3".   (I can't be sure of the exact timings
> because I can't run sqlite and the pg server on the same machine)
> 
> Any idea why pg does ok on these queries without the extra index -
> Maybe they're created by default?  SQLIte doesn't create any indexes
> automatically on primary key fields or anything else, correct?

No, it doesn't.  Use EXPLAIN QUERY PLAN to see what SQLite3 is doing.

Also, see the followup about covering indexes.  Also, run ANALYZE.

Finally, you might denormalize somewhat by having a trigger to copy the
obj.name column to the props table and add that column to the _end_ of
the covering indexes.  Then SQLite3 might be able to do your joins using
a single index.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Nicolas Williams
On Wed, Feb 02, 2011 at 03:38:07PM -0800, Jim Morris wrote:
> I'd probably move the analyze out of the loop.
> 
> Since your joining on props.id a better index pind might be
> create index pind on props (id, pnam)

Yes, you probably want two covering or partially-covering indexes:

CREATE INDEX pind ON props (id, pnam, pval);
CREATE INDEX pind ON props (pval, pname, id);

The first is for when you need to join on 'id' and want to extract
attribute/values.  The second is for when you want to search by
attribute/values and need to get rowids back.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Question about database design

2011-02-02 Thread Nicolas Williams
On Wed, Feb 02, 2011 at 03:23:49PM -0800, Jeff Rogers wrote:
> Andreas Kupries wrote:
> 
> > It seems to me that you are looking for
> > http://en.wikipedia.org/wiki/Database_normalization
> >
> 
> SQLite seems to do quite poorly performance-wise with fully-normalized 
> attribute tables like this, when you want to query against multiple 
> attributes.  My timing comparisons with postgres show sqlite to be as 
> much as 10x-15x slower than pg.
> 
> My timing code is at http://paste.tclers.tk/2346

You need an index on props(id) -- you always need an index on columns
that form a foreign key.  SQLite3 requires that if you want to cascade
deletions.

Without that index your joins will use full table scans.

Also, it's better to use FROM table1 t1 JOIN table2 t2 ON t1.col1 =
t2.col2 than to put the join conditions in the WHERE clause, as that's
much easier to read.  But I can see that you have JOINs using that
un-indexed column.

> This is a synthetic test, but I ran across the issue in a real 
> application.  I'm not sure what else I can do do optimize the queries; 
> using a denormalized table is the only thing that seems to help.

Add the missing index.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] WAL for single user database on NFS and Solaris

2011-01-30 Thread Nicolas Williams
On Sun, Jan 30, 2011 at 11:06:34AM +0800, Ginn Chen wrote:
> I think a single process accessing a single Sqlite database at a time over 
> NFS is supposed to be fine.
> 
> But it is not working on Solaris.
> On Solaris, man page of mmap() has
> 
>  EAGAINThe file to be  mapped  is  already  locked  using
>advisory   or   mandatory   record   locking.  See
>fcntl(2).
> 
> I found if the file on NFS has been locked with fcntl(), mmap() may fail.

Right, because mmap()s can be inherited by child processes, and there's
[performant] no way to enforce mandatory record locking then.  I don't
know why it'd fail in the case of advisory locking though.  I'll
inquire.

> It doesn't fail if both l_start and l_end are 0.
> 
> Thus, I got "Error: disk I/O error" for any operation after setting 
> journal_mode to WAL.
> See https://bugzilla.mozilla.org/show_bug.cgi?id=629296
> 
> The problem is setting journal_mode to WAL is successful, but it cannot be 
> set back.
> User has to move the db file to another filesystem and set journal_mode to 
> delete.
> 
> I guess -DSQLITE_SHM_DIRECTORY="/var/tmp" might be a solution, but it is 
> unsupported.

You don't want to do that, no.

> Any idea how can we make the behavior better on Solaris?
> Maybe fallback to another lock method if it is on NFS?

I could use more information.  Say, a truss -v all -t all -f sqlite3 ...

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LAST() function not supported

2011-01-30 Thread Nicolas Williams
On Sat, Jan 29, 2011 at 01:06:07PM -0800, Marian Cascaval wrote:
> On Sat, January 29, 2011, Nicolas Williams wrote:
> > If there's enough indices to satisfy all the ORDER BY expressions then
> > how could a "last()" function do any better?  For that matter, if there
> > aren't enough indices to satisfy all the ORDER BY expressions then how
> > could a "last()" function do any better?  What optimization could such a
> > function implement that the query optimizer couldn't?  Syntactically
> > speaking, there's no additional information in "last()" -- it's just
> > syntactic sugar.
>
> As I see it,from the point of view of just retrieving the last row from a 
> table, 
> no ORDER BY is necessary thus saving processor time.

You seem to think that SQLite3 has to get all the rows and sort them,
then pick the last.  But remember, if there's enough indices then it's
not picking the last, but the first row, and there's no sorting to do.

Seriously, have SQLite3 explain the query plan to you and you'll see.

If there's no suitable indices, then SQLite3 will have to do a full
table scan, select specific rows, then sort them to pick the one you
want -- a "last()" function wouldn't change that.  If there is a
suitable index then SQLite3 will not do any scanning, nor sorting,
because of that LIMIT 1.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LAST() function not supported

2011-01-29 Thread Nicolas Williams
On Sat, Jan 29, 2011 at 01:03:05PM -0800, Marian Cascaval wrote:
> Here's the info source on LAST() function:
> 
> http://www.w3schools.com/sql/sql_func_last.asp
> 
> 
> 
> I needed to retrieve the last row from a table.

If you need the "last row from a table" that's trivial to do efficiently
in SQLite3:

SELECT ... FROM ... ORDER BY rowid DESC LIMIT 1;

(You should, but don't have to, replace rowid with whatever the INTEGER
PRIMARY KEY column is, if there is one.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] LAST() function not supported

2011-01-29 Thread Nicolas Williams
On Sat, Jan 29, 2011 at 12:38:37PM -0800, Marian Cascaval wrote:
> Hi!
> 
> Is LAST() function going to be supported?
> 
> Or will the "SELECT ... FROM ... ORDER BY ... DESC LIMIT 1" workaround always 
> be 
> enough?
> 
> My concern is if there might be any speed improvement if LAST() function were 
> to 
> be implemented, comparing to the workaround.

Use EXPLAIN QUERY PLAN...

If there's enough indices to satisfy all the ORDER BY expressions then
how could a "last()" function do any better?  For that matter, if there
aren't enough indices to satisfy all the ORDER BY expressions then how
could a "last()" function do any better?  What optimization could such a
function implement that the query optimizer couldn't?  Syntactically
speaking, there's no additional information in "last()" -- it's just
syntactic sugar.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] how to use group_concat uniquely

2011-01-19 Thread Nicolas Williams
On Wed, Jan 19, 2011 at 09:19:54AM -0800, Noah Hart wrote:
> Any ideas?

You have two columns to sub-group by independently, as it were.  You
need correlated sub-queries to get that done:

sqlite> SELECT f1.e, (SELECT group_concat(f2.t, ';')
   ...> FROM foo f2 WHERE f1.e = f2.e GROUP BY f2.p), (SELECT 
group_concat(f2.p, ';')
   ...> FROM foo f2 WHERE f1.e = f2.e GROUP BY f2.t) FROM foo f1 GROUP BY f1.e;
A|1;2|R;S
B|1;2|R;S
sqlite> 

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] JOIN bug in 3.7.2, not in 3.6.22

2011-01-18 Thread Nicolas Williams
On Tue, Jan 18, 2011 at 10:13:10PM +0100, Florian Weimer wrote:
> * Richard Hipp:
> 
> > I don't think it makes sense in SQL (not just SQLite but SQL in
> > general) for an aggregate query to return columns that are not in
> > the GROUP BY clause.
> 
> Isn't this just what PostgreSQL implements as DISTINCT ON?  Then it
> *is* useful.

DISTINCT and DISTINCT ON effectively map to GROUP BY, no?  DISTINCT ON
explicitly results in unpredictable results unless an ORDER BY clause is
also used.

Incidentally, if there is a UNIQUE constraint for the {grouping columns}
_and_ the non-aggregated/non-grouped columns, then such a query also
makes sense.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Procedural SQL techniques (Re: How to track a record if the aliased id is changed)

2011-01-14 Thread Nicolas Williams
On Fri, Jan 14, 2011 at 10:53:56AM -0600, Nicolas Williams wrote:
> But that function would have to track those changed rowids somewhere.
> The trigger I posted does exactly that, using SQLite3's own primitives
> (a temp table in this case):

Speaking of which, the lack of procedural programming in SQLite3 is not
really a barrier, and it's not necessarily the case that it's best to
code procedural logic in C, Tcl, Python, or...  With a little ingenuity
one can realize that SQLite3 provides much of what is needed for
procedural programming:

 - variables?-> INSERT/SELECT on a special-purpose table, preferably a
TEMP one;

 - conditionals? -> Where you need "IF" statements use WHEN clauses on
TRIGGERS or INSERTs/UPDATEs/DELETEs with WHERE
clauses having your conditional expression, SELECT
CASE where you need something like the C ? :
operator;

 - iteration?-> Unroll loops or use recursive TRIGGERS (either way
you have a fairly tight maximum number of
iterations);

 - recursion?-> Use recursive TRIGGERS, see above;

 - functions?-> Use VIEWs for side-effect free functions that
require no iteration, else INSERT into some table
with a trigger that computes the function.

I will grant that writing procedural code in that fashion will result in
a fair amount of disjointed looking SQL.  But that can happen in any
language (e.g., when one must write callback functions in languages
lacking closures).  If you can keep the code size manageable then for
some problems you might be able to keep the SQL code size significantly
smaller than the non-SQL code size; SQL is quite expressive.

For example, I just implemented a loop detection feature that works by
having a trigger on a table used by the app, that inserts into another a
row that kicks off a recursive search for references, inserting into
this same table each reference found in a path, and raising an exception
if duplicates are found.  And that table is self-cleaning too.  One
could use the same technique to implement recursive queries...

Would I recommend this?  In most cases no -- there's too many problems:
a) limits on trigger recursion, b) readability/maintainability, c)
SQLite3 will almost certainly be slower than any other language for
implementing CPU-bound functions.  You'll need to write code in C or
some other language anyways, thus you can implement procedural logic in
that language.

But sometimes the above can be handy.  I'd recommend these techniques in
the case this thread is covering, for example, since they are far
simpler to use than it is to write a SQL function that has side-effects.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to track a record if the aliased id is changed

2011-01-14 Thread Nicolas Williams
On Fri, Jan 14, 2011 at 10:20:22AM +, Philip Graham Willoughby wrote:
> On 13 Jan 2011, at 17:27, Nicolas Williams wrote:
> > On Thu, Jan 13, 2011 at 10:59:29AM +, Simon Slavin wrote:
> >> CREATE TEMP TRIGGER [UpdatedRowIdTrigger] AFTER UPDATE ON TestTable
> >> BEGIN
> >> UPDATE UpdatedRowId Set Value = new.rowid WHERE new.rowid != old.rowid;
> >> END;
> >> 
> >> But I don't know whether SQLite can actually do that comparison at
> >> that stage.
> > 
> > I just tried it.  SQLite3 does handle this correctly:
> 
> You can do slightly better using a SELECT instead of an UPDATE in the
> trigger body. Install a custom function called rowid_changed() and
> SELECT the result of that; e.g.
> 
> CREATE TEMP TRIGGER a AFTER UPDATE ON t FOR EACH ROW BEGIN SELECT
> rowid_changed(OLD.rowid,NEW.rowid) WHERE OLD.rowid != NEW.rowid; END;
> 
> You then put code to do whatever your application needs to do when the
> rowid changes in the rowid_changed function you install. You don't
> need to create a new table (removing a small risk of name collision)
> and you will not be doing unnecessary work by polling said table for
> changes.

But that function would have to track those changed rowids somewhere.
The trigger I posted does exactly that, using SQLite3's own primitives
(a temp table in this case):

CREATE TEMP TRIGGER t2 AFTER UPDATE ON t FOR EACH ROW BEGIN
INSERT INTO t1 (a, newa) SELECT OLD.rowid, NEW.rowid WHERE
   OLD.rowid != NEW.rowid; END;

No need to write a function in any language.

(I do, however, wish that SQLite3 could persist temp tables/triggers
such that when one opens a DB handle all those temp tables and triggers
get automatically created.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to track a record if the aliased id is changed

2011-01-13 Thread Nicolas Williams
On Thu, Jan 13, 2011 at 10:59:29AM +, Simon Slavin wrote:
> His problem is that he doesn't know which rows are aliased to rowid,
> so he can't provide a list of column names.  So the following might be
> closer
> 
> CREATE TEMP TRIGGER [UpdatedRowIdTrigger] AFTER UPDATE ON TestTable
> BEGIN
>  UPDATE UpdatedRowId Set Value = new.rowid WHERE new.rowid != old.rowid;
> END;
> 
> But I don't know whether SQLite can actually do that comparison at that stage.

I just tried it.  SQLite3 does handle this correctly:

sqlite> CREATE TEMP TABLE t(a integer primary key, b text);
sqlite> INSERT INTO t VALUES (1, 'abc');
sqlite> INSERT INTO t VALUES (2, 'def');
sqlite> CREATE TEMP TABLE t1 (a integer, newa integer);
sqlite> CREATE TEMP TRIGGER t2 AFTER UPDATE ON t FOR EACH ROW BEGIN
   ...> INSERT INTO t1 (a, newa) SELECT OLD.rowid, NEW.rowid WHERE OLD.rowid !=
   ...> NEW.rowid; END;
sqlite> UPDATE t SET b = 'xyz' WHERE a = 1;
sqlite> SELECT * FROM t;
1|XYZ
2|def
sqlite> SELECT * FROM t1;
sqlite> UPDATE t SET a = 3 WHERE b = 'def';
sqlite> SELECT * FROM t;
1|xyz
3|def
sqlite> SELECT * FROM t1;
2|3
sqlite> 

Handy!

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to track a record if the aliased id is changed

2011-01-12 Thread Nicolas Williams
On Wed, Jan 12, 2011 at 10:07:36PM +, Simon Slavin wrote:
> On 12 Jan 2011, at 9:57pm, Max Vlasov wrote:
> > Simon, your reply led me to the following sequence:
> > - I know the rowid of the record I'm changing. I remember all integers (and
> > all other data) I'm going to change in the Update query (it' comparatively
> > easy task)
> > - I check this rowid after the change. If it exists, the record did not
> > change the rowid and if it does not I form SELECT .. where rowid= or
> > rowid=.. containing all the integers I used and compare the rest of the data
> > only with this result set. If there's only single match, this is the answer,
> > but if not ... I should think about it :)
> > 
> > Does it sound reasonable?
> 
> It will deal with most cases, but it can still be fooled by creative
> use of TRIGGERs, or by bad coincidences in the numbers stored.  A
> question worth asking might be why you need to maintain these rowids.
> If you're just letting your user change whatever data they want, why
> are you bothering to keep track of rowids ?

That's my view as well.  If you're not in control over what the DB
itself is doing (through triggers, for example) or what the user is
doing (you're given them too much rope) then you can't reliably detect
rowid changes (assuming the DB allows them).

Assuming this is for a generic data browser type application, I'd say:
a) don't show rowids to users, b) on every update re-run the query that
produced the rows displayed to the user and refresh the display.  If you
can guarantee that the rowid won't change then you can skip (b) and
instead just do a query for the modified row to get its new column
values.

If you can at all help it, then disallow arbitrary rowid changes.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on breaking trigger recursion?

2011-01-10 Thread Nicolas Williams
On Fri, Jan 07, 2011 at 09:54:07PM -0600, Nicolas Williams wrote:
> On Sat, Jan 08, 2011 at 01:29:23AM +, Simon Slavin wrote:
> > On 8 Jan 2011, at 1:12am, Nicolas Williams wrote:
> > > I need to use recursive triggers.  In some cases I want to "normalize"
> > > values of some columns of NEW being INSERTed or UPDATEd, but there's no
> > > UPDATE syntax for changing NEW, thus I can't write something like:
> > 
> > 
> > CREATE TRIGGER fred INSTEAD OF INSERT ON myTable ...
> 
> Oh, for some reason I thought that INSTEAD OF triggers were for views
> only, but that's not the case.  [...]

Actually, I was unable to get SQLite3 to create INSTEAD OF triggers on
tables.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on breaking trigger recursion?

2011-01-08 Thread Nicolas Williams
On Sat, Jan 08, 2011 at 12:07:08PM +0700, Dan Kennedy wrote:
> On 01/08/2011 08:12 AM, Nicolas Williams wrote:
> > I need to use recursive triggers.  In some cases I want to "normalize"
> > values of some columns of NEW being INSERTed or UPDATEd, but there's no
> > UPDATE syntax for changing NEW, thus I can't write something like:
> >
> > [...]
> >
> > I must write:
> >
> > CREATE TRIGGER ...
> > BEGIN
> >  UPDATE  SET somecol = () 
> > WHERE rowid = NEW.rowid;
> > END;
> 
> Something like this perhaps:
> 
> CREATE TRIGGER sometrig AFTER UPDATE WHEN somecol !=  BEGIN
>UPDATE sometab SET somecol =  WHERE rowid = NEW.rowid;
> END;

Yes, adding a WHEN clause to the triggers also works.

I did have some triggers with WHEN clauses, actually, but it didn't
occur to me to use that as a technique for breaking recursion in part, I
think, because the WHEN expression would, in this case, be several lines
long, and aesthetically that rubbed me the wrong way.  I've got several
instances of that sort of expression in my code and it occurs to me that
I can use additional views to reduce the length of that expression.

Thanks to Dan as well.  CHeers,

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] tcl and NULL

2010-12-23 Thread Nicolas Williams
IMO the best solution is to have a special procedure passed to the Tcl
SQL function that must be called to set the return value of the SQL
function.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Low-load methods of finding previous or next row

2010-12-15 Thread Nicolas Williams
And if you use parametrized queries then you get this query plan:

0|0|TABLE toy VIA MULTI-INDEX UNION
0|0|TABLE toy WITH INDEX toy_abc
0|0|TABLE toy WITH INDEX toy_abc
0|0|TABLE toy WITH INDEX toy_abc

The ORed terms are optimized as a UNION, with each sub-query using the
index.  That's three index operations per column that you order by.  Not
bad.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Low-load methods of finding previous or next row

2010-12-15 Thread Nicolas Williams
On Wed, Dec 15, 2010 at 04:29:42AM -0600, Nicolas Williams wrote:
> The ORed terms are optimized as a UNION, with each sub-query using the
> index.  That's three index operations per column that you order by.  Not
> bad.

s/three/one/
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Low-load methods of finding previous or next row

2010-12-15 Thread Nicolas Williams
On Wed, Dec 15, 2010 at 03:56:06AM -0600, Nicolas Williams wrote:
> SELECT * FROM toy
> WHERE
> a >= (SELECT a FROM toy WHERE id = 6) OR
> (a = (SELECT a FROM toy WHERE id = 6) AND
>  b <= (SELECT b FROM toy WHERE id = 6)) OR
> (a = (SELECT a FROM toy WHERE id = 6) AND
>  b = (SELECT b FROM toy WHERE id = 6) AND
>  c >= (SELECT c FROM toy WHERE id = 6))
> ORDER BY a ASC, b DESC, c ASC LIMIT 1 OFFSET 1;

Oh, and drop the = from the >= and <= tests and you can drop the OFFSET
1 part.  Same for the other one.  So you get:

SELECT * FROM toy
WHERE
a > (SELECT a FROM toy WHERE id = 6) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b < (SELECT b FROM toy WHERE id = 6)) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b = (SELECT b FROM toy WHERE id = 6) AND
 c > (SELECT c FROM toy WHERE id = 6))
ORDER BY a ASC, b DESC, c ASC LIMIT 1;

And:

SELECT * FROM toy
WHERE
a < (SELECT a FROM toy WHERE id = 6) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b > (SELECT b FROM toy WHERE id = 6)) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b = (SELECT b FROM toy WHERE id = 6) AND
 c < (SELECT c FROM toy WHERE id = 6))
ORDER BY a DESC, b ASC, c DESC LIMIT 1;

Both have the same efficient query plan.  Drop the index and you'll see
that you get full table scans.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Low-load methods of finding previous or next row

2010-12-15 Thread Nicolas Williams
On Tue, Dec 14, 2010 at 05:09:04PM +, Simon Slavin wrote:
> I recently found out that when you use LIMIT in SQLite the engine
> still processes all applicable records even if it only has to return
> the number you asked for.  I suspect that this makes something I used
> to do inefficient.  So let me ask for what I want and see what the
> list comes up with.
> 
> I start with three things:
> 
> * an arbitrary table which definitely has an explicitly defined field
> 'id' which is always an alias to SQLite's use of 'rowid'.  The table
> may have tens of thousands of rows.
> 
> * the value of 'id' of a particular record

Use this to get the values of the columns desired for the ORDER BY.

> * an ORDER BY string that can be applied to this table, e.g.
> 'surname,age DESC'

Once you have those values do two SELECTs.  One SELECT will be for the
desired ORDER BY with WHERE matching all rows that would match the rows
following the one whose id you have, LIMIT 1.  You can build the WHERE
expressions based on the columns retrieved in the above step.  The other
SELECT will have the sort order in the ORDER BY reversed, and the
matching direction in the WHERE expressions reversed as well.

E.g.,

CREATE TABLE toy(id integer primary key, a int, b int, c int);
CREATE INDEX toy_abc ON toy (a ASC, b DESC, c ASC);
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());
INSERT INTO toy (a, b, c) VALUES (random(), random(), random());

SELECT * FROM toy ORDER BY a ASC, b DESC, c ASC LIMIT 1;

7|-9084516558209846371|5509077441613329789|-6039970258732513679
4|-8048386703128611944|6994995616875512667|8999476370168858025
3|-7424117500756575148|4942101875445443496|1450545533195196343
8|-5743400534550269237|389208812082260818|-6076412160162338146
5|-5670534775733401701|1663837058082466530|-4342108509867877155
6|-3182970629602473541|-5223987460270046935|1484365513264068016
1|-3041202902673535746|-2810235477737418946|-6552445179965515592
12|-2658836410255605115|1738314160220639562|-5959979395208249554
11|-189594377855007160|-8847915915461491005|3035343813815035327
9|61938152197633097|23466571579391398|5202656507889790288
2|872885551150985642|-6337095019136234251|-283494106466248099
14|3088016326372574532|-562863847042835080|-7774010663666943917
13|4053781913908032366|297289804958612507|3778649002432168869
10|7412852161679323896|-1652518713928674542|2034739374121838177


-- given row id, say, 6, get the values we need below
SELECT a, b, c FROM toy WHERE id = 6;
-- Just for this example I'm not using '?', so you can try it with the
-- command shell.
-- Get the row id of the row immediately following the above row:
SELECT * FROM toy
WHERE
a >= (SELECT a FROM toy WHERE id = 6) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b <= (SELECT b FROM toy WHERE id = 6)) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b = (SELECT b FROM toy WHERE id = 6) AND
 c >= (SELECT c FROM toy WHERE id = 6))
ORDER BY a ASC, b DESC, c ASC LIMIT 1 OFFSET 1;
-- the above returns:
1|-3041202902673535746|-2810235477737418946|-6552445179965515592
-- get the row id of the row immediately preceding it:
SELECT * FROM toy
WHERE
a <= (SELECT a FROM toy WHERE id = 6) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b >= (SELECT b FROM toy WHERE id = 6)) OR
(a = (SELECT a FROM toy WHERE id = 6) AND
 b = (SELECT b FROM toy WHERE id = 6) AND
 c <= (SELECT c FROM toy WHERE id = 6))
ORDER BY a DESC, b ASC, c DESC LIMIT 1 OFFSET 1;
-- which return:
5|-5670534775733401701|1663837058082466530|-4342108509867877155

Now check the query plan.  I get this sort of thing:

0 0  TABLE toy VIA MULTI-INDEX UNION
0 0  TABLE toy WITH INDEX toy_abc
0 0  TABLE toy USING PRIMARY KEY
0 0  TABLE toy WITH INDEX toy_abc
0 0  TABLE toy USING PRIMARY KEY
0 0  TABLE toy USING PRIMARY KEY
0 0  TABLE toy WITH INDEX toy_abc
0 0  TABLE toy USING PRIMARY KEY
0 0  TABLE toy USING PRIMARY KEY
0 0  TABLE toy USING PRIMARY KEY

Which looks about right.

> I can 

Re: [sqlite] SQLITE 3.7.3 bug report (shell) - output in column?mode does not align UTF8-strings correctly

2010-11-29 Thread Nicolas Williams
On Fri, Nov 26, 2010 at 06:52:56AM +, Niklas Bäckman wrote:
> Igor Tandetnik  writes:
> > Note that counting codepoints, while it happens to help with your
> > particular data, won't help in general.  Consider combining
> > diacritics: U+00E4 (small A with diaeresis) looks the same as U+0061
> > U+0308 (small letter A + combining diaeresis) when printed on the
> > console.
> 
> You are right of course. The shell should not count code points, but 
> graphemes.

And their widths.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Trouble with TRIGGERS

2010-11-16 Thread Nicolas Williams
Do you have recursive triggers enabled?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] is it really ok to allow non-aggregates in an aggregate line?

2010-11-12 Thread Nicolas Williams
On Tue, Oct 19, 2010 at 09:39:44AM -0400, Richard Hipp wrote:
> On Mon, Oct 18, 2010 at 8:19 AM,  wrote:
> 
> > I made an error in my SQL when I did not include one of my non-aggregate
> > columns in my group.  I was surprised that Sqlite did not catch this, and
> > even more surprised when the docs spelled out this behavior.
> >
> > Is everyone ok with this?
> > Do any other SQL engines allow this?
> > (DB2 does not)
> >
> 
> I was going to change this at one point, so that it raised an error, but
> that suggestion raised such an outcry that I decided to leave it.
> Apparently, there are many applications out there that depend on this
> behavior.
> 
> I think the big use case is as a substitute for DISTINCT.

Perhaps there should be aggregate functions whose purpose is pick the
first/last value of some expression/column as ordered by some
expression (including by random).

Something like:

SELECT average(a), agg_random(b) FROM some_table GROUP BY c;
SELECT average(a), agg_first(b ORDER BY b ASC) FROM some_table GROUP BY c;
SELECT average(a), agg_last(b ORDER BY rowid ASC) FROM some_table GROUP BY c;

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] WAL file growth concern

2010-10-25 Thread Nicolas Williams
On Fri, Oct 22, 2010 at 09:56:22PM -0400, Richard Hipp wrote:
> On many (most?) filesystems, it is faster to overwrite an existing area of a
> file than it is to extend the file by writing past the end.  That's why
> SQLite doesn't truncate the WAL file on each checkpoint - so that subsequent
> writes will be overwriting an existing file region and thus go faster.

I think that assumption is getting harder to make.  For one thing,
filesystems nowadays aggregate transactions into large writes, which
means that old blocks aren't overwritten, but replaced -- COW goes
hand-in-hand with such aggregation.

For ZFS your assumption is wrong because of ZFS' variable block size
support[*].  I don't know if there are filesystems other than ZFS where
file data block size varies.  But for filesystems that aggregate writes
I think you'd find that overwriting performs about as well as appending
(assuming there's no O_APPEND synchronization going on).

Does the WAL store modified pages?

Nico

[*] In ZFS files have a single data block until the file size exceeds
the host filesystem's "recordsize", from which point the file will
consist of two or more data blocks, all of that size.  Block sizes
are all the powers of two between nine and seventeen (512 bytes to
128KB).  Thus overwriting a 1KB SQLite3 page in the middle of the
file with 128KB recordsize will result in a read-modify-write of the
modified block.  Though likely the application will have already
caused that block to be in memory, in which case there's no RMW, but
the disparity between application "page size" and ZFS recordsize
obviously has a significant cost.  SQLite3 users should set the
SQLite3 page size and host ZFS dataset recordsize so they match.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] programmatic way of determining fragmentation?

2010-10-22 Thread Nicolas Williams
On Sat, Oct 23, 2010 at 02:12:19AM +0400, Max Vlasov wrote:
> As for your initial question, I think fragmentation evaluation is possible
> with the help of VFS. I'd keep a total sum of of absolute difference between
> consequent read offsets for xRead operation. In this case if some xRead
> request reads 1024 bytes at the 4096 offset and the next xRead reads
> something at 5120 (4096 +1024) this will add 0 to the sum, but if the next
> read something at 8192, this will add 3072 = (8192 - (4096 +1024)). If this
> implemented, you will be able to see how this value changes for some of your
> SELECT and maybe evaluate it on per record basis. If it's more like some
> threshold value, ok, peform VACUUM

SQLite knows when it's generating full table scans, so it could arrange
to detect non-sequential I/O for just that table's B-Tree (random I/O
might result anyways if the query is a join, say), gather rudimentary
statistics (say, the number of non-sequential page reads for the table
scan) and report fragmentation via a pragma.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed up DELETE of a lot of records

2010-10-08 Thread Nicolas Williams
On Fri, Oct 08, 2010 at 05:49:18PM +0100, Simon Slavin wrote:
> On 8 Oct 2010, at 5:48pm, Stephan Wehner wrote:
> > On Fri, Oct 8, 2010 at 7:14 AM, Michele Pradella
> >  wrote:
> >>  "science fiction?" was a rhetorically question. I'm only wondering
> >> about what is the best and fastest way to DELETE a lot of records from
> >> huge DB. I know and understand physical limit of data moving: anyway for
> >> now I'm trying to split the BIG DELETE in some smaller DELETE to spread
> >> the time used. It's the only way I can figure out at the moment.
> > 
> > Is a soft-delete faster? Then you could add a slow-moving delete
> > (mentioned earlier by Aldes Rossi, for example)
> > for the soft-deleted records.
> 
> Soft-delete ?  Is that having another column which is a '1' if the
> record is supposed to exist and a '0' if it's supposed to be deleted ?

Use NULL to indicate deleted-ness.  But note that you may still have to
update lots of rows and indexes.  Indeed, a true delete might not need
much more I/O (whereas a VACUUM after the DELETE would).
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite database sync

2010-10-07 Thread Nicolas Williams
On Thu, Oct 07, 2010 at 09:09:19PM +0200, Petite Abeille wrote:
> On Oct 6, 2010, at 5:50 PM, David Haymond wrote:
> > If I copy, I don't want to transfer EVERY record to the server each time I 
> > sync, because that would be a waste of bandwidth. What is the best way to 
> > copy only those records that have changed to the server?
> 
> Perhaps you could simply rsync the two files?

That only works for one-way synchronization, but for that rsync is
probably a very good idea.

For bi-directional synchronization things get messy because conflicts
can arise that must get resolved.  Since conflicts must get resolved
(i.e., you can't raise exceptions, can't rollback, can't reject) you
cannot handle them in the same way that you'd handle conflicts in the
case of normal transactions.  Dealing with uniqueness vilations here
requires generating new values for some conflicting records.  Dealing
with RESTRICT foreign key violations requires either un-DELETEing rows
or finding new rows to refer to, and so on.  Worse, the conflict
resolution must be done in such a way that the two DB copies end up
having the same contents when you're done, so the conflict resolution
must be deterministic regardless of where you're doing it.

A general tool for hands-off bi-di synchronization of arbitrary DBs is
not really feasible, not in a way that would satisfy most users.  Such a
tool would at minimum require plenty of schema-specific configuration to
be acceptbale.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Tricky grouping query

2010-10-01 Thread Nicolas Williams
On Fri, Oct 01, 2010 at 08:37:07PM +0100, Andy Chambers wrote:
> Given the following
> 
> create table events (
>  id,
>  date,
>  status
> );
> insert into events values ('001','a','N');
> insert into events values ('001','b','N');
> insert into events values ('001','c','Y');
> insert into events values ('001','d','N');
> insert into events values ('001','e','Y');
> insert into events values ('001','f','Y');
> insert into events values ('001','g','N');
> 
> Is it possible, using plain SQL, to derive the following
> 
> 001,c,d
> 001,e,g

It'd be easier, I think, if you had a monotonically increasing (always
by one) unique numeric column.  The you could do a select for rows in
events where the value for that column is between two values which are
sub-queries for immediately preceding and following rows in the same
table and where the status differs from that of the row in the top-level
query.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Need help with self-join (I think)

2010-09-24 Thread Nicolas Williams
On Sat, Sep 25, 2010 at 01:01:36AM +0200, Kristoffer Danielsson wrote:
> CREATE TABLE Test (TestID INTEGER PRIMARY KEY, Year INT NOT NULL, Name TEXT 
> NOT NULL);
> INSERT INTO Test (Year, Name) VALUES (2007, 'A');
> INSERT INTO Test (Year, Name) VALUES (2007, 'B');
> INSERT INTO Test (Year, Name) VALUES (2008, 'B');
> INSERT INTO Test (Year, Name) VALUES (2009, 'A');
> INSERT INTO Test (Year, Name) VALUES (2009, 'B');
> INSERT INTO Test (Year, Name) VALUES (2009, 'C');
> CREATE TABLE Test2 (TestID INTEGER NOT NULL UNIQUE, SomeValue INT NOT NULL);
> INSERT INTO Test2 (TestID, SomeValue) VALUES (1, 123);
> INSERT INTO Test2 (TestID, SomeValue) VALUES (2, 123);
> INSERT INTO Test2 (TestID, SomeValue) VALUES (3, 123);
> INSERT INTO Test2 (TestID, SomeValue) VALUES (4, 123);
> INSERT INTO Test2 (TestID, SomeValue) VALUES (5, 123);
> INSERT INTO Test2 (TestID, SomeValue) VALUES (6, 123);
>  
> [...]
> My problem is that I want each combination of Year+Name to be represented - 
> with zero/NULL for SomeValue. How do I accomplish this?
> 
> 2007|A|123
> 2007|B|123
> 2007|C|NULL

You didn't have a Test1 row for (2007, 'C'), so why would you get
2007|C|NULL?  You also had one (and just one) row in Test2 for every
TestID in Test, so there are no NULLs that could appear as you request.

But, if you did:

INSERT INTO Test (Year, Name) VALUES (2007, 'C');

without a corresponding row in Test2, then your SELECT would not return
2007|C|NULL.  Try this:

SELECT Year, Name, SomeValue
FROM Test LEFT OUTER JOIN Test2 ON (Test.TestID = Test2.TestID)
ORDER BY Year, Name;

(Self-join is when both sides of the JOIN use the same table.  That's
not the case here.  What you were looking for here is an OUTER JOIN
instead of an INNER JOIN; NATURAL JOINs are a type of INNER JOIN.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UNIQUE PRIMARY KEY on TEXT column?

2010-09-17 Thread Nicolas Williams
On Fri, Sep 17, 2010 at 02:02:44PM -0400, Igor Tandetnik wrote:
> Keith Roberts  wrote:
> > I think what I really mean is I want a text column with only
> > one unique value, that is also indexed.
> 
> UNIQUE constraint will do just that.
> 
> > Also ,I guess the INTEGER PRIMARY KEY column is worth
> > keeping in the table, as a matter of course?
> 
> Usually, yes.

SQLite3 always has an integer primary key, either explicit or implied.
So if you have a primary key column that is not an integer primary key,
then effectively what you get is the same as having one column as
integer primary key and another column with a unique constraint.

It doesn't have to be so though.  A SQL RDMBS could easily have tables
which are hash-based instead of b-tree-based, in which case a table with
a primary key column that is not an integer primary key could well not
have an implied integer primary key.  But that's not what SQLite3 does
at this time.

In any case, it's almost always useful to have an integer primary key.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] what could be the reason that natural join stops working ?

2010-09-15 Thread Nicolas Williams
On Wed, Sep 15, 2010 at 08:05:26PM +0200, Stef Mientki wrote:
>  On 15-09-2010 11:36, Benoit Mortgat wrote:
> > Are you sure that after altering your tables adding columns, natural
> > join still only joins on vlid?
> >
> no, very stupid of me !!
> I added a column to each of the tables, with the same name, they
> should have the same value, but due to another programming error, the
> values were different.  again, thank all very much !!
> 
> That brings another issue to my mind, which I still don't understand:
> "why can we so easily manipulate complex pages of text, by adding /
> deleting / copying / pasting etc, and is manipulating of a few tables
> in a database so difficult?"

You shouldn't select * and you shouldn't natural joins precisely to
avoid these sorts of surprises.  Select * is handy for manual queries
from a shell; ditto natural joins.

Of course, you can always change your schema in such a way as to break
existing statements.  But the point here is to write statements such
that they will be stable in the face of a schema that evolves in a
backwards compatible way.  When you have to radically change your schema
you know you have to update your code; when you're merely adding columns
that don't contribute to primary keys then you shouldn't have to go
change any statements other than the ones where you specifically need
the new columns.  SQL shortcuts like '*' and natural join are just not
compatible with such a schema evolution model.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Detecting storage class from C

2010-09-08 Thread Nicolas Williams
On Wed, Sep 08, 2010 at 07:01:19PM -0400, Igor Tandetnik wrote:
> Schrum, Allan  wrote:
> > I guess I'm confused as to why sqlite3_column_type() would not work? It 
> > works for me on queries and I get back either
> > SQLITE_INTEGER, SQLITE_FLOAT, or SQLITE3_TEXT. While I know the types could 
> > be coerced to something else, at least it is an
> > answer of what you originally intended the column to be.  
> 
> No. sqlite3_column_type returns the type of the value in the given
> column and the current row. The type reported by sqlite3_column_type
> may change from row to row. It is largely unrelated to the type "you
> originally intended the column to be" (SQLite doesn't really have such
> a concept; column affinity comes closest, but there's no API to report
> it, directly).
> 
> In particular, if the value in the current row is null,
> sqlite3_column_type returns SQLITE_NULL.

You can use CHECK() expressions to ensure all values are of the intended
type, and you can use NOT NULL to avoid NULL (or treat NULL as being of
whatever type sqlite3_column_type() reports).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Suggestion for project: Chat

2010-08-24 Thread Nicolas Williams
On Tue, Aug 24, 2010 at 10:23:55PM +0530, Roger Binns wrote:
> On 08/24/2010 09:51 PM, Artur Reilin wrote:
> > I thinking about coding a chat with sqlite. 
> 
> The hard part of implementing chat is not storing the messages, but
> rather dealing with all the clients at the same time especially with
> potentially hostile networking (eg reading data slower than you send it).
> 
> If you want to write one yourself I suggest looking into the Erlang
> programming language.
> 
> Instead of reinventing the wheel, I suggest you use the open XMPP chat
> protocol and open source XMPP servers.  In addition to using any XMPP
> client, you can also use a web based client.

+1.

Moreover, because often you'll want to log chats hosted on services that
you do not control, the simplest, most popular, and most interoperable
(though not most reliable) way to log is to do it via "bots".  "Bots"
are automated clients.

The most reliable way to log chats is by modifying the service
implementation or by co-locating logger bots with the service, but this
only works when you own the service (or otherwise can get the owners to
log this way).

Don't reinvent the wheel, just find a suitable open source bot and
modify it to suit your needs.  You might even find some existing ones.
For example, the "Sqlite Users" wiki page [0] lists one exact match,
though it appears to be dead.

Just search for "chat bot logger sqlite", "xmpp bot logger sqlite",
etcetera, and you'll find plenty.  I count at least a handful just on
the first page of results for each of those searches.

Nico

[0] http://www.sqlite.org/cvstrac/wiki?p=SqliteUsers
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] [PATCH] Fix locale-unsafe usage of tolower (upstreaming Chromium change)

2010-08-06 Thread Nicolas Williams
On Fri, Aug 06, 2010 at 02:11:33PM -0400, Richard Hipp wrote:
> If "ch" is an unsigned char then how is the following unsafe:
> 
>  ch = (ch<0x80) ? tolower(ch) : ch
> 
> And why does it need to be changed to
> 
> ch = (ch>='A' && ch<='Z') ? ch - 'A' + 'a' : ch;
> 
> There is only one such instance of code remaining in FTS3 (at
> fts3_tokenizer1.c:196) but I want to understand what the issue is before I
> change it.

The problem is that libc's tolower() is locale-aware.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-13 Thread Nicolas Williams
On Mon, Jul 12, 2010 at 11:25:07PM -0700, Roger Binns wrote:
> > No, just entry points into the library.  What makes you think that I
> > meant that every function in the library should check this?
> 
> I also meant entry points but wasn't specific.  There are a lot of them that
> acquire/release mutexes and every single one would need to be modified. That
> was my original approach until I hit on the idea of swizzling the mutexes
> instead.

There's about ~170 entry points, depending on which version of SQLite3.
There's about ~110 calls to sqlite3_mutex_enter(), and about as many to
drop mutexes.  Some mutex enter/leave calls might occur in loops
(haven't checked), but already ~170 < ~220 :)  Not that such a static
analysis says much.  Most apps probably use a small sub-set of entry
points, and estimating the number of mutex calls on average per-call to
one of those entry points is non-trivial.  But the point is that I would
generally expect #-of-entry-points < #-of-mutex-calls -- that needn't
always be true, but it's probably a good rule of thumb.

Your approach does allow you to implement fork-safety without modifying
SQLite3 source itself, which is pretty cool.  But that's not necessarily
how the SQLite3 maintainers might do it.

> > but when it comes to
> > POSIX file byte range locking (at least for Solaris anyways, which is
> > what I know best, implementation-wise) you can assume that the only
> > broken thing about them is their semantics, not their implementations.
> 
> Having spent a lot of my commercial programming life with many different
> Unix implementations, including Solaris, I do not trust them, especially for
> little used functionality.  And Solaris had many bugs in the past, but I
> haven't touched it in the last few years to form a recent opinion.  (Like
> this one time when I found Sun's NFS implementation was broken based on log
> messages on an HPUX machine and then doing packet capture, and Sun invented
> NFS ...)
> 
> In any event this roughly comes down to me saying to assume they are
> untrustworthy until it can be substantially be proven otherwise, and you
> seeming to believe that they are all correct seemingly based on
> extrapolating from a recent version of Solaris.  Or being a pessimist and
> optimist respectively :-)
> 
> This doesn't really matter.  [...]

I can't argue with detail-free anecdotes.  And I'm sure there have been
bugs.  I do have access to the Solaris bug database, so I could search
it, but by then I'd be spending an enormous amount of time just to argue
a rather minor point.  If you fill in some details I will do a search
for the bugs in question though (since details will help narrow down the
search).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-12 Thread Nicolas Williams
On Mon, Jul 12, 2010 at 08:43:32PM -0700, Roger Binns wrote:
> > They're insane (the _first_, not last, fildes close(2) in a process
> > drops all locks on the underlying file), but the child won't clobber the
> > parent's locks.
> 
> That is assuming all components (C libraries, threading, compatibility
> layers, operating system etc) are perfect.  Given how rarely these locks are

The point of support contracts is that you can assume crazy bugs like
you're supposing will be fixed (and, in this case, because of
conformance testing, won't happen in the first place).  If we not only
don't assume "perfect" components but also go so far the other way as to
assume these components are uselessly broken then we might as well go
home.  Surely the SQLite community sells support based on this very same
idea.

> used, and how many different platforms SQLite runs on as well as different
> vintages of those platforms, that is indeed a big gamble.  ie I would want
> to see evidence that they are all substantially right, ideally a way to
> establish (non-destructively) at runtime if some aren't etc.

The semantics of POSIX file byte range locking are well-defined.  The
operating systems supporting this feature have had it for very long
periods of time.  There are conformance test suites.  There's a lot of
things you might be able to suspect are broken, but when it comes to
POSIX file byte range locking (at least for Solaris anyways, which is
what I know best, implementation-wise) you can assume that the only
broken thing about them is their semantics, not their implementations.

Of course, well-tested-implementation-but-broken-design might not be
very comforting :(

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-12 Thread Nicolas Williams
On Mon, Jul 12, 2010 at 08:43:32PM -0700, Roger Binns wrote:
> Earlier you were trying to optimise out calls to getpid() and now you want
> every SQLite function source to be changed?

No, just entry points into the library.  What makes you think that I
meant that every function in the library should check this?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-12 Thread Nicolas Williams
On Mon, Jul 12, 2010 at 02:47:19PM -0700, Roger Binns wrote:
> About the only correct thing to do in a process using SQLite through a fork
> is to terminate the process.

You can return an error to the caller too.  Of course, if you're using
the mutex functions to do this then it's too late, so you have to
abort().  But if this were integrated into libsqlite3 then the entry
points could return an error.

That said, it's better to abort() as the error could be ignored by the
caller and this kind of error is so serious that it should be discovered
as soon as possible (it's hard to ignore an abort()).

>   (Even then that could mess with locks
> depending on how the operating systems deals with it - POSIX locks are 
> insane.)

They're insane (the _first_, not last, fildes close(2) in a process
drops all locks on the underlying file), but the child won't clobber the
parent's locks.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-12 Thread Nicolas Williams
On Mon, Jul 12, 2010 at 01:59:30PM -0500, Nicolas Williams wrote:
> The result is that you end up with a tiny penalty for fork
> detection: two loads, a compare and a likely-not-taken branch.

Actually three loads, two compares and two likely-not-taken branch,
unless you know you always have pthread_atfork() available at run-time
(as in Solaris 10 and up, whih is what I had in mind when I wrote the
above), in which case you can make this two loads, one compare and one
likely-not-taken branch.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-12 Thread Nicolas Williams
On Sat, Jul 10, 2010 at 11:27:41AM -0700, Roger Binns wrote:
> On 07/10/2010 07:12 AM, Eric Smith wrote:
> > Your wrapper is nice -- have you considered folding something like it 
> > into the core (disabled by default, enabled by a compile-time flag) and 
> > submitting it to drh for official adoption?  
> 
> DRH and the developers are aware of it as my approach was discussed on one
> of the mailing lists, and I did the performance measurements in response to
> their queries.
> 
> I'd have no problem contributing the code to SQLite, but there isn't very
> much of it and it is an open issue as to how you report the cross fork usage
> should it happen.  (In my case I know Python is being used.)

Fast fork-detection logic to make the library a little more fork-safe is
a very good thing to do in all otherwise-fork-unsafe libraries.

getpid() might not be fast enough on all operating systems.  What I'd do
is make getpid() a macro like so:

#if defined(HAVE_PTHREAD_ATFORK) && defined(HAVE_WEAK_SYMBOLS)
extern pid_t my_saved_pid; /* set from a pthread_atfork(3C) child handler */
#define my_getpid() \
(pthread_atfork == NULL ? getpid() : my_saved_pid)
#else
#ifdef NO_FORK_DETECTION
#define my_getpid() ((pid_t)-1)
#else
#define my_getpid() getpid()
#endif
#endif

/*
 * Library entry points should check whether we're forked:
 *
 *  if (FORKED(handle))
 *  return ();
 */
#define FORKED(handle)  ((handle->owner_pid == my_getpid()) ? 0 : 1)

This guarantees a fast getpid() no matter how slow the actual getpid()
might be, at least on modern operating systems.  Where getpid() is slow
and pthread_atfork() is not available you end up disabling fork
detection.  The result is that you end up with a tiny penalty for fork
detection: two loads, a compare and a likely-not-taken branch.

Also, where the library gets initialized, it has to set a fork child
handler with pthread_atfork(3C) if defined(HAVE_PTHREAD_ATFORK) &&
defined(HAVE_WEAK_SYMBOLS) and pthread_atfork != NULL.  The handler
would do nothing much more than my_saved_pid = getpid().  Where handles
are created, handle->owner_pid has to be set to my_getpid().

I strongly recommend this approach for all library developers.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-09 Thread Nicolas Williams
On Fri, Jul 09, 2010 at 02:38:08PM -0700, Roger Binns wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> On 07/09/2010 02:31 PM, Nicolas Williams wrote:
> > The trick to making that go fast is to use pthread_atfork() to get the
> > new PID on the child side of fork() and store the PID in a global
> > variable so that you don't need to call getpid().
> 
> That assumes that pthreads is in use, that getpid is poorly implemented/slow
> and that enough of the 1% slowdown was due to getpid calls that changing it
> would make a difference :-)

I'm spoiled by Solaris, where (since Solaris 10) there's a unified
process model and every process has pthreads (not necessarily more than
one thread, mind you), with pthreads folded into libc.  :)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-09 Thread Nicolas Williams
On Fri, Jul 09, 2010 at 02:22:37PM -0700, Roger Binns wrote:
> On 07/09/2010 01:52 PM, Eric Smith wrote:
> > What do you mean, "immediately"?  As I said, my child comes to life,
> > does some work without touching (its copy of) existing SQLite strucures,
> > and then calls exit(2).
> 
> I'll bet you are actually getting exit(3) which means anything registered
> with atexit will be run.  (SQLite does not register with atexit.)

Oh, duh.  I forgot that distinction.  Yes, exit(3), not exit(2).  (Any
library with atexit handlers should check the process' PID if fork-
safety is an issue, and do nothing when called on the child side of a
fork().)

> In my wrapper I provide functionality that can check SQLite objects are not
> being used across processes.  The way it does this is by providing an
> alternate mutex implementation (almost every SQLite operation acquires and
> releases mutexes) and verifies the mutex is used in the same process id it
> was allocated in.  In a benchmark doing only SQLite operations I found a 1%
> performance hit.

The trick to making that go fast is to use pthread_atfork() to get the
new PID on the child side of fork() and store the PID in a global
variable so that you don't need to call getpid().

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-07-09 Thread Nicolas Williams
On Fri, Jul 09, 2010 at 04:52:35PM -0400, Eric Smith wrote:
> 
> > I strongly recommend that you always make the child side of fork(2)
> > either exit(2) or exec(2) immediately.  
> 
> Sorry Nico, I never saw this response -- I appreciate it!
> 
> What do you mean, "immediately"?

Good question.  One answer: "before ever utilizing state inherited from
the parent for libraries such as libsqlite3".

A much more conservative answer: "the exit(2) or exec(2) call should be
lexically close to the fork(2) call, and it should be obvious from
lexical context that the child will not ever utilize state inherited
from the parent for libraries such as libsqlite3".

(What a mouthful.  If there's a good English-language word to describe
the above without superfluously referring to 'time', as 'immediately'
does, I'd love to hear what it is.)

>   As I said, my child comes to life,
> does some work without touching (its copy of) existing SQLite strucures,
> and then calls exit(2).  The lifetime of the child is small wrt the
> lifetime of the parent.

That's fine.  The child could run forever then.  Of course, if you
design an architecture like that you end up creating the temptation
(for subsequent developers) to do more in the child process later,
so I'd not recommend it.

> Let's assume for the moment that I don't care about safety wrt
> non-sqlite libraries (except of course any libraries on which 
> sqlite depends).

Good luck!  :)

> > With respect to SQLite3, there are two sets of fork-safety issues: file
> > descriptor offsets (use USE_PREAD to avoid this issue), 
> 
> I take you to mean that the child and parent's fds point to the same
> underlying file description, and if the child changes the file
> description then it will have a side effect in the parent.

Exactly.

> But I have assumed that the child does not make any sqlite api calls
> against existing sqlite structures.  I believe this assumption allows 
> me to conclude that sqlite will not touch any existing fd, and hence
> will not bear such an impact on the parent (even if the child makes
> sqlite api calls against structures the child creates on its own).  
> Am I right? 

Correct.

> > and POSIX file byte range locks.
> 
> I'm not using POSIX locks, so I'm good to go there.  But even if I were,
> I believe my above reasoning applies equally well here, since I believe
> your reason for being concerned about it is similar.  The fds that were 
> duplicated across the fork refer to the same underlying file
> description, so we are technically in a "dangerous" state: the child
> *could*, at its whim, release the parent's lock (for example).  But if 
> it guarantees not to do so (by guaranteeing to make no sqlite calls
> against existing structures), then no harm will result.

No, I meant that if you stepped, in the child, a statement inherited
from the parent then the child would go ahead believing it has a lock
when in fact it does not (because the parent got the lock).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] open db cx to fork(2)d children

2010-06-28 Thread Nicolas Williams
On Mon, Jun 28, 2010 at 11:30:49AM -0400, Eric Smith wrote:
> From the docs:
> 
> > Under Unix, you should not carry an open SQLite database across a
> > fork() system call into the child process. Problems will result if you
> > do.
> 
> What if I fork a process that promises not to use the handle, and
> furthermore the child process certainly dies before the parent does?

Libraries could well (and many do) make use of pthread_atfork(3C) to
render "handles" unusable in a child process.  If an API is described as
caller "should not" or "must not" follow fork(2) then you need to be
prepared for it to use pthread_atfork(3C) as described, either now or in
the future.

> Will this still cause problems?
> 
> What if I add an additional assumption that the parent process is the
> only process that ever accesses the database?

I strongly recommend that you always make the child side of fork(2)
either exit(2) or exec(2) immediately.  Self re-exec()ing is fine, if
you can pass in whatever state you need preserved on the command-line
and/or pipe and/or temp file.  If you're going to fork(2) and neither
exit nor exec then you should be careful to do whatever cleanup the
various libraries demand, and stick to whatever fork-safety protocol
they insist on, if any.  Not all fork-unsafe libraries will give you a
way to re-initialize them on the child-side of fork(2); the only
fool-proof way forward is to exec(2) in the child of fork(2).

With respect to SQLite3, there are two sets of fork-safety issues: file
descriptor offsets (use USE_PREAD to avoid this issue), and POSIX file
byte range locks.  There may be other fork-safety issues too that don't
leap to my mind at this time.  Basically, fork(2) must be handled with
extreme care.

The easiest thing for library developers to do is to declare their
libraries to be fork-unsafe and make the matter a problem for their
callers, which in a sufficiently layered system means many libraries
become inherently fork-unsafe, which leads one to conclude even more
strongly that one must always exec(2) or exit(2) the child of fork(2).
What you really need from libraries is a way to recover whatever state
you'll want to re-establish after exec()ing in the child.  SQLite3 has
no such state (what state it has the caller already knows how to
re-create).  (Think of crypto APIs, where the state of a cipher might
not be feasible to reproduce with ease; there you need the library to
give you a way to export the cipher's state.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] 64 bit libsqlite for AIX and Solaris

2010-06-25 Thread Nicolas Williams
On Thu, Jun 24, 2010 at 01:37:50PM +0530, Sushil wrote:
> I am looking for 64 bit libsqlite for AIX and Solaris. Is there a
> place from where I can get them pre-built ?

For OpenSolaris you can find SQLite3 packages, including a 64-bit build
of the library, in the OpenSolaris IPS /release package repository.

> I have downloaded sqlite-amalgamation-3.6.23.1.tar and building it.
> But I don't see any option in configure to build it for 64 bits. Will
> the default library built be 64 bits. ?  If not, then how to build
> them ?

SQLite3 has no option for building both, 32- and 64-bit objects, nor
does it have an option to build either.  Which of 32- or 64-bit objects
you get depends on your compiler's defaults and your CFLAGS.

So, to build SQLite3 in 32- and 64-bit you need to build it twice, and
one of those times at lease you'll have to add "CFLAGS=..." to your
./configure command-line, with <...> replaced with whatever options your
compiler needs in order to produce whichever of 32- or 64-bit object
code you want.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] marking transaction boundaries

2010-06-23 Thread Nicolas Williams
On Tue, Jun 22, 2010 at 08:35:14AM -0700, b s wrote:
> hi,
> long ago, drh had proposed a trigger like mechanism that
> can be invoked at the begin/end of a transaction.
> http://osdir.com/ml/db.sqlite.general/2003-04/msg00137.html
> 
> the general consensus was there is no use other than up'ng
> a counter. however, i have a requirement where i would like
> to mark transaction boundaries from a user level.

That's not how I read that thread.  Around-transaction triggers are
incredibly useful (they're even more useful with a procedural language
or at least some way to apply logic to all rows modified/inserted/
deleted by the transaction).

Since there aren't any, your only choice is to manually wrap the
transaction statements -- at least the COMMIT statement, while the BEGIN
could be handled with a temp table and a trigger to do an INSERT OR
IGNORE into it which you DELETE on COMMIT.

CREATE TEMP TABLE current_transaction (rowid INTEGER PRIMARY KEY, tran_id TEXT);

-- The trigger has to be a TEMP trigger too...
CREATE TEMP TRIGGER ...
INSERT OR IGNORE INTO current_transaction VALUES (1, randomblob(16));
UPDATE ... SET NEW.tran_id = (SELECT tran_id FROM current_transaction);
END;
...

and add "DELETE FROM current_transaction;" before every COMMIT.  (You
also need to do this DELETE on rollbacks.

If you re-open your DB connection for every transaction (well, OK,
that's heavy-duty), then you don't even need to wrap the COMMIT, but you
do need to create the temp table before you BEGIN.

So that's your choice: wrap the BEGIN or the COMMIT/ROLLBACK, or both.

Back to the 2003 thread...  Triggers on ROLLBACK, BEFORE BEGIN or AFTER
COMMIT obviously can't modify the DB, so they are only useful for
invoking user-defined functions, really, but that _is_ a useful feature!

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Composite Foreign Key not enforced

2010-05-17 Thread Nicolas Williams
On Fri, May 14, 2010 at 03:01:39PM -0400, George Somers wrote:

[edited for brevity]

> CREATE TABLE parent(parentID INTEGER PRIMARY KEY, ...);
> 
> CREATE TABLE child(childID INTEGER PRIMARY KEY, fatherID INTEGER, ...);
> 
> CREATE TABLE fatherChildActivity(
>fcChildID   INTEGER,   
>fcParentID  INTEGER,
>fcPlaysWhat TEXT,   
>FOREIGN KEY(fcChildID, fcParentID) REFERENCES child(childID, fatherID)   
>-- This foreign key make sure that 1) child exist, 2) father exist,
>-- 3) father is really father of child.
> );

Why do you need fatherChildActivity to make sure both, child and father
exist when the existence of the child implies the existence of the
father?  Or can you have father-less child rows as long as the have no
relations in fatherChildActivity?

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Changing file descriptor of database file

2010-05-08 Thread Nicolas Williams
On Fri, May 07, 2010 at 05:24:58PM -0400, Mikhail Terekhov wrote:
> IMHO it would be better to switch to poll/epoll from select instead of
> fighting file descriptor numbers.

Better: use libevent.

(select()'s limit of 1024 fildes is... odd and derives mainly from the
FD*() macros and their semantics.  Not all operating systems enforce a
limit of 1024 fildes as the obvious thing to do is to treat the fildes
sets as bitmasks of nfds bits rounded up to a multiple of NFDBITS.
Solaris does enfore a 1024 limit for select(), at run-time.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select %column% from table

2010-04-30 Thread Nicolas Williams
On Fri, Apr 30, 2010 at 07:57:05AM -0700, David Lyon wrote:
> if I had many many files like this:
> http://research.stowers-institute.org/efg/ScientificSoftware/Utility/FCSExtract/CC4_067_BM.txt
> 
> you see 2 columns keyword and value, the keywords would be the fields
> (1st column in the html link above) in the table while the values
> (second column) are the data I would insert.

When you get to have many similar columns in one table you should
consider normalizing.  So a table with 10 sets of 6 very similar
columns, as in your example:

CREATE TABLE project (name TEXT, ...,
  P1R INT, P1B INT, P1N TEXT, P1S TEXT, P1E TEXT, P1P INT,
  P2R ...);

can become two tables:

CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT, ...);
CREATE TABLE projectPs (project_id INT, x INT,
PxR INT, PxB INT, PxN TEXT, PxS TEXT, PxE TEXT,
PxP INT);

-- get P1R, P2R and P3R for name = ...
SELECT proj.*, ps.PxR FROM project proj NATURAL JOIN projectPs ps WHERE
proj.name = ... AND x < 4;

or:

CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT, ...);
CREATE TABLE project_fields (project_id INT, keyword TEXT, value TEX);

-- get P*R for name = ...
SELECT proj.*, pf.value FROM project proj NATURAL JOIN project_fields pf
WHERE proj.name = ... AND pf.keyword LIKE P_R;

or:

CREATE TABLE project (project_id INTEGER PRIMARY KEY, name TEXT, ...);
CREATE TABLE project_int_fields (project_id INT, keyword TEXT, value INT);
CREATE TABLE project_text_fields (project_id INT, keyword TEXT, value TEXT);

and so on.

You could go all the way and just have one table indexed by file name
and keyword with a value column.  However, understanding your data will
help you do better; the file you point to doesn't really include enough
information to know how best to represent it and others like it in a SQL
schema.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] copy one row to another

2010-04-07 Thread Nicolas Williams
sqlite> CREATE TABLE foo(id INTEGER PRIMARY KEY, a, b, c);
sqlite> insert into foo values(1, 'a', 'b', 'c');
sqlite> select * from foo;
1|a|b|c
sqlite> CREATE TEMP TABLE tempfoo AS SELECT * FROM foo WHERE id = 1;
sqlite> UPDATE tempfoo SET a = 'z';
sqlite> INSERT OR REPLACE INTO foo SELECT * FROM tempfoo;
sqlite> SELECT * FROM foo;
1|z|b|c
sqlite> 

The trick is CREATE TEMP TABLE ... AS SELECT * FROM ... WHERE ...
followed by INSERT OR REPLACE INTO ... SELECT * FROM .
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Berkeley DB adds SQL using SQLite API !!

2010-03-31 Thread Nicolas Williams
On Wed, Mar 31, 2010 at 06:16:13PM -0700, Jim "Jed" Dodgen wrote:
> On Wed, Mar 31, 2010 at 8:50 AM, Wiktor Adamski
> >> (3) Each table and index is in a
> >> separate file so your "database" was a directory full of files instead
> >> of a single file
> >
> > This one is not a problem. Actually I don't see how 1 file is better
> > than 1 directory. [...]
> 
> Two reasons I prefer the single file approach:
> 
> 1. Simpler copy, tables and indexes don't get lost or mismatched.
> 
> 2. fewer handles to open a database. Lower overhead.

3. One fsync() vs. many (or a full sync()).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite over NFS

2010-03-29 Thread Nicolas Williams
On Mon, Mar 29, 2010 at 07:50:38AM -0700, Kumar, Abhinav wrote:
> I am using SQLite version 3.5.9. My db sizes are 50-100 Gb. My DB is a
> typical star schema. I am seeing an order of magnitude more time to do
> a simple select query when doing over NFS (30-60 seconds) as compared
> to local disk (2 seconds). Is there any way to optimize this ?

Does setting the DB page size to 4KB help?

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Available alternatives to syntax diagrams in documentation

2010-03-25 Thread Nicolas Williams
On Tue, Mar 23, 2010 at 11:50:10PM -0400, Shane Harrelson wrote:
> I have no idea what kinds of things a "screen reader" would need to make
> this version useful, but if you let me know, I will try to add them.

Screen readers are used by those who have impaired sight or not sight
(i.e., blindness).  Screen readers look for character data and generate
speech to read it.  GIFs, PNGs, ... are not readable by screen readers
-- not without resorting to OCR.  Thus just generating text to put in
the ALT tag of the IMGs would be a huge improvement in making the docs
accessible to the visually impaired.

What you made available at

http://www.sqlite.org/docsrc/artifact/873cf35adf

is readable via a screen reader because it's all text (the only IMG in
that page is for the SQLite3 documentation logo).

All you have to do now is make the scripts that generate

http://www.sqlite.org/lang.html

and friends put the text diagrams as ALT text for the GIFs and you're
done :)

(I don't need accessible docs, but clearly others do; on their behalf:
thanks!)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Computing day of week the using strftime() function - Feature or bug ?

2010-03-23 Thread Nicolas Williams
On Tue, Mar 23, 2010 at 07:25:56PM +, Bernie Reiter wrote:
> I am checking this for Sunday, March 21st 2010, Monday, 22nd March 2010 and 
> Tuesday, 23nd March 2010:
> 
> Sunday, March 21st 2010:SELECT strftime('%w',2010-03-21); => 6

You need single quotes around the date value.

2010-03-21 == 2010 + -03 + -21 -- surely not what you intended.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Why we don't have strpos function?

2010-03-17 Thread Nicolas Williams
On Wed, Mar 17, 2010 at 03:02:59PM -0500, Jay A. Kreibich wrote:
> On Wed, Mar 17, 2010 at 04:42:59PM -0300, Israel Lins Albuquerque scratched 
> on the wall:
> > are you right the call to strpos("??", "??") are returning 5 and 
> > not 3 
> > 
> > I'm looking for this... 
> 
>   You can't do pointer math with values returned by strlen().

Hmmm.  On Solaris strlen() returns the number of _bytes_, not
characters, in the given string.  On Linux the manpage claims that
strlen() returns the number of characters in the string, but doesn't
define "character", but then, other glibc manpages actually have
examples of using strlen() in pointer arithmetic (see utmpname(3), for
example).  I can't imagine the glibc strlen() counting characters in the
Unicode sense, only in the old C sense (char), but then, I've not looked
at its source code.  In any case, there's a lot of code out there that
uses strlen() for pointer arithmetic.  And if strlen() really did count
characters rather than bytes, that wouldn't cause buffer overruns (since
number of characters in a string is necessarily less than or equal to
the number of bytes) though it would cause other bugs, some potentially
security bugs, such that I believe it quite unsafe for strlen() to do
anything other than count bytes in a string (not including the NUL
terminator).

As to strpos(), one should not naively implement it or anything like
strstr() or strrstr().  The relevant Unicode concepts are: codepoint,
character (composed of codepoints) and glyph (composed of characters and
codepoints).  Even if you support only codepoints you have to be mindful
of multi-byte encodings in UTF-8 and UTF-16.  Multi-character glyphs are
harder to deal with than multi-codepoint characters since you can easily
determine whether a codepoint is a combining codepoint (well, you have
to map the codepoint to various codepoint ranges, so that this is not a
cheap operation).  Normalization also affects strstr()-like functions.

On the plus side you can optimize such functions whenever you see two or
more contiguous US-ASCII codepoints.

Cheers,

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] parallelizing an update

2010-02-02 Thread Nicolas Williams
On Tue, Feb 02, 2010 at 09:23:36AM +0100, Sylvain Pointeau wrote:
> On Mon, Feb 1, 2010 at 5:16 PM, Nicolas Williams
> <nicolas.willi...@sun.com>wrote:
> > Now to parallelize this:
> >
> > function par_updates {

> I would be very interested to see some benchmark, just to see.

Feel free to write the relevant program, schema, SQL statements and run
benchmarks against it.  What performance you get will depend on: what
the long-running processes are doing (are they CPU bound or I/O bound?)
and what CPU or I/O resources you have.

Note: I didn't test par_updates(); I've written that sort of script
many times before, so the gist of it is right.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] parallelizing an update

2010-02-01 Thread Nicolas Williams
On Sat, Jan 30, 2010 at 10:36:56AM +0100, Sylvain Pointeau wrote:
> echo "begin transaction" >> update.sql
> 
> sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
> while read rowid item ; do
>  status=$(long_running_process "${item}" )
>  echo "update foo set status=${status} where rowid=${rowid} ;" >> update.sql
> done
> 
> echo "commit transaction" >> update.sql
> 
> sqlite3 sample.db < update.sql

More aesthetic (since it seems to matter :)

(
echo "begin transaction"
sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
while read rowid item ; do
status=$(long_running_process "${item}" )
echo "update foo set status=${status} where rowid=${rowid} ;"
done
echo "commit transaction"
) | sqlite3 sample.db

Now to parallelize this:

function par_updates {
typeset -i n
n=$1
shift
(
trap "((n++))" CHLD
echo "begin transaction"
sqlite3 -separator $'\t' sample.db 'select rowid, item from foo;' |
while read rowid item
do
while ((n == 0))
do
sleep 1
done
(echo "update foo set status=$("$@") where rowid=$rowid;") &
((n--))
done
echo "commit transaction"
) | sqlite3 sample.db
}

You should run this like so:

par_updates 20 long_running_program [arguments]

It should run up to 20 processes to generate the updates.  And it should
batch all updates into one transaction.

I've not tested this, but it should work with recent versions of ksh93
and bash.  (Older versions of ksh93 might not handle the SIGCHLD trap
correctly, IIRC.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] using SQLite to store test-data having C.J.K.T. characters, in it?

2010-01-05 Thread Nicolas Williams
On Wed, Jan 06, 2010 at 08:21:05AM +0530, Harsha wrote:
> I am developing a small testing application, which in vague way does
> compare/diff etc. b/w the results obtained by executing a CLI command
> under test.  The output of this command may contain Asian character
> such as Chinese(C), Japanese (J), Korean (K) and Taiwanese (T).
> 
> And these characters may be in a specific encoding based on the OS for
> example:
> 
> Chinese - GB18030 on Solaris
>UTF-8 on Linux
> Japanese - Euc-jp on Solaris
> SJIS on windows
> UTF-8 on Solaris
> Taiwanese - UFT8 on windows and linux
>   BIG5 on Solaris
> Korean -EUC-Kr on Solaris
> UTF8 on windows and linux

Solaris definitely supports UTF-8.  The thing to do is to use UTF-8.

You can do that by either: a) using only UTF-8 locales, or b) adding
code to your application to convert between UTF-8 (used by SQLite3) and
the user's locale's codeset.  That is, convert user input to UTF-8, then
use those strings with SQLite3, and convert output back to the user's
locale's codeset before displaying it to the user.

For (b) see the iconv(3C) and iconv(1) manpages to get started on
codeset conversions, and the setlocale(3C), nl_langinfo(3C) and
langinfo.h(3C) manpages to figure out how to get the current locale's
codeset (first call setlocale(), then nl_langinfo(CODESET) to get the
codeset.  These are portable interfaces, available on Solaris and Linux
both; I don't know how you'd do the same on Windows.

> I had a glance at the documentation of SQLite, all I could figure reading
> few sections of the doc. is that.. SQLite support Unicode (up to UTF-16
> LE,BE)
> Is it a good idea for me to use SQLite to store the test-data involving the
> above set of encoding?

You can store non-UTF-8, non-ASCII strings if you use BLOBs, but I
recommend (a), else (b), as described above.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] ICU collation

2009-12-22 Thread Nicolas Williams
On Tue, Dec 22, 2009 at 07:49:24AM -0500, Tim Romano wrote:
> On 12/22/2009 5:31 AM, Sylvain Pointeau wrote:
> > It cannot be done in the application layer...
> >
> You are wrong about that. I have written a full-text search application 
> to go against ancient Germanic texts where, for example, there were two 
> dozen ways to spell the word for modern English 'sister' --spelling had 
> not yet been regularized but reflected regional dialect pronunciations 
> and regional scriptorium conventions.  There is no way an ICU collation 
> could handle that crazy quilt and it had to be done in the application 
> layer.
> 
> It is done in the application layer  by *normalizing* the data on its 
> way into the database and then, of course, you must also normalize the 
> search terms as the user supplies them.  So, for example, if you are 
> importing a-umlaut you store 'ae' and if you are given a-umlaut as a 
> search term by the user, you search for 'ae'.  Normalization of 
> graphemes is analogous to Unicode decomposition of composite characters.

Indeed.

> However, if SQLite can flip an ICU German collation into Full 
> Normalization mode this could be done in the database.

Sure, but you lose control in the process.  Suppose you eventually need
to add support for non-German locales yet you've been using a collation
that performs these conversions -- oops.  Or suppose there are
well-known words of foreign origin to which this sort of normalization
must not be applied: a generic toolkit like ICU, that works
character-by-character or codepoint-by-codepoint, will not know about
them -- why should it?  And so on.

> P.S. I recently asked for a lightweight raw "reverse-string" 
> (codepoint-by-codepoint) function to be added to the SQLite core 
> (because I don't have access to its UDF mechanism in Adobe 
> Flex/FlashBuilder) and do agree that there are often good reasons for 
> wanting something to be done in the database layer, provided it does not 
> slow the database down for everyone else.

Yes.  I believe that databases need to support Unicode normalization-
insensitive/preserving behavior, at least as an option (most input
methods produce pre-composed output, so often one can get away with not
normalizing at all).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] ICU collation

2009-12-21 Thread Nicolas Williams
On Tue, Dec 22, 2009 at 06:56:29AM +, Simon Slavin wrote:
> On 22 Dec 2009, at 4:55am, Sylvain Pointeau wrote:
> > How is [this?] supposed to work ICU in SQLite?
> 
> I hope someone can answer your question.  I don't know enough.

I don't know about ICU but, really this is something that needs to be
handled where you have enough context to decide if a given equivalent
spelling rule actually applies.  You can only really do that at the
application layer, where you might know such things as the user's locale
(language, codeset, ...).  By the time SQL collations are involved it's
probably too late and you'll get equivalences that you might not have
intended.

Can you "normalize" any such strings at the application layer prior to
passing them to SQLite3?

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite files and locking on Lustre filesystems.

2009-12-14 Thread Nicolas Williams
On Sun, Dec 13, 2009 at 11:02:18AM -0800, George Hartzell wrote:
> My computing environment at work is bringing a new cluster online, in
> addition to some local storage there will be shared access to SAN
> style storage using the Lustre filesystem.
> 
> My reading about Lustre tells me that it supports POSIX semantics,
> including flock and lockf (although they must be explicitly enabled).

Yes, Lustre provides POSIX semantics, including true POSIX data
synchronization semantics (i.e., no need to worry about NFS-like
close-to-open semantics).

> That suggests to me that sqlite databases can be safely read/written.
> Is that correct?

Yes.

> Does anyone have any direct experiences they can share with sqlite and
> Lustre?

I haven't tried it yet.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] BUG: The sqlite3 shell does not call setlocale

2009-12-14 Thread Nicolas Williams
On Sat, Dec 12, 2009 at 12:39:23PM -0800, Roger Binns wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Alexey Pechnikov wrote:
> > On unix the shell must do this initialisation:
> > 
> > setlocale(LC_ALL, "");
> 
> Why?  Yes I know what the call does, but what desirable effect does it have
> on shell input and output?  Pretty much all formatted output is done using
> %s.  The only float output is for timing commands.  No input is done using
> scanf.  Except for the disabled by default iotrace, all fopens are in binary
> mode.

If there's any perror()/strerror() calls, or if you're printing
floats/doubles for user presentation then you definitely want to
setlocale().  There are probably other reasons to call setlocale().

I see only strerror() calls in the proxy lock paths, so that's out.  And
SQLite3 doesn't distinguish between user presentation and other
contexts.  So not calling setlocale(LC_ALL, "") seems possibly
appropriate, but it seems better to either not depend on it (so that if
called by an the application, or an extension, nothing bad happens) or
to force the use of the C locale (but a library must not do that!).

Since an application using libsqlite3 can always call setlocale(), it
seems to me that the sqlite3 shell should as well.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The byte order mark problem in fields with ICUcollation

2009-12-07 Thread Nicolas Williams
On Mon, Dec 07, 2009 at 05:35:49PM -0500, Igor Tandetnik wrote:
> Alexey Pechnikov 
> wrote: 
> > The normalization is now performed by any string operation. But more
> > fast and useful to do it once at data store. 
> 
> So, which normalization form should the data store choose for me? And
> what if I need a different one?
> 
> I'd rather the database store my data exactly the way I put it in. I
> really don't want it to decide for me what my data should look like.

I believe the right thing to do is to normalize strings when creating
index entries, but to leave the table data unnormalized.  You'd have to
make the equality operator also normalize though.

That way you can have a unique text column and it will accept 
only one way, composed or decomposed, but not both.

I.e., normalization-insensitive matching, normalization-preserving.
Provides the best user experience.  If multiple systems' input methods
produce text in different normalization forms, or even unnormalized,
users will still find their data -- no surprises.  And given that
whatever systems the users are using likely can display the strings
produced by their input modes, preserving those strings unmodified gives
you the highest likelihood that the strings returned will display
properly.

(This is what Solaris implements for NFSv4, CIFS and local ZFS
filesystem access, for example.  ZFS hashes directories, and it
normalizes filenames prior to hashing, both on create and lookup, but
the directory entries are left unnormalized.)

To do this right requires support in SQLite3, even if it's provided by
an extension.  I don't recall if user-defined collation functions
provide everything you need to support this.

> >>> May be automatically dropping the BOM for
> >>> ICU collated fields is more correct way.
> >> 
> >> Why don't you do just that in your application?
> > 
> > Yes, I fix it in my application, but this problem can be produced in
> > any application. 
> 
> One person's problem is another's feature. If that other application
> doesn't want BOM in its strings, it should strip it, just like yours
> now does.

+1

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] The next release of SQLite....

2009-12-03 Thread Nicolas Williams
Use the glob operator.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] WHERE CLAUSE in UNICODE

2009-11-30 Thread Nicolas Williams
On Mon, Nov 30, 2009 at 08:56:48PM +, Simon Slavin wrote:
> On 30 Nov 2009, at 6:49pm, Nicolas Williams wrote:
> > See my previous message: it would make no sense to have a column with
> > data-dependent collations.  But perhaps I'm missing something.  Can you
> > describe the semantics of data-dependent collations?
> 
> Suppose I am preparing a sales system.  It can be used by users in
> many countries, but a user in each country sees only products sold in
> that country, and they see those products sorted into alphabetic order
> -- however their own country sorts its alphabet.  So in one table I
> have
> 
> Product IDMarket  Product name
> ----  
> 1 US  Box
> 2 US  Bible
> 3 US  Barrel
> 4 France  Boîte
> 5 France  Bible
> [...]
> 
> So in the same column I have words in three different languages.  If
> German had a different alphabet to English, depending on which
> customer I am SELECTing for I will need to use a different collation
> to sort the results.  Of course, I expect this is relatively rare, and
> can be perhaps be handled using VIEWs or some other trick, but I
> wanted to demonstrate that it is sometimes needed.

This falls into case (b): localization.  Here it'd be nice if SQLite3
allowed collations to be named via parameters, so you could compile a
query once.  But it does not, so you need to compile a query for each
language.  But that's as far as it goes: you're still specifying the
collation rather than letting the data specify a collation.

I.e., it'd be nice if you could compile

SELECT product_name FROM products
WHERE market = :country ORDER BY product_name COLLATE :lang ASC;

bind in the country and language, and then execute, but as it is SQLite3
doesn't let you parametrize collations.  So you have to mprintf and
compile a separate query for each collation.

Oh, I suppose one might want to allow a collation to be the result of
another query, so that you could lookup collations by country code, or
what have you

SELECT product_name FROM products
WHERE market = :country
ORDER BY (SELECT lang FROM country_to_lang WHERE country = :country)

But that's still not a collation that is derived from the data being
sorted, but from other data.

This:

SELECT product_name FROM products
WHERE market = :country ORDER BY lang_of(product_name);

makes no sense!  That's because nothing guarantees that
lang_of(product_name) will be singular here.  (Also, not only does it
make no sense, it doesn't help you since SQLite3 will not compile such a
query.)

> >>> Note too that Unicode has codepoints for specifying the language that
> >>> the subsequent text is written in.
> > 
> > http://unicode.org/unicode/faq/languagetagging.html#2
> > http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf#G17521
> > (section 16.9)
> 
> Many thanks.  I did not know this and it saves me from having to
> invent my own system in something else I am doing.  I do understand
> why it should not normally be used and I will not over-use it.

I don't think you should use Unicode language tags though.  See above
and previous messages.

> On 30 Nov 2009, at 7:16pm, Nuno Lucas wrote:
> > Note that some countries have different collations depending on the 
> > objective of the output. For example, dictionary order may be different 
> > from phone-book order and different from other general listing order.
> 
> Arg !

Yes, it all sucks.  At least some language academies are now working to
reduce such problems.  For example, it used to be that case folding is
Spanish did preserve accents in the to-upper-case direction, and Spanish
used to have special sort orders for 'ch' and 'll' (which were as though
a single character).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] WHERE CLAUSE in UNICODE

2009-11-30 Thread Nicolas Williams
On Mon, Nov 30, 2009 at 06:37:11PM +, Simon Slavin wrote:
> 
> On 30 Nov 2009, at 5:51pm, Nicolas Williams wrote:
> 
> > Consider a column that contains a person's last name.  Q: do proper
> > names have a language?  A: No, since people can be from all over and
> > even within a single country may have last names of various radically
> > different origins.
> 
> But what is the purpose of collating a column ?  Why, to allow it to
> be indexed, of course.  And for it to be indexed every value in the
> column must be comparable to every other value.  So it might be
> sufficient to simply declare the column as having a language:
> 
> ALTER TABLE ADD COLUMN familyname UNICODE LANG Deutsche
> 
> Actually, we'd probably use ISO 639-3:
> 
> ALTER TABLE ADD COLUMN familyname UNICODE LANG deu

There's already a COLLATE column-constraint.  Given the use of Unicode
'collation' is approximately the same as 'language'.  One use of it is
to ensure that an index can be used to optimize ORDER BY clauses where
the ORDER BY clause's collation is defaulted or the same as in the
index, but it also affects comparisons, even equality comparisons.

> That would be sufficient to allow the standard SQL functions like
> indexing and comparison to be implemented.  The column 'language'
> could perhaps be absolute, or perhaps be used as a default if the
> individual values did not declare a language.  On the other hand, it
> might perhaps not be necessary to declare the language for each
> column: it's likely that all columns for any database would want to
> use the same language for collation.

See my previous message: it would make no sense to have a column with
data-dependent collations.  But perhaps I'm missing something.  Can you
describe the semantics of data-dependent collations?

> > Note too that Unicode has codepoints for specifying the language that
> > the subsequent text is written in.
> 
> I did not know this !  This makes things simpler.  Are you talking
> about
> 
> http://unicode.org/reports/tr35/
> 
> This appears to be a way of specifying a language outside of the text
> stream, not inside it.

No, I meant this:

http://unicode.org/unicode/faq/languagetagging.html#2
http://www.unicode.org/versions/Unicode5.0.0/ch16.pdf#G17521
(section 16.9)

Note that use of Unicode language tags is discouraged.  The same reasons
apply here, IMO.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] WHERE CLAUSE in UNICODE

2009-11-30 Thread Nicolas Williams
On Mon, Nov 30, 2009 at 01:21:08PM -0500, Igor Tandetnik wrote:
> Nicolas Williams <nicolas.willi...@sun.com> wrote:
> > IMO you'll have two types of text to sort: a) generic text (e.g.,
> > proper names), b) localized text (e.g., message catalogs).  For (a)
> > you'll want 
> > to pick a collation, _any_ collation.
> 
> Actually, you may want to choose a collation familiar to your
> application's user. After all, she's the one looking at the list of
> names, the one you have to convince the list is in fact sorted.

Indeed, that's one way to pick a collation.  But you might not always be
able to do even that!  Suppose you need to print a checklist and post it
on a wall/door/whatever, and have various people update it.

> E.g. Windows has the concept of default locale and sort order, chosen
> by the user (Control Panel | Regional and Language Options). I imagine
> other operating systems provide something similar. You wouldn't
> normally want to build indexes using this collation though, as it can
> change at any time (in fact, with multiple users sharing the same
> database, you may end up sorting the same data in two different ways
> at the same time). It's only good for sorting on the fly.

Right.  For indexes you need a collation.  You might keep multiple
indexes built with different collations, but that sounds like a waste of
resources (unless you have very static data).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] WHERE CLAUSE in UNICODE

2009-11-30 Thread Nicolas Williams
On Mon, Nov 30, 2009 at 08:15:58AM +, Simon Slavin wrote:
> On 30 Nov 2009, at 1:58am, Igor Tandetnik wrote:
> > Note that Unicode collation is not as simple as you might think. Did
> > you know that in Estonian, 'y' sorts between 'i' and 'j'? Or that in
> > German phonebook sort, 'oe' sorts as if it were a single letter
> > between 'o' and 'p'? Basically, your simplistic approach would only
> > work for plain unaccented Latin letters and English collation rules.
> 
> I spent a lot of time annoyed about this, and ended up deciding that
> the only way to do Unicode sorting correctly is to store the language
> (or collation method) with each piece of Unicode text.  Of course,
> this still gives you the problem of working out which order two pieces
> of text go in if they are in two different languages.  Perhaps you
> also need a 'default language' marker for the entire column.

Consider a column that contains a person's last name.  Q: do proper
names have a language?  A: No, since people can be from all over and
even within a single country may have last names of various radically
different origins.

Now consider a column that contains a person's self-description.  Q:
what language will it be in?  A: If you don't specify it, it could be in
one (or more!) language(s) of the person's choice.  You might not need
to sort by such a column though...

IMO you'll have two types of text to sort: a) generic text (e.g., proper
names), b) localized text (e.g., message catalogs).  For (a) you'll want
to pick a collation, _any_ collation.  For (b) you'll know the language
of the relevant text and can then sort with a specific collation.  If
you think there is a third class of text then you'll be best off forcing
it into (a) or (b), IMO, as statements like this make no sense:

CREATE TABLE foo (txt, lang);
...
-- this makes no sense:
SELECT txt FROM foo ORDER BY txt COLLATE lang; -- where lang is a
   -- column of foo;

Unfortunately you can't parametrize collations in COLLATE clauses in
SQLite3!  I.e., this gives an error:

sqlite> CREATE TABLE toy(a);
sqlite> EXPLAIN SELECT a FROM toy ORDER BY a COLLATE :a ASC;
SQL error: near ":a": syntax error

Therefore for (b) you'll have to compile statements for each collation
that you want to use.  (Say you have per-language tables: you'll need
separate SELECT statements for each table.  Say you have a single table
with a text column and another column indicating language: you'll need
to compile a statement for each language since the collation cannot be
parametrized.)

Making collation functions parametrizable might be a useful extension,
so you could then write:

sqlite> CREATE TABLE toy(txt, lang);
sqlite> explain SELECT txt FROM toy WHERE lang = :a ORDER BY txt COLLATE :b ASC;

where :a specifies some language and :b specifies a collation for that
language.

Note too that Unicode has codepoints for specifying the language that
the subsequent text is written in.  Such codepoints could be used for
deriving a collation from some text.  But again, I don't think this will
prove useful, both, for the reasons given above (SELECT ...  ORDER BY
... COLLATE lang_of(...) makes no sense) and also because users won't
know how to ensure that such language tags are embedded in the text that
they write.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode support

2009-11-20 Thread Nicolas Williams
On Tue, Nov 17, 2009 at 09:31:46PM -0500, Tim Romano wrote:
>  but if ORDER BY is
> relying on an index for ordering, then flip() can have negative 
> effects.
> 
> 
> Substr() could have negative effects on ordering too.  That is a red 
> herring.  Flip() is merely a function that  reverses the order of 
> codepoints "as found" without knowing anything about what those 
> codepoints, individually or in combination, might signify in a writing 
> system.  If I want to write those codepoints to a column that's my concern.

In Unicode there's codepoints, characters, and glyphs.  Codepoints are
single 21-bit values.  Characters are either single codepoints or
combinations of codepoints.  Glyphs are either single characters or
combinations of characters that are displayed as single
programatically-constructed glyphs.  SQLite3 knows about none of that.
Nor about normalization forms.

Therefore any functions like substr() and flip() that work at the
codepoint level (or worse, at the byte level, but fortunately substr()
is UTF-8/16 aware) can break semantics for your strings.

> What if I wanted to have a column that consisted of codepoints from all 
> over the Unicode range: a codepoint from Greek next to a codepoint from 
> Swahili next to a codepoint from Hungarian?  Shouldn't I be able to say 
> to a database:  this column contains codepoints (characters) and 
> collation is not relevant, sort the column using the numeric value of 
> the codepoints? 

Yes, I think so.  I'm not sure why you'd want that, but yes, it ought to
be possible, and right now SQLite3 lets you do that because it is not
aware of characters and glyphs -- SQLite3 is aware of only codepoints.
But if you load the ICU extensions that might change!

Ideally there should be a way to indicate a variety of Unicode-related
behaviors:

 - normalization form for use in index keys
 - normalization-insensitive string comparison operators
 - whether to normalize values in tables and, if so, with what form (by
   column, obviously)
- if you normalize strings in index keys but not in tables then you
  get normalization-insensitive-but-normalization-preserving
  behavior, which is really, really convenient

 - collation options, such as language
- whether to honor language tags embedded in the UTF-8/16 strings

 - multiple text types? (string of codepoints, of characters, or glyphs)

 - a whole range of Unicode-aware functions like substr() (and flip(),
   and like(), and regex(), and glob(), ...), with options for character
   and glyph counting instead of codepoint counting

 - codesets (for non-Unicode data), with automatic codeset conversions
   similar to type conversions
 - to have automatic conversions I think would require an extensible
   text type system

That's... a lot of functionality.  I'm not sure how much of it needs to
be implemented with help from the SQLite3 core, versus extensions.  It'd
be nice if all of it could be implemented via extensions, but I don't
think that's possible right now.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode support

2009-11-17 Thread Nicolas Williams
On Tue, Nov 17, 2009 at 05:15:16PM -0500, Igor Tandetnik wrote:
> Nicolas Williams <nicolas.willi...@sun.com> wrote:
> > This is no longer true, either of 'ch' nor 'll'.
> 
> There is a number of contractions in Hungarian that are still very
> much in use, but I can't recall them off the top of my head the way I
> can 'ch' (it's something like 'dzs'). There are also contractions in
> German Phonebook sort (e.g. 'oe' should sort between 'o with umlaut'
> and 'p', if I recall correctly). There are likely other cases.

I'm not surprised :(

> > The principle you
> > state is correct, of course, but really, this is a collation problem,
> > and affects SQLite3 apps regardless of "flip()".
> 
> My point is, it's difficult to even define what the correct behavior
> of flip() should be, let alone implement one. And so the safest course
> of action is to leave it out of core SQLite: a developer in need of
> such a function would presumably know the nature of their data and
> precisely what they want the function to achieve, and can always
> implement it as a custom function.

Maybe.  For indexing, I don't see the harm as long as an index built
with this function isn't used for ORDER BY when you care about
collations (ah! SQLite3 couldn't tell this is happening without knowing
the semantics of the function).

> > The collation is
> > per-column, and the run-time should make functions aware of the
> > collation (if any) of a column when an argument.
> 
> What about
> 
> select flip(EnglishText || GermanText || SpanishText)
> from MyMultilingualTable;

No different than:

select EnglishText || GermanText || SpanishText from MyMultilingualTable;

the concatenation can create 'oe' and all those other whatever they are
called's.

This is OK until you ORDER BY, and _then_ the collation requested or
inferred needs to apply.  Ah, there should be no inference of collation
from function names, and functions shouldn't have to care about
collations "in effect" -- only ORDER BY should care, but if ORDER BY is
relying on an index for ordering, then flip() can have negative effects.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unicode support

2009-11-17 Thread Nicolas Williams
On Tue, Nov 17, 2009 at 02:01:55PM -0500, Igor Tandetnik wrote:
> This would mean that the result of the hypothetical flip() function
> would be locale-dependent. E.g. in Spanish Traditional sort, a
> combination 'ch' sorts as if it were a single letter between 'c' and
> 'd', forming a single sort element (a so-called contraction). So
> should 'a ch b' reverse to 'b ch a' under Spanish Traditional sort,
> and to 'b hc a' otherwise? Would you pass a desired locale as a
> parameter to flip(), in order to achieve that?

This is no longer true, either of 'ch' nor 'll'.  The principle you
state is correct, of course, but really, this is a collation problem,
and affects SQLite3 apps regardless of "flip()".  The collation is
per-column, and the run-time should make functions aware of the
collation (if any) of a column when an argument.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Output in currency format

2009-11-13 Thread Nicolas Williams
On Fri, Nov 13, 2009 at 03:07:27AM +, Simon Slavin wrote:
> 
> On 13 Nov 2009, at 12:34am, Nicolas Williams wrote:
> 
> > On Thu, Nov 12, 2009 at 11:40:23PM +, Simon Slavin wrote:
> >> There's still some possibility for confusion, however: how many places
> >> of decimals do you use for each currency ?  As far as I know, no
> >> currently traded currency uses more than two digits of precision.
> >   ^^^
> > They use integer math to avoid floating point rounding issues, but
> > logically those integers are still real (or at least rational) numbers,
> > and so we can speak of base, mantissa and exponent.  The precision
> > required is pretty large, much more than two digits.
> 
> Integers in all languages I'm aware of are not stored as
> mantissa/exponent, they're stored as bits with complete precision.

That's why I wrote "logically".  Clearly, very, very clearly, int64_t
is not a float, has no mantissa, no exponent.  But you can use it as
though an int64_t were a real (well, rational) number.

> You can say you need a particular number of bits, but you'll never
> lose the last bit (the 1s) just because your numbers have got too big.
> You'll get an overflow error instead.

Yes.

> By 'two digits of precision' I was referring to cents for US dollars,
> pence for pounds sterling, etc..  Some currencies have no fractional
> part (e.g. Yen).  And I was sure there were ... ah, here we are:

Tenths of pennies are also used though.  See just about any gas station
in the U.S.

Oh, I forgot, with 64-bit _signed_ ints you get one fewer digit of
precision than I wrote earlier.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Output in currency format

2009-11-12 Thread Nicolas Williams
On Thu, Nov 12, 2009 at 11:40:23PM +, Simon Slavin wrote:
> There's still some possibility for confusion, however: how many places
> of decimals do you use for each currency ?  As far as I know, no
> currently traded currency uses more than two digits of precision.
   ^^^
They use integer math to avoid floating point rounding issues, but
logically those integers are still real (or at least rational) numbers,
and so we can speak of base, mantissa and exponent.  The precision
required is pretty large, much more than two digits.

Consider U.S. dollars, where we need to express from some fraction of
pennies to tens of trillions of dollars.  That's at least 16 digits of
precision.  You need to use larger than 32-bit integers for this,
meaning, in practice, 64-bit integers.  Besides, 100 years ago 16 digits
of precision for counting money would probably have seemed farfeteched,
so use 64-bit integers, which gets you a bit less than 20 digits of
precision.  (Hmmm, 19 digits looks a bit small now!)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] feature proposal - strong but dynamic typing

2009-10-30 Thread Nicolas Williams
On Fri, Oct 30, 2009 at 03:59:11PM -0500, Jay A. Kreibich wrote:
> On Fri, Oct 30, 2009 at 03:19:59PM -0500, Nicolas Williams scratched on the 
> wall:
> > I should add that a pragma that cause CHECK constraints to be
> > automatically created for enforcing strong typing in subsequent CREATE
> > TABLE statements 
> 
>   That's tricky.  Values have TYPES.  Columns have AFFINITIES.  There
>   is not a clear or obvious one-to-one mapping between them in all
>   cases.  The CREATE TABLE column datatypes can be used to figure out
>   the column affinity, but that doesn't always make it clear what
>   value type should go in the CHECK expression.

Indeed.

>   Also, CHECK expressions need to go into the CREATE TABLE definition.
>   They can't be added after-the-fact, like key triggers.  There are
>   ways around this, of course, but they're kind of messy.

That's actually a _feature_.  The pragma should affect only subsequent
table creation.

Note: I am not asking for this.  I'm only asking that whatever is done,
if anything, be backwards compatible, and preferably result in DB files
that are compatible with older versions of SQLite3.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] feature proposal - strong but dynamic typing

2009-10-30 Thread Nicolas Williams
I should add that a pragma that cause CHECK constraints to be
automatically created for enforcing strong typing in subsequent CREATE
TABLE statements is rather like having FOREIGN KEY clauses automatically
generate triggers.  There's precedent, in other words, and it is a
simple way to implement strong typing.

Also, when you view the schema you'll see the CHECK clauses, and will
know not only that typing is enforced at INSERT/UPDATE time, but also
what type SQLite3 actually inferred from the declared type.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] feature proposal - strong but dynamic typing

2009-10-30 Thread Nicolas Williams
On Fri, Oct 30, 2009 at 01:30:31PM -0400, John Crenshaw wrote:

+1

I don't think this proposal can or will be accepted.

One reasonable idea, perhaps, would to have a pragma that causes
subsequent CREATE TABLE statements to get automatically generated CHECK
expressions that enforce typing.  Any CHECK expressions in the given
CREATE TABLE statements would have to be wrapped, but that seems simple
enough.

That way you get forwards- and backwards-compatibility for DB files,
schemas, and SQL statements, while while still having the options of
strong and dynamic typing and the ability to mix the two.

And you'd not need any ugly keywords like "STRONG" or "UNIVERSAL"
(which, because SQLite3 accepts any type names, would have compatibility
issues anyways).

You'd still pay for type checking at run-time, even when using strong
type checking.  I'm OK with that.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Late data typing. Am I missing something?

2009-10-29 Thread Nicolas Williams
On Wed, Oct 28, 2009 at 07:11:29PM +, O'Neill, Owen wrote:
> You can get close if you put some check constraints on the columns.

This is key: you can get the benefits of static and dynamic data typing.

> I must agree with other posters that the lack of an exposed timestamp
> type does feel like something of a gap.

Given the rather large number of reasonable ways to represent
timestamps, I have to disagree.  I'd rather have the freedom to use
whichever representation is best for my needs.

E.g., if a table contains rows representing Unix files, then I'll
probably want to store seconds since the Unix epoch because that will
mean fewer conversions, depending on how I use that table anyways.

What's needed is a _cheap_ function that can be used in a CHECK
constraint on timestamp columns.  And, perhaps, a common convention for
naming types that correspond to specific timestamp representations.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Idea for improving page cache

2009-10-27 Thread Nicolas Williams
On Tue, Oct 27, 2009 at 04:28:11PM -0400, John Crenshaw wrote:
> "advantage" kind of depends. ULL is more specialized. You gain some
> benefit, but also lose some as well. For example, consider what is
> involved in doing a sorted insert into an ULL. On the other hand, you
> can get all of the same locality benefit with a pool allocation
> scheme. You don't reduce the pointer overhead this way, but that isn't
> really much of an issue.

I don't know that you get the same locality benefit with pools: with a
ULL there's fewer prev/next pointers to take up valuable cache space.

The need to "pullup"/"breakup" a ULL at times is annoying, and to
minimize you probably have to waste some space in order to amortize the
cost.  As you say, there's a trade-off.  Many optimizations result in
more complex, error-prone, brittle software.

Wasting space is not much of a problem for cache locality, and if you
keep the wasted space well under 50% you're ahead of plain lists in
terms of memory footprint.  So, ignoring code complexity, ULL seems like
a likely win-win performance-wise.  Even code complexity-wise, ULLs
allow random access to be much faster than with plain linked lists,
which means you're more likely to use random access, which means you
probably win code complexity-wise too.  Of course, once you start
leaving room for additions in each chunk, ULLs start resembling B-trees,
which, I think, is your point: might as well go whole-hog.

> ULL requires less time to walk all nodes, but since you can't afford
> to keep ULL sorted, there aren't a lot of scenarios where that saves

I don't think it's true that you can't afford to keep it sorted, but you
probably have to waste space if you want it sorted and allowing most
random inserts to not require spills.  Also, just because a chunk is
full, or even all chunks are full, does not mean you must move lots of
memory around: you can just insert a new chunk and move some entries
from the neighboring ones into it, thus creating some breathing space
for further additions (at the cost of space waste).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Degree character not displayed correctly.

2009-10-26 Thread Nicolas Williams
On Mon, Oct 26, 2009 at 10:01:43AM -0700, Roger Binns wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Jean-Christophe Deschamps wrote:
> > First decide or determine what is (or shall be) your database 
> > encoding.  Even if SQLite has no problem storing ANSI (or EBCDIC or 
> > anything else) strings untouched,
> 
> This isn't particularly good advice.  SQLite works solely in Unicode.  When
> you supply text it *must* be in either UTF8 or UTF16 according to the API
> being used.  Sometimes it will appear that you can get by using a different
> encoding but that is just luck and things that operate on text will fail.
> The actual encoding used by the database is pretty much irrelevant and other
> than the pragma you can't even tell what it is nor would you care.

Indeed.  IIRC SQLite3 is actually 8-bit clean, but that doesn't matter:
by stating that it uses UTF-8 (and UTF-16) the SQLite3 developers are
actually allowing themselves the freedom to do a variety of things that
would break your application if you used non-UTF-8 (and non-UTF-16)
text.

For example, the SQLite3 developers might add code to reject strings
with invalid UTF-8 sequences, or strings which use unassigned code
points (unassigned in the version of Unicode supported by the SQLite3
that you are running).  Or they might add support for case-insensitive
matching for non-ASCII text.  Or they might add normalization-
insensitive matching.  And so on.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] index for a group by

2009-10-21 Thread Nicolas Williams
On Thu, Oct 22, 2009 at 12:34:26AM +0200, Sylvain Pointeau wrote:
> if your "book" contains all lines (a,b,c,t,d)and you create an index on
> (a,b,c,t)
> 
> then your index is as fat as your book, isn't it?

Depends on the size of d.

Also, if you add a constraint declaring t, a, b, and c (you want 't'
first!) to be unique then the DB could make the whole thing smaller than
if you first create the table, then the index.  (I'm not sure of SQLite3
does that, but it could).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Bug in date() function ??

2009-10-13 Thread Nicolas Williams
On Tue, Oct 13, 2009 at 03:30:44PM -0400, D. Richard Hipp wrote:
> Begin with 2001-03-31
> Add 1 to 03, yielding 2001-04-31
> 04-31 means the 31st day from the beginning of april: 2001-05-01
> 
> Begin with 2001-03-31
> Subtract 1 from 03 yielding 2001-02-31.
> 02-31 means the 31st day from the beginning of february:  2001-03-03

The fact that Earth years are not a whole multiple of some convenient
number of Earth days (i.e., months), is certainly annoying.  What
SQLite3 does seems perfectly justified; that it may sometimes seem
surprising is not your fault, but cosmic chance.

I think the OP expected that 2001-03-31 - 1month == 2001-02-28 (or 29,
on leap years), because that's often (but not always) what people mean
when they say "a month ago".  You could have a lot of special casing in
date() to get something closer to what people normally mean by "a month
ago", but it'd be alot harder to explain the many heuristic choices, and
the choices might be too specific to one language/culture -- that'd not
be worthwhile, IMO.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Compiler warnings with Sun Studio 12.1 on Solaris 10

2009-09-30 Thread Nicolas Williams
On Tue, Sep 29, 2009 at 05:28:39PM -0700, Roger Binns wrote:
> Nicolas Williams wrote:
> > On Tue, Sep 29, 2009 at 11:21:30AM -0700, Roger Binns wrote:
> >> Nicolas Williams wrote:
> >>> If you move the cast to the left the warning should go away: 
> >>> ((sqlite3_int64)(1L<<63))
> >> And this is why making warnings go away leads to bugs.  The replacement
> >> above will only work if sizeof(long)==sizeof(long long) which is not the
> >> case on Windows in 64 bit mode or in 32 bit mode in general on any 
> >> platform.
> > 
> > Where is long long entering the picture here? 
> 
> Really? The cast is to a 64 bit quantity.  Pretty much every compiler uses
> 'long long' to represent a 64 bit quantity (even in 32 bit mode), although
> some also have __int64 and others int64_t depending on age and standards
> compliance.

Oh, I think you meant that the L suffix would break on anything other
than LP64 or ILP64 models.  I'm not entirely sure of the use of the L
suffix, and perhaps that would be a problem.  If you can't find a
constant that works equally well in ILP32 and LP64 then you can use
#ifdefs.  But I think that it is possible to have such a constant
without causing compiler warnings (for one, you can have the literal
constant, in decimal).

> Another example of compilers whining is gcc saying that in one part of the
> code, a parameter to memset could be zero (which usually indicates a
> programming error).  Except the compiler is wrong as human inspection
> proves.  Just because a warning is present does not mean it is right or that
> the compiler is perfect.  Hiding these warnings is even more dangerous
> because it obfuscates the original intention of the code.  (This is why a
> universal no warnings policy can hurt.)

Indeed, and I've expressed agreement on that.

> Demonstrate the code is wrong functionality (ie tests are broken) or that it
> isn't written in a standards compliant way :-)  The compiler warnings could
> help point to those locations.

In this case, there's no problem with ignoring this warning.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Compiler warnings with Sun Studio 12.1 on Solaris 10

2009-09-29 Thread Nicolas Williams
On Tue, Sep 29, 2009 at 12:05:21PM -0700, Jim Showalter wrote:
> Warnings are never harmless--they clutter the build output and 
> introduce cognitive dissonance when trying to see if a build is clean 
> or not.
> 
> I worked on a project where they hadn't enabled warnings during 
> development because it was "too much trouble". When I enabled 
> warnings, there were more than 14,000 of them, and of course then the 
> warnings couldn't be addressed because "there were too many of them".
> 
> Zero tolerance for warnings!

I mostly agree.  There are warnings that one can safely choose to
ignore.  For example, the Sun Studio lint has a very large number of
static analysis options, a few of which I think should be removed (e.g.,
complaints about constant while loop conditions -- while (1) ... is just
too common and useful to complain about, nor is it clear why constant
while loop conditions are a bad thing given that for (;;) elicits no
complaints).  Others you might quibble about in specific projects (e.g.,
lint complaining about ignored function return values when you don't
cast the function return to void).  But ignoring all warnings?  That's
just asking for trouble.  That doesn't appear to be what the SQLite3
community is doing though, so I've no real complaint here.

In this case I agree that the warnings are harmless.  You do have do
wonder: why use constant expressions that result in overflow warnings
(0x1f<<28) when there are equivalent constant expressions that don't
(0xf<<28)?  Compiler variations might be a good reason for not using
certain constant expressions (e.g., D. R.  Hipp's point about the L
suffix).  I don't really know about compilers other than GCC and Sun
Studio, so I'll shut up now :)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Compiler warnings with Sun Studio 12.1 on Solaris 10

2009-09-29 Thread Nicolas Williams
On Tue, Sep 29, 2009 at 11:21:30AM -0700, Roger Binns wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> Nicolas Williams wrote:
> > If you move the cast to the left the warning should go away: 
> > ((sqlite3_int64)(1L<<63))
> 
> And this is why making warnings go away leads to bugs.  The replacement
> above will only work if sizeof(long)==sizeof(long long) which is not the
> case on Windows in 64 bit mode or in 32 bit mode in general on any platform.

Where is long long entering the picture here?  And what of the
alternative I gave where the actual most negative value of sqlite3_int64
is used?  In any case, you clearly know how to fix the warning safely.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQL language question

2009-09-23 Thread Nicolas Williams
On Wed, Sep 23, 2009 at 06:12:13PM +0100, Simon Slavin wrote:
> 
> On 23 Sep 2009, at 5:12pm, Nicolas Williams wrote:
> 
> > On Tue, Sep 22, 2009 at 04:45:31PM -0400, D. Richard Hipp wrote:
> >> UPDATE t1 SET x=x;   -- key line:  Is this considered an "update"
> >> of t1.x?
> >
> > Igor pointed to the standards text, which I think is quite reasonable:
> > an update is only an update if something changes.
> 
> On this basis, should the results of sqlite3_changes() and PRAGMA  
> count_changes not agree with those ?  Because they don't.

Yes, but pragmas are outside the language -- it might be useful to have
a pragma to count no-ops.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQL language question

2009-09-23 Thread Nicolas Williams
On Tue, Sep 22, 2009 at 04:45:31PM -0400, D. Richard Hipp wrote:
>  UPDATE t1 SET x=x;   -- key line:  Is this considered an "update"  
> of t1.x?

Igor pointed to the standards text, which I think is quite reasonable:
an update is only an update if something changes.

The same should probably apply to TRIGGERs too!

> And a related question:  Does anybody really care about ON UPDATE SET  
> NULL?  Has anybody ever actually seen ON UPDATE SET NULL used in  
> practice?

ON UPDATE SET NULL has very few uses, if any.  Any use of ON UPDATE SET
NULL that I can imagine is contrived (e.g., select for NULL references
in t2 as a way to detect past updates to t1).

On the other hand, ON DELETE SET NULL is very useful.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Faster inserts in SQlite ...

2009-09-21 Thread Nicolas Williams
On Mon, Sep 21, 2009 at 03:37:02PM -0400, Pavel Ivanov wrote:
> > Have you any IO operations? As result you have dependence of page
> > size.
> 
> Though your performance most probably will not depend on these
> operations because they will be executed at some random times by OS.
> And they will be collected to have multiple blocks in one operation
> anyway...
> I don't have good knowledge of how disk cache works in kernel to say
> if it will be beneficiary to send data there in chunks equal to blocks
> on disk as opposed to chunks of any arbitrary size...

It is well-known that matching DB page size and filesystem record size
is important for improving performance.

SQLite3 supports power-of-2 page sizes from 512 to 32KB.

Most filesystems use 512, 4096 or 8192 byte blocks.  SQLite3's default
pagesize is 1024 bytes.

In the case of ZFS the fs recordsize is variable up to 128KB, and can be
tuned per-dataset.  You should set the recordsize to 32KB for ZFS
datasets containing SQLite3 DBs, and you should set the SQLite3 pagesize
to 32KB.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite 3.6.17

2009-09-03 Thread Nicolas Williams
On Thu, Sep 03, 2009 at 03:27:38PM +0100, Simon Slavin wrote:
> On 3 Sep 2009, at 1:38am, P Kishor wrote:
> > well, I think the problem is with the sqlite3 command line tool.
> I agree.  I just checked it with OS X 10.6, which comes with SQLite  
> version 3.6.12, and got the same problem: you can't type accented  
> characters into the sqlite3 tool.  So I guess the sqlite3 tool uses a  
> version of readline (or whatever) which doesn't do this properly.   
> However, the SQLite database system does handle unicode characters  
> just fine.

It's almost certainly readline.  I tried SQLite3 on Solaris without
readline and it handles non-ASCII UTF-8 input just fine.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Booleans in SQLite

2009-09-03 Thread Nicolas Williams
On Thu, Sep 03, 2009 at 04:21:28PM -0400, Wilson, Ronald wrote:
> > Wrong pragma.  Try:
> 
> Thanks.  I'm going to stop talking for a few days now.  Enough gaffs for
> one day.

Heh.  But, actually, why doesn't SQLite3 produce an error when unknown
pragmas are used?  Wouldn't that be the right thing to do?  I would
thinks so.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Booleans in SQLite

2009-09-03 Thread Nicolas Williams
On Thu, Sep 03, 2009 at 03:57:14PM -0400, Wilson, Ronald wrote:
> Hmm.  I can't get the pragma to return a value at all.
> 
> SQLite version 3.6.10
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> pragma default_file_format;

Wrong pragma.  Try:

sqlite> pragma legacy_file_format;
1
sqlite> pragma legacy_file_format=0;
sqlite> pragma legacy_file_format;
0
sqlite> 

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Defining user groups in SQLite table

2009-09-03 Thread Nicolas Williams
On Thu, Sep 03, 2009 at 10:57:13AM -0700, karenebs wrote:
> I have a database table that holds about 20,000 codes.  Each code can be used
> by several different user groups.  I could add a column to the database
> table for each user group to indicate which codes that particular group has
> access to.  But, I have over 100 different groups!  Can anyone suggest
> another way of doing this?  So, group 1 has access to code1, code2, code3,
> code4, code5;  group 2 has access to code1, code2, code3;  group 3 has
> access to code 2, code3, code4. etc.  Thanks in advance for any suggestions!

Group memberships are best modelled as separate objects.

You need three tables then: one for users, one for groups, and one for
group memberships:

CREATE TABLE users (userid INTEGER PRIMARY KEY, username TEXT, ...);
CREATE TABLE groups (groupid INTEGER PRIMARY KEY, groupname TEXT, ...);
CREATE TABLE group_memberships (groupid INTEGER, userid INTEGER,
UNIQUE (groupid, userid));

You could sprinkle FOREIGN KEY and add ON CONFLICT IGNORE:

CREATE TABLE group_memberships (groupid INTEGER, userid INTEGER, 
UNIQUE (groupid, userid) ON CONFLICT IGNORE,
FOREIGN KEY (groupid) REFERENCES groups (groupid),
FOREIGN KEY (userid) REFERENCES users (userid));

You could even sprinkle ON DELETE/UPDATE/INSERT clauses:

CREATE TABLE group_memberships (groupid INTEGER, userid INTEGER, 
UNIQUE (groupid, userid) ON CONFLICT IGNORE,
FOREIGN KEY (groupid) REFERENCES groups (groupid)
ON DELETE CASCADE ON UPDATE CASCADE,
FOREIGN KEY (userid) REFERENCES users (userid)
ON DELETE CASCADE ON UPDATE CASCADE);

This makes queries for memberships and mebers, both very fast.  And you
can always use group_concat() to get a single string with all groups
that a user is a member of or all users that are members of a group.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Booleans in SQLite

2009-09-02 Thread Nicolas Williams
On Wed, Sep 02, 2009 at 05:44:38PM -0400, Shaun Seckman (Firaxis) wrote:
> I'm just curious how difficult it would be to add
> support for booleans in SQLite.  This would most likely involve adding a
> new type affinity as well as adding "true" and "false" keywords to the
> lexer.  There's much more that could be done but I'm just looking for
> rudimentary support.

The SQL Standard does specify and OPTIONAL boolean type.

> I understand that ANSI C doesn't have native support for
> booleans and that's fine,

ANSI C99 most certainly does have a boolean type.

>   I would just like to reduce some memory
> overhead as well as use those keywords instead of creating integer
> fields using 0 and 1.

SQLite3 uses a variable length encoding for integers on disk, so you
will gain no space on disk by having a native boolean type.

Well, if you had a table with a single column and that column was a
boolean, an RDBMS could express the entire table as a sparse bitstring
indexed by row number/ID/OID, with two bits per row (two bits would be
needed to encode: true, false, null/unknown, and "this record doesn't
exist").  But such a thing would be a huge new feature for SQLite3, with
very few uses.

And you're not really going to save memory in SQLite3 by having a
boolean type either -- 31 or 63 bits are a wash in the context of
evaluating a prepared statement.

_Your_ application could save memory by treating booleans as a single
bit and packing them into bitstrings, but you can do that now without
any help from SQLite3.

A group_concat() like aggregation function could be defined that groups
and concatenates boolean inputs into a bitstring -- this might be
helpful to you.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Looking for a w_char alternative for sqlite3_get_table

2009-09-01 Thread Nicolas Williams
On Tue, Sep 01, 2009 at 10:41:27AM +0200, A.J.Millan wrote:
> *  Make sure there was no 16-bit version of the sqlite3_get_table at 
> function -perhaps it would be a good idea to include it in the standard API. 
> The reason is the same who advised include the current version.

It might be easier to provide utility functions for UTF-8<->UTF-16
conversion and let developers use those when 16 variants of various
functions are missing.

(Yes, there should be libraries in your OS for UTF-8<->UTF-16
conversion, but SQLite3 already has its own.  Exporting them should not
hurt.)

But wchar_t <-> UTF-8 and wchar_t <-> UTF-16 conversions should
definitely be out of scope for SQLite3.  (wchar_t is very OS- and
locale-specific.  Expecting SQLite3 to know about wchar_t seems to me
like asking for bloat.)

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] genfkey as loadable module (Re: Run genfkey on temp db from app)

2009-07-20 Thread Nicolas Williams
On Mon, Jul 20, 2009 at 08:34:52AM -0500, Jay A. Kreibich wrote:
>   If someone is going to take the time to refactor the genfkey code
>   into a stand-alone piece of code, I would like to suggest it is made
>   into it a loadable module, so that it can be used pretty much anywhere.

That sounds like a good idea.  One could SELECT genfkey("some table");
and get back a string that has the trigger statement(s) needed, and one
could then exec those statements.  Since SQLite3 is re-entrant this
should be possible.

>   Also, if the code is written carefully, it is possible to use the same
>   code to compile a module (.dll, .so, .dylib, etc.) or to compile
>   directly into an SQLite build with very minimal changes (e.g. using
>   #defines).  That would be provide the most flexibility to the most
>   people.

I agree.  SQLite3 would remain Lite, yet the genfkey tool would be
available outside of the shell and without having to execute any
external programs.

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Do people think of SQLite as a file or as a database

2009-07-15 Thread Nicolas Williams
On Wed, Jul 15, 2009 at 07:11:51PM -0700, CadMapper wrote:
> 
> This is not a technical question about SQLite.  I want to you how people in
> general think about SQLite.  Is that a file or a database?  When you talk
> about it, do you refer to it as file or database?

It's a database.  It happens to be stored in a file (or, if you use
ATTACH, several files).

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] plans for completion of INSERT OR REPLACE INTO?

2009-07-07 Thread Nicolas Williams
On Tue, Jul 07, 2009 at 12:17:36AM +0100, Simon Slavin wrote:
> He didn't make up the rule.  Nor did I.  It's part of the standard for  
> mailing lists and usenet:
> 
> 

This is way off-topic now, but, to be fair, RFC1855 is not a standard,
it is an Informational status RFC -- it provides information.

That said, netiquette is a very good thing :)
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] plans for completion of INSERT OR REPLACE INTO?

2009-07-06 Thread Nicolas Williams
On Mon, Jul 06, 2009 at 02:49:07PM -0700, James Gregurich wrote:
> 1) Why on earth would you want to scroll all the way to the bottom of  
> a long email to get the response simply for the sake of "We read  
> English top to bottom."

Any quoted context must be read before the reply or else is not
necessary and should not have been quoted.  Thus top posting (as
the practice is known) is obnoxious because it causes the reader to
read the reply, read the quoted text, and then re-read the reply to
understand it in its proper context, and if the quoted text was not
necessary in order to understand your reply (as in this particular
case), then it just wastes bandwidth and storage.

> 2) This is going to be a challenge for me because I'm not writing a  
> fixed DB with a known schema.  I'm writing a higher-level data store  
> wrapper using SQLITE as the backing. The schema is configured at  
> runtime.

I agree.  ISTM that SQLite3 ought to call either call DELETE triggers
for all rows deleted by INSERT OR REPLACE, or UPDATE triggers for the
row that had the same PRIMARY KEY and DELETE triggers for any other
deleted rows.

(To me INSERT OR REPLACE can equally be seen as deleting all conflicting
rows, then inserting a replacement row, or else as updating the row that
conflicts with the new row's PRIMARY KEY and deleting rows that conflict
on other unique indexes.  If the INSERT OR REPLACE has many input rows
then this gets murkier in that which rows are updated or deleted may
become non-deterministic, I think, so deleting all affected rows seems
like the better solution.)

> 3)  I found the following comment on the mysql dev site (relevant  
> since "replace into" is inspired by mysql). They apparently fire the  
> triggers as expected. It seems to me that your comments overcomplicate  
> the requirements. If you are going to delete a row, call the delete  
> triggersthat is all that is needed as best I can tell. However, I  
> do admit that I am not well educated on SQL db engines. I'm figuring  
> all this out as I go along. Feel free to tell me how absurdly wrong I  
> am.

INSERT OR REPLACE is not a feature of the standard, so anything goes...

Nico
-- 
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] referential integrity and INSERT OR REPLACE

2009-07-06 Thread Nicolas Williams
On Sat, Jul 04, 2009 at 10:24:50AM +0200, Kees Nuyt wrote:
> On Fri, 03 Jul 2009 14:38:43 -0700, James Gregurich
>  wrote:
> 
> >
> >nuts. that makes INSERT OR REPLACE worthless if you have tables  
> >dependent on one another.
> >
> >
> >Is there any way to manually get a list of records for which there  
> >would be a conflict if a given record was inserted?
> 
> BEGIN;
> INSERT the row, and observe any UNIQUE constraint failures.
> 
> If there isn't one, the INSERT succeeded and you're done.

Does SQLite3 report all those failuers in one attempt?  Or must one loop
to make this a robust conflict detection algorithm?
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


  1   2   3   >