Re: [sqlite] Row Level Locking as in InnoDB

2013-11-09 Thread Ryan Johnson

On 08/11/2013 5:07 AM, Raheel Gupta wrote:

No. It's not even feature-frozen yet, as far as we know. And whenever it

is, it's incredibly unlikely to have row level locking.

Please add row-level locking if possible.
I can't think of any other single feature that would remove the "lite" 
from sqlite3 more thoroughly than adding support for row-level locking 
(or any other fine-grained concurrency control scheme).


I'm not an official sqlite3 dev, but I've seen the underbellies of 
enough database engines to be pretty confident saying this. A good 
implementation of fine-grained concurrency control and reentrancy adds a 
drastic amount of complexity and overhead to a database engine, and 
usually slows it down quite a bit unless you're willing to sacrifice 
features or generality.


Ryan



Re: [sqlite] SQlite Memory Leakage with 65536 Page Size

2013-10-28 Thread Ryan Johnson

On 28/10/2013 3:57 PM, Richard Hipp wrote:

On Mon, Oct 28, 2013 at 3:52 PM, Raheel Gupta  wrote:


Sir, is there any way to not allow malloc to hold memory? I mean, shouldn't
free() be freeing the memory?


You'll need to speak with the developers of your libc implementation about
that.
... but if you do that, be prepared to explain why both of the following 
apply to your program:
- it does not exit right after freeing that memory and it does not 
allocate more memory later
- some other process in your system is experiencing malloc() failures or 
excessive page faults because the memory has not been returned to the OS


A libc developer is unlikely to take the issue seriously unless both of 
those statements are true for your situation.


Ryan



Re: [sqlite] Segmentation Fault With Trigger

2013-10-15 Thread Ryan Johnson
I'm not an sqlite3 dev, but I don't think you'll get much help until you 
provide enough information for somebody to see what is going wrong. You 
were already asked to provide a backtrace from a debug-compiled sqlite3 
library (the backtrace you sent is all but useless).


A self-contained .c file that demonstrates the problem would be even 
better. For example, without knowing what your user-defined function 
does, we have to assume it is full of memory corruption bugs that cause 
the problems you experience.


Ryan

On 15/10/2013 12:12 AM, techi eth wrote:

Please provide me a hint to solve the issue.


Thanks..


On Fri, Oct 11, 2013 at 7:58 PM, techi eth  wrote:


It is giving the same trace that I sent last time.

Would it be possible for someone to send a test code snippet for the
scenario of calling a user-defined function during trigger execution?

Thanks a lot...


On Fri, Oct 11, 2013 at 7:41 PM, Richard Hipp  wrote:


On Fri, Oct 11, 2013 at 10:06 AM, techi eth  wrote:


Please provide some hints.


Recompile with -O0 and -g and then post the stack trace after the
segfault.

--
D. Richard Hipp
d...@sqlite.org








Re: [sqlite] Bug: analyze.c does not honor SQLITE_64BIT_STATS

2013-10-10 Thread Ryan Johnson

On 09/10/2013 9:53 PM, Richard Hipp wrote:

On Wed, Oct 9, 2013 at 9:49 PM, James K. Lowden wrote:


It's difficult to do portably because you have to account for every
combination of standard C library and integer size


Remember that SQLite does not use the standard library printf() function.
It has its own.  (See http://www.sqlite.org/src/artifact/da9119eb?ln=163)
And the SQLite version assumes that %lld means 64-bit integer and %d means
32-bit integer.
Beautiful! Proves yet again that sometimes the best solution is to 
"cheat" (change the rules).


Ryan



Re: [sqlite] Bug: analyze.c does not honor SQLITE_64BIT_STATS

2013-10-09 Thread Ryan Johnson

On 09/10/2013 10:07 AM, Ralf Junker wrote:

On 09.10.2013 15:50, Eric Minbiole wrote:


With this change, tests pass again:

 #if sizeof(p->nRow) == sizeof(long long)
 sqlite3_snprintf(24, zRet, "%lld", p->nRow);
 #elif sizeof(p->nRow) == sizeof(long)
 sqlite3_snprintf(24, zRet, "%ld", p->nRow);
 #else
 sqlite3_snprintf(24, zRet, "%d", p->nRow);
 #endif


Slightly off-topic, but I didn't think that sizeof() could be used as part
of a preprocessor directive?  (I.e., that #if sizeof(x) doesn't work as
intended, or at least not portably.)


This is more portable:

#ifdef SQLITE_64BIT_STATS
sqlite3_snprintf(24, zRet, "%lld", p->nRow);
#else
sqlite3_snprintf(24, zRet, "%d", p->nRow);
#endif
Actually, some machines define 64-bit ints as long, and passing them to 
printf with %lld triggers compiler warnings [1]. Technically the 
portable way is to make sure the int is a [u]int64_t from <inttypes.h> 
and then use the (awful and painful) printf modifiers from the same 
header (PRIu64 et al). An easier workaround is to define the int as 
size_t and then use %zu in printf... though the z modifier is only C99, 
so not every compiler supports it.


[1] in that particular case the warning is harmless, but it's a bad idea 
to train oneself to ignore printf warnings about variable sizes, lest 
problems like the one we're discussing arise.


Ryan



Re: [sqlite] How to use my own python function in a SQLite WHERE clause?

2013-10-06 Thread Ryan Johnson

On 05/10/2013 6:01 AM, Clemens Ladisch wrote:

Bao Niu wrote:

SELECT * FROM myTable WHERE MyDate > MyModule.ChineseDate("兔年八月十五")

You cannot use a Python function directly in SQL.
... but you can register it with sqlite3 easily enough and use it from 
SQL afterward:

http://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.create_function
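For instance, a minimal sketch (the conversion logic, file name, and table
are stand-ins based on the original question):

import sqlite3

def chinese_date(s):
    # hypothetical: convert a Chinese calendar string to an ISO-8601
    # date string that compares correctly against MyDate
    return '2011-09-12'

conn = sqlite3.connect('my.db')
conn.create_function('ChineseDate', 1, chinese_date)
rows = conn.execute("SELECT * FROM myTable WHERE MyDate > ChineseDate(?)",
                    ('兔年八月十五',)).fetchall()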

Ryan



Re: [sqlite] Version 3.8.1 beta

2013-10-02 Thread Ryan Johnson

On 02/10/2013 5:13 AM, Joe Mistachkin wrote:

Jan Nijtmans wrote:

Well, I did some digging as well, and it looks like the libtool upgrade
is the culprit:

See:
Classic. The error message is completely inscrutable to someone who's 
never seen it before, but technically makes sense after you know what 
the problem is. Maybe somebody could nudge the autotools guys to make 
the message a little more helpful?


Ryan



Re: [sqlite] Hints for the query planner

2013-10-02 Thread Ryan Johnson

On 02/10/2013 2:19 AM, Baruch Burstein wrote:

On Fri, Sep 13, 2013 at 5:29 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:


-- Join cardinality: Bach was a *very* prolific composer whose output
likely dwarfs the (surviving) output of his contemporaries
select p.title, c.name, p.year from composers c join pieces p on p.c_id =
c.id where c.name like '%bach%' and p.year between 1700 and 1750


How would you mark this even using the suggested function syntax? The
likelihood of "c.name like '%bach%' " being true depends on the order the
query optimizer decides to evaluate the 2 predicates in, which in turn
depends on the likelihood of the predicate!
Predicate order doesn't matter; the optimizer will push both down to 
their respective source tables before the join, where they will execute 
independently of each other. The problem would come if the above join 
were input to some outer query and the optimizer needed to reason about 
the cardinality of the join based on the selectivity of the inputs: the 
inputs would be highly filtered but the join would impose unexpectedly 
little additional filtering due to the correlation between famous 
composers named Bach and the half century of interest.


Ryan








Re: [sqlite] racing with date('now') (was: Select with dates)

2013-09-16 Thread Ryan Johnson

On 15/09/2013 2:23 PM, Yuriy Kaminskiy wrote:

Stephan Beal wrote:

On Sun, Sep 15, 2013 at 1:58 PM, Yuriy Kaminskiy  wrote:


Sure, there can be several ways to interpret CURRENT_* and *('now').
However, some of them can be useful (transaction, statement), and others
(step) cannot be. And some (sub-expression, the way it "works" currently)
are purely insane.


i've been following this list since 2006 or 2007 and i can't remember

Oh, yes, yes, "I was on debian [...] list since [...] and can't remember anyone
complaining about broken RNG for two years". So what?

And, by the way, I already complained about this behavior on this list at the
beginning of 2012.


anyone ever complaining about the current behaviour before. If the
behaviour bothers you, use a user-defined function which provides the

It does not bother *me* - I can happily live with the knowledge that SQLite
CURRENT_*/*('now') is broken by design and should never be used. It should
bother people that use sqlite for something serious.
Rhetorical question: if sqlite3's behavior were tightened up, would 
anybody complain? Is there any possible use case where replacing the 
current random-ish behavior with something consistent would change an 
application? It seems like the requested behavior happens by accident often 
enough that no current application could rely on its failure to appear.
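(If you want to see which behavior a given build has, here's a sketch:
pause() is just Python's time.sleep registered as a UDF. On builds with the
sub-expression behavior complained about above, the two timestamps can
differ; on builds that evaluate 'now' once per statement, they always match.)

import sqlite3, time

conn = sqlite3.connect(':memory:')
conn.create_function('pause', 1, time.sleep)  # returns NULL; we only want the delay

row = conn.execute("SELECT strftime('%f','now'), pause(1.5), "
                   "strftime('%f','now')").fetchone()
print(row[0], row[2])  # different values => 'now' was evaluated per sub-expression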


$0.02
Ryan



Re: [sqlite] racing with date('now') (was: Select with dates)

2013-09-16 Thread Ryan Johnson

On 15/09/2013 3:36 AM, Petite Abeille wrote:

On Sep 15, 2013, at 12:53 AM, Kees Nuyt  wrote:


3) If an SQL-statement generally contains more than one reference
   to one or more <datetime value function>s, then all such
   references are effectively evaluated simultaneously.

FWIW, Oracle concurs:

"All of the datetime functions that return current system datetime information, such 
as SYSDATE, SYSTIMESTAMP, CURRENT_TIMESTAMP, and so forth, are evaluated once for each 
SQL statement, regardless how many times they are referenced in that statement."
I'm pretty sure that anything weaker breaks [the illusion of] 
serializability, by giving users a way to see that their transactions 
did not execute in the order they appeared to:


A: select CURRENT_TIMESTAMP; update foo set v=1 where k=0; select * from foo;
B: update foo set v=2 where k=0; select * from foo; select CURRENT_TIMESTAMP;


Whether that matters in practice, I wouldn't know...

Ryan



Re: [sqlite] Hints for the query planner

2013-09-12 Thread Ryan Johnson

On 12/09/2013 7:12 PM, Simon Slavin wrote:

On 12 Sep 2013, at 11:35pm, Roger Binns  wrote:


On 12/09/13 05:03, Gabor Grothendieck wrote:

Perhaps indexing the expression in question would be an alternative
that would keep the performance info separate from the select.

I'd rather just do 'ANALYZE query' and have SQLite go off and do whatever
it deems necessary to gather to make the query perform well in the future.

I think I prefer Roger's idea to that of messing with a vital expression.  
Rather than modify a command you need to work, introduce a new command which, 
even if it entirely fails, doesn't interfere with the operation of the SELECT.  
If the ANALYZE fails the SELECT (or UPDATE, or whatever) will still work, 
albeit more slowly.  And this causes fewer problems for users who have to write 
code which works with many SQL engines.

But because ANALYZE already means something in SQLite, perhaps it might be 
better to introduce a new command
Would it really be so confusing to extend ANALYZE that way? It reads so 
naturally, I bet the most likely source of list traffic would be people 
who tried to use it in older versions of sqlite3 and were surprised it's 
not there...



STORE LIKELIHOOD test, probability

or maybe

REMEMBER LIKELIHOOD OF test AS probability

which will store the fact that such-and-such a test has a certain probability as a new 
row in a table somewhere.  Could be a new row in sqlite_stat3 (or sqlite_stat4), or could 
be in another sqlite_ table.  Omitting the second parameter tells SQLite to do the 
evaluation itself (like ANALYZE does) and store the result.  Curious users could dump the 
table just like people sometimes do "SELECT * FROM sqlite_stat3".
I think it's pretty important to examine predicates in the context of 
specific queries (and to allow the same predicate to appear in any number 
of such queries). The predicate "c.name like '%bach%'" is going to 
behave quite differently in these three queries, for example:


-- Vanilla predicate: Bach isn't a very common name
select c.name from composers c where c.name like '%bach%';

-- Join cardinality: Bach was a *very* prolific composer whose output 
likely dwarfs the (surviving) output of his contemporaries
select p.title, c.name, p.year from composers c join pieces p on p.c_id 
= c.id where c.name like '%bach%' and p.year between 1700 and 1750;


-- Correlated columns: Very few Brandenburg anythings were written by 
composers other than J.S. Bach
select c.name, p.title from composers c join pieces p on p.c_id = c.id 
where c.name like '%bach%' and p.title like '%brandenburg%';


(110% agree that any new information that changes query plans needs to 
be in a stats table somewhere. It's a huge aid to performance debugging 
when you can repro a problematic query plan using only the schema, 
query, and a dump of the various stats tables.)


Ryan



Re: [sqlite] How to interrupt a long running update without roll back?

2013-06-22 Thread Ryan Johnson

On 21/06/2013 8:41 PM, Keith Medcalf wrote:

No, in my case user does not touch the DB - he can only add or delete
items,
but all DB modifications are done by my app and I have a full control over
all the values. All I need is to find the most efficient way to do that.

Therefore, I do not believe that I need any triggers. I guess they will
simply slow down insertion and update (however I did not measure this
yet).
In those rare cases when there is a need to reset attributes on an already
existing item, I would rather delete the entire row and insert a new one
and
treat it as a completely new item.

Perhaps you should reconsider using triggers.  For example, you can create a 
trigger that adds the rowid of rows inserted (ie, that you need to update) into 
another table.  Periodically from your application process the rows indicated 
in this table to do your updates, and delete the rowids for the rows you have 
updated.  This will add almost no discernible overhead to your table updates, 
plus it will give you a table with the rowids of the rows you need to visit and 
update.  If you wish to do them in small batches then you can do so.
I considered that as well, but it's not clear how much benefit you get 
over the autoincrement scheme: the PK-index is there either way, so 
that's not a slowdown.  The split table approach also makes 
query-writing and indexing more complex, so at a minimum you'd probably 
want to make a view that runs a UNION ALL on the two tables.
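(For reference, a minimal sketch of the queue-table scheme Keith describes;
all table and column names here are invented, and 'computed' stands in for
the real attribute derivation:)

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE item(id INTEGER PRIMARY KEY, url TEXT, attrs TEXT);
    CREATE TABLE pending(id INTEGER PRIMARY KEY);  -- rowids awaiting attributes

    -- log every freshly inserted rowid for the background pass to consume
    CREATE TRIGGER item_queue AFTER INSERT ON item BEGIN
        INSERT INTO pending VALUES (new.id);
    END;
""")

def process_batch(conn, limit=1000):
    # periodic background pass: fill in attributes for a small batch,
    # then drop those rowids from the queue
    batch = conn.execute('SELECT id FROM pending LIMIT ?', (limit,)).fetchall()
    for (rowid,) in batch:
        conn.execute("UPDATE item SET attrs = 'computed' WHERE id = ?", (rowid,))
        conn.execute('DELETE FROM pending WHERE id = ?', (rowid,))
    conn.commit()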


Ryan



Re: [sqlite] How to interrupt a long running update without roll back?

2013-06-21 Thread Ryan Johnson

On 21/06/2013 2:29 PM, jhnlmn wrote:

Ryan Johnson <ryan.johnson@...> writes:
Q1: Is C1 *always* NULL in a newly-inserted row, or does the application
sometimes insert some arbitrary value?
Q2: Does the transition from NULL to calculation(C2) mean something
special to the application?

...

Scenario 3: C1=NULL has a specific meaning in your application

C1=NULL is just a flag, whose only meaning is that the update was not yet
done on the recently inserted row.

If in Windows Explorer you enter a folder with few thousand music files,
then Explorer will show the list of files in the left column of detailed
view instantaneously, but then it will begin slowly fill up other columns
(Title, Artist, Album).
My application is not an Explorer, but it has a similar pattern - show the
list of item names ASAP and then prepare and store item attributes in the
background. This is a pretty common pattern, I guess. So, in my minimal
example C2 is the item url and C1 is item attributes (in reality there will
be several columns with attributes: C11, C12, etc., but C1=NULL means that
attributes were not prepared yet and C1!=NULL means that attributes are ready).


Scenario 2: C1=calculation(C2) is the default to use when the user fails

to supply C1;

No, in my case user does not touch the DB - he can only add or delete items,
but all DB modifications are done by my app and I have a full control over
all the values. All I need is to find the most efficient way to do that.

Therefore, I do not believe that I need any triggers. I guess they will
simply slow down insertion and update (however I did not measure this yet).
In those rare cases when there is a need to reset attributes on an already
existing item, I would rather delete the entire row and insert a new one and
treat it as a completely new item.
So if I understand correctly, you start out with only C2, use C2 to 
drive an expensive/slow computation that derives all other fields in the 
row, and set C1 != NULL when the derivation completes? And you don't 
want a trigger because that would pin the slow computation on the 
inserting (application) thread rather than the batch-updating 
(background) thread?


In that case, your AUTOINCREMENT approach is probably the best you can do...

Ryan



Re: [sqlite] How to interrupt a long running update without roll back?

2013-06-21 Thread Ryan Johnson

On 21/06/2013 2:55 AM, jhnlmn wrote:

Ryan Johnson <ryan.johnson@...> writes:
histo = list(conn.execute('select min(rowid) lo, max(rowid) hi, count(*)
n from T group by rowid/10000 order by lo'))

...

 a,b,n = buckets[-1]


Thank you for your reply.
Your code is not quite correct ("a" always remains 0),
but I got your idea (and tested it).
Bucket creation adds about 50% to the time of update, which is acceptable.

By the way, I also posted my question at
http://stackoverflow.com/questions/17099491/sqlite3-how-to-interrupt-a-long-running-update-without-roll-back

So, if you care, you may wish to post your "bucket" solution there as well,
in case somebody needs it.


Why not use a trigger on T to update C1 whenever it gets set to NULL?

My case is actually simpler: C1 is NULL only on newly inserted rows
(I should have said this from the beginning).

So, I just realized that if I enable the AUTOINCREMENT feature,
then I do not need to process the entire rowid range,
but only the range of rows added after the last update.
This range should not have many holes, and I can process it
10,000 rowids at a time.
This will work until rowid reaches the largest possible integer.
This sounds like a good time to re-examine what end goal you're really 
after here (rather than how to implement this particular way of 
achieving that goal). Some questions worth asking yourself might include:


Q1: Is C1 *always* NULL in a newly-inserted row, or does the application 
sometimes insert some arbitrary value?


Q2: Does the transition from NULL to calculation(C2) mean something 
special to the application?


Scenario 1: C1=calculation(C2) is a constraint.
Solution: mediate everything through a view:

create table T_data(..., C2, ...); -- C1 is conspicuously absent here
create view T as select *, calculation(C2) C1 from T_data;
create trigger T_update instead of update on T begin
  update T_data set ..., C2=new.C2, ... where rowid = new.rowid;
end; -- need a similar "instead of insert" trigger as well

Scenario 2: C1=calculation(C2) is the default to use when the user fails 
to supply C1; the initial C1=NULL is a workaround for the inability to 
express "DEFAULT calculation(C2)". Alternatively, C1=calculation(C2) is a 
constraint, but calculation() is expensive enough that we don't want to 
recompute it.

Solution: use an after-insert trigger:

create trigger tc1 after insert on T when new.c1 is NULL begin update T 
set c1=calculation(c2) where rowid=new.rowid; end;


Scenario 3: C1=NULL has a specific meaning in your application, and it 
matters when C1 transitions from NULL to calculation(C2).
Solution: your AUTOINCREMENT + batch update is probably the best choice. 
The timing of the batch update almost certainly should depend on 
something other than the number of rows inserted so far; otherwise a 
trigger could run the batch whenever a large enough rowid is created, 
which would imply that a normal after-insert trigger would work. I 
strongly suspect this is either *not* your actual scenario, or you have 
a lurking bug: if a NULL C1 really matters, and it's important for the 
transition away from NULL not to happen right away, that suggests 
something in the system must "promote" "old-enough" NULL C1 to non-NULL 
status. However, that batch, whatever triggers it, could catch a 
just-barely-inserted NULL C1 and change it to calculation(C2) before any 
reader notices it was even there. If that is OK, then there's a very 
good chance that you actually have Scenario 2.


Thoughts?
Ryan




Re: [sqlite] How to interrupt a long running update without roll back?

2013-06-19 Thread Ryan Johnson

On 19/06/2013 1:41 AM, jhnlmn wrote:

Thank you for your response

Simon Slavin  writes:

UPDATE T SET C1 = calculation(C2) WHERE C1 IS NULL AND rowid < 10000

This is the best solution when the table is freshly created
and max(rowid) == number of rows.
But after many deletes and inserts, many rowids will be unused,
max(rowid) may grow indefinitely, and the number of required updates will
also be indefinite.

So, no perfect solution so far.
Why not use a trigger on T to update C1 whenever it gets set to NULL? Is 
it actually important for the NULL to persist until your next batch 
update comes along?


If a trigger would work, that's by far the cleanest solution, and will 
have almost no impact on concurrency. Otherwise...


Assuming you want to process between 10k and 20k rows at a time, you 
might try something like this (writing in Python; you should be able to 
translate it easily to the language you actually use):


import sqlite3

conn = sqlite3.connect('my-database.db')
histo = list(conn.execute('select min(rowid) lo, max(rowid) hi, count(*) n '
                          'from T group by rowid/10000 order by lo'))

buckets, i = [(0, 0, 0)], 0
while i < len(histo):
    a, b, n = buckets[-1]
    c, d, m = histo[i]
    buckets[-1] = (a, d, n + m)    # grow the current bucket
    i += 1
    if n + m >= 10000:             # big enough; start a new bucket
        buckets.append((0, 0, 0))

for lo, hi, n in buckets:
    conn.execute('update T set C1=calculation(C2) where C1 is NULL and '
                 'rowid between ? and ?', (lo, hi))


Translated into plain English: count the number of rows in each slice of 
10k, being sure to return slices in order. That query will be fast 
because it should read from your rowid index and has small output size 
(even a billion-row input will only produce 100k rows). Merge too-small 
slices so that each contains somewhere between 10k and 20k-1 rows, then 
run your update query, passing the lower and upper bound of each slice 
to limit where it looks. Again, the index on rowid will enforce the 
range limit without a table scan (but double-check the output of 
"explain query plan").


NOTE: by splitting the transaction, you risk the database changing out 
from under you before the last slice is done. Somebody *could* delete a 
whole chunk of the rowid space, for example, and throw off your 
carefully computed slice sizes. Or they could add rows after you make 
the slices, and those rows would be ignored.  Or they could set C1=NULL 
on rows you already looked at. Most likely, you can squint and claim 
that all those kinds of things just happened "after" the batch update, 
but whether that's allowed is application dependent.


Ryan



Re: [sqlite] select count(*) does not use primary key??

2013-06-07 Thread Ryan Johnson

On 06/06/2013 10:52 AM, Gabriel Corneanu wrote:

In my opinion, count(*) is the same as count(rowid) (I see that even
count() is accepted); I could say it's even the same as count(x) (any other
field).
Not quite... count(x) only counts rows having non-NULL x. Granted, 
that's not a problem for rowid/pk (which are not allowed to be NULL), 
but it matters a lot in the general case.
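A quick way to see the difference (sketch):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("CREATE TABLE t(x); INSERT INTO t VALUES (1), (NULL), (3);")
print(conn.execute('SELECT count(*), count(x) FROM t').fetchone())
# -> (3, 2): count(*) counts rows, count(x) skips the NULL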


(but that doesn't explain the problem you're seeing)

Ryan



Re: [sqlite] Database Corrupted!

2013-05-28 Thread Ryan Johnson

On 27/05/2013 9:40 PM, Woody Wu wrote:

On Mon, May 27, 2013 at 04:31:25PM +0100, Simon Slavin wrote:

On 27 May 2013, at 4:22pm, Woody Wu  wrote:


If Yaffs2 is the cause, how can I write an effective test to expose it?

Do you have an opportunity to format the same drive in a different format ?  
I'm not telling you to change your long-term practises, just to try a different 
format for testing.  Reformat it, run your identical testing program, and see 
if it fails in the same way.

Also I liked Richard's point: are you testing on a cheap Flash drive ?  Perhaps 
one intended only for use with FAT and therefore only tested for use with FAT ?

Can you experts explain why a cheap Flash drive can harm an
application such as sqlite? Do you mean these cheap drives cheat
with ECC? Otherwise, the filesystem should be able to catch the I/O error
(fix it or forward it), right?
Cheap (aka counterfeit) flash drives are notorious for advertising more 
space to the OS than they actually have, and so at some point writes 
start to silently erase data that was written earlier.


Ryan



Re: [sqlite] Problem with index on multiple columns

2013-05-17 Thread Ryan Johnson

On 17/05/2013 11:06 AM, Richard Hipp wrote:

On Fri, May 17, 2013 at 11:02 AM, GB  wrote:


Richard Hipp schrieb am 17.05.2013 16:37:

  Collating orders and affinities might be disqualifying the constraint on

"id" from being used with the index.


It just came to my mind that "id" is an INTEGER PRIMARY KEY column and as
such is part of every index anyway. Could it be that the additional "id"
part of that index is silently ignored in this case?


Yes.  If "id" is the INTEGER PRIMARY KEY then that messes up everything.
Don't do that.  Change the index to omit the "id" and you'll get better
results.

Furthermore, the "id" is unique so if there is a constraint on the "id" the
query planner will always use that constraint to look up the rows in the
table directly, rather than going through an index, since doing so will be
about twice as fast as using an index.
Could you explain that last bit? I always thought UNIQUE was implemented 
under the hood with a regular index. How would simply knowing something 
is unique make it so much easier to find the needle in the haystack 
without an index?


Do normal indexes map index keys to primary keys, so you still have to 
drill the PK index afterward? (BerkeleyDB does that, but it does it for 
all indexes, not just user-created ones).


Thanks,
Ryan



Re: [sqlite] Query optimizer bug?

2013-04-30 Thread Ryan Johnson

On 30/04/2013 5:20 PM, Nico Williams wrote:

On Tue, Apr 30, 2013 at 11:59 AM, Richard Hipp  wrote:

http://www.sqlite.org/draft/queryplanner-ng.html

That's quite interesting.

Should the user have a way to influence the query planner?  Perhaps by
indicating a cost for each table source?  SQL is supposed to let the
RDBMS pick the best query plan, but the RDBMS can't always pick the
best plan... in such cases users like to have a way to dictate the
plan.  And/or the RDBMS could keep state about different plans tried
in the past and learn to pick better plans, but this sounds very
complicated.
My experience with cross joins and TPCH Q8 suggests that in cases like 
this the user's not necessarily going to pick a good join ordering, 
either. At least I had bad luck with it. Being able to force certain 
access methods (use *this* index, not that one) would be helpful, though 
(does "+" do that or just suggest it?).


Ryan



Re: [sqlite] Query optimizer bug?

2013-04-30 Thread Ryan Johnson

On 30/04/2013 12:59 PM, Richard Hipp wrote:

On Thu, Mar 14, 2013 at 2:07 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:


Hi all,

I'm running sqlite-3.7.13 on cygwin. Playing around with various TPC-H
queries with my class recently, I hit a strangely slow query and don't
understand why it's so slow.


http://www.sqlite.org/draft/queryplanner-ng.html
Nice. If you're willing to do quadratic work anyway, would it make sense 
to try to establish upper/lower bounds on the optimal solution to 
decide how much harder to work? I believe it's possible to find lower 
bounds for TSP using minimum spanning trees, and computing the MST 
should be log-linear in the number of joins for most queries. If the 
"easy" NN=1 solution isn't too much higher than the lowest of a random 
selection of upper bounds, it's probably safe to stop looking; if the 
best upper bound is a *lot* lower (like the NN=1 result for Q8 being 
14,000 times slower than the optimal), it probably justifies investing 
more time in an NNN search for that particular query, in hopes of 
bringing down the predicted runtime. The expected time difference could 
even be used to set the NNN factor...


Thoughts?
Ryan



Re: [sqlite] SQLite 3.7.17 preview - 2x faster?

2013-04-04 Thread Ryan Johnson

On 04/04/2013 8:02 AM, Richard Hipp wrote:

By making use of memory-mapped I/O, the current trunk of SQLite (which will
eventually become version 3.7.17 after much more refinement and testing)
can be as much as twice as fast, on some platforms and under some
workloads.

Nice!

Some quick thoughts:

1. Does this replace the page cache completely, or does it just turn 
"read" and "write" into glorified memcpy calls? I would assume the 
latter so that virtual tables continue to work?


2. Does sqlite3 attempt to map the entire database file, and what 
happens with large files in 32-bit processes?


3. It seems like this would increase the "attack surface" for stray 
pointers in the host program. Granted, writes to stray pointers are not 
sqlite's fault, but they're an unfortunately common problem... and mmap 
makes user bugs more likely to directly corrupt the database on disk. 
Perceived reliability might drop as a result (I'm not arguing that the 
risk is worth giving up 2x, just pointing it out as a potential 
unintended consequence).


Thoughts?
Ryan


Re: [sqlite] Timeline for full ALTER TABLE support?

2013-03-27 Thread Ryan Johnson

On 27/03/2013 12:14 PM, Tim Gustafson wrote:

Clemens' analysis of the likelihood of seeing ALTER TABLE anytime soon is
correct.

Might I suggest that the "omitted" page then be updated to
unambiguously state that there is no plan to even implement the
missing features, so that people aren't left wondering?  Thanks!
It really shouldn't be that surprising; this is pretty standard for 
open-source software. Somebody has to do it, which means either writing 
the code yourself, waiting for somebody else to get around to writing it, 
or putting up $$$ so somebody else spends their time writing it for you.


Ryan



Re: [sqlite] How to install sqlite 3.7.11

2013-03-21 Thread Ryan Johnson

On 21/03/2013 11:47 AM, Simon Slavin wrote:

On 21 Mar 2013, at 4:43am, Pratik Patodi  wrote:


I want to install sqlite 3.7.11 on my Ubuntu 10.04.
I got the source code but not the makefile/config file.

From where can I download the setup, and how do I install it?

SQLite is not something you install on your computer, where it is then used 
by lots of programs.  Instead, the person writing each program which needs it 
includes it as part of their source code.
Perhaps OP conflated the stand-alone sqlite CLI shell with sqlite 
itself? Installing the CLI should be as simple as:


1. Download the source amalgamation
2. gcc -O2 sqlite3.c shell.c -lpthread -ldl -o sqlite3
3. Copy the `sqlite3' binary to wherever it will be most convenient to use

Ryan



Re: [sqlite] Query optimizer bug?

2013-03-14 Thread Ryan Johnson

On 14/03/2013 3:09 PM, Richard Hipp wrote:

Nitpick:  A "bug" means it gets the wrong answer, which is not the case
here.  What you are reporting here is not a bug but an optimization
opportunity.

Oops... you're right. Sorry about that.


On Thu, Mar 14, 2013 at 2:07 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:


The offending query (a slightly modified version of Q7):

select count(*)
from supplier, lineitem, orders, customer, nation n1, nation n2
where s_suppkey = l_suppkey and o_orderkey = l_orderkey
   and c_custkey = o_custkey and s_nationkey = n1.n_nationkey
   and c_nationkey = n2.n_nationkey and (
   (n1.n_name = 'ALGERIA' and n2.n_name = 'EGYPT')
   or (n1.n_name = 'EGYPT' and n2.n_name = 'ALGERIA')
   ) and l_shipdate between  '1995-01-01' and  '1996-12-31';

The optimizer chooses this plan:

0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
0|2|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|3|3|SEARCH TABLE customer USING INDEX cnki (C_NATIONKEY=?) (~600 rows)
0|4|2|SEARCH TABLE orders USING INDEX ocki (O_CUSTKEY=?) (~15 rows)
0|5|1|SEARCH TABLE lineitem USING INDEX lpki (L_ORDERKEY=?) (~2 rows)

Dropping index nni and disabling automatic indexing improves things a bit
(3.3 s, 50% speedup):

0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|2|1|SEARCH TABLE lineitem USING INDEX lski (L_SUPPKEY=?) (~300 rows)
0|3|2|SEARCH TABLE orders USING INDEX opki (O_ORDERKEY=?) (~1 rows)
0|4|3|SEARCH TABLE customer USING INDEX cpki (C_CUSTKEY=?) (~1 rows)
0|5|5|SEARCH TABLE nation AS n2 USING INDEX npki (N_NATIONKEY=?) (~1 rows)


SQLite version 3.7.16 chooses the second plan regardless.  So that much has
already been addressed.

Great! I'll give it a try.




Presumably it's slow because predicates don't really hit until after all
the joins finish.


Sort of.  SQLite does evaluate predicates as soon as it can.  It doesn't
wait until after the inner join finishes to evaluate the predicates.  As
soon as all information needed for a predicate is available, it is
evaluated.

The problem is that SQLite does not do a lot of algebraic manipulation of
predicates to try to factor out terms that can be evaluated early.  So the
predicate:

   (n1.n_name='ALGERIA' and n2.n_name='EGYPT')
   OR (n1.n_name='EGYPT' and n2.n_name='ALGERIA')

is treated as a unit and cannot be evaluated until both n1 and n2 are
available.

That's what I figured; part of the "lite" in sqlite.


  If SQLite were to be enhanced to deal with this case, what it
would need to do is factor this into three separate conjuncts, as follows:

   (n1.n_name='ALGERIA' AND n2.n_name='EGYPT')
   OR (n1.n_name='EGYPT' and n2.n_name='ALGERIA')
AND
   (n1.n_name='ALGERIA' OR n1.n_name='EGYPT')
AND
   (n2.n_name='EGYPT' OR n2.n_name='ALGERIA')

The second two are entirely redundant in the sense that if the first is
true then the second two are also true.  (SQLite has a method of marking
them so and making sure they are not evaluated if the first is evaluated.
Similar auxiliary conjuncts are used to help optimize LIKE, GLOB, and
BETWEEN operators)  But the second two conjuncts also depend on just a
single table, so they have the option of being evaluated early whereas the
first must wait until both tables have been evaluated.

If you augment the WHERE clause of your query by adding "AND
(n1.n_name='ALGERIA' OR n1.n_name='EGYPT')" you get the observed speedup.
I guess that feature got added after 3.7.13; it has no impact on 3.7.13, 
with or without the n_name index present, but a home-brew 3.7.16 beta 
build handles it fine. Actually, I get 0.21 s response time for even the 
original, unmodified query using 3.7.16.


None of this explains what all those index probes on nation.n_name were 
doing if they weren't applying predicates, but I guess it doesn't matter 
since the problem has been fixed in later versions.


Thanks,
Ryan





[sqlite] Query optimizer bug?

2013-03-14 Thread Ryan Johnson

Hi all,

I'm running sqlite-3.7.13 on cygwin. Playing around with various TPC-H 
queries with my class recently, I hit a strangely slow query and don't 
understand why it's so slow.


The schema and dataset generator are available at tpc.org, and the end of 
this message has instructions to replicate my setup quickly.


Short version for the impatient: when running a particular query with 
selective predicates on the first of many tables to be joined, those 
predicates aren't applied until after the last join... even though the 
query uses an index on the predicate column to access the offending 
table. I would have expected the index probe to have the effect of 
pushing down the predicate, but pushing down predicates manually makes 
the query run ~20x faster.


Long version follows...

The offending query (a slightly modified version of Q7):

select count(*)
from supplier, lineitem, orders, customer, nation n1, nation n2
where s_suppkey = l_suppkey and o_orderkey = l_orderkey
  and c_custkey = o_custkey and s_nationkey = n1.n_nationkey
  and c_nationkey = n2.n_nationkey and (
  (n1.n_name = 'ALGERIA' and n2.n_name = 'EGYPT')
  or (n1.n_name = 'EGYPT' and n2.n_name = 'ALGERIA')
  ) and l_shipdate between  '1995-01-01' and  '1996-12-31';

The query counts 561 rows and takes 4.9 seconds to execute with a warm 
page cache (pragma cache_size = -256000).


Note that this query is a pain to optimize: the intuitive dataflow takes 
a v-shape,  with lineitem at the point and the two nations at the ends 
(nation -> customer -> orders -> lineitem <- supplier <- nation); the 
predicates are highly selective but involve nation (both ends of the 
chain) and lineitem (in the middle). The optimizer chooses this plan:


0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
0|1|5|SEARCH TABLE nation AS n2 USING INDEX nni (N_NAME=?) (~1 rows)
0|2|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|3|3|SEARCH TABLE customer USING INDEX cnki (C_NATIONKEY=?) (~600 rows)
0|4|2|SEARCH TABLE orders USING INDEX ocki (O_CUSTKEY=?) (~15 rows)
0|5|1|SEARCH TABLE lineitem USING INDEX lpki (L_ORDERKEY=?) (~2 rows)

Dropping index nni and disabling automatic indexing improves things a 
bit (3.3 s, 50% speedup):


0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|2|1|SEARCH TABLE lineitem USING INDEX lski (L_SUPPKEY=?) (~300 rows)
0|3|2|SEARCH TABLE orders USING INDEX opki (O_ORDERKEY=?) (~1 rows)
0|4|3|SEARCH TABLE customer USING INDEX cpki (C_CUSTKEY=?) (~1 rows)
0|5|5|SEARCH TABLE nation AS n2 USING INDEX npki (N_NATIONKEY=?) (~1 rows)

Removing predicates on n1 yields ~15k rows in 3.4s, with the following plan:
sqlite> explain query plan select count(*)
from supplier, lineitem, orders, customer, nation n1, nation n2
where s_suppkey = l_suppkey and o_orderkey = l_orderkey
  and c_custkey = o_custkey and s_nationkey = n1.n_nationkey
  and c_nationkey = n2.n_nationkey and (
  (n2.n_name = 'ALGERIA') or (n2.n_name = 'EGYPT')
  ) and l_shipdate between  '1995-01-01' and  '1996-12-31';

0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|2|1|SEARCH TABLE lineitem USING INDEX lski (L_SUPPKEY=?) (~300 rows)
0|3|2|SEARCH TABLE orders USING INDEX opki (O_ORDERKEY=?) (~1 rows)
0|4|3|SEARCH TABLE customer USING INDEX cpki (C_CUSTKEY=?) (~1 rows)
0|5|5|SEARCH TABLE nation AS n2 USING INDEX npki (N_NATIONKEY=?) (~1 rows)

Presumably it's slow because predicates don't really hit until after all 
the joins finish. Removing predicates on n2 uses almost the same plan, 
and yields nearly the same row count (~14k rows), but executes in 0.25 s 
because the predicate applies before the first join:


sqlite> explain query plan select count(*)
from supplier, lineitem, orders, customer, nation n1, nation n2
where s_suppkey = l_suppkey and o_orderkey = l_orderkey
  and c_custkey = o_custkey and s_nationkey = n1.n_nationkey
  and c_nationkey = n2.n_nationkey and (
  (n1.n_name = 'ALGERIA' ) or (n1.n_name = 'EGYPT')
  ) and l_shipdate between  '1995-01-01' and  '1996-12-31';

0|0|4|SCAN TABLE nation AS n1 (~25 rows)
0|1|0|SEARCH TABLE supplier USING INDEX snki (S_NATIONKEY=?) (~40 rows)
0|2|1|SEARCH TABLE lineitem USING INDEX lski (L_SUPPKEY=?) (~300 rows)
0|3|2|SEARCH TABLE orders USING INDEX opki (O_ORDERKEY=?) (~1 rows)
0|4|3|SEARCH TABLE customer USING INDEX cpki (C_CUSTKEY=?) (~1 rows)
0|5|5|SEARCH TABLE nation AS n2 USING COVERING INDEX npki 
(N_NATIONKEY=?) (~1 rows)


Working from that observation, I changed the query to manually push down 
predicates, which reduces the runtime to 0.25 s (~20x speedup):


sqlite> explain query plan select count(*)
from supplier, lineitem, orders, customer, (
select * from nation where n_name = 'ALGERIA' or n_name = 'EGYPT'
 ) n1, nation n2
where 

Re: [sqlite] per-context resources for user-defined function

2013-03-11 Thread Ryan Johnson

On 11/03/2013 7:47 AM, Clemens Ladisch wrote:

James K. Lowden wrote:

I'm not sure how to manage the lifetime of ancillary data for a
user-defined function added by sqlite3_create_function ().
[...]
Suppose xStep doesn't fail, but another query is executing
simultaneously, also using the median() UDF.  I need one data structure
per context.  I don't see support for that in the SQLite API.

http://www.sqlite.org/c3ref/aggregate_context.html

Just be careful, because that API doesn't let you resize the allocation...

Ryan



Re: [sqlite] Is there a way to create a foreign key in existing table?

2013-03-07 Thread Ryan Johnson

On 07/03/2013 5:59 PM, Igor Korot wrote:

Ryan,

On Thu, Mar 7, 2013 at 5:26 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

Well, you *do* want an index for the target side of the join, but for FK
joins the existing PK index already takes care of that.

OK,


At this point you only need to add an index on d.id to make the d.id=1 test
fast. And yes, it will make updates predicated on d.id faster as well.

Well, I just checked and the update is on both id and playerid.
Do I need the index on both fields?
Ah. That gets a bit more tricky. You can only use one index per table, 
and whether any index on the table is useful depends on the order of its 
key columns and what you do in the query.


If you always update based on equality, it doesn't matter whether you 
index (id,playerid) or (playerid,id). If one is equality and the other 
something else, put equality first (e.g. in your query you have id=1 and 
playerid=?, so the index should be on id,playerid).


If both are non-equality, put the index on which ever column you think 
will narrow down the results best, but at that point it's getting into 
black magic territory.
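A sketch of the equality-first rule (the table and column names are guesses
at the schema under discussion):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE scores(id INTEGER, playerid INTEGER, points INTEGER);
    -- equality column first: this index serves  id = ? AND playerid = ?
    -- as well as                                id = ? AND playerid > ?
    CREATE INDEX scores_id_player ON scores(id, playerid);
""")
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    UPDATE scores SET points = points + 1 WHERE id = 1 AND playerid = 5
""").fetchall()
print(plan)  # expect a SEARCH ... USING INDEX scores_id_player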


Ryan



Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 3:14 PM, Nico Williams wrote:

On Thu, Mar 7, 2013 at 12:53 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

Meanwhile, though, I'd be delighted if column affinity, cast(), implicit
conversions performed by arithmetic operations, check(), and triggers all
behaved the same way, with the current behavior of column affinity probably
the least surprising/troublesome. Right now those five operations have four
different behaviors to keep track of.

I think there's just two type conversion behaviors: implicit type
conversions and CAST.

2

CHECK applies to values w/o any conversions (I
consider this a good thing),

1

affinity applies at various times and is
really a property of values derived from their sources' type affinity
(this is odd, but really a consequence of SQLite3's dynamic typing
design).

2+ (probably 4+, but let's be conservative)

Triggers don't have any special role here, save that BEFORE triggers
get values after implicit type conversions.
1 (thank goodness BEFORE and AFTER at least match behavior, even if the 
underlying source of data differs slightly)

This one I tend to think
is a problem, because one cannot implement CHECK via equivalent
TRIGGERs, and the ability to add/modify table constraints by using
triggers instead is kinda nice given the inability to change
table/column constraints after a schema is set.


You just enumerated at least five different behaviors to worry about if 
you might get a string or real where you expected an int.


I rest my case.

Ryan



Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 1:48 PM, Nico Williams wrote:

On Thu, Mar 7, 2013 at 12:20 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 07/03/2013 1:07 PM, Nico Williams wrote:

You might defer checks, but not type conversions.  In any case, I see
no value in deferring check constraints.

Anything constraining cardinality. The old example of "there must always be
three doctors on duty in the ER" comes to mind: if you check() for an exact
count, it becomes very hard to make changes to the schedule without deferred
checking.

That can't be done with a CHECK expression.  I have proposed
transaction-level triggers for this sort of thing, and I have an
implementation lying around (but it's not finished, and quite rotted
by now).
Fair enough. It might be worth deferring a check until transaction end 
if it allows verifying a bulk update with an efficient index probe, but 
that would only be true for really expensive checks and really big bulk 
updates.





Anyways, think of CHECK constraints as equivalent to BEFORE
INSERT/UPDATE triggers.

They're not equivalent. The before trigger sees only the value that was
actually stored, because it runs as a separate program after the
insert/update completes.

WAT?

Yup. I've had several emails in this thread with examples. Here's the short
version:

BEFORE INSERT/UPDATE triggers may apply after type conversions, but
not after storing the new/updated row.  That's what my WAT was about
:)

Check out the vdbe generated by an insert on a table with a before
insert trigger.

Fair enough. I should have said:

They're not equivalent. The before trigger sees only the value that will 
actually be stored.


Ryan



Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 1:45 PM, Simon Slavin wrote:

On 7 Mar 2013, at 6:27pm, Ryan Johnson <ryan.john...@cs.utoronto.ca> wrote:


The problem is sqlite3 doesn't cast to REAL first. It just parses the string 
until it hits '.' (which isn't a valid part of an integer) and then returns 
whatever it had accumulated so far. That breaks in creative ways for values 
like:

cast('10e-1' as integer) -- 10
cast('abc' as integer) -- 0

Ah, thank you.  Good explanation.

I consider this behaviour wrong, too.  Casting a string as INTEGER should take 
into account what it means as a REAL.  For instance, as a 
programmer/statistician, what do you think the desired behaviour would be for

CAST('0.9' AS INTEGER)

I know what I want.  Perhaps this can be fixed in SQLite4.
I tend to agree that some intelligent rounding is in order (those darn 
epsilons in floating point!).


Meanwhile, though, I'd be delighted if column affinity, cast(), implicit 
conversions performed by arithmetic operations, check(), and triggers 
all behaved the same way, with the current behavior of column affinity 
probably the least surprising/troublesome. Right now those five 
operations have four different behaviors to keep track of.


And that's *before* you factor in what happens when you use several of 
those operations on the same column, e.g.:


sqlite> select x,typeof(x),0+x from t1;
3|integer|3
3.0|real|3.0
3.0|real|3.0
3|text|3
3.0|text|3.0
30e-1|text|3.0
sqlite> select x,0+x,typeof(x) from t1;
3|3|integer
3.0|3.0|real
3.0|3.0|real
3|3|integer
3.0|3.0|real
30e-1|3.0|real

Ryan



Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 1:15 PM, Simon Slavin wrote:

On 7 Mar 2013, at 4:07pm, Ryan Johnson <ryan.john...@cs.utoronto.ca> wrote:


That does leave the question of what to do with cast ('1.0' as integer), though. 
Without the prefix-based matching that would now return NULL rather than 1, even 
though cast(1.0 as integer) would still return 1. Then again, disallowing all 
floats might be better than the current practice of returning 1 from a cast of 
both '1e-10' and '1e10' (the real->integer casts do the right thing, as does 
assignment to a column with integer affinity).

Can you (or someone) explain the problem with


cast ('1.0' as integer)

?  Possibly by explaining what "prefix-based matching" is.  I see no problem 
with SQLite /internally/ first casting as a REAL and then casting that REAL as an 
INTEGER, resulting in the integer 1.  The real question is what you'd get if you tried
The problem is sqlite3 doesn't cast to REAL first. It just parses the 
string until it hits '.' (which isn't a valid part of an integer) and 
then returns whatever it had accumulated so far. That breaks in creative 
ways for values like:


cast('10e-1' as integer) -- 10
cast('abc' as integer) -- 0


cast ('1.1' as integer)

Can you acceptably cast a non-integer REAL as an INTEGER ?  Or should /that/ 
produce a NULL ?  Since what you're actually doing is depending on some 
mathematical function like round().
IMHO, casting string versions of reals to actual integers should do 
whatever happens when you cast an actual real to integer. Whether that 
means NULL, truncate, or round is debatable, but there's pretty strong 
precedent for truncation when casting real to int.


Ryan



Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 1:07 PM, Nico Williams wrote:

On Thu, Mar 7, 2013 at 11:44 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 07/03/2013 12:27 PM, Nico Williams wrote:

On Thu, Mar 7, 2013 at 11:12 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

I would argue that, if a column has type affinity, CHECK should work with
the value that would actually get stored, not the one that was assigned.

But then you couldn't check the value that was attempted to store.
You'd be stuck with dynamic type conversions in all cases.

Is that a bad thing? Forbidding dynamic type conversions to occur at all is
very different than requiring that they always succeed if/when they do
occur, and overly strict IMO.

I prefer strong static typing and no automatic type conversions, but
the way SQLite3 is now I can get all of:

  - duck typing (no constraints)
  - strong typing with automatic type conversions (see earlier posts in
this thread)
  - strong typing with no automatic type conversions (ditto)

That's fairly flexible.  There's something to be said for that.
True, but I'd happily give up static strong typing rather than deal with 
the current mess. YMMV.




Besides, the check is defined to verify the column, not on the value that
came from the user. In theory you should be able to defer all checks until
the end of a transaction (for all the same reasons deferred FK checking can
be important), and that would cause a behavior change as things currently
stand (the attempted-to-store value would be long gone by then).

You might defer checks, but not type conversions.  In any case, I see
no value in deferring check constraints.
Anything constraining cardinality. The old example of "there must always 
be three doctors on duty in the ER" comes to mind: if you check() for an 
exact count, it becomes very hard to make changes to the schedule 
without deferred checking.



Anyways, think of CHECK constraints as equivalent to BEFORE
INSERT/UPDATE triggers.

They're not equivalent. The before trigger sees only the value that was
actually stored, because it runs as a separate program after the
insert/update completes.

WAT?
Yup. I've had several emails in this thread with examples. Here's the 
short version:


sqlite> create table t1(x);
sqlite> create table t2(x integer);
sqlite> create table t3(x integer check(typeof(x)='integer'));
sqlite> create table t4(x integer);
sqlite> create trigger t4t before insert on t4 begin select raise(FAIL, 
'ints only!') where typeof(new.x)!='integer'; end;

sqlite> insert into t1 values(3);
sqlite> insert into t1 values(3.0);
sqlite> insert into t1 values(30e-1);
sqlite> insert into t1 values('3');
sqlite> insert into t1 values('3.0');
sqlite> insert into t1 values('30e-1');
sqlite> insert into t2 select * from t1;
sqlite> insert into t3 select * from t1;
Error: constraint failed
sqlite> insert into t4 select * from t1;

Ryan



Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 12:18 PM, Ryan Johnson wrote:

On 07/03/2013 11:14 AM, Doug Currie wrote:
On Mar 7, 2013, at 11:07 AM, Ryan Johnson 
<ryan.john...@cs.utoronto.ca> wrote:
That does leave the question of what to do with cast ('1.0' as 
integer), though. Without the prefix-based matching that would now 
return NULL rather than 1, even though cast(1.0 as integer) would 
still return 1. Then again, disallowing all floats might be better 
than the current practice of returning 1 from a cast of both '1e-10' 
and '1e10' (the real->integer casts do the right thing, as does 
assignment to a column with integer affinity).

Would

   cast(cast(x as real) as integer)

do what you want?
Looks like it, though I'd probably make the first cast be to numeric 
(may as well go directly to int if you can).


It's also a bit more efficient (2 VDBE ops fewer) and less brittle 
than the nasty hack I came up with:


x=cast(cast(x as numeric) as integer)
vs.
x+0=x and cast(x as integer)=x
Here's some more weirdness... typeof(cast('30e-1' as numeric)) returns 
'integer' but casting 30e-1 returns 'real'. Same for '3.0' vs 3.0. And, 
of course, casting 'abc' also yields an integer.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 12:27 PM, Nico Williams wrote:

On Thu, Mar 7, 2013 at 11:12 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

I would argue that, if a column has type affinity, CHECK should work with
the value that would actually get stored, not the one that was assigned.

But then you couldn't check the value that was attempted to store.
You'd be stuck with dynamic type conversions in all cases.
Is that a bad thing? Forbidding dynamic type conversions to occur at all 
is very different than requiring that they always succeed if/when they 
do occur, and overly strict IMO.


Besides, the check is defined to verify the column, not the value 
that came from the user. In theory you should be able to defer all 
checks until the end of a transaction (for all the same reasons deferred 
FK checking can be important), and that would cause a behavior change as 
things currently stand (the attempted-to-store value would be long gone 
by then).



Anyways, think of CHECK constraints as equivalent to BEFORE
INSERT/UPDATE triggers.
They're not equivalent. The before trigger sees only the value that was 
actually stored, because it runs as a separate program after the 
insert/update completes.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 11:14 AM, Doug Currie wrote:

On Mar 7, 2013, at 11:07 AM, Ryan Johnson <ryan.john...@cs.utoronto.ca> wrote:

That does leave the question of what to do with cast ('1.0' as integer), though. 
Without the prefix-based matching that would now return NULL rather than 1, even 
though cast(1.0 as integer) would still return 1. Then again, disallowing all 
floats might be better than the current practice of returning 1 from a cast of 
both '1e-10' and '1e10' (the real->integer casts do the right thing, as does 
assignment to a column with integer affinity).

Would

   cast(cast(x as real) as integer)

do what you want?
Looks like it, though I'd probably make the first cast be to numeric 
(may as well go directly to int if you can).


It's also a bit more efficient (2 VDBE ops fewer) and less brittle than 
the nasty hack I came up with:


x=cast(cast(x as numeric) as integer)
vs.
x+0=x and cast(x as integer)=x

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 06/03/2013 10:30 AM, Dominique Devienne wrote:

On Wed, Mar 6, 2013 at 3:29 PM, Ryan Johnson <ryan.john...@cs.utoronto.ca>
wrote:

Off topic, I'd love a way to request strong typing for a column (so that

attempts to store 'abc' into an int column would fail). You can emulate it
with a pair of before/update triggers (select raise(...) where
typeof(intcol)!='integer'), but that's clunky. Oh well... in retrospect,
most of the times I've been bitten by type mismatches were probably either
due to this bug or (more likely) due to my not specifying any affinity at
all and then being surprised when 1 != '1'.

You don't have to use triggers, you can use a check constraint instead
(simpler, but also perhaps faster as well?).

If you do, you lose some of the implicit type conversions SQLite does,
based on type affinity, so the "1" no longer gets converted to 1.

I also would prefer a strong-typing mode though (as an opt-in pragma for
example), rather than adding a bunch of check constraints, and built-in
strong-typing would likely be faster. But Dr. Hipp prefers dynamic typing,
so dynamic typing it is :).

--DD

sqlite> create table t (id number);
sqlite> insert into t values (1);
sqlite> insert into t values ("1");
sqlite> insert into t values ("1x");
sqlite> select id, typeof(id) from t;
1|integer
1|integer
1x|text

sqlite> create table tt (id number check (typeof(id) = 'integer'));
sqlite> insert into tt values (1);
sqlite> insert into tt values ("1");
Error: constraint failed
sqlite> insert into tt values ("1x");
Error: constraint failed
sqlite> select id, typeof(id) from tt;
1|integer

// recent SQLite versions also now report the name of the constraint that
failed, if available.
sqlite> create table ttt (id number, constraint id_is_integer check
(typeof(id) = 'integer'));
sqlite> insert into ttt values (1);
sqlite> insert into ttt values ("1");
Error: constraint id_is_integer failed
I would argue that, if a column has type affinity, CHECK should work 
with the value that would actually get stored, not the one that was 
assigned. Otherwise you get situations where the non-checked table ended 
up with an integer inside even though the checked table rejected the 
change. This doesn't usually show up because most check constraints on 
numeric fields use operators that try to convert strings to numbers 
(though the way they do it varies).
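
A condensed illustration (a sketch; the typescript below walks through many 
more cases):

sqlite> create table plain(x integer);
sqlite> create table checked(x integer check(typeof(x)='integer'));
sqlite> insert into plain values('3');    -- affinity converts; integer 3 stored
sqlite> insert into checked values('3');  -- check sees the text '3' and rejects
Error: constraint failed
sqlite> select x, typeof(x) from plain;
3|integer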


Looking at this some more, I'd say there are several inconsistent 
behaviors here. Most are strange type conversion behaviors that may or 
may not be intentional, but two definitely look like bugs:


1. Predicates can behave differently for WHERE vs CHECK (see table t3
   in the typescript below).
2. The type of x (as reported by typeof, but also visible in other
   ways) can change depending on how x has been used before (see query
   before table t4 in the typescript and the check constraint in table t8).

In any case, the following (see table t8) seems to enforce integer-ness 
properly: check(x+0=x and cast(x as integer)=x)


It exploits several behaviors:

1. The expression "x+0" causes all numeric strings to be converted to
   numbers (real or int)
2. Comparing the result against x discards garbage like '3bc' or 'abc'
   that would otherwise slip through.
3. The numeric type of "x+0" changes the type of x to numeric, so that
   values like '30e-1' typecast properly afterward.
4. Comparing the output of the cast against x discards reals values,
   leaving only ints.

You have to evaluate #1 before #3, though, or the whole thing falls 
apart... which is scary and brittle.


The VDBE changes from 13 to 20 instructions, with the following bit added:

4|Integer|1|2|0||00|
5|Integer|0|4|0||00|
6|Add|4|2|3||00|
7|Ne|2|11|3|collseq(BINARY)|64|
8|SCopy|2|3|0||00|
9|ToInt|3|0|0||00|
10|Eq|2|12|3|collseq(BINARY)|6b|
11|Halt|19|2|0||00|


Proper support for strong typing would be vastly more efficient; a 
single OP_MustBeInt should work beautifully for integers, and similar 
opcodes could be defined easily for the remaining types.


Meanwhile, the trigger is bulkier to code up and more expensive at 
runtime, but much less likely to break in subtle ways when some future 
release of sqlite3 starts reordering predicate expressions.


Ryan

--- begin typescript ---
sqlite> create table t1(x);
sqlite> insert into t1 values(3);
sqlite> insert into t1 values('3');
sqlite> insert into t1 values(3.0);
sqlite> insert into t1 values('3.0');
sqlite> insert into t1 values(30e-1);
sqlite> insert into t1 values('30e-1');
sqlite> insert into t1 values(3.3);
sqlite> insert into t1 values('3.3');
sqlite> insert into t1 values(3e-1);
sqlite> insert into t1 values('3e-1');
sqlite> insert into t1 values('3bc');
sqlite> insert into t1 values('abc');
sqlite> select x,typeof(x) from t1;
3|integer
3|text
3.0|real
3.0|text
3.0|real
30e-1|text

Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-07 Thread Ryan Johnson

On 07/03/2013 9:28 AM, Simon Slavin wrote:

On 7 Mar 2013, at 1:36pm, "Michael Black"  wrote:


New:
select cast('2' as integer);
2
select cast('2a' as integer);
0

Sorry, but that's very bad.  There is no way that the string '2a' could 
represent 0.  I agree that interpreting '2a' as the integer 2 may be considered 
wrong, but I think you're solving the problem the wrong way.

If you want to indicate a problem, I would like to suggest that you either 
return NULL or have sqlite_step() generate an actual error code.  Of the two I 
think returning NULL is more within the spirit of SQL but I haven't thought that 
through yet.


In a world where 1/0 returns NULL, it makes sense for an invalid 
typecast to do the same. For purposes of enforcing strong typing, a 
non-null integer constraint would be "check (cast(x as integer) is not 
null)" and a nullable integer constraint would just need to add "or x is 
null" -- a pretty clean solution IMO.


That does leave the question of what to do with cast ('1.0' as integer), 
though. Without the prefix-based matching that would now return NULL 
rather than 1, even though cast(1.0 as integer) would still return 1. 
Then again, disallowing all floats might be better than the current 
practice of returning 1 from a cast of both '1e-10' and '1e10' (the 
real->integer casts do the right thing, as does assignment to a column 
with integer affinity).


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to create a foreign key in existing table?

2013-03-07 Thread Ryan Johnson
Well, you *do* want an index for the target side of the join, but for FK 
joins the existing PK index already takes care of that.


At this point you only need to add an index on d.id to make the d.id=1 
test fast. And yes, it will make updates predicated on d.id faster as well.
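
Something along these lines (a sketch against the schema quoted below; run 
EXPLAIN QUERY PLAN afterward to confirm the optimizer picks it up):

create index d_id on d(id);
-- or, to cover the d.id filter and the playerid join column together:
create index d_id_playerid on d(id, playerid);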


Ryan

On 07/03/2013 8:04 AM, Stephen Chrzanowski wrote:

Try:

[ select distinct a.playerid, a.name, d.value, b.teamname, d.currvalue from a
join d on a.playerid=d.playerid join b on a.teamid=b.teamid where d.id=1 ]

Indexes should be created for WHERE clauses, not JOIN.

On Thu, Mar 7, 2013 at 3:59 AM, Igor Korot <ikoro...@gmail.com> wrote:


Hi, Ryan,

On Fri, Mar 1, 2013 at 12:48 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 01/03/2013 2:23 PM, Igor Korot wrote:

Hi, guys,

On Fri, Mar 1, 2013 at 8:48 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 01/03/2013 11:10 AM, Stephen Chrzanowski wrote:

***I'm waiting for the repair man to show up to fix my waterheater...
so...
I'm bored. This is going to be to the point at the beginning, but get
wordy
and technical near the end. ;)  Super over kill. ahem

Nice explanation... just a couple of nitpicks:

Very nice indeed.
Here is what I'm trying to do, with a little more detail.

I am using C API in a C++ based program.
What I'm trying to do is decrease the time the program displays the
main frame window.

Basically during construction I am calling SELECT ... WHERE
<parent>.PK = <child>.FK.
Problem is when <child> was created I didn't make the foreign key.

Check the output of `explain query plan' but I suspect the join already
uses an index on <parent>.PK (if that PK was really declared as such,
there will be an index on it). The foreign key (or lack thereof) has
nothing to do with this situation.

Depending on what conditions you have in the where clause, some
additional indexes could be helpful. For example, if you want "where
<table>.z between :X and :Y" (where X and Y are parameters passed in
from your code), then an index on <table>.z will speed things up
drastically because it will allow fetching only the range of records
that match the predicate. The improvement is even more pronounced for
equality predicates. As a completely bogus example:

create table foo(x integer primary key, y, z);
create table bar(a integer primary key, b, c, x integer);
explain query plan select * from foo join bar using(x) where z between 30
and 50 and c = 10;

0|0|1|SCAN TABLE bar (~10 rows)
0|1|0|SEARCH TABLE foo USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)


create index foo_z on foo(z);
create index bar_c on bar(c);
explain query plan select * from foo join bar using(x) where z between 30
and 50 and c = 10;

0|0|1|SEARCH TABLE bar USING INDEX bar_c (c=?) (~10 rows)
0|1|0|SEARCH TABLE foo USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)

Let me give an example so that you can better understand the problem:

create table a(playerid integer primary key, name varchar(70), teamid
integer, value double, foreign key(teamid) references b(teamid));
create table b(teamid integer primary key, teamname varchar(15));
create table c(id integer primary key, name varchar(10));
create table d(id integer, playerid integer, value integer, currvalue
double);

SELECT DISTINCT a.playerid, a.name, d.value, b.teamname, d.currvalue FROM
a, b, d WHERE d.playerid = a.playerid AND a.teamid = b.teamid AND d.id
= 1;

Without any indexes I am getting:

0|0|2|SCAN TABLE d (~10 rows)
0|1|0|SEARCH TABLE a USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|2|1|SEARCH TABLE b USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
0|0|0|USE TEMP B-TREE FOR DISTINCT

What I am thinking is to create an index on the d for the playerid +
id to speed things up.

Am I right? Any suggestions for improvements?

Now in the end of my program I will update the d table - value and
currvalue fields - for every playerid that
belongs to the id = 1.

Will this index help me there as well or I will lose performance?

Thank you.



The row estimates are worthless (it's an empty database and I never ran
ANALYZE) but you get the idea: without an index, it scans all of bar for
matching rows (= slow) and then joins it to foo using the latter's primary
key index. With the index, it probes bar_c for precisely the data needed
(= fast), then uses foo's primary key index to complete the join as before.
In general, you can use at most one index per table. So, for example, you
can't use both foo_z and the primary key index on foo, so it decides to use
the pk index because that's predicted to be more beneficial; statistics could
easily push that the other way, though (for example, if there's only one or
two rows with c=10). The pk index on bar, on the other hand, is useless for
this query because we don't care what bar.a is; using bar_c index is the
obvious choice there.

I'm afraid I didn't follow your explanation of the grid and update stuff,
but hopefully the above at least helps you make the query fast.

Re: [sqlite] SQLite strong-typing [WAS: inner vs. outer join inconsistency]

2013-03-06 Thread Ryan Johnson

On 06/03/2013 10:30 AM, Dominique Devienne wrote:

On Wed, Mar 6, 2013 at 3:29 PM, Ryan Johnson <ryan.john...@cs.utoronto.ca>
wrote:

Off topic, I'd love a way to request strong typing for a column (so that

attempts to store 'abc' into an int column would fail). You can emulate it
with a pair of before/update triggers (select raise(...) where
typeof(intcol)!='integer'), but that's clunky. Oh well... in retrospect,
most of the times I've been bitten by type mismatches were probably either
due to this bug or (more likely) due to my not specifying any affinity at
all and then being surprised when 1 != '1'.

You don't have to use triggers, you can use a check constraint instead
(simpler, but also perhaps faster as well?).

If you do, you lose some of the implicit type conversions SQLite does,
based on type affinity, so the "1" no longer gets converted to 1.
... which is why I prefer triggers. They kick in after the column's 
numeric affinity has a chance to convert "1" to 1.


Gives the best of both worlds, once you get past the boilerplate:

sqlite> create table t(x number);
sqlite> create trigger t1 before insert on t begin
   ...> select raise(FAIL, 'Numbers only!') where typeof(new.x) =='text';
   ...> end;
sqlite> insert into t values(1);
sqlite> insert into t values('1');
sqlite> insert into t values('a');
Error: Numbers only!
sqlite> select x,typeof(x) from t;
1|integer
1|integer
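
For updates you'd add the matching trigger (same pattern; a sketch):

sqlite> create trigger t1u before update on t begin
   ...> select raise(FAIL, 'Numbers only!') where typeof(new.x) =='text';
   ...> end;
sqlite> update t set x='a';
Error: Numbers only!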

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inner vs. outer join inconsistency

2013-03-06 Thread Ryan Johnson
I would agree that no warning is needed for columns that don't state 
any affinity, or for a non-affinity FK that refers to some PK with 
affinity.


I tend to agree with OP that an explicitly text foreign key referring to 
an explicitly int primary key is probably worth a warning (perhaps from 
the hypothetical "lint mode" that surfaces now and then), since it's 
pretty likely that a user who took the trouble to specify affinities for 
both PK and FK probably made a mistake if the types are different. Sure, 
some record might override affinity and store 'abc' as its "int" primary 
key, but even if your app relies on that behavior, an "int" foreign key 
would be harmless for the same reason.


Off topic, I'd love a way to request strong typing for a column (so that 
attempts to store 'abc' into an int column would fail). You can emulate 
it with a pair of before/update triggers (select raise(...) where 
typeof(intcol)!='integer'), but that's clunky. Oh well... in retrospect, 
most of the times I've been bitten by type mismatches were probably 
either due to this bug or (more likely) due to my not specifying any 
affinity at all and then being surprised when 1 != '1'.


Ryan

On 06/03/2013 7:23 AM, Stephen Chrzanowski wrote:

SQLite doesn't care what kind of data type you are using, so, no, it
shouldn't throw an error.  The logic of this database engine is that you
will always be comparing apples to apples, regardless if one happens to be
orange.

On Wed, Mar 6, 2013 at 4:50 AM, Tom Matrix  wrote:


Richard Hipp  writes:


On Sun, Mar 3, 2013 at 2:48 PM, Tom Matrix  wrote:


I’ve encountered a problem, which is hardly reproducible on arbitrary
databases, therefore I attached one.


A simple, reproducible test case for (what we think is) your problem can be
seen in this ticket:

  www.sqlite.org/src/tktview/fc7bd6358f59b

This bug has been latent in SQLite for almost four years and you are the
first to hit it.  Probably this is because not many applications contain
A=B in the WHERE clause where A is a text expression and B is an integer
expression.  You can probably work around the problem by changing your
schema so that entries.measurementid is an integer rather than text.  This
does not excuse SQLite:  It is still getting the wrong answer and needs to
be fixed.  We are working on a fix now.  But a simple change to your schema
will work around the problem and get you going even before that fix is
available.

Very good example, thank you! It really solved my problem. Nevertheless, I
think I don't have to mention that entries.measurementid must have been an
integer, and this bug must be fixed in our schema, too.

My only remaining concern is, however: Should not SQLite give an error (or
at least a warning) in cases where a foreign key constraint refers to a
different data type?

Thanks again,
Tamás

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] inner vs. outer join inconsistency

2013-03-06 Thread Ryan Johnson

On 06/03/2013 4:50 AM, Tom Matrix wrote:

Richard Hipp  writes:


On Sun, Mar 3, 2013 at 2:48 PM, Tom Matrix  wrote:


I’ve encountered a problem, which is hardly reproducible on arbitrary
databases, therefore I attached one.


A simple, reproducible test case for (what we think is) your problem can be
seen in this ticket:

  www.sqlite.org/src/tktview/fc7bd6358f59b

This bug has been latent in SQLite for almost four years and you are the
first to hit it.  Probably this is because not many applications contain
A=B in the WHERE clause where A is a text expression and B is an integer
expression.  You can probably work around the problem by changing your
schema so that entries.measurementid is an integer rather than text.  This
does not excuse SQLite:  It is still getting the wrong answer and needs to
be fixed.  We are working on a fix now.  But a simple change to your schema
will work around the problem and get you going even before that fix is
available.

Very good example, thank you! It really solved my problem. Nevertheless, I think
I don't have to mention that entries.measurementid must have been an integer,
and this bug must be fixed in our schema, too.

My only remaining concern is, however: Should not SQLite give an error (or at
least a warning) in cases where a foreign key constraint refers to a different
data type?
My understanding is that it shouldn't have mattered, because the text 
column should have been converted to int before performing the 
comparison. This works for simpler queries where the non-int column is 
only compared with an int column. Simplifying the test case from the 
ticket above:


sqlite3> select * from t,i where textid=intid;
12|12
34|34

However, when comparing the non-int column with both an int column and a 
non-int one, the bug is that the promotion no longer happens and the 
join fails:


sqlite3> select * from t t1,t t2, i where t1.textid=i.intid and 
t1.textid=t2.textid;

(nothing)

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Break on cumulative sum

2013-03-04 Thread Ryan Johnson

On 04/03/2013 2:20 PM, Petite Abeille wrote:

On Mar 4, 2013, at 1:32 AM, James K. Lowden  wrote:


What do you have in mind?  I've been adding some user defined functions
and am thinking of creating a repository for them.

All the so-called window functions from SQL:2003 (aka analytic functions):

"Windowed Tables and Window Functions in SQL"
http://wwwlgis.informatik.uni-kl.de/cms/fileadmin/courses/SS2008/NEDM/RDDM.Chapter.06.Windows_and_Query_Functions_in_SQL.pdf
Postgresql has also had them for a while. I'd love to see sqlite3 gain 
them, but I don't think they can be added efficiently as custom 
functions. The engine would need to become window-aware, which is 
probably enough work to delay its introduction until a consortium member 
decides they need it. That said, the VDBE architecture should make it 
significantly easier to implement inside sqlite3 than for most other 
engines.
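
In the meantime, running aggregates like the cumulative sum this thread 
started from have to be spelled out by hand. A correlated-subquery sketch, 
assuming a table t(id, v) -- quadratic, but it works:

select id,
       (select sum(v) from t t2 where t2.id <= t.id) as running_sum
from t
order by id;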


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to create a foreign key in existing table?

2013-03-02 Thread Ryan Johnson

On 01/03/2013 8:08 PM, Stephen Chrzanowski wrote:

On Fri, Mar 1, 2013 at 11:48 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:


On 01/03/2013 11:10 AM, Stephen Chrzanowski wrote:


***I'm waiting for the repair man to show up to fix my waterheater...
so...
I'm bored. This is going to be to the point at the beginning, but get
wordy
and technical near the end. ;)  Super over kill. ahem


Nice explanation... just a couple of nitpicks:



Thanks gents. :]



  You will ALWAYS incur slower speeds when using foreign keys in either a
join or "ON [DELETE/UPDATE]".  Additional look ups have to happen, which
means more time spent, which typically is the millisecond range per lookup.


I can't think of any reason a foreign key constraint would impact the cost
of joins in any query. The cost is entirely at update time (when you have
to enforce the constraint).


Wouldn't you have to do a look up to procure the list of keys to see what
is actually part of the result set, or would that be decided within the
WHERE logic/filtering?
WHERE does that. Let's say you have tables P and F, and the foreign key 
is P.PK=F.FK. With no index at all, it will fetch all rows from P, 
throwing away those that don't match their filters. Then it will fetch 
all rows from F, also throwing away mismatches. Then it will do an 
(expensive) join comparing all rows in P with all rows in F, returning 
only the ones having P.PK = F.FK.


However, if (most likely) you've told sqlite3 that P.PK is a primary 
key, the plan changes: Fetch and filter rows from F side, and for each 
match, then use F.FK to "probe" the index sqlite3 automatically on P.PK. 
That will give only rows for which F.FK=P.PK, and can be vastly faster.
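
In EXPLAIN QUERY PLAN terms (a sketch with hypothetical tables; the row 
estimates are meaningless, the plan shape is the point):

sqlite> create table P(PK integer primary key, pdata);
sqlite> create table F(FK integer references P(PK), fdata);
sqlite> explain query plan select * from F join P on P.PK = F.FK;
0|0|0|SCAN TABLE F (~1000000 rows)
0|1|1|SEARCH TABLE P USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)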


If you create an additional index whose field(s) make it easier to apply 
the WHERE filter on F, sqlite3 can use that instead of a scan. Think if 
you had an index on (last-name,first-name). A query that filters for a 
certain last name can use that index, but a query that filters by first 
name can't (try looking up your first name in a phone book if you don't 
believe me!).


If you created an index on F.FK (which doesn't happen automatically), 
and created other indexes on fields involved in the filters for P, then 
the engine *might* look at its stats and decide it's actually faster to 
use the index to filter P, then probe F using that index. That would 
happen if it thinks the filters on P allow fewer rows past than the 
filters on F do. Choosing which index(es) to create is mostly black 
magic: which ones are actually useful depends strongly on which query 
and data you have, and how smart the optimizer is, and how up-to-date 
your stats are (from ANALYZE).


Meanwhile, a foreign key constraint does the following:
- Whenever you insert or update F, it will blow up if there is not a 
matching P.PK in the database.
- When you delete from P, it will find the matching F.FK rows (if any) 
and either delete them or give an error (depending on what you asked for).
- Whenever you read from F, it does absolutely nothing at all, because 
the constraint has already been enforced.
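
(Remember that sqlite3 only enforces any of this if you opt in per 
connection; continuing the hypothetical P/F sketch from above:)

sqlite> pragma foreign_keys = on;
sqlite> insert into F values(42, 'x');  -- blows up unless some P.PK = 42 exists
Error: foreign key constraint failed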


In other words, enforcing a FK constraint requires running queries very 
similar to the one you're trying to run... on every single update. P.PK 
also has the constraint that it must be unique, again requiring queries 
on every insert/update. All those queries can use the PK index just like 
yours do. That's one of the reasons sqlite3 creates that PK index 
automatically, the performance hit of enforcing PK and FK constraints 
would be ridiculous otherwise.


HTH,
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to create a foreign key in existing table?

2013-03-02 Thread Ryan Johnson

On 01/03/2013 4:09 PM, Igor Korot wrote:

Ryan,

On Fri, Mar 1, 2013 at 12:48 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 01/03/2013 2:23 PM, Igor Korot wrote:

Hi, guys,

On Fri, Mar 1, 2013 at 8:48 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 01/03/2013 11:10 AM, Stephen Chrzanowski wrote:

***I'm waiting for the repair man to show up to fix my waterheater...
so...
I'm bored. This is going to be to the point at the beginning, but get
wordy
and technical near the end. ;)  Super over kill. ahem

Nice explanation... just a couple of nitpicks:

Very nice indeed.
Here is what I'm trying to do, with a little more detail.

I am using C API in a C++ based program.
What I'm trying to do is decrease the time the program displays the
main frame window.

Basically during construction I am calling SELECT ... WHERE
<parent>.PK = <child>.FK.
Problem is when <child> was created I didn't make the foreign key.

Check the output of `explain query plan' but I suspect the join already uses
an index on <parent>.PK (if that PK was really declared as such, there will
be an index on it). The foreign key (or lack thereof) has nothing to do with
this situation.

Depending on what conditions you have in the where clause, some additional
indexes could be helpful. For example, if you want "where <table>.z between
:X and :Y" (where X and Y are parameters passed in from your code), then an
index on <table>.z will speed things up drastically because it will allow
fetching only the range of records that match the predicate. The improvement
is even more pronounced for equality predicates. As a completely bogus
example:

create table foo(x integer primary key, y, z);
create table bar(a integer primary key, b, c, x integer);
explain query plan select * from foo join bar using(x) where z between 30
and 50 and c = 10;

0|0|1|SCAN TABLE bar (~10 rows)
0|1|0|SEARCH TABLE foo USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)


create index foo_z on foo(z);
create index bar_c on bar(c);
explain query plan select * from foo join bar using(x) where z between 30
and 50 and c = 10;

0|0|1|SEARCH TABLE bar USING INDEX bar_c (c=?) (~10 rows)
0|1|0|SEARCH TABLE foo USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)


The row estimates are worthless (it's an empty database and I never ran
ANALYZE) but you get the idea: without an index, it scans all of bar for
matching rows (= slow) and then joins it to foo using the latter's primary
key index. With the index, it probes bar_c for precisely the data needed (=
fast), then uses foo's primary key index to complete the join as before. In
general, you can use at most one index per table. So, for example, you can't
use both foo_z and the primary key index on foo, so it decides to use the pk
index because that's predicted to be more beneficial; statistics could
easily push that the other way, though (for example, if there's only one or
two rows with c=10). The pk index on bar, on the other hand, is useless for
this query because we don't care what bar.a is; using bar_c index is the
obvious choice there.

I'm afraid I didn't follow your explanation of the grid and update stuff,
but hopefully the above at least helps you make the query fast.

So basically what you said is:
"Don't make a foreign key. Instead create an index on the foreign key
field and it will speed things up"
Foreign keys have nothing to do with query optimization. Use them only 
if they enforce a constraint your application needs enforced.


An index on the foreign key field will probably not be useful, but the 
optimizer has the final say on that (depending on the data you stored 
and the exact nature of your query). If EXPLAIN QUERY PLAN reports using 
an index for every table it accesses, though, creating another index is 
unlikely to help.




Am I right?

What about making both foreign key and index? Will this improve it even further?

Most likely neither will improve performance in your situation.

If you think you still need more performance, you'll have to (at a 
minimum) post the query you use, the schema it runs on, and the output 
of EXPLAIN QUERY PLAN after you have run ANALYZE.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to create a foreign key in existing table?

2013-03-01 Thread Ryan Johnson

On 01/03/2013 2:23 PM, Igor Korot wrote:

Hi, guys,

On Fri, Mar 1, 2013 at 8:48 AM, Ryan Johnson
<ryan.john...@cs.utoronto.ca> wrote:

On 01/03/2013 11:10 AM, Stephen Chrzanowski wrote:

***I'm waiting for the repair man to show up to fix my waterheater...
so...
I'm bored. This is going to be to the point at the beginning, but get
wordy
and technical near the end. ;)  Super over kill. ahem

Nice explanation... just a couple of nitpicks:

Very nice indeed.
Here is what I'm trying to do, with a little more detail.

I am using C API in a C++ based program.
What I'm trying to do is decrease the time the program displays the
main frame window.

Basically during construction I am calling SELECT ... WHERE
<parent>.PK = <child>.FK.
Problem is when <child> was created I didn't make the foreign key.
Check the output of `explain query plan' but I suspect the join already 
uses an index on <parent>.PK (if that PK was really declared as such, 
there will be an index on it). The foreign key (or lack thereof) has 
nothing to do with this situation.


Depending on what conditions you have in the where clause, some 
additional indexes could be helpful. For example, if you want "where 
<table>.z between :X and :Y" (where X and Y are parameters passed in 
from your code), then an index on <table>.z will speed things up 
drastically because it will allow fetching only the range of records 
that match the predicate. The improvement is even more pronounced for 
equality predicates. As a completely bogus example:


create table foo(x integer primary key, y, z);
create table bar(a integer primary key, b, c, x integer);
explain query plan select * from foo join bar using(x) where z between 
30 and 50 and c = 10;

0|0|1|SCAN TABLE bar (~10 rows)
0|1|0|SEARCH TABLE foo USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)


create index foo_z on foo(z);
create index bar_c on bar(c);
explain query plan select * from foo join bar using(x) where z between 
30 and 50 and c = 10;

0|0|1|SEARCH TABLE bar USING INDEX bar_c (c=?) (~10 rows)
0|1|0|SEARCH TABLE foo USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)


The row estimates are worthless (it's an empty database and I never ran 
ANALYZE) but you get the idea: without an index, it scans all of bar for 
matching rows (= slow) and then joins it to foo using the latter's 
primary key index. With the index, it probes bar_c for precisely the 
data needed (= fast), then uses foo's primary key index to complete the 
join as before. In general, you can use at most one index per table. So, 
for example, you can't use both foo_z and the primary key index on foo, 
so it decides to use the pk index because that's predicted to be more 
beneficial; statistics could easily push that the other way, though (for 
example, if there's only one or two rows with c=10). The pk index on 
bar, on the other hand, is useless for this query because we don't care 
what bar.a is; using bar_c index is the obvious choice there.


I'm afraid I didn't follow your explanation of the grid and update 
stuff, but hopefully the above at least helps you make the query fast.


Ryan
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Is there a way to create a foreign key in existing table?

2013-03-01 Thread Ryan Johnson


On 01/03/2013 11:10 AM, Stephen Chrzanowski wrote:

***I'm waiting for the repair man to show up to fix my waterheater... so...
I'm bored. This is going to be to the point at the beginning, but get wordy
and technical near the end. ;)  Super over kill. ahem

Nice explanation... just a couple of nitpicks:


You will ALWAYS incur slower speeds when using foreign keys in either a join
or "ON [DELETE/UPDATE]".  Additional look ups have to happen, which means more
time spent, which typically is the millisecond range per lookup.
I can't think of any reason a foreign key constraint would impact the 
cost of joins in any query. The cost is entirely at update time (when 
you have to enforce the constraint).



FKs are two (or more) pointers that say one field in one table is related
to that of another table in some regard.  FKs are typically used
to delete data at the database level.
Foreign keys protect inserts and updates as well: they prevent you from 
accidentally inserting a sales order with a bogus customer ID, for 
example, and you're not allowed to change the ID of a customer who has 
orders without chasing down all those orders and updating them as well.


The easiest way to think of it might be: foreign key constraints make 
sure your foreign key joins always behave as expected: a bogus customer 
ID would cause the order to disappear when joined with customer, and 
that's probably a Bad Thing. Note that the DBMS can be told to allow 
NULL customer ID in orders, if that's useful to you.



*1)
A basic index would be something similar to a key/value pair.  The list of
keys would be sorted, however the list of values that key holds doesn't
necessarily have to be.  From memory, back when I was doing my MCDBA cert
for SQL2k, the basic index look up engine would count how many unique
indexes exist, read the key in the middle, decide if further look ups had
to be done.  If more had to be done, it'd either look at the key at the 1/4
mark, or the 3/4 mark, and decide again.  It'd keep drilling the index page
until it found what it needed.  It'd then look at all the data pages
required and process the data.  So if you were looking for the number 30 in
a list of 100 unique numbers (1-100), it'd look at 50, decide what it found
was too high, look at 25, decide it was too low, then look at 37, decide
too high, then look at 31, again find it too high, then look at 30 and read
in the data which may live on pages 99, 45, 58, 109, and 55.
That describes a binary search, which is actually pretty inefficient for 
disk-resident data. Most database engines (sqlite3 included) use B+Tree 
search. It requires a complex data structure, but the effect is that you 
could find your needle in a one-billion entry haystack by fetching 
perhaps 5 records from disk [1]; binary search would touch perhaps 35 
records, and without an index you'd have to search all billion records 
on disk directly [2].


[1] If the index gets used very often, 2-4 of those records will 
probably still be in memory from previous searches, cutting the disk 
access count even further.


[2] If you only need exact match (rather than range or nearest-neighbor) 
you can go with a hash index and cut the disk access count to one (in 
expectation). Sqlite3 doesn't have hash indexes, though, and B+Tree 
indexes fit in memory well enough that hashing isn't usually worth the 
trouble.
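
Back-of-envelope, to make those numbers concrete (the ~200-key fanout per 
node is an assumption about typical 4kB pages):

  B+Tree:        ceil(log_200(10^9)) = 4 node reads, ~5 counting the leaf
  binary search: ceil(log_2(10^9)) ~= 30 record reads, each a potential seek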


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] experimental (better?) usage based sqlite cache

2013-02-25 Thread Ryan Johnson

On 25/02/2013 7:24 AM, Simon Slavin wrote:

On 25 Feb 2013, at 11:33am, Howard Chu  wrote:


Gabriel Corneanu wrote:

Following a few other discussions, I had the feeling that sqlite should
benefit from a cache which discards cached pages in a least frequently
used order.

Just offhand, classical LRU is quite poor in terms of lock overhead.

Gabriel writes "least frequently used".  Howard writes "least recently used".  
You're not writing about the same thing.

And the speed advantages of any algorithm used must be assessed before anything 
new is implemented.  SQLite is meant to be 'lite' and have almost nothing in.  
Complicated algorithms and use-counting should be added only if they improve 
things a lot.
(This is also partly a response to Howard, who suggests just relying on 
the OS fs cache)


There seem to be quite a few sqlite3 users on platforms where I would 
not trust the OS to provide effective caching (embedded, smartphone, 
etc.) and we have seen complaints on the list from them about slow I/O. 
Granted, that's often due to logging overheads, but if we're going to 
see any real benefit from some new caching strategy in sqlite3, it's 
likely to show up there.


As for `lite', this could be a pluggable extension, just like the fts 
stuff. Lots of people don't need it, but it can be acquired if needed.


Ryan

P.S. locking overheads are a bogey-man on a belligerently 
single-threaded system...


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Required sqlite binaries for DOS urgently

2013-02-19 Thread Ryan Johnson

On 19/02/2013 8:08 AM, Mahesh Chavan wrote:

My IDE bus is refusing to recognise my HDD or CDROM, so I am compelled to
access my PC using floppy only.
I intend to run all my database software on floppy using sqlite3.
I need to develop the software on a war footing. I have already compiled some
programs in Puppy Linux previously.
It's been about 25 years since I last saw a DOS PC with no hard drive 
that ran everything from floppy...


Meta question: Maybe I'm missing something here, but why are you trying 
to do *anything* with the PC you describe, let alone run sqlite3? Even 
if it had an HDD with data worth recovering, you'd have to move it to a 
machine with working IDE bus to recover it.



When I tried to compile sqlite3 on DOS using Djgpp, the libsqlite3.a file
is produced when I removed references to mmap and munmap from the original
source and added a stub for fchown.
But when I tried to produce sqlite3.exe from shell.o, it gave me the error
"ld cannot find lgcc and lc". However, libgcc.a and libc.a are both there in
the c:\djgpp\lib directory.
The same problem arises when I try to compile any program using
libsqlite3.a.
Please guide me how to get binaries on DOS using DJGPP
It sounds like you lack basic knowledge of how compilers, linkers, and 
operating systems work. If so, you'd probably be better off asking for 
help on a list that deals in those sorts of issues. In particular, 
removing calls to mmap/munmap from program sources might let you compile 
and link the code, but it is highly unlikely to produce a working 
executable (especially not on DOS, when the code assumes Linux 
conventions).


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I suspect not - but is the database separator configurable?

2013-02-04 Thread Ryan Johnson

On 03/02/2013 10:16 AM, Stephen Chrzanowski wrote:

Just had a thought;

You could do a few things, unfortunately all at the code base level;

1> I don't know if Python will handle it, but I know most other languages
have a string-replace function, or, more specifically, in Delphi, there is
a "format" command in which you would create a string, fill it in with
parameters (Like %s, %d, etc) and then you have an array of variables the
system would fill in.  Create a variable that contains the SQL you need,
but replace what needs to be quoted with a format specifier.  Then,
regenerate the SQL with either quoted or non-quoted variables.  This would
become a PITA if you had 30 field names or so, however.  Not to mention bug
prone.

2> In line with a true string replace, replace all the field names in the
string to a format like '%db..table%' then run a string replace against all
the possible field names and replace them with either a quoted string, or a
non-quoted string.

3> Another option is to manually mangle the string based on what Sybase is
being fed.  So, if you're going to toss it into SQLite:
- scan the string for double-periods, and remember the position
- backup, character by character, until the first non-valid character, and
put a single/double quote
- go forward+1 from the remembered position until the first non-valid
character, and drop the same quote.

4> The last thing, off the top of my head, would be to locate each and
every SQL string, and move it into a file of its own, then duplicate for
SQLite and mangle as needed.  Along with option 1, nightmare to manage.

My thought: #3 would probably be the easiest to implement, as it'd require
one function to be created and an appropriate result returned, then going
through the code and applying the function where needed, but it'd probably
be the slowest since you'll be practically touching each and every character
in the string before it's fed to the DBMS.  #2 would be next in line, as a
search and replace based on field names would PROBABLY be quicker
than mangling strings character by character, but it'd involve going
in and modifying the SQL calls themselves, as well as adding the
appropriate string replace commands.
Another option: wrap the appropriate functions from the various classes 
in the python sqlite3 module, and have them convert '..' to '.' in 
queries before passing on. You'd just have to make sure not to replace 
'..' appearing inside string constants.


Assuming you're lazy like me, and only issue sql using methods of the 
Connection object, that means interposing on Connection.execute and 
Connection.executemany; the Connection object has read-only attributes, 
so you'd probably need a wrapper class.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select max(x), y from table

2013-02-04 Thread Ryan Johnson

On 03/02/2013 10:31 AM, Gabor Grothendieck wrote:

On Sun, Feb 3, 2013 at 5:12 AM, E.Pasma  wrote:

Op 3 feb 2013, om 02:59 heeft Igor Tandetnik het volgende geschreven:



On 2/2/2013 6:46 PM, Gabor Grothendieck wrote:

In 3.7.11 there was a change to support the feature in the subject
which refers to guaranteeing that y comes from the same row having
maximum x. See:

http://pages.citebite.com/o9y9n0p9neyt

Did this or other change also enhance the having clause to add a
feature to support a query containing "having max(...)" such as the
query here:


http://r.789695.n4.nabble.com/Filter-according-to-the-latest-data-tp4657248p4657305.html


The query in that thread is of the form "select * from MyTable group by
Name having max(Timestamp)", and the expectation, somehow, is that the
HAVING clause would cause each group to be represented by a row for which
max(Timestamp) is reached. I'm not sure where this expectation comes from.
This is a valid SQL statement whose HAVING clause means "only include a
group in the resultset if max(Timestamp) for this group is logically true"
(that is, not NULL, 0, empty string or empty blob).

A semantic change of the nature you envision is not backward compatible -
it modifies the meaning of existing valid statements. Also, I'm pretty sure
it's not supported by any SQL standard; and I'm not aware of any DBMS that
would interpret the statement the way you want (which doesn't mean none such
exists, of course).

All in all, It seems unlikely that such a proposal would be entertained.
--
Igor Tandetnik


Hi,

Is it then not a perfect solution? It works if, in the example, the timestamp
is always logically true (i.e. not 0 or NULL). Otherwise you might write
something like HAVING MAX(timestamp) OR MAX(timestamp) IS NULL.

This construction comes in useful to deal with the issue as was recently
observed with this feature (see

  http://www.mail-archive.com/sqlite-users@sqlite.org/msg74761.html

The construction does not involve any semantic change. The question is still
if it will also work in future versions of SQLite.

What was wanted was to pick out the row with the largest timestamp in
each group (and not to pick out those groups with a non-null maximum
timestamp) so the fact that the desired result was returned in the
example would seem to be a coincidence, assuming no specific feature
along these lines has been implemented in SQLite.
select mytable.*, mt from mytable join (select id,name, max(timestamp) 
mt from mytable group by id,name) using(id)


Slightly cumbersome, but it works just fine... probably even fairly 
efficient thanks to use of a primary key index.


Otherwise, it looks like you want some variant of the SQL:2003 "windowing" and 
"rank" functions, which postgres already provides:

http://www.postgresql.org/docs/9.1/static/tutorial-window.html

That said, I'm not sure how to express your query more efficiently even 
using the pgsql syntax, as I understand the latter.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I suspect not - but is the database separator configurable?

2013-02-01 Thread Ryan Johnson

On 01/02/2013 12:28 PM, Mohit Sindhwani wrote:

Hi Marc,

On 1/2/2013 10:42 PM, message adams wrote:

My applications actually run against sybase, but I'd love to use a
connection to an in-memory sqlite to carry out my testing.
As part of the unit-test, I'd pass the sqlite conenction into my source
code hoping it would be none the wiser.

The only real problem I see, is sybase uses a double period separator
between the database and table. e.g.
select * from database..table

... whereas sqlite
select * from database.table


Can you use double quotes in the query?

For example, this works in SQLite3

sqlite> create table "db..abc"(p INTEGER, q INTEGER);
sqlite> .tables
db..abc
sqlite> select * from "db..abc";
sqlite>
That would break sybase, though: the quotes would tell it to treat 
the db name and periods as part of the table name, too:


sqlite3> create table foo(x,y);
sqlite3> .tables
foo
sqlite3> select * from "main.foo";
Error: no such table: main.foo


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Unable to use table alias inside parenthesis on right side of join

2013-01-31 Thread Ryan Johnson

Looks like exactly the same thing, yes. No reply, unfortunately...

On 31/01/2013 1:05 PM, Kevin Benson wrote:

I wonder if this earlier mention is related?

http://osdir.com/ml/sqlite-users/2012-07/msg00054.html
--
--
   --
  --Ô¿Ô--
 K e V i N


On Thu, Jan 31, 2013 at 12:04 PM, Ryan Johnson
<ryan.john...@cs.utoronto.ca>wrote:


Hi all,

Strange one here... consider the following schema:

R(a,b)
S(b,c)

These queries all work fine:

select R1.* from R R1 join S S1 using(b);
select S1.* from R R1 join S S1 using(b);
select R1.* from (R R1 join S S1 using(b));
select S1.* from (R R1 join S S1 using(b));
select R1.* from (R R1 join S S1 using(b)) join (R R2 join S S2 using(b))
using (a,c);
select S1.* from (R R1 join S S1 using(b)) join (R R2 join S S2 using(b))
using (a,c);

But these two do not:

select R2.* from (R R1 join S S1 using(b)) join (R R2 join S S2 using(b))
using (a,c);


Error: no such table: R2


select S2.* from (R R1 join S S1 using(b)) join (R R2 join S S2 using(b))
using (a,c);


Error: no such table: S2


Why are R1/S1 aliases visible at top level but R2/S2 aren't? It seems that
if the parentheses made those joined pairs into nested queries, then R1/S1
should be lost; if parentheses are only for grouping, then R2/S2 should be
visible.  Either way, something seems a bit off.

The official syntax diagrams [1] suggest that parentheses imply only
grouping in this context. Then again, they also suggest it should be a
syntax error for a table alias to follow a join source in parentheses...
and that actually works fine, other than making the inner table aliases
unavailable (which kind of makes sense):

select J1.* from (R R1 join S S1 using(b)) J1 join (R R2 join S S2
using(b)) J2 using (a,c);
select J2.* from (R R1 join S S1 using(b)) J1 join (R R2 join S S2
using(b)) J2 using (a,c);
select R1.* from (R R1 join S S1 using(b)) J1 join (R R2 join S S2
using(b)) J2 using (a,c);


Error: no such table: R1


And yes, I've since ditched the USING syntax and gone back to WHERE
clauses, though the resulting query is 20% longer and arguably much less clear
[2]:

select R1.*, R2.* from (R R1 join S using(b)) join (R R2 join S using(b))
using(a,c) where R1.b != R2.b
select R1.*, R2.* from R R1, R R2, S S1, S S2 where R1.b=S1.b and
R2.b=S2.b and R1.a=R2.a and S1.c=S2.c and R1.b != R2.b

[1] 
http://www.sqlite.org/**syntaxdiagrams.html#join-**source<http://www.sqlite.org/syntaxdiagrams.html#join-source>

[2] Note that I added a predicate in, which I had stripped from the
earlier test cases for clarity

Thoughts?
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Unable to use table alias inside parenthesis on right side of join

2013-01-31 Thread Ryan Johnson

Hi all,

Strange one here... consider the following schema:

R(a,b)
S(b,c)

These queries all work fine:

select R1.* from R R1 join S S1 using(b);
select S1.* from R R1 join S S1 using(b);
select R1.* from (R R1 join S S1 using(b));
select S1.* from (R R1 join S S1 using(b));
select R1.* from (R R1 join S S1 using(b)) join (R R2 join S S2 using(b)) 
using (a,c);
select S1.* from (R R1 join S S1 using(b)) join (R R2 join S S2 using(b)) 
using (a,c);


But these two do not:

select R2.* from (R R1 join S S1 using(b)) join (R R2 join S S2 
using(b)) using (a,c);

Error: no such table: R2
select S2.* from (R R1 join S S1 using(b)) join (R R2 join S S2 
using(b)) using (a,c);

Error: no such table: S2


Why are R1/S1 aliases visible at top level but R2/S2 aren't? It seems 
that if the parentheses made those joined pairs into nested queries, 
then R1/S1 should be lost; if parentheses are only for grouping, then 
R2/S2 should be visible.  Either way, something seems a bit off.


The official syntax diagrams [1] suggest that parentheses imply only 
grouping in this context. Then again, they also suggest it should be a 
syntax error for a table alias to follow a join source in parentheses... 
and that actually works fine, other than making the inner table aliases 
unavailable (which kind of makes sense):


select J1.* from (R R1 join S S1 using(b)) J1 join (R R2 join S S2 
using(b)) J2 using (a,c);
select J2.* from (R R1 join S S1 using(b)) J1 join (R R2 join S S2 
using(b)) J2 using (a,c);
select R1.* from (R R1 join S S1 using(b)) J1 join (R R2 join S S2 
using(b)) J2 using (a,c);

Error: no such table: R1


And yes, I've since ditched the USING syntax and gone back to WHERE 
clauses, though the resulting query is 20% longer and arguably much less clear 
[2]:


select R1.*, R2.* from (R R1 join S using(b)) join (R R2 join S 
using(b)) using(a,c) where R1.b != R2.b
select R1.*, R2.* from R R1, R R2, S S1, S S2 where R1.b=S1.b and 
R2.b=S2.b and R1.a=R2.a and S1.c=S2.c and R1.b != R2.b


[1] http://www.sqlite.org/syntaxdiagrams.html#join-source

[2] Note that I added a predicate in, which I had stripped from the 
earlier test cases for clarity


Thoughts?
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite3 Bug Report: The shell accepts some dot commands ending in semicolons while rejecting others.

2013-01-28 Thread Ryan Johnson

On 28/01/2013 12:08 PM, Larry Brasfield wrote:

Nathan Chung wrote:

SQLite version: 3.6.12
OS: Mac OS X 10.6.8

*Summary:
The SQLite3 shell accepts some dot commands ending in semicolons while
rejecting others without displaying proper error messages. The
behavior of the dot commands could be more consistent. Examples
include "header off;" and "header on;". The shell accepts "header
off;" while rejecting "header on;" without a proper error message.

*Steps to reproduce:
1. Launch SQLite3 shell with a populated database and type in 
".header on".
2. Try a select statement. The retrieved table will show the field 
headers.

3. Type in ".header off;".
4. Try a select statement. The retrieved table will not show the 
field headers.

5. Type in ".header on;".
6. Try a select statement. The retrieved table will still not show the
field headers. Note that the ".header on;" command from the previous
step did not generate any error messages even though the command was
rejected.
7. There may be other dot commands displaying other inconsistent
behaviors. I have only verified the issue with "header on;" and
"header off;".


The code for the shell command decoder is not written to perform as 
you expect.  It is very simple (which is also a virtue), and focused 
on accepting correctly composed commands. The shell is considered to 
be a tool for easily performing basic operations upon a SQLite 
database, in the hands of more software-savvy people than those who 
would be lost without detailed and "consistent" error reporting.  I 
dare say that said code, if modified to meet your expectations, would 
be less suited to its purpose, harder to maintain and understand, and 
would divert effort from SQLite itself.  I say this as one who has 
found reason to modify that code and who is glad that part of the task 
consumed little time.

".header on;" -- does nothing
".header off;" -- disables headers

I'm with OP on this one. The error message doesn't need to be "helpful" 
-- in fact I'm fine if it accepts semicolons, given that most things you 
type in the shell require them. Turning apparently-successful commands 
into no-ops is never a good idea, though.


My guess is that it's trivial to fix, especially if the decision is to 
allow semicolons and other trailing garbage (though `.header foo' would 
still silently "succeed"). Probably a one-liner along the lines of 
s/strcmp/strncmp/, or some such.


$0.02
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] order by "a huge number" does not work, but "random" well why ?

2013-01-15 Thread Ryan Johnson

On 14/01/2013 9:48 AM, François-xavier Jacobs wrote:

Hi everyone

I would like to "seed random" a request, so I could use some pagination
system with an order by random(). Is this possible in sqlite? When I tried to
use "order by 1234567892" it always returns the same order.
That's because you told sqlite that the distinguishing feature of a 
tuple is the constant '1234567892'. The sorting routine will see "t1 < 
t2 = false" and "t1 > t2 = false" (implying t1 = t2) no matter which two 
tuples it examines. If sqlite uses a stable sorting algorithm, that will 
leave the input untouched. A very expensive no-op, in other words.


With random(), t1 < t2 is undefined (because it turns out different 
every time it's evaluated), and you end up with tuples in a random order 
(assuming the sorting routine doesn't crash; code usually assumes values 
don't change while being sorted). Note that shuffling directly is more 
efficient than "sorting" in this way, though sqlite lacks the necessary 
"shuffle" operator to do it in-database.


Ryan



Re: [sqlite] SQLite - Dropping TEMP table doesn't free allocated memory

2012-12-28 Thread Ryan Johnson

On 28/12/2012 4:04 PM, Krzysztof wrote:

I don't understand :/ So what is solution in my case?
What is the problem you need to solve? If I understand correctly, your 
app repeatedly creates and deletes ~80MB of temp data. If so, it's 
actually a good thing that sqlite doesn't release the memory, because it 
would just have to re-allocate it soon anyway. I see nothing to worry 
about unless your platform is really memory-starved (to the point that 
freeing up 80MB actually matters), or you see the memory footprint 
growing with each create/delete cycle (indicating a memory leak).
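
(If the platform really is that memory-starved and your sqlite3 is 3.7.10 
or newer, the following asks sqlite to hand freed pages back to malloc -- 
whether libc then returns them to the OS is a separate question, as above:)

sqlite> PRAGMA shrink_memory;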


Ryan



Re: [sqlite] just a test

2012-12-10 Thread Ryan Johnson
Clearly, Igor is too helpful and responds to too many messages... *rolls 
eyes*


I'm not on gmail, so I didn't know this was even a problem, but 
hopefully it gets sorted out soon.


Ryan

On 09/12/2012 2:01 AM, dd wrote:

Yes. Igor Tandetnik's mails are marked as spam nowadays. I marked them as NOT
SPAM.


On Sun, Dec 9, 2012 at 9:33 AM, Gabor Grothendieck
wrote:


I am still having problems with Igor's gmail messages being marked as
spam in gmail but after the upteenth time declaring them not to be
spam google finally asked me if I wanted to report it to their gmail
team so hopefully they will fix it soon.

On Mon, Dec 3, 2012 at 11:59 PM, Clive Hayward 
wrote:

Igor's messages sometimes get marked as spam by gmail.

--
Clive Hayward


On 2012-12-03, at 7:57 AM, e-mail mgbg25171 

wrote:

I've posted a couple of mails lately...I'm not getting them via the list
or any responses.
Admin says Igor responded to one of them...Thanks Igor!
This is just a test to see if the mail is coming to me (as a member of
the list).
Therefore please just ignore this.



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [sqlite] Tracing latencies

2012-12-05 Thread Ryan Johnson

On 05/12/2012 1:25 PM, Dan Kennedy wrote:

On 12/06/2012 12:47 AM, Black, Michael (IS) wrote:

I'm a bit confused with "always aligned". None of the lseeks in this
log are 1024 aligned.

And I just ran a test with the 3.7.15.1 latest amalgamation and most
of these seeks are not aligned.  Once in a while it gets lucky.
Alignment sure isn't deliberate in this.  It appears the first page
is 1080 which is already out of alignment.  File#4 here is the wal
file.


I wasn't clear. Writes to the WAL file are not aligned. However,
they are usually sequential, which means on an OS with a smart
enough cache you don't really gain anything at all trying to align
them.
That hypothesis could be easily tested by examining the corresponding 
strace output for mysql... I don't know if sqlite3's unaligned log 
writes are actually slow (didn't see this info in any of Keith's 
messages), but if they are that would suggest the hypothesis is false; 
it would really put nails in the coffin if mysql uses aligned log writes.


(BTW Michael, whatever mechanism you use to reply breaks threading in 
both Thunderbird and the list archives at sqlite.org; makes it harder to 
follow conversations)


Ryan



Re: [sqlite] light weight write barriers

2012-11-15 Thread Ryan Johnson

On 14/11/2012 8:17 PM, Vladislav Bolkhovitin wrote:

Nico Williams, on 11/13/2012 02:13 PM wrote:

declaring groups of internally-unordered writes where the groups are
ordered with respect to each other... is practically the same as
barriers.


Which barriers? Barriers meaning cache flush or barriers meaning 
commands order, or barriers meaning both?


There's no such thing as "barrier". It is fully artificial 
abstraction. After all, at the bottom of your stack, you will have to 
translate it either to cache flush, or commands order enforcement, or 
both.
Isn't that why we *have* "the stack" in the first place? So apps 
*don't* have to worry about how the OS implements an artificial (= 
high-level and portable) abstraction on a given device?




Are you going to invent 3 types of barriers?

One will do, it just needs to be a good one.

Maybe I'm missing something here, so I'm going to back up a bit and 
recap what I understand.


The filesystem abstracts the concept of encoding patterns of bits in 
some physical media (data), and making it easy to find and retrieve 
those bits later (metadata, incl. file name). When users read(), they 
expect to see whatever they most recently sent to write(). They also 
expect that what they write will still be there later,  in spite of any 
failure that leaves the disk itself intact.


Operating systems cheat by not actually writing to disk -- for 
performance reasons -- and users are (mostly, usually) OK with that, 
because the performance gains are so attractive and things usually work 
out anyway. Disks cheat too, in the same way and for the same reason.


The cheating works great most of the time, but breaks down -- badly -- 
if we actually care about what is on disk after a crash (or if we use a 
network filesystem). Enough people do care that fsync() was added to the 
toolbox. It is defined to transfer "all modified in-core data of the 
file referred to by the file descriptor fd to the disk device" and 
"blocks until the device reports that the transfer has completed" 
(quoting from the fsync(2) man page). Translation: "Stop cheating. Make 
sure the stuff I already wrote actually got written. And tell the disk 
to stop cheating, too."


Problem is, this definition is asymmetric: it says what happens to 
writes issued before the fsync, but nothing about those issued after the 
fsync starts and before it returns [1]. The reader has to assume 
fsync() makes no promises whatsoever about these later writes: making 
fsync capture them exposes callers of fsync() to DoS attacks, and 
blocking them from reaching disk until all outstanding fsync calls 
complete would add complexity the spec doesn't currently demand, leading 
to understandable reluctance by kernel devs to code it up. Unfortunately, we're left with 
the filesystem equivalent of what we in the database world call 
"eventual consistency" -- easy to implement, nice and fast, but very 
difficult to write reliable code against unless you're willing to pay 
the cost of being fully synchronous, all the time. Having tried that for 
a few years, many people are "returning" to better-specified concurrency 
models, trading some amount of performance for comfort that the app will 
at least work predictably when things go wrong in strange and 
unanticipated ways.


The request, then, is to tighten up fsync semantics in two conceptually 
straightforward ways [2]: First, guarantee that later writes to an fd do 
not hit disk until earlier calls to fsync() complete. Second, make the 
call asynchronous. That's all.
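
To make that concrete, here is roughly how a log writer would use such a 
primitive. This is a sketch of the *proposed* semantics only -- POSIX 
aio_fsync() has the right asynchronous shape, but today makes no ordering 
promise about later writes; error handling omitted:

#include <aio.h>
#include <fcntl.h>
#include <unistd.h>

static struct aiocb fsync_cb;  /* assume one in-flight fsync at a time */

void commit_record(int log_fd, const void *rec, size_t n){
  write(log_fd, rec, n);            /* append the commit record */
  fsync_cb.aio_fildes = log_fd;
  aio_fsync(O_DSYNC, &fsync_cb);    /* begin the flush, don't block */
  /* Under the proposed semantics, any write() to log_fd issued from
  ** here on is held back until the flush completes, so the caller can
  ** continue immediately. POSIX as written promises no such thing. */
}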


Note that both changes are necessary. The improved ordering semantic is 
useless by itself, because it's still not safe to request a blocking 
fsync from one thread and then let other threads continue issuing 
writes: there's a race between broadcasting that fsync has begun and 
issuing the actual syscall that begins it. An asynchronous fsync is also 
useless by itself, because it only benefits uncoordinated writes (which 
evidently don't care what data actually reaches disk anyway).


The easiest way to implement this fsync would involve three things:
1. Schedule writes for all dirty pages in the fs cache that belong to 
the affected file, wait for the device to report success, issue a cache 
flush to the device (or request ordering commands, if available) to make 
it tell the truth, and wait for the device to report success. AFAIK this 
already happens, but without taking advantage of any request ordering 
commands.
2. The requesting thread returns as soon as the kernel has identified 
all data that will be written back. This is new, but pretty similar to 
what AIO already does.
3. No write is allowed to enqueue any requests at the device that 
involve the same file, until all outstanding fsync complete [3]. This is 
new.


The performance hit for #1 can be reduced significantly if the storage 
hardware at hand happens to support some form of request ordering. The 
amount of reduction could vary greatly depending on 

Re: [sqlite] Bug: Inconsistency wrt. indirectly defined columns

2012-11-07 Thread Ryan Johnson

On 07/11/2012 7:58 PM, Simon Davies wrote:

On 7 November 2012 20:36,   wrote:

Quoting Simon Davies :


.
.
.

I think this is the documented behaviour:
http://www.sqlite.org/datatype3.html

tab1.id has integer affinity, and '42' is coerced to integer
tab2.id has none affinity, and '42' is not coerced


Hmm... I see what you mean:
Point 3 under 2.1 states that "if no type is specified then the column has
affinity NONE."

However, I find a foreign-key-clause counting as "no type specified" is at
least a bit irritating. After all the type could be inferred from the
reference. :-/

Also, ironically, the documentation claims that "The dynamic type system of
SQLite is backwards compatible with the more common static type systems of
other database engines...".

I just checked the example with the Oracle and PostgreSQL instances I have
at hand here:
Oracle does what I think is correct and returns a row in both cases.
PostgreSQL does not allow the "id REFERENCES" construction at all and
requires a data type even for foreign keys.

So in this case SQLite is incompatible with two major DBMSes. :-)

Which from what you have said, are also incompatible with each other!

Man with sharp stick has point.

To be fair, though, I have been bitten numerous times by exactly this 
same scenario, where foreign key joins fail because the key types 
somehow end up differing and don't coerce automatically. Very easy to 
forget, or to accidentally let a string slip in where an int was 
intended (say, by loading from csv). Also hard to diagnose.
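
To make the trap concrete, a minimal transcript: two foreign-key columns 
declared without a type, so both get NONE affinity and no coercion 
happens when they are compared.

sqlite> CREATE TABLE a(k REFERENCES parent);   -- no type: NONE affinity
sqlite> CREATE TABLE b(k REFERENCES parent);
sqlite> INSERT INTO a VALUES(7);               -- integer, from the app
sqlite> INSERT INTO b VALUES('7');             -- text, e.g. via .import
sqlite> SELECT count(*) FROM a JOIN b USING(k);
0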


Not sure the best way to "fix" the problem [1], but it might be the 
single most surprising aspect of using sqlite3 in my experience.


Ryan

[1] especially since there's probably a customer out there somewhere 
whose app actually depends on foreign key join columns having different 
types and not matching '24' with 24.




Re: [sqlite] "natrual" join (not "natural")

2012-10-31 Thread Ryan Johnson

On 31/10/2012 9:54 AM, Samuel Neff wrote:

We wrote a query and intended to use a "natural" join but had a typo and
wrote "natrual" join instead.  We were surprised this query was processed
without error and performed a cross join.

That must have been a delightful one to track down...

It's due to the (non-standard?) support for specifying table aliases 
without the keyword AS. In other words, sqlite3 sees the following as 
identical:


sqlite> select natral.* from a natral join b;
sqlite> select natral.* from a as natral join b;

... and will reject this:
sqlite> select a.* from a natral join b;

Whether this is a bug or a feature, I'm not sure... but it clearly had 
an unintended consequence in your case.


Ryan



Re: [sqlite] transaction commit is successful despite I/O error

2012-10-20 Thread Ryan Johnson

On 19/10/2012 4:40 PM, Efim Dyadkin wrote:


Hi Richard,

You are right about the purpose of unlink but it is out of context. 
There is a transaction in progress and a hot journal on disk. If the 
journal can't be deleted by the end of the transaction, the transaction 
can't be considered successfully finished. But the current 
implementation of Sqlite considers this transaction as successfully 
committed!


The problem of the file system going down during a transaction is a real 
one that happened to our customers. For them it was absolutely a 
disaster, because they lost data they had successfully saved.


I guess unixDelete must propagate the ENOENT error when it runs in the 
context of a transaction. For applications it will look like "transaction 
failed due to I/O error".


For a test I removed the "errno != ENOENT" check from unixDelete, but 
immediately encountered a problem. When I tried to execute any command 
on an empty database, e.g. "create table a (a1 text)", I got an I/O 
error. This is because Sqlite tried to remove the non-existing "-wal" 
file.


I think you're on the right track: pagerOpenWalIfPresent() attempts to 
delete a file that may not exist (the comments say so!), so it should 
have been expecting the occasional ENOENT.
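
Something shaped like this, perhaps -- a sketch with hypothetical names, 
to show the idea rather than an actual patch:

#include <unistd.h>
#include <errno.h>

/* Hypothetical variant of unixDelete: callers deleting a file that
** *must* exist (a hot journal at commit) pass mustExist=1 and see
** ENOENT as an I/O error; speculative deletes (a -wal file that may
** never have been created) pass 0 and tolerate it, as today. */
static int unixDeleteEx(const char *zPath, int mustExist){
  if( unlink(zPath)==(-1) && (errno!=ENOENT || mustExist) ){
    return unixLogError(SQLITE_IOERR_DELETE, "unlink", zPath);
  }
  return SQLITE_OK;
}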


Ryan



Re: [sqlite] Find first non-NULL values of several columns

2012-10-19 Thread Ryan Johnson

On 19/10/2012 3:09 PM, Steinar Midtskogen wrote:

Thank you for all suggestions.

I will need to do such queries often, so it's just a matter of saving
the typing.  Unfortunately, views aren't going to be very practical
either, because there are a lot of tables and columns (100+), and new
ones will be added.

The actual use case is as follows:

I have tables with a timestamp (unix time) and columns containing
sensor readings which are inserted continuously.  I frequently need to
access the most recent values (or NULL if there is no value within the
latest, say, hour).  I would like to do something like:
Is each column associated with a different sensor? If so, do sensors 
routinely generate data at the same time and combine their entries? Do 
sensors emit data frequently?


If any of the above is false, I'd definitely store each column in its 
own table, because the timestamp (the only possible thing to share) is 
replicated anyway. That does get you back to the multiple subquery 
thing, but a few views should hide that pretty effectively. 
Unfortunately, it doesn't look like sqlite3 is smart enough to push 
projections down through views, otherwise a single 
"view-to-rule-them-all" would have zero runtime overhead and save a 
*lot* of typing.
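
A sketch of that layout, with made-up names (the INTEGER PRIMARY KEY 
aliases the rowid, so each timestamp lookup is essentially free):

CREATE TABLE temp_readings(unix_time INTEGER PRIMARY KEY, val REAL);
CREATE TABLE wind_readings(unix_time INTEGER PRIMARY KEY, val REAL);

CREATE VIEW latest AS SELECT
  (SELECT val FROM temp_readings ORDER BY unix_time DESC LIMIT 1) AS temp,
  (SELECT val FROM wind_readings ORDER BY unix_time DESC LIMIT 1) AS wind;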



SELECT coalesce(col_1), ..., coalesce(col_n) FROM v WHERE unix_time > 
strftime('%s', 'now', '-1 hour') ORDER BY unix_time DESC;

So I would typically want to access the last non-NULL value because of
the DESC keyword.  But if I understand things correctly, a statement
like above will never work because an aggregate function reads the
data in no particular order regardless of the ORDER BY statement.

I like Igor's suggestion.  Although not quite universal, it's clever.

Indeed it is. And very useful at times.


Ryan's suggestion should work well, except that I will need a first(a,
b) and last(a, b) function (if I want to support both ascending and
descending order) and I can leave out the ORDER BY part.  So:

SELECT last(col_1, unix_time), ..., last(col_n, unix_time) FROM v WHERE unix_time 
> strftime('%s', 'now', '-1 hour');

Or just pass -unix_time to first(), seeing as how it's a number.

Ryan



Re: [sqlite] Find first non-NULL values of several columns

2012-10-19 Thread Ryan Johnson

On 19/10/2012 9:17 AM, Igor Tandetnik wrote:

Ryan Johnson <ryan.john...@cs.utoronto.ca> wrote:

I'd go for a user-defined aggregate taking two args: the key (to
identify "first") and the value to coalesce. Sure, it would never stop
the scan early, but the benefit of doing one scan instead of five 
probably outweighs that (unless Steinar has an appropriate index on 
every single column, which I doubt).

Or unless a non-null value is typically found in the first few rows of a large 
table. In other words, if the ratio of nulls to non-nulls is small (and if it's 
large, then I'd be thinking of a different schema, one that represents a sparse 
table more efficiently).
I had assumed a sparse table because I'm having a hard time imagining 
why you'd want to coalesce dense rows in this way. But you're right: in 
that case you'd really want to store sparse columns in separate tables, 
which would make the nested query approach by far the best (especially 
if you alias the sparse table's foreign key with its rowid).


Ryan



Re: [sqlite] Find first non-NULL values of several columns

2012-10-19 Thread Ryan Johnson

On 19/10/2012 8:55 AM, Igor Tandetnik wrote:

Steinar Midtskogen  wrote:

Ok, so let's say the table v (with "a" as the primary key) is:

a|b|c|d|e|f
0| | |2| |9
1|1| |3| |8
2|1| |4|4|7
3| |5|5|4|6
4|1|6|6| |5

The the question becomes, is there a more convenient way to do:

SELECT * FROM (SELECT b FROM v WHERE b IS NOT NULL ORDER BY a LIMIT 1),
  (SELECT c FROM v WHERE c IS NOT NULL ORDER BY a LIMIT 1),
  (SELECT d FROM v WHERE d IS NOT NULL ORDER BY a LIMIT 1),
  (SELECT e FROM v WHERE e IS NOT NULL ORDER BY a LIMIT 1),
  (SELECT f FROM v WHERE f IS NOT NULL ORDER BY a LIMIT 1);

to get 1|5|2|4|9?

If the values of b, c and so on have a known upper bound, then you can write 
something like

select min(a*1000 + b), min(a*1000 + c), ..., min(a*1000 + f) from v;

Note however that your origial statement is likely more efficient, as it can 
stop scanning early, whereas my variant will look at every record.
That, and it will break if those fields aren't numbers, or if any take 
negative values.


I'd go for a user-defined aggregate taking two args: the key (to 
identify "first") and the value to coalesce. Sure, it would never stop 
the scan early, but the benefit of doing one scan instead of five 
probably outweighs that (unless Steinar has an appropriate index on 
every single column, which I doubt).


The python version would be:

import sqlite3

class fnn:
    def __init__(self):
        self.key, self.val = None, None
    def step(self, key, val):
        # remember the value paired with the smallest key seen so far,
        # skipping NULLs entirely
        if val is not None and (self.key is None or key < self.key):
            self.key, self.val = key, val
    def finalize(self):
        return self.val

conn = sqlite3.connect('readings.db')  # hypothetical database file
conn.create_aggregate('fnn', 2, fnn)


And the query would become:

select fnn(a,b), fnn(a,c), fnn(a,d), fnn(a,e), fnn(a,f) from v


Regards,
Ryan



Re: [sqlite] transaction commit is successful despite I/O error

2012-10-18 Thread Ryan Johnson

On 18/10/2012 8:45 AM, Richard Hipp wrote:

On Wed, Oct 17, 2012 at 4:11 PM, Efim Dyadkin  wrote:


Hi,

I am testing loss of data in Sqlite database correlated to auto-mounter
malfunction. I am running Sqlite on Linux and my database file is located
on network disk. For a test I stop the auto-mounter right before
transaction is committed. Surprisingly commit succeeds without any error
although hot journal remains on disk. When I get auto-mounter back and open
my database again the transaction is rolled back.

Apparently Sqlite cannot remove the journal due to unmounted path but it
ignores this error because Linux classifies it as ENOENT and unixDelete
function disregards it:

if( unlink(zPath)==(-1) && errno!=ENOENT ){
  return unixLogError(SQLITE_IOERR_DELETE, "unlink", zPath);
}

Can somebody please explain why "errno!=ENOENT" is required in here?


The purpose of unlink() is to make it so that the file does not exist.
ENOENT indicates that the file does not exist, and so the purpose of the
unlink() call has been fulfilled.

Suppose SQLite did treat ENOENT as an error.  What could it do about it?
It cannot roll the transaction back because the rollback journal (and
indeed the entire database) has disappeared.  I guess we could return
SQLITE_CORRUPT.  Would that somehow be more useful to the application?
Is there some plausible scenario for which an active journal file gone 
AWOL does *not* indicate a serious problem? To me it indicates that Bad 
Things are going on that sqlite is ill-equipped to deal with, so the 
best it can do is avoid causing any collateral damage by attempting to 
continue normally. Especially if the filesystem went down: it's not like 
any future transaction would succeed anyway...


If a less heavy-handed approach is desirable, perhaps a failed unlink() 
call should trigger an fstat() or seek() on the offending file 
descriptor; that might distinguish whether the file itself is 
inaccessible (as in OP's case) or whether it's just unreachable at the 
expected path (e.g. due to interference from an external agent).
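
A sketch of that probe, with hypothetical names -- the point is just to 
distinguish "file gone" from "filesystem gone":

#include <sys/stat.h>

/* Hypothetical: after unlink(zPath) fails, ask whether the still-open
** journal descriptor is reachable at all. If fstat() fails too, the
** filesystem itself went away, and we should fail loudly rather than
** report a successful commit. */
static int journalStillReachable(int fd){
  struct stat buf;
  return fstat(fd, &buf)==0;
}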


I would still favor a fail-fast approach that returns a scary error 
message, though, the same kind that would be returned if a write failed.


$0.02
Ryan



Re: [sqlite] SQLite VM questions

2012-10-14 Thread Ryan Johnson

On 14/10/2012 2:26 PM, Pavel Ivanov wrote:

What kind of interpreter does the query executor uses? How important is the
interpreter's speed, to SQLite's speed ?

SQLite doesn't have interpreter, it has parser. I guess this makes the
rest of your email inapplicable.

Umm... yes it does. http://www.sqlite.org/vdbe.html

For the OP's question, it's very efficient compared to what I've seen 
in, say, postgres, but I don't know how it would compare to something 
like python or ruby.
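
(Anyone curious can inspect what that interpreter executes: plain 
EXPLAIN, without QUERY PLAN, prints the compiled VDBE program for a 
statement, one opcode per row:)

sqlite> EXPLAIN SELECT 1;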


Ryan



Re: [sqlite] Combining user-defined functions with aggregate functions

2012-10-10 Thread Ryan Johnson

On 10/10/2012 11:07 AM, Dan Kennedy wrote:

On 10/10/2012 10:01 PM, Ryan Johnson wrote:

On 10/10/2012 10:49 AM, Dan Kennedy wrote:

On 10/10/2012 08:14 PM, Hamish Allan wrote:

Short form question:

Working: SELECT a, userfunc(systemfunc) FROM t;
Working: SELECT a, sum(systemfunc) FROM t GROUP BY a;
Not working: SELECT a, sum(userfunc(systemfunc)) FROM t GROUP BY a;

Long form question:

I have a user-defined C function called "hits", loosely based on the
function "rankfunc" from the FTS3 example code, but returning (using
sqlite3_result_int) the total number of hits per row from the
matchinfo (code below).


As things are now, you can't use any of the FTS auxiliary functions
as arguments to an aggregate function.

The easiest workaround is probably to use a temp table to store the
unaggregated results of the FTS query.

What about a nested query?

SELECT r, sum(h) FROM (
SELECT rowid>>32 r, hits(matchinfo(text)) h
FROM text WHERE content MATCH 'a')
GROUP BY r;

Avoids the need to declare a temp table explicitly, and the optimizer
might even be able to avoid materializing the subquery.

... though if it turns out the above mechanical query rewrite works, I
would start to wonder why the original syntax can't be supported...


Actually that will cause a problem too, as the optimizer will
flatten the sub-query.

The error message will be something like "unable to use function
matchinfo() in the requested context".
So either (a) the nested query has different semantics than creating a 
temp table and selecting from it, or (b) the flattened version of the 
query has different semantics from the nested one?


Sounds like a bug.

Ryan



Re: [sqlite] Combining user-defined functions with aggregate functions

2012-10-10 Thread Ryan Johnson

On 10/10/2012 10:49 AM, Dan Kennedy wrote:

On 10/10/2012 08:14 PM, Hamish Allan wrote:

Short form question:

Working: SELECT a, userfunc(systemfunc) FROM t;
Working: SELECT a, sum(systemfunc) FROM t GROUP BY a;
Not working: SELECT a, sum(userfunc(systemfunc)) FROM t GROUP BY a;

Long form question:

I have a user-defined C function called "hits", loosely based on the
function "rankfunc" from the FTS3 example code, but returning (using
sqlite3_result_int) the total number of hits per row from the
matchinfo (code below).


As things are now, you can't use any of the FTS auxiliary functions
as arguments to an aggregate function.

The easiest workaround is probably to use a temp table to store the
unaggregated results of the FTS query.

What about a nested query?

SELECT r, sum(h) FROM (
SELECT rowid>>32 r, hits(matchinfo(text)) h
FROM text WHERE content MATCH 'a')
GROUP BY r;

Avoids the need to declare a temp table explicitly, and the optimizer 
might even be able to avoid materializing the subquery.


... though if it turns out the above mechanical query rewrite works, I 
would start to wonder why the original syntax can't be supported...


Ryan



Re: [sqlite] index has higher cache priority than data?

2012-09-21 Thread Ryan Johnson

On 21/09/2012 7:54 AM, Clemens Ladisch wrote:

John Bachir wrote:

i've read other posts on this list that say that we can't guess what sqlite
will do with cache.

It uses a simple LRU algorithm to determine which pages to kick out of
the page cache first (so at least it's somewhat deterministic).


however, could i be relatively confident that most of the time, it
will prioritize keeping the index in memory before it starts keeping
the data?

The page cache does not know what is in the pages.

Let's look at a simple example: assume the index has two pages, X and Y,
which each point to records in three data pages:
   X -> A,B,C; Y -> D,E,F

The order in which the pages would be used is this:
   X A X B X C Y D Y E Y F

For LRU, the last usage matters, so the LRU list will look like this:
   A B X C D E Y F

So the data pages _will_ crowd out the index pages (especially when
there are many fewer index than data pages).


ideally it would always keep the entire index in memory and never
cache the data.

if i can't more or less depend on this, then sqlite probably won't
work for my application.

You could write your own page cache implementation that wraps the
original one but never throws out certain pages ...

This might help: http://www.sqlite.org/capi3ref.html#sqlite3_pcache
By implementing a custom page cache using this API, an application can 
better control the amount of memory consumed by SQLite, the way in 
which that memory is allocated and released, and the policies used to 
determine exactly which parts of a database file are cached and for 
how long.


AFAICT, a pluggable cache only needs to worry about memory, with all I/O 
handled by sqlite3. It shouldn't be too hard to cook up a minimal 
version. I'm a bit doubtful on the "exactly which parts of a file" 
claim, since the API doesn't tell you anything about the pages it asks 
you to cache. However, the "clock" algorithm would probably do what you want, 
without needing to know which pages actually belong to an index: it 
prefers to evict pages that have been touched the fewest times (with 
decay), rather than those that have gone the longest since their last 
touch.


Advantages:
- Popular pages are hard to evict, but become unpopular if left 
untouched too long
- Simpler code (a for loop and a simple counter at each cache slot, vs. 
some sorted data structure for LRU)
- Lower runtime overhead (amortized constant cost per unpin vs. 
logarithmic cost for LRU)


To implement Clock: arrange the cache slots in a circle, and keep a 
remembered position, the "hand." Whenever the system needs to allocate a 
new page, the "hand" sweeps around the circle looking for an unpinned 
page with zero touch count, and evicts the first such page; the touch 
count increments whenever the page is unpinned, and decrements whenever 
the "hand" passes it. Any page unpinned at least once per clock cycle 
will remain in cache, with memory pressure making clock cycles shorter.
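
A minimal sketch of the sweep, assuming a fixed array of slots and at 
least one unpinned page (types and names made up):

typedef struct CacheSlot {
  int nPin;     /* >0 while the page is in use and may not be evicted */
  int nTouch;   /* bumped on unpin, decayed as the hand passes */
  void *pPage;
} CacheSlot;

/* Returns the index of the victim slot; *pHand persists across calls. */
static int clockEvict(CacheSlot *aSlot, int nSlot, int *pHand){
  for(;;){
    CacheSlot *p = &aSlot[*pHand];
    *pHand = (*pHand + 1) % nSlot;               /* advance the hand */
    if( p->nPin>0 ) continue;                    /* pinned: skip */
    if( p->nTouch==0 ) return (int)(p - aSlot);  /* victim found */
    p->nTouch--;                                 /* second chance */
  }
}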


Note that, while any given eviction can require looking at multiple 
pages, it usually averages out to only a few pages per eviction. The 
worst case would be if an adversary touched N-1 popular pages once for 
each unpopular page it fetches: the unpopular page would be evicted 
every time, but clock would have to sweep the whole pool to discover 
this. Even then, though, you pay cost N to evict a page once every N 
unpins.


Ryan



Re: [sqlite] Initial read speed greater than subsequent

2012-07-13 Thread Ryan Johnson

On 13/07/2012 5:37 PM, Udi Karni wrote:

Hello,

Running on Windows 7 - I am noticing that tables in :memory: DBs are read
(SELECTED) at a constant rate. However - conventional DBs on disk - even on
SSD - are read fast the first time, and much slower subsequently. Closing
and reopening a DB for every SQL statement seems to cure this - but
obviously is not a solution when you want to run a multi-step SQL script on
a database.

Is this a Windows "feature" - caching or otherwise? Is it Sqlite? Looking
at perfmon - the initial read "chunk" is > 100K while subsequently it's 4K
(the page size). Is there some prefetching taking place the first time
around? How do you make it permanent? How do you make Sqlite consistently
table-scan from disk?
This is a common problem with database buffer caches unless great pains 
are taken to avoid it (as in, not a "lite" product).


What happens is that the first time through the data, it's read 
sequentially... but not all of it fits in the page cache. What gets 
evicted is *not* sequential, so on the second time around the disk 
requests are randomly scattered and take about 100x longer to complete.


Ryan



Re: [sqlite] SQLite4 key encoding bug

2012-07-10 Thread Ryan Johnson

On 10/07/2012 9:58 AM, bardzotajneko...@interia.pl wrote:

SQLite version 4.0.0 2012-07-07 12:21:48
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table t(a);
sqlite> insert into t values(123);
sqlite> insert into t values(123.0);
sqlite> insert into t values(12323);
sqlite> insert into t values(12323.0);
sqlite> select * from t group by 1;
123.0
12323
12323.0
sqlite>
This is correct: 123 != 123.0 != "123" because they all have different 
types (int/float/string). Either create the table with typed columns to 
encourage all values to have the same type [1] or cast the group by key 
to coerce all values to the same type (not sure what happens if the type 
cast fails).
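
For instance, something like the following forces a single type before 
grouping (a sketch; test first what a failed cast does to your data):

sqlite> select * from t group by cast(a as real);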


[1] See the SQLite docs for rules about automatic type conversions for 
typed columns, it's not a panacea.


Ryan



Re: [sqlite] SQLite4 (don't scream)

2012-06-28 Thread Ryan Johnson

On 28/06/2012 12:30 PM, Cory Nelson wrote:

On Thu, Jun 28, 2012 at 11:20 AM, Stephan Beal wrote:


On Thu, Jun 28, 2012 at 5:57 PM, Simon Slavin 
wrote:


Now the URL:



Just thought some people might enjoy reading and thinking about it.


FWIW, my 0.02 Euros regarding this line:

"SQLite4 makes use of standard data types such as size_t, int64_t,
uint64_t,
and others."


size_t does not have a specified size and causes all sorts of grief in
porting i/o-based APIs between 32/64 bits, in my experience. PLEASE use the
fixed-size integers defined in inttypes.h, and not size_t. There is of
course one notable caveat: MSC does not support inttypes.h/stdint.h BUT
there are free drop-in replacements available here:
http://code.google.com/p/msinttypes/


stdint was made available in VC++ 2010, though inttypes is still missing.
Probably not an issue -- I'm not sure how a public API would need inttypes
anyway. Also, perhaps you are seeing size_t be misused. A blanket "please
don't use" is nonsense.
I tend to agree with Simon on this. size_t is only useful when 
expressing the amount of memory something might involve, when 
constrained only by the size of the current machine's address space. 
Major examples would include sizeof(), malloc(), and strlen(). C++ 
std::vector::size() is an anti-pattern, since the number of elements in 
a vector is not a number of bytes. However, size_t isn't particularly 
helpful even when used "correctly." Any 32-bit portable code can safely 
use uint32_t everywhere: it's equivalent to size_t on 32-bits, and it 
will silently convert to size_t whenever needed on a 64-bit machine; if 
the code is 64-bit only -- most likely to allow for allocations larger 
than 4GB -- then uint64_t can be used everywhere instead.
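
A small illustration of that conversion argument (the function name is 
made up):

#include <stdint.h>
#include <string.h>

/* A 32-bit-clean interface: the caller passes an explicit uint32_t,
** which widens implicitly wherever the C library wants a size_t. */
static void copy_n(char *dst, const char *src, uint32_t n){
  memcpy(dst, src, n);  /* uint32_t -> size_t: lossless on 32 and 64 bit */
}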


$0.02
Ryan




Re: [sqlite] 100% CPU utilization sqlite running in VMWare

2012-06-06 Thread Ryan Johnson

On 06/06/2012 7:45 AM, Simon Slavin wrote:

On 6 Jun 2012, at 12:00pm, IQuant  wrote:


We need to be able to run 1000's of extractors concurrently processing
different tick tapes and symbol sets. aka service bureau. The Daily
tick tapes are approx 20gb each.. 30TB repository and growing. An
extraction run take 1 - 5 minutes for small symbol sets per tape.

An example would be to concurrently extract 5 years of a particular
stock's tick data. 1500 days x 2 minute extraction jobs across 15 VMs
each running 100 extractors.

I'll try journal mode off and increasing page size.

Your problem appears to relate to your virtualisation layer.  You yourself 
reported

"previously ran fine and fast on bare metal."

I see no indication that SQLite is doing anything weird here.  The only problem 
you're reporting is 100% CPU utilisation which is not, of itself, a problem.  
It just means that CPUs are being utilised very thoroughly which is good in a 
process that munches through lots of data.
Unless pegging the CPU causes the hypervisor to preempt the guest while 
it holds synchronization primitives, causing priority inversion, in which 
case the OP's throughput would plummet. If throughput remains reasonably stable 
in the VM, though, I would agree that a pegged CPU isn't necessarily a 
problem. Another possibility: the guest OS idle loop will probably count 
as CPU usage on the host, and with heavy load the guest won't 
necessarily attempt to park the cores it was using in between scheduling 
decisions; on bare metal, the spinning idle loop doesn't get reported as 
CPU usage, even though it is.


Ryan


Re: [sqlite] [PATCH] add malloc and alloc_size attributes to allocation functions

2012-06-02 Thread Ryan Johnson

The list strips attachments... you might try sqlite-dev?
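
For anyone following along, the two attributes look like this in use 
(GCC/Clang syntax; the function below is illustrative, not the actual 
patch):

#include <stdlib.h>

/* malloc: the returned pointer aliases nothing the caller already has.
** alloc_size(1): the usable size is given by the first argument. */
__attribute__((malloc, alloc_size(1)))
void *xMallocZero(size_t n){
  return calloc(1, n);
}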

On 01/06/2012 5:31 PM, Nuno Lopes wrote:

Hi,

Please find in attach a patch that adds support for the malloc and 
alloc_size function attributes.
The malloc attribute specifies that a function behaves like malloc 
(i.e., it returns a fresh pointer to an allocated block). It's mainly 
used for optimization purposes.
The alloc_size attribute specifies that a given function returns a 
buffer of the size given in the specified argument. It's used to 
generate warnings for e.g. buffer overflows. Clang will also take 
advantage of this when instrumenting code for run-time buffer overflow 
detection.


Nuno




Re: [sqlite] 64-bit precompiled command-line shell binary for Windows 7

2012-05-01 Thread Ryan Johnson

The list strips all attachments... you might want to PM it.

On 30/04/2012 12:23 PM, Baruch Burstein wrote:

I haven't tested it (I only have a 32-bit system), but here you go. Just
change the extension to .exe

On Mon, Apr 30, 2012 at 4:28 PM, Udi Karni  wrote:


Dear Sqlite development team,

I'd like to add a vote for requesting a 64-bit precompiled command-line
shell binary (sqlite3.exe) for Windows 7 - to be added to the list of files
downloadable from the page  .

It would be very helpful for those of us who are not versed with compiling
C++ code.

I have seen a tremendous performance boost from the use of "memory
temp_store" and "memory database". The 64-bit version would allow them to
go beyond 2GB.

Is that simple? Not so simple? If you could elaborate - it would be much
appreciated.

Thank you !

Udi


Re: [sqlite] Join-source issue when using sub '(' join-source ')'

2012-03-19 Thread Ryan Johnson

On 19/03/2012 12:07 PM, TAUZIN Mathieu wrote:

Thanks for your support !



SQL Ansi (and every major DB: SqlServer, Oracle) supports this syntax, 
as described in the SQLite documentation.


The parentheses are here to enforce the priority of the joins.
FYI, sqlite3 does not respect parentheses as a way of enforcing join 
orders. If you want to bypass the optimizer, the accepted way (as with 
postgres, I believe) is to use a CROSS JOIN rather than the default 
[INNER] JOIN. You may want to test whether the above-mentioned engines 
actually respect parentheses for join ordering.
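
For example (illustrative tables; the predicates are the same either way):

-- ordinary join: the planner may reorder a, b, c freely
SELECT * FROM a JOIN b ON a.x=b.x JOIN c ON b.y=c.y;

-- CROSS JOIN: a stays outermost, then b, then c
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.x=b.x AND b.y=c.y;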


Whether the parentheses as used should or should not introduce a scope, 
I'm not prepared to make any claims about...


Ryan



Re: [sqlite] INSERT OR REPLACE

2012-03-09 Thread Ryan Johnson

On 08/03/2012 6:15 PM, Nico Williams wrote:

On Thu, Mar 8, 2012 at 4:37 PM, Simon Slavin  wrote:

That's why you don't make a DBMS (SQL) do the job of a programming language.  
Use your programming

language to to retrieve the values you need to make your calculations.
  Then use your programming language to figure out the new values and
execute the UPDATE commands needed to make the changes.

Lots of people want to and do use SQL to do a lot of what a regular
programming language can do.

Some are even taking SQL and applying it to new problems.  See ql.io
for example.

One very nice thing about using SQL to do more is that you can do a
lot of prototyping from a SQL shell and then start on the scaffolding
around it in whatever other language, and if you do it this way you'll
probably find yourself doing a lot of programming in SQL.


SQL is not a programming language.  Trying to use it like one leads to 
ridiculous contortions which take longer to run than doing it all in SQL would 
take.  There's nothing magically fast about SQL, it still has to do the same 
amount of work in the end.

SQL was not, originally, a Turing complete language (unless one were
to consider cross self joins of large tables for driving iteration as
good enough), but nowadays it pretty much is, therefore it  is a
programming language.  The language supported by SQLite3 is Turing
complete, FYI, since you have conditionals (WHERE, WHEN, CASE, ...)
and iteration (e.g., via recursive triggers).  And the SQLite3 VM most
assuredly is Turing complete.

Don't forget stored procedures.

I'd dearly love to have even a simple stored procedure language in 
sqlite3... the VM supports *everything* it needs, even function 
call/return. Triggers already connect multiple statements, joins open 
and iterate over cursors, etc. There's just no parser support to expose 
the functionality in a nice way.


Perhaps a Google summer of code project or something?


But, really, it's quite a hole that I can INSERT INTO table SELECT
  but I can't UPDATE table.

++, though I don't know of a non-kludgy syntax that would make this happen
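
The usual workaround is a correlated subquery (illustrative names) -- 
serviceable, but clunky:

UPDATE t SET val = (SELECT s.val FROM src s WHERE s.id = t.id)
WHERE EXISTS (SELECT 1 FROM src s WHERE s.id = t.id);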


INSERT OR REPLACE is almost a very good alternative.
The semantics are subtle enough to worry me, just based on what we've 
discussed so far.


Ryan



Re: [sqlite] Surprising INTEGER PRIMARY KEY Behavior

2012-02-16 Thread Ryan Johnson

On 16/02/2012 1:26 AM, Pete wrote:

When creating a new table, I accidentally assigned a default value of
CURRENT_DATE to an INTEGER PRIMARY KEY AUTOINCREMENT field and was
surprised that I didn't get an error on CREATE TABLE as a result.  It seems
that the default is ignored as an INSERT with DEFAULT VALUES assigned the
correct value to the primary key field.  IS this expected behavior?
My guess would be that autoincrement takes precedence over a normal 
default, so the default value is never requested?


Ryan



Re: [sqlite] ALTER TABLE

2012-02-06 Thread Ryan Johnson

On 06/02/2012 1:59 PM, Bill McCormick wrote:

Nico Williams wrote, On 2/6/2012 12:44 PM:
On Mon, Feb 6, 2012 at 9:36 AM, Bill 
McCormick  wrote:
Is there no way to force columns added to a table with alter table 
to be

added at certain column positions?

Alternatively, if there is some way to save the data in an existing 
table;
drop the table; re-create the table with the desired schema; and 
then reload
the data, this would be useful as well. However, I cannot see how to 
do this

simply.

If order of columns is only important for aesthetic reasons you might
just use a ALTER TABLE to add the column at the end of the list and
then a VIEW to provide the view you prefer.

The order is not important. What is important is that I come up with 
some way to manage version updates. I've tried doing something similar 
in the past using an "alter tables" script (using a different DB). The 
script assumed some base version of schema was present, and then 
proceeded adding new schema if it didn't already exist. It probably 
seemed like a good idea at the time (to whomever started it), but as 
time went on this script grew more and more unmanageable and I dreaded 
having to use it.
You might exploit PRAGMA user_version to help you track future changes, 
though that wouldn't necessarily help with the existing mess.
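
A sketch of the pattern -- the application reads the version at startup 
and applies only the steps it is missing (column names made up):

PRAGMA user_version;                  -- reports 0 on a fresh database

-- if it reported 0:
ALTER TABLE t ADD COLUMN added_in_v1;
PRAGMA user_version = 1;

-- if it reported 1:
ALTER TABLE t ADD COLUMN added_in_v2;
PRAGMA user_version = 2;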


Ryan



Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-23 Thread Ryan Johnson

On 23/01/2012 7:24 PM, Ryan Johnson wrote:

On 23/01/2012 3:09 PM, Ryan Johnson wrote:

On 23/01/2012 12:51 PM, Richard Hipp wrote:
On Mon, Jan 23, 2012 at 12:48 PM, Simon 
Slavin<slav...@bigfraud.org>  wrote:
I don't know if Dr Hipp is pursuing this privately or expecting it 
to be

solved collaboratively on this list.


I don't have a test case to work on.
My database file is ~150MB, but was generated by the TPC-H data 
generator program. Assuming a linux-like environment (including 
cygwin), the following will reproduce the setup in under five minutes:
The problem persists with a freshly-generated database on my machine, 
using a just-compiled sqlite-3.7.10.


OK, it looks like I didn't install the 3.7.10 binary properly, because 
the situation is significantly different once ANALYZE has run.


The first query from the OP is remarkably similar to a query mentioned 
in where.c (from the 3.7.5 sources, which I happened to have handy):
Previous versions of SQLite [did not always find the lowest-cost 
plan] for scripts such as the following:

**
**   CREATE TABLE t1(a, b);
**   CREATE TABLE t2(c, d);
**   SELECT * FROM t2, t1 WHERE t2.rowid = t1.a;
**
The best strategy is to iterate through table t1 first. However it is 
not possible to determine this with a simple greedy algorithm. Since 
the cost of a linear scan through table t2 is the same as the cost of 
a linear scan through table t1, a simple greedy algorithm may choose 
to use t2 for the outer loop, which is a much costlier approach.


Re-casting my first query as the above gives:
sqlite> explain query plan select * from customer t2, orders t1 where 
t2.rowid = t1.custkey;

0|0|0|SCAN TABLE customer AS t2 (~15000 rows)
0|1|1|SEARCH TABLE orders AS t1 USING INDEX OrderCustomers 
(custKey=?) (~15 rows)


The same query plan is chosen if t1 comes first; execution times and 
stats confirm the improvement.


For query 2, the situation is less clear. First of all, it appears to 
execute more than 4x faster overall (10s and .2s -- WOW!) but the 
optimizer still seems to choose the wrong plan (13.6M rows examined):

1|0|0|SCAN TABLE orders (~37500 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|2|SCAN SUBQUERY 1 AS Y (~37500 rows)
0|1|0|SCAN TABLE orders AS O (~75000 rows)
0|2|1|SEARCH TABLE lineitem AS L USING INDEX LineItemOrders 
(orderKey=?) (~2 rows)


If I force the proper ordering, "X cross join Y", then the following, 
better-than-expected plan is used instead (only 300k rows examined!):

1|0|0|SCAN TABLE orders (~37500 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|0|SCAN TABLE orders AS O (~75000 rows)
0|1|1|SEARCH TABLE lineitem AS L USING INDEX LineItemOrders 
(orderKey=?) (~5 rows)

0|2|2|SCAN SUBQUERY 1 AS Y (~18750 rows)


The main difference seems to be that Y should be the innermost loop 
but isn't.

Update: Running the different pieces alone shows:
- The scan of Orders for Y returns 5681 rows (not 37k)
- The temp B+Tree collapses that to 90 distinct rows
- The scan of Orders for X returns 104k rows (75k is the right ball-park)
- Each order averages 4 line items (5 was a pretty good estimate)
- Subquery 1 as Y has 90 rows (not 18k or 37k)

Given the above, the problem seems to lie with cardinality estimation, 
particularly because the estimate for DISTINCT's effectiveness was 
vastly overconservative. I assume the cardinality estimates in the two 
plans vary for the reason mentioned in where.c, that the estimated cost 
of a scan might not be as low when it's in the outer loop because 
certain (index-related?) cardinality reductions only happen in the 
innermost loop.


While I certainly won't complain if the above weakness could be fixed, 
at this point I'm satisfied that there's not an easy code tweak (or 
glaring omission in the SQL) to make the problem go away...


Ryan



Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-23 Thread Ryan Johnson

On 23/01/2012 3:09 PM, Ryan Johnson wrote:

On 23/01/2012 12:51 PM, Richard Hipp wrote:
On Mon, Jan 23, 2012 at 12:48 PM, Simon Slavin<slav...@bigfraud.org>  
wrote:
I don't know if Dr Hipp is pursuing this privately or expecting it 
to be

solved collaboratively on this list.


I don't have a test case to work on.
My database file is ~150MB, but was generated by the TPC-H data 
generator program. Assuming a linux-like environment (including 
cygwin), the following will reproduce the setup in under five minutes:
The problem persists with a freshly-generated database on my machine, 
using a just-compiled sqlite-3.7.10.


OK, it looks like I didn't install the 3.7.10 binary properly, because 
the situation is significantly different once ANALYZE has run.


The first query from the OP is remarkably similar to a query mentioned 
in where.c (from the 3.7.5 sources, which I happened to have handy):
Previous versions of SQLite [did not always find the lowest-cost plan] 
for scripts such as the following:

**
**   CREATE TABLE t1(a, b);
**   CREATE TABLE t2(c, d);
**   SELECT * FROM t2, t1 WHERE t2.rowid = t1.a;
**
The best strategy is to iterate through table t1 first. However it is 
not possible to determine this with a simple greedy algorithm. Since 
the cost of a linear scan through table t2 is the same as the cost of 
a linear scan through table t1, a simple greedy algorithm may choose 
to use t2 for the outer loop, which is a much costlier approach.


Re-casting my first query as the above gives:
sqlite> explain query plan select * from customer t2, orders t1 where 
t2.rowid = t1.custkey;

0|0|0|SCAN TABLE customer AS t2 (~15000 rows)
0|1|1|SEARCH TABLE orders AS t1 USING INDEX OrderCustomers (custKey=?) 
(~15 rows)


The same query plan is chosen if t1 comes first; execution times and 
stats confirm the improvement.


For query 2, the situation is less clear. First of all, it appears to 
execute more than 4x faster overall (10s and .2s -- WOW!) but the 
optimizer still seems to choose the wrong plan (13.6M rows examined):

1|0|0|SCAN TABLE orders (~37500 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|2|SCAN SUBQUERY 1 AS Y (~37500 rows)
0|1|0|SCAN TABLE orders AS O (~75000 rows)
0|2|1|SEARCH TABLE lineitem AS L USING INDEX LineItemOrders 
(orderKey=?) (~2 rows)


If I force the proper ordering, "X cross join Y", then the following, 
better-than-expected plan is used instead (only 300k rows examined!):

1|0|0|SCAN TABLE orders (~37500 rows)
1|0|0|USE TEMP B-TREE FOR DISTINCT
0|0|0|SCAN TABLE orders AS O (~75000 rows)
0|1|1|SEARCH TABLE lineitem AS L USING INDEX LineItemOrders 
(orderKey=?) (~5 rows)

0|2|2|SCAN SUBQUERY 1 AS Y (~18750 rows)


The main difference seems to be that Y should be the innermost loop but 
isn't.


Thoughts?
Ryan




Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-23 Thread Ryan Johnson

On 23/01/2012 12:51 PM, Richard Hipp wrote:

On Mon, Jan 23, 2012 at 12:48 PM, Simon Slavin<slav...@bigfraud.org>  wrote:


On 23 Jan 2012, at 4:30pm, Ryan Johnson wrote:


Bump?



   I don't know if Dr Hipp is pursuing this privately or expecting it to be
solved collaboratively on this list.


I don't have a test case to work on.
My database file is ~150MB, but was generated by the TPC-H data 
generator program. Assuming a linux-like environment (including cygwin), 
the following will reproduce the setup in under five minutes:


1. Download http://www.tpc.org/tpch/spec/tpch_2_13_0.tar.gz and extract
   the files (warning: they unpack to the current directory by default)
2. Copy/rename makefile.suite to Makefile
3. Edit 'Makefile' and set the following four variables (about 100
   lines into the file, just past the revision history):
 * CC=$YOUR_COMPILER
 * DATABASE=SQLSERVER
 * MACHINE=LINUX
 * WORKLOAD=TPCH
4. Edit 'driver.c' and divide all large magic numbers by 10 (lines
   181-195):
 * Example: {"part.tbl", "part table", 2, /* originally 20 */
5. Invoke 'make'
6. Invoke './dbgen -vf' (creates several .tbl files)
7. Create a new sqlite3 database using the attached tpch-schema.ddl file
8. Use .import to load customer.tbl and orders.tbl into the tables with
   the same name
9. Run ANALYZE (doesn't seem to make a difference)
10. Run the problem queries from the OP

The problem persists with a freshly-generated database on my machine, 
using a just-compiled sqlite-3.7.10.


Regards,
Ryan

create table Part(
partkey INTEGER PRIMARY KEY,
name,
mfgr,
brand,
type,
size INTEGER,
container,
retailprice REAL,
comment,
dummy);

create table Supplier(
suppKey INTEGER PRIMARY KEY,
name, address,
nationkey INTEGER,
phone,
acctbal REAL,
comment,
dummy);

create table PartSupp(
partKey INTEGER,
suppKey INTEGER,
availqty  INTEGER,
supplycost REAL,
comment,
dummy);

create table Customer(
CustKey INTEGER PRIMARY KEY,
name,
address,
nationkey INTEGER,
phone,
acctbal REAL,
mktsegment,
comment,
dummy);

create table Nation(
nationkey  INTEGER PRIMARY KEY,
name,
regionkey INTEGER,
comment,
dummy);

create table Region(
regionkey INTEGER PRIMARY KEY,
name,
comment,
dummy);

create table LineItem(
orderKey INTEGER,
partKey INTEGER,
suppKey INTEGER,
lineNumber INTEGER,
quantity INTEGER,
extendedPrice REAL,
discount REAL,
tax REAL,
returnFlag,
lineStatus,
shipDate DATETIME,
commitDate DATETIME,
receiptDate DATETIME,
shipInstruct,
shipMode,
comment,
dummy);

create table Orders(
orderKey INTEGER PRIMARY KEY,
custKey INTEGER,
orderStatus,
totalPrice REAL,
orderDate DATETIME,
orderPriority,
clerk,
shipPriority,
comment,
dummy);

create index SupplierNations on Supplier(nationkey);
create index CustomerNations on Customer(nationkey);
create index LineItemParts on LineItem(partkey);
create index LineItemSuppliers on LineItem(suppkey);
create index LineItemOrders on LineItem(orderkey);
create index OrderCustomers on Orders(custKey);
create index PartSuppSupp on PartSupp(suppkey);

create index OrderDate on Orders(orderDate);


Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-23 Thread Ryan Johnson

On 23/01/2012 12:48 PM, Simon Slavin wrote:

On 23 Jan 2012, at 4:30pm, Ryan Johnson wrote:


Bump?

On 21/01/2012 2:47 PM, Ryan Johnson wrote:

On 21/01/2012 2:44 PM, Simon Slavin wrote:

On 21 Jan 2012, at 7:23pm, Ryan Johnson wrote:


It's a SQL89 join. Sqlite docs say SQL89 and SQL92 joins perform identically 
[1], which I confirmed before sending the OP.

Oh.  Okay.  If it spits out the same EXPLAIN QUERY PLAN then SQLite is 
interpreting it the same way.

Which brings us back to the original question: why does sqlite spit out a bad 
query plan when a vastly better one exists? There's no clear reason the better 
answer should have been hard to find.

Well, the approach would be to find a specific query, post the query and query 
plan, then to simplify the query until the strangeness in the query plan goes 
away.  The last simplification is the one that caused the problem.


On 21/01/2012 12:49 PM, Ryan Johnson wrote:

consider the following query:

select count(*) from orders O, Customer C where C.custkey=O.custkey 
and C.name like '%115';


.stats/explain reports 14 fullscan steps for the query plan:

0|0|TABLE orders AS O
1|1|TABLE Customer AS C USING PRIMARY KEY


Putting Customer first in the FROM clause makes the query markedly 
faster and executes only 14999 fullscan steps. The query plan confirms 
the change:

0|0|TABLE Customer AS C
1|1|TABLE orders AS O WITH INDEX OrderCustomers


In what way is the above query from the OP not sufficient to 
demonstrate the problem? There are only two tables involved. There are 
only two relevant indexes -- the automatically-created PK and the 
manually created index the query *should* use; I don't think it's 
reasonable to suggest dropping the primary key index. Changing 
"count(*)" to "*" doesn't change the chosen access path for the query. 
Removing the predicate doesn't affect the access path either, though it 
does increase by 1000x the number of rows returned (runtime stays about 
the same).


Given that ANALYZE ran and established |Customer|=15k vs. |Orders|=150k, 
why does the optimizer choose to do a full table scan of Orders, instead 
of scanning Customer and accessing Orders using the index created 
specifically for that purpose? (It's even marked as a foreign key in the 
schema, though I'm pretty sure sqlite doesn't pay attention to that). 
This seems a cut-and-dried join reordering scenario that the optimizer 
should eat for lunch, and instead it seems to limit its search to 
selecting the best index(es) to use given the join ordering the original 
SQL specified. If the optimizer were merely making a bad decision -- 
which the second example of the OP seems to demonstrate -- then it would 
reorder (wrongly) the "correct" version of the SQL; it does not.


Thoughts?
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-23 Thread Ryan Johnson

Bump?

On 21/01/2012 2:47 PM, Ryan Johnson wrote:

On 21/01/2012 2:44 PM, Simon Slavin wrote:

On 21 Jan 2012, at 7:23pm, Ryan Johnson wrote:

It's a SQL89 join. Sqlite docs say SQL89 and SQL92 joins perform 
identically [1], which I confirmed before sending the OP.
Oh.  Okay.  If it spits out the same EXPLAIN QUERY PLAN then SQLite 
is interpreting it the same way.
Which brings us back to the original question: why does sqlite spit 
out a bad query plan when a vastly better one exists? There's no clear 
reason the better answer should have been hard to find.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-21 Thread Ryan Johnson

On 21/01/2012 2:44 PM, Simon Slavin wrote:

On 21 Jan 2012, at 7:23pm, Ryan Johnson wrote:


It's a SQL89 join. Sqlite docs say SQL89 and SQL92 joins perform identically 
[1], which I confirmed before sending the OP.

Oh.  Okay.  If it spits out the same EXPLAIN QUERY PLAN then SQLite is 
interpreting it the same way.
Which brings us back to the original question: why does sqlite spit out 
a bad query plan when a vastly better one exists? There's no clear 
reason the better answer should have been hard to find.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Linking a "C" program with SQlite3.DLL

2012-01-21 Thread Ryan Johnson

On 21/01/2012 1:46 PM, Bruce Steele wrote:

I suspect this is a very basic question that is answered somewhere, but I have 
done lots of searches and have not been able to find a good answer.
I am trying to compile a C program using the SQLite3 APIs.
The program contains an #include <sqlite3.h> line.
I am using the mingw (GCC) compiler on a Win7 PC.

What I am looking for is the correct command line, including all the correct 
switches in the correct order.
I use cygwin on win7, but this should do (works for me):

gcc your-prog.c -lsqlite3


Naturally, you probably want to add some flags like '-g -Wall' but 
that's orthogonal.



I either get the WinMain16 linker error or file not found error with all the 
compile commands I have tried.
That means you don't define a main() function in (any of) your 
source(s), which has nothing to do with sqlite. You might try a general 
programming help list if that's your "main" problem.


Hint: posting small snippets of code that show the problem, and the 
resulting error messages, helps folks help you.


Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Automatic join reordering doesn't seem to work?

2012-01-21 Thread Ryan Johnson

On 21/01/2012 1:01 PM, Simon Slavin wrote:

On 21 Jan 2012, at 5:49pm, Ryan Johnson wrote:

In one case the optimizer seems to make a different decision depending on which 
order I write the join in; in the other case, the join ordering chosen is bad 
and compounded by an expensive subquery not being materialized into a transient 
table as it should be.

For the first issue, consider the following query:

select count(*) from orders O, Customer C where C.custkey=O.custkey and C.name 
like '%115';

I see no JOIN.  I see a WHERE.  And it's not clear to me what you're trying to 
count: orders or customers.
It's a SQL89 join. Sqlite docs say SQL89 and SQL92 joins perform 
identically [1], which I confirmed before sending the OP. The older 
syntax just requires less typing, that's all.
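
For concreteness, the two spellings of that join, which the optimizer
documentation treats as interchangeable:

-- SQL89:
select count(*) from orders O, Customer C
where C.custkey=O.custkey and C.name like '%115';

-- SQL92:
select count(*) from orders O join Customer C on C.custkey=O.custkey
where C.name like '%115';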


The query counts orders made by qualifying customers.

Ryan

[1] http://www.sqlite.org/optoverview.html#joins

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Automatic join reordering doesn't seem to work?

2012-01-21 Thread Ryan Johnson

Hi all,

I'm playing around with a small TPC-H dataset (scale factor 100) in 
sqlite-3.7.3, and have noticed that several of the optimizations 
described at http://www.sqlite.org/optoverview.html don't seem to take 
effect, even after running ANALYZE.


In one case the optimizer seems to make a different decision depending 
on which order I write the join in; in the other case, the join ordering 
chosen is bad and compounded by an expensive subquery not being 
materialized into a transient table as it should be.


For the first issue, consider the following query:

select count(*) from orders O, Customer C where C.custkey=O.custkey and 
C.name like '%115';


Then .stats/explain reports 14 fullscan steps for the query plan:

0|0|TABLE orders AS O
1|1|TABLE Customer AS C USING PRIMARY KEY


Putting Customer first in the FROM clause makes the query markedly 
faster and executes only 14999 fullscan steps. The query plan confirms 
the change:

0|0|TABLE Customer AS C
1|1|TABLE orders AS O WITH INDEX OrderCustomers


Cardinalities of the tables are customer:15k and orders:150k, so I would 
expect any predicate on Customer to get the optimizer's attention. If 
the LIKE clause was just confusing the optimizer's cardinality estimates 
then I would have expected it to always choose the same query plan, but 
it doesn't.


Note that the index referenced above corresponds to a foreign key in the 
schema

CREATE INDEX OrderCustomers on Orders(custKey);
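
A hedged workaround sketch: SQLite documents CROSS JOIN as an explicit
"do not reorder these tables" hint, so naming Customer first and joining
with CROSS JOIN pins the faster plan regardless of what the optimizer
would pick on its own:

select count(*)
from Customer C cross join orders O
where C.custkey=O.custkey and C.name like '%115';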



A second problem lies with non-flattened nested queries: instead of 
materializing the result in a transient table, sqlite reruns the query 
for each tuple in the other relation, even though the query cannot 
possibly be correlated:


select count(*)
from
  (select
     julianday(O.orderdate) ordered,
     julianday(L.receiptdate) received
   from orders O, lineitem L
   where
     L.orderkey=O.orderkey
     and ordered >= julianday('1994-01-01')
     and received < julianday('1994-04-01')
  ) X,
  (select distinct(julianday(orderdate)) d
   from orders
   where
     d >= julianday('1994-01-01')
     and d < julianday('1994-04-01')
  ) Y
where Y.d between X.ordered and X.received;

The first subquery has cardinality 5918 and examines 150k rows, while 
the second has cardinality 90 and examines 150k rows; the overall query 
should therefore examine 150k+150k+90*5900 = ~830k rows. Instead, it 
takes 45s and 13650087 fullscan steps to run, or roughly 90*150k + 150k 
+ 90 (the cost of evaluating Y, iterating over Y, and running X once per 
row in Y).


Reordering the query doesn't help (X really should go first but the 
optimizer insists on Y). Disabling join optimization (x cross join y) 
cuts the query's cost to 1.6s and 827k rows.
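
Spelled out, that cross-join rewrite (same subqueries, X pinned to the
outer position) is:

select count(*)
from
  (select
     julianday(O.orderdate) ordered,
     julianday(L.receiptdate) received
   from orders O, lineitem L
   where
     L.orderkey=O.orderkey
     and ordered >= julianday('1994-01-01')
     and received < julianday('1994-04-01')
  ) X
  cross join
  (select distinct(julianday(orderdate)) d
   from orders
   where
     d >= julianday('1994-01-01')
     and d < julianday('1994-04-01')
  ) Y
where Y.d between X.ordered and X.received;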


Is there something obvious I'm missing here? The second case, in 
particular, doesn't seem to depend on cardinalities: the non-correlated 
subquery should be materialized rather than re-run repeatedly 
(according to the docs, at least), at which point join ordering wouldn't 
matter nearly so much.


Thanks,
Ryan
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newbie question, how multiple joins are executed

2011-06-20 Thread Ryan Johnson
On 20/06/2011 6:59 AM, Richard Hipp wrote:
> On Mon, Jun 20, 2011 at 6:55 AM, Ryan Johnson<ryanj...@ece.cmu.edu>  wrote:
>> IIRC sqlite does *not* do any join ordering optimizations and simply
>> runs them in whatever order the query specifies. This can have
>> unfortunate effects on runtime for some queries. Can anyone verify this?
> The SQLite query optimizer started reordering joins beginning with SQLite
> version 3.2.3 in August 2005.
Oh, good. Searching the docs, I found where I was led astray: 
http://www.sqlite.org/vdbe.html (granted, the page does say it's many 
years out of date, but I forgot where I'd read it).

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newbie question, how multiple joins are executed

2011-06-20 Thread Ryan Johnson
On 19/06/2011 8:03 PM, Simon Slavin wrote:
> On 20 Jun 2011, at 12:58am, Lucas Cotta wrote:
>
>> I understand that for a query with a two-table join, SQLite will do a
>> nested loop join over those two tables.
>> But what about a query joining 5 tables?
>> Would it be like this?
>>
>> for (table1 lines) {
>>   for (table2 lines) {
>>     for (table3 lines) {
>>       ...
>>     }
>>   }
>> }
> You can use JOIN any number of times in a SELECT query.  SQLite will work out 
> an optimal way to access each table.  You can speed things up a great deal by 
> having useful INDEXes created.
IIRC sqlite does *not* do any join ordering optimizations and simply 
runs them in whatever order the query specifies. This can have 
unfortunate effects on runtime for some queries. Can anyone verify this?

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to write Query to do this?

2010-12-13 Thread Ryan Johnson
On 12/13/2010 11:17 AM, Stefano Mtangoo wrote:
> Thanks Dr., and I'm checking the zip file. But to be frank, I don't
> understand the calculation done below. How do I arrive at such a calculation
> (excuse my ignorance)? Also, how do I query with that simple BETWEEN?
> That seems to be the easiest way, but I haven't grasped it yet.
> Thanks
>
> On Mon, Dec 13, 2010 at 9:02 PM, Richard Hipp  wrote:
>
>> id = book_number*1000000 + chapter*1000 + verse.
The actual book is arranged in a hierarchy: books, then chapters, then 
verses. The above flattens the hierarchy in a way that preserves 
ordering: all verses in Genesis (1M <= v < 2M) have numbers smaller than 
any verse in Exodus (2M <= v < 3M), etc.

Think of it like a phone number: the first few digits give the country, 
the next few the area, and the last several identify whoever you're 
trying to call.
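
A hedged example of the resulting range query (assuming a verses table
keyed by this id, with Genesis as book 1): fetching Genesis chapter 2,
verses 4 through 7, becomes a single BETWEEN over the primary key:

select * from verses
where id between 1*1000000 + 2*1000 + 4
             and 1*1000000 + 2*1000 + 7;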

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Problem importing integers from csv

2010-12-09 Thread Ryan Johnson
On 12/9/2010 6:46 PM, Simon Slavin wrote:
> On 10 Dec 2010, at 12:20am, Ryan Johnson wrote:
>
>> On 12/9/2010 4:56 PM, Igor Tandetnik wrote:
>>> On 12/9/2010 6:42 PM, Ryan Johnson wrote:
>>>> create table a(x,y INTEGER);
>>> This creates a table with column x having no affinity, and column y
>>> having integer affinity. Is this what you meant?
>> Doh! That explains why I couldn't repro with only a single column...
> Yes, you got mixed up.  The command-line example you give works exactly as 
> expected.  The thing that may be confusing you is that the INSERT ... SELECT 
> form does work in an unexpected way.
>
>
>> Just for clarification, though, how do comparisons work when there is no
>> affinity? I thought it did a string compare, but that should have worked
>> in the above case...
>>
>> In general, is "1"=1 ever true? With all the talk of dynamic typing,
>> etc. I would have expected so, but now I wonder...
> You now have it right.  A string is never a number.  SQLite generally does 
> the conversion to a number when initially setting the value of the field.  
> This means it doesn't have to keep doing it for every comparison.  You may 
> find this useful:
>
> http://www.sqlite.org/datatype3.html
>
> for example, sequences like
>
> "When text data is inserted into a NUMERIC column, the storage class of the 
> text is converted to INTEGER or REAL (in order of preference) if such 
> conversion is lossless and reversible."
Yes, I'd read that, but the combination of two bugs (the x,y INT thing 
and misunderstanding the conversion rules) made it hard to figure out 
what was going on.
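
For the record, a minimal sketch of the two rules at play (x has no
affinity, so imported text stays text; y's INTEGER affinity converts
the text on insert):

create table t(x, y INTEGER);
insert into t values ('1', '1');
select x = 1, y = 1 from t;  -- returns 0|1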

Thanks a lot!
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Problem importing integers from csv

2010-12-09 Thread Ryan Johnson
On 12/9/2010 4:56 PM, Igor Tandetnik wrote:
> On 12/9/2010 6:42 PM, Ryan Johnson wrote:
>> create table a(x,y INTEGER);
> This creates a table with column x having no affinity, and column y
> having integer affinity. Is this what you meant?
Doh! That explains why I couldn't repro with only a single column...

Just for clarification, though, how do comparisons work when there is no 
affinity? I thought it did a string compare, but that should have worked 
in the above case...

In general, is "1"=1 ever true? With all the talk of dynamic typing, 
etc. I would have expected so, but now I wonder...

Thanks,
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Problem importing integers from csv

2010-12-09 Thread Ryan Johnson
On 12/9/2010 4:10 PM, Simon Slavin wrote:
> On 9 Dec 2010, at 11:04pm, Ryan Johnson wrote
>> I'm a new user to sqlite (I just compiled sqlite-autoconf-3070400 on
>> cygwin), and am running into what seems to be a bad interaction between
>> type affinity and integer comparisons.
>>
>> I'm importing a csv file full of single-digit integer values into a
>> table which .schema reports as "CREATE TABLE numbers(num INTEGER);"
> Can you confirm that you are using the '.import' command from the
> command-line tool to do this?
Yes.
> In your source .csv file, are the digits alone on their line or are they
> surrounded by single or double quotes?
Naked.

Here's a small test case... it turns out I can't repro with only a 
single-column table:

$ cat > test.csv <<END
> a natural join b:
> 1,1
> 2,3
> 4,5
> a natural join c:
> END

Thoughts? I can download the windows version if the above isn't enough.

Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Problem importing integers from csv

2010-12-09 Thread Ryan Johnson
Hi all,

I'm a new user to sqlite (I just compiled sqlite-autoconf-3070400 on 
cygwin), and am running into what seems to be a bad interaction between 
type affinity and integer comparisons.

I'm importing a csv file full of single-digit integer values into a 
table which .schema reports as "CREATE TABLE numbers(num INTEGER);"

As far as I can understand from the documentation, the imported values 
can be losslessly and reversibly converted to INTEGER, but comparison 
operations (=, in, natural join) against other integers fail until I do 
"UPDATE numbers SET num=0+num;"

Is this working as designed and I'm missing something obvious about the 
type system, or is this a bug?

Thanks,
Ryan

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users