Re: [sqlite] Low Level API for SQLite3

2006-05-10 Thread Nathan Kurz
On Thu, May 11, 2006 at 10:20:26AM +0530, Anish Enos Mathew wrote:
> 
> Dennis,
>Thank you so much for sending me the code. It works fine!! It
> took 8 sec to insert 10 records into the table where previously it
> was taking 30 sec to insert 1 record. Dennis, if you could do me a
> favour, can you explain to me how to retrieve data from the table? Or
> if you have the code for that, can you please send it to me...
> 

Anish,

Dennis has provided you with lots of help.  At a certain point (which
he may or may not have reached) he will want to stop providing this
help.  This is OK, but unfortunately he may also decide to stop
providing help to other people, like me.  This would be bad.  

I think it may be the time for you to do more research, try things
out, and then come back when you have more specific questions.  This
is of course only my personal opinion.  Here's a link that expresses
this more tactfully than I can: http://www.slash7.com/pages/vampires

Nathan Kurz
[EMAIL PROTECTED]




Re: [sqlite] Re: what is faster?

2006-04-15 Thread Nathan Kurz
On Sun, Apr 16, 2006 at 02:23:45AM -0300, Cesar David Rodas Maldonado wrote:
> On 4/15/06, Cesar David Rodas Maldonado <[EMAIL PROTECTED]> wrote:
> >
> >  Hello
> >
> > I have a software that uses four tables.
> >
> > that's ok.. but i wanna know if is faster if i use for every table a
> > database ( a file) or if is the same thing in velocity if a put all the
> > tables in one file, couse every table uses a lot of data.
> >
> > thanks

It is unlikely that multiple files would be faster.  It is also very
unlikely that splitting the tables into 4 separate databases would
improve the finished quality of your application.

> any one can answer me?

Your English is OK, but your question is very loose.  What platform
are you running on, why do you think SQLite would be the bottleneck,
and what do you mean by "a lot of data"?  For some people, a lot of
data is 100,000 small records; for others it means terabytes.  It's
difficult for anyone to answer without knowing more specifics.

Nathan Kurz
[EMAIL PROTECTED]






Re: [sqlite] DBD Sqlite

2006-04-04 Thread Nathan Kurz
On Tue, Apr 04, 2006 at 04:18:35PM -0700, Sripathi Raj wrote:
> On 4/4/06, Nathan Kurz <[EMAIL PROTECTED]> wrote:
> >
> > > >> 3. The performance for inserts is really bad. Around 40k entries
> > takes a
> > > >>few hours. What might I be doing wrong? I do a commit after
> > > >>all the inserts.
> >
> > That doesn't seem right for speed.  In addition to using "commit", are
> > you beginning a transaction with "begin"?  Are your inserts
> > particularly complex or large?  More details about what you are doing
> > would probably be good, since something odd is happening.
> > Maybe you could post a tiny test program along with the time it takes?
>
> I don't begin the transaction with begin. My assumption was that the first
> insert operation would automatically begin a transaction.

True, it does an implicit "BEGIN", but it also does an implicit
"COMMIT" when each command finishes.  Try adding an explicit "BEGIN;"
before you do the inserts: <http://www.sqlite.org/lang_transaction.html>
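
For what it's worth, the pattern at the C API level looks roughly
like this (a sketch with an invented table; from DBI the equivalent
is a $dbh->do("BEGIN") before the inserts):

#include <sqlite3.h>

/* Sketch: one explicit transaction wrapped around many inserts.
** The "items" table is invented for illustration. */
int bulk_insert(sqlite3 *db, const char **values, int n){
  sqlite3_stmt *stmt;
  int i;
  sqlite3_exec(db, "BEGIN", 0, 0, 0);            /* start one transaction */
  sqlite3_prepare(db, "INSERT INTO items(val) VALUES(?)", -1, &stmt, 0);
  for(i = 0; i < n; i++){
    sqlite3_bind_text(stmt, 1, values[i], -1, SQLITE_STATIC);
    sqlite3_step(stmt);
    sqlite3_reset(stmt);                         /* reuse the statement */
  }
  sqlite3_finalize(stmt);
  sqlite3_exec(db, "COMMIT", 0, 0, 0);           /* commit them all at once */
  return SQLITE_OK;
}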

--nate


Re: [sqlite] DBD Sqlite

2006-04-04 Thread Nathan Kurz
> >> 3. The performance for inserts is really bad. Around 40k entries takes a
> >>few hours. What might I be doing wrong? I do a commit after
> >>all the inserts.
> >
> > A few things to help with speed:
> >
> > 1. Use DBI's prepared statements; eg, 1 prepare() and many execute().
> 
>  Yes, this is what I do.
> >
> > 2. Don't commit for each row inserted but batch them so, say, you
> >commit once per 1000 rows.
> >
>  Unfortunately, I cannot commit till I do all the inserts.

That doesn't seem right for speed.  In addition to using "commit", are
you beginning a transaction with "begin"?  Are your inserts
particularly complex or large?  More details about what you are doing
would probably be good, since something odd is happening.
Maybe you could post a tiny test program along with the time it takes?

--nate



Re: [sqlite] Performance & database design

2006-03-21 Thread Nathan Kurz
On Wed, Mar 22, 2006 at 10:41:23AM +1100, John Stanton wrote:
> The mmap'd index was about three times faster than when it 
> used a LRU paged cache. 

I looked fairly closely into the possibility of using mmap for the
SQLite btree backend, and realized that it would be quite difficult.
Because the SQLite file format is host byte-order independent, it's
almost impossible to use mmap without a separate cache.  If one were
to give up on the cross-platform portability, I think one could get a
significant speedup on large file access, but it would be necessary to
write/adapt the entire backend rather than just making small changes.
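
To illustrate the problem (a sketch, not actual SQLite code): every
multi-byte integer in an SQLite file is stored big-endian, so even
with the file mmap'd, a little-endian host can't use the bytes in
place; every access needs a decode step along these lines:

#include <stdint.h>

/* Sketch: decode a big-endian 32-bit value from an mmap'd page.
** This per-access conversion is what a host-order format would avoid. */
static uint32_t get4byte(const unsigned char *p){
  return ((uint32_t)p[0]<<24) | ((uint32_t)p[1]<<16)
       | ((uint32_t)p[2]<<8)  |  (uint32_t)p[3];
}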

Nathan Kurz
[EMAIL PROTECTED]


Re: [sqlite] Re: concers about database size

2006-03-16 Thread Nathan Kurz
On Thu, Mar 16, 2006 at 11:34:31AM -0600, Jay Sprenkle wrote:
> > > You may legitimately need one really large table but most applications
> > > don't.
> 
> Too bad. My guess is that you're doing the right thing trying to consolidate.
> It's going to take expensive hardware no matter what you end up doing.
> Your design is larger than any I've ever heard of using sqlite. My guess is
> that it will work if you have enough RAM. 

Why would it require a lot of RAM?  I ask not because I doubt you, but
because my intuition says that a BTree based database should scale
pretty well.  While certainly it would run faster if you can fit the
whole thing in RAM, if the index can be made to fit in RAM it seems
like the data can be just about any size.

> I've used much larger databases than this on all of the big
> commercial database engines (sqlserver, db2, informix, oracle). Any
> of them will certainly work for you.

Again, my intuition would be that for single-user usage patterns SQLite
should show less drop-off in performance than those engines, not more.
Is there something about the way that SQLite handles large files that
would cause it to degrade faster than a commercial database?  Are you
saying he could get by with less hardware using a heavier weight database?

--nate


Re: [sqlite] Compatability issues with DBD::SQLite and SQLite v3.3.3 and 3.3.4

2006-03-06 Thread Nathan Kurz
On Mon, Mar 06, 2006 at 06:24:13PM -0800, Steve Green wrote:
> Hi,
> 
> I'm hoping that someone can shed some light on the following issue that
> I'm seeing.  When I attempt to create a temp table using DBD::SQLite (v1.11)
> and either SQLite v3.3.3 or v3.3.4, I get the following error:
> 
> DBD::SQLite::db do failed: not an error(21) at dbdimp.c line 398 at 

Same failures here on Linux 2.4.29.  I didn't have time to debug it,
so I decided just to downgrade to a copy of 3.2.7 that I had around,
and things were fine thereafter.  I presume that some recent change to
sqlite is not yet reflected in DBD::SQLite.  'make test' for
DBD::SQLite is another good test program that demonstrates the failure.

Nathan Kurz
[EMAIL PROTECTED]


[sqlite] DBD:SQLite and sqlite3_column_decltype()

2006-02-09 Thread Nathan Kurz
Hello --

I'm trying to track down a segfault that happens when I'm using
DBD::SQLite for Perl, and I'm confused by the documentation for
sqlite3_column_decltype().  I think I understand what it does, but
I think there are some typos that make me uncertain:

 http://sqlite.org/capi3ref.html#sqlite3_column_decltype

 The first argument is a prepared SQL statement. If this statement is
 a SELECT statement, the Nth column of the returned result set of the
 SELECT is a table column then the declared type of the table column
 is returned. If the Nth column of the result set is not at table
 column, then a NULL pointer is returned. The returned string is UTF-8
 encoded for sqlite3_column_decltype() and UTF-16 encoded for
 sqlite3_column_decltype16(). For example, in the database schema:

 CREATE TABLE t1(c1 INTEGER);
 

 And the following statement compiled:

 SELECT c1 + 1, 0 FROM t1;
 

  Then this routine would return the string "INTEGER" for the second
  result column (i==1), and a NULL pointer for the first result column
  (i==0).

Is the first sentence supposed to be "If this statement is a SELECT
statement [and if] the Nth column..."?  And the next sentence should
be "is not [a] table column"?  And is the final paragraph correct, or
are the numbers reversed for which is NULL and which is "INTEGER"?  

I think what's happening with the Perl interface is that
sqlite3_column_decltype() returns NULL if the table is created without
an explicit type, instead of the "" or "TEXT" that I would have
expected.  Is this correct?  Or should it return something else?
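
For reference, a minimal way to check this from C looks roughly like
the following (a sketch; assumes an already-open db handle):

#include <stdio.h>
#include <sqlite3.h>

/* Sketch: print the declared type (or NULL) of each result column. */
static void show_decltypes(sqlite3 *db, const char *sql){
  sqlite3_stmt *stmt;
  int i;
  if( sqlite3_prepare(db, sql, -1, &stmt, 0)!=SQLITE_OK ) return;
  for(i = 0; i < sqlite3_column_count(stmt); i++){
    const char *t = sqlite3_column_decltype(stmt, i);
    printf("column %d: %s\n", i, t ? t : "(NULL)");
  }
  sqlite3_finalize(stmt);
}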

Thanks!

Nathan Kurz
[EMAIL PROTECTED]


Re: [sqlite] Does anyone know how I would uninstall sqlite?

2006-01-21 Thread Nathan Kurz
> twoeyedhuman wrote:
> > I have the latest version of sqlite3 on my debian box and I'd like
> > to uninstall it because I keep getting this error through bash:
> > sqlite3: error while loading shared libraries: libsqlite3.so.0:
> > cannot open shared object file: No such file or directory.  I'm
> > actually running xandros linux and I just want to find where all
> > the files were installed so I can delete them manually and install
> > sqlite through a debian package.  Is this possible?

I missed the original post, so I apologize if these were already
suggested, but some possibilities for finding this out include:

1) If you installed from source, try 'make -n install' from the source
   directory.  This simulates the 'install' command without actually
   doing anything.  Look through the results to see where things go.

2) Many Linux systems have a command called 'locate' (also called
   'slocate').  If your system has this, trying 'locate sqlite' will
   probably show you what you need to remove.

3) Linux sometimes needs the command 'ldconfig' to be run as root when
   shared libraries are updated.  It's possible that running this will
   get rid of your error and let you use the new version.

4) 'ldd' and 'strace' are useful commands for figuring out which
   shared libraries are being used and where they are located.  Trying
   'ldd /usr/local/bin/sqlite3' or 'strace sqlite3' might help you.

Good luck!

--nate


Re: [sqlite] Functions

2006-01-17 Thread Nathan Kurz
> Was looking at the functions code. Wondered if there's any way I can
> do something like
> 
> for  select  from table
> 
> do something here
> 
> loop
> 
> the field have a array of data may be single row but only single column

Hi Vishal --

I'm not sure I understand your syntax.  Maybe a more complete example
of what you are hoping to do would help?

If my guess is right about what you mean, it should be easy to do
using a user-defined function and a sub-select.  In general, if you
can form the loop as part of a SQL statement, the loop part should be
easy.  And for "do something here", you can define an aggregate
function to do what you want.  But an actual example would help.
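
To give an idea of the shape (a sketch with invented names): you
register the aggregate once, and then the SELECT drives the loop:

#include <sqlite3.h>

/* Sketch, invented names: "do something here" runs once per row. */
static void loop_step(sqlite3_context *ctx, int argc, sqlite3_value **argv){
  /* do something with argv[0] for this row */
}
static void loop_final(sqlite3_context *ctx){
  sqlite3_result_int(ctx, 0);      /* return whatever the loop computed */
}

int register_loop(sqlite3 *db){
  /* then use it as:  SELECT my_loop(field) FROM (SELECT field FROM t); */
  return sqlite3_create_function(db, "my_loop", 1, SQLITE_UTF8, 0,
                                 0, loop_step, loop_final);
}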

--nate


Re: [sqlite] Re: Padding with zeros

2005-12-29 Thread Nathan Kurz
On Fri, Dec 30, 2005 at 12:21:05AM +, Brian Johnson wrote:
> I have a field of text and I want to prefix that text with numbers padded
> with zeroes.
> 
> eg currently
> text 1
> text 2
> text 3
> 
> to become
> 001 text 1
> 002 text 3
> 003 text 2
> 
> or ultimately
> b001 text 1
> b002 text 3
> b003 text 2

While it is possible you really need to do it the way you said, might
it work to just have a separate column in the database that is used
for the sorting?   Then do something like:

create table your_table (text_column, sort_column);
select text_column from your_table order by sort_column;

(very unchecked syntax, but you get the idea)

--nate


Re: [sqlite] returning multiple rows from custom functions

2005-12-15 Thread Nathan Kurz
On Thu, Dec 15, 2005 at 09:17:48PM +, Andrew McDermott wrote:
> For example, I'm currently computing a histogram in application code for
> a moderately sized table (7+ million rows), but I'm wondering whether it
> would be quicker to get the results computed via a custom SQLite
> function.  I'm expecting it to be quicker because the current
> implementation traverses the JNI boundary for each row in the result set
> whereas a custom function wouldn't need to do this.

No, it is not currently possible.  The way the VDBE is set up, all
functions return only a single value.  But do you need multiple rows,
or multiple values?  For a histogram, would multiple values suffice?

If so you can fake it.  With aggregate functions it's possible to
compute the histogram, and then return it in some special form.  You
could for example define a histogram function that returns a comma
separated list of text, or you could have it return an array as a blob.

> select histogram(column) from table;
> 1,4,1

Or you could split the histogram function into pieces, one for each
bin, and write a simple aggregate function that just counts the
occurrences of one particular value:

> select number(column,1), number(column,2), number(column,3) from table;
> 1,4,1
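
A sketch of what that counting aggregate might look like in C (my
names, untested):

#include <sqlite3.h>

/* Sketch: number(column, target) counts rows where column == target.
** Register with: sqlite3_create_function(db, "number", 2, SQLITE_UTF8,
**                                        0, 0, number_step, number_final); */
static void number_step(sqlite3_context *ctx, int argc, sqlite3_value **argv){
  int *count = (int *)sqlite3_aggregate_context(ctx, sizeof(int));
  if( count && sqlite3_value_int(argv[0])==sqlite3_value_int(argv[1]) ){
    (*count)++;
  }
}
static void number_final(sqlite3_context *ctx){
  int *count = (int *)sqlite3_aggregate_context(ctx, sizeof(int));
  sqlite3_result_int(ctx, count ? *count : 0);
}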

Both of these would be quite easy to do.  That said, I'd really love
if it were possible to return either multiple values or multiple rows
from a user defined function.  It would make my life much easier,
since then I could work with the results at the SQL level rather than
the application level.  As it is, I'm currently doing stuff like
writing functions that cache the results and return a handle, then
writing accessor functions that look up the result via that handle.

My impression is that multiple values would be difficult, as the VDBE
parser would have to know in advance what columns would be returned by
the function to be able to use the results internally.  Multiple rows
(of a single value) seem like they would be a fairly simple addition,
as one would only need to specify that a function might return
multiple rows, and not how many or their type.

--nate


Re: [sqlite] Can't commit transaction - SQL statements in progress

2005-12-12 Thread Nathan Kurz
On Mon, Dec 12, 2005 at 06:55:24PM -0500, [EMAIL PROTECTED] wrote:
> Eric Scouten <[EMAIL PROTECTED]> wrote:
> > Or to put it another way, this is essentially a memory leak problem. 
> > SQLite obviously knows that I've lost track of one or more prepared 
> > statements that haven't run to completion, but it isn't telling me *what* 
> > statements those are. I'm wondering if there is any way of getting that 
> > information. Armed with that knowledge, I can probably fix my code 
> > fairly quickly.

If you happen to be running on Linux, you might try using 'valgrind'
(http://www.valgrind.org).  Among other things, it's a memory
debugger that runs your program unmodified and does a good job of
catching lost pointers.  I've used it for similar problems with
SQLite custom code with great success.

--nate


Re: [sqlite] About Index using

2005-12-11 Thread Nathan Kurz
On Mon, Dec 12, 2005 at 09:48:21AM +0800, Bo Lin wrote:
> Here is a sql string, like: select * from test where (a=0 or a=1)
> and b=1; and column a ranges from 1-1, and column b ranges from
> 0-1.  And the DB has about 300,000 records with columns a and b
> configured randomly.
>
> Two indexes are created on the "test" table.  The first is on "column b"
> and the second is on "a,b".
>
>  but when I try to use "explain", I found that the first index is
> used.  But obviously, if sqlite could use the second index, the
> performance could be improved a lot.
>
>  how can I use the second index?  Can sqlite's "select" support
> specifying a certain index?

I'm sorry I can't just offer you the correct answer, but have you
looked at the optimizer documentation on the SQLite site?  It gives
some good hints on how the choice of indices is made.

Good luck,

--nate


Re: [sqlite] Speed of a count operation

2005-12-08 Thread Nathan Kurz
On Thu, Dec 08, 2005 at 12:26:44PM -0500, Teg wrote:
> I have a 6.5 gb database with 29,587 records in it. It takes about
> 30-40 seconds for a count to return the first time I specify one in
> "Sqlite3" (seeing this in my program as well). Subsequent "count"
> operations are nearly instantaneous even when I exit Sqlite3 and
> restart. I'm pretty much seeing that across the board with other DB's
> as well, that count takes a fairly long time the first time it's called
> and then it's fast.
> 
> Is there any way I can speed this up? I use "count" to set my progress
> bars to proper ranges so, the delay causes people to wait.

Most likely the time is being spent reading parts of the database from
disk into memory buffers.  After it is read once, the OS caches it if
you have enough spare RAM.  If you want it to be faster, your goal
should be to reduce the amount of data that has to be read.

If you give count() an argument, it counts the number of times that
argument is non-null.  Because you are using count(FileOrder) and
there is no index on this column, the entire database (minus the
overflow pages from your blobs) is read.

So your goal will be to come up with a "select count()" statement that
doesn't need to read as much data.  Try using ".explain" then "explain
select count(FileOrder) from Files_V1;" to see what is happening.
Determine if it is using an index, or reading the whole table.

Then try "select count(*) from Files_V1" and see if that is much
faster.  If not, try doing the count on Signature to see if you can
force it to use the index.  Failing that, you could always make a
small table of just FileOrder and do the count on that.

Good luck!

--nate


Re: [sqlite] Probably not simple question

2005-12-07 Thread Nathan Kurz
On Wed, Dec 07, 2005 at 08:34:46PM -0800, Dan Kennedy wrote:
> > To make it work in more than a superficial manner, you probably will
> > need a good understanding of how structures are internally represented
> > in C++ or C.  You pass sqlite a pointer to the struct and tell it how
> > long it is (using sizeof()).  When you get the blob back, you then
> > treat the data as an instance of your structure by casting it.
> 
> I'm not sure this is always the best way to go. Any databases produced like 
> this will not be portable between architectures. In theory, they may not
> be portable between compilers, but in practice you're probably Ok there.

A good caveat.  If any of the members of your structure are more than
one byte long (ie, an int) this approach will fail terribly if you
create the database on a little-endian machine (Intel/PC) and then try
to use it on a big-endian one (Motorola/Mac).  But I decided to
take the question as 'How do I', rather than 'Should I'.

> Also, SQLite makes no guarantees as to the alignment of returned blobs. 
> This could cause a random crash somewhere down the line. You could get
> around this by making a copy of the blob to memory obtained from malloc().

Can you offer more information about this?  In what I'm working on, I
am storing arrays of complex structures as blobs and then reconstituting
them.  I was concerned about alignment at one point, but it seems to
be working without problems.  Are there 'gotchas' that are likely to
'get me' in the future?  This is server-side, and only for temporary
data, so I'm not concerned about the endian-ness, only the alignment.
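
For concreteness, here's the shape of what I mean (a sketch; the
struct and table are invented), with a memcpy on retrieval that I
gather should sidestep the alignment question:

#include <string.h>
#include <sqlite3.h>

typedef struct Vec { int n; double d[4]; } Vec;   /* invented example */

/* store: bind the raw bytes of the struct as a blob */
static void bind_vec(sqlite3_stmt *stmt, int i, const Vec *v){
  sqlite3_bind_blob(stmt, i, v, sizeof(*v), SQLITE_TRANSIENT);
}

/* retrieve: copy out of SQLite's buffer rather than casting in place,
** so the alignment of the returned blob doesn't matter */
static int get_vec(sqlite3_stmt *stmt, int col, Vec *out){
  if( sqlite3_column_bytes(stmt, col)!=(int)sizeof(*out) ) return -1;
  memcpy(out, sqlite3_column_blob(stmt, col), sizeof(*out));
  return 0;
}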

Thanks!

--nate




Re: [sqlite] problem with blobs (perl code)

2005-12-05 Thread Nathan Kurz
On Mon, Dec 05, 2005 at 08:23:19AM -0500, [EMAIL PROTECTED] wrote:
> > OK, so 1.11 is on CPAN which fixes this. However I have another bug 
> > report about this not working for user defined functions, where I do 
> > this:
> > 
> >  s = SvPV(result, len);
> >  sqlite3_result_text( context, s, len, SQLITE_TRANSIENT );

I'm the original reporter of that bug, and I can confirm that while it
did not work in DBD::SQLite 1.09, it works without problem in 1.11.

sqlite3_result_text() appears to deal fine with embedded NULs.
Attached is a tiny C program I just used to confirm this to myself.

Large thanks to Matt and Richard for so quickly ending up with a great
solution that just transparently works!

--nate


#include <stdio.h>
#include <sqlite3.h>

/* gcc test_blob_as_text.c -lsqlite3 -o test_blob_as_text */

static int callback(void *NotUsed, int argc, char **argv, char **azColName){
  int i;
  for(i=0; i<argc; i++){
    printf("%s = %s\n", azColName[i], argv[i] ? argv[i] : "NULL");
  }
  return 0;
}

Re: [sqlite] Re: segfaults on aggregate functions in where clause?

2005-12-02 Thread Nathan Kurz
On Fri, Dec 02, 2005 at 11:24:10PM -0500, Igor Tandetnik wrote:
> It is indeed an illegal SQL query. You probably want
> 
> select id from test
> group by id
> having avg(rating) > 10;

I hadn't realized that my query was actually illegal.  OK.

> >The workaround of putting the aggregate in a subselect works fine:
> >
> >select id from
> > (select id, avg(rating) as average from test where average > 10);

That was just my typing when I was too tired.  It does not work.
Make that:

  select id from
 (select id, avg(rating) as average 
  from test where average > 10 group by id);

I've got to start just pasting in my actual queries rather than trying
to simplify them.   I've messed up on that twice in a row now.

Sorry,

--nate


[sqlite] segfaults on aggregate functions in where clause?

2005-12-02 Thread Nathan Kurz
I was just about to file a bug about a segfault when I have an
aggregate in the where clause like:

  select id from test where avg(rating) > 10 group by id;

But when I looked in the bug reports, I found that several similar
bugs had been reported and that the problem had been 'fixed': 
http://www.sqlite.org/cvstrac/chngview?cn=2769

Unfortunately, the fix just makes it illegal to do.  Is this the
permanent state, or just a placeholder until a further fix is made?
The workaround of putting the aggregate in a subselect works fine:
 
 select id from 
  (select id, avg(rating) as average from test where average > 10);

So I'm not really encumbered by it.  But at a glance I would think the
optimizer could just 'optimize' the original to the form that works.
Or is that harder than it looks?

Just wondering,

--nate


Re: [sqlite] problem with blobs (perl code)

2005-12-01 Thread Nathan Kurz
On Thu, Dec 01, 2005 at 09:52:25PM -0500, [EMAIL PROTECTED] wrote:
> Suppose you do this:
> 
>sqlite3_bind_text(pVm, 1, "abc\000xyz\000pq", 10, SQLITE_STATIC);
> 
> If this is part of an INSERT, say, then you will insert a 10-character
> string that happens to contain a couple of extra \000 characters.
> The statement:

This is indeed exactly how DBD::SQLite handles an execute statement
with a blob argument: it's bound as text but with the length set
according to the blob length.   So the retrievals happen exactly as
you describe (although I had not thought to try 'cast as blob').

> So as far as I can tell, both SQLite and Perl are doing exactly what
> they ought to be.

That certainly could be argued, although it does seem to trap the
unwary (like me) with fair regularity.  One option for 'improving' it
might be to make 'BLOB' become a bona fide column affinity, so that a
TEXT type inserted into a BLOB column would be stored as a blob.

Would there be a downside to doing this?

--nate


Re: [sqlite] problem with blobs (perl code)

2005-11-30 Thread Nathan Kurz
On Wed, Nov 30, 2005 at 05:10:19PM -0600, Jim Dodgen wrote:
> > What do you get back (using the command-line client) when you
> > ask for LENGTH(x) or QUOTE(x) instead of just the column x?
> 
> sqlite> select length(val) from foo;
> 3
> sqlite> select quote(val) from foo;
> 'MZP'
> 
> strange, reports a length of 3 but the database is > the size of the
> file i put into the blob.  other than the record in foo the database
> is fresh and empty.

Although the problem is definitely with Perl here, SQLite's response
_is_ pretty odd.  The whole value of the BLOB gets inserted (since the
length is set correctly by Perl), but since it is inserted as type
TEXT, the retrieval (and the length) depend on the first NUL even
though the entire blob is actually entered in the database.  Probably
not quite a bug, but maybe worth a spare thought.

> I tried one of the workarounds noted in
> http://rt.cpan.org/NoAuth/Bug.html?id=14595, which had you force the
> data type to SQL_BLOB.  This makes things work!! hooray!!

I'm glad something worked for you!  In some ways that is probably a
better solution than my patch, since it is more explicit.

D. Richard Hipp <[EMAIL PROTECTED]> writes:
> Matt Sergeant monitors this mailing list and should see your post.

Thanks!

--nate



Re: [sqlite] functions that return tables

2005-11-21 Thread Nathan Kurz
On Fri, Nov 18, 2005 at 04:25:12PM -0700, Dennis Cote wrote:
> >Is there any reasonable way to accomplish this?  Or am I left with
> >defining a new function type that returns a handle to a temp table,
> >and new parsing logic to wrap the right OP codes around that function?
> 
> I don't know of a way to do what you want with a user defined function, 
> but your example can be solved quite simply using SQL. The following 
> query will return a table with the required results.
> 
> select * from test order by col desc limit 3;
> 
> If you have an index on col then it will also be very fast regardless of 
> the size of the table, if not  it will do a single table scan to find 
> the three maximum values.

Thanks Dennis.  Unfortunately, I was only using the max() function as
a simplified case to try to show what I wanted.  In reality, I need to
try to find which N rows are most similar to a given row in the
table based on a custom vector similarity function that compares a
blob in the test field against a blob in each of the other rows (as
well as some joined in parameters).  So a precalculated index is out.

Also, I think the part that Igor was pointing out was the "if not it
will do a single table scan".  His point (which I think is true) is
that if no index is available with the information, then a temporary
copy of the entire table has to be created and sorted.  The 'complex
heap approach' is indeed what I'm doing in my user defined aggregate
function, as the cost of the temp table creation and sorting is the
reason I can't do this as a simple function with an 'ORDER BY'.

So I'm still looking for how it could be done with user defined
functions, even if that means significantly extending the way that
user defined functions are handled in SQLite.  Advice on how to do
that (or suggestions on better alternatives) greatly appreciated.

--nate



Re: [sqlite] functions that return tables

2005-11-18 Thread Nathan Kurz
On Fri, Nov 18, 2005 at 05:43:01PM +0100, Noel Frankinet wrote:
> >My current workaround is to have my function return a comma separated
> >list of values ("10,9,8"), parse this string in my application, and
> >generate a new query, but ideally I'd like to do this in one step.
>
> why not a vector of values ?

A vector of values (I presume you mean returning a BLOB that is
actually an array of values?) would be great if there was any way to
use these values as part of a JOIN:

SELECT * FROM other_table JOIN (SELECT max_n(col, n) FROM test);

Is there any existing syntax that would allow this?  Or would I still
need a new token (EXPLODE) that would convert from the vector to the
component values at the VDBE level?  

Also, for my real application I would actually need a vector of rows
(ie, a table), since I want to be able to return not just the value of
max but some identifying id.  Here's something closer to my actual query:

SELECT matrix_match(base.uid, base.vector, test.uid, test.vector) 
  FROM vectors AS test, (SELECT uid, vector FROM vectors) AS base
  GROUP BY base.uid;
1|123,456,789
2|234,567,890
...

--nate



[sqlite] functions that return tables

2005-11-18 Thread Nathan Kurz
Perhaps related to the recent questions about converting rows to
columns, I'm finding the need for user defined aggregate functions
that can return multiple values, or ideally multiple rows of values. 

Assume you want to determine the highest N values from a column.  You'd
want an aggregate function like max_n(col, n) that makes one pass through
the table, saving the highest values it sees, and then returning them.

SELECT max_n(col, 3) FROM test;
max
---
10
9
8

My current workaround is to have my function return a comma separated
list of values ("10,9,8"), parse this string in my application, and
generate a new query, but ideally I'd like to do this in one step.

Is there any reasonable way to accomplish this?  Or am I left with
defining a new function type that returns a handle to a temp table,
and new parsing logic to wrap the right OP codes around that function?

Thanks!

Nathan Kurz
[EMAIL PROTECTED]


[sqlite] strlen() and function args (PATCH)

2005-11-15 Thread Nathan Kurz
I've been looking at ways to make user defined functions run with less
overhead, and found that OP_STRING and OP_STRING8 arguments are having
strlen() called on them every time the function is invoked.  Attached
is a tiny patch that causes expr.c to pass the length of the argument
as P1 and also causes vdbe.c to use this argument if it exists.

Although the gain here is probably small for typical users, in the
case I'm optimizing for (a tight loop around a user defined function
with multiple long string args) I got about a 5% overall improvement.

I've used this patch a little bit, but haven't tested it well.  It
doesn't seem to cause any problems with the test suite.  A similar
optimization could also probably be done for OP_REAL, but since I
wasn't testing that I didn't put it in.

--nate

(In the future, should I post proposed patches here for discussion, or
should I just add them to the bug tracker as enhancements?)




--- src/expr.c~ 2005-09-23 15:11:54.0 -0600
+++ src/expr.c  2005-11-15 15:43:08.0 -0700
@@ -1482,7 +1482,9 @@
   assert( TK_FLOAT==OP_Real );
   assert( TK_STRING==OP_String8 );
   sqlite3DequoteExpr(pExpr);
-  sqlite3VdbeOp3(v, op, 0, 0, pExpr->token.z, pExpr->token.n);
+  /* calculate strlen() since token.n is off if Dequote occurred */
+  sqlite3VdbeOp3(v, op, pExpr->token.z ? strlen(pExpr->token.z) : 0, 
+0, pExpr->token.z, pExpr->token.n);
   break;
 }
 case TK_NULL: {
--- src/vdbe.c~ 2005-09-20 11:42:23.0 -0600
+++ src/vdbe.c  2005-11-15 14:00:23.0 -0700
@@ -689,7 +689,7 @@
   pTos->z = pOp->p3;
 #ifndef SQLITE_OMIT_UTF16
   if( db->enc==SQLITE_UTF8 ){
-pTos->n = strlen(pTos->z);
+pTos->n = pOp->p1 > 0 ? pOp->p1 : strlen(pTos->z); /* use given strlen */
   }else{
 pTos->n  = sqlite3utf16ByteLen(pTos->z, -1);
   }


[sqlite] aggregate functions and sqlite3_set_auxdata / sqlite3_get_auxdata

2005-11-14 Thread Nathan Kurz
I've just figured out that sqlite3_get_auxdata() and
sqlite3_set_auxdata() are designed to be used only with scalar user
defined functions.  If they are used with aggregate functions, they
cause sporadic and mysterious segfaults.  In particular, VdbeFunc
within the context is not initialized for aggregate functions, so
get_auxdata() will occasionally return random garbage.

This is mentioned in the comments in sqlite.h.in, but it took me a
while to see it.  I offer this more as a hint to future unwary SQLite
hackers searching the archives than as a bug report.  On the bright
side, both functions appear to work excellently when used (as
designed) with scalar functions.  A workaround for aggregate
functions is (presumably) to use sqlite3_aggregate_context().
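
For the archives, the intended scalar usage looks roughly like this
(a sketch; the cached int stands in for whatever is expensive to
derive from a constant argument):

#include <stdlib.h>
#include <sqlite3.h>

/* Sketch: cache a value derived from a scalar function's argument so
** it's computed once per statement rather than once per row. */
static void my_func(sqlite3_context *ctx, int argc, sqlite3_value **argv){
  int *cache = (int *)sqlite3_get_auxdata(ctx, 0);
  if( cache==0 ){
    cache = malloc(sizeof(int));
    if( cache==0 ) return;
    *cache = sqlite3_value_int(argv[0]);   /* stand-in for expensive setup */
    sqlite3_set_auxdata(ctx, 0, cache, free);  /* freed with the statement */
  }
  sqlite3_result_int(ctx, *cache);
}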

--nate




Re: [sqlite] optimizing out function calls

2005-11-13 Thread Nathan Kurz
On Sun, Nov 13, 2005 at 07:30:58AM -0500, [EMAIL PROTECTED] wrote:
> Nathan Kurz <[EMAIL PROTECTED]> wrote:
> > 
> > SELECT uid, match("complex", "function", vector) AS match FROM vectors 
> >   ORDER BY match DESC LIMIT 20;
> 
> SELECT uid, mx FROM
>   (SELECT uid, match(...) AS mx FROM vectors LIMIT -1)
> ORDER BY mx DESC LIMIT 20;
> 
> The LIMIT -1 on the subquery is to fake out the optimizer and prevent
> it from folding the subquery back into the main query, resulting in the
> same statement you started with.  A "LIMIT -1" is effectively a no-op.
> It does no limiting.  But subqueries that contain a limit will not be
> folded into outer queries that also contain a limit.

Thanks!  I would not have thought to try that on my own.  It does
indeed prevent the double function call, but unfortunately makes the
rest of the query more expensive by creating an extra temp table.
From what I can tell about the VDBE execution in both cases:

SELECT function() AS func FROM table ORDER BY func DESC LIMIT 10;
   -> cycle through every row calling function() twice per row
   -> put results directly into an sorted temp table

SELECT func FROM (SELECT function() AS func FROM table LIMIT -1)
  ORDER BY func DESC LIMIT 10;
   -> only calls function() once per row
   -> puts results into temp table 1 (unsorted)
   -> inserts all rows from temp table 1 into sorted temp table 2

Is there any way to combine the best of both of these worlds: only
calling function() once but only creating one temp table?  Or is it
the case (as it seems) that SQLite only copies the expression for the
function() when parsing an AS and never goes back to determine that
the expression has already been solved for that row?

Or even better, is there any way to write a user defined function that
could do the ordering and limiting internally to reduce the data set
early?  I suppose I could do it as an aggregate function that returns a
text string and then reparse that into a second query using IN, but it
would wonderful if there was a way to 'explode' the return value of a
function into multiple rows.  Something like:

SELECT function_returning_multiple_rows(id, vector, limit) FROM vectors;

> Hint:  The output of EXPLAIN is much easier to read if you do ".explain"
> first to set up the output formatting.

Thanks for the hint!  It does indeed.

--nate

ps.  A couple of pathological cases that superficially look like bugs
 because of this double execution of the function call.  Are
 these bugs, features, or just the way things currently are?

sqlite> SELECT random(1) AS func FROM test ORDER BY func DESC LIMIT 5;
func

-141
7787
-823
1453
-654

sqlite> SELECT random() AS func FROM test WHERE func > 10 LIMIT 5;
func

-853
5973
-232
-217
9849

pps.  Despite any apparent complaints, I'm really enjoying SQLite.



Re: [sqlite] optimizing out function calls

2005-11-12 Thread Nathan Kurz
On Sat, Nov 12, 2005 at 10:01:29PM -0700, Nathan Kurz wrote:
> SELECT uid, match("complex", "function", vector) FROM vectors AS match
>  ORDER BY match DESC LIMIT 20;

Please pardon the silly typo.  I do have the AS in the right spot.

SELECT uid, match("complex", "function", vector) AS match FROM vectors 
  ORDER BY match DESC LIMIT 20;

And in case it bolsters my case, here's the EXPLAIN output I see:

sqlite> EXPLAIN SELECT uid, match("complex", "function", vector) AS match  
   ...> FROM vectors ORDER BY match DESC LIMIT 20;
0|OpenVirtual|1|3|keyinfo(1,-BINARY)
1|Integer|20|0|
2|MustBeInt|0|0|
3|Negative|0|0|
4|MemStore|0|1|
5|Goto|0|37|
6|Integer|0|0|
7|OpenRead|0|14984|
8|SetNumColumns|0|2|
9|Rewind|0|25|
10|Column|0|0|
11|String8|0|0|complex
12|String8|0|0|function
13|Column|0|1|
14|Function|3|3|match(3)
15|MakeRecord|2|0|
16|String8|0|0|complex
17|String8|0|0|function
18|Column|0|1|
19|Function|3|3|match(3)
20|Sequence|1|0|
21|Pull|2|0|
22|MakeRecord|3|0|
23|IdxInsert|1|0|
24|Next|0|10|
25|Close|0|0|
26|Sort|1|36|
27|MemIncr|0|36|
28|Column|1|2|
29|Integer|2|0|
30|Pull|1|0|
31|Column|-1|0|
32|Column|-2|1|
33|Callback|2|0|
34|Pop|2|0|
35|Next|1|27|
36|Halt|0|0|
37|Transaction|0|0|
38|VerifyCookie|0|116|
39|Goto|0|6|
40|Noop|0|0|

--nate


[sqlite] optimizing out function calls

2005-11-12 Thread Nathan Kurz
Hello --

I'm trying to figure out how to optimize a query a bit, and think I've
hit a case that could easily be optimized by sqlite but isn't.  I'm
wondering if it would be an easy optimization to add, or whether there
is some way I can 'hint' the optimization into being.

I'm using a computationally expensive user defined function called
'match()'.  In case it makes a difference, match() is written in C,
and for testing, I'm loading it as a shared library into the sqlite3
shell application.  I want to return the value of match(), and also
order by it.  So my query looks something like this:

SELECT uid, match("complex", "function", vector) FROM vectors AS match
 ORDER BY match DESC LIMIT 20;

I had expected that match() would only be called once per row, but it
turns out to be called twice: once for the select, and once for the
ordering.  I've confirmed this both by putting in a counter, and by
using 'EXPLAIN'.  Is there any way to tell SQLite to reuse the value
of the first call rather than calling the function again?

I'm a comfortable C programmer, but only superficially familiar with
the SQLite code so far.  If I'm not missing something obvious, hints
on where to look at writing a patch for this would be appreciated.

Thanks!

Nathan Kurz
[EMAIL PROTECTED]




Re: [sqlite] BerkleyDB pager?

2005-07-03 Thread Nathan Kurz
On Sun, Jul 03, 2005 at 02:14:45PM -0400, Andrew Athan wrote:
> >>I'm investigating embedded databases for an upcoming project, and I came
> >>upon this thought:  Wouldn't an SQLite pager that uses
> >>Sleepycat/BerkleyDB be quite interesting?

Maybe you could clarify a bit more what you are proposing.  Your
subject line says "BerkeleyDB Pager", but in the text you ask about
using the SQLite pager with a BerkeleyDB (backend?).  Which one would
be grafted on to which one, and what might the benefit be?

> Basically I am thinking that some of BerkleyDB's locking granularity and 
> distribution/replication/etc. features might be interesting within the 
> context of SQLite -- and conversely, it would provide an SQL compliant 
> interface to what is otherwise a "B-Tree" style database.

I'm guessing this means you are proposing to write a parallel to
SQLite's btree.c that acts as a wrapper for the BerkeleyDB functions.
If so, I would find this idea interesting, not so much for the
specifics of BerkeleyDB, but as I've been thinking about how to
backend SQLite to some other custom btree implementations.

> Clay Dowling writes:
> >It would be of fairly limited value for a lot of us however.  For projects
> >which don't release source code Sleepycat is rather expensive.  Speaking
> >as a programmer it has also not been my favorite database for administration
> >issues.
> >
> Are we talking about the same sleepycat?  BerkleyDB (Sleepycat) is open 
> source, available here:  http://www.sleepycat.com/products/db.shtml

I think Clay's point here is that while SQLite can be redistributed in
closed source systems, BerkeleyDB cannot be unless one buys a license:

http://www.sleepycat.com/download/licensinginfo.shtml

--nate


Re: [sqlite] Relative efficiency of joins, subselects, and union/intersect

2005-02-23 Thread Nathan Kurz
On Tue, Feb 22, 2005 at 09:38:53PM -0700, John LeSueur wrote:
> >In my model, a 'song' record consists of a unique artist/album/title
> >A 'tag' is a name/value pair associated with a 'song': 
> >  Song: song_id artist album title
> >  Tag:  song_id name value
> >Searching for the tags associated with a given song would be pretty
> >easy, but complicating matters I need for tags to 'cascade'.  
> >
> >This is so that a tag can be set for an entire album, and then
> >overridden if necessary on a song by song basis.  For example, if I am
> >checking the value of the tag 'rating' for Artist/Album/Title, but no
> >such tag is found, I want to fall back on the tag for
> >Artist/Album/NULL; failing that I search for Artist/NULL/NULL.
...
> select
>coalesce(songtag.name, albumtag.name, artisttag.name) as name,
>-- you might need to use a case statement instead if the value of a 
> tag can be null
>-- in that case this would return a value that did not match the tag
>coalesce(songtag.value, albumtag.value, artisttag.value) as value
> from 
>song
>left outer join
>   tag songtag
>  on song.id = tag.object_id
>left outer join
>   tag artisttag
>  on song.artist_id = tag.object_id
>left outer join
>   tag albumtag
>  on song.album_id = tag.object_id
> where
>song.id = ?
>and tag.name = ?

Thanks for the response (and to you too, Dan)!  I hadn't known about
coalesce as a builtin, so that should be useful.

I was also thinking that there should be some way to speed up the join
a bit by short-circuiting if the first join finds a tag.  Something like:

 SELECT coalesce(song_tag.name, album_tag.name, artist_tag.name) as name,
coalesce(song_tag.value, album_tag.value, artist_tag.value) as value,
  FROM song
  LEFT OUTER JOIN tag AS song_tag
   ON song.song_id = song_tag.id
  LEFT OUTER JOIN tag as album_tag
   ON song_tag.id ISNULL AND <---
  song.artist_id = album_tag.id
  LEFT OUTER JOIN tag as artist_tag
   ON album_tag.id ISNULL AND<---
  song.artist_id = artist_tag.id
 WHERE
song.id = ? AND
tag.name = ?

(guessing at query, haven't actually tried it yet)

If this is internally optimized the way it could be (if I understand
correctly), it could just fill in NULLs for the 'album_tag' and
'artist_tag' columns without having to do a look up.  Do you know if
this is how it works, or if there is some way to make it do that?

> This could be surprisingly fast if you do it like itunes does and
> load the entire song database into memory(which sqlite supports),
> and using a file(itunes uses xml) only to load initially, and store
> permanent changes. There's probably something even simpler you can
> do, but I can't think of it at this moment.

I'm going to be accessing this database from a couple different
places: as a plugin to a music player (and hence long-lived) and from
a small client program accessible via a hot key (short-lived).  So I
think that having the XML file as a primary store might not work well
for me (at least from the short-lived client).  

But maybe I'm not really understanding the advantages of the in-memory
database.  Is it in some way inherently faster on lookups than just
setting SQLite to use a really large cache?
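
(For concreteness, the two setups I'm comparing are roughly these;
the cache size is an arbitrary example value:)

#include <sqlite3.h>

/* pure in-memory database */
static int open_in_memory(sqlite3 **db){
  return sqlite3_open(":memory:", db);
}

/* on-disk database with a much larger page cache than the default */
static int open_with_big_cache(sqlite3 **db, const char *file){
  int rc = sqlite3_open(file, db);
  if( rc==SQLITE_OK ){
    rc = sqlite3_exec(*db, "PRAGMA cache_size=100000", 0, 0, 0);
  }
  return rc;
}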

Thanks!

Nathan Kurz
[EMAIL PROTECTED]


[sqlite] Relative efficiency of joins, subselects, and union/intersect

2005-02-22 Thread Nathan Kurz
Hello --

I've got a SQL problem that I guess is about half general and half
SQLite specific.  I'm pretty new to SQLite, familiar with basic SQL
(mostly from MySQL), and I'm accessing SQLite through C and Perl.

The problem I'm working on relates to tagging music for some music
management software I'm thinking about.  It currently exists mostly in
my head, but it will be released as open source if it pans out.

In my model, a 'song' record consists of a unique artist/album/title.
A 'tag' is a name/value pair associated with a 'song': 
   Song: song_id artist album title
   Tag:  song_id name value
Searching for the tags associated with a given song would be pretty
easy, but, complicating matters, I need tags to 'cascade'.

This is so that a tag can be set for an entire album, and then
overridden if necessary on a song by song basis.  For example, if I am
checking the value of the tag 'rating' for Artist/Album/Title, but no
such tag is found, I want to fall back on the tag for
Artist/Album/NULL; failing that I search for Artist/NULL/NULL.

As I see it, this means that I need to be able to search the tags
table by three different song_id's (id_exact, id_album, id_artist).  I
can come up with several ways of doing this: joins, subselects, or
unions.  For example, the join approach would need to do an outer join
between the song and tag tables three times, once on song_id=id_exact,
once on song_id=id_album, and finally on song_id=id_artist.

(note that if a tag exists for id_exact, it overrides the results for
id_album and id_artist: the cascade is conditional on no tag found.)

It strikes me that this is probably going to be pretty inefficient.
My requirements are not for incredible speed, but I probably need to
be able to search through a database of 10,000 songs and 100,000 tags
in something less than a tenth of a second on a typical desktop PC.

Is there a way of approaching this that works better with SQLite than
other ways? (ie, are intersects optimized but subselects not?) Is
there a standard pattern for doing this sort of cascade?  My guess is
that there are probably a lot of ways of solving this problem, but
that one of them is going to be strikingly more efficient than the
others, and which one is best might be specific to SQLite.

Thanks for any suggestions,

Nathan Kurz
[EMAIL PROTECTED]

ps. I'd like to be able to do this search in a single SQL statement
(rather than multiple queries) because at some point I need to
start doing intersects of multiple tag searches.  Another bridge
I'll cross when I get to it!