[sqlite] VACUUM requires 6.7 times space ?

2015-01-05 Thread Jim Wilcoxson
Simon - instead of using vacuum, it's much faster to create a new
database from the old one, then rename it.  It's easy to do this in
Python using iterdump(), or you can connect to the new (empty)
database, do your create table statements, attach the old database as
olddb, then do:

insert into table1 select * from olddb.table1;
insert into table2 select * from olddb.table2;

This also lets you do the create table stmts w/o indexes, and add the
indexes after the inserts.  Not sure if that is faster or not for your
data.
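Something like this, for example, in Python with the sqlite3 module (the
file, table, and index names here are just for illustration):

import os
import sqlite3

old_path, new_path = "data.db", "data.db.new"   # hypothetical file names

new = sqlite3.connect(new_path)
# Same schema as the old table, but without indexes yet.
new.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, val TEXT)")
new.execute("ATTACH DATABASE ? AS olddb", (old_path,))

with new:   # one transaction for the whole copy
    new.execute("INSERT INTO table1 SELECT * FROM olddb.table1")

# Add the indexes after the bulk insert.
new.execute("CREATE INDEX table1_val ON table1(val)")
new.execute("DETACH DATABASE olddb")
new.close()

os.rename(new_path, old_path)   # replace the old database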

Good luck!
Jim
-- 
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] sqlite3 performance on select count very slow for 16 GB file

2015-01-23 Thread Jim Wilcoxson
If you have a table where rows are inserted but never deleted, and you
have a rowid column, you can use this:

select seq from sqlite_sequence where name = 'tablename'

This will return instantly, without scanning any rows or indexes, and
is much faster than max(rowid) for huge tables.

If no rows have been inserted, you will get NULL.  If rows have been
inserted, you will get back the last rowid inserted.
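For example, in Python (the table name is just illustrative); note that
sqlite_sequence is only maintained for tables declared with AUTOINCREMENT,
so the rowid column needs to be declared that way for this lookup to work:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY AUTOINCREMENT, msg TEXT)")
conn.executemany("INSERT INTO log (msg) VALUES (?)", [("a",), ("b",), ("c",)])

row = conn.execute(
    "SELECT seq FROM sqlite_sequence WHERE name = 'log'").fetchone()
last_rowid = row[0] if row else None   # None if nothing was ever inserted
print(last_rowid)   # -> 3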

Jim


Simon wrote:

If this is a table for which rows are inserted but never deleted, then
you will find that

SELECT max(rowid) FROM hp_table1

returns the same value almost immediately.  Perhaps value-1, but
whatever it is it'll be consistent.

-- 
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


[sqlite] HashBackup sqlite 3.6.18 -> 3.8.8 performance

2015-02-04 Thread Jim Wilcoxson
To add to the real-world performance info, I recently upgraded the
backup program I'm developing, HashBackup, from SQLite 3.6.18 to
3.8.8, mostly to see if it fixed some occasional "Database is locked"
errors (multithreaded).  For these tests, nothing was changed except
the SQLite library.

Hardware:
--
The tests were on a quad-core processor Mac Mini i7 with 16GB of
memory, though HB only uses maybe 100-150MB for these tests. In most
cases, HB can make good use of up to 3 processors before other limiting
factors kick in.

The Mini has two 1TB 5400rpm drives that max out at about 112MB/s on the
outer tracks with dd on the raw device:

sh-3.2# dd if=/dev/rdisk0 of=/dev/null bs=128k
^C5661+0 records in
5661+0 records out
741998592 bytes transferred in 6.578497 secs (112791508 bytes/sec)

An application can read from the file system at about 82MB/sec.  This
disk has full-disk-encryption enabled, so maybe that is a factor:

sh-3.2# dd if=test-hb-home.tar of=/dev/null bs=128k
^C5361+0 records in
5361+0 records out
702676992 bytes transferred in 8.552060 secs (82164648 bytes/sec)

Environment:
--
These tests backup data from drive 0 and write the backup data and
SQLite database to drive 1.  Before testing different SQLite versions,
drive 1 (the output drive) is emptied.  Drive 0, the system drive, is
mounted and of course changing slightly between tests, but nothing is
running except the usual system processes.

Test 1: Full system backup followed by incremental backup
--
For a full backup, HB is mainly doing inserts of file system and HB
metadata using prepared insert statements.  The dedup table and arc
files (compressed user data) do not use SQLite.  Around 700K files are
backed up on the system drive, totalling 49GB.  There are only 3000
files 1MB or larger, so this test is seek-bound on the filesystem.
The largest file is 8GB.

SQLite 3.6.18 full backup:

Time: 2745.6s, 45m 45s
Checked: 698470 paths, 49500931464 bytes, 49 GB
Saved: 698418 paths, 49500931464 bytes, 49 GB
Excluded: 45
Dupbytes: 15300578096, 15 GB, 30%
Compression: 51%, 2.1:1
Space: 23 GB, 23 GB total
No errors

real 45m52.149s
user 26m43.971s
sys 4m4.460s

SQLite 3.6.18 incremental backup:

Time: 425.8s, 7m 5s
Checked: 698471 paths, 49501306705 bytes, 49 GB
Saved: 108 paths, 1235685931 bytes, 1.2 GB
Excluded: 45
Dupbytes: 1230321290, 1.2 GB, 99%
Compression: 99%, 1493.3:1
Space: 827 KB, 23 GB total
No errors

real 7m17.200s
user 1m51.300s
sys 0m43.691s

SQLite 3.8.8 full backup:

Time: 3068.1s, 51m 8s
Checked: 698472 paths, 49501873829 bytes, 49 GB
Saved: 698420 paths, 49501873829 bytes, 49 GB
Excluded: 45
Dupbytes: 15296606115, 15 GB, 30%
Compression: 51%, 2.1:1
Space: 23 GB, 23 GB total
No errors

real 51m21.242s
user 26m42.716s
sys 4m12.255s

SQLite 3.8.8 incremental backup:
NOTE: a large Postgres log file was backed up with 3.6.18 that was not
backed up this time.  This backup saved 18MB instead of 1.2GB.

Time: 362.6s, 6m 2s
Checked: 698472 paths, 49501964307 bytes, 49 GB
Saved: 64 paths, 18663074 bytes, 18 MB
Excluded: 45
Dupbytes: 17388470, 17 MB, 93%
Compression: 99%, 116.6:1
Space: 160 KB, 23 GB total
No errors

real 6m9.153s
user 1m29.346s
sys 0m42.033s

My conclusion, based on the full system backup test, is that SQLite
3.8.8 is 11-12% slower than 3.6.18 for this insert workload:

(3068-2745)/2745
.1176

I'm not particularly happy about that, but if it fixes my Database is
locked problem, I can live with it, and I might be able to use a
partial index on one index to speed this up.

Test 2: backup 2 versions of a large SQL dump
--
In this test, there are 2 versions of an SQL database dumped to SQL
commands.  The 2 files are around 1.8GB.  This test eliminates a lot
of the filesystem seeking aspect of a system backup.  The purge
command was used before each backup to force actual disk I/O on the
data file and SQLite database:

3.6.18 initial backup:
Around 33K SQL inserts are performed as the file is split into blocks.

Time: 23.1s
Checked: 4 paths, 1812149567 bytes, 1.8 GB
Saved: 4 paths, 1812149567 bytes, 1.8 GB
Excluded: 0
Dupbytes: 0
Compression: 78%, 4.7:1
Space: 382 MB, 382 MB total
No errors

real 0m23.223s
user 0m53.623s
sys 0m1.709s

3.6.18 incremental backup:
Around 9K inserts are performed for new blocks.
Around 24K queries are performed to verify block hashes match
Around 24K updates are performed to bump reference counts

Time: 21.5s
Checked: 4 paths, 1771063478 bytes, 1.8 GB
Saved: 4 paths, 1771063478 bytes, 1.8 GB
Excluded: 0
Dupbytes: 1306412633, 1.3 GB, 73%
Compression: 93%, 14.6:1
Space: 121 MB, 503 MB total
No errors

real 0m24.181s
user 0m31.974s
sys 0m1.266s

3.8.8 initial backup:

Time: 22.4s
Checked: 4 paths, 1812149567 bytes, 1.8 GB
Saved: 4 paths, 1812149567 bytes, 1.8 GB
Excluded: 0
Dupbytes: 

Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Jim Wilcoxson
On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr  wrote:
> I am currently working on a project that requires retrieving a list of
> all the rows from a FTS3 table. The ordering of the results varies by
> search criteria. Since this is for an embedded project the list results
> are passed in chunks to another module to give the appearance of faster
> operations.

Somewhere you have some state information so that you know what offset
to use.  Store the SQLite cursor with that state information, and use
it to fetch the next 2000 rows on each call, ie, only do the query
once.
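For example, in Python, hold on to one cursor and pull the next chunk from
it on each call instead of re-running the query with an OFFSET (the table
and chunk size here are just illustrative):

import sqlite3

CHUNK = 2000   # rows handed to the display module per call

conn = sqlite3.connect("index.db")   # hypothetical FTS database
cur = conn.execute("SELECT docid, body FROM docs ORDER BY body")   # run the query once

def next_chunk():
    # Returns the next CHUNK rows, or an empty list when exhausted.
    return cur.fetchmany(CHUNK)

while True:
    rows = next_chunk()
    if not rows:
        break
    # hand `rows` to the other module here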

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] Optimizing list retrieval with a FTS3 table

2011-03-16 Thread Jim Wilcoxson
On Wed, Mar 16, 2011 at 4:12 PM, Travis Orr <t...@ivl.com> wrote:
> Jim, just did a test with your recommendation, however that ended up
> using too much memory, since it is selecting the entire list in on go
> and keeping the memory for a significant amount of time, for the
> embedded system it is running on. Other tasks started having problems.
>
> Travis

I think it should only use cachesize*pagesize memory, so you should be
able to control memory usage with pragma cache_size.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Jim Wilcoxson
> Sent: March-16-11 12:51 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Optimizing list retrieval with a FTS3 table
>
> On Wed, Mar 16, 2011 at 3:15 PM, Travis Orr <t...@ivl.com> wrote:
>> I am currently working on a project that requires retrieving a list of
>> all the rows from a FTS3 table. The ordering of the results varies by
>> search criteria. Since this is for an embedded project the list
> results
>> are passed in chunks to another module to give the appearance of
> faster
>> operations.
>
> Somewhere you have some state information so that you know what offset
> to use.  Store the SQLite cursor with that state information, and use
> it to fetch the next 2000 rows on each call, ie, only do the query
> once.
>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] unexpected large journal file

2010-06-22 Thread Jim Wilcoxson
Eric Smith  writes:
> 
> Jay A.  Kreibich wrote: 
> 
> > Try getting rid of the PK definition and see how much that buys you.  
> > It might be worth it, or it might not.  
> 
> and Simon Slavin wrote: 
> 
> > We know you are doing a huge amount of writing to this database.  Are 
> > you also reading it frequently ?  If not, then it might be worth making an 
> > index on that primary key only when you're about to need it.  
...
> I tried removing the PK definition as you both suggested, and the 
> journal stays fixed at less than 20Kb, even against a db size of (at 
> the moment) 37Gb.  My insert batch run times are improved by a factor of 
> ~2.5 and seem to be O(1).  
> 
> So, bingo.  :) 


Here's a trick that might help you:

Since your primary key is two integers, you can combine them into one integer
and use them as the primary key, without requiring a separate index:

create table x (
  pkey integer primary key,
  val  text);

a = (value 0-2M)
b = (32-bit integer)
pkey = a<<32 | b
insert into x values (pkey, data)

This shift assumes that b is a 32-bit integer.  Since a is limited to 2M, which
requires only 21 bits, you could use a larger shift and let b be up to 42 bits
(SQLite integers are signed 64-bit, so 63 usable bits), or 4398046511103.
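For example (a minimal sketch in Python, with b assumed to fit in 32 bits
as above):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (pkey INTEGER PRIMARY KEY, val TEXT)")

def make_key(a, b):
    # a goes in the high bits, b in the low 32 bits; both non-negative
    assert 0 <= a <= 2000000 and 0 <= b < (1 << 32)
    return (a << 32) | b

conn.execute("INSERT INTO x VALUES (?, ?)", (make_key(5, 123), "data"))
row = conn.execute("SELECT val FROM x WHERE pkey = ?",
                   (make_key(5, 123),)).fetchone()
print(row)   # -> ('data',)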

Also, I think you mentioned that you are inserting records in this order:

a=0 b=0
a=1 b=0
...
a=2M b=0

then

a=0 b=1
a=1 b=1
...
a=2M b=1

To insert records in order, you should insert them as:

a=0 b=0
a=0 b=1
a=1 b=0
a=1 b=1

Jim
---
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup






Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Jim Wilcoxson
On 6/22/10, Eric Smith  wrote:
> I have confirmed that INSERT times are roughly logarithmic in
> the number of existing records after creating my specific user
> indices.
>
> But INSERT times appeared to be *linear* in the number of existing
> records before I had created any user indices (and with no primary
> keys or unique indices defined).
>
> Can anyone explain this?
>
> Eric

Insert times should be constant for the 2nd case: no primary key, no
indexes; ie, it doesn't matter how many records are already in the
database.  I confirmed this with SQLite 3.6.18.

Did you see my earlier note about combining your two integers into the
primary key?  This will also give you constant insert times, if you
insert items in the order:

a=0, b=0
a=0, b=1
a=1, b=0
a=1, b=1
etc.

Jim
-- 
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] linear behavior on INSERT (was Re: unexpected large journal file)

2010-06-22 Thread Jim Wilcoxson
On 6/22/10, Eric Smith <eas@gmail.com> wrote:
> Jim Wilcoxson wrote:
...
>> Did you see my earlier note about combining your two integers into the
>> primary key?  This will also give you constant insert times, if you
>> insert items in the order:
...
> Thanks also for the tip on insertion order.  Does that also hold for
> multi-column indices (and not single-column indices transformed from two
> integers)?  I assume it's because we get more cache hits and fewer tree
> rebalances when we insert in key-order?

Yes, I'm pretty sure this applies to multi-column indices too.  So if
you do your inserts as:

a=0 b=0
a=1 b=0
...
a=0 b=1
a=1 b=1

Then the first set of rows with b=0 will be added in "more or less"
constant time.  When you start doing the second set of inserts, with
b=1, that will cause pain, because you will be modifying every index
record you created earlier and squeezing a new entry in between every
existing entry.  This will require a lot of journalling.

I think it would run faster (less journalling) to insert in order with:

a=0 b=0
a=0 b=1
a=1 b=0
a=1 b=1
etc.

Even if you load the data without indexes and add the index later, my
guess is that SQLite will still traverse the data in rowid order to
create the index.  So you are still better off inserting in the 2nd
order rather than the first.  The added advantage is that your index
pages will be clustered together.

Jim
-- 
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] Corrupted sqlite journal

2010-07-09 Thread Jim Wilcoxson
On Fri, Jul 9, 2010 at 10:38 AM, Ke Tao  wrote:

>
> HI All ,
>
> I have sqlite db name "wdb" and "wdb-journal" file was created by power
> failure something , when I do any db operation sqlite always prompt "disk
> I/O error" , but when I delete the "wdb-journal" ,there is no errors
> prompted. I think maybe the wdb-journal file was corrupted , does anyone
> have any idea on this ?
> I used sqlite on linux system.
>
> Best Regards,
> Ke Tao
>

Permission problems can cause this.  The "disk I/O error" actually means "I
can't do a rollback", maybe because root created the journal and owns the
database, then another user is trying to do things with it.  Just a guess.
This "disk I/O error" message is very confusing.  Been there!

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] Corrupted sqlite journal

2010-07-09 Thread Jim Wilcoxson
On Fri, Jul 9, 2010 at 3:21 PM, Jim Wilcoxson <pri...@gmail.com> wrote:

> On Fri, Jul 9, 2010 at 10:38 AM, Ke Tao <kota...@gmail.com> wrote:
>
>>
>> HI All ,
>>
>> I have sqlite db name "wdb" and "wdb-journal" file was created by power
>> failure something , when I do any db operation sqlite always prompt "disk
>> I/O error" , but when I delete the "wdb-journal" ,there is no errors
>> prompted. I think maybe the wdb-journal file was corrupted , does anyone
>> have any idea on this ?
>> I used sqlite on linux system.
>>
>> Best Regards,
>> Ke Tao
>>
>
> Permission problems can cause this.  The "disk I/O error" actually means "I
> can't do a rollback", maybe because root created the journal and owns the
> database, then another user is trying to do things with it.  Just a guess.
> This "disk I/O error" message is very confusing.  Been there!
>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://sites.google.com/site/hashbackup
>

Also, by deleting the journal, you are likely to corrupt your database.  You
can't delete the journal!

Jim


Re: [sqlite] Database corruption on Linux ext3

2010-07-13 Thread Jim Wilcoxson
On Tue, Jul 13, 2010 at 8:06 PM, Roger Binns  wrote:

>
> On 07/13/2010 04:57 PM, Simon Slavin wrote:
> > One on each page and one for the entire file that checksums the page
> checksums ?
>
> One for each page plus one of the header would make the most sense, but the
>

I don't think this would work, because the problem described is that the
writes aren't making it to disk.   If pages don't make it to disk, the old
pages will be present, with the old, and valid checksums.

The only way I can see checksums helping with this problem is if there is a
checksum over the entire file (or checksums of checksums of each page).
Then if you do any writes, but not all writes, the overall checksum will be
invalid.  SQLite would have to verify the checksum on every page when the
database is opened and a hot journal exists, which could be quite a lot of
overhead for a large database.

Plus, SQLite would have to keep a list of the checksums for every page, and
at commit time, recompute the overall hash/checksum.  This could be lots of
memory for a large database.  A 1GB database for example would require 1M
20-byte SHA1 hashes, so 20MB.  If a bit for every page in the database was a
scalability problem in earlier versions of SQLite, I'm guessing that 20
bytes for every page would be unworkable.

I think a checksum on every page, and maybe even an error-correction code,
is a great idea as an optional feature.  But it would only detect hardware
problems and bit rot.  This problem of not doing writes, or doing them in
the wrong order, is a different animal IMO.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] Database corruption on Linux ext3

2010-07-14 Thread Jim Wilcoxson
On Wed, Jul 14, 2010 at 1:35 AM, Roger Binns <rog...@rogerbinns.com> wrote:

> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On 07/13/2010 05:30 PM, Jim Wilcoxson wrote:
> > I don't think this would work, because the problem described is that the
> > writes aren't making it to disk.   If pages don't make it to disk, the
> old
> > pages will be present, with the old, and valid checksums.
>
> You are assuming the checksums are stored in the page they checksum.  That
> would only detect corruption of that page.  You could have pages that store
> the checksums of numerous other pages, so both the checksum page and the
> data page would have to fail to make it to disk.  Yes, there are scenarios
> where you could still get old apparently valid pages, but those are harder
> to happen.
>

It seems there are several level of checking possible:

- checksum on the page itself lets you detect some errors, with no extra I/O
- checksum pages for a group of pages lets you detect missing writes within
the group, with some extra I/O
- checksum of all checksum pages lets you detect missing writes for an
entire commit, with even more extra I/O

How much extra I/O depends on the size of the db, page size, and how much
memory is available for caching checksum pages.

Scott mentioned that a detection system without the ability to correct might
not be useful, but I think it is useful.  Not as good as correction of
course, but useful because:

- it might prevent the application program from issuing a bogus error
message like "the row you asked for isn't in the database"; lots of time
could be spent in the weeds chasing down a misleading error

- some applications might have backup copies of the database; they could
display an error message and revert to a backup

Jim


Re: [sqlite] How much a table takes (will this query always work)

2010-07-16 Thread Jim Wilcoxson
On Fri, Jul 16, 2010 at 6:01 AM, Max Vlasov  wrote:

> Hi,
>
> always wanted to have a possibility to calculate how much a table occupies.
> As long as I see from the archive, there's no out-of-the-box solution
> (CMIIW)
>
>
You mean, like this?

pragma page_count
pragma page_size
size = page_count*page_size
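For example, from Python (database name hypothetical); as noted in the
follow-up below, this gives the size of the whole database file, not of a
single table:

import sqlite3

conn = sqlite3.connect("some.db")   # hypothetical database file
page_count = conn.execute("PRAGMA page_count").fetchone()[0]
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
print("database size:", page_count * page_size, "bytes")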

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] How much a table takes (will this query always work)

2010-07-16 Thread Jim Wilcoxson
On Fri, Jul 16, 2010 at 9:13 AM, Jim Wilcoxson <pri...@gmail.com> wrote:

> On Fri, Jul 16, 2010 at 6:01 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>
>> Hi,
>>
>> always wanted to have a possibility to calculate how much a table
>> occupies.
>> As long as I see from the archive, there's no out-of-the-box solution
>> (CMIIW)
>>
>>
> You mean, like this?
>
> pragma page_count
> pragma page_size
> size = page_count*page_size
>
>
Doh.  Nevermind.  I see you said table, not database. :)

Jim


Re: [sqlite] [PATCH] cache preloading

2010-08-10 Thread Jim Wilcoxson
On 8/10/10, Paweł Hajdan, Jr.  wrote:
> So this is another chromium patch I'd like to submit:
> http://src.chromium.org/viewvc/chrome/trunk/src/third_party/sqlite/preload-cache.patch?revision=26596=markup
>
> I'm not the author of that one, but the main idea seems to be that with
> preloading we get better performance with many scenarios chromium uses
> sqlite (like for omnibox local suggestions). What do you think about the
> patch and the idea? Is there better way to do the same thing? Would you like
> to see some benchmarks?

The benefit of preloading is to replace random I/O with sequential
I/O.  I have definitely seen this be effective in some cases.  For
those who want something like this, doing a cat or dd command, or
using OS reads in a loop, will probably give the same benefit: the
database won't be loaded into SQLite's cache, but will be loaded into
the OS buffer cache.
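A minimal sketch of that kind of warm-up from Python, run here in a helper
thread so it doesn't block anything (file name hypothetical):

import threading

def preload(path, chunk=1024 * 1024):
    # Read the whole file sequentially so it lands in the OS buffer cache.
    with open(path, "rb") as f:
        while f.read(chunk):
            pass

# Warm the cache in the background so it doesn't delay app startup.
threading.Thread(target=preload, args=("app.db",), daemon=True).start()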

An advantage of preloading outside SQLite is that the dd/cat/read
could happen in a separate thread, so it wouldn't impact app startup
time.  And it could be that one use of a database might benefit from
preloading, whereas another use might not, so it seems that preloading
inside SQLite would have to be something requested with a pragma
rather than always happening by default.

What I would like to see more than SQLite preloading is better page
allocation strategies and incremental defragmentation, to cut down on
the amount of random I/O and keep related pages in sequential order.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] Forthcoming release of SQLite 3.7.1

2010-08-18 Thread Jim Wilcoxson
SQLite is a great product; I use SEE, the encrypted edition.

For those of us who are not using the C interface, I'd like to request
that when possible, new C features like hinting about file sizes and
extents also be added as pragmas.  Pragmas make it very easy for non-C
interfaces to make use of the new features.  The async writer thread
in earlier versions is one feature I'd really like to use, or at least
try out, but I'm using pysqlite bindings.  If there was a pragma to
activate this, it'd be a snap.

Thanks for providing a great library!
Jim

On 8/18/10, Richard Hipp  wrote:
> We are striving to release SQLite version 3.7.1 within the next few days.
> If you have any feedback, objections, comments, or concerns about this
> release, please respond as soon as possible to this mailing list, or via
> private email to me.
>
> A summary of the changes in 3.7.1 can be found here:
> http://www.sqlite.org/draft/releaselog/3_7_1.html
>
> A release candidate amalgamation can be downloaded from here:
> http://www.sqlite.org/draft/download.html
>
> Code changes since the prior release can be seen here (warning - approx
> 10,000 lines of diff output):
> http://www.sqlite.org/src/vdiff?from=release=trunk=1
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] Can we get a pragma for SQLITE_FCNTL_CHUNKS_SIZE?

2010-08-19 Thread Jim Wilcoxson
On 8/19/10, Simon Slavin  wrote:
>
> On 19 Aug 2010, at 9:27pm, Taras Glek wrote:
>
>> I really appreciate that sqlite got this feature to reduce
>> fragmentation, but why not expose this as a pragma?
>
> Do you have figures which suggest that reducing fragmentation leads to any
> improvement in performance ?

Whether SQLITE_FCNTL_CHUNKS_SIZE is useful is a different discussion.
I myself would love to see features exposed via pragmas whenever
possible, for the simple reason that I don't use the C API and can't
make use of the features otherwise.  I would assume that since the
SQLite developers added the feature to the C API, there must be a use
for it or they wouldn't have bothered.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] partial index?

2010-08-20 Thread Jim Wilcoxson
On 8/20/10, Cory Nelson  wrote:
> +1 for this feature request.  They've got a very specific and fairly
> rare use case, but when opportunity strikes partial indexes are much
> more convenient, straightforward, and efficient than the alternative.
>
> -   If a table has 100,000,000 rows and each index page for it holds
> 100 rows, a full index will need 1,000,000 pages of storage and 4 page
> reads to search.
> -   With a partial index covering only 10,000 rows of the same table,
> it will only need 100 pages of storage and 2 page reads to search.
>
> Big improvement while keeping query syntax and results exactly the same!
>
>> Are there any known hacks to implement something similar?
>
> There's not really any good solution.  You can create a separate table
> with the subset in it, that's pretty much it.  But that has some
> overhead too, and can complicate your queries.

A while back I needed something like a partial index.  The backup
program I'm working on does block dedup, so I made an index on each
block's SHA1.  But then I wanted to only dedup some blocks, not all.
I needed to say "insert this block row and also index the sha in the
sha index", or "insert this block row but don't index the sha in the
sha index".  Then I wanted to be able to say "is X in the sha index?",
ie, only look in the index.  SQL doesn't really have a way to do that.
 It's a simple concept, but it doesn't seem to fit within the SQL
framework.

The best I could come up with is a separate table.  The problem is,
indexing the SHA1 normally means there is a copy in the row and a copy
in the index.  Using a separate table, which still has to be indexed,
means there is a copy in the row of the main table, a copy in the
separate table, and a copy in the separate table's index.

I guess one way to do this using the SQL partial index feature being
discussed would be to have an extra column in the row called
"isindexed", setting that to 1 or 0 depending on whether the SHA
should be included in the index, and using create index ... where
isindexed=1.  Then on the queries, say "select blockid from blocks
where isindexed=1 and sha=X".  And any query that didn't have
isindexed=1 wouldn't be able to use the index at all.

Is that how it would work?
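That is essentially how partial indexes ended up working when SQLite later
added them (3.8.0).  A minimal sketch, with illustrative table and column
names:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE blocks (
                  blockid   INTEGER PRIMARY KEY,
                  sha       BLOB,
                  isindexed INTEGER)""")
# Partial index: only rows with isindexed=1 are indexed (SQLite 3.8.0+).
conn.execute("CREATE INDEX blocks_sha ON blocks(sha) WHERE isindexed = 1")

conn.execute("INSERT INTO blocks VALUES (1, x'aa', 1)")   # sha is indexed
conn.execute("INSERT INTO blocks VALUES (2, x'bb', 0)")   # sha is not indexed

# The query must repeat isindexed=1 so the planner can use the partial index.
row = conn.execute(
    "SELECT blockid FROM blocks WHERE isindexed = 1 AND sha = x'aa'").fetchone()
print(row)   # -> (1,)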

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] partial index?

2010-08-20 Thread Jim Wilcoxson
On 8/20/10, Cory Nelson <phro...@gmail.com> wrote:
> On Fri, Aug 20, 2010 at 7:39 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> ...
>> The best I could come up with is a separate table.  The problem is,
>> indexing the SHA1 normally means there is a copy in the row and a copy
>> in the index.  Using a separate table, which still has to be indexed,
>> means there is a copy in the row of the main table, a copy in the
>> separate table, and a copy in the separate table's index.
>>
>
> You might want to index the "sha" column on the primary table, then
> the secondary table can just be indexedblocks(blockid INTEGER PRIMARY
> KEY).  No wasted space that way, but will require more I/Os to JOIN
> the tables.

In my case, the point of a partial index would be to save space.  If
the sha column were indexed completely in the primary table, no other
tables or indexes are necessary, but then no space is saved.  If I
only want to index 80% of the rows, I'd save 20*.2n bytes with a
partial index, where n is the number of rows.  (SHA1 is 20 bytes)

Using a separate table, I'd need 40 bytes (roughly) for each sha
indexed: 20 bytes for the main table, 20 for the index; or 40m for m
entries.  This saves space if half or fewer of all blocks are indexed.
If more than half are indexed, indexing all rows in the main
table uses less space.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] SQLite version 3.7.2

2010-08-27 Thread Jim Wilcoxson
Using the SQLite Encryption Edition rev 3.7.0.1, the latest released
version, on OSX Tiger (10.4), results in an undefined reference to
gethostuuid.  Is OSX 10.4 no longer supported?

Thanks,
Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


On Mon, Aug 23, 2010 at 9:01 PM, Richard Hipp  wrote:

> SQLite version 3.7.2 is now available on the SQLite website:
> http://www.sqlite.org/
>
> SQLite version 3.7.2 fixes a single bug that was discovered just hours
> after
> the release of 3.7.1.  The bug can result in corruption of the database
> free-list after an incremental vacuum.  The bug had nothing whatsoever to
> do
> with SQLite version 3.7.1 or any other recent release.  The problem had
> been
> in the code for over a year, since version 3.6.16.  The discovery of the
> problem so soon after the release of version 3.7.1 was purely coincidental.
>
> The bug fixed in 3.7.2 can result in database corruption.  However, the
> corruption caused by this bug can almost always be fixed simply by running
> VACUUM on the database.  And the corruption will only occur in an
> incrementally vacuumed database which at some point in time contains
> hundreds of unused pages which are slowly released back to the operating
> system by multiple calls to the incremental_vacuum PRAGMA.  Even then, one
> must be particularly unlucky to hit the right combination of freed pages in
> order to trigger the bug.  Hence the problem is quite obscure and was not
> noticed for over a year.
>
> Hundreds of lines of code where changed for version 3.7.2, but most of
> those
> changes were to test procedures.  As is the custom with SQLite, not only
> was
> the specific bug fixed, but new tests where put in place to detect and
> prevent similar kinds of bugs elsewhere in the code.  We believe that one
> should not just fix the bug, but also fix the process that generated the
> bug. The only 4 working lines of code were changed for version 3.7.2:
>
>
> http://www.sqlite.org/src/fdiff?v1=2dff4076d3c994dc=5047fb303cdf6806
>
> Special thanks to Filip Navara for finding and reporting the problem with
> incremental vacuum.
>
> Please report any other problems to the sqlite-users@sqlite.org mailing
> list, or directly to me.  Thanks.
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] Feature request: copying vacuum

2010-09-10 Thread Jim Wilcoxson
I'd also be interested in a VACUUM TO feature, more for performance aspect
than the fragmentation, although that's a plus too.  The backup program I'm
working on packs many files into archives, which are SQLite databases.  I
have run some vacuum tests here; the filesystem cache was purged before each
test:

cp 1GB archive: 44 seconds (for baseline comparison)

Vacuum 1GB archive w/sqlite3:
real   2m15.421s
user   0m8.776s
sys    0m34.205s

Dump and reload 1GB archive:
$ time sqlite3 arc.0.0.rm ".dump"|sqlite3 arc.0.0.new

real 0m52.174s
user 0m23.750s
sys 0m9.086s

Creating a new archive is more than twice as fast as doing a vacuum on an
existing archive, and nearly as fast as a straight cp.  While an extra
minute and a half for SQLite vacuum may not seem like a big deal, a backup
retention operation could affect many archives.  So 30 archives would
require an extra 45 minutes to vacuum.  I've had to add code to the backup
program to avoid doing vacuums whenever possible because they're slow.

I would suggest the VACUUM TO feature takes a read lock on the database and
creates a new, vacuumed database, but leaves it up to the application
whether to replace the original or not.  If the application decides to do a
rename over the original database, then yes, this could goof up other
connections, but that could happen anyway if an app decided to delete a
database: the other connections would keep on using the database, even
though it is unlinked from the filesystem.  For my single-user application,
VACUUM TO would be very useful.  In multi-connection cases, the app would
have to provide some kind of mechanism outside SQLite to coordinate things,
or just stick to the regular vacuum operation.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


On Thu, Sep 9, 2010 at 7:19 PM, Taras Glek  wrote:

>  On 09/01/2010 11:41 AM, Taras Glek wrote:
> > Hi,
> > Currently VACUUM takes care of sqlite-level fragmentation.
> > Unfortunately it does little for fs-level fragmentation since the same
> > file is being reused. It would be really beneficial for Mozilla
> > performance if we could get a vacuum/hotcopy mode.
> >
> > As I understand it, currently vacuum works by copying out the data to
> > a new file, then copying it back in and truncating the original db file.
> >
> > It would be really nice to instead do something like:
> >
> > copy the data to a new file
> >
> > swap the underlying filehandles to point at new file
> >
> > remove old file
> >
> > rename the new file to old name.
> >
> > This yields two benefits:
> >
> > A less fragmented db
> >
> > ~50% vacuum speedup since the data is only copied once
> >
> > Currently we can copy the data to a new file, but it is a pretty
> > invasive change to swap all of the current sqlite connections to the
> > new file. Things like prepared statements, etc need to be updated for
> > every single db consumer. Thus it would make sense to have this
> > feature on the sqlite side.
> >
> > Is this reasonable request?
> >
> > Thanks,
> > Taras
> I sent this last week, wanted to address some issues that were raised
> about my proposal.
>
> Sorry for the weird(and late!) reply, I'm subscribed via a digest(would
> be so nice to have a gmane mirror), so I can't reply directly.
>
> Richard Hipp:
> > If other connections have the database file open while it is being
> vacuumed,
> > then on unix the other connections will still be left open on the old
> > unlinked version of the database file and will never see the new content.
> > And on windows, the file swapping and renaming simply is not allowed
> while
> > other connections have the database files open.
> > The work around is to modify SQLite so that it is constantly closing and
> > reopening the database files.  But that adds rather large overheads that
> > seem likely to be much greater than any savings seen through a reduction
> in
> > disk FS fragmentation.
> >
> >
> I agree with both points. A copying VACUUM should specify that it does
> not support the multi-connection usecase. It fail abort if it detects
> another db connection( or have this mentioned in documentation if this
> detection isn't possible).
> The wins from avoiding disk fragmentation+copying less data are
> significant. Punting seems like a reasonable alternative to forcing
> sqlite to constantly close/open the db.
>
>
> Jay A. Kreibich:
>
> >You're also breaking transactional integrity.  You need the option of
> >backing-out of the operation right up until the moment it works, and
> >this procedure can't do that.  For example, if you lose power right
> >after "remove old file", your database is no longer there.
> >
> You are right my original sequence of events was flawed, it should be:
>
>copy the data to a new file
>
>swap the underlying filehandles to point at new file
>
>rename the new file to old name(this also removes old file).
>

[sqlite] Detach command in 3.7.1 gives "database is locked" error

2010-10-09 Thread Jim Wilcoxson
This may be a bug in 3.7.1, or I may have been taking advantage of a flaw in
3.6.18:

HashBackup uses SQLite to store backup data in a meta data db and file data
archive db.  First I open the main database, then attach an archive
database.  When the archive gets full, I detach it, create a new archive db,
and attach that.  This all worked fine with 3.6.18.

With 3.7.1, I get an error on the detach:  OperationalError: database arc is
locked

The database is not opened in Exclusive mode, and a commit gets executed
before the detach.  What may be causing this - just a guess - is that there
are selects open on the main database at the time of the detach, but the
archive database is not referenced.

Is this a bug in 3.7.1, or a correction to a flaw in 3.6.18?

Thanks,
Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] Detach says database is locked

2010-10-21 Thread Jim Wilcoxson
This type of thing works with SQLite 3.6.18.  I have a similar issue with
3.7.  I believe the difference is that in 3.6.18, if you do a select on a
main database that doesn't involve an attached database, the attached
database isn't locked.  In 3.7, if you do a select on the main database that
doesn't involve an attached database, but there IS an attached database, the
attached database is locked anyway.  Then it's not possible to detach it
with the open select, even though that select has nothing to do with the
attached db.

I don't know if this new behavior is a bug in 3.7, or is considered a fix to
3.6.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


On Thu, Oct 21, 2010 at 12:21 PM, Alan Chandler
wrote:

> On 21/10/10 00:35, Igor Tandetnik wrote:
> > Alan Chandler  wrote:
> >> Further to my other post related to attaching to databases with PHP PDO,
> >>   I have now managed to ATTACH OK
> >>
> >> However, when I come to DETACH, I am getting a Database is locked error
> >> when I try and execute it.
> >>
> >> The only thing happening to that database in between ATTACH and DETACH
> >> is a single row SELECT
> >
> > Make sure you reset or finalize the statement (not sure how it's done in
> PHP).
> I believe closeCursor();  does the job.  If not, I unset the variable.
>
> However, I have figured out the problem - which is really sad since I
> can't do what I hoped - which is loop round a set of rows from a higher
> level select statement ATTACHing and DETACHing to a database in turn .
> Because the top level select loop is its own transaction, you can't
> detach from the database which you attached in the inner part of the
> loop since at that moment you are in a transaction.
>
> I think my way out of the problem is to pull out all the rows into a
> single array, then close the transaction and interate over the array
> members.  Fortunately in the case its just a menu - so there probably
> won't be too many items.
>
>
>
> --
> Alan Chandler
> http://www.chandlerfamily.org.uk
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] programmatic way of determining fragmentation?

2010-10-21 Thread Jim Wilcoxson
On Thu, Oct 21, 2010 at 1:27 PM, Dustin Sallings  wrote:

>
> On Oct 21, 2010, at 10:05, Pavel Ivanov wrote:
>
> > I think it's not related to fragmentation, but to fill percentage of
> > b-tree pages. I guess your reconstructed table is much less in total
> > size than your initial one. Also does changing cache_size changes
> > above numbers?
>
> Interesting.  We have been using sqlite3_analyze on some tests and
> finding that we can get our dbs very fragmented.  It doesn't report fill
> size as far as I can tell.
>
>
It does report fill percentage, as "Bytes of user payload stored"


>We'll play around with cache_size to see if that does anything
> useful for us in the meantime.
>
> > What size do these tables have?
>
> About 2.2GB with about 4 million rows.
>
> > What bottleneck appears to be in 3-hour query execution? Is it disk
> thrashing?
>
> Yes.
>
>I've tried different strategies in the past.  Vacuum and the rebuild
> both seem to help quite a bit.  I don't understand the file layout all that
> well right now, so I don't completely understand how the index is traversed.
>

How much memory do you have in the test system?  Because if you rebuild, you
are loading the whole db into the filesystem cache, if it'll fit.  Then you
are no longer accessing the file randomly.  You might get the same
performance increase by just catting the db (sequentially reading it into
the FS cache) and then running your test.

It's a good idea to make sure you purge your filesystem cache before running
any performance tests related to fragmentation.  On OSX, it's purge.  On
Linux, its echo 3>/proc/sys/vm/drop_caches.  Otherwise, it's easy to fool
yourself.

But I agree with you that fragmentation can be a big issue, depending on how
the data is loaded and how it is subsequently accessed.

Jim
--
HashBackup, LLC
http://sites.google.com/site/hashbackup


Re: [sqlite] Slow SELECT Statements in Large Database file

2010-10-29 Thread Jim Wilcoxson
Jonathan - 500 queries per second is 2ms per query.  You'll have a hard time
getting that kind of speed for random queries with any rotating media.  Your
database needs to be in memory - all of it, not just indexes - or on a flash
drive.

If your queries are not random but are somehow related, eg, you are doing
thousands of queries within a small area, and the db records were also
inserted by area, you may have better luck.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


On Fri, Oct 29, 2010 at 12:07 PM, Jonathan Haws
wrote:

> All,
>
> I am having some problems with a new database that I am trying to setup.
>
> This database is a large file (about 8.7 GB without indexing).  The problem
> I am having is that SELECT statements are extremely slow.  The goal is to
> get the database file up and running for an embedded application (we have
> tons of storage space so the size is not a problem).
>
> Here is the schema layout:
>
> CREATE table Dted (dtedkey INTEGER PRIMARY KEY, dted_lat
> INTEGER, dted_lon INTEGER, dted_alt FLOAT);
>
> We lookup dted_alt based on dted_lat and dted_lon.  Here is our SELECT
> statement:
>
> SELECT dted_alt FROM Dted WHERE (dted_lat=%d AND dted_lon=%d);
>
> The numbers fed to dted_lat and dted_lon are typically on the order of
> 37 and -111.
>
> What can we do to speed up our SELECT statements?  Minutes is
> unacceptable for our application.  We were hoping we could run somewhere
> on the order of 500 queries per second and get valid results back.
>
> I am not an SQL expert, but I was reading about indexes that that it is
> best to have a specific index per SELECT.  Since we only have one,
> this is the index I am creating now (it has been creating this index on my
> machine for the past 10 minutes now):
>
> CREATE INDEX idx_ll ON Dted(dted_lat,dted_lon);
>
> Is that a good index for my SELECT?  Will it speed up the accesses?
>
> Any thoughts?
>
>
> Thanks!
> --
> Jonathan
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] sqlite3_open on non-DB files / corrupt DBs

2010-11-25 Thread Jim Wilcoxson
On Thu, Nov 25, 2010 at 11:06 AM, Tito Ciuro  wrote:

> On 25 Nov 2010, at 12:51, Igor Tandetnik  wrote:
>
> > Run "PRAGMA integrity_check" right after opening.
>
> That could be a potentially slow operation if the database is valid and
> contains lots of records.
>
> Wouldn't be better to issue a SELECT statement and let SQLite come back
> with a SQLITE_NOTADB error right away?
>
>
I ran into this problem too.  What I did was to execute pragma
synchronous=full (or whatever mode you are using) after opening the
database.  This will fail if the file is not really an SQLite db or you have
the wrong key for an encrypted db.  It doesn't handle the problem of a
corrupted db, but as has been mentioned, there's not a fast way to detect
that.
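A sketch of that check in Python (file name hypothetical); touching
sqlite_master as well forces SQLite to read the file header, so a
non-database file fails right at open time:

import sqlite3

def open_and_verify(path):
    # Open a database and fail fast if the file isn't really SQLite format.
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA synchronous=FULL")   # the check described above
        conn.execute("SELECT count(*) FROM sqlite_master")   # forces a header read
    except sqlite3.DatabaseError:
        conn.close()
        raise ValueError("%s is not an SQLite database (or wrong key)" % path)
    return conn

conn = open_and_verify("maybe.db")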

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


Re: [sqlite] database size (again)

2010-12-07 Thread Jim Wilcoxson
A lot of the SQLite overhead is going to be in the stuff surrounding your
actual data; I'd be surprised if you saved much space by using fixed-size
ints vs the varints used by SQLite.  You didn't mention about indexes; if
you have any, they will take a lot of space because your row size is so
small.

Maybe write your own VFS for SQLite?

Definitely run sqlite3_analyzer before deciding anything.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup


On Tue, Dec 7, 2010 at 10:57 AM, Laszlo Nemeth wrote:

> Hi,
>
> I have a database (6 in fact) of high-frequency data
>
> create table eurusd (tick integer not null, bid float not null, ask
> float not null);
>
> with 80M records currently and growing, freshly inserted, no deletions
> will ever take place, the schema will never change, and neither of the
> fields can be null. The size is already 3.6G (I put an index on it
> after bulk insert), which is a bit too much for me considering that
> the existing infrastructure (ie binary file 12bytes per record) is
> 800M (which fits into memory and I don't even need to index).
>
> Having checked older posts on sqlite-users, I noticed that this issue
> comes up frequently and normally the problem is either wrong choice of
> datatypes (ie text instead of integer), or unnecessary indices. None
> of which applies here.
>
> Question: is it possible to recompile sqlite to force the
> representation of integers to be 4 bytes, and that of floats to be
> also 4 bytes. I would like to have no observable change in the
> behaviour of sqlite.
>
> I
> (1) am quite comfortable with the hacking,
> (2) understand that the database will no longer be platform
> independent, nor compatible with anything else,
> (3) tried to run the analyser to see if there is something fishy with
> half full pages, but it wants tcl8.6 and haven't gotten around to
> install it (will do),
> (4) also checked the file format document, but that didn't give me any
> immediate hint how to achieve what  I'd like, though I only skimmed it
> through.
>
> The point of doing this is that I get a smaller db, and I still get
> all the beautiful machinery built for sqlite.
>
> Any suggestions, a complete solution, or  a "no  that's not possible,
> because..." will be much appreciated.
> Thanks, Z
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] Order of UNION query results

2011-01-22 Thread Jim Wilcoxson
On Fri, Jan 21, 2011 at 6:53 PM, Josh Gibbs  wrote:

> Could someone please clarify for me if the the resulting order of a UNION
> query will come back with the left data first, then the right data in
> the case
> that no ordering has been defined for the query.
>
> My need is to have a parameter stored in a database, with an optional
> overriding parameter which should take precedence, such as:
>
> select value from param_overrides where key='setting' UNION
> select value from params where key='setting'
>
> I'd like the resulting recordset to always contain the override parameter
> first if it exists so I can simply use that value.
>
> Thanks, Josh
>

If you add a prio field (priority) to your param database, 0=highest
priority (override), 1=normal priority, you only need 1 table.  Then you can
say:

select value from params where key='setting' order by prio limit 1

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] stored procedures implementation for SQLite

2011-01-26 Thread Jim Wilcoxson
It looks interesting.  Should your except stmt reference apsw?  -Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


On Wed, Jan 26, 2011 at 11:38 AM, Chris Wolf  wrote:

>
>
> I know this is an old thread, but shortly after I read it, I attempted to
> implement
> stored procedures in SQLite!  I only did it to see if I could, not because
> I
> necessarily think it's a good idea...  It's very experimental and not fully
> implemented,
> but if anyone is interested, I checked in my work on GitHub, including
> pre-compiled
> binaries for MacOS and Linux.
>
>
> http://chriswolf.heroku.com/articles/2011/01/26/adding-stored-procedures-to-sqlite/#extended
>
> Regards,
>
> Chris Wolf
>
> BareFeetWare wrote:
> > On 13/11/2010, at 11:14 AM, Scott Hess wrote:
> >
> >> On Fri, Nov 12, 2010 at 5:50 AM, BareFeetWare <
> list@barefeetware.com> wrote:
> >>> IMO, if you're implementing database logic (ie constraints and
> triggers) in application code, then you're reinventing the wheel, making
> your package unnecessarily complex and grossly inefficient. If you're just
> using SQLite to store your data but doing all the logic outside of SQLite,
> then you might as well just be saving your data to a CSV file or XML. See my
> previous post for reasoning behind this.
> >> From http://www.sqlite.org/about.html :
> >>> Think of SQLite not as a replacement for Oracle but as a replacement
> for fopen()
> >
> > The full paragraph from that page reads:
> >
> >>> SQLite is an embedded SQL database engine. Unlike most other SQL
> databases, SQLite does not have a separate server process. SQLite reads and
> writes directly to ordinary disk files. A complete SQL database with
> multiple tables, indices, triggers, and views, is contained in a single disk
> file. The database file format is cross-platform - you can freely copy a
> database between 32-bit and 64-bit systems or between big-endian and
> little-endian architectures. These features make SQLite a popular choice as
> an Application File Format. Think of SQLite not as a replacement for Oracle
> but as a replacement for fopen()
> >
> > So I think it's referring to how SQLite stores its data in a local file,
> rather than on a remote server with which it communicates indirectly. ie
> "SQLite does not have a separate server process". In that way, SQLite is
> like fopen rather than Oracle. The same paragraphs mentions SQLite
> "triggers, and views", freely copying a [self contained] SQLite database
> between architectures, which allude to my point about putting the logic in
> the database itself so you can move the whole database between
> architectures.
> >
> >> So, yes, you might as well just be saving your data to a CSV or XML
> file.  And I'm sure if you had a package to do that, someone would be
> arguing about whether your XML should allow for embedded transforms.
> >
> > What do you gain by implementing database logic in the application layer,
> when it could be done far more efficiently and reliably in the SQL schema?
> The only thing I can think of is avoiding the (shallow) learning curve. Why
> re-invent and roll your own integrity checking etc when it's already
> available and in a way much closer to the data than your application code
> can get?
> >
> > See my previous post for the contrary argument:
> >
> http://sqlite.org:8080/cgi-bin/mailman/private/sqlite-users/2010-October/025096.html
> >
> > Basically, database logic in the database itself is much faster, causes
> less error, optimises queries, makes the database portable between
> application environments or GUI apps. What's not to love?
> >
> > Thanks,
> > Tom
> > BareFeetWare
> >
> > --
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] Trigger for incrementing a column is slow

2011-02-03 Thread Jim Wilcoxson
On Thu, Feb 3, 2011 at 5:07 PM, Kevin Wojniak  wrote:

>
> On Feb 3, 2011, at 11:41 AM, Petite Abeille wrote:
>
> > On Feb 3, 2011, at 6:53 PM, Kevin Wojniak wrote:
> >
> >> The trigger is ran once via sqlite3_exec();
> >
> > Hmm... you mean the trigger is run every single time you perform an
> insert, no?
>
> Yes. I should say the trigger is created once via sqlite3_exec().
>
> >> Any insight as to why the trigger is significantly slower?
>

Perhaps SQLite is having to recompile the trigger SQL on every use, whereas
your update stmt is prepared.  I tried triggers once, and they were slow for
me too.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] Bi-directional unique

2011-02-09 Thread Jim Wilcoxson
If you don't care about the order and the integers are smallish, like 31
bits or less, I'd do this:

create table t(k int primary key);

i = whatever
j = whatever
if i < j:
   k = i<<32 | j
else:
   k = j<<32 | i
insert k into table

To see if a pair is in the table, do the same steps and lookup k.
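For example, in Python (a minimal sketch, assuming both values fit in 31
bits as above):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY)")

def pair_key(i, j):
    # Order-independent key: the smaller value goes in the high 32 bits.
    lo, hi = (i, j) if i < j else (j, i)
    return (lo << 32) | hi

conn.execute("INSERT INTO t VALUES (?)", (pair_key(1, 2),))

# (2, 1) maps to the same key, so the primary key rejects the duplicate.
try:
    conn.execute("INSERT INTO t VALUES (?)", (pair_key(2, 1),))
except sqlite3.IntegrityError:
    print("pairing 1-2 already exists")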

If you do care about the order, you can add k as primary key to the table
layout you mentioned with i and j.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


On Wed, Feb 9, 2011 at 1:12 PM, Black, Michael (IS)
wrote:

> I have a need to create a unique bi-directional relationship.
>
> You can think of it as pairings of people who eat dinner together.
>
>
>
> create table t(i int, j int);
>
> insert into t(1,2);
>
> insert into t(2,1); << should give an error because the pairing of 1-2
> already exists.
>
> insert into t(3,2); << OK
>
> insert into t(3,1); << OK
>
> insert into t(1,3); << should be error
>
>
>
> You can't guarantee that one column is less than the other so there's no
> win there.
>
>
>
> Speed is of the utmost concern here so fast is really important (how many
> ways can I say that???).
>
>
>
> Is there anything clever here that can be done with indexes or such?
>
>
>
>
>
> Michael D. Black
>
> Senior Scientist
>
> NG Information Systems
>
> Advanced Analytics Directorate
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


Re: [sqlite] Bi-directional unique

2011-02-09 Thread Jim Wilcoxson
On Wed, Feb 9, 2011 at 5:25 PM, Simon Slavin  wrote:

>
> On 9 Feb 2011, at 7:32pm, Black, Michael (IS) wrote:
>
> > Didn't we just determine a couple of days ago that triggers were
> performance killers?
> >
> > That's one reason I want to avoid those.
>
> Okay, then since your program knows the logic of how that table will be
> used, it can do it for itself.  Just do two INSERTS.
>
> Alternatively, if your language is capable of it, simply well-order the two
> personIDs, so that the one in one column is always smaller than the one in
> the other column.  Then change your SELECT logic so that it returns the
> UNION of SELECTing on both columns.
>
> One system speeds up the INSERTs, the other speeds up the SELECTs.  It
> depends which is the  more important to you.
>

I assumed you could generate k in a procedural language outside of SQL, but
if you want to do this purely in SQL, I think you can just say:

create table t(k int primary key)
insert into t values (min(?,?)<<32 | max(?,?))

and bind i,j,i,j to the parameters.

For the select, same thing:

select * from t where k=min(?,?)<<32 | max(?,?)
and bind i,j,i,j

I don't see the need to do 2 selects or 2 inserts, but maybe I'm not
understanding something.

Jim
---
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-11 Thread Jim Wilcoxson
On Fri, Feb 11, 2011 at 10:50 PM, Thomas Fjellstrom <
tfjellst...@strangesoft.net> wrote:

> I've seen numbers on the internet stating that sqlite is super fast, should
> be
> able to handle tens of thousands of inserts or updates in a second or two,
> even in sync mode. So I'm a bit stumped as to why its performance is so low
> in
> my case. One time, I saw it take 5 seconds to flush almost 3k rows back to
> the
> db.
>
> No hosts are actually ever removed from the database, which has about 120k
> hosts in it by now, totaling up to a file size of around 7-8MB total. So
> its
> not a lot of data, and the queries are pretty simple. So I really am
> stumped.
>

You mentioned your db is 8mb.  If you are using the default page size of 1k,
that means you have 8k pages in your db.  The default cache size is 2000
pages, so your db doesn't fit into SQLite's default cache.  I'd suggest
using pragma page_size=4096.  This will have the effect of increasing your
cache size from 2mb to 8mb and prevent cache overflow.  You might want to
use pragma cache_size=3000 as a safety margin.  This would be a 12MB cache
with 4KB pages.
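
For example, roughly like this from Python (the file name is made up; note that page_size only takes effect on a new database or after a VACUUM, and cache_size has to be set on each connection):

import sqlite3

con = sqlite3.connect("hosts.db")          # hypothetical database file
con.execute("pragma page_size=4096")       # applies on first use or after VACUUM
con.execute("vacuum")                      # rebuild an existing db with the new page size
con.execute("pragma cache_size=3000")      # 3000 pages * 4KB = roughly 12MB of cache
print(con.execute("pragma page_size").fetchone()[0])
print(con.execute("pragma cache_size").fetchone()[0])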

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Jim Wilcoxson
On Sat, Feb 12, 2011 at 9:10 AM, Black, Michael (IS)  wrote:

> Here's a little benchmark program I wrote to test a super-simple
> one-integer insert to test sql speed and commit interval behavior.
>
> Running it on my machine (Windows XP64 8-core 3Ghz gcc (tdm64-1) 4.5.1) I
> go from 320M inserts per second to 361K inserts per second when no
> begin/commit occurs.  With WAL mode turned on it only drops to 5.9M inserts
> per second.
>
> D:\SQLite>batch 5000 5000
> 32000.0 inserts per sec
> D:\SQLite>batch 5000 2500
> 32000.0 inserts per sec
> D:\SQLite>batch 5000 1200
> 16000.0 inserts per sec
> D:\SQLite>batch 5000 600
> 16000.0 inserts per sec
> D:\SQLite>batch 5000 300
> 10666.7 inserts per sec
> D:\SQLite>batch 5000 150
> 5333.3 inserts per sec
> D:\SQLite>batch 5000 75
> 3200.0 inserts per sec
> D:\SQLite>batch 5000 40
> 1777.8 inserts per sec
> D:\SQLite>batch 5000 20
> 1000.0 inserts per sec
> D:\SQLite>batch 5000 10
> 333.3 inserts per sec
> D:\SQLite>batch 5000 5
> 2406015.0 inserts per sec
> D:\SQLite>batch 5000 2
> 526315.8 inserts per sec
> D:\SQLite>batch 5000 1
> 360766.6 inserts per sec
>

Unless I'm missing something, SQLite has to update the first page of the
database on every commit, to update the change counter.  Assuming you are
using rotating media, that record can only be updated 120 times per second,
maximum, on a 7200RPM drive.

I don't understand how you can do 360K commits per second if your system is
actually doing "to the platter" writes on every commit.  Can someone clue me
in?

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jim Wilcoxson
On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov <max.vla...@gmail.com> wrote:

> On Sat, Feb 12, 2011 at 7:11 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>
> >
> >
> > Unless I'm missing something, SQLite has to update the first page of the
> > database on every commit, to update the change counter.  Assuming you are
> > using rotating media, that record can only be updated 120 times per
> second,
> > maximum, on a 7200RPM drive.
> >
> >
>
> Hmm, I think there's something wrong with this theory of a single flush per
> round
>
> Yesterday I posted here results about from tests that ended with something
> like 50 commits per second in case of single flush (journal_mode=off). I
> decided to run a simple program that writes a byte to a file and does
> windows FlushFileBuffers. It reported 53 writes per second. This was
> expected results if this theory is right. But when I connected an external
> drive (WD1200E1MS, bus-powered, it seems it is even 5400 rpm), this number
> jumped to 253. I though "Ok, something, maybe os maybe drive tried to fool
> me, let's do this other way". I did the following:
>
> -  Checked the removable drive "Quick removaI" is on so no write cache on
> os
> side
> -  opened the same test db on this drive, journal_mode=off for best
> scenario
> -  Executed 10,000 updates to a single record made as separated
> transactions
> ( UPDATE TestTable SET Id=Id+1 )
> - When the program reported ok, I quickly unplugged the usb cord. The
> device
> is bus-powered so had no power to complete any postponed operation. The
> total time was about 50 seconds and to do the real quick unplug I prepared
> my hands and the delay was no more than 200 ms (subjectively).
> - Replug the drive again and opened the db in the tool to see was there any
> corruption or other lost of data.
>
> So the final results:
> - the db was always ok and contains the correct value (id=10001 for initial
> 1).
> - the speed was about 227 commits per second so very close to my system
> flush test  (253)
>
> So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
> it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> without
> prior notice), it's still twice as much as 7200/60=120.
>

If we go with the theory that a single record/sector can only be written
once per revolution, that means writes can occur 90 times per second, or
about every 11ms on your 5400 rpm drive.  So for this test to show
corruption, you'd have to unplug the drive within 11ms of the last commit,
which I think is practically impossible.

My hypothesis to explain the behavior you are seeing is that it takes 4ms
(1/253) for your OS to flush its buffers to the hard drive cache, issue the
sync to the drive, and the hard drive to acknowledge the sync.  When it is
convenient, the drive will write this data to the disk.  The sync command
may make this more urgent than usual, which would explain why I/O slows down
if sync is used.  Or, the drive could simply have a sync timer: the first
sync command starts the timer, and when 4ms have passed, the drive dumps its
cache to the platters.

Jim
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jim Wilcoxson
On Sun, Feb 13, 2011 at 7:07 AM, Jean-Christophe Deschamps  wrote:

>
> >So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
> >it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> >without
> >prior notice), it's still twice as much as 7200/60=120.
>
> 5400/60, 7200/60 ... those values rely on the assumption that
> successive LBAs are mapped to successive physical sectors (512 or 4K,
> whatever) on the same face of the same plater.  Is it obvious that all
> today's typical stock drives actually implement only that simple old
> scheme and not an untold mix of various interleaving techniques?
>


This is true for the case of writing multiple records or multiple sectors.
For example, if you have a drive with 5000 sectors per track and you write
sector 1 of the track with a hard sync, you may have time to write sector
2500 with a hard sync in the same revolution.  Or maybe you can write every
500 sectors with a hard sync in the same revolution, giving you a commit
rate of 10 per revolution or 900 commits per second on a 5400 rpm drive.

But what I postulate is that you can't physically write *the same* record
over and over more than 90 times per second on a 5400 rpm drive, unless the
drive, OS, or filesystem implements something like wear-leveling, where the
physical location of sectors is constantly changing.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jim Wilcoxson
On Sun, Feb 13, 2011 at 11:55 AM, Max Vlasov <max.vla...@gmail.com> wrote:

> On Sun, Feb 13, 2011 at 5:31 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>
> > On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov <max.vla...@gmail.com>
> wrote:
> > > So the final results:
> > > - the db was always ok and contains the correct value (id=10001 for
> > initial
> > > 1).
> > > - the speed was about 227 commits per second so very close to my system
> > > flush test  (253)
> > >
> > > So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even
> if
> > > it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> > > without
> > > prior notice), it's still twice as much as 7200/60=120.
> > >
> >
> > My hypothesis to explain the behavior you are seeing is that it takes 4ms
> > (1/253) for your OS to flush its buffers to the hard drive cache, issue
> the
> > sync to the drive, and the hard drive to acknowledge the sync.  When it
> is
> > convenient, the drive will write this data to the disk.  The sync command
> > may make this more urgent than usual, which would explain why I/O slows
> > down
> > if sync is used.  Or, the drive could simply have a sync timer: the first
> > sync command starts the timer, and when 4ms have passed, the drive dumps
> > its
> > cache to the platters.
> >
>
>
> Jim, I see your point,  maybe really for removable media on Windows sync is
> still lazy, but much less "lazy" then general cached operations. Another
> version is that removable hds can report that they wrote some data while
> actually work as your described postonning it a little.
>
> But I thought about how it would be possible to test this explanation . I'm
> going to do some tests that works like this. The same id updating, but in
> the middle of 10,000 operation I will unplug the cord, the sqlite will say
> that it can't commit particular insert and I can see what is the value of
> this insert. After replugging two variants possible:
> - I will find the previous value in the base. If for several dozens tests
> the result is the same, that would mean that the situation is still needed
> to be explained.
> - I will find some value less than previous to the one sqlite could not
> commit at least in some of the tests (maybe -3, -4 to the failed one). In
> this case the explanation will be confirmed.
>
> How about this? Does it sound reasonable?
>
>
Sounds like a clever test to me!

I also found this page, used to force a Linux system crash:

http://evuraan.blogspot.com/2005/06/how-to-force-paniccrash-in-linux.html

I seem to remember a post that SQLite commit/sync is tested with the kill
command, but it seems like actually crashing the machine (which is done
without a sync first) might be a better test.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>
> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>
> > > But I thought about how it would be possible to test this explanation .
> > I'm
> > > going to do some tests that works like this. The same id updating, but in
> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
> > say
> > > that it can't commit particular insert and I can see what is the value of
> > > this insert. After replugging two variants possible:
> > > - I will find the previous value in the base. If for several dozens tests
> > > the result is the same, that would mean that the situation is still
> > needed
> > > to be explained.
> > > - I will find some value less than previous to the one sqlite could not
> > > commit at least in some of the tests (maybe -3, -4 to the failed one). In
> > > this case the explanation will be confirmed.
> > >
> > > How about this? Does it sound reasonable?
> > >
> > >
> > Sounds like a clever test to me!
> >
> >
> >
> Thanks for supporting :) now the results.
>
> I switched off the journal:
>  PRAGMA journal_mode=OFF;
> As I described, the only record contained id=1 and the sql query was
>  UPDATE TestTable SET Id=Id + 1
> The cord was unplugged in the middle of the 10,000 operations when about
> 5000 records were updated. The hd was bus-powered external hard drive and I
> repeated the test several times. No test showed expected value that confirms
> the Jim's explanation about postponed writing. The actual values are below
> (the first one is the expected value to be found after reopening, the next
> one is actual value and the difference is self-explainable)
>
> 5094 -> 5086 = -8
> 5084 -> 5083 = -1
> 5070 -> 5049 = -21
> 5082 -> 5069 = -13
> 5095 -> 5086 = -9
> 5072 -> 5033 = -39
> 5118 -> 5053 = -65
> 5081 -> 5075 = -6
>
> So the maximum of non-flushed commits was 65
>
> I also made a couple of tests with journal on and see what is the difference
> between expected, non-restored value, and restored value. One of the
> results:
> 5078 (expeced) -> 5077 (non restored) -> 5021 (restored)
> It seems that for correctly operating os/hardware the numbers should be
> 5078->5077->5077 or no journal present depending on the moment of
> unplugging. So this postponing also made possible existing of some prior
> version of journal file.
>
> So, the next question was 'where?'. Is this software or hardware to blame.
> Richard always mentioned hardware in this context, I decided to check this
> with another device. This time it was a harddisk box with separated bus and
> power (the hd was Maxtor IDE 80Gb drive).
>
> The first variation was similar to the one with bus-powered, this time I
> used hard button on the box that is equivalent to unplugging both connection
> and power. The difference for a single test was really impressive 5355 ->
> 4445 = -910. And when I calculated numbers for non-interrupted test the
> drive showed about 300 commits per second.
>
> The second variation was just unplugging the cord but keeping the power
> intact, so if it's drive that caches, it would end its operations
> completely. This time the results were perfect, for example 4822 -> 4822,
> and even 5371 -> 5372 = +1 that actually would mean the process was
> interrupted after all data is written but before sqlite made winsync, os
> reported failure, but the data was already there.
>
> So the sad news about faulty hardware is probably true once again.

Hi Max - this turned out to be a good test you devised!  You know, you
can never learn *too* much. :)

Your test confirms that the USB drive is saying data is written to the
drive when it really isn't, which seems bad.  But it got me thinking a
little more.  Your drive for sure has lost the Durability property of
ACID: when SQLite comes back from a commit, the data just written may
or may not  be on the drive, as you proved.

So my question is, does it maintain the other 3 parts of ACID, so that
the database will never be in a corrupted state after a power loss,
even though it may be missing some writes that were confirmed?  There
are many use cases where a drive like yours would be usable by SQLite:
it would be as if the async vfs thing was being used.  But if using
this drive could lead to a corrupted database, the drive isn't safe to
use at all with SQLite, IMO.

It would be interesting to know the difference, and even better, if
SQLite or an SQLite app could somehow run a test on a drive to figure
out whether the drive is:

a) perfectly safe: every commit is on the 

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when database and journal were different not by one
> transaction but by several ones. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which makes multi-page
> journal files) then these pages can be way out of sync with each other
> (including pages inside journal). And this will easily lead to
> database corruption.
>
>
> Pavel

Hmm... I guess I am not quite convinced. :)

I believe (though am not sure) that the drive can reorder individual
sector/record writes between syncs however it likes, as long as it
doesn't allow writes to cross a sync and also executes the sync groups
(for lack of a better term - the group of writes between syncs) in
order.

As an example, say you are inserting records on each commit instead of
updating.  Maybe the last record inserted was 1000 and the plug is
pulled.  When plugged back in, if you only have 800 records in the db,
it's not so bad.  However, if there is a corrupt db instead, it is
much worse.

Maybe Max would be kind enough to re-run his test with an insert
instead of an update and see if there ever is corruption or if just
the last n inserts are missing.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>>>
>>> > > But I thought about how it would be possible to test this explanation .
>>> > I'm
>>> > > going to do some tests that works like this. The same id updating, but 
>>> > > in
>>> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
>>> > say
>>> > > that it can't commit particular insert and I can see what is the value 
>>> > > of
>>> > > this insert. After replugging two variants possible:
>>> > > - I will find the previous value in the base. If for several dozens 
>>> > > tests
>>> > > the result is the same, that would mean that the situation is still
>>> > needed
>>> > > to be explained.
>>> > > - I will find some value less than previous to the one sqlite could not
>>> > > commit at least in some of the tests (maybe -3, -4 to the failed one). 
>>> > > In
>>> > > this case the explanation will be confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> > >
>>> > >
>>> > Sounds like a clever test to me!
>>> >
>>> >
>>> >
>>> Thanks for supporting :) now the results.
>>>
>>> I switched off the journal:
>>>  PRAGMA journal_mode=OFF;
>>> As I described, the only record contained id=1 and the sql query was
>>>  UPDATE TestTable SET Id=Id + 1
>>> The cord was unplugged in the middle of the 10,000 operations when about
>>> 5000 records were updated. The hd was bus-powered external hard drive and I
>>> repeated the test several times. No test showed expected value that confirms
>>> the Jim's explanation about postponed writing. The actual values are below
>>> (the first one is the expected value to be found after reopening, the next
>>> one is actual value and the difference is self-explainable)
>>>
>>> 5094 -> 5086 = -8
>>> 5084 -> 5083 = -1
>>> 5070 -> 5049 = -21
>>> 5082 -> 5069 = -13
>>> 5095 -> 5086 = -9
>>> 5072 -> 5033 = -39
>>> 5118 -> 5053 = -65
>>> 5081 -> 5075 = -6
>>>
>>> So the maximum of non-flushed commits was 65
>>>
>>> I also made a couple of tests with journal on and see what is the difference
>>> between expected, non-restored value, and restored value. One of the
>>> results:
>>> 5078 (expeced) -> 5077 (non restored) -> 5021 (restored)
>>> It seems that for correctly ope

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when database and journal were different not by one
> transaction but by several ones. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which makes multi-page
> journal files) then these pages can be way out of sync with each other
> (including pages inside journal). And this will easily lead to
> database corruption.

You are right I think.  I wrote my own test program and ran it on a
Macbook Pro with a Seagate USB drive.  Here is the Python test
program:

import os, sqlite3, time

path = '/Volumes/TM/file.db'
exists = os.path.exists(path)
con = sqlite3.connect(path)
if exists:
    k = con.execute('select max(k) from t').fetchone()[0]
    print 'Previous run:', k
    con.execute('drop table t')
    con.commit()
con.execute('create table t (k integer primary key)')

t0 = time.time()
try:
    for i in range(5000):
        con.execute('insert into t values(?)', (i,))
        con.commit()
    con.close()
except Exception, err:
    print err, 'at record', i

et = time.time() - t0
print i, 'records inserted in', et, 'seconds;', i/et, 'recs/sec'


Here are the results:

1. First run, no unplug:

[jim@mb backup]$ time py t.py
4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec

real0m17.794s
user0m0.729s
sys 0m2.139s

2. Unplugged during the run:

[jim@mb backup]$ time py t.py
Previous run: 4999
disk I/O error at record 2681
2681 records inserted in 9.2398519516 seconds; 290.156164194 recs/sec

real0m9.294s
user0m0.410s
sys 0m1.193s

3. This shows that only 161 records made it to disk in the previous
run, not 2681:

[jim@mb backup]$ time py t.py
Previous run: 161
disk I/O error at record 977
977 records inserted in 3.34149599075 seconds; 292.384010845 recs/sec

real0m3.453s
user0m0.162s
sys 0m0.432s

4. This time there was no unplug, but SQLite could not recover the db:

[jim@mb backup]$ time py t.py
Previous run: 161
database disk image is malformed at record 1323
1323 records inserted in 4.26392698288 seconds; 310.277358246 recs/sec

real0m4.341s
user0m0.193s
sys 0m0.535s

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
>
> Pavel
>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson <pri...@gmail.com> wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov <max.vla...@gmail.com> wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>>>
>>> > > But I thought about how it would be possible to test this explanation .
>>> > I'm
>>> > > going to do some tests that works like this. The same id updating, but 
>>> > > in
>>> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
>>> > say
>>> > > that it can't commit particular insert and I can see what is the value 
>>> > > of
>>> > > this insert. After replugging two variants possible:
>>> > > - I will find the previous value in the base. If for several dozens 
>>> > > tests
>>> > > the result is the same, that would mean that the situation is still
>>> > needed
>>> > > to be explained.
>>> > > - I will find some value less than previous to the one sqlite could not
>>> > > commit at least in some of the tests (maybe -3, -4 to the failed one). 
>>> > > In
>>> > > this case the explanation will be confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> > >
>>> > >
>>> > Sounds like a clever test to me!
>>> >
>>> >
>>> >
>>> Thanks for supporting :) now the results.
>>>
>>> I switched off the journal:
>>>  PRAGMA journal_mode=OFF;
>>> As I described, the only record contained id=1 and the sql query was
>>>  UPDATE TestTable SET Id=Id + 1
>>> The cord was unplugged in the middle of the 10,000 operations when about
>>> 5000 records were updated. The hd was bus-powered external hard drive and I
>>> repeated the test several times. No test showed

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 3:02 PM, Max Vlasov <max.vla...@gmail.com> wrote:
> On Mon, Feb 14, 2011 at 8:42 PM, Jim Wilcoxson <pri...@gmail.com> wrote:
>
>> On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
>> >> So my question is, does it maintain the other 3 parts of ACID, so that
>> >> the database will never be in a corrupted state after a power loss,
>> >> even though it may be missing some writes that were confirmed?
>> >
>> > Jim, I think the answer to your question is already in Max's tests:
>> > the USB drive is completely unreliable and can easily lead to database
>> > corruption. I'll explain. Max's tests showed that there were
>> > situations when database and journal were different not by one
>> > transaction but by several ones. So if one makes changes to several
>> > database pages (located in different disk sectors) and/or makes
>> > transactions touching several database pages (which makes multi-page
>> > journal files) then these pages can be way out of sync with each other
>> > (including pages inside journal). And this will easily lead to
>> > database corruption.
>>
>> You are right I think.  I wrote my own test program and ran it on a
>> Macbook Pro with a Seagate USB drive.  Here is the Python test
>> program:
>>
>> [jim@mb backup]$ time py t.py
>> 4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec
>>
>>
>
> Jim, your tests also shows (this time on a different os) that either you
> have a fantastic hd with 18000 Rpm or just someone at Seagate _want_ you to
> think you have a fantastic hd :)
>
> Just wondering, I know this maybe sounds fantastic, but I'm thinking whether
> some "acid-passed harddrives" at sqlite.org can encourage manufacturers to
> hold the horses. The logic would be like this: if some model is present in
> either section then googling it will make this page very high in the google
> results  (due to high pagerank of sqlite.org). So they probably very quickly
> notice that this page at least partly can affect their sales. Unfortunately
> the technical side is more complex, the developers just can't rely on
> e-mails from users, this should be some sqlite-originated tests performing
> on a known configuration and it'd better be an oss os with known tuning.
> Maybe some other, less fantastic form of such tests could be possible...
>
> Max

I think what would be useful is for drives to have 3 settings:

a) the drive can do whatever it wants to optimize performance
b) the drive can reorder writes, but not across a sync (ACI, but no D)
c) the drive has to respect all syncs (ACID)

If the drive mgfr wants to make a) the default, that's fine, but there
should be an easy way to request the other 2 from an application.
Users are not usually sophisticated enough to know when it's okay to
cache writes or not okay.  For my use of SQLite, b) would probably be
fine, but a) apparently is not since it corrupts databases.

Michael mentioned doing backups and forgetting about all of this, but
many people are using USB drives to store their backups.  So there is
a bit of a Catch-22 here.  Almost all modern-day filesystems are going
to depend on some kind of journalling to prevent corrupted file
systems, and as far I as know, journalling filesystems depend on syncs
to maintain FS consistency.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jim Wilcoxson
This is a common issue on the mailing list.  The first time you do
count(*), SQLite (actually your OS) has to load data into memory.
Most OS's will keep the file in a buffer cache, so the 2nd count(*)
doesn't have to read from disk.

Here's a timing from my own system, after a purge command to clear the
buffer cache:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real0m0.580s
user0m0.190s
sys 0m0.034s

Same command again, with the file cached:

$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real0m0.189s
user0m0.165s
sys 0m0.019s

This time is consistent no matter how many times I run it, because the
file is still cached.  Doing a purge command to clear the cache and
re-running the query, we get:

$ purge
$ time sqlite3 hb.db 'select count(*) from logs'
-- Loading resources from /Users/jim/.sqliterc
count(*)
--
734909

real0m0.427s
user0m0.175s
sys 0m0.024s

On my system, there is not a huge difference, but it is consistent.
Now, if you have a fragmented file system, you will see a much larger
difference.  There are many posts on the mailing list about both file
system fragmentation and logical fragmentation within the SQLite file
itself.  Your first count(*) is subject to these fragmentation
effects, while your 2nd usually is not, because the file is in memory.

Some people on the list believe fragmentation is an unimportant detail
you shouldn't worry about, because you have little control over it.
That may be true, but it's useful to understand how it can affect
performance.  I think you are seeing this first hand.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com



On Mon, Feb 21, 2011 at 9:37 AM, Sven L  wrote:
>
> Same result :(
> Note that I have compiled SQLite with the following switches:
> SQLITE_ENABLE_STAT2
> SQLITE_THREADSAFE=2
>
> I've learnt that COUNT(*) is slower than COUNT(ID), since * means the engine 
> has to traverse all columns and it might even return another value if there 
> are NULL-values...
>
> Also, this is quite interesting:
>
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
> 0|0|0|SCAN TABLE Items (~100 rows)
> sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
> sqlite>
>
> I would expect an index scan on the first statement. The second statement 
> tells me nada?!
>
> Thanks for your help!
>
>
>> From: slav...@bigfraud.org
>> Date: Mon, 21 Feb 2011 14:24:50 +
>> To: sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>>
>> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>>
>> > SELECT COUNT(ItemID) FROM Items;
>> >
>> > This takes around 40 seconds the first time! WHY?!
>>
>> Try again, doing everything identically except that instead of the above 
>> line use
>>
>> SELECT COUNT(*) FROM Items;
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jim Wilcoxson
On Mon, Feb 21, 2011 at 10:38 AM, Puneet Kishor <punk.k...@gmail.com> wrote:
> On Mon, Feb 21, 2011 at 10:17:03AM -0500, Jim Wilcoxson wrote:
>> This is a common issue on the mailing list.  The first time you do
>> count(*), SQLite (actually your OS) has to load data into memory.
>> Most OS's will keep the file in a buffer cache, so the 2nd count(*)
>> doesn't have to read from disk.
>
>
> One question I have related to the above -- how long does that buffer
> cache remain filled with the data? I am assuming it is until new stuff
> to be cached pushes out old stuff, no?

For most OS's, the time data remains in the cache and the size of the
cache will vary as a function of available RAM.

>
> I was doing some R*Tree selects, and the first query was dog slow,
> although benchmarking showed that the actual CPU time was very small.
> Subsequent queries were lightning fast. I am assuming that the buffer is
> not getting filled with the results as much as it is getting filled with
> whatever part of the db that the program needs to open to do its work.

Right.  SQLite doesn't cache query results.  It does cache database
pages in its own cache, which by default is rather small: 2000 pages.
At the default page size is 1K, that's a 2MB cache.  To fetch records
not in it's own cache, SQLite will use the OS.  If the page is in the
OS cache, there is no seek time and no read latency (for spinning
media).

Jim

>
>
>>
>> Here's a timing from my own system, after a purge command to clear the
>> buffer cache:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real  0m0.580s
>> user  0m0.190s
>> sys   0m0.034s
>>
>> Same command again, with the file cached:
>>
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real  0m0.189s
>> user  0m0.165s
>> sys   0m0.019s
>>
>> This time is consistent no matter how many times I run it, because the
>> file is still cached.  Doing a purge command to clear the cache and
>> re-running the query, we get:
>>
>> $ purge
>> $ time sqlite3 hb.db 'select count(*) from logs'
>> -- Loading resources from /Users/jim/.sqliterc
>> count(*)
>> --
>> 734909
>>
>> real  0m0.427s
>> user  0m0.175s
>> sys   0m0.024s
>>
>> On my system, there is not a huge difference, but it is consistent.
>> Now, if you have a fragmented file system, you will see a much larger
>> difference.  There are many posts on the mailing list about both file
>> system fragmentation and logical fragmentation within the SQLite file
>> itself.  Your first count(*) is subject to these fragmentation
>> effects, while your 2nd usually is not, because the file is in memory.
>>
>> Some people on the list believe fragmentation is an unimportant detail
>> you shouldn't worry about, because you have little control over it.
>> That may be true, but it's useful to understand how it can affect
>> performance.  I think you are seeing this first hand.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Mon, Feb 21, 2011 at 9:37 AM, Sven L <larvpo...@hotmail.se> wrote:
>> >
>> > Same result :(
>> > Note that I have compiled SQLite with the following switches:
>> > SQLITE_ENABLE_STAT2
>> > SQLITE_THREADSAFE=2
>> >
>> > I've learnt that COUNT(*) is slower than COUNT(ID), since * means the 
>> > engine has to traverse all columns and it might even return another value 
>> > if there are NULL-values...
>> >
>> > Also, this is quite interesting:
>> >
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(ItemID) FROM Items;
>> > 0|0|0|SCAN TABLE Items (~100 rows)
>> > sqlite> EXPLAIN QUERY PLAN SELECT COUNT(*) FROM Items;
>> > sqlite>
>> >
>> > I would expect an index scan on the first statement. The second statement 
>> > tells me nada?!
>> >
>> > Thanks for your help!
>> >
>> >
>> >> From: slav...@bigfraud.org
>> >> Date: Mon, 21 Feb 2011 14:24:50 +
>> >> To: sqlite-users@sqlite.org
>> >> Subject: Re: [sqlite] COUNT() extremely slow first time!
>> >>
>> >>
>> >> On 21 Feb 2011, at 2:23pm, Sven L wrote:
>> >>
>> >> > SELECT COUNT(ItemID) FROM

Re: [sqlite] COUNT() extremely slow first time!

2011-02-21 Thread Jim Wilcoxson
On Mon, Feb 21, 2011 at 11:05 AM, Sven L  wrote:
>
> Thank you for your detailed explanation!
> First, can you please tell me how to purge the cache in Windows 7? This could 
> be very useful for my tests!

Sorry, dunno for Windows.  On Mac OSX it is the purge command, in the
development tools.  On Linux, you do: echo 3 >
/proc/sys/vm/drop_caches

>
> I'm quite sure my database itself is not fragmented, since I have only 
> inserted data.

If you insert the data in primary key order, or you don't have any key
(SQLite generates one) it's probably not very fragmented.  But keep in
mind that while to you, things are happening sequentially as you add
records, inside SQLite, things are happening quite differently.  For
example, let's say you have 4 indexes on this database.  So for every
insert you do, there are 5 database pages being affected (one for the
data row, one for each index).  These pages will fill up at different
rates, depending on the size of your keys, and will be written to disk
at different times.  So you are very likely to have data pages and
various index pages all intermixed in your SQLite file.  Also with
multiple indexes, it's unlikely that every index will be in sorted
order, based on the records you are inserting.  So IMO, there's no
practical way to avoid fragmentation within the SQLite file.

> The file system is in good shape too; Windows reports 0% fragmentation. 
> Perhaps there is some other bottleneck, like disk performance in general 
> (this is a 5400 rpm)? Or antivirus-related CPU holds? (I did turn real-time 
> scanning off though.)
> I have even turned Windows Search off (and got a 20% performance gain!). My 
> 32-bit application is running under Windows 7 (64-bit). Could WOW64 have 
> something to do with this performance issue?
>
> The size of the database is 1,98 GB, with 1.5M rows. Couldn't it be that the 
> table scan simply is darn slow for huge tables?

How much free RAM do you have?  Windows being the pig that it is, I'm
guessing not much, unless your system has > 4GB of RAM.  For
comparison, you could write a program that just read through the file,
4K at a time (I think that's the default page size for SQLite on
Windows).  Starting with a cold cache, this is near the best time you
will ever see for count(*) in SQLite.
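
Something like this (a rough sketch; the path is made up) gives that baseline:

import time

path = "items.db"                    # hypothetical path to the SQLite file
t0 = time.time()
nbytes = 0
with open(path, "rb") as f:
    while True:
        chunk = f.read(4096)         # read in page-sized pieces
        if not chunk:
            break
        nbytes += len(chunk)
elapsed = max(time.time() - t0, 1e-6)
print("%d bytes in %.1f sec (%.1f MB/s)" % (nbytes, elapsed, nbytes / elapsed / 1e6))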

>
> In an ideal world the COUNT() would be performed on the primary key in RAM. 
> That's not possible? :P

Sure, if all of the primary key records are in either the SQLite cache
or OS cache.  If they aren't, you have seek time.

I just did a small test to compare count(*) with count(primary key).
Here is count(*):

sqlite> explain select count(*) from logs;
addr  opcode         p1    p2    p3    p4                 p5  comment
----  -------------  ----  ----  ----  -----------------  --  -------
0     Trace          0     0     0                        00
1     Goto           0     8     0                        00
2     OpenRead       1     12    0     keyinfo(1,BINARY)  00
3     Count          1     1     0                        00
4     Close          1     0     0                        00
5     SCopy          1     2     0                        00
6     ResultRow      2     1     0                        00
7     Halt           0     0     0                        00
8     Transaction    0     0     0                        00
9     VerifyCookie   0     23    0                        00
10    TableLock      0     8     0     logs               00
11    Goto           0     2     0                        00

Here is count(primary key):

sqlite> explain select count(logid) from logs;
addr  opcode         p1    p2    p3    p4                 p5  comment
----  -------------  ----  ----  ----  -----------------  --  -------
0     Trace          0     0     0                        00
1     Null           0     2     0                        00
2     Null           0     1     0                        00
3     Goto           0     14    0                        00
4     OpenRead       0     8     0     0                  00
5     Rewind         0     9     0                        00
6     Rowid          0     3     0                        00
7     AggStep        0     3     1     count(1)           01
8     Next           0     6     0                        01
9     Close          0     0     0                        00
10    AggFinal       1     1     0     count(1)           00
11    SCopy          1     4     0                        00
12    ResultRow      4     1     0                        00
13    Halt           0     0     0                        00
14    Transaction    0     0     0                        00
15    VerifyCookie   0     23    0                        00
16    TableLock      0     8     0     logs               00
17    Goto           0     4     0                        00

It's not clear to me whether for count(*), SQLite is reading each data
page or reading the primary key index pages.  But when timed, count(*)
is 16x faster than count(logid) the first time, and 8x faster once all
pages are cached.  This is just 

Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Jim Wilcoxson
The SQLite cache size is in pages (2000 by default), so by increasing
the page size 8x, you're also increasing the cache size and memory
requirements by 8x.  Not saying it's a bad thing, just something to be
aware of.

If you want to compare 1K and 8K page size and only compare the effect
page size has, you should either increase the cache size to 16000 for
1K pages or decrease the cache to 250 for 8K pages.
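
Something like this (untested sketch, made-up file names) keeps the cache at the same number of bytes while varying the page size:

import sqlite3

CACHE_BYTES = 16 * 1024 * 1024            # hold the cache at ~16MB in both cases

def open_db(path, page_size):
    con = sqlite3.connect(path)
    # page_size only matters for a new db (or after VACUUM); cache_size is per connection
    con.execute("pragma page_size=%d" % page_size)
    con.execute("pragma cache_size=%d" % (CACHE_BYTES // page_size))
    return con

con_1k = open_db("test_1k.db", 1024)      # cache_size becomes 16384 pages
con_8k = open_db("test_8k.db", 8192)      # cache_size becomes 2048 pages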

The other thing to be aware of is that SQLite will not allow a row to
cross 2 pages.  (It does allow a row to be larger than a page, using
an overflow page.)  So for example, if your page size is 1024 and row
size is 512 bytes, you can fit 2 rows on a page.  I'm simplifying this
somewhat and ignoring internal SQLite data, but you get the idea.  If
your row size is 513 bytes, you will have 511 bytes of waste on each
page, so 50% of your database will be "air".  As your row size heads
toward 1024 there will be less waste.  At 1025 bytes, SQLite will
start splitting rows into overflow pages, putting 1024 bytes into the
overflow page and 1 byte in the btree page.  These numbers aren't
right, but illustrate the point.

So to find a good page size, experiment and measure.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com



On Wed, Feb 23, 2011 at 10:20 AM, Teg  wrote:
> Hello Greg,
>
> I found this to be the case too. The difference between 1K and 8K is
> staggering. I default all my windows DB's to 8K now.
>
>
> Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
>
> GB> I'm currently dealing with a similar issue. I've found that the page_size
> GB> PRAGMA setting can have a dramatic effect on how long it takes to "warm 
> up"
> GB> the table. On Windows 7, with page_size=1024, a SELECT COUNT(last_column)
> GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 
> takes
> GB> 8.5 seconds. This was done with a reboot between each test.
>
>
>
>
> --
> Best regards,
>  Teg                            mailto:t...@djii.com
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] COUNT() extremely slow first time!

2011-02-23 Thread Jim Wilcoxson
On Wed, Feb 23, 2011 at 11:12 AM, Sven L  wrote:
>
> Thanks for pointing this out!
>
> In my case I have spent much time on normalizing my tables, so the row size 
> should be constant in most cases. I do wonder though, what if the row size is 
> 32 bytes? Or is there a minimum?
>
> For instance, I have many lookup tables with ID+text (usually around 20 
> characters):
> MyID|MyText
>
> With a page size of 4096, will SQLite put ~200 rows in one page?

Yes, very roughly.  There is other internal information: a header on
each page, on each row, on each field, ints are variable length, etc.,
and SQLite reserves some free space on each page for later inserts.

Use sqlite3_analyzer for lots of useful info when picking a page size.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
>
>
>> Date: Wed, 23 Feb 2011 10:47:03 -0500
>> From: pri...@gmail.com
>> To: t...@djii.com; sqlite-users@sqlite.org
>> Subject: Re: [sqlite] COUNT() extremely slow first time!
>>
>> The SQLite cache size is in pages (2000 by default), so by increasing
>> the page size 8x, you're also increasing the cache size and memory
>> requirements by 8x. Not saying it's a bad thing, just something to be
>> aware of.
>>
>> If you want to compare 1K and 8K page size and only compare the effect
>> page size has, you should either increase the cache size to 16000 for
>> 1K pages or decrease the cache to 250 for 8K pages.
>>
>> The other thing to be aware of is that SQLite will not allow a row to
>> cross 2 pages. (It does allow a row to be larger than a page, using
>> an overflow page.) So for example, if your page size is 1024 and row
>> size is 512 bytes, you can fit 2 rows on a page. I'm simplifying this
>> somewhat and ignoring internal SQLite data, but you get the idea. If
>> your row size is 513 bytes, you will have 511 bytes of waste on each
>> page, so 50% of your database will be "air". As your row size heads
>> toward 1024 there will be less waste. At 1025 bytes, SQLite will
>> start splitting rows into overflow pages, putting 1024 bytes into the
>> overflow page and 1 byte in the btree page. These numbers aren't
>> right, but illustrate the point.
>>
>> So to find a good page size, experiment and measure.
>>
>> Jim
>> --
>> HashBackup: easy onsite and offsite Unix backup
>> http://www.hashbackup.com
>>
>>
>>
>> On Wed, Feb 23, 2011 at 10:20 AM, Teg  wrote:
>> > Hello Greg,
>> >
>> > I found this to be the case too. The difference between 1K and 8K is
>> > staggering. I default all my windows DB's to 8K now.
>> >
>> >
>> > Tuesday, February 22, 2011, 1:59:29 PM, you wrote:
>> >
>> > GB> I'm currently dealing with a similar issue. I've found that the 
>> > page_size
>> > GB> PRAGMA setting can have a dramatic effect on how long it takes to 
>> > "warm up"
>> > GB> the table. On Windows 7, with page_size=1024, a SELECT 
>> > COUNT(last_column)
>> > GB> takes 40.2 seconds. page_size=4096 takes 15.5 seconds. page_size=8192 
>> > takes
>> > GB> 8.5 seconds. This was done with a reboot between each test.
>> >
>> >
>> >
>> >
>> > --
>> > Best regards,
>> >  Teg                            mailto:t...@djii.com
>> >
>> > ___
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Commit frequency and performance

2009-02-04 Thread Jim Wilcoxson
I am creating an SQLite database via Python, and trying to understand
some performance issues.

This application does 3.8M inserts.   Most inserts are to a main
database that ends up being around 293MB.  Around 740K of the inserts
with larger data records are to 25 related databases of around 600MB
each.  The main database and 1 subdatabase are active together, then
the subdatabase is closed when it gets to around 600MB and a new one
is opened.  This is an archiving application.

As a performance baseline, I changed the application to just write out
plain sequential text files, and it takes around 62 minutes to
complete.  This includes all the Python overhead and the raw hard
drive overhead to write out the same amount of data in a "best case"
scenario:

  Time: 3754.16 seconds
  Files: 708120 Bytes: 31565490710

  real62m34.335s
  user53m55.492s
  sys 2m58.305s

If I use SQLite and commit every 5 seconds, I get this performance:

  Time: 11383.95 seconds
  Files: 708120 Bytes: 31565490710

  real189m45.061s
  user55m58.638s
  sys 4m46.528s

If I commit every 30 seconds, I get this:

  Time: 13021.34 seconds
  Files: 708120 Bytes: 31565490710

  real217m2.078s
  user56m9.647s
  sys 4m59.850s

I believe fsync/fdatasync are significant performance issues, so I
thought that it should improve performance if I start a transaction,
do ALL of the inserts, then commit.  Since I'm starting with an empty
database, the journal should stay mostly empty, the database will be
built without any syncs, and when I commit, the small journal will be
deleted.  I also added pragma synchronous=off.  (All of these tests are
run with pragma locking_mode=exclusive.)  But when I tried this, the
performance was slowest of all:

  Time: 15356.42 seconds
  Files: 708120 Bytes: 31565490710

  real255m57.523s
  user55m51.215s
  sys 4m22.173s

Can anyone shed light on why building a database inside a single
transaction would be slower than periodically committing?

Thanks,
Jim
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Hundreds of thousands of INSERTs

2009-02-17 Thread Jim Wilcoxson
I had a similar experience: changing my page size decreased the run
time of my SQLite application from 4 hours to 80 minutes (Linux).

I think it would be very good if the default page size was changed to
4096, at least for Linux builds.  It makes a huge difference in
performance.

Jim

On 2/17/09, Paolo Pisati  wrote:
> Alexey Pechnikov wrote:
>>
>> PRAGMA default_cache_size = 20;
>> pragma page_size=4096;
>> vacuum;
>>
>
> sqlite> pragma page_size;
> 4096
> sqlite> PRAGMA default_cache_size;
> 20
>
> p...@ferret:~/log_analyzer >date ; ./log_parser_ms_sqlite.pl 2
> vmta2-2009-02-16_1400-success.log.bz2.done.ec.ed ; date
> Tue Feb 17 15:07:40 CET 2009
> Tue Feb 17 15:13:22 CET 2009
>
> it's a 75% shave of application time, amazing!!!
>
> --
>
> bye,
> P.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow performance with Sum function

2009-03-04 Thread Jim Wilcoxson
Have you tried changing the page size to 4096 or 8192?  Doing this
with my SQLite application and increasing the transaction size
decreased runtime from over 4 hours to 75 minutes.  The runtime for
my app writing the same amount of data to flat files was 55 minutes,
so the time penalty for building a database was about 35%, which
seemed reasonable.

I haven't tried changing the cache size yet, because I like that my
app uses a small amount of memory.

Good luck!
Jim

On 3/4/09, Alexey Pechnikov  wrote:

> Can enough cache size prevent fragmentation? And how to calculate degree of
> fragmentation and when is needed make vacuum of database?
>
> Best regards.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite Transaction Rate and speed...

2009-03-07 Thread Jim Wilcoxson
If your goal is to handle 1 million inserts/updates per second, and a
hard transaction to disk takes 1/60th of a second, you need at least
16,667 (1,000,000 / 60) inserts/updates per transaction.

Do your testing with a transaction size of 20,000 and see what kind of
performance you get.  I'd probably set it higher, like to 100,000
transactions if you have a continuous stream of data coming in at this
rate.

You get no benefit from one insert/update per commit, unless you are
acking the transaction back to the source and it has retry/resend
capabilities.  In that case, you need a battery backed-up disk
controller, and still will probably benefit from a large transaction
size.  I sort of doubt you have retry/resend abilities though, because
for a data stream coming in at 192 MBit/sec (3 doubles = 192 bits),
any kind of latency for error recovery might put you in a situation
where you could never catch back up.

Also, if you use "insert or replace", you can avoid the select,
speeding up your app.
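
As a sketch (untested, and assuming each incoming sample simply replaces the stored row for its id; the names are made up):

import sqlite3

con = sqlite3.connect("samples.db", isolation_level=None)   # hypothetical db file
con.execute("create table if not exists samples "
            "(id integer primary key, a real, b real, c real)")

def flush(batch):
    # one transaction per batch; INSERT OR REPLACE overwrites any existing row
    # with the same id, so no SELECT is needed first
    con.execute("begin")
    con.executemany("insert or replace into samples values (?,?,?,?)", batch)
    con.execute("commit")

batch = []
for n in range(1000000):                  # stand-in for the incoming stream
    batch.append((n % 1000, 1.0, 2.0, 3.0))
    if len(batch) >= 20000:               # ~20,000 rows per commit, as suggested above
        flush(batch)
        batch = []
if batch:
    flush(batch)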

Jim


On 3/6/09, Nuzzi  wrote:
>
>
>
> ken-33 wrote:
>>
>>
>>
>> look at the sql syntax for insert or replace for sqlite.
>>
>> Also you goal to handle 1 million per minute is probably going to be
>> dependant upon your hardware.
>>
>> For instance throughput greatly increases with disk striping.
>>
>> Also the faster the RPM of the drive the more transactions can be
>> processed. Code it up and find out!
>>
>>
>
> I have actually coded it up.  The way I am currently doing it is sending the
> data to a function (the data is 3 doubles) and in that function doing a
> SELECT to get the data currently in the DB, then updating the data, then
> UPDATE or INSERT.  The SQL calls are compiled statements with binds, etc.
> It is woefully slow.  I was kind of hoping that maybe I was missing a step
> or just unfamiliar with the best techniques. I know that when adding a lot
> of data with the BEGIN and END TRANSACTION things speed up considerably, but
> I can't use it in my case (I don't believe) because the UPDATE depends upon
> the data in the SELECT.
>
> John
>
> --
> View this message in context:
> http://www.nabble.com/SQLite-Transaction-Rate-and-speed...-tp22379931p22380539.html
> Sent from the SQLite mailing list archive at Nabble.com.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] I need help with very complex queries

2009-03-10 Thread Jim Wilcoxson
You have specified how the movies table relates to the other tables,
but you haven't specified any independent selection criteria for any
tables.   For example, in your query you need to add something like:
and genres.genre = 'drama'.  For this query, only the movies and
genres tables are needed because you aren't selecting any fields from
the other tables, from what I can tell.

Running through a few online SQL tutorials will help you get a better
feel for how SQL works.

Jim

On 3/10/09, Yuzem  wrote:
>
> As an example, I have:
> tables:
> movies genres keywords languages countries etc...
>
> movies columns:
> id title year director etc
>
> The others are:
> id "name of the table" (example: id genres)
>
> The id in all the tables are the same column that is the id of the movie so
> that every movie can be many times in all the tables but only one time in
> movies.
>
> What I want to do is something like:
> select ... where genres = Horror and countries = italy and keywords = 
>
> This is what I have:
> sqlite3 movies.db "select movies.id,title,year from
> movies,genres,countries,languages,keywords,tags where movies.id = genres.id
> and movies.id = countries.id and movies.id = languages.id and movies.id =
> keywords.id and movies.id = tags.id"
>
> There is no query and it is giving me nothing...
> Many thanks in advance!
> --
> View this message in context:
> http://www.nabble.com/I-need-help-with-very-complex-queries-tp22446301p22446301.html
> Sent from the SQLite mailing list archive at Nabble.com.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] designing a db to hold repeating data

2009-03-15 Thread Jim Wilcoxson
You could eliminate met_grid_id from the cells table and replace it
with an expression cell_id/2500.  This expression will automatically
truncate, giving you met_grid_id whenever you need it.  This will save
around 5 MB for a 1M row cell table.

Also, queries in the cells table by met_grid_id, if you do that, can
be made much faster using the relationship above.  If you query
directly on met_grid_id, sqlite will have to do a table scan.  But if
you use the relationship, sqlite can use the index, for example:

   select * from cells where met_grid_id = N  (table scan required)

becomes:

   select *,cell_id/2500 as met_grid_id from cells where cell_id
between N*2500 and ((N+1)*2500)-1   (can use the cell_id index)

I think by the same method you can eliminate met_grid_id from the met
table, saving around 14.5MB
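
For example, something along these lines (assuming met_id starts at 0
and there are exactly 7300 met rows per grid, as in your listing):

   select *, met_id/7300 as met_grid_id from met
   where met_id between N*7300 and (N+1)*7300 - 1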

Jim


On 3/15/09, P Kishor  wrote:
> I have a grid of 1000 x 1000 cells with their own data as well as 20
> years of daily weather data (20 * 365 = 7300 rows) for each contiguous
> group of 50 x 50 cell.
>
> CREATE TABLE cells (
>   cell_id INTEGER PRIMARY KEY,
>   other cell attributes,
>   lat,
>   lon,
>   met_grid_id INTEGER
> );
>
> cell_id met_grid_id
> ------- -----------
> 0       0
> 1       0
> ..
> 2499    0
> 2500    1
> 2501    1
> ..
> 4999    1
>
> CREATE TABLE met (
>   met_id INTEGER PRIMARY KEY,
>   other met attributes,
>   met_grid_id INTEGER
> );
>
> met_id met_grid_id
> -- ---
> 0  0
> 1  0
> ..
> 7299   0
> 7300   1
> 7301   1
> ..
>
> CREATE VIRTUAL TABLE cell_index USING rtree (
>   cell_id INTEGER,
>   minx REAL,
>   maxx REAL,
>   miny REAL,
>   maxy REAL
> )
>
> The db is about 350 MB with the cell table with 1000,000 rows and the
> met table with 2,920,000 rows and the R*Tree index.
>
> Is there any other better way that jumps out at any of you?
>
>
> --
> Puneet Kishor http://www.punkish.org/
> Nelson Institute for Environmental Studies http://www.nelson.wisc.edu/
> Carbon Model http://carbonmodel.org/
> Open Source Geospatial Foundation http://www.osgeo.org/
> Sent from: Madison WI United States.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed of DROP INDEX

2009-03-17 Thread Jim Wilcoxson
Drop is executed within a transaction, which means that every record
you touch has to be backed up to the journal first, then modified in
the database.  I'm guessing that if you use pragma synchronous=off, it
would speed up the drop index, but you'd take a chance on corrupting
the database if the machine crashed during the drop.

It would probably be faster to make a copy of the database (all
sequential I/O), then drop the index with synchronous=off in one of
the copies (no journal I/O), then use vacuum if you want to really
clean up the DB.  If something goes wrong, you still have your backup
copy.
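
Roughly, on the copy it would be something like this (the index name is
just a placeholder):

   PRAGMA synchronous = OFF;
   DROP INDEX my_big_index;
   VACUUM;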

I haven't actually tried this; let us know if it makes a big difference.

Jim

On 3/17/09, Nikolas Stevenson-Molnar  wrote:
> Hi,
>
> I'm trying to drop an index on a table with about 8 million rows and
> it's taking a very long time. I can understand why building the index
> would take some time, but why dropping it? And is there any way to
> speed it up?
>
> Thanks!
> _Nik
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Speed of DROP INDEX

2009-03-18 Thread Jim Wilcoxson
Hmm...  Maybe it is creating the journal but not really using it with
synchronous=off.

You might try pragma journal_mode = off.  That might keep it from
creating a journal, but if you already tried using synchronous=off, my
guess is journal_mode=off won't run any faster.

Jim

On 3/18/09, Nikolas Stevenson-Molnar <steve...@evergreen.edu> wrote:
> I've actually been running it with synchronous=off. Unfortunately, it
> doesn't seem to run any faster and still creates a journal file.
>
> _Nik
>
> On Mar 17, 2009, at 6:05 PM, Jim Wilcoxson wrote:
>
>> Drop is executed within a transaction, which means that every record
>> you touch has to be backed up to the journal first, then modified in
>> the database.  I'm guessing that if you use pragma synchronous=off, it
>> would speed up the drop index, but you'd take a chance on corrupting
>> the database if the machine crashed during the drop.
>>
>> It would probably be faster to make a copy of the database (all
>> sequential I/O), then drop the index with synchronous=off in one of
>> the copies (no journal I/O), then use vacuum if you want to really
>> clean up the DB.  If something goes wrong, you still have your backup
>> copy.
>>
>> I haven't actually tried this; let us know if it makes a big
>> difference.
>>
>> Jim
>>
>> On 3/17/09, Nikolas Stevenson-Molnar <steve...@evergreen.edu> wrote:
>>> Hi,
>>>
>>> I'm trying to drop an index on a table with about 8 million rows and
>>> it's taking a very long time. I can understand why building the index
>>> would take some time, but why dropping it? And is there any way to
>>> speed it up?
>>>
>>> Thanks!
>>> _Nik
>>> ___
>>> sqlite-users mailing list
>>> sqlite-users@sqlite.org
>>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>>
>>
>>
>> --
>> Software first.  Software lasts!
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Newbie: round function

2009-03-18 Thread Jim Wilcoxson
Well, it doesn't exactly say that an integer is returned.  round()
always returns a float:

sqlite> select typeof(3);
typeof(3)
---------
integer

sqlite> select typeof(round(3));
typeof(round(3))
----------------
real
sqlite>

You can do this:

sqlite> select typeof(cast(round(3.14) as integer));
typeof(cast(round(3.14) as integer))
------------------------------------
integer
sqlite>

Jim

On 3/18/09, sqlite.20.ede...@spamgourmet.com
 wrote:
> Hi!
>
> Is this expected?
>
> $ sqlite3
> SQLite version 3.6.10
> Enter ".help"for instructions
> Enter SQL statements terminated with a ";"
> sqlite> select round(3.1416);
> 3.0
> CPU Time: user 0.00 sys 0.00
> sqlite>
>
> I expected an integer 3 as documented:
>
> round(X)
> round(X,Y)    Round off the number X to Y digits to the right of the
> decimal point. If the Y argument is omitted, 0 is assumed.
>
> Thanks in advance.
>
> Edésio
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] speeding up row by row lookup in a large db

2009-03-21 Thread Jim Wilcoxson
I'm not sure I completely understand your data structure, but here are
some ideas:

First, conduct experiments with different page and cache sizes.  I
don't know if you jumped to a 32K page size or experimented with each
size, but experimentation is the way to go.

I'm guessing that in your cell_lc and cell_dist tables, rows are
unique on cell_id.  If so, you could make cell_id the primary key and
avoid the indexes altogether.
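
For example, something like this, but only if cell_id really is unique
in those tables:

   CREATE TABLE cell_lc   (cell_id INTEGER PRIMARY KEY, lc_id INTEGER);
   CREATE TABLE cell_dist (cell_id INTEGER PRIMARY KEY, dist_id INTEGER);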

When you posted before, you mentioned that you had multi-TB hard
drives, tons of memory, quad CPU's, etc, and didn't care about saving
space.  I'd denormalize the data, putting lc and dist fields in the
cell table.  That will eliminate join #2 and join #3.
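
One way to do that is to fold the link ids into cell (you could go
further and copy the lc/dist columns themselves); a rough sketch:

   CREATE TABLE cell (
     cell_id INTEGER PRIMARY KEY,
     met_cell_id INTEGER,
     lc_id INTEGER,     -- moved in from cell_lc
     dist_id INTEGER    -- moved in from cell_dist
     -- plus the other cell attributes
   );

   SELECT c.*, l.*, d.*
   FROM cell c
   LEFT JOIN lc l ON l.lc_id = c.lc_id
   LEFT JOIN dist d ON d.dist_id = c.dist_id
   WHERE c.cell_id = :cell_id;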

You don't really say what you're doing with the results of these
queries below.  Are you doing a bunch of Perlish stuff with the
results?  It's possible that most of your time is being spent in the
Perl interpreter rather than in SQLite.  To check this, try sending
the output to /dev/null and using the sqlite3 command line tool to
execute your queries.  If they happen quickly, then the overhead is in
the manipulation of the results, not the queries.

Good luck!
Jim


On 3/21/09, P Kishor  wrote:
> Part 1.
> ---
>
> I have the following schema in a SQLite db that is 430 MB on my
> Macbook laptop's 320 GB HFS+ formatted 7200 RPM disk with an 8 MB
> cache.
>
> -- 1000,000 rows
> CREATE TABLE cell (
>   cell_id INTEGER PRIMARY KEY,
>   met_cell_id INTEGER,
>   8 other INTEGER or REAL columns
> )
>
> -- 38 rows
> CREATE TABLE lc (
>   lc_id INTEGER PRIMARY KEY,
>   56 other INTEGER or REAL columns
> )
>
> -- 10 rows
> CREATE TABLE dist (
>   dist_id INTEGER PRIMARY KEY,
>   37 other INTEGER or REAL columns
> )
>
> -- 2,920,000
> CREATE TABLE met (
>   met_id INTEGER PRIMARY KEY,
>   met_cell_id INTEGER,
>   9 other INTEGER or REAL columns
> )
>
> CREATE TABLE cell_lc (cell_id INTEGER, lc_id INTEGER)
> CREATE TABLE cell_dist (cell_id INTEGER, dist_id INTEGER)
>
> CREATE INDEX idx_met_cell_id ON met (met_cell_id)
> CREATE INDEX idx_cell_lc ON cell_lc (cell_id)
> CREATE INDEX idx_cell_dist ON cell_dist (cell_id)
>
> I also have an R*Tree index, but that is a different story, not relevant
> here.
>
> I retrieve *all* data for one cell ':cell_id' using the following queries
>
> [1] First retrieve all data from cell table
> SELECT * FROM cell WHERE cell_id = :cell_id
>
> [2] Now retrieve the related lc, dist and met
> SELECT lc.*
> FROM lc l JOIN cell_lc c on l.lc_id = c.lc_id
> WHERE c.cell_id = :cell_id
>
> [3] Retrieve the related dist
> SELECT d.*
> FROM dist d JOIN cell_lc c on d.dist_id = c.dist_id
> WHERE c.cell_id = :cell_id
>
> [4] Retrieve the related met
> SELECT * FROM met WHERE met_cell_id = 
>
> I did some benchmarking with the above schema using Perl DBI, and I
> get about 30 transactions per second as long as I returning the data
> to memory.
>
> [08:38 AM] ~/Data/carbonmodel$perl carbonmodel.pl
>
> timethis 1:  0 wallclock secs ( 0.03 usr +  0.00 sys =  0.03 CPU) @
> 33.33/s (n=1)
>
> timethis 10:  0 wallclock secs ( 0.31 usr +  0.02 sys =  0.33 CPU) @
> 30.30/s (n=10)
>
> timethis 100:  3 wallclock secs ( 2.85 usr +  0.20 sys =  3.05 CPU) @
> 32.79/s (n=100)
>
> timethis 1000: 33 wallclock secs (31.08 usr +  1.22 sys = 32.30 CPU) @
> 30.96/s (n=1000)
>
> if I write the data to file, the speed drops to about 1 transaction per
> second
>
> timethis 1000: 783 wallclock secs (732.26 usr + 18.22 sys = 750.48
> CPU) @  1.33/s (n=1000)
>
> Even if I stick with manipulating the data in memory, at 30
> transactions per second (or 33 ms per transaction), it would take more
> than 9 hours to query each of the 1 million cells one by one.
>
> In the real world, I will first find the relevant cell ids based on
> lat-lon bounds (hence my R*Tree index) and then extract their data one
> by one.
>
> How can I, if at all, speed this up?
>
> Part 2.
> ---
>
> Alternatively, I could denormalize the data completely. Inspired by a
> post on the Flickr blog
> (http://code.flickr.com/blog/2009/03/18/building-fast-client-side-searches/),
> in particular the para "To make this data available quickly from the
> server, we maintain and update a per-member cache in our database,
> where we store each member’s contact list in a text blob — this way
> it’s a single quick DB query to retrieve it. We can format this blob
> in any way we want: XML, JSON, etc" I decided to experiment with the
> same technique. So...
>
> CREATE TABLE cell_blobs (cell_id INTEGER PRIMARY KEY, cell_data BLOB);
>
> I then queried each cell as in Part 1, serialized it and stored it in
> the cell_blobs table. My intent is to simply retrieve a BLOB and
> deserialize it... it would *possibly* be quicker than 33 ms per
> retrieval. Well, I haven't yet completed this test because each BLOB
> is taking about 430 KB. At 1 million rows, that is going to occupy
> upward of 400 GB. I broke the load_blob_table routine after about a
> third of the records had 

Re: [sqlite] speeding up row by row lookup in a large db

2009-03-21 Thread Jim Wilcoxson
The reason you're getting the same results is because you are
CPU-bound.  I/O has nothing to do with this problem.  From your
timings of your app, 31.76/33 = 96% CPU.  If you were I/O bound, your
real time would be 33 seconds and your sys+user time would be 3
seconds, or something low.

My guess is you are spending more time looping around in Perl than you
think.  Try running your Perl code with one small, static set of test
data in a loop and see how much time it takes w/o any DB interactions.
 That will give you a baseline for performance improvements.  If it
turns out that 1000 loops w/o SQL takes 25 seconds instead of 33, none
of your SQL optimizations matter much.

Jim


On 3/21/09, P Kishor  wrote:

> So, I increased the cache_size to 1048576 but got the same results...
> 30 odd SELECTs per second.
>
> Then I created an in-memory db and copied all the data from the
> on-disk db to memory. I didn't use the backup API... simply opened a
> db connection to an in-memory db, then created all the tables and
> indexes, ATTACHed the on-disk db and did an INSERT .. SELECT * FROM
> attached db. Interestingly, the same results --
>
> [04:24 PM] ~/Data/carbonmodel$perl carbonmodel.pl
> Creating in memory tables... done.
> Transferring data to memory... done. Took: 90 wallclock secs (75.88
> usr +  8.44 sys = 84.32 CPU)
> Creating indexes... done. Took: 38 wallclock secs (23.82 usr + 13.36
> sys = 37.18 CPU)
> Prepare load testing
> ...timethis 1000: 33 wallclock secs (30.74 usr +  1.02 sys = 31.76
> CPU) @ 31.49/s (n=1000)
-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] How to get the previous row before rows matching a where clause...

2009-03-22 Thread Jim Wilcoxson
What about:

select blah from blah where rowid < windowstartrowid order by
rowid desc limit 1

to get the row before, and:

select blah from blah where rowid > windowlastrowid order by rowid limit 1

to get the row after.
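
If you want it as one statement, something like this should work
(untested), using the event/startTime columns from your schema:

   SELECT * FROM (SELECT title, startTime FROM event
                  WHERE startTime <= :windowStart
                  ORDER BY startTime DESC LIMIT 1)
   UNION ALL
   SELECT title, startTime FROM event
   WHERE startTime > :windowStart AND startTime < :windowEnd;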

Jim

On 3/22/09, sorka  wrote:
>
> I have a table of events that have a title, start time, and end time.
>
> The start time is guaranteed unique, so I've made it my primary integer key.
>
> I need all events that overlap the a window of time between say windowstart
> and windowend.  Currently, the statement
>
> SELECT title FROM event WHERE startTime < windowEnd AND endTime >
> windowStart.
>
> I've indexed the end time and the query is pretty fast, but it could be a
> lot faster if I only had to use the integer primary key.
>
> If instead I do
>
> SELECT title from event WHERE startTime > windowStart AND startTime <
> windowEnd
>
> this will get me almost the same thing except that it will be missing the
> first event that overlaps the windowStart because it's startTime is at or
> before startTime.
>
> In this case, if I can get exactly the previous row added to what is
> returned in the results above, I'll have exactly what I need.
>
> So the question is, when a WHERE clause returns a set of rows, is there a
> way to also get the row at the ROWID that comes just before the row that is
> returned from above with the lowest ROWID?
>
> Another way of putting it, if I take the lowest ROWID that is returned in my
> second example and get the next lowest ROW, the one that is less than the
> lowest ROWID I got but closest to it .i.e, the one right before it, then it
> would be complete.
>
>
> --
> View this message in context:
> http://www.nabble.com/How-to-get-the-previous-row-before-rows-matching-a-where-clause...-tp22650799p22650799.html
> Sent from the SQLite mailing list archive at Nabble.com.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] problems with shared cache?

2009-03-24 Thread Jim Wilcoxson
Not sure if it will make a difference, but in your trigger stuff you
explicitly coded null for the primary key value.  Have you tried
changing that so that you don't specify the primary key field at all?
I can't remember from the previous post, but I think it was (or should
be) set up as autoincrement.
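
In other words, inside the trigger body, instead of something like this
(table and column names are made up here):

   INSERT INTO log (id, card_id, action) VALUES (null, new.id, 'mod');

leave the primary key out and let SQLite assign it:

   INSERT INTO log (card_id, action) VALUES (new.id, 'mod');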

I think SQLite allows using multiple nulls for the primary key, but
according to their docs, it is non-standard and it says something
about "this may change in the future".  Maybe you are getting caught
in the middle of a change that is going to occur across multiple
revisions of SQLite.

Jim


On 3/24/09, Damien Elmes  wrote:
> Sorry, my application's files are called decks, and I unwittingly used
> the wrong terminology.
>
> Any ideas about the problem?
>
> On Sat, Mar 21, 2009 at 2:27 AM, Griggs, Donald
>  wrote:
>>
>>> However, when I ask the user to send me their deck, I find that:
>>>
>>> sqlite> pragma integrity_check;
>>> integrity_check
>>> ---
>>> ok
>>> sqlite> select id, count(id) from cards group by id having
>>> count(id)
 1;
>>> sqlite>
>>>
>>> Any ideas?
>>
>> Obviously, that user is not playing with a full deck.   ;-)
>>
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating unique data takes many hours, help

2009-03-29 Thread Jim Wilcoxson
Yes, you're right, but if the data is already in index order, you'll
do less I/O when creating the index.  Whether the sort + create DB
time is less than "create DB from random input" time is another
question.

Jim

On 3/29/09, mrobi...@cs.fiu.edu  wrote:

> question:
> When you say
>> 2)   Sort your input file on the PRIMARY KEY
>>  or on some other INDEX
>
> I thought that while sqlite inserts the data it is creating the indexes
> therefore sorting the data by way of the index, is this corret?
> I have decared one column ONLY, unique and indexed.
-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] delete with an "exists" clause

2009-04-09 Thread Jim Wilcoxson
You want:

delete from dummy where var=2;

In years of DB work, I've never used exists.  If you're mentioning
this as a bug, I guess it could be: I'd have to lookup exists to see
exactly how it's supposed to work.
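
For what it's worth, the reason your statement wiped the whole table is
that the subquery isn't correlated with the outer row, so once any row
has var=2 the EXISTS is true for every row.  A correlated version that
only deletes the matching rows would be something like:

   delete from dummy
   where exists (select 1 from dummy d2
                 where d2.rowid = dummy.rowid and d2.var = 2);

but the plain "where var=2" above is the simpler way to say it.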

Jim

On 4/9/09, Dave Dyer  wrote:
>
> This little program deletes all rows.  Is this a bug, or
> perhaps I misunderstand how delete with an exists clause
> is supposed to work.
>
> drop table if exists dummy;
> create table dummy
> ( var int
> );
> insert into dummy (var) values (1);
> insert into dummy (var) values (2);
> insert into dummy (var) values (3);
>
> delete from dummy where exists (select * from dummy where var='2');
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] disk I/O error...?

2009-04-11 Thread Jim Wilcoxson
You'll get this if you have a database with an active journal
(incomplete transactions) and you don't have write access to the
database.  In other words, the database needs a rollback from some
prior operations done under a different userid, but now you don't have
write access to do the rollback.

To me, it should be a permission error instead, to make it clear
what's going on.

Jim

On 4/11/09, Alberto Simões  wrote:
> Hello
> I am getting disk I/O error with:
>
> [a...@search1 align]$ du -hs T.3.sqlite
> 122M  T.3.sqlite
> [a...@search1 align]$ sqlite3 T.3.sqlite
> SQLite version 3.6.6
> Enter ".help" for instructions
> Enter SQL statements terminated with a ";"
> sqlite> .schema
> CREATE TABLE trigrams (w1,w2,w3,occs);
> CREATE INDEX idxT3w1 ON trigrams (w1);
> sqlite> CREATE INDEX idxT3w2 ON trigrams (w2);
> SQL error: disk I/O error
> sqlite> [a...@search1 align]$ df -h .
> FilesystemSize  Used Avail Use% Mounted on
> /dev/sdc1 148G   42G  100G  30% /export3
> [a...@search1 align]$
>
> I tried to get I/O errors with other commands (for instance, yes > _)
> but couldn't get any error.
>
> Any hint on what I can check to discover the problem?
> Thank you
> Alberto
>
> --
> Alberto Simões
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Fixing a database

2009-04-22 Thread Jim Wilcoxson
I think it would be a good idea for sqlite3 to display a message like
"Database opened for read-only" if you don't have permission to write.
 I saw this problem myself where a rollback was necessary from a
previous root process, and as a new SQLite user, it was confusing and
made me think "great, the database is already corrupted".

Jim

On 4/22/09, Igor Tandetnik  wrote:
> Alberto Simoes 
> wrote:
>> I am not sure what happened, but I have a database that opens
>> correctly with sqlite3 shell, I can select and query it, but I can't
>> edit it. I get Disk I/O error. The disk is not full.
>
> You don't have write permissions to the file, perhaps?
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] corrupt database recovery

2009-04-26 Thread Jim Wilcoxson
You could do a binary search to find the highest accessible rowid:

select rowid where rowid = 2147483647 (fails)
select rowid where rowid = 1073741824 (fails)
...
select rowid where rowid = 65536 (fails)
select rowid where rowid = 32768 (works!)
select rowid where rowid = 49152 (works!)

Within 32 selects, you will find the highest accessible rowid.  Then
do an insert from select * where rowid <= XXX.
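
For example (the file name, table name, and the 49152 cutoff are all
placeholders for whatever your binary search turns up):

   attach database 'recovered.db' as newdb;
   create table newdb.mytable as
     select * from main.mytable where rowid <= 49152;

and then recreate any indexes on the new copy, since CREATE TABLE ... AS
SELECT doesn't carry them over.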

Jim

On 4/25/09, Gene  wrote:
> You are exactly right John...that is indeed what the code looks
> like...except we have over 25 columns (it's a flat table).
>
> We've already fixed the bad code, but there are some customers who have old
> versions...it didn't break very often with the old code, but it does still
> did.
>
> I haven't tried a select Min or max on the row id but a select count(*)
> returns an error...that's how I know I need to do the row by row recovery
> method.  Select * from mytable also returns an error.
>
> The tables usually have tens of thousands of rows, sometimes over a couple
> hundred thousand but that's rare.
>
> What seems to work is that I do a select * from myTable where rowId = 'X'
> incing X until I get an error.  After I get the error, every row higher then
> X also returns an error. So as soon as I get an error, I stop trying to
> recover more rows.

-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing concurrency with sql query - locks?

2009-05-06 Thread Jim Wilcoxson
I'm not sure what you are considering a massive slowdown, but let's
assume that the entire database fits into memory and disk I/O isn't
the bottleneck.  You said you're running 300 instances of the query on
several processors.  If several means 3 CPUs, then in a perfect world,
running 300 instances will be 100 times slower than running just 1.
This assumes you are getting linear scalability, which no one ever
does, so the slowdown will be more than 100x.  If you have 4
processors, running 300 queries simultaneously should still be 75x
slower (best case) than running 1.

You also mentioned seeing a wide variance in response times.  This is
typical, because most schedulers in multi-user systems won't perfectly
distribute CPU time to 300 processes.  If the scheduler decides to run
task 1 to completion, then task 2, etc., your last task will appear to
take much longer than the first task.  For example, let's say that
each task by itself takes 1 second of CPU time and zero I/O time,
assuming the database is all in memory.  300 queries will take 300
seconds to complete.  If the system scheduler runs each task, in
order, to completion, then the first task will take 1 second and the
last task will take 300 seconds to complete.  Wide variance.  Or, the
system scheduler could decide to give each task 1/100th of a second.
It will take 3 seconds for all tasks to get a timeslice.  In this
scenario, it will still take 300 seconds to complete all 300 jobs, but
they will complete within 1/100th of a second of each other, and each
job will report that it took 300 seconds to complete.  No variance.
The price you will pay for no variance is that you increase the
multiprocessing overhead because now instead of doing just 300 task
switches to execute each job to completion, you are doing 300x100 =
30,000 task switches.  Task switches aren't free, so the "no variance"
schedule will take longer overall than the wide variance.  This is a
classic fairness vs high throughput dilemma; you can't have both.

If you are seeing something like 1000x slower performance, then as
others have mentioned you could have a disk I/O or locking bottleneck.

Jim

On 5/6/09, Igor Tandetnik  wrote:
> "Rosemary Alles"  wrote
> in message news:af79a266-b697-4924-b304-2b1feccba...@ipac.caltech.edu
>> Run on a single processor, the following query is quite fast:
>>
>> When concurrency is introduced (simply running the query on several
>> processors against the same database - say 300 instances of it) causes
>> a massive slow down
>
> Well, you may have multiple CPUs, but you only have a single hard drive.
> That drive head can't be in multiple places simultaneously.
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Optimizing concurrency with sql query - locks?

2009-05-07 Thread Jim Wilcoxson
The original question was about 300 queries, which I took to mean
selects.  If the database is in memory, will 300 selects still cause
synchronous disk I/O?

Jim

On 5/6/09, John Stanton <jo...@viacognis.com> wrote:
> Sqlite is an ACID database - it ensures that data is written to disk, so
> a database in memory still shares a single disk resource.
>
> Jim Wilcoxson wrote:
>> I'm not sure what you are considering a massive slowdown, but let's
>> assume that the entire database fits into memory and disk I/O isn't
>> the bottleneck.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select performance with join

2009-05-08 Thread Jim Wilcoxson
I don't know if it makes any difference, but is that where clause the same as:

WHERE name< 'zUmM' OR (name= 'zUmM' AND S.id< 8122)

The original way it was coded, all 3 conditions would have to be
evaluated most of the time.  The new way might get most rows with 1
condition.  Depends on the data distribution of course.

Jim

On 5/8/09, galea...@korg.it  wrote:
> Quoting Igor Tandetnik :
>
>> Andrea Galeazzi  wrote:
>>> but when I execute:
>>>
>>> SELECT S.id,title,artist,bpm,name
>>>
>>> FROM Song AS S
>>>
>>> LEFT JOIN Genre AS G ON (S.genre_id = G.id)
>>>
>>> WHERE name<= 'zUmM' AND (name< 'zUmM' OR S.id< 8122)
>>>
>>> ORDER BY name DESC, S.id DESC LIMIT 20;
>>
>> Note that LEFT JOIN is pointless here, since any record with G.name=NULL
>> won't make it past the WHERE clause. Replace it with plain JOIN, you
>> should see an improvement.
>>
>> Igor Tandetnik
>>
> I replaced LEFT JOIN with JOIN but it got worse, now the the time is
> about 8700 ms! But I think I need to use LEFT JOIN because I have also
> to accept the records with S.genre_id = NULL.
> I also tried this query:
> “SELECT S.id,title,artist,bpm,name "
> "FROM Song  AS S, Genre AS G "
> "WHERE (S.genre_id = G.id) AND (name<= 'zUmM') AND (name< 'zUmM' OR
> S.id< 8122) "
> "ORDER BY name DESC, S.id DESC LIMIT 20";
> even if it doesn't work for me because it doesn't  match  S.genre_id =
> NULL, I noticed a little improvement to 6000 ms. Then I delete S.id
> DESC and the performance has been abruptly increased to 400 ms.
> Anyway probably the right statement is LEFT JOIN but how can I
> optimize this kind of task?
> Is it really an hard work or does it depend on my no knowledge about sqlite?
>>
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] SQLite version 3.6.14

2009-05-08 Thread Jim Wilcoxson
May I suggest an extension PRAGMA SYNCHRONOUS = 3 | ASYNC so that
non-C bindings can use the async functionality?

Thanks, this is a great enhancement!
Jim

>> On Wed, May 06, 2009 at 10:36:50PM -0400, D. Richard Hipp wrote:
>>>
>>> A new optional extension is included that implements an
>>> asynchronous I/
>>> O backend for SQLite on either windows or unix.  The asynchronous I/O
>>> backend processes all writes using a background thread.  This gives
>>> the appearance of faster response time at the cost of durability and
>>> additional memory usage.  See http://www.sqlite.org/asyncvfs.html for
>>> additional information.

-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] select performance with join

2009-05-09 Thread Jim Wilcoxson
Thanks for the explanation.  I recall seeing posts suggesting the use
of union instead of or, and thought "if it's that easy, why doesn't
SQLite do it?"  The optimizer documentation says:

---
Suppose the OR clause consists of multiple subterms as follows:

  expr1 OR expr2 OR expr3

If every subterm of an OR clause is separately indexable and the
transformation to an IN operator described above does not apply, then
the OR clause is coded so that it logically works the same as the
following:


  rowid IN (SELECT rowid FROM table WHERE expr1
UNION SELECT rowid FROM table WHERE expr2
UNION SELECT rowid FROM table WHERE expr3)

The implementation of the OR clause does not really use subqueries. A
more efficient internal mechanism is employed. The implementation also
works even for tables where the "rowid" column name has been
overloaded for other uses and no longer refers to the real rowid. But
the essence of the implementation is captured by the statement above:
Separate indices are used to find rowids that satisfy each subterm of
the OR clause and then the union of those rowids is used to find all
matching rows in the database.
---

It sounds like it might use indexes for an OR after all.
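
For reference, the hand-written version of that rewrite for the query
earlier in this thread would look something like this (untested; UNION
ALL is safe because the two conditions can't overlap):

   SELECT S.id, title, artist, bpm, name
   FROM Song S JOIN Genre G ON S.genre_id = G.id
   WHERE name < 'zUmM'
   UNION ALL
   SELECT S.id, title, artist, bpm, name
   FROM Song S JOIN Genre G ON S.genre_id = G.id
   WHERE name = 'zUmM' AND S.id < 8122
   ORDER BY name DESC, id DESC LIMIT 20;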

Jim

On 5/8/09, Igor Tandetnik <itandet...@mvps.org> wrote:
> "Jim Wilcoxson" <pri...@gmail.com> wrote
> in message
> news:c5830b750905080813p6bf901bn526c81ca8ce7a...@mail.gmail.com
>> I don't know if it makes any difference, but is that where clause the
>> same as:
>>
>> WHERE name< 'zUmM' OR (name= 'zUmM' AND S.id< 8122)
>
> SQLite's optimizer cannot use an index for any condition involving OR.
> That's why it's common to write an equivalent but somewhat unnatural
>
> name <= 'zUmM' AND (name< 'zUmM' OR S.id< 8122)
>
> This way, at least the first condition has a chance of being satisfied
> with an index.
>
> Igor Tandetnik
>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupt Database Problems

2009-05-15 Thread Jim Wilcoxson
I think you have answered your own question.  If you use
synchronous=off, you are saying "I don't care much about this
database."  When you "save" documents, you are merely putting them in
a computer's cache (memory) and then confirming to the user that they
are on the hard drive, when they aren't necessarily there.

So, user clicks Save, program says it saved it, user turns off
computer, database is corrupt.  Don't know why this would happen all
of a sudden, unless maybe they upgraded their OS and it has decided to
cache volatile data longer to increase performance at the expense of
data integrity.

I hope you're able to rescue your data.  Someone else mentioned on
this list a while back that they could recover their data by doing
retrieval based on rowid: do a select * where rowid=1, then 2, then 3,
etc. until you get a failure.  Once you get a failure, the rest is
lost.

Good luck,
Jim

On 5/15/09, Kevin Gale  wrote:
...
> 4. synchronous is OFF (we have stopped setting this in the new build of our
> app).

-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corrupt Database Problems

2009-05-18 Thread Jim Wilcoxson
On 5/18/09, Kevin Gale  wrote:

> According to the PRAGMA command documentation the database might become
> corrupted if the o/s crashes or the computer loses power before the data has
> been written to the disk surface. From the information we have from the
> customer they continued to use the computer for some time after performing
> the final save. This indicates that there could be other situations that
> also cause this problem.

It really depends on when the OS decides to flush its cache.  All/most
OS's cache data you write before actually putting it on the drive.
These days, drives also cache writes.  When the application says "sync
this to disk", it's supposed to wait until the data is physically
recorded on the drive before returning.  But because of all the
caching going on at various levels, it's sometimes hard to ensure this
happens, and it's hard to verify that the write really did happen:
when you try to read the data, the OS gives it to you, either from its
cache (most likely) or from the drive, but you can't tell which was
the source.

You might want to do some crash tests with a virtual machine to find
out what's happening.

Good luck!
Jim
-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request

2009-05-23 Thread Jim Wilcoxson
I'd like to suggest that features such as this be implemented with
PRAGMA commands whenever possible, the reason being that the new
functionality will then be immediately available to folks using non-C
bindings, without waiting for an update to a binding package that may
or may not occur.

Jim

On 5/23/09, Filip Navara  wrote:
> PRAGMA schema_version ... for the second case.
>
> F.
>
> On Sat, May 23, 2009 at 4:20 PM, Simon Slavin
>  wrote:
>>
>> On 21 May 2009, at 5:59am, Simon Slavin wrote:
>>
>>> int sqlite3_change_count(sqlite3*)
>>>
>>> Returns an integer which is incremented whenn a change is made to any
>>> table in the database.  May be the value stored in bytes 24..27 of
>>> file header or something else if that's not what I really want.
>>
>> Having thought about it some more, I want two different numbers.  One
>> of them changes when a field changes: INSERT/DELETE/UPDATE.  The other
>> changes when the schema changes: ALTER/DROP/CREATE.  Rather than crowd
>> the function library with multiple functions, it might be possible to
>> implement it like so:
>>
>> int sqlite3_change_count(sqlite3*, int cTypes)
>>
>> when cTypes = 1, you get the number of changes to the schema
>> when cTypes = 2, you get the number of changes to fields
>> when cTypes = 3, you get the sum of the above two figures
>>
>> Or possibly 1 and 2 should be the other way around.
>>
>> Any ideas, criticism, or reasons it can't be done ?
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Minor errors in CREATE TABLE syntax diagrams

2009-05-26 Thread Jim Wilcoxson
For my money, I'd prefer to have a smaller, faster parser that worked
correctly on correct input at the expense of not catching all possible
syntax errors on silly input.  There is a definite trade-off here, and
I could see where a totally complete parser that caught every possible
error in SQL grammar might be twice the size of the entire SQLite code
base.

Of course, you don't want an SQL syntax typo to trash your database
either, without warning.  I'm assuming the SQLite developers have made
reasonable decisions about which parsing errors are important, and
which aren't.

Jim


On 5/26/09, John Machin  wrote:
>
> 1. SQLite allows NULL as a column-constraint.
>
> E.g. CREATE TABLE tname (col0 TEXT NOT NULL, col1 TEXT NULL);
>
> The column-constraint diagram doesn't show this possibility.
>
> Aside: The empirical evidence is that NULL is recognised and *ignored*;
> consequently there is no warning about sillinesses and typoes like in
> these examples of column-def:
> col1 INTEGER NOT NULL NULL
> col1 INTEGER NOTE NULL -- type="INTEGER NOTE", constraint="NULL"
>
> 2. According to the diagram for foreign-key-clause, there is no "express
> track" which allows skipping both "ON DELETE|UPDATE|INSERT etc" and
> "MATCH name". However SQLite does permit all of that to be skipped.
>
> E.g. CREATE TABLE tname(col0 TEXT PRIMARY KEY, col1 TEXT REFERENCES
> ftable(fcol));
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow Transaction Speed?

2009-05-27 Thread Jim Wilcoxson
I'm running on Linux with ext3 and just wrote a Python test program to
insert rows into a table with a single column, no indexing, and doing
a commit after each insert.  When I first ran it, I got around 440
inserts per second, which is clearly impossible.  This is a 7200rpm
drive, so even if I could write a row on every revolution, the maximum
insert rate would be 120 per second.  I tried adding "pragma
synchronous=normal", and it sped up to 520 TPS.  With synchronous=full
it slowed back to 420-460, so this must be the Python default.  Adding
synchronous=off increased the rate to over 6000 TPS -- basically
writing to memory instead of disk.

After using hdparm to turn off the drive cache (hdparm -W 0 /dev/sda),
the insert rate was 15 rows per second.  So it seems that for my
particular hardware configuration, the Linux fsync call isn't doing
what it should be doing.  I have built the SQLite stuff from source,
so perhaps my build has a problem.  If you look on the Linux kernel
mailing list archives, there are several discussions about drive write
caching not playing nice with ext3's efforts to ensure data is
actually on the disk.

$ uname -a
Linux amd 2.6.25-ARCH #1 SMP PREEMPT Sat Jun 14 18:07:19 CEST 2008
i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux

Here is a test program you can run, to show your system's maximum
physical I/O rate:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define MAX 3000

main () {
  int fd;
  int n;
  int loops;
  time_t elap;
  time_t start;

  if ((fd=open("big", O_RDWR+O_CREAT, 0777)) == -1) {
    perror("Error opening file");
    exit(1);
  }

  start = time(NULL);
  for(loops=0; loops<MAX; loops++) {
    if (lseek(fd, 0, SEEK_SET) == -1) {
      perror("Error seeking file");
      exit(1);
    }
    n = write(fd, &loops, 1);   /* write one byte; any one-byte buffer will do */
    if (n != 1) {
      perror("Error writing file");
      exit(1);
    }
    fsync(fd);
  }
  elap = time(NULL)-start;
  printf("Time: %d seconds; TPS=%f\n", elap, MAX*1.0/elap);
}

On my system, I get these results:

[...@amd ~]$ cc -o sync sync.c
[...@amd ~]$ ./sync
Time: 25 seconds; TPS=120.00

Jim

On 5/27/09, Marcus Grimm <mgr...@medcom-online.de> wrote:
> Thanks Nick,
>
> good point. ahh yes, I've read about this somewhere...
>
> My extension is currently ".db", a quick check indicates
> that using ".abc" gives a slight speed improvement, maybe 10%.
> But that is allready very close to the variation I get between
> different test runs, so I'm not really sure if I have this
> "microsoft effect" here.
>
> Anyway, thanks for the reminder.
>
> Well, I think now there is nothing wrong here,
> it is just as it is... Currently I achieve about 10 transactions/second,
> maybe
> not that bad... still slower than the "few dozen" that are mentioned on the
> www pages, but I agree that there too many issues that may affect the disk
> speed.
>
>
> Marcus
>
>
> Brandon, Nicholas (UK) wrote:
>>> my developemnt system is a Win XP, with of course NTFS, I
>>> don't know which drive it has, I guess a standard 7200 rpm.
>>>
>>
>> What file extension (i.e. the letters after the dot in the filename) do
>> you give the database?
>>
>> I faintly recall there is a windows peculiarity with system restore or
>> something similar that archives certain file extensions in the
>> background. That may contribute to your slow down.
>>
>> Nick
>>
>> 
>> This email and any attachments are confidential to the intended
>> recipient and may also be privileged. If you are not the intended
>> recipient please delete it from your system and notify the sender.
>> You should not copy it or use it for any purpose nor disclose or
>> distribute its contents to any other person.
>> 
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> 

Re: [sqlite] Slow Transaction Speed?

2009-05-27 Thread Jim Wilcoxson
I should have mentioned that when running my C test program on Linux,
with the hard drive cache enabled, it ran instantly.   But it should
have taken quite a while to do 3000 fsyncs.  So this problem with
defeating the hard drive cache is not an SQLite issue; it's a Linux
issue.

I also ran the same C test program on a virtual machine, without any
hdparm tweaking, and the syncs worked correctly: I got about 115 TPS.
This system is also using ext3, but a different version of Linux:

$ uname -a
Linux prirun 2.6.24-19-xen #1 SMP Sat Jul 12 00:15:59 UTC 2008 x86_64
Dual-Core AMD Opteron(tm) Processor 2212 AuthenticAMD GNU/Linux

So I don't know why syncs work correctly in one place but not the
other.  Could be my hardware config, the virtualization layer fixing
things, different version of Linux, ...

If you see SQLite run much slower on ReiserFS than ext3, it probably
means that ReiserFS is doing the syncs correctly, but ext3 is getting
cached.

Jim

On 5/27/09, Jim Wilcoxson <pri...@gmail.com> wrote:
> I'm running on Linux with ext3 and just wrote a Python test program to
> insert rows into a table with a single column, no indexing, and doing
> a commit after each insert.  When I first ran it, I got around 440
> inserts per second, which is clearly impossible.  This is a 7200rpm
> drive, so even if I could write a row on every revolution, the maximum
> insert rate would be 120 per second.  I tried adding "pragma
> synchronous=normal", and it sped up to 520 TPS.  With synchronous=full
> it slowed back to 420-460, so this must be the Python default.  Adding
> synchronous=off increased the rate to over 6000 TPS -- basically
> writing to memory instead of disk.
>
> After using hdparm to turn off the drive cache (hdparm -W 0 /dev/sda),
> the insert rate was 15 rows per second.  So it seems that for my
> particular hardware configuration, the Linux fsync call isn't doing
> what it should be doing.  I have built the SQLite stuff from source,
> so perhaps my build has a problem.  If you look on the Linux kernel
> mailing list archives, there are several discussions about drive write
> caching not playing nice with ext3's efforts to ensure data is
> actually on the disk.
>
> $ uname -a
> Linux amd 2.6.25-ARCH #1 SMP PREEMPT Sat Jun 14 18:07:19 CEST 2008
> i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
>
> Here is a test program you can run, to show your system's maximum
> physical I/O rate:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <time.h>
>
> #define MAX 3000
>
> main () {
>   int fd;
>   int n;
>   int loops;
>   time_t elap;
>   time_t start;
>
>   if ((fd=open("big", O_RDWR+O_CREAT, 0777)) == -1) {
> perror("Error opening file");
> exit(1);
>   }
>
>   start = time(NULL);
>   for(loops=0; loops<MAX; loops++) {
> if (lseek(fd, 0, SEEK_SET) == -1) {
>   perror("Error seeking file");
>   exit(1);
> }
> n = write(fd, &loops, 1);
> if (n != 1) {
>   perror("Error writing file");
>   exit(1);
> }
> fsync(fd);
>   }
>   elap = time(NULL)-start;
>   printf("Time: %d seconds; TPS=%f\n", elap, MAX*1.0/elap);
> }
>
> On my system, I get these results:
>
> [...@amd ~]$ cc -o sync sync.c
> [...@amd ~]$ ./sync
> Time: 25 seconds; TPS=120.00
> [...@amd ~]$
>
> Running vmstat concurrently, we see this:
>
> [...@amd]$ vmstat 5
> procs ---memory-- ---swap-- -io -system--
> cpu
>  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id
> wa
>  0  0  0 791108 127632  7635600 0 9   10   33  1  1 98
> 1
>  0  1  0 791100 127648  7635600 0   439  102  212  0  1 13
> 86
>  0  1  0 791100 127664  7635600 0   478  119  242  0  1  0
> 99
>  0  1  0 790976 127672  7635600 0   478  119  242  0  1  0
> 99
>  0  1  0 790976 127688  7635600 0   481  120  244  0
> 1  0 99   <-- steady state
>  0  1  0 790976 127696  7635600 0   482  120  244  0  1  0
> 99
>  0  0  0 790984 127700  7635600 0   167   40   75  0  0 71
> 29
>  0  0  0 790984 127712  7635600 0 316  0  0 100
> 0
>
> During the steady state, there are 480KBytes written per second.
> Linux does I/O in 4K chunks, so dividing 480K by 4K gives you I/O's
> per second: 120.
>
> Jim
>
> On 5/27/09, Marcus Grimm <mgr...@medcom-online.de> wrote:
>> Thanks Nick,
>>
>> good point. ahh yes, I've read about this somewhere...
>>
>> My extension is currently ".db", a quick check indicates
>> that using ".abc" gives a slight speed improvement, maybe 10%.

Re: [sqlite] corrupt database recovery

2009-05-27 Thread Jim Wilcoxson
What platform is this?  I just posted a note today that my Linux box
running 2.6-25 and ext3 isn't doing fsyncs like it should, so I would
be susceptible to DB corruption if my machine crashed during DB I/O.
I posted a C program you could run on the specific machine with a
corrupt DB to see if it's really doing synchronous I/O.

The other thing you might want to check is hardware.  Running memtest
on the machine overnight could show something, unless you are on
server-class hardware with ECC memory.

Jim


On 5/27/09, Gene  wrote:
> My code is outside the database layer.  So I do all my database work, then
> compress and encrypt it.  No errors are returned anywhere.  I'm guessing
> that it's going to be an uninitialized variable or byte alignment problems
> somewhere.
>
> This code is running on hundreds of machines without a problem and I've
> never reproduced it but every now and again I get a support ticket showing a
> corrupt database.  So I'm trying to figure out WHERE to look.
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of John Elrick
> Sent: Wednesday, May 27, 2009 10:59 AM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] corrupt database recovery
>
> Gene Allen wrote:
>> Ok...it's happened again and I've decided that I need to track this down
>> once and for all!
>>
>> Here is what I'm seeing: I get errors when I do a integrity_check (see
>> below), but I can .dump it to a text file and then .read it into another
>> database ok.
>>
>> It seems to me that I'm screwing up an index or something.  Are indexes
>> stored at the end of the database file?  All I can think of is that my
>> compression/encryption routines are messing something up and I'm trying to
>> figure out 'where' to look.
>>
>> I guess the real question is, what would I have to do to make an
>> integrity_check fail, but still let a dump work correctly?
>>
>> Many thanks for any advice on tracking down this ugliness.
>>
>
> SNIP
>
> Personally, I'd refactor the code to allow me to verify the operation of
> the compression/encryption routines independently of the database
> operation.  How are you injecting the compression/encryption into the
> database layer?
>
>
> John
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Feature request: reset file mtime on rollback

2009-05-27 Thread Jim Wilcoxson
It would be handy for file synchronization if SQLite stored the
database file's mtime in the journal and reset it if a rollback
occurs.  It's difficult to do this in an application:

1. If a DB is opened, a transaction started, some stuff happens, and a
rollback is executed, the file mtime only gets changed if a cache page
had to be written.  Only SQLite knows whether cache pages were
written.

2. Suppose the DB has mtime T.  An app runs and starts modifying the
DB (and pages are written from cache), but it crashes before the
commit.  When the app starts again, if the app looks at mtime before
opening the DB, the DB appears to be modified.  After the open, the
rollback will occur, the DB will be in its prior state when mtime was
T, but the mtime is still updated.

Not a huge deal, just would be nice.

Jim
-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow Transaction Speed?

2009-05-29 Thread Jim Wilcoxson
I agree that adding this to the library, and making it accessible via
a pragma command would be very useful.  For example, pragma commitrate
1000 would test the commit rate of 1000 commits and return the results
in transactions per second as a row.

If I install my app on a client's machine, I could run this test
periodically to ensure that the system environment is going to support
"no corruption" operation, and/or send some kind of warning to my
customer that there is a risk of corruption because their system
environment has problems.

If it were only an external program bundled with sqlite, I couldn't
really make use of it, because I'd have to distribute the program and
instructions how to use it, and rely on customers to actually do it.

This is a pretty small function.  Just for my own use, I'd consider
foreign key support to be way more bloated than this.

Jim

On 5/29/09, Marcus Grimm  wrote:
>> On Thu, May 28, 2009 at 03:12:55PM -0700, Allen Fowler scratched on the
>> wall:
>>>
>>> > just for anybody who is interested:
>>>
>>> >
>>> > I translated Jim's function into window code and added
>>> > a page of 1024 that will be written, instead of a single byte.
>>> > On my Win-XP system I got 55 TPS, much faster than sqlite
>>> > seems to write a page but that might be related to the
>>> > additional overhead sqlite needs to do.
>
> just to add: I traced a little what sqlite does when an
> simple UPDATE is done within a transaction: It does
> two syncs on the journal file and one final sync on the
> db itselve, so achieving something like 15 TPS is reasonable.
>
>
>>> >
>>> > This brings me to a nice to have feature request:
>>> > How about adding similar test function in the sqlite API ?
>>> > This might use the vfs to write pages and gives some feedback
>>> > on the performance of the system where sqlite runs on.
>>> > It might also detect problems with the commit function...
>>> > Just an idea...
>>> >
>>>
>>> Interesting idea.
>>
>>   It would make a lot more sense to make this an external utility
>>   or an extension of the sqlite3 shell.  Adding it to the core library
>>   is a definite case of code bloat.
>
> Adding it into the API would allow my application to
> easily make the test for example the first time it runs
> on a system. But maybe a problem for the users that
> apply a sqlite wrapper.
> However, having it in sqlite3 shell would be very useful as
> well.
>
> Marcus
>
>>
>>   Actually, a whole suite of performance related tests might be
>>   interesting.
>>
>>-j
>>
>> --
>> Jay A. Kreibich < J A Y  @  K R E I B I.C H >
>>
>> "Our opponent is an alien starship packed with atomic bombs.  We have
>>  a protractor."   "I'll go home and see if I can scrounge up a ruler
>>  and a piece of string."  --from Anathem by Neal Stephenson
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Join performance in SQLite

2009-05-30 Thread Jim Wilcoxson
SQLite has surprised me with its quick performance, not the other way
around.  In fact, I've implemented all kinds of lookup queries that I
knew could be optimized by caching results so I didn't have to keep
repeating the SQL query, but the performance was so good even
repeating the queries that I never bothered with the caching.

I'm sure there are queries that SQLite doesn't run as fast as database
product X, and I'm sure it goes the other way too.  It's a balancing
act, and as the primary developer, you have to choose for us what's
important to optimize and what isn't.

So far, I'm very happy with the choices and trade-offs that have been
made in SQLite. :-)

Jim

On 5/30/09, Simon Slavin  wrote:
> I'm interested in how sqlite works differently to the SQL systems
> which keep a daemon running as a background task.  One of the
> advantages of having a daemon which persists between runs of an
> application is that the daemon can keep its own list of ORDERs, and
> JOINs which are asked for frequently, and decide to maintain them even
> when no SQL-using application is running.  This can give the
> impression that something is being done very quickly, when in fact the
> majority of the time was taken during a previous run of the
> application.  It can be particularly hard to figure out what a
> performance test means under these circumstances.
>
> But the problem is that I like the way sqlite works.  I like the tiny
> library, I like the way that the SQL library is entirely inside my
> application, and any CPU load is mine.  I like knowing that when my
> app quits, nothing is going on.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Feature request

2009-05-30 Thread Jim Wilcoxson
I read through the header after Simon's request, and saw the data
change counter he mentions.  What I wasn't sure about is whether this
number gets reset during a rollback.  For example, you start a
transaction, make some changes, some cache pages get flushed, and
SQLite changes this number so that other processes know that the
database has changed.  In fact, SQLite may change this number anytime
it changes pages in my local cache - not sure.  The header word seems
to be designed to prevent stale cache lookups.

If I roll back my changes, it would be unimportant for cache management
whether the value rolled back also, but for file synchronization it
would be important that the value rolls back when the data didn't
actually change.  Or, you could treat the value as "data might have
changed", but then it would be less useful for file synchronization.

Jim

On 5/30/09, Simon Slavin  wrote:
>
> On 23 May 2009, at 3:32pm, Filip Navara wrote:
>
>> PRAGMA schema_version ... for the second case.
>
> Okay.  Given that this does exactly what I want from one of my
> requests, and given Jim Wilcoxson's point that adding a PRAGMA is
> better than adding a function to the library, I can simplify my
> feature request to asking for something like
>
> PRAGMA content_version  OR  PRAGMA data_version
>
> which returns the number stored in bytes 24 to 27 of the header.  This
> should have only a small impact on library size, and require very
> little extra code since the number needs to be worked out anyway for
> storage in the header.  It should be useful for purposes associated
> with synchronisation and journaling but it's mostly so that
> applications which store some data outside SQL and some inside can
> tell if they need to worry about something messing with their data.
>
> It doesn't matter to me whether a schema-change is considered a
> content-change or not.  I can sum the two in my own code if needed.
> But the documentation should describe whether field-changes, or
> COMMITs, or whatever is counted.
>
> I know I can read the file's header myself (and that's what my current
> solution does) but this means I need to include file-handing libraries
> in my library which I don't otherwise need.  It's not a neat solution.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow Transaction Speed?

2009-05-31 Thread Jim Wilcoxson
Here's an update for the commit rate tester for Mac OSX.  With the
original version, using fsync(), OSX doesn't do real syncs because the
fsync call isn't properly implemented (my opinion).  Apple took a
shortcut with fsync and only flushes OS buffers to the drive, but the
drive is then free to reorder the requests (that's okay) and execute
them at a later time (that's not okay).  You have to use
fcntl(F_FULLFSYNC) to get data actually written to the platters.  At
least Apple does document this behavior.  Here's the updated program:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

#define MAX 3000

int main(void) {
  int fd;
  int n;
  int loops;
  char buf = 'x';   /* any 1-byte buffer; the original argument was lost in the archive */
  time_t elap;
  time_t start;

  if ((fd = open("big", O_RDWR | O_CREAT, 0777)) == -1) {
    perror("Error opening file");
    exit(1);
  }

  start = time(NULL);
  for (loops = 0; loops < MAX; loops++) {
    if (lseek(fd, 0, SEEK_SET) == -1) {
      perror("Error seeking file");
      exit(1);
    }
    n = write(fd, &buf, 1);
    if (n != 1) {
      perror("Error writing file");
      exit(1);
    }
#ifdef __APPLE__
    fcntl(fd, F_FULLFSYNC);   /* flush the drive cache, not just OS buffers */
#else
    fsync(fd);
#endif
  }
  elap = time(NULL) - start;
  printf("Time: %ld seconds; TPS=%f\n", (long)elap, MAX * 1.0 / elap);
  return 0;
}


With F_FULLFSYNC, my G5 server and G4 laptop running 10.4 get these results:

g5:~ mac$ ./sync
Time: 174 seconds; TPS=17.241379

$ ./sync
Time: 265 seconds; TPS=11.320755

The G5 is running a software RAID driver, so I thought that might be
contributing to the awful performance, but the G4 laptop w/o RAID is
even worse - I'm assuming because the drive rotational speed is
probably 4200RPM instead of 7200RPM.

These results are pretty horrible.  The SQLite commit documentation
says that Apple is doing a hard drive reset to implement the cache
flush, which is why the performance is so bad.  Maybe they have
improved things with Leopard (10.5).

Since SQLite does 3-4 syncs per commit, I was curious what the commit
performance would be like in this environment.  Here are the results
with a Python test program and SQLite 3.6.14.2:

import sqlite3
import time

def runtest(path, sync):
    con = sqlite3.connect(path)
    con.execute('create table if not exists t (f)')
    con.execute('pragma synchronous=' + sync)
    con.execute('pragma fullfsync=ON')
    con.execute('delete from t')
    con.commit()
    start = time.time()
    end = start + 60
    i = 0
    n = 3000
    while i < n and time.time() < end:
        con.execute("insert into t (f) values (?)", (i,))
        con.commit()
        i += 1
    elap = time.time() - start
    print "For pragma synchronous=", sync, "Time:", elap, "TPS:", n/elap
    con.close()


path = "table"
runtest(path, "off")
runtest(path, "normal")
runtest(path, "full")


g5:~ mac$ Python-2.6.1/python.exe dbsync.py
/Users/mac/Python-2.6.1/Lib/sqlite3/dbapi2.py:27: RuntimeWarning:
Python C API version mismatch for module _sqlite3: This Python has API
version 1013, module _sqlite3 has version 1012.
  from _sqlite3 import *
For pragma synchronous= off Time: 1.97417807579 TPS: 1519.61975305
For pragma synchronous= normal Time: 4.06918787956 TPS: 737.247846202
For pragma synchronous= full Time: 4.78447008133 TPS: 627.028688445

Obviously, there's no way a 7200RPM drive can actually achieve these
results.  I'm building my own version of Python and sqlite here, and
after some research, I think I didn't build sqlite with the
-DHAVE_FULLFSYNC flag, so it isn't actually doing the fcntl() call.

Jim


On 5/29/09, Jim Wilcoxson <pri...@gmail.com> wrote:
> I agree that adding this to the library, and making it accessible via
> a pragma command would be very useful.  For example, pragma commitrate
> 1000 would test the commit rate of 1000 commits and return the results
> in transactions per second as a row.
>
> If I install my app on a client's machine, I could run this test
> periodically to ensure that the system environment is going to support
> "no corruption" operation, and/or send some kind of warning to my
> customer that there is a risk of corruption because their system
> environment has problems.
>
> If it were only an external program bundled with sqlite, I couldn't
> really make use of it, because I'd have to distribute the program and
> instructions how to use it, and rely on customers to actually do it.
>
> This is a pretty small function.  Just for my own use, I'd consider
> foreign key support to be way more bloated that this.
>
> Jim
>
> On 5/29/09, Marcus Grimm <mgr...@medcom-online.de> wrote:
>>> On Thu, May 28, 2009 at 03:12:55PM -0700, Allen Fowler scratched on the
>>> wall:
>>>>
>>>> > just for anybody who is interested:
>>>>
>>>> >
>>>> > I translated Jim's function into window code a

Re: [sqlite] Slow Transaction Speed?

2009-05-31 Thread Jim Wilcoxson
Simon, the real point here is that Python and SQLite aren't doing real
transactions.  What we should be seeing here is AT MOST 30 TPS with
synchronous=normal, assuming SQLite does 3 hard disk syncs per commit
as was described in an earlier post.

On my old AMD box, the one where fsync() is broken, I get this:

[...@amd toys]$ uname -a
Linux amd 2.6.25-ARCH #1 SMP PREEMPT Sat Jun 14 18:07:19 CEST 2008
i686 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux

[...@amd toys]$ py dbsync.py
For pragma synchronous= off Time: 0.380044937134 TPS: 7893.80335553
For pragma synchronous= normal Time: 5.70174884796 TPS: 526.154357197
For pragma synchronous= full Time: 7.00615310669 TPS: 428.195038606

[...@amd toys]$ py dbsync.py
For pragma synchronous= off Time: 0.361596107483 TPS: 8296.54948689
For pragma synchronous= normal Time: 5.72849082947 TPS: 523.698141327
For pragma synchronous= full Time: 7.06226301193 TPS: 424.793015345

But by disabling the IDE hard drive write cache:

[r...@amd toys]# hdparm -W 0 /dev/sda
/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)

[r...@amd toys]# py dbsync.py
For pragma synchronous= off Time: 0.361658096313 TPS: 8295.12744379
For pragma synchronous= normal Time: 60.0442349911 TPS: 49.9631646643
For pragma synchronous= full Time: 60.0657091141 TPS: 49.9453023072

These last 2 are expected numbers for a true "on the platters" commit
that is limited by the drive's rotational speed.  The CPU doesn't
really matter for this test.

Jim

On 5/31/09, Simon Slavin  wrote:
>
> On 31 May 2009, at 4:59pm, Simon Slavin wrote:
>
>> Standard MacBookPro3,1 Core 2 Duo 2.4 GHz, OS 10.5.7, whatever hard
>> disk Apple puts in them.
>> Using SQLite version 3.4.0 (installed as part of the OS installation).
>>
>> SimonsMBP:Documents simon$ python --version
>> Python 2.5.1
>>
>> SimonsMBP:Documents simon$ python dbsynch.py
>> For pragma synchronous= off Time: 2.70638608932 TPS: 1108.48929199
>> For pragma synchronous= normal Time: 4.9895169735 TPS: 601.260606174
>> For pragma synchronous= full Time: 6.40981006622 TPS: 468.032588954
>
> I am, of course, an idiot.  That was running from the command line.
> If I run it as a .pyc I get
>
>  >>> import dbsynch
> For pragma synchronous= off Time: 2.69734382629 TPS: 1112.2052631
> For pragma synchronous= normal Time: 4.79093813896 TPS: 626.182161611
> For pragma synchronous= full Time: 6.31202697754 TPS: 475.283139739
>
> SimonsMBP:Documents simon$ python dbsynch.pyc
> For pragma synchronous= off Time: 2.61415219307 TPS: 1147.59959575
> For pragma synchronous= normal Time: 4.84184503555 TPS: 619.598516262
> For pragma synchronous= full Time: 6.0191090107 TPS: 498.412637928
>
> I don't see why the 'full' figures vary so much.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow Transaction Speed?

2009-06-01 Thread Jim Wilcoxson
Microsoft has an interesting article on hard drive caches re: SQL Server:

http://support.microsoft.com/kb/234656

"Many disk drives (SATA, ATA, SCSI and IDE based) contain onboard
caches of 512 KB, 1 MB, and larger. Drive caches usually rely on a
capacitor and not a battery-backed solution. These caching mechanisms
cannot guarantee writes across a power cycle or similar failure point.
They only guarantee the completion of the sector write operations. As
the drives continue to grow in size, the caches become larger, and
they can expose larger amounts of data during a failure. "


On 6/1/09, John Stanton  wrote:
> Simon Slavin wrote:
>> On 31 May 2009, at 11:56pm, John Stanton wrote:
>>
>>
>>> Try studying basic database theory and technology to get a better
>>> understanding of the problem.
>>>
>>
>> I have a pretty good understanding, I think.  Plus 25 years experience.
>>
> Is it 23 years experience or 1 year 23 times?  This forum is to share
> information on Sqlite, for mutual advantage, not get into pointless
> arguments.  Let us keep it that way.
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Slow select from a single table when using a view

2009-06-11 Thread Jim Wilcoxson
If you only have a handful of values for C and are already going to
the trouble of creating separate views for each C, you could partition
your data into separate tables, one per value of C.  A small master
table could then list the values of C, along with the row count for
each C table (if that would be useful) or other attributes common to
each value of C.

Whether that makes sense depends on what else you're doing with the
views.  You might have to repeat queries across a set of tables and
then do some work in your application to collate the results.  That
could be a big deal or not, depending on what you're doing.

For this min() example, you'd only have to query the master table.
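
A rough sketch of the layout (table and column names are made up):

import sqlite3

con = sqlite3.connect(':memory:')    # stand-in for the real database

# One data table per value of C, plus a small master table listing
# which values of C exist and how many rows each partition holds.
con.execute('create table t_c1 (a, b)')
con.execute('create table t_c2 (a, b)')
con.execute('create table c_master (c integer primary key, nrows integer)')
con.execute('insert into c_master values (1, 0)')
con.execute('insert into c_master values (2, 0)')

# The old "select min(C) from T_view" turns into a lookup on the tiny
# master table instead of a scan over the big one.
print(con.execute('select min(c) from c_master where c in (1, 2)').fetchone()[0])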

Jim

On 6/11/09, Antti Nietosvaara  wrote:
> On Thursday 11 June 2009 16:30:12 Igor Tandetnik wrote:
>> > Ah, this would indeed explain the slowdown. I was hoping views would
>> > translate into the "where" part of the query, like:
>> > SELECT min(C) FROM T_view; -> SELECT min(C) FROM T WHERE (C=1 OR C=2);
>>
>> I predict this last query wouldn't run any faster. Try it.
>
> You are correct. It is way too slow.
>
> --
> Antti Nietosvaara
> Turun Turvatekniikka Oy
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.

2009-06-11 Thread Jim Wilcoxson
Here's what I'd try:

1. Write a small server that accepts connections and writes to the
SQLite database using prepared statements.  If you require 500
transactions per second, it's simply not possible with rotating media.
So the solution is to either turn off synchronous, which is dangerous,
or use group commit, ie, batching your external transactions into
larger database commits.  It's possible you might lose a batch of
transactions from a hardware failure, but at least your database will
be usable afterwards.  With synchronous=off, your database may be
corrupted.

You might object to group commit because when you "ack" your external
connection, you want to ensure you have done a commit before closing
the connection.  However, with synchronous=off, your commit is only in
memory, so it's basically the same as group commit.  Using group
commit, it should be easy to do 500 external SQL "transactions" per
second (a rough sketch of this follows point 2 below).

2. If this is still too slow, it's likely because of establishing the
TCP connection.  If possible, you could switch to UDP, which has a
much lower overhead.  You'd have to be able to live with losing or
repeating data points sometimes, but maybe that would be easy to
manage in your mini server above by ignoring repeated data points or
using averaging to fill in missing data points.
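
Here's the group commit sketch I mentioned - bare bones, with made-up
names and sizes; a single writer thread drains a queue and commits
every BATCH rows or every MAX_WAIT seconds, whichever comes first:

import Queue
import sqlite3
import time

BATCH = 500        # arbitrary batch size
MAX_WAIT = 1.0     # seconds before a partial batch is committed anyway

q = Queue.Queue()  # the network side just does q.put((sensor_id, value))

def writer():
    con = sqlite3.connect('data.db')   # hypothetical database name
    con.execute('create table if not exists readings (sensor_id, value)')
    while True:
        rows = []
        deadline = time.time() + MAX_WAIT
        while len(rows) < BATCH:
            remaining = deadline - time.time()
            if remaining <= 0:
                break
            try:
                rows.append(q.get(timeout=remaining))
            except Queue.Empty:
                break
        if rows:
            con.executemany('insert into readings values (?, ?)', rows)
            con.commit()   # one real commit covers the whole batch

Each external request can be acked as soon as its row is queued, which
is roughly the durability you'd get with synchronous=off, but the
database itself stays consistent.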

Jim

On 6/11/09, Pavel Ivanov  wrote:
> I bet "synchronous"ness will not be your only bottleneck. Opening
> connection, preparing statement and closing connection will take in
> total much longer than executing statement itself. So that doing all
> these operations 500 times per second will not be possible I think. If
> you keep pool of connections along with already prepared statements
> then your application will have chances for survival in such
> contention environment.
> And yes, your application will not have any chances without
> "synchronous = OFF". Without it you're able to do only 10 to 20
> transactions per second.
>
> Pavel
>
> On Thu, Jun 11, 2009 at 9:53 AM, Robel Girma wrote:
>> Thanks all for your input, very helpful. And yes, there will be 500
>> separate
>> connections to the db per seconds, each updating 1 record. I've read about
>> setting PRAGMA synchronous=OFF to cause SQLite to not wait on data to
>> reach
>> the disk surface, which will make write operations appear to be much
>> faster.
>> "But if you lose power in the middle of a transaction, your database file
>> might go corrupt"==> I can live with this risk if it makes an huge
>> improvement with the possible contention issue I'm facing. Any input with
>> this setting you can provide will be greatly appreciated as always.
>> Robel
>>
>> -Original Message-
>> From: sqlite-users-boun...@sqlite.org
>> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Simon Slavin
>> Sent: Thursday, June 11, 2009 4:49 AM
>> To: General Discussion of SQLite Database
>> Subject: Re: [sqlite] Advice on which to use (SQLite or SQL Server) for
>> the
>> following app.
>>
>>
>> On 11 Jun 2009, at 8:23am, Roger Binns wrote:
>>
>>> It depends very strongly on how the app is structured and in
>>> particular
>>> if there are a few persistent connections to the SQLite database, or
>>> if
>>> each request involves a separate connection to the database.  If you
>>> have lots of connections then there will be contention.
>>
>> 500 connections a second, each from a different computer.  If the OP
>> handles each one with a separate run of his/her application, that's
>> 500 connections to the database a second, each updating one record in
>> one table.
>>
>>> If the work done during contention is quick and simple then you are
>>> fine.  If it is long running then you will benefit from a server based
>>> approach.  But when you have commits then disk access is serialized
>>> and
>>> you will have performance limitations  no matter what the database
>>> server or SQLite.  (That is the point Florian is making.)
>>
>> As far as I can tell, with correct programming each query would be one
>> connection for all the data the query would want.  So the OP's
>> objective is /probably/ achievable with SQLite but I'd want to
>> prototype it to be sure.
>>
>> Sorry, Robel, but we can only guess.  Try it.
>>
>> Simon.
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.

2009-06-11 Thread Jim Wilcoxson
I should have mentioned, if it were me, I'd write the mini server
first as a single process in a loop, and make it as fast as possible.
If you try to do db updates with multiple processes, you'll have
concurrency issues.  It might make sense to use multiple processes if
you also have lots of queries, and have only 1 process (or thread)
writing, while a pool of processes handles queries.  Not sure how well
SQLite handles this situation, but since you are doing group commits,
it will greatly decrease your write load and potential concurrency
issues.

Jim

On 6/11/09, Jim Wilcoxson <pri...@gmail.com> wrote:
> Here's what I'd try:
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.

2009-06-11 Thread Jim Wilcoxson
SSD's usually have poor write performance, because to do a write, they
have to use read, erase, write sequences across large blocks like 64K.
 Most of the SSD benchmarks that quote good write performance are for
sequential write performance.  If you skip all over the disk doing
small writes, like a database does, I suspect you'll see pretty bad
performance with most SSD's.  In most of the benchmarks I've seen,
random write performance is worse than rotating media; many benchmarks
skip over this point entirely and test only sequential writes.

Using a separate copy of the database on SSD for queries would
probably work well, assuming the database doesn't fit into RAM.  If
the db does fit in RAM, SSD won't buy you anything.

Companies are putting all kinds of smarts into SSD's to try to
minimize the effects of the read, erase, write cycle, usually by some
form of caching, but then you are also playing with losing the
transaction guarantees of a commit.  Can't really have it both ways.

Jim

On 6/11/09, Sam Carleton <scarle...@gmail.com> wrote:
> Jim Wilcoxson wrote:
>> Here's what I'd try:
>>
>> 1. Write a small server that accepts connections and writes to the
>> SQLite database using prepared statements.  If you need require 500
>> transaction per second, it's simply not possible with rotating media.
>
> I am a late comer to this discussion, so this might have already been
> purposed...
>
> Ever consider having all the updates are done on none rotating media
> (SSD) and then having 1 process that reads in blocks of the data and
> puts it into it's final home?  Purge the SSD SQLite db simply by rolling
> to a new one from time to time and delete the old once once all the
> connections have moved on to the new SQLite db.
>
> Sam
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.

2009-06-11 Thread Jim Wilcoxson
Hey, if anybody has an SSD laying around, it would be interesting to
run that commit test program I posted a while back to see what kind of
transaction rates are possible.  Although, verifying whether the SSD
is actually doing the commits or just saying it is would be very
difficult.  With rotating media, you at least have an upper bound
(120/sec for 7200rpm, 166/sec for 10Krpm, 250/sec for 15Krpm) and if
you go outside that, you know it's a lie.  Not sure how you could
verify that commits/syncs with an SSD are actually working other than
repeatedly pulling the plug and seeing if the db survives intact.

Jim

On 6/11/09, Jim Wilcoxson <pri...@gmail.com> wrote:
> SSD's usually have poor write performance, because to do a write, they
> have to use read, erase, write sequences across large blocks like 64K.

-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Advice on which to use (SQLite or SQL Server) for the following app.

2009-06-11 Thread Jim Wilcoxson
Yes, good point.

If you partition the database into multiple databases, you will have
to place each on its own physical disk drive to increase transaction
rates.  If your base transaction rate with one drive is T, with N
drives it should be N*T;  4 drives gives you 4x the transaction rate,
etc.  Each of the drives has to be completely independent - no
filesystems crossing drives.

If you partition into multiple databases and keep them on the same
drive, your transaction rate will likely go down, because now you are
introducing seeks back and forth between the two databases as you
commit.
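
For the multi-drive case, a minimal sketch, assuming hypothetical
mount points /disk1 and /disk2 and a trivial hash placement rule - the
point is just one independent commit stream per spindle:

import sqlite3

# One connection per physical drive; each shard commits on its own,
# so the syncs land on different spindles.
shards = [sqlite3.connect('/disk1/part1.db'),
          sqlite3.connect('/disk2/part2.db')]
for con in shards:
    con.execute('create table if not exists t (k, v)')

def shard_for(key):
    return shards[hash(key) % len(shards)]

con = shard_for('some-key')
con.execute('insert into t values (?, ?)', ('some-key', 42))
con.commit()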

I think someone mentioned using the journal_data option with ext3 to
increase performance.  In theory, it possibly could, but on my Linux
system, it didn't.  I got the same results when I tried using tune2fs
to change it, though I'm not sure it actually did anything.

Jim

On 6/11/09, Petite Abeille  wrote:
>
> On Jun 11, 2009, at 4:53 PM, Sam Carleton wrote:
>
>> I am a late comer to this discussion, so this might have already
>> been purposed...
>
> Additionally, if this was not mentioned already, you can partition
> your database across multiple physical files through the magic of
> 'attach database' or something and load balance across those.
>
> http://en.wikipedia.org/wiki/Partition_(database)
>
> CHeers,
>
> --
> PA.
> http://alt.textdrive.com/nanoki/
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] sqlite3_step performance degredation

2009-06-15 Thread Jim Wilcoxson
There was a recent SQLite bug that caused the size of the SQLite cache
to shrink in some circumstances, and the longer a program ran, the
smaller the cache became.  Maybe you are running into this bug.  IIRC,
you had to do an update in the select loop to trigger the bug, so if
you're not doing that, maybe this doesn't apply.

Jim

On 6/15/09, Mike Borland  wrote:
> Nuno, unfortunately your psychic skills are a bit off on this one.  Sorry I
> wasn't more explicit.  I am not using any LIMIT or OFFSET to do any virtual
> scrolling.  Basically I have table A which has 900 rows.  Table B has
> 180,000 rows (900 * 200) which has a foreign key relationship back to table
> A.  So for each row in table A, there are 200 rows in table B.  My query is
> basically a "SELECT * FROM Table B WHERE ID = TableA.ID".  I'm executing
> this query 900 times, once for each row in table A.
>
> When I start the 900 read iterations (always in the same order), the first
> one generally reads in about 50ms and by the last read, it's taking roughly
> 1000ms.  Sometimes it slows down immediately, sometimes after the 100th
> iteration.  The only absolutely reproducible aspect is that it always slows
> down eventually and once it slows down, it never speeds back up.  I don't
> believe it's a locking issue since my timer doesn't start until the query is
> successfully executed.
>
> Any ideas?  Would the occasional write operation in the midst of these reads
> cause any permanent slow down to the read time?  Thanks.
>
> Mike Borland
>
> -Original Message-
> From: Nuno Lucas [mailto:ntlu...@gmail.com]
> Sent: Friday, June 12, 2009 7:16 PM
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] sqlite3_step performance degredation
>
> On Sat, Jun 13, 2009 at 1:52 AM, Mike
> Borland wrote:
>> I have a fairly complex program where I am seeing a performance
>> degradation of the sqlite3_step() function.  Basically I'm iterating
>> roughly 200 rows at a time, over and over.  The only work happening
>> during the iteration is I'm copying the record into an array.  At first,
>> sqlite3_step() takes less than a millisecond to run.  After 0-50
>> iterations, it's taking anywhere from 10-100ms.
>>
>> Does anybody have any insight into what's happening behind the scenes
>> with this function to help me track down the cause?  I appreciate it!
>
> You should explicitly say what your SQL query is. Without that we can
> only guess.
>
> My current "psychic" guess is that you are using LIMIT to obtain those
> 200 rows, one "page" at a time, and as you go advancing "pages" it
> becomes slower and slower.
> If this is true, then you should re-think your design as LIMIT just
> skips the rows, but it will  "generate" them before, meaning it
> becomes slower as you advance on the offset given.
> Look at http://www.sqlite.org/cvstrac/wiki?p=ScrollingCursor (and
> notice the "What not to do" at the end, talking about "LIMIT" and
> "OFFSET").
>
> If my my psychic abilities are becoming weak, then please supply your
> exact query that is getting slower  (and maybe your database schema)
> and then someone can give you an exact answer.
>
>
> Regards,
> ~Nuno Lucas
>
>>
>> Mike Borland
>>
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Database inserts gradually slowing down

2009-06-16 Thread Jim Wilcoxson
You are doing transactions here, which is a very different thing from
normal disk I/O.  Your CPU is idle because it is waiting for the disk.
 Your disk is idle because it is waiting for the platters to rotate
around again.  The best case you can achieve on a 7200RPM disk is 120
transactions (commits) per second.

In practice, you will see much lower TPS rates, and they will
typically be 50% fractions.  For 1 sync per transaction, the max rate
is 120 TPS.  For 2 syncs per transaction, the max rate is 60 TPS.  For
3 syncs per transaction, the max rate is 30 TPS.  SQLite always does
at least 2 syncs per transaction, and sometimes 3, depending on the
options you use, so a transaction rate of 30 TPS is reasonable.

The only way you can speed this up is to get a disk that rotates
faster, ie, a 10K or 15K rpm drive will do faster transactions, but
even so, you are still limited to 250 TPS with a 15K drive.  Or, get a
battery-backup caching controller that will lie to your OS and tell it
that the data is on the media, when really it is only stored in the
controller's memory.  This allows the controller to combine write
requests to increase the transaction rate.

Jim

On 6/16/09, Jens Páll Hafsteinsson  wrote:
> Yes, I'm perfectly aware of this and hence I would expect the disk to be
> sweating like hell running this test while the CPU is relatively relaxed
> (given that sqlite is disk bound in this case and not CPU bound).
>
> But this is not happening; neither the disk nor the CPU are practically
> doing anything, which is a bit strange. It's as if both the disk and the CPU
> are waiting for each other or that sqlite is playing 'nice' behind my back
> and giving up the CPU when it shouldn't.
>
> JP
>
> -Original Message-
> From: sqlite-users-boun...@sqlite.org
> [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Adam DeVita
> Sent: 16. júní 2009 14:06
> To: General Discussion of SQLite Database
> Subject: Re: [sqlite] Database inserts gradually slowing down
>
> Remember the implications of Moore's law and how much time has passed.
>
> CPU speed is much faster than memory speed.
> Memory speed is much faster than disk access.
>
> This is why hardware folks play all sorts of tricks with pipelines, caches,
> interleaving, and parallelism.
>
> For a single process that interacts with the HDD, the HDD will be the bottle
> neck and the CPU will spend lots of time waiting for the rest of the machine
> to catch up.  Even if you have a RAID system, the CPU is still much faster
> than the bus the hard drives are on.
>
>
> On Tue, Jun 16, 2009 at 9:52 AM, Jens Páll Hafsteinsson
> wrote:
>
>> In step 5 I execute "delete from t1" without any where clause.
>>
>> I haven't monitored the disk space used (does sqlite use temporary files
>> beside the database file?) but the database file itself has been fixed in
>> size at around 12MB (12.461.056 bytes) the whole time.
>>
>> The load on the disk is minimal (sustained 1MB/s) and the CPU load is
>> about
>> 6%, which is a bit surprising since I thought I would be putting a huge
>> load
>> on the computer running a loop like this. I'm not at all happy to see
>> these
>> low load numbers given how the test is programmed (it should practically
>> own
>> the machine). The database should utilize the computer much better than
>> this.
>>
>> I've been running the test now for about 10 minutes using 3.6.15 and it
>> 'seems' to be behaving as before, slowly increasing in execution time. I
>> want to run this a bit longer to be absolutely sure and will let you know.
>>
>> JP
>>
>> -Original Message-
>> From: sqlite-users-boun...@sqlite.org [mailto:
>> sqlite-users-boun...@sqlite.org] On Behalf Of Pavel Ivanov
>> Sent: 16. júní 2009 12:15
>> To: General Discussion of SQLite Database
>> Subject: Re: [sqlite] Database inserts gradually slowing down
>>
>> How do you do step 5? Like "delete from table" or "delete from table
>> where ..."? Do you see any degrade in disk space used by database
>> along with slowness?
>>
>> Pavel
>>
>> On Tue, Jun 16, 2009 at 8:06 AM, Jens Páll
>> Hafsteinsson wrote:
>> > Hi
>> >
>> > I've been running some tests against sqlite and have found that inserts
>> are gradually slowing down. Since I'm new to sqlite I might be doing
>> something wrong and thought I'd ask here if anyone has seen this before or
>> know what might be causing this.
>> >
>> > The test I'm running creates a database containing a single table (int,
>> int, varchar(100)) along with an index (not unique) on the first field. I
>> then perform the following operations (all records have unique data in the
>> first field):
>> >
>> >
>> > 1.   start a transaction
>> >
>> > 2.   insert 1000 records
>> >
>> > 3.   commit
>> >
>> > 4.   repeat steps 1-3 100 times
>> >
>> > 5.   delete everything from the table
>> >
>> > 6.   Start again at step 1
>> >
>> > The time taken to execute steps 1-4 increases 

Re: [sqlite] async io and locks

2009-06-20 Thread Jim Wilcoxson
Async I/O fits perfectly with my app: I don't need the durable
guarantee.  But I do need the ability to activate it with a pragma
since it isn't a C app.

Are there plans to make async I/O available via pragma?

Jim

On 6/20/09, Dan  wrote:
>
> I think we have quite different approaches.
>
> The SQLite asynchronous IO backend is queueing IO, not queries. It
> intercepts
> the equivalent of the write() system call. It does not overload query
> processing
> in any way. The things on the write-queue are blobs of data to be
> written
> to specific offsets within the database file, not SQL queries.
>
> It's more complex than that of course. But the point is that the write-
> queue is
> a layer between SQLite and the OS, not between the user and SQLite.
>
> Dan.
>
>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Python sqlite binding: commit inside select loop

2009-06-30 Thread Jim Wilcoxson
I'm using the Python sqlite3 (pysqlite) bindings.  I'd like to be able
to do a select, and while fetching those rows and inserting new ones,
periodically do a commit.  With the Python bindings, an error occurs
because a commit resets all pending select statements.  Here is an
example showing the error:

import sqlite3

con = sqlite3.connect('testdb')
con.execute('drop table if exists t')
con.execute('create table t(f)')
con.execute('insert into t (f) values (1)')
con.execute('insert into t (f) values (2)')
con.execute('insert into t (f) values (3)')

print '1st select:'
cur = con.execute('select * from t')
for row in cur:
    print row

print '2nd select:'
cur = con.execute('select * from t')
for row in cur:
    print row
    con.execute('update t set f=2 where f=1')
    con.commit()

Running it shows:

1st select:
(1,)
(2,)
(3,)
2nd select:
(1,)
Traceback (most recent call last):
  File "dbbug.py", line 17, in 
for row in cur:
sqlite3.InterfaceError: Cursor needed to be reset because of
commit/rollback and can no longer be fetched from.

I can understand that there is some potential weirdness here in the
case where rows are being deleted or rowid's are getting changed
inside a select, and I think deciding I want to see those updates in
the select results is an isolation level setting - not sure of the
details.  Right now, I don't think it matters whether I see the old or
new values, because I'm mostly just adding rows inside the
transaction, updating some stuff where the old or new value would be
okay, and not doing any deletes.

So I have a couple of questions/observations:

1. Is the thing about resetting cursors an SQLite requirement, or
something the Python binding is doing to protect people from the
confusion of seeing the updated rows?

2. If I were deleting rows, I can see where that would be a problem:
what if some of the rows satisfying the query were deleted?  Once the
commit happens, the rows are gone unless there is a delete/add with
the same rowid.

I'm thinking about removing this reset from pysqlite, but don't want
to trash my database either.
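
A less invasive workaround (a sketch, assuming the result set fits
comfortably in memory) would be to materialize the select first, so
the commit never has a live cursor under it:

import sqlite3

con = sqlite3.connect('testdb')
rows = con.execute('select * from t').fetchall()   # cursor is finished here
for row in rows:
    con.execute('insert into t (f) values (?)', (row[0] + 100,))
    con.commit()   # safe: no pending select to reset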

Thanks for any insight.
Jim
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Python sqlite binding: commit inside select loop

2009-06-30 Thread Jim Wilcoxson
I guess I am a bit confused.  Igor says it's not possible, but Roger
says my example works.

One other point of confusion is that sometimes an interface layer will
grab all or a bunch of the rows after a select, even though it may
hand them to upper layers one row at a time.  For example, with
pysqlite, cur.fetchmany() does this.  When this happens, the commit
appears to run inside the select loop, but behind the scenes the
select & fetch have already finished.

For my particular app, I'm only adding new rows, so there wouldn't be
too many weird effects.

Jim

On 6/30/09, Roger Binns <rog...@rogerbinns.com> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Jim Wilcoxson wrote:
>> With the Python bindings, an error occurs
>> because a commit resets all pending select statements.
>
> Note that there are some constraints about how SQLite works detailed by
> Igor, but in this particular case you are being stymied by pysqlite.
>
> By default pysqlite tries to parse the SQL you execute and does
> transaction management behind the scenes on your behalf (ie calling
> begin and commit as it deems fit).  This is allegedly something required
> by the Python DBAPI standard although I admit I don't understand it
> myself.  Apparently you use the isolation level parameter with pysqlite
> to control how much of this nonsense it does behind the scenes.
>
> If you use apsw (disclaimer: I am the author) then it doesn't do any
> nonsense behind your back and you get (1,) (2,) (3,) printed out twice
> without any exceptions as you were expecting.
>
> Roger
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAkpKZKgACgkQmOOfHg372QSMJwCgkIoZ8VGqUcpn8rMtZPF7kpoF
> vqIAnj5qIaSCy7VTp5mJsAQ4mBVCk+GD
> =SGGD
> -END PGP SIGNATURE-
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Python sqlite binding: commit inside select loop

2009-06-30 Thread Jim Wilcoxson
Thanks for the comments and explanations everyone - much appreciated.
It seems there are a few alternatives I can check out.

Jim

On 6/30/09, Roger Binns <rog...@rogerbinns.com> wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Jim Wilcoxson wrote:
>> I guess I am a bit confused.  Igor says it's not possible, but Roger
>> says my example works.
>
> For the exact code you were running the error message came from pysqlite
> and not from SQLite.  Other posters are not aware of the underlying
> implementation details in pysqlite and it doing all sorts of transaction
> related stuff behind the scenes, and so were telling you about SQLite
> behaviour in general.  In summary the behaviour you see from pysqlite
> will not necessarily match the behaviour you would see if using the
> SQLite C api directly to do the same thing.
>
> It was issues like this that led me to create APSW in the first place
> because I wanted the exact SQLite semantics and not some random
> "standard" that tried to pretend all database apis behave in exactly the
> same way.  I also document which SQLite apis are called from each Python
> api and the index lets you work in the reverse direction:
>
>   http://apsw.googlecode.com/svn/publish/genindex.html#S
>
>> One other point of confusion is that sometimes an interface layer will
>> grab all or a bunch of the rows after a select, even though it may
>> hand them to upper layers one row at a time.
>
> Note that none of the Python apis do that by default and you have to
> call a method to grab all the results.  The apis for other databases do
> do that by default since they use the network and grabbing result rows
> one at a time would be far slower than batching them.
>
>> For my particular app, I'm only adding new rows, so there wouldn't be
>> too many weird effects.
>
> There was a poster (I forget if it was this list or the pysqlite/APSW
> one) who managed to get an infinite loop.  Effectively he was changing a
> rowid for each row found (an UPDATE was doing it - rowid wasn't
> explicitly mentioned in the query) while doing a select over all items.
>  The constantly updated rowid effectively added the changed row to the
> end of the select results and so it carried on forever.
>
> Although the relevant information is in my APSW docs, I think I need to
> pull more of it together and make things clearer.
>
> Roger
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEARECAAYFAkpKfdYACgkQmOOfHg372QTAjQCeL3zZCBh8Pgnh/GT95RH0JIY0
> qGgAoKsqr66QwxKDtQNqe6W1jayU90/J
> =Xs+y
> -END PGP SIGNATURE-
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Python sqlite binding: commit inside select loop

2009-06-30 Thread Jim Wilcoxson
I did some checking on the pysqlite mailing list.  Apparently the
change to reset all cursors before commit is recent, and the reason is
that pysqlite or sqlite itself would return bogus rows.  There is an
example here:

http://article.gmane.org/gmane.comp.python.db.pysqlite.user/2217

Returning old or new data would be one thing, but in this example,
totally wrong data is returned.  There's no explanation of why it was
happening.

On 6/30/09, Jim Wilcoxson <pri...@gmail.com> wrote:
> Thanks for the comments and explanations everyone - much appreciated.
> It seems there are a few alternatives I can check out.
>
> Jim
>
> On 6/30/09, Roger Binns <rog...@rogerbinns.com> wrote:
>> -BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Jim Wilcoxson wrote:
>>> I guess I am a bit confused.  Igor says it's not possible, but Roger
>>> says my example works.
>>
>> For the exact code you were running the error message came from pysqlite
>> and not from SQLite.  Other posters are not aware of the underlying
>> implementation details in pysqlite and it doing all sorts of transaction
>> related stuff behind the scenes, and so were telling you about SQLite
>> behaviour in general.  In summary the behaviour you see from pysqlite
>> will not necessarily match the behaviour you would see if using the
>> SQLite C api directly to do the same thing.
>>
>> It was issues like this that led me to create APSW in the first place
>> because I wanted the exact SQLite semantics and not some random
>> "standard" that tried to pretend all database apis behave in exactly the
>> same way.  I also document which SQLite apis are called from each Python
>> api and the index lets you work in the reverse direction:
>>
>>   http://apsw.googlecode.com/svn/publish/genindex.html#S
>>
>>> One other point of confusion is that sometimes an interface layer will
>>> grab all or a bunch of the rows after a select, even though it may
>>> hand them to upper layers one row at a time.
>>
>> Note that none of the Python apis do that by default and you have to
>> call a method to grab all the results.  The apis for other databases do
>> do that by default since they use the network and grabbing result rows
>> one at a time would be far slower than batching them.
>>
>>> For my particular app, I'm only adding new rows, so there wouldn't be
>>> too many weird effects.
>>
>> There was a poster (I forget if it was this list or the pysqlite/APSW
>> one) who managed to get an infinite loop.  Effectively he was changing a
>> rowid for each row found (an UPDATE was doing it - rowid wasn't
>> explicitly mentioned in the query) while doing a select over all items.
>>  The constantly updated rowid effectively added the changed row to the
>> end of the select results and so it carried on forever.
>>
>> Although the relevant information is in my APSW docs, I think I need to
>> pull more of it together and make things clearer.
>>
>> Roger
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG v1.4.9 (GNU/Linux)
>>
>> iEYEARECAAYFAkpKfdYACgkQmOOfHg372QTAjQCeL3zZCBh8Pgnh/GT95RH0JIY0
>> qGgAoKsqr66QwxKDtQNqe6W1jayU90/J
>> =Xs+y
>> -END PGP SIGNATURE-
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
> --
> Software first.  Software lasts!
>


-- 
Software first.  Software lasts!
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] shell .schema command doesn't display attached table schemas

2017-02-21 Thread Jim Wilcoxson
Example:

[jim@mb ~]$ sqlite3 dummy
-- Loading resources from /Users/jim/.sqliterc
SQLite version 3.15.1 2016-11-04 12:08:49 with the Encryption Extension
Copyright 2016 Hipp, Wyrick & Company, Inc.
Enter ".help" for usage hints.
sqlite> create table t1(text);
sqlite> ^D

[jim@mb ~]$ ls -l dummy
-rw-r--r--  1 jim  staff  8192 Feb 21 10:23 dummy

[jim@mb ~]$ sqlite3 dummy2
-- Loading resources from /Users/jim/.sqliterc
SQLite version 3.15.1 2016-11-04 12:08:49 with the Encryption Extension
Copyright 2016 Hipp, Wyrick & Company, Inc.
Enter ".help" for usage hints.
sqlite> create table t2(text);
sqlite> ^D

[jim@mb ~]$ ls -l dummy2
-rw-r--r--  1 jim  staff  8192 Feb 21 10:24 dummy2

[jim@mb ~]$ sqlite3 dummy
-- Loading resources from /Users/jim/.sqliterc
SQLite version 3.15.1 2016-11-04 12:08:49 with the Encryption Extension
Copyright 2016 Hipp, Wyrick & Company, Inc.
Enter ".help" for usage hints.
sqlite> attach dummy2 as d2;
sqlite> .tables
d2.t2  t1
sqlite> .schema
CREATE TABLE t1(text);
sqlite> ^D
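
In the meantime, the attached schema can still be pulled out of the
attached database's sqlite_master, e.g. (sketch):

import sqlite3

con = sqlite3.connect('dummy')
con.execute("attach 'dummy2' as d2")
# .schema only walks main, but the attached schema is in d2.sqlite_master
for (sql,) in con.execute('select sql from d2.sqlite_master where sql is not null'):
    print(sql)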

HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users