Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-22 Thread Thomas Fjellstrom
On February 15, 2011, Black, Michael (IS) wrote:
> I'll give you another failure point that most people never see or think of.
> 
> I used to manage numerous Linux systems with RAID-5.  One time I had a
> drive fail, the spare kicked in, and then during the rebuild a 2nd drive
> failed...hosing the RAID (i.e. two failed disks).
> 
> The problem was...normal disk access/backup only scanned the in-use blocks.
>  The RAID resync scanned the entire disk which had never been done.

I think this is one reason mdraid on Linux will, by default, run a resync 
regularly.
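
For reference, that regular scrub is the md "check" action, which can also be 
started by hand through sysfs. A minimal Python sketch, assuming an array named 
md0, root privileges, and a kernel with the md sysfs interface (distributions 
usually schedule this from cron instead):

# Ask the md layer to read-verify every block of the array;
# progress shows up in /proc/mdstat.
with open('/sys/block/md0/md/sync_action', 'w') as f:
    f.write('check')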

> After that I put in a utility that did a nightly "dd if=/dev/md0
> >/dev/null" job to force a scan of the entire disk set.
> 
> This is one reason why they invented RAID6.
> 
> There's just so many ways to fail...sigh...
> 
> 
> Michael D. Black
> Senior Scientist
> NG Information Systems
> Advanced Analytics Directorate
> 
> 
> 
> 
> From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on
> behalf of Simon Slavin [slav...@bigfraud.org] Sent: Monday, February 14,
> 2011 5:04 PM
> To: General Discussion of SQLite Database
> Subject: EXT :Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected
> 
> On 14 Feb 2011, at 8:50pm, Black, Michael (IS) wrote:
> > And if you want to talk about data reliability...BACK UP YOUR DATA.
> 
> And keep the backups off-site.  And once a year try to actually restore one
> of them.  I've earned /lots/ of money from companies with backup systems
> that claimed to work fine but were either faulty, or being used to backup
> to media with low-level corruption.
> 
> If there was any money in it I'd write a book, but there isn't.  Companies
> prefer to do it wrong and then hire a consultant after it fails.
> 
> On 14 Feb 2011, at 9:13pm, Jim Wilcoxson wrote:
> > I think what would be useful is for drives to have 3 settings:
> > 
> > a) the drive can do whatever it wants to optimize performance
> > b) the drive can reorder writes, but not across a sync (ACI, but no D)
> > c) the drive has to respect all syncs (ACID)
> > 
> > If the drive mgfr wants to make a) the default, that's fine, but there
> > should be an easy way to request the other 2 from an application.
> > Users are not usually sophisticated enough to know when it's okay to
> > cache writes or not okay.  For my use of SQLite, b) would probably be
> > fine, but a) apparently is not since it corrupts databases.
> 
> This is a job for someone who wants to contact lots of manufacturers and
> ask if they provide models of their drives which conform.  Some do,
> because they want to sell their drives for use in servers.  Often these
> drives are physically identical to their mass-market drives, but they have
> fewer bad blocks, and have jumpers in a different place or different
> firmware.
> 
> > Michael mentioned doing backups and forgetting about all of this, but
> > many people are using USB drives to store their backups.  So there is
> > a bit of a Catch-22 here.  Almost all modern-day filesystems are going
> > to depend on some kind of journalling to prevent corrupted file
> > systems, and as far I as know, journalling filesystems depend on syncs
> > to maintain FS consistency.
> 
> If the data is backed up when SQLite has the files closed, things are fine.
>  The really hard part of this comes when you're working with systems that
> must be live at all times.  Which is why you'll never see a bank use
> SQLite to maintain its live transaction system.  The big DBMSs have
> systems inside themselves that allow the backing-up of an active
> ever-changing database.
> 
> Simon.


-- 
Thomas Fjellstrom
tfjellst...@strangesoft.net


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-16 Thread Shane Harrelson
On Mon, Feb 14, 2011 at 6:04 PM, Simon Slavin  wrote:
>
> If the data is backed up when SQLite has the files closed, things are fine.  
> The really hard part of this comes when you're working with systems that must 
> be live at all times.  Which is why you'll never see a bank use SQLite to 
> maintain its live transaction system.  The big DBMSs have systems inside 
> themselves that allow the backing-up of an active ever-changing database.
>
> Simon.

SQLite is used as the DB engine on some major 24/7 data stores (I
can't be more specific due to NDAs).   You can get an idea of some of
the high demand areas SQLite is used in by reviewing the consortium
membership list (http://www.sqlite.org) and well-known users list
(http://www.sqlite.org/famous.html), though this should not be seen as
an endorsement by any of the listed companies.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-16 Thread Matthew Jones
> I'll give you another failure point that most people never see or think of.
>
> I used to manage numerous Linux systems with RAID-5.  One time I had a drive 
> fail, the spare kicked in, and then during the rebuild a 2nd drive 
> failed...hosing the RAID (i.e. two failed disks).
>
> The problem was...normal disk access/backup only scanned the in-use blocks.  
> The RAID resync scanned the entire disk which had never been done.
>
> After that I put in a utility that did a nightly "dd if=/dev/md0 >/dev/null" 
> job to force a scan of the entire disk set.
>
> This is one reason why they invented RAID6.
>
> There's just so many ways to fail...sigh...

Off topic but an interesting read for disk failures:

http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl

-- 
Matthew Jones
Hewlett-Packard Ltd
Long Down Avenue
Stoke Gifford
Bristol.  BS34 8QZ
Tel:   +44 (0) 117 312 7490
Email:matthew.jo...@hp.com

Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks, 
RG12 1HN. Registered No: 690597 England



Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-15 Thread Black, Michael (IS)
I'll give you another failure point that most people never see or think of.

I used to manage numerous Linux systems with RAID-5.  One time I had a drive 
fail, the spare kicked in, and then during the rebuild a 2nd drive 
failed...hosing the RAID (i.e. two failed disks).

The problem was that normal disk access and backups only touched the in-use blocks; 
the RAID resync scanned the entire disk, which had never been done before.

After that I put in a utility that did a nightly "dd if=/dev/md0 >/dev/null" 
job to force a scan of the entire disk set.
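
A rough Python equivalent of that nightly dd job, for anyone who would rather 
script it (the /dev/md0 path is just the example from above, and this needs to 
run as root):

import sys

# Read the whole block device sequentially and throw the data away,
# forcing every sector (used or not) to be read at least once.
dev = sys.argv[1] if len(sys.argv) > 1 else '/dev/md0'
with open(dev, 'rb') as f:
    while True:
        chunk = f.read(1024 * 1024)
        if not chunk:
            break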

This is one reason why they invented RAID6.

There's just so many ways to fail...sigh...


Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Simon Slavin [slav...@bigfraud.org]
Sent: Monday, February 14, 2011 5:04 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

On 14 Feb 2011, at 8:50pm, Black, Michael (IS) wrote:

> And if you want to talk about data reliability...BACK UP YOUR DATA.

And keep the backups off-site.  And once a year try to actually restore one of 
them.  I've earned /lots/ of money from companies with backup systems that 
claimed to work fine but were either faulty, or being used to backup to media 
with low-level corruption.

If there was any money in it I'd write a book, but there isn't.  Companies 
prefer to do it wrong and then hire a consultant after it fails.

On 14 Feb 2011, at 9:13pm, Jim Wilcoxson wrote:

> I think what would be useful is for drives to have 3 settings:
>
> a) the drive can do whatever it wants to optimize performance
> b) the drive can reorder writes, but not across a sync (ACI, but no D)
> c) the drive has to respect all syncs (ACID)
>
> If the drive mgfr wants to make a) the default, that's fine, but there
> should be an easy way to request the other 2 from an application.
> Users are not usually sophisticated enough to know when it's okay to
> cache writes or not okay.  For my use of SQLite, b) would probably be
> fine, but a) apparently is not since it corrupts databases.

This is a job for someone who wants to contact lots of manufacturers and ask if 
they provide models of their drives which conform.  Some do, because they want 
to sell their drives for use in servers.  Often these drives are physically 
identical to their mass-market drives, but they have fewer bad blocks, and have 
jumpers in a different place or different firmware.

> Michael mentioned doing backups and forgetting about all of this, but
> many people are using USB drives to store their backups.  So there is
> a bit of a Catch-22 here.  Almost all modern-day filesystems are going
> to depend on some kind of journalling to prevent corrupted file
> systems, and as far I as know, journalling filesystems depend on syncs
> to maintain FS consistency.

If the data is backed up when SQLite has the files closed, things are fine.  
The really hard part of this comes when you're working with systems that must 
be live at all times.  Which is why you'll never see a bank use SQLite to 
maintain its live transaction system.  The big DBMSs have systems inside 
themselves that allow the backing-up of an active ever-changing database.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Simon Slavin

On 14 Feb 2011, at 8:50pm, Black, Michael (IS) wrote:

> And if you want to talk about data reliability...BACK UP YOUR DATA.

And keep the backups off-site.  And once a year, try to actually restore one of 
them.  I've earned /lots/ of money from companies with backup systems that 
claimed to work fine but were either faulty, or being used to back up to media 
with low-level corruption.

If there was any money in it I'd write a book, but there isn't.  Companies 
prefer to do it wrong and then hire a consultant after it fails.

On 14 Feb 2011, at 9:13pm, Jim Wilcoxson wrote:

> I think what would be useful is for drives to have 3 settings:
> 
> a) the drive can do whatever it wants to optimize performance
> b) the drive can reorder writes, but not across a sync (ACI, but no D)
> c) the drive has to respect all syncs (ACID)
> 
> If the drive mgfr wants to make a) the default, that's fine, but there
> should be an easy way to request the other 2 from an application.
> Users are not usually sophisticated enough to know when it's okay to
> cache writes or not okay.  For my use of SQLite, b) would probably be
> fine, but a) apparently is not since it corrupts databases.

This is a job for someone who wants to contact lots of manufacturers and ask if 
they provide models of their drives which conform.  Some do, because they want 
to sell their drives for use in servers.  Often these drives are physically 
identical to their mass-market drives, but they have fewer bad blocks, and have 
jumpers in a different place or different firmware.

> Michael mentioned doing backups and forgetting about all of this, but
> many people are using USB drives to store their backups.  So there is
> a bit of a Catch-22 here.  Almost all modern-day filesystems are going
> to depend on some kind of journalling to prevent corrupted file
> systems, and as far I as know, journalling filesystems depend on syncs
> to maintain FS consistency.

If the data is backed up when SQLite has the files closed, things are fine.  
The really hard part of this comes when you're working with systems that must 
be live at all times.  Which is why you'll never see a bank use SQLite to 
maintain its live transaction system.  The big DBMSs have systems inside 
themselves that allow the backing-up of an active ever-changing database.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 3:02 PM, Max Vlasov  wrote:
> On Mon, Feb 14, 2011 at 8:42 PM, Jim Wilcoxson  wrote:
>
>> On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov  wrote:
>> >> So my question is, does it maintain the other 3 parts of ACID, so that
>> >> the database will never be in a corrupted state after a power loss,
>> >> even though it may be missing some writes that were confirmed?
>> >
>> > Jim, I think the answer to your question is already in Max's tests:
>> > the USB drive is completely unreliable and can easily lead to database
>> > corruption. I'll explain. Max's tests showed that there were
>> > situations when database and journal were different not by one
>> > transaction but by several ones. So if one makes changes to several
>> > database pages (located in different disk sectors) and/or makes
>> > transactions touching several database pages (which makes multi-page
>> > journal files) then these pages can be way out of sync with each other
>> > (including pages inside journal). And this will easily lead to
>> > database corruption.
>>
>> You are right I think.  I wrote my own test program and ran it on a
>> Macbook Pro with a Seagate USB drive.  Here is the Python test
>> program:
>>
>> [jim@mb backup]$ time py t.py
>> 4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec
>>
>>
>
> Jim, your tests also shows (this time on a different os) that either you
> have a fantastic hd with 18000 Rpm or just someone at Seagate _want_ you to
> think you have a fantastic hd :)
>
> Just wondering, I know this maybe sounds fantastic, but I'm thinking whether
> some "acid-passed harddrives" at sqlite.org can encourage manufacturers to
> hold the horses. The logic would be like this: if some model is present in
> either section then googling it will make this page very high in the google
> results  (due to high pagerank of sqlite.org). So they probably very quickly
> notice that this page at least partly can affect their sales. Unfortunately
> the technical side is more complex, the developers just can't rely on
> e-mails from users, this should be some sqlite-originated tests performing
> on a known configuration and it'd better be an oss os with known tuning.
> Maybe some other, less fantastic form of such tests could be possible...
>
> Max

I think what would be useful is for drives to have 3 settings:

a) the drive can do whatever it wants to optimize performance
b) the drive can reorder writes, but not across a sync (ACI, but no D)
c) the drive has to respect all syncs (ACID)

If the drive manufacturer wants to make a) the default, that's fine, but there
should be an easy way to request the other two from an application.
Users are not usually sophisticated enough to know when it's okay to
cache writes and when it isn't.  For my use of SQLite, b) would probably be
fine, but a) apparently is not, since it corrupts databases.
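
SQLite itself already has a loosely related per-connection knob, PRAGMA
synchronous, though it only controls how often SQLite *asks* for a sync; it
cannot make the drive honour the request. A minimal Python sketch of the three
levels (the mapping onto a/b/c above is only a rough analogy, not an exact one):

import sqlite3

con = sqlite3.connect('test.db')

# OFF:    SQLite never calls fsync itself -- fastest, least durable (roughly "a").
# NORMAL: syncs only at the most critical moments (roughly "b").
# FULL:   syncs on every commit (roughly "c") -- but only if the drive
#         itself honours the sync request, which is the whole point of
#         this thread.
con.execute('PRAGMA synchronous=FULL')
con.execute('create table if not exists t (i integer)')
con.execute('insert into t values (1)')
con.commit()
con.close()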

Michael mentioned doing backups and forgetting about all of this, but
many people are using USB drives to store their backups.  So there is
a bit of a Catch-22 here.  Almost all modern-day filesystems are going
to depend on some kind of journalling to prevent corrupted file
systems, and as far as I know, journalling filesystems depend on syncs
to maintain FS consistency.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Max Vlasov
On Mon, Feb 14, 2011 at 11:50 PM, Black, Michael (IS) <
michael.bla...@ngc.com> wrote:

> And if you want to talk about data reliability...BACK UP YOUR DATA.
> The likely failure points I can think of are:
> #1 Power supply (redundant supplies available)
> #2 Hard drive smoked (and your data is toast anyways -- RAID can help).
> #3 Blue screen (or kernel lockup on Unix)
> #4 CPU smoked. (usually leads to #3)
> #5 RAM smoked. (usually leads to #3)
> #6 Motherboard smoked (usually just dies or #3)
>
> The only way to increase your reliability is to replicate and/or backup.
>  All the whining about acid-tested drives is a waste of time.  #3 through #6
> have no solution though they shouldn't cause the hard drive corruption
> you're worried about.  And since #1 and #2 have solutions what's the
> problem?
>
>
I do see a problem, especially for SQLite, since in contrast to server-side
databases it is more consumer-oriented: Firefox and Apple Mail, to name a few.
And if we remember that it's very portable, we see other possibilities. For
example, there's already Portable Firefox, which is usually run from USB
sticks. What happens to such a browser's user history if the user forgets to
safely unmount the flash drive in the middle of a page reload? What about
music-management software that is portable enough to be run right from an
external hard drive? You don't usually use PostgreSQL or MySQL for such
applications.

Max


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Black, Michael (IS)
And you think Jim's timings are wrong because...?

I've already shown that you can get speeds just like the ones he's seeing.

That's what you get on a good write-cache-enabled drive.

And if you want to talk about data reliability...BACK UP YOUR DATA.
The likely failure points I can think of are:
#1 Power supply (redundant supplies available)
#2 Hard drive smoked (and your data is toast anyways -- RAID can help).
#3 Blue screen (or kernel lockup on Unix)
#4 CPU smoked. (usually leads to #3)
#5 RAM smoked. (usually leads to #3)
#6 Motherboard smoked (usually just dies or #3)

The only way to increase your reliability is to replicate and/or back up.  All 
the whining about acid-tested drives is a waste of time.  #3 through #6 have no 
solution, though they shouldn't cause the hard drive corruption you're worried 
about.  And since #1 and #2 have solutions, what's the problem?

Linux systems have write-cache enabled by default and they are running on an 
awful lot of servers around the world with no data loss.



Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Max Vlasov [max.vla...@gmail.com]
Sent: Monday, February 14, 2011 2:02 PM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

On Mon, Feb 14, 2011 at 8:42 PM, Jim Wilcoxson <pri...@gmail.com> wrote:

> On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov <paiva...@gmail.com> wrote:
> >> So my question is, does it maintain the other 3 parts of ACID, so that
> >> the database will never be in a corrupted state after a power loss,
> >> even though it may be missing some writes that were confirmed?
> >
> > Jim, I think the answer to your question is already in Max's tests:
> > the USB drive is completely unreliable and can easily lead to database
> > corruption. I'll explain. Max's tests showed that there were
> > situations when database and journal were different not by one
> > transaction but by several ones. So if one makes changes to several
> > database pages (located in different disk sectors) and/or makes
> > transactions touching several database pages (which makes multi-page
> > journal files) then these pages can be way out of sync with each other
> > (including pages inside journal). And this will easily lead to
> > database corruption.
>
> You are right I think.  I wrote my own test program and ran it on a
> Macbook Pro with a Seagate USB drive.  Here is the Python test
> program:
>
> [jim@mb backup]$ time py t.py
> 4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec
>
>

Jim, your tests also shows (this time on a different os) that either you
have a fantastic hd with 18000 Rpm or just someone at Seagate _want_ you to
think you have a fantastic hd :)

Just wondering, I know this maybe sounds fantastic, but I'm thinking whether
some "acid-passed harddrives" at sqlite.org can encourage manufacturers to
hold the horses. The logic would be like this: if some model is present in
either section then googling it will make this page very high in the google
results  (due to high pagerank of sqlite.org). So they probably very quickly
notice that this page at least partly can affect their sales. Unfortunately
the technical side is more complex, the developers just can't rely on
e-mails from users, this should be some sqlite-originated tests performing
on a known configuration and it'd better be an oss os with known tuning.
Maybe some other, less fantastic form of such tests could be possible...

Max


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Max Vlasov
On Mon, Feb 14, 2011 at 8:42 PM, Jim Wilcoxson  wrote:

> On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov  wrote:
> >> So my question is, does it maintain the other 3 parts of ACID, so that
> >> the database will never be in a corrupted state after a power loss,
> >> even though it may be missing some writes that were confirmed?
> >
> > Jim, I think the answer to your question is already in Max's tests:
> > the USB drive is completely unreliable and can easily lead to database
> > corruption. I'll explain. Max's tests showed that there were
> > situations when database and journal were different not by one
> > transaction but by several ones. So if one makes changes to several
> > database pages (located in different disk sectors) and/or makes
> > transactions touching several database pages (which makes multi-page
> > journal files) then these pages can be way out of sync with each other
> > (including pages inside journal). And this will easily lead to
> > database corruption.
>
> You are right I think.  I wrote my own test program and ran it on a
> Macbook Pro with a Seagate USB drive.  Here is the Python test
> program:
>
> [jim@mb backup]$ time py t.py
> 4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec
>
>

Jim, your test also shows (this time on a different OS) that either you have a
fantastic HD spinning at 18000 RPM, or someone at Seagate _wants_ you to think
you have a fantastic HD :)

Just wondering, and I know this may sound fanciful, but I'm thinking about
whether an "ACID-passed hard drives" page at sqlite.org could encourage
manufacturers to hold their horses. The logic would be this: if some model is
listed in either section, then googling it would place that page very high in
the results (due to the high PageRank of sqlite.org), so manufacturers would
quickly notice that the page can at least partly affect their sales.
Unfortunately the technical side is more complex: the developers can't just
rely on e-mails from users; it would have to be SQLite-originated tests
performed on a known configuration, preferably an open-source OS with known
tuning. Maybe some other, less fanciful form of such tests would be possible...

Max


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov  wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when database and journal were different not by one
> transaction but by several ones. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which makes multi-page
> journal files) then these pages can be way out of sync with each other
> (including pages inside journal). And this will easily lead to
> database corruption.

You are right I think.  I wrote my own test program and ran it on a
Macbook Pro with a Seagate USB drive.  Here is the Python test
program:

import os, sqlite3, time

path = '/Volumes/TM/file.db'
exists = os.path.exists(path)
con = sqlite3.connect(path)
if exists:
    k = con.execute('select max(k) from t').fetchone()[0]
    print 'Previous run:', k
    con.execute('drop table t')
    con.commit()
con.execute('create table t (k integer primary key)')

t0 = time.time()
try:
    for i in range(5000):
        con.execute('insert into t values(?)', (i,))
        con.commit()
    con.close()
except Exception, err:
    print err, 'at record', i

et = time.time() - t0
print i, 'records inserted in', et, 'seconds;', i/et, 'recs/sec'


Here are the results:

1. First run, no unplug:

[jim@mb backup]$ time py t.py
4999 records inserted in 17.7488458157 seconds; 281.652117097 recs/sec

real    0m17.794s
user    0m0.729s
sys     0m2.139s

2. Unplugged during the run:

[jim@mb backup]$ time py t.py
Previous run: 4999
disk I/O error at record 2681
2681 records inserted in 9.2398519516 seconds; 290.156164194 recs/sec

real    0m9.294s
user    0m0.410s
sys     0m1.193s

3. This shows that only 161 records made it to disk in the previous
run, not 2681:

[jim@mb backup]$ time py t.py
Previous run: 161
disk I/O error at record 977
977 records inserted in 3.34149599075 seconds; 292.384010845 recs/sec

real    0m3.453s
user    0m0.162s
sys     0m0.432s

4. This time there was no unplug, but SQLite could not recover the db:

[jim@mb backup]$ time py t.py
Previous run: 161
database disk image is malformed at record 1323
1323 records inserted in 4.26392698288 seconds; 310.277358246 recs/sec

real    0m4.341s
user    0m0.193s
sys     0m0.535s

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
>
> Pavel
>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson  wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov  wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson  wrote:
>>>
>>> > > But I thought about how it would be possible to test this explanation .
>>> > I'm
>>> > > going to do some tests that works like this. The same id updating, but 
>>> > > in
>>> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
>>> > say
>>> > > that it can't commit particular insert and I can see what is the value 
>>> > > of
>>> > > this insert. After replugging two variants possible:
>>> > > - I will find the previous value in the base. If for several dozens 
>>> > > tests
>>> > > the result is the same, that would mean that the situation is still
>>> > needed
>>> > > to be explained.
>>> > > - I will find some value less than previous to the one sqlite could not
>>> > > commit at least in some of the tests (maybe -3, -4 to the failed one). 
>>> > > In
>>> > > this case the explanation will be confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> > >
>>> > >
>>> > Sounds like a clever test to me!
>>> >
>>> >
>>> >
>>> Thanks for supporting :) now the results.
>>>
>>> I switched off the journal:
>>>  PRAGMA journal_mode=OFF;
>>> As I described, the only record contained id=1 and the sql query was
>>>  UPDATE TestTable SET Id=Id + 1
>>> The cord was unplugged in the middle of the 10,000 operations when about
>>> 5000 records were updated. The hd was bus-powered external hard drive and I
>>> repeated the test several times. No test showed expected value that confirms
>>> the Jim's explanation about postponed writing. The actual values are below
>>> (the first one is the expected value to be found after reopening, the next
>>> one is actual value and the difference is self-explainable)
>>>
>>> 5094 -> 5086 = -8
>>> 5084 -> 5083 = -1
>>> 5070 -> 5049 = -21
>>> 5082 -> 5069 = -13
>>> 5095 -> 5086 = -9
>>> 5072 -> 5033 = -39
>>> 5118 -> 5053 = -65
>>> 5081 -> 5075 = -6
>>>
>>> So the maximum of non-flushed commits was 65
>>>
>>> I also made a couple of tests with journal on and see what is the difference
>>> 

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Max Vlasov
On Mon, Feb 14, 2011 at 7:48 PM, Jim Wilcoxson  wrote:

> On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov  wrote:
> >> So my question is, does it maintain the other 3 parts of ACID, so that
> >> the database will never be in a corrupted state after a power loss,
> >> even though it may be missing some writes that were confirmed?
> >
> > Jim, I think the answer to your question is already in Max's tests:
> > the USB drive is completely unreliable and can easily lead to database
> > corruption. I'll explain. Max's tests showed that there were
> > situations when database and journal were different not by one
> > transaction but by several ones. So if one makes changes to several
> > database pages (located in different disk sectors) and/or makes
> > transactions touching several database pages (which makes multi-page
> > journal files) then these pages can be way out of sync with each other
> > (including pages inside journal). And this will easily lead to
> > database corruption.
> >
> >
> > Pavel
>
> Maybe Max would be kind enough to re-run his test with an insert
> instead of an update and see if there ever is corruption or if just
> the last n inserts are missing.
>
>
>

Jim, I agree it's interesting to know more about USB drive consistency in the
context of SQLite; at least the testing is easy.

Unfortunately, when I first started the tests there was already a malformed-db
error, but just once. I have just managed to repeat it now.

The db contains a single empty table:
  CREATE TABLE [TestTable] ([Id] INTEGER PRIMARY KEY AUTOINCREMENT)
The repeated query is:
  INSERT INTO TestTable DEFAULT VALUES
I suppose the journal should be present, otherwise the ACID test makes no sense:
  pragma journal_mode=delete;

The version performing the inserts was SQLite 3.7.5, just in case something was
fixed recently.
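
Max's test tool itself isn't shown here, but a minimal Python sketch of the
same harness (same schema and pragma as above; the E: drive letter is only an
example path on the USB drive) would look something like this:

import sqlite3

con = sqlite3.connect('E:/unplug.db')          # example path on the USB drive
con.execute('pragma journal_mode=delete')
con.execute('CREATE TABLE IF NOT EXISTS TestTable '
            '([Id] INTEGER PRIMARY KEY AUTOINCREMENT)')
con.commit()

last_ok = None
try:
    for n in range(10000):
        con.execute('INSERT INTO TestTable DEFAULT VALUES')
        con.commit()                           # pull the cord somewhere in here
        last_ok = n
except Exception, err:
    print 'failed at insert', n, ':', err

print 'last confirmed insert:', last_ok
# After replugging, compare max(Id) and run "pragma integrity_check"
# against last_ok to see whether the confirmed commits survived.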

So after about 500 inserts I pulled the cord and got "can't insert #511". First
there was an error in the system tray (Windows 7) about a postponed-write error
for the journal file. That surprised me a little, since it partly undermines the
OS's claim of innocence: it officially states the drive is set up for "quick
removal", yet it still complains about an unfinished operation. Maybe the unplug
interrupted a write to the journal file and that is what routes to this tray
error; I don't know.

OK, I replugged the drive. No journal file was present.
On open:
the 3.7.5 shell said "Error: database disk image is malformed" for both pragma
integrity_check; and pragma encoding;
the 3.6.10 shell correctly said UTF-8 for pragma encoding; and for integrity_check; it said:

*** in database main ***
On page 2 at right child: invalid page number 7
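
For completeness, the same checks can be run from Python after replugging (a
sketch, reusing the example path from the harness above):

import sqlite3

con = sqlite3.connect('E:/unplug.db')      # the database from the test above
print con.execute('pragma encoding').fetchone()[0]
for row in con.execute('pragma integrity_check'):
    print row[0]                           # 'ok', or messages like the one above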

Btw, some of the tests (both malformed and not) made the system itself suspect
errors on this NTFS drive and suggest checking the disk, and in the report
there was a line about "fixing the volume bitmap" (I can't give the exact
English wording since the error was on a Russian Windows). This partly restores
the OS's innocence, since apparently even the Windows NTFS driver suffers from
this postponed writing inside the drive.

Max


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Simon Slavin

On 14 Feb 2011, at 4:48pm, Jim Wilcoxson wrote:

> I believe (though am not sure) that the drive can reorder individual
> sector/record writes between syncs however it likes, as long as it
> doesn't allow writes to cross a sync and also executes the sync groups
> (for lack of a better term - the group of writes between syncs) in
> order.

At the low level of the device driver there is a distinction between telling 
the hard disk to 'sync' and to 'flush'.  Both force all pending writes, but one 
returns immediately and the other waits until the operation is complete before 
returning.  OS device-driver writers tend to choose one or the other rather than 
implement both, and which one to use is a matter of taste and prejudice.  The 
definitive way to do it is simply to stop giving the drive commands and wait 
until it says "I have zero writes pending."  Unfortunately it's harder to detect 
this (you have to keep polling) than it is to just issue a 'sync'.
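
At the application level, the closest handle on this (on Mac OS X, where Jim's
USB test ran) is the difference between fsync(), which only hands the data to
the drive, and the F_FULLFSYNC fcntl, which also asks the drive to flush its
own cache; SQLite exposes the latter as PRAGMA fullfsync. A minimal Python
sketch, assuming a Darwin system (the /Volumes/TM paths reuse Jim's mount as an
example, and the fallback value 51 is the Darwin constant for F_FULLFSYNC in
case the Python build does not export it):

import fcntl, os, sqlite3

# Darwin defines F_FULLFSYNC as 51; fall back to that value if this
# Python build does not export the constant (platform assumption).
F_FULLFSYNC = getattr(fcntl, 'F_FULLFSYNC', 51)

f = open('/Volumes/TM/testfile', 'w')      # example file on the USB mount
f.write('commit record')
f.flush()
os.fsync(f.fileno())                       # data handed to the drive; its cache may still hold it
fcntl.fcntl(f.fileno(), F_FULLFSYNC)       # also ask the drive to flush its own cache
f.close()

# SQLite's related knob (only has an effect on Mac OS X builds):
con = sqlite3.connect('/Volumes/TM/file.db')
con.execute('PRAGMA fullfsync=ON')
con.close()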

Add to this the problems that the drive you're using may be part of a RAID 
array or even a mounted drive image (because you're running on a virtual 
machine), and you have a complete nightmare of consistency out there.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 10:56 AM, Pavel Ivanov  wrote:
>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
>
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when database and journal were different not by one
> transaction but by several ones. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which makes multi-page
> journal files) then these pages can be way out of sync with each other
> (including pages inside journal). And this will easily lead to
> database corruption.
>
>
> Pavel

Hmm... I guess I am not quite convinced. :)

I believe (though am not sure) that the drive can reorder individual
sector/record writes between syncs however it likes, as long as it
doesn't allow writes to cross a sync and also executes the sync groups
(for lack of a better term - the group of writes between syncs) in
order.

As an example, say you are inserting records on each commit instead of
updating.  Maybe the last record inserted was 1000 and the plug is
pulled.  When plugged back in, if you only have 800 records in the db,
it's not so bad.  However, if there is a corrupt db instead, it is
much worse.

Maybe Max would be kind enough to re-run his test with an insert
instead of an update and see if there ever is corruption or if just
the last n inserts are missing.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com

>
> On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson  wrote:
>> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov  wrote:
>>>
>>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson  wrote:
>>>
>>> > > But I thought about how it would be possible to test this explanation .
>>> > I'm
>>> > > going to do some tests that works like this. The same id updating, but 
>>> > > in
>>> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
>>> > say
>>> > > that it can't commit particular insert and I can see what is the value 
>>> > > of
>>> > > this insert. After replugging two variants possible:
>>> > > - I will find the previous value in the base. If for several dozens 
>>> > > tests
>>> > > the result is the same, that would mean that the situation is still
>>> > needed
>>> > > to be explained.
>>> > > - I will find some value less than previous to the one sqlite could not
>>> > > commit at least in some of the tests (maybe -3, -4 to the failed one). 
>>> > > In
>>> > > this case the explanation will be confirmed.
>>> > >
>>> > > How about this? Does it sound reasonable?
>>> > >
>>> > >
>>> > Sounds like a clever test to me!
>>> >
>>> >
>>> >
>>> Thanks for supporting :) now the results.
>>>
>>> I switched off the journal:
>>>  PRAGMA journal_mode=OFF;
>>> As I described, the only record contained id=1 and the sql query was
>>>  UPDATE TestTable SET Id=Id + 1
>>> The cord was unplugged in the middle of the 10,000 operations when about
>>> 5000 records were updated. The hd was bus-powered external hard drive and I
>>> repeated the test several times. No test showed expected value that confirms
>>> the Jim's explanation about postponed writing. The actual values are below
>>> (the first one is the expected value to be found after reopening, the next
>>> one is actual value and the difference is self-explainable)
>>>
>>> 5094 -> 5086 = -8
>>> 5084 -> 5083 = -1
>>> 5070 -> 5049 = -21
>>> 5082 -> 5069 = -13
>>> 5095 -> 5086 = -9
>>> 5072 -> 5033 = -39
>>> 5118 -> 5053 = -65
>>> 5081 -> 5075 = -6
>>>
>>> So the maximum of non-flushed commits was 65
>>>
>>> I also made a couple of tests with journal on and see what is the difference
>>> between expected, non-restored value, and restored value. One of the
>>> results:
>>> 5078 (expeced) -> 5077 (non restored) -> 5021 (restored)
>>> It seems that for correctly operating os/hardware the numbers should be
>>> 5078->5077->5077 or no journal present depending on the moment of
>>> unplugging. So this postponing also made possible existing of some prior
>>> version of journal file.
>>>
>>> So, the next question was 'where?'. Is this software or hardware to blame.
>>> Richard always mentioned hardware in this context, I decided to check this
>>> with another device. This time it was a harddisk box with separated bus and
>>> power (the hd was Maxtor IDE 80Gb drive).
>>>
>>> The first variation was similar to the one with bus-powered, this time I
>>> used hard button on the box that is equivalent to unplugging both connection
>>> and power. The difference for a single test was really impressive 5355 ->
>>> 4445 = -910. And when I calculated 

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Simon Slavin

On 14 Feb 2011, at 3:56pm, Pavel Ivanov wrote:

>> So my question is, does it maintain the other 3 parts of ACID, so that
>> the database will never be in a corrupted state after a power loss,
>> even though it may be missing some writes that were confirmed?
> 
> Jim, I think the answer to your question is already in Max's tests:
> the USB drive is completely unreliable and can easily lead to database
> corruption. I'll explain. Max's tests showed that there were
> situations when database and journal were different not by one
> transaction but by several ones. So if one makes changes to several
> database pages (located in different disk sectors) and/or makes
> transactions touching several database pages (which makes multi-page
> journal files) then these pages can be way out of sync with each other
> (including pages inside journal). And this will easily lead to
> database corruption.

It's worth noting that SQLite's journaling mechanism is designed to make up for 
some ACID frailties of the storage medium.  Even if your drive doesn't support 
ACID correctly, SQLite (seen as a whole and used as intended) may still be able 
to manage ACID on it.  You can see a discussion of this in section 2 of

http://www.sqlite.org/atomiccommit.html

Technology has now moved on from when that page was written.  For example, it's 
no longer possible for a drive to write part of a sector: the entire sector 
would be marked as corrupt now.  However, it might be worth changing the page 
(or splitting section 2 off to another page) to list precisely the requirements 
of the storage medium SQLite needs to accomplish this.  For instance, it may be 
that if the drive doesn't support write-in-order then even SQLite can't 
guarantee ACID.

I don't see a proper cure for this: software has to trust hardware to some 
extent, but unlike the old days hardware is now a set of independent 
components, each of which can lie to the others.  For SQLite to support ACID it 
may be that you need to set up your computer as a server, using drives bought 
as server drives which are set up with write-through-caching and other slower 
but safer settings.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Pavel Ivanov
> So my question is, does it maintain the other 3 parts of ACID, so that
> the database will never be in a corrupted state after a power loss,
> even though it may be missing some writes that were confirmed?

Jim, I think the answer to your question is already in Max's tests:
the USB drive is completely unreliable and can easily lead to database
corruption. I'll explain. Max's tests showed that there were
situations where the database and the journal differed not by one
transaction but by several. So if one makes changes to several
database pages (located in different disk sectors) and/or makes
transactions touching several database pages (which produce multi-page
journal files), then these pages can be badly out of sync with each other
(including the pages inside the journal). And this will easily lead to
database corruption.


Pavel

On Mon, Feb 14, 2011 at 8:49 AM, Jim Wilcoxson  wrote:
> On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov  wrote:
>>
>> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson  wrote:
>>
>> > > But I thought about how it would be possible to test this explanation .
>> > I'm
>> > > going to do some tests that works like this. The same id updating, but in
>> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
>> > say
>> > > that it can't commit particular insert and I can see what is the value of
>> > > this insert. After replugging two variants possible:
>> > > - I will find the previous value in the base. If for several dozens tests
>> > > the result is the same, that would mean that the situation is still
>> > needed
>> > > to be explained.
>> > > - I will find some value less than previous to the one sqlite could not
>> > > commit at least in some of the tests (maybe -3, -4 to the failed one). In
>> > > this case the explanation will be confirmed.
>> > >
>> > > How about this? Does it sound reasonable?
>> > >
>> > >
>> > Sounds like a clever test to me!
>> >
>> >
>> >
>> Thanks for supporting :) now the results.
>>
>> I switched off the journal:
>>  PRAGMA journal_mode=OFF;
>> As I described, the only record contained id=1 and the sql query was
>>  UPDATE TestTable SET Id=Id + 1
>> The cord was unplugged in the middle of the 10,000 operations when about
>> 5000 records were updated. The hd was bus-powered external hard drive and I
>> repeated the test several times. No test showed expected value that confirms
>> the Jim's explanation about postponed writing. The actual values are below
>> (the first one is the expected value to be found after reopening, the next
>> one is actual value and the difference is self-explainable)
>>
>> 5094 -> 5086 = -8
>> 5084 -> 5083 = -1
>> 5070 -> 5049 = -21
>> 5082 -> 5069 = -13
>> 5095 -> 5086 = -9
>> 5072 -> 5033 = -39
>> 5118 -> 5053 = -65
>> 5081 -> 5075 = -6
>>
>> So the maximum of non-flushed commits was 65
>>
>> I also made a couple of tests with journal on and see what is the difference
>> between expected, non-restored value, and restored value. One of the
>> results:
>> 5078 (expeced) -> 5077 (non restored) -> 5021 (restored)
>> It seems that for correctly operating os/hardware the numbers should be
>> 5078->5077->5077 or no journal present depending on the moment of
>> unplugging. So this postponing also made possible existing of some prior
>> version of journal file.
>>
>> So, the next question was 'where?'. Is this software or hardware to blame.
>> Richard always mentioned hardware in this context, I decided to check this
>> with another device. This time it was a harddisk box with separated bus and
>> power (the hd was Maxtor IDE 80Gb drive).
>>
>> The first variation was similar to the one with bus-powered, this time I
>> used hard button on the box that is equivalent to unplugging both connection
>> and power. The difference for a single test was really impressive 5355 ->
>> 4445 = -910. And when I calculated numbers for non-interrupted test the
>> drive showed about 300 commits per second.
>>
>> The second variation was just unplugging the cord but keeping the power
>> intact, so if it's drive that caches, it would end its operations
>> completely. This time the results were perfect, for example 4822 -> 4822,
>> and even 5371 -> 5372 = +1 that actually would mean the process was
>> interrupted after all data is written but before sqlite made winsync, os
>> reported failure, but the data was already there.
>>
>> So the sad news about faulty hardware is probably true once again.
>
> Hi Max - this turned out to be a good test you devised!  You know, you
> can never learn *too* much. :)
>
> Your test confirms that the USB drive is saying data is written to the
> drive when it really isn't, which seems bad.  But it got me thinking a
> little more.  Your drive for sure has lost the Durability property of
> ACID: when SQLite comes back from a commit, the data just written may
> or may not  be on the drive, as you proved.
>
> So my question is, does it maintain the other 3 

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Jim Wilcoxson
On Mon, Feb 14, 2011 at 2:33 AM, Max Vlasov  wrote:
>
> On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson  wrote:
>
> > > But I thought about how it would be possible to test this explanation .
> > I'm
> > > going to do some tests that works like this. The same id updating, but in
> > > the middle of 10,000 operation I will unplug the cord, the sqlite will
> > say
> > > that it can't commit particular insert and I can see what is the value of
> > > this insert. After replugging two variants possible:
> > > - I will find the previous value in the base. If for several dozens tests
> > > the result is the same, that would mean that the situation is still
> > needed
> > > to be explained.
> > > - I will find some value less than previous to the one sqlite could not
> > > commit at least in some of the tests (maybe -3, -4 to the failed one). In
> > > this case the explanation will be confirmed.
> > >
> > > How about this? Does it sound reasonable?
> > >
> > >
> > Sounds like a clever test to me!
> >
> >
> >
> Thanks for supporting :) now the results.
>
> I switched off the journal:
>  PRAGMA journal_mode=OFF;
> As I described, the only record contained id=1 and the sql query was
>  UPDATE TestTable SET Id=Id + 1
> The cord was unplugged in the middle of the 10,000 operations when about
> 5000 records were updated. The hd was bus-powered external hard drive and I
> repeated the test several times. No test showed expected value that confirms
> the Jim's explanation about postponed writing. The actual values are below
> (the first one is the expected value to be found after reopening, the next
> one is actual value and the difference is self-explainable)
>
> 5094 -> 5086 = -8
> 5084 -> 5083 = -1
> 5070 -> 5049 = -21
> 5082 -> 5069 = -13
> 5095 -> 5086 = -9
> 5072 -> 5033 = -39
> 5118 -> 5053 = -65
> 5081 -> 5075 = -6
>
> So the maximum of non-flushed commits was 65
>
> I also made a couple of tests with journal on and see what is the difference
> between expected, non-restored value, and restored value. One of the
> results:
> 5078 (expeced) -> 5077 (non restored) -> 5021 (restored)
> It seems that for correctly operating os/hardware the numbers should be
> 5078->5077->5077 or no journal present depending on the moment of
> unplugging. So this postponing also made possible existing of some prior
> version of journal file.
>
> So, the next question was 'where?'. Is this software or hardware to blame.
> Richard always mentioned hardware in this context, I decided to check this
> with another device. This time it was a harddisk box with separated bus and
> power (the hd was Maxtor IDE 80Gb drive).
>
> The first variation was similar to the one with bus-powered, this time I
> used hard button on the box that is equivalent to unplugging both connection
> and power. The difference for a single test was really impressive 5355 ->
> 4445 = -910. And when I calculated numbers for non-interrupted test the
> drive showed about 300 commits per second.
>
> The second variation was just unplugging the cord but keeping the power
> intact, so if it's drive that caches, it would end its operations
> completely. This time the results were perfect, for example 4822 -> 4822,
> and even 5371 -> 5372 = +1 that actually would mean the process was
> interrupted after all data is written but before sqlite made winsync, os
> reported failure, but the data was already there.
>
> So the sad news about faulty hardware is probably true once again.

Hi Max - this turned out to be a good test you devised!  You know, you
can never learn *too* much. :)

Your test confirms that the USB drive is saying data is written to the
drive when it really isn't, which seems bad.  But it got me thinking a
little more.  Your drive for sure has lost the Durability property of
ACID: when SQLite comes back from a commit, the data just written may
or may not  be on the drive, as you proved.

So my question is, does it maintain the other 3 parts of ACID, so that
the database will never be in a corrupted state after a power loss,
even though it may be missing some writes that were confirmed?  There
are many use cases where a drive like yours would be usable by SQLite:
it would be as if the async vfs thing was being used.  But if using
this drive could lead to a corrupted database, the drive isn't safe to
use at all with SQLite, IMO.

It would be interesting to know the difference, and even better, if
SQLite or an SQLite app could somehow run a test on a drive to figure
out whether the drive is:

a) perfectly safe: every commit is on the metal
b) somewhat safe: the last n commits may be missing, but the db will
never be corrupted after a power loss
c) not safe: the db may be corrupted on power loss

I think the difference between a) and b) has to do with write
ordering.  If you issue the write sequence:

a1 a2 a3 sync
b1 b2 sync
c1 c2 c3 sync

then (I think) as long as all a's are written before b's and b's are
written before c's, 

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Black, Michael (IS)
Also...here's another part of the benchmark, which shows triggers aren't as evil 
as I thought.  The trigger version in this example ran at about 2X the speed of 
the manual update.

F:\>d:batch 50 1000 12
using wal mode
using update trigger
Sqlite Version: 3.7.5
Inserting 50 rows using a bulk of 1000
commits per second: 71428.6
F:\>d:batch 50 1000 20
using wal mode
using manual update after insert
Sqlite Version: 3.7.5
Inserting 50 rows using a bulk of 1000
commits per second: 38461.5
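
For reference, the trigger in question is the one the benchmark program (in the
next message) installs with flag 8. A minimal Python sketch of the same schema
and trigger, just to show the effect (not the benchmark itself):

import sqlite3

con = sqlite3.connect(':memory:')
con.execute('create table t (i integer)')
con.execute('create table t2 (n integer)')
con.execute('insert into t2 values(0)')

# Same trigger the C benchmark creates when flag 8 is set:
con.executescript('CREATE TRIGGER XX AFTER INSERT ON t FOR EACH ROW BEGIN '
                  'UPDATE t2 SET n = NEW.i+1; END;')

con.execute('insert into t values(?)', (41,))
con.commit()
print con.execute('select n from t2').fetchone()[0]   # prints 42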

Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Black, Michael (IS)
SOLVED!!

Marcus Grimm and I went back and forth a number of times trying to figure out why 
my benchmarks were so much faster than his.

Found it...

My SATA RAID setup had "Enable Advanced Performance" on by default (I had never 
turned it on myself).
My secondary tests on an IDE drive showed performance similar to Marcus's.  Then 
I turned on the performance mode for that drive as well and got a 10X performance 
improvement (advanced write caching... susceptible to power loss... a UPS is needed 
if you want reliability).  "Enable write caching" was already turned on.

To check your setup:
My Computer / Properties on drive / Hardware / Properties on drive / Policies -- 
and check "Enable advanced performance".

Here's the program we're benchmarking with now.  It now compiles on Unix and on 
Windows with VS 2008/2010 and tdm-gcc 4.5.1.

Some results on a Western Digital 7200RPM IDE drive with advanced performance 
turned on
F:\>d:batch 1000 1 0
Sqlite Version: 3.7.5
Inserting 1000 rows using a bulk of 1
commits per second: 166.7
F:\>d:batch 1000 1 4
using wal mode
Sqlite Version: 3.7.5
Inserting 1000 rows using a bulk of 1
commits per second: 1000.0
F:\>d:batch 50 1000 0
Sqlite Version: 3.7.5
Inserting 50 rows using a bulk of 1000
commits per second: 71428.6
F:\>d:batch 50 1000 4
using wal mode
Sqlite Version: 3.7.5
Inserting 50 rows using a bulk of 1000
commits per second: 16.7
F:\>d:batch 50 1000 5  << note that in-memory is less than 2X disk-based 
now.
using mode: :memory:
using wal mode
Sqlite Version: 3.7.5
Inserting 50 rows using a bulk of 1000
commits per second: 25.0

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#include <sys/types.h>
#include <sys/timeb.h>
#else
#include <sys/time.h>
#endif
#include <time.h>
#include "sqlite3.h"
double elapsed()
{
#ifdef _WIN32
struct _timeb timebuffer;
_ftime( &timebuffer );
return( (double)timebuffer.time + timebuffer.millitm / 1000.0);
#else
double t1;
 struct timeval tv;
 static long base_seconds;
 gettimeofday(&tv,NULL);
 if (base_seconds==0) base_seconds=tv.tv_sec;
 return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
#endif
}
int AddTrigger(sqlite3 *db)
{
char SqlTxt[256];
int rc;
strcpy(SqlTxt,"CREATE TRIGGER XX AFTER INSERT ON t FOR EACH ROW BEGIN ");
strcat(SqlTxt,"UPDATE t2 SET n = NEW.i+1;");
strcat(SqlTxt,"END;");
rc = sqlite3_exec(db,SqlTxt,NULL,NULL,NULL);
if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
}
return(0);
}
int main(int argc, char *argv[])
{
  sqlite3 *db;
  sqlite3_stmt *stmt=NULL;
  int rc;
  int n=0;
  int nrec=0;
  int interval=0;
  int flags=0;
  double t1;
  char   SqlTxt[256];
  if (argc != 4) {
   fprintf(stderr,"Usage: %s nrecords commit_interval flags\n",argv[0]);
   fprintf(stderr,"Flag 0 = simple table\n");
   fprintf(stderr,"Flag 1 = in-memory database\n");
   fprintf(stderr,"Flag 2 = add index\n");
   fprintf(stderr,"Flag 4 = WAL mode\n");
   fprintf(stderr,"Flag 8 = Add an update trigger per insert\n");
   fprintf(stderr,"Flag 16= Add a manual update per insert\n");
   fprintf(stderr,"Flag 32 = Synchronous=Off\n");
   fprintf(stderr,"Add flags to combine features\n");
   exit(-1);
  }
  nrec=atoi(argv[1]);
  interval=atoi(argv[2]);
  flags=atoi(argv[3]);
  if (flags & 1)
  {
   puts("using mode: :memory:");
   rc=sqlite3_open(":memory:",);
  }
  else {
   remove("batch.db");
   rc=sqlite3_open("batch.db",);
  }
  if (rc != SQLITE_OK) {
   puts(sqlite3_errmsg(db));
  }
  rc=sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL);
  if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
  }
  rc=sqlite3_exec(db,"create table t2 (n integer)",NULL,NULL,NULL);
  if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
  }
  rc=sqlite3_exec(db,"insert into t2 values(0)",NULL,NULL,NULL);
  if (rc != SQLITE_OK) {
puts(sqlite3_errmsg(db));
  }
  if (flags & 32) {
   rc=sqlite3_exec(db,"pragma synchronous=OFF",NULL,NULL,NULL);
   if (rc != SQLITE_OK) {
   puts(sqlite3_errmsg(db));
   }
   puts("using pragma synchronous=OFF");
  }
  if (flags & 2) {
   rc=sqlite3_exec(db,"create index tx on t (i)",NULL,NULL,NULL);
   if (rc != SQLITE_OK) {
  puts(sqlite3_errmsg(db));
   }
puts("using index on t(i)");
  }
  if (flags & 4) {
   rc=sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
   if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
   }
   puts("using wal mode");
  }
  if (flags & 8) {
   AddTrigger(db);
   puts("using update trigger");
/**  if (interval != 1) {
fprintf(stderr,"Can't do trigger and begin/commit together\n");
exit(-1);
   }
**/
  }
  if( flags & 16 )
 puts("using manual update after insert");
  printf("Sqlite Version: %s\n", sqlite3_version);
  printf("Inserting %d rows using a bulk of %d\n", nrec, interval);
  sqlite3_prepare_v2(db,"insert into t values(?)",-1,&stmt,NULL);
  t1=elapsed();
  if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
  while(n < nrec)
  {
   ++n;
   if (interval != 1 && (n% interval)==0) {
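(The listing is cut off at this point in the archive.  A rough sketch of how
the loop and the rest of main() plausibly finish, pieced together from the
earlier revision of this program quoted further down the thread, is shown
below; the flag-16 "manual update" statement is an assumption modeled on the
trigger above, and the printf format is inferred from the "commits per
second:" output shown earlier.)

    sqlite3_exec(db,"commit",NULL,NULL,NULL);
    sqlite3_exec(db,"begin",NULL,NULL,NULL);
   }
   sqlite3_bind_int(stmt,1,n);
   rc = sqlite3_step(stmt);
   if (rc != SQLITE_DONE) {
    puts(sqlite3_errmsg(db));
   }
   sqlite3_reset(stmt);
   if (flags & 16) {
    /* assumed: manual UPDATE mirroring what the trigger does */
    sprintf(SqlTxt,"UPDATE t2 SET n = %d+1;",n);
    sqlite3_exec(db,SqlTxt,NULL,NULL,NULL);
   }
  }
  sqlite3_exec(db,"commit",NULL,NULL,NULL);
  printf("commits per second: %.1f\n",nrec/(elapsed()-t1));
  sqlite3_close(db);
}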

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Simon Slavin

On 14 Feb 2011, at 10:30am, Max Vlasov wrote:

> On Mon, Feb 14, 2011 at 12:05 PM, Simon Slavin  wrote:
> 
>> Can you expand upon your conclusion ?  I like your test, and am interested
>> to know precisely where you think the lag is happening.
> 
> Simon, the conclusion came from the assumption that if the lag is at the OS
> level then unplugging usb cord from powered hd box should also lead to bad
> results, but it doesn't. I.e. only in case of keeping power for unplugged hd
> box, the last commit reported by sqlite as successful is actually written.
> So looks like windows takes very seriously the option "optimize for quick
> removal" for removable drives. But both examples show that the drive itself
> doesn't obey this option. On the other side SATA/PATA drives connected
> directly to the motherboard have option "enable writing cache" that is
> explained like hardware-only option, maybe in this case one can actually
> affect how the drive is responding, but in contrary to usb tests, these ones
> are harder to make.

Okay.  Nice piece of work.

simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Max Vlasov
On Mon, Feb 14, 2011 at 12:05 PM, Simon Slavin  wrote:

> > The second variation was just unplugging the cord but keeping the power
> > intact, so if it's drive that caches, it would end its operations
> > completely. This time the results were perfect, for example 4822 -> 4822,
> > and even 5371 -> 5372 = +1 that actually would mean the process was
> > interrupted after all data is written but before sqlite made winsync, os
> > reported failure, but the data was already there.
> >
> > So the sad news about faulty hardware is probably true once again.
>
> Can you expand upon your conclusion ?  I like your test, and am interested
> to know precisely where you think the lag is happening.
>
>
Simon, the conclusion came from the assumption that if the lag were at the OS
level, then unplugging the USB cord from a powered hd box should also lead to
bad results, but it doesn't. In other words, only when the box keeps its power
after the cord is unplugged is the last commit that sqlite reported as
successful actually written. So it looks like Windows takes the "optimize for
quick removal" option for removable drives very seriously, but both examples
show that the drive itself doesn't obey it. On the other hand, SATA/PATA
drives connected directly to the motherboard have an "enable write caching"
option that is described as hardware-only, so maybe in that case one can
actually affect how the drive responds; but unlike the usb tests, those are
harder to make.

I also tried to find any jumper other than Master/Slave/etc. on the hd, but
there are no others. There is a little circuit board in the box itself, but it
lacks any jumpers.

Max


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-14 Thread Simon Slavin

On 14 Feb 2011, at 7:33am, Max Vlasov wrote:

> The first variation was similar to the one with bus-powered, this time I
> used hard button on the box that is equivalent to unplugging both connection
> and power. The difference for a single test was really impressive 5355 ->
> 4445 = -910. And when I calculated numbers for non-interrupted test the
> drive showed about 300 commits per second.
> 
> The second variation was just unplugging the cord but keeping the power
> intact, so if it's drive that caches, it would end its operations
> completely. This time the results were perfect, for example 4822 -> 4822,
> and even 5371 -> 5372 = +1 that actually would mean the process was
> interrupted after all data is written but before sqlite made winsync, os
> reported failure, but the data was already there.
> 
> So the sad news about faulty hardware is probably true once again.

Can you expand upon your conclusion ?  I like your test, and am interested to 
know precisely where you think the lag is happening.

You may be able to find either low-level OS settings or drive jumper positions 
which will change the results.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Max Vlasov
On Sun, Feb 13, 2011 at 8:53 PM, Jim Wilcoxson  wrote:

> > But I thought about how it would be possible to test this explanation .
> I'm
> > going to do some tests that works like this. The same id updating, but in
> > the middle of 10,000 operation I will unplug the cord, the sqlite will
> say
> > that it can't commit particular insert and I can see what is the value of
> > this insert. After replugging two variants possible:
> > - I will find the previous value in the base. If for several dozens tests
> > the result is the same, that would mean that the situation is still
> needed
> > to be explained.
> > - I will find some value less than previous to the one sqlite could not
> > commit at least in some of the tests (maybe -3, -4 to the failed one). In
> > this case the explanation will be confirmed.
> >
> > How about this? Does it sound reasonable?
> >
> >
> Sounds like a clever test to me!
>
>
>
Thanks for the support :) Now for the results.

I switched off the journal:
  PRAGMA journal_mode=OFF;
As I described, the only record contained id=1 and the sql query was
  UPDATE TestTable SET Id=Id + 1
The cord was unplugged in the middle of the 10,000 operations, when about
5000 updates had been applied. The hd was a bus-powered external hard drive and
I repeated the test several times. No test showed the expected value, which
confirms Jim's explanation about postponed writes. The actual values are below
(the first number is the value expected after reopening, the second is the
actual value, and the difference speaks for itself):

5094 -> 5086 = -8
5084 -> 5083 = -1
5070 -> 5049 = -21
5082 -> 5069 = -13
5095 -> 5086 = -9
5072 -> 5033 = -39
5118 -> 5053 = -65
5081 -> 5075 = -6

So the maximum number of unflushed commits was 65.

I also made a couple of tests with the journal on, to see the difference
between the expected, non-restored, and restored values. One of the
results:
5078 (expected) -> 5077 (non-restored) -> 5021 (restored)
For a correctly operating OS/hardware the numbers should be
5078->5077->5077, or no journal present at all, depending on the moment of
unplugging. So this postponing also allowed a stale, earlier version of the
journal file to exist.

So the next question was 'where?': is software or hardware to blame?
Richard has always mentioned hardware in this context, so I decided to check
this with another device. This time it was a hard-disk box with separate bus
and power connections (the hd was a Maxtor IDE 80Gb drive).

The first variation was similar to the bus-powered one; this time I used the
hard power button on the box, which is equivalent to unplugging both the
connection and the power. The difference for a single test was really
impressive: 5355 -> 4445 = -910. And when I calculated the numbers for a
non-interrupted test, the drive showed about 300 commits per second.

The second variation was just unplugging the cord but keeping the power
intact, so if it is the drive that caches, it could finish its queued
operations completely. This time the results were perfect, for example
4822 -> 4822, and even 5371 -> 5372 = +1, which would mean the process was
interrupted after all the data was written but before sqlite's winSync
completed: the OS reported a failure, but the data was already there.

So the sad news about faulty hardware is probably true once again.

Max Vlasov


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Simon Slavin

On 13 Feb 2011, at 5:53pm, Jim Wilcoxson wrote:

> I also found this page, used to force a Linux system crash:
> 
> http://evuraan.blogspot.com/2005/06/how-to-force-paniccrash-in-linux.html
> 
> I seem to remember a post that SQLite commit/sync is tested with the kill
> command, but it seems like actually crashing the machine (which is done
> without a sync first) might be a better test.

You can do extensive experimentation and find out exactly how one setup acts.  
One particular hard disk, motherboard, operating system, and (disk) device 
driver.  But change any variable, even upgrade to a later version of the OS or 
device driver, and you can get different behaviour.  Same thing if you move the 
jumpers on the hard disk or upgrade its firmware.  So if you test your 
programming on your own computer but send your application to customers you 
don't know how it'll perform.

At the level you're worrying about, you're learning too much about a very 
narrow subject.  If you need a general solution you have to write for the 
specs, not one particular setup.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jim Wilcoxson
On Sun, Feb 13, 2011 at 11:55 AM, Max Vlasov  wrote:

> On Sun, Feb 13, 2011 at 5:31 PM, Jim Wilcoxson  wrote:
>
> > On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov 
> wrote:
> > > So the final results:
> > > - the db was always ok and contains the correct value (id=10001 for
> > initial
> > > 1).
> > > - the speed was about 227 commits per second so very close to my system
> > > flush test  (253)
> > >
> > > So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even
> if
> > > it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> > > without
> > > prior notice), it's still twice as much as 7200/60=120.
> > >
> >
> > My hypothesis to explain the behavior you are seeing is that it takes 4ms
> > (1/253) for your OS to flush its buffers to the hard drive cache, issue
> the
> > sync to the drive, and the hard drive to acknowledge the sync.  When it
> is
> > convenient, the drive will write this data to the disk.  The sync command
> > may make this more urgent than usual, which would explain why I/O slows
> > down
> > if sync is used.  Or, the drive could simply have a sync timer: the first
> > sync command starts the timer, and when 4ms have passed, the drive dumps
> > its
> > cache to the platters.
> >
>
>
> Jim, I see your point,  maybe really for removable media on Windows sync is
> still lazy, but much less "lazy" then general cached operations. Another
> version is that removable hds can report that they wrote some data while
> actually work as your described postonning it a little.
>
> But I thought about how it would be possible to test this explanation . I'm
> going to do some tests that works like this. The same id updating, but in
> the middle of 10,000 operation I will unplug the cord, the sqlite will say
> that it can't commit particular insert and I can see what is the value of
> this insert. After replugging two variants possible:
> - I will find the previous value in the base. If for several dozens tests
> the result is the same, that would mean that the situation is still needed
> to be explained.
> - I will find some value less than previous to the one sqlite could not
> commit at least in some of the tests (maybe -3, -4 to the failed one). In
> this case the explanation will be confirmed.
>
> How about this? Does it sound reasonable?
>
>
Sounds like a clever test to me!

I also found this page, used to force a Linux system crash:

http://evuraan.blogspot.com/2005/06/how-to-force-paniccrash-in-linux.html

I seem to remember a post that SQLite commit/sync is tested with the kill
command, but it seems like actually crashing the machine (which is done
without a sync first) might be a better test.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jean-Christophe Deschamps

>But what I postulate is that you can't physically write *the same* record
>over and over more than 90 times per second on a 5400 rpm drive, 
>unless the
>drive, OS, or filesystem implements something like wear-leveling, 
>where the
>physical location of sectors is constantly changing.

It's still possible if the firmware lies about physical sync and, 
having noticed that you repeatedly overwrite the same LBA, just 
postpones writes in its queue and only actually flushes the last one.

By precisely monitoring input power levels, the drive firmware knows 
how much it can write before its internal power reserves drop below 
unreliable levels and it also knows what's left in its write queue, 
accounting for every internal parameter.  So it's still possible that 
some drives give you a much higher rate than our simple-minded theory 
predicts.

Given the low cost of computing power now embedded in most devices, 
such behavior is quite possible.



Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Max Vlasov
On Sun, Feb 13, 2011 at 5:31 PM, Jim Wilcoxson  wrote:

> On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov  wrote:
> > So the final results:
> > - the db was always ok and contains the correct value (id=10001 for
> initial
> > 1).
> > - the speed was about 227 commits per second so very close to my system
> > flush test  (253)
> >
> > So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
> > it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> > without
> > prior notice), it's still twice as much as 7200/60=120.
> >
>
> My hypothesis to explain the behavior you are seeing is that it takes 4ms
> (1/253) for your OS to flush its buffers to the hard drive cache, issue the
> sync to the drive, and the hard drive to acknowledge the sync.  When it is
> convenient, the drive will write this data to the disk.  The sync command
> may make this more urgent than usual, which would explain why I/O slows
> down
> if sync is used.  Or, the drive could simply have a sync timer: the first
> sync command starts the timer, and when 4ms have passed, the drive dumps
> its
> cache to the platters.
>


Jim, I see your point. Maybe for removable media on Windows sync really is
still lazy, but much less "lazy" than general cached operations. Another
possibility is that removable hds report that they wrote the data while
actually working as you described, postponing it a little.

But I thought about how it would be possible to test this explanation. I'm
going to do some tests that work like this: the same id updating, but in the
middle of the 10,000 operations I will unplug the cord; sqlite will say that
it can't commit a particular update, and I can see what the value of that
update is. After replugging, two outcomes are possible:
- I will find the previous value in the database. If the result is the same
for several dozen tests, that would mean the situation still needs to be
explained.
- I will find some value lower than the one preceding the update sqlite could
not commit, at least in some of the tests (maybe -3 or -4 relative to the
failed one). In this case the explanation will be confirmed.

How about this? Does it sound reasonable?

Max Vlasov


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jim Wilcoxson
On Sun, Feb 13, 2011 at 7:07 AM, Jean-Christophe Deschamps  wrote:

>
> >So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
> >it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> >without
> >prior notice), it's still twice as much as 7200/60=120.
>
> 5400/60, 7200/60 ... those values rely on the assumption that
> successive LBAs are mapped to successive physical sectors (512 or 4K,
> whatever) on the same face of the same plater.  Is it obvious that all
> today's typical stock drives actually implement only that simple old
> scheme and not an untold mix of various interleaving techniques?
>


This is true for the case of writing multiple records or multiple sectors.
For example, if you have a drive with 5000 sectors per track and you write
sector 1 of the track with a hard sync, you may have time to write sector
2500 with a hard sync in the same revolution.  Or maybe you can write every
500 sectors with a hard sync in the same revolution, giving you a commit
rate of 10 per revolution or 900 commits per second on a 5400 rpm drive.

But what I postulate is that you can't physically write *the same* record
over and over more than 90 times per second on a 5400 rpm drive, unless the
drive, OS, or filesystem implements something like wear-leveling, where the
physical location of sectors is constantly changing.

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jim Wilcoxson
On Sun, Feb 13, 2011 at 1:15 AM, Max Vlasov  wrote:

> On Sat, Feb 12, 2011 at 7:11 PM, Jim Wilcoxson  wrote:
>
> >
> >
> > Unless I'm missing something, SQLite has to update the first page of the
> > database on every commit, to update the change counter.  Assuming you are
> > using rotating media, that record can only be updated 120 times per
> second,
> > maximum, on a 7200RPM drive.
> >
> >
>
> Hmm, I think there's something wrong with this theory of a single flush per
> round
>
> Yesterday I posted here results about from tests that ended with something
> like 50 commits per second in case of single flush (journal_mode=off). I
> decided to run a simple program that writes a byte to a file and does
> windows FlushFileBuffers. It reported 53 writes per second. This was
> expected results if this theory is right. But when I connected an external
> drive (WD1200E1MS, bus-powered, it seems it is even 5400 rpm), this number
> jumped to 253. I though "Ok, something, maybe os maybe drive tried to fool
> me, let's do this other way". I did the following:
>
> -  Checked the removable drive "Quick removaI" is on so no write cache on
> os
> side
> -  opened the same test db on this drive, journal_mode=off for best
> scenario
> -  Executed 10,000 updates to a single record made as separated
> transactions
> ( UPDATE TestTable SET Id=Id+1 )
> - When the program reported ok, I quickly unplugged the usb cord. The
> device
> is bus-powered so had no power to complete any postponed operation. The
> total time was about 50 seconds and to do the real quick unplug I prepared
> my hands and the delay was no more than 200 ms (subjectively).
> - Replug the drive again and opened the db in the tool to see was there any
> corruption or other lost of data.
>
> So the final results:
> - the db was always ok and contains the correct value (id=10001 for initial
> 1).
> - the speed was about 227 commits per second so very close to my system
> flush test  (253)
>
> So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
> it's 7200 (manufacturers sometimes upgrade drives inside portable hd
> without
> prior notice), it's still twice as much as 7200/60=120.
>

If we go with the theory that a single record/sector can only be written
once per revolution, that means writes can occur 90 times per second, or
about every 11ms on your 5400 rpm drive.  So for this test to show
corruption, you'd have to unplug the drive within 11ms of the last commit,
which I think is practically impossible.

My hypothesis to explain the behavior you are seeing is that it takes 4ms
(1/253) for your OS to flush its buffers to the hard drive cache, issue the
sync to the drive, and the hard drive to acknowledge the sync.  When it is
convenient, the drive will write this data to the disk.  The sync command
may make this more urgent than usual, which would explain why I/O slows down
if sync is used.  Or, the drive could simply have a sync timer: the first
sync command starts the timer, and when 4ms have passed, the drive dumps its
cache to the platters.

Jim


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-13 Thread Jean-Christophe Deschamps

>So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
>it's 7200 (manufacturers sometimes upgrade drives inside portable hd 
>without
>prior notice), it's still twice as much as 7200/60=120.

5400/60, 7200/60 ... those values rely on the assumption that 
successive LBAs are mapped to successive physical sectors (512 or 4K, 
whatever) on the same face of the same platter.  Is it obvious that all 
today's typical stock drives actually implement only that simple old 
scheme and not an untold mix of various interleaving techniques?

Yet that wouldn't explain a two- or threefold difference between 
internal SATA and USB.  That sounds surprising.



Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Max Vlasov
On Sat, Feb 12, 2011 at 7:11 PM, Jim Wilcoxson  wrote:

>
>
> Unless I'm missing something, SQLite has to update the first page of the
> database on every commit, to update the change counter.  Assuming you are
> using rotating media, that record can only be updated 120 times per second,
> maximum, on a 7200RPM drive.
>
>

Hmm, I think there's something wrong with this theory of a single flush per
round

Yesterday I posted results here from tests that ended at something like 50
commits per second in the single-flush case (journal_mode=off). I decided to
run a simple program that writes a byte to a file and calls the Windows
FlushFileBuffers (a rough sketch of such a test appears after the results
below). It reported 53 writes per second, the expected result if this theory
is right. But when I connected an external drive (WD1200E1MS, bus-powered, it
seems it is even 5400 rpm), the number jumped to 253. I thought, "Ok,
something, maybe the OS, maybe the drive, is trying to fool me; let's do this
another way". I did the following:

- Checked that the removable drive's "Quick removal" option is on, so there is
no write cache on the OS side
- Opened the same test db on this drive, journal_mode=off for the best-case
scenario
- Executed 10,000 updates to a single record, each as a separate transaction
( UPDATE TestTable SET Id=Id+1 )
- When the program reported ok, I quickly unplugged the usb cord. The device
is bus-powered, so it had no power to complete any postponed operation. The
total time was about 50 seconds, and to make the unplug really quick I had my
hands ready; the delay was no more than 200 ms (subjectively).
- Replugged the drive and opened the db in the tool to see whether there was
any corruption or other loss of data.

So the final results:
- the db was always ok and contains the correct value (id=10001 for initial
1).
- the speed was about 227 commits per second so very close to my system
flush test  (253)

So if the drive is 5400 rpm, 227 is much more than 5400/60=90 and even if
it's 7200 (manufacturers sometimes upgrade drives inside portable hd without
prior notice), it's still twice as much as 7200/60=120.
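For reference, a minimal sketch of that kind of one-byte write-and-flush test
(this is not Max's actual program; the target path and loop count are just
placeholders) could look like this:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    const char *path = "F:\\flushtest.bin";   /* placeholder: a file on the drive under test */
    HANDLE h = CreateFileA(path, GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD written, t0, ms;
    int i, loops = 1000;

    if (h == INVALID_HANDLE_VALUE) { fprintf(stderr, "open failed\n"); return 1; }
    t0 = GetTickCount();
    for (i = 0; i < loops; i++) {
        char b = (char)i;
        SetFilePointer(h, 0, NULL, FILE_BEGIN);   /* keep overwriting the same byte */
        WriteFile(h, &b, 1, &written, NULL);
        FlushFileBuffers(h);                      /* ask the OS/drive for a real flush */
    }
    ms = GetTickCount() - t0;
    printf("%.1f flushed writes per second\n", loops / (ms / 1000.0));
    CloseHandle(h);
    return 0;
}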

Max Vlasov


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Max Vlasov
On Sun, Feb 13, 2011 at 12:27 AM, Marcus Grimm wrote:


>  So my brain isn't that lasy, we count 3 syncs :-)
>
> When you turn journaling to off you will most likely
> see less syncs, probably 2 in your case. So that is all
> in line..
>
> Marcus
>
>
Marcus, you're right about general journaling; I also found 3 flushes in the
Process Monitor log, and the number of commits per second is about 12-15. And
it looks like journal_mode=off requires only one flush.

Interestingly, WAL also requires only one sync (a single flush in the log),
and the speed of about 40-50 commits confirms this. So there is another reason
to use WAL. The page http://www.sqlite.org/draft/wal.html already says
"4. WAL uses many fewer fsync() operations...", though it attributes this more
to consistency issues; these tests also contribute to "1. WAL is significantly
faster in most scenarios". So the hint about WAL could read: if the number of
commits per second you need may go up to 30 or so, one should use WAL.
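
For anyone who wants to reproduce this from C, switching a connection to WAL
is just the pragma already used in the benchmark program; a minimal sketch
(the filename is a placeholder) that also prints the journal mode actually in
effect:

#include <stdio.h>
#include "sqlite3.h"

static int print_mode(void *unused, int ncol, char **val, char **col)
{
    (void)unused; (void)ncol; (void)col;
    printf("journal_mode is now: %s\n", val[0] ? val[0] : "?");
    return 0;
}

int main(void)
{
    sqlite3 *db;
    if (sqlite3_open("test.db", &db) != SQLITE_OK) {
        fprintf(stderr, "%s\n", sqlite3_errmsg(db));
        return 1;
    }
    /* PRAGMA journal_mode returns the mode that is actually in effect */
    if (sqlite3_exec(db, "PRAGMA journal_mode=WAL", print_mode, NULL, NULL) != SQLITE_OK)
        fprintf(stderr, "%s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 0;
}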

Max Vlasov


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Thomas Fjellstrom
On February 12, 2011, Black, Michael (IS) wrote:
> Are you wrapping your data dump into a "BEGIN" "COMMIT"?

Yup.

> Also...you mention using a select...I assume you are testing first to see
> if you need to update or insert?

The select is not used when dumping data. The in memory data structure knows 
if an entry hasn't been saved yet (id == 0), calls INSERT if it hasn't and 
UPDATE if it has.

> You may want to look at the REPLACE clause and see if you can use that.

That's an option.

[snip]

After working on my error handling a bit last night, the problem seems to be 
gone now. I tell the program to sync its data, and it takes 2-5 hundredths of a 
second, rather than several seconds.

At the moment I can't find the exact change that fixed it, but performance is 
more in line with what I'd expect with `PRAGMA synchronous=off` set.

> 
> Michael D. Black
> Senior Scientist
> NG Information Systems
> Advanced Analytics Directorate
> 



-- 
Thomas Fjellstrom
tfjellst...@strangesoft.net


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Marcus Grimm
>>
>> Interesting, I did a test on a 7200 file and the best I could do was 50
>> commits per second (a simple base/table with only id, journalling off
>> and
>> no
>> extra code since the tool I use has "a repeated query" option with
>> accurate
>> timing). You mentioned 3 syncs per commit, but I tried to look at the
>> log
>> of
>> Process Monitor for the tool process and I saw only one entry with
>> 'FlushBuffersFile' that is as I suppose was a mirror name for
>> FlushFileBuffers in winSync of sqlite so probably 3 syncs was a too big
>> an
>> estimate, wasn't it? In this case 50 commits per second looks reasonable
>> limit
>
> I found the number of 3 syncs some time ago, I can't proofe it
> right now. As far as I remeber I simply activated the debug logout put in
> sqlite and have seen these syncs. sqlite needs to sync the journal file
> once it has copied the pages from the DB, then it needs to sync the
> DB file itselve after written pages to the main DB, then it probably needs
> to clear and sync the journal file to indicate that the operation is
> finished. This sums up to 3 -- but I'm guessing here, maybe DRH is willing
> to confirm... :-)
>
> Note that this handling is changed via PRAGMA synchronous;
>
> I'm talking about windows, not sure if that applies on unix or mac.
>
> Marcus
>

Okay, to make sure I'm not talking nonsense, I repeated the
debug log. The following output is generated by sqlite in
debug mode after an "INSERT INTO T VALUES(7);":

INSERT COMMAND...
LOCK 4072 1 was 0(0)
READ 4072 lock=1
LOCK 4072 2 was 1(0)
OPEN 4084 C:\batch.db-journal 0xc000 ok
WRITE 4084 lock=0
WRITE 4084 lock=0
WRITE 4084 lock=0
WRITE 4084 lock=0
LOCK 4072 4 was 2(0)
unreadlock = 1
WRITE 4084 lock=0
WRITE 4084 lock=0
WRITE 4084 lock=0
READ 4084 lock=0
SYNC 4084 lock=0
WRITE 4084 lock=0
SYNC 4084 lock=0
WRITE 4072 lock=4
WRITE 4072 lock=4
SYNC 4072 lock=4
CLOSE 4084
CLOSE 4084 ok
DELETE "C:\batch.db-journal" ok
UNLOCK 4072 to 1 was 4(0)
UNLOCK 4072 to 0 was 1(0)
INSERT COMMAND END.

So my brain isn't that lazy; we count 3 syncs :-)

When you turn journaling off you will most likely
see fewer syncs, probably 2 in your case. So that is all
in line.

Marcus


>
>>
>> Max Vlasov
>>
>
>
>




Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Marcus Grimm
> On Sat, Feb 12, 2011 at 9:48 PM, Marcus Grimm
> wrote:
>
>> > I should've realized it wasn't running this fast but the small 5000
>> record
>> > size got me.
>> > Test it yourself.
>> > I do have a 7200RPM drive.  My 261.4 numer is still 2+X your
>> theoretical.
>>
>> I don't want to be a smart-arse, but I still think your 261.4 is to
>> fast.
>> On a 7200 RPM drive one will have 125 chances/sec to see a sector to be
>> written. Since sqlite, under normal journal mode, will need 3 syncs
>> per commit as far as I can recall, the maximum number drops further
>> down to 41 commit/sec. This is theoretical, in reality one will see
>> maybe 20 commits/sec. Not sure if a disc write-cache will interfere
>> with that caluclation, though.
>> Am I wrong ? :-)
>>
>
>
> Interesting, I did a test on a 7200 file and the best I could do was 50
> commits per second (a simple base/table with only id, journalling off and
> no
> extra code since the tool I use has "a repeated query" option with
> accurate
> timing). You mentioned 3 syncs per commit, but I tried to look at the log
> of
> Process Monitor for the tool process and I saw only one entry with
> 'FlushBuffersFile' that is as I suppose was a mirror name for
> FlushFileBuffers in winSync of sqlite so probably 3 syncs was a too big an
> estimate, wasn't it? In this case 50 commits per second looks reasonable
> limit

I found the number of 3 syncs some time ago; I can't prove it
right now. As far as I remember I simply activated the debug log output in
sqlite and saw these syncs. sqlite needs to sync the journal file
once it has copied the pages from the DB, then it needs to sync the
DB file itself after writing the pages to the main DB, then it probably needs
to clear and sync the journal file to indicate that the operation is
finished. This sums up to 3 -- but I'm guessing here, maybe DRH is willing
to confirm... :-)

Note that this handling is changed via PRAGMA synchronous;

I'm talking about Windows; not sure if that applies on unix or mac.

Marcus


>
> Max Vlasov
>




Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Marcus Grimm
> OK...I added your trigger example as option 8.  And I had pasted the wrong
> version in my last email.  My timings were correct.  Your example also did
> sql_exec instead of using prepare so it will run slower.

Yes, but that should be marginal.

When I sent my code the trigger version wasn't finished.
The intention is to compare a trigger update vs. a manual
UPDATE statement within your loop.
I was also quite alarmed about the issue recently raised
here about the performance penalty a trigger introduces.
So far my experiments based on your code don't confirm
this: using a trigger is slightly faster than a manual update
if everything runs as implicit transactions only - makes sense.
The trigger is slightly slower than the manual UPDATE
when using commit bulks - here the overhead of the trigger
appears slightly higher, but the difference
is maybe 10% and not 6 times slower as reported. Strange...

Anyway, I would like to add my trigger test to your version
but I can't compile it due to the missing 'gettimeofday' function.
Are you using MSVS? Can you advise me where you got
gettimeofday from?
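
(One workaround on MSVC, similar to the _ftime-based branch used in another
revision of this benchmark in the thread, is a small substitute timer; a
rough sketch:)

#ifdef _WIN32
#include <sys/types.h>
#include <sys/timeb.h>

/* Returns wall-clock seconds with roughly millisecond resolution. */
double elapsed(void)
{
    struct _timeb tb;
    _ftime(&tb);
    return (double)tb.time + tb.millitm / 1000.0;
}
#endif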

Sqlite does use fsync() to sync the file content before
proceeding with journal/db file updating. Some months ago
I looked into that and found it is done 3 times per
"INSERT". I'm not speaking about sector write performance; it is
related to the file syncs.
I don't see what's wrong with my arguments, please advise.

Thanks for the code by the way - I like the approach of creating
a little test tool so we have a common basis to experiment.

Marcus

> I also made this compilable on Unix too.
>
> On Unix my timing matches the run time and there are the right # of
> records in the database with the right values.  Unix is faster than
> Windows (8-core 2.6Ghz Unix system using SAS drives (I'm not sure of the
> model but I'm quite sure they are 7200RPM).
> Perhaps this is write caching on the disk?
>
> time ./batch 1600 1 0
> 1543.1
> real0m1.042s
> user0m0.046s
> sys 0m0.216s
>
> Something tells me your theory is wrong.  All 1600 records get inserted
> into the database in approx 1 second so it matches independent timing
> quite well.
>
> Here's my results with your trigger.  I'll maintain the timings are
> correct until somebody can prove this wrong.  Your theory is simply
> incomplete and is not taking into account SQLite's behavior.
>
> D:\SQLite>batch 504 1 0 -- simple single insert no batch
> 442.1
> D:\SQLite>batch 504 1 8 -- with trigger insert on 2nd table -- single row
> 307.3
> D:\SQLite>batch 504 1 9 -- trigger + in memory
> 5378.5
> D:\SQLite>batch 504 1 10 -- trigger +  index
> 212.3
> D:\SQLite>batch 504 1 12 -- trigger + WAL mode
> 2482.4
> D:\SQLite>batch 504 1 14 -- trigger + WAL mode + index
> 461.0
> D:\SQLite>batch 504 1 6 -- WAL mode + index
> 4608.0
> D:\SQLite>batch 504 504 6 -- WAL mode + index in one transaction
> 32256.0
> Of course you can't do your trigger (i.e. update) inside a transaction so
> there's some error checking for that now.
> You trigger is a factor of 10 slower.  I think we determined before that
> doing that update directly instead of a trigger would be faster.  I should
> add a method for that too.
>
> I also added a SYNCHRONOUS=OFF option#16
> That speeds it up a bit too.  I didn't see where FULL made much of a
> difference here so I just made OFF the option.
> D:\SQLite>batch 1600 1 0
> 414.6
> D:\SQLite>batch 1600 1 0
> 279.8
> D:\SQLite>batch 1600 1 16
> 602.4
> D:\SQLite>batch 1600 1 16
> 572.1
> D:\SQLite>batch 1600 1600 16
> 102400.0
>
> #include 
> #include 
> #ifdef _WIN32
> #include 
> #include 
> #else
> #include 
> #endif
> #include 
> #include "sqlite3.h"
> double elapsed()
> {
> double t1;
> struct timeval tv;
> static long base_seconds;
> gettimeofday(,NULL);
> if (base_seconds==0) base_seconds=tv.tv_sec;
> return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)100;
> }
> int AddTrigger(sqlite3 *db)
> {
> char SqlTxt[256];
> int rc;
> strcpy(SqlTxt,"CREATE TRIGGER XX AFTER INSERT ON t FOR EACH ROW
> BEGIN ");
> strcat(SqlTxt,"UPDATE t2 SET n = NEW.i+1;");
> strcat(SqlTxt,"END;");
>  rc = sqlite3_exec(db,SqlTxt,NULL,NULL,NULL);
> if (rc != SQLITE_OK) {
> puts(sqlite3_errmsg(db));
> }
> return(0);
> }
> int main(int argc, char *argv[])
> {
>  sqlite3 *db;
>  sqlite3_stmt *stmt=NULL;
>  int rc;
>  int n=0;
>  int nrec=0;
>  int interval=0;
>  int flags=0;
>  double t1;
>  if (argc != 4) {
>   fprintf(stderr,"Usage: %s nrecords commit_interval flags\n",argv[0]);
>   fprintf(stderr,"Flag 0 = simple table\n");
>   fprintf(stderr,"Flag 1 = in-memory database\n");
>   fprintf(stderr,"Flag 2 = add index\n");
>   fprintf(stderr,"Flag 4 = WAL mode\n");
>   fprintf(stderr,"Flag 8 = Add a trigger\n");
>   fprintf(stderr,"Flag 16 = Synchronous=Off\n");
>   fprintf(stderr,"Add flags to combine features\n");
>   exit(-1);
>  }
>  

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Roger Binns

On 02/12/2011 11:54 AM, Simon Slavin wrote:
> Compensating for this behaviour is a big part of what SQLite does in 
> journaling.  

SQLite and other journalling mechanisms depend on an operating system call,
fsync, that flushes their relevant data to the hard disk, and on the hard
drive flushing any buffered writes it has, only returning when everything is
written and would still be present should there be an abrupt power failure.

About 7 years ago this issue was very prominent.  It was discovered that a
fair number of drives lie in response to that command to flush buffers.  Some
operating systems (cough Mac cough) would even lie in response to the system
call fsync.  Making hard drives use write-through instead of write-behind caching helps.

Some interesting related posts for people wondering about durability:

  http://community.livejournal.com/lj_dev/670215.html
  http://blogs.msdn.com/b/oldnewthing/archive/2010/09/09/10059575.aspx
  http://peter-zaitsev.livejournal.com/11177.html
  http://brad.livejournal.com/2116715.html

Roger


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Max Vlasov
On Sat, Feb 12, 2011 at 9:48 PM, Marcus Grimm wrote:

> > I should've realized it wasn't running this fast but the small 5000
> record
> > size got me.
> > Test it yourself.
> > I do have a 7200RPM drive.  My 261.4 numer is still 2+X your theoretical.
>
> I don't want to be a smart-arse, but I still think your 261.4 is to fast.
> On a 7200 RPM drive one will have 125 chances/sec to see a sector to be
> written. Since sqlite, under normal journal mode, will need 3 syncs
> per commit as far as I can recall, the maximum number drops further
> down to 41 commit/sec. This is theoretical, in reality one will see
> maybe 20 commits/sec. Not sure if a disc write-cache will interfere
> with that caluclation, though.
> Am I wrong ? :-)
>


Interesting. I did a test on a 7200 rpm drive and the best I could do was 50
commits per second (a simple database/table with only an id, journalling off
and no extra code, since the tool I use has a "repeated query" option with
accurate timing). You mentioned 3 syncs per commit, but I looked at the
Process Monitor log for the tool's process and I saw only one
'FlushBuffersFile' entry, which I suppose is Process Monitor's name for the
FlushFileBuffers call in sqlite's winSync, so probably 3 syncs was too big an
estimate, wasn't it? In that case 50 commits per second looks like a
reasonable limit.

Max Vlasov


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Black, Michael (IS)
OK...I added your trigger example as option 8.  And I had pasted the wrong 
version in my last email.  My timings were correct.  Your example also did 
sql_exec instead of using prepare so it will run slower.
I also made this compilable on Unix too.

On Unix my timing matches the run time and there are the right # of records in 
the database with the right values.  Unix is faster than Windows (8-core 2.6GHz 
Unix system using SAS drives; I'm not sure of the model but I'm quite sure they 
are 7200RPM).
Perhaps this is write caching on the disk?

time ./batch 1600 1 0
1543.1
real0m1.042s
user0m0.046s
sys 0m0.216s

Something tells me your theory is wrong.  All 1600 records get inserted into 
the database in approx 1 second so it matches independent timing quite well.

Here's my results with your trigger.  I'll maintain the timings are correct 
until somebody can prove this wrong.  Your theory is simply incomplete and is 
not taking into account SQLite's behavior.

D:\SQLite>batch 504 1 0 -- simple single insert no batch
442.1
D:\SQLite>batch 504 1 8 -- with trigger insert on 2nd table -- single row
307.3
D:\SQLite>batch 504 1 9 -- trigger + in memory
5378.5
D:\SQLite>batch 504 1 10 -- trigger +  index
212.3
D:\SQLite>batch 504 1 12 -- trigger + WAL mode
2482.4
D:\SQLite>batch 504 1 14 -- trigger + WAL mode + index
461.0
D:\SQLite>batch 504 1 6 -- WAL mode + index
4608.0
D:\SQLite>batch 504 504 6 -- WAL mode + index in one transaction
32256.0
Of course you can't do your trigger (i.e. update) inside a transaction so 
there's some error checking for that now.
Your trigger is a factor of 10 slower.  I think we determined before that doing 
that update directly instead of a trigger would be faster.  I should add a 
method for that too.

I also added a SYNCHRONOUS=OFF option#16
That speeds it up a bit too.  I didn't see where FULL made much of a difference 
here so I just made OFF the option.
D:\SQLite>batch 1600 1 0
414.6
D:\SQLite>batch 1600 1 0
279.8
D:\SQLite>batch 1600 1 16
602.4
D:\SQLite>batch 1600 1 16
572.1
D:\SQLite>batch 1600 1600 16
102400.0

/* The original header names (and the contents of the #ifdef block) were
   stripped by the list archive; these are plausible reconstructions for the
   functions the program uses. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#ifdef _WIN32
#include <sys/time.h>   /* gettimeofday: available with MinGW/tdm-gcc */
#else
#include <sys/time.h>
#endif
#include <time.h>
#include "sqlite3.h"
double elapsed()
{
 struct timeval tv;
 static long base_seconds;
 gettimeofday(&tv,NULL);
 if (base_seconds==0) base_seconds=tv.tv_sec;
 return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
}
int AddTrigger(sqlite3 *db)
{
char SqlTxt[256];
int rc;
strcpy(SqlTxt,"CREATE TRIGGER XX AFTER INSERT ON t FOR EACH ROW BEGIN 
");
strcat(SqlTxt,"UPDATE t2 SET n = NEW.i+1;");
strcat(SqlTxt,"END;");
 rc = sqlite3_exec(db,SqlTxt,NULL,NULL,NULL);
if (rc != SQLITE_OK) {
puts(sqlite3_errmsg(db));
}
return(0);
}
int main(int argc, char *argv[])
{
 sqlite3 *db;
 sqlite3_stmt *stmt=NULL;
 int rc;
 int n=0;
 int nrec=0;
 int interval=0;
 int flags=0;
 double t1;
 if (argc != 4) {
  fprintf(stderr,"Usage: %s nrecords commit_interval flags\n",argv[0]);
  fprintf(stderr,"Flag 0 = simple table\n");
  fprintf(stderr,"Flag 1 = in-memory database\n");
  fprintf(stderr,"Flag 2 = add index\n");
  fprintf(stderr,"Flag 4 = WAL mode\n");
  fprintf(stderr,"Flag 8 = Add a trigger\n");
  fprintf(stderr,"Flag 16 = Synchronous=Off\n");
  fprintf(stderr,"Add flags to combine features\n");
  exit(-1);
 }
 nrec=atoi(argv[1]);
 interval=atoi(argv[2]);
 flags=atoi(argv[3]);
 if (flags & 1) {
  rc=sqlite3_open(":memory:",);
 }
 else {
  remove("batch.db");
  rc=sqlite3_open("batch.db",);
 }
 if (rc != SQLITE_OK) {
  puts(sqlite3_errmsg(db));
 }
 rc=sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL);
if (rc != SQLITE_OK) {
puts(sqlite3_errmsg(db));
}
 rc=sqlite3_exec(db,"create table t2 (n integer)",NULL,NULL,NULL);
if (rc != SQLITE_OK) {
puts(sqlite3_errmsg(db));
}
 rc=sqlite3_exec(db,"insert into t2 values(0)",NULL,NULL,NULL);
if (rc != SQLITE_OK) {
puts(sqlite3_errmsg(db));
}
 if (flags & 16) {
  rc=sqlite3_exec(db,"pragma synchronous=OFF",NULL,NULL,NULL);
 if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
 }
 }
 if (flags & 2) {
  rc=sqlite3_exec(db,"create index tx on t (i)",NULL,NULL,NULL);
 if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
 }
 }
 if (flags & 4) {
  rc=sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
 if (rc != SQLITE_OK) {
 puts(sqlite3_errmsg(db));
 }
 }
 if (flags & 8) {
  AddTrigger(db);
  if (interval != 1) {
   fprintf(stderr,"Can't do trigger and begin/commit together\n");
   exit(-1);
  }
 }
 sqlite3_prepare_v2(db,"insert into t values(?)",-1,&stmt,NULL);
 t1=elapsed();
 if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
 while(n < nrec) {
  ++n;
  if (interval != 1 && (n% interval)==0) {
   

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Simon Slavin

On 12 Feb 2011, at 4:11pm, Jim Wilcoxson wrote:

> I don't understand how you can do 360K commits per second if your system is
> actually doing "to the platter" writes on every commit.  Can someone clue me
> in?

My field of expertise, I'm afraid.  The answer is "Hard disks lie.".

Almost all hard disks you can buy in a mundane manner these days have onboard 
buffering.  They accept a write command, tell the computer it has been 
executed, but queue up the actual writes so that they can be done in an 
efficient manner (e.g. when the disk has rotated to the right position).  They 
may not even be done in the right order !  This technique is used to make the 
disk appear to work faster: computer benchmarks report that disks that do this 
say "I got it !" more quickly.  The computer has no way to really know when 
data has reached the magnetic surface of the disk.

Compensating for this behaviour is a big part of what SQLite does in 
journaling.  The source code looks like the programmers are paranoid but really 
they're just taking all possibilities into account.

Many (no longer all !) disks can be modified not to do this (they use 
write-through caching instead) simply by moving a jumper or two.  But if you 
did this in a production computer -- any computer used for mundane daily life 
like writing Word documents -- you would soon put it back because the drop in 
speed is really noticeable.  I've done it in a demonstration and you can hear 
the groans.  The systems this is normally done in are mission-critical servers, 
where losing even an instant of data in a power-cut would be very costly.  The 
canonical example of this is diagnostic logging device in a dangerous 
situation.  Consider, for example, the 'Black Box' device in an airplane.

Simon.


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Marcus Grimm
> I fixed a couple of bugs in my program...I had converted from clock() to
> the more appropriate gettimeofday and forgot to remove the CLOCKS_PER_SEC
> factor (what's a few order of magnitude between friends :-).  Plus I added
> a 3rd argument so you can in-memory, index, and WAL mode too (or combine
> them).
>
> I should've realized it wasn't running this fast but the small 5000 record
> size got me.
> Test it yourself.
> I do have a 7200RPM drive.  My 261.4 numer is still 2+X your theoretical.

I don't want to be a smart-arse, but I still think your 261.4 is too fast.
On a 7200 RPM drive one will have 125 chances/sec to see a given sector be
written. Since sqlite, under normal journal mode, needs 3 syncs
per commit as far as I can recall, the maximum number drops further,
down to about 41 commits/sec. This is theoretical; in reality one will see
maybe 20 commits/sec. Not sure if a disk write-cache will interfere
with that calculation, though.
Am I wrong ? :-)

Your loop omits the sqlite_reset call between the bind statements:
This is allowed since some sqlite versions but I'm not sure if
you are really resetting the implicit transaction. Just an idea..

Your new code still uses the clocks scaling:
printf("%.1f\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
That looks strange to me as you are probably scaling seconds
down to milliseconds, or something.

Marcus


>
> batch 500 500 0
> 274207.4
> batch 500 500 1 -- memory
> 540540.5
> batch 500 500 2 -- w/index
> 160481.4
> batch 500 500 3 -- memory+index
> 220689.7
> batch 500 500 4 -- WAL mode
> 441989.0
> batch 500 500 5 -- WAL mode+memory
> 541455.2
> batch 500 500 6 -- WAL mode+index
> 188902.0
> batch 500 500 7 -- WAL mode+index+memory
> 219478.7
>
> And doing the 5000 record example and testing commit intervals
> batch 5000 5000 0
> 32.0
> batch 5000 2500 0
> 32.0
> batch 5000 1200 0
> 16.0
> batch 5000 500 0
> 16.0
> batch 5000 200 0
> 8.0
> batch 5000 100 0
> 3.6
> batch 5000 500 0
> 16.0
> batch 5000 50 0
> 2.0
> batch 5000 25 0
> 11034.5
> batch 5000 12 0
> 5333.3
> batch 5000 6 0
> 2461.5
> batch 5000 3 0
> 682.3
> batch 5000 2 0
> 509.6
> batch 5000 1 0
> 261.4
>
> #include 
> #include 
> #include 
> #include 
> #include "sqlite3.h"
> double elapsed()
> {
> double t1;
> struct timeval tv;
> static long base_seconds;
> gettimeofday(,NULL);
> if (base_seconds==0) base_seconds=tv.tv_sec;
> return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)100;
> }
> int main(int argc, char *argv[])
> {
>  sqlite3 *db;
>  sqlite3_stmt *stmt=NULL;
>  int rc;
>  int n=0;
>  int nrec=0;
>  int interval=0;
>  int flags=0;
>  double t1;
>  if (argc != 4) {
>   fprintf(stderr,"Usage: %s nrecords commit_interval flags\n",argv[0]);
>   fprintf(stderr,"Flag 0 = simple table\n");
>   fprintf(stderr,"Flag 1 = in-memory database\n");
>   fprintf(stderr,"Flag 2 = add index\n");
>   fprintf(stderr,"Flag 4 = WAL mode\n");
>   fprintf(stderr,"Add flags to combine features\n");
>   exit(-1);
>  }
>  nrec=atoi(argv[1]);
>  interval=atoi(argv[2]);
>  flags=atoi(argv[3]);
>  if (flags & 1) {
>   rc=sqlite3_open(":memory:",);
>  }
>  else {
>   remove("batch.db");
>   rc=sqlite3_open("batch.db",);
>  }
>  if (rc != SQLITE_OK) {
>   puts(sqlite3_errmsg(db));
>  }
>  sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL);
>  if (flags & 2) {
>   sqlite3_exec(db,"create index tx on t (i)",NULL,NULL,NULL);
>  }
>  if (flags & 4) {
>   sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
>  }
>  sqlite3_prepare_v2(db,"insert into t values(?)",-1,,NULL);
>  t1=elapsed();
>  if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
>  while(n < nrec) {
>   ++n;
>   if (interval != 1 && (n% interval)==0) {
>sqlite3_exec(db,"commit",NULL,NULL,NULL);
>sqlite3_exec(db,"begin",NULL,NULL,NULL);
>   }
>   sqlite3_bind_int(stmt,1,n);
>   rc = sqlite3_step(stmt);
>   if (rc != SQLITE_DONE) {
>puts(sqlite3_errmsg(db));
>   }
>   sqlite3_reset(stmt);
>  }
>  sqlite3_exec(db,"commit",NULL,NULL,NULL);
>  //printf("%.1f inserts per sec\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
>  printf("%.1f\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
>  sqlite3_close(db);
> }
>
>
>
> Michael D. Black
> Senior Scientist
> NG Information Systems

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Black, Michael (IS)
I fixed a couple of bugs in my program...I had converted from clock() to the 
more appropriate gettimeofday and forgot to remove the CLOCKS_PER_SEC factor 
(what's a few orders of magnitude between friends :-).  Plus I added a 3rd 
argument so you can use in-memory, an index, and WAL mode too (or combine them).

I should've realized it wasn't running that fast, but the small 5000-record size 
got me.
Test it yourself.
I do have a 7200RPM drive.  My 261.4 number is still 2+X your theoretical.

batch 500 500 0
274207.4
batch 500 500 1 -- memory
540540.5
batch 500 500 2 -- w/index
160481.4
batch 500 500 3 -- memory+index
220689.7
batch 500 500 4 -- WAL mode
441989.0
batch 500 500 5 -- WAL mode+memory
541455.2
batch 500 500 6 -- WAL mode+index
188902.0
batch 500 500 7 -- WAL mode+index+memory
219478.7

And doing the 5000 record example and testing commit intervals
batch 5000 5000 0
32.0
batch 5000 2500 0
32.0
batch 5000 1200 0
16.0
batch 5000 500 0
16.0
batch 5000 200 0
8.0
batch 5000 100 0
3.6
batch 5000 500 0
16.0
batch 5000 50 0
2.0
batch 5000 25 0
11034.5
batch 5000 12 0
5333.3
batch 5000 6 0
2461.5
batch 5000 3 0
682.3
batch 5000 2 0
509.6
batch 5000 1 0
261.4

/* Header names were stripped by the list archive; reconstructed from the
   functions the program uses. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include "sqlite3.h"
double elapsed()
{
 struct timeval tv;
 static long base_seconds;
 gettimeofday(&tv,NULL);
 if (base_seconds==0) base_seconds=tv.tv_sec;
 return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
}
int main(int argc, char *argv[])
{
 sqlite3 *db;
 sqlite3_stmt *stmt=NULL;
 int rc;
 int n=0;
 int nrec=0;
 int interval=0;
 int flags=0;
 double t1;
 if (argc != 4) {
  fprintf(stderr,"Usage: %s nrecords commit_interval flags\n",argv[0]);
  fprintf(stderr,"Flag 0 = simple table\n");
  fprintf(stderr,"Flag 1 = in-memory database\n");
  fprintf(stderr,"Flag 2 = add index\n");
  fprintf(stderr,"Flag 4 = WAL mode\n");
  fprintf(stderr,"Add flags to combine features\n");
  exit(-1);
 }
 nrec=atoi(argv[1]);
 interval=atoi(argv[2]);
 flags=atoi(argv[3]);
 if (flags & 1) {
  rc=sqlite3_open(":memory:",);
 }
 else {
  remove("batch.db");
  rc=sqlite3_open("batch.db",);
 }
 if (rc != SQLITE_OK) {
  puts(sqlite3_errmsg(db));
 }
 sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL);
 if (flags & 2) {
  sqlite3_exec(db,"create index tx on t (i)",NULL,NULL,NULL);
 }
 if (flags & 4) {
  sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
 }
 sqlite3_prepare_v2(db,"insert into t values(?)",-1,&stmt,NULL);
 t1=elapsed();
 if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
 while(n < nrec) {
  ++n;
  if (interval != 1 && (n% interval)==0) {
   sqlite3_exec(db,"commit",NULL,NULL,NULL);
   sqlite3_exec(db,"begin",NULL,NULL,NULL);
  }
  sqlite3_bind_int(stmt,1,n);
  rc = sqlite3_step(stmt);
  if (rc != SQLITE_DONE) {
   puts(sqlite3_errmsg(db));
  }
  sqlite3_reset(stmt);
 }
 sqlite3_exec(db,"commit",NULL,NULL,NULL);
 //printf("%.1f inserts per sec\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
 printf("%.1f\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
 sqlite3_close(db);
}



Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Jim Wilcoxson [pri...@gmail.com]
Sent: Saturday, February 12, 2011 10:11 AM
To: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

On Sat, Feb 12, 2011 at 9:10 AM, Black, Michael (IS) <michael.bla...@ngc.com
> wrote:

> Here's a little benchmark program I wrote to test a super-simple
> one-integer insert to test sql speed and commit interval behavior.
>
> Running it on my machine (Windows XP64 8-core 3Ghz gcc (tdm64-1) 4.5.1) I
> go from 320M inserts per second to 361K inserts per second when no
> begin/commit occurs.  With WAL mode turned on it only drops to 5.9M inserts
> per second.
>
> D:\SQLite>batch 5000 5000
> 32000.0 inserts per sec
> D:\SQLite>batch 5000 2500
> 32000.0 inserts per sec
> D:\SQLite>batch 5000 1200
> 16000.0 inserts per sec
> D:\SQLite>batch 5000 600
> 16000.0 inserts per sec
> D:\SQLite>batch 5000 300
> 10666.7 inserts per sec
> D:\SQLite>batch 5000 150
> 5333.3 inserts per sec
> D:\SQLite>batch 5000 75
> 3200.0 inserts per sec
> D:\SQLite>batch 5000 40
> 1777.8 inserts per sec
> D:\SQLite>batch 5000 20
> 1000.0 inserts per sec
> D:\SQLite>batch 5000 10
> 333.3 inserts per sec
> D:\SQLite>batch 5000 5
> 2406015.0 inserts per sec

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Marcus Grimm
> On Sat, Feb 12, 2011 at 9:10 AM, Black, Michael (IS)
> > D:\SQLite>batch 5000 1
>> 360766.6 inserts per sec
>>
>
> Unless I'm missing something, SQLite has to update the first page of the
> database on every commit, to update the change counter.  Assuming you are
> using rotating media, that record can only be updated 120 times per
> second,
> maximum, on a 7200RPM drive.
>
> I don't understand how you can do 360K commits per second if your system
> is
> actually doing "to the platter" writes on every commit.  Can someone clue
> me
> in?

I think the time measurement is not correct, as sqlite can't in fact
commit at more than approx. 10-20 commits/sec.

Here is a slightly modified version; ignore the trigger stuff:
--
/* The original header names were stripped by the list archive; these are
   reconstructions covering the functions the program uses. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include "sqlite3.h"

int AddTrigger(sqlite3 *db)
{
char SqlTxt[256];
int rc;

strcpy(SqlTxt,"CREATE TRIGGER XX AFTER INSERT ON t FOR EACH ROW BEGIN 
");
strcat(SqlTxt,"UPDATE t2 SET n = NEW.i;");
strcat(SqlTxt,"END;");
rc = sqlite3_exec(db,SqlTxt,NULL,NULL,NULL);
if (rc != SQLITE_OK) {
puts(sqlite3_errmsg(db));
}
return(0);
}

int main(int argc, char *argv[])
{
 sqlite3 *db;
 sqlite3_stmt *stmt=NULL;
 int rc;
 int n=0;
 int nrec=0;
 int interval=0;
 int    AddTr = 0;
 double t1;
 char   SqlTxt[256];

 if (argc < 3) {
  fprintf(stderr,"Usage: %s nrecords commit_interval\n",argv[0]);
  exit(-1);
 }
 nrec=atoi(argv[1]);
 interval=atoi(argv[2]);
 if( argc == 4 )
AddTr = atoi(argv[3]);
 remove("C:\\batch.db");
 rc=sqlite3_open("C:\\batch.db",);
 if (rc != SQLITE_OK) {
  puts(sqlite3_errmsg(db));
 }
 sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL);
 sqlite3_exec(db,"create table t2 (n integer)",NULL,NULL,NULL);
 sqlite3_exec(db,"insert into t2 values(7);",NULL,NULL,NULL);
 if(AddTr)
 AddTrigger(db);

 // turn on WAL mode if you want to test it
 //sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
 //sqlite3_prepare_v2(db,"insert into t values(?)",-1,&stmt,NULL);
 t1=clock();
 if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
 while(n < nrec)
 {
  ++n;
  if (interval != 1 && (n% interval)==0) {
   sqlite3_exec(db,"commit",NULL,NULL,NULL);
   sqlite3_exec(db,"begin",NULL,NULL,NULL);
  }
  sprintf(SqlTxt, "insert into t values(%d);", n);
  sqlite3_exec(db,SqlTxt,NULL,NULL,NULL);
 }
 sqlite3_exec(db,"commit",NULL,NULL,NULL);
 printf("%.1f inserts per sec\n",nrec/((clock()-t1)/CLOCKS_PER_SEC));
 sqlite3_close(db);
}
--

Marcus

>
> Jim
> --
> HashBackup: easy onsite and offsite Unix backup
> http://www.hashbackup.com
>




Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Jim Wilcoxson
On Sat, Feb 12, 2011 at 9:10 AM, Black, Michael (IS)  wrote:

> Here's a little benchmark program I wrote to test a super-simple
> one-integer insert to test sql speed and commit interval behavior.
>
> Running it on my machine (Windows XP64 8-core 3Ghz gcc (tdm64-1) 4.5.1) I
> go from 320M inserts per second to 361K inserts per second when no
> begin/commit occurs.  With WAL mode turned on it only drops to 5.9M inserts
> per second.
>
> D:\SQLite>batch 5000 5000
> 32000.0 inserts per sec
> D:\SQLite>batch 5000 2500
> 32000.0 inserts per sec
> D:\SQLite>batch 5000 1200
> 16000.0 inserts per sec
> D:\SQLite>batch 5000 600
> 16000.0 inserts per sec
> D:\SQLite>batch 5000 300
> 10666.7 inserts per sec
> D:\SQLite>batch 5000 150
> 5333.3 inserts per sec
> D:\SQLite>batch 5000 75
> 3200.0 inserts per sec
> D:\SQLite>batch 5000 40
> 1777.8 inserts per sec
> D:\SQLite>batch 5000 20
> 1000.0 inserts per sec
> D:\SQLite>batch 5000 10
> 333.3 inserts per sec
> D:\SQLite>batch 5000 5
> 2406015.0 inserts per sec
> D:\SQLite>batch 5000 2
> 526315.8 inserts per sec
> D:\SQLite>batch 5000 1
> 360766.6 inserts per sec
>

Unless I'm missing something, SQLite has to update the first page of the
database on every commit, to update the change counter.  Assuming you are
using rotating media, that record can only be updated 120 times per second,
maximum, on a 7200RPM drive.

I don't understand how you can do 360K commits per second if your system is
actually doing "to the platter" writes on every commit.  Can someone clue me
in?

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-12 Thread Black, Michael (IS)
Are you wrapping your data dump into a "BEGIN" "COMMIT"?

Also...you mention using a select...I assume you are testing first to see if 
you need to update or insert?

You may want to look at the REPLACE clause and see if you can use that.
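
A minimal sketch of that approach (not from the original mail; the upsert_host name is 
mine), assuming the hosts schema posted later in this thread where address is UNIQUE. 
Note that INSERT OR REPLACE deletes the conflicting row, so id is reassigned unless it 
is carried over, and the subselect below is just one way to preserve first_seen:

static void upsert_host(sqlite3 *db, sqlite3_int64 address, sqlite3_int64 now,
                        sqlite3_int64 rx, sqlite3_int64 tx)
{
 sqlite3_stmt *st;
 sqlite3_prepare_v2(db,
  "INSERT OR REPLACE INTO hosts (address, first_seen, last_on, rx_bytes, tx_bytes) "
  "VALUES (?1, COALESCE((SELECT first_seen FROM hosts WHERE address=?1), ?2), ?2, ?3, ?4)",
  -1, &st, NULL);
 sqlite3_bind_int64(st, 1, address);
 sqlite3_bind_int64(st, 2, now);
 sqlite3_bind_int64(st, 3, rx);
 sqlite3_bind_int64(st, 4, tx);
 sqlite3_step(st);
 sqlite3_finalize(st);
}

(In a real program the statement would be prepared once and reset between rows rather 
than prepared per call.)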

Here's a little benchmark program I wrote to test a super-simple one-integer 
insert to test sql speed and commit interval behavior.

Running it on my machine (Windows XP64 8-core 3Ghz gcc (tdm64-1) 4.5.1) I go 
from 320M inserts per second to 361K inserts per second when no begin/commit 
occurs.  With WAL mode turned on it only drops to 5.9M inserts per second.

D:\SQLite>batch 5000 5000
32000.0 inserts per sec
D:\SQLite>batch 5000 2500
32000.0 inserts per sec
D:\SQLite>batch 5000 1200
16000.0 inserts per sec
D:\SQLite>batch 5000 600
16000.0 inserts per sec
D:\SQLite>batch 5000 300
10666.7 inserts per sec
D:\SQLite>batch 5000 150
5333.3 inserts per sec
D:\SQLite>batch 5000 75
3200.0 inserts per sec
D:\SQLite>batch 5000 40
1777.8 inserts per sec
D:\SQLite>batch 5000 20
1000.0 inserts per sec
D:\SQLite>batch 5000 10
333.3 inserts per sec
D:\SQLite>batch 5000 5
2406015.0 inserts per sec
D:\SQLite>batch 5000 2
526315.8 inserts per sec
D:\SQLite>batch 5000 1
360766.6 inserts per sec

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>
#include "sqlite3.h"
double elapsed()
{
 struct timeval tv;
 static long base_seconds;
 gettimeofday(&tv,NULL);
 if (base_seconds==0) base_seconds=tv.tv_sec;
 return (tv.tv_sec-base_seconds)+tv.tv_usec/(double)1000000;
}
int main(int argc, char *argv[])
{
 sqlite3 *db;
 sqlite3_stmt *stmt=NULL;
 int rc;
 int n=0;
 int nrec=0;
 int interval=0;
 double t1;
 if (argc != 3) {
  fprintf(stderr,"Usage: %s nrecords commit_interval\n",argv[0]);
  exit(-1);
 }
 nrec=atoi(argv[1]);
 interval=atoi(argv[2]);
 remove("batch.db");
 rc=sqlite3_open("batch.db",);
 if (rc != SQLITE_OK) {
  puts(sqlite3_errmsg(db));
 }
 sqlite3_exec(db,"create table t (i integer)",NULL,NULL,NULL)
 // turn on WAL mode if you want to test it
 //sqlite3_exec(db,"pragma journal_mode=WAL",NULL,NULL,NULL);
 sqlite3_prepare_v2(db,"insert into t values(?)",-1,&stmt,NULL);
 t1=elapsed();
 if (interval != 1) rc=sqlite3_exec(db,"begin",NULL,NULL,NULL);
 while(n < nrec) {
  ++n;
  if (interval != 1 && (n% interval)==0) {
   sqlite3_exec(db,"commit",NULL,NULL,NULL);
   sqlite3_exec(db,"begin",NULL,NULL,NULL);
  }
  sqlite3_bind_int(stmt,1,n);  /* bind the value to parameter 1 */
  rc = sqlite3_step(stmt);
  if (rc != SQLITE_DONE) {
   puts(sqlite3_errmsg(db));
  }
  sqlite3_reset(stmt);  /* reset so the statement can be stepped again next iteration */
 }
 sqlite3_exec(db,"commit",NULL,NULL,NULL);
 printf("%.1f inserts per sec\n",nrec/((elapsed()-t1)/CLOCKS_PER_SEC));
 sqlite3_exec(db,"commit",NULL,NULL,NULL);
 sqlite3_close(db);
}



Michael D. Black
Senior Scientist
NG Information Systems
Advanced Analytics Directorate




From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on 
behalf of Thomas Fjellstrom [tfjellst...@strangesoft.net]
Sent: Friday, February 11, 2011 9:50 PM
To: Teg
Cc: General Discussion of SQLite Database
Subject: EXT :Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

On February 11, 2011, Teg wrote:
> Hello Thomas,
>
> I download off the net at between 400Mbps to 1 Gbps and have to pack
> the data away into a Sqlite DB. I use an intermediate buffer on disk
> and a background thread to write the data to the DB. Obviously, I
> couldn't keep up if the data was continuous but, the lulls in between
> the peak data rate times give me time to pack it away. Even if I get
> an extended burst of data, it'll eventually make it in. If I was you,
> I'd start by having something service the interface and buffer the
> packet data and something else pulling the packet data from the buffer
> and feeding it to Sqlite. Then you won't be dropping packets.

The program is split into two separate threads, one is dedicated to just
capturing packets off the network, and storing them in a circular buffer in
memory. The second thread is dedicated to parsing the packets, calculating
stats, and occasionally flushing data to storage.

What normally happens is that when hosts haven't been seen for a while, they get
saved to the db and are removed from the in-memory hash table. But that
doesn't happen for local hosts, the fake host used to capture traffic totals, or
any external hosts that keep showing up. And temporarily I've added some code
to auto-save all in-memory hosts to the database, as I'm seeing some
instability in the interface with sqlite, causing the process to lose
information once in a while.

I've seen numbers on the internet stating that sqlite is super fast, should be
able to handle tens of thousands of inserts or updates in a second or two,
even in sync mode.

Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-11 Thread Jim Wilcoxson
On Fri, Feb 11, 2011 at 10:50 PM, Thomas Fjellstrom <tfjellst...@strangesoft.net> wrote:

> I've seen numbers on the internet stating that sqlite is super fast, should
> be able to handle tens of thousands of inserts or updates in a second or two,
> even in sync mode. So I'm a bit stumped as to why its performance is so low
> in my case. One time, I saw it take 5 seconds to flush almost 3k rows back
> to the db.
>
> No hosts are actually ever removed from the database, which has about 120k
> hosts in it by now, totaling a file size of around 7-8MB. So it's not a lot
> of data, and the queries are pretty simple. So I really am stumped.
>

You mentioned your db is 8MB.  If you are using the default page size of 1KB,
that means you have 8K pages in your db.  The default cache size is 2000
pages, so your db doesn't fit into SQLite's default cache.  I'd suggest
using pragma page_size=4096.  This will have the effect of increasing your
cache size from 2MB to 8MB and prevent cache overflow.  You might want to
use pragma cache_size=3000 as a safety margin.  This would be a 12MB cache
with 4KB pages.
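
A minimal sketch of applying that advice right after opening the database (illustrative
only; the open_tuned name is mine):

#include "sqlite3.h"

static int open_tuned(const char *path, sqlite3 **db)
{
 int rc = sqlite3_open(path, db);
 if (rc != SQLITE_OK) return rc;
 /* page_size only takes effect before the first table is created
    (or after a VACUUM on an existing file) */
 sqlite3_exec(*db, "PRAGMA page_size=4096", NULL, NULL, NULL);
 /* 3000 pages * 4KB = ~12MB of cache */
 sqlite3_exec(*db, "PRAGMA cache_size=3000", NULL, NULL, NULL);
 return SQLITE_OK;
}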

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://www.hashbackup.com


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-11 Thread Thomas Fjellstrom
On February 11, 2011, Teg wrote:
> Hello Thomas,
> 
> I download off the net at between 400Mbps to 1 Gbps and have to pack
> the data away into a Sqlite DB. I use an intermediate buffer on disk
> and a background thread to write the data to the DB. Obviously, I
> couldn't keep up if the data was continuous but, the lulls in between
> the peak data rate times give me time to pack it away. Even if I get
> an extended burst of data, it'll eventually make it in. If I was you,
> I'd start by having something service the interface and buffer the
> packet data and something else pulling the packet data from the buffer
> and feeding it to Sqlite. Then you won't be dropping packets.

The program is split into two separate threads, one is dedicated to just 
capturing packets off the network, and storing them in a circular buffer in 
memory. The second thread is dedicated to parsing the packets, calculating 
stats, and occasionally flushing data to storage.

What normally happens is that when hosts haven't been seen for a while, they get
saved to the db and are removed from the in-memory hash table. But that
doesn't happen for local hosts, the fake host used to capture traffic totals, or
any external hosts that keep showing up. And temporarily I've added some code
to auto-save all in-memory hosts to the database, as I'm seeing some
instability in the interface with sqlite, causing the process to lose
information once in a while.

I've seen numbers on the internet stating that sqlite is super fast, should be 
able to handle tens of thousands of inserts or updates in a second or two, 
even in sync mode. So I'm a bit stumped as to why its performance is so low in 
my case. One time, I saw it take 5 seconds to flush almost 3k rows back to the 
db.

No hosts are actually ever removed from the database, which has about 120k
hosts in it by now, totaling a file size of around 7-8MB. So it's not a lot of
data, and the queries are pretty simple. So I really am stumped.

> 
> T
> 
> Friday, February 11, 2011, 6:49:16 PM, you wrote:
> 
> TF> Hi, I have a small problem with a program I've developed.
> 
> TF> It basically captures packets from a network device on linux and stores
> TF> transfer stats on each host seen. To store the stats long term I decided to
> TF> use sqlite, and when the program saves the stats every few minutes, it takes
> TF> about 4-5 seconds, and if I have the synchronous pragma turned off, it takes
> TF> 1-2 seconds.
> 
> TF> These are the relevant sql commands:
> 
> TF> CREATE TABLE hosts (id INTEGER PRIMARY KEY,\
> TF>   address INTEGER UNIQUE, \
> TF>   first_seen INTEGER DEFAULT CURRENT_DATETIME, \
> TF>   last_on INTEGER DEFAULT CURRENT_DATETIME, \
> TF>   last_off INTEGER, \
> TF>   rx_bytes INTEGER, \
> TF>   tx_bytes INTEGER);
> 
> TF> UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, tx_bytes = ? WHERE id = ?
> TF> INSERT INTO hosts ( address, first_seen, last_on, rx_bytes, tx_bytes ) VALUES ( ?, ?, ?, ?, ? )
> 
> TF> The stats saving code will interleave a bunch of UPDATEs and INSERTs, inside a
> TF> single transaction, maybe that has something to do with it, I'm not sure.
> 
> TF> 1s to update them all isn't too bad, but it still can mean I'm potentially
> TF> dropping packets, which I'd really rather not do.
> 
> TF> Thanks.


-- 
Thomas Fjellstrom
tfjellst...@strangesoft.net


Re: [sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-11 Thread Teg
Hello Thomas,

I download off the net at between 400Mbps to 1 Gbps and have to pack
the data away into a Sqlite DB. I use an intermediate buffer on disk
and a background thread to write the data to the DB. Obviously, I
couldn't keep up if the data was continuous but, the lulls in between
the peak data rate times give me time to pack it away. Even if I get
an extended burst of data, it'll eventually make it in. If I was you,
I'd start by having something service the interface and buffer the
packet data and something else pulling the packet data from the buffer
and feeding it to Sqlite. Then you won't be dropping packets.
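
A rough sketch of that structure (my illustration, not Teg's actual code; the HostRec
layout and the queue_host/writer_thread names are made up): the capture thread only
appends records to an in-memory ring, and a separate writer thread drains the ring into
SQLite one transaction per batch, so a slow commit never stalls packet capture.

#include <pthread.h>
#include "sqlite3.h"

#define RING_SIZE 4096

typedef struct { long long address, last_on, rx_bytes, tx_bytes; } HostRec;

static HostRec ring[RING_SIZE];
static int head, tail;                 /* producer writes head, consumer reads tail */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;

/* called from the capture thread: never touches SQLite */
void queue_host(const HostRec *r)
{
 pthread_mutex_lock(&lock);
 ring[head] = *r;
 head = (head + 1) % RING_SIZE;        /* a real version would handle overflow */
 pthread_cond_signal(&ready);
 pthread_mutex_unlock(&lock);
}

/* writer thread: drain whatever is queued, one transaction per batch */
void *writer_thread(void *arg)
{
 sqlite3 *db = arg;
 sqlite3_stmt *st;
 sqlite3_prepare_v2(db,
  "INSERT OR REPLACE INTO hosts (address, last_on, rx_bytes, tx_bytes) "
  "VALUES (?,?,?,?)", -1, &st, NULL);
 for (;;) {
  pthread_mutex_lock(&lock);
  while (head == tail) pthread_cond_wait(&ready, &lock);
  pthread_mutex_unlock(&lock);

  sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
  pthread_mutex_lock(&lock);
  while (tail != head) {
   HostRec r = ring[tail];
   tail = (tail + 1) % RING_SIZE;
   pthread_mutex_unlock(&lock);        /* don't hold the lock during the insert */
   sqlite3_bind_int64(st, 1, r.address);
   sqlite3_bind_int64(st, 2, r.last_on);
   sqlite3_bind_int64(st, 3, r.rx_bytes);
   sqlite3_bind_int64(st, 4, r.tx_bytes);
   sqlite3_step(st);
   sqlite3_reset(st);
   pthread_mutex_lock(&lock);
  }
  pthread_mutex_unlock(&lock);
  sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
 }
 return NULL;
}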


T

Friday, February 11, 2011, 6:49:16 PM, you wrote:

TF> Hi, I have a small problem with a program I've developed.

TF> It basically captures packets from a network device on linux and stores
TF> transfer stats on each host seen. To store the stats long term I decided to
TF> use sqlite, and when the program saves the stats every few minutes, it takes
TF> about 4-5 seconds, and if I have the synchronous pragma turned off, it takes
TF> 1-2 seconds.

TF> These are the relevant sql commands:

TF> CREATE TABLE hosts (id INTEGER PRIMARY KEY,\
TF>   address INTEGER UNIQUE, \
TF>   first_seen INTEGER DEFAULT CURRENT_DATETIME, \
TF>   last_on INTEGER DEFAULT CURRENT_DATETIME, \
TF>   last_off INTEGER, \
TF>   rx_bytes INTEGER, \
TF>   tx_bytes INTEGER);

TF> UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, tx_bytes = ? WHERE id = ?
TF> INSERT INTO hosts ( address, first_seen, last_on, rx_bytes, tx_bytes ) VALUES ( ?, ?, ?, ?, ? )

TF> The stats saving code will interleave a bunch of UPDATEs and INSERTs, 
inside a
TF> single transaction, maybe that has something to do with it, I'm not sure.

TF> 1s to update them all isn't too bad, but it still can mean I'm potentially
TF> dropping packets, which I'd really rather not do.

TF> Thanks.




[sqlite] UPDATE/INSERTing 1-2k rows slower than expected

2011-02-11 Thread Thomas Fjellstrom
Hi, I have a small problem with a program I've developed.

It basically captures packets from a network device on linux and stores 
transfer stats on each host seen. To store the stats long term I decided to 
use sqlite, and when the program saves the stats every few minutes, it takes 
about 4-5 seconds, and if I have the synchronous pragma turned off, it takes 
1-2 seconds.

These are the relevant sql commands:

CREATE TABLE hosts (id INTEGER PRIMARY KEY,\
  address INTEGER UNIQUE, \
  first_seen INTEGER DEFAULT CURRENT_DATETIME, \
  last_on INTEGER DEFAULT CURRENT_DATETIME, \
  last_off INTEGER, \
  rx_bytes INTEGER, \
  tx_bytes INTEGER);

UPDATE hosts SET last_on = ?, last_off = ?, rx_bytes = ?, tx_bytes = ? WHERE id = ?
INSERT INTO hosts ( address, first_seen, last_on, rx_bytes, tx_bytes ) VALUES ( ?, ?, ?, ?, ? )

The stats saving code will interleave a bunch of UPDATEs and INSERTs, inside a 
single transaction, maybe that has something to do with it, I'm not sure.
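
My reading of that flush loop (illustrative only; the actual code isn't shown in the
thread, and the Host struct and flush_hosts name below are made up): try the UPDATE
first, and fall back to the INSERT when sqlite3_changes() reports that no row matched,
all inside one transaction, using the two statements quoted above.

#include "sqlite3.h"

typedef struct {
 sqlite3_int64 id, address, first_seen, last_on, last_off, rx_bytes, tx_bytes;
} Host;

/* upd: "UPDATE hosts SET last_on=?, last_off=?, rx_bytes=?, tx_bytes=? WHERE id=?"
   ins: "INSERT INTO hosts (address, first_seen, last_on, rx_bytes, tx_bytes) VALUES (?,?,?,?,?)" */
void flush_hosts(sqlite3 *db, sqlite3_stmt *upd, sqlite3_stmt *ins, const Host *h, int n)
{
 int i;
 sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
 for (i = 0; i < n; i++) {
  sqlite3_bind_int64(upd, 1, h[i].last_on);
  sqlite3_bind_int64(upd, 2, h[i].last_off);
  sqlite3_bind_int64(upd, 3, h[i].rx_bytes);
  sqlite3_bind_int64(upd, 4, h[i].tx_bytes);
  sqlite3_bind_int64(upd, 5, h[i].id);
  sqlite3_step(upd);
  sqlite3_reset(upd);
  if (sqlite3_changes(db) == 0) {      /* no existing row was updated: insert instead */
   sqlite3_bind_int64(ins, 1, h[i].address);
   sqlite3_bind_int64(ins, 2, h[i].first_seen);
   sqlite3_bind_int64(ins, 3, h[i].last_on);
   sqlite3_bind_int64(ins, 4, h[i].rx_bytes);
   sqlite3_bind_int64(ins, 5, h[i].tx_bytes);
   sqlite3_step(ins);
   sqlite3_reset(ins);
  }
 }
 sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
}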

1s to update them all isn't too bad, but it still can mean I'm potentially 
dropping packets, which I'd really rather not do.

Thanks.

-- 
Thomas Fjellstrom
tfjellst...@strangesoft.net