Re: [sqlite] Corruption on many Windows XP hosts

2011-04-14 Thread Richard Hipp
On Thu, Apr 14, 2011 at 5:40 AM, James Green wrote:

> Support just flagged another site. This one has a more "interesting"
> corruption issue in that data intended for one table ended up across two
> others.
>
> We have no single component that would open all of those tables during it's
> runtime.


All of the tables and indices for a single SQLite database are jumbled
together into a single file.  If something goes wrong while writing one
table, it is likely as not to damage other unrelated and/or unused tables
and indices in that same file.  SQLite makes no attempt to firewall tables
from one another.



> Probably of little consequence we also noticed that F-Secure issued
> *something* about sqlite3odbc.dll.
>
> Event Type:Error
> Event Source:F-Secure Gatekeeper
> Event Category:None
> Event ID:1
> Date:05/04/2011
> Time:13:06:40
> User:N/A
> Computer:E4886_C_8377
> Description:
> The description for Event ID ( 1 ) in Source ( F-Secure Gatekeeper ) cannot
> be found. The local computer may not have the necessary registry
> information
> or message DLL files to display messages from a remote computer. You may be
> able to use the /AUXSOURCE= flag to retrieve this description; see Help and
> Support for details. The following information is part of the event: ,
> \Device\HarddiskVolume2...sqlite3odbc.dll.
>

I don't know what this means.  But SQLite does not have a "networking" or
"event" component.  SQLite is a pure C library.  It does not communicate
with remote machines.  My understand is that sqlite3odbc.dll is the same
way, but my knowledge of sqlite3odbc is much more limited.



> Data:
> : 00 00 14 00 02 00 84 00   ..„.
> 0008: 00 00 00 00 01 00 05 c0   ...À
> 0010: 00 00 00 00 02 01 00 00   
> 0018: 00 00 00 00 00 00 00 00   
> 0020: 00 00 00 00 00 00 00 00   
> 0028: 01 00 00 00 08 08 00 00   
> 0030: 01 00 00 00 40 00 00 00   @...
> 0038: 00 00 00 00   
>
> No logs further to that could be found :(
>
> We did however decide to copy the database and run a quick vacuum &
> integrity_check:
> sqlite> vacuum;
> sqlite> pragma integrity_check;
> wrong # of entries in index IXED_EventDetails
> wrong # of entries in index sqlite_autoindex_EventDetails_1
>

Such problems can be fixed by running REINDEX.


>
> The wrongly-located data was still present afterwards. EventDetails was not
> involved in this mis-placed data issue.
>
> James
>
> On 12 April 2011 15:01, James Green  wrote:
>
> > Incidentally running pragma integrity_check on one of the samples I
> > have been given returns:
> >
> > *** in database main ***
> > On tree page 1824 cell 0: invalid page number 218103808
> > On page 73608 at right child: 2nd reference to page 92497
> > Page 92110 is never used
> > Page 92494 is never used
> > Page 92496 is never used
> > Page 92499 is never used
> > Page 92508 is never used
> > Page 92511 is never used
> > Page 92512 is never used
> > Page 92513 is never used
> > Error: database disk image is malformed
> >
> > James
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-14 Thread Filip Navara
Just to be sure. Could you run the application under Application
Verifier with heap checks enabled?

Best regards,
Filip Navara

On Thu, Apr 14, 2011 at 11:40 AM, James Green  wrote:
> Support just flagged another site. This one has a more "interesting"
> corruption issue in that data intended for one table ended up across two
> others.
>
> We have no single component that would open all of those tables during it's
> runtime. Probably of little consequence we also noticed that F-Secure issued
> *something* about sqlite3odbc.dll.
>
> Event Type:    Error
> Event Source:    F-Secure Gatekeeper
> Event Category:    None
> Event ID:    1
> Date:        05/04/2011
> Time:        13:06:40
> User:        N/A
> Computer:    E4886_C_8377
> Description:
> The description for Event ID ( 1 ) in Source ( F-Secure Gatekeeper ) cannot
> be found. The local computer may not have the necessary registry information
> or message DLL files to display messages from a remote computer. You may be
> able to use the /AUXSOURCE= flag to retrieve this description; see Help and
> Support for details. The following information is part of the event: ,
> \Device\HarddiskVolume2...sqlite3odbc.dll.
> Data:
> : 00 00 14 00 02 00 84 00   ..„.
> 0008: 00 00 00 00 01 00 05 c0   ...À
> 0010: 00 00 00 00 02 01 00 00   
> 0018: 00 00 00 00 00 00 00 00   
> 0020: 00 00 00 00 00 00 00 00   
> 0028: 01 00 00 00 08 08 00 00   
> 0030: 01 00 00 00 40 00 00 00   @...
> 0038: 00 00 00 00               
>
> No logs further to that could be found :(
>
> We did however decide to copy the database and run a quick vacuum &
> integrity_check:
> sqlite> vacuum;
> sqlite> pragma integrity_check;
> wrong # of entries in index IXED_EventDetails
> wrong # of entries in index sqlite_autoindex_EventDetails_1
>
> The wrongly-located data was still present afterwards. EventDetails was not
> involved in this mis-placed data issue.
>
> James
>
> On 12 April 2011 15:01, James Green  wrote:
>
>> Incidentally running pragma integrity_check on one of the samples I
>> have been given returns:
>>
>> *** in database main ***
>> On tree page 1824 cell 0: invalid page number 218103808
>> On page 73608 at right child: 2nd reference to page 92497
>> Page 92110 is never used
>> Page 92494 is never used
>> Page 92496 is never used
>> Page 92499 is never used
>> Page 92508 is never used
>> Page 92511 is never used
>> Page 92512 is never used
>> Page 92513 is never used
>> Error: database disk image is malformed
>>
>> James
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-14 Thread James Green
Support just flagged another site. This one has a more "interesting"
corruption issue in that data intended for one table ended up across two
others.

We have no single component that would open all of those tables during it's
runtime. Probably of little consequence we also noticed that F-Secure issued
*something* about sqlite3odbc.dll.

Event Type:Error
Event Source:F-Secure Gatekeeper
Event Category:None
Event ID:1
Date:05/04/2011
Time:13:06:40
User:N/A
Computer:E4886_C_8377
Description:
The description for Event ID ( 1 ) in Source ( F-Secure Gatekeeper ) cannot
be found. The local computer may not have the necessary registry information
or message DLL files to display messages from a remote computer. You may be
able to use the /AUXSOURCE= flag to retrieve this description; see Help and
Support for details. The following information is part of the event: ,
\Device\HarddiskVolume2...sqlite3odbc.dll.
Data:
: 00 00 14 00 02 00 84 00   ..„.
0008: 00 00 00 00 01 00 05 c0   ...À
0010: 00 00 00 00 02 01 00 00   
0018: 00 00 00 00 00 00 00 00   
0020: 00 00 00 00 00 00 00 00   
0028: 01 00 00 00 08 08 00 00   
0030: 01 00 00 00 40 00 00 00   @...
0038: 00 00 00 00   

No logs further to that could be found :(

We did however decide to copy the database and run a quick vacuum &
integrity_check:
sqlite> vacuum;
sqlite> pragma integrity_check;
wrong # of entries in index IXED_EventDetails
wrong # of entries in index sqlite_autoindex_EventDetails_1

The wrongly-located data was still present afterwards. EventDetails was not
involved in this mis-placed data issue.

James

On 12 April 2011 15:01, James Green  wrote:

> Incidentally running pragma integrity_check on one of the samples I
> have been given returns:
>
> *** in database main ***
> On tree page 1824 cell 0: invalid page number 218103808
> On page 73608 at right child: 2nd reference to page 92497
> Page 92110 is never used
> Page 92494 is never used
> Page 92496 is never used
> Page 92499 is never used
> Page 92508 is never used
> Page 92511 is never used
> Page 92512 is never used
> Page 92513 is never used
> Error: database disk image is malformed
>
> James
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Pete Attkins

On 2011-04-13, at 15:53, Richard Hipp wrote:

> On Wed, Apr 13, 2011 at 10:33 AM, Simon Slavin  
>  wrote:
>
>>
>> On 13 Apr 2011, at 12:14pm, James Green wrote:
>>
>>> sync=full does not work well for our app (no transactions). Far  
>>> too slow.
>>
>> If you're not syncing, then section 3.2 of the page Richard probably
>> indicates what's causing your corruption.
>>
>
> We have assumed that disabling synchronous requires a power failure  
> or hard
> reset in order to cause corruption.  But that is based on the  
> premise that
> operating system disk case is coherent.  In other words, if two  
> processes
> running on the same machine have the same disk file open, and if  
> process A
> writes pages 1 and 2, in that order, and process B reads pages 1 and  
> 2 at
> the same time, then process B will either see none of A's change, or  
> B will
> see the changes to page 1 and not page 2, or B will see both  
> change.  It
> will never be the case that B will see the changes to page 2 but not  
> the
> changes to page 1.
>
> Can somebody please confirm for me that windows works this way?
>
> If it turns out that I'm wrong and the windows disk cache is not  
> coherent,
> then you will indeed need to set synchronous=FULL on windows in  
> order to
> prevent corruption.  Or, perhaps we can find some magic I/O barrier  
> system
> call for windows that we can insert in place of the  
> FlushFileBuffers() when
> synchronous is OFF.  But for now, until I get better information,  
> I'm going
> to assume that he windows disk cache is indeed coherent and that  
> none of
> this is necessary.
> [...]

Sorry to chip in again, Richard, but would you kindly send me a  
personal email on these issues. I know a thing or two about the  
intricacies of operating systems and have extensive experience of  
using SQLite (and many other databases) in non-robust environments.

I would like to offer suggestions to help the original poster to solve  
the problem, but my lack of diction makes me unable to word my  
questions and suggestions in the right manner for this forum.

Kindest regards to you all,

Pete

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Richard Hipp
On Tue, Apr 12, 2011 at 5:15 AM, James Green wrote:

> We have an application that we distribute to our clients each of whom
> run Windows - often it's a desktop XP machine. Our app consists of a
> set of Delphi services and a PHP web front end each of which use ODBC
> for database connectivity.
>
> The vast majority of installations have shipped with sqliteodbc-0.83
> (sqlite 3.6.14.2). Over several weeks a number of these installations
> have broken down - our support staff have noticed problems with the
> sqlite databases which turn out to be corrupt.
>
> We recently began shipping a third component - a Java app which again
> connected by ODBC. However, it used threads and the database fell over
> inside of hours. We found and fixed mulitple thread safety issues and
> the rate of corruption reduced signficantly.


Can you elaborate on the thread-safety issues that you discovered?



> However the corruption
> issue remained albeit smaller in frequency.
>
> In the past month or so we have tested more intensively with
> sqliteodbc-0.88 (sqlite 3.7.4 and 3.7.5 with our own build). We can
> still corrupt the database however. So a couple of weeks ago we
> switched to using WAL mode which so far hasn't had any corruption to
> my knowledge.
>
> I suspect that we are avoiding the root cause by switching the
> behaviour of sqlite on disk. My question is does anyone have pointers
> to help us isolate the problems we are seeing. We can obviously
> provide further detail to your questions.
>
> Many thanks,
>
> James
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Simon Slavin

On 13 Apr 2011, at 4:36pm, James Green wrote:

> On 13 April 2011 15:33, Simon Slavin  wrote:
>> 
>> On 13 Apr 2011, at 12:14pm, James Green wrote:
>> 
>>> sync=full does not work well for our app (no transactions). Far too slow.
>> 
>> If you're not syncing, then section 3.2 of the page Richard probably 
>> indicates what's causing your corruption.
> 
> Without a reset or power loss? This is the bit that's causing concern.

Actually I'm now persuaded that everyone else is right, and that this may not 
be causing your problem.

>> So for a while, leave that PRAGMA alone, accept that you're going to get 
>> slow running, and see whether this makes the corruption go away.  If it does 
>> then you can tackle the speed problem separately without worrying about data 
>> corruption.
> 
> This is the approach we will be rolling out. More specifically we'll
> be trying synchronous=normal. It will likely take a few days before we
> can roll this out and evaluate.

... nevertheless, this might be a worthwhile test.  Whether this makes 
everything work normally or not it could tell you at what level your data is 
getting corrupted.  Another useful test might be to run integrity checking 
without waiting for any obvious corruption.

In all these things take Richard's advice over mine, of course.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Richard Hipp
On Wed, Apr 13, 2011 at 11:36 AM, James Green wrote:

> On 13 April 2011 15:33, Simon Slavin  wrote:
> >
> > On 13 Apr 2011, at 12:14pm, James Green wrote:
> >
> >> sync=full does not work well for our app (no transactions). Far too
> slow.
> >
> > If you're not syncing, then section 3.2 of the page Richard probably
> indicates what's causing your corruption.
>
> Without a reset or power loss? This is the bit that's causing concern.
>
> > So for a while, leave that PRAGMA alone, accept that you're going to get
> slow running, and see whether this makes the corruption go away.  If it does
> then you can tackle the speed problem separately without worrying about data
> corruption.
>
> This is the approach we will be rolling out. More specifically we'll
> be trying synchronous=normal. It will likely take a few days before we
> can roll this out and evaluate.
>

synchronous=NORMAL in combination with journal_mode=WAL does syncing only on
a checkpoint, not on a commit.  That should give you good performance and
safety from corruption due to power failure or reset.





>
> > It might be worth trying to figure out whether your storage system can
> handle the speed you're expecting.  If you have multiple threads all trying
> to write at the same time can your hard disk handle that much data at that
> speed ?  If not, then that's probably why you're finding the use of 'sync'
> to be too slow.
>
> We can say that SQL Server survives just fine. We are not hitting the
> disk but hundreds of megabytes of changes. One of our operations has
> the following times:
>
> sync=FULL: 3m 43s
> sync=OFF: 10.5s
>
> We're talking approximately 2,000 rows consisting of 3x 32-char string
> columns and 2x booleans. This is without WAL mode and without
> transactions.
>
> Worth noting is that we only have one test environment in which we can
> get corruption to occur and we cannot get it to happen at will. All
> other test systems operate fine without corruption with sync=OFF.
>
> James
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread James Green
On 13 April 2011 15:33, Simon Slavin  wrote:
>
> On 13 Apr 2011, at 12:14pm, James Green wrote:
>
>> sync=full does not work well for our app (no transactions). Far too slow.
>
> If you're not syncing, then section 3.2 of the page Richard probably 
> indicates what's causing your corruption.

Without a reset or power loss? This is the bit that's causing concern.

> So for a while, leave that PRAGMA alone, accept that you're going to get slow 
> running, and see whether this makes the corruption go away.  If it does then 
> you can tackle the speed problem separately without worrying about data 
> corruption.

This is the approach we will be rolling out. More specifically we'll
be trying synchronous=normal. It will likely take a few days before we
can roll this out and evaluate.

> It might be worth trying to figure out whether your storage system can handle 
> the speed you're expecting.  If you have multiple threads all trying to write 
> at the same time can your hard disk handle that much data at that speed ?  If 
> not, then that's probably why you're finding the use of 'sync' to be too slow.

We can say that SQL Server survives just fine. We are not hitting the
disk but hundreds of megabytes of changes. One of our operations has
the following times:

sync=FULL: 3m 43s
sync=OFF: 10.5s

We're talking approximately 2,000 rows consisting of 3x 32-char string
columns and 2x booleans. This is without WAL mode and without
transactions.

Worth noting is that we only have one test environment in which we can
get corruption to occur and we cannot get it to happen at will. All
other test systems operate fine without corruption with sync=OFF.

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Pavel Ivanov
> Pavel, please tell me you are wrong.  Surely windows maintains disk cache
> coherency even in the absence of explicit FlushFileBuffers() calls?

I'm sorry, I've got this idea from some post on this list. And only
after I wrote the email I've realized how scary it could be if it
worked this way.


Pavel


On Wed, Apr 13, 2011 at 10:25 AM, Richard Hipp  wrote:
> On Wed, Apr 13, 2011 at 9:53 AM, Pavel Ivanov  wrote:
>
>> > The fact that one engineer installed a site, began operating the app,
>> > then saw it become corrupt minutes later rules out power loss or hard
>> > resets in at least that case. An operating system level problem should
>> > have been noticed by now given it's Windows XP... And the file is
>> > locally held too.
>>
>> I don't know how Windows's disk cache works but theoretically when you
>> set synchronous to OFF (0) OS can write database pages to disk in any
>> order it likes, disregarding the order SQLite requires. So while one
>> process writes those pages, another process can read those pages and
>> because of random writing order second process can read inconsistent
>> data, meaning it sees corrupted database.
>>
>
> Pavel, please tell me you are wrong.  Surely windows maintains disk cache
> coherency even in the absence of explicit FlushFileBuffers() calls?
>
> Can any windows experts comment on this?
>
>
>
>>
>>
>> Pavel
>>
>>
>> On Wed, Apr 13, 2011 at 7:14 AM, James Green 
>> wrote:
>> > Yep we've read through that.
>> >
>> > Several of the possibilities are difficult to rule out of course. They
>> > just seem highly unlikely (!) given the rate of corruption across our
>> > sites.
>> >
>> > The fact that one engineer installed a site, began operating the app,
>> > then saw it become corrupt minutes later rules out power loss or hard
>> > resets in at least that case. An operating system level problem should
>> > have been noticed by now given it's Windows XP... And the file is
>> > locally held too.
>> >
>> > Is it possible that we're seeing something nasty occurring as a result
>> > of anti-virus software perhaps? I might expect a higher rate of
>> > corruption if so.
>> >
>> > The only avenue apparently open to us is to use WAL mode. Would WAL +
>> > sync=normal provide a much better resistance to corruption? We
>> > (probably) don't care about losing the last set of SQL statements.
>> >
>> > sync=full does not work well for our app (no transactions). Far too slow.
>> >
>> > James
>> >
>> > On 13 April 2011 11:47, Richard Hipp  wrote:
>> >> On Tue, Apr 12, 2011 at 5:15 AM, James Green > >wrote:
>> >>
>> >>>  My question is does anyone have pointers
>> >>> to help us isolate the problems we are seeing.
>> >>>
>> >>
>> >> http://www.sqlite.org/howtocorrupt.html
>> >>
>> >> --
>> >> D. Richard Hipp
>> >> d...@sqlite.org
>> >> ___
>> >> sqlite-users mailing list
>> >> sqlite-users@sqlite.org
>> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >>
>> > ___
>> > sqlite-users mailing list
>> > sqlite-users@sqlite.org
>> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>> >
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
>
>
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Richard Hipp
On Wed, Apr 13, 2011 at 10:33 AM, Simon Slavin  wrote:

>
> On 13 Apr 2011, at 12:14pm, James Green wrote:
>
> > sync=full does not work well for our app (no transactions). Far too slow.
>
> If you're not syncing, then section 3.2 of the page Richard probably
> indicates what's causing your corruption.
>

We have assumed that disabling synchronous requires a power failure or hard
reset in order to cause corruption.  But that is based on the premise that
operating system disk case is coherent.  In other words, if two processes
running on the same machine have the same disk file open, and if process A
writes pages 1 and 2, in that order, and process B reads pages 1 and 2 at
the same time, then process B will either see none of A's change, or B will
see the changes to page 1 and not page 2, or B will see both change.  It
will never be the case that B will see the changes to page 2 but not the
changes to page 1.

Can somebody please confirm for me that windows works this way?

If it turns out that I'm wrong and the windows disk cache is not coherent,
then you will indeed need to set synchronous=FULL on windows in order to
prevent corruption.  Or, perhaps we can find some magic I/O barrier system
call for windows that we can insert in place of the FlushFileBuffers() when
synchronous is OFF.  But for now, until I get better information, I'm going
to assume that he windows disk cache is indeed coherent and that none of
this is necessary.



>
> So for a while, leave that PRAGMA alone, accept that you're going to get
> slow running, and see whether this makes the corruption go away.  If it does
> then you can tackle the speed problem separately without worrying about data
> corruption.
>
> It might be worth trying to figure out whether your storage system can
> handle the speed you're expecting.  If you have multiple threads all trying
> to write at the same time can your hard disk handle that much data at that
> speed ?  If not, then that's probably why you're finding the use of 'sync'
> to be too slow.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Jay A. Kreibich
On Wed, Apr 13, 2011 at 09:53:27AM -0400, Pavel Ivanov scratched on the wall:
> > The fact that one engineer installed a site, began operating the app,
> > then saw it become corrupt minutes later rules out power loss or hard
> > resets in at least that case. An operating system level problem should
> > have been noticed by now given it's Windows XP... And the file is
> > locally held too.
> 
> I don't know how Window's disk cache works but theoretically when you
> set synchronous to OFF (0) OS can write database pages to disk in any
> order it likes, disregarding the order SQLite requires. So while one
> process writes those pages, another process can read those pages and
> because of random writing order second process can read inconsistent
> data, meaning it sees corrupted database.

  That may be, but I can't see that being a problem:

  1)  The pages hit the OS file cache, and are going to be pulled back
  out of the cache.  The pages will show up in the OS file cache
  in the order they are "written."

  2)  Turning synchronous off does not turn locking off.  Even if the
  page order was changed, other SQLite processes can't touch the
  file until the first process finishes writing and unlocks the
  database.  At that point, all the pages should be in the file
  cache.

-j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Pete Attkins

On 2011-04-13, at 15:25, Richard Hipp wrote:

> On Wed, Apr 13, 2011 at 9:53 AM, Pavel Ivanov   
> wrote:
>
>>> The fact that one engineer installed a site, began operating the  
>>> app,
>>> then saw it become corrupt minutes later rules out power loss or  
>>> hard
>>> resets in at least that case. An operating system level problem  
>>> should
>>> have been noticed by now given it's Windows XP... And the file is
>>> locally held too.
>>
>> I don't know how Windows's disk cache works but theoretically when  
>> you
>> set synchronous to OFF (0) OS can write database pages to disk in any
>> order it likes, disregarding the order SQLite requires. So while one
>> process writes those pages, another process can read those pages and
>> because of random writing order second process can read inconsistent
>> data, meaning it sees corrupted database.
>>
>
> Pavel, please tell me you are wrong.  Surely windows maintains disk  
> cache
> coherency even in the absence of explicit FlushFileBuffers() calls?
>
> Can any windows experts comment on this?

Of course OS disk caches maintain coherency. The only exceptions to  
this is power failure and hardware malfunction.

FlushFileBuffers() may return before all pending writes have been  
written to the disk surface rendering the function useless. Some  
drives acknowledge a flush request once the data has been written to  
their internal cache instead of waiting until the data has been  
written to the disk surface. Many IDE drives are notorious for this  
problem, which is why "robust" systems used to have SCSI drives.

> [...]

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Simon Slavin

On 13 Apr 2011, at 12:14pm, James Green wrote:

> sync=full does not work well for our app (no transactions). Far too slow.

If you're not syncing, then section 3.2 of the page Richard probably indicates 
what's causing your corruption.

So for a while, leave that PRAGMA alone, accept that you're going to get slow 
running, and see whether this makes the corruption go away.  If it does then 
you can tackle the speed problem separately without worrying about data 
corruption.

It might be worth trying to figure out whether your storage system can handle 
the speed you're expecting.  If you have multiple threads all trying to write 
at the same time can your hard disk handle that much data at that speed ?  If 
not, then that's probably why you're finding the use of 'sync' to be too slow.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Richard Hipp
On Wed, Apr 13, 2011 at 9:53 AM, Pavel Ivanov  wrote:

> > The fact that one engineer installed a site, began operating the app,
> > then saw it become corrupt minutes later rules out power loss or hard
> > resets in at least that case. An operating system level problem should
> > have been noticed by now given it's Windows XP... And the file is
> > locally held too.
>
> I don't know how Windows's disk cache works but theoretically when you
> set synchronous to OFF (0) OS can write database pages to disk in any
> order it likes, disregarding the order SQLite requires. So while one
> process writes those pages, another process can read those pages and
> because of random writing order second process can read inconsistent
> data, meaning it sees corrupted database.
>

Pavel, please tell me you are wrong.  Surely windows maintains disk cache
coherency even in the absence of explicit FlushFileBuffers() calls?

Can any windows experts comment on this?



>
>
> Pavel
>
>
> On Wed, Apr 13, 2011 at 7:14 AM, James Green 
> wrote:
> > Yep we've read through that.
> >
> > Several of the possibilities are difficult to rule out of course. They
> > just seem highly unlikely (!) given the rate of corruption across our
> > sites.
> >
> > The fact that one engineer installed a site, began operating the app,
> > then saw it become corrupt minutes later rules out power loss or hard
> > resets in at least that case. An operating system level problem should
> > have been noticed by now given it's Windows XP... And the file is
> > locally held too.
> >
> > Is it possible that we're seeing something nasty occurring as a result
> > of anti-virus software perhaps? I might expect a higher rate of
> > corruption if so.
> >
> > The only avenue apparently open to us is to use WAL mode. Would WAL +
> > sync=normal provide a much better resistance to corruption? We
> > (probably) don't care about losing the last set of SQL statements.
> >
> > sync=full does not work well for our app (no transactions). Far too slow.
> >
> > James
> >
> > On 13 April 2011 11:47, Richard Hipp  wrote:
> >> On Tue, Apr 12, 2011 at 5:15 AM, James Green  >wrote:
> >>
> >>>  My question is does anyone have pointers
> >>> to help us isolate the problems we are seeing.
> >>>
> >>
> >> http://www.sqlite.org/howtocorrupt.html
> >>
> >> --
> >> D. Richard Hipp
> >> d...@sqlite.org
> >> ___
> >> sqlite-users mailing list
> >> sqlite-users@sqlite.org
> >> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >>
> > ___
> > sqlite-users mailing list
> > sqlite-users@sqlite.org
> > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
> >
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Pavel Ivanov
> The fact that one engineer installed a site, began operating the app,
> then saw it become corrupt minutes later rules out power loss or hard
> resets in at least that case. An operating system level problem should
> have been noticed by now given it's Windows XP... And the file is
> locally held too.

I don't know how Windows's disk cache works but theoretically when you
set synchronous to OFF (0) OS can write database pages to disk in any
order it likes, disregarding the order SQLite requires. So while one
process writes those pages, another process can read those pages and
because of random writing order second process can read inconsistent
data, meaning it sees corrupted database.


Pavel


On Wed, Apr 13, 2011 at 7:14 AM, James Green  wrote:
> Yep we've read through that.
>
> Several of the possibilities are difficult to rule out of course. They
> just seem highly unlikely (!) given the rate of corruption across our
> sites.
>
> The fact that one engineer installed a site, began operating the app,
> then saw it become corrupt minutes later rules out power loss or hard
> resets in at least that case. An operating system level problem should
> have been noticed by now given it's Windows XP... And the file is
> locally held too.
>
> Is it possible that we're seeing something nasty occurring as a result
> of anti-virus software perhaps? I might expect a higher rate of
> corruption if so.
>
> The only avenue apparently open to us is to use WAL mode. Would WAL +
> sync=normal provide a much better resistance to corruption? We
> (probably) don't care about losing the last set of SQL statements.
>
> sync=full does not work well for our app (no transactions). Far too slow.
>
> James
>
> On 13 April 2011 11:47, Richard Hipp  wrote:
>> On Tue, Apr 12, 2011 at 5:15 AM, James Green wrote:
>>
>>>  My question is does anyone have pointers
>>> to help us isolate the problems we are seeing.
>>>
>>
>> http://www.sqlite.org/howtocorrupt.html
>>
>> --
>> D. Richard Hipp
>> d...@sqlite.org
>> ___
>> sqlite-users mailing list
>> sqlite-users@sqlite.org
>> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>>
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread James Green
Yep we've read through that.

Several of the possibilities are difficult to rule out of course. They
just seem highly unlikely (!) given the rate of corruption across our
sites.

The fact that one engineer installed a site, began operating the app,
then saw it become corrupt minutes later rules out power loss or hard
resets in at least that case. An operating system level problem should
have been noticed by now given it's Windows XP... And the file is
locally held too.

Is it possible that we're seeing something nasty occurring as a result
of anti-virus software perhaps? I might expect a higher rate of
corruption if so.

The only avenue apparently open to us is to use WAL mode. Would WAL +
sync=normal provide a much better resistance to corruption? We
(probably) don't care about losing the last set of SQL statements.

sync=full does not work well for our app (no transactions). Far too slow.

James

On 13 April 2011 11:47, Richard Hipp  wrote:
> On Tue, Apr 12, 2011 at 5:15 AM, James Green wrote:
>
>>  My question is does anyone have pointers
>> to help us isolate the problems we are seeing.
>>
>
> http://www.sqlite.org/howtocorrupt.html
>
> --
> D. Richard Hipp
> d...@sqlite.org
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread Richard Hipp
On Tue, Apr 12, 2011 at 5:15 AM, James Green wrote:

>  My question is does anyone have pointers
> to help us isolate the problems we are seeing.
>

http://www.sqlite.org/howtocorrupt.html

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-13 Thread James Green
Another follow-up in the hope someone can shed light on the situation
or ask for further specific testing.

I am told the only thing we change from the default settings is that
synchronous=0. Apparently full mode was far too slow. Reading the
documentation it clearly states that there are limited circumstances
in which corruption can occur but that they involve O/S crash or power
loss between the time the application writes data and the point that
the hardware actually commits it to physical media.

An engineer that went in to "rescue" one of our sites got out app up
and running on a fresh db, then managed to get corruption again inside
a few minutes - he confirms there was absolutely no crash of any kind.
This is most worrying...

Filip Navara from this list kindly asked to inspect a sample database
and has told me that the journal appears partially written.

Our support department reports that the frequency of sites becoming
corrupt is in the region of 1-2 per week and it's been happening for
several weeks (since we picked up on the pattern). We're looking at
roughly 30 instances so far.

Thoughts and recommendations welcome.

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-12 Thread James Green
Incidentally running pragma integrity_check on one of the samples I
have been given returns:

*** in database main ***
On tree page 1824 cell 0: invalid page number 218103808
On page 73608 at right child: 2nd reference to page 92497
Page 92110 is never used
Page 92494 is never used
Page 92496 is never used
Page 92499 is never used
Page 92508 is never used
Page 92511 is never used
Page 92512 is never used
Page 92513 is never used
Error: database disk image is malformed

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-12 Thread James Green
On 12 April 2011 11:27, Simon Slavin  wrote:
>
> On 12 Apr 2011, at 10:15am, James Green wrote:
[ ... ]
> Was it built threadsafe ?  See
>
> http://www.sqlite.org/threadsafe.html

Apparently so - the odbc drivers use -DTHREADSAFE=1 when building sqlite.

[ ... ]

> You don't mention which parts of the documentation you're aware of, so I'll 
> point to the above and section 6 of
>
> 

I can confirm the following:

1. Power loss is not causing this
2. The database file is held locally (not on a network share of any sort)
3. We are not using transactions

Any further queries welcomed.

Thanks,

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Corruption on many Windows XP hosts

2011-04-12 Thread Simon Slavin

On 12 Apr 2011, at 10:15am, James Green wrote:

> We recently began shipping a third component - a Java app which again
> connected by ODBC. However, it used threads and the database fell over
> inside of hours. We found and fixed mulitple thread safety issues and
> the rate of corruption reduced signficantly. However the corruption
> issue remained albeit smaller in frequency.
> 
> In the past month or so we have tested more intensively with
> sqliteodbc-0.88 (sqlite 3.7.4 and 3.7.5 with our own build).

Was it built threadsafe ?  See

http://www.sqlite.org/threadsafe.html

> We can
> still corrupt the database however. So a couple of weeks ago we
> switched to using WAL mode which so far hasn't had any corruption to
> my knowledge.
> 
> I suspect that we are avoiding the root cause by switching the
> behaviour of sqlite on disk. My question is does anyone have pointers
> to help us isolate the problems we are seeing. We can obviously
> provide further detail to your questions.

You don't mention which parts of the documentation you're aware of, so I'll 
point to the above and section 6 of



just in case.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


[sqlite] Corruption on many Windows XP hosts

2011-04-12 Thread James Green
We have an application that we distribute to our clients each of whom
run Windows - often it's a desktop XP machine. Our app consists of a
set of Delphi services and a PHP web front end each of which use ODBC
for database connectivity.

The vast majority of installations have shipped with sqliteodbc-0.83
(sqlite 3.6.14.2). Over several weeks a number of these installations
have broken down - our support staff have noticed problems with the
sqlite databases which turn out to be corrupt.

We recently began shipping a third component - a Java app which again
connected by ODBC. However, it used threads and the database fell over
inside of hours. We found and fixed mulitple thread safety issues and
the rate of corruption reduced signficantly. However the corruption
issue remained albeit smaller in frequency.

In the past month or so we have tested more intensively with
sqliteodbc-0.88 (sqlite 3.7.4 and 3.7.5 with our own build). We can
still corrupt the database however. So a couple of weeks ago we
switched to using WAL mode which so far hasn't had any corruption to
my knowledge.

I suspect that we are avoiding the root cause by switching the
behaviour of sqlite on disk. My question is does anyone have pointers
to help us isolate the problems we are seeing. We can obviously
provide further detail to your questions.

Many thanks,

James
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users