[sqlite] PRAGMA Synchronous safety
>So would it be possible to run that command each time you open the config >database and after any change to it ? That would give us a perfect way to >find out which commands were causing your problems.< Not really possible. The average update rare is low, but there are times when hundreds of settings are written, depending on which changes the user makes in preferences etc. Users can update settings from custom scripts, which may mean one update per session or hundreds per minute. Running a 5 second integrity check after each write would bring down performance badly. I now also run an integrity_check when closing the settings database during application shut-down and will seek to find a way to notify the user to retain the log file - in the hope that it contains more info. My users are no IT folks, just average users, moms & pops. Displaying scary error messages about damaged databases and asking to send log files will cause a lot of additional support and probably bad reviews in social media. Database damage is a very sensitive area.
[sqlite] PRAGMA Synchronous safety
> >So would it be possible to run that command each time you open the config > database and after any change to it ? That would give us a perfect way to > find out which commands were causing your problems.< > > Not really possible. The average update rare is low, but there are times when > hundreds of settings are written, depending on which changes the user > makes in preferences etc. Users can update settings from custom scripts, > which may mean one update per session or hundreds per minute. Running a > 5 second integrity check after each write would bring down performance > badly. > I had got the impression (and probably Simon had too) that the preferences database was a lot smaller, so the integrity check would be a lot quicker. > I now also run an integrity_check when closing the settings database during > application shut-down and will seek to find a way to notify the user to retain > the log file - in the hope that it contains more info. My users are no IT > folks, > just average users, moms & pops. Displaying scary error messages about > damaged databases and asking to send log files will cause a lot of additional > support and probably bad reviews in social media. Database damage is a very > sensitive area. > I wonder whether you could provide those that have suffered a corrupt database with "special" code with extra logging and checks. You could warn them about scary messages and longer delays. Explaining this is part of your investigations. These people already know there is a problem, so are unlikely to spread bad reviews. They may also be more likely to suffer another corruption if it relates to a particular workflow. As a user of your application I would be happy to help, unfortunately (not for me :^) I've never had a corrupted database. Regards Andy Ling
[sqlite] PRAGMA Synchronous safety
I dump the output to the log file so when a user sends me a log after the diagnosis reported a ?repaired? damage, I see one or more entries like: ?row 2481 missing from index idx_settings_sndidmnun? I will see if I can collect more log files in the coming months.
[sqlite] PRAGMA Synchronous safety
On 12 Mar 2015, at 2:51pm, Mario M. Westphal wrote: > I dump the output to the log file so when a user sends me a log after the > diagnosis reported a ?repaired? damage, I see one or more entries like: > > ?row 2481 missing from index idx_settings_sndidmnun? > > I will see if I can collect more log files in the coming months. They will not help much. We do need to see what is wrong with the database file, and they will tell us that. But we also need to know the circumstances the corruption occurred in. A big part of that is which command caused the corruption. So we need to spot the corruption immediately it occurs, not three or four commands later. In an earlier post you stated that corruption occurs even in your little configuration database. This should be small enough that you can run "PRAGMA integrity_check" on it whenever you want. So would it be possible to run that command each time you open the config database and after any change to it ? That would give us a perfect way to find out which commands were causing your problems. Simon.
[sqlite] PRAGMA Synchronous safety
One thing to add: I was sometimes successful to remote-repair a corrupted database by telling the user how to use sqlite3.exe and calling REINDEX. After learning that, I added this to my diagnosis routine so if integrity_check() returns something that?s not ?ok?, my application runs a REINDEX on the database and then runs integrity_check again. These cases are now automatically repaired and usually go unnoticed by the user. I tried to find a reproducible case where indices get broken or out of sync with the pages, but was unable to.
[sqlite] PRAGMA Synchronous safety
I have enabled the error callback and it logs everything except SQLITE_SCHEMA and SQLITE_OK. In the log files I have received so far, none of these were reported (I use special headers to identify SQLite errors because log files are often several hundred MB in size). I?m waiting for a log file which actually containers some SQLite error info before the disk image corrupted error is logged. I?ve asked the users who had the problem once to run the diagnostics more often so we catch problems faster. But often it?s just a ?Worked yesterday, today it does not start?. And then I get the log file and it shows ?disk image malformed? right after the database open procedure. It?s really hard to get more info about this, but I?ll try.
[sqlite] PRAGMA Synchronous safety
On 3/12/15, Mario M. Westphal wrote: > One thing to add: > > > > I was sometimes successful to remote-repair a corrupted database by telling > the user how to use sqlite3.exe and calling REINDEX. > > > > After learning that, I added this to my diagnosis routine so if > integrity_check() returns something that?s not ?ok?, my application runs a > REINDEX on the database and then runs integrity_check again. These cases are > now automatically repaired and usually go unnoticed by the user. I tried to > find a reproducible case where indices get broken or out of sync with the > pages, but was unable to. > I'm sure your users appreciate the automatic fixes. So that's probably the right thing to do. Nevertheless, this is really just masking a much deeper problem that needs to be rooted out. Can you change your application so that when an automatic repair does occur, it saves off a copy of the "PRAGMA integrity_check" output as well as other diagnostic information into a file, then asks the user to email that file to you? -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
> Why are you using shared cache, especially with WAL? Are your devices memory > constrained? I was under the impression that shared cache has performance benefits when used in a MT environment.
[sqlite] PRAGMA Synchronous safety
> If those databases are small then running "PRAGMA integrity_check" on them > should be very fast. My application does that. This is how users find out that their settings database has become corrupted. So far it is just reported and then the config database is restored from last working backup. I did not anticipate that this small database will ever become damaged, except in the case of blue-screens or power failures. For this a simple ?Config database is damaged, restoring from last backup?? message to the user seemed sufficient. No detailed logging is implemented at this time.
[sqlite] PRAGMA Synchronous safety
I don?t run pragma quick_check in my application ? and I don?t have any corrupted database here. I only ever see the log files when my application reports a damaged application, and there I see the ?disk image malformed? error message. The diagnosis routine included in my application and which runs weekly by default performs logical tests on the data stored in the database, but also low-level tests like pragma integrity_check to ensure the SQLite database is physically correct. Either the damage is discovered here (in these cases I dump the first 100 rows returned by integrity_check into the log file) or during normal operation, e.g. while opening a database.
[sqlite] PRAGMA Synchronous safety
On 11 Mar 2015, at 5:33pm, Mario M. Westphal wrote: > All recent occurrences of the problem where on local hard disks / SSD disks. > It even affects the small configuration databases my application maintains in > SQLite, with the FULL synch mode for maximum safety. These also use WAL / > shared_cache but are only updated a couple of times per minute and have to > endure much less stress by concurrent multi-thread read/write. If those databases are small then running "PRAGMA integrity_check" on them should be very fast. Can you have your app run that PRAGMA each time it starts up and quits and show a warning if it finds anything ? Simon.
[sqlite] PRAGMA Synchronous safety
Sorry, typing on a handheld with too thick thumbs :) The database files can be on remote storage (e.g. Windows server or NAS) (I know that this may be a factor and we discussed this already and I know the texts on your web site about false/incomplete locking etc). I just mentioned that here to show that using pragma quick_check is not practicable in a production environment when you deal with larger databases. All recent occurrences of the problem where on local hard disks / SSD disks. It even affects the small configuration databases my application maintains in SQLite, with the FULL synch mode for maximum safety. These also use WAL / shared_cache but are only updated a couple of times per minute and have to endure much less stress by concurrent multi-thread read/write.
[sqlite] PRAGMA Synchronous safety
One process only.
[sqlite] PRAGMA Synchronous safety
>All recent occurrences of the problem where on local hard disks / SSD >disks. It even affects the small configuration databases my application >maintains in SQLite, with the FULL synch mode for maximum safety. These >also use WAL / shared_cache but are only updated a couple of times per >minute and have to endure much less stress by concurrent multi-thread >read/write. Why are you using shared cache, especially with WAL? Are your devices memory constrained? --- Theory is when you know everything but nothing works. Practice is when everything works but no one knows why. Sometimes theory and practice are combined: nothing works and no one knows why.
[sqlite] PRAGMA Synchronous safety
The problem is that the databases usually are several gigabytes in size, which make them too large for the users to send them to me. A pragma quick_check takes one to two minutes, if the database is on remove storage even more? I currently only have a small 30 MB database which reports ?ok? for the pragma quick_check but ?row 2481 missing from index idx_settings_sndidmnun? for pragma integrity_check. This is only a small settings database, and shows none of the disk image malformed problems. I can send you that DB if that helps. Let me know where to. A while ago I had a damaged database, but I have deleted for space reasons. Sorry. I will try to get a damaged database here as soon as a customer reports another damage.
[sqlite] PRAGMA Synchronous safety
PRAGMA quick_check Talks ~ 120s on an average size (4 GB) database. Database is on a SSD. Cold database, right after open, nothing in the Windows file cache yet. Running it again takes about 100 seconds, not much faster. Nothing you can run very often or in the background because it keeps the disk busy and blocks other transactions and database access.
[sqlite] PRAGMA Synchronous safety
On 11 Mar 2015, at 2:21pm, Mario M. Westphal wrote: > I currently only have a small 30 MB database which reports ?ok? for the > pragma quick_check but > > ?row 2481 missing from index idx_settings_sndidmnun? for pragma > integrity_check. If some of the corruption you're seeing can't be spotted by "PRAGMA quick_check" then don't worry about implementing it. Simon.
[sqlite] PRAGMA Synchronous safety
I use 3.8.8.1, source id is 2015-01-20 16:51:25 f73337e3e289915a76ca96e7a05a1a8d4e890d55 I compile the amalgamation using Visual Studio 2012. The options I use are: encoding='UTF-16le'; journal_mode=WAL; wal_autocheckpoint=2; // better bulk speed inserts locking_mode=EXCLUSIVE; synchronous=NORMAL; page_size=4096; cache_size=16384; auto_vacuum=NONE; temp_store=MEMORY; foreign_keys=ON; I implement nested transactions using checkpoints, with an outer BEGIN TRANSACTION. 10 or more threads may have a database connection open, but each uses its own sqlite3 connection created via sqlite3_open_v2(). There are typically many readers, but only one writer (BEGIN IMMEDIATE) is used explicitly or implicity. Unfortunately, the database damage seems to happen in rarely used sections of the database file so the problem is only detected during the (usually) weekly diagnosis runs. I?m waiting for a ticket where the user encounters the damage problem and still has all the log files available. Since my application logs all non-OK SQLite return codes plus has an error callback, such a log file may provide additional info about how and when the damage happened.
[sqlite] PRAGMA Synchronous safety
On 11 Mar 2015, at 1:56pm, Mario M. Westphal wrote: > PRAGMA quick_check > > Talks ~ 120s on an average size (4 GB) database. Can you just verify for us that that PRAGMA does spot the corruption you've mentioned ? It may not be practical to build it into your app but knowing whether the corruption is spotted or not tells us useful things. Simon.
[sqlite] PRAGMA Synchronous safety
On 3/11/15, Mario M. Westphal wrote: > I don?t run pragma quick_check in my application ? and I don?t have any > corrupted database here. I only ever see the log files when my application > reports a damaged application, and there I see the ?disk image malformed? > error message. > > > > The diagnosis routine included in my application and which runs weekly by > default performs logical tests on the data stored in the database, but also > low-level tests like pragma integrity_check to ensure the SQLite database is > physically correct. Either the damage is discovered here (in these cases I > dump the first 100 rows returned by integrity_check into the log file) or > during normal operation, e.g. while opening a database. > OK. As you no doubt recognize, the more information you can feed us the better. Like, maybe capturing the smaller configuration database when it fails and sending it too us. Am I correct in understanding that you have enabled the error and warning log (https://www.sqlite.org/errlog.html) and it is not showing anything unusual? -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
On 11 Mar 2015, at 10:35am, Mario M. Westphal wrote: > My application logs all error codes from SQLite but I have not seen anything > unusual in the logs provided to me by customers. I have even added an error > callback for SQLite (as per our recent discussion), to get more info about > the error. In the log files I have seem, this handler was not called and > there are no unusual errors. But the database corruption may be detected days > after the actual error happened, and then the original logs are gone. I > depend on my users to get logs and that does not work too well. Please take one of the databases which has been corrupted and see what the results of PRAGMA quick_check on it are. The intention is to find out whether you have a fast method of knowing that your database is corrupt. If the above does reliably spot the type of corruption you're seeing, you can incorporate it in some operation your app does frequently. Like opening or closing the database. This might give you a way to spot corruption much sooner after it occurs than to wait for SQLite to try to use the bit of the database which is corrupt. Simon.
[sqlite] PRAGMA Synchronous safety
I?m also concerned about this. I have used SQLite since around 2008 with great success. For the latest version of my application I decided to switch to using WAL mode and shared cache, to gain better performance. I use pragma synchronous=NORMAL and for a highly-critical (yet small, 2 MB) database even synchronous=FULL. My application is multi-threaded, but each thread uses a separate instance of SQLite. Typical database sizes are from 2 MB to 8 GB. Recently I get an alarming large number of reports about databases with the ?database disk image is malformed?. Pragma integrity_check() reports a variety of errors in these cases. The typical scenario is: Windows 7 or 8. One user. Database on a local hard disk or SSD. No power failure. No blue-screen or other issue from the ?How to damage your SQLite database? help topic. The error just happens. It may cause SQLite to refuse loading the database, or it is discovered when my users run the routine diagnosis, which includes a pragma integrity_check() and if it returns anything other than ?OK? it flags the database as defect. Users then need to restore the database from the backup. A database once marked as defect will always stay in that state. My application logs all error codes from SQLite but I have not seen anything unusual in the logs provided to me by customers. I have even added an error callback for SQLite (as per our recent discussion), to get more info about the error. In the log files I have seem, this handler was not called and there are no unusual errors. But the database corruption may be detected days after the actual error happened, and then the original logs are gone. I depend on my users to get logs and that does not work too well. Questions: Is there a higher risk in using WAL and/or shared_cache? In a mulit-threaded environment? Is it safe to switch existing databases back from WAL to non-wall mode? Is there a specific workflow? SQLite was always so reliable and I had maybe one report about a damaged database in 3 months (and usually it was a hard defect or a power failure). But now I get reports about damaged databases every week, sometimes even for new databases which have been created an hour ago. I fear that by using WAL/shared_cache with multiple SQLIte instances in multiple threads I somehow stress out SQLite, causing database damage under some conditions. I would go back to the tried-and-true non-WAL no shared-cache mode and let a few hundred users test that for a while to see if this causes a drop in damaged databases.
[sqlite] PRAGMA Synchronous safety
On 3/11/15, Mario M. Westphal wrote: > > I use pragma synchronous=NORMAL and for a highly-critical (yet small, 2 MB) > database even synchronous=FULL. synchronous should not matter, except following a power-outage. > > My application is multi-threaded, but each thread uses a separate instance > of SQLite. > Are there ever multiple processes writing to the database? Or is it always just one process at a time? -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
On 3/11/15, Mario M. Westphal wrote: > > A pragma quick_check takes one to two minutes, if the database is on remove > storage even more? > In your original email, you said that all database files were on a local disk. This message implies that sometimes they are on a remote filesystem. (I'm guessing that "remove storage" above is a typo for "remote storage".) As this is an important factor in tracking down the problem, please let us know which it is. Thanks. -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
On 3/11/15, Mario M. Westphal wrote: > > The typical scenario is: Windows 7 or 8. One user. Database on a local hard > disk or SSD. No power failure. No blue-screen or other issue from the ?How > to damage your SQLite database? help topic. The error just happens. It may > cause SQLite to refuse loading the database, or it is discovered when my > users run the routine diagnosis, which includes a pragma integrity_check() > and if it returns anything other than ?OK? it flags the database as defect. > Users then need to restore the database from the backup. A database once > marked as defect will always stay in that state. Do you have some example corrupt database files that you can share? Can you show the complete output of "PRAGMA integrity_check"? Can you alter your application so that it invokes "PRAGMA quick_check" more frequently, so that the problem is detected sooner? Can you provide us more detail on your work-load. How many separate threads are accessing the database from within a single process? How many processes are accessing the database? Are they all writing, or are some just reading? What are your typical transactions? -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
On 3/11/15, Mario M. Westphal wrote: > > Is there a higher risk in using WAL and/or shared_cache? In a mulit-threaded > environment? > No. WAL should be "safer" than rollback as it is less prone to problems due to malfunctioning FlushFileBuffers() system calls. Multithreading should not be a problem. > > > Is it safe to switch existing databases back from WAL to non-wall mode? Is > there a specific workflow? Simply run "PRAGMA journal_mode=DELETE;" But you have to do that when only a single database connection has the database open, or else it will refuse. > > > > SQLite was always so reliable and I had maybe one report about a damaged > database in 3 months (and usually it was a hard defect or a power failure). > But now I get reports about damaged databases every week, sometimes even for > new databases which have been created an hour ago. > > > > I fear that by using WAL/shared_cache with multiple SQLIte instances in > multiple threads I somehow stress out SQLite, causing database damage under > some conditions. It should just work. It should not be possible to "stress out" SQLite. We also would like to understand what is going on. What version of SQLite are you using? And what compiler are you using? -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
On 2 Mar 2015, at 10:32pm, Doug Nebeker wrote: >> Are you using any PRAGMAs apart from "PRAGMA synchronous" ? > > PRAGMA temp_store=1 (file) > PRAGMA cache_size=2000 > PRAGMA page_size=4096 > sqlite3_soft_heap_limit( 1GB ) > > Those are the only non-default settings. I see nothing in there (or the rest of your posts) that looks dangerous. So I still don't know why you're seeing corruption. I hope one of the other posts here will help you. >> Are you testing the result codes of /all/ your sqlite3_ calls > > Yes. But I usually don't see the full log to see if something has happened > earlier. I'll have to track down a log and see if the corruption error is > the first one that happens. Might be helpful. Also to perhaps make it clear to your users that something is wrong the first time you get a result that isn't SQLITE_OK. Simon.
[sqlite] PRAGMA Synchronous safety
On 3/2/15, Doug Nebeker wrote: > >> Depends on whether you are using WAL mode or not. > > I am not using WAL. > Then there is little performance impact from using PRAGMA synchronous=FULL versus PRAGMA synchronous=NORMAL. Both should work. I do not understand why you are seeing errors. Have you looked at https://www.sqlite.org/howtocorrupt.html to see if that offers any clues? -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
Thank you for your responses Simon and Richard. To your questions: > Are you using any PRAGMAs apart from "PRAGMA synchronous" ? PRAGMA temp_store=1 (file) PRAGMA cache_size=2000 PRAGMA page_size=4096 sqlite3_soft_heap_limit( 1GB ) Those are the only non-default settings. > Are you testing the result codes of /all/ your sqlite3_ calls Yes. But I usually don't see the full log to see if something has happened earlier. I'll have to track down a log and see if the corruption error is the first one that happens. > are you replacing it with a known good one before continuing Yes. Once we get the report, the user has to delete the file and start fresh with an empty database. > Depends on whether you are using WAL mode or not. I am not using WAL. Doug -Original Message- From: sqlite-users-bounces at mailinglists.sqlite.org [mailto:sqlite-users-boun...@mailinglists.sqlite.org] On Behalf Of Simon Slavin Sent: Monday, March 2, 2015 3:47 PM To: General Discussion of SQLite Database Subject: Re: [sqlite] PRAGMA Synchronous safety On 2 Mar 2015, at 4:45pm, Doug Nebeker wrote: > 1. Is setting synchronous to FULL necessary to prevent these corruption > problems? You should not be getting this corruption problem, so I don't know whether FULL would cure it. > 2. NORMAL is much slower than NONE. Is FULL much slower than NORMAL? Sorry but it's too hard to predict 'much' since the amount of time taken varies with different versions of the OS, width of the data bus, disk format, and other such things. All you can do is try it on your own specific setup. In the meantime ... Are you using any PRAGMAs apart from "PRAGMA synchronous" ? Are you testing the result codes of /all/ your sqlite3_ calls to check that they're SQLITE_OK ? Sometimes it's not the expected call which causes the corruption, or which first returns an error because the database is corrupt. Once the database is reported as corrupt, are you replacing it with a known good one before continuing ? Corruption is not automatically fixed and once a database is corrupt it will remain corrupt until something is done about it. Simon. ___ sqlite-users mailing list sqlite-users at mailinglists.sqlite.org http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
[sqlite] PRAGMA Synchronous safety
On 2 Mar 2015, at 4:45pm, Doug Nebeker wrote: > 1. Is setting synchronous to FULL necessary to prevent these corruption > problems? You should not be getting this corruption problem, so I don't know whether FULL would cure it. > 2. NORMAL is much slower than NONE. Is FULL much slower than NORMAL? Sorry but it's too hard to predict 'much' since the amount of time taken varies with different versions of the OS, width of the data bus, disk format, and other such things. All you can do is try it on your own specific setup. In the meantime ... Are you using any PRAGMAs apart from "PRAGMA synchronous" ? Are you testing the result codes of /all/ your sqlite3_ calls to check that they're SQLITE_OK ? Sometimes it's not the expected call which causes the corruption, or which first returns an error because the database is corrupt. Once the database is reported as corrupt, are you replacing it with a known good one before continuing ? Corruption is not automatically fixed and once a database is corrupt it will remain corrupt until something is done about it. Simon.
[sqlite] PRAGMA Synchronous safety
On 3/2/15, Doug Nebeker wrote: > In the documents, using PRAGMA synchronous 1 (NORMAL) is supposed to be safe > unless there is an OS crash or power failure. > > I've had a few customers (about one a month, out of thousands) that is > getting the "database disk image is malformed" ExtErr=11 error with the > NORMAL pragma, and they claim there was no crash, power failure or anything > else unusual. The database _does_ get hammered under normal operation. > > There are two separate sqlite3* handles to the database. Each one is > protected by a critical section, so there is never more than one thread > accessing a handle at a time. The database is on a local NTFS drive. > > Two questions: > > 1. Is setting synchronous to FULL necessary to prevent these corruption > problems? Not that we are aware of. > > 2. NORMAL is much slower than NONE. Is FULL much slower than NORMAL? > Depends on whether you are using WAL mode or not. -- D. Richard Hipp drh at sqlite.org
[sqlite] PRAGMA Synchronous safety
In the documents, using PRAGMA synchronous 1 (NORMAL) is supposed to be safe unless there is an OS crash or power failure. I've had a few customers (about one a month, out of thousands) that is getting the "database disk image is malformed" ExtErr=11 error with the NORMAL pragma, and they claim there was no crash, power failure or anything else unusual. The database _does_ get hammered under normal operation. There are two separate sqlite3* handles to the database. Each one is protected by a critical section, so there is never more than one thread accessing a handle at a time. The database is on a local NTFS drive. Two questions: 1. Is setting synchronous to FULL necessary to prevent these corruption problems? 2. NORMAL is much slower than NONE. Is FULL much slower than NORMAL? Thanks Doug