Hi mailing list gurus!
I will start with a TL;DR version, as this may be enough for some of you:
* We are trying to investigate an issue that we see in the diagnostic
data of our C++ product.
* We traced the issue to a timeout on `sqlite3_open_v2`, which
apparently takes more than 60s to complete (we only give it 60s).
* We tried many different configurations but were never able to
reproduce even a 5s delay on this call.
So the question is: are there any known scenarios in which
`sqlite3_open_v2` can take that long (on Windows)?
Now to the details:
* We are using SQLite version `3.10.2`. We went through the changelogs
from this version up to now, and nothing we found in the bug-fix sections
suggests that an issue fixed in later SQLite releases could have caused
our problem.
* The issue affects around 0.1% of unique users across all supported
versions of Windows (Win 7, Win 8, Win 10). There are no manual user
complaints/reports about it, which may suggest that the problem happens
when something serious enough is going on with the user's machine/system
that they don't expect anything to work anyway. So a system-wide failure
is a valid possibility, as long as it could plausibly happen to 0.1% of
random Windows users.
* There is no data indicating that the same issue has ever occurred on
Mac, which is also a supported platform with a large enough sample of
diagnostic data.
* We are using Poco (https://github.com/pocoproject/poco, version
1.7.2) to access our SQLite database, but we've analyzed the Poco code,
and a failure at that level can (possibly) explain only ~1% of all
collected samples. This is how we determined that the problem lies in
`sqlite3_open_v2` taking a long time.
* This happens in both `DELETE` and `WAL` journal modes.
* It seems that once this problem happens for the first time for a
particular user, every subsequent call to `sqlite3_open_v2` takes that
long until the user restarts the whole application (or possibly the
machine; there is no way to tell from our data).
* We are using the following flags for `sqlite3_open_v2` (as set by Poco):
> sqlite3_open_v2(..., ..., SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE | SQLITE_OPEN_URI, NULL);
* This usually doesn't happen at application startup, so it's unlikely
to be caused by something that happens while our application is not
running. That includes power cuts causing data corruption (which tends
to result in SQLITE_CORRUPT anyway, as mentioned in
https://www.sqlite.org/howtocorrupt.html).
* We were never able to reproduce this issue locally, even though we
tried a number of different things:
** Multiple threads writing to and reading from the DB, with the
synchronization required by the particular journaling mode.
** Keeping an SQLite connection open for a long time and working with
the DB normally in the meantime.
** Hitting the HDD hard with other data (dumping /dev/random (WSL)
into multiple files from different processes while accessing the DB
normally).
** Forcing antivirus software to scan the DB on every file access
(tested with Avast with basically everything enabled, including
"scan on open" and "scan on write").
** Breaking our internal synchronization required by the particular
journaling modes.
** Calling the WinAPI `CreateFile` on the DB file with all possible
combinations of file-sharing options; this caused issues, but
`sqlite3_open_v2` always returned quickly, just with an error.
** Calling the WinAPI `LockFile` on random parts of the DB file (which,
by the way, is a nice way of reproducing `SQLITE_IOERR`), but no luck
reproducing the issue discussed here.
** Some additional attempts to stress the Poco layer and double-check
that our static analysis of the code is right.
* We've looked for similar issues online, but the only somewhat
relevant thing we found was this thread:
http://sqlite.1065341.n5.nabble.com/sqlite3-open-v2-performance-degrades-as-number-of-opens-increase-td37482.html
It doesn't seem to explain our case, though, as the numbers of parallel
connections there are far beyond what we have, as well as what a typical
Windows user would have (unless there is some fairly popular app making
heavy use of SQLite that we don't know about).
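One refinement on our side could be to record the actual duration of every open, instead of only whether it crossed the 60s limit. Below is a minimal C++ sketch of that idea; the names `OpenTiming`, `time_open` and `simulated_slow_open` are hypothetical (not from our code, Poco, or SQLite), and in the real product the callable would wrap the `sqlite3_open_v2` call quoted above:

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical diagnostic record for one timed open attempt.
struct OpenTiming {
    std::chrono::milliseconds elapsed;  // wall-clock time the call took
    bool over_limit;                    // true if elapsed >= limit
};

// Runs `open_call` (hypothetically a lambda wrapping sqlite3_open_v2)
// and reports how long it took compared with `limit`.
// steady_clock is used so system clock adjustments cannot skew the result.
template <class F>
OpenTiming time_open(F&& open_call, std::chrono::milliseconds limit) {
    assert(limit.count() > 0);
    const auto start = std::chrono::steady_clock::now();
    open_call();
    const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    return OpenTiming{elapsed, elapsed >= limit};
}

// Stand-in for a slow open, used only to exercise the helper.
inline void simulated_slow_open() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Note that the duration is only known once the call returns, so a call that never returns would still have to be caught by a separate watchdog such as our existing 60s timeout.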
Do you have any ideas what could cause this issue?
Or maybe some hints on what else we should check, or what additional
diagnostic data we could collect from users to pinpoint the real reason
why this happens?
Thanks in advance :)
Andrzej 'Yester' Fiedukowicz
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users