Am 10.01.2013 15:31, schrieb Dan Kennedy: > On 01/10/2013 07:11 PM, Michael Schlenker wrote: >> Hi everyone, >> [snip] >> I'm pretty sure there was no DELETE for the missing object, but want to >> verify what happend by comparing the WAL files. >> >> Now i checked the wal with the tool/showwal.c program from the source >> distribution and see differences in the frames 50-91 of 288 total >> frames, so basically in the middle of the file. >> >> Now my questions: >> >> Is it a sign of file corruption that some frames in the middle of the >> WAL file are differing between those two snapshots? > > Possibly. But also possibly not. > > Normally, new frames are appended to the WAL file. However, after a > checkpoint occurs (copying the contents of the WAL into the database > file), SQLite wraps around and starts writing at the start of the > WAL file again. So you can get into a situation where you have a large > WAL file on disk with only a small number of valid frames at the > start of it. > > If a crash occurs and SQLite needs to read the WAL file (database > recovery), it can tell the difference between the valid frames at > the start of the WAL and the trailing garbage using the running > checksum embedded in each frame. > > However, it looks like showwal.c assumes the entire WAL file consists > of valid frames (it does not verify the checksums). So, it is possible > that on day 1 your WAL file may have contained just 49 valid frames - > not the 288 indicated by showwal. And that between day 1 and day 2 > 42 extra frames were appended to the WAL.
Thanks Dan, No, i checked this case. The checksums would explain my observations, e.g. some commits vanishing, as the end of the WAL is skipped/ignored due to checksum mismatch. The checksums are correct for the day 1 log, a SELECT finds pages that are later in the WAL than the last difference (frame 92 in my case). Some trivial grepping/python find() on the strings confirms they are at an offset beyond the last difference. In the day 2 log the checksums start to differ on frame 50 and are the same again on frame 92, which leads to those frames >= 92 being ignored. The file header info (salt and checksums) are identical for both files. If i read the code correct, this means no walRestartLog() has happend, as that seems to reset salt1 to a new value. In addition, i would not expect to see 49 identical frames after a walRestartLog(), before new ones are added. So this points to pWal->hdr.mxFrame not being setup correctly on entering sqlite3WalFrames(), which would cause an overwrite of the already written frames without a walRestartLog(). Which points at walIndexTryHdr() reading the maximum sane index from shared memory. And thats probably not working correctly on NFS/Network. So this is a sure way to corrupt the database if the shm code gets an outdated but not corrupted copy of the -shm file from the OS/filesystem and additional writes are done. Ok, so it seems i should implement some really strict asserts/checks so customers do not even try to put those files up on NFS, and just have some pretty error blow up in their face if they try on startup. Thanks. Michael -- Michael Schlenker Software Architect CONTACT Software GmbH Tel.: +49 (421) 20153-80 Wiener Straße 1-3 Fax: +49 (421) 20153-41 28359 Bremen http://www.contact.de/ E-Mail: m...@contact.de Sitz der Gesellschaft: Bremen Geschäftsführer: Karl Heinz Zachries, Ralf Holtgrefe Eingetragen im Handelsregister des Amtsgerichts Bremen unter HRB 13215 _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users