[sqlite] recurrent failure mode

2015-02-26 Thread Dave Dyer

>
>You might want to read my message on the topic from the list archives,
>dated Sat, 31 Jan 2015.  

In this case, no concurrent or multiple users are involved.  It's just
one client and the database.   There's still plenty of scope for the
networked file system to do things that make sqlite fail.



[sqlite] recurrent failure mode

2015-02-26 Thread Simon Slavin

On 26 Feb 2015, at 12:26am, Dave Dyer  wrote:

>> Do you have any multi-access things going on ?  Two or more computers, 
>> applications, processes or threads trying to access the database at the same 
>> time ?
> 
> No, but it would be normal for the database to be on a different
> computer than the sqlite client, and be using whatever networked 
> file system is common.  The culprit clients seem to be macs, we're
> still seeking more information about the specifics.

Sorry to have to tell you, but almost no implementations of network file 
systems implement locking properly.  So any situations involving simultaneous 
writing can lead to corruption.  But if you do not have simultaneous writing, 
there's no good reason for database corruption.  What you need to find out next 
is

OS version that the computer holding the file is running.
File system that the disk the file is on is formatted in.
OS version that the computer accessing the file is running.
Network protocol used to access the remote filespace (SMB or AFP or something 
like that).

Earlier you wrote

> I suppose that this might be a sqlite bug if the "insert records" step
> and the "maintain indexes" step were separated by a disk error and the
> rollback of the failed transaction was incomplete.

Now you've told me that you are checking the result codes returned by all 
calls, this would have to be a significant bug in SQLite and I don't think I've 
heard of any other reports of it.

Simon.


[sqlite] recurrent failure mode

2015-02-25 Thread Simon Slavin

On 25 Feb 2015, at 8:23pm, Dave Dyer  wrote:

> The facile explanation would be that a transaction to insert a new
> record was executed twice, but the indexes were incorrectly maintained.
> 
> INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
> Richardson','scrollPos','0');

Can you provide the schema (the CREATE TABLE and any CREATE INDEX commands) for 
that table ?

Do you have any multi-access things going on ?  Two or more computers, 
applications, processes or threads trying to access the database at the same 
time ?

Does your application check the result code returned from all sqlite3_ calls ?  
Not just the ones inserting rows, but also those opening and closing the file, 
setting PRAGMAs, or anything else it does with the sqlite3_ library ?

Simon.


[sqlite] recurrent failure mode

2015-02-25 Thread James K. Lowden
On Wed, 25 Feb 2015 16:26:45 -0800
Dave Dyer  wrote:

> >Do you have any multi-access things going on ?  Two or more
> >computers, applications, processes or threads trying to access the
> >database at the same time ?
> 
> No, but it would be normal for the database to be on a different
> computer than the sqlite client, and be using whatever networked 
> file system is common.  The culprit clients seem to be macs, we're
> still seeking more information about the specifics.

You might want to read my message on the topic from the list archives,
dated Sat, 31 Jan 2015.  

Although "bugs" are frequently blamed, in fact the semantics of
networked filesystems are different from that of local filesystems.
Making a database work on a network filesystem might be possible, but
requires considerable work to support cache coherency.  Expecting
SQLite to do that is not too different from expecting to read a letter
the moment it's dropped in the mailbox across town.  

Just consider this scenario:  two clients open the same database, where
the file is on a fileserver somewhere on the network.  Each one does,
say, "select * from T" and peruses the data.  Then, 

client A inserts record 17:
lock record
insert row
free lock

All good.  Now,

client B inserts record 17:
lock record
insert row
free lock

Should result in a primary key violation, right?  No.  

Client A has updated his *local* cache, his in-memory image of some
part of the database.  That's not a SQLite cache; that's the kernel's
filebuffer cache representing the state of some part of the filesystem.
It's completely correct from A's point of view.  SQLite depends on that
cache being correct, and it is: from A's point of view.  

If B is on the same machine as A, they share a single, kernel-provided
filebuffer cache, and when B attempt to insert the duplicate record,
SQLite will see A's record and reject the insert.  

N machines is N caches.  When A flushes his cache with sync(2), how
long before B learns of the change?  With N=1 (same machine), B
learns instantaneously.  With N > 1?  NFS promises only that B will
see A's changes after closing and reopening the file.  

When B is on a different machine, the local representation of the state
of the filesystem does not include A's update.  SQLite examines the
"file", sees no record 17, and updates its local image.  When the
kernel eventually flushes B's update, A's local cache becomes stale.  

With some "luck", you can actually go on like this for a while with no
one noticing.  As long as different clients are updating different
parts of the database and fortuitously refereshing their caches (by
re-reading updated parts that the network isn't helpfully caching,
too), it can seem to sort of work.  Failure is guaranteed in the most
important scenario: when SQLite requires a coherent cache and doesn't
have one.  

So it really doesn't matter if the locking mechanism on network
filesystems are perfect.  By *design*, they make weaker promises than
Posix.  Expecting them to do something they're documented not to do is
asking for trouble.  Which is to say, unfortunately, that you got what
you asked for.  :-/

HTH.  

--jkl




[sqlite] recurrent failure mode

2015-02-25 Thread Dave Dyer

>
>Can you provide the schema (the CREATE TABLE and any CREATE INDEX commands) 
>for that table ?

CREATE TABLE preference_table (
 preferenceSet text,/* name of this preference group */
 preferenceName text,   /* a preference in this group */
 preferenceValue text   /* sort order of this k...;
CREATE UNIQUE INDEX preferenceindex on 
preference_table(preferenceSet,preferenceName);



>Do you have any multi-access things going on ?  Two or more computers, 
>applications, processes or threads trying to access the database at the same 
>time ?

No, but it would be normal for the database to be on a different
computer than the sqlite client, and be using whatever networked 
file system is common.  The culprit clients seem to be macs, we're
still seeking more information about the specifics.

>Does your application check the result code returned from all sqlite3_ calls ? 
> Not just the ones inserting rows, but also those opening and closing the 
>file, setting PRAGMAs, or anything else it does with the sqlite3_ library ?

Yes.  It all goes through a common interface function which is
careful about checking.

As I said in the original message, this is something that has been
working without problems for a few years, the only thing that's changing
is the network and OS environment it's deployed in.  My hypothesis is
that a new failure mode in the file system is tickling a sqlite bug.
Based on the evidence available now, a transaction that is trying to 
insert 4 records fails, and is retried, resulting in 8 records which
can't be indexed.




[sqlite] recurrent failure mode

2015-02-25 Thread Dave Dyer

We're experiencing a new, recurrent failure mode in an old (ie; not recently 
changed) sqlite application.   This may be associated with buggy networked
file system implementations (thanks to apple and/or microsoft)

The apparent problem is that indexes on a small table become corrupted
by not being unique.  Except for the non-uniqueness of the index keys,
there's no apparent damage.

The facile explanation would be that a transaction to insert a new
record was executed twice, but the indexes were incorrectly maintained.

INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','scrollPos','0');
INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','nFill','0');
INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','placeInBW','0');
INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','DB_Subset','');


I suppose that this might be a sqlite bug if the "insert records" step
and the "maintain indexes" step were separated by a disk error and the
rollback of the failed transaction was incomplete.



[sqlite] recurrent failure mode

2015-02-25 Thread Dave Dyer

We're experiencing a new, recurrent failure mode in an old (ie; not recently 
changed) sqlite application.   This may be associated with buggy networked
file system implementations (thanks to apple and/or microsoft)

The apparent problem is that indexes on a small table become corrupted
by not being unique.  Except for the non-uniqueness of the index keys,
there's no apparent damage.

The facile explanation would be that a transaction to insert a new
record was executed twice, but the indexes were incorrectly maintained.

INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','scrollPos','0');
INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','nFill','0');
INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','placeInBW','0');
INSERT INTO "preference_table" VALUES('Picture Placer-707-1304b-19-Maranda 
Richardson','DB_Subset','');


I suppose that this might be a sqlite bug if the "insert records" step
and the "maintain indexes" step were separated by a disk error and the
rollback of the failed transaction was incomplete.