Re: [sqlite] Bug Report: All database opening blocked awaiting wal index rebuild

Brian Vincent Mon, 11 Jul 2016 12:58:20 -0700

Yes, you seem to understand the issue.  The issue only happens when using
shared caches.

I've reproduced the issue using both SERIALIZED and MULTITHREADED modes.

I'm surprised to hear you say that it might be an inherit limitation or
something not necessarily undesirable.  Of course, when using shared
caches, there must be locks to protect the global list of caches.  But
ideally, I would expect these times to be short lived.  If a situation ever
arises where all unrelated database opening, from all threads, is blocked
for long periods of time, that seems to me to be an obvious bug.  That's
the bug that I'm reporting.

Being an inherit limitation would seem to imply that there is no solution
to this problem, that having shared caches and WAL indexes rebuilding
necessarily should block all unrelated databases opening.  I don't see why
that should be the case and I'll explain some reasons why.  It's a little
bit hard for me to talk about it though, because I'm not entirely sure what
the lock SQLITE_MUTEX_STATIC_OPEN is protecting.  When iterating through
the list of shared caches, it acquires the lock SQLITE_MUTEX_STATIC_MASTER,
so the other OPEN lock must be for something else.  The comments say it's
to prevent a race condition and references "Ticket #3537", but I can't seem
to find that ticket.

The problematic line seems to be this one:
https://github.com/mackyle/sqlite/blob/master/src/btree.c#L2405

It seems that the only purpose of this line is to check to see if this is
an already shared pager-cache.  The problem is that this line ends up
acquiring a lock on the btree.  If there was simply a way to check if this
is an already shared pager-cache, without the possibility of waiting on a
lock, then this bug would be solved.

This is the checkin that originally created the open lock
https://www.sqlite.org/src/info/19fa5a29b97f017a

It doesn't appear that the problematic line existed at this time, which
leads me to believe that there were probably versions of sqlite released
that didn't have this bug.

It seems to me that when this open lock was created, it was expected that
the open code would always execute within a very short period of time.
Later on, someone added the call to sqlite3BtreeSchema, not realizing that
it could possibly get hung up on a lock, which could block all opening for
all databases.

Please let me know if I'm thinking about this problem clearly, or if you
would like me to test some things or write a simple test case.

Thanks,
Brian Vincent

On Sat, Jul 9, 2016 at 5:58 AM Olivier Mascia <[email protected]> wrote:

> > Le 9 juil. 2016 à 12:33, Simon Slavin <[email protected]> a écrit :
> >
> >> I'm really interested in knowing wether you use the engine in
> SERIALIZED or MULTITHREADED mode during this event reproduction?
> >
> > In other words, please read the last part of
> >
> > <https://www.sqlite.org/threadsafe.html>
> >
> > and try to reproduce your problem in "Multi-thread" mode.  If the
> problem still occurs, please try again in "Serialized" mode.
>
> In between I had a quick look at sqlite3.c code.  It looks like the source
> of the issue is expected. The MULTITHREADED mode relaxes usage of mutexes,
> albeit only on connections and statements structures.  The remaining bits
> of the engine still make uses of multiple mutexes to protect its integrity
> from multiple threads using the engine.
>
> See sqlite3BtreeOpen() and this section of code where the
> SQLITE_MUTEX_STATIC_OPEN is acquired.
> The SERIALIZED and MULTITHREADED modes both have SQLITE_THREADSAFE != 0
> (which is right).
>
> #if !defined(SQLITE_OMIT_SHARED_CACHE) && !defined(SQLITE_OMIT_DISKIO)
>   /*
>   ** If this Btree is a candidate for shared cache, try to find an
>   ** existing BtShared object that we can share with
>   */
>   if( isTempDb==0 && (isMemdb==0 || (vfsFlags&SQLITE_OPEN_URI)!=0) ){
>     if( vfsFlags & SQLITE_OPEN_SHAREDCACHE ){
>       int nFilename = sqlite3Strlen30(zFilename)+1;
>       int nFullPathname = pVfs->mxPathname+1;
>       char *zFullPathname = sqlite3Malloc(MAX(nFullPathname,nFilename));
>       MUTEX_LOGIC( sqlite3_mutex *mutexShared; )
>
>       p->sharable = 1;
>       if( !zFullPathname ){
>         sqlite3_free(p);
>         return SQLITE_NOMEM_BKPT;
>       }
>       if( isMemdb ){
>         memcpy(zFullPathname, zFilename, nFilename);
>       }else{
>         rc = sqlite3OsFullPathname(pVfs, zFilename,
>                                    nFullPathname, zFullPathname);
>         if( rc ){
>           sqlite3_free(zFullPathname);
>           sqlite3_free(p);
>           return rc;
>         }
>       }
> #if SQLITE_THREADSAFE
>       mutexOpen = sqlite3MutexAlloc(SQLITE_MUTEX_STATIC_OPEN);
>       sqlite3_mutex_enter(mutexOpen);
>       mutexShared = sqlite3MutexAlloc(SQLITE_MUTEX_STATIC_MASTER);
>       sqlite3_mutex_enter(mutexShared);
> #endif
>
> The whole issue then revolves around the SHARED_CACHE/PRIVATE_CACHE and
> not the SERIALIZED/MULTITHREADED mode. Chances are that if OP Brian tries
> the same scenario with PRIVATE_CACHE connections, the issue will not occur.
>
> I think this would probably qualify for an inherent limitation, which
> might be documented, but not something necessarily undesirable.
>
> --
> Meilleures salutations, Met vriendelijke groeten, Best Regards,
> Olivier Mascia, integral.be/om
>
>
>
> _______________________________________________
> sqlite-users mailing list
> [email protected]
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
>
_______________________________________________
sqlite-users mailing list
[email protected]
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Bug Report: All database opening blocked awaiting wal index rebuild

Reply via email to