Hello Eli!

On 8/10/2013 4:54 AM, Eli Carter wrote:
> All,
>
> Everyone knows the "database is locked" issue with sqlite. According to
> http://trac.edgewall.org/wiki/MostFrequentDuplicates it should be gone
> with 0.12dev. But it isn't; I have a way to get that error for 15
> minutes at a time on 0.12.5.
Indeed. Because SQLite takes a whole-database lock for writes, any request
that runs for an unreasonably long time makes this error likely for every
other request that tries to write -- which most do, if only to save
cookies or persist the session.

> The root of the problem that I'm seeing is that viewing an individual
> file in the source browser will take 15 minutes if that file has only
> one commit in the repository history. This is because of the SQL query
> run in CachedRepository._next_prev_rev(), which is called in two forms
> during the request. That SQL query searches the node_change table and,
> for a node with a single commit, will return no results. In my
> particular case, that table has 3.5 million entries (from one svn repo).
> Running those two queries takes a total of ~950 seconds on my machine.
> During that time, if a commit is made to the repository, every request
> will attempt to sync the repository and fail because it cannot get a
> write lock. This includes requests that would otherwise be read-only
> operations.

Well, the sync-per-request problem is simple to avoid:

  http://trac.edgewall.org/wiki/TracRepositoryAdmin#ExplicitSync

... which doesn't mean we can't improve the case of the implicit sync.

> I'd like to improve this situation in a few ways:
>
> 1. Add a try/except around repo.sync() and turn that exception into an
> add_warning(). While this does nothing to address the real underlying
> issue, it makes the problem much less painful and allows the user to
> at least look at their data. (db-locked-mitigation-part1.patch)

Maybe we should add a hint about configuring explicit sync, something
like: "If this is a recurring problem, you can set up an _explicit
synchronization_" (with that linking to the same URL as given above).

> 2. Add an explicit duration check around SQL execute() calls when
> [trac] debug_sql is enabled, and log SQL operations that take over 10
> seconds to complete at the warn level.
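The duration check described in point 2 could look roughly like the
sketch below. This is a hypothetical wrapper, not Trac's actual cursor
code; the `TimedCursor` name and the delegation approach are mine, and
only the 10-second threshold comes from the patch description.

```python
import logging
import time

log = logging.getLogger("trac.db.timing")

SLOW_QUERY_THRESHOLD = 10.0  # seconds, per the patch description


class TimedCursor(object):
    """Wrap a DB-API cursor and warn about statements that run too long."""

    def __init__(self, cursor, threshold=SLOW_QUERY_THRESHOLD):
        self._cursor = cursor
        self._threshold = threshold

    def execute(self, sql, args=()):
        start = time.time()
        try:
            return self._cursor.execute(sql, args)
        finally:
            elapsed = time.time() - start
            if elapsed > self._threshold:
                log.warning("slow query (%.1f s): %s", elapsed, sql)

    def __getattr__(self, name):
        # Delegate everything else (fetchall, description, ...) unchanged.
        return getattr(self._cursor, name)
```

The point of the `finally` clause is that even a query aborted by
"database is locked" still gets its duration logged.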
> This will let an admin enable debug_sql, set the logging level to INFO,
> and see that the database is the bottleneck and which query is the
> cause. Identifying the exact query will allow developers to figure out
> the root of the issue and identify the specific operations that limit
> scalability. (db-locked-mitigation-part2.patch)

Good idea.

> 3. Address half of the root problem by making
> CachedRepository.previous_rev() call self.repo.previous_rev()
> unconditionally, not using the database cache. The database solution
> takes time proportional to the size of the node_change table, but the
> repository already has links to parent revs and can find the
> information much more directly. In the case I was testing, the
> previous_rev() call was taking over 200 seconds, and with the patch it
> now takes under a second. (part of db-locked-mitigation-part3.patch)

Ok, that seems to help. The "real" solution would be to find an
appropriate way to index that table and optimize that query. I tried a
few things so far, but without success: simply adding an index on the
repo/path won't help, as SQLite won't use it with that query (EXPLAIN
QUERY PLAN is helpful here).

> 4. That leaves the second half of the root problem; namely, that
> CachedRepository.next_rev() takes a very long time under these
> conditions (12+ minutes for my particular testcase). Worse,
> self.repo.next_rev() takes an even longer time (it did not complete
> over lunch).

That's why in #8813 I said "better not use directly the svnfs API and
use the DB instead". But we should really make a difference here between
next_rev and prev_rev, at least until we find a way to make the DB query
faster. Note that we already have a ticket for improving this specific
"query of hell", it's #11104 (reported in the context of changesets and
MySQL, but it's the same thing).

> While I do not have a solution for this, making the problem visible is
> a step in the right direction.
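As an aside on the EXPLAIN QUERY PLAN suggestion above: inspecting a
plan is easy to script. The sketch below uses a simplified stand-in for
the node_change schema (not Trac's actual table definition), and a
trivial equality query where the index *is* picked up -- the real
_next_prev_rev() query is more complex, which is exactly why its plan
needs inspecting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the real node_change table (schema simplified).
cur.execute("CREATE TABLE node_change (repos INTEGER, rev TEXT, path TEXT)")
cur.execute("CREATE INDEX node_change_repos_path "
            "ON node_change (repos, path)")

# Prefixing any SELECT with EXPLAIN QUERY PLAN returns the plan instead
# of the rows; the last column of each row says whether an index is used.
plan = cur.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT rev FROM node_change WHERE repos=? AND path=?",
    (1, "trunk/README")).fetchall()
for row in plan:
    print(row[-1])
```

For this equality-only query the detail column mentions the index; for a
query SQLite cannot optimize it reports a full table scan instead, which
is the symptom described above.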
> Therefore, I added a duration check around the next_rev() call which
> logs a warning if it takes more than 10 seconds. The warning includes a
> suggestion to enable [trac] debug_sql, to lead an admin to the
> database-level check from part 2. (the other part of
> db-locked-mitigation-part3.patch) It appears that on my system this is
> I/O bound in the sqlite database, leading me to doubt that Postgresql
> would solve the problem in my case, though I have not attempted that.
> This needs further investigation.

One "solution" proposed in #8639 was to make an indirect link (e.g.
?rev_following=...) with the link showing a hint about the risk that the
request could take a long time to complete.

> I think patches 1 and 2 should be reasonably uncontroversial. Patch #3
> may warrant some discussion and benchmarking. The patches are all
> against the 0.12-stable branch, but they will also apply (with offsets)
> to trunk.

I'll rebase and try them out. Thanks a lot for throwing in some work and
new ideas on this topic. Together with the mergeinfo issue that Jun has
started to tackle recently (#11219), those are some of the worst
performance issues we have.

> If feedback on the patches is favorable, I'll commit them to
> 0.12-stable and trunk. (Assuming I still can... it's been a while.)

No worries, you don't get removed unless you ask to be removed ;-)
OTOH, if you forgot your password in the meantime, just PM me.

-- Christian

--
You received this message because you are subscribed to the Google
Groups "Trac Development" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/trac-dev.
For more options, visit https://groups.google.com/groups/opt_out.
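P.S. The next_rev() duration check from point 4 could be sketched as a
small context manager along these lines. This is a hypothetical shape,
not the actual patch; the `warn_if_slow` name and the wording of the
hint are mine.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("trac.versioncontrol.cache")


@contextmanager
def warn_if_slow(what, threshold=10.0):
    """Log a warning when the wrapped block exceeds `threshold` seconds."""
    start = time.time()
    try:
        yield
    finally:
        elapsed = time.time() - start
        if elapsed > threshold:
            log.warning("%s took %.1f seconds; enable [trac] debug_sql "
                        "to see which queries are responsible",
                        what, elapsed)


# Hypothetical usage around the expensive call:
#   with warn_if_slow("next_rev(%s)" % rev):
#       nrev = repos.next_rev(rev)
```

The try/finally means the warning is still emitted if the guarded call
raises, e.g. on a "database is locked" error.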
