So this is the cover letter for a patch-series that is a single patch, which I normally wouldn't do a cover letter for, but I thought I'd write some background and explanation for why (and how) it would be really good if people were to test this on their systems..
Part of the idea of the whole git save format was always that it would split your divelog into multiple pieces, so that in theory you can load and save the data incrementally, and avoid one of the problems that the XML file had, which was that you had this one huge file that contained everything and was really nasty for both people and computers to parse. At the same time, our actual *model* for handling of the dive list in subsurface has always been to just have all the dives available at all times, so in practice we treated the git data the same way as we treated the xml file: we'd load and save things all in one go. However, that does end up not scaling wonderfully well as the divelog grows. It's especially noticeable on mobile, since the CPU's (and the system around the CPU's) are weaker: Dirk saw his git save times be around 12-15 seconds for his divelog on an iPhone 6 (?). Now, that's not even with _that_ many dives in the end, although one issue is that Dirk has a lot of dive computers, so each dive often contains two or more dive computers, so that doubles or triples the data. So what takes a second on a laptop basically easilyy takes ten times as long on a phone. Anyway, 10+ second save times (and this is not network time, this is the git save itself) is obviously not good, and some profiling (on the laptop, not the phone) showed that a lot of that time was just generating the nice textual representation - the vsnprintf() string engine and the low-level buffering functions from stdio. We'd then generate the hash for the end result, and find the object already in the git database (since 99% of the time you don't actually *change* most dives, and even when you change a dive detail, you won't be changing the divecomputer file associated with that dive), and go on to the next dive. Anyway, the following fairly small patch does the obvious git optimization: just remember the git ID associated with the git save area for a dive, and when saving the dive, don't spend time re-generating all those objects if they haven't changed. That apparently brought the 12 seconds down to about 2.6s on Dirk's phone. The changes to the git loading and saving are actually fairly simple, and while those need testing too, the *big* issue with this is that now we need to make sure to invalidate the dive cache when we change a dive. So now there's a new "invalidate_dive_cache()" function that needs to be called whenever you edit a dive. And we don't edit a dive in just one place. Sure, there's the normal dive editing that happens when you start editing notes or buddy information or cylinders. But we edit dives in lots of other ways: there's the mobile case, of course, but there are more subtle cases too. You can add gas switch events (and other events) in the profile, you can renumber dives in the divelist, you can merge and split dives, and you can add and remove pictures from them etc. All of those cases now need to have that "invalidate_dive_cache()" associated with them. Because if they don't, it looks like the change happens, but then when you save things back, you end up saving that old stale data that you loaded originally. Now, I added all of the invalidations that I could think of, and I even tested it. But maybe I missed something. So testing by others would be appreciated. So if you want to help test this patch, remember a few things: - it only ends up changing the git saves. It has no effect on the XML saving. - testing is easiest with a local git repo, but a cloud save ends up working too. You can see that the change actually took place by quitting and restarting subsurface, but it's much easier to just look at the git repository with the usual git tools, and just verify that ^S actually saved the change. For cloud storage, you can find the repo (at least on Linux, I didn't check the others) in your ~/.subsurface/cloudstorage/<hash> subdirectory, and just go "git log -p" to see the changes. - Remember that the problem is that if you edit something and there's a missing invalidation, the edit won't "stick" in the save. But that also means that you'd want to test this by making a small change, then saving, make another different small change to another dive, and then save, and see in the git logs that both of those changes were made. Because if you make two or more changes to the same dive, if one of them properly invalidates the cache, that may hide the fact that the other one didn't. But it really would be lovely to have people review the patch and help test. Especially if you do odd small edits or use pictures (which I don't). NOTE! Long-term, we could do this on a larger scale with whole dive trips etc. And in theory, we could try to do similar "lazy access" optimizations for reads too - not actually read and parse all dives at startup, and delay until somebody actually starts looking at it. That would be a much bigger and more fundamental change, this is just a small step to avoid one particular problem. Anyway, that was a long cover-letter for a single patch that isn't even all that big in the end. Without further ado, I present to you: Linus Torvalds (1): Don't write back dive data that hasn't changed in git desktop-widgets/maintab.cpp | 4 +++- desktop-widgets/simplewidgets.cpp | 8 ++++++-- profile-widget/profilewidget2.cpp | 4 ++++ qt-mobile/qmlmanager.cpp | 1 + subsurface-core/dive.c | 15 +++++++++++---- subsurface-core/dive.h | 13 +++++++++++++ subsurface-core/load-git.c | 11 +++++++++-- subsurface-core/save-git.c | 37 ++++++++++++++++++++++++++++++------- 8 files changed, 77 insertions(+), 16 deletions(-) -- 2.8.0.rc4.303.g3931579 _______________________________________________ subsurface mailing list [email protected] http://lists.subsurface-divelog.org/cgi-bin/mailman/listinfo/subsurface
