Daniel Axtens <d...@axtens.net> writes:
> Stewart Smith <stew...@linux.ibm.com> writes:
>
>> There are two main bits that are really expensive when composing the
>> list of patches for a project: the query getting the list, and the
>> query finding the series for each patch.
>>
>> If we look at the query getting the list, it fetches a lot of
>> unnecessary fields such as 'headers' and 'content', even though we
>> tell Django not to. It turns out that Django seems to ignore the
>> Submission relationship, and I have no idea how to force it to ignore
>> that thing (defer doesn't work), but if we use only() instead, it
>> works okay.
>>
>> From my import of ~8000 messages for a few projects, my laptop query
>> time (MySQL, as set up by whatever the docker-compose bits do) goes
>> from:
>>
>> http://localhost:8000/project/linux-kernel/list/
>> FROM:
>> 342ms SQL queries cold cache, 268ms warm cache
>> TO:
>> 118ms SQL queries cold cache, 88ms warm cache
>>
>> Which is... non-trivial, to say the least.
>>
>> The big jump is the patches.only change, and the removal of ordering
>> on the patchseries takes a further 10ms off. For some strange reason
>> it seems rather hard to tell Django that you don't care what order
>> the results come back in for that query (if we do care, then the db
>> server has to do a sort rather than just return each row).
>
> Thanks Stewart! It's great to get some real DB experience - feel free
> to hang around! :)
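(As an aside, for anyone following along: the shape of the win above can be
sketched without Django at all. The snippet below uses sqlite3 and made-up
column names purely for illustration; the Django-side equivalent of the
"narrow" query is restricting the SELECT list with something like
`Patch.objects.only('id', 'name')`.)

```python
import sqlite3

# Toy table mirroring the shape of the problem: small scalar columns
# plus large blob-ish text columns ('headers', 'content') that the
# list view never actually needs. Column names are illustrative, not
# Patchwork's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patch (id INTEGER PRIMARY KEY, name TEXT, "
    "headers TEXT, content TEXT)"
)
conn.execute(
    "INSERT INTO patch (name, headers, content) VALUES (?, ?, ?)",
    ("[PATCH] example", "X-Header: ...\n" * 1000, "diff --git ...\n" * 1000),
)

# The expensive form: every column, including the big blobs, which the
# server typically has to fetch from pages separate from the row itself.
wide = conn.execute("SELECT * FROM patch").fetchone()

# The cheap form: only the columns the list view needs. This is the
# moral equivalent of the only() change discussed above.
narrow = conn.execute("SELECT id, name FROM patch").fetchone()

print(len(wide), len(narrow))  # 4 2 - far fewer bytes per row too
```

On the ordering side: in Django, calling `.order_by()` with no arguments on
a queryset clears any default Meta.ordering, which avoids the server-side
sort mentioned above.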
I'll try to :) Or at least pop up to make things faster/nicer for the
database.

> So, further to our conversation with Konstantin, I tested this against
> Django 2.0. It still saves us some time - it means we no longer load
> the following fields:
>
> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
> `patchwork_patch`.`patch_project_id`,
>
> This obviously saves the db some work and communication overhead.

I found that 'headers' and 'content' from the patch were also being
fetched, and this change saves fetching those, which is really quite
expensive, as blobs are typically stored mostly on pages separate from
the row itself. The blobs were the big expense, as they'd likely
generate two disk seeks and reads per row.

> I'm a little nervous that this will slightly complicate some of the
> further denormalisation but I think that's probably a price worth
> paying unless Stephen objects.

I don't think it should really be too much of a problem: as any
denormalisation occurs, folk are going to have to look at the queries
being produced and executed anyway, otherwise there's always a chance
of regressions.

> I do still want to test the 'ordering' change a bit more before
> committing it though.

Yeah, that struck me as a weird one, but it certainly did have an
effect on some things.

-- 
Stewart Smith
OPAL Architect, IBM.

_______________________________________________
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork