Re: [PATCH 2/4] migrations: 0043_merge_patch_submission: do less work
On Jul 16, 2021, at 10:20 AM, Daniel Axtens wrote:
> - we do a lot of legwork to prevent there being a column with the same
> name in a base class and subclass. This is purely for django's benefit
> as databases don't see the class relationship and handle column names
> being the same across tables just fine. Unfortunately renaming columns
> in MySQL/MariaDB is _extremely_ expensive. So just don't do it: lie to
> django about the names of the duplicate columns.

I believe that modern versions can ALTER TABLE ONLINE to do a column
rename without having to copy all the data. Should also work for adding
columns (of some types).

Arguably Django could switch to specifying ONLINE as the method and
falling back to offline (full copy) if it can't do it online.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork
[PATCH] docker: Switch to MariaDB for wider platform support
MySQL docker images only have amd64 arch support. MariaDB images have
amd64, arm64v8, and ppc64le.

Signed-off-by: Stewart Smith
---
 docker-compose.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docker-compose.yml b/docker-compose.yml
index 1762d4a..1d49c51 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,7 +1,7 @@
 version: "3"
 services:
   db:
-    image: mysql:5.7
+    image: mariadb:10.4
     volumes:
       - ./tools/docker/db/data:/var/lib/mysql
     environment:
--
2.26.2
Re: DB-murdering API query (index suggestions needed)
> On 15 Nov 2019, at 09:43, Konstantin Ryabitsev wrote:
>
> On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote:
>>> GET
>>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>>
>>> The query behind this takes about 1 minute to run on a 20-core HT Xeon
>>> system and requires creating a huge temporary file (there are 18375
>>> patches in that project).
>>
>> Ouch, I'm sorry to hear that.
>
> Well, it's true that some of kernel.org's projects are large beyond what
> would be considered "sane". :)

My thought is that we should make it so that Patchwork is able to cope.
It's not *that* much data, nor are the queries *that* complex (he says
with his database hat on) :)
Re: DB-murdering API query (index suggestions needed)
> On 15 Nov 2019, at 05:48, Daniel Axtens wrote:
>
>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state
>> of suffering due to someone trying to suck down all patches in the
>> linux-arm-kernel project. This is what the API request looked like:
>>
>> GET
>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>
>> The query behind this takes about 1 minute to run on a 20-core HT Xeon
>> system and requires creating a huge temporary file (there are 18375
>> patches in that project).
>
> Ouch, I'm sorry to hear that.
>
>> So, two questions, really:
>>
>> 1. Any indexes we can put in place to make this query perform better?
>
> We have a bunch of db magic contributed by Stewart that will hit 2.2.
>
> Stewart, do you happen to know if any of your magic will affect API
> queries? They're advertised as affecting the general listing of patches
> in the UI, I'm not sure if they also affect this.
>
> If not, we can definitely have a look at getting an index or rate
> limiting/authentication thingy in for 2.2.

I am pretty sure that my improvements would help with that specific
query, though probably not enough to be ideal. I know they improve the
equivalent operation on the web site, but I don't remember how much (if
at all) I poked at the API there. I could take a look; I'd just need to
set up a dev environment and grab a good set of data again.
Re: [RFC PATCH] docker: Add support for using eatmydata in the database
Russell Currey writes:
> When running tox on a VM with presumably pretty busy spinning disks,
> using eatmydata with the database took running one configuration's test
> suite from (no exaggeration) 20 minutes down to 60 seconds.

As the author and been-attempting-to-no-longer-be-maintainer-of
eatmydata, this 20x improvement in test execution time doesn't really
surprise me. It turns out that properly flushing things to disk is
*really* expensive.

-- 
Stewart Smith
OPAL Architect, IBM.
Re: Updating a "patch" state via REST
Markus Mayer writes:
> I am working on a script that will, amongst other things, update the
> state ("Accepted", "Rejected", etc.) of patches in our own Patchwork
> installation. The script is using the REST API. All requests in the
> script, so far, are GET requests. They work fine.
>
> Now, I want to issue a PUT request to update the patch state, also
> using the REST API. However, no matter what I try, the request gets
> rejected by the server.

I think you want a PATCH request rather than PUT.
https://github.com/stewart-ibm/pwnm-sync is the script I use to update
patch status (from notmuch mail tags), and PATCH is the request type
that works for me (with patchwork.ozlabs.org).

-- 
Stewart Smith
OPAL Architect, IBM.
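As a sketch of what Stewart is suggesting: the key detail is forcing the HTTP method to PATCH rather than PUT when hitting the `/api/patches/<id>/` endpoint. The base URL, patch id, and token below are placeholders, and the `Authorization: Token` header is the usual Django REST Framework convention that Patchwork uses; check your own instance's API docs before relying on it.

```python
import json
import urllib.request


def build_state_update(base_url, patch_id, state, token):
    """Build (but don't send) a PATCH request updating a patch's state.

    base_url, patch_id and token here are illustrative placeholders.
    """
    url = "%s/api/patches/%d/" % (base_url.rstrip("/"), patch_id)
    body = json.dumps({"state": state}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Token %s" % token,
    }
    # urllib only defaults to GET/POST, so the method must be forced:
    return urllib.request.Request(url, data=body, headers=headers,
                                  method="PATCH")


req = build_state_update("https://patchwork.example.org", 1234,
                         "accepted", "not-a-real-token")
# urllib.request.urlopen(req) would actually send it; inspect it here.
print(req.method, req.full_url)
```

With a PUT instead of PATCH, DRF expects the full serialized object, which is why partial updates get rejected.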
Re: [PATCH 10/11] Be sensible computing project patch counts
Stephen Finucane writes:
> On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote:
>> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> > Django actively fights constructing a query that isn't insane.
>> >
>> > So, let's go and just execute a raw one. This is all very standard
>> > SQL so should execute everywhere without a problem.
>> >
>> > With the dataset of patchwork.ozlabs.org, looking at the /project/
>> > page for qemu-devel would take 13 queries and 1500ms,
>> > with this patch it's down to 11 queries in ~250ms.
>> > For the dataset of the netdev list, it's down to 440ms from 1500ms.
>> >
>> > Signed-off-by: Stewart Smith
>>
>> I'm generally reluctant to dive into SQL unless it's absolutely
>> required. This is simple enough and the gains are significant so
>>
>> Reviewed-by: Stephen Finucane
>
> Actually, there's a potential issue here which I spotted while testing
> with a smaller (odd, I know) set of patches. I can fix at merge time,
> assuming a respin isn't needed.
>
>> Is this a potential bug report for Django?
>>
>> Stephen
>>
>> > ---
>> >  patchwork/views/project.py | 23 +++++++++++++++++++---
>> >  1 file changed, 20 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/patchwork/views/project.py b/patchwork/views/project.py
>> > index 484455c02d9d..2a75242a06af 100644
>> > --- a/patchwork/views/project.py
>> > +++ b/patchwork/views/project.py
>> > @@ -27,6 +27,8 @@ from patchwork.compat import reverse
>> >  from patchwork.models import Patch
>> >  from patchwork.models import Project
>> >
>> > +from django.db import connection
>> > +
>> >
>> >  def project_list(request):
>> >      projects = Project.objects.all()
>> > @@ -44,14 +46,29 @@ def project_list(request):
>> >
>> >  def project_detail(request, project_id):
>> >      project = get_object_or_404(Project, linkname=project_id)
>> > -    patches = Patch.objects.filter(project=project)
>> > +
>> > +    # So, we revert to raw sql because if we do what you'd think would
>> > +    # be the correct thing in Django-ese, it ends up doing a *pointless*
>> > +    # join with patchwork_submissions that ends up ruining the query.
>> > +    # So, we do not do this, as this is wrong:
>> > +    # patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
>> > +    # patches = patches.annotate(c=Count('archived'))
>> > +    # and instead do this, because it's simple and fast
>> > +
>> > +    n_patches = {}
>> > +    with connection.cursor() as cursor:
>> > +        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
>> > +                           [project.id])
>> > +
>> > +        for r in cursor:
>> > +            n_patches[r[0]] = r[1]
>> >
>> >      context = {
>> >          'project': project,
>> >          'maintainers': User.objects.filter(
>> >              profile__maintainer_projects=project),
>> > -        'n_patches': patches.filter(archived=False).count(),
>> > -        'n_archived_patches': patches.filter(archived=True).count(),
>> > +        'n_patches': n_patches[False],
>> > +        'n_archived_patches': n_patches[True],
>
> It's possible that there are (a) no patches, (b) all patches are
> archived, or (c) no patches are archived. If any of these are true,
> you'll get a key error. We can fix this very simply like so:
>
>     'n_patches': n_patches[False] if False in n_patches else 0,
>     'n_archived_patches': n_patches[True] if True in n_patches else 0,
>
> As noted above, I'll fix this at merge time and add a test to ensure we
> don't regress.

Oh, neat! Yeah, I likely didn't test those scenarios at all, I was
focused on large amounts of data rather than no data.

-- 
Stewart Smith
OPAL Architect, IBM.
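Stephen's KeyError can be reproduced in miniature: a GROUP BY only returns rows for groups that exist, so a dict built from it can be missing the `False` or `True` key. This sketch uses an in-memory SQLite table as a stand-in for `patchwork_patch` (Patchwork itself runs MySQL/PostgreSQL), with every patch archived, which is case (b) above:

```python
import sqlite3

# In-memory stand-in for the patchwork_patch table: every patch here is
# archived, mimicking the edge case Stephen describes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patchwork_patch (submission_ptr_id INTEGER, "
           "patch_project_id INTEGER, archived BOOLEAN)")
db.executemany("INSERT INTO patchwork_patch VALUES (?, ?, ?)",
               [(1, 62, 1), (2, 62, 1)])

n_patches = {}
for archived, count in db.execute(
        "SELECT archived, COUNT(submission_ptr_id) FROM patchwork_patch "
        "WHERE patch_project_id=? GROUP BY archived", [62]):
    n_patches[bool(archived)] = count

# n_patches has no False key, so plain indexing would raise KeyError;
# defaulting to 0 covers the no/all/none-archived cases.
context = {
    "n_patches": n_patches.get(False, 0),
    "n_archived_patches": n_patches.get(True, 0),
}
print(context)  # {'n_patches': 0, 'n_archived_patches': 2}
```

`dict.get(key, 0)` is equivalent to Stephen's `x if key in n_patches else 0` conditional, just more compact.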
Re: [PATCH 10/11] Be sensible computing project patch counts
Stephen Finucane writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> Django actively fights constructing a query that isn't insane.
>>
>> So, let's go and just execute a raw one. This is all very standard
>> SQL so should execute everywhere without a problem.
>>
>> With the dataset of patchwork.ozlabs.org, looking at the /project/
>> page for qemu-devel would take 13 queries and 1500ms,
>> with this patch it's down to 11 queries in ~250ms.
>> For the dataset of the netdev list, it's down to 440ms from 1500ms.
>>
>> Signed-off-by: Stewart Smith
>
> I'm generally reluctant to dive into SQL unless it's absolutely
> required. This is simple enough and the gains are significant so

(Imagine all my exasperation at wanting to do that for all the other
patches :)

> Reviewed-by: Stephen Finucane
>
> Is this a potential bug report for Django?

It probably is. With some luck I'll find the time to write up a small
test case and send it upstream.

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH 05/11] Add covering index for /list/ query
Stephen Finucane writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> In constructing the list of patches for a project, there are two
>> main queries that are executed:
>>  1) get a count() of how many patches there are
>>  2) get the page of results being displayed
>>
>> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
>> code would take around 585ms and 858ms with a cold cache and 28ms and
>> 198ms for a warm cache.
>>
>> By adding a covering index, we get down to 4ms and 255ms for a cold
>> cache, and 4ms and 143ms for a warm cache!
>>
>> Additionally, when there's a lot of archived or accepted patches
>> (I used ~11000 archived out of the 15000 total in my test set)
>> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>>
>> Signed-off-by: Stewart Smith
>
> As before, I'm trusting your leet skillz here in all DB-related things.
> I don't have a large enough dataset to validate this properly but I am
> seeing a performance improvement in the smaller set I do have.

Simply put, an index is just a tree with a key of something like this:

  |IndexField1|IndexField2|PrimaryKeyOrPointerToFullRow|

So, if you have all the fields you need to answer a query in the index,
the database engine doesn't *need* to go and find the real row: it can
answer the query from the index alone, saving potentially a *lot* of
disk seeks as well as being a lot more cache friendly. The index also
groups things on disk based on the order of its fields, so if we pick
those fields the right way, we get a cache-friendly structure that keeps
information on patches in a project close together on disk, in an order
we're likely to request.

-- 
Stewart Smith
OPAL Architect, IBM.
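The "answer the query from the index alone" behaviour can be observed directly. This sketch uses SQLite for illustration (Patchwork's index is on MySQL, and the table/column names here are simplified stand-ins): when every column the query needs is in the index, the query plan reports a covering index scan, meaning the table rows are never touched.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE patch (id INTEGER PRIMARY KEY, "
           "project_id INTEGER, archived BOOLEAN, state INTEGER)")
# The index holds every column the count query touches (archived,
# project_id), so the engine can answer from the index tree alone.
db.execute("CREATE INDEX patch_covering_idx "
           "ON patch (archived, project_id, state)")

plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM patch "
    "WHERE archived=0 AND project_id=1").fetchall()
# The 'detail' column of the plan should mention a COVERING INDEX scan.
print(plan[0][3])
```

The same principle applies to MySQL/InnoDB, where EXPLAIN shows "Using index" in the Extra column for a covering query.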
Re: [PATCH 03/11] Add index for patchwork_comment (submission_id, date)
Stephen Finucane writes:
> On Fri, 2018-08-10 at 18:00 +1000, Stewart Smith wrote:
>> This (at least theoretically) should speed up displaying comments
>> on patches/cover letters. It's an index that will return rows
>> in-order for the query that we always do ("give me the comments
>> on this submission in date order"), rather than having to have
>> the database server do a sort for us.
>>
>> I haven't been able to benchmark something locally that shows
>> this is an actual improvement, but I don't have as large a data
>> set as various production instances. The query plan does look
>> a bit nicer though. Although the benefit of index maintenance
>> versus how long it takes to sort things is a good question.
>
> Indexes scare me purely because there aren't clear "this is when you do
> this" and "this is when you don't do this" guidelines :) However, I
> trust your DB knowledge here and it should be easy to revisit this so

Hopefully we can get some index-use statistics out of some big live
sites and inform what we should keep/drop based on real data rather
than gut feelings.

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH 02/11] 4x performance improvement for viewing patch with many comments
Stephen Finucane writes:
> On Fri, 2018-08-10 at 18:00 +1000, Stewart Smith wrote:
>> Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com
>> with my test dataset of a chunk of a variety of mailing lists, this
>> cover letter has 67 comments from a variety of people. Thus, it's on
>> the larger side of things.
>>
>> Originally, displaying /patch/550/ for this (redirected to /cover)
>> would take 81 SQL queries in ~60ms on my laptop.
>>
>> After this optimisation, it's down to 14 queries in 14ms.
>>
>> When the cache is cold, it's down to 32ms from 83ms.
>>
>> The effect of this patch is to execute a join in the database to
>> get the submitter information for each comment at the same time as
>> getting all the comments, rather than doing a one-by-one lookup after
>> the fact.
>>
>> Signed-off-by: Stewart Smith
>
> Looks good to me. Daniel's already pointed out the comma-space thing
> (blame pep8) but we'll fix at merge time.
>
> Reviewed-by: Stephen Finucane
>
> I think the learning here, and for other, related patches in the
> series, is that we should be more careful using any model relationships
> in templates.

Yeah, that was a bit of a theme for some of the performance things I
was looking at.

-- 
Stewart Smith
OPAL Architect, IBM.
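The N+1 pattern the patch removes can be sketched in plain SQL, using an in-memory SQLite database with made-up table and row contents (the real tables are Django's `patchwork_comment` and the auth user/person tables). The first approach issues one query per submitter, as rendering `submission.comments.all` in a template did; the second fetches everything in a single JOIN, which is what `select_related('submitter')` generates:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, submission_id INTEGER,
                      submitter_id INTEGER, content TEXT);
INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
INSERT INTO comment VALUES (10, 550, 1, 'looks good'),
                           (11, 550, 2, 'one nit');
""")

# N+1 pattern: one query for the comments, then one more per submitter.
comments = db.execute(
    "SELECT submitter_id, content FROM comment "
    "WHERE submission_id=550 ORDER BY id").fetchall()
slow = [(db.execute("SELECT name FROM person WHERE id=?",
                    [sid]).fetchone()[0], text)
        for sid, text in comments]

# select_related('submitter') instead emits a single JOIN:
fast = db.execute(
    "SELECT person.name, comment.content FROM comment "
    "JOIN person ON person.id = comment.submitter_id "
    "WHERE comment.submission_id=550 ORDER BY comment.id").fetchall()

assert fast == slow  # same data, 1 query instead of 1 + N
print(fast)
```

With 67 comments, the N+1 form is ~68 round trips versus 1, which lines up with the 81-to-14 query drop in the commit message.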
Re: [PATCH 01/11] Improve patch listing performance (~3x)
Stephen Finucane writes:
> On Fri, 2018-08-10 at 18:00 +1000, Stewart Smith wrote:
>> There's two main bits that are really expensive when composing the list
>> of patches for a project: the query getting the list, and the query
>> finding the series for each patch.
>>
>> If we look at the query getting the list, it gets a lot of unnecessary
>> fields such as 'headers' and 'content', even though we tell Django not
>> to. It turns out that Django seems to ignore the Submission relationship
>> and I have no idea how to force it to ignore that thing (defer doesn't
>> work) but if we go only, then it works okay.
>>
>> From my import of ~8000 messages for a few projects, my laptop query
>> time (MySQL, as set up by whatever the docker-compose things do) goes
>> from:
>>
>> http://localhost:8000/project/linux-kernel/list/
>> FROM: 342ms SQL queries cold cache, 268ms warm cache
>> TO:   118ms SQL queries cold cache, 88ms warm cache
>>
>> Which is... non-trivial to say the least.
>>
>> The big jump is the patches.only change, and the removal of ordering
>> on the patchseries takes a further 10ms off. For some strange reason, it
>> seems rather hard to tell Django that you don't care what order the
>> results come back in for that query (if we do, then the db server has to
>> do a sort rather than just return each row)
>>
>> Signed-off-by: Stewart Smith
>
> Tested this and it looks good to me. There's a migration missing (it's
> instead squashed into the first "add index" patch) but I can fix that
> at merge time.

Ahh, sorry about that.

> I was concerned that the combination of '.only' followed immediately by
> '.prefetch_related' (for checks and series) would be an issue and
> indeed it seems that, at this patch set at least, we do generate a
> query per check. However, this is no different to before and it is
> resolved in a later check. As such:
>
> Reviewed-by: Stephen Finucane
>
> I'm not going to push this (or any of the series) yet in case Daniel
> wants to weigh in but will do so next week if there are no complaints.

Thanks for testing it all, it increases my confidence in the patches as
well :)

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH 01/11] Improve patch listing performance (~3x)
Stephen Finucane writes:
> On Sat, 2018-08-11 at 04:28 +1000, Daniel Axtens wrote:
>> Stewart Smith writes:
>>
>> So, further to our conversation with Konstantin, I tested this against
>> Django 2.0. It still saves us some time - it means we no longer load the
>> following fields:
>>
>> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
>> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
>> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
>> `patchwork_patch`.`patch_project_id`,
>
> Out of curiosity, does this allow us to drop the patch_project_id field
> entirely or not (bear in mind, I haven't read through the rest of
> this). I'm not too sure...

Umm... maybe not, as I think we can go from patch to project when
viewing a patch?

>> This obviously saves the db some work and communication overhead.
>>
>> I'm a little nervous that this will slightly complicate some of the
>> further denormalisation but I think that's probably a price worth paying
>> unless Stephen objects.
>
> To be honest, it could be 6 months before we get any further on big
> efforts like this. A 3x performance boost right now beats an unknown
> boost N months down the line, IMO. My 2c anyway.

I'd tend to agree: we may as well get the maximum performance we can
today, and then if it's *really* needed later down the track, we can
change the schema a bunch to denormalise.

From my benchmarks, the biggest boost currently would be to precompute
and cache patch counts for most of the common views, as a *lot* of page
time is spent on that.

>> I do still want to test the 'ordering' change a bit more before
>> committing it though.
>
> I'm going to attempt to do my own testing of this (with a much larger
> dataset) some evening this week. I'll hold off applying anything
> pending this though.

Nice, let me know how it goes.

I'll be off somewhere in a national park for a bunch of next week, which
means I won't look at email, so it's the *perfect* time to merge and
deploy all of my patches :)

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH 02/11] 4x performance improvement for viewing patch with many comments
Daniel Axtens writes:
> Stewart Smith writes:
>
>> Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com
>> with my test dataset of a chunk of a variety of mailing lists, this
>> cover letter has 67 comments from a variety of people. Thus, it's on
>> the larger side of things.
>>
>> Originally, displaying /patch/550/ for this (redirected to /cover)
>> would take 81 SQL queries in ~60ms on my laptop.
>>
>> After this optimisation, it's down to 14 queries in 14ms.
>>
>> When the cache is cold, it's down to 32ms from 83ms.
>>
>> The effect of this patch is to execute a join in the database to
>> get the submitter information for each comment at the same time as
>> getting all the comments rather than doing a one-by-one lookup after
>> the fact.
>>
>> Signed-off-by: Stewart Smith
>> ---
>>  patchwork/templates/patchwork/submission.html | 2 +-
>>  patchwork/views/cover.py                      | 5 +++++
>>  patchwork/views/patch.py                      | 5 +++++
>>  3 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
>> index e817713f7079..2f69735d6925 100644
>> --- a/patchwork/templates/patchwork/submission.html
>> +++ b/patchwork/templates/patchwork/submission.html
>> @@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
>>
>> -{% for item in submission.comments.all %}
>> +{% for item in comments %}
>>  {% if forloop.first %}
>>  Comments
>>  {% endif %}
>> diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
>> index 73f83cb99b99..edad90bc694d 100644
>> --- a/patchwork/views/cover.py
>> +++ b/patchwork/views/cover.py
>> @@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
>>          'project': cover.project,
>>      }
>>
>> +    comments = cover.comments.all()
>> +    comments = comments.select_related('submitter')
>> +    comments = comments.only('submitter','date','id','content','submission')
>
> These items should be separated by ", " not just ",". I can fix this up
> when I apply. You also don't need 'id' as it's added automatically.

(thanks for fixup)

Hrm... I wonder why I had id there then... I wouldn't put it past Django
to do something silly otherwise, but I could well be wrong.

-- 
Stewart Smith
OPAL Architect, IBM.
Re: [PATCH 01/11] Improve patch listing performance (~3x)
Daniel Axtens writes:
> Stewart Smith writes:
>
>> There's two main bits that are really expensive when composing the list
>> of patches for a project: the query getting the list, and the query
>> finding the series for each patch.
>>
>> If we look at the query getting the list, it gets a lot of unnecessary
>> fields such as 'headers' and 'content', even though we tell Django not
>> to. It turns out that Django seems to ignore the Submission relationship
>> and I have no idea how to force it to ignore that thing (defer doesn't
>> work) but if we go only, then it works okay.
>>
>> From my import of ~8000 messages for a few projects, my laptop query
>> time (MySQL, as set up by whatever the docker-compose things do) goes
>> from:
>>
>> http://localhost:8000/project/linux-kernel/list/
>> FROM: 342ms SQL queries cold cache, 268ms warm cache
>> TO:   118ms SQL queries cold cache, 88ms warm cache
>>
>> Which is... non-trivial to say the least.
>>
>> The big jump is the patches.only change, and the removal of ordering
>> on the patchseries takes a further 10ms off. For some strange reason, it
>> seems rather hard to tell Django that you don't care what order the
>> results come back in for that query (if we do, then the db server has to
>> do a sort rather than just return each row)
>
> Thanks Stewart! It's great to get some real DB experience - feel free to
> hang around! :)

I'll try to :) Or at least pop up to make things faster/nicer to the
database.

> So, further to our conversation with Konstantin, I tested this against
> Django 2.0. It still saves us some time - it means we no longer load the
> following fields:
>
> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
> `patchwork_patch`.`patch_project_id`,
>
> This obviously saves the db some work and communication overhead.

I found that 'headers' and 'content' from the patch were also being
fetched, and it saved fetching those (which are really quite expensive
to do, as blobs are typically mostly stored on a separate page from the
row itself). It's the blobs that were the big expense, as they'd likely
generate 2 disk seeks and reads per row.

> I'm a little nervous that this will slightly complicate some of the
> further denormalisation but I think that's probably a price worth paying
> unless Stephen objects.

I don't think it should really be too much of a problem: as any
denormalisation occurs, folk are going to have to look at the queries
being produced and executed anyway, otherwise there's always a chance of
regressions.

> I do still want to test the 'ordering' change a bit more before
> committing it though.

Yeah, that struck me as a weird one, but it certainly did have an effect
on some things.

-- 
Stewart Smith
OPAL Architect, IBM.
[PATCH 03/11] Add index for patchwork_comment (submission_id,date)
This (at least theoretically) should speed up displaying comments
on patches/cover letters. It's an index that will return rows
in-order for the query that we always do ("give me the comments
on this submission in date order"), rather than having to have
the database server do a sort for us.

I haven't been able to benchmark something locally that shows
this is an actual improvement, but I don't have as large a data
set as various production instances. The query plan does look
a bit nicer though. Although the benefit of index maintenance
versus how long it takes to sort things is a good question.

Signed-off-by: Stewart Smith
---
 patchwork/migrations/0027_add_comment_date_index.py | 23 ++++++++++++++
 patchwork/models.py                                 |  4 ++++
 2 files changed, 27 insertions(+)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py

diff --git a/patchwork/migrations/0027_add_comment_date_index.py b/patchwork/migrations/0027_add_comment_date_index.py
new file mode 100644
index 000000000000..0a57a9c3b212
--- /dev/null
+++ b/patchwork/migrations/0027_add_comment_date_index.py
@@ -0,0 +1,23 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 14:03
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('patchwork', '0026_add_user_bundles_backref'),
+    ]
+
+    operations = [
+        migrations.AlterModelOptions(
+            name='series',
+            options={'verbose_name_plural': 'Series'},
+        ),
+        migrations.AddIndex(
+            model_name='comment',
+            index=models.Index(fields=['submission', 'date'], name='submission_date_idx'),
+        ),
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2b88fc48c91..d2389cfdad29 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -625,6 +625,10 @@ class Comment(EmailMixin, models.Model):
     class Meta:
         ordering = ['date']
         unique_together = [('msgid', 'submission')]
+        indexes = [
+            models.Index(name='submission_date_idx',
+                         fields=['submission', 'date'])
+        ]
 
 
 @python_2_unicode_compatible
--
2.17.1
[PATCH 06/11] check distinct(user) based on just user_id
The logic to display the Check(s) on the patch list wants to really
do a DISTINCT(user_id, context) ORDER BY DATE query, but with Django
that is currently a bit Too Hard (at least for me).

But what we can do is from python just use the user_id rather than
the user object itself. Same functionality, no join or prefetching
of users.

This saves a couple of hundred queries on the linuxppc/list/ view
and makes loading it about 4x faster in terms of time spent in the db.

Signed-off-by: Stewart Smith
---
 patchwork/models.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 15224ad69cfa..d356a6379ac3 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -539,7 +539,7 @@ class Patch(SeriesMixin, Submission):
 
         for check in self.check_set.all():
             ctx = check.context
-            user = check.user
+            user = check.user_id
 
             if user in unique and ctx in unique[user]:
                 # recheck condition - ignore the older result
--
2.17.1
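The idea behind the one-character change can be sketched standalone. In Django, reading `check.user` dereferences a foreign key (triggering a query per check unless the User rows were prefetched), while `check.user_id` is a plain column already on the Check row. The dedup itself only needs a hashable key, so the id works just as well. This is a simplified stand-in (a flat dict keyed by `(user_id, context)` rather than the nested dict the real code uses, and made-up check rows):

```python
from collections import namedtuple

# Minimal stand-in for patchwork's Check rows; in Django, check.user
# would fetch the related User row, while check.user_id is already on
# the Check row and costs no extra query.
Check = namedtuple("Check", "user_id context date state")

checks = [
    Check(1, "build", 1, "fail"),
    Check(1, "build", 2, "success"),  # newer result, same (user, context)
    Check(2, "build", 1, "success"),
]

unique = {}
for check in checks:
    # Keep only the newest check per (user_id, context) pair --
    # effectively DISTINCT(user_id, context) ORDER BY date in python.
    existing = unique.get((check.user_id, check.context))
    if existing is None or existing.date < check.date:
        unique[(check.user_id, check.context)] = check

states = sorted(c.state for c in unique.values())
print(states)  # ['success', 'success']
```

The older "fail" result is superseded, matching the "recheck condition - ignore the older result" comment in the patch.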
[PATCH 05/11] Add covering index for /list/ query
In constructing the list of patches for a project, there are two
main queries that are executed:
 1) get a count() of how many patches there are
 2) get the page of results being displayed

In a test dataset of ~11500 LKML patches and ~4000 others, the existing
code would take around 585ms and 858ms with a cold cache and 28ms and
198ms for a warm cache.

By adding a covering index, we get down to 4ms and 255ms for a cold
cache, and 4ms and 143ms for a warm cache!

Additionally, when there's a lot of archived or accepted patches
(I used ~11000 archived out of the 15000 total in my test set)
the query time goes from 28ms and 72ms down to 2ms and 33-40ms!

Signed-off-by: Stewart Smith
---
 patchwork/migrations/0028_add_list_covering_index.py | 19 ++++++++++++
 patchwork/models.py                                  |  6 ++++++
 2 files changed, 25 insertions(+)
 create mode 100644 patchwork/migrations/0028_add_list_covering_index.py

diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
new file mode 100644
index 000000000000..65ebaefbead7
--- /dev/null
+++ b/patchwork/migrations/0028_add_list_covering_index.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 17:24
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+    dependencies = [
+        ('patchwork', '0027_add_comment_date_index'),
+    ]
+
+    operations = [
+        migrations.AddIndex(
+            model_name='patch',
+            index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
+        ),
+    ]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2389cfdad29..15224ad69cfa 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission):
         if django.VERSION >= (1, 10):
             base_manager_name = 'objects'
 
+        indexes = [
+            # This is a covering index for the /list/ query
+            models.Index(fields=['archived', 'patch_project', 'state', 'delegate'],
+                         name='patch_list_covering_idx'),
+        ]
+
 
 class Comment(EmailMixin, models.Model):
     # parent
--
2.17.1
[PATCH 10/11] Be sensible computing project patch counts
Django actively fights constructing a query that isn't insane.

So, let's go and just execute a raw one. This is all very standard
SQL so should execute everywhere without a problem.

With the dataset of patchwork.ozlabs.org, looking at the /project/
page for qemu-devel would take 13 queries and 1500ms,
with this patch it's down to 11 queries in ~250ms.
For the dataset of the netdev list, it's down to 440ms from 1500ms.

Signed-off-by: Stewart Smith
---
 patchwork/views/project.py | 23 +++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/patchwork/views/project.py b/patchwork/views/project.py
index 484455c02d9d..2a75242a06af 100644
--- a/patchwork/views/project.py
+++ b/patchwork/views/project.py
@@ -27,6 +27,8 @@ from patchwork.compat import reverse
 from patchwork.models import Patch
 from patchwork.models import Project
 
+from django.db import connection
+
 
 def project_list(request):
     projects = Project.objects.all()
@@ -44,14 +46,29 @@ def project_list(request):
 
 def project_detail(request, project_id):
     project = get_object_or_404(Project, linkname=project_id)
-    patches = Patch.objects.filter(project=project)
+
+    # So, we revert to raw sql because if we do what you'd think would
+    # be the correct thing in Django-ese, it ends up doing a *pointless*
+    # join with patchwork_submissions that ends up ruining the query.
+    # So, we do not do this, as this is wrong:
+    # patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
+    # patches = patches.annotate(c=Count('archived'))
+    # and instead do this, because it's simple and fast
+
+    n_patches = {}
+    with connection.cursor() as cursor:
+        c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
+                           [project.id])
+
+        for r in cursor:
+            n_patches[r[0]] = r[1]
 
     context = {
         'project': project,
         'maintainers': User.objects.filter(
            profile__maintainer_projects=project),
-        'n_patches': patches.filter(archived=False).count(),
-        'n_archived_patches': patches.filter(archived=True).count(),
+        'n_patches': n_patches[False],
+        'n_archived_patches': n_patches[True],
         'enable_xmlrpc': settings.ENABLE_XMLRPC,
     }
     return render(request, 'patchwork/project.html', context)
--
2.17.1
[PATCH 07/11] Be particular over check_set and series prefetch for /list/
At this point it shaves at most 1-2ms off the query time for /linuxppc-dev/list/ Signed-off-by: Stewart Smith --- patchwork/views/__init__.py | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py index 96fd0798af5a..64d3d5f9f22d 100644 --- a/patchwork/views/__init__.py +++ b/patchwork/views/__init__.py @@ -19,6 +19,7 @@ from django.contrib import messages from django.shortcuts import get_object_or_404 +from django.db.models import Prefetch from patchwork.compat import is_authenticated from patchwork.filters import Filters @@ -27,6 +28,8 @@ from patchwork.models import Bundle from patchwork.models import BundlePatch from patchwork.models import Patch from patchwork.models import Project +from patchwork.models import Check +from patchwork.models import Series from patchwork.paginator import Paginator @@ -290,7 +293,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, patches = patches.only('state','submitter','delegate','project','name','date') # we also need checks and series -patches = patches.prefetch_related('check_set', 'series') +patches = patches.prefetch_related(Prefetch('check_set', queryset=Check.objects.only('context','user_id','patch_id','state','date'))) +patches = patches.prefetch_related(Prefetch('series', queryset=Series.objects.only('name'))) paginator = Paginator(request, patches) -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
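For context, prefetch_related() works by issuing one extra batched query with an IN clause over the parent IDs rather than one query per row. A minimal sketch of that mechanism in plain sqlite3, with an illustrative schema (names are stand-ins, not patchwork's):

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patch (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE check_result (id INTEGER PRIMARY KEY, patch_id INTEGER,
                           context TEXT, state TEXT);
INSERT INTO patch VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
INSERT INTO check_result VALUES
    (1, 1, 'build', 'success'),
    (2, 1, 'style', 'warning'),
    (3, 3, 'build', 'fail');
""")

# Rather than one checks query per patch on the page (the N+1 pattern),
# batch them into a single IN (...) query over the patch ids and group
# the rows in Python -- essentially what prefetch_related() does.
patch_ids = [row[0] for row in conn.execute("SELECT id FROM patch ORDER BY id")]
placeholders = ",".join("?" * len(patch_ids))
checks_by_patch = defaultdict(list)
for patch_id, context, state in conn.execute(
        "SELECT patch_id, context, state FROM check_result "
        "WHERE patch_id IN (%s) ORDER BY id" % placeholders, patch_ids):
    checks_by_patch[patch_id].append((context, state))

print(dict(checks_by_patch))
# {1: [('build', 'success'), ('style', 'warning')], 3: [('build', 'fail')]}
```

The Prefetch(..., queryset=...only(...)) form in the patch additionally trims the columns that batched query pulls back.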
[PATCH 01/11] Improve patch listing performance (~3x)
There are two main bits that are really expensive when composing the list of patches for a project: the query getting the list, and the query finding the series for each patch. If we look at the query getting the list, it gets a lot of unnecessary fields such as 'headers' and 'content', even though we tell Django not to. It turns out that Django seems to ignore the Submission relationship and I have no idea how to force it to ignore that thing (defer doesn't work) but if we use .only(), then it works okay. From my import of ~8000 messages for a few projects, my laptop query time (MySQL, as set up by whatever the docker-compose things do) goes from: http://localhost:8000/project/linux-kernel/list/ FROM: 342ms SQL queries cold cache, 268ms warm cache TO: 118ms SQL queries cold cache, 88ms warm cache Which is... non-trivial to say the least. The big jump is the patches.only change, and the removal of ordering on the patchseries takes a further 10ms off. For some strange reason, it seems rather hard to tell Django that you don't care what order the results come back in for that query (if we do, then the db server has to do a sort rather than just return each row). Signed-off-by: Stewart Smith --- patchwork/models.py | 1 - patchwork/views/__init__.py | 2 ++ 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/patchwork/models.py b/patchwork/models.py index 6268f5b72e3c..d2b88fc48c91 100644 --- a/patchwork/models.py +++ b/patchwork/models.py @@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model): return self.name if self.name else 'Untitled series #%d' % self.id class Meta: -ordering = ('date',) verbose_name_plural = 'Series' diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py index f8d23a388ac7..96fd0798af5a 100644 --- a/patchwork/views/__init__.py +++ b/patchwork/views/__init__.py @@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None, # rendering the list template patches =
patches.select_related('state', 'submitter', 'delegate') +patches = patches.only('state','submitter','delegate','project','name','date') + # we also need checks and series patches = patches.prefetch_related('check_set', 'series') -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH 04/11] Fetch all series for patch/cover viewing
e.g. a 10 comment patch goes from 26 queries in 17-20ms down to 20 queries in 12ms. A 67 comment cover letter goes from 14 queries in 16ms down to 8 queries in 8ms. So, effectively, a near 2x perf improvement. Previously, at several points we were asking for the latest series and then asking for all the series. Since there just usually aren't *that* many series, fetch them all and take the first one if we need to. Signed-off-by: Stewart Smith --- patchwork/templates/patchwork/submission.html | 10 +- patchwork/views/cover.py | 2 +- patchwork/views/patch.py | 1 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html index 2f69735d6925..3b6f9fbe909e 100644 --- a/patchwork/templates/patchwork/submission.html +++ b/patchwork/templates/patchwork/submission.html @@ -64,15 +64,15 @@ function toggle_div(link_id, headers_id) -{% if submission.latest_series %} +{% if submission.all_series %} Series - {% for series in submission.series.all %} + {% for series in all_series %} - {% if series == submission.latest_series %} + {% if forloop.first %} {{ series }} {% else %} @@ -93,7 +93,7 @@ function toggle_div(link_id, headers_id) >show -{% with submission.latest_series.cover_letter as cover %} +{% with all_series.cover_letter as cover %} {% if cover %} {% if cover == submission %} @@ -106,7 +106,7 @@ function toggle_div(link_id, headers_id) {% endif %} {% endwith %} -{% for sibling in submission.latest_series.patches.all %} +{% for sibling in all_series.patches.all %} {% if sibling == submission %} {{ sibling.name|default:"[no subject]"|truncatechars:100 }} diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py index edad90bc694d..1ee2b3f988fa 100644 --- a/patchwork/views/cover.py +++ b/patchwork/views/cover.py @@ -49,7 +49,7 @@ def cover_detail(request, cover_id): comments = comments.select_related('submitter') comments = 
comments.only('submitter','date','id','content','submission') context['comments'] = comments - +context['all_series'] = cover.series.all().order_by('-date') return render_to_response('patchwork/submission.html', context) diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py index f43fbecd9a4d..e1d0cdcfcf39 100644 --- a/patchwork/views/patch.py +++ b/patchwork/views/patch.py @@ -118,6 +118,7 @@ def patch_detail(request, patch_id): comments = comments.select_related('submitter') comments = comments.only('submitter','date','id','content','submission') +context['all_series'] = patch.series.all().order_by('-date') context['comments'] = comments context['submission'] = patch context['patchform'] = form -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
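The "fetch them all and take the first" pattern above can be shown with a tiny sketch. Plain Python with hypothetical data standing in for the Django querysets (patch.series.all().order_by('-date') would fill all_series in one query):

```python
# Hedged sketch of "fetch them all, take the first": hypothetical series
# data stands in for patch.series.all().order_by('-date').
all_series = [
    {"name": "v3", "date": "2018-08-10"},
    {"name": "v2", "date": "2018-08-09"},
    {"name": "v1", "date": "2018-08-08"},
]

# The newest-first ordering means the "latest series" is just the head
# of the list, so no second latest_series query is needed.
latest_series = all_series[0] if all_series else None
print(latest_series["name"])  # v3
```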
[PATCH 08/11] Add covering index to patchwork_submissions for /list/ queries
This gets PostgreSQL to generate *much* better query plans, gaining us about two orders of magnitude in performance on the /list/ query for the worst state project on the patchwork.ozlabs.org instance (qemu-devel). Signed-off-by: Stewart Smith --- .../0029_add_submission_covering_index.py | 19 +++ patchwork/models.py | 8 2 files changed, 27 insertions(+) create mode 100644 patchwork/migrations/0029_add_submission_covering_index.py diff --git a/patchwork/migrations/0029_add_submission_covering_index.py b/patchwork/migrations/0029_add_submission_covering_index.py new file mode 100644 index ..064f2ac4ae9d --- /dev/null +++ b/patchwork/migrations/0029_add_submission_covering_index.py @@ -0,0 +1,19 @@ +# -*- coding: utf-8 -*- +# Generated by Django 1.11.15 on 2018-08-10 16:15 +from __future__ import unicode_literals + +from django.db import migrations, models + + +class Migration(migrations.Migration): + +dependencies = [ +('patchwork', '0028_add_list_covering_index'), +] + +operations = [ +migrations.AddIndex( +model_name='submission', +index=models.Index(fields=['date', 'project', 'submitter', 'name'], name='submission_covering_idx'), +), +] diff --git a/patchwork/models.py b/patchwork/models.py index d356a6379ac3..7a8d363c4652 100644 --- a/patchwork/models.py +++ b/patchwork/models.py @@ -389,6 +389,14 @@ class Submission(FilenameMixin, EmailMixin, models.Model): class Meta: ordering = ['date'] unique_together = [('msgid', 'project')] +indexes = [ +# This is a covering index for the /list/ query +# Like what we have for Patch, but used for displaying what we want +# rather than for working out the count +# (of course, this all depends on the SQL optimiser of your db engine) +models.Index(fields=['date','project','submitter','name'], + name='submission_covering_idx'), +] class SeriesMixin(object): -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
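The covering-index effect described above can be observed with EXPLAIN QUERY PLAN. A sketch with stdlib sqlite3 (simplified table, and the column order rearranged so sqlite's planner can both filter and cover; the actual migration puts date first because the /list/ query orders by date):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission (id INTEGER PRIMARY KEY, date TEXT, "
    "project_id INTEGER, submitter_id INTEGER, name TEXT)"
)
conn.execute(
    "CREATE INDEX submission_covering_idx "
    "ON submission (project_id, date, submitter_id, name)"
)

# Every column the query touches lives in the index, so the plan can
# read the index alone and never visit the table rows at all.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT date, submitter_id, name FROM submission "
    "WHERE project_id = ? ORDER BY date",
    (1,),
).fetchall()
detail = " ".join(row[-1] for row in plan)
print(detail)  # mentions "USING COVERING INDEX submission_covering_idx"
```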
[PATCH 00/11] Performance for ALL THE THINGS!
This is an updated and extended patch series of the few patches I've sent out over the past few days. I've had a pretty major attack at performance of as many patchwork pages and bits of functionality that I could find. I've been using a mixture of test environments for this, initially a small dataset in MySQL, and later on, the data that's in patchwork.ozlabs.org in PostgreSQL. I've done some testing with a warm cache and cold cache (standard drop caches trick) to benchmark what the difference could be for things that aren't too commonly viewed. As of now, everything except adding patches to a bundle seems to be fewer than 20 SQL queries, and worst case for a cold cache seems to be a total of about a second in the database. I think the next big wins are going to be: - Adding a 'date' column to patchwork_patch so we don't have to go to patchwork_submission for it. (and then bending Django to produce a decent query that doesn't go there anyway) - Computing/caching patch counts for common queries. e.g. right now, just viewing the /list/ page for qemu-devel takes 26ms for the query to display the first page of patches, but 226ms to do the COUNT(). Even on high page numbers, where the query to generate the list could be 700ms on a cold cache, 226ms is a lot. i.e. a cache of the matrix of action_required/archived would give us a huge boost in page generation. - Redoing the create/add/remove from bundle logic to not do everything patch-at-a-time. - Tackling the efficiency of the python code itself. With this series, I'm struggling to find any query or page that takes more than a second of database time on my laptop, with the worst cases being around 400ms, and most being <40ms. 
Let me know if there's still anything slow with this patch set :) This series replaces: https://patchwork.ozlabs.org/patch/954921/ https://patchwork.ozlabs.org/patch/955321/ https://patchwork.ozlabs.org/patch/955322/ https://patchwork.ozlabs.org/patch/955342/ https://patchwork.ozlabs.org/patch/955347/ https://patchwork.ozlabs.org/patch/955392/ Stewart Smith (11): Improve patch listing performance (~3x) 4x performance improvement for viewing patch with many comments Add index for patchwork_comment (submission_id,date) Fetch all series for patch/cover viewing Add covering index for /list/ query check distinct(user) based on just user_id Be particular over check_set and series prefetch for /list/ Add covering index to patchwork_submissions for /list/ queries Optimise fetching checks when displaying a patch Be sensible computing project patch counts Fetch maintainer information in one query .../migrations/0027_add_comment_date_index.py | 23 + .../0028_add_list_covering_index.py | 19 ++ .../0029_add_submission_covering_index.py | 19 ++ patchwork/models.py | 21 ++-- patchwork/templates/patchwork/submission.html | 16 ++-- patchwork/views/__init__.py | 8 +- patchwork/views/cover.py | 5 patchwork/views/patch.py | 7 ++ patchwork/views/project.py| 25 --- 9 files changed, 128 insertions(+), 15 deletions(-) create mode 100644 patchwork/migrations/0027_add_comment_date_index.py create mode 100644 patchwork/migrations/0028_add_list_covering_index.py create mode 100644 patchwork/migrations/0029_add_submission_covering_index.py -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH 11/11] Fetch maintainer information in one query
Viewing the /project/ page lists maintainers. Prior to this patch, this was done in one query to fetch the maintainer IDs, and then one query per maintainer to get the name/email address. Now, with this patch, it's all in one query (yay joins) and saves a few ms of database queries for displaying the page. Realistically, this doesn't save us too much time as counting how many patches there are takes 99% of the database time for this page. Signed-off-by: Stewart Smith --- patchwork/views/project.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/patchwork/views/project.py b/patchwork/views/project.py index 2a75242a06af..fdcc6626eda1 100644 --- a/patchwork/views/project.py +++ b/patchwork/views/project.py @@ -66,7 +66,7 @@ def project_detail(request, project_id): context = { 'project': project, 'maintainers': User.objects.filter( -profile__maintainer_projects=project), +profile__maintainer_projects=project).select_related('profile'), 'n_patches': n_patches[False], 'n_archived_patches': n_patches[True], 'enable_xmlrpc': settings.ENABLE_XMLRPC, -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH 02/11] 4x performance improvement for viewing patch with many comments
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com with my test dataset of a chunk of a variety of mailing lists, this cover letter has 67 comments from a variety of people. Thus, it's on the larger side of things. Originally, displaying the /patch/550/ page for this (redirected to /cover) would take 81 SQL queries in ~60ms on my laptop. After this optimisation, it's down to 14 queries in 14ms. When the cache is cold, it's down to 32ms from 83ms. The effect of this patch is to execute a join in the database to get the submitter information for each comment at the same time as getting all the comments rather than doing a one-by-one lookup after the fact. Signed-off-by: Stewart Smith --- patchwork/templates/patchwork/submission.html | 2 +- patchwork/views/cover.py | 5 + patchwork/views/patch.py | 5 + 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html index e817713f7079..2f69735d6925 100644 --- a/patchwork/templates/patchwork/submission.html +++ b/patchwork/templates/patchwork/submission.html @@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id) -{% for item in submission.comments.all %} +{% for item in comments %} {% if forloop.first %} Comments {% endif %} diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py index 73f83cb99b99..edad90bc694d 100644 --- a/patchwork/views/cover.py +++ b/patchwork/views/cover.py @@ -45,6 +45,11 @@ def cover_detail(request, cover_id): 'project': cover.project, } +comments = cover.comments.all() +comments = comments.select_related('submitter') +comments = comments.only('submitter','date','id','content','submission') +context['comments'] = comments + return render_to_response('patchwork/submission.html', context) diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py index cbd4ec395d99..f43fbecd9a4d 100644 --- a/patchwork/views/patch.py +++ b/patchwork/views/patch.py @@ -114,6 +114,11
@@ def patch_detail(request, patch_id): if is_authenticated(request.user): context['bundles'] = request.user.bundles.all() +comments = patch.comments.all() +comments = comments.select_related('submitter') +comments = comments.only('submitter','date','id','content','submission') + +context['comments'] = comments context['submission'] = patch context['patchform'] = form context['createbundleform'] = createbundleform -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
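The select_related('submitter') change boils down to a single JOIN instead of one submitter lookup per comment. Sketched in plain SQL via sqlite3, with an illustrative schema (names are stand-ins for patchwork's tables):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE comment (id INTEGER PRIMARY KEY, submission_id INTEGER,
                      submitter_id INTEGER, content TEXT);
INSERT INTO person VALUES (1, 'alice'), (2, 'bob');
INSERT INTO comment VALUES
    (1, 10, 1, 'LGTM'), (2, 10, 2, 'nit: typo'), (3, 10, 1, 'applied');
""")

# One JOIN brings back each comment together with its submitter,
# replacing a separate person lookup per comment (the 1 + N pattern
# that made the original page take 81 queries).
rows = conn.execute("""
    SELECT c.id, c.content, p.name
    FROM comment c JOIN person p ON p.id = c.submitter_id
    WHERE c.submission_id = ?
    ORDER BY c.id
""", (10,)).fetchall()
print(rows)  # [(1, 'LGTM', 'alice'), (2, 'nit: typo', 'bob'), (3, 'applied', 'alice')]
```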
[PATCH 09/11] Optimise fetching checks when displaying a patch
Prior to this patch, a typical /patch// query for linuxppc-dev (which has about half a dozen checks per patch) took around 20 queries and 16.5ms in the database. About half of those queries were fetching the checks and who did the check. We can just do one query to get all that needed information, so we do that. This brings a page load down to 10 queries in 12ms. Signed-off-by: Stewart Smith --- patchwork/templates/patchwork/submission.html | 4 ++-- patchwork/views/patch.py | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html index 3b6f9fbe909e..72bc947c3f2d 100644 --- a/patchwork/templates/patchwork/submission.html +++ b/patchwork/templates/patchwork/submission.html @@ -216,7 +216,7 @@ function toggle_div(link_id, headers_id) >{{ submission.pull_url }} {% endif %} -{% if submission.checks %} +{% if checks %} Checks @@ -224,7 +224,7 @@ function toggle_div(link_id, headers_id) Check Description -{% for check in submission.checks %} +{% for check in checks %} {{ check.user }}/{{ check.context }} diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py index e1d0cdcfcf39..fbde04280844 100644 --- a/patchwork/views/patch.py +++ b/patchwork/views/patch.py @@ -119,6 +119,7 @@ def patch_detail(request, patch_id): comments = comments.only('submitter','date','id','content','submission') context['all_series'] = patch.series.all().order_by('-date') +context['checks'] = patch.check_set.all().select_related('user') context['comments'] = comments context['submission'] = patch context['patchform'] = form -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH] Add covering index for /list/ query
In constructing the list of patches for a project, there are two main queries that are executed: 1) get a count() of how many patches there are 2) Get the page of results being displayed In a test dataset of ~11500 LKML patches and ~4000 others, the existing code would take around 585ms and 858ms with a cold cache and 28ms and 198ms for a warm cache. By adding a covering index, we get down to 4ms and 255ms for a cold cache, and 4ms and 143ms for a warm cache! Additionally, when there's a lot of archived or accepted patches (I used ~11000 archived out of the 15000 total in my test set) the query time goes from 28ms and 72ms down to 2ms and 33-40ms! Signed-off-by: Stewart Smith --- .../0028_add_list_covering_index.py | 19 +++ patchwork/models.py | 6 ++ 2 files changed, 25 insertions(+) create mode 100644 patchwork/migrations/0028_add_list_covering_index.py diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py new file mode 100644 index ..65ebaefbead7 --- /dev/null +++ b/patchwork/migrations/0028_add_list_covering_index.py @@ -0,0 +1,19 @@ +# -*- coding: utf-8 -*- +# Generated by Django 1.11.15 on 2018-08-09 17:24 +from __future__ import unicode_literals + +from django.db import migrations, models + + +class Migration(migrations.Migration): + +dependencies = [ +('patchwork', '0027_add_comment_date_index'), +] + +operations = [ +migrations.AddIndex( +model_name='patch', +index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'), +), +] diff --git a/patchwork/models.py b/patchwork/models.py index d2389cfdad29..15224ad69cfa 100644 --- a/patchwork/models.py +++ b/patchwork/models.py @@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission): if django.VERSION >= (1, 10): base_manager_name = 'objects' +indexes = [ +# This is a covering index for the /list/ query + models.Index(fields=['archived','patch_project','state','delegate'], + 
name='patch_list_covering_idx'), +] + class Comment(EmailMixin, models.Model): # parent -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH] Fetch all series for patch/cover viewing
e.g. a 10 comment patch goes from 26 queries in 17-20ms down to 20 queries in 12ms. A 67 comment cover letter goes from 14 queries in 16ms down to 8 queries in 8ms. So, effectively, a near 2x perf improvement. Previously, at several points we were asking for the latest series and then asking for all the series. Since there just usually aren't *that* many series, fetch them all and take the first one if we need to. Signed-off-by: Stewart Smith --- patchwork/templates/patchwork/submission.html | 10 +- patchwork/views/cover.py | 2 +- patchwork/views/patch.py | 1 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html index 2f69735d6925..3b6f9fbe909e 100644 --- a/patchwork/templates/patchwork/submission.html +++ b/patchwork/templates/patchwork/submission.html @@ -64,15 +64,15 @@ function toggle_div(link_id, headers_id) -{% if submission.latest_series %} +{% if submission.all_series %} Series - {% for series in submission.series.all %} + {% for series in all_series %} - {% if series == submission.latest_series %} + {% if forloop.first %} {{ series }} {% else %} @@ -93,7 +93,7 @@ function toggle_div(link_id, headers_id) >show -{% with submission.latest_series.cover_letter as cover %} +{% with all_series.cover_letter as cover %} {% if cover %} {% if cover == submission %} @@ -106,7 +106,7 @@ function toggle_div(link_id, headers_id) {% endif %} {% endwith %} -{% for sibling in submission.latest_series.patches.all %} +{% for sibling in all_series.patches.all %} {% if sibling == submission %} {{ sibling.name|default:"[no subject]"|truncatechars:100 }} diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py index edad90bc694d..1ee2b3f988fa 100644 --- a/patchwork/views/cover.py +++ b/patchwork/views/cover.py @@ -49,7 +49,7 @@ def cover_detail(request, cover_id): comments = comments.select_related('submitter') comments = 
comments.only('submitter','date','id','content','submission') context['comments'] = comments - +context['all_series'] = cover.series.all().order_by('-date') return render_to_response('patchwork/submission.html', context) diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py index f43fbecd9a4d..e1d0cdcfcf39 100644 --- a/patchwork/views/patch.py +++ b/patchwork/views/patch.py @@ -118,6 +118,7 @@ def patch_detail(request, patch_id): comments = comments.select_related('submitter') comments = comments.only('submitter','date','id','content','submission') +context['all_series'] = patch.series.all().order_by('-date') context['comments'] = comments context['submission'] = patch context['patchform'] = form -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH v2] 4x performance improvement for viewing patch with many comments
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com with my test dataset of a chunk of a variety of mailing lists, this cover letter has 67 comments from a variety of people. Thus, it's on the larger side of things. Originally, displaying the /patch/550/ page for this (redirected to /cover) would take 81 SQL queries in ~60ms on my laptop. After this optimisation, it's down to 14 queries in 14ms. When the cache is cold, it's down to 32ms from 83ms. The effect of this patch is to execute a join in the database to get the submitter information for each comment at the same time as getting all the comments rather than doing a one-by-one lookup after the fact. Signed-off-by: Stewart Smith --- patchwork/templates/patchwork/submission.html | 2 +- patchwork/views/cover.py | 5 + patchwork/views/patch.py | 5 + 3 files changed, 11 insertions(+), 1 deletion(-) --- Changes since v1: don't fat finger patch viewing (s/cover/patch/) diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html index e817713f7079..2f69735d6925 100644 --- a/patchwork/templates/patchwork/submission.html +++ b/patchwork/templates/patchwork/submission.html @@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id) -{% for item in submission.comments.all %} +{% for item in comments %} {% if forloop.first %} Comments {% endif %} diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py index 73f83cb99b99..edad90bc694d 100644 --- a/patchwork/views/cover.py +++ b/patchwork/views/cover.py @@ -45,6 +45,11 @@ def cover_detail(request, cover_id): 'project': cover.project, } +comments = cover.comments.all() +comments = comments.select_related('submitter') +comments = comments.only('submitter','date','id','content','submission') +context['comments'] = comments + return render_to_response('patchwork/submission.html', context) diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py index cbd4ec395d99..f43fbecd9a4d 100644 ---
a/patchwork/views/patch.py +++ b/patchwork/views/patch.py @@ -114,6 +114,11 @@ def patch_detail(request, patch_id): if is_authenticated(request.user): context['bundles'] = request.user.bundles.all() +comments = patch.comments.all() +comments = comments.select_related('submitter') +comments = comments.only('submitter','date','id','content','submission') + +context['comments'] = comments context['submission'] = patch context['patchform'] = form context['createbundleform'] = createbundleform -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
[PATCH] Add index for patchwork_comment (submission_id,date)
This (at least theoretically) should speed up displaying comments on patches/cover letters. It's an index that will return rows in-order for the query that we always do ("give me the comments on this submission in date order"), rather than having to have the database server do a sort for us. I haven't been able to benchmark something locally that shows this is an actual improvement, but I don't have as large data set as various production instances. The query plan does look a bit nicer though. Although the benefit of index maintenance versus how long it takes to sort things is a good question. Signed-off-by: Stewart Smith --- .../migrations/0027_add_comment_date_index.py | 23 +++ patchwork/models.py | 4 2 files changed, 27 insertions(+) create mode 100644 patchwork/migrations/0027_add_comment_date_index.py diff --git a/patchwork/migrations/0027_add_comment_date_index.py b/patchwork/migrations/0027_add_comment_date_index.py new file mode 100644 index ..0a57a9c3b212 --- /dev/null +++ b/patchwork/migrations/0027_add_comment_date_index.py @@ -0,0 +1,23 @@ +# -*- coding: utf-8 -*- +# Generated by Django 1.11.15 on 2018-08-09 14:03 +from __future__ import unicode_literals + +from django.db import migrations, models + + +class Migration(migrations.Migration): + +dependencies = [ +('patchwork', '0026_add_user_bundles_backref'), +] + +operations = [ +migrations.AlterModelOptions( +name='series', +options={'verbose_name_plural': 'Series'}, +), +migrations.AddIndex( +model_name='comment', +index=models.Index(fields=['submission', 'date'], name='submission_date_idx'), +), +] diff --git a/patchwork/models.py b/patchwork/models.py index d2b88fc48c91..d2389cfdad29 100644 --- a/patchwork/models.py +++ b/patchwork/models.py @@ -625,6 +625,10 @@ class Comment(EmailMixin, models.Model): class Meta: ordering = ['date'] unique_together = [('msgid', 'submission')] +indexes = [ +models.Index(name='submission_date_idx', + fields=['submission','date']) +] @python_2_unicode_compatible -- 
2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
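The effect of the (submission, date) index, rows coming back pre-sorted so no sort step is needed, can be seen in miniature with sqlite3 (illustrative schema; production instances on MySQL/PostgreSQL will show the analogous plan change):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comment (id INTEGER PRIMARY KEY, submission_id INTEGER, date TEXT)"
)

def plan_for(sql):
    # Collapse the EXPLAIN QUERY PLAN detail strings into one line.
    return " ".join(
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    )

# Without the index, the ORDER BY needs an explicit sort step.
before = plan_for("SELECT id FROM comment WHERE submission_id = 1 ORDER BY date")

conn.execute("CREATE INDEX submission_date_idx ON comment (submission_id, date)")

# With the index, rows come back already in date order: no temp b-tree.
after = plan_for("SELECT id FROM comment WHERE submission_id = 1 ORDER BY date")

print(before)
print(after)
```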
[PATCH] 4x performance improvement for viewing patch with many comments
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com with my test dataset of a chunk of a variety of mailing lists, has this cover letter have 67 comments from a variety of people. Thus, it's on the larger side of things. Originally, displaying the /patch/550/ for this (redirected to /cover) would take 81 SQL queries in ~60ms on my laptop. After this optimisation, it's down to 14 queries in 14ms. When the cache is cold, it's down to 32ms from 83ms. The effect of this patch is to execute a join in the database to get the submitter information for each comment at the same time as getting all the comments rather than doing a one-by-one lookup after the fact. Signed-off-by: Stewart Smith --- patchwork/templates/patchwork/submission.html | 2 +- patchwork/views/cover.py | 5 + patchwork/views/patch.py | 5 + 3 files changed, 11 insertions(+), 1 deletion(-) diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html index e817713f7079..2f69735d6925 100644 --- a/patchwork/templates/patchwork/submission.html +++ b/patchwork/templates/patchwork/submission.html @@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id) -{% for item in submission.comments.all %} +{% for item in comments %} {% if forloop.first %} Comments {% endif %} diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py index 73f83cb99b99..edad90bc694d 100644 --- a/patchwork/views/cover.py +++ b/patchwork/views/cover.py @@ -45,6 +45,11 @@ def cover_detail(request, cover_id): 'project': cover.project, } +comments = cover.comments.all() +comments = comments.select_related('submitter') +comments = comments.only('submitter','date','id','content','submission') +context['comments'] = comments + return render_to_response('patchwork/submission.html', context) diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py index cbd4ec395d99..bbd84954726e 100644 --- a/patchwork/views/patch.py +++ b/patchwork/views/patch.py @@ -114,6 +114,11 
@@ def patch_detail(request, patch_id): if is_authenticated(request.user): context['bundles'] = request.user.bundles.all() +comments = cover.comments.all() +comments = comments.select_related('submitter') +comments = comments.only('submitter','date','id','content','submission') + +context['comments'] = comments context['submission'] = patch context['patchform'] = form context['createbundleform'] = createbundleform -- 2.17.1 ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: [RFC PATCH] 3x improvement in patch listing
Daniel Axtens writes:
> Daniel Axtens writes:
>
>> Konstantin Ryabitsev writes:
>>
>>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>>> There's two main bits that are really expensive when composing the list
>>>> of patches for a project: the query getting the list, and the query
>>>> finding the series for each patch.
>>>
>>> Stewart:
>>>
>>> Thanks for working on this! Do you think this would help with pagination
>>> as well? I find that in semi-abandoned projects like
>>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
>>> seconds to load the list view due to 800+ pages of unprocessed patches.
>>> I am currently considering an automated script that would auto-archive
>>> patches older than 6 months, but if simply improving pagination times
>>> would fix the issue, then I wouldn't need to bother.
>>
>> It should do. From a brief look at his patch, it fixes a hot path in
>> __init__.py that goes through the paginator (and which I was *sure* I
>> had looked at, but apparently I had not!) I'll do a quick sanity check
>
> Ah, I have figured out what I had thought I had fixed!
>
> So you're hitting the issue where the db has to fetch the 'diff',
> 'content' and 'headers' out of Submission just to list the patches,
> which is clearly utterly bonkers. This is *despite* the defer() we
> attempt to do - jk reported a Django bug for it: [1].

Yeah, that'd explain it. I was kind of going "wtf" a lot at that, and
then doing something kind of crazy to try and get things down to
something even *remotely* sane.

I was almost tempted to rewrite the page just as raw SQL rather than
using Django at all; at least then it won't randomly regress in the
future. I may be a cranky database guy with a healthy disdain for the
SQL that comes out of ORMs though :)

> I suppose it's worth taking the patch - the Django fix certainly isn't
> going to hit the 1.11 stable tree and we're not supporting 2.0 just
> yet. I do still want to check it properly first, so ...
>
>> on it and hopefully apply it to master and stable/2.1 within the next
>> few days. (but no promises!)
>
> ... this lack-of-a-firm-timeline still applies.

Any random testing / whatever I can do locally to try and help? This is
literally my first time poking at anything Django.

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork
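The defer() problem described above comes down to which columns end up in the generated SELECT: listing patches should never drag the large 'diff', 'content', and 'headers' columns across the wire. A minimal sqlite3 sketch (a toy schema, not Patchwork's actual tables) of why pruning the column list matters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE submission (id INTEGER PRIMARY KEY, name TEXT, content TEXT)"
)
# 'content' stands in for the big columns (diff/headers) the list view
# never renders.
conn.execute(
    "INSERT INTO submission VALUES (1, 'patch 1', ?)", ("x" * 1_000_000,)
)

# SELECT * drags a megabyte of content along just to show a name...
wide = conn.execute("SELECT * FROM submission").fetchone()

# ...while naming only the needed columns (the SQL that a working
# defer()/only() would generate) does not.
narrow = conn.execute("SELECT id, name FROM submission").fetchone()
```

The fix in the patch under discussion is exactly this: switching the queryset from an ignored defer() to an explicit only() so the SELECT names the handful of columns the list page renders.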
Re: [RFC PATCH] 3x improvement in patch listing
Konstantin Ryabitsev writes:
> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>> There's two main bits that are really expensive when composing the list
>> of patches for a project: the query getting the list, and the query
>> finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with pagination
> as well? I find that in semi-abandoned projects like
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few
> seconds to load the list view due to 800+ pages of unprocessed patches.
> I am currently considering an automated script that would auto-archive
> patches older than 6 months, but if simply improving pagination times
> would fix the issue, then I wouldn't need to bother.

I may be accidentally letting people know I know something about
databases again :)

Out of interest, does the kernel.org instance back onto PostgreSQL or
MySQL? I have a bunch of ideas that would quite dramatically improve
performance on MySQL (properly using primary keys to force disk layout
to be sane, rather than the current screw-locality method that Django
enforces). On PostgreSQL, things are a bit different, but it's possible
an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could
help a *lot*.

I'm not sure that archiving the patches would help query performance at
all, but I'd have to check it out a bit. The query that Django generates
is certainly "interesting".

-- 
Stewart Smith
OPAL Architect, IBM.
[RFC PATCH] 3x improvement in patch listing
There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.

If we look at the query getting the list, it gets a lot of unnecessary
fields such as 'headers' and 'content', even though we tell Django not
to. It turns out that Django seems to ignore the Submission relationship
and I have no idea how to force it to ignore that thing (defer doesn't
work), but if we go only, then it works okay.

From my import of ~8000 messages for a few projects, my laptop query
time (MySQL, as set up by whatever the docker-compose things do) goes
from:

http://localhost:8000/project/linux-kernel/list/
FROM: 342ms SQL queries cold cache, 268ms warm cache
TO:   118ms SQL queries cold cache,  88ms warm cache

Which is... non-trivial to say the least.

The big jump is the patches.only change, and the removal of ordering on
the patchseries takes a further 10ms off. For some strange reason, it
seems rather hard to tell Django that you don't care what order the
results come back in for that query (if we do, then the db server has
to do a sort rather than just return each row).

Signed-off-by: Stewart Smith
---
 patchwork/models.py         | 1 -
 patchwork/views/__init__.py | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 6268f5b72e3c..d2b88fc48c91 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model):
         return self.name if self.name else 'Untitled series #%d' % self.id

     class Meta:
-        ordering = ('date',)
         verbose_name_plural = 'Series'

diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index f8d23a388ac7..96fd0798af5a 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
     # rendering the list template
     patches = patches.select_related('state', 'submitter', 'delegate')
+    patches = patches.only('state','submitter','delegate','project','name','date')
+
     # we also need checks and series
     patches = patches.prefetch_related('check_set', 'series')
-- 
2.17.1
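The second half of the patch (dropping the default ordering) helps because an ORDER BY on an unindexed column forces the server to sort the whole result set before returning the first row. A small sqlite3 sketch of the effect (sqlite rather than MySQL, but the query plan shows the same extra sort step):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in for the Series table: 'date' has no index.
conn.execute("CREATE TABLE series (id INTEGER PRIMARY KEY, date TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN describes how sqlite will execute the query;
    # the last column of each row is the human-readable detail.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

ordered = plan("SELECT * FROM series ORDER BY date")
unordered = plan("SELECT * FROM series")

# The ordered query includes a sort step ("USE TEMP B-TREE FOR ORDER BY");
# the plain scan just streams rows back.
print("ORDER BY" in ordered, "ORDER BY" in unordered)
```

This is why removing `ordering = ('date',)` from the model's Meta shaves time off every list query that never needed the rows in date order.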
Re: [RFC PATCH] REST: Add new setting for maximum API page size
RAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id`
FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

simply getting a list of patches for a project here is going to be a
very expensive query, pulling in a lot of database pages containing data
for patches around that time rather than ones relevant to the query.

Again, I'd say go:

PRIMARY KEY ('project_id', 'id')
UNIQUE KEY ('id')

(the project_id index can then be dropped, as the primary key could be
used there). Of course, things get better if you query everything
looking up project_id=X and id=Y rather than just id=Y.

For PostgreSQL it gets a bit interesting as I'm not as familiar with how
it does data layout. There *appears* to be a CLUSTER *command* that will
re-organise an existing table but *not* make any future inserts/updates
obey it (which is annoying).

This seems to be a bit of a theme throughout the schema too...

Actually... do we have a good test dataset I could load up locally and
experiment with?

> * Yes, I probably should have realised this at the time. something
> something unpaid volunteers something something patch review something
> something. :)

pretty much neglecting some piece of software that ends up being vital
open source infrastructure is a grand tradition, welcome!

-- 
Stewart Smith
OPAL Architect, IBM.
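InnoDB physically clusters rows by primary key, which is why the suggested PRIMARY KEY (project_id, id) keeps one project's patches on adjacent pages. sqlite's WITHOUT ROWID tables cluster the same way, so the key layout can be sketched with a toy table (column and table names are illustrative, not Patchwork's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# PRIMARY KEY (project_id, id) stores each project's patches contiguously;
# UNIQUE on id keeps plain id lookups working, matching the suggestion
# of PRIMARY KEY ('project_id', 'id') plus UNIQUE KEY ('id').
conn.execute("""
    CREATE TABLE patch (
        project_id INTEGER NOT NULL,
        id INTEGER NOT NULL UNIQUE,
        name TEXT,
        PRIMARY KEY (project_id, id)
    ) WITHOUT ROWID
""")

# Interleave patches from two projects, as they arrive in real life.
rows = [(i % 2 + 1, i, "patch %d" % i) for i in range(12)]
conn.executemany("INSERT INTO patch VALUES (?, ?, ?)", rows)

# Listing one project is now a contiguous range scan of the primary key,
# not a hop around pages ordered purely by arrival time.
listing = conn.execute(
    "SELECT id FROM patch WHERE project_id = 1 ORDER BY id"
).fetchall()
```

With the old layout (PRIMARY KEY (id) alone), the same listing touches pages interleaved with every other project's patches from the same period.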
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes:
> On 26/07/18 23:24, Daniel Axtens wrote:
>> Andrew Donnellan writes:
>>
>>> On 24/07/18 15:10, Andrew Donnellan wrote:
>>>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
>>>> size to the default page size in the settings.
>>>>
>>>> This turns out to be rather restrictive, as we usually want to keep the
>>>> default page size low, but an administrator may want to allow API clients
>>>> to fetch more than that per request.
>>>>
>>>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.
>>>>
>>>> Closes: #202 ("Separate max API page size and default API page size into
>>>> different settings")
>>>> Suggested-by: Stewart Smith
>>>> Suggested-by: Joel Stanley
>>>> Signed-off-by: Andrew Donnellan
>>>
>>> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
>>> be working. Would like some more input as to what an appropriate default
>>> limit is.
>>
>> My completely fact-free feeling/opinion is that if it takes more than
>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
>> fussed.
>
> OzLabs.org is not a fast machine. 1 second round trip time == 100 per
> page. 500 per page == ~2.3 seconds round trip.
>
> A single 500 item load is still under half the time of doing 5 * 100
> item loads...

I bet MySQL would execute the query quicker :)

Anyway, that sounds really quite slow and we should look at why on earth
it's that slow - is there some kind of horrific hidden join or poor
indexing or query plan or something?

-- 
Stewart Smith
OPAL Architect, IBM.
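The trade-off Andrew quotes is just arithmetic on per-request fixed cost; a quick check with the numbers from this thread (1s per 100-item request, ~2.3s for one 500-item request):

```python
# Measured round-trip times from the thread (ozlabs.org, seconds).
per_page_100 = 1.0   # one request for 100 items
per_page_500 = 2.3   # one request for 500 items

items = 500
small_pages = (items / 100) * per_page_100  # 5 requests
one_big_page = per_page_500                 # 1 request

# "A single 500 item load is still under half the time of doing
# 5 * 100 item loads"
print(one_big_page, small_pages)  # 2.3 5.0
```

Most of the cost is evidently per-request overhead (and the fresh query each page triggers) rather than per-item transfer, which is the argument for a higher page-size ceiling.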
Re: [RFC PATCH] REST: Add new setting for maximum API page size
Andrew Donnellan writes:
> On 24/07/18 15:10, Andrew Donnellan wrote:
>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
>> size to the default page size in the settings.
>>
>> This turns out to be rather restrictive, as we usually want to keep the
>> default page size low, but an administrator may want to allow API clients
>> to fetch more than that per request.
>>
>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.
>>
>> Closes: #202 ("Separate max API page size and default API page size into
>> different settings")
>> Suggested-by: Stewart Smith
>> Suggested-by: Joel Stanley
>> Signed-off-by: Andrew Donnellan
>
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
> be working. Would like some more input as to what an appropriate default
> limit is.

From a *consumer* of the API PoV, 500 at a time rather than 30 is at
least a 6x speedup of my use of it, so that was extremely welcome. I
haven't looked at the size of the responses I'm getting back, so I have
no idea if 500 is a good number or not (I suspect I'd have to start
optimizing my code around 700-1000 responses/call).

My *guess* is that a fresh SQL query is run for each page retrieved, so
maybe 500 is "good enough" while there isn't some way to just stream
everything and not run the query multiple times.

-- 
Stewart Smith
OPAL Architect, IBM.
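The speedup a consumer sees comes from the request count: walking a collection takes ceil(total/page_size) round trips, and each one likely re-runs the query server-side. A toy client sketch (fetch_all and fake_fetch are hypothetical stand-ins, not Patchwork's real API client):

```python
import math

def fetch_all(total_items, per_page, fetch_page):
    """Walk pages the way an API consumer does; each fetch_page call is
    one round trip (and, server-side, probably one fresh SQL query)."""
    items, requests = [], 0
    for page in range(1, math.ceil(total_items / per_page) + 1):
        items.extend(fetch_page(page, per_page))
        requests += 1
    return items, requests

def fake_fetch(page, per_page, total=3000):
    # Stand-in for something like GET /api/patches/?page=N&per_page=M
    start = (page - 1) * per_page
    return list(range(start, min(start + per_page, total)))

items30, r30 = fetch_all(3000, 30, fake_fetch)
items500, r500 = fetch_all(3000, 500, fake_fetch)
print(r30, r500)  # 100 6
```

Raising the page size from 30 to 500 cuts 100 round trips to 6 for a 3000-item project here; the observed wall-clock speedup is smaller than the request-count reduction because larger responses cost more each, which matches the "at least 6x" figure above.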