Re: [PATCH 2/4] migrations: 0043_merge_patch_submission: do less work

2021-07-16 Thread Stewart Smith
On Jul 16, 2021, at 10:20 AM, Daniel Axtens  wrote:
> 
>  - we do a lot of legwork to prevent there being a column with the same
>   name in a base class and subclass. This is purely for django's benefit
>   as databases don't see the class relationship and handle column names
>   being the same across tables just fine. Unfortunately renaming columns
>   in MySQL/MariaDB is _extremely_ expensive. So just don't do it: lie to
>   django about the names of the duplicate columns.

I believe that modern versions can do an online ALTER TABLE to rename a
column without having to copy all the data. It should also work for adding
columns (of some types).

Arguably, Django could switch to specifying the online method first and
falling back to offline (a full table copy) if the server can't do it
online.
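
For what it's worth, a rough sketch of that try-online-first idea from inside
Django (hypothetical table/column names; whether ALGORITHM=INPLACE really
avoids a rebuild depends on the exact MySQL/MariaDB version, so treat this as
illustrative only):

  from django.db import DatabaseError, connection

  def rename_column(table, old, new, column_def):
      online = ('ALTER TABLE {t} CHANGE COLUMN {o} {n} {d}, '
                'ALGORITHM=INPLACE, LOCK=NONE').format(t=table, o=old, n=new, d=column_def)
      offline = 'ALTER TABLE {t} CHANGE COLUMN {o} {n} {d}'.format(
          t=table, o=old, n=new, d=column_def)
      try:
          with connection.cursor() as cursor:
              cursor.execute(online)     # fast path: no table copy
      except DatabaseError:
          with connection.cursor() as cursor:
              cursor.execute(offline)    # explicit, expensive full-copy fallback

  # e.g. rename_column('patchwork_patch', 'some_old_col', 'some_new_col', 'varchar(255)')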


[PATCH] docker: Switch to MariaDB for wider platform support

2020-06-13 Thread Stewart Smith
The MySQL Docker images only support the amd64 architecture, while the
MariaDB images support amd64, arm64v8, and ppc64le.

Signed-off-by: Stewart Smith 
---
 docker-compose.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docker-compose.yml b/docker-compose.yml
index 1762d4a..1d49c51 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,7 +1,7 @@
 version: "3"
 services:
   db:
-    image: mysql:5.7
+    image: mariadb:10.4
     volumes:
       - ./tools/docker/db/data:/var/lib/mysql
     environment:
-- 
2.26.2



Re: DB-murdering API query (index suggestions needed)

2019-11-17 Thread Stewart Smith

> On 15 Nov 2019, at 09:43, Konstantin Ryabitsev  wrote:
> 
> On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote:
>>> GET
>>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>> The query behind this takes about 1 minute to run on a 20-core HT Xeon
>>> system and requires creating a huge temporary file (there are 18375
>>> patches in that project).
>> Ouch, I'm sorry to hear that.
> 
> Well, it's true that some of kernel.org's projects are large beyond what 
> would be considered "sane". :)

My thought is that we should make it so that Patchwork is able to cope.

It's not *that* much data, nor are the queries that complex (he says with
his database hat on) :)




Re: DB-murdering API query (index suggestions needed)

2019-11-17 Thread Stewart Smith

> On 15 Nov 2019, at 05:48, Daniel Axtens  wrote:
> 
> 
>> 
>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state 
>> of suffering due to someone trying to suck down all patches in the 
>> linux-arm-kernel project. This is what the API request looked like:
>> 
>> GET 
>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>  
>> 
>> The query behind this takes about 1 minute to run on a 20-core HT Xeon 
>> system and requires creating a huge temporary file (there are 18375 
>> patches in that project).
> 
> Ouch, I'm sorry to hear that.
> 
>> 
>> So, two questions, really:
>> 
>> 1. Any indexes we can put in place to make this query perform better?
> 
> We have a bunch of db magic contributed by Stewart that will hit 2.2.
> 
> Stewart, do you happen to know if any of your magic will affect API
> queries? They're advertised as affecting the general listing of patches
> in the UI, I'm not sure if they also affect this.
> 
> If not, we can definitely have a look at getting an index or rate
> limiting/authentication thingy in for 2.2.

I am pretty sure my improvements would help with that specific query,
though probably not enough to be ideal. I know they improve the equivalent
operation in the web UI, but I don't remember how much (if at all) I poked
at the API there.

I could take a look, I’d just need to set up a dev environment and grab a good 
set of data again.


Re: [RFC PATCH] docker: Add support for using eatmydata in the database

2019-05-14 Thread Stewart Smith
Russell Currey  writes:
> When running tox on a VM with presumably pretty busy spinning disks,
> using eatmydata with the database took running one configuration's test
> suite from (no exaggeration) 20 minutes down to 60 seconds.

As the author and been-attempting-to-no-longer-be-maintainer-of
eatmydata, this 20x improvement in test execution time doesn't really
surprise me. It turns out that properly flushing things to disk is
*really* expensive.
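
If anyone wants to see just how expensive, here's a small self-contained toy
(nothing to do with eatmydata's actual LD_PRELOAD trick) that times a batch of
small writes with and without fsync():

  import os
  import tempfile
  import time

  def write_rows(n, durable):
      # Write n small records, optionally forcing each one to stable storage.
      fd, path = tempfile.mkstemp()
      start = time.time()
      with os.fdopen(fd, 'wb') as f:
          for _ in range(n):
              f.write(b'x' * 128)
              f.flush()
              if durable:
                  os.fsync(f.fileno())   # the call eatmydata turns into a no-op
      elapsed = time.time() - start
      os.unlink(path)
      return elapsed

  print('with fsync:    %.2fs' % write_rows(500, True))
  print('without fsync: %.2fs' % write_rows(500, False))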

-- 
Stewart Smith
OPAL Architect, IBM.



Re: Updating a "patch" state via REST

2018-10-10 Thread Stewart Smith
Markus Mayer  writes:
> I am working on a script that will, amongst other things, update the
> state ("Accepted", "Rejected", etc.) of patches in our own Patchwork
> installation. The script is using the REST API. All requests in the
> script, so far, are GET requests. They work fine.
>
> Now, I want to issue a PUT request to update the patch state, also
> using the REST API. However, no matter what I try, the request gets
> rejected by the server.

I think you want the PATCH request rather than PUT?

https://github.com/stewart-ibm/pwnm-sync is the script I use to update
patch status (from notmuch mail tags), and PATCH is the request type that
works for me (with patchwork.ozlabs.org).
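
For anyone landing here later, a minimal sketch of such a PATCH request using
the requests library; the patch ID and state value below are made up, and
token auth / state names can differ per instance and version, so check your
instance's /api/ schema:

  import requests

  API = 'https://patchwork.ozlabs.org/api/1.1'
  TOKEN = 'your-api-token'   # generated from your Patchwork user profile

  resp = requests.patch(
      '{}/patches/{}/'.format(API, 12345),              # 12345: hypothetical patch ID
      headers={'Authorization': 'Token {}'.format(TOKEN)},
      json={'state': 'accepted'})
  resp.raise_for_status()
  print(resp.json().get('state'))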

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 10/11] Be sensible computing project patch counts

2018-09-09 Thread Stewart Smith
Stephen Finucane  writes:
> On Fri, 2018-08-31 at 15:16 +0100, Stephen Finucane wrote:
>> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> > Django actively fights constructing a query that isn't insane.
>> > 
>> > So, let's go and just execute a raw one. This is all very standard
>> > SQL so should execute everywhere without a problem.
>> > 
>> > With the dataset of patchwork.ozlabs.org, looking at the /project/
>> > page for qemu-devel would take 13 queries and 1500ms,
>> > with this patch it's down to 11 queries in ~250ms.
>> > For the dataset of the netdev list, it's down to 440ms from 1500ms.
>> > 
>> > Signed-off-by: Stewart Smith 
>> 
>> I'm generally reluctant to dive into SQL unless it's absolutely
>> required. This is simple enough and the gains are significant so
>> 
>> Reviewed-by: Stephen Finucane 
>
> Actually, there's a potential issue here which I spotted while testing
> with a smaller (odd, I know) set of patches. I can fix at merge time,
> assuming a respin isn't needed.
>
>> Is this a potential bug report for Django?
>> 
>> Stephen
>> 
>> > ---
>> >  patchwork/views/project.py | 23 ---
>> >  1 file changed, 20 insertions(+), 3 deletions(-)
>> > 
>> > diff --git a/patchwork/views/project.py b/patchwork/views/project.py
>> > index 484455c02d9d..2a75242a06af 100644
>> > --- a/patchwork/views/project.py
>> > +++ b/patchwork/views/project.py
>> > @@ -27,6 +27,8 @@ from patchwork.compat import reverse
>> >  from patchwork.models import Patch
>> >  from patchwork.models import Project
>> >  
>> > +from django.db import connection
>> > +
>> >  
>> >  def project_list(request):
>> >  projects = Project.objects.all()
>> > @@ -44,14 +46,29 @@ def project_list(request):
>> >  
>> >  def project_detail(request, project_id):
>> >  project = get_object_or_404(Project, linkname=project_id)
>> > -patches = Patch.objects.filter(project=project)
>> > +
>> > +# So, we revert to raw sql because if we do what you'd think would
>> > +# be the correct thing in Django-ese, it ends up doing a *pointless*
>> > +# join with patchwork_submissions that ends up ruining the query.
>> > +# So, we do not do this, as this is wrong:
>> > +#patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
>> > +#patches = patches.annotate(c=Count('archived'))
>> > +# and instead do this, because it's simple and fast
>> > +
>> > +n_patches = {}
>> > +with connection.cursor() as cursor:
>> > +c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
>> > +   [project.id])
>> > +
>> > +for r in cursor:
>> > +n_patches[r[0]] = r[1]
>> >  
>> >  context = {
>> >  'project': project,
>> >  'maintainers': User.objects.filter(
>> >  profile__maintainer_projects=project),
>> > -'n_patches': patches.filter(archived=False).count(),
>> > -'n_archived_patches': patches.filter(archived=True).count(),
>> > +'n_patches': n_patches[False],
>> > +    'n_archived_patches': n_patches[True],
>
> It's possible that there are (a) no patches, (b) all patches are
> archived, or (c) no patches are archived. If any of these are true,
> you'll get a key error. We can fix this very simply like so:
>
>   'n_patches': n_patches[False] if False in n_patches else 0,
>   'n_archived_patches': n_patches[True] if True in n_patches else 0,
>
> As noted above, I'll fix this at merge time and add a test to ensure we
> don't regress.

Oh, neat! Yeah, I likely didn't test those scenarios at all, I was
focused on large amounts of data rather than no data.
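
For reference, a compact version of that counting logic with the guard folded
in via dict.get() (a sketch of the idea, not necessarily the exact code that
gets merged; assumes `connection` from django.db and `project`/`context` as in
the patch):

  n_patches = {}
  with connection.cursor() as cursor:
      cursor.execute(
          'SELECT archived, COUNT(submission_ptr_id) '
          'FROM patchwork_patch WHERE patch_project_id = %s GROUP BY archived',
          [project.id])
      for archived, count in cursor.fetchall():
          n_patches[archived] = count

  context['n_patches'] = n_patches.get(False, 0)          # no rows -> 0
  context['n_archived_patches'] = n_patches.get(True, 0)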

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 10/11] Be sensible computing project patch counts

2018-09-09 Thread Stewart Smith
Stephen Finucane  writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> Django actively fights constructing a query that isn't insane.
>> 
>> So, let's go and just execute a raw one. This is all very standard
>> SQL so should execute everywhere without a problem.
>> 
>> With the dataset of patchwork.ozlabs.org, looking at the /project/
>> page for qemu-devel would take 13 queries and 1500ms,
>> with this patch it's down to 11 queries in ~250ms.
>> For the dataset of the netdev list, it's down to 440ms from 1500ms.
>> 
>> Signed-off-by: Stewart Smith 
>
> I'm generally reluctant to dive into SQL unless it's absolutely
> required. This is simple enough and the gains are significant so

(Imagine all my exasperation at wanting to do that for all the other
patches :)

> Reviewed-by: Stephen Finucane 
>
> Is this a potential bug report for Django?

It probably is. With some luck I'll find the time to write up a small
test case and send it upstream.

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 05/11] Add covering index for /list/ query

2018-09-09 Thread Stewart Smith
Stephen Finucane  writes:
> On Fri, 2018-08-10 at 18:01 +1000, Stewart Smith wrote:
>> In constructing the list of patches for a project, there are two
>> main queries that are executed:
>> 1) get a count() of how many patches there are
>> 2) Get the page of results being displayed
>> 
>> In a test dataset of ~11500 LKML patches and ~4000 others, the existing
>> code would take around 585ms and 858ms with a cold cache and 28ms and
>> 198ms for a warm cache.
>> 
>> By adding a covering index, we get down to 4ms and 255ms for a cold
>> cache, and 4ms and 143ms for a warm cache!
>> 
>> Additionally, when there's a lot of archived or accepted patches
>> (I used ~11000 archived out of the 15000 total in my test set)
>> the query time goes from 28ms and 72ms down to 2ms and 33-40ms!
>> 
>> Signed-off-by: Stewart Smith 
>
> As before, I'm trusting your leet skillz here in all DB-related things.
> I don't have a large enough dataset to validate this properly but I am
> seeing a performance improvement in the smaller set I do have.

Simply put, an index is just a tree with a key of something like this:
|IndexField1|IndexField2|PrimaryKeyOrPointerToHowtoGetFullRow|

So, if you have all the fields you need to answer a query in the index,
the database engine doesn't *need* to go and find the real row: it can
answer the query just from the index, saving potentially a *lot* of disk
seeks as well as being a lot more cache friendly. This is especially true
because the index groups entries on disk based on the order of its fields,
so if we order them the right way for us, we get a cache-friendly structure
that keeps information on patches in a project close together on disk, and
in an order that we're likely to request.
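
As a self-contained illustration (SQLite here rather than MySQL/PostgreSQL,
and a toy schema, so purely to show the principle), EXPLAIN reports the query
being answered from the index alone:

  import sqlite3

  db = sqlite3.connect(':memory:')
  db.execute('CREATE TABLE patch (id INTEGER PRIMARY KEY, archived INT, project INT, state INT)')
  db.execute('CREATE INDEX patch_list_covering_idx ON patch (archived, project, state)')
  db.executemany('INSERT INTO patch (archived, project, state) VALUES (?, ?, ?)',
                 [(i % 2, i % 5, i % 3) for i in range(1000)])

  # Every column the query touches is in the index, so the base table is never read.
  plan = db.execute('EXPLAIN QUERY PLAN '
                    'SELECT state, COUNT(*) FROM patch '
                    'WHERE archived = 0 AND project = 3 GROUP BY state').fetchall()
  print(plan)   # ... USING COVERING INDEX patch_list_covering_idx ...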

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 03/11] Add index for patchwork_comment (submission_id, date)

2018-09-09 Thread Stewart Smith
Stephen Finucane  writes:
> On Fri, 2018-08-10 at 18:00 +1000, Stewart Smith wrote:
>> This (at least theoretically) should speed up displaying comments
>> on patches/cover letters. It's an index that will return rows
>> in-order for the query that we always do ("give me the comments
>> on this submission in date order"), rather than having to have
>> the database server do a sort for us.
>> 
>> I haven't been able to benchmark something locally that shows
>> this is an actual improvement, but I don't have as large a data
>> set as various production instances. The query plan does look
>> a bit nicer though, although whether the gain outweighs the cost
>> of index maintenance versus just sorting is a good question.
>
> Indexes scare me purely because there aren't clear "this is when you do
> this" and "then is when you don't do this" guidelines :) However, I
> trust your DB knowledge here and it should be easy to revisit this so

Hopefully we can get some index use statistics out of some big live
sites and inform what we should keep / drop based on real data rather
than gut feelings.
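
On the statistics front, a sketch of one way to pull those numbers from a
PostgreSQL-backed instance (pg_stat_user_indexes tracks how often each index
has been scanned; MySQL keeps similar data in the sys/performance_schema
tables):

  from django.db import connection

  with connection.cursor() as cursor:
      cursor.execute(
          "SELECT relname, indexrelname, idx_scan "
          "FROM pg_stat_user_indexes "
          "WHERE relname LIKE 'patchwork_%' "
          "ORDER BY idx_scan")
      for table, index, scans in cursor.fetchall():
          print('{}.{}: {} scans'.format(table, index, scans))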

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 02/11] 4x performance improvement for viewing patch with many comments

2018-09-09 Thread Stewart Smith
Stephen Finucane  writes:
> On Fri, 2018-08-10 at 18:00 +1000, Stewart Smith wrote:
>> Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com:
>> in my test dataset (a chunk of a variety of mailing lists), this cover
>> letter has 67 comments from a variety of people. Thus, it's on the larger
>> side of things.
>> 
>> Originally, displaying the /patch/550/ for this (redirected to /cover)
>> would take 81 SQL queries in ~60ms on my laptop.
>> 
>> After this optimisation, it's down to 14 queries in 14ms.
>> 
>> When the cache is cold, it's down to 32ms from 83ms.
>> 
>> The effect of this patch is to execute a join in the database to
>> get the submitter information for each comment at the same time as
>> getting all the comments rather than doing a one-by-one lookup after
>> the fact.
>> 
>> Signed-off-by: Stewart Smith 
>
> Looks good to me. Daniel's already pointed out the comma-space thing
> (blame pep8) but we'll fix at merge time.
>
> Reviewed-by: Stephen Finucane 
>
> I think the learning here, and for other, related patches in the
> series, is that we should be more careful using any model relationships
> in templates.

Yeah, that was a bit of a theme for some of the performance things I was
looking at.

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 01/11] Improve patch listing performance (~3x)

2018-09-09 Thread Stewart Smith
Stephen Finucane  writes:
> On Fri, 2018-08-10 at 18:00 +1000, Stewart Smith wrote:
>> There are two main bits that are really expensive when composing the list
>> of patches for a project: the query getting the list, and the query
>> finding the series for each patch.
>>
>> If we look at the query getting the list, it fetches a lot of unnecessary
>> fields such as 'headers' and 'content', even though we tell Django not
>> to. It turns out that Django seems to ignore the Submission relationship
>> and I have no idea how to force it to ignore that thing (defer() doesn't
>> work), but if we use only(), then it works okay.
>> 
>> From my import of ~8000 messages for a few projects, my laptop query
>> time (MySQL, as setup by whatever the docker-compose things do) goes
>> from:
>> 
>> http://localhost:8000/project/linux-kernel/list/
>> FROM:
>> 342ms SQL queries cold cache, 268ms warm cache
>> TO:
>> 118ms SQL queries cold cache, 88ms warm cache
>> 
>> Which is... non trivial to say the least.
>> 
>> The big jump is the patches.only change, and the removal of ordering
>> on the patchseries takes a further 10ms off. For some strange reason, it
>> seems rather hard to tell Django that you don't care what order the
>> results come back in for that query (if we do, then the db server has to
>> do a sort rather than just return each row)
>> 
>> Signed-off-by: Stewart Smith 
>
> Tested this and it looks good to me. There's a migration missing (it's
> instead squashed into the first "add index" patch) but I can fix that
> at merge time.

Ahh, sorry about that.

> I was concerned that the combination of '.only' followed immediately by
> '.prefetch_related' (for checks and series) would be an issue and
> indeed it seems that, at this patch set at least, we do generate a
> query per check. However, this is no different to before and it
> resolved in a later check. As such:
>
> Reviewed-by: Stephen Finucane 
>
> I'm not going to push this (or any of the series) yet in case Daniel
> wants to weigh in but will do so next week if there are no complaints.

Thanks for testing it all, increases my confidence with the patches as
well :)

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 01/11] Improve patch listing performance (~3x)

2018-08-29 Thread Stewart Smith
Stephen Finucane  writes:
> On Sat, 2018-08-11 at 04:28 +1000, Daniel Axtens wrote:
>> Stewart Smith  writes:
>> So, further to our conversation with Konstantin, I tested this against
>> Django 2.0. It still saves us some time - it means we no longer load the
>> following fields:
>> 
>> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
>> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
>> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
>> `patchwork_patch`.`patch_project_id`, 
>
> Out of curiosity, does this allow us to drop the patch_project_id field
> entirely or not (bear in mind, I haven't read through the rest of
> this).

I'm not too sure... Umm... maybe not as I think we can go from patch to
project when viewing a patch?

>> This obviously saves the db some work and communication overhead.
>> 
>> I'm a little nervous that this will slightly complicate some of the
>> further denormalisation but I think that's probably a price worth paying
>> unless Stephen objects.
>
> To be honest, it could be 6 months before we get any further on big
> efforts like this. A 3x performance boost right now beats an unknown
> boost N months down the line, IMO. My 2c anyway.

I'd tend to agree, we may as well get the maximum performance we can
today, and then if it's *really* needed later down the track, we can
change the schema a bunch to denormalise.

From my benchmarks, the biggest boost currently would be to precompute
and cache patch counts for most of the common views, as a *lot* of page
time goes on that.
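
A sketch of what that could look like with Django's cache framework (key
naming, timeout, and the helper itself are illustrative, not anything in the
tree; invalidating on archive/state changes is the part that needs actual
thought):

  from django.core.cache import cache

  from patchwork.models import Patch

  def cached_patch_count(project, archived, timeout=300):
      key = 'patch-count:{}:{}'.format(project.id, int(archived))
      return cache.get_or_set(
          key,
          lambda: Patch.objects.filter(patch_project=project,
                                       archived=archived).count(),
          timeout)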

>> I do still want to test the 'ordering' change a bit more before
>> committing it though.
>
> I'm going to attempt to do my own testing of this (with a much larger
> dataset) some evening this week. I'll hold off applying anything
> pending this though.

Nice, let me know how it goes. I'll be off somewhere in a national park
for a bunch of next week, which means I won't look at email, so it's the
*perfect* time to merge and deploy all of my patches :)

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 02/11] 4x performance improvement for viewing patch with many comments

2018-08-14 Thread Stewart Smith
Daniel Axtens  writes:
> Stewart Smith  writes:
>
>> Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com:
>> in my test dataset (a chunk of a variety of mailing lists), this cover
>> letter has 67 comments from a variety of people. Thus, it's on the larger
>> side of things.
>>
>> Originally, displaying the /patch/550/ for this (redirected to /cover)
>> would take 81 SQL queries in ~60ms on my laptop.
>>
>> After this optimisation, it's down to 14 queries in 14ms.
>>
>> When the cache is cold, it's down to 32ms from 83ms.
>>
>> The effect of this patch is to execute a join in the database to
>> get the submitter information for each comment at the same time as
>> getting all the comments rather than doing a one-by-one lookup after
>> the fact.
>>
>> Signed-off-by: Stewart Smith 
>> ---
>>  patchwork/templates/patchwork/submission.html | 2 +-
>>  patchwork/views/cover.py  | 5 +
>>  patchwork/views/patch.py  | 5 +
>>  3 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
>> index e817713f7079..2f69735d6925 100644
>> --- a/patchwork/templates/patchwork/submission.html
>> +++ b/patchwork/templates/patchwork/submission.html
>> @@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
>>  
>>  
>>  
>> -{% for item in submission.comments.all %}
>> +{% for item in comments %}
>>  {% if forloop.first %}
>>  Comments
>>  {% endif %}
>> diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
>> index 73f83cb99b99..edad90bc694d 100644
>> --- a/patchwork/views/cover.py
>> +++ b/patchwork/views/cover.py
>> @@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
>>  'project': cover.project,
>>  }
>>  
>> +comments = cover.comments.all()
>> +comments = comments.select_related('submitter')
>> +comments = comments.only('submitter','date','id','content','submission')
> These items should be separated by ", " not just ",". I can fix this up
> when I apply. You also don't need 'id' as it's added automatically.

(thanks for fixup).

Hrm.. I wonder why I had id there then... I wouldn't put it past Django
to do something silly otherwise, but I could well be wrong.

-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH 01/11] Improve patch listing performance (~3x)

2018-08-12 Thread Stewart Smith
Daniel Axtens  writes:
> Stewart Smith  writes:
>
>> There are two main bits that are really expensive when composing the list
>> of patches for a project: the query getting the list, and the query
>> finding the series for each patch.
>>
>> If we look at the query getting the list, it fetches a lot of unnecessary
>> fields such as 'headers' and 'content', even though we tell Django not
>> to. It turns out that Django seems to ignore the Submission relationship
>> and I have no idea how to force it to ignore that thing (defer() doesn't
>> work), but if we use only(), then it works okay.
>>
>> From my import of ~8000 messages for a few projects, my laptop query
>> time (MySQL, as setup by whatever the docker-compose things do) goes
>> from:
>>
>> http://localhost:8000/project/linux-kernel/list/
>> FROM:
>> 342ms SQL queries cold cache, 268ms warm cache
>> TO:
>> 118ms SQL queries cold cache, 88ms warm cache
>>
>> Which is... non trivial to say the least.
>>
>> The big jump is the patches.only change, and the removal of ordering
>> on the patchseries takes a further 10ms off. For some strange reason, it
>> seems rather hard to tell Django that you don't care what order the
>> results come back in for that query (if we do, then the db server has to
>> do a sort rather than just return each row)
>
> Thanks Stewart! It's great to get some real DB experience - feel free to
> hang around! :)

I'll try to :)

Or at least pop up to make things faster/nicer to the database.

> So, further to our conversation with Konstantin, I tested this against
> Django 2.0. It still saves us some time - it means we no longer load the
> following fields:
>
> `patchwork_submission`.`id`, `patchwork_submission`.`msgid`,
> `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`,
> `patchwork_patch`.`archived`, `patchwork_patch`.`hash`,
> `patchwork_patch`.`patch_project_id`, 
>
> This obviously saves the db some work and communication overhead.

I found that 'headers' and 'content' from the patch were also being
fetched, and this change saves fetching those (which is really quite
expensive, as blobs are typically stored on a separate page from the
row itself).

It's the blobs that were the big expense, as they'd likely generate 2
disk seeks and reads per row.

> I'm a little nervous that this will slightly complicate some of the
> further denormalisation but I think that's probably a price worth paying
> unless Stephen objects.

I don't think it should really be too much of a problem: if any
denormalisation occurs, folk are going to have to look at the queries
being produced and executed anyway, otherwise there's always a chance of
regressions.
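
Two standard ways to do that query inspection, for reference (plain Django,
nothing Patchwork-specific; pass in whatever queryset is being built):

  from django.db import connection
  from django.test.utils import CaptureQueriesContext

  def show_queries(queryset):
      print(queryset.query)             # the SQL Django will emit for this queryset
      with CaptureQueriesContext(connection) as ctx:
          list(queryset)                # force evaluation
      print(len(ctx.captured_queries), 'queries executed')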

> I do still want to test the 'ordering' change a bit more before
> committing it though.

Yeah, that struck me as a weird one, but it certainly did have an effect
on some things.
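
(For completeness: on a plain queryset, order_by() with no arguments is the
documented way to drop Meta.ordering; the implicit 'series' prefetch goes
through the default related manager, which is presumably why removing
Series.Meta.ordering is the simpler route here. A generic sketch:)

  # Assumes `from patchwork.models import Series` and a Project instance;
  # no ORDER BY is emitted for this query, regardless of Series.Meta.ordering.
  def unordered_series(project):
      return Series.objects.filter(project=project).order_by()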

-- 
Stewart Smith
OPAL Architect, IBM.



[PATCH 03/11] Add index for patchwork_comment (submission_id,date)

2018-08-10 Thread Stewart Smith
This (at least theoretically) should speed up displaying comments
on patches/cover letters. It's an index that will return rows
in-order for the query that we always do ("give me the comments
on this submission in date order"), rather than having to have
the database server do a sort for us.

I haven't been able to benchmark something locally that shows
this is an actual improvement, but I don't have as large a data
set as various production instances. The query plan does look
a bit nicer though, although whether the gain outweighs the cost
of index maintenance versus just sorting is a good question.

Signed-off-by: Stewart Smith 
---
 .../migrations/0027_add_comment_date_index.py | 23 +++
 patchwork/models.py   |  4 
 2 files changed, 27 insertions(+)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py

diff --git a/patchwork/migrations/0027_add_comment_date_index.py b/patchwork/migrations/0027_add_comment_date_index.py
new file mode 100644
index ..0a57a9c3b212
--- /dev/null
+++ b/patchwork/migrations/0027_add_comment_date_index.py
@@ -0,0 +1,23 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 14:03
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+dependencies = [
+('patchwork', '0026_add_user_bundles_backref'),
+]
+
+operations = [
+migrations.AlterModelOptions(
+name='series',
+options={'verbose_name_plural': 'Series'},
+),
+migrations.AddIndex(
+model_name='comment',
+index=models.Index(fields=['submission', 'date'], name='submission_date_idx'),
+),
+]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2b88fc48c91..d2389cfdad29 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -625,6 +625,10 @@ class Comment(EmailMixin, models.Model):
 class Meta:
 ordering = ['date']
 unique_together = [('msgid', 'submission')]
+indexes = [
+models.Index(name='submission_date_idx',
+ fields=['submission','date'])
+]
 
 
 @python_2_unicode_compatible
-- 
2.17.1



[PATCH 06/11] check distinct(user) based on just user_id

2018-08-10 Thread Stewart Smith
The logic to display the Check(s) on the patch list really wants to do a
DISTINCT(user_id, context) ORDER BY date query, but with Django that is
currently a bit Too Hard (at least for me).

But what we can do from Python is just use the user_id rather than the
user object itself. Same functionality, with no join and no prefetching
of users.

This saves a couple of hundred queries on the linuxppc/list/ view and
makes loading it about 4x faster in terms of time spent in the db.

Signed-off-by: Stewart Smith 
---
 patchwork/models.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 15224ad69cfa..d356a6379ac3 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -539,7 +539,7 @@ class Patch(SeriesMixin, Submission):
 
 for check in self.check_set.all():
 ctx = check.context
-user = check.user
+user = check.user_id
 
 if user in unique and ctx in unique[user]:
 # recheck condition - ignore the older result
-- 
2.17.1
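
The underlying Django behaviour, as a short illustration (not runnable on its
own; assumes a Patch instance and the Check.user ForeignKey as in the models
above):

  def check_users(patch):
      for check in patch.check_set.all():
          u = check.user       # lazy FK access: an extra SELECT per check unless prefetched
          uid = check.user_id  # reads the column already loaded with the row: no query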



[PATCH 05/11] Add covering index for /list/ query

2018-08-10 Thread Stewart Smith
In constructing the list of patches for a project, there are two
main queries that are executed:
1) get a count() of how many patches there are
2) Get the page of results being displayed

In a test dataset of ~11500 LKML patches and ~4000 others, the existing
code would take around 585ms and 858ms with a cold cache and 28ms and
198ms for a warm cache.

By adding a covering index, we get down to 4ms and 255ms for a cold
cache, and 4ms and 143ms for a warm cache!

Additionally, when there's a lot of archived or accepted patches
(I used ~11000 archived out of the 15000 total in my test set)
the query time goes from 28ms and 72ms down to 2ms and 33-40ms!

Signed-off-by: Stewart Smith 
---
 .../0028_add_list_covering_index.py   | 19 +++
 patchwork/models.py   |  6 ++
 2 files changed, 25 insertions(+)
 create mode 100644 patchwork/migrations/0028_add_list_covering_index.py

diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
new file mode 100644
index ..65ebaefbead7
--- /dev/null
+++ b/patchwork/migrations/0028_add_list_covering_index.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 17:24
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+dependencies = [
+('patchwork', '0027_add_comment_date_index'),
+]
+
+operations = [
+migrations.AddIndex(
+model_name='patch',
+index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
+),
+]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2389cfdad29..15224ad69cfa 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission):
 if django.VERSION >= (1, 10):
 base_manager_name = 'objects'
 
+indexes = [
+# This is a covering index for the /list/ query
+models.Index(fields=['archived','patch_project','state','delegate'],
+ name='patch_list_covering_idx'),
+]
+
 
 class Comment(EmailMixin, models.Model):
 # parent
-- 
2.17.1



[PATCH 10/11] Be sensible computing project patch counts

2018-08-10 Thread Stewart Smith
Django actively fights constructing a query that isn't insane.

So, let's go and just execute a raw one. This is all very standard
SQL so should execute everywhere without a problem.

With the dataset of patchwork.ozlabs.org, looking at the /project/
page for qemu-devel would take 13 queries and 1500ms,
with this patch it's down to 11 queries in ~250ms.
For the dataset of the netdev list, it's down to 440ms from 1500ms.

Signed-off-by: Stewart Smith 
---
 patchwork/views/project.py | 23 ---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/patchwork/views/project.py b/patchwork/views/project.py
index 484455c02d9d..2a75242a06af 100644
--- a/patchwork/views/project.py
+++ b/patchwork/views/project.py
@@ -27,6 +27,8 @@ from patchwork.compat import reverse
 from patchwork.models import Patch
 from patchwork.models import Project
 
+from django.db import connection
+
 
 def project_list(request):
 projects = Project.objects.all()
@@ -44,14 +46,29 @@ def project_list(request):
 
 def project_detail(request, project_id):
 project = get_object_or_404(Project, linkname=project_id)
-patches = Patch.objects.filter(project=project)
+
+# So, we revert to raw sql because if we do what you'd think would
+# be the correct thing in Django-ese, it ends up doing a *pointless*
+# join with patchwork_submissions that ends up ruining the query.
+# So, we do not do this, as this is wrong:
+#patches = Patch.objects.filter(patch_project_id=project.id).only('archived')
+#patches = patches.annotate(c=Count('archived'))
+# and instead do this, because it's simple and fast
+
+n_patches = {}
+with connection.cursor() as cursor:
+c = cursor.execute('SELECT archived,COUNT(submission_ptr_id) as c FROM patchwork_patch WHERE patch_project_id=%s GROUP BY archived',
+   [project.id])
+
+for r in cursor:
+n_patches[r[0]] = r[1]
 
 context = {
 'project': project,
 'maintainers': User.objects.filter(
 profile__maintainer_projects=project),
-'n_patches': patches.filter(archived=False).count(),
-'n_archived_patches': patches.filter(archived=True).count(),
+'n_patches': n_patches[False],
+'n_archived_patches': n_patches[True],
 'enable_xmlrpc': settings.ENABLE_XMLRPC,
 }
 return render(request, 'patchwork/project.html', context)
-- 
2.17.1



[PATCH 07/11] Be particular over check_set and series prefetch for /list/

2018-08-10 Thread Stewart Smith
At this point it shaves at most 1-2ms off the query time for
/linuxppc-dev/list/

Signed-off-by: Stewart Smith 
---
 patchwork/views/__init__.py | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index 96fd0798af5a..64d3d5f9f22d 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -19,6 +19,7 @@
 
 from django.contrib import messages
 from django.shortcuts import get_object_or_404
+from django.db.models import Prefetch
 
 from patchwork.compat import is_authenticated
 from patchwork.filters import Filters
@@ -27,6 +28,8 @@ from patchwork.models import Bundle
 from patchwork.models import BundlePatch
 from patchwork.models import Patch
 from patchwork.models import Project
+from patchwork.models import Check
+from patchwork.models import Series
 from patchwork.paginator import Paginator
 
 
@@ -290,7 +293,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
 patches = patches.only('state','submitter','delegate','project','name','date')
 
 # we also need checks and series
-patches = patches.prefetch_related('check_set', 'series')
+patches = patches.prefetch_related(Prefetch('check_set', queryset=Check.objects.only('context','user_id','patch_id','state','date')))
+patches = patches.prefetch_related(Prefetch('series', queryset=Series.objects.only('name')))
 
 paginator = Paginator(request, patches)
 
-- 
2.17.1



[PATCH 01/11] Improve patch listing performance (~3x)

2018-08-10 Thread Stewart Smith
There are two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.

If we look at the query getting the list, it fetches a lot of unnecessary
fields such as 'headers' and 'content', even though we tell Django not
to. It turns out that Django seems to ignore the Submission relationship
and I have no idea how to force it to ignore that thing (defer() doesn't
work), but if we use only(), then it works okay.

From my import of ~8000 messages for a few projects, my laptop query
time (MySQL, as setup by whatever the docker-compose things do) goes
from:

http://localhost:8000/project/linux-kernel/list/
FROM:
342ms SQL queries cold cache, 268ms warm cache
TO:
118ms SQL queries cold cache, 88ms warm cache

Which is... non trivial to say the least.

The big jump is the patches.only change, and the removal of ordering
on the patchseries takes a further 10ms off. For some strange reason, it
seems rather hard to tell Django that you don't care what order the
results come back in for that query (if we do, then the db server has to
do a sort rather than just return each row)

Signed-off-by: Stewart Smith 
---
 patchwork/models.py | 1 -
 patchwork/views/__init__.py | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 6268f5b72e3c..d2b88fc48c91 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model):
 return self.name if self.name else 'Untitled series #%d' % self.id
 
 class Meta:
-ordering = ('date',)
 verbose_name_plural = 'Series'
 
 
diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index f8d23a388ac7..96fd0798af5a 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
 # rendering the list template
 patches = patches.select_related('state', 'submitter', 'delegate')
 
+patches = patches.only('state','submitter','delegate','project','name','date')
+
 # we also need checks and series
 patches = patches.prefetch_related('check_set', 'series')
 
-- 
2.17.1



[PATCH 04/11] Fetch all series for patch/cover viewing

2018-08-10 Thread Stewart Smith
e.g. a 10 comment patch goes from 26 queries in 17-20ms down to 20
queries in 12ms.

A 67 comment cover letter goes from 14 queries in 16ms down to 8 queries
in 8ms.

So, effectively, a near 2x perf improvement.

Previously, at several points we were asking for the latest series and
then asking for all the series. Since there just usually aren't *that*
many series, fetch them all and take the first one if we need to.

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 10 +-
 patchwork/views/cover.py  |  2 +-
 patchwork/views/patch.py  |  1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index 2f69735d6925..3b6f9fbe909e 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -64,15 +64,15 @@ function toggle_div(link_id, headers_id)

   
  
-{% if submission.latest_series %}
+{% if submission.all_series %}
  
   Series
   

 
- {% for series in submission.series.all %}
+ {% for series in all_series %}
  
-  {% if series == submission.latest_series %}
+  {% if forloop.first %}
{{ series }}
   {% else %}

@@ -93,7 +93,7 @@ function toggle_div(link_id, headers_id)
>show

 
-{% with submission.latest_series.cover_letter as cover %}
+{% with all_series.cover_letter as cover %}
  
  {% if cover %}
   {% if cover == submission %}
@@ -106,7 +106,7 @@ function toggle_div(link_id, headers_id)
  {% endif %}
  
 {% endwith %}
-{% for sibling in submission.latest_series.patches.all %}
+{% for sibling in all_series.patches.all %}
  
   {% if sibling == submission %}
{{ sibling.name|default:"[no subject]"|truncatechars:100 }}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index edad90bc694d..1ee2b3f988fa 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -49,7 +49,7 @@ def cover_detail(request, cover_id):
 comments = comments.select_related('submitter')
 comments = comments.only('submitter','date','id','content','submission')
 context['comments'] = comments
-
+context['all_series'] = cover.series.all().order_by('-date')
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index f43fbecd9a4d..e1d0cdcfcf39 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -118,6 +118,7 @@ def patch_detail(request, patch_id):
 comments = comments.select_related('submitter')
 comments = comments.only('submitter','date','id','content','submission')
 
+context['all_series'] = patch.series.all().order_by('-date')
 context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
-- 
2.17.1



[PATCH 08/11] Add covering index to patchwork_submissions for /list/ queries

2018-08-10 Thread Stewart Smith
This gets PostgreSQL to generate *much* better query plans, gaining us
about two orders of magnitude in performance on the /list/ query for the
worst state project on the patchwork.ozlabs.org instance (qemu-devel).

Signed-off-by: Stewart Smith 
---
 .../0029_add_submission_covering_index.py | 19 +++
 patchwork/models.py   |  8 
 2 files changed, 27 insertions(+)
 create mode 100644 patchwork/migrations/0029_add_submission_covering_index.py

diff --git a/patchwork/migrations/0029_add_submission_covering_index.py b/patchwork/migrations/0029_add_submission_covering_index.py
new file mode 100644
index ..064f2ac4ae9d
--- /dev/null
+++ b/patchwork/migrations/0029_add_submission_covering_index.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-10 16:15
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+dependencies = [
+('patchwork', '0028_add_list_covering_index'),
+]
+
+operations = [
+migrations.AddIndex(
+model_name='submission',
+index=models.Index(fields=['date', 'project', 'submitter', 'name'], name='submission_covering_idx'),
+),
+]
diff --git a/patchwork/models.py b/patchwork/models.py
index d356a6379ac3..7a8d363c4652 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -389,6 +389,14 @@ class Submission(FilenameMixin, EmailMixin, models.Model):
 class Meta:
 ordering = ['date']
 unique_together = [('msgid', 'project')]
+indexes = [
+# This is a covering index for the /list/ query
+# Like what we have for Patch, but used for displaying what we want
+# rather than for working out the count
+# (of course, this all depends on the SQL optimiser of your db engine)
+models.Index(fields=['date','project','submitter','name'],
+ name='submission_covering_idx'),
+]
 
 
 class SeriesMixin(object):
-- 
2.17.1



[PATCH 00/11] Performance for ALL THE THINGS!

2018-08-10 Thread Stewart Smith
This is an updated and extended patch series of the few patches I've
sent out over the past few days.

I've had a pretty major attack at performance of as many patchwork pages
and bits of functionality that I could find.

I've been using a mixture of test environments for this, initially a
small dataset in MySQL, and later on, the data that's in
patchwork.ozlabs.org in PostgreSQL.

I've done some testing with a warm cache and cold cache (standard drop
caches trick) to benchmark what the difference could be for things that
aren't too commonly viewed.

As of now, everything except adding patches to a bundle seems to be
fewer than 20 SQL queries, and worst case for a cold cache seems to be a
total of about a second in the database.

I think the next big wins are going to be:
- Adding a 'date' column to patchwork_patch so we don't have to go to
  patchwork_submission for it.
  (and then bending Django to produce a decent query that doesn't go
  there anyway)
- Computing/caching patch counts for common queries.
  e.g. right now, just viewing the /list/ page for qemu-devel takes 26ms
  for the query to display the first page of patches, but 226ms to do the
  COUNT(). Even on high page numbers, where the query to generate the
  list could be 700ms on a cold cache, 226ms is a lot.
  i.e. a cache of the matrix of action_required/archived would give us a
  huge boost in page generation.
- Redoing the create/add/remove from bundle logic to not do everything
  patch-at-a-time.
- Tackling the efficiency of the python code itself.

With this series, I'm struggling to find any query or page that takes
more than a second of database time on my laptop, with the worst cases
being around 400ms, and most being <40ms.

Let me know if there's still anything slow with this patch set :)
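
On the bundle item in the list above, the rough direction would be a single
bulk insert rather than per-patch saves; a sketch only, assuming the
BundlePatch through-model and its order field, and glossing over duplicate
handling:

  from patchwork.models import BundlePatch

  def add_patches_to_bundle(bundle, patches):
      start = bundle.bundlepatch_set.count()
      BundlePatch.objects.bulk_create([
          BundlePatch(bundle=bundle, patch=patch, order=start + i + 1)
          for i, patch in enumerate(patches)
      ])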

This series replaces:
https://patchwork.ozlabs.org/patch/954921/
https://patchwork.ozlabs.org/patch/955321/
https://patchwork.ozlabs.org/patch/955322/
https://patchwork.ozlabs.org/patch/955342/
https://patchwork.ozlabs.org/patch/955347/
https://patchwork.ozlabs.org/patch/955392/

Stewart Smith (11):
  Improve patch listing performance (~3x)
  4x performance improvement for viewing patch with many comments
  Add index for patchwork_comment (submission_id,date)
  Fetch all series for patch/cover viewing
  Add covering index for /list/ query
  check distinct(user) based on just user_id
  Be particular over check_set and series prefetch for /list/
  Add covering index to patchwork_submissions for /list/ queries
  Optimise fetching checks when displaying a patch
  Be sensible computing project patch counts
  Fetch maintainer information in one query

 .../migrations/0027_add_comment_date_index.py | 23 +
 .../0028_add_list_covering_index.py   | 19 ++
 .../0029_add_submission_covering_index.py | 19 ++
 patchwork/models.py   | 21 ++--
 patchwork/templates/patchwork/submission.html | 16 ++--
 patchwork/views/__init__.py   |  8 +-
 patchwork/views/cover.py  |  5 
 patchwork/views/patch.py  |  7 ++
 patchwork/views/project.py| 25 ---
 9 files changed, 128 insertions(+), 15 deletions(-)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py
 create mode 100644 patchwork/migrations/0028_add_list_covering_index.py
 create mode 100644 patchwork/migrations/0029_add_submission_covering_index.py

-- 
2.17.1



[PATCH 11/11] Fetch maintainer information in one query

2018-08-10 Thread Stewart Smith
Viewing the /project/ page lists maintainers. Prior to this patch,
this was done in one query to fetch the maintainer IDs, and then one
query per maintainer to get the name/email address.

Now, with this patch, it's all in one query (yay joins) and saves
a few ms of database queries for displaying the page.

Realistically, this doesn't save us too much time as counting how many
patches are there takes 99% of the database time for this page.

Signed-off-by: Stewart Smith 
---
 patchwork/views/project.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/patchwork/views/project.py b/patchwork/views/project.py
index 2a75242a06af..fdcc6626eda1 100644
--- a/patchwork/views/project.py
+++ b/patchwork/views/project.py
@@ -66,7 +66,7 @@ def project_detail(request, project_id):
 context = {
 'project': project,
 'maintainers': User.objects.filter(
-profile__maintainer_projects=project),
+profile__maintainer_projects=project).select_related('profile'),
 'n_patches': n_patches[False],
 'n_archived_patches': n_patches[True],
 'enable_xmlrpc': settings.ENABLE_XMLRPC,
-- 
2.17.1



[PATCH 02/11] 4x performance improvement for viewing patch with many comments

2018-08-10 Thread Stewart Smith
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com:
in my test dataset (a chunk of a variety of mailing lists), this cover
letter has 67 comments from a variety of people. Thus, it's on the larger
side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop.

After this optimisation, it's down to 14 queries in 14ms.

When the cache is cold, it's down to 32ms from 83ms.

The effect of this patch is to execute a join in the database to
get the submitter information for each comment at the same time as
getting all the comments rather than doing a one-by-one lookup after
the fact.

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py  | 5 +
 patchwork/views/patch.py  | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
 
 
 
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
 'project': cover.project,
 }
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+context['comments'] = comments
+
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..f43fbecd9a4d 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
 if is_authenticated(request.user):
 context['bundles'] = request.user.bundles.all()
 
+comments = patch.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+
+context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
 context['createbundleform'] = createbundleform
-- 
2.17.1



[PATCH 09/11] Optimise fetching checks when displaying a patch

2018-08-10 Thread Stewart Smith
Prior to this patch, a typical /patch// query for linuxppc-dev
(which has about half a dozen checks per patch) took around 20 queries
and 16.5ms in the database. About half of those queries were fetching
the checks and who did the check.

We can just do one query to get all that needed information, so we do
that. This brings a page load down to 10 queries in 12ms.

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 4 ++--
 patchwork/views/patch.py  | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index 3b6f9fbe909e..72bc947c3f2d 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -216,7 +216,7 @@ function toggle_div(link_id, headers_id)
  >{{ submission.pull_url }}
 {% endif %}
 
-{% if submission.checks %}
+{% if checks %}
 Checks
 
 
@@ -224,7 +224,7 @@ function toggle_div(link_id, headers_id)
   Check
   Description
 
-{% for check in submission.checks %}
+{% for check in checks %}
 
   {{ check.user }}/{{ check.context }}
   
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index e1d0cdcfcf39..fbde04280844 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -119,6 +119,7 @@ def patch_detail(request, patch_id):
 comments = comments.only('submitter','date','id','content','submission')
 
 context['all_series'] = patch.series.all().order_by('-date')
+context['checks'] = patch.check_set.all().select_related('user')
 context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
-- 
2.17.1



[PATCH] Add covering index for /list/ query

2018-08-09 Thread Stewart Smith
In constructing the list of patches for a project, there are two
main queries that are executed:
1) get a count() of how many patches there are
2) Get the page of results being displayed

In a test dataset of ~11500 LKML patches and ~4000 others, the existing
code would take around 585ms and 858ms with a cold cache and 28ms and
198ms for a warm cache.

By adding a covering index, we get down to 4ms and 255ms for a cold
cache, and 4ms and 143ms for a warm cache!

Additionally, when there's a lot of archived or accepted patches
(I used ~11000 archived out of the 15000 total in my test set)
the query time goes from 28ms and 72ms down to 2ms and 33-40ms!

Signed-off-by: Stewart Smith 
---
 .../0028_add_list_covering_index.py   | 19 +++
 patchwork/models.py   |  6 ++
 2 files changed, 25 insertions(+)
 create mode 100644 patchwork/migrations/0028_add_list_covering_index.py

diff --git a/patchwork/migrations/0028_add_list_covering_index.py b/patchwork/migrations/0028_add_list_covering_index.py
new file mode 100644
index ..65ebaefbead7
--- /dev/null
+++ b/patchwork/migrations/0028_add_list_covering_index.py
@@ -0,0 +1,19 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 17:24
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+dependencies = [
+('patchwork', '0027_add_comment_date_index'),
+]
+
+operations = [
+migrations.AddIndex(
+model_name='patch',
+index=models.Index(fields=['archived', 'patch_project', 'state', 'delegate'], name='patch_list_covering_idx'),
+),
+]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2389cfdad29..15224ad69cfa 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -598,6 +598,12 @@ class Patch(SeriesMixin, Submission):
 if django.VERSION >= (1, 10):
 base_manager_name = 'objects'
 
+indexes = [
+# This is a covering index for the /list/ query
+models.Index(fields=['archived','patch_project','state','delegate'],
+ name='patch_list_covering_idx'),
+]
+
 
 class Comment(EmailMixin, models.Model):
 # parent
-- 
2.17.1



[PATCH] Fetch all series for patch/cover viewing

2018-08-08 Thread Stewart Smith
e.g. a 10 comment patch goes from 26 queries in 17-20ms down to 20
queries in 12ms.

A 67 comment cover letter goes from 14 queries in 16ms down to 8 queries
in 8ms.

So, effectively, a near 2x perf improvement.

Previously, at several points we were asking for the latest series and
then asking for all the series. Since there just usually aren't *that*
many series, fetch them all and take the first one if we need to.

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 10 +-
 patchwork/views/cover.py  |  2 +-
 patchwork/views/patch.py  |  1 +
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index 2f69735d6925..3b6f9fbe909e 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -64,15 +64,15 @@ function toggle_div(link_id, headers_id)

   
  
-{% if submission.latest_series %}
+{% if submission.all_series %}
  
   Series
   

 
- {% for series in submission.series.all %}
+ {% for series in all_series %}
  
-  {% if series == submission.latest_series %}
+  {% if forloop.first %}
{{ series }}
   {% else %}

@@ -93,7 +93,7 @@ function toggle_div(link_id, headers_id)
>show

 
-{% with submission.latest_series.cover_letter as cover %}
+{% with all_series.cover_letter as cover %}
  
  {% if cover %}
   {% if cover == submission %}
@@ -106,7 +106,7 @@ function toggle_div(link_id, headers_id)
  {% endif %}
  
 {% endwith %}
-{% for sibling in submission.latest_series.patches.all %}
+{% for sibling in all_series.patches.all %}
  
   {% if sibling == submission %}
{{ sibling.name|default:"[no subject]"|truncatechars:100 }}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index edad90bc694d..1ee2b3f988fa 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -49,7 +49,7 @@ def cover_detail(request, cover_id):
 comments = comments.select_related('submitter')
 comments = comments.only('submitter','date','id','content','submission')
 context['comments'] = comments
-
+context['all_series'] = cover.series.all().order_by('-date')
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index f43fbecd9a4d..e1d0cdcfcf39 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -118,6 +118,7 @@ def patch_detail(request, patch_id):
 comments = comments.select_related('submitter')
 comments = comments.only('submitter','date','id','content','submission')
 
+context['all_series'] = patch.series.all().order_by('-date')
 context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH v2] 4x performance improvement for viewing patch with many comments

2018-08-08 Thread Stewart Smith
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com
from my test dataset (a chunk of a variety of mailing lists): this
cover letter has 67 comments from a variety of people, so it's on the
larger side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop.

After this optimisation, it's down to 14 queries in 14ms.

When the cache is cold, it's down to 32ms from 83ms.

The effect of this patch is to execute a join in the database to
get the submitter information for each comment at the same time as
getting all the comments rather than doing a one-by-one lookup after
the fact.
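
In ORM terms, roughly (a sketch; the SQL shown is only approximate):

    # Before: 1 query for the comments + 1 per comment for its submitter.
    for c in patch.comments.all():
        print(c.submitter.name)     # each access hits patchwork_person again

    # After: a single joined query fetches comments and submitters together.
    comments = (patch.comments.all()
                     .select_related('submitter')
                     .only('submitter', 'date', 'id', 'content', 'submission'))
    # roughly: SELECT ... FROM patchwork_comment
    #            JOIN patchwork_person ON person.id = comment.submitter_id
    #           WHERE comment.submission_id = %s ORDER BY comment.date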

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py  | 5 +
 patchwork/views/patch.py  | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)

---
Changes since v1: don't fat finger patch viewing (s/cover/patch/)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
 
 
 
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
 'project': cover.project,
 }
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+context['comments'] = comments
+
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..f43fbecd9a4d 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
 if is_authenticated(request.user):
 context['bundles'] = request.user.bundles.all()
 
+comments = patch.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+
+context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
 context['createbundleform'] = createbundleform
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH] Add index for patchwork_comment (submission_id,date)

2018-08-08 Thread Stewart Smith
This (at least theoretically) should speed up displaying comments
on patches/cover letters. It's an index that will return rows
in-order for the query that we always do ("give me the comments
on this submission in date order"), rather than having to have
the database server do a sort for us.

I haven't been able to benchmark something locally that shows
this is an actual improvement, but I don't have as large a data set
as various production instances. The query plan does look
a bit nicer though. Although the benefit of index maintenance
versus how long it takes to sort things is a good question.
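
(For anyone who wants to eyeball the plan themselves: Django 2.1+ has
QuerySet.explain(), so something like this sketch works from a shell,
with some_submission being any patch or cover letter:)

    from patchwork.models import Comment

    # With the (submission, date) index in place this should show the index
    # satisfying both the filter and the ordering, i.e. no filesort.
    print(Comment.objects.filter(submission=some_submission)
                         .order_by('date')
                         .explain())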

Signed-off-by: Stewart Smith 
---
 .../migrations/0027_add_comment_date_index.py | 23 +++
 patchwork/models.py   |  4 
 2 files changed, 27 insertions(+)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py

diff --git a/patchwork/migrations/0027_add_comment_date_index.py b/patchwork/migrations/0027_add_comment_date_index.py
new file mode 100644
index ..0a57a9c3b212
--- /dev/null
+++ b/patchwork/migrations/0027_add_comment_date_index.py
@@ -0,0 +1,23 @@
+# -*- coding: utf-8 -*-
+# Generated by Django 1.11.15 on 2018-08-09 14:03
+from __future__ import unicode_literals
+
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+
+dependencies = [
+('patchwork', '0026_add_user_bundles_backref'),
+]
+
+operations = [
+migrations.AlterModelOptions(
+name='series',
+options={'verbose_name_plural': 'Series'},
+),
+migrations.AddIndex(
+model_name='comment',
+index=models.Index(fields=['submission', 'date'], name='submission_date_idx'),
+),
+]
diff --git a/patchwork/models.py b/patchwork/models.py
index d2b88fc48c91..d2389cfdad29 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -625,6 +625,10 @@ class Comment(EmailMixin, models.Model):
 class Meta:
 ordering = ['date']
 unique_together = [('msgid', 'submission')]
+indexes = [
+models.Index(name='submission_date_idx',
+ fields=['submission','date'])
+]
 
 
 @python_2_unicode_compatible
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[PATCH] 4x performance improvement for viewing patch with many comments

2018-08-08 Thread Stewart Smith
Using the example of id:20180720035941.6844-1-khand...@linux.vnet.ibm.com
from my test dataset (a chunk of a variety of mailing lists): this
cover letter has 67 comments from a variety of people, so it's on the
larger side of things.

Originally, displaying the /patch/550/ for this (redirected to /cover)
would take 81 SQL queries in ~60ms on my laptop.

After this optimisation, it's down to 14 queries in 14ms.

When the cache is cold, it's down to 32ms from 83ms.

The effect of this patch is to execute a join in the database to
get the submitter information for each comment at the same time as
getting all the comments rather than doing a one-by-one lookup after
the fact.

Signed-off-by: Stewart Smith 
---
 patchwork/templates/patchwork/submission.html | 2 +-
 patchwork/views/cover.py  | 5 +
 patchwork/views/patch.py  | 5 +
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/patchwork/templates/patchwork/submission.html b/patchwork/templates/patchwork/submission.html
index e817713f7079..2f69735d6925 100644
--- a/patchwork/templates/patchwork/submission.html
+++ b/patchwork/templates/patchwork/submission.html
@@ -262,7 +262,7 @@ function toggle_div(link_id, headers_id)
 
 
 
-{% for item in submission.comments.all %}
+{% for item in comments %}
 {% if forloop.first %}
 Comments
 {% endif %}
diff --git a/patchwork/views/cover.py b/patchwork/views/cover.py
index 73f83cb99b99..edad90bc694d 100644
--- a/patchwork/views/cover.py
+++ b/patchwork/views/cover.py
@@ -45,6 +45,11 @@ def cover_detail(request, cover_id):
 'project': cover.project,
 }
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+context['comments'] = comments
+
 return render_to_response('patchwork/submission.html', context)
 
 
diff --git a/patchwork/views/patch.py b/patchwork/views/patch.py
index cbd4ec395d99..bbd84954726e 100644
--- a/patchwork/views/patch.py
+++ b/patchwork/views/patch.py
@@ -114,6 +114,11 @@ def patch_detail(request, patch_id):
 if is_authenticated(request.user):
 context['bundles'] = request.user.bundles.all()
 
+comments = cover.comments.all()
+comments = comments.select_related('submitter')
+comments = comments.only('submitter','date','id','content','submission')
+
+context['comments'] = comments
 context['submission'] = patch
 context['patchform'] = form
 context['createbundleform'] = createbundleform
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Stewart Smith
Daniel Axtens  writes:
> Daniel Axtens  writes:
>
>> Konstantin Ryabitsev  writes:
>>
>>> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>>>There's two main bits that are really expensive when composing the list
>>>>of patches for a project: the query getting the list, and the query
>>>>finding the series for each patch.
>>>
>>> Stewart:
>>>
>>> Thanks for working on this! Do you think this would help with pagination 
>>> as well? I find that in semi-abandoned projects like 
>>> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
>>> seconds to load the list view due to 800+ pages of unprocessed patches.  
>>> I am currently considering an automated script that would auto-archive 
>>> patches older than 6 months, but if simply improving pagination times 
>>> would fix the issue, then I wouldn't need to bother.
>>
>> It should do. From a brief look at his patch, it fixes a hot path in
>> __init__.py that goes through the paginator (and which I was *sure* I
>> had looked at, but apparently I had not!) I'll do a quick sanity check
>
> Ah, I have figured out what I had thought I had fixed!
>
> So you're hitting the issue where the db has to fetch the 'diff',
> 'content' and 'headers' out of Submission just to list the patches,
> which is clearly utterly bonkers. This is *despite* the defer() we
> attempt to do - jk reported a Django bug for it: [1].

Yeah, that'd explain it. I was kind of going "wtf" a lot at that, and
then doing something kind of crazy to try and get things down to
something even *remotely* sane.

I was almost tempted to rewrite the page just as raw SQL rather than
using Django at all, at least then it won't randomly regress in the
future. I may be a cranky database guy with a healthy disdain for the
SQL that comes out of ORMs though :)
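
(Not that it has to be all-or-nothing: Manager.raw() maps hand-written SQL
back onto model instances. A sketch only -- the table and column names here
are illustrative, since the real schema keeps several of these fields on the
Submission parent table:)

    from patchwork.models import Patch

    # raw() needs the primary key in the SELECT; everything else is optional.
    patches = Patch.objects.raw(
        'SELECT id, name, date FROM patchwork_patch'
        ' WHERE patch_project_id = %s ORDER BY date DESC',
        [project.id])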

> I suppose it's worth taking the patch - the Django fix certainly isn't
> going to hit the 1.11 stable tree and we're not supporting 2.0 just
> yet. I do still want to check it properly first, so ...
>
>> on it and hopefully apply it to master and stable/2.1 within the next
>> few days. (but no promises!)
>
> ... this lack-of-a-firm-timeline still applies.

Any random testing / whatever I can do locally to try and help?

This is literally my first time poking at anything Django.

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Stewart Smith
Konstantin Ryabitsev  writes:
> On Wed, Aug 08, 2018 at 05:40:06PM +1000, Stewart Smith wrote:
>>There's two main bits that are really expensive when composing the list
>>of patches for a project: the query getting the list, and the query
>>finding the series for each patch.
>
> Stewart:
>
> Thanks for working on this! Do you think this would help with pagination 
> as well? I find that in semi-abandoned projects like 
> https://patchwork.kernel.org/project/qemu-devel/list/ it takes a few 
> seconds to load the list view due to 800+ pages of unprocessed patches.  
> I am currently considering an automated script that would auto-archive 
> patches older than 6 months, but if simply improving pagination times 
> would fix the issue, then I wouldn't need to bother.

I may be accidentally letting people know I know something about
databases again :)

Out of interest, does the kernel.org instance back onto PostgreSQL or
MySQL? I have a bunch of ideas that would quite dramatically improve
performance on MySQL (properly using primary keys to force disk layout
to be sane, rather than the current screw-locality method that Django
enforces). On PostgreSQL, things are a bit different, but it's possible
an occasional ALTER TABLE CLUSTER BY (or whatever the syntax is) could
help a *lot*.
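
(The PostgreSQL spelling is a one-off "CLUSTER <table> USING <index>". As a
sketch of how that could be wired into a migration -- the index name here is
hypothetical, it would need to be one that leads with the project column:)

    from django.db import migrations

    # One-off rewrite of the table in index order; PostgreSQL does *not*
    # maintain that ordering for future inserts/updates.
    migrations.RunSQL(
        'CLUSTER patchwork_patch USING patchwork_patch_project_idx;',
        reverse_sql=migrations.RunSQL.noop)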

I'm not sure that archiving the patches would help query performance at
all, but I'd have to check it out a bit. The query that Django generates
is certainly "interesting".

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


[RFC PATCH] 3x improvement in patch listing

2018-08-08 Thread Stewart Smith
There's two main bits that are really expensive when composing the list
of patches for a project: the query getting the list, and the query
finding the series for each patch.

If we look at the query getting the list, it fetches a lot of unnecessary
fields such as 'headers' and 'content', even though we tell Django not
to. It turns out that Django seems to ignore the Submission relationship
and I have no idea how to force it to ignore that thing (defer() doesn't
work), but if we use only() instead, it works okay.
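
That is, roughly (a sketch; the only() field list is the one used in the
patch below):

    # What we tried: defer() the big text columns. The blobs still come
    # back via the Submission parent link, so it changes nothing.
    patches = patches.defer('content', 'diff', 'headers')

    # What works: only() the handful of columns the list actually needs.
    patches = patches.only('state', 'submitter', 'delegate',
                           'project', 'name', 'date')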

From my import of ~8000 messages for a few projects, my laptop query
time (MySQL, as set up by whatever the docker-compose things do) goes
from:

http://localhost:8000/project/linux-kernel/list/
FROM:
342ms SQL queries cold cache, 268ms warm cache
TO:
118ms SQL queries cold cache, 88ms warm cache

Which is... non-trivial, to say the least.

The big jump is the patches.only() change; removing the ordering on the
Series model takes a further 10ms off. For some strange reason, it seems
rather hard to tell Django that you don't care what order the results
come back in for that query (if an order is requested, the db server has
to do a sort rather than just return each row).
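
(There is a per-query spelling -- an empty order_by() inside a Prefetch
object -- but dropping the Meta ordering, as below, is the simpler fix.
A sketch, untested against this view:)

    from django.db.models import Prefetch
    from patchwork.models import Series

    # order_by() with no arguments clears Series.Meta.ordering for just this
    # prefetch, so the server can return rows in whatever order is cheapest.
    patches = patches.prefetch_related(
        'check_set',
        Prefetch('series', queryset=Series.objects.order_by()))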

Signed-off-by: Stewart Smith 
---
 patchwork/models.py | 1 -
 patchwork/views/__init__.py | 2 ++
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/patchwork/models.py b/patchwork/models.py
index 6268f5b72e3c..d2b88fc48c91 100644
--- a/patchwork/models.py
+++ b/patchwork/models.py
@@ -747,7 +747,6 @@ class Series(FilenameMixin, models.Model):
 return self.name if self.name else 'Untitled series #%d' % self.id
 
 class Meta:
-ordering = ('date',)
 verbose_name_plural = 'Series'
 
 
diff --git a/patchwork/views/__init__.py b/patchwork/views/__init__.py
index f8d23a388ac7..96fd0798af5a 100644
--- a/patchwork/views/__init__.py
+++ b/patchwork/views/__init__.py
@@ -287,6 +287,8 @@ def generic_list(request, project, view, view_args=None, filter_settings=None,
 # rendering the list template
 patches = patches.select_related('state', 'submitter', 'delegate')
 
+patches = patches.only('state','submitter','delegate','project','name','date')
+
 # we also need checks and series
 patches = patches.prefetch_related('check_set', 'series')
 
-- 
2.17.1

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
CONSTRAINT `patchwork_patch_submitter_id_57c66142_fk_patchwork_person_id` 
FOREIGN KEY (`submitter_id`) REFERENCES `patchwork_person` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

simply getting a list of patches for a project here is going to be a
very expensive query, pulling in a lot of database pages containing data
for patches around that time rather than ones relevant to the query.

Again, I'd say go:
PRIMARY KEY ('project_id', 'id')
UNIQUE KEY ('id')

(the project_id index can then be dropped, as the primary key could be
used there).

Of course, things get better if you query everything looking up
project_id=X and id=Y rather than just id=Y.
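
In raw MySQL terms that is roughly the following -- an untested sketch,
assuming the column really is called project_id, that id stays
AUTO_INCREMENT (fine, since it remains the leftmost column of a key), and
that it gets trialled on a copy of the data first:

    from django.db import migrations

    # Swap the clustered key so one project's patches sit together on disk.
    # UNIQUE KEY (id) keeps AUTO_INCREMENT and the existing foreign keys
    # that reference patchwork_patch(id) satisfied; Django itself keeps
    # addressing rows by id, so only the physical layout changes.
    migrations.RunSQL(
        'ALTER TABLE patchwork_patch'
        ' DROP PRIMARY KEY,'
        ' ADD PRIMARY KEY (project_id, id),'
        ' ADD UNIQUE KEY (id)',
        reverse_sql=migrations.RunSQL.noop)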

For PostgreSQL it gets a bit interesting as I'm not as familiar with how
it does data layout. There *appears* to be a CLUSTER *command* that will
re-organise an existing table but *not* make any future inserts/updates
obey it (which is annoying).

This seems to be a bit of a theme throughout the schema too...

Actually... do we have a good test dataset I could load up locally and
experiment with?

> * Yes, I probably should have realised this at the time. something
>   something unpaid volunteers something something patch review something
>   something.

:)

pretty much neglecting some piece of software that ends up being vital
open source infrastructure is a grand tradition, welcome!

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Andrew Donnellan  writes:
> On 26/07/18 23:24, Daniel Axtens wrote:
>> Andrew Donnellan  writes:
>> 
>>> On 24/07/18 15:10, Andrew Donnellan wrote:
>>>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
>>>> size to the default page size in the settings.
>>>>
>>>> This turns out to be rather restrictive, as we usually want to keep the
>>>> default page size low, but an administrator may want to allow API clients
>>>> to fetch more than that per request.
>>>>
>>>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.
>>>>
>>>> Closes: #202 ("Separate max API page size and default API page size into 
>>>> different settings")
>>>> Suggested-by: Stewart Smith 
>>>> Suggested-by: Joel Stanley 
>>>> Signed-off-by: Andrew Donnellan 
>>>
>>> FWIW we now have this applied on patchwork.ozlabs.org and it appears to
>>> be working. Would like some more input as to what an appropriate default
>>> limit is.
>> 
>> My completely fact-free feeling/opinion is that if it takes more than
>> ~1sec to run on Ozlabs, it's probably too high. Apart from that, I'm not
>> fussed.
>
> OzLabs.org is not a fast machine. 1 second round trip time == 100 per 
> page. 500 per page == ~2.3 seconds round trip.
>
> A single 500 item load is still under half the time of doing 5 * 100 
> item loads...

I bet MySQL would execute the query quicker :)

Anyway, that sounds really quite slow and we should look at why on earth
it's that slow - is there some kind of horrific hidden join or poor
indexing or query plan or something?

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: [RFC PATCH] REST: Add new setting for maximum API page size

2018-08-08 Thread Stewart Smith
Andrew Donnellan  writes:
> On 24/07/18 15:10, Andrew Donnellan wrote:
>> In 41790caf59ad ("REST: Limit max page size") we limited the maximum page
>> size to the default page size in the settings.
>> 
>> This turns out to be rather restrictive, as we usually want to keep the
>> default page size low, but an administrator may want to allow API clients
>> to fetch more than that per request.
>> 
>> Add a new setting, MAX_REST_RESULTS_PER_PAGE, to set the maximum page size.
>> 
>> Closes: #202 ("Separate max API page size and default API page size into 
>> different settings")
>> Suggested-by: Stewart Smith 
>> Suggested-by: Joel Stanley 
>> Signed-off-by: Andrew Donnellan 
>
> FWIW we now have this applied on patchwork.ozlabs.org and it appears to 
> be working. Would like some more input as to what an appropriate default 
> limit is.

From a *consumer* of the API PoV, 500 at a time rather than 30 is at
least a 6x speedup of my use of it, so that was extremely welcome. I
haven't looked at the size of the responses I'm getting back so have no
idea if 500 is a good one or not (I suspect I'd have to start optimizing
my code around 700-1000 responses/call).

My *guess* is that a fresh SQL query is run for each page retrieved, so
maybe 500 is "good enough" while there isn't some way to just stream
everything and not run the query multiple times.
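
i.e. the consumer side ends up being a loop like this (a sketch: the host
name is made up, the per_page/project parameter names are my assumption,
and I'm assuming the instance advertises the next page via a Link header
the way DRF-style list endpoints usually do):

    import requests

    url = 'https://patchwork.example.org/api/patches/'
    params = {'project': 'some-project', 'per_page': 500}
    fetched = 0
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        fetched += len(resp.json())   # one server-side query per page fetched
        url = resp.links.get('next', {}).get('url')
        params = None                 # the next link already carries the query
    print(fetched, 'patches')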

-- 
Stewart Smith
OPAL Architect, IBM.

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork