Re: DB-murdering API query (index suggestions needed)

2020-03-16 Thread Daniel Axtens
Daniel Axtens  writes:

> Daniel Axtens  writes:
>
>>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state 
>>> of suffering due to someone trying to suck down all patches in the 
>>> linux-arm-kernel project. This is what the API request looked like:
>>>
>>> GET 
>>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>>  
>>>
>>> The query behind this takes about 1 minute to run on a 20-core HT Xeon 
>>> system and requires creating a huge temporary file (there are 18375 
>>> patches in that project).
>>
>> Ouch, I'm sorry to hear that.
>>
>>>
>>> So, two questions, really:
>>>
>>> 1. Any indexes we can put in place to make this query perform better?
>>
>> We have a bunch of db magic contributed by Stewart that will hit 2.2.
>>
>> Stewart, do you happen to know if any of your magic will affect API
>> queries? They're advertised as affecting the general listing of patches
>> in the UI, I'm not sure if they also affect this.
>>
>> If not, we can definitely have a look at getting an index or rate
>> limiting/authentication thingy in for 2.2.
>>
>
> Trying this out, it looks like we haven't fixed this for 2.2.

Yep. Adding a project filter creates this amazingly awful set of
queries, including a subquery. It looks like Stewart set out to fix a
similar problem within a view, but we don't have the same flexibility in
DRF. I think the new models will solve this but I really don't want to
wait.

Regards,
Daniel


3. SELECT COUNT(*) FROM (SELECT DISTINCT `patchwork_submission`.`id` AS Col1, 
`patchwork_submission`.`msgid` AS Col2, `patchwork_submission`.`date` AS Col3, 
`patchwork_submission`.`submitter_id` AS Col4, 
`patchwork_submission`.`project_id` AS Col5, `patchwork_submission`.`name` AS 
Col6, `patchwork_patch`.`submission_ptr_id` AS Col7, 
`patchwork_patch`.`commit_ref` AS Col8, `patchwork_patch`.`pull_url` AS Col9, 
`patchwork_patch`.`delegate_id` AS Col10, `patchwork_patch`.`state_id` AS 
Col11, `patchwork_patch`.`archived` AS Col12, `patchwork_patch`.`hash` AS 
Col13, `patchwork_patch`.`patch_project_id` AS Col14, 
`patchwork_patch`.`series_id` AS Col15, `patchwork_patch`.`number` AS Col16, 
`patchwork_patch`.`related_id` AS Col17 FROM `patchwork_patch` INNER JOIN 
`patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id` = 
`patchwork_submission`.`id`) WHERE `patchwork_submission`.`project_id` = 1) 
subquery

4. SELECT DISTINCT `patchwork_submission`.`id`, `patchwork_submission`.`msgid`, 
`patchwork_submission`.`date`, `patchwork_submission`.`submitter_id`, 
`patchwork_submission`.`project_id`, `patchwork_submission`.`name`, 
`patchwork_patch`.`submission_ptr_id`, `patchwork_patch`.`commit_ref`, 
`patchwork_patch`.`pull_url`, `patchwork_patch`.`delegate_id`, 
`patchwork_patch`.`state_id`, `patchwork_patch`.`archived`, 
`patchwork_patch`.`hash`, `patchwork_patch`.`patch_project_id`, 
`patchwork_patch`.`series_id`, `patchwork_patch`.`number`, 
`patchwork_patch`.`related_id`, `patchwork_person`.`id`, 
`patchwork_person`.`email`, `patchwork_person`.`name`, 
`patchwork_person`.`user_id`, `patchwork_project`.`id`, 
`patchwork_project`.`linkname`, `patchwork_project`.`name`, 
`patchwork_project`.`listid`, `patchwork_project`.`listemail`, 
`patchwork_project`.`subject_match`, `patchwork_project`.`web_url`, 
`patchwork_project`.`scm_url`, `patchwork_project`.`webscm_url`, 
`patchwork_project`.`list_archive_url`,
  `patchwork_project`.`list_archive_url_format`, 
`patchwork_project`.`commit_url_format`, 
`patchwork_project`.`send_notifications`, `patchwork_project`.`use_tags`, 
`auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, 
`auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, 
`auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, 
`auth_user`.`is_active`, `auth_user`.`date_joined`, `patchwork_state`.`id`, 
`patchwork_state`.`name`, `patchwork_state`.`slug`, 
`patchwork_state`.`ordering`, `patchwork_state`.`action_required`, 
`patchwork_series`.`id`, `patchwork_series`.`project_id`, 
`patchwork_series`.`cover_letter_id`, `patchwork_series`.`name`, 
`patchwork_series`.`date`, `patchwork_series`.`submitter_id`, 
`patchwork_series`.`version`, `patchwork_series`.`total`, T8.`id`, 
T8.`linkname`, T8.`name`, T8.`listid`, T8.`listemail`, T8.`subject_match`, 
T8.`web_url`, T8.`scm_url`, T8.`webscm_url`, T8.`list_archive_url`, 
T8.`list_archive_url_format`
 , T8.`commit_url_format`, T8.`send_notifications`, T8.`use_tags` FROM 
`patchwork_patch` INNER JOIN `patchwork_submission` ON 
(`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`) INNER 
JOIN `patchwork_project` ON (`patchwork_submission`.`project_id` = 
`patchwork_project`.`id`) INNER JOIN `patchwork_person` ON 
(`patchwork_submission`.`submitter_id` = `patchwork_person`.`id`) LEFT OUTER 
JOIN `auth_user` ON (`patchwork_patch`.`delegate_id` = `auth_user`.`id`) LEFT 
OUTER JOIN `patchwork_state` ON (`patchwork_patch`.`state_id` = 

Re: DB-murdering API query (index suggestions needed)

2020-03-16 Thread Daniel Axtens
Daniel Axtens  writes:

>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state 
>> of suffering due to someone trying to suck down all patches in the 
>> linux-arm-kernel project. This is what the API request looked like:
>>
>> GET 
>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>  
>>
>> The query behind this takes about 1 minute to run on a 20-core HT Xeon 
>> system and requires creating a huge temporary file (there are 18375 
>> patches in that project).
>
> Ouch, I'm sorry to hear that.
>
>>
>> So, two questions, really:
>>
>> 1. Any indexes we can put in place to make this query perform better?
>
> We have a bunch of db magic contributed by Stewart that will hit 2.2.
>
> Stewart, do you happen to know if any of your magic will affect API
> queries? They're advertised as affecting the general listing of patches
> in the UI, I'm not sure if they also affect this.
>
> If not, we can definitely have a look at getting an index or rate
> limiting/authentication thingy in for 2.2.
>

Trying this out, it looks like we haven't fixed this for 2.2.

Regards,
Daniel

> Regards,
> Daniel
>
>> 2. Is there a way to disable anonymous API access?
>>
>> -K
>> ___
>> Patchwork mailing list
>> Patchwork@lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/patchwork
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: DB-murdering API query (index suggestions needed)

2019-11-17 Thread Stewart Smith

> On 15 Nov 2019, at 09:43, Konstantin Ryabitsev 
>  wrote:
> 
> On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote:
>>> GET
>>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>> The query behind this takes about 1 minute to run on a 20-core HT Xeon
>>> system and requires creating a huge temporary file (there are 18375
>>> patches in that project).
>> Ouch, I'm sorry to hear that.
> 
> Well, it's true that some of kernel.org's projects are large beyond what 
> would be considered "sane". :)

My thought is that we should make it so that Patchwork is able to cope.

It’s not *that* much data or that complex queries (he says with his database 
hat on) :)


___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: DB-murdering API query (index suggestions needed)

2019-11-17 Thread Stewart Smith

> On 15 Nov 2019, at 05:48, Daniel Axtens  wrote:
> 
> 
>> 
>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state 
>> of suffering due to someone trying to suck down all patches in the 
>> linux-arm-kernel project. This is what the API request looked like:
>> 
>> GET 
>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>>  
>> 
>> The query behind this takes about 1 minute to run on a 20-core HT Xeon 
>> system and requires creating a huge temporary file (there are 18375 
>> patches in that project).
> 
> Ouch, I'm sorry to hear that.
> 
>> 
>> So, two questions, really:
>> 
>> 1. Any indexes we can put in place to make this query perform better?
> 
> We have a bunch of db magic contributed by Stewart that will hit 2.2.
> 
> Stewart, do you happen to know if any of your magic will affect API
> queries? They're advertised as affecting the general listing of patches
> in the UI, I'm not sure if they also affect this.
> 
> If not, we can definitely have a look at getting an index or rate
> limiting/authentication thingy in for 2.2.

I am pretty sure that my improvements would help in that specific query, 
probably not enough to be ideal though. I know they improve the web site 
equivalent operation, but don’t remember how much (if any) I poked at the API 
there.

I could take a look, I’d just need to set up a dev environment and grab a good 
set of data again.
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: DB-murdering API query (index suggestions needed)

2019-11-15 Thread Konstantin Ryabitsev
On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote:
> > GET 
> > /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
> >  
> >
> > The query behind this takes about 1 minute to run on a 20-core HT Xeon 
> > system and requires creating a huge temporary file (there are 18375 
> > patches in that project).
> 
> Ouch, I'm sorry to hear that.

Well, it's true that some of kernel.org's projects are large beyond what 
would be considered "sane". :)

(Another annoyance of mine is when someone requests a project but 
doesn't end up using it. We end up accumulating huge amounts of dead 
data over the years with no clear way of determining "is this useful to 
anyone?" Now that we have mailing list archives available in a 
public-inbox format, it's easier to justify deleting projects that are 
clearly not being used.)

> We have a bunch of db magic contributed by Stewart that will hit 2.2.
> 
> Stewart, do you happen to know if any of your magic will affect API
> queries? They're advertised as affecting the general listing of patches
> in the UI, I'm not sure if they also affect this.
> 
> If not, we can definitely have a look at getting an index or rate
> limiting/authentication thingy in for 2.2.

I'm not sure rate-limiting would have helped much in this particular 
case, because whoever was doing it was not trying to be malicious. There 
were only 2-3 queries running in parallel, but they were so heavy that 
it was enough to affect DB performance. Per-second rate-limiting is also 
easy enough to do on the upstream proxy, so I doubt it's worth the 
effort to develop this feature in the code.

In fact, I can probably do the same for unauthenticated API access, but 
it might be nicer if there's a simple toggle available in Patchwork's 
UI. I wouldn't consider this a high-priority feature.

Best regards,
-K
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: DB-murdering API query (index suggestions needed)

2019-11-15 Thread Daniel Axtens
> Today, the DB behind patchwork.kernel.org was in a semi-permanent state 
> of suffering due to someone trying to suck down all patches in the 
> linux-arm-kernel project. This is what the API request looked like:
>
> GET 
> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150
>  
>
> The query behind this takes about 1 minute to run on a 20-core HT Xeon 
> system and requires creating a huge temporary file (there are 18375 
> patches in that project).

Ouch, I'm sorry to hear that.

>
> So, two questions, really:
>
> 1. Any indexes we can put in place to make this query perform better?

We have a bunch of db magic contributed by Stewart that will hit 2.2.

Stewart, do you happen to know if any of your magic will affect API
queries? They're advertised as affecting the general listing of patches
in the UI, I'm not sure if they also affect this.

If not, we can definitely have a look at getting an index or rate
limiting/authentication thingy in for 2.2.

Regards,
Daniel

> 2. Is there a way to disable anonymous API access?
>
> -K
> ___
> Patchwork mailing list
> Patchwork@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/patchwork
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


Re: DB-murdering API query (index suggestions needed)

2019-11-15 Thread Andrew Donnellan

On 15/11/19 11:44 am, Konstantin Ryabitsev wrote:

Hi, all:

Today, the DB behind patchwork.kernel.org was in a semi-permanent state
of suffering due to someone trying to suck down all patches in the
linux-arm-kernel project. This is what the API request looked like:

GET 
/api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150

The query behind this takes about 1 minute to run on a 20-core HT Xeon
system and requires creating a huge temporary file (there are 18375
patches in that project).

So, two questions, really:

1. Any indexes we can put in place to make this query perform better?
2. Is there a way to disable anonymous API access?


Not currently, but it would be fairly easy to do...

https://github.com/getpatchwork/patchwork/issues/325


--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited

___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork


DB-murdering API query (index suggestions needed)

2019-11-14 Thread Konstantin Ryabitsev
Hi, all:

Today, the DB behind patchwork.kernel.org was in a semi-permanent state 
of suffering due to someone trying to suck down all patches in the 
linux-arm-kernel project. This is what the API request looked like:

GET 
/api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 

The query behind this takes about 1 minute to run on a 20-core HT Xeon 
system and requires creating a huge temporary file (there are 18375 
patches in that project).

So, two questions, really:

1. Any indexes we can put in place to make this query perform better?
2. Is there a way to disable anonymous API access?

-K
___
Patchwork mailing list
Patchwork@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/patchwork