Re: DB-murdering API query (index suggestions needed)
Daniel Axtens writes: > Daniel Axtens writes: > >>> Today, the DB behind patchwork.kernel.org was in a semi-permanent state >>> of suffering due to someone trying to suck down all patches in the >>> linux-arm-kernel project. This is what the API request looked like: >>> >>> GET >>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 >>> >>> >>> The query behind this takes about 1 minute to run on a 20-core HT Xeon >>> system and requires creating a huge temporary file (there are 18375 >>> patches in that project). >> >> Ouch, I'm sorry to hear that. >> >>> >>> So, two questions, really: >>> >>> 1. Any indexes we can put in place to make this query perform better? >> >> We have a bunch of db magic contributed by Stewart that will hit 2.2. >> >> Stewart, do you happen to know if any of your magic will affect API >> queries? They're advertised as affecting the general listing of patches >> in the UI, I'm not sure if they also affect this. >> >> If not, we can definitely have a look at getting an index or rate >> limiting/authentication thingy in for 2.2. >> > > Trying this out, it looks like we haven't fixed this for 2.2. Yep. Adding a project filter creates this amazingly awful set of queries, including a subquery. It looks like Stewart set out to fix a similar problem within a view, but we don't have the same flexibility in DRF. I think the new models will solve this but I really don't want to wait. Regards, Daniel 3. SELECT COUNT(*) FROM (SELECT DISTINCT `patchwork_submission`.`id` AS Col1, `patchwork_submission`.`msgid` AS Col2, `patchwork_submission`.`date` AS Col3, `patchwork_submission`.`submitter_id` AS Col4, `patchwork_submission`.`project_id` AS Col5, `patchwork_submission`.`name` AS Col6, `patchwork_patch`.`submission_ptr_id` AS Col7, `patchwork_patch`.`commit_ref` AS Col8, `patchwork_patch`.`pull_url` AS Col9, `patchwork_patch`.`delegate_id` AS Col10, `patchwork_patch`.`state_id` AS Col11, `patchwork_patch`.`archived` AS Col12, `patchwork_patch`.`hash` AS Col13, `patchwork_patch`.`patch_project_id` AS Col14, `patchwork_patch`.`series_id` AS Col15, `patchwork_patch`.`number` AS Col16, `patchwork_patch`.`related_id` AS Col17 FROM `patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`) WHERE `patchwork_submission`.`project_id` = 1) subquery 4. SELECT DISTINCT `patchwork_submission`.`id`, `patchwork_submission`.`msgid`, `patchwork_submission`.`date`, `patchwork_submission`.`submitter_id`, `patchwork_submission`.`project_id`, `patchwork_submission`.`name`, `patchwork_patch`.`submission_ptr_id`, `patchwork_patch`.`commit_ref`, `patchwork_patch`.`pull_url`, `patchwork_patch`.`delegate_id`, `patchwork_patch`.`state_id`, `patchwork_patch`.`archived`, `patchwork_patch`.`hash`, `patchwork_patch`.`patch_project_id`, `patchwork_patch`.`series_id`, `patchwork_patch`.`number`, `patchwork_patch`.`related_id`, `patchwork_person`.`id`, `patchwork_person`.`email`, `patchwork_person`.`name`, `patchwork_person`.`user_id`, `patchwork_project`.`id`, `patchwork_project`.`linkname`, `patchwork_project`.`name`, `patchwork_project`.`listid`, `patchwork_project`.`listemail`, `patchwork_project`.`subject_match`, `patchwork_project`.`web_url`, `patchwork_project`.`scm_url`, `patchwork_project`.`webscm_url`, `patchwork_project`.`list_archive_url`, `patchwork_project`.`list_archive_url_format`, `patchwork_project`.`commit_url_format`, `patchwork_project`.`send_notifications`, `patchwork_project`.`use_tags`, `auth_user`.`id`, `auth_user`.`password`, `auth_user`.`last_login`, `auth_user`.`is_superuser`, `auth_user`.`username`, `auth_user`.`first_name`, `auth_user`.`last_name`, `auth_user`.`email`, `auth_user`.`is_staff`, `auth_user`.`is_active`, `auth_user`.`date_joined`, `patchwork_state`.`id`, `patchwork_state`.`name`, `patchwork_state`.`slug`, `patchwork_state`.`ordering`, `patchwork_state`.`action_required`, `patchwork_series`.`id`, `patchwork_series`.`project_id`, `patchwork_series`.`cover_letter_id`, `patchwork_series`.`name`, `patchwork_series`.`date`, `patchwork_series`.`submitter_id`, `patchwork_series`.`version`, `patchwork_series`.`total`, T8.`id`, T8.`linkname`, T8.`name`, T8.`listid`, T8.`listemail`, T8.`subject_match`, T8.`web_url`, T8.`scm_url`, T8.`webscm_url`, T8.`list_archive_url`, T8.`list_archive_url_format` , T8.`commit_url_format`, T8.`send_notifications`, T8.`use_tags` FROM `patchwork_patch` INNER JOIN `patchwork_submission` ON (`patchwork_patch`.`submission_ptr_id` = `patchwork_submission`.`id`) INNER JOIN `patchwork_project` ON (`patchwork_submission`.`project_id` = `patchwork_project`.`id`) INNER JOIN `patchwork_person` ON (`patchwork_submission`.`submitter_id` = `patchwork_person`.`id`) LEFT OUTER JOIN `auth_user` ON (`patchwork_patch`.`delegate_id` = `auth_user`.`id`) LEFT OUTER JOIN `patchwork_state` ON (`patchwork_patch`.`state_id` =
Re: DB-murdering API query (index suggestions needed)
Daniel Axtens writes: >> Today, the DB behind patchwork.kernel.org was in a semi-permanent state >> of suffering due to someone trying to suck down all patches in the >> linux-arm-kernel project. This is what the API request looked like: >> >> GET >> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 >> >> >> The query behind this takes about 1 minute to run on a 20-core HT Xeon >> system and requires creating a huge temporary file (there are 18375 >> patches in that project). > > Ouch, I'm sorry to hear that. > >> >> So, two questions, really: >> >> 1. Any indexes we can put in place to make this query perform better? > > We have a bunch of db magic contributed by Stewart that will hit 2.2. > > Stewart, do you happen to know if any of your magic will affect API > queries? They're advertised as affecting the general listing of patches > in the UI, I'm not sure if they also affect this. > > If not, we can definitely have a look at getting an index or rate > limiting/authentication thingy in for 2.2. > Trying this out, it looks like we haven't fixed this for 2.2. Regards, Daniel > Regards, > Daniel > >> 2. Is there a way to disable anonymous API access? >> >> -K >> ___ >> Patchwork mailing list >> Patchwork@lists.ozlabs.org >> https://lists.ozlabs.org/listinfo/patchwork ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: DB-murdering API query (index suggestions needed)
> On 15 Nov 2019, at 09:43, Konstantin Ryabitsev > wrote: > > On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote: >>> GET >>> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 >>> The query behind this takes about 1 minute to run on a 20-core HT Xeon >>> system and requires creating a huge temporary file (there are 18375 >>> patches in that project). >> Ouch, I'm sorry to hear that. > > Well, it's true that some of kernel.org's projects are large beyond what > would be considered "sane". :) My thought is that we should make it so that Patchwork is able to cope. It’s not *that* much data or that complex queries (he says with his database hat on) :) ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: DB-murdering API query (index suggestions needed)
> On 15 Nov 2019, at 05:48, Daniel Axtens wrote: > > >> >> Today, the DB behind patchwork.kernel.org was in a semi-permanent state >> of suffering due to someone trying to suck down all patches in the >> linux-arm-kernel project. This is what the API request looked like: >> >> GET >> /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 >> >> >> The query behind this takes about 1 minute to run on a 20-core HT Xeon >> system and requires creating a huge temporary file (there are 18375 >> patches in that project). > > Ouch, I'm sorry to hear that. > >> >> So, two questions, really: >> >> 1. Any indexes we can put in place to make this query perform better? > > We have a bunch of db magic contributed by Stewart that will hit 2.2. > > Stewart, do you happen to know if any of your magic will affect API > queries? They're advertised as affecting the general listing of patches > in the UI, I'm not sure if they also affect this. > > If not, we can definitely have a look at getting an index or rate > limiting/authentication thingy in for 2.2. I am pretty sure that my improvements would help in that specific query, probably not enough to be ideal though. I know they improve the web site equivalent operation, but don’t remember how much (if any) I poked at the API there. I could take a look, I’d just need to set up a dev environment and grab a good set of data again. ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: DB-murdering API query (index suggestions needed)
On Sat, Nov 16, 2019 at 12:48:33AM +1100, Daniel Axtens wrote: > > GET > > /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 > > > > > > The query behind this takes about 1 minute to run on a 20-core HT Xeon > > system and requires creating a huge temporary file (there are 18375 > > patches in that project). > > Ouch, I'm sorry to hear that. Well, it's true that some of kernel.org's projects are large beyond what would be considered "sane". :) (Another annoyance of mine is when someone requests a project but doesn't end up using it. We end up accumulating huge amounts of dead data over the years with no clear way of determining "is this useful to anyone?" Now that we have mailing list archives available in a public-inbox format, it's easier to justify deleting projects that are clearly not being used.) > We have a bunch of db magic contributed by Stewart that will hit 2.2. > > Stewart, do you happen to know if any of your magic will affect API > queries? They're advertised as affecting the general listing of patches > in the UI, I'm not sure if they also affect this. > > If not, we can definitely have a look at getting an index or rate > limiting/authentication thingy in for 2.2. I'm not sure rate-limiting would have helped much in this particular case, because whoever was doing it was not trying to be malicious. There were only 2-3 queries running in parallel, but they were so heavy that it was enough to affect DB performance. Per-second rate-limiting is also easy enough to do on the upstream proxy, so I doubt it's worth the effort to develop this feature in the code. In fact, I can probably do the same for unauthenticated API access, but it might be nicer if there's a simple toggle available in Patchwork's UI. I wouldn't consider this a high-priority feature. Best regards, -K ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: DB-murdering API query (index suggestions needed)
> Today, the DB behind patchwork.kernel.org was in a semi-permanent state > of suffering due to someone trying to suck down all patches in the > linux-arm-kernel project. This is what the API request looked like: > > GET > /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 > > > The query behind this takes about 1 minute to run on a 20-core HT Xeon > system and requires creating a huge temporary file (there are 18375 > patches in that project). Ouch, I'm sorry to hear that. > > So, two questions, really: > > 1. Any indexes we can put in place to make this query perform better? We have a bunch of db magic contributed by Stewart that will hit 2.2. Stewart, do you happen to know if any of your magic will affect API queries? They're advertised as affecting the general listing of patches in the UI, I'm not sure if they also affect this. If not, we can definitely have a look at getting an index or rate limiting/authentication thingy in for 2.2. Regards, Daniel > 2. Is there a way to disable anonymous API access? > > -K > ___ > Patchwork mailing list > Patchwork@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/patchwork ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
Re: DB-murdering API query (index suggestions needed)
On 15/11/19 11:44 am, Konstantin Ryabitsev wrote: Hi, all: Today, the DB behind patchwork.kernel.org was in a semi-permanent state of suffering due to someone trying to suck down all patches in the linux-arm-kernel project. This is what the API request looked like: GET /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 The query behind this takes about 1 minute to run on a 20-core HT Xeon system and requires creating a huge temporary file (there are 18375 patches in that project). So, two questions, really: 1. Any indexes we can put in place to make this query perform better? 2. Is there a way to disable anonymous API access? Not currently, but it would be fairly easy to do... https://github.com/getpatchwork/patchwork/issues/325 -- Andrew Donnellan OzLabs, ADL Canberra a...@linux.ibm.com IBM Australia Limited ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork
DB-murdering API query (index suggestions needed)
Hi, all: Today, the DB behind patchwork.kernel.org was in a semi-permanent state of suffering due to someone trying to suck down all patches in the linux-arm-kernel project. This is what the API request looked like: GET /api/1.1/patches/?project=62=2019-11-01T00:00:00_page=100=6150 The query behind this takes about 1 minute to run on a 20-core HT Xeon system and requires creating a huge temporary file (there are 18375 patches in that project). So, two questions, really: 1. Any indexes we can put in place to make this query perform better? 2. Is there a way to disable anonymous API access? -K ___ Patchwork mailing list Patchwork@lists.ozlabs.org https://lists.ozlabs.org/listinfo/patchwork