Re: [PATCH] Add mark_spam.py script
On 08/16/2016 12:02 AM, Joseph Myers wrote:
> On Mon, 15 Aug 2016, Martin Liška wrote:
>
>> It can, currently we mark as spam just the first comment. If there's a spam PR
>> which contains multiple comments, I'll extend the script.
>
> There certainly are spam bugs where the spammer pasted their spam in a
> comment after creating the bug, rather than putting it in the initial bug
> description; see bug 76607, for example. Maybe all comments created by
> the original bug submitter should be considered as spam, not just the
> initial bug description?
>

Hi.

It looks like the bug has already been removed (which is good).
The script improvement does exactly what Joseph suggested.
If there are no objections, I'll commit it tomorrow.

Martin

From 98309a80a08b1d9e5f51c1e28f35322aaca8a52c Mon Sep 17 00:00:00 2001
From: marxin
Date: Tue, 16 Aug 2016 14:14:07 +0200
Subject: [PATCH] mark_spam.py: Mark as spam all comments done by a creator

contrib/ChangeLog:

2016-08-16  Martin Liska

	* mark_spam.py: Mark as spam all comments done by a creator.
---
 contrib/mark_spam.py | 20
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index f206356..86f46a1 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -39,7 +39,9 @@ def mark_as_spam(id, api_key, verbose):
         return

     # 2) mark the bug as spam
-    cc_list = response['bugs'][0]['cc']
+    bug = response['bugs'][0]
+    creator = bug['creator']
+    cc_list = bug['cc']
     data = {
         'status': 'RESOLVED',
         'resolution': 'INVALID',
@@ -64,13 +66,15 @@ def mark_as_spam(id, api_key, verbose):
     # 3) mark the first comment as spam
     r = requests.get(u + '/comment')
     response = json.loads(r.text)
-    comment_id = response['bugs'][str(id)]['comments'][0]['id']
-
-    u2 = '%sbug/comment/%d/tags' % (base_url, comment_id)
-    r = requests.put(u2, json = {'comment_id': comment_id, 'add': ['spam'], 'api_key': api_key})
-    if verbose:
-        print(r)
-        print(r.text)
+    for c in response['bugs'][str(id)]['comments']:
+        if c['creator'] == creator:
+            comment_id = c['id']
+            u2 = '%sbug/comment/%d/tags' % (base_url, comment_id)
+            print(u2)
+            r = requests.put(u2, json = {'comment_id': comment_id, 'add': ['spam'], 'api_key': api_key})
+            if verbose:
+                print(r)
+                print(r.text)

     # 4) mark all attachments as spam
     r = requests.get(u + '/attachment')
-- 
2.9.2
Re: [PATCH] Add mark_spam.py script
On Mon, 15 Aug 2016, Martin Liška wrote:

> It can, currently we mark as spam just the first comment. If there's a spam PR
> which contains multiple comments, I'll extend the script.

There certainly are spam bugs where the spammer pasted their spam in a
comment after creating the bug, rather than putting it in the initial bug
description; see bug 76607, for example. Maybe all comments created by
the original bug submitter should be considered as spam, not just the
initial bug description?

-- 
Joseph S. Myers
jos...@codesourcery.com
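The suggestion above (treat every comment written by the bug's submitter as spam) can be sketched roughly as follows. This is only an illustration, not the script itself: the helper name `spam_comment_ids` and the sample data are made up here, and the response layout (`bugs` keyed by the bug ID as a string, each comment carrying `id` and `creator`) is assumed from the way the patch in this thread reads the `/comment` response.

```python
def spam_comment_ids(comment_response, bug_id, creator):
    """Return the IDs of all comments on bug_id that were written by creator."""
    comments = comment_response['bugs'][str(bug_id)]['comments']
    return [c['id'] for c in comments if c['creator'] == creator]


# Hypothetical sample data, shaped like the '/comment' response the script reads.
sample = {'bugs': {'12345': {'comments': [
    {'id': 1, 'creator': 'spammer@example.com'},
    {'id': 2, 'creator': 'maintainer@example.com'},
    {'id': 3, 'creator': 'spammer@example.com'},
]}}}

# Comments 1 and 3 would be tagged; the maintainer's reply is left alone.
print(spam_comment_ids(sample, 12345, 'spammer@example.com'))
```

Keeping the selection separate from the HTTP calls makes it easy to check which comments would be tagged before anything is sent to Bugzilla.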
Re: [PATCH] Add mark_spam.py script
On 08/15/2016 11:48 AM, Jakub Jelinek wrote:
> But can't the comment added for the attachment contain also some spam text
> that should be sanitized?
>
> 	Jakub

It can, currently we mark as spam just the first comment. If there's a spam PR
which contains multiple comments, I'll extend the script.

There's a sample of an attachment marked as spam:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=39437&action=edit

Martin
Re: [PATCH] Add mark_spam.py script
On Mon, Aug 15, 2016 at 11:43:11AM +0200, Martin Liška wrote:
> > Is dropping of 'comment': 'spam' intentional?
>
> Yes, it's not necessary to do a comment about the change for an attachment.
> As the name of the attachment is set to spam, it's obvious in a comment
> that is made for that.

But can't the comment added for the attachment contain also some spam text
that should be sanitized?

	Jakub
Re: [PATCH] Add mark_spam.py script
On 08/15/2016 11:37 AM, Jakub Jelinek wrote:
> On Mon, Aug 15, 2016 at 11:31:22AM +0200, Martin Liška wrote:
>> diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
>> index 569a03d..f206356 100755
>> --- a/contrib/mark_spam.py
>> +++ b/contrib/mark_spam.py
>> @@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
>>      r = requests.get(u)
>>      response = json.loads(r.text)
>>
>> +    if 'error' in response and response['error']:
>> +        print(response['message'])
>> +        return
>> +
>>      # 2) mark the bug as spam
>>      cc_list = response['bugs'][0]['cc']
>>      data = {
>> @@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
>>          'cc': {'remove': cc_list},
>>          'priority': 'P5',
>>          'severity': 'trivial',
>> +        'url': '',
>>          'assigned_to': 'unassig...@gcc.gnu.org' }
>>
>>      r = requests.put(u, json = data)
>> @@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
>>      for a in attachments:
>>          attachment_id = a['id']
>>          url = '%sbug/attachment/%d' % (base_url, attachment_id)
>> -        r = requests.put(url, json = {'ids': [attachment_id], 'summary':
>> 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
>> +        r = requests.put(url, json = {'ids': [attachment_id],
>> +            'summary': 'spam',
>> +            'file_name': 'spam',
>> +            'content_type': 'application/x-spam',
>> +            'is_obsolete': True,
>
> Is dropping of 'comment': 'spam' intentional?

Yes, it's not necessary to do a comment about the change for an attachment.
As the name of the attachment is set to spam, it's obvious in a comment
that is made for that.

Martin

>
>> +            'api_key': api_key})
>>          if verbose:
>>              print(r)
>>              print(r.text)
>
> 	Jakub
>
Re: [PATCH] Add mark_spam.py script
On Mon, Aug 15, 2016 at 11:31:22AM +0200, Martin Liška wrote:
> diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
> index 569a03d..f206356 100755
> --- a/contrib/mark_spam.py
> +++ b/contrib/mark_spam.py
> @@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
>      r = requests.get(u)
>      response = json.loads(r.text)
>
> +    if 'error' in response and response['error']:
> +        print(response['message'])
> +        return
> +
>      # 2) mark the bug as spam
>      cc_list = response['bugs'][0]['cc']
>      data = {
> @@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
>          'cc': {'remove': cc_list},
>          'priority': 'P5',
>          'severity': 'trivial',
> +        'url': '',
>          'assigned_to': 'unassig...@gcc.gnu.org' }
>
>      r = requests.put(u, json = data)
> @@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
>      for a in attachments:
>          attachment_id = a['id']
>          url = '%sbug/attachment/%d' % (base_url, attachment_id)
> -        r = requests.put(url, json = {'ids': [attachment_id], 'summary':
> 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
> +        r = requests.put(url, json = {'ids': [attachment_id],
> +            'summary': 'spam',
> +            'file_name': 'spam',
> +            'content_type': 'application/x-spam',
> +            'is_obsolete': True,

Is dropping of 'comment': 'spam' intentional?

> +            'api_key': api_key})
>          if verbose:
>              print(r)
>              print(r.text)

	Jakub
Re: [PATCH] Add mark_spam.py script
This is the version of the script I've just installed as r239467.

Martin

From 6385fc5c8729dcabd791c5b0cc5ba2ff64e68489 Mon Sep 17 00:00:00 2001
From: marxin
Date: Mon, 15 Aug 2016 11:28:35 +0200
Subject: [PATCH] Enhance mark_spam.py script

contrib/ChangeLog:

2016-08-15  Martin Liska

	* mark_spam.py: Add error handling and reset another properties
	of attachments and bugs.
---
 contrib/mark_spam.py | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..f206356 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
     r = requests.get(u)
     response = json.loads(r.text)

+    if 'error' in response and response['error']:
+        print(response['message'])
+        return
+
     # 2) mark the bug as spam
     cc_list = response['bugs'][0]['cc']
     data = {
@@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }

     r = requests.put(u, json = data)
@@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)
-- 
2.9.2
Re: [PATCH] Add mark_spam.py script
On 08/12/2016 11:15 PM, Joseph Myers wrote:
> Next observation on this script: it dies if a bug number in the given range
> doesn't exist, with an error like:
>
> Marking as spam: PR75336
> Traceback (most recent call last):
>   File "./mark_spam.py", line 98, in <module>
>     mark_as_spam(id, args.api_key, args.verbose)
>   File "./mark_spam.py", line 38, in mark_as_spam
>     cc_list = response['bugs'][0]['cc']
> KeyError: 'bugs'
>
> It would be more convenient if it ignored nonexistent bugs rather than
> falling over like this, so that it's only necessary to check that the range
> you pass to the script has no non-spam bugs in it, not that every bug number
> in the range exists.

That's easy to solve, please try to apply the following patch on top of current trunk.

> (I don't know why there are gaps in the bug numbers; I suppose some error /
> timeout occurred while the spammers were creating bugs, at a point after a
> bug number had been reserved in the database but before the transaction
> creating the bug was complete - in such circumstances, databases don't
> necessarily unwind the reservation of a number.)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..f206356 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
     r = requests.get(u)
     response = json.loads(r.text)

+    if 'error' in response and response['error']:
+        print(response['message'])
+        return
+
     # 2) mark the bug as spam
     cc_list = response['bugs'][0]['cc']
     data = {
@@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }

     r = requests.put(u, json = data)
@@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)
Re: [PATCH] Add mark_spam.py script
Next observation on this script: it dies if a bug number in the given range
doesn't exist, with an error like:

Marking as spam: PR75336
Traceback (most recent call last):
  File "./mark_spam.py", line 98, in <module>
    mark_as_spam(id, args.api_key, args.verbose)
  File "./mark_spam.py", line 38, in mark_as_spam
    cc_list = response['bugs'][0]['cc']
KeyError: 'bugs'

It would be more convenient if it ignored nonexistent bugs rather than
falling over like this, so that it's only necessary to check that the range
you pass to the script has no non-spam bugs in it, not that every bug number
in the range exists.

(I don't know why there are gaps in the bug numbers; I suppose some error /
timeout occurred while the spammers were creating bugs, at a point after a
bug number had been reserved in the database but before the transaction
creating the bug was complete - in such circumstances, databases don't
necessarily unwind the reservation of a number.)

-- 
Joseph S. Myers
jos...@codesourcery.com
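The behaviour asked for above (skip a nonexistent bug instead of dying with a KeyError) could look roughly like the sketch below. The helper name `first_bug_or_none` is hypothetical, and the assumed shape of an error response (`error` and `message` keys, no `bugs` key) is taken from the traceback and from the check used in the patch elsewhere in this thread, not verified against the Bugzilla documentation.

```python
def first_bug_or_none(response):
    """Return the first bug in a decoded bug lookup response, or None.

    None means the lookup failed (for example, the bug number does not
    exist), so the caller can simply move on to the next ID.
    """
    if response.get('error'):
        # Report the server's message, but do not abort the whole run.
        print(response.get('message', 'unknown error'))
        return None
    bugs = response.get('bugs') or []
    return bugs[0] if bugs else None


# Hypothetical responses: one for a missing bug, one for an existing bug.
missing = {'error': True, 'message': 'Bug #75336 does not exist.'}
found = {'bugs': [{'id': 75337, 'cc': []}]}

for response in (missing, found):
    bug = first_bug_or_none(response)
    if bug is None:
        continue  # nonexistent bug: skip it rather than crash
    print('would mark as spam: PR%d' % bug['id'])
```

With a guard like this, a range only has to be checked for non-spam bugs; gaps in the numbering are skipped with a message.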
Re: [PATCH] Add mark_spam.py script
On 08/12/2016 06:10 PM, Joseph Myers wrote:
> On Wed, 10 Aug 2016, Martin Liška wrote:
>
>> On 08/10/2016 10:49 PM, Joseph Myers wrote:
>>> The latest spam bugs have spam attachments as well. I'm not sure if the
>>> API can delete attachments, but it would be helpful for the script to do
>>> as much as possible with them (change filenames, descriptions, MIME types,
>>> mark them as obsolete).
>>>
>>
>> I'm testing this, if it's working I'll install the patch.
>
> Thanks, this script is very useful, some more observations on spam bugs:

That's good, but I would appreciate having a more precise spam filter ;)

>
> Attachment filenames can be spammish (e.g. see bug 74852), is it possible
> to change those to "spam" as well?
>
> Although most spam attachments seem to be application/pdf, some are image
> types and get displayed inline, so changing the MIME type to
> application/x-spam would be an improvement in those cases.
>
> A few spam bugs have URL set to a spam link, so emptying URL in all cases
> when marking as spam would make sense.
>

Implemented in the attached patch, I'll commit it after the weekend if there are
no comments.

Martin

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..960ba51 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -49,6 +49,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }

     r = requests.put(u, json = data)
@@ -74,7 +75,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)
Re: [PATCH] Add mark_spam.py script
On Wed, 10 Aug 2016, Martin Liška wrote:

> On 08/10/2016 10:49 PM, Joseph Myers wrote:
> > The latest spam bugs have spam attachments as well. I'm not sure if the
> > API can delete attachments, but it would be helpful for the script to do
> > as much as possible with them (change filenames, descriptions, MIME types,
> > mark them as obsolete).
> >
>
> I'm testing this, if it's working I'll install the patch.

Thanks, this script is very useful, some more observations on spam bugs:

Attachment filenames can be spammish (e.g. see bug 74852), is it possible
to change those to "spam" as well?

Although most spam attachments seem to be application/pdf, some are image
types and get displayed inline, so changing the MIME type to
application/x-spam would be an improvement in those cases.

A few spam bugs have URL set to a spam link, so emptying URL in all cases
when marking as spam would make sense.

-- 
Joseph S. Myers
jos...@codesourcery.com
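The three requests above (neutral filename, non-displayable MIME type, emptied URL) amount to a couple of fixed update payloads. The sketch below only builds those dictionaries; the helper names are hypothetical, and the field names (`file_name`, `content_type`, `is_obsolete`, `url`) are the ones used by the patches in this thread rather than something checked independently against the Bugzilla API.

```python
def spam_attachment_update(attachment_id, api_key):
    """Build the JSON body that neutralises one attachment."""
    return {
        'ids': [attachment_id],
        'summary': 'spam',                     # description shown in the bug
        'file_name': 'spam',                   # hides a spammish filename
        'content_type': 'application/x-spam',  # stops inline display of images
        'is_obsolete': True,
        'api_key': api_key,
    }


def spam_bug_url_update(api_key):
    """Build the part of the bug update that empties the URL field."""
    return {'url': '', 'api_key': api_key}


# Hypothetical attachment ID and API key, for illustration only.
print(spam_attachment_update(12345, 'my_api_key'))
print(spam_bug_url_update('my_api_key'))
```

Building the payload in one place keeps the set of reset fields easy to extend if spammers start abusing another field.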
Re: [PATCH] Add mark_spam.py script
On 08/10/2016 10:49 PM, Joseph Myers wrote:
> The latest spam bugs have spam attachments as well. I'm not sure if the
> API can delete attachments, but it would be helpful for the script to do
> as much as possible with them (change filenames, descriptions, MIME types,
> mark them as obsolete).

I'm testing this, if it's working I'll install the patch.

Martin

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index cc394dc..569a03d 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -67,6 +67,18 @@ def mark_as_spam(id, api_key, verbose):
         print(r)
         print(r.text)

+    # 4) mark all attachments as spam
+    r = requests.get(u + '/attachment')
+    response = json.loads(r.text)
+    attachments = response['bugs'][str(id)]
+    for a in attachments:
+        attachment_id = a['id']
+        url = '%sbug/attachment/%d' % (base_url, attachment_id)
+        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        if verbose:
+            print(r)
+            print(r.text)
+
 parser = argparse.ArgumentParser(description='Mark Bugzilla issues as spam.')
 parser.add_argument('api_key', help = 'API key')
 parser.add_argument('range', help = 'Range of IDs, e.g. 10-23,24,25,27')
Re: [PATCH] Add mark_spam.py script
The latest spam bugs have spam attachments as well. I'm not sure if the API can delete attachments, but it would be helpful for the script to do as much as possible with them (change filenames, descriptions, MIME types, mark them as obsolete). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Add mark_spam.py script
On 07/26/2016 06:39 AM, Martin Liška wrote:
> Hello.
>
> This is a python script that utilizes the bugzilla API and marks PRs as spam:
>
> $ ./mark_spam.py --help
> usage: mark_spam.py [-h] [--verbose] api_key range
>
> Mark Bugzilla issues as spam.
>
> positional arguments:
>   api_key     API key
>   range       Range of IDs, e.g. 10-23,24,25,27
>
> optional arguments:
>   -h, --help  show this help message and exit
>   --verbose   Verbose logging
>
> Sample usage:
> $ ./mark_spam.py my_api_key 72634-72636
> Marking as spam: PR72634
> Marking as spam: PR72635
> Marking as spam: PR72636
>
> API key can be set up here:
> https://gcc.gnu.org/bugzilla/userprefs.cgi?tab=apikey
>
> Sample PR marked by the script:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72635
>
> Ready to install?
> Martin
>
> 0001-Add-mark_spam.py-script.patch
>
> From 467dc2cf8f0c549f5d7ee190efe59c841a9acad9 Mon Sep 17 00:00:00 2001
> From: marxin
> Date: Tue, 26 Jul 2016 14:34:55 +0200
> Subject: [PATCH] Add mark_spam.py script
>
> contrib/ChangeLog:
>
> 2016-07-26  Martin Liska
>
> 	* mark_spam.py: New file.

OK.
jeff
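The `range` argument in the help text above takes a mix of ranges and single IDs (e.g. 10-23,24,25,27). The original parsing code is not shown in this thread, so the following is only a guess at how such a specification could be expanded; the function name `parse_id_range` is hypothetical, and ranges are assumed to be inclusive, as the sample run (72634-72636 marking three PRs) suggests.

```python
def parse_id_range(spec):
    """Expand a spec such as '10-12,24,27' into a list of bug IDs."""
    ids = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            # 'first-last' is treated as an inclusive range.
            first, last = part.split('-', 1)
            ids.extend(range(int(first), int(last) + 1))
        else:
            ids.append(int(part))
    return ids


for bug_id in parse_id_range('72634-72636'):
    print('Marking as spam: PR%d' % bug_id)
```

The loop at the end prints one "Marking as spam" line per ID, the same message format as the sample usage quoted above.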