Re: [PATCH] Add mark_spam.py script

2016-08-16 Thread Martin Liška
On 08/16/2016 12:02 AM, Joseph Myers wrote:
> On Mon, 15 Aug 2016, Martin Liška wrote:
> 
>> It can, currently we mark as spam just the first comment. If there's a spam 
>> PR
>> which contains multiple comments, I'll extend the script.
> 
> There certainly are spam bugs where the spammer pasted their spam in a 
> comment after creating the bug, rather than putting it in the initial bug 
> description; see bug 76607, for example.  Maybe all comments created by 
> the original bug submitter should be considered as spam, not just the 
> initial bug description?
> 

Hi.

It looks like the bug has already been removed (which is good). The script
improvement does exactly what Joseph suggested. If there are no objections,
I'll commit it tomorrow.

Martin
From 98309a80a08b1d9e5f51c1e28f35322aaca8a52c Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 16 Aug 2016 14:14:07 +0200
Subject: [PATCH] mark_spam.py: Mark as spam all comments done by a creator

contrib/ChangeLog:

2016-08-16  Martin Liska  

	* mark_spam.py: Mark as spam all comments done by a creator.
---
 contrib/mark_spam.py | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index f206356..86f46a1 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -39,7 +39,9 @@ def mark_as_spam(id, api_key, verbose):
         return
 
     # 2) mark the bug as spam
-    cc_list = response['bugs'][0]['cc']
+    bug = response['bugs'][0]
+    creator = bug['creator']
+    cc_list = bug['cc']
     data = {
         'status': 'RESOLVED',
         'resolution': 'INVALID',
@@ -64,13 +66,15 @@ def mark_as_spam(id, api_key, verbose):
     # 3) mark the first comment as spam
     r = requests.get(u + '/comment')
     response = json.loads(r.text)
-    comment_id = response['bugs'][str(id)]['comments'][0]['id']
-
-    u2 = '%sbug/comment/%d/tags' % (base_url, comment_id)
-    r = requests.put(u2, json = {'comment_id': comment_id, 'add': ['spam'], 'api_key': api_key})
-    if verbose:
-        print(r)
-        print(r.text)
+    for c in response['bugs'][str(id)]['comments']:
+        if c['creator'] == creator:
+            comment_id = c['id']
+            u2 = '%sbug/comment/%d/tags' % (base_url, comment_id)
+            print(u2)
+            r = requests.put(u2, json = {'comment_id': comment_id, 'add': ['spam'], 'api_key': api_key})
+            if verbose:
+                print(r)
+                print(r.text)
 
     # 4) mark all attachments as spam
     r = requests.get(u + '/attachment')
-- 
2.9.2
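
The filtering rule in this patch (tag every comment whose creator matches the
bug's creator) can be sketched in isolation. This is a minimal illustration,
not the script itself: `creator_comment_ids` is a hypothetical helper, and the
sample dictionaries are made-up data that only mimic the `creator` and `id`
fields the patch reads from the Bugzilla responses.

```python
def creator_comment_ids(bug, comments):
    """Return the IDs of all comments written by the bug's creator."""
    creator = bug['creator']
    return [c['id'] for c in comments if c['creator'] == creator]


# Hypothetical data shaped like the fields used in the patch.
bug = {'creator': 'spammer@example.com', 'cc': []}
comments = [
    {'id': 101, 'creator': 'spammer@example.com'},   # initial description
    {'id': 102, 'creator': 'triager@example.com'},   # legitimate reply
    {'id': 103, 'creator': 'spammer@example.com'},   # spam pasted later
]

# Only the comments by the original submitter are selected for tagging.
print(creator_comment_ids(bug, comments))  # [101, 103]
```

Comments by other users (102 here) are left alone, which matches the
suggestion to treat only the original submitter's comments as spam.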



Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Joseph Myers
On Mon, 15 Aug 2016, Martin Liška wrote:

> It can, currently we mark as spam just the first comment. If there's a spam PR
> which contains multiple comments, I'll extend the script.

There certainly are spam bugs where the spammer pasted their spam in a 
comment after creating the bug, rather than putting it in the initial bug 
description; see bug 76607, for example.  Maybe all comments created by 
the original bug submitter should be considered as spam, not just the 
initial bug description?

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Martin Liška
On 08/15/2016 11:48 AM, Jakub Jelinek wrote:
> But can't the comment added for the attachment contain also some spam text
> that should be sanitized?
> 
>   Jakub

It can, currently we mark as spam just the first comment. If there's a spam PR
which contains multiple comments, I'll extend the script.

Here's a sample of an attachment marked as spam:
https://gcc.gnu.org/bugzilla/attachment.cgi?id=39437&action=edit

Martin


Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Jakub Jelinek
On Mon, Aug 15, 2016 at 11:43:11AM +0200, Martin Liška wrote:
> > Is dropping of 'comment': 'spam' intentional?
> 
> Yes, it's not necessary to do a comment about the change for an attachment.
> As the name of the attachment is set to spam, it's obvious in a comment
> that is made for that.

But can't the comment added for the attachment contain also some spam text
that should be sanitized?

Jakub


Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Martin Liška
On 08/15/2016 11:37 AM, Jakub Jelinek wrote:
> On Mon, Aug 15, 2016 at 11:31:22AM +0200, Martin Liška wrote:
>> diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
>> index 569a03d..f206356 100755
>> --- a/contrib/mark_spam.py
>> +++ b/contrib/mark_spam.py
>> @@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
>>      r = requests.get(u)
>>      response = json.loads(r.text)
>>  
>> +    if 'error' in response and response['error']:
>> +        print(response['message'])
>> +        return
>> +
>>      # 2) mark the bug as spam
>>      cc_list = response['bugs'][0]['cc']
>>      data = {
>> @@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
>>          'cc': {'remove': cc_list},
>>          'priority': 'P5',
>>          'severity': 'trivial',
>> +        'url': '',
>>          'assigned_to': 'unassig...@gcc.gnu.org' }
>>  
>>      r = requests.put(u, json = data)
>> @@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
>>      for a in attachments:
>>          attachment_id = a['id']
>>          url = '%sbug/attachment/%d' % (base_url, attachment_id)
>> -        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
>> +        r = requests.put(url, json = {'ids': [attachment_id],
>> +            'summary': 'spam',
>> +            'file_name': 'spam',
>> +            'content_type': 'application/x-spam',
>> +            'is_obsolete': True,
> 
> Is dropping of 'comment': 'spam' intentional?

Yes, it's not necessary to do a comment about the change for an attachment.
As the name of the attachment is set to spam, it's obvious in a comment
that is made for that.

Martin

> 
>> +            'api_key': api_key})
>>          if verbose:
>>              print(r)
>>              print(r.text)
> 
>   Jakub
> 



Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Jakub Jelinek
On Mon, Aug 15, 2016 at 11:31:22AM +0200, Martin Liška wrote:
> diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
> index 569a03d..f206356 100755
> --- a/contrib/mark_spam.py
> +++ b/contrib/mark_spam.py
> @@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
>      r = requests.get(u)
>      response = json.loads(r.text)
>  
> +    if 'error' in response and response['error']:
> +        print(response['message'])
> +        return
> +
>      # 2) mark the bug as spam
>      cc_list = response['bugs'][0]['cc']
>      data = {
> @@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
>          'cc': {'remove': cc_list},
>          'priority': 'P5',
>          'severity': 'trivial',
> +        'url': '',
>          'assigned_to': 'unassig...@gcc.gnu.org' }
>  
>      r = requests.put(u, json = data)
> @@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
>      for a in attachments:
>          attachment_id = a['id']
>          url = '%sbug/attachment/%d' % (base_url, attachment_id)
> -        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
> +        r = requests.put(url, json = {'ids': [attachment_id],
> +            'summary': 'spam',
> +            'file_name': 'spam',
> +            'content_type': 'application/x-spam',
> +            'is_obsolete': True,

Is dropping of 'comment': 'spam' intentional?

> +            'api_key': api_key})
>          if verbose:
>              print(r)
>              print(r.text)

Jakub


Re: [PATCH] Add mark_spam.py script

2016-08-15 Thread Martin Liška
This is the version of the script that I've just installed as r239467.

Martin
From 6385fc5c8729dcabd791c5b0cc5ba2ff64e68489 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 15 Aug 2016 11:28:35 +0200
Subject: [PATCH] Enhance mark_spam.py script

contrib/ChangeLog:

2016-08-15  Martin Liska  

	* mark_spam.py: Add error handling and reset
	another properties of attachments and bugs.
---
 contrib/mark_spam.py | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..f206356 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
     r = requests.get(u)
     response = json.loads(r.text)
 
+    if 'error' in response and response['error']:
+        print(response['message'])
+        return
+
     # 2) mark the bug as spam
     cc_list = response['bugs'][0]['cc']
     data = {
@@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }
 
     r = requests.put(u, json = data)
@@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)
-- 
2.9.2



Re: [PATCH] Add mark_spam.py script

2016-08-13 Thread Martin Liška

On 08/12/2016 11:15 PM, Joseph Myers wrote:

> Next observation on this script: it dies if a bug number in the given
> range doesn't exist, with an error like:
>
> Marking as spam: PR75336
> Traceback (most recent call last):
>   File "./mark_spam.py", line 98, in <module>
>     mark_as_spam(id, args.api_key, args.verbose)
>   File "./mark_spam.py", line 38, in mark_as_spam
>     cc_list = response['bugs'][0]['cc']
> KeyError: 'bugs'
>
> It would be more convenient if it ignored nonexistent bugs rather than
> falling over like this, so that it's only necessary to check that the
> range you pass to the script has no non-spam bugs in it, not that every
> bug number in the range exists.


That's easy to solve; please try applying the following patch on top
of current trunk.



> (I don't know why there are gaps in the bug numbers; I suppose some error
> / timeout occurred while the spammers were creating bugs, at a point after
> a bug number had been reserved in the database but before the transaction
> creating the bug was complete - in such circumstances, databases don't
> necessarily unwind the reservation of a number.)

diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..f206356 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -34,6 +34,10 @@ def mark_as_spam(id, api_key, verbose):
     r = requests.get(u)
     response = json.loads(r.text)
 
+    if 'error' in response and response['error']:
+        print(response['message'])
+        return
+
     # 2) mark the bug as spam
     cc_list = response['bugs'][0]['cc']
     data = {
@@ -49,6 +53,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }
 
     r = requests.put(u, json = data)
@@ -74,7 +79,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)


Re: [PATCH] Add mark_spam.py script

2016-08-12 Thread Joseph Myers
Next observation on this script: it dies if a bug number in the given 
range doesn't exist, with an error like:

Marking as spam: PR75336
Traceback (most recent call last):
  File "./mark_spam.py", line 98, in <module>
    mark_as_spam(id, args.api_key, args.verbose)
  File "./mark_spam.py", line 38, in mark_as_spam
    cc_list = response['bugs'][0]['cc']
KeyError: 'bugs'

It would be more convenient if it ignored nonexistent bugs rather than 
falling over like this, so that it's only necessary to check that the 
range you pass to the script has no non-spam bugs in it, not that every 
bug number in the range exists.

(I don't know why there are gaps in the bug numbers; I suppose some error 
/ timeout occurred while the spammers were creating bugs, at a point after 
a bug number had been reserved in the database but before the transaction 
creating the bug was complete - in such circumstances, databases don't 
necessarily unwind the reservation of a number.)

-- 
Joseph S. Myers
jos...@codesourcery.com
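
The behaviour requested above (skip bug numbers that do not exist instead of
raising `KeyError`) could look roughly like the following. This is a sketch
under assumptions: `first_bug_or_none` is a hypothetical helper, and the sample
error payload only imitates the `error` and `message` fields that the
follow-up patch in this thread checks.

```python
def first_bug_or_none(response):
    """Return the first bug of a decoded reply, or None if an error was reported."""
    if response.get('error'):
        # Report the server's message and let the caller move on.
        print(response.get('message', 'unknown error'))
        return None
    bugs = response.get('bugs') or []
    return bugs[0] if bugs else None


# Hypothetical replies: one for a gap in the bug numbers, one for a real bug.
missing = {'error': True, 'message': 'Bug #75336 does not exist.', 'code': 101}
found = {'bugs': [{'id': 72635, 'cc': ['someone@example.com']}]}

for reply in (missing, found):
    bug = first_bug_or_none(reply)
    if bug is None:
        continue  # nonexistent bug: skip it rather than abort the whole range
    print('would mark PR%d' % bug['id'])
```

With a check like this, a gap in the numbering only costs one skipped
iteration, so the caller needs to verify just that the range holds no
non-spam bugs.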


Re: [PATCH] Add mark_spam.py script

2016-08-12 Thread Martin Liška
On 08/12/2016 06:10 PM, Joseph Myers wrote:
> On Wed, 10 Aug 2016, Martin Liška wrote:
> 
>> On 08/10/2016 10:49 PM, Joseph Myers wrote:
>>> The latest spam bugs have spam attachments as well.  I'm not sure if the
>>> API can delete attachments, but it would be helpful for the script to do
>>> as much as possible with them (change filenames, descriptions, MIME types,
>>> mark them as obsolete).
>>>
>>
>> I'm testing this, if it's working I'll install the patch.
> 
> Thanks, this script is very useful, some more observations on spam bugs:

That's good, but I would appreciate having a more precise spam filter ;)

> 
> Attachment filenames can be spammish (e.g. see bug 74852), is it possible 
> to change those to "spam" as well?
> 
> Although most spam attachments seem to be application/pdf, some are image 
> types and get displayed inline, so changing the MIME type to 
> application/x-spam would be an improvement in those cases.
> 
> A few spam bugs have URL set to a spam link, so emptying URL in all cases 
> when marking as spam would make sense.
> 

Implemented in the attached patch; I'll commit it after the weekend if there
are no comments.

Martin
diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index 569a03d..960ba51 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -49,6 +49,7 @@ def mark_as_spam(id, api_key, verbose):
         'cc': {'remove': cc_list},
         'priority': 'P5',
         'severity': 'trivial',
+        'url': '',
         'assigned_to': 'unassig...@gcc.gnu.org' }
 
     r = requests.put(u, json = data)
@@ -74,7 +75,12 @@ def mark_as_spam(id, api_key, verbose):
     for a in attachments:
         attachment_id = a['id']
         url = '%sbug/attachment/%d' % (base_url, attachment_id)
-        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        r = requests.put(url, json = {'ids': [attachment_id],
+            'summary': 'spam',
+            'file_name': 'spam',
+            'content_type': 'application/x-spam',
+            'is_obsolete': True,
+            'api_key': api_key})
         if verbose:
             print(r)
             print(r.text)


Re: [PATCH] Add mark_spam.py script

2016-08-12 Thread Joseph Myers
On Wed, 10 Aug 2016, Martin Liška wrote:

> On 08/10/2016 10:49 PM, Joseph Myers wrote:
> > The latest spam bugs have spam attachments as well.  I'm not sure if the
> > API can delete attachments, but it would be helpful for the script to do
> > as much as possible with them (change filenames, descriptions, MIME types,
> > mark them as obsolete).
> > 
> 
> I'm testing this, if it's working I'll install the patch.

Thanks, this script is very useful, some more observations on spam bugs:

Attachment filenames can be spammish (e.g. see bug 74852), is it possible 
to change those to "spam" as well?

Although most spam attachments seem to be application/pdf, some are image 
types and get displayed inline, so changing the MIME type to 
application/x-spam would be an improvement in those cases.

A few spam bugs have URL set to a spam link, so emptying URL in all cases 
when marking as spam would make sense.

-- 
Joseph S. Myers
jos...@codesourcery.com
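
The three suggestions above (neutral filename, non-displayable MIME type,
empty URL) amount to two small request bodies. A minimal sketch follows, using
the field names from the patches in this thread; the helper names
`attachment_spam_fields` and `bug_url_reset_fields` are hypothetical, the API
key is a placeholder, and no request is actually sent.

```python
def attachment_spam_fields(attachment_id, api_key):
    """Build the update body that neutralizes one attachment."""
    return {
        'ids': [attachment_id],
        'summary': 'spam',                       # description shown in the bug
        'file_name': 'spam',                     # hide a spammish filename
        'content_type': 'application/x-spam',    # stop inline display of images
        'is_obsolete': True,
        'api_key': api_key,
    }


def bug_url_reset_fields(bug_id, api_key):
    """Build the part of the bug update that clears the URL field."""
    return {'ids': [bug_id], 'url': '', 'api_key': api_key}


# Hypothetical IDs and key, for illustration only.
body = attachment_spam_fields(39437, 'my_api_key')
print(body['file_name'], body['content_type'])
print(bug_url_reset_fields(74852, 'my_api_key'))
```

Keeping the bodies in small functions makes it easy to inspect exactly which
fields are overwritten before anything is sent to the server.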

Re: [PATCH] Add mark_spam.py script

2016-08-10 Thread Martin Liška

On 08/10/2016 10:49 PM, Joseph Myers wrote:

> The latest spam bugs have spam attachments as well.  I'm not sure if the
> API can delete attachments, but it would be helpful for the script to do
> as much as possible with them (change filenames, descriptions, MIME types,
> mark them as obsolete).



I'm testing this, if it's working I'll install the patch.

Martin
diff --git a/contrib/mark_spam.py b/contrib/mark_spam.py
index cc394dc..569a03d 100755
--- a/contrib/mark_spam.py
+++ b/contrib/mark_spam.py
@@ -67,6 +67,18 @@ def mark_as_spam(id, api_key, verbose):
         print(r)
         print(r.text)
 
+    # 4) mark all attachments as spam
+    r = requests.get(u + '/attachment')
+    response = json.loads(r.text)
+    attachments = response['bugs'][str(id)]
+    for a in attachments:
+        attachment_id = a['id']
+        url = '%sbug/attachment/%d' % (base_url, attachment_id)
+        r = requests.put(url, json = {'ids': [attachment_id], 'summary': 'spam', 'comment': 'spam', 'is_obsolete': True, 'api_key': api_key})
+        if verbose:
+            print(r)
+            print(r.text)
+
 parser = argparse.ArgumentParser(description='Mark Bugzilla issues as spam.')
 parser.add_argument('api_key', help = 'API key')
 parser.add_argument('range', help = 'Range of IDs, e.g. 10-23,24,25,27')


Re: [PATCH] Add mark_spam.py script

2016-08-10 Thread Joseph Myers
The latest spam bugs have spam attachments as well.  I'm not sure if the 
API can delete attachments, but it would be helpful for the script to do 
as much as possible with them (change filenames, descriptions, MIME types, 
mark them as obsolete).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Add mark_spam.py script

2016-07-27 Thread Jeff Law

On 07/26/2016 06:39 AM, Martin Liška wrote:

> Hello.
>
> This is a Python script that uses the Bugzilla API to mark PRs as spam:
>
> $ ./mark_spam.py --help
> usage: mark_spam.py [-h] [--verbose] api_key range
>
> Mark Bugzilla issues as spam.
>
> positional arguments:
>   api_key API key
>   range   Range of IDs, e.g. 10-23,24,25,27
>
> optional arguments:
>   -h, --help  show this help message and exit
>   --verbose   Verbose logging
>
> Sample usage:
> $ ./mark_spam.py my_api_key 72634-72636
> Marking as spam: PR72634
> Marking as spam: PR72635
> Marking as spam: PR72636
>
> An API key can be set up here:
> https://gcc.gnu.org/bugzilla/userprefs.cgi?tab=apikey
>
> Sample PR marked by the script:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72635
>
> Ready to install?
> Martin
>
>
> 0001-Add-mark_spam.py-script.patch
>
>
> From 467dc2cf8f0c549f5d7ee190efe59c841a9acad9 Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 26 Jul 2016 14:34:55 +0200
> Subject: [PATCH] Add mark_spam.py script
>
> contrib/ChangeLog:
>
> 2016-07-26  Martin Liska  
>
> 	* mark_spam.py: New file.

OK.
jeff
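
The `range` argument in the help text above accepts a mix of single IDs and
inclusive ranges. One way such a specification could be expanded is sketched
below; `parse_id_range` is a hypothetical helper written for illustration and
is not taken from the committed script.

```python
def parse_id_range(spec):
    """Expand a spec such as '10-12,24,27' into a list of bug IDs."""
    ids = []
    for chunk in spec.split(','):
        parts = [int(p) for p in chunk.split('-')]
        if len(parts) == 1:
            ids.append(parts[0])
        elif len(parts) == 2:
            # Both ends of 'first-last' are included.
            ids.extend(range(parts[0], parts[1] + 1))
        else:
            raise ValueError('malformed range chunk: %r' % chunk)
    return ids


# The sample invocation from the message covers three PRs.
for bug_id in parse_id_range('72634-72636'):
    print('Marking as spam: PR%d' % bug_id)
```

Malformed chunks such as `1-2-3` or an empty chunk raise `ValueError` here, so
a typo in the range is caught before any bug is modified.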