It would help to add a comment column explaining why a changeset/object's 
content is being flagged as spam (SEO marketing descriptions/info, etc.).
 
Pierre 
 

    On Monday, March 5, 2018, 13:05:41 EST, Jason Remillard 
<[email protected]> wrote:  
 
 Hi Dave,
The detector needs to be "trained" on what a spam changeset looks like versus 
what a normal changeset looks like. Training really means programming the 
detector by example. 
Once we have a good set of example changesets, going forward, it will find them 
on its own. 
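
To make "programming the detector by example" concrete, here is a minimal 
sketch of the general idea. It is an assumption for illustration only, not 
the code in the linked repository: it uses a small naive Bayes classifier 
over changeset comment words, and the example comments, labels, and function 
names (train, classify) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical, hand-written examples; a real training set would come from
# changeset ids contributed by mappers.
EXAMPLES = [
    ("best cheap seo services call now", "spam"),
    ("buy cheap watches online best price", "spam"),
    ("added missing footpath in park", "normal"),
    ("fixed street name and added buildings", "normal"),
]

def tokenize(text):
    """Lowercase the comment and split it on whitespace."""
    return text.lower().split()

def train(examples):
    """Count words per label. Assumes each label has at least one example."""
    word_counts = {"spam": Counter(), "normal": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["normal"])
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the higher naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(EXAMPLES)
print(classify(model, "cheap seo services online"))
print(classify(model, "added buildings and footpath"))
```

The more varied the labeled examples, the less the word counts reflect any 
one person's idea of spam.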
Rather than having me or Fredrick decide what is SPAM or not, getting a 
diverse set of changesets from many people will ensure that the algorithm is not 
biased relative to where the consensus is in the project. That is why I posted 
this to talk, not dev. People who map are needed for this task.
Finally, this is just a software component. It will still need to be integrated 
into final end user tools. By doing the specialized machine learning code 
first, I am hoping to get some collaborators that are interested in integrating 
this into tools that everybody can use. But without the curated changeset list, 
it is going nowhere. Long term, hopefully it will get integrated into several 
tools... 
Jason
On Mon, Mar 5, 2018 at 12:42 PM, Dave F <[email protected]> wrote:

  Struggling to understand this.
 If users are expected to send you changeset ids, how does it "detect spam"?
 In what way are users informed of spammy changesets?
 
 DaveF
 
 On 05/03/2018 14:06, Jason Remillard wrote:
  
  Hi, 
 
  This weekend I put together a SPAM detector for OSM changesets. 
 
 https://github.com/jremillard/osm-changeset-classification
 
  You don't need to be a developer to contribute, send over any SPAM'y 
changesets you come across via a github issue, a pull request, or even an email 
to me. I just need the changeset id. 
 
  The code is currently hitting 99+% accuracy at distinguishing between 
1500 random normal edits and 1500 sketchy changesets that Fredrick shared with 
the talk-us list last week. This is with zero tuning, so it looks like it will 
work well.
 
  Jason
   
  
 _______________________________________________
talk mailing list
[email protected]
https://lists.openstreetmap.org/listinfo/talk
 
 
 