Re: Open source archives hosting malicious software packages

2017-09-20 Thread James E Keenan

On 09/20/2017 06:01 PM, Neil Bowers wrote:

>>> http://www.theregister.co.uk/2017/09/15/pretend_python_packages_prey_on_poor_typing/
>>>
>>> Would CPAN be subject to the same problem as described in the article above?
>>
>> Yes.
>>
>> DBI::Class, for example, could be a typo for DBIx::Class or a
>> misremembered Class::DBI, and there's nothing stopping anyone from
>> uploading a DBI::Class package that does all kinds of dodgy stuff.
>
> There are plenty of confusable (small edit distance) pairs of module names on
> CPAN.
>
> For example,
> Algorithm::SVM and Algorithm::VSM
> AI::POS and AI::PSO
> Both pairs are from different dists. More likely with short acronyms.
>
> One thing we could do is have a tool looking at newly registered package names
> and alert the PAUSE admins to have a look at any that are a short edit distance
> from an existing package name.



Would anyone know of any prior art for detection of "short edit 
distances"?  (Perhaps even already on CPAN?)


Thank you very much.
Jim Keenan


Re: Open source archives hosting malicious software packages

2017-09-20 Thread Zefram
James E Keenan wrote:
>Would anyone know of any prior art for detection of "short edit distances"?
>(Perhaps even already on CPAN?)

Text::Levenshtein.

-zefram
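
For readers unfamiliar with the measure, the following is a minimal sketch of plain Levenshtein distance (insertions, deletions, substitutions). It is written in Python purely for illustration and is not the Text::Levenshtein implementation; the function name levenshtein is an assumption of this sketch.

```python
def levenshtein(a: str, b: str) -> int:
    """Illustrative Levenshtein distance; not the Text::Levenshtein code."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for pair in [("DBI::Class", "DBIx::Class"),
                 ("Algorithm::SVM", "Algorithm::VSM"),
                 ("AI::POS", "AI::PSO")]:
        print(pair, levenshtein(*pair))
```

Under this measure a swapped pair of adjacent letters (SVM/VSM, POS/PSO) counts as two edits, while a single missing letter (DBI/DBIx) counts as one, so a threshold of 1 would miss the transposed acronyms.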


Re: Open source archives hosting malicious software packages

2017-09-20 Thread David Precious
On Wed, 20 Sep 2017 18:08:34 -0400
James E Keenan wrote:

> On 09/20/2017 06:01 PM, Neil Bowers wrote:
> > One thing we could do is have a tool looking at newly registered
> > package names and alert the PAUSE admins to have a look at any that
> > are a short edit distance from an existing package name. 
> 
> Would anyone know of any prior art for detection of "short edit 
> distances"?  (Perhaps even already on CPAN?)

Isn't that just the Levenshtein distance?  So e.g.
Neil's Text::Levenshtein?

One thing I think is good to consider is the fact that all CPAN releases
get announced on a quite populated IRC channel, increasing the chance of
someone spotting a release announcement and thinking "hmm, that looks
dodgy" - but that's of course not entirely reliable, and doesn't focus
only on new releases.


Re: Open source archives hosting malicious software packages

2017-09-20 Thread Neil Bowers
>> http://www.theregister.co.uk/2017/09/15/pretend_python_packages_prey_on_poor_typing/
>>
>> Would CPAN be subject to the same problem as described in the article above?
> 
> Yes.
> 
> DBI::Class, for example, could be a typo for DBIx::Class or a
> misremembered Class::DBI, and there's nothing stopping anyone from
> uploading a DBI::Class package that does all kinds of dodgy stuff.

There are plenty of confusable (small edit distance) pairs of module names on 
CPAN.

For example,
Algorithm::SVM and Algorithm::VSM
AI::POS and AI::PSO
Both pairs are from different dists. More likely with short acronyms.

One thing we could do is have a tool looking at newly registered package names 
and alert the PAUSE admins to have a look at any that are a short edit distance 
from an existing package name.

Neil
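
The check proposed above could be sketched roughly as follows. This is only an illustration in Python, not an existing PAUSE tool: the names osa_distance and confusable_names, the case-folding, and the default threshold of 1 are all assumptions. It uses the optimal string alignment variant of edit distance, which counts a swap of two adjacent characters as a single edit, so that pairs such as SVM/VSM fall within a threshold of 1.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent swaps."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition, e.g. "SV" <-> "VS".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def confusable_names(new_name, existing_names, max_distance=1):
    """Return (name, distance) for existing names close to new_name.

    Hypothetical helper: compares case-insensitively and skips exact matches.
    """
    hits = []
    for name in existing_names:
        if name == new_name:
            continue
        dist = osa_distance(new_name.lower(), name.lower())
        if dist <= max_distance:
            hits.append((name, dist))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    existing = ["DBIx::Class", "Class::DBI", "Algorithm::SVM", "AI::PSO"]
    for candidate in ["DBI::Class", "Algorithm::VSM", "AI::POS"]:
        print(candidate, confusable_names(candidate, existing))
```

Note that a reordered name such as Class::DBI versus DBI::Class is far apart by any character-level edit distance, so a check like this would catch typos but not misremembered word order.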


Re: Requesting co-maint for MP3-Info

2017-09-20 Thread Neil Bowers
Hi JJ,

> > This email is to request co-maintenance for the above mentioned module 
> > https://metacpan.org/release/MP3-Info in order to make a new CPAN release.
> 
> I’m trying to track down Daniel, to see if we can get his blessing, but his 
> two email addresses I’ve tried so far have bounced. I’m still working on 
> another approach, so give me a while please.
> 
> :-) Of course.
> 
> How are things going here? Just a quick reminder. If they're going good, no 
> need to answer :-)

Thanks for the nudge. I had been trying to track Daniel down via email & 
LinkedIn, but ended up getting the “ok” via Twitter:



I’ve just given you co-maint, on Daniel’s behalf.

Thanks for persisting to get co-maint on this, and thanks to Daniel for giving 
permission.

Cheers,
Neil