Re: Open source archives hosting malicious software packages

2017-09-22 Thread Neil Bowers
First cut at a script to check new CPAN packages:
https://github.com/neilb/cpan-watcher 


At the moment it just flags:
- Package names that are confusable with packages in other dists
- Package names which don’t come under the expected main package name

The first time you run it, it will grab the current CPAN Index. When you next 
run it (e.g. tomorrow) it will grab the index again, and then check the packages 
that are new since the previous run. It expects $HOME/cpan-watcher to exist.
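
The "grab the index, then compare with the previous one" step could be sketched roughly as below. This is not taken from cpan-watcher. It assumes the index is in the 02packages.details.txt layout (a header block, a blank line, then one line per package giving name, version and distribution path). The function names and the sample data (author EXAMPLE, version 0.01) are invented for illustration.

```python
def parse_index(text):
    """Map package name -> distribution path from a 02packages-style index."""
    packages = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # The header block ends at the first blank line.
            if line.strip() == "":
                in_body = True
            continue
        fields = line.split()
        if len(fields) >= 3:
            # fields: package name, version, distribution path
            packages[fields[0]] = fields[2]
    return packages


def new_packages(old_text, new_text):
    """Return the packages present in the new snapshot but not in the old one."""
    old = parse_index(old_text)
    new = parse_index(new_text)
    return {pkg: dist for pkg, dist in new.items() if pkg not in old}


# Invented sample snapshots; only the layout is meant to be realistic.
YESTERDAY = """File: 02packages.details.txt
Description: sample index for illustration

Dancer::Logger::LogAny  0.01  E/EX/EXAMPLE/Dancer-Logger-LogAny-0.01.tar.gz
"""

TODAY = YESTERDAY + (
    "Dancer2::Logger::LogAny  0.01  "
    "E/EX/EXAMPLE/Dancer2-Logger-LogAny-0.01.tar.gz\n"
)

if __name__ == "__main__":
    for pkg, dist in sorted(new_packages(YESTERDAY, TODAY).items()):
        print(f"new package {pkg} ({dist})")
```

Only the packages returned by new_packages() would then need to go through the confusability and namespace checks.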

And the output today:

new package Dancer2::Logger::LogAny (dist Dancer2-Logger-LogAny) is confusable 
with package Dancer::Logger::LogAny (dist Dancer-Logger-LogAny)
new package Device::Chip::AD5691R is in dist Device-Chip-AnalogConverters, but 
doesn't match expected namespace (Device::Chip::AnalogConverters)
new package Lab::Moose::Connection::USB is in dist Lab-Measurement, but doesn't 
match expected namespace (Lab::Measurement)
new package Lab::Moose::Connection::VXI11 is in dist Lab-Measurement, but 
doesn't match expected namespace (Lab::Measurement)
new package Lab::Moose::Instrument::ZI_MFIA is in dist Lab-Measurement, but 
doesn't match expected namespace (Lab::Measurement)

I’m going to have this in a crontab, running once a day.

Neil



Re: Open source archives hosting malicious software packages

2017-09-21 Thread David Precious
On Fri, 22 Sep 2017 01:00:22 +1200
Kent Fredric  wrote:

> On 22 September 2017 at 00:11, David Cantrell 
> wrote:
> 
> > But is anyone paying attention? I assume you're talking about
> > #cpantesters, which I'm on, but I hardly ever look at it, and when
> > I do look I certainly don't look at scrollback, let alone looking at
> > scrollback *carefully*.  
> 
> It gets duty on freenode #perl too, and it's not uncommon for people
> like me to glance at https://metacpan.org/recent ( usually to see
> something and regret looking )


Yeah, freenode/#perl is the one I was referring to - 500+ sets of
eyeballs (although how many of them are people likely to recognise
typo-squatting of popular modules and go check them out I don't know).

Certainly agree that something that automatically flags up anything
suspicious-looking would be good - sending the flags to a mailing list
would have the benefit of them not being missed if nobody was looking at
the time.  I'd certainly be happy enough to sit on such a mailing list
and help check anything dodgy-looking.


Re: Open source archives hosting malicious software packages

2017-09-21 Thread H.Merijn Brand
On Fri, 22 Sep 2017 01:00:22 +1200, Kent Fredric
 wrote:

> On 22 September 2017 at 00:11, David Cantrell  wrote:
> 
> > But is anyone paying attention? I assume you're talking about
> > #cpantesters, which I'm on, but I hardly ever look at it, and when I do
> > look I certainly don't look at scrollback, let alone looking at
> > scrollback *carefully*.  
> 
> It gets duty on freenode #perl too, and it's not uncommon for people
> like me to glance at https://metacpan.org/recent ( usually to see
> something and regret looking )

Module uploads also show on irc.perl.org #news

* rssbot CPAN: Net-IPAddress-Util-4.000 (PWBENNETT) - 
https://metacpan.org/release/PWBENNETT/Net-IPAddress-Util-4.000
* rssbot CPAN: Map-Tube-3.35 (MANWAR) - 
https://metacpan.org/release/MANWAR/Map-Tube-3.35
* rssbot CPAN: Syntax-Highlight-Engine-Simple-0.101 (AKHUETTEL) - 
https://metacpan.org/release/AKHUETTEL/Syntax-Highlight-Engine-Simple-0.101
* rssbot CPAN: Syntax-Highlight-Engine-Simple-Perl-0.04 (AKHUETTEL) - 
https://metacpan.org/release/AKHUETTEL/Syntax-Highlight-Engine-Simple-Perl-0.04
* rssbot CPAN: Syntax-Highlight-Engine-Simple-HTML-0.04 (AKHUETTEL) - 
https://metacpan.org/release/AKHUETTEL/Syntax-Highlight-Engine-Simple-HTML-0.04
* rssbot CPAN: Git-Hooks-2.1.4 (GNUSTAVO) - 
https://metacpan.org/release/GNUSTAVO/Git-Hooks-2.1.4
* rssbot CPAN: SPVM-0.0269 (KIMOTO) - 
https://metacpan.org/release/KIMOTO/SPVM-0.0269
* rssbot CPAN: Linux-DesktopFiles-0.21 (TRIZEN) - 
https://metacpan.org/release/TRIZEN/Linux-DesktopFiles-0.21

If rssbot can add lines with a warning about what this thread is about,
and with a tag any IRC client is able to highlight, spotting that
would be a relatively easy job for humans.

A (cron) job could also run periodically and mail those that subscribe
with a list of all suspicious uploads since the last run

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/



Re: Open source archives hosting malicious software packages

2017-09-21 Thread Kent Fredric
On 22 September 2017 at 00:11, David Cantrell  wrote:

> But is anyone paying attention? I assume you're talking about
> #cpantesters, which I'm on, but I hardly ever look at it, and when I do
> look I certainly don't look at scrollback, let alone looking at
> scrollback *carefully*.

It gets duty on freenode #perl too, and it's not uncommon for people
like me to glance at https://metacpan.org/recent ( usually to see
something and regret looking )



-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


Re: Open source archives hosting malicious software packages

2017-09-21 Thread David Cantrell
On Wed, Sep 20, 2017 at 11:13:50PM +0100, David Precious wrote:

> One thing I think is good to consider is the fact that all CPAN releases
> get announced on a quite populated IRC channel, increasing the chance of
> someone spotting a release announcement and thinking "hmm, that looks
> dodgy" - but that's of course not entirely reliable, and doesn't focus
> only on new releases.

But is anyone paying attention? I assume you're talking about
#cpantesters, which I'm on, but I hardly ever look at it, and when I do
look I certainly don't look at scrollback, let alone looking at
scrollback *carefully*.

-- 
David Cantrell | Godless Liberal Elitist

Planckton: n, the smallest possible living thing


Re: Open source archives hosting malicious software packages

2017-09-21 Thread Kent Fredric
On 21 September 2017 at 20:24, Neil Bowers  wrote:

> I’ll tweak my script to not worry about packages in the same distribution
> (eg Acme::Flat::GV and Acme::Flat::HV). Then I just need to get a list of
> new packages each day, and I’m just about there :-)

I'd probably want PAUSE trust modelling to play a part too. On the
basis that people are unlikely to typo-squat themselves, and that
recognized, reputable authors are less likely to typo-squat.

(Because reputation is an important thing to maintain in open source:
tarnish your reputation and nobody will use your stuff any more.)

Which, by inversion, means that newer authors are more disposed to
typo-squatting, and that people are more likely to typo squat things
dissimilar to what they already own.
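
As a rough sketch of how that trust heuristic might be applied to a list of already-flagged name pairs: drop pairs where the same author owns both names, then list the rest with the least-established uploaders first. All of this is illustrative. The function name, the tuple layout, the author IDs and the package counts are invented, and "number of packages already owned" is only a crude stand-in for reputation.

```python
def rank_candidates(candidates, packages_by_author):
    """Order flagged pairs so the least-established uploaders come first.

    candidates: list of (new_pkg, new_author, existing_pkg, existing_author)
    packages_by_author: author ID -> number of packages they already own
    """
    # People are unlikely to typo-squat themselves: drop self-collisions.
    kept = [c for c in candidates if c[1] != c[3]]
    # Authors with no history sort first; unknown authors count as zero.
    return sorted(kept, key=lambda c: packages_by_author.get(c[1], 0))


if __name__ == "__main__":
    # Placeholder authors and counts, not real PAUSE data.
    flagged = [
        ("Acme::Flat::HV", "ALICE", "Acme::Flat::GV", "ALICE"),
        ("Algorithm::VSM", "BOB", "Algorithm::SVM", "CAROL"),
        ("AI::PSO", "NEWBIE", "AI::POS", "CAROL"),
    ]
    history = {"ALICE": 40, "BOB": 12, "CAROL": 7}
    for new_pkg, author, existing_pkg, _ in rank_candidates(flagged, history):
        print(f"{new_pkg} by {author} is confusable with {existing_pkg}")
```

A real version would want the PAUSE permissions data as its input rather than a hand-built dictionary.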

A long time ago, I was discussing with somebody, I can't remember who,
that we could generalize this problem as a public feed, allowing
anyone to review new module permissions assignments and changes.

Having public access to the permissions list is good, but having some
sort of feed that makes it public knowledge every time a new
permission occurs, or every time a permission change occurs, would do
wonders for this problem ( and others, like the surprise change of
hands of important but undermaintained modules into the hands of
potentially too keen maintainers )

It would even expose attempts at smuggling typo-squatted names in the
back of distros with dissimilar names, similar to cuckoo-packages.


-- 
Kent

KENTNL - https://metacpan.org/author/KENTNL


Re: Open source archives hosting malicious software packages

2017-09-21 Thread Neil Bowers
> Would anyone know of any prior art for detection of "short edit distances"?  
> (Perhaps even already on CPAN?)

As David & Zefram pointed out, Levenshtein is the classic algorithm for this, 
but there are plenty of others; in the SEE ALSO for Text::Levenshtein I’ve 
listed at least some of the ones I know of on CPAN:
https://metacpan.org/pod/Text::Levenshtein#SEE-ALSO

A better algorithm for this purpose is the Damerau-Levenshtein edit distance.
Classic Levenshtein counts the number of insertions, deletions, and 
substitutions needed to get from one string to the other. Comparing 
"Algorithm::SVM" and "Algorithm::VSM" gives an edit distance of 2.
The Damerau variant also counts a transposition of two adjacent characters as 
a single edit. This results in an edit distance of 1 for the example above, 
which is how my script found it.

I used Text::Levenshtein::Damerau::XS, because it’s quicker. That’s how I found 
the examples I gave yesterday.
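
To make the difference concrete, here is a small pure-Python sketch of both distances. It is not the code from Text::Levenshtein::Damerau::XS. The second function is the restricted "optimal string alignment" form of Damerau-Levenshtein, which is enough to show the effect of counting adjacent transpositions. Function names are illustrative.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def osa_distance(a, b):
    """Optimal string alignment: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped count as one edit.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


if __name__ == "__main__":
    print(levenshtein("Algorithm::SVM", "Algorithm::VSM"))   # prints 2
    print(osa_distance("Algorithm::SVM", "Algorithm::VSM"))  # prints 1
```

The same holds for AI::POS and AI::PSO: 2 with classic Levenshtein, 1 once transpositions are counted.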

I’ll tweak my script to not worry about packages in the same distribution (eg 
Acme::Flat::GV and Acme::Flat::HV). Then I just need to get a list of new 
packages each day, and I’m just about there :-)

Neil



Re: Open source archives hosting malicious software packages

2017-09-20 Thread Zefram
James E Keenan wrote:
>Would anyone know of any prior art for detection of "short edit distances"?
>(Perhaps even already on CPAN?)

Text::Levenshtein.

-zefram


Re: Open source archives hosting malicious software packages

2017-09-20 Thread David Precious
On Wed, 20 Sep 2017 18:08:34 -0400
James E Keenan  wrote:

> On 09/20/2017 06:01 PM, Neil Bowers wrote:
> > One thing we could do is have a tool looking at newly registered
> > package names and alert the PAUSE admins to have a look at any that
> > are a short edit distance from an existing package name. 
> 
> Would anyone know of any prior art for detection of "short edit 
> distances"?  (Perhaps even already on CPAN?)

Isn't that just the Levenshtein distance?  So e.g.
Neil's Text::Levenshtein?

One thing I think is good to consider is the fact that all CPAN releases
get announced on a quite populated IRC channel, increasing the chance of
someone spotting a release announcement and thinking "hmm, that looks
dodgy" - but that's of course not entirely reliable, and doesn't focus
only on new releases.


Re: Open source archives hosting malicious software packages

2017-09-20 Thread James E Keenan

On 09/20/2017 06:01 PM, Neil Bowers wrote:
>>> http://www.theregister.co.uk/2017/09/15/pretend_python_packages_prey_on_poor_typing/
>>> Would CPAN be subject to the same problem as described in the article above?
>>
>> Yes.
>>
>> DBI::Class, for example, could be a typo for DBIx::Class or a
>> misremembered Class::DBI, and there's nothing stopping anyone from
>> uploading a DBI::Class package that does all kinds of dodgy stuff.
>
> There are plenty of confusable (small edit distance) pairs of module names on 
> CPAN.
>
> For example,
> Algorithm::SVM and Algorithm::VSM
> AI::POS and AI::PSO
> both pairs are from different dists. More likely with short acronyms.
>
> One thing we could do is have a tool looking at newly registered package names 
> and alert the PAUSE admins to have a look at any that are a short edit distance 
> from an existing package name.

Would anyone know of any prior art for detection of "short edit 
distances"?  (Perhaps even already on CPAN?)


Thank you very much.
Jim Keenan


Re: Open source archives hosting malicious software packages

2017-09-20 Thread Neil Bowers
>> http://www.theregister.co.uk/2017/09/15/pretend_python_packages_prey_on_poor_typing/
>> Would CPAN be subject to the same problem as described in the article above?
> 
> Yes.
> 
> DBI::Class, for example, could be a typo for DBIx::Class or a
> misremembered Class::DBI, and there's nothing stopping anyone from
> uploading a DBI::Class package that does all kinds of dodgy stuff.

There are plenty of confusable (small edit distance) pairs of module names on 
CPAN.

For example,
Algorithm::SVM and Algorithm::VSM
AI::POS and AI::PSO
both pairs are from different dists. More likely with short acronyms.

One thing we could do is have a tool looking at newly registered package names 
and alert the PAUSE admins to have a look at any that are a short edit distance 
from an existing package name.

Neil


Re: Open source archives hosting malicious software packages

2017-09-20 Thread David Cantrell
On Fri, Sep 15, 2017 at 07:11:49PM -0400, James E Keenan wrote:

> http://www.theregister.co.uk/2017/09/15/pretend_python_packages_prey_on_poor_typing/
> 
> Would CPAN be subject to the same problem as described in the article above?

Yes.

DBI::Class, for example, could be a typo for DBIx::Class or a
misremembered Class::DBI, and there's nothing stopping anyone from
uploading a DBI::Class package that does all kinds of dodgy stuff.

-- 
David Cantrell | semi-evolved ape-thing

  Longum iter est per praecepta, breve et efficax per exempla.