Bug#738342: lintian: checks/cruft - GFDL check is slow

2014-02-10 Thread Bastien ROUCARIES
On 9 Feb 2014 13:54, "Niels Thykier" wrote:
>
> Package: lintian
> Version: 2.5.21
> Severity: normal
>
> A quick benchmark suggests that lintian spends nearly 2 minutes on the
> Linux source package (I tested with linux/3.10~rc7-1~exp1).  Profiling
> Lintian with perl -d:NYTProf suggests that the vast majority of the time
> is spent in:
>
> """
> if ($cleanedblock =~ $gfdlpattern) {
> """
>
> Where $gfdlpattern is one of:
>
> """
> # classical gfdl matching pattern
> my $normalgfdlpattern = qr/
>  (?'contextbefore'(?:
> (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
> (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
>  gnu \s+ free \s+ documentation \s+ license
>  (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
>  a \s+ copy \s+ of \s+ the \s+ license \s+ is
> /xsmo;
>
> # for first block we get context from the beginning
> my $firstblockgfdlpattern = qr/
>  (?'rawcontextbefore'(?:
> (?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
>   \A(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){0,1024}|
> (?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)
>   )
>  )
>  gnu \s+ free \s+ documentation \s+ license
>  (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
>  a \s+ copy \s+ of \s+ the \s+ license \s+ is
>  /xsmo;
> """
>
>
> The profiler suggests that 60% of the runtime is spent in the
> "CORE:match" operations inside "license_check" from c/cruft.  The
> regex appears to be hit "only" 2452 times, but it spends an average of
> 55.9 ms per match, totalling 137 s.
>
> Bastien, do you have any ideas for reducing the cost of the regex?

Yes, I have.

Run these regexps only when the block already matches the plain phrase
"gnu free documentation license".
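
A minimal sketch of that idea, not the actual lintian code. It assumes the
block has already been lowercased and cleaned, as $cleanedblock suggests.
The names $guard, $expensive and gfdl_sections are hypothetical, and
$expensive is a shortened stand-in for the patterns quoted below (it drops
the context-before group).

```perl
use strict;
use warnings;

# Cheap guard: the phrase every GFDL match must contain.
my $guard = qr/gnu \s+ free \s+ documentation \s+ license/xsm;

# Shortened stand-in for the expensive pattern from the report.
my $expensive = qr/
    gnu \s+ free \s+ documentation \s+ license
    (?<rawgfdlsections>(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
    a \s+ copy \s+ of \s+ the \s+ license \s+ is
/xsm;

# Returns the text between the license name and "a copy of the license is",
# or undef when the block does not match.
sub gfdl_sections {
    my ($cleanedblock) = @_;

    # Skip the expensive pattern unless the anchor phrase is present.
    return undef unless $cleanedblock =~ $guard;
    return undef unless $cleanedblock =~ $expensive;
    return $+{rawgfdlsections};
}
```

Blocks that never mention the license name return after the first, simple
match, so the lookahead-heavy pattern only runs on candidate blocks.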

Bastien
>
> ~Niels
>


Bug#738342: lintian: checks/cruft - GFDL check is slow

2014-02-09 Thread Niels Thykier
Package: lintian
Version: 2.5.21
Severity: normal

A quick benchmark suggests that lintian spends nearly 2 minutes on the
Linux source package (I tested with linux/3.10~rc7-1~exp1).  Profiling
Lintian with perl -d:NYTProf suggests that the vast majority of the time
is spent in:

"""
if ($cleanedblock =~ $gfdlpattern) {
"""

Where $gfdlpattern is one of:

"""
# classical gfdl matching pattern
my $normalgfdlpattern = qr/
 (?'contextbefore'(?:
(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
(?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)))
 gnu \s+ free \s+ documentation \s+ license
 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
 a \s+ copy \s+ of \s+ the \s+ license \s+ is
/xsmo;

# for first block we get context from the beginning
my $firstblockgfdlpattern = qr/
 (?'rawcontextbefore'(?:
(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){1024}|
  \A(?:(?!a \s+ copy \s+ of \s+ the \s+ license \s+ is).){0,1024}|
(?:\s+ copy \s+ of \s+ the \s+ license \s+ is.{0,1024}?)
  )
 )
 gnu \s+ free \s+ documentation \s+ license
 (?'rawgfdlsections'(?:(?!gnu \s+ free \s+ documentation \s+ license).){0,1024}?)
 a \s+ copy \s+ of \s+ the \s+ license \s+ is
 /xsmo;
"""


The profiler suggests that 60% of the runtime is spent in the
"CORE:match" operations inside "license_check" from c/cruft.  The
regex appears to be hit "only" 2452 times, but it spends an average of
55.9 ms per match, totalling 137 s.
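
As a sanity check, the hit count and the per-match average reproduce the
reported total. The snippet only redoes that arithmetic; the variable names
are illustrative.

```perl
use strict;
use warnings;

my $hits    = 2452;   # times the regex was hit, per the profile
my $avg_ms  = 55.9;   # average time per match, in milliseconds
my $total_s = $hits * $avg_ms / 1000;

printf "%.1f s\n", $total_s;   # prints "137.1 s"
```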

Bastien, do you have any ideas for reducing the cost of the regex?

~Niels

