Re: [PATCH 4/7] scripts/leaking_addresses: add reporting

2017-11-08 Thread Tobin C. Harding
On Wed, Nov 08, 2017 at 11:42:21AM +0100, Petr Mladek wrote:
> On Wed 2017-11-08 14:37:36, Tobin C. Harding wrote:
> > Currently script just dumps all results found. Potentially, this risks
> > loosing single results among multiple duplicate results. We need some
> > way of restricting duplicates to assist users of the script. It would
> > also be nice if we got a report instead of raw results.
> > 
> > Duplicates can be defined in various ways, instead of trying to find a
> > single perfect solution we can present the user with various options to
> > display the output. Doing so will typically lead to users wanting to
> > view the output multiple times. Currently we scan the kernel each time,
> > this is slow and unnecessary. We can expedite the process by writing the
> > results to file for subsequent viewing.
> > 
> > Add sub-commands `scan` and `format`. Display output as a report instead
> > of raw results. Add --raw flag to view raw results. Save results to
> > file. For subsequent calls to `format` parse output file instead of
> > re-scanning.
> > 
> > Signed-off-by: Tobin C. Harding 
> > ---
> >  scripts/leaking_addresses.pl | 201 
> > ---
> >  1 file changed, 187 insertions(+), 14 deletions(-)
> > 
> > diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl
> > index 719ed0aaede7..4c31e935319b 100755
> > --- a/scripts/leaking_addresses.pl
> > +++ b/scripts/leaking_addresses.pl
> > @@ -21,14 +21,19 @@ use File::Spec;
> >  use Cwd 'abs_path';
> >  use Term::ANSIColor qw(:constants);
> >  use Getopt::Long qw(:config no_auto_abbrev);
> > +use File::Spec::Functions 'catfile';
> >  
> >  my $P = $0;
> >  my $V = '0.01';
> >  
> > -# Directories to scan.
> > +# Directories to scan (we scan `dmesg` also).
> >  my @DIRS = ('/proc', '/sys');
> >  
> >  # Command line options.
> > +my $output = "scan.out";
> 
> The hard-coded filename and its use is dangerous. Nobody expects that
> this kind of script writes/re-writes a file on the system.

Understood.

> > +my $suppress_dmesg = 0;
> > +my $squash_by_path = 0;
> > +my $raw = 0;
> >  my $help = 0;
> >  my $debug = 0;
> >  
> > @@ -70,21 +75,34 @@ sub help
> > my ($exitcode) = @_;
> >  
> > print << "EOM";
> > -Usage: $P [OPTIONS]
> > +
> > +Usage: $P COMMAND [OPTIONS]
> >  Version: $V
> >  
> > +Commands:
> > +
> > +   scanScan the kernel (savesg raw results to file and runs `format`).
> > +   format  Parse results file and format output.
> 
> The later formatting/filtering might be useful but the use
> of the hard coded file is strange. Also it is pity that
> the script does not do anything useful out of box.
> 
> I suggest to remove the commands and do the scan out of box.
> It should not store anything on the disk by default.
> 
> Then we could define following options:
> 
> -o, --output=Store raw results into file for later 
> formatting.
> -i, --input=   Read raw result from file and just format them.
> 
> Well, it is still somehow non-intuitive. It might help to
> be more explicit:
> 
> -o, --output-raw=
> -i, --input-raw=

So,

 Usage: scripts/leaking_addresses.pl [OPTIONS]

 Options:

-o, --output-raw=  Save results for future processing.
-i, --input-raw=   Read results from file instead of scanning.
--rawShow raw results (default).
--suppress-dmesg Do not show dmesg results.
--squash-by-path Show one result per unique path.
--squash-by-filename Show one result per unique filename.
-d, --debug  Display debugging output.
-h, --help, --versionDisplay this help and exit.

> >  Options:
> >  
> > -   -d, --debugDisplay debugging output.
> > -   -h, --help, --version  Display this help and exit.
> > +   -o, --output=  Raw results output file, used for later 
> > formatting.
> > +   --suppress-dmesg Do not show dmesg results.
> > +   --squash-by-path Show one result per unique path.
> 
> I would personally add also option for the default mode:
> 
>   --squash-by-filename Show one result per unique filename
>(default).
> 
> In fact, I would personally use --squash-by-path or even --raw by
> default. These provide easy to understand output. While the
> --squash-by-filename mode has pretty good results but
> it is quite non-intuitive.

Thanks for you suggestions Petr. Summary reporting by default was
suggested by Linus, but now the summary is implemented and has proven to
be heuristic I tend to agree with you that raw by default is
best. This gives users the information they need to select one of the
summary types i.e if raw output has a bunch of lines from different
paths but all filename FOO then --squash-by-filename may be used.

Thanks for the tips, it's much nicer without the sub-commands.

thanks,
Tobin.


Re: [PATCH 4/7] scripts/leaking_addresses: add reporting

2017-11-08 Thread Petr Mladek
On Wed 2017-11-08 14:37:36, Tobin C. Harding wrote:
> Currently script just dumps all results found. Potentially, this risks
> loosing single results among multiple duplicate results. We need some
> way of restricting duplicates to assist users of the script. It would
> also be nice if we got a report instead of raw results.
> 
> Duplicates can be defined in various ways, instead of trying to find a
> single perfect solution we can present the user with various options to
> display the output. Doing so will typically lead to users wanting to
> view the output multiple times. Currently we scan the kernel each time,
> this is slow and unnecessary. We can expedite the process by writing the
> results to file for subsequent viewing.
> 
> Add sub-commands `scan` and `format`. Display output as a report instead
> of raw results. Add --raw flag to view raw results. Save results to
> file. For subsequent calls to `format` parse output file instead of
> re-scanning.
> 
> Signed-off-by: Tobin C. Harding 
> ---
>  scripts/leaking_addresses.pl | 201 
> ---
>  1 file changed, 187 insertions(+), 14 deletions(-)
> 
> diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl
> index 719ed0aaede7..4c31e935319b 100755
> --- a/scripts/leaking_addresses.pl
> +++ b/scripts/leaking_addresses.pl
> @@ -21,14 +21,19 @@ use File::Spec;
>  use Cwd 'abs_path';
>  use Term::ANSIColor qw(:constants);
>  use Getopt::Long qw(:config no_auto_abbrev);
> +use File::Spec::Functions 'catfile';
>  
>  my $P = $0;
>  my $V = '0.01';
>  
> -# Directories to scan.
> +# Directories to scan (we scan `dmesg` also).
>  my @DIRS = ('/proc', '/sys');
>  
>  # Command line options.
> +my $output = "scan.out";

The hard-coded filename and its use is dangerous. Nobody expects that
this kind of script writes/re-writes a file on the system.

> +my $suppress_dmesg = 0;
> +my $squash_by_path = 0;
> +my $raw = 0;
>  my $help = 0;
>  my $debug = 0;
>  
> @@ -70,21 +75,34 @@ sub help
>   my ($exitcode) = @_;
>  
>   print << "EOM";
> -Usage: $P [OPTIONS]
> +
> +Usage: $P COMMAND [OPTIONS]
>  Version: $V
>  
> +Commands:
> +
> + scanScan the kernel (savesg raw results to file and runs `format`).
> + format  Parse results file and format output.

The later formatting/filtering might be useful but the use
of the hard coded file is strange. Also it is pity that
the script does not do anything useful out of box.

I suggest to remove the commands and do the scan out of box.
It should not store anything on the disk by default.

Then we could define following options:

-o, --output=  Store raw results into file for later formatting.
-i, --input=   Read raw result from file and just format them.

Well, it is still somehow non-intuitive. It might help to
be more explicit:

-o, --output-raw=
-i, --input-raw=


>  Options:
>  
> - -d, --debugDisplay debugging output.
> - -h, --help, --version  Display this help and exit.
> + -o, --output=  Raw results output file, used for later 
> formatting.
> + --suppress-dmesg Do not show dmesg results.
> + --squash-by-path Show one result per unique path.

I would personally add also option for the default mode:

--squash-by-filename Show one result per unique filename
 (default).

In fact, I would personally use --squash-by-path or even --raw by
default. These provide easy to understand output. While the
--squash-by-filename mode has pretty good results but
it is quite non-intuitive.

Best Regards,
Petr


[PATCH 4/7] scripts/leaking_addresses: add reporting

2017-11-07 Thread Tobin C. Harding
Currently script just dumps all results found. Potentially, this risks
loosing single results among multiple duplicate results. We need some
way of restricting duplicates to assist users of the script. It would
also be nice if we got a report instead of raw results.

Duplicates can be defined in various ways, instead of trying to find a
single perfect solution we can present the user with various options to
display the output. Doing so will typically lead to users wanting to
view the output multiple times. Currently we scan the kernel each time,
this is slow and unnecessary. We can expedite the process by writing the
results to file for subsequent viewing.

Add sub-commands `scan` and `format`. Display output as a report instead
of raw results. Add --raw flag to view raw results. Save results to
file. For subsequent calls to `format` parse output file instead of
re-scanning.

Signed-off-by: Tobin C. Harding 
---
 scripts/leaking_addresses.pl | 201 ---
 1 file changed, 187 insertions(+), 14 deletions(-)

diff --git a/scripts/leaking_addresses.pl b/scripts/leaking_addresses.pl
index 719ed0aaede7..4c31e935319b 100755
--- a/scripts/leaking_addresses.pl
+++ b/scripts/leaking_addresses.pl
@@ -21,14 +21,19 @@ use File::Spec;
 use Cwd 'abs_path';
 use Term::ANSIColor qw(:constants);
 use Getopt::Long qw(:config no_auto_abbrev);
+use File::Spec::Functions 'catfile';
 
 my $P = $0;
 my $V = '0.01';
 
-# Directories to scan.
+# Directories to scan (we scan `dmesg` also).
 my @DIRS = ('/proc', '/sys');
 
 # Command line options.
+my $output = "scan.out";
+my $suppress_dmesg = 0;
+my $squash_by_path = 0;
+my $raw = 0;
 my $help = 0;
 my $debug = 0;
 
@@ -70,21 +75,34 @@ sub help
my ($exitcode) = @_;
 
print << "EOM";
-Usage: $P [OPTIONS]
+
+Usage: $P COMMAND [OPTIONS]
 Version: $V
 
+Commands:
+
+   scanScan the kernel (savesg raw results to file and runs `format`).
+   format  Parse results file and format output.
+
 Options:
 
-   -d, --debugDisplay debugging output.
-   -h, --help, --version  Display this help and exit.
+   -o, --output=  Raw results output file, used for later 
formatting.
+   --suppress-dmesg Do not show dmesg results.
+   --squash-by-path Show one result per unique path.
+   --rawShow raw results.
+   -d, --debug  Display debugging output.
+   -h, --help, --versionDisplay this help and exit.
 
 Scans the running (64 bit) kernel for potential leaking addresses.
-
 EOM
exit($exitcode);
 }
 
 GetOptions(
+   'o|output=s'=> \$output,
+   'suppress-dmesg'=> \$suppress_dmesg,
+   'squash-by-path'=> \$squash_by_path,
+   'raw'   => \$raw,
'd|debug'   => \$debug,
'h|help'=> \$help,
'version'   => \$help
@@ -92,8 +110,21 @@ GetOptions(
 
 help(0) if ($help);
 
-parse_dmesg();
-walk(@DIRS);
+my ($command) = @ARGV;
+if (not defined $command) {
+   help(128);
+}
+
+if ($command ne 'scan' and $command ne 'format') {
+   printf "\nUnknown command: %s\n\n", $command;
+   help(128);
+}
+
+if ($command eq 'scan') {
+   scan();
+}
+
+format_output();
 
 exit 0;
 
@@ -102,6 +133,17 @@ sub dprint
printf(STDERR @_) if $debug;
 }
 
+sub scan
+{
+   open (my $fh, '>', "$output") or die "$0: $output: $!\n";
+   select $fh;
+
+   parse_dmesg();
+   walk(@DIRS);
+
+   select STDOUT;
+}
+
 sub is_false_positive
 {
my ($match) = @_;
@@ -120,30 +162,39 @@ sub is_false_positive
return 0;
 }
 
-# True if argument potentially contains a kernel address.
 sub may_leak_address
 {
my ($line) = @_;
+
+   my @addresses = extract_addresses($line);
+   return @addresses > 0;
+}
+
+# Return _all_ non false positive addresses from $line.
+sub extract_addresses
+{
+   my ($line) = @_;
my $address = '\b(0x)?[[:xdigit:]]{12}\b';
+   my (@addresses, @empty);
 
# Signal masks.
if ($line =~ '^SigBlk:' or
$line =~ '^SigCgt:') {
-   return 0;
+   return @empty;
}
 
if ($line =~ '\bKEY=[[:xdigit:]]{14} [[:xdigit:]]{16} 
[[:xdigit:]]{16}\b' or
$line =~ '\b[[:xdigit:]]{14} [[:xdigit:]]{16} [[:xdigit:]]{16}\b') {
-   return 0;
+   return @empty;
}
 
-   while (/($address)/g) {
+   while ($line =~ /($address)/g) {
if (!is_false_positive($1)) {
-   return 1;
+   push @addresses, $1;
}
}
 
-   return 0;
+   return @addresses;
 }
 
 sub parse_dmesg
@@ -203,7 +254,6 @@ sub parse_file
close $fh;
 }
 
-
 # True if we should skip walking this directory.
 sub skip_walk
 {
@@ -236,3 +286,126 @@ sub walk
}
}
 }
+