Re: [PATCH] elfclassify tool

2019-08-13 Thread Mark Wielaard
Hi,

With the dwelf_elf_begin fix now committed I intend to commit the
elfclassify tool as attached. It now treats a NULL return from
[dwelf_]elf_begin always as error. It has the new --library
classification and various fixes that were pointed out during the
review.

Cheers,

Mark>From 3f489b5c7c78df6d52f8982f79c36e9a220e8951 Mon Sep 17 00:00:00 2001
From: Florian Weimer 
Date: Tue, 13 Aug 2019 13:27:15 +0200
Subject: [PATCH] elfclassify: New tool to analyze ELF objects.

Usage: elfclassify [OPTION...] FILE...
Determine the type of an ELF file.

All of the classification options must apply at the same time to a
particular file.  Classification options can be negated using a
"--not-" prefix.

Since modern ELF does not clearly distinguish between programs and
dynamic shared objects, you should normally use either --executable or
--shared to identify the primary purpose of a file.  Only one of the
--shared and --executable checks can pass for a file.

If you want to know whether an ELF object might a program or a shared
library (but could be both), then use --program or --library. Some ELF
files will classify as both a program and a library.

If you just want to know whether an ELF file is loadable (as program
or library) use --loadable.  Note that files that only contain
(separate) debug information (--debug-only) are never --loadable (even
though they might contain program headers).  Linux kernel modules are
also not --loadable (in the normal sense).

Without any of the --print options, the program exits with status 0 if
the requested checks pass for all input files, with 1 if a check fails
for any file, and 2 if there is an environmental issue (such as a file
read error or a memory allocation error).

When printing file names, the program exits with status 0 even if no
file names are printed, and exits with status 2 if there is an
environmental issue.

On usage error (e.g. a bad option was given), the program exits with a
status code larger than 2.

The --quiet or -q oose_filestion suppresses some error warning output,
but doesn't change the exit status.

Classification options
  --core File is an ELF core dump file
  --debug-only   File is a debug only ELF file (separate .debug,
 .dwo or dwz multi-file)
  --elf  File looks like an ELF object or archive/static
 library (default)
  --elf-archive  File is an ELF archive or static library
  --elf-file File is an regular ELF object (not an
 archive/static library)
  --executable   File is (primarily) an ELF program executable (not
 primarily a DSO)
  --library  File is an ELF shared object (DSO) (might also be
 an executable)
  --linux-kernel-module  File is a linux kernel module
  --loadable File is a loadable ELF object (program or shared
 object)
  --program  File is an ELF program executable (might also be a
 DSO)
  --shared   File is (primarily) an ELF shared object (DSO)
 (not primarily an executable)
  --unstripped   File is an ELF file with symbol table or .debug_*
 sections and can be stripped further

Input flags
  -f, --file Only classify regular (not symlink nor special
 device) files
  --no-stdin Do not read files from standard input (default)
  --stdinAlso read file names to process from standard
 input, separated by newlines
  --stdin0   Also read file names to process from standard
 input, separated by ASCII NUL bytes
  -z, --compressed   Try to open compressed files or embedded (kernel)
 ELF images

Output flags
  --matching If printing file names, print matching files
 (default)
  --no-print Do not output file names
  --not-matching If printing file names, print files that do not
 match
  --printOutput names of files, separated by newline
  --print0   Output names of files, separated by ASCII NUL

Additional flags
  -q, --quietSuppress some error output (counterpart to
 --verbose)
  -v, --verbose  Output additional information (can be specified
 multiple times)

  -?, --help Give this help list
  --usageGive a short usage message
  -V, --version  Print program version

Report bugs to https://sourceware.org/bugzilla.

Signed-off-by: Florian Weimer 
Signed-off-by: Mark Wielaard 
---
 src/ChangeLog |9 +-
 src/Makefile.am   |4 +-
 src/elfclassify.c | 1046 +
 tests/ChangeLog   

Re: [PATCH] elfclassify tool

2019-08-13 Thread Mark Wielaard
On Mon, Jul 29, 2019 at 04:38:17PM +0200, Florian Weimer wrote:
> > On Mon, Jul 29, 2019 at 11:16:31AM +0200, Florian Weimer wrote:
> > So the test should be:
> >
> > diff --git a/src/elfclassify.c b/src/elfclassify.c
> > index ebd42c1d5..b17d14d45 100644
> > --- a/src/elfclassify.c
> > +++ b/src/elfclassify.c
> > @@ -862,7 +862,7 @@ process_stdin (int *status)
> >  break;
> >if (ret < 0)
> >  abort ();   /* Cannot happen due to error checks above.  */
> > -  if (delim != '\0' && ret > 0)
> > +  if (delim != '\0' && ret > 0 && buffer[ret - 1] == '\n')
> >  buffer[ret - 1] = '\0';
> >current_path = buffer;
> >process_current_path (status);
> 
> Right.  But now I wonder why ret == 0 can ever happen.  Maybe on
> OpenVMS, which doesn't use in-band signaling for line terminators?

I also couldn't create a situation where ret == 0.
But I still included the change because it feels more robust.

Thanks,

Mark


Re: [PATCH] elfclassify tool

2019-08-12 Thread Florian Weimer
* Mark Wielaard:

> What do you think about this change to dwelf_elf_begin?
> The change would make it possible to detect real errors in the
> elfclassify code, whether elf_begin or dwelf_elf_begin was used. So we
> would not misclassify files (but return an error status of 2).

I'm not really familiar with how these functions are used, sorry,
Viewed in isolation, the changes appear reasonable to me.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-08-11 Thread Mark Wielaard
On Mon, 2019-07-29 at 16:24 +0200, Mark Wielaard wrote:
> On Mon, Jul 29, 2019 at 10:43:56AM +0200, Florian Weimer wrote:
> > * Mark Wielaard:
> > 
> > > +  if (elf == NULL)
> > > +{
> > > +  /* This likely means it just isn't an ELF file, probably not a
> > > +  real issue, but warn if verbose reporting.  */
> > > +  if (verbose > 0)
> > > + fprintf (stderr, "warning: %s: %s\n", current_path, elf_errmsg (-1));
> > > +  return false;
> > > +}
> > 
> > Is it possible to distinguish the error from a memory allocation error?
> > It would be wrong to mis-classify a file just because the system is low
> > on memory.
> 
> You are right this is not the proper way to report the issue.
> Normally, when just using elf_begin, a NULL return should be reported
> through elf_issue (which will set the issues flag).
> 
> But, because I added -z, we are using either elf_begin or
> dwelf_elf_begin. dwelf_elf_begin will return NULL (instead of a an
> empty (ELF_K_NONE) Elf descriptor when there is an issue, or the
> (decompressed) file wasn't an ELF file.
> 
> So we should split the error reporting. If we called elf_begin and get
> NULL we should call elf_issue to report the proper issue.
> 
> If we called dwefl_elf_begin and we get NULL, I am not sure yet what
> the proper way is to detect whether it is a real issue, or "just" not
> a (decompressed) ELF file. I am afraid the current handling is the
> best we can do.
> 
> Maybe we can fix dwelf_elf_begin to return an empty (ELF_K_NONE) Elf
> descriptor if there was no issue, but the (decompressed) file wasn't
> an ELF file.

Sorry this took so long. And this is indeed the last issue holding up
the release. But this is a tricky problem.

We made a mistake when we wrote the contract for dwelf_elf_begin by
saying it would never return ELF_K_NONE. That made it different from
elf_begin and made it impossible to distinguish between a real (file or
decompression) error and whether the file simply wasn't an ELF file and
also wasn't a compressed ELF file.

I think we should fix the contract. Technically it would be an API
break, but I think no user is really relying on the fact that the Elf
handle returned is never ELF_K_NONE. Users still need to distinguish
between ELF_K_ELF and ELF_K_AR (and theoretically any other ELF_K_type,
like COFF, which we currently don't support, but we do define it).

So that is what the attached patch does. I also audited all
decompression code to make sure it returns error codes consistently.
The decompression will either decompress successfully and return
DWFL_E_NOERROR, or if the file wasn't compressed (or an embedded image)
it will return DWFL_E_BADELF. In all other cases (file or decompression
error) it will set a a different DWFL_E error.

This "only" leaves the problem that we don't have a good way to
translate those errors into "real" libelf error codes. So for now we
just fake one if it wasn't an elf_errno value. I don't intent to try to
solve this error translation issue before the release (I don't know how
to do it yet).

What do you think about this change to dwelf_elf_begin?
The change would make it possible to detect real errors in the
elfclassify code, whether elf_begin or dwelf_elf_begin was used. So we
would not misclassify files (but return an error status of 2).

Thanks,

Mark
From 648837a9f1be7628e9ceee6818bf56c80b9d3fa1 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Mon, 12 Aug 2019 00:43:22 +0200
Subject: [PATCH] libdwelf: Make dwelf_elf_begin return NULL only when there is
 an error.

dwelf_elf_begin was slightly different from elf_begin in case the file
turned out to not be an ELF file. elf_begin would return an Elf handle
with ELF_K_NONE. But dwelf_elf_begin would return NULL. This made it
impossible to tell the difference between a file or decompression error
and a (decompressed) file not being an ELF file.

Since dwelf_elf_begin could still return different kinds of ELF files
(ELF_K_ELF or ELF_K_AR - and theoretically ELF_K_COFF) this was not
really useful anyway. So make it so that dwelf_elf_begin always returns
an Elf handle unless there was a real error reading or decompressing
the file. Otherwise return NULL to make clear there was a real error.

Make sure that the decompression function returns DWFL_E_BADELF only
when the file isn't compressed. In which case the Elf handle won't
be replaced and can be returned (as ELF_K_NONE).

Add a new version to dwelf_elf_begin so programs can rely on it
returning NULL only for real errors.

Signed-off-by: Mark Wielaard 
---
 libdw/ChangeLog|  4 
 libdw/libdw.map|  4 
 libdwelf/ChangeLog |  6 ++
 libdwelf/dwelf_elf_begin.c | 12 +++-
 libdwelf/libdwelf.h|  9 ++---
 libdwfl/ChangeLog  |  8 
 libdwfl/gzip.c |  5 +++--
 libdwfl/open.c | 10 +++---
 8 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/libdw/ChangeLog b/libdw/ChangeLog
index 

Re: [PATCH] elfclassify tool

2019-07-29 Thread Florian Weimer
* Mark Wielaard:

>> Regarding the test case, I think if the build target is ELF, it makes
>> sense to check that the elfutils binaries themselves are classified as
>> expected, with the current build flags.  This will detect changes
>> required due to the evolution of the toolchain.
>
> That is what the run-elfclassify-self.sh testcase does I believe.

Ah, I had missed it.  Great, I think this one is covered.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-29 Thread Mark Wielaard
On Mon, Jul 29, 2019 at 11:22:13AM +0200, Florian Weimer wrote:
> * Mark Wielaard:
> 
> > Signed-off-by: Mark Wielaard 
> 
> Does elfutils use DCO?  Then yoy have my signoff as well:
> 
> Signed-off-by: Florian Weimer 

Thanks. Yes, elfutils uses a Developer Certificate of Origin based on
the linux kernel one (but clarified for the elfutils project details -
like some files having multiple licenses at the same time). We do
require all commits to have a Signed-off-by line from developers that
agree with the DCO as explained in the CONTRIBUTING document.

> You should you list yourself as an author somewhere in the commit
> message.

The ChangeLog entries do mention both of us as authors.

> Regarding the test case, I think if the build target is ELF, it makes
> sense to check that the elfutils binaries themselves are classified as
> expected, with the current build flags.  This will detect changes
> required due to the evolution of the toolchain.

That is what the run-elfclassify-self.sh testcase does I believe.
Or do you believe it should be extended?

Thanks,

Mark


Re: [PATCH] elfclassify tool

2019-07-29 Thread Florian Weimer
* Mark Wielaard:

> On Mon, Jul 29, 2019 at 11:16:31AM +0200, Florian Weimer wrote:
>> * Mark Wielaard:
>> 
>> > +/* Called to process standard input if flag_stdin is not no_stdin.  */
>> > +static void
>> > +process_stdin (int *status)
>> > +{
>> > +  char delim;
>> > +  if (flag_stdin == do_stdin0)
>> > +delim = '\0';
>> > +  else
>> > +delim = '\n';
>> > +
>> > +  char *buffer = NULL;
>> > +  size_t buffer_size = 0;
>> > +  while (true)
>> > +{
>> > +  ssize_t ret = getdelim (, _size, delim, stdin);
>> > +  if (ferror (stdin))
>> > +  {
>> > +current_path = NULL;
>> > +issue (errno, N_("reading from standard input"));
>> > +break;
>> > +  }
>> > +  if (feof (stdin))
>> > +break;
>> > +  if (ret < 0)
>> > +abort ();   /* Cannot happen due to error checks above.  
>> > */
>> > +  if (delim != '\0' && ret > 0)
>> > +buffer[ret - 1] = '\0';
>> 
>> I think this can overwrite the last character of the last line if the
>> file does not end with '\n'.
>
> I see.  "The buffer is null-terminated and includes the newline
> character, if one was found."
>
> So the test should be:
>
> diff --git a/src/elfclassify.c b/src/elfclassify.c
> index ebd42c1d5..b17d14d45 100644
> --- a/src/elfclassify.c
> +++ b/src/elfclassify.c
> @@ -862,7 +862,7 @@ process_stdin (int *status)
>  break;
>if (ret < 0)
>  abort ();   /* Cannot happen due to error checks above.  */
> -  if (delim != '\0' && ret > 0)
> +  if (delim != '\0' && ret > 0 && buffer[ret - 1] == '\n')
>  buffer[ret - 1] = '\0';
>current_path = buffer;
>process_current_path (status);

Right.  But now I wonder why ret == 0 can ever happen.  Maybe on
OpenVMS, which doesn't use in-band signaling for line terminators?

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-29 Thread Mark Wielaard
On Mon, Jul 29, 2019 at 11:16:31AM +0200, Florian Weimer wrote:
> * Mark Wielaard:
> 
> > +/* Called to process standard input if flag_stdin is not no_stdin.  */
> > +static void
> > +process_stdin (int *status)
> > +{
> > +  char delim;
> > +  if (flag_stdin == do_stdin0)
> > +delim = '\0';
> > +  else
> > +delim = '\n';
> > +
> > +  char *buffer = NULL;
> > +  size_t buffer_size = 0;
> > +  while (true)
> > +{
> > +  ssize_t ret = getdelim (, _size, delim, stdin);
> > +  if (ferror (stdin))
> > +   {
> > + current_path = NULL;
> > + issue (errno, N_("reading from standard input"));
> > + break;
> > +   }
> > +  if (feof (stdin))
> > +break;
> > +  if (ret < 0)
> > +abort ();   /* Cannot happen due to error checks above.  */
> > +  if (delim != '\0' && ret > 0)
> > +buffer[ret - 1] = '\0';
> 
> I think this can overwrite the last character of the last line if the
> file does not end with '\n'.

I see.  "The buffer is null-terminated and includes the newline
character, if one was found."

So the test should be:

diff --git a/src/elfclassify.c b/src/elfclassify.c
index ebd42c1d5..b17d14d45 100644
--- a/src/elfclassify.c
+++ b/src/elfclassify.c
@@ -862,7 +862,7 @@ process_stdin (int *status)
 break;
   if (ret < 0)
 abort ();   /* Cannot happen due to error checks above.  */
-  if (delim != '\0' && ret > 0)
+  if (delim != '\0' && ret > 0 && buffer[ret - 1] == '\n')
 buffer[ret - 1] = '\0';
   current_path = buffer;
   process_current_path (status);

Thanks,

Mark


Re: [PATCH] elfclassify tool

2019-07-29 Thread Mark Wielaard
On Mon, Jul 29, 2019 at 10:43:56AM +0200, Florian Weimer wrote:
> * Mark Wielaard:
> 
> > +  if (elf == NULL)
> > +{
> > +  /* This likely means it just isn't an ELF file, probably not a
> > +real issue, but warn if verbose reporting.  */
> > +  if (verbose > 0)
> > +   fprintf (stderr, "warning: %s: %s\n", current_path, elf_errmsg (-1));
> > +  return false;
> > +}
> 
> Is it possible to distinguish the error from a memory allocation error?
> It would be wrong to mis-classify a file just because the system is low
> on memory.

You are right this is not the proper way to report the issue.
Normally, when just using elf_begin, a NULL return should be reported
through elf_issue (which will set the issues flag).

But, because I added -z, we are using either elf_begin or
dwelf_elf_begin. dwelf_elf_begin will return NULL (instead of a an
empty (ELF_K_NONE) Elf descriptor when there is an issue, or the
(decompressed) file wasn't an ELF file.

So we should split the error reporting. If we called elf_begin and get
NULL we should call elf_issue to report the proper issue.

If we called dwefl_elf_begin and we get NULL, I am not sure yet what
the proper way is to detect whether it is a real issue, or "just" not
a (decompressed) ELF file. I am afraid the current handling is the
best we can do.

Maybe we can fix dwelf_elf_begin to return an empty (ELF_K_NONE) Elf
descriptor if there was no issue, but the (decompressed) file wasn't
an ELF file.

Cheers,

Mark


Re: [PATCH] elfclassify tool

2019-07-29 Thread Florian Weimer
* Mark Wielaard:

> Signed-off-by: Mark Wielaard 

Does elfutils use DCO?  Then yoy have my signoff as well:

Signed-off-by: Florian Weimer 

You should you list yourself as an author somewhere in the commit
message.  This is so much more than what I wrote.

Regarding the test case, I think if the build target is ELF, it makes
sense to check that the elfutils binaries themselves are classified as
expected, with the current build flags.  This will detect changes
required due to the evolution of the toolchain.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-29 Thread Florian Weimer
* Mark Wielaard:

> +/* Called to process standard input if flag_stdin is not no_stdin.  */
> +static void
> +process_stdin (int *status)
> +{
> +  char delim;
> +  if (flag_stdin == do_stdin0)
> +delim = '\0';
> +  else
> +delim = '\n';
> +
> +  char *buffer = NULL;
> +  size_t buffer_size = 0;
> +  while (true)
> +{
> +  ssize_t ret = getdelim (, _size, delim, stdin);
> +  if (ferror (stdin))
> + {
> +   current_path = NULL;
> +   issue (errno, N_("reading from standard input"));
> +   break;
> + }
> +  if (feof (stdin))
> +break;
> +  if (ret < 0)
> +abort ();   /* Cannot happen due to error checks above.  */
> +  if (delim != '\0' && ret > 0)
> +buffer[ret - 1] = '\0';

I think this can overwrite the last character of the last line if the
file does not end with '\n'.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-29 Thread Florian Weimer
* Mark Wielaard:

> +  if (elf == NULL)
> +{
> +  /* This likely means it just isn't an ELF file, probably not a
> +  real issue, but warn if verbose reporting.  */
> +  if (verbose > 0)
> + fprintf (stderr, "warning: %s: %s\n", current_path, elf_errmsg (-1));
> +  return false;
> +}

Is it possible to distinguish the error from a memory allocation error?
It would be wrong to mis-classify a file just because the system is low
on memory.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-27 Thread Mark Wielaard
On Sat, Jul 27, 2019 at 02:04:48AM +0300, Dmitry V. Levin wrote:
> On Sat, Jul 20, 2019 at 11:51:16PM +0200, Mark Wielaard wrote:
> > On Sat, Jul 20, 2019 at 01:57:27AM +0300, Dmitry V. Levin wrote:
> [...]
> > > btw, I think it would be appropriate to move the has_dynamic check before
> > > the first check in is_shared that returns true.
> > 
> > Yes, that is probably fine, but does it really matter?
> 
> It doesn't matter unless the file has DT_SONAME but doesn't have PT_DYNAMIC.
> 
> If /lib64/ld-linux-x86-64.so.2 --verify doesn't like files without
> PT_DYNAMIC, elfclassify --shared shouldn't classify them as DSOs, too.

Yes, I see how theoretically that is "more correct".  But if the file
doesn't have PT_DYNAMIC then it cannot have a DT_SONAME.  And there
are no other checks that return true. So in practice there is no
difference. Still, if it looks more correct, then lets just swap the
checks.

diff --git a/src/elfclassify.c b/src/elfclassify.c
index 03655aea..0b1bb63a 100644
--- a/src/elfclassify.c
+++ b/src/elfclassify.c
@@ -498,6 +498,11 @@ is_shared (void)
   if (elf_type == ET_EXEC)
 return false;
 
+  /* If there is no dynamic section, the file cannot be loaded as a
+ shared object.  */
+  if (!has_dynamic)
+return false;
+
   /* If the object is marked as PIE, it is definitely an executable,
  and not a loadlable shared object.  */
   if (has_pie_flag)
@@ -526,10 +531,6 @@ is_shared (void)
   if (has_dt_debug)
 return false;
 
-  /* If there is no dynamic section, the file cannot be loaded as a
- shared object.  */
-  if (!has_dynamic)
-return false;
   return true;
 }
 
Thanks,

Mark


Re: [PATCH] elfclassify tool

2019-07-26 Thread Dmitry V. Levin
On Sat, Jul 20, 2019 at 11:51:16PM +0200, Mark Wielaard wrote:
> On Sat, Jul 20, 2019 at 01:57:27AM +0300, Dmitry V. Levin wrote:
[...]
> > btw, I think it would be appropriate to move the has_dynamic check before
> > the first check in is_shared that returns true.
> 
> Yes, that is probably fine, but does it really matter?

It doesn't matter unless the file has DT_SONAME but doesn't have PT_DYNAMIC.

If /lib64/ld-linux-x86-64.so.2 --verify doesn't like files without
PT_DYNAMIC, elfclassify --shared shouldn't classify them as DSOs, too.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-26 Thread Mark Wielaard
Hi,

On Mon, Jul 22, 2019 at 05:54:57PM +0200, Florian Weimer wrote:
> I went through these patches, albeit in a somewhat cursory fashion, and
> they look okay to me.
> 
> Do you think this is enough to port over RPM's find-debuginfo.sh?

Yes, I think this would make it possible to drop reliance on 'file'
and replace it with more precise eu-elfclassify classifications.

I squashed your and mine commits together for a final commit as
attached. Could you look over it and see if it looks OK?  If so, I
would like to add your Signed-off-by if you agree with the Developer's
Certificate of Origin as expressed in our CONTRIBUTING file.

Thanks,

Mark
>From 990f0cf28d2bc50837172831f7b3c2bafe272265 Mon Sep 17 00:00:00 2001
From: Florian Weimer 
Date: Thu, 11 Apr 2019 18:07:20 +0200
Subject: [PATCH] elfclassify: New tool to analyze ELF objects.

Usage: eu-elfclassify [OPTION...] FILE...
Determine the type of an ELF file.

All of the classification options must apply at the same time to a
particular file.  Classification options can be negated using a
"--not-" prefix.

Since modern ELF does not clearly distinguish between programs and
dynamic shared objects, you should normally use either --executable or
--shared to identify the primary purpose of a file.  Only one of the
--shared and --executable checks can pass for a file.

If you want to know whether an ELF object might a program or a shared
library (but could be both), then use --program or --library. Some ELF
files will classify as both a program and a library.

If you just want to know whether an ELF file is loadable (as program
or library) use --loadable.  Note that files that only contain
(separate) debug information (--debug-only) are never --loadable (even
though they might contain program headers).  Linux kernel modules are
also not --loadable (in the normal sense).

Without any of the --print options, the program exits with status 0 if
the requested checks pass for all input files, with 1 if a check fails
for any file, and 2 if there is an environmental issue (such as a file
read error or a memory allocation error).

When printing file names, the program exits with status 0 even if no
file names are printed, and exits with status 2 if there is an
environmental issue.

On usage error (e.g. a bad option was given), the program exits with a
status code larger than 2.

The --quiet or -q option suppresses some error warning output, but
doesn't change the exit status.

 Classification options
  --core File is an ELF core dump file
  --debug-only   File is a debug only ELF file (separate .debug,
 .dwo or dwz multi-file)
  --elf  File looks like an ELF object or archive/static
 library (default)
  --elf-archive  File is an ELF archive or static library
  --elf-file File is an regular ELF object (not an
 archive/static library)
  --executable   File is (primarily) an ELF program executable (not
 primarily a DSO)
  --library  File is an ELF shared object (DSO) (might also be
 an executable)
  --linux-kernel-module  File is a linux kernel module
  --loadable File is a loadable ELF object (program or shared
 object)
  --program  File is an ELF program executable (might also be a
 DSO)
  --shared   File is (primarily) an ELF shared object (DSO)
 (not primarily an executable)
  --unstripped   File is an ELF file with symbol table or .debug_*
 sections and can be stripped further

 Input flags
  -f, --file Only classify regular (not symlink nor special
 device) files
  --no-stdin Do not read files from standard input (default)
  --stdinAlso read file names to process from standard
 input, separated by newlines
  --stdin0   Also read file names to process from standard
 input, separated by ASCII NUL bytes
  -z, --compressed   Try to open compressed files or embedded (kernel)
 ELF images

 Output flags
  --matching If printing file names, print matching files
 (default)
  --no-print Do not output file names
  --not-matching If printing file names, print files that do not
 match
  --printOutput names of files, separated by newline
  --print0   Output names of files, separated by ASCII NUL

 Additional flags
  -q, --quietSuppress some error output (counterpart to
 --verbose)
  -v, --verbose  Output additional information (can be specified
 multiple times)

  -?, --help Give this help list
  --usage  

Re: [PATCH] elfclassify tool

2019-07-22 Thread Florian Weimer
* Mark Wielaard:

> On Thu, 2019-04-18 at 13:17 +0200, Florian Weimer wrote:
>> * Florian Weimer:
>> 
>> > > BTW. Florian, the extra options are certainly not required for you to
>> > > implement to get eu-elfclassify accepted. They are just suggestions,
>> > > which we might decide not to do/add. Or they can be added by others if
>> > > they think they are useful.
>> > 
>> > Understood.  I would rather fix the command line syntax as a priority,
>> > implement --unstripped, and add a test suite.
>> 
>> The patch below, also available here:
>> 
>>   
>> 
>> reworks the command line parser, implements filtering of file lists, and
>> adds the --unstripped option.
>
> That looks really good. I went ahead an fixed a couple of nits and
> added some of my suggestions. I'll respond to your other email
> explaining some of my reasoning. The changes I made are:

Wow, this is much more than I expected.  Thanks!

>   elfclassify: Fix bre -> be typo in "unstripped" option help text.
>   elfclassify: When reading stdin make sure paths don't include newline.
>   elfclassify: Allow inspecting compressed or (kernel) image files with -z.
>   elfclassify: Always clean up ELF file and descriptor if one is still open.
>   elfclassify: Don't treat errors in elf_open or run_classify as fatal.
>   elfclassify: Add --quiet/-q to suppress error messages.
>   elfclassify: Add \n to fputs debug output.
>   elfclassify: Add --file/-f for testing just regular files.
>   elfclassify: Require --elf by default. Add more classifications.
>   elfclassify: Add elf_kind and elf_type strings for verbose output.
>   elfclassify: Require PT_LOAD for loadable classification.
>   elfclassify: Add --program classification.
>   elfclassify: Don't use ARGP_NO_EXIT and document exit code expectation.
>   elfclassify: Add --linux-kernel-module classification.
>   elfclassify: Add --debug-only classification.

I went through these patches, albeit in a somewhat cursory fashion, and
they look okay to me.

Do you think this is enough to port over RPM's find-debuginfo.sh?

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-20 Thread Mark Wielaard
On Sat, Jul 20, 2019 at 01:57:27AM +0300, Dmitry V. Levin wrote:
> On Fri, Jul 19, 2019 at 11:36:53PM +0200, Mark Wielaard wrote:
> > > If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
> > > could we come up with a more reliable heuristics than DT_SONAME and 
> > > PT_INTERP?
> > 
> > Why do you feel it is unreliable? Do you have any examples of files
> > misidentified?
> 
> No, I don't have such examples because most (if not all) ET_DYN
> non-DF_1_PIE objects we have nowadays are actually libraries regardless
> of their DT_SONAME or PT_INTERP.
> 
> What actually disqualifies these objects from being libraries, besides
> missing PT_DYNAMIC?

The goal here was to have a classification specifically to handle things
like glibc or Qt executable.so shared libraries. So that --shared would
make them as shared libraries primarily (and so --executable would not
mark them as programs). This is because for shared libraries you might
want to do some processing you don't want for executable (or the other
way around). Think about deciding whether or not to keep the .symtab.

Maybe you are looking for another goal/classification?  For example I
added --program which does classify those special files as programs
(even though --shared also says they are shared libraries). Maybe you
are looking for a different classification similar/dual to that. Say
--library?

> The only reason why I feel uncomfortable to rely on this has_soname check
> is that DT_SONAME is so easily added unnoticed by mistake.

But it isn't used in isolation to mark something as --shared.

> btw, I think it would be appropriate to move the has_dynamic check before
> the first check in is_shared that returns true.

Yes, that is probably fine, but does it really matter?
Florian, what do you think?

Cheers,

Mark


Re: [PATCH] elfclassify tool

2019-07-20 Thread Mark Wielaard
On Fri, Jul 19, 2019 at 02:47:09PM +0200, Mark Wielaard wrote:
> The individual commits can be found here:
> https://code.wildebeest.org/git/user/mjw/elfutils/log/?h=elfclassify
> 
> Please let me know if any of this looks bad or unusual.
> 
> I'll write some testcases.

And as always when writing tests you find some corner cases :)
So the attached is some testcases plus some fixes:
- Handle files without sections (only phdrs), in particular handle
  PT_DYNAMIC directly (not through a section).
- is_loadable should check whether it isn't a debug-only file
  (but only if there are section headers, if not, it cannot be debug-only).
- For .debug_only we can also have just a .symtab (but no .debug sections).
  And any allocated section makes it not-debug-only (except for SHT_NOBITS
  or SHT_NOTE)
- Detect .zdebug sections too (old GNU compressed ELF sections).

Cheers,

Mark
>From b389a9343193adb357500ccf018ff56495641697 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Sat, 20 Jul 2019 23:30:35 +0200
Subject: [PATCH] elfclassify: Add elfclassify tests and fix issues found.

---
 src/elfclassify.c | 113 +++-
 tests/Makefile.am |   7 +-
 tests/run-elfclassify-self.sh |  36 
 tests/run-elfclassify.sh  | 327 ++
 4 files changed, 432 insertions(+), 51 deletions(-)
 create mode 100755 tests/run-elfclassify-self.sh
 create mode 100755 tests/run-elfclassify.sh

diff --git a/src/elfclassify.c b/src/elfclassify.c
index 83a97d47c..1df0789d2 100644
--- a/src/elfclassify.c
+++ b/src/elfclassify.c
@@ -202,7 +202,8 @@ elf_type_string (int type)
 
 static int elf_type;
 static bool has_program_load;
-static bool has_progbits_alloc;
+static bool has_sections;
+static bool has_bits_alloc;
 static bool has_program_interpreter;
 static bool has_dynamic;
 static bool has_soname;
@@ -219,7 +220,8 @@ run_classify (void)
   /* Reset to unanalyzed default.  */
   elf_type = 0;
   has_program_load = false;
-  has_progbits_alloc = false;
+  has_sections = false;
+  has_bits_alloc = false;
   has_program_interpreter = false;
   has_dynamic = false;
   has_soname = false;
@@ -247,6 +249,7 @@ run_classify (void)
   elf_type = ehdr->e_type;
 
   /* Examine program headers.  */
+  GElf_Phdr dyn_seg = {};
   {
 size_t nphdrs;
 if (elf_getphdrnum (elf, ) != 0)
@@ -264,7 +267,10 @@ run_classify (void)
 	return false;
 	  }
 	if (phdr->p_type == PT_DYNAMIC)
-	  has_dynamic = true;
+	  {
+	dyn_seg = *phdr;
+	has_dynamic = true;
+	  }
 	if (phdr->p_type == PT_INTERP)
 	  has_program_interpreter = true;
 	if (phdr->p_type == PT_LOAD)
@@ -272,7 +278,18 @@ run_classify (void)
   }
   }
 
-  Elf_Scn *dyn_section = NULL;
+  /* Do we have sections?  */
+  {
+size_t nshdrs;
+if (elf_getshdrnum (elf, ) != 0)
+  {
+	elf_issue (N_("section headers"));
+	return false;
+  }
+if (nshdrs > 0)
+  has_sections = true;
+  }
+
   {
 size_t shstrndx;
 if (unlikely (elf_getshdrstrndx (elf, ) < 0))
@@ -303,26 +320,28 @@ run_classify (void)
 if (verbose > 2)
   fprintf (stderr, "debug: section header %s (type %d) found\n",
section_name, shdr->sh_type);
-if (shdr->sh_type == SHT_DYNAMIC)
-  {
-if (verbose > 1)
-  fputs ("debug: dynamic section found\n", stderr);
-dyn_section = scn;
-  }
 if (shdr->sh_type == SHT_SYMTAB)
   {
 if (verbose > 1)
   fputs ("debug: symtab section found\n", stderr);
 has_symtab = true;
   }
-	if (shdr->sh_type == SHT_PROGBITS && (shdr->sh_flags & SHF_ALLOC) != 0)
+	/* NOBITS and NOTE sections can be in any file.  We want to be
+	   sure there is at least one other allocated section.  */
+	if (shdr->sh_type != SHT_NOBITS
+	&& shdr->sh_type != SHT_NOTE
+	&& (shdr->sh_flags & SHF_ALLOC) != 0)
 	  {
-	if (verbose > 1 && !has_progbits_alloc)
-	  fputs ("debug: allocated PROGBITS section found\n", stderr);
-	has_progbits_alloc = true;
+	if (verbose > 1 && !has_bits_alloc)
+	  fputs ("debug: allocated (non-nobits/note) section found\n",
+		 stderr);
+	has_bits_alloc = true;
 	  }
 const char *debug_prefix = ".debug_";
-if (strncmp (section_name, debug_prefix, strlen (debug_prefix)) == 0)
+const char *zdebug_prefix = ".zdebug_";
+if (strncmp (section_name, debug_prefix, strlen (debug_prefix)) == 0
+	|| strncmp (section_name, zdebug_prefix,
+			strlen (zdebug_prefix)) == 0)
   {
 if (verbose > 1 && !has_debug_sections)
   fputs ("debug: .debug_* section found\n", stderr);
@@ -347,34 +366,29 @@ run_classify (void)
   /* Examine the dynamic section.  */
   if (has_dynamic)
 {
-  if (dyn_section != NULL)
-{
-  Elf_Data *data = elf_getdata (dyn_section, NULL);
-  if (verbose > 2)
-fprintf (stderr, "debug: Elf_Data for dynamic 

Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 11:36:53PM +0200, Mark Wielaard wrote:
> On Sat, Jul 20, 2019 at 12:23:08AM +0300, Dmitry V. Levin wrote:
> > On Fri, Jul 19, 2019 at 11:00:49PM +0200, Florian Weimer wrote:
> > > * Dmitry V. Levin:
> > > 
> > > >> So, I don't think the code is wrong. We might want to tweak the comment
> > > >> a bit though, to make it less definitive?
> > > >
> > > > What I'm saying is that has_soname is just a hint which is probably even
> > > > less reliable than has_program_interpreter.
> > > 
> > > If I recall correctly, I added the soname check to classify
> > > /lib64/libc.so.6 as a library, not an executable.  So it didn't come
> > > completely out of nowhere.
> > 
> > Well, /lib64/libc.so.6 is not just a library, it's also a valid executable.
> > 
> > If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
> > could we come up with a more reliable heuristics than DT_SONAME and 
> > PT_INTERP?
> 
> Why do you feel it is unreliable? Do you have any examples of files
> misidentified?

No, I don't have such examples because most (if not all) ET_DYN
non-DF_1_PIE objects we have nowadays are actually libraries regardless
of their DT_SONAME or PT_INTERP.

What actually disqualifies these objects from being libraries, besides
missing PT_DYNAMIC?

The only reason why I feel uncomfortable to rely on this has_soname check
is that DT_SONAME is so easily added unnoticed by mistake.

btw, I think it would be appropriate to move the has_dynamic check before
the first check in is_shared that returns true.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark Wielaard
On Sat, Jul 20, 2019 at 12:23:08AM +0300, Dmitry V. Levin wrote:
> On Fri, Jul 19, 2019 at 11:00:49PM +0200, Florian Weimer wrote:
> > * Dmitry V. Levin:
> > 
> > >> So, I don't think the code is wrong. We might want to tweak the comment
> > >> a bit though, to make it less definitive?
> > >
> > > What I'm saying is that has_soname is just a hint which is probably even
> > > less reliable than has_program_interpreter.
> > 
> > If I recall correctly, I added the soname check to classify
> > /lib64/libc.so.6 as a library, not an executable.  So it didn't come
> > completely out of nowhere.
> 
> Well, /lib64/libc.so.6 is not just a library, it's also a valid executable.
> 
> If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
> could we come up with a more reliable heuristics than DT_SONAME and PT_INTERP?

Why do you feel it is unreliable? Do you have any examples of files
misidentified? I tested a bit and --shared seems to correctly
indentify all shared libraries. I did add --program as a counterpart
to --executable if you really want to identify such "libraries" as
programs. But in general it looks like --shared and --executable come
up with the correct classification.

The only two examples I could find were the glibc and Qt binaries
which have "dual use" library/executables. And I believe --shared
corrrectly identifies them as primarily shared libraries.

Cheers,

Mark


Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 11:00:49PM +0200, Florian Weimer wrote:
> * Dmitry V. Levin:
> 
> >> So, I don't think the code is wrong. We might want to tweak the comment
> >> a bit though, to make it less definitive?
> >
> > What I'm saying is that has_soname is just a hint which is probably even
> > less reliable than has_program_interpreter.
> 
> If I recall correctly, I added the soname check to classify
> /lib64/libc.so.6 as a library, not an executable.  So it didn't come
> completely out of nowhere.

Well, /lib64/libc.so.6 is not just a library, it's also a valid executable.

If the ELF type is ET_DYN and the object is not marked as DF_1_PIE,
could we come up with a more reliable heuristics than DT_SONAME and PT_INTERP?


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Florian Weimer
* Dmitry V. Levin:

>> So, I don't think the code is wrong. We might want to tweak the comment
>> a bit though, to make it less definitive?
>
> What I'm saying is that has_soname is just a hint which is probably even
> less reliable than has_program_interpreter.

If I recall correctly, I added the soname check to classify
/lib64/libc.so.6 as a library, not an executable.  So it didn't come
completely out of nowhere.

Thanks,
Florian


Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 04:21:53PM +0200, Mark wrote:
> On Fri, 2019-07-19 at 16:43 +0300, Dmitry V. Levin wrote:
> > On Fri, Jul 19, 2019 at 02:47:09PM +0200, Mark Wielaard wrote:
> > [...]
> > > +static bool
> > > +is_shared (void)
> > > +{
> > > +  if (!is_loadable ())
> > > +return false;
> > > +
> > > +  /* The ELF type is very clear: this is an executable.  */
> > > +  if (elf_type == ET_EXEC)
> > > +return false;
> > > +
> > > +  /* If the object is marked as PIE, it is definitely an
> > > executable,
> > > + and not a loadlable shared object.  */
> > > +  if (has_pie_flag)
> > > +return false;
> > > +
> > > +  /* Treat a DT_SONAME tag as a strong indicator that this is a
> > > shared
> > > + object.  */
> > > +  if (has_soname)
> > > +return true;
> > 
> > I'm not sure DT_SONAME is a reliable indicator.
> > 
> > I've seen many cases of DT_SONAME being erroneously applied to 
> > non-libraries, e.g. lib.so was used as soname in openjdk executables.
> 
> I didn't know. Is this really common?

I don't think it is very common, but the mistake is very easy to make
(-Wl,-soname,lib.so) and it doesn't really break anything.  Apparently,
some projects apply the same linker flags that add DT_SONAME to all
generated files.

> I did find one java binary on my system that indeed has this problem.
> $ eu-readelf -d /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-
> 0.el7_6.x86_64/jre/bin/policytool
> 
> Dynamic segment contains 39 entries:
>  Addr: 0x00600d88  Offset: 0x000d88  Link to section: [ 7]
> '.dynstr'
>   Type  Value
>   NEEDEDShared library: [libpthread.so.0]
>   NEEDEDShared library: [libz.so.1]
>   NEEDEDShared library: [libX11.so.6]
>   NEEDEDShared library: [libjli.so]
>   NEEDEDShared library: [libdl.so.2]
>   NEEDEDShared library: [libc.so.6]
>   SONAMELibrary soname: [lib.so]
>   RPATH Library rpath:
> [$ORIGIN/../lib/amd64/jli:$ORIGIN/../lib/amd64]
> [...]
> 
> But even so eu-elfclassify still doesn't treat it as a shared library,
> because:
> $ eu-elfclassify -v --shared policytool; echo $?
> info: policytool: ELF kind: ELF_K_ELF (0x3)
> info: policytool: ELF type: ET_EXEC (0x2)
> info: policytool: PT_LOAD found
> info: policytool: allocated PROGBITS section found
> info: policytool: program interpreter found
> info: policytool: dynamic segment found
> info: policytool: soname found
> info: policytool: DT_DEBUG found
> 1
> 
> So other characteristics like it being ET_EXEC mark it as an
> executable. And I assume if it was PIE (ET_DYN) the PIE DT_FLAGS would
> have caught it.

Yes, the checks above has_soname are much more definitive.

> So, I don't think the code is wrong. We might want to tweak the comment
> a bit though, to make it less definitive?

What I'm saying is that has_soname is just a hint which is probably even
less reliable than has_program_interpreter.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Dmitry V. Levin
On Fri, Jul 19, 2019 at 02:47:09PM +0200, Mark Wielaard wrote:
[...]
> +static bool
> +is_shared (void)
> +{
> +  if (!is_loadable ())
> +return false;
> +
> +  /* The ELF type is very clear: this is an executable.  */
> +  if (elf_type == ET_EXEC)
> +return false;
> +
> +  /* If the object is marked as PIE, it is definitely an executable,
> + and not a loadlable shared object.  */
> +  if (has_pie_flag)
> +return false;
> +
> +  /* Treat a DT_SONAME tag as a strong indicator that this is a shared
> + object.  */
> +  if (has_soname)
> +return true;

I'm not sure DT_SONAME is a reliable indicator.

I've seen many cases of DT_SONAME being erroneously applied to 
non-libraries, e.g. lib.so was used as soname in openjdk executables.


-- 
ldv


signature.asc
Description: PGP signature


Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark Wielaard
Hi,

Some answers to this older discussion to explain some of my recent
commits suggested for elfclassify.

On Tue, 2019-04-16 at 13:38 +0200, Florian Weimer wrote:
> * Mark Wielaard:
> > --elf PATH return 0 whenever the file can be opened and a minimal ELF
> > header can be read (it might not be a completely valid ELF file). Do we
> > want or need to do any more verification (e.g. try to get the full ELF
> > header, walk through all phdrs and shdrs)?
> If we ever need that, I think we should add it as separate options,
> detecting both separate debuginfo and regular ELF files.

You are probably right that a real verification isn't the task of
elfclassify. We already have elflint. But I did make --elf required by
default (with as check simply that libelf could open the file without
marking it as ELF_K_NONE). The user can disable it again with --not-
elf. It felt slightly odd to not have any filter active by default.

I did add two "sub" classifications --elf-file and --elf-archive
because I believe people often make a distinction between "normal" ELF
files and ELF archives (containers of ELF objects).

I also added detection of --debug-only ELF file using the heuristic
that Frank also came up with for the dbgserver (contains .debug_*
sections, but no allocated SHT_PROGBIT sections).

> > --unstripped (not yet implemented) would be a classification that
> > indicates whether the ELF file can be stripped (further), that is has a
> > .symtab (symbol table), .debug_* sections (and possibly any non-
> > loadable sections -- "file" only detects the first two).
> 
> Some non-allocated sections are expected in stripped binaries:
> .gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this
> context.  I'm not sure if we should flag any other non-allocated section
> in this way.

I don't think those sections need special flagging.

> > I am not sure --file=PATH is a useful option.
> 
> It's useful for determining if the file exists and can be mapped.
> 
> > But maybe we need some way to indicate whether a file is a real file or
> > a symlink? But the current implementation returns 0 even for symlinks.
> > As do every other option (if the file is a symlink to an ELF file of
> > the requested classification). Is this what we want? I would suggest
> > that we return 1 for anything that is not a regular file. But that
> > would mean that for example eu-elfclassify --executable=/proc/$$/exe
> > would also return 1 (currently it returns 0, which might be helpful in
> > some cases).
> 
> I don't know what RPM needs in this context.  I expect that it can
> easily filter out non-regular files.  My problem with symbolic link
> detection is that it is so inconsistent—it generally applies to the
> final pathname component, and that does not look useful to me.

Since --file was available again I reused it to indicate that the final
pathname component should be a regular file (not a symlink or special
device). I think that is useful in general. But you are right there are
other tools to filter out symlinks/special device files. It was useful
for quick and dirty testing though.

> > --loadable basically checks whether the given ELF file is not an object
> > (ET_REL) file, so it will return 0 for either an executable, a shared
> > object or core file, but not check whether any other attribute (like
> > whether it has program headers and/or loadable segments). Personally I
> > would like it if this at least included a check for a PT_LOAD segment.
> 
> Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible?
> It is possible to have mostly empty objects, after all.

I did add this extra check, because I am a little paranoid about having
to deal with totally broken ELF files. My reasoning was basically that
without a PT_LOAD there is nothing in memory to load/execute.

> > This does not classify kernel modules as loadable objects.
> > rpm does contain a check for that, it might make sense to include that
> > as a separate classification in elfclassify --kernel-module.
> > 
> > Kernel modules are also special because they can be compressed ELF
> > files. Do we want to support that? (It is easy with gelf_elf_begin).
> > That could for example be an flag/option like --compressed which can be
> > combined with any other classification option?
> 
> How relevant are kernel modules to eu-elfclassify?
> 
> Is path-based detection feasible for kernel modules?

Sadly kernel modules often need special treatment that you wouldn't
need for "normal" ET_REL files. path-based detection is only partially
possible (rpm used to use the extension name, but that was fragile). So
I added an option --linux-kernel-module that detects them. I also added
a -z option to try to detect ELF files inside compressed images because
it was easy and because these days kernel modules are often compressed.

> > I think another useful classification would be --debugfile which
> > succeeds if the primary function of the given ELF file is being a
> 

Re: [PATCH] elfclassify tool

2019-07-19 Thread Mark Wielaard
Hi,

Sorry, this took way too long. But I really like this code.

On Thu, 2019-04-18 at 13:17 +0200, Florian Weimer wrote:
> * Florian Weimer:
> 
> > > BTW. Florian, the extra options are certainly not required for you to
> > > implement to get eu-elfclassify accepted. They are just suggestions,
> > > which we might decide not to do/add. Or they can be added by others if
> > > they think they are useful.
> > 
> > Understood.  I would rather fix the command line syntax as a priority,
> > implement --unstripped, and add a test suite.
> 
> The patch below, also available here:
> 
>   
> 
> reworks the command line parser, implements filtering of file lists, and
> adds the --unstripped option.

That looks really good. I went ahead an fixed a couple of nits and
added some of my suggestions. I'll respond to your other email
explaining some of my reasoning. The changes I made are:

  elfclassify: Fix bre -> be typo in "unstripped" option help text.
  elfclassify: When reading stdin make sure paths don't include newline.
  elfclassify: Allow inspecting compressed or (kernel) image files with -z.
  elfclassify: Always clean up ELF file and descriptor if one is still open.
  elfclassify: Don't treat errors in elf_open or run_classify as fatal.
  elfclassify: Add --quiet/-q to suppress error messages.
  elfclassify: Add \n to fputs debug output.
  elfclassify: Add --file/-f for testing just regular files.
  elfclassify: Require --elf by default. Add more classifications.
  elfclassify: Add elf_kind and elf_type strings for verbose output.
  elfclassify: Require PT_LOAD for loadable classification.
  elfclassify: Add --program classification.
  elfclassify: Don't use ARGP_NO_EXIT and document exit code expectation.
  elfclassify: Add --linux-kernel-module classification.
  elfclassify: Add --debug-only classification.

Attached is the new version. The individual commits can be found here:
https://code.wildebeest.org/git/user/mjw/elfutils/log/?h=elfclassify

Please let me know if any of this looks bad or unusual.

I'll write some testcases.

Thanks,

Mark
diff --git a/src/Makefile.am b/src/Makefile.am
index 2b1c0dcb..69ac4dbe 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -26,7 +26,8 @@ AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
 AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw
 
 bin_PROGRAMS = readelf nm size strip elflint findtextrel addr2line \
-	   elfcmp objdump ranlib strings ar unstrip stack elfcompress
+	   elfcmp objdump ranlib strings ar unstrip stack elfcompress \
+	   elfclassify
 
 noinst_LIBRARIES = libar.a
 
@@ -83,6 +84,7 @@ ar_LDADD = libar.a $(libelf) $(libeu) $(argp_LDADD)
 unstrip_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) -ldl
 stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) -ldl $(demanglelib)
 elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
+elfclassify_LDADD = $(libelf) $(libdw) $(libeu) $(argp_LDADD)
 
 installcheck-binPROGRAMS: $(bin_PROGRAMS)
 	bad=0; pid=; list="$(bin_PROGRAMS)"; for p in $$list; do \
diff --git a/src/elfclassify.c b/src/elfclassify.c
new file mode 100644
index ..83a97d47
--- /dev/null
+++ b/src/elfclassify.c
@@ -0,0 +1,989 @@
+/* Classification of ELF files.
+   Copyright (C) 2019 Red Hat, Inc.
+   This file is part of elfutils.
+
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   elfutils is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include ELFUTILS_HEADER(elf)
+#include ELFUTILS_HEADER(dwelf)
+#include "printversion.h"
+
+/* Name and version of program.  */
+ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
+
+/* Bug report address.  */
+ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
+
+/* Set by parse_opt.  */
+static int verbose;
+
+/* Set by the main function.  */
+static const char *current_path;
+
+/* Set by open_file.  */
+static int file_fd = -1;
+
+/* Set by issue or elf_issue.  */
+static bool issue_found;
+
+/* Non-fatal issue occured while processing the current_path.  */
+static void
+issue (int e, const char *msg)
+{
+  if (verbose >= 0)
+{
+  if (current_path == NULL)
+	error (0, e, "%s", msg);
+  else
+	error (0, e, "%s '%s'", msg, current_path);
+}
+  issue_found = true;
+}
+
+/* Non-fatal issue occured 

Re: [PATCH] elfclassify tool

2019-04-18 Thread Florian Weimer
* Florian Weimer:

>> BTW. Florian, the extra options are certainly not required for you to
>> implement to get eu-elfclassify accepted. They are just suggestions,
>> which we might decide not to do/add. Or they can be added by others if
>> they think they are useful.
>
> Understood.  I would rather fix the command line syntax as a priority,
> implement --unstripped, and add a test suite.

The patch below, also available here:

  

reworks the command line parser, implements filtering of file lists, and
adds the --unstripped option.

I assume the next step is to write tests.

Thanks,
Florian

diff --git a/src/Makefile.am b/src/Makefile.am
index 2b1c0dcb..966d1da7 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -26,7 +26,8 @@ AM_CPPFLAGS += -I$(srcdir)/../libelf -I$(srcdir)/../libebl \
 AM_LDFLAGS = -Wl,-rpath-link,../libelf:../libdw
 
 bin_PROGRAMS = readelf nm size strip elflint findtextrel addr2line \
-  elfcmp objdump ranlib strings ar unstrip stack elfcompress
+  elfcmp objdump ranlib strings ar unstrip stack elfcompress \
+  elfclassify
 
 noinst_LIBRARIES = libar.a
 
@@ -83,6 +84,7 @@ ar_LDADD = libar.a $(libelf) $(libeu) $(argp_LDADD)
 unstrip_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) -ldl
 stack_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD) -ldl 
$(demanglelib)
 elfcompress_LDADD = $(libebl) $(libelf) $(libdw) $(libeu) $(argp_LDADD)
+elfclassify_LDADD = $(libelf) $(libeu) $(argp_LDADD)
 
 installcheck-binPROGRAMS: $(bin_PROGRAMS)
bad=0; pid=; list="$(bin_PROGRAMS)"; for p in $$list; do \
diff --git a/src/elfclassify.c b/src/elfclassify.c
new file mode 100644
index ..d4b46b64
--- /dev/null
+++ b/src/elfclassify.c
@@ -0,0 +1,654 @@
+/* Classification of ELF files.
+   Copyright (C) 2019 Red Hat, Inc.
+   This file is part of elfutils.
+
+   This file is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   elfutils is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see .  */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include ELFUTILS_HEADER(elf)
+#include "printversion.h"
+
+/* Name and version of program.  */
+ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
+
+/* Bug report address.  */
+ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
+
+/* Set by parse_opt.  */
+static int verbose;
+
+/* Set by the main function.  */
+static const char *current_path;
+
+/* Set by open_file.  */
+static int file_fd = -1;
+
+static bool
+open_file (void)
+{
+  if (file_fd >= 0)
+{
+  close (file_fd);
+  file_fd = -1;
+}
+
+  if (verbose > 1)
+fprintf (stderr, "debug: processing file: %s\n", current_path);
+
+  file_fd = open (current_path, O_RDONLY);
+  if (file_fd < 0)
+{
+  if (errno == ENOENT)
+{
+  if (verbose > 0)
+fprintf (stderr, N_("warning: %s: file does not exist\n"),
+ current_path);
+  return false;
+}
+  else
+error (2, errno, N_("opening %s"), current_path);
+}
+  struct stat st;
+  if (fstat (file_fd, ) != 0)
+error (2, errno, N_("reading %s\n"), current_path);
+
+  /* Reject directories here because processing those as ELF fails
+ would fail.  */
+  if (!S_ISREG (st.st_mode))
+{
+  if (verbose > 0)
+fprintf (stderr, N_("warning: %s: not a regular file\n"),
+ current_path);
+  return false;
+}
+  return true;
+}
+
+/* Set by open_elf.  */
+static Elf *elf;
+
+static bool
+open_elf (void)
+{
+  if (elf != NULL)
+{
+  elf_end (elf);
+  elf = NULL;
+}
+
+  if (!open_file ())
+return false;
+
+  elf = elf_begin (file_fd, ELF_C_READ, NULL);
+  if (elf == NULL)
+error (2, 0, "%s: %s", current_path, elf_errmsg (-1));
+  if (elf_kind (elf) != ELF_K_ELF && elf_kind (elf) != ELF_K_AR)
+{
+  if (verbose > 0)
+fprintf (stderr, N_("warning: %s: not an ELF file\n"),
+ current_path);
+  return false;
+}
+
+  return true;
+}
+
+static int elf_type;
+static bool has_program_interpreter;
+static bool has_dynamic;
+static bool has_soname;
+static bool has_pie_flag;
+static bool has_dt_debug;
+static bool has_symtab;
+static bool has_debug_sections;
+
+static void
+run_classify (void)
+{
+  /* Reset to unanalyzed default.  */
+  elf_type = 0;
+  

Re: [PATCH] elfclassify tool

2019-04-16 Thread Florian Weimer
* Mark Wielaard:

>> I still need to implement an --unstripped option and fix the
>> iteration over the dynamic section.
>
> We did already discuss some of this off-list.

Thanks for summarizing the previous discussion.

> --elf PATH return 0 whenever the file can be opened and a minimal ELF
> header can be read (it might not be a completely valid ELF file). Do we
> want or need to do any more verification (e.g. try to get the full ELF
> header, walk through all phdrs and shdrs)?

If we ever need that, I think we should add it as separate options,
detecting both separate debuginfo and regular ELF files.

> --unstripped (not yet implemented) would be a classification that
> indicates whether the ELF file can be stripped (further), that is has a
> .symtab (symbol table), .debug_* sections (and possibly any non-
> loadable sections -- "file" only detects the first two).

Some non-allocated sections are expected in stripped binaries:
.gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this
context.  I'm not sure if we should flag any other non-allocated section
in this way.

> I am not sure --file=PATH is a useful option.

It's useful for determining if the file exists and can be mapped.

> But maybe we need some way to indicate whether a file is a real file or
> a symlink? But the current implementation returns 0 even for symlinks.
> As do every other option (if the file is a symlink to an ELF file of
> the requested classification). Is this what we want? I would suggest
> that we return 1 for anything that is not a regular file. But that
> would mean that for example eu-elfclassify --executable=/proc/$$/exe
> would also return 1 (currently it returns 0, which might be helpful in
> some cases).

I don't know what RPM needs in this context.  I expect that it can
easily filter out non-regular files.  My problem with symbolic link
detection is that it is so inconsistent—it generally applies to the
final pathname component, and that does not look useful to me.

> --loadable basically checks whether the given ELF file is not an object
> (ET_REL) file, so it will return 0 for either an executable, a shared
> object or core file, but not check whether any other attribute (like
> whether it has program headers and/or loadable segments). Personally I
> would like it if this at least included a check for a PT_LOAD segment.

Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible?
It is possible to have mostly empty objects, after all.

> This does not classify kernel modules as loadable objects.
> rpm does contain a check for that, it might make sense to include that
> as a separate classification in elfclassify --kernel-module.
>
> Kernel modules are also special because they can be compressed ELF
> files. Do we want to support that? (It is easy with gelf_elf_begin).
> That could for example be an flag/option like --compressed which can be
> combined with any other classification option?

How relevant are kernel modules to eu-elfclassify?

Is path-based detection feasible for kernel modules?

> I think another useful classification would be --debugfile which
> succeeds if the primary function of the given ELF file is being a
> separete debug file (basically .debug, .dwo or dwz .multi file) which
> cannot be linked and loaded on its own

That is very difficult to detect reliably, unfortunately, and would best
be implemented in lib(g)elf itself because it would be generally useful,
for all kinds of tools which expect to process real ELF files only.

> BTW. Florian, the extra options are certainly not required for you to
> implement to get eu-elfclassify accepted. They are just suggestions,
> which we might decide not to do/add. Or they can be added by others if
> they think they are useful.

Understood.  I would rather fix the command line syntax as a priority,
implement --unstripped, and add a test suite.

>> Suggestions for improving the argp/help output are welcome as
>> well.  I'm not familiar with argp at all.
>
> You usage of argp seems fine.

Trust me, it's been a struggle so far.

> But I think you don't want to use
> ARGP_NO_EXIT. That causes standard options like --version and --help to
> not exit (with success). Which is generally what we want.
> We do want to want --version and --help to not return an error
> indicator (this is actually checked by make distcheck).

I want to exit with status 2 on usage error.  I couldn't make that
happen without ARGP_NO_EXIT.  I'm open to different suggestions.

> I think we might want to avoid specific ELF concepts in the
> classification descriptors though. For example people might have a
> different concept of DSO.
>
>> I'm keeping a branch with these changes here:
>> 
>>   
>>
>
>> +/* Name and version of program.  */
>> +ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
>> +
>> +/* Bug report address.  */
>> +ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
>> +
>> +enum classify_command

Re: [PATCH] elfclassify tool

2019-04-15 Thread Mark Wielaard
Hi,

On Fri, 2019-04-12 at 17:38 +0200, Florian Weimer wrote:
> This patch adds an elfclassify tool, mainly for the benefit of RPM's
> find-debuginfo.sh.

I have CCed Panu to see if he has any input.

> I still need to implement an --unstripped option and fix the
> iteration over the dynamic section.

We did already discuss some of this off-list.

The basic idea is that we provide a replacement for using "file" as an
ELF file classifier. It currently provides the following options:

--elf=PATH Check if the file at PATH is a valid ELF object
--executable=PATH  Check if the file at PATH is an ELF program
   executable
--file=PATHCheck PATH is file that can be read
--loadable=PATHCheck if the file at PATH is a loadable object
   (program or shared object)
--shared=PATH  Check if the file at PATH is an ELF shared object
   (DSO)
-v, --verbose  Output additional information (can be specified
   multiple times)

The program returns 0 on success (the given PATH is if the requested
classification), return 1 on failure (the given PATH isn't of the
requested classification) or returns 2 on error.

Note that only one PATH can be given (the = is optional).

--elf PATH return 0 whenever the file can be opened and a minimal ELF
header can be read (it might not be a completely valid ELF file). Do we
want or need to do any more verification (e.g. try to get the full ELF
header, walk through all phdrs and shdrs)?

Where only one of --executable and --shared can be true for an ELF file.
They indicate whether the primary purpose of an ELF file is to be an
executable or a shared library (this is for example how rpm can make a
decision to strip or keep the symtab table, you might want to keep it
for an ELF file that is used primarily as a common shared library, but
not if it is primarily used as executable).

--unstripped (not yet implemented) would be a classification that
indicates whether the ELF file can be stripped (further), that is has a
.symtab (symbol table), .debug_* sections (and possibly any non-
loadable sections -- "file" only detects the first two).

I am not sure --file=PATH is a useful option.
But maybe we need some way to indicate whether a file is a real file or
a symlink? But the current implementation returns 0 even for symlinks.
As do every other option (if the file is a symlink to an ELF file of
the requested classification). Is this what we want? I would suggest
that we return 1 for anything that is not a regular file. But that
would mean that for example eu-elfclassify --executable=/proc/$$/exe
would also return 1 (currently it returns 0, which might be helpful in
some cases).

--loadable basically checks whether the given ELF file is not an object
(ET_REL) file, so it will return 0 for either an executable, a shared
object or core file, but not check whether any other attribute (like
whether it has program headers and/or loadable segments). Personally I
would like it if this at least included a check for a PT_LOAD segment.

This does not classify kernel modules as loadable objects.
rpm does contain a check for that, it might make sense to include that
as a separate classification in elfclassify --kernel-module.

Kernel modules are also special because they can be compressed ELF
files. Do we want to support that? (It is easy with gelf_elf_begin).
That could for example be an flag/option like --compressed which can be
combined with any other classification option?

I think another useful classification would be --debugfile which
succeeds if the primary function of the given ELF file is being a
separete debug file (basically .debug, .dwo or dwz .multi file) which
cannot be linked and loaded on its own

BTW. Florian, the extra options are certainly not required for you to
implement to get eu-elfclassify accepted. They are just suggestions,
which we might decide not to do/add. Or they can be added by others if
they think they are useful.

> Suggestions for improving the argp/help output are welcome as
> well.  I'm not familiar with argp at all.

You usage of argp seems fine. But I think you don't want to use
ARGP_NO_EXIT. That causes standard options like --version and --help to
not exit (with success). Which is generally what we want.
We do want to want --version and --help to not return an error
indicator (this is actually checked by make distcheck).

I think we might want to avoid specific ELF concepts in the
classification descriptors though. For example people might have a
different concept of DSO.

> I'm keeping a branch with these changes here:
> 
>   
>

> +/* Name and version of program.  */
> +ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
> +
> +/* Bug report address.  */
> +ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
> +
> +enum classify_command
> +{
> +  classify_file = 1000,
> +  classify_elf,
> +