Re: [PATCH] elfclassify tool

2019-04-16 Thread Florian Weimer
* Mark Wielaard:

>> I still need to implement an --unstripped option and fix the
>> iteration over the dynamic section.
>
> We did already discuss some of this off-list.

Thanks for summarizing the previous discussion.

> --elf PATH returns 0 whenever the file can be opened and a minimal ELF
> header can be read (it might not be a completely valid ELF file). Do we
> want or need to do any more verification (e.g. try to get the full ELF
> header, walk through all phdrs and shdrs)?

If we ever need that, I think we should add it as separate options,
detecting both separate debuginfo and regular ELF files.

> --unstripped (not yet implemented) would be a classification that
> indicates whether the ELF file can be stripped (further), that is has a
> .symtab (symbol table), .debug_* sections (and possibly any non-
> loadable sections -- "file" only detects the first two).

Some non-allocated sections are expected in stripped binaries:
.gnu_debuglink, .shstrtab, .gnu.build.attributes look relevant in this
context.  I'm not sure if we should flag any other non-allocated section
in this way.

> I am not sure --file=PATH is a useful option.

It's useful for determining if the file exists and can be mapped.

> But maybe we need some way to indicate whether a file is a real file or
> a symlink? But the current implementation returns 0 even for symlinks.
> As does every other option (if the file is a symlink to an ELF file of
> the requested classification). Is this what we want? I would suggest
> that we return 1 for anything that is not a regular file. But that
> would mean that for example eu-elfclassify --executable=/proc/$$/exe
> would also return 1 (currently it returns 0, which might be helpful in
> some cases).

I don't know what RPM needs in this context.  I expect that it can
easily filter out non-regular files.  My problem with symbolic link
detection is that it is so inconsistent—it generally applies to the
final pathname component, and that does not look useful to me.

> --loadable basically checks whether the given ELF file is not an object
> (ET_REL) file, so it will return 0 for either an executable, a shared
> object or core file, but not check whether any other attribute (like
> whether it has program headers and/or loadable segments). Personally I
> would like it if this at least included a check for a PT_LOAD segment.

Is a PT_LOAD segment required to make the PT_DYNAMIC segment visible?
It is possible to have mostly empty objects, after all.

> This does not classify kernel modules as loadable objects.
> rpm does contain a check for that, it might make sense to include that
> as a separate classification in elfclassify --kernel-module.
>
> Kernel modules are also special because they can be compressed ELF
> files. Do we want to support that? (It is easy with gelf_elf_begin).
> That could for example be a flag/option like --compressed which can be
> combined with any other classification option?

How relevant are kernel modules to eu-elfclassify?

Is path-based detection feasible for kernel modules?

> I think another useful classification would be --debugfile which
> succeeds if the primary function of the given ELF file is being a
> separate debug file (basically a .debug, .dwo or dwz .multi file) which
> cannot be linked and loaded on its own.

That is very difficult to detect reliably, unfortunately, and would best
be implemented in lib(g)elf itself because it would be generally useful,
for all kinds of tools which expect to process real ELF files only.

> BTW. Florian, the extra options are certainly not required for you to
> implement to get eu-elfclassify accepted. They are just suggestions,
> which we might decide not to do/add. Or they can be added by others if
> they think they are useful.

Understood.  I would rather fix the command line syntax as a priority,
implement --unstripped, and add a test suite.

>> Suggestions for improving the argp/help output are welcome as
>> well.  I'm not familiar with argp at all.
>
> Your usage of argp seems fine.

Trust me, it's been a struggle so far.

> But I think you don't want to use
> ARGP_NO_EXIT. That causes standard options like --version and --help to
> not exit (with success), and exiting with success is generally what we
> want. We do want --version and --help to not return an error
> indicator (this is actually checked by make distcheck).

I want to exit with status 2 on usage error.  I couldn't make that
happen without ARGP_NO_EXIT.  I'm open to different suggestions.

> I think we might want to avoid specific ELF concepts in the
> classification descriptors though. For example people might have a
> different concept of DSO.
>
>> I'm keeping a branch with these changes here:
>> 
>>   
>>
>
>> +/* Name and version of program.  */
>> +ARGP_PROGRAM_VERSION_HOOK_DEF = print_version;
>> +
>> +/* Bug report address.  */
>> +ARGP_PROGRAM_BUG_ADDRESS_DEF = PACKAGE_BUGREPORT;
>> +
>> +enum classify_command

Re: [PATCH V2 2/2] Add backend support for C-SKY

2019-04-16 Thread Mao Han
On Mon, Apr 15, 2019 at 01:08:58PM +0200, Mark Wielaard wrote:
> > +   * backends/Makefile.am: Add C-SKY.
> > +   * backends/csky_cfi.c: New file.
> > +   * backends/csky_corenote.c: Likewise.
> > +   * backends/csky_init.c: Likewise.
> > +   * backends/csky_initreg.c: Likewise.
> > +   * backends/csky_regs.c: Likewise.
> > +   * backends/csky_reloc.def: Likewise.
> > +   * backends/csky_symbol.c: Likewise.
> > +   * libebl/eblopenbackend.c: Add C-SKY.
> > +   * src/elflint.c: Likewise.
> 
> We have ChangeLog files per directory.
> So you don't need to prefix with backends/
> And the last two entries should go into their own (libebl and src)
> ChangeLog files.
>
OK
 
> > diff --git a/backends/csky_corenote.c b/backends/csky_corenote.c
> > new file mode 100644
> > index 000..c32b386
> > --- /dev/null
> > +++ b/backends/csky_corenote.c
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#define BACKEND csky_
> > +#include "libebl_CPU.h"
> > +
> > +#define ULONG  uint32_t
> > +#define PID_T  int32_t
> > +#define UID_T  uint32_t
> > +#define GID_T  uint32_t
> > +#define ALIGN_ULONG  4
> > +#define ALIGN_PID_T  4
> > +#define ALIGN_UID_T  4
> > +#define ALIGN_GID_T  4
> > +#define TYPE_ULONG ELF_T_WORD
> > +#define TYPE_PID_T ELF_T_SWORD
> > +#define TYPE_UID_T ELF_T_WORD
> > +#define TYPE_GID_T ELF_T_WORD
> > +
> > +static const Ebl_Register_Location prstatus_regs[] =
> > +  {
> > +{ .offset = 0, .regno = 0, .count = 38, .bits = 32 } /* r0..r31 */
> > +  };
> > +#define PRSTATUS_REGS_SIZE (38 * 4)
> > +
> > +#include "linux-core-note.c"
> 
> OK, that is a very simple 32bit architecture/register setup.

The actual size of pr_reg is 36 words, so I changed it to:
static const Ebl_Register_Location prstatus_regs[] =
  {
/* pc, a1, a0, sr, r2 ~ r31, reserved, reserved */
{ .offset = 0, .regno = 0, .count = 36, .bits = 32 }
  };
#define PRSTATUS_REGS_SIZE  (36 * 4)

We also have 400 bytes of FPU registers in the .reg2 note section;
will it cause any problem if it is not handled?

> > +  /* gcc/config/ #define DWARF_FRAME_REGISTERS.  */
> > +  eh->frame_nregs = 71;
> > +
> > +  return MODVERSION;
> 
> The 71 matches what can/is being restored through other hooks. But
> seems to not match the c-SKY V2 CPU ABI DWARF register mappings. Also
> maybe it should be 70, because you are using link as return register
> and don't explicitly need to store the pc (see below).

As I mentioned in:
https://sourceware.org/ml/elfutils-devel/2019-q2/msg6.html
the DWARF mappings from the document do not match the GCC source code.

71 comes from FIRST_PSEUDO_REGISTER
./gcc/defaults.h:#define DWARF_FRAME_REGISTERS FIRST_PSEUDO_REGISTER
The DWARF register mappings should be something like this (rr stands
for reserved):
// r0   r1   r2   r3   r4   r5   r6   r7
// 0    1    2    3    4    5    6    7
// r8   r9   r10  r11  r12  r13  sp   lr
// 8910   11   12   13   14   15
// r16  r17  r18  r19  r20  r21  r22  r23
// 16   17   18   19   20   21   22   23
// r24  r25  r26  r27  r28  r29  r30  r31
// 24   25   26   27   28   29   30   31
// rr   rr   rr   rr   hi   lo   epc
// 32   33   34   35   36   37   72
This part is quite similar to the native register numbering.
Registers 38 to 68 are FPU registers; each FPU register
consists of two DWARF registers. I intended to support
the FPU part, but the FPU register mapping is not so clear
at present. So I'll drop the FPU part first and only use 38
DWARF registers.

> > diff --git a/backends/csky_regs.c b/backends/csky_regs.c
> > new file mode 100644
> > index 000..0b6706e
> > +
> This confuses me too. So there are 71 registers? But you only handle
> 0..38? The names don't seem to match the usage in other places, but I
> am assuming DWARF register numbers match native register numbering.
> Which might not be how things actually work?
>
Likewise.

> > +  dwarf_regs[3] = user_regs.a3;
> > +  for (int i = 4; i < 14; i++)
> > +dwarf_regs[i] = user_regs.regs[i - 4];
> > +  /* r ~ r13.  */
> > +  for (int i = 16; i < 31; i++)
> > +dwarf_regs[i] = user_regs.exregs[i - 16];
> > +  /* tls.  */
> > +  dwarf_regs[31] = user_regs.tls;
> > +  /* hi.  */
> > +  dwarf_regs[34] = user_regs.rhi;
> > +  /* lo.  */
> > +  dwarf_regs[35] = user_regs.rlo;
> > +  /* pc.  */
> > +  dwarf_regs[70] = user_regs.pc;
> > +
> > +  return setfunc (0, 71, dwarf_regs, arg);
> > +#endif
> > +}
> 
> I think you should use setfunc (-1, 1, &dwarf_regs[70], arg) to set the pc
> explicitly and then setfunc (0, 36, dwarf_regs, arg) to set the rest.
> But you don't seem to actually need the full 71 dwarf_regs array. Again
> I am confused about the actual DWARF register number mapping.
> 
OK

> 
> > +bool
> > +csky_machine_flag_check (GElf_Word flags)
> > +{
> > +  switch (flags & EF_CSKY_ABIMASK)
> > +{
> > +case EF_CSKY_ABIV1:
>