[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-08-21 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

--- Comment #23 from zenith432 at users dot sourceforge.net ---
(In reply to Nick Clifton from comment #22)
> I am confused by this part of the patch:
> 
>   for (plugin_list_iter = plugin_list;
>plugin_list_iter != NULL;
>plugin_list_iter = plugin_list_iter->next)
> {
>   if (plugin_handle == plugin_list_iter->handle)
> {
>   dlclose (plugin_handle);
>   if (!plugin_list_iter->claim_file)
> return 0;
>   register_claim_file (plugin_list_iter->claim_file);
>   goto have_claim_file;
> }
> }
> 
> If I am reading this correctly, this says that if we are attempting to 
> load a plugin that was previously loaded, then we call dlclose() on it,
> and then, if the plugin has a file claim function, we register this 
> function as the global file claimant, despite the plugin having been
> closed...
> Is that right ?

Not quite.  The first time a plugin is loaded, it's not in the linked list, so
the for loop you quoted gets skipped.  The plugin's dlopen() refcount is set to
1, the code later on creates a linked-list item for it and initializes it with
the handle, it's claim_file and links the item.
The 2nd or subsequent times a plugin is loaded, the dlopen() bumps its refcount
to 2, the plugin's handle is found in the linked list, calling dlclose()
reduces the refcount back to 1 (to keep it from growing unbounded).  The plugin
is still loaded (with refcount 1), and the cached claim_file pointer which was
found during the initial load is installed.

> Also  - it seems to me that if we have a list of loaded plugins, then
> we no longer need the claim_file global variable, and the try_claim() 
> function should walk the list of plugins, trying each in turn.

I agree, but I was trying to keep the logic localized to try_claim_plugin()
which is called with a specific plugin name and input file.  Your suggestion
requires refactoring the code at a higher level.  It is entered via
bfd_plugin_object_p() which is referenced from plugin_vec - and is called for a
specific input file.  Then it goes to load_plugin() which iterates over plugins
by name for a given input file.  load_plugin() in turn calls try_claim_plugin()
with a specific plugin name and input file.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-08-17 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

--- Comment #20 from zenith432 at users dot sourceforge.net ---
It's not just repetitive calls to dlopen() on the same plugin.

It is also repetitive calls to onload() on the same plugin.  This isn't
supported by the plugin API.  Today this works on gcc & llvm plugins because
they don't allocate memory in onload() unless LDPT_OPTIONs are passed.

If try_load_plugin() in bfd/plugin.c can be called with different plugins
during the same run the right solution is what Alan Modra says in comment 18.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-08-16 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

--- Comment #15 from zenith432 at users dot sourceforge.net ---
(In reply to Evangelos Foutras from comment #14)
> I believe the above error is caused by a sequence of "dlopen -> dlclose ->
> dlopen" on LLVMgold.so. (liblto_plugin.so doesn't seem to mind being
> unloaded and loaded again.)
> 
> Is the repeated dlopen expected here?

Yes, because bfd/plugin.c unloads the plugin on any input file that is not
claimed by the plugin.

I propose the following change

= begin patch
diff --git a/bfd/plugin.c b/bfd/plugin.c
index d9b9e2f1..3b738c38 100644
--- a/bfd/plugin.c
+++ b/bfd/plugin.c
@@ -274,7 +274,7 @@ have_claim_file:
 goto err;

   if (!try_claim (abfd))
-goto err;
+return 0;

   abfd->plugin_format = bfd_plugin_yes;

= end patch

On a different subject, since you're submitting commits to llvm, could you
please fix the design flaw in llvm plugin described in Comment #5?  It uses the
file descriptor fd passed to claim_file_hook() as a uniqueness-detector to
determine which input files are found in the same archive.  This is a no-no
because the plugin API disallows using the fd passed to claim_file_hook() after
it returns.  It is essential that the plugin client be free to reuse file
descriptors for input files because there may be more input files than open
descriptors allowed by the kernel.

The plugin should not really have to detect whether input files come from the
same archive, but if it must should use some other method such as...
1) Use the filename instead of descriptor as a unique key.
2) do fstat() on the file descriptor and use a  pair as a
unique key.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-07-30 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

--- Comment #5 from zenith432 at users dot sourceforge.net ---
Correction:

After another review, I noticed that the gcc plugin does NOT free the syms
array passed to add_symbols() during cleanup_hook().  This array is freed
during the all_symbols_read_hook(), and cleanup_hook() assumes it's already
free.

So...  it's hopeless to prevent a memory leak for what bfd/plugin.c is doing

A file descriptor leak can still be avoided because...
- for the gcc plugin - doesn't use fd after claim_file_hook().
- as I said, the llvm plugin uses fd as an index to detect which input files
come from the same archive - but this does not mess it up for what bfd/plugin.c
is doing.  If fd is closed and its value is reused, llvm plugin incorrectly
concludes that all input files with the same fd come from the same archive. 
But this only messes it up in all_symbols_read_hook() when using an llvm
feature called "thinlto" with a "-thinlto" option passed to the plugin.

So... I believe the patch in Comment 3 is ok.  At least for the 2 plugins the
way they are today.

I noticed one more potential leak of fd though... in bfd_plugin_open_input()

=
--- a/bfd/plugin.c
+++ b/bfd/plugin.c
@@ -187,7 +187,10 @@ bfd_plugin_open_input (bfd *ibfd, struct
ld_plugin_input_file *file)
 {
   struct stat stat_buf;
   if (fstat (file->fd, _buf))
-   return 0;
+   {
+ close(file->fd);
+ return 0;
+   }
   file->offset = 0;
   file->filesize = stat_buf.st_size;
 }
=

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-07-30 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

--- Comment #4 from zenith432 at users dot sourceforge.net ---
FYI
I took a look at the code for GCC 8.2 plugin and LLVM 6.0.1 gold plugin.

As best as I can tell

- Neither plugins access the file through the descriptor passed to
claim_file_hook() after it returns.

  The llvm plugin uses the fd value as an index for determining which claimed
files come from the same archive, but does not later access the file through
the fd.  Moreover, if the llvm plugin is asked to register its 
all_symbols_all_read_hook(), it requires the linker to provide get_input_file()
and release_input_file() callbacks - none of which is done by bfd/plugin.c.

  The gcc plugin, for its all_symbols_all_read_hook() passes the input file
names and offsets to lto-wrapper and doesn't need any further descriptors.

- Calls to onload() only allocate memory if passing LDPT_OPTIONs to onload()
which bfd/plugin.c doesn't do.  Without options, onload() uses only static
memory.

- The gcc plugin never releases memory allocated for LDPT_OPTIONs passed to
onload().

- The gcc plugin retains all memory allocated in claim_file_hook() (including
the memory passed to add_symbols() callback - and frees it in the
cleanup_hook().  The cleanup_hook() is not called by bfd/plugin.c today.

- The llvm plugin is written in C++ so all memory it allocates is released by
the static destructors called when dlclose() is done on the plugin.  It appears
that its cleanup_hook() only deletes temporary files and releases some cached
memory - but does not fully release all memory previously allocated - including
memory allocated in claim_file_hook().

So to summarize:
- It's better to call dlopen() and onload() only once.  onload() doesn't leak
(due to the lack of LDPT_OPTION) - but it still violates the plugin API spec to
call onload() many times.

- The file descriptor passed to claim_file_hook() should be closed right after
the call or it's never closed.

- The memory allocated in claim_file_hook(), part of which is passed to the
all_symbols() callback - can be used by bfd/plugin.c.  However, this memory
accumulates more and more with each call to claim_file_hook().  For the gcc
plugin, it's enough to call its cleanup_hook() to release the memory and "start
over" to claim more input files.  For the llvm plugin this isn't enough - it
needs to be dlclose()-ed.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-07-30 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

--- Comment #3 from zenith432 at users dot sourceforge.net ---
This patch should fix it

= Begin Patch
diff --git a/bfd/plugin.c b/bfd/plugin.c
index 7c5bba22..98c79c87 100644
--- a/bfd/plugin.c
+++ b/bfd/plugin.c
@@ -209,8 +209,7 @@ try_claim (bfd *abfd)
   if (!bfd_plugin_open_input (abfd, ))
 return 0;
   claim_file (, );
-  if (!claimed)
-close (file.fd);
+  close (file.fd);
   return claimed;
 }

@@ -223,6 +222,9 @@ try_load_plugin (const char *pname, bfd *abfd, int
*has_plugin_p)
   ld_plugin_onload onload;
   enum ld_plugin_status status;

+  if (claim_file)
+goto have_claim_file;
+
   *has_plugin_p = 0;

   plugin_handle = dlopen (pname, RTLD_NOW);
@@ -257,6 +259,7 @@ try_load_plugin (const char *pname, bfd *abfd, int
*has_plugin_p)
   if (status != LDPS_OK)
 goto err;

+have_claim_file:
   *has_plugin_p = 1;

   abfd->plugin_format = bfd_plugin_no;
= End Patch

There are two leaks there
- try_load_plugin() (in bfd/plugin.c) calls dlopen() and onload() again and
again to load the plugin for each object file processed via
bfd_plugin_object_p.  There is no intervening dlclose() and the plugin memory
is not cleaned up.  Moreover, add_symbols() keeps the syms array along with the
symbol name strings that were allocated by the plugin.  Presumably this memory
is released via the plugin's cleanup_handler which is not interfaced and never
called.
It is ok I think to keep the plugin loaded permanently, but calling dlopen()
over and over increases its refcount and calling onload() over and over is a
violation of the plugin API.  onload() should be called once, and claim_file
can be called any number of times.  So the patch skips the dlopen() and
onload() if claim_file is already registered.

Note however, that calling claim_file many times for 1000s of LTO objects
allocates memory each time (for the symbols passed in via add_symbols) and this
memory is not released (it is kept used from bfd_plugin_canonicalize_symtab). 
So this solution can still potentially run out-of-memory - because the memory
for the symtab passed in to add_symbols() is never released.

- The file descriptor opened for claim_file in try_claim() should always be
closed after it.  The plugin doesn't use the descriptor anymore.  If the plugin
needs a file descriptor again later during the "all symbols read" event, it
calls get_input_file() to get another one.  However, neither
all_symbols_read_handler nor get_input_file are used by bfd/plugin.c.  See the
documentation
http://gcc.gnu.org/wiki/whopr/driver
If the descriptor opened for claim_file is not closed after, it leaks because
nobody closes it.  The add_symbols() callback is called from inside claim_file.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23460] regression: ar can not create archive containing many (>1024) lto object files

2018-07-28 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23460

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC||amodra at gmail dot com

--- Comment #1 from zenith432 at users dot sourceforge.net ---
This can be reproduced with even a single object file

The first invocation of open is done via _bfd_real_fopen which takes care that
the file descriptor is eventually closed.
== BEGIN QUOTE
openat(AT_FDCWD, "a.o", O_RDONLY) = 4
#0  0x778fdc50 in open64 () from /lib64/libc.so.6
#1  0x7788f0f2 in __GI__IO_file_open () from /lib64/libc.so.6
#2  0x7788f29d in __GI__IO_file_fopen () from /lib64/libc.so.6
#3  0x77883549 in __fopen_internal () from /lib64/libc.so.6
#4  0x77ed27bf in _bfd_real_fopen () from
/lib64/libbfd-2.31.1-4.fc29.so
#5  0x77edbd18 in bfd_fopen () from /lib64/libbfd-2.31.1-4.fc29.so
#6  0xb775 in ar_emul_default_append ()
#7  0x9ddf in replace_members ()
#8  0x6f31 in main ()
== END QUOTE

The second invocation of open is done via bfd_plugin_object_p that leaves the
file descriptor leaking
== BEGIN QUOTE
openat(AT_FDCWD, "a.o", O_RDONLY) = 7
#0  0x778fdc50 in open64 () from /lib64/libc.so.6
#1  0x77f6411c in bfd_plugin_open_input () from
/lib64/libbfd-2.31.1-4.fc29.so
#2  0x77f642a9 in try_load_plugin () from
/lib64/libbfd-2.31.1-4.fc29.so
#3  0x77f6453f in bfd_plugin_object_p () from
/lib64/libbfd-2.31.1-4.fc29.so
#4  0x77ed4b2a in bfd_check_format_matches () from
/lib64/libbfd-2.31.1-4.fc29.so
#5  0x77ecec88 in _bfd_write_archive_contents () from
/lib64/libbfd-2.31.1-4.fc29.so
#6  0x77edc06e in bfd_close () from /lib64/libbfd-2.31.1-4.fc29.so
#7  0x93cd in write_archive ()
#8  0x9f7d in replace_members ()
#9  0x6f31 in main ()
== END QUOTE

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug gold/23424] gold doesn't build on Darwin

2018-07-24 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23424

--- Comment #5 from zenith432 at users dot sourceforge.net ---
This can be silenced with CXXFLAGS="-std=c++11 -Wno-c++11-narrowing", but
should probably be reported separately as a bug.  The enum in namespace elfcpp
ends up having type "unsigned int" because non-decimal literals that exceed
"int" but fit in "unsigned int" have type "unsigned int".

I think a fix should change the type of parameter pr_type in function
record_gnu_property to "unsigned int" instead of casting it.  I see that in
function add_property it is declared with type "unsigned int" - so there's some
inconsitency with it.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug gold/23411] Different behavior when linking common symbol statically or to shared object

2018-07-17 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23411

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC|zenith432 at users dot sourceforge |
   |.net|

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug gold/23411] Different behavior when linking common symbol statically or to shared object

2018-07-17 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23411

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC||zenith432 at users dot 
sourceforge
   ||.net

--- Comment #10 from zenith432 at users dot sourceforge.net ---
(In reply to Alexander Monakov from comment #9)
> The ELF specification does not allow this...

Where do you read that it's disallowed?  Leaving something unspecified is not
the same as disallowing it.

(In reply to Cary Coutant from comment #8)
> ... If the linker decides, based on the archive index, to include
> an IR file, it asks the plugin to provide the symbols for the
> "closer look", but the API doesn't provide a way for the linker to
> then say, "never mind, I don't need that object after all," in the case
> where the only symbol of interest happens to be a common.

gold already demonstrates how to do this...

Pluginobj::get_symbol_resolution_info
>  if (static_cast(nsyms) > this->symbols_.size())
>{
>  // We never decided to include this object. We mark all symbols as
>  // preempted.
>  gold_assert(this->symbols_.size() == 0);
>  for (int i = 0; i < nsyms; i++)
>syms[i].resolution = LDPR_PREEMPTED_REG;
>  return version > 2 ? LDPS_NO_SYMS : LDPS_OK;
>}

It's a bug in ld.bfd that it ends up setting multiple prevailing defs in this
scenario.  The plugin api prohibits(*) setting multiple prevailing defs, and
it's not necessary to do so in order to have lto1 purge unwanted symbol defs.

(*) https://gcc.gnu.org/wiki/whopr/driver, rules for the "All Symbols Read"
event.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug gold/23411] Different behavior when linking common symbol statically or to shared object

2018-07-14 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23411

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC|zenith432 at users dot sourceforge |
   |.net|

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug gold/23411] Different behavior when linking common symbol statically or to shared object

2018-07-14 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23411

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC||zenith432 at users dot 
sourceforge
   ||.net

--- Comment #6 from zenith432 at users dot sourceforge.net ---
(In reply to Alexander Monakov from comment #1)
> I think the behavior exhibited by gold is expected given absence of
> --whole-archive, and it's ld.bfd that is doing something not mandated by the
> ELF spec here.

So the ELF spec does lay down an undecisive law for dynamic linking of common. 
Couldn't find anything comparable for static linking of archives.

http://www.sco.com/developers/gabi/latest/ch4.symtab.html
section 4.20
When the dynamic linker encounters a reference to a symbol that resolves to a
definition of type STT_COMMON, it may (but is not required to) change its
symbol resolution rules as follows: instead of binding the reference to the
first symbol found with the given name, the dynamic linker searches for the
first symbol with that name with type other than STT_COMMON. If no such symbol
is found, it looks for the STT_COMMON definition of that name that has the
largest size.

In the prior section 4.19 there is something about weak symbols which have
similar search conundrums
 When the link editor combines several relocatable object files, it does
not allow multiple definitions of STB_GLOBAL symbols with the same name. On the
other hand, if a defined global symbol exists, the appearance of a weak symbol
with the same name will not cause an error. The link editor honors the global
definition and ignores the weak ones. Similarly, if a common symbol exists
(that is, a symbol whose st_shndx field holds SHN_COMMON), the appearance of a
weak symbol with the same name will not cause an error. The link editor honors
the common definition and ignores the weak ones.

When the link editor searches archive libraries [see ``Archive File'' in
Chapter 7], it extracts archive members that contain definitions of undefined
global symbols. The member's definition may be either a global or a weak
symbol. The link editor does not extract archive members to resolve undefined
weak symbols. Unresolved weak symbols have a zero value.


The ambiguity I mentioned about ANSI C is resolved by noticing that Annex J.5
is a list of common externsions to C.  They are allowed by the STD for
portability, but not necessarily conforming.

The GCC documentation for -fno-common states
=
fno-common
In C code, this option controls the placement of global variables defined with-
out an initializer, known as tentative definitions in the C standard. 
Tentative definitions are distinct from declarations of a variable with the
extern keyword, which do not allocate storage.
Unix C compilers have traditionally allocated storage for uninitialized global
variables  in  a  common  block.   This  allows  the  linker  to  resolve  all 
tentative definitions of the same variable in different compilation units to
the same object, or to a non-tentative definition.  This is the behavior
specified by ‘-fcommon’, and is the default for GCC on most targets.  On the
other hand, this behavior is not required by ISO C, and on some targets may
carry a speed or code size penalty on variable references.
The ‘-fno-common’ option specifies that the compiler should instead place
unini-
tialized global variables in the data section of the object file.  This
inhibits the merging of tentative definitions by the linker so you get a
multiple-definition error if the same variable is defined in more than one
compilation unit.  Compiling with ‘-fno-common’ is useful on targets for which
it provides better performance, or if you wish to verify that the program will
work on other systems that always treat uninitialized variable definitions this
way.
=
So the GCC doc claims that the common-based implementation of tentative
definitions is simply "not required by ISO C", but the last sentence is a tacit
admission that it is in fact non-conforming to ISO C - non-portable.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug gold/23411] Different behavior when linking common symbol statically or to shared object

2018-07-14 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23411

--- Comment #4 from zenith432 at users dot sourceforge.net ---
There's an ambiguity in ANSI C about this.  From C11
Section 6.9.2 External object definitions

1 If the declaration of an identifier for an object has file scope and an
initializer, the declaration is an external definition for the identifier.

2 A declaration of an identifier for an object that has file scope without an
initializer, and without a storage-class specifier or with the storage-class
specifier static, constitutes a tentative definition. If a translation unit
contains one or more tentative definitions for an identifier, and the
translation unit contains no external definition for that identifier, then the
behavior is exactly as if the translation unit contains a file scope
declaration of that identifier, with the composite type as of the end of the
translation unit, with an initializer equal to 0.

And then there's Annex J (Informative) Portability issues
J.5.11 Multiple external definitions

1 There may be more than one external definition for the identifier of an
object, with or without the explicit use of the keyword extern; if the
definitions disagree, or more than one is initialized, the behavior is
undefined (6.9.2).

So the normative 6.9.2 decrees things to be the way described in Comment 2, but
then J.5.11 which is only informative and is about portability issues talks
about "definitions disagree, or more than one is initialized" - but according
to 6.9.2 they're always initialized.  So when they wrote J.5.11 they still had
in mind the concept of a tentative definition which is uninitialized.
You could argue there's no contradiction, but there's no reason to include the
phrase "or more than one is initialized" unless the concept of tentative
definition still has any impact.

Anyways, the choice of whether to use ANSI C or GNU extensions is given to the
compiler - what business does the linker have enforcing ANSI C after a choice
was made during compile-time to use extensions?  There are no ld options to
choose whether to enforce ANSI C or not.  The linker is not even restricted to
C.  Or C++ (which has ODR).  Fortran has common blocks.  ld supports Fortran,
doesn't it?

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23350] multiple prevailing defs for unused variable in lto mode

2018-07-09 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23350

--- Comment #4 from zenith432 at users dot sourceforge.net ---
What difference does it make?  This is undefined behavior.  It doesn't link
-fno-lto either.

gcc -c main.i && gcc -c lib.i && ar rusc lib.a lib.o
gcc main.o lib.a lib.a

/usr/bin/ld: /usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o: in
function `_start':
(.text+0x20): undefined reference to `main'
collect2: error: ld returned 1 exit status

Find an example where the -fno-lto link succeeds.

ld is loading the object files in the libraries because it's searching for
main.
If I add a main() to lib.i it links successfully with -flto - the 2nd load of
lib.a sets the duplicate symbol resolutions to PREEMPTED_IR.  The error with
-flto is only if main is not found - which is also an error in -fno-lto.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23324] Regression: R_X86_64_converted_reloc_bit left in linker output with -q

2018-07-04 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23324

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

--- Comment #8 from zenith432 at users dot sourceforge.net ---
verified fix from commit b638b5d5.
I did not specifically test the cherry-picks to 2.30, 2.31 branches but see no
reason to doubt they work.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23309] ld.bfd -u option to force symbol discards symbol when used with LTO plugin and the symbol has hidden or internal visibility

2018-07-04 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23309

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

--- Comment #6 from zenith432 at users dot sourceforge.net ---
verified fix from commit 94d401b8.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23324] Regression: R_X86_64_converted_reloc_bit left in linker output with -q

2018-06-26 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23324

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC||nickc at redhat dot com

--- Comment #2 from zenith432 at users dot sourceforge.net ---
Update:

1) This bit was being output from its inception in commit 7898495.

2) Up until commit a6fd92b, the bit was being masked by objdump and objcopy, so
the workaround was available.  Since commit a6fd92b, the bit is not longer
masked, but causes both these tools to stop with an error message.  readelf -r
can display the bit.

3) The -q option is for leaving section relocs in the object file for
post-processing tools.  Outputting this bit can break any post-processing tool
like it breaks objdump and objcopy.

   -q
   --emit-relocs
   Leave relocation sections and contents in fully linked executables.
   Post link analysis and optimization tools may need this information
   in order to perform correct modifications of executables.  This
   results in larger executables.

   This option is currently only supported on ELF platforms.

4) Please fix this for 2.31.

5) Following is a patch that fixes this

= begin quote
diff --git a/bfd/elf64-x86-64.c b/bfd/elf64-x86-64.c
--- a/bfd/elf64-x86-64.c
+++ b/bfd/elf64-x86-64.c
@@ -2433,6 +2433,8 @@ elf_x86_64_relocate_section (bfd *output_bfd,

   converted_reloc = (r_type & R_X86_64_converted_reloc_bit) != 0;
   r_type &= ~R_X86_64_converted_reloc_bit;
+  if (converted_reloc)
+rel->r_info &= ~(bfd_vma) R_X86_64_converted_reloc_bit;

   if (r_type >= (int) R_X86_64_standard)
return _bfd_unrecognized_reloc (input_bfd, input_section, r_type);
= end quote

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23324] Regression: R_X86_64_converted_reloc_bit left in linker output with -q

2018-06-21 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23324

--- Comment #1 from zenith432 at users dot sourceforge.net ---
It appears this issue has been around longer - since the commit that this bit
was introduced

http://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=78984959cb385388e29dcc88dc169449344f22d7

I found that in objdump built recently instead of printing a warning, it masks
out R_X86_64_converted_reloc_bit - hiding that it's in the binary.  However,
recent readelf (with -r) shows it as "unrecognized: 82" so it's there.
And doing objcopy on the executable with recent objcopy makes the bit
disappear.  So that's a kind of workaround - to run it through objcopy built
along with same version ld.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23324] New: Regression: R_X86_64_converted_reloc_bit left in linker output with -q

2018-06-21 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23324

Bug ID: 23324
   Summary: Regression: R_X86_64_converted_reloc_bit left in
linker output with -q
   Product: binutils
   Version: 2.31 (HEAD)
Status: UNCONFIRMED
  Severity: critical
  Priority: P2
 Component: ld
  Assignee: unassigned at sourceware dot org
  Reporter: zenith432 at users dot sourceforge.net
CC: hjl.tools at gmail dot com
  Target Milestone: 2.31

== main.c
#include 

int a = 5;
int* f(void);

int main(int argc, char** argv)
{
printf("address of a is %p\n", f());
return 0;
}
==
== helper.s
.globl f
.text
f:
movq a@GOTPCREL(%rip), %rax
ret
==

gcc -fpic -static-pie -Wl,-q -o test main.c helper.s
objdump -d -r test >test.txt

Lots of...
objdump: test: invalid relocation type 130
objdump: BFD version 2.29.1-23.fc28 assertion fail elf64-x86-64.c:351

9d28 :
9d28:   48 8d 05 f5 73 0a 00lea0xa73f5(%rip),%rax#
b1124 
9d2b: R_X86_64_NONE a-0x4
9d2f:   c3  retq

It shows R_X86_64_NONE for that reloc value of 130 == 128 | 2 ==
R_X86_64_converted_reloc_bit |  R_X86_64_PC32.

It's this commit
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=a6fd92b0578c5d2172799d7f38ddbda1bd87ea03

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23309] ld.bfd -u option to force symbol discards symbol when used with LTO plugin and the symbol has hidden or internal visibility

2018-06-20 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23309

--- Comment #3 from zenith432 at users dot sourceforge.net ---
Created attachment 11088
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11088=edit
Proposed patch

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23309] ld.bfd -u option to force symbol discards symbol when used with LTO plugin and the symbol has hidden or internal visibility

2018-06-19 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23309

--- Comment #2 from zenith432 at users dot sourceforge.net ---
So here's what's going...

In ld/plugin.c (ld.bfd)... the following code in get_symbols_v2
=
  if (res == LDPR_PREVAILING_DEF_IRONLY)
{
  /* We need to know if the sym is referenced from non-IR files.  Or
 even potentially-referenced, perhaps in a future final link if
 this is a partial one, perhaps dynamically at load-time if the
 symbol is externally visible.  */
  if (blhe->non_ir_ref_regular)
res = LDPR_PREVAILING_DEF;
  else if (is_visible_from_outside ([n], blhe))
res = def_ironly_exp;
}
=
For symbol 'main', no_ir_ref_regular is true, so it sets resolution of
LDPR_PREVAILING_DEF and the symbol is handled correctly.
For symbol 'KeepMe', no_ir_ref_regular is false, and it calls
is_visible_from_outside(), which return true because KeepMe is in the list of
entry points.  The makes it set a resolution of LDPR_PREVAILING_DEF_IRONLY_EXP
(the value of def_ironly_exp) - which is what cases KeepMe to be incorrectly
converted from "global hidden" to "local".

And in gold?
=
{
  if (is_referenced_from_outside(lsym))
res = LDPR_PREVAILING_DEF;
  else if (is_visible_from_outside(lsym))
res = ldpr_prevailing_def_ironly_exp;
  else
res = LDPR_PREVAILING_DEF_IRONLY;
}
=
is in function Pluginobj::get_symbol_resolution_info.  The function
is_referenced_from_outside returns true for both "main", and "KeepMe", which
sets a resolution of LDPR_PREVAILING_DEF for both of them getting them to be
handled correctly.
In is_referenced_from_outside, lsym->in_real_elf returns true for "main"
  and parameters->options().is_undefined(lsym->name()) returns true for
"KeepMe".

To resolve this, the logic in ld.bfd needs to be changed so that both symbols
end up with a resolution of LDPR_PREVAILING_DEF.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23309] ld.bfd -u option to force symbol discards symbol when used with LTO plugin and the symbol has hidden or internal visibility

2018-06-19 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23309

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com

--- Comment #1 from zenith432 at users dot sourceforge.net ---
This looks strongly connected to this

https://sourceware.org/bugzilla/show_bug.cgi?id=22983

and that same function
is_visible_from_outside
in ld/plugin.c

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23309] New: ld.bfd -u option to force symbol discards symbol when used with LTO plugin and the symbol has hidden or internal visibility

2018-06-18 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23309

Bug ID: 23309
   Summary: ld.bfd -u option to force symbol discards symbol when
used with LTO plugin and the symbol has hidden or
internal visibility
   Product: binutils
   Version: 2.31 (HEAD)
Status: UNCONFIRMED
  Severity: normal
  Priority: P2
 Component: ld
  Assignee: unassigned at sourceware dot org
  Reporter: zenith432 at users dot sourceforge.net
  Target Milestone: ---

Test Case:

test.c
==
#include 

int main(int argc, char** argv)
{
printf("Hello World\n");
return 0;
}

void KeepMe(void)
{
__asm__ volatile ( ".asciz \"This string should appear in the
executable.\"" );
}
==

compile with

gcc -o test -flto -fvisibility=hidden -ffunction-sections -save-temps
-fuse-ld=bfd -Wl,--gc-sections,--print-gc-sections,-u,KeepMe test.c
strings -a test | grep This

Output:
/usr/bin/ld.bfd: Removing unused section '.rodata.cst4' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o'
/usr/bin/ld.bfd: Removing unused section '.data' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o'
/usr/bin/ld.bfd: Removing unused section '.rodata' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o'
/usr/bin/ld.bfd: Removing unused section '.text.KeepMe' in file
'test.ltrans0.ltrans.o'

Notice that KeepMe is discarded and does not appear in the output file.

Save the file test.ltrans0.s as test.bfd.s

Now compile with

gcc -o test -flto -fvisibility=hidden -ffunction-sections -save-temps
-fuse-ld=gold -Wl,--gc-sections,--print-gc-sections,-u,KeepMe test.c
strings -a test | grep This

(Using gold instead of bfd)

Output:
/usr/bin/ld.gold: removing unused section from '.rodata.cst4' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o'
/usr/bin/ld.gold: removing unused section from '.bss' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crt1.o'
/usr/bin/ld.gold: removing unused section from '.text' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crti.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crti.o'
/usr/bin/ld.gold: removing unused section from '.bss' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crti.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o'
/usr/bin/ld.gold: removing unused section from '.rodata' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/crtbegin.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'/usr/lib64/libc_nonshared.a(elf-init.oS)'
/usr/bin/ld.gold: removing unused section from '.bss' in file
'/usr/lib64/libc_nonshared.a(elf-init.oS)'
/usr/bin/ld.gold: removing unused section from '.text' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/crtend.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/crtend.o'
/usr/bin/ld.gold: removing unused section from '.bss' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/crtend.o'
/usr/bin/ld.gold: removing unused section from '.text' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crtn.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crtn.o'
/usr/bin/ld.gold: removing unused section from '.bss' in file
'/usr/lib/gcc/x86_64-redhat-linux/8/../../../../lib64/crtn.o'
/usr/bin/ld.gold: removing unused section from '.text' in file
'test.ltrans0.ltrans.o'
/usr/bin/ld.gold: removing unused section from '.data' in file
'test.ltrans0.ltrans.o'
/usr/bin/ld.gold: removing unused section from '.bss' in file
'test.ltrans0.ltrans.o'
This string should appear in the executable.

So KeepMe is in the executable as it should be.

Rename test.ltrans0.s to test.gold.s

and then... diff test.bfd.s test.gold.s
30a31,32
>   .globl  KeepMe
>   .hidden KeepMe

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23306] Please add feature to nm to display symbol visibility

2018-06-18 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23306

--- Comment #2 from zenith432 at users dot sourceforge.net ---
Created attachment 11079
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11079=edit
apple_nm.c and llvm-nm.cpp

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23306] Please add feature to nm to display symbol visibility

2018-06-18 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23306

--- Comment #3 from zenith432 at users dot sourceforge.net ---
I'm not sure whether it's based on a formal specification, but here are two
source files

apple_nm.c

Taken from Apple's open source.  The function print_mach_symbols does the -m
format output.

llvm-nm.cpp

Taken from llvm 6.0.  The function darwinPrintSymbol does the -m format output.
 The comments there specifically say the code is intended to mimic the Darwin
-m format, and near the beginning of the file it shows a support for 3 formats
- BSD, POSIX, DARWIN ("Darmin -m format").

The output is for mach-o.  ELF has a bigger set of symbol visibility and other
attributes.  So the format needs some adjustment - cannot be copied verbatim.

I hope this helps.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug binutils/23306] New: Please add feature to nm to display symbol visibility

2018-06-18 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23306

Bug ID: 23306
   Summary: Please add feature to nm to display symbol visibility
   Product: binutils
   Version: 2.31 (HEAD)
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P2
 Component: binutils
  Assignee: unassigned at sourceware dot org
  Reporter: zenith432 at users dot sourceforge.net
  Target Milestone: ---

There are 3 utilities that dump the symbol table of object files
objdump
readelf
nm

Of these, objdump and readelf can display detailed symbol information - such as
the symbol visiblity.

nm dumps symbols, but does not display visibility.

Why is this important?

Because objdump and readelf only work on machine object files - elf, coff.
nm is the only one that works on LTO objects.  It has plugin support, and a gcc
wrapper gcc-nm that invokes it with a plugin.

So nm today is the only tool that can dump symbols from LTO objects, but it
doesn't do a full job of dumping all information about a symbol.

LLVM-nm has an option "-m", modelled on Apple's "nm -m" that displays extended
information - including visibility.

There is probably a format issue with this request, because nm uses official
bsd, sysv or posix syntax.

However, a "-m" extension can use any extended form it wants.

Thanks.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23254] ld.bfd mishandles file pointers while scanning archive with LTO members, causing "malformed archive" errors for a well-formed archive

2018-06-05 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23254

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 Status|RESOLVED|VERIFIED

--- Comment #6 from zenith432 at users dot sourceforge.net ---
I built from latest head commit 5c4ce239a (one after 27b076759) and verified on
macOS that it fixes the problem and ld works.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23254] ld.bfd mishandles file pointers while scanning archive with LTO members, causing "malformed archive" errors for a well-formed archive

2018-06-04 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23254

--- Comment #3 from zenith432 at users dot sourceforge.net ---
- Your attached patch fixes the problem.

- For the record, I did a workaround with the patch


--- orig_plugin.c   2018-01-13 13:31:16.0 +
+++ plugin.c2018-06-04 14:41:38.0 +
@@ -1059,7 +1059,9 @@ plugin_call_claim_file (const struct ld_
  called_plugin = curplug;
  cur_offset = lseek (file->fd, 0, SEEK_CUR);
  rv = (*curplug->claim_file_handler) (file, claimed);
+#if 0
  if (!*claimed)
+#endif
lseek (file->fd, cur_offset, SEEK_SET);
  called_plugin = NULL;
  if (rv != LDPS_OK)


I do not believe this workaround is formally correct - because the plugin is
allowed to continue moving the file pointer via the file descriptor it got.  As
a practical matter, I found by setting debug breakpoints that the current
plugin only moves the file pointer in claim_file_handler, so protecting the
pointer around that call resolved the issue.

- I attached above a standalone example to reproduce the problem, however I can
only get it to happen on macOS.  On Fedora everything runs fine.

gcc is 8.1, binutils built from 2.30 (same happens on git master)

On macOS the output is
/usr/local/bin/gcc -flto -c -o a.o a.c
/usr/local/bin/gcc -flto -c -o b.o b.c
/usr/local/bin/gcc -flto -c -o c.o c.c
/usr/local/bin/gcc-ar cr libarch.a a.o b.o c.o
/usr/local/bin/gcc -nostdlib -flto -o main main.c -L . -larch
./libarch.a: error adding symbols: Malformed archive
collect2: error: ld returned 1 exit status
make: *** [all] Error 1

I had to use -nostdib on macOS because it's a cross compiler and I don't have
the runtime, but it doesn't matter because the "Malformed archive" error is
encountered *before* other errors about unfound entry point or undefined
printf.

The size of stdio FILE buffer is 4096 (it's the same on Fedora)
a.c is cooked up so inside libarch.a the ar_hdr for b.o is split across the
4096 position boundary.
Here is what happens exactly:
- First LD scans and processes a.o inside libarch.a using fseek/fread.  To do
this, stdio reads a 4096 buffer managed through FILE and caches 4096 as the
file pointer.
- Eventually a.o is passed down to claim_file_handler.  The plugin re-processes
a.o using lseek/read, with the file pointer ending up at the end of a.o inside
libarch.a instead of at 4096.
- Then it goes back to process b.o.  At this point stdio FILE has a 4096 byte
buffer and believes the file pointer is at 4096.
- b.o ar_hdr starts a few bytes below 4096, which is in the cached buffer. 
fseek is called to move to b.o header - Since it's in the buffer, fseek does
not lseek, but positions its internal pointer.
- then fread is called to read the ar_hdr for b.o.  The few bytes in the cached
buffer are copied.  Then fread, believing the file pointer to be at 4096, reads
another 4096, but from the wrong location in the file.
- Because the wrong location was read, the ar_hdr for b.o ends up being read
corrupted which is why it breaks with "Malformed archive".

So even if you can't reproduce it, this is what happens and you can follow it
by adding printfs to ld code (or running in debugger).

It doesn't break on Fedora and I'm not sure why.  One possibility is that the
pre-canned binaries are patched somehow.  The other possibility is that Fedora
fseek always calls lseek when moving to b.o ar_hdr which prevents the glitch.

Regards.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23254] ld.bfd mishandles file pointers while scanning archive with LTO members, causing "malformed archive" errors for a well-formed archive

2018-06-04 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23254

--- Comment #2 from zenith432 at users dot sourceforge.net ---
Created attachment 11052
  --> https://sourceware.org/bugzilla/attachment.cgi?id=11052=edit
code to reproduce error

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23254] ld.bfd mishandles file pointers while scanning archive with LTO members, causing "malformed archive" errors for a well-formed archive

2018-06-02 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23254

zenith432 at users dot sourceforge.net changed:

   What|Removed |Added

 CC||amodra at gmail dot com

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
bug-binutils mailing list
bug-binutils@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-binutils


[Bug ld/23254] New: ld.bfd mishandles file pointers while scanning archive with LTO members, causing "malformed archive" errors for a well-formed archive

2018-06-02 Thread zenith432 at users dot sourceforge.net
https://sourceware.org/bugzilla/show_bug.cgi?id=23254

Bug ID: 23254
   Summary: ld.bfd mishandles file pointers while scanning archive
with LTO members, causing "malformed archive" errors
for a well-formed archive
   Product: binutils
   Version: 2.31 (HEAD)
Status: UNCONFIRMED
  Severity: critical
  Priority: P2
 Component: ld
  Assignee: unassigned at sourceware dot org
  Reporter: zenith432 at users dot sourceforge.net
  Target Milestone: ---

When processing an archive with LTO object members, ld.bfd mismanages the file
pointer, occasionally causing "malformed archive" errors even though the
archive is well-formed.

The cause of this problem is as follows:
An archive is processed by the function elf_link_add_archive_symbols
(bfd/elflink.c), which calls for each member the functions
_bfd_get_elt_at_filepos
bfd_check_format
add_archive_element

The first two functions (_bfd_get_elt_at_filepos, bfd_check_format) process the
archive element using the I/O functions found in bfd/bfdio.c and bfd/cache.c. 
These I/O functions use stdio FILE*.  They store the FILE* in abfd->iostream
for the parent archive, and use fseek/fread to scan the archive file.

After this is done, if an archive element is loaded, add_archive_element (in
ld/ldmain.c) is called, which eventually calls plugin_maybe_claim (in
ld/plugin.c) to handle LTO elements.  This calls plugin_object_p (in
ld/plugin.c) which eventually calls the plugin's claim_file_handler method (see
plugin_call_claim_file).  The plugin's claim_file_handler function processes
the archive file using system calls lseek()/read() using a file descriptor set
up in plugin_object_p.

Description of The Problem:
Prior to commit 7d0b9ebc1e0079271a7c7737b53bc026525eab64 (by Alan Modra),
plugin_object_p would obtain a file descriptor to the archive file based on its
name using system call open(), then pass the file descriptor downward to the
plugin's claim_file_handler for it to process the file.

In the commit mentioned above, plugin_object_p was changed so that instead of
opening the archive by name, it obtains the file descriptor from (FILE*)
abfd->iostream using fileno(), then duplicates it with systemcall dup(), then
passes it down to the plugin's claim_file_handler.

*The documentation for system call dup() very clearly explains that the dup'd
file descriptor shares a file pointer with the original descriptor*.

As a result of this change, the plugin's claim_file_handler clobbers the file
pointer using calls to lseek()/read().  HOWEVER, the archive is later continued
to be accessed using fseek()/fread() on abfd->iostream when processing the next
archive element !!

This is WRONG.  Because fseek/fread *do not know* that the file pointer was
changed outside the stdio FILE* framework.  The FILE structure caches a buffer
for the file, and if fseek() gets a position that is within the loaded buffer,
it may not call lseek().  Moreover, if FILE struct caches a position for the
file, this potentially causes fseek() and fread() to believe the file pointer
is in a different place than it really is and they read the wrong part of the
file and eventually cause a "malformed archive" error.

A later commit ace667e59aede65c400381f1cff704b61e8ccb0b by Andrew Burgess added
a small fix that wraps the call to plugin's claim_file_handler in code saving
and restoring the file pointer.  However, restoration of the file pointer is
only done if the plugin does not "claim" the archive element.  It seems this
was done to prevent bugs if the archive consists of non-LTO elements.  However,
if the archive has LTO elements and the element is "claimed", the file pointer
remains clobbered and the problem persists.

This problem started with 2.28 official release of binutils (in line with dates
on above commits), and continues to this very day.
I am able to consistently reproduce the "malformed archive" errors on large
EDK2-based build on macOS.  This is with self-built gcc 8.1 and binutils 2.30
or binutils git master.  I've tried tweaking configure options, but none of
them made any difference.
For some reason I cannot reproduce this problem with EDK2 build on Fedora 28
with GCC 8.1.1.  I've examined the patches Redhat makes to gcc and binutils
from their source RPMs, but there doesn't seem to be anything there that fixes
this.  Another possibility is that fseek() on Fedora works differently than on
macOS, so it always calls lseek() and fixes the file pointer that's been
clobbered, so the issue never surfaces.  I can't be sure because the
development tools are all pre-built binaries and difficult to debug and see
what's going on.

Due to the complexity of this problem I cannot attach a simple example to
reproduce it, and like I said - I'm not sure it reproduces on Fedora at all. 
However, I spent time deb