On 17/01/2012 9.41, Carmelo AMOROSO wrote:
> On 17/01/2012 2.59, Khem Raj wrote:
>> On Mon, Jan 16, 2012 at 1:36 AM, Carmelo AMOROSO <carmelo.amor...@st.com> wrote:
>>> On 16/01/2012 9.09, Carmelo Amoroso wrote:
>>>> On 16/01/2012 8.53, Khem Raj wrote:
>>>>> On Sun, Jan 15, 2012 at 11:46 PM, Carmelo AMOROSO
>>>>> <carmelo.amor...@st.com> wrote:
>>>>>> On 15/01/2012 7.22, Khem Raj wrote:
>>>>>>> On Sat, Jan 14, 2012 at 6:10 PM, Khem Raj <raj.k...@gmail.com> wrote:
>>>>>>>> On Fri, Jan 13, 2012 at 4:13 PM, Khem Raj <raj.k...@gmail.com> wrote:
>>>>>>>>> On Fri, Jan 13, 2012 at 3:45 PM, Khem Raj <raj.k...@gmail.com> wrote:
>>>>>>>>>> On Fri, Jan 13, 2012 at 1:37 AM, Carmelo AMOROSO 
>>>>>>>>>> <carmelo.amor...@st.com> wrote:
>>>>>>>>>>>> and since I see the same issue on all architectures, it is probably
>>>>>>>>>>>> not the elfinterp changes either. Most likely it is in the way the
>>>>>>>>>>>> scopes are being handled
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We reviewed this change several times before committing it. Anyway,
>>>>>>>>>>> we will review it again. We have never seen a lookup failure in any
>>>>>>>>>>> of our tests. The only change in the way the symbol scope is created
>>>>>>>>>>> is where ld.so is added.
>>>>>>>>>>> In the original code it was the last entry of the global scope, while
>>>>>>>>>>> with the new structure in place it is added as soon as it is found
>>>>>>>>>>> (as glibc actually does)... and I don't really think this could have
>>>>>>>>>>> any impact.
>>>>>>>>>>
>>>>>>>>>> I tried to reverse it as well but the problem remained.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We are trying to start up an X system on our platform. Is there any
>>>>>>>>>>> simple X app we can run to show the failure?
>>>>>>>>>>>
>>>>>>>>>>> Is some .so failing to be dl-opened due to unresolved symbol ?
>>>>>>>>>>
>>>>>>>>>> this is potentially possible. I will try to debug it through
>>>>>>>>>
>>>>>>>>> This is the problem that happens with the new scoping and does not
>>>>>>>>> happen without it
>>>>>>>>>
>>>>>>>>> Error reading Pango modules file
>>>>>>>>>
>>>>>>>>> (matchbox-desktop:1058): Pango-CRITICAL **: No modules found:
>>>>>>>>> No builtin or dynamically loaded modules were found.
>>>>>>>>> PangoFc will not work correctly.
>>>>>>>>> This probably means there was an error in the creation of:
>>>>>>>>>  '/etc/pango/pango.modules'
>>>>>>>>> You should create this file by running:
>>>>>>>>>  pango-querymodules > '/etc/pango/pango.modules'
>>>>>>>>>
>>>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='latin'
>>>>>>>>>
>>>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='common'
>>>>>>>>
>>>>>>>> here is the error
>>>>>>>>
>>>>>>>> /usr/bin/pango-querymodules: can't resolve symbol
>>>>>>>> '_ZNSt14error_categoryD2Ev' in lib '/usr/lib/libstdc++.so.6'.
>>>>>>>>
>>>>>>>> this does not happen without scope patch
>>>>>>>>
>>>>>>>> pango-querymodules loads a shared library
>>>>>>>> /usr/lib/pango/1.6.0/modules/pango-basic-fc.so using dlopen and this
>>>>>>>> library had libstdc++.so.6 in its DT_NEEDED entries
>>>>>>>>
>>>>>>>> I have been trying to create a small testcase: a small binary that
>>>>>>>> dlopens another .so which has libstdc++ in its DT_NEEDED entries.
>>>>>>>> I have not been able to reproduce the problem with it yet, but I am
>>>>>>>> making some progress
>>>>>>>
>>>>>>>
>>>>>>> I might have a test case here:
>>>>>>> http://uclibc.org/~kraj/reproducer_v2.tar.gz
>>>>>>> untar it on the target, run make and then ./run.sh
>>>>>>>
>>>>>>> with buggy libraries i get
>>>>>>> root@qemux86:~/rep/reproducer_v2# ./run.sh
>>>>>>> 1)main:dlopen  libA.so
>>>>>>> 4)libC:dlopen  libB.so
>>>>>>> 5)libC:atexit(libC_fini)
>>>>>>> 6)main:dlclose libA.so
>>>>>>> /home/root/rep/reproducer_v2/main: can't resolve symbol '_libC_fini'
>>>>>>> in lib './/libC.so'.
>>>>>>>
>>>>>>> whereas without the scopes patch I get
>>>>>>>
>>>>>>> root@qemux86:~/rep/reproducer_v2# ./run.sh
>>>>>>> 1)main:dlopen  libA.so
>>>>>>> 4)libC:dlopen  libB.so
>>>>>>> 5)libC:atexit(libC_fini)
>>>>>>> 6)main:dlclose libA.so
>>>>>>> 7)libC:finish - atexit()
>>>>>>> 8)main:finish main
>>>>>>> root@qemux86:~/rep/reproducer_v2#
>>>>>>>
>>>>>>>
>>>>>>> I think that's the problem I am facing in pango-querymodules as well.
>>>>>>> Another data point: if I use BIND_NOW then it works too.
>>>>>>>
>>>>>>> let me know if you can reproduce it with this testcase
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Khem
>>>>>>>
>>>>>>
>>>>>> Thanks Khem for your effort in reproducing this.
>>>>>> I'll let you know asap.
>>>>>>
>>>>>> We will focus on this 100% from now on.
>>>>>>
>>>>>> Carmelo
>>>>>
>>>>> I have a patch (sort of) which fixes this issue; have a look at it.
>>>>> The problem is that it's trying to unload sub scopes after the library
>>>>> has been removed from the global scope, so I just delayed the removal
>>>>> of the dlopened library
>>>>>
>>>>
>>>> what is triggering the problem is the use of atexit()
>>>>
>>>
>>> I'd ask... is it correct for a dlopen-ed shared library to install
>>> a function via atexit() to be called at program exit, if the shared
>>> library could be unloaded at any time during the program's life?
>>>
>>
>> does library know if it will be dlopened all the time ?
>>
> 
> no, obviously it doesn't.
> 
> I've read the atexit man page again. Initially it refers only to the use
> of atexit in binaries (hence my doubts), but later, in the Notes, it
> mentions the use of atexit in shared libraries, where the registered
> function acts as a destructor... so my concerns are invalid.
> 
>>> I'd say that with the old code it was just working by luck!
>>>
>>> The shared library image is actually unmapped from the system, so why
>>> should we expect some of its symbols to be still alive?
>>>
>>
>> how about the dependencies that it loaded
>>
> 
> again I was wrong. Looking at the code more carefully: indeed, in the loop
> where the library (with its dependencies) gets unloaded, the
> destructors are called at the beginning, before unmapping the DSO and
> before removing it from _dl_loaded_modules and the symbol tables...
> so it works.
> 
>>>>> http://www.uclibc.org/~kraj/fix_libdl.patch
>>>>>
>>>>
>>>> looking at it
>>>
>>> Leaving aside the concerns about the use of atexit, this patch is
>>> correct. Could we avoid using the unlink_local_scope guard and test the
>>> stored_ls pointer directly?
>>>
> 
> please install your patch, it is definitely correct.
> 

I'm working on a different approach... sending a new patch shortly.

>>> carmelo
>>
> 
> cheers,
> carmelo
> _______________________________________________
> uClibc mailing list
> uClibc@uclibc.org
> http://lists.busybox.net/mailman/listinfo/uclibc
> 
