On 17/01/2012 9.41, Carmelo AMOROSO wrote:
> On 17/01/2012 2.59, Khem Raj wrote:
>> On Mon, Jan 16, 2012 at 1:36 AM, Carmelo AMOROSO <carmelo.amor...@st.com> wrote:
>>> On 16/01/2012 9.09, Carmelo Amoroso wrote:
>>>> On 16/01/2012 8.53, Khem Raj wrote:
>>>>> On Sun, Jan 15, 2012 at 11:46 PM, Carmelo AMOROSO <carmelo.amor...@st.com> wrote:
>>>>>> On 15/01/2012 7.22, Khem Raj wrote:
>>>>>>> On Sat, Jan 14, 2012 at 6:10 PM, Khem Raj <raj.k...@gmail.com> wrote:
>>>>>>>> On Fri, Jan 13, 2012 at 4:13 PM, Khem Raj <raj.k...@gmail.com> wrote:
>>>>>>>>> On Fri, Jan 13, 2012 at 3:45 PM, Khem Raj <raj.k...@gmail.com> wrote:
>>>>>>>>>> On Fri, Jan 13, 2012 at 1:37 AM, Carmelo AMOROSO <carmelo.amor...@st.com> wrote:
>>>>>>>>>>>> and since I see the same issue on all architectures, it's probably
>>>>>>>>>>>> not the elfinterp changes either. It seems most likely that it
>>>>>>>>>>>> could be in the way the scopes are being handled
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> we have reviewed this change several times before committing. Anyway we
>>>>>>>>>>> will review it again. We have never seen any failure in the lookup
>>>>>>>>>>> with all of our tests. The only change in the way the symbol scope is
>>>>>>>>>>> created is in where the ld.so is added.
>>>>>>>>>>> In the original code it was the last entry of the global scope, while
>>>>>>>>>>> with the new structure in place it is added as soon as it is found (as
>>>>>>>>>>> glibc actually does).... and I don't really think this could have any
>>>>>>>>>>> impact.
>>>>>>>>>>
>>>>>>>>>> I tried to reverse it as well but the problem remained.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We are trying to start up an X system on our platform. Is there any
>>>>>>>>>>> simple X app we can run to show the failure ?
>>>>>>>>>>>
>>>>>>>>>>> Is some .so failing to be dl-opened due to an unresolved symbol ?
>>>>>>>>>>
>>>>>>>>>> this is potentially possible. I will try to debug it through
>>>>>>>>>
>>>>>>>>> This is the problem that happens with the new scoping and does not
>>>>>>>>> happen without it
>>>>>>>>>
>>>>>>>>> Error reading Pango modules file
>>>>>>>>>
>>>>>>>>> (matchbox-desktop:1058): Pango-CRITICAL **: No modules found:
>>>>>>>>> No builtin or dynamically loaded modules were found.
>>>>>>>>> PangoFc will not work correctly.
>>>>>>>>> This probably means there was an error in the creation of:
>>>>>>>>> '/etc/pango/pango.modules'
>>>>>>>>> You should create this file by running:
>>>>>>>>> pango-querymodules > '/etc/pango/pango.modules'
>>>>>>>>>
>>>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='latin'
>>>>>>>>>
>>>>>>>>> (matchbox-desktop:1058): Pango-WARNING **: failed to choose a font,
>>>>>>>>> expect ugly output. engine-type='PangoRenderFc', script='common'
>>>>>>>>
>>>>>>>> here is the error
>>>>>>>>
>>>>>>>> /usr/bin/pango-querymodules: can't resolve symbol
>>>>>>>> '_ZNSt14error_categoryD2Ev' in lib '/usr/lib/libstdc++.so.6'.
>>>>>>>>
>>>>>>>> this does not happen without the scope patch
>>>>>>>>
>>>>>>>> pango-querymodules loads a shared library
>>>>>>>> /usr/lib/pango/1.6.0/modules/pango-basic-fc.so using dlopen and this
>>>>>>>> library has libstdc++.so.6 in its DT_NEEDED entries
>>>>>>>>
>>>>>>>> I was trying to create a small testcase where a small binary
>>>>>>>> would dlopen another .so which has libstdc++ in DT_NEEDED in its
>>>>>>>> header. I have not been able to reproduce it in a small testcase yet,
>>>>>>>> but I am making some progress
>>>>>>>
>>>>>>>
>>>>>>> I might have a test case here
>>>>>>> http://uclibc.org/~kraj/reproducer_v2.tar.gz
>>>>>>> untar it on target and run make and then ./run.sh
>>>>>>>
>>>>>>> with buggy libraries I get
>>>>>>> root@qemux86:~/rep/reproducer_v2# ./run.sh
>>>>>>> 1)main:dlopen libA.so
>>>>>>> 4)libC:dlopen libB.so
>>>>>>> 5)libC:atexit(libC_fini)
>>>>>>> 6)main:dlclose libA.so
>>>>>>> /home/root/rep/reproducer_v2/main: can't resolve symbol '_libC_fini'
>>>>>>> in lib './/libC.so'.
>>>>>>>
>>>>>>> whereas without the scopes patch I get
>>>>>>>
>>>>>>> root@qemux86:~/rep/reproducer_v2# ./run.sh
>>>>>>> 1)main:dlopen libA.so
>>>>>>> 4)libC:dlopen libB.so
>>>>>>> 5)libC:atexit(libC_fini)
>>>>>>> 6)main:dlclose libA.so
>>>>>>> 7)libC:finish - atexit()
>>>>>>> 8)main:finish main
>>>>>>> root@qemux86:~/rep/reproducer_v2#
>>>>>>>
>>>>>>>
>>>>>>> I think that's the problem that I am facing in pango-querymodules as well.
>>>>>>> Another data point is that if I use BIND_NOW then it works too.
>>>>>>>
>>>>>>> let me know if you can reproduce it with this testcase
>>>>>>>
>>>>>>> Thanks
>>>>>>> -Khem
>>>>>>>
>>>>>>
>>>>>> Thanks Khem for your effort in reproducing.
>>>>>> I'll let you know asap.
>>>>>>
>>>>>> We will focus on this 100% from now on.
>>>>>>
>>>>>> Carmelo
>>>>>
>>>>> I have a patch (sort of) which fixes this issue; have a look at it.
>>>>> Problem is that it's trying to unload sub scopes after it has been
>>>>> removed from global scope so I just delayed the removal of dlopened
>>>>> library
>>>>>
>>>>
>>>> what is triggering the problem is the use of atexit()
>>>>
>>>
>>> I'd ask.. is it correct for a dlopen-ed shared library to install
>>> a function via atexit() to be called at program exit, if the shared
>>> library could be un-loaded at any time during the program's life ?
>>>
>>
>> does library know if it will be dlopened all the time ?
>>
>
> no it doesn't obviously.
>
> I've read the atexit man pages again. Initially they simply refer to the
> use of atexit in binaries (hence my doubts), but later, in the Note, I
> found a reference to the use of atexit in shared libraries acting as
> a destructor.... so my concerns are invalid.
>
>>> I'd say that with the old code it was just working by luck !
>>>
>>> The shared library image is actually un-mapped from the system, why
>>> should we expect to have some of its symbols still alive ?
>>>
>>
>> how about the dependencies that it loaded
>>
>
> again I was wrong. Looking at the code more carefully, indeed in the loop
> where the library (with dependencies) is getting unloaded, the
> destructors are called at the beginning, before unmapping the DSO, and
> before removing it from the _dl_loaded_modules and the symbol tables...
> so it works.
>
>>>>> http://www.uclibc.org/~kraj/fix_libdl.patch
>>>>>
>>>>
>>>> looking at it
>>>
>>> not considering the concerns on the use of atexit, this patch is
>>> correct. Could we avoid using the unlink_local_scope guard and test the
>>> stored_ls pointer directly ?
>>>
>
> please install your patch, it is definitely correct.
>
I'm working on a different approach... sending a new patch shortly.

>>> carmelo
>>
>
> cheers,
> carmelo
> _______________________________________________
> uClibc mailing list
> uClibc@uclibc.org
> http://lists.busybox.net/mailman/listinfo/uclibc
>