Re: librpmio memory allocation issue
On Mar 1, 2011, at 2:45 PM, Jeff Johnson wrote:
> On Mar 1, 2011, at 2:33 PM, Mark Hatle wrote:
>> On 3/1/11 1:27 PM, Jeff Johnson wrote:
>>> On Mar 1, 2011, at 2:20 PM, Mark Hatle wrote:
>>>> I think jbj might have figured out the problem. Zypper is not using pcre, but is instead causing mire to use the system regex. I made some local changes to flag the #include of regex.h in mire with a #error. I'm in the process of building both RPM and Zypper now to see if this causes a build failure -- if it does, we need to figure out a Zypper solution because it's bringing in mire.h.
>>>
>>> If you can show me the Zypper code that's trying to use miRE, I can likely suggest a clean fix.
>>
>> I am not finding any direct inclusions of miRE within Zypper or libzypper. However, I am finding an inclusion of regex.h within libzypper.
>
> One last note:
>
> Here's the valgrind spewage from the Poky bug report:
>
> ==444== Invalid read of size 1
> ==444==    at 0x694C5DC: regcomp (regcomp.c:502)
> ==444==    by 0x86F139B: mireRegcomp (mire.c:364)
> ==444==    by 0x6E46E22: defaultMachine (rpmrc.c:475)
> ==444==    by 0x6E4743B: rpmSetMachine (rpmrc.c:841)
> ==444==    by 0x6E475B3: rpmRebuildTargetVars.clone.1 (rpmrc.c:925)
> ==444==    by 0x6E47DFA: rpmReadConfigFiles (rpmrc.c:1115)
> ==444==    by 0x52E45A6: zypp::target::rpm::librpmDb::globalInit() (in /usr/lib/libzypp.so.810.1.0)
> ==444==    by 0x52C6990: zypp::target::rpm::RpmDb::RpmDb() (in /usr/lib/libzypp.so.810.1.0)
> ==444==    by 0x52FC023: zypp::target::TargetImpl::TargetImpl(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
> ==444==    by 0x5412F08: zypp::Target::Target(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
> ==444==    by 0x5425BA3: zypp::zypp_detail::ZYppImpl::initializeTarget(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
> ==444==    by 0x5420E2E: zypp::ZYpp::initializeTarget(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
> ==444==  Address 0xb2a9f88 is not stack'd, malloc'd or (recently) free'd
> ==444==
>
> The only way that backtrace can happen (that I can see) is if RPM, not zypper, is mis-compiled using an unfixed (to redefine colliding symbols) system PCRE with #include <pcreposix.h>, which leaves you two choices:
>
> 1) fix your system PCRE pcreposix.h
> 2) build --with-pcre=internal

This helps a lot. With the system PCRE patched, the problem seems to be gone.

The RPM we are using is rpm-5.4.0, which doesn't ship an internal pcre, so I diffed the system header against the internal header and saw the

    #define regcomp pcreposix_regcomp

artifact. It's then much clearer what actually happened: regex_t from pcreposix.h is used, but regcomp from libc is what gets called at runtime, and that causes the failure.

Interestingly, this patch is also found in Debian, but in no version of vanilla PCRE. I tried to find some original discussion of the issue, but got no meaningful result. Are there any references? And why is PCRE upstream sticking to this?

> From what I know of zypper (mostly libsatsolver and tools), the code rips everything out-of-an-rpmdb and uses as little as possible from RPM libraries. But I haven't looked for 2+ years.

I don't know much about zypper internals, but the issue above is triggered with only NULL being passed around.

Thanks,
Qing
__
RPM Package Manager http://rpm5.org
Developer Communication List rpm-devel@rpm5.org
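The mismatch described in this message can be illustrated with a small self-contained sketch. Everything in it is hypothetical: `wrap_regex_t`, `libc_like_regex_t`, `wrapper_regcomp`, `libc_like_regcomp` and `demo_regcomp` are stand-ins invented for illustration, not the real PCRE or libc definitions, and the struct layouts are only assumed to be "small" versus "large", as in the 24 versus 64 bytes reported earlier in the thread. It shows why a header that declares the small type also has to remap the function name, in the spirit of the `#define regcomp pcreposix_regcomp` line found in the patched header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the small regex_t declared by a wrapper header. */
typedef struct {
    void  *re_handle;
    size_t re_nsub;
    size_t re_erroffset;
} wrap_regex_t;

/* Hypothetical stand-in for the larger regex_t the C library expects. */
typedef struct {
    unsigned char opaque[64];
} libc_like_regex_t;

/* Each "implementation" reports how many bytes it would write into *preg. */
static size_t wrapper_regcomp(void)   { return sizeof(wrap_regex_t); }
static size_t libc_like_regcomp(void) { return sizeof(libc_like_regex_t); }

/* The fix: the wrapper header remaps the public name to its own function,
 * so the type and the function always come from the same library. */
#define demo_regcomp wrapper_regcomp

/* Returns 1 when the caller's allocation covers what the callee writes. */
static int allocation_is_safe(size_t allocated, size_t written)
{
    return allocated >= written;
}

/* Checks both combinations and returns the number of unsafe ones found. */
static int report_mismatch(void)
{
    size_t allocated = sizeof(wrap_regex_t);   /* what the caller allocates */
    int unsafe = 0;

    assert(sizeof(wrap_regex_t) < sizeof(libc_like_regex_t));

    /* Remapped name: type and function agree. */
    if (!allocation_is_safe(allocated, demo_regcomp()))
        unsafe++;

    /* Unpatched header: small type, but the libc-like function is called. */
    if (!allocation_is_safe(allocated, libc_like_regcomp())) {
        printf("unsafe: allocated %zu bytes, callee writes %zu\n",
               allocated, libc_like_regcomp());
        unsafe++;
    }
    return unsafe;
}
```

Calling `report_mismatch()` from a `main` flags only the second combination, which corresponds to the situation where the small type comes from one header and the function is resolved from another library at run time.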
Re: librpmio memory allocation issue
On Mar 3, 2011, at 7:09 AM, Qing He wrote:
>> The only way that backtrace can happen (that I can see) is if RPM, not zypper, is mis-compiled using an unfixed (to redefine colliding symbols) system PCRE with #include <pcreposix.h>, which leaves you two choices:
>>
>> 1) fix your system PCRE pcreposix.h
>> 2) build --with-pcre=internal
>
> This helps a lot. With the system PCRE patched, the problem seems to be gone.
>
> The RPM we are using is rpm-5.4.0, which doesn't ship an internal pcre, so I diffed the system header against the internal header and saw the
>
>     #define regcomp pcreposix_regcomp
>
> artifact.

Yes. You would have to re-add it, either by preparing a tarball from a checkout from @rpm5.org, or by adding it from elsewhere.

The patch to change the pcre emulation is all that is important. The only other change in the internal pcre @rpm5.org is to disable "make install" in the pcre sub-tree. There might be some other minor changes to remove compilation warnings.

But most distros already have the needed patch in system pcre afaik. @redhat systems are the important exception.

> It's then much clearer what actually happened: regex_t from pcreposix.h is used, but regcomp from libc is what gets called at runtime, and that causes the failure.
>
> Interestingly, this patch is also found in Debian, but in no version of vanilla PCRE. I tried to find some original discussion of the issue, but got no meaningful result. Are there any references? And why is PCRE upstream sticking to this?

Yes. The patch was taken from Debian originally.

I *believe* (from dim memories of checking diffs when upgrading pcre) that there's an equivalent change in pcre-8.10 (but I could easily be mis-remembering).

>> From what I know of zypper (mostly libsatsolver and tools), the code rips everything out-of-an-rpmdb and uses as little as possible from RPM libraries. But I haven't looked for 2+ years.
>
> I don't know much about zypper internals, but the issue above is triggered with only NULL being passed around.

Yes, a symbol collision has a pervasive and hard-to-diagnose effect.

Glad to hear the problem was fixed.

BTW, rpm-5.4.1 will likely be released in the next week. Be careful how/when you upgrade on the rpm-5_4 branch because that is the devel branch. But there's nothing in rpm-5.4.0 that differs significantly from rpm-5.3.8.

I will be careful ... but I am soon to get back to more active development.

73 de jeff
Re: librpmio memory allocation issue
On Mar 1, 2011, at 11:04 AM, Paul Eggleton wrote:
> Hi there,
>
> In Poky we're currently seeing a crash of zypper search in conjunction with rpm 5.4.0 [1]. Using valgrind I tracked the issue down to rpmio/mire.c line 361:
>
>     mire->preg = xcalloc(1, sizeof(*mire->preg));
>
> If I hack this line to specify 64 as the size (the expected sizeof(regex_t) for x86_64, as opposed to 24 reported by valgrind) then the crash disappears and valgrind stops reporting invalid memory accesses.

OK.

(aside for context)
The reference counting on the mire object is quite tricky. All the memory management in RPM is quite tricky: most allocations are reference counted, and that can't be helped. But what is specifically painful wrt the mire object is that sometimes a mire object is a scalar, and sometimes it's an array.

The hard design problem is this:

    Does the reference count apply to the array or the element in the array?

and (what is not yet implemented cleanly/correctly @rpm5.org) I've wired up the same reference count (through a pointer) so that the array and all of its elements share the same reference count. That works iff there are programming constraints that MUST be obeyed, which is likely not the case with zypper.

BTW: could you dig a bit and find out how/why zypper is actually segfaulting? A simple gdb backtrace will suffice; don't bother digging deeper. I can find all the info necessary if I have the gdb backtrace (or equivalently, the valgrind spewage) for the crash that you are fixing.

> I don't have much knowledge of the rpm codebase, but a bit of header grepping shows me that libpcre's pcreposix.h has a regex_t which differs quite considerably from regex_t in regex.h (and matches the smaller size reported by valgrind), and therefore I strongly suspect that the culprit is that pcre's regex_t is being used when allocating the struct in mire.c which is then passed to regcomp. FWIW we are enabling pcre support at configure time.

(another aside for context)
RPM had to commit to a *RE dialect = PCRE. And PCRE support is MANDATORY @rpm5.org. But I fully expect "Have it your own way!" to eventually 2nd guess the choice of MANDATORY. So the code remains using the POSIX *RE API and the MANDATORY PCRE is achieved by using pcreposix.h.

None of the above is obvious from reading code. Apologies.

> I could hack this to work, but since we may have dueling headers here the solution might not be trivial. Any suggestions?

If you want to try a hack, be my guest. This implementation is not to my liking and is tortured beyond belief.

But the simple rule(s) regarding miRE object reference counts and PCRE wireup are this:

1) You SHOULD be able to see all the necessary details to debug a miRE issue if you add --miredebug from the CLI or set (programmatically)

    static int _mire_debug = -1;

The ++ and -- lines are the reference counts being changed, and (in the case of segfaults/memleaks) a reference count ++/-- has gone AWOL somehow.

2) All code using the miRE API MUST do

    #include <pcreposix.h>

because there's a risk of symbol pollution. This is tricky with out-of-tree builds like zypper (that may be peeking into miRE internals, or that may need #define's as arguments).

The Right Thing To Do (but "Have it your own way!" RPM building voids whatever suggestions I might make) is
a) build RPM --with-pcre=internal
b) supply /usr/include/rpm/pcreposix.h for the internal pcre.

Please note that #include <mire.h> is not any interface I wish to export from RPM's API whatsoever. But there's an endless need to compile with RPM, and so mire.h flips out and is pulled back into RPM regular as clockwork.

> Thanks,
> Paul
>
> [1] http://bugzilla.pokylinux.org/show_bug.cgi?id=721

... checking ...

Comment #3 at the bug report indicates the problem is during configuration (where miRE is used to parse patterns out of /etc/rpm/platform).

Short answer: try deleting everything but the 1st line in /etc/rpm/platform, and I'll bet your segfault disappears.

Hmmm ... you've got a bunch of issues trying to initialize rpm through configuration. Usually there's only a single flaw.

Short answer: if zypper is initializing -lrpmlib multiple times, well, that's asking for trouble because of the reference counting confusion between array and array element that is currently implemented @rpm5.org. There's no need to repeatedly re-initialize -lrpmlib.

But the real hint is that regcomp, not a wrapped symbol (pcre_regcomp or something), is in the stack backtrace. Which forces me to ...

(yet another obscure aside)
The emulation provided by pcreposix.h isn't perfect because
a) its #define's differ from POSIX
b) it's an API/ABI emulation only. E.g. the emulation is perfectly happy parsing PCRE dialect regexes passed through the POSIX emulated API.

Short answer: I believe you have symbol pollution because of the way that zypper - pcre - rpm are being built separately.

There's gory details ... checking ... in
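As a point of reference for the allocation discussed in this message, here is a minimal sketch of the plain POSIX path, where the type and the functions both come from the C library's regex.h. The function name `matches_platform` and the sample pattern are assumptions made up for illustration; the real code in mire.c and the real contents of /etc/rpm/platform may well differ.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Returns 1 if text matches the POSIX extended regex, 0 if it does not,
 * and -1 if the pattern fails to compile or memory runs out. */
static int matches_platform(const char *pattern, const char *text)
{
    /* Size the buffer from the same regex_t that regcomp() expects. */
    regex_t *preg = calloc(1, sizeof(*preg));
    int rc;

    if (preg == NULL)
        return -1;
    if (regcomp(preg, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        free(preg);
        return -1;
    }
    rc = regexec(preg, text, 0, NULL, 0);
    regfree(preg);   /* release what regcomp() allocated internally */
    free(preg);      /* release the regex_t buffer itself */
    return rc == 0 ? 1 : 0;
}
```

The point of the sketch is only that `sizeof(*preg)` and `regcomp()` are taken from the same header and library. Printing `sizeof(regex_t)` in a build that includes regex.h and again in one that includes pcreposix.h is a quick way to see whether the two definitions disagree on a given system.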
RE: librpmio memory allocation issue
Hi,

I have checked mire.c and there is no difference between 5.3 and 5.4. So perhaps an autofu issue is possible; it is hard to see how I could do a MandrivaUpdate without a problem on cooker if there were such an apparently evident problem with 5.3.6. Can you post your configure invocation and configure log?

Regards

-----Original Message-----
From: Paul Eggleton
Sent: 01/03/2011, 17:04
To: rpm-devel@rpm5.org
Subject: librpmio memory allocation issue

Hi there,

In Poky we're currently seeing a crash of zypper search in conjunction with rpm 5.4.0 [1]. Using valgrind I tracked the issue down to rpmio/mire.c line 361:

    mire->preg = xcalloc(1, sizeof(*mire->preg));

If I hack this line to specify 64 as the size (the expected sizeof(regex_t) for x86_64, as opposed to 24 reported by valgrind) then the crash disappears and valgrind stops reporting invalid memory accesses.

I don't have much knowledge of the rpm codebase, but a bit of header grepping shows me that libpcre's pcreposix.h has a regex_t which differs quite considerably from regex_t in regex.h (and matches the smaller size reported by valgrind), and therefore I strongly suspect that the culprit is that pcre's regex_t is being used when allocating the struct in mire.c which is then passed to regcomp. FWIW we are enabling pcre support at configure time.

I could hack this to work, but since we may have dueling headers here the solution might not be trivial. Any suggestions?

Thanks,
Paul

[1] http://bugzilla.pokylinux.org/show_bug.cgi?id=721

--
Paul Eggleton
Intel Open Source Technology Centre (UK)
Re: librpmio memory allocation issue
On 3/1/11 1:27 PM, Jeff Johnson wrote:
> On Mar 1, 2011, at 2:20 PM, Mark Hatle wrote:
>> I think jbj might have figured out the problem. Zypper is not using pcre, but is instead causing mire to use the system regex. I made some local changes to flag the #include of regex.h in mire with a #error. I'm in the process of building both RPM and Zypper now to see if this causes a build failure -- if it does, we need to figure out a Zypper solution because it's bringing in mire.h.
>
> If you can show me the Zypper code that's trying to use miRE, I can likely suggest a clean fix.

I am not finding any direct inclusions of miRE within Zypper or libzypper. However, I am finding an inclusion of regex.h within libzypper.

I am still trying to track down the call order, but if I had to guess: zypper is calling into RPM with a regex object, which is getting fed to mire (internally), and because the sizes are different we're getting a failure of some type.

--Mark

> The traditional use of mire was tied to
>     rpm -qa foo*
> as a selection filter on packages returned from a sequential iteration through Packages.
>
> The problem there is that *it's Pig Slow* to load every installed header just to apply a pattern to NVRA. It's equally pig slow on @rpm.org or @rpm5.org code (though @rpm5.org will be faster than @rpm.org for other reasons ;-)
>
> But _NOT_ doing #include <mire.h> and once again _NOT_ installing /usr/include/rpm/mire.h (and thereby implicitly exporting a very messy API issue) is The Right Thing To Do. Anything that mire.h does can also be done in zypper without a whole lotta work.
>
> Can you snip out and send along the zypper usage case for mire.h please?
>
> 73 de Jeff
Re: librpmio memory allocation issue
On Mar 1, 2011, at 2:33 PM, Mark Hatle wrote:
> On 3/1/11 1:27 PM, Jeff Johnson wrote:
>> On Mar 1, 2011, at 2:20 PM, Mark Hatle wrote:
>>> I think jbj might have figured out the problem. Zypper is not using pcre, but is instead causing mire to use the system regex. I made some local changes to flag the #include of regex.h in mire with a #error. I'm in the process of building both RPM and Zypper now to see if this causes a build failure -- if it does, we need to figure out a Zypper solution because it's bringing in mire.h.
>>
>> If you can show me the Zypper code that's trying to use miRE, I can likely suggest a clean fix.
>
> I am not finding any direct inclusions of miRE within Zypper or libzypper. However, I am finding an inclusion of regex.h within libzypper.

If the *pig slow* load on every iteration is in play in zypper (I hope not: mls is a smart coder usually), then this is the API that must be used to set a pattern on a match iterator:

    /** \ingroup rpmdb
     * Add pattern to iterator selector.
     * @param mi        rpm database iterator
     * @param tag       rpm tag
     * @param mode      type of pattern match
     * @param pattern   pattern to match
     * @return          0 on success
     */
    int rpmmiAddPattern(/*@null@*/ rpmmi mi, rpmTag tag,
                    rpmMireMode mode, /*@null@*/ const char * pattern)
            /*@globals rpmGlobalMacroContext, h_errno, internalState @*/
            /*@modifies mi, mode, rpmGlobalMacroContext, internalState @*/;

I've forgotten the older traditional name and lost track of @rpm.org.

The leakage that forces #include <mire.h> is solely to get the rpmMireMode enumeration. You can easily just pass the handful of modes as integers and stub out the rpmMireMode typedef to avoid a whopping amount of pain here.

I'd move rpmMireMode Somewhere Else Instead, except it isn't clear where that Somewhere SHOULD be, and any change -- particularly repeated changes -- in an API is a worse solution than the busted (but known) status quo ante.

> I am still trying to track down the call order, but if I had to guess: zypper is calling into RPM with a regex object, which is getting fed to mire (internally), and because the sizes are different we're getting a failure of some type.

k. There's no API (that I can think of) where a regex can be passed into RPM. But that's never stopped anyone from doing what was needed with RPM.

The calls I see at the bug report are all in rpmlib initialization.

But it's usually rpmMireMode that drags in #include <mire.h>.

73 de Jeff
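The suggestion in this message, passing the modes as plain integers instead of pulling in mire.h, might look roughly like the following. This is only a sketch: the `DEMO_MIRE_*` names and their numeric values are assumptions that would have to be checked against the rpmMireMode enumeration in the installed mire.h, and `demo_add_pattern` is a hypothetical stand-in used so the example compiles without any RPM headers; it is not rpmmiAddPattern and does not reproduce its behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed integer values for the pattern modes; verify against mire.h. */
enum {
    DEMO_MIRE_DEFAULT = 0,
    DEMO_MIRE_STRCMP  = 1,
    DEMO_MIRE_REGEX   = 2,
    DEMO_MIRE_GLOB    = 3
};

/* Local stub so callers never need the real typedef from mire.h. */
typedef int demo_mire_mode;

/* Hypothetical stand-in for a call such as rpmmiAddPattern(): it only
 * validates its arguments and returns 0 on success, nonzero otherwise. */
static int demo_add_pattern(void *mi, int tag, demo_mire_mode mode,
                            const char *pattern)
{
    (void)tag;   /* unused in this sketch */
    if (mi == NULL || pattern == NULL)
        return 1;
    if (mode < DEMO_MIRE_DEFAULT || mode > DEMO_MIRE_GLOB)
        return 1;
    return 0;
}
```

The design choice being illustrated is simply that the calling code carries its own integer constants, so no regex-related header from RPM has to be visible to it. The cost is that the constants can drift from the library's enumeration, which is why they are marked as assumptions to verify.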
Re: librpmio memory allocation issue
On Mar 1, 2011, at 2:45 PM, Jeff Johnson wrote:
> On Mar 1, 2011, at 2:33 PM, Mark Hatle wrote:
>> On 3/1/11 1:27 PM, Jeff Johnson wrote:
>>> On Mar 1, 2011, at 2:20 PM, Mark Hatle wrote:
>>>> I think jbj might have figured out the problem. Zypper is not using pcre, but is instead causing mire to use the system regex. I made some local changes to flag the #include of regex.h in mire with a #error. I'm in the process of building both RPM and Zypper now to see if this causes a build failure -- if it does, we need to figure out a Zypper solution because it's bringing in mire.h.
>>>
>>> If you can show me the Zypper code that's trying to use miRE, I can likely suggest a clean fix.
>>
>> I am not finding any direct inclusions of miRE within Zypper or libzypper. However, I am finding an inclusion of regex.h within libzypper.

One last note:

Here's the valgrind spewage from the Poky bug report:

==444== Invalid read of size 1
==444==    at 0x694C5DC: regcomp (regcomp.c:502)
==444==    by 0x86F139B: mireRegcomp (mire.c:364)
==444==    by 0x6E46E22: defaultMachine (rpmrc.c:475)
==444==    by 0x6E4743B: rpmSetMachine (rpmrc.c:841)
==444==    by 0x6E475B3: rpmRebuildTargetVars.clone.1 (rpmrc.c:925)
==444==    by 0x6E47DFA: rpmReadConfigFiles (rpmrc.c:1115)
==444==    by 0x52E45A6: zypp::target::rpm::librpmDb::globalInit() (in /usr/lib/libzypp.so.810.1.0)
==444==    by 0x52C6990: zypp::target::rpm::RpmDb::RpmDb() (in /usr/lib/libzypp.so.810.1.0)
==444==    by 0x52FC023: zypp::target::TargetImpl::TargetImpl(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
==444==    by 0x5412F08: zypp::Target::Target(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
==444==    by 0x5425BA3: zypp::zypp_detail::ZYppImpl::initializeTarget(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
==444==    by 0x5420E2E: zypp::ZYpp::initializeTarget(zypp::filesystem::Pathname const, bool) (in /usr/lib/libzypp.so.810.1.0)
==444==  Address 0xb2a9f88 is not stack'd, malloc'd or (recently) free'd
==444==

The only way that backtrace can happen (that I can see) is if RPM, not zypper, is mis-compiled using an unfixed (to redefine colliding symbols) system PCRE with #include <pcreposix.h>, which leaves you two choices:

1) fix your system PCRE pcreposix.h
2) build --with-pcre=internal

From what I know of zypper (mostly libsatsolver and tools), the code rips everything out-of-an-rpmdb and uses as little as possible from RPM libraries. But I haven't looked for 2+ years.

hth

73 de jeff