Re: Thunderbird crashing when C_SignInit returns other than CKR_OK

2010-12-19 Thread Nelson Bolyard
On 2010-12-16 19:21 PDT, Marsh Ray wrote:
 On 12/16/2010 04:39 PM, Matej Kurpel wrote:
 ChildEBP RetAddr  Args to Child
 0015f130 5fa0c52b e06d7363 0001 0003
 KERNELBASE!RaiseException+0x58 (FPO: [Non-Fpo])
 0015f168 5fa14f13 0015f178 5fa7aa24 5fa5c11c
 MOZCRT19!_CxxThrowException+0x46 (FPO: [Non-Fpo]) (CONV: stdcall)
 [f:\sp\vctools\crt_bld\self_x86\crt\prebuild\eh\throw.cpp @ 161]
 
 So Mozilla builds its own CRT without FPO, cool.

Yes, Mozilla builds its own CRT, which is a modified version of the MSVC
CRT, whose sources come only with the pay (not free) versions of MSVC.
They do this in order to replace MSVC's normal heap code (malloc) with
their own JEmalloc.

Mozilla's source repository doesn't include ANY of the MSVC source code,
but only includes an ed script that patches that source without including
any of it.  Sadly, this means that people with the free MSVC cannot build
MOZCRT19, because they lack the sources to be patched.  IMO, this is a
flaw for an open source project, but ...  :(

 0015f180 003b474b 0028 0015f290 5f9ad1d9 MOZCRT19!operator new+0x73
 (FPO: [1,3,0]) (CONV: cdecl)
 
 The above func must be statically linked from the Mic CRT into the Moz 
 CRT. So it's still FPO. Weird.

Right.  IIRC, it's built from the plain old MSVC new.cpp source.
It calls malloc and throws an exception if malloc returns NULL.
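
To make that behavior concrete, here is a minimal sketch of an operator-new-style allocator, assuming the simplest possible shape: forward to malloc and throw when it returns NULL. The names allocate_or_throw and allocation_throws are hypothetical, and this is not the MSVC new.cpp source, which is not reproduced here.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a CRT-style operator new: forwards to malloc
// and converts a NULL return into a C++ exception.
void* allocate_or_throw(std::size_t size) {
    // malloc(0) may legally return NULL, so request at least one byte.
    void* p = std::malloc(size == 0 ? 1 : size);
    if (p == nullptr) {
        throw std::bad_alloc();
    }
    return p;
}

// Demonstration helper: reports whether an allocation of this size threw.
bool allocation_throws(std::size_t size) {
    try {
        std::free(allocate_or_throw(size));
        return false;
    } catch (const std::bad_alloc&) {
        return true;
    }
}
```

With a function of this shape, any failure inside malloc, whether from real exhaustion or from a corrupted heap, reaches the caller as the same exception.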

 [e:\buildbot\win32_build_31\build\objdir-tb\mozilla\memory\jemalloc\crtsrc\new@61]
 
 Looking at 
 http://mxr.mozilla.org/mozilla-central/source/memory/jemalloc/ I don't 
 see the source or crtsrc\new.cpp. Must be copied in from Microsoft 
 source code at build time.

Right.

 In any case, 'operator new' is throwing a C++ exception. Ordinarily that 
 would be due to a bad parameter (e.g., -1) or lack of memory.

Right.  Any NULL return from malloc causes this.

 In this case is it maybe asking for 0x0028 = 40 bytes?

I wouldn't bet much money that JEmalloc never modifies its input
arguments.  That's always allowed in C (as you know), which always passes
arguments by value.
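
As a sketch of what I mean, assuming a hypothetical allocator front end named round_up_request (not a real JEmalloc function): the callee overwrites its own copy of the argument, so the value a debugger later reads from that parameter slot need not be the size the caller originally asked for.

```cpp
#include <cstddef>

// Hypothetical allocator front end: rounds the request up to a multiple
// of 16 by overwriting its own parameter. The caller's variable is not
// affected, because arguments are passed by value.
std::size_t round_up_request(std::size_t size) {
    size = (size + 15) & ~static_cast<std::size_t>(15);
    return size;
}
```

A caller that passes a variable holding 40 still has 40 in it afterwards, while the callee's copy has become 48.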

 0015f198 003b47db 09385800  003d3b55
 thunderbird!nsDOMEvent::nsDOMEvent+0x63 (FPO: [Non-Fpo]) (CONV: thiscall)
 [e:\buildbot\win32_build_31\build\mozilla\content\events\src\nsdomevent@136]
 
 http://mxr.mozilla.org/mozilla-central/source/content/events/src/nsDOMEvent.cpp
 Line 132 is in the middle of a comment, so clearly I'm not looking at 
 the right source. Below it is a 'new nsEvent'. 

The sources from which Thunderbird is built come from Mozilla's
comm-central repository.  I think that line 136 could be either a
reference to the line on which the new call itself occurs, or the
following line.

The versions of the nsdomevent source in which the new call occurs on line
135 are dated 2009-04-02 14:34 -0500 ... 2009-06-30 10:56 +0300 
and line 136 from  2009-09-11 16:13 -0700 ... 2009-11-30 13:31 -0500
all of which are over a year old now.
See
http://hg.mozilla.org/mozilla-central/log/90b17476216d/content/events/src/nsDOMEvent.cpp
and
http://hg.mozilla.org/mozilla-central/log/d9267e3d8f8c/content/events/src/nsDOMEvent.cpp
and
http://hg.mozilla.org/mozilla-central/annotate/9e7a2c507c41/content/events/src/nsDOMEvent.cpp#l136

 But 'nsEvent' looks like it would take more than 40 bytes.

Yes.

 So, skipping down a bit, it looks like something has already gone wrong 
 before this exception is thrown. The app is attempting to show an alert 
 box, which fails because of an out-of-memory condition.

Agreed.  Further back on the stack, we see:

 nsMsgSendReport::DisplayReport+0x28c  nsmsgsendreport@428]
 nsMsgComposeAndSend::Fail+0x73  nsmsgsend@3812]
 nsMsgComposeAndSend::GatherMimeAttachments+0x113d nsmsgsend@1147]

That suggests that the attempt to generate and attach all the attachments
failed, and I'd guess that is likely due to Matej's intentional
introduction of a failure into C_SignInit.

So, C_SignInit failed, and then the attempt to report that failure in an
alert pop-up dialog fails due to heap allocation failure, perhaps due to
heap exhaustion, or heap corruption.

 The details are probably not important.

Well, I think the big question is: why does the heap allocation fail?

 You need to track down where the first error occurs. 

My first wild guess is that Matej's PKCS#11 module is doing something bad
to the heap.  My second one is that NSS or PSM is trying to free to the
MOZCRT19 heap something that was allocated from another heap.


-- 
/Nelson Bolyard
-- 
dev-tech-crypto mailing list
dev-tech-crypto@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-tech-crypto


Re: Thunderbird crashing when C_SignInit returns other than CKR_OK

2010-12-19 Thread Marsh Ray

On 12/19/2010 02:27 AM, Nelson Bolyard wrote:


 Yes, Mozilla builds its own CRT, which is a modified version of the MSVC
 CRT, whose sources come only with the pay (not free) versions of MSVC.
 They do this in order to replace MSVC's normal heap code (malloc) with
 their own JEmalloc.

 Mozilla's source repository doesn't include ANY of the MSVC source code,
 but only includes an ed script that patches that source without including
 any of it.  Sadly, this means that people with the free MSVC cannot build
 MOZCRT19, because they lack the sources to be patched.  IMO, this is a
 flaw for an open source project, but ...  :(


Can you build it against the compiler's CRT if you want to?


 Well, I think the big question is: why does the heap allocation fail?


 You need to track down where the first error occurs.


 My first wild guess is that Matej's PKCS#11 module is doing something bad
 to the heap.


Like if Matej's module were linked to some other CRT and an interface 
passed memory that way. Historically, having multiple CRTs in the same 
process has been a recipe for disaster on Windows. It's gotten better as 
Microsoft has switched to using the default global OS heap for everything.


Microsoft actually has a pretty decent set of heap debugging tools:

http://www.google.com/search?q=pageheap


I think this tool was made by their OS development team rather than 
their Visual dotnet 2000 enterprise team architect edition team. It's 
much better, much closer to valgrind. But the trick to using it is to 
get everything using the default OS heap, which means actually using the 
Release not the Debug Microsoft CRT. Thus JEmalloc would have to be 
redirected for testing as well, but just to the GlobalAlloc.



 My second one is that NSS or PSM is trying to free to the
 MOZCRT19 heap something that was allocated from another heap.


Or perhaps vice versa, but wouldn't that likely have thrown at the point 
of the bad free or delete?


- Marsh


Re: Thunderbird crashing when C_SignInit returns other than CKR_OK

2010-12-19 Thread Nelson B Bolyard
On 2010-12-19 00:56 PDT, Marsh Ray wrote:
 On 12/19/2010 02:27 AM, Nelson Bolyard wrote:
 Yes, Mozilla builds its own CRT, which is a modified version of the MSVC
 CRT, whose sources come only with the pay (not free) versions of MSVC.
 They do this in order to replace MSVC's normal heap code (malloc) with
 their own JEmalloc.

 Mozilla's source repository doesn't include ANY of the MSVC source code,
 but only includes an ed script that patches that source without including
 any of it.  Sadly, this means that people with the free MSVC cannot build
 MOZCRT19, because they lack the sources to be patched.  IMO, this is a
 flaw for an open source project, but ...  :(
 
 Can you build it against the compiler's CRT if you want to?

Not that I'm aware of.  But I've never tried.

 Well, I think the big question is: why does the heap allocation fail?

 You need to track down where the first error occurs.
 My first wild guess is that Matej's PKCS#11 module is doing something bad
 to the heap.
 
 Like if Matej's module were linked to some other CRT and an interface 
 passed memory that way. Historically, having multiple CRTs in the same 
 process has been a recipe for disaster on Windows. 

I was thinking of allocating a buffer of size N and then writing past the
end of it.  That's the most common problem, IMO.
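
One way to catch that kind of overrun, sketched here under my own assumptions, is to place guard bytes after the usable region and check them later. The names guarded_alloc and guard_intact and the guard pattern are illustrative only; debug heaps and tools like pageheap use their own mechanisms.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Illustrative guard pattern placed directly after the usable bytes.
static const unsigned char kGuard[4] = {0xFD, 0xFD, 0xFD, 0xFD};

// Allocates n usable bytes followed by a 4-byte guard region.
// Returns NULL if the underlying malloc fails.
unsigned char* guarded_alloc(std::size_t n) {
    unsigned char* p =
        static_cast<unsigned char*>(std::malloc(n + sizeof kGuard));
    if (p != nullptr) {
        std::memcpy(p + n, kGuard, sizeof kGuard);
    }
    return p;
}

// Returns true if the guard bytes after the n usable bytes are intact,
// i.e. nothing has written past the logical end of the buffer.
bool guard_intact(const unsigned char* p, std::size_t n) {
    return std::memcmp(p + n, kGuard, sizeof kGuard) == 0;
}
```

Checking the guard right after each call into the module would show whether the module wrote beyond a buffer it was handed, rather than finding out much later when an unrelated allocation fails.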

 My second one is that NSS or PSM is trying to free to the
 MOZCRT19 heap something that was allocated from another heap.
 
 Or perhaps vice versa, but wouldn't that likely have thrown at the point 
 of the bad free or delete?

Not clear.  The debug CRT would catch such a thing at free/delete time,
but not clear that any other CRT would do so.

Actually, in retrospect, I think it's doubtful that this second guess is
the problem, because the PKCS#11 API doesn't ever pass allocated memory
from one side of the API to the other, for the receiving side to
deallocate.  Generally, the process using the module allocates all memory,
and frees what it allocated.  It asks the PKCS#11 module to take data
from, or put data into, the memory it supplies, but the module should
never free memory passed in to it, and never outputs the addresses of
memory that it has allocated.
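
A minimal sketch of that ownership convention follows. Status, module_sign, and the fixed 4-byte output are hypothetical stand-ins, not the real PKCS#11 types or functions. The caller owns the buffer, and the module only reports a length or fills what it was given.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical return codes standing in for PKCS#11 CKR_* values.
enum Status { kOk, kBufferTooSmall };

// Hypothetical module function: writes a fixed 4-byte "signature" into a
// caller-supplied buffer. It never allocates memory for the caller and
// never frees the caller's buffer.
Status module_sign(unsigned char* out, std::size_t* out_len) {
    static const unsigned char kSig[4] = {0xDE, 0xAD, 0xBE, 0xEF};
    if (out == nullptr) {              // length query only
        *out_len = sizeof kSig;
        return kOk;
    }
    if (*out_len < sizeof kSig) {      // caller's buffer is too small
        *out_len = sizeof kSig;
        return kBufferTooSmall;
    }
    std::memcpy(out, kSig, sizeof kSig);
    *out_len = sizeof kSig;
    return kOk;
}
```

Because no pointer to module-allocated memory ever crosses the interface, a mismatch between the two sides' heaps cannot by itself produce a bad free under this convention.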


-- 
/Nelson Bolyard