Hi,

I've been digging through the crash reports to evaluate how we're doing. Here's some data; I'll give some conclusions at the end.

Crashers over the past week
-----------------------------------

Looking at the crashers on the "Second Life OSS" builds since we started doing them, this is what I see:

Total: 30 crashers
- VWR-12775: 12
- OpenJPEG: 3
- VWR-13065: 2
- VWR-13066: 2
- VWR-12827: 2
- Unusable stacks: 9

Clearly, the notorious VWR-12775 (the LLTextureFetchWorker::callbackDecoded crasher) is the one that's hurting reliability. If you're unlucky, you can crash over and over, as I noticed that you can hit that crash just reading the cache at launch...

If I look from the beginning of the month, I get 32 such crashers out of 108. The "CommunityDeveloper" channel shows 6 out of 17 crashers. That's a fairly consistent 30% or so of crashes due to this problem.

VWR-12827 is the LLVertexBuffer one and was fixed with the merge of 1.23.

Second Life OSS Build 1.23.0 2166
------------------------------------------------

This is the most recent "1.23 + http-texture" merge build and is only a couple of days old.

Total: 12 crashers
- VWR-12775: 1
- OpenJPEG: 2
- VWR-13065: 2
- VWR-13066: 1
- Unusable stacks: 6

There's no obvious pattern, except that VWR-12827 disappeared (as expected) and VWR-13065 emerged (badNetworkHandler exception). Still, there's not enough data to see anything new. I have anecdotal evidence that VWR-12775 is still very prevalent and that most of those crashes are not reported. I can certainly repro that one easily.

Conclusion
---------------

VWR-12775, LLTextureFetchWorker::callbackDecoded(), is the bad guy. I've actually seen a couple of different stacks, but they all trace back to the same problem. Several people on this list commented in the PJIRA and even proposed patches (thanks Robin!).

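For anyone who wants to redo the arithmetic behind the "30%" figure, here is a quick sketch in Python. The counts are the ones quoted earlier in this mail; the labels and the `share` helper are just mine, for illustration.

```python
# (VWR-12775 crashes, total crashes) as quoted in this mail.
# The labels and the helper name are illustrative, not from any tool.
samples = {
    "OSS builds, past week": (12, 30),
    "OSS builds, since start of month": (32, 108),
    "CommunityDeveloper channel": (6, 17),
    "OSS build 1.23.0 2166": (1, 12),
}


def share(hits, total):
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


for label, (hits, total) in samples.items():
    print(f"{label}: {hits}/{total} = {share(hits, total):.1f}%")
```

The 2166 sample is only 12 crashes, so its share shouldn't be read as a trend.
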
Discussing this with the lead dev though, he thinks we'd better get to the root cause of why the fetch worker ends up in a state that is not even supposed to exist (in the intent of the code at least), rather than just putting it down to a race condition, detecting the weird state and avoiding the crash.

So right now I'm tracing when this state happens (and it happens plenty, most of the time without ending in a crash) and writing unit tests to ensure the logic is sound and well understood. Writing unit tests for threaded code is a little tricky (we actually don't have any example of this in our still skimpy set of unit tests, so I'm blazing new trails here...) but it is worthwhile. At the same time, I'm tasked with updating the doc (http://wiki.secondlife.com/wiki/HTTP_Texture) as I go. That keeps me busy, which explains (but doesn't excuse) why I was not very responsive on the list today. Apologies for this.

The OpenJPEG crashers are concerning. There is unfortunately little info in the stack traces and I haven't experienced that one myself. What's strange is that I looked into the other viewers' crash logs (thousands of them) and it doesn't show up at all in their crashers. It may be of note that our 2 crashers happened on Vista. Does this ring a bell for anyone?

Keep testing, and keep the logs (with symbols!) coming. This is extremely useful. Any ideas on the crashes reported above will be very much appreciated.

Cheers,
- Merov

_______________________________________________
Policies and (un)subscribe information available here:
http://wiki.secondlife.com/wiki/SLDev
Please read the policies before posting to keep unmoderated posting privileges
