Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-08-21 Thread Torsten Dreyer
I have _probably_ found at least one reason for this bug.

I was able to consistently trigger an FPE when running fgfs with --enable-fpe 
and /sim/traffic-manager/enabled=true.
I located the offending code in FGAISchedule::update, where the new 
position of some AI aircraft is calculated by multiplying the start position 
by a rotation matrix.

When computing the geodetic position from Cartesian coordinates in 
  current = SGGeod::fromCart(newPos);
it happened that within SGGeodesy::SGCartToGeod() the value of 's'
was _very_ close to zero and slightly negative, causing sqrt(s*(2+s)) to fail, 
since it is only defined for s greater than or equal to zero or less than or 
equal to -2.

The workaround clamps 's' to values greater than zero. This is probably 
mathematically incorrect but should keep us running.
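
A minimal sketch of the kind of clamp described above, not the actual SimGear change. The helper name sqrtTermClamped is hypothetical, and it clamps to exactly zero on the assumption that a slightly negative 's' is only rounding noise:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not the SimGear code): evaluates sqrt(s*(2+s)) after
// clamping a slightly negative s to zero. Assumes negative s is rounding noise.
double sqrtTermClamped(double s)
{
    // For -2 < s < 0 the product s*(2+s) is negative, so sqrt() would return
    // NaN, or raise SIGFPE when invalid-operation traps are enabled.
    s = std::max(s, 0.0);
    return std::sqrt(s * (2.0 + s));
}
```

With this guard a value such as s = -1e-17 yields 0 instead of NaN; whether clamping is mathematically justified is exactly the open question in this mail.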

Maybe someone who fully understands the math in this method can explain 
whether 's' can ever legally go negative or whether this is a rounding error.

Greetings, Torsten


--
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with 
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
___
Flightgear-devel mailing list
Flightgear-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/flightgear-devel


Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-06-18 Thread Mathias Fröhlich

Hi,

On Sunday 14 June 2009 10:48:03 Durk Talsma wrote:
 /home/durk/src/OpenSceneGraph/src/osg/PositionAttitudeTransform.cpp:63 #1 
 0x7fbf16244187 in osg::Transform::computeBound (this=0x9d1d3b0) at
Well, the PositionAttitudeTransform is now used for everything that has a 
SGModelPlacement. Maybe you need to look into the AI models' positions and 
orientations to find that?

I am not completely sure what you mean by those variables being different in 
different stack frames.
But keep in mind that, due to code optimizations, gdb might even print 
nonsense in some stack frames. That depends on plenty of 
conditions.

Did you try to increase osg's verbosity level? Does it print something for the 
node paths so that you can see the models? Keep in mind that this only works 
with osg's *trunk*.

Also, you might put some code into SGModelPlacement at the point where it 
writes its values into the PositionAttitudeTransform.
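
As a rough sketch of such a check (plain C++, not actual SimGear or OSG code; the function names and the array layout of position and quaternion are assumptions made for illustration):

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical helper: true only if every component of a position (x, y, z)
// and an orientation quaternion (x, y, z, w) is a finite number.
bool placementIsFinite(const double pos[3], const double quat[4])
{
    for (int i = 0; i < 3; ++i)
        if (!std::isfinite(pos[i])) return false;
    for (int i = 0; i < 4; ++i)
        if (!std::isfinite(quat[i])) return false;
    return true;
}

// Logs the model name when the placement contains NaN or infinity, so the
// offending model can be identified before the values reach the transform.
bool checkPlacement(const char* modelName, const double pos[3],
                    const double quat[4])
{
    if (placementIsFinite(pos, quat)) return true;
    std::fprintf(stderr, "Non-finite placement for model '%s'\n", modelName);
    return false;
}
```

Calling something like checkPlacement() right before the values are handed over would catch the bad data at its source instead of deep inside the bound computation.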

Greetings

Mathias



Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-06-14 Thread Durk Talsma
Folks,

Here's a rather long overdue follow-up to my own previous mail.

On Tuesday 26 May 2009 22:37:45 I wrote:
 Thanks for your suggestions. I've been trying to track this down, but don't
 have anything firm yet. My current working hypothesis is that a stack
 corruption may be feeding bad data into the prepare ground cache
 function. As I've been tracing the problem further up the stack, I got to
 the point that suggests this. I'll post some more specific results later,
 because the core dumps are on a different machine that I don't have access
 to. That being the case, there's probably no bad data in the scene graph
 itself. I currently don't fully understand the results from the stack trace
 yet.



Below is a gdb stack trace, obtained after placing error trapping code further 
upstream. My original thought, looking at this stack trace, was that a stack 
corruption occurs somewhere between stack frames #8 and #7, but upon closer 
inspection I'm not so sure anymore: at stack frame #8 _simTime has a normal 
value, whereas at stack frame #7 startSimTime is listed as 0 (when it should 
have been the same value as _simTime in frame #8). However, at stack frame #6 
startSimTime is again the same as _simTime in frame #8, so this variable is 
passed correctly, but just printed incorrectly in frame #7. On previous runs 
I've seen these values being printed as NaN though, so I'm not quite 
sure what's going on here. 

I have a core file for this particular crash, so any suggestions would be 
welcome.

Cheers,
Durk

(gdb) bt
#0  osg::PositionAttitudeTransform::computeLocalToWorldMatrix (this=0x9d1d3b0, 
matrix=<value optimized out>)
at /home/durk/src/OpenSceneGraph/src/osg/PositionAttitudeTransform.cpp:63
#1  0x7fbf16244187 in osg::Transform::computeBound (this=0x9d1d3b0) at 
/home/durk/src/OpenSceneGraph/src/osg/Transform.cpp:164
#2  0x7fbf162207a1 in osg::Switch::computeBound (this=0x9d1d290) at 
/home/durk/src/OpenSceneGraph/include/osg/Node:334
#3  0x7fbf1618df5f in osg::Group::computeBound (this=0x832d590) at 
/home/durk/src/OpenSceneGraph/include/osg/Node:334
#4  0x004fcb90 in FGGroundCache::CacheFill::apply 
(this=0x7fff21e74330, gro...@0x832d590) at /usr/local/include/osg/Node:334
#5  0x7fbf1618f393 in osg::Group::accept (this=0x832d590, 
n...@0x7fff21e74330) at /home/durk/src/OpenSceneGraph/include/osg/Group:38
#6  0x004f504d in FGGroundCache::prepare_ground_cache (this=0x9de29d8, 
startSimTime=238.200687, endSimTime=238.217355,
p...@0x7fff21e74730, rad=12.576125144958496) at groundcache.cxx:355
#7  0x004ee91c in FGInterface::prepare_ground_cache_m (this=<value 
optimized out>, startSimTime=0, endSimTime=-0,
pt=<value optimized out>, rad=-0) at flight.cxx:612
#8  0x00675aa5 in YASim::update (this=0x9de2650, 
dt=0.01) at YASim.cxx:213
#9  0x00427255 in fgUpdateTimeDepCalcs () at main.cxx:157
#10 0x004294af in fgMainLoop () at main.cxx:447
#11 0x004714bf in fgOSMainLoop () at fg_os_osgviewer.cxx:177
#12 0x00428bb5 in fgMainInit (argc=4, argv=0x7fff21e74d68) at 
main.cxx:1004
#13 0x00426a95 in main (argc=4, argv=0x7fff21e74d68) at 
bootstrap.cxx:216
(gdb) frame 8
#8  0x00675aa5 in YASim::update (this=0x9de2650, 
dt=0.01) at YASim.cxx:213
213 prepare_ground_cache_m( _simTime, _simTime + dt, xyz, vr );
(gdb) p _simTime
$6 = 238.200687
(gdb) p dt
$7 = 0.01
(gdb) frame 7
#7  0x004ee91c in FGInterface::prepare_ground_cache_m (this=<value 
optimized out>, startSimTime=0, endSimTime=-0,
pt=<value optimized out>, rad=-0) at flight.cxx:612
612     SGVec3d(pt), rad);
(gdb)



Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-26 Thread Durk Talsma
Hi Tim,

On Wednesday 20 May 2009 10:06:18 Tim Moore wrote:
  It may be helpful to dump the scene graph to a file (from the debug menu)
  once you're getting the NaN error. Hopefully the offending matrix will
  be printed with NaNs instead of valid coordinates.
 
  Tim

 I've added an --enable-fpe argument which, on Linux, will cause an abort or
 core dump on a division-by-zero or other invalid floating point operation,
 including generating NaNs and overflowing float-to-integer conversions. See
 if you can get to the source of the NaNs using that.


No breakthroughs yet, but just a quick progress report to keep the thread 
alive. :-)

Thanks for your suggestions. I've been trying to track this down, but don't 
have anything firm yet. My current working hypothesis is that a stack 
corruption may be feeding bad data into the prepare ground cache function. 
As I've been tracing the problem further up the stack, I got to the point that 
suggests this. I'll post some more specific results later, because the core 
dumps are on a different machine that I don't have access to. That being the 
case, there's probably no bad data in the scene graph itself. I currently 
don't fully understand the results from the stack trace yet.

As for the --enable-fpe argument, this is probably going to be a very useful 
debugging tool, but enabling it resulted in a segfault inside the GUI when I 
wanted to click the menu to enable the autopilot...

Cheers,
Durk


Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-20 Thread Tim Moore
Tim Moore wrote:
 Durk Talsma wrote:
 On Friday 15 May 2009 20:31:17 Durk Talsma wrote:


 While trying to trap bad data in the popMatrix function, I just
 noticed that a bad transformation matrix is already set up relatively
 early in the process, only a few levels deep in the stack. I haven't
 been able to relate this to any meaningful object yet. (All that came up
 was the name Scene).

 So, it looks like a transformation error early on already blows up the
 intersect line vector(s), and as the scene graph is traversed further down,
 OSG keeps happily multiplying already corrupted data with valid
 transformation data further down the line, resulting in an intersect
 line composed of NaNs. This goes unnoticed until the error is finally
 picked up at the first possible occasion where there's a NaN error
 check, that is, in TriangleIntersect.

 I hope to continue this investigation later, and hope to be able to
 traverse the bad data to their true source.
 It may be helpful to dump the scene graph to a file (from the debug menu)
 once you're getting the NaN error. Hopefully the offending matrix will
 be printed with NaNs instead of valid coordinates.
 
 Tim
I've added an --enable-fpe argument which, on Linux, will cause an abort or
core dump on a division-by-zero or other invalid floating point operation,
including generating NaNs and overflowing float-to-integer conversions. See
if you can get to the source of the NaNs using that.
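
For readers who want to try the same idea in a small test program, here is a sketch of how such trapping can be switched on with glibc. This is not the FlightGear implementation; the function name enableFpeTraps and the choice of exception flags are assumptions:

```cpp
#include <cfenv>
#if defined(__GLIBC__)
#include <fenv.h>   // declares feenableexcept(), a glibc extension
#endif

// Hypothetical helper: asks the FPU to raise SIGFPE on division by zero and
// on invalid operations (the ones that would otherwise produce NaN).
bool enableFpeTraps()
{
#if defined(__GLIBC__)
    // feenableexcept() returns the previously enabled mask, or -1 on failure.
    return feenableexcept(FE_DIVBYZERO | FE_INVALID) != -1;
#else
    return false;   // no portable equivalent, so traps stay disabled
#endif
}
```

Once the traps are enabled, the first offending operation stops the program under gdb at the exact source line, instead of letting a NaN travel through the scene graph.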

Tim



Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-20 Thread George Patterson
On Wed, May 20, 2009 at 6:06 PM, Tim Moore timo...@redhat.com wrote:
 I've added an --enable-fpe argument which, on Linux, will cause an abort or
 core dump on a division-by-zero or other invalid floating point operation,
 including generating NaNs and overflowing float-to-integer conversions. See
 if you can get to the source of the NaNs using that.

 Tim


Hi Tim and All,

As per our conversation on IRC, I have been able to get a backtrace when
using --enable-fpe.

FG was not paused by me; the error occurs very early on (no
sound and no image showing in the splash screen). The machine has a dual-core
Intel processor with an NVIDIA 8600GT video card.

I did add a debug line to the file
src/Instrumentation/inst_vertical_speed_indicator.cxx on line 207.
flightgear$ grep -n DEBUG src/Instrumentation/* |grep GP
src/Instrumentation/inst_vertical_speed_indicator.cxx:207:  
printf("DEBUG GP: SeaIngHG: %fL InternalSeaInHG: %fL DT: %fL\n",
sea_inhg, _internal_sea_inhg, dt);

Either dt is zero or I have the parameter in the printf line wrong.
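
The backtrace in this mail shows dt=0 at the division on line 208, and with both pressures equal that is 0/0, an invalid operation. A minimal sketch of a guard, assuming that reporting a rate of zero for an empty time step is acceptable (the function name is hypothetical and this is not the actual FlightGear fix):

```cpp
// Hypothetical helper: rate of change of a pressure value per second.
// Returns 0 when no time has elapsed instead of dividing by zero.
double pressureRatePerSecond(double current, double previous, double dt)
{
    if (!(dt > 0.0))    // also rejects a NaN dt
        return 0.0;
    return (current - previous) / dt;
}
```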

Please find attached a full backtrace from GDB. Let me know if you
could do with more information.

Regards


George
gpatter...@gorilla-desktop:~$ gdb fgfs
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type show copying
and show warranty for details.
This GDB was configured as x86_64-linux-gnu...
(gdb) run --enable-fpe --aircraft=b1900d --airport=LNCM
Starting program: /usr/local/bin/fgfs --enable-fpe --aircraft=b1900d 
--airport=LNCM
[Thread debugging using libthread_db enabled]
[New Thread 0x7f4ffb2a7790 (LWP 30425)]
[New Thread 0x7f4fec8ee950 (LWP 30428)]
[New Thread 0x7f4fec0ed950 (LWP 30429)]
[New Thread 0x7f4feb0a5950 (LWP 30430)]
[New Thread 0x7f4fea8a4950 (LWP 30431)]
DEBUG GP: SeaIngHG: 29.92L InternalSeaInHG: 29.92L DT: 0.00L

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7f4ffb2a7790 (LWP 30425)]
0x007995d7 in InstVerticalSpeedIndicator::update (this=0xa926050, dt=0) 
at inst_vertical_speed_indicator.cxx:208
208 double rate_sea_inhg_per_s = ( sea_inhg - 
_internal_sea_inhg ) / dt;
(gdb) run --enable-fpe --aircraft=b1900d --airport=LNCM
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /usr/local/bin/fgfs --enable-fpe --aircraft=b1900d 
--airport=LNCM
[Thread debugging using libthread_db enabled]
[New Thread 0x7fd27858a790 (LWP 30432)]
[New Thread 0x7fd269bd1950 (LWP 30433)]
[New Thread 0x7fd2693d0950 (LWP 30434)]
[New Thread 0x7fd263fff950 (LWP 30521)]
[New Thread 0x7fd2637fe950 (LWP 30522)]
DEBUG GP: SeaIngHG: 29.92L InternalSeaInHG: 29.92L DT: 0.00L

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7fd27858a790 (LWP 30432)]
0x007995d7 in InstVerticalSpeedIndicator::update (this=0xa67b430, dt=0) 
at inst_vertical_speed_indicator.cxx:208
208 double rate_sea_inhg_per_s = ( sea_inhg - 
_internal_sea_inhg ) / dt;
(gdb) bt full
#0  0x007995d7 in InstVerticalSpeedIndicator::update (this=0xa67b430, 
dt=0) at inst_vertical_speed_indicator.cxx:208
pressure_inhg = 33.309936
sea_inhg = 29.922
speed_up = <value optimized out>
rate_sea_inhg_per_s = 0
#1  0x009b8b21 in SGSubsystemGroup::Member::update (this=0xa613a10, 
delta_time_sec=<value optimized out>) at subsystem_mgr.cxx:306
No locals.
#2  0x009bb20c in SGSubsystemGroup::update (this=0xa6ed070, 
delta_time_sec=0) at subsystem_mgr.cxx:159
b = -2.72813735
i = 16
#3  0x009b8b21 in 

Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-20 Thread George Patterson
On Thu, May 21, 2009 at 12:55 AM, George Patterson
george.patter...@gmail.com wrote:
 Hi Tim and All,

 As per conversation on IRC ia have been able to get a backtrace when
 using --enable-fpe.

 FG was not paused by me with the error occuring very early on (no
 sound not image showing in the spash screen). Machine is a Dual Core
 Intel processor with a nvidia 8600GT video card.


Oops... The splash screen gets up to the stage of loading scenery objects.

Sorry for any confusion caused.

Regards


George



Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-16 Thread dave perry
Heiko Schulz wrote:

  I think the pa24-250 is another user of this. Does this aircrafts 
 also cause this NaN-errors?


Hi Heiko,

The pa24-250 uses Aircraft/Instruments-3d/comp/comp.xml.

Regards,
Dave P.




[Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-15 Thread Durk Talsma
Hi All,

Thanks to a report by regular Forum poster MD-TERP (a.k.a. Rob), we found a 
way to trigger the infamous NaN warning in a rather reliable manner. Following 
up on this lead, I modified OpenSceneGraph so that it deliberately segfaults 
when the warning message is triggered (I know, I could have set a break point 
there in GDB, but I was lazy). 
 
Although I don't have a fix yet, I've collected quite a bit of information 
about the problem, and have a pretty good insight as to what's going on. 

Rob suggested trying a Flight from KMTN to KSKY, using the Citation-Bravo, 
cruising at FL240, enabling AI Traffic, Traffic Manager, ATC, and Multiplayer. 
The com1 radio was tuned to KMTN tower (121.3), but this is probably not 
relevant. The actual aircraft also probably doesn't matter. Using this setup, 
a NaN error is triggered rather reliably while crossing the Pittsburgh area. 
OpenSceneGraph was modified to include a deliberately malicious piece of code 
designed to trigger a segmentation fault upon printing the error message. 

Two stack traces were obtained, and a third one was examined in more detail 
using GDB, and saved as a core file.

Stack Trace #1 showed that the warning was triggered inside the AI Traffic 
subsystem, in particular in a call to 

globals->get_scenery()->get_elevation_m() 

Stack Trace #2 showed that the warning was triggered in 
src/Environment/ridge_lift.cxx again in a call to get_elevation_m()

So, in conclusion, it looks like get_elevation_m() is the culprit. 
However, it also seemed that the NaN warning was only triggered when AI 
Traffic was activated, which seemed to be at odds with it being triggered by 
ridge_lift as well. 

At this point, Mathias Froelich pointed out to me that 3D model objects 
included in the scene graph are involved in the ground elevation scanning 
process. Mathias also suggested that it should be possible to determine 
the offending model by printing out the this->_name variable at various 
levels of the stack. Traversing the stack, I found that at least one possible 
source of the problem can be found in 

Aircraft/c172p/Models/c172p.xml, 

more particularly in the included model

Aircraft/Instruments-3d/mag-compass.xml, 

specifically in the object with the name Interior. I'm not enough of a 3D 
modeling expert to determine what could possibly be wrong in this object. 
Mathias suggested that a triangle with a zero-sized surface or something like 
that could blow the math. If anybody with more expertise than me could have a 
look at it, please be my guest. 
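
To illustrate how a zero-sized triangle can "blow the math", here is a self-contained sketch. It does not reproduce the OSG intersection code; the Vec3 type and all function names are made up for the example, and it only assumes that some step normalizes a vector derived from the triangle's edges:

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b)
{
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

double length(Vec3 v) { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Twice the triangle's area: zero when two vertices coincide or all three
// lie on one line.
double doubleArea(Vec3 a, Vec3 b, Vec3 c)
{
    return length(cross(sub(b, a), sub(c, a)));
}

bool isDegenerate(Vec3 a, Vec3 b, Vec3 c, double eps = 1e-12)
{
    return !(doubleArea(a, b, c) > eps);
}

// Unit normal without any check: for a degenerate triangle this divides
// zero by zero and every component becomes NaN.
Vec3 unitNormalUnchecked(Vec3 a, Vec3 b, Vec3 c)
{
    Vec3 n = cross(sub(b, a), sub(c, a));
    double len = length(n);
    return {n.x / len, n.y / len, n.z / len};
}
```

A model checker could run isDegenerate() over every triangle of the Interior object; any hit would be a candidate for the NaN source.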

The fact that an included component of the c172 causes the math of the ground 
elevation code to blow up neatly explains why a host of subsystems (at least 
AI Traffic itself and ridge_lift) are vulnerable to this error, while 
disabling AI Traffic does not affect these other systems. By placing a c172 
model in the scene, AI Traffic creates a vulnerability, not only for itself, 
but also for other systems. Although I haven't seen an error triggered by 
traffic generated by the traffic manager (which is, I stress once again, an 
entirely separate system), this is probably coincidental, and due to the fact 
that we currently don't have much Traffic Manager related activity in the test 
area, and therefore the probability of seeing an error generated by the 
traffic manager is, in this particular case, low. Had traffic schedules 
involving the c172 model existed, the traffic manager would likely have caused 
a similar error/vulnerability itself. 

So, to conclude, the mag-compass.ac instrument should be checked for possible 
modeling errors. Also, I would suggest that the ground elevation code print a 
meaningful message in case of an error, so that future errors can be traced 
to their source. This probably can't be done in FlightGear, because it 
would require some modification of the OSG code. 
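
A sketch of what such a message could look like, assuming the code at that point knows the name of the node it is working on. The function describeBadPoint and the message wording are hypothetical and not part of OSG or FlightGear:

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical helper: returns an empty string for a valid point, otherwise
// a message that names the scene graph node the point belongs to.
std::string describeBadPoint(const std::string& nodeName,
                             double x, double y, double z)
{
    if (std::isfinite(x) && std::isfinite(y) && std::isfinite(z))
        return std::string();
    std::ostringstream msg;
    msg << "ground elevation query: non-finite coordinate in node '"
        << nodeName << "'";
    return msg.str();
}
```

Printing the node name along with the warning would have pointed at the Interior object directly, without the need to walk the stack in GDB.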

Until a full fix is in place, a special purpose AI c172 model could be 
created, which doesn't contain the interior. Users not regularly flying the 
c172 could remove the 3d compass, so that it becomes usable again as an AI 
aircraft.

Obviously, I can't rule out that there are more possible causes for the NaN 
warning. However, given that its occurrence seems to be strongly related to 
activating AI Traffic, I suspect they are caused by a relatively narrow set of 
circumstances.

Cheers,
Durk


Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-15 Thread Curtis Olson
Here is a quick thought (not having thought this all the way through.)

Originally we only queried the altitude of a single point beneath our
aircraft.  As we've moved forward, we have created a cache of local
triangles and can query the altitude of each wheel and contact point.  But
we have also added Nasal and C++ interfaces to query the altitude of any
arbitrary point.

I wonder though, if there is no scenery tile loaded for the requesting query
location, what happens?  Are these tiles somehow scheduled for loading?  But
the cache size is fixed so if we have too many far ranging altitude queries,
could we be running into a situation where the requested tiles are flushed
to make room for something else?
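
To make the concern concrete, here is a toy model of a fixed-size cache with least-recently-used eviction. It is only an assumption about how such contention could arise, not a description of FlightGear's actual tile cache; the class and its methods are invented for the example:

```cpp
#include <algorithm>
#include <cstddef>
#include <list>

// Hypothetical fixed-capacity tile cache with least-recently-used eviction.
class TileCacheSketch {
public:
    explicit TileCacheSketch(std::size_t capacity) : capacity_(capacity) {}

    // Marks a tile as needed; loads it, evicting the oldest entry, if absent.
    void request(long tileId)
    {
        if (capacity_ == 0) return;
        auto it = std::find(tiles_.begin(), tiles_.end(), tileId);
        if (it != tiles_.end()) {
            tiles_.erase(it);          // already cached: refresh its age
        } else if (tiles_.size() == capacity_) {
            tiles_.pop_back();         // full: drop least recently used tile
        }
        tiles_.push_front(tileId);
    }

    bool contains(long tileId) const
    {
        return std::find(tiles_.begin(), tiles_.end(), tileId) != tiles_.end();
    }

private:
    std::size_t capacity_;
    std::list<long> tiles_;            // front = most recently used
};
```

In this model, a single altitude query for a far-away tile is enough to push out a tile near the aircraft once the cache is full, which is the kind of flushing asked about above.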

I did a long haul flight recently from Boston to NY to MSP in the alphajet
(current CVS version) and not far out of Boston started dropping tiles right
and left ... it was a mess.  I got to the point where I had no visible tiles
loaded, just flying over empty space as far as the eye could see.

Then later I tried another long cross country to verify the problem, and I
didn't see one dropped tile.

So there is some random goofiness somewhere in our tile caching scheme, and
I don't personally have a good understanding of how these arbitrary position
queries play with our tile caching/scheduling/loading scheme ... I suspect
there could be some contention there.

Best regards,

Curt.



Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-15 Thread Heiko Schulz
Hi,

Interesting to see that a small instrument can cause such big trouble.

What I wonder: Aircraft/Instruments-3d/mag-compass.xml is used not only by the 
c172p but by other aircraft as well. I think the pa24-250 is another user of 
this. Do these aircraft also cause these NaN errors? 

I just checked the Interior object, but right now I can't see where 
the mistake is. Any idea how to check?

Regards
HHS

 




Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-15 Thread Durk Talsma
Hi,

On Friday 15 May 2009 20:10:01 Heiko Schulz wrote:
 Hi,

 interesting to see that a small instruments can make such big troubles.

 What me wonders: Aircraft/Instruments-3d/mag-compass.xml isn't only used by
 the c172p than by other aircrafts as well. I think the pa24-250 is another
 user of this. Does this aircrafts also cause this NaN-errors?

I'm not sure about this, but my guess is that the trouble doesn't arise when 
the mag-compass is part of the user aircraft, but only when it's part of the 
exterior world, i.e. when it is part of an AI aircraft. Also, it's possible 
that the instrument by itself may be okay, but triggers an error in 
interaction with other scene elements. 

Cheers,
Durk


Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-15 Thread Heiko Schulz
It seems my first reply, with the c172p.ac attached, didn't get through. 
I removed some duplicate vertices from the Interior object, and I hope these 
were the cause of the trouble. 
You will find the improved model here:
http://gitorious.org/c172p
command: git clone git://gitorious.org/c172p/mainline.git
Regards
HHS




Re: [Flightgear-devel] Progress report on the infamous error in TriangleIntersect NAN Problem

2009-05-15 Thread Torsten Dreyer
 I'm not sure about this, but my estimate is that the trouble doesnt arise
 when the mag-compass is part of the user aircraft, but only when it's part
 of the exterior world, i.e. when part of an AI aircraft. Also, it's
 possible that the instrument by itself may be okay, but triggers an error
 in interaction with other scene elements.
The model looks good at first glance. The Interior object has several 
single-sided, 4-vertex, non-planar surfaces, all facing inwards. That is 
perfectly legal. Just in case any of these anomalies causes trouble, I have 
attached a modified version of the mag-compass.ac with all Interior surfaces 
two-sided and triangulated.

Maybe you want to give it a try.

Torsten


mag-compass.ac.bz2
Description: BZip2 compressed data