Re: [Flightgear-devel] nan-a-palooza
On Tue, Dec 15, 2009 at 2:11 AM, Csaba Halász csaba.hal...@gmail.com wrote:
> That may be the nasal bug Jacob is seeing. I could reproduce it and also made a little test case that I am gonna submit as a gcc bug report. It is clearly accessing the double member of the union before it has been established as valid.

Even though there is a related gcc bug, we must realize that the current implementation of naRef is not standard C, it is relying on undefined behaviour:

> With one exception, if the value of a member of a union object is used when the most recent store to the object was to a different member, the behavior is implementation-defined.

That is, setting the num member of a naRef and then examining the ref member (which IS_NUM is doing) is undefined. I believe simply expecting the two members to overlay each other is also relying on undefined behaviour.

The best way would be to make naRef bigger by adding a separate tag and not mess with nonstandard stuff. I have also tested making naRef a char[8] and memcpy-ing the relevant bytes out. That still relies on the actual memory layout of a double but should otherwise be standard compliant while keeping the current size. GCC optimized away the memcpy-s for me.

Thoughts?

--
Csaba/Jester

___
Flightgear-devel mailing list
Flightgear-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/flightgear-devel
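The char[8] plus memcpy idea mentioned above could look roughly like the sketch below. It is an illustration only, not the Nasal code: `boxed_ref`, `box_num`, `box_ref`, `is_ref`, `get_num` and `get_payload` are hypothetical names, and the sketch assumes an 8-byte IEEE 754 double that uses the same byte order as `uint64_t` on the host.

```c
#include <stdint.h>
#include <string.h>

#define REFTAG 0x7ff56789u  /* same value as NASAL_REFTAG in the thread */

/* Hypothetical stand-in for naRef: eight raw bytes instead of a union. */
typedef struct { unsigned char bytes[8]; } boxed_ref;

/* Store a number by copying the object representation of the double. */
static boxed_ref box_num(double d)
{
    boxed_ref r;
    memcpy(r.bytes, &d, sizeof d);
    return r;
}

/* Store a 32-bit reference payload with the tag in the upper 32 bits. */
static boxed_ref box_ref(uint32_t payload)
{
    uint64_t bits = ((uint64_t)REFTAG << 32) | payload;
    boxed_ref r;
    memcpy(r.bytes, &bits, sizeof bits);
    return r;
}

/* Read all eight bytes back as an integer; no floating-point load. */
static uint64_t raw_bits(boxed_ref r)
{
    uint64_t bits;
    memcpy(&bits, r.bytes, sizeof bits);
    return bits;
}

/* The tag test uses integer operations only. */
static int is_ref(boxed_ref r)
{
    return (uint32_t)(raw_bits(r) >> 32) == REFTAG;
}

/* Only meaningful when is_ref(r) is false. */
static double get_num(boxed_ref r)
{
    double d;
    memcpy(&d, r.bytes, sizeof d);
    return d;
}

/* Only meaningful when is_ref(r) is true. */
static uint32_t get_payload(boxed_ref r)
{
    return (uint32_t)raw_bits(r);
}
```

Because every access goes through memcpy, no union member is read after a different member was written, and the value is never interpreted as a double until the tag check has ruled out a reference.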
Re: [Flightgear-devel] nan-a-palooza
Relying on undefined behavior is definitely no good... It might work fine for a long time, but it will come back to bite you eventually. Finding a way to do it in a compliant way without increasing the size would be ideal, I guess, but if you need to increase the size, so be it. Nasal is an integral part of FlightGear and is so widely spread through every part of the sim that it needs to be done properly and reliably. I'll be happy to test whatever you come up with. I personally will be very disappointed if all these NaN issues continue into the next release...

cheers!
Re: [Flightgear-devel] nan-a-palooza
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0xb627aa20 (LWP 30813)]
0x0865ece2 in findcell (hr=0xeb43550, key= {num = nan(0x567891077bfa8), ref = {ptr = {obj = 0x1077bfa8, str = 0x1077bfa8, vec = 0x1077bfa8, hash = 0x1077bfa8, code = 0x1077bfa8, func = 0x1077bfa8, ccode = 0x1077bfa8, ghost = 0x1077bfa8}, reftag = 2146789257}}, hash=4111002719) at ../../../simgear/nasal/hash.c:67
67      if(IS_NUM(a)) return a.num == b.num;
Current language: auto; currently c
(gdb) bt
#0  0x0865ece2 in findcell (hr=0xeb43550, key= {num = nan(0x567891077bfa8), ref = {ptr = {obj = 0x1077bfa8, str = 0x1077bfa8, vec = 0x1077bfa8, hash = 0x1077bfa8, code = 0x1077bfa8, func = 0x1077bfa8, ccode = 0x1077bfa8, ghost = 0x1077bfa8}, reftag = 2146789257}}, hash=4111002719) at ../../../simgear/nasal/hash.c:67
#1  0x0865f35d in naHash_get (hash=<value optimized out>, key= {num = nan(0x567891077bfa8), ref = {ptr = {obj = 0x1077bfa8, str = 0x1077bfa8, vec = 0x1077bfa8, hash = 0x1077bfa8, code = 0x1077bfa8, func = 0x1077bfa8, ccode = 0x1077bfa8, ghost = 0x1077bfa8}, reftag = 2146789257}}, out=0xbfdea690) at ../../../simgear/nasal/hash.c:130
#2  0x0865be74 in naInternSymbol (sym= {num = nan(0x567891077bfa8), ref = {ptr = {obj = 0x1077bfa8, str = 0x1077bfa8, vec = 0x1077bfa8, hash = 0x1077bfa8, code = 0x1077bfa8, func = 0x1077bfa8, ccode = 0x1077bfa8, ghost = 0x1077bfa8}, reftag = 2146789257}}) at ../../../simgear/nasal/codegen.c:74
#3  0x086586c9 in naNewContext () at ../../../simgear/nasal/code.c:190
#4  0x084c3bbe in FGNasalSys::init (this=0xeb30d50) at ../../../src/Scripting/NasalSys.cxx:650
#5  0x0808e3f0 in fgInitSubsystems () at ../../../src/Main/fg_init.cxx:1709
#6  0x0806d0f8 in fgIdleFunction () at ../../../src/Main/main.cxx:774
#7  0x080bbec2 in fgOSMainLoop () at ../../../src/Main/fg_os_osgviewer.cxx:172
#8  0x0806d8d5 in fgMainInit (argc=10, argv=0xbfdeab04) at ../../../src/Main/main.cxx:920
#9  0x0806baef in main (argc=10, argv=0xbfdeab04) at ../../../src/Main/bootstrap.cxx:229
Re: [Flightgear-devel] nan-a-palooza
On Mon, Dec 14, 2009 at 1:11 PM, Jacob Burbach jmburb...@gmail.com wrote:
> Program received signal SIGFPE, Arithmetic exception.
> [Switching to Thread 0xb627aa20 (LWP 30813)]
> 0x0865ece2 in findcell (hr=0xeb43550, key= {num = nan(0x567891077bfa8), ref = {ptr = {obj = 0x1077bfa8, str = 0x1077bfa8, vec = 0x1077bfa8, hash = 0x1077bfa8, code = 0x1077bfa8, func = 0x1077bfa8, ccode = 0x1077bfa8, ghost = 0x1077bfa8}, reftag = 2146789257}}, hash=4111002719) at ../../../simgear/nasal/hash.c:67
> 67      if(IS_NUM(a)) return a.num == b.num;

I believe this may be a compiler bug. Can you provide a disassembly around that line? 50 instructions in each direction should be fine, I think.

Nasal stores values in a tricky union and (on 32 bit systems) it uses the reftag to differentiate between numbers and pointers. The IS_NUM check is like this:

    #define NASAL_REFTAG 0x7ff56789 // == 2,146,789,257 decimal
    #define IS_REF(r) ((r).ref.reftag == NASAL_REFTAG)
    #define IS_NUM(r) (!IS_REF(r))

Your gdb output shows that reftag is in fact equal to NASAL_REFTAG, so IS_NUM should be false and thus the comparison that raises the FPE should not be executed. I suspect gdb generated code that at least loads one of the values into the FPU, triggering the FPE.

--
Csaba/Jester
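As a side note on why a tagged reference shows up in gdb as `nan(0x567891077bfa8)`: with 0x7ff56789 in the upper 32 bits, the exponent field of the double is all ones and the fraction is nonzero, which is the IEEE 754 encoding of a NaN. The sketch below checks this with integer arithmetic only. The helper names are made up for illustration, and the "signalling" classification assumes the x86 convention in which a cleared top fraction bit marks a signalling NaN. If that assumption holds, it may explain why merely loading such a value into the FPU can raise an exception when the invalid-operation trap is enabled.

```c
#include <stdint.h>

#define NASAL_REFTAG 0x7ff56789u

/* Rebuild the 64-bit pattern: tag in the high word, pointer in the low word. */
static uint64_t ref_bits(uint32_t ptr)
{
    return ((uint64_t)NASAL_REFTAG << 32) | ptr;
}

/* Bits 52..62 hold the exponent; all ones means infinity or NaN. */
static int exponent_all_ones(uint64_t bits)
{
    return ((bits >> 52) & 0x7ffu) == 0x7ffu;
}

/* Bits 0..51 hold the fraction; nonzero (with exponent all ones) means NaN. */
static uint64_t fraction(uint64_t bits)
{
    return bits & 0x000fffffffffffffULL;
}

/* Bit 51 is the quiet bit under the x86 convention (assumption, see above). */
static int quiet_bit_set(uint64_t bits)
{
    return (int)((bits >> 51) & 1u);
}

static int is_nan_bits(uint64_t bits)
{
    return exponent_all_ones(bits) && fraction(bits) != 0;
}

static int is_signalling_nan_bits(uint64_t bits)
{
    return is_nan_bits(bits) && !quiet_bit_set(bits);
}
```

For the pointer value 0x1077bfa8 from the backtrace, `fraction(ref_bits(0x1077bfa8u))` is 0x567891077bfa8, the same payload gdb prints inside `nan(...)`.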
Re: [Flightgear-devel] nan-a-palooza
Not sure what you meant about gdb generating code to cause it, I get the same error when run outside of gdb. Assembly of the function below, if you need something else let me know.

0x0865ec50 <findcell+0>:    push   %ebp
0x0865ec51 <findcell+1>:    mov    %esp,%ebp
0x0865ec53 <findcell+3>:    push   %edi
0x0865ec54 <findcell+4>:    push   %esi
0x0865ec55 <findcell+5>:    push   %ebx
0x0865ec56 <findcell+6>:    xor    %ebx,%ebx
0x0865ec58 <findcell+8>:    sub    $0x5c,%esp
0x0865ec5b <findcell+11>:   mov    0x8(%ebp),%edi
0x0865ec5e <findcell+14>:   mov    %edx,-0x38(%ebp)
0x0865ec61 <findcell+17>:   mov    0x4(%eax),%edx
0x0865ec64 <findcell+20>:   mov    %eax,-0x54(%ebp)
0x0865ec67 <findcell+23>:   mov    $0x1,%eax
0x0865ec6c <findcell+28>:   mov    %ecx,-0x34(%ebp)
0x0865ec6f <findcell+31>:   movl   $0x0,-0x4c(%ebp)
0x0865ec76 <findcell+38>:   lea    0x1(%edx),%ecx
0x0865ec79 <findcell+41>:   shl    %cl,%eax
0x0865ec7b <findcell+43>:   sub    $0x1,%eax
0x0865ec7e <findcell+46>:   mov    %eax,-0x28(%ebp)
0x0865ec81 <findcell+49>:   mov    -0x28(%ebp),%ecx
0x0865ec84 <findcell+52>:   lea    0x1(%edi,%edi,1),%eax
0x0865ec88 <findcell+56>:   and    %ecx,%eax
0x0865ec8a <findcell+58>:   test   %edx,%edx
0x0865ec8c <findcell+60>:   mov    %eax,-0x24(%ebp)
0x0865ec8f <findcell+63>:   je     0x865eca6 <findcell+86>
0x0865ec91 <findcell+65>:   mov    $0x20,%ecx
0x0865ec96 <findcell+70>:   mov    %edi,%ebx
0x0865ec98 <findcell+72>:   sub    %edx,%ecx
0x0865ec9a <findcell+74>:   shr    %cl,%ebx
0x0865ec9c <findcell+76>:   lea    0x0(,%ebx,4),%esi
0x0865eca3 <findcell+83>:   mov    %esi,-0x4c(%ebp)
0x0865eca6 <findcell+86>:   mov    -0x54(%ebp),%eax
0x0865eca9 <findcell+89>:   mov    %edx,%ecx
0x0865ecab <findcell+91>:   add    $0xc,%eax
0x0865ecae <findcell+94>:   and    $0x7,%eax
0x0865ecb1 <findcell+97>:   lea    0x7(%eax),%edi
0x0865ecb4 <findcell+100>:  and    $0x8,%edi
0x0865ecb7 <findcell+103>:  sub    %eax,%edi
0x0865ecb9 <findcell+105>:  mov    $0x10,%eax
0x0865ecbe <findcell+110>:  shl    %cl,%eax
0x0865ecc0 <findcell+112>:  lea    0xc(%edi,%eax,1),%eax
0x0865ecc4 <findcell+116>:  mov    %eax,-0x2c(%ebp)
0x0865ecc7 <findcell+119>:  mov    -0x54(%ebp),%eax
0x0865ecca <findcell+122>:  mov    -0x2c(%ebp),%esi
0x0865eccd <findcell+125>:  add    -0x4c(%ebp),%eax
0x0865ecd0 <findcell+128>:  mov    %edi,-0x30(%ebp)
0x0865ecd3 <findcell+131>:  mov    (%eax,%esi,1),%eax
0x0865ecd6 <findcell+134>:  cmp    $0x,%eax
0x0865ecd9 <findcell+137>:  je     0x865edb8 <findcell+360>
0x0865ecdf <findcell+143>:  fldl   -0x38(%ebp)
0x0865ece2 <findcell+146>:  fstpl  -0x48(%ebp)
0x0865ece5 <findcell+149>:  jmp    0x865ed21 <findcell+209>
0x0865ece7 <findcell+151>:  nop
0x0865ece8 <findcell+152>:  fldl   -0x20(%ebp)
0x0865eceb <findcell+155>:  fldl   -0x48(%ebp)
0x0865ecee <findcell+158>:  fucompp
0x0865ecf0 <findcell+160>:  fnstsw %ax
0x0865ecf2 <findcell+162>:  sahf
0x0865ecf3 <findcell+163>:  sete   %al
0x0865ecf6 <findcell+166>:  setnp  %dl
0x0865ecf9 <findcell+169>:  and    %edx,%eax
0x0865ecfb <findcell+171>:  movzbl %al,%eax
0x0865ecfe <findcell+174>:  test   %eax,%eax
0x0865ed00 <findcell+176>:  jne    0x865edb8 <findcell+360>
0x0865ed06 <findcell+182>:  mov    -0x2c(%ebp),%edx
0x0865ed09 <findcell+185>:  add    -0x24(%ebp),%ebx
0x0865ed0c <findcell+188>:  mov    -0x54(%ebp),%ecx
0x0865ed0f <findcell+191>:  and    -0x28(%ebp),%ebx
0x0865ed12 <findcell+194>:  lea    (%edx,%ebx,4),%eax
0x0865ed15 <findcell+197>:  mov    (%ecx,%eax,1),%eax
0x0865ed18 <findcell+200>:  cmp    $0x,%eax
0x0865ed1b <findcell+203>:  je     0x865edb8 <findcell+360>
0x0865ed21 <findcell+209>:  cmp    $0xfffe,%eax
0x0865ed24 <findcell+212>:  je     0x865ed06 <findcell+182>
0x0865ed26 <findcell+214>:  mov    -0x30(%ebp),%ecx
0x0865ed29 <findcell+217>:  shl    $0x4,%eax
0x0865ed2c <findcell+220>:  add    -0x54(%ebp),%eax
0x0865ed2f <findcell+223>:  cmpl   $0x7ff56789,-0x34(%ebp)
0x0865ed36 <findcell+230>:  mov    0xc(%ecx,%eax,1),%edx
0x0865ed3a <findcell+234>:  mov    0x10(%ecx,%eax,1),%ecx
0x0865ed3e <findcell+238>:  mov    %edx,-0x20(%ebp)
0x0865ed41 <findcell+241>:  mov    %ecx,-0x1c(%ebp)
0x0865ed44 <findcell+244>:  jne    0x865ece8 <findcell+152>
0x0865ed46 <findcell+246>:  mov    -0x20(%ebp),%eax
0x0865ed49 <findcell+249>:  cmp    %eax,-0x38(%ebp)
0x0865ed4c <findcell+252>:  je     0x865edb8 <findcell+360>
0x0865ed4e <findcell+254>:  mov    -0x38(%ebp),%edx
0x0865ed51 <findcell+257>:  mov    -0x34(%ebp),%ecx
0x0865ed54 <findcell+260>:  mov    %edx,(%esp)
0x0865ed57 <findcell+263>:  mov    %ecx,0x4(%esp)
0x0865ed5b <findcell+267>:  call   0x8665d60 <naStr_len>
0x0865ed60 <findcell+272>:  mov    -0x20(%ebp),%esi
0x0865ed63 <findcell+275>:  mov    -0x1c(%ebp),%edi
0x0865ed66 <findcell+278>:  mov    %esi,(%esp)
0x0865ed69 <findcell+281>:  mov    %edi,0x4(%esp)
Re: [Flightgear-devel] nan-a-palooza
On Mon, Dec 14, 2009 at 3:25 PM, Jacob Burbach jmburb...@gmail.com wrote:
> Not sure what you meant about gdb generating code to cause it, I get the same error when run outside of gdb. Assembly of the function below, if you need something else let me know.

I meant gcc, sorry :) Thanks for the listing. Looks like the number is loaded by the code before the IS_NUM check, but I suspect it is for passing to the equal function (which got inlined - confusing, eh?). If it were C++ code we could change it to a const reference ... as it is, I think we'll have to try the const pointer route as per attached patch :(

--
Csaba/Jester

diff --git a/simgear/nasal/hash.c b/simgear/nasal/hash.c
index 1efe8fb..0aebc15 100644
--- a/simgear/nasal/hash.c
+++ b/simgear/nasal/hash.c
@@ -62,12 +62,12 @@ static unsigned int refhash(naRef key)
     }
 }
 
-static int equal(naRef a, naRef b)
+static int equal(const naRef* a, const naRef* b)
 {
-    if(IS_NUM(a)) return a.num == b.num;
-    if(PTR(a).obj == PTR(b).obj) return 1;
-    if(naStr_len(a) != naStr_len(b)) return 0;
-    return memcmp(naStr_data(a), naStr_data(b), naStr_len(a)) == 0;
+    if(IS_NUM(*a)) return a->num == b->num;
+    if(PTR(*a).obj == PTR(*b).obj) return 1;
+    if(naStr_len(*a) != naStr_len(*b)) return 0;
+    return memcmp(naStr_data(*a), naStr_data(*b), naStr_len(*a)) == 0;
 }
 
 /* Returns the index of a cell that either contains a matching key, or
@@ -76,7 +76,7 @@ static int findcell(struct HashRec *hr, naRef key, unsigned int hash)
 {
     int i, mask = POW2(hr->lgsz+1)-1, step = (2*hash+1) & mask;
     for(i=HBITS(hr,hash); TAB(hr)[i] != ENT_EMPTY; i=(i+step)&mask)
-        if(TAB(hr)[i] != ENT_DELETED && equal(key, ENTS(hr)[TAB(hr)[i]].key))
+        if(TAB(hr)[i] != ENT_DELETED && equal(&key, &ENTS(hr)[TAB(hr)[i]].key))
             break;
     return i;
 }
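For readers unfamiliar with the loop the patch touches: findcell is an open-addressing probe over a power-of-two table. The sketch below is a simplified stand-in, not the SimGear code. It uses plain int keys, takes the starting index from the low bits of the hash rather than HBITS, and the sentinel values and the name `find_cell` are assumptions chosen for illustration. The step is forced to be odd, so with a power-of-two table size the probe visits every cell before repeating.

```c
#define ENT_EMPTY   (-1)  /* illustrative sentinel: cell never used */
#define ENT_DELETED (-2)  /* illustrative sentinel: cell used, then deleted */

/* tab[] maps cells to entry indices (or a sentinel); keys[] holds the
 * key of each entry. Returns the cell that holds `key`, or the first
 * empty cell on its probe path. The table must contain at least one
 * empty cell, otherwise a missing key would loop forever. */
static int find_cell(const int *tab, const int *keys, int lgsz,
                     int key, unsigned int hash)
{
    int mask = (1 << lgsz) - 1;
    int step = (int)((2u * hash + 1u) & (unsigned int)mask); /* always odd */
    int i = (int)(hash & (unsigned int)mask);

    while (tab[i] != ENT_EMPTY) {
        /* Deleted cells are skipped but do not end the probe. */
        if (tab[i] != ENT_DELETED && keys[tab[i]] == key)
            break;
        i = (i + step) & mask;
    }
    return i;
}
```

The key comparison sits in the innermost part of the loop, which is presumably why the compiler is tempted to hoist loads of the key out of it.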
Re: [Flightgear-devel] nan-a-palooza
I applied your hash patch, but no dice.

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0xb62cba20 (LWP 25297)]
0x0865f79c in findcell (hr=0x107b5490, key= {num = nan(0x56789123dabd8), ref = {ptr = {obj = 0x123dabd8, str = 0x123dabd8, vec = 0x123dabd8, hash = 0x123dabd8, code = 0x123dabd8, func = 0x123dabd8, ccode = 0x123dabd8, ghost = 0x123dabd8}, reftag = 2146789257}}, hash=4111002719) at ../../../simgear/nasal/hash.c:67
67      if(IS_NUM(*a)) return a->num == b->num;
Current language: auto; currently c
(gdb) bt
#0  0x0865f79c in findcell (hr=0x107b5490, key= {num = nan(0x56789123dabd8), ref = {ptr = {obj = 0x123dabd8, str = 0x123dabd8, vec = 0x123dabd8, hash = 0x123dabd8, code = 0x123dabd8, func = 0x123dabd8, ccode = 0x123dabd8, ghost = 0x123dabd8}, reftag = 2146789257}}, hash=4111002719) at ../../../simgear/nasal/hash.c:67
#1  0x0865fe0d in naHash_get (hash=<value optimized out>, key= {num = nan(0x56789123dabd8), ref = {ptr = {obj = 0x123dabd8, str = 0x123dabd8, vec = 0x123dabd8, hash = 0x123dabd8, code = 0x123dabd8, func = 0x123dabd8, ccode = 0x123dabd8, ghost = 0x123dabd8}, reftag = 2146789257}}, out=0xbf83b0e0) at ../../../simgear/nasal/hash.c:130
#2  0x0865c934 in naInternSymbol (sym= {num = nan(0x56789123dabd8), ref = {ptr = {obj = 0x123dabd8, str = 0x123dabd8, vec = 0x123dabd8, hash = 0x123dabd8, code = 0x123dabd8, func = 0x123dabd8, ccode = 0x123dabd8, ghost = 0x123dabd8}, reftag = 2146789257}}) at ../../../simgear/nasal/codegen.c:74
#3  0x08659189 in naNewContext () at ../../../simgear/nasal/code.c:190
#4  0x084c469e in FGNasalSys::init (this=0x1079e578) at ../../../src/Scripting/NasalSys.cxx:650
#5  0x0808e3f0 in fgInitSubsystems () at ../../../src/Main/fg_init.cxx:1709
#6  0x0806d0f8 in fgIdleFunction () at ../../../src/Main/main.cxx:774
#7  0x080bbf82 in fgOSMainLoop () at ../../../src/Main/fg_os_osgviewer.cxx:172
#8  0x0806d8d5 in fgMainInit (argc=10, argv=0xbf83b554) at ../../../src/Main/main.cxx:920
#9  0x0806baef in main (argc=10, argv=0xbf83b554) at ../../../src/Main/bootstrap.cxx:229
Re: [Flightgear-devel] nan-a-palooza
On Mon, Dec 14, 2009 at 4:37 PM, Jacob Burbach jmburb...@gmail.com wrote:
> I applied your hash patch, but no dice.

Then I have no idea why the code is loading the value. Got a new asm listing? :)

--
Csaba/Jester
Re: [Flightgear-devel] nan-a-palooza
Dump of assembler code for function findcell:
0x0865f710 <findcell+0>:    push   %ebp
0x0865f711 <findcell+1>:    mov    %esp,%ebp
0x0865f713 <findcell+3>:    push   %edi
0x0865f714 <findcell+4>:    xor    %edi,%edi
0x0865f716 <findcell+6>:    push   %esi
0x0865f717 <findcell+7>:    push   %ebx
0x0865f718 <findcell+8>:    xor    %ebx,%ebx
0x0865f71a <findcell+10>:   sub    $0x4c,%esp
0x0865f71d <findcell+13>:   mov    0x4(%eax),%esi
0x0865f720 <findcell+16>:   mov    %eax,-0x38(%ebp)
0x0865f723 <findcell+19>:   mov    $0x1,%eax
0x0865f728 <findcell+24>:   mov    %ecx,-0x3c(%ebp)
0x0865f72b <findcell+27>:   mov    %edx,-0x40(%ebp)
0x0865f72e <findcell+30>:   mov    0x8(%ebp),%edx
0x0865f731 <findcell+33>:   lea    0x1(%esi),%ecx
0x0865f734 <findcell+36>:   shl    %cl,%eax
0x0865f736 <findcell+38>:   sub    $0x1,%eax
0x0865f739 <findcell+41>:   mov    %eax,-0x30(%ebp)
0x0865f73c <findcell+44>:   mov    -0x30(%ebp),%ecx
0x0865f73f <findcell+47>:   lea    0x1(%edx,%edx,1),%eax
0x0865f743 <findcell+51>:   and    %ecx,%eax
0x0865f745 <findcell+53>:   test   %esi,%esi
0x0865f747 <findcell+55>:   mov    %eax,-0x2c(%ebp)
0x0865f74a <findcell+58>:   je     0x865f75e <findcell+78>
0x0865f74c <findcell+60>:   mov    $0x20,%ecx
0x0865f751 <findcell+65>:   mov    %edx,%ebx
0x0865f753 <findcell+67>:   sub    %esi,%ecx
0x0865f755 <findcell+69>:   shr    %cl,%ebx
0x0865f757 <findcell+71>:   lea    0x0(,%ebx,4),%edi
0x0865f75e <findcell+78>:   mov    -0x38(%ebp),%eax
0x0865f761 <findcell+81>:   mov    %esi,%ecx
0x0865f763 <findcell+83>:   add    $0xc,%eax
0x0865f766 <findcell+86>:   and    $0x7,%eax
0x0865f769 <findcell+89>:   lea    0x7(%eax),%edx
0x0865f76c <findcell+92>:   and    $0x8,%edx
0x0865f76f <findcell+95>:   sub    %eax,%edx
0x0865f771 <findcell+97>:   mov    $0x10,%eax
0x0865f776 <findcell+102>:  shl    %cl,%eax
0x0865f778 <findcell+104>:  lea    0xc(%edx,%eax,1),%eax
0x0865f77c <findcell+108>:  mov    %eax,-0x34(%ebp)
0x0865f77f <findcell+111>:  mov    -0x38(%ebp),%eax
0x0865f782 <findcell+114>:  mov    -0x34(%ebp),%ecx
0x0865f785 <findcell+117>:  add    %edi,%eax
0x0865f787 <findcell+119>:  mov    (%eax,%ecx,1),%eax
0x0865f78a <findcell+122>:  cmp    $0x,%eax
0x0865f78d <findcell+125>:  je     0x865f870 <findcell+352>
0x0865f793 <findcell+131>:  mov    -0x3c(%ebp),%ecx
0x0865f796 <findcell+134>:  add    $0xc,%edx
0x0865f799 <findcell+137>:  fldl   -0x40(%ebp)
0x0865f79c <findcell+140>:  fstpl  -0x20(%ebp)
0x0865f79f <findcell+143>:  mov    %edx,-0x44(%ebp)
0x0865f7a2 <findcell+146>:  mov    %ecx,-0x14(%ebp)
0x0865f7a5 <findcell+149>:  mov    -0x40(%ebp),%ecx
0x0865f7a8 <findcell+152>:  mov    %ecx,-0x24(%ebp)
0x0865f7ab <findcell+155>:  jmp    0x865f7e8 <findcell+216>
0x0865f7ad <findcell+157>:  lea    0x0(%esi),%esi
0x0865f7b0 <findcell+160>:  fldl   (%esi)
0x0865f7b2 <findcell+162>:  fldl   -0x20(%ebp)
0x0865f7b5 <findcell+165>:  fucompp
0x0865f7b7 <findcell+167>:  fnstsw %ax
0x0865f7b9 <findcell+169>:  sahf
0x0865f7ba <findcell+170>:  sete   %al
0x0865f7bd <findcell+173>:  setnp  %dl
0x0865f7c0 <findcell+176>:  and    %edx,%eax
0x0865f7c2 <findcell+178>:  movzbl %al,%eax
0x0865f7c5 <findcell+181>:  test   %eax,%eax
0x0865f7c7 <findcell+183>:  jne    0x865f870 <findcell+352>
0x0865f7cd <findcell+189>:  mov    -0x34(%ebp),%ecx
0x0865f7d0 <findcell+192>:  add    -0x2c(%ebp),%ebx
0x0865f7d3 <findcell+195>:  mov    -0x38(%ebp),%edx
0x0865f7d6 <findcell+198>:  and    -0x30(%ebp),%ebx
0x0865f7d9 <findcell+201>:  lea    (%ecx,%ebx,4),%eax
0x0865f7dc <findcell+204>:  mov    (%edx,%eax,1),%eax
0x0865f7df <findcell+207>:  cmp    $0x,%eax
0x0865f7e2 <findcell+210>:  je     0x865f870 <findcell+352>
0x0865f7e8 <findcell+216>:  cmp    $0xfffe,%eax
0x0865f7eb <findcell+219>:  je     0x865f7cd <findcell+189>
0x0865f7ed <findcell+221>:  mov    -0x38(%ebp),%esi
0x0865f7f0 <findcell+224>:  shl    $0x4,%eax
0x0865f7f3 <findcell+227>:  add    -0x44(%ebp),%eax
0x0865f7f6 <findcell+230>:  add    %eax,%esi
0x0865f7f8 <findcell+232>:  cmpl   $0x7ff56789,-0x14(%ebp)
0x0865f7ff <findcell+239>:  jne    0x865f7b0 <findcell+160>
0x0865f801 <findcell+241>:  mov    -0x24(%ebp),%eax
0x0865f804 <findcell+244>:  cmp    (%esi),%eax
0x0865f806 <findcell+246>:  je     0x865f870 <findcell+352>
0x0865f808 <findcell+248>:  mov    -0x40(%ebp),%edx
0x0865f80b <findcell+251>:  mov    -0x3c(%ebp),%ecx
0x0865f80e <findcell+254>:  mov    %edx,(%esp)
0x0865f811 <findcell+257>:  mov    %ecx,0x4(%esp)
0x0865f815 <findcell+261>:  call   0x8666810 <naStr_len>
0x0865f81a <findcell+266>:  mov    0x4(%esi),%edi
0x0865f81d <findcell+269>:  mov    (%esi),%esi
0x0865f81f <findcell+271>:  mov    %edi,0x4(%esp)
0x0865f823 <findcell+275>:  mov    %esi,(%esp)
0x0865f826 <findcell+278>:  mov    %eax,-0x28(%ebp)
0x0865f829 <findcell+281>:  call   0x8666810 <naStr_len>
0x0865f82e
Re: [Flightgear-devel] nan-a-palooza
Status summary:

0) I merged in Jester's nan-fixes. That was easy:

    git remote add -t nan-fixes jester git://gitorious.org/~jester/fg/jesters-clone.git
    git pull jester
    make

I have not explicitly disabled real-weather-fetch, but it appears to be disabled by default. The ai-traffic remains enabled. This is the default.

So here's what I observe in this state:

1) If I want to get anything done, I cannot --enable-fpe because that leads to an early FP exception, while the splash screen is still up. I cannot say whether this is due to intrinsic badness in the fglrx driver, or whether the driver is being passed unwholesome data.

    #include <tirade about non-open-source drivers>

2) It is still easy to get SEGVs or ABORTs (due to corrupt doubly-linked lists) when exiting from the sim. Some logs including tracebacks are here:

    http://www.av8n.com/fly/fgfs//corrupt--21540.log
    http://www.av8n.com/fly/fgfs//corrupt--21628.log

This appears to be about 90% reproducible chez moi. It is at least as likely to happen after a short (30 second) simulator run as after a long (90 minute) one.

3) The good part is that I can now sit stationary on the ground, taxi, and even fly without seeing nan messages spewed on the console. This is a major improvement.

I attribute this improvement to Jester's patches, since AFAICT that is the only significant thing that has changed since a couple of days ago.

Some seriously laborious debugging must have gone into preparing those patches. It is much appreciated.

I am particularly amused by commit a3d5fda6b09e from 24 Oct 2009. It replaced 17 lines with 1 line and produced a better result.
Re: [Flightgear-devel] nan-a-palooza
On Tue, Dec 15, 2009 at 1:46 AM, John Denker j...@av8n.com wrote:
> Status summary:
> 0) I merged in Jester's nan-fixes. That was easy:
>     git remote add -t nan-fixes jester git://gitorious.org/~jester/fg/jesters-clone.git
>     git pull jester
>     make

We want GIT! We want GIT! :D

> 1) If I want to get anything done, I cannot --enable-fpe because that leads to an early FP exception, while the splash screen is still up.

That may be the nasal bug Jacob is seeing. I could reproduce it and also made a little test case that I am gonna submit as a gcc bug report. It is clearly accessing the double member of the union before it has been established as valid. I have adjusted the workaround I had posted earlier so that the bug is no longer triggered for me. (See attachment.) Also, compiling with -O3 makes it go away here.

> 2) It is still easy to get SEGVs or ABORTs (due to corrupt doubly-linked lists) when exiting from the sim. Some logs including tracebacks are here
>     http://www.av8n.com/fly/fgfs//corrupt--21540.log
>     http://www.av8n.com/fly/fgfs//corrupt--21628.log
> This appears to be about 90% reproducible chez moi. It is at least as likely to happen after a short (30 second) simulator run as after a long (90 minute) one.

I can confirm this as well. I have already reported two potential issues that need to be investigated (based on valgrind reports). Also, Tat has started a cvs bisect: so far he has bracketed the problem between the 1st of October and the 1st of November, I think.
--
Csaba/Jester

diff --git a/simgear/nasal/hash.c b/simgear/nasal/hash.c
index 1efe8fb..f9683ee 100644
--- a/simgear/nasal/hash.c
+++ b/simgear/nasal/hash.c
@@ -62,28 +62,28 @@ static unsigned int refhash(naRef key)
     }
 }
 
-static int equal(naRef a, naRef b)
+static int equal(const naRef* a, const naRef* b)
 {
-    if(IS_NUM(a)) return a.num == b.num;
-    if(PTR(a).obj == PTR(b).obj) return 1;
-    if(naStr_len(a) != naStr_len(b)) return 0;
-    return memcmp(naStr_data(a), naStr_data(b), naStr_len(a)) == 0;
+    if(IS_NUM(*a)) return a->num == b->num;
+    if(PTR(*a).obj == PTR(*b).obj) return 1;
+    if(naStr_len(*a) != naStr_len(*b)) return 0;
+    return memcmp(naStr_data(*a), naStr_data(*b), naStr_len(*a)) == 0;
 }
 
 /* Returns the index of a cell that either contains a matching key, or
  * is the empty slot to receive a new insertion. */
-static int findcell(struct HashRec *hr, naRef key, unsigned int hash)
+static int findcell(struct HashRec *hr, const naRef* key, unsigned int hash)
 {
     int i, mask = POW2(hr->lgsz+1)-1, step = (2*hash+1) & mask;
     for(i=HBITS(hr,hash); TAB(hr)[i] != ENT_EMPTY; i=(i+step)&mask)
-        if(TAB(hr)[i] != ENT_DELETED && equal(key, ENTS(hr)[TAB(hr)[i]].key))
+        if(TAB(hr)[i] != ENT_DELETED && equal(key, &ENTS(hr)[TAB(hr)[i]].key))
             break;
     return i;
 }
 
 static void hashset(HashRec* hr, naRef key, naRef val)
 {
-    int ent, cell = findcell(hr, key, refhash(key));
+    int ent, cell = findcell(hr, &key, refhash(key));
     if((ent = TAB(hr)[cell]) == ENT_EMPTY) {
         ent = hr->next++;
         if(ent >= NCELLS(hr)) return; /* race protection, don't overrun */
@@ -127,7 +127,7 @@ int naHash_get(naRef hash, naRef key, naRef* out)
 {
     HashRec* hr = REC(hash);
     if(hr) {
-        int ent, cell = findcell(hr, key, refhash(key));
+        int ent, cell = findcell(hr, &key, refhash(key));
         if((ent = TAB(hr)[cell]) < 0) return 0;
         *out = ENTS(hr)[ent].val;
         return 1;
@@ -147,7 +147,7 @@ void naHash_delete(naRef hash, naRef key)
 {
     HashRec* hr = REC(hash);
     if(hr) {
-        int cell = findcell(hr, key, refhash(key));
+        int cell = findcell(hr, &key, refhash(key));
         if(TAB(hr)[cell] >= 0) {
             TAB(hr)[cell] = ENT_DELETED;
             if(--hr->size < POW2(hr->lgsz-1))
@@ -211,7 +211,7 @@ int naiHash_tryset(naRef hash, naRef key, naRef val)
 {
     HashRec* hr = REC(hash);
     if(hr) {
-        int ent, cell = findcell(hr, key, refhash(key));
+        int ent, cell = findcell(hr, &key, refhash(key));
         if((ent = TAB(hr)[cell]) >= 0) { ENTS(hr)[ent].val = val; return 1; }
     }
     return 0;
Re: [Flightgear-devel] nan-a-palooza
Update: I now have a workaround configuration that at least allows me to park without anything too terrible happening. The sim will run for an hour, even with the FPE trap enabled.

This workaround configuration (call it X) entails:

1) Passing *no* options on the command line, using options in the .fgfsrc file instead.

2) Disabling various AI things:

    --prop:/sim/ai-traffic/enabled=0
    --prop:/sim/traffic-manager/enabled=0
    --disable-ai-models

In contrast:

1) Starting from X, if I pass my usual bunch of arguments on the command line, I get an FPE very early, while the splash screen is still up. This happens *even if* the exact same options are also passed in my .fgfsrc file.

2) Starting from X, if I don't disable the AI stuff, then after a few minutes I get a finite bunch of nans on the console ... without catching any FPEs.

3) Keeping X with no changes, I sometimes observe segmentation faults when I exit out of the sim. A traceback can be found at:

    http://www.av8n.com/fly/fgfs/segv--21023.log

=

My next step is to start incorporating Jester's various nan-fixes into my workspace.
Re: [Flightgear-devel] nan-a-palooza
Hi,

On Sunday 13 December 2009 01:49:47 am Ron Jensen wrote:
> On Sat, 2009-12-12 at 16:21 -0700, John Denker wrote:
> > --prop:/sim/ai-traffic/enabled=0
> > --prop:/sim/traffic-manager/enabled=0

Just to clarify: The AI system that is generating the NaNs is the old one (i.e. the code contained in ATCDCL). You can indeed shut that down using the first of the --prop: options above. However, note that you also suggest shutting down the traffic manager, which is unrelated to the ATCDCL system causing the NaN problems[*]. Obviously, you may have other reasons to disable that; I just wish to point out that this isn't necessary for getting rid of the NaN flurry.

Hope this helps.

Cheers,
Durk

[*] Disclaimer: Last time I investigated, the NaNs were triggered by the ground elevation query function. This function is called by the user aircraft, the AIModels subsystem, the old ATCDCL AI system, and the ridge lift calculation code. While tracing the code for a potential cause, I noticed that it was occasionally triggered by the ridge lift code, but most of the time by the ATCDCL code. And I pretty systematically never found a problem when ai-traffic was disabled. So, while the evidence is still a bit anecdotal, I have reason to assume that the core of the NaN problem lies somewhere there.

D.
Re: [Flightgear-devel] nan-a-palooza
Hi,

My experience was that these NaNs only appear with --prop:/sim/ai-traffic/enabled=1 on win32. Nothing else seems to produce NaNs.

Btw, I wonder if we still need ai-traffic, as the interactive traffic works so much better and more nicely?

Cheers
HHS
Re: [Flightgear-devel] nan-a-palooza
John Denker wrote:
> Update: To observe this bug, I don't even need to taxi. I can just sit at the starting point of runway 31L at JFK with the engine off. After sitting about 8 minutes, I observe nan messages on the console.

John, could you specify if this was with 'real' weather enabled or disabled? I've seen something similar when it was enabled but haven't seen it without real weather.

Erik
Re: [Flightgear-devel] nan-a-palooza
I never have the AI traffic enabled, but still get NaNs sometimes. The AI traffic may be triggering a NaN, but I'm not sure it's actually the root cause. Debugging NaNs can be a real pita: since every arithmetic operation on a NaN produces another NaN, they spread like wildfire. By the time one causes a problem, it's usually a long way from the origin. Anyway, I really hope a release won't be rushed with problems like these still present.
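The propagation described above can be shown with a tiny sketch. The functions `integrate` and `checked_integrate` are hypothetical and have nothing to do with the FlightGear sources; they only illustrate that one NaN input poisons an arithmetic result, and that a check placed at the point of computation catches it close to the origin.

```c
#include <math.h>

/* Hypothetical update step: any NaN input contaminates the result. */
static double integrate(double position, double velocity, double dt)
{
    return position + velocity * dt;
}

/* Guarded variant: refuses to store a NaN, so the old value survives
 * and the caller learns immediately which step went wrong. */
static int checked_integrate(double *position, double velocity, double dt)
{
    double next = integrate(*position, velocity, dt);
    if (isnan(next))
        return 0;   /* reject; *position is left untouched */
    *position = next;
    return 1;
}
```

A guard like this does not fix the root cause, but it turns a problem that surfaces far away into one that is reported where the bad value first appears.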
Re: [Flightgear-devel] nan-a-palooza
Hi,

On Sunday 13 December 2009 09:53:49 am Heiko Schulz wrote:
> Hi,
>
> My experience was that these NaNs only appear with --prop:/sim/ai-traffic/enabled=1 on win32. Nothing else seems to produce NaNs.
>
> Btw, I wonder if we still need ai-traffic, as the interactive traffic works so much better and more nicely?

The one thing that is holding me back is the fact that the Interactive Traffic system isn't as interactive as I would like. I am getting close to tackling that problem though, so hopefully in the course of next year we could start phasing out ai-traffic.

Cheers,
Durk
Re: [Flightgear-devel] nan-a-palooza
On 12/13/2009 02:33 AM, Erik Hofman wrote:
> Jon, could you specify if this was with 'real' weather enabled or disabled? I've seen something similar when it was enabled but haven't seen it without real weather.

With an explicit --disable-real-weather-fetch, I still observe an early FPE, while the splash screen is still up ... if the FPE trap is enabled.

UPDATE: I have a surprising explanation for the previously reported fact that FPE behavior depends on whether options are passed on the command line or passed via .fgfsrc.

It turns out that passing --enable-fpe via .fgfsrc is a no-op. There is some special code in bootstrap.cxx that processes the enable-fpe option _if it occurs on the command line_ and not otherwise. Meanwhile, the normal option processing code in options.cxx recognizes the --enable-fpe option but does nothing about it.

This needs fixing. I'm not sure what the proper fix is.

*) Is there some reason why the --enable-fpe on the command line needs to be handled super-early? Normal option processing is already one of the earliest steps. Wouldn't that be early enough?

*) If it needs to be a command-line-only thing, then that restriction needs to be documented ... and using it in .fgfsrc should result in an error message.
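As a rough illustration of the first alternative (deciding about the flag only after both option sources have been read), here is a minimal C++ sketch. It is not the actual bootstrap.cxx or options.cxx code; `has_enable_fpe` and `fpe_requested` are hypothetical names, and the sketch assumes both the command line and the .fgfsrc contents are already available as lists of option strings.

```cpp
#include <cassert>   // for the checks shown in the usage note
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not FlightGear code):
// true if the option list contains exactly "--enable-fpe".
bool has_enable_fpe(const std::vector<std::string>& options) {
    for (const std::string& opt : options) {
        if (opt == "--enable-fpe") {
            return true;
        }
    }
    return false;
}

// Decide once, after both sources have been read, so the flag behaves
// the same way regardless of where it was given.
bool fpe_requested(const std::vector<std::string>& command_line,
                   const std::vector<std::string>& rc_file) {
    return has_enable_fpe(command_line) || has_enable_fpe(rc_file);
}
```

With this shape, `fpe_requested({}, {"--enable-fpe"})` and `fpe_requested({"--enable-fpe"}, {})` both return true, so the rc-file case would no longer be silently ignored. Whether the trap can really be enabled that late is exactly the open question above.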
Re: [Flightgear-devel] nan-a-palooza
On Sun, Dec 13, 2009 at 10:48 PM, John Denker j...@av8n.com wrote:
> ...
> UPDATE: I have a surprising explanation for the previously reported fact that FPE behavior depends on whether options are passed on the command line or passed via .fgfsrc.
>
> It turns out that passing --enable-fpe via .fgfsrc is a no-op. There is some special code in bootstrap.cxx that processes the enable-fpe option _if it occurs on the command line_ and not otherwise. Meanwhile, the normal option processing code in options.cxx recognizes the --enable-fpe option but does nothing about it.
>
> This needs fixing. I'm not sure what the proper fix is.
>
> *) Is there some reason why the --enable-fpe on the command line needs to be handled super-early? Normal option processing is already one of the earliest steps. Wouldn't that be early enough?

Probably historical reasons, and the fact that other platform specific initialization is done in bootstrap.cxx. Also, --enable-fpe is a developer-only kind of thing... at least it would be, if we didn't have the NaN problem. These are all sort of lame reasons, but that's the way it is for the moment.

Tim

> *) If it needs to be a command-line-only thing, then that restriction needs to be documented ... and using it in .fgfsrc should result in an error message.
[Flightgear-devel] nan-a-palooza
After taxiing a little ways down runway 31L at JFK, the display freezes and the sim starts spewing messages to the console:

  Warning:: Picked up error in TriangleIntersect

For additional details see http://www.av8n.com/fly/fgfs/nan--25387.log

This is observed when some properties are being displayed on-screen (airspeed, wind speed, and throttle position).

This is reproducible chez moi, in the sense that in three attempts, I was unable to taxi more than three thousand feet down the runway (although the exact distance varied from run to run).

In contrast, with no properties displayed on the screen, I was able to taxi the full length (14,000 feet or so).

For details concerning the system on which this was observed, see http://www.av8n.com/fly/fgfs/barf.log
Re: [Flightgear-devel] nan-a-palooza
On 12/12/2009 04:53 AM, I wrote:
> After taxiing a little ways down runway 31L at JFK, the display freezes and the sim starts spewing messages to the console:
>
>   Warning:: Picked up error in TriangleIntersect
>
> For additional details see http://www.av8n.com/fly/fgfs/nan--25387.log
>
> This is observed when some properties are being displayed on-screen (airspeed, wind speed, and throttle position).
>
> This is reproducible chez moi, in the sense that in three attempts, I was unable to taxi more than three thousand feet down the runway (although the exact distance varied from run to run).
>
> In contrast, with no properties displayed on the screen, I was able to taxi the full length (14,000 feet or so).

Update: It happens about half the time even with no properties being displayed on the screen.

Update: The same thing is observed at SFO, not just JFK. It may happen lots of other places as well; I haven't checked.
Re: [Flightgear-devel] nan-a-palooza
On 12/12/2009 05:16 AM, Csaba Halász wrote:
> Hi John,
>
> Could you please use the --enable-fpe option and try to get a backtrace?

Sure.

In one case I had to taxi a little while before getting an FPE: http://www.av8n.com/fly/fgfs/fpe--27376.log

In another case I got an FPE very early, while the splash screen was still showing: http://www.av8n.com/fly/fgfs/fpe--27465.log

(I saw the early FPE on another occasion but didn't have gdb running.)

Another observation: I find the bug is easy to reproduce when options are passed on the command line ... and harder to reproduce when the same options are requested via the .fgfsrc file.

It's hard to know what to make of this, but as always we must consider the possibility that memory is getting trampled ... such that the code first affected by the bug is not the code that caused the bug ... which is no fun to debug.
Re: [Flightgear-devel] nan-a-palooza
Update: To observe this bug, I don't even need to taxi. I can just sit at the starting point of runway 31L at JFK with the engine off. After sitting about 8 minutes, I observe nan messages on the console.

One interesting thing is that *no* floating point exception was raised this time. The FPE trap was enabled but didn't catch anything. This is in contrast to previous times where either the trap was disabled or the FPE preceded any nan messages.

This time I got only a finite number of nan messages ... in contrast to previous times when I got an apparently endless spew.

This was with options passed via .fgfsrc, not via the command line. This is more-or-less necessary when the FPE trap is enabled, if I want the sim to live long enough to get past the splash screen.

For details, see http://www.av8n.com/fly/fgfs/nan--27763.log

As before, the barf of system information is at http://www.av8n.com/fly/fgfs/barf.log
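As general background on how a nan can show up without any floating point exception (this is not a diagnosis of the case above): under IEEE 754, the invalid-operation exception is signalled when an operation *creates* a nan, such as 0.0/0.0, but ordinary arithmetic on a quiet nan that already exists just passes it along silently. The minimal C++ sketch below checks the exception flags with the standard `<cfenv>` functions rather than installing a trap; the function names are made up for illustration, and `volatile` is used only to keep the compiler from folding the arithmetic at compile time.

```cpp
#include <cassert>   // for the checks shown in the usage note
#include <cfenv>
#include <limits>

// True if adding 1.0 to an already-existing quiet NaN sets FE_INVALID.
bool quiet_nan_add_sets_invalid() {
    volatile double nan_in = std::numeric_limits<double>::quiet_NaN();
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double out = nan_in + 1.0;   // NaN in, NaN out
    (void)out;
    return std::fetestexcept(FE_INVALID) != 0;
}

// True if computing 0.0 / 0.0 at run time sets FE_INVALID.
bool zero_over_zero_sets_invalid() {
    volatile double zero = 0.0;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double out = zero / zero;    // creates a NaN
    (void)out;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

On typical IEEE 754 hardware one would expect `zero_over_zero_sets_invalid()` to return true and `quiet_nan_add_sets_invalid()` to return false. So a nan that enters the computation as raw bits (for example from uninitialized or trampled memory) could plausibly propagate to the console without the trap ever firing.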
Re: [Flightgear-devel] nan-a-palooza
On 12/12/2009 02:03 PM, Heiko Schulz wrote:
> I could solve that issue with disabling AI-Traffic (Not the Interactive Traffic)

I hate to ask silly questions, but ... are you suggesting

  --disable-ai-models    Disable the artificial traffic subsystem.

The last time I used that option somebody told me I was doing the wrong thing.

In any case, I observe that --disable-ai-models does not make the nan problem go away. Specifically: After being parked for a few minutes I observed a finite number of nan messages on the console. This is pretty much the same as without the --disable-ai-models.

The sim remained alive, but was severely obtunded. It was using 100% of the CPU, but the frame rate was down around 18, which is about half of what I would normally expect under the circumstances.

Any additional suggestions for things to try would be welcome.
Re: [Flightgear-devel] nan-a-palooza
On Sun, Dec 13, 2009 at 12:21 AM, John Denker j...@av8n.com wrote:
> Any additional suggestions for things to try would be welcome.

It is very strange because I have never seen a NaN slip past the --enable-fpe guard. You could try to build my nan-fixes branch from gitorious (http://gitorious.org/~jester) to see if any of my changes make the problem go away for you.

--
Cheers,
Csaba/Jester
Re: [Flightgear-devel] nan-a-palooza
On Sat, 2009-12-12 at 16:21 -0700, John Denker wrote:
> On 12/12/2009 02:03 PM, Heiko Schulz wrote:
>> I could solve that issue with disabling AI-Traffic (Not the Interactive Traffic)
>
> I hate to ask silly questions, but ... are you suggesting
>
>   --disable-ai-models    Disable the artificial traffic subsystem.
>
> The last time I used that option somebody told me I was doing the wrong thing.

No, --disable-ai-models simply hides the ai-traffic from you. It still runs. You need:

  --prop:/sim/ai-traffic/enabled=0
  --prop:/sim/traffic-manager/enabled=0

Ron