Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-24 Thread tpalinkas
On Wed, 23 Jan 2008, Torsten Dreyer wrote:

 Hi Tibor,

 I am running SuSE linux, currently 10.3.
 But there is good news, I can reproduce the client misbehaviour now where the
 aircraft sits at -ft after a client crash and restart.

 I am currently looking into the issue, but it will take some time.

 Regards, Torsten


Great, thank you for debugging this one.

I think the crash is not necessary to reproduce the - problem, 
the wrong order of starting up the two flightgears should do it.

Best regards,

Tibor

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Flightgear-devel mailing list
Flightgear-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/flightgear-devel


Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-23 Thread Torsten Dreyer
  Currently I cannot reproduce any other misbehaviour than the segfault
  that you describe as gone now.

 Do you have .deb based system? If so, I could send you the relevant deb
 packages we used. Or alternatively we could try reproducing the issue on
 vservers and send you a compressed vserver.
Hi Tibor,

I am running SuSE linux, currently 10.3.
But there is good news, I can reproduce the client misbehaviour now where the 
aircraft sits at -ft after a client crash and restart. 

I am currently looking into the issue, but it will take some time.

Regards, Torsten



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-23 Thread Torsten Dreyer
 That sounds like you're using TCP, since if you were using UDP, the
 master would not know if the slave(s) received the message -- UDP is an
 unreliable protocol and the master
 does not know if it is transmitting into oblivion or reaching an actual
 slave instance of FlightGear. Provided the native protocol doesn't have
 a mechanism to provide feedback of received messages to the master, that
 is.

This was running round in my head all day, so I did some investigation with a 
debugger, Wireshark and an internet search engine at hand...

Here is what I learned today:
When you send a UDP datagram to a machine and the port is not open on that 
machine, you get an ICMP "destination unreachable/port unreachable" message 
from the targeted machine. It looks like the socket implementation on my 
Linux box interprets this message and finally reports it as an error from 
the send() system call, which produces the warning message.

So, nothing to worry about. It's just a warning message on the console and as 
soon as the client is up again, data flow continues as it is supposed to with 
udp.
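
A minimal sketch of the behaviour described above, assuming a Linux host and plain POSIX sockets: a UDP socket that has been connect()ed to a closed port gets the ICMP error reported on a later send(). The helper name second_send_errno_on_closed_port is made up for this illustration and is not part of FlightGear or plib.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns -1 if the sockets could not be set up or the
// first send() failed, otherwise the errno of a second send() (0 on success).
int second_send_errno_on_closed_port() {
    // Find a UDP port that is very likely closed: bind to port 0, read back
    // the port the kernel chose, then close the socket again.
    int probe = socket(AF_INET, SOCK_DGRAM, 0);
    if (probe < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    socklen_t len = sizeof(addr);
    if (bind(probe, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        getsockname(probe, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        close(probe);
        return -1;
    }
    close(probe);

    // "Connect" a UDP socket to that port. No packet is sent here; the kernel
    // only records the peer so ICMP errors can be matched to this socket.
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) return -1;
    if (connect(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(s);
        return -1;
    }

    const char msg[] = "ping";
    // The first datagram leaves normally; the ICMP reply arrives afterwards.
    if (send(s, msg, sizeof(msg), 0) < 0) {
        close(s);
        return -1;
    }
    usleep(50 * 1000);  // give the ICMP "port unreachable" time to arrive

    // The pending error is typically reported by the next call on the socket.
    errno = 0;
    int result = (send(s, msg, sizeof(msg), 0) < 0) ? errno : 0;
    close(s);
    return result;
}
```

On Linux this typically returns ECONNREFUSED; other systems may report the error differently or not at all, and an unconnected socket used with sendto() would not normally see it.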

Torsten



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-22 Thread Torsten Dreyer
  We were using the current version of gcc in Debian testing (4.1.2, I
 believe?), but the same problem occurred with the debian packaged 1.0.0,
 the ubuntu packages, and older versions as well.

  As said in another mail, the stability issues improved with the small
 patch posted on this list. There's no segfault any more.
  There's still the doublefree/corruption problem (which tends to appear
 at shutdown), and master/slave startup order still matters. If the
 master starts before the slave, the slave plane starts stuck in the
 ground at - feet, and doesn't move. A reset on the slave side
 restores correct functionality.
- What are the configurations of the two machines?
- Are they equal: same OS, same architecture (32/64 bit)?
- Which gcc/g++ version was used to compile?
- Which plib version?
- Do you share the same binary on both machines, or were they built 
independently?

The native protocol is *very* native, it just copies the internal data 
structure to the stream without caring about byte order, byte/word alignment 
or the kind of data representation in a struct/class. 
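
To illustrate the alternative to copying the raw structure, here is a minimal sketch that writes each field explicitly with a fixed byte order, so the wire format no longer depends on padding or endianness. DemoState, put_be, get_be, encode and decode are invented for this example and are not FlightGear code; the sketch also assumes a 64-bit IEEE double on both ends.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical two-field state; the real FGInterface has many more members.
struct DemoState {
    std::uint32_t version;
    double altitude_ft;
};

// Append an unsigned integer to the buffer, most significant byte first.
template <typename T>
void put_be(std::vector<std::uint8_t>& out, T value) {
    for (int shift = static_cast<int>(sizeof(T) - 1) * 8; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Read an unsigned integer stored most significant byte first.
template <typename T>
T get_be(const std::uint8_t* p) {
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>((value << 8) | p[i]);
    return value;
}

// Field-by-field encoding: always 12 bytes, no padding, fixed byte order.
std::vector<std::uint8_t> encode(const DemoState& s) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects 64-bit double");
    std::uint64_t alt_bits;
    std::memcpy(&alt_bits, &s.altitude_ft, sizeof(alt_bits));
    std::vector<std::uint8_t> out;
    put_be<std::uint32_t>(out, s.version);
    put_be<std::uint64_t>(out, alt_bits);
    return out;
}

// Inverse of encode(); expects exactly the 12 bytes produced above.
DemoState decode(const std::vector<std::uint8_t>& in) {
    assert(in.size() == 12);
    DemoState s{};
    s.version = get_be<std::uint32_t>(in.data());
    std::uint64_t alt_bits = get_be<std::uint64_t>(in.data() + 4);
    std::memcpy(&s.altitude_ft, &alt_bits, sizeof(alt_bits));
    return s;
}
```

Note that sizeof(DemoState) itself can differ between ABIs because of alignment padding, which is one reason a raw copy of the struct is fragile across 32-bit and 64-bit builds.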

Currently I cannot reproduce any other misbehaviour than the segfault that you 
describe as gone now.

Torsten



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-22 Thread tpalinkas
On Tue, 22 Jan 2008, Torsten Dreyer wrote:

<snip>

 - What are the configurations of the two machines?

Both machines are x86 running Debian testing.
Acer notebook: Intel Pentium M (1.60GHz), 1 GB RAM, ATI Radeon Mobility X700 
(PCIE)
desktop: AMD Athlon(tm) XP 3200+, 1 GB RAM, some nvidia

 - Are they equal, same os same architecture (32/64) bit

Both are 32 bit with the same byte order

 - What gcc/g++ version was used to compile

gcc: gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
libstdc++6: 4.1.1-21
libc6: 2.3.6.ds1-13etch4

 - What plib version?

Compiled with 1.8.4-6 but both machines have 1.8.4-8

 - Do you share the same binary for the two machines or were they built
 independently?

Same binary; I've compiled everything and built .deb packages and 
installed those on both machines.

 The native protocol is *very* native, it just copies the internal data
 structure to the stream without caring about byte order, byte/word alignment
 or the kind of data representation in a struct/class.

 Currently I cannot reproduce any other misbehaviour than the segfault that you
 describe as gone now.

Do you have .deb based system? If so, I could send you the relevant deb 
packages we used. Or alternatively we could try reproducing the issue on 
vservers and send you a compressed vserver.

TIA

Tibor Palinkas




Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread tpalinkas


On Fri, 18 Jan 2008, Torsten Dreyer wrote:

 Hi all,

 I think there is a bug when using the native protocol to link two instances of
 FlightGear via network or when recording and playing back flights using

 fgfs --native=file,out,20,fgfs.out

 and

 fgfs --native=file,in,20,fgfs.out --fdm=external

 The external FlightGear crashes and after some investigation I think the
 problem is a missing operator =() method in FDM/flight.hxx

 The problem is:
 In Network/native.cxx a buffer is read either from the network or from a file
 containing the previously written fdm state. The content of the buffer is
 then assigned to the current fdm state by doing
*cur_fdm_state = buf;
 Both variables are of type FGInterface, which currently lacks an operator =()
 method, so the compiler uses a simple memcpy to copy one object to the other.
 This is almost OK, except for the ground_cache object. This is a complex object
 containing a std::vector<FGGroundCache::Triangle>. This vector seems to store
 memory pointers to the Triangle vertices. This is a very bad thing because
 these pointers are invalid for any other FlightGear session, and dereferencing
 them causes a segmentation fault.

 A very ugly - if not disgusting - workaround is adding the following to the
 public methods of FGInterface in FDM/flight.hxx:

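 virtual const FGInterface & operator = ( FGInterface & src ) {
   // copy only the plain members stored between inited and ground_cache
   char * start = (char*)&inited;
   char * end = (char*)&ground_cache;
   memcpy( &inited, &src.inited, end-start );
   // rebuild the ground cache for the recovered position
   prepare_ground_cache_m( 0, geodetic_position_v, 100.0 );
   return *this;
 }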

 This gets called instead of a memcpy when assigning one FGInterface to another
 and it does the memcpy for all member variables except the ground_cache. The
 ground_cache itself is initialized for the recovered position with a fixed
 reference time of 0 and a radius of 100m.

 At least this change fixes the segfault when replaying with the native
 protocol, but I don't think this is the kind of code we want to see in
 FlightGear for two reasons:

 a) The pointer arithmetic assuming simple datatypes between the inited and
 ground_cache variables

 b) A constant used for reference time and the radius.

 While a) may be circumvented by using explicit assignments for all
 variables, I have no good idea for b). The radius might be saved when doing
 the output, but I do not understand the idea of the reference time...

 And there is one thing that is going round in my head: Curt reported that he
 does not have this problem at all and no one else (except tpalinkas) reported
 this crash. Maybe this is a compiler/library problem?

 Thanks for reading all that - any comment or help is appreciated.

 Torsten


We applied your patch and it fixed the initial segfault in slave. 
(However, we experience double-free/corruption when the slave quits.)

Another strange bug is that if we start up in the wrong order (master 
first, without the patch, this order caused an immediate segfault), the 
initial states of the slave are messed up (altitude is - ft; plane 
permanently stuck in the ground). Doing a reset on the slave 
fixes the problem even if we've taken off with the master.

TIA

Tibor Palinkas



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Curtis Olson
I am testing the udp version of doing master/slave copies of FlightGear here
this morning.  I'm doing this with a stock v1.0 version.  So far everything
seems to be behaving well.  I'm not seeing any rapid memory leak, and so far
no crash.

Are you seeing this only with file I/O?  Are you seeing this with network
I/O?  How long do you need to have the system running before you see memory
thrashing or a crash?

Thanks,

Curt.


-- 
Curtis Olson: http://baron.flightgear.org/~curt/


Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Torsten Dreyer
On Monday, 21 January 2008 18:21, Curtis Olson wrote:
 I am testing the udp version of doing master/slave copies of FlightGear
 here this morning.  I'm doing this with a stock v1.0 version.  So far
 everything seems to be behaving well.  I'm not seeing any rapid memory
 leak, and so far no crash.

 Are you seeing this only with file I/O?  Are you seeing this with network
 I/O?  How long do you need to have the system running before you see memory
 thrashing or a crash?

I get the segfault with file I/O and with a UDP link.
This is my environment:
- SuSE Linux 10.3 running x86_64 on an Intel(R) Core(TM)2 CPU.
- Two instances of FlightGear, frame-rate-throttled to 25 fps each
- FlightGear stock 1.0.0 built from source and current plib cvs, built with gcc 
4.2.1

commandline for slave (launched first):
fgfs --native=socket,in,20,localhost,5556,udp  --aircraft=c172p 
--geometry=640x480 --timeofday=noon --fdm=null

commandline for master (launched after slave startup)
fgfs --native=socket,out,20,localhost,5556,udp --aircraft=c172p 
--geometry=640x480 --timeofday=noon

The slave segfaults anywhere between immediately and after a couple of 
minutes. Sometimes, terminating the master and restarting it at another location 
in the world, like --airport=KJFK or --airport=LOWI, immediately kills the slave.

Doing some more tests with the operator = () method added, I never got a 
crash.
BTW, the operator =() could be reduced to

    virtual const FGInterface & operator = ( FGInterface & src ) {
      // copy only the plain members stored between inited and ground_cache
      char * start = (char*)&inited;
      char * end = (char*)&ground_cache;
      memcpy( &inited, &src.inited, end-start );
      return *this;
    }

since the prepare_ground_cache will be called later automatically by the 
FGInterface::get_groundlevel.
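
One way to avoid the pointer arithmetic, sketched here under the assumption that the plain members could be grouped into their own struct: the compiler can then verify that only trivially copyable data is moved as raw bytes. PlainState, DemoInterface, assign_state_from and receive_plain_state are hypothetical stand-ins, not the real FlightGear classes.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical group of simple value members.
struct PlainState {
    bool inited;
    double altitude_ft;
    double heading_deg;
};

// Hypothetical stand-in for a class that mixes plain state with a container.
struct DemoInterface {
    PlainState state;           // safe to transfer as raw bytes
    std::vector<double> cache;  // owns heap memory; its pointers are process-local

    // Take over only the plain state. The cache is deliberately not touched,
    // mirroring the reduced operator =() above.
    void assign_state_from(const PlainState& src) { state = src; }
};

static_assert(std::is_trivially_copyable<PlainState>::value,
              "PlainState may be copied with memcpy or sent as raw bytes");
static_assert(!std::is_trivially_copyable<DemoInterface>::value,
              "DemoInterface contains a std::vector and must not be");

// Simulate receiving the plain part from a file or socket buffer.
PlainState receive_plain_state(const unsigned char* bytes) {
    PlainState s;
    std::memcpy(&s, bytes, sizeof(s));
    return s;
}
```

The two static_asserts are the point of the sketch: if someone later adds a container to PlainState, the build fails instead of the slave crashing at run time.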

And while we are at the native protocols: I am sorry to say that 
native-ctrls is broken, too. The encoding swaps bytes on little-endian 
machines when encoding to the net, but the decoding does not swap them back. 
That part is commented out in version 1.32:
http://cvs.flightgear.org/cgi-bin/viewvc/viewvc.cgi/source/src/Network/native_ctrls.cxx?r1=1.31&r2=1.32
(check lines 296 and 353)

Is this intentional?
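
A small sketch of why both directions need the conversion, assuming POSIX htonl()/ntohl(). DemoCtrls is an invented miniature of a controls packet (the field names are borrowed from this thread, the struct and the version number 27 are not taken from FlightGear).

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>

// Hypothetical miniature of a controls packet.
struct DemoCtrls {
    std::uint32_t version;
    std::uint32_t speedup;
    std::uint32_t freeze;
};

// Sender side: host byte order -> network byte order.
void to_network(DemoCtrls& c) {
    c.version = htonl(c.version);
    c.speedup = htonl(c.speedup);
    c.freeze  = htonl(c.freeze);
}

// Receiver side: network byte order -> host byte order. If this step is
// skipped on a little-endian host, version 27 reads back as 0x1B000000
// and a version check on the receiver fails.
void from_network(DemoCtrls& c) {
    c.version = ntohl(c.version);
    c.speedup = ntohl(c.speedup);
    c.freeze  = ntohl(c.freeze);
}
```

On a big-endian host both functions are no-ops, which is why a missing decode step only shows up on little-endian machines.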

Torsten



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Curtis Olson
On Jan 21, 2008 12:51 PM, Torsten Dreyer wrote:

 On Monday, 21 January 2008 18:21, Curtis Olson wrote:
  I am testing the udp version of doing master/slave copies of FlightGear
  here this morning.  I'm doing this with a stock v1.0 version.  So far
  everything seems to be behaving well.  I'm not seeing any rapid memory
  leak, and so far no crash.
 
  Are you seeing this only with file I/O?  Are you seeing this with
 network
  I/O?  How long do you need to have the system running before you see
 memory
  thrashing or a crash?

 I get the segfault with file I/O and with a UDP link.
 This is my environment:
 - SuSE Linux 10.3 running x86_64 on an Intel(R) Core(TM)2 CPU.
 - Two instances of FlightGear, frame-rate-throttled to 25 fps each
 - FlightGear stock 1.0.0 built from source and current plib cvs, built with
 gcc 4.2.1


Here's one possible difference, I'm running with plib-1.8.4 ... any chance
you could try that and see if it makes a difference.  I realize a complete
ground up recompile is not trivial, but flightgear does leverage plib's low
level network code, so is it possible that a change in plib since v1.8.4 is
causing us grief?

The native_ctrls issue you point out is a surprise to me, but as I look in
the cvs web browser pages, I see that Erik Hofman's name is attached to
these, and the specific commit was adding some new fields to the structure.
I think it makes sense to remove the comments that disable network byte
order.  That appears to have been a mistake that slipped through the cracks.

Regards,

Curt.
-- 
Curtis Olson: http://baron.flightgear.org/~curt/


Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Curtis Olson
On Jan 21, 2008 1:45 PM, Torsten Dreyer wrote:

 
  Here's one possible difference, I'm running with plib-1.8.4 ... any
 chance
  you could try that and see if it makes a difference.  I realize a
 complete
  ground up recompile is not trivial, but flightgear does leverage plib's
 low
  level network code, so is it possible that a change in plib since v1.8.4 is
  causing us grief?
 I am running 1.8.4, too:
 (checked plib/ul.h)
 #define PLIB_MAJOR_VERSION 1
 #define PLIB_MINOR_VERSION 8
 #define PLIB_TINY_VERSION  4

 But I am still convinced that the ground_cache object causes the crash. The
 gdb backtrace, the invalid pointers - everything makes sense to me. On the
 other hand, this means that the native protocol has been broken since Nov 22nd
 2004, when the ground_cache object made its way into flight.hxx... But
 native_ctrls tells me that it sometimes takes years for bugs to show up or
 be detected ;-)


I'm not able to replicate the problem here. :-(  I don't use --native-ctrls
very often but I use --native-fdm frequently for a variety of projects and
it has always been working well for me and seems to continue to work well.
I'm running gcc-4.1.2 here.

g++ (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)

So the problem could also be related to your newer version of gcc (perhaps
being more picky or more standards compliant.)  Or there could be a libc
difference too.

The problem could very well be in the ground cache code and something about
that code is blowing up on your system?  Often, newer versions of the gcc
compiler or compilers on other platforms expose weaknesses or problems in
code that worked fine before.  It would be really great if Mathias could
comment since he is the author of the ground cache code.  I have not looked
at that code myself in any detail.

Regards,

Curt.
-- 
Curtis Olson: http://baron.flightgear.org/~curt/


Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Torsten Dreyer
 g++ (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)

 So the problem could also be related to your newer version of gcc (perhaps
 being more picky or more standards compliant.)  Or there could be a libc
 difference too.
Yeah, that's what I think, too. I assume a different implementation in the STL; 
to be more precise: in std::vector.
I don't like the idea of building gcc 4.1.2 and rebuilding FlightGear from 
source with that compiler, but I would not be surprised if the crash 
disappeared that way.

Which compiler is tpalinkas using? 

Torsten



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Torsten Dreyer
 The native_ctrls issue you point out is a surprise to me, but as I look in
 the cvs web browser pages, I see that Erik Hofman's name is attached to
 these, and the specific commit was adding some new fields to the structure.
 I think it makes sense to remove the comments that disable network byte
 order.  That appears to have been a mistake that slipped through the
 cracks.
I have just recompiled FlightGear with the comments that disable the byte 
order code removed, and the native-ctrls protocol works like magic.

Can you change the source in CVS? I just removed the two comments:

Index: native_ctrls.cxx
===
RCS file: /var/cvs/FlightGear-0.9/source/src/Network/native_ctrls.cxx,v
retrieving revision 1.33
diff -u -p -r1.33 native_ctrls.cxx
--- native_ctrls.cxx  21 Feb 2006 01:19:47 -  1.33
+++ native_ctrls.cxx  21 Jan 2008 20:53:27 -
@@ -293,7 +293,6 @@ void FGNetCtrls2Props( FGNetCtrls *net,
     int i;

     SGPropertyNode * node;
-/***
     if ( net_byte_order ) {
         // convert from network byte order
         net->version = htonl(net->version);
@@ -350,7 +349,6 @@ void FGNetCtrls2Props( FGNetCtrls *net,
         net->speedup = htonl(net->speedup);
         net->freeze = htonl(net->freeze);
     }
-*/
     if ( net->version != FG_NET_CTRLS_VERSION ) {
         SG_LOG( SG_IO, SG_ALERT,
                 "Version mismatch with raw controls packet format." );



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Curtis Olson
Yes, this should already be done in both plib and osg branches.  Regards,

Curt.


-- 
Curtis Olson: http://baron.flightgear.org/~curt/


Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Curtis Olson
On Jan 21, 2008 2:38 PM, Torsten Dreyer wrote:

  g++ (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
 
  So the problem could also be related to your newer version of gcc
 (perhaps
  being more picky or more standards compliant.)  Or there could be a libc
  difference too.
 Yeah, that's what I think, too. I assume a different implementation in the
 STL; to be more precise: in std::vector.
 I don't like the idea of building gcc 4.1.2 and rebuilding FlightGear from
 source with that compiler, but I would not be surprised if the crash
 disappeared that way.

 Which compiler is tpalinkas using?


The other thing that concerns me is that even with your patch, tpalinkas is
reporting order dependencies in the master/slave startup sequence and doing it
wrong results in a crash.  Also some new double free errors when exiting.

The communication between master and slaves is UDP so there should be
absolutely no order dependencies in startup.  Any machine should be able to
start (or restart) in any order without affecting the stability of the
system.  Somehow the master must be sending garbage during startup and
crashing the slaves.

Regards,

Curt.
-- 
Curtis Olson: http://baron.flightgear.org/~curt/


Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Torsten Dreyer
On Monday, 21 January 2008 22:00, Curtis Olson wrote:
 The other thing that concerns me is that even with your patch, tpalinkas is
 reporting order dependencies in the master/slave startup sequence and doing it
 wrong results in a crash.  Also some new double free errors when exiting.

 The communication between master and slaves is UDP so there should be
 absolutely no order dependencies in startup.  Any machine should be able to
 start (or restart) in any order without affecting the stability of the
 system.  Somehow the master must be sending garbage during startup and
 crashing the slaves.
Neither of these is occurring here. When I run the master without a slave, 
there are many "Error writing data." messages on the master's console, which 
stop again when the slave is up.
But I can startup/shutdown the master or slave independently without crash or 
double free error messages. 

Sorry - no idea for that one. 

Torsten




Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread R. van Steenbergen
Torsten Dreyer wrote:
 Neither of these is occurring here. When I run the master without a slave, 
 there are many "Error writing data." messages on the master's console, which 
 stop again when the slave is up.
 But I can startup/shutdown the master or slave independently without crash or 
 double free error messages. 

 Sorry - no idea for that one. 

 Torsten
   
That sounds like you're using TCP, since if you were using UDP, the 
master would not know if the slave(s) received the message -- UDP is an 
unreliable protocol and the master 
does not know if it is transmitting into oblivion or reaching an actual 
slave instance of FlightGear. Provided the native protocol doesn't have 
a mechanism to provide feedback of received messages to the master, that is.



Re: [Flightgear-devel] Bug in native protocol was: simgear 1.0.0 crash -- and yet another bug

2008-01-21 Thread Torsten Dreyer
 That sounds like you're using TCP, since if you were using UDP, the
 master would not know if the slave(s) received the message -- UDP is an
 unreliable protocol and the master
 does not know if it is transmitting into oblivion or reaching an actual
 slave instance of FlightGear. Provided the native protocol doesn't have
 a mechanism to provide feedback of received messages to the master, that
 is.
It's UDP, I swear by the life of my dog (mine is too precious):

The commandline pasted fresh from the clipboard:
fgfs --native=socket,out,20,localhost,5556,udp --aircraft=c172p 
--geometry=640x480 --timeofday=noon 
--native-ctrls=socket,out,20,localhost,5557,udp 

And the output of netstat -un after the above command (sorry - German output. 
s/VERBUNDEN/CONNECTED/g):

Proto Recv-Q Send-Q Local Address           Foreign Address         State
udp        0      0 127.0.0.1:32817         127.0.0.1:5556          VERBUNDEN
udp        0      0 127.0.0.1:32818         127.0.0.1:5557          VERBUNDEN

Torsten
