Hi,

----- Florian Köberle <flor...@fkoeberle.de> wrote:
> The worst thing is that the character code uses even the width of
> sprites. Just have a look at line 533 in
> src/character/body.cpp:
> body_mvt.pos.x = GetSize().x / 2.0 -
> skel_lst.front()->member->GetSprite().GetWidth() / 2.0;
> 
> The width depends on the rotation and zoom and thus on what RotoZoom
> calculates. The current situation isn't ideal as different versions of
> the called library functions (rotozoomSurfaceXY and rotozoomSurface)
> might calculate different results, but doing more calculation in non
> fixedpoint arithmetic might make things even worse.

I fear this is a dead-end then, all the more since I already saw the bad 
consequences of changing that code to use float. :/

> By the way you can get probably a very large performance boost by
> rewriting the character code. It calculates a lot of stuff multiple
> times where it isn't necessary.

I don't think it really matters: in my profiling, Member::ApplyMovement is a 
measly 4%, while GetObjectAt is 20%. And in game, except when the AI is 
thinking, the framerate is around 20. So such code may hinder performance, but 
we are talking about ways to go from 20fps to more, while I'm more worried 
about the times the framerate drops below 5.

> This list of shortcomings of the character code is far from being
> complete. If you want to rewrite it and need more information let me know.

Those are unfortunately micro-optimizations in parts that are rather irrelevant 
to CPU usage. I don't have a usable profile for ARM, so it could be that 
Member::ApplyMovement is more troublesome there.

> Generally the AI should not be laggy. As the thinking is split in a lot
> of small packets (called ideas) which can be separately checked. You
> could try if setting REAL_THINK_TIME_PER_REFRESH_IN_MS to 0 helps. If it
> doesn't then probably too much calculation happens within one idea.

Might be interesting to check how much time each idea actually takes: it might 
be more than the 1ms limit it is checked against.
It also seems that the process is not running in its own thread (which would 
keep it from freezing the display), but rather serially:
MainLoop -> InputRefresh -> AIRefresh -> if runtime < limit, check idea (l111 
in ai/ai_stupid_player.cpp).

The last part also shows that it may take more than the limit and is in fact 
not bounded:
- start, wallclock is 0ms
- check one idea, which ends at 0.2s, so stop (but we still took 200ms instead 
  of 1ms)
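
A minimal sketch of that overshoot, assuming the time limit is only tested 
between ideas. FakeClock, Idea and RefreshAI are hypothetical names for 
illustration, not the actual code in ai/ai_stupid_player.cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins, not Wormux types: a clock we advance by hand
// and an idea that only knows how long its check takes.
struct FakeClock {
    std::int64_t now_ms = 0;
};

struct Idea {
    std::int64_t cost_ms;
};

// Checks ideas until the budget is spent and returns the elapsed time.
// The budget is only tested between ideas, so one slow idea overshoots it.
std::int64_t RefreshAI(FakeClock& clock, const std::vector<Idea>& ideas,
                       std::int64_t budget_ms, std::size_t& next_idea) {
    assert(budget_ms >= 0);
    const std::int64_t start = clock.now_ms;
    while (next_idea < ideas.size() && clock.now_ms - start < budget_ms) {
        clock.now_ms += ideas[next_idea].cost_ms;  // "check" the idea
        ++next_idea;
    }
    return clock.now_ms - start;
}
```

With a 1ms budget and a first idea costing 200ms, the call returns 200: the 
limit stops further ideas from starting, but does not bound the refresh itself.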

That's why having a thread could be useful: even if it takes time, as long as 
the display isn't frozen meanwhile, it would be ok.

Anyway, that needs some validation, at least on my target.

> About the straight line shot: How do you want to know if you hit the
> ground without checking every pixel in the way of the missile?

My analysis has been rather rough in that I was only looking at who made all 
those GetObjectAt calls.
Still, for each position, the problem is not checking ground/vacuum, but 
checking every object.

Another way to do GetCollisionObject (very rough) would be:
- before thinking, sort the objects by their abscissa
- from the starting abscissa to the first object's abscissa, in steps of x:
  . Ground/vacuum?
- from the first object within reach to the last object within reach:
  . Check current object (the Contains(pos) / IsDead() checks)
  . from the current object's position to the next object:
     . Ground/vacuum?

This way, each object/character should be tested only once, and not 
distance/step times, while doing the same number of checks for the ground (or 
a few more, to be extra-careful with the bounds of the above loops). The same 
*might* apply to any trajectory.

> But
> generally it's true that there is a huge room of improvements for the AI.

I'm impressed with the end result anyway. I probably wouldn't have been able to 
even have it do something useful.
But I'm just seeing a problem (the lag), and looking at the circumstances (the 
AI seems to be thinking), I draw quick conclusions. So I'm looking at 
optimizing it: that is where my hunt for Doubles comes from, and seeing how 
using float improves things, I feel encouraged in this hunt.

I believe the AI does not need to compute ratings with Double for instance: it 
is so local and does not need to be very precise as far as I see.
Maybe the trajectory evaluation doesn't need Double either (it's just an 
evaluation, we don't send it over network, we don't need it to be precise), but 
I'm afraid this would require templating code with Point2d/Point2f, and I'm not 
sure it's that easy, and anyway, the costly thing.

> I had some unchecked test case classes but I lost them due a recent hard
> disk failure. If I remember the large number of iterations were
> necessary when calculating the square root of large numbers. Anyway my
> 64 bit version of the sqrt function isn't that much optimized like the
> one which was there for 32 bit.

I saw that indeed: I changed it back to 32 bits, saw that even the menus got 
corrupted, checked the max value in the internal representation (upwards of 
600000 for the menus alone, while 32 bits only allows 2^16) and dropped the 
idea until I figure out where those values come from. If I reduce the cases 
where Double is used, it might help tracking them down.

> I could imagine that angles need the 16 bit of precision as they range
> only from 0 to 2 pi.

In such a case, cheating is possible: angles are only output from some 
functions, used in comparison and fed to functions expecting angles. Therefore, 
by adapting the angle constants and those functions, different precision could 
be used. In such a case, 16.16 allows for very large angles. For other values, 
such as distances and so on, maybe less precision in the decimal part is 
sufficient.

That would be a lot of work adapting the fixed-point functions, as you managed 
to do, for what I fear is unfortunately a small gain.

> What you could try is to add (conditional) code
> which checks if a calculation with Double resulted in a larger value
> than you can store with a 32 bit version of Double. If that check fails
> you abort or print something to the command line. This way you could
> find all the places where we actually need 48.16 and eliminate them.

Exactly what I wrote above.
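
A rough sketch of such a conditional check, assuming the 48.16 value is held as 
a raw 64-bit integer. FitsIn16_16, FromInt, CheckRange and CHECK_16_16 are 
hypothetical helpers, not part of the existing Double class:

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>

// Assumption: a 48.16 value stored as a raw signed 64-bit integer.
constexpr int kFracBits = 16;

// Converts an integer to the raw fixed-point representation.
constexpr std::int64_t FromInt(std::int64_t v) {
    return v * (std::int64_t{1} << kFracBits);
}

// True when the raw value would also fit in a signed 32-bit 16.16 value.
constexpr bool FitsIn16_16(std::int64_t raw) {
    return raw >= std::numeric_limits<std::int32_t>::min() &&
           raw <= std::numeric_limits<std::int32_t>::max();
}

// Reports the call site when a result needs more than 16.16.
inline bool CheckRange(std::int64_t raw, const char* file, int line) {
    if (FitsIn16_16(raw)) return true;
    std::fprintf(stderr, "%s:%d: value needs more than 16.16 (raw=%lld)\n",
                 file, line, static_cast<long long>(raw));
    return false;
}

// Wrap results of Double computations with this in a checking build.
#define CHECK_16_16(raw) CheckRange((raw), __FILE__, __LINE__)
```

An integer part of 32767 passes, while 32768 or a value like 600000 gets 
reported with its file and line, which is what would help track down the 
places that really need 48.16.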

> If
> nothing helps you can change those places to calculate in a different unit:
> e.g. changing
> Double distance_in_mm = 1000000;
> to
> Double distance_in_m = 1000;

Indeed, for some computations, positions are multiplied by 40, so in a squared 
computation:
(40*a)²+(40*b)² = 1600*(a²+b²). The integer part of (a²+b²) has to be < 
32767/1600 (about 20.5) to fit in signed 16 bits. a and b must consequently be 
very low (below about 4.5 in magnitude) to not overflow the 16-bit integer 
storage.
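
The bound above can be checked with integer arithmetic only. This is a small 
sketch that ignores the fractional bits; ScaledSquaredNorm and FitsInteger16 
are hypothetical names:

```cpp
#include <cstdint>

// Largest integer part a signed 16.16 value can hold.
constexpr std::int64_t kMaxInt16_16 = 32767;

// Integer-only version of (40*a)^2 + (40*b)^2.
constexpr std::int64_t ScaledSquaredNorm(std::int64_t a, std::int64_t b) {
    return (40 * a) * (40 * a) + (40 * b) * (40 * b);
}

// True when the result still fits in the 16-bit signed integer part.
constexpr bool FitsInteger16(std::int64_t a, std::int64_t b) {
    return ScaledSquaredNorm(a, b) <= kMaxInt16_16;
}

static_assert(ScaledSquaredNorm(1, 0) == 1600, "scale factor is 40^2");
static_assert(FitsInteger16(4, 2), "1600 * 20 = 32000 fits");
static_assert(!FitsInteger16(4, 3), "1600 * 25 = 40000 overflows");
```

So a position as small as (4, 3) already overflows once multiplied by 40 and 
squared.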

Christophe

_______________________________________________
Wormux-dev mailing list
Wormux-dev@gna.org
https://mail.gna.org/listinfo/wormux-dev
