from:"J. William Campbell"

On 5/26/2011 9:33 PM, Graeme Russ wrote:
 Hi Bill,

snip
 get_ticks() does not care about the clock rate - It simply looks at the
 current value of the hardware tick counter and the value of the hardware
 tick counter the last time get_ticks() was called, calculates the difference
 and adds that to the 64-bit software tick-counter
 I don't see how it being a down counter makes that any more difficult

 This is neither simple nor maintainable. Further, it is un-necessary, as the
 sync_timer routine is just going to convert the number from whatever radix
 we converted it to into millisec.  If we leave the two numbers as split, all
 that complexity is removed from get_ticks and sent upward to the common
 routine that converts the answer into ms anyway. This makes the system more
 maintainable by placing the minimum requirement on get_ticks. The tick
 should be opaque to anybody but sync_timebase anyway.
 But how is the common routine going to know if it is a split timer, up
 timer, down timer, little endian, big endian, etc, etc.

 get_ticks() abstracts the hardware implementation of the hardware timer
 from sync_timer()
Hi All,
 I understand your point. I prefer a higher level of abstraction. 
You are correct that there are some aspects of the tick counter that are 
very hardware quirky, and these attributes are hard to abstract. If the 
timer is embedded into a bit field with other variables, it is 
reasonable to expect get_ticks to extract the bit field and right 
justify the number. If there are endian issues, the get_ticks routine 
must return the number in the natural endianness of the platform. 
However, after that point, the values are extremely regular. The fact 
that a counter is a down counter can be expressed in a data structure as 
a boolean. The high limit of the hardware counter is a number. The 
number of ticks per millisecond is obtainable from usec2ticks(1000), or 
1 if we want to avoid some roundoff. From these values, sync_timer 
can take the two part ticks value and convert it to millisec. Trust me 
on this. I have the routines to do it. This puts as much of the 
abstraction of the two numbers into ONE COMMON ROUTINE, sync_timer. Now 
it is clearly possible to move some of the abstraction down a level into 
get_ticks. For instance, you could move inverting the counter down to 
that level, and then multiply the msb by the maximum value of the lsb 
counter and add in the lsb. It is clearly possible to move ALL of 
sync_timer down into get_ticks, if one wanted to. It is clearly possible 
to replace general values in gd-> with platform-specific constant 
values. However, if you do that, you end up with a lot of duplicate, or 
almost duplicate, code running around. That has proven to be error 
prone, and it has left new ports of u-boot to sort of fend for 
themselves in figuring out how things should work. I prefer to abstract 
that all up into sync_timer. That way, all the math is in one place, and 
is table driven so it is easy to change.
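
To illustrate the table-driven idea described above, here is a minimal sketch in C. The struct timer_desc, its field names, and ticks_to_ms() are hypothetical names chosen for this example; they are not taken from the U-Boot source or from the routines mentioned above. For brevity the sketch uses one 64-bit intermediate value, which a real implementation might well avoid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-platform description of the hardware tick counter. */
struct timer_desc {
    bool     down_counter;  /* true if the hardware counts down */
    uint32_t max_count;     /* highest value the counter holds before wrapping */
    uint32_t ticks_per_ms;  /* tick rate in ticks per millisecond */
};

/*
 * Convert a two-part tick value (rollover count plus raw counter reading)
 * to milliseconds, using only the values in the descriptor.
 */
static uint32_t ticks_to_ms(const struct timer_desc *d,
                            uint32_t rollovers, uint32_t raw)
{
    assert(d->ticks_per_ms != 0 && raw <= d->max_count);

    /* Invert a down counter so the value always increases. */
    uint32_t count = d->down_counter ? d->max_count - raw : raw;

    /* Each rollover is worth max_count + 1 ticks. */
    uint64_t total = (uint64_t)rollovers * ((uint64_t)d->max_count + 1) + count;

    return (uint32_t)(total / d->ticks_per_ms);
}
```

With a counter that wraps at 65000 and ticks 1000 times per millisecond, two rollovers plus a reading of 500 come to 130500 ticks, which this sketch reports as 130 ms.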
 The top 32 bits are the rollover count and the bottom 32 bits are the
 current counter. If the counter is a full 32 bits, so much the better.
 Ah - Lets keep it that way

 Again, one could put them together inside the interrupt routine , but it
 is
 easier to check for a changed value if you don't do this. Otherwise, you
 have to check both words. It also makes the isr faster. It is just an
 As I said before - Simple First, Fast Later
 I am in favor of simple. That is why I want get_ticks to do as little as
 possible. It should just read the hardware register and the overflow counter
 if it is separate. Make sure the overflow didn't change while we were
 reading. This is redundant if we are not using interrupts but we can leave
 the code in. It just won't do anything.  We can also move the rollover
 detection to sync_timebase. It will be redundant if we are using interrupts,
 because time will never back up. But we can do it this way. This
 centralizes the overflow detection, which is a good thing.
 That does not sound simple to me. This, however, does:

 u64 get_ticks(void)
 {
   static u64 last_hw_tick_count;
   static u64 last_sw_tick_count;

   /* Now for the platform specific stuff - Read hardware tick counter */
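   /* read_hw_ticks() is a placeholder name for the platform-specific read */
   u64 current_hw_tick_count = read_hw_ticks(); /* Read hw registers etc */

   /* Deal with hardware weirdness - errata, stupid hw engineers etc */

   u64 elapsed_ticks = current_hw_tick_count - last_hw_tick_count;
   last_hw_tick_count = current_hw_tick_count; /* remember for the next call */
   last_sw_tick_count += elapsed_ticks;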

   return last_sw_tick_count;
 }

 The '/* Read hw registers etc */' bit will always be the same, no matter
 what way you do it
Agree.
 The '/* Deal with hardware weirdness - errata, stupid hw engineers etc */'
 bit is where we are truly abstracting the hardware away - This is the
 bit you propose to leave mangled and deal with in sync_time?
Not totally. The get_ticks routine must mask off any extra bits and 
right justify the hardware counter. If the counter is

Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/27/2011 12:28 AM, Wolfgang Denk wrote:
 Dear J. William Campbell,

 In message <4ddefdbc.7050...@comcast.net> you wrote:
 I really STRONGLY disagree with this statement. If you actually needed
 64 bit variables, fine use them. But as I have already shown, you do not
 need them in general.  We are computing a 32 bit result. There is some
 entropy argument that says you shouldn't need 64 bits to do so. Another
 way to look at it is that converting the top 32 bit word and the bottom
 32 bit word to ms separately can be easier than doing them both together
 at once.  However, as we will see below, I do agree we need two 32 bit
 words to make this process go smoothly. I just don't agree that they
 should/will constitute a 64 bit binary word. See below.
 And I disagree with this.

Hi Wolfgang,
 OK, I hear you.
 Yes, that is the problem. I have come to the view that  two 32 bit words
 are the best approach. Note that the lsb may actually not fill the full
 32 bits. The top 32 bits are the rollover count and the bottom 32 bits
 are the current counter. If the counter is a full 32 bits, so much the
 better. Again, one could put them together inside the interrupt routine
 , but it is easier to check for a changed value if you don't do this.
 It's even easier if you use a single 64 bit variable, because then you
 can simply use ==.


 In general, no, you can't, or at least you probably don't want to. 
If you are reading a 64 bit performance counter, it is quite likely that 
you cannot read it twice without the clock having ticked. If the CPU 
executes 1 instruction (or fewer, if an SPR/memory reference is 
involved?) per performance counter tick, which is the goal of the 
performance counter, == is an infinite loop. A similar condition 
exists if you are combining a software counter with a fairly fast 
hardware counter. It might require flipping the hardware counter (if it 
is a down counter) and a 64 bit multiply add, which must be done in 
software/a subroutine if the cpu has no 64 by 64 multiply. By the time 
that is done, the timer LSB may have ticked. Consider the m68K case.
 Otherwise, you have to check both words. It also makes the isr faster.
 Drop any thoughts about make FOO faster for now, please.  Especially
 at this stage it is much more important to have a simple and clean
 design.  If split in two variables, even a simple read access will
 turn into code like

   do {
   upper  = timebase_upper;
   lower  = timebase_lower;
   } while (upper != timebase_upper);

 This is not exactly as simple as you claimed.

 True, but if you look at a lot of 64 bit performance counters, that 
is EXACTLY what the handbook recommends for reading them. There 
is no atomic way to read them both at once, and reading one half doesn't 
freeze the other half. This code is also required if timebase_upper is 
altered in the interrupt routine.  YMMV, but in a lot, dare I say most, 
cases this is required anyway.  And while the code is more complex than 
a simple assignment statement, it is not very complex.
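
For reference, the read-retry pattern quoted above can be written as a small self-contained function. This is a sketch only: read_timebase() and its pointer parameters are hypothetical stand-ins for whatever registers or variables hold the two halves on a given platform.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read a 64-bit timebase that is only accessible as two 32-bit halves.
 * The upper half may change (hardware carry or ISR update) between the
 * two reads, so re-read it and retry if it moved.
 */
static uint64_t read_timebase(const volatile uint32_t *upper_reg,
                              const volatile uint32_t *lower_reg)
{
    uint32_t upper, lower;

    assert(upper_reg && lower_reg);

    do {
        upper = *upper_reg;
        lower = *lower_reg;
    } while (upper != *upper_reg);  /* upper changed: lower may be stale */

    return ((uint64_t)upper << 32) | lower;
}
```

The loop compares only the upper half, so it terminates even when the lower half ticks on every read.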

Best Regards,
Bill Campbell

 Best regards,

 Wolfgang Denk


___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/27/2011 12:33 AM, Wolfgang Denk wrote:
 Dear J. William Campbell,

 In message <4ddf2072.5090...@comcast.net> you wrote:
 ...
 The problem is that the way we previously detected wrapping does not
 work if the interrupt rate is == to the counter wrap time, which it
 essentially always is. If get_ticks is trying to update the wrap count
 You ignore the fact that this is only ever a problem when the rollover
 cannot signal through an interrupt or similar.  Also, some processors
 allow daisy-chaining of timers, etc.

 Again, I would really like to know about how many exotic systems we
 are talking that fulfil your worst-case expectations.  I bet the
 overwhelming majority behaves absolutely harmless.
Hi Wolfgang,
 I think that in fact the opposite is true. The problems occur if 
both the main program and the interrupt routine are trying to update the 
timer msb using the same code, as we were originally talking about. 
There is no problem if only the interrupt routine detects the rollover. 
That is the correct way to go if your interrupts work. There was nothing 
particularly exotic required. It was the normal case. Take a look at 
what would happen on the PPC if the main program was reading the 
decrementer, detecting wraps and increasing the timestamp while the 
interrupt routine was also incrementing the timestamp. Every so often 
you get a double increment. Why were we doing this? Because I was trying 
to re-use exactly the same code in the interrupt case and the 
non-interrupt case. Not a good idea, in fact a bad idea as it turns out.
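
The double increment can be shown with a deliberately simplified model. This sketch assumes an up counter and shows just one interleaving in which both the polling path and the interrupt path count the same wrap; the struct and function names are hypothetical, and real hardware such as the PPC decrementer counts down instead.

```c
#include <assert.h>
#include <stdint.h>

struct timebase {
    uint32_t msb;       /* software rollover count */
    uint32_t last_lsb;  /* last counter value seen by the polling path */
};

/* Polling path: infers a wrap when the up-counter value goes backwards. */
static void poll_update(struct timebase *tb, uint32_t lsb_now,
                        int poll_counts_wraps)
{
    assert(tb);
    if (poll_counts_wraps && lsb_now < tb->last_lsb)
        tb->msb++;
    tb->last_lsb = lsb_now;
}

/* Interrupt path: the hardware signals the wrap directly. */
static void wrap_isr(struct timebase *tb)
{
    tb->msb++;
}

/* One hardware wrap, observed by both paths; returns the rollover count. */
static uint32_t simulate_one_wrap(int poll_counts_wraps)
{
    struct timebase tb = { 0, 0 };

    poll_update(&tb, 0xFFFFFFF0u, poll_counts_wraps); /* just before the wrap */
    wrap_isr(&tb);                                    /* wrap: interrupt fires */
    poll_update(&tb, 0x00000010u, poll_counts_wraps); /* main code polls again */
    return tb.msb;
}
```

In this model simulate_one_wrap(1) counts the single wrap twice, while simulate_one_wrap(0), where only the interrupt routine updates the rollover count, counts it once.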

Best Regards,
Bill Campbell
 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/27/2011 12:35 AM, Graeme Russ wrote:
 Hi Wolfgang,

 On 27/05/11 17:13, Wolfgang Denk wrote:
 Dear Graeme Russ,

 In message <banlktinwvy9b4qzelnawf7mkt9z1zem...@mail.gmail.com> you wrote:
 I think we will need to define get_timer() weak - Nios will have to
 override the default implementation to cater for its (Nios') limitations
 Please don't  - isn't the purpose of this whole discussion to use
 common code for this ?

 Yes, but Nios is particularly bad - It has a 10ms tick counter :(
Hi All,
  And a hardware timer that you can't read to subdivide the 10 ms. 
Note that this is not necessarily a problem with all NIOS 
implementations. The timer characteristics can be controlled when you 
generate the bitstream for the FPGA. You can make the counter both 
faster and readable if you want. It just uses a bit more silicon. Sad to 
say, it probably will require a per-board get_ticks routine. For the 
old nios2 timers however, overriding get_timer with a /board routine 
is probably the only way to go.

 I don't see a reason to hamstring other platforms when a simple weak
 function can get around it
Agree.
Best Regards,
Bill Campbell
 Regards,

 Graeme




Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/27/2011 6:07 AM, Scott McNutt wrote:
 Graeme Russ wrote:
 Hi Wolfgang

 On Friday, May 27, 2011, Wolfgang Denk w...@denx.de wrote:
 Dear Graeme Russ,

 In message banlktik2sum4sm8aljcrcmz+kcmgwge...@mail.gmail.com you 
 wrote:
 Besides, Nios can return an increment of 10 (presumably ms) between
 two immediately consecutive calls. This causes early timeouts in CFI
 driver
 Now this in turn is a bug in the timer implementation that needs to be
 fixed.

 And this is what reset_timer() corrected.

 Agreed, but that is not something I can achieve - I don't want to hold
 up this whole show that we have all put so much effort into for the
 sake of one weak function

 And I don't want to see something that currently works become broken
 because we improved a feature ... simply because the resolution of
 the timestamp is 10 msec rather than 1 msec.

 And just to be clear. This is not a Nios issue. Currently, if the
 timestamp is incremented via a fixed period interrupt, and the period
 of the interrupt is longer than 1 msec, calls to get_timer() may
 produce early timeouts ... regardless of platform.
Hi All,
 A more precise statement of the problem is that all timer delays 
may be shortened by the timer resolution. So this means that if you have 
a timeout of 1 ms in your get_timer(0) {   } while ( ... < 1), then your 
actual delay may be anywhere between 0 and 1 ms. The problem arises when 
some piece of common code uses a delay of say 8 millisec, expecting the 
actual delay to be between 7 and 8. If the resolution is 10 ms, the 
delay will be between 0 and 10 ms, 0 being particularly bad. This can be 
fixed in get_timer, making the 8 ms delay  become a minimum of 10 ms at 
the expense of it becoming up to 20 ms sometimes. Since these delays are 
used mostly for error conditions, making them longer will probably be 
ok, and doesn't require changing any of the common code. It probably 
will not make things slower either, because the error timeouts should 
not be reached. The reset of the hardware timer would cause all short 
delays to become 10 ms. This reset approach is bad in that it prevents 
proper nesting of timing loops. However, in this case it isn't so bad, 
in that the nested loops are just extended, not shortened. Note that if 
the reset is only resetting the HARDWARE interrupt generator, not the 
actual timestamp itself, we are just extending all existing timeouts by 
0 to 10 ms. So this just lengthens all pending timeouts. The other fix 
is in my opinion nicer, because it affects the nested loops less. If the 
inner loop is executed 100 times, with the reset, the outer loop timeout 
is extended by up to 1000 ms.
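
The bounds quoted above can be checked with a small model. This sketch assumes a timestamp that advances by the timer resolution at each interrupt, and a polling loop that exits once the elapsed timestamp reaches the requested timeout. The function actual_delay_ms() and its parameters are hypothetical names used only for illustration.

```c
#include <assert.h>

/*
 * Real time (in ms) until a polling loop exits, given:
 *   phase_ms   - how long after the most recent timer interrupt the loop starts
 *   timeout_ms - the elapsed-timestamp value the loop waits for
 *   res_ms     - the timer resolution (interrupt period)
 */
static unsigned actual_delay_ms(unsigned phase_ms, unsigned timeout_ms,
                                unsigned res_ms)
{
    assert(res_ms != 0 && phase_ms < res_ms);

    /* Number of interrupts needed before the timestamp has advanced enough. */
    unsigned ticks_needed = (timeout_ms + res_ms - 1) / res_ms;

    if (ticks_needed == 0)
        return 0;

    /* The first interrupt arrives early; each later one takes a full period. */
    return (res_ms - phase_ms) + (ticks_needed - 1) * res_ms;
}
```

For an 8 ms timeout at 10 ms resolution the model gives a delay between 1 and 10 ms for whole-millisecond phases, approaching 0 as the start moves closer to the next interrupt. Padding the timeout by the resolution (8 + 10 = 18) gives between 11 and 20 ms, which matches the "minimum of 10 ms, up to 20 ms" figure.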

Best Regards,
Bill Campbell

 --Scott





Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/26/2011 11:54 PM, Graeme Russ wrote:
 On Fri, May 27, 2011 at 4:33 PM, J. William Campbell
 jwilliamcampb...@comcast.net  wrote:
 On 5/26/2011 9:33 PM, Graeme Russ wrote:
 Hi Bill,

 snip
 [massive snip]

 OK, you have my ears pricked - Can you give me code samples for:

   - get_ticks()
   - sync_timebase() (no need to implement the whole lot if that is too
 much effort right now)
   - timer_isr()

 that handle the following hardware tick counter scenarios:

 a) 64-bit up counter
 b) 64-bit down counter
Hi Graeme,
 c) 32-bit up counter, wraps at 65000
Do you mean 32 bits or 16 bits? It doesn't make much difference, but just 
checking.
 d) 16-bit microsecond up counter 0-999 which wraps and triggers a 16-bit
 millisecond up counter. Reading milliseconds latches microseconds and
 clears milliseconds (look in arch/x86/cpu/sc520/timer.c for example)
 e) 24-bit counter occupying bits 2-25 of a 32-bit word (just to be
 difficult)
 f) Any other option anyone cares to throw ;)

 All of these options must be covered using:
   - Minimal global data (we would like it to work before relocation, but
 not mandatory - GD footprint would be nice)
   - All use the same sync_timebase function
   - Demonstrate using an ISR NOT synced to the hardware tick counter and
 an ISR that is
   - Parameters to get_ticks() and sync_timer() are permitted, but not for
 timer_isr() (naturally)


 OK! Once again, I accept the challenge. One Caveat. My real work 
has been sliding due to the time I have been spending on this. I am 
flying from San Francisco to Sydney tonight (to work, not to play), 
so I will be off-grid for 14 hours+. You will not get this code for a 
few days, like probably 3 days. I have looked at the requirements, and I 
see no real problems that I don't know how to solve.
 I don't see any reason to push this down to a lower level. It is just one
 more thing to get messed up across implementations.
 Agreed

 detection in the non-interrupt case to sync_timebase as well.
 Sync_timebase
 can also invert the down-counting counters, removing that from get_ticks.
 The wrap detection code can be #ifdef out if one is using interrupts and
 Urghh - Why are you adding unnecessary ugliness - #ifdef in the middle of
 code is usually a sign you are doing something wrong
 As I said, this is an optional optimization. I do not agree that an #ifdef
 in the middle of code indicates you have a bad design. Lots and Lots of
 ifdefs certainly indicates a bad design. An ifdef to eliminate code if some
 option is not selected is hardly such a strange thing, especially only a
 single #ifdef. However, feel free to not have it if you so desire.
 OK, I'll let this slide for the moment - please include in above example
Will Do.
 offended by its presence. Thanks for pointing this out and compelling me
 to
 reduce the number of cases! Making get_ticks more lightweight is a good
 idea
 in my opinion.
 Lets say you have a platform with a 32-bit tick counter running at a
 reasonably long rollover time so you decide not to use interrupts. Then
 you create a new platform with the same tick counter, but it runs much
 faster and you realise you need to implement the interrupt routine to
 make get_timer() work for long enough periods - Fine, you add an ISR
 for the new platform that calls sync_timebase - No other changes are
 required.

 The last thing we want is for the 64-bit tick counter to be conceptually
 different across platforms

 I just realised - the ISR _does not need to call the sync_timebase at
 all_
 The ISR only needs to call get_ticks(), so it will be fast anyway
 The problem is that the way we previously detected wrapping does not work
 if
 the interrupt rate is == to the counter wrap time, which it essentially
 always is. If get_ticks is trying to update the wrap count when an
 interrupt
 Is it, always, on every platform?
 Yes, pretty much. You get a terminal count/counter underflow interrupt and
 that is it.
 Not on sc520 - The micro/millisecond counter cannot be used to drive an
 interrupt - you need to set up a separate timer. I think the x86 internal
 performance counters are the same
What is true, as you have stated, is that the micro/millisecond counter 
on the sc520 does not interrupt at all. Nor do the x86 performance 
counters. The x86 performance counters are a non-problem because they 
are 64 bits long. We don't need interrupts for them. Now, if you choose 
to use the sc520 micro/millisecond counter, then you need another source 
of interrupts. Due to the fact that reading the sc520 counter resets it, 
we must accumulate the elapsed time in software. That means the 
interrupt routine must do a bit more work, but it also allows reading 
the counters in non-interrupt code (with interrupts disabled) to not 
mess up the accumulated count. We don't detect rollover in the sc520 
counters, we just read and accumulate the value. So no problem there.
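
A minimal sketch of the read-and-accumulate approach follows. The hardware is modelled by a plain struct so the example is self-contained; fake_ms_counter, ms_accumulator, and both function names are hypothetical and do not correspond to the actual sc520 register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a read-to-clear hardware millisecond counter. */
struct fake_ms_counter {
    uint16_t pending_ms;
};

/* Software total, kept because the hardware forgets on every read. */
struct ms_accumulator {
    uint32_t total_ms;
};

/* Model of the hardware behaviour: a read returns the count and clears it. */
static uint16_t read_and_clear_ms(struct fake_ms_counter *hw)
{
    uint16_t ms = hw->pending_ms;

    hw->pending_ms = 0;
    return ms;
}

/*
 * Fold whatever the hardware has counted since the last read into the
 * software total. Intended to be called from the periodic interrupt and
 * from non-interrupt code with interrupts disabled.
 */
static uint32_t accumulate_ms(struct ms_accumulator *acc,
                              struct fake_ms_counter *hw)
{
    assert(acc && hw);
    acc->total_ms += read_and_clear_ms(hw);
    return acc->total_ms;
}
```

Because every read is added to the total, an extra read from non-interrupt code loses nothing, and no rollover detection is needed as long as the counter is read before it wraps.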
 comes in, it will do it wrong. If the interrupt

Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/27/2011 8:44 AM, Scott McNutt wrote:
 J. William Campbell wrote:
 On 5/27/2011 6:07 AM, Scott McNutt wrote:
 Graeme Russ wrote:
 Hi Wolfgang

 On Friday, May 27, 2011, Wolfgang Denk w...@denx.de wrote:
 Dear Graeme Russ,

 In message banlktik2sum4sm8aljcrcmz+kcmgwge...@mail.gmail.com 
 you wrote:
 Besides, Nios can return an increment of 10 (presumably ms) between
 two immediately consecutive calls. This causes early timeouts in CFI
 driver
 Now this in turn is a bug in the timer implementation that needs 
 to be
 fixed.

 And this is what reset_timer() corrected.

 Agreed, but that is not something I can achieve - I don't want to hold
 up this whole show that we have all put so much effort into for the
 sake of one weak function

 And I don't want to see something that currently works become broken
 because we improved a feature ... simply because the resolution of
 the timestamp is 10 msec rather than 1 msec.

 And just to be clear. This is not a Nios issue. Currently, if the
 timestamp is incremented via a fixed period interrupt, and the period
 of the interrupt is longer than 1 msec, calls to get_timer() may
 produce early timeouts ... regardless of platform.
 snip
 This can be fixed in get_timer, making the 8 ms delay  become a 
 minimum of 10 ms at the expense of it becoming up to 20 ms sometimes.

 Ok. Now I get it. Thanks.

 snip
 This reset approach is bad in that it prevents proper nesting of 
 timing loops.

 In my particular case, because reset_timer() resets the timestamp
 to zero rather than simply restarting the timer. I believe leaving
 the timestamp alone would solve the nesting problem.

 snip
 The other fix is in my opinion nicer, because it affects the nested 
 loops less. If the inner loop is executed 100 times, with the reset, 
 the outer loop timeout is extended by up to 1000 ms.

 Bill, thank you for explaining -- probably for the nth time -- but it
 did finally sink in.
Hi Scott,
  Glad to help, I finally think I understand it myself in looking 
into it further! I think we have a good way ahead that should keep 
everything working. We will get you an alpha copy of whatever we do as 
soon as possible so you can verify we didn't break nios2!
Best Regards,
Bill Campbell

 Regards,
 --Scott




Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/27/2011 8:13 AM, Simon Glass wrote:
 On Fri, May 27, 2011 at 8:00 AM, J. William Campbell
 jwilliamcampb...@comcast.net  wrote:
 [snip]
 Hi All,
  A more precise statement of the problem is that all timer delays
 may be shortened by the timer resolution. So this means that if you have
 a timeout of 1 ms in your get_timer(0) {   } while ( ... < 1), then your
 actual delay may be anywhere between 0 and 1 ms. The problem arises when
 some piece of common code uses a delay of say 8 millisec, expecting the
 actual delay to be between 7 and 8. If the resolution is 10 ms, the
 delay will be between 0 and 10 ms, 0 being particularly bad. This can be
 fixed in get_timer, making the 8 ms delay  become a minimum of 10 ms at
 the expense of it becoming up to 20 ms sometimes. Since these delays are
 used mostly for error conditions, making them longer will probably be
 ok, and doesn't require changing any of the common code. It probably
 will not make things slower either, because the error timeouts should
 not be reached. The reset of the hardware timer would cause all short
 delays to become 10 ms. This reset approach is bad in that it prevents
 proper nesting of timing loops. However, in this case it isn't so bad,
 in that the nested loops are just extended, not shortened. Note that if
 the reset is only resetting the HARDWARE interrupt generator, not the
 actual timestamp itself, we are just extending all existing timeouts by
 0 to 10 ms. So this just lengthens all pending timeouts. The other fix
 is in my opinion nicer, because it affects the nested loops less. If the
 inner loop is executed 100 times, with the reset, the outer loop timeout
 is extended by up to 1000 ms.

 Best Regards,
 Bill Campbell
 Hi Bill,

 Yes I agree that this is ugly - I didn't realize that this is what
 reset_timer() does, but I think these 10ms platforms should have to
 live with the fact that timeouts will be 0-10ms longer than hoped.
 Perhaps reset_timer() should become a non-standard board thing that is
 deprecated. Really if you have a 10ms timer and are asking for a 10ms
 timeout you are being a bit hopeful.
Hi All,
 Yes, but the person writing the driver was writing common code. 
He probably didn't even know there was a timer whose resolution was not 
1 ms.
 But perhaps this argues for a function to check timeouts - at the
 moment get_timer() returns the time since an event and it is used at
 the start of the loop and the end. Perhaps we should have:

 #define TIMEOUT_MS 2000

 stop_time = get_future_time(TIMEOUT_MS);  // Returns current time +
                                           // TIMEOUT_MS + (resolution of timer)
 while (get_timer(stop_time) < 0) {  // (I would much prefer while
                                     // (!timed_out(stop_time))
         wait for something
 }

 Regards,
 Simon
In the existing system, you can get the same result by running the while 
loop with a condition of (get_timer(base) < TIMEOUT_MS + TIMER_RESOLUTION).
We could just make TIMER_RESOLUTION a mandatory define for all u-boots. 
Then common code would be wrong if the TIMER_RESOLUTION were omitted. 
For all I know, there may be such a define already. Anybody know of one?
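
As a sketch of how such a define could be used, the helper below pads the requested timeout by the timer resolution. TIMER_RESOLUTION is given an example value of 10 ms here, and timed_out() is a hypothetical helper, not an existing U-Boot function; it takes plain millisecond timestamps so that it can be tried without any hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TIMER_RESOLUTION 10u  /* ms; example value for a 10 ms tick platform */

/*
 * Return true once at least timeout_ms has certainly elapsed between
 * base_ms and now_ms, allowing for the coarseness of the timestamp.
 * The unsigned subtraction stays correct when the timestamp wraps.
 */
static bool timed_out(uint32_t now_ms, uint32_t base_ms, uint32_t timeout_ms)
{
    assert(timeout_ms <= UINT32_MAX - TIMER_RESOLUTION);

    return (uint32_t)(now_ms - base_ms) >= timeout_ms + TIMER_RESOLUTION;
}
```

With an 8 ms timeout, a timestamp that has advanced by 10 ms is not yet treated as expired, while one that has advanced by 20 ms is.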

Best Regards,
Bill Campbell


Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/26/2011 6:27 AM, Graeme Russ wrote:
 Hello Everyone,

 OK - Starting a new thread to discuss implementation details. This is a
 heads-up for arch/platform maintainers - Once this is a bit more stable, I
 will put it on the wiki

 Assumed Capabilities of the Platform
   - Has a 'tick counter' that does not rely on software to increment
Hi All,
The nios2 with the most basic timer does not meet this 
requirement. It will not count at all without the 10 ms interrupt. I 
don't think this requirement matters anyway. We need a 'tick counter' 
that 'ticks'. If it takes software to make it tick, we don't much care. 
There may be problems with early use of udelay in that case, but that is 
a different issue.
   - tick interval may be a fixed constant which cannot be controlled
 via software, or it could be programmable (PIT)

 API Functions (/lib/timer.c)
   - u32 get_timer(u32 start)
  - Returns the number of elapsed milliseconds since 'start'

 API Functions (/arch/...)
   - void udelay(u32 delay)
  - Used for 'short' delays (generally up to several seconds)
  - Can use the tick counter if it is fast enough
  - MUST NOT RESET THE TICK COUNTER
There is a requirement that udelay be available before relocation and 
before the BSS is available. One can use the tick counter to provide 
udelay as long as sync_timebase is not called OR sync timebase does not 
use BSS. It appears many implementations ignore this requirement at 
present. We should try to fix this, but it should not be a requirement.
 'Helper' Functions (/lib/timer.c)
I think this function should be weak, so that it is possible for people 
to override it with a custom function. The fully general sync_timebase 
has lots of code in it that can be simplified in special cases. We want 
and need a fully general function to be available, but other users who 
are real tight on space may want a cut down version. We should make that 
easily possible.
   - void sync_timebase(void)
  - Updates the millisecond timer
  - Utilises HAL functions to access the platform's tick counter
  - Must be called more often than the rollover period of the
platform's tick counter
  - Does not need to be called with a regular frequency (immune
to interrupt skew)
  - Is always called by get_timer()
  - For platforms with short tick counter rollovers it should
be called by an ISR
  - Bill Campbell wrote a good example which proved this can be common
and arbitrary (and optionally free of divides and capable of
maintaining accuracy even if the tick frequency is not an even
division of 1ms)
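
To make the accuracy point concrete, here is a minimal sketch of a millisecond update that carries the remainder between calls, so nothing is lost when the tick rate is not a whole number of ticks per millisecond. It is not the example referred to above: it uses a divide, it assumes a free-running 32-bit tick counter, and timebase_state and sync_timebase_sketch() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state kept between calls. */
struct timebase_state {
    uint32_t last_ticks;  /* tick counter value at the previous call */
    uint32_t remainder;   /* part of (ticks * 1000) not yet worth 1 ms */
    uint32_t ms;          /* running millisecond count */
};

static uint32_t sync_timebase_sketch(struct timebase_state *s,
                                     uint32_t ticks_now,
                                     uint32_t ticks_per_sec)
{
    assert(ticks_per_sec != 0);

    /* Unsigned subtraction is correct across a wrap of a 32-bit counter. */
    uint32_t delta = ticks_now - s->last_ticks;
    s->last_ticks = ticks_now;

    /* Scale to milliseconds and keep what is left over for next time. */
    uint64_t scaled = (uint64_t)delta * 1000u + s->remainder;
    s->ms += (uint32_t)(scaled / ticks_per_sec);
    s->remainder = (uint32_t)(scaled % ticks_per_sec);

    return s->ms;
}
```

At 32768 ticks per second, 33 ticks is slightly more than 1 ms. The surplus is carried forward, so calls made at irregular intervals still add up to exactly 1000 ms after 32768 ticks.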

 HAL Functions (/arch/... or /board/...)
   - u64 get_ticks(void)
For what it's worth, I would like to propose using a (gasp!) typedef 
here. It seems to me there are a whole lot of cases where the max number 
of ticks is a u32 or less. In those cases, the wrap at 32 bits helps 
things a lot. If the tick counter is really 64 bits, the function of 
sync_timebase  is simply to convert the tick value  to millisec, and 
that is it. Otherwise, if it is 32 bits or less then some other actions 
will be required. I think this is one of those times where a typedef 
would help, We could define a type called timer_tick_t to describe this 
quantity. That would allow a pure 32 bit implementation where appropriate.
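
A sketch of what such a typedef might look like is shown below. The configuration macro TIMER_TICKS_64BIT and the helper functions are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#ifdef TIMER_TICKS_64BIT            /* hypothetical configuration switch */
typedef uint64_t timer_tick_t;
#else
typedef uint32_t timer_tick_t;      /* pure 32-bit implementation */
#endif

/* Ticks elapsed between two readings; correct across one wrap of the type. */
static timer_tick_t ticks_elapsed(timer_tick_t now, timer_tick_t then)
{
    return now - then;
}

/* Small self-check: a reading of 5 taken 21 ticks after "16 below wrap". */
static void ticks_elapsed_selfcheck(void)
{
    assert(ticks_elapsed(5, (timer_tick_t)-16) == 21);
}
```

Because the subtraction is done in the unsigned tick type, a counter that wraps at the full width of that type needs no special rollover handling.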

Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as 
well as a regular get_ticks. The lsb version would be used for udelay 
and could possibly come from another timer if that was 
necessary/desirable. See the requirement for early udelay early 
availability.
  - Returns a tick count as an unsigned 64-bit integer
  - Abstracts the implementation of the platform tick counter
(platform may be a 32-bit 3MHz counter, might be a 16-bit
0-999 microsecond plus 16-bit 0-65535 millisecond etc)
   - u64 ticks_per_millisecond()
  - Returns the number of ticks (as returned by get_ticks()) per
millisecond
I think ticks_per_sec would be a better choice. First, such a function 
already exists in almost all u-boots. Second, if one wants the best 
accuracy for things like udelay, you need better accuracy than  
millisec. Since this function is used only infrequently, when things are 
initialized, a divide to get ticks_per_millsec (if that is what you 
really want) is no big deal. Lastly, I think this function can remain 
u32. Yes, there is a 4 GHz limit on the clock rate. If this ever becomes 
an issue, we can change the type to timer_tick_t. When the CPU clock 
rate gets quite high, it is an advantage to divide it down for 
performance measurement anyway. The AMD/Intel chips already do this. If 
the hardware doesn't do it, shift the timer value right two bits. I 
doubt you will miss the 0.4 nanoseconds resolution loss from your 10 GHz 
timestamp.
   - void timer_isr()
  - Optional (particularly if tick counter rollover period is
exceptionally log

Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/26/2011 12:16 PM, Wolfgang Denk wrote:
 Dear J. William Campbell,

 In message <4ddea165.9010...@comcast.net> you wrote:
 I think it is the task of get_ticks to return the hardware tick counter
 as an  increasing counter, period.  The counter may wrap at some final
 count that is not all ones. That is ok. Sync_timebase deals with the
 NO!  We want to be able to compute time differences using simple
 unsigned arithmetic, even after a rollover of the counter.  For this
 it is mandatory that the counter always gets only incremented until it
 wraps around at the end of its number range, and never gets reset
I agree that that is what must happen, but it should happen inside
 sync_timebase. Sync_timebase does what is needed to convert the
 less-than-fully capable counters into a fully capable one. You could
 I think you also want this behaviour for get_ticks().
Hi Wolfgang,
 I understand why that might be nice. But to do that with common 
code would require get_ticks to call a generic routine (say sync_ticks) 
that would expand the counter to 64 bits. Note that this is not always 
totally trivial, as the timer may roll over at 10 ms or some other 
not-so-nice number. Then sync_timer would convert the 64 bit number to 
milliseconds. That approach will work. However, I think that is 
overkill, as we really want the result in milliseconds. If you look at 
the prototype sync_timer routine, you can see an example of how this is 
possible without resorting to 64 bit math. I think that avoiding the 64 
bit math on processors that don't have a 64 bit tick counter (and are 
therefore probably less capable) is worthwhile. I also think that the 
purpose of the get_time routine abstracting the time into milliseconds 
is to avoid dealing with ticks anywhere except in the timer routines. 
Presumably, nobody but sync_timer would ever have reason to call 
get_ticks. If that is not your thinking,  fine, we just disagree on that 
point, and having a sync_ticks to expand the tick counter certainly can 
be done.
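
The approach described above (turning a short, oddly wrapping tick counter into milliseconds without 64 bit math) can be sketched roughly as follows. This is only an illustration, not the prototype sync_timer routine mentioned in the thread: the names ms_state and sync_ms, and the parameters max_count and ticks_per_ms, are hypothetical, and the tick rate is assumed to be a whole number of ticks per millisecond.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state; a real port might keep this in global data. */
struct ms_state {
	uint32_t last_ticks;	/* hardware count seen on the previous call */
	uint32_t rem_ticks;	/* ticks not yet amounting to a full millisecond */
	uint32_t ms;		/* running millisecond count, wraps at 2^32 */
};

/*
 * Fold the ticks elapsed since the previous call into the millisecond count.
 *   now          - current hardware count, an up-counter in 0..max_count
 *   max_count    - last value before the counter wraps to 0
 *   ticks_per_ms - whole ticks per millisecond (assumed exact here)
 * Must be called at least once per wrap period of the hardware counter.
 */
static uint32_t sync_ms(struct ms_state *s, uint32_t now,
			uint32_t max_count, uint32_t ticks_per_ms)
{
	uint32_t delta;

	assert(ticks_per_ms > 0 && ticks_per_ms <= 0x80000000u);
	assert(now <= max_count);

	if (now >= s->last_ticks)
		delta = now - s->last_ticks;
	else	/* the counter wrapped once at max_count */
		delta = (max_count - s->last_ticks) + now + 1;
	s->last_ticks = now;

	/* Split into whole milliseconds and a carried tick remainder. */
	s->ms += delta / ticks_per_ms;
	s->rem_ticks += delta % ticks_per_ms;
	if (s->rem_ticks >= ticks_per_ms) {
		s->rem_ticks -= ticks_per_ms;
		s->ms++;
	}
	return s->ms;
}
```

With a 1 MHz counter that wraps every 10 ms (max_count = 9999, ticks_per_ms = 1000), a reading of 2500 followed by a reading of 1000 after one wrap accounts for 11000 ticks in total, i.e. 11 ms, using only 32 bit arithmetic.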
 To date, it has been shown conclusively that this process cannot be
 relied upon, or we wouldn't be having this discussion.  If we put that
 functionality inside sync_timebase, it is in one place and debuggable
 once. All sync_timebase requires to do this is ticks per second and
 maximum tick value. I do request that counters that decrement be negated
 in the get_ticks routine, but beyond that, it should be a simple read of
 the tick register(s).
 I think using ticks per second is not a good idea. It may exceed
 ULONG_MAX, and having to use 64 bit for all calculations is probably
 overkill.  The existing ticks2usec/usec2ticks work fine so far.
I certainly agree using 64 bits for all calculations is vast overkill. 
In fact, I think using 64 bit calculations on systems that have only a 
32 bit or less timer register is probably overkill. :-) However, to 
date, AFAIK, no processor has exceeded the u32 in ticks per second. As I 
pointed out in a previous e-mail, if they ever do this, we can just drop 
one or 2 bits off the 64 bit counter and in millisecond resolution, 
nobody will ever know.  Also as previously pointed out, usec2ticks is 
not present yet in lots of implementations. Also, if the fundamental 
clock frequency is 32 kHz  (or anything less than 1 MHz), usec2ticks is 
0! This probably rules out using it to get ticks per millisec or ticks 
per sec.

Best Regards,
Bill Campbell

 Best regards,

 Wolfgang Denk


___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/26/2011 1:27 PM, Wolfgang Denk wrote:
 Dear J. William Campbell,

 In message 4ddeafe0.8060...@comcast.net you wrote:
 I certainly agree using 64 bits for all calculations is vast overkill.
 In fact, I think using 64 bit calculations on systems that have only a
 32 bit or less timer register is probably overkill. :-) However, to
 date, AFAIK, no processor has exceeded the u32 in ticks per second. As I
 Not yet. But it makes no sense to start a new design with settings
 already in critical range, especially since there is zero problem
 with breaking it down by a factor of 1000 or 1e6.

 pointed out in a previous e-mail, if they ever do this, we can just drop
 one or 2 bits off the 64 bit counter and in millisecond resolution,
 nobody will ever know.  Also as previously pointed out, usec2ticks is
 No. I will not accept a design that is so close on the edge of
 breaking.

 What is your exact problem with the existing interfaces ticks2usec()
 and usec2ticks() ?

 not present yet in lots of implementations. Also, if the fundamental
 clock frequency is 32 kHz  (or anything less than 1 MHz), usec2ticks is
 0! This probably rules out using it to get ticks per millisec or ticks
 per sec.
 The statement usec2ticks is 0 makes absolutely no sense as long as
 you don't say which argument you pass in.  You get a return value of
 0 even for a tick rate in the GHz range if you pass 0 as argument.

 However, usec2ticks(1000) or maybe usec2ticks(10) will probably
 return pretty useful values.

 [Note that by passing properly scaled arguments you can also avoid a
 number of rounding errors.]
Hi Wolfgang,
   Yes, you are correct. I was thinking usec2ticks(1), which is 
certainly not the way to do it. I am happy with usec2ticks and 
ticks2usec. That works for me. Sorry for the noise.
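
To make the scaling point concrete, here is a small sketch of how integer truncation affects a microsecond-to-tick conversion on a slow clock. The function usec2ticks_sketch and its tick_hz parameter are hypothetical stand-ins for illustration; this is not the U-Boot usec2ticks() implementation, and the 64 bit intermediate is used only to keep the sketch free of overflow.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for a usec-to-ticks conversion.
 *   usec    - interval in microseconds
 *   tick_hz - timer input frequency in Hz (hypothetical parameter)
 * Integer division truncates, so a small argument yields 0 whenever
 * the clock is slower than 1 MHz.
 */
static uint32_t usec2ticks_sketch(uint32_t usec, uint32_t tick_hz)
{
	assert(tick_hz > 0);
	return (uint32_t)(((uint64_t)usec * tick_hz) / 1000000u);
}
```

For a 32768 Hz clock, an argument of 1 truncates to 0 ticks, while an argument of 1000 gives 32 ticks (32.768 truncated), which is a usable ticks-per-millisecond figure.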

How about the first part of my response? Are you still thinking about it 
or is it just too bad for words :-) ?

Best Regards,
Bill Campbell


 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/26/2011 4:28 PM, Graeme Russ wrote:
 Hi Bill,

 On Fri, May 27, 2011 at 2:56 AM, J. William Campbell
 jwilliamcampb...@comcast.net  wrote:
 On 5/26/2011 6:27 AM, Graeme Russ wrote:
 Hello Everyone,

 OK - Starting a new thread to discuss implementation details. This is a
 heads-up for arch/platform maintainers - Once this is a bit more stable, I
 will put it on the wiki

 Assumed Capabilities of the Platform
   - Has a 'tick counter' that does not rely on software to increment
 Hi All,
The nios2 with the most basic timer does not meet this requirement. It
 will not count at all without the 10 ms interrupt. I don't think this
 requirement matters anyway. We need a 'tick counter' that 'ticks'. If it
 takes software to make it tick, we don't much care. There may be problems
 with early use of udelay in that case, but that is a different issue.
 I think we will need to define get_timer() weak - Nios will have to
 override the default implementation to cater for its (Nios') limitations
Hi All,
 Yes, that will probably be required here.
   - tick interval may be a fixed constant which cannot be controlled
 via software, or it could be programmable (PIT)

 API Functions (/lib/timer.c)
   - u32 get_timer(u32 start)
  - Returns the number of elapsed milliseconds since 'start'

 API Functions (/arch/...)
   - void udelay(u32 delay)
  - Used for 'short' delays (generally up to several seconds)
  - Can use the tick counter if it is fast enough
  - MUST NOT RESET THE TICK COUNTER
 There is a requirement that udelay be available before relocation and before
 the BSS is available. One can use the tick counter to provide udelay as long
 as sync_timebase is not called OR sync timebase does not use BSS. It appears
 many implementations ignore this requirement at present. We should try to
 fix this, but it should not be a requirement.
 If you really wanted to, sync_timebase() could use global data (it doesn't
 have many static variables) in which case all timer functions would be
 available before relocation
Yes, my implementation of the sync_timebase routine was written that 
way, using gd-> for the required variables.
 'Helper' Functions (/lib/timer.c)
 I think this function should be weak, so that it is possible for people to
 override it with a custom function. The fully general sync_timebase has
 lots of code in it that can be simplified in special cases. We want and need
 a fully general function to be available, but other users who are real tight
 on space may want a cut down version. We should make that easily possible.
 Agree

   - void sync_timebase(void)
  - Updates the millisecond timer
  - Utilises HAL functions to access the platform's tick counter
  - Must be called more often than the rollover period of the
platform's tick counter
  - Does not need to be called with a regular frequency (immune
to interrupt skew)
  - Is always called by get_timer()
  - For platforms with short tick counter rollovers it should
be called by an ISR
  - Bill Campbell wrote a good example which proved this can be common
and arbitrary (and optionally free of divides and capable of
maintaining accuracy even if the tick frequency is not an even
division of 1ms)

 HAL Functions (/arch/... or /board/...)
   - u64 get_ticks(void)
 For what it's worth, I would like to propose using a (gasp!) typedef here.
 It seems to me there are a whole lot of cases where the max number of ticks
 is a u32 or less. In those cases, the wrap at 32 bits helps things a lot. If
 the tick counter is really 64 bits, the function of sync_timebase  is simply
 to convert the tick value  to millisec, and that is it. Otherwise, if it is
 32 bits or less then some other actions will be required. I think this is
 one of those times where a typedef would help, We could define a type called
 timer_tick_t to describe this quantity. That would allow a pure 32 bit
 implementation where appropriate.

 Another suggestion is that perhaps we want a u32 get_ticks_lsb(void) as well
 as a regular get_ticks. The lsb version would be used for udelay and could
 possibly come from another timer if that was necessary/desirable. See the
 requirement for early udelay early availability.
 I think this all adds unnecessary complexity

  - Returns a tick count as an unsigned 64-bit integer
  - Abstracts the implementation of the platform tick counter
(platform may be 32-bit 3MHz counter, might be a 16-bit
0-999 microsecond plus 16-bit 0-65535 millisecond etc)
   - u64 ticks_per_millisecond()
  - Returns the number of ticks (as returned by get_ticks()) per
millisecond
 I think ticks_per_sec would be a better choice. First, such a function
 already exists in almost all u-boots. Second, if one wants the best accuracy
 for things like udelay, you need better accuracy than  millisec. Since this
 function is used only infrequently, when things

Re: [U-Boot] [RFC][Timer API] Revised Specification - Implementation details

On 5/26/2011 6:51 PM, Graeme Russ wrote:
 Hi Bill,

 On Fri, May 27, 2011 at 11:26 AM, J. William Campbell
 jwilliamcampb...@comcast.net  wrote:
 On 5/26/2011 4:28 PM, Graeme Russ wrote:
 Why mess around with bit shifting (which you would then have to kludge into
 your platform code) when carting around a 64-bit value is relatively cheap,
 transparent and portable (on all supported up-to-date tool chains)

 I really STRONGLY disagree with this statement. If you actually needed 64
 bit variables, fine use them. But as I have already shown, you do not need
 them in general.  We are computing a 32 bit result. There is some entropy
 argument that says you shouldn't need 64 bits to do so. Another way to look
 at it is that converting the top 32 bit word and the bottom 32 bit word to
 ms separately can be easier than doing them both together at once.  However,
 as we will see below, I do agree we need two 32 bit words to make this
 process go smoothly. I just don't agree that they should/will constitute a
 64 bit binary word. See below.
   - void timer_isr()
  - Optional (particularly if tick counter rollover period is
exceptionally long which is usually the case for native 64-bit tick
counters)
  - Simply calls sync_timebase()
  - For platforms without any tick counter, this can implement one
(but accuracy will be harmed due to usage of disable_interrupts()
 and
enable_interrupts() in U-Boot

 So to get the new API up and running, only two functions are mandatory:

 get_ticks() which reads the hardware tick counter and deals with any
 'funny
 stuff' including rollovers, short timers (12-bit for example), composite
 counters (16-bit 0-999 microsecond + 16-bit millisecond) and maintains a
 'clean' 64-bit tick counter which rolls over from all 1's to all 0's.
 The
 I think it is the task of get_ticks to return the hardware tick counter
 as
 an  increasing counter, period.  The counter may wrap at some final count
 that is not all ones. That is ok. Sync_timebase deals with the rollovers
 if
 The hardware tick counter may, the 64-bit software tick counter maintained
 by get_ticks() may not
 necessary. get_ticks is very lightweight. get_ticks should deal with
 decrementing counters by returning the complement of the counter.  The
 sc520
 case is a bit more complex if you intend to use the 0-999 and 16 bit
 millisec registers, in that you do need to add them to the previous value
 to
 As I mentioned in another post, this is a problem for the platform
 maintainer and is abstracted away through the platform specific
 implementation of get_ticks()

 make an increasing counter. Sync_timebase likes short counters in that
 they are easy to convert to millisec and tick remainders.
 The compiler should handle using 64-bit rather than 32-bit transparently
 True enough.  But you don't need 64 bit variables at this point two 32 bit
 ones work just fine, in fact better in most cases.
 Remember, we are not dealing with a high performance OS here. The primary
 goal is portability - Performance optimisations (which do not break
 portability) can be performed later

 64-bit tick counter does not need to be reset to zero ever (even on
 startup
 - sync_timebase takes care of all the details)
 True, but sync_timebase does have to be initialized (as does the timer
 itself in most cases, so this is not a restriction).
 This can be done in timer_init() via a call to sync_timebase() after the
 timer has been configured. This should bring everything into line

 ticks_per_millisecond() simply returns the number of ticks in a
 millisecond - This may be as simple as:

 inline u64 ticks_per_millisecond(void)
 {
         return CONFIG_SYS_TICK_PER_MS;
 }

 But it may be trickier if you have a programmable tick frequency
 You will have to call the routine that initializes sync_timebase. This
 routine should have a name, like void init_sync_timebase(void)?
 The optional timer ISR is required if the tick counter has a short roll
 over duration (short is up to you - 1 second is short, 1 hour might be,
 1
 century is not)

 Regards,

 Graeme

 It is probably true that sync_timebase should have a parameter flag. The
 reason is that if the timer isr is called only when the timer wraps, then
 the calls to sync_timebase may be slightly more than a full timer period
 apart. (due to interrupt latency). Therefore, when the timer difference
 is
 computed, if the current update is due to a wrap AND the previous update
 is
 due to a wrap, the difference should be approximately 1 wrap. If it comes
 up
 real short, you must add a wrap. This isn't necessary if the routine is
 called more often than once per wrap. Also, when sync_timebase is called
 in
 timer_isr() MUST be called more often than the rollover period of the
 underlying hardware tick counter - This is a requirement
 The equality case can be made to work.  If the extension of the counter is
 done in the interrupt routine, not in get_ticks, get_ticks just

Re: [U-Boot] [RFC] Review of U-Boot timer API


On 5/24/2011 10:17 PM, Wolfgang Denk wrote:

Dear J. William Campbell,

In message 4ddc31eb.6040...@comcast.net you wrote:
...

A tick is defined as the smallest increment of system time as measured by a
computer system (see http://en.wikipedia.org/wiki/System_time):

System time is measured by a system clock, which is typically
implemented as a simple count of the number of ticks that have
transpired since some arbitrary starting date, called the
epoch.

Unfortunately, this definition is obsolete, and has been for quite some

Do you have any proof for such a claim?

Hi Wolfgang,
 Well, yes, in fact the same reference you used. Note that the 
statement "A tick is defined as the smallest increment of system time as 
measured by a computer system" is NOT found in 
http://en.wikipedia.org/wiki/System_time. That page is defining system 
time, and what we are discussing is the definition of a tick. In fact, 
on the same wiki page you cite, there IS the statement "Windows NT 
counts the number of 100-nanosecond ticks since 1 January 1601 00:00:00 
UT as reckoned in the proleptic Gregorian calendar, but returns the 
current time to the nearest millisecond". Here "100-nanosecond ticks" 
clearly does not refer to any hardware 100 ns clock that exists on the 
PC. The 100 ns is a computed (dare I say virtual) tick value. The 
point here is that the definition of tick is yours, not wikipedia.org's. 
(Although we all are aware if it is wikipedia, it MUST be so).  Further, 
http://en.wikipedia.org/wiki/Jiffy_%28time%29#Use_in_computing contains 
the statement "In computing, a jiffy is the duration of one tick of the 
system timer interrupt." If a tick is the smallest increment of system 
time as measured by the computer system, the "of the system timer 
interrupt" part of the statement would be unnecessary. The fact it IS 
present indicates there are other kinds of ticks present in the universe.
AFAIK, all timers tick, and the definition of the tick rate is 
1/timer resolution. The concept of timer ticks and clock ticks has 
been around forever, and exists independent of System Time. For example, 
there may be a watchdog timer on a system that does not measure time at 
all, yet the watchdog still ticks. When you read a value from a timer 
that was not the system timer, what do you call the value you read? I 
would call it ticks, and I bet just about everybody else would too.
The only reason I feel this point matters at all is that when one 
refers to a routine called get_ticks, it is not obvious to me which 
timer ticks are being referenced. You are saying that, by definition, it 
refers to the system clock.  My feeling is that it is not obvious why 
that is so on a system that has many clocks. The name of the function or 
an argument to the function should, IMNSHO,  specify which timer ticks 
are being returned.


Best Regards,
Bill Campbell

years.  When computers had a single timer, the above definition worked,
but it no longer does, as many (most?) computers have several hardware
timers. A tick today is the time increment of any particular timer of
a computer system. So, when one writes a function called get_ticks on a
PPC, does one mean read the decrementer, or does one read the RTC or
does one read the TB register(s)? A similar situation exists on the
Blackfin BF531/2/3, that has a performance counter, a real-time clock,
and three programmable timers. Which tick do you want? For each u-boot

Please re-read the definition.  At least as far as U-Boot and Linux
are concerned, there is only a single clock source used to implement
the _system_time_.  And I doubt that other OS do different.


implementation, we can pick one timer as the master timer, but it may
not be the one with the most rapid tick rate. It may be the one with the
most USEFUL tick rate for get_timer. If you take the above definition at
face value, only the fastest counter value has ticks, and all other
counters' time increments are not ticks. If they are not ticks, what are
they?

Clocks, timers?


Best regards,

Wolfgang Denk




Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/25/2011 12:46 PM, Wolfgang Denk wrote:
 Dear Graeme Russ,

 In message banlktikm3lpynzcknp64kjeq5v+te7y...@mail.gmail.com you wrote:
 I hope to get an implementation agreed upon that does not require
 interrupts at all, provided a tick counter with sufficiently long roll
 over time is available (longer than the maximum expected period
 between 'related' get_timer() calls, for example calls to get_timer()
 in a timeout testing while loop). This 'bare minimum' implementation
 can be optionally augmented with an ISR which kicks the timer
 calculation in the background (typically by just calling get_timer())

 It really is quite simple in the end.
 The length of this thread shows that it is not as simple as you want
 to make us believe.



 If all you have in mind are timeouts, then we don't need get_timer()
 at all.  We can implement all types of timeout where we wait for some
 event to happen using udelay() alone.

 Need a 10 second timeout? Here we go:

   int cnt = 0;
   int limit = 10 * 1000;
   while (!condition) {
           udelay(1000); /* wait 1 millisec */
           if (++cnt > limit)
                   break;
   }
   if (cnt > limit)
           error("timeout");

 get_timer() comes into play only if we want to calculate a time
 difference, for example if we want to run some code where we don't
 know how long it runs, and later come back and check if a certain
 amount of time has passed.
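
For contrast with the udelay() loop, the time-difference use of get_timer() can be sketched as below. This is a hedged illustration only: fake_ms_now stands in for whatever millisecond counter the platform maintains, and get_timer_sketch and timed_out are hypothetical names, not existing U-Boot functions. The point it shows is that plain unsigned subtraction keeps the elapsed time correct across one rollover of the 32 bit millisecond counter.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the platform millisecond counter (hypothetical). */
static uint32_t fake_ms_now;

/* Same shape as get_timer(): milliseconds elapsed since 'start'. */
static uint32_t get_timer_sketch(uint32_t start)
{
	/*
	 * Unsigned subtraction is modulo 2^32, so the result stays
	 * correct even if the counter rolled over once since 'start'.
	 */
	return fake_ms_now - start;
}

/* Returns 1 once 'timeout_ms' milliseconds have passed since 'start'. */
static int timed_out(uint32_t start, uint32_t timeout_ms)
{
	assert(timeout_ms > 0);
	return get_timer_sketch(start) >= timeout_ms;
}
```

The caller records start = get_timer_sketch(0), runs code of unknown duration, and later asks timed_out(start, limit). This only works if the millisecond counter itself keeps advancing in the meantime, which is exactly the constraint debated in this thread.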

 When we don't know how long this code runs, we also cannot know (and
 especially not in advance) whether or not this time will be longer or
 not than the rollover time of the tick counter.


 Your plan to require that get_timer() gets called often enough to
 prevent or detect tick counter overflows is putting things on their
 head.  It should be the opposite:  The implementation of get_timer()
 should be such that it becomes independent from such low level
 details.

Hi all,
   We also cannot know if this code disables interrupts. If it does, 
the existing PPC design is broken. If interrupts are disabled during the 
entire code being executed, the elapsed run time will be 0. You can say 
that disabling interrupts in the code is not allowed. Fine, then that 
becomes a constraint on the code. Calling get_timer explicitly often 
enough is also a constraint on the code. One constraint is acceptable to 
you, the other is not. Ok, that's fine too. If the interrupt routine 
calls get_timer, then get_timer is called often enough and everything 
works the same as it presently does on PPC. It is no different than 
requiring the interrupt routine to be executed often enough. The two 
methods are functionally identical. The only problems arise on systems 
that don't support interrupts and don't have any timers with enough bits 
available to meet the 4294967 seconds maximum interval requirement. 
Those systems will be broken no matter what we do, as we have all agreed.

 Right now, almost all ARM cases are broken, because they have short 
timers and don't use interrupts. In some cases, there are actual bugs 
involved. We can make these cases less broken than they now are with a 
common get_timer approach as outlined previously. However, we cannot fix 
them to the standard Wolfgang is stating here, to be not broken. So, 
back to what I was asking before, is the improvement worth the effort if 
the result is still broken?

Best Regards,
Bill Campbell
 I have stated this before:  I consider any design that requires
 get_timer() to be called often enough broken.

 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/25/2011 4:13 PM, Graeme Russ wrote:
 Hi Wolfgang

 On Thu, May 26, 2011 at 7:16 AM, Wolfgang Denk w...@denx.de wrote:
 Dear Graeme Russ,

 In message 4ddd7066.4000...@gmail.com you wrote:
 No, not at all. And I already answered this. For example on PPC, just
 reading the timebase would be perfectly sufficient, and simpler and
 more reliable than the current interrupt based approach.
 I assume by 'timebase' you mean the 64-bit tick counter. If so, that is
 By timebase I mean the timebase register, implemented as two 32 bit
 registers tbu and tbl, holding the upper and the lower 32 bits of the
 free-running 64 bit counter, respectively.
 And remember, not all platforms have this implementation. The AMD sc520
 for example has a microsecond register which counts 0-999 that ticks a
 16-bit millisecond register and resets to zero. And the millisecond
 register latches the value of the microsecond register and resets
 (the millisecond register) back to zero.

 The thing is, this can all be abstracted away via get_tick() which
 (provided it is called every 65 seconds or so) can maintain a software
 version of the timebase register. So, every 65 seconds, the prescaler
 needs to be kicked. Now, if all we want to use get_timer() for is to
 monitor a timeout (which I think might be every single use in U-Boot
 to date) then the while (get_timer(start) < timeout) loop will work. If
 get_timer() is needed to measure time between two arbitrary events (which
 I 100% agree it should be able to do) then the prescaler will need to be
 kicked (typically by an interrupt)
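
The software timebase described above can be sketched for the simplest case, a free-running 32 bit up-counter that wraps from all 1's to all 0's. The struct tick_ext and the function extend_ticks are hypothetical names used only for illustration; a composite counter such as the sc520's would need extra platform code before reaching this step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software state for extending a 32-bit hardware counter. */
struct tick_ext {
	uint32_t last_lo;	/* low word seen on the previous read */
	uint32_t hi;		/* number of rollovers detected so far */
};

/*
 * Fold one raw hardware reading into a 64-bit tick count.
 * At most one rollover can be detected per call, so this must run at
 * least once per rollover period of the hardware counter (from a
 * timeout loop, udelay(), or an interrupt routine).
 */
static uint64_t extend_ticks(struct tick_ext *t, uint32_t raw)
{
	assert(t != NULL);
	if (raw < t->last_lo)	/* reading went backwards: one wrap */
		t->hi++;
	t->last_lo = raw;
	return ((uint64_t)t->hi << 32) | raw;
}
```

If two or more rollovers pass between calls, the sketch silently loses whole periods, which is why something has to "kick" it when no timeout loop is running.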

 _exactly_ what I am suggesting we do (and what does already happen on ARM).
 I don't think so.
Hi All,
   Just to be clear, while ARMv7 has a 64 bit performance counter, 
it is not presently used by get_time. This is a change we want to make, 
correct?
 On closer inspection, some do, some don't. All ARMv7 (OMAP, S5P, Tegra2)
 do. at91 is odd - It looks like it uses interrupts, but get_timer() and
 udelay() both end up calling get_timer_raw() (with udelay only having
 millisecond resolution it seems).
I am not sure why you say at91 appears to use interrupts. There is a 
comment in arch/arm/cpu/arm930t/at91/timer.c that says "timer without 
interrupts" (line 73). There is the same comment in 
arch/arm/cpu/arm930t/at91rm9200/timer.c Nothing in either routine refers 
to interrupts, so I would say the timer doesn't use them. I could be 
wrong of course.
   Some others can be configured to
 increment the timer using an interrupt. ARM is, quite frankly, a complete
 mess - It has a mass of *_timer_masked() functions which the core timer
 functions are 'wafer thin' wrapper around, udelay() silently resets
 the timebase trashing get_timer() loops etc.
I sure agree with this last part. The only arm timer I found that 
clearly thought it could use interrupts was in arch/arm/cpu/ixp, and 
that was conditional, not mandatory.
 So let's wind back and distill the approach I am suggesting:

   1) A common prescaler function in /lib/ - Its purpose is to maintain
  a 1ms resolution timer (if the platform cannot otherwise do so)[1]
  The prescaler utilises a platform provided get_ticks()[2]
   2) A get_ticks() function provided by the platform - This function must
  return an unsigned counter which wraps from all 1's to all 0's - It
  DOES NOT have to be initialised to zero at system start. get_ticks()
  hides the low-level tick counter implementation - The sc520 example
  above is a classic example, so is your PPC tbu/tbl example.
   3) [Optional]An ISR which calls the prescaler[3]

 Now there is an optimisation if your tick counter has a 1ms resolution
 and is not small (i.e. 64-bits) - The prescaler is defined weak, so in
 the platform code, re-implement the prescaler to simply copy the tick
 counter to the timer variable.

 And what are the specific implementation types (in descending order of
 preference)? I think:
   1) A 64-bit micro-second tick counter[5]
- No interrupts needed
- Can be used by udelay() and get_timer() trivially
   2) A 64-bit sub-micro-second tick counter
- Interrupts most likely unneeded unless the tick frequency is
  insanely high
- Can be used by udelay() and get_timer() trivially
   3) A 64-bit milli-second tick counter
- No interrupts needed
- No prescaler needed
- Can be used by get_timer() trivially
- udelay() needs another tick source (if available) or be reduced
  to millisecond resolution
   4) A 32-bit milli-second tick counter
- No prescaler needed[6]
- Max 'glitch free' duration is ~50 days
- ISR needed to kick prescaler if events longer than 50 days need
  to be timed
- Can be used by get_timer() trivially
- udelay() needs another tick source (if available) or be reduced
  to millisecond resolution
   5) A 24-bit milli-second tick counter
- No prescaler needed[6]

Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/25/2011 9:19 PM, Reinhard Meyer wrote:
Dear Graeme Russ,
On closer inspection, some do, some don't. All ARMv7 (OMAP, S5P, Tegra2)
do. at91 is odd - It looks like it uses interrupts, but get_timer() and
udelay() both end up calling get_timer_raw() (with udelay only having
millisecond resolution it seems). Some others can be configured to
increment the timer using an interrupt. ARM is, quite frankly, a
complete
mess - It has a mass of *_timer_masked() functions which the core timer
functions are 'wafer thin' wrapper around, udelay() silently resets
the timebase trashing get_timer() loops etc.

Please look at current master for at91.

http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/cpu/arm926ejs/at91/timer.c;h=a0876879d3907af553d832bea187a062a22b9bd4;hb=5d1ee00b1fe1180503f6dfc10e87a6c6e74778f3

AT91 uses a 32 bit hardware register that by means of a prescaler is made
to increment at a rate in the low megahertz range.

This results in a wrap approximately every 1000 seconds.

Actually this would be sufficient for all known uses of udelay() and
get_timer()
Hi All
Yes, you are correct. It would be sufficient for all known uses
of udelay and get_timer(). However, Wolfgang has indicated a strong
desire that the performance of the new API work properly over the full
32 bit range of the millisecond delay time. That has been the basic
issue for some time here.
timeout loops. However, this hardware register is extended to 64 bits
by software
every time it is read (by detecting rollovers).
Yes, but this extension ONLY happens if get_ticks is called via udelay
or get_timer. It doesn't happen if you are sitting at the command prompt
or off executing a downloaded stand alone program. You might ask who
cares, and I would answer that Wolfgang cares, at least to some level.
If the timer overflow triggered an interrupt, we could call get_ticks to
update the extended time inside the interrupt routine. But, as far as I
know, it doesn't. There are some other ARM processors that have a 32 bit
clock derived from a 32 kHz crystal. They will work much as your example
does up to 134217 seconds, in fact much longer than your AT91 example
does. However, that doesn't touch the 4294967 seconds that the PPC can
manage. Without interrupts, the 32 bit (or smaller) counters will NEVER
get to the PPC standard if their tick rate exceeds 1 msec. It may be
that we need a lower standard, perhaps saying 1000 seconds is enough.
But that is not my decision to make.
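
The wrap periods quoted above follow from dividing the counter range by the tick rate. A small sketch of that arithmetic, with wrap_seconds as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Whole seconds until a counter that is 'bits' wide and clocked at 'hz'
 * wraps around. Uses 64-bit math only because 2^32 does not fit in u32.
 */
static uint64_t wrap_seconds(unsigned int bits, uint64_t hz)
{
	assert(bits >= 1 && bits <= 32 && hz > 0);
	return ((uint64_t)1 << bits) / hz;
}
```

A 32 bit counter at 1 kHz (one tick per millisecond) wraps after 4294967 seconds, and at 32000 Hz after 134217 seconds; a 32768 Hz crystal would give 131072 seconds. Assuming, purely for illustration, a 4 MHz rate for a "low megahertz" counter gives 1073 seconds, in line with the roughly 1000 second figure.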

Since a wrap of that 64 bit tick would occur after the earth has ended,
it is simple to obtain milliseconds from it by doing a 64 bit division.
True, but moot because of the above.
Best Regards,
Bill Campbell

Best Regards,
Reinhard


Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/25/2011 9:40 PM, Graeme Russ wrote:
 On Thu, May 26, 2011 at 2:19 PM, Reinhard Meyer
 u-b...@emk-elektronik.de  wrote:
 Dear Graeme Russ,
 On closer inspection, some do, some don't. All ARMv7 (OMAP, S5P, Tegra2)
 do. at91 is odd - It looks like it uses interrupts, but get_timer() and
 udelay() both end up calling get_timer_raw() (with udelay only having
 millisecond resolution it seems). Some others can be configured to
 increment the timer using an interrupt. ARM is, quite frankly, a complete
 mess - It has a mass of *_timer_masked() functions which the core timer
 functions are 'wafer thin' wrapper around, udelay() silently resets
 the timebase trashing get_timer() loops etc.
 Please look at current master for at91.

 http://git.denx.de/?p=u-boot.git;a=blob;f=arch/arm/cpu/arm926ejs/at91/timer.c;h=a0876879d3907af553d832bea187a062a22b9bd4;hb=5d1ee00b1fe1180503f6dfc10e87a6c6e74778f3

 AT91 uses a 32 bit hardware register that by means of a prescaler is made
 to increment at a rate in the low megahertz range.
 Yes, I see that now

 This results in a wrap approximately every 1000 seconds.
 Actually this would be sufficient for all known uses of udelay() and 
 get_timer()
 timeout loops. However, this hardware register is extended to 64 bits by 
 software
 every time it is read (by detecting rollovers).
 Which makes it 100% compatible with my proposed solution - The software
 prescaler will trigger the 64-bit extension and rollover detection

 Since a wrap of that 64 bit tick would occur after the earth has ended,
 it is simple to obtain milliseconds from it by doing a 64 bit division.
 Which would be done in the common prescaler in /lib/

 Currently, most ARM specific utilisations of get_timer() enforce a reset
 of the tick counter by calling reset_timer() - Subsequent calls to
 get_timer() then assume a start time of zero. Provided the internal timer
 rolls over correctly, the initial call of get_timer(0) will reset the ms
 timer and remove any 'glitch' present due to not calling the 'extender'
 function between 32-bit rollovers which makes the reset_timer() call
 unnecessary - I believe at91 behaves correctly in this regard.

 In any case, the underlying assumption made by the ARM timer interface
 (call reset_timer() first always) is inherently broken as not all users
 of the timer API do this - They assume a sane behaviour of:

   start = get_timer(0);
   elapsed_time = get_timer(start);

 Add to this udelay() resetting the timer makes the following very broken:

   start = get_timer(0);
   while(condition) {
   udelay(delay);
   }
   elapsed_time = get_timer(start);

 NOTE: In this case, if udelay() also calls the prescaler then no interrupt
 triggered every 1000s would be required in the above example to get
 correct elapsed_time even if the loop ran for several hours (provided
 udelay() is called at least every 1000s

 However, to allow timing of independent events with no intervening
 udelay() or get_timer() calls, an 1000s interrupt to kick the prescaler is
 all that is needed to make this particular implementation behave correctly.
Hi All,
   True, if the processor supports timer interrupts. The problem is 
that the existing u-boots in many cases do not. I think that is really 
the crux of the problem. So what are we going to do? I am open to ideas 
here.

Best Regards,
Bill Campbell

 Of course disabling interrupts and not calling get_timer() or udelay() will
 break the timer - But there is nothing that can be done about that)

 Regards,

 Graeme



___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/24/2011 7:12 AM, Wolfgang Denk wrote:
 Dear Albert ARIBAUD,

 In message 4ddb4c1c.7030...@aribaud.net you wrote:
 Not sure I still follow what the two options are -- a heads up is welcome.
 However, I do like the simplicity in having a single time unit (ticks)
 for the timer API -- assuming it covers all needs -- and providing other
 time units only as helper functions.
 I don't think using ticks is a good idea.  You would need to change
 all definitions of timeouts and delays and such.

 Why not use a scaled unit like microseconds or the currently used
 milliseconds?

 I wonder why we suddenly have to change everything that has been
 working fine for more than a decade (ignoring the large number of
 incorrect, incomplete or broken implementations).  But so far we
 really never needed anything else but  udelay()  for the typical short
 device related timeouts, and  get_time()  for longer, often protocol
 defined,  timeouts.

 Is there anything wrong with these two solutions, based on standard
 units (us and ms) ?
Hi All,
 After really looking into this, I think I agree with Wolfgang 
that using ms for a get_timer timebase is the best way to go. This 
thinking is heavily influenced (in my case anyway) by the fact that in 
the interrupt driven cases (and these are the ONLY fully compliant cases 
ATM I think), the cost of using ms is 0, because that is the native 
unit in which the timer ticks. This makes everything real simple. We 
can, right today, produce an API that supports u32 get_time_ms(void) for 
all CPUs in use. This would allow u32 get_timer(u32 base) to continue to 
exist as-is. These implementations would still be technically broken 
in the non-interrupt case, but they would work at least as well as they 
presently do. In fact, they would operate better because they would all 
use a single routine, not a bunch of different routines (some of which I 
am pretty sure have errors). Wolfgang would need to accept the fact that 
we are not yet fixing all the non-interrupt cases. This needs to be 
done, but is a different problem (I hope). In the non-interrupt case 
there is some cost in converting to ms from the native units, but we are 
in a timeout loop, so a few extra instructions do not matter much. It is 
only a few 32 bit multiplies, which these days are real fast. If we can 
figure out how to use interrupts eventually, even this will go away.

 Then, we can ADD an optional performance measuring API like 
Scott suggests. This API would be something similar to the following:
struct ticks
{
   u32 tickmsb;
   u32 ticklsb;
};
struct time_value
{
   u32 seconds;
   u32 nano_seconds;
};


and the functions would be get_ticks(struct ticks * p) , 
get_tick_delta(struct ticks * minuend, struct ticks * subtrahend),  
and   cvt_tick_delta(struct time_value * result, struct ticks *input). I 
didn't use u64 on purpose, because I don't want any un-necessary 
functions pulled from the library. get_ticks reads a hi precision 
counter. How high the precision is depends on the hardware. 
get_tick_delta subtracts two tick values, leaving the difference in the 
first operand. Yes, this may be simple to do in open code but it is 
better to hide the details. (What if msb is seconds and lsb is 
nanoseconds, then it is not so simple). cvt_tick_delta converts from 
ticks to seconds and nano_seconds. We also need a u32 
get_tick_resolution, which would return the tick resolution in ns. The 
user never needs to do any arithmetic on ticks, so that makes his life 
much easier. However, the user may want to know the resolution of these 
measurements. All these functions are quite fast except possibly 
cvt_tick_delta, which is only needed for printing anyway.
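
A minimal sketch of how the three functions described above might look, using 32-bit arithmetic only. Several things here are assumptions and not part of the proposal: the counter is taken to tick at exactly 1 MHz (TICKS_PER_SECOND, NS_PER_TICK), the return types are made up, and cvt_tick_delta only handles deltas that fit in the low word. get_ticks itself is left out because it is hardware specific.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

struct ticks {
	u32 tickmsb;
	u32 ticklsb;
};

struct time_value {
	u32 seconds;
	u32 nano_seconds;
};

/* Assumption for this sketch: the counter ticks at exactly 1 MHz. */
#define TICKS_PER_SECOND 1000000u
#define NS_PER_TICK      1000u

/* Returns the tick resolution in ns (fixed here by assumption). */
u32 get_tick_resolution(void)
{
	return NS_PER_TICK;
}

/* minuend -= subtrahend, borrowing from the high word when needed. */
void get_tick_delta(struct ticks *minuend, const struct ticks *subtrahend)
{
	u32 borrow;

	assert(minuend && subtrahend);
	borrow = minuend->ticklsb < subtrahend->ticklsb;
	minuend->ticklsb -= subtrahend->ticklsb;
	minuend->tickmsb -= subtrahend->tickmsb + borrow;
}

/*
 * Convert a delta to seconds and nanoseconds using 32-bit math only.
 * Returns 0 on success, -1 if the delta does not fit in the low word
 * (a limitation of this sketch, not of the proposal).
 */
int cvt_tick_delta(struct time_value *result, const struct ticks *input)
{
	if (input->tickmsb != 0)
		return -1;

	result->seconds = input->ticklsb / TICKS_PER_SECOND;
	/* the remainder is below 1000000, so the product stays below 2^32 */
	result->nano_seconds = (input->ticklsb % TICKS_PER_SECOND) * NS_PER_TICK;
	return 0;
}
```

A caller would read two struct ticks values around the code being measured, call get_tick_delta on them, and only call cvt_tick_delta when it wants to print the result.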
 If the hardware has a performance monitoring counter, 
implementing these functions is quite simple. The PPC and the x86 
certainly do (well, any x86 pentium and above anyway). This is a whole 
lot of chips off our plate. In cases where no such counter exists, we 
will use whatever counters there are available already. Some of the ARM 
counters are running at 32 kHz, so their resolution won't be great, but 
it is what it is. If we find out that some of these other CPUs have a 
performance counter, we will use it. This API will be completely 
optional in u-boot and can be removed by changing a #define.
Thoughts welcome.

Best Regards,
Bill Campbell

 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/24/2011 9:51 AM, Graeme Russ wrote:
 On 25/05/11 00:19, Wolfgang Denk wrote:
 =
snip
Hi all,
 I have a few questions.
First, it seems that the get_timer interface is expected to work 
properly only after relocation and only when bss is available. I say 
this because the PPC version uses an (initialized) variable, timestamp, 
to hold the time. If that is the case, there is no need to hold the 
timer static data in gd, as I have been doing up to  now. Am I correct ?
  Second, udelay is expected to be available before  bss is 
available, very early in the startup process. There are comments to that 
effect in several places in the code. Therefore, udelay cannot use 
global or static variables. Is this true?
 And third, udelay is only expected/required to work correctly for 
short delays, say 10 seconds or less. I say this based on comments contained in 
powerpc/lib/time.c. Please ack or nak this as well.

I have a couple of small changes to the previously submitted code, 
based partly on the above
 Exposing ticks and tick_frequency to everyone via a 'tick' HAL

 In /lib/timer.c

ulong timestamp = 0;
/*
  This routine is called to initialize the timer after BSS is available
*/
void  __weak prescaler_init(void)
{
 u32 tick_frequency = get_tick_frequency();
  /* initialize prescaler variables */
}

 void __weak prescaler(void)
 {
   u32 ticks = get_ticks();
 /* Bill's algorithm */
/* result stored in timestamp; */
 }

 u32 get_timer(u32 base)
 {
#if defined(CONFIG_SYS_NEED_PRESCALER)
   prescaler();
#endif
   return timestamp - base;
 }

 In /arch/cpu/soc/timer.c or /arch/cpu/timer.c or /board/board/timer.c

 u32 get_ticks(void)
 {
   u32 ticks;

   /* Get ticks from hardware counter */

   return ticks;
 }

 u32 get_tick_frequency(void)
 {
   u32 tick_frequency;

   /* Determine tick frequency - likely very trivial */

   return tick_frequency;
 }
or instead the user may override prescaler_init and prescaler if, for 
some reason, a highly optimized version is desired. Note also that if 
the configuration variable CONFIG_SYS_NEED_PRESCALER is not defined, the 
additional prescaler routines will not be called anywhere so the 
routines should not be loaded. Yes, it is a #define to manage, but it 
should allow the existing u-boots to be the same size as before, with no 
unused code. This size matters to some people a lot!
 ===
 Not exposing ticks and tick_frequency to everyone

 In /lib/timer.c

 void prescaler(u32 ticks, u32 tick_frequency)
 {
   u32 current_ms;

   /* Bill's algorithm */

   /* result stored in gd->timer_in_ms; */
 }

 In /arch/cpu/soc/timer.c or /arch/cpu/timer.c or /board/board/timer.c

 static u32 get_ticks(void)
 {
   u32 ticks;

   /* Get ticks from hardware counter */

   return ticks;
 }

 static u32 get_tick_frequency(void)
 {
   u32 tick_frequency;

   /* Determine tick frequency */

   return tick_frequency;
 }

 u32 get_timer(u32 base)
 {
   u32 ticks = get_ticks();
   u32 tick_frequency = get_tick_frequency();

   prescaler(ticks, tick_frequency);

   return gd->timer_in_ms - base;
 }

 ===
 I personally prefer the first - There is only one implementation of
 get_timer() in the entire code and the platform implementer never has to
 concern themselves with what the tick counter is used for. If the API gets
 extended to include get_timer_in_seconds() there is ZERO impact on
 platforms. Using the second method, any new feature would have to be
 implemented on all platforms - and we all know how well that works ;)

 And what about those few platforms that are actually capable of generating
 a 1ms timebase (either via interrupts or natively in a hardware counter)
 without the prescaler? Well, with prescaler() declared weak, all you need
 to do in /arch/cpu/soc/timer.c or /arch/cpu/timer.c or
 /board/board/timer.c is:

 For platforms with a 1ms hardware counter:
 void prescaler(void /* or u32 ticks, u32 tick_frequency*/)
 {
   gd->timer_in_ms = get_milliseconds();
 }

 For platforms with a 1ms interrupt source:
 void timer_isr(void *unused)
 {
   gd->timer_in_ms++;
 }

 void prescaler(void /* or u32 ticks, u32 tick_frequency*/)
 {
 }


 And finally, if the platform supports interrupts but either the hardware
 counter has better accuracy than the interrupt generator or the interrupt
 generator cannot generate 1ms interrupts, configure the interrupt generator
 to fire at any rate better than the tick counter rollover listed in
 previous post and:

 void timer_isr(void *unused)
 {
   /*
    * We are here to stop the tick counter rolling over. All we
    * need to do is kick the prescaler - get_timer() does that :)
    */
   get_timer(0);
 }

 In summary, platform specific code reduces to:
   - For a platform that cannot generate 1ms interrupts AND the hardware
 counter is

Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/24/2011 12:19 PM, Wolfgang Denk wrote:
 Dear Graeme Russ,

 In message 4ddbe22d.6050...@gmail.com you wrote:
 Why must get_timer() be used to perform meaningful time measurement?
 Excellent question!  It was never intended to be used as such.
 Because get_timer() as it currently stands can as it is assumed to return
 milliseconds
 Yes, but without any guarantee for accuracy or resolution.
 This is good enough for timeouts, but nothing for time measurements.

 OK, let's wind back - My original suggestion made no claim towards changing
 what the API is used for, or how it looks to those who use it (for all
 practical intents and purposes). I suggested:
   - Removing set_timer() and reset_timer()
   - Implement get_timer() as a platform independent function
 Trying to remember what I have read in this thread I believe we have
 an agreement on these.

 Exposing ticks and tick_frequency to everyone via a 'tick' HAL
 I skip this.  I don't even read it.

 ===
 Not exposing ticks and tick_frequency to everyone

 In /lib/timer.c

 void prescaler(u32 ticks, u32 tick_frequency)
 {
  u32 current_ms;

  /* Bill's algorithm */

  /* result stored in gd->timer_in_ms; */
 }

 In /arch/cpu/soc/timer.c or /arch/cpu/timer.c or /board/board/timer.c

 static u32 get_ticks(void)
 Currently we have unsigned long long get_ticks(void)  which is better
 as it matches existing hardware.

 Note that we also have void wait_ticks(u32) as needed for udelay().
Hi All,
I didn't comment before on the definition of ticks, but I fear 
that was unwise. The stated definition  was:

A tick is defined as the smallest increment of system time as measured by a
computer system (see http://en.wikipedia.org/wiki/System_time):

System time is measured by a system clock, which is typically
implemented as a simple count of the number of ticks that have
transpired since some arbitrary starting date, called the
epoch.


Unfortunately, this definition is obsolete, and has been for quite some 
years.  When computers had a single timer, the above definition worked, 
but it no longer does, as many (most?) computers have several hardware 
timers. A tick today is the time increment of any particular timer of 
a computer system. So, when one writes a function called get_ticks on a 
PPC, does one mean read the decrementer, or does one read the RTC or 
does one read the TB register(s)? A similar situation exists on the 
Blackfin BF531/2/3, which has a performance counter, a real-time clock, 
and three programmable timers. Which tick do you want? For each u-boot 
implementation, we can pick one timer as the master timer, but it may 
not be the one with the most rapid tick rate. It may be the one with the 
most USEFUL tick rate for get_timer. If you take the above definition at 
face value, only the fastest counter value has ticks, and all other 
counters' time increments are not ticks. If they are not ticks, what are 
they?

This is one of the reasons I favor the performance monitoring system be 
separate from the get_timer timing methodology, as it will often use a 
different counter and time base anyway. That is also why I prefer to 
have a conversion routine that converts timer values to seconds and 
nano-seconds without reference to tick rates, so the user never has to 
deal with these ambiguities. Yes, under the hood, somebody does, but 
that need not be specified in the external interface. (Nobody has yet 
commented on my proposed performance measuring functions, and I know 
this group well enough not to assume that silence implies consent!)

The prescaler function defined by Graeme needs to read some timer value. 
It turns out that a u32 value is the most appropriate value for the 
get_timer operation.  The value desired is usually not of hardware timer 
tick resolution. It should be the specific bits in the timer, such that 
the LSB resolution is between 1 ms and 0.5 ms. We use greater resolution 
than this only when the counter is short enough that we need lower 
significance bits to make the counter 32 bits. For that reason, the 
function should probably be called something like u32 
read_timer_value(void). I really don't much care what it is called as 
long as we understand what it does.
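
As a rough illustration of picking "the specific bits in the timer", the sketch below computes how far a raw counter would have to be shifted right so that the LSB of the result is worth more than 0.5 ms and at most 1 ms. The names timer_shift_for_ms and read_timer_value_from are hypothetical, the raw counter is assumed to be available as a 64-bit value, and the case of a counter too short to fill 32 bits after shifting is not handled.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/*
 * Largest shift s such that 2^s ticks last no longer than 1 ms, i.e.
 * s = floor(log2(tick_hz / 1000)). One more shift would exceed 1 ms,
 * so the LSB after shifting is worth more than 0.5 ms and at most 1 ms.
 */
unsigned int timer_shift_for_ms(u32 tick_hz)
{
	u32 ticks_per_ms = tick_hz / 1000;
	unsigned int shift = 0;

	assert(ticks_per_ms >= 1);	/* counter must tick at 1 kHz or faster */
	while ((ticks_per_ms >> (shift + 1)) != 0)
		shift++;
	return shift;
}

/* Drop the low bits of a raw counter value and keep the next 32 bits. */
u32 read_timer_value_from(uint64_t raw_counter, unsigned int shift)
{
	return (u32)(raw_counter >> shift);
}
```

For a 1 MHz counter this gives a shift of 9 (LSB = 512 us), and for a 32768 Hz counter a shift of 5 (LSB of about 0.98 ms).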

Best Regards,
Bill Campbell


 static u32 get_tick_frequency(void)
 {
  u32 tick_frequency;

  /* Determine tick frequency */

  return tick_frequency;
 }
 Note that we also have u32 usec2ticks(u32 usec) and u32 ticks2usec(u32 ticks).

 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/24/2011 5:17 PM, Graeme Russ wrote:
 On Wed, May 25, 2011 at 5:19 AM, Wolfgang Denk w...@denx.de wrote:
 Dear Graeme Russ,

 In message 4ddbe22d.6050...@gmail.com you wrote:
 Why must get_timer() be used to perform meaningful time measurement?
 Excellent question!  It was never intended to be used as such.
 Because get_timer() as it currently stands can as it is assumed to return
 milliseconds
 Yes, but without any guarantee for accuracy or resolution.
 This is good enough for timeouts, but nothing for time measurements.
 Out of curiosity, are there any platforms that do not use their most
 accurate source(*) as the timebase for get_timer()? If a platform is using
 its most accurate, commonly available, source for get_timer() then the
 whole accuracy argument is moot - You can't get any better anyway so
 why sweat the details.
Hi All,
Well, it is not quite that simple. The accuracy of the 1 ms 
interrupt rate is controlled in all cases I know about by the resolution 
of the programmable divider used to produce it. It appears that the x86 
uses a 1.19318 MHz crystal oscillator to produce the nominal 1 ms timer 
tick. (There is a typo in line 30 of arch/x86/lib/pcat_timer.c that says 
1.9318. I couldn't make any of the numbers work until I figured this 
out). The tick is produced by dividing the 1.19318 MHz rate by 1194, 
which produces an interrupt rate of 999.313 Hz, or about 0.068% error. 
However, the performance counter on an x86 is as exact as the crystal 
frequency of the CPU is. FWIW, you can read the performance counter with 
rdtsc on a 386/486 and the CYCLES and CYCLES2 registers on later 
Intel/AMD chips. So yes, there is at least one example of a cpu that 
does not use its most accurate (or highest resolution) time source.
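
The divider arithmetic can be checked with a few lines of C. This is only a sketch of the calculation, with hypothetical helper names; the 1.19318 MHz input and the 1194 divider are the values quoted above, and uint64_t is used only to keep the intermediate product from overflowing.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

#define PIT_INPUT_HZ 1193180u	/* 1.19318 MHz, as quoted in the message */

/* Interrupt rate in millihertz for a given integer divider. */
u32 pit_rate_millihertz(u32 divider)
{
	assert(divider != 0);
	return (u32)(((uint64_t)PIT_INPUT_HZ * 1000u) / divider);
}

/*
 * Shortfall against the nominal 1 kHz rate in parts per million.
 * 1 kHz is 1,000,000 mHz, so one millihertz is exactly one ppm.
 * A negative result means the interrupt runs fast.
 */
int32_t pit_error_ppm(u32 divider)
{
	return (int32_t)(1000000 - (int64_t)pit_rate_millihertz(divider));
}
```

With a divider of 1194 this yields 999313 mHz and a shortfall of 687 ppm, in line with the figure of about 0.068% above. By the same arithmetic, a divider of 1193 would run about 150 ppm fast.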
 (*)I'm actually referring to what is commonly available for that platform,
 and not where a board has a high precision/accuracy source in addition to
 the common source.

 As a followup question, how many platforms use two completely independent
 sources for udelay() and get_timer() - x86 does, but I plan to change this
 so the interrupt kicks the new prescaler which can be done at  1ms period
 and udelay() and get_timer() will use the same tick source and therefore
 have equivalent accuracy.
Are you sure of this? From what I see in arch/x86/lib/pcat_timer.c, the 
timer 0 is programmed to produce the 1 kHz rate timer tick and is also 
read repeatedly in __udelay to produce the delay value. They even 
preserve the 1194 inaccuracy, for some strange reason. I see that the 
sc520 does appear to use different timers for the interrupt source, and 
it would appear that it may be exact, but I don't know what the input 
to the prescaler is so I can't be sure. Is the input to the prescaler 
really 8.3 MHz exactly? Also, is the same crystal used for the input to 
the prescaler counter and the software timer millisecond count? If 
not, then we may have different accuracies in this case as well.

Also of note, it appears that in pcat_timer.c, udelay is not available 
until interrupts are enabled. That is technically non-compliant, 
although it obviously seems not to matter.

Best Regards,
Bill Campbell
 OK, let's wind back - My original suggestion made no claim towards changing
 what the API is used for, or how it looks to those who use it (for all
 practical intents and purposes). I suggested:
   - Removing set_timer() and reset_timer()
   - Implement get_timer() as a platform independent function
 Trying to remember what I have read in this thread I believe we have
 an agreement on these.

 Exposing ticks and tick_frequency to everyone via a 'tick' HAL
 I skip this.  I don't even read it.
 Hmmm, I think it is worthwhile at least comparing the two - What is the
 lesser of two evils

   1. Exposing 'ticks' through a HAL for the prescaler
   2. Duplicating a function with identical code 50+ times across the source
  tree

 I personally think #2 is way worse - The massive redundant duplication and
 blind copying of code is what has gotten us into this (and many other) messes

 ===
 Not exposing ticks and tick_frequency to everyone

 In /lib/timer.c

 void prescaler(u32 ticks, u32 tick_frequency)
 {
   u32 current_ms;

   /* Bill's algorithm */

   /* result stored in gd->timer_in_ms; */
 }

 In /arch/cpu/soc/timer.c or /arch/cpu/timer.c or /board/board/timer.c

 static u32 get_ticks(void)
 Currently we have unsigned long long get_ticks(void)  which is better
 as it matches existing hardware.
 Matches PPC - Does it match every other platform? I know it does not match
 the sc520 which has a 16-bit millisecond and a 16-bit microsecond counter
 (which only counts to 999 before resetting to 0)

 Don't assume every platform can implement a 64-bit tick counter. But yes,
 we should cater for those platforms that can

 Note that we also have void wait_ticks(u32) as needed for udelay().

 static u32 get_tick_frequency(void)
 {
u32

Re: [U-Boot] [RFC] Review of U-Boot timer API


On 5/22/2011 11:29 PM, Albert ARIBAUD wrote:

Hi all,

Sorry, could not follow the discussion although I find it very
interesting, so I will handle the task of coming in late and asking the
silly questions.
I am glad you are looking at our discussion. I am sure we are going to 
need all the help/oversight/questions that we can get, as this is a 
change that will affect all architectures.

Le 23/05/2011 07:25, Graeme Russ a écrit :


On Mon, May 23, 2011 at 3:02 PM, J. William Campbell
jwilliamcampb...@comcast.net   wrote:

On 5/22/2011 6:42 PM, Graeme Russ wrote:

OK, so in summary, we can (in theory) have:
   - A timer API in /lib/ with a single u32 get_timer(u32 base) function
   - A HAL with two functions
 - u32 get_raw_ticks()
 - u32 get_raw_tick_rate() which returns the tick rate in kHz (which
   max's out at just after 4GHz)
   - A helper function in /lib/ u32 get_raw_ms() which uses get_raw_ticks()
 and get_tick_rate() to correctly maintain the ms counter used by
 get_timer() - This function can be weak (so next point)
   - If the hardware supports a native 32-bit ms counter, get_raw_ms()
 can be overridden to simply return the hardware counter. In this case,
 get_raw_ticks() would return 1

Are you sure you did not mean 'get_raw_ticks_rate' here? Besides, I'd
like the name to specify the units used: 'get_raw_ticks_rate_in_khz' (or
conversely 'get_raw_ticks_per_ms', depending on which is simpler to
implement and use).
I think you are correct, it was the rate function desired here. I think 
the best way to go is use a get_raw_tick_rate_in_mhz function, because 
it is probably the easiest one to implement, and in many cases something 
like it already exists.

   - Calling of get_raw_ticks() regularly in the main loop (how often will
 depend on the raw tick rate, but I imagine it will never be necessary
 to call more often than once every few minutes)

That's to keep track of get_raw_ticks() rollovers, right? And then the
get_timer function (which, again, I would prefer to have '... in ms'
expressed in its name) would call get_raw_ticks() in confidence that at
most one rollover may have occurred since the last time the helper
function was called, so a simple difference of the current vs last tick
value will always be correct.
Exactly so. Note that this same function probably needs to be called in 
udelay for the same reason. More precisely, the get_timer function will 
call get_raw_ms, which will call get_raw_ticks. I think it may be better 
to move get_timer down a level in the hierarchy,
so we don't need a get_raw_ms. get_timer would then be part of the HAL. 
One would use a get_timer(0) in order to do what get_raw_ms alone would 
have done. If the user had a good reason, he would then override 
get_timer with his own version. What do you think Graeme? It reduces the 
nesting depth by one level. As for the name change to get_timer_in_ms, I 
would support it. Naturally, such a change would be up to Mr. Denk. 
Since by definition that is what the function does, it seems to be a 
good change from my point of view.
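
To make the "simple difference" point concrete, here is a minimal sketch of a millisecond accumulator fed by a free-running 32-bit counter. It is not the algorithm posted in this thread: it uses a plain divide, and the struct and function names (ms_prescaler, ms_prescaler_init, ms_prescaler_update) are made up for illustration. The unsigned subtraction is done modulo 2^32, so the delta is right as long as at most one rollover happened since the previous call.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

struct ms_prescaler {
	u32 last_ticks;		/* raw counter value at the previous call */
	u32 remainder;		/* ticks that do not yet add up to 1 ms */
	u32 ticks_per_ms;	/* raw tick rate, assumed to be exact */
	u32 timestamp_ms;	/* the value get_timer() would report */
};

void ms_prescaler_init(struct ms_prescaler *p, u32 ticks_per_ms, u32 now)
{
	assert(ticks_per_ms != 0);
	p->last_ticks = now;
	p->remainder = 0;
	p->ticks_per_ms = ticks_per_ms;
	p->timestamp_ms = 0;
}

/* Fold the ticks elapsed since the previous call into the ms count. */
void ms_prescaler_update(struct ms_prescaler *p, u32 now)
{
	u32 delta = now - p->last_ticks;	/* wraps modulo 2^32 */

	p->last_ticks = now;
	p->timestamp_ms += delta / p->ticks_per_ms;
	p->remainder += delta % p->ticks_per_ms;
	if (p->remainder >= p->ticks_per_ms) {
		p->remainder -= p->ticks_per_ms;
		p->timestamp_ms++;
	}
}
```

get_timer(base) would then call the update with the current raw tick value and return timestamp_ms - base. udelay could call the same update to keep the count alive during long polling loops.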

   - If the hardware implements a native 32-bit 1ms counter, no call in
 the main loop is required
   - An optional HAL function u32 get_raw_us() which can be used for
 performance profiling (must wrap correctly)

Hi All,
   Graeme, I think you have stated exactly what is the best approach to
this problem.  I will provide a version of get_raw_ms that is  initialized
using get_raw_tick_rate that will work for all reasonable values of
raw_tick_rate. This will be the generic solution. Both the initialization
function and the get_raw_ms function can be overridden if there is reason to
do so, like exact clock rates. I will do the same with get_raw_us. This
will be posted sometime on Monday for people to review, and to make sure I
didn't get too far off base. Thank you to both Graeme and Reinhard for
looking at/working on this. Hopefully, this solution will put this timing
issue to rest for all future ports as well as the presently existing ones.

In Graeme's description, I did not see a get_raw_ms, only a get_raw_us.
Was this last one a typo or is that a third HAL function?
get_raw_ms was referenced as a library function a few lines above.  
Right now, I think the functionality we require from the HAL is


  1. get_raw_tick_rate_in_mhz
  2. get_raw_ms
  3. get_raw_ticks
  4. (optional)get_raw_us

There is also an API layered on top of these functions, called get_timer.

I think we need to add a call to another function called 
initialize_timer_system or similar that will initialize the data 
structures in gd by calling get_raw_tick_rate_in_mhz. Additionally, I 
think we need to provide a udelay function, simply because it can 
interact with calling get_raw_ms often enough. We are somewhat caught 
between two fires here, in that on the one hand we want to provide a 
very generic approach to the timing system that will work on any CPU 
while on the other hand we

Re: [U-Boot] [RFC] Review of U-Boot timer API


On 5/23/2011 6:19 AM, Wolfgang Denk wrote:

Dear Graeme Russ,

In message 4dda5334.4060...@gmail.com you wrote:

  - A helper function in /lib/ u32 get_raw_ms() which uses get_raw_ticks()
and get_tick_rate() to correctly maintain the ms counter used by
get_timer() - This function can be weak (so next point)

Ditto.  What would that do?  If it gets milliseconds as the name
suggest, that's already the function needed for get_timer()?

OK, there appears to be a consensus that not all hardware actually supports
a free-running timer with 1ms resolution. To overcome this, the idea is to

Indeed.  I guess most of them do not.


create a common library function which maintains the free running counter.
The library function acts as a pre-scaler using a 'raw tick counter' and a
'raw tick rate' supplied by the low level architecture. We define this weak

What are raw ticks?  And what are cooked ticks, then?

Hi all,
 FWIW,  cooked ticks would be 1 ms ticks, although we never 
really use them as such.

so that if the architecture can provide a free running 1ms counter, there
is no (code size) penalty

Why do we need a free running 1ms counter at all?  Any free running
counter of at least millisecond resolution should be good enough.
Correct. Any free running counter whose resolution is better than one 
millisecond and which is long enough that it will not overflow between 
calls to get_timer is sufficient.

This approach eliminates all the existing per-arch code which (attempts) to
manage the time base behind get time. So we simplify each arch down to its
bare essentials - Provide a counter which increments at a natural fixed
rate and what the rate is - Let common library code deal with the rest.

Did you have a look at the PowerPC implementation?  I'd like to see
this used as reference.
I have looked at it as a reference. However, there is one disadvantage 
in using the PPC code as a reference. It has a 64 bit timestamp. Many 
systems do not have a 64 bit timestamp, but rather a 32 bit timestamp.  
It is possible to extend the 32 bit timestamp to a 64 bit timestamp if 
get_timer is called often enough that there will only be a single 
rollover of the bottom 32 bits  between uses. However, if that condition 
is met, there is no need to extend the timer to 64 bits. Instead, just 
convert the elapsed time since the last call (which you know) to ms and 
be done with it.  As Wolfgang said above, any counter that has better 
than 1 ms resolution  is adequate to the task. The additional 
requirement that we have stated is that the counter be long enough that 
it does not overflow between calls to get_timer. If the counter is 64 
bits long, it pretty much for sure meets this requirement  (although 
bits below 0.5 ms resolution really don't help any). If the timer is 32 
bits long, it will meet any requirements using get_timer to time out 
hardware intervals. My original implementation used a 32 bit divide and 
does exactly that. This is the shortest and simplest approach, and we 
can get that working in all cases quite easily I think. We can avoid the 
no divide optimization until everybody is satisfied with what we have.
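
The "long enough that it does not overflow between calls" requirement is easy to put a number on. The helper below is only a back-of-the-envelope sketch with a hypothetical name; it returns how many whole seconds a counter of a given width runs before wrapping.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Whole seconds until a free-running counter of 'bits' width wraps. */
u32 wrap_period_seconds(unsigned int bits, u32 tick_hz)
{
	assert(bits >= 1 && bits <= 32);
	assert(tick_hz != 0);
	return (u32)(((uint64_t)1 << bits) / tick_hz);
}
```

For a 32-bit counter at 1 MHz this gives 4294 seconds, which is the 71 minutes quoted elsewhere in this thread. A 16-bit counter at 32768 Hz would wrap after only 2 seconds, so get_timer would have to be called at least that often.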

  - Calling of get_raw_ticks() regularly in the main loop (how often will
depend on the raw tick rate, but I imagine it will never be necessary
to call more often than once every few minutes)

NAK. This concept is fundamentally broken.  I will not accept it.

Some existing timers are fundamentally broken - The glitch at the
0x to 0x rollover or rollover early - The method discussed
in this thread eliminates all such glitches. Provided pre-scaler in /lib/
(triggered by get_timer() usually) is called often enough (71 minutes for a
32-bit 1MHz counter) then there is no need. Even then, it is only important

We already have this nightmare of code for triggering the watchdog on
systems that use it.

Assuming there are places in the main loop that get executed often
enough is a broken concept, and I will not accept any such code.
That is fine with me. The reason this was being done was to attempt to 
emulate, as much as possible, the power PC, where the 64 bit timestamp 
counter allows calls to get_timer separated by many minutes and several 
console commands to work properly. These get timer commands will NOT 
work properly on systems that have a 32 bit counter that overflows every 
200 seconds or so. The call in the idle loop was an attempt to make the 
32 bit systems work more like the 64 bit systems. One may then either


  1.   Define calls to get_timer to measure an elapsed interval
 separated by any returns to the command processor as broken.
  2. Require the use of interrupts to extend the 32 bit timestamp.
 (This may not be possible on all systems as the timer used for
 performance monitoring does not interrupt, etc.)
  3. Allow the call in the idle loop under the assumption that we are
 talking about timing in the minutes range, not a few seconds.

Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/23/2011 6:19 AM, Wolfgang Denk wrote:
 Dear Graeme Russ,
snip
 This is what PPC is doing. And I understand that Reinhard did the same
 in software for AT91.
Hi All,
  My apologies for being a little (perhaps more than a little) 
dense. As they say, after further review, I think the key aspect of 
the PPC timer system is that it uses the decrementer register to 
generate an interrupt at a 1 KHz rate. What I have been attempting here 
is to produce a timer system that does not use interrupts at all. This 
is a fundamental design question. Naturally, systems that can generate 
an interrupt at a 1 KHz rate (or at any (reasonable) higher rate for 
that matter) using the decrementer register can produce a 1 ms 
resolution software counter that updates by magic. If my understanding 
of this PPC  code is incorrect, somebody please stop me before I make a 
further fool of myself!  Is it then a design requirement that the timer 
system use interrupts? Is that what is meant by using the PPC system as 
a model? If so, is it possible/reasonable on all the u-boots that are 
out there to generate and process timer interrupts at some (hopefully 
but not necessarily) programmable rate?

Best Regards,
Bill Campbell
 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC] Review of U-Boot timer API


On 5/23/2011 12:33 PM, Wolfgang Denk wrote:

Dear J. William Campbell,

In message 4ddaa705.1040...@comcast.net you wrote:

   My apologies for being a little (perhaps more than a little)
dense. As they say, after further review, I think the key aspect of
the PPC timer system is that it uses the decrementer register to
generate an interrupt at a 1 KHz rate. What I have been attempting here
is to produce a timer system that does not use interrupts at all. This
is a fundamental design question. Naturally, systems that can generate

No, it is not.  It is an implementation detail which is irrelevant to
almost all users of U-Boot.

Or do you actually care if your UART driver uses polling or
interrupts?

Hi All,
  I might care a lot if I expect typeahead to work, or if I am 
sending a command script via a terminal emulator and I don't want to 
lose characters while u-boot is off executing a command. One might 
(correctly) say that that is too much to expect of a boot loader, and 
define polling as good enough, which I advocate, or not. YMMV.

an interrupt at a 1 KHz rate (or at any (reasonable) higher rate for
that matter) using the decrementer register can produce a 1 ms
resolution software counter that updates by magic. If my understanding
of this PPC  code is incorrect, somebody please stop me before I make a
further fool of myself!  Is it then a design requirement that the timer
system use interrupts? Is that what is meant by using the PPC system as

No, it is not a design requirement.  It is just one possible
implementation.  Any other method that achieves the same or similar
results is as good.  As noted before, on PowerPC we could have
probably avoided this and just base all timer services on the timebase
register.

[The reason for this dual implementation is historical.  When I wrote
this code, I did not know if we would ever need any fancy timer-
controlled callbacks or similar.  And I needed to implement interrupt
handling for a few other purposes (for example for use in standalone
applications; this was an explicit requirement at that time).  And the
timer was something that made a good and simple example.]


a model? If so, is it possible/reasonable on all the u-boots that are
out there to generate and process timer interrupts at some (hopefully
but not necessarily) programmable rate?

I consider this an implementation detail.  On all architectures it
should be possible to use interrupts, so if the hardware supports a
timer that can generate interrupts it should be possible to use this.
But it is not a requirement that all implementations must work like
  Ok, this is very nice to understand. I will attempt to summarize 
what I think this email and the previous one means. First, the required 
properties of the get_timer routine.


  1. It must have 1 millisecond resolution and accuracy (more or less).
 For instance, the old NIOS timer that incremented the timestamp by
 10 every 10 milliseconds in response to an interrupt is not compliant.
  2. The get_timer routine must have full period accuracy without any
 assumptions regarding what is going on in u-boot. This period is
 4294967 seconds or so.

I then suggest that the minimum system requirements to support the 
u-boot timer are as follows:


   * Either there exists a free-running timer whose period is at least
 4294967 seconds and whose resolution is 1 millisecond or better. This
 probably includes all 64 bit timestamp counters.
   * Or there exists a method of generating interrupts at a known rate
 of 1 millisecond or faster. This is a superset of  the current PPC
 method.
   * Or there exists a method of generating interrupts at a known fixed
 rate slower than once a millisecond AND there exists a readable
 free running counter whose period is longer or the same as the
 interrupt rate AND whose resolution is at least 1 ms. This would
 include N bit counters that generate an interrupt when they
 overflow, or some long timestamp counter with another, possibly 
 unrelated interrupt generating methodology that is faster than the
 counter overflow interval.

There are many systems that are able to do all three cases, or two of 
three, so they have a choice on how to implement get_timer(). I claim 
that these are sufficient conditions. I also claim they are necessary. 
If a hardware system does not meet these criteria, I claim it can't meet 
the get_timer requirements. Do such systems exist today that use u-boot? 
I think probably so, but maybe they are corner cases. Note that you 
cannot extend a 32 bit counter to a 64 bit counter reliably without 
using interrupts to do so when get_timer is not being called.
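
A minimal sketch of why the extension is only reliable if it is serviced at least once per hardware wrap. The struct and function names are made up for illustration and are not existing U-Boot code; the hardware counter value is passed in so the behaviour can be checked on a host:

```c
#include <stdint.h>

/* Hypothetical state for a software-extended counter. */
struct ext_counter {
	uint32_t last_hw;	/* hardware value seen on the previous call */
	uint64_t total;		/* software-extended 64 bit tick count */
};

/*
 * Add the ticks elapsed since the previous call. The subtraction is done
 * modulo 2^32, so exactly one hardware wrap between calls is handled.
 * Two or more wraps between calls are silently lost.
 */
static uint64_t ext_counter_update(struct ext_counter *c, uint32_t hw_now)
{
	c->total += (uint32_t)(hw_now - c->last_hw);
	c->last_hw = hw_now;
	return c->total;
}
```

If nothing calls the update routine for more than one full hardware period, the 64 bit value falls behind by a multiple of 2^32 ticks, which is why an interrupt (or some other guaranteed periodic call) is needed.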


Systems using the first approach above have essentially all their logic 
in the get_timer routine, but MUST use at least some 64 bit arithmetic 
(actually 33 bit arithmetic if you don't count shifts) to do so because 
the delta time between calls can be very large. The routine

Re: [U-Boot] [PATCH] [Timer]Remove calls to [get, reset]_timer outside arch/

On 5/23/2011 1:10 PM, Graeme Russ wrote:
 On 24/05/11 04:29, Scott McNutt wrote:
 Hi Bill,

 J. William Campbell wrote:
 On 5/23/2011 6:12 AM, Scott McNutt wrote:
 Dear Graeme,

 Graeme Russ wrote:
 On 23/05/11 22:19, Scott McNutt wrote:
 Hi Graeme,

 Graeme Russ wrote:
 There is no need to use get_timer() and reset_timer() and there are
 build
 I must have missed something WRT reset_timer() -- my apologies
 if I'm covering old ground.

 When the timestamp is incremented using an interrupt that occurs with
 a period greater than 1 ms, we can get early timeouts. reset_timer()
 solved the problem. What's the recommended approach for dealing with
 this without reset_timer() ?
 Hi Scott,
Are you saying that the interrupt frequency is greater than
 1000 times per second, or as I read it, the frequency is less than 1000
 per second (period greater than 1 ms). If anything, that should make the
 timer run slow, not fast.
   I wonder if it is a resolution issue. What are the typical delays in ms
 you are using?
 Some older nios2 implementations have _fixed_ 10 msec timers.
 Basically, the timestamp is incremented asynchronous to get_timer(0).
 So a  10 msec timeout can occur, for example, almost immediately if
 the timer isn't reset just prior to calling get_timer(0). There are
 more details in the comments for the following commits:

 nios2: Reload timer count in reset_timer():
d8bc0a2889700ba063598de6d4e7d135360b537e

 cfi_flash: reset timer in flash status check:
22d6c8faac4e9fa43232b0cf4da427ec14d72ad3

 I'm totally in favor of cleaning this stuff up. It caused some
 headaches (and wasted time) about 13 months ago. My primary concern
 is to avoid breaking things that currently work for us nios2
 weenies ... at least for any length of time.

 Things are a bit tight for me until next week or so. I'll probably
 come up for air around June 1st ... and I'll be glad to help out.

 Is there any reason why we cannot silently perform a reset_timer() any time
 set_timer() is called with a parameter of 0?
Hi All,
  I assume you mean get_timer(0)?  In principle, you cannot do this 
because it could be inside another get_timer(0) loop that has already 
some time elapsed before you hit the inner get_timer(0). I think what 
needs to happen on the old NIOS with 10 ms resolution on the interrupt 
times is that all timer intervals must have 10 ms added and then rounded 
up to the nearest multiple of 10. Thus, if you wanted to wait for 1 
millisecond, you must use an argument of 20 ms to be sure you wait at 
all! If you use an argument of 10, it won't help because you could get 
an interrupt right away and exit. If these routines are nios2 specific, 
you could add a local reset_timer, but I assume they are generic. Note 
that if these routines are not nios2 specific, is there any harm in 
waiting too long?
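
The rounding rule above can be sketched as follows. The helper name and the TICK_MS constant are assumptions for illustration only; nothing like this exists in the tree:

```c
#include <assert.h>
#include <stdint.h>

#define TICK_MS 10u	/* assumed interrupt period of the old NIOS timer */

/*
 * Pad a requested timeout so that at least timeout_ms really elapses when
 * the timestamp only advances every TICK_MS: add one full tick, then round
 * up to the next multiple of TICK_MS.
 */
static uint32_t pad_timeout_ms(uint32_t timeout_ms)
{
	assert(timeout_ms <= UINT32_MAX - 2 * TICK_MS);	/* avoid overflow */
	return (timeout_ms + TICK_MS + TICK_MS - 1) / TICK_MS * TICK_MS;
}
```

With this rule a request for 1 ms becomes 20 ms and a request for 10 ms also becomes 20 ms, matching the numbers given above.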

Best Regards,
Bill Campbell
 Regards,

 Graeme




Re: [U-Boot] [PATCH] [Timer]Remove calls to [get, reset]_timer outside arch/

On 5/23/2011 2:02 PM, Graeme Russ wrote:
 On 24/05/11 06:49, J. William Campbell wrote:
 On 5/23/2011 1:10 PM, Graeme Russ wrote:
 On 24/05/11 04:29, Scott McNutt wrote:
 Hi Bill,

 J. William Campbell wrote:
 On 5/23/2011 6:12 AM, Scott McNutt wrote:
 Dear Graeme,

 Graeme Russ wrote:
 On 23/05/11 22:19, Scott McNutt wrote:
 Hi Graeme,

 Graeme Russ wrote:
 There is no need to use get_timer() and reset_timer() and there are
 build
 I must have missed something WRT reset_timer() -- my apologies
 if I'm covering old ground.

 When the timestamp is incremented using an interrupt that occurs with
 a period greater than 1 ms, we can get early timeouts. reset_timer()
 solved the problem. What's the recommended approach for dealing with
 this without reset_timer() ?
 Hi Scott,
 Are you saying that the interrupt frequency is greater than
 1000 times per second, or as I read it, the frequency is less than 1000
 per second (period greater than 1 ms). If anything, that should make the
 timer run slow, not fast.
I wonder if it is a resolution issue. What are the typical delays in ms
 you are using?
 Some older nios2 implementations have _fixed_ 10 msec timers.
 Basically, the timestamp is incremented asynchronous to get_timer(0).
 So a  10 msec timeout can occur, for example, almost immediately if
 the timer isn't reset just prior to calling get_timer(0). There are
 more details in the comments for the following commits:

 nios2: Reload timer count in reset_timer():
 d8bc0a2889700ba063598de6d4e7d135360b537e

 cfi_flash: reset timer in flash status check:
 22d6c8faac4e9fa43232b0cf4da427ec14d72ad3

 I'm totally in favor of cleaning this stuff up. It caused some
 headaches (and wasted time) about 13 months ago. My primary concern
 is to avoid breaking things that currently work for us nios2
 weenies ... at least for any length of time.

 Things are a bit tight for me until next week or so. I'll probably
 come up for air around June 1st ... and I'll be glad to help out.

 Is there any reason why we cannot silently perform a reset_timer() any time
 set_timer() is called with a parameter of 0?
 Hi All,
   I assume you mean get_timer(0)?  In principle, you cannot do this
 Yes - it's early, no coffee yet ;)
 because it could be inside another get_timer(0) loop that has already some
 time elapsed before you hit the inner get_timer(0). I think what needs to
 Correct, but that is what is already happening for ALL arches in cfi due to
 the reset_timer() before get_timer(0) - I am suggesting sandboxing the
 problem to NIOS until we sort out the timer API properly

 happen on the old NIOS with 10 ms resolution on the interrupt times is that
 all timer intervals must have 10 ms added and then rounded up to the
 nearest multiple of 10. Thus, if you wanted to wait for 1 millisecond, you
 must use an argument of 20 ms to be sure you wait at all! If you use an
 argument of 10, it won't help because you could get an interrupt right away
 and exit. If these routines are nios2 specific, you could add a local
 reset_timer, but I assume they are generic. Note that if these routines
 are not nios2 specific, is there any harm in waiting too long?
 Well, we have no control over the argument in cfi driver (unless you plan
 to put #ifdef NIOS all over the place)

 Maybe we could round up the parameter inside get_timer() itself?
Hi All,
That would probably be the best way to go for now. It might slow 
things down a bit though, if these delays are all desired to be short, 
like 1 ms. We would expand the 1 ms delay to 15 ms (average) while the 
current (illegal) solution would expand a 1 ms delay to 10 ms always. It 
is worth trying I think. It is also true that any other delays in the 
program will suffer from the 10 ms resolution problem, so your idea is I 
think a good one.

Best Regards,
Bill Campbell
 Regards,

 Graeme





Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/21/2011 9:26 PM, Reinhard Meyer wrote:
 Dear Graeme Russ,
 Hi All,

 I've just had a good look through the timer API code with the view of
 rationalising it and fixing up some of the inconsistencies. What I found
 was a little scarier than I expected! Anyway, here is a write-up of what I
 found - Please comment
 We have had this discussion multiple times :) but never reached a 
 consensus.

 However, at current master, I have reduced at91 architecture to only use
 get_timer(base), set_timer() never existed and reset_timer() has been removed.

 As it showed up recently, common cfi code still calls reset_timer() - which 
 certainly
 can be fixed with little effort...

 At the lowest level, the U-Boot timer API consists of a unsigned 32-bit
 free running timestamp which increments every millisecond (wraps around
 every 4294967 seconds, or about every 49.7 days). The U-Boot timer API
 allows:
- Time deltas to be measured between arbitrary code execution points
  ulong start_time;
  ulong elapsed_time;

  start_time = get_timer(0);
  ...
  elapsed_time = get_timer(start_time);

- Repetition of code for a specified duration
  ulong start_time;

  start_time = get_timer(0);

  while (get_timer(start_time) < REPEAT_TIME) {
  ...
  }

- Device timeout detection
  ulong start_time;

  send_command_to_device();
  start_time = get_timer(0);
  while (device_is_busy()) {
  if (get_timer(start_time) > TIMEOUT)
  return ERR_TIMOUT;
  udelay(1);
  }
  return ERR_OK;
 correct.

 The U-Boot timer API is not a 'callback' API and cannot 'trigger' a
 function call after a pre-determined time.
 that would be too complex to implement and of little use in a single task
 system. u-boot can do fine with polling.

 NOTE: http://lists.denx.de/pipermail/u-boot/2010-June/073024.html appears
 to imply the following implementation of get_timer() is wrong:

  ulong get_timer(ulong base)
  {
  return get_timer_masked() - base;
  }
 It is not wrong as long as get_timer_masked() returns the full 32 bit space
 of numbers and 0xFFFFFFFF is followed by 0x00000000. Most implementations
 probably do NOT have this property.
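
 A small host-side sketch of why the plain subtraction is enough when the
 counter uses the full 32 bit range. fake_ms and get_timer_sketch() are
 stand-ins invented for this example; uint32_t is used instead of ulong so
 the arithmetic is modulo 2^32 on any host:

```c
#include <stdint.h>

static uint32_t fake_ms;	/* stands in for the free running ms counter */

/* Same shape as get_timer(): current count minus the caller's base. */
static uint32_t get_timer_sketch(uint32_t base)
{
	return fake_ms - base;	/* unsigned subtraction wraps modulo 2^32 */
}
```

 Taking start at 0xFFFFFFF0 and reading again at 0x00000010 gives an
 elapsed time of 0x20 ms, even though the counter wrapped in between.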

 U-Boot Timer API Details
 
 There are currently three functions in the U-Boot timer API:
  ulong get_timer(ulong start_time)
 As you point out in the following, this is the only function required.
 However it REQUIRES that the internal timer value must exploit the full
 32 bit range of 0x00000000 to 0xFFFFFFFF before it wraps back to 0x00000000.

  void set_timer(ulong preset)
  void reset_timer(void)

 get_timer() returns the number of milliseconds since 'start_time'. If
 'start_time' is zero, therefore, it returns the current value of the
 free running counter which can be used as a reference for subsequent
 timing measurements.

 set_timer() sets the free running counter to the value of 'preset'
 reset_timer() sets the free running counter to the value of zero[1]. In
 theory, the following examples are all identical

  
  ulong start_time;
  ulong elapsed_time;

  start_time = get_timer(0);
  ...
  elapsed_time = get_timer(start_time);
  
  ulong elapsed_time;

  reset_timer();
  ...
  elapsed_time = get_timer(0);
  
  ulong elapsed_time;

  set_timer(0);
  ...
  elapsed_time = get_timer(0);
  

 [1] arch/arm/cpu/arm926ejs/at91/ and arch/arm/cpu/arm926ejs/davinci/ are
 exceptions, they set the free running counter to get_ticks() instead
 Not anymore on at91.
 Architecture Specific Peculiarities
 ===================================
 ARM
- Generally define get_timer_masked() and reset_timer_masked()
- [get,reset]_timer_masked() are exposed outside arch/arm which is a bad
  idea as no other arches define these functions - build breakages are
  possible although the external files are most likely ARM specific (for
  now!)
- Most CPUs define their own versions of the API get/set functions which
  are wrappers to the _masked equivalents. These all tend to be the same.
  The API functions could be moved into arch/arm/lib and made weak for
  the rare occasions where they need to be different
- Implementations generally look sane[2] except for the following:
  - arm_intcm - No timer code (unused CPU arch?)
  - arm1136/mx35 - set_timer() is a NOP
  - arm926ejs/at91 - reset_timer() sets counter to get_ticks()
 no implementation of set_timer()
 See current master for actual implementation!
  - arm926ejs/davinci - reset_timer() sets counter to get_ticks()
no

Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/21/2011 11:23 PM, Graeme Russ wrote:
 On 22/05/11 14:26, Reinhard Meyer wrote:
 Dear Graeme Russ,
 Hi All,

 I've just had a good look through the timer API code with the view of
 rationalising it and fixing up some of the inconsistencies. What I found
 was a little scarier than I expected! Anyway, here is a write-up of what I
 found - Please comment
 We have had this discussion multiple times :) but never reached a
 consensus.

 However, at current master, I have reduced at91 architecture to only use
 get_timer(base), set_timer() never existed and reset_timer() has been 
 removed.
 Excellent

 As it showed up recently, common cfi code still calls reset_timer() - which
 certainly
 can be fixed with little effort...
 Yes, this is one of the easy fixes as all call sites already use the start
 = get_timer(0), elapsed = get_timer(start) convention anyway - The
 reset_timer() calls are 100% redundant (provided get_timer() behaves
 correctly at the 32-bit rollover for all arches)

 The U-Boot timer API is not a 'callback' API and cannot 'trigger' a
 function call after a pre-determined time.
 that would be too complex to implement and of little use in a single task
 system. u-boot can do fine with polling.
 I am in no way suggesting this - I just want to clarify the API for anyone
 who needs to use it

 NOTE: http://lists.denx.de/pipermail/u-boot/2010-June/073024.html appears
 to imply the following implementation of get_timer() is wrong:

  ulong get_timer(ulong base)
  {
  return get_timer_masked() - base;
  }
 It is not wrong as long as get_timer_masked() returns the full 32 bit space
 of numbers and 0xFFFFFFFF is followed by 0x00000000. Most implementations
 probably do NOT have this property.
 U-Boot Timer API Details
 
 There are currently three functions in the U-Boot timer API:
  ulong get_timer(ulong start_time)
 As you point out in the following, this is the only function required.
 However it REQUIRES that the internal timer value must exploit the full
 32 bit range of 0x00000000 to 0xFFFFFFFF before it wraps back to 0x00000000.
 So this needs to be clearly spelt out in formal documentation

 Rationalising the API
 =====================
 Realistically, the value of the free running timer at the start of a timing
 operation is irrelevant (even if the counter wraps during the timed period).
 Moreover, using reset_timer() and set_timer() makes nested usage of the
 timer API impossible. So in theory, the entire API could be reduced to
 simply get_timer()
 Full ACK here !!!
 I don't think there will be much resistance to this

 3. Remove reset_timer_masked()
 ------------------------------
 This is only implemented in arm but has propagated outside arch/arm and
 into board/ and drivers/ (bad!)

 regex [\t ]*reset_timer_masked\s*\([^)]*\); reveals 135 callers!

 A lot are in timer_init() and reset_timer(), but the list includes:
- arch/arm/cpu/arm920t/at91rm9200/spi.c:AT91F_SpiWrite()
- arch/arm/cpu/arm926ejs/omap/timer.c:__udelay()
- arch/arm/cpu/arm926ejs/versatile/timer.c:__udelay()
- arch/arm/armv7/s5p-common/timer.c:__udelay()
- arch/arm/lh7a40x/timer.c:__udelay()
- A whole bunch of board specific flash drivers
- board/mx1ads/syncflash.c:flash_erase()
- board/trab/cmd_trab.c:do_burn_in()
- board/trab/cmd_trab.c:led_blink()
- board/trab/cmd_trab.c:do_temp_log()
- drivers/mtd/spi/eeprom_m95xxx.c:spi_write()
- drivers/net/netarm_eth.c:na_mii_poll_busy()
- drivers/net/netarm_eth.c:reset_eth()
- drivers/spi/atmel_dataflash_spi.c/AT91F_SpiWrite()
 Fixed in current master.
 Excellent. I have not pulled master for a little while, guess I should

- If hardware supports microsecond resolution counters, get_timer() could
  simply use get_usec_timer() / 1000
 That is wrong. Dividing 32 bits by any number will give a result that
 does not exploit the full 32 bit range, i.e. it will wrap from
 (0xFFFFFFFF/1000) to 0x00000000,
 which makes time differences go wrong when they span across such a wrap!

 Yes, this has already been pointed out - 42 bits are needed as a bare
 minimum. However, we can get away with 32-bits provided get_timer() is
 called at least every 71 minutes
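
 The two numbers quoted here can be checked with a few lines of integer
 arithmetic. The helper names are made up for this sketch:

```c
#include <stdint.h>

/* Number of bits needed to represent v (0 needs 0 bits). */
static unsigned bits_needed(uint64_t v)
{
	unsigned n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* Largest microsecond count reached over one full 32 bit ms period. */
static uint64_t max_usec_for_full_ms_range(void)
{
	return ((uint64_t)UINT32_MAX + 1) * 1000u - 1;
}

/* Whole minutes before a 32 bit microsecond counter wraps. */
static unsigned usec32_wrap_minutes(void)
{
	return (unsigned)((((uint64_t)UINT32_MAX + 1) / 1000000u) / 60u);
}
```

 A microsecond counter must hold 2^32 * 1000 - 1 to cover the whole 32 bit
 millisecond range, which needs 42 bits, and a 32 bit microsecond counter
 wraps after about 4294 seconds, i.e. a little over 71 minutes.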

 P.S. Can we use the main loop to kick the timer?

- get_usec_timer_64() could offer a longer period (584942 years!)
 Correct. And a must be when one uses such division.
 Unless we can rely on get_timer() to be called at least every 71 minutes in
 which case we can handle the msb's without error in software

 But you have to realize that most hardware does not provide a simple means to
 implement a timer that runs in either exact microseconds or exact
 milliseconds.
 This is where things get interesting and we need to start pushing a
 mandated low-level HAL. For example, I believe get_timer() should be
 implemented in /lib as:

   ulong get_timer(ulong base)
   {
   return get_raw_msec() - base;

Re: [U-Boot] [RFC] Review of U-Boot timer API


On 5/22/2011 1:15 AM, Reinhard Meyer wrote:

Dear J. William Campbell,

please demonstrate for me (and others), by a practical example,
how _any_ arithmetic (even less with just shifts and multiplies)
can convert a free running 3.576 MHz (wild example) free running
32 bit counter (maybe software extended to 64 bits) into a ms
value that will properly wrap from 2**32-1 to 0 ?

Hi All
  I accept the challenge! I will present two ways to do this, one 
using a 32 bit by 16 bit divide, and one using only multiplies.
This first method is exact, in that there is no difference in 
performance from a hardware counter ticking at the 1 ms rate. This is 
accomplished by operating the 1 ms counter based on the delta time in 
the hardware time base. It is necessary to call this routine often 
enough that the hardware counter does not wrap more than once between 
calls. This is not really a problem, as this time is 1201 seconds or so. 
If the routine is not called for a long time, or at the first call, it 
will return a timer_in_ms value that will work for all subsequent calls 
that are within a hardware rollover interval. Since the timer in ms is a 
32 bit number anyway, the same rollover issue will exist if you 
software extend the timer to 64 bits. You must assume at most 1 rollover. 
If it is more than 1, the timer is wrong.



The variables in the gd are
u32 prev_timer;
u32 timer_in_ms;
u16 timer_remainder;

/* gd->timer_remainder must be initialized to 0 (actually, any number 
less than 3576, but 0 is nice). Other two variables don't matter but can 
be initialized if desired  */


u32 get_raw_ms(void)
{
  u32 delta_t;
  u32 t_save;

  read(t_save);   /* atomic read of the hardware 32 bit timer 
                     running at 3.576 MHz */

  delta_t = t_save - gd->prev_timer;

  gd->prev_timer = t_save;
  /*
   * Hopefully, the following two lines only result in one hardware 
   * divide when optimized. If your CPU has no hardware divide, or if it 
   * is slow, see second method.
   */
  gd->timer_in_ms += delta_t / 3576; /* convert elapsed time to ms */
  gd->timer_remainder += delta_t % 3576; /* add in remaining part 
                                            not included above */

  if (gd->timer_remainder >= 3576) /* a carry has been detected */
  {
    ++gd->timer_in_ms;
    gd->timer_remainder -= 3576; /* fix remainder for the carry above */
  }

  return gd->timer_in_ms;
}
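
For anyone who wants to try the divide-based method on a host, here is a self-contained sketch. It is not the U-Boot code: the gd fields are replaced by a local struct, and the hardware counter value is passed in as a parameter instead of being read from a register. Both changes are assumptions made so it can be compiled and checked anywhere:

```c
#include <stdint.h>

#define TICKS_PER_MS 3576u	/* 3.576 MHz counter, as in the example */

/* Hypothetical stand-in for the three fields kept in gd. */
struct ms_state {
	uint32_t prev_timer;
	uint32_t timer_in_ms;
	uint16_t timer_remainder;	/* always < TICKS_PER_MS between calls */
};

/* Fold the ticks since the previous call into the millisecond counter. */
static uint32_t raw_ms_update(struct ms_state *s, uint32_t hw_now)
{
	uint32_t delta_t = hw_now - s->prev_timer;	/* modulo 2^32 */

	s->prev_timer = hw_now;
	s->timer_in_ms += delta_t / TICKS_PER_MS;
	s->timer_remainder += delta_t % TICKS_PER_MS;
	if (s->timer_remainder >= TICKS_PER_MS) {	/* carry into ms */
		++s->timer_in_ms;
		s->timer_remainder -= TICKS_PER_MS;
	}
	return s->timer_in_ms;
}
```

Starting from an all-zero state, timer_in_ms * 3576 + timer_remainder always equals the total number of ticks seen, so the returned value matches a true 64 bit tick count divided by 3576 as long as no more than one hardware wrap happens between calls.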

This approach works well when the number of ticks per ms is an exact 
number representable as a small integer, as it is in this case. It is 
exact with a clock rate of 600 MHz, but is not exact for a clock rate of 
666 MHz. 67 is not an exact estimate of ticks per ms; it is off by 
0.5 %. That should be acceptable for use as a timeout delay. The 
accumulated error in a 10 second delay should be less than 0.5 ms.


There is a way that the divide above can be approximated by multiplying 
by an appropriate fraction, taking the resulting delta t in ms, 
multiplying it by 3576, and subtracting the product from the original 
delta to get the remainder.  This is the way to go if your CPU divides 
slowly or not at all. This approach is presented below.


The values in gd are as follows:

u32 prev_timer;
u32 timer_in_ms;

/*
One tick of the 3.576 MHz timer corresponds to 1/3.576/1000 ms,
or 0.000279642 ms. Scale the fraction by 2**27 (a 16 bit shift plus
an 11 bit shift), you get 37532.9217
*/
u32 get_raw_ms(void)
{
  u32  t_save;
  u32  temp;

  /* read the hardware 32 bit timer running at 3.576 MHz */
  read_timer_atomic(t_save);
  t_save -= gd->prev_timer; /* get delta time since last call */
  gd->prev_timer += t_save; /* assume we will use all of the counts */

  /*
   * This first while loop is entered for any delta time > about 18.3 ms.
   * The while loop will execute twice 2.734% of the time, otherwise once.
   */
  while (t_save > 65535)
  {
    temp = t_save >> 16;   /* extract msb */
    temp = ((temp * 37532) + ((temp * 60404) >> 16)) >> 11;
    /* temp = (temp * 37532) >> 11; */
    gd->timer_in_ms += temp;
    t_save -= temp * 3576;
  }
  /*
   * This second while loop is entered for 94.837% of all possible 
   * delta times, 0 through 0xFFFFFFFF. The second while loop will
   * execute twice 0.037% of the time, otherwise once.
   */
  while (t_save >= 3576)
  {
    temp = (t_save * 37532) >> (16 + 11);
    if (temp == 0)
      temp = 1; /* we know that 1 works for sure */
    gd->timer_in_ms += temp;
    t_save -= temp * 3576;
  }
  gd->prev_timer -= t_save; /* restore any counts we didn't use this time */

  return gd->timer_in_ms;
}
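
A host-testable sketch of the multiply-only method follows. As before, the struct and the hw_now parameter are assumptions that replace gd and the hardware register read; the constants are the ones from the listing above:

```c
#include <stdint.h>

#define TPM  3576u	/* ticks per ms */
#define MULT 37532u	/* integer part of 2^27 / 3576 */
#define FRAC 60404u	/* next 16 fraction bits of 2^27 / 3576 */

/* Hypothetical stand-in for the two fields kept in gd. */
struct ms_state_mul {
	uint32_t prev_timer;
	uint32_t timer_in_ms;
};

static uint32_t raw_ms_update_mul(struct ms_state_mul *s, uint32_t hw_now)
{
	uint32_t t = hw_now - s->prev_timer;	/* delta since last call */
	uint32_t temp;

	s->prev_timer = hw_now;			/* assume all ticks get used */

	while (t > 65535u) {			/* deltas above about 18.3 ms */
		temp = t >> 16;			/* upper 16 bits of the delta */
		temp = (temp * MULT + ((temp * FRAC) >> 16)) >> 11;
		s->timer_in_ms += temp;		/* never over-estimates */
		t -= temp * TPM;
	}
	while (t >= TPM) {			/* mop up the remaining ms */
		temp = (t * MULT) >> 27;
		if (temp == 0)
			temp = 1;		/* t >= TPM, so 1 ms is safe */
		s->timer_in_ms += temp;
		t -= temp * TPM;
	}
	s->prev_timer -= t;			/* carry unused ticks forward */
	return s->timer_in_ms;
}
```

Because both constants are rounded down, each estimate is never larger than the true quotient, so t never underflows, and the unused ticks are handed back through prev_timer. The result therefore agrees with an exact division by 3576.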

I have tested this code and it seems to work fine for me. I have 
attached a more readable copy as example .c for those who wish to play 
around with it. In my original post, I had a version of this code that 
could be used with different clock rates. I can provide the same 
functionality with this code, for CPUs/systems where the clock rate is 
unknown at compile time or is variable. I can also address the error

Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/22/2011 5:02 PM, Graeme Russ wrote:
 Dear Reinhard,

 On Sun, May 22, 2011 at 6:15 PM, Reinhard Meyer
 u-b...@emk-elektronik.de  wrote:
 Dear J. William Campbell,

 please demonstrate for me (and others), by a practical example,
 how _any_ arithmetic (even less with just shifts and multiplies)
 can convert a free running 3.576 MHz (wild example) free running
 32 bit counter (maybe software extended to 64 bits) into a ms
 value that will properly wrap from 2**32-1 to 0 ?

 I fail to see how that will be possible...
 These may help:

 http://blogs.msdn.com/b/devdev/archive/2005/12/12/502980.aspx
 http://www.hackersdelight.org/divcMore.pdf

 or Google 'division of unsigned integer by a constant'

 Basically you will be dividing by a constant value of 3576 which can be
 highly optimised
Hi All,
Yes, you can do it this way. I prefer not to use properties of 
the constant because it doesn't allow multiple clock rates easily. The 
code I posted just a moment ago is easy to change for different rates 
because it does not rely much on the exact bit structure of 3576.
There are simple formulas for all the magic numbers in what I posted, 
so it is easy to change the input frequency.
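
The formulas themselves are not spelled out in this message. The following is one possible derivation that reproduces the constants 37532, 60404 and the shift of 11 for 3576 ticks per ms. The struct, the function name and the restriction on the input range are assumptions of this sketch, not the posted code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the derived constants. */
struct ms_consts {
	uint32_t mult;	/* integer part of 2^(16+shift) / ticks_per_ms */
	uint32_t frac;	/* next 16 fraction bits of the same quotient */
	unsigned shift;	/* floor(log2(ticks_per_ms)) */
};

static struct ms_consts derive_ms_consts(uint32_t ticks_per_ms)
{
	struct ms_consts c = { 0, 0, 0 };
	uint64_t scale, rem;

	/* assumed limits so that the 32 bit multiplies cannot overflow */
	assert(ticks_per_ms >= 2 && ticks_per_ms <= 65535);
	/* an exact power of two would give mult == 65536; shift instead */
	assert((ticks_per_ms & (ticks_per_ms - 1)) != 0);

	while ((ticks_per_ms >> (c.shift + 1)) != 0)
		c.shift++;
	scale = 1ull << (16 + c.shift);
	c.mult = (uint32_t)(scale / ticks_per_ms);
	rem = scale % ticks_per_ms;
	c.frac = (uint32_t)((rem << 16) / ticks_per_ms);
	return c;
}
```

The 64 bit arithmetic here would only run once at initialization, so it does not affect the cost of the per-call routine.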
Best Regards,
Bill Campbell
 Regards,

 Graeme




Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/22/2011 6:42 PM, Graeme Russ wrote:
 OK, so in summary, we can (in theory) have:
   - A timer API in /lib/ with a single u32 get_timer(u32 base) function
   - A HAL with two functions
 - u32 get_raw_ticks()
 - u32 get_raw_tick_rate() which returns the tick rate in kHz (which
   maxes out at just over 4 THz)
   - A helper function in /lib/ u32 get_raw_ms() which uses get_raw_ticks()
 and get_raw_tick_rate() to correctly maintain the ms counter used by
 get_timer() - This function can be weak (see next point)
   - If the hardware supports a native 32-bit ms counter, get_raw_ms()
 can be overridden to simply return the hardware counter. In this case,
 get_raw_tick_rate() would return 1
   - Calling of get_raw_ticks() regularly in the main loop (how often will
 depend on the raw tick rate, but I imagine it will never be necessary
 to call more often than once every few minutes)
   - If the hardware implements a native 32-bit 1ms counter, no call in
 the main loop is required
   - An optional HAL function u32 get_raw_us() which can be used for
 performance profiling (must wrap correctly)
Hi All,
   Graeme, I think you have stated exactly what is the best 
approach to this problem.  I will provide a version of get_raw_ms that 
is  initialized using get_raw_tick_rate that will work for all 
reasonable values of raw_tick_rate. This will be the generic 
solution. Both the initialization function and the get_raw_ms function 
can be overridden if there is reason to do so, like exact clock rates. 
I will do the same with get_raw_us. This will be posted sometime on 
Monday for people to review, and to make sure I didn't get too far off 
base. Thank you to both Graeme and Reinhard for looking at/working on 
this. Hopefully, this solution will put this timing issue to rest for 
all future ports as well as the presently existing ones.
On a different note, the graylisting application is preventing 
some (about half) of my replies to the list from showing up. The 
messages are going to the specified individuals correctly, but not 
always to the list. This is apparently because my ISP has so many 
different network addresses available that the probability of using the 
same one (or same subnet) is not very high. Is there anything that can 
be done about this?

Best Regards,
Bill Campbell
 Regards,

 Graeme




Re: [U-Boot] [RFC] Review of U-Boot timer API

On 5/22/2011 8:26 PM, Reinhard Meyer wrote:
 Dear J. William Campbell,
 On 5/22/2011 1:15 AM, Reinhard Meyer wrote:
 Dear J. William Campbell,

 please demonstrate for me (and others), by a practical example,
 how _any_ arithmetic (even less with just shifts and multiplies)
 can convert a free running 3.576 MHz (wild example) free running
 32 bit counter (maybe software extended to 64 bits) into a ms
 value that will properly wrap from 2**32-1 to 0 ?
 Hi All
 I accept the challenge! I will present two ways to do this, one using 
 a 32 bit by 16 bit divide, and one using only multiplies.
 This first method is exact, in that there is no difference in 
 performance from a hardware counter ticking at the 1 ms rate. This 
 is accomplished by operating the 1 ms counter based on the delta time 
 in the hardware time base. It is necessary to call this routine often 
 enough that the hardware counter does not wrap more than once between 
 calls. This is not really a problem, as this time is 1201 seconds or 
 so. If the routine is not called for a long time, or at the first 
 call, it will return a timer_in_ms value that will work for all 
 subsequent calls that are within a hardware rollover interval. Since 
 the timer in ms is a 32 bit number anyway, the same rollover issue 
 will exist if you software extend the timer to 64 bits. You must 
 assume at most 1 rollover. If it is more than 1, the timer is wrong.


 The variables in the gd are
 u32 prev_timer;
 u32 timer_in_ms;
 u16 timer_remainder;

 /* gd->timer_remainder must be initialized to 0 (actually, any number 
 less than 3576, but 0 is nice). Other two variables don't matter but 
 can be initialized if desired */

 u32 get_raw_ms(void)
 {
 u32 delta_t;
 u32 t_save;

 read(t_save); /* atomic read of the hardware 32 bit timer running at 
 3.576 MHz */
 delta_t = t_save - gd->prev_timer;

 gd->prev_timer = t_save;
 /*
 Hopefully, the following two lines only result in one hardware 
 divide when optimized. If your CPU has no hardware divide, or if it 
 is slow, see second method.
 */
 gd->timer_in_ms += delta_t / 3576; /* convert elapsed time to ms */
 gd->timer_remainder += delta_t % 3576; /* add in remaining part not 
 included above */
 if (gd->timer_remainder >= 3576) /* a carry has been detected */
 {
 ++gd->timer_in_ms;
 gd->timer_remainder -= 3576; /* fix remainder for the carry above */
 }

 return gd->timer_in_ms;
 }

 Thank you! Basically this is similar to a Bresenham Algorithm.
Hi All,
   Yes, I think you are correct. I didn't know it by that name. It is 
a bit different use of the idea, but it is very similar.


 This approach works well when the number of ticks per ms is an exact 
 number representable as a small integer, as it is in this case. It is 
 exact with a clock rate of 600 MHz, but is not exact for a clock rate 
 of 666 MHz. 67 is not an exact estimate of ticks per ms; it is 
 off by 0.5 %. That should be acceptable for use as a timeout 
 delay. The accumulated error in a 10 second delay should be less than 
 0.5 ms.

 I would think the non exact cases result in such a small error that 
 can be
 tolerated. We are using the ms tick for timeouts, not for providing a 
 clock
 or exact delays. We should just round up when calculating the divider.
Yes, we should round off the divider value, so 666.666666 MHz rounds 
to 666667 ticks/ms while 333.333333 MHz rounds to 333333 ticks/ms.

 Hence the hiccups that result when this is not called frequently enough to
 prevent a multiple rollover of the raw value between calls do not matter
 either (they should just be documented).
Good, I am glad we agree on this also.


 There is a way that the divide above can be approximated by 
 multiplying by an appropriate fraction, taking the resulting delta t 
 in ms, multiplying it by 3576, and subtracting the product from the 
 original delta to get the remainder. This is the way to go if your 
 CPU divides slowly or not at all. This approach is presented below.

 [...]

 Optimizations would be up to the implementer of such a hardware and work
 only if the divider is a compile time constant. Often the divider will be
 run time determined (AT91 for example).
Correct. I will provide a generic version that computes the constants 
at run time.  If the clock rate is a constant, these routines can be 
overridden at compile/link time. This generic version should be 
available on Monday for further review.

Best Regards,
Bill Campbell

 Reinhard



___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot

Re: [U-Boot] [RFC] Review of U-Boot timer API

2011-05-21 Thread J. William Campbell

On 5/21/2011 5:06 PM, Graeme Russ wrote:
 On 22/05/11 01:33, J. William Campbell wrote:
 On 5/21/2011 5:38 AM, Graeme Russ wrote:
 Hi All,
 4. Implement microsecond API - get_usec_timer()
 ---
- Useful for profiling
- A 32-bit microsecond counter wraps in 71 minutes - Probably OK for most
  U-Boot usage scenarios
- By default could return get_timer() * 1000 if hardware does not support
  microsecond accuracy - Beware of results > 32 bits!
 Hi All,
I think the multiply overflowing an unsigned long is ok here as long
 as the timeout value you desire is less than 71 seconds. This assumes that
 the CPU returns the correct lower 32 bits when overflow occurs, but I think
 this is the normal behavior(?)
 I think you mean 71 minutes - 2^32 = 4294967296 (usec) = 4294967.296 (msec)
 = 4294.967296 s = 71.582788267 minutes

 So provided any millisecond timer using a 32-bit microsecond timer as a
 time base is called every 71 minutes, a 32-bit microsecond  timer should
 suffice to keep the msb's accurately maintained by software

Hi Graeme,
 Yes, you are certainly correct, it is minutes, not seconds.
- If hardware supports microsecond resolution counters, get_timer() could
  simply use get_usec_timer() / 1000
 I think this actually is NOT equivalent to the current API  in
 that the full 32 bits of the timer is not available and as a result the
 wrapping properties of a 32 bit subtract for delta times will not work
 properly. If a larger  counter is available in hardware, then it is
 certainly possible to do a 64 by 32 bit divide in get_timer, but probably
 you don't want to do that  either. As previously discussed, it is possible
 to extract a 32 bit monotonic counter of given resolution (microsecond or
 millisecond resolution) from a higher resolution counter using a shift to
 approximately the desired resolution followed by a couple of multiply/add
 functions of 32 bit resolution.  To do this with a microsecond resolution,
 a 42 bit or larger timer is required. The extra bits can be provided in
 Of course, how silly of me - To downscale the microsecond timer down to
 milliseconds you need to have at least 1000 times more resolution
 (9.965784285 bits) - It was late ;)

 software as long as the get_timer/get_usec_timer routines are called more
 often than every 71/2 sec, so that a correct delta in microseconds can be
 obtained. Note that when the timer is not actively in use (not  called
 often enough), the millisecond timer msb would stop updating, but that
 wouldn't matter.
 Minutes - see above
Correct.
 If the hardware supports sub-microsecond accuracy in a longer
 register, say 64 bits, you can just convert the 64 bit hardware timer to 32
 bit microseconds or milliseconds by a shift  and 32 bit multiplies
 Yes

Good luck with this effort. I think getting the timer API and also
 the method of implementation of the interface to the hardware to be the
 same across all u-boot architectures is a very good idea, and it is
 possible. However, it is a quite a bit of work and I am glad you are brave
 enough to try!
 It's all there already - it just needs a little bit of housekeeping :)
Correct, if we do not worry too much about the low level details of 
get_timer. It looks to me like there is a lot of cruft there, depending 
on which architecture one looks at. Many implementations create a 
parallel universe of get_ticks or some similar timing system that is 
then used to support get_timer. However, the get_ticks routines are also 
used in timeouts elsewhere in the code. Searching on get_timer doesn't 
find these non-standard usages. It would be nice to fix these also, but 
every bit does help!

Best Regards,
Bill Campbell
 Regards,

 Graeme




Re: [U-Boot] [RFC] ARM timing code refactoring

2011-01-24 Thread J. William Campbell

On 1/24/2011 5:02 AM, Wolfgang Denk wrote:
 Dear Albert ARIBAUD,

 In message <4d3d2942.4060...@free.fr> you wrote:
 That is assuming a 64-bit timebase, isn't it? for CPUs / SoCs that don't
 have such a timebase but only a 32-bit timer, the bogo_ms/jiffy would
 not go through the full 32-bit range, which would cause issues with the
 timing loops on rollover -- and while a timeout of more than 65 sec may
 not be too likely, a timeout starting near the wraparound value of
 bogo_ms still could happen.
Hi All,
If you use my approach of shifting right by n bits  (in order to 
get a counter that has about 1 ms resolution),  zeros are shifted into 
the top. The counter would never reach more than (32-n) bits in 
magnitude. This is easily fixed by creating a virtual n msbs for the 
counter, as must be done if a 64 bit counter is to be simulated. 
Actually, I only need n bits, but in any case it requires detecting 
that the counter has backed up mod 32 bits and if so, increase the 
virtual counter by 1.

If you really, really, really want a timer that ticks at a 1 ms rate, 
instead of bogo_ms/jiffies, I propose the following:

u32 get_time()
{
   u32 delta_t_bogo_ms;
   u32 t_save;
#if defined(TIMER_IS_64_BITS)
   u64 tick;
   read(tick);

   t_save = tick >> gd->timer_shift;
   delta_t_bogo_ms = t_save - gd->prev_timer;
#else
   read(t_save);
   delta_t_bogo_ms = (t_save - gd->prev_timer) >> gd->timer_shift;
#endif
   gd->prev_timer = t_save;  /* save previous counter */
   if (delta_t_bogo_ms < gd->bogo_ms_per_65_sec)
   {
      gd->fract_timer += delta_t_bogo_ms * 
gd->cvt_bogo_ms_to_ms; /* accumulate fraction of ms */
      gd->timer_in_ms += gd->fract_timer >> 16;
      gd->fract_timer &= 0XFFFF;
   }
   else
      gd->fract_timer = 0; /* start accumulating from 0 fraction */
   return(gd->timer_in_ms);
}

This routine will create a timer in ms that will be 
accurate as long as this routine is called either once about every 65 
seconds or once per time base rollover, whichever is less. Nested timer 
use is ok. If the timer is not called frequently enough, it will return 
the same value twice, but after that it will start timing normally. This 
shouldn't matter, as the second returned value will  be the start of a 
timing loop. Timeout values can be the full 32 bits, as long as you keep 
calling the routine frequently enough. No initialization is required.
   Note that you can (and probably should) use the bottom 32 
bits of the hardware timebase as a 32 bit timebase unless the clock 
would overflow in 65 seconds (running faster than about 66 MHz), or if 
you want to  relax the 65 seconds.
If you want to save a word in the gd data, use 0X10000 instead of 
bogo_ms_per_65_sec (or a more precise value if you know it). Note that 
gd->timer_shift and gd->cvt_bogo_ms_to_ms can also be replaced by 
constants if the clock speed is fixed.

This is more expensive than using the bogo_ms timer, but does have the 
advantage that everything is in ms. FWIW, I think converting from ms to 
some other unit for loop control is fine, as long as we have a standard 
routine to do that that is cheap. However, others may not agree. For 
sure, passing around 64 bit tick values to do this process is, IMHO, 
vast overkill and not a good general solution, as many processors really 
don't like to do 64 bit operations.

Best Regards.
Bill Campbell


 Sorry, but I don't get it.  What exactly is the problem with a 32 bit
 counter, and why would it not go through the full 32-bit range?

 Best regards,

 Wolfgang Denk



Re: [U-Boot] [RFC] ARM timing code refactoring

2011-01-23 Thread J. William Campbell

On 1/23/2011 2:57 PM, Wolfgang Denk wrote:
 Dear Reinhard Meyer,

 In message <4d3c9bfc.1010...@emk-elektronik.de> you wrote:
 get_timer() returns a monotonous upward counting time stamp with a
 resolution of milliseconds. After reaching ULONG_MAX the timer wraps
 around to 0.
 Exactly that wrap makes the situation so complicated, since the simple code
 u32 get_timer(void)
 {
return (ticks * 1000ULL) / tickspersec;
 }
 won't do that wrap.
 Do you have a better suggestion?

 The get_timer() implementation may be interrupt based and is only
 available after relocation.
 Currently it is used before relocation in some places, I think I have
 seen it in NAND drivers... That would have to be changed then.
 Indeed.  It is unreliable or even broken now.

 This is already implemented functionally very closely (apart from factoring 
 and the
 get_timer(void) change) to this in AT91, the only (academic) hitch is that 
 it will
 burp a few billion years after each reset :)
 What bothers me is the need for 64 bit mul/div in each loop iteration, for 
 CPUs without
 hardware for that this might slow down data transfer loops of the style

 u32 start_time = get_timer();
 do {
  if (data_ready)
  /* transfer a byte */
  if (get_timer() - start_time > timeout)
  /* fail and exit loop */
 } while (--bytestodo > 0);

 since get_timer() will be somewhat like:

  return (tick * 1000ULL) / tickspersec;

 As I stated before, tickspersec is a variable in, for example, AT91. So the
 expression cannot be optimized by the compiler.
 I don't think this is the only way to implement this. How does Linux
 derive time info from jiffies?

Hi All,
   In order to avoid doing 64 bit math, we can define a jiffie 
or a bogo_ms that is the 64 bit timebase shifted right such that the 
lsb of the bottom 32 bits has a resolution of between 0.5 ms and 1 ms. 
It is then possible to convert the difference between two jiffie/bogo_ms 
values to a number of ms using a 32 bit multiply and a right shift of 16 
bits, with essentially negligible error.  get_bogo_ms() would return a 
32 bit number in bogo_ms, thus the timing loop would be written.

u32 start_time = get_bogo_ms();
do {
 if (data_ready)
 /* transfer a byte */
 if (bogo_ms_to_ms(get_bogo_ms() - start_time) > TIMEOUT_IN_MS)
 /* fail and exit loop */
} while (--bytestodo > 0);

u32 get_bogo_ms()
{
 u64 tick;
 read(tick);

 return (tick >> gd->timer_shift);
}
u32 bogo_ms_to_ms(u32 x)
{
/* this code assumes the resulting ms will be between 0 and 65535, 
or 65 seconds */
return ((x * gd->cvt_bogo_ms_to_ms) >> 16); /* cvt_bogo_ms_to_ms 
is a 16 bit binary fraction */
}

All the above code assumes timeouts are 65 seconds or less, which I 
think is probably fair. Conversion of ms values up to 65 seconds to 
bogo_ms is also easy, and a 32 bit multiplied result is all that is 
required.
What is not so easy is converting a 32 bit timer value to ms.  It can be 
done if the CPU can do a 32 by 32 multiply to produce a 64 bit result, 
use the msb, and possibly correct the result by an add if bit 32 of the 
timer is set.  You need a 33 bit counter in bogo_ms to get a monotonic, 
accurate 32 bit counter in ms. The powerpc can use a mulhw operation to 
do this, and any CPU that will produce a 64 bit product can do this. 
However, many CPUs do not produce 64 bit products easily. Using division 
to do these operations is even less appealing, as many CPUs do not 
provide hardware division at all. Since it is not necessary to do this 
conversion to easily use timeouts with 1 ms resolution and accuracy,  I 
think the idea of not using a timer in ms but rather bogo_ms/jiffies is 
possibly better?

Best Regards,
Bill Campbell

 Best regards,

 Wolfgang Denk



Re: [U-Boot] TIMER cleanup RFC, was: [PATCH 4/4] arm920t/at91/timer: replace bss variables by gd

2010-11-30 Thread J. William Campbell

On 11/30/2010 1:14 AM, Reinhard Meyer wrote:
 Dear Wolfgang Denk,

 what we really need is only a 32 bit monotonous free running tick that 
 increments
 at a rate of at least 1 MHz. As someone pointed out a while ago, even at 1GHz 
 that would
 last for four seconds before it rolls over. But a 1 GHz counter could be 64 
 bit internally
 and always be returned as 32 bits when it is shifted right to bring it into 
 the few MHz
 range.

 Any architecture should be able to provide such a 32 bit value. On powerpc 
 that would
 simply be tbu|tbl shifted right a few bits.

 An architecture and SoC specific timer should export 3 functions:

 int timer_init(void);
 u32 get_tick(void); /* return the current value of the internal free running 
 counter */
 u32 get_tbclk(void); /* return the rate at which that counter increments (per 
 second) */

 A generic timer function common to *most* architectures and SoCs would use 
 those two
 functions to provide udelay() and reset_timer() and get_timer().
 Any other timer functions should not be required in u-boot anymore.

 However get_timer() and reset_timer() are a bit of a functional problem:

 currently reset_timer() does either actually RESET the free running timer 
 (BAD!) or
 remember its current value in another (gd-)static variable which later is 
 subtracted
 when get_timer() is called. That precludes the use of several timers 
 concurrently.

 Also, since the 1000Hz base for that timer is usually derived from get_tick() 
 by
 dividing it by some value, the resulting 1000Hz base is not exhausting the 32 
 bits
 before it wraps to zero.

 Therefore I propose two new functions that are to replace reset_timer() and 
 get_timer():

 u32 init_timeout(u32 timeout_in_ms); /* return the 32 bit tick value when the 
 timeout will be */
 bool is_timeout(u32 reference); /* return true if reference is in the past */

 A timeout loop would therefore be like:

 u32 t_ref = init_timeout(3000);   /* init a 3 second timeout */

 do ... loop ... while (!is_timeout(t_ref));

 coarse sketches of those functions:

 u32 init_timeout(u32 ms)
 {
   return get_ticks() + ((u64)get_tbclk() * (u64)ms) / (u64)1000;
 }

 bool is_timeout(u32 reference)
 {
   return ((int)get_ticks() - (int)reference) > 0;
 }

 Unfortunately this requires to fix all uses of get_timer() and friends, but 
 I see no other
 long term solution to the current incoherencies.

 Comments welcome (and I would provide patches)...
Hi All,
   The idea of changing the get_timer interface to the 
init_timeout/is_timeout pair has the advantage that it is only necessary 
to change the delay time in ms to an internal timebase once, and after 
that, only a 32-bit subtraction is required. I do not however like the 
idea of using 64 bit math to do so, as on many systems this is quite 
expensive. However, this is a feature that can be optimized for 
particular CPUs. I also REALLY don't like the idea of having a get_ticks 
function, because for sure people will use this instead of the desired 
interface to the timer because it is better. Then we get back into a 
mess. Since in most cases get_ticks is one or two instructions, please, 
let us hide them in init_timeout/is_timeout.
An alternate approach, which has the merit of being more like 
the originally intended interface, simply disallows reset_timer since it 
is  totally unnecessary. The only dis-advantage of the original approach 
using just get_timer is that the conversion to ms must be considered at 
each call to get_timer, and will require at a minimum one 32 bit integer 
to remember the hardware timer value the last time get_timer was called 
(unless the hardware time can be trivially converted to a 32 bit value 
in ms, which is quite uncommon). This is not a high price to pay, and 
matches the current usage. This is probably for Mr. Denk to decide. If 
we were just starting now, the init_timeout/is_timeout is simpler, but 
since we are not, perhaps keeping the current approach has value.
I would really like to help by providing some patches, but I am 
just way too busy at present.

Best Regards,
Bill Campbell
 Reinhard


Re: [U-Boot] TIMER cleanup RFC, was: [PATCH 4/4] arm920t/at91/timer: replace bss variables by gd

2010-11-30 Thread J. William Campbell

On 11/30/2010 7:48 AM, Reinhard Meyer wrote:
 Dear J. William Campbell,
 On 11/30/2010 1:14 AM, Reinhard Meyer wrote:
 Dear Wolfgang Denk,

 what we really need is only a 32 bit monotonous free running tick that
 increments
 at a rate of at least 1 MHz. As someone pointed out a while ago, even
 at 1GHz that would
 last for four seconds before it rolls over. But a 1 GHz counter could
 be 64 bit internally
 and always be returned as 32 bits when it is shifted right to bring it
 into the few MHz
 range.

 Any architecture should be able to provide such a 32 bit value. On
 powerpc that would
 simply be tbu|tbl shifted right a few bits.

 An architecture and SoC specific timer should export 3 functions:

 int timer_init(void);
 u32 get_tick(void); /* return the current value of the internal free
 running counter */
 u32 get_tbclk(void); /* return the rate at which that counter
 increments (per second) */

 A generic timer function common to *most* architectures and SoCs would
 use those two
 functions to provide udelay() and reset_timer() and get_timer().
 Any other timer functions should not be required in u-boot anymore.

 However get_timer() and reset_timer() are a bit of a functional problem:

 currently reset_timer() does either actually RESET the free running
 timer (BAD!) or
 remember its current value in another (gd-)static variable which later
 is subtracted
 when get_timer() is called. That precludes the use of several timers
 concurrently.

 Also, since the 1000Hz base for that timer is usually derived from
 get_tick() by
 dividing it by some value, the resulting 1000Hz base is not exhausting
 the 32 bits
 before it wraps to zero.
 (###)
Hi All,
   A correct method to provide a full 32 bits of resolution at a 1 
ms rate is simple. It requires maintaining a software timer in ms. This 
timer is updated by converting the difference in the hardware time base 
to ms and then adding the correct number of ms to the software timer.
As previously discussed, this only requires calling get_timer once a 
second or so. The conversion can be made simply in most cases, 
especially if the clock rate is a nice number. Even in the worst case, 
it is not too hard to adjust the saved hardware timer value to contain 
any remainder ticks. Yes, if this is done incorrectly, the results are bad.
 Therefore I propose two new functions that are to replace
 reset_timer() and get_timer():

 u32 init_timeout(u32 timeout_in_ms); /* return the 32 bit tick value
 when the timeout will be */
 bool is_timeout(u32 reference); /* return true if reference is in the
 past */

 A timeout loop would therefore be like:

 u32 t_ref = init_timeout(3000);/* init a 3 second timeout */

 do ... loop ... while (!is_timeout(t_ref));

 coarse sketches of those functions:

 u32 init_timeout(u32 ms)
 {
  return get_ticks() + ((u64)get_tbclk() * (u64)ms) / (u64)1000;
 }

 bool is_timeout(u32 reference)
 {
  return ((int)get_ticks() - (int)reference) > 0;
 }

 Unfortunately this requires to fix all uses of get_timer() and
 friends, but I see no other
 long term solution to the current incoherencies.

 Comments welcome (and I would provide patches)...
 Hi All,
The idea of changing the get_timer interface to the
 init_timeout/is_timeout pair has the advantage that it is only necessary
 to change the delay time in ms to an internal timebase once, and after
 that, only a 32-bit subtraction is required.
 Exactly.
 I do not however like the
 idea of using 64 bit math to do so, as on many systems this is quite
 expensive. However, this is a feature that can be optimized for
 particular CPUs.
 64 Bit math is only necessary when get_tbclk() times the maximum anticipated
 timeout (in ms) is larger than 32 bits. This happens easily when tbclk is a
 few MHz.
What I think you mean to say here is that the delay time fits easily 
into 32 bits for all reasonable clock rates. For instance, a 1 GHz clock 
rate yields a maximum delay of about 2 seconds in a 32 bit word?  For 
fast clock rates, it is probably advisable to shift the 64 bit counter 
right a few bits to ensure adequate range.
   Besides the current working(!) implementations use 64 bit math
 already.
In ARM perhaps, but I think there are lots of working 32 bit 
implementations on other processors(?).
   You can optimize into 32 bits by factoring the timeout_in_ms into
 whole seconds, and the remainder.
True.
 I also REALLY don't like the idea of having a get_ticks
 function, because for sure people will use this instead of the desired
 interface to the timer because it is better. Then we get back into a
 mess. Since in most cases get_ticks is one or two instructions, please,
 let us hide them in init_timeout/is_timeout.
 Agreed. However I intended to split the timer into two sources (read above):
 one hardware dependent part exporting exactly those functions, and one
 generic part using them.
Yes, and that is EXACTLY what I think is a bad idea for the reason I 
mentioned. If the get_ticks interface

Re: [U-Boot] Timer implementations

2010-11-01 Thread J. William Campbell

On 10/27/2010 11:02 PM, Reinhard Meyer wrote:
 Dear J. William Campbell,
 Hi All,

 I am pretty sure the migration to 64 bits was caused by 1) people not 
 understanding that the timer operating on time DIFFERENCES would work 
 fine even if the underlying timer wrapped around (most probable 
 problem) and possibly 2) broken timer functions causing bogus 
 timeouts, improperly fixed by switching to 64 bits.

 I think u-boot could get along just fine with only 2 time related 
 functions, uint_32 get_timer(uint_32 base) and udelay(uint_32 delay). 
 udelay will only work on small values of delay, on the order of 
 milliseconds. It is to be used when a short but precise delay in 
 microsecond resolution is required. Users of get_timer must 
 understand that it is only valid if it is called often enough, i.e. 
 at least once per period of the underlying timer. This is required 
 because u-boot does not want to rely on interrupts as a timer update 
 method. Therefore, all uses of get_timer must 1) call it once 
 initially to get a start value, and 2) call get_timer at least once 
 per period of the underlying hardware counter. This underlying period 
 is guaranteed to be at least 4.29 seconds (32 bit counter at 1 GHz). 
 Note that this does NOT mean that the total wait must be less than 
 4.29 seconds, only that the rate at which the elapsed time is TESTED 
 must be adequate.

 In order to implement this functionality, at least one hardware timer 
 of some kind is required. An additional software timer in 1 ms 
 resolution may be useful in maintaining the software time. If the 
 hardware timer update rate is programmable, u-boot MAY set the update 
 rate on initialization. On initialization, u-boot MAY reset the 
 hardware timer and MAY reset any associated software timer. The 
 hardware timer MAY be started on initialization. On each call to 
 get_timer(), u-boot MUST start the hardware timer if it was not 
 started already. On calls to get_timer, u-boot MUST NOT reset the 
 hardware timer if it was already started. The software timer MAY be 
 reset if u-boot can unambiguously determine that more than 4.29 
 seconds has elapsed since the last call to get_timer.

 The simplest case for implementing this scheme is if two programmable 
 timers exist that can be set to 1ms and 1us. The timers are 
 initialized at start-up, get_timer just returns the 32 bit 1 ms timer 
 and udelay just waits for the number of ticks required on the second 
 timer to elapse. The most common harder case is where there is only 
 one timer available, it is running at 1 us per tick or faster, and we 
 cannot control the rate. udelay is still easy, because we can convert 
 the (small) delay in us to a delay in ticks by a 32 bit multiply that 
 will not overflow 32 bits even if we have quite a few fractional bits 
 in the ticks per microsecond value. The elapsed ticks required is the 
 (delay in us * ticks per us) >> # fractional bits in ticks per us. If 
 that is not close enough for you, you can do it as (delay in us * 
 (integer part of ticks/us)) + ((delay in us * (fractional 
 part of ticks/us)) >> # fraction bits). For nice numbers, like any 
 integral number of MHz, there is no fractional 
 part. Only numbers like 66 MHz, or 1.666 GHz require messing with the 
 fractional part.
 For get_timer, it is a bit harder. The program must keep two 32 bit 
 global variables, the timer reading last time and the software 
 timer in 1 ms resolution. Whenever get_timer is called, it must 
 increase the software timer by the number of ms that have elapsed 
 since the previous update and record the corresponding timer reading 
 as the new last time. Note that if the number of ms elapsed is not 
 an integer (a common case), the value recorded as the last time 
 must be decreased by the number of ticks not included in the 1 ms 
 resolution software timer. There are many different ways to 
 accomplish update, depending on what hardware math capabilities are 
 available, and whether one thinks efficiency is important here. 
 Conceptually, you convert the elapsed time in ticks into an 
 equivalent number of ms, add that number to the software timer, store 
 the current value of the hardware timer in last time, and subtract 
 any remainder ticks from that value. If the elapsed time is less 
 than one ms, do no update of last hardware time and return the 
 current software counter. If the elapsed time is greater than 4.29 
 seconds, reset the software counter to 0, record the current hardware 
 counter time and return the current software counter. In between, do 
 the math, which will fit into 32 bits.

 If this idea seems like a good one, I can provide more detail on the 
 conversions for various hardware capabilities if people want. 
 Comments welcome.

 To get the timer mess cleaned up three things have to happen:

Hi All,

   I am glad somebody was still interested. I was afraid I had 
scared everyone off.
 1. A consensus and documentation how it MUST

Re: [U-Boot] [PATCH v2] mmc: omap: timeout counter fix

2010-10-26 Thread J. William Campbell

On 10/25/2010 11:01 PM, Reinhard Meyer wrote:
 Dear Wolfgang Denk,
 Dear Reinhard Meyer,

 In message <4cc66a67.4000...@emk-elektronik.de> you wrote:
 It fails in case the timer wraps around.

 Assume 32 bit counters, start time = 0xFFFFFFF0, delay = 0x20. It
 will compute end = 0x10, the while condition is immediately false, and
 you don't have any delay at all, which most probably generates a
 false error condition.
 I used and assumed a 64 bit counter, that will not wrap around while
 our civilization still exists...

 The code is still wrong, and as a simple correct implementation exists
 there is no excuse for using such incorrect code.

 Please fix that!
 Agreed here. People are invited to dig through u-boot and find all
 those places.

 If get_ticks() is only 32 bits worth, both methods will misbehave
 at a 32 bit wrap over.
 No.

start = time();
while ((time() - start) < delay)
...

 This works much better (assuming unsigned arithmetics).
 True, provided the underlying timer is really 64 bits, otherwise
 this fails, too...
 You are wrong. Try for example this:
   
 - snip ---
 #include <stdio.h>
   
 int main(void)
 {
  unsigned int time = 0xFFFFFFF0;
  unsigned int delay = 0x20;
  unsigned int start;

 You are wrong here, because you take it out of context.
 My demo is using the (declared as) 64 bit function get_ticks().
 I mentioned above that this function MUST be truly returning 64
 bits worth of (incrementing) value to make any version work.
 If get_ticks() just returns a 32 bit counter value neither method will work
 reliably. Just check all implementations that this function is implemented
 correctly.
Hi All,
   I have wondered for quite some time about the rush to make 
get_ticks() return a 64 bit value. For any reasonable purpose, like 
waiting a few seconds for something to complete, a 32 bit timebase is 
plenty adequate. If the number of ticks per second is 1000000000, i.e. a 
1 GHz clock rate, the clock wraps in a 32 bit word about every four 
seconds.
The trick is that time always moves forward, so a current get_ticks() - 
a previous get_ticks() is ALWAYS a positive number. It is necessary to 
check the clock at least once every (0X100000000 - your_timeout) ticks, 
but unless your timeout is very near the maximum time that fits 
into 32 bits, this won't be a problem. Most CPUs have a counter that 
counts at a reasonable rate. Some CPUs also have a cycle counter that 
runs at the CPU clock rate. These counters are useful to determine 
exactly how many machine cycles a certain process took, and therefore 
they have high resolution. Timers for simple delays neither need nor 
want such resolution. If the only counter available on your CPU runs at 
several GHz, and is 64 bits long, just shift it right a few bits to 
reduce it to a reasonable resolution and return a 32 bit 
value. There is no need for a bunch of long long variables and extra 
code running around to process simple timeouts. It may be that we need a 
different routine for precision timing measurements with high 
resolution, but it needn't, and probably shouldn't IMHO be get_ticks().

Best Regards,
Bill Campbell

Re: [U-Boot] Timer implementations

2010-10-26 Thread J. William Campbell

On 10/26/2010 6:33 AM, Reinhard Meyer wrote:
 Dear Wolfgang Denk,
 Then the define CONFIG_SYS_HZ should not be in every <board>.h since that
 suggests that a board developer has some freedom there...
 Agreed - there are historical reasons this has ever been changeable at
 all.

 and MOST IMPORTANT that some implementations of udelay() might
 call reset_timer() and therefore break an outer timeout loop !!!
 Such implementations are inherently broken and need to be fixed.
 Found such in arm926ejs/omap... But then, that timer is multiple-broken:
 relocation broken (uses static data), returns a 32 bit value in get_ticks(),
 returns CONFIG_SYS_HZ in get_tbclk() instead of the rate get_ticks()
 increments...

 PXA:
 void udelay_masked (unsigned long usec)
 {
   unsigned long long tmp;
   ulong tmo;

   tmo = us_to_tick(usec);
   tmp = get_ticks() + tmo;/* get current timestamp */

   while (get_ticks() < tmp)
   /* loop till event _OR FOREVER if tmp happens to be > 32 bit_ */
   /*NOP*/;

 }

 unsigned long long get_ticks(void)
 {
   return readl(OSCR);
 }
 - not any better :( -- its the same code that AT91 had before I fixed it.

 It is also open if reset_timer() does actually reset the hardware timer
 (e.g. tbu/tbl at PPC) - which would be messing up any time difference
 calculation using get_ticks() - or does emulate that by remembering
 the hardware value and subtracting it later in every subsequent
 get_timer() call?
 This is an implementation detail.
 IF we require that get_ticks() and get_timer() shall not interfere with
 each other and IF both are based on the same hardware timer only the
 second method is available (same if the hardware timer is not easily
 resettable).

 2. get_ticks() and friends operate at a higher rate (tbu/tbl for PPC).
 Since they are defined as having 64 bits they MUST not wrap at 32 bits,
 i.e. if the hardware provides only 32 bits, the upper 32 bits must be
 emulated by software.
 Right.

 Otherwise we have to document that get_ticks() cannot be used to get
 64 bit time differences.
 No. Such an implementation is broken and needs fixing.
 Original AT91 timer.c was like that, and I think other ARMs where this was
 copied around should be looked at... I don't know when get_timer() became
 64 bits, but it seems that some implementations just did change the return
 type: uint64 get_timer(void) {return (uint64)timer_val_32;}
Hi All,

  I am pretty sure the migration to 64 bits was caused by 1) people 
not understanding that the timer operating on time DIFFERENCES would 
work fine even if the underlying timer wrapped around (most probable 
problem) and possibly 2) broken timer functions causing bogus timeouts, 
improperly fixed by switching to 64 bits.

I think u-boot could get along just fine with only 2 time related 
functions, uint_32 get_timer(uint_32 base) and udelay(uint_32 delay).
udelay will only work on small values of delay, on the order of 
milliseconds. It is to be used when a short but precise delay in 
microsecond resolution is required.  Users of get_timer must understand 
that it is only valid if it is called often enough, i.e. at least once 
per period of the underlying timer. This is required because u-boot  
does not want to rely on interrupts as a timer update method. Therefore, 
all uses of get_timer must 1) call it once initially to get a start 
value, and 2) call get_timer at least once per period  of the underlying 
hardware counter. This underlying period is guaranteed to be at least 
4.29 seconds (32 bit counter at 1 GHz). Note that this does NOT mean 
that the total wait must be less than 4.29 seconds, only that the rate 
at which the elapsed time is TESTED must be adequate.

In order to implement this functionality, at least one hardware timer of 
some kind is required. An additional software timer in 1 ms resolution 
may be useful in maintaining the software time. If the hardware timer 
update rate is programmable, u-boot MAY set the update rate on 
initialization. On initialization, u-boot MAY reset the hardware timer 
and MAY reset any associated software timer. The hardware timer MAY be 
started on initialization. On each call to get_timer(), u-boot MUST 
start the hardware timer if it was not started already. On calls to 
get_timer, u-boot MUST NOT reset the hardware timer if it was already 
started. The software timer MAY be reset if u-boot can unambiguously 
determine that  more than 4.29 seconds has elapsed since the last call 
to get_timer.
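
A minimal sketch of how get_timer could maintain a 1 ms software timer
from a free-running hardware counter, without interrupts, is given
below. It rests on assumptions that are not in the text above: the
counter is taken to be 32 bits wide and to run at 1 MHz (TICKS_PER_MS),
and hw_counter / read_hw_counter() are hypothetical stand-ins for the
real timer register access.

```c
#include <stdint.h>

#define TICKS_PER_MS 1000u      /* assumption: 1 MHz free-running counter */

static uint32_t hw_counter;     /* hypothetical stand-in for a timer register */
static uint32_t last_hw;        /* hardware value seen on the previous call */
static uint32_t soft_ms;        /* software timer, 1 ms resolution */
static uint32_t frac_ticks;     /* ticks not yet amounting to a full ms */

static uint32_t read_hw_counter(void)
{
    return hw_counter;
}

/* Returns milliseconds elapsed since 'base'. Correct only if called at
 * least once per period of the hardware counter, as described above. */
uint32_t get_timer(uint32_t base)
{
    uint32_t now = read_hw_counter();
    uint32_t delta = now - last_hw;     /* modulo 2^32, survives one wrap */

    last_hw = now;
    soft_ms += delta / TICKS_PER_MS;
    frac_ticks += delta % TICKS_PER_MS;
    if (frac_ticks >= TICKS_PER_MS) {   /* carry leftover ticks into ms */
        frac_ticks -= TICKS_PER_MS;
        soft_ms++;
    }
    return soft_ms - base;
}
```

A timeout loop would call start = get_timer(0) once and then test
get_timer(start) against the desired number of milliseconds. The
hardware counter itself is never reset.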

The simplest case for implementing this scheme is if two programmable 
timers exist that can be set to 1ms and 1us.  The timers are initialized 
at start-up, get_timer just returns the 32 bit 1 ms timer and udelay 
just waits for the number of ticks required on the second timer to 
elapse. The most common harder case is where there is only one timer 
available, it is running at 1 us per tick or faster, and we cannot 
control the rate. udelay is still

Re: [U-Boot] [PATCH 2/2] ARM: fix relocation support for onenand device.

2010-10-23 Thread J. William Campbell

On 10/23/2010 1:56 PM, Wolfgang Denk wrote:
 Dear Enric Balletbo i Serra,

 In message1287479602-21721-3-git-send-email-eballe...@iseebcn.com  you 
 wrote:
 We also have to relocate the onenand command table manually, otherwise
 onenand command don't work.

 Signed-off-by: Enric Balletbo i Serraeballe...@iseebcn.com
 ---
   arch/arm/lib/board.c |3 +++
   common/cmd_onenand.c |6 ++
   2 files changed, 9 insertions(+), 0 deletions(-)

Is this patch still necessary? I thought the relocation change made it OBE.

Best Regards,
Bill Campbell

 Applied, thanks.

 Best regards,

 Wolfgang Denk



Re: [U-Boot] [PATCH] mpc83xx: Add -fpic relocation support

2010-10-13 Thread J. William Campbell

  On 10/12/2010 11:30 PM, Albert ARIBAUD wrote:
 Le 12/10/2010 23:00, Joakim Tjernlund a écrit :

 Yes, but the difference isn't really the arch. It is the -mrelocatable
 flag that is the big difference.
 Not only: obviously, implementing GOT relocation is not done the same on
 both archs, and it simply is not beneficial on ARM wrt PPC in terms of
 instructions. I did a pretty extensive run of tests with and without
 -fPIC and -fPIE on ARM, and GOT relocation clearly makes code bigger,
 whereas it does not on PPC.

 This simply implies that -fPIC is a better choice for PPC (and hence
 -mrelocatable) while -fpie is a better one for ARM.
Hi All,
  In particular, the PPC takes two 32 bit instructions to load the 
known address of a variable into a register. If the GOT is used, a 
single 32 bit instruction can load the address of a variable from the 
GOT table (pointed to by a fixed register) into a register. In both 
cases, there are two memory cycles, but in the GOT case, only one 
instruction is required. This is why the GOT based code is smaller. 
However, the GOT cannot be used to address constants and some other 
items that are not variables. I do think that -fPIC and -fpie are not 
mutually incompatible. On the PPC, the GOT references would be relocated 
in the loop that updates the GOT and the references to constants would 
be relocated by the ELF relocation code. That is how shared libraries 
are relocated.
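
The "loop that updates the GOT" could look roughly like the sketch
below. This is an illustration, not the actual PPC start-up code: the
function name relocate_got, the way the table is passed in, and the
choice to leave zero entries untouched are all assumptions made for the
example.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the relocation offset to every address held in the GOT.
 * Assumption: a zero entry means "no address" and is left alone. */
static void relocate_got(uintptr_t *got, size_t entries, uintptr_t offset)
{
    for (size_t i = 0; i < entries; i++) {
        if (got[i] != 0)
            got[i] += offset;
    }
}
```

Anything not reached through the GOT (pointers inside initialized data,
for example) is untouched by such a loop, which is why the ELF
relocation entries are still needed for those references.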

Best Regards,
Bill Campbell

 Amicalement,


Re: [U-Boot] [PATCH] futile c relocation attempt

2010-10-06 Thread J. William Campbell

  On 10/6/2010 2:43 AM, Graeme Russ wrote:
 On 06/10/10 01:48, Reinhard Meyer wrote:
 ---
   arch/arm/cpu/arm926ejs/start.S |8 -
   arch/arm/lib/board.c   |   57 
 +++-
   include/configs/top9000_9xe.h  |1 +
   3 files changed, 63 insertions(+), 3 deletions(-)

 I had a quick look at this and nothing is jumping out at me. Of course I am
 not familiar with ARM asm...

 I don't see any reason why this ultimately will not work eventually. You
 may be having some issues with the transition from asm-C-asm through the
 relocation - This was an especially painful thing for me involving an
 intermediate trampoline which I have only recently figured out how to remove.

 Maybe some memory barriers are needed to stop the C optimiser mangling things?

 I am sure what you have is very close to the real solution :)

 I do think the main relocation fixup loop can be moved into a common
 library in which case we can add additional case statements. The nice thing
 is that x86 is all Type 8 which is specifically allocated to x86 so my if
Hi All,
   I think that type 8 IS NOT allocated to the 386. For instance, 
R_PPC_ADDR14_BRTAKEN also has a value of 8. So does R_ARM_ABS8. I think 
that there will be a lot of #ifdefs just to keep the references 
symbolic. Many of the different platform  relocation types will come out 
to be the same code in the switch statement, but different symbolic 
names. There will also be some entries that are processor specific and 
have no equivalent on other processors. I think it would be a good idea 
to use the symbolic values of the Relocation types (as opposed to 
integer constants), as it will make the code clearer. There are sort of 
two ways to organize the code inside the switch statement. Since the 
code inside the switch statement is very short, it might be best for 
each architecture (ELF format) to be  bunched together, even at the 
expense of repeating the same executable statements that some other 
formats may use, as follows:
#ifdef PPC
case R_PPC_ADDR32:      /* S + A */
    /* code to do the deed */
    break;

case R_PPC_RELATIVE:    /* B + A */
    /* code to do the deed */
    break;
#endif
#ifdef I386
case R_386_32:          /* S + A */
    /* I think this is the other relocation type Graeme used,
       but I could be wrong */
    /* code to do the deed */
    break;

case R_386_RELATIVE:    /* B + A */
    /* code to do the deed */
    break;
#endif
#ifdef ARM
case R_ARM_ABS32:       /* S + A */
    /* I don't remember the ARM relocation types used */
    /* code to do the deed */
    break;

case R_ARM_REL32:       /* B + A - P */
    /* code to do the deed */
    break;
#endif

Or we could group the various Relocation types by what they actually do:
#ifdef PPC
case R_PPC_ADDR32:      /* S + A */
#endif
#ifdef I386
case R_386_32:          /* S + A */
    /* I think this is the other relocation type Graeme used,
       but I could be wrong */
#endif
#ifdef ARM
case R_ARM_ABS32:       /* S + A */
    /* I don't remember the ARM relocation types used */
#endif
    /* code to do the deed */
    break;

#ifdef PPC
case R_PPC_RELATIVE:    /* B + A */
#endif
#ifdef I386
case R_386_RELATIVE:    /* B + A */
#endif
    /* code to do the deed */
    break;

#ifdef ARM
case R_ARM_REL32:       /* B + A - P */
    /* code to do the deed */
    break;
#endif

Note that the ARM_REL32 is defined differently than the PPC/I386 
relative, FWIW. I also don't know what to use for the names of the 
binary formats. It would be nice if we could use something already in 
the header files? Thoughts on all this solicited!
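
To make the shape of such a common fixup loop concrete, here is a
minimal sketch for the i386 case only. It is an illustration under
assumptions, not Graeme's code: struct rel32, REL32_TYPE, the SK_*
names and apply_relocs are hypothetical, the image is treated as a byte
buffer linked at link_base, and only the "B + A" (relative) type is
handled. Symbol-based types such as "S + A" need a symbol table lookup
and are merely counted as unhandled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the ELF32 "Rel" entry layout, so the sketch does not
 * depend on a system <elf.h>. */
struct rel32 {
    uint32_t r_offset;          /* link-time address of the word to patch */
    uint32_t r_info;            /* symbol index and relocation type */
};

#define REL32_TYPE(info)   ((info) & 0xffu)

/* i386 relocation type numbers; other architectures use other values. */
#define SK_R_386_32        1u   /* S + A */
#define SK_R_386_RELATIVE  8u   /* B + A */

/* Apply 'count' relocation entries to an image copied into 'image'.
 * Returns the number of entries whose type was not handled. */
static size_t apply_relocs(uint8_t *image, uint32_t link_base,
                           const struct rel32 *rel, size_t count,
                           uint32_t reloc_off)
{
    size_t unhandled = 0;

    for (size_t i = 0; i < count; i++) {
        uint8_t *where = image + (rel[i].r_offset - link_base);
        uint32_t word;

        switch (REL32_TYPE(rel[i].r_info)) {
        case SK_R_386_RELATIVE: /* B + A, addend A is stored in place */
            memcpy(&word, where, sizeof(word));
            word += reloc_off;
            memcpy(where, &word, sizeof(word));
            break;
        default:                /* e.g. SK_R_386_32 needs a symbol lookup */
            unhandled++;
            break;
        }
    }
    return unhandled;
}
```

Additional architectures would add their own case labels, grouped in
either of the two ways shown above.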

Best Regards,
Bill Campbell
 TEXT_BASE checks can be kept. For size freaks, we could litter the code
 with #ifdefs to remove un-needed cases ;)

 Interestingly, ARM is adding gd->reloc_off while x86 is subtracting
 gd->reloc_off. If this is correct, I need to change the calculation of
 gd->reloc_off to be consistent
Adding is the way the specifications define it. add B+A.

Best Regards,
Bill Campbell
 Regards,

 Graeme

Re: [U-Boot] ARM relocation, question to Heiko

  On 10/4/2010 3:13 AM, Wolfgang Denk wrote:
 Dear Albert ARIBAUD,

 In message4ca999ee.5030...@free.fr  you wrote:
 Note however that linking for base address 0 is not mandatory for
 achieving true position independence. What is required is that the code
 which runs from power-up until relocation be able to run anywhere, i.e.,
 this code should not require any relocation fixup. That can be achieved
 on ARM by using only relative branches and accessing data only relative
 to pc (e.g. literals) or truly absolute (e.g. HW registers etc).
 That means you need to build all of U-Boot that way, because
 significant parts of the code already run before relocation
 (including all clocks and timers setup, console setup, printf and all
 routines these pull in).

Yes, I think Wolfgang is correct. This is not going to be easy to do in 
general.  To run anywhere, the code must be true Position Independent 
code. If you intend to use any C code in the initialization, this will 
result in needing -fPIC for at least that code. I am not sure you can 
mix -fPIC and non -fPIC code in the same link, but I expect not. I am a 
bit surprised that it is possible to get even the initialization code to 
be Position Independent, but it appears that on at least some PPC it is 
possible/has been done.
 On a related topic, I did find some information on the 
-mrelocatable history. Take a look at
http://www.mail-archive.com/g...@gcc.gnu.org/msg02528.html.
If you read both thread entries, it explains -mrelocatable as more or 
less the post-processor that re-formats the ELF relocation information 
into a smaller format and puts it in the text as another segment. What 
Albert is doing now, and Graeme did before,  is the first option, 
creating a loader that understands ELF. This has the advantage that it 
will work on all architectures. However, once this understanding is in 
place, it would be easy to write a small post-processing program that 
would reduce the size of the relocation entries, much like -mrelocatable 
does. This may or may not be necessary, but it is certainly possible.

Best Regards,
Bill Campbell
 Best regards,

 Wolfgang Denk



Re: [U-Boot] ARM relocation, question to Heiko


 On 10/4/2010 10:06 AM, Wolfgang Denk wrote:

Dear J. William Campbell,

In message4ca9f294.8080...@comcast.net  you wrote:

Yes, I think Wolfgang is correct. This is not going to be easy to do in
general.  To run anywhere, the code must be true Position Independent
code. If you intend to use any C code in the initialization, this will
result in needing -fPIC for at least that code. I am not sure you can
mix -fPIC and non -fPIC code in the same link, but I expect not. I am a
bit surprised that it is possible to get even the initialization code to
be Position Independent, but it appears that on at least some PPC it is
possible/has been done.

Not really. On PowerPC, only the first 20 or 30 lines of assembler
code in start.S are position independent; then we compute the link
(resp. execution) address and branch to it. From then, we run from the
very address range we were linked for (starting at TEXT_BASE).

Hi Wolfgang,
 You are of course correct. I was referring more to Jocke's 
(joakim.tjernl...@transmode.se) statements regarding:


Yes, that is there today. I am talking about linking to any TEXT_BASE(say 0)
but burn and run into another address. I impl. this quite some time
ago for PPC(search for LINK_OFF)

I understand from his comment that he had achieved total PIC for the
initialization, that would run at any location regardless of TEXT_BASE.
I think this code was not accepted into mainline, so it is not a problem
at present. However, any relocation code added would have to be modified
by Jocke if he wished to preserve that capability. I am amazed that he
was able to get the rest of u-boot to work under the constraints you
pointed out. It must have been quite tedious.

  I also wish to support Graeme's desire that the added relocation code,
at the end of the day, be written in C. The routine to do the relocation
does not require .bss and is not real long. The obvious advantage of this
approach is that all architectures can use it. The ELF relocation codes
will have to be changed to the architecture equivalents, and in some
cases architecture-specific relocation code processing added, but the
theory will always be the same. This approach will make using relocation
much easier/trivial for new architecture ports, thereby reducing
resistance to doing it!

Best Regards,
Bill Campbell



Albert is doing now, and Graeme did before,  is the first option,
creating a loader that understands ELF. This has the advantage that it
will work on all architectures. However, once this understanding is in
place, it would be easy to write a small post-processing program that
would reduce the size of the relocation entries, much like -mrelocatable
does. This may or may not be necessary, but it is certainly possible.

Eventually we might even add -mrelocatable support for the other
architectures to the tool chain.

Best regards,

Wolfgang Denk




Re: [U-Boot] [RFC] [PATCH] arm: arm926ejs: use ELF relocations

  On 10/4/2010 5:16 PM, Albert ARIBAUD wrote:
 Le 05/10/2010 01:21, Graeme Russ a écrit :
 On Tue, Oct 5, 2010 at 9:57 AM, Albert ARIBAUDalbert.arib...@free.fr   
 wrote:
 Le 05/10/2010 00:22, Graeme Russ a écrit :
 On Tue, Oct 5, 2010 at 9:01 AM, Albert Aribaudalbert.arib...@free.fr
wrote:
 The output from MAKEALL is curiously calculated... If I look at objdumps of
 the GOT and ELF binaries, I find that:

 - the GOT .text section is 118960 bytes and the ELF .text section only
 108112. This is due to the fact that GOT relocation requires additional
 instruction for GOT indirection whereas ELF relocations work by patching the
 code.
 It would be interesting to compare against the baseline non-relocatable
 version
 I #defined CONFIG_RELOC_FIXUP_WORKS and removed -pie from the ARM
 config.mk. This puts the edminiv2 code in the non-reloc build case, and
 produces identical .text and .data, and almost identical .rodata, as the
 ELF case.

 - the .rodata section is 22416 for GOT, 22698 for ELF, whereas the .data
 section is 2908 for GOT, 2627 for ELF. Some initialized data apparently
 moved from non-const to const for some reason, but basically, initialized
 data remains constant.

 - the .bss section remains constant too, 16640 for GOT vs. 16636 for ELF.
 I'm not going to track what causes the 4 byte difference. :)

 Many sections are output in the ELF file which do not appear in the GOT
 file, such as .interp, .dynamic, .dynstr etc. They probably pollute
 MAKEALL's figures.
 I now discard a few sections:

  /DISCARD/ : { *(.dynstr*) }
  /DISCARD/ : { *(.dynamic*) }
  /DISCARD/ : { *(.plt*) }
  /DISCARD/ : { *(.interp*) }
  /DISCARD/ : { *(.gnu*) }

 Not that it makes a huge difference - most of these are trivially small
 Thanks. I'll add this to the .lds as a measure of clarity.

 That's roughly consistent with the numbers I get: about 19 KB of .rel.dyn
 plus .dynsym, which we will be able to cut by half if we preprocess it.
 Which is not copied to RAM, so not as nasty as the .got related increase
 True also. Note that we could probably shrink the table to 1/4 of its
 current size by taking advantage from the fact that the few
 non-program-base-relative relocations it has can easily be converted to
 program-base-relative, and that two consecutive relocations are always
 less than 64 KB away from each other. Of course that moves away from
 using the ELF structures as-is, and requires additional build steps, but
 people with small FLASH devices may want it.
Hi All,
  This may be pushing it in the more general case. ARM has only a 
few relocation types. Other CPU types have more types, and therefore 
still may need a type field. You can certainly get 1/2 in all cases, and 
more if you are willing to get a bit more complex in the preprocessing. 
That said, I think this is best left to later when all CPUs are in the 
relocatable state.
 I'm also looking at moving the low-level initialisation and relocation code
 into a separate section (outside .text) so I have even less to relocate to RAM
 As Wolfgang pointed out, there might be issues in that all the code that
 runs in FLASH should be truly PI, which might not be a piece of cake.
 ARM C code, for instance, tends to generate literals which need to be
 relocated if you don't run the code where it was linked for.
True, but the code WILL be running at the address it was linked for. It
just won't be copied and relocated to the new address, as it would
never be run again anyway. This goal is along the lines of the two-stage
u-boot that has been/is being considered, where all execute-only-once
code can be concentrated into a segment that is not moved into ram.

Bill Campbell
 Then I could even compress the relocatable section, but that is just being
 silly ;)
 :)

 Regards,

 Graeme
 Amicalement,


Re: [U-Boot] [RFC] [PATCH] arm: arm926ejs: use ELF relocations

  On 10/4/2010 10:30 PM, Wolfgang Denk wrote:
 Dear Albert ARIBAUD,

 In message4caa50aa.3000...@free.fr  you wrote:
 Remember: this patch only applies to boards which boot from NOR FLASH!
 You can test it on other types of boards (NAND-based, etc) for
 regression testing, but nothing more.
 Assuming the NAND loader does not load U-Boot to its final location
 at the upper end of RAM, but - say - somewhere in lower memory, the
 standard relocation process will be running, so I think there should
 be no real difference between (such) NAND booting systems and NOR
 booting ones - or am I missing something?
FWIW I think you are right. If u-boot is linked for the address where 
the NAND loader put it, everything should work fine. It can
size memory, move a copy of u-boot to the top of memory, and branch to 
the entry point that continues initialization.

Bill Campbell
 Best regards,

 Wolfgang Denk



Re: [U-Boot] ARM relocation, question to Heiko

2010-10-03 Thread J. William Campbell

  On 10/3/2010 1:58 AM, Albert ARIBAUD wrote:
 Le 03/10/2010 10:44, Graeme Russ a écrit :

 Bill just said that -pic (or, for ARM, -fPIC or -fPIE) was unnecessary
 for relocation. You seem to imply it actually is... In my experience,
 -fPIC and -fPIE do increase code by adding GOT relocation to symbols that
 need fixing, so they would indeed be redundant to any other relocation
 mechanism -- I just did some test with basic code and this seems to
 confirm, no -fPIx is needed to get relocation the way you do on ARM.

 Just to clarify -fpic is a compiler option, -pic is a linker option. x86
 has no compile time relocation options (therefore no referencing .got etc).
 Using the link time pic option produces the relocation data table
 (.rel.dyn) which must be pre-processed before execution can begin at the
 relocated address

 Thanks for clarifying, Graeme.

 This is consistent with the ARM compile-time options -fPIC/-fPIE vs 
 link-time option -pie. So there may be at least an interest in 
 investigating ELF-style relocation on ARM and comparing it to 
 GOT-based relocation in terms of FLASH and RAM sizes and code speed.

Hi All,
   It is for sure that -fPIC/-fPIE programs will contain more 
executable instructions than programs compiled without these options.
The program will also contain more data space for the got. If -fPIC 
actually produced a fully position-independent executable, the extra 
overhead would perhaps be tolerable. However, since it does not do this, 
(problems with initialized data etc.) there is really no advantage in 
using these compile-time options. The executable code and required data 
space for the program without these switches will always be smaller 
and faster than with them. In order to fix the remaining issues even 
when using -fPIC, a relocation loop must exist in the u-boot code, 
either one global one or a bunch of user written specific ones. Also, 
the -pie switch will be needed anyway at link time to build the 
relocation table for the remaining relocation requirements.
   Programs compiled without -fPIC will have a larger .rel.dyn table 
than those compiled with -fPIC. However, the table entries in the 
relocation table occupy about the same storage as the code generated by 
the compiler to relocate a reference to the symbol at run time.  So this 
is probably almost a wash. Also, the dynamic relocation data need not 
be copied into the run-time object, as it is no longer needed. So the 
likely outcome is that the flash image is about the same size/slightly 
larger than the one compiled by -fPIC, and that the ram footprint after 
relocation is slightly smaller.
   If one is REALLY pressed for space, the size of the dynamic 
relocation area can be reduced by a post-processor program that would 
re-format the relocation entries. This re-formatting is possible because 
1) ELF is a very general format and we only need a small subset of it, 
and 2) u-boot code will never occupy say 16 MB of space, so each 
relocation can probably be compressed into a 32 bit word. I doubt anyone 
is that desperate, but it IS possible.
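
One possible packed format, purely as an illustration of the
post-processing idea: if the image is smaller than 16 MB, an offset from
the image base fits in 24 bits, leaving 8 bits for the relocation type,
so each 8 byte ELF32 Rel entry shrinks to one 32 bit word. The layout
and the names pack_reloc, packed_offset and packed_type below are
hypothetical, not an existing U-Boot or binutils format.

```c
#include <stdint.h>

#define PACKED_OFFSET_BITS 24u
#define PACKED_OFFSET_MASK ((1u << PACKED_OFFSET_BITS) - 1u)

/* Pack an image-relative offset and a relocation type into one word:
 * bits 31..24 hold the type, bits 23..0 the offset.
 * Returns 0 on success, -1 if either field does not fit. */
static int pack_reloc(uint32_t offset, uint32_t type, uint32_t *out)
{
    if (offset > PACKED_OFFSET_MASK || type > 0xffu)
        return -1;
    *out = (type << PACKED_OFFSET_BITS) | offset;
    return 0;
}

static uint32_t packed_offset(uint32_t packed)
{
    return packed & PACKED_OFFSET_MASK;
}

static uint32_t packed_type(uint32_t packed)
{
    return packed >> PACKED_OFFSET_BITS;
}
```

The run-time relocation loop would then walk an array of such words
instead of the full .rel.dyn entries. Relocation types that need a
symbol index would not fit this simple layout.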
   It will be interesting to see what the results of this comparison 
are. For me, not needing any user awareness of relocation is worth a 
lot, and the fact that the difference/overhead of relocation will all 
be in exactly one place is very appealing.

Best Regards,
Bill Campbell
 Cheers,

 Graeme

 Amicalement,


Re: [U-Boot] ARM relocation, question to Heiko

2010-10-03 Thread J. William Campbell

  On 10/3/2010 11:29 AM, Wolfgang Denk wrote:
 Dear Reinhard Meyer,

 In message4ca79896.2010...@emk-elektronik.de  you wrote:
 I agree here. _If_ relocation, it should work without hand-adding
 fixup stuff to all functions using initialized data with pointers.
 Even Wolfgang forgot to fixup his 2nd level command table in
 cmd_nvedit.c ;)
 I didn't forget it - at least not in the sense that I think this is
 something that needs to be done.

 This works fine on PPC with relocation, and we should make it work
 the same on other arches.

 And, for space concerns in flash, relocation should always be an
 option on a board by board basis...
 NAK.

 And as an idea, if position independent code is used, only pointers
 in initialized data need adjustment. Cannot the linker emit a table
 of addresses that need fixing?
 It does. That's the GOT.
I think this is actually a misunderstanding. The purpose of the GOT, at 
least from GCC's point of view, is to hold the absolute addresses of 
private data referenced by shared library code. That is what it was 
invented to do. This is similar to, but not identical with, relocating 
all data references. Initialized data in the library must have a copy 
created (and relocated as necessary if it contains pointers) by the 
runtime linker when the library is initialized in the address space of 
the process using the library. The code in the shared library is -fPIC, 
but it still needs the runtime linker to allocate a copy of the GOT for 
the current user AND to allocate and relocate any data that is required 
for the library that is private to the user. It is that second step 
where we have trouble.

Best Regards,
Bill Campbell
 Best regards,

 Wolfgang Denk



Re: [U-Boot] ARM relocation, question to Heiko

2010-10-02 Thread J. William Campbell


 On 10/2/2010 3:17 AM, Joakim Tjernlund wrote:

Hello Reinhard,

Reinhard Meyer wrote:

Dear Albert ARIBAUD,

I try to understand how the relocation process could handle pointers (to
functions or other data) in const or data sections.
Your code cannot know what is data and what is a pointer that needs
adjustment?

Best Regards,
Reinhard

Hi Reinhart,

Short answer - the relocation process does not handle pointers inside
data structures.

And yes, this means the content of arrays of pointers such as init_sequence
is not relocated. Been there, done that, can give you one of the

The init_sequence should not be called anymore after relocation, as it is
the init_sequence ... or?


tee-shirts I got :)

ATM I have not found a way to fix this, except making the code which
uses the pointers aware that they are location-sensitive and fix them
when using them.

That means that things like this cannot work (with relocation),
unless adding the relocation offset before using the pointer:

Yep, you have to fix these pointers after relocation ...


const struct {
const u8 shift;
const u8 idcode;
struct spi_flash *(*probe) (struct spi_slave *spi, u8 *idcode);
} flashes[] = {
#ifdef CONFIG_SPI_FLASH_SPANSION
{ 0, 0x01, spi_flash_probe_spansion, },
#endif

[...]

#ifdef CONFIG_SPI_FRAM_RAMTRON
{ 6, 0xc2, spi_fram_probe_ramtron, },
# ifdef CONFIG_SPI_FRAM_RAMTRON_NON_JEDEC
{ 0, 0xff, spi_fram_probe_ramtron, },
# endif
# undef IDBUF_LEN
# define IDBUF_LEN 9 /* we need to read 6+3 bytes */
#endif
};

And I think there are more places of this type in u-boot...

Yes, maybe. But relocation as I did for arm, also works
on m68k, sparc, mips, avr32 and they must do also this
fixups, so for common functions (except the new env handling,
which I think got never tested on this architectures?) should
work ...

This pointer problem is solved with the fixup relocs on ppc and
should work without manual relocation. I think this is a ppc
only extension but I might be wrong.


Hi All,
  You are correct that this is a ppc only extension. As such, it is 
not a good candidate for general use.



I believe that the other alternative is to do it as x86 does
which I think is the general way which should work on any arch.
Graeme Russ would know better.

Almost exactly a year ago, this was all pretty much presented by Graeme 
in the threads

Relocation size penalty calculation (October 14, 2009)
i386 Relocation (November 24, 2009)

Using the full relocation scheme eliminates the need for all these 
fixups in u-boot C code. I think this is a very desirable result.
It is also not clear to me that hard coding in the relocation as several 
C routines will produce a  u-boot that is smaller than the one 
produced by using normal ELF relocation. However, using full relocation 
creates an environment that is true C and does not rely on people 
remembering that they may have to fix up some parts of their code. It is 
hard to see much downside in using the full relocation capability 
provided by Graeme's code.
FWIW, the relocation code and data does not have to be moved into ram if 
space is at a premium.


Best Regards,
Bill Campbell


  Jocke


Re: [U-Boot] [PATCH] flash.h: pull in common.h for types

2009-11-17 Thread J. William Campbell

Mike Frysinger wrote:
 On Tuesday 17 November 2009 16:56:58 Wolfgang Denk wrote:
   
 Scott Wood wrote:
 
 My question: is there a definitive position  somewhere  (for  example
 for  the  Linux kernel; I'm sure we don't have one for U-Boot [yet]),
 whether system headers should be self-sufficient?
 
 I'd say they should be self-sufficient, in that the inclusion of the
 header itself should not fail if I haven't included some arbitrary other
 header.  I don't see what the argument would be for not doing this.
   
 Well, Theo de Raadt says for example ... people would be able to
 include less files; indeed, almost be careless about what they
 include. But this would not increase portability in any way. And
 'make build' would probably, if it was taken the nth degree, take
 twice as long. Therefore there is no benefit for the crazy rule you
 suggest... - see
 http://www.mail-archive.com/t...@openbsd.org/msg00425.html
 

 i disagree with this using, ironically, the same base logic, but a different 
 conclusion:
 http://sourceware.org/ml/libc-alpha/2006-08/msg00064.html

 also, i think a self contained system like u-boot which has full control at 
 the api level can be better at this than a user interface which really sits
 on top of an abi and has to deal with a lot of crap from user code.

 while i'm not asking for you or anyone else to audit header paths here as i 
 think that level of enforcement will bog things down, small patches from 
 people who choose to fix things should be merged.

   
FWIW, I think one needs to be very careful with this reasoning. It is
clear that experienced and capable programmers disagree on the correct
approach to this problem. It is also true that the logical structure of
the include chain is important. A "crap" interface is going to be hard
to maintain no matter how you do it. The problem is, "crap" has a way of
sneaking into well-designed interfaces in the form of "small patches" to
"fix things". (I am not saying this is one of them.) I have observed
that in any large program, as time goes on, more and more things get
included in more and more places. Hiding this fact by doing the
inclusion in other header files often obscures the drift towards a very
polluted name space, where editing just about any include file requires
the entire system to be rebuilt. Some would say, "so what, it is easy to
do". However, these situations often result in hard-to-find bugs. The
answer to the question "which modules can access this variable" becomes
"all of them" because of (badly designed?) interfaces being included in
other interfaces without the user being explicitly aware of this. Often I
see in patch critiques the statement "This belongs in a header file",
which may be true, but can lead to a bunch of unrelated things being
stuck together in a header file just to meet the requirement. Those bad
choices then get included in other header files and we are off to the
races. Since the users are not explicitly aware they are pulling in a
bunch of stuff that has nothing to do with the main purpose of the
module being written, there is often no incentive to clean up bad header
designs until things get so bad it is almost impossible to untangle
them. So when adding an include to an include file, think hard about
whether you are bringing along a lot of things that are not required and
that one would not expect to be exposed in all files using the modified
include file. If you are bringing along a lot of baggage, perhaps the
interface you are including could use some re-work.
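
As a rough illustration of the point about baggage (not taken from the U-Boot tree; the names flash_info.h, flash_info.c, struct flash_bank and flash_bank_size are all hypothetical), a header can be self-sufficient and still expose very little:

```c
/* ---- hypothetical flash_info.h ---- */
#ifndef FLASH_INFO_H
#define FLASH_INFO_H

#include <stdint.h>	/* the only header this interface needs: uint32_t */

/*
 * Forward declaration only. Users of this header can pass pointers
 * around, but the layout (and whatever headers the layout would need)
 * stays private to the implementation file.
 */
struct flash_bank;

uint32_t flash_bank_size(const struct flash_bank *bank);

#endif /* FLASH_INFO_H */

/* ---- hypothetical flash_info.c ---- */
#include <assert.h>

struct flash_bank {
	uint32_t base;	/* start address of the bank */
	uint32_t size;	/* size of the bank in bytes */
};

uint32_t flash_bank_size(const struct flash_bank *bank)
{
	assert(bank != 0);
	return bank->size;
}
```

The header compiles on its own, yet anything that includes it gains only one type name and one function, so the include chain does not grow silently.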

Best Regards,
Bill Campbell
 I don't know whether Linux has a specific policy on this, but I haven't
 noticed many problems in this regard, and when I did find one in the
 kernel a few years back I didn't get any argument when I submitted a
 patch to fix it.
   

 i've semi-frequently posted fixes to linux headers so that you can include just 
 that header and have it work.  i have yet to hear anyone complain; rather 
 every one has been merged (ignoring issues unrelated to the original purpose).

   
 Which man pages are you looking at?
   
 Well, for example:

 open(2):
  SYNOPSIS
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>

 mknod(2):
  SYNOPSIS
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <fcntl.h>
 #include <unistd.h>

 stat(2):
  SYNOPSIS
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <unistd.h>

 Why do we need these lists of #includes? Why does - for example -
 sys/stat.h not auto-include anything it might need?

 To me this seems to be an indication that there is no intention to
 make headers self-sufficient, but I am absolutely not sure.
 

 i'm pretty sure your man page example is an unrelated issue.  the include
 list does not imply that any one of those headers cannot be included by
 itself first.  if you read the full text

Re: [U-Boot] Relocation size penalty calculation

2009-10-17 Thread J. William Campbell

Graeme Russ wrote:
 On Thu, Oct 15, 2009 at 3:45 AM, J. William Campbell
 jwilliamcampb...@comcast.net wrote:
   
 Joakim Tjernlund wrote:
 
   
megasnip

 Apologies if this is getting way off-topic for a simple boot loader, but
 this is information I have gathered from far and wide over the net. I am
 surprised that there isn't a web site out there on 'How to create a
 relocatable boot loader'...

 OK, its all starting to come together now - It helps when you look at the
 right files ;)

 Firstly, u-boot.map

 0x380589a0__rel_dyn_start = .

 .rel.dyn0x380589a0 0x42b0
  *(.rel.dyn)
  .rel.got   0x0x0 cpu/i386/start.o
  .rel.plt   0x0x0 cpu/i386/start.o
  .rel.text  0x380589a0 0x2e28 cpu/i386/start.o
  .rel.start16   0x3805b7c8   0x10 cpu/i386/start.o
  .rel.data  0x3805b7d8  0xc18 cpu/i386/start.o
  .rel.rodata0x3805c3f0  0x360 cpu/i386/start.o
  .rel.u_boot_cmd
 0x3805c750  0x500 cpu/i386/start.o
 0x3805cc50__rel_dyn_end = .


 And the output of readelf...

 Section Headers:
   [Nr] Name             Type      Addr      Off     Size    ES  Flg  Lk  Inf  Al
   [ 0]                  NULL                00      00      00        0   0    0
   [ 1] .text            PROGBITS  3804      001000  0118a4  00  AX    0   0    4
   [ 2] .rel.text        REL                 066c68  005d00  08       40   1    4
   [ 3] .rodata          PROGBITS  380518a4  0128a4  005da5  00  A     0   0    4
   [ 4] .rel.rodata      REL                 06c968  000360  08       40   3    4
   [ 5] .interp          PROGBITS  38057649  018649  13      00  A     0   0    1
   [ 6] .dynstr          STRTAB    3805765c  01865c  0001ee  00  A     0   0    1
   [ 7] .hash            HASH      3805784c  01884c  cc      04  A    11   0    4
   [ 8] .data            PROGBITS  38057918  018918  000a3c  00  WA    0   0    4
   [ 9] .rel.data        REL                 06ccc8  000c18  08       40   8    4
   [10] .got.plt         PROGBITS  38058354  019354  0c      04  WA    0   0    4
   [11] .dynsym          DYNSYM    38058360  019360  000200  10  A     6   1    4
   [12] .dynamic         DYNAMIC   38058560  019560  80      08  WA    6   0    4
   [13] .u_boot_cmd      PROGBITS  380585e0  0195e0  0003c0  00  WA    0   0    4
   [14] .rel.u_boot_cmd  REL                 06d8e0  000500  08       40  13    4
   [15] .bss             NOBITS    3805cc50  01ec50  001a34  00  WA    0   0    4
   [16] .bios            PROGBITS            01e000  00053e  00  AX    0   0    1
   [17] .rel.bios        REL                 06dde0  c0      08       40  16    4
   [18] .rel.dyn         REL       380589a0  0199a0  0042b0  08  A    11   0    4
   [19] .start16         PROGBITS  f800      01e800  000110  00  AX    0   0    1
   [20] .rel.start16     REL                 06dea0  38      08       40  19    4
   [21] .resetvec        PROGBITS  fff0      01eff0  10      00  AX    0   0    1
   [22] .rel.resetvec    REL                 06ded8  08      08       40  21    4

 ...

 Relocation section '.rel.text' at offset 0x66c68 contains 2976 entries:
  Offset InfoTypeSym.Value  Sym. Name
 38040010  0101 R_386_32  3804   .text
 3804001e  0101 R_386_32  3804   .text
 38040028  0101 R_386_32  3804   .text
 3804003f  0101 R_386_32  3804   .text
 38040051  0101 R_386_32  3804   .text
 38040075  0101 R_386_32  3804   .text
 38040085  0101 R_386_32  3804   .text
 3804009d  0003e602 R_386_PC32380403fa   load_uboot
 380400a6  0101 R_386_32  3804   .text
 38040015  00029f02 R_386_PC323804bdd8   early_board_init
 38040023  0003f702 R_386_PC323804bdda   show_boot_progress_asm

 ...

 Relocation section '.rel.rodata' at offset 0x6c968 contains 108 entries:
  Offset InfoTypeSym.Value  Sym. Name
 38051908  0201 R_386_32  380518a4   .rodata
 38051938  0201 R_386_32  380518a4   .rodata
 38051968  0201 R_386_32  380518a4   .rodata
 38051998  0201 R_386_32  380518a4   .rodata
 380519c8  0201 R_386_32  380518a4   .rodata
 380519f8  0201 R_386_32  380518a4   .rodata

 ...

 Relocation section '.rel.dyn' at offset 0x199a0 contains 2134 entries:
  Offset InfoTypeSym.Value  Sym. Name
 f838  0008 R_386_RELATIVE
 f846  0008 R_386_RELATIVE
 38040010  0008 R_386_RELATIVE
 3804001e  0008 R_386_RELATIVE
 38040028  0008 R_386_RELATIVE
 3804003f  0008 R_386_RELATIVE
 38040051  0008 R_386_RELATIVE
 38040075  0008 R_386_RELATIVE
 38040085  0008 R_386_RELATIVE

 Notice that, apart from .rel.dyn, none of the .rel.* sections have the
 A (Allocated) flag set - They do not end

Re: [U-Boot] Relocation size penalty calculation

2009-10-14 Thread J. William Campbell

Joakim Tjernlund wrote:
 J. William Campbell jwilliamcampb...@comcast.net wrote on 14/10/2009 
 01:48:52:
   
 Joakim Tjernlund wrote:
 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 22:06:56:


   
 On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:

 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 13:21:05:

   
 On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:

 
 Graeme Russ graeme.r...@gmail.com wrote on 11/10/2009 12:47:19:

   
 [Massive Snip :)]


 
 So, all that is left are .dynsym and .dynamic ...
   .dynsym
 - Contains 70 entries (16 bytes each, 1120 bytes)
 - 44 entries mimic those entries in .got which are not relocated
 - 21 entries are the remaining symbols exported from the linker
   script
 - 4 entries are labels defined in inline asm and used in C

 
 Try adding proper asm declarations. Look at what gcc
 generates for a function/variable and mimic these.

   
 Thanks - Now .dynsym contains only exports from the linker script

 
 :)

   
 - 1 entry is a NULL entry

   .dynamic
 - 88 bytes
 - Array of Elf32_Dyn
 - typedef struct {
   Elf32_Sword d_tag;
   union {
   Elf32_Word  d_val;
   Elf32_Addr  d_ptr;
   } d_un;
   } Elf32_Dyn;
 - 0x11 entries
   [00] 0x0010, 0x DT_SYMBOLIC, (ignored)
   [01] 0x0004, 0x38059994 DT_HASH, points to .hash
   [02] 0x0005, 0x380595AB DT_STRTAB, points to .dynstr
   [03] 0x0006, 0x3805BDCC DT_SYMTAB, points to .dynsym
   [04] 0x000A, 0x03E6 DT_STRSZ, size of .dynstr
   [05] 0x000B, 0x0010 DT_SYMENT, ???
   [06] 0x0015, 0x DT_DEBUG, ???
   [07] 0x0011, 0x3805A8F4 DT_REL, points to .rel.text
   [08] 0x0012, 0x14D8 DT_RELSZ, ???

 
 How big DT_REL is

   
   [09] 0x0013, 0x0008 DT_RELENT, ???

 
 hmm, cannot remember :)

   
 How big an entry in DT_REL is

 
 Right, how could I forget :)

   
   [0a] 0x0016, 0x DT_TEXTREL, ???

 
 Oops, you got text relocations. This is generally a bad thing.
 TEXTREL is commonly caused by asm code that isn't truly pic so it needs
 to modify the .text segment to adjust for relocation.
 You should get rid of this one. Look for DT_TEXTREL in .o files to find
 the culprit.


   
 Alas I cannot - The relocations are a result of loading a register with a
 return address when calling show_boot_progress in the very early stages of
 initialisation prior to the stack becoming available. The x86 does not
 allow direct access to the IP so the only way to find the 'current
 execution address' is to 'call' to the next instruction and pop the return
 address off the stack

 
 hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
 Ahh, the 'call' is absolute, not relative? I guess there is some way around it
 but it is not important ATM I guess.

 Evil idea, skip -fpic et al. and add the full reloc procedure
 to relocate by rewriting directly in TEXT segment. Then you save space
 but you need more relocation code. Something like dl_do_reloc from
 uClibc. Wonder how much extra code that would be? Not too much I think.


   
 With the following flags

 PLATFORM_RELFLAGS += -fvisibility=hidden
 PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
 PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions

 I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
 this might mean I need the symbol table in the binary in order to resolve
 them

 

 BTW, how many relocs do you get compared with -fPIC? I suspect you get more
 now but hopefully not that many more.

   
 Possibly, but I think you only need to add an offset to all those
 relocs.

   
 Almost right. The relocations specify a symbol value that needs to be
 added to the data in memory to relocate the reference. The symbol values
 involved should be the start of the text section for program references,
 the start of the uninitialized data section for bss references, and the
 start of the data section for initialized data and constants. So there
 are about four symbols whose value you need to keep. Take a look at
 http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
 already looked at) and it tells you what to do with R_386_PC32 and
 R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
 will remove all the symbols you don't actually need, but I don't know
 that for sure. Note also that you can change the section flags of a
 section marked noload  to load.
 

 Still think you can get away with just ADDING an offset. The image is linked
 to a specific address

Re: [U-Boot] Relocation size penalty calculation

2009-10-14 Thread J. William Campbell

Joakim Tjernlund wrote:
 Graeme Russ graeme.r...@gmail.com wrote on 14/10/2009 13:48:27:
   
 On Wed, Oct 14, 2009 at 6:25 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:
 
 J. William Campbell jwilliamcampb...@comcast.net wrote on 14/10/2009 
 01:48:52:
   
 Joakim Tjernlund wrote:
 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 22:06:56:


   
 On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:

 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 13:21:05:

   
 On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:

 
 Graeme Russ graeme.r...@gmail.com wrote on 11/10/2009 12:47:19:

   
 [Massive Snip :)]


 
 So, all that is left are .dynsym and .dynamic ...
   .dynsym
 - Contains 70 entries (16 bytes each, 1120 bytes)
 - 44 entries mimic those entries in .got which are not relocated
 - 21 entries are the remaining symbols exported from the linker
   script
 - 4 entries are labels defined in inline asm and used in C

 
 Try adding proper asm declarations. Look at what gcc
 generates for a function/variable and mimic these.

   
 Thanks - Now .dynsym contains only exports from the linker script

 
 :)

   
 - 1 entry is a NULL entry

   .dynamic
 - 88 bytes
 - Array of Elf32_Dyn
 - typedef struct {
   Elf32_Sword d_tag;
   union {
   Elf32_Word  d_val;
   Elf32_Addr  d_ptr;
   } d_un;
   } Elf32_Dyn;
 - 0x11 entries
   [00] 0x0010, 0x DT_SYMBOLIC, (ignored)
   [01] 0x0004, 0x38059994 DT_HASH, points to .hash
   [02] 0x0005, 0x380595AB DT_STRTAB, points to .dynstr
   [03] 0x0006, 0x3805BDCC DT_SYMTAB, points to .dynsym
   [04] 0x000A, 0x03E6 DT_STRSZ, size of .dynstr
   [05] 0x000B, 0x0010 DT_SYMENT, ???
   [06] 0x0015, 0x DT_DEBUG, ???
   [07] 0x0011, 0x3805A8F4 DT_REL, points to .rel.text
   [08] 0x0012, 0x14D8 DT_RELSZ, ???

 
 How big DT_REL is

   
   [09] 0x0013, 0x0008 DT_RELENT, ???

 
 hmm, cannot remember :)

   
 How big an entry in DT_REL is

 
 Right, how could I forget :)

   
   [0a] 0x0016, 0x DT_TEXTREL, ???

 
 Oops, you got text relocations. This is generally a bad thing.
 TEXTREL is commonly caused by asm code that isn't truly pic so it needs
 to modify the .text segment to adjust for relocation.
 You should get rid of this one. Look for DT_TEXTREL in .o files to find
 the culprit.


   
 Alas I cannot - The relocations are a result of loading a register with a
 return address when calling show_boot_progress in the very early stages of
 initialisation prior to the stack becoming available. The x86 does not
 allow direct access to the IP so the only way to find the 'current
 execution address' is to 'call' to the next instruction and pop the return
 address off the stack

 
 hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
 Ahh, the 'call' is absolute, not relative? I guess there is some way around it
 but it is not important ATM I guess.

 Evil idea, skip -fpic et al. and add the full reloc procedure
 to relocate by rewriting directly in TEXT segment. Then you save space
 but you need more relocation code. Something like dl_do_reloc from
 uClibc. Wonder how much extra code that would be? Not too much I think.


   
 With the following flags

 PLATFORM_RELFLAGS += -fvisibility=hidden
 PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
 PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions

 I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
 this might mean I need the symbol table in the binary in order to resolve
 them

 
 BTW, how many relocs do you get compared with -fPIC? I suspect you get more
 now but hopefully not that many more.

   
 Possibly, but I think you only need to add an offset to all those
 relocs.

   
 Almost right. The relocations specify a symbol value that needs to be
 added to the data in memory to relocate the reference. The symbol values
 involved should be the start of the text section for program references,
 the start of the uninitialized data section for bss references, and the
 start of the data section for initialized data and constants. So there
 are about four symbols whose value you need to keep. Take a look at
 http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
 already looked at) and it tells you what to do with R_386_PC32 and
 R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
 will remove

Re: [U-Boot] Relocation size penalty calculation

2009-10-14 Thread J. William Campbell

Joakim Tjernlund wrote:
 J. William Campbell jwilliamcampb...@comcast.net wrote on 14/10/2009 
 17:35:44:
   
 Joakim Tjernlund wrote:
 
 J. William Campbell jwilliamcampb...@comcast.net wrote on 14/10/2009 
 01:48:52:

   
 Joakim Tjernlund wrote:

 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 22:06:56:



   
 On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:


 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 13:21:05:


   
 On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:


 
 Graeme Russ graeme.r...@gmail.com wrote on 11/10/2009 12:47:19:


   
 [Massive Snip :)]
 

 [Yet another SNIP :)]

   
 Evil idea, skip -fpic et al. and add the full reloc procedure
 to relocate by rewriting directly in TEXT segment. Then you save space
 but you need more relocation code. Something like dl_do_reloc from
 uClibc. Wonder how much extra code that would be? Not too much I think.



   
 With the following flags

 PLATFORM_RELFLAGS += -fvisibility=hidden
 PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
 PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions

 I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
 this might mean I need the symbol table in the binary in order to resolve
 them


 
 BTW, how many relocs do you get compared with -fPIC? I suspect you get more
 now but hopefully not that many more.


   
 Possibly, but I think you only need to add an offset to all those
 relocs.


   
 Almost right. The relocations specify a symbol value that needs to be
 added to the data in memory to relocate the reference. The symbol values
 involved should be the start of the text section for program references,
 the start of the uninitialized data section for bss references, and the
 start of the data section for initialized data and constants. So there
 are about four symbols whose value you need to keep. Take a look at
 http://refspecs.freestandards.org/elf/elf.pdf (which you have probably
 already looked at) and it tells you what to do with R_386_PC32 and
 R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
 will remove all the symbols you don't actually need, but I don't know
 that for sure. Note also that you can change the section flags of a
 section marked noload  to load.

 
 Still think you can get away with just ADDING an offset. The image is
 linked to a specific address and then you move the whole image to a new
 address. Therefore you should be able to read the current address, add
 offset, write back the new address.

 Normally one does what you describe, but here we know that the whole img
 has moved, so we don't have to calculate the new address from scratch.

   
 If the addresses of the bss, text, and data segments change by the same
 value, I think you are correct. However, if the text and data/bss
 segments are moved by different offsets, naturally the relocations would
 be different. One reason to retain this capability would be to allow the
 u-boot copy to execute in place in NOR flash while re-locating the
 read-write storage once memory has been sized. Having different
 relocation factors is not much worse than just one, and it may be just
 as easy to get working initially as a single relocation constant.
 

 How do you figure that? You need to rewrite the insn to access the moved
 data/bss and they are in flash, did I miss something?
   
No, I did. You are quite correct, there would be references in flash 
that couldn't be fixed. Sorry about that.

Best Regards,
Bill Campbell
   
 FWIW, the ultimate solution to minimum relocation size is a
 post-processing step that creates several arrays of relocation offsets
 as two byte quantities. This reduces the cost of each relocation entry
 to just a bit more than two bytes (there is a small overhead for array
 size, MSB values and relocation offset selection.) Naturally, this is
 much less than the ELF version of the same relocations, because we do
 not need to retain as much information and ELF doesn't worry about size
 that much. This may pacify users for whom the flash size of the image
 is critical, at the expense of an extra link step. Naturally, getting
 things to work with standard ELF is the most important step, and
 probably enough for most people.
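
A minimal sketch of what such a compact table could look like, assuming every relocation site is a 32-bit word that simply needs one offset added. The group layout, the terminator convention and the function name apply_compact_relocs are hypothetical, and the "relocation offset selection" field mentioned above is left out to keep the example short:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical table layout, in 16-bit units. One group covers all
 * relocation sites that share the same upper 16 offset bits:
 *   [0]   upper 16 bits of the image-relative site offsets
 *   [1]   number of entries that follow
 *   [2..] lower 16 bits of each site offset
 * A group with a count of 0 ends the table.
 * Returns the number of words patched.
 */
unsigned int apply_compact_relocs(uint8_t *image, uint32_t image_len,
				  const uint16_t *table, uint32_t delta)
{
	unsigned int patched = 0;

	assert(image != NULL && table != NULL);
	while (table[1] != 0) {
		uint32_t msb = (uint32_t)table[0] << 16;
		uint16_t n = table[1];
		uint16_t i;

		table += 2;			/* skip the group header */
		for (i = 0; i < n; i++) {
			uint32_t off = msb | table[i];
			uint32_t word;

			if (image_len < sizeof(word) ||
			    off > image_len - sizeof(word))
				continue;	/* site lies outside the image */
			/* memcpy avoids unaligned 32-bit accesses */
			memcpy(&word, image + off, sizeof(word));
			word += delta;
			memcpy(image + off, &word, sizeof(word));
			patched++;
		}
		table += n;			/* next group header */
	}
	return patched;
}
```

Each site costs two bytes plus a share of the four-byte group header, compared with eight bytes for an ELF32 REL entry.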
 

 That would save 2+4 bytes/reloc on REL arches and
 2+4+4 on RELA(ppc) (provided one can ignore r_addend)

 But yes, this is probably too fancy for the moment.

   Jocke



   


Re: [U-Boot] Relocation size penalty calculation

2009-10-13 Thread J. William Campbell

Joakim Tjernlund wrote:
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 13:21:05:
   
 On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:
 
 Graeme Russ graeme.r...@gmail.com wrote on 11/10/2009 12:47:19:
   
 [Massive Snip :)]

 
 So, all that is left are .dynsym and .dynamic ...
   .dynsym
 - Contains 70 entries (16 bytes each, 1120 bytes)
 - 44 entries mimic those entries in .got which are not relocated
 - 21 entries are the remaining symbols exported from the linker
   script
 - 4 entries are labels defined in inline asm and used in C
 
 Try adding proper asm declarations. Look at what gcc
 generates for a function/variable and mimic these.
   
 Thanks - Now .dynsym contains only exports from the linker script
 
 :)
   
 - 1 entry is a NULL entry

   .dynamic
 - 88 bytes
 - Array of Elf32_Dyn
 - typedef struct {
   Elf32_Sword d_tag;
   union {
   Elf32_Word  d_val;
   Elf32_Addr  d_ptr;
   } d_un;
   } Elf32_Dyn;
 - 0x11 entries
   [00] 0x0010, 0x DT_SYMBOLIC, (ignored)
   [01] 0x0004, 0x38059994 DT_HASH, points to .hash
   [02] 0x0005, 0x380595AB DT_STRTAB, points to .dynstr
   [03] 0x0006, 0x3805BDCC DT_SYMTAB, points to .dynsym
   [04] 0x000A, 0x03E6 DT_STRSZ, size of .dynstr
   [05] 0x000B, 0x0010 DT_SYMENT, ???
   [06] 0x0015, 0x DT_DEBUG, ???
   [07] 0x0011, 0x3805A8F4 DT_REL, points to .rel.text
   [08] 0x0012, 0x14D8 DT_RELSZ, ???
 
 How big DT_REL is
   
   [09] 0x0013, 0x0008 DT_RELENT, ???
 
 hmm, cannot remember :)
   
 How big an entry in DT_REL is
 

 Right, how could I forget :)
   
   [0a] 0x0016, 0x DT_TEXTREL, ???
 
 Oops, you got text relocations. This is generally a bad thing.
 TEXTREL is commonly caused by asm code that isn't truly pic so it needs
 to modify the .text segment to adjust for relocation.
 You should get rid of this one. Look for DT_TEXTREL in .o files to find
 the culprit.

   
 Alas I cannot - The relocations are a result of loading a register with a
 return address when calling show_boot_progress in the very early stages of
 initialisation prior to the stack becoming available. The x86 does not
 allow direct access to the IP so the only way to find the 'current
 execution address' is to 'call' to the next instruction and pop the return
 address off the stack
 

 hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
 Ahh, the 'call' is absolute, not relative? I guess there is some way around it
 but it is not important ATM I guess.

 Evil idea, skip -fpic et al. and add the full reloc procedure
 to relocate by rewriting directly in TEXT segment. Then you save space
 but you need more relocation code. Something like dl_do_reloc from
 uClibc. Wonder how much extra code that would be? Not too much I think.
   
I think this approach will turn out to be a big win. At present, the 
problem with just using the relocs is that objcopy is stripping them out 
when u-boot.bin is created, as I understand it. It seems this can be 
solved by changing the command switches appropriately, like using 
--strip-unneeded. In any case, there is some combination of switches 
that will preserve the relocation data. The executable code will get 
smaller, there will be no .got, and the relocation data will be larger 
(than with -fpic). In total size, it probably will be slightly smaller, 
but that is a guess. The most important benefit of this approach is that 
it will work for all architectures, thereby solving the problem once and 
forever! Even if the result is a bit larger, the RAM footprint will be 
reduced by the smaller object code size (since the relocation data need 
not be copied into RAM). Having this approach as an option would be
really nice, since it would always "just work".

Best Regards,
Bill Campbell
   
 This is not a problem because this is very low-level init that is not
 called once relocated into RAM - These relocations can be safely ignored
 

   
   [0b] 0x6FFA, 0x0236 ???, Entries in .rel.dyn
   [0c] 0x, 0x DT_NULL, End of Array
   [0d] 0x, 0x DT_NULL, End of Array
   [0e] 0x, 0x DT_NULL, End of Array
   [0f] 0x, 0x DT_NULL, End of Array
   [10] 0x, 0x DT_NULL, End of Array
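
For what it is worth, pulling the three relocation-related tags out of a .dynamic array like the one listed above takes very little code. This is only a sketch: struct dyn32 mirrors the Elf32_Dyn layout quoted earlier with the union flattened into one word, and the names find_rel_table and struct rel_table are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as Elf32_Dyn; d_val and d_ptr share the second word. */
struct dyn32 {
	int32_t  d_tag;
	uint32_t d_val;
};

/* Tag values as they appear in the listing (standard ELF values). */
#define DT_NULL		0
#define DT_REL		0x11
#define DT_RELSZ	0x12
#define DT_RELENT	0x13

struct rel_table {
	uint32_t addr;		/* DT_REL: address of the relocation table */
	uint32_t size;		/* DT_RELSZ: total size of the table in bytes */
	uint32_t entsize;	/* DT_RELENT: size of one entry in bytes */
};

/* Returns the number of relocation entries, or 0 if the tags are missing. */
uint32_t find_rel_table(const struct dyn32 *dyn, struct rel_table *out)
{
	assert(dyn != 0 && out != 0);
	out->addr = 0;
	out->size = 0;
	out->entsize = 0;

	/* The array is terminated by the first DT_NULL entry. */
	for (; dyn->d_tag != DT_NULL; dyn++) {
		switch (dyn->d_tag) {
		case DT_REL:
			out->addr = dyn->d_val;
			break;
		case DT_RELSZ:
			out->size = dyn->d_val;
			break;
		case DT_RELENT:
			out->entsize = dyn->d_val;
			break;
		default:
			break;	/* every other tag is ignored here */
		}
	}
	return out->entsize ? out->size / out->entsize : 0;
}
```

A relocation loop only needs these three values, which is one argument for keeping .dynamic even if most of its other entries go unused.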

 I think some more investigation into the need for .dynsym and .dynamic is
 still required...
 
 .dynsym may still be required if only for accessing the __u_boot_cmd
 structure. However, I may be able to hack that a little and not create a
 __u_boot_cmd symbol in the linker script (create some other temporary
 symbol) and populate __u_boot_cmd with a valid value after relocation. It
 will look a little weird, but may mean

Re: [U-Boot] Relocation size penalty calculation

2009-10-13 Thread J. William Campbell

Joakim Tjernlund wrote:
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 22:06:56:

   
 On Tue, Oct 13, 2009 at 10:53 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:
 
 Graeme Russ graeme.r...@gmail.com wrote on 13/10/2009 13:21:05:
   
 On Sun, Oct 11, 2009 at 11:51 PM, Joakim Tjernlund
 joakim.tjernl...@transmode.se wrote:
 
 Graeme Russ graeme.r...@gmail.com wrote on 11/10/2009 12:47:19:
   
 [Massive Snip :)]

 
 So, all that is left are .dynsym and .dynamic ...
   .dynsym
 - Contains 70 entries (16 bytes each, 1120 bytes)
 - 44 entries mimic those entries in .got which are not relocated
 - 21 entries are the remaining symbols exported from the linker
   script
 - 4 entries are labels defined in inline asm and used in C
 
 Try adding proper asm declarations. Look at what gcc
 generates for a function/variable and mimic these.
   
 Thanks - Now .dynsym contains only exports from the linker script
 
 :)
   
 - 1 entry is a NULL entry

   .dynamic
 - 88 bytes
 - Array of Elf32_Dyn
 - typedef struct {
   Elf32_Sword d_tag;
   union {
   Elf32_Word  d_val;
   Elf32_Addr  d_ptr;
   } d_un;
   } Elf32_Dyn;
 - 0x11 entries
   [00] 0x0010, 0x DT_SYMBOLIC, (ignored)
   [01] 0x0004, 0x38059994 DT_HASH, points to .hash
   [02] 0x0005, 0x380595AB DT_STRTAB, points to .dynstr
   [03] 0x0006, 0x3805BDCC DT_SYMTAB, points to .dynsym
   [04] 0x000A, 0x03E6 DT_STRSZ, size of .dynstr
   [05] 0x000B, 0x0010 DT_SYMENT, ???
   [06] 0x0015, 0x DT_DEBUG, ???
   [07] 0x0011, 0x3805A8F4 DT_REL, points to .rel.text
   [08] 0x0012, 0x14D8 DT_RELSZ, ???
 
 How big DT_REL is
   
   [09] 0x0013, 0x0008 DT_RELENT, ???
 
 hmm, cannot remember :)
   
 How big an entry in DT_REL is
 
 Right, how could I forget :)
   
   [0a] 0x0016, 0x DT_TEXTREL, ???
 
 Oops, you got text relocations. This is generally a bad thing.
 TEXTREL is commonly caused by asm code that isn't truly pic so it needs
 to modify the .text segment to adjust for relocation.
 You should get rid of this one. Look for DT_TEXTREL in .o files to find
 the culprit.

   
 Alas I cannot - The relocations are a result of loading a register with a
 return address when calling show_boot_progress in the very early stages of
 initialisation prior to the stack becoming available. The x86 does not
 allow direct access to the IP so the only way to find the 'current
 execution address' is to 'call' to the next instruction and pop the return
 address off the stack
 
 hmm, same as ppc but that in itself should not cause a TEXTREL, should it?
 Ahh, the 'call' is absolute, not relative? I guess there is some way around it
 but it is not important ATM I guess.

 Evil idea, skip -fpic et al. and add the full reloc procedure
 to relocate by rewriting directly in TEXT segment. Then you save space
 but you need more relocation code. Something like dl_do_reloc from
 uClibc. Wonder how much extra code that would be? Not too much I think.

   
 With the following flags

 PLATFORM_RELFLAGS += -fvisibility=hidden
 PLATFORM_CPPFLAGS += -fno-dwarf2-cfi-asm
 PLATFORM_LDFLAGS += -pic --emit-relocs -Bsymbolic -Bsymbolic-functions

 I get no .got, but a lot of R_386_PC32 and R_386_32 relocations. I think
 this might mean I need the symbol table in the binary in order to resolve
 them
 

 Possibly, but I think you only need to add an offset to all those
 relocs.
   
Almost right. The relocations specify a symbol value that needs to be 
added to the data in memory to relocate the reference. The symbol values 
involved should be the start of the text section for program references, 
the start of the uninitialized data section for bss references, and the 
start of the data section for initialized data and constants. So there 
are about four symbols whose value you need to keep. Take a look at 
http://refspecs.freestandards.org/elf/elf.pdf (which you have probably 
already looked at) and it tells you what to do with R_386_PC32 and
R_386_32 relocations. Hopefully the objcopy with the --strip-unneeded
will remove all the symbols you don't actually need, but I don't know
that for sure. Note also that you can change the section flags of a
section marked "noload" to "load".
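
A minimal sketch of the simplest case, assuming text, data and bss all move by the same amount, so that every absolute reference just needs that one offset added and PC-relative references between two places inside the image need nothing at all. The names relocate_image and struct rel32 are made up for the example; the relocation type numbers are the standard i386 ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Field layout of an ELF32 REL entry, redeclared so the sketch stands alone. */
struct rel32 {
	uint32_t r_offset;	/* link-time address of the word to patch */
	uint32_t r_info;	/* (symbol index << 8) | relocation type */
};

#define REL_TYPE(info)	((info) & 0xffu)
#define R_386_32	1u	/* S + A: absolute address */
#define R_386_PC32	2u	/* S + A - P: PC-relative */
#define R_386_RELATIVE	8u	/* B + A: base-relative */

/*
 * image     : where the copy of the image now lives
 * image_len : size of that copy in bytes
 * link_base : address the image was linked at
 * delta     : new load address minus link_base
 * Returns the number of words patched.
 */
unsigned int relocate_image(uint8_t *image, uint32_t image_len,
			    uint32_t link_base, uint32_t delta,
			    const struct rel32 *rel, unsigned int count)
{
	unsigned int i, patched = 0;

	assert(image != NULL && rel != NULL);
	for (i = 0; i < count; i++) {
		uint32_t type = REL_TYPE(rel[i].r_info);
		uint32_t off = rel[i].r_offset - link_base;
		uint32_t word;

		if (type == R_386_PC32)
			continue;	/* site and target move together */
		if (type != R_386_32 && type != R_386_RELATIVE)
			continue;	/* not handled in this sketch */
		if (image_len < sizeof(word) || off > image_len - sizeof(word))
			continue;	/* site lies outside the moved image */

		/* memcpy avoids unaligned 32-bit accesses */
		memcpy(&word, image + off, sizeof(word));
		word += delta;
		memcpy(image + off, &word, sizeof(word));
		patched++;
	}
	return patched;
}
```

If the sections were ever moved by different amounts, the single delta would have to become one delta per section, selected by the symbol each entry refers to.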

Best Regards,
Bill Campbell
   Jokce


Re: [U-Boot] Relocation size penalty calculation

Peter Tyser wrote:
 On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
   
 Out of curiosity, I wanted to see just how much of a size penalty I am
 incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
 the results (fixed width font will help - its space, not tab, formatted):

 Section              non-reloc  reloc
 ----------------------------------------------------------------------
 .text                000118c4   000137fc  - 0x1f38 bytes (~8kB) bigger
 .rodata              5bad       59d0
 .interp              n/a        0013
 .dynstr              n/a        0648
 .hash                n/a        0428
 .eh_frame            3268       34fc
 .data                0a6c       01dc
 .data.rel            n/a        0098
 .data.rel.ro.local   n/a        0178
 .data.rel.local      n/a        07e4
 .got                            01f0
 .got.plt             n/a        000c
 .rel.got             n/a        03e0
 .rel.dyn             n/a        1228
 .dynsym              n/a        0850
 .dynamic             n/a        0080
 .u_boot_cmd          03c0       03c0
 .bss                 1a34       1a34
 .realmode            0166       0166
 .bios                053e       053e
 ======================================================================
 Total                0001d5dd   00022287  - 0x4caa bytes (~19kB) bigger

 It's more than a 16% increase in size!!!

 .text accounts for a little under half of the total bloat, and of that,
 the crude dynamic loader accounts for only 341 bytes

 Have any metrics been done for PPC?
 

 Things actually improve a little bit when we use -mrelocatable and get
 rid of all the manual += gd->reloc_off fixups:

 1) Top of mainline on XPedite5370:
text  data bss dec hex filename
  308612 24488   33172  366272   596c0 u-boot

 2) Top of reloc branch on XPedite5370 (ie -mrelocatable):
text  data bss dec hex filename
  303704 28644   33156  365504   593c0 u-boot

   
Hi Peter,
 Just to be clear, the total text+data length of u-boot with the
manual relocations (#1) is LARGER than the text+data length of u-boot
with the manual relocations removed and the necessary centralized
relocation code added, along with any additional data sections required
by -mrelocatable (#2), by 752 (dec) bytes, or 768 if bss is counted? And
both cases (1 and 2) work equivalently?
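
For reference, a small sketch of the arithmetic behind that question, using the size figures quoted above for #1 and #2. The struct and function names are made up for illustration.

```c
#include <assert.h>

/* One row of size output; field names follow its column headers. */
struct size_row {
	long text;
	long data;
	long bss;
};

/* What ends up in flash: code plus initialized data. */
long text_plus_data(const struct size_row *r)
{
	assert(r != 0);
	return r->text + r->data;
}

/* The "dec" column: text + data + bss. */
long total_size(const struct size_row *r)
{
	assert(r != 0);
	return r->text + r->data + r->bss;
}
```

With the quoted figures, text+data comes to 333100 for #1 and 332348 for #2, a difference of 752 bytes. The dec totals, which also include bss, are 366272 and 365504, a difference of 768 bytes.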

Best Regards,
Bill Campbell.
 For fun:
 3) #2 but with s/-mrelocatable/-fpic/ (probably doesn't boot):
text  data bss dec hex filename
  303704 24472   33156  361332   58374 u-boot


 There may be some other changes that affect the size between mainline
 and reloc, but their sizes are in the same general ballpark.

 Best,
 Peter

 ___
 U-Boot mailing list
 U-Boot@lists.denx.de
 http://lists.denx.de/mailman/listinfo/u-boot
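
The arithmetic behind the "more than 16%" figure can be re-checked with a short script. This is only a sketch: the section sizes are transcribed from Graeme's table, `None` stands for "n/a", and the blank non-reloc `.got` entry is assumed to be zero (which is consistent with the stated total). The names `SECTIONS` and `totals` are made up for this illustration.

```python
# Section sizes in bytes, transcribed from the table in this thread.
# Each entry is (non-reloc, reloc); None marks a section listed as "n/a".
SECTIONS = {
    ".text": (0x118C4, 0x137FC),
    ".rodata": (0x5BAD, 0x59D0),
    ".interp": (None, 0x0013),
    ".dynstr": (None, 0x0648),
    ".hash": (None, 0x0428),
    ".eh_frame": (0x3268, 0x34FC),
    ".data": (0x0A6C, 0x01DC),
    ".data.rel": (None, 0x0098),
    ".data.rel.ro.local": (None, 0x0178),
    ".data.rel.local": (None, 0x07E4),
    ".got": (0, 0x01F0),  # assumption: blank non-reloc entry treated as 0
    ".got.plt": (None, 0x000C),
    ".rel.got": (None, 0x03E0),
    ".rel.dyn": (None, 0x1228),
    ".dynsym": (None, 0x0850),
    ".dynamic": (None, 0x0080),
    ".u_boot_cmd": (0x03C0, 0x03C0),
    ".bss": (0x1A34, 0x1A34),
    ".realmode": (0x0166, 0x0166),
    ".bios": (0x053E, 0x053E),
}


def totals(sections):
    """Return (non_reloc_total, reloc_total); absent sections count as 0."""
    non_reloc = sum(a or 0 for a, _ in sections.values())
    reloc = sum(b or 0 for _, b in sections.values())
    return non_reloc, reloc


non_reloc, reloc = totals(SECTIONS)
growth = reloc - non_reloc
text_growth = SECTIONS[".text"][1] - SECTIONS[".text"][0]

print(f"non-reloc total: {non_reloc:#x}")
print(f"reloc total:     {reloc:#x}")
print(f"growth:          {growth:#x} bytes ({100 * growth / non_reloc:.1f}%)")
print(f".text share of growth: {100 * text_growth / growth:.1f}%")
```

The totals match the table (0x1d5dd and 0x22287), the growth is 0x4caa bytes, and .text contributes 0x1f38 of that.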


   


Re: [U-Boot] Relocation size penalty calculation

Graeme Russ wrote:
 Out of curiosity, I wanted to see just how much of a size penalty I am
 incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
 the results (fixed width font will help - it's space, not tab, formatted):

 Section              non-reloc     reloc
 ----------------------------------------
 .text                 000118c4  000137fc  <- 0x1f38 bytes (~8kB) bigger
 .rodata                   5bad      59d0
 .interp                    n/a      0013
 .dynstr                    n/a      0648
 .hash                      n/a      0428
 .eh_frame                 3268      34fc
 .data                     0a6c      01dc
 .data.rel                  n/a      0098
 .data.rel.ro.local         n/a      0178
 .data.rel.local            n/a      07e4
 .got                      0000      01f0
 .got.plt                   n/a      000c
 .rel.got                   n/a      03e0
 .rel.dyn                   n/a      1228
 .dynsym                    n/a      0850
 .dynamic                   n/a      0080
 .u_boot_cmd               03c0      03c0
 .bss                      1a34      1a34
 .realmode                 0166      0166
 .bios                     053e      053e
 ========================================
 Total                 0001d5dd  00022287  <- 0x4caa bytes (~19kB) bigger

 It's more than a 16% increase in size!!!

 .text accounts for a little under half of the total bloat, and of that,
 the crude dynamic loader accounts for only 341 bytes.
   
Hi Graeme,
   I would be interested in a third option (column), the x86 build
with just -mrelocatable but NOT -fpic. It will not be definitive
because there will be extra code that references the GOT and missing
code to do some of the relocation, but it would still be interesting.

Best Regards,
Bill Campbell
 Have any metrics been done for PPC?

 Regards,

 Graeme

Re: [U-Boot] Relocation size penalty calculation

Peter Tyser wrote:
 On Thu, 2009-10-08 at 08:53 -0700, J. William Campbell wrote:
   
 Peter Tyser wrote:
 
 On Thu, 2009-10-08 at 22:54 +1100, Graeme Russ wrote:
   
   
 Out of curiosity, I wanted to see just how much of a size penalty I am
 incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
 the results (fixed width font will help - it's space, not tab, formatted):

 Section              non-reloc     reloc
 ----------------------------------------
 .text                 000118c4  000137fc  <- 0x1f38 bytes (~8kB) bigger
 .rodata                   5bad      59d0
 .interp                    n/a      0013
 .dynstr                    n/a      0648
 .hash                      n/a      0428
 .eh_frame                 3268      34fc
 .data                     0a6c      01dc
 .data.rel                  n/a      0098
 .data.rel.ro.local         n/a      0178
 .data.rel.local            n/a      07e4
 .got                      0000      01f0
 .got.plt                   n/a      000c
 .rel.got                   n/a      03e0
 .rel.dyn                   n/a      1228
 .dynsym                    n/a      0850
 .dynamic                   n/a      0080
 .u_boot_cmd               03c0      03c0
 .bss                      1a34      1a34
 .realmode                 0166      0166
 .bios                     053e      053e
 ========================================
 Total                 0001d5dd  00022287  <- 0x4caa bytes (~19kB) bigger

 It's more than a 16% increase in size!!!

 .text accounts for a little under half of the total bloat, and of that,
 the crude dynamic loader accounts for only 341 bytes.

 Have any metrics been done for PPC?
 
 
 Things actually improve a little bit when we use -mrelocatable and get
 rid of all the manual += gd->reloc_off fixups:

 1) Top of mainline on XPedite5370:
    text    data     bss     dec     hex filename
  308612   24488   33172  366272   596c0 u-boot

 2) Top of reloc branch on XPedite5370 (i.e. -mrelocatable):
    text    data     bss     dec     hex filename
  303704   28644   33156  365504   593c0 u-boot

   
   
 Hi Peter,
  Just to be clear, the total text+data length of u-boot with the
 manual relocations (#1) is LARGER than the text+data length of u-boot
 with the manual relocations removed and the necessary centralized
 relocation code added, along with any additional data sections required
 by -mrelocatable (#2), by 768 (dec) bytes?
 

 Hi Bill,
 Doah, looks like I chose a bad board as an example.  The XPedite5370
 already had -mrelocatable defined in its own
 board/xes/xpedite5370/config.mk in mainline, so the above comparison
 should be ignored as both builds used -mrelocatable.

 Here's some *real* results from the MPC8548CDS:
 1) Top of mainline:
    text    data     bss     dec     hex filename
  219968   17052   22992  260012   3f7ac u-boot

 2) Top of reloc branch (i.e. -mrelocatable):
    text    data     bss     dec     hex filename
  219192   20640   22980  262812   4029c u-boot

 So the reloc branch is 2.7K bigger for the MPC8548CDS.
   
Hi Peter,
 OK, that's more like it! A 1.2% size increase in ROM seems like a
very small price to pay for a truly relocatable u-boot image that will
run on any size memory without the programmer having to actively worry
about what may need relocating as code is written. Also, it should be
noted that the size increase in 2) is mostly in relocation segments
that do not need to be copied into RAM, so the RAM footprint should be
smaller for 2) than 1). The relocation code itself could also be placed
in a segment that is not copied into RAM, although that may be more
trouble than it is worth.
   I am looking forward to Graeme's results with the 386. I expect
that it will not be quite so favorable, perhaps a 4 or 5% size increase
for -mrelocatable over an absolute build. However, -mrelocatable vs.
-fpic may be comparable, with -mrelocatable actually winning. But then
again, I could be totally wrong!

Best Regards,
Bill Campbell
 Best,
 Peter
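
The "2.7K bigger" and "1.2%" figures for the MPC8548CDS can be reproduced from the two `size` rows quoted above. The sketch below assumes that "size in ROM" means text + data (bss occupies no space in the image); that reading, and the helper name `parse_size_row`, are assumptions made for this illustration only.

```python
def parse_size_row(row):
    """Parse one data row of Berkeley-format `size` output into a dict."""
    text, data, bss, dec, hex_total, filename = row.split()
    sizes = {"text": int(text), "data": int(data), "bss": int(bss)}
    # Sanity check: the dec and hex columns should both be text + data + bss.
    total = sum(sizes.values())
    if total != int(dec) or total != int(hex_total, 16):
        raise ValueError(f"inconsistent size row: {row!r}")
    sizes["dec"] = total
    sizes["filename"] = filename
    return sizes


# Rows copied from the MPC8548CDS results in this thread.
mainline = parse_size_row("219968 17052 22992 260012 3f7ac u-boot")
reloc = parse_size_row("219192 20640 22980 262812 4029c u-boot")

# Overall growth, including bss.
total_delta = reloc["dec"] - mainline["dec"]

# Assumed ROM footprint: text + data only.
rom_mainline = mainline["text"] + mainline["data"]
rom_reloc = reloc["text"] + reloc["data"]
rom_delta = rom_reloc - rom_mainline
rom_pct = 100.0 * rom_delta / rom_mainline

print(f"total growth: {total_delta} bytes ({total_delta / 1024:.1f}K)")
print(f"ROM growth:   {rom_delta} bytes ({rom_pct:.1f}%)")
```

Under that assumption the overall growth is 2800 bytes and the ROM growth is 2812 bytes, or about 1.2% of the mainline text + data.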



   


Re: [U-Boot] Relocation size penalty calculation

Graeme Russ wrote:
 On Fri, Oct 9, 2009 at 2:58 AM, J. William Campbell
 jwilliamcampb...@comcast.net wrote:
   
 Graeme Russ wrote:
 
 Out of curiosity, I wanted to see just how much of a size penalty I am
 incurring by using gcc -fpic / ld -pic on my x86 u-boot build. Here are
 the results (fixed width font will help - it's space, not tab, formatted):

 Section              non-reloc     reloc
 ----------------------------------------
 .text                 000118c4  000137fc  <- 0x1f38 bytes (~8kB) bigger
 .rodata                   5bad      59d0
 .interp                    n/a      0013
 .dynstr                    n/a      0648
 .hash                      n/a      0428
 .eh_frame                 3268      34fc
 .data                     0a6c      01dc
 .data.rel                  n/a      0098
 .data.rel.ro.local         n/a      0178
 .data.rel.local            n/a      07e4
 .got                      0000      01f0
 .got.plt                   n/a      000c
 .rel.got                   n/a      03e0
 .rel.dyn                   n/a      1228
 .dynsym                    n/a      0850
 .dynamic                   n/a      0080
 .u_boot_cmd               03c0      03c0
 .bss                      1a34      1a34
 .realmode                 0166      0166
 .bios                     053e      053e
 ========================================
 Total                 0001d5dd  00022287  <- 0x4caa bytes (~19kB) bigger

 It's more than a 16% increase in size!!!

 .text accounts for a little under half of the total bloat, and of that,
 the crude dynamic loader accounts for only 341 bytes.

   
 Hi Graeme,
  I would be interested in a third option (column), the x86 build with
 just -mrelocatable but NOT -fpic. It will not be definitive because there
 will be extra code that references the GOT and missing code to do some of
 the relocation, but it would still be interesting.
 

 x86 does not have -mrelocatable. This is a PPC only option :(
   
Hi Graeme,
   You are unfortunately correct. However, I wonder if we can 
get essentially the same result by executing the final ld step with the 
--emit-relocs switch included. This may also include some extra 
sections that we would want to strip out, but if it works, it could give 
all ELF-based systems a way to a relocatable u-boot.

Best Regards,
Bill Campbell

   
 Best Regards,
 Bill Campbell
 
 Have any metrics been done for PPC?

 Regards,

 Graeme
   

 Once the reloc branch has been merged, how many arches are left which do
 not support relocation?

 Regards,

 Graeme
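
For readers unfamiliar with what a "crude dynamic loader" has to do with a section like .rel.dyn, the following is a generic sketch of relative-relocation fix-ups on i386. It is not the code from Graeme's build or from the reloc branch: the function name, the in-memory image, and the base addresses are all invented for illustration. Only the Elf32_Rel layout (two 32-bit words, `r_offset` and `r_info`) and the `R_386_RELATIVE` type number come from the ELF i386 ABI.

```python
import struct

R_386_RELATIVE = 8  # relocation type number from the i386 ELF psABI


def apply_relative_relocs(image, rel_table, link_base, load_base):
    """Patch `image` (a bytearray) in place for a move to `load_base`.

    `rel_table` holds packed little-endian Elf32_Rel entries
    (r_offset, r_info). Only R_386_RELATIVE entries are handled;
    every other type is skipped in this sketch.
    Returns the number of words that were patched.
    """
    delta = (load_base - link_base) & 0xFFFFFFFF
    patched = 0
    for r_offset, r_info in struct.iter_unpack("<II", rel_table):
        if r_info & 0xFF != R_386_RELATIVE:
            continue
        pos = r_offset - link_base  # link-time address -> image offset
        (value,) = struct.unpack_from("<I", image, pos)
        struct.pack_into("<I", image, pos, (value + delta) & 0xFFFFFFFF)
        patched += 1
    return patched


# Hypothetical 16-byte image linked at 0x1000 with one pointer at 0x1004
# that refers to address 0x1008 inside the same image.
image = bytearray(16)
struct.pack_into("<I", image, 4, 0x1008)
rel_table = struct.pack("<II", 0x1004, R_386_RELATIVE)

count = apply_relative_relocs(image, rel_table, 0x1000, 0x8000)
(new_value,) = struct.unpack_from("<I", image, 4)
print(f"patched {count} word(s); pointer is now {new_value:#x}")
```

Each relative relocation costs one 8-byte table entry, which is why the .rel.dyn section shows up as pure size overhead in the table while the fix-up loop itself stays small.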


   


Re: [U-Boot] Relocation size penalty calculation