Re: [CFR] Replacing while loops with proper division and multiplication

2015-06-05 Thread Neel Natu
Hi Hans,

On Fri, Jun 5, 2015 at 12:09 AM, Hans Petter Selasky h...@selasky.org wrote:
 Hi,

 I was going through some timer code and found some unnecessary while loops
 in kern/kern_clocksource.c .

 I added some prints and found that during boot, runs can exceed 2000,
 while during regular usage runs is typically 1. Do you think it is worth to
 convert these loops into division and multiplications? It might make the CPU
 pipeline a tiny bit faster, having to skip some conditionals? And also
 possibly improve readability?

 What do you think?

 --HPS

 Index: kern/kern_clocksource.c
 ===
 --- kern/kern_clocksource.c (revision 283606)
 +++ kern/kern_clocksource.c (working copy)
 @@ -155,10 +155,11 @@
  handleevents(sbintime_t now, int fake)
  {
 sbintime_t t, *hct;
 +   sbintime_t runs;
 struct trapframe *frame;
 struct pcpu_state *state;
 int usermode;
 -   int done, runs;
 +   int done;

 CTR3(KTR_SPARE2, handle at %d:  now  %d.%08x,
 curcpu, (int)(now  32), (u_int)(now  0x));
 @@ -173,12 +174,10 @@

 state = DPCPU_PTR(timerstate);

 -   runs = 0;
 -   while (now = state-nexthard) {
 -   state-nexthard += tick_sbt;
 -   runs++;
 -   }
 -   if (runs) {
 +   runs = (now - state-nexthard) / tick_sbt;
 +   if (runs  0) {
 +   printf(R%d , (int)runs);
 +   state-nexthard += tick_sbt * runs;
 hct = DPCPU_PTR(hardclocktime);
 *hct = state-nexthard - tick_sbt;
 if (fake  2) {

There is a difference in behavior in the two implementations when 'now
== state-nexthard'. In the loop-based implementation this would end
up with 'runs = 1' whereas in the division-based implementation it
would end up with 'runs = 0'.

I am not sure if this is intentional or just an oversight.

best
Neel

 @@ -186,25 +185,25 @@
 done = 1;
 }
 }
 -   runs = 0;
 -   while (now = state-nextstat) {
 -   state-nextstat += statperiod;
 -   runs++;
 +   runs = (now - state-nextstat) / statperiod;
 +   if (runs  0) {
 +   printf(S%d , (int)runs);
 +   state-nextstat += statperiod * runs;
 +   if (fake  2) {
 +   statclock_cnt(runs, usermode);
 +   done = 1;
 +   }
 }
 -   if (runs  fake  2) {
 -   statclock_cnt(runs, usermode);
 -   done = 1;
 -   }
 if (profiling) {
 -   runs = 0;
 -   while (now = state-nextprof) {
 -   state-nextprof += profperiod;
 -   runs++;
 +   runs = (now - state-nextprof) / profperiod;
 +   if (runs  0) {
 +   printf(T%d , (int)runs);
 +   state-nextprof += profperiod * runs;
 +   if (!fake) {
 +   profclock_cnt(runs, usermode,
 TRAPF_PC(frame));
 +   done = 1;
 +   }
 }
 -   if (runs  !fake) {
 -   profclock_cnt(runs, usermode, TRAPF_PC(frame));
 -   done = 1;
 -   }
 } else
 state-nextprof = state-nextstat;
 if (now = state-nextcallopt) {

 ___
 freebsd-current@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-current
 To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFR] Replacing while loops with proper division and multiplication

2015-06-05 Thread Neel Natu
Hi Hans,

On Fri, Jun 5, 2015 at 11:53 AM, Hans Petter Selasky h...@selasky.org wrote:
 On 06/05/15 20:31, Neel Natu wrote:

 -   runs = 0;
 -   while (now = state-nexthard) {
 -   state-nexthard += tick_sbt;
 -   runs++;
 -   }
 -   if (runs) {
 +   runs = (now - state-nexthard) / tick_sbt;
 +   if (runs  0) {
 +   printf(R%d , (int)runs);
 +   state-nexthard += tick_sbt * runs;
  hct = DPCPU_PTR(hardclocktime);
  *hct = state-nexthard - tick_sbt;
  if (fake  2) {

 There is a difference in behavior in the two implementations when 'now
 == state-nexthard'. In the loop-based implementation this would end
 up with 'runs = 1' whereas in the division-based implementation it
 would end up with 'runs = 0'.

 I am not sure if this is intentional or just an oversight.


 Hi Neel,

 The nexthard is mainly updated in this piece of code. We can assume that
 state-nexthard is aligned to ticks_sbt. If state-nexthard % ticks_sbt
 == 0, is that still an issue?


I am not very familiar with this subsystem to make the call (mav@ or
davide@ would know for sure though).

I just noticed a discrepancy in the patch and wanted to highlight that
in case it might be an issue.

best
Neel

 --HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFR] Replacing while loops with proper division and multiplication

2015-06-05 Thread Hans Petter Selasky

On 06/05/15 20:31, Neel Natu wrote:

-   runs = 0;
-   while (now = state-nexthard) {
-   state-nexthard += tick_sbt;
-   runs++;
-   }
-   if (runs) {
+   runs = (now - state-nexthard) / tick_sbt;
+   if (runs  0) {
+   printf(R%d , (int)runs);
+   state-nexthard += tick_sbt * runs;
 hct = DPCPU_PTR(hardclocktime);
 *hct = state-nexthard - tick_sbt;
 if (fake  2) {

There is a difference in behavior in the two implementations when 'now
== state-nexthard'. In the loop-based implementation this would end
up with 'runs = 1' whereas in the division-based implementation it
would end up with 'runs = 0'.

I am not sure if this is intentional or just an oversight.


Hi Neel,

The nexthard is mainly updated in this piece of code. We can assume that 
state-nexthard is aligned to ticks_sbt. If state-nexthard % 
ticks_sbt == 0, is that still an issue?


--HPS
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CFR] Replacing while loops with proper division and multiplication

2015-06-05 Thread Konstantin Belousov
On Fri, Jun 05, 2015 at 09:09:26AM +0200, Hans Petter Selasky wrote:
 Hi,
 
 I was going through some timer code and found some unnecessary while 
 loops in kern/kern_clocksource.c .
 
 I added some prints and found that during boot, runs can exceed 2000, 
 while during regular usage runs is typically 1. Do you think it is worth 
 to convert these loops into division and multiplications? It might make 
 the CPU pipeline a tiny bit faster, having to skip some conditionals? 
 And also possibly improve readability?
For typical loop count of one, I think that setup time for the division is
much greater than single substraction and non-taken conditional jump, which
is also should be consistently predicted as not taken.

You could try to run a microbenchmark to see if the patch makes any sense.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org