Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-06 Thread Stefan Kanthak
"Segher Boessenkool"  wrote:

> On Wed, Dec 05, 2018 at 02:19:14AM +0100, Stefan Kanthak wrote:
>> "Paul Koning"  wrote:
>> 
>> > Yes, that's a rather nasty cut & paste error I made.
>> 
>> I suspected that.
>> Replacing
>> !(den & (1L<<31))
>> with
>> (signed short) den >= 0
>> avoids this type of error: there's no need for a constant here!
>> 
>> JFTR: of course the 1L should be just a 1, without suffix.
> 
> "int" can be 16 bits only, I think?

This function works on short int, which can only be 16 bits.

> If you change to 15 you can remove the L, sure.
> 
>> > But if the 31 is changed to a 15, is the code correct?
>> > I would think so.
> 
> I think so, too.
> 
>> > For optimization I'd think that an assembly language
>> > version would make more sense, and a few targets do that.
>> 
>> Moving 2 of 3 conditions from the loop is not an optimisation,
>> but a necessity!
> 
> The compiler can optimise things quite well.

It but doesn't, see : lines 15 to 20
are inside the loop, line 6 and 9 to 11 skip the loop.

>> In other words: why test 3 conditions in every pass of the
>> loop when you need to test only 1 condition inside the loop,
>> and the other 2 outside/before the loop?
> 
> Maybe the code is easier to read this way.  Maybe it doesn't matter for
> performance.  Maybe no one cared, this routine is for correctness anyway,

No, no & yes, yes: it's not very likely that it will really be called.
OTOH, an unsuspecting kid may find and copy it, assuming that a routine
shipped with GCC can't be THAT bad.-P
Or the (almost) identical
 ...

> any target that cares for performance will do an asm version, or (partly)
> inline this even.
> 
> Send a patch if you want to see it changed :-)

I saw my last PDP-11 more than 30 years ago!

regards
Stefan


Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-05 Thread Paul Koning



> On Dec 5, 2018, at 11:23 AM, Segher Boessenkool  
> wrote:
> 
> On Wed, Dec 05, 2018 at 02:19:14AM +0100, Stefan Kanthak wrote:
>> "Paul Koning"  wrote:
>> 
>>> Yes, that's a rather nasty cut & paste error I made.
>> 
>> I suspected that.
>> Replacing
>>!(den & (1L<<31))
>> with
>>(signed short) den >= 0
>> avoids this type of error: there's no need for a constant here!
>> 
>> JFTR: of course the 1L should be just a 1, without suffix.
> 
> "int" can be 16 bits only, I think?  If you change to 15 you can remove
> the L, sure.

"int" on pdp11 is either 16 or 32 bits depending on switches, but "short int" 
is always 16.  I changed it to be 1U << 15 which seems more appropriate if int 
is 16 bits.

paul




Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-05 Thread Segher Boessenkool
On Wed, Dec 05, 2018 at 02:19:14AM +0100, Stefan Kanthak wrote:
> "Paul Koning"  wrote:
> 
> > Yes, that's a rather nasty cut & paste error I made.
> 
> I suspected that.
> Replacing
> !(den & (1L<<31))
> with
> (signed short) den >= 0
> avoids this type of error: there's no need for a constant here!
> 
> JFTR: of course the 1L should be just a 1, without suffix.

"int" can be 16 bits only, I think?  If you change to 15 you can remove
the L, sure.

> > But if the 31 is changed to a 15, is the code correct?
> > I would think so.

I think so, too.

> > For optimization I'd think that an assembly language
> > version would make more sense, and a few targets do that.
> 
> Moving 2 of 3 conditions from the loop is not an optimisation,
> but a necessity!

The compiler can optimise things quite well.

> In other words: why test 3 conditions in every pass of the
> loop when you need to test only 1 condition inside the loop,
> and the other 2 outside/before the loop?

Maybe the code is easier to read this way.  Maybe it doesn't matter for
performance.  Maybe no one cared, this routine is for correctness anyway,
any target that cares for performance will do an asm version, or (partly)
inline this even.

Send a patch if you want to see it changed :-)


Segher


Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-05 Thread Paul Koning



> On Dec 4, 2018, at 8:19 PM, Stefan Kanthak  wrote:
> 
> "Paul Koning"  wrote:
> 
>> Yes, that's a rather nasty cut & paste error I made.
> 
> I suspected that.
> Replacing
>!(den & (1L<<31))
> with
>(signed short) den >= 0
> avoids this type of error: there's no need for a constant here!
> 
> JFTR: of course the 1L should be just a 1, without suffix.
> 
>> But if the 31 is changed to a 15, is the code correct?
>> I would think so.
> 
> Almost. It's the standard algorithm, and it's correct except
> for den == 0, where the current implementation returns 0 as
> quotient or the numerator as remainder, while my fix yields an
> endless loop (as could be expected for "undefined behaviour").

I submitted a patch that just changes that one line.  This file is a copy of 
udivmodsi4.c so I figured I'd aim for the same logic except for the word length 
changes, and the 31 instead of 15 was a missed edit for that.

The other changes could be left for later, or a handwritten assembly routine 
used instead as some other targets do.

Thanks!

paul



Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Stefan Kanthak
"Paul Koning"  wrote:

> Yes, that's a rather nasty cut & paste error I made.

I suspected that.
Replacing
!(den & (1L<<31))
with
(signed short) den >= 0
avoids this type of error: there's no need for a constant here!

JFTR: of course the 1L should be just a 1, without suffix.

> But if the 31 is changed to a 15, is the code correct?
> I would think so.

Almost. It's the standard algorithm, and it's correct except
for den == 0, where the current implementation returns 0 as
quotient or the numerator as remainder, while my fix yields an
endless loop (as could be expected for "undefined behaviour").

To avoid that, insert
  if (den == 0)
...; // raise the appropriate SIG... here, for example.
between the 2 if's I added.

> For optimization I'd think that an assembly language
> version would make more sense, and a few targets do that.

Moving 2 of 3 conditions from the loop is not an optimisation,
but a necessity!
In other words: why test 3 conditions in every pass of the
loop when you need to test only 1 condition inside the loop,
and the other 2 outside/before the loop?

regards
Stefan

> On Dec 4, 2018, at 5:51 PM, Stefan Kanthak  wrote:
> 
> Hi @ll,
> 
> libgcc's divmodhi4() function has an obvious bug; additionally
> it shows rather poor inperformant code: two of the three conditions
> tested in the first loop should clearly moved outside the loop!
> 
> divmodsi4() shows this inperformant code too!
> 
> regards
> Stefan Kanthak
> 
> --- divmodhi4.c ---
> 
> unsigned short
> __udivmodhi4(unsigned short num, unsigned short den, int modwanted)
> {
>  unsigned short bit = 1;
>  unsigned short res = 0;
> 
> #ifdef BUGFIX
>  if (den > num)
> return modwanted ? num : 0;
>  if (den == num)
> return modwanted ? 0 : 1;
>  while ((signed short) den >= 0)
> #else // original, buggy and inperformant code
>  while (den < num && bit && !(den & (1L<<31)))  // unsigned shorts are 16 bit!
> #endif
>{
>  den <<=1;
>  bit <<=1;
>}
>  while (bit)
>{
>  if (num >= den)
>{
>  num -= den;
>  res |= bit;
>}
>  bit >>=1;
>  den >>=1;
>}
>  if (modwanted) return num;
>  return res;
> }


Re: Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Paul Koning
Yes, that's a rather nasty cut & paste error I made.

But if the 31 is changed to a 15, is the code correct?  I would think so.  For 
optimization I'd think that an assembly language version would make more sense, 
and a few targets do that.

paul


> On Dec 4, 2018, at 5:51 PM, Stefan Kanthak  wrote:
> 
> Hi @ll,
> 
> libgcc's divmodhi4() function has an obvious bug; additionally
> it shows rather poor inperformant code: two of the three conditions
> tested in the first loop should clearly moved outside the loop!
> 
> divmodsi4() shows this inperformant code too!
> 
> regards
> Stefan Kanthak
> 
> --- divmodhi4.c ---
> 
> unsigned short
> __udivmodhi4(unsigned short num, unsigned short den, int modwanted)
> {
>  unsigned short bit = 1;
>  unsigned short res = 0;
> 
> #ifdef BUGFIX
>  if (den > num)
> return modwanted ? num : 0;
>  if (den == num)
> return modwanted ? 0 : 1;
>  while ((signed short) den >= 0)
> #else // original, buggy and inperformant code
>  while (den < num && bit && !(den & (1L<<31)))  // unsigned shorts are 16 bit!
> #endif
>{
>  den <<=1;
>  bit <<=1;
>}
>  while (bit)
>{
>  if (num >= den)
>{
>  num -= den;
>  res |= bit;
>}
>  bit >>=1;
>  den >>=1;
>}
>  if (modwanted) return num;
>  return res;
> }



Bug in divmodhi4(), plus poor inperformant code

2018-12-04 Thread Stefan Kanthak
Hi @ll,

libgcc's divmodhi4() function has an obvious bug; additionally
it shows rather poor inperformant code: two of the three conditions
tested in the first loop should clearly moved outside the loop!

divmodsi4() shows this inperformant code too!

regards
Stefan Kanthak

--- divmodhi4.c ---

unsigned short
__udivmodhi4(unsigned short num, unsigned short den, int modwanted)
{
  unsigned short bit = 1;
  unsigned short res = 0;

#ifdef BUGFIX
  if (den > num)
 return modwanted ? num : 0;
  if (den == num)
 return modwanted ? 0 : 1;
  while ((signed short) den >= 0)
#else // original, buggy and inperformant code
  while (den < num && bit && !(den & (1L<<31)))  // unsigned shorts are 16 bit!
#endif
{
  den <<=1;
  bit <<=1;
}
  while (bit)
{
  if (num >= den)
{
  num -= den;
  res |= bit;
}
  bit >>=1;
  den >>=1;
}
  if (modwanted) return num;
  return res;
}