Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread John Stultz
On Fri, May 1, 2015 at 11:29 AM, Nicolas Pitre  wrote:
> On Fri, 1 May 2015, John Stultz wrote:
>
>> So yea. I see the issue now. The ktime_divns() function uses do_div
>> which is an unsigned divide. So negative numbers look like large
>> numbers, and it doesn't do sign-extension, so when it divides we lose
>> the sign bit.
>
> That doesn't stop there. On 32-bit systems, if the divisor is not
> constant or too large, then __ktime_divns() is used. If you look at the
> code for __ktime_divns(), you'll notice it would get into an infinite
> loop if you pass it a negative divisor.

True. Though while negative ktime intervals are mostly rare, I suspect
we probably don't have any negative divisors for now. Even so, it
seems like something that could easily get someones foot shot.

Is there any sort of compiler magic to generate warnings if folks are
passing signed types to the divisor?

I'll be sending my first rfc of the fix here shortly. Additional
feedback there would be appreciated!

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread Nicolas Pitre
On Fri, 1 May 2015, John Stultz wrote:

> So yea. I see the issue now. The ktime_divns() function uses do_div
> which is an unsigned divide. So negative numbers look like large
> numbers, and it doesn't do sign-extension, so when it divides we lose
> the sign bit.

That doesn't stop there. On 32-bit systems, if the divisor is not 
constant or too large, then __ktime_divns() is used. If you look at the 
code for __ktime_divns(), you'll notice it would get into an infinite 
loop if you pass it a negative divisor.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread John Stultz
On Fri, May 1, 2015 at 3:02 AM, Trevor Cordes  wrote:
> May  1 04:46:12 piles kernel: [  356.838029] JDB: ktime_to_us: -20157485 -> 
> divns 18446744073689394 != old method: -20158
>
> The last 2-3 or 3 groups of output I could produce on demand by stopping 
> mythbackend and running:
> systemctl restart lircd.service ; irsend SEND_ONCE dct700 info
>
> Subsequent irsends don't trigger the bug, since (as I found out a
> while ago) by that point lircd is "hung", at least for a long while.
> Hey!  Maybe lircd is then hung for 18446744073689394 us or ns :-)
> If this result is used as a delay timer, the negative would produce
> 0 delay, and the + number the "hang".  I calculate that hang is 584
> years?  :-)
>
> So it looks like maybe my theory wasn't so wacky: we're dealing
> with a caller passing negative numbers (or 32/64 weirdness).  Very
> strange as it seems the caller *wants* (or is happy with) negative
> numbers!

So yea. I see the issue now. The ktime_divns() function uses do_div
which is an unsigned divide. So negative numbers look like large
numbers, and it doesn't do sign-extension, so when it divides we lose
the sign bit.

ktimes are normally positive values, so this probably isn't something
its run-across often, thus why you've just noticed. It does make me
suspicious that the driver is doing something funny still, but I've
gone ahead and written up a patch to fix this since this is clearly an
issue here.

I'll send it out here shortly. If you could give it a spin at your
leisure, and if it works give me a Tested-by: tag I'd appreciate it!

Great work again on chasing this down, and thanks for helping with
debugging and validating the fix!
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread Trevor Cordes
On 2015-04-30 John Stultz wrote:
> >From your description it does seem like some sort of edge case
> >problem
> w/ the 32bit ktime_divns(), but I don't see it right off, and I agree
> with Alan to do both calculations and print out warn when that
> happens.
> 
> There's also not a ton of users of that function, but ktime_us_delta()
> is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in
> ir_lirc_transmit_ir().
> 
> We should instrument that to see if its calculating negative deltas.

Thanks for looking at this!  I didn't have the know-how to add the
debug code myself (I was scared this fn might be called a zillion times
in other kernel operations and overload the system with debug logging).

> I'll send you a debug patch to do the above.

Got the patch.  Sorry for the delay (takes hours to compile using
rpmbuild on this old P4 box).

I think I have "success" in terms of useful debug output (relevant
lines only):

May  1 04:41:08 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:41:08 piles lircd-0.9.1a[978]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:41:11 piles lircd-0.9.1a[978]: accepted new client on 
/var/run/lirc/lircd
May  1 04:41:11 piles kernel: [   55.265023] JDB: ktime_to_us: -20782699 -> 
divns 18446744073688768 != old method: -20783
May  1 04:44:00 piles lircd-0.9.1a[978]: removed client
May  1 04:44:00 piles lircd-0.9.1a[978]: caught signal
May  1 04:44:00 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:44:00 piles lircd-0.9.1a[1523]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:45:03 piles lircd-0.9.1a[1523]: accepted new client on 
/var/run/lirc/lircd
May  1 04:45:03 piles kernel: [  287.445027] JDB: ktime_to_us: -20599906 -> 
divns 18446744073688951 != old method: -20600
May  1 04:45:37 piles lircd-0.9.1a[1523]: removed client
May  1 04:45:37 piles lircd-0.9.1a[1523]: caught signal
May  1 04:45:37 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:45:37 piles lircd-0.9.1a[1579]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:45:40 piles lircd-0.9.1a[1579]: accepted new client on 
/var/run/lirc/lircd
May  1 04:45:40 piles kernel: [  324.209023] JDB: ktime_to_us: -20443355 -> 
divns 18446744073689108 != old method: -20444
May  1 04:46:12 piles lircd-0.9.1a[1579]: removed client
May  1 04:46:12 piles lircd-0.9.1a[1579]: caught signal
May  1 04:46:12 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:46:12 piles lircd-0.9.1a[1597]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:46:12 piles lircd-0.9.1a[1597]: accepted new client on 
/var/run/lirc/lircd
May  1 04:46:12 piles kernel: [  356.838029] JDB: ktime_to_us: -20157485 -> 
divns 18446744073689394 != old method: -20158

The last 2-3 or 3 groups of output I could produce on demand by stopping 
mythbackend and running:
systemctl restart lircd.service ; irsend SEND_ONCE dct700 info

Subsequent irsends don't trigger the bug, since (as I found out a
while ago) by that point lircd is "hung", at least for a long while.
Hey!  Maybe lircd is then hung for 18446744073689394 us or ns :-)
If this result is used as a delay timer, the negative would produce
0 delay, and the + number the "hang".  I calculate that hang is 584
years?  :-)

So it looks like maybe my theory wasn't so wacky: we're dealing
with a caller passing negative numbers (or 32/64 weirdness).  Very
strange as it seems the caller *wants* (or is happy with) negative
numbers!

Let me know if you need any more debugging/patch-tests.  But give
me 4+ hours between rpmbuilds (probably my responses will be 24-hr
later).

Thanks a million!
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread John Stultz
On Fri, May 1, 2015 at 3:02 AM, Trevor Cordes tre...@tecnopolis.ca wrote:
 May  1 04:46:12 piles kernel: [  356.838029] JDB: ktime_to_us: -20157485 - 
 divns 18446744073689394 != old method: -20158

 The last 2-3 or 3 groups of output I could produce on demand by stopping 
 mythbackend and running:
 systemctl restart lircd.service ; irsend SEND_ONCE dct700 info

 Subsequent irsends don't trigger the bug, since (as I found out a
 while ago) by that point lircd is hung, at least for a long while.
 Hey!  Maybe lircd is then hung for 18446744073689394 us or ns :-)
 If this result is used as a delay timer, the negative would produce
 0 delay, and the + number the hang.  I calculate that hang is 584
 years?  :-)

 So it looks like maybe my theory wasn't so wacky: we're dealing
 with a caller passing negative numbers (or 32/64 weirdness).  Very
 strange as it seems the caller *wants* (or is happy with) negative
 numbers!

So yea. I see the issue now. The ktime_divns() function uses do_div
which is an unsigned divide. So negative numbers look like large
numbers, and it doesn't do sign-extension, so when it divides we lose
the sign bit.

ktimes are normally positive values, so this probably isn't something
its run-across often, thus why you've just noticed. It does make me
suspicious that the driver is doing something funny still, but I've
gone ahead and written up a patch to fix this since this is clearly an
issue here.

I'll send it out here shortly. If you could give it a spin at your
leisure, and if it works give me a Tested-by: tag I'd appreciate it!

Great work again on chasing this down, and thanks for helping with
debugging and validating the fix!
-john
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread John Stultz
On Fri, May 1, 2015 at 11:29 AM, Nicolas Pitre nicolas.pi...@linaro.org wrote:
 On Fri, 1 May 2015, John Stultz wrote:

 So yea. I see the issue now. The ktime_divns() function uses do_div
 which is an unsigned divide. So negative numbers look like large
 numbers, and it doesn't do sign-extension, so when it divides we lose
 the sign bit.

 That doesn't stop there. On 32-bit systems, if the divisor is not
 constant or too large, then __ktime_divns() is used. If you look at the
 code for __ktime_divns(), you'll notice it would get into an infinite
 loop if you pass it a negative divisor.

True. Though while negative ktime intervals are mostly rare, I suspect
we probably don't have any negative divisors for now. Even so, it
seems like something that could easily get someones foot shot.

Is there any sort of compiler magic to generate warnings if folks are
passing signed types to the divisor?

I'll be sending my first rfc of the fix here shortly. Additional
feedback there would be appreciated!

thanks
-john
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread Nicolas Pitre
On Fri, 1 May 2015, John Stultz wrote:

 So yea. I see the issue now. The ktime_divns() function uses do_div
 which is an unsigned divide. So negative numbers look like large
 numbers, and it doesn't do sign-extension, so when it divides we lose
 the sign bit.

That doesn't stop there. On 32-bit systems, if the divisor is not 
constant or too large, then __ktime_divns() is used. If you look at the 
code for __ktime_divns(), you'll notice it would get into an infinite 
loop if you pass it a negative divisor.


Nicolas
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-05-01 Thread Trevor Cordes
On 2015-04-30 John Stultz wrote:
 From your description it does seem like some sort of edge case
 problem
 w/ the 32bit ktime_divns(), but I don't see it right off, and I agree
 with Alan to do both calculations and print out warn when that
 happens.
 
 There's also not a ton of users of that function, but ktime_us_delta()
 is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in
 ir_lirc_transmit_ir().
 
 We should instrument that to see if its calculating negative deltas.

Thanks for looking at this!  I didn't have the know-how to add the
debug code myself (I was scared this fn might be called a zillion times
in other kernel operations and overload the system with debug logging).

 I'll send you a debug patch to do the above.

Got the patch.  Sorry for the delay (takes hours to compile using
rpmbuild on this old P4 box).

I think I have success in terms of useful debug output (relevant
lines only):

May  1 04:41:08 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:41:08 piles lircd-0.9.1a[978]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:41:11 piles lircd-0.9.1a[978]: accepted new client on 
/var/run/lirc/lircd
May  1 04:41:11 piles kernel: [   55.265023] JDB: ktime_to_us: -20782699 - 
divns 18446744073688768 != old method: -20783
May  1 04:44:00 piles lircd-0.9.1a[978]: removed client
May  1 04:44:00 piles lircd-0.9.1a[978]: caught signal
May  1 04:44:00 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:44:00 piles lircd-0.9.1a[1523]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:45:03 piles lircd-0.9.1a[1523]: accepted new client on 
/var/run/lirc/lircd
May  1 04:45:03 piles kernel: [  287.445027] JDB: ktime_to_us: -20599906 - 
divns 18446744073688951 != old method: -20600
May  1 04:45:37 piles lircd-0.9.1a[1523]: removed client
May  1 04:45:37 piles lircd-0.9.1a[1523]: caught signal
May  1 04:45:37 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:45:37 piles lircd-0.9.1a[1579]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:45:40 piles lircd-0.9.1a[1579]: accepted new client on 
/var/run/lirc/lircd
May  1 04:45:40 piles kernel: [  324.209023] JDB: ktime_to_us: -20443355 - 
divns 18446744073689108 != old method: -20444
May  1 04:46:12 piles lircd-0.9.1a[1579]: removed client
May  1 04:46:12 piles lircd-0.9.1a[1579]: caught signal
May  1 04:46:12 piles lircd: 'lirc' written to protocols file 
/sys/class/rc/rc0/protocols
May  1 04:46:12 piles lircd-0.9.1a[1597]: lircd(default) ready, using 
/var/run/lirc/lircd
May  1 04:46:12 piles lircd-0.9.1a[1597]: accepted new client on 
/var/run/lirc/lircd
May  1 04:46:12 piles kernel: [  356.838029] JDB: ktime_to_us: -20157485 - 
divns 18446744073689394 != old method: -20158

The last 2-3 or 3 groups of output I could produce on demand by stopping 
mythbackend and running:
systemctl restart lircd.service ; irsend SEND_ONCE dct700 info

Subsequent irsends don't trigger the bug, since (as I found out a
while ago) by that point lircd is hung, at least for a long while.
Hey!  Maybe lircd is then hung for 18446744073689394 us or ns :-)
If this result is used as a delay timer, the negative would produce
0 delay, and the + number the hang.  I calculate that hang is 584
years?  :-)

So it looks like maybe my theory wasn't so wacky: we're dealing
with a caller passing negative numbers (or 32/64 weirdness).  Very
strange as it seems the caller *wants* (or is happy with) negative
numbers!

Let me know if you need any more debugging/patch-tests.  But give
me 4+ hours between rpmbuilds (probably my responses will be 24-hr
later).

Thanks a million!
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-30 Thread John Stultz
On Wed, Apr 29, 2015 at 9:27 PM, Trevor Cordes  wrote:
> Sorry for the top-posting; Josh Boyer suggested I re-mail this mail
> from last month which didn't get any replies.  I'm still having this
> weird kernel bug affecting me and I've bisected it down to like 2-4
> lines of code.  (I've thought more about my theory regarding
> unsigned/signed below and it's probably wrong, so ignore my
> prognosticating.)  Please see my rhbz link near the bottom for the full
> details.

Thanks so much for the report and all the effort to chase down this regression!

>From your description it does seem like some sort of edge case problem
w/ the 32bit ktime_divns(), but I don't see it right off, and I agree
with Alan to do both calculations and print out warn when that
happens.

There's also not a ton of users of that function, but ktime_us_delta()
is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in
ir_lirc_transmit_ir().

We should instrument that to see if its calculating negative deltas.

I'll send you a debug patch to do the above.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-30 Thread One Thousand Gnomes
> > Obviously my patch isn't the real solution, the real solution is to
> > make the new function calls use a consistent 64-bit type, or figure
> > out what in my code path is calling these functions and check it for
> > value sanity.

I would be tempted to do both computations and print then inputs as well
as do a WARN_ON when they differ. That will give you a stack trace of the
offender, and providing you return the "correct" value you can see how it
continues functioning afterwards.

Alan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-30 Thread One Thousand Gnomes
  Obviously my patch isn't the real solution, the real solution is to
  make the new function calls use a consistent 64-bit type, or figure
  out what in my code path is calling these functions and check it for
  value sanity.

I would be tempted to do both computations and print then inputs as well
as do a WARN_ON when they differ. That will give you a stack trace of the
offender, and providing you return the correct value you can see how it
continues functioning afterwards.

Alan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-30 Thread John Stultz
On Wed, Apr 29, 2015 at 9:27 PM, Trevor Cordes tre...@tecnopolis.ca wrote:
 Sorry for the top-posting; Josh Boyer suggested I re-mail this mail
 from last month which didn't get any replies.  I'm still having this
 weird kernel bug affecting me and I've bisected it down to like 2-4
 lines of code.  (I've thought more about my theory regarding
 unsigned/signed below and it's probably wrong, so ignore my
 prognosticating.)  Please see my rhbz link near the bottom for the full
 details.

Thanks so much for the report and all the effort to chase down this regression!

From your description it does seem like some sort of edge case problem
w/ the 32bit ktime_divns(), but I don't see it right off, and I agree
with Alan to do both calculations and print out warn when that
happens.

There's also not a ton of users of that function, but ktime_us_delta()
is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in
ir_lirc_transmit_ir().

We should instrument that to see if its calculating negative deltas.

I'll send you a debug patch to do the above.

thanks
-john
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-29 Thread Trevor Cordes
Sorry for the top-posting; Josh Boyer suggested I re-mail this mail
from last month which didn't get any replies.  I'm still having this
weird kernel bug affecting me and I've bisected it down to like 2-4
lines of code.  (I've thought more about my theory regarding
unsigned/signed below and it's probably wrong, so ignore my
prognosticating.)  Please see my rhbz link near the bottom for the full
details.

Thanks everyone!

On 2015-03-23 Trevor Cordes wrote:
> Hello everyone, this is my first attempt at bisecting a kernel to
> solve a bug.  Please bear with me.
> 
> I have successfully bisected and located a commit that is causing my 
> problem.  Look at commit 166afb64511.
> 
> ktime_to_us returns s64, but the commit changes it so ktime_to_us
> just returns what ktime_divns returns, and ktime_divns returns a
> u64!  If the u64 is big enough, wouldn't it wrap s64 around to a
> negative number?  Or, perhaps if some caller is passing in negative
> ktime_t to begin with it will trigger without having to hit big
> numbers.  With my limited knowledge of C, I am stabbing in the dark
> here.
> 
> That's just my guess as to why this commit causes my problem.  My bug 
> symptom is my previously working MythTV lirc blaster no longer
> reliably sends IR signals.  Using irsend to test I can see irsend is
> just timing out (and only sometimes blasts, usually the first
> attempt).  On good kernels it returns immediately after blasting.
> 
> This little patch (at bottom of email) that puts the code back in
> place and gets rid of the function call fixes the problem for me.  I
> applied this patch to the very latest FC21
> kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is
> gone!  I can once again MythTV.  Hooray.
> 
> I suspect no one else is seeing this because less people are running 
> 32-bit now, and perhaps in most code paths the value of the u64 never
> gets above 2^63.  I suspect something in drivers/media (possibly) is
> passing very high or negative values (possibly another bug) to these
> calls.
> 
> Obviously my patch isn't the real solution, the real solution is to
> make the new function calls use a consistent 64-bit type, or figure
> out what in my code path is calling these functions and check it for
> value sanity.
> 
> I've documented the whole process / details of this bug in RHBZ:
> https://bugzilla.redhat.com/show_bug.cgi?id=1200353
> 
> Thanks!
> 
> diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h
> --- a/include/linux/ktime.h   2015-02-08 20:54:22.0 -0600
> +++ b/include/linux/ktime.h   2015-03-23 01:09:43.0 -0500
> @@ -173,12 +173,16 @@
>  
>  static inline s64 ktime_to_us(const ktime_t kt)
>  {
> - return ktime_divns(kt, NSEC_PER_USEC);
> +/*   return ktime_divns(kt, NSEC_PER_USEC); */
> + struct timeval tv = ktime_to_timeval(kt);
> + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
>  }
>  
>  static inline s64 ktime_to_ms(const ktime_t kt)
>  {
> - return ktime_divns(kt, NSEC_PER_MSEC);
> +/*   return ktime_divns(kt, NSEC_PER_MSEC); */
> + struct timeval tv = ktime_to_timeval(kt);
> + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec /
> USEC_PER_MSEC; }
>  
>  static inline s64 ktime_us_delta(const ktime_t later, const ktime_t
> earlier)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-04-29 Thread Trevor Cordes
Sorry for the top-posting; Josh Boyer suggested I re-mail this mail
from last month which didn't get any replies.  I'm still having this
weird kernel bug affecting me and I've bisected it down to like 2-4
lines of code.  (I've thought more about my theory regarding
unsigned/signed below and it's probably wrong, so ignore my
prognosticating.)  Please see my rhbz link near the bottom for the full
details.

Thanks everyone!

On 2015-03-23 Trevor Cordes wrote:
 Hello everyone, this is my first attempt at bisecting a kernel to
 solve a bug.  Please bear with me.
 
 I have successfully bisected and located a commit that is causing my 
 problem.  Look at commit 166afb64511.
 
 ktime_to_us returns s64, but the commit changes it so ktime_to_us
 just returns what ktime_divns returns, and ktime_divns returns a
 u64!  If the u64 is big enough, wouldn't it wrap s64 around to a
 negative number?  Or, perhaps if some caller is passing in negative
 ktime_t to begin with it will trigger without having to hit big
 numbers.  With my limited knowledge of C, I am stabbing in the dark
 here.
 
 That's just my guess as to why this commit causes my problem.  My bug 
 symptom is my previously working MythTV lirc blaster no longer
 reliably sends IR signals.  Using irsend to test I can see irsend is
 just timing out (and only sometimes blasts, usually the first
 attempt).  On good kernels it returns immediately after blasting.
 
 This little patch (at bottom of email) that puts the code back in
 place and gets rid of the function call fixes the problem for me.  I
 applied this patch to the very latest FC21
 kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is
 gone!  I can once again MythTV.  Hooray.
 
 I suspect no one else is seeing this because less people are running 
 32-bit now, and perhaps in most code paths the value of the u64 never
 gets above 2^63.  I suspect something in drivers/media (possibly) is
 passing very high or negative values (possibly another bug) to these
 calls.
 
 Obviously my patch isn't the real solution, the real solution is to
 make the new function calls use a consistent 64-bit type, or figure
 out what in my code path is calling these functions and check it for
 value sanity.
 
 I've documented the whole process / details of this bug in RHBZ:
 https://bugzilla.redhat.com/show_bug.cgi?id=1200353
 
 Thanks!
 
 diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h
 --- a/include/linux/ktime.h   2015-02-08 20:54:22.0 -0600
 +++ b/include/linux/ktime.h   2015-03-23 01:09:43.0 -0500
 @@ -173,12 +173,16 @@
  
  static inline s64 ktime_to_us(const ktime_t kt)
  {
 - return ktime_divns(kt, NSEC_PER_USEC);
 +/*   return ktime_divns(kt, NSEC_PER_USEC); */
 + struct timeval tv = ktime_to_timeval(kt);
 + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
  }
  
  static inline s64 ktime_to_ms(const ktime_t kt)
  {
 - return ktime_divns(kt, NSEC_PER_MSEC);
 +/*   return ktime_divns(kt, NSEC_PER_MSEC); */
 + struct timeval tv = ktime_to_timeval(kt);
 + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec /
 USEC_PER_MSEC; }
  
  static inline s64 ktime_us_delta(const ktime_t later, const ktime_t
 earlier)

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-03-23 Thread Trevor Cordes
Hello everyone, this is my first attempt at bisecting a kernel to solve a 
bug.  Please bear with me.

I have successfully bisected and located a commit that is causing my 
problem.  Look at commit 166afb64511.

ktime_to_us returns s64, but the commit changes it so ktime_to_us just 
returns what ktime_divns returns, and ktime_divns returns a u64!  If the 
u64 is big enough, wouldn't it wrap s64 around to a negative number?  Or, 
perhaps if some caller is passing in negative ktime_t to begin with it 
will trigger without having to hit big numbers.  With my limited knowledge 
of C, I am stabbing in the dark here.

That's just my guess as to why this commit causes my problem.  My bug 
symptom is my previously working MythTV lirc blaster no longer reliably 
sends IR signals.  Using irsend to test I can see irsend is just timing 
out (and only sometimes blasts, usually the first attempt).  On good 
kernels it returns immediately after blasting.

This little patch (at bottom of email) that puts the code back in place 
and gets rid of the function call fixes the problem for me.  I applied 
this patch to the very latest FC21 kernel-PAE-3.19.1-201.fc21.i686 src.rpm 
and rpmbuilded and the bug is gone!  I can once again MythTV.  Hooray.

I suspect no one else is seeing this because less people are running 
32-bit now, and perhaps in most code paths the value of the u64 never gets 
above 2^63.  I suspect something in drivers/media (possibly) is passing 
very high or negative values (possibly another bug) to these calls.

Obviously my patch isn't the real solution, the real solution is to make 
the new function calls use a consistent 64-bit type, or figure out what in 
my code path is calling these functions and check it for value sanity.

I've documented the whole process / details of this bug in RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1200353

Thanks!

diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h
--- a/include/linux/ktime.h 2015-02-08 20:54:22.0 -0600
+++ b/include/linux/ktime.h 2015-03-23 01:09:43.0 -0500
@@ -173,12 +173,16 @@
 
 static inline s64 ktime_to_us(const ktime_t kt)
 {
-   return ktime_divns(kt, NSEC_PER_USEC);
+/* return ktime_divns(kt, NSEC_PER_USEC); */
+   struct timeval tv = ktime_to_timeval(kt);
+   return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
 }
 
 static inline s64 ktime_to_ms(const ktime_t kt)
 {
-   return ktime_divns(kt, NSEC_PER_MSEC);
+/* return ktime_divns(kt, NSEC_PER_MSEC); */
+   struct timeval tv = ktime_to_timeval(kt);
+   return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / USEC_PER_MSEC;
 }
 
 static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511

2015-03-23 Thread Trevor Cordes
Hello everyone, this is my first attempt at bisecting a kernel to solve a 
bug.  Please bear with me.

I have successfully bisected and located a commit that is causing my 
problem.  Look at commit 166afb64511.

ktime_to_us returns s64, but the commit changes it so ktime_to_us just 
returns what ktime_divns returns, and ktime_divns returns a u64!  If the 
u64 is big enough, wouldn't it wrap s64 around to a negative number?  Or, 
perhaps if some caller is passing in negative ktime_t to begin with it 
will trigger without having to hit big numbers.  With my limited knowledge 
of C, I am stabbing in the dark here.

That's just my guess as to why this commit causes my problem.  My bug 
symptom is my previously working MythTV lirc blaster no longer reliably 
sends IR signals.  Using irsend to test I can see irsend is just timing 
out (and only sometimes blasts, usually the first attempt).  On good 
kernels it returns immediately after blasting.

This little patch (at bottom of email) that puts the code back in place 
and gets rid of the function call fixes the problem for me.  I applied 
this patch to the very latest FC21 kernel-PAE-3.19.1-201.fc21.i686 src.rpm 
and rpmbuilded and the bug is gone!  I can once again MythTV.  Hooray.

I suspect no one else is seeing this because less people are running 
32-bit now, and perhaps in most code paths the value of the u64 never gets 
above 2^63.  I suspect something in drivers/media (possibly) is passing 
very high or negative values (possibly another bug) to these calls.

Obviously my patch isn't the real solution, the real solution is to make 
the new function calls use a consistent 64-bit type, or figure out what in 
my code path is calling these functions and check it for value sanity.

I've documented the whole process / details of this bug in RHBZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1200353

Thanks!

diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h
--- a/include/linux/ktime.h 2015-02-08 20:54:22.0 -0600
+++ b/include/linux/ktime.h 2015-03-23 01:09:43.0 -0500
@@ -173,12 +173,16 @@
 
 static inline s64 ktime_to_us(const ktime_t kt)
 {
-   return ktime_divns(kt, NSEC_PER_USEC);
+/* return ktime_divns(kt, NSEC_PER_USEC); */
+   struct timeval tv = ktime_to_timeval(kt);
+   return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec;
 }
 
 static inline s64 ktime_to_ms(const ktime_t kt)
 {
-   return ktime_divns(kt, NSEC_PER_MSEC);
+/* return ktime_divns(kt, NSEC_PER_MSEC); */
+   struct timeval tv = ktime_to_timeval(kt);
+   return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / USEC_PER_MSEC;
 }
 
 static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier)
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/