Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Fri, May 1, 2015 at 11:29 AM, Nicolas Pitre wrote: > On Fri, 1 May 2015, John Stultz wrote: > >> So yea. I see the issue now. The ktime_divns() function uses do_div >> which is an unsigned divide. So negative numbers look like large >> numbers, and it doesn't do sign-extension, so when it divides we lose >> the sign bit. > > That doesn't stop there. On 32-bit systems, if the divisor is not > constant or too large, then __ktime_divns() is used. If you look at the > code for __ktime_divns(), you'll notice it would get into an infinite > loop if you pass it a negative divisor. True. Though while negative ktime intervals are mostly rare, I suspect we probably don't have any negative divisors for now. Even so, it seems like something that could easily get someones foot shot. Is there any sort of compiler magic to generate warnings if folks are passing signed types to the divisor? I'll be sending my first rfc of the fix here shortly. Additional feedback there would be appreciated! thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Fri, 1 May 2015, John Stultz wrote: > So yea. I see the issue now. The ktime_divns() function uses do_div > which is an unsigned divide. So negative numbers look like large > numbers, and it doesn't do sign-extension, so when it divides we lose > the sign bit. That doesn't stop there. On 32-bit systems, if the divisor is not constant or too large, then __ktime_divns() is used. If you look at the code for __ktime_divns(), you'll notice it would get into an infinite loop if you pass it a negative divisor. Nicolas -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Fri, May 1, 2015 at 3:02 AM, Trevor Cordes wrote: > May 1 04:46:12 piles kernel: [ 356.838029] JDB: ktime_to_us: -20157485 -> > divns 18446744073689394 != old method: -20158 > > The last 2-3 or 3 groups of output I could produce on demand by stopping > mythbackend and running: > systemctl restart lircd.service ; irsend SEND_ONCE dct700 info > > Subsequent irsends don't trigger the bug, since (as I found out a > while ago) by that point lircd is "hung", at least for a long while. > Hey! Maybe lircd is then hung for 18446744073689394 us or ns :-) > If this result is used as a delay timer, the negative would produce > 0 delay, and the + number the "hang". I calculate that hang is 584 > years? :-) > > So it looks like maybe my theory wasn't so wacky: we're dealing > with a caller passing negative numbers (or 32/64 weirdness). Very > strange as it seems the caller *wants* (or is happy with) negative > numbers! So yea. I see the issue now. The ktime_divns() function uses do_div which is an unsigned divide. So negative numbers look like large numbers, and it doesn't do sign-extension, so when it divides we lose the sign bit. ktimes are normally positive values, so this probably isn't something its run-across often, thus why you've just noticed. It does make me suspicious that the driver is doing something funny still, but I've gone ahead and written up a patch to fix this since this is clearly an issue here. I'll send it out here shortly. If you could give it a spin at your leisure, and if it works give me a Tested-by: tag I'd appreciate it! Great work again on chasing this down, and thanks for helping with debugging and validating the fix! -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On 2015-04-30 John Stultz wrote: > >From your description it does seem like some sort of edge case > >problem > w/ the 32bit ktime_divns(), but I don't see it right off, and I agree > with Alan to do both calculations and print out warn when that > happens. > > There's also not a ton of users of that function, but ktime_us_delta() > is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in > ir_lirc_transmit_ir(). > > We should instrument that to see if its calculating negative deltas. Thanks for looking at this! I didn't have the know-how to add the debug code myself (I was scared this fn might be called a zillion times in other kernel operations and overload the system with debug logging). > I'll send you a debug patch to do the above. Got the patch. Sorry for the delay (takes hours to compile using rpmbuild on this old P4 box). I think I have "success" in terms of useful debug output (relevant lines only): May 1 04:41:08 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:41:08 piles lircd-0.9.1a[978]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:41:11 piles lircd-0.9.1a[978]: accepted new client on /var/run/lirc/lircd May 1 04:41:11 piles kernel: [ 55.265023] JDB: ktime_to_us: -20782699 -> divns 18446744073688768 != old method: -20783 May 1 04:44:00 piles lircd-0.9.1a[978]: removed client May 1 04:44:00 piles lircd-0.9.1a[978]: caught signal May 1 04:44:00 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:44:00 piles lircd-0.9.1a[1523]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:45:03 piles lircd-0.9.1a[1523]: accepted new client on /var/run/lirc/lircd May 1 04:45:03 piles kernel: [ 287.445027] JDB: ktime_to_us: -20599906 -> divns 18446744073688951 != old method: -20600 May 1 04:45:37 piles lircd-0.9.1a[1523]: removed client May 1 04:45:37 piles lircd-0.9.1a[1523]: caught signal May 1 04:45:37 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:45:37 piles lircd-0.9.1a[1579]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:45:40 piles lircd-0.9.1a[1579]: accepted new client on /var/run/lirc/lircd May 1 04:45:40 piles kernel: [ 324.209023] JDB: ktime_to_us: -20443355 -> divns 18446744073689108 != old method: -20444 May 1 04:46:12 piles lircd-0.9.1a[1579]: removed client May 1 04:46:12 piles lircd-0.9.1a[1579]: caught signal May 1 04:46:12 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:46:12 piles lircd-0.9.1a[1597]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:46:12 piles lircd-0.9.1a[1597]: accepted new client on /var/run/lirc/lircd May 1 04:46:12 piles kernel: [ 356.838029] JDB: ktime_to_us: -20157485 -> divns 18446744073689394 != old method: -20158 The last 2-3 or 3 groups of output I could produce on demand by stopping mythbackend and running: systemctl restart lircd.service ; irsend SEND_ONCE dct700 info Subsequent irsends don't trigger the bug, since (as I found out a while ago) by that point lircd is "hung", at least for a long while. Hey! Maybe lircd is then hung for 18446744073689394 us or ns :-) If this result is used as a delay timer, the negative would produce 0 delay, and the + number the "hang". I calculate that hang is 584 years? :-) So it looks like maybe my theory wasn't so wacky: we're dealing with a caller passing negative numbers (or 32/64 weirdness). Very strange as it seems the caller *wants* (or is happy with) negative numbers! Let me know if you need any more debugging/patch-tests. But give me 4+ hours between rpmbuilds (probably my responses will be 24-hr later). Thanks a million! -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Fri, May 1, 2015 at 3:02 AM, Trevor Cordes tre...@tecnopolis.ca wrote: May 1 04:46:12 piles kernel: [ 356.838029] JDB: ktime_to_us: -20157485 - divns 18446744073689394 != old method: -20158 The last 2-3 or 3 groups of output I could produce on demand by stopping mythbackend and running: systemctl restart lircd.service ; irsend SEND_ONCE dct700 info Subsequent irsends don't trigger the bug, since (as I found out a while ago) by that point lircd is hung, at least for a long while. Hey! Maybe lircd is then hung for 18446744073689394 us or ns :-) If this result is used as a delay timer, the negative would produce 0 delay, and the + number the hang. I calculate that hang is 584 years? :-) So it looks like maybe my theory wasn't so wacky: we're dealing with a caller passing negative numbers (or 32/64 weirdness). Very strange as it seems the caller *wants* (or is happy with) negative numbers! So yea. I see the issue now. The ktime_divns() function uses do_div which is an unsigned divide. So negative numbers look like large numbers, and it doesn't do sign-extension, so when it divides we lose the sign bit. ktimes are normally positive values, so this probably isn't something its run-across often, thus why you've just noticed. It does make me suspicious that the driver is doing something funny still, but I've gone ahead and written up a patch to fix this since this is clearly an issue here. I'll send it out here shortly. If you could give it a spin at your leisure, and if it works give me a Tested-by: tag I'd appreciate it! Great work again on chasing this down, and thanks for helping with debugging and validating the fix! -john -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Fri, May 1, 2015 at 11:29 AM, Nicolas Pitre nicolas.pi...@linaro.org wrote: On Fri, 1 May 2015, John Stultz wrote: So yea. I see the issue now. The ktime_divns() function uses do_div which is an unsigned divide. So negative numbers look like large numbers, and it doesn't do sign-extension, so when it divides we lose the sign bit. That doesn't stop there. On 32-bit systems, if the divisor is not constant or too large, then __ktime_divns() is used. If you look at the code for __ktime_divns(), you'll notice it would get into an infinite loop if you pass it a negative divisor. True. Though while negative ktime intervals are mostly rare, I suspect we probably don't have any negative divisors for now. Even so, it seems like something that could easily get someones foot shot. Is there any sort of compiler magic to generate warnings if folks are passing signed types to the divisor? I'll be sending my first rfc of the fix here shortly. Additional feedback there would be appreciated! thanks -john -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Fri, 1 May 2015, John Stultz wrote: So yea. I see the issue now. The ktime_divns() function uses do_div which is an unsigned divide. So negative numbers look like large numbers, and it doesn't do sign-extension, so when it divides we lose the sign bit. That doesn't stop there. On 32-bit systems, if the divisor is not constant or too large, then __ktime_divns() is used. If you look at the code for __ktime_divns(), you'll notice it would get into an infinite loop if you pass it a negative divisor. Nicolas -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On 2015-04-30 John Stultz wrote: From your description it does seem like some sort of edge case problem w/ the 32bit ktime_divns(), but I don't see it right off, and I agree with Alan to do both calculations and print out warn when that happens. There's also not a ton of users of that function, but ktime_us_delta() is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in ir_lirc_transmit_ir(). We should instrument that to see if its calculating negative deltas. Thanks for looking at this! I didn't have the know-how to add the debug code myself (I was scared this fn might be called a zillion times in other kernel operations and overload the system with debug logging). I'll send you a debug patch to do the above. Got the patch. Sorry for the delay (takes hours to compile using rpmbuild on this old P4 box). I think I have success in terms of useful debug output (relevant lines only): May 1 04:41:08 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:41:08 piles lircd-0.9.1a[978]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:41:11 piles lircd-0.9.1a[978]: accepted new client on /var/run/lirc/lircd May 1 04:41:11 piles kernel: [ 55.265023] JDB: ktime_to_us: -20782699 - divns 18446744073688768 != old method: -20783 May 1 04:44:00 piles lircd-0.9.1a[978]: removed client May 1 04:44:00 piles lircd-0.9.1a[978]: caught signal May 1 04:44:00 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:44:00 piles lircd-0.9.1a[1523]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:45:03 piles lircd-0.9.1a[1523]: accepted new client on /var/run/lirc/lircd May 1 04:45:03 piles kernel: [ 287.445027] JDB: ktime_to_us: -20599906 - divns 18446744073688951 != old method: -20600 May 1 04:45:37 piles lircd-0.9.1a[1523]: removed client May 1 04:45:37 piles lircd-0.9.1a[1523]: caught signal May 1 04:45:37 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:45:37 piles lircd-0.9.1a[1579]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:45:40 piles lircd-0.9.1a[1579]: accepted new client on /var/run/lirc/lircd May 1 04:45:40 piles kernel: [ 324.209023] JDB: ktime_to_us: -20443355 - divns 18446744073689108 != old method: -20444 May 1 04:46:12 piles lircd-0.9.1a[1579]: removed client May 1 04:46:12 piles lircd-0.9.1a[1579]: caught signal May 1 04:46:12 piles lircd: 'lirc' written to protocols file /sys/class/rc/rc0/protocols May 1 04:46:12 piles lircd-0.9.1a[1597]: lircd(default) ready, using /var/run/lirc/lircd May 1 04:46:12 piles lircd-0.9.1a[1597]: accepted new client on /var/run/lirc/lircd May 1 04:46:12 piles kernel: [ 356.838029] JDB: ktime_to_us: -20157485 - divns 18446744073689394 != old method: -20158 The last 2-3 or 3 groups of output I could produce on demand by stopping mythbackend and running: systemctl restart lircd.service ; irsend SEND_ONCE dct700 info Subsequent irsends don't trigger the bug, since (as I found out a while ago) by that point lircd is hung, at least for a long while. Hey! Maybe lircd is then hung for 18446744073689394 us or ns :-) If this result is used as a delay timer, the negative would produce 0 delay, and the + number the hang. I calculate that hang is 584 years? :-) So it looks like maybe my theory wasn't so wacky: we're dealing with a caller passing negative numbers (or 32/64 weirdness). Very strange as it seems the caller *wants* (or is happy with) negative numbers! Let me know if you need any more debugging/patch-tests. But give me 4+ hours between rpmbuilds (probably my responses will be 24-hr later). Thanks a million! -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Wed, Apr 29, 2015 at 9:27 PM, Trevor Cordes wrote: > Sorry for the top-posting; Josh Boyer suggested I re-mail this mail > from last month which didn't get any replies. I'm still having this > weird kernel bug affecting me and I've bisected it down to like 2-4 > lines of code. (I've thought more about my theory regarding > unsigned/signed below and it's probably wrong, so ignore my > prognosticating.) Please see my rhbz link near the bottom for the full > details. Thanks so much for the report and all the effort to chase down this regression! >From your description it does seem like some sort of edge case problem w/ the 32bit ktime_divns(), but I don't see it right off, and I agree with Alan to do both calculations and print out warn when that happens. There's also not a ton of users of that function, but ktime_us_delta() is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in ir_lirc_transmit_ir(). We should instrument that to see if its calculating negative deltas. I'll send you a debug patch to do the above. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
> > Obviously my patch isn't the real solution, the real solution is to > > make the new function calls use a consistent 64-bit type, or figure > > out what in my code path is calling these functions and check it for > > value sanity. I would be tempted to do both computations and print then inputs as well as do a WARN_ON when they differ. That will give you a stack trace of the offender, and providing you return the "correct" value you can see how it continues functioning afterwards. Alan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
Obviously my patch isn't the real solution, the real solution is to make the new function calls use a consistent 64-bit type, or figure out what in my code path is calling these functions and check it for value sanity. I would be tempted to do both computations and print then inputs as well as do a WARN_ON when they differ. That will give you a stack trace of the offender, and providing you return the correct value you can see how it continues functioning afterwards. Alan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
On Wed, Apr 29, 2015 at 9:27 PM, Trevor Cordes tre...@tecnopolis.ca wrote: Sorry for the top-posting; Josh Boyer suggested I re-mail this mail from last month which didn't get any replies. I'm still having this weird kernel bug affecting me and I've bisected it down to like 2-4 lines of code. (I've thought more about my theory regarding unsigned/signed below and it's probably wrong, so ignore my prognosticating.) Please see my rhbz link near the bottom for the full details. Thanks so much for the report and all the effort to chase down this regression! From your description it does seem like some sort of edge case problem w/ the 32bit ktime_divns(), but I don't see it right off, and I agree with Alan to do both calculations and print out warn when that happens. There's also not a ton of users of that function, but ktime_us_delta() is used in drivers/media/rc/ir-lirc-codec.c, which makes use of it in ir_lirc_transmit_ir(). We should instrument that to see if its calculating negative deltas. I'll send you a debug patch to do the above. thanks -john -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
Sorry for the top-posting; Josh Boyer suggested I re-mail this mail from last month which didn't get any replies. I'm still having this weird kernel bug affecting me and I've bisected it down to like 2-4 lines of code. (I've thought more about my theory regarding unsigned/signed below and it's probably wrong, so ignore my prognosticating.) Please see my rhbz link near the bottom for the full details. Thanks everyone! On 2015-03-23 Trevor Cordes wrote: > Hello everyone, this is my first attempt at bisecting a kernel to > solve a bug. Please bear with me. > > I have successfully bisected and located a commit that is causing my > problem. Look at commit 166afb64511. > > ktime_to_us returns s64, but the commit changes it so ktime_to_us > just returns what ktime_divns returns, and ktime_divns returns a > u64! If the u64 is big enough, wouldn't it wrap s64 around to a > negative number? Or, perhaps if some caller is passing in negative > ktime_t to begin with it will trigger without having to hit big > numbers. With my limited knowledge of C, I am stabbing in the dark > here. > > That's just my guess as to why this commit causes my problem. My bug > symptom is my previously working MythTV lirc blaster no longer > reliably sends IR signals. Using irsend to test I can see irsend is > just timing out (and only sometimes blasts, usually the first > attempt). On good kernels it returns immediately after blasting. > > This little patch (at bottom of email) that puts the code back in > place and gets rid of the function call fixes the problem for me. I > applied this patch to the very latest FC21 > kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is > gone! I can once again MythTV. Hooray. > > I suspect no one else is seeing this because less people are running > 32-bit now, and perhaps in most code paths the value of the u64 never > gets above 2^63. I suspect something in drivers/media (possibly) is > passing very high or negative values (possibly another bug) to these > calls. > > Obviously my patch isn't the real solution, the real solution is to > make the new function calls use a consistent 64-bit type, or figure > out what in my code path is calling these functions and check it for > value sanity. > > I've documented the whole process / details of this bug in RHBZ: > https://bugzilla.redhat.com/show_bug.cgi?id=1200353 > > Thanks! > > diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h > --- a/include/linux/ktime.h 2015-02-08 20:54:22.0 -0600 > +++ b/include/linux/ktime.h 2015-03-23 01:09:43.0 -0500 > @@ -173,12 +173,16 @@ > > static inline s64 ktime_to_us(const ktime_t kt) > { > - return ktime_divns(kt, NSEC_PER_USEC); > +/* return ktime_divns(kt, NSEC_PER_USEC); */ > + struct timeval tv = ktime_to_timeval(kt); > + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec; > } > > static inline s64 ktime_to_ms(const ktime_t kt) > { > - return ktime_divns(kt, NSEC_PER_MSEC); > +/* return ktime_divns(kt, NSEC_PER_MSEC); */ > + struct timeval tv = ktime_to_timeval(kt); > + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / > USEC_PER_MSEC; } > > static inline s64 ktime_us_delta(const ktime_t later, const ktime_t > earlier) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
Sorry for the top-posting; Josh Boyer suggested I re-mail this mail from last month which didn't get any replies. I'm still having this weird kernel bug affecting me and I've bisected it down to like 2-4 lines of code. (I've thought more about my theory regarding unsigned/signed below and it's probably wrong, so ignore my prognosticating.) Please see my rhbz link near the bottom for the full details. Thanks everyone! On 2015-03-23 Trevor Cordes wrote: Hello everyone, this is my first attempt at bisecting a kernel to solve a bug. Please bear with me. I have successfully bisected and located a commit that is causing my problem. Look at commit 166afb64511. ktime_to_us returns s64, but the commit changes it so ktime_to_us just returns what ktime_divns returns, and ktime_divns returns a u64! If the u64 is big enough, wouldn't it wrap s64 around to a negative number? Or, perhaps if some caller is passing in negative ktime_t to begin with it will trigger without having to hit big numbers. With my limited knowledge of C, I am stabbing in the dark here. That's just my guess as to why this commit causes my problem. My bug symptom is my previously working MythTV lirc blaster no longer reliably sends IR signals. Using irsend to test I can see irsend is just timing out (and only sometimes blasts, usually the first attempt). On good kernels it returns immediately after blasting. This little patch (at bottom of email) that puts the code back in place and gets rid of the function call fixes the problem for me. I applied this patch to the very latest FC21 kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is gone! I can once again MythTV. Hooray. I suspect no one else is seeing this because less people are running 32-bit now, and perhaps in most code paths the value of the u64 never gets above 2^63. I suspect something in drivers/media (possibly) is passing very high or negative values (possibly another bug) to these calls. Obviously my patch isn't the real solution, the real solution is to make the new function calls use a consistent 64-bit type, or figure out what in my code path is calling these functions and check it for value sanity. I've documented the whole process / details of this bug in RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 Thanks! diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h --- a/include/linux/ktime.h 2015-02-08 20:54:22.0 -0600 +++ b/include/linux/ktime.h 2015-03-23 01:09:43.0 -0500 @@ -173,12 +173,16 @@ static inline s64 ktime_to_us(const ktime_t kt) { - return ktime_divns(kt, NSEC_PER_USEC); +/* return ktime_divns(kt, NSEC_PER_USEC); */ + struct timeval tv = ktime_to_timeval(kt); + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec; } static inline s64 ktime_to_ms(const ktime_t kt) { - return ktime_divns(kt, NSEC_PER_MSEC); +/* return ktime_divns(kt, NSEC_PER_MSEC); */ + struct timeval tv = ktime_to_timeval(kt); + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / USEC_PER_MSEC; } static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
Hello everyone, this is my first attempt at bisecting a kernel to solve a bug. Please bear with me. I have successfully bisected and located a commit that is causing my problem. Look at commit 166afb64511. ktime_to_us returns s64, but the commit changes it so ktime_to_us just returns what ktime_divns returns, and ktime_divns returns a u64! If the u64 is big enough, wouldn't it wrap s64 around to a negative number? Or, perhaps if some caller is passing in negative ktime_t to begin with it will trigger without having to hit big numbers. With my limited knowledge of C, I am stabbing in the dark here. That's just my guess as to why this commit causes my problem. My bug symptom is my previously working MythTV lirc blaster no longer reliably sends IR signals. Using irsend to test I can see irsend is just timing out (and only sometimes blasts, usually the first attempt). On good kernels it returns immediately after blasting. This little patch (at bottom of email) that puts the code back in place and gets rid of the function call fixes the problem for me. I applied this patch to the very latest FC21 kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is gone! I can once again MythTV. Hooray. I suspect no one else is seeing this because less people are running 32-bit now, and perhaps in most code paths the value of the u64 never gets above 2^63. I suspect something in drivers/media (possibly) is passing very high or negative values (possibly another bug) to these calls. Obviously my patch isn't the real solution, the real solution is to make the new function calls use a consistent 64-bit type, or figure out what in my code path is calling these functions and check it for value sanity. I've documented the whole process / details of this bug in RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 Thanks! diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h --- a/include/linux/ktime.h 2015-02-08 20:54:22.0 -0600 +++ b/include/linux/ktime.h 2015-03-23 01:09:43.0 -0500 @@ -173,12 +173,16 @@ static inline s64 ktime_to_us(const ktime_t kt) { - return ktime_divns(kt, NSEC_PER_USEC); +/* return ktime_divns(kt, NSEC_PER_USEC); */ + struct timeval tv = ktime_to_timeval(kt); + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec; } static inline s64 ktime_to_ms(const ktime_t kt) { - return ktime_divns(kt, NSEC_PER_MSEC); +/* return ktime_divns(kt, NSEC_PER_MSEC); */ + struct timeval tv = ktime_to_timeval(kt); + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / USEC_PER_MSEC; } static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
regression in ktime.h circa 3.16.0-rc5+ breaks lirc irsend, bad commit 166afb64511
Hello everyone, this is my first attempt at bisecting a kernel to solve a bug. Please bear with me. I have successfully bisected and located a commit that is causing my problem. Look at commit 166afb64511. ktime_to_us returns s64, but the commit changes it so ktime_to_us just returns what ktime_divns returns, and ktime_divns returns a u64! If the u64 is big enough, wouldn't it wrap s64 around to a negative number? Or, perhaps if some caller is passing in negative ktime_t to begin with it will trigger without having to hit big numbers. With my limited knowledge of C, I am stabbing in the dark here. That's just my guess as to why this commit causes my problem. My bug symptom is my previously working MythTV lirc blaster no longer reliably sends IR signals. Using irsend to test I can see irsend is just timing out (and only sometimes blasts, usually the first attempt). On good kernels it returns immediately after blasting. This little patch (at bottom of email) that puts the code back in place and gets rid of the function call fixes the problem for me. I applied this patch to the very latest FC21 kernel-PAE-3.19.1-201.fc21.i686 src.rpm and rpmbuilded and the bug is gone! I can once again MythTV. Hooray. I suspect no one else is seeing this because less people are running 32-bit now, and perhaps in most code paths the value of the u64 never gets above 2^63. I suspect something in drivers/media (possibly) is passing very high or negative values (possibly another bug) to these calls. Obviously my patch isn't the real solution, the real solution is to make the new function calls use a consistent 64-bit type, or figure out what in my code path is calling these functions and check it for value sanity. I've documented the whole process / details of this bug in RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 Thanks! diff -uNr a/include/linux/ktime.h b/include/linux/ktime.h --- a/include/linux/ktime.h 2015-02-08 20:54:22.0 -0600 +++ b/include/linux/ktime.h 2015-03-23 01:09:43.0 -0500 @@ -173,12 +173,16 @@ static inline s64 ktime_to_us(const ktime_t kt) { - return ktime_divns(kt, NSEC_PER_USEC); +/* return ktime_divns(kt, NSEC_PER_USEC); */ + struct timeval tv = ktime_to_timeval(kt); + return (s64) tv.tv_sec * USEC_PER_SEC + tv.tv_usec; } static inline s64 ktime_to_ms(const ktime_t kt) { - return ktime_divns(kt, NSEC_PER_MSEC); +/* return ktime_divns(kt, NSEC_PER_MSEC); */ + struct timeval tv = ktime_to_timeval(kt); + return (s64) tv.tv_sec * MSEC_PER_SEC + tv.tv_usec / USEC_PER_MSEC; } static inline s64 ktime_us_delta(const ktime_t later, const ktime_t earlier) -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/