Re: [PATCH 00/49] DRM driver for Hikey 970

2020-08-19 Thread John Stultz
On Wed, Aug 19, 2020 at 7:01 PM John Stultz  wrote:
>
> On Wed, Aug 19, 2020 at 2:36 PM John Stultz  wrote:
> >
> > On Wed, Aug 19, 2020 at 4:46 AM Mauro Carvalho Chehab
> >  wrote:
> > > So, IMO, the best is to keep it on staging for a while, until those
> > > remaining bugs gets solved.
> > >
> > > I added this series, together with the regulator driver and
> > > a few other patches (including a hack to fix a Kernel 5.8
> > > regression at WiFi ) at:
> > >
> > > 
> > > https://gitlab.freedesktop.org/mchehab_kernel/hikey-970/-/commits/master
> >
> > Sorry, one more small request: Could you create a branch that only has
> > the DRM driver changes in it?
> >
> > The reason I ask, is that since the HiKey960 isn't affected by the
> > majority of the problems you listed as motivation for going through
> > staging. So if we can validate that your tree works fine on HiKey960,
> > the series can be cleaned up and submitted properly upstream to enable
> > that SoC, and the outstanding 970 issues can be worked out afterwards
> > against mainline.
>
> Just as a heads up, I tried testing your tree with my HiKey960, and
> after fixing the compat string inconsistency, the drivers seem to load
> properly. However the drm_hwcomposer seems to have some trouble with
> the driver:
> 01-01 00:12:41.456   345   345 E hwc-drm-display-compositor: Commit
> test failed for display 0, FIXME
> 01-01 00:12:41.456   345   345 E hwc-drm-two: Failed to apply the
> frame composition ret=-22
> 01-01 00:12:41.456   351   351 E HWComposer:
> presentAndGetReleaseFences: present failed for display 0: BadParameter
> (4)
>
> I'll dig in a bit further as to why, but wanted to give you a heads up.

Ok, I've mostly gotten it sorted out:
  - You're missing a few color formats.
  - And I re-discovered a crash that was already fixed in my tree.

I'll send those patches in a few here.

That said even with the patches I've got on top of your series, I
still see a few issues:
1) I'm seeing red-blue swap with your driver.  I need to dig a bit to
see what the difference is, I know gralloc has a config option for
this, and maybe the version of the driver I'm carrying has it wrong?
2) Performance is noticeably worse. Whereas with my tree, I see close
to 60fps (that clk issue we mentioned earlier is why it's not exactly
60) in most tests, but with yours it mostly hovers around 30some fps,
occasionally speeding up to 40 and then back down.

Obviously with some work I suspect we'll be able to sort these out,
but I also do feel that the set you're starting with for upstreaming
is pretty old. The driver I'm carrying was heavily refactored around
5.0 to share code with the existing kirin driver, in the hopes of
making usptreaming easier, and it seems a shame to throw that out and
focus your efforts on the older tree.

But to be fair, I've not had time to upstream the driver myself, and
it's obviously your choice on how you spend your time.  I am really
excited to see your efforts here, regardless of which driver you end
up pushing.

thanks
-john


Re: [PATCH 00/49] DRM driver for Hikey 970

2020-08-19 Thread John Stultz
On Wed, Aug 19, 2020 at 2:36 PM John Stultz  wrote:
>
> On Wed, Aug 19, 2020 at 4:46 AM Mauro Carvalho Chehab
>  wrote:
> > So, IMO, the best is to keep it on staging for a while, until those
> > remaining bugs gets solved.
> >
> > I added this series, together with the regulator driver and
> > a few other patches (including a hack to fix a Kernel 5.8
> > regression at WiFi ) at:
> >
> > 
> > https://gitlab.freedesktop.org/mchehab_kernel/hikey-970/-/commits/master
>
> Sorry, one more small request: Could you create a branch that only has
> the DRM driver changes in it?
>
> The reason I ask, is that since the HiKey960 isn't affected by the
> majority of the problems you listed as motivation for going through
> staging. So if we can validate that your tree works fine on HiKey960,
> the series can be cleaned up and submitted properly upstream to enable
> that SoC, and the outstanding 970 issues can be worked out afterwards
> against mainline.

Just as a heads up, I tried testing your tree with my HiKey960, and
after fixing the compat string inconsistency, the drivers seem to load
properly. However the drm_hwcomposer seems to have some trouble with
the driver:
01-01 00:12:41.456   345   345 E hwc-drm-display-compositor: Commit
test failed for display 0, FIXME
01-01 00:12:41.456   345   345 E hwc-drm-two: Failed to apply the
frame composition ret=-22
01-01 00:12:41.456   351   351 E HWComposer:
presentAndGetReleaseFences: present failed for display 0: BadParameter
(4)

I'll dig in a bit further as to why, but wanted to give you a heads up.

thanks
-john


Re: [PATCH 00/49] DRM driver for Hikey 970

2020-08-19 Thread John Stultz
On Wed, Aug 19, 2020 at 4:46 AM Mauro Carvalho Chehab
 wrote:
> So, IMO, the best is to keep it on staging for a while, until those
> remaining bugs gets solved.
>
> I added this series, together with the regulator driver and
> a few other patches (including a hack to fix a Kernel 5.8
> regression at WiFi ) at:
>
> 
> https://gitlab.freedesktop.org/mchehab_kernel/hikey-970/-/commits/master

Sorry, one more small request: Could you create a branch that only has
the DRM driver changes in it?

The reason I ask, is that since the HiKey960 isn't affected by the
majority of the problems you listed as motivation for going through
staging. So if we can validate that your tree works fine on HiKey960,
the series can be cleaned up and submitted properly upstream to enable
that SoC, and the outstanding 970 issues can be worked out afterwards
against mainline.

thanks
-john


Re: [PATCH 00/49] DRM driver for Hikey 970

2020-08-19 Thread John Stultz
On Wed, Aug 19, 2020 at 4:46 AM Mauro Carvalho Chehab
 wrote:
> Yet, I'm submitting it via staging due to the following reasons:
>
> - It depends on the LDO3 power supply, which is provided by
>   a regulator driver that it is currently on staging;
> - Due to legal reasons, I need to preserve the authorship of
>   each one responsbile for each patch. So, I need to start from
>   the original patch from Kernel 4.4;
> - There are still some problems I need to figure out how to solve:
>- The adv7535 can't get EDID data. Maybe it is a timing issue,
>  but it requires more research to be sure about how to solve it;

I've seen this on the HiKey960 as well. There is a patch to the
adv7533 driver I have to add a mdelay that seems to consistently
resolve the timing problem.  At some point I mentioned it to one of
the maintainers who seems open to having it added, but it seemed silly
to submit it until there was a upstream driver that needed such a
change.  So I think that patch can be submitted as a follow on to this
(hopefully cleaned up) series.

>- The driver only accept resolutions on a defined list, as there's
>  a known bug that this driver may have troubles with random
>  resolutions. Probably due to a bug at the pixel clock settings;

So, yes, the SoC clks can't generate proper signals for HDMI
frequencies (apparently it's not an issue for panels). There is a
fixed set that we can get "close enough" that most monitors will work,
but its always a bit iffy (some monitors are strict in what they
take).

On the kirin driver, we were able to do a calculation to figure out if
the generated frequency would be close enough:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/hisilicon/kirin/dw_drm_dsi.c#n615

I suspect we could do something similar for the hikey960/70, but I've
not really had time to dig in deeply there.

Personally, I don't see the allow-list as a problematic short term
solution, and again, not sure its worth pushing to staging for.

>- Sometimes (at least with 1080p), it generates LDI underflow
>  errors, which in turn causes the DRM to stop working. That
>  happens for example when using gdm on Wayland and
>  gnome on X11;

Interestingly, I've not seen this on HiKey960 (at least with
Android/Surfaceflinger). The original HiKey board does have the
trouble where at 1080p the screen sometimes comes up horizontally
offset due to the LDI underflow, but the patches to address it have
been worse then the problem, so we reverted those.

>- Probably related to the previous issue, when the monitor
>  suspends due to DPMS, it doesn't return back to life.
>

I don't believe I see this on HiKey960. But if it's the LDI issue on
the 970 that may explain it.


> So, IMO, the best is to keep it on staging for a while, until those
> remaining bugs gets solved.

I'm not sure I see all of these as compelling for pushing it in via
staging. And I suspect in the process of submitting the patches for
review folks may find the cause of some of the problems you list here.


> I added this series, together with the regulator driver and
> a few other patches (including a hack to fix a Kernel 5.8
> regression at WiFi ) at:
>
> 
> https://gitlab.freedesktop.org/mchehab_kernel/hikey-970/-/commits/master
>
>
> Chen Feng (1):
>   staging: hikey9xx: Add hisilicon DRM driver for hikey960/970
>
> John Stultz (1):
>   staging: hikey9xx/gpu: port it to work with Kernel v4.9

Nit: This is a display driver and has little to do with the GPU (other
then it will eventually live in drivers/gpu/drm/...), so I might
suggest using more conventional subject prefix,  "drm: hisilicon:"

thanks
-john


Re: [PATCH 00/49] DRM driver for Hikey 970

2020-08-19 Thread John Stultz
On Wed, Aug 19, 2020 at 8:31 AM Laurent Pinchart
 wrote:
> On Wed, Aug 19, 2020 at 05:21:20PM +0200, Sam Ravnborg wrote:
> > On Wed, Aug 19, 2020 at 01:45:28PM +0200, Mauro Carvalho Chehab wrote:
> > > This patch series port the out-of-tree driver for Hikey 970 (which
> > > should also support Hikey 960) from the official 96boards tree:
> > >
> > >https://github.com/96boards-hikey/linux/tree/hikey970-v4.9
> > >
> > > Based on his history, this driver seems to be originally written
> > > for Kernel 4.4, and was later ported to Kernel 4.9. The original
> > > driver used to depend on ION (from Kernel 4.4) and had its own
> > > implementation for FB dev API.
> > >
> > > As I need to preserve the original history (with has patches from
> > > both HiSilicon and from Linaro),  I'm starting from the original
> > > patch applied there. The remaining patches are incremental,
> > > and port this driver to work with upstream Kernel.
> > >
...
> > > - Due to legal reasons, I need to preserve the authorship of
> > >   each one responsbile for each patch. So, I need to start from
> > >   the original patch from Kernel 4.4;
...
> > I do acknowledge you need to preserve history and all -
> > but this patchset is not easy to review.
>
> Why do we need to preserve history ? Adding relevant Signed-off-by and
> Co-developed-by should be enough, shouldn't it ? Having a public branch
> that contains the history is useful if anyone is interested, but I don't
> think it's required in mainline.

Yea. I concur with Laurent here. I'm not sure what legal reasoning you
have on this but preserving the "absolute" history here is actively
detrimental for review and understanding of the patch set.

Preserving Authorship, Signed-off-by lines and adding Co-developed-by
lines should be sufficient to provide both atribution credit and DCO
history.

thanks
-john


Re: [PATCH v7 3/9] net/scm: Regularize compat handling of scm_detach_fds()

2020-08-07 Thread John Stultz
On Fri, Aug 7, 2020 at 3:18 PM Kees Cook  wrote:
>
> On Fri, Aug 07, 2020 at 01:29:24PM -0700, John Stultz wrote:
> > On Thu, Jul 9, 2020 at 11:28 AM Kees Cook  wrote:
> > >
> > > Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup
> > > scm_detach_fds") into the compat code.
> > >
> > > Replace open-coded __receive_sock() with a call to the helper.
> > >
> > > Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel
> > > vs user buffers for ->msg_control") to before the compat call, even
> > > though it should be impossible for an in-kernel call to also be compat.
> > >
> > > Correct the int "flags" argument to unsigned int to match fd_install()
> > > and similar APIs.
> > >
> > > Regularize any remaining differences, including a whitespace issue,
> > > a checkpatch warning, and add the check from commit 6900317f5eff ("net,
> > > scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which
> > > fixed an overflow unique to 64-bit. To avoid confusion when comparing
> > > the compat handler to the native handler, just include the same check
> > > in the compat handler.
> > >
> > > Acked-by: Christian Brauner 
> > > Signed-off-by: Kees Cook 
> > > ---
> >
> > Hey Kees,
> >   So during the merge window (while chasing a few other regressions),
> > I noticed occasionally my Dragonboard 845c running AOSP having trouble
> > with the web browser crashing or other apps hanging, and I've bisected
> > the issue down to this change.
> >
> > Unfortunately it doesn't revert cleanly so I can't validate reverting
> > it sorts things against linus/HEAD.  Anyway, I wanted to check and see
> > if you had any other reports of similar or any ideas what might be
> > going wrong?
>
> Hi; Yes, sorry for the trouble. I had a typo in a refactor of
> SCM_RIGHTS. I suspect it'll be fixed by this:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1fa2c0a0c814fbae0eb3e79a510765225570d043
>
> Can you verify Linus's latest tree works for you? If not, there might be
> something else hiding in the corners...

Thanks so much! Yes, I just updated to Linus' latest and the issue has
disappeared!

thanks again!
-john


Re: [PATCH v7 3/9] net/scm: Regularize compat handling of scm_detach_fds()

2020-08-07 Thread John Stultz
On Thu, Jul 9, 2020 at 11:28 AM Kees Cook  wrote:
>
> Duplicate the cleanups from commit 2618d530dd8b ("net/scm: cleanup
> scm_detach_fds") into the compat code.
>
> Replace open-coded __receive_sock() with a call to the helper.
>
> Move the check added in commit 1f466e1f15cf ("net: cleanly handle kernel
> vs user buffers for ->msg_control") to before the compat call, even
> though it should be impossible for an in-kernel call to also be compat.
>
> Correct the int "flags" argument to unsigned int to match fd_install()
> and similar APIs.
>
> Regularize any remaining differences, including a whitespace issue,
> a checkpatch warning, and add the check from commit 6900317f5eff ("net,
> scm: fix PaX detected msg_controllen overflow in scm_detach_fds") which
> fixed an overflow unique to 64-bit. To avoid confusion when comparing
> the compat handler to the native handler, just include the same check
> in the compat handler.
>
> Acked-by: Christian Brauner 
> Signed-off-by: Kees Cook 
> ---

Hey Kees,
  So during the merge window (while chasing a few other regressions),
I noticed occasionally my Dragonboard 845c running AOSP having trouble
with the web browser crashing or other apps hanging, and I've bisected
the issue down to this change.

Unfortunately it doesn't revert cleanly so I can't validate reverting
it sorts things against linus/HEAD.  Anyway, I wanted to check and see
if you had any other reports of similar or any ideas what might be
going wrong?

thanks
-john


Re: [GIT] Networking

2020-08-07 Thread John Stultz
On Fri, Aug 7, 2020 at 12:19 AM Christoph Hellwig  wrote:
>
> On Thu, Aug 06, 2020 at 11:23:34PM -0700, John Stultz wrote:
> > So I've finally rebase-bisected it down to:
> >   a31edb2059ed ("net: improve the user pointer check in init_user_sockptr")
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a31edb2059ed4e498f9aa8230c734b59d0ad797a
> >
> > And reverting that from linus/HEAD (at least from this morning) seems
> > to avoid it.
> >
> > Seems like it is just adding extra checks on the data passed, so maybe
> > existing trouble from a different driver is the issue here, but it's
> > not really clear from the crash what might be wrong.
> >
> > Suggestions would be greatly appreciated!
>
> I think the sockpt optimization is just a little to clever for its
> own sake, as also chown by the other issue pointed out by Eric.
>
> Can you try this revert that just goes back to the "boring" normal
> version for everyone?

Yes! This seems to avoid the crash and networking looks ok.

Tested-by: John Stultz 

thanks
-john


Re: [GIT] Networking

2020-08-06 Thread John Stultz
On Thu, Aug 6, 2020 at 11:23 PM John Stultz  wrote:
>
> On Thu, Aug 6, 2020 at 5:32 PM John Stultz  wrote:
> >
> > On Thu, Aug 6, 2020 at 4:17 PM Eric Dumazet  wrote:
> > > On 8/6/20 2:39 PM, John Stultz wrote:
> > > > [   19.709492] Unable to handle kernel access to user memory outside
> > > > uaccess routines at virtual address 006f53337070
> > > > [   19.726539] Mem abort info:
> > > > [   19.726544]   ESR = 0x960f
> > > > [   19.741323]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > [   19.741326]   SET = 0, FnV = 0
> > > > [   19.761185]   EA = 0, S1PTW = 0
> > > > [   19.761188] Data abort info:
> > > > [   19.761190]   ISV = 0, ISS = 0x000f
> > > > [   19.761192]   CM = 0, WnR = 0
> > > > [   19.761199] user pgtable: 4k pages, 39-bit VAs, pgdp=00016e9e9000
> > > > [   19.777584] [006f53337070] pgd=00016e99e003,
> > > > p4d=00016e99e003, pud=00016e99e003, pmd=00016e99a003,
> > > > pte=00e800016d3c7f53
> > > > [   19.789205] Internal error: Oops: 960f [#1] PREEMPT SMP
> > > > [   19.789211] Modules linked in:
> > > > [   19.797153] CPU: 7 PID: 364 Comm: iptables-restor Tainted: G
> > > > W 5.8.0-mainline-08255-gf9e74a8eb6f3 #3350
> > > > [   19.797156] Hardware name: Thundercomm Dragonboard 845c (DT)
> > > > [   19.797161] pstate: a045 (NzCv daif +PAN -UAO BTYPE=--)
> > > > [   19.797177] pc : do_ipt_set_ctl+0x304/0x610
> > > > [   19.807891] lr : do_ipt_set_ctl+0x50/0x610
> > > > [   19.807894] sp : ffc0139bbba0
> > > > [   19.807898] x29: ffc0139bbba0 x28: ff80f07a3800
> > > > [   19.846468] x27:  x26: 
> > > > [   19.846472] x25:  x24: 0698
> > > > [   19.846476] x23: ffec8eb0cc80 x22: 0040
> > > > [   19.846480] x21: b46f53337070 x20: ffec8eb0c000
> > > > [   19.846484] x19: ffec8e9e9000 x18: 
> > > > [   19.846487] x17:  x16: 
> > > > [   19.846491] x15:  x14: 
> > > > [   19.846495] x13:  x12: 
> > > > [   19.846501] x11:  x10: 
> > > > [   19.856005] x9 :  x8 : 
> > > > [   19.856008] x7 : ffec8e9e9d08 x6 : 
> > > > [   19.856012] x5 :  x4 : 0213
> > > > [   19.856015] x3 : 0001ffdeffef x2 : 11ded3fb0bb85e00
> > > > [   19.856019] x1 : 0027 x0 : 0080
> > > > [   19.856024] Call trace:
> > > > [   19.866319]  do_ipt_set_ctl+0x304/0x610
> > > > [   19.866327]  nf_setsockopt+0x64/0xa8
> > > > [   19.866332]  ip_setsockopt+0x21c/0x1710
> > > > [   19.866338]  raw_setsockopt+0x50/0x1b8
> > > > [   19.866347]  sock_common_setsockopt+0x50/0x68
> > > > [   19.882672]  __sys_setsockopt+0x120/0x1c8
> > > > [   19.882677]  __arm64_sys_setsockopt+0x30/0x40
> > > > [   19.882686]  el0_svc_common.constprop.3+0x78/0x188
> > > > [   19.882691]  do_el0_svc+0x80/0xa0
> > > > [   19.882699]  el0_sync_handler+0x134/0x1a0
> > > > [   19.901555]  el0_sync+0x140/0x180
> > > > [   19.901564] Code: aa1503e0 97fffd3e 2a0003f5 1780 (a9401ea6)
> > > > [   19.901569] ---[ end trace 22010e9688ae248f ]---
> > > > [   19.913033] Kernel panic - not syncing: Fatal exception
> > > > [   19.913042] SMP: stopping secondary CPUs
> > > > [   20.138885] Kernel Offset: 0x2c7d08 from 0xffc01000
> > > > [   20.138887] PHYS_OFFSET: 0xfffa8000
> > > > [   20.138894] CPU features: 0x0040002,2a80a218
> > > > [   20.138898] Memory Limit: none
> > > >
> > > > I'll continue to work on bisecting this down further, but figured I'd
> > > > share now as you or someone else might be able to tell whats wrong
> > > > from the trace.
> > > >
> > >
> > > Can you try at commit c2f12630c60ff33a9cafd221646053fc10ec59b6 
> > > ("netfilter: switch nf_setsockopt to sockptr_t")
> > > (and right before it)
> >
> >
> > So I rebased my patches ontop of that commit, but I'm not seeing the
> > crash there.  I also hand applied your suggested patch when I did see
> > the issue

Re: [GIT] Networking

2020-08-06 Thread John Stultz
On Thu, Aug 6, 2020 at 5:32 PM John Stultz  wrote:
>
> On Thu, Aug 6, 2020 at 4:17 PM Eric Dumazet  wrote:
> > On 8/6/20 2:39 PM, John Stultz wrote:
> > > [   19.709492] Unable to handle kernel access to user memory outside
> > > uaccess routines at virtual address 006f53337070
> > > [   19.726539] Mem abort info:
> > > [   19.726544]   ESR = 0x960f
> > > [   19.741323]   EC = 0x25: DABT (current EL), IL = 32 bits
> > > [   19.741326]   SET = 0, FnV = 0
> > > [   19.761185]   EA = 0, S1PTW = 0
> > > [   19.761188] Data abort info:
> > > [   19.761190]   ISV = 0, ISS = 0x000f
> > > [   19.761192]   CM = 0, WnR = 0
> > > [   19.761199] user pgtable: 4k pages, 39-bit VAs, pgdp=00016e9e9000
> > > [   19.777584] [006f53337070] pgd=00016e99e003,
> > > p4d=00016e99e003, pud=00016e99e003, pmd=00016e99a003,
> > > pte=00e800016d3c7f53
> > > [   19.789205] Internal error: Oops: 960f [#1] PREEMPT SMP
> > > [   19.789211] Modules linked in:
> > > [   19.797153] CPU: 7 PID: 364 Comm: iptables-restor Tainted: G
> > > W 5.8.0-mainline-08255-gf9e74a8eb6f3 #3350
> > > [   19.797156] Hardware name: Thundercomm Dragonboard 845c (DT)
> > > [   19.797161] pstate: a045 (NzCv daif +PAN -UAO BTYPE=--)
> > > [   19.797177] pc : do_ipt_set_ctl+0x304/0x610
> > > [   19.807891] lr : do_ipt_set_ctl+0x50/0x610
> > > [   19.807894] sp : ffc0139bbba0
> > > [   19.807898] x29: ffc0139bbba0 x28: ff80f07a3800
> > > [   19.846468] x27:  x26: 
> > > [   19.846472] x25:  x24: 0698
> > > [   19.846476] x23: ffec8eb0cc80 x22: 0040
> > > [   19.846480] x21: b46f53337070 x20: ffec8eb0c000
> > > [   19.846484] x19: ffec8e9e9000 x18: 
> > > [   19.846487] x17:  x16: 
> > > [   19.846491] x15:  x14: 
> > > [   19.846495] x13:  x12: 
> > > [   19.846501] x11:  x10: 
> > > [   19.856005] x9 :  x8 : 
> > > [   19.856008] x7 : ffec8e9e9d08 x6 : 
> > > [   19.856012] x5 :  x4 : 0213
> > > [   19.856015] x3 : 0001ffdeffef x2 : 11ded3fb0bb85e00
> > > [   19.856019] x1 : 0027 x0 : 0080
> > > [   19.856024] Call trace:
> > > [   19.866319]  do_ipt_set_ctl+0x304/0x610
> > > [   19.866327]  nf_setsockopt+0x64/0xa8
> > > [   19.866332]  ip_setsockopt+0x21c/0x1710
> > > [   19.866338]  raw_setsockopt+0x50/0x1b8
> > > [   19.866347]  sock_common_setsockopt+0x50/0x68
> > > [   19.882672]  __sys_setsockopt+0x120/0x1c8
> > > [   19.882677]  __arm64_sys_setsockopt+0x30/0x40
> > > [   19.882686]  el0_svc_common.constprop.3+0x78/0x188
> > > [   19.882691]  do_el0_svc+0x80/0xa0
> > > [   19.882699]  el0_sync_handler+0x134/0x1a0
> > > [   19.901555]  el0_sync+0x140/0x180
> > > [   19.901564] Code: aa1503e0 97fffd3e 2a0003f5 1780 (a9401ea6)
> > > [   19.901569] ---[ end trace 22010e9688ae248f ]---
> > > [   19.913033] Kernel panic - not syncing: Fatal exception
> > > [   19.913042] SMP: stopping secondary CPUs
> > > [   20.138885] Kernel Offset: 0x2c7d08 from 0xffc01000
> > > [   20.138887] PHYS_OFFSET: 0xfffa8000
> > > [   20.138894] CPU features: 0x0040002,2a80a218
> > > [   20.138898] Memory Limit: none
> > >
> > > I'll continue to work on bisecting this down further, but figured I'd
> > > share now as you or someone else might be able to tell whats wrong
> > > from the trace.
> > >
> >
> > Can you try at commit c2f12630c60ff33a9cafd221646053fc10ec59b6 ("netfilter: 
> > switch nf_setsockopt to sockptr_t")
> > (and right before it)
>
>
> So I rebased my patches ontop of that commit, but I'm not seeing the
> crash there.  I also hand applied your suggested patch when I did see
> the issue, but that didn't seem to fix it either.
>
> So far I've only narrowed it down to between
> 65ccbbda52288527b7c48087eb33bb0757975875..530fe9d433b9e60251bb8fdc5dddecbc486a50ef.
> But I'll keep rebase-bisecting it down.

So I've finally rebase-bisected it down to:
  a31edb2059ed ("net: improve the user pointer check in init_user_sockptr")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a31edb2059ed4e498f9aa8230c734b59d0ad797a

And reverting that from linus/HEAD (at least from this morning) seems
to avoid it.

Seems like it is just adding extra checks on the data passed, so maybe
existing trouble from a different driver is the issue here, but it's
not really clear from the crash what might be wrong.

Suggestions would be greatly appreciated!

thanks
-john


Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0

2020-08-06 Thread John Stultz
On Thu, Aug 6, 2020 at 6:52 AM Thierry Reding  wrote:
>
> On Wed, Apr 22, 2020 at 08:32:43PM +, John Stultz wrote:
> > This patch addresses a regression in 5.7-rc1+
> >
> > In commit c8c43cee29f6 ("driver core: Fix
> > driver_deferred_probe_check_state() logic"), we both cleaned up
> > the logic and also set the default driver_deferred_probe_timeout
> > value to 30 seconds to allow for drivers that are missing
> > dependencies to have some time so that the dependency may be
> > loaded from userland after initcalls_done is set.
> >
> > However, Yoshihiro Shimoda reported that on his device that
> > expects to have unmet dependencies (due to "optional links" in
> > its devicetree), was failing to mount the NFS root.
> >
> > In digging further, it seemed the problem was that while the
> > device properly probes after waiting 30 seconds for any missing
> > modules to load, the ip_auto_config() had already failed,
> > resulting in NFS to fail. This was due to ip_auto_config()
> > calling wait_for_device_probe() which doesn't wait for the
> > driver_deferred_probe_timeout to fire.
> >
> > Fixing that issue is possible, but could also introduce 30
> > second delays in bootups for users who don't have any
> > missing dependencies, which is not ideal.
> >
> > So I think the best solution to avoid any regressions is to
> > revert back to a default timeout value of zero, and allow
> > systems that need to utilize the timeout in order for userland
> > to load any modules that supply misisng dependencies in the dts
> > to specify the timeout length via the exiting documented boot
> > argument.
> >
> > Thanks to Geert for chasing down that ip_auto_config was why NFS
> > was failing in this case!
> >
> > Cc: "David S. Miller" 
> > Cc: Alexey Kuznetsov 
> > Cc: Hideaki YOSHIFUJI 
> > Cc: Jakub Kicinski 
> > Cc: Greg Kroah-Hartman 
> > Cc: Rafael J. Wysocki 
> > Cc: Rob Herring 
> > Cc: Geert Uytterhoeven 
> > Cc: Yoshihiro Shimoda 
> > Cc: Robin Murphy 
> > Cc: Andy Shevchenko 
> > Cc: Sudeep Holla 
> > Cc: Andy Shevchenko 
> > Cc: Naresh Kamboju 
> > Cc: Basil Eljuse 
> > Cc: Ferry Toth 
> > Cc: Arnd Bergmann 
> > Cc: Anders Roxell 
> > Cc: netdev 
> > Cc: linux...@vger.kernel.org
> > Reported-by: Yoshihiro Shimoda 
> > Tested-by: Yoshihiro Shimoda 
> > Fixes: c8c43cee29f6 ("driver core: Fix driver_deferred_probe_check_state() 
> > logic")
> > Signed-off-by: John Stultz 
> > ---
> >  drivers/base/dd.c | 13 ++---
> >  1 file changed, 2 insertions(+), 11 deletions(-)
>
> Sorry for being a bit late to the party, but this breaks suspend/resume
> support on various Tegra devices. I've only noticed now because, well,
> suspend/resume have been broken for other reasons for a little while and
> it's taken us a bit to resolve those issues.
>
> But now that those other issues have been fixed, I've started seeing an
> issue where after resume from suspend some of the I2C controllers are no
> longer working. The reason for this is that they share pins with DP AUX
> controllers via the pinctrl framework. The DP AUX driver registers as
> part of the DRM/KMS driver, which usually happens in userspace. Since
> the deferred probe timeout was set to 0 by default this no longer works
> because no pinctrl states are assigned to the I2C controller and
> therefore upon resume the pins cannot be configured for I2C operation.

Oof. My apologies!

> I'm also somewhat confused by this patch and a few before because they
> claim that they restore previous default behaviour, but that's just not
> true. Originally when this timeout was introduced it was -1, which meant
> that there was no timeout at all and hence users had to opt-in if they
> wanted to use a deferred probe timeout.

I don't think that's quite true, since the point of my original
changes were to avoid troubles I was seeing with drivers not loading
because once the timeout fired after init, driver loading would fail
with ENODEV instead of returning EPROBE_DEFER. The logic that existed
was buggy so the timeout handling didn't really work (changing the
boot argument wouldn't help, because after init the logic would return
ENODEV before it checked the timeout value).

That said, looking at it now, I do realize the
driver_deferred_probe_check_state_continue() logic in effect never
returned ETIMEDOUT before was consolidated in the earlier changes, and
now we've backed the default timeout to 0, old user (see bec6c0ecb243)
will now get ETIMEDOUT wher

Re: [GIT] Networking

2020-08-06 Thread John Stultz
On Thu, Aug 6, 2020 at 4:17 PM Eric Dumazet  wrote:
> On 8/6/20 2:39 PM, John Stultz wrote:
> > [   19.709492] Unable to handle kernel access to user memory outside
> > uaccess routines at virtual address 006f53337070
> > [   19.726539] Mem abort info:
> > [   19.726544]   ESR = 0x960f
> > [   19.741323]   EC = 0x25: DABT (current EL), IL = 32 bits
> > [   19.741326]   SET = 0, FnV = 0
> > [   19.761185]   EA = 0, S1PTW = 0
> > [   19.761188] Data abort info:
> > [   19.761190]   ISV = 0, ISS = 0x000f
> > [   19.761192]   CM = 0, WnR = 0
> > [   19.761199] user pgtable: 4k pages, 39-bit VAs, pgdp=00016e9e9000
> > [   19.777584] [006f53337070] pgd=00016e99e003,
> > p4d=00016e99e003, pud=00016e99e003, pmd=00016e99a003,
> > pte=00e800016d3c7f53
> > [   19.789205] Internal error: Oops: 960f [#1] PREEMPT SMP
> > [   19.789211] Modules linked in:
> > [   19.797153] CPU: 7 PID: 364 Comm: iptables-restor Tainted: G
> > W 5.8.0-mainline-08255-gf9e74a8eb6f3 #3350
> > [   19.797156] Hardware name: Thundercomm Dragonboard 845c (DT)
> > [   19.797161] pstate: a045 (NzCv daif +PAN -UAO BTYPE=--)
> > [   19.797177] pc : do_ipt_set_ctl+0x304/0x610
> > [   19.807891] lr : do_ipt_set_ctl+0x50/0x610
> > [   19.807894] sp : ffc0139bbba0
> > [   19.807898] x29: ffc0139bbba0 x28: ff80f07a3800
> > [   19.846468] x27:  x26: 
> > [   19.846472] x25:  x24: 0698
> > [   19.846476] x23: ffec8eb0cc80 x22: 0040
> > [   19.846480] x21: b46f53337070 x20: ffec8eb0c000
> > [   19.846484] x19: ffec8e9e9000 x18: 
> > [   19.846487] x17:  x16: 
> > [   19.846491] x15:  x14: 
> > [   19.846495] x13:  x12: 
> > [   19.846501] x11:  x10: 
> > [   19.856005] x9 :  x8 : 
> > [   19.856008] x7 : ffec8e9e9d08 x6 : 
> > [   19.856012] x5 :  x4 : 0213
> > [   19.856015] x3 : 0001ffdeffef x2 : 11ded3fb0bb85e00
> > [   19.856019] x1 : 0027 x0 : 0080
> > [   19.856024] Call trace:
> > [   19.866319]  do_ipt_set_ctl+0x304/0x610
> > [   19.866327]  nf_setsockopt+0x64/0xa8
> > [   19.866332]  ip_setsockopt+0x21c/0x1710
> > [   19.866338]  raw_setsockopt+0x50/0x1b8
> > [   19.866347]  sock_common_setsockopt+0x50/0x68
> > [   19.882672]  __sys_setsockopt+0x120/0x1c8
> > [   19.882677]  __arm64_sys_setsockopt+0x30/0x40
> > [   19.882686]  el0_svc_common.constprop.3+0x78/0x188
> > [   19.882691]  do_el0_svc+0x80/0xa0
> > [   19.882699]  el0_sync_handler+0x134/0x1a0
> > [   19.901555]  el0_sync+0x140/0x180
> > [   19.901564] Code: aa1503e0 97fffd3e 2a0003f5 1780 (a9401ea6)
> > [   19.901569] ---[ end trace 22010e9688ae248f ]---
> > [   19.913033] Kernel panic - not syncing: Fatal exception
> > [   19.913042] SMP: stopping secondary CPUs
> > [   20.138885] Kernel Offset: 0x2c7d08 from 0xffc01000
> > [   20.138887] PHYS_OFFSET: 0xfffa8000
> > [   20.138894] CPU features: 0x0040002,2a80a218
> > [   20.138898] Memory Limit: none
> >
> > I'll continue to work on bisecting this down further, but figured I'd
> > share now as you or someone else might be able to tell whats wrong
> > from the trace.
> >
>
> Can you try at commit c2f12630c60ff33a9cafd221646053fc10ec59b6 ("netfilter: 
> switch nf_setsockopt to sockptr_t")
> (and right before it)


So I rebased my patches ontop of that commit, but I'm not seeing the
crash there.  I also hand applied your suggested patch when I did see
the issue, but that didn't seem to fix it either.

So far I've only narrowed it down to between
65ccbbda52288527b7c48087eb33bb0757975875..530fe9d433b9e60251bb8fdc5dddecbc486a50ef.
But I'll keep rebase-bisecting it down.

thanks
-john


Re: [GIT] Networking

2020-08-06 Thread John Stultz
On Wed, Aug 5, 2020 at 6:57 PM David Miller  wrote:
> There is a minor conflict in net/ipv6/ip6_flowlabel.c, it's because of
> the commit that did the tree-wide removal of uninitialized_var().  The
> resolution is simple, kill all of the conflict markers and content
> within, and remove the uninitialized_var() marker that got moved
> elsewhere in the file in the net-next tree.
>
> Otherwise, we have:
>
> 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.
>
> 2) Support UDP segmentation in code TSO code, from Eric Dumazet.
>
> 3) Allow flashing different flash images in cxgb4 driver, from Vishal
>Kulkarni.
>
> 4) Add drop frames counter and flow status to tc flower offloading,
>from Po Liu.
>
> 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.
>
> 6) Various new indirect call avoidance, from Eric Dumazet and Brian
>Vazquez.
>
> 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
>Yonghong Song.
>
> 8) Support querying and setting hardware address of a port function
>via devlink, use this in mlx5, from Parav Pandit.
>
> 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.
>
> 10) Switch qca8k driver over to phylink, from Jonathan McDowell.
>
> 11) In bpftool, show list of processes holding BPF FD references to
> maps, programs, links, and btf objects.  From Andrii Nakryiko.
>
> 12) Several conversions over to generic power management, from Vaibhav
> Gupta.
>
> 13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from
> Dmitry Yakunin.
>
> 14) Various https url conversions, from Alexander A. Klimov.
>
> 15) Timestamping and PHC support for mscc PHY driver, from Antoine
> Tenart.
>
> 16) Support bpf iterating over tcp and udp sockets, from Yonghong
> Song.
>
> 17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.
>
> 18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.
>
> 19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
> drivers.  From Luc Van Oostenryck.
>
> 20) XDP support for xen-netfront, from Denis Kirjanov.
>
> 21) Support receive buffer autotuning in MPTCP, from Florian Westphal.
>
> 22) Support EF100 chip in sfc driver, from Edward Cree.
>
> 23) Add XDP support to mvpp2 driver, from Matteo Croce.
>
> 24) Support MPTCP in sock_diag, from Paolo Abeni.
>
> 25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
> infrastructure, from Jakub Kicinski.
>
> 26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.
>
> 27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.
>
> 28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.
>
> 29) Refactor a lot of networking socket option handling code in
> order to avoid set_fs() calls, from Christoph Hellwig.
>
> 30) Add rfc4884 support to icmp code, from Willem de Bruijn.
>
> 31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.
>
> 32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.
>
> 33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.
>
> 34) Support TCP syncookies in MPTCP, from Flowian Westphal.
>
> 35) Fix several tricky cases of PMTU handling wrt. briding, from
> Stefano Brivio.
>
> Please pull, thanks a lot!
>
> The following changes since commit ac3a0c8472969a03c0496ae774b3a29eb26c8d5a:
>
>   Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net (2020-08-01 
> 16:47:24 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git

Hey David, All,
  Just as a heads up, after net-next was merged into Linus' tree, I
started hitting the following crash on boot on the Dragonboard 845c
booting AOSP.

I've bisected it down to the net-next merge, but haven't bisected it
further yet, as I still have a handful of (unrelated to networking)
out of tree patches needed to boot the board.

[   19.709492] Unable to handle kernel access to user memory outside
uaccess routines at virtual address 006f53337070
[   19.726539] Mem abort info:
[   19.726544]   ESR = 0x960f
[   19.741323]   EC = 0x25: DABT (current EL), IL = 32 bits
[   19.741326]   SET = 0, FnV = 0
[   19.761185]   EA = 0, S1PTW = 0
[   19.761188] Data abort info:
[   19.761190]   ISV = 0, ISS = 0x000f
[   19.761192]   CM = 0, WnR = 0
[   19.761199] user pgtable: 4k pages, 39-bit VAs, pgdp=00016e9e9000
[   19.777584] [006f53337070] pgd=00016e99e003,
p4d=00016e99e003, pud=00016e99e003, pmd=00016e99a003,
pte=00e800016d3c7f53
[   19.789205] Internal error: Oops: 960f [#1] PREEMPT SMP
[   19.789211] Modules linked in:
[   19.797153] CPU: 7 PID: 364 Comm: iptables-restor Tainted: G
W 5.8.0-mainline-08255-gf9e74a8eb6f3 #3350
[   19.797156] Hardware name: Thundercomm Dragonboard 845c (DT)
[   19.797161] pstate: a045 (NzCv daif +PAN -UAO BTYPE=--)
[   19.797177] pc : do_ipt_set_ctl+0x304/0x610
[   19.807891] lr : do_ipt_set_ctl+0x50/0x610
[   19.807894] sp : ffc0139bbba0
[   19.

Re: [PATCH bpf v2] restore behaviour of CAP_SYS_ADMIN allowing the loading of networking bpf programs

2020-07-06 Thread John Stultz
On Mon, Jul 6, 2020 at 1:15 PM Daniel Borkmann  wrote:
> On 7/6/20 10:11 PM, John Stultz wrote:
> >Just wanted to follow up on this as I've not seen the regression fix
> > land in 5.8-rc4 yet? Is it still pending, or did it fall through a
> > gap?
>
> No, it's in DaveM's -net tree currently, will go to Linus' tree on his next 
> pull req:
>
>
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=b338cb921e6739ff59ce32f43342779fe5ffa732

Great! Much appreciated! Sorry to nag!
-john


Re: [PATCH bpf v2] restore behaviour of CAP_SYS_ADMIN allowing the loading of networking bpf programs

2020-07-06 Thread John Stultz
On Tue, Jun 23, 2020 at 5:54 PM Alexei Starovoitov
 wrote:
> On Mon, Jun 22, 2020 at 12:44 PM John Stultz  wrote:
> > On Sat, Jun 20, 2020 at 2:26 PM Maciej Żenczykowski
> >  wrote:
> > > From: Maciej Żenczykowski 
> > >
> > > This is a fix for a regression introduced in 5.8-rc1 by:
> > >   commit 2c78ee898d8f10ae6fb2fa23a3fbaec96b1b7366
> > >   'bpf: Implement CAP_BPF'
> > >
> > > Before the above commit it was possible to load network bpf programs
> > > with just the CAP_SYS_ADMIN privilege.
> > >
> > > The Android bpfloader happens to run in such a configuration (it has
> > > SYS_ADMIN but not NET_ADMIN) and creates maps and loads bpf programs
> > > for later use by Android's netd (which has NET_ADMIN but not SYS_ADMIN).
> > >
> > > Cc: Alexei Starovoitov 
> > > Cc: Daniel Borkmann 
> > > Reported-by: John Stultz 
> > > Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF")
> > > Signed-off-by: Maciej Żenczykowski 
> >
> > Thanks so much for helping narrow this regression down and submitting this 
> > fix!
> > It's much appreciated!
> >
> > Tested-by: John Stultz 
>
> Applied to bpf tree. Thanks

Hey all,
  Just wanted to follow up on this as I've not seen the regression fix
land in 5.8-rc4 yet? Is it still pending, or did it fall through a
gap?

thanks
-john


Re: [PATCH bpf v2] restore behaviour of CAP_SYS_ADMIN allowing the loading of networking bpf programs

2020-06-22 Thread John Stultz
On Sat, Jun 20, 2020 at 2:26 PM Maciej Żenczykowski
 wrote:
>
> From: Maciej Żenczykowski 
>
> This is a fix for a regression introduced in 5.8-rc1 by:
>   commit 2c78ee898d8f10ae6fb2fa23a3fbaec96b1b7366
>   'bpf: Implement CAP_BPF'
>
> Before the above commit it was possible to load network bpf programs
> with just the CAP_SYS_ADMIN privilege.
>
> The Android bpfloader happens to run in such a configuration (it has
> SYS_ADMIN but not NET_ADMIN) and creates maps and loads bpf programs
> for later use by Android's netd (which has NET_ADMIN but not SYS_ADMIN).
>
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Reported-by: John Stultz 
> Fixes: 2c78ee898d8f ("bpf: Implement CAP_BPF")
> Signed-off-by: Maciej Żenczykowski 

Thanks so much for helping narrow this regression down and submitting this fix!
It's much appreciated!

Tested-by: John Stultz 

thanks
-john


Re: [PATCH v3 1/3] driver core: Revert default driver_deferred_probe_timeout value to 0

2020-04-29 Thread John Stultz
On Wed, Apr 29, 2020 at 6:52 AM Mark Brown  wrote:
> On Wed, Apr 29, 2020 at 03:46:04PM +0200, Marek Szyprowski wrote:
> > On 22.04.2020 22:32, John Stultz wrote:
>
> > > Fixes: c8c43cee29f6 ("driver core: Fix 
> > > driver_deferred_probe_check_state() logic")
> > > Signed-off-by: John Stultz 
>
> > Please also revert dca0b44957e5 "regulator: Use
> > driver_deferred_probe_timeout for regulator_init_complete_work" then,
> > because now with the default 0 timeout some regulators gets disabled
> > during boot, before their supplies gets instantiated.
>
> Yes, please - I requested this when the revert was originally proposed :(

Oh, my apologies. I misunderstood what you were suggesting earlier.
Sorry for being thick headed.

I'll spin up a revert here shortly.

thanks
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-11 Thread John Stultz
On Fri, Aug 11, 2017 at 5:31 PM, John Stultz  wrote:
> On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang  wrote:
>>> If after Cong's fix, the issue still happens, could you help try the
>>> patch attached and collect all logs when you try the reproduce the
>>> issue? It would be great to have logs for both success case and the
>>> failure case.
>>>
>>> Thanks so much for your help.
>>>
>>
>> I think we have a potential fix for this issue.
>> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
>> possible that rt6->dst.dev points to loopback device while
>> rt6->rt6i_idev->dev points to a real device.
>> When the real device goes down, the current fib6 clean up code only
>> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
>> That leaves unreleased refcnt on the real device if rt6->dst.dev
>> points to loopback dev.
>>
>> The attached potential fix is tested by Martin and made sure it fixes his 
>> issue.
>>
>> John,
>> It will be great if you can also give it a try and see if it fixes the
>> issue on your side before I submit an official patch.
>
> So yes, sorry I haven't been able to get back quicker on the other
> patches sent, was mucking about in other work.
>
> So yea, this patch  (potential fix for unregister_netdevice()) seems
> to avoid the issue.
>
> I'm going to do some further testing, but its looking good so far.

Looks good so far! I've not hit the issue yet.

Thanks so much for sorting out a fix!

If its useful:
Tested-by: John Stultz 

thanks again
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-11 Thread John Stultz
On Fri, Aug 11, 2017 at 5:10 PM, Wei Wang  wrote:
>> If after Cong's fix, the issue still happens, could you help try the
>> patch attached and collect all logs when you try the reproduce the
>> issue? It would be great to have logs for both success case and the
>> failure case.
>>
>> Thanks so much for your help.
>>
>
> I think we have a potential fix for this issue.
> Martin and I found that when addrconf_dst_alloc() creates a rt6, it is
> possible that rt6->dst.dev points to loopback device while
> rt6->rt6i_idev->dev points to a real device.
> When the real device goes down, the current fib6 clean up code only
> checks for rt6->dst.dev and assumes rt6->rt6i_idev->dev is the same.
> That leaves unreleased refcnt on the real device if rt6->dst.dev
> points to loopback dev.
>
> The attached potential fix is tested by Martin and made sure it fixes his 
> issue.
>
> John,
> It will be great if you can also give it a try and see if it fixes the
> issue on your side before I submit an official patch.

So yes, sorry I haven't been able to get back quicker on the other
patches sent, was mucking about in other work.

So yea, this patch  (potential fix for unregister_netdevice()) seems
to avoid the issue.

I'm going to do some further testing, but its looking good so far.

Do you still want feedback on the previous changes?

thanks
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-10 Thread John Stultz
On Wed, Aug 9, 2017 at 10:41 PM, Wei Wang  wrote:
> Hi John,
>
> Is it possible to try the attached patch?

Thanks so much for the quick turn around!

So I dropped all the reverts you suggested, and applied this one
against 4.13-rc4, but I'm still seeing the problematic behavior.

> I am not sure if it actually fixes the issue. But I think it is worth a try.
> Also, could you get me all the ipv6 routes when you plug in the usb
> using "ip -6 route show"? (If you have multiple routing tables
> configured, could you dump them all?)

# ip -6 route show
2601:1c2:1002:83f0::/64 dev eth0  proto kernel  metric 256  expires
345599sec pref medium
fe80::/64 dev eth0  proto kernel  metric 256  pref medium
default via fe80::200:caff:fe11:2233 dev eth0  proto ra  metric 1024
expires 1799sec hoplimit 64 pref medium


After unplugging the device (and getting the unregister_netdevice errors):
# ip -6 route show
#


thanks
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-09 Thread John Stultz
On Wed, Aug 9, 2017 at 5:36 PM, Wei Wang  wrote:
> On Wed, Aug 9, 2017 at 4:44 PM, John Stultz  wrote:
>> On Wed, Aug 9, 2017 at 4:34 PM, Cong Wang  wrote:
>>> (Cc'ing Wei whose commit was blamed)
>>>
>>> On Mon, Aug 7, 2017 at 2:15 PM, John Stultz  wrote:
>>>> On Mon, Aug 7, 2017 at 2:05 PM, John Stultz  wrote:
>>>>> So, with recent testing with my HiKey board, I've been noticing some
>>>>> quirky behavior with my USB eth adapter.
>>>>>
>>>>> Basically, pluging the usb eth adapter in and then removing it, when
>>>>> plugging it back in I often find that its not detected, and the system
>>>>> slowly spits out the following message over and over:
>>>>>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
>>>>
>>>> The other bit is that after this starts printing, the board will no
>>>> longer reboot (it hangs continuing to occasionally print the above
>>>> message), and I have to manually reset the device.
>>>>
>>>
>>> So this warning is not temporarily shown but lasts until a reboot,
>>> right? If so it is a dst refcnt leak.
>>
>> Correct, once I get into the state it lasts until a reboot.
>>
>>> How reproducible is it for you? From my reading, it seems always
>>> reproduced when you unplug and plug your usb eth interface?
>>> Is there anything else involved? For example, network namespace.
>>
>> So with 4.13-rc3/4 I seem to trigger it easily, often with the first
>> unplug of the USB eth adapter.
>>
>> But as I get back closer to 4.12, it seemingly becomes harder to
>> trigger, but sometimes still happens.
>>
>> So far, I've not been able to trigger it with 4.12.
>>
>> I don't think network namespaces are involved?  Though its out of my
>> area, so AOSP may be using them these days.  Is there a simple way to
>> check?
>>
>> I'll also do another bisection to see if the bad point moves back any 
>> further.

So I went through another bisection around and got  9514528d92d4 ipv6:
call dst_dev_put() properly as the first bad commit again.

> If you see the problem starts to happen on commit
> 9514528d92d4cbe086499322370155ed69f5d06c, could you try reverting all
> the following commits:
> (from new to old)
> 1eb04e7c9e63 net: reorder all the dst flags
> a4c2fd7f7891 net: remove DST_NOCACHE flag
> b2a9c0ed75a3 net: remove DST_NOGC flag
> 5b7c9a8ff828 net: remove dst gc related code
> db916649b5dd ipv6: get rid of icmp6 dst garbage collector
> 587fea741134 ipv6: mark DST_NOGC and remove the operation of dst_free()
> ad65a2f05695 ipv6: call dst_hold_safe() properly
> 9514528d92d4 ipv6: call dst_dev_put() properly


And reverting this set off of 4.13-rc4 seems to make the issue go away.

Is there anything I can test to help narrow down the specific problem
with that patchset?

thanks
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-09 Thread John Stultz
On Wed, Aug 9, 2017 at 5:36 PM, Wei Wang  wrote:
>
> Does your USB adapter get an IPv6 address?

Yes, it does.

> If you see the problem starts to happen on commit
> 9514528d92d4cbe086499322370155ed69f5d06c, could you try reverting all
> the following commits:
> (from new to old)
> 1eb04e7c9e63 net: reorder all the dst flags
> a4c2fd7f7891 net: remove DST_NOCACHE flag
> b2a9c0ed75a3 net: remove DST_NOGC flag
> 5b7c9a8ff828 net: remove dst gc related code
> db916649b5dd ipv6: get rid of icmp6 dst garbage collector
> 587fea741134 ipv6: mark DST_NOGC and remove the operation of dst_free()
> ad65a2f05695 ipv6: call dst_hold_safe() properly
> 9514528d92d4 ipv6: call dst_dev_put() properly
>
> and try if it starts to work?

I'll give that a shot!

Thanks so much for the help! I really appreciate it!
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-09 Thread John Stultz
On Wed, Aug 9, 2017 at 4:34 PM, Cong Wang  wrote:
> (Cc'ing Wei whose commit was blamed)
>
> On Mon, Aug 7, 2017 at 2:15 PM, John Stultz  wrote:
>> On Mon, Aug 7, 2017 at 2:05 PM, John Stultz  wrote:
>>> So, with recent testing with my HiKey board, I've been noticing some
>>> quirky behavior with my USB eth adapter.
>>>
>>> Basically, pluging the usb eth adapter in and then removing it, when
>>> plugging it back in I often find that its not detected, and the system
>>> slowly spits out the following message over and over:
>>>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1
>>
>> The other bit is that after this starts printing, the board will no
>> longer reboot (it hangs continuing to occasionally print the above
>> message), and I have to manually reset the device.
>>
>
> So this warning is not temporarily shown but lasts until a reboot,
> right? If so it is a dst refcnt leak.

Correct, once I get into the state it lasts until a reboot.

> How reproducible is it for you? From my reading, it seems always
> reproduced when you unplug and plug your usb eth interface?
> Is there anything else involved? For example, network namespace.

So with 4.13-rc3/4 I seem to trigger it easily, often with the first
unplug of the USB eth adapter.

But as I get back closer to 4.12, it seemingly becomes harder to
trigger, but sometimes still happens.

So far, I've not been able to trigger it with 4.12.

I don't think network namespaces are involved?  Though its out of my
area, so AOSP may be using them these days.  Is there a simple way to
check?

I'll also do another bisection to see if the bad point moves back any further.

thanks
-john


Re: unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-07 Thread John Stultz
On Mon, Aug 7, 2017 at 2:05 PM, John Stultz  wrote:
> So, with recent testing with my HiKey board, I've been noticing some
> quirky behavior with my USB eth adapter.
>
> Basically, pluging the usb eth adapter in and then removing it, when
> plugging it back in I often find that its not detected, and the system
> slowly spits out the following message over and over:
>   unregister_netdevice: waiting for eth0 to become free. Usage count = 1

The other bit is that after this starts printing, the board will no
longer reboot (it hangs continuing to occasionally print the above
message), and I have to manually reset the device.

thanks
-john


unregister_netdevice: waiting for eth0 to become free. Usage count = 1

2017-08-07 Thread John Stultz
So, with recent testing with my HiKey board, I've been noticing some
quirky behavior with my USB eth adapter.

Basically, pluging the usb eth adapter in and then removing it, when
plugging it back in I often find that its not detected, and the system
slowly spits out the following message over and over:
  unregister_netdevice: waiting for eth0 to become free. Usage count = 1

I've tried to go through and bisect it, but apparently the issue isn't
always reproducible, as I'm apparently getting lots of false negatives
(where I can't always reproduce boot to boot the issue on the same
kernel).

I've done three bisection passes (always restarting with the "first
bad commit" from the previous bisection as the initial bad commit for
the following pass), and it does seem to keep moving back. But it
seems much easier to trigger with newer kernels then older (and so far
I've not seen it with 4.12).

Wanted to see if anyone had any ideas what might be going wrong, and
how I should further debug this.

The last bisect log I generated was:

# good: [6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c] Linux 4.12
git bisect good 6f7da290413ba713f0cdd9ff1a2a9bb129ef4f6c
# bad: [98fdd857a3bd6a3bf0003d3f68f07c25c85dcde3] net: ethernet: ti:
cpsw: move skb timestamp to packet_submit
git bisect bad 98fdd857a3bd6a3bf0003d3f68f07c25c85dcde3
# good: [48b6bbef9a1789f0365c1a385879a1fea4460016] Merge
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good 48b6bbef9a1789f0365c1a385879a1fea4460016
# good: [a2e8bbd2ef5457485f00b6b947bbbfa2778e5b1e] bpf: Fix
test_obj_id.c for llvm 5.0
git bisect good a2e8bbd2ef5457485f00b6b947bbbfa2778e5b1e
# good: [273889e306256e95ea55d5ebaef99310cf589def] Merge tag
'mlx5-updates-2017-06-16' of
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
git bisect good 273889e306256e95ea55d5ebaef99310cf589def
# bad: [8f46d46715a12f509e13200033a1ed4d6cf335ff] cxgb4: Use Firmware
params to get buffer-group map
git bisect bad 8f46d46715a12f509e13200033a1ed4d6cf335ff
# bad: [f5c306470ed0a8f03ba7017f397da2555b5800d4] Merge tag
'mlx5-updates-2017-06-20' of
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
git bisect bad f5c306470ed0a8f03ba7017f397da2555b5800d4
# bad: [e289ef0ded13021db292be9aef134451546e7c60] net: dsa: mv88e6xxx:
clarify SMI PHY functions
git bisect bad e289ef0ded13021db292be9aef134451546e7c60
# bad: [836d57e5c08e13bb206dcd559d96ee9355e8316e] liquidio: implement
vlan filter enable and disable
git bisect bad 836d57e5c08e13bb206dcd559d96ee9355e8316e
# bad: [ad65a2f05695aced349e308193c6e2a6b1d87112] ipv6: call
dst_hold_safe() properly
git bisect bad ad65a2f05695aced349e308193c6e2a6b1d87112
# good: [0830106c53900181d336350581119af09e123bf3] ipv4: take
dst->__refcnt when caching dst in fib
git bisect good 0830106c53900181d336350581119af09e123bf3
# good: [b838d5e1c5b6e57b10ec8af2268824041e3ea911] ipv4: mark DST_NOGC
and remove the operation of dst_free()
git bisect good b838d5e1c5b6e57b10ec8af2268824041e3ea911
# bad: [9514528d92d4cbe086499322370155ed69f5d06c] ipv6: call
dst_dev_put() properly
git bisect bad 9514528d92d4cbe086499322370155ed69f5d06c
# good: [1cfb71eeb12047bcdbd3e6730ffed66e810a0855] ipv6: take
dst->__refcnt for insertion into fib6 tree
git bisect good 1cfb71eeb12047bcdbd3e6730ffed66e810a0855
# first bad commit: [9514528d92d4cbe086499322370155ed69f5d06c] ipv6:
call dst_dev_put() properly


But again, reverting the "ipv6: call dst_dev_put() properly" commit
doesn't seem to completely resolve the issue on newer kernels (though
it may make it harder to trigger), and I suspect with further
bisection passes I might move further back.

Ideas?  I don't seem to have similar issues with USB mass storage
devices, so it seems to be networking specific.

thanks
-john


Re: [PATCH 04/12] fs: ceph: CURRENT_TIME with ktime_get_real_ts()

2017-06-01 Thread John Stultz
On Thu, Jun 1, 2017 at 5:26 PM, Yan, Zheng  wrote:
> On Thu, Jun 1, 2017 at 6:22 PM, Arnd Bergmann  wrote:
>> On Thu, Jun 1, 2017 at 11:56 AM, Yan, Zheng  wrote:
>>> On Sat, Apr 8, 2017 at 8:57 AM, Deepa Dinamani  
>>> wrote:
>>
 diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
 index 517838b..77204da 100644
 --- a/drivers/block/rbd.c
 +++ b/drivers/block/rbd.c
 @@ -1922,7 +1922,7 @@ static void rbd_osd_req_format_write(struct 
 rbd_obj_request *obj_request)
  {
 struct ceph_osd_request *osd_req = obj_request->osd_req;

 -   osd_req->r_mtime = CURRENT_TIME;
 +   ktime_get_real_ts(&osd_req->r_mtime);
 osd_req->r_data_offset = obj_request->offset;
  }

 diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
 index c681762..1d3fa90 100644
 --- a/fs/ceph/mds_client.c
 +++ b/fs/ceph/mds_client.c
 @@ -1666,6 +1666,7 @@ struct ceph_mds_request *
  ceph_mdsc_create_request(struct ceph_mds_client *mdsc, int op, int mode)
  {
 struct ceph_mds_request *req = kzalloc(sizeof(*req), GFP_NOFS);
 +   struct timespec ts;

 if (!req)
 return ERR_PTR(-ENOMEM);
 @@ -1684,7 +1685,8 @@ ceph_mdsc_create_request(struct ceph_mds_client 
 *mdsc, int op, int mode)
 init_completion(&req->r_safe_completion);
 INIT_LIST_HEAD(&req->r_unsafe_item);

 -   req->r_stamp = current_fs_time(mdsc->fsc->sb);
 +   ktime_get_real_ts(&ts);
 +   req->r_stamp = timespec_trunc(ts, mdsc->fsc->sb->s_time_gran);
>>>
>>> This change causes our kernel_untar_tar test case to fail (inode's
>>> ctime goes back). The reason is that there is time drift between the
>>> time stamps got by  ktime_get_real_ts() and current_time(). We need to
>>> revert this change until current_time() uses ktime_get_real_ts()
>>> internally.
>>
>> Hmm, the change was not supposed to have a user-visible effect, so
>> something has gone wrong, but I don't immediately see how it
>> relates to what you observe.
>>
>> ktime_get_real_ts() and current_time() use the same time base, there
>> is no drift, but there is a difference in resolution, as the latter uses
>> the time stamp of the last jiffies update, which may be up to one jiffy
>> (10ms) behind the exact time we put in the request stamps here.
>>
>> Do you still see problems if you use current_kernel_time() instead of
>> ktime_get_real_ts()?
>
> The problem disappears after using current_kernel_time().
>
> https://github.com/ceph/ceph-client/commit/2e0f648da23167034a3cf1500bc90ec60aef2417

>From the commit above:
"It seems there is time drift between ktime_get_real_ts() and
current_kernel_time()"

Its more of a granularity difference. current_kernel_time() returns
the cached time at the last tick, where as ktime_get_real_ts() reads
the clocksource hardware and returns the immediate time.

Filesystems usually use the cached time (similar to
CLOCK_REALTIME_COARSE), for performance reasons, as touching the
clocksource takes time.

thanks
-john


Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-08 Thread John Stultz
On Tue, Dec 6, 2016 at 10:23 AM, Tejun Heo  wrote:
> Hello,
>
> On Tue, Dec 06, 2016 at 10:13:53AM -0800, Andy Lutomirski wrote:
>> > Delegation is an explicit operation and reflected in the ownership of
>> > the subdirectories and cgroup interface files in them.  The
>> > subhierarchy containment is achieved by requiring the user who's
>> > trying to migrate a process to have write perm on cgroup.procs on the
>> > common ancestor of the source and target in addition to the target.
>>
>> OK, I see what you're doing.  That's interesting.
>
> It's something born out of usages of cgroup v1.  People used it that
> way (chowning files and directories) and combined with the uid checksn
> it yielded something which is useful sometimes, but it always had
> issues with hierarchical behaviors, which files to chmod and the weird
> combination of uid checks.  cgroup v2 has a clear delegation model but
> the uid checks are still left in as not changing was the default.
>
> It's not necessary and I'm thinking about queueing something like the
> following in the next cycle.
>
> As for the android CAP discussion, I think it'd be nice to share an
> existing CAP but if we can't find a good one to share, let's create a
> new one.

So just to clarify the discussion for my purposes and make sure I
understood, per-cgroup CAP rules was not desired, and instead we
should either utilize an existing cap (are there still objections to
CAP_SYS_RESOURCE? - this isn't clear to me) or create a new one (ie,
bring back the older CAP_CGROUP_MIGRATE patch).

Tejun: Do you have a more finished version of your patch that I should
add my changes on top of?

thanks
-john


Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-12-05 Thread John Stultz
On Tue, Nov 22, 2016 at 4:57 PM, John Stultz  wrote:
> On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski  wrote:
>> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>>  wrote:
>>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>>
>>>> I hate to say it, but I think I may see a problem.  Current
>>>> developments are afoot to make cgroups do more than resource control.
>>>> For example, there's Landlock and there's Daniel's ingress/egress
>>>> filter thing.  Current cgroup controllers can mostly just DoS their
>>>> controlled processes.  These new controllers (or controller-like
>>>> things) can exfiltrate data and change semantics.
>>>>
>>>> Does anyone have a security model in mind for these controllers and
>>>> the cgroups that they're attached to?  I'm reasonably confident that
>>>> CAP_SYS_RESOURCE is not the answer...
>>>
>>> and specifically the answer is... ?
>>> Also would be great if you start with specifying the question first
>>> and the problem you're trying to solve.
>>>
>>
>> I don't have a good answer right now.  Here are some constraints, though:
>>
>> 1. An insufficiently privileged process should not be able to move a
>> victim into a dangerous cgroup.
>>
>> 2. An insufficiently privileged process should not be able to move
>> itself into a dangerous cgroup and then use execve to gain privilege
>> such that the execve'd program can be compromised.
>>
>> 3. An insufficiently privileged process should not be able to make an
>> existing cgroup dangerous in a way that could compromise a victim in
>> that cgroup.
>>
>> 4. An insufficiently privileged process should not be able to make a
>> cgroup dangerous in a way that bypasses protections that would
>> otherwise protect execve() as used by itself or some other process in
>> that cgroup.
>>
>> Keep in mind that "dangerous" may apply to a cgroup's descendents in
>> addition to the cgroup being controlled.
>
> Sorry for taking awhile to get back to you here.  I'm a little
> befuddled as to what next steps I should consider (and honestly, I'm
> not totally sure I really grok your concern here, particularly what
> you mean with "dangrous cgroups").
>
> So is going back to the CAP_CGROUP_MIGRATE approach (to properly
> separate "sufficiently" from "insufficiently privileged") better?
>
> Or something closer to the original method Android used of each cgroup
> having an allow_attach() check which could determine what is
> sufficiently privledged for the respective level of danger the cgroup
> might poise?
>
> Or just stepping back, what method would you imagine to be reasonable
> to allow a specified task to migrate other tasks between cgroups
> without it having to be root/suid?

Any suggested feedback here?

thanks
-john


Re: [RESEND][PATCH v4] cgroup: Use CAP_SYS_RESOURCE to allow a process to migrate other tasks between cgroups

2016-11-22 Thread John Stultz
On Tue, Nov 8, 2016 at 4:12 PM, Andy Lutomirski  wrote:
> On Tue, Nov 8, 2016 at 4:03 PM, Alexei Starovoitov
>  wrote:
>> On Tue, Nov 08, 2016 at 03:51:40PM -0800, Andy Lutomirski wrote:
>>>
>>> I hate to say it, but I think I may see a problem.  Current
>>> developments are afoot to make cgroups do more than resource control.
>>> For example, there's Landlock and there's Daniel's ingress/egress
>>> filter thing.  Current cgroup controllers can mostly just DoS their
>>> controlled processes.  These new controllers (or controller-like
>>> things) can exfiltrate data and change semantics.
>>>
>>> Does anyone have a security model in mind for these controllers and
>>> the cgroups that they're attached to?  I'm reasonably confident that
>>> CAP_SYS_RESOURCE is not the answer...
>>
>> and specifically the answer is... ?
>> Also would be great if you start with specifying the question first
>> and the problem you're trying to solve.
>>
>
> I don't have a good answer right now.  Here are some constraints, though:
>
> 1. An insufficiently privileged process should not be able to move a
> victim into a dangerous cgroup.
>
> 2. An insufficiently privileged process should not be able to move
> itself into a dangerous cgroup and then use execve to gain privilege
> such that the execve'd program can be compromised.
>
> 3. An insufficiently privileged process should not be able to make an
> existing cgroup dangerous in a way that could compromise a victim in
> that cgroup.
>
> 4. An insufficiently privileged process should not be able to make a
> cgroup dangerous in a way that bypasses protections that would
> otherwise protect execve() as used by itself or some other process in
> that cgroup.
>
> Keep in mind that "dangerous" may apply to a cgroup's descendents in
> addition to the cgroup being controlled.

Sorry for taking awhile to get back to you here.  I'm a little
befuddled as to what next steps I should consider (and honestly, I'm
not totally sure I really grok your concern here, particularly what
you mean with "dangrous cgroups").

So is going back to the CAP_CGROUP_MIGRATE approach (to properly
separate "sufficiently" from "insufficiently privileged") better?

Or something closer to the original method Android used of each cgroup
having an allow_attach() check which could determine what is
sufficiently privledged for the respective level of danger the cgroup
might poise?

Or just stepping back, what method would you imagine to be reasonable
to allow a specified task to migrate other tasks between cgroups
without it having to be root/suid?

thanks
-john


Re: [PATCH v4 6/6] posix-timers: make it configurable

2016-11-15 Thread John Stultz
On Thu, Nov 10, 2016 at 9:10 PM, Nicolas Pitre  wrote:
> Some embedded systems have no use for them.  This removes about
> 25KB from the kernel binary size when configured out.
>
> Corresponding syscalls are routed to a stub logging the attempt to
> use those syscalls which should be enough of a clue if they were
> disabled without proper consideration. They are: timer_create,
> timer_gettime: timer_getoverrun, timer_settime, timer_delete,
> clock_adjtime, setitimer, getitimer, alarm.
>
> The clock_settime, clock_gettime, clock_getres and clock_nanosleep
> syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
> CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
> majority of use cases with very little code.
>
> Signed-off-by: Nicolas Pitre 
> Reviewed-by: Josh Triplett 
> Acked-by: Richard Cochran 
> Acked-by: Thomas Gleixner 

Ok.. ran the series through the kselftest/timers series and the results look ok.

Acked-by: John Stultz 

thanks
-john


Re: [PATCH v3 4/4] posix-timers: make it configurable

2016-11-08 Thread John Stultz
On Tue, Nov 8, 2016 at 10:19 AM, Nicolas Pitre  wrote:
> On Tue, 8 Nov 2016, John Stultz wrote:
>
>> One spot of concern is that the
>> tools/testing/selftests/timers/posix_timers.c test hangs testing
>> virtual itimers. Looking through the code I'm not seeing where an
>> error case is missed.
>>
>> The strace looks like:
>> ...
>> write(1, "Testing posix timers. False nega"..., 66Testing posix
>> timers. False negative may happen on CPU execution
>> ) = 66
>> write(1, "based timers if other threads ru"..., 48based timers if
>> other threads run on the CPU...
>> ) = 48
>> write(1, "Check itimer virtual... ", 24Check itimer virtual... ) = 24
>> rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
>> 0x7fb73306ccb0}, {SIG_DFL, [], 0}, 8) = 0
>> gettimeofday({1478710402, 937476}, NULL) = 0
>> setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
>> 
>>
>>
>> Where as with posix timers enabled:
>> ...
>> write(1, "Testing posix timers. False nega"..., 138Testing posix
>> timers. False negative may happen on CPU execution
>> based timers if other threads run on the CPU...
>> Check itimer virtual... ) = 138
>> rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
>> 0x7f231ba8ccb0}, {SIG_DFL, [], 0}, 8) = 0
>> gettimeofday({1478626751, 904856}, NULL) = 0
>> setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
>> --- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_KERNEL} ---
>> rt_sigreturn()  = 0
>
> I'll have a look.
>
>> So I suspect you were a little too aggressive with the #ifdefs around
>> the itimers/signal code, or we need to make sure we return an error on
>> the setitimer ITIMER_VIRTUAL case as well.
>
> Well, it seemed to me that with POSIX_TIMERS=n, all the code that would
> set up that signal is gone, so there was no point keeping the code to
> deliver it.
>
> Now... would it make more sense to remove itimer support as well when
> POSIX_TIMERS=n?  The same reasoning would apply.

Yes, returning an error with itimers seems needed if the signal bits
are missing.

Though I do worry that since getitimer/setitimer are older obsolete
interfaces which the posix timers api is supposed to replace, folks
might be surprised to see it removed when setting POSIX_TIMERS=n. So
some additional notes in the kconfig description may be needed.

thanks
-john


Re: [PATCH v3 4/4] posix-timers: make it configurable

2016-11-08 Thread John Stultz
On Mon, Nov 7, 2016 at 2:14 PM, Nicolas Pitre  wrote:
> Some embedded systems have no use for them.  This removes about
> 22KB from the kernel binary size when configured out.
>
> Corresponding syscalls are routed to a stub logging the attempt to
> use those syscalls which should be enough of a clue if they were
> disabled without proper consideration. They are: timer_create,
> timer_gettime: timer_getoverrun, timer_settime, timer_delete,
> clock_adjtime.
>
> The clock_settime, clock_gettime, clock_getres and clock_nanosleep
> syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
> CLOCK_MONOTONIC and CLOCK_BOOTTIME only which should cover the vast
> majority of use cases with very little code.
>
> Signed-off-by: Nicolas Pitre 
> Reviewed-by: Josh Triplett 
> Acked-by: Richard Cochran 

So I have no design objections to the patch overall.

I ran this through my timekeeping tests last night and it passed a
fair number of the tests, considering.

I of course see a lot of failures around timer_creates failing
(set-timer-lat), and cases where clockids aren't supported.
So I'll need to see about updating the tests to fail more gracefully
with this change.

One spot of concern is that the
tools/testing/selftests/timers/posix_timers.c test hangs testing
virtual itimers. Looking through the code I'm not seeing where an
error case is missed.

The strace looks like:
...
write(1, "Testing posix timers. False nega"..., 66Testing posix
timers. False negative may happen on CPU execution
) = 66
write(1, "based timers if other threads ru"..., 48based timers if
other threads run on the CPU...
) = 48
write(1, "Check itimer virtual... ", 24Check itimer virtual... ) = 24
rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
0x7fb73306ccb0}, {SIG_DFL, [], 0}, 8) = 0
gettimeofday({1478710402, 937476}, NULL) = 0
setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0



Where as with posix timers enabled:
...
write(1, "Testing posix timers. False nega"..., 138Testing posix
timers. False negative may happen on CPU execution
based timers if other threads run on the CPU...
Check itimer virtual... ) = 138
rt_sigaction(SIGVTALRM, {0x400a80, [VTALRM], SA_RESTORER|SA_RESTART,
0x7f231ba8ccb0}, {SIG_DFL, [], 0}, 8) = 0
gettimeofday({1478626751, 904856}, NULL) = 0
setitimer(ITIMER_VIRTUAL, {it_interval={0, 0}, it_value={2, 0}}, NULL) = 0
--- SIGVTALRM {si_signo=SIGVTALRM, si_code=SI_KERNEL} ---
rt_sigreturn()  = 0
gettimeofday({1478626753, 906137}, NULL) = 0
write(1, "[OK]\nCheck itimer prof... ", 26[OK]
...

So I suspect you were a little too aggressive with the #ifdefs around
the itimers/signal code, or we need to make sure we return an error on
the setitimer ITIMER_VIRTUAL case as well.

thanks
-john


[PATCH] asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions

2016-05-16 Thread John Stultz
In testing with HiKey, we found that since
commit 3f30b158eba5 ("asix: On RX avoid creating bad Ethernet
frames"),
we're seeing lots of noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation 
was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation 
was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation 
was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 
0x5638f7e2, offset 4

And network throughput ends up being pretty bursty and slow with
a overall throughput of at best ~30kB/s (where as previously we
got 1.1MB/s with the slower USB1.1 "full speed" host).

We found the issue also was reproducible on a x86_64 system,
using a "high-speed" USB2.0 port but the throughput did not
measurably drop (possibly due to the scp transfer being cpu
bound on my slow test hardware).

After lots of debugging, I found the check added in the
problematic commit seems to be calculating the offset
incorrectly.

In the normal case, in the main loop of the function, we do:
(where offset is zero, or set to "offset += (copy_length + 1) &
0xfffe" in the previous loop)
rx->header = get_unaligned_le32(skb->data +
offset);
offset += sizeof(u32);

But the problematic patch calculates:
offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
rx->header = get_unaligned_le32(skb->data + offset);

Adding some debug logic to check those offset calculation used
to find rx->header, the one in problematic code is always too
large by sizeof(u32).

Thus, this patch removes the incorrect " + sizeof(u32)" addition
in the problematic calculation, and resolves the issue.

Cc: Dean Jenkins 
Cc: "David B. Robins" 
Cc: Mark Craske 
Cc: Emil Goode 
Cc: "David S. Miller" 
Cc: YongQin Liu 
Cc: Guodong Xu 
Cc: Ivan Vecera 
Cc: linux-...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: stable  #4.4+
Reported-by: Yongqin Liu 
Signed-off-by: John Stultz 
---
 drivers/net/usb/asix_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
index 0c5c22b..7de5ab5 100644
--- a/drivers/net/usb/asix_common.c
+++ b/drivers/net/usb/asix_common.c
@@ -66,7 +66,7 @@ int asix_rx_fixup_internal(struct usbnet *dev, struct sk_buff 
*skb,
 * buffer.
 */
if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
-   offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
+   offset = ((rx->remaining + 1) & 0xfffe);
rx->header = get_unaligned_le32(skb->data + offset);
offset = 0;
 
-- 
1.9.1



Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-16 Thread John Stultz
On Wed, May 11, 2016 at 3:00 PM, Dean Jenkins  wrote:
>
> Your observations are consistent with missing URBs from the USB host
> controller.
>
> Here is a summary of what I think is happening in your case:
>
> Good case:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> URB #3: 988 octets of 1514 Ethernet frame (C)
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> Therefore, Ethernet frame (C) is spanning URBs #2 and #3.
>
> Bad case, URB #3 is lost:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> Remaining is 988
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> But when URB #4 is analysed the 32-bit Header word is not found after 988
> octets in the URB buffer so "sync lost".
> The end of Ethernet frame (C) is missing so drop the Ethernet frame.
> Now look at the start of the URB #4 buffer and find a 32-bit header word so
> Ethernet frame (D) can be consumed.
>
> So I think the commit is acting as intended and you are suffering from lost
> URBs.

No. I went digging on this for a bit longer, and it looks like its
just that you're calculating the offset wrong in your check.

I was wondering why without your patch we wouldn't see "Bad Header
Length" messages, since if the remaining was 988 and the skb->len was
2048 as seen in my logs, without your patch we should copy the 988
bytes out clear remaining and then continue processing the rest of the
skb, which calculates the header and checks the size. If we really
lost the URB, we should throw an error at that point, since really
we'd be midway through the following frame.  But we just don't see
that with your patch removed.

Looking more closely, in the main loop, we do:
(where offset is zero, or set to "offset += (copy_length + 1) &
0xfffe" in the previous loop)
rx->header = get_unaligned_le32(skb->data +
offset);
offset += sizeof(u32);

But your check calculates:
offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
rx->header = get_unaligned_le32(skb->data + offset);

Adding some debug logic to check those offset calculation used to find
rx->header, the one in your code is always too large by sizeof(u32).

So removing the extra addition in your offset calculation seems to
solve this for me.

I'll send out a patch here shortly.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins  wrote:
>
> [  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
> synchronisation was lost, remaining 988
>
> This error message consistently shows the remaining value to be 988, at
> least for the 3 examples provided by John. This does not suggest a random
> failure unless there are other examples of a non 988 remaining value error
> message. 988 is well within a Ethernet frame length so seems to be valid.
>
> I think a good step would be to add some debug to print the rx->remaining
> value at entry to asix_rx_fixup_internal(). This would generate a lot of
> debug but a pattern of the values might emerge.


So I've been trying to add some print messages here to better
understand whats going on.

Again, I'm a bit new to this code, so forgive me for my lack of
understanding things. Since the remaining value seems to be key, I
tried to look around and figure out where it was being set. It seems
like its only set in this function, is that right?  So this made me
guess something might be happening in a previous iteration that was
causing this to trigger.

I added some debug prints to every time we set the remaining value, or
modify it, as well as to print the value if we enter the fixup
function with a non-zero remaining value.

When we set the remaining value, usually its to 1514, when the skblen is 1518.

However, right before we catch the problem, I see this:


[   84.844337] JDB set remaining to 1514 (skblen: 1518)
[   84.844379] JDB set remaining to 1514 (skblen: 1518)
[   84.844429] JDB set remaining to 1514 (skblen: 1518)
[   84.844458] JDB set remaining to 1514 (skblen: 1518)
[   84.844483] JDB set remaining to 1514 (skblen: 1518)
[   84.844507] JDB set remaining to 1514 (skblen: 1518)
[   84.844559] JDB set remaining to 1514 (skblen: 2048)
[   84.844583] JDB set remaining to 1514 (skblen: 2048)
[   84.844595] JDB: 1514 > 2048 - 1522
[   84.844606] JDB: dropping remaining by 526
[   84.844624] asix_rx_fixup_internal()  remaining: 988,  skb->len: 2048
[   84.844672] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[   84.844945] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xdd5f8f9b, offset 4
[   84.845217] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3ce1ad3c, offset 4
[   84.845451] JDB set remaining to 1514 (skblen: 1518)
[   84.845485] JDB set remaining to 1514 (skblen: 1518)
[   84.845511] JDB set remaining to 1514 (skblen: 1518)
[   84.851003] JDB set remaining to 1514 (skblen: 1518)


So when the issue happens, it seems to be due to an larger then usual
skb (2048). The first time through the wile loop we set the remaining
to 1514, but offset is fairly small, so we set copy_length to 1514,
and clear remaining. The offset is bumped by a little more then the
copy length and we loop again.  Then the second time through we set
remaining to 1514, but since offset is bigger now, the if
(rx->remaining > skb->len - offset)  case is true..

This is where it feels a little strange..

We calculate the copy_length as the difference between the offset and
the skb->len (so how much is left in the skb, which is 526), then
decrement remaining by that amount.  Not really sure what remaining
(now 988) is supposed to represent here. We copy the 526 bytes, and
then exit the loop.

Now the next time we are called, we enter and we have a remaining
value still of 988, which triggers the header synchronization error
path.

Now, I'm not sure if the remainder handling logic is sane, or if the
skb->len being 2048 is problematic, or what.  The skb->lens can vary
in sizes, usually 1518 during high throughput, but I've seen 1588,
1640, and other larger numbers that don't trigger the same problem.

ie:
[  106.946473] JDB set remaining to 1514 (skblen: 1518)
[  106.946525] JDB set remaining to 1514 (skblen: 1640)
[  106.946546] JDB set remaining to 118 (skblen: 1640)
[  106.946586] JDB set remaining to 1514 (skblen: 1518)


So yea.. maybe that will help clue things in a bit? I'm still a bit lost. :)

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Tue, May 3, 2016 at 2:16 PM, Dean Jenkins  wrote:
> A good test would be to run "ping -c 1 -s $packet_length $ip_address" inside
> a script which has a loop with an increasing payload length $packet_length
> with a small delay between ping calls. This will show whether particular
> packet sizes trigger the failures.
>
> Then try with "ping -f -c 200 -s $packet_length $ip_address" to load up the
> USB link.

I've tried both of these on my x86_64 system.  I can send single pings
up to 65507 without triggering the issue (after which I get errors
sending on the host side as I think I cross a 64k boundary with
headers, not the asix errors).

Then when I try ping -f -c 200 -s 65507 $ip_address, I don't see any
failures. I did it for a count of 2000 as well without any issues.

I'll be adding more debug prints in soon.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Fri, May 6, 2016 at 8:00 AM, Dean Jenkins  wrote:
> My conclusion is that your USB to Ethernet Adaptor is not running at high
> speed (480Mbps) mode which is causing a partial loss (corruption) of
> Ethernet frames across the USB link. A USB Protocol Analyser or software
> tool usbmon could be used to confirm this scenario.
>
> Therefore please retest with a working high-speed USB hub or remove the
> full-speed USB hub from the test environment and directly connect the USB to
> Ethernet Adaptor to the root hub of the USB port. Then repeat the tests to
> see whether anything improved.
>
> In other words, you need to eliminate the dmesg messages saying "not running
> at top speed; connect to a high speed hub".

The aarch64 system has a quirk that at the moment limits it to the
slower full-speed mode, which also exacerbates the issue (basically
taking a fairly slow 1.1.Mb/s network connection without your patch,
to an almost unusable 30Kb/s with it).

But that isn't the case on the x86_64 system, which is seeing a very
similar problem (though the performance effect isn't nearly as bad, as
the error rate in time seems relatively similar on both, and I think
my scp transmissions are cpu bound on this atom board :).

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-06 Thread John Stultz
On Thu, May 5, 2016 at 1:11 AM, Dean Jenkins  wrote:
> On 05/05/16 00:45, John Stultz wrote:
>>
>> Just as a sample point, I have managed to reproduce exactly this issue
>> on an x86_64 system by simply scp'ing a large file.
>
> Please tell us the x86_64 kernel version number that you used and which
> Linux Distribution it was ? This allows other people a chance to reproduce
> your observations.

Sorry for being a little slow here, had some other issues I had to chase.

On my x86_64 system, its Ubuntu 14.04.4, with a 4.6.0-rc2 kernel.


>> [  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
>> synchronisation was lost, remaining 988
>
> It is interesting that the reported "remaining" value is 988. Is 988 always
> shown ? I mean that do you see any other "remaining" values for the "Data
> Header synchronisation was lost" error message ?

Yep. Its always the same 988 remaining, on either architecture.


>> [  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0xef830347, offset 4
>
> The gap in the timestamps shows 417.823415 - 417.819276 = 0.004139 = 4ms
> which is a large gap in terms of USB 2.0 high speed communications. This gap
> is expected to be in the 100us range for consecutive URBs. So 4ms is
> strange.
>
> The expectation is that the "Data Header synchronisation was lost" error
> message resets the 32-bit header word synchronisation to the start of the
> next URB buffer. The "Bad Header Length, offset 4" is the expected outcome
> for the next URB because it is unlikely the 32-bit header word is at the
> start of URB buffer due to Ethernet frames spanning across URBs.
>>
>> [  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
>> 0x31e2b348, offset 4
>
> Timestamps show the gap to be 4ms which is strange for USB 2.0 high speed,
> are you sure high speed mode is being used ?
>>

Yep, on my x86_64 system, it seems to be.

[3.101115] usb 1-5: new high-speed USB device number 2 using ehci-pci
[3.232309] usb 1-5: New USB device found, idVendor=0b95, idProduct=772b
[3.232327] usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[3.232339] usb 1-5: Product: AX88772B
[3.232350] usb 1-5: Manufacturer: ASIX Elec. Corp.
[3.232360] usb 1-5: SerialNumber: 188298
[4.032206] asix 1-5:1.0 eth1: register 'asix' at
usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98


> Please can you supply the output of ifconfig for the USB to Ethernet
> adaptor, your example above shows eth1 as the device.
>
> Please show the output of ifconfig eth1 before and after the issue is seen.
> This will show us whether the kernel logs any network errors and how many
> bytes have been transferred.

Before:
$ ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:50:b6:18:82:98
  inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global
  inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link
  inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:372 errors:0 dropped:0 overruns:0 frame:0
  TX packets:385 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:38523 (38.5 KB)  TX bytes:48801 (48.8 KB)


After:
$ ifconfig eth1
eth1  Link encap:Ethernet  HWaddr 00:50:b6:18:82:98
  inet addr:192.168.0.12  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: 2601:1c2:1002:83f0:250:b6ff:fe18:8298/64 Scope:Global
  inet6 addr: fe80::250:b6ff:fe18:8298/64 Scope:Link
  inet6 addr: 2601:1c2:1002:83f0:b0f0:71a0:6c7e:346b/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:151005 errors:169 dropped:0 overruns:0 frame:0
  TX packets:61351 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:225874384 (225.8 MB)  TX bytes:4431098 (4.4 MB)




> After the issue is seen, please can you show us the output of "dmesg | grep
> asix" so that we can see status messages from the ASIX driver that the USB
> to Ethernet adaptor is using. In particular we need to check that USB high
> speed operation (480Mbps) is being used and not full speed operation
> (12Mbps).


[2.766525] usbcore: registered new interface driver asix
[4.031443] asix 1-5:1.0 eth1: register 'asix' at
usb-:00:04.1-5, ASIX AX88772B USB 2.0 Ethernet, 00:50:b6:18:82:98
[   31.578983] asix 1-5:1.0 eth1: link down
[   33.244743] asix 1-5:1.0 eth1: link up, 100Mbps, full-duplex, lpa 0xCDE1
[  171.959244] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisat

Re: [Y2038] [RESEND PATCH 2/3] fs: poll/select/recvmmsg: use timespec64 for timeout events

2016-05-04 Thread John Stultz
On Wed, May 4, 2016 at 4:51 PM, Andrew Morton  wrote:
> On Wed, 04 May 2016 23:08:11 +0200 Arnd Bergmann  wrote:
>
>> > But I'm less comfortable making the call on this one. It looks
>> > relatively straight forward, but it would be good to have maintainer
>> > acks before I add it to my tree.
>>
>> Agreed. Feel free to add my
>>
>> Reviewed-by: Arnd Bergmann 
>>
>> at least (whoever picks it up).
>
> In reply to [1/3] John said
>
> : Looks ok at the first glance. I've queued these up for testing,
> : however I only got #1 and #3 of the set. Are you hoping these two
> : patches will go through tip/timers/core or are you looking for acks so
> : they can go via another tree?
>
> However none of the patches are in linux-next.
>
> John had qualms about [2/3], but it looks like a straightforward
> substitution in areas which will get plenty of testing

Yea. My main concern is just not stepping on any other maintainers toes.

> I'll grab the patches for now to get them some external testing.

I'd be just as happy if the set went through you Andrew.

For the set: Acked-by: John Stultz 

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-04 Thread John Stultz
On Tue, May 3, 2016 at 3:54 AM, Dean Jenkins  wrote:
> On 03/05/16 11:04, Guodong Xu wrote:
>>
>> did you test on ARM 64-bit system or ARM 32-bit? I ask because HiKey
>> is an ARM 64-bit system. I suggest we should be careful on that. I saw
>> similar issues when transferring to a 64-bit system in other net
>> drivers.
>
> We used 32-bit ARM and never tested on 64-bit ARM so I suggest that the
> commits need to be reviewed with 64-bit OS in mind.
>>
>>
>> Do you have any suggestion on this regard?
>
> Try testing on a Linux PC x86 32-bit OS which has has a kernel containing my
> ASIX commits. This will help to confirm whether the failure is related to
> 32-bit or 64-bit OS. Then try with Linux PC x86 64-bit OS, this should fail
> otherwise it points to something specific in your ARM 64-bit platform.

Just as a sample point, I have managed to reproduce exactly this issue
on an x86_64 system by simply scp'ing a large file.

[  417.819276] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.823415] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xef830347, offset 4
[  417.827502] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x31e2b348, offset 4
[  417.843779] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  417.847921] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8af91035, offset 4
[  417.852004] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x8521fa03, offset 4
[  418.273394] asix 1-5:1.0 eth1: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  418.277532] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x33cd9c7c, offset 4
[  418.281683] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x3d850896, offset 4
[  418.286227] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0x86443357, offset 4
[  418.290319] asix 1-5:1.0 eth1: asix_rx_fixup() Bad Header Length
0xee6c81d1, offset 4

I don't have any 32bit x86 installs around so I'm not sure I can easly
test there, but its clear its not arm64 specific.

thanks
-john


Re: [RESEND PATCH 2/3] fs: poll/select/recvmmsg: use timespec64 for timeout events

2016-05-04 Thread John Stultz
On Wed, May 4, 2016 at 12:24 PM, Deepa Dinamani  wrote:
> struct timespec is not y2038 safe.
> Even though timespec might be sufficient to represent
> timeouts, use struct timespec64 here as the plan is to
> get rid of all timespec reference in the kernel.
>
> The patch transitions the common functions:
> poll_select_set_timeout() and select_estimate_accuracy()
> to use timespec64. And, all the syscalls that use these
> functions are transitioned in the same patch.
>
> The restart block parameters for poll uses monotonic time.
> Use timespec64 here as well to assign timeout value. This
> parameter in the restart block need not change because
> this only holds the monotonic timestamp at which timeout
> should occur. And, unsigned long data type should be big
> enough for this timestamp.
>
> The system call interfaces will be handled in a separate
> series.
>
> Compat interfaces need not change as timespec64 is an
> alias to struct timespec on a 64 bit system.
>
> Signed-off-by: Deepa Dinamani 
> Cc: Alexander Viro 
> Cc: "David S. Miller" 
> Cc: netdev@vger.kernel.org
> ---
> Resending to include John and Thomas on this patch as well.
> This is to include this patch also in John's tree.
> This will let all 3 patches in the series to merged through the same tree.
>
>  fs/eventpoll.c   | 12 +-
>  fs/select.c  | 67 
> +---
>  include/linux/poll.h | 11 +
>  net/socket.c |  8 ---
>  4 files changed, 54 insertions(+), 44 deletions(-)


So with the other two patches, I'm more comfortable queueing and
sending through to Thomas.

But I'm less comfortable making the call on this one. It looks
relatively straight forward, but it would be good to have maintainer
acks before I add it to my tree.

thanks
-john


Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-03 Thread John Stultz
On Tue, May 3, 2016 at 7:42 AM, David B. Robins  wrote:
> On 2016-05-03 00:55, John Stultz wrote:
>>
>> Looking through the commits since the v4.1 kernel where we didn't see
>> this, I narrowed the regression down, and reverting the following two
>> commits seems to avoid the problem:
>>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>>
>
> I don't think the first one is giving you problems (except as triggered by
> the second) but I had concerns about the second myself (and emailed the
> author off-list, but received no reply), and we did not take that commit for
> our own product.

Yes, the first/later commit is being reverted as it modifies code that
was also changed by the second/earlier one. So the 3f30 patch doesn't
revert cleanly by itself, but I have tested by just removing the (now
modified by 6a57) chunk of code it adds does seem to avoid the problem
as well. Though I wasn't able to review things closely enough to be
confident that didn't introduce any subtle bugs in the remaining logic
in the 6a57 patch.

> Specifically, the second change, 3f30... (original patch:
> https://www.mail-archive.com/netdev@vger.kernel.org/msg80720.html) (1)
> appears to do the exact opposite of what it claims, i.e., instead of "resync
> if this looks like a header", it does "resync if this does NOT look like a
> (packet) header", where "looks like a header" means "bits 0-10 (size) are
> equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence
> for 1/2048 32-bit values starting a continuation URB (easy to hit dealing
> with large volumes of video data as we were). It appears to expect the
> header for every URB whereas the rest of the code at least expects it only
> once per network packet (look at following code that only reads it for
> remaining == 0).
>
> So that change made no sense to me, but I don't have significant kernel dev
> experience. Effectively it will drop/truncate every (2047/2048) split
> (longer than an URB) packet, and report an error for the second URB and then
> again for treating said second URB as a first URB for a packet. I would
> expect your problems will go away just removing the second change. You could
> also change the != to == in "if (size != ...)" but then you'd still have
> 1/2048 (depending on data patterns) false positives.

I'll have to look into this more. I'm not super familiar with usb or
networking, so I'm not sure I can judge the better approach.

thanks
-john


[REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-02 Thread John Stultz
In testing with HiKey, we found that since commit 3f30b158eba5c60
(asix: On RX avoid creating bad Ethernet frames), we're seeing lots of
noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header
synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length
0x5638f7e2, offset 4


And network throughput ends up being pretty bursty and slow with a
overall throughput of at best ~30kB/s.

Looking through the commits since the v4.1 kernel where we didn't see
this, I narrowed the regression down, and reverting the following two
commits seems to avoid the problem:

6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
if no RX netdev buffer
3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
bad Ethernet frames

With these reverted, we don't see all the error messages, and we see
better ~1.1MB/s throughput (I've got a mouse plugged in, so I think
the usb host is only running at "full-speed" mode here).

This worries me some, as the patches seem to describe trying to fix
the issue they seem to cause, so I suspect a revert isn't the correct
solution, but am not sure why we're having such trouble and the patch
authors did not.  I'd be happy to do further testing of patches if
folks have any ideas.

Originally Reported-by: Yongqin Liu 

thanks
-john


Re: [PATCH] wcn36xx: Set SMD timeout to 10 seconds

2016-04-21 Thread John Stultz
On Thu, Apr 21, 2016 at 2:09 PM, Bjorn Andersson
 wrote:
> After booting the wireless subsystem and uploading the NV blob to the
> WCNSS_CTRL service the remote continues to do things and will not start
> servicing wlan-requests for another 2-5 seconds (measured).
>
> The downstream code does not have any special handling for this case,
> but has a timeout of 10 seconds for the communication layer. By
> extending the wcn36xx timeout to match this we follows the same flow for
> the boot procedure and can successfully configure WiFi as wlan0 is
> registered.
>
> Signed-off-by: Bjorn Andersson 

I've been using this with my nexus7 tree, and its avoided issues I was
seeing without it.

Tested-by: John Stultz 

thanks
-john


[PATCHv2 0/8][GIT PULLv2] time: Cross-timestamp infrastructure for 4.6

2016-03-03 Thread John Stultz
Hey Thomas,
So again, here is Christopher's cross-timestamp
infrastructure patchset which I wanted to send along for 4.6.
(Including the minor tweaks you suggested). These apply
against tip/timers/core.

Let me know if you have any objections.

thanks
-john

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Jeff Kirsher 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org

The following changes since commit 232d26373d310a941ef2ab46e53ea62fe076ed13:

  jiffies: Use CLOCKSOURCE_MASK instead of constant (2016-02-27 08:55:31 +0100)

are available in the git repository at:

  https://git.linaro.org/people/john.stultz/linux.git fortglx/4.6/time

for you to fetch changes up to 01d7ada57ee9c735bd71fbe44ec0bcb70847afd4:

  e1000e: Adds hardware supported cross timestamp on e1000e nic (2016-03-03 
14:28:46 -0800)

Christopher S. Hall (8):
  time: Add cycles to nanoseconds translation
  time: Add timekeeping snapshot code capturing system time and counter
  time: Remove duplicated code in ktime_get_raw_and_real()
  time: Add driver cross timestamp interface for higher precision time
synchronization
  time: Add history to cross timestamp interface supporting slower
devices
  x86/tsc: Always Running Timer (ART) correlated clocksource
  ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
  e1000e: Adds hardware supported cross timestamp on e1000e nic

 Documentation/ptp/testptp.c |   6 +-
 arch/x86/include/asm/cpufeature.h   |   2 +-
 arch/x86/include/asm/tsc.h  |   2 +
 arch/x86/kernel/tsc.c   |  59 ++
 drivers/net/ethernet/intel/Kconfig  |   9 +
 drivers/net/ethernet/intel/e1000e/defines.h |   5 +
 drivers/net/ethernet/intel/e1000e/ptp.c |  85 +
 drivers/net/ethernet/intel/e1000e/regs.h|   4 +
 drivers/ptp/ptp_chardev.c   |  27 +++
 include/linux/pps_kernel.h  |  17 +-
 include/linux/ptp_clock_kernel.h|   8 +
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h |  58 ++
 include/uapi/linux/ptp_clock.h  |  13 +-
 kernel/time/timekeeping.c   | 286 +---
 15 files changed, 543 insertions(+), 40 deletions(-)

-- 
1.9.1



[PATCH 4/8] time: Add driver cross timestamp interface for higher precision time synchronization

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner
. It has changed considerably and any mistakes are
mine.

The precision with which events on multiple networked systems can be
synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
by the precision of the cross timestamps between the system clock and
the device (timestamp) clock. Precision here is the degree of
simultaneity when capturing the cross timestamp.

Currently the PTP cross timestamp is captured in software using the
PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
interleaved with reads of the realtime clock. At best, the precision
of this cross timestamp is on the order of several microseconds due to
software latencies. Sub-microsecond precision is required for
industrial control and some media applications. To achieve this level
of precision hardware supported cross timestamping is needed.

The function get_device_system_crosstimestamp() allows device drivers
to return a cross timestamp with system time properly scaled to
nanoseconds.  The realtime value is needed to discipline that clock
using PTP and the monotonic raw value is used for applications that
don't require a "real" time, but need an unadjusted clock time.  The
get_device_system_crosstimestamp() code calls back into the driver to
ensure that the system counter is within the current timekeeping
update interval.

Modern Intel hardware provides an Always Running Timer (ART) which is
exactly related to TSC through a known frequency ratio. The ART is
routed to devices on the system and is used to precisely and
simultaneously capture the device clock with the ART.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Reworked to remove extra structures and simplify calling]
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h | 35 
 kernel/time/timekeeping.c   | 56 +
 2 files changed, 91 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 7817591..4a2ca65 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -280,6 +280,41 @@ struct system_time_snapshot {
 };
 
 /*
+ * struct system_device_crosststamp - system/device cross-timestamp
+ * (syncronized capture)
+ * @device:Device time
+ * @sys_realtime:  Realtime simultaneous with device time
+ * @sys_monoraw:   Monotonic raw simultaneous with device time
+ */
+struct system_device_crosststamp {
+   ktime_t device;
+   ktime_t sys_realtime;
+   ktime_t sys_monoraw;
+};
+
+/*
+ * struct system_counterval_t - system counter value with the pointer to the
+ * corresponding clocksource
+ * @cycles:System counter value
+ * @cs:Clocksource corresponding to system counter value. Used 
by
+ * timekeeping code to verify comparibility of two cycle values
+ */
+struct system_counterval_t {
+   cycle_t cycles;
+   struct clocksource  *cs;
+};
+
+/*
+ * Get cross timestamp between system clock and device clock
+ */
+extern int get_device_system_crosststamp(
+   int (*get_time_fn)(ktime_t *device_time,
+   struct system_counterval_t *system_counterval,
+   void *ctx),
+   void *ctx,
+   struct system_device_crosststamp *xtstamp);
+
+/*
  * Simultaneously snapshot realtime and monotonic raw clocks
  */
 extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index af19a49..dba595c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -908,6 +908,62 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
 /**
+ * get_device_system_crosststamp - Synchronously capture system/device 
timestamp
+ * @sync_devicetime:   Callback to get simultaneous device time and
+ * system counter from the device driver
+ * @xtstamp:   Receives simultaneously captured system and device time
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_device_system_crosststamp(int (*get_time_fn)
+ (ktime_t *device_time,
+  struct system_counterval_t *sys_counterval,
+  void *ctx),
+ void *ctx,
+ struct system_device_crosststamp *xtstamp)
+{
+   struct system_counterval_t system_counterval;
+   

[PATCH 2/8] time: Add timekeeping snapshot code capturing system time and counter

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

In the current timekeeping code there isn't any interface to
atomically capture the current relationship between the system counter
and system time. ktime_get_snapshot() returns this triple (counter,
monotonic raw, realtime) in the system_time_snapshot struct.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Moved structure definitions around to clean things up,
 fixed cycles_t/cycle_t confusion.]
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h | 18 ++
 kernel/time/timekeeping.c   | 30 ++
 2 files changed, 48 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ec89d84..7817591 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -267,6 +267,24 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 
*ts_raw,
struct timespec64 *ts_real);
 
 /*
+ * struct system_time_snapshot - simultaneous raw/real time capture with
+ * counter value
+ * @cycles:Clocksource counter value to produce the system times
+ * @real:  Realtime system time
+ * @raw:   Monotonic raw system time
+ */
+struct system_time_snapshot {
+   cycle_t cycles;
+   ktime_t real;
+   ktime_t raw;
+};
+
+/*
+ * Simultaneously snapshot realtime and monotonic raw clocks
+ */
+extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
+
+/*
  * Persistent clock related interfaces
  */
 extern int persistent_clock_is_local;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4243d28..89b4695 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -874,6 +874,36 @@ time64_t __ktime_get_real_seconds(void)
return tk->xtime_sec;
 }
 
+/**
+ * ktime_get_snapshot - snapshots the realtime/monotonic raw clocks with 
counter
+ * @systime_snapshot:  pointer to struct receiving the system time snapshot
+ */
+void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   unsigned long seq;
+   ktime_t base_raw;
+   ktime_t base_real;
+   s64 nsec_raw;
+   s64 nsec_real;
+   cycle_t now;
+
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+
+   now = tk->tkr_mono.read(tk->tkr_mono.clock);
+   base_real = ktime_add(tk->tkr_mono.base,
+ tk_core.timekeeper.offs_real);
+   base_raw = tk->tkr_raw.base;
+   nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
+   nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
+   } while (read_seqcount_retry(&tk_core.seq, seq));
+
+   systime_snapshot->cycles = now;
+   systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
+   systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+}
+EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
 #ifdef CONFIG_NTP_PPS
 
-- 
1.9.1



[PATCH 3/8] time: Remove duplicated code in ktime_get_raw_and_real()

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

The code in ktime_get_snapshot() is a superset of the code in
ktime_get_raw_and_real() code. Further, ktime_get_raw_and_real() is
called only by the PPS code, pps_get_ts(). Consolidate the
pps_get_ts() code into a single function calling ktime_get_snapshot()
and eliminate ktime_get_raw_and_real(). A side effect of this is that
the raw and real results of pps_get_ts() correspond to exactly the
same clock cycle. Previously these values represented separate reads
of the system clock.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 include/linux/pps_kernel.h | 17 ++---
 kernel/time/timekeeping.c  | 40 ++--
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/include/linux/pps_kernel.h b/include/linux/pps_kernel.h
index 54bf148..35ac903 100644
--- a/include/linux/pps_kernel.h
+++ b/include/linux/pps_kernel.h
@@ -111,22 +111,17 @@ static inline void timespec_to_pps_ktime(struct pps_ktime 
*kt,
kt->nsec = ts.tv_nsec;
 }
 
-#ifdef CONFIG_NTP_PPS
-
 static inline void pps_get_ts(struct pps_event_time *ts)
 {
-   ktime_get_raw_and_real_ts64(&ts->ts_raw, &ts->ts_real);
-}
+   struct system_time_snapshot snap;
 
-#else /* CONFIG_NTP_PPS */
-
-static inline void pps_get_ts(struct pps_event_time *ts)
-{
-   ktime_get_real_ts64(&ts->ts_real);
+   ktime_get_snapshot(&snap);
+   ts->ts_real = ktime_to_timespec64(snap.real);
+#ifdef CONFIG_NTP_PPS
+   ts->ts_raw = ktime_to_timespec64(snap.raw);
+#endif
 }
 
-#endif /* CONFIG_NTP_PPS */
-
 /* Subtract known time delay from PPS event time(s) */
 static inline void pps_sub_ts(struct pps_event_time *ts, struct timespec64 
delta)
 {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 89b4695..af19a49 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -888,6 +888,8 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
s64 nsec_real;
cycle_t now;
 
+   WARN_ON_ONCE(timekeeping_suspended);
+
do {
seq = read_seqcount_begin(&tk_core.seq);
 
@@ -905,44 +907,6 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
-#ifdef CONFIG_NTP_PPS
-
-/**
- * ktime_get_raw_and_real_ts64 - get day and raw monotonic time in timespec 
format
- * @ts_raw:pointer to the timespec to be set to raw monotonic time
- * @ts_real:   pointer to the timespec to be set to the time of day
- *
- * This function reads both the time of day and raw monotonic time at the
- * same time atomically and stores the resulting timestamps in timespec
- * format.
- */
-void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw, struct timespec64 
*ts_real)
-{
-   struct timekeeper *tk = &tk_core.timekeeper;
-   unsigned long seq;
-   s64 nsecs_raw, nsecs_real;
-
-   WARN_ON_ONCE(timekeeping_suspended);
-
-   do {
-   seq = read_seqcount_begin(&tk_core.seq);
-
-   *ts_raw = tk->raw_time;
-   ts_real->tv_sec = tk->xtime_sec;
-   ts_real->tv_nsec = 0;
-
-   nsecs_raw  = timekeeping_get_ns(&tk->tkr_raw);
-   nsecs_real = timekeeping_get_ns(&tk->tkr_mono);
-
-   } while (read_seqcount_retry(&tk_core.seq, seq));
-
-   timespec64_add_ns(ts_raw, nsecs_raw);
-   timespec64_add_ns(ts_real, nsecs_real);
-}
-EXPORT_SYMBOL(ktime_get_raw_and_real_ts64);
-
-#endif /* CONFIG_NTP_PPS */
-
 /**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:pointer to the timeval to be set
-- 
1.9.1



[PATCH 1/8] time: Add cycles to nanoseconds translation

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result clocksource read() method internal to
the timekeeping code. The added function timekeeping_cycles_to_ns()
calculated a nanosecond value from a cycle count that can be added to
tk_read_base.base value yielding the current system time. This allows
clocksource cycle values external to the timekeeping code to provide a
cycle count that can be transformed to system time.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34b4ced..4243d28 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -298,17 +298,34 @@ u32 (*arch_gettimeoffset)(void) = 
default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
+static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+ cycle_t delta)
+{
+   s64 nsec;
+
+   nsec = delta * tkr->mult + tkr->xtime_nsec;
+   nsec >>= tkr->shift;
+
+   /* If arch requires, add in get_arch_timeoffset() */
+   return nsec + arch_gettimeoffset();
+}
+
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
cycle_t delta;
-   s64 nsec;
 
delta = timekeeping_get_delta(tkr);
+   return timekeeping_delta_to_ns(tkr, delta);
+}
 
-   nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
+static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
+   cycle_t cycles)
+{
+   cycle_t delta;
 
-   /* If arch requires, add in get_arch_timeoffset() */
-   return nsec + arch_gettimeoffset();
+   /* calculate the delta since the last update_wall_time */
+   delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+   return timekeeping_delta_to_ns(tkr, delta);
 }
 
 /**
-- 
1.9.1



[PATCH 5/8] time: Add history to cross timestamp interface supporting slower devices

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

Another representative use case of time sync and the correlated
clocksource (in addition to PTP noted above) is PTP synchronized
audio.

In a streaming application, as an example, samples will be sent and/or
received by multiple devices with a presentation time that is in terms
of the PTP master clock. Synchronizing the audio output on these
devices requires correlating the audio clock with the PTP master
clock. The more precise this correlation is, the better the audio
quality (i.e. out of sync audio sounds bad).

>From an application standpoint, to correlate the PTP master clock with
the audio device clock, the system clock is used as a intermediate
timebase. The transforms such an application would perform are:

System Clock <-> Audio clock
System Clock <-> Network Device Clock [<-> PTP Master Clock]

Modern Intel platforms can perform a more accurate cross timestamp in
hardware (ART,audio device clock).  The audio driver requires
ART->system time transforms -- the same as required for the network
driver. These platforms offload audio processing (including
cross-timestamps) to a DSP which to ensure uninterrupted audio
processing, communicates and response to the host only once every
millsecond. As a result is takes up to a millisecond for the DSP to
receive a request, the request is processed by the DSP, the audio
output hardware is polled for completion, the result is copied into
shared memory, and the host is notified. All of these operation occur
on a millisecond cadence.  This transaction requires about 2 ms, but
under heavier workloads it may take up to 4 ms.

Adding a history allows these slow devices the option of providing an
ART value outside of the current interval. In this case, the callback
provided is an accessor function for the previously obtained counter
value. If get_system_device_crosststamp() receives a counter value
previous to cycle_last, it consults the history provided as an
argument in history_ref and interpolates the realtime and monotonic
raw system time using the provided counter value. If there are any
clock discontinuities, e.g. from calling settimeofday(), the monotonic
raw time is interpolated in the usual way, but the realtime clock time
is adjusted by scaling the monotonic raw adjustment.

When an accessor function is used a history argument *must* be
provided. The history is initialized using ktime_get_snapshot() and
must be called before the counter values are read.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Fixed up cycles_t/cycle_t type confusion]
Signed-off-by: John Stultz 
---
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h |   5 ++
 kernel/time/timekeeping.c   | 171 +++-
 3 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 2524722..e880054 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -50,6 +50,7 @@ struct tk_read_base {
  * @offs_tai:  Offset clock monotonic -> clock tai
  * @tai_offset:The current UTC to TAI offset in seconds
  * @clock_was_set_seq: The sequence number of clock was set events
+ * @cs_was_changed_seq:The sequence number of clocksource change events
  * @next_leap_ktime:   CLOCK_MONOTONIC time value of a pending leap-second
  * @raw_time:  Monotonic raw base time in timespec64 format
  * @cycle_interval:Number of clock cycles in one NTP interval
@@ -91,6 +92,7 @@ struct timekeeper {
ktime_t offs_tai;
s32 tai_offset;
unsigned intclock_was_set_seq;
+   u8  cs_was_changed_seq;
ktime_t next_leap_ktime;
struct timespec64   raw_time;
 
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 4a2ca65..96f37be 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -272,11 +272,15 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 
*ts_raw,
  * @cycles:Clocksource counter value to produce the system times
  * @real:  Realtime system time
  * @raw:   Monotonic raw system time
+ * @clock_was_set_seq: The sequence number of clock was set events
+ * @cs_was_changed_seq:The sequence number of clocksource change events
  */
 struct system_time_snapshot {
cycle_t cycles;
ktime_t real;
ktime_t raw;
+   unsigned intclock_was_set_seq;
+   u8  cs_was_changed_seq;
 };
 
 /*
@@ -312,6 +316,7 @@ ext

[PATCH 7/8] ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

Currently, network /system cross-timestamping is performed in the
PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday() and
the gettime64() callback provided by the driver. The cross-timestamp
is best effort where the latency between the capture of system time
(getnstimeofday()) and the device time (driver callback) may be
significant.

The getcrosststamp() callback and corresponding PTP_SYS_OFFSET_PRECISE
ioctl allows the driver to perform this device/system correlation when
for example cross timestamp hardware is available. Modern Intel
systems can do this for onboard Ethernet controllers using the ART
counter. There is virtually zero latency between captures of the ART
and network device clock.

The capabilities ioctl (PTP_CLOCK_GETCAPS), is augmented allowing
applications to query whether or not drivers implement the
getcrosststamp callback, providing more precise cross timestamping.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Acked-by: Richard Cochran 
Signed-off-by: Christopher S. Hall 
[jstultz: Commit subject tweaks]
Signed-off-by: John Stultz 
---
 Documentation/ptp/testptp.c  |  6 --
 drivers/ptp/ptp_chardev.c| 27 +++
 include/linux/ptp_clock_kernel.h |  8 
 include/uapi/linux/ptp_clock.h   | 13 -
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 6c6247a..d99012f 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -277,13 +277,15 @@ int main(int argc, char *argv[])
   "  %d external time stamp channels\n"
   "  %d programmable periodic signals\n"
   "  %d pulse per second\n"
-  "  %d programmable pins\n",
+  "  %d programmable pins\n"
+  "  %d cross timestamping\n",
   caps.max_adj,
   caps.n_alarm,
   caps.n_ext_ts,
   caps.n_per_out,
   caps.pps,
-  caps.n_pins);
+  caps.n_pins,
+  caps.cross_timestamping);
}
}
 
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index da7bae9..579fd65 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ptp_private.h"
 
@@ -120,11 +121,13 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
struct ptp_clock_caps caps;
struct ptp_clock_request req;
struct ptp_sys_offset *sysoff = NULL;
+   struct ptp_sys_offset_precise precise_offset;
struct ptp_pin_desc pd;
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
struct ptp_clock_info *ops = ptp->info;
struct ptp_clock_time *pct;
struct timespec64 ts;
+   struct system_device_crosststamp xtstamp;
int enable, err = 0;
unsigned int i, pin_index;
 
@@ -138,6 +141,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
caps.n_per_out = ptp->info->n_per_out;
caps.pps = ptp->info->pps;
caps.n_pins = ptp->info->n_pins;
+   caps.cross_timestamping = ptp->info->getcrosststamp != NULL;
if (copy_to_user((void __user *)arg, &caps, sizeof(caps)))
err = -EFAULT;
break;
@@ -180,6 +184,29 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
err = ops->enable(ops, &req, enable);
break;
 
+   case PTP_SYS_OFFSET_PRECISE:
+   if (!ptp->info->getcrosststamp) {
+   err = -EOPNOTSUPP;
+   break;
+   }
+   err = ptp->info->getcrosststamp(ptp->info, &xtstamp);
+   if (err)
+   break;
+
+   ts = ktime_to_timespec64(xtstamp.device);
+   precise_offset.device.sec = ts.tv_sec;
+   precise_offset.device.nsec = ts.tv_nsec;
+   ts = ktime_to_timespec64(xtstamp.sys_realtime);
+   precise_offset.sys_realtime.sec = ts.tv_sec;
+   precise_offset.sys_realtime.nsec = ts.tv_nsec;
+   ts = ktime_to_timespec64(xtstamp.sys_monoraw);
+   precise_offset.sys_monoraw.sec = ts.t

[PATCH 6/8] x86/tsc: Always Running Timer (ART) correlated clocksource

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneous to the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.

On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:

TSC_value = (ART_value * m) / n + k [n >= 1]

[k is an offset that can adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build
fixes]
Signed-off-by: John Stultz 
---
 arch/x86/include/asm/cpufeature.h |  2 +-
 arch/x86/include/asm/tsc.h|  2 ++
 arch/x86/kernel/tsc.c | 59 +++
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 7ad8c94..ff557b4 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -85,7 +85,7 @@
 #define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */
 #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */
 #define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */
-/* free, was #define X86_FEATURE_FXSAVE_LEAK ( 3*32+10) * "" FXSAVE leaks 
FOP/FIP/FOP */
+#define X86_FEATURE_ART(3*32+10) /* Platform has always 
running timer (ART) */
 #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */
 #define X86_FEATURE_PEBS   ( 3*32+12) /* Precise-Event Based Sampling */
 #define X86_FEATURE_BTS( 3*32+13) /* Branch Trace Store */
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 6d7c547..174c421 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -29,6 +29,8 @@ static inline cycles_t get_cycles(void)
return rdtsc();
 }
 
+extern struct system_counterval_t convert_art_to_tsc(cycle_t art);
+
 extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3d743da..80d761e 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -43,6 +43,11 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static u32 art_to_tsc_numerator;
+static u32 art_to_tsc_denominator;
+static u64 art_to_tsc_offset;
+struct clocksource *art_related_clocksource;
+
 /*
  * Use a ring-buffer like data structure, where a writer advances the head by
  * writing a new data entry and a reader advances the tail when it observes a
@@ -964,6 +969,37 @@ core_initcall(cpufreq_tsc);
 
 #endif /* CONFIG_CPU_FREQ */
 
+#define ART_CPUID_LEAF (0x15)
+#define ART_MIN_DENOMINATOR (1)
+
+
+/*
+ * If ART is present detect the numerator:denominator to convert to TSC
+ */
+static void detect_art(void)
+{
+   unsigned int unused[2];
+
+   if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
+   return;
+
+   cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+ &art_to_tsc_numerator, unused, unused+1);
+
+   /* Don't enable ART in a VM, non-stop TSC required */
+   if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
+   !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
+   art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+   return;
+
+   if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
+   return;
+
+   /* Make this sticky over multiple CPU init calls */
+   setup_force_cpu_cap(X86_FEATURE_ART);
+}
+
+
 /* clocksource code */
 
 static struct clocksource clocksource_tsc;
@@ -1071,6 +1107,25 @@ int unsynchronized_tsc(void)
return 0;
 }
 
+/*
+ * Convert ART to TSC given numerator/denominator found in detect_art()
+ */
+struct system_counterval_t convert_art_to_tsc(cycle_t art)
+{
+   u64 tmp, res, rem;
+
+   rem = do_div(art, art_to_tsc_denominator);
+
+   res = art * art_to_tsc_numerator;
+   tmp = rem * art_to_tsc_numerator;
+
+   do_div(tmp, art_to_tsc_denominator);
+   res += tmp + art_to_tsc_offset;
+
+   return (struct system_counterval_t) {.cs = art_related_clocksource,
+   .cycles = res};
+}
+EXPORT_SYMBOL(convert_art_to_tsc);
 
 static void tsc_refine_calibration_work(struct work

[PATCH 8/8] e1000e: Adds hardware supported cross timestamp on e1000e nic

2016-03-03 Thread John Stultz
From: "Christopher S. Hall" 

Modern Intel systems supports cross timestamping of the network device
clock and Always Running Timer (ART) in hardware.  This allows the
device time and system time to be precisely correlated. The timestamp
pair is returned through e1000e_phc_get_syncdevicetime() used by
get_system_device_crosststamp().  The hardware cross-timestamp result
is made available to applications through the PTP_SYS_OFFSET_PRECISE
ioctl which calls e1000e_phc_getcrosststamp().

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Jeff Kirsher 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Acked-by: Jeff Kirsher 
Signed-off-by: Christopher S. Hall 
[jstultz: Reworked to use new interface, commit message tweaks]
Signed-off-by: John Stultz 
---
 drivers/net/ethernet/intel/Kconfig  |  9 +++
 drivers/net/ethernet/intel/e1000e/defines.h |  5 ++
 drivers/net/ethernet/intel/e1000e/ptp.c | 85 +
 drivers/net/ethernet/intel/e1000e/regs.h|  4 ++
 4 files changed, 103 insertions(+)

diff --git a/drivers/net/ethernet/intel/Kconfig 
b/drivers/net/ethernet/intel/Kconfig
index fa593dd..3772f3a 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -83,6 +83,15 @@ config E1000E
  To compile this driver as a module, choose M here. The module
  will be called e1000e.
 
+config E1000E_HWTS
+   bool "Support HW cross-timestamp on PCH devices"
+   default y
+   depends on E1000E && X86
+   ---help---
+Say Y to enable hardware supported cross-timestamping on PCH
+devices. The cross-timestamp is available through the PTP clock
+driver precise cross-timestamp ioctl (PTP_SYS_OFFSET_PRECISE).
+
 config IGB
tristate "Intel(R) 82575/82576 PCI-Express Gigabit Ethernet support"
depends on PCI
diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
b/drivers/net/ethernet/intel/e1000e/defines.h
index f7c7804..0641c00 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -528,6 +528,11 @@
 #define E1000_RXCW_C  0x2000/* Receive config */
 #define E1000_RXCW_SYNCH  0x4000/* Receive config synch */
 
+/* HH Time Sync */
+#define E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK  0xF000 /* max delay */
+#define E1000_TSYNCTXCTL_SYNC_COMP 0x4000 /* sync complete */
+#define E1000_TSYNCTXCTL_START_SYNC0x8000 /* initiate sync */
+
 #define E1000_TSYNCTXCTL_VALID 0x0001 /* Tx timestamp valid */
 #define E1000_TSYNCTXCTL_ENABLED   0x0010 /* enable Tx timestamping */
 
diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c 
b/drivers/net/ethernet/intel/e1000e/ptp.c
index 25a0ad5..e2ff3ef 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -26,6 +26,12 @@
 
 #include "e1000.h"
 
+#ifdef CONFIG_E1000E_HWTS
+#include 
+#include 
+#include 
+#endif
+
 /**
  * e1000e_phc_adjfreq - adjust the frequency of the hardware clock
  * @ptp: ptp clock structure
@@ -98,6 +104,78 @@ static int e1000e_phc_adjtime(struct ptp_clock_info *ptp, 
s64 delta)
return 0;
 }
 
+#ifdef CONFIG_E1000E_HWTS
+#define MAX_HW_WAIT_COUNT (3)
+
+/**
+ * e1000e_phc_get_syncdevicetime - Callback given to timekeeping code reads 
system/device registers
+ * @device: current device time
+ * @system: system counter value read synchronously with device time
+ * @ctx: context provided by timekeeping code
+ *
+ * Read device and system (ART) clock simultaneously and return the corrected
+ * clock values in ns.
+ **/
+static int e1000e_phc_get_syncdevicetime(ktime_t *device,
+struct system_counterval_t *system,
+void *ctx)
+{
+   struct e1000_adapter *adapter = (struct e1000_adapter *)ctx;
+   struct e1000_hw *hw = &adapter->hw;
+   unsigned long flags;
+   int i;
+   u32 tsync_ctrl;
+   cycle_t dev_cycles;
+   cycle_t sys_cycles;
+
+   tsync_ctrl = er32(TSYNCTXCTL);
+   tsync_ctrl |= E1000_TSYNCTXCTL_START_SYNC |
+   E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK;
+   ew32(TSYNCTXCTL, tsync_ctrl);
+   for (i = 0; i < MAX_HW_WAIT_COUNT; ++i) {
+   udelay(1);
+   tsync_ctrl = er32(TSYNCTXCTL);
+   if (tsync_ctrl & E1000_TSYNCTXCTL_SYNC_COMP)
+   break;
+   }
+
+   if (i == MAX_HW_WAIT_COUNT)
+   return -ETIMEDOUT;
+
+   dev_cycles = er32(SYSSTMPH);
+   dev_cycles <<= 32;
+   dev_cycles |= er32(SYSSTMPL);
+   spin_lock_irqsave(&adapter->systim_lock, flags);
+   *device = ns_to_ktime(timecounter_cyc2time(&adapter

Re: [PATCH 8/8] net: e1000e: Adds hardware supported cross timestamp on e1000e nic

2016-03-03 Thread John Stultz
On Thu, Mar 3, 2016 at 2:27 PM, Jeff Kirsher
 wrote:
> Since you are making changes to patch 6's title, then you should fix
> this patch title as well, it should be:
>
> [PATCH 8/8] e1000e: Adds hardware supported...

Ack
-john


Re: [PATCH 6/8] x86: tsc: Always Running Timer (ART) correlated clocksource

2016-03-03 Thread John Stultz
On Thu, Mar 3, 2016 at 2:21 AM, Thomas Gleixner  wrote:
> On Wed, 2 Mar 2016, John Stultz wrote:
>
> Subject: x86: tsc: Always Running ...
>
> Please make that:
>
> Subject: x86/tsc: Always Running ...

Will do.

>
>>
>> +#else
>> +
>> +#define detect_art()
>
> Inline stub if at all. Why sits detect_art() under CONFIG_CPU_FREQ while the
> rest of the art code is not

Ok. Just double checked with Christopher and the code is ok to move
out from CONFIG_CPU_FREQ.

I'll respin and resend the set later today.

thanks
-john


[PATCH 2/8] time: Add timekeeping snapshot code capturing system time and counter

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

In the current timekeeping code there isn't any interface to
atomically capture the current relationship between the system counter
and system time. ktime_get_snapshot() returns this triple (counter,
monotonic raw, realtime) in the system_time_snapshot struct.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Moved structure definitions around to clean things up,
 fixed cycles_t/cycle_t confusion.]
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h | 18 ++
 kernel/time/timekeeping.c   | 30 ++
 2 files changed, 48 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ec89d84..7817591 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -267,6 +267,24 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 
*ts_raw,
struct timespec64 *ts_real);
 
 /*
+ * struct system_time_snapshot - simultaneous raw/real time capture with
+ * counter value
+ * @cycles:Clocksource counter value to produce the system times
+ * @real:  Realtime system time
+ * @raw:   Monotonic raw system time
+ */
+struct system_time_snapshot {
+   cycle_t cycles;
+   ktime_t real;
+   ktime_t raw;
+};
+
+/*
+ * Simultaneously snapshot realtime and monotonic raw clocks
+ */
+extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
+
+/*
  * Persistent clock related interfaces
  */
 extern int persistent_clock_is_local;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4243d28..89b4695 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -874,6 +874,36 @@ time64_t __ktime_get_real_seconds(void)
return tk->xtime_sec;
 }
 
+/**
+ * ktime_get_snapshot - snapshots the realtime/monotonic raw clocks with 
counter
+ * @systime_snapshot:  pointer to struct receiving the system time snapshot
+ */
+void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
+{
+   struct timekeeper *tk = &tk_core.timekeeper;
+   unsigned long seq;
+   ktime_t base_raw;
+   ktime_t base_real;
+   s64 nsec_raw;
+   s64 nsec_real;
+   cycle_t now;
+
+   do {
+   seq = read_seqcount_begin(&tk_core.seq);
+
+   now = tk->tkr_mono.read(tk->tkr_mono.clock);
+   base_real = ktime_add(tk->tkr_mono.base,
+ tk_core.timekeeper.offs_real);
+   base_raw = tk->tkr_raw.base;
+   nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
+   nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
+   } while (read_seqcount_retry(&tk_core.seq, seq));
+
+   systime_snapshot->cycles = now;
+   systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
+   systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
+}
+EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
 #ifdef CONFIG_NTP_PPS
 
-- 
1.9.1



[PATCH 0/8][GIT PULL] time: Cross-timestamp infrastructure for 4.6

2016-03-02 Thread John Stultz
Hey Thomas,
So here is Christopher's cross-timestamp infrastructure
patchset which I wanted to send along for 4.6. These apply against
tip/timers/core.

Let me know if you have any objections.

thanks
-john

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Jeff Kirsher 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org


The following changes since commit 232d26373d310a941ef2ab46e53ea62fe076ed13:

  jiffies: Use CLOCKSOURCE_MASK instead of constant (2016-02-27 08:55:31 +0100)

are available in the git repository at:

  https://git.linaro.org/people/john.stultz/linux.git fortglx/4.6/time

for you to fetch changes up to dbfa7fb88d3ab62268b36ee26b2987a01f72a6c9:

  net: e1000e: Adds hardware supported cross timestamp on e1000e nic 
(2016-03-02 17:44:58 -0800)


Christopher S. Hall (8):
  time: Add cycles to nanoseconds translation
  time: Add timekeeping snapshot code capturing system time and counter
  time: Remove duplicated code in ktime_get_raw_and_real()
  time: Add driver cross timestamp interface for higher precision time
synchronization
  time: Add history to cross timestamp interface supporting slower
devices
  x86: tsc: Always Running Timer (ART) correlated clocksource
  ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping
  net: e1000e: Adds hardware supported cross timestamp on e1000e nic

 Documentation/ptp/testptp.c |   6 +-
 arch/x86/include/asm/cpufeature.h   |   2 +-
 arch/x86/include/asm/tsc.h  |   2 +
 arch/x86/kernel/tsc.c   |  62 ++
 drivers/net/ethernet/intel/Kconfig  |   9 +
 drivers/net/ethernet/intel/e1000e/defines.h |   5 +
 drivers/net/ethernet/intel/e1000e/ptp.c |  85 +
 drivers/net/ethernet/intel/e1000e/regs.h|   4 +
 drivers/ptp/ptp_chardev.c   |  27 +++
 include/linux/pps_kernel.h  |  17 +-
 include/linux/ptp_clock_kernel.h|   8 +
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h |  58 ++
 include/uapi/linux/ptp_clock.h  |  13 +-
 kernel/time/timekeeping.c   | 286 +---
 15 files changed, 546 insertions(+), 40 deletions(-)

-- 
1.9.1



[PATCH 3/8] time: Remove duplicated code in ktime_get_raw_and_real()

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

The code in ktime_get_snapshot() is a superset of the code in
ktime_get_raw_and_real() code. Further, ktime_get_raw_and_real() is
called only by the PPS code, pps_get_ts(). Consolidate the
pps_get_ts() code into a single function calling ktime_get_snapshot()
and eliminate ktime_get_raw_and_real(). A side effect of this is that
the raw and real results of pps_get_ts() correspond to exactly the
same clock cycle. Previously these values represented separate reads
of the system clock.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 include/linux/pps_kernel.h | 17 ++---
 kernel/time/timekeeping.c  | 40 ++--
 2 files changed, 8 insertions(+), 49 deletions(-)

diff --git a/include/linux/pps_kernel.h b/include/linux/pps_kernel.h
index 54bf148..35ac903 100644
--- a/include/linux/pps_kernel.h
+++ b/include/linux/pps_kernel.h
@@ -111,22 +111,17 @@ static inline void timespec_to_pps_ktime(struct pps_ktime 
*kt,
kt->nsec = ts.tv_nsec;
 }
 
-#ifdef CONFIG_NTP_PPS
-
 static inline void pps_get_ts(struct pps_event_time *ts)
 {
-   ktime_get_raw_and_real_ts64(&ts->ts_raw, &ts->ts_real);
-}
+   struct system_time_snapshot snap;
 
-#else /* CONFIG_NTP_PPS */
-
-static inline void pps_get_ts(struct pps_event_time *ts)
-{
-   ktime_get_real_ts64(&ts->ts_real);
+   ktime_get_snapshot(&snap);
+   ts->ts_real = ktime_to_timespec64(snap.real);
+#ifdef CONFIG_NTP_PPS
+   ts->ts_raw = ktime_to_timespec64(snap.raw);
+#endif
 }
 
-#endif /* CONFIG_NTP_PPS */
-
 /* Subtract known time delay from PPS event time(s) */
 static inline void pps_sub_ts(struct pps_event_time *ts, struct timespec64 
delta)
 {
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 89b4695..af19a49 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -888,6 +888,8 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
s64 nsec_real;
cycle_t now;
 
+   WARN_ON_ONCE(timekeeping_suspended);
+
do {
seq = read_seqcount_begin(&tk_core.seq);
 
@@ -905,44 +907,6 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
 }
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
-#ifdef CONFIG_NTP_PPS
-
-/**
- * ktime_get_raw_and_real_ts64 - get day and raw monotonic time in timespec 
format
- * @ts_raw:pointer to the timespec to be set to raw monotonic time
- * @ts_real:   pointer to the timespec to be set to the time of day
- *
- * This function reads both the time of day and raw monotonic time at the
- * same time atomically and stores the resulting timestamps in timespec
- * format.
- */
-void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw, struct timespec64 
*ts_real)
-{
-   struct timekeeper *tk = &tk_core.timekeeper;
-   unsigned long seq;
-   s64 nsecs_raw, nsecs_real;
-
-   WARN_ON_ONCE(timekeeping_suspended);
-
-   do {
-   seq = read_seqcount_begin(&tk_core.seq);
-
-   *ts_raw = tk->raw_time;
-   ts_real->tv_sec = tk->xtime_sec;
-   ts_real->tv_nsec = 0;
-
-   nsecs_raw  = timekeeping_get_ns(&tk->tkr_raw);
-   nsecs_real = timekeeping_get_ns(&tk->tkr_mono);
-
-   } while (read_seqcount_retry(&tk_core.seq, seq));
-
-   timespec64_add_ns(ts_raw, nsecs_raw);
-   timespec64_add_ns(ts_real, nsecs_real);
-}
-EXPORT_SYMBOL(ktime_get_raw_and_real_ts64);
-
-#endif /* CONFIG_NTP_PPS */
-
 /**
  * do_gettimeofday - Returns the time of day in a timeval
  * @tv:pointer to the timeval to be set
-- 
1.9.1



[PATCH 4/8] time: Add driver cross timestamp interface for higher precision time synchronization

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

ACKNOWLEDGMENT: cross timestamp code was developed by Thomas Gleixner
. It has changed considerably and any mistakes are
mine.

The precision with which events on multiple networked systems can be
synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
by the precision of the cross timestamps between the system clock and
the device (timestamp) clock. Precision here is the degree of
simultaneity when capturing the cross timestamp.

Currently the PTP cross timestamp is captured in software using the
PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
interleaved with reads of the realtime clock. At best, the precision
of this cross timestamp is on the order of several microseconds due to
software latencies. Sub-microsecond precision is required for
industrial control and some media applications. To achieve this level
of precision hardware supported cross timestamping is needed.

The function get_device_system_crosstimestamp() allows device drivers
to return a cross timestamp with system time properly scaled to
nanoseconds.  The realtime value is needed to discipline that clock
using PTP and the monotonic raw value is used for applications that
don't require a "real" time, but need an unadjusted clock time.  The
get_device_system_crosstimestamp() code calls back into the driver to
ensure that the system counter is within the current timekeeping
update interval.

Modern Intel hardware provides an Always Running Timer (ART) which is
exactly related to TSC through a known frequency ratio. The ART is
routed to devices on the system and is used to precisely and
simultaneously capture the device clock with the ART.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Reworked to remove extra structures and simplify calling]
Signed-off-by: John Stultz 
---
 include/linux/timekeeping.h | 35 
 kernel/time/timekeeping.c   | 56 +
 2 files changed, 91 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 7817591..4a2ca65 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -280,6 +280,41 @@ struct system_time_snapshot {
 };
 
 /*
+ * struct system_device_crosststamp - system/device cross-timestamp
+ * (syncronized capture)
+ * @device:Device time
+ * @sys_realtime:  Realtime simultaneous with device time
+ * @sys_monoraw:   Monotonic raw simultaneous with device time
+ */
+struct system_device_crosststamp {
+   ktime_t device;
+   ktime_t sys_realtime;
+   ktime_t sys_monoraw;
+};
+
+/*
+ * struct system_counterval_t - system counter value with the pointer to the
+ * corresponding clocksource
+ * @cycles:System counter value
+ * @cs:Clocksource corresponding to system counter value. Used 
by
+ * timekeeping code to verify comparibility of two cycle values
+ */
+struct system_counterval_t {
+   cycle_t cycles;
+   struct clocksource  *cs;
+};
+
+/*
+ * Get cross timestamp between system clock and device clock
+ */
+extern int get_device_system_crosststamp(
+   int (*get_time_fn)(ktime_t *device_time,
+   struct system_counterval_t *system_counterval,
+   void *ctx),
+   void *ctx,
+   struct system_device_crosststamp *xtstamp);
+
+/*
  * Simultaneously snapshot realtime and monotonic raw clocks
  */
 extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index af19a49..dba595c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -908,6 +908,62 @@ void ktime_get_snapshot(struct system_time_snapshot 
*systime_snapshot)
 EXPORT_SYMBOL_GPL(ktime_get_snapshot);
 
 /**
+ * get_device_system_crosststamp - Synchronously capture system/device 
timestamp
+ * @sync_devicetime:   Callback to get simultaneous device time and
+ * system counter from the device driver
+ * @xtstamp:   Receives simultaneously captured system and device time
+ *
+ * Reads a timestamp from a device and correlates it to system time
+ */
+int get_device_system_crosststamp(int (*get_time_fn)
+ (ktime_t *device_time,
+  struct system_counterval_t *sys_counterval,
+  void *ctx),
+ void *ctx,
+ struct system_device_crosststamp *xtstamp)
+{
+   struct system_counterval_t system_counterval;
+   

[PATCH 5/8] time: Add history to cross timestamp interface supporting slower devices

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

Another representative use case of time sync and the correlated
clocksource (in addition to PTP noted above) is PTP synchronized
audio.

In a streaming application, as an example, samples will be sent and/or
received by multiple devices with a presentation time that is in terms
of the PTP master clock. Synchronizing the audio output on these
devices requires correlating the audio clock with the PTP master
clock. The more precise this correlation is, the better the audio
quality (i.e. out of sync audio sounds bad).

>From an application standpoint, to correlate the PTP master clock with
the audio device clock, the system clock is used as a intermediate
timebase. The transforms such an application would perform are:

System Clock <-> Audio clock
System Clock <-> Network Device Clock [<-> PTP Master Clock]

Modern Intel platforms can perform a more accurate cross timestamp in
hardware (ART,audio device clock).  The audio driver requires
ART->system time transforms -- the same as required for the network
driver. These platforms offload audio processing (including
cross-timestamps) to a DSP which to ensure uninterrupted audio
processing, communicates and response to the host only once every
millsecond. As a result is takes up to a millisecond for the DSP to
receive a request, the request is processed by the DSP, the audio
output hardware is polled for completion, the result is copied into
shared memory, and the host is notified. All of these operation occur
on a millisecond cadence.  This transaction requires about 2 ms, but
under heavier workloads it may take up to 4 ms.

Adding a history allows these slow devices the option of providing an
ART value outside of the current interval. In this case, the callback
provided is an accessor function for the previously obtained counter
value. If get_system_device_crosststamp() receives a counter value
previous to cycle_last, it consults the history provided as an
argument in history_ref and interpolates the realtime and monotonic
raw system time using the provided counter value. If there are any
clock discontinuities, e.g. from calling settimeofday(), the monotonic
raw time is interpolated in the usual way, but the realtime clock time
is adjusted by scaling the monotonic raw adjustment.

When an accessor function is used a history argument *must* be
provided. The history is initialized using ktime_get_snapshot() and
must be called before the counter values are read.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Fixed up cycles_t/cycle_t type confusion]
Signed-off-by: John Stultz 
---
 include/linux/timekeeper_internal.h |   2 +
 include/linux/timekeeping.h |   5 ++
 kernel/time/timekeeping.c   | 171 +++-
 3 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/include/linux/timekeeper_internal.h 
b/include/linux/timekeeper_internal.h
index 2524722..e880054 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -50,6 +50,7 @@ struct tk_read_base {
  * @offs_tai:  Offset clock monotonic -> clock tai
  * @tai_offset:The current UTC to TAI offset in seconds
  * @clock_was_set_seq: The sequence number of clock was set events
+ * @cs_was_changed_seq:The sequence number of clocksource change events
  * @next_leap_ktime:   CLOCK_MONOTONIC time value of a pending leap-second
  * @raw_time:  Monotonic raw base time in timespec64 format
  * @cycle_interval:Number of clock cycles in one NTP interval
@@ -91,6 +92,7 @@ struct timekeeper {
ktime_t offs_tai;
s32 tai_offset;
unsigned intclock_was_set_seq;
+   u8  cs_was_changed_seq;
ktime_t next_leap_ktime;
struct timespec64   raw_time;
 
diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index 4a2ca65..96f37be 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -272,11 +272,15 @@ extern void ktime_get_raw_and_real_ts64(struct timespec64 
*ts_raw,
  * @cycles:Clocksource counter value to produce the system times
  * @real:  Realtime system time
  * @raw:   Monotonic raw system time
+ * @clock_was_set_seq: The sequence number of clock was set events
+ * @cs_was_changed_seq:The sequence number of clocksource change events
  */
 struct system_time_snapshot {
cycle_t cycles;
ktime_t real;
ktime_t raw;
+   unsigned intclock_was_set_seq;
+   u8  cs_was_changed_seq;
 };
 
 /*
@@ -312,6 +316,7 @@ ext

[PATCH 1/8] time: Add cycles to nanoseconds translation

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result clocksource read() method internal to
the timekeeping code. The added function timekeeping_cycles_to_ns()
calculated a nanosecond value from a cycle count that can be added to
tk_read_base.base value yielding the current system time. This allows
clocksource cycle values external to the timekeeping code to provide a
cycle count that can be transformed to system time.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
Signed-off-by: John Stultz 
---
 kernel/time/timekeeping.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 34b4ced..4243d28 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -298,17 +298,34 @@ u32 (*arch_gettimeoffset)(void) = 
default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
+static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+ cycle_t delta)
+{
+   s64 nsec;
+
+   nsec = delta * tkr->mult + tkr->xtime_nsec;
+   nsec >>= tkr->shift;
+
+   /* If arch requires, add in get_arch_timeoffset() */
+   return nsec + arch_gettimeoffset();
+}
+
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
cycle_t delta;
-   s64 nsec;
 
delta = timekeeping_get_delta(tkr);
+   return timekeeping_delta_to_ns(tkr, delta);
+}
 
-   nsec = (delta * tkr->mult + tkr->xtime_nsec) >> tkr->shift;
+static inline s64 timekeeping_cycles_to_ns(struct tk_read_base *tkr,
+   cycle_t cycles)
+{
+   cycle_t delta;
 
-   /* If arch requires, add in get_arch_timeoffset() */
-   return nsec + arch_gettimeoffset();
+   /* calculate the delta since the last update_wall_time */
+   delta = clocksource_delta(cycles, tkr->cycle_last, tkr->mask);
+   return timekeeping_delta_to_ns(tkr, delta);
 }
 
 /**
-- 
1.9.1



[PATCH 6/8] x86: tsc: Always Running Timer (ART) correlated clocksource

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

On modern Intel systems TSC is derived from the new Always Running Timer
(ART). ART can be captured simultaneous to the capture of
audio and network device clocks, allowing a correlation between timebases
to be constructed. Upon capture, the driver converts the captured ART
value to the appropriate system clock using the correlated clocksource
mechanism.

On systems that support ART a new CPUID leaf (0x15) returns parameters
“m” and “n” such that:

TSC_value = (ART_value * m) / n + k [n >= 1]

[k is an offset that can adjusted by a privileged agent. The
IA32_TSC_ADJUST MSR is an example of an interface to adjust k.
See 17.14.4 of the Intel SDM for more details]

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner 
Signed-off-by: Christopher S. Hall 
[jstultz: Tweaked to fix build issue, also reworked math for
64bit division on 32bit systems, as well as !CONFIG_CPU_FREQ build
fixes]
Signed-off-by: John Stultz 
---
 arch/x86/include/asm/cpufeature.h |  2 +-
 arch/x86/include/asm/tsc.h|  2 ++
 arch/x86/kernel/tsc.c | 62 +++
 3 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index 7ad8c94..ff557b4 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -85,7 +85,7 @@
 #define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */
 #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */
 #define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */
-/* free, was #define X86_FEATURE_FXSAVE_LEAK ( 3*32+10) * "" FXSAVE leaks 
FOP/FIP/FOP */
+#define X86_FEATURE_ART(3*32+10) /* Platform has always 
running timer (ART) */
 #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */
 #define X86_FEATURE_PEBS   ( 3*32+12) /* Precise-Event Based Sampling */
 #define X86_FEATURE_BTS( 3*32+13) /* Branch Trace Store */
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 6d7c547..174c421 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -29,6 +29,8 @@ static inline cycles_t get_cycles(void)
return rdtsc();
 }
 
+extern struct system_counterval_t convert_art_to_tsc(cycle_t art);
+
 extern void tsc_init(void);
 extern void mark_tsc_unstable(char *reason);
 extern int unsynchronized_tsc(void);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 3d743da..683c8cb 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -43,6 +43,11 @@ static DEFINE_STATIC_KEY_FALSE(__use_tsc);
 
 int tsc_clocksource_reliable;
 
+static u32 art_to_tsc_numerator;
+static u32 art_to_tsc_denominator;
+static u64 art_to_tsc_offset;
+struct clocksource *art_related_clocksource;
+
 /*
  * Use a ring-buffer like data structure, where a writer advances the head by
  * writing a new data entry and a reader advances the tail when it observes a
@@ -949,6 +954,36 @@ static struct notifier_block time_cpufreq_notifier_block = 
{
.notifier_call  = time_cpufreq_notifier
 };
 
+#define ART_CPUID_LEAF (0x15)
+#define ART_MIN_DENOMINATOR (1)
+
+
+/*
+ * If ART is present detect the numerator:denominator to convert to TSC
+ */
+static void detect_art(void)
+{
+   unsigned int unused[2];
+
+   if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
+   return;
+
+   cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+ &art_to_tsc_numerator, unused, unused+1);
+
+   /* Don't enable ART in a VM, non-stop TSC required */
+   if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
+   !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
+   art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+   return;
+
+   if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
+   return;
+
+   /* Make this sticky over multiple CPU init calls */
+   setup_force_cpu_cap(X86_FEATURE_ART);
+}
+
 static int __init cpufreq_tsc(void)
 {
if (!cpu_has_tsc)
@@ -962,6 +997,10 @@ static int __init cpufreq_tsc(void)
 
 core_initcall(cpufreq_tsc);
 
+#else
+
+#define detect_art()
+
 #endif /* CONFIG_CPU_FREQ */
 
 /* clocksource code */
@@ -1071,6 +1110,25 @@ int unsynchronized_tsc(void)
return 0;
 }
 
+/*
+ * Convert ART to TSC given numerator/denominator found in detect_art()
+ */
+struct system_counterval_t convert_art_to_tsc(cycle_t art)
+{
+   u64 tmp, res, rem;
+
+   rem = do_div(art, art_to_tsc_denominator);
+
+   res = art * art_to_tsc_numerator;
+   tmp = rem * art_to_tsc_numerator;
+
+   do_div(tmp, art_to_tsc_denominator);
+   res += tmp + art_to_tsc_of

[PATCH 8/8] net: e1000e: Adds hardware supported cross timestamp on e1000e nic

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

Modern Intel systems supports cross timestamping of the network device
clock and Always Running Timer (ART) in hardware.  This allows the
device time and system time to be precisely correlated. The timestamp
pair is returned through e1000e_phc_get_syncdevicetime() used by
get_system_device_crosststamp().  The hardware cross-timestamp result
is made available to applications through the PTP_SYS_OFFSET_PRECISE
ioctl which calls e1000e_phc_getcrosststamp().

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: Jeff Kirsher 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Acked-by: Jeff Kirsher 
Signed-off-by: Christopher S. Hall 
[jstultz: Reworked to use new interface, commit message tweaks]
Signed-off-by: John Stultz 
---
 drivers/net/ethernet/intel/Kconfig  |  9 +++
 drivers/net/ethernet/intel/e1000e/defines.h |  5 ++
 drivers/net/ethernet/intel/e1000e/ptp.c | 85 +
 drivers/net/ethernet/intel/e1000e/regs.h|  4 ++
 4 files changed, 103 insertions(+)

diff --git a/drivers/net/ethernet/intel/Kconfig 
b/drivers/net/ethernet/intel/Kconfig
index fa593dd..3772f3a 100644
--- a/drivers/net/ethernet/intel/Kconfig
+++ b/drivers/net/ethernet/intel/Kconfig
@@ -83,6 +83,15 @@ config E1000E
  To compile this driver as a module, choose M here. The module
  will be called e1000e.
 
+config E1000E_HWTS
+   bool "Support HW cross-timestamp on PCH devices"
+   default y
+   depends on E1000E && X86
+   ---help---
+Say Y to enable hardware supported cross-timestamping on PCH
+devices. The cross-timestamp is available through the PTP clock
+driver precise cross-timestamp ioctl (PTP_SYS_OFFSET_PRECISE).
+
 config IGB
tristate "Intel(R) 82575/82576 PCI-Express Gigabit Ethernet support"
depends on PCI
diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
b/drivers/net/ethernet/intel/e1000e/defines.h
index f7c7804..0641c00 100644
--- a/drivers/net/ethernet/intel/e1000e/defines.h
+++ b/drivers/net/ethernet/intel/e1000e/defines.h
@@ -528,6 +528,11 @@
 #define E1000_RXCW_C  0x2000/* Receive config */
 #define E1000_RXCW_SYNCH  0x4000/* Receive config synch */
 
+/* HH Time Sync */
+#define E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK  0xF000 /* max delay */
+#define E1000_TSYNCTXCTL_SYNC_COMP 0x4000 /* sync complete */
+#define E1000_TSYNCTXCTL_START_SYNC0x8000 /* initiate sync */
+
 #define E1000_TSYNCTXCTL_VALID 0x0001 /* Tx timestamp valid */
 #define E1000_TSYNCTXCTL_ENABLED   0x0010 /* enable Tx timestamping */
 
diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c 
b/drivers/net/ethernet/intel/e1000e/ptp.c
index 25a0ad5..e2ff3ef 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -26,6 +26,12 @@
 
 #include "e1000.h"
 
+#ifdef CONFIG_E1000E_HWTS
+#include 
+#include 
+#include 
+#endif
+
 /**
  * e1000e_phc_adjfreq - adjust the frequency of the hardware clock
  * @ptp: ptp clock structure
@@ -98,6 +104,78 @@ static int e1000e_phc_adjtime(struct ptp_clock_info *ptp, 
s64 delta)
return 0;
 }
 
+#ifdef CONFIG_E1000E_HWTS
+#define MAX_HW_WAIT_COUNT (3)
+
+/**
+ * e1000e_phc_get_syncdevicetime - Callback given to timekeeping code reads 
system/device registers
+ * @device: current device time
+ * @system: system counter value read synchronously with device time
+ * @ctx: context provided by timekeeping code
+ *
+ * Read device and system (ART) clock simultaneously and return the corrected
+ * clock values in ns.
+ **/
+static int e1000e_phc_get_syncdevicetime(ktime_t *device,
+struct system_counterval_t *system,
+void *ctx)
+{
+   struct e1000_adapter *adapter = (struct e1000_adapter *)ctx;
+   struct e1000_hw *hw = &adapter->hw;
+   unsigned long flags;
+   int i;
+   u32 tsync_ctrl;
+   cycle_t dev_cycles;
+   cycle_t sys_cycles;
+
+   tsync_ctrl = er32(TSYNCTXCTL);
+   tsync_ctrl |= E1000_TSYNCTXCTL_START_SYNC |
+   E1000_TSYNCTXCTL_MAX_ALLOWED_DLY_MASK;
+   ew32(TSYNCTXCTL, tsync_ctrl);
+   for (i = 0; i < MAX_HW_WAIT_COUNT; ++i) {
+   udelay(1);
+   tsync_ctrl = er32(TSYNCTXCTL);
+   if (tsync_ctrl & E1000_TSYNCTXCTL_SYNC_COMP)
+   break;
+   }
+
+   if (i == MAX_HW_WAIT_COUNT)
+   return -ETIMEDOUT;
+
+   dev_cycles = er32(SYSSTMPH);
+   dev_cycles <<= 32;
+   dev_cycles |= er32(SYSSTMPL);
+   spin_lock_irqsave(&adapter->systim_lock, flags);
+   *device = ns_to_ktime(timecounter_cyc2time(&adapter

[PATCH 7/8] ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping

2016-03-02 Thread John Stultz
From: "Christopher S. Hall" 

Currently, network /system cross-timestamping is performed in the
PTP_SYS_OFFSET ioctl. The PTP clock driver reads gettimeofday() and
the gettime64() callback provided by the driver. The cross-timestamp
is best effort where the latency between the capture of system time
(getnstimeofday()) and the device time (driver callback) may be
significant.

The getcrosststamp() callback and corresponding PTP_SYS_OFFSET_PRECISE
ioctl allows the driver to perform this device/system correlation when
for example cross timestamp hardware is available. Modern Intel
systems can do this for onboard Ethernet controllers using the ART
counter. There is virtually zero latency between captures of the ART
and network device clock.

The capabilities ioctl (PTP_CLOCK_GETCAPS), is augmented allowing
applications to query whether or not drivers implement the
getcrosststamp callback, providing more precise cross timestamping.

Cc: Prarit Bhargava 
Cc: Richard Cochran 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Andy Lutomirski 
Cc: kevin.b.stan...@intel.com
Cc: kevin.j.cla...@intel.com
Cc: h...@zytor.com
Cc: jeffrey.t.kirs...@intel.com
Cc: netdev@vger.kernel.org
Acked-by: Richard Cochran 
Signed-off-by: Christopher S. Hall 
[jstultz: Commit subject tweaks]
Signed-off-by: John Stultz 
---
 Documentation/ptp/testptp.c  |  6 --
 drivers/ptp/ptp_chardev.c| 27 +++
 include/linux/ptp_clock_kernel.h |  8 
 include/uapi/linux/ptp_clock.h   | 13 -
 4 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/Documentation/ptp/testptp.c b/Documentation/ptp/testptp.c
index 6c6247a..d99012f 100644
--- a/Documentation/ptp/testptp.c
+++ b/Documentation/ptp/testptp.c
@@ -277,13 +277,15 @@ int main(int argc, char *argv[])
   "  %d external time stamp channels\n"
   "  %d programmable periodic signals\n"
   "  %d pulse per second\n"
-  "  %d programmable pins\n",
+  "  %d programmable pins\n"
+  "  %d cross timestamping\n",
   caps.max_adj,
   caps.n_alarm,
   caps.n_ext_ts,
   caps.n_per_out,
   caps.pps,
-  caps.n_pins);
+  caps.n_pins,
+  caps.cross_timestamping);
}
}
 
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c
index da7bae9..579fd65 100644
--- a/drivers/ptp/ptp_chardev.c
+++ b/drivers/ptp/ptp_chardev.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ptp_private.h"
 
@@ -120,11 +121,13 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
struct ptp_clock_caps caps;
struct ptp_clock_request req;
struct ptp_sys_offset *sysoff = NULL;
+   struct ptp_sys_offset_precise precise_offset;
struct ptp_pin_desc pd;
struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
struct ptp_clock_info *ops = ptp->info;
struct ptp_clock_time *pct;
struct timespec64 ts;
+   struct system_device_crosststamp xtstamp;
int enable, err = 0;
unsigned int i, pin_index;
 
@@ -138,6 +141,7 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
caps.n_per_out = ptp->info->n_per_out;
caps.pps = ptp->info->pps;
caps.n_pins = ptp->info->n_pins;
+   caps.cross_timestamping = ptp->info->getcrosststamp != NULL;
if (copy_to_user((void __user *)arg, &caps, sizeof(caps)))
err = -EFAULT;
break;
@@ -180,6 +184,29 @@ long ptp_ioctl(struct posix_clock *pc, unsigned int cmd, 
unsigned long arg)
err = ops->enable(ops, &req, enable);
break;
 
+   case PTP_SYS_OFFSET_PRECISE:
+   if (!ptp->info->getcrosststamp) {
+   err = -EOPNOTSUPP;
+   break;
+   }
+   err = ptp->info->getcrosststamp(ptp->info, &xtstamp);
+   if (err)
+   break;
+
+   ts = ktime_to_timespec64(xtstamp.device);
+   precise_offset.device.sec = ts.tv_sec;
+   precise_offset.device.nsec = ts.tv_nsec;
+   ts = ktime_to_timespec64(xtstamp.sys_realtime);
+   precise_offset.sys_realtime.sec = ts.tv_sec;
+   precise_offset.sys_realtime.nsec = ts.tv_nsec;
+   ts = ktime_to_timespec64(xtstamp.sys_monoraw);
+   precise_offset.sys_monoraw.sec = ts.t

Re: [PATCH v8 5/8] time: Add history to cross timestamp interface supporting slower devices

2016-02-24 Thread John Stultz
On Wed, Feb 24, 2016 at 2:56 AM, Thomas Gleixner  wrote:
> On Mon, 22 Feb 2016, Christopher S. Hall wrote:
>> +{
>> + struct timekeeper *tk = &tk_core.timekeeper;
>> + bool interp_forward;
>> + u64 corr_raw, corr_real;
>> + int ret;
>
> Once more:
>
> struct timekeeper *tk = &tk_core.timekeeper;
> u64 corr_raw, corr_real;
> bool interp_forward;
> int ret;
>
> Is way simpler to parse fast.

So I just went through and addressed these formatting issues in my tree.

but

>> @@ -929,6 +1046,12 @@ int get_device_system_crosststamp(int (*get_time_fn)
>>   ktime_t base_real;
>>   s64 nsec_raw;
>>   s64 nsec_real;
>> + cycles_t cycles;
>> + cycle_t now;

I just noticed this train-wreck: cycles_t and cycle_t are obnoxiously
different types. (One is an int on some arches and the other is a
u64).

You very much want to use cycle_t here. And I think that goes for the
introduced cycle_between() function.

So I'm fixing that up as well in this patch, but there's a few other
spots in this series too.

Sigh. Going to have to find some time to go through and try to zap
cycles_t in the kernel  because having both is just asking for
trouble.  :P

thanks
-john


Re: [PATCH v7 0/8] Patchset enabling hardware based cross-timestamps for next gen Intel platforms

2016-02-22 Thread John Stultz
On Mon, Feb 22, 2016 at 10:33 AM, Christopher Hall
 wrote:
> I just sent another patchset (v8). I corrected the comment problems pointed
> out by Richard Cochran. I also changed the arch/x86 code to use "non-stop"
> TSC rather than "invariant" TSC. They are *exactly* the same thing (i.e.
> read from the same bit of the CPUID leaf). The former exists already and
> should be used instead.  Patch 6/8 is the only patch that is changed apart
> from comments.

Ok. So I see you addressed some of Andy's feedback, but did you answer
the bit about the k offset?

thanks
-john


Re: [PATCH v7 5/8] time: Add history to cross timestamp interface supporting slower devices

2016-02-19 Thread John Stultz
On Thu, Feb 18, 2016 at 2:17 PM, Richard Cochran
 wrote:
> On Fri, Feb 12, 2016 at 12:25:26PM -0800, Christopher S. Hall wrote:
>>  /**
>>   * get_device_system_crosststamp - Synchronously capture system/device 
>> timestamp
>> - * @sync_devicetime: Callback to get simultaneous device time and
>> + * @get_time_fn: Callback to get simultaneous device time and
>
> Fold this into earlier patch?
>
>>   *   system counter from the device driver
>> + * @history_ref: Historical reference point used to interpolate system
>> + *   time when counter provided by the driver is before the current interval
>
> KernelDoc says history_ref,
>
>>   * @xtstamp: Receives simultaneously captured system and device time
>>   *
>>   * Reads a timestamp from a device and correlates it to system time
>> @@ -920,6 +1035,7 @@ int get_device_system_crosststamp(int (*get_time_fn)
>>  struct system_counterval_t *sys_counterval,
>>  void *ctx),
>> void *ctx,
>> +   struct system_time_snapshot *history_begin,
>> struct system_device_crosststamp *xtstamp)
>
> ... but parameter is called history_begin.

Yep. I reworked those in my tree when I fixed the earlier issue you pointed out.

https://git.linaro.org/people/john.stultz/linux.git/commitdiff/70aa310cf0a5181e62d910a7854a27c2f927315c
and
https://git.linaro.org/people/john.stultz/linux.git/commitdiff/e61e7d694c220d48959475b0e0a8b3bd991aeb22


Let me know if you see anything else.

thanks
-john


Re: [PATCH v7 0/8] Patchset enabling hardware based cross-timestamps for next gen Intel platforms

2016-02-18 Thread John Stultz
On Fri, Feb 12, 2016 at 12:25 PM, Christopher S. Hall
 wrote:
> Modern Intel hardware adds an Always Running Timer (ART) that allows the
> network and audio device clocks to precisely cross timestamp the device
> clock with the system clock. This allows a precise correlation of the
> device time and system time.

Thanks for your continued persistence here Christopher!  It is looking
pretty good.

I've queued these up for testing, and if that goes well, and don't hit
anything else in review, I'll likely try to submit all but the last
patch (unless there's an acked-by from the maintainer of that code)
through Thomas for 4.6.

(I've already added the function comment fix from Richard)

thanks
-john


Re: [RFC v5 3/6] Add history to cross timestamp interface supporting slower devices

2016-01-06 Thread John Stultz
On Mon, Jan 4, 2016 at 4:45 AM, Christopher S. Hall
 wrote:
> @@ -13,6 +13,9 @@
>  /**
>   * struct tk_read_base - base structure for timekeeping readout
>   * @clock: Current clocksource used for timekeeping.
> + * @cs_seq:Clocksource sequence is incremented per clocksource change.
> + * It's used to determine whether past system time can be related to
> + * current system time
>   * @read:  Read function of @clock
>   * @mask:  Bitmask for two's complement subtraction of non 64bit clocks
>   * @cycle_last: @clock cycle value at last update
> @@ -29,6 +32,7 @@
>   */
>  struct tk_read_base {
> struct clocksource  *clock;
> +   u8  cs_seq;
> cycle_t (*read)(struct clocksource *cs);
> cycle_t mask;
> cycle_t cycle_last;


So Thomas optimized the tk_read_bases to fit in a cacheline, and so I
worry about the u8 being added there. I'm also cautious about
exporting these seq values out further via the system_time_snapshot.
But I may just need some more time to warm up to the idea.


> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 9c1ddc3..5a7f784 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -235,11 +235,13 @@ static void tk_setup_internals(struct timekeeper *tk, 
> struct clocksource *clock)
>
> old_clock = tk->tkr_mono.clock;
> tk->tkr_mono.clock = clock;
> +   ++tk->tkr_mono.cs_seq;
> tk->tkr_mono.read = clock->read;
> tk->tkr_mono.mask = clock->mask;
> tk->tkr_mono.cycle_last = tk->tkr_mono.read(clock);
>
> tk->tkr_raw.clock = clock;
> +   ++tk->tkr_raw.cs_seq;
> tk->tkr_raw.read = clock->read;
> tk->tkr_raw.mask = clock->mask;
> tk->tkr_raw.cycle_last = tk->tkr_mono.cycle_last;
> @@ -862,6 +864,39 @@ time64_t ktime_get_real_seconds(void)
>  }
>  EXPORT_SYMBOL_GPL(ktime_get_real_seconds);
>
> +/**
> + * ktime_get_snapshot - snapshots the realtime/monotonic raw clocks with 
> counter
> + * @snapshot:  pointer to struct receiving the system time snapshot
> + */
> +void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
> +{
> +   struct timekeeper *tk = &tk_core.timekeeper;
> +   unsigned long seq;
> +   ktime_t base_raw;
> +   ktime_t base_real;
> +   s64 nsec_raw;
> +   s64 nsec_real;
> +   cycle_t now;
> +
> +   do {
> +   seq = read_seqcount_begin(&tk_core.seq);
> +
> +   now = tk->tkr_mono.read(tk->tkr_mono.clock);
> +   systime_snapshot->cs_seq = tk->tkr_mono.cs_seq;
> +   systime_snapshot->clock_set_seq = tk->clock_was_set_seq;
> +   base_real = ktime_add(tk->tkr_mono.base,
> + tk_core.timekeeper.offs_real);
> +   base_raw = tk->tkr_raw.base;
> +   nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, now);
> +   nsec_raw  = timekeeping_cycles_to_ns(&tk->tkr_raw, now);
> +   } while (read_seqcount_retry(&tk_core.seq, seq));
> +
> +   systime_snapshot->cycles = now;
> +   systime_snapshot->real = ktime_add_ns(base_real, nsec_real);
> +   systime_snapshot->raw = ktime_add_ns(base_raw, nsec_raw);
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_snapshot);


So can you split out this adding of ktime_get_snapshot()  (maybe
skipping the seqcount bits initially) into a separate patch?

> @@ -936,19 +1044,63 @@ int get_device_system_crosststamp(struct 
> system_device_crosststamp *xtstamp,
>  */
> if (tk->tkr_mono.clock != raw_sys.cs)
> return -ENODEV;
> +   cycles = raw_sys.cycles;
> +
> +   /*
> +* Check whether the system counter value provided by the
> +* device driver is on the current interval.
> +*/
> +   now = tk->tkr_mono.read(tk->tkr_mono.clock);
> +   interval_start = tk->tkr_mono.cycle_last;
> +   if (!cycle_between(interval_start, cycles, now)) {
> +   cs_seq = tk->tkr_mono.cs_seq;
> +   clock_was_set_seq = tk->clock_was_set_seq;
> +   cycles = interval_start;
> +   do_interp = true;
> +   } else {
> +   do_interp = false;
> +   }
>
> base_real = ktime_add(tk->tkr_mono.base,
>   tk_core.timekeeper.offs_real);
> base_raw = tk->tkr_raw.base;
>
> -   nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono,
> -raw_sys.cycles);
> -   nsec_raw = timekeeping_cycles_to_ns(&tk->tkr_raw,
> -   raw_sys.cycles);
> +   nsec_real = timekeeping_cycles_to_ns(&tk->tkr_mono, cycles);
> +

Re: [RFC v5 4/6] Remove duplicate code from ktime_get_raw_and_real code

2016-01-06 Thread John Stultz
On Mon, Jan 4, 2016 at 4:45 AM, Christopher S. Hall
 wrote:
> The code in ktime_get_snapshot() is a superset of the code in
> ktime_get_raw_and_real() code. Changes the latter to call the former. A
> side effect of this is that ktime_get_raw_and_real() returns two clock
> times corresponding to the *exact* same clock tick. Previously, this
> code read the underlying counter twice.
>
> Signed-off-by: Christopher S. Hall 
> ---
>  kernel/time/timekeeping.c | 20 
>  1 file changed, 4 insertions(+), 16 deletions(-)
>
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5a7f784..a0f096c 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -910,26 +910,14 @@ EXPORT_SYMBOL_GPL(ktime_get_snapshot);
>   */
>  void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw, struct 
> timespec64 *ts_real)
>  {
> -   struct timekeeper *tk = &tk_core.timekeeper;
> -   unsigned long seq;
> -   s64 nsecs_raw, nsecs_real;
> +   struct system_time_snapshot snap;
>
> WARN_ON_ONCE(timekeeping_suspended);
>
> -   do {
> -   seq = read_seqcount_begin(&tk_core.seq);
> -
> -   *ts_raw = tk->raw_time;
> -   ts_real->tv_sec = tk->xtime_sec;
> -   ts_real->tv_nsec = 0;
> -
> -   nsecs_raw  = timekeeping_get_ns(&tk->tkr_raw);
> -   nsecs_real = timekeeping_get_ns(&tk->tkr_mono);
> -
> -   } while (read_seqcount_retry(&tk_core.seq, seq));
> +   ktime_get_snapshot(&snap);
>
> -   timespec64_add_ns(ts_raw, nsecs_raw);
> -   timespec64_add_ns(ts_real, nsecs_real);
> +   *ts_raw = ktime_to_timespec64(snap.raw);
> +   *ts_real = ktime_to_timespec64(snap.real);
>  }
>  EXPORT_SYMBOL(ktime_get_raw_and_real_ts64);

Looks sane, but you might even be able to get rid of this function
entirely and modify the one user (pps_get_ts()) to use
ktime_get_snapshot().

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC v5 1/6] Timekeeping cross timestamp interface for device drivers

2016-01-06 Thread John Stultz
On Mon, Jan 4, 2016 at 4:45 AM, Christopher S. Hall
 wrote:
> ACKNOWLEDGMENT: The original correlated clock source and cross
> timestamp code was developed by Thomas Gleixner
> . It has changed considerably and any mistakes are
> mine.
>
> The precision with which events on multiple networked systems can be
> synchronized using, as an example, PTP (IEEE 1588, 802.1AS) is limited
> by the precision of the cross timestamps between the system clock and
> the device (timestamp) clock. Precision here is the degree of
> simultaneity when capturing the cross timestamp.
>
> Currently the PTP cross timestamp is captured in software using the
> PTP device driver ioctl PTP_SYS_OFFSET. Reads of the device clock are
> interleaved with reads of the realtime clock. At best, the precision
> of this cross timestamp is on the order of several microseconds due to
> software latencies. Sub-microsecond precision is required for
> industrial control and some media applications. To achieve this level
> of precision hardware supported cross timestamping is needed.
>
> Hardware cross timestamps are derived from simultaneously capturing
> the device clock and the system clock. Applications use timestamps in
> nanoseconds rather than clock ticks. The device driver can scale
> device clock ticks to device time in nanoseconds, but cannot transform
> system clock ticks. Only the kernel timekeeping code can do this. The
> function get_device_system_crosststamp() allows device drivers to
> return a cross timestamp with system time properly scaled to
> nanoseconds.
>
> The cross timestamps contain the realtime and monotonic raw clock
> times. The realtime value is needed to discipline that clock using PTP
> and the monotonic raw value is used for applications that don't
> require a "real" time, but need an unadjusted clock time.
>
> The get_device_system_crosststamp() code calls back into the driver
> to ensure that the system counter is within the current timekeeping
> update interval. Because of possible NTP/PTP frequency adjustments,
> extrapolating a realtime clock time outside the current interval with
> a potentially different scaling factor can result in a small amount of
> error.
>
> Modern Intel hardware provides an Always Running Timer (ART) which is
> exactly related to TSC through a known frequency ratio. The ART is
> routed to devices on the system and is used to precisely and
> simultaneously capture the device clock with the ART. The kernel
> timekeeping code requires a system clock value from the current clock
> source to calculate the system time. The correlated clocksource adds a
> means to transform a clock (in this case ART) exactly to the system
> clock (TSC clocksource) that is used to calculate system time.
>
> Signed-off-by: Christopher S. Hall 
> ---
>  include/linux/clocksource.h | 27 
>  include/linux/timekeeping.h | 35 +
>  kernel/time/timekeeping.c   | 77 
> ++---
>  3 files changed, 135 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index 7784b59..4b7973d 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -255,4 +255,31 @@ static inline void clocksource_probe(void) {}
>  #define CLOCKSOURCE_ACPI_DECLARE(name, table_id, fn)   \
> ACPI_DECLARE_PROBE_ENTRY(clksrc, name, table_id, 0, NULL, 0, fn)
>
> +/*
> + * struct correlated_cs - Descriptor for a clocksource correlated to another
> + * clocksource
> + * @related_cs:Pointer to the related timekeeping clocksource
> + * @convert:   Conversion function to convert a timestamp from
> + * the correlated clocksource to cycles of the related
> + * timekeeping clocksource
> + */
> +struct correlated_cs {
> +   struct clocksource  *related_cs;
> +   cycle_t (*convert)(struct correlated_cs *cs,
> +  cycle_t cycles);
> +};
> +
> +/*
> + * struct raw_system_counterval - system counter value captured in device
> + * driver used to produce system/device cross-timestamp
> + * @system:System counter value
> + * @cs:Clocksource related to system counter value. This is 
> used by
> + * timekeeping code to verify validity of counter for system time
> + * conversion
> + */
> +struct raw_system_counterval {
> +   cycle_t cycles;
> +   struct clocksource  *cs;
> +};
> +
>  #endif /* _LINUX_CLOCKSOURCE_H */
> diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> index ec89d84..2209943 100644
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -266,6 +266,41 @@ extern void timekeeping_inject_sleeptime64(struct 
> timespec64 *delta);
>  extern void ktime_get_raw_and_real_ts64(struct timespec64 *ts_raw,
> struct tim

Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-11-09 Thread John Stultz
On Tue, Nov 3, 2015 at 11:18 AM, Stanton, Kevin B
 wrote:
> On Wed, 21 Oct 2015, Thomas Gleixner wrote:
>> On Tue, 20 Oct 2015, John Stultz wrote:
>>> Being able to have various hardware sharing a time base is quite
>>> useful, and methods for correlating timestamps together are useful.
>>> But I don't yet really understand why its important that we can
>>> translate a hardware timestamp from some time in the past to the
>>> correct system time in the past without error.
>
>>If your device can only provide timestamps from the past, then
>>having access to the history is important if you want to have
>>precise correlation.
>
> I hope this can be solved in timekeeping. But first, a quick
> recap...
>
> The timestamp pair (including the ART snapshot, as described
> previously) is captured simultaneously by the hardware
> resulting, effectively, in a (PTP,TSC) pair, or
> (AudioPosition,TSC) pair. The in-the-past-TSC value needs to be
> converted to system time so that it can be used by applications,
> without exposing the underlying ART or TSC.
>
> Note: ART is architectural, defined as part of Invariant
> Timekeeping in the current SDM, so this isn't a one-off
> capability.
>
> To convert a past TSC timestamp to system time 'correctly' (in a
> mathematical sense), a history of monotonic rate adjustments
> since that time in the past must be maintained.

But again, my main problem is that I'm not totally understanding the rational.
Here you're providing the *what*, not really the *why*.

As I mentioned earlier, there are possibly simpler (at least for the
kernel) ways to generate similar data using CLOCK_MONOTONIC_RAW, which
has potentially a 500ppm error. *Why* is it important to add more
complexity to the timekeeping core in order to avoid that error?

> Regarding the amount of history, as Chris mentioned (and
> in the context of new Intel hardware) LAN timestamp pairs are
> a few microseconds in the past (no history would be required),
> but for timestamps captured by the audio DSP, unfortunately,
> they can be a small number of *milliseconds* in the past by the
> time they're available to the audio driver (some history
> required to convert accurately). I'm told that 4ms of adjustment
> history accommodates known hardware.

For a timestamp recorded 4ms ago, 500ppm of error is 2us. Why is 2us
problematic for audio? That seems quite below the human threshold to
notice.

> Getting this 'correct' in one place (timekeeping) seems a lot
> better than unnecessarily introducing inaccuracy via software
> sampling (and extrapolation) or leaving it to each driver to do
> it themselves, and to do it differently (and/or do it wrongly).

Having a common infrastructure for extrapolating the data isn't
something I'm objecting to.  Its just that the shadow-timekeeper has
been problematic enough in recent times bug wise, so I'm just hesitant
to expand the complexity there. That said, I'm open to ideas, but
would really like a better understanding of why other solutions would
be insufficient, and why this one is the best solution.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-20 Thread John Stultz
On Tue, Oct 20, 2015 at 12:11 PM, Thomas Gleixner  wrote:
> On Tue, 20 Oct 2015, Richard Cochran wrote:
>> On Tue, Oct 20, 2015 at 01:51:13PM +0200, Richard Cochran wrote:
>> > You can, in fact, achieve "proper" correlation by sampling.  As John
>> > said, the question is whether the method in the patch set "measurably
>> > improves the error" over using another, simpler method.
>>
>> Here is a short example to put some numbers on the expected error.
>> Let the driver sample at an interval of 1 ms.  If the system time's
>> frequency hasn't changed between two samples, A and B, then the driver
>> may interpolate without introducing any error.
>
> Darn, we don't want to have that kind of sampling in every driver
> which has this kind of problem even if it looks like the simpler
> choice for this particular use case. This is going to be something
> which next generation chips will have on more than just the audio
> interface and we realy want to have a generic solution for this.

I sort of agree with Richard that the timekeeper history approach
doesn't seem like a generic solution here.

And again, you seem to be speaking with a bigger picture in mind that
at least I don't yet share (apologies for being thick headed here).
Being able to have various hardware sharing a time base is quite
useful, and methods for correlating timestamps together are useful.
But I don't yet really understand why its important that we can
translate a hardware timestamp from some time in the past to the
correct system time in the past without error.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-19 Thread John Stultz
On Mon, Oct 19, 2015 at 5:18 PM, Christopher Hall
 wrote:
> On Thu, 15 Oct 2015 01:15:57 -0700, Thomas Gleixner 
> wrote:
>>>
>>> >
>>> > > +#define SHADOW_HISTORY_DEPTH 7
>>> >
>>> > And that number is 7 because?
>>>
>>> Due to power of 2 it will be 8 instead. As above the useful history is
>>> 8-2*1
>>> ms (1 ms is the minimum jiffy length).  Array size 4 would not be enough
>>> history for the DSP which requires 4 ms of history, in the worst case.
>>
>>
>> And how exactly becomes 7 magically 8?
>
>
> I'm making the array size a power of two, per your suggestion.
>
> In my view, the candidate array sizes are 4 and 8.  But the DSP driver code
> requires that it be at least 8.  Here is my reasoning:
>
> The number of shadow_timekeeper array elements that contain useful history
> is n-2 where n is the size of the shadow_timekeeper array.  This is true
> because shadow_timekeeper[shadow_index] is a copy of tk_core.timekeeper
> (this isn't history).  The next entry of the shadow_timekeeper array may be
> in-flight and contain invalid information, because update_wall_time() makes
> changes to the next entry of shadow timekeeper outside of the sequence lock.
> If that occurs, get_correlated_timestamp() would not be notified of this
> change through a change in sequence number.

Sorry for not commenting here earlier. But I've sort of loosely been
following this thread.

I'm still very very concerned about the complexity of adding any sort
of array of timekeeper structures for historical purposes, and like
Richard, I feel like the rational for all of this has not been made
very clear.

Thomas seems to be a helpful advocate, but I suspect the use case has
been explained to him in detail off list.

If we're only tracking 4ms of history, how does this solution
measurably improve the error over using the timestamps to generate
MONOTONIC_RAW clock deltas (which doesn't require keeping any history)
and using getnstime_raw_and_real to take an anchor point to calculate
the delta from?  Why is adding complexity necessary?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/5] y2038 conversion for ntp/pps and sfc driver

2015-10-01 Thread John Stultz
On Wed, Sep 30, 2015 at 12:27 AM, Thomas Gleixner  wrote:
> On Tue, 29 Sep 2015, David Miller wrote:
>> From: Arnd Bergmann 
>> Date: Mon, 28 Sep 2015 22:21:27 +0200
>>
>> > When trying to build a kernel with time_t commented out, I found that
>> > the ntp subsystem still relies on timespec for its pps handling.
>> >
>> > This series addresses this and converts all the code to use timespec64
>> > instead, step by step. There is one device driver that interacts with
>> > this code directly (rather than only through the ptp subsystem), so
>> > I have to convert that driver at the same time.
>> >
>> > The patches should ideally stay together as a series, but they do
>> > span multiple subsystems, so I'm also looking for the right person
>> > to merge them.
>>
>> I'm happy with this going via a tree other than mine, and for the
>
> I think it should go via John Stultz timekeeping tree.

I've queued the set for testing.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/4] Add correlated clocksource deriving system time from an auxiliary clocksource

2015-09-04 Thread John Stultz
On Thu, Sep 3, 2015 at 4:20 PM, Hall, Christopher S
 wrote:
> For example, supply the ART value as an argument and, in the case of the 
> realtime
> clock, keep a short history of clock changes.  It would fail in cases where 
> there
> are a lot of calls to adjtimex(), but it will would work most of the time.

So, I really don't think something like this would be reasonable. For
one, keeping track of the adjtimex adjustments would be difficult
enough to do sanely, but the real issue is that the clock has its own
long-term error correction adjustments that it does in order to keep
long term frequency accuracy with coarsely adjusted clocksources.
Trying to track those small oscillation intervals would be even more
complicated.

I still think that being able to calculate the CLOCK_MONOTONIC_RAW
value for a given ART counter value is reasonable, and then one can
use the getnstime_raw_and_real() to get a current raw/real sync point,
which you can then calculate the raw delta, and subtract that from the
sycned real timestamp.

You're error there would be bound by the maxium clocksource adjustment
rate * the raw-delta interval length.

To clarify on the need to understand if this error would be
reasonable, can you provide a sense of what the delay from an ART read
to trying to calculate a REALTIME value might be?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] Add calls to translate Always Running Timer (ART) to system time

2015-07-27 Thread John Stultz
On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
 wrote:
> +static bool checked_art_to_tsc(cycle_t *tsc)
> +{
> +   if (!has_art())
> +   return false;
> +   *tsc = art_to_tsc(*tsc);
> +   return true;
> +}
> +
> +static int art_to_rawmono64(struct timespec64 *rawmono, cycle_t art)
> +{
> +   if (!checked_art_to_tsc(&art))
> +   return -ENXIO;
> +   return tsc_to_rawmono64(rawmono, art);
> +}
> +EXPORT_SYMBOL(art_to_rawmono64);

This all seems to assume the TSC is the current clocksource, which it
may not be if the user has overridden it.

If instead there were a counter_to_rawmono64() which took the counter
value and maybe the name of the clocksource (if the strncmp is
affordable for your use), it might be easier for the core to provide
an error if the current timekeeping clocksource isn't the one the
counter value is based on. This would also allow the tsc_to_*()
midlayers to be dropped (since they don't seem to do much).

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] Add functions producing system time given a backing counter value

2015-07-27 Thread John Stultz
On Mon, Jul 27, 2015 at 8:44 PM, John Stultz  wrote:
> On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
>  wrote:
>> * counter_to_rawmono64
>> * counter_to_mono64
>> * counter_to_realtime64
>>
>> Enables drivers to translate a captured system clock counter to system
>> time. This is useful for network and audio devices that capture timestamps
>> in terms of both the system clock and device clock.
>
> Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
> fact that the multiplier is constantly adjusted and corrected. So that
> calling the function twice with the same counter value may result in
> different returned values.
>
> I've not yet groked the whole patchset, but it seems like there needs
> to be some mechanism that ensures the counter value is captured and
> used in the same (or at least close) interval that the timekeeper data
> is valid for.


So reading through. It looks like you only use art_to_realtime(), right?

So again, since CLOCK_MONOTONIC and CLOCK_REALTIME are constaly being
frequency adjusted, it might be best to construct this delta in the
following way.


Add counter_to_rawmono64(), which should be able to safely calculate
the corresponding CLOCK_MONOTONIC_RAW time from any given cycle value.

Use getnstime_raw_and_real() to get a immediate snapshot of current
MONOTONIC_RAW  and REALTIME clocks.

Then calculate the delta between the snapshotted counter raw time, and
the current raw time.  Then apply that offset to the current realtime.

The larger the raw-time delta, the larger the possible realtime error.
But I think that will be as good as it gets.

This should simplify the interfaces you're adding to the timekeeping core.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/5] Add functions producing system time given a backing counter value

2015-07-27 Thread John Stultz
On Mon, Jul 27, 2015 at 5:46 PM, Christopher Hall
 wrote:
> * counter_to_rawmono64
> * counter_to_mono64
> * counter_to_realtime64
>
> Enables drivers to translate a captured system clock counter to system
> time. This is useful for network and audio devices that capture timestamps
> in terms of both the system clock and device clock.

Huh.  So for counter_to_realtime64 & mono64, this seems to ignore the
fact that the multiplier is constantly adjusted and corrected. So that
calling the function twice with the same counter value may result in
different returned values.

I've not yet groked the whole patchset, but it seems like there needs
to be some mechanism that ensures the counter value is captured and
used in the same (or at least close) interval that the timekeeper data
is valid for.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] FRV: do_gettimeofday() should no longer use tickadj

2006-09-05 Thread john stultz
On Tue, 2006-09-05 at 16:35 +0100, David Howells wrote:
> Stop do_gettimeofday() on FRV from using tickadj, and model it after ARM
> instead.
> 
> This patch also provides a placeholder macro for getting hardware timer data 
> to
> be filled in when such is available.

>From this patch it looks like the FRV arch could be trivially converted
to GENERIC_TIME.

Would you consider the following, totally untested patch?

Signed-off-by: John Stultz <[EMAIL PROTECTED]>

 Kconfig   |4 ++
 kernel/time.c |   81 --
 2 files changed, 4 insertions(+), 81 deletions(-)

diff --git a/arch/frv/Kconfig b/arch/frv/Kconfig
index 95a3892..a601a17 100644
--- a/arch/frv/Kconfig
+++ b/arch/frv/Kconfig
@@ -29,6 +29,10 @@ config GENERIC_HARDIRQS
bool
default n
 
+config GENERIC_TIME
+   bool
+   default y
+
 config TIME_LOW_RES
bool
default y
diff --git a/arch/frv/kernel/time.c b/arch/frv/kernel/time.c
index d5b64e1..68a77fe 100644
--- a/arch/frv/kernel/time.c
+++ b/arch/frv/kernel/time.c
@@ -32,8 +32,6 @@
 
 #define TICK_SIZE (tick_nsec / 1000)
 
-extern unsigned long wall_jiffies;
-
 unsigned long __nongprelbss __clkin_clock_speed_HZ;
 unsigned long __nongprelbss __ext_bus_clock_speed_HZ;
 unsigned long __nongprelbss __res_bus_clock_speed_HZ;
@@ -145,85 +143,6 @@ void time_init(void)
 }
 
 /*
- * This version of gettimeofday has near microsecond resolution.
- */
-void do_gettimeofday(struct timeval *tv)
-{
-   unsigned long seq;
-   unsigned long usec, sec;
-   unsigned long max_ntp_tick;
-
-   do {
-   unsigned long lost;
-
-   seq = read_seqbegin(&xtime_lock);
-
-   usec = 0;
-   lost = jiffies - wall_jiffies;
-
-   /*
-* If time_adjust is negative then NTP is slowing the clock
-* so make sure not to go into next possible interval.
-* Better to lose some accuracy than have time go backwards..
-*/
-   if (unlikely(time_adjust < 0)) {
-   max_ntp_tick = (USEC_PER_SEC / HZ) - tickadj;
-   usec = min(usec, max_ntp_tick);
-
-   if (lost)
-   usec += lost * max_ntp_tick;
-   }
-   else if (unlikely(lost))
-   usec += lost * (USEC_PER_SEC / HZ);
-
-   sec = xtime.tv_sec;
-   usec += (xtime.tv_nsec / 1000);
-   } while (read_seqretry(&xtime_lock, seq));
-
-   while (usec >= 100) {
-   usec -= 100;
-   sec++;
-   }
-
-   tv->tv_sec = sec;
-   tv->tv_usec = usec;
-}
-
-EXPORT_SYMBOL(do_gettimeofday);
-
-int do_settimeofday(struct timespec *tv)
-{
-   time_t wtm_sec, sec = tv->tv_sec;
-   long wtm_nsec, nsec = tv->tv_nsec;
-
-   if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
-   return -EINVAL;
-
-   write_seqlock_irq(&xtime_lock);
-   /*
-* This is revolting. We need to set "xtime" correctly. However, the
-* value in this location is the value at the most recent update of
-* wall time.  Discover what correction gettimeofday() would have
-* made, and then undo it!
-*/
-   nsec -= 0 * NSEC_PER_USEC;
-   nsec -= (jiffies - wall_jiffies) * TICK_NSEC;
-
-   wtm_sec  = wall_to_monotonic.tv_sec + (xtime.tv_sec - sec);
-   wtm_nsec = wall_to_monotonic.tv_nsec + (xtime.tv_nsec - nsec);
-
-   set_normalized_timespec(&xtime, sec, nsec);
-   set_normalized_timespec(&wall_to_monotonic, wtm_sec, wtm_nsec);
-
-   ntp_clear();
-   write_sequnlock_irq(&xtime_lock);
-   clock_was_set();
-   return 0;
-}
-
-EXPORT_SYMBOL(do_settimeofday);
-
-/*
  * Scheduler clock - returns current time in nanosec units.
  */
 unsigned long long sched_clock(void)


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[BUG] Two BUG warnings in net/core/dev.c

2006-07-11 Thread john stultz
Both of these were seen on my laptop w/ the current (as of this writing)
-git tree using the e1000 driver after a suspend/resume cycle.

thanks
-john

BUG: warning at net/core/dev.c:1171/skb_checksum_help()
 [] show_trace_log_lvl+0x149/0x170
 [] show_trace+0x1b/0x20
 [] dump_stack+0x24/0x30
 [] skb_checksum_help+0x163/0x170
 [] ip_nat_fn+0x1a5/0x210
 [] ip_nat_local_fn+0x65/0xf0
 [] nf_iterate+0x60/0x90
 [] nf_hook_slow+0x5c/0x100
 [] ip_queue_xmit+0x1f1/0x4c0
 [] tcp_transmit_skb+0x3bf/0x7d0
 [] tcp_push_one+0x9f/0x120
 [] tcp_sendmsg+0x393/0xbf0
 [] inet_sendmsg+0x35/0x60
 [] sock_sendmsg+0xcd/0x100
 [] sys_sendto+0xd3/0x100
 [] sys_send+0x32/0x40
 [] sys_socketcall+0x142/0x260
 [] sysenter_past_esp+0x56/0x8d
 [] 0xb7fa8410


BUG: warning at net/core/dev.c:1225/skb_gso_segment()
 [] show_trace_log_lvl+0x149/0x170
 [] show_trace+0x1b/0x20
 [] dump_stack+0x24/0x30
 [] skb_gso_segment+0x20a/0x210
 [] dev_hard_start_xmit+0x15f/0x300
 [] __qdisc_run+0x7b/0x1f0
 [] dev_queue_xmit+0x127/0x310
 [] neigh_resolve_output+0xec/0x2a0
 [] ip_output+0x170/0x260
 [] ip_queue_xmit+0x2a6/0x4c0
 [] tcp_transmit_skb+0x3bf/0x7d0
 [] tcp_push_one+0x9f/0x120
 [] tcp_sendmsg+0x393/0xbf0
 [] inet_sendmsg+0x35/0x60
 [] sock_sendmsg+0xcd/0x100
 [] sys_sendto+0xd3/0x100
 [] sys_send+0x32/0x40
 [] sys_socketcall+0x142/0x260
 [] sysenter_past_esp+0x56/0x8d
 [] 0xb7fa8410


-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html