Re: [PATCH net] net: usb: lan78xx: Connect PHY before registering MAC
On Thu, Oct 17, 2019 at 09:29:26PM +0200, Andrew Lunn wrote:
> As soon as the netdev is registered, the kernel can start using the
> interface. If the driver connects the MAC to the PHY after the netdev
> is registered, there is a race condition where the interface can be
> opened without having the PHY connected.
>
> Change the order to close this race condition.
>
> Fixes: 92571a1aae40 ("lan78xx: Connect phy early")
> Reported-by: Daniel Wagner
> Signed-off-by: Andrew Lunn

Tested-by: Daniel Wagner

Thanks for the fix!
Daniel
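The ordering argument in the quoted changelog can be illustrated with a small userspace model in C. This is only a sketch: it does not use the real lan78xx or phylib API, all names (`model_dev`, `model_register()`, `model_phy_connect()`, the two probe functions) are hypothetical, and it assumes the worst case in which the interface is opened immediately after registration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a network device; not the real struct net_device. */
struct model_dev {
	bool registered;         /* device is visible to the rest of the kernel */
	bool phy_connected;      /* MAC has been attached to its PHY */
	bool opened_without_phy; /* set if an open ever ran before the PHY was there */
};

/* Models the open callback: it may run at any time once the device is registered. */
static void model_open(struct model_dev *dev)
{
	assert(dev->registered);
	if (!dev->phy_connected)
		dev->opened_without_phy = true;
}

/* Models registration. Assumed worst case: the interface is opened right away. */
static void model_register(struct model_dev *dev)
{
	dev->registered = true;
	model_open(dev);
}

static void model_phy_connect(struct model_dev *dev)
{
	dev->phy_connected = true;
}

/* Old order: register first, connect the PHY afterwards. */
bool probe_register_first(void)
{
	struct model_dev dev = { false, false, false };

	model_register(&dev);
	model_phy_connect(&dev);
	return dev.opened_without_phy;
}

/* Fixed order: connect the PHY first, then register. */
bool probe_connect_first(void)
{
	struct model_dev dev = { false, false, false };

	model_phy_connect(&dev);
	model_register(&dev);
	return dev.opened_without_phy;
}
```

In this model only `probe_register_first()` reports an open without a PHY; swapping the two calls removes the window, which is the change the patch describes.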
Re: lan78xx and phy_state_machine
> >> Unfortunately, you didn't write which kernel version works for you
> >> (except for this splat). Only 5.3 or 5.4-rc3 too?
> > With v5.2.20 I was able to boot the system. But after this discussion
> > I would say that was just luck. The race seems to have existed for longer
> > and only with my 'special' config am I able to reproduce it.
> okay, let me rephrase my question. You said that 5.4-rc3 didn't even
> boot in your setup. After applying Andrew's patch, does it boot or is it
> a different issue?

Yes, with Andrew's patch the initial problem is gone.

> >> [1] - https://marc.info/?l=linux-netdev&m=154604180927252&w=2
> >> [2] - https://patchwork.kernel.org/patch/10888797/
> > Indeed, the irq domain code looks suspicious and Marc pointed out that
> > it is dead wrong. Could we just go with [2] and fix this up?
>
> Sorry, I cannot answer this question.

Sure, I was just lobbying :)
Re: lan78xx and phy_state_machine
Hi Stefan,

On Thu, Oct 17, 2019 at 07:05:32PM +0200, Stefan Wahren wrote:
> On 17.10.19 at 08:52, Daniel Wagner wrote:
> > On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
> >> Please could you give this a go. It is totally untested, not even
> >> compile tested...
> > Sure. The system boots but there is one splat:
> >
> this is a known issue since 4.20 [1], [2]. So not related to the crash.

Oh, I see.

> Unfortunately, you didn't write which kernel version works for you
> (except for this splat). Only 5.3 or 5.4-rc3 too?

With v5.2.20 I was able to boot the system. But after this discussion
I would say that was just luck. The race seems to have existed for longer
and only with my 'special' config am I able to reproduce it.

> [1] - https://marc.info/?l=linux-netdev&m=154604180927252&w=2
> [2] - https://patchwork.kernel.org/patch/10888797/

Indeed, the irq domain code looks suspicious and Marc pointed out that
it is dead wrong. Could we just go with [2] and fix this up?

Thanks,
Daniel
Re: lan78xx and phy_state_machine
On Wed, Oct 16, 2019 at 05:51:07PM +0200, Andrew Lunn wrote:
> Hi Daniel
>
> Please could you give this a go. It is totally untested, not even
> compile tested...

Sure. The system boots but there is one splat:

[2.213987] usb 1-1: new high-speed USB device number 2 using dwc2
[2.426789] hub 1-1:1.0: USB hub found
[2.430677] hub 1-1:1.0: 4 ports detected
[2.721982] usb 1-1.1: new high-speed USB device number 3 using dwc2
[2.826991] hub 1-1.1:1.0: USB hub found
[2.831093] hub 1-1.1:1.0: 3 ports detected
[3.489988] usb 1-1.1.1: new high-speed USB device number 4 using dwc2
[3.729045] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): deferred multicast write 0x7ca0
[3.870518] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): No External EEPROM. Setting MAC Speed
[3.881900] libphy: lan78xx-mdiobus: probed
[3.893322] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): registered mdiobus bus usb-001:004
[3.902984] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): phydev->irq = 79
[4.283761] random: crng init done
[4.958866] lan78xx 1-1.1.1:1.0 eth0: receive multicast hash filter
[4.965311] lan78xx 1-1.1.1:1.0 eth0: deferred multicast write 0x7ca2
[6.502358] lan78xx 1-1.1.1:1.0 eth0: PHY INTR: 0x0002
[6.507935] ------------[ cut here ]------------
[6.512635] irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
[6.520250] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x150/0x170
[6.529424] Modules linked in:
[6.532526] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.4.0-rc3-00018-g5bc52f64e884-dirty #36
[6.541172] Hardware name: Raspberry Pi 3 Model B+ (DT)
[6.546471] pstate: 6005 (nZCv daif -PAN -UAO)
[6.551329] pc : __handle_irq_event_percpu+0x150/0x170
[6.556539] lr : __handle_irq_event_percpu+0x150/0x170
[6.561747] sp : 800010003cc0
[6.565104] x29: 800010003cc0 x28: 0060
[6.570493] x27: 8000110fb9b0 x26: 800011a3daeb
[6.575882] x25: 800011892d40 x24: 37525800
[6.581270] x23: 004f x22: 800010003d64
[6.586659] x21: x20: 0002
[6.592046] x19: 3716fb00 x18: 0010
[6.597434] x17: 0001 x16: 0007
[6.602822] x15: 8000118931b0 x14: 747075727265746e
[6.608210] x13: 692064656c62616e x12: 65203878302f3078
[6.613598] x11: 302b72656c646e61 x10: 685f7972616d6972
[6.618986] x9 : 705f746c75616665 x8 : 800011a9f000
[6.624374] x7 : 800010681150 x6 : 00f9
[6.629761] x5 : x4 :
[6.635148] x3 : x2 : 8000118a2440
[6.640535] x1 : ab82878caf7c9e00 x0 :
[6.645923] Call trace:
[6.648404]  __handle_irq_event_percpu+0x150/0x170
[6.653262]  handle_irq_event_percpu+0x30/0x88
[6.657767]  handle_irq_event+0x44/0xc8
[6.661659]  handle_simple_irq+0x90/0xc0
[6.665635]  generic_handle_irq+0x24/0x38
[6.669703]  intr_complete+0x104/0x178
[6.673508]  __usb_hcd_giveback_urb+0x58/0xf8
[6.677927]  usb_giveback_urb_bh+0xac/0x108
[6.682173]  tasklet_action_common.isra.0+0x154/0x1a0
[6.687298]  tasklet_hi_action+0x24/0x30
[6.691277]  __do_softirq+0x120/0x23c
[6.694990]  irq_exit+0xb8/0xd8
[6.698174]  __handle_domain_irq+0x64/0xb8
[6.702326]  bcm2836_arm_irqchip_handle_irq+0x60/0xc0
[6.707449]  el1_irq+0xb8/0x180
[6.710634]  arch_cpu_idle+0x10/0x18
[6.714260]  do_idle+0x200/0x280
[6.717532]  cpu_startup_entry+0x20/0x40
[6.721512]  rest_init+0xd4/0xe0
[6.724786]  arch_call_rest_init+0xc/0x14
[6.728851]  start_kernel+0x420/0x44c
[6.732562] ---[ end trace e770c2c68be5476f ]---
[6.742776] lan78xx 1-1.1.1:1.0 eth0: speed: 1000 duplex: 1 anadv: 0x05e1 anlpa: 0xc1e1
[6.750940] lan78xx 1-1.1.1:1.0 eth0: rx pause disabled, tx pause disabled
[6.769976] Sending DHCP requests ..., OK
[ 12.926088] IP-Config: Got DHCP answer from 192.168.19.2, my address is 192.168.19.53
[ 12.934059] IP-Config: Complete:
[ 12.937335] device=eth0, hwaddr=b8:27:eb:85:c7:c9, ipaddr=192.168.19.53, mask=255.255.255.0, gw=192.168.19.1
[ 12.947758] host=192.168.19.53, domain=, nis-domain=(none)
[ 12.953772] bootserver=192.168.19.2, rootserver=192.168.19.2, rootpath=
[ 12.953776] nameserver0=192.168.19.2
[ 12.965221] ALSA device list:
[ 12.968246] No soundcards found.
[ 12.984397] VFS: Mounted root (nfs filesystem) on device 0:19.
[ 12.991059] devtmpfs: mounted
[ 13.000530] Freeing unused kernel memory: 5504K
[ 13.018077] Run /sbin/init as init process
[ 44.010022] nfs: server 192.168.19.2 not responding, still trying
[ 44.010027] nfs: server 192.168.19.2 not responding, still trying
[ 44.010033] nfs: server 192.168.19.2 not responding, still trying
Re: lan78xx and phy_state_machine
On Tue, Oct 15, 2019 at 07:16:53PM +0200, Daniel Wagner wrote:
> Could it be that the networking interface is still running (from
> u-boot and PXE) when the driver is setting it up and the workqueue is
> prematurely kicked to work?

I've dumped the registers before the device is set up and verified them
against the manual. So the device is in the reset state as documented in
FIGURE 13-1:

http://ww1.microchip.com/downloads/en/DeviceDoc/LAN7800-Data-Sheet-DS1992G.pdf

After being burned several times I'd like to check such things
first. Anyway, that rules out my boot setup.

> Anyway, I keep trying to get some trace out of it.

After adding ignore_loglevel to the command line, I finally get a
trace on the console. Note that with the WARN_ON the system boots.
Though there still seems to be something wrong with the network,
because there is no reliable connection to the NFS server.

[3.743559] lan78xx 1-1.1.1:1.0 (unnamed net_device) (uninitialized): No External EEPROM. Setting MAC Speed
[3.754941] libphy: lan78xx-mdiobus: probed
[3.815609] ------------[ cut here ]------------
[3.820316] WARNING: CPU: 3 PID: 1 at drivers/net/phy/phy.c:496 phy_queue_state_machine+0xc/0x30
[3.829226] Modules linked in:
[3.832329] CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.4.0-rc3-00018-g5bc52f64e884-dirty #32
[3.840974] Hardware name: Raspberry Pi 3 Model B+ (DT)
[3.846273] pstate: 6005 (nZCv daif -PAN -UAO)
[3.851132] pc : phy_queue_state_machine+0xc/0x30
[3.855903] lr : phy_start+0x88/0xa0
[3.859524] sp : 800010023b80
[3.862882] x29: 800010023b80 x28: 37c34000
[3.868270] x27: 8000111ac178 x26: 1002
[3.873657] x25: 0001 x24:
[3.879046] x23: 1002 x22: 800010e3d850
[3.884433] x21: 37c34800 x20: 37328438
[3.889820] x19: 37328000 x18: 000e
[3.895209] x17: 0001 x16: 0019
[3.900596] x15: x14:
[3.905985] x13: x12: 1da9
[3.911372] x11: x10:
[3.916759] x9 : 383b2750 x8 : 383b1dc0
[3.922148] x7 : 37e900c0 x6 : 0002
[3.927535] x5 : 0001 x4 : 37e90028
[3.932923] x3 : x2 : 0001
[3.938311] x1 : x0 : 37328000
[3.943698] Call trace:
[3.946179]  phy_queue_state_machine+0xc/0x30
[3.950597]  phy_start+0x88/0xa0
[3.953870]  lan78xx_open+0x30/0x140
[3.957499]  __dev_open+0xc0/0x170
[3.960950]  __dev_change_flags+0x160/0x1b8
[3.965192]  dev_change_flags+0x20/0x60
[3.969083]  ip_auto_config+0x254/0xe54
[3.972974]  do_one_initcall+0x50/0x190
[3.976865]  kernel_init_freeable+0x194/0x22c
[3.981285]  kernel_init+0x10/0x100
[3.984822]  ret_from_fork+0x10/0x18
[3.988445] ---[ end trace a7b6e745fa28cd56 ]---
[4.025682] random: crng init done
[6.401142] ------------[ cut here ]------------
[6.405854] irq 79 handler irq_default_primary_handler+0x0/0x8 enabled interrupts
[6.413468] WARNING: CPU: 0 PID: 0 at kernel/irq/handle.c:152 __handle_irq_event_percpu+0x150/0x170
[6.422642] Modules linked in:
[6.425744] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW 5.4.0-rc3-00018-g5bc52f64e884-dirty #32
[6.435799] Hardware name: Raspberry Pi 3 Model B+ (DT)
[6.441099] pstate: 6005 (nZCv daif -PAN -UAO)
[6.445957] pc : __handle_irq_event_percpu+0x150/0x170
[6.451168] lr : __handle_irq_event_percpu+0x150/0x170
[6.456375] sp : 800010003cc0
[6.459732] x29: 800010003cc0 x28: 0060
[6.465120] x27: 8000110929a8 x26: 80001192d86b
[6.470508] x25: 800011782d40 x24: 374cde00
[6.475897] x23: 004f x22: 800010003d64
[6.481285] x21: x20: 0002
[6.486672] x19: 372ee180 x18: 0010
[6.492060] x17: 0001 x16: 0007
[6.497448] x15: 8000117831b0 x14: 747075727265746e
[6.502835] x13: 692064656c62616e x12: 65203878302f3078
[6.508223] x11: 302b72656c646e61 x10: 685f7972616d6972
[6.513611] x9 : 705f746c75616665 x8 : 800011952000
[6.518999] x7 : 80001066dce0 x6 : 0106
[6.524387] x5 : x4 :
[6.529775] x3 : x2 : 800011792440
[6.535163] x1 : 190f5ab71e843000 x0 :
[6.540550] Call trace:
[6.543032]  __handle_irq_event_percpu+0x150/0x170
[6.547890]  handle_irq_event_percpu+0x30/0x88
[6.552394]  handle_irq_event+0x44/0xc8
[6.556283]  handle_simple_irq+0x90/0xc0
[6.560260]  generic_handle_irq+0x24/0x38
[6.564328]  intr_complete+0xb0/0xe0
[6.567955]  __usb_hcd_giveback_urb+0x58/0xf8
[6.572374]  usb_give
Re: lan78xx and phy_state_machine
Hi Andrew,

On Tue, Oct 15, 2019 at 02:53:27AM +0200, Andrew Lunn wrote:
> On Mon, Oct 14, 2019 at 04:06:04PM +0200, Daniel Wagner wrote:
> > Hi,
> >
> > I've been trying to boot a RPi 3 Model B+ in 64 bit mode. While I can get
> > my configuration booting with v5.2.20, the current kernel v5.3.6 hangs
> > when initializing the eth interface.
> >
> > Is this a known issue? Some configuration issue?
>
> Hi Daniel
>
> Please could you add a WARN_ON(1); in phy_queue_state_machine() and
> post the stack dump. That might help us figure out what is going on.

I tried to get a stack dump from the WARN_ON(1). The 'make defconfig'
seems not to enable it(?). Anyway, I played a bit and noticed that,
depending on which additional debug config switch is enabled, the
problem disappears. The boot timing is important, it seems.

After the feedback I got so far, I think my setup is 'special' insofar
as I don't boot from eMMC. Instead I rely on TFTP and NFS for rootfs:

 - kernel is configured as 'make defconfig' +

   #
   # Built in drivers
   #
   CONFIG_USB_LAN78XX=y

   #
   # Networking
   #
   CONFIG_PACKET=y
   CONFIG_UNIX=y
   CONFIG_INET=y
   CONFIG_IP_PNP=y
   CONFIG_IP_PNP_DHCP=y

   # NFS
   CONFIG_NFS_FS=y
   CONFIG_NFS_V4=y
   CONFIG_NFS_V4_1=y
   CONFIG_NFS_V4_2=y

   #
   # Debugging
   #
   CONFIG_PRINTK_TIME=y
   CONFIG_DEBUG_KERNEL=y
   CONFIG_EARLY_PRINTK=y
   CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7

   # Embedded config to kernel. /proc/config.gz
   CONFIG_IKCONFIG=y
   CONFIG_IKCONFIG_PROC=y

   CONFIG_KEXEC=y

 - u-boot enables network interface, does DHCP
 - fetches a PXE image
 - PXE loads DTB, kernel and starts the kernel
 - rootfs is supposed to be provided via NFS

Could it be that the networking interface is still running (from
u-boot and PXE) when the driver is setting it up and the workqueue is
prematurely kicked to work?

Anyway, I keep trying to get some trace out of it.

Thanks,
Daniel
Re: wl1251 & mac address & calibration data
On 12/16/2016 03:03 AM, Luis R. Rodriguez wrote:
> For the new API a solution for "fallback mechanisms" should be clean
> though and I am looking to stay as far as possible from the existing
> mess. A solution to help both the old API and new API is possible for
> the "fallback mechanism" though -- but for that I can only refer you
> at this point to some of Daniel Wagner and Tom Gunderson's firmwared
> daemon prospect. It should help pave the way for a clean solution and
> help address other stupid issues.

The firmwared project is hosted here:

https://github.com/teg/firmwared

As Luis pointed out, firmwared relies on
FW_LOADER_USER_HELPER_FALLBACK, which is not enabled by default. I
don't see any reason why firmwared should not also support loading
calibration data, if we find a sound way to do this.

As you can see from the commit history it is a pretty young project
and more or less a reanimation of the old udev firmware loader
feature. We are getting it into shape, adding integration tests etc.

The main motivation for this project is to get movement back into the
stuck discussion on the firmware loader API. Luis was very busy
writing up all the details on the current situation and purely from
the amount of documentation needed to describe the API you can tell
something is awry.

Thanks,
Daniel
[PATCH] xprtrdma: use complete() instead complete_all()
From: Daniel Wagner

There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

The usage pattern of the completion is:

waiter context                          waker context

frwr_op_unmap_sync()
  reinit_completion()
  ib_post_send()
  wait_for_completion()

                                        frwr_wc_localinv_wake()
                                          complete()

Signed-off-by: Daniel Wagner
Cc: Anna Schumaker
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: linux-...@vger.kernel.org
Cc: netdev@vger.kernel.org
---
 net/sunrpc/xprtrdma/frwr_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 892b5e1..4a24f0e 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -329,7 +329,7 @@ frwr_wc_localinv_wake(struct ib_cq *cq, struct ib_wc *wc)
 	frmr = container_of(cqe, struct rpcrdma_frmr, fr_cqe);
 	if (wc->status != IB_WC_SUCCESS)
 		__frwr_sendcompletion_flush(wc, frmr, "localinv");
-	complete_all(&frmr->fr_linv_done);
+	complete(&frmr->fr_linv_done);
 }
 
 /* Post a REG_MR Work Request to register a memory region
--
2.7.4
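The difference between the two calls in the changelog above can be sketched with a single-threaded counter model in C. This is a simplification under stated assumptions: the real kernel completion also has a wait queue and locking, and its internals differ between kernel versions. All `model_*` names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a completion: just the counter. */
struct model_completion {
	unsigned int done;
};

/* Like reinit_completion(): forget any pending wakeups. */
void model_reinit(struct model_completion *c)
{
	c->done = 0;
}

/* Like complete(): let exactly one more waiter proceed. */
void model_complete(struct model_completion *c)
{
	if (c->done != UINT_MAX)
		c->done++;
}

/* Like complete_all(): let every current and future waiter proceed. */
void model_complete_all(struct model_completion *c)
{
	c->done = UINT_MAX;
}

/* Non-blocking wait: consumes one pending wakeup, if there is one. */
bool model_try_wait(struct model_completion *c)
{
	if (c->done == 0)
		return false;
	if (c->done != UINT_MAX)
		c->done--;
	return true;
}

/* Walks through the single-waiter pattern from the patch description. */
void model_single_waiter_demo(void)
{
	struct model_completion linv_done;
	bool first, second;

	model_reinit(&linv_done);            /* waiter: reinit_completion() */
	model_complete(&linv_done);          /* waker: complete() */
	first = model_try_wait(&linv_done);  /* the one waiter proceeds */
	second = model_try_wait(&linv_done); /* nothing is left over */
	assert(first && !second);
}
```

With a single waiter, `model_complete()` is sufficient and leaves the counter balanced; `model_complete_all()` would keep the completion signalled until the next reinit.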
[PATCH 0/2] wireless: Use complete() instead complete_all()
From: Daniel Wagner

Hi,

Using complete_all() is not wrong per se but it suggests that there
might be more than one reader. For -rt I am reviewing all
complete_all() users and would like to leave only the real ones in the
tree. The main problem for -rt about complete_all() is that it can be
used inside IRQ context and that can lead to an unbounded amount of
work inside the interrupt handler. That is a no no for -rt.

The patches are grouped per subsystem and in small batches to allow
reviewing.

This series ignores all complete_all() usages in the firmware loading
path. They will hopefully be addressed by Luis' sysdata patches [0].
That leaves a couple of complete_all() calls.

The first patch fixes a real glitch for the carl9170 driver. I was
able to test it because I have the hardware. For the second one I
haven't found any dongle with that chip in my drawers.

This series is against net-next of today.

cheers,
daniel

[0] https://lkml.kernel.org/r/1466117661-22075-1-git-send-email-mcg...@kernel.org

Daniel Wagner (2):
  carl9170: Fix wrong completion usage
  ath10k: use complete() instead complete_all()

 drivers/net/wireless/ath/ath10k/core.c  | 16 ++++++++--------
 drivers/net/wireless/ath/ath10k/mac.c   |  2 +-
 drivers/net/wireless/ath/carl9170/usb.c |  6 ++----
 3 files changed, 11 insertions(+), 13 deletions(-)

--
2.7.4
[PATCH 2/2] ath10k: use complete() instead complete_all()
From: Daniel Wagner

There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

The usage pattern of the completion is:

waiter context                          waker context

scan.started
------------

ath10k_start_scan()
  lockdep_assert_held(conf_mutex)
  ath10k_wmi_start_scan()
  wait_for_completion_timeout(scan.started)

                                        ath10k_wmi_event_scan_start_failed()
                                          complete(scan.started)

                                        ath10k_wmi_event_scan_started()
                                          complete(scan.started)

scan.completed
--------------

ath10k_scan_stop()
  lockdep_assert_held(conf_mutex)
  ath10k_wmi_stop_scan()
  wait_for_completion_timeout(scan.completed)

                                        __ath10k_scan_finish()
                                          complete(scan.completed)

scan.on_channel
---------------

ath10k_remain_on_channel()
  mutex_lock(conf_mutex)
  ath10k_start_scan()
  wait_for_completion_timeout(scan.on_channel)

                                        ath10k_wmi_event_scan_foreign_chan()
                                          complete(scan.on_channel)

offchan_tx_completed
--------------------

ath10k_offchan_tx_work()
  mutex_lock(conf_mutex)
  reinit_completion(offchan_tx_completed)
  wait_for_completion_timeout(offchan_tx_completed)

                                        ath10k_report_offchan_tx()
                                          complete(offchan_tx_completed)

install_key_done
----------------

ath10k_install_key()
  lockdep_assert_held(conf_mutex)
  reinit_completion(install_key_done)
  wait_for_completion_timeout(install_key_done)

                                        ath10k_htt_t2h_msg_handler()
                                          complete(install_key_done)

vdev_setup_done
---------------

ath10k_monitor_vdev_start()
  lockdep_assert_held(conf_mutex)
  reinit_completion(vdev_setup_done)
  ath10k_vdev_setup_sync()
    wait_for_completion_timeout(vdev_setup_done)

                                        ath10k_wmi_event_vdev_start_resp()
                                          complete(vdev_setup_done)

ath10k_monitor_vdev_stop()
  lockdep_assert_held(conf_mutex)
  reinit_completion(vdev_setup_done)
  ath10k_vdev_setup_sync()
    wait_for_completion_timeout(vdev_setup_done)

                                        ath10k_wmi_event_vdev_stopped()
                                          complete(vdev_setup_done)

thermal.wmi_sync
----------------

ath10k_thermal_show_temp()
  mutex_lock(conf_mutex)
  reinit_completion(thermal.wmi_sync)
  wait_for_completion_timeout(thermal.wmi_sync)

                                        ath10k_thermal_event_temperature()
                                          complete(thermal.wmi_sync)

bss_survey_done
---------------

ath10k_mac_update_bss_chan_survey()
  lockdep_assert_held(conf_mutex)
  reinit_completion(bss_survey_done)
  wait_for_completion_timeout(bss_survey_done)

                                        ath10k_wmi_event_pdev_bss_chan_info()
                                          complete(bss_survey_done)

All complete() calls happen while the conf_mutex is taken. That means
at max one waiter is possible.

Signed-off-by: Daniel Wagner
---
 drivers/net/wireless/ath/ath10k/core.c | 16 ++++++++--------
 drivers/net/wireless/ath/ath10k/mac.c  |  2 +-
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index e889829..ed76601 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -1497,14 +1497,14 @@ static void ath10k_core_restart(struct work_struct *work)
 
 	ieee80211_stop_queues(ar->hw);
 	ath10k_drain_tx(ar);
-	complete_all(&ar->scan.started);
-	complete_all(&ar->scan.completed);
-	complete_all(&ar->scan.on_channel);
-	complete_all(&ar->offchan_tx_completed);
-	complete_all(&ar->install_key_done);
-	complete_all(&ar->vdev_setup_done);
-	complete_all(&ar->thermal.wmi_sync);
-	complete_all(&ar->bss_survey_done);
+	complete(&ar->scan.started);
+	complete(&ar->scan.completed);
+	complete(&ar->scan.on_channel);
+	complete(&ar->offchan_tx_completed);
+	complete(&ar->install_key_done);
+	complete(&ar->vdev_setup_done);
+	complete(&ar->thermal.wmi_sync);
+	complete(&ar->bss_survey_done);
 	wake_up(&ar->htt.empty_tx_wq);
 	wake_up(&ar->wmi.tx_credits_wq);
 	wake_up(&ar->peer_mapping_wq);
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 0bbd0a0..c3c1c25 100644
-
[PATCH 1/2] carl9170: Fix wrong completion usage
From: Daniel Wagner

carl9170_usb_stop() is used from several places to flush and clean up
any pending work. The normal pattern is to send a request and wait for
the irq handler to call complete(). The completion is not
reinitialized during normal operation and, as the old comment
indicates, it is important to keep calls to
wait_for_completion_timeout() and complete() balanced.

Calling complete_all() brings this equilibrium out of balance and
needs to be fixed by a reinit_completion(). But that opens a small
race window. It is possible that the sequence of complete_all(),
reinit_completion() is faster than the wait_for_completion_timeout()
can do its work. The wake up is not lost but the done counter test
happens after reinit_completion() has been executed. The only reason
we don't see carl9170_exec_cmd() hang forever is that we use the
timeout version of wait_for_completion().

Let's fix this by reinitializing the completion (that is just setting
the done counter to 0) just before we send out a request. Now,
carl9170_usb_stop() can be sure a complete() call is enough to make
progress since there is only one waiter at max. This is a common
pattern also seen in various drivers which use completion.

Signed-off-by: Daniel Wagner
---
 drivers/net/wireless/ath/carl9170/usb.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/carl9170/usb.c b/drivers/net/wireless/ath/carl9170/usb.c
index 76842e6..99ab203 100644
--- a/drivers/net/wireless/ath/carl9170/usb.c
+++ b/drivers/net/wireless/ath/carl9170/usb.c
@@ -670,6 +670,7 @@ int carl9170_exec_cmd(struct ar9170 *ar, const enum carl9170_cmd_oids cmd,
 	ar->readlen = outlen;
 	spin_unlock_bh(&ar->cmd_lock);
 
+	reinit_completion(&ar->cmd_wait);
 	err = __carl9170_exec_cmd(ar, &ar->cmd, false);
 
 	if (!(cmd & CARL9170_CMD_ASYNC_FLAG)) {
@@ -778,10 +779,7 @@ void carl9170_usb_stop(struct ar9170 *ar)
 	spin_lock_bh(&ar->cmd_lock);
 	ar->readlen = 0;
 	spin_unlock_bh(&ar->cmd_lock);
-	complete_all(&ar->cmd_wait);
-
-	/* This is required to prevent an early completion on _start */
-	reinit_completion(&ar->cmd_wait);
+	complete(&ar->cmd_wait);
 
 	/*
 	 * Note:
--
2.7.4
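The race window described in this changelog can be replayed with a tiny single-threaded model in C. It is a sketch, not the driver code: the counter stands in for the completion's done counter, the interleaving is fixed by hand to the unlucky order, and every `model_*` and `*_scheme_*` name is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: only the "done" counter of a completion. */
struct model_completion {
	unsigned int done;
};

static void model_reinit(struct model_completion *c)
{
	c->done = 0;
}

static void model_complete(struct model_completion *c)
{
	assert(c->done < UINT_MAX); /* guard against counter overflow */
	c->done++;
}

static void model_complete_all(struct model_completion *c)
{
	c->done = UINT_MAX;
}

/* What the waiter tests when it finally gets to look at the counter. */
static bool model_wakeup_pending(const struct model_completion *c)
{
	return c->done != 0;
}

/*
 * Old scheme: the stop path calls complete_all() and reinit back to back.
 * Assumed unlucky interleaving: both run before the waiter tests the counter.
 */
bool old_scheme_wakeup_visible(void)
{
	struct model_completion cmd_wait = { 0 };

	/* waiter: request already sent, has not tested the counter yet */
	model_complete_all(&cmd_wait); /* stop path */
	model_reinit(&cmd_wait);       /* stop path, wipes the wakeup */
	return model_wakeup_pending(&cmd_wait);
}

/* New scheme: reinit before sending, the stop path only completes once. */
bool new_scheme_wakeup_visible(void)
{
	struct model_completion cmd_wait = { 0 };

	model_reinit(&cmd_wait);   /* waiter, just before sending the request */
	model_complete(&cmd_wait); /* stop path */
	return model_wakeup_pending(&cmd_wait);
}
```

In the old scheme the waiter finds the counter at zero and, in the driver, would only return because of the timeout; in the new scheme the single complete() stays visible until the waiter consumes it.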
Re: [PATCH net-next] nfnetlink_queue: enable PID info retrieval
Hi Daniel,

> [ Cc'ing John, Daniel, et al ]
>
> Btw, while I just looked at scm_detach_fds(), I think commits ...
>
>  * 48a87cc26c13 ("net: netprio: fd passed in SCM_RIGHTS datagram not set correctly")
>  * d84295067fc7 ("net: net_cls: fd passed in SCM_RIGHTS datagram not set correctly")
>
> ... might not be correct, maybe I'm missing something ...? Let's say
> process A has a socket fd that it sends via SCM_RIGHTS to process B.
> Process A was the one that called sk_alloc() originally. Now in
> scm_detach_fds() we install a new fd for process B pointing to the
> same sock (file's private_data) and above commits update the cached
> socket cgroup data for net_cls/net_prio to the new process B.
> So, if process A for example still sends data over that socket, skbs
> will then wrongly match on B's cgroup membership instead of A's, no?

I can't remember the details right now (I need to read up again, but I
won't have time till Wednesday).

From your analysis I would say that is not the desired effect. A should
match against its own cgroup and not the one of B.

cheers,
daniel
Re: [PATCH v2 net-next 0/12] bpf: map pre-alloc
Hi Alexei,

On 03/08/2016 06:57 AM, Alexei Starovoitov wrote:
> v1->v2:
> . fix few issues spotted by Daniel
> . converted stackmap into pre-allocation as well
> . added a workaround for lockdep false positive
> . added pcpu_freelist_populate to be used by hashmap and stackmap
>
> this patch set switches bpf hash map to use pre-allocation by default
> and introduces BPF_F_NO_PREALLOC flag to keep old behavior for cases
> where full map pre-allocation is too memory expensive.
>
> Some time back Daniel Wagner reported crashes when bpf hash map is
> used to compute time intervals between preempt_disable->preempt_enable
> and recently Tom Zanussi reported a dead lock in iovisor/bcc/funccount
> tool if it's used to count the number of invocations of kernel
> '*spin*' functions. Both problems are due to the recursive use of
> slub and can only be solved by pre-allocating all map elements.

I gave it a short spin and the lathist sample works just fine.

cheers,
daniel
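The idea of pre-allocating all elements so that the hot path never enters the allocator can be sketched in plain C. This is a single-threaded illustration under stated assumptions: it is not the kernel's per-CPU freelist that the quoted changelog refers to, `POOL_SIZE` is an arbitrary capacity, and `elem`, `elem_pool`, `pool_init()`, `pool_get()` and `pool_put()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4 /* illustrative capacity, fixed when the "map" is created */

struct elem {
	struct elem *next; /* free-list link */
	int key;
	int value;
};

struct elem_pool {
	struct elem storage[POOL_SIZE]; /* every element exists up front */
	struct elem *free_head;
};

/* Builds the free list once, at creation time. */
void pool_init(struct elem_pool *p)
{
	p->free_head = NULL;
	for (size_t i = 0; i < POOL_SIZE; i++) {
		p->storage[i].next = p->free_head;
		p->free_head = &p->storage[i];
	}
}

/* Takes an element from the free list; never calls an allocator. */
struct elem *pool_get(struct elem_pool *p)
{
	struct elem *e = p->free_head;

	if (e)
		p->free_head = e->next;
	return e; /* NULL when the pool is exhausted */
}

/* Returns an element to the free list. */
void pool_put(struct elem_pool *p, struct elem *e)
{
	assert(e != NULL);
	e->next = p->free_head;
	p->free_head = e;
}
```

Because `pool_get()` and `pool_put()` only relink existing elements, they cannot recurse into an allocator. The cost is that memory for the full capacity is held from the start, which is the trade-off the BPF_F_NO_PREALLOC flag in the quoted text is meant for.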
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
On 11/23/2015 04:53 PM, Tejun Heo wrote:
> On Mon, Nov 23, 2015 at 09:54:32AM +0100, Daniel Wagner wrote:
> ...
>>> [3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
>>> [3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
>>> [3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
>>> [3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
>>> [3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
>>> [3.229512] 88007c043af0 81136868 834a2160 88007aff5940
>>> [3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
>>> [3.230716] Call Trace:
>>> [3.230906] [] dump_stack+0x4e/0x82
>>> [3.231289] [] spin_dump+0x78/0xc0
>>> [3.231642] [] do_raw_spin_unlock+0x75/0xd0
>>> [3.232039] [] _raw_spin_unlock+0x27/0x50
>>> [3.232431] [] update_classid_sock+0x68/0x80
>>> [3.232836] [] iterate_fd+0x71/0x150
>>> [3.233197] [] update_classid+0x47/0x80
>>> [3.233571] [] cgrp_attach+0x14/0x20
>>> [3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
>>> [3.234366] [] cgroup_migrate+0xf5/0x190
>>> [3.235130] [] cgroup_attach_task+0x176/0x200
>>> [3.235953] [] __cgroup_procs_write+0x2ad/0x460
>>> [3.236805] [] cgroup_procs_write+0x14/0x20
>>> [3.237205] [] cgroup_file_write+0x35/0x1c0
>>> [3.237600] [] kernfs_fop_write+0x141/0x190
>>> [3.237998] [] __vfs_write+0x28/0xe0
>>> [3.239554] [] vfs_write+0xac/0x1a0
>>> [3.240308] [] SyS_write+0x49/0xb0
>>> [3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76
>>
>> I have enabled a few additional cgroup controllers as well, because I was
>> trying to figure out why I only see the 'memory' cgroup controller in
>> cgroup.controllers. pid and io show up but not net_prio or net_cls.
>> Not sure why systemd (v227) is not mounting them.
>
> net_prio and net_cls aren't gonna be on the v2 hierarchy. The match
> in this patchset is being introduced to replace them; however, you can
> mount them separately on a v1 hierarchy and use the same as before.

Okay, I could have figured that out myself I guess. I mounted the v1
hierarchy and it works as you have described it.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/9] sock, cgroup: add sock->sk_cgroup
On 11/23/2015 04:48 PM, Tejun Heo wrote:
> On Mon, Nov 23, 2015 at 02:02:03PM +0100, Daniel Wagner wrote:
>> On 11/21/2015 05:13 PM, Tejun Heo wrote:
>>> Signed-off-by: Tejun Heo
>>> Cc: Daniel Borkmann
>>> Cc: Daniel Wagner
>>
>> I did a quick test and for a new connection the cgroup2 match worked as
>> expected. For an existing connection I wasn't able to trigger the match.
>>
>> It is quite likely I am doing something wrong:
>>
>> ssh into the box
>>  # mkdir /sys/fs/cgroup/test
>>  # echo $$ > /sys/fs/cgroup/test/cgroup.procs
>>  # echo $PPID > /sys/fs/cgroup/test/cgroup.procs
>>  # iptables -A OUTPUT -m cgroup --path test
>>
>> Should I see matches with the existing ssh session?
>
> Socket is associated with the creating cgroup and stays associated
> with that cgroup until it's released. Migrating the process doesn't
> change the ownership of the sockets it has created. This is in line
> with how other stateful resources such as memory are handled in
> cgroup2 hierarchy.

Thanks for the explanation. Looks good to me:

Tested-by: Daniel Wagner
Acked-by: Daniel Wagner

Thanks,
Daniel
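The ownership rule explained in the quoted reply can be modelled in a few lines of C. This is only a sketch with hypothetical names (`model_task`, `model_sock`, `model_match()`): cgroups are reduced to integer ids, and the match is a plain equality test, which is a simplification of a path match.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; cgroups are reduced to integer ids. */
struct model_task {
	int cgroup_id;
};

struct model_sock {
	int owner_cgroup_id; /* fixed at creation time */
};

/* A socket remembers the cgroup of the task that created it. */
struct model_sock model_sock_create(const struct model_task *creator)
{
	struct model_sock sk = { creator->cgroup_id };

	return sk;
}

/* Migrating a task changes the task only, not sockets it already owns. */
void model_task_migrate(struct model_task *task, int new_cgroup_id)
{
	assert(new_cgroup_id >= 0);
	task->cgroup_id = new_cgroup_id;
}

/* The match is modelled as a comparison against the socket's owner cgroup. */
bool model_match(const struct model_sock *sk, int cgroup_id)
{
	return sk->owner_cgroup_id == cgroup_id;
}
```

Applied to the test in the quoted mail: the existing ssh connection was created before the shell moved into the test cgroup, so in this model its socket keeps the old id and does not match, while a connection opened afterwards does.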
Re: [PATCH 7/9] sock, cgroup: add sock->sk_cgroup
Hi Tejun,

On 11/21/2015 05:13 PM, Tejun Heo wrote:
> Signed-off-by: Tejun Heo
> Cc: Daniel Borkmann
> Cc: Daniel Wagner

I did a quick test and for a new connection the cgroup2 match worked as
expected. For an existing connection I wasn't able to trigger the match.

It is quite likely I am doing something wrong:

ssh into the box
 # mkdir /sys/fs/cgroup/test
 # echo $$ > /sys/fs/cgroup/test/cgroup.procs
 # echo $PPID > /sys/fs/cgroup/test/cgroup.procs
 # iptables -A OUTPUT -m cgroup --path test

Should I see matches with the existing ssh session?

cheers,
daniel
Re: [PATCH 8/9] netfilter: prepare xt_cgroup for multi revisions
Hi Tejun,

On 11/21/2015 05:14 PM, Tejun Heo wrote:
> xt_cgroup will grow cgroup2 path based match. Postfix existing
> symbols with _v0 and prepare for multi revision registration.
>
> Signed-off-by: Tejun Heo
> Cc: Daniel Borkmann
> Cc: Daniel Wagner

Same as in my reply to patch #9 (yes, I know I am doing it in the wrong
order... though I can't stop now... :))

Tested-by: Daniel Wagner
Acked-by: Daniel Wagner

cheers,
daniel
Re: [PATCH 9/9] netfilter: implement xt_cgroup cgroup2 path match
Hi Tejun,

On 11/21/2015 05:14 PM, Tejun Heo wrote:
> +static int cgroup_mt_check_v1(const struct xt_mtchk_param *par)
> +{
> +	struct xt_cgroup_info_v1 *info = par->matchinfo;
> +	struct cgroup *cgrp;
> +
> +	if ((info->invert_path & ~1) || (info->invert_classid & ~1))
> +		return -EINVAL;

The checks below use pr_info() in case the configuration is not
valid. Is this missing here on purpose?

I have tested it slightly and it seems to work (also on an older
kernel). I don't know if that qualifies it for a Tested-by but at
least Acked-by should do the trick:

Tested-by: Daniel Wagner
Acked-by: Daniel Wagner

cheers,
daniel
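The `& ~1` test in the quoted hunk rejects any flag value other than 0 or 1. A standalone C sketch of that check follows; `model_info`, `is_bool_flag()` and `model_check()` are hypothetical stand-ins, the field width is an assumption, and the real function returns -EINVAL where this model returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in holding only the two fields from the quoted hunk. */
struct model_info {
	uint8_t invert_path;    /* field width is an assumption of this sketch */
	uint8_t invert_classid;
};

/* True if no bit other than bit 0 is set, i.e. the value is 0 or 1. */
static bool is_bool_flag(uint8_t v)
{
	return (v & ~1u) == 0;
}

/* Returns 0 if both flags are valid, -1 otherwise. */
int model_check(const struct model_info *info)
{
	assert(info);
	if (!is_bool_flag(info->invert_path) ||
	    !is_bool_flag(info->invert_classid))
		return -1;
	return 0;
}
```

Masking with `~1` clears bit 0, so anything left over means the caller passed a value that is not a clean boolean.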
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
On 11/23/2015 08:11 AM, Daniel Wagner wrote:
> [3.217648] systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
> [3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
> [3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
> [3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
> [3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
> [3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
> [3.229512] 88007c043af0 81136868 834a2160 88007aff5940
> [3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
> [3.230716] Call Trace:
> [3.230906] [] dump_stack+0x4e/0x82
> [3.231289] [] spin_dump+0x78/0xc0
> [3.231642] [] do_raw_spin_unlock+0x75/0xd0
> [3.232039] [] _raw_spin_unlock+0x27/0x50
> [3.232431] [] update_classid_sock+0x68/0x80
> [3.232836] [] iterate_fd+0x71/0x150
> [3.233197] [] update_classid+0x47/0x80
> [3.233571] [] cgrp_attach+0x14/0x20
> [3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
> [3.234366] [] cgroup_migrate+0xf5/0x190
> [3.234747] [] ? cgroup_migrate+0x5/0x190
> [3.235130] [] cgroup_attach_task+0x176/0x200
> [3.235543] [] ? cgroup_attach_task+0x5/0x200
> [3.235953] [] __cgroup_procs_write+0x2ad/0x460
> [3.236377] [] ? __cgroup_procs_write+0x5e/0x460
> [3.236805] [] cgroup_procs_write+0x14/0x20
> [3.237205] [] cgroup_file_write+0x35/0x1c0
> [3.237600] [] kernfs_fop_write+0x141/0x190
> [3.237998] [] __vfs_write+0x28/0xe0
> [3.238361] [] ? percpu_down_read+0x57/0xa0
> [3.238761] [] ? __sb_start_write+0xb4/0xf0
> [3.239154] [] ? __sb_start_write+0xb4/0xf0
> [3.239554] [] vfs_write+0xac/0x1a0
> [3.239930] [] ? __fget_light+0x66/0x90
> [3.240308] [] SyS_write+0x49/0xb0
> [3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76

I have enabled a few additional cgroup controllers as well, because I was
trying to figure out why I only see the 'memory' cgroup controller in
cgroup.controllers. pids and io now show up, but net_prio and net_cls do
not. I am not sure why systemd (v227) is not mounting them.

After a while, though, a similar call trace is produced. I guess this one
has nothing to do with the current changes.

[ 11.594536] ------------[ cut here ]------------
[ 11.595274] WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
[ 11.595958] Modules linked in:
[ 11.596199] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #196
[ 11.596689] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 11.597632] 81f66d8b 88007c04bb90 8155ccdc
[ 11.598234] 88007c04bbc8 810de202 8800793dda00 88007a096800
[ 11.598877] 88007c04bc80 88007a6b6200 0001 88007c04bbd8
[ 11.599547] Call Trace:
[ 11.599784] [] dump_stack+0x4e/0x82
[ 11.600197] [] warn_slowpath_common+0x82/0xc0
[ 11.600705] [] warn_slowpath_null+0x1a/0x20
[ 11.601208] [] pids_cancel.constprop.6+0x31/0x40
[ 11.601764] [] pids_can_attach+0x6d/0xf0
[ 11.602245] [] cgroup_taskset_migrate+0x6a/0x330
[ 11.602795] [] cgroup_migrate+0xf5/0x190
[ 11.603276] [] ? cgroup_migrate+0x5/0x190
[ 11.603788] [] cgroup_attach_task+0x176/0x200
[ 11.604308] [] ? cgroup_attach_task+0x5/0x200
[ 11.604831] [] __cgroup_procs_write+0x2ad/0x460
[ 11.605367] [] ? __cgroup_procs_write+0x5e/0x460
[ 11.605929] [] cgroup_procs_write+0x14/0x20
[ 11.606448] [] cgroup_file_write+0x35/0x1c0
[ 11.606931] [] kernfs_fop_write+0x141/0x190
[ 11.607401] [] __vfs_write+0x28/0xe0
[ 11.607834] [] ? percpu_down_read+0x57/0xa0
[ 11.608366] [] ? __sb_start_write+0xb4/0xf0
[ 11.608874] [] ? __sb_start_write+0xb4/0xf0
[ 11.609343] [] vfs_write+0xac/0x1a0
[ 11.609843] [] ? __fget_light+0x66/0x90
[ 11.610315] [] SyS_write+0x49/0xb0
[ 11.610756] [] entry_SYSCALL_64_fastpath+0x12/0x76
[ 11.611305] ---[ end trace 7f953d0ce5af99ea ]---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHSET v3] netfilter, cgroup: implement cgroup2 path match in xt_cgroup
Hi Tejun,

On 11/21/2015 05:13 PM, Tejun Heo wrote:
> This is v3 of the xt_cgroup2 patchset. Changes from the last take are
>
> * Folded cgroup2 path matching into xt_cgroup as a new revision rather
>   than a separate xt_cgroup2 match as suggested by Pablo.
>
> * Refreshed on top of Nina's net_cls dynamic config update fix patch.
>   I included the fix patch as part of this series to ease reviewing.

I started to play with your patches and was greeted by this:

[3.217648] systemd[1]: tmp.mount: Directory /tmp to mount over is not empty, mounting anyway.
[3.224665] BUG: spinlock bad magic on CPU#1, systemd/1
[3.225653] lock: cgroup_sk_update_lock+0x0/0x60, .magic: , .owner: systemd/1, .owner_cpu: 1
[3.227034] CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #195
[3.227862] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[3.228906] 834a2160 88007c043ad0 81551edc 88007c028000
[3.229512] 88007c043af0 81136868 834a2160 88007aff5940
[3.230105] 88007c043b08 81136b05 834a2160 88007c043b20
[3.230716] Call Trace:
[3.230906] [] dump_stack+0x4e/0x82
[3.231289] [] spin_dump+0x78/0xc0
[3.231642] [] do_raw_spin_unlock+0x75/0xd0
[3.232039] [] _raw_spin_unlock+0x27/0x50
[3.232431] [] update_classid_sock+0x68/0x80
[3.232836] [] iterate_fd+0x71/0x150
[3.233197] [] update_classid+0x47/0x80
[3.233571] [] cgrp_attach+0x14/0x20
[3.233929] [] cgroup_taskset_migrate+0x1e1/0x330
[3.234366] [] cgroup_migrate+0xf5/0x190
[3.234747] [] ? cgroup_migrate+0x5/0x190
[3.235130] [] cgroup_attach_task+0x176/0x200
[3.235543] [] ? cgroup_attach_task+0x5/0x200
[3.235953] [] __cgroup_procs_write+0x2ad/0x460
[3.236377] [] ? __cgroup_procs_write+0x5e/0x460
[3.236805] [] cgroup_procs_write+0x14/0x20
[3.237205] [] cgroup_file_write+0x35/0x1c0
[3.237600] [] kernfs_fop_write+0x141/0x190
[3.237998] [] __vfs_write+0x28/0xe0
[3.238361] [] ? percpu_down_read+0x57/0xa0
[3.238761] [] ? __sb_start_write+0xb4/0xf0
[3.239154] [] ? __sb_start_write+0xb4/0xf0
[3.239554] [] vfs_write+0xac/0x1a0
[3.239930] [] ? __fget_light+0x66/0x90
[3.240308] [] SyS_write+0x49/0xb0
[3.240656] [] entry_SYSCALL_64_fastpath+0x12/0x76

I am using a Fedora 23 host with systemd.unified_cgroup_hierarchy=1.
The config is available here:

http://monom.org/cgroup/config-review-xt_cgroup2

It is probably complete rubbish, since it is my random test config.

cheers,
daniel
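For readers who have not met the "spinlock bad magic" message before: with spinlock debugging enabled, the kernel keeps a magic value inside each lock and complains when the value it finds does not match, which usually points at a lock that was never initialised or at overwritten memory. The userspace sketch below only illustrates that kind of check. `dbg_lock`, `DBG_LOCK_MAGIC` and the helper names are hypothetical and are not the kernel's implementation, and the sketch makes no claim about the actual cause of the report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical magic value for this sketch; not the kernel's constant. */
#define DBG_LOCK_MAGIC 0x5ca1ab1eU

struct dbg_lock {
	uint32_t magic;	/* written once by dbg_lock_init() */
	bool locked;
};

/* Initialise the lock; skipping this call is what the check detects. */
static void dbg_lock_init(struct dbg_lock *l)
{
	assert(l != NULL);
	l->magic = DBG_LOCK_MAGIC;
	l->locked = false;
}

/* Report a lock whose magic field does not hold the expected value. */
static bool dbg_lock_magic_ok(const struct dbg_lock *l, const char *where)
{
	if (l->magic == DBG_LOCK_MAGIC)
		return true;
	fprintf(stderr, "BUG: lock bad magic in %s, .magic: %08x\n",
		where, (unsigned int)l->magic);
	return false;
}

/* Single-threaded stand-in for taking the lock. */
static bool dbg_lock_acquire(struct dbg_lock *l)
{
	if (!dbg_lock_magic_ok(l, "dbg_lock_acquire"))
		return false;
	l->locked = true;
	return true;
}

/* Single-threaded stand-in for releasing the lock. */
static bool dbg_lock_release(struct dbg_lock *l)
{
	if (!dbg_lock_magic_ok(l, "dbg_lock_release"))
		return false;
	l->locked = false;
	return true;
}
```

A lock that went through `dbg_lock_init()` passes both checks, while a zero-filled `struct dbg_lock` that was never initialised is rejected on first use.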
Re: [PATCH 6/7] sock, cgroup: add sock->sk_cgroup
Hi Tejun,

On 11/19/2015 07:52 PM, Tejun Heo wrote:
> +/*
> + * There's a theoretical window where the following accessors race with
> + * updaters and return part of the previous pointer as the prioidx or
> + * classid. Such races are short-lived and the result isn't critical.
> + */
>  static inline u16 sock_cgroup_prioidx(struct sock_cgroup_data *skcd)
>  {
> -	return skcd->prioidx;
> +	return (skcd->is_data & 1) ? skcd->prioidx : 1;
>  }
>
>  static inline u32 sock_cgroup_classid(struct sock_cgroup_data *skcd)
>  {
> -	return skcd->classid;
> +	return (skcd->is_data & 1) ? skcd->classid : 0;
>  }

I am still trying to understand what the code does, hence this stupid
question: why does sock_cgroup_prioidx() return 1 when the is_data bit
is not set, while sock_cgroup_classid() returns 0 in that case?

thanks,
daniel
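To make the question concrete, here is a minimal userspace sketch of the pattern the quoted accessors appear to rely on: one 64-bit word that holds either an aligned pointer (bit 0 clear) or packed prioidx/classid data (bit 0 set). The type `skcd_sketch`, the bit positions and all helper names are assumptions made for illustration and are not the layout from the patch. The fallback values 1 and 0 are copied from the quoted hunk; this excerpt of the thread does not say why they were chosen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock_cgroup_data: a single tagged word. */
struct skcd_sketch {
	uint64_t val;
};

/* Assumed packing: bit 0 = is_data, bits 16..31 = prioidx, bits 32..63 = classid. */
static void sketch_set_data(struct skcd_sketch *s, uint16_t prioidx,
			    uint32_t classid)
{
	s->val = 1ULL | ((uint64_t)prioidx << 16) | ((uint64_t)classid << 32);
}

/* Store a pointer instead; alignment guarantees that bit 0 stays clear. */
static void sketch_set_ptr(struct skcd_sketch *s, const void *p)
{
	assert(((uintptr_t)p & 1) == 0);
	s->val = (uint64_t)(uintptr_t)p;
}

static int sketch_is_data(const struct skcd_sketch *s)
{
	return (int)(s->val & 1);
}

/* Same shape as the quoted accessor: fall back to 1 when the word is a pointer. */
static uint16_t sketch_prioidx(const struct skcd_sketch *s)
{
	return sketch_is_data(s) ? (uint16_t)(s->val >> 16) : 1;
}

/* Same shape as the quoted accessor: fall back to 0 when the word is a pointer. */
static uint32_t sketch_classid(const struct skcd_sketch *s)
{
	return sketch_is_data(s) ? (uint32_t)(s->val >> 32) : 0;
}
```

While the word holds a pointer, its upper bits are pointer bits rather than a prioidx or classid, so the accessors must not interpret them and return a fixed default instead.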
Re: [PATCH 4/7] netprio_cgroup: limit the maximum css->id to USHRT_MAX
On 11/19/2015 07:52 PM, Tejun Heo wrote:
> netprio builds per-netdev contiguous priomap array which is indexed by
> css->id. The array is allocated using kzalloc() effectively limiting
> the maximum ID supported to some thousand range. This patch caps the
> maximum supported css->id to USHRT_MAX which should be way above what
> is actually useable.
>
> This allows reducing sock->sk_cgrp_prioidx to u16 from u32. The freed
> up part will be used to overload the cgroup related fields.
> sock->sk_cgrp_prioidx's position is swapped with sk_mark so that the
> two cgroup related fields are adjacent.
>
> Signed-off-by: Tejun Heo
> Cc: Daniel Borkmann
> Cc: Daniel Wagner

Acked-by: Daniel Wagner
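The relationship between the ID cap and the u16 field can be sketched in userspace as follows. This is only an illustration under assumptions: `id_alloc`, `priomap_sketch` and the helpers are hypothetical names, and the kernel's real allocator and priomap structure are not reproduced here. The point is that once IDs never exceed USHRT_MAX, a 16-bit index is enough, and a fully populated table of 32-bit priorities tops out at 65536 * 4 bytes = 256 KiB per device.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator handing out ids up to USHRT_MAX, then failing. */
struct id_alloc {
	uint32_t next;	/* next id to hand out */
};

static int id_alloc_get(struct id_alloc *a)
{
	assert(a != NULL);
	if (a->next > USHRT_MAX)
		return -1;	/* id space exhausted */
	return (int)a->next++;
}

/* Hypothetical per-device table: one priority per id, indexed by the id. */
struct priomap_sketch {
	uint32_t len;		/* number of entries in prio[] */
	uint32_t prio[];	/* contiguous, zero-initialised */
};

static struct priomap_sketch *priomap_new(uint16_t max_id)
{
	size_t n = (size_t)max_id + 1;
	struct priomap_sketch *m = calloc(1, sizeof(*m) + n * sizeof(m->prio[0]));

	if (m)
		m->len = (uint32_t)n;
	return m;
}

/* Ids beyond the table simply read as priority 0 in this sketch. */
static uint32_t priomap_get(const struct priomap_sketch *m, uint16_t id)
{
	return id < m->len ? m->prio[id] : 0;
}
```

Because the lookup key is a `uint16_t`, no caller can ask for an index beyond what the capped allocator is able to produce.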
Re: [PATCH v2] bpf: BPF based latency tracing
On 06/20/2015 10:14 AM, Daniel Borkmann wrote:
> I think it would be useful to perhaps have two options:
>
> 1) User specifies a specific CPU and gets one such an output above.

Good point. Will do.

> 2) Summary view, i.e. to have the samples of each CPU for comparison
>    next to each other in columns and maybe the histogram view a bit
>    more compressed (perhaps summary of all CPUs).

I agree, the current view is not really optimal. I'll look into this
as well. Alexei indicated that he is working on per-cpu variable
support. I think that would be extremely useful for dropping the
hard-coded CPU limit and turning this sample code into more generic
code.

> Anyway, it's sample code people can go with and modify individually.

I am interested in turning this code into a more useful tool, though I
think I am missing some background on why this code is kept as a
sample. Obviously, there is the API and ARCH dependency. As long as an
API change can be reliably detected, I don't see a real show stopper.
Maybe I am too naive. Furthermore, I would expect that
trace_preempt_[on|off] won't change that often.

> Acked-by: Daniel Borkmann

Thanks,
Daniel
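The two views discussed above can be sketched in plain userspace C. The posted sample defines MAX_ENTRIES as 20 and MAX_CPU as 4, and its output uses power-of-two buckets (1 -> 1, 2 -> 3, ... 524288 -> 1048575). Everything else below is an assumption for illustration, not the code from the patch: the function names, the clamping of large values into the last bucket, and the way the summary is computed.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ENTRIES	20	/* power-of-two buckets, as in the posted sample */
#define MAX_CPU		4	/* the hard-coded CPU limit mentioned above */

/* One histogram per CPU; static storage starts out zeroed. */
static uint64_t hist[MAX_CPU][MAX_ENTRIES];

/* floor(log2(v)), clamped to the last bucket; 0 and 1 land in bucket 0. */
static unsigned int log2_bucket(uint64_t v)
{
	unsigned int b = 0;

	while (v > 1) {
		v >>= 1;
		b++;
	}
	return b < MAX_ENTRIES ? b : MAX_ENTRIES - 1;
}

/* Record one latency sample; CPUs beyond the fixed limit are rejected. */
static int hist_record(unsigned int cpu, uint64_t latency)
{
	if (cpu >= MAX_CPU)
		return -1;
	hist[cpu][log2_bucket(latency)]++;
	return 0;
}

/* Option 1: the count of a single bucket on one user-selected CPU. */
static uint64_t hist_cpu_bucket(unsigned int cpu, unsigned int bucket)
{
	assert(cpu < MAX_CPU && bucket < MAX_ENTRIES);
	return hist[cpu][bucket];
}

/* Option 2: a summary that adds the same bucket up over all CPUs. */
static uint64_t hist_sum_bucket(unsigned int bucket)
{
	uint64_t sum = 0;

	assert(bucket < MAX_ENTRIES);
	for (unsigned int cpu = 0; cpu < MAX_CPU; cpu++)
		sum += hist[cpu][bucket];
	return sum;
}
```

The `cpu >= MAX_CPU` check is where the fixed limit bites: samples from a fifth CPU are simply dropped, which is the restriction per-cpu map support would remove.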
[PATCH v2] bpf: BPF based latency tracing
    [...] -> 32767   : 178      ||
   32768 -> 65535    : 59       ||
   65536 -> 131071   : 2        ||
  131072 -> 262143   : 0        ||
  262144 -> 524287   : 1        ||
  524288 -> 1048575  : 174      ||
CPU 3
      latency        : count     distribution
       1 -> 1        : 0        ||
       2 -> 3        : 0        ||
       4 -> 7        : 0        ||
       8 -> 15       : 0        ||
      16 -> 31       : 0        ||
      32 -> 63       : 0        ||
      64 -> 127      : 0        ||
     128 -> 255      : 0        ||
     256 -> 511      : 0        ||
     512 -> 1023     : 0        ||
    1024 -> 2047     : 0        ||
    2048 -> 4095     : 29626    |*** |
    4096 -> 8191     : 2704     |** |
    8192 -> 16383    : 1090     ||
   16384 -> 32767    : 160      ||
   32768 -> 65535    : 72       ||
   65536 -> 131071   : 32       ||
  131072 -> 262143   : 26       ||
  262144 -> 524287   : 12       ||
  524288 -> 1048575  : 298      ||

All this is based on the tracex3 examples written by
Alexei Starovoitov.

Signed-off-by: Daniel Wagner
Cc: Alexei Starovoitov
Cc: Alexei Starovoitov
Cc: "David S. Miller"
Cc: Daniel Borkmann
Cc: Ingo Molnar
Cc: linux-ker...@vger.kernel.org
Cc: netdev@vger.kernel.org
---

Hi Alexei,

Something broke in my toolchain and it took me a while to figure out
what's going on. With the rebase on net-next no additional patches are
needed and this thing here runs fine.

This time with code...

cheers,
daniel

changes
v2:
 - forgot to do a git add...
v1:
 - rebases on net-next
v0:
 - renamed to lathist since there is no direct hw latency involved
 - use arrays instead of hash tables

 samples/bpf/Makefile       | 4 ++
 samples/bpf/lathist_kern.c | 99 +++
 samples/bpf/lathist_user.c | 103 +
 3 files changed, 206 insertions(+)
 create mode 100644 samples/bpf/lathist_kern.c
 create mode 100644 samples/bpf/lathist_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 46c6a8c..4450fed 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -12,6 +12,7 @@ hostprogs-y += tracex2
 hostprogs-y += tracex3
 hostprogs-y += tracex4
 hostprogs-y += tracex5
+hostprogs-y += lathist
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -24,6 +25,7 @@ tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
 tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
 tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
 tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
+lathist-objs := bpf_load.o libbpf.o lathist_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -36,6 +38,7 @@ always += tracex3_kern.o
 always += tracex4_kern.o
 always += tracex5_kern.o
 always += tcbpf1_kern.o
+always += lathist_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -48,6 +51,7 @@ HOSTLOADLIBES_tracex2 += -lelf
 HOSTLOADLIBES_tracex3 += -lelf
 HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
+HOSTLOADLIBES_lathist += -lelf
 
 # point this to your LLVM backend with bpf support
 LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
diff --git a/samples/bpf/lathist_kern.c b/samples/bpf/lathist_kern.c
new file mode 100644
index 000..18fa088
--- /dev/null
+++ b/samples/bpf/lathist_kern.c
@@ -0,0 +1,99 @@
+/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
+ * Copyright (c) 2015 BMW Car IT GmbH
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include
+#include
+#include
+#include "bpf_helpers.h"
+
+#define MAX_ENTRIES	20
+#define MAX_CPU		4
+
+/* We need to stick to static allocated memory (an array instead of
+ * hash table) because managing dynamic mem
[PATCH v1] bpf: BPF based latency tracing
    [...] -> 32767   : 178      ||
   32768 -> 65535    : 59       ||
   65536 -> 131071   : 2        ||
  131072 -> 262143   : 0        ||
  262144 -> 524287   : 1        ||
  524288 -> 1048575  : 174      ||
CPU 3
      latency        : count     distribution
       1 -> 1        : 0        ||
       2 -> 3        : 0        ||
       4 -> 7        : 0        ||
       8 -> 15       : 0        ||
      16 -> 31       : 0        ||
      32 -> 63       : 0        ||
      64 -> 127      : 0        ||
     128 -> 255      : 0        ||
     256 -> 511      : 0        ||
     512 -> 1023     : 0        ||
    1024 -> 2047     : 0        ||
    2048 -> 4095     : 29626    |*** |
    4096 -> 8191     : 2704     |** |
    8192 -> 16383    : 1090     ||
   16384 -> 32767    : 160      ||
   32768 -> 65535    : 72       ||
   65536 -> 131071   : 32       ||
  131072 -> 262143   : 26       ||
  262144 -> 524287   : 12       ||
  524288 -> 1048575  : 298      ||

All this is based on the tracex3 examples written by
Alexei Starovoitov.

Signed-off-by: Daniel Wagner
Cc: Alexei Starovoitov
Cc: "David S. Miller"
Cc: Daniel Borkmann
Cc: Ingo Molnar
Cc: linux-ker...@vger.kernel.org
Cc: netdev@vger.kernel.org
---

Hi Alexei,

Something broke in my toolchain and it took me a while to figure out
what's going on. With the rebase on net-next no additional patches are
needed and this thing here runs fine.

cheers,
daniel

changes
v1:
 - rebases on net-next
v0:
 - renamed to lathist since there is no direct hw latency involved
 - use arrays instead of hash tables

 samples/bpf/Makefile | 4
 1 file changed, 4 insertions(+)

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 46c6a8c..4450fed 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -12,6 +12,7 @@ hostprogs-y += tracex2
 hostprogs-y += tracex3
 hostprogs-y += tracex4
 hostprogs-y += tracex5
+hostprogs-y += lathist
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -24,6 +25,7 @@ tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
 tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
 tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
 tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
+lathist-objs := bpf_load.o libbpf.o lathist_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -36,6 +38,7 @@ always += tracex3_kern.o
 always += tracex4_kern.o
 always += tracex5_kern.o
 always += tcbpf1_kern.o
+always += lathist_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -48,6 +51,7 @@ HOSTLOADLIBES_tracex2 += -lelf
 HOSTLOADLIBES_tracex3 += -lelf
 HOSTLOADLIBES_tracex4 += -lelf -lrt
 HOSTLOADLIBES_tracex5 += -lelf
+HOSTLOADLIBES_lathist += -lelf
 
 # point this to your LLVM backend with bpf support
 LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
--
2.1.0