Re: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h
On Fri, 18 Feb 2022 06:12:37 -0600 Segher Boessenkool wrote: > On Fri, Feb 18, 2022 at 10:35:48AM +0900, Masahiro Yamada wrote: > > On Fri, Feb 18, 2022 at 3:10 AM Segher Boessenkool > > wrote: > > > On Fri, Feb 18, 2022 at 02:27:16AM +0900, Masahiro Yamada wrote: > > > > On Fri, Feb 18, 2022 at 1:49 AM David Laight > > > > wrote: > > > > > That description is largely fine. > > > > > > > > > > Inappropriate 'inline' ought to be removed. > > > > > Then 'inline' means - 'really do inline this'. > > > > > > > > You cannot change "static inline" to "static" > > > > in header files. > > > > > > Why not? Those two have identical semantics! > > > > e.g.) > > > > > > [1] Open include/linux/device.h with your favorite editor, > > then edit > > > > static inline void *devm_kcalloc(struct device *dev, > > > > to > > > > static void *devm_kcalloc(struct device *dev, > > > > > > [2] Build the kernel > > You get some "defined but not used" warnings that are shushed for > inlines. Do you see something else? > > The semantics are the same. Warnings are just warnings. It builds > fine. Kernel code should build with zero warnings, the compiler is telling you something.
Re: [PATCH net-next] ibmvnic: Increase driver logging
On Thu, 16 Jul 2020 13:22:00 -0700 Jakub Kicinski wrote: > On Thu, 16 Jul 2020 18:07:37 +0200 Michal Suchánek wrote: > > On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote: > > > On 7/15/20 8:29 PM, David Miller wrote: > > > > From: Jakub Kicinski > > > > Date: Wed, 15 Jul 2020 17:06:32 -0700 > > > > > > > > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote: > > > > > > free_netdev(netdev); > > > > > > dev_set_drvdata(&dev->dev, NULL); > > > > > > + netdev_info(netdev, "VNIC client device has been successfully > > > > > > removed.\n"); > > > > > A step too far, perhaps. > > > > > > > > > > In general this patch looks a little questionable IMHO, this amount of > > > > > logging output is not commonly seen in drivers. All the info > > > > > messages are just static text, not even carrying any extra > > > > > information. > > > > > In an era of ftrace, and bpftrace, do we really need this? > > > > Agreed, this is too much. This is debugging, and thus suitable for > > > > tracing > > > > facilities, at best. > > > > > > Thanks for your feedback. I see now that I was overly aggressive with this > > > patch to be sure, but it would help with narrowing down problems at a > > > first > > > glance, should they arise. The driver in its current state logs very > > > little > > > of what it is doing without the use of additional debugging or tracing > > > facilities. Would it be worth it to pursue a less aggressive version or > > > would that be dead on arrival? What are acceptable driver operations to > > > log > > > at this level? > > Sadly it's much more of an art than hard science. Most networking > drivers will print identifying information when they probe the device > and then only about major config changes or when link comes up or goes > down. And obviously when anything unexpected, like an error, happens, > that's key. > > You seem to be adding start / end information for each driver init / > deinit stage.
I'd say try to focus on the actual errors you're trying > to catch. > > > Also would it be advisable to add the messages as pr_dbg to be enabled on > > demand? > > I personally have had a pretty poor experience with pr_debug() because > CONFIG_DYNAMIC_DEBUG is not always enabled. Since you're just printing > static text there shouldn't be much difference between pr_debug and > ftrace and/or bpftrace, honestly. > > Again, slightly hard to advise not knowing what you're trying to catch. Linux drivers in general are far too noisy. In production it is not uncommon to set the kernel to suppress all info messages.
RE: [PATCH V3 1/10] X86/Hyper-V: Add parameter offset for hyperv_fill_flush_guest_mapping_list()
int hyperv_fill_flush_guest_mapping_list( struct hv_guest_mapping_flush_list *flush, - u64 start_gfn, u64 pages) + int offset, u64 start_gfn, u64 pages) { u64 cur = start_gfn; u64 additional_pages; - int gpa_n = 0; + int gpa_n = offset; do { /* Do you mean to support negative offsets here? Maybe unsigned would be better?
Re: [PATCH net-next 17/22] hv_netvsc: fix return type of ndo_start_xmit function
On Thu, 20 Sep 2018 20:33:01 +0800 YueHaibing wrote: > The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', > which is a typedef for an enum type, so make sure the implementation in > this driver has returns 'netdev_tx_t' value, and change the function > return type to netdev_tx_t. > > Found by coccinelle. > > Signed-off-by: YueHaibing > --- > drivers/net/hyperv/netvsc_drv.c | 10 +++--- > 1 file changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c > index 3af6d8d..056c472 100644 > --- a/drivers/net/hyperv/netvsc_drv.c > +++ b/drivers/net/hyperv/netvsc_drv.c > @@ -511,7 +511,8 @@ static int netvsc_vf_xmit(struct net_device *net, struct > net_device *vf_netdev, > return rc; > } > > -static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) > +static netdev_tx_t > +netvsc_start_xmit(struct sk_buff *skb, struct net_device *net) > { > struct net_device_context *net_device_ctx = netdev_priv(net); > struct hv_netvsc_packet *packet = NULL; > @@ -528,8 +529,11 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct > net_device *net) >*/ > vf_netdev = rcu_dereference_bh(net_device_ctx->vf_netdev); > if (vf_netdev && netif_running(vf_netdev) && > - !netpoll_tx_running(net)) > - return netvsc_vf_xmit(net, vf_netdev, skb); > + !netpoll_tx_running(net)) { > + ret = netvsc_vf_xmit(net, vf_netdev, skb); > + if (ret) > + return NETDEV_TX_BUSY; > + } Sorry, the new code is wrong. It will fall through if ret == 0 (NETDEV_TX_OK) Please review and test your patches.
Re: [PATCH net 1/5] ibmvnic: harden interrupt handler
On Wed, 25 Jan 2017 15:02:19 -0600 Thomas Falcon wrote: > static irqreturn_t ibmvnic_interrupt(int irq, void *instance) > { > struct ibmvnic_adapter *adapter = instance; > + unsigned long flags; > + > + spin_lock_irqsave(&adapter->crq.lock, flags); > + vio_disable_interrupts(adapter->vdev); > + tasklet_schedule(&adapter->tasklet); > + spin_unlock_irqrestore(&adapter->crq.lock, flags); > + return IRQ_HANDLED; > +} > + Why not use NAPI rather than a tasklet?
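For reference, a rough sketch of the NAPI shape being suggested — field and function names here are illustrative, not the actual ibmvnic code. The hard irq only disables the interrupt source and schedules the poller; the real work moves into the driver's ->poll() callback, which re-enables interrupts once it runs out of work:

```
static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
{
	struct ibmvnic_adapter *adapter = instance;

	vio_disable_interrupts(adapter->vdev);
	napi_schedule(&adapter->napi);
	return IRQ_HANDLED;
}

static int ibmvnic_poll(struct napi_struct *napi, int budget)
{
	struct ibmvnic_adapter *adapter =
		container_of(napi, struct ibmvnic_adapter, napi);
	int done = 0;

	/* ... process up to 'budget' completions, counting them in 'done' ... */
	if (done < budget) {
		napi_complete(napi);
		vio_enable_interrupts(adapter->vdev);
	}
	return done;
}
```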
RE: [PATCH 1/2] PCI: Add pci_set_vpd_timeout() to set VPD access timeout
I had old Marvell Sky2 hardware that had slow flash. -Original Message- From: Andrew Donnellan [mailto:andrew.donnel...@au1.ibm.com] Sent: Monday, November 21, 2016 4:16 PM To: Bjorn Helgaas <helg...@kernel.org>; Matthew R. Ochs <mro...@linux.vnet.ibm.com> Cc: linux-...@vger.kernel.org; Frederic Barrat <fbar...@linux.vnet.ibm.com>; Uma Krishnan <ukri...@linux.vnet.ibm.com>; Ian Munsie <imun...@au1.ibm.com>; Bjorn Helgaas <bhelg...@google.com>; linuxppc-dev@lists.ozlabs.org; Stephen Hemminger <sthem...@microsoft.com> Subject: Re: [PATCH 1/2] PCI: Add pci_set_vpd_timeout() to set VPD access timeout On 22/11/16 09:05, Bjorn Helgaas wrote: > Hi Matthew, > > On Mon, Nov 21, 2016 at 03:09:49PM -0600, Matthew R. Ochs wrote: >> The PCI core uses a fixed 50ms timeout when waiting for VPD accesses >> to complete. When an access does not complete within this period, a >> warning is logged and an error returned to the caller. >> >> While this default timeout is valid for most hardware, some devices >> can experience longer access delays under certain circumstances. For >> example, one of the IBM CXL Flash devices can take up to ~120ms in a >> worst-case scenario. These types of devices can benefit from an >> extended timeout that is specific to their hardware constraints. >> >> To support per-device VPD access timeouts, pci_set_vpd_timeout() is >> added as an exported service. PCI devices will continue to default >> with the 50ms timeout and use a per-device timeout when a driver calls this >> new service. > > Can you include a pointer to something in the spec that's behind the > default 50ms timeout, or did somebody just pull that number out of the > air? It looks like Stephen Hemminger added the 50ms timeout in 1120f8b8169f, which seems to indicate that 50ms was chosen because it's longer than the 13ms per word that was measured on one device. -- Andrew Donnellan OzLabs, ADL Canberra andrew.donnel...@au1.ibm.com IBM Australia Limited
Re: [v3, 2/9] fsl/fman: Add the FMan port FLIB
On Wed, 22 Jul 2015 14:21:48 +0300 igal.liber...@freescale.com wrote: diff --git a/drivers/net/ethernet/freescale/fman/Kconfig b/drivers/net/ethernet/freescale/fman/Kconfig index 8aeae29..af42c3a 100644 --- a/drivers/net/ethernet/freescale/fman/Kconfig +++ b/drivers/net/ethernet/freescale/fman/Kconfig @@ -5,3 +5,4 @@ config FSL_FMAN help Freescale Data-Path Acceleration Architecture Frame Manager (FMan) support + Bogus blank line introduced at end of file? Why was this not in patch 1? ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH net-next] af_unix: fix a fatal race with bit fields
On Tue, 30 Apr 2013 22:04:32 -0700 Eric Dumazet eric.duma...@gmail.com wrote: On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote: On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote: li 11,1 ld 0,0(9) rldimi 0,11,31,32 std 0,0(9) blr .ident GCC: (GNU) 4.6.3 You can see ld 0,0(9) is used: it's a 64-bit load. Yup. This is not a powerpc64-specific problem. See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080 Fixed in 4.8.0 and 4.7.3. Ah, thanks. This seems a pretty serious issue; is it documented somewhere that ppc64, sparc64 and others need such a compiler version? These kinds of errors are pretty hard to find; it's a pity to spend time on them. There is a checkbin target inside arch/powerpc/Makefile. Shouldn't a check be added there to block building the kernel with known-bad GCC versions?
Re: [PATCH v5 01/45] percpu_rwlock: Introduce the global reader-writer lock backend
On Tue, 22 Jan 2013 13:03:22 +0530 Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote: A straight-forward (and obvious) algorithm to implement Per-CPU Reader-Writer locks can also lead to too many deadlock possibilities which can make it very hard/impossible to use. This is explained in the example below, which helps justify the need for a different algorithm to implement flexible Per-CPU Reader-Writer locks. We can use global rwlocks as shown below safely, without fear of deadlocks:

Readers:

         CPU 0                          CPU 1
         -----                          -----
1.  spin_lock(random_lock);        read_lock(my_rwlock);
2.  read_lock(my_rwlock);          spin_lock(random_lock);

Writer:

         CPU 2:
         ------
    write_lock(my_rwlock);

We can observe that there is no possibility of deadlocks or circular locking dependencies here. It's perfectly safe. Now consider a blind/straight-forward conversion of global rwlocks to per-CPU rwlocks like this: The reader locks its own per-CPU rwlock for read, and proceeds. Something like: read_lock(per-cpu rwlock of this cpu); The writer acquires all per-CPU rwlocks for write and only then proceeds. Something like:

    for_each_online_cpu(cpu)
        write_lock(per-cpu rwlock of 'cpu');

Now let's say that for performance reasons, the above scenario (which was perfectly safe when using global rwlocks) was converted to use per-CPU rwlocks.

         CPU 0                          CPU 1
         -----                          -----
1.  spin_lock(random_lock);        read_lock(my_rwlock of CPU 1);
2.  read_lock(my_rwlock of CPU 0); spin_lock(random_lock);

Writer:

         CPU 2:
         ------
    for_each_online_cpu(cpu)
        write_lock(my_rwlock of 'cpu');

Consider what happens if the writer begins his operation in between steps 1 and 2 at the reader side. It becomes evident that we end up in a (previously non-existent) deadlock due to a circular locking dependency between the 3 entities, like this:

  (holds random_lock)                 (holds my_rwlock of CPU 0 for write)
        CPU 0 --- waiting for my_rwlock of CPU 0 --->  CPU 2
          ^                                              |
          |                                              |
     waiting for                                    waiting for
     random_lock                                 my_rwlock of CPU 1
          |                                              |
          +------------------- CPU 1 <-------------------+
                 (holds my_rwlock of CPU 1 for read)

So obviously this straight-forward way of implementing percpu rwlocks is deadlock-prone. One simple measure for (or characteristic of) a safe percpu rwlock should be that if a user replaces global rwlocks with per-CPU rwlocks (for performance reasons), he shouldn't suddenly end up in numerous deadlock possibilities which never existed before. The replacement should continue to remain safe, and perhaps improve the performance. Observing the robustness of global rwlocks in providing a fair amount of deadlock safety, we implement per-CPU rwlocks as nothing but global rwlocks, as a first step. Cc: David Howells dhowe...@redhat.com Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

We got rid of brlock years ago, do we have to reintroduce it like this? The problem was that brlock caused starvation.
Re: [PATCH] [V3] net: add Xilinx emac lite device driver
On Thu, 20 Aug 2009 03:49:51 -0600 John Linn john.l...@xilinx.com wrote:

+/**
+ * xemaclite_ioctl - Perform IO Control operations on the network device
+ * @dev: Pointer to the network device
+ * @rq: Pointer to the interface request structure
+ * @cmd: IOCTL command
+ *
+ * The only IOCTL operation supported by this function is setting the MAC
+ * address. An error is reported if any other operations are requested.
+ *
+ * Return: 0 to indicate success, or a negative error for failure.
+ */
+static int xemaclite_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
+{
+	struct net_local *lp = (struct net_local *) netdev_priv(dev);
+	struct hw_addr_data *hw_addr = (struct hw_addr_data *) &rq->ifr_hwaddr;
+
+	switch (cmd) {
+	case SIOCETHTOOL:
+		return -EIO;
+
+	case SIOCSIFHWADDR:
+		dev_err(&lp->ndev->dev, "SIOCSIFHWADDR\n");
+
+		/* Copy MAC address in from user space */
+		copy_from_user((void __force *) dev->dev_addr,
+			       (void __user __force *) hw_addr,
+			       IFHWADDRLEN);
+		xemaclite_set_mac_address(lp, dev->dev_addr);
+		break;
+	default:
+		return -EOPNOTSUPP;
+	}
+
+	return 0;
+}

Do you really need this? I doubt the SIOCSIFHWADDR even reaches the driver! The normal call path for setting the hardware address is:

  dev_ifsioc
     dev_set_mac_address
        ops->ndo_set_mac_address

The driver should be: 1. defining new code to do ndo_set_mac_address; 2. removing the existing xemaclite_ioctl - all ioctls are handled by upper layers. FYI - the only ioctls that make it to a network device's ndo_do_ioctl are listed in dev_ifsioc: SIOCDEVPRIVATE ... SIOCDEVPRIVATE + 15, SIOCBOND*, SIOCMII*, SIOCBR*, SIOCHWTSTAMP, SIOCWANDEV
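A sketch of the suggested shape (illustrative, not the driver as submitted): drop the ioctl handler and wire up ndo_set_mac_address instead, so the core's dev_set_mac_address() path does the copy-from-user and validation before calling into the driver:

```
static int xemaclite_set_mac(struct net_device *dev, void *address)
{
	struct net_local *lp = netdev_priv(dev);
	struct sockaddr *addr = address;

	/* core has already copied and validated the address */
	memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
	xemaclite_set_mac_address(lp, dev->dev_addr);
	return 0;
}

static const struct net_device_ops xemaclite_netdev_ops = {
	/* ... open/stop/start_xmit ... */
	.ndo_set_mac_address = xemaclite_set_mac,
};
```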
Re: [PATCH] net: add Xilinx emac lite device driver
On Tue, 18 Aug 2009 09:30:41 -0600 John Linn john.l...@xilinx.com wrote: +/** + * xemaclite_enable_interrupts - Enable the interrupts for the EmacLite device + * @drvdata: Pointer to the Emaclite device private data + * + * This function enables the Tx and Rx interrupts for the Emaclite device along + * with the Global Interrupt Enable. + */ +static void xemaclite_enable_interrupts(struct net_local *drvdata) Docbook format is really not necessary on local functions that are only used in the driver. It is fine if you want to use it, as long as the file isn't processed by kernel make docs, but the docbook is intended for automatic generation of kernel API manuals.
Re: Patch: Fix fec_mpc52xx driver to use net_device_ops
On Tue, 31 Mar 2009 12:44:15 +0200 Henk Stegeman henk.stege...@gmail.com wrote: Fix fec_mpc52xx driver to use net_device_ops and to be careful not to dereference phy_device if a phy has not yet been connected. Signed-off-by: Henk Stegeman henk.stege...@gmail.com diff --git a/drivers/net/fec_mpc52xx.c b/drivers/net/fec_mpc52xx.c index cd8e98b..ca76b95 100644 --- a/drivers/net/fec_mpc52xx.c +++ b/drivers/net/fec_mpc52xx.c @@ -847,24 +847,40 @@ static void mpc52xx_fec_get_drvinfo(struct net_device *dev, static int mpc52xx_fec_get_settings(struct net_device *dev, struct ethtool_cmd *cmd) { struct mpc52xx_fec_priv *priv = netdev_priv(dev); + + if (!priv->phydev) + return -ENODEV; + return phy_ethtool_gset(priv->phydev, cmd); } static int mpc52xx_fec_set_settings(struct net_device *dev, struct ethtool_cmd *cmd) { struct mpc52xx_fec_priv *priv = netdev_priv(dev); + + if (!priv->phydev) + return -ENODEV; + return phy_ethtool_sset(priv->phydev, cmd); } static u32 mpc52xx_fec_get_msglevel(struct net_device *dev) { struct mpc52xx_fec_priv *priv = netdev_priv(dev); + + if (!priv->phydev) + return 0; + return priv->msg_enable; } static void mpc52xx_fec_set_msglevel(struct net_device *dev, u32 level) { struct mpc52xx_fec_priv *priv = netdev_priv(dev); + + if (!priv->phydev) + return; + priv->msg_enable = level; } @@ -882,12 +898,31 @@ static int mpc52xx_fec_ioctl(struct net_device *dev, struct ifreq *rq, int cmd) { struct mpc52xx_fec_priv *priv = netdev_priv(dev); + if (!priv->phydev) + return -ENODEV; + return mpc52xx_fec_phy_mii_ioctl(priv, if_mii(rq), cmd); } /* */ /* OF Driver */ /* */ +static const struct net_device_ops mpc52xx_fec_netdev_ops = { + .ndo_open = mpc52xx_fec_open, + .ndo_stop = mpc52xx_fec_close, + .ndo_start_xmit = mpc52xx_fec_hard_start_xmit, + .ndo_tx_timeout = mpc52xx_fec_tx_timeout, + .ndo_get_stats = mpc52xx_fec_get_stats, + .ndo_set_multicast_list = mpc52xx_fec_set_multicast_list, + .ndo_validate_addr = eth_validate_addr, + .ndo_set_mac_address = mpc52xx_fec_set_mac_address, + .ndo_do_ioctl = mpc52xx_fec_ioctl, What about change_mtu? Don't you want: .ndo_change_mtu = eth_change_mtu,
Re: net_device_ops support in bridging and fec_mpc52xx.c
On Wed, 18 Feb 2009 13:48:52 -0800 (PST) David Miller da...@davemloft.net wrote: From: Henk Stegeman henk.stege...@gmail.com Date: Wed, 18 Feb 2009 11:41:14 +0100 Please CC: netdev, now added, on all networking reports and patches. Thank you. I discovered the hard way that because linux bridging uses net_device_ops, bridging only works with network drivers that publish their device operations through net_device_ops. In my case running: brctl addif br0 eth0 (where eth0 fec_mpc52xx.c did not yet support net_device_ops) gave me a: Unable to handle kernel paging request... After changing fec_mpc52xx.c to support net_device_ops the problem was fixed. If possible, some kind of detection in the bridging software would, I think, be much appreciated for early detection of this problem, as it is pretty hard to relate the error message to a not-yet-updated driver. cheers, Henk The normal register_netdevice stuff takes care of setting up net_device_ops for old style drivers. Was there something different about how this device was being set up?
Re: [PATCH RFC v2] net: add PCINet driver
On Wed, 29 Oct 2008 13:20:27 -0700 Ira Snyder [EMAIL PROTECTED] wrote: This adds support to Linux for a virtual ethernet interface which uses the PCI bus as its transport mechanism. It creates a simple, familiar, and fast method of communication for two devices connected by a PCI interface. I have implemented client support for the Freescale MPC8349EMDS board, which is capable of running in PCI Agent mode (It acts like a PCI card, but is a complete PowerPC computer, running Linux). It is almost certainly trivially ported to any MPC83xx system. It should be a relatively small effort to port it to any chip that can generate PCI interrupts and has at least one PCI accessible scratch register. It was developed to work in a CompactPCI crate of computers, one of which is a relatively standard x86 system (acting as the host) and many PowerPC systems (acting as clients). RFC v1 - RFC v2: * remove vim modelines * use net_device-name in request_irq(), for irqbalance * remove unnecessary wqt_get_stats(), use default get_stats() instead * use dev_printk() and friends * add message unit to MPC8349EMDS dts file Signed-off-by: Ira W. Snyder [EMAIL PROTECTED] --- This is the third posting of this driver. I got some feedback, and have corrected the problems. Stephen, thanks for the review! I also got some off-list feedback from a potential user, so I believe this is worth getting into mainline. I'll post up a revised version about once a week as long as the changes are minor. If they are more substantial, I'll post them as needed. GregKH: is this worth putting into the staging tree? (I left you out of the CC list to keep your email traffic down) The remaining issues I see in this driver: 1) Naming The name wqt originally stood for workqueue-test and somewhat evolved over time into the current driver. I'm looking for suggestions for a better name. It should be the same between the host and client drivers, to make porting the code between them easier. 
The drivers are /very/ similar other than the setup code. 2) IMMR Usage In the Freescale client driver, I use the whole set of board control registers (AKA IMMR registers). I only need a very small subset of them, during startup to set up the DMA window. I used the full set of registers so that I could share the register offsets between the drivers (in pcinet_hw.h) 3) ioremap() of a physical address In the Freescale client driver, I called ioremap() with the physical address of the IMMR registers, 0xe000. I don't know a better way to get them. They are somewhat exposed in the Flat Device Tree. Suggestions on how to extract them are welcome. 4) Hardcoded DMA Window Address In the Freescale client driver, I just hardcoded the address of the outbound PCI window into the DMA transfer code. It is 0x8000. Suggestions on how to get this value at runtime are welcome. Rationale behind some decisions: 1) Usage of the PCINET_NET_REGISTERS_VALID bit I want to be able to use this driver from U-Boot to tftp a kernel over the PCI backplane, and then boot up the board. This means that the device descriptor memory, which lives in the client RAM, becomes invalid during boot. 2) Buffer Descriptors in client memory I chose to put the buffer descriptors in client memory rather than host memory. It seemed more logical to me at the time. In my application, I'll have 19 boards + 1 host per cPCI chassis. The client - host direction will see most of the traffic, and so I thought I would cut down on the number of PCI accesses needed. I'm willing to change this. 3) Usage of client DMA controller for all data transfer This was done purely for speed. I tried using the CPU to transfer all data, and it is very slow: ~3MB/sec. Using the DMA controller gets me to ~40MB/sec (as tested with netperf). 4) Static 1GB DMA window Maintaining a window while DMA's in flight, and then changing it seemed too complicated. 
Also, testing showed that using a static window gave me a ~10MB/sec speedup compared to moving the window for each skb. 5) The serial driver Yes, there are two essentially separate drivers here. I needed a method to communicate with the U-Boot bootloader on these boards without plugging in a serial cable. With 19 clients + 1 host per chassis, the cable clutter is worth avoiding. Since everything is connected via the PCI bus anyway, I used that. A virtual serial port was simple to implement using the messaging unit hardware that I used for the network driver. I'll post both U-Boot drivers to their mailing list once this driver is finalized. Thanks, Ira arch/powerpc/boot/dts/mpc834x_mds.dts |7 + drivers/net/Kconfig | 34 + drivers/net/Makefile
Re: [RFC v1] net: add PCINet driver
On Thu, 23 Oct 2008 15:49:32 -0700 Ira Snyder [EMAIL PROTECTED] wrote: This adds support to Linux for a virtual ethernet interface which uses the PCI bus as its transport mechanism. It creates a simple, familiar, and fast method of communication for two devices connected by a PCI interface. I have implemented client support for the Freescale MPC8349EMDS board, which is capable of running in PCI Agent mode (It acts like a PCI card, but is a complete PowerPC computer, running Linux). It is almost certainly trivially ported to any MPC83xx system. It should be a relatively small effort to port it to any chip that can generate PCI interrupts and has at least one PCI accessible scratch register. It was developed to work in a CompactPCI crate of computers, one of which is a relatively standard x86 system (acting as the host) and many PowerPC systems (acting as clients). Signed-off-by: Ira W. Snyder [EMAIL PROTECTED] --- This is my second posting of this driver. I posted it to the linux-netdev list a week ago, but did not get any replies. Therefore, I'll post it for a wider audience. :) Hello everyone. This is my first network driver, so take it easy on me. :) I'm quite sure it isn't ready for inclusion into mainline yet, but I think it is in good enough shape for some review. Through conversations on IRC, I have been led to believe that this has been done before, but no implementations have been made public. My employer has no problems with me making this public, so I thought it would be good to post it. I don't know if something like this is even desired for mainline inclusion, but I do know that even having an example driver to base this on would have saved me some effort. The major issues I see: 1) The name wqt originally stood for workqueue-test and somewhat evolved into this driver. I'm looking for suggestions. I'd like to have something that is the same between the host and client drivers, since most of the code is identical. It makes copy/paste easier. 
The only one I can come up with is bpc for Backplane Communications. 2) In the Freescale client driver, I use the whole set of IMMR (board control) registers. I only need a very small subset of them only during startup. I used them the way I did so I could use the same pcinet_hw.h file for both the client and server drivers. 3) I just ioremap()ed the IMMR registers directly. (They're at 0xe000). I just didn't know a better way to do this. I need them to set up the PCI memory containing the buffer descriptors. 4) I just hardcoded the address of the outbound PCI window into the DMA transfer code. It is at 0x8000. Suggestions are welcome. Why I made some decisions: 1) The PCINET_NET_REGISTERS_VALID bit: I want to be able to use this driver from U-Boot to copy a kernel, etc. over the PCI backplane to boot up the board. This meant that the memory that I used for the buffer descriptors disappears while the machine is booting Linux. Suggestions are welcome. 2) The buffer descriptors in client memory, rather than host memory: I thought it seemed more logical. Also, in my application, the clients will be transferring much more data to the host. 3) Use of the Freescale (client) DMA controller to transfer all data: I tried transferring all of the data using the CPU on each board. This turned out to be extremely slow, as in 2-3 MB/sec max. Using the DMA controller, I get ~40 MB/sec (tested with netperf). 4) Use of a static 1GB window to access the host's memory (to copy skbs): Maintaining the window while DMA's are in flight, and then changing them seemed to be too much trouble. A static window just seemed easier. Also, removing the overhead of moving the window from each skb transferred actually gave a reasonable speedup. (I tested it.) 5) The uart stuff: I needed a method to talk to the U-Boot bootloader on these boards without plugging in a serial cable. When my project gets going, I'll have 150 of them. Booting them one at a time is out of the question. 
A virtual serial port was simple to implement using the same hardware that I used for the network driver. I'll post the U-Boot driver to their mailing list once this driver is finalized. Thanks, Ira drivers/net/Kconfig | 34 ++ drivers/net/Makefile |3 + drivers/net/pcinet.h | 77 +++ drivers/net/pcinet_fsl.c | 1360 +++ drivers/net/pcinet_host.c | 1392 + drivers/net/pcinet_hw.h | 80 +++ 6 files changed, 2946 insertions(+), 0 deletions(-) create mode 100644 drivers/net/pcinet.h create mode 100644 drivers/net/pcinet_fsl.c create mode 100644 drivers/net/pcinet_host.c create mode 100644 drivers/net/pcinet_hw.h diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index 4a11296..9185803 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -2259,6 +2259,40 @@ config
Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll
Please don't use double underscores for this function name. There is no reason not to make it a normal API call. The sky2 fix I am working on will use napi_synchronize as well. --- a/include/linux/netdevice.h 2007-10-16 16:48:20.0 -0700 +++ b/include/linux/netdevice.h 2007-10-17 08:29:55.0 -0700 @@ -407,6 +407,24 @@ static inline void napi_enable(struct na clear_bit(NAPI_STATE_SCHED, &n->state); } +#ifdef CONFIG_SMP +/** + * napi_synchronize - wait until NAPI is not running + * @n: napi context + * + * Wait until NAPI is done being scheduled on this context. + * Any outstanding processing completes but + * does not disable future activations. + */ +static inline void napi_synchronize(const struct napi_struct *n) +{ + while (test_bit(NAPI_STATE_SCHED, &n->state)) + msleep(1); +} +#else +# define napi_synchronize(n) barrier() +#endif + /* * The DEVICE structure. * Actually, this whole structure is a big mistake. It mixes I/O
Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll
On Tue, 16 Oct 2007 15:49:52 +1000 Benjamin Herrenschmidt [EMAIL PROTECTED] wrote: net: Add __napi_sycnhronize() to sync with napi poll The EMAC driver, which needs to handle multiple devices with one NAPI instance, implements its own per-channel disable bit. However, when setting such a bit, it needs to synchronize with the poller (that is, make sure that any pending poller instance has completed, or is started late enough to see that disable bit). This implements a low-level __napi_synchronize() function to achieve that. The underscores are to emphasize the low-level aspect of it and to discourage driver writers who don't know what they are doing from using it (to please DaveM :-) Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED] --- (Use correct address for Stephen this time) If the approach is accepted, I would like to have this merged now so the EMAC patch to make it work again can follow :-) Note: I use msleep_interruptible(1); just like napi_disable(). However I'm not too happy that the hot loop that results from a pending signal here will spin without even a cpu_relax ... what do you guys think would be the best way to handle this? So this is really just like synchronize_irq()? Using msleep is bogus because you want to spin; you are only waiting for a softirq on the other cpu to finish. If you wait for a whole millisecond and sleep, that is far longer than the napi routine should take. You could even optimize it like synchronize_irq() for the non-SMP case. -- Stephen Hemminger [EMAIL PROTECTED]
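For comparison, the synchronize_irq()-style version the reply suggests would look roughly like this (sketch only, not a merged patch): spin with cpu_relax() instead of sleeping, since the wait is only for a softirq on another CPU to finish, and compile away entirely on non-SMP builds.

```
#ifdef CONFIG_SMP
static inline void napi_synchronize(const struct napi_struct *n)
{
	/* The poll routine runs for microseconds at most; busy-wait
	 * instead of sleeping a full millisecond per iteration. */
	while (test_bit(NAPI_STATE_SCHED, &n->state))
		cpu_relax();
}
#else
# define napi_synchronize(n)	barrier()
#endif
```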
Re: [PATCH v4] qe: miscellaneous code improvements and fixes to the QE library
On Wed, 3 Oct 2007 11:34:59 -0500 Timur Tabi [EMAIL PROTECTED] wrote: This patch makes numerous miscellaneous code improvements to the QE library.

1. Remove struct ucc_common and merge ucc_init_guemr() into ucc_set_type() (every caller of ucc_init_guemr() also calls ucc_set_type()). Modify all callers of ucc_set_type() accordingly.
2. Remove the unused enum ucc_pram_initial_offset.
3. Refactor qe_setbrg(), also implement work-around for errata QE_General4.
4. Several printk() calls were missing the terminating \n.
5. Add __iomem where needed, and change u16 to __be16 and u32 to __be32 where appropriate.
6. In ucc_slow_init() the RBASE and TBASE registers in the PRAM were programmed with the wrong value.
7. Add the protocol type to struct us_info and updated ucc_slow_init() to use it, instead of always programming QE_CR_PROTOCOL_UNSPECIFIED.
8. Rename ucc_slow_restart_x() to ucc_slow_restart_tx().
9. Add several macros in qe.h (mostly for slow UCC support, but also to standardize some naming convention) and remove several unused macros.
10. Update ucc_geth.c to use the new macros.
11. Add ucc_slow_info.protocol to specify which QE_CR_PROTOCOL_xxx protocol to use when initializing the UCC in ucc_slow_init().
12. Rename ucc_slow_pram.rfcr to rbmr and ucc_slow_pram.tfcr to tbmr, since these are the real names of the registers.
13. Use the setbits, clrbits, and clrsetbits where appropriate.
14. Refactor ucc_set_qe_mux_rxtx().
15. Remove all instances of 'volatile'.
16. Simplify get_cmxucr_reg().
17. Replace qe_mux.cmxucrX with qe_mux.cmxucr[].
18. Updated struct ucc_geth because struct ucc_fast is not padded any more.

Signed-off-by: Timur Tabi [EMAIL PROTECTED]

Separate the changes into individual patches to allow for better comment/review and bisection in case of regression.
Re: RFC: issues concerning the next NAPI interface
On Fri, 24 Aug 2007 17:47:15 +0200 Jan-Bernd Themann [EMAIL PROTECTED] wrote: Hi, On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote: On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote: ... 3) On modern systems the incoming packets are processed very fast. Especially on SMP systems when we use multiple queues we process only a few packets per napi poll cycle. So NAPI does not work very well here and the interrupt rate is still high. What we need would be some sort of timer polling mode which will schedule a device after a certain amount of time for high load situations. With high precision timers this could work well. Current usual timers are too slow. A finer granularity would be needed to keep the latency down (and queue length moderate). We found the same on ia64-sn systems with tg3 a couple of years ago. Using simple interrupt coalescing (don't interrupt until you've received N packets or M usecs have elapsed) worked reasonably well in practice. If your h/w supports that (and I'd guess it does, since it's such a simple thing), you might try it. I don't see how this should work. Our latest machines are fast enough that they simply empty the queue during the first poll iteration (in most cases). Even if you wait until X packets have been received, it does not help for the next poll cycle. The average number of packets we process per poll queue is low. So a timer would be preferable that periodically polls the queue, without the need of generating a HW interrupt. This would allow us to wait until a reasonable amount of packets have been received in the meantime to keep the poll overhead low. This would also be useful in combination with LRO. You need hardware support for deferred interrupts. Most devices have it (e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic thing you want done by the stack, you want the hardware to hold off interrupts until X packets or Y usecs have expired. 
The parameters for controlling it are already in ethtool; the issue is finding a good default set of values for a wide range of applications and architectures. Maybe some heuristic based on processor speed would be a good starting point. The dynamic irq moderation stuff is not widely used because it is too hard to get right. -- Stephen Hemminger [EMAIL PROTECTED]
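For what it's worth, the existing ethtool knobs for deferred (coalesced) interrupts look like this; the values are illustrative and which parameters a driver supports varies:

```sh
# Show current interrupt-coalescing settings
ethtool -c eth0

# Interrupt after 32 received frames or 50 usecs, whichever first
ethtool -C eth0 rx-usecs 50 rx-frames 32
```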