Re: [PATCH net v3] net: Force inlining of checksum functions in net/checksum.h

2022-02-18 Thread Stephen Hemminger
On Fri, 18 Feb 2022 06:12:37 -0600
Segher Boessenkool  wrote:

> On Fri, Feb 18, 2022 at 10:35:48AM +0900, Masahiro Yamada wrote:
> > On Fri, Feb 18, 2022 at 3:10 AM Segher Boessenkool
> >  wrote:  
> > > On Fri, Feb 18, 2022 at 02:27:16AM +0900, Masahiro Yamada wrote:  
> > > > On Fri, Feb 18, 2022 at 1:49 AM David Laight  
> > > > wrote:  
> > > > > That description is largely fine.
> > > > >
> > > > > Inappropriate 'inline' ought to be removed.
> > > > > Then 'inline' means - 'really do inline this'.  
> > > >
> > > > You cannot change "static inline" to "static"
> > > > in header files.  
> > >
> > > Why not?  Those two have identical semantics!  
> > 
> > e.g.)
> > 
> > 
> > [1] Open  include/linux/device.h with your favorite editor,
> >  then edit
> > 
> > static inline void *devm_kcalloc(struct device *dev,
> > 
> > to
> > 
> > static void *devm_kcalloc(struct device *dev,
> > 
> > 
> > [2] Build the kernel  
> 
> You get some "defined but not used" warnings that are shushed for
> inlines.  Do you see something else?
> 
> The semantics are the same.  Warnings are just warnings.  It builds
> fine.

Kernel code should build with zero warnings; the compiler is telling you
something.
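The helpers being argued about are tiny ones'-complement routines. As an illustration only (this is a userspace sketch with made-up names, not the kernel's net/checksum.h code), a checksum fold is small enough that a call sequence can cost more than the body, which is why such functions get marked `__always_inline`:

```c
#include <stdint.h>

/* Sketch of a ones'-complement checksum fold, modeled on the kind of
 * helper net/checksum.h provides. Names and bodies are illustrative,
 * not the kernel's code; the point is that the whole function is a
 * handful of instructions, smaller than the call overhead it replaces. */
static inline __attribute__((always_inline))
uint16_t fold_csum32(uint32_t csum)
{
	/* Fold the 32-bit partial sum into 16 bits, twice, since the
	 * first fold can itself carry out of bit 15. */
	csum = (csum & 0xffffu) + (csum >> 16);
	csum = (csum & 0xffffu) + (csum >> 16);
	return (uint16_t)~csum;
}

/* Accumulate 16-bit words into a 32-bit partial sum. */
static inline __attribute__((always_inline))
uint32_t sum16(const uint16_t *words, int n)
{
	uint32_t s = 0;

	while (n-- > 0)
		s += *words++;
	return s;
}
```

With `static` instead of `static inline` the semantics are indeed identical, as the thread says; only the unused-function warning behavior in headers differs.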


Re: [PATCH net-next] ibmvnic: Increase driver logging

2020-07-16 Thread Stephen Hemminger
On Thu, 16 Jul 2020 13:22:00 -0700
Jakub Kicinski  wrote:

> On Thu, 16 Jul 2020 18:07:37 +0200 Michal Suchánek wrote:
> > On Thu, Jul 16, 2020 at 10:59:58AM -0500, Thomas Falcon wrote:  
> > > On 7/15/20 8:29 PM, David Miller wrote:
> > > > From: Jakub Kicinski 
> > > > Date: Wed, 15 Jul 2020 17:06:32 -0700
> > > > 
> > > > > On Wed, 15 Jul 2020 18:51:55 -0500 Thomas Falcon wrote:
> > > > > > free_netdev(netdev);
> > > > > > dev_set_drvdata(&dev->dev, NULL);
> > > > > > +   netdev_info(netdev, "VNIC client device has been successfully removed.\n");
> > > > > A step too far, perhaps.
> > > > > 
> > > > > In general this patch looks a little questionable IMHO, this amount of
> > > > > logging output is not commonly seen in drivers. All the info
> > > > > messages are just static text, not even carrying any extra 
> > > > > information.
> > > > > In an era of ftrace, and bpftrace, do we really need this?
> > > > Agreed, this is too much.  This is debugging, and thus suitable for 
> > > > tracing
> > > > facilities, at best.
> > > 
> > > Thanks for your feedback. I see now that I was overly aggressive with this
> > > patch to be sure, but it would help with narrowing down problems at a 
> > > first
> > > glance, should they arise. The driver in its current state logs very 
> > > little
> > > of what it is doing without the use of additional debugging or tracing
> > > facilities. Would it be worth it to pursue a less aggressive version or
> > > would that be dead on arrival? What are acceptable driver operations to 
> > > log
> > > at this level?
> 
> Sadly it's much more of an art than hard science. Most networking
> drivers will print identifying information when they probe the device
> and then only about major config changes or when link comes up or goes
> down. And obviously when anything unexpected, like an error happens,
> that's key.
> 
> You seem to be adding start / end information for each driver init /
> deinit stage. I'd say try to focus on the actual errors you're trying
> to catch.
> 
> > Also would it be advisable to add the messages as pr_dbg to be enabled on 
> > demand?  
> 
> I personally have had a pretty poor experience with pr_debug() because
> CONFIG_DYNAMIC_DEBUG is not always enabled. Since you're just printing
> static text there shouldn't be much difference between pr_debug and
> ftrace and/or bpftrace, honestly.
> 
> Again, slightly hard to advise not knowing what you're trying to catch.

Linux drivers in general are far too noisy.
In production it is not uncommon to set the kernel to suppress all info messages.


RE: [PATCH V3 1/10] X86/Hyper-V: Add parameter offset for hyperv_fill_flush_guest_mapping_list()

2019-02-22 Thread Stephen Hemminger
int hyperv_fill_flush_guest_mapping_list(
struct hv_guest_mapping_flush_list *flush,
-   u64 start_gfn, u64 pages)
+   int offset, u64 start_gfn, u64 pages)
 {
u64 cur = start_gfn;
u64 additional_pages;
-   int gpa_n = 0;
+   int gpa_n = offset;
 
do {
/*

Do you mean to support negative offsets here? Maybe unsigned would be better?


Re: [PATCH net-next 17/22] hv_netvsc: fix return type of ndo_start_xmit function

2018-09-20 Thread Stephen Hemminger
On Thu, 20 Sep 2018 20:33:01 +0800
YueHaibing  wrote:

> The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
> which is a typedef for an enum type, so make sure the implementation in
> this driver returns a 'netdev_tx_t' value, and change the function
> return type to netdev_tx_t.
> 
> Found by coccinelle.
> 
> Signed-off-by: YueHaibing 
> ---
>  drivers/net/hyperv/netvsc_drv.c | 10 +++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
> index 3af6d8d..056c472 100644
> --- a/drivers/net/hyperv/netvsc_drv.c
> +++ b/drivers/net/hyperv/netvsc_drv.c
> @@ -511,7 +511,8 @@ static int netvsc_vf_xmit(struct net_device *net, struct 
> net_device *vf_netdev,
>   return rc;
>  }
>  
> -static int netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
> +static netdev_tx_t
> +netvsc_start_xmit(struct sk_buff *skb, struct net_device *net)
>  {
>   struct net_device_context *net_device_ctx = netdev_priv(net);
>   struct hv_netvsc_packet *packet = NULL;
> @@ -528,8 +529,11 @@ static int netvsc_start_xmit(struct sk_buff *skb, struct 
> net_device *net)
>*/
>   vf_netdev = rcu_dereference_bh(net_device_ctx->vf_netdev);
>   if (vf_netdev && netif_running(vf_netdev) &&
> - !netpoll_tx_running(net))
> - return netvsc_vf_xmit(net, vf_netdev, skb);
> + !netpoll_tx_running(net)) {
> + ret = netvsc_vf_xmit(net, vf_netdev, skb);
> + if (ret)
> + return NETDEV_TX_BUSY;
> + }

Sorry, the new code is wrong. It will fall through if ret == 0 (NETDEV_TX_OK).
Please review and test your patches.
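The fall-through Stephen points out can be sketched in plain C. The names below are made up for illustration and this is not the hv_netvsc code; it only models the control flow of the buggy hunk versus a shape that returns a netdev_tx_t on every path:

```c
#include <stdbool.h>

/* Toy model of the reviewed bug; names are illustrative. ndo_start_xmit()
 * implementations must return a netdev_tx_t on every path. */
typedef enum { NETDEV_TX_OK = 0, NETDEV_TX_BUSY = 0x10 } netdev_tx_t;

/* Pretend VF transmit: returns 0 on success, nonzero on failure. */
static int vf_xmit(bool succeed) { return succeed ? 0 : -1; }

/* Buggy shape: when vf_xmit() succeeds (returns 0) the 'if (ret)' is
 * not taken and control falls through to the slow path, so the packet
 * would be transmitted a second time. */
static netdev_tx_t xmit_buggy(bool vf_ok, int *slow_path_hits)
{
	int ret = vf_xmit(vf_ok);

	if (ret)
		return NETDEV_TX_BUSY;
	/* fall-through: should have returned NETDEV_TX_OK here */
	(*slow_path_hits)++;
	return NETDEV_TX_OK;
}

/* Fixed shape: when a usable VF exists, map its result to a
 * netdev_tx_t and return on both outcomes. */
static netdev_tx_t xmit_fixed(bool vf_ok, int *slow_path_hits)
{
	int ret = vf_xmit(vf_ok);

	if (vf_ok)
		return ret ? NETDEV_TX_BUSY : NETDEV_TX_OK;
	(*slow_path_hits)++;	/* slow path only when there is no VF */
	return NETDEV_TX_OK;
}
```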


Re: [PATCH net 1/5] ibmvnic: harden interrupt handler

2017-01-26 Thread Stephen Hemminger
On Wed, 25 Jan 2017 15:02:19 -0600
Thomas Falcon  wrote:

>  static irqreturn_t ibmvnic_interrupt(int irq, void *instance)
>  {
>   struct ibmvnic_adapter *adapter = instance;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&adapter->crq.lock, flags);
> + vio_disable_interrupts(adapter->vdev);
> + tasklet_schedule(&adapter->tasklet);
> + spin_unlock_irqrestore(&adapter->crq.lock, flags);
> + return IRQ_HANDLED;
> +}
> +

Why not use NAPI rather than a tasklet?
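The NAPI contract being suggested replaces the interrupt-plus-tasklet pattern with a budgeted poll. The following is a userspace toy model of that contract, not kernel code, with invented names: the poll callback handles at most `budget` packets and only re-enables interrupts (the `napi_complete()` analogue) when it drained the queue early.

```c
/* Toy model of NAPI polling; not kernel code, names are illustrative. */
struct toy_napi {
	int queued;		/* packets waiting in the ring */
	int irq_enabled;	/* device interrupt state */
};

/* Returns the number of packets processed this poll. The NAPI rule:
 * return < budget means "complete", so interrupts are re-enabled;
 * using the whole budget means the core will poll again. */
static int toy_poll(struct toy_napi *n, int budget)
{
	int done = n->queued < budget ? n->queued : budget;

	n->queued -= done;
	if (done < budget)
		n->irq_enabled = 1;	/* napi_complete() analogue */
	return done;
}
```

Under load the device stays in polling mode with interrupts off, which is exactly the livelock-avoidance property a tasklet-per-interrupt scheme lacks.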


RE: [PATCH 1/2] PCI: Add pci_set_vpd_timeout() to set VPD access timeout

2016-11-22 Thread Stephen Hemminger
I had old Marvell Sky2 hardware that had slow flash.

-Original Message-
From: Andrew Donnellan [mailto:andrew.donnel...@au1.ibm.com] 
Sent: Monday, November 21, 2016 4:16 PM
To: Bjorn Helgaas <helg...@kernel.org>; Matthew R. Ochs 
<mro...@linux.vnet.ibm.com>
Cc: linux-...@vger.kernel.org; Frederic Barrat <fbar...@linux.vnet.ibm.com>; 
Uma Krishnan <ukri...@linux.vnet.ibm.com>; Ian Munsie <imun...@au1.ibm.com>; 
Bjorn Helgaas <bhelg...@google.com>; linuxppc-dev@lists.ozlabs.org; Stephen 
Hemminger <sthem...@microsoft.com>
Subject: Re: [PATCH 1/2] PCI: Add pci_set_vpd_timeout() to set VPD access 
timeout

On 22/11/16 09:05, Bjorn Helgaas wrote:
> Hi Matthew,
>
> On Mon, Nov 21, 2016 at 03:09:49PM -0600, Matthew R. Ochs wrote:
>> The PCI core uses a fixed 50ms timeout when waiting for VPD accesses 
>> to complete. When an access does not complete within this period, a 
>> warning is logged and an error returned to the caller.
>>
>> While this default timeout is valid for most hardware, some devices 
>> can experience longer access delays under certain circumstances. For 
>> example, one of the IBM CXL Flash devices can take up to ~120ms in a 
>> worst-case scenario. These types of devices can benefit from an 
>> extended timeout that is specific to their hardware constraints.
>>
>> To support per-device VPD access timeouts, pci_set_vpd_timeout() is 
>> added as an exported service. PCI devices will continue to default 
>> with the 50ms timeout and use a per-device timeout when a driver calls this 
>> new service.
>
> Can you include a pointer to something in the spec that's behind the 
> default 50ms timeout, or did somebody just pull that number out of the 
> air?

It looks like Stephen Hemminger added the 50ms timeout in 1120f8b8169f, which 
seems to indicate that 50ms was chosen because it's longer than the 13ms per 
word that was measured on one device.

-- 
Andrew Donnellan  OzLabs, ADL Canberra
andrew.donnel...@au1.ibm.com  IBM Australia Limited



Re: [v3, 2/9] fsl/fman: Add the FMan port FLIB

2015-07-22 Thread Stephen Hemminger
On Wed, 22 Jul 2015 14:21:48 +0300
igal.liber...@freescale.com wrote:

 diff --git a/drivers/net/ethernet/freescale/fman/Kconfig 
 b/drivers/net/ethernet/freescale/fman/Kconfig
 index 8aeae29..af42c3a 100644
 --- a/drivers/net/ethernet/freescale/fman/Kconfig
 +++ b/drivers/net/ethernet/freescale/fman/Kconfig
 @@ -5,3 +5,4 @@ config FSL_FMAN
   help
   Freescale Data-Path Acceleration Architecture Frame Manager
   (FMan) support
 +

Bogus blank line introduced at end of file?
Why was this not in patch 1?
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH net-next] af_unix: fix a fatal race with bit fields

2013-05-01 Thread Stephen Hemminger
On Tue, 30 Apr 2013 22:04:32 -0700
Eric Dumazet eric.duma...@gmail.com wrote:

 On Wed, 2013-05-01 at 13:24 +0930, Alan Modra wrote:
  On Tue, Apr 30, 2013 at 07:24:20PM -0700, Eric Dumazet wrote:
 li 11,1
 ld 0,0(9)
 rldimi 0,11,31,32
 std 0,0(9)
 blr
 .ident  "GCC: (GNU) 4.6.3"
   
   You can see ld 0,0(9) is used: it's a 64-bit load.
  
  Yup.  This is not a powerpc64 specific problem.  See
  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52080
  Fixed in 4.8.0 and 4.7.3.
 
 Ah thanks.
 
 This seems a pretty serious issue; is it documented somewhere that
 ppc64, sparc64 and others need such a compiler version?
 
 These kinds of errors are pretty hard to find; it's a pity to spend time
 on them.

There is a checkbin target inside arch/powerpc/Makefile.
Shouldn't a check be added there to block building the kernel with known
bad GCC versions?
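The kernel's real gate lives in the arch Makefiles, but the comparison itself is trivial. A hedged sketch, using the cutoff mentioned in the thread (the bit-field overwrite bug was fixed in GCC 4.7.3 and 4.8.0):

```c
/* Sketch of the kind of compiler-version gate being suggested. The
 * cutoff values come from the thread (GCC PR 52080, fixed in 4.7.3
 * and 4.8.0); the kernel's actual check would live in the Makefile,
 * not in C. */
static int gcc_version(int major, int minor, int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/* Nonzero if this GCC is known to miscompile adjacent bit-fields
 * with wide loads/stores on ppc64 and similar targets. */
static int gcc_has_bitfield_bug(int major, int minor, int patch)
{
	/* Broken before 4.7.3; 4.7.3+ and 4.8.0+ carry the fix. */
	return gcc_version(major, minor, patch) < gcc_version(4, 7, 3);
}
```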


Re: [PATCH v5 01/45] percpu_rwlock: Introduce the global reader-writer lock backend

2013-01-22 Thread Stephen Hemminger
On Tue, 22 Jan 2013 13:03:22 +0530
Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com wrote:

 A straight-forward (and obvious) algorithm to implement Per-CPU Reader-Writer
 locks can also lead to too many deadlock possibilities which can make it very
 hard/impossible to use. This is explained in the example below, which helps
 justify the need for a different algorithm to implement flexible Per-CPU
 Reader-Writer locks.
 
 We can use global rwlocks as shown below safely, without fear of deadlocks:
 
 Readers:
 
  CPU 0CPU 1
  --   --
 
 1.spin_lock(random_lock); read_lock(my_rwlock);
 
 
 2.read_lock(my_rwlock);   spin_lock(random_lock);
 
 
 Writer:
 
  CPU 2:
  --
 
write_lock(my_rwlock);
 
 
 We can observe that there is no possibility of deadlocks or circular locking
 dependencies here. It's perfectly safe.
 
 Now consider a blind/straight-forward conversion of global rwlocks to per-CPU
 rwlocks like this:
 
 The reader locks its own per-CPU rwlock for read, and proceeds.
 
 Something like: read_lock(per-cpu rwlock of this cpu);
 
 The writer acquires all per-CPU rwlocks for write and only then proceeds.
 
 Something like:
 
   for_each_online_cpu(cpu)
   write_lock(per-cpu rwlock of 'cpu');
 
 
 Now let's say that for performance reasons, the above scenario (which was
 perfectly safe when using global rwlocks) was converted to use per-CPU 
 rwlocks.
 
 
  CPU 0CPU 1
  --   --
 
 1.spin_lock(random_lock); read_lock(my_rwlock of CPU 1);
 
 
 2.read_lock(my_rwlock of CPU 0);   spin_lock(random_lock);
 
 
 Writer:
 
  CPU 2:
  --
 
   for_each_online_cpu(cpu)
 write_lock(my_rwlock of 'cpu');
 
 
 Consider what happens if the writer begins his operation in between steps 1
 and 2 at the reader side. It becomes evident that we end up in a (previously
 non-existent) deadlock due to a circular locking dependency between the 3
 entities, like this:
 
 
  (holds                 Waiting for
   random_lock)  CPU 0 ------------> CPU 2  (holds my_rwlock of CPU 0
                   ^                   |            for write)
                   |                   |
           Waiting |                   | Waiting
              for  |                   |   for
                   |                   v
                   +------ CPU 1 <-----+

                 (holds my_rwlock of
                  CPU 1 for read)
 
 
 
 So obviously this straight-forward way of implementing percpu rwlocks is
 deadlock-prone. One simple measure of (or characteristic of) a safe percpu
 rwlock should be that if a user replaces global rwlocks with per-CPU rwlocks
 (for performance reasons), he shouldn't suddenly end up in numerous deadlock
 possibilities which never existed before. The replacement should continue to
 remain safe, and perhaps improve the performance.
 
 Observing the robustness of global rwlocks in providing a fair amount of
 deadlock safety, we implement per-CPU rwlocks as nothing but global rwlocks,
 as a first step.
 
 
 Cc: David Howells dhowe...@redhat.com
 Signed-off-by: Srivatsa S. Bhat srivatsa.b...@linux.vnet.ibm.com

We got rid of brlock years ago; do we have to reintroduce it like this?
The problem was that brlock caused starvation.
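The diagram in the quoted text is a three-node wait-for cycle: CPU 0 waits on CPU 2, CPU 2 on CPU 1, CPU 1 on CPU 0. A toy checker (illustration only; the kernel's real facility for this is lockdep) makes the circularity concrete:

```c
/* Toy wait-for-graph cycle check illustrating the deadlock in the
 * quoted diagram; not a real lock validator. waits[i] is the entity
 * that entity i is blocked on, or NOBODY if it is not blocked. */
#define NOBODY (-1)

static int has_wait_cycle(const int *waits, int n)
{
	for (int start = 0; start < n; start++) {
		int cur = start;

		/* Follow at most n edges; getting back to 'start'
		 * means the wait-for graph contains a cycle. */
		for (int steps = 0; steps < n; steps++) {
			cur = waits[cur];
			if (cur == NOBODY)
				break;
			if (cur == start)
				return 1;
		}
	}
	return 0;
}
```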



Re: [PATCH] [V3] net: add Xilinx emac lite device driver

2009-08-20 Thread Stephen Hemminger
On Thu, 20 Aug 2009 03:49:51 -0600
John Linn john.l...@xilinx.com wrote:

 +/**
 + * xemaclite_ioctl - Perform IO Control operations on the network device
 + * @dev: Pointer to the network device
 + * @rq:  Pointer to the interface request structure
 + * @cmd: IOCTL command
 + *
 + * The only IOCTL operation supported by this function is setting the MAC
 + * address. An error is reported if any other operations are requested.
 + *
 + * Return:   0 to indicate success, or a negative error for failure.
 + */
 +static int xemaclite_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 +{
 + struct net_local *lp = (struct net_local *) netdev_priv(dev);
 + struct hw_addr_data *hw_addr = (struct hw_addr_data *) &rq->ifr_hwaddr;
 +
 + switch (cmd) {
 + case SIOCETHTOOL:
 + return -EIO;
 +
 + case SIOCSIFHWADDR:
 + dev_err(&lp->ndev->dev, "SIOCSIFHWADDR\n");
 +
 + /* Copy MAC address in from user space */
 + copy_from_user((void __force *) dev->dev_addr,
 +(void __user __force *) hw_addr,
 +IFHWADDRLEN);
 + xemaclite_set_mac_address(lp, dev->dev_addr);
 + break;
 + break;
 + default:
 + return -EOPNOTSUPP;
 + }
 +
 + return 0;
 +}

Do you really need this? I doubt the SIOCSIFHWADDR even reaches the driver!

The normal call path for setting the hardware address is:
  dev_ifsioc
 dev_set_mac_address
 ops->ndo_set_mac_address  <--

The driver should be:
   1. defining a new ndo_set_mac_address hook
   2. removing the existing xemaclite_ioctl - all ioctls are handled by upper layers

FYI - the only ioctls that make it to a network device's ndo_ioctl
 are listed in dev_ifsioc:
   SIOCDEVPRIVATE ... SIOCDEVPRIVATE + 15
   SIOCBOND*
   SIOCMII*
   SIOCBR*
   SIOCHWTSTAMP
   SIOCWANDEV
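What an `ndo_set_mac_address` hook boils down to can be modeled in userspace. This is an illustration with invented names, not the xemaclite or kernel code; the validation mirrors what `eth_validate_addr()`/`is_valid_ether_addr()` check (not multicast, not all zeros):

```c
#include <string.h>

#define ETH_ALEN 6

/* Toy stand-in for struct net_device; illustrative only. */
struct toy_netdev {
	unsigned char dev_addr[ETH_ALEN];
};

/* Model of is_valid_ether_addr(): reject multicast (bit 0 of the
 * first octet set) and the all-zeros address. */
static int is_valid_ether_addr_model(const unsigned char *a)
{
	int all_zero = 1;

	for (int i = 0; i < ETH_ALEN; i++)
		if (a[i])
			all_zero = 0;
	return !(a[0] & 1) && !all_zero;
}

/* The shape of a driver's .ndo_set_mac_address hook: validate, copy,
 * then program the hardware (omitted here). */
static int toy_set_mac_address(struct toy_netdev *dev,
			       const unsigned char *addr)
{
	if (!is_valid_ether_addr_model(addr))
		return -1;	/* the kernel returns -EADDRNOTAVAIL */
	memcpy(dev->dev_addr, addr, ETH_ALEN);
	return 0;
}
```

With this hook in place, the ioctl path above never needs driver code: dev_set_mac_address() does the validation and calls the hook.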




Re: [PATCH] net: add Xilinx emac lite device driver

2009-08-18 Thread Stephen Hemminger
On Tue, 18 Aug 2009 09:30:41 -0600
John Linn john.l...@xilinx.com wrote:

 +/**
 + * xemaclite_enable_interrupts - Enable the interrupts for the EmacLite 
 device
 + * @drvdata: Pointer to the Emaclite device private data
 + *
 + * This function enables the Tx and Rx interrupts for the Emaclite device 
 along
 + * with the Global Interrupt Enable.
 + */
 +static void xemaclite_enable_interrupts(struct net_local *drvdata)

Docbook format is really not necessary on local functions that
are only used in the driver.  It is fine if you want to use it, as
long as the file isn't processed by the kernel docs build, but
the docbook is intended for automatic generation of kernel API manuals.

-- 


Re: Patch: Fix fec_mpc52xx driver to use net_device_ops

2009-03-31 Thread Stephen Hemminger
On Tue, 31 Mar 2009 12:44:15 +0200
Henk Stegeman henk.stege...@gmail.com wrote:

 Fix fec_mpc52xx driver to use net_device_ops and to be careful not to
 dereference phy_device if a phy has not yet been connected.
 
 Signed-off-by: Henk Stegeman henk.stege...@gmail.com
 
 diff --git a/drivers/net/fec_mpc52xx.c b/drivers/net/fec_mpc52xx.c
 index cd8e98b..ca76b95 100644
 --- a/drivers/net/fec_mpc52xx.c
 +++ b/drivers/net/fec_mpc52xx.c
 @@ -847,24 +847,40 @@ static void mpc52xx_fec_get_drvinfo(struct
 net_device *dev,
  static int mpc52xx_fec_get_settings(struct net_device *dev, struct
 ethtool_cmd *cmd)
  {
   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 +
 + if (!priv->phydev)
 + return -ENODEV;
 +
   return phy_ethtool_gset(priv->phydev, cmd);
  }
 
  static int mpc52xx_fec_set_settings(struct net_device *dev, struct
 ethtool_cmd *cmd)
  {
   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 +
 + if (!priv->phydev)
 + return -ENODEV;
 +
   return phy_ethtool_sset(priv->phydev, cmd);
  }
 
  static u32 mpc52xx_fec_get_msglevel(struct net_device *dev)
  {
   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 +
 + if (!priv->phydev)
 + return 0;
 +
   return priv->msg_enable;
  }
 
  static void mpc52xx_fec_set_msglevel(struct net_device *dev, u32 level)
  {
   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 +
 + if (!priv->phydev)
 + return;
 +
   priv->msg_enable = level;
  }
 
 @@ -882,12 +898,31 @@ static int mpc52xx_fec_ioctl(struct net_device
 *dev, struct ifreq *rq, int cmd)
  {
   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 
 + if (!priv->phydev)
 + return -ENODEV;
 +
   return mpc52xx_fec_phy_mii_ioctl(priv, if_mii(rq), cmd);
  }
 
 /* ======================================================================= */
 /* OF Driver                                                               */
 /* ======================================================================= */
 +static const struct net_device_ops mpc52xx_fec_netdev_ops = {
 +   .ndo_open   = mpc52xx_fec_open,
 +   .ndo_stop   = mpc52xx_fec_close,
 +   .ndo_start_xmit = mpc52xx_fec_hard_start_xmit,
 +   .ndo_tx_timeout = mpc52xx_fec_tx_timeout,
 +   .ndo_get_stats  = mpc52xx_fec_get_stats,
 +   .ndo_set_multicast_list = mpc52xx_fec_set_multicast_list,
 +   .ndo_validate_addr  = eth_validate_addr,
 +   .ndo_set_mac_address= mpc52xx_fec_set_mac_address,
 +   .ndo_do_ioctl   = mpc52xx_fec_ioctl,
 
What about change_mtu? Don't you want:
  .ndo_change_mtu = eth_change_mtu,


Re: net_device_ops support in bridging and fec_mpc52xx.c

2009-02-18 Thread Stephen Hemminger
On Wed, 18 Feb 2009 13:48:52 -0800 (PST)
David Miller da...@davemloft.net wrote:

 From: Henk Stegeman henk.stege...@gmail.com
 Date: Wed, 18 Feb 2009 11:41:14 +0100
 
 Please CC: netdev, now added, on all networking reports and patches.
 
 Thank you.
 
  I discovered the hard way that because linux bridging uses
  net_device_ops, bridging only works with network drivers that publish
   their device operations through net_device_ops.
  
  In my case running:
  
  brctl addif br0 eth0 (where eth0 fec_mpc52xx.c did not yet support
  net_device_ops) gave me a:
  
  Unable to handle kernel paging request...
  
  After changing fec_mpc52xx.c to support net_device_ops the problem was 
  fixed.
  
   If possible, some kind of detection in the bridging software would, I
   think, be much appreciated for early detection of this problem, as it is
   pretty hard to relate the error message to a not-yet-updated driver.
  
  cheers,
  
  Henk

The normal register_netdevice stuff takes care of setting up net_device_ops
for old style drivers. Was there something different about how this
device was being setup?


Re: [PATCH RFC v2] net: add PCINet driver

2008-10-29 Thread Stephen Hemminger
On Wed, 29 Oct 2008 13:20:27 -0700
Ira Snyder [EMAIL PROTECTED] wrote:

 This adds support to Linux for a virtual ethernet interface which uses the
 PCI bus as its transport mechanism. It creates a simple, familiar, and fast
 method of communication for two devices connected by a PCI interface.
 
 I have implemented client support for the Freescale MPC8349EMDS board,
 which is capable of running in PCI Agent mode (It acts like a PCI card, but
 is a complete PowerPC computer, running Linux). It is almost certainly
 trivially ported to any MPC83xx system. It should be a relatively small
 effort to port it to any chip that can generate PCI interrupts and has at
 least one PCI accessible scratch register.
 
 It was developed to work in a CompactPCI crate of computers, one of which
 is a relatively standard x86 system (acting as the host) and many PowerPC
 systems (acting as clients).
 
 RFC v1 -> RFC v2:
   * remove vim modelines
   * use net_device->name in request_irq(), for irqbalance
   * remove unnecessary wqt_get_stats(), use default get_stats() instead
   * use dev_printk() and friends
   * add message unit to MPC8349EMDS dts file
 
 Signed-off-by: Ira W. Snyder [EMAIL PROTECTED]
 ---
 
 This is the third posting of this driver. I got some feedback, and have
 corrected the problems. Stephen, thanks for the review! I also got some
 off-list feedback from a potential user, so I believe this is worth getting
 into mainline.
 
 I'll post up a revised version about once a week as long as the changes are
 minor. If they are more substantial, I'll post them as needed.
 
 GregKH: is this worth putting into the staging tree? (I left you out of the
 CC list to keep your email traffic down)
 
 The remaining issues I see in this driver:
 1)  Naming 
The name wqt originally stood for workqueue-test and somewhat evolved
over time into the current driver. I'm looking for suggestions for a
better name. It should be the same between the host and client drivers,
to make porting the code between them easier. The drivers are /very/
similar other than the setup code.
 2)  IMMR Usage 
In the Freescale client driver, I use the whole set of board control
registers (AKA IMMR registers). I only need a very small subset of them,
during startup to set up the DMA window. I used the full set of
registers so that I could share the register offsets between the drivers
(in pcinet_hw.h)
 3)  ioremap() of a physical address 
In the Freescale client driver, I called ioremap() with the physical
address of the IMMR registers, 0xe000. I don't know a better way to
get them. They are somewhat exposed in the Flat Device Tree. Suggestions
on how to extract them are welcome.
 4)  Hardcoded DMA Window Address 
In the Freescale client driver, I just hardcoded the address of the
outbound PCI window into the DMA transfer code. It is 0x8000.
Suggestions on how to get this value at runtime are welcome.
 
 
 Rationale behind some decisions:
 1)  Usage of the PCINET_NET_REGISTERS_VALID bit 
I want to be able to use this driver from U-Boot to tftp a kernel over
the PCI backplane, and then boot up the board. This means that the
device descriptor memory, which lives in the client RAM, becomes invalid
during boot.
 2)  Buffer Descriptors in client memory 
I chose to put the buffer descriptors in client memory rather than host
memory. It seemed more logical to me at the time. In my application,
I'll have 19 boards + 1 host per cPCI chassis. The client - host
direction will see most of the traffic, and so I thought I would cut
down on the number of PCI accesses needed. I'm willing to change this.
 3)  Usage of client DMA controller for all data transfer 
This was done purely for speed. I tried using the CPU to transfer all
data, and it is very slow: ~3MB/sec. Using the DMA controller gets me to
~40MB/sec (as tested with netperf).
 4)  Static 1GB DMA window 
Maintaining a window while DMA's in flight, and then changing it seemed
too complicated. Also, testing showed that using a static window gave me
a ~10MB/sec speedup compared to moving the window for each skb.
 5)  The serial driver 
Yes, there are two essentially separate drivers here. I needed a method
to communicate with the U-Boot bootloader on these boards without
plugging in a serial cable. With 19 clients + 1 host per chassis, the
cable clutter is worth avoiding. Since everything is connected via the
PCI bus anyway, I used that. A virtual serial port was simple to
implement using the messaging unit hardware that I used for the network
driver.
 
 I'll post both U-Boot drivers to their mailing list once this driver is
 finalized.
 
 Thanks,
 Ira
 
 
  arch/powerpc/boot/dts/mpc834x_mds.dts |7 +
  drivers/net/Kconfig   |   34 +
  drivers/net/Makefile  

Re: [RFC v1] net: add PCINet driver

2008-10-23 Thread Stephen Hemminger
On Thu, 23 Oct 2008 15:49:32 -0700
Ira Snyder [EMAIL PROTECTED] wrote:

 This adds support to Linux for a virtual ethernet interface which uses the
 PCI bus as its transport mechanism. It creates a simple, familiar, and fast
 method of communication for two devices connected by a PCI interface.
 
 I have implemented client support for the Freescale MPC8349EMDS board,
 which is capable of running in PCI Agent mode (It acts like a PCI card, but
 is a complete PowerPC computer, running Linux). It is almost certainly
 trivially ported to any MPC83xx system. It should be a relatively small
 effort to port it to any chip that can generate PCI interrupts and has at
 least one PCI accessible scratch register.
 
 It was developed to work in a CompactPCI crate of computers, one of which
 is a relatively standard x86 system (acting as the host) and many PowerPC
 systems (acting as clients).
 
 Signed-off-by: Ira W. Snyder [EMAIL PROTECTED]
 ---
 
 This is my second posting of this driver. I posted it to the linux-netdev
 list a week ago, but did not get any replies. Therefore, I'll post it for a
 wider audience. :)
 
 Hello everyone. This is my first network driver, so take it easy on me. :)
 I'm quite sure it isn't ready for inclusion into mainline yet, but I think
 it is in good enough shape for some review.
 
 Through conversations on IRC, I have been led to believe that this has been
 done before, but no implementations have been made public. My employer has
 no problems with me making this public, so I thought it would be good to
 post it. I don't know if something like this is even desired for mainline
 inclusion, but I do know that even having an example driver to base this on
 would have saved me some effort.
 
 The major issues I see:
 1) The name wqt originally stood for workqueue-test and somewhat evolved
 into this driver. I'm looking for suggestions. I'd like to have something
 that is the same between the host and client drivers, since most of the
 code is identical. It makes copy/paste easier. The only one I can come up
 with is bpc for Backplane Communications.
 2) In the Freescale client driver, I use the whole set of IMMR (board
 control) registers. I only need a very small subset of them only during
 startup. I used them the way I did so I could use the same pcinet_hw.h file
 for both the client and server drivers.
 3) I just ioremap()ed the IMMR registers directly. (They're at 0xe000).
 I just didn't know a better way to do this. I need them to set up the PCI
 memory containing the buffer descriptors.
 4) I just hardcoded the address of the outbound PCI window into the DMA
 transfer code. It is at 0x8000. Suggestions are welcome.
 
 Why I made some decisions:
 1) The PCINET_NET_REGISTERS_VALID bit: I want to be able to use this driver
 from U-Boot to copy a kernel, etc. over the PCI backplane to boot up the
 board. This meant that the memory that I used for the buffer descriptors
 disappears while the machine is booting Linux. Suggestions are welcome.
 2) The buffer descriptors in client memory, rather than host memory: I
 thought it seemed more logical. Also, in my application, the clients will
 be transferring much more data to the host.
 3) Use of the Freescale (client) DMA controller to transfer all data: I
 tried transferring all of the data using the CPU on each board. This turned
 out to be extremely slow, as in 2-3 MB/sec max. Using the DMA controller, I
 get ~40 MB/sec (tested with netperf).
 4) Use of a static 1GB window to access the host's memory (to copy skbs):
 Maintaining the window while DMA's are in flight, and then changing them
 seemed to be too much trouble. A static window just seemed easier. Also,
 removing the overhead of moving the window from each skb transferred
 actually gave a reasonable speedup. (I tested it.)
 5) The uart stuff: I needed a method to talk to the U-Boot bootloader on
 these boards without plugging in a serial cable. When my project gets
 going, I'll have 150 of them. Booting them one at a time is out of the
 question. A virtual serial port was simple to implement using the same
 hardware that I used for the network driver.
 
 I'll post the U-Boot driver to their mailing list once this driver is
 finalized.
 
 Thanks,
 Ira
 
 
  drivers/net/Kconfig   |   34 ++
  drivers/net/Makefile  |3 +
  drivers/net/pcinet.h  |   77 +++
  drivers/net/pcinet_fsl.c  | 1360 +++
  drivers/net/pcinet_host.c | 1392 
 +
  drivers/net/pcinet_hw.h   |   80 +++
  6 files changed, 2946 insertions(+), 0 deletions(-)
  create mode 100644 drivers/net/pcinet.h
  create mode 100644 drivers/net/pcinet_fsl.c
  create mode 100644 drivers/net/pcinet_host.c
  create mode 100644 drivers/net/pcinet_hw.h
 
 diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
 index 4a11296..9185803 100644
 --- a/drivers/net/Kconfig
 +++ b/drivers/net/Kconfig
 @@ -2259,6 +2259,40 @@ config 

Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-17 Thread Stephen Hemminger
Please don't use a double underscore for this function name. There is no
reason not to make it a normal API call.

The sky2 fix I am working on will use napi_synchronize as well.

--- a/include/linux/netdevice.h 2007-10-16 16:48:20.0 -0700
+++ b/include/linux/netdevice.h 2007-10-17 08:29:55.0 -0700
@@ -407,6 +407,24 @@ static inline void napi_enable(struct na
clear_bit(NAPI_STATE_SCHED, &n->state);
 }
 
+#ifdef CONFIG_SMP
+/**
+ * napi_synchronize - wait until NAPI is not running
+ * @n: napi context
+ *
+ * Wait until NAPI is done being scheduled on this context.
+ * Any outstanding processing completes but
+ * does not disable future activations.
+ */
+static inline void napi_synchronize(const struct napi_struct *n)
+{
+   while (test_bit(NAPI_STATE_SCHED, &n->state))
+   msleep(1);
+}
+#else
+# define napi_synchronize(n)   barrier()
+#endif
+
 /*
  * The DEVICE structure.
  * Actually, this whole structure is a big mistake.  It mixes I/O


Re: [PATCH/RFC] net: Add __napi_sycnhronize() to sync with napi poll

2007-10-16 Thread Stephen Hemminger
On Tue, 16 Oct 2007 15:49:52 +1000
Benjamin Herrenschmidt [EMAIL PROTECTED] wrote:

 net: Add __napi_sycnhronize() to sync with napi poll
 
 The EMAC driver which needs to handle multiple devices with one
 NAPI instance implements its own per-channel disable bit. However,
 when setting such a bit, it needs to synchronize with the poller
 (that is make sure that any pending poller instance has completed,
 or is started late enough to see that disable bit).
 
 This implements a low level __napi_synchronize() function to achieve
 that. The underscores are to emphasize the low level aspect of it and
 to discourage driver writers who don't know what they are doing to
 use it (to please DaveM :-)
 
 Signed-off-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
 ---
 
 (Use correct address for Stephen this time)
 
 If the approach is accepted, I would like to have this merged now
 so the EMAC patch to make it work again can follow :-)
 
 Note: I use msleep_interruptible(1); just like napi_disable(). However
 I'm not too happy that the hot loop that results from a pending signal
 here will spin without even a cpu_relax() ... what do you guys think would
 be the best way to handle this?


So this is really just like synchronize_irq()?  Using msleep() is bogus
here because you want to spin: you are only waiting for a softirq on the
other cpu to finish. Sleeping for a whole millisecond is far longer than
the napi routine should take.

You could even optimize it like synchronize_irq() for the non-SMP case.



-- 
Stephen Hemminger [EMAIL PROTECTED]
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH v4] qe: miscellaneous code improvements and fixes to the QE library

2007-10-03 Thread Stephen Hemminger
On Wed,  3 Oct 2007 11:34:59 -0500
Timur Tabi [EMAIL PROTECTED] wrote:

 This patch makes numerous miscellaneous code improvements to the QE library.
 
 1. Remove struct ucc_common and merge ucc_init_guemr() into ucc_set_type()
(every caller of ucc_init_guemr() also calls ucc_set_type()).  Modify all
callers of ucc_set_type() accordingly.
 
 2. Remove the unused enum ucc_pram_initial_offset.
 
 3. Refactor qe_setbrg(), also implement work-around for errata QE_General4.
 
 4. Several printk() calls were missing the terminating \n.
 
 5. Add __iomem where needed, and change u16 to __be16 and u32 to __be32 where
appropriate.
 
 6. In ucc_slow_init() the RBASE and TBASE registers in the PRAM were 
 programmed
with the wrong value.
 
 7. Add the protocol type to struct us_info and updated ucc_slow_init() to
use it, instead of always programming QE_CR_PROTOCOL_UNSPECIFIED.
 
 8. Rename ucc_slow_restart_x() to ucc_slow_restart_tx()
 
 9. Add several macros in qe.h (mostly for slow UCC support, but also to
standardize some naming convention) and remove several unused macros.
 
 10. Update ucc_geth.c to use the new macros.
 
 11. Add ucc_slow_info.protocol to specify which QE_CR_PROTOCOL_xxx protocol
 to use when initializing the UCC in ucc_slow_init().
 
 12. Rename ucc_slow_pram.rfcr to rbmr and ucc_slow_pram.tfcr to tbmr, since
 these are the real names of the registers.
 
 13. Use the setbits, clrbits, and clrsetbits where appropriate.
 
 14. Refactor ucc_set_qe_mux_rxtx().
 
 15. Remove all instances of 'volatile'.
 
 16. Simplify get_cmxucr_reg().
 
 17. Replace qe_mux.cmxucrX with qe_mux.cmxucr[].
 
 18. Updated struct ucc_geth because struct ucc_fast is not padded any more.
 
 Signed-off-by: Timur Tabi [EMAIL PROTECTED]
 ---
 

Separate the changes into individual patches to allow for better comment/review
and bisection in case of regression.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Stephen Hemminger
On Fri, 24 Aug 2007 17:47:15 +0200
Jan-Bernd Themann [EMAIL PROTECTED] wrote:

 Hi,
 
 On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
  On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
   ...
   3) On modern systems the incoming packets are processed very fast. 
   Especially
      on SMP systems when we use multiple queues we process only a few 
   packets
      per napi poll cycle. So NAPI does not work very well here and the 
   interrupt 
      rate is still high. What we need would be some sort of timer polling 
   mode 
      which will schedule a device after a certain amount of time for high 
   load 
      situations. With high precision timers this could work well. Current
      usual timers are too slow. A finer granularity would be needed to keep 
   the
  latency down (and queue length moderate).
   
  
  We found the same on ia64-sn systems with tg3 a couple of years 
  ago. Using simple interrupt coalescing (don't interrupt until 
  you've received N packets or M usecs have elapsed) worked 
  reasonably well in practice. If your h/w supports that (and I'd 
  guess it does, since it's such a simple thing), you might try 
  it.
  
 
 I don't see how this should work. Our latest machines are fast enough that 
 they
 simply empty the queue during the first poll iteration (in most cases).
 Even if you wait until X packets have been received, it does not help for
 the next poll cycle. The average number of packets we process per poll queue
 is low. So a timer would be preferable that periodically polls the 
 queue, without the need of generating a HW interrupt. This would allow us
 to wait until a reasonable amount of packets have been received in the 
 meantime
 to keep the poll overhead low. This would also be useful in combination
 with LRO.
 

You need hardware support for deferred interrupts. Most devices have it
(e1000, sky2, tg3) and it interacts well with NAPI. It is not a generic
thing you want done by the stack; you want the hardware to hold off
interrupts until X packets or Y usecs have expired.

The parameters for controlling it are already in ethtool; the issue is
finding a good default set of values for a wide range of applications and
architectures. Maybe some heuristic based on processor speed would be a
good starting point. The dynamic irq moderation stuff is not widely used
because it is too hard to get right.
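For reference, the coalescing parameters mentioned here are the ones ethtool exposes via -C/--coalesce. An illustrative invocation (device name and values are examples only, and the exact set of supported knobs depends on the driver):

```shell
# Hold off the RX interrupt until 64 frames have arrived or 50 usecs
# have elapsed, whichever comes first (values are illustrative).
ethtool -C eth0 rx-usecs 50 rx-frames 64

# Inspect the current coalescing settings.
ethtool -c eth0
```
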

-- 
Stephen Hemminger [EMAIL PROTECTED]
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev