Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-09-20 Thread Jan-Bernd Themann
Hi Milton,

sorry for the delayed answer, was on vacation.

linux-kernel-ow...@vger.kernel.org wrote on 21.05.2010 11:18:01:

 On Thu, 20 May 2010 at 10:21:36 +0200 (CEST) Thomas Gleixner  wrote:
  On Thu, 20 May 2010, Michael Ellerman wrote:
   On Wed, 2010-05-19 at 16:38 +0200, Thomas Gleixner wrote:
On Wed, 19 May 2010, Darren Hart wrote:
   
 On 05/18/2010 06:25 PM, Michael Ellerman wrote:
  On Tue, 2010-05-18 at 15:22 -0700, Darren Hart wrote:
  
  The result of the discussion about two years ago on this was that we
  needed a custom flow handler for XICS on RT.

 I'm still not clear on why the ultimate solution wasn't to have XICS
 report edge triggered as edge triggered. Probably some complexity of
 the entire power stack that I am ignorant of.

  Apart from the issue of losing interrupts there is also the fact that
  masking on the XICS requires an RTAS call which takes a global lock.

Right, I'd love to avoid that but with real level interrupts we'd run
into an interrupt storm. Though another solution would be to issue the
EOI after the threaded handler finished, that'd work as well, but
needs testing.
  
   Yeah I think that was the idea for the custom flow handler. We'd
   reset the processor priority so we can take other interrupts (which
   the EOI usually does for you), then do the actual EOI after the
   handler finished.
 
  That only works when the card does not issue new interrupts until the
  EOI happens. If the EOI is only relevant for the interrupt controller,
  then you are going to lose any edge which comes in before the EOI as
  well.

 Well, the real MSIs have an extra bit to allow the eoi to dally behind
 the mmio on another path and that should cover this race when the irq
 is left enabled.

 Jan-Bernd: HEA has that change, right?

I don't know. We never hit problems, so we did not look very deeply into
this area. We probably have to talk to the HEA HW developers to be sure.

Regards,
Jan-Bernd

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 1/2] ehea: fix delayed packet processing

2010-06-15 Thread Jan-Bernd Themann
In the eHEA poll function an rmb() is required. Without that some packets
on the receive queue are not seen and thus delayed until the next interrupt
is handled for the same receive queue.

Signed-off-by: Jan-Bernd Themann them...@de.ibm.com

---
Patch created against 2.6.35-rc3

 drivers/net/ehea/ehea_main.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index f547894..fd890fa 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -867,6 +867,7 @@ static int ehea_poll(struct napi_struct *napi, int budget)
 		ehea_reset_cq_ep(pr->send_cq);
 		ehea_reset_cq_n1(pr->recv_cq);
 		ehea_reset_cq_n1(pr->send_cq);
+		rmb();
 		cqe = ehea_poll_rq1(pr->qp, &wqe_index);
 		cqe_skb = ehea_poll_cq(pr->send_cq);
 
-- 
1.7.0
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH 2/2] ehea: Fix kernel deadlock in DLPAR-mem processing

2010-06-15 Thread Jan-Bernd Themann
Port reset operations and memory add/remove operations need to
be serialized to avoid a kernel deadlock. The deadlock is caused
by calling the napi_disable() function twice.
Therefore we have to hold dlpar_mem_lock in the ehea_reset_port
function as well.


Signed-off-by: Jan-Bernd Themann them...@de.ibm.com

---
created against kernel-2.6.35-rc3

 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |8 +++-
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 0630980..0060e42 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -40,7 +40,7 @@
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0103"
+#define DRV_VERSION	"EHEA_0105"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index fd890fa..8b92acb 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2860,6 +2860,7 @@ static void ehea_reset_port(struct work_struct *work)
container_of(work, struct ehea_port, reset_task);
 	struct net_device *dev = port->netdev;
 
+	mutex_lock(&dlpar_mem_lock);
 	port->resets++;
 	mutex_lock(&port->port_lock);
 	netif_stop_queue(dev);
@@ -2882,6 +2883,7 @@ static void ehea_reset_port(struct work_struct *work)
 	netif_wake_queue(dev);
 out:
 	mutex_unlock(&port->port_lock);
+	mutex_unlock(&dlpar_mem_lock);
 }
 
 static void ehea_rereg_mrs(struct work_struct *work)
@@ -3543,10 +3545,7 @@ static int ehea_mem_notifier(struct notifier_block *nb,
int ret = NOTIFY_BAD;
struct memory_notify *arg = data;
 
-	if (!mutex_trylock(&dlpar_mem_lock)) {
-		ehea_info("ehea_mem_notifier must not be called parallelized");
-		goto out;
-	}
+	mutex_lock(&dlpar_mem_lock);
 
 	switch (action) {
 	case MEM_CANCEL_OFFLINE:
@@ -3575,7 +3574,6 @@ static int ehea_mem_notifier(struct notifier_block *nb,
 
 out_unlock:
 	mutex_unlock(&dlpar_mem_lock);
-out:
return ret;
 }
 
-- 
1.7.0

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Jan-Bernd Themann
Hi,


Michael Ellerman mich...@ellerman.id.au wrote on 20.05.2010 03:34:08:


 Subject:

 Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

 On Wed, 2010-05-19 at 23:08 +0200, Thomas Gleixner wrote:
  On Wed, 19 May 2010, Thomas Gleixner wrote:
    I'm still not clear on why the ultimate solution wasn't to have
    XICS report edge triggered as edge triggered. Probably some
    complexity of the entire power stack that I am ignorant of.

   Apart from the issue of losing interrupts there is also the fact
   that masking on the XICS requires an RTAS call which takes a
   global lock.

  Right, I'd love to avoid that but with real level interrupts we'd run
  into an interrupt storm. Though another solution would be to issue the
  EOI after the threaded handler finished, that'd work as well, but
  needs testing.
 
  Thought more about that. The case at hand (ehea) is nasty:
 
  The driver does _NOT_ disable the rx interrupt in the card in the rx
  interrupt handler - for whatever reason.

 Yeah I saw that, but I don't know why it's written that way. Perhaps
 Jan-Bernd or Doug will chime in and enlighten us? :)

From our perspective there is no need to disable interrupts for the RX side,
as the chip does not fire further interrupts until we tell the chip to do so
for a particular queue. We have multiple receive queues, each with its own
interrupt, so that the interrupts can arrive on multiple CPUs in parallel.
Interrupts are enabled again when we leave the NAPI poll function for the
corresponding receive queue.
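
For illustration, a rough sketch of that flow (simplified and condensed
from the driver, send-queue and error handling omitted -- not the exact
upstream code):

static int ehea_poll(struct napi_struct *napi, int budget)
{
	struct ehea_port_res *pr = container_of(napi, struct ehea_port_res, napi);
	int rx = ehea_proc_rwqes(pr->port->netdev, pr, budget); /* consume RX completions */

	if (rx < budget) {
		napi_complete(napi);
		/* "re-enable" the interrupt for this queue: writing the CQ
		 * registers asks the HEA to raise a new event when further
		 * completions are written */
		ehea_reset_cq_ep(pr->recv_cq);
		ehea_reset_cq_n1(pr->recv_cq);
	}
	return rx;
}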

Michael, does this answer your question?

Regards,
Jan-Bernd

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

2010-05-20 Thread Jan-Bernd Themann

Hi Thomas

 Re: [PATCH RT] ehea: make receive irq handler non-threaded (IRQF_NODELAY)

 On Thu, 20 May 2010, Jan-Bernd Themann wrote:
Thought more about that. The case at hand (ehea) is nasty:
   
The driver does _NOT_ disable the rx interrupt in the card in the
rx
interrupt handler - for whatever reason.
  
   Yeah I saw that, but I don't know why it's written that way. Perhaps
   Jan-Bernd or Doug will chime in and enlighten us? :)
 
  From our perspective there is no need to disable interrupts for the
  RX side as the chip does not fire further interrupts until we tell
  the chip to do so for a particular queue. We have multiple receive

 The traces tell a different story though:

 ehea_recv_irq_handler()
   napi_reschedule()
 eoi()
 ehea_poll()
   ...
   ehea_recv_irq_handler() ???
 napi_reschedule()
   ...
   napi_complete()

 Can't tell whether you can see the same behaviour in mainline, but I
 don't see a reason why not.

Is this the same interrupt we are seeing here, or do we see a second,
different interrupt popping up on the same CPU? As I said, with multiple
receive queues (if enabled) you can have multiple interrupts in parallel.

Please check if multiple queues are enabled. The following module parameter
is used for that:

MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 0 ");

You should also see the number of used HEA interrupts in /proc/interrupts.



  queues, each with its own interrupt, so that the interrupts can arrive
  on multiple CPUs in parallel. Interrupts are enabled again when we
  leave the NAPI poll function for the corresponding receive queue.

 I can't see a piece of code which does that, but that's probably just
 lack of detailed hardware knowledge on my side.

If you mean the re-enable piece of code, it is not very obvious, you are
right. Interrupts are only generated if a particular register for our
completion queues is written. We do this in the following lines:
  ehea_reset_cq_ep(pr->recv_cq);
  ehea_reset_cq_ep(pr->send_cq);
  ehea_reset_cq_n1(pr->recv_cq);
  ehea_reset_cq_n1(pr->send_cq);

So this is, in a way, an indirect way to ask for interrupts when new
completions are written to memory. We don't really disable/enable
interrupts on the HEA chip itself.

I think there are some mechanisms built into the HEA chip that should
prevent interrupts from getting lost. But that is something that is / was
completely hidden from us, so my knowledge is very limited there.

If more details are needed here we should involve the PHYP guys + eHEA HW
guys, if not already done. Did anyone already talk to them?

Regards,
Jan-Bernd

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH net-next-2.6] ehea: fix circular locking problem

2009-03-12 Thread Jan-Bernd Themann
This patch fixes the circular locking problem by changing the locking strategy
concerning the logging of firmware handles.

Signed-off-by: Jan-Bernd Themann them...@de.ibm.com



---

 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |   56 ++---
 2 files changed, 31 insertions(+), 27 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 029631c..6e317ca 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -40,7 +40,7 @@
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0099"
+#define DRV_VERSION	"EHEA_0100"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 40c34bf..ac0c5b4 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -155,6 +155,8 @@ static void ehea_update_firmware_handles(void)
int num_fw_handles, k, l;
 
/* Determine number of handles */
+	mutex_lock(&ehea_fw_handles.lock);
+
 	list_for_each_entry(adapter, &adapter_list, list) {
num_adapters++;
 
@@ -176,15 +178,19 @@ static void ehea_update_firmware_handles(void)
if (num_fw_handles) {
arr = kzalloc(num_fw_handles * sizeof(*arr), GFP_KERNEL);
if (!arr)
-   return;  /* Keep the existing array */
+   goto out;  /* Keep the existing array */
} else
goto out_update;
 
 	list_for_each_entry(adapter, &adapter_list, list) {
+		if (num_adapters == 0)
+			break;
+
 		for (k = 0; k < EHEA_MAX_PORTS; k++) {
 			struct ehea_port *port = adapter->port[k];
 
-			if (!port || (port->state != EHEA_PORT_UP))
+			if (!port || (port->state != EHEA_PORT_UP)
+			    || (num_ports == 0))
 				continue;
 
for (l = 0;
@@ -207,6 +213,7 @@ static void ehea_update_firmware_handles(void)
}
 			arr[i].adh = adapter->handle;
 			arr[i++].fwh = port->qp_eq->fw_handle;
+   num_ports--;
}
 
 		arr[i].adh = adapter->handle;
@@ -216,16 +223,20 @@ static void ehea_update_firmware_handles(void)
 		arr[i].adh = adapter->handle;
 		arr[i++].fwh = adapter->mr.handle;
}
+   num_adapters--;
}
 
 out_update:
kfree(ehea_fw_handles.arr);
ehea_fw_handles.arr = arr;
ehea_fw_handles.num_entries = i;
+out:
+	mutex_unlock(&ehea_fw_handles.lock);
 }
 
 static void ehea_update_bcmc_registrations(void)
 {
+   unsigned long flags;
struct ehea_bcmc_reg_entry *arr = NULL;
struct ehea_adapter *adapter;
struct ehea_mc_list *mc_entry;
@@ -233,6 +244,8 @@ static void ehea_update_bcmc_registrations(void)
int i = 0;
int k;
 
+	spin_lock_irqsave(&ehea_bcmc_regs.lock, flags);
 
 	/* Determine number of registrations */
 	list_for_each_entry(adapter, &adapter_list, list)
 		for (k = 0; k < EHEA_MAX_PORTS; k++) {
@@ -250,7 +263,7 @@ static void ehea_update_bcmc_registrations(void)
if (num_registrations) {
arr = kzalloc(num_registrations * sizeof(*arr), GFP_ATOMIC);
if (!arr)
-   return;  /* Keep the existing array */
+   goto out;  /* Keep the existing array */
} else
goto out_update;
 
@@ -261,6 +274,9 @@ static void ehea_update_bcmc_registrations(void)
 			if (!port || (port->state != EHEA_PORT_UP))
continue;
 
+   if (num_registrations == 0)
+   goto out_update;
+
 			arr[i].adh = adapter->handle;
 			arr[i].port_id = port->logical_port_id;
arr[i].reg_type = EHEA_BCMC_BROADCAST |
@@ -272,9 +288,13 @@ static void ehea_update_bcmc_registrations(void)
arr[i].reg_type = EHEA_BCMC_BROADCAST |
  EHEA_BCMC_VLANID_ALL;
 			arr[i++].macaddr = port->mac_addr;
+   num_registrations -= 2;
 
 			list_for_each_entry(mc_entry,
 					    &port->mc_list->list, list) {
+   if (num_registrations == 0)
+   goto out_update;
+
 				arr[i].adh = adapter->handle;
 				arr[i].port_id = port->logical_port_id;
arr[i].reg_type = EHEA_BCMC_SCOPE_ALL |
@@ -288,6 +308,7 @@ static void ehea_update_bcmc_registrations(void

Re: [PATCH] eHEA: Don't do memory allocation under lock if not necessary

2009-03-11 Thread Jan-Bernd Themann
Hi David,

thanks for your patch. Coincidentally we have been working on a patch that
does some locking rework and also touches this particular lock.
So your patch won't be required after all. Thanks anyway for trying
to improve the eHEA driver!

I'm going to post our patch later today.

Regards,
Jan-Bernd

On Wednesday 11 March 2009 09:44:57 David Howells wrote:
 In ehea_probe_adapter() the initial memory allocation and initialisation does
 not need to be done with the ehea_fw_handles.lock semaphore held.  Doing so
 extends the amount of time the lock is held unnecessarily.
 
 Signed-off-by: David Howells dhowe...@redhat.com
 ---
 
  drivers/net/ehea/ehea_main.c |   13 ++---
  1 files changed, 6 insertions(+), 7 deletions(-)
 
 
 diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
 index dfe9226..34480ae 100644
 --- a/drivers/net/ehea/ehea_main.c
 +++ b/drivers/net/ehea/ehea_main.c
 @@ -3370,18 +3370,19 @@ static int __devinit ehea_probe_adapter(struct 
 of_device *dev,
 		ehea_error("Invalid ibmebus device probed");
 		return -EINVAL;
 	}
 - mutex_lock(&ehea_fw_handles.lock);
 
   adapter = kzalloc(sizeof(*adapter), GFP_KERNEL);
   if (!adapter) {
 - ret = -ENOMEM;
 		dev_err(&dev->dev, "no mem for ehea_adapter\n");
 - goto out;
 + return -ENOMEM;
   }
 
 - list_add(&adapter->list, &adapter_list);
 -
 	adapter->ofdev = dev;
 + adapter->pd = EHEA_PD_ID;
 +
 + mutex_lock(&ehea_fw_handles.lock);
 +
 + list_add(&adapter->list, &adapter_list);
 
 	adapter_handle = of_get_property(dev->node, "ibm,hea-handle",
NULL);
 @@ -3395,8 +3396,6 @@ static int __devinit ehea_probe_adapter(struct 
 of_device *dev,
   goto out_free_ad;
   }
 
 - adapter->pd = EHEA_PD_ID;
 -
 	dev->dev.driver_data = adapter;
 
 
 
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [Powerpc / eHEA] Circular dependency with 2.6.29-rc6

2009-02-25 Thread Jan-Bernd Themann
Hi,

we have investigated this problem but didn't understand the root cause of
it so far.
The things we observed:
- The warning is only shown when the ehea module is loaded while the
machine is booting.
- If you load the module later (modprobe) no warnings are shown
- Machine never actually hangs

We interpret the warning like this:
- The mutex debug facility detects a dependency between port_lock and
  ehea_fw_handles.lock
- ehea_fw_handles.lock is an ehea global lock
- port->port_lock is a lock per network device
- When open is called for a registered network device, port->port_lock
  is taken first, then ehea_fw_handles.lock
- When open is left, these locks are released in the proper (inverse)
  order
- In addition: ehea_fw_handles.lock is held by the function
  driver_probe_device that registers all available network devices
  (register_netdev)
- When multiple network devices are registered, it is possible that
  open is called on an already registered network device while further
  netdevices are still being registered in driver_probe_device.
  => open will take port->port_lock, but won't get ehea_fw_handles.lock
- However, ehea_fw_handles.lock is freed once all netdevices are registered.
- When the second netdevice is registered in driver_probe_device, it
  will also try to get the port->port_lock (which in fact is a
  different one, as there is one per netdevice).
- Does the mutex debug mechanism distinguish between the different
  port->port_lock instances?

So far we don't see a locking problem here. Is it possible that the mutex
debug mechanism causes a false positive here?

Any help is highly appreciated.

Regards
Jan-Bernd

Sachin P. Sant wrote:
 While booting 2.6.29-rc6 on a powerpc box came across this
 circular dependency with eHEA driver.

 ===
 [ INFO: possible circular locking dependency detected ]
 2.6.29-rc6 #2
 ---
 ip/2174 is trying to acquire lock:
 (&ehea_fw_handles.lock){--..}, at: [d2a13e30] .ehea_up+0x64/0x6e0 [ehea]

 but task is already holding lock:
 (&port->port_lock){--..}, at: [d2a1533c] .ehea_open+0x3c/0xc4 [ehea]

 which lock already depends on the new lock.


 the existing dependency chain (in reverse order) is:

 -> #2 (&port->port_lock){--..}:
 [c00a8590] .__lock_acquire+0x7e0/0x8a8
   [c00a86ac] .lock_acquire+0x54/0x80
   [c05d7564] .mutex_lock_nested+0x190/0x46c
   [d2a1533c] .ehea_open+0x3c/0xc4 [ehea]
   [c0537834] .dev_open+0xf4/0x168
   [c0535780] .dev_change_flags+0xe4/0x1e8
   [c0597bfc] .devinet_ioctl+0x2c4/0x750
   [c05997a8] .inet_ioctl+0xcc/0x11c
   [c0523400] .sock_ioctl+0x2f0/0x34c
   [c01380ec] .vfs_ioctl+0x5c/0xf0
   [c0138810] .do_vfs_ioctl+0x690/0x70c
   [c0138900] .SyS_ioctl+0x74/0xb8
   [c016fb08] .dev_ifsioc+0x210/0x4b8
   [c016ef18] .compat_sys_ioctl+0x3f4/0x488
   [c000855c] syscall_exit+0x0/0x40

 -> #1 (rtnl_mutex){--..}:
   [c00a8590] .__lock_acquire+0x7e0/0x8a8
   [c00a86ac] .lock_acquire+0x54/0x80
   [c05d7564] .mutex_lock_nested+0x190/0x46c
   [c05430a8] .rtnl_lock+0x20/0x38
   [c053677c] .register_netdev+0x1c/0x80
   [d2a12714] .ehea_setup_single_port+0x2c8/0x3d0 [ehea]
   [d2a19da8] .ehea_probe_adapter+0x288/0x394 [ehea]
   [c051f034] .of_platform_device_probe+0x78/0x86c
   [c047faec] .driver_probe_device+0x13c/0x200
   [c047fc44] .__driver_attach+0x94/0xd8
   [c047eab4] .bus_for_each_dev+0x80/0xd8
   [c047f850] .driver_attach+0x28/0x40
   [c047f23c] .bus_add_driver+0xd4/0x284
   [c047ff7c] .driver_register+0xc4/0x198
   [c051eeec] .of_register_driver+0x4c/0x60
   [c0024da4] .ibmebus_register_driver+0x30/0x4c
   [d2a1a090] .ehea_module_init+0x1dc/0x234c [ehea]
   [c0009368] .do_one_initcall+0x90/0x1b0
   [c00b2f24] .SyS_init_module+0xc8/0x220
   [c000855c] syscall_exit+0x0/0x40

 -> #0 (&ehea_fw_handles.lock){--..}:
   [c00a8590] .__lock_acquire+0x7e0/0x8a8
   [c00a86ac] .lock_acquire+0x54/0x80
   [c05d7564] .mutex_lock_nested+0x190/0x46c
   [d2a13e30] .ehea_up+0x64/0x6e0 [ehea]
   [d2a15364] .ehea_open+0x64/0xc4 [ehea]
   [c0537834] .dev_open+0xf4/0x168
   [c0535780] .dev_change_flags+0xe4/0x1e8
   [c0597bfc] .devinet_ioctl+0x2c4/0x750
   [c05997a8] .inet_ioctl+0xcc/0x11c
   [c0523400] .sock_ioctl+0x2f0/0x34c
   [c01380ec] .vfs_ioctl+0x5c/0xf0
   [c0138810] .do_vfs_ioctl+0x690/0x70c
   [c0138900] .SyS_ioctl+0x74/0xb8
   [c016fb08] .dev_ifsioc+0x210/0x4b8
   

Re: [Powerpc / eHEA] Circular dependency with 2.6.29-rc6

2009-02-25 Thread Jan-Bernd Themann
Hi,

yes, sorry for the funny wrapping... and thanks for your quick answer!

Peter Zijlstra wrote:
 On Wed, 2009-02-25 at 16:05 +0100, Jan-Bernd Themann wrote:

   
  - When open is called for a registered network device, port->port_lock
    is taken first, then ehea_fw_handles.lock
  - When open is left these locks are released in a proper way (inverse
    order)
 

 So this has:

    port->port_lock
      ehea_fw_handles.lock

 This would be the case that is generating the warning.

   
  - In addition: ehea_fw_handles.lock is held by the function
    driver_probe_device that registers all available network devices
    (register_netdev)
  - When multiple network devices are registered, it is possible that
    open is called on an already registered network device while further
    netdevices are still being registered in driver_probe_device.
    => open will take port->port_lock, but won't get ehea_fw_handles.lock
 

 Right, so here you have 

    ehea_fw_handles.lock
      port->port_lock

 Overlay these two cases and you have AB-BA deadlocks.

   
The thing here is that I did not see that open is called from this probe
function; this probably happens indirectly, as each new device causes a
notifier chain to be called -- if I got it right, a userspace tool triggers
the open. In that case the open would run in another task/thread, and thus
when the kernel preempts that task/thread the probe function would continue
and free the lock.

Let's assume that it is actually possible that open is called in the same
context as probe; wouldn't that mean that we would actually have to hit a
deadlock (probe holds the lock the whole time)? We have never observed a
deadlock so far.

Is there a way to find out if all these locks are actually taken in the
same context (kthread, tasklet, ...)?

  - However, ehea_fw_handles.lock is freed once all netdevices are registered.
  - When the second netdevice is registered in driver_probe_device, it will
    also try to get the port->port_lock (which in fact is a different one,
    as there is one per netdevice).
  - Does the mutex debug mechanism distinguish between the different
    port->port_lock instances?
 

 Not unless you tell it to.
   
  Are you really sure the port->port_lock in this AB-BA scenario are never
 the same? The above explanation didn't convince me (also very hard to
 read due to funny wrapping).
   
I'm not sure, especially as I just ran the same test with just one port and
we still get the warning. But having two port instances accessing the locks
does not look like a problem to me, as they allocate and free the locks
properly (in the right order).

 Suppose you do an open concurrently with a re-probe, which apparently
  takes port->port_lock's of existing devices, in the above scenario that
 deadlocks.

   

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH RFC v5] net: add PCINet driver

2009-01-16 Thread Jan-Bernd Themann
Hi!

Ira Snyder i...@ovro.caltech.edu wrote on 15.01.2009 22:40:56:

 On Thu, Jan 15, 2009 at 10:22:53PM +0100, Arnd Bergmann wrote:
  On Thursday 15 January 2009, Ira Snyder wrote:
   I have another question for you Arnd.
   
    What did you use as the host and guest drivers when you ran virtio
    over PCI? Did you use two unmodified instances of virtio_net (one on
    host, one on guest) for networking, or did you write new virtio
    drivers for those? How about for virtio_console (if you ran it at all).

In addition to what Arnd already told you:
We used two (almost) unmodified instances of virtio_net. When we tested this
driver we modified the change_mtu function slightly, as this function was not
implemented properly (for our purpose) at that time.

  
  Jan-Bernd may be able to tell you details about this, and send you the
  driver code that his interns implemented for it.
  This was only doing virtio_net between two machines using MMIO transfers,
  i.e. the DMA engine was unused, but there was a mailbox interrupt (if you
  have one of these, you won't need MSI, btw -- just write a DMA to it).
  
 
 Thanks. Jan-Bernd, I'm looking forward to any input you have.
 

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption, eHCA is close

2008-09-23 Thread Jan-Bernd Themann
Hi,

I think these are the functional changes that need to be included in
the ibmebus driver. We'll add an RT flag in the final version to enable
these changes only for RT-Linux for now.
Ben, can you / your team look into the implementation
of the set_irq_type functionality needed for XICS?

Regards,
Jan-Bernd & Christoph

diff -Nurp b/arch/powerpc/kernel/ibmebus.c a/arch/powerpc/kernel/ibmebus.c
--- b/arch/powerpc/kernel/ibmebus.c 2008-09-22 00:29:55.0 +0200
+++ a/arch/powerpc/kernel/ibmebus.c 2008-09-23 12:04:53.0 +0200
@@ -216,12 +216,16 @@ int ibmebus_request_irq(u32 ist, irq_han
unsigned long irq_flags, const char *devname,
void *dev_id)
 {
+   int ret;
unsigned int irq = irq_create_mapping(NULL, ist);
 
if (irq == NO_IRQ)
return -EINVAL;
 
-   return request_irq(irq, handler, irq_flags, devname, dev_id);
+   ret = request_irq(irq, handler, irq_flags, devname, dev_id);
+   set_irq_type(irq, IRQ_TYPE_EDGE_RISING);
+
+   return ret;
 }
 EXPORT_SYMBOL(ibmebus_request_irq);
 


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH HACK] powerpc: quick hack to get a functional eHEA with hardirq preemption

2008-09-15 Thread Jan-Bernd Themann
Hi,

we are a bit worried about putting this into the mainstream part of
non-real-time Linux. There, interrupts work perfectly fine, and it was a bit
of a challenge to get there for all cases / configurations / machines.

Could you try to enable these changes only for RT-Linux via a real-time
kconfig switch? This way we make sure we don't break the scheme for
eHEA / eHCA.

Regards,
Jan-Bernd & Christoph

Sebastien Dugue [EMAIL PROTECTED] wrote on 15.09.2008 10:04:06:

 
 WARNING: HACK - HACK - HACK
 
    Under the RT kernel (with hardirq preemption) the eHEA driver hangs
  right after booting. Fiddling with the hardirqs and softirqs priorities
  allows it to run a bit longer, but as soon as the network gets under load,
  the hang returns. After investigating, it appears that the driver is
  losing interrupts.
 
    To make a long story short, looking at the code, it appears that the
  XICS maps all its interrupts to level sensitive interrupts (I don't know
  if it's the reality or if it's due to an incomplete implementation - no
  datasheets available to check) and uses the fasteoi processing flow.
 
    When entering the low level handler, level sensitive interrupts are
  masked, then eoi'd in interrupt context and then unmasked at the end of
  hardirq processing. That's fine, as any interrupt coming in-between will
  still be processed since the kernel replays those pending interrupts.
 
    However, it appears that the eHEA interrupts are behaving as edge
  sensitive interrupts and are routed through the XICS, which processes
  those as level sensitive using the fasteoi handler __OR__ the XICS loses
  interrupts when they are masked.
 
    Therefore the masking done in the handler causes any interrupt
  happening while in the handler to be lost.
 
    So this patch maps the interrupts being requested through
  ibmebus_request_irq() as edge sensitive interrupts (this concerns both
  the eHEA and the eHCA - the only users of ibmebus_request_irq()) and
  changes the way edge interrupts are processed by the fasteoi handler.
 
   It works for the eHEA, dunno for the eHCA.
 
    So, unless all the designers of the XICS & eHEA have been shot to keep
  it a secret, could someone knowledgeable shed some light on this issue.
 
   Thanks,
 
   Sebastien.
 
 Not-Signed-off-by: Sebastien Dugue [EMAIL PROTECTED]
 ---
  arch/powerpc/kernel/ibmebus.c |   11 ++-
  kernel/irq/chip.c |5 +++--
  kernel/irq/manage.c   |9 ++---
  3 files changed, 19 insertions(+), 6 deletions(-)
 
 diff --git a/arch/powerpc/kernel/ibmebus.c 
b/arch/powerpc/kernel/ibmebus.c
 index 9971159..5200323 100644
 --- a/arch/powerpc/kernel/ibmebus.c
 +++ b/arch/powerpc/kernel/ibmebus.c
 @@ -41,6 +41,7 @@
   #include <linux/kobject.h>
   #include <linux/dma-mapping.h>
   #include <linux/interrupt.h>
  +#include <linux/irq.h>
   #include <linux/of.h>
   #include <linux/of_platform.h>
   #include <asm/ibmebus.h>
 @@ -213,11 +214,19 @@ int ibmebus_request_irq(u32 ist, irq_handler_t 
handler,
   void *dev_id)
  {
 unsigned int irq = irq_create_mapping(NULL, ist);
 +   struct irq_desc *desc;
 +   int ret;
 
 if (irq == NO_IRQ)
return -EINVAL;
 
 -   return request_irq(irq, handler, irq_flags, devname, dev_id);
 +   ret = request_irq(irq, handler, irq_flags, devname, dev_id);
 +
 +   desc = irq_desc + irq;
  +   desc->status &= ~(IRQ_TYPE_SENSE_MASK | IRQ_LEVEL);
  +   desc->status |= IRQ_TYPE_EDGE_RISING;
 +
 +   return ret;
  }
  EXPORT_SYMBOL(ibmebus_request_irq);
 
 diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
 index b7b397a..6d366ca 100644
 --- a/kernel/irq/chip.c
 +++ b/kernel/irq/chip.c
 @@ -430,7 +430,7 @@ handle_fasteoi_irq(unsigned int irq, struct 
 irq_desc *desc)
 	action = desc->action;
 	if (unlikely(!action || (desc->status & (IRQ_INPROGRESS |
 					IRQ_DISABLED)))) {
-		desc->status |= IRQ_PENDING;
+		desc->status |= IRQ_PENDING | IRQ_MASKED;
 		if (desc->chip->mask)
 			desc->chip->mask(irq);
 		goto out;
 @@ -439,9 +439,10 @@ handle_fasteoi_irq(unsigned int irq, struct 
 irq_desc *desc)
 	desc->status |= IRQ_INPROGRESS;
 /*
  * In the threaded case we fall back to a mask+eoi sequence:
 +* excepted for edge interrupts which are not masked.
  */
 if (redirect_hardirq(desc)) {
-		if (desc->chip->mask)
+		if (desc->chip->mask && !(desc->status & IRQ_TYPE_EDGE_BOTH))
 			desc->chip->mask(irq);
goto out;
 }
 diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
 index 3bffa20..3e39c71 100644
 --- a/kernel/irq/manage.c
 +++ b/kernel/irq/manage.c
 @@ -788,9 +788,12 @@ static void do_hardirq(struct irq_desc *desc)
thread_simple_irq(desc);
 	else if (desc->handle_irq == handle_level_irq)
 		thread_level_irq(desc);
-	else if (desc->handle_irq == handle_fasteoi_irq)
-		thread_fasteoi_irq(desc);
-	else if (desc->handle_irq == handle_edge_irq)
+	else if (desc->handle_irq == handle_fasteoi_irq) {
+		if (desc->status &

Re: [BUG] 2.6.26-rc8-git2 - powerpc - circular locking dependency detected with net/ehea driver

2008-07-07 Thread Jan-Bernd Themann
Hi Kamalesh,

Were you able to reproduce this problem with the patches we posted on
Friday applied?

Regards,
Jan-Bernd

On Tuesday 01 July 2008 14:38, Kamalesh Babulal wrote:
 Hi,
 
 circular locking dependency is detected, while booting the
 powerpc box with the 2.6.26-rc8-git2 kernel.
 
 ===
 [ INFO: possible circular locking dependency detected ]
 2.6.26-rc8-git2 #1
 ---
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 3/3] ehea: fix race condition

2008-07-03 Thread Jan-Bernd Themann
When ehea_stop is called, the function cancel_work_sync(&port->reset_task)
is used to ensure that the reset task is not running anymore. We need an
additional flag to ensure that it cannot be scheduled again after this
call for a certain time.


Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---

diff -Nurp -X dontdiff linux-2.6.26-rc8/drivers/net/ehea/ehea.h 
patched_kernel/drivers/net/ehea/ehea.h
--- linux-2.6.26-rc8/drivers/net/ehea/ehea.h2008-07-02 16:52:13.0 
+0200
+++ patched_kernel/drivers/net/ehea/ehea.h  2008-07-02 17:01:39.0 
+0200
@@ -40,7 +40,7 @@
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0091"
+#define DRV_VERSION	"EHEA_0092"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -478,6 +478,7 @@ struct ehea_port {
int num_add_tx_qps;
int num_mcs;
int resets;
+   u64 flags;
u64 mac_addr;
u32 logical_port_id;
u32 port_speed;
@@ -501,7 +502,8 @@ struct port_res_cfg {
 };
 
 enum ehea_flag_bits {
-   __EHEA_STOP_XFER
+   __EHEA_STOP_XFER,
+   __EHEA_DISABLE_PORT_RESET
 };
 
 void ehea_set_ethtool_ops(struct net_device *netdev);
diff -Nurp -X dontdiff linux-2.6.26-rc8/drivers/net/ehea/ehea_main.c 
patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.26-rc8/drivers/net/ehea/ehea_main.c   2008-07-02 
17:01:18.0 +0200
+++ patched_kernel/drivers/net/ehea/ehea_main.c 2008-07-02 17:01:39.0 
+0200
@@ -138,6 +138,12 @@ void ehea_dump(void *adr, int len, char 
}
 }
 
+void ehea_schedule_port_reset(struct ehea_port *port)
+{
+	if (!test_bit(__EHEA_DISABLE_PORT_RESET, &port->flags))
+		schedule_work(&port->reset_task);
+}
+
 static void ehea_update_firmware_handles(void)
 {
struct ehea_fw_handle_entry *arr = NULL;
@@ -588,7 +594,7 @@ static int ehea_treat_poll_error(struct 
 			   "Resetting port.", pr->qp->init_attr.qp_nr);
 		ehea_dump(cqe, sizeof(*cqe), "CQE");
 	}
-	schedule_work(&pr->port->reset_task);
+	ehea_schedule_port_reset(pr->port);
return 1;
}
 
@@ -766,7 +772,7 @@ static struct ehea_cqe *ehea_proc_cqes(s
 		ehea_error("Send Completion Error: Resetting port");
 		if (netif_msg_tx_err(pr->port))
 			ehea_dump(cqe, sizeof(*cqe), "Send CQE");
-		schedule_work(&pr->port->reset_task);
+		ehea_schedule_port_reset(pr->port);
break;
}
 
@@ -886,7 +892,7 @@ static irqreturn_t ehea_qp_aff_irq_handl
 		eqe = ehea_poll_eq(port->qp_eq);
 	}
 
-	schedule_work(&port->reset_task);
+	ehea_schedule_port_reset(port);
 
return IRQ_HANDLED;
 }
@@ -2606,13 +2612,14 @@ static int ehea_stop(struct net_device *
if (netif_msg_ifdown(port))
 		ehea_info("disabling port %s", dev->name);
 
+	set_bit(__EHEA_DISABLE_PORT_RESET, &port->flags);
 	cancel_work_sync(&port->reset_task);
-
 	mutex_lock(&port->port_lock);
 	netif_stop_queue(dev);
 	port_napi_disable(port);
 	ret = ehea_down(dev);
 	mutex_unlock(&port->port_lock);
+	clear_bit(__EHEA_DISABLE_PORT_RESET, &port->flags);
return ret;
 }
 
@@ -2942,7 +2949,7 @@ static void ehea_tx_watchdog(struct net_
 
 	if (netif_carrier_ok(dev) &&
 	    !test_bit(__EHEA_STOP_XFER, &ehea_driver_flags))
-		schedule_work(&port->reset_task);
+		ehea_schedule_port_reset(port);
 }
 
 int ehea_sense_adapter_attr(struct ehea_adapter *adapter)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 1/3] ehea: fix might sleep problem

2008-07-03 Thread Jan-Bernd Themann
A mutex has to be replaced by a spinlock as it can be taken from a
context which does not allow sleeping.
The kzalloc flag GFP_KERNEL has to be replaced by GFP_ATOMIC
for the same reason.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---

diff -Nurp -X dontdiff linux-2.6.26-rc8/drivers/net/ehea/ehea.h 
patched_kernel/drivers/net/ehea/ehea.h
--- linux-2.6.26-rc8/drivers/net/ehea/ehea.h2008-06-25 03:58:20.0 
+0200
+++ patched_kernel/drivers/net/ehea/ehea.h  2008-07-02 12:38:05.0 
+0200
@@ -452,7 +452,7 @@ struct ehea_bcmc_reg_entry {
 struct ehea_bcmc_reg_array {
struct ehea_bcmc_reg_entry *arr;
int num_entries;
-   struct mutex lock;
+   spinlock_t lock;
 };
 
 #define EHEA_PORT_UP 1
diff -Nurp -X dontdiff linux-2.6.26-rc8/drivers/net/ehea/ehea_main.c 
patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.26-rc8/drivers/net/ehea/ehea_main.c   2008-06-25 
03:58:20.0 +0200
+++ patched_kernel/drivers/net/ehea/ehea_main.c 2008-07-02 12:38:05.0 
+0200
@@ -241,7 +241,7 @@ static void ehea_update_bcmc_registratio
}
 
if (num_registrations) {
-   arr = kzalloc(num_registrations * sizeof(*arr), GFP_KERNEL);
+   arr = kzalloc(num_registrations * sizeof(*arr), GFP_ATOMIC);
if (!arr)
return;  /* Keep the existing array */
} else
@@ -301,7 +301,7 @@ static struct net_device_stats *ehea_get
 
memset(stats, 0, sizeof(*stats));
 
-   cb2 = kzalloc(PAGE_SIZE, GFP_KERNEL);
+   cb2 = kzalloc(PAGE_SIZE, GFP_ATOMIC);
if (!cb2) {
 		ehea_error("no mem for cb2");
goto out;
@@ -1763,7 +1763,7 @@ static int ehea_set_mac_addr(struct net_
 
memcpy(dev-dev_addr, mac_addr-sa_data, dev-addr_len);
 
-	mutex_lock(&ehea_bcmc_regs.lock);
+	spin_lock(&ehea_bcmc_regs.lock);
 
 	/* Deregister old MAC in pHYP */
 	if (port->state == EHEA_PORT_UP) {
@@ -1785,7 +1785,7 @@ static int ehea_set_mac_addr(struct net_
 
  out_upregs:
 	ehea_update_bcmc_registrations();
-	mutex_unlock(&ehea_bcmc_regs.lock);
+	spin_unlock(&ehea_bcmc_regs.lock);
 out_free:
kfree(cb0);
 out:
@@ -1947,7 +1947,7 @@ static void ehea_set_multicast_list(stru
}
ehea_promiscuous(dev, 0);
 
-	mutex_lock(&ehea_bcmc_regs.lock);
+	spin_lock(&ehea_bcmc_regs.lock);
 
 	if (dev->flags & IFF_ALLMULTI) {
 		ehea_allmulti(dev, 1);
@@ -1978,7 +1978,7 @@ static void ehea_set_multicast_list(stru
 	}
 out:
 	ehea_update_bcmc_registrations();
-	mutex_unlock(&ehea_bcmc_regs.lock);
+	spin_unlock(&ehea_bcmc_regs.lock);
return;
 }
 
@@ -2497,7 +2497,7 @@ static int ehea_up(struct net_device *de
}
}
 
-	mutex_lock(&ehea_bcmc_regs.lock);
+	spin_lock(&ehea_bcmc_regs.lock);
 
 	ret = ehea_broadcast_reg_helper(port, H_REG_BCMC);
 	if (ret) {
@@ -2520,7 +2520,7 @@ out:
 		ehea_info("Failed starting %s. ret=%i", dev->name, ret);
 
 	ehea_update_bcmc_registrations();
-	mutex_unlock(&ehea_bcmc_regs.lock);
+	spin_unlock(&ehea_bcmc_regs.lock);
 
ehea_update_firmware_handles();
 	mutex_unlock(&ehea_fw_handles.lock);
@@ -2575,7 +2575,7 @@ static int ehea_down(struct net_device *
 
 	mutex_lock(&ehea_fw_handles.lock);
 
-	mutex_lock(&ehea_bcmc_regs.lock);
+	spin_lock(&ehea_bcmc_regs.lock);
 	ehea_drop_multicast_list(dev);
 	ehea_broadcast_reg_helper(port, H_DEREG_BCMC);
 
@@ -2584,7 +2584,7 @@ static int ehea_down(struct net_device *
 	port->state = EHEA_PORT_DOWN;
 
 	ehea_update_bcmc_registrations();
-	mutex_unlock(&ehea_bcmc_regs.lock);
+	spin_unlock(&ehea_bcmc_regs.lock);
 
ret = ehea_clean_all_portres(port);
if (ret)
@@ -3590,7 +3590,7 @@ int __init ehea_module_init(void)
 	memset(&ehea_bcmc_regs, 0, sizeof(ehea_bcmc_regs));
 
 	mutex_init(&ehea_fw_handles.lock);
-	mutex_init(&ehea_bcmc_regs.lock);
+	spin_lock_init(&ehea_bcmc_regs.lock);
 
ret = check_module_parm();
if (ret)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 2/3] ehea: add MODULE_DEVICE_TABLE

2008-07-03 Thread Jan-Bernd Themann
Required to allow distros to easily detect when the ehea
module needs to be loaded.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---

diff -Nurp -X dontdiff linux-2.6.26-rc8/drivers/net/ehea/ehea_main.c 
patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.26-rc8/drivers/net/ehea/ehea_main.c   2008-07-02 
13:27:03.0 +0200
+++ patched_kernel/drivers/net/ehea/ehea_main.c 2008-07-02 13:48:21.0 
+0200
@@ -118,6 +118,7 @@ static struct of_device_id ehea_device_t
},
{},
 };
+MODULE_DEVICE_TABLE(of, ehea_device_table);
 
 static struct of_platform_driver ehea_driver = {
 	.name = "ehea",
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH][2.6.26] ehea: set mac address fix

2008-06-09 Thread Jan-Bernd Themann
eHEA has to call firmware functions in order to change the MAC address
of a logical port. This patch checks if the logical port is up before
issuing the register / deregister MAC address calls. If the port
is down, these firmware calls would fail and are therefore not executed.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---

diff -Nurp -X dontdiff linux-2.6.26-rc4/drivers/net/ehea/ehea_main.c 
patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.26-rc4/drivers/net/ehea/ehea_main.c   2008-05-26 
20:08:11.0 +0200
+++ patched_kernel/drivers/net/ehea/ehea_main.c 2008-06-09 15:42:17.0 
+0200
@@ -1766,16 +1766,20 @@ static int ehea_set_mac_addr(struct net_
 	mutex_lock(&ehea_bcmc_regs.lock);
 
 	/* Deregister old MAC in pHYP */
-	ret = ehea_broadcast_reg_helper(port, H_DEREG_BCMC);
-	if (ret)
-		goto out_upregs;
+	if (port->state == EHEA_PORT_UP) {
+		ret = ehea_broadcast_reg_helper(port, H_DEREG_BCMC);
+		if (ret)
+			goto out_upregs;
+	}
 
 	port->mac_addr = cb0->port_mac_addr << 16;
 
 	/* Register new MAC in pHYP */
-	ret = ehea_broadcast_reg_helper(port, H_REG_BCMC);
-	if (ret)
-		goto out_upregs;
+	if (port->state == EHEA_PORT_UP) {
+		ret = ehea_broadcast_reg_helper(port, H_REG_BCMC);
+		if (ret)
+			goto out_upregs;
+	}
 
ret = 0;
 
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 1/2] ibmebus: Change ibmebus_request_irq() to optionally return irq number

2008-06-09 Thread Jan-Bernd Themann
Stefan Roscher [EMAIL PROTECTED] wrote on 09.06.2008 17:44:29:

 Signed-off-by: Stefan Roscher [EMAIL PROTECTED]
 ---
  arch/powerpc/kernel/ibmebus.c|5 -
  drivers/infiniband/hw/ehca/ehca_eq.c |4 ++--
  drivers/net/ehea/ehea_main.c |6 +++---
  include/asm-powerpc/ibmebus.h|2 +-
  4 files changed, 10 insertions(+), 7 deletions(-)
 
 diff --git a/arch/powerpc/kernel/ibmebus.c 
b/arch/powerpc/kernel/ibmebus.c
 index 9971159..a002fdf 100644
 --- a/arch/powerpc/kernel/ibmebus.c
 +++ b/arch/powerpc/kernel/ibmebus.c
 @@ -208,7 +208,7 @@ void ibmebus_unregister_driver(struct 
 of_platform_driver *drv)
  }
  EXPORT_SYMBOL(ibmebus_unregister_driver);
 
 -int ibmebus_request_irq(u32 ist, irq_handler_t handler,
 +int ibmebus_request_irq(u32 ist, int *irq_number, irq_handler_t 
handler,
   unsigned long irq_flags, const char *devname,
   void *dev_id)
  {
 @@ -217,6 +217,9 @@ int ibmebus_request_irq(u32 ist, irq_handler_t 
handler,
 if (irq == NO_IRQ)
return -EINVAL;
 
 +   if (irq_number)
 +  *irq_number = irq;
 +
 return request_irq(irq, handler, irq_flags, devname, dev_id);
  }
  EXPORT_SYMBOL(ibmebus_request_irq);
 diff --git a/drivers/infiniband/hw/ehca/ehca_eq.c 
 b/drivers/infiniband/hw/ehca/ehca_eq.c
 index 49660df..5bc494f 100644
 --- a/drivers/infiniband/hw/ehca/ehca_eq.c
 +++ b/drivers/infiniband/hw/ehca/ehca_eq.c
 @@ -122,7 +122,7 @@ int ehca_create_eq(struct ehca_shca *shca,
 
 /* register interrupt handlers and initialize work queues */
 if (type == EHCA_EQ) {
-		ret = ibmebus_request_irq(eq->ist, ehca_interrupt_eq,
+		ret = ibmebus_request_irq(eq->ist, NULL, ehca_interrupt_eq,
 					  IRQF_DISABLED, "ehca_eq",
 					  (void *)shca);
 		if (ret < 0)
 @@ -130,7 +130,7 @@ int ehca_create_eq(struct ehca_shca *shca,
 
 		tasklet_init(&eq->interrupt_task, ehca_tasklet_eq, (long)shca);
 	} else if (type == EHCA_NEQ) {
-		ret = ibmebus_request_irq(eq->ist, ehca_interrupt_neq,
+		ret = ibmebus_request_irq(eq->ist, NULL, ehca_interrupt_neq,
 					  IRQF_DISABLED, "ehca_neq",
 					  (void *)shca);
 		if (ret < 0)
 diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
 index 287a619..102ffeb 100644
 --- a/drivers/net/ehea/ehea_main.c
 +++ b/drivers/net/ehea/ehea_main.c
 @@ -1216,7 +1216,7 @@ static int ehea_reg_interrupts(struct net_device 
*dev)
 	snprintf(port->int_aff_name, EHEA_IRQ_NAME_SIZE - 1, "%s-aff",
 		 dev->name);
 
-	ret = ibmebus_request_irq(port->qp_eq->attr.ist1,
+	ret = ibmebus_request_irq(port->qp_eq->attr.ist1, NULL,
 				  ehea_qp_aff_irq_handler,
 				  IRQF_DISABLED, port->int_aff_name, port);
 if (ret) {
 @@ -1234,7 +1234,7 @@ static int ehea_reg_interrupts(struct net_device 
*dev)
 		pr = &port->port_res[i];
 		snprintf(pr->int_send_name, EHEA_IRQ_NAME_SIZE - 1,
 			 "%s-queue%d", dev->name, i);
-		ret = ibmebus_request_irq(pr->eq->attr.ist1,
+		ret = ibmebus_request_irq(pr->eq->attr.ist1, NULL,
 					  ehea_recv_irq_handler,
 					  IRQF_DISABLED, pr->int_send_name,
 					  pr);
 @@ -3414,7 +3414,7 @@ static int __devinit ehea_probe_adapter(struct
 of_device *dev,
 	tasklet_init(&adapter->neq_tasklet, ehea_neq_tasklet,
 		     (unsigned long)adapter);
 
-	ret = ibmebus_request_irq(adapter->neq->attr.ist1,
+	ret = ibmebus_request_irq(adapter->neq->attr.ist1, NULL,
 				  ehea_interrupt_neq, IRQF_DISABLED,
 				  "ehea_neq", adapter);
 if (ret) {
 diff --git a/include/asm-powerpc/ibmebus.h 
b/include/asm-powerpc/ibmebus.h
 index 1a9d9ae..3a2618a 100644
 --- a/include/asm-powerpc/ibmebus.h
 +++ b/include/asm-powerpc/ibmebus.h
 @@ -51,7 +51,7 @@ extern struct bus_type ibmebus_bus_type;
  int ibmebus_register_driver(struct of_platform_driver *drv);
  void ibmebus_unregister_driver(struct of_platform_driver *drv);
 
 -int ibmebus_request_irq(u32 ist, irq_handler_t handler,
 +int ibmebus_request_irq(u32 ist, int *irq_number, irq_handler_t 
handler,
   unsigned long irq_flags, const char *devname,
   void *dev_id);
  void ibmebus_free_irq(u32 ist, void *dev_id);
 -- 
 1.5.5
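
(For illustration only, not part of the patch: a caller that actually wants
the Linux IRQ number back -- for example to tweak it with set_irq_type() --
would do something like the following; the handler and names are made up.)

	int irq, ret;

	ret = ibmebus_request_irq(ist, &irq, my_irq_handler,
				  IRQF_DISABLED, "my-dev", dev_id);
	if (ret)
		return ret;
	/* irq now holds the Linux interrupt number mapped from ist */
	set_irq_type(irq, IRQ_TYPE_EDGE_RISING);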
 

Concerning the eHEA part:

Acked-by: Jan-Bernd Themann [EMAIL PROTECTED]

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/1] ehea: Fix use after free on reboot

2008-05-20 Thread Jan-Bernd Themann
On Wednesday 14 May 2008 16:48, Brian King wrote:
 
 Fixes the following use after free oops:
 
 ehea: Reboot: freeing all eHEA resources
 Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6c5b
 Faulting instruction address: 0xd0354488
 cpu 0x0: Vector: 300 (Data Access) at [c0002ec6f310]
 pc: d0354488: .ehea_shutdown_single_port+0x50/0x78 [ehea]
 lr: d035447c: .ehea_shutdown_single_port+0x44/0x78 [ehea]
 sp: c0002ec6f590
msr: 80009032
dar: 6b6b6b6b6b6b6c5b
  dsisr: 4000
   current = 0xc000281412e0
   paca= 0xc06df300
 pid   = 10930, comm = reboot
 enter ? for help
 [c0002ec6f590] d035d64c .ehea_remove+0x44/0x124 [ehea] 
 (unreliable)
 [c0002ec6f630] c0319f88 .of_platform_device_remove+0x40/0x58
 [c0002ec6f6a0] c0291018 .__device_release_driver+0xb0/0xf0
 [c0002ec6f730] c0291120 .driver_detach+0xc8/0xfc
 [c0002ec6f7c0] c028fe24 .bus_remove_driver+0xb4/0x114
 [c0002ec6f850] c0291768 .driver_unregister+0x54/0x74
 [c0002ec6f8e0] c031a0c8 .of_unregister_driver+0x14/0x28
 [c0002ec6f950] c0023ba0 .ibmebus_unregister_driver+0x10/0x24
 [c0002ec6f9c0] d0354180 .ehea_reboot_notifier+0x30/0x4c [ehea]
 [c0002ec6fa40] c03c95a8 .notifier_call_chain+0x5c/0xcc
 [c0002ec6fae0] c0082cd4 .__blocking_notifier_call_chain+0x70/0xb0
 [c0002ec6fb90] c0075cf8 .kernel_restart_prepare+0x24/0x58
 [c0002ec6fc10] c0075f0c .kernel_restart+0x20/0x6c
 [c0002ec6fc90] c0078674 .sys_reboot+0x1d4/0x290
 [c0002ec6fe30] c00086ac syscall_exit+0x0/0x40
 --- Exception: c01 (System Call) at 0ff63a40
 SP (ffceea50) is in userspace
 
 Signed-off-by: Brian King [EMAIL PROTECTED]
 ---
 
  linux-2.6-bjking1/drivers/net/ehea/ehea_main.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff -puN drivers/net/ehea/ehea_main.c~ehea_useafter_free_fix 
 drivers/net/ehea/ehea_main.c
 --- linux-2.6/drivers/net/ehea/ehea_main.c~ehea_useafter_free_fix 
 2008-05-14 09:38:10.0 -0500
 +++ linux-2.6-bjking1/drivers/net/ehea/ehea_main.c2008-05-14 
 09:38:10.0 -0500
 @@ -3177,11 +3177,12 @@ out_err:
 
  static void ehea_shutdown_single_port(struct ehea_port *port)
  {
 + struct ehea_adapter *adapter = port->adapter;
 	unregister_netdev(port->netdev);
 	ehea_unregister_port(port);
 	kfree(port->mc_list);
 	free_netdev(port->netdev);
 - port->adapter->active_ports--;
 + adapter->active_ports--;
  }
 
  static int ehea_setup_ports(struct ehea_adapter *adapter)
 _
 --
 To unsubscribe from this list: send the line unsubscribe netdev in
 the body of a message to [EMAIL PROTECTED]
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

The patch looks good. 

Acked-by: Jan-Bernd Themann [EMAIL PROTECTED]

Thanks,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier

2008-02-18 Thread Jan-Bernd Themann
switching to proper mail client...

Dave Hansen [EMAIL PROTECTED] wrote on 15.02.2008 17:55:38:

 I've been thinking about that, and I don't think you really *need* to
 keep a comprehensive map like that. 
 
 When the memory is in a particular configuration (range of memory
 present along with unique set of holes) you get a unique ehea_bmap
 configuration.  That layout is completely predictable.
 
 So, if at any time you want to figure out what the ehea_bmap address for
 a particular *Linux* virtual address is, you just need to pretend that
 you're creating the entire ehea_bmap, use the same algorithm and figure
 out how you would have placed things, and use that result.
 
 Now, that's going to be a slow, crappy linear search (but maybe not as
 slow as recreating the silly thing).  So, you might eventually run into
 some scalability problems with a lot of packets going around.  But, I'd
 be curious if you do in practice.

Up to 14 address translations per packet (sg_list) might be required on
the transmit side; on the receive side it is only 1. Most packets require
only very few translations (1, sometimes more). However, with more than
700,000 packets per second (so potentially on the order of millions of
lookups per second) this approach does not seem reasonable from a
performance perspective when memory is fragmented as you described.

 
 The other idea is that you create a mapping that is precisely 1:1 with
 kernel memory.  Let's say you have two sections present, 0 and 100.  You
 have a high_section_index of 100, and you vmalloc() a 100 entry array.
 
 You need to create a *CONTIGUOUS* ehea map?  Create one like this:
 
  EHEA_VADDR->Linux Section
  0->0
  1->0
  2->0
  3->0
  ...
  100->100
 
 It's contiguous.  Each area points to a valid Linux memory address.
 It's also discernable in O(1) to what EHEA address a given Linux address
 is mapped.  You just have a couple of duplicate entries. 
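
(My reading of that proposal, as a quick sketch -- hypothetical, not driver
code: one ehea_bmap entry per possible Linux memory section, holes simply
point back at section 0, so the EHEA-side address space stays contiguous
and the lookup for a given kernel address stays O(1).)

	for (i = 0; i < NR_MEM_SECTIONS; i++)
		ehea_bmap[i] = present_section_nr(i) ? i : 0;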

This has a serious issue with a constraint I mentioned in the previous mail:

- MRs can have a maximum size of the memory available under Linux

The requirement that the memory region must not be larger than the
available memory for that partition is not met. The create MR H_CALL
will fail (we tried this and discussed it with FW development).


Regards,
Jan-Bernd & Christoph
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH] drivers/base: export gpl (un)register_memory_notifier

2008-02-13 Thread Jan-Bernd Themann
Hi Dave,

On Monday 11 February 2008 17:47, Dave Hansen wrote:
 Also, just ripping down and completely re-doing the entire mass of cards
 every time a 16MB area of memory is added or removed seems like an
 awfully big sledgehammer to me.  I would *HATE* to see anybody else
 using this driver as an example to work off of?  Can't you just keep
 track of which areas the driver is actually *USING* and only worry about
 changing mappings if that intersects with an area having hotplug done on
 it?


to form a base for the eHEA memory add / remove concept discussion:

Explanation of the current eHEA memory add / remove concept:

Constraints imposed by HW / FW:
- eHEA has its own MMU
- eHEA Memory Regions (MRs) are used by the eHEA MMU to translate virtual
  addresses to absolute addresses (like DMA mapped memory on a PCI bus)
- The number of MRs is limited (not enough to have one MR per packet)
- Registration of MRs is comparatively slow, as it is done via a slow
  firmware call (H_CALL)
- MRs can have a maximum size of the memory available under Linux
- MRs cover a contiguous virtual memory block (no holes)

Because of this there is just one big MR that covers the entire kernel memory.
We also need a mapping table from kernel addresses to this
contiguous virtual memory IO space (here called ehea_bmap).

- When memory is added to / removed from the LPAR (and Linux), the MR has to
  be updated. This can only be done by destroying and recreating the MR; there
  is no H_CALL to modify the MR size. To find holes in the Linux kernel memory
  layout we have to iterate over the memory sections when recreating the
  ehea_bmap (otherwise the MR would be bigger than the available memory,
  causing the registration to fail).

- DLPAR userspace tools, kernel, driver, firmware and HMC are involved in that
  process on System p

Memory add: version without an external memory notifier call
- new memory used in a transfer_xmit will result in an ehea_bmap
  translation miss, which triggers a rebuild and reregistration
  of the ehea_bmap based on the current kernel memory setup.
- advantage: the number of MR rebuilds is reduced significantly compared to
  a rebuild for each 16MB chunk of memory added.

Memory add: version with an external notifier call:
- We still need an ehea_bmap (whatever structure it has)

Memory remove with notifier:
- We have to rebuild the ehea_bmap instantly to remove the pages that are
  no longer available. Without doing that, the firmware (pHYP) cannot remove
  that memory from the LPAR. As we don't know if or how many additional 
  sections are to be removed before the DLPAR user space tool tells the 
  firmware to remove the memory, we can't wait with the rebuild.


Our current understanding of the current Memory Hotplug System is
(please correct me if I'm wrong):

- depends on sparse mem
- only whole memory sections are added / removed
- for each section a memory resource is registered


From the driver side we need:
- some kind of memory notification mechanism.
  For memory add we can live without any external memory notification
  event. For memory remove we do need an external trigger (see explanation
  above).
- a way to iterate over all kernel pages and a way to detect holes in the
  kernel memory layout in order to build up our own ehea_bmap.


Memory notification trigger:
- These triggers exist; an exported register_memory_notifier /
  unregister_memory_notifier would work in this scheme (see the sketch below)

Functions to use while building ehea_bmap + MRs:
- Use either the functions that are used by the memory hotplug system as
  well, that means using the section defines + functions (section_nr_to_pfn,
  pfn_valid)
- Use other functions in kernel/resource.c that are currently not exported,
  like walk_memory_resource (where we would still need the maximum possible
  number of sections, NR_MEM_SECTIONS)
- Maybe some kind of new interface?
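
For reference, a minimal sketch of how the driver side could hook such a
notifier once register_memory_notifier() is exported (callback body left
out; the names mirror what we would use in ehea):

static int ehea_mem_notifier(struct notifier_block *nb,
			     unsigned long action, void *data)
{
	struct memory_notify *arg = data;

	switch (action) {
	case MEM_ONLINE:
	case MEM_OFFLINE:
		/* rebuild ehea_bmap and re-register the MR for the new
		 * layout described by arg->start_pfn / arg->nr_pages */
		break;
	}
	return NOTIFY_OK;
}

static struct notifier_block ehea_mem_nb = {
	.notifier_call = ehea_mem_notifier,
};

/* during module init */
register_memory_notifier(&ehea_mem_nb);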

What would you suggest?

Regards,
Jan-Bernd & Christoph
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] drivers/base: export gpl (un)register_memory_notifier

2008-02-11 Thread Jan-Bernd Themann
Drivers like eHEA need memory notifiers in order to 
update their internal DMA memory map when memory is added
to or removed from the system.

Patch for eHEA memory hotplug support that uses these functions:
http://www.spinics.net/lists/netdev/msg54484.html

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 Hi,

 the eHEA patch belongs to a patchset that is usually
 added by Jeff Garzik once this dependency (EXPORTS)
 is resolved.

 Regards,
 Jan-Bernd


 drivers/base/memory.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7ae413f..f5a0bf1 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -52,11 +52,13 @@ int register_memory_notifier(struct notifier_block *nb)
 {
 return blocking_notifier_chain_register(memory_chain, nb);
 }
+EXPORT_SYMBOL_GPL(register_memory_notifier);
 
 void unregister_memory_notifier(struct notifier_block *nb)
 {
 blocking_notifier_chain_unregister(memory_chain, nb);
 }
+EXPORT_SYMBOL_GPL(unregister_memory_notifier);
 
 /*
  * register_memory - Setup a sysfs device for a memory block
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] drivers/base: export gpl (un)register_memory_notifier

2008-02-11 Thread Jan-Bernd Themann
Drivers like eHEA need memory notifiers in order to 
update their internal DMA memory map when memory is added
to or removed from the system.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 Hi,

 this is the modified version with EXPORT_SYMBOL_GPL

 Regards,
 Jan-Bernd


 drivers/base/memory.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7ae413f..f5a0bf1 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -52,11 +52,13 @@ int register_memory_notifier(struct notifier_block *nb)
 {
 return blocking_notifier_chain_register(memory_chain, nb);
 }
+EXPORT_SYMBOL_GPL(register_memory_notifier);
 
 void unregister_memory_notifier(struct notifier_block *nb)
 {
 blocking_notifier_chain_unregister(memory_chain, nb);
 }
+EXPORT_SYMBOL_GPL(unregister_memory_notifier);
 
 /*
  * register_memory - Setup a sysfs device for a memory block
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH][RESEND] drivers/base: export (un)register_memory_notifier

2008-02-11 Thread Jan-Bernd Themann
On Monday 11 February 2008 11:12, Dave Hansen wrote:
 On Mon, 2008-02-11 at 10:49 +0100, Jan-Bernd Themann wrote:
  are you the right person to address this patch to?
 
 You might want to check the top of the file. ;)
 
  --- a/drivers/base/memory.c
  +++ b/drivers/base/memory.c
  @@ -52,11 +52,13 @@ int register_memory_notifier(struct notifier_block *nb)
   {
   return blocking_notifier_chain_register(memory_chain, nb);
   }
  +EXPORT_SYMBOL(register_memory_notifier);
   
   void unregister_memory_notifier(struct notifier_block *nb)
   {
   blocking_notifier_chain_unregister(memory_chain, nb);
   }
  +EXPORT_SYMBOL(unregister_memory_notifier);
 
 Is there a particular reason these can't be GPL?
 

I don't object to make them GPL. Greg, what do you think?

Regards,
Jan-Bernd

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH][RESEND] drivers/base: export (un)register_memory_notifier

2008-02-11 Thread Jan-Bernd Themann
Drivers like eHEA need memory notifiers in order to 
update their internal DMA memory map when memory is added
to or removed from the system.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
Hi Greg,

are you the right person to address this patch to?

Regards,
Jan-Bernd

 drivers/base/memory.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 7ae413f..1e1bd4c 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -52,11 +52,13 @@ int register_memory_notifier(struct notifier_block *nb)
 {
 return blocking_notifier_chain_register(memory_chain, nb);
 }
+EXPORT_SYMBOL(register_memory_notifier);
 
 void unregister_memory_notifier(struct notifier_block *nb)
 {
 blocking_notifier_chain_unregister(memory_chain, nb);
 }
+EXPORT_SYMBOL(unregister_memory_notifier);
 
 /*
  * register_memory - Setup a sysfs device for a memory block
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 0/2] ehea: kdump memory remove support

2008-02-04 Thread Jan-Bernd Themann
This patch set adds support for kdump and hotplug memory remove
to the eHEA driver. 

The memory remove patch depends on the following patch that
has been posted a few days ago. That patch exports the symbols
 - register_memory_notifier()
 - unregister_memory_notifier()

http://lkml.org/lkml/2008/2/1/293

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 2/2] ehea: add memory remove hotplug support

2008-02-04 Thread Jan-Bernd Themann
On Monday 04 February 2008 15:46, Michael Ellerman wrote:
 On Mon, 2008-02-04 at 14:04 +0100, Jan-Bernd Themann wrote:
  Add memory remove hotplug support

  @@ -3559,6 +3578,10 @@ int __init ehea_module_init(void)
  if (ret)
  ehea_info("failed registering reboot notifier");
   
  +   ret = register_memory_notifier(&ehea_mem_nb);
  +   if (ret)
  +   ehea_info("failed registering memory remove notifier");
  
  ret = crash_shutdown_register(ehea_crash_handler);
  if (ret)
  ehea_info("failed registering crash handler");
 
 You don't do anything except print a message if the registration fails.
 What happens when someone tries to remove memory but the memory notifier
 wasn't registered properly? Bang?

In case the registration fails and somebody tries to free memory:
- Driver will not remove the affected memory from the eHEA memory region
  -- Firmware (pHYP) cannot free that memory (as it is still marked as used)
  -- Therefore the removed memory cannot be used in another partition

It makes sense to allow the driver to work anyway. Having no ethernet
would not really be a good alternative.

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] ehea: fix sysfs link compile problem

2008-02-01 Thread Jan-Bernd Themann
Due to changes in the struct device_driver there is no direct
access to its kobj any longer. The kobj was used to create
sysfs links between eHEA ethernet devices and the driver.
This patch removes the affected sysfs links to resolve
the build problems.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 drivers/net/ehea/ehea_main.c |   37 -
 1 files changed, 0 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 869e160..9a3fd81 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2804,34 +2804,6 @@ static void __devinit logical_port_release(struct device 
*dev)
of_node_put(port-ofdev.node);
 }
 
-static int ehea_driver_sysfs_add(struct device *dev,
-struct device_driver *driver)
-{
-   int ret;
-
-   ret = sysfs_create_link(driver-kobj, dev-kobj,
-   kobject_name(dev-kobj));
-   if (ret == 0) {
-   ret = sysfs_create_link(dev-kobj, driver-kobj,
-   driver);
-   if (ret)
-   sysfs_remove_link(driver-kobj,
- kobject_name(dev-kobj));
-   }
-   return ret;
-}
-
-static void ehea_driver_sysfs_remove(struct device *dev,
-struct device_driver *driver)
-{
-   struct device_driver *drv = driver;
-
-   if (drv) {
-   sysfs_remove_link(drv-kobj, kobject_name(dev-kobj));
-   sysfs_remove_link(dev-kobj, driver);
-   }
-}
-
 static struct device *ehea_register_port(struct ehea_port *port,
 struct device_node *dn)
 {
@@ -2856,16 +2828,8 @@ static struct device *ehea_register_port(struct 
ehea_port *port,
goto out_unreg_of_dev;
}
 
-   ret = ehea_driver_sysfs_add(port-ofdev.dev, ehea_driver.driver);
-   if (ret) {
-   ehea_error(failed to register sysfs driver link);
-   goto out_rem_dev_file;
-   }
-
return port-ofdev.dev;
 
-out_rem_dev_file:
-   device_remove_file(port-ofdev.dev, dev_attr_log_port_id);
 out_unreg_of_dev:
of_device_unregister(port-ofdev);
 out:
@@ -2874,7 +2838,6 @@ out:
 
 static void ehea_unregister_port(struct ehea_port *port)
 {
-   ehea_driver_sysfs_remove(port-ofdev.dev, ehea_driver.driver);
device_remove_file(port-ofdev.dev, dev_attr_log_port_id);
of_device_unregister(port-ofdev);
 }
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] ehea: add kexec support

2007-10-26 Thread Jan-Bernd Themann
eHEA resources that are allocated via H_CALLs each have a unique identifier.
These identifiers are necessary to free the resources. A reboot notifier
is used to free all eHEA resources before the identifiers get lost, i.e.
before kexec starts a new kernel.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |   21 +
 2 files changed, 22 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 4b4b74e..f78e5bf 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -40,7 +40,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0079
+#define DRV_VERSIONEHEA_0080
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 0a7e789..f0319f1 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -33,6 +33,9 @@
 #include <linux/if.h>
 #include <linux/list.h>
 #include <linux/if_ether.h>
+#include <linux/notifier.h>
+#include <linux/reboot.h>
+
 #include <net/ip.h>
 
 #include "ehea.h"
@@ -3295,6 +3298,20 @@ static int __devexit ehea_remove(struct of_device *dev)
return 0;
 }
 
+static int ehea_reboot_notifier(struct notifier_block *nb,
+   unsigned long action, void *unused)
+{
+   if (action == SYS_RESTART) {
+   ehea_info("Reboot: freeing all eHEA resources");
+   ibmebus_unregister_driver(&ehea_driver);
+   }
+   return NOTIFY_DONE;
+}
+
+static struct notifier_block ehea_reboot_nb = {
+.notifier_call = ehea_reboot_notifier,
+};
+
 static int check_module_parm(void)
 {
int ret = 0;
@@ -3351,6 +3368,8 @@ int __init ehea_module_init(void)
if (ret)
goto out;
 
+   register_reboot_notifier(&ehea_reboot_nb);
+
ret = ibmebus_register_driver(&ehea_driver);
if (ret) {
ehea_error("failed registering eHEA device driver on ebus");
@@ -3362,6 +3381,7 @@ int __init ehea_module_init(void)
if (ret) {
ehea_error("failed to register capabilities attribute, ret=%d",
   ret);
+   unregister_reboot_notifier(&ehea_reboot_nb);
ibmebus_unregister_driver(&ehea_driver);
goto out;
}
@@ -3375,6 +3395,7 @@ static void __exit ehea_module_exit(void)
flush_scheduled_work();
driver_remove_file(&ehea_driver.driver, &driver_attr_capabilities);
ibmebus_unregister_driver(&ehea_driver);
+   unregister_reboot_notifier(&ehea_reboot_nb);
ehea_destroy_busmap();
 }
 
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] ehea: fix port_napi_disable/enable

2007-10-24 Thread Jan-Bernd Themann
napi_disable / napi_enable must be applied on all ehea queues.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |7 +++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index b557bb4..4b4b74e 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -40,7 +40,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0078
+#define DRV_VERSIONEHEA_0079
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index fe5ffac..a8b05d2 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2329,7 +2329,7 @@ static void port_napi_disable(struct ehea_port *port)
 {
int i;
 
-   for (i = 0; i  port-num_def_qps; i++)
+   for (i = 0; i  port-num_def_qps + port-num_add_tx_qps; i++)
napi_disable(port-port_res[i].napi);
 }
 
@@ -2337,7 +2337,7 @@ static void port_napi_enable(struct ehea_port *port)
 {
int i;
 
-   for (i = 0; i  port-num_def_qps; i++)
+   for (i = 0; i  port-num_def_qps + port-num_add_tx_qps; i++)
napi_enable(port-port_res[i].napi);
 }
 
@@ -2373,8 +2373,6 @@ static int ehea_down(struct net_device *dev)
ehea_drop_multicast_list(dev);
ehea_free_interrupts(dev);
 
-   port_napi_disable(port);
-
port-state = EHEA_PORT_DOWN;
 
ret = ehea_clean_all_portres(port);
@@ -2396,6 +2394,7 @@ static int ehea_stop(struct net_device *dev)
flush_scheduled_work();
down(port-port_lock);
netif_stop_queue(dev);
+   port_napi_disable(port);
ret = ehea_down(dev);
up(port-port_lock);
return ret;
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: [PATCH 4/5] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers

2007-10-09 Thread Jan-Bernd Themann
Roland Dreier [EMAIL PROTECTED] wrote on 03.10.2007 20:05:44:

Replace struct ibmebus_dev and struct ibmebus_driver with 
 struct of_device
and struct of_platform_driver, respectively. Match the external 
ibmebus
interface and drivers using it.

Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
   
   If not, then you need to get an Acked-by and an agreement that this
   change can go via the powerpc.git tree from Roland Dreier and Jeff
   Garzik.
 
 I don't see anything objectionable in the infiniband parts of the
 patch -- I don't have any way to test the changes but it all looks
 like a straightforward conversion to a new platform API.  So:
 
 Acked-by: Roland Dreier [EMAIL PROTECTED]
 
  - R.

Looks good from eHEA driver perspective.

Acked-by: Jan-Bernd Themann [EMAIL PROTECTED]___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] ehea: use kernel event queue

2007-10-08 Thread Jan-Bernd Themann
eHEA recovery and DLPAR functions are rarely called. The eHEA workqueues
are therefore replaced by the kernel event queue.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
The patch has been built against upstream git

 drivers/net/ehea/ehea.h  |3 +--
 drivers/net/ehea/ehea_main.c |   28 
 drivers/net/ehea/ehea_qmr.c  |3 +--
 3 files changed, 10 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 3022089..ac21526 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -40,7 +40,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0077
+#define DRV_VERSIONEHEA_0078
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -391,7 +391,6 @@ struct ehea_adapter {
struct ibmebus_dev *ebus_dev;
struct ehea_port *port[EHEA_MAX_PORTS];
struct ehea_eq *neq;   /* notification event queue */
-   struct workqueue_struct *ehea_wq;
struct tasklet_struct neq_tasklet;
struct ehea_mr mr;
u32 pd;/* protection domain */
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 5bc0a15..2ba57e6 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -94,7 +94,6 @@ MODULE_PARM_DESC(use_lro,  Large Receive Offload, 1: enable, 
0: disable, 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
 u64 ehea_driver_flags = 0;
-struct workqueue_struct *ehea_driver_wq;
 struct work_struct ehea_rereg_mr_task;
 
 struct semaphore dlpar_mem_lock;
@@ -421,7 +420,7 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, 
int rq,
 
if (cqe-status  EHEA_CQE_STAT_FAT_ERR_MASK) {
ehea_error(Critical receive error. Resetting port.);
-   queue_work(pr-port-adapter-ehea_wq, pr-port-reset_task);
+   schedule_work(pr-port-reset_task);
return 1;
}
 
@@ -596,8 +595,7 @@ static struct ehea_cqe *ehea_proc_cqes(struct ehea_port_res 
*pr, int my_quota)
ehea_error(Send Completion Error: Resetting port);
if (netif_msg_tx_err(pr-port))
ehea_dump(cqe, sizeof(*cqe), Send CQE);
-   queue_work(pr-port-adapter-ehea_wq,
-  pr-port-reset_task);
+   schedule_work(pr-port-reset_task);
break;
}
 
@@ -716,7 +714,7 @@ static irqreturn_t ehea_qp_aff_irq_handler(int irq, void 
*param)
eqe = ehea_poll_eq(port-qp_eq);
}
 
-   queue_work(port-adapter-ehea_wq, port-reset_task);
+   schedule_work(port-reset_task);
 
return IRQ_HANDLED;
 }
@@ -2395,7 +2393,7 @@ static int ehea_stop(struct net_device *dev)
if (netif_msg_ifdown(port))
ehea_info(disabling port %s, dev-name);
 
-   flush_workqueue(port-adapter-ehea_wq);
+   flush_scheduled_work();
down(port-port_lock);
netif_stop_queue(dev);
ret = ehea_down(dev);
@@ -2710,7 +2708,7 @@ static void ehea_tx_watchdog(struct net_device *dev)
 
if (netif_carrier_ok(dev) 
!test_bit(__EHEA_STOP_XFER, ehea_driver_flags))
-   queue_work(port-adapter-ehea_wq, port-reset_task);
+   schedule_work(port-reset_task);
 }
 
 int ehea_sense_adapter_attr(struct ehea_adapter *adapter)
@@ -3243,15 +3241,9 @@ static int __devinit ehea_probe_adapter(struct 
ibmebus_dev *dev,
goto out_kill_eq;
}
 
-   adapter-ehea_wq = create_workqueue(ehea_wq);
-   if (!adapter-ehea_wq) {
-   ret = -EIO;
-   goto out_free_irq;
-   }
-
ret = ehea_create_device_sysfs(dev);
if (ret)
-   goto out_kill_wq;
+   goto out_free_irq;
 
ret = ehea_setup_ports(adapter);
if (ret) {
@@ -3265,9 +3257,6 @@ static int __devinit ehea_probe_adapter(struct 
ibmebus_dev *dev,
 out_rem_dev_sysfs:
ehea_remove_device_sysfs(dev);
 
-out_kill_wq:
-   destroy_workqueue(adapter-ehea_wq);
-
 out_free_irq:
ibmebus_free_irq(NULL, adapter-neq-attr.ist1, adapter);
 
@@ -3293,7 +3282,7 @@ static int __devexit ehea_remove(struct ibmebus_dev *dev)
 
ehea_remove_device_sysfs(dev);
 
-   destroy_workqueue(adapter-ehea_wq);
+   flush_scheduled_work();
 
ibmebus_free_irq(NULL, adapter-neq-attr.ist1, adapter);
tasklet_kill(adapter-neq_tasklet);
@@ -3351,7 +3340,6 @@ int __init ehea_module_init(void)
printk(KERN_INFO IBM eHEA ethernet device driver (Release %s)\n,
   DRV_VERSION);
 
-   ehea_driver_wq = create_workqueue(ehea_driver_wq);
 
INIT_WORK(ehea_rereg_mr_task, ehea_rereg_mrs);
sema_init(dlpar_mem_lock, 1);
@@ -3385,7 +3373,7 @@ out:
 
 static void __exit ehea_module_exit(void)
 {
-   destroy_workqueue(ehea_driver_wq

[PATCH] ehea: DLPAR memory add fix

2007-10-01 Thread Jan-Bernd Themann
Due to stability issues in high load situations the HW queue handling
has to be changed. The HW queues are now stopped and restarted again instead
of being destroyed and newly allocated.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |4 +-
 drivers/net/ehea/ehea_main.c |  276 +-
 drivers/net/ehea/ehea_phyp.h |1 +
 drivers/net/ehea/ehea_qmr.c  |   20 ++--
 drivers/net/ehea/ehea_qmr.h  |4 +-
 5 files changed, 259 insertions(+), 46 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index c0cbd94..3022089 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -40,13 +40,13 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0074
+#define DRV_VERSIONEHEA_0077
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
 #define DLPAR_MEM_ADD  2
 #define DLPAR_MEM_REM  4
-#define EHEA_CAPABILITIES  (DLPAR_PORT_ADD_REM)
+#define EHEA_CAPABILITIES  (DLPAR_PORT_ADD_REM | DLPAR_MEM_ADD)
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 62d6c1e..5bc0a15 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -97,6 +97,7 @@ u64 ehea_driver_flags = 0;
 struct workqueue_struct *ehea_driver_wq;
 struct work_struct ehea_rereg_mr_task;
 
+struct semaphore dlpar_mem_lock;
 
 static int __devinit ehea_probe_adapter(struct ibmebus_dev *dev,
const struct of_device_id *id);
@@ -177,16 +178,24 @@ static void ehea_refill_rq1(struct ehea_port_res *pr, int 
index, int nr_of_wqes)
struct sk_buff **skb_arr_rq1 = pr-rq1_skba.arr;
struct net_device *dev = pr-port-netdev;
int max_index_mask = pr-rq1_skba.len - 1;
+   int fill_wqes = pr-rq1_skba.os_skbs + nr_of_wqes;
+   int adder = 0;
int i;
 
-   if (!nr_of_wqes)
+   pr-rq1_skba.os_skbs = 0;
+
+   if (unlikely(test_bit(__EHEA_STOP_XFER, ehea_driver_flags))) {
+   pr-rq1_skba.index = index;
+   pr-rq1_skba.os_skbs = fill_wqes;
return;
+   }
 
-   for (i = 0; i  nr_of_wqes; i++) {
+   for (i = 0; i  fill_wqes; i++) {
if (!skb_arr_rq1[index]) {
skb_arr_rq1[index] = netdev_alloc_skb(dev,
  EHEA_L_PKT_SIZE);
if (!skb_arr_rq1[index]) {
+   pr-rq1_skba.os_skbs = fill_wqes - i;
ehea_error(%s: no mem for skb/%d wqes filled,
   dev-name, i);
break;
@@ -194,9 +203,14 @@ static void ehea_refill_rq1(struct ehea_port_res *pr, int 
index, int nr_of_wqes)
}
index--;
index = max_index_mask;
+   adder++;
}
+
+   if (adder == 0)
+   return;
+
/* Ring doorbell */
-   ehea_update_rq1a(pr-qp, i);
+   ehea_update_rq1a(pr-qp, adder);
 }
 
 static int ehea_init_fill_rq1(struct ehea_port_res *pr, int nr_rq1a)
@@ -230,16 +244,21 @@ static int ehea_refill_rq_def(struct ehea_port_res *pr,
struct sk_buff **skb_arr = q_skba-arr;
struct ehea_rwqe *rwqe;
int i, index, max_index_mask, fill_wqes;
+   int adder = 0;
int ret = 0;
 
fill_wqes = q_skba-os_skbs + num_wqes;
+   q_skba-os_skbs = 0;
 
-   if (!fill_wqes)
+   if (unlikely(test_bit(__EHEA_STOP_XFER, ehea_driver_flags))) {
+   q_skba-os_skbs = fill_wqes;
return ret;
+   }
 
index = q_skba-index;
max_index_mask = q_skba-len - 1;
for (i = 0; i  fill_wqes; i++) {
+   u64 tmp_addr;
struct sk_buff *skb = netdev_alloc_skb(dev, packet_size);
if (!skb) {
ehea_error(%s: no mem for skb/%d wqes filled,
@@ -251,30 +270,37 @@ static int ehea_refill_rq_def(struct ehea_port_res *pr,
skb_reserve(skb, NET_IP_ALIGN);
 
skb_arr[index] = skb;
+   tmp_addr = ehea_map_vaddr(skb-data);
+   if (tmp_addr == -1) {
+   dev_kfree_skb(skb);
+   q_skba-os_skbs = fill_wqes - i;
+   ret = 0;
+   break;
+   }
 
rwqe = ehea_get_next_rwqe(qp, rq_nr);
rwqe-wr_id = EHEA_BMASK_SET(EHEA_WR_ID_TYPE, wqe_type)
| EHEA_BMASK_SET(EHEA_WR_ID_INDEX, index);
rwqe-sg_list[0].l_key = pr-recv_mr.lkey;
-   rwqe-sg_list[0].vaddr = ehea_map_vaddr(skb-data);
+   rwqe-sg_list[0].vaddr = tmp_addr;
rwqe-sg_list[0].len = packet_size;
rwqe

[PATCH net-2.6.24] eHEA: poll function update for new NAPI scheme

2007-09-19 Thread Jan-Bernd Themann
Update of ehea_poll function to work with new NAPI scheme.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
Hi David,

this patch is built upon the patches provided by Mel Gorman
(2.6.23-rc6-mm1: Build failure on ppc64 drivers/net/ehea/ehea_main.c)
and Roland Dreier 
([PATCH net-2.6.24] Fix refcounting problem with netif_rx_reschedule())


 drivers/net/ehea/ehea_main.c |   29 +++--
 1 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index c5fc0b1..1bb39a7 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -609,6 +609,7 @@ static struct ehea_cqe *ehea_proc_cqes(struct ehea_port_res 
*pr, int my_quota)
 }
 
 #define EHEA_NAPI_POLL_NUM_BEFORE_IRQ 16
+#define EHEA_POLL_MAX_CQES 65535
 
 static int ehea_poll(struct napi_struct *napi, int budget)
 {
@@ -616,15 +617,18 @@ static int ehea_poll(struct napi_struct *napi, int budget)
struct net_device *dev = pr-port-netdev;
struct ehea_cqe *cqe;
struct ehea_cqe *cqe_skb = NULL;
-   int force_irq, wqe_index, rx;
-
-   cqe = ehea_poll_rq1(pr-qp, wqe_index);
-   cqe_skb = ehea_poll_cq(pr-send_cq);
+   int force_irq, wqe_index;
+   int rx = 0;
 
force_irq = (pr-poll_counter  EHEA_NAPI_POLL_NUM_BEFORE_IRQ);
+   cqe_skb = ehea_proc_cqes(pr, EHEA_POLL_MAX_CQES);
+
+   if (!force_irq)
+   rx += ehea_proc_rwqes(dev, pr, budget - rx);
 
-   if ((!cqe  !cqe_skb) || force_irq) {
+   while ((rx != budget) || force_irq) {
pr-poll_counter = 0;
+   force_irq = 0;
netif_rx_complete(dev, napi);
ehea_reset_cq_ep(pr-recv_cq);
ehea_reset_cq_ep(pr-send_cq);
@@ -634,19 +638,16 @@ static int ehea_poll(struct napi_struct *napi, int budget)
cqe_skb = ehea_poll_cq(pr-send_cq);
 
if (!cqe  !cqe_skb)
-   return 0;
+   return rx;
 
if (!netif_rx_reschedule(dev, napi))
-   return 0;
-   }
+   return rx;
 
-   rx = ehea_proc_rwqes(dev, pr, budget);
-   cqe = ehea_poll_rq1(pr-qp, wqe_index);
-   cqe_skb = ehea_proc_cqes(pr, 300);
-
-   if (cqe || cqe_skb)
-   pr-poll_counter++;
+   cqe_skb = ehea_proc_cqes(pr, EHEA_POLL_MAX_CQES);
+   rx += ehea_proc_rwqes(dev, pr, budget - rx);
+   }
 
+   pr-poll_counter++;
return rx;
 }
 
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 2/2][RESEND] ehea: fix last_rx update

2007-09-07 Thread Jan-Bernd Themann
Update last_rx in registered device struct instead of
in the dummy device.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 1e9fd6f..717b129 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -471,7 +471,7 @@ static struct ehea_cqe *ehea_proc_rwqes(struct net_device 
*dev,
else
netif_receive_skb(skb);
 
-   dev-last_rx = jiffies;
+   port-netdev-last_rx = jiffies;
} else {
pr-p_stats.poll_receive_errors++;
port_reset = ehea_treat_poll_error(pr, rq, cqe,
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-29 Thread Jan-Bernd Themann
Hi David

David Miller schrieb:
 Interrupt mitigation only works if it helps you avoid interrupts.
 This scheme potentially makes more of them happen.

 The hrtimer is just another interrupt, a cpu locally triggered one,
 but it has much of the same costs nonetheless.

 So if you set this timer, it triggers, and no packets arrive, you are
 taking more interrupts and doing more work than if you had disabled
 NAPI.

 In fact, for certain packet rates, your scheme would result in
 twice as many interrupts than the current scheme
   
That depends on how smartly the driver switches between timer
polling and plain NAPI (depending on the load situation).
 This is one of several reasons why hardware is the only truly proper
 place for this kind of logic.  Only the hardware can see the packet
 arrive, and do the interrupt deferral without any cpu intervention
 whatsoever.
   
What I'm trying to improve with this approach is interrupt
mitigation for NICs where the hardware support for interrupt
mitigation is limited. I'm not trying to improve this for NICs
that work well with the means their HW provides. I'm aware of
the fact that this scheme has its tradeoffs and certainly
cannot be as good as a HW approach.
So I'm grateful for any ideas that have fewer tradeoffs and
provide a mechanism to reduce interrupts without depending on
HW support of the NIC.

In the end I want to reduce the CPU utilization. And one way
to do that is LRO, which also only works well if there are more
than just a very few packets to aggregate. So at least our
driver (eHEA) would benefit from a mix of timer based polling
and plain NAPI (depending on load situations).

If there is no need for a generic mechanism for this kind of
network adapters, then we can just leave this to each device
driver.




___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-29 Thread Jan-Bernd Themann
On Wednesday 29 August 2007 10:29, David Miller wrote:
 From: Jan-Bernd Themann [EMAIL PROTECTED]
 Date: Wed, 29 Aug 2007 09:10:15 +0200
 
  In the end I want to reduce the CPU utilization. And one way
  to do that is LRO which also works only well if there are more
  then just a very few packets to aggregate. So at least our
  driver (eHEA) would benefit from a mix of timer based polling
  and plain NAPI (depending on load situations).
  
  If there is no need for a generic mechanism for this kind of
  network adapters, then we can just leave this to each device
  driver.
 
 No objections from me either way, if something works then
 fine.
 
 Let's come back to this once you have a tested sample implementation
 that does what you want, ok?

Sounds good
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-28 Thread Jan-Bernd Themann
On Monday 27 August 2007 22:37, David Miller wrote:
 From: Jan-Bernd Themann [EMAIL PROTECTED]
 Date: Mon, 27 Aug 2007 11:47:01 +0200
 
  So the question is simply: Do we want drivers that need (benefit
  from) a timer based polling support to implement their own timers
  each, or should there be a generic support?
 
 I'm trying to figure out how an hrtimer implementation would
 even work.
 
 Would you start the timer from the chip interrupt handler?  If so,
 that's taking two steps backwards as you've already taken all of the
 overhead of running the interrupt handler.

I'm also still trying to understand how hrtimers work exactly.
The implementation of hrtimers for P6 has not been finished yet, so
I can't do experiments with hrtimers and eHEA now.

I will try the following scheme (once we get hrtimers):
Each device (queue) has an hrtimer.
Schedule the timer in the poll function instead of reactivating IRQs
when a high load situation has been detected and all packets have
been emptied from the receive queue.
The timer function could then just call netif_rx_schedule to register
the rx_queue for NAPI again (a rough sketch follows below the list).

The advantages of this scheme (if it works as I understood it) would be:
- we don't have to modify NAPI
- benefit from fairness among rx_queues / network devices
- the poll function can decide how long to stick to the timer based
  polling mode, and when to switch back to its HW IRQs
- the driver can determine the time to wait based on the receive queue length
  and speed
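
A rough, untested sketch of that scheme (the poll_timer field and the
under_high_load check are made up for illustration; ehea_proc_rwqes and the
CQ helpers are the existing driver functions):

static enum hrtimer_restart ehea_poll_timer_fn(struct hrtimer *timer)
{
	struct ehea_port_res *pr = container_of(timer, struct ehea_port_res,
						poll_timer);

	/* put the queue back on the NAPI poll list without a HW interrupt */
	netif_rx_schedule(pr->port->netdev, &pr->napi);
	return HRTIMER_NORESTART;
}

static int ehea_poll_sketch(struct napi_struct *napi, int budget)
{
	struct ehea_port_res *pr = container_of(napi, struct ehea_port_res,
						napi);
	struct net_device *dev = pr->port->netdev;
	int rx = ehea_proc_rwqes(dev, pr, budget);

	if (rx < budget) {
		netif_rx_complete(dev, napi);
		if (pr->under_high_load)
			/* poll again in e.g. 50us instead of re-enabling IRQs */
			hrtimer_start(&pr->poll_timer, ktime_set(0, 50 * 1000),
				      HRTIMER_MODE_REL);
		else
			ehea_reset_cq_ep(pr->recv_cq);	/* re-arm HW interrupts */
	}
	return rx;
}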

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-28 Thread Jan-Bernd Themann
Hi

On Monday 27 August 2007 23:02, David Miller wrote:
 But there are huger fish to fry for you I think.  Talk to your
 platform maintainers and ask for an interface for obtaining
 a flat static distribution of interrupts to cpus in order to
 support multiqueue NAPI better.
 
 In your previous postings you made arguments saying that the
 automatic placement of interrupts to cpus made everything
 bunch of to a single cpu and you wanted to propagate the
 NAPI work to other cpu's software interrupts from there.
 
 That logic is bogus, because it merely proves that the hardware
 interrupt distribution is broken.  If it's a bad cpu to run
 software interrupts on, it's also a bad cpu to run hardware
 interrupts on.
 

As already mentioned some mails were mixed up here. To clarify the interrupt
issue that has nothing to do with the reduction of interrupts:
- Interrupts are distributed the round robin way on IBM POWER6 processors
- Interrupt distribution can be modified by user/daemons (smp_affinity, pinning)
- NAPI is scheduled on the CPU where the interrupt is caught
- NAPI polls on that CPU as long as the poll has packets to process (default)
(David please correct if there is a misunderstanding here)

Problem for a multi queue driver with the interrupt distribution scheme set to
round robin, illustrated by this simple example:
Assuming we have 2 SLOW CPUs. CPU_1 is under heavy load (applications). CPU_2
is not under heavy load. Now we receive a lot of packets (high load situation).
Receive queue 1 (RQ1) is scheduled on CPU_1. NAPI-Poll does not manage to empty
RQ1 ever, so it stays on CPU_1. The second receive queue (RQ2) is scheduled on
CPU_2. As that CPU is not under heavy load, RQ2 can be emptied, and the next IRQ
for RQ2 will go to CPU_1. Then both RQs are on CPU_1 and will stay there if
no IRQ is forced at some time as both RQs are never emptied completely.

This is a simplified example to demonstrate the problem. The interrupt scheme
is not bogus, it is just an effect you see if you don't use pinning.
The user can avoid this problem by pinning the interrupts to CPUs.

As pointed out by David, it will be too expensive to schedule the NAPI poll on
a different CPU than the one that gets the IRQ. So I guess one solution
is to force a HW interrupt when too many RQs are processed on the same CPU
(when no IRQ pinning is used). This is something the driver has to handle.

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-28 Thread Jan-Bernd Themann
On Tuesday 28 August 2007 11:22, James Chapman wrote:
  So in this scheme what runs -poll() to process incoming packets?
  The hrtimer?
 
 No, the regular NAPI networking core calls -poll() as usual; no timers 
 are involved. This scheme simply delays the napi_complete() from the 
 driver so the device stays in the poll list longer. It means that its 
 -poll() will be called when there is no work to do for 1-2 jiffies, 
 hence the optimization at the top of -poll() to efficiently handle that 
 case. The device's -poll() is called by the NAPI core until it has 
 continuously done no work for 1-2 jiffies, at which point it finally 
 does the netif_rx_complete() and re-enables its interrupts.
 
I'm not sure if I understand your approach correctly.
This approach may reduce the number of interrupts, but it does so
by blocking the CPU for up to 1 jiffy (that can be quite some time
on some platforms). So no other application / tasklet / softIRQ
can do anything in between. The CPU utilization does not drop at all,
and I thought that was one reason why we try to reduce the number of interrupts.

 If people feel that holding the device in the poll list for 1-2 jiffies 
 is too long (because there are too many wasted polls), a counter could 
 be used to to delay the netif_rx_complete() by N polls instead. N would 
 be a value depending on CPU speed. I use the jiffy sampling method 
 because it results in some natural randomization of the actual delay 
 depending on when the jiffy value was sampled in relation to the jiffy tick.
 

Waiting for N polls seems to make no sense if there are no further network
adapters in that machine. It would take almost no time to call poll N times
in a row when no new packets arrive, so there is no real delay, as the
net_rx_action function will do nothing else between the poll calls.

Please correct me if I'm wrong.

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-27 Thread Jan-Bernd Themann
On Monday 27 August 2007 03:58, David Miller wrote:
 From: James Chapman [EMAIL PROTECTED]
 Date: Sun, 26 Aug 2007 20:36:20 +0100
 
  David Miller wrote:
   From: James Chapman [EMAIL PROTECTED]
   Date: Fri, 24 Aug 2007 18:16:45 +0100
   
   Does hardware interrupt mitigation really interact well with NAPI?
   
   It interacts quite excellently.
  
  If NAPI disables interrupts and keeps them disabled while there are more 
  packets arriving or more transmits being completed, why do hardware 
  interrupt mitigation / coalescing features of the network silicon help?
 
 Because if your packet rate is low enough such that the cpu can
 process the interrupt fast enough and thus only one packet gets
 processed per NAPI poll, the cost of going into and out of NAPI mode
 dominates the packet processing costs.

As far as I understand your argumentation, NAPI is supposed to work well only
for HW with coalescing features (concerning dropping the interrupt rate).
NAPI itself does not provide reliable functionality to reduce the
number of interrupts, especially not for systems with only 1 NIC. 
NAPI will only wait for some time when the budget is exceeded
and the softIRQs don't call net_rx_action again. This seems to be the case
after 10 rounds. That means NAPI really waits after 300 x 10 packets 
have been processed in a row (worst case).

As a matter of fact there is HW that does not have this feature. There seems
to be HW which does not work well with plain NAPI.
This HW performs well if the operating system supports high-precision timers
(and when they are used either in the device driver or in the polling engine).

So the question is simply: Do we want drivers that need (benefit from)
timer-based polling support to implement their own timers each,
or should there be generic support?

Thanks,
Jan-Bernd

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 1/2] ehea: propagate physical port state

2007-08-27 Thread Jan-Bernd Themann
Introduces a module parameter to decide whether the physical
port link state is propagated to the network stack or not.
It makes sense not to take the physical port state into account
on machines with multiple logical partitions that communicate
with each other. This is always possible no matter what the physical
port state is. Thus eHEA can be considered a switch there.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |5 -
 drivers/net/ehea/ehea_main.c |   14 +-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..8d58be5 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0073
+#define DRV_VERSIONEHEA_0074
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -402,6 +402,8 @@ struct ehea_mc_list {
 
 #define EHEA_PORT_UP 1
 #define EHEA_PORT_DOWN 0
+#define EHEA_PHY_LINK_UP 1
+#define EHEA_PHY_LINK_DOWN 0
 #define EHEA_MAX_PORT_RES 16
 struct ehea_port {
struct ehea_adapter *adapter;/* adapter that owns this port */
@@ -427,6 +429,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u8 phy_link;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index db57474..1e9fd6f 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -53,17 +53,21 @@ static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
 static int num_tx_qps = EHEA_NUM_TX_QP;
+static int prop_carrier_state = 0;
 
 module_param(msg_level, int, 0);
 module_param(rq1_entries, int, 0);
 module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
+module_param(prop_carrier_state, int, 0);
 module_param(use_mcs, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS);
 MODULE_PARM_DESC(msg_level, msg_level);
+MODULE_PARM_DESC(prop_carrier_state, Propagate carrier state of physical 
+port to stack. 1:yes, 0:no.  Default = 0 );
 MODULE_PARM_DESC(rq3_entries, Number of entries for Receive Queue 3 
 [2^x - 1], x = [6..14]. Default = 
 __MODULE_STRING(EHEA_DEF_ENTRIES_RQ3) ));
@@ -814,7 +818,9 @@ int ehea_set_portspeed(struct ehea_port *port, u32 
port_speed)
ehea_error(Failed setting port speed);
}
}
-   netif_carrier_on(port-netdev);
+   if (!prop_carrier_state || (port-phy_link == EHEA_PHY_LINK_UP))
+   netif_carrier_on(port-netdev);
+
kfree(cb4);
 out:
return ret;
@@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, 
u64 eqe)
}
 
if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) {
+   port-phy_link = EHEA_PHY_LINK_UP;
if (netif_msg_link(port))
ehea_info(%s: Physical port up,
  port-netdev-name);
+   if (prop_carrier_state)
+   netif_carrier_on(port-netdev);
} else {
+   port-phy_link = EHEA_PHY_LINK_DOWN;
if (netif_msg_link(port))
ehea_info(%s: Physical port down,
  port-netdev-name);
+   if (prop_carrier_state)
+   netif_carrier_off(port-netdev);
}
 
if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PRIMARY, eqe))
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-27 Thread Jan-Bernd Themann
On Monday 27 August 2007 17:51, James Chapman wrote:

 In the second half of my previous reply (which seems to have been 
 deleted), I suggest a way to avoid this problem without using hardware 
 interrupt mitigation / coalescing. Original text is quoted below.
 
   I've seen the same and I'm suggesting that the NAPI driver keeps
   itself in polled mode for N polls or M jiffies after it sees
   workdone=0. This has always worked for me in packet forwarding
   scenarios to maximize packets/sec and minimize latency.
 
 To implement this, there's no need for timers, hrtimers or generic NAPI 
 support that others have suggested. A driver's poll() would set an 
 internal flag and record the current jiffies value when finding 
 workdone=0 rather than doing an immediate napi_complete(). Early in 
 poll() it would test this flag and if set, do a low-cost test to see if 
 it had any work to do. If no work, it would check the saved jiffies 
 value and do the napi_complete() only if no work has been done for a 
 configurable number of jiffies. This keeps interrupts disabled longer at 
 the expense of many more calls to poll() where no work is done. So 
 critical to this scheme is modifying the driver's poll() to fastpath the 
 case of having no work to do while waiting for its local jiffy count to 
 expire.
 

The problem I see with this approach is that the time that passes between
two jiffies might be too long for 10G ethernet adapters. (I tried to implement
a timer based approach with usual timers and the result was a disaster).
HW interrupts or high-precision timers avoid the jiffy problem as they activate
softIRQs as soon as you call netif_rx_schedule.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Hi,

when I tried to get the eHEA driver working with the new interface,
the following issues came up.

1) The current implementation of netif_rx_schedule, netif_rx_complete
   and the net_rx_action have the following problem: netif_rx_schedule
   sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the poll_list.
   net_rx_action checks NAPI_STATE_SCHED; if set, it will add the device
   to the poll_list again (as well). netif_rx_complete clears
   NAPI_STATE_SCHED.
   If an interrupt handler calls netif_rx_schedule on CPU 2
   after netif_rx_complete has been called on CPU 1 (and the poll function 
   has not returned yet), the NAPI instance will be added twice to the 
   poll_list (by netif_rx_schedule and net_rx_action). Problems occur when 
   netif_rx_complete is called twice for the device (BUG() called)

2) If an ethernet chip supports multiple receive queues, the queues are 
   currently all processed on the CPU where the interrupt comes in. This
   is because netif_rx_schedule will always add the rx queue to the CPU's
   napi poll_list. The result under heavy pressure is that all queues will
   gather on the weakest CPU (with the highest CPU load) after some time, as
   they will stay there as long as the queue is never emptied completely. On
   SMP systems this behaviour is not desired. It should also work well
   without interrupt pinning.
   It would be nice if it were possible to schedule queues to other CPUs, or
   at least to use interrupts to put the queue on another CPU (not nice, as
   you never know which one you will hit).
   I'm not sure how bad the tradeoff would be.

3) On modern systems the incoming packets are processed very fast. Especially
   on SMP systems when we use multiple queues we process only a few packets
   per napi poll cycle. So NAPI does not work very well here and the interrupt 
   rate is still high. What we need would be some sort of timer polling mode 
   which will schedule a device after a certain amount of time for high load 
   situations. With high precision timers this could work well. The current
   regular timers are too slow. A finer granularity would be needed to keep the
   latency down (and queue length moderate).

What do you think?

Thanks,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Hi,

On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
 On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
  ...
  3) On modern systems the incoming packets are processed very fast. 
  Especially
     on SMP systems when we use multiple queues we process only a few packets
     per napi poll cycle. So NAPI does not work very well here and the 
  interrupt 
     rate is still high. What we need would be some sort of timer polling 
  mode 
     which will schedule a device after a certain amount of time for high 
  load 
     situations. With high precision timers this could work well. Current
     usual timers are too slow. A finer granularity would be needed to keep 
  the
 latency down (and queue length moderate).
  
 
 We found the same on ia64-sn systems with tg3 a couple of years 
 ago. Using simple interrupt coalescing (don't interrupt until 
 you've received N packets or M usecs have elapsed) worked 
 reasonably well in practice. If your h/w supports that (and I'd 
 guess it does, since it's such a simple thing), you might try 
 it.
 

I don't see how this should work. Our latest machines are fast enough that they
simply empty the queue during the first poll iteration (in most cases).
Even if you wait until X packets have been received, it does not help for
the next poll cycle. The average number of packets we process per poll cycle
is low. So a timer that periodically polls the queue, without the need to
generate a HW interrupt, would be preferable. This would allow us
to wait until a reasonable number of packets have been received in the meantime
to keep the poll overhead low. This would also be useful in combination
with LRO.

Regards,
Jan-Bernd
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
James Chapman schrieb:
 Stephen Hemminger wrote:
 On Fri, 24 Aug 2007 17:47:15 +0200
 Jan-Bernd Themann [EMAIL PROTECTED] wrote:

 Hi,

 On Friday 24 August 2007 17:37, [EMAIL PROTECTED] wrote:
 On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
 ...
 3) On modern systems the incoming packets are processed very fast. 
 Especially
on SMP systems when we use multiple queues we process only a 
 few packets
per napi poll cycle. So NAPI does not work very well here and 
 the interruptrate is still high. What we need would be some 
 sort of timer polling modewhich will schedule a device after a 
 certain amount of time for high loadsituations. With high 
 precision timers this could work well. Current
usual timers are too slow. A finer granularity would be needed 
 to keep the
latency down (and queue length moderate).

 We found the same on ia64-sn systems with tg3 a couple of years 
 ago. Using simple interrupt coalescing (don't interrupt until 
 you've received N packets or M usecs have elapsed) worked 
 reasonably well in practice. If your h/w supports that (and I'd 
 guess it does, since it's such a simple thing), you might try it.

 I don't see how this should work. Our latest machines are fast 
 enough that they
 simply empty the queue during the first poll iteration (in most cases).
 Even if you wait until X packets have been received, it does not 
 help for
 the next poll cycle. The average number of packets we process per 
 poll queue
 is low. So a timer would be preferable that periodically polls the 
 queue, without the need of generating a HW interrupt. This would 
 allow us
 to wait until a reasonable amount of packets have been received in 
 the meantime
 to keep the poll overhead low. This would also be useful in combination
 with LRO.


 You need hardware support for deferred interrupts. Most devices have 
 it (e1000, sky2, tg3)
 and it interacts well with NAPI. It is not a generic thing you want 
 done by the stack,
 you want the hardware to hold off interrupts until X packets or Y 
 usecs have expired.

 Does hardware interrupt mitigation really interact well with NAPI? In 
 my experience, holding off interrupts for X packets or Y usecs does 
 more harm than good; such hardware features are useful only when the 
 OS has no NAPI-like mechanism.

 When tuning NAPI drivers for packets/sec performance (which is a good 
 indicator of driver performance), I make sure that the driver stays in 
 NAPI polled mode while it has any rx or tx work to do. If the CPU is 
 fast enough that all work is always completed on each poll, I have the 
 driver stay in polled mode until dev-poll() is called N times with no 
 work being done. This keeps interrupts disabled for reasonable traffic 
 levels, while minimizing packet processing latency. No need for 
 hardware interrupt mitigation.
Yes, that was one idea as well. But the problem with that is that
net_rx_action will call the same poll function over and over again in a row
if there are no further network devices. The problem with this approach is
that you always poll just a very few packets each time. This does not work
well with LRO, as there are no packets to aggregate...
So it would make more sense to wait for a certain time before trying again.
Second problem: after the jiffies value has incremented by one in
net_rx_action (after some poll rounds), net_rx_action will quit and return
control to the softIRQ handler. The poll function is called again as the
softIRQ handler thinks there is more work to be done. So even then we do not
wait... After some rounds in the softIRQ handler, we finally wait some time.


 The parameters for controlling it are already in ethtool, the issue 
 is finding a good
 default set of values for a wide range of applications and 
 architectures. Maybe some
 heuristic based on processor speed would be a good starting point. 
 The dynamic irq
 moderation stuff is not widely used because it is too hard to get right.

 I agree. It would be nice to find a way for the typical user to derive 
 best values for these knobs for his/her particular system. Perhaps a 
 tool using pktgen and network device phy internal loopback could be 
 developed?



___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: RFC: issues concerning the next NAPI interface

2007-08-24 Thread Jan-Bernd Themann
Linas Vepstas schrieb:
 On Fri, Aug 24, 2007 at 09:04:56PM +0200, Bodo Eggert wrote:
   
 Linas Vepstas [EMAIL PROTECTED] wrote:
 
 On Fri, Aug 24, 2007 at 03:59:16PM +0200, Jan-Bernd Themann wrote:
   
 3) On modern systems the incoming packets are processed very fast. 
 Especially
 on SMP systems when we use multiple queues we process only a few packets
 per napi poll cycle. So NAPI does not work very well here and the interrupt
 rate is still high.
 
 worst-case network ping-pong app: send one
 packet, wait for reply, send one packet, etc.
   
 Possible solution / possible brainfart:

 Introduce a timer, but don't start to use it to combine packets unless you
 receive n packets within the timeframe. If you receive less than m packets
 within one timeframe, stop using the timer. The system should now have a
 decent response time when the network is idle, and when the network is
 busy, nobody will complain about the latency.-)
 

 Ohh, that was inspirational. Let me free-associate some wild ideas.

 Suppose we keep a running average of the recent packet arrival rate,
 Lets say its 10 per millisecond (typical for a gigabit eth runnning
 flat-out).  If we could poll the driver at a rate of 10-20 per
 millisecond (i.e. letting the OS do other useful work for 0.05 millisec),
 then we could potentially service the card without ever having to enable 
 interrupts on the card, and without hurting latency.

 If the packet arrival rate becomes slow enough, we go back to an
 interrupt-driven scheme (to keep latency down).

 The main problem here is that, even for HZ=1000 machines, this amounts 
 to 10-20 polls per jiffy.  Which, if implemented in kernel, requires 
 using the high-resolution timers. And, umm, don't the HR timers require
 a cpu timer interrupt to make them go? So its not clear that this is much
 of a win.
   
That is indeed a good question. At least for 10G eHEA we see
that the average number of packets/poll cycle is very low.
With high precision timers we could control the poll interval
better and thus make sure we get enough packets on the queue in
high load situations to benefit from LRO while keeping the
latency moderate. When the traffic load is low we could just
stick to plain NAPI. I don't know how expensive hp timers are,
we probably just have to test it (when they are available for
POWER in our case). However, having more packets
per poll run would make LRO more efficient and thus the total
CPU utilization would decrease.

I guess on most systems there are not many different network
cards working in parallel. So if the driver could set the poll
interval for its devices, it could be well optimized depending
on the NICs characteristics.

Maybe it would be good enough to have a timer that schedules
the device for NAPI (and thus triggers SoftIRQs, which will
trigger NAPI). Whether this timer would be used via a generic
interface or would be implemented as a proprietary solution
would depend on whether other drivers want / need this feature
as well. Drivers / NICs that work fine with plain NAPI don't
have to use timer :-)

I tried to implement something with normal timers, but the result
was anything but great. The timers seem to be far too slow.
I'm not sure if it helps to increase HZ from 1000 to 2500
or more.

Regards,
Jan-Bernd

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 2/4] ehea: fix module parameter description

2007-08-22 Thread Jan-Bernd Themann
Update the module parameter description of use_mcs to
show the correct default value.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 22d000f..db57474 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -76,7 +76,7 @@ MODULE_PARM_DESC(rq1_entries, Number of entries for Receive 
Queue 1 
 MODULE_PARM_DESC(sq_entries,  Number of entries for the Send Queue  
 [2^x - 1], x = [6..14]. Default = 
 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ));
-MODULE_PARM_DESC(use_mcs,  0:NAPI, 1:Multiple receive queues, Default = 1 );
+MODULE_PARM_DESC(use_mcs,  0:NAPI, 1:Multiple receive queues, Default = 0 );
 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 4/4] ehea: show physical port state

2007-08-22 Thread Jan-Bernd Themann
Introduces a module parameter to decide whether the physical
port link state is propagated to the network stack or not.
It makes sense not to take the physical port state into account
on machines with multiple logical partitions that communicate
with each other. This is always possible no matter what the physical
port state is. Thus eHEA can be considered a switch there.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/ehea/ehea.h  |5 -
 drivers/net/ehea/ehea_main.c |   14 +-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..8d58be5 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME   ehea
-#define DRV_VERSIONEHEA_0073
+#define DRV_VERSIONEHEA_0074
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -402,6 +402,8 @@ struct ehea_mc_list {
 
 #define EHEA_PORT_UP 1
 #define EHEA_PORT_DOWN 0
+#define EHEA_PHY_LINK_UP 1
+#define EHEA_PHY_LINK_DOWN 0
 #define EHEA_MAX_PORT_RES 16
 struct ehea_port {
struct ehea_adapter *adapter;/* adapter that owns this port */
@@ -427,6 +429,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u8 phy_link;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index db57474..1804c99 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -53,17 +53,21 @@ static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
 static int num_tx_qps = EHEA_NUM_TX_QP;
+static int show_phys_link = 0;
 
 module_param(msg_level, int, 0);
 module_param(rq1_entries, int, 0);
 module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
+module_param(show_phys_link, int, 0);
 module_param(use_mcs, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS);
 MODULE_PARM_DESC(msg_level, msg_level);
+MODULE_PARM_DESC(show_phys_link, " Show link state of external port "
+		 "1:yes, 0: no.  Default = 0 ");
 MODULE_PARM_DESC(rq3_entries, Number of entries for Receive Queue 3 
 [2^x - 1], x = [6..14]. Default = 
 __MODULE_STRING(EHEA_DEF_ENTRIES_RQ3) ));
@@ -814,7 +818,9 @@ int ehea_set_portspeed(struct ehea_port *port, u32 port_speed)
 			ehea_error("Failed setting port speed");
 		}
 	}
-	netif_carrier_on(port->netdev);
+	if (!show_phys_link || (port->phy_link == EHEA_PHY_LINK_UP))
+		netif_carrier_on(port->netdev);
+
 	kfree(cb4);
 out:
return ret;
@@ -869,13 +875,19 @@ static void ehea_parse_eqe(struct ehea_adapter *adapter, u64 eqe)
 	}
 
 	if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PORT_UP, eqe)) {
+		port->phy_link = EHEA_PHY_LINK_UP;
 		if (netif_msg_link(port))
 			ehea_info("%s: Physical port up",
 				  port->netdev->name);
+		if (show_phys_link)
+			netif_carrier_on(port->netdev);
 	} else {
+		port->phy_link = EHEA_PHY_LINK_DOWN;
 		if (netif_msg_link(port))
 			ehea_info("%s: Physical port down",
 				  port->netdev->name);
+		if (show_phys_link)
+			netif_carrier_off(port->netdev);
 	}
 
if (EHEA_BMASK_GET(NEQE_EXTSWITCH_PRIMARY, eqe))
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH] lro: eHEA example how to use LRO

2007-08-03 Thread Jan-Bernd Themann
This patch shows how the generic LRO interface is used for SKB mode

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 drivers/net/Kconfig |1 +
 drivers/net/ehea/ehea.h |9 -
 drivers/net/ehea/ehea_ethtool.c |   15 +++
 drivers/net/ehea/ehea_main.c|   84 +++---
 4 files changed, 101 insertions(+), 8 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index f8a602c..fec4004 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2399,6 +2399,7 @@ config CHELSIO_T3
 config EHEA
tristate eHEA Ethernet support
depends on IBMEBUS
+   select INET_LRO
---help---
  This driver supports the IBM pSeries eHEA ethernet adapter.
 
diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index d67f97b..70e33fe 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -33,13 +33,14 @@
 #include <linux/ethtool.h>
 #include <linux/vmalloc.h>
 #include <linux/if_vlan.h>
+#include <linux/inet_lro.h>
 
 #include <asm/ibmebus.h>
 #include <asm/abs_addr.h>
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0073"
+#define DRV_VERSION	"EHEA_0074"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
@@ -58,6 +59,7 @@
 
 #define EHEA_SMALL_QUEUES
 #define EHEA_NUM_TX_QP 1
+#define EHEA_LRO_MAX_AGGR 64
 
 #ifdef EHEA_SMALL_QUEUES
 #define EHEA_MAX_CQE_COUNT  1023
@@ -84,6 +86,8 @@
 #define EHEA_RQ2_PKT_SIZE   1522
 #define EHEA_L_PKT_SIZE 256/* low latency */
 
+#define MAX_LRO_DESCRIPTORS 8
+
 /* Send completion signaling */
 
 /* Protection Domain Identifier */
@@ -376,6 +380,8 @@ struct ehea_port_res {
u64 tx_packets;
u64 rx_packets;
u32 poll_counter;
+   struct net_lro_mgr lro_mgr;
+   struct net_lro_desc lro_desc[MAX_LRO_DESCRIPTORS];
 };
 
 
@@ -427,6 +433,7 @@ struct ehea_port {
u32 msg_enable;
u32 sig_comp_iv;
u32 state;
+   u32 lro_max_aggr;
u8 full_duplex;
u8 autoneg;
u8 num_def_qps;
diff --git a/drivers/net/ehea/ehea_ethtool.c b/drivers/net/ehea/ehea_ethtool.c
index decec8c..29ef7a9 100644
--- a/drivers/net/ehea/ehea_ethtool.c
+++ b/drivers/net/ehea/ehea_ethtool.c
@@ -183,6 +183,9 @@ static char ehea_ethtool_stats_keys[][ETH_GSTRING_LEN] = {
 	{"PR5 free_swqes"},
 	{"PR6 free_swqes"},
 	{"PR7 free_swqes"},
+	{"LRO aggregated"},
+	{"LRO flushed"},
+	{"LRO no_desc"},
 };
 
 static void ehea_get_strings(struct net_device *dev, u32 stringset, u8 *data)
@@ -239,6 +242,18 @@ static void ehea_get_ethtool_stats(struct net_device *dev,
 	for (k = 0; k < 8; k++)
 		data[i++] = atomic_read(&port->port_res[k].swqe_avail);
 
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp |= port->port_res[k].lro_mgr.stats.aggregated;
+	data[i++] = tmp;
+
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp |= port->port_res[k].lro_mgr.stats.flushed;
+	data[i++] = tmp;
+
+	for (k = 0, tmp = 0; k < EHEA_MAX_PORT_RES; k++)
+		tmp |= port->port_res[k].lro_mgr.stats.no_desc;
+	data[i++] = tmp;
+
 }
 
 const struct ethtool_ops ehea_ethtool_ops = {
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 9756211..fbaa395 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -52,6 +52,8 @@ static int rq2_entries = EHEA_DEF_ENTRIES_RQ2;
 static int rq3_entries = EHEA_DEF_ENTRIES_RQ3;
 static int sq_entries = EHEA_DEF_ENTRIES_SQ;
 static int use_mcs = 0;
+static int use_lro = 0;
+static int lro_max_aggr = EHEA_LRO_MAX_AGGR;
 static int num_tx_qps = EHEA_NUM_TX_QP;
 
 module_param(msg_level, int, 0);
@@ -60,6 +62,8 @@ module_param(rq2_entries, int, 0);
 module_param(rq3_entries, int, 0);
 module_param(sq_entries, int, 0);
 module_param(use_mcs, int, 0);
+module_param(use_lro, int, 0);
+module_param(lro_max_aggr, int, 0);
 module_param(num_tx_qps, int, 0);
 
 MODULE_PARM_DESC(num_tx_qps, Number of TX-QPS);
@@ -77,6 +81,10 @@ MODULE_PARM_DESC(sq_entries, " Number of entries for the Send Queue  "
 		 "[2^x - 1], x = [6..14]. Default = "
 		 __MODULE_STRING(EHEA_DEF_ENTRIES_SQ) ")");
 MODULE_PARM_DESC(use_mcs, " 0:NAPI, 1:Multiple receive queues, Default = 1 ");
+MODULE_PARM_DESC(lro_max_aggr, " LRO: Max packets to be aggregated. Default = "
+		 __MODULE_STRING(EHEA_LRO_MAX_AGGR));
+MODULE_PARM_DESC(use_lro, " Large Receive Offload, 1: enable, 0: disable, "
+		 "Default = 0");
 
 static int port_name_cnt = 0;
 static LIST_HEAD(adapter_list);
@@ -389,6 +397,60 @@ static int ehea_treat_poll_error(struct ehea_port_res *pr, int rq,
 	return 0;
 }
 
+static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
+		       void **tcph, u64 *hdr_flags, void *priv)
+{
+	struct ehea_cqe *cqe = priv;
+	unsigned int ip_len;
+	struct iphdr *iph
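
The diff is cut off here in this archive. A hedged sketch of how such a
get_skb_hdr() callback is typically completed (the cqe->header_length test
is an assumption about the eHEA completion-queue entry; the rest uses only
the standard skb header helpers):

static int get_skb_hdr(struct sk_buff *skb, void **iphdr,
		       void **tcph, u64 *hdr_flags, void *priv)
{
	struct ehea_cqe *cqe = priv;
	unsigned int ip_len;
	struct iphdr *iph;

	/* non TCP/UDP packets: let the stack handle them unaggregated */
	if (!cqe->header_length)
		return -1;

	/* locate the IP header and make sure this is TCP */
	skb_reset_network_header(skb);
	iph = ip_hdr(skb);
	if (iph->protocol != IPPROTO_TCP)
		return -1;

	ip_len = ip_hdrlen(skb);
	skb_set_transport_header(skb, ip_len);
	*tcph = tcp_hdr(skb);

	/* check that IP and TCP headers are complete */
	if (ntohs(iph->tot_len) < ip_len + tcp_hdrlen(skb))
		return -1;

	*hdr_flags = LRO_IPV4 | LRO_TCP;
	*iphdr = iph;

	return 0;
}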

[PATCH 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-08-03 Thread Jan-Bernd Themann
This patch provides generic Large Receive Offload (LRO) functionality
for IPv4/TCP traffic.

LRO combines received TCP packets into a single larger TCP packet and
then passes it to the network stack in order to increase performance
(throughput). The interface supports two modes: drivers can either pass
SKBs or fragment lists to the LRO engine.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]


---
 include/linux/inet_lro.h |  177 ++
 net/ipv4/Kconfig |8 +
 net/ipv4/Makefile|1 +
 net/ipv4/inet_lro.c  |  600 ++
 4 files changed, 786 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 000..e1fc1d1
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,177 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *   Jan-Bernd Themann [EMAIL PROTECTED]
+ *   Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+/*
+ * LRO statistics
+ */
+
+struct net_lro_stats {
+   unsigned long aggregated;
+   unsigned long flushed;
+   unsigned long no_desc;
+};
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+   struct sk_buff *parent;
+   struct sk_buff *last_skb;
+   struct skb_frag_struct *next_frag;
+   struct iphdr *iph;
+   struct tcphdr *tcph;
+   struct vlan_group *vgrp;
+   __wsum  data_csum;
+   u32 tcp_rcv_tsecr;
+   u32 tcp_rcv_tsval;
+   u32 tcp_ack;
+   u32 tcp_next_seq;
+   u32 skb_tot_frags_len;
+   u16 ip_tot_len;
+   u16 tcp_saw_tstamp; /* timestamps enabled */
+   u16 tcp_window;
+   u16 vlan_tag;
+   int pkt_aggr_cnt;   /* counts aggregated packets */
+   int vlan_packet;
+   int mss;
+   int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+   struct net_device *dev;
+   struct net_lro_stats stats;
+
+   /* LRO features */
+   unsigned long features;
+#define LRO_F_NAPI            1  /* Pass packets to stack via NAPI */
+#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
+				    from received packets and eth protocol
+				    is still ETH_P_8021Q */
+
+   u32 ip_summed;  /* Set in non generated SKBs in page mode */
+   u32 ip_summed_aggr; /* Set in aggregated SKBs: CHECKSUM_UNNECESSARY
+* or CHECKSUM_NONE */
+
+   int max_desc; /* Max number of LRO descriptors  */
+   int max_aggr; /* Max number of LRO packets to be aggregated */
+
+   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+   /*
+* Optimized driver functions
+*
+* get_skb_header: returns tcp and ip header for packet in SKB
+*/
+   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+ void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+   /* hdr_flags: */
+#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
+#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
+
+   /*
+* get_frag_header: returns mac, tcp and ip header for packet in SKB
+*
+* @hdr_flags: Indicate what kind of LRO has to be done
+* (IPv4/IPv6/TCP/UDP)
+*/
+   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+  void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+  void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *(for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+struct sk_buff *skb,
+void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct

Re: [PATCH 0/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-31 Thread Jan-Bernd Themann
Hi,

Thanks for finding these bugs! I'll post an updated version soon (2 patches
with no separate Kconfig patches, one LRO and one eHEA patch). See comments 
below.

Thanks,
Jan-Bernd

On Monday 30 July 2007 22:32, Andrew Gallatin wrote:
 I was working on testing the myri10ge patch, and I ran into a few
 problems.  I've attached a patch to inet_lro.c to fix some of them,
 and a patch to myri10ge.c to show how to use the page based
 interface.  Both patches are signed off by Andrew Gallatin
 [EMAIL PROTECTED]
 
 First, the LRO_MAX_PG_HLEN is still a problem.  Minimally sized 60
 byte frames still cause problems in lro_gen_skb due to skb->len
 going negative.  Fixed in the attached patch.  It may be simpler
 to just drop LRO_MAX_PG_HLEN to ETH_ZLEN, but I'm not sure if
 that is enough.  Are there smart NICs which might chop off padding
 themselves?

I'd tend to stick to an explicit check as implemented in your patch
for now

 
 Second, you still need to set skb-ip_summed = CHECKSUM_UNNECESSARY
 when modified packets are flushed, else the stack will see bad
 checksums for packets from CHECKSUM_COMPLETE drivers using the
 skb interface.  Fixed in the attached patch.

I thought about it... As we do update the TCP checksum for aggregated
packets we could add a second ip_summed field in the net_lro_mgr struct 
used for aggregated packets to support HW that does not have any checksum helper
functionality. These drivers could set this ip_summed field to CHECKSUM_NONE, 
and thus leave the checksum check to the stack. I'm not sure if these old 
devices benefit
a lot from LRO. So what do you think?
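
As an illustration, with such a second field (ip_summed_aggr in struct
net_lro_mgr, as in the updated patch) a driver would set one of:

	/* HW verified the TCP/IP checksum: aggregated skbs need no re-check */
	lro_mgr->ip_summed_aggr = CHECKSUM_UNNECESSARY;

	/* old HW without checksum help: leave verification to the stack */
	lro_mgr->ip_summed_aggr = CHECKSUM_NONE;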

 
 Fourth, I did some traffic sniffing to try to figure out what's going
 on above, and saw tcpdump complain about bad checksums.  Have you tried
 running tcpdump -s 65535 -vvv?  Have you also seen bad checksums?
 I seem to see this for both page- and skb-based versions of the driver.
 

Hmmm, can't confirm that. For our skb-based version I see
correct checksums for aggregated packets and for the page-based version as well.
I used: (tcpdump -i ethX -s 0 -w dump.bin) in combination with ethereal.
Don't see problems as well with your tcpdump command.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 2/4][RFC] lro: Kconfig and Makefile

2007-07-30 Thread Jan-Bernd Themann
Kconfig and Makefile for LRO

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 net/ipv4/Kconfig  |8 
 net/ipv4/Makefile |1 +
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index fb79097..d894f61 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -394,6 +394,14 @@ config INET_XFRM_MODE_BEET
 
  If unsure, say Y.
 
+config INET_LRO
+	tristate "Large Receive Offload (ipv4/tcp)"
+
+   ---help---
+ Support for Large Receive Offload (ipv4/tcp).
+
+ If unsure, say Y.
+
 config INET_DIAG
 	tristate "INET: socket monitoring interface"
default y
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index fbf1674..a02c36d 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_INET_ESP) += esp4.o
 obj-$(CONFIG_INET_IPCOMP) += ipcomp.o
 obj-$(CONFIG_INET_XFRM_TUNNEL) += xfrm4_tunnel.o
 obj-$(CONFIG_INET_XFRM_MODE_BEET) += xfrm4_mode_beet.o
+obj-$(CONFIG_INET_LRO) += inet_lro.o
 obj-$(CONFIG_INET_TUNNEL) += tunnel4.o
 obj-$(CONFIG_INET_XFRM_MODE_TRANSPORT) += xfrm4_mode_transport.o
 obj-$(CONFIG_INET_XFRM_MODE_TUNNEL) += xfrm4_mode_tunnel.o
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 1/4][RFC] lro: Generic Large Receive Offload for TCP traffic

2007-07-30 Thread Jan-Bernd Themann
Generic Large Receive Offload for TCP traffic

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 include/linux/inet_lro.h |  173 ++
 net/ipv4/inet_lro.c  |  590 ++
 2 files changed, 763 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 000..0957234
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,173 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *   Jan-Bernd Themann [EMAIL PROTECTED]
+ *   Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+/*
+ * LRO statistics
+ */
+
+struct net_lro_stats {
+   unsigned long aggregated;
+   unsigned long flushed;
+   unsigned long no_desc;
+};
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+   struct sk_buff *parent;
+   struct sk_buff *last_skb;
+   struct skb_frag_struct *next_frag;
+   struct iphdr *iph;
+   struct tcphdr *tcph;
+   struct vlan_group *vgrp;
+   __wsum  data_csum;
+   u32 tcp_rcv_tsecr;
+   u32 tcp_rcv_tsval;
+   u32 tcp_ack;
+   u32 tcp_next_seq;
+   u32 skb_tot_frags_len;
+   u16 ip_tot_len;
+   u16 tcp_saw_tstamp; /* timestamps enabled */
+   u16 tcp_window;
+   u16 vlan_tag;
+   int pkt_aggr_cnt;   /* counts aggregated packets */
+   int vlan_packet;
+   int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+   struct net_device *dev;
+   struct net_lro_stats stats;
+
+   /* LRO features */
+   unsigned long features;
+#define LRO_F_NAPI            1  /* Pass packets to stack via NAPI */
+#define LRO_F_EXTRACT_VLAN_ID 2  /* Set flag if VLAN IDs are extracted
+				    from received packets and eth protocol
+				    is still ETH_P_8021Q */
+
+   u32 ip_summed; /* Options to be set in generated SKB in page mode */
+   int max_desc; /* Max number of LRO descriptors  */
+   int max_aggr; /* Max number of LRO packets to be aggregated */
+
+   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+   /*
+* Optimized driver functions
+*
+* get_skb_header: returns tcp and ip header for packet in SKB
+*/
+   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+ void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+   /* hdr_flags: */
+#define LRO_IPV4 1 /* ip_hdr is IPv4 header */
+#define LRO_TCP  2 /* tcpudp_hdr is TCP header */
+
+   /*
+* get_frag_header: returns mac, tcp and ip header for packet in SKB
+*
+* @hdr_flags: Indicate what kind of LRO has to be done
+* (IPv4/IPv6/TCP/UDP)
+*/
+   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+  void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+  void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *(for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+struct sk_buff *skb,
+void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr,
+ struct sk_buff *skb,
+ struct vlan_group *vgrp,
+ u16 vlan_tag,
+ void *priv);
+
+/*
+ * Processes a fragment list
+ *
+ * This function aggregates fragments and generates SKBs to pass
+ * the packets to the stack.
+ *
+ * @lro_mgr: LRO manager to use
+ * @frags: Fragment to be processed. Must contain entire header in first
+ * element.
+ * @len: Length

Re: [RFC 0/1] lro: Generic Large Receive Offload for TCP traffic

2007-07-27 Thread Jan-Bernd Themann
Hi Drew,

thanks a lot for your good feedback. See comments below.
I'll try to provide an updated version next week. It would
be nice if you could post a patch for your driver once
we have addressed the issues you mentioned. Then we would
have the eHEA driver for the SKB interface, and your driver
for the receive in pages interface.

Thanks,
Jan-Bernd

On Wednesday 25 July 2007 19:17, Andrew Gallatin wrote:
 Code review / comments:
 ===
 
 1) Checksum information for CHECKSUM_COMPLETE drivers.
 
 Our NIC passes partial checksums to our driver.  In the current code,
 it seems impossible for page based CHECKSUM_COMPLETE drivers to behave
 correctly in the case of rejected frames.  Eg, there is no way
 to pass the partial checksum to the LRO module so that it gets
 set in the skb header and passed up the stack.
 
 Further, there seems to be no (easy) way to use CHECKSUM_COMPLETE
 on an aggregated packet at LRO flush time.  By the time a packet
 is aggregated, the partial checksum from the first segment is
 out of date.
 
 I think it would make sense to require that a CHECKSUM_COMPLETE style
 driver verify the checksum in its get_frag_header / get_skb_header
 callback.  This allows the LRO code to unconditionally set
 CHECKSUM_UNNECESSARY.

I agree
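
For illustration, a minimal get_skb_header() callback along those lines
could look like this (my_hw_csum_ok() is an assumed, driver-specific
helper; only frames whose checksum was already verified are offered for
aggregation, so the LRO code may set CHECKSUM_UNNECESSARY unconditionally):

static int my_get_skb_header(struct sk_buff *skb, void **ip_hdr,
			     void **tcpudp_hdr, u64 *hdr_flags, void *priv)
{
	struct iphdr *iph;

	/* driver-specific: verify the HW (partial) checksum first and
	 * refuse aggregation if it does not check out */
	if (!my_hw_csum_ok(skb, priv))
		return -1;

	skb_reset_network_header(skb);
	iph = ip_hdr(skb);
	if (iph->protocol != IPPROTO_TCP)
		return -1;

	skb_set_transport_header(skb, ip_hdrlen(skb));

	*ip_hdr = iph;
	*tcpudp_hdr = tcp_hdr(skb);
	*hdr_flags = LRO_IPV4 | LRO_TCP;
	return 0;
}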

 2) Non-accelerated VLAN tags
 
 Our firmware currently does not do vlan tag insertion
 and removal.  This causes a problem in __lro_proc_segment()
 where the tcp and ip headers are setup to point into the
 newly created skb.  A frame containing an unstripped vlan
 tag will cause the headers to be garbage.
 
 The attached patch modifies __lro_proc_segment() to offset
 those pointers by VLAN_HLEN when required.
 

The modifications you propose are not sufficient to work with HW
which actually extracts the VLAN IDs but does not change the 
eth protocol. Thus we have to add an additional field in
lro_mgr indicating how to interpret the eth protocol regarding
the VLAN header.

 3) Padded frames.
 
 I may be missing something, but I don't see where you
 either strip padding from frames or reject padded frames.
 (see the pskb_trim_rcsum() in net/ipv4/ip_input.c:ip_rcv()
 
I think I missed something :-) Will fix that.
In lro_tcp_ip_check we check for the SKB aggregation interface
the skb-len against ip-tot_len. This catches padded frames as
eth_type_trans(skb, dev) reduces the length of the SKB.
However, the possible VLAN header is not taken into account. 
And for the receive in pages interface a wrong length is passed
as argument as well. 

I'd suggest to reject padded frames for aggregation as we do now
(when bugs are fixed) and thus keep the code simple.
I guess that padded frames don't occur too often in high load 
situations. If we detect a real performance issue we can still
change that later.

 I did not add such a feature as I was confused about the intended
 use of len/true_size.
len: number of bytes received
true_size: used to fill the truesize field in the SKB. Thus this reflects
   the amount of memory that is actually used by that SKB. If you
   receive into pages and there is some space between packets, you
   should take this into account. Example: if you use 1 page for each
   packet, then you pass 4096 as the argument.
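
In code, for the one-page-per-packet case that would be roughly (rx_page,
lro_mgr and priv are assumptions; the point is the len vs. true_size split):

	struct skb_frag_struct frag;

	frag.page = rx_page;		/* page the HW received into */
	frag.page_offset = 0;
	frag.size = len;		/* bytes actually received */

	/* len = received bytes, true_size = memory consumed (whole page) */
	lro_receive_frags(lro_mgr, &frag, len, PAGE_SIZE, priv);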

 
 Also, trimming is a pain when dealing with pure frags (without a
 containing skb).  We have code in our out-of-kernel driver to deal
 with it which you are welcome to use.
 
 

 4) LRO_MIN_PG_HLEN (== 80)
 
 This confuses me.  Can you please explain what you're trying to do?
 Because of this, I kept getting crashes in the skb_pull() done by
 eth_type_trans() because I was passing segments which were 60 bytes
 and the skb->data_len of the skb constructed by lro_gen_skb() was -20.
 I added my own code to bump the length to a magic 80 bytes, and the
 panics disappeared.  This may cause data corruption because of
 #3 above!
Yes, I see the point... I'm not sure to what extent there are any requirements
that a certain amount of data (header) for other types of traffic
has to be in the skb->data field and not in frags. Maybe someone
can comment on this?
I suggest to remove LRO_MIN_PG_HLEN for tcp/ip packets that are aggregated,
but should we use a minimal length for other traffic (depending on the
number of received bytes)? Is that necessary?

 
 5) NAPI/non-NAPI
 
 The LRO code assumes the underlying driver uses NAPI, and calls
 netif_receive_skb() rather than netif_rx().  Perhaps there should be a
 field in the lro_mgr struct to specify napi / non-napi.
 
Yes, if someone intends to use it without napi, we can add this.

 6) The checks for when to stop aggregating and flush in
 __lro_proc_{segment|skb} need some improvement.
 
 The skb variant currently uses a pure count (max_aggr).  In order to
 keep the resulting aggregated frame below 64KB, the underlying driver
 computes max_aggr as 0xffff / MTU.  This reduces the effectiveness of
 LRO on mixed MTU networks.  Eg, this causes packets coming from a
 

[PATCH] eHEA: net_poll support

2007-07-23 Thread Jan-Bernd Themann
net_poll support for eHEA added

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |   22 +-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index 489c8b2..8ee2c2c 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0071"
+#define DRV_VERSION	"EHEA_0072"
 
 /* eHEA capability flags */
 #define DLPAR_PORT_ADD_REM 1
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 4c70a93..58702f5 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -589,6 +589,23 @@ static int ehea_poll(struct net_device *dev, int *budget)
return 1;
 }
 
+#ifdef CONFIG_NET_POLL_CONTROLLER
+static void ehea_netpoll(struct net_device *dev)
+{
+	struct ehea_port *port = netdev_priv(dev);
+
+	netif_rx_schedule(port->port_res[0].d_netdev);
+}
+#endif
+
+static int ehea_poll_firstqueue(struct net_device *dev, int *budget)
+{
+	struct ehea_port *port = netdev_priv(dev);
+	struct net_device *d_dev = port->port_res[0].d_netdev;
+
+	return ehea_poll(d_dev, budget);
+}
+
 static irqreturn_t ehea_recv_irq_handler(int irq, void *param)
 {
struct ehea_port_res *pr = param;
@@ -2626,7 +2643,10 @@ struct ehea_port *ehea_setup_single_port(struct ehea_adapter *adapter,
 	memcpy(dev->dev_addr, &port->mac_addr, ETH_ALEN);
 
 	dev->open = ehea_open;
-	dev->poll = ehea_poll;
+	dev->poll = ehea_poll_firstqueue;
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = ehea_netpoll;
+#endif
 	dev->weight = 64;
 	dev->stop = ehea_stop;
 	dev->hard_start_xmit = ehea_start_xmit;
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[RFC 0/1] lro: Generic Large Receive Offload for TCP traffic

2007-07-20 Thread Jan-Bernd Themann
Hi,

Thanks a lot for your comments so far.
This generic LRO patch differs from the last one in several points.
A new interface for a receive in pages mode has been added and tested
with an eHEA prototype. Seems to work well.

Does this extended interface seem to be sufficient?

Below some more explanations:

Thanks,
Jan-Bernd


Changes to http://www.spinics.net/lists/netdev/msg35490.html :

- Interfaces are changed to allow later support for IPv6 / UDP
- New interface to support receive in pages
- TCP checksums are updated properly
- TCP packets with push flag are aggregated now
- Timestamps are now compared using after()


The additional interface to support receive in pages:

void lro_receive_frags(struct net_lro_mgr *lro_mgr,
   struct skb_frag_struct *frags,
   int len, int true_size, void *priv);

void lro_vlan_hwaccel_receive_frags(struct net_lro_mgr *lro_mgr,
struct skb_frag_struct *frags,
int len,
int true_size,
struct vlan_group *vgrp,
u16 vlan_tag,
void *priv);

These functions generate SKBs only for the first packet of an
LRO session. The next fragment list to be aggregated will be
added to the fragment list of that SKB.

The reason why this is a smart approach is described in:
http://www.spinics.net/lists/netdev/msg35634.html

All other packets that do not match the LRO requirements are
put in an SKB and sent to the stack.

Packets that are received in an extra buffer (small packets) and
thus not in an skb fragment can be sent by the driver to the stack
after flushing the appropriate LRO sessions:

void lro_flush_pkt(struct net_lro_mgr *lro_mgr,
   struct iphdr *iph, struct tcphdr *tcph);

or

void lro_flush_all(struct net_lro_mgr *lro_mgr);
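
For illustration, a driver's poll routine would typically drive this
interface roughly as follows (my_next_rx_frags() and the priv fields are
assumptions; the point is that aggregation happens per received frame and
all open LRO sessions are flushed before the poll run ends):

static int my_poll(struct net_device *dev, int *budget)
{
	struct my_priv *priv = netdev_priv(dev);
	int limit = min(dev->quota, *budget);
	int work = 0, len, true_size;

	while (work < limit &&
	       my_next_rx_frags(priv, priv->frags, &len, &true_size)) {
		lro_receive_frags(&priv->lro_mgr, priv->frags,
				  len, true_size, NULL);
		work++;
	}

	/* push all partially aggregated packets to the stack */
	lro_flush_all(&priv->lro_mgr);

	dev->quota -= work;
	*budget -= work;

	if (work < limit) {
		netif_rx_complete(dev);
		return 0;	/* done for now */
	}
	return 1;		/* more work pending, poll again */
}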

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[RFC 1/1] lro: Generic Large Receive Offload for TCP traffic

2007-07-20 Thread Jan-Bernd Themann
Generic LRO patch

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]

---
 include/linux/inet_lro.h |  154 +
 net/ipv4/inet_lro.c  |  549 ++
 2 files changed, 703 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/inet_lro.h
 create mode 100644 net/ipv4/inet_lro.c

diff --git a/include/linux/inet_lro.h b/include/linux/inet_lro.h
new file mode 100644
index 000..2680ecf
--- /dev/null
+++ b/include/linux/inet_lro.h
@@ -0,0 +1,154 @@
+/*
+ *  linux/include/linux/inet_lro.h
+ *
+ *  Large Receive Offload (ipv4 / tcp)
+ *
+ *  (C) Copyright IBM Corp. 2007
+ *
+ *  Authors:
+ *   Jan-Bernd Themann [EMAIL PROTECTED]
+ *   Christoph Raisch [EMAIL PROTECTED]
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef __INET_LRO_H_
+#define __INET_LRO_H_
+
+#include <net/ip.h>
+#include <net/tcp.h>
+
+#define LRO_IPV4 1
+#define LRO_TCP 2
+
+/*
+ * LRO descriptor for a tcp session
+ */
+struct net_lro_desc {
+   struct sk_buff *parent;
+   struct sk_buff *last_skb;
+   struct skb_frag_struct *next_frag;
+   struct iphdr *iph;
+   struct tcphdr *tcph;
+   struct vlan_group *vgrp;
+   __wsum  data_csum;
+   u32 tcp_rcv_tsecr;
+   u32 tcp_rcv_tsval;
+   u32 tcp_ack;
+   u32 tcp_next_seq;
+   u32 skb_tot_frags_len;
+   u16 ip_tot_len;
+   u16 tcp_saw_tstamp; /* timestamps enabled */
+   u16 tcp_window;
+   u16 vlan_tag;
+   int pkt_aggr_cnt;   /* counts aggregated packets */
+   int vlan_packet;
+   int active;
+};
+
+/*
+ * Large Receive Offload (LRO) Manager
+ *
+ * Fields must be set by driver
+ */
+
+struct net_lro_mgr {
+   struct net_device *dev; /* Required for receive in page mode */
+   u32 ip_summed; /* Options to be set in generated SKB in page mode */
+   int max_desc; /* Max number of LRO descriptors  */
+   int max_aggr; /* Max number of LRO packets to be aggregated */
+
+   struct net_lro_desc *lro_arr; /* Array of LRO descriptors */
+
+   /*
+* Optimized driver functions
+*
+* get_skb_header: returns tcp and ip header for packet in SKB
+*/
+   int (*get_skb_header)(struct sk_buff *skb, void **ip_hdr,
+ void **tcpudp_hdr, u64 *hdr_flags, void *priv);
+
+   /*
+* get_frag_header: returns mac, tcp and ip header for packet in SKB
+*
+* @hdr_flags: Indicate what kind of LRO has to be done
+* (IPv4/IPv6/TCP/UDP)
+*/
+   int (*get_frag_header)(struct skb_frag_struct *frag, void **mac_hdr,
+  void **ip_hdr, void **tcpudp_hdr, u64 *hdr_flags,
+  void *priv);
+};
+
+/*
+ * Processes a SKB
+ *
+ * @lro_mgr: LRO manager to use
+ * @skb: SKB to aggregate
+ * @priv: Private data that may be used by driver functions
+ *(for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_skb(struct net_lro_mgr *lro_mgr,
+struct sk_buff *skb,
+void *priv);
+
+/*
+ * Processes a SKB with VLAN HW acceleration support
+ */
+
+void lro_vlan_hwaccel_receive_skb(struct net_lro_mgr *lro_mgr,
+ struct sk_buff *skb,
+ struct vlan_group *vgrp,
+ u16 vlan_tag,
+ void *priv);
+
+/*
+ * Processes a fragment list
+ *
+ * This function aggregates fragments and generates SKBs to pass
+ * the packets to the stack.
+ *
+ * @lro_mgr: LRO manager to use
+ * @frags: Fragment to be processed. Must contain entire header in first
+ * element.
+ * @len: Length of received data
+ * @true_size: Actual size of memory the fragment is consuming
+ * @priv: Private data that may be used by driver functions
+ *(for example get_tcp_ip_hdr)
+ */
+
+void lro_receive_frags(struct net_lro_mgr *lro_mgr,
+  struct skb_frag_struct *frags,
+  int len, int true_size, void *priv);
+
+void lro_vlan_hwaccel_receive_frags(struct net_lro_mgr *lro_mgr,
+   struct skb_frag_struct *frags,
+   int len,
+   int true_size

Re: [RFC 0/3] lro: Generic Large Receive Offload for TCP traffic (IPv6)

2007-07-18 Thread Jan-Bernd Themann
Hi,

I suggest we keep the interface open for IPv6 support by adding 
an additional parameter but first just get IPv4 support only 
into the kernel. IPv6 support can then incrementially be added.
Would that be ok?



On Sunday 15 July 2007 11:40, David Miller wrote:
 From: Christoph Hellwig [EMAIL PROTECTED]
 Date: Sun, 15 Jul 2007 10:12:53 +0100
 
  I'm not sure that's a good idea.  If current chips can't handle ipv6
  lro there is no way to actually test it and the code will surely bitrot.
 
 Christoph, you can do LRO pretty much completely in software.
 
 
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH 1/2] eHEA: Capability flag for DLPAR support

2007-07-05 Thread Jan-Bernd Themann
This patch introduces a capability flag that is used by the DLPAR userspace
tool to check which DLPAR features are supported by the eHEA driver.

Missing goto has been included.
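
For illustration, the userspace side boils down to reading a single sysfs
file (the exact path is an assumption based on where driver_create_file()
places attributes for an ibmebus driver):

#include <stdio.h>

int main(void)
{
	unsigned int caps = 0;
	FILE *f = fopen("/sys/bus/ibmebus/drivers/ehea/capabilities", "r");

	if (!f)
		return 1;
	if (fscanf(f, "%u", &caps) != 1)
		caps = 0;
	fclose(f);

	if (caps & 1)	/* DLPAR_PORT_ADD_REM */
		printf("DLPAR port add/remove supported\n");

	return 0;
}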

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
---
 drivers/net/ehea/ehea.h  |8 +++-
 drivers/net/ehea/ehea_main.c |   23 ++-
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index abaf3ac..f03f070 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,13 @@
 #include asm/io.h
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0065"
+#define DRV_VERSION	"EHEA_0067"
+
+/* EHEA capability flags */
+#define DLPAR_PORT_ADD_REM 1
+#define DLPAR_MEM_ADD 2
+#define DLPAR_MEM_REM 4
+#define EHEA_CAPABILITIES (DLPAR_PORT_ADD_REM)
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index bdb5241..383144d 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -2923,6 +2923,15 @@ static int check_module_parm(void)
return ret;
 }
 
+static ssize_t ehea_show_capabilities(struct device_driver *drv,
+				      char *buf)
+{
+	return sprintf(buf, "%d", EHEA_CAPABILITIES);
+}
+
+static DRIVER_ATTR(capabilities, S_IRUSR | S_IRGRP | S_IROTH,
+		   ehea_show_capabilities, NULL);
+
 int __init ehea_module_init(void)
 {
int ret;
@@ -2934,8 +2943,19 @@ int __init ehea_module_init(void)
 	if (ret)
 		goto out;
 	ret = ibmebus_register_driver(&ehea_driver);
-	if (ret)
+	if (ret) {
 		ehea_error("failed registering eHEA device driver on ebus");
+		goto out;
+	}
+
+	ret = driver_create_file(&ehea_driver.driver,
+				 &driver_attr_capabilities);
+	if (ret) {
+		ehea_error("failed to register capabilities attribute, ret=%d",
+			   ret);
+		ibmebus_unregister_driver(&ehea_driver);
+		goto out;
+	}
 
 out:
return ret;
@@ -2943,6 +2963,7 @@ out:
 
 static void __exit ehea_module_exit(void)
 {
+	driver_remove_file(&ehea_driver.driver, &driver_attr_capabilities);
 	ibmebus_unregister_driver(&ehea_driver);
 }
 
-- 
1.5.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


[PATCH/RESENT] ehea: Whitespace cleanup

2007-07-02 Thread Jan-Bernd Themann
This patch fixes several whitespace issues.

Signed-off-by: Jan-Bernd Themann [EMAIL PROTECTED]
---


diff --git a/drivers/net/ehea/ehea.h b/drivers/net/ehea/ehea.h
index c0f81b5..abaf3ac 100644
--- a/drivers/net/ehea/ehea.h
+++ b/drivers/net/ehea/ehea.h
@@ -39,7 +39,7 @@
 #include asm/io.h
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0064"
+#define DRV_VERSION	"EHEA_0065"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
@@ -136,10 +136,10 @@ void ehea_dump(void *adr, int len, char *msg);
 	(0xffffffffffffffffULL >> ((64 - (mask)) & 0xffff))
 
 #define EHEA_BMASK_SET(mask, value) \
-((EHEA_BMASK_MASK(mask) & ((u64)(value))) << EHEA_BMASK_SHIFTPOS(mask))
+	((EHEA_BMASK_MASK(mask) & ((u64)(value))) << EHEA_BMASK_SHIFTPOS(mask))
 
 #define EHEA_BMASK_GET(mask, value) \
-(EHEA_BMASK_MASK(mask) & (((u64)(value)) >> EHEA_BMASK_SHIFTPOS(mask)))
+	(EHEA_BMASK_MASK(mask) & (((u64)(value)) >> EHEA_BMASK_SHIFTPOS(mask)))
 
 /*
  * Generic ehea page
@@ -190,7 +190,7 @@ struct ehea_av;
  * Queue attributes passed to ehea_create_qp()
  */
 struct ehea_qp_init_attr {
-/* input parameter */
+   /* input parameter */
u32 qp_token;   /* queue token */
u8 low_lat_rq1;
u8 signalingtype;   /* cqe generation flag */
@@ -212,7 +212,7 @@ struct ehea_qp_init_attr {
u64 recv_cq_handle;
u64 aff_eq_handle;
 
-/* output parameter */
+   /* output parameter */
u32 qp_nr;
u16 act_nr_send_wqes;
u16 act_nr_rwqes_rq1;
@@ -279,12 +279,12 @@ struct ehea_qp {
  * Completion Queue attributes
  */
 struct ehea_cq_attr {
-/* input parameter */
+   /* input parameter */
u32 max_nr_of_cqes;
u32 cq_token;
u64 eq_handle;
 
-/* output parameter */
+   /* output parameter */
u32 act_nr_of_cqes;
u32 nr_pages;
 };
diff --git a/drivers/net/ehea/ehea_hw.h b/drivers/net/ehea/ehea_hw.h
index 1246757..1af7ca4 100644
--- a/drivers/net/ehea/ehea_hw.h
+++ b/drivers/net/ehea/ehea_hw.h
@@ -211,34 +211,34 @@ static inline void epa_store_acc(struct h_epa epa, u32 
offset, u64 value)
 }
 
 #define epa_store_eq(epa, offset, value)\
-epa_store(epa, EQTEMM_OFFSET(offset), value)
+   epa_store(epa, EQTEMM_OFFSET(offset), value)
 #define epa_load_eq(epa, offset)\
-epa_load(epa, EQTEMM_OFFSET(offset))
+   epa_load(epa, EQTEMM_OFFSET(offset))
 
 #define epa_store_cq(epa, offset, value)\
-epa_store(epa, CQTEMM_OFFSET(offset), value)
+   epa_store(epa, CQTEMM_OFFSET(offset), value)
 #define epa_load_cq(epa, offset)\
-epa_load(epa, CQTEMM_OFFSET(offset))
+   epa_load(epa, CQTEMM_OFFSET(offset))
 
 #define epa_store_qp(epa, offset, value)\
-epa_store(epa, QPTEMM_OFFSET(offset), value)
+   epa_store(epa, QPTEMM_OFFSET(offset), value)
 #define epa_load_qp(epa, offset)\
-epa_load(epa, QPTEMM_OFFSET(offset))
+   epa_load(epa, QPTEMM_OFFSET(offset))
 
 #define epa_store_qped(epa, offset, value)\
-epa_store(epa, QPEDMM_OFFSET(offset), value)
+   epa_store(epa, QPEDMM_OFFSET(offset), value)
 #define epa_load_qped(epa, offset)\
-epa_load(epa, QPEDMM_OFFSET(offset))
+   epa_load(epa, QPEDMM_OFFSET(offset))
 
 #define epa_store_mrmw(epa, offset, value)\
-epa_store(epa, MRMWMM_OFFSET(offset), value)
+   epa_store(epa, MRMWMM_OFFSET(offset), value)
 #define epa_load_mrmw(epa, offset)\
-epa_load(epa, MRMWMM_OFFSET(offset))
+   epa_load(epa, MRMWMM_OFFSET(offset))
 
 #define epa_store_base(epa, offset, value)\
-epa_store(epa, HCAGR_OFFSET(offset), value)
+   epa_store(epa, HCAGR_OFFSET(offset), value)
 #define epa_load_base(epa, offset)\
-epa_load(epa, HCAGR_OFFSET(offset))
+   epa_load(epa, HCAGR_OFFSET(offset))
 
 static inline void ehea_update_sqa(struct ehea_qp *qp, u16 nr_wqes)
 {
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index 9e13433..bdb5241 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -81,7 +81,7 @@ MODULE_PARM_DESC(use_mcs,  0:NAPI, 1:Multiple receive 
queues, Default = 1 );
 static int port_name_cnt = 0;
 
 static int __devinit ehea_probe_adapter(struct ibmebus_dev *dev,
-const struct of_device_id *id);
+   const struct of_device_id *id);
 
 static int __devexit ehea_remove(struct ibmebus_dev *dev);
 
@@ -236,7 +236,7 @@ static int ehea_refill_rq_def(struct ehea_port_res *pr,
 
 		rwqe = ehea_get_next_rwqe(qp, rq_nr);
 		rwqe->wr_id = EHEA_BMASK_SET(EHEA_WR_ID_TYPE, wqe_type)
-		            | EHEA_BMASK_SET(EHEA_WR_ID_INDEX, index);
+			    | EHEA_BMASK_SET(EHEA_WR_ID_INDEX, index);
 		rwqe->sg_list[0].l_key = pr->recv_mr.lkey;
 		rwqe->sg_list[0].vaddr = (u64)skb