Re: Fix Emulex oce driver in CURRENT

2014-12-05 Thread Steven Hartland


On 04/09/2014 09:49, Borja Marcos wrote:

On Jun 30, 2014, at 8:02 PM, John Baldwin wrote:


I think these sound fine, but I've cc'd Xin Li (delphij@) who has worked with
folks at Emulex to maintain this driver.  He is probably the best person to
review this.

Hi,

It seems 10.1 is in the pipeline now, but as far as I know none of these fixes
have been applied to -STABLE. Any chance of doing it yet? As far as I know, the
oce driver is currently unusable in -STABLE. I managed to cause a panic
reliably within 30 seconds.


Was there any conclusion to this? Current, releng/10.0 and releng/10.1
seem pretty similar with regard to oce, but a customer is reporting panics
very similar to the ones in this thread.


Did the commit of the additional locking never make it in?

Regards
Steve
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Fix Emulex oce driver in CURRENT

2014-12-05 Thread Borja Marcos

On Dec 5, 2014, at 2:00 PM, Steven Hartland wrote:

 
 On 04/09/2014 09:49, Borja Marcos wrote:
 On Jun 30, 2014, at 8:02 PM, John Baldwin wrote:
 
 I think these sound fine, but I've cc'd Xin Li (delphij@) who has worked with
 folks at Emulex to maintain this driver.  He is probably the best person to
 review this.
 Hi,
 
 It seems 10.1 is in the pipeline now, but as far as I know none of these fixes
 have been applied to -STABLE. Any chance of doing it yet? As far as I know,
 the oce driver is currently unusable in -STABLE. I managed to cause a
 panic reliably within 30 seconds.
 
 Was there any conclusion to this? Current, releng/10.0 and releng/10.1 seem
 pretty similar with regard to oce, but a customer is reporting panics very
 similar to the ones in this thread.
 
 Did the commit of the additional locking never make it in?

Not as far as I know. I've updated a couple of machines here to 10-STABLE and
I've been applying the patch manually myself.

I don't think it's been applied even to -HEAD. 

For now I've told my coworkers to avoid Emulex cards whenever possible. As far
as I know the driver is unusable in its present state.

Borja.



Re: Fix Emulex oce driver in CURRENT

2014-12-05 Thread Steven Hartland


On 05/12/2014 13:07, Borja Marcos wrote:

On Dec 5, 2014, at 2:00 PM, Steven Hartland wrote:


On 04/09/2014 09:49, Borja Marcos wrote:

On Jun 30, 2014, at 8:02 PM, John Baldwin wrote:


I think these sound fine, but I've cc'd Xin Li (delphij@) who has worked with
folks at Emulex to maintain this driver.  He is probably the best person to
review this.

Hi,

It seems 10.1 is in the pipeline now, but as far as I know none of these fixes
have been applied to -STABLE. Any chance of doing it yet? As far as I know, the
oce driver is currently unusable in -STABLE. I managed to cause a panic
reliably within 30 seconds.

Was there any conclusion to this? Current, releng/10.0 and releng/10.1 seem
pretty similar with regard to oce, but a customer is reporting panics very
similar to the ones in this thread.

Did the commit of the additional locking never make it in?

Not as far as I know. I've updated a couple of machines here to 10-STABLE and
I've been applying the patch manually myself.

I don't think it's been applied even to -HEAD.

For now I've told my coworkers to avoid Emulex cards whenever possible. As far
as I know the driver is unusable in its present state.

Thanks for the quick reply, Borja. A review of the patch is now up:
https://reviews.freebsd.org/D1269

Hopefully we can get this in the tree and make oce usable moving forward.

Regards
Steve


Re: Fix Emulex oce driver in CURRENT

2014-12-05 Thread sthaug
  It seems 10.1 is in the pipeline now, but as far as I know none of these
  fixes have been applied to -STABLE. Any chance of doing it yet? As far as I
  know, the oce driver is currently unusable in -STABLE. I managed to
  cause a panic reliably within 30 seconds.
  
  Was there any conclusion to this? Current, releng/10.0 and releng/10.1
  seem pretty similar with regard to oce, but a customer is reporting panics
  very similar to the ones in this thread.
  
  Did the commit of the additional locking never make it in?
 
 Not as far as I know. I've updated a couple of machines here to 10-STABLE and
 I've been applying the patch manually myself.
 
 I don't think it's been applied even to -HEAD. 

Where can I find a version of the patch to be applied to 10-STABLE? Is
this the one?

 https://bz-attachments.freebsd.org/attachment.cgi?id=144718

Steinar Haug, Nethelp consulting, sth...@nethelp.no


Re: Fix Emulex oce driver in CURRENT

2014-09-04 Thread Borja Marcos

On Jun 30, 2014, at 8:02 PM, John Baldwin wrote:

 
 I think these sound fine, but I've cc'd Xin Li (delphij@) who has worked with
 folks at Emulex to maintain this driver.  He is probably the best person to
 review this.

Hi,

It seems 10.1 is in the pipeline now, but as far as I know none of these fixes
have been applied to -STABLE. Any chance of doing it yet? As far as I know, the
oce driver is currently unusable in -STABLE. I managed to cause a panic
reliably within 30 seconds.

Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Stefano Garzarella
Hi,
I found other problems in the oce driver during some experiments with
netmap in emulation mode.

In detail:
- missing locking:
  in some functions there are write accesses to the wq struct (tx queue
  descriptor) without acquiring the LOCK on the queue, particularly in
  oce_wq_handler(), which is invoked in the interrupt routine. For this
  reason there may be race conditions.

- tx cleanup:
  in oce_if_deactivate() the wq queues are drained, but some still-pending
  mbufs are not freed. For this reason, I added oce_tx_clean(), which
  releases any pending mbufs.
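The locking problem described above can be illustrated with a small userspace model. This is a hypothetical sketch, not driver code: `model_wq`, `model_tx()` and `model_wq_handler()` are invented names, and a C11 `atomic_flag` spinlock stands in for the driver's `tx_lock` mutex. The transmit path advances `head` while the completion handler advances `tail`; because both read and write the same indices, both take the same lock, which is the idea behind adding `LOCK`/`UNLOCK` around `oce_wq_handler()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8  /* hypothetical ring size; a power of two so % stays consistent on wrap */

/* Minimal spinlock standing in for the driver's tx_lock mutex. */
struct spinlock {
	atomic_flag flag;
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		; /* busy-wait until the current holder releases the lock */
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical work-queue model: head is advanced by the transmit path,
 * tail by the completion (interrupt) handler. */
struct model_wq {
	struct spinlock tx_lock;
	unsigned head;          /* next slot the transmit path fills */
	unsigned tail;          /* next slot the completion handler reclaims */
	int slot[RING_SIZE];    /* stand-in for per-descriptor mbuf pointers */
};

static void model_wq_init(struct model_wq *wq)
{
	atomic_flag_clear(&wq->tx_lock.flag);
	wq->head = 0;
	wq->tail = 0;
}

/* Transmit path: returns false when the ring is full. */
static bool model_tx(struct model_wq *wq, int pkt)
{
	bool ok = false;

	spin_lock(&wq->tx_lock);
	if (wq->head - wq->tail < RING_SIZE) {
		wq->slot[wq->head % RING_SIZE] = pkt;
		wq->head++;
		ok = true;
	}
	spin_unlock(&wq->tx_lock);
	return ok;
}

/* Completion handler: reclaims up to n finished descriptors while holding
 * the same lock as the transmit path. */
static unsigned model_wq_handler(struct model_wq *wq, unsigned n)
{
	unsigned done = 0;

	spin_lock(&wq->tx_lock);
	while (done < n && wq->tail != wq->head) {
		wq->slot[wq->tail % RING_SIZE] = 0;
		wq->tail++;
		done++;
	}
	spin_unlock(&wq->tx_lock);
	return done;
}
```

The single-threaded usage only checks the head/tail bookkeeping; it does not reproduce a race. The design point is simply that every writer of `head` or `tail` goes through the same lock.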

I also tried experimenting with iperf3 using the same environment as Borja, and
I don't get a panic.
Can you try this patch? Do you still have the panic?

Cheers,
Stefano Garzarella


diff --git a/sys/dev/oce/oce_if.c b/sys/dev/oce/oce_if.c
index af57491..33b35b4 100644
--- a/sys/dev/oce/oce_if.c
+++ b/sys/dev/oce/oce_if.c
@@ -142,6 +142,7 @@ static int  oce_tx(POCE_SOFTC sc, struct mbuf **mpp, int wq_index);
 static void oce_tx_restart(POCE_SOFTC sc, struct oce_wq *wq);
 static void oce_tx_complete(struct oce_wq *wq, uint32_t wqe_idx,
 					uint32_t status);
+static void oce_tx_clean(POCE_SOFTC sc);
 static int  oce_multiq_transmit(struct ifnet *ifp, struct mbuf *m,
 				 struct oce_wq *wq);
 
@@ -585,8 +586,10 @@ oce_multiq_flush(struct ifnet *ifp)
 	int i = 0;
 
 	for (i = 0; i < sc->nwqs; i++) {
+		LOCK(&sc->wq[i]->tx_lock);
 		while ((m = buf_ring_dequeue_sc(sc->wq[i]->br)) != NULL)
 			m_freem(m);
+		UNLOCK(&sc->wq[i]->tx_lock);
 	}
 	if_qflush(ifp);
 }
@@ -1052,6 +1055,19 @@ oce_tx_complete(struct oce_wq *wq, uint32_t wqe_idx, uint32_t status)
 	}
 }
 
+static void
+oce_tx_clean(POCE_SOFTC sc) {
+	int i = 0;
+	struct oce_wq *wq;
+
+	for_all_wq_queues(sc, wq, i) {
+		LOCK(&wq->tx_lock);
+		while (wq->pkt_desc_tail != wq->pkt_desc_head) {
+			oce_tx_complete(wq, 0, 0);
+		}
+		UNLOCK(&wq->tx_lock);
+	}
+}
 
 static void
 oce_tx_restart(POCE_SOFTC sc, struct oce_wq *wq)
@@ -1213,6 +1229,8 @@ oce_wq_handler(void *arg)
 	struct oce_nic_tx_cqe *cqe;
 	int num_cqes = 0;
 
+	LOCK(&wq->tx_lock);
+
 	bus_dmamap_sync(cq->ring->dma.tag,
 			cq->ring->dma.map, BUS_DMASYNC_POSTWRITE);
 	cqe = RING_GET_CONSUMER_ITEM_VA(cq->ring, struct oce_nic_tx_cqe);
@@ -1237,6 +1255,8 @@ oce_wq_handler(void *arg)
 	if (num_cqes)
 		oce_arm_cq(sc, cq->cq_id, num_cqes, FALSE);
 
+	UNLOCK(&wq->tx_lock);
+
 	return 0;
 }
 
@@ -2087,6 +2107,9 @@ oce_if_deactivate(POCE_SOFTC sc)
 	/* Delete RX queue in card with flush param */
 	oce_stop_rx(sc);
 
+	/* Flush the mbufs that are still in TX queues */
+	oce_tx_clean(sc);
+
 	/* Invalidate any pending cq and eq entries*/
 	for_all_evnt_queues(sc, eq, i)
 		oce_drain_eq(eq);
diff --git a/sys/dev/oce/oce_queue.c b/sys/dev/oce/oce_queue.c
index 308c16d..161011b 100644
--- a/sys/dev/oce/oce_queue.c
+++ b/sys/dev/oce/oce_queue.c
@@ -969,7 +969,9 @@ oce_start_rq(struct oce_rq *rq)
 int
 oce_start_wq(struct oce_wq *wq)
 {
+	LOCK(&wq->tx_lock); /* XXX: maybe not necessary */
 	oce_arm_cq(wq->parent, wq->cq->cq_id, 0, TRUE);
+	UNLOCK(&wq->tx_lock);
 	return 0;
 }
 
@@ -1076,6 +1078,8 @@ oce_drain_wq_cq(struct oce_wq *wq)
 	struct oce_nic_tx_cqe *cqe;
 	int num_cqes = 0;
 
+	LOCK(&wq->tx_lock); /* XXX: maybe not necessary */
+
 	bus_dmamap_sync(cq->ring->dma.tag, cq->ring->dma.map,
 			BUS_DMASYNC_POSTWRITE);
 
@@ -1093,6 +1097,7 @@ oce_drain_wq_cq(struct oce_wq *wq)
 
 	oce_arm_cq(sc, cq->cq_id, num_cqes, FALSE);
 
+	UNLOCK(&wq->tx_lock);
 }
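What `oce_tx_clean()` is meant to achieve can be shown with a minimal, single-threaded sketch. It assumes, as the patch suggests, that completing a descriptor frees its mbuf and advances `pkt_desc_tail`. `model_wq`, `model_enqueue()`, `model_tx_complete()` and the `outstanding` leak counter are hypothetical; `malloc()`/`free()` stand in for mbuf allocation, and the `tx_lock` taken in the real patch is omitted because nothing here runs concurrently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 16  /* hypothetical descriptor ring size */

/* Buffers allocated and not yet freed: a simple leak detector. */
static int outstanding;

struct pkt_desc {
	void *mbuf;  /* stand-in for the mbuf attached to a descriptor */
};

struct model_wq {
	unsigned head;  /* like pkt_desc_head: next descriptor to fill */
	unsigned tail;  /* like pkt_desc_tail: oldest unfinished descriptor */
	struct pkt_desc desc[RING_SIZE];
};

/* Queue one freshly allocated buffer; returns -1 if the ring is full. */
static int model_enqueue(struct model_wq *wq)
{
	void *m;

	if (wq->head - wq->tail >= RING_SIZE)
		return -1;
	m = malloc(64);
	if (m == NULL)
		return -1;
	outstanding++;
	wq->desc[wq->head % RING_SIZE].mbuf = m;
	wq->head++;
	return 0;
}

/* Finish the oldest descriptor: free its buffer and advance tail. */
static void model_tx_complete(struct model_wq *wq)
{
	struct pkt_desc *d = &wq->desc[wq->tail % RING_SIZE];

	if (d->mbuf != NULL) {
		free(d->mbuf);
		outstanding--;
		d->mbuf = NULL;
	}
	wq->tail++;
}

/* On interface shutdown, reclaim every descriptor the hardware never
 * reported as completed (the same loop shape as oce_tx_clean()). */
static void model_tx_clean(struct model_wq *wq)
{
	while (wq->tail != wq->head)
		model_tx_complete(wq);
}
```

Without the final clean-up loop, any descriptors still between `tail` and `head` when the queue is torn down keep their buffers forever, which is the leak the patch addresses.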



2014-07-07 13:57 GMT+02:00 Borja Marcos bor...@sarenet.es:


 On Jul 7, 2014, at 1:23 PM, Luigi Rizzo wrote:

  On Mon, Jul 7, 2014 at 1:03 PM, Borja Marcos bor...@sarenet.es wrote:
  We'll try to investigate. Can you tell us more about the environment you
  use? (FreeBSD version, card model (PCI ID perhaps), iperf3 invocation line,
  interface configuration, etc.)
 
  The main difference between 10.0.747.0 and the code in head (after our
  fix) is the use of drbr_enqueue/dequeue versus peek/putback in the
  transmit routine.
 
  Both drivers still have issues when the link flaps because the transmit
  queue is not cleaned up properly (unlike what happens in the Linux driver
  and all FreeBSD drivers for other hardware), so it might well be that you
  are seeing a side effect of that or of some other problem which manifests
  itself differently depending on the environment.
 
  'Instant panic' by itself does not tell us anything about what the problem
  you are experiencing could be (and we do not see it with either driver).

 The environment details are here:

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

 The way I produce an instant panic is:

 1) Connect to another machine (cross connect cable)

 2) iperf3 -s on the other machine
 (The other machine is different, it has an ix card)

 3) iperf3 -t 30 -P 4 -c 10.0.0.1 -N

 In less than 30 seconds, panic.



 mierda dumped core - see /var/crash/vmcore.0

 Mon Jul  7 13:06:44 CEST 2014

 FreeBSD mierda 10.0-STABLE FreeBSD 10.0-STABLE #2: Mon Jul  7 11:41:45
 CEST 2014 

Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Borja Marcos

On Jul 15, 2014, at 10:22 AM, Stefano Garzarella wrote:

 Hi,
 I found other problems in the oce driver during some experiments with
 netmap in emulation mode.

What about driver version 10.0.747.0? At least in my configuration it works
perfectly, no crashes despite keeping it running for several days at full 
bandwidth.

I have a server about to go into production. Should this patch work on 
10-STABLE?

Borja.




Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Stefano Garzarella
I used the oce driver in CURRENT.
I think that this patch, in combination with the previous one, should work in
10-STABLE.

I have only tested it with CURRENT, but now I'll try it with 10-STABLE and
send you some feedback.

Cheers,
Stefano


2014-07-15 10:28 GMT+02:00 Borja Marcos bor...@sarenet.es:


 On Jul 15, 2014, at 10:22 AM, Stefano Garzarella wrote:

  Hi,
  I found other problems in the oce driver during some experiments with
  netmap in emulation mode.

 What about driver  version 10.0.747.0? At least in my configuration it
 works perfectly, no crashes despite keeping it running for several days at
 full bandwidth.

 I have a server about to go into production. Should this patch work on
 10-STABLE?






 Borja.





-- 
Stefano Garzarella


Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Borja Marcos

On Jul 15, 2014, at 10:43 AM, Stefano Garzarella wrote:

 I used the oce driver in CURRENT. 
 I think that this patch in combination with the previous one should work in 
 10-STABLE.
 
 I have only tested if it works with CURRENT, but now I try if it works with 
 10-STABLE and I'll send you some feedback.

I can still try. Will get back to you soon.


Cheers,




Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Borja Marcos

On Jul 15, 2014, at 10:43 AM, Stefano Garzarella wrote:

 I used the oce driver in CURRENT.
 I think that this patch in combination with the previous one should work in
 10-STABLE.
 
 I have only tested if it works with CURRENT, but now I try if it works with
 10-STABLE and I'll send you some feedback.

Hmmm. The patch seems to be broken. I have tried to apply it renaming the 
a/usr/src... to oce_if.c.old and oce_if.c, etc, and patch complains:

Patching file oce_if.c using Plan A...
patch: **** malformed patch at line 6: int wq_index);


Was it broken by the email client formatting? Or am I being especially clumsy 
today? ;)




Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Stefano Garzarella
I think there is some problem with the email formatting.
I'm sending you a file with both patches.

Cheers,
Stefano


2014-07-15 11:12 GMT+02:00 Borja Marcos bor...@sarenet.es:


 On Jul 15, 2014, at 10:43 AM, Stefano Garzarella wrote:

  I used the oce driver in CURRENT.
  I think that this patch in combination with the previous one should work
 in
  10-STABLE.
 
  I have only tested if it works with CURRENT, but now I try if it works
 with
  10-STABLE and I'll send you some feedback.

 Hmmm. The patch seems to be broken. I have tried to apply it renaming the
 a/usr/src... to oce_if.c.old and oce_if.c, etc, and patch complains:

 Patching file oce_if.c using Plan A...
 patch: **** malformed patch at line 6: int wq_index);


 Was it broken by the email client formatting? Or am I being especially
 clumsy today? ;)




 Borja.




-- 
Stefano Garzarella


[Attachment: oce_fix_STABLE10.patch]

Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Stefano Garzarella
I just tried to run iperf3 with this patch on 10-STABLE and it seems to
work.
Do you have a panic?

Cheers,
Stefano


2014-07-15 11:19 GMT+02:00 Stefano Garzarella stefanogarzare...@gmail.com:

 I think there is some problem with the email formatting.
 I'm sending you a file with both patches.

 Cheers,
 Stefano


 2014-07-15 11:12 GMT+02:00 Borja Marcos bor...@sarenet.es:


 On Jul 15, 2014, at 10:43 AM, Stefano Garzarella wrote:

  I used the oce driver in CURRENT.
  I think that this patch in combination with the previous one should
 work in
  10-STABLE.
 
  I have only tested if it works with CURRENT, but now I try if it works
 with
  10-STABLE and I'll send you some feedback.

 Hmmm. The patch seems to be broken. I have tried to apply it renaming the
 a/usr/src... to oce_if.c.old and oce_if.c, etc, and patch complains:

 Patching file oce_if.c using Plan A...
 patch: **** malformed patch at line 6: int wq_index);


 Was it broken by the email client formatting? Or am I being especially
 clumsy today? ;)




 Borja.




 --
 Stefano Garzarella




-- 
Stefano Garzarella


Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Borja Marcos

On Jul 15, 2014, at 11:45 AM, Stefano Garzarella wrote:

 I just tried to run iperf3 with this patch on 10-STABLE and it seems to work.
 Do you have a panic?

Still compiling :) Anyway, you didn't suffer panics before, right?




Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Stefano Garzarella
2014-07-15 11:46 GMT+02:00 Borja Marcos bor...@sarenet.es:


 On Jul 15, 2014, at 11:45 AM, Stefano Garzarella wrote:

  I just tried to run iperf3 with this patch on 10-STABLE and it seems to
 work.
  Do you have a panic?

 Still compiling :) Anyway, you didn't suffer panics before, right?


Right, I didn't suffer panics with iperf3, but with netmap in emulation
mode I had a lot of panics before this patch.

Stefano





 Borja.




-- 
Stefano Garzarella


Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Borja Marcos

On Jul 15, 2014, at 11:45 AM, Stefano Garzarella wrote:

 I just tried to run iperf3 with this patch on 10-STABLE and it seems to
 work.
 Do you have a panic?

So far, so good. I've run a couple of iperf3 tests (60 seconds, trying both
directions) and it doesn't crash.

Without the fixes I obtained a panic quite reliably, in less than 30 seconds.

Still trying. But the bugs you mentioned (lack of locking and deallocating, 
etc) seem to be consistent with the kind of failures I saw and their apparent 
randomness.

So, asking for spiritual counsel now. Would you use this driver in a
production environment instead of the 747 version downloaded from Emulex? I 
think the latter is giving slightly better performance but, anyway, I disable 
LRO and TSO because I see a horrible impact on NFS performance.

Cheers,

Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Stefano Garzarella
2014-07-15 12:00 GMT+02:00 Borja Marcos bor...@sarenet.es:


 On Jul 15, 2014, at 11:45 AM, Stefano Garzarella wrote:

  I just tried to run iperf3 with this patch on 10-STABLE and it seems to
  work.
  Do you have a panic?

 So far, so good. I've run a couple of iperf3 tests (60 seconds, trying
 both directions) and it doesn't crash.

 Without the fixes I obtained a panic quite reliably, in less than 30
 seconds.


 Still trying. But the bugs you mentioned (lack of locking and
 deallocating, etc) seem to be consistent with the kind of failures I saw
 and their apparent randomness.


Well.



 So, asking for spiritual counsel now. Would you use this driver in a
 production environment instead of the 747 version downloaded from Emulex? I
 think the latter is giving slightly better performance but, anyway, I
 disable LRO and TSO because I see a horrible impact on NFS performance.


I made a diff between the two versions (CURRENT and 747) and I saw that the
main difference is in the management of buf_ring through the drbr API.
The CURRENT driver uses a new function, drbr_peek(), instead of
drbr_dequeue(), and I think this is better.
However, the 747 version also seems to suffer from the missing locking.

Cheers,
Stefano

Cheers,





 Borja.




-- 
Stefano Garzarella
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Fix Emulex oce driver in CURRENT

2014-07-15 Thread Borja Marcos

On Jul 15, 2014, at 1:36 PM, Stefano Garzarella wrote:

 So, asking for spiritual counsel now. Would you use this driver in a 
 production environment instead of the 747 version downloaded from Emulex? I 
 think the latter is giving slightly better performance but, anyway, I disable 
 LRO and TSO because I see a horrible impact on NFS performance.
 
 
 I made a diff between the two versions (CURRENT and 747) and I saw that the
 main difference is in the management of buf_ring through the drbr API.
 The CURRENT driver uses a new function, drbr_peek(), instead of
 drbr_dequeue(), and I think this is better.
 However, the 747 version also seems to suffer from the missing locking.

Well, you definitely saved my bacon! So it was still a ticking time bomb.

Thank you very much!




Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-07 Thread Borja Marcos

On Jul 1, 2014, at 10:24 PM, Luigi Rizzo wrote:

 
 
 
 On Tue, Jul 1, 2014 at 8:58 PM, bor...@sarenet.es wrote:
 On 30.06.2014 18:36, Stefano Garzarella wrote:
 
 Hello,
 I had problems during some experiments with Emulex and oce driver in
 CURRENT.
 I found several bugs in the oce driver and this patch fixes them.
 
 At least with some cards, the driver simply does not work. It causes a panic 
 when there is some traffic.
 
 The relevant bug report is here.
 
 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391
 
 The latest version available from the Emulex website works. But the version 
 bundled with 9.3 and at least -STABLE (which is the same version bundled with 
 -CURRENT) does cause panics on 10- and 9-
 
 I compared the code on the Emulex website (10.0.747.0 ?) with the
 one in HEAD and it does not seem much different, but perhaps
 you have some other version in mind?
 
 The bugs found by Stefano also exist in the Emulex version above.

Anyway, the fixed version panics instantly when generating traffic (just use
iperf3). Version 10.0.747.0 does _not_ panic.

Borja.


Re: Fix Emulex oce driver in CURRENT

2014-07-07 Thread Luigi Rizzo
On Mon, Jul 7, 2014 at 1:03 PM, Borja Marcos bor...@sarenet.es wrote:

 On Jul 1, 2014, at 10:24 PM, Luigi Rizzo wrote:




 On Tue, Jul 1, 2014 at 8:58 PM, bor...@sarenet.es wrote:
 On 30.06.2014 18:36, Stefano Garzarella wrote:

 Hello,
 I had problems during some experiments with Emulex and oce driver in
 CURRENT.
 I found several bugs in the oce driver and this patch fixes them.

 At least with some cards, the driver simply does not work. It causes a panic 
 when there is some traffic.

 The relevant bug report is here.

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

 The latest version available from the Emulex website works. But the version 
 bundled with 9.3 and at least -STABLE (which is the same version bundled 
 with -CURRENT) does cause panics on 10- and 9-

 I compared the code on the Emulex website (10.0.747.0 ?) with the
 one in HEAD and it does not seem much different, but perhaps
 you have some other version in mind?
 
 The bugs found by Stefano also exist in the Emulex version above.

 Anyway, the fixed version panics instantly when generating traffic (just use
 iperf3). Version 10.0.747.0 does _not_ panic.

We'll try to investigate. Can you tell us more about the environment you use?
(FreeBSD version, card model (PCI ID perhaps), iperf3 invocation line,
interface configuration, etc.)

The main difference between 10.0.747.0 and the code in head (after our fix)
is the use of drbr_enqueue/dequeue versus peek/putback in the transmit
routine.
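One plausible reason peek/putback is preferable can be shown with a toy ring buffer. In this hypothetical sketch, `struct ring`, `ring_peek()`, `ring_putback()`, `ring_advance()` and the `xmit_*` helpers are invented names that only loosely imitate the `drbr_*` calls, and the "hardware" accepts a fixed number of packets. Dequeuing a packet and re-enqueuing it when the hardware is full puts it back at the tail, behind later packets; peeking and putting it back leaves it at the head.

```c
#include <assert.h>
#include <stdbool.h>

#define CAP 8  /* hypothetical ring capacity */

struct ring {
	int item[CAP];
	unsigned prod, cons;  /* producer and consumer indices */
};

static bool ring_enqueue(struct ring *r, int v)
{
	if (r->prod - r->cons == CAP)
		return false;
	r->item[r->prod++ % CAP] = v;
	return true;
}

/* Remove and return the head element. */
static bool ring_dequeue(struct ring *r, int *v)
{
	if (r->prod == r->cons)
		return false;
	*v = r->item[r->cons++ % CAP];
	return true;
}

/* Look at the head element without removing it. */
static bool ring_peek(const struct ring *r, int *v)
{
	if (r->prod == r->cons)
		return false;
	*v = r->item[r->cons % CAP];
	return true;
}

/* Store a (possibly modified) element back into the head slot. */
static void ring_putback(struct ring *r, int v)
{
	r->item[r->cons % CAP] = v;
}

/* Commit the peeked head element as consumed. */
static void ring_advance(struct ring *r)
{
	r->cons++;
}

/* Strategy 1: dequeue, and re-enqueue when the "hardware" is full. */
static int xmit_dequeue_requeue(struct ring *r, int hw_slots, int *out)
{
	int v, sent = 0;

	while (ring_dequeue(r, &v)) {
		if (sent == hw_slots) {
			/* Goes to the TAIL; a full ring would drop it here. */
			(void)ring_enqueue(r, v);
			break;
		}
		out[sent++] = v;
	}
	return sent;
}

/* Strategy 2: peek, and put back when the "hardware" is full. */
static int xmit_peek_putback(struct ring *r, int hw_slots, int *out)
{
	int v, sent = 0;

	while (ring_peek(r, &v)) {
		if (sent == hw_slots) {
			ring_putback(r, v);  /* stays at the HEAD */
			break;
		}
		out[sent++] = v;
		ring_advance(r);
	}
	return sent;
}
```

With packets 1 to 4 queued and room for two in the hardware, the first strategy leaves the ring holding 4 then 3, while the second leaves 3 then 4. The re-enqueue path can also fail outright when the ring is full.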


Both drivers still have issues when the link flaps because the transmit
queue is not cleaned up properly (unlike what happens in the Linux driver
and all FreeBSD drivers for other hardware), so it might well be that you
are seeing a side effect of that or of some other problem which manifests
itself differently depending on the environment.

'Instant panic' by itself does not tell us anything about what the problem
you are experiencing could be (and we do not see it with either driver).

cheers
luigi

Re: Fix Emulex oce driver in CURRENT

2014-07-07 Thread Borja Marcos

On Jul 7, 2014, at 1:23 PM, Luigi Rizzo wrote:

 On Mon, Jul 7, 2014 at 1:03 PM, Borja Marcos bor...@sarenet.es wrote:
 We'll try to investigate. Can you tell us more about the environment you use?
 (FreeBSD version, card model (PCI ID perhaps), iperf3 invocation line,
 interface configuration, etc.)
 
 The main difference between 10.0.747.0 and the code in head (after our fix)
 is the use of drbr_enqueue/dequeue versus peek/putback in the transmit
 routine.
 
 Both drivers still have issues when the link flaps because the transmit
 queue is not cleaned up properly (unlike what happens in the Linux driver
 and all FreeBSD drivers for other hardware), so it might well be that you
 are seeing a side effect of that or of some other problem which manifests
 itself differently depending on the environment.
 
 'Instant panic' by itself does not tell us anything about what the problem
 you are experiencing could be (and we do not see it with either driver).

The environment details are here:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

The way I produce an instant panic is:

1) Connect to another machine (cross connect cable)

2) iperf3 -s on the other machine
(The other machine is different, it has an ix card)

3) iperf3 -t 30 -P 4 -c 10.0.0.1 -N

In less than 30 seconds, panic.



mierda dumped core - see /var/crash/vmcore.0

Mon Jul  7 13:06:44 CEST 2014

FreeBSD mierda 10.0-STABLE FreeBSD 10.0-STABLE #2: Mon Jul  7 11:41:45 CEST 
2014 root@mierda:/usr/obj/usr/src/sys/GENERIC  amd64

panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing
cpuid = 12
KDB: stack backtrace:
#0 0x8092a470 at kdb_backtrace+0x60
#1 0x808ef9c5 at panic+0x155
#2 0x80962710 at sbdroprecord_locked+0
#3 0x80a8ba8c at tcp_output+0xdbc
#4 0x80a8987f at tcp_do_segment+0x30ff
#5 0x80a85b34 at tcp_input+0xd04
#6 0x80a1af57 at ip_input+0x97
#7 0x809ba512 at netisr_dispatch_src+0x62
#8 0x809b1ae6 at ether_demux+0x126
#9 0x809b278e at ether_nh_input+0x35e
#10 0x809ba512 at netisr_dispatch_src+0x62
#11 0x81c19ab9 at oce_rx+0x3c9
#12 0x81c19536 at oce_rq_handler+0xb6
#13 0x81c1bb1c at oce_intr+0xdc
#14 0x80938b35 at taskqueue_run_locked+0xe5
#15 0x809395c8 at taskqueue_thread_loop+0xa8
#16 0x808c057a at fork_exit+0x9a
#17 0x80ccb51e at fork_trampoline+0xe
Uptime: 51m20s

Borja.



Re: Fix Emulex oce driver in CURRENT

2014-07-07 Thread Luigi Rizzo
On Mon, Jul 7, 2014 at 1:57 PM, Borja Marcos bor...@sarenet.es wrote:
...

 The environment details are here:

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

 The way I produce an instant panic is:

 1) Connect to another machine (cross connect cable)

 2) iperf3 -s on the other machine
 (The other machine is different, it has an ix card)

 3) iperf3 -t 30 -P 4 -c 10.0.0.1 -N

 In less than 30 seconds, panic.



 mierda dumped core - see /var/crash/vmcore.0

 Mon Jul  7 13:06:44 CEST 2014

 FreeBSD mierda 10.0-STABLE FreeBSD 10.0-STABLE #2: Mon Jul  7 11:41:45 CEST 
 2014 root@mierda:/usr/obj/usr/src/sys/GENERIC  amd64

 panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing

 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type "show copying" to see the conditions.
 There is absolutely no warranty for GDB.  Type "show warranty" for details.
 This GDB was configured as amd64-marcel-freebsd...

 Unread portion of the kernel message buffer:
 panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing
 cpuid = 12
 KDB: stack backtrace:
 #0 0x8092a470 at kdb_backtrace+0x60
 #1 0x808ef9c5 at panic+0x155
 #2 0x80962710 at sbdroprecord_locked+0
 #3 0x80a8ba8c at tcp_output+0xdbc
 #4 0x80a8987f at tcp_do_segment+0x30ff
 #5 0x80a85b34 at tcp_input+0xd04
 #6 0x80a1af57 at ip_input+0x97
 #7 0x809ba512 at netisr_dispatch_src+0x62
 #8 0x809b1ae6 at ether_demux+0x126
 #9 0x809b278e at ether_nh_input+0x35e
 #10 0x809ba512 at netisr_dispatch_src+0x62
 #11 0x81c19ab9 at oce_rx+0x3c9
 #12 0x81c19536 at oce_rq_handler+0xb6
 #13 0x81c1bb1c at oce_intr+0xdc
 #14 0x80938b35 at taskqueue_run_locked+0xe5
 #15 0x809395c8 at taskqueue_thread_loop+0xa8
 #16 0x808c057a at fork_exit+0x9a
 #17 0x80ccb51e at fork_trampoline+0xe
 Uptime: 51m20s

Ah, that seems to be a bug on the receive side; we were only looking
at the transmit side so far.

cheers
luigi


Re: Fix Emulex oce driver in CURRENT

2014-07-01 Thread borjam

On 30.06.2014 18:36, Stefano Garzarella wrote:

Hello,
I had problems during some experiments with Emulex and oce driver in
CURRENT.
I found several bugs in the oce driver and this patch fixes them.


At least with some cards, the driver simply does not work. It causes a 
panic when there is some traffic.


The relevant bug report is here.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

The latest version available from the Emulex website works. But the 
version bundled with 9.3 and at least -STABLE (which is the same version 
bundled with -CURRENT) does cause panics on 10- and 9-


It's quite easy to reproduce. Link two machines, fire up iperf to generate
traffic and watch the almost instant panic.

Borja.


Re: Fix Emulex oce driver in CURRENT

2014-07-01 Thread Luigi Rizzo
On Tue, Jul 1, 2014 at 8:58 PM, bor...@sarenet.es wrote:

 On 30.06.2014 18:36, Stefano Garzarella wrote:

  Hello,
 I had problems during some experiments with Emulex and oce driver in
 CURRENT.
 I found several bugs in the oce driver and this patch fixes them.


 At least with some cards, the driver simply does not work. It causes a
 panic when there is some traffic.

 The relevant bug report is here.

 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

 The latest version available from the Emulex website works. But the
 version bundled with 9.3 and at least -STABLE (which is the same version
 bundled with -CURRENT) does cause panics on 10- and 9-


I compared the code on the Emulex website (10.0.747.0 ?) with the
one in HEAD and it does not seem much different, but perhaps
you have some other version in mind?

The bugs found by Stefano also exist in the Emulex version above.

cheers
luigi

Re: Fix Emulex oce driver in CURRENT

2014-06-30 Thread John Baldwin
On Monday, June 30, 2014 12:36:22 pm Stefano Garzarella wrote:
 Hello,
 I had problems during some experiments with Emulex and oce driver in
 CURRENT.
 I found several bugs in the oce driver and this patch fixes them.
 
 - oce_multiq_start(): if the link is down, it returns ENXIO without
   consuming the mbuf. A trivial fix is to remove the initial error check,
   since oce_multiq_transmit(), which is called next, handles the link-down
   situation correctly.
 - oce_multiq_transmit(): there is an extra call to drbr_enqueue() causing
   the mbuf to be enqueued twice when the NIC's queue is full.
 - oce_multiq_transmit(): same problem fixed recently in ixgbe (r267187) and
   other drivers: if the mbuf is enqueued, the proper return value is 0.
 
 This patch has been reviewed by Luigi (in cc).
 
 If someone could have a look at this and give me some feedback, it would be
 great.

I think these sound fine, but I've cc'd Xin Li (delphij@) who has worked with
folks at Emulex to maintain this driver.  He is probably the best person to
review this.

 Regards,
 Stefano Garzarella
 
 
 
 diff --git a/sys/dev/oce/oce_if.c b/sys/dev/oce/oce_if.c
 index 70d6393..af57491 100644
 --- a/sys/dev/oce/oce_if.c
 +++ b/sys/dev/oce/oce_if.c
 @@ -563,9 +563,6 @@ oce_multiq_start(struct ifnet *ifp, struct mbuf *m)
  	int queue_index = 0;
  	int status = 0;
  
 -	if (!sc->link_status)
 -		return ENXIO;
 -
  	if ((m->m_flags & M_FLOWID) != 0)
  		queue_index = m->m_pkthdr.flowid % sc->nwqs;
  
 @@ -1274,7 +1271,6 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf *m, struct oce_wq *wq)
  			drbr_putback(ifp, br, next);
  			wq->tx_stats.tx_stops ++;
  			ifp->if_drv_flags |= IFF_DRV_OACTIVE;
 -			status = drbr_enqueue(ifp, br, next);
  		}
  		break;
  	}
 @@ -1285,7 +1281,7 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf *m, struct oce_wq *wq)
  		ETHER_BPF_MTAP(ifp, next);
  	}
  
 -	return status;
 +	return 0;
  }

-- 
John Baldwin
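The three transmit fixes in Stefano's first patch can be summarized in a userspace sketch. It assumes, as the patch description implies, that the driver owns a packet as soon as it is handed to the transmit routine: it must either queue it or free it, and it reports 0 once the packet is queued. Everything here (`struct txq`, `model_transmit()`, `drain()`, `complete_all()`, `BR_CAP`, `HW_CAP`) is hypothetical and only mimics the shape of `oce_multiq_start()`/`oce_multiq_transmit()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define BR_CAP 4  /* hypothetical software ring size (stand-in for buf_ring) */
#define HW_CAP 2  /* hypothetical number of hardware descriptors */

struct pkt {
	int id;
};

/* Packets allocated and not yet freed: a simple leak detector. */
static int live_bufs;

static struct pkt *pkt_new(int id)
{
	struct pkt *p = malloc(sizeof(*p));

	if (p != NULL) {
		p->id = id;
		live_bufs++;
	}
	return p;
}

static void pkt_free(struct pkt *p)
{
	if (p != NULL) {
		free(p);
		live_bufs--;
	}
}

struct txq {
	struct pkt *br[BR_CAP];  /* software ring */
	unsigned prod, cons;     /* producer/consumer indices */
	struct pkt *hw[HW_CAP];  /* packets currently owned by the "hardware" */
	int hw_used;
	bool link_up;
};

static bool br_enqueue(struct txq *q, struct pkt *m)
{
	if (q->prod - q->cons == BR_CAP)
		return false;
	q->br[q->prod++ % BR_CAP] = m;
	return true;
}

/* Hand packets to the hardware while it has room. The head is consumed
 * only after a slot is known to be free, so a packet that does not fit
 * stays at the head of the ring and is never queued a second time. */
static void drain(struct txq *q)
{
	while (q->link_up && q->cons != q->prod && q->hw_used < HW_CAP) {
		q->hw[q->hw_used++] = q->br[q->cons % BR_CAP];
		q->cons++;
	}
}

/* if_transmit-style entry point: the packet is always consumed.
 * Returns 0 once it is queued (even with the link down), or ENOBUFS
 * after freeing it when the software ring is full. */
static int model_transmit(struct txq *q, struct pkt *m)
{
	if (!br_enqueue(q, m)) {
		pkt_free(m);
		return ENOBUFS;
	}
	drain(q);
	return 0;
}

/* Completion: free everything the hardware holds, then refill it. */
static void complete_all(struct txq *q)
{
	for (int i = 0; i < q->hw_used; i++)
		pkt_free(q->hw[i]);
	q->hw_used = 0;
	drain(q);
}
```

With the link down, packets wait in the software ring instead of being rejected with ENXIO; when the hardware is full, the head packet stays queued exactly once; only a full software ring produces an error, and the packet is freed on that path so nothing leaks.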