Re: Fix Emulex oce driver in CURRENT
On 04/09/2014 09:49, Borja Marcos wrote:
> On Jun 30, 2014, at 8:02 PM, John Baldwin wrote:
>> I think these sound fine, but I've cc'd Xin Li (delphij@), who has worked with folks at Emulex to maintain this driver. He is probably the best person to review this.
>
> Hi,
>
> It seems 10.1 is in the pipeline now, but as far as I know none of these fixes have been applied to -STABLE. Any chance of doing it yet?
>
> As far as I know, the oce driver is currently unusable in -STABLE. I managed to cause a panic reliably within 30 seconds.

Was there any conclusion to this? CURRENT, releng/10.0 and releng/10.1 seem pretty similar with regard to oce, but a customer is reporting panics very similar to the ones in this thread. Did the commit of the additional locking never make it in?

Regards
Steve
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: Fix Emulex oce driver in CURRENT
On Dec 5, 2014, at 2:00 PM, Steven Hartland wrote:
> Did the commit of the additional locking never make it in?

Not as far as I know. I've updated a couple of machines here to 10-STABLE and I've been applying the patch manually myself. I don't think it has been applied even to -HEAD.

For now I've told my coworkers to avoid Emulex cards whenever possible. As far as I know, the driver is unusable in its present state.

Borja.
Re: Fix Emulex oce driver in CURRENT
On 05/12/2014 13:07, Borja Marcos wrote:
> Not as far as I know. I've updated a couple of machines here to 10-STABLE and I've been applying the patch manually myself. I don't think it has been applied even to -HEAD.

Thanks for the quick reply, Borja. A review of the patch is now up: https://reviews.freebsd.org/D1269

Hopefully we can get this into the tree and make oce usable going forward.

Regards
Steve
Re: Fix Emulex oce driver in CURRENT
>> Did the commit of the additional locking never make it in?
>
> Not as far as I know. I've updated a couple of machines here to 10-STABLE and I've been applying the patch manually myself. I don't think it has been applied even to -HEAD.

Where can I find a version of the patch that applies to 10-STABLE? Is it this one?

https://bz-attachments.freebsd.org/attachment.cgi?id=144718

Steinar Haug, Nethelp consulting, sth...@nethelp.no
Re: Fix Emulex oce driver in CURRENT
On Jun 30, 2014, at 8:02 PM, John Baldwin wrote:
> I think these sound fine, but I've cc'd Xin Li (delphij@), who has worked with folks at Emulex to maintain this driver. He is probably the best person to review this.

Hi,

It seems 10.1 is in the pipeline now, but as far as I know none of these fixes have been applied to -STABLE. Any chance of doing it yet?

As far as I know, the oce driver is currently unusable in -STABLE. I managed to cause a panic reliably within 30 seconds.

Borja.
Re: Fix Emulex oce driver in CURRENT
Hi,

I found other problems in the oce driver during some experiments with netmap in emulation mode. In detail:

- missing locking:
  - some functions write to the wq struct (the TX queue descriptor) without acquiring the queue's LOCK, in particular oce_wq_handler(), which is invoked from the interrupt routine. This can cause race conditions.
- TX cleanup:
  - in oce_if_deactivate() the wq queues are drained, but some still-pending mbufs are not freed. For this reason, I added oce_tx_clean(), which releases any pending mbufs.

I also tried experimenting with iperf3 in the same environment as Borja, and I don't get a panic.

Can you try this patch? Do you still get the panic?

Cheers,
Stefano Garzarella

diff --git a/sys/dev/oce/oce_if.c b/sys/dev/oce/oce_if.c
index af57491..33b35b4 100644
--- a/sys/dev/oce/oce_if.c
+++ b/sys/dev/oce/oce_if.c
@@ -142,6 +142,7 @@ static int  oce_tx(POCE_SOFTC sc, struct mbuf **mpp, int wq_index);
 static void oce_tx_restart(POCE_SOFTC sc, struct oce_wq *wq);
 static void oce_tx_complete(struct oce_wq *wq, uint32_t wqe_idx,
 					uint32_t status);
+static void oce_tx_clean(POCE_SOFTC sc);
 static int  oce_multiq_transmit(struct ifnet *ifp, struct mbuf *m,
 				 struct oce_wq *wq);
 
@@ -585,8 +586,10 @@ oce_multiq_flush(struct ifnet *ifp)
 	int i = 0;
 
 	for (i = 0; i < sc->nwqs; i++) {
+		LOCK(&sc->wq[i]->tx_lock);
 		while ((m = buf_ring_dequeue_sc(sc->wq[i]->br)) != NULL)
 			m_freem(m);
+		UNLOCK(&sc->wq[i]->tx_lock);
 	}
 	if_qflush(ifp);
 }
@@ -1052,6 +1055,19 @@ oce_tx_complete(struct oce_wq *wq, uint32_t wqe_idx, uint32_t status)
 	}
 }
 
+static void
+oce_tx_clean(POCE_SOFTC sc) {
+	int i = 0;
+	struct oce_wq *wq;
+
+	for_all_wq_queues(sc, wq, i) {
+		LOCK(&wq->tx_lock);
+		while (wq->pkt_desc_tail != wq->pkt_desc_head) {
+			oce_tx_complete(wq, 0, 0);
+		}
+		UNLOCK(&wq->tx_lock);
+	}
+}
 
 static void
 oce_tx_restart(POCE_SOFTC sc, struct oce_wq *wq)
@@ -1213,6 +1229,8 @@ oce_wq_handler(void *arg)
 	struct oce_nic_tx_cqe *cqe;
 	int num_cqes = 0;
 
+	LOCK(&wq->tx_lock);
+
 	bus_dmamap_sync(cq->ring->dma.tag,
 			cq->ring->dma.map, BUS_DMASYNC_POSTWRITE);
 	cqe = RING_GET_CONSUMER_ITEM_VA(cq->ring, struct oce_nic_tx_cqe);
@@ -1237,6 +1255,8 @@ oce_wq_handler(void *arg)
 	if (num_cqes)
 		oce_arm_cq(sc, cq->cq_id, num_cqes, FALSE);
 
+	UNLOCK(&wq->tx_lock);
+
 	return 0;
 }
 
@@ -2087,6 +2107,9 @@ oce_if_deactivate(POCE_SOFTC sc)
 	/* Delete RX queue in card with flush param */
 	oce_stop_rx(sc);
 
+	/* Flush the mbufs that are still in TX queues */
+	oce_tx_clean(sc);
+
 	/* Invalidate any pending cq and eq entries*/
 	for_all_evnt_queues(sc, eq, i)
 		oce_drain_eq(eq);
diff --git a/sys/dev/oce/oce_queue.c b/sys/dev/oce/oce_queue.c
index 308c16d..161011b 100644
--- a/sys/dev/oce/oce_queue.c
+++ b/sys/dev/oce/oce_queue.c
@@ -969,7 +969,9 @@ oce_start_rq(struct oce_rq *rq)
 int
 oce_start_wq(struct oce_wq *wq)
 {
+	LOCK(&wq->tx_lock); /* XXX: maybe not necessary */
 	oce_arm_cq(wq->parent, wq->cq->cq_id, 0, TRUE);
+	UNLOCK(&wq->tx_lock);
 	return 0;
 }
 
@@ -1076,6 +1078,8 @@ oce_drain_wq_cq(struct oce_wq *wq)
 	struct oce_nic_tx_cqe *cqe;
 	int num_cqes = 0;
 
+	LOCK(&wq->tx_lock); /* XXX: maybe not necessary */
+
 	bus_dmamap_sync(cq->ring->dma.tag, cq->ring->dma.map,
 			BUS_DMASYNC_POSTWRITE);
 
@@ -1093,6 +1097,7 @@ oce_drain_wq_cq(struct oce_wq *wq)
 
 	oce_arm_cq(sc, cq->cq_id, num_cqes, FALSE);
 
+	UNLOCK(&wq->tx_lock);
 }
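The two problems Stefano describes (the completion handler touching the TX queue without its lock, and mbufs left behind when the queues are torn down) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the oce code: struct wq_model, wq_post(), wq_handler() and wq_clean() are hypothetical names, a pthread mutex stands in for LOCK()/UNLOCK() on wq->tx_lock, and malloc()/free() stand in for mbufs and their release.

```c
#include <pthread.h>
#include <stdlib.h>

#define NDESC 64u

/* Hypothetical model of one TX work queue (not the real struct oce_wq). */
struct wq_model {
	pthread_mutex_t tx_lock;  /* plays the role of wq->tx_lock */
	void *pkt[NDESC];         /* buffers owned by posted descriptors */
	unsigned head;            /* producer index: next descriptor to post */
	unsigned tail;            /* consumer index: oldest pending descriptor */
	unsigned freed;           /* buffers released so far */
};

static void wq_init(struct wq_model *wq)
{
	pthread_mutex_init(&wq->tx_lock, NULL);
	wq->head = wq->tail = wq->freed = 0;
}

/* Release the oldest pending buffer; caller must hold tx_lock. */
static void wq_complete_one(struct wq_model *wq)
{
	unsigned idx = wq->tail % NDESC;

	free(wq->pkt[idx]);
	wq->pkt[idx] = NULL;
	wq->tail++;
	wq->freed++;
}

/* Transmit path: post one buffer if a descriptor is free (0 on success). */
static int wq_post(struct wq_model *wq, void *buf)
{
	int rc = -1;

	pthread_mutex_lock(&wq->tx_lock);
	if (wq->head - wq->tail < NDESC) {  /* unsigned math survives wraparound */
		wq->pkt[wq->head % NDESC] = buf;
		wq->head++;
		rc = 0;
	}
	pthread_mutex_unlock(&wq->tx_lock);
	return rc;
}

/* Completion handler (cf. oce_wq_handler): reap up to budget descriptors,
 * holding the same lock as the transmit path. */
static void wq_handler(struct wq_model *wq, unsigned budget)
{
	pthread_mutex_lock(&wq->tx_lock);
	while (budget-- > 0 && wq->tail != wq->head)
		wq_complete_one(wq);
	pthread_mutex_unlock(&wq->tx_lock);
}

/* Teardown (cf. oce_tx_clean): release every buffer still pending. */
static void wq_clean(struct wq_model *wq)
{
	pthread_mutex_lock(&wq->tx_lock);
	while (wq->tail != wq->head)
		wq_complete_one(wq);
	pthread_mutex_unlock(&wq->tx_lock);
}
```

Because posting, reaping and cleanup all move head/tail only while holding tx_lock, the free-slot check in wq_post() cannot race with a concurrent completion, and after wq_clean() every posted buffer has been released exactly once.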
Re: Fix Emulex oce driver in CURRENT
On Jul 15, 2014, at 10:22 AM, Stefano Garzarella wrote:
> Hi, I found other problems in the oce driver during some experiments with netmap in emulation mode.

What about driver version 10.0.747.0? At least in my configuration it works perfectly, with no crashes despite running for several days at full bandwidth.

I have a server about to go into production. Should this patch work on 10-STABLE?

Borja.
Re: Fix Emulex oce driver in CURRENT
I used the oce driver in CURRENT. I think that this patch, combined with the previous one, should work on 10-STABLE. I have only tested it with CURRENT, but I'll try it on 10-STABLE now and send you some feedback.

Cheers,
Stefano
Re: Fix Emulex oce driver in CURRENT
On Jul 15, 2014, at 10:43 AM, Stefano Garzarella wrote:
> I have only tested it with CURRENT, but I'll try it on 10-STABLE now and send you some feedback.

I can test it too. I'll get back to you soon.

Cheers,
Borja.
Re: Fix Emulex oce driver in CURRENT
On Jul 15, 2014, at 10:43 AM, Stefano Garzarella wrote:
> I think that this patch, combined with the previous one, should work on 10-STABLE.

Hmmm. The patch seems to be broken. I tried to apply it after renaming the a/usr/src... paths to oce_if.c.old and oce_if.c, etc., and patch complains:

Patching file oce_if.c using Plan A...
patch: malformed patch at line 6: int wq_index);

Was it broken by the email client's formatting? Or am I being especially clumsy today? ;)

Borja.
Re: Fix Emulex oce driver in CURRENT
I think there is some problem with the email formatting. I'm sending you a file with both patches.

Cheers,
Stefano

Attachment: oce_fix_STABLE10.patch (binary data)
Re: Fix Emulex oce driver in CURRENT
I just tried running iperf3 with this patch on 10-STABLE and it seems to work. Do you still get the panic?

Cheers,
Stefano
Re: Fix Emulex oce driver in CURRENT
On Jul 15, 2014, at 11:45 AM, Stefano Garzarella wrote:
> I just tried running iperf3 with this patch on 10-STABLE and it seems to work. Do you still get the panic?

Still compiling :) Anyway, you didn't get panics before, right?

Borja.
Re: Fix Emulex oce driver in CURRENT
2014-07-15 11:46 GMT+02:00 Borja Marcos bor...@sarenet.es:
> Still compiling :) Anyway, you didn't get panics before, right?

Right, I didn't get panics with iperf3, but with netmap in emulation mode I had a lot of panics before this patch.

Stefano
Re: Fix Emulex oce driver in CURRENT
On Jul 15, 2014, at 11:45 AM, Stefano Garzarella wrote:
> I just tried running iperf3 with this patch on 10-STABLE and it seems to work. Do you still get the panic?

So far, so good. I've run a couple of iperf3 tests (60 seconds, in both directions) and it doesn't crash. Without the fixes I got a panic quite reliably in less than 30 seconds. I'm still testing. But the bugs you mentioned (missing locking, mbufs not being freed, etc.) seem consistent with the kind of failures I saw and their apparent randomness.

So, asking for spiritual counsel now: would you use this driver in a production environment instead of the 747 version downloaded from Emulex? I think the latter gives slightly better performance, but anyway I disable LRO and TSO because I see a horrible impact on NFS performance.

Cheers,
Borja.
Re: Fix Emulex oce driver in CURRENT
2014-07-15 12:00 GMT+02:00 Borja Marcos bor...@sarenet.es:
> So far, so good. [...] But the bugs you mentioned (missing locking, mbufs not being freed, etc.) seem consistent with the kind of failures I saw and their apparent randomness.

Well.

> So, asking for spiritual counsel now: would you use this driver in a production environment instead of the 747 version downloaded from Emulex? [...]

I made a diff between the two versions (CURRENT and 747) and saw that the main difference is in how the buf_ring is managed through the drbr API. The CURRENT driver uses the newer drbr_peek() instead of drbr_dequeue(), and I think this is better. However, the 747 version also seems to have the missing-locking problem.

Cheers,
Stefano
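Stefano does not spell out why drbr_peek() is preferable. One plausible reason is ordering, sketched below under an assumption suggested by Luigi's description of "drbr_enqueue/dequeue versus peek/putback": that the dequeue-based loop re-enqueues a packet the hardware rejects. A dequeued packet that does not fit then goes back at the tail, behind newer packets, whereas a peeked packet is removed only after the hardware accepts it. The FIFO and "hardware" here are hypothetical userspace stand-ins, not the drbr API.

```c
#include <stddef.h>

#define SLOTS 8

/* Hypothetical FIFO of packet ids standing in for a buf_ring. */
struct fifo {
	int slot[SLOTS];
	size_t head;    /* index of the oldest entry */
	size_t count;   /* number of queued entries */
};

static int fifo_push(struct fifo *f, int v)
{
	if (f->count == SLOTS)
		return -1;  /* full: the real drbr_enqueue() frees the mbuf here */
	f->slot[(f->head + f->count) % SLOTS] = v;
	f->count++;
	return 0;
}

static int fifo_peek(const struct fifo *f)
{
	return f->slot[f->head];
}

static int fifo_pop(struct fifo *f)
{
	int v = f->slot[f->head];

	f->head = (f->head + 1) % SLOTS;
	f->count--;
	return v;
}

/* Hypothetical NIC: accepts a packet only while it has free descriptors. */
struct hw {
	int room;
	int out[2 * SLOTS];  /* order in which packets reached the wire */
	size_t nout;
};

static int hw_send(struct hw *h, int v)
{
	if (h->room == 0)
		return -1;
	h->room--;
	h->out[h->nout++] = v;
	return 0;
}

/* Dequeue style: the packet leaves the ring before we know it fits; on
 * failure it is re-enqueued at the TAIL, behind newer packets. */
static void drain_dequeue(struct fifo *f, struct hw *h)
{
	size_t n = f->count;

	while (n-- > 0) {
		int v = fifo_pop(f);
		if (hw_send(h, v) != 0) {
			(void)fifo_push(f, v);
			break;
		}
	}
}

/* Peek style: the packet is removed only after the hardware accepted it
 * (fifo_pop() here plays the role of drbr_advance()). */
static void drain_peek(struct fifo *f, struct hw *h)
{
	while (f->count > 0) {
		if (hw_send(h, fifo_peek(f)) != 0)
			break;  /* stays at the head */
		(void)fifo_pop(f);
	}
}
```

With packets 1, 2, 3 queued and room for only one, then for the rest, the dequeue style emits 1, 3, 2 while the peek style emits 1, 2, 3. In a real driver, other CPUs may also fill the ring in between, in which case the re-enqueue fails and the packet is lost.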
Re: Fix Emulex oce driver in CURRENT
On Jul 15, 2014, at 1:36 PM, Stefano Garzarella wrote:
> [...] However, the 747 version also seems to have the missing-locking problem.

Well, you definitely saved my bacon! So it was still a ticking time bomb.

Thank you very much!

Borja.
Re: Fix Emulex oce driver in CURRENT
On Jul 1, 2014, at 10:24 PM, Luigi Rizzo wrote:
> I compared the code on the Emulex website (10.0.747.0?) with the one in HEAD and it does not seem much different, but perhaps you have some other version in mind? The bugs found by Stefano also exist in the Emulex version above.

Anyway, the fixed version still panics instantly when generating traffic (just use iperf3). Version 10.0.747.0 does _not_ panic.

Borja.
Re: Fix Emulex oce driver in CURRENT
On Mon, Jul 7, 2014 at 1:03 PM, Borja Marcos bor...@sarenet.es wrote:
> Anyway, the fixed version still panics instantly when generating traffic (just use iperf3). Version 10.0.747.0 does _not_ panic.

We'll try to investigate. Can you tell us more about the environment you use (FreeBSD version, card model (PCI ID perhaps), iperf3 invocation line, interface configuration, etc.)?

The main difference between 10.0.747.0 and the code in HEAD (after our fix) is the use of drbr_enqueue/dequeue versus peek/putback in the transmit routine.

Both drivers still have issues when the link flaps, because the transmit queue is not cleaned up properly (unlike in the Linux driver and in all FreeBSD drivers for other hardware). So it might well be that you are seeing a side effect of that, or some other problem that manifests itself differently depending on the environment. "Instant panic" by itself does not tell us anything about what the problem you are experiencing could be (and we do not see it with either driver).

cheers
luigi
Re: Fix Emulex oce driver in CURRENT
On Jul 7, 2014, at 1:23 PM, Luigi Rizzo wrote:
> We'll try to investigate. Can you tell us more about the environment you use (FreeBSD version, card model (PCI ID perhaps), iperf3 invocation line, interface configuration, etc.)?

The environment details are here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

The way I produce an instant panic is:

1) Connect to another machine (cross-connect cable).
2) Run iperf3 -s on the other machine (the other machine is different: it has an ix card).
3) Run iperf3 -t 30 -P 4 -c 10.0.0.1 -N

In less than 30 seconds, panic.

mierda dumped core - see /var/crash/vmcore.0

Mon Jul 7 13:06:44 CEST 2014

FreeBSD mierda 10.0-STABLE FreeBSD 10.0-STABLE #2: Mon Jul 7 11:41:45 CEST 2014 root@mierda:/usr/obj/usr/src/sys/GENERIC amd64

panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing

GNU gdb 6.1.1 [FreeBSD]
[...]
This GDB was configured as amd64-marcel-freebsd...

Unread portion of the kernel message buffer:
panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing
cpuid = 12
KDB: stack backtrace:
#0 0x8092a470 at kdb_backtrace+0x60
#1 0x808ef9c5 at panic+0x155
#2 0x80962710 at sbdroprecord_locked+0
#3 0x80a8ba8c at tcp_output+0xdbc
#4 0x80a8987f at tcp_do_segment+0x30ff
#5 0x80a85b34 at tcp_input+0xd04
#6 0x80a1af57 at ip_input+0x97
#7 0x809ba512 at netisr_dispatch_src+0x62
#8 0x809b1ae6 at ether_demux+0x126
#9 0x809b278e at ether_nh_input+0x35e
#10 0x809ba512 at netisr_dispatch_src+0x62
#11 0x81c19ab9 at oce_rx+0x3c9
#12 0x81c19536 at oce_rq_handler+0xb6
#13 0x81c1bb1c at oce_intr+0xdc
#14 0x80938b35 at taskqueue_run_locked+0xe5
#15 0x809395c8 at taskqueue_thread_loop+0xa8
#16 0x808c057a at fork_exit+0x9a
#17 0x80ccb51e at fork_trampoline+0xe
Uptime: 51m20s

Borja.
Re: Fix Emulex oce driver in CURRENT
On Mon, Jul 7, 2014 at 1:57 PM, Borja Marcos bor...@sarenet.es wrote:
> [...]
> panic: sbsndptr: sockbuf 0xf800a70489b0 and mbuf 0xf801a3326e00 clashing
> [...]
> #11 0x81c19ab9 at oce_rx+0x3c9
> #12 0x81c19536 at oce_rq_handler+0xb6
> #13 0x81c1bb1c at oce_intr+0xdc
> [...]

Ah, that seems to be a bug on the receive side; so far we were only looking at the transmit side.

cheers
luigi
Re: Fix Emulex oce driver in CURRENT
On 30.06.2014 18:36, Stefano Garzarella wrote:
> Hello,
> I had problems during some experiments with Emulex and the oce driver in CURRENT. I found several bugs in the oce driver and this patch fixes them.

At least with some cards, the driver simply does not work: it panics as soon as there is some traffic.

The relevant bug report is here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=183391

The latest version available from the Emulex website works. But the version bundled with 9.3 and at least -STABLE (which is the same version bundled with -CURRENT) does cause panics on both 10 and 9.

It's quite easy to reproduce: link two machines, fire up iperf to generate traffic and watch the almost instant panic.

Borja.
Re: Fix Emulex oce driver in CURRENT
On Tue, Jul 1, 2014 at 8:58 PM, bor...@sarenet.es wrote:
> The latest version available from the Emulex website works. But the version bundled with 9.3 and at least -STABLE (which is the same version bundled with -CURRENT) does cause panics on both 10 and 9.

I compared the code on the Emulex website (10.0.747.0?) with the one in HEAD and it does not seem much different, but perhaps you have some other version in mind? The bugs found by Stefano also exist in the Emulex version above.

cheers
luigi
Re: Fix Emulex oce driver in CURRENT
On Monday, June 30, 2014 12:36:22 pm Stefano Garzarella wrote:
> Hello,
> I had problems during some experiments with Emulex and the oce driver in CURRENT. I found several bugs in the oce driver and this patch fixes them.
>
> - oce_multiq_start(): if the link is down, it returns ENXIO without consuming the mbuf. A trivial fix is to remove the initial error check, since oce_multiq_transmit(), which is called next, handles the link-down situation correctly.
> - oce_multiq_transmit(): there is an extra call to drbr_enqueue() that causes the mbuf to be enqueued twice when the NIC's queue is full.
> - oce_multiq_transmit(): the same problem fixed recently in ixgbe (r267187) and other drivers: if the mbuf is enqueued, the proper return value is 0.
>
> This patch has been reviewed by Luigi (in cc).
>
> If someone could have a look at this and give me some feedback, it would be great.

I think these sound fine, but I've cc'd Xin Li (delphij@), who has worked with folks at Emulex to maintain this driver. He is probably the best person to review this.

> Regards,
> Stefano Garzarella
>
> diff --git a/sys/dev/oce/oce_if.c b/sys/dev/oce/oce_if.c
> index 70d6393..af57491 100644
> --- a/sys/dev/oce/oce_if.c
> +++ b/sys/dev/oce/oce_if.c
> @@ -563,9 +563,6 @@ oce_multiq_start(struct ifnet *ifp, struct mbuf *m)
>  	int queue_index = 0;
>  	int status = 0;
>  
> -	if (!sc->link_status)
> -		return ENXIO;
> -
>  	if ((m->m_flags & M_FLOWID) != 0)
>  		queue_index = m->m_pkthdr.flowid % sc->nwqs;
>  
> @@ -1274,7 +1271,6 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf *m, struct oce_wq *wq)
>  				drbr_putback(ifp, br, next);
>  				wq->tx_stats.tx_stops ++;
>  				ifp->if_drv_flags |= IFF_DRV_OACTIVE;
> -				status = drbr_enqueue(ifp, br, next);
>  			}
>  			break;
>  		}
> @@ -1285,7 +1281,7 @@ oce_multiq_transmit(struct ifnet *ifp, struct mbuf *m, struct oce_wq *wq)
>  		ETHER_BPF_MTAP(ifp, next);
>  	}
>  
> -	return status;
> +	return 0;
>  }

-- 
John Baldwin
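The contract behind these three fixes can be sketched in userspace: the transmit routine consumes the packet on every path, enqueues it exactly once, leaves a packet the hardware cannot take at the head of the ring instead of enqueueing it a second time, and returns 0 once the packet has been accepted. This is a minimal sketch, not the driver: struct txq, struct pkt and the hw_free counter are hypothetical stand-ins for buf_ring, mbufs and free TX descriptors, and the link-down branch (queue the packet and return) is an assumption about what "handles the link down situation correctly" means.

```c
#include <errno.h>
#include <stddef.h>

#define RING_SLOTS 4

/* Hypothetical stand-ins: not the real struct mbuf / buf_ring API. */
struct pkt { int id; };

struct txq {
	struct pkt *slot[RING_SLOTS];
	size_t head;    /* index of the oldest queued packet */
	size_t count;   /* packets currently queued */
	int link_up;    /* hypothetical link state */
	int hw_free;    /* hypothetical free hardware descriptors */
	int sent;       /* packets handed to the hardware */
	int dropped;    /* packets freed because the ring was full */
};

/* Enqueue exactly once; on a full ring the packet is still consumed
 * (counted as dropped, the analogue of m_freem()). */
static int txq_enqueue(struct txq *q, struct pkt *p)
{
	if (q->count == RING_SLOTS) {
		q->dropped++;
		return ENOBUFS;
	}
	q->slot[(q->head + q->count) % RING_SLOTS] = p;
	q->count++;
	return 0;
}

static struct pkt *txq_peek(const struct txq *q)
{
	return q->count ? q->slot[q->head] : NULL;
}

static void txq_advance(struct txq *q)
{
	q->head = (q->head + 1) % RING_SLOTS;
	q->count--;
}

static int multiq_transmit(struct txq *q, struct pkt *p)
{
	int error = txq_enqueue(q, p);

	if (error != 0)
		return error;       /* packet was dropped: report it */
	if (!q->link_up)
		return 0;           /* queued; drained once the link is back */
	while (txq_peek(q) != NULL) {
		if (q->hw_free == 0)
			break;      /* stays at the head; no second enqueue */
		q->hw_free--;
		q->sent++;
		txq_advance(q);
	}
	return 0;                   /* accepted (sent or queued) */
}
```

A caller that sees 0 knows the packet is owned by the queue; a nonzero return means it was already freed, so nothing leaks and nothing is queued twice.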