Re: Net: ucc_geth ethernet driver optimization space
linuxppc-dev-bounces+joakim.tjernlund=transmode...@ozlabs.org wrote on 27/05/2009 07:08:07:

> Guys,
>
> The ucc_geth ethernet driver has dozens of strongly ordered sync
> read/write operations, such as in_be32/16/8 and out_be32/16/8. All of
> them are sync reads/writes, which is very expensive for performance.
>
> For the critical path, we can replace some unnecessary in_be(x) and
> out_be(x) calls with normal memory operations and keep only the
> necessary memory barriers, e.g. for BD access in the interrupt handler
> and start_xmit.
>
> The BD operation only needs a memory barrier between the length/buffer
> writes and the status write.
>
>     struct buffer_descriptor {
>             u16 status;
>             u16 length;
>             u32 buffer;
>     } __attribute__ ((packed));
>
>     struct buffer_descriptor *BD;
>
>     BD->length = ;
>     BD->buffer = ;
>     wmb();
>     BD->status = ;
>
> On powerpc, eieio is enough for 60x and mbar 1 is enough for e500.
> Of course, we also need a memory clobber to keep the compiler from
> reordering them.
>
> Thanks, Dave

Yes, pretty please :)

You might want to combine status and length into one u32 though:

    BD->buffer = ;
    wmb();
    BD->stat_len = 16 | ;

Jocke
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev
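Jocke's suggestion of a single status/length word could look roughly like the userspace sketch below. It is not driver code. The names `bd_combined`, `bd_pack` and `bd_publish_combined` are hypothetical, the packing (status in the upper 16 bits, length in the lower 16) is an assumption based on the original field order on a big-endian CPU, and the GCC/Clang builtin `__atomic_thread_fence` only stands in for the kernel's wmb().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical BD layout with status and length merged into one word. */
struct bd_combined {
    uint32_t stat_len;  /* assumed: status in bits 31..16, length in bits 15..0 */
    uint32_t buffer;
};

/* Pack status and length into the single 32-bit word. */
static inline uint32_t bd_pack(uint16_t status, uint16_t length)
{
    return ((uint32_t)status << 16) | length;
}

/* Write the buffer pointer first, then publish status and length in one store. */
static void bd_publish_combined(volatile struct bd_combined *bd,
                                uint32_t buf, uint16_t status, uint16_t length)
{
    assert(bd != 0);
    bd->buffer = buf;
    /* Stand-in for wmb(); a real driver would use the kernel barrier. */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    bd->stat_len = bd_pack(status, length);
}
```

With this layout the device can never observe a new status paired with a stale length, since both arrive in the same 32-bit store, and one barrier still orders that store after the buffer write.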
Re: Net: ucc_geth ethernet driver optimization space
On Wed, May 27, 2009 at 1:08 PM, Liu Dave-R63238 dave...@freescale.com wrote:

> Guys,
>
> The ucc_geth ethernet driver has dozens of strongly ordered sync
> read/write operations, such as in_be32/16/8 and out_be32/16/8. All of
> them are sync reads/writes, which is very expensive for performance.

Totally agree. That's been one of my concerns right from the beginning.

> For the critical path, we can replace some unnecessary in_be(x) and
> out_be(x) calls with normal memory operations and keep only the
> necessary memory barriers, e.g. for BD access in the interrupt handler
> and start_xmit.
>
> The BD operation only needs a memory barrier between the length/buffer
> writes and the status write.
>
>     struct buffer_descriptor {
>             u16 status;
>             u16 length;
>             u32 buffer;
>     } __attribute__ ((packed));
>
>     struct buffer_descriptor *BD;
>
>     BD->length = ;
>     BD->buffer = ;
>     wmb();
>     BD->status = ;

The BD can reside either in memory or in a memory-mapped region, which
makes the case more complex. MMIO accesses need to use the I/O accessors
to pass sparse checking. We might make use of the __raw_*() accessors,
but I'm not sure they are suitable for non-PCI buses on powerpc.

We also need to pay special attention to the problem described here:
http://lwn.net/Articles/198988/

- Leo
Net: ucc_geth ethernet driver optimization space
Guys,

The ucc_geth ethernet driver has dozens of strongly ordered sync
read/write operations, such as in_be32/16/8 and out_be32/16/8. All of
them are sync reads/writes, which is very expensive for performance.

For the critical path, we can replace some unnecessary in_be(x) and
out_be(x) calls with normal memory operations and keep only the
necessary memory barriers, e.g. for BD access in the interrupt handler
and start_xmit.

The BD operation only needs a memory barrier between the length/buffer
writes and the status write.

    struct buffer_descriptor {
            u16 status;
            u16 length;
            u32 buffer;
    } __attribute__ ((packed));

    struct buffer_descriptor *BD;

    BD->length = ;
    BD->buffer = ;
    wmb();
    BD->status = ;

On powerpc, eieio is enough for 60x and mbar 1 is enough for e500.
Of course, we also need a memory clobber to keep the compiler from
reordering them.

Thanks, Dave
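The ordering proposed in this message (length and buffer first, then a write barrier, then status) can be sketched in plain userspace C as follows. This is only an illustration under stated assumptions: `BD_READY`, `sketch_wmb` and `bd_publish` are hypothetical names, the fixed-width types stand in for the kernel's u16/u32, and the GCC/Clang builtin `__atomic_thread_fence` stands in for wmb() (which on powerpc would come down to eieio or mbar 1 plus a memory clobber, as described above).

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as the BD in the message, with userspace types. */
struct buffer_descriptor {
    uint16_t status;
    uint16_t length;
    uint32_t buffer;
} __attribute__ ((packed));

/* Hypothetical "ready" bit, used only to make the sketch concrete. */
#define BD_READY 0x8000u

/* Stand-in for the kernel's wmb(): orders the earlier stores before the
 * later ones and also keeps the compiler from reordering across it. */
static inline void sketch_wmb(void)
{
    __atomic_thread_fence(__ATOMIC_RELEASE);
}

/* Fill in length and buffer, then hand the BD over by writing status last. */
static void bd_publish(volatile struct buffer_descriptor *bd,
                       uint32_t buf, uint16_t len, uint16_t status)
{
    assert(bd != 0);
    bd->length = len;
    bd->buffer = buf;
    sketch_wmb();         /* length/buffer must be visible before status */
    bd->status = status;
}
```

The point of the sketch is that only one barrier is needed per descriptor, placed between the payload fields and the status field, instead of a sync on every individual access.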