From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 22 May 2007 15:22:35 -0700
Yep, for any NIC that supports SG but not TSO then software GSO will
be a big win. When the NIC doesn't support SG then the win is mostly
offset by the need to copy the packet again.
Cheers,
--
We could
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 22 May 2007 15:58:05 -0700
Sorry for the confusion. I am thinking of avoiding the copy in skb_segment() for
GSO. One way could be for tcp_sendmsg() to allocate small discontiguous
buffers (each equal to the MTU) instead of allocating pages.
The SKB splitting
On Tue, May 22, 2007 at 03:36:36PM -0700, David Miller wrote:
Yep, for any NIC that supports SG but not TSO then software GSO will
be a big win. When the NIC doesn't support SG then the win is mostly
offset by the need to copy the packet again.
...
SKB's from TSO are composed of
David Miller [EMAIL PROTECTED] wrote:
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 15 May 2007 14:22:57 -0700
I just wonder without TSO support in HW, how much benefit we
can get by pushing GSO from interface layer to device layer besides
we can do multiple packets in IPoIB.
I bet
On Wed, 2007-16-05 at 23:25 -0400, jamal wrote:
This patch now includes two changed drivers (tun and e1000). I have
tested tun with this patch. I tested e1000 earlier and i couldn't find
any issues - although as the title says it's a WIP.
As before you need net-2.6. You also need the qdisc
Jamal,
Here are some comments i have on your patch.
See them inline.
Thanks
Sridhar
+static int try_get_tx_pkts(struct net_device *dev, struct Qdisc *q, int count)
+{
+ struct sk_buff *skb;
+ struct sk_buff_head *skbs = dev->blist;
+ int tdq = count;
+
+ /*
+*
On Wed, 2007-16-05 at 15:12 -0700, Sridhar Samudrala wrote:
Jamal,
Here are some comments i have on your patch.
See them inline.
Thanks for taking the time Sridhar.
try_tx_pkts() is directly calling the device's batch xmit routine.
Don't we need to call dev_hard_start_xmit() to handle
On Wed, 2007-16-05 at 18:52 -0400, jamal wrote:
On Wed, 2007-16-05 at 15:12 -0700, Sridhar Samudrala wrote:
I will have to think a bit about this; i may end up coalescing when
grabbing the packets but call the nit from the driver using a helper.
That's what i did. This would hopefully work
Hi Sridhar,
Sridhar Samudrala [EMAIL PROTECTED] wrote on 05/17/2007 03:42:03 AM:
AFAIK, gso_skb can be a list of skb's. Can we add a list
to another list using __skb_queue_head()?
Also, if gso_skb is a list of multiple skb's, i think the
count needs to be decremented by the number of
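A minimal sketch of the point Sridhar is making, assuming the segments are linked
through skb->next the way skb_gso_segment() leaves them; the helper name and the
requeue policy are made up, only the sk_buff_head primitives are real:

#include <linux/skbuff.h>

/* Hypothetical helper: gso_skb may be a list of segments linked via
 * skb->next, so queue them one at a time (__skb_queue_head() takes a
 * single skb) and account for every segment in the count.
 * Note: __skb_queue_head() pushes in front, so walking the list in order
 * reverses it; a real requeue would also have to preserve segment order. */
static int requeue_gso_list(struct sk_buff_head *q, struct sk_buff *segs,
			    int count)
{
	while (segs) {
		struct sk_buff *next = segs->next;

		segs->next = NULL;
		__skb_queue_head(q, segs);
		count--;		/* one slot per segment, not per list */
		segs = next;
	}
	return count;
}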
Krishna Kumar2 wrote:
Hi Sridhar,
Sridhar Samudrala [EMAIL PROTECTED] wrote on 05/17/2007 03:42:03 AM:
AFAIK, gso_skb can be a list of skb's. Can we add a list
to another list using __skb_queue_head()?
Also, if gso_skb is a list of multiple skb's, i think the
count needs to be decremented by
Sridhar Samudrala [EMAIL PROTECTED] wrote on 05/17/2007 03:14:41 AM:
Krishna Kumar2 wrote:
Hi Sridhar,
Sridhar Samudrala [EMAIL PROTECTED] wrote on 05/17/2007 03:42:03 AM:
AFAIK, gso_skb can be a list of skb's. Can we add a list
to another list using __skb_queue_head()?
Also, if
As I said before, getting multiple packets in one call to xmit would
be nice for amortizing per-xmit overhead in IPoIB. So it would be
nice if the cases where the stack does GSO ended up passing all the
segments into the driver in one go.
Well TCP does up to 64k -- that is what
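One way to read "multiple packets in one call to xmit" is a driver entry point
that takes a whole list instead of a single skb; the batch hook below is purely
hypothetical, only dev->hard_start_xmit() and the sk_buff_head calls are the real
API circa the 2.6.2x kernels in this thread:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

/* Hypothetical batched transmit path: hand the driver a whole list so the
 * per-xmit overhead (lock, register writes) is paid once per burst. */
typedef int (*xmit_batch_fn)(struct sk_buff_head *batch, struct net_device *dev);

static int send_batch(struct net_device *dev, struct sk_buff_head *batch,
		      xmit_batch_fn batch_xmit)
{
	struct sk_buff *skb;

	if (batch_xmit)
		return batch_xmit(batch, dev);

	/* No batch hook: fall back to one hard_start_xmit() call per packet. */
	while ((skb = __skb_dequeue(batch)) != NULL)
		if (dev->hard_start_xmit(skb, dev) != NETDEV_TX_OK)
			return -1;	/* caller is expected to requeue the rest */
	return 0;
}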
From: Roland Dreier [EMAIL PROTECTED]
Date: Tue, 15 May 2007 09:25:28 -0700
I'll have to think about implementing that for IPoIB. One issue I see
is if I have, say, 4 free entries in my send queue and skb_gso_segment()
gives me back 5 packets to send. It's not clear I can recover at that
I'll have to think about implementing that for IPoIB. One issue I see
is if I have, say, 4 free entries in my send queue and skb_gso_segment()
gives me back 5 packets to send. It's not clear I can recover at that
point -- I guess I have to check against gso_segs in the xmit routine
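A minimal sketch of the check Roland is describing, assuming the driver knows how
many send-queue slots are free; the helper is hypothetical, skb_is_gso() and
gso_segs are the real fields:

#include <linux/skbuff.h>

/* Hypothetical check in the xmit routine: refuse to segment a GSO skb
 * unless the hardware send queue has room for every resulting segment. */
static int tx_has_room(struct sk_buff *skb, unsigned int free_slots)
{
	unsigned int needed = skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1;

	return free_slots >= needed;
}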
On Tue, 2007-05-15 at 13:52 -0700, Roland Dreier wrote:
Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
queue is free, so we should get batching. However, I'm not sure that
I can count on a fudge factor ensuring that there's enough space to
handle everything
Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
queue is free, so we should get batching. However, I'm not sure that
I can count on a fudge factor ensuring that there's enough space to
handle everything skb_gso_segment() gives me -- is there any reliable
way to
I thought that to enable GSO, the device driver actually does nothing other
than enabling the flag. GSO moved TCP offloading to the interface layer before
device xmit. It's a different idea from multiple packets per xmit. GSO
still queues the packets one by one in the QDISC and xmits them one by one. The
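For reference, "enabling the flag" in a driver is roughly the feature bits set at
init time; the hw_can_do_sg check below is a hypothetical stand-in for whatever
capability test the device needs:

#include <linux/netdevice.h>

/* Rough sketch: software GSO is turned on by advertising NETIF_F_GSO at
 * driver init; SG and checksum offload are the flags that make it pay off. */
static void enable_soft_gso(struct net_device *dev, int hw_can_do_sg)
{
	dev->features |= NETIF_F_GSO;
	if (hw_can_do_sg)		/* hypothetical capability check */
		dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
}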
On Tue, 2007-05-15 at 14:08 -0700, Roland Dreier wrote:
Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
queue is free, so we should get batching. However, I'm not sure that
I can count on a fudge factor ensuring that there's enough space to
handle everything
From: Michael Chan [EMAIL PROTECTED]
Date: Tue, 15 May 2007 15:05:28 -0700
On Tue, 2007-05-15 at 14:08 -0700, Roland Dreier wrote:
Well, IPoIB doesn't do netif_wake_queue() until half the device's TX
queue is free, so we should get batching. However, I'm not sure that
I can count
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 15 May 2007 14:22:57 -0700
I just wonder without TSO support in HW, how much benefit we
can get by pushing GSO from interface layer to device layer besides
we can do multiple packets in IPoIB.
I bet the gain is non-trivial.
I'd say about
Shirley> I just wonder without TSO support in HW, how much
Shirley> benefit we can get by pushing GSO from interface layer to
Shirley> device layer besides we can do multiple packets in IPoIB.
The entire benefit comes from having multiple packets to queue in one
call to the xmit
On Tue, 2007-15-05 at 14:32 -0700, David Miller wrote:
An efficient qdisc-->driver
transfer during netif_wake_queue() could help solve some of that,
as is being discussed here.
Ok, heres the approach i discussed at netconf.
It needs net-2.6 and the patch i posted earlier to clean up
On Tue, 2007-15-05 at 18:17 -0400, jamal wrote:
I will post a patch for tun device in a few minutes
that i use to test on my laptop (i need to remove some debugs) to show
an example.
Ok, here it is.
The way i test is to point packets at a tun device. [One way i do it
is attach an ingress
From: Shirley Ma [EMAIL PROTECTED]
Date: Tue, 15 May 2007 16:33:22 -0700
That's interesting. So a generic LRO in the interface layer will benefit
performance more, right? Running the TCP receive path N times is more expensive
than sending, I think.
If you look at some of the drivers doing LRO,
On Tue, 2007-15-05 at 18:48 -0400, jamal wrote:
I will try to post the e1000 patch tonight or tomorrow morning.
I have the e1000 path done; a few features from the 2.6.18 version are missing
(mainly the one mucking with tx ring pruning on the tx path).
While it compiles and looks right - i haven't tested it
On Friday 11 May 2007 13:16:44 Roland Dreier wrote:
I wasn't talking about sending.
But there actually is :- TSO/GSO.
As I said before, getting multiple packets in one call to xmit would
be nice for amortizing per-xmit overhead in IPoIB. So it would be
nice if the cases where the
Krishna Kumar2 wrote:
What about a race between trying to reacquire queue_lock and another
failed transmit?
That is not possible either. I hold the QDISC_RUNNING bit in dev->state and
am the only sender for this device, so there is no other failed transmit.
Also, on failure of dev_hard_start_xmit,
Hi Dave,
David Miller [EMAIL PROTECTED] wrote on 05/11/2007 02:27:07 AM:
I don't understand how transmitting already batched up packets in one go
introduces latency.
Keep thinking :-)
The only case where these ideas can be seriously considered is during
netif_wake_queue(). In all other
Hi Gagan,
Gagan Arneja [EMAIL PROTECTED] wrote on 05/11/2007 11:27:54 AM:
Right, but I am the sole dequeue'r, and on failure, I requeue those packets
to the beginning of the queue (just as it would happen in the regular case of
one packet xmit/failure/requeue).
What about a race
(My mistake, I didn't reply-all the previous time)
Hi Dave,
David Stevens [EMAIL PROTECTED] wrote on 05/11/2007 02:57:56 AM:
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit;
Hi Gagan,
I have to claim incomplete familiarity with the code. But still, if
you're out there running with no locks for a period, there's no
assumption you can make. The "lock could be held quickly" assertion is a
fallacy.
I will try to explain since the code is pretty complicated. Packets
Hi Roland,
Roland Dreier [EMAIL PROTECTED] wrote on 05/11/2007 01:51:50 AM:
This is pretty interesting to me for IP-over-InfiniBand, for a couple
of reasons. First of all, I can push multiple send requests to the
underlying adapter in one go, which saves taking and dropping the same
lock
Krishna Kumar [EMAIL PROTECTED] writes:
Doing some measurements, I found that for small packets like 128 bytes,
the bandwidth is approximately 60% of the line speed. To possibly speed
up performance of small packet xmits, a method of linking skbs was
thought of - where two pointers
Hi Andy,
[EMAIL PROTECTED] wrote on 05/11/2007 02:35:05 PM:
You don't need that. You can just use the normal next/prev pointers.
In general it's a good idea to lower lock overhead etc., the VM has
used similar tricks very successfully in the past.
Does this mean each skb should be for the
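A minimal sketch of Andi's suggestion, using only the existing next/prev pointers
through an sk_buff_head rather than a new link field; the two helper names are
made up:

#include <linux/skbuff.h>

/* Build the batch with the ordinary next/prev pointers (via sk_buff_head)
 * instead of adding a new link field to struct sk_buff. */
static void batch_queue(struct sk_buff_head *batch, struct sk_buff *skb)
{
	__skb_queue_tail(batch, skb);	/* links via skb->next/prev internally */
}

static struct sk_buff *batch_next(struct sk_buff_head *batch)
{
	return __skb_dequeue(batch);	/* NULL once the batch is empty */
}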
Hi Andy,
Andi Kleen [EMAIL PROTECTED] wrote on 05/11/2007 03:07:14 PM:
But without it aggregation on RX is much less useful because the packets
cannot be kept together after socket demux which happens relatively early
in the packet processing path.
Then I misunderstood you, my proposal is
On Fri, May 11, 2007 at 10:34:22AM +0530, Krishna Kumar2 ([EMAIL PROTECTED])
wrote:
I am not combining packets; I am sending them out in the same sequence they were
queued. If the xmit failed, the driver's new API returns the skb which
failed to be sent. This skb and all other linked skbs are
Hi Evgeniy,
Evgeniy Polyakov [EMAIL PROTECTED] wrote on 05/11/2007 02:31:38 PM:
On Fri, May 11, 2007 at 10:34:22AM +0530, Krishna Kumar2
([EMAIL PROTECTED]) wrote:
I am not combining packets; I am sending them out in the same sequence they
were queued. If the xmit failed, the driver's new API
Evgeniy Polyakov [EMAIL PROTECTED] wrote on 05/11/2007 03:02:02 PM:
On Fri, May 11, 2007 at 02:48:14PM +0530, Krishna Kumar2
([EMAIL PROTECTED]) wrote:
And what if you have thousand(s) of packets queued and the first one has
failed, requeueing all the rest one-by-one is not a solution. If it is
On Fri, May 11, 2007 at 03:22:13PM +0530, Krishna Kumar2 ([EMAIL PROTECTED])
wrote:
No locks, no requeues? Seems simple imho.
I will analyze this in more detail when I return (leaving just now, so got
really no time). The only issue that I see quickly is "No locks", since to
get things off
Hi all,
Very preliminary testing with 20 procs on E1000 driver gives me the following
result:

skbsz   Org BW    New BW    %       Org demand   New Demand   %
32      315.98    347.48    9.97%   21090        20958        0.62%
96
On Fri, May 11, 2007 at 02:48:14PM +0530, Krishna Kumar2 ([EMAIL PROTECTED])
wrote:
And what if you have thousand(s) of packets queued and the first one has
failed, requeueing all the rest one-by-one is not a solution. If it is
being done under heavy lock (with disabled irqs especially) it
Sounds like a good idea. I had a question on error handling. What happens if
the driver asynchronously returns an error for this WR (single WR
containing multiple skbs) ? Does it mean all the skbs failed to be sent ?
Requeuing all of them is a bad idea since it leads to infinitely doing the
On Fri, 2007-11-05 at 10:52 +0530, Krishna Kumar2 wrote:
I didn't try to optimize the driver to take any real advantage, I coded it
as simply as:
top:
next = skb->skb_flink;
Original driver code here, or another option is to remove the locking
and put it before the
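Filling in the shape of that loop as a hedged sketch: the chain is linked here
through the ordinary skb->next pointer (Krishna's patch uses its own skb_flink
field for the same purpose), and xmit_one stands in for the unmodified per-packet
driver code:

#include <linux/netdevice.h>
#include <linux/skbuff.h>

static int xmit_chain(struct net_device *dev, struct sk_buff *skb,
		      int (*xmit_one)(struct net_device *, struct sk_buff *))
{
	while (skb) {
		struct sk_buff *next = skb->next;

		skb->next = NULL;
		if (xmit_one(dev, skb))	/* original per-packet driver code */
			return -1;	/* failure: caller requeues from skb onwards */
		skb = next;
	}
	return 0;
}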
On Fri, 2007-11-05 at 13:56 +0400, Evgeniy Polyakov wrote:
I meant no locks during processing of the packets (pci read/write, dma
setup and so on), of course it is needed to dequeue a packet, but only
for that operation.
I don't think you can avoid the lock, Evgeniy. You need to protect against
I wasn't talking about sending.
But there actually is :- TSO/GSO.
As I said before, getting multiple packets in one call to xmit would
be nice for amortizing per-xmit overhead in IPoIB. So it would be
nice if the cases where the stack does GSO ended up passing all the
segments into the
On Fri, May 11, 2007 at 07:30:02AM -0400, jamal ([EMAIL PROTECTED]) wrote:
I meant no locks during processing of the packets (pci read/write, dma
setup and so on), of course it is needed to dequeue a packet, but only
for that operation.
I don't think you can avoid the lock, Evgeniy. You
On Fri, 2007-11-05 at 15:53 +0400, Evgeniy Polyakov wrote:
As I said there might be another lock, if interrupt handler is shared,
or registers are accessed, but it is the private driver's business, which
has nothing in common with the stack itself.
Ok, we are saying the same thing then. eg in e1000
Right, but I am the sole dequeue'r, and on failure, I requeue those packets
to the beginning of the queue (just as it would happen in the regular case of
one packet xmit/failure/requeue).
What about a race between trying to reacquire queue_lock and another
failed transmit?
--
Gagan
- KK
Hi all,
While looking at common packet sizes on xmits, I found that most of
the packets are small. On my personal system, the statistics of
packets after using it (browsing, mail, ftp'ing two Linux kernels from
www.kernel.org) for about 6 hours are:
On Thu, May 10, 2007 at 08:23:51PM +0530, Krishna Kumar ([EMAIL PROTECTED])
wrote:
The reason to implement the same was to speed up IPoIB driver. But
before doing that, a proof of concept for E1000/AMSO drivers was
considered (as most of the code is generic) before implementing for
IPoIB. I
Hi Evgeniy,
Evgeniy Polyakov [EMAIL PROTECTED] wrote on 05/10/2007 08:38:33 PM:
On Thu, May 10, 2007 at 08:23:51PM +0530, Krishna Kumar
([EMAIL PROTECTED]) wrote:
The reason to implement the same was to speed up IPoIB driver. But
before doing that, a proof of concept for E1000/AMSO drivers
On Thu, May 10, 2007 at 08:52:12PM +0530, Krishna Kumar2 ([EMAIL PROTECTED])
wrote:
The reason to implement the same was to speed up IPoIB driver. But
before doing that, a proof of concept for E1000/AMSO drivers was
considered (as most of the code is generic) before implementing for
On Thu, 2007-10-05 at 19:48 +0400, Evgeniy Polyakov wrote:
IMHO if you do not see anything related to the driver's xmit function
in a profile, it does not need to be fixed.
True, but i think there may be value in amortizing the cost towards
the driver.
i.e. if you grab a lock and send X packets
It is the reverse - GSO will segment one super-packet just before calling
the driver so that the stack is traversed only once. In my case, I am
trying to send out multiple skbs, possibly small packets, in one shot.
GSO will not help for small packets.
If there are small packets that implies
On Thu, 2007-05-10 at 10:19 -0700, Rick Jones wrote:
It is the reverse - GSO will segment one super-packet just before calling
the driver so that the stack is traversed only once. In my case, I am
trying to send out multiple skbs, possibly small packets, in one shot.
GSO will not help for
Rick Jones wrote:
It is the reverse - GSO will segment one super-packet just before calling
the driver so that the stack is traversed only once. In my case, I am
trying to send out multiple skbs, possibly small packets, in one shot.
GSO will not help for small packets.
If there are small
Vlad Yasevich wrote:
Rick Jones wrote:
It is the reverse - GSO will segment one super-packet just before calling
the driver so that the stack is traversed only once. In my case, I am
trying to send out multiple skbs, possibly small packets, in one shot.
GSO will not help for small packets.
Rick Jones wrote:
Vlad Yasevich wrote:
Rick Jones wrote:
It is the reverse - GSO will segment one super-packet just before
calling
the driver so that the stack is traversed only once. In my case, I am
trying to send out multiple skbs, possibly small packets, in one shot.
GSO will not help
Not sure if DCCP might fall into this category as well...
I think the idea of this patch is to gather some number of these small packets
and shove them at the driver in one go instead of each small packet at a time.
This reminds me... (rick starts waxing rhapsodic about old HP-UX behaviour :)
On 5/11/07, Vlad Yasevich [EMAIL PROTECTED] wrote:
Maybe for TCP? What about other protocols?
There are other protocols?-) True, UDP, and I suppose certain modes of
SCTP might be sending streams of small packets, as might TCP with
TCP_NODELAY set.
Do they often queue-up outside the
Ian McDonald wrote:
On 5/11/07, Vlad Yasevich [EMAIL PROTECTED] wrote:
Maybe for TCP? What about other protocols?
There are other protocols?-) True, UDP, and I suppose certain modes of
SCTP might be sending streams of small packets, as might TCP with
TCP_NODELAY set.
Do they
On 5/11/07, Vlad Yasevich [EMAIL PROTECTED] wrote:
The win might be biggest on a system where a lot of applications send a lot of
small packets. Some number will aggregate in the prio queue and then get shoved
into a driver in one go.
That's assuming that the device doesn't run out of things
small packets belonging to the same connection could be coalesced by
TCP, but this may help the case where multiple parallel connections are
sending small packets.
It's not just small packets. The cost of calling hard_start_xmit/byte
was rather high on your particular device. I've seen PCI
The discussion seems to have steered into protocol coalescing.
My tests for example were related to forwarding and not specific
to any protocol.
On Thu, 2007-10-05 at 12:43 -0700, Gagan Arneja wrote:
It's not just small packets. The cost of calling hard_start_xmit/byte
was rather high on
jamal wrote:
The discussion seems to have steered into protocol coalescing.
My tests for example were related to forwarding and not specific
to any protocol.
Just the natural tendency of end-system types to think of end-system things
rather than router things.
rick jones
jamal wrote:
You would need to almost re-write the driver to make sure it does IO
in a way that takes advantage of the batching.
Really! It's just the transmit routine. How radical can that be?
--
Gagan
On Thu, 2007-10-05 at 13:14 -0700, Rick Jones wrote:
Just the natural tendency of end-system types to think of end-system things
rather than router things.
Well router types felt they were being left out ;->
cheers,
jamal
On Thu, 2007-10-05 at 13:15 -0700, Gagan Arneja wrote:
Really! It's just the transmit routine. How radical can that be?
Ok, you have a point there, but it could be challenging with many
tunables:
For example:
my biggest challenge with the e1000 was just hacking up the DMA setup
path - i seem
This is pretty interesting to me for IP-over-InfiniBand, for a couple
of reasons. First of all, I can push multiple send requests to the
underlying adapter in one go, which saves taking and dropping the same
lock and also probably allows fewer MMIO writes for doorbells.
However the second reason
For example:
my biggest challenge with the e1000 was just hacking up the DMA setup
path - i seem to get better numbers if i don't kick the DMA until i stash
all the packets on the ring first etc. It seemed counter-intuitive.
That seems to make sense. The rings are(?) in system memory and you
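A hedged sketch of the "stash everything on the ring first, kick once" pattern
jamal describes; struct tx_ring, fill_tx_desc() and the tail register are
hypothetical stand-ins for the e1000-specific pieces:

#include <linux/skbuff.h>
#include <asm/io.h>

struct tx_ring {
	unsigned int	next_to_use;
	void __iomem	*tail_reg;	/* hypothetical tail-pointer register */
};

static void xmit_burst(struct tx_ring *ring, struct sk_buff **pkts, int n,
		       void (*fill_tx_desc)(struct tx_ring *, struct sk_buff *))
{
	int i;

	for (i = 0; i < n; i++)
		fill_tx_desc(ring, pkts[i]);	/* descriptor setup only, no doorbell */

	wmb();					/* descriptors visible before the kick */
	writel(ring->next_to_use, ring->tail_reg);	/* one tail write for the whole burst */
}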
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 10 May 2007 15:21:30 -0400
The win might be biggest on a system where a lot of applications send
a lot of small packets. Some number will aggregate in the prio
queue and then get shoved into a driver in one go.
But... this is all conjecture
From: Gagan Arneja [EMAIL PROTECTED]
Date: Thu, 10 May 2007 12:43:53 -0700
It's not just small packets. The cost of calling hard_start_xmit/byte
was rather high on your particular device. I've seen PCI read
transaction in hard_start_xmit taking ~10,000 cycles on one particular
device.
David Miller wrote:
If the qdisc is packed with packets and we would just loop sending
them to the device, yes it might make sense.
But if that isn't the case, which frankly is the usual case, you add a
non-trivial amount of latency by batching and that's bad exactly for
the kind of
David Miller wrote:
From: Vlad Yasevich [EMAIL PROTECTED]
Date: Thu, 10 May 2007 15:21:30 -0400
The win might be biggest on a system where a lot of applications send
a lot of small packets. Some number will aggregate in the prio
queue and then get shoved into a driver in one go.
But... this
From: Gagan Arneja [EMAIL PROTECTED]
Date: Thu, 10 May 2007 13:40:22 -0700
David Miller wrote:
If the qdisc is packed with packets and we would just loop sending
them to the device, yes it might make sense.
But if that isn't the case, which frankly is the usual case, you add a
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 10 May 2007 13:49:44 -0700
I'd think one would only do this in those situations/places where a
natural out of driver queue develops in the first place wouldn't
one?
Indeed.
David Miller wrote:
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 10 May 2007 13:49:44 -0700
I'd think one would only do this in those situations/places where a
natural out of driver queue develops in the first place wouldn't
one?
Indeed.
And one builds in qdisc because your device sink is
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length greater than 1.
I think the intent is to remove queue lock cycles by taking
the whole
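A minimal sketch of that idea: pay the queue lock once per burst by moving
everything currently queued onto a private list, then transmit with no queue lock
held. The function and the batch list are hypothetical; only the sk_buff_head and
spinlock primitives are real:

#include <linux/skbuff.h>
#include <linux/spinlock.h>

static void drain_to_batch(struct sk_buff_head *q, spinlock_t *qlock,
			   struct sk_buff_head *batch)
{
	struct sk_buff *skb;

	spin_lock(qlock);			/* one acquisition for the whole burst */
	while ((skb = __skb_dequeue(q)) != NULL)
		__skb_queue_tail(batch, skb);
	spin_unlock(qlock);
	/* caller hands 'batch' to the driver with no queue lock held */
}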
From: David Stevens [EMAIL PROTECTED]
Date: Thu, 10 May 2007 14:27:56 -0700
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length
David Stevens wrote:
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length greater than 1.
I think the intent is to remove queue lock
David Stevens wrote:
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length greater than 1.
I think the intent is to remove queue lock
David Miller wrote:
Right.
But I think it's critical to do two things:
1) Do this when netif_wake_queue() is triggered and thus the
TX is locked already.
2) Have some way for the driver to say how many free TX slots
there are in order to minimize if not eliminate requeueing
during
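One hypothetical shape for point 2), a callback the core could query before
deciding how much to dequeue; nothing like this exists in the stack, it is only
an illustration:

#include <linux/netdevice.h>

struct batch_ops {
	unsigned int (*tx_room)(struct net_device *dev);	/* free TX descriptors */
};

/* Ask the driver for its free TX slots so the core only dequeues what fits,
 * minimizing (or eliminating) requeueing. */
static unsigned int batch_budget(struct net_device *dev,
				 const struct batch_ops *ops)
{
	return (ops && ops->tx_room) ? ops->tx_room(dev) : 1;	/* default: one packet */
}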
Which worked _very_ well (the whole list) going in the other direction for the
netisr queue(s) in HP-UX 10.20. OK, I promise no more old HP-UX stories for the
balance of the week :)
Yes, OSes I worked on in other lives usually took the whole queue
and then took responsibility for
From: Gagan Arneja [EMAIL PROTECTED]
Date: Thu, 10 May 2007 14:50:19 -0700
David Miller wrote:
If you drop the TX lock, the number of free slots can change
as another cpu gets in there queuing packets.
Can you ever have more than one thread inside the driver? Isn't
xmit_lock held
Eric Dumazet wrote:
David Stevens wrote:
The word small is coming up a lot in this discussion, and
I think packet size really has nothing to do with it. Multiple
streams generating packets of any size would benefit; the
key ingredient is a queue length greater than 1.
I think the intent is
On Thu, 10 May 2007 14:14:05 -0700
Gagan Arneja [EMAIL PROTECTED] wrote:
David Miller wrote:
From: Rick Jones [EMAIL PROTECTED]
Date: Thu, 10 May 2007 13:49:44 -0700
I'd think one would only do this in those situations/places where a
natural out of driver queue develops in the first
If you have braindead slow hardware,
there is nothing that says your start_xmit routine can't do its own
coalescing. The cost of calling the transmit routine is the responsibility
of the driver, not the core network code.
Yes, except you very likely run the risk of the driver introducing
Ian McDonald [EMAIL PROTECTED] wrote on 05/11/2007 12:29:08 AM:
As I see this proposed patch it is about reducing the number of task
switches between the driver and the protocol. I use "task switch" in
speech marks as it isn't really one as in the kernel. So in other words
we are hoping that
Gagan Arneja [EMAIL PROTECTED] wrote on 05/11/2007 01:13:53 AM:
Also, I think, you don't have to chain skbs, they're already chained in
Qdisc-q. All you have to do is take the whole q and try to shove it
at the device hoping for better results. But then, if you have rather
big backlog, you
J Hadi Salim [EMAIL PROTECTED] wrote on 05/11/2007 01:41:27 AM:
It's not just small packets. The cost of calling hard_start_xmit/byte
was rather high on your particular device. I've seen PCI read
transaction in hard_start_xmit taking ~10,000 cycles on one particular
device. Count the
David Miller [EMAIL PROTECTED] wrote on 05/11/2007 02:07:10 AM:
From: Gagan Arneja [EMAIL PROTECTED]
Date: Thu, 10 May 2007 12:43:53 -0700
Also, I think, you don't have to chain skbs, they're already chained in
Qdisc->q. All you have to do is take the whole q and try to shove it
at
Krishna Kumar2 wrote:
I haven't seen reordering packets (I did once when I was having a bug in
the requeue code, some TCP messages on the receiver indicating packets out of
order). When a send fails, the packets are requeued in reverse (go to the end of
the failed skb and traverse back to the failed skb
Gagan Arneja [EMAIL PROTECTED] wrote on 05/11/2007 11:05:47 AM:
Krishna Kumar2 wrote:
I haven't seen reordering packets (I did once when I was having a bug in
the requeue code, some TCP messages on the receiver indicating packets out of
order). When a send fails, the packets are requeued in