RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-10 Thread jamal
On Fri, 2007-11-05 at 09:58 +0800, Zhu Yi wrote:

> 
> Good, we agree on this. 

Good start.

> Now let's solve the problem.
> 

Let's take it slowly, because I think I wasn't getting anywhere with
Peter. If you can make sense of what I am saying, maybe we can make
progress.

> When the low priority ring buffer is full in the hardware, will you
> suppose the driver call netif_stop_queue() or not? 

Yes, I will.
With a note: this is precisely what I have been saying all along
in my postings. I explained my reasoning; if you missed the why, I think I
should write it up.
So in order to make progress I am not going to respond to the rest of
the email; tell me if you don't understand me and I will do a write-up,
hopefully tomorrow or over the weekend.
Or tell me you understand me and disagree, and why.

cheers,
jamal



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-10 Thread Zhu Yi
On Thu, 2007-05-10 at 08:35 -0400, jamal wrote:
> So we may be agreeing then?
> In other words, if you had both low prio and high prio in WMM
> scheduler
> (in wireless hardware) then the station favors a higher priority
> packet
> over a low priority packet at ALL times.
> IOW:
> Given the default 802.11e AIFS, CWmin/max and PF (and TXOP) parameters
> used for the different WMM queues there is no way that a lower prio
> packet will ever be allowed to leave when it is competing with a
> higher
> prio packet. 
> This approach is what the strict prio qdisc already does. The slight
> difference is the prio qdisc is deterministic and the WMM is
> statistical
> (strict prio) in nature - i.e. there is a statistical "luck" possibility
> (not design intent) for a lower prio packet to go out.
> 
> Does this make sense? 

Good, we agree on this. Now let's solve the problem.

When the low priority ring buffer in the hardware is full, do you expect
the driver to call netif_stop_queue() or not? For old Ethernet, I think
the answer is yes, because the packets in the ring buffer have to be sent
out anyway before there is room for new packets. But in wireless (or
multiqueue devices), high priority packets can still be sent out through
the high ring buffer although the low ring buffer is full. This is how the
wireless MAC differs, which is where we agreed above.

To enable this, we need to manage the device queues separately, i.e.
netif_stop_subqueue(), and the Qdisc dequeue method must be able to feed
the device with only high priority packets when the low ring is full.
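
For illustration only, a minimal userspace C sketch of that idea (names and
structures are invented here, not the actual patch; the per-band hw_stopped
flag stands in for whatever state a netif_stop_subqueue()-style call would
set):

#include <stdio.h>
#include <stdbool.h>

#define BANDS 2          /* 0 = high prio, 1 = low prio */

struct band {
    int backlog;         /* packets queued in software for this band */
    bool hw_stopped;     /* driver signalled that this ring is full */
};

/* Strict-priority dequeue that skips bands whose hardware ring is full. */
static int dequeue(struct band *b)
{
    for (int i = 0; i < BANDS; i++) {
        if (b[i].hw_stopped)     /* low ring full: leave its packets queued */
            continue;
        if (b[i].backlog > 0) {
            b[i].backlog--;
            return i;            /* hand a packet from band i to the driver */
        }
    }
    return -1;                   /* nothing eligible to send */
}

int main(void)
{
    struct band b[BANDS] = { { .backlog = 1 }, { .backlog = 3, .hw_stopped = true } };
    int band;

    while ((band = dequeue(b)) >= 0)
        printf("sent packet from band %d\n", band);
    printf("low prio backlog still held back: %d\n", b[1].backlog);
    return 0;
}

The point being that the dequeue side needs per-ring state to make that
decision; with only the global netif_stop_queue() it cannot.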

I think it's possible to do all these changes in a specific Qdisc and
leave all the Qdisc APIs, netif_{start,stop}_queue(), etc. untouched. But
that turns out to be what mac80211 QoS is right now, which you also call
a hack, right? I think Peter's patch resolves the problem in a generic
way. It avoids every multiqueue device creating its own Qdisc to do this
work, which would also mean duplicating a lot of common code.

Thanks,
-yi


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-10 Thread jamal
On Thu, 2007-10-05 at 11:22 -0700, Waskiewicz Jr, Peter P wrote:
> > Wireless offers a strict priority scheduler with statistical 
> > transmit (as opposed to the deterministic one offered by the linux 
> > strict prio qdisc); so wireless is not in the same boat as DCE.
> 
> Again, you're comparing these patches with DCE, which is not the intent.
> It's something I presented that can use these patches, not as a
> justification for them.

I was making the claim that wireless _does not_ need you hacking on the
core code. It will work just fine with a prio qdisc. And the more
I think about it, the less I think DCE needs it ...

> I ran some tests on a 1gig network (isolated) using 2 hardware queues,
> streaming video on one and having everything else on the other queue.
> After the buffered video is sent and the request for more video is made,
> I see a slowdown with a single queue.  

What does this mean?

> I see a difference using these patches to mitigate the impact to the 
> different flows; 

That is an extremely strong statement to be making. We need your
patches to get effective QoS?

> Linux may be good
> at scheduling, but that doesn't help when hardware is being pushed to
> its limit - this was running full line-rate constantly (uncompressed mpg
> for video and standard iperf settings for LAN traffic).

What qdisc did you use on the single hardware queue? What was the
classifier you used to separate the video from the rest?
Why do I get the feeling that you did not configure Linux to give you
the separation needed? If you want to do it properly I can help.
I will chop off the rest of the text below because imo you need to
compare apples to apples and we are not getting anywhere.

Ok, how do we make forward progress? It seems to me we are back to
square one where I don't see a meeting in the middle.
I wanted to help, but you are so persistent about selling your patches
that we are losing track of the discussion.
I strongly disagree with your approach and you strongly believe in your
patches. Maybe I should drop off this conversation and you can go
convince Dave?

cheers,
jamal



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-10 Thread Waskiewicz Jr, Peter P
> Wireless offers a strict priority scheduler with statistical 
> transmit (as opposed to the deterministic one offered by the linux 
> strict prio qdisc); so wireless is not in the same boat as DCE.

Again, you're comparing these patches with DCE, which is not the intent.
It's something I presented that can use these patches, not as a
justification for them.

> Once you run the ATA over ethernet with your approach, please 
> repeat the test with a single ring in hardware and an 
> equivalent qdisc in linux.
> I dont believe you will see any difference - Linux is that good.
> This is not to say i am against your patches, I am just for 
> optimizing for the common.

I ran some tests on a 1gig network (isolated) using 2 hardware queues,
streaming video on one and having everything else on the other queue.
After the buffered video is sent and the request for more video is made,
I see a slowdown with a single queue.  I see a difference using these
patches to mitigate the impact to the different flows; Linux may be good
at scheduling, but that doesn't help when hardware is being pushed to
its limit - this was running full line-rate constantly (uncompressed mpg
for video and standard iperf settings for LAN traffic).

I almost ran some tests where I resize the Tx rings to give more buffer
for the streaming video (or ATA over Ethernet, in my previous example),
and less for the LAN traffic.  I can see people who want to ensure more
resources for latency-sensitive traffic doing this, and it would
certainly show a more significant impact without the queue visibility in
the kernel.  I did not run these tests though, since unmodified ring
sizes showed that with my patches, I have less impact to my more
demanding flow than with a single ring and the same qdisc.  I suggest
you actually try it and see.

So I have run these tests at 1gig with a 2-core and a 4-core system.  I'd
argue this is optimizing for the common, since I used streaming video in
my test, whereas someone else can use ATA over Ethernet, nbd, or VoIP,
and still benefit this way.  Please provide a counter-argument or data
showing this is not the case.

> You dont believe Linux has actually been doing QoS all these 
> years before DCE? It has. And we have been separating flows 
> all those years too. 

Indeed it has been.  But the hardware is now getting fast enough and
feature rich enough that the stack needs to mature and use the extra
queues.  Having multiple queues in software, multiple queues in
hardware, and a one-lane tunnel to get between them is not right in my
opinion.  It's like taking a 2-lane highway and putting a 1-lane tunnel
in the middle of it; when traffic gets heavy, everyone is affected,
which is wrong.  That's why they put those neat diamond lanes on
highways.  :)

> Wireless with CSMA/CA is a slightly different beast due to 
> the shared channels; its worse but not very different in 
> nature than the case where you have a shared ethernet hub 
> (CSMA/CD) and you keep adding hosts to it
> - we dont ask the qdiscs to backoff because we have a collision.
> Where i find wireless intriguing is in the case where its 
> available bandwidth adjusts given the signal strength - but 
> you are talking about HOLs not that specific phenomena.

You keep referring to doing things for the "common," but you're giving
specific wireless-based examples with specific packet scheduling
configurations.  I've given 3 scenarios of fairly different traffic
configurations where these patches will help.  Yi Zhu has also replied
that he sees wireless benefiting from these patches, but if you don't
believe that's the case, it's something you guys can hash out.

Thanks,

-PJ


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-10 Thread jamal
On Thu, 2007-10-05 at 11:02 +0800, Zhu Yi wrote:

> The difference is the hub provides the same transmission chance for all
> the packets, but in wireless, high priority packets will block low
> priority packets' transmission. You can argue there is still a chance
> that a low priority packet is sent before a high priority one. But this
> is not the point of wireless QoS. It rarely happens and should be
> avoided on a best-effort basis in the implementation.

So we may be agreeing then?
In other words, if you had both low prio and high prio in WMM scheduler
(in wireless hardware) then the station favors a higher priority packet
over a low priority packet at ALL times.
IOW:
Given the default 802.11e AIFS, CWmin/max and PF (and TXOP) parameters
used for the different WMM queues there is no way that a lower prio
packet will ever be allowed to leave when it is competing with a higher
prio packet. 
This approach is what the strict prio qdisc already does. The slight
difference is the prio qdisc is deterministic and the WMM is statistical
(strict prio) in nature - i.e. there is a statistical "luck" possibility
(not design intent) for a lower prio packet to go out.

Does this make sense?

cheers,
jamal





RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-09 Thread Zhu Yi
On Tue, 2007-05-08 at 19:28 -0400, jamal wrote:
> Wireless with CSMA/CA is a slightly different beast due to the shared
> channels; its worse but not very different in nature than the case
> where you have a shared ethernet hub (CSMA/CD) and you keep adding
> hosts to it 

The difference is the hub provides the same transmission chance for all
the packets, but in wireless, high priority packets will block low
priority packets' transmission. You can argue there is still a chance that
a low priority packet is sent before a high priority one. But this is not
the point of wireless QoS. It rarely happens and should be avoided on a
best-effort basis in the implementation.

Thanks,
-yi


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-09 Thread Johannes Berg
On Tue, 2007-05-08 at 09:28 -0400, jamal wrote:

> Those virtual devices you have right now. They are a hack that needs to
> go at some point.

Actually, if we're talking about mac80211, the "master" device we have
that does the qos stuff must go, but the other virtual devices need to
stay for WDS/AP and a lot of other combinations. Just to clear up some
possible confusion.

johannes




RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-08 Thread jamal
On Tue, 2007-08-05 at 08:35 -0700, Waskiewicz Jr, Peter P wrote:

> But the point is that although the DCE spec inspired the development of
> these patches, that is *not* the goal of these patches.  As Yi stated in
> a previous reply to this thread, the ability for any hardware to control
> its queues at the stack level in the kernel is something that is missing
> in the kernel.  If the hardware doesn't want to support it, then the
> patches as-is will not require anything to change in drivers to continue
> working as they do today.

Wireless offers a strict priority scheduler with statistical transmit
(as opposed to the deterministic one offered by the linux strict prio
qdisc); so wireless is not in the same boat as DCE.

> Bottom line: these patches are not for a specific technology.  I
> presented that spec to show a possible use case for these patches.  Yi
> presented a use case he can use in the wireless world.  I will be
> posting another use case shortly using ATA over Ethernet.
> 

Once you run the ATA over Ethernet test with your approach, please repeat
the test with a single ring in hardware and an equivalent qdisc in Linux.
I don't believe you will see any difference - Linux is that good.
This is not to say I am against your patches; I am just for optimizing
for the common case.

> > I dont believe wireless needs anything other than the simple 
> > approach i described. The fact that an occasional low prio packet 
> > may end up going out before a high prio one due to the contention 
> > does not affect the overall results.
> 
> I don't see how we can agree on whether having any type of
> head-of-line blocking of a flow is or is not a problem.  

But where is this head-of-line blocking coming from?
Please correct me if I am wrong:
If I had 4 hardware rings/queues in a wireless NIC with 4 different
WMM priorities all filled up (I would say impossible to achieve, but for
the sake of discussion assume it is possible), then
there is still a probability that a low prio packet will be sent
before a high prio one. It all depends on the probabilistic nature of
the channel availability as well as the tx opportunity and backoff
timings.

> You believe it
> isn't an issue, but this is a gap that I see existing in the stack
> today.  As networking is used for more advanced features (such as nbd or
> VoIP), having the ability to separate flows from each other all the way
> to the wire is, I think, a huge advantage to ensure true QoS.
> 

You dont believe Linux has actually been doing QoS all these years
before DCE? It has. And we have been separating flows all those years
too. 
Wireless with CSMA/CA is a slightly different beast due to the shared
channels; its worse but not very different in nature than the case where
you have a shared ethernet hub (CSMA/CD) and you keep adding hosts to it
- we dont ask the qdiscs to backoff because we have a collision.
Where i find wireless intriguing is in the case where its available
bandwidth adjusts given the signal strength - but you are talking about
HOLs not that specific phenomena.

cheers,
jamal




RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-08 Thread Waskiewicz Jr, Peter P
> As a summary, I am not against the concept of addressing 
> per-ring flow control. 
> Having said that, i fully understand where DaveM and Stephen 
> are coming from. Making such huge changes to a critical 
> region to support uncommon hardware doesnt abide to the 
> "optimize for the common" paradigm. 

But the point is that although the DCE spec inspired the development of
these patches, that is *not* the goal of these patches.  As Yi stated in
a previous reply to this thread, the ability for any hardware to control
its queues at the stack level in the kernel is something that is missing
in the kernel.  If the hardware doesn't want to support it, then the
patches as-is will not require anything to change in drivers to continue
working as they do today.

Bottom line: these patches are not for a specific technology.  I
presented that spec to show a possible use case for these patches.  Yi
presented a use case he can use in the wireless world.  I will be
posting another use case shortly using ATA over Ethernet.

> I dont believe wireless needs anything other than the simple 
> approach i described. The fact that an occasional low prio packet 
> may end up going out before a high prio one due to the contention 
> does not affect the overall results.

I don't see how we can agree on whether having any type of
head-of-line blocking of a flow is or is not a problem.  You believe it
isn't an issue, but this is a gap that I see existing in the stack
today.  As networking is used for more advanced features (such as nbd or
VoIP), having the ability to separate flows from each other all the way
to the wire is, I think, a huge advantage to ensure true QoS.

It's a shift in thinking.

Thanks,
-PJ


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-08 Thread jamal
On Tue, 2007-08-05 at 11:45 +0200, Johannes Berg wrote:
..

Sorry, I missed a lot of the discussions; I am swamped and will try
to catch up later tonight. I have quickly scanned the emails and
I will respond backwards (typically the most effective
way to catch up with a thread).

As a summary, I am not against the concept of addressing per-ring flow
control.
Having said that, I fully understand where DaveM and Stephen are coming
from. Making such huge changes to a critical region to support uncommon
hardware doesn't abide by the "optimize for the common" paradigm. That is
also the basis of my argument all along. I also agree it is quite a
fscked approach to have the virtual flow control. I think it is
driven by some marketing people and I don't really think there is a
science behind it. Switched (external) PCI-E, which is supposed to be
really cheap and hit the market RSN, has per-virtual-queue flow control,
so that may be where it came from. In any case, that is a digression.
Peter, can we meet the goals you strive for and stick to "optimize
for the common"? How willing are you to change directions to achieve
those goals?

> On Tue, 2007-05-08 at 17:33 +0800, Zhu Yi wrote:
> 
> > Jamal, as you said, the wireless subsystem uses an interim workaround
> > (the extra netdev approach) to achieve hardware packets scheduling. But
> > with Peter's patch, the wireless stack doesn't need the workaround
> > anymore. This is the actual fix.
> 

I don't believe wireless needs anything other than the simple approach I
described. The fact that an occasional low prio packet may end up
going out before a high prio one due to the contention does not affect
the overall results.

> Actually, we still need multiple devices for virtual devices? Or which
> multiple devices are you talking about here?
> 

Those virtual devices you have right now. They are a hack that needs to
go at some point.

cheers,
jamal



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-08 Thread Johannes Berg
Somehow I didn't see the mails in between. Let me think.

On Tue, 2007-05-08 at 17:33 +0800, Zhu Yi wrote:

> Jamal, as you said, the wireless subsystem uses an interim workaround
> (the extra netdev approach) to achieve hardware packets scheduling. But
> with Peter's patch, the wireless stack doesn't need the workaround
> anymore. This is the actual fix.

Actually, we still need multiple devices for virtual devices? Or which
multiple devices are you talking about here?

johannes




RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-08 Thread Zhu Yi
On Fri, 2007-05-04 at 23:22 +0200, Johannes Berg wrote:
> On Fri, 2007-05-04 at 13:43 -0700, Waskiewicz Jr, Peter P wrote:
> > If hardware exists that wants the granularity to
> > start/stop queues independent of each other and continue to have
> > traffic
> > flow, I really think it should be able to do that. 
> 
> Not much of an if there, I'm pretty sure at least some wireless hardware
> can do that. We've been watching this multiqueue stuff for a while now
> with some interest but haven't hashed out yet how we could use it.

Right.

Jamal, as you said, the wireless subsystem uses an interim workaround
(the extra netdev approach) to achieve hardware packet scheduling. But
with Peter's patch, the wireless stack doesn't need the workaround
anymore. This is the actual fix.


On Wed, 02 May 2007 08:43:49 -0400, jamal wrote:

> You feel the need to keep all the rings busy even when one is
> shut down; I claim by having a synced-up qdisc of the same scheduler 
> type you dont need to worry about that. Both approaches are correct;
> what I am proposing is many times simpler.

Let me explain why this is not true for wireless. Wireless prioritization
happens at the MAC level. That is, packet priorities not only compete
with each other in the host, they also compete in the network. For
example, once the wireless medium goes from busy to idle, the higher
priority packet seizes the channel after waiting a shorter time period
(which makes the channel unavailable again). Both the high and low
priority packets have to be queued in the hardware queues before they are
sent out, so that the hardware knows how to kick off its timers when it
detects the medium is idle. If the Qdisc stops feeding all packets just
because the hardware low prio queue is full (because it cannot seize the
channel in the network), it is unfair to the local high prio packets.
The host is then too "nice(2)" and does not let local high prio packets
compete with the ones on the other hosts. BTW, you cannot write a similar
scheduler in the Qdisc, since it requires hard real time at the
microsecond level.
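
To make the "shorter wait" point concrete, here is a toy model of that
contention in plain C (a gross simplification with invented names, using
roughly the 802.11e default AIFS/CWmin values for the voice and best-effort
classes; it only shows that the high prio queue statistically wins the
medium, it does not model 802.11e precisely):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* After the medium goes idle, each access category waits AIFS plus a
 * random backoff drawn from [0, CW]; the shorter wait wins the channel. */
struct ac { const char *name; int aifs; int cw; };

static int contend(const struct ac *a)     /* slots waited before transmitting */
{
    return a->aifs + rand() % (a->cw + 1);
}

int main(void)
{
    struct ac hi = { "voice (high prio)", 2, 3 };
    struct ac lo = { "best effort (low prio)", 3, 15 };
    int hi_wins = 0, trials = 100000;

    srand((unsigned)time(NULL));
    for (int i = 0; i < trials; i++)
        if (contend(&hi) < contend(&lo))    /* ties go to the low prio AC here */
            hi_wins++;
    printf("%s won the medium %d out of %d contentions\n",
           hi.name, hi_wins, trials);
    return 0;
}

The occasional low prio win is the statistical "luck" jamal refers to
elsewhere in the thread.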

On second thought, this is not wireless specific. It can be generalized
as hardware-level packet scheduling. I think the kernel needs this kind
of support, and Peter's patch addresses it well.

Thanks,
-yi



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-04 Thread Johannes Berg
On Fri, 2007-05-04 at 13:43 -0700, Waskiewicz Jr, Peter P wrote:
> If hardware exists that wants the granularity to
> start/stop queues independent of each other and continue to have
> traffic
> flow, I really think it should be able to do that. 

Not much of an if there, I'm pretty sure at least some wireless hardware
can do that. We've been watching this multiqueue stuff for a while now
with some interest but haven't hashed out yet how we could use it.

johannes




Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-04 Thread David Miller
From: "Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]>
Date: Fri, 4 May 2007 13:43:43 -0700

> And if someone can explain to me why 2 months of review and scrutiny
> of these patches has shifted in another direction, I'd really like
> to understand that.

One reason is that you're sort of making it clear that this feature is
for something else.  Something that you can't disclose at this time,
which of course in and of itself is perfectly fine.

However, if you can't talk about the real motivation for this feature,
and it really doesn't stand 100% on its own as-is without that
information (I don't think it does), trying to get this work in now
using a different premise is a little bit dishonest.  Don't you
think?

It's also a bit unreasonable to ask the community to buy into a
technology, the real purpose of which you cannot even talk about.

I'm sure everyone is happy to reconsider this work when the real
motivation can be talked about in the open.

Thanks.


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-04 Thread Waskiewicz Jr, Peter P
> Just because they want to standardize, and put it in hardware 
> doesn't mean it is a good idea and Linux needs to support it!

I gave this as the motivation for the original idea.  But these patches
have been under scrutiny in the community for months now, and nobody
seemed to think they were totally wrong in design, just a few
implementation issues.  I did not design this specifically for this
technology either; this was the original idea, but the idea of allowing
the kernel to manage multiple queues on the NIC is the main focus here,
which many people seemed to like.  What happened that shifted that
perception?

> Why is it better for hardware to make the "next packet to 
> send" decision?
> For wired ethernet, I can't see how adding the complexity of 
> fixed number of small queues is a gain. Better to just do the 
> priority decision in software and then queue it to the 
> hardware. This seems like the old Token Ring and MAP/TOP 
> style crap crammed on top of Ethernet.

Are you saying it'd be better for the flow control requests per priority
to come up into the stack?  If you look closely at what I proposed in my
patches, it's just an API exposing the queues in the NIC to the kernel.
That's all.  How the hardware handles it is completely independent to
these patches.  If hardware exists that wants the granularity to
start/stop queues independent of each other and continue to have traffic
flow, I really think it should be able to do that.  That is all these
patches are providing.  Jamal and I talked about the original
motivation, and so I shared that with the community.  I don't want the
focus to shift to a proposed standard that could take advantage of these
patches; rather, I'd like the focus to remain on the patches themselves,
and what they're providing.  And if someone can explain to me why 2
months of review and scrutiny of these patches has shifted in another
direction, I'd really like to understand that.

Thanks,

-PJ Waskiewicz


Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-04 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 4 May 2007 13:01:10 -0700

> Just because they want to standardize, and put it in hardware doesn't
> mean it is a good idea and Linux needs to support it!
> 
> Why is it better for hardware to make the "next packet to send" decision?
> For wired ethernet, I can't see how adding the complexity of fixed number
> of small queues is a gain. Better to just do the priority decision in software
> and then queue it to the hardware. This seems like the old Token Ring
> and MAP/TOP style crap crammed on top of Ethernet.

I suspect perhaps the real impetus behind this facility is not being
disclosed.

It certainly is not going to be faster to do this in hardware.

I myself don't see why in the world one would want to do this in
hardware either.  Software can do the selection fast enough and with
tons more flexibility.


Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-04 Thread Stephen Hemminger
On Thu, 3 May 2007 14:03:07 -0700
"Waskiewicz Jr, Peter P" <[EMAIL PROTECTED]> wrote:

> > Lets come up with some terminology; lets call multiqueue what 
> > the qdiscs do; lets call what the NICs do multi-ring.
> > Note, i have thus far said you need to have both and they 
> > must be in sync.
> 
> I agree with the terminology.
> 
> > This may be _the_ main difference we have in opinion.
> > Like i said earlier, I used to hold the same thoughts you do.
> > And i think you should challenge my assertion that it doesnt 
> > matter if you have a single entry point; [my assumptions are 
> > back in what i called #b and #c].
> 
> Here is a paper that describes exactly what we're trying to do:
> http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf.  Basically
> we need the ability to pause a queue independently of another queue.
> Because of this requirement, the kernel needs visibility into the driver
> and to have knowledge of and provide control of each queue.  Please note
> that the API I'm proposing is a generic representation of the Datacenter
> Ethernet mentioned in the paper; I figured if we're putting in an
> interface to support it, it should be generic so other technologies out
> there could easily use it.
> 

Just because they want to standardize, and put it in hardware doesn't
mean it is a good idea and Linux needs to support it!

Why is it better for hardware to make the "next packet to send" decision?
For wired Ethernet, I can't see how adding the complexity of a fixed number
of small queues is a gain. Better to just do the priority decision in software
and then queue it to the hardware. This seems like the old Token Ring
and MAP/TOP style crap crammed on top of Ethernet.


-- 
Stephen Hemminger <[EMAIL PROTECTED]>


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-04 Thread Waskiewicz Jr, Peter P
> Let me see if i got this right:
> This new standard sends _flow control_ packets per 802.1p value?

Yes.

> Sounds a bit fscked. I am assuming that the link flow control 
> is still on (not that i am a big fan).

No, it is not.  They're mutually exclusive.

> Is Datacenter ethernet the name of the standard or just a 
> marketing term?

It's the standard name, just as it's called out in the paper.

> I suspect that vendors have not yet started deploying this technology?

No, this is new stuff.

> Is there a switch out there that supports the feature? In 
> your view is this technology going to be more prevalent or 
> just a half-ass marketing adventure?

There is no hardware that does this yet, as far as I'm aware.
My view is that this technology will actually be pretty prevalent once the
standard is ratified and vendors begin deploying network products using
it.  I personally cringe when I have marketing people blowing smoke
about technology, and I haven't cringed much in any discussions
surrounding this technology.

> This certainly adds a new twist to the whole thing. I agree 
> that we need to support the feature and i see more room for a 
> consensus now (which was missing before). I need to think some more.

I appreciate that.

> Like i said this is very useful detail to know.
> Give me some time to get back to you. I need to mull over it.
> My strong view is still:
> a)  the changes to be totally transparent to the user.

The way the patches work as-is today, the user doesn't see any change
*unless* the driver underneath uses the new code.  Otherwise, everything
that exists operates just like it does now.

> b)  to have any new qdiscs (WRR for example) for multi-ring 
> hardware to benefit single-ring hardware

I agree we'll benefit from qdiscs that can help with multi-ring; I still
think a configurable PRIO (turn multi-ring on, multi-ring off) is useful
in the meantime.

> c)  no changes to the core; i can see perhaps a new call to 
> the qdisc to provide +/- credit but i need to think some more 
> about it ..

Note that the DCE spec called out here is not saying to have any of this
credit or flow control support in software, rather it's in hardware.
The only piece of this we need in the kernel is the per-queue
start/stop.  That being said, I see no other option but to change the
core (otherwise we can't know if a queue stopped before ->dequeue()).
If you can find another option to provide what we're doing without
touching the core, I'm all ears.

Thanks,

-PJ Waskiewicz


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-03 Thread jamal
On Thu, 2007-03-05 at 14:03 -0700, Waskiewicz Jr, Peter P wrote:

> Here is a paper that describes exactly what we're trying to do:
> http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf.  Basically
> we need the ability to pause a queue independently of another queue.

Ok, this is useful info Peter. 

Let me see if I got this right:
This new standard sends _flow control_ packets per 802.1p value?
Sounds a bit fscked. I am assuming that the link flow control is still
on (not that I am a big fan). And I wonder how it fits with end-to-end TCP
flow control etc; I can't find many details on Google. It almost smells
like credit/rate-based ATM flow control in disguise. Almost like these
folks think in terms of ATM VCs..
Is Datacenter Ethernet the name of the standard or just a marketing
term?
I suspect that vendors have not yet started deploying this technology?
Is there a switch out there that supports the feature? In your view
is this technology going to be more prevalent or just a half-ass
marketing adventure?

> Because of this requirement, the kernel needs visibility into the driver
> and to have knowledge of and provide control of each queue.  Please note
> that the API I'm proposing is a generic representation of the Datacenter
> Ethernet mentioned in the paper; I figured if we're putting in an
> interface to support it, it should be generic so other technologies out
> there could easily use it.

This certainly adds a new twist to the whole thing. I agree that we need
to support the feature and i see more room for a consensus now (which
was missing before). I need to think some more.

> Hopefully that paper can help people understand the motivation why I've
> done things the way they are in the patches.  Given this information,
> I'd really like to solicit feedback on the patches as they stand (both
> approach and implementation).

Like I said, this is very useful detail to know.
Give me some time to get back to you. I need to mull over it.
My strong view is still:
a)  the changes to be totally transparent to the user;
b)  to have any new qdiscs (WRR for example) for multi-ring hardware also
benefit single-ring hardware;
c)  no changes to the core; I can see perhaps a new call to the
qdisc to provide +/- credit but I need to think some more about it ..

If you can achieve those goals, we can go a long way ...

cheers,
jamal



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-03 Thread Waskiewicz Jr, Peter P
> Lets come up with some terminology; lets call multiqueue what 
> the qdiscs do; lets call what the NICs do multi-ring.
> Note, i have thus far said you need to have both and they 
> must be in sync.

I agree with the terminology.

> This may be _the_ main difference we have in opinion.
> Like i said earlier, I used to hold the same thoughts you do.
> And i think you should challenge my assertion that it doesnt 
> matter if you have a single entry point; [my assumptions are 
> back in what i called #b and #c].

Here is a paper that describes exactly what we're trying to do:
http://www.ieee802.org/3/ar/public/0503/wadekar_1_0503.pdf.  Basically
we need the ability to pause a queue independently of another queue.
Because of this requirement, the kernel needs visibility into the driver
and to have knowledge of and provide control of each queue.  Please note
that the API I'm proposing is a generic representation of the Datacenter
Ethernet mentioned in the paper; I figured if we're putting in an
interface to support it, it should be generic so other technologies out
there could easily use it.

Hopefully that paper can help people understand the motivation why I've
done things the way they are in the patches.  Given this information,
I'd really like to solicit feedback on the patches as they stand (both
approach and implementation).

Cheers,

-PJ Waskiewicz


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-02 Thread jamal
On Tue, 2007-01-05 at 16:04 -0700, Waskiewicz Jr, Peter P wrote:

I am just gonna delete stuff you had above here because i think you 
repeat those thoughts below. Just add back anything missed.
I will try to make this email shorter, but i am not sure i will
succeed;->

> > 1) You want to change the core code; i dont see a need for that.
> > The packet is received by the driver and netif stop works as 
> > before, with zero changes; the driver shuts down on the first 
> > ring full.
> > The only work is mostly driver specific.
> 
> To me, this doesn't buy you anything to do multiqueue only in the
> driver. 

Lets come up with some terminology; lets call multiqueue what the 
qdiscs do; lets call what the NICs do multi-ring.
Note, i have thus far said you need to have both and they must be
in sync.

> I agree the driver needs work to manage the queues in the
> hardware, but if a single feeder from the kernel is handing it
> packets, you gain nothing in my opinion without granularity of
> stopping/starting each queue in the kernel.
> 
This may be _the_ main difference we have in opinion.
Like i said earlier, I used to hold the same thoughts you do.
And i think you should challenge my assertion that it doesnt
matter if you have a single entry point; 
[my assumptions are back in what i called #b and #c].

> The changes to PRIO are an initial example of getting my multiqueue
> approach working.  This is the only qdisc I see being a logical change
> for multiqueue; other qdiscs can certainly be added in the future,
> which
> I plan on once multiqueue device support is in the kernel in some
> form.

Fair enough, I looked at:
http://internap.dl.sourceforge.net/sourceforge/e1000/OpenSDM_8257x-10.pdf
and it seems to be implementing WRR (the M and N parameters in the count
field).
WRR doesn't exist in Linux - for no good reason really; there's a gent who
promised to submit some clean code for it but hasn't been heard from since;
you can find some really old code I wrote here:
http://www.cyberus.ca/~hadi/patches/prio-drr.kernel.patch.gz

If you clean that up as a Linux qdisc, then other NICs can use it.
In your approach, that would only be usable by NICs with multi-rings
that implement WRR.
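
For reference, the kind of scheduler being talked about is simple enough to
sketch. Here is a toy weighted-round-robin dequeue in plain userspace C
(invented names, packet counts instead of a byte deficit, so only an
illustration of the idea rather than of the prio-drr patch above):

#include <stdio.h>

#define BANDS 3

/* Each band may send up to `weight` packets per round before the scheduler
 * moves on to the next band.  A real qdisc would use a deficit counter in
 * bytes, like DRR. */
struct band { int weight; int credit; int backlog; };

static int wrr_dequeue(struct band *b)
{
    static int cur = 0;

    for (int scanned = 0; scanned < 2 * BANDS; scanned++) {
        struct band *q = &b[cur];

        if (q->backlog > 0 && q->credit > 0) {
            q->credit--;
            q->backlog--;
            return cur;                 /* send one packet from band cur */
        }
        q->credit = q->weight;          /* refill and move to the next band */
        cur = (cur + 1) % BANDS;
    }
    return -1;                          /* all bands empty */
}

int main(void)
{
    struct band b[BANDS] = {
        { .weight = 3, .credit = 3, .backlog = 10 },
        { .weight = 2, .credit = 2, .backlog = 10 },
        { .weight = 1, .credit = 1, .backlog = 10 },
    };
    int band;

    for (int i = 0; i < 12 && (band = wrr_dequeue(b)) >= 0; i++)
        printf("%d ", band);            /* prints 0 0 0 1 1 2 0 0 0 1 1 2 */
    printf("\n");
    return 0;
}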

> > 3) For me: It is a driver change mostly. A new qdisc may be 
> > needed - but thats about it. 
> 
> I guess this is a fundamental difference in our thinking.  I think of
> multiqueue as the multiple paths out of the kernel, being managed by
> per-queue states.  

Yes, you are right; I think this is where we differ the most.
You feel the need to keep all the rings busy even when one is shut down;
I claim that by having a synced-up qdisc of the same scheduler type you
don't need to worry about that.
Both approaches are correct; what I am proposing is many times simpler.

> If that is the case, the core code has to be changed at some level,
> specifically in dev_queue_xmit(), so it can check the state of the
> subqueue your skb has been associated with
> (skb->queue_mapping in my patchset).  The qdisc needs to comprehend
> how to classify the skb (using TOS or TC) and assign the queue on the
> NIC to transmit on.
> 
Indeed, something like that would be needed; but it could also be a
simple scheme like netdev->pick_ring(skb->prio) or something along those
lines once the driver's hardware transmit is invoked - and therefore you
move it away from the main core. This way you only have the multi-ring
drivers doing the checks.
[Note: there is hardware I have seen which uses IEEE semantics of what
priority means (essentially 802.1p), which happens to be the reverse of
what the IETF thinks of it (DSCP/TOS view); so proper mapping is necessary.]
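
As a rough illustration of that pick_ring() idea (the hook name and the
table below are hypothetical, loosely modelled on the 802.1D
eight-priorities-to-four-queues recommendation, not taken from any
existing driver):

#include <stdio.h>

#define NUM_RINGS 4

/* Hypothetical driver-side mapping: the core never learns about rings,
 * the driver just maps the packet priority onto one of its hardware
 * rings at hard_start_xmit() time.  The table is per-driver; hardware
 * that uses 802.1p ordering may want it arranged differently than
 * hardware that follows a TOS/DSCP-style ordering, which is the point
 * about the mapping needing care. */
static const int prio2ring[8] = { 1, 0, 0, 1, 2, 2, 3, 3 };

static int pick_ring(unsigned int prio)
{
    return prio2ring[prio & 7];     /* clamp to the 8 priority values */
}

int main(void)
{
    for (unsigned int prio = 0; prio < 8; prio++)
        printf("priority %u -> hw ring %d\n", prio, pick_ring(prio));
    return 0;
}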

> My question to you is this: can you explain the benefit of 
> not allowing the kernel to know of and be able to manage the 
> queues on the NIC?  This seems to be the heart of our disagreement; 

For correctness it is not necessary to use all rings at the same time;
this is not to say that doing so (as you did) is wrong; both schemes
are correct.
To answer your question: for me it is for the sake of simplicity, i.e.
being less intrusive and making it transparent to both writers of qdiscs
as well as users of those qdiscs. Simplicity is always better when there
is no trumping difference. Simplicity can never trump correctness, i.e.
it would be totally wrong if I put a packet in coach
class when it paid for business class, and vice versa, for the sake of a
simple scheme. But that is not going to happen with either of those two
approaches.
The added side benefit is that if you follow what I described, you can then
get that WRR working for e1000 and other NICs as well. This in itself is
a big advantage.

> I  view the ability to manage
> these queues from the kernel as being true multiqueue, and view doing
> the queue management solely in the driver as something that doesn't
> give any benefit.

Let me be a bit long-winded ...
"benefit" should map to correctness; both yours and mine are correct. 
To my approach, managing the 

RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-01 Thread Waskiewicz Jr, Peter P
> > If that queue is stopped, the qdisc will never get called to run and
> > ->dequeue(), and hard_start_xmit() will never be called. 
> 
> yes, that is true (and the desired intent)

That intent is not what we want with our approach.  The desired intent
is to have independent network flows from the kernel, through the qdisc
layer, into the NIC, and onto the wire.  No flow should have the ability
to interfere with another flow's operation unless the device itself
shuts down, or a link-based flow control event occurs.  I see little to
no benefit in enabling multiqueue in a driver when all you'll get is
essentially one pipe of traffic, since the kernel only has coarse control
over those queues.  I guess this is the source of
our differing views.

> The kernel is already multiqueue capable. Thats what qdiscs do.

The kernel is multiqueue capable when enqueuing in software.  It is not
multiqueue capable when dequeuing to a NIC with multiple queues.  Dequeues
are gated on dev->state, which is a single state for the whole device.
These patches are the glue to hook the multiple queues in software to the
multiple queues in the driver directly.

> Heres what i see the differences to be:
> 
> 1) You want to change the core code; i dont see a need for that.
> The packet is received by the driver and netif stop works as 
> before, with zero changes; the driver shuts down on the first 
> ring full.
> The only work is mostly driver specific.

To me, doing multiqueue only in the driver doesn't buy you
anything.  I agree the driver needs work to manage the queues in the
hardware, but if a single feeder from the kernel is handing it packets,
you gain nothing in my opinion without granularity of stopping/starting
each queue in the kernel.

> 2) You want to change qdiscs to make them multi-queue specific.
> I only see a need for adding missing schedulers (if they are 
> missing) and not having some that work with multiqueues and 
> other that dont.
> The assumption is that you have mappings between qdisc and hw-rings.

The changes to PRIO are an initial example of getting my multiqueue
approach working.  This is the only qdisc I see being a logical change
for multiqueue; other qdiscs can certainly be added in the future, which
I plan on once multiqueue device support is in the kernel in some form.

> 3) For me: It is a driver change mostly. A new qdisc may be 
> needed - but thats about it. 

I guess this is a fundamental difference in our thinking.  I think of
multiqueue as the multiple paths out of the kernel, being managed by
per-queue states.  If that is the case, the core code has to be changed
at some level, specifically in dev_queue_xmit(), so it can check the
state of the subqueue your skb has been associated with
(skb->queue_mapping in my patchset).  The qdisc needs to comprehend how
to classify the skb (using TOS or TC) and assign the queue on the NIC to
transmit on.
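
Compressing that flow into a toy userspace C sketch (all names invented
here, not the actual patch): the classify step stamps a queue index on the
packet, and the transmit path then consults that subqueue's state rather
than one global device state.

#include <stdio.h>
#include <stdbool.h>

#define NUM_QUEUES 2

struct toy_skb    { unsigned int tos; int queue_mapping; };
struct toy_netdev { bool subqueue_stopped[NUM_QUEUES]; };

static void classify(struct toy_skb *skb)
{
    /* stand-in for a tc filter / TOS lookup choosing the hardware queue */
    skb->queue_mapping = (skb->tos >= 0x10) ? 0 : 1;   /* 0 = high prio */
}

static int toy_dev_queue_xmit(struct toy_netdev *dev, struct toy_skb *skb)
{
    classify(skb);
    if (dev->subqueue_stopped[skb->queue_mapping])
        return -1;     /* only this flow waits; the other queue keeps going */
    printf("tos 0x%02x -> hw queue %d\n", skb->tos, skb->queue_mapping);
    return 0;
}

int main(void)
{
    struct toy_netdev dev = { .subqueue_stopped = { false, true } };
    struct toy_skb hi = { .tos = 0x10 }, lo = { .tos = 0x00 };

    toy_dev_queue_xmit(&dev, &hi);                 /* goes out on queue 0 */
    if (toy_dev_queue_xmit(&dev, &lo) < 0)
        printf("queue 1 stopped, low prio packet held back\n");
    return 0;
}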

My question to you is this: can you explain the benefit of not allowing
the kernel to know of and be able to manage the queues on the NIC?  This
seems to be the heart of our disagreement; I view the ability to manage
these queues from the kernel as being true multiqueue, and view doing
the queue management solely in the driver as something that doesn't give
any benefit.

> 4) You counter-argue that theres no need for QoS at the qdisc 
> if the hardware does it; i am counter-counter-arguing if you 
> need to write a new scheduler, it will benefit the other 
> single-hwqueue devices.

Not sure I completely understand this, but this is an external argument to
the core discussion; let's leave it until we can agree on the actual
multiqueue network device support implementation.

> These emails are getting too long - typically people loose 
> interest sooner. In your response, can you perhaps restrict 
> it to just that last part and add anything you deem important?

Yes, long emails are bad.  I appreciate your efforts to come up to speed
on what we've done here and to offer your viewpoints.  Hopefully we can
come to an agreement of some sort soon, since this work has been going
on for too long to be halted after quite a bit of engineering and
community feedback.

Thanks Jamal,

-PJ Waskiewicz


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-01 Thread jamal
On Tue, 2007-01-05 at 11:27 -0700, Waskiewicz Jr, Peter P wrote: 

> My patches have been under discussion for
> a few months, and the general approach has been fairly well-accepted.
> The comments that David, Patrick, and Thomas have given were more on
> implementation, which have been fixed and are what you see today.  
> So if you don't like my approach, then please provide an alternative.

I have provided you an alternative - maybe it is unclear, in which
case I will be willing to explain again.
It really doesn't matter if I look at the patch.
Unless you are challenging my knowledge of that subsystem,
I am not going to comment on the semantics of you missing a comma or space
somewhere, because my disagreements are with your conceptual approach; it is
therefore irrelevant to me that 10 other people have given you comments on
your code.
I am not questioning your (probably) great coding skills. I have provided
you an alternative if you try to listen. An alternative that doesn't
require the big changes you are making or break the assumed layering
we have.

> Please explain why this is a brute force approach.  Today, netdev gives
> drivers an API to manage the device queue (dev->queue_lock).  I extended
> this to provide a management API to manage each queue on a device.  This
> to me makes complete sense; why hide the fact a device has multiple
> queues from the kernel?  I don't understand your comments here.

What I meant by brute force was that you could move all the management into
the driver. For example, you could put additional software queues in
the driver that hold things that don't fit into the rings. I said this
was better because nothing is touched in the core code.
An API to me is a function call; the device queue management is the tx
state transition.
If this is still not clear, just ignore it because it is a distraction.
The real core is further below.
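
For what that driver-internal alternative could look like, a toy userspace
C sketch (everything here is invented for illustration; ring sizes and the
overflow policy are arbitrary): the driver parks packets in its own
per-ring overflow queue when a hardware ring is full and drains it from
tx completion, so the core and the qdiscs are untouched.

#include <stdio.h>

#define NUM_RINGS    2
#define RING_SIZE    2
#define OVERFLOW_MAX 16

struct ring {
    int hw_inflight;   /* descriptors currently sitting in the hw ring */
    int overflow;      /* packets parked inside the driver */
};

static void driver_xmit(struct ring *r, int idx)
{
    if (r[idx].hw_inflight < RING_SIZE)
        r[idx].hw_inflight++;              /* straight into the hardware */
    else if (r[idx].overflow < OVERFLOW_MAX)
        r[idx].overflow++;                 /* ring full: park it here */
    else
        printf("ring %d: overflow full, would drop or finally stop the queue\n", idx);
}

static void tx_complete(struct ring *r, int idx)
{
    if (r[idx].hw_inflight > 0)
        r[idx].hw_inflight--;
    if (r[idx].overflow > 0) {             /* refill the ring from the overflow */
        r[idx].overflow--;
        r[idx].hw_inflight++;
    }
}

int main(void)
{
    struct ring r[NUM_RINGS] = { { 0, 0 }, { 0, 0 } };

    for (int i = 0; i < 4; i++)
        driver_xmit(r, 1);                 /* flood the low prio ring */
    driver_xmit(r, 0);                     /* high prio still gets through */
    printf("ring0 inflight=%d, ring1 inflight=%d overflow=%d\n",
           r[0].hw_inflight, r[1].hw_inflight, r[1].overflow);
    tx_complete(r, 1);
    printf("after tx completion: ring1 inflight=%d overflow=%d\n",
           r[1].hw_inflight, r[1].overflow);
    return 0;
}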

> E1000 is not strict prio scheduling, rather, it is round-robin.  This
> question was posed by Patrick McHardy on netdev and I answered it 2
> weeks ago.

RR would fit just fine as well in what i described.
I gave prio qdisc as an example because you were modifying it and
because it covers the majority of the devices i have seen out there that
implement multi-queueing.
As i said further, what i described can be extended to any scheduling
algorithm: strict prio, WRR etc. 
I just want you to understand what i am describing; and if you 
want i can show you how it would work for a scheduler of your choice.

> > I am making the assumptions that: 
> > a) you understand the basics of strict prio scheduling
> > b) You have configured strict prio in the qdisc level and the 
> > hardware levels to be synced i.e if your hardware is capable 
> > of only strict prio, then you better use a matching strict 
> > prio qdisc (and not another qdisc like HTB etc). If your 
> > hardware is capable 2 queues, you better have your qdisc with 
> > only two bands.
> 
> I disagree.  If you read Patrick's comments, using strict PRIO in
> software and in hardware will probably give you results you don't want.

That was based on how your patches work. It is also
why you would need to say "multiqueue" in the tc config.

> Why do 2 layers of QoS? 

Look at it this way:
Linux already has the necessary qdiscs; if there are
one or two missing, writing them will solve your problem.
This way both single-hw-queue and multi-hw-queue hardware benefit
from any new qdisc.
I don't need a specialized qdisc just because I have multiqueue
hardware.
More importantly, you don't need to make core code changes.
You may need to add a new qdisc if it doesn't exist; but you can do that
without mucking with the core code. It is clean.

> If your hardware has a hardware-based QoS, 
> then you should have a generic round-robin (unassuming) qdisc above 
> it.  

If your hardware has RR scheduling, then you had better be fair in feeding
that scheduler packets in an RR fashion, otherwise bad things will
happen. That's the basis of #b. I probably misunderstood
your wording; you seem to be saying the opposite.

> I
> don't understand why you would ever want to do 2 layers of
> prioritization; this is just unnecessary work for the CPU to be doing.
> And if you see how my patches to PRIO are working, you'd see that it
> allows a user to choose as many bands as he/she wants, and they will be
> assigned to the underlying hardware queues.  This is very similar to the
> mapping that the 802.1p spec calls out for how to map many priorities to
> fewer queues.

No big deal:
It is what the prio qdisc has been doing since the dinosaur days;


> > c) If you programmed a TOS, DSCP , IEEE 802.1p to go to qdisc 
> > queue PH via some classifier, then you will make sure that 
> > packets from qdisc PH end up in hardware queue PH.
> > 
> > Not following #b and #c means it is a misconfiguration; i 
> > hope we can agree on that. i.e you need to have both the 
> > exact qdisc that maps to your hardware qdisc as well as 
> > synced configura

RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-05-01 Thread Waskiewicz Jr, Peter P
> On Fri, 2007-27-04 at 08:45 -0700, Waskiewicz Jr, Peter P wrote:
> > > On Thu, 2007-26-04 at 09:30 -0700, Waskiewicz Jr, Peter P wrote:
> 
> 
> > I agree, that to be fair for discussing the code that you should look
> > at the patches before drawing conclusions.
> > I appreciate the fact you have
> > a different idea for your approach for multiqueue, but without having
> > specific things to discuss in terms of implementation, I'm at a loss
> > for what you want to see done.  These patches have been released in
> > the community for a few months now, and the general approach has been
> > accepted for the most part.
> > 
> 
> Sorry, I (was too busy with real work and) wasnt keeping up 
> with netdev.
> And stop whining please if you want me to comment; that is 
> such an important part of the network subsystem - so your 
> patches need more scrutiny because their impact is huge. And 
> i know that subsystem enough that i dont need to look at your 
> patches to know you are going to be hit by a big truck (by 
> just observing you are crossing a busy highway on foot).

I don't know how you think I'm whining here.  I'm just expecting people
that make comments on patches submitted to actually look at them if they
wish to make comments.  Otherwise, I have no idea what to fix if you
don't give context.  The basis of your criticism and comments should be
based on the code, and not be influenced that my approach is different
than your approach.  If you have a better approach, then please post it
so the community can decide.  My patches have been under discussion for
a few months, and the general approach has been fairly well-accepted.
The comments that David, Patrick, and Thomas have given were more on
implementation, which have been fixed and are what you see today.  So if
you don't like my approach, then please provide an alternative.

> 
> > That being said, my approach was to provide an API for drivers to 
> > implement multiqueue support.  We originally went with an idea to do
> > the multiqueue support in the driver.
> 
> That is certainly one (brute) approach. This way you meet the 
> requirement of not changing anything on the qdisc level (user 
> or kernel level). But i am not sure you need an "API" per se.

Please explain why this is a brute force approach.  Today, netdev gives
drivers an API to manage the device queue (dev->queue_lock).  I extended
this to provide a management API to manage each queue on a device.  This
to me makes complete sense; why hide the fact a device has multiple
queues from the kernel?  I don't understand your comments here.

>  
> > However, many questions came up that
> > were answered by pulling things into the qdisc / netdev layer.
> > Specifically, if all the multiqueue code is in the driver, how would
> > you ensure one flow of traffic (say on queue 0) doesn't interfere with
> > another flow (say on queue 1)?  If queue 1 on your NIC ran out of
> > descriptors, the driver will set __LINK_STATE_XOFF in dev->state,
> > which will cause all entry points into the scheduler to stop (i.e. -
> > no more packets going to the NIC).  That will also shut down queue 0.
> > As soon as that happens, that is not multiqueue network support.  The
> > other question was how to classify traffic.  We're proposing to use tc
> > filters to do it, since the user has control over that; having
> > flexibility to meet different network needs is a plus.  We had tried
> > doing queue selection in the driver, and it killed performance.  Hence
> > why we pulled it into the qdisc layer.
> 
> at some point when my thinking was evolving, I had similar 
> thoughts crossing my mind, but came to the conclusion i was 
> thinking too hard when i started (until i started to 
> look/think about the OLPC mesh network challenge).
> 
> Lets take baby steps so we can make this a meaningful discussion.
> Ignore wireless for a second and talk just about simple wired 
> interfaces; we can then come back to wireless in a later discussion.
> 
> For the first baby steps, lets look at strict prio which if i 
> am not mistaken is what you e1000 NICs support; but even that 
> were not the case, strict prio covers a huge amount of 
> multi-queue capability. 
> For simplicity, lets pick something with just 2 hardware 
> queues; PH and PL (PH stands for High Prio and PL low prio). 
> With me so far?

The e1000 does not do strict prio scheduling; rather, it is round-robin.
This question was posed by Patrick McHardy on netdev, and I answered it
two weeks ago.

> 
> I am making the assumptions that: 
> a) you understand the basics of strict prio scheduling
> b) You have configured strict prio in the qdisc level and the 
> hardware levels to be synced i.e if your hardware is capable 
> of only strict prio, then you better use a matching strict 
> prio qdisc (and not another qdisc like HTB etc). If your 
> hardware is capable 2 queues, you better have your qdisc with 
> only two bands.

RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-30 Thread jamal
On Fri, 2007-27-04 at 08:45 -0700, Waskiewicz Jr, Peter P wrote:
> > On Thu, 2007-26-04 at 09:30 -0700, Waskiewicz Jr, Peter P wrote:


> I agree, that to be fair for discussing the code that you should look at
> the patches before drawing conclusions.  
> I appreciate the fact you have
> a different idea for your approach for multiqueue, but without having
> specific things to discuss in terms of implementation, I'm at a loss for
> what you want to see done.  These patches have been released in the
> community for a few months now, and the general approach has been
> accepted for the most part.
> 

Sorry, I (was too busy with real work and) wasn't keeping up with netdev.
And please stop whining if you want me to comment; this is such an
important part of the network subsystem that your patches need more
scrutiny, because their impact is huge. And I know that subsystem well
enough that I don't need to look at your patches to know you are going
to be hit by a big truck (by just observing that you are crossing a busy
highway on foot).

> That being said, my approach was to provide an API for drivers to
> implement multiqueue support.  We originally went with an idea to do the
> multiqueue support in the driver.  

That is certainly one (brute-force) approach. This way you meet the
requirement of not changing anything at the qdisc level (user or kernel
level). But I am not sure you need an "API" per se.
 
> However, many questions came up that
> were answered by pulling things into the qdisc / netdev layer.
> Specifically, if all the multiqueue code is in the driver, how would you
> ensure one flow of traffic (say on queue 0) doesn't interfere with
> another flow (say on queue 1)?  If queue 1 on your NIC ran out of
> descriptors, the driver will set dev->queue_lock to __LINK_STATE_XOFF,
> which will cause all entry points into the scheduler to stop (i.e. - no
> more packets going to the NIC).  That will also shut down queue 0.  As
> soon as that happens, that is not multiqueue network support.  The other
> question was how to classify traffic.  We're proposing to use tc filters
> to do it, since the user has control over that; having flexibility to
> meet different network needs is a plus.  We had tried doing queue
> selection in the driver, and it killed performance.  Hence why we pulled
> it into the qdisc layer.

At some point, when my thinking was evolving, I had similar thoughts
crossing my mind, but came to the conclusion I was thinking too hard
when I started (until I started to look at/think about the OLPC mesh
network challenge).

Let's take baby steps so we can make this a meaningful discussion.
Ignore wireless for a second and talk just about simple wired
interfaces; we can then come back to wireless in a later discussion.

For the first baby step, let's look at strict prio, which, if I am not
mistaken, is what your e1000 NICs support; but even if that were not the
case, strict prio covers a huge amount of multi-queue capability.
For simplicity, let's pick something with just 2 hardware queues: PH and
PL (PH stands for high prio and PL for low prio). With me so far?

I am making the assumptions that:
a) you understand the basics of strict prio scheduling;
b) you have configured strict prio at the qdisc level and the hardware
level to be synced, i.e. if your hardware is capable of only strict prio,
then you had better use a matching strict prio qdisc (and not another
qdisc like HTB etc). If your hardware is capable of 2 queues, you had
better have your qdisc with only two bands;
c) if you programmed a TOS, DSCP, or IEEE 802.1p value to go to qdisc
queue PH via some classifier, then you will make sure that packets from
qdisc PH end up in hardware queue PH.

Not following #b and #c means it is a misconfiguration; I hope we can
agree on that. I.e. you need both a qdisc that maps exactly to your
hardware scheduler and a synced configuration between the two layers.
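
To illustrate #b/#c with a sketch (the priomap values and function names
below are assumptions for the example, not real driver code): the
driver's ring choice has to mirror the same priority-to-band mapping the
prio qdisc applies when no filter overrides it.

#define EX_TC_PRIO_MAX 15

/* same shape as the priomap handed to "tc qdisc add ... prio" (assumed values) */
static const unsigned char example_prio2band[EX_TC_PRIO_MAX + 1] = {
        1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1  /* only prio 4 -> band 0 (PH) */
};

/* roughly what the prio qdisc does with skb->priority when no filter matches */
static int example_qdisc_band(unsigned int skb_priority)
{
        return example_prio2band[skb_priority & EX_TC_PRIO_MAX];
}

/* what the driver must do so that qdisc band PH really lands in hw ring PH */
static int example_hw_ring(unsigned int skb_priority)
{
        return example_prio2band[skb_priority & EX_TC_PRIO_MAX];
}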

Ok, so you ask when to shut down the hw tx path?
1) Let's say you have so many PH packets coming into hardware PH that
the PH-ring fills up. At that point you shut down the hw-tx path. So what
are the consequences? None - newer PH packets still come in and queue at
the qdisc level. Newer PL packets? Who cares - PH is more important, so
they can rot at the qdisc level...
2) Let's say you have so many PL packets coming into hardware PL that
the PL-ring fills up. At that point you shut down the hw-tx path. So what
are the consequences? None - newer PH packets still come in and queue at
the qdisc level; the PL packets causing the tx path to shut down can be
considered to be "already sent to the wire".
And if there were any PH packets to begin with, the qdisc PL packets
would never have been able to shut down the PL-ring.

So what am I saying?
You don't need to touch the qdisc code in the kernel. You just need to
instrument a mapping between qdisc queues and hw rings, i.e. you need to
meet #b and #c above.
Both #b and #c are provable via queueing and feedback control theory.
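
A rough sketch of the driver side of that argument, under the assumptions
above. The structures and the priority-to-ring mapping are hypothetical;
only netif_stop_queue(), netdev_priv() and the TC_PRIO_* constants are
stock kernel interfaces:

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/pkt_sched.h>

struct example_ring {
        int free_descriptors;
};

struct example_priv {
        struct example_ring ph;         /* high-prio hardware ring */
        struct example_ring pl;         /* low-prio hardware ring */
};

/* must mirror the qdisc's priomap (requirement #c); this mapping is assumed */
static struct example_ring *example_select_ring(struct example_priv *priv,
                                                const struct sk_buff *skb)
{
        return ((skb->priority & TC_PRIO_MAX) == TC_PRIO_INTERACTIVE) ?
                &priv->ph : &priv->pl;
}

static int example_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct example_priv *priv = netdev_priv(dev);
        struct example_ring *ring = example_select_ring(priv, skb);

        /* ... post the skb onto this ring's descriptors here ... */
        ring->free_descriptors--;

        /* scenarios 1) and 2): whichever ring fills up, stop the single tx path */
        if (ring->free_descriptors == 0)
                netif_stop_queue(dev);

        return 0;
}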

RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-27 Thread Waskiewicz Jr, Peter P
> jamal wrote:
> > Heres the way i see it from a user perspective:
> > If a NIC has 3 hardware queues; if that NIC supports strict 
> priority 
> > (i.e the prio qdisc) which we already support, there should 
> be no need 
> > for the user to really explicitly enable that support.
> > It should be transparent to them - because by configuring a multi 
> > queue prio qdisc (3 bands/queues default), they are already 
> doing multiqueues.
> 
> Agreed.
> 
>   Jeff
>

Then for clarification, are you asking that I remove the "multiqueue"
option to TC and sch_prio, and have it behave the way it did a few
patches ago?

Thanks,

-PJ


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-27 Thread Waskiewicz Jr, Peter P
> On Thu, 2007-26-04 at 09:30 -0700, Waskiewicz Jr, Peter P wrote:
> > > jamal wrote:
> > > > On Wed, 2007-25-04 at 10:45 -0700, Waskiewicz Jr, Peter P wrote:
> 
> > We have plans to write a new qdisc that has no priority 
> given to any 
> > skb's being sent to the driver.  The reasoning for providing a 
> > multiqueue mode for PRIO is it's a well-known qdisc, so the 
> hope was 
> > people could quickly associate with what's going on.  The other 
> > reasoning is we wanted to provide a way to prioritize 
> various network 
> > flows (ala PRIO), and since hardware doesn't currently exist that 
> > provides flow prioritization, we decided to allow it to continue 
> > happening in software.
> > 
> 
> Reading the above validates my fears that we have some strong 
> differences (refer to my email to Patrick). To be fair to 
> you, i would have to look at your patches. Now i am actually 
> thinking not to look at them at all incase they influence 
> me;-> I think the thing for me to do is provide alternative 
> patches and then we can have smoother discussion.
> The way i see it is you dont touch any qdisc code. qdiscs 
> that are provided by Linux cover a majority of those provided 
> by hardware (Heck, I have was involved on an ethernet switch 
> chip from your company that provided strict prion multiqueues 
> in hardware and didnt need to touch the qdisc code)

I agree that, to be fair in discussing the code, you should look at the
patches before drawing conclusions.  I appreciate the fact that you have
a different idea for your approach to multiqueue, but without having
specific things to discuss in terms of implementation, I'm at a loss as
to what you want to see done.  These patches have been released in the
community for a few months now, and the general approach has been
accepted for the most part.

That being said, my approach was to provide an API for drivers to
implement multiqueue support.  We originally went with an idea to do the
multiqueue support in the driver.  However, many questions came up that
were answered by pulling things into the qdisc / netdev layer.
Specifically, if all the multiqueue code is in the driver, how would you
ensure one flow of traffic (say on queue 0) doesn't interfere with
another flow (say on queue 1)?  If queue 1 on your NIC ran out of
descriptors, the driver will set __LINK_STATE_XOFF on dev->state (what
netif_stop_queue() does),
which will cause all entry points into the scheduler to stop (i.e. - no
more packets going to the NIC).  That will also shut down queue 0.  As
soon as that happens, that is not multiqueue network support.  The other
question was how to classify traffic.  We're proposing to use tc filters
to do it, since the user has control over that; having flexibility to
meet different network needs is a plus.  We had tried doing queue
selection in the driver, and it killed performance.  Hence why we pulled
it into the qdisc layer.
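
For contrast, here is a simplified stand-in for the single-queue flow
control being described - one XOFF bit gating the whole device. The real
helpers live in linux/netdevice.h and operate on dev->state; this is just
a sketch:

#define EXAMPLE_LINK_STATE_XOFF 0UL     /* stand-in for the __LINK_STATE_XOFF bit */

/* simplified stand-in for netif_stop_queue(): every ring stops being fed */
static void example_netif_stop_queue(unsigned long *dev_state)
{
        *dev_state |= (1UL << EXAMPLE_LINK_STATE_XOFF);
}

/* simplified stand-in for netif_queue_stopped(): checked before each dequeue */
static int example_netif_queue_stopped(const unsigned long *dev_state)
{
        return (int)((*dev_state >> EXAMPLE_LINK_STATE_XOFF) & 1UL);
}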

> 
> > > 
> > > > The driver should be configurable to be X num of queues via
> > > probably
> > > > ethtool. It should default to single ring to maintain 
> old behavior.
> > > 
> > > 
> > > That would probably make sense in either case.
> > 
> > This shouldn't be something enforced by the OS, rather, an 
> > implementation detail for the driver you write.  If you 
> want this to 
> > be something to be configured at run-time, on the fly, then the OS 
> > would need to support it.  However, I'd rather see people try the 
> > multiqueue support as-is first to make sure the simple 
> things work as 
> > expected, then we can get into run-time reconfiguration 
> issues (like 
> > queue draining if you shrink available queues, etc.).  This 
> will also 
> > require some heavy lifting by the driver to tear down queues, etc.
> > 
> 
> It could be probably a module insertion/boot time operation.

This is how the API that I am proposing works.

> 
> > > 
> > > > Ok, i see; none of those other intel people put you through
> > > the hazing
> > > > yet? ;-> This is a netdev matter - so i have taken off lkml
> > > > 
> > 
> > I appreciate the desire to lower clutter from mailing 
> lists, but I see 
> > 'tc' as a kernel configuration utility, and as such, people should 
> > know what we're doing outside of netdev, IMO.  But I'm fine with 
> > keeping this off lkml if that's what people think.
> > 
> 
> All of netdev has to do with the kernel - that doesnt justify 
> cross posting.
> People interested in network related subsystem development 
> will subscribe to netdev. Interest in scsi =. subscribe to 
> scsi mailing lists etc.
> 
> 
> cheers,
> 
> 


Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-27 Thread Jeff Garzik

jamal wrote:

Heres the way i see it from a user perspective:
If a NIC has 3 hardware queues; if that NIC supports strict priority
(i.e the prio qdisc) which we already support, there should be no need
for the user to really explicitly enable that support. 
It should be transparent to them - because by configuring a multi queue
prio qdisc (3 bands/queues default), they are already doing multiqueues.


Agreed.

Jeff




RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-27 Thread jamal
On Thu, 2007-26-04 at 09:30 -0700, Waskiewicz Jr, Peter P wrote:
> > jamal wrote:
> > > On Wed, 2007-25-04 at 10:45 -0700, Waskiewicz Jr, Peter P wrote:

> We have plans to write a new qdisc that has no priority given to any
> skb's being sent to the driver.  The reasoning for providing a
> multiqueue mode for PRIO is it's a well-known qdisc, so the hope was
> people could quickly associate with what's going on.  The other
> reasoning is we wanted to provide a way to prioritize various network
> flows (ala PRIO), and since hardware doesn't currently exist that
> provides flow prioritization, we decided to allow it to continue
> happening in software.
> 

Reading the above validates my fears that we have some strong
differences (refer to my email to Patrick). To be fair to you, I would
have to look at your patches. Now I am actually thinking of not looking
at them at all in case they influence me;->
I think the thing for me to do is provide alternative patches and then
we can have a smoother discussion.
The way I see it, you don't touch any qdisc code. The qdiscs that are
provided by Linux cover a majority of those provided by hardware.
(Heck, I was involved with an ethernet switch chip from your company
that provided strict prio multiqueues in hardware, and it didn't need
the qdisc code to be touched.)

> > 
> > > The driver should be configurable to be X num of queues via 
> > probably 
> > > ethtool. It should default to single ring to maintain old behavior.
> > 
> > 
> > That would probably make sense in either case.
> 
> This shouldn't be something enforced by the OS, rather, an
> implementation detail for the driver you write.  If you want this to be
> something to be configured at run-time, on the fly, then the OS would
> need to support it.  However, I'd rather see people try the multiqueue
> support as-is first to make sure the simple things work as expected,
> then we can get into run-time reconfiguration issues (like queue
> draining if you shrink available queues, etc.).  This will also require
> some heavy lifting by the driver to tear down queues, etc.
> 

It could probably be a module-insertion/boot-time operation.

> > 
> > > Ok, i see; none of those other intel people put you through 
> > the hazing 
> > > yet? ;-> This is a netdev matter - so i have taken off lkml
> > > 
> 
> I appreciate the desire to lower clutter from mailing lists, but I see
> 'tc' as a kernel configuration utility, and as such, people should know
> what we're doing outside of netdev, IMO.  But I'm fine with keeping this
> off lkml if that's what people think.
> 

All of netdev has to do with the kernel - that doesn't justify
cross-posting.
People interested in network-related subsystem development will
subscribe to netdev. Interested in SCSI => subscribe to the SCSI mailing
lists, etc.


cheers,





Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-27 Thread jamal
On Thu, 2007-26-04 at 17:57 +0200, Patrick McHardy wrote:

> The reason for suggesting to add a TC option was that these patches
> move (parts of) the scheduling policy into the driver since it can
> start and stop individual subqueues, which in turn cause single
> bands of prio not to be dequeued anymore. 

I see.

> To avoid surprising users
> by this it should be explicitly enabled. Another reason is that
> prio below a classful qdisc should most likely not care about
> multiqueue.

Here's the way I see it from a user perspective:
If a NIC has 3 hardware queues and that NIC supports strict priority
(i.e. the prio qdisc), which we already support, there should be no need
for the user to really explicitly enable that support.
It should be transparent to them - because by configuring a multi-queue
prio qdisc (3 bands/queues default), they are already doing multiqueue.
I.e. when I say "tc qdisc add root prio bands 4" on eth0, I am already
asking explicitly for 4 strict priority queues on eth0.
This, in my opinion, is separate from enabling the hardware to do 4
queues - which is a separate abstraction layer (and ethtool would
do fine there).

> We need to change the qdisc layer as well so it knows about the state
> of subqueues and can dequeue individual (active) subqueues. 

The alternative approach is to change the driver's tx state machine
(netif_xx) to act on a per-hardware-queue level as well. This is
what I have in mind working with Ashwin.

> The
> alternative to adding it to prio (or a completely new qdisc) is to add
> something very similar to qdisc_restart and have it pass the subqueue
> it wishes to dequeue to ->dequeue, but that would be less flexible
> and doesn't seem to offer any advantages.
> 

Another approach is to add a thin layer between the qdisc restart and
the driver tx.
You pass skb->prio and use that as a "classification key" to select
the correct hardware ring, and you don't have to change any qdisc, since
that layer sits between the driver and the qdisc.
The challenge then becomes how to throttle/unthrottle a software queue.
But you leave that brunt work to the driver.
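
A sketch of that thin-layer idea; everything named here is hypothetical,
sitting between the qdisc dequeue and the driver's hard_start_xmit():

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct example_thin_layer {
        int nr_rings;
        /* driver-provided hook: queue the skb on one specific hardware ring */
        int (*xmit_on_ring)(struct sk_buff *skb, struct net_device *dev, int ring);
};

/* called in place of a direct hard_start_xmit() from the qdisc restart path */
static int example_thin_xmit(struct example_thin_layer *tl,
                             struct sk_buff *skb, struct net_device *dev)
{
        /* skb->priority is the "classification key"; clamp it to a valid ring */
        int ring = skb->priority % tl->nr_rings;

        return tl->xmit_on_ring(skb, dev, ring);
}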

> I wouldn't object to putting this into a completely new scheduler
> (sch_multiqueue) though since the scheduling policy might be something
> completely different than strict priority.

I think the wireless work is already in the kernel?
The way I see it, the software scheduler should match the hardware
scheduler. The majority of these hardware scheduling approaches I have
seen match precisely the prio qdisc, i.e. there is no need to write a new
scheduler (or, for that matter, touch an existing scheduler that matches).
Others I have seen may require some work-conserving schedulers that don't
have a precise match in Linux today; I think those may have to be
written from scratch.

> The wireless multiqueue scheduler is pratically identical to this one,
> modulo the wireless classifier that should be a seperate module anyway.

The wireless folks seem to have created an extra netdev to provide the
hierarchy. I think that is a sane interim approach, just a little dirty.

cheers,
jamal



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-26 Thread Jan Engelhardt

On Apr 25 2007 10:45, Waskiewicz Jr, Peter P wrote:
>> 
>> BTW, is there any reason this is being cced to lkml?
>
>Since this change affects how tc interacts with the qdisc layer, I cced
>lkml.

Fine with me, at least I get to know that tc could break :)



Jan
-- 


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-26 Thread Waskiewicz Jr, Peter P
> Waskiewicz Jr, Peter P wrote:
> >>I wouldn't object to putting this into a completely new scheduler
> >>(sch_multiqueue) though since the scheduling policy might 
> be something 
> >>completely different than strict priority.
> > 
> > 
> > We have plans to write a new qdisc that has no priority 
> given to any 
> > skb's being sent to the driver.
> 
> 
> I'm not sure I understand correctly, "no priority" == single 
> band qdisc?

No.  We'd have a qdisc that has a configurable number of bands, ideally
configured to have a 1-1 mapping with the number of queues on your NIC,
that would round-robin into the device.  Then whatever policy your
device has will be the policy packets are transmitted with.  This could
be a configurable policy in the driver, but at that point, it's
device-specific, and the OS won't care about it, which is good.
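
A sketch of the kind of round-robin dequeue being described - one band
per hardware queue, skipping a band whose ring is currently stopped. The
structure and the driver callback are assumptions, not the eventual qdisc:

#include <linux/skbuff.h>

#define EXAMPLE_MAX_BANDS 8

struct example_rr_sched {
        int nr_bands;                              /* ideally == number of NIC queues */
        int next_band;                             /* where the rotation left off */
        struct sk_buff_head q[EXAMPLE_MAX_BANDS];  /* one software queue per band */
        int (*ring_stopped)(int band);             /* driver feedback, per ring */
};

static struct sk_buff *example_rr_dequeue(struct example_rr_sched *s)
{
        int i;

        for (i = 0; i < s->nr_bands; i++) {
                int band = (s->next_band + i) % s->nr_bands;
                struct sk_buff *skb;

                if (s->ring_stopped(band))
                        continue;                  /* that ring is full; try the next band */

                skb = __skb_dequeue(&s->q[band]);
                if (skb) {
                        s->next_band = (band + 1) % s->nr_bands;
                        return skb;
                }
        }
        return NULL;                               /* nothing eligible to send */
}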

> 
> > The reasoning for providing a
> > multiqueue mode for PRIO is it's a well-known qdisc, so the 
> hope was 
> > people could quickly associate with what's going on.  The other 
> > reasoning is we wanted to provide a way to prioritize 
> various network 
> > flows (ala PRIO), and since hardware doesn't currently exist that 
> > provides flow prioritization, we decided to allow it to continue 
> > happening in software.
> 
> 
> Any qdisc serving multiple queues needs some scheduling 
> policy to decide which one to dequeue in case multiple queues 
> are active, so a new qdisc might as well also use strict 
> priority. Two reasons why it might make sense to add a new 
> qdisc are a) the hardware scheduling policy could be 
> something different than prio, like WRR, so a neutral name 
> like sch_multiqueue seems more fitting and b) you don't have 
> to figure out how to pass the new parameter to prio without 
> breaking compatibility.
> 
> >>The wireless multiqueue scheduler is pratically identical 
> to this one, 
> >>modulo the wireless classifier that should be a seperate module 
> >>anyway.
> > 
> > 
> > Yi Zhu from the wireless world has been active with me in this 
> > development effort.  He and I are copresenting a paper at 
> OLS on this 
> > specific topic, so I have been getting a perspective from 
> the wireless 
> > world.
> > 
> > I'd like to know if anyone has looked at the actual kernel patches, 
> > instead of the tiny patch to tc here, since that might answer many 
> > questions or concerns being presented here.  :-)
> 
> 
> I did and I'm fine with the current patches if you get rid of 
> the prio ABI breakage. Using a new scheduler is just a 
> suggestion, but I think it would be cleaner to do so.

Sounds good.  I'll get a fix for PRIO so the multiqueue option isn't
required and allow older versions of TC to work with the new PRIO.  I'm
not dismissing the new scheduler idea either, since it's our intent to
develop one for generic multiqueue, much like you suggested.

Thanks for the feedback.  I'll try to get the ABI fixed up asap.

-PJ Waskiewicz


Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-26 Thread Patrick McHardy
Waskiewicz Jr, Peter P wrote:
>>I wouldn't object to putting this into a completely new scheduler
>>(sch_multiqueue) though since the scheduling policy might be 
>>something completely different than strict priority.
> 
> 
> We have plans to write a new qdisc that has no priority given to any
> skb's being sent to the driver.


I'm not sure I understand correctly, "no priority" == single band
qdisc?

> The reasoning for providing a
> multiqueue mode for PRIO is it's a well-known qdisc, so the hope was
> people could quickly associate with what's going on.  The other
> reasoning is we wanted to provide a way to prioritize various network
> flows (ala PRIO), and since hardware doesn't currently exist that
> provides flow prioritization, we decided to allow it to continue
> happening in software.


Any qdisc serving multiple queues needs some scheduling policy to
decide which one to dequeue in case multiple queues are active, so
a new qdisc might as well also use strict priority. Two reasons
why it might make sense to add a new qdisc are a) the hardware
scheduling policy could be something different than prio, like WRR,
so a neutral name like sch_multiqueue seems more fitting and b)
you don't have to figure out how to pass the new parameter to prio
without breaking compatibility.

>>The wireless multiqueue scheduler is pratically identical to 
>>this one, modulo the wireless classifier that should be a 
>>seperate module anyway.
> 
> 
> Yi Zhu from the wireless world has been active with me in this
> development effort.  He and I are copresenting a paper at OLS on this
> specific topic, so I have been getting a perspective from the wireless
> world.
> 
> I'd like to know if anyone has looked at the actual kernel patches,
> instead of the tiny patch to tc here, since that might answer many
> questions or concerns being presented here.  :-)


I did and I'm fine with the current patches if you get rid of the prio
ABI breakage. Using a new scheduler is just a suggestion, but I think
it would be cleaner to do so.



RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-26 Thread Waskiewicz Jr, Peter P
> jamal wrote:
> > On Wed, 2007-25-04 at 10:45 -0700, Waskiewicz Jr, Peter P wrote:
> > 
> >>The previous version of my multiqueue patches I sent for 
> consideration 
> >>had feedback from Patrick McHardy asking that the user be able to 
> >>configure the PRIO qdisc to run with multiqueue support or 
> not.  That 
> >>is why TC needed a modification, since I agreed with 
> Patrick that this 
> >>would be a useful option.
> > 
> > 
> > Patrick is a smart guy and I am almost sure he gave you that advice 
> > based on how your kernel patches work. Since i havent 
> looked at your 
> > patches, I cant swear to that as a fact - hence the "almost"
> 
> 
> The reason for suggesting to add a TC option was that these 
> patches move (parts of) the scheduling policy into the driver 
> since it can start and stop individual subqueues, which in 
> turn cause single bands of prio not to be dequeued anymore. 
> To avoid surprising users by this it should be explicitly 
> enabled. Another reason is that prio below a classful qdisc 
> should most likely not care about multiqueue.
> 
> >>All the versions of multiqueue network device support I've sent for 
> >>consideration had PRIO modified to support multiqueue 
> devices, since 
> >>it lends itself well for the model of multiple, independent flows.
> > 
> > 
> > So it seems your approach is to make changes to every qdisc 
> so you can 
> > support device-multiq, no? This is what i suspected and was 
> > questioning earlier, not the fact you had it in tc (which 
> is a consequence).
> > 
> > My view is:
> > - the burden of the changes should be on the driver. A thin layer 
> > between the qdisc and driver hw tx should help hide those 
> changes from 
> > the qdiscs; i.e i dont see why the kernel side qdisc needs 
> to change.
> > The rest you leave to the user; if the user configures HTB for a 
> > hardware that does multiq which is WRR, then that is their problem.
> 
> 
> We need to change the qdisc layer as well so it knows about 
> the state of subqueues and can dequeue individual (active) 
> subqueues. The alternative to adding it to prio (or a 
> completely new qdisc) is to add something very similar to 
> qdisc_restart and have it pass the subqueue it wishes to 
> dequeue to ->dequeue, but that would be less flexible and 
> doesn't seem to offer any advantages.
> 
> I wouldn't object to putting this into a completely new scheduler
> (sch_multiqueue) though since the scheduling policy might be 
> something completely different than strict priority.

We have plans to write a new qdisc that has no priority given to any
skb's being sent to the driver.  The reasoning for providing a
multiqueue mode for PRIO is it's a well-known qdisc, so the hope was
people could quickly associate with what's going on.  The other
reasoning is we wanted to provide a way to prioritize various network
flows (ala PRIO), and since hardware doesn't currently exist that
provides flow prioritization, we decided to allow it to continue
happening in software.

> 
> > The driver should be configurable to be X num of queues via 
> probably 
> > ethtool. It should default to single ring to maintain old behavior.
> 
> 
> That would probably make sense in either case.

This shouldn't be something enforced by the OS, rather, an
implementation detail for the driver you write.  If you want this to be
something to be configured at run-time, on the fly, then the OS would
need to support it.  However, I'd rather see people try the multiqueue
support as-is first to make sure the simple things work as expected,
then we can get into run-time reconfiguration issues (like queue
draining if you shrink available queues, etc.).  This will also require
some heavy lifting by the driver to tear down queues, etc.

> 
> > Ok, i see; none of those other intel people put you through 
> the hazing 
> > yet? ;-> This is a netdev matter - so i have taken off lkml
> > 

I appreciate the desire to lower clutter from mailing lists, but I see
'tc' as a kernel configuration utility, and as such, people should know
what we're doing outside of netdev, IMO.  But I'm fine with keeping this
off lkml if that's what people think.

> > I will try to talk to the other gent to see if we can join 
> into this 
> > effort instead of a parallel one; the wireless cards have 
> similar needs.
> > I plan to spend time looking at your approach (sorry, my 
> brain likes 
> > to work that way; otherwise i would have looked at it by now).
> 
> 
> The wireless multiqueue scheduler is pratically identical to 
> this one, modulo the wireless classifier that should be a 
> seperate module anyway.

Yi Zhu from the wireless world has been active with me in this
development effort.  He and I are copresenting a paper at OLS on this
specific topic, so I have been getting a perspective from the wireless
world.

I'd like to know if anyone has looked at the actual kernel patches,
instead of the tiny patch to tc here, since that might answer many
questions or concerns being presented here.  :-)

Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-26 Thread Patrick McHardy
jamal wrote:
> On Wed, 2007-25-04 at 10:45 -0700, Waskiewicz Jr, Peter P wrote:
> 
>>The previous version of my multiqueue patches I sent for consideration
>>had feedback from Patrick McHardy asking that the user be able to
>>configure the PRIO qdisc to run with multiqueue support or not.  That is
>>why TC needed a modification, since I agreed with Patrick that this
>>would be a useful option.
> 
> 
> Patrick is a smart guy and I am almost sure he gave you that advice
> based on how your kernel patches work. Since i havent looked at your
> patches, I cant swear to that as a fact - hence the "almost"


The reason for suggesting to add a TC option was that these patches
move (parts of) the scheduling policy into the driver since it can
start and stop individual subqueues, which in turn cause single
bands of prio not to be dequeued anymore. To avoid surprising users
by this it should be explicitly enabled. Another reason is that
prio below a classful qdisc should most likely not care about
multiqueue.

>>All the versions of multiqueue network device support I've sent for
>>consideration had PRIO modified to support multiqueue devices, since it
>>lends itself well for the model of multiple, independent flows.
> 
> 
> So it seems your approach is to make changes to every qdisc so you can
> support device-multiq, no? This is what i suspected and was questioning
> earlier, not the fact you had it in tc (which is a consequence).
> 
> My view is:
> - the burden of the changes should be on the driver. A thin layer
> between the qdisc and driver hw tx should help hide those changes from
> the qdiscs; i.e i dont see why the kernel side qdisc needs to change.
> The rest you leave to the user; if the user configures HTB for a
> hardware that does multiq which is WRR, then that is their problem.


We need to change the qdisc layer as well so it knows about the state
of subqueues and can dequeue individual (active) subqueues. The
alternative to adding it to prio (or a completely new qdisc) is to add
something very similar to qdisc_restart and have it pass the subqueue
it wishes to dequeue to ->dequeue, but that would be less flexible
and doesn't seem to offer any advantages.
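
Sketched as a signature only (hypothetical - the stock ->dequeue takes
just the Qdisc pointer), that alternative would look something like:

#include <linux/skbuff.h>

struct Qdisc;   /* opaque for this sketch */

/* qdisc_restart() would name the subqueue it wants served on this call */
typedef struct sk_buff *(*example_dequeue_subqueue_t)(struct Qdisc *sch,
                                                      int subqueue);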

I wouldn't object to putting this into a completely new scheduler
(sch_multiqueue) though since the scheduling policy might be something
completely different than strict priority.

> The driver should be configurable to be X num of queues via probably
> ethtool. It should default to single ring to maintain old behavior.


That would probably make sense in either case.

> Ok, i see; none of those other intel people put you through the hazing
> yet? ;-> This is a netdev matter - so i have taken off lkml
> 
> I will try to talk to the other gent to see if we can join into this
> effort instead of a parallel one; the wireless cards have similar needs.
> I plan to spend time looking at your approach (sorry, my brain likes to
> work that way; otherwise i would have looked at it by now).


The wireless multiqueue scheduler is practically identical to this one,
modulo the wireless classifier, which should be a separate module anyway.


RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-26 Thread jamal
On Wed, 2007-25-04 at 10:45 -0700, Waskiewicz Jr, Peter P wrote:
> > -Original Message-

> The previous version of my multiqueue patches I sent for consideration
> had feedback from Patrick McHardy asking that the user be able to
> configure the PRIO qdisc to run with multiqueue support or not.  That is
> why TC needed a modification, since I agreed with Patrick that this
> would be a useful option.

Patrick is a smart guy and I am almost sure he gave you that advice
based on how your kernel patches work. Since I haven't looked at your
patches, I can't swear to that as a fact - hence the "almost".

> All the versions of multiqueue network device support I've sent for
> consideration had PRIO modified to support multiqueue devices, since it
> lends itself well for the model of multiple, independent flows.
> 

So it seems your approach is to make changes to every qdisc so you can
support device multiqueue, no? This is what I suspected and was
questioning earlier, not the fact you had it in tc (which is a consequence).

My view is:
- the burden of the changes should be on the driver. A thin layer
between the qdisc and the driver hw tx should help hide those changes
from the qdiscs; i.e. I don't see why the kernel-side qdisc needs to change.
The rest you leave to the user; if the user configures HTB for
hardware that does multiqueue which is WRR, then that is their problem.
The driver should be configurable to X number of queues, probably via
ethtool. It should default to a single ring to maintain old behavior.

> > BTW, is there any reason this is being cced to lkml?
> 
> Since this change affects how tc interacts with the qdisc layer, I cced
> lkml.

Ok, I see; none of those other Intel people put you through the hazing
yet? ;-> This is a netdev matter - so I have taken lkml off.

I will try to talk to the other gent to see if we can join into this
effort instead of running a parallel one; the wireless cards have
similar needs.
I plan to spend time looking at your approach (sorry, my brain likes to
work that way; otherwise I would have looked at it by now).

cheers,
jamal 




RE: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-25 Thread Waskiewicz Jr, Peter P
> -Original Message-
> From: J Hadi Salim [mailto:[EMAIL PROTECTED] On Behalf Of jamal
> Sent: Wednesday, April 25, 2007 4:37 AM
> To: Stephen Hemminger
> Cc: Waskiewicz Jr, Peter P; netdev@vger.kernel.org; 
> [EMAIL PROTECTED]; [EMAIL PROTECTED]; cramerj; 
> Kok, Auke-jan H; Leech, Christopher; [EMAIL PROTECTED]
> Subject: Re: [PATCH] IPROUTE: Modify tc for new PRIO 
> multiqueue behavior
> 
> On Tue, 2007-24-04 at 21:05 -0700, Stephen Hemminger wrote:
> > Peter P Waskiewicz Jr wrote:
> 
> > Only if this binary compatiable with older kernels.
> 
> It is not. But i think that is a lesser problem, the bigger 
> question is:
> Why would you need to change a qdisc just so you can support 
> egress multiqueues?

The previous version of my multiqueue patches I sent for consideration
had feedback from Patrick McHardy asking that the user be able to
configure the PRIO qdisc to run with multiqueue support or not.  That is
why TC needed a modification, since I agreed with Patrick that this
would be a useful option.

All the versions of multiqueue network device support I've sent for
consideration had PRIO modified to support multiqueue devices, since it
lends itself well for the model of multiple, independent flows.

> 
> BTW, is there any reason this is being cced to lkml?

Since this change affects how tc interacts with the qdisc layer, I cced
lkml.

> 
> cheers,
> jamal
> 
> PS:- I havent read the kernel patches (i am congested and 
> about 1000 messages behind on netdev) and my opinions may be 
> influenced by an approach i have in trying to help someone 
> fixup a wireless driver with multiqueue support.

As long as someone is looking at them, I'll be happy.  :-)

Thanks,

-PJ Waskiewicz


Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-25 Thread jamal
On Tue, 2007-24-04 at 21:05 -0700, Stephen Hemminger wrote:
> Peter P Waskiewicz Jr wrote:

> Only if this binary compatiable with older kernels.

It is not. But I think that is the lesser problem; the bigger question is:
why would you need to change a qdisc just so you can support egress
multiqueues?

BTW, is there any reason this is being cced to lkml?

cheers,
jamal

PS:- I haven't read the kernel patches (I am congested and about 1000
messages behind on netdev) and my opinions may be influenced by an
approach I have in trying to help someone fix up a wireless driver with
multiqueue support.



Re: [PATCH] IPROUTE: Modify tc for new PRIO multiqueue behavior

2007-04-24 Thread Stephen Hemminger

Peter P Waskiewicz Jr wrote:

From: Peter P Waskiewicz Jr <[EMAIL PROTECTED]>

Modified tc so PRIO can now have a multiqueue parameter passed to it.  This
will turn on multiqueue behavior if a device has more than 1 queue.  Also,
running tc qdisc ls dev  will display if multiqueue is on or off.

Signed-off-by: Peter P. Waskiewicz Jr <[EMAIL PROTECTED]>
---

 include/linux/pkt_sched.h |    1 +
 tc/q_prio.c               |    9 ++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d10f353..bab0b9e 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -99,6 +99,7 @@ struct tc_prio_qopt
 {
        int     bands;                  /* Number of bands */
        __u8    priomap[TC_PRIO_MAX+1]; /* Map: logical priority -> PRIO band */
+       unsigned short multiqueue;      /* 0 for no mq, 1 for mq */
 };
 
 /* TBF section */

diff --git a/tc/q_prio.c b/tc/q_prio.c
index d696e1b..55cb207 100644
--- a/tc/q_prio.c
+++ b/tc/q_prio.c
@@ -29,7 +29,7 @@
 
 static void explain(void)
 {
-       fprintf(stderr, "Usage: ... prio bands NUMBER priomap P1 P2...\n");
+       fprintf(stderr, "Usage: ... prio [multiqueue] bands NUMBER priomap P1 P2...\n");
 }
 
 #define usage() return(-1)
@@ -39,7 +39,7 @@ static int prio_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct n
        int ok=0;
        int pmap_mode = 0;
        int idx = 0;
-       struct tc_prio_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 }};
+       struct tc_prio_qopt opt={3,{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 },0};
 
        while (argc > 0) {
                if (strcmp(*argv, "bands") == 0) {
@@ -57,7 +57,9 @@ static int prio_parse_opt(struct qdisc_util *qu, int argc, char **argv, struct n
                                return -1;
                        }
                        pmap_mode = 1;
-               } else if (strcmp(*argv, "help") == 0) {
+               } else if (strcmp(*argv, "multiqueue") == 0)
+                       opt.multiqueue = 1;
+               else if (strcmp(*argv, "help") == 0) {
                        explain();
                        return -1;
                } else {
@@ -105,6 +107,7 @@ int prio_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
        if (RTA_PAYLOAD(opt)  < sizeof(*qopt))
                return -1;
        qopt = RTA_DATA(opt);
+       fprintf(f, "multiqueue %s  ", qopt->multiqueue ? "on" : "off");
        fprintf(f, "bands %u priomap ", qopt->bands);
        for (i=0; i<=TC_PRIO_MAX; i++)
                fprintf(f, " %d", qopt->priomap[i]);

Only if this is binary compatible with older kernels.