Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-21 Thread Florian Smeets
On 20.08.12 10:32, Doug Barton wrote:
 On 08/15/2012 03:18, Alexander Motin wrote:

 It is quite pointless to speculate without real info like mentioned
 above KTR_SCHED traces.
 
 I'm sorry, you're quite wrong about that. In the cases I mentioned, and
 in about 2 out of 3 of the cases where users reported problems and I
 suggested that they try 4BSD, the results were clear. This obviously
 points out that there is a serious problem with ULE, and if I were the
 one who was responsible for that code I would be looking at ways of
 helping users figure out where the problems are. But that's just me.
 
 The main thing I've learned about schedulers is that things
 there never work as you expect. There are too many factors and relations
 to predict behavior in every case.
 
 In the web hosting case that I mentioned, I purposely kept every other
 factor consistent; and changed only s/ULE/4BSD/. The results were both
 clear and consistent.
 

Can you please prove that with some actual numbers? I seem to recall you
posted something not too long ago, but I was unable to find it just now.

Also, can you tell us what you ran and how? I would really like to
reproduce this.

Thanks,
Florian





Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-20 Thread Doug Barton
On 08/15/2012 03:18, Alexander Motin wrote:
 On 15.08.2012 03:09, Doug Barton wrote:
 On 08/14/2012 12:20 PM, Adrian Chadd wrote:
 Would you be willing to compile a kernel with KTR so you can capture
 some KTR scheduler dumps?

 That way the scheduler peeps can feed this into schedgraph.py (and you
 can too!) to figure out what's going on.

 Maybe things aren't being scheduled correctly and the added latency is
 killing performance?

 You might also try switching to SCHED_ULE to see if it helps.

 Although, in the last few months as mav has been converging the 2 I've
 started to see the same problems I saw on my desktop systems previously
 re-appear even using ULE. For example, if I'm watching an AVI with VLC
 and start doing anything that generates a lot of interrupts (like moving
 large quantities of data from one disk to another) the video and sound
 start to skip. Also, various other desktop features (like menus, window
 switching, etc.) start to take measurable time to happen, sometimes
 seconds.

 ... and lest you think this is just a desktop problem, I've seen the
 same scenario on 8.x systems used as web servers. With ULE they were
 frequently getting into peak load situations that created what I called
 mini thundering herd problems where they could never quite get caught
 up. Whereas switching to 4BSD the same servers got into high-load
 situations less often, and they recovered on their own in minutes.
 
 It is quite pointless to speculate without real info like mentioned
 above KTR_SCHED traces.

I'm sorry, you're quite wrong about that. In the cases I mentioned, and
in about 2 out of 3 of the cases where users reported problems and I
suggested that they try 4BSD, the results were clear. This obviously
points out that there is a serious problem with ULE, and if I were the
one who was responsible for that code I would be looking at ways of
helping users figure out where the problems are. But that's just me.

 The main thing I've learned about schedulers is that things
 there never work as you expect. There are too many factors and relations
 to predict behavior in every case.

In the web hosting case that I mentioned, I purposely kept every other
factor consistent; and changed only s/ULE/4BSD/. The results were both
clear and consistent.

 As for playing AVIs and using other GUIs, the key word here, and for
 ULE in general, is interactivity. ULE gives a huge boost to threads it
 counts as interactive.

I'm not using ULE. I haven't for over a year. Sorry if I wasn't clear.

 If somebody still wants an area for experiments, there is always some:
  - if you want the video player not to lag, set a negative nice value for it
 (ULE is not a magician that can guess user wishes);

At the same time, I don't have these problems on my Linux systems, and I
don't need to adjust anything. Not to mention that given how web servers
are one of our main server implementations, the fact that we have what
seems to be a serious performance problem with our default scheduler in
that use case seems like an issue that we would want to address.

Doug

-- 

I am only one, but I am one.  I cannot do everything, but I can do
something.  And I will not let what I cannot do interfere with what
I can do.
-- Edward Everett Hale, (1822 - 1909)
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-20 Thread Alexander Motin

On 20.08.2012 11:32, Doug Barton wrote:

On 08/15/2012 03:18, Alexander Motin wrote:

On 15.08.2012 03:09, Doug Barton wrote:

On 08/14/2012 12:20 PM, Adrian Chadd wrote:

Would you be willing to compile a kernel with KTR so you can capture
some KTR scheduler dumps?

That way the scheduler peeps can feed this into schedgraph.py (and you
can too!) to figure out what's going on.

Maybe things aren't being scheduled correctly and the added latency is
killing performance?


You might also try switching to SCHED_ULE to see if it helps.

Although, in the last few months as mav has been converging the 2 I've
started to see the same problems I saw on my desktop systems previously
re-appear even using ULE. For example, if I'm watching an AVI with VLC
and start doing anything that generates a lot of interrupts (like moving
large quantities of data from one disk to another) the video and sound
start to skip. Also, various other desktop features (like menus, window
switching, etc.) start to take measurable time to happen, sometimes
seconds.

... and lest you think this is just a desktop problem, I've seen the
same scenario on 8.x systems used as web servers. With ULE they were
frequently getting into peak load situations that created what I called
mini thundering herd problems where they could never quite get caught
up. Whereas switching to 4BSD the same servers got into high-load
situations less often, and they recovered on their own in minutes.


It is quite pointless to speculate without real info like mentioned
above KTR_SCHED traces.


I'm sorry, you're quite wrong about that. In the cases I mentioned, and
in about 2 out of 3 of the cases where users reported problems and I
suggested that they try 4BSD, the results were clear. This obviously
points out that there is a serious problem with ULE, and if I were the
one who was responsible for that code I would be looking at ways of
helping users figure out where the problems are. But that's just me.


I am not saying anything bad about 4BSD. The choice is provided because
they are indeed different and neither is perfect. 4BSD also has problems.
What I would like to say is that if we want to improve the situation, we
need more detailed info than just a verbal description. I am not saying
that ULE is perfect. I went there because I've seen problems, and I am
still fixing some pieces. I am just trying to explain the described behavior
from the point of view of my knowledge of it, hoping that it may help
somebody to set up some new experiments or try some tuning/fixing.


--
Alexander Motin


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-20 Thread Doug Barton
On 08/20/2012 02:59, Alexander Motin wrote:
 On 20.08.2012 11:32, Doug Barton wrote:
 On 08/15/2012 03:18, Alexander Motin wrote:
 On 15.08.2012 03:09, Doug Barton wrote:
 On 08/14/2012 12:20 PM, Adrian Chadd wrote:
 Would you be willing to compile a kernel with KTR so you can capture
 some KTR scheduler dumps?

 That way the scheduler peeps can feed this into schedgraph.py (and you
 can too!) to figure out what's going on.

 Maybe things aren't being scheduled correctly and the added latency is
 killing performance?

 You might also try switching to SCHED_ULE to see if it helps.

 Although, in the last few months as mav has been converging the 2 I've
 started to see the same problems I saw on my desktop systems previously
 re-appear even using ULE. For example, if I'm watching an AVI with VLC
 and start doing anything that generates a lot of interrupts (like
 moving
 large quantities of data from one disk to another) the video and sound
 start to skip. Also, various other desktop features (like menus, window
 switching, etc.) start to take measurable time to happen, sometimes
 seconds.

 ... and lest you think this is just a desktop problem, I've seen the
 same scenario on 8.x systems used as web servers. With ULE they were
 frequently getting into peak load situations that created what I called
 mini thundering herd problems where they could never quite get caught
 up. Whereas switching to 4BSD the same servers got into high-load
 situations less often, and they recovered on their own in minutes.

 It is quite pointless to speculate without real info like mentioned
 above KTR_SCHED traces.

 I'm sorry, you're quite wrong about that. In the cases I mentioned, and
 in about 2 out of 3 of the cases where users reported problems and I
 suggested that they try 4BSD, the results were clear. This obviously
 points out that there is a serious problem with ULE, and if I were the
 one who was responsible for that code I would be looking at ways of
 helping users figure out where the problems are. But that's just me.
 
 I am not saying anything bad about 4BSD.

Yes, I get that, but thanks for making it clear.

 The choice is provided because
 they are indeed different and neither is perfect.

... which is why I'm asking you to stop making them more the same until
we get a better idea of what the issues are.

 What I would like to say is that if we want to improve the situation, we
 need more detailed info than just a verbal description.

And what I'm saying is that the only realistic way that you're going to
get that information that you need is to make it easier for users to
give it to you. I don't know what form that is going to need to take, I
don't know anything about schedulers.

 I am not saying
 that ULE is perfect. I went there because I've seen problems, and I am
 still fixing some pieces. I am just trying to explain the described behavior
 from the point of view of my knowledge of it, hoping that it may help
 somebody to set up some new experiments or try some tuning/fixing.

Yes, I think it's great that you're doing this work. I'm glad to see
that someone is improving ULE. It clearly needs it. :)

Doug



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-20 Thread Alexander Motin

On 20.08.2012 13:25, Doug Barton wrote:

On 08/20/2012 02:59, Alexander Motin wrote:

On 20.08.2012 11:32, Doug Barton wrote:

On 08/15/2012 03:18, Alexander Motin wrote:

On 15.08.2012 03:09, Doug Barton wrote:

On 08/14/2012 12:20 PM, Adrian Chadd wrote:

Would you be willing to compile a kernel with KTR so you can capture
some KTR scheduler dumps?

That way the scheduler peeps can feed this into schedgraph.py (and you
can too!) to figure out what's going on.

Maybe things aren't being scheduled correctly and the added latency is
killing performance?


You might also try switching to SCHED_ULE to see if it helps.

Although, in the last few months as mav has been converging the 2 I've
started to see the same problems I saw on my desktop systems previously
re-appear even using ULE. For example, if I'm watching an AVI with VLC
and start doing anything that generates a lot of interrupts (like
moving
large quantities of data from one disk to another) the video and sound
start to skip. Also, various other desktop features (like menus, window
switching, etc.) start to take measurable time to happen, sometimes
seconds.

... and lest you think this is just a desktop problem, I've seen the
same scenario on 8.x systems used as web servers. With ULE they were
frequently getting into peak load situations that created what I called
mini thundering herd problems where they could never quite get caught
up. Whereas switching to 4BSD the same servers got into high-load
situations less often, and they recovered on their own in minutes.


It is quite pointless to speculate without real info like mentioned
above KTR_SCHED traces.


I'm sorry, you're quite wrong about that. In the cases I mentioned, and
in about 2 out of 3 of the cases where users reported problems and I
suggested that they try 4BSD, the results were clear. This obviously
points out that there is a serious problem with ULE, and if I were the
one who was responsible for that code I would be looking at ways of
helping users figure out where the problems are. But that's just me.


I am not saying anything bad about 4BSD.


Yes, I get that, but thanks for making it clear.


The choice is provided because
they are indeed different and neither is perfect.


... which is why I'm asking you to stop making them more the same until
we get a better idea of what the issues are.


I have no plans to converge them. I just found a problem in ULE that
had been replicated in 4BSD, and it would be strange to fix one without
the other. But fixing it exposed another old problem specific to 4BSD,
which I fixed by reusing logically equivalent code from ULE. I saw no
reason to reinvent the wheel there, nor to leave an obvious bug unfixed. Sure,
it can change behavior in some way, but ULE is not to blame.


--
Alexander Motin


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-20 Thread Doug Barton
On 08/20/2012 06:32, Alexander Motin wrote:
 I have no plans to converge them. I just found a problem in ULE that
 had been replicated in 4BSD, and it would be strange to fix one without
 the other. But fixing it exposed another old problem specific to 4BSD,
 which I fixed by reusing logically equivalent code from ULE. I saw no
 reason to reinvent the wheel there, nor to leave an obvious bug unfixed. Sure,
 it can change behavior in some way, but ULE is not to blame.

Thank you for that explanation.



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-18 Thread Kevin Oberman
On Fri, Aug 17, 2012 at 3:23 PM, Ian Lepore
free...@damnhippie.dyndns.org wrote:
 On Fri, 2012-08-17 at 14:29 -0700, Kevin Oberman wrote:
 On Fri, Aug 17, 2012 at 10:11 AM, Ian Lepore
  No!  Not bde!  He'll notice that I violated style(9) by accidentally
  leaving an extra blank line between a comment block and the function
  definition.  :)  (There are probably more violations than that -- I did
  this when I was first trying to come to grips with the differences
  between style(9) and the almost-style(9) standards we use at work.)
 
  When I first proposed the changes, jhb remarked that they sounded good,
  but as far as I know, nobody reviewed the actual diff when I posted it.
  It looks like bde and phk were the primary maintainers back when this
  code was being more actively worked on.

 Why not bde? Everyone needs to learn what the term bruceification means.

 Believe me, there IS good reason for programming style and almost
 everyone with a commit bit gets close. bde will provide a reminder of
 any of those things you forgot were in style(9). This is something we
 should appreciate, even if it does sting a bit.

 Did you miss the smiley I buried between two sentences there?

 Having worked on code written with no style guidelines, I totally
 understand the need for consistent style.  While I find a couple of
 style(9)'s edicts to be massively annoying, all in all I'd rather work
 on code that has a consistent style I hate than on code with no
 consistency.

Nope. I didn't miss it, but you missed the one I neglected to type at
the end of the first line in my message. I really appreciate all that he
does including being the foremost style(9) checker.

Besides, many of the newer folks around here may not understand
just what bruceification is. Only those whose code has not caught
his attention, though. :-)  (I remembered, this time)
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-18 Thread Lev Serebryakov
Hello, Ian.
You wrote on 17 August 2012, 18:56:33:

IL That result actually matches my expectation... it fixed only a part of
IL your problem.
  I was (partly) wrong :( Under ``really high'' load (4MiB/s upload and download
at the same time) userland freezes again.
  Unfortunately, it is difficult to reproduce such a load on demand in my
case, so I don't have a KTR scheduling trace for it, but here is one
thing I noticed: when the load is lower (like 2MiB/s both ways) ng_queue
consumes only 4-5% of CPU. And before the freeze, top shows ng_queue
consuming about 60% of CPU. It is strange, as the traffic only goes up
by a factor of 2...
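
A quick arithmetic check of the figures above shows why this looks odd. The sketch below is only an illustration using the numbers reported in this message (2 and 4 MiB/s, 4-5% and about 60% CPU); the function name is made up for the example, and the comparison assumes that CPU cost would grow roughly linearly with traffic if the per-packet cost were constant.

```python
def scaling_factors(load_low, cpu_low, load_high, cpu_high):
    """Return (traffic_ratio, cpu_ratio) between two measurements."""
    return load_high / load_low, cpu_high / cpu_low


# Figures reported in the message: 2 MiB/s at 4-5% CPU, 4 MiB/s at about 60%.
for cpu_low in (4.0, 5.0):
    traffic_ratio, cpu_ratio = scaling_factors(2.0, cpu_low, 4.0, 60.0)
    print(f"traffic x{traffic_ratio:g}, ng_queue CPU x{cpu_ratio:g}")
```

Doubling the traffic corresponds to a 12x to 15x increase in ng_queue CPU time, so the cost is clearly not linear in the traffic rate. The numbers alone do not say why.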

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Lev Serebryakov
Hello, Ian.
You wrote on 16 August 2012, 21:47:06:


IL It's a long shot, but if the trouble you're seeing has the same cause,
IL it should be fixed by this patch:
IL http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037233.html
 It looks like this patch fixes the freezes under network load. I could
not reproduce the freezes now (except while `ktrdump' is running, but I think
that is OK).

 It also changes the process ordering in top: em0 taskq is no longer at the top,
and the system has enough idle time even under load.

 But WiFi is affected by wire traffic :(

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Ian Lepore
On Fri, 2012-08-17 at 14:38 +0400, Lev Serebryakov wrote:
 Hello, Ian.
 You wrote on 16 August 2012, 21:47:06:
 
 
 IL It's a long shot, but if the trouble you're seeing has the same cause,
 IL it should be fixed by this patch:
 IL 
 http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037233.html
  It looks like this patch fixes the freezes under network load. I could
 not reproduce the freezes now (except while `ktrdump' is running, but I think
 that is OK).
 
  It also changes the process ordering in top: em0 taskq is no longer at the top,
 and the system has enough idle time even under load.
 
  But WiFi is affected by wire traffic :(
 

That result actually matches my expectation... it fixed only a part of
your problem.  I suspected (without very good evidence) that you may
have two unrelated problems; hopefully now that we've eliminated one the
other will be easier to find.

I've submitted a PR with that patch attached, since it has now been
shown to fix a problem on two different sets of (similar) hardware:

  http://www.freebsd.org/cgi/query-pr.cgi?pr=170705

-- Ian




Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Adrian Chadd
On 17 August 2012 07:56, Ian Lepore free...@damnhippie.dyndns.org wrote:

 That result actually matches my expectation... it fixed only a part of
 your problem.  I suspected (without very good evidence) that you may
 have two unrelated problems; hopefully now that we've eliminated one the
 other will be easier to find.

 I've submitted a PR with that patch attached, since it has now been
 shown to fix a problem on two different sets of (similar) hardware:

   http://www.freebsd.org/cgi/query-pr.cgi?pr=170705

Hm, who's a good person to review this stuff? Maybe bde?



Adrian


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Ian Lepore
On Fri, 2012-08-17 at 09:58 -0700, Adrian Chadd wrote:
 On 17 August 2012 07:56, Ian Lepore free...@damnhippie.dyndns.org wrote:
 
  That result actually matches my expectation... it fixed only a part of
  your problem.  I suspected (without very good evidence) that you may
  have two unrelated problems; hopefully now that we've eliminated one the
  other will be easier to find.
 
  I've submitted a PR with that patch attached, since it has now been
  shown to fix a problem on two different sets of (similar) hardware:
 
http://www.freebsd.org/cgi/query-pr.cgi?pr=170705
 
 Hm, who's a good person to review this stuff? Maybe bde?
 

No!  Not bde!  He'll notice that I violated style(9) by accidentally
leaving an extra blank line between a comment block and the function
definition.  :)  (There are probably more violations than that -- I did
this when I was first trying to come to grips with the differences
between style(9) and the almost-style(9) standards we use at work.)

When I first proposed the changes, jhb remarked that they sounded good,
but as far as I know, nobody reviewed the actual diff when I posted it.
It looks like bde and phk were the primary maintainers back when this
code was being more actively worked on.

-- Ian




Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Kevin Oberman
On Fri, Aug 17, 2012 at 10:11 AM, Ian Lepore
free...@damnhippie.dyndns.org wrote:
 On Fri, 2012-08-17 at 09:58 -0700, Adrian Chadd wrote:
 On 17 August 2012 07:56, Ian Lepore free...@damnhippie.dyndns.org wrote:

  That result actually matches my expectation... it fixed only a part of
  your problem.  I suspected (without very good evidence) that you may
  have two unrelated problems; hopefully now that we've eliminated one the
  other will be easier to find.
 
  I've submitted a PR with that patch attached, since it has now been
  shown to fix a problem on two different sets of (similar) hardware:
 
http://www.freebsd.org/cgi/query-pr.cgi?pr=170705

 Hm, who's a good person to review this stuff? Maybe bde?


 No!  Not bde!  He'll notice that I violated style(9) by accidentally
 leaving an extra blank line between a comment block and the function
 definition.  :)  (There are probably more violations than that -- I did
 this when I was first trying to come to grips with the differences
 between style(9) and the almost-style(9) standards we use at work.)

 When I first proposed the changes, jhb remarked that they sounded good,
 but as far as I know, nobody reviewed the actual diff when I posted it.
 It looks like bde and phk were the primary maintainers back when this
 code was being more actively worked on.

Why not bde? Everyone needs to learn what the term bruceification means.

Believe me, there IS good reason for programming style and almost
everyone with a commit bit gets close. bde will provide a reminder of
any of those things you forgot were in style(9). This is something we
should appreciate, even if it does sting a bit.
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6...@gmail.com


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Ian Lepore
On Fri, 2012-08-17 at 14:29 -0700, Kevin Oberman wrote:
 On Fri, Aug 17, 2012 at 10:11 AM, Ian Lepore
  No!  Not bde!  He'll notice that I violated style(9) by accidentally
  leaving an extra blank line between a comment block and the function
  definition.  :)  (There are probably more violations than that -- I did
  this when I was first trying to come to grips with the differences
  between style(9) and the almost-style(9) standards we use at work.)
 
  When I first proposed the changes, jhb remarked that they sounded good,
  but as far as I know, nobody reviewed the actual diff when I posted it.
  It looks like bde and phk were the primary maintainers back when this
  code was being more actively worked on.
 
 Why not bde? Everyone needs to learn what the term bruceification means.
 
 Believe me, there IS good reason for programming style and almost
 everyone with a commit bit gets close. bde will provide a reminder of
 any of those things you forgot were in style(9). This is something we
 should appreciate, even if it does sting a bit.

Did you miss the smiley I buried between two sentences there?

Having worked on code written with no style guidelines, I totally
understand the need for consistent style.  While I find a couple of
style(9)'s edicts to be massively annoying, all in all I'd rather work
on code that has a consistent style I hate than on code with no
consistency.

-- Ian



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-17 Thread Adrian Chadd
.. I did mean bde because it's timekeeping related and he/mav are well
versed in what's going on there. bde likely knows about the older RTC
behaviours too.

Sheesh. It's not always about style(9) :-)



Adrian


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-16 Thread Ian Lepore
On Wed, 2012-08-15 at 14:40 +0400, Lev Serebryakov wrote:
 Hello, Alexander.
 You wrote on 15 August 2012, 14:18:05:
 
 
 AM It is quite pointless to speculate without real info like mentioned
 AM above KTR_SCHED traces. The main thing I've learned about schedulers is that things
 AM there never work as you expect. There are too many factors and relations
 AM to predict behavior in every case.
   I'll take these with as many variants (ULE and 4BSD, polling with
 HZ=1000 and interrupts with default HZ) as I can, in a day or two.
   Now I have kernels with KTR compiled in (GEN, NET and SCHED).
 
 AM About Soekris and idle CPU measurement, let's start from what kind of
 AM eventtimer is used there. Since it is a UP machine, I guess it uses
 AM the i8254 timer in periodic mode. It means that it by definition can't
  It doesn't have any other timers. You can think of this machine
 as a good old true i386 with PCI (and some additional fancy
 instructions in the CPU core, something like a classic Pentium), but
 nothing more.
 
 kern.eventtimer.choice: i8254(100) RTC(0)
 kern.eventtimer.et.RTC.flags: 17
 kern.eventtimer.et.RTC.frequency: 32768
 kern.eventtimer.et.RTC.quality: 0
 kern.eventtimer.et.i8254.flags: 1
 kern.eventtimer.et.i8254.frequency: 1193182
 kern.eventtimer.et.i8254.quality: 100
 kern.eventtimer.periodic: 1
 kern.eventtimer.timer: i8254
 kern.eventtimer.activetick: 1
 kern.eventtimer.idletick: 0
 kern.eventtimer.singlemul: 2
 
 AM properly measure load from threads running from hardclock, such as
 AM dummynet, polling netisr threads, etc.
   You see, there are two different problems here:
 
 (a) with polling, the system is responsive under any load, but wire2wifi
 performance is hugely affected by wire2wire traffic (and mpd5
 in between). And, yes, top seems to lie about idle time.
 
 (b) with interrupts, the system works much better when it works (wire2wifi
 speed is affected by wire2wire traffic, but to a much lesser extent), but
 it freezes every third minute for a minute: traffic is still passed, but
 no user-level applications (including BIND and the DHCP server) work at
 all FOR A MINUTE OR MORE. It does not look like a 100ms lag, which could affect
 video playback. It looks like a 60-120 second lag! At least in the case of
 ULE; I didn't try 4BSD yet.
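
The kern.eventtimer sysctl output quoted above can be read mechanically. The following is a minimal sketch, not an official tool: it parses the "name: value" lines exactly as quoted, converts each timer frequency into the length of one count, and decodes the flags field. The flag bit names are an assumption taken from memory of sys/timeet.h and should be checked against your source tree; the helper names are invented for this example.

```python
SYSCTL_OUTPUT = """\
kern.eventtimer.choice: i8254(100) RTC(0)
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.periodic: 1
kern.eventtimer.timer: i8254
"""

# ASSUMPTION: bit meanings as recalled from sys/timeet.h; verify locally.
ET_FLAGS = {1: "PERIODIC", 2: "ONESHOT", 4: "PERCPU", 8: "C3STOP", 16: "POW2DIV"}


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict of strings."""
    result = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if sep:
            result[name.strip()] = value.strip()
    return result


def decode_flags(value):
    """List the names of the flag bits set in value."""
    return [name for bit, name in sorted(ET_FLAGS.items()) if value & bit]


def count_ns(frequency_hz):
    """Length of one timer count in nanoseconds."""
    return 1e9 / frequency_hz


timers = parse_sysctl(SYSCTL_OUTPUT)
for et in ("i8254", "RTC"):
    flags = int(timers[f"kern.eventtimer.et.{et}.flags"])
    freq = int(timers[f"kern.eventtimer.et.{et}.frequency"])
    print(et, decode_flags(flags), f"{count_ns(freq):.0f} ns per count")
```

Under the assumed bit names, neither timer on this board advertises one-shot operation, which would be consistent with the guess that the i8254 is running in periodic mode.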
 

I had trouble earlier this year with an industrial single-board computer
that uses the same chipset as your Soekris (Geode 500 + CS5536) where
the interrupt handler for the RTC chip would occasionally get stuck in a
loop for a minute or more at a time, making userland processes
completely unresponsive during that time.

It's a long shot, but if the trouble you're seeing has the same cause,
it should be fixed by this patch:

http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037233.html

-- Ian




Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-16 Thread Adrian Chadd
Hey cool; if this works out for lev, could we get this into -HEAD and MFC it?



Adrian


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-16 Thread Lev Serebryakov
Hello, Ian.
You wrote on 16 August 2012, 21:47:06:

IL It's a long shot, but if the trouble you're seeing has the same cause,
IL it should be fixed by this patch:
IL http://lists.freebsd.org/pipermail/freebsd-hackers/2012-January/037233.html
 I'll add this patch to my tests, thanks!


-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Dimitry Andric

On 2012-08-15 02:09, Doug Barton wrote:

On 08/14/2012 12:20 PM, Adrian Chadd wrote:

...

Maybe things aren't being scheduled correctly and the added latency is
killing performance?


You might also try switching to SCHED_ULE to see if it helps.


Most likely, s/ULE/4BSD/ here, and in the rest of your mail? :)



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Lev Serebryakov
Hello, Adrian.
You wrote on 15 August 2012, 2:20:48:


AC Would you be willing to compile a kernel with KTR so you can capture
AC some KTR scheduler dumps?

AC That way the scheduler peeps can feed this into schedgraph.py (and you
AC can too!) to figure out what's going on.

AC Maybe things aren't being scheduled correctly and the added latency is
AC killing performance?
  I'll try this. Also, I've found that I turned POLLING on (and set
 HZ=1000) a long time ago and forgot about it. I'll try a more
 standard config and try 4BSD too, as I now have many more problems
 with router freezes than with low speed.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Lev Serebryakov
Hello, Lev.
You wrote on 15 August 2012, 0:45:42:

LS  The answer looks trivial: the router CPU is the bottleneck. But here is one
LS additional detail: `top' never shows less than 50% idle when torrents are
LS active. And `idle' time with torrent traffic is ALWAYS higher than
LS without it but with WiFi traffic.
  OK, additional information: it seems that `top' lies when
 POLLING is enabled for the em0 and vr1 NICs. I turned POLLING off, and
 speeds are the same, but `idle' is no longer 50%; it is `0%' when the
 gateway is overloaded.

 But it still freezes under load with ULE. It looks like ULE is broken.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Doug Barton
On 08/14/2012 09:18 PM, Dimitry Andric wrote:
 On 2012-08-15 02:09, Doug Barton wrote:
 On 08/14/2012 12:20 PM, Adrian Chadd wrote:
 ...
 Maybe things aren't being scheduled correctly and the added latency is
 killing performance?

 You might also try switching to SCHED_ULE to see if it helps.
 
 Most likely, s/ULE/4BSD/ here, and in the rest of your mail? :)
 

yes, thanks


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Garrett Cooper
On Wed, Aug 15, 2012 at 1:21 AM, Lev Serebryakov l...@freebsd.org wrote:
 Hello, Lev.
 You wrote on 15 August 2012, 0:45:42:

 LS  The answer looks trivial: the router CPU is the bottleneck. But here is one
 LS additional detail: `top' never shows less than 50% idle when torrents are
 LS active. And `idle' time with torrent traffic is ALWAYS higher than
 LS without it but with WiFi traffic.
   OK, additional information: it seems that `top' lies when
  POLLING is enabled for the em0 and vr1 NICs. I turned POLLING off, and
  speeds are the same, but `idle' is no longer 50%; it is `0%' when the
  gateway is overloaded.

  But it still freezes under load with ULE. It looks like ULE is broken.

Not sure what card you have, but the lem-style e1000 cards were
changed recently (r238953) to use polling a bit differently. Try setting
hw.em.use_legacy_irq=1 as a tunable and see what happens, or remove
DEVICE_POLLING altogether.
The clock and scheduling code has also been changed recently
(r239185, r239194, r239183, r239157, r239036, r239013). See if
reverting any or all of the aforementioned commits helps improve
performance for you.
HTH!
-Garrett


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Alexander Motin

On 15.08.2012 03:09, Doug Barton wrote:

On 08/14/2012 12:20 PM, Adrian Chadd wrote:

Would you be willing to compile a kernel with KTR so you can capture
some KTR scheduler dumps?

That way the scheduler peeps can feed this into schedgraph.py (and you
can too!) to figure out what's going on.

Maybe things aren't being scheduled correctly and the added latency is
killing performance?


You might also try switching to SCHED_ULE to see if it helps.

Although, in the last few months as mav has been converging the 2 I've
started to see the same problems I saw on my desktop systems previously
re-appear even using ULE. For example, if I'm watching an AVI with VLC
and start doing anything that generates a lot of interrupts (like moving
large quantities of data from one disk to another) the video and sound
start to skip. Also, various other desktop features (like menus, window
switching, etc.) start to take measurable time to happen, sometimes
seconds.

... and lest you think this is just a desktop problem, I've seen the
same scenario on 8.x systems used as web servers. With ULE they were
frequently getting into peak load situations that created what I called
mini thundering herd problems where they could never quite get caught
up. Whereas switching to 4BSD the same servers got into high-load
situations less often, and they recovered on their own in minutes.


It is quite pointless to speculate without real info like the KTR_SCHED
traces mentioned above. The main thing I've learned about schedulers is
that things there never work as you expect. There are too many factors
and relations to predict behavior in every case.


About Soekris and idle CPU measurement, let's start with what kind of
eventtimer is used there. Since it is a UP machine, I guess it uses the
i8254 timer in periodic mode. It means that it by definition can't
properly measure load from threads running from hardclock, such as
dummynet, polling netisr threads, etc.
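
To illustrate the measurement problem, here is a minimal sketch in Python.
It is a toy model, not kernel code, and it assumes that the work starts
exactly at every timer tick, finishes before the next one, and that CPU
usage is sampled by looking at what was running just before the tick
fired. The function name and parameters are made up for the illustration.

```python
def measure(tick_us, busy_us, total_ticks):
    """Compare true CPU load with the load seen by per-tick sampling.

    Assumed model: work (e.g. polling) starts exactly at every timer tick
    and runs for busy_us microseconds; statistics are sampled from what
    the CPU was doing just before the tick fired.
    """
    true_busy = 0
    sampled_busy = 0
    for _ in range(total_ticks):
        # Just before the tick, work started at the previous tick has
        # already finished unless it fills the whole tick period.
        if busy_us >= tick_us:
            sampled_busy += 1
        true_busy += min(busy_us, tick_us)
    true_load = 100.0 * true_busy / (tick_us * total_ticks)
    sampled_load = 100.0 * sampled_busy / total_ticks
    return true_load, sampled_load


if __name__ == "__main__":
    # HZ=1000 gives a 1000 us tick; 600 us of tick-driven work per tick.
    true_load, sampled_load = measure(tick_us=1000, busy_us=600, total_ticks=1000)
    print(f"true load {true_load:.0f}%, sampled load {sampled_load:.0f}%")
```

In this model the CPU is busy 60% of the time, yet every sample lands on
an idle moment, which is the kind of discrepancy that would make `top'
over-report idle time.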


As for playing AVIs and using other GUIs, the key word here, and for
ULE in general, is interactivity. ULE gives a huge boost to threads it
counts as interactive. Disk I/O is a good candidate for that, as by
definition it does many voluntary sleeps while waiting for data. If it
were not counted as interactive, it would suffer heavily from latencies
while waiting for other threads. Modern heavy GUIs and video codecs, on
the other hand, may consume CPU time continuously for long periods. On
busy machines they may never sleep at all, trying to catch up with the
incoming data rate. That can make ULE count them as batch and so less
preferred than I/O. As I've said above, let's try to collect some real
data first.


If somebody still wants an area for experiments, there are always some:
 - if you want the video player not to lag, set a negative nice value
   for it (ULE is not a magician to guess user wishes);
 - the same, I guess, goes for the Xorg process;
 - there are a number of sysctls ULE provides:
   - kern.sched.interact -- a value in percent specifying how much run
     time a thread may have and still be counted as interactive;
   - kern.sched.slice or the new kern.sched.quantum -- specifying the
     interval of context switches for non-interactive threads,
     historically set to 100ms. It may be too long now. Reducing it may
     make the system run more smoothly, while the price of those
     switches is probably not so significant now.
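
As a rough illustration of what the kern.sched.interact threshold does,
here is a simplified Python model of an interactivity score based on the
ratio of run time to voluntary sleep time. It is only loosely modelled on
the idea behind ULE's scoring; the function names, the 0..100 scale and
the default threshold of 30 are assumptions for this sketch, so check
sched_ule.c and your own sysctl values before relying on it.

```python
INTERACT_MAX = 100                  # assumed score range: 0..100
INTERACT_HALF = INTERACT_MAX // 2


def interact_score(run_ticks, sleep_ticks):
    """Return a score in 0..100; a lower score means 'more interactive'."""
    if run_ticks > sleep_ticks:
        # Mostly running: the score lands in the upper half (50..100).
        div = max(1, run_ticks // INTERACT_HALF)
        return INTERACT_HALF + (INTERACT_HALF - sleep_ticks // div)
    if sleep_ticks > run_ticks:
        # Mostly sleeping: the score lands in the lower half (0..50).
        div = max(1, sleep_ticks // INTERACT_HALF)
        return run_ticks // div
    # Equal run and sleep time, or no history at all.
    return INTERACT_HALF if run_ticks else 0


def is_interactive(run_ticks, sleep_ticks, interact_threshold=30):
    """interact_threshold plays the role of kern.sched.interact (assumed)."""
    return interact_score(run_ticks, sleep_ticks) < interact_threshold


if __name__ == "__main__":
    print("I/O-like thread:  ", is_interactive(run_ticks=1000, sleep_ticks=9000))
    print("codec-like thread:", is_interactive(run_ticks=9000, sleep_ticks=1000))
```

In this model a thread that sleeps most of the time stays under the
threshold, while a thread that rarely sleeps scores close to 100 and is
treated as batch, whatever the user would have preferred.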


--
Alexander Motin


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Lev Serebryakov
Hello, Alexander.
You wrote on 15 August 2012, 14:18:05:


AM It is quite pointless to speculate without real info like the KTR_SCHED
AM traces mentioned above. The main thing I've learned about schedulers is
AM that things there never work as you expect. There are too many factors
AM and relations to predict behavior in every case.
  I'll take these with as many variants (ULE and 4BSD, polling with
HZ=1000 and interrupts with default HZ) as I can, in a day or two.
  I now have kernels with KTR compiled in (GEN, NET and SCHED).

AM About Soekris and idle CPU measurement, let's start with what kind of
AM eventtimer is used there. Since it is a UP machine, I guess it uses the
AM i8254 timer in periodic mode. It means that it by definition can't
 It doesn't have any other timers. You can think of this machine
as a good old true i386 with PCI (and some additional fancy
instructions in the CPU core, something like a classic Pentium), but
nothing more.

kern.eventtimer.choice: i8254(100) RTC(0)
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.periodic: 1
kern.eventtimer.timer: i8254
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2

AM properly measure load from threads running from hardclock, such as
AM dummynet, polling netisr threads, etc.
  You see, there are two different problems here:

(a) with polling, the system is responsive under any load, but wire2wifi
performance is hugely affected by wire2wire traffic (and mpd5
in between). And, yes, top seems to lie about idle time.

(b) with interrupts, the system works much better when it works (wire2wifi
speed is affected by wire2wire traffic, but to a much lesser extent), but
it freezes every third minute for a minute, when traffic is passed but
no user-level applications (including BIND and the DHCP server) work at
all FOR A MINUTE OR MORE. It does not look like a 100ms lag, which could
affect video playback. It looks like a 60-120 second lag! At least in the
case of ULE; I haven't tried 4BSD yet.

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Ian FREISLICH
Lev Serebryakov wrote:
 Hello, Lev.
 You wrote on 15 August 2012, 0:45:42:
 
 LS  The answer looks trivial: the router CPU is the bottleneck. But here is one
 LS additional detail: `top' never shows less than 50% idle when torrents are
 LS active. And `idle' time with torrent traffic is ALWAYS higher than
 LS without it but with WiFi traffic.
   OK, additional information: it seems that `top' lies when
  POLLING is enabled for the em0 and vr1 NICs. I turned POLLING off, and
  speeds are the same, but `idle' is no longer 50%; it is `0%' when the
  gateway is overloaded.
 
  But it still freezes under load with ULE. It looks like ULE is broken.

Are you sure it's a freeze and not a panic?  I'm seeing very frequent
panics on -CURRENT running as a gateway.  Often, it doesn't come
back without a powercycle because it's unable to complete a crashdump.

Ian

-- 
Ian Freislich


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Lev Serebryakov
Hello, Ian.
You wrote on 15 August 2012, 14:57:17:

IF Are you sure it's a freeze and not a panic?  I'm seeing very frequent
  Yes, I'm sure, because I have a hardware console attached (a serial one,
connected to another computer on my network) and because it unfreezes
after a minute or two, and freezes again after the next minute or two, etc.,
without resetting uptime :)
-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Alexander Motin

On 15.08.2012 13:40, Lev Serebryakov wrote:

You wrote on 15 August 2012, 14:18:05:
AM It is quite pointless to speculate without real info like the KTR_SCHED
AM traces mentioned above. The main thing I've learned about schedulers is
AM that things there never work as you expect. There are too many factors
AM and relations to predict behavior in every case.
   I'll take these with as many variants (ULE and 4BSD, polling with
HZ=1000 and interrupts with default HZ) as I can, in a day or two.
   I now have kernels with KTR compiled in (GEN, NET and SCHED).

AM About Soekris and idle CPU measurement, let's start with what kind of
AM eventtimer is used there. Since it is a UP machine, I guess it uses the
AM i8254 timer in periodic mode. It means that it by definition can't
  It doesn't have any other timers. You can think of this machine
as a good old true i386 with PCI (and some additional fancy
instructions in the CPU core, something like a classic Pentium), but
nothing more.

kern.eventtimer.choice: i8254(100) RTC(0)
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.periodic: 1
kern.eventtimer.timer: i8254
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2


Yes, that is what I expected to see there. If you have a timecounter other
than i8254, you can release i8254 from those duties to allow using it in
one-shot mode by setting hint.attimer.0.timecounter=0. Otherwise there are
no options now.



AM properly measure load from threads running from hardclock, such as
AM dummynet, polling netisr threads, etc.
   You see, there are two different problems here:

(a) with polling, the system is responsive under any load, but wire2wifi
performance is hugely affected by wire2wire traffic (and mpd5
in between). And, yes, top seems to lie about idle time.


I don't know why wifi is so different. I suppose it is for some reason
more affected by latencies.



(b) with interrupts, the system works much better when it works (wire2wifi
speed is affected by wire2wire traffic, but to a much lesser extent), but
it freezes every third minute for a minute, when traffic is passed but
no user-level applications (including BIND and the DHCP server) work at
all FOR A MINUTE OR MORE. It does not look like a 100ms lag, which could
affect video playback. It looks like a 60-120 second lag! At least in the
case of ULE; I haven't tried 4BSD yet.


In this case the problem may be that kernel and interrupt threads all
have absolute priorities. It means that until they release the CPU,
user-level may get no CPU time at all. :(
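
A toy Python model of that effect, assuming strict priority scheduling
where the runnable thread with the lowest priority number always wins.
The thread names and priority numbers below are invented for the example
and do not correspond to real FreeBSD priority values.

```python
def simulate(threads, ticks):
    """Strict-priority scheduling: each tick goes to the runnable thread
    with the lowest priority number (lower number = more important).

    threads: list of (name, priority, wants) where wants(tick) -> bool
             says whether the thread wants the CPU on that tick.
    Returns a dict mapping each thread name to the ticks of CPU it got.
    """
    got = {name: 0 for name, _, _ in threads}
    for tick in range(ticks):
        runnable = [(prio, name) for name, prio, wants in threads if wants(tick)]
        if runnable:
            _, winner = min(runnable)
            got[winner] += 1
    return got


if __name__ == "__main__":
    overloaded = [
        ("intr", 10, lambda t: True),    # interrupt thread, always busy
        ("named", 120, lambda t: True),  # user-level daemon
    ]
    print(simulate(overloaded, 1000))
```

With the interrupt thread permanently runnable, the user-level thread
gets no ticks at all in this model, which would look like a freeze of
BIND or the DHCP server while packets are still forwarded.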


--
Alexander Motin


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Lev Serebryakov
Hello, Alexander.
You wrote on 15 August 2012, 15:07:32:

AM Yes, that is what I expected to see there. If you have a timecounter other
AM than i8254, you can release i8254 from those duties to allow using it in
AM one-shot mode by setting hint.attimer.0.timecounter=0. Otherwise there are
AM no options now.

% dmesg | grep timer
pmtimer0 on isa0
Event timer RTC frequency 32768 Hz quality 0
attimer0: AT timer at port 0x40 on isa0
Event timer i8254 frequency 1193182 Hz quality 100
%

 (a) with polling, the system is responsive under any load, but wire2wifi
 performance is hugely affected by wire2wire traffic (and mpd5
 in between). And, yes, top seems to lie about idle time.
AM I don't know why wifi is so different. I suppose it is for some reason
AM more affected by latencies.
  Adrian says it is.

 (b) with interrupts, the system works much better when it works (wire2wifi
 speed is affected by wire2wire traffic, but to a much lesser extent), but
 it freezes every third minute for a minute, when traffic is passed but
 no user-level applications (including BIND and the DHCP server) work at
 all FOR A MINUTE OR MORE. It does not look like a 100ms lag, which could
 affect video playback. It looks like a 60-120 second lag! At least in the
 case of ULE; I haven't tried 4BSD yet.
AM In this case the problem may be that kernel and interrupt threads all
AM have absolute priorities. It means that until they release the CPU,
AM user-level may get no CPU time at all. :(
 How could it be seen in KTR traces? Where can I read how to
decipher and read these traces?

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Alexander Motin

On 15.08.2012 14:11, Lev Serebryakov wrote:

Hello, Alexander.
You wrote on 15 August 2012, 15:07:32:

AM Yes, that is what I expected to see there. If you have a timecounter other
AM than i8254, you can release i8254 from those duties to allow using it in
AM one-shot mode by setting hint.attimer.0.timecounter=0. Otherwise there are
AM no options now.

% dmesg | grep timer
pmtimer0 on isa0
Event timer RTC frequency 32768 Hz quality 0
attimer0: AT timer at port 0x40 on isa0
Event timer i8254 frequency 1193182 Hz quality 100
%


I meant `kern.timecounter`.


(b) with interrupts, the system works much better when it works (wire2wifi
speed is affected by wire2wire traffic, but to a much lesser extent), but
it freezes every third minute for a minute, when traffic is passed but
no user-level applications (including BIND and the DHCP server) work at
all FOR A MINUTE OR MORE. It does not look like a 100ms lag, which could
affect video playback. It looks like a 60-120 second lag! At least in the
case of ULE; I haven't tried 4BSD yet.

AM In this case the problem may be that kernel and interrupt threads all
AM have absolute priorities. It means that until they release the CPU,
AM user-level may get no CPU time at all. :(
  How could it be seen in KTR traces? Where can I read how to
decipher and read these traces?


There is a Python GUI tool, /usr/src/tools/sched/schedgraph.py, for it.
A short manual is inside.


--
Alexander Motin


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Lev Serebryakov
Hello, Alexander.
You wrote on 15 August 2012, 15:19:32:

AM I meant `kern.timecounter`.
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(800) i8254(0) dummy(-100)
kern.timecounter.hardware: TSC
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 63995
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 276768292
kern.timecounter.tc.TSC.frequency: 499912330
kern.timecounter.tc.TSC.quality: 800
kern.timecounter.invariant_tsc: 0

AM There is a Python GUI tool, /usr/src/tools/sched/schedgraph.py, for it.
AM A short manual is inside.
 Uh-oh, Python+Tk! I wonder whether it will work on Windows, as I don't have
 ``headed'' FreeBSD or Linux machines :)

 Will it work with ALQ output from KTR, not with the output of ktrdump?

-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-15 Thread Alexander Motin

On 15.08.2012 14:23, Lev Serebryakov wrote:

Hello, Alexander.
You wrote on 15 August 2012, 15:19:32:

AM I meant `kern.timecounter`.
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(800) i8254(0) dummy(-100)
kern.timecounter.hardware: TSC
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 63995
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 276768292
kern.timecounter.tc.TSC.frequency: 499912330
kern.timecounter.tc.TSC.quality: 800
kern.timecounter.invariant_tsc: 0


So since you have a TSC timecounter, the trick with one-shot i8254 mode
should work for you. Unfortunately, I was wrong. It should give you more
accurate global CPU usage percentages, but not per-thread CPU usage (at
least with ULE) or load averages, as both of those still depend on
hardclock.
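
For anyone who wants to check the same precondition on another box, here
is a small Python sketch that parses a kern.timecounter.choice line like
the one quoted above and reports whether some timecounter other than
i8254 is usable. The helper names are made up, and treating "quality
greater than zero" as "usable" is an assumption of this sketch.

```python
import re


def parse_choice(line):
    """Parse 'kern.timecounter.choice: TSC(800) i8254(0) dummy(-100)'
    into a dict mapping timecounter name to its quality."""
    _, _, value = line.partition(":")
    pairs = re.findall(r"(\S+?)\((-?\d+)\)", value)
    return {name: int(quality) for name, quality in pairs}


def can_release_i8254(choice):
    """True if some timecounter other than i8254 has positive quality
    (assumed criterion for 'another timecounter is available')."""
    return any(q > 0 for name, q in choice.items() if name != "i8254")


if __name__ == "__main__":
    line = "kern.timecounter.choice: TSC(800) i8254(0) dummy(-100)"
    choice = parse_choice(line)
    print(choice, can_release_i8254(choice))
```

If the check passes, hint.attimer.0.timecounter=0 is the setting mentioned
earlier in the thread; whether it is worth applying still depends on the
caveats above.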



AM There is a Python GUI tool, /usr/src/tools/sched/schedgraph.py, for it.
AM A short manual is inside.
  Uh-oh, Python+Tk! I wonder whether it will work on Windows, as I don't have
  ``headed'' FreeBSD or Linux machines :)

  Will it work with ALQ output from KTR, not with the output of ktrdump?


I have no idea what ALQ output looks like. ktrdump output is just a text
file that the script parses.


--
Alexander Motin


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-14 Thread Adrian Chadd
Hi,

Would you be willing to compile a kernel with KTR so you can capture
some KTR scheduler dumps?

That way the scheduler peeps can feed this into schedgraph.py (and you
can too!) to figure out what's going on.

Maybe things aren't being scheduled correctly and the added latency is
killing performance?


Adrian


Re: CURRENT as gateway on not-so-fast hardware: where is a bottlneck?

2012-08-14 Thread Doug Barton
On 08/14/2012 12:20 PM, Adrian Chadd wrote:
 Hi,
 
 Would you be willing to compile a kernel with KTR so you can capture
 some KTR scheduler dumps?
 
 That way the scheduler peeps can feed this into schedgraph.py (and you
 can too!) to figure out what's going on.
 
 Maybe things aren't being scheduled correctly and the added latency is
 killing performance?

You might also try switching to SCHED_ULE to see if it helps.

Although, in the last few months as mav has been converging the 2 I've
started to see the same problems I saw on my desktop systems previously
re-appear even using ULE. For example, if I'm watching an AVI with VLC
and start doing anything that generates a lot of interrupts (like moving
large quantities of data from one disk to another) the video and sound
start to skip. Also, various other desktop features (like menus, window
switching, etc.) start to take measurable time to happen, sometimes
seconds.

... and lest you think this is just a desktop problem, I've seen the
same scenario on 8.x systems used as web servers. With ULE they were
frequently getting into peak load situations that created what I called
mini thundering herd problems where they could never quite get caught
up. Whereas switching to 4BSD the same servers got into high-load
situations less often, and they recovered on their own in minutes.

Doug
