Re: Any objections/comments on axing out old ATA stack?

2013-03-27 Thread Steve Kargl
On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
 Hi.
 
 Since FreeBSD 9.0 we have been successfully running on the new CAM-based
 ATA stack, using only some controller drivers of the old ata(4), by having
 `options ATA_CAM` enabled in all kernels by default. I would like to
 drop the non-ATA_CAM ata(4) code, unused since that time, from the head
 branch to allow further ATA code cleanup.
 
 Does anyone here still use the legacy ATA stack (a kernel explicitly built
 without `options ATA_CAM`) for some reason, for example as a workaround
 for some regression?

Yes, I use the legacy ATA stack.

 Does anybody have good reasons why we should not drop
 it now?

Because it works?
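
For illustration, a kernel built without the new stack would look roughly
like the following config(8) fragment (a sketch: the config name is made
up, and the include/nooptions idiom for deriving it from GENERIC is an
assumption):

include         GENERIC
ident           LEGACY_ATA
# Drop the default option to fall back to the legacy ata(4) stack.
nooptions       ATA_CAM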

-- 
Steve


Re: Any objections/comments on axing out old ATA stack?

2013-03-27 Thread Steve Kargl
On Wed, Mar 27, 2013 at 11:35:35PM +0200, Alexander Motin wrote:
 On 27.03.2013 23:32, Steve Kargl wrote:
  On Wed, Mar 27, 2013 at 11:22:14PM +0200, Alexander Motin wrote:
  Hi.
 
  Since FreeBSD 9.0 we have been successfully running on the new CAM-based
  ATA stack, using only some controller drivers of the old ata(4), by having
  `options ATA_CAM` enabled in all kernels by default. I would like to
  drop the non-ATA_CAM ata(4) code, unused since that time, from the head
  branch to allow further ATA code cleanup.
 
  Does anyone here still use the legacy ATA stack (a kernel explicitly built
  without `options ATA_CAM`) for some reason, for example as a workaround
  for some regression?
 
  Yes, I use the legacy ATA stack.
 
 On 9.x, or on HEAD where the new one is the default?

Head.

  Does anybody have good reasons why we should not drop
  it now?
 
  Because it works?
 
 Any problems with the new one?
 

Last time I tested the new one, and this was several months
ago, the system (a Dell Latitude D530 laptop) would not boot.

-- 
Steve


Re: Any objections/comments on axing out old ATA stack?

2013-03-27 Thread Steve Kargl
On Thu, Mar 28, 2013 at 12:22:11AM +0200, Alexander Motin wrote:
 On 28.03.2013 00:05, Steve Kargl wrote:
 
  Last time I tested the new one, and this was several months
  ago, the system (a Dell Latitude D530 laptop) would not boot.
 
 Probably we should just fix that. Any more info?
 

I can't remember all the details.  I intended to try again
as work was being done on the new code at the time.  I 
never got around to it as my laptop worked fine with the
old code and unfortunately I got busy with work and family.
Reading the freebsd-current mailing lists suggests that 
now is not the time to be a hero.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-23 Thread Steve Kargl
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
 On 22 December 2011 11:47, Steve Kargl <s...@troutmask.apl.washington.edu>
 wrote:
 
  There is the additional observation in one of my 2008
  emails (URLs have been posted) that if you have N+1
  cpu-bound jobs with, say, job0 and job1 ping-ponging
  on cpu0 (due to ULE's cpu-affinity feature) and if I
  kill job2 running on cpu1, then neither job0 nor job1
  will migrate to cpu1.  So, one now has N cpu-bound
  jobs running on N-1 cpus.
 
 .. and this sounds like a pretty serious regression. Have you ever
 filed a PR for it?
 

Ah, so good news!  I cannot reproduce the problem that
I saw 3+ years ago on the 4-cpu node, which is currently
running a ULE kernel.  When I killed the (N+1)th job,
the N remaining jobs were spread across the N cpus.

One difference between the 2008 tests and today's tests is
the number of available cpus.  In 2008, I ran the tests
on a node with 8 cpus, while today's tests used a node
with only 4 cpus.  If this behavior is a scaling
issue, I can't currently test it.  But today's tests
are certainly encouraging.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-23 Thread Steve Kargl
On Fri, Dec 23, 2011 at 02:49:51PM -0800, Adrian Chadd wrote:
 On 23 December 2011 11:11, Steve Kargl <s...@troutmask.apl.washington.edu>
 wrote:
 
  One difference between the 2008 tests and today's tests is
  the number of available cpus.  In 2008, I ran the tests
  on a node with 8 cpus, while today's tests used a node
  with only 4 cpus.  If this behavior is a scaling
  issue, I can't currently test it.  But today's tests
  are certainly encouraging.
 
 Do you not have access to anything with 8 CPUs in it? It'd be nice to
 get clarification that this indeed was fixed.

I have a few nodes with 8 cpus, but those are running 4BSD
kernels.  I try to keep my kernel and world in sync, and by
extension the kernel/world on each node is in sync with
all other nodes.  So, while I took the 4-cpu node off-line
and updated it, at the moment I can't take another node
off-line unless I do an update across the entire cluster.
The update is planned for next year.

 Does ULE care (much) if the nodes are hyperthreading or real cores?
 Would that play a part in what it tries to schedule/spread?

I only have Opteron processors in the cluster, so if you're
referring to Intel's hyperthreading technology, I can't look
into ULE's behavior with HTT.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 01:07:58AM -0800, Adrian Chadd wrote:
 Are you able to go through the emails here and grab out Attilio's
 example for generating KTR scheduler traces?
 

Did you read this part of my email?

 
  Attilio,
 
  I have placed several files at
 
  http://troutmask.apl.washington.edu/~kargl/freebsd
 
  dmesg.txt  -- dmesg for ULE kernel
  summary        -- A summary that includes top(1) output of all runs.
  sysctl.ule.txt -- sysctl -a for the ULE kernel
  ktr-ule-problem-kargl.out.gz

ktr-ule-problem-kargl.out is a 43 MB file.  I don't think the
freebsd.org email server would allow that file through.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
 On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
 
 I have placed several files at
 
 http://troutmask.apl.washington.edu/~kargl/freebsd
 
 dmesg.txt  -- dmesg for ULE kernel
 summary        -- A summary that includes top(1) output of all runs.
 sysctl.ule.txt -- sysctl -a for the ULE kernel
 ktr-ule-problem-kargl.out.gz 
 
 
 Since time is executed on the master, only the 'real' time is of
 interest (the summary file includes user and sys times).  This
 command was run 5 times for each N value, and up to 10 times for
 some N values with the ULE kernel.  The following table records
 the average 'real' time; the number in (...) is the mean
 absolute deviation.
 
 #  N    ULE              4BSD
 # -----------------------------------
 #  4    223.27 (0.502)   221.76 (0.551)
 #  5    404.35 (73.82)   270.68 (0.866)
 #  6    627.56 (173.0)   247.23 (1.442)
 #  7    475.53 (84.07)   285.78 (1.421)
 #  8    429.45 (134.9)   223.64 (1.316)
 
 One explanation for the 1.5-2x longer times is that with ULE the
 threads are not migrated properly, so you end up with idle cores
 and ready threads not running

That's what I guessed back in 2008 when I first reported the
behavior.  

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://freebsd.monkey.org/freebsd-current/200807/msg00280.html

The top(1) output at the above URL shows 10 completely independent
instances of the same numerically intensive application running
on a circa-2008 ULE kernel.  Look at the PRI column.  The high-PRI
jobs are not only pinned to a cpu, but are running at
100% WCPU.  The low-PRI jobs seem to be pinned to a subset of the
available cpus and simply ping-pong in and out of the same cpus.
In this instance, there are 5 jobs competing for time on 3 cpus.

 Also, perhaps one could build a simple test process that replicates
 this workload (so one can run it as part of regression tests):
   1. define a CPU-intensive function f(n) which issues no
      system calls, optionally touching a lot of memory,
      where n determines the number of iterations.
   2. by trial and error (or let the program find it),
      pick a value N1 so that the minimum execution time
      of f(N1) is in the 10..100ms range.
   3. now run the function f() again from an outer loop so
      that the total execution time is large (10..100s),
      again with no intervening system calls.
   4. use an external shell script to rerun a process
      when it terminates, and then run multiple instances
      in parallel.  Instead of the external script one could
      fork new instances before terminating, but I am a bit
      unclear how CPU inheritance works when a process forks.
      Going through the shell possibly breaks the chain.

The tests at the above URL do essentially what you
propose, except that in 2008 the kzk90 programs were
doing some IO.
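
For concreteness, a minimal sketch of such a cpu-bound test program
(hypothetical code, not the kzk90 program; the 10 ms calibration
threshold and the outer-loop count are assumptions):

#include <stdint.h>
#include <stdio.h>
#include <time.h>

static volatile uint64_t sink;

/* Step 1: a cpu-intensive f(n) with no system calls; n sets the work. */
static void
f(uint64_t n)
{
        uint64_t i, x = 12345;

        for (i = 0; i < n; i++)
                x = x * 6364136223846793005ULL + 1442695040888963407ULL;
        sink = x;               /* defeat dead-code elimination */
}

static double
elapsed(uint64_t n)
{
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        f(n);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return ((t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
}

int
main(void)
{
        uint64_t n1 = 1000;
        int i;

        /* Step 2: grow n1 until a single f(n1) call takes at least ~10 ms. */
        while (elapsed(n1) < 0.01)
                n1 *= 2;

        /* Step 3: outer loop, tens of seconds, no intervening syscalls. */
        for (i = 0; i < 2000; i++)
                f(n1);

        printf("n1 = %ju\n", (uintmax_t)n1);
        return (0);
}

Several instances of such a binary, restarted from a shell loop as in
step 4, would approximate the N+1 cpu-bound load.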

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
 On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
  
  I have placed several files at
  
  http://troutmask.apl.washington.edu/~kargl/freebsd
  
  dmesg.txt  -- dmesg for ULE kernel
  summary        -- A summary that includes top(1) output of all runs.
  sysctl.ule.txt -- sysctl -a for the ULE kernel
  ktr-ule-problem-kargl.out.gz 

I've replaced the original version of the ktr file with
a new version.  The old version was corrupt due to my
failure to set 'sysctl debug.ktr.mask=0' prior to the
dump.

 One explanation for the 1.5-2x longer times is that with ULE the
 threads are not migrated properly, so you end up with idle cores
 and ready threads not running (the other possible explanation
 would be that there are migrations, but they are so frequent and
 expensive that they completely trash the caches.  But this seems
 unlikely for this type of task).

I've used schedgraph to look at the ktrdump output.  A jpg is
available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
It shows the ping-pong effect, where 3 processes appear to be
using 2 cpus while the remaining 2 processes are pinned to their
cpus.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
 on 22/12/2011 20:45 Steve Kargl said the following:
  I've used schedgraph to look at the ktrdump output.  A jpg is
  available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
  It shows the ping-pong effect, where 3 processes appear to be
  using 2 cpus while the remaining 2 processes are pinned to their
  cpus.
 
 I'd recommend enabling CPU-specific background colors via the menu in
 schedgraph for a better illustration of your findings.
 
 NB: I still don't understand the point of purposefully running N+1 CPU-bound
 processes.
 

The point is that this is a node in a HPC cluster with
multiple users.  Sure, I can start my job on this node
with only N cpu-bound jobs.  Now, when user John Doe
wants to run his OpenMPI program, should he log into
the 12 nodes in the cluster to see if someone is already
running N cpu-bound jobs on a given node?  4BSD
gives my jobs and John Doe's jobs a fair share of the
available cpus.  ULE does not give a fair share, and
if you read the summary file I put up on the web,
you see that it is fairly non-deterministic as to when an
OpenMPI run will finish (see the mean absolute deviations
in the table of 'real' times that I posted).

There is the additional observation in one of my 2008
emails (URLs have been posted) that if you have N+1
cpu-bound jobs with, say, job0 and job1 ping-ponging
on cpu0 (due to ULE's cpu-affinity feature) and if I
kill job2 running on cpu1, then neither job0 nor job1
will migrate to cpu1.  So, one now has N cpu-bound
jobs running on N-1 cpus.

Finally, my initial post in this email thread was to
tell O. Hartmann to quit beating his head against
a wall with ULE (in an HPC environment).  Switch to
4BSD.  This was based on my 2008 observations, and
I've now wasted 2 days gathering additional information
which only re-affirms my recommendation.
 
-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
 On 22 December 2011 11:47, Steve Kargl <s...@troutmask.apl.washington.edu>
 wrote:
 
  There is the additional observation in one of my 2008
  emails (URLs have been posted) that if you have N+1
  cpu-bound jobs with, say, job0 and job1 ping-ponging
  on cpu0 (due to ULE's cpu-affinity feature) and if I
  kill job2 running on cpu1, then neither job0 nor job1
  will migrate to cpu1.  So, one now has N cpu-bound
  jobs running on N-1 cpus.
 
 .. and this sounds like a pretty serious regression. Have you ever
 filed a PR for it?

No.  I was interacting directly with jeffr in 2008.  I got
as far as setting up root access on a node for jeffr.
Unfortunately, both jeffr and I got busy with real life,
and 4BSD allowed me to get my work done.

  Finally, my initial post in this email thread was to
  tell O. Hartmann to quit beating his head against
  a wall with ULE (in an HPC environment).  Switch to
  4BSD.  This was based on my 2008 observations, and
  I've now wasted 2 days gathering additional information
  which only re-affirms my recommendation.
 
 I personally don't think this is time wasted. You've done something
 that no one else has actually done - provided actual results from
 real-life testing, rather than a hundred posts of "I remember seeing
 X, so I don't use ULE".
 
 If you can definitely and consistently reproduce that N-1 cpu bound
 job bug, you're now in a great position to easily test and re-report
 KTR/schedtrace results to see what impact they have. Please don't
 underestimate exactly how valuable this is.

I'll try this tomorrow.  I first need to modify the code I used
in the 2008 test to disable IO, so that it is nearly completely
cpu-bound.

 How often are those two jobs migrating between CPUs? How am I supposed
 to read CPU load? Why isn't it just sitting at 100% the whole time?

This is my 1st foray into ktr and schedgraph, so I may have done
something incorrectly.  In particular, it seems that schedgraph takes
the cpu clock as a command line argument, so there is probably
some scaling that I'm missing.

 Would you mind repeating this with 4BSD (the N+1 jobs) so we can see
 how the jobs are scheduled/interleaved? Something tells me we'll see
 the jobs being scheduled evenly.

Sure, I'll do this tomorrow as well.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-21 Thread Steve Kargl
On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote:
 2011/12/15 Steve Kargl <s...@troutmask.apl.washington.edu>:
  On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
 
  I basically went through all the e-mail you just sent, identified 4
  real reports on which we could work, and summarized them in the attached
  Excel file.
  I'd like George, Steve, Doug, Andrey and Mike to review the
  few data points there and add more, if they want, or make more important
  clarifications, in particular about the Xorg presence (or rather not)
  in their workload.
 
  Your summary of my observations appears correct.
 
  I have grabbed an up-to-date /usr/src, built and
  installed world, and built and installed a new
  kernel on one of the nodes in my cluster.  It
  has
 
 
 It seems a perfect environment; just please make sure you made a
 debug-free userland (basically, setting MALLOC_PRODUCTION in jemalloc).
 
 The first thing is, can you try reproducing your case? As far as I got
 it, for you it was enough to run N + a small number of CPU-bound threads
 to show the performance penalty, so I'd ask you to start with using dnetc
 or just your preferred cpu-bound workload and verify you can reproduce
 the issue.
 As it happens, please monitor the thread bouncing and CPU utilization
 via 'top' (you don't need to be 100% precise, just get an idea, and
 keep an eye on things like excessive thread migration, obsessive thread
 binding, and low CPU throughput).
 One note: if your workloads need to do I/O please use a tmpfs or
 memory storage to do so, in order to reduce I/O effects.
 Also, verify this doesn't happen with the 4BSD scheduler, just in case.
 
 Finally, if the problem is still in place, please recompile your
 kernel by adding:
 options KTR
 options KTR_ENTRIES=262144
 options KTR_COMPILE=(KTR_SCHED)
 options KTR_MASK=(KTR_SCHED)
 
 And reproduce the issue.
 When you are in the middle of the scheduling issue go with:
 # ktrdump -ctf > ktr-ule-problem-YOURNAME.out
 
 and send it to the mailing list along with your dmesg and the
 information on CPU utilization you gathered via top(1).
 
 That should cover it all, but if you have further questions, please
 just go ahead.

Attilio,

I have placed several files at

http://troutmask.apl.washington.edu/~kargl/freebsd

dmesg.txt  -- dmesg for ULE kernel
summary        -- A summary that includes top(1) output of all runs.
sysctl.ule.txt -- sysctl -a for the ULE kernel
ktr-ule-problem-kargl.out.gz 

I performed a series of tests with both 4BSD and ULE kernels.
The 4BSD and ULE kernels are identical except of course for the
scheduler.  Both witness and invariants are disabled, and malloc
has been compiled without debugging.

Here's what I did.  On the master node in my cluster, I ran an
OpenMPI code that sends N jobs off to the node with the kernel
of interest.  There is communication between the master and
slaves to generate 16 independent chunks of data.  Note, there
is no disk IO.  So, for example, N=4 will start 4 essentially
identical numerically intensive jobs.  At the start of a run,
the master node instructs each slave job to create a chunk of
data.  After the data is created, the slave sends it back to the
master, and the master sends instructions to create the next chunk
of data.  This communication continues until the 16 chunks have
been assigned, computed, and returned to the master.
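
In outline, the exchange looks roughly like the sketch below
(hypothetical MPI code, not the actual sas program; the tags, the
double result payload, and the -1 stop marker are made up):

#include <mpi.h>

#define NCHUNKS  16
#define TAG_WORK 1
#define TAG_DATA 2

int
main(int argc, char **argv)
{
        int rank, size, next = 0, done = 0, chunk, w;
        double result;
        MPI_Status st;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {
                /* Master: hand each slave its first chunk (-1 if none). */
                for (w = 1; w < size; w++) {
                        chunk = (next < NCHUNKS) ? next++ : -1;
                        MPI_Send(&chunk, 1, MPI_INT, w, TAG_WORK,
                            MPI_COMM_WORLD);
                }
                /* Collect results; assign remaining chunks as slaves finish. */
                while (done < NCHUNKS) {
                        MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
                            TAG_DATA, MPI_COMM_WORLD, &st);
                        done++;
                        chunk = (next < NCHUNKS) ? next++ : -1;
                        MPI_Send(&chunk, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                            MPI_COMM_WORLD);
                }
        } else {
                /* Slave: cpu-bound work, no disk IO. */
                for (;;) {
                        MPI_Recv(&chunk, 1, MPI_INT, 0, TAG_WORK,
                            MPI_COMM_WORLD, &st);
                        if (chunk < 0)
                                break;
                        result = 0.0;   /* ... numerically intensive work ... */
                        MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_DATA,
                            MPI_COMM_WORLD);
                }
        }
        MPI_Finalize();
        return (0);
}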

Here is a rough measurement of the problem with ULE and numerically
intensive loads.  This command is executed on the master

time mpiexec -machinefile mf3 -np N sasmp sas.in

Since time is executed on the master, only the 'real' time is of
interest (the summary file includes user and sys times).  This
command was run 5 times for each N value, and up to 10 times for
some N values with the ULE kernel.  The following table records
the average 'real' time; the number in (...) is the mean
absolute deviation.

#  N    ULE              4BSD
# -----------------------------------
#  4    223.27 (0.502)   221.76 (0.551)
#  5    404.35 (73.82)   270.68 (0.866)
#  6    627.56 (173.0)   247.23 (1.442)
#  7    475.53 (84.07)   285.78 (1.421)
#  8    429.45 (134.9)   223.64 (1.316)
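
The numbers in parentheses are the mean absolute deviations about the
mean; below is a small sketch of that computation (the sample times are
made up, not the measured ones):

#include <math.h>
#include <stdio.h>

/* Mean absolute deviation about the mean: (1/n) * sum |x[i] - mean|. */
static void
mad(const double *x, int n, double *mean, double *dev)
{
        double s = 0.0;
        int i;

        for (i = 0; i < n; i++)
                s += x[i];
        *mean = s / n;
        for (s = 0.0, i = 0; i < n; i++)
                s += fabs(x[i] - *mean);
        *dev = s / n;
}

int
main(void)
{
        double t[] = { 10.0, 12.0, 11.0, 13.0, 9.0 };  /* made-up times */
        double mean, dev;

        mad(t, 5, &mean, &dev);
        printf("%.2f (%.3f)\n", mean, dev);     /* prints "11.00 (1.200)" */
        return (0);
}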

These numbers to me demonstrate that ULE is not a good choice
for a HPC workload.

If you need more information, feel free to ask.  If you would
like access to the node, I can probably arrange that.  But,
we can discuss that off-line.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-15 Thread Steve Kargl
On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
 
 I basically went through all the e-mail you just sent, identified 4
 real reports on which we could work, and summarized them in the attached
 Excel file.
 I'd like George, Steve, Doug, Andrey and Mike to review the
 few data points there and add more, if they want, or make more important
 clarifications, in particular about the Xorg presence (or rather not)
 in their workload.

Your summary of my observations appears correct.

I have grabbed an up-to-date /usr/src, built and
installed world, and built and installed a new
kernel on one of the nodes in my cluster.  It 
has

CPU: Dual Core AMD Opteron(tm) Processor 280 (2392.65-MHz K8-class CPU)
  Origin = AuthenticAMD  Id = 0x20f12  Family = f  Model = 21  Stepping = 2
  Features=0x178bfbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,
  MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT
  Features2=0x1SSE3
  AMD Features=0xe2500800SYSCALL,NX,MMX+,FFXSR,LM,3DNow!+,3DNow!
  AMD Features2=0x3LAHF,CMP
real memory  = 17179869184 (16384 MB)
avail memory = 16269832192 (15516 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 2 package(s) x 2 core(s)

I can perform new tests with both ULE and 4BSD, but you'll
need to be precise in the information you want collected
(and how to collect the data) due to the rather limited
amount of time I currently have.

To summarize my workload: on the master node of my cluster
I start a job that will send N slave jobs to the node of
interest.  The slaves perform nearly identical cpu-bound
floating point computations, so the expectation is that
each slave should take nearly the same amount of cpu-time
to complete its task.  Communication occurs between only
the master and a slave at the start of the process and
when it finishes.  The communication is over GigE ipv4
internal network.  The slaves do not read or write to disk.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-13 Thread Steve Kargl
On Tue, Dec 13, 2011 at 02:23:46PM +0100, O. Hartmann wrote:
 On 12/12/11 16:51, Steve Kargl wrote:
  On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
 
  Not fully right; boinc defaults to run at idprio 31, so this isn't an
  issue. And yes, there are cases where SCHED_ULE shows much better
  performance than SCHED_4BSD.  [...]
 
  Do we have any proof at hand for such cases where SCHED_ULE performs
  much better than SCHED_4BSD? Whenever the subject comes up, it is
  mentioned that SCHED_ULE has better performance on boxes with ncpu > 2.
  But in the end I see contradictory statements here. People
  complain about poor performance (especially in scientific environments),
  and others counter that this is not the case.
 
  Within our department, we developed a highly scalable code for planetary
  science purposes on imagery. It utilizes GPUs via OpenCL if
  present. Otherwise it grabs as many cores as it can.
  By the end of this year I'll get a new desktop box based on Intel's new
  Sandy Bridge-E architecture with plenty of memory. If the colleague who
  developed the code is willing to perform some benchmarks on the same
  hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
  recent Suse. For FreeBSD I also intend to look at performance with
  both available schedulers.
 
  
  This comes up every 9 months or so, and must be approaching
  FAQ status.
  
  In a HPC environment, I recommend 4BSD.  Depending on
  the workload, ULE can cause a severe increase in turnaround
  time when doing already long computations.  If
  you have an MPI application, simply launching ncpu+1
  or more jobs can show the problem.
 
 Well, those recommendations should be based on WHY. As the mostly
 negative experiences with SCHED_ULE in highly computational workloads
 always get contradicted by "...but there are workloads that show the
 opposite...", this should be backed by more recent benchmarks and
 explanations than legacy benchmarks from years ago.
 

I have given the WHY in previous discussions of ULE, based
on what you call legacy benchmarks.  I have not seen any
commit to sched_ule.c that would lead me to believe that
the performance issues with ULE and cpu-bound numerical
codes have been addressed.  Repeating the benchmark would
be a waste of time.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-12 Thread Steve Kargl
On Mon, Dec 12, 2011 at 02:47:57PM +0100, O. Hartmann wrote:
 
  Not fully right; boinc defaults to run at idprio 31, so this isn't an
  issue. And yes, there are cases where SCHED_ULE shows much better
  performance than SCHED_4BSD.  [...]
 
 Do we have any proof at hand for such cases where SCHED_ULE performs
 much better than SCHED_4BSD? Whenever the subject comes up, it is
 mentioned that SCHED_ULE has better performance on boxes with ncpu > 2.
 But in the end I see contradictory statements here. People
 complain about poor performance (especially in scientific environments),
 and others counter that this is not the case.
 
 Within our department, we developed a highly scalable code for planetary
 science purposes on imagery. It utilizes GPUs via OpenCL if
 present. Otherwise it grabs as many cores as it can.
 By the end of this year I'll get a new desktop box based on Intel's new
 Sandy Bridge-E architecture with plenty of memory. If the colleague who
 developed the code is willing to perform some benchmarks on the same
 hardware platform, we'll benchmark both FreeBSD 9.0/10.0 and the most
 recent Suse. For FreeBSD I also intend to look at performance with
 both available schedulers.
 

This comes up every 9 months or so, and must be approaching
FAQ status.

In a HPC environment, I recommend 4BSD.  Depending on
the workload, ULE can cause a severe increase in turnaround
time when doing already long computations.  If
you have an MPI application, simply launching ncpu+1
or more jobs can show the problem.

PS: search the list archives for kargl and ULE.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-12 Thread Steve Kargl
On Mon, Dec 12, 2011 at 04:18:35PM +, Bruce Cran wrote:
 On 12/12/2011 15:51, Steve Kargl wrote:
 This comes up every 9 months or so, and must be approaching FAQ
 status. In a HPC environment, I recommend 4BSD. Depending on the
 workload, ULE can cause a severe increase in turnaround time when
 doing already long computations. If you have an MPI application,
 simply launching ncpu+1 or more jobs can show the problem. PS:
 search the list archives for kargl and ULE.
 
 Isn't this something that can be fixed by tuning ULE? For example, for
 desktop applications kern.sched.preempt_thresh should be set to 224 from
 its default. I'm wondering if the installer should ask people what the
 typical use will be, and tune the scheduler appropriately.
 

Tuning kern.sched.preempt_thresh did not seem to help for
my workload.  My code is a classic master-slave OpenMPI
application where the master runs on one node and all
cpu-bound slaves are sent to a second node.  If I
send ncpu+1 jobs to the 2nd node with its ncpu cpus, then
ncpu-1 jobs are assigned to the 1st ncpu-1 cpus.  The
last two jobs are assigned to the ncpu'th cpu, and
these ping-pong on this cpu.  AFAICT, it is a cpu
affinity issue, where ULE is trying to keep each job
associated with its initially assigned cpu.

While one might suggest that starting ncpu+1 jobs
is not prudent, my example is just that: an
example showing that ULE has performance issues.
So, I can now either start only ncpu jobs on each node
in the cluster and send emails to all other users
asking them not to use those nodes, or use 4BSD and not
worry about loading issues.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-12 Thread Steve Kargl
On Mon, Dec 12, 2011 at 01:03:30PM -0600, Scott Lambert wrote:
 On Mon, Dec 12, 2011 at 09:06:04AM -0800, Steve Kargl wrote:
  Tuning kern.sched.preempt_thresh did not seem to help for
  my workload.  My code is a classic master-slave OpenMPI
  application where the master runs on one node and all
  cpu-bound slaves are sent to a second node.  If I
  send ncpu+1 jobs to the 2nd node with its ncpu cpus, then
  ncpu-1 jobs are assigned to the 1st ncpu-1 cpus.  The
  last two jobs are assigned to the ncpu'th cpu, and
  these ping-pong on this cpu.  AFAICT, it is a cpu
  affinity issue, where ULE is trying to keep each job
  associated with its initially assigned cpu.
  
  While one might suggest that starting ncpu+1 jobs
  is not prudent, my example is just that: an
  example showing that ULE has performance issues.
  So, I can now either start only ncpu jobs on each node
  in the cluster and send emails to all other users
  asking them not to use those nodes, or use 4BSD and not
  worry about loading issues.
 
 Does it meet your expectations if you start (j modulo ncpu) = 0
 jobs on a node?
 

I've never tried to launch more than ncpu + 1 (or + 2)
jobs.  I suppose at the time I was investigating the issue,
it was determined that 4BSD allowed me to get my work done
in a more timely manner.  So, I took the path of least
resistance.

-- 
Steve


Re: RFC vgrind in base (and buildworld)

2011-01-22 Thread Steve Kargl
On Sat, Jan 22, 2011 at 10:58:25AM +0100, Gary Jennejohn wrote:
 On Fri, 21 Jan 2011 23:20:09 +0100
 Ulrich Spörlein <u...@freebsd.org> wrote:
 
  On Thu, 20.01.2011 at 21:17:40 +0100, Ulrich Spörlein wrote:
   Hello,
   
   Currently our buildworld relies on groff(1) and vgrind(1) being present
   in the host system. I have a patch ready that at least makes sure these
   are built during bootstrap-tools and completes the WITHOUT_GROFF flag.
   
   vgrind(1) is only used for two papers under share/doc and we could
   easily expand the results and commit them to svn directly, alleviating
   the need to run vgrind(1) during buildworld.
   
   OTOH, there are much more useful tools than vgrind(1) for source code
   formatting. So do we still have vgrind(1) users out there?
   
   Regards,
   Uli
  
  [trying to get this thread back on track]
  
  Does anyone actually care about vgrind in base? Will people be angry if
  I unroll the 2 cases where it is used under share/doc?
  
 
 I personally have never used vgrind, but since it's available as part of
 /usr/ports/textproc/heirloom-doctools IMO it would be safe to remove it
 from base, maybe with a note in UPDATING.

AFAICT, heirloom-doctools does not work on 64-bit platforms.
vgrind may be ok, but nroff and troff die rather quickly with
a segfault when they try to use a macro package such as mdoc.

-- 
Steve


Re: TTY task group scheduling

2010-11-18 Thread Steve Kargl
On Thu, Nov 18, 2010 at 10:59:43PM +, Alexander Best wrote:
 
 well i did exactly what they did in the video. watch a 1080p video and move
 the output window around while compiling the kernel.
 

It is trivial to bring ULE to its knees.  If you
have N cores, then all you need is N+1 cpu-intensive
tasks.  The issue has been known for a few years.

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://www.mail-archive.com/freebsd-hack...@freebsd.org/msg65839.html

-- 
Steve


Re: HEADS-UP: Shared Library Versions bumped...

2009-07-21 Thread Steve Kargl
On Tue, Jul 21, 2009 at 10:45:36PM +0200, O. Hartmann wrote:
 
 I have another box (of many) running FreeBSD 8.0-BETA2/amd64 with 2 GB
 RAM and an Athlon64 2.2 GHz CPU, which has 800(!) ports installed. Can you
 imagine how long this box will be occupied by 'portupgrade -af'? I guess
 'cherry-picking' is the only solution.

How many of those 800 ports are actually necessary and used?
It would be better to generate a complete list of your
installed ports, use pkg_deinstall or pkg_delete to remove
all ports, and then selectively re-install the ports that are
actually used.

 FreeBSD 8.0 on AMD64 does have serious performance issues these days;
 try to compile a compiler (gcc44, for instance) and watch how bumpy your
 X11 becomes, or how network traffic on a 'headless' server suffers. Kernel
 compilation time has increased by approx. 10 minutes on the 8-core
 box with 16 GB RAM over the past ~4 months. I know this is kind of off
 topic for the questions discussed at the moment, but I guess those
 problems and fun are guaranteed for those having lots of ports and FreeBSD
 8 running on AMD64 ;-))

I compile gcc trunk on my 2-cpu amd64-based system almost
every day.  I don't see the performance issue you seem to
have.  Do you use ULE?  If yes, then switch to 4BSD.

-- 
Steve


Re: -m32 broken on bi-arch amd64 systems?

2008-12-23 Thread Steve Kargl
On Tue, Dec 23, 2008 at 12:55:04PM -0500, Josh Carroll wrote:
  I also noticed that behavior; shouldn't the compiler/linker look
  into /usr/lib32 without an additional -B switch?
  --
  regards, Maciej Suszko.
 
 
 I don't know if it should or should not, but I can confirm that this
 behavior was around in 7.0-RELEASE, so it's been that way for quite a
 while, at least in the 7 branch.
 

Sigh.  Read the list archives.  It's been this way since Peter
Wemm first introduced the ability to run i386 binaries on
amd64.

-- 
Steve


Re: ath0 induced panic additional info

2007-04-27 Thread Steve Kargl
On Fri, Apr 27, 2007 at 02:26:15PM -0700, Sam Leffler wrote:
 Steve Kargl wrote:
  By increasing the kernel message buffer, I was able to
  get the previous Unread portion in my last email.
  
  Unread portion of the kernel message buffer:
  lock order reversal: (sleepable after non-sleepable)
   1st 0xc34caec0 ath0 (ath0) @ /usr/src/sys/dev/ath/if_ath.c:5210
   2nd 0xc32cbe24 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074
  --- trap 0xc, eip = 0xc06e8056, esp = 0xd9753b74, ebp = 0xd9753bac ---
  generic_copyout(c34c8c00,c3726400,c34cab30,,...) at generic_copyout+0x36
  ieee80211_ioctl(c34ca230,c0286938,c3726400) at ieee80211_ioctl+0xc1
  ath_ioctl(c34c8c00,c0286938,c3726400) at ath_ioctl+0x190
 
 Age-old issue: the driver calls into the net80211 layer holding its
 softc lock, but net80211 calls copyout, and if that faults copying data to
 user mode then you'll blow up.  I've proposed a solution, but no one's
 responded, so it remains.
 

That's unfortunate. :(

OTOH, I've since updated the laptop to -current and ath0
is working great.  Thanks for your effort on this driver
and other parts of FreeBSD.

-- 
Steve


ath induced panic in -stable

2007-04-26 Thread Steve Kargl
In trying to update from 6.2-release to 6.2-stable,
I ran into a nasty panic which results in a corrupt
backtrace.  It looks like a cascade of panics.  In
6.2-release, I initialize my ath wireless NIC with the
following script

#! /bin/sh
ifconfig ath0 inet 192.168.0.10
ifconfig ath0 ssid My_ssid mode 11g channel 11 wepmode on
ifconfig ath0 wepkey 0xValid_WEP_key deftxkey 1
route add default 192.168.0.1

I can get to the net without a problem.  However, with up-to-date
6.2-stable sources, the above script will cause a panic.  In
trying various things, I've found that the mode 11g in the second
command is the guilty party.  Without mode 11g, I can once
again get to the net.  Here's the output of a kgdb session


Unread portion of the kernel message buffer:
ifhwioctl(c0286938,c34c4c00,c3723e80,c3722000) at ifhwioctl+0xa40
ifioctl(c355a000,c0286938,c3723e80,c3722000,0,...) at ifioctl+0xc3
soo_ioctl(c3512a68,c0286938,c3723e80,c3745180,c3722000) at soo_ioctl+0x2db
ioctl(c3722000,da95ad04) at ioctl+0x396
syscall(bfbf003b,3b,bfbf003b,805d028,0,...) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x28149787, esp = 0xbfbfe2fc, ebp 
= 0xbfbfe328 ---
KDB: enter: witness_checkorder
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 511MB (130786 pages) 495 479 463 447 431 415 399 383 367 351 335 319 
303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:165
#1  0xc0477d1b in db_fncall (dummy1=-1065228384, dummy2=0, 
dummy3=-1066610577, dummy4=0xda95a7c4 ??\225???l???\225???\225?\220\a)
at /usr/src/sys/ddb/db_command.c:492
#2  0xc0477b20 in db_command (last_cmdp=0xc07aef44, cmd_table=0x0, 
aux_cmd_tablep=0xc0764a34, aux_cmd_tablep_end=0xc0764a38)
at /usr/src/sys/ddb/db_command.c:350
#3  0xc0477be8 in db_command_loop () at /usr/src/sys/ddb/db_command.c:458
#4  0xc04797e5 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:222
#5  0xc0573997 in kdb_trap (type=3, code=0, tf=0xda95a904)
at /usr/src/sys/kern/subr_kdb.c:473
#6  0xc06e9a24 in trap (frame=
  {tf_fs = -627769336, tf_es = -1068040152, tf_ds = -1066205144, tf_edi = 
9, tf_esi = -1020494300, tf_ebp = -627726012, tf_isp = -627726032, tf_ebx = 
-1065345868, tf_edx = 0, tf_ecx = -1056878592, tf_eax = 31, tf_trapno = 3, 
tf_err = 0, tf_eip = -1068026085, tf_cs = 32, tf_eflags = 662, tf_esp = 
-627725960, tf_ss = -1067982253}) at /usr/src/sys/i386/i386/trap.c:594
#7  0xc06d7f5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#8  0xc057371b in kdb_enter (msg=0x1f Address 0x1f out of bounds)
at cpufunc.h:60
#9  0xc057e253 in witness_checkorder (lock=0xc32c7e24, flags=9, 
file=0xc075587c /usr/src/sys/vm/vm_map.c, line=3074)
at /usr/src/sys/kern/subr_witness.c:1079
#10 0xc0560a74 in _sx_xlock (sx=0xc32c7e24, 
file=0xc075587c /usr/src/sys/vm/vm_map.c, line=3074)
at /usr/src/sys/kern/kern_sx.c:171
#11 0xc067c273 in _vm_map_lock_read (map=0x1f, 
file=0xc1015000 Copyright (c) 1992-2007 The FreeBSD Project.\nCopyright 
(c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994\n\tThe Regents 
of the University of California. All rights reserved.\nFreeBSD is a re..., 
line=0) at /usr/src/sys/vm/vm_map.c:453
#12 0xc067f330 in vm_map_lookup (var_map=0xda95aa6c, vaddr=134602752, 
fault_typea=2 '\002', out_entry=0xda95aa70, object=0x1f, 
pindex=0xc1015000, out_prot=0x1f Address 0x1f out of bounds, 
wired=0xda95aa48) at /usr/src/sys/vm/vm_map.c:3074
#13 0xc06784bd in vm_fault (map=0xc32c7de0, vaddr=134602752, 
fault_type=2 '\002', fault_flags=8) at /usr/src/sys/vm/vm_fault.c:235
#14 0xc06e9bae in trap_pfault (frame=0xda95ab34, usermode=0, eva=134602752)
at /usr/src/sys/i386/i386/trap.c:722
#15 0xc06e98b1 in trap (frame=
  {tf_fs = -1065680888, tf_es = 40, tf_ds = -1066205144, tf_edi = 
134602752, tf_esi = -1019717632, tf_ebp = -627725396, tf_isp = -627725472, 
tf_ebx = 620, tf_edx = 0, tf_ecx = 155, tf_eax = 134603372, tf_trapno = 12, 
tf_err = 2, tf_eip = -1066500010, tf_cs = 32, tf_eflags = 66050, tf_esp = 
-1015923072, tf_ss = 155}) at /usr/src/sys/i386/i386/trap.c:435
#16 0xc06d7f5a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#17 0xc06e8056 in generic_copyout () at /usr/src/sys/i386/i386/support.s:760
Previous frame inner to this frame (corrupt stack?)

If one goes back up to the Unread portion above, on the console
I see a line about ath_ioctl, then frame #17.

-- 
Steve


ath0 induced panic additional info

2007-04-26 Thread Steve Kargl
By increasing the kernel message buffer, I was able to
get the previous Unread portion in my last email.

Unread portion of the kernel message buffer:
lock order reversal: (sleepable after non-sleepable)
 1st 0xc34caec0 ath0 (ath0) @ /usr/src/sys/dev/ath/if_ath.c:5210
 2nd 0xc32cbe24 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074
KDB: stack backtrace:
kdb_backtrace(0,,c07c3e08,c07c5500,c078596c,...) at kdb_backtrace+0x29
witness_checkorder(c32cbe24,9,c075587c,c02) at witness_checkorder+0x578
_sx_xlock(c32cbe24,c075587c,c02) at _sx_xlock+0x50
_vm_map_lock_read(c32cbde0,c075587c,c02,2000246,c3722068,...) at 
_vm_map_lock_read+0x37
vm_map_lookup(d9753a6c,805e000,2,d9753a70,d9753a60,d9753a64,d9753a47,d9753a48) 
at vm_map_lookup+0x28
vm_fault(c32cbde0,805e000,2,8,c34ee180,...) at vm_fault+0x65
trap_pfault(d9753b34,0,805e000) at trap_pfault+0xce
trap(c07b0008,28,c0730028,805e000,c334f400,...) at trap+0x319
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc06e8056, esp = 0xd9753b74, ebp = 0xd9753bac ---
generic_copyout(c34c8c00,c3726400,c34cab30,c0286938,0,...) at 
generic_copyout+0x36
ieee80211_ioctl(c34ca230,c0286938,c3726400) at ieee80211_ioctl+0xc1
ath_ioctl(c34c8c00,c0286938,c3726400) at ath_ioctl+0x190
ifhwioctl(c0286938,c34c8c00,c3726400,c34ee180) at ifhwioctl+0xa40
ifioctl(c355e000,c0286938,c3726400,c34ee180,0,...) at ifioctl+0xc3
soo_ioctl(c3516ab0,c0286938,c3726400,c3748480,c34ee180) at soo_ioctl+0x2db
ioctl(c34ee180,d9753d04) at ioctl+0x396
syscall(3b,3b,3b,805d028,0,...) at syscall+0x22f
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (54, FreeBSD ELF32, ioctl), eip = 0x28149787, esp = 0xbfbfe2fc, ebp 
= 0xbfbfe328 ---
KDB: enter: witness_checkorder
panic: from debugger
KDB: stack backtrace:
Uptime: 1m1s
Dumping 511 MB (2 chunks)
  chunk 0: 1MB (159 pages) ... ok
  chunk 1: 511MB (130786 pages) 495 479 463 447 431 415 399 383 367 351 335 319 
303 287 271 255 239 223 207 191 175 159 143 127 111 95 79 63 47 31 15

#0  doadump () at pcpu.h:165
165 pcpu.h: No such file or directory.
in pcpu.h
(kgdb) quit
mobile:root[157] exit
exit

Script done on Thu Apr 26 16:38:51 2007
-- 
Steve


Re: ath0 induced panic additional info

2007-04-26 Thread Steve Kargl
On Thu, Apr 26, 2007 at 10:44:52PM -0400, [EMAIL PROTECTED] wrote:
 in message [EMAIL PROTECTED],
 wrote Steve Kargl thusly...
 
  By increasing the kernel message buffer, I was able to
  get the previous Unread portion in my last email.
  
  Unread portion of the kernel message buffer:
  lock order reversal: (sleepable after non-sleepable)
   1st 0xc34caec0 ath0 (ath0) @ /usr/src/sys/dev/ath/if_ath.c:5210
   2nd 0xc32cbe24 user map (user map) @ /usr/src/sys/vm/vm_map.c:3074
 ...
 
 Oh yes, I got the problem with the ath interface in mode 11g (along
 with WPA & DHCP set in /etc/rc.conf); see "LOR - ath (similar to LOR
 #42) on FreeBSD 6-STABLE", [EMAIL PROTECTED].
 

It's a moot point in that the system has just been rebooted with -current.

If anyone wants the debug kernel and core dump, drop me an email.

-- 
Steve


Cardbus0: CIS pointer != 0 problem.

2006-07-24 Thread Steve Kargl
I have a colleague who installed FreeBSD 6.1-stable onto
an Alienware MJ-12 laptop.  A verbose dmesg is at
http://troutmask.apl.washington.edu/~kargl/alienware.dmesg

We are trying to get his wireless nic up, but seem to
have run into a cardbus issue.  I've built a custom kernel
and stripped out all unneeded device drivers.  During boot,
we see

cardbus0: CIS pointer is 0!
cardbus0: Resource not specified in CIS: id=10, size=100
cardbus0: Resource not specified in CIS: id=14, size=0
cardbus0: Resource not specified in CIS: id=1c, size=100
cardbus0: Resource not specified in CIS: id=24, size=80
cbb alloc res fail
found-> vendor=0x10de, dev=0x0299, revid=0xa1
bus=2, slot=0, func=0
class=03-00-00, hdrtype=0x00, mfdev=0
cmdreg=0x0003, statreg=0x0010, cachelnsz=8 (dwords)
lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
intpin=a, irq=255
powerspec 2  supports D0 D3  current D0
MSI supports 1 message, 64 bit

Has anyone seen this problem and do you have some recommendations
to fix or work around the issue?

-- 
Steve


Re: Cardbus0: CIS pointer != 0 problem.

2006-07-24 Thread Steve Kargl
On Mon, Jul 24, 2006 at 09:20:37PM -0500, John Merryweather Cooper wrote:
 Steve Kargl wrote:
 
 cardbus0: CIS pointer is 0!
 cardbus0: Resource not specified in CIS: id=10, size=100
 cardbus0: Resource not specified in CIS: id=14, size=0
 cardbus0: Resource not specified in CIS: id=1c, size=100
 cardbus0: Resource not specified in CIS: id=24, size=80
 cbb alloc res fail
 
 Has anyone seen this problem and do you have some recommendations
 to fix or work around the issue?
 
   
 This message most commonly comes up when the NIC/PCCARD is NOT supported
 by a native FreeBSD driver.  For example:

The card has an Atheros chip, and I know that it worked with
FreeBSD 6.1-RELEASE.  However, because of patches, I upgraded
to 6.1-stable, and an acpi failure may be confusing cardbus.

-- 
Steve


AMD64 kernel builds are broken

2006-06-15 Thread Steve Kargl
Doug,

Your recent commit appears to have broken buildkernel on AMD64.
For some reason the COMPAT_LINUX32 option is not honored, so I
get the wrong header files.

/usr/obj/usr/src/make.amd64/make -V CFILES -V SYSTEM_CFILES -V GEN_CFILES |  
MKDEP_CPP=cc -E CC=cc xargs mkdep -a -f .newdep -O2 -frename-registers 
-pipe -fno-strict-aliasing -march=opteron -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -fformat-extensions -std=c99 -g -nostdinc -I-  -I. 
-I/usr/src/sys -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter 
-I/usr/src/sys/contrib/pf -I/usr/src/sys/contrib/dev/ath 
-I/usr/src/sys/contrib/dev/ath/freebsd -I/usr/src/sys/contrib/ngatm 
-I/usr/src/sys/dev/twa -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include 
opt_global.h -fno-common -finline-limit=8000 --param inline-unit-growth=100 
--param large-function-growth=1000  -fno-omit-frame-pointer -mcmodel=kernel 
-mno-red-zone  -mfpmath=387 -mno-sse -mno-sse2 -mno-mmx -mno-3dnow  
-msoft-float -fno-asynchronous-unwind-tables -ffreestanding
/usr/src/sys/compat/linux/linux_stats.c:55:36: machine/../linux/linux.h: No 
such file or directory
/usr/src/sys/compat/linux/linux_stats.c:56:42: machine/../linux/linux_proto.h: 
No such file or directory
mkdep: compile failed
*** Error code 1

Stop in /usr/obj/usr/src/sys/HPC.
*** Error code 1

Stop in /usr/src.
*** Error code 1

-- 
Steve


Re: Broadcomm BCM4401-B0 and memory upgrade issue.

2006-01-24 Thread Steve Kargl
On Tue, Jan 24, 2006 at 03:12:17PM -0600, Mark Tinguely wrote:
 Have you tried to boot with the old contigmalloc using the sysctl
 option vm.old_contigmalloc=1?

Yes.  This makes an enormous difference in boot up times.
With vm.old_contigmalloc=1, fxp0 probes within a few seconds.
Without it, fxp0 takes more than 7 minutes to probe.

 Some people are seeing slow boot/configuration with the new-style
 vm_page_alloc_contig/contigmalloc.

yep.

 I am doing some profiling of vm_page_alloc_contig() and have found
 that larger physical memory configurations make these problems much worse.

yep.  I have 12 GB.

 I identified 4-5 places that can be changed to decrease the number
 of pages that need to be checked before allocating a range.
 Some of the changes will only occasionally save a few page checks, but
 other changes could save several hundred or more page checks on every call.

If you come up with a patch, I'm more than willing to test it.

-- 
Steve


Re: Problems with AMD64 and 8 GB RAM?

2005-03-30 Thread Steve Kargl
On Thu, Mar 31, 2005 at 07:54:39AM +0930, Greg 'groggy' Lehey wrote:
 None of these problems occur when I use 4 GB memory.  About the only
 strangeness, which seems to come from the BIOS, is that it recognizes
 only 3.5 GB.  If I put all DIMMS in, it recognizes the full 8 GB
 memory.
 
 I realize that this isn't enough to diagnose the problem.  The reason
 for this message now is to ask:
 
 1.  Has anybody else seen this problem?
 2.  Has anybody else used this hardware configuration and *not* seen
 this problem?
 3.  Where should I look next?
 

Have you run sysutils/memtest86 with the 8 GB?  I found
4 bad DIMMs out of the 12 I tested; they were Crucial
PC2700 2GB Reg. ECC DIMMs.

-- 
Steve


Re: Problems with AMD64 and 8 GB RAM?

2005-03-30 Thread Steve Kargl
On Thu, Mar 31, 2005 at 08:14:45AM +0930, Greg 'groggy' Lehey wrote:
 On Wednesday, 30 March 2005 at 14:35:46 -0800, Steve Kargl wrote:
  On Thu, Mar 31, 2005 at 07:54:39AM +0930, Greg 'groggy' Lehey wrote:
  None of these problems occur when I use 4 GB memory.  About the only
  strangeness, which seems to come from the BIOS, is that it recognizes
  only 3.5 GB.  If I put all DIMMS in, it recognizes the full 8 GB
  memory.
 
  I realize that this isn't enough to diagnose the problem.  The reason
  for this message now is to ask:
 
  3.  Where should I look next?
 
  Have you run sysutils/memtest86 with the 8 GB?
 
 Heh.  Difficult when the system doesn't run.

That's what happens when 1 of 8 (or 1 of 4?) DIMMs is bad :-)

  I had 4 bad out of 12 tested where the DIMMs were Crucial PC2700 2GB
  Reg. ECC DIMMs.
 
 OK, this makes sense.  It might also explain why the 4 GB
 configuration only recognizes 3.5 GB.

Search the amd64 mailing list.  The missing memory is reserved for
something which escapes me at the moment.  It is similar to the
infamous ISA memory hole.

-- 
Steve


Re: Problems with AMD64 and 8 GB RAM?

2005-03-30 Thread Steve Kargl
On Thu, Mar 31, 2005 at 10:32:33AM +0930, Daniel O'Connor wrote:
 On Thu, 31 Mar 2005 08:14, Greg 'groggy' Lehey wrote:
   Have you run sysutils/memtest86 with the 8 GB?
 
  Heh.  Difficult when the system doesn't run.
 
 You could try http://www.memtest86.com although that doesn't do 4Gb :(
 

http://www.memtest.org/

-- 
Steve


Re: ATA mkIII first official patches - please test!

2005-02-03 Thread Steve Kargl
On Thu, Feb 03, 2005 at 09:52:57PM +0100, Søren Schmidt wrote:
 
 As usual, even if it works on all the HW I have here in the lab, that's by
 far not the same as it working on YOUR system. So use gloves and safety
 shoes, and if it breaks I don't want the pieces, but I would like to hear
 the nifty details on how exactly it got that way :)
 

THANK YOU!  This is the first time since 7 Dec 04 that I've
been able to boot a current -CURRENT on my Dell Inspiron 4150
laptop.

-- 
Steve