Re: SCHED_ULE should not be the default

2011-12-22 Thread Adrian Chadd
Are you able to go through the emails here and grab out Attilio's
example for generating KTR scheduler traces?


Adrian

On 21 December 2011 16:52, Steve Kargl s...@troutmask.apl.washington.edu 
wrote:
 On Fri, Dec 16, 2011 at 12:14:24PM +0100, Attilio Rao wrote:
 2011/12/15 Steve Kargl s...@troutmask.apl.washington.edu:
  On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
 
  I basically went through all the e-mail you just sent and identified 4
  real reports on which we could work, summarized in the attached
  Excel file.
  I'd like George, Steve, Doug, Andrey and Mike to review the
  few data points there and add more, if they want, or make more
  important clarifications, in particular about the Xorg presence (or
  absence) in their workload.
 
  Your summary of my observations appears correct.
 
  I have grabbed an up-to-date /usr/src, built and
  installed world, and built and installed a new
  kernel on one of the nodes in my cluster.  It
  has
 

 It seems a perfect environment; just please make sure you built a
 debug-free userland (basically, setting MALLOC_PRODUCTION for jemalloc).

 The first thing is, can you try reproducing your case? As far as I
 understood it, for you it was enough to run N + small_amount CPU-bound
 threads to show the performance penalty, so I'd ask you to start with
 dnetc or just your preferred CPU-bound workload and verify you can
 reproduce the issue.
 While it happens, please monitor the thread bouncing and CPU utilization
 via 'top' (you don't need to be 100% precise, just to get an idea), and
 keep an eye on things like excessive thread migration, overly sticky
 thread binding, and low CPU throughput.
 One note: if your workloads need to do I/O, please use a tmpfs or
 memory-backed storage, in order to eliminate I/O effects as much as
 possible.
 Also, verify this doesn't happen with the 4BSD scheduler, just in case.

 Finally, if the problem is still in place, please recompile your
 kernel by adding:
 options KTR
 options KTR_ENTRIES=262144
 options KTR_COMPILE=(KTR_SCHED)
 options KTR_MASK=(KTR_SCHED)

 And reproduce the issue.
 When you are in the middle of the scheduling issue go with:
 # ktrdump -ctf > ktr-ule-problem-YOURNAME.out

 and send to the mailing list along with your dmesg and the
 information on the CPU utilization you gathered with top(1).

 That should cover it all, but if you have further questions, please
 just go ahead.

 Attilio,

 I have placed several files at

 http://troutmask.apl.washington.edu/~kargl/freebsd

 dmesg.txt      -- dmesg for ULE kernel
 summary        -- A summary that includes top(1) output of all runs.
 sysctl.ule.txt -- sysctl -a for the ULE kernel
 ktr-ule-problem-kargl.out.gz

 I performed a series of tests with both 4BSD and ULE kernels.
 The 4BSD and ULE kernels are identical except of course for the
 scheduler.  Both witness and invariants are disabled, and malloc
 has been compiled without debugging.

 Here's what I did.  On the master node in my cluster, I ran an
 OpenMPI code that sends N jobs off to the node with the kernel
 of interest.  There is communication between the master and
 slaves to generate 16 independent chunks of data.  Note, there
 is no disk IO.  So, for example, N=4 will start 4 essentially
  identical, numerically intensive jobs.  At the start of a run,
 the master node instructs each slave job to create a chunk of
 data.  After the data is created, the slave sends it back to the
 master and the master sends instructions to create the next chunk
 of data.  This communication continues until the 16 chunks have
 been assigned, computed, and returned to the master.
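
 To make the communication pattern concrete, here is a minimal MPI
 master/worker sketch in C of the protocol described above.  This is an
 illustration only, not the actual sasmp code: the chunk count, the
 message tags, and the stand-in compute_chunk() function are all
 hypothetical.

/*
 * Hypothetical sketch of the master/worker pattern described above;
 * NOT the actual sasmp code.  Build with something like:
 *     mpicc -O2 -o sketch sketch.c
 */
#include <mpi.h>

#define NCHUNKS 16          /* 16 independent chunks, as in the test */

static double compute_chunk(int chunk)     /* stand-in for real work */
{
    double s = 0.0;
    for (long i = 1; i < 50000000L; i++)
        s += 1.0 / (double)(i + chunk);
    return s;
}

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {        /* master: deal out chunks, gather results */
        int next = 0, active = 0;
        double result;
        MPI_Status st;

        for (int w = 1; w < size && next < NCHUNKS; w++, next++, active++)
            MPI_Send(&next, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        while (active > 0) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                MPI_COMM_WORLD, &st);
            active--;
            if (next < NCHUNKS) {   /* hand the idle worker a new chunk */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, 0,
                    MPI_COMM_WORLD);
                next++;
                active++;
            }
        }
        for (int w = 1; w < size; w++) {    /* tell the workers to quit */
            int stop = -1;
            MPI_Send(&stop, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
    } else {                /* worker: compute until told to stop */
        for (;;) {
            int chunk;
            double result;

            MPI_Recv(&chunk, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                MPI_STATUS_IGNORE);
            if (chunk < 0)
                break;
            result = compute_chunk(chunk);
            MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}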

 Here is a rough measurement of the problem with ULE and numerically
 intensive loads.  This command is executed on the master:

 time mpiexec -machinefile mf3 -np N sasmp sas.in

 Since time is executed on the master, only the 'real' time is of
 interest (the summary file includes user and sys times).  This
 command is run 5 times for each N value, and up to 10 times for
 some N values with the ULE kernel.  The following table records
 the average 'real' time; the number in (...) is the mean
 absolute deviation.

 #  N         ULE             4BSD
 # ------------------------------------
 #  4    223.27 (0.502)   221.76 (0.551)
 #  5    404.35 (73.82)   270.68 (0.866)
 #  6    627.56 (173.0)   247.23 (1.442)
 #  7    475.53 (84.07)   285.78 (1.421)
 #  8    429.45 (134.9)   223.64 (1.316)

 These numbers demonstrate to me that ULE is not a good choice
 for an HPC workload.

 If you need more information, feel free to ask.  If you would
 like access to the node, I can probably arrange that.  But,
 we can discuss that off-line.

 --
 Steve

Re: directory listing hangs in ufs state

2011-12-22 Thread Kostik Belousov
On Wed, Dec 21, 2011 at 09:03:02PM +0400, Andrey Zonov wrote:
 On 15.12.2011 17:01, Kostik Belousov wrote:
 On Thu, Dec 15, 2011 at 03:51:02PM +0400, Andrey Zonov wrote:
 On Thu, Dec 15, 2011 at 12:42 AM, Jeremy Chadwick
 free...@jdc.parodius.com wrote:
 
 On Wed, Dec 14, 2011 at 11:47:10PM +0400, Andrey Zonov wrote:
 On 14.12.2011 22:22, Jeremy Chadwick wrote:
 On Wed, Dec 14, 2011 at 10:11:47PM +0400, Andrey Zonov wrote:
 Hi Jeremy,
 
 This is not hardware problem, I've already checked that. I also ran
 fsck today and got no errors.
 
 After some more exploration of how mongodb works, I found that when
 the listing hangs, one of the mongodb threads is in the "biowr" state
 for a long time.  It periodically calls msync(MS_SYNC), according to
 the ktrace output.
 
 If I remove the msync() calls from mongodb, how often will the data be
 synced by the OS?
 
 --
 Andrey Zonov
 
 On 14.12.2011 2:15, Jeremy Chadwick wrote:
 On Wed, Dec 14, 2011 at 01:11:19AM +0400, Andrey Zonov wrote:
 
 Have you any ideas what is going on? or how to catch the problem?
 
 Assuming this isn't a file on the root filesystem, try booting the
 machine in single-user mode and using fsck -f on the filesystem in
 question.
 
 Can you verify there's no problems with the disk this file lives on 
 as
 well (smartctl -a /dev/disk)?  I'm doubting this is the problem, but
 thought I'd mention it.
 
 I have no real answer, I'm sorry.  msync(2) indicates it's effectively
 deprecated (see BUGS).  It looks like this is effectively an
 mmap version of fsync(2).
 
 I replaced msync(2) with fsync(2).  Unfortunately, from the man pages
 it is not obvious that I can do this.  Anyway, thanks.
 
 Sorry, that wasn't what I was implying.  Let me try to explain
 differently.
 
 msync(2) looks, to me, like an mmap-specific version of fsync(2).  Based
 on the man page, it seems that with msync() you can effectively
 guarantee flushing of certain pages within an mmap()'d region to disk.
 fsync() would cause **all** buffers/internal pages to be flushed to
 disk.
 
 One would need to look at the code of mongodb to find out what it's
 actually doing with msync().  That is to say, if it's doing something
 like this (I probably have the semantics wrong -- I've never spent much
 time with mmap()):
 
 fd = open("/some/file", O_RDWR);
 ptr = mmap(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
 ret = msync(ptr, 65536, MS_SYNC);
 /* or alternatively, flushing the whole mapping with:
 ret = msync(ptr, 0, MS_SYNC);
 */
 
 Then this, to me, would be mostly equivalent to:
 
 fd = open("/some/file", O_RDWR);
 ret = fsync(fd);
 
 Otherwise, if it's calling msync() only on an address/location within
 the region ptr points to, then that may be more efficient (fewer pages
 to flush).
 
 
 They call msync() for the whole file.  So, there will not be any 
 difference.
 
 
 The mmap() arguments -- specifically flags (see man page) -- also play
 a role here.  The one that catches my attention is MAP_NOSYNC.  So you
 may need to look at the mongodb code to figure out what its mmap()
 call looks like.
 
 One might wonder why they don't just use open() with the O_SYNC flag.
 I imagine that has to do with, again, performance; possibly they don't
 want all I/O synchronous, and would rather flush certain pages in the
 mmap'd region to disk as needed.  I see the legitimacy in that
 approach (vs. just using O_SYNC).
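
 For illustration, here is a minimal self-contained sketch (again, not
 mongodb's actual code) of an mmap() call using the MAP_NOSYNC flag
 under discussion; "/some/file" is the same placeholder path as above.

/* Sketch (not mongodb's code): map a file with MAP_NOSYNC so dirty
 * pages are flushed on explicit msync() rather than by the syncer. */
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/some/file", O_RDWR);
    if (fd == -1)
        return 1;
    char *p = mmap(NULL, 65536, PROT_READ | PROT_WRITE,
        MAP_SHARED | MAP_NOSYNC, fd, 0);
    if (p == MAP_FAILED)
        return 1;
    p[0] = 1;                       /* dirty one page */
    (void)msync(p, 4096, MS_SYNC);  /* flush just that page, on demand */
    munmap(p, 65536);
    close(fd);
    return 0;
}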
 
 There's really no easy way for me to tell you which is more efficient,
 better, blah blah without spending a lot of time with a benchmarking
 program that tests all of this, *plus* an entire system (world) built
 with profiling.
 
 
 I ran mongodb with fsync() for two hours and got the following:
 STARTED  INBLK OUBLK MAJFLT MINFLT
 Thu Dec 15 10:34:52 2011 3 192744314 3080182
 
 This is output of `ps -o lstart,inblock,oublock,majflt,minflt -U mongodb'.
 
 Then I ran it with default msync():
 STARTED  INBLK OUBLK MAJFLT MINFLT
 Thu Dec 15 12:34:53 2011 0 7241555 79 5401945
 
 There are also two graphs of disk busyness [1] [2].
 
 The difference is significant: a factor of 37!  That is what I expected
 to get.
 
 In the comments for vm_object_page_clean() I found this:
 
   *  When stuffing pages asynchronously, allow clustering.  XXX we need a
   *  synchronous clustering mode implementation.
 
 To me this means that msync(MS_SYNC) flushes every page to disk in its
 own single IO transaction.  If we multiply 4K by 37 we get about 150K,
 which is the size of a single clustered transaction in my experience.
 
 +alc@, kib@
 
 Am I right? Is there any plan to implement this?
 Current buffer clustering code can only do async writes.  In fact, I
 am not quite sure what would constitute sync clustering, because the
 ability to delay a write is important for being able to cluster at all.
 
 Also, I am not sure that lack of clustering is the biggest problem.
 IMO, the fact that each write is sync is the first problem there.  It
 would be quite some work to add the tracking of the issued writes

Re: SCHED_ULE should not be the default

2011-12-22 Thread Luigi Rizzo
On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
 [...]

 Here is a rough measurement of the problem with ULE and numerically
 intensive loads.  This command is executed on the master:
 
 time mpiexec -machinefile mf3 -np N sasmp sas.in
 
 Since time is executed on the master, only the 'real' time is of
 interest (the summary file includes user and sys times).  This
 command is run 5 times for each N value, and up to 10 times for
 some N values with the ULE kernel.  The following table records
 the average 'real' time; the number in (...) is the mean
 absolute deviation.
 
 #  N         ULE             4BSD
 # ------------------------------------
 #  4    223.27 (0.502)   221.76 (0.551)
 #  5    404.35 (73.82)   270.68 (0.866)
 #  6    627.56 (173.0)   247.23 (1.442)
 #  7    475.53 (84.07)   285.78 (1.421)
 #  8    429.45 (134.9)   223.64 (1.316)

One explanation for the 1.5-2x longer run times is that with ULE the
threads are not migrated properly, so you end up with idle cores
and ready threads not running (the other possible explanation
would be that there are migrations, but they are so frequent and
expensive that they completely trash the caches. But this seems
unlikely for this type of task).

Also, perhaps one could build a simple test process that replicates
this workload (so one can run it as part of regression tests):
1. define a CPU-intensive function f(n) which issues no
   system calls, optionally touching
   a lot of memory, where n  

Re: Using mmap(2) with a hint address

2011-12-22 Thread Ganael LAPLANCHE
Hi Artem, Tijl,

On Tue, 20 Dec 2011 09:27:43 -0800, Artem Belevich wrote
 Something like that. [...]
 These days malloc() by default uses mmap, so if you don't force it to
 use sbrk() you can probably lower MAXDSIZE and let the kernel use most
 of the address space for hinted mmaps.
 [...]

On Tue, 20 Dec 2011 18:45:08 +0100, Tijl Coosemans wrote
 I don't know about NetBSD but Linux maps from the stack 
 downwards when there's no hint and FreeBSD maps from the 
 program upwards. [...]
 malloc(3) used to be implemented on top of brk(2), so the size was
 increased on amd64 so that you could malloc more memory.  Nowadays malloc
 can use mmap(2), so a large datasize isn't really needed anymore.

I will use setrlimit(2) to lower datasize then.
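
For example, a minimal sketch of that approach (the 512 MB limit and
the 4 GiB hint are arbitrary example values of mine, not
recommendations from this thread):

/*
 * Minimal sketch: shrink RLIMIT_DATA so less of the address space is
 * reserved for brk()-style allocations, then mmap() with a hint.  The
 * 512 MB limit and the 4 GiB hint are arbitrary example values.
 */
#include <sys/resource.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
    struct rlimit rl = { 512UL << 20, 512UL << 20 };
    void *p;

    if (setrlimit(RLIMIT_DATA, &rl) == -1) {
        perror("setrlimit");
        return 1;
    }
    /* the region above the (now smaller) data segment is free for hints */
    p = mmap((void *)(1UL << 32), 1UL << 20, PROT_READ | PROT_WRITE,
        MAP_ANON | MAP_PRIVATE, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("mapped at %p\n", p);
    return 0;
}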

Thanks a lot for your time and explanations,
Best regards,

--
Ganael LAPLANCHE ganael.laplan...@martymac.org
http://www.martymac.org | http://contribs.martymac.org
FreeBSD: martymac marty...@freebsd.org, http://www.FreeBSD.org


Re: SCHED_ULE should not be the default

2011-12-22 Thread George Mitchell

On 12/22/11 04:07, Adrian Chadd wrote:

Are you able to go through the emails here and grab out Attilio's
example for generating KTR scheduler traces?


Adrian
[...]

I've put up two such files:
http://www.m5p.com/~george/ktr-ule-problem.out
http://www.m5p.com/~george/ktr-ule-interact.out
but I don't know how to analyze them myself.  What do all of us do next?
-- George Mitchell


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 01:07:58AM -0800, Adrian Chadd wrote:
 Are you able to go through the emails here and grab out Attilio's
 example for generating KTR scheduler traces?
 

Did you read this part of my email?

 
  Attilio,
 
  I have placed several files at
 
  http://troutmask.apl.washington.edu/~kargl/freebsd
 
  dmesg.txt  -- dmesg for ULE kernel
  summary-- A summary that includes top(1) output of all runs.
  sysctl.ule.txt -- sysctl -a for the ULE kernel
  ktr-ule-problem-kargl.out.gz

ktr-ule-problem-kargl.out is a 43 MB file.  I don't think the
freebsd.org email server would allow that file through.

-- 
Steve


emacs-devel glib-warning

2011-12-22 Thread 1126

Hello!

I considered switching from emacs23 to emacs24 over the
Christmas holidays, so I removed emacs23 and installed emacs-devel via
ports.  Emacs runs fine in a terminal, but it crashes my whole X session
when I try to start it as an X client.  The error message tells me that
there is a glib problem: "GLib-WARNING **: In call to g_spawn_sync(), exit
status of a child process was requested but SIGCHLD action was set to
SIG_IGN and ECHILD was received by waitpid(), so exit status can't be
returned.  This is a bug in the program calling g_spawn_sync(); either
don't request the exit status, or don't set the SIGCHLD action."


I have emacs-devel installed, and glib-2.28.8_2.

I am running xmonad as my WM, but it happened on awesome as well.

Does anyone know what I can do to get emacs to work as an X client? ;)

Thanks in advance!
Greetings from rainy Cologne,
1126



Re: FreeBSD 9 RC3 and VirtualBox

2011-12-22 Thread Joshua Boyd
On Wed, Dec 21, 2011 at 11:56 PM, Adam Vande More amvandem...@gmail.com wrote:

 VT-x (or the AMD equivalent) is a CPU feature and is necessary to run
 64-bit guests.  VT-d (or the AMD equivalent)/IOMMU is what is done in
 the chipset; however, it isn't necessary to run 64-bit guests.  Both of
 these features are only found on CPUs supporting long mode.


Exactly. The E7300 lacks the VT-x bits.


-- 
Joshua Boyd

E-mail: boy...@jbip.net
http://www.jbip.net


Re: SCHED_ULE should not be the default

2011-12-22 Thread Oliver Brandmueller
On Thu, Dec 15, 2011 at 05:25:51PM +0100, Attilio Rao wrote:
 If someone else thinks he has a specific problem that is not
 characterized by one of the cases above please let me know and I will
 put this in the chart.

It seems I stumbled over another thing.

Setup: two servers providing devices via ggated, and one server using
ggatec for those devices.  Each zpool sits on a pair of disks, one
provided by each ggated server.  I use rsync to fill up the 6 zpools/zfs
from an existing storage (2 TB zpools, about 500 to 700 GiB used per
pool), with 2 rsyncs running in parallel to fill the partitions.  The
main server (ggate client with ZFS and rsync) has an Intel Xeon X3450
2.66 GHz quad-core processor (+HTT or whatever it's called nowadays,
which gives 8 cpus in FreeBSD).

With ULE, ZFS gets slower after some time and finally gets stuck after 1
to 3 days of continuous synchronisation (ggate works like a charm as far
as I can tell); with 4BSD (online for 6 days now) the rsync seems to run
a lot faster, and I didn't get ZFS to stall.  There's nearly no local
I/O (the system is on a local SSD), and the load/CPU usage is not
actually high.

Everything is running a quite recent RELENG_9.

If anyone's interested, I can provide more detail and carry out some tests.


- Oliver


-- 
| Oliver Brandmueller  http://sysadm.in/ o...@sysadm.in |
| I am the Internet.  So help me God. |


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
 On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
 
 I have placed several files at
 
 http://troutmask.apl.washington.edu/~kargl/freebsd
 
 dmesg.txt  -- dmesg for ULE kernel
 summary-- A summary that includes top(1) output of all runs.
 sysctl.ule.txt -- sysctl -a for the ULE kernel
 ktr-ule-problem-kargl.out.gz 
 
 
 Since time is executed on the master, only the 'real' time is of
 interest (the summary file includes user and sys times).  This
 command is run 5 times for each N value, and up to 10 times for
 some N values with the ULE kernel.  The following table records
 the average 'real' time; the number in (...) is the mean
 absolute deviation.
 
 #  N         ULE             4BSD
 # ------------------------------------
 #  4    223.27 (0.502)   221.76 (0.551)
 #  5    404.35 (73.82)   270.68 (0.866)
 #  6    627.56 (173.0)   247.23 (1.442)
 #  7    475.53 (84.07)   285.78 (1.421)
 #  8    429.45 (134.9)   223.64 (1.316)
 
 One explanation for the 1.5-2x longer run times is that with ULE the
 threads are not migrated properly, so you end up with idle cores
 and ready threads not running

That's what I guessed back in 2008 when I first reported the
behavior.  

http://freebsd.monkey.org/freebsd-current/200807/msg00278.html
http://freebsd.monkey.org/freebsd-current/200807/msg00280.html

The top(1) output at the above URL shows 10 completely independent
instances of the same numerically intensive application running
on a circa 2008 ULE kernel.  Look at the PRI column.  The high
PRI jobs are not only pinned to a cpu, but these are running at
100% WCPU.  The low PRI jobs seem to be pinned to a subset of the
available cpus and simply ping-pong in and out of the same cpus.
In this instance, there are 5 jobs competing for time on 3 cpus.

 Also, perhaps one could build a simple test process that replicates
 this workload (so one can run it as part of regression tests):
   1. define a CPU-intensive function f(n) which issues no
  system calls, optionally touching
  a lot of memory, where n determines the number of iterations.
   2. by trial and error (or let the program find it),
  pick a value N1 so that the minimum execution time
  of f(N1) is in the 10..100ms range
   3. now run the function f() again from an outer loop so
  that the total execution time is large (10..100s),
  again with no intervening system calls.
   4. use an external shell script to rerun a process
  when it terminates, and then run multiple instances
  in parallel.  Instead of the external script one could
  fork new instances before terminating, but I am a bit
  unclear how CPU inheritance works when a process forks.
  Going through the shell possibly breaks the chain.

The tests at the above URL do essentially what you
propose, except that in 2008 the kzk90 programs were doing
some IO.
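
For reference, here is a minimal sketch in C of the test function Luigi
describes above (my illustration, not his or Steve's code; the default
N1 and repetition counts are placeholders to be tuned per machine):

/* Sketch of the proposed regression test: a syscall-free CPU-bound
 * inner function f(n), repeated by an outer loop.  The default n1
 * below is a placeholder; per step 2 it should be tuned so that one
 * call to f(n1) takes roughly 10..100 ms on the machine under test. */
#include <stdlib.h>

static volatile double sink;    /* keep the loop from being optimized away */

static void f(long n)
{
    double s = 0.0;
    for (long i = 1; i <= n; i++)
        s += 1.0 / (double)i;
    sink = s;
}

int main(int argc, char **argv)
{
    long n1   = (argc > 1) ? atol(argv[1]) : 10000000L; /* ~10..100 ms */
    long reps = (argc > 2) ? atol(argv[2]) : 1000L;     /* total 10..100 s */

    for (long r = 0; r < reps; r++)     /* no intervening system calls */
        f(n1);
    return 0;
}

Per step 4, a shell script would then run N+1 instances of this in
parallel on an N-CPU machine, restarting each instance as it exits.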

-- 
Steve


Re: FreeBSD 9 RC3 and VirtualBox

2011-12-22 Thread Jim King

On 12/22/2011 9:56 AM, Joshua Boyd wrote:

On Wed, Dec 21, 2011 at 11:56 PM, Adam Vande More amvandem...@gmail.com wrote:


VT-x (or the AMD equivalent) is a CPU feature and is necessary to run
64-bit guests.  VT-d (or the AMD equivalent)/IOMMU is what is done in
the chipset; however, it isn't necessary to run 64-bit guests.  Both of
these features are only found on CPUs supporting long mode.


Exactly. The E7300 lacks the VT-x bits.


Actually there are three different part numbers for the E7300.  Two of 
them have VT-x, one does not.




Re: emacs-devel glib-warning

2011-12-22 Thread John Baldwin
On Thursday, December 22, 2011 10:02:09 am 1126 wrote:
 Hello!
 
 I considered switching from emacs23 to emacs24 over the
 Christmas holidays, so I removed emacs23 and installed emacs-devel via
 ports.  Emacs runs fine in a terminal, but it crashes my whole X session
 when I try to start it as an X client.  The error message tells me that
 there is a glib problem: "GLib-WARNING **: In call to g_spawn_sync(),
 exit status of a child process was requested but SIGCHLD action was set
 to SIG_IGN and ECHILD was received by waitpid(), so exit status can't be
 returned.  This is a bug in the program calling g_spawn_sync(); either
 don't request the exit status, or don't set the SIGCHLD action."

That is just a bug in emacs (or some library emacs is using).  It happens even
when emacs doesn't crash.  I suspect it is unrelated to the problem you are
having with your X server and that the crash is caused by something else emacs
is doing.  What do you mean in detail by "crashes my whole X-system"?  Does X
actually core dump?  Does X freeze or spin using 100% CPU?  Does your window
manager crash, etc.?
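
The failure mode John describes can be shown in a few lines of C,
independent of emacs or glib (a standalone sketch of mine): once SIGCHLD
is set to SIG_IGN, POSIX lets the kernel auto-reap children, so
waitpid() fails with ECHILD and the child's exit status is lost.

/* Demo: with SIGCHLD set to SIG_IGN, children are auto-reaped and
 * waitpid() returns -1 with errno == ECHILD -- the same situation
 * g_spawn_sync() complains about in the warning quoted above. */
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    signal(SIGCHLD, SIG_IGN);   /* the problematic action */

    pid_t pid = fork();
    if (pid == 0)
        _exit(42);              /* child's status is discarded */

    int status;
    if (waitpid(pid, &status, 0) == -1 && errno == ECHILD)
        printf("waitpid: ECHILD -- exit status can't be returned\n");
    return 0;
}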

One thing you can maybe try is building emacs without dbus or gconf and seeing 
if that works better.

-- 
John Baldwin


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 11:31:45AM +0100, Luigi Rizzo wrote:
 On Wed, Dec 21, 2011 at 04:52:50PM -0800, Steve Kargl wrote:
  
  I have placed several files at
  
  http://troutmask.apl.washington.edu/~kargl/freebsd
  
  dmesg.txt  -- dmesg for ULE kernel
  summary-- A summary that includes top(1) output of all runs.
  sysctl.ule.txt -- sysctl -a for the ULE kernel
  ktr-ule-problem-kargl.out.gz 

I've replaced the original version of the ktr file with
a new version.  The old version was corrupt due to my
failure to set 'sysctl debug.ktr.mask=0' prior to the
dump.

 One explanation for the 1.5-2x longer run times is that with ULE the
 threads are not migrated properly, so you end up with idle cores
 and ready threads not running (the other possible explanation
 would be that there are migrations, but they are so frequent and
 expensive that they completely trash the caches. But this seems
 unlikely for this type of task).

I've used schedgraph to look at the ktrdump output.  A jpg is
available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
This shows the ping-pong effect, where 3 processes appear to be
using 2 cpus while the remaining 2 processes are pinned to their
cpus.

-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Andriy Gapon
on 22/12/2011 20:45 Steve Kargl said the following:
 I've used schedgraph to look at the ktrdump output.  A jpg is
 available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
  This shows the ping-pong effect, where 3 processes appear to be
 using 2 cpus while the remaining 2 processes are pinned to their
 cpus.

I'd recommend enabling CPU-specific background colors via the menu in
schedgraph for a better illustration of your findings.

NB: I still don't understand the point of purposefully running N+1 CPU-bound
processes.

-- 
Andriy Gapon


Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
 on 22/12/2011 20:45 Steve Kargl said the following:
  I've used schedgraph to look at the ktrdump output.  A jpg is
  available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
   This shows the ping-pong effect, where 3 processes appear to be
  using 2 cpus while the remaining 2 processes are pinned to their
  cpus.
 
  I'd recommend enabling CPU-specific background colors via the menu in
 schedgraph for a better illustration of your findings.
 
 NB: I still don't understand the point of purposefully running N+1 CPU-bound
 processes.
 

The point is that this is a node in an HPC cluster with
multiple users.  Sure, I can start my job on this node
with only N cpu-bound jobs.  Now, when user John Doe
wants to run his OpenMPI program, should he log into
the 12 nodes in the cluster to see if someone is already
running N cpu-bound jobs on a given node?  4BSD
gives my jobs and John Doe's jobs a fair share of the
available cpus.  ULE does not give a fair share, and
if you read the summary file I put up on the web,
you see that it is fairly non-deterministic when an
OpenMPI run will finish (see the mean absolute deviations
in the table of 'real' times that I posted).

There is the additional observation in one of my 2008
emails (URLs have been posted) that if you have N+1
cpu-bound jobs with, say, job0 and job1 ping-ponging
on cpu0 (due to ULE's cpu-affinity feature) and if I
kill job2 running on cpu1, then neither job0 nor job1
will migrate to cpu1.  So, one now has N cpu-bound
jobs running on N-1 cpus.

Finally, my initial post in this email thread was to
tell O. Hartmann to quit beating his head against
a wall with ULE (in an HPC environment).  Switch to
4BSD.  This was based on my 2008 observations, and
I've now wasted 2 days gathering additional information
which only re-affirms my recommendation.
 
-- 
Steve


Re: SCHED_ULE should not be the default

2011-12-22 Thread Andriy Gapon
on 22/12/2011 21:47 Steve Kargl said the following:
 On Thu, Dec 22, 2011 at 09:01:15PM +0200, Andriy Gapon wrote:
 on 22/12/2011 20:45 Steve Kargl said the following:
 I've used schedgraph to look at the ktrdump output.  A jpg is
 available at http://troutmask.apl.washington.edu/~kargl/freebsd/ktr.jpg
 This shows the ping-pong effect where here 3 processes appear to be
 using 2 cpus while the remaining 2 processes are pinned to their
 cpus.

 I'd recommended enabling CPU-specific background colors via the menu in
 schedgraph for a better illustration of your findings.

 NB: I still don't understand the point of purposefully running N+1 CPU-bound
 processes.

 
 The point is that this is a node in an HPC cluster with
 multiple users.  Sure, I can start my job on this node
 with only N cpu-bound jobs.  Now, when user John Doe
 wants to run his OpenMPI program, should he log into
 the 12 nodes in the cluster to see if someone is already
 running N cpu-bound jobs on a given node?  4BSD
 gives my jobs and John Doe's jobs a fair share of the
 available cpus.  ULE does not give a fair share, and
 if you read the summary file I put up on the web,
 you see that it is fairly non-deterministic when an
 OpenMPI run will finish (see the mean absolute deviations
 in the table of 'real' times that I posted).

OK.
I think I know why the uneven load occurs.  I remember even trying to explain my
observations.
There are two things:
1. ULE has neither a runqueue common across all CPUs nor any other kind of
mechanism for enforcing true global fairness of CPU resource sharing.
2. ULE's rebalancing code is biased, and that leads to a situation where
sub-groups of threads can share subsets of CPUs rather fairly, but there
is no global fairness.

I haven't really given any thought as to how to fix or work around these
issues.  One dumb idea is to add an element of randomness to the choice
between equally loaded CPUs (and their subsets) instead of having a
permanent bias.
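
As a toy illustration of that idea (mine, and nothing like the real
sched_ule.c code): when several CPUs are tied for the lowest load, pick
among them uniformly at random instead of always taking the first
match, so no permanent bias remains.

/* Toy sketch of the "random tie-break" idea, not actual scheduler
 * code: choose uniformly among all CPUs that share the lowest load. */
#include <stdlib.h>

int
pick_cpu(const int load[], int ncpu)
{
    int best = load[0], nties = 0, choice = 0;

    for (int i = 1; i < ncpu; i++)      /* find the minimum load */
        if (load[i] < best)
            best = load[i];
    for (int i = 0; i < ncpu; i++)      /* reservoir-pick one tied CPU */
        if (load[i] == best && rand() % ++nties == 0)
            choice = i;
    return choice;
}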

 There is the additional observation in one of my 2008
 emails (URLs have been posted) that if you have N+1
 cpu-bound jobs with, say, job0 and job1 ping-ponging
 on cpu0 (due to ULE's cpu-affinity feature) and if I
 kill job2 running on cpu1, then neither job0 nor job1
 will migrate to cpu1.  So, one now has N cpu-bound
 jobs running on N-1 cpus.

Have you checked recently that that is still the case?
I would consider this a rather serious bug, as opposed to merely
sub-optimal scheduling.

 Finally, my initial post in this email thread was to
 tell O. Hartmann to quit beating his head against
 a wall with ULE (in an HPC environment).  Switch to
 4BSD.  This was based on my 2008 observations and
 I've now wasted 2 days gathering additional information
 which only re-affirms my recommendation.

I think that any objective information has its value.  So maybe the time is not
really wasted.  I think there is no argument that for your usage pattern 4BSD is
better than ULE at the moment, because of the inherent design choices of both
schedulers and their current implementations.  But I think that ULE could be
improved to produce more global fairness.

P.S.
But, but, this thread has seen so many different problem reports about ULE
heaped together that it's very easy to get confused about what is caused by
what, and what is real and what is not.  E.g. I don't think that there is a
direct relation between this issue (N+1 CPU-bound tasks) and "my X is
sluggish with ULE when I untar a large file".

P.P.S.
About the subject line.  Let's recall why ULE has become a default.  It has
happened because of many observations from users and developers that things
were faster/snappier with ULE than with 4BSD and a significant stream of
requests to make it the default.
So it's business as usual.  The schedulers are different, so there are those
for whom one scheduler works better and those for whom the other works better
and those for whom both work reasonably well and those for whom neither is
satisfactory and those who don't really care/compare.  There is a silent
majority and the vocal minorities.  There are specific bugs and quirks,
advantages and disadvantages, usage patterns, hardware configurations and what
not.  When everybody starts to talk at the same time, it's a huge mess.  But
silently triaging and debugging one problem at a time also doesn't always work.
There, I've said it.  Let me now try to recall why I felt a need to say all of
this :-)
-- 
Andriy Gapon


Mystery panic, FreeBSD 7.2-PRE

2011-12-22 Thread Charlie Martin
We've got another mystery panic in 7.2-PRE.  Upgrading is not an option; 
however, if this is familiar to anyone, backporting a patch would be.


The stack trace is:

db_trace_self_wrapper() at 0x8019120a = db_trace_self_wrapper+0x2a^M
panic() at 0x80308797 = panic+0x187^M
devfs_populate_loop() at 0x802a45c8 = devfs_populate_loop+0x548^M
devfs_populate() at 0x802a46ab = devfs_populate+0x3b^M
devfs_lookup() at 0x802a7824 = devfs_lookup+0x264^M
VOP_LOO[24165][irq261: plx0] DEBUG (hasc_sv_rcv_cb):  rcvd hrtbt ts 
24051, 7/9,

rc 0^M
KUP_APV() at 0x804d5995 = VOP_LOOKUP_APV+0x95^M
lookup() at 0x80384a3e = lookup+0x4ce^M
namei() at 0x80385768 = namei+0x2c8^M
vn_open_cred() at 0x8039b283 = vn_open_cred+0x1b3^M
kern_open() at 0x8039a4a0 = kern_open+0x110^M
syscall() at 0x804b0e3c = syscall+0x1ec^M
Xfast_syscall() at 0x80494ecb = Xfast_syscall+0xab^M
--- syscall (5, FreeBSD ELF64, open), rip = 0x800e022fc, rsp = 
0x7fbfa128,

rbp = 0x801002240 ---^M
KDB: enter: panic^M
--

Charles R. (Charlie) Martin
Senior Software Engineer
1900 Pike Road
Longmont, CO 80501
Phone: 303-532-0209
E-Mail: crmar...@sgi.com
Website: www.sgi.com



Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-22 Thread O. Hartmann
On 12/21/11 19:41, Alexander Leidinger wrote:
 Hi,
 
 while the discussion continued here, some work started at some other place.
 Now... in case someone here is willing to help instead of talking, feel free
 to go to http://wiki.freebsd.org/BenchmarkAdvice and have a look at what can
 be improved.  The page is far from perfect and needs some additional people
 who are willing to improve it.
 
 This is only part of the problem.  A tuning page in the wiki - which could
 be referenced from the benchmark page - would be great too.  Any volunteers?
 A first step would be to take the tuning man page and wikify it.  Other
 tuning sources are welcome too.
 
 Every FreeBSD dev with a wiki account can hand out write access to the wiki.
 The benchmark page gives contributor access.  If someone wants write access,
 create a FirstnameLastname account and ask here for contributor access.
 
 Don't worry if you think your English is not good enough; even some one-word
 notes can help (and _my_ English has already been corrected by other people
 on the benchmark page).
 
 Bye,
 Alexander.
 
 
 
 

Nice to see movement ;-)

But something seems unclear:

The make.conf(5) man page says that MALLOC_PRODUCTION is a knob set in
/etc/make.conf.
The wiki says MALLOC_PRODUCTION is to be set in /etc/src.conf.

What's right and what's wrong now?

Oliver





Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-22 Thread Jeremy Chadwick
On Fri, Dec 23, 2011 at 12:44:14AM +0100, O. Hartmann wrote:
 On 12/21/11 19:41, Alexander Leidinger wrote:
  [...]
 
 Nice to see movement ;-)
 
 But something seems unclear:
 
 The make.conf(5) man page says that MALLOC_PRODUCTION is a knob set in
 /etc/make.conf.
 The wiki says MALLOC_PRODUCTION is to be set in /etc/src.conf.
 
 What's right and what's wrong now?

I can say with certainty that this value belongs in /etc/make.conf
(on RELENG_8 and earlier, at least).

src/share/mk/bsd.own.mk has no framework for MK_MALLOC_PRODUCTION,
so this is definitely a make.conf variable.

-- 
| Jeremy Chadwick                       jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: Mystery panic, FreeBSD 7.2-PRE

2011-12-22 Thread Jeremy Chadwick
On Thu, Dec 22, 2011 at 04:04:48PM -0700, Charlie Martin wrote:
 We've got another mystery panic in 7.2-PRE.  Upgrading is not an
 option; however, if this is familiar to anyone, backporting a patch
 would be.
 
 The stack trace is:
 
 db_trace_self_wrapper() at 0x8019120a = db_trace_self_wrapper+0x2a^M
 panic() at 0x80308797 = panic+0x187^M
 devfs_populate_loop() at 0x802a45c8 = devfs_populate_loop+0x548^M
 devfs_populate() at 0x802a46ab = devfs_populate+0x3b^M
 devfs_lookup() at 0x802a7824 = devfs_lookup+0x264^M
 VOP_LOO[24165][irq261: plx0] DEBUG (hasc_sv_rcv_cb):  rcvd hrtbt ts
 24051, 7/9,
 rc 0^M
 KUP_APV() at 0x804d5995 = VOP_LOOKUP_APV+0x95^M
 lookup() at 0x80384a3e = lookup+0x4ce^M
 namei() at 0x80385768 = namei+0x2c8^M
 vn_open_cred() at 0x8039b283 = vn_open_cred+0x1b3^M
 kern_open() at 0x8039a4a0 = kern_open+0x110^M
 syscall() at 0x804b0e3c = syscall+0x1ec^M
 Xfast_syscall() at 0x80494ecb = Xfast_syscall+0xab^M
 --- syscall (5, FreeBSD ELF64, open), rip = 0x800e022fc, rsp =
 0x7fbfa128,
 rbp = 0x801002240 ---^M
 KDB: enter: panic^M

devfs(5) has been massively worked on in RELENG_8 and newer.  You should
go through the below commits and see if you can find one that references
a PR with a similar backtrace, or mentions things like devfs_lookup().

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/fs/devfs/

Also, be aware that the above stack trace is interspersed.  Ultimately
you get to clean up the output yourself.  This is a long-standing
problem with FreeBSD which can be helped, but only slightly/barely, by
using "options PRINTF_BUFR_SIZE=256" in your kernel configuration (the
default configs have a value of 128.  Do not increase the value too
high; there are concerns about it causing major issues.  I can dig up
the post that says that, but I'd rather not).  It *will not* solve the
problem of interspersed output entirely.  There still is no fix for
this problem... :-(

What I'm referring to:

 devfs_lookup() at 0x802a7824 = devfs_lookup+0x264^M
 VOP_LOO[24165][irq261: plx0] DEBUG (hasc_sv_rcv_cb):  rcvd hrtbt ts
 24051, 7/9,
 rc 0^M
 lookup() at 0x80384a3e = lookup+0x4ce^M

This should actually read (I think):

 devfs_lookup() at 0x802a7824 = devfs_lookup+0x264^M
 VOP_LOOKUP_APV() at 0x804d5995 = VOP_LOOKUP_APV+0x95^M
 [24165][irq261: plx0] DEBUG (hasc_sv_rcv_cb): rcvd hrtbt ts 24051, 7/9, rc 0^M

-- 
| Jeremy Chadwick                       jdc at parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator   Mountain View, CA, US |
| Making life hard for others since 1977.   PGP 4BD6C0CB |



Re: SCHED_ULE should not be the default

2011-12-22 Thread Adrian Chadd
On 22 December 2011 11:47, Steve Kargl s...@troutmask.apl.washington.edu 
wrote:

[snip]

Thank you for posting some actual measurements!

 There is the additional observation in one of my 2008
 emails (URLs have been posted) that if you have N+1
 cpu-bound jobs with, say, job0 and job1 ping-ponging
 on cpu0 (due to ULE's cpu-affinity feature) and if I
 kill job2 running on cpu1, then neither job0 nor job1
 will migrate to cpu1.  So, one now has N cpu-bound
 jobs running on N-1 cpus.

.. and this sounds like a pretty serious regression. Have you ever
filed a PR for it?

 Finally, my initial post in this email thread was to
 tell O. Hartmann to quit beating his head against
 a wall with ULE (in an HPC environment).  Switch to
 4BSD.  This was based on my 2008 observations and
 I've now wasted 2 days gathering additional information
 which only re-affirms my recommendation.

I personally don't think this is time wasted.  You've done something
that no one else has actually done - provided actual results from
real-life testing, rather than a hundred posts of "I remember seeing
X, so I don't use ULE."

If you can definitely and consistently reproduce that N-1 cpu bound
job bug, you're now in a great position to easily test and re-report
KTR/schedtrace results to see what impact they have. Please don't
underestimate exactly how valuable this is.

How often are those two jobs migrating between CPUs? How am I supposed
to read CPU load?  Why isn't it just sitting at 100% the whole time?

Would you mind repeating this with 4BSD (the N+1 jobs) so we can see
how the jobs are scheduled/interleaved? Something tells me we'll see
the jobs being scheduled evenly.


Adrian


Re: SCHED_ULE should not be the default

2011-12-22 Thread Doug Barton
On 12/22/2011 16:23, Adrian Chadd wrote:
 You've done something
 that no one else has actually done - provided actual results from
 real-life testing, rather than a hundred posts of "I remember seeing
 X, so I don't use ULE."

Not to take away from Steve's excellent work on this, but I actually
spent weeks following detailed instructions from various people using
ktr, dtrace, etc. and was never able to produce any data that helped
point anyone to something that could be fixed. I'm pretty sure that
others have tried as well.

That said, I'm glad that Steve was able to produce useful results, and
hopefully it will lead to improvements.


Doug

-- 

[^L]

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price.  :)  http://SupersetSolutions.com/



Re: SCHED_ULE should not be the default

2011-12-22 Thread Steve Kargl
On Thu, Dec 22, 2011 at 04:23:29PM -0800, Adrian Chadd wrote:
 On 22 December 2011 11:47, Steve Kargl s...@troutmask.apl.washington.edu 
 wrote:
 
  There is the additional observation in one of my 2008
  emails (URLs have been posted) that if you have N+1
  cpu-bound jobs with, say, job0 and job1 ping-ponging
  on cpu0 (due to ULE's cpu-affinity feature) and if I
  kill job2 running on cpu1, then neither job0 nor job1
  will migrate to cpu1.  So, one now has N cpu-bound
  jobs running on N-1 cpus.
 
 .. and this sounds like a pretty serious regression. Have you ever
 filed a PR for it?

No.  I was interacting directly with jeffr in 2008.  I got
as far as setting up root access on a node for jeffr.
Unfortunately, both jeffr and I got busy with real life,
and 4BSD allowed me to get my work done.

  Finally, my initial post in this email thread was to
  tell O. Hartmann to quit beating his head against
  a wall with ULE (in an HPC environment).  Switch to
  4BSD.  This was based on my 2008 observations and
  I've now wasted 2 days gathering additional information
  which only re-affirms my recommendation.
 
 I personally don't think this is time wasted. You've done something
 that no one else has actually done - provided actual results from
 real-life testing, rather than a hundred posts of "I remember seeing
 X, so I don't use ULE."
 
 If you can definitely and consistently reproduce that N-1 cpu bound
 job bug, you're now in a great position to easily test and re-report
 KTR/schedtrace results to see what impact they have. Please don't
 underestimate exactly how valuable this is.

I'll try this tomorrow.  I first need to modify the code I used
in the 2008 test to disable IO, so that it is nearly completely
cpu-bound.

 How often are those two jobs migrating between CPUs? How am I supposed
 to read CPU load?  Why isn't it just sitting at 100% the whole time?

This is my first foray into ktr and schedgraph, so I may well have done
something incorrectly.  In particular, it seems that schedgraph takes
the cpu clock as a command line argument, so there is probably
some scaling that I'm missing.

 Would you mind repeating this with 4BSD (the N+1 jobs) so we can see
 how the jobs are scheduled/interleaved? Something tells me we'll see
 the jobs being scheduled evenly.

Sure, I'll do this tomorrow as well.

-- 
Steve


Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1 Server

2011-12-22 Thread Garrett Cooper
On Dec 22, 2011, at 3:58 PM, Jeremy Chadwick free...@jdc.parodius.com wrote:

 On Fri, Dec 23, 2011 at 12:44:14AM +0100, O. Hartmann wrote:
 On 12/21/11 19:41, Alexander Leidinger wrote:
  [...]

  Nice to see movement ;-)
  
  But something seems unclear:
  
  The make.conf(5) man page says that MALLOC_PRODUCTION is a knob set in
  /etc/make.conf.
  The wiki says MALLOC_PRODUCTION is to be set in /etc/src.conf.
  
  What's right and what's wrong now?
 
  I can say with certainty that this value belongs in /etc/make.conf
  (on RELENG_8 and earlier, at least).
  
  src/share/mk/bsd.own.mk has no framework for MK_MALLOC_PRODUCTION,
  so this is definitely a make.conf variable.

Take the advice in tuning(7) with a grain of salt, because a number of
suggestions are really outdated.  I know because I filed a PR last night after
I saw how out of sync some of the defaults it claims are with reality on 9.x+.
And I know other suggestions in the manpage are dated as well ;/.
Thanks,
-Garrett


Re: emacs-devel glib-warning

2011-12-22 Thread Denise H. G.

On 2011/12/22 at 23:02, 1126 mailingli...@elfsechsundzwanzig.de wrote:
 
 Hello!
 I considered switching from emacs23 to emacs24 over the
 Christmas holidays, so I removed emacs23 and installed emacs-devel
 via ports.  Emacs runs fine in a terminal, but it crashes my whole
 X session when I try to start it as an X client.  The error message
 tells me that there is a glib problem: "GLib-WARNING **: In call to
 g_spawn_sync(), exit status of a child process was requested but
 SIGCHLD action was set to SIG_IGN and ECHILD was received by
 waitpid(), so exit status can't be returned.  This is a bug in the
 program calling g_spawn_sync(); either don't request the exit status,
 or don't set the SIGCHLD action."

I am currently using emacs-devel; however, I have been using glib-2.30.x
from Marcus's experimental ports for a while.  It seems everything is ok.
Either you can tweak some configure args available to emacs-devel and
see how it goes, or you might pull the glib-2.30.x port from
Marcus's site and give it a try.

Good luck!

 
 I have emacs-devel installed, and glib-2.28.8_2.
 
 I am running xmonad as my WM, but it happened on awesome as well.
 
 Does anyone know what I can do to get emacs to work as an X client? ;)
 
 Thanks in advance!
 Greetings from rainy Cologne,
 1126
 
  



-- 
The inside contact that you have developed at great
expense is the first person to be let go in any
reorganization.


Re: directory listing hangs in ufs state

2011-12-22 Thread Alan Cox

On 12/22/2011 03:48, Kostik Belousov wrote:

On Wed, Dec 21, 2011 at 09:03:02PM +0400, Andrey Zonov wrote:

[...]

+alc@, kib@

Am I right? Is there any plan to implement this?

Current buffer clustering code can only do async writes.  In fact, I
am not quite sure what would constitute sync clustering, because the
ability to delay a write is important for being able to cluster at all.

Also, I am not sure that lack of clustering is the biggest problem.
IMO, the fact that each write is sync is the first problem there.  It
would be quite some work to add the tracking of the issued writes to
vm_object_page_clean() and down the stack, esp. due to custom page
write