Re: Too many PMC implementations

2018-08-17 Thread Jason Thorpe



> On Aug 17, 2018, at 8:42 AM, Kamil Rytarowski  wrote:
> 
> Speaking realistically, probably all the recent software-based kernel
> profiling was done with DTrace.

Yah, I suppose I'm okay with killing off kernel GPROF support ... you can 
essentially do the same thing, but better, with an on-CPU flame graph generated 
from DTrace data.  If the lower-tier platforms don't support this properly, the 
energy should go towards fixing that.

-- thorpej



Re: Too many PMC implementations

2018-08-17 Thread Kamil Rytarowski
On 17.08.2018 17:13, Maxime Villard wrote:
> Note that I'm talking about the kernel gprof, and not the userland gprof.
> In terms of kernel profiling, it's not unreasonable to say that since we
> support ARM and x86 in tprof, we can cover 99% of the MI parts of
> whatever architecture. From then on, being able to profile the kernel on
> other architectures is of very little interest.
> 

Speaking realistically, probably all the recent software-based kernel
profiling was done with DTrace.





Re: Too many PMC implementations

2018-08-17 Thread Maxime Villard

On 17/08/2018 at 16:43, Joerg Sonnenberger wrote:

On Fri, Aug 17, 2018 at 04:20:30PM +0200, Maxime Villard wrote:

So no one has any opinion on that? Because in this case I will remove it
soon. (Talking about the kernel gprof.)


I'm quite reluctant to remove the only sample based profiler we have
right now. Esp. since we don't have any infrastructure for counter-based
profilers either AFAICT.


We do with tprof now, no?


On 17/08/2018 at 16:50, Mouse wrote:

I agree that it would be better to retire gprof in base, because
there are more powerful tools now, and also advanced hardware
support (PMC, PEBS, ProcessorTrace).


...for ports that _have_ "advanced hardware support", maybe.  (And what
are the "more powerful tools"?  I haven't been following the state of
the art in open-source profiling tools.)


Yes, basically I was talking about x86. I do know that many architectures
support PMCs, but I don't know how precise the events are (etc). The tools
were mentioned before, like the linux "perf", which is pretty good.

Note that I'm talking about the kernel gprof, and not the userland gprof.
In terms of kernel profiling, it's not unreasonable to say that since we
support ARM and x86 in tprof, we can cover 99% of the MI parts of
whatever architecture. From then on, being able to profile the kernel on
other architectures is of very little interest.

The gprof code is rather shitty and old. I dropped it from the x86 kernels,
so it's not like I care a lot now, but since I saw the thread I thought I
would bring this up.


Re: Too many PMC implementations

2018-08-17 Thread Mouse
>> I agree that it would be better to retire gprof in base, because
>> there are more powerful tools now, and also advanced hardware
>> support (PMC, PEBS, ProcessorTrace).

...for ports that _have_ "advanced hardware support", maybe.  (And what
are the "more powerful tools"?  I haven't been following the state of
the art in open-source profiling tools.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!              7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Too many PMC implementations

2018-08-17 Thread Joerg Sonnenberger
On Fri, Aug 17, 2018 at 04:20:30PM +0200, Maxime Villard wrote:
> On 10/08/2018 at 11:40, Maxime Villard wrote:
> > I saw the thread [Re: Sample based profiling] on tech-userlevel@, I'm not
> > subscribed to this list but I'm answering here because it's related to
> > tprof among other things.
> > 
> > I agree that it would be better to retire gprof in base, because there are
> > more powerful tools now, and also advanced hardware support (PMC, PEBS,
> > ProcessorTrace).
> > 
> > But in particular, it would be nice to retire the "kernel gprof". That is,
> > the MD/MI pieces that are surrounded by #ifdef GPROF. This kind of
> > profiling is weak, and misses many aspects of execution (branch prediction,
> > cache misses, heavy instructions, etc.) that tprof covers.
> > 
> > I already dropped NENTRY() from x86, so KGPROF is officially not supported
> > there anymore. I think it has never worked on amd64.
> 
> So no one has any opinion on that? Because in this case I will remove it
> soon. (Talking about the kernel gprof.)

I'm quite reluctant to remove the only sample based profiler we have
right now. Esp. since we don't have any infrastructure for counter-based
profilers either AFAICT.

Joerg


Re: panic: biodone2 already

2018-08-17 Thread Jaromir Dolecek



> On 17 Aug 2018, at 07:07, Michael van Elst wrote:
> 
>> On Fri, Aug 17, 2018 at 02:23:16AM +, Emmanuel Dreyfus wrote:
>> 
>>    blkif_response_t *rep = RING_GET_RESPONSE(&sc->sc_ring, i);
>>    struct xbd_req *xbdreq = &sc->sc_reqs[rep->id];
>>bp = xbdreq->req_bp;
>> 
>> It decides to call dk_done for the last occurrence and return. Next
>> call to xbd_handler finds the same offending buf_t leading the queue.
>> dk_done is called again, leading to the panic.
> 

It should not do this, since cons should equal prod and it should not enter the 
loop. I was investigating whether it could be some interaction between 
DIOCCACHEFLUSH and bio, or raw vs. block I/O, but have found nothing yet.

Yes, one of the problems is that the code happily handles stale bufs. It does not 
clear the buf pointer when the response has already been handled. We should add some 
KASSERTs there for this, and clear the response structure on reuse.

Also, the DPRINTF() in the loop assumes bp is valid, so it uses a stale pointer 
for the disk flush ...

The whole xbd code really needs a cleanup and proper MP-ification.

I have not been able to reproduce the panic on my machine yet, though. Maybe I 
need a bigger virtual disk.

Re: All processes go tstile

2018-08-17 Thread J. Hannken-Illjes


> On 17. Aug 2018, at 04:46, Emmanuel Dreyfus  wrote:
> 
> On Thu, Aug 16, 2018 at 10:03:11AM +0200, J. Hannken-Illjes wrote:
>> Looks like a deadlock where we sync a file system, need a new buffer
>> and try to free a buffer on a currently suspending file system.
>> 
>> VFS_SYNC and VOP_BWRITE should be on different mounts here.
> 
> Here are the mounts:
> /dev/raid3a on / type ffs (log, local)
> /dev/raid3e on /home type ffs (nodev, noexec, nosuid, NFS exported, local)
> 
> The problem arises while dump is performing a snapshot backup on /home

The first thirty lines of "dumpfs /home" please.

Did you use "dump -x ..." or "dump -X"?

Did the "dump" process hang?

> It is worth noting that /home does not have -o log, because frequent
> fsync from NFS clients kills performance with -o log. At least it did 
> on netbsd-7, and I do not see why it would have changed on netbsd-8: 
> frequent fsync means frequent log flushes on the server.
> 
>> Do you have a crash dump?
> 
> I was not able to get one so far, the dump device is not properly
> configured on this system.

If it hits you again, from ddb:

ps /l
call fstrans_dump

backtrace of the "dump" process

--
J. Hannken-Illjes - hann...@eis.cs.tu-bs.de - TU Braunschweig (Germany)