Re: [R] interpreting profiling output

2004-07-27 Thread Kasper Daniel Hansen
On Tue, Jul 27, 2004 at 06:17:10PM -0400, Liaw, Andy wrote:
> > From: Kasper Daniel Hansen
> > 
> > On Tue, Jul 27, 2004 at 09:12:56AM +0200, Peter Dalgaard wrote:
> > > 
> > > Hmm... You're using summaryRprof(), obviously. Do the 
> > results match up
> > > with what you get from R CMD Rprof? Looking at the source for
> > > summaryRprof, I see that it uses this construction
> > > 
> > > selft <- sapply(ls(envir = self), function(f) get(f, envir =
> > > self))
> > >  
> > > which I suspect does not count function names that start 
> > with a dot...
> > 
> > It does not. Runnig everything through R CMD Rprof gives a much moe 
> > sensible output. I have no further questions.
> > 
> > Do I file a bug report on summaryRprof - it is certainly 
> > unintentional 
> > that it does not compute eg. .C calls when computing the self time?
> 
> I thought I did see .C in the output of summaryRprof() at one time...

Well, I tried Peter's fix and it solves the problem for me. I have filed 
a bug report including the fix. Thanks for the help everyone.

-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


RE: [R] interpreting profiling output

2004-07-27 Thread Liaw, Andy
> From: Kasper Daniel Hansen
> 
> On Tue, Jul 27, 2004 at 09:12:56AM +0200, Peter Dalgaard wrote:
> > 
> > Hmm... You're using summaryRprof(), obviously. Do the 
> results match up
> > with what you get from R CMD Rprof? Looking at the source for
> > summaryRprof, I see that it uses this construction
> > 
> > selft <- sapply(ls(envir = self), function(f) get(f, envir =
> > self))
> >  
> > which I suspect does not count function names that start 
> with a dot...
> 
> It does not. Runnig everything through R CMD Rprof gives a much moe 
> sensible output. I have no further questions.
> 
> Do I file a bug report on summaryRprof - it is certainly 
> unintentional 
> that it does not compute eg. .C calls when computing the self time?

I thought I did see .C in the output of summaryRprof() at one time...

Andy

> -- 
> Kasper Daniel Hansen, Research Assistant
> Department of Biostatistics, University of Copenhagen
> 
> __
> [EMAIL PROTECTED] mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
> 
>

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] interpreting profiling output

2004-07-27 Thread Kasper Daniel Hansen
On Tue, Jul 27, 2004 at 09:12:56AM +0200, Peter Dalgaard wrote:
> [What did ess-help have to do with this?? Snipped from CC:]

This happens when posting in a hurry using a new mail client. Sorry. See 
below btw.

> Kasper Daniel Hansen <[EMAIL PROTECTED]> writes:
> 
> > Thanks to both of you for considering my question. I still have some
> > problems however. Consider the nnet.default function which uses the .C
> > interface. As 
> > > typeof(.C)
> > [1] "builtin"
> > I would guess - from my current understanding - that the time spent in
> > the C functions of nn.default are added to its self.time. Now look at
> > the output
> > 
> > > prof02.out$by.self[1:10,]
> >   self.time self.pct total.time total.pct
> > nnet.default  33.32 11.22186.48  92.1
> > 
> > So nnet.default only used 33.3 on its C calls, but 2186 in total
> > time. From this I would guess that nnet.default - in my case - spends
> > its majority of time setting up the fit and postprocessing it (as this
> > is done by R functions which does not add to its self time). The
> > actual fit (which is done by the C functions) only takes very little
> > time. 
> > 
> > Looking further down the table I see
> > 
> >   self.time self.pct total.time total.pct
> > as.integer22.28  7.5  30.92   1.3
> > 
> > as.interger is basically a .Internal call - which ought not to be
> > recorded. But why does it then have such a "high" self.time?
> > 
> > And finally, if a program (such as mine) have a high discrepancy
> > between self.time and sampling.time, does this imply that the far
> > majority of time is spent in .Primitives which are of type special?
> 
> Hmm... You're using summaryRprof(), obviously. Do the results match up
> with what you get from R CMD Rprof? Looking at the source for
> summaryRprof, I see that it uses this construction
> 
> selft <- sapply(ls(envir = self), function(f) get(f, envir =
> self))
>  
> which I suspect does not count function names that start with a dot...

It does not. Runnig everything through R CMD Rprof gives a much moe 
sensible output. I have no further questions.

Do I file a bug report on summaryRprof - it is certainly unintentional 
that it does not compute eg. .C calls when computing the self time?
-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] interpreting profiling output

2004-07-26 Thread Luke Tierney
On Sun, 25 Jul 2004, Peter Dalgaard wrote:

> Kasper Daniel Hansen <[EMAIL PROTECTED]> writes:
> 
> > I have some trouble interpreting the output from profiling. I have
> > read the help pages Rprof, summaryRprof and consult the R extensions
> > manual, but I still have problems understanding the output.
> > 
> > Basically the output consist of self.time and total.time. I have the
> > understanding that total.time is the time spent in a given function
> > including any subcalls or child functions or whatever the technical
> > term for that activity is. In contrasts self.time is the time spent in
> > the function excluding subcalls.
> > 
> > Now, in my understanding basically everything in R is functions. I
> > would then guess that for almost any function (except the "atomic
> > ones") the self.time would be very small as it would spend most of its
> > time calling other functions (again, since almost everything is a
> > function). So how do R determine when a subfunction is called?
> ...brutal snippage...
> > I guess some of the answers would be clear if I had a firm grasp of
> > the inner workings of R :)
> 
> In a word, yes... 
> 
> The profiling keeps track of R's *context stack* which is not quite
> the same as function calls. Essentially, it only counts R functions that
> are actually written in R, but not .Internal, .Primitive, etc. So
> "self" counts the amount of time that a function was at the top of the
> stack. This is done by a periodic poll which dumps out the context
> stack at regular intervals. Seeing cases where the self percentages
> don't add to 100% is, I believe, simply due to truncation of the tails
> -- that is, there is a large number of different functions which each
> are counted a few times, and these are not shown in the summary output.
> 
> [Sorry about the late reply, but I was out of town, and noone seems to
> have answered this.]

Just one additional clarification: There are two flavors of
.Primitive, 'builtin' and 'special'.  You can tell which is which
using typeof().  Builtins are things like arithmetic operations and
specials include control constructs like `for`.  At present profiling
does record builtins but not specials.  Recording specials is
something we would like to do eventually but it requires more changes
to the call stack (and dealing with more interections with those
changes) so it hasn't happened yet. .Internal is a special, so no
.Internal calls are recorded.

luke

-- 
Luke Tierney
University of Iowa  Phone: 319-335-3386
Department of Statistics andFax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall  email:  [EMAIL PROTECTED]
Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] interpreting profiling output

2004-07-25 Thread Peter Dalgaard
Kasper Daniel Hansen <[EMAIL PROTECTED]> writes:

> I have some trouble interpreting the output from profiling. I have
> read the help pages Rprof, summaryRprof and consult the R extensions
> manual, but I still have problems understanding the output.
> 
> Basically the output consist of self.time and total.time. I have the
> understanding that total.time is the time spent in a given function
> including any subcalls or child functions or whatever the technical
> term for that activity is. In contrasts self.time is the time spent in
> the function excluding subcalls.
> 
> Now, in my understanding basically everything in R is functions. I
> would then guess that for almost any function (except the "atomic
> ones") the self.time would be very small as it would spend most of its
> time calling other functions (again, since almost everything is a
> function). So how do R determine when a subfunction is called?
...brutal snippage...
> I guess some of the answers would be clear if I had a firm grasp of
> the inner workings of R :)

In a word, yes... 

The profiling keeps track of R's *context stack* which is not quite
the same as function calls. Essentially, it only counts R functions that
are actually written in R, but not .Internal, .Primitive, etc. So
"self" counts the amount of time that a function was at the top of the
stack. This is done by a periodic poll which dumps out the context
stack at regular intervals. Seeing cases where the self percentages
don't add to 100% is, I believe, simply due to truncation of the tails
-- that is, there is a large number of different functions which each
are counted a few times, and these are not shown in the summary output.

[Sorry about the late reply, but I was out of town, and noone seems to
have answered this.]

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


[R] interpreting profiling output

2004-07-16 Thread Kasper Daniel Hansen
I have some trouble interpreting the output from profiling. I have
read the help pages Rprof, summaryRprof and consult the R extensions
manual, but I still have problems understanding the output.

Basically the output consist of self.time and total.time. I have the
understanding that total.time is the time spent in a given function
including any subcalls or child functions or whatever the technical
term for that activity is. In contrasts self.time is the time spent in
the function excluding subcalls.

Now, in my understanding basically everything in R is functions. I
would then guess that for almost any function (except the "atomic
ones") the self.time would be very small as it would spend most of its
time calling other functions (again, since almost everything is a
function). So how do R determine when a subfunction is called?

Looking at a practical example, which basically consists of a model
search, with each model being evaluated by a nnet fit, I get (using
the default value of interval=0.02 - the output is basically similar
for interval = 0.01)

> prof02.out$by.total[1:10,]
 total.time total.pct self.time self.pct
cv.risk.dsa22373.50 100.0  0.00  0.0
validation.risk.dsa22373.50 100.0  0.00  0.0
avg.pred.nnet   2370.98  99.9  4.36  1.5
dsa.2   2337.32  98.5  0.00  0.0
best.move.dsa2  2335.22  98.4  0.06  0.0
avg.nnet2335.10  98.4  0.04  0.0
nnet2188.22  92.2  1.74  0.6
nnet.default2186.48  92.1 33.32 11.2
sub.dsa21466.32  61.8  0.18  0.1
del.dsa2 560.36  23.6  0.02  0.0

This tells me that the majority (92%) is spent in the nnet
fit. Therefore it seems that the only real improvement to the code
would be to change the algorithm in order to call nnet fewer
times. The actual search code only amounts to a little amout of the
total time.

Looking at sampling.time, the fourth column (total.time in percent) is
the third column (total.time) divided by sampling.time (which makes
sense).

My trouble really begins when I look at the self.time column.

> prof02.out$by.self[1:10,]
  self.time self.pct total.time total.pct
nnet.default  33.32 11.22186.48  92.1
FUN   23.42  7.8  46.34   2.0
as.integer22.28  7.5  30.92   1.3
mean.default  16.14  5.4  22.92   1.0
as.double.default 11.52  3.9  11.52   0.5
apply 10.22  3.4  60.80   2.6
as.double 10.14  3.4  30.24   1.3
rep.default9.98  3.3  19.96   0.8
matrix 9.62  3.2  75.86   3.2
add.net8.80  2.9  10.16   0.4

First of, I would guess that the sum of the self.time column
_ought_??? to be equal to the sampling time, which it is not
> sum(prof02.out$by.self[,1])
[1] 298.4
> prof02.out$sampling.time
[1] 2373.5
Why does such a large discrepancy happen?

Secondly, looking at the output for as.integer, there is a big
difference between total.time and self.time - but I would _guess_
as.integer to be a pretty basic function, which does not call any
subfunctions, and in that case I would expect total.time to be equal
to self.time.

In short : I think I grasp the total time concept, but I have a hard
time understanding self.time. Specifically
 - Why is the sampling time so different from the sum of the self.time
 - Which function calls (or operations) are considered to belong to a
 given function's self.time and which are not.
 - Why is total.time != self.time for the as.integer function.

I guess some of the answers would be clear if I had a firm grasp of
the inner workings of R :)

Thanks in advance, Kasper
-- 
Kasper Daniel Hansen, Research Assistant
Department of Biostatistics, University of Copenhagen

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html