Re: [Rd] RProfmem output format

2011-05-15 Thread Hadley Wickham
 In the subsequent lines I'm assuming the structure is bytes allocated : 
 call.

 I think the five numbers come from four memory allocations before
 example() is called.  Looking at the code in src/main/memory.c, it
 prints newline only when the call stack is not empty.

Looking into that example in more detail, here's the distribution of
allocation numbers:

   1    3    4
4621   30    2

(with a threshold of 5k)

So it happens ~30 times in total.
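
A table like the one above could be produced by counting how many "bytes :" prefixes each line of the profile carries. This is only a sketch: the `lines` vector is made-up data in the format described in this thread, not real Rprofmem output, and the regular expression assumes byte counts always appear as digits followed by " :".

```r
# Made-up lines in the "<bytes> :<call stack>" style discussed above.
# The first line carries three allocations because, per this thread, no
# newline is written while the call stack is empty (an assumption here).
lines <- c(
  '264 :128 :56 :"example" ',
  '1024 :"readLines" "example" ',
  '48 :"paste" "example" '
)

# Count the "<digits> :" prefixes on each line.
prefixes <- regmatches(lines, gregexpr("[0-9]+ :", lines))
n_alloc <- vapply(prefixes, length, integer(1))

# Distribution of the number of allocations per line.
alloc_table <- table(n_alloc)
print(alloc_table)
```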

So what causes allocations when the call stack is empty?  Something
internal?  Does the garbage collector trigger allocations (i.e. could
it be caused by moving data to contiguous memory)?

Any ideas what the correct thing to do with these memory allocations is?
Ignore them because they're not really related to the function they're
attributed to?  Sum them up?

 I don't see why this is done, and I may well be the person who did it
 (I don't have svn on this computer to check), but it is clearly
 deliberate.

It seems like it would be more consistent to always print a newline,
and then it would be obvious that those allocations occurred when the call
stack was empty.  This would make parsing the file a little bit
easier.
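
To show why the missing newline complicates parsing, here is a rough sketch of a parser that splits such lines into one record per allocation. The function name `parse_profmem_line` and the sample lines are hypothetical, and the line format is assumed from the description in this thread rather than taken from real output.

```r
# Made-up lines: "<bytes> :<call stack>", with the first line packing several
# allocations together because no newline separated them.
lines <- c(
  '264 :128 :56 :"example" ',
  '1024 :"readLines" "example" ',
  '48 :"paste" "example" '
)

# Hypothetical helper: one row per allocation found on a line.
parse_profmem_line <- function(line) {
  # Collect every "<digits> :" prefix on the line.
  m <- regmatches(line, gregexpr("[0-9]+ :", line))[[1]]
  n <- length(m)
  if (n == 0L) return(NULL)  # not an allocation line in this format
  bytes <- as.numeric(sub(" :", "", m, fixed = TRUE))
  # Whatever follows the leading prefixes is the call stack.
  stack <- trimws(sub("^([0-9]+ :)+", "", line))
  data.frame(
    bytes = bytes,
    # Only the last allocation on the line has a recorded stack.
    stack = c(rep("", n - 1L), stack),
    stringsAsFactors = FALSE
  )
}

parsed <- do.call(rbind, lapply(lines, parse_profmem_line))
print(parsed)
```

If a newline were always printed, the prefix-collecting step would be unnecessary and each line could be split once at the first " :".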

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Recursively parsing srcrefs

2011-05-15 Thread Hadley Wickham
 The bug is now fixed in R-devel and R-patched.

Thanks!

Hadley




Re: [Rd] Recursively parsing srcrefs

2011-05-15 Thread Hadley Wickham
 findLineNum doesn't quite do what I want - it works on the text of the
 srcref, not on the parse tree.

 It searches through the parse tree for the smallest source ref that contains
 a given line.  So for example,

 if(condition) {
  blah
  blah
  blah
 }

 is a single statement, and there will be a srcref stored in its container
 that goes from line N to line N+4.  But it also contains the compound
 statement

 {
  blah
  blah
  blah
 }

 and there will be srcrefs attached to that for each of the statements in it.
  (I forget right now whether there are 3 or 4 statements there:  R treats
 braces in a funny way, and I'd have to look at an example to check.)  Each
 of the blah's will get a srcref spanning one line, and it will be stored
 in the container.
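
 A small sketch of how those nested srcrefs can be reached, assuming the behaviour of recent R versions, where `parse(keep.source = TRUE)` stores a "srcref" attribute both on the returned expression vector and on each braced block. The code string and the helper `line_span` are made up for illustration.

```r
# keep.source = TRUE asks the parser to record source references.
code <- "if (condition) {\n  blah1\n  blah2\n  blah3\n}"
exprs <- parse(text = code, keep.source = TRUE)

# Top level: one srcref per top-level expression, stored on the container.
top <- attr(exprs, "srcref")

# exprs[[1]] is the call to `if`; its third element is the braced block.
block <- exprs[[1]][[3]]

# Srcrefs stored on the braced block (the "lower-level" ones).
inner <- attr(block, "srcref")

# Hypothetical helper: first and last line covered by a srcref
# (elements 1 and 3 of the underlying integer vector).
line_span <- function(ref) as.integer(ref)[c(1, 3)]

print(line_span(top[[1]]))
print(lapply(inner, line_span))
```

 Comparing `length(inner)` with `length(block)` is one way to settle the "3 or 4" question for a given R version.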

I'm clearly missing something obvious because I don't see how to
access these lower-level srcrefs.  Would you mind providing a small
example?

Thanks!

Hadley




Re: [Rd] RProfmem output format

2011-05-15 Thread Hadley Wickham
Also, would you mind commenting how RProfmem is misleading?

There are three ways to profile memory use over time in R code. ...
All can be misleading, for different reasons.
--- 
http://cran.r-project.org/doc/manuals/R-exts.html#Profiling-R-code-for-memory-use

The descriptions of the other two ways explain why they are misleading.

Hadley

On Sun, May 15, 2011 at 8:02 AM, Hadley Wickham had...@rice.edu wrote:
 In the subsequent lines I'm assuming the structure is bytes allocated : 
 call.

 I think the five numbers come from four memory allocations before
 example() is called.  Looking at the code in src/main/memory.c, it
 prints newline only when the call stack is not empty.

 Looking into that example in more detail, here's the distribution of
 allocation numbers:

   1    3    4
 4621   30    2

 (with a threshold of 5k)

 So it happens ~30 times in total.

 So what causes allocations when the call stack is empty?  Something
 internal?  Does the garbage collector trigger allocations (i.e. could
 it be caused by moving data to contiguous memory)?

 Any ideas what the correct thing to do with these memory allocations is?
 Ignore them because they're not really related to the function they're
 attributed to?  Sum them up?

 I don't see why this is done, and I may well be the person who did it
 (I don't have svn on this computer to check), but it is clearly
 deliberate.

 It seems like it would be more consistent to always print a newline,
 and then it would be obvious that those allocations occurred when the call
 stack was empty.  This would make parsing the file a little bit
 easier.

 Hadley



Re: [Rd] By default, `names<-` alters S4 objects

2011-05-15 Thread John Chambers

This is basically a case of a user error that is not being caught:

On 5/14/11 3:47 PM, Hervé Pagès wrote:

Hi,

I was stumped by this. The two S4 objects below looked exactly the same:

> a1
An object of class "A"
Slot "aa":
integer(0)
> a2
An object of class "A"
Slot "aa":
integer(0)

> str(a1)
Formal class 'A' [package ".GlobalEnv"] with 1 slots
  ..@ aa: int(0)
> str(a2)
Formal class 'A' [package ".GlobalEnv"] with 1 slots
  ..@ aa: int(0)

But they were not identical:

> identical(a1,a2)
[1] FALSE

Then I found that one had a names attribute but not the other:

> names(attributes(a1))
[1] "aa"    "class" "names"
> names(attributes(a2))
[1] "aa"    "class"

> names(a1)
NULL
> names(a2)
NULL

Which explained why they were not reported as identical.

After tracking the history of 'a1', I found that it was created with
something like:

> setClass("A", representation(aa="integer"))
[1] "A"
> a1 <- new("A")
> names(a1) <- "K"
> names(a1)
NULL

So it seems that, by default (i.e. in the absence of a specialized
method), the `names<-` primitive is adding a names attribute to the
object. Could this behaviour be modified so it doesn't alter the object?
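
The report above can be turned into a small reproducible check. This is a sketch only: what `names<-` does on such an object depends on the R version (the report saw a silent assignment; other versions may warn or stop), so the code records the outcome instead of assuming it.

```r
library(methods)

# Same class as in the report: one integer slot, no names slot.
setClass("A", representation(aa = "integer"))
a1 <- new("A")
a2 <- new("A")

# Record what names<- does here rather than assume it.
outcome <- tryCatch(
  {
    names(a1) <- "K"
    "assigned silently"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error = function(e) paste("error:", conditionMessage(e))
)
print(outcome)

# The diagnostic from the report: attributes present on a1 but not on a2.
extra_attrs <- setdiff(names(attributes(a1)), names(attributes(a2)))
print(extra_attrs)
print(identical(a1, a2))
```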


Eh?  But you did alter the object.  Not only that, you altered it in 
what is technically an invalid way:  Adding a names attribute to a class 
that has no names slot.


The modification that would make sense would be to give you an error in 
the above code.  Not a bad idea, but it's likely to generate more 
complaints in other contexts, particularly where people don't 
distinguish the list class from lists with names (the namedList class).


A plausible strategy:
 1.  If the class has a vector data slot and no names slot, assign the 
names but with a warning.


 2. Otherwise, throw an error.

(I.e., I would prefer an error throughout, but discretion )
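
For illustration, the decision rule in that strategy might be sketched as follows. The helper `names_policy` and the classes `NoNames` and `NumLike` are hypothetical, and reading "vector data slot" as the `.Data` slot is an assumption on my part. The helper only inspects slot names and assigns nothing.

```r
library(methods)

# Hypothetical helper: what names<- "should" do for a class under the
# strategy above, judged only from the class's slot names.
names_policy <- function(class_name) {
  slots <- slotNames(class_name)
  if ("names" %in% slots) {
    "assign"                 # the class declares a names slot
  } else if (".Data" %in% slots) {
    "assign with warning"    # vector data part, but no names slot
  } else {
    "error"                  # neither: a names attribute would be invalid
  }
}

setClass("NoNames", representation(aa = "integer"))
setClass("NumLike", contains = "numeric")

print(names_policy("NoNames"))
print(names_policy("NumLike"))
print(names_policy("namedList"))
```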

Comments?

John




Thanks,
H.






Re: [Rd] RProfmem output format

2011-05-15 Thread Thomas Lumley
On Mon, May 16, 2011 at 1:02 AM, Hadley Wickham had...@rice.edu wrote:
 So what causes allocations when the call stack is empty?  Something
 internal?  Does the garbage collector trigger allocations (i.e. could
 it be caused by moving data to contiguous memory)?

The garbage collector doesn't move anything, it just swaps pointers in
a linked list.

The lexer, parser, and evaluator all have to do some work before a
function context is set up for the top-level function, so I assume
that's where it is happening.

 Any ideas what the correct thing to do with these memory allocations is?
 Ignore them because they're not really related to the function they're
 attributed to?  Sum them up?

 I don't see why this is done, and I may well be the person who did it
 (I don't have svn on this computer to check), but it is clearly
 deliberate.

 It seems like it would be more consistent to always print a newline,
 and then it would be obvious that those allocations occurred when the call
 stack was empty.  This would make parsing the file a little bit
 easier.

Yes. It's obviously better to always print a newline, and so clearly
deliberate not to, that I suspect there may have been a good reason.
If I can't work it out (after my grant deadline this week) I will just
assume it's wrong.


   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [Rd] By default, `names<-` alters S4 objects

2011-05-15 Thread Hervé Pagès

On 11-05-15 11:33 AM, John Chambers wrote:

This is basically a case of a user error that is not being caught:


Sure!

  https://stat.ethz.ch/pipermail/r-devel/2009-March/052386.html



On 5/14/11 3:47 PM, Hervé Pagès wrote:

Hi,

I was stumped by this. The two S4 objects below looked exactly the same:

> a1
An object of class "A"
Slot "aa":
integer(0)
> a2
An object of class "A"
Slot "aa":
integer(0)

> str(a1)
Formal class 'A' [package ".GlobalEnv"] with 1 slots
  ..@ aa: int(0)
> str(a2)
Formal class 'A' [package ".GlobalEnv"] with 1 slots
  ..@ aa: int(0)

But they were not identical:

> identical(a1,a2)
[1] FALSE

Then I found that one had a names attribute but not the other:

> names(attributes(a1))
[1] "aa"    "class" "names"
> names(attributes(a2))
[1] "aa"    "class"

> names(a1)
NULL
> names(a2)
NULL

Which explained why they were not reported as identical.

After tracking the history of 'a1', I found that it was created with
something like:

> setClass("A", representation(aa="integer"))
[1] "A"
> a1 <- new("A")
> names(a1) <- "K"
> names(a1)
NULL

So it seems that, by default (i.e. in the absence of a specialized
method), the `names<-` primitive is adding a names attribute to the
object. Could this behaviour be modified so it doesn't alter the object?


Eh? But you did alter the object. Not only that, you altered it in what
is technically an invalid way: Adding a names attribute to a class that
has no names slot.


Ah, that's interesting. I didn't know I could put a names slot in my
class. Last time I tried was at least 3 years ago and that was causing
problems (don't remember the exact details) so I ended up using NAMES
instead. Trying again with R-2.14:

  > setClass("A", representation(names="character"))

  > a <- new("A")

  > attributes(a)
  $names
  character(0)

  $class
  [1] "A"
  attr(,"package")
  [1] ".GlobalEnv"

  > names(a)
  NULL

  > names(a) <- "K"

  > attributes(a)
  $names
  [1] "K"

  $class
  [1] "A"
  attr(,"package")
  [1] ".GlobalEnv"

  > names(a)
  NULL

Surprise! But that's another story...
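
For reference, the NAMES workaround mentioned above could look roughly like the sketch below. The class name `Tagged` is made up, and the explicit `names` and `names<-` methods are one possible way to wire it up, not necessarily what any existing package does.

```r
library(methods)

# Hypothetical class; the slot is called NAMES so that it does not collide
# with the special handling of an attribute literally named "names".
setClass("Tagged", representation(aa = "integer", NAMES = "character"))

# Getter: report NULL when no names have been set, like base R does.
setMethod("names", "Tagged", function(x) {
  if (length(x@NAMES) == 0L) NULL else x@NAMES
})

# Setter: store the names in the slot instead of adding an attribute.
setReplaceMethod("names", "Tagged", function(x, value) {
  x@NAMES <- as.character(value)
  x
})

obj <- new("Tagged")
names(obj) <- "K"
print(names(obj))
print(names(attributes(obj)))
```

With methods like these the object keeps only its declared slots, so identical() compares the representation the class author intended.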



The modification that would make sense would be to give you an error in
the above code. Not a bad idea, but it's likely to generate more
complaints in other contexts, particularly where people don't
distinguish the list class from lists with names (the namedList class).

A plausible strategy:
1. If the class has a vector data slot and no names slot, assign the
names but with a warning.

2. Otherwise, throw an error.

(I.e., I would prefer an error throughout, but discretion )


Or, at a minimum (if no consensus can be reached about the above
strategy), not add a names attribute set to NULL. My original
post was more about keeping the internal representation of objects
normalized, in general, so identical() is more likely to be
meaningful.

Thanks,
H.






--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
