Re: [Rd] R vs. C

Patrick Burns Tue, 18 Jan 2011 03:56:05 -0800

Claudia,

I think we agree.


Having the examples run in the
tests is a good thing, I think.
They might strengthen the tests
some (especially if there are
no other tests).  But mainly if
examples don't work, then it's
hard to have much faith in the
code.

On 18/01/2011 11:36, Claudia Beleites wrote:

On 01/18/2011 10:53 AM, Patrick Burns wrote:

I'm not at all a fan of thinking
of the examples as being tests.

Examples should clarify the thinking
of potential users. Tests should
clarify the space in which the code
is correct. These two goals are
generally at odds.


Patrick, I completely agree with you that
- Tests should not clutter the documentation; they belong in their proper place.
- Examples are there for the user's benefit - and must be written
accordingly.
- Tests should often cover far more situations than good examples do.

Yet it seems to me that (part of) the examples are justly considered a
(small) subset of the tests:
As a potential user, I request two things from good examples that have an
implicit testing message/side effect:
- I like the examples to roughly outline the space in which the code
works: they should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of
the correctness for some example calculation.
(I don't want to see all further tests - I can look them up if I feel
the need)

The fact that the very same line of example code serves a testing (side)
purpose doesn't mean that it should be copied into the tests, does it?

Thus, I think of the "public" part (the "preface") of the tests as living
in the examples.

My 2 ct,
Best regards,

Claudia


On 17/01/2011 22:15, Spencer Graves wrote:

Hi, Paul:


The "Writing R Extensions" manual says that *.R code in a "tests"
directory is run during "R CMD check". I suspect that many R programmers
do this routinely. I probably should do that also. However, for me, it's
simpler to have everything in the "examples" section of *.Rd files. I
think the examples with independently developed answers provide useful
documentation.


Spencer


On 1/17/2011 1:52 PM, Paul Gilbert wrote:

Spencer

Would it not be easier to include this kind of test in a small file in
the tests/ directory?
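A minimal sketch of such a file, assuming a hypothetical function rescale01() and a hypothetical file name tests/test-rescale01.R (neither comes from this thread). R CMD check runs each *.R file in tests/, and any error, such as a failed stopifnot(), makes the check fail. In a real package the stand-in definition would be replaced by loading the package with library().

```r
# tests/test-rescale01.R  (hypothetical file name)
# In a real package, replace this stand-in definition with library(<your package>).
rescale01 <- function(x) {
  # Hypothetical function under test: map numeric x linearly onto [0, 1].
  if (!is.numeric(x)) stop("x must be numeric")
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant input: every non-missing value maps to 0.
    return(ifelse(is.na(x), NA_real_, 0))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Typical case (the kind of check that could also appear as an example)
stopifnot(all.equal(rescale01(c(1, 2, 3)), c(0, 0.5, 1)))

# Edge cases that would clutter a help page
stopifnot(identical(rescale01(c(5, 5)), c(0, 0)))
stopifnot(all.equal(rescale01(c(1, NA, 3)), c(0, NA, 1)))
stopifnot(inherits(try(rescale01("a"), silent = TRUE), "try-error"))
```

The typical case is the sort of thing a reader would want to see on the help page; the edge cases map out where the function is (and is not) expected to work, which is the role Patrick assigns to tests.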

Paul

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; [email protected]; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
process. I've found this so valuable that I created a Wikipedia entry
by that name and made additions to a Wikipedia entry on "software
repository", noting that this process encourages good software
development practices that I have not seen standardized for other
languages. I encourage people to review this material and make
additions or corrections as they like (or send me suggestions for me to
make appropriate changes).


While R has other capabilities for unit and regression testing, I
often include unit tests in the "examples" section of documentation
files. To keep from cluttering the examples with unnecessary material,
I often include something like the following:


A1 <- myfunc()  # to test myfunc

A0 <- ("manual generation of the correct answer for A1")

\dontshow{stopifnot(}  # so the user doesn't see "stopifnot("
all.equal(A1, A0)  # compare myfunc output with the correct answer
\dontshow{)}  # close paren on "stopifnot(".


This may not be as good in some ways as a full suite of unit
tests, which could be provided separately. However, this has the
distinct advantage of including unit tests with the documentation in a
way that should help users understand "myfunc". (Unit tests too
detailed to show users could be completely enclosed in "\dontshow".)
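To make the pattern concrete, here is a minimal sketch with a hypothetical myfunc (a coefficient of variation, chosen only for illustration) and an answer worked out by hand. Once the \dontshow{} parts are included, the code that R CMD check runs is ordinary R along these lines, while the help page shows only the all.equal() line.

```r
# Hypothetical stand-in for "myfunc": coefficient of variation of x.
myfunc <- function(x = c(2, 4, 4, 4, 5, 5, 7, 9)) sd(x) / mean(x)

A1 <- myfunc()  # to test myfunc

# Independently developed answer, worked out by hand:
#   mean = 40 / 8 = 5
#   squared deviations: 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
#   sample variance = 32 / 7, so sd = sqrt(32 / 7)
A0 <- sqrt(32 / 7) / 5

# What runs during R CMD check once the \dontshow{} parts are included:
stopifnot(
  all.equal(A1, A0)  # TRUE, or a character description of the difference
)
```

all.equal() is used rather than identical() because two routes to the same answer can differ in the last few bits of floating point. When the values differ beyond its tolerance, all.equal() returns a character string, which stopifnot() treats as a failure.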


Spencer


On 1/17/2011 11:38 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves <[email protected]> wrote:

Another point I have not yet seen mentioned: If your code is painfully
slow, that can often be fixed without leaving R by experimenting with
different ways of doing the same thing -- often after profiling your
code to find the slowest part, as described in chapter 3 of "Writing R
Extensions".
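A minimal profiling sketch, using Rprof() and summaryRprof() from R's utils package. The functions slow_step(), fast_step() and pipeline() are hypothetical, invented here so the profile has something to separate; the input size is chosen only so the run lasts long enough to collect samples.

```r
# Hypothetical pipeline with one deliberately slow step.
slow_step <- function(x) {
  out <- numeric(0)
  for (v in x) out <- c(out, sqrt(v))  # grows the result, copying it every iteration
  out
}
fast_step <- function(x) sqrt(x)       # vectorized version of the same computation
pipeline <- function(x) sum(slow_step(x)) + sum(fast_step(x))

set.seed(1)
x <- runif(2e4)

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)  # start sampling the call stack every 10 ms
result <- pipeline(x)
Rprof(NULL)                        # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total)  # slow_step should account for most of the total time
```

Once the profile points at the slow step, the next move suggested above is to try another R formulation (here fast_step()) and check that it gives the same answer, before considering C.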


If I'm given code already written in C (or some other language), unless
it's really simple, I may link to it rather than recode it in R.
However, the problems with portability, maintainability, transparency to
others who may not be very facile with C, etc., all suggest that it's
well worth some effort experimenting with alternate ways of doing the
same thing in R before jumping to C or something else.
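As a sketch of experimenting with alternate ways of doing the same thing, the hypothetical functions below compute the same vector of squares three ways. The point is only to confirm that the variants agree and then compare their run times with system.time() on your own machine; no particular timings are implied.

```r
# Three ways of computing (1:n)^2 (hypothetical example).
grow <- function(n) {          # grows the result: copies it on every iteration
  out <- numeric(0)
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}
prealloc <- function(n) {      # allocates once, then fills in place
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}
vectorized <- function(n) seq_len(n)^2  # one call into compiled arithmetic

n <- 1e4
stopifnot(identical(grow(n), prealloc(n)), identical(prealloc(n), vectorized(n)))

# Elapsed seconds for each variant; results depend on the machine.
sapply(list(grow = grow, prealloc = prealloc, vectorized = vectorized),
       function(f) system.time(f(n))[["elapsed"]])
```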

Hope this helps.
Spencer



On 1/17/2011 10:57 AM, David Henderson wrote:

I think we're also forgetting something, namely testing. If you write
your routine in C, you have placed an additional burden upon yourself to
test your C code through unit tests, etc. If you write your code in R,
you still need the unit tests, but you can rely on the well-tested nature
of R to reduce the number of tests of your algorithm. I routinely tell
people at Sage Bionetworks, where I am working now, that new C code
needs to deliver at least an order-of-magnitude increase in performance
to warrant the effort of moving from R to C.
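One way to apply this rule of thumb is to time both versions on realistic input and compare the ratio with 10. In the sketch below, the built-in cumsum() (itself implemented in C) stands in for "the new C code" and cumsum_r() is a hypothetical plain-R version; the threshold of 10 comes from the message above, and measured timings will vary between machines and runs.

```r
# Plain-R running sum (hypothetical "existing R version").
cumsum_r <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}

set.seed(42)
x <- runif(1e6)

# Check that both give the same answer before comparing speed.
# (all.equal, not identical: the C code may accumulate with extra precision.)
stopifnot(all.equal(cumsum_r(x), cumsum(x)))

reps <- 5
t_r <- system.time(for (k in seq_len(reps)) cumsum_r(x))[["elapsed"]]
t_c <- system.time(for (k in seq_len(reps)) cumsum(x))[["elapsed"]]

speedup <- t_r / max(t_c, 0.001)  # guard against a timer reading of zero
worth_porting <- speedup >= 10     # the order-of-magnitude threshold
cat(sprintf("speedup: %.1fx, worth porting: %s\n", speedup, worth_porting))
```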

But, then again, I am working with scientists who are not primarily, or
even secondarily, coders...

Dave H

This makes sense, but I have seen some very transparent algorithms
turned into vectorized R code that is difficult to read (and thus to
maintain or to change). These chunks of optimized R code are like
embedded assembly, in the sense that nobody is likely to want to mess
with them. This could be addressed by including pseudo code for the
original (more transparent) algorithm as a comment, but I have never
seen this done in practice (perhaps it could be enforced by R CMD
check?!).
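A minimal sketch of that practice: keep the transparent algorithm as a comment (and, optionally, as a small reference function) next to the vectorized version. moving_mean() and moving_mean_ref() are hypothetical names chosen for this illustration.

```r
# Moving (running) mean of x over windows of width k.
#
# Transparent algorithm, kept as a comment for maintainers:
#   for i in k..n:
#     out[i - k + 1] <- mean(x[(i - k + 1):i])
#
# Vectorized version: the sum over each window is the difference of two
# cumulative sums, so all windows are computed at once.
moving_mean <- function(x, k) {
  stopifnot(k >= 1, k <= length(x))
  n <- length(x)
  cs <- cumsum(c(0, x))
  (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k
}

# Reference implementation of the commented algorithm (handy in tests/).
moving_mean_ref <- function(x, k) {
  n <- length(x)
  out <- numeric(n - k + 1)
  for (i in k:n) out[i - k + 1] <- mean(x[(i - k + 1):i])
  out
}

x <- c(2, 4, 6, 8, 10)
moving_mean(x, 2)  # 3 5 7 9
stopifnot(all.equal(moving_mean(x, 2), moving_mean_ref(x, 2)))
```

Keeping the reference function executable, rather than only as a comment, lets a test catch the case where the comment and the optimized code drift apart.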

On the other hand, in principle a well-documented piece of C/C++ code
could be much easier to understand, without paying a performance
penalty...but "coders" are not likely to place this high on their list
of priorities.

The bottom line is that R is an adaptor ("glue") language like Lisp
that makes it easy to mix and match functions (using classes and generic
functions), many of which are written in C (or C++ or Fortran) for
performance reasons. Like any object-based system, there can be a lot of
object copying, and like any functional programming system, there can be
a lot of function calls, resulting in poor performance for some
applications.

If you can vectorize your R code then you have effectively found a way
to benefit from somebody else's C code, thus saving yourself some time.
For operations other than pure vector calculations you will have to do
the C/C++ programming yourself (or call a library that somebody else has
written).
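For instance, the row sums of a matrix can be computed with an explicit R loop, with apply(), or with rowSums(), which does the looping in compiled code inside R itself. The matrix below is an arbitrary example; all three give the same result, but only the last hands the whole loop to C.

```r
m <- matrix(as.numeric(1:12), nrow = 3)  # arbitrary 3 x 4 example matrix

# 1. Explicit loop in R: one interpreted iteration per row
s_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) s_loop[i] <- sum(m[i, ])

# 2. apply(): tidier, but still calls sum() once per row from R
s_apply <- apply(m, 1, sum)

# 3. rowSums(): a single call; the loop over rows runs in compiled code
s_vec <- rowSums(m)

stopifnot(all.equal(s_loop, s_apply), all.equal(s_apply, s_vec))
s_vec  # 22 26 30
```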

Dominick

----- Original Message ----
From: Dirk Eddelbuettel<[email protected]>
To: Patrick Leyshock<[email protected]>
Cc: [email protected]
Sent: Mon, January 17, 2011 10:13:36 AM
Subject: Re: [Rd] R vs. C

On 17 January 2011 at 09:13, Patrick Leyshock wrote:
| A question, please, about development of R packages:
|
| Are there any guidelines or best practices for deciding when and why to
| implement an operation in R, vs. implementing it in C? The "Writing R
| Extensions" manual recommends "working in interpreted R code . . . this
| is normally the best option." But we do write C-functions and access
| them in R - the question is, when/why is this justified, and when/why is
| it NOT justified?
|
| While I have identified helpful documents on R coding standards, I have
| not seen notes/discussions on when/why to implement in R, vs. when to
| implement in C.

The (still fairly recent) book 'Software for Data Analysis: Programming
with R' by John Chambers (Springer, 2008) has a lot to say about this.
John also gave a talk in November which stressed 'multilanguage'
approaches; see e.g.

http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html

In short, it all depends, and it is unlikely that you will get a coherent
answer that is valid for all circumstances. We all love R for how
expressive and powerful it is, yet there are times when something else
is called for. Exactly when that time is depends on a great many things
and you have not mentioned a single metric in your question. So I'd
start with John's book.

Hope this helps, Dirk

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



--
Patrick Burns
[email protected]
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')
