Re: [Rd] R vs. C now rather: how to ease package checking

2011-01-18 Thread Claudia Beleites

On 01/18/2011 01:13 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves
spencer.gra...@structuremonitoring.com  wrote:


Hi, Dominick, et al.:


  Demanding complete unit test suites with all software contributed to
CRAN would likely cut contributions by a factor of 10 or 100.  For me, the R
package creation process is close to perfection in providing a standard
process for documentation with places for examples and test suites of
various kinds.  I mention perfection, because it makes developing
trustworthy software (Chambers's prime directive) relatively easy without
forcing people to do things they don't feel comfortable doing.



I don't think I made myself clear, sorry. I was not suggesting that package
developers include a complete unit
test suite. I was suggesting that unit testing should be done outside of the
CRAN release process. Packages
should be submitted for release to CRAN after they have been tested (the
responsibility of the package
developers). I understand that the main problem here is that package
developers do not have access to
all supported platforms, so the current process is not likely to change.


Regarding access to all platforms: But there's r-forge where building and checks 
are done nightly for Linux, Win, and Mac (though for some months now the check 
protocols are not available for 32 bit Linux and Windows - but I hope they'll be 
back soon).

I found it extremely easy to get an account and project space, and to start building.
Many thanks to r-forge!

Regarding complete unit test suites:
To me, it seems nicer to favour packages that provide them than to enforce
them mechanically, e.g. by showing icons that announce whether a package comes
with a vignette, a test suite (with code coverage), etc.


My 2 ct,

Claudia




Dominick




  If you need more confidence in the software you use, you can build
your own test suites -- maybe in packages you write yourself -- or pay
someone else to develop test suites to your specifications.  For example,
Revolution Analytics offers Package validation, development and support.


   Spencer



On 1/17/2011 3:27 PM, Dominick Samperi wrote:


On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves
spencer.gra...@structuremonitoring.com   wrote:

  Hi, Paul:



  The Writing R Extensions manual says that *.R code in a tests
directory is run during R CMD check.  I suspect that many R programmers
do this routinely.  I probably should do that also.  However, for me, it's
simpler to have everything in the examples section of *.Rd files.  I
think the examples with independently developed answers provide useful
documentation.
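
A minimal sketch of what such a file in the tests directory might contain. The function myfunc() is a hypothetical stand-in, defined inline so the snippet runs on its own; in a real package it would come from loading the package, and a file name such as tests/test-myfunc.R is likewise only an illustration. R CMD check runs the script and reports a failure if it stops with an error:

```r
# Hypothetical stand-in for a package function; in a real package
# this would be made available by loading the package instead.
myfunc <- function(x = c(1, 2, 3, 4)) {
  sum(x^2) / length(x)   # mean of squares
}

A1 <- myfunc()                # value computed by the function
A0 <- (1 + 4 + 9 + 16) / 4    # same answer worked out by hand

# stopifnot() raises an error unless the two values agree
# up to numerical tolerance.
stopifnot(isTRUE(all.equal(A1, A0)))
```

Wrapping all.equal() in isTRUE() is a design choice here: all.equal() returns a character description rather than FALSE when the values differ, and isTRUE() turns that into a plain logical.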

  This is a unit test function, and I think it would be better if there
was a way to unit test packages *before* they are released to CRAN.
Otherwise, this is not really a release, it is a test or beta version.
This is currently possible under Windows using
http://win-builder.r-project.org/, for example.

My earlier remark about the release process was more about documentation
than about unit testing, more about the gentle nudging that the R release
process does to help ensure consistent documentation and organization,
and about how this nudging might be extended to the C/C++ part of a
package.

Dominick


   Spencer




On 1/17/2011 1:52 PM, Paul Gilbert wrote:

  Spencer


Would it not be easier to include this kind of test in a small file in
the tests/ directory?

Paul

-Original Message-
From: r-devel-boun...@r-project.org [mailto:
r-devel-boun...@r-project.org]
On Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
process.  I've found this so valuable that I created a Wikipedia entry
by that name and made additions to a Wikipedia entry on software
repository, noting that this process encourages good software
development practices that I have not seen standardized for other
languages.  I encourage people to review this material and make
additions or corrections as they like (or send me suggestions for me to
make appropriate changes).


While R has other capabilities for unit and regression testing, I
often include unit tests in the examples section of documentation
files.  To keep from cluttering the examples with unnecessary material,
I often include something like the following:


A1 <- myfunc() # to test myfunc

A0 <- (manual generation of the correct answer for A1)

\dontshow{stopifnot(} # so the user doesn't see stopifnot(
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on stopifnot(.


This may not be as good in some ways as a full suite of unit
tests, which could be provided separately.  However, this has the
distinct advantage of including unit tests with the documentation in a
way that should help users understand myfunc.  (Unit tests too
detailed to show users could be completely enclosed in \dontshow.)

Re: [Rd] R vs. C

2011-01-18 Thread Patrick Burns

I'm not at all a fan of thinking
of the examples as being tests.

Examples should clarify the thinking
of potential users.  Tests should
clarify the space in which the code
is correct.  These two goals are
generally at odds.

On 17/01/2011 22:15, Spencer Graves wrote:

Hi, Paul:


The Writing R Extensions manual says that *.R code in a tests
directory is run during R CMD check. I suspect that many R programmers
do this routinely. I probably should do that also. However, for me, it's
simpler to have everything in the examples section of *.Rd files. I
think the examples with independently developed answers provide useful
documentation.


Spencer


On 1/17/2011 1:52 PM, Paul Gilbert wrote:

Spencer

Would it not be easier to include this kind of test in a small file in
the tests/ directory?

Paul

-Original Message-
From: r-devel-boun...@r-project.org
[mailto:r-devel-boun...@r-project.org] On Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
process. I've found this so valuable that I created a Wikipedia entry
by that name and made additions to a Wikipedia entry on software
repository, noting that this process encourages good software
development practices that I have not seen standardized for other
languages. I encourage people to review this material and make
additions or corrections as they like (or send me suggestions for me to
make appropriate changes).


While R has other capabilities for unit and regression testing, I
often include unit tests in the examples section of documentation
files. To keep from cluttering the examples with unnecessary material,
I often include something like the following:


A1 <- myfunc() # to test myfunc

A0 <- (manual generation of the correct answer for A1)

\dontshow{stopifnot(} # so the user doesn't see stopifnot(
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on stopifnot(.


This may not be as good in some ways as a full suite of unit
tests, which could be provided separately. However, this has the
distinct advantage of including unit tests with the documentation in a
way that should help users understand myfunc. (Unit tests too
detailed to show users could be completely enclosed in \dontshow.)


Spencer


On 1/17/2011 11:38 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
spencer.gra...@structuremonitoring.com wrote:


Another point I have not yet seen mentioned: If your code is
painfully slow, that can often be fixed without leaving R by
experimenting with different ways of doing the same thing -- often
after profiling your code to find the slowest part, as described in
chapter 3 of Writing R Extensions.
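
A minimal sketch of that profiling step, using R's sampling profiler (Rprof and summaryRprof from the utils package). The function slow_cumsum() is a hypothetical example of deliberately slow code, and the input size is an arbitrary choice meant to make the run long enough to collect samples; the summary call is wrapped in tryCatch in case a very fast machine records none:

```r
# Hypothetical slow function: grows a vector one element at a time.
slow_cumsum <- function(x) {
  out <- numeric(0)
  total <- 0
  for (xi in x) {
    total <- total + xi
    out <- c(out, total)   # repeated reallocation is the likely hot spot
  }
  out
}

x <- runif(3e4)
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)           # start the sampling profiler
res <- slow_cumsum(x)
Rprof(NULL)                # stop profiling

# Summarise time spent per function; NULL if no samples were recorded.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))
```

Once the slowest part is known, it can be replaced by an alternative written in R (here, the built-in cumsum) before any move to C is considered.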


If I'm given code already written in C (or some other language),
unless it's really simple, I may link to it rather than recode it in R.
However, the problems with portability, maintainability, transparency
to others who may not be very facile with C, etc., all suggest that
it's well worth some effort experimenting with alternate ways of doing
the same thing in R before jumping to C or something else.

Hope this helps.
Spencer



On 1/17/2011 10:57 AM, David Henderson wrote:


I think we're also forgetting something, namely testing. If you write
your routine in C, you have placed additional burden upon yourself to
test your C code through unit tests, etc. If you write your code in R,
you still need the unit tests, but you can rely on the well tested
nature of R to allow you to reduce the number of tests of your
algorithm. I routinely tell people at Sage Bionetworks, where I am
working now, that your new C code needs to experience at least one
order of magnitude increase in performance to warrant the effort of
moving from R to C.

But, then again, I am working with scientists who are not primarily,
or even secondarily, coders...

Dave H



This makes sense, but I have seen some very transparent algorithms
turned into vectorized R code that is difficult to read (and thus to
maintain or to change). These chunks of optimized R code are like
embedded assembly, in the sense that nobody is likely to want to mess
with it. This could be addressed by including pseudo code for the
original (more transparent) algorithm as a comment, but I have never
seen this done in practice (perhaps it could be enforced by R CMD
check?!).
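
As an illustration of the pseudo-code-as-comment idea, here is a small sketch. The task (filling each NA with the most recent non-NA value) and the function names locf and locf_loop are hypothetical choices for this example, not taken from any package discussed here:

```r
# Fill each NA with the most recent non-NA value
# ("last observation carried forward").
#
# Pseudo code of the transparent algorithm this replaces:
#   last <- NA
#   for i in 1..n:
#     if x[i] is not NA: last <- x[i]
#     result[i] <- last
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # index of latest non-NA so far
  idx[idx == 0] <- NA                      # nothing seen yet: stay NA
  x[idx]
}

# Loop version, kept as a reference implementation for testing.
locf_loop <- function(x) {
  last <- NA
  for (i in seq_along(x)) {
    if (!is.na(x[i])) last <- x[i]
    x[i] <- last
  }
  x
}

x <- c(NA, 1, NA, NA, 5, NA)
stopifnot(identical(locf(x), locf_loop(x)))
```

Keeping the loop version around as a reference also gives the vectorized code something independent to be tested against.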

On the other hand, in principle a well-documented piece of C/C++ code
could be much easier to understand, without paying a performance
penalty...but coders are not likely to place this high on their list of
priorities.

The bottom line is that R is an adaptor (glue) language like Lisp that
makes it easy to mix and match functions (using classes and generic
functions), many of which are written in C (or C++ or Fortran) for
performance reasons. Like any object-based system there can be a lot of
object

Re: [Rd] R vs. C

2011-01-18 Thread Claudia Beleites

On 01/18/2011 10:53 AM, Patrick Burns wrote:

I'm not at all a fan of thinking
of the examples as being tests.

Examples should clarify the thinking
of potential users. Tests should
clarify the space in which the code
is correct. These two goals are
generally at odds.


Patrick, I completely agree with you that
- Tests should not clutter the documentation and go to their proper place.
- Examples are there for the user's benefit - and must be written accordingly.
- Often, tests should cover far more situations than good examples.

Yet it seems to me that (part of the) examples are justly considered a (small) 
subset of the tests:
As a potential user, I request two things from good examples that have an
implicit testing message/side effect:
- I like the examples to roughly outline the space in which the code works: they 
should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of the 
correctness for some example calculation.

(I don't want to see all further tests - I can look them up if I feel the need)

The fact that the very same line of example code serves a testing (side) purpose 
 doesn't mean that it should be copied into the tests, does it?


Thus, I think of the public part (the preface) of the tests living in the 
examples.


My 2 ct,
Best regards,

Claudia






Re: [Rd] R vs. C

2011-01-18 Thread Patrick Burns

Claudia,

I think we agree.

Having the examples run in the
tests is a good thing, I think.
They might strengthen the tests
some (especially if there are
no other tests).  But mainly if
examples don't work, then it's
hard to have much faith in the
code.

On 18/01/2011 11:36, Claudia Beleites wrote:

On 01/18/2011 10:53 AM, Patrick Burns wrote:

I'm not at all a fan of thinking
of the examples as being tests.

Examples should clarify the thinking
of potential users. Tests should
clarify the space in which the code
is correct. These two goals are
generally at odds.


Patrick, I completely agree with you that
- Tests should not clutter the documentation and go to their proper place.
- Examples are there for the user's benefit - and must be written
accordingly.
- Often, tests should cover far more situations than good examples.

Yet it seems to me that (part of the) examples are justly considered a
(small) subset of the tests:
As a potential user, I request two things from good examples that have an
implicit testing message/side effect:
- I like the examples to roughly outline the space in which the code
works: they should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of
the correctness for some example calculation.
(I don't want to see all further tests - I can look them up if I feel
the need)

The fact that the very same line of example code serves a testing (side)
purpose doesn't mean that it should be copied into the tests, does it?

Thus, I think of the public part (the preface) of the tests living
in the examples.

My 2 ct,
Best regards,

Claudia






Re: [Rd] R vs. C now rather: how to ease package checking

2011-01-18 Thread Dominick Samperi
On Tue, Jan 18, 2011 at 4:48 AM, Claudia Beleites cbelei...@units.it wrote:


 Regarding access to all platforms: But there's r-forge where building and
 checks are done nightly for Linux, Win, and Mac (though for some months now
 the check protocols are not available for 32 bit Linux and Windows - but I
 hope they'll be back soon).
 I found it extremely easy to get an account and project space, and to start building.
 Many thanks to r-forge!


Good point Claudia,

There are packages released to CRAN that
do not build on some platforms because the unit tests fail. It seems to me
that this kind of issue could be ironed out with the help of r-forge before
release, in which case there is no need to run the unit tests for released
packages.

Dominick


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C now rather: how to ease package checking

2011-01-18 Thread Spencer Graves

On 1/18/2011 8:44 AM, Dominick Samperi wrote:


Good point Claudia,

There are packages released to CRAN that
do not build on some platforms because the unit tests fail. It seems to me
that this kind of issue could be ironed out with the help of r-forge before
release, in which case there is no need to run the unit tests for released
packages.

Dominick


CRAN also runs R CMD check on its contributed packages.  I've found
(and fixed) problems that I couldn't replicate myself by reviewing the
repeated checks on both R-Forge and CRAN.



Spencer







--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567



Re: [Rd] R vs. C

2011-01-18 Thread David Henderson
Tests and examples are different things.  The fact that your example runs
only means that your code does not bomb on execution and not that it runs
correctly.  Plus, the code in examples is meant as an aid to the user; a way
to help them understand how to use your code.  Proper tests are there to make
sure your code executes properly and computes things correctly.
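
A small sketch of that distinction. The function sample_var() is hypothetical and contains a deliberate bug (it divides by n instead of n - 1); as an example it runs without complaint, and only a comparison against an independently obtained answer exposes the problem:

```r
# Hypothetical function with a deliberate bug: it divides by n instead
# of n - 1, so it does not compute the sample variance it claims to.
sample_var <- function(x) {
  sum((x - mean(x))^2) / length(x)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# As an example this "works": it runs and prints a plausible number.
print(sample_var(x))

# As a test it is compared with an independent answer (here the
# built-in var), and the bug shows up as a failed comparison.
check <- all.equal(sample_var(x), var(x))
print(isTRUE(check))
```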




  


[Rd] R vs. C

2011-01-17 Thread Patrick Leyshock
A question, please, about development of R packages:

Are there any guidelines or best practices for deciding when and why to
implement an operation in R, vs. implementing it in C?  The "Writing R
Extensions" manual recommends "working in interpreted R code . . . this is
normally the best option".  But we do write C functions and access them in
R - the question is, when/why is this justified, and when/why is it NOT
justified?

While I have identified helpful documents on R coding standards, I have not
seen notes/discussions on when/why to implement in R, vs. when to implement
in C.

Thanks, Patrick




Re: [Rd] R vs. C

2011-01-17 Thread Patrick Burns

Everyone has their own utility
function.  Mine is whether the boredom
of waiting for the pure R function
to finish is going to outweigh the
boredom of writing the C code.

Another issue is that adding C code
increases the hassle of users who might
want the code to run on different
architectures.





--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')



Re: [Rd] R vs. C

2011-01-17 Thread Duncan Murdoch

On 17/01/2011 12:41 PM, Patrick Burns wrote:

Everyone has their own utility
function.  Mine is whether the boredom
of waiting for the pure R function
to finish is going to outweigh the
boredom of writing the C code.

Another issue is that adding C code
increases the hassle of users who might
want the code to run on different
architectures.


... and also makes it harder for you and your users to tweak your code 
for different uses.


It is not uncommon for C code to run 100 times faster than R code (but 
it is also not uncommon to see very little speedup, if the R code is 
well vectorized).  So if you have something that's really slow, think 
about the fundamental operations, and write those in C, then use R code 
to glue them together.  But if it is fast enough without doing that, 
then leave it all in R.
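
A rough sketch of how one might check whether R code is already "well vectorized" before reaching for C. The functions ss_loop and ss_vec are hypothetical examples; the timings depend on the machine and R version, so no particular speedup is claimed:

```r
# Sum of squares computed element by element in interpreted R.
ss_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi * xi
  total
}

# Same quantity using vectorized operations, whose inner loops
# already run in compiled code inside R.
ss_vec <- function(x) sum(x * x)

x <- runif(1e6)
t_loop <- system.time(r_loop <- ss_loop(x))[["elapsed"]]
t_vec  <- system.time(r_vec  <- ss_vec(x))[["elapsed"]]

# Both versions must agree before their speeds are worth comparing.
stopifnot(isTRUE(all.equal(r_loop, r_vec)))
cat("loop:", t_loop, "s  vectorized:", t_vec, "s\n")
```

If the vectorized version is fast enough, there is little left for C to gain; if no vectorized formulation exists, the loop body is the candidate "fundamental operation" to write in C.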


Duncan Murdoch




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread Dirk Eddelbuettel

On 17 January 2011 at 09:13, Patrick Leyshock wrote:
| A question, please about development of R packages:
| 
| Are there any guidelines or best practices for deciding when and why to
| implement an operation in R, vs. implementing it in C?  The Writing R
| Extensions recommends working in interpreted R code . . . this is normally
| the best option.  But we do write C-functions and access them in R - the
| question is, when/why is this justified, and when/why is it NOT justified?
| 
| While I have identified helpful documents on R coding standards, I have not
| seen notes/discussions on when/why to implement in R, vs. when to implement
| in C.

The (still fairly recent) book 'Software for Data Analysis: Programming with
R' by John Chambers (Springer, 2008) has a lot to say about this.  John also
gave a talk in November which stressed 'multilanguage' approaches; see e.g.
http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html

In short, it all depends, and it is unlikely that you will get a coherent
answer that is valid for all circumstances.  We all love R for how expressive
and powerful it is, yet there are times when something else is called for.
Exactly when that time is depends on a great many things and you have not
mentioned a single metric in your question.  So I'd start with John's book.

Hope this helps, Dirk

-- 
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread David Henderson
I think we're also forgetting something, namely testing.  If you write your 
routine in C, you have placed additional burden upon yourself to test your C 
code through unit tests, etc.  If you write your code in R, you still need the 
unit tests, but you can rely on the well tested nature of R to allow you to 
reduce the number of tests of your algorithm.  I routinely tell people at Sage 
Bionetworks where I am working now that your new C code needs to experience at 
least one order of magnitude increase in performance to warrant the effort of 
moving from R to C.

But, then again, I am working with scientists who are not primarily, or even 
secondarily, coders...

Dave H



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread Spencer Graves
  Another point I have not yet seen mentioned:  If your code is 
painfully slow, that can often be fixed without leaving R by 
experimenting with different ways of doing the same thing -- often after 
profiling your code to find the slowest part, as described in 
chapter 3 of Writing R Extensions.
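
A minimal sketch of that workflow, using the Rprof() and summaryRprof() functions from the utils package. The slow function and the input size are invented for illustration; the point is only the profile, inspect, retry cycle.

```r
# Hypothetical slow function: grows a vector one element at a time,
# which forces repeated copying.
running_total_slow <- function(x) {
  out <- numeric(0)
  total <- 0
  for (xi in x) {
    total <- total + xi
    out <- c(out, total)
  }
  out
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)                         # start the sampling profiler
res <- running_total_slow(runif(2e4))
Rprof(NULL)                              # stop profiling

# summaryRprof() needs at least one recorded sample, so guard against a
# run that finished too quickly to be sampled.
if (file.exists(prof_file) && length(readLines(prof_file)) > 1) {
  print(summaryRprof(prof_file)$by.self)
}

# Once the hot spot is known, try an alternative; here the built-in cumsum().
x <- runif(1000)
stopifnot(isTRUE(all.equal(running_total_slow(x), cumsum(x))))
```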



  If I'm given code already written in C (or some other language), 
unless it's really simple, I may link to it rather than recode it in R.  
However, the problems with portability, maintainability, transparency to 
others who may not be very facile with C, etc., all suggest that it's 
well worth some effort experimenting with alternate ways of doing the 
same thing in R before jumping to C or something else.



  Hope this helps.
  Spencer


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread Barry Rowlingson
On Mon, Jan 17, 2011 at 6:57 PM, David Henderson dnadav...@yahoo.com wrote:
 I think we're also forgetting something, namely testing.  If you write your
 routine in C, you have placed additional burden upon yourself to test your C
 code through unit tests, etc.  If you write your code in R, you still need the
 unit tests, but you can rely on the well tested nature of R to allow you to
 reduce the number of tests of your algorithm.  I routinely tell people at Sage
 Bionetworks where I am working now that your new C code needs to experience at
 least one order of magnitude increase in performance to warrant the effort of
 moving from R to C.

 But, then again, I am working with scientists who are not primarily, or even
 secondarily, coders...

If you write your code in C but interface to it in R, you can use the
same R test harness system. I recently coded something up in R, tested
it on small data, discovered it was waaay too slow on the real data,
rewrote the likelihood calculation in C, and then used the same test
set to make sure it was giving the same answers as the R code. It
wasn't. So I fixed that bug until it was. If I'd written the thing in
C to start with I might not have spotted it.
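
The idea can be sketched as follows. This is not the code described above: the likelihood, the function names, and the test cases are assumptions, and a closed-form R function stands in for the C routine (in a real package it would be a thin wrapper around .Call() or .C()).

```r
# Reference implementation: slow but easy to verify (normal log-likelihood).
loglik_ref <- function(x, mu, sigma) {
  total <- 0
  for (xi in x) {
    total <- total + dnorm(xi, mean = mu, sd = sigma, log = TRUE)
  }
  total
}

# Stand-in for the fast version. In a real package this would call the
# compiled routine; here it is a closed-form R expression.
loglik_fast <- function(x, mu, sigma) {
  n <- length(x)
  -n * log(sigma) - 0.5 * n * log(2 * pi) - sum((x - mu)^2) / (2 * sigma^2)
}

# One shared test set, reused for every implementation.
set.seed(1)
cases <- list(
  list(x = rnorm(10),        mu = 0, sigma = 1),
  list(x = rnorm(100, 5, 2), mu = 5, sigma = 2),
  list(x = 1.5,              mu = 1, sigma = 0.5)
)

for (case in cases) {
  stopifnot(isTRUE(all.equal(do.call(loglik_ref, case),
                             do.call(loglik_fast, case))))
}
```

Keeping the slow version around costs little and gives the fast one something independent to be checked against.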

 Sometimes writing a prototype in R is a useful testing tool even when
you know it'll be too slow - as an interpreted language R gives you a
rapid development cycle and handy interactive debugging possibilities.
Things that do exist in C but require compilation

Barry

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread Dominick Samperi
On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves 
spencer.gra...@structuremonitoring.com wrote:

  Another point I have not yet seen mentioned:  If your code is
 painfully slow, that can often be fixed without leaving R by experimenting
 with different ways of doing the same thing -- often after using profiling
 your code to find the slowest part as described in chapter 3 of Writing R
 Extensions.


  If I'm given code already written in C (or some other language),
 unless it's really simple, I may link to it rather than recode it in R.
  However, the problems with portability, maintainability, transparency to
 others who may not be very facile with C, etc., all suggest that it's well
 worth some effort experimenting with alternate ways of doing the same thing
 in R before jumping to C or something else.


  Hope this helps.
  Spencer



 On 1/17/2011 10:57 AM, David Henderson wrote:

 I think we're also forgetting something, namely testing.  If you write your
 routine in C, you have placed additional burden upon yourself to test your C
 code through unit tests, etc.  If you write your code in R, you still need the
 unit tests, but you can rely on the well tested nature of R to allow you to
 reduce the number of tests of your algorithm.  I routinely tell people at Sage
 Bionetworks where I am working now that your new C code needs to experience at
 least one order of magnitude increase in performance to warrant the effort of
 moving from R to C.

 But, then again, I am working with scientists who are not primarily, or even
 secondarily, coders...

 Dave H


This makes sense, but I have seen some very transparent algorithms turned
into vectorized R code that is difficult to read (and thus to maintain or to
change). These chunks of optimized R code are like embedded assembly, in the
sense that nobody is likely to want to mess with them. This could be addressed
by including pseudo code for the original (more transparent) algorithm as a
comment, but I have never seen this done in practice (perhaps it could be
enforced by R CMD check?!).
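
One way the pseudo-code-as-comment idea might look is sketched below. The task (counting sign changes) and both function names are made up for illustration; the loop version is included only as a cross-check.

```r
# Vectorized version: compact, but the intent is not obvious at a glance.
#
# Original (transparent) algorithm, kept as a comment:
#   changes <- 0
#   for i in 2..length(x):
#     if sign(x[i]) differs from sign(x[i - 1]): changes <- changes + 1
#   return changes
count_sign_changes <- function(x) {
  sum(diff(sign(x)) != 0)
}

# The same algorithm written out as a loop, usable as a cross-check.
count_sign_changes_loop <- function(x) {
  changes <- 0
  for (i in seq_along(x)[-1]) {
    if (sign(x[i]) != sign(x[i - 1])) changes <- changes + 1
  }
  changes
}

x <- c(1.5, 2, -0.3, -4, 0, 2.2, 3)
stopifnot(count_sign_changes(x) == count_sign_changes_loop(x))
```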

On the other hand, in principle a well-documented piece of C/C++ code could
be much easier to understand, without paying a performance penalty...but
coders are not likely to place this high on their list of priorities.

The bottom line is that R is an adaptor (glue) language like Lisp that makes
it easy to mix and match functions (using classes and generic functions), many
of which are written in C (or C++ or Fortran) for performance reasons. Like any
object-based system there can be a lot of object copying, and like any
functional programming system, there can be a lot of function calls, resulting
in poor performance for some applications.

If you can vectorize your R code then you have effectively found a way to
benefit from somebody else's C code, thus saving yourself some time. For
operations other than pure vector calculations you will have to do the C/C++
programming yourself (or call a library that somebody else has written).
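
A small sketch of the "call a library that somebody else has written" option, assuming a simple exponential-smoothing recurrence chosen purely for illustration. Each output depends on the previous one, so it is not a pure vector calculation, yet stats::filter() with method = "recursive" already implements this kind of recurrence in compiled code.

```r
# Exponential smoothing: y[i] = a * x[i] + (1 - a) * y[i - 1], with y[0] = 0.
# Written as an explicit loop because each value depends on the previous one.
smooth_loop <- function(x, a) {
  y <- numeric(length(x))
  prev <- 0
  for (i in seq_along(x)) {
    prev <- a * x[i] + (1 - a) * prev
    y[i] <- prev
  }
  y
}

# The same recurrence through stats::filter(); the recursive method computes
# y[i] = input[i] + coef * y[i - 1], starting from zero by default.
smooth_filter <- function(x, a) {
  as.numeric(stats::filter(a * x, filter = 1 - a, method = "recursive"))
}

set.seed(1)
x <- rnorm(1000)
stopifnot(isTRUE(all.equal(smooth_loop(x, 0.2), smooth_filter(x, 0.2))))
```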

Dominick





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread Spencer Graves
  For me, a major strength of R is the package development 
process.  I've found this so valuable that I created a Wikipedia entry 
by that name and made additions to a Wikipedia entry on software 
repository, noting that this process encourages good software 
development practices that I have not seen standardized for other 
languages.  I encourage people to review this material and make 
additions or corrections as they like (or send me suggestions for me to 
make appropriate changes).



  While R has other capabilities for unit and regression testing, I 
often include unit tests in the examples section of documentation 
files.  To keep from cluttering the examples with unnecessary material, 
I often include something like the following:



A1 <- myfunc() # to test myfunc

A0 <- (manual generation of the correct answer for A1)

\dontshow{stopifnot(} # so the user doesn't see stopifnot(
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on stopifnot(.


  This may not be as good in some ways as a full suite of unit 
tests, which could be provided separately.  However, this has the 
distinct advantage of including unit tests with the documentation in a 
way that should help users understand myfunc.  (Unit tests too 
detailed to show users could be completely enclosed in \dontshow.)
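
For reference, here is roughly what R ends up executing when examples of that shape are run, written as plain R. The body of myfunc and the numbers are hypothetical, chosen only so the check can be worked out by hand.

```r
# Hypothetical function under test: running mean.
myfunc <- function(x) cumsum(x) / seq_along(x)

A1 <- myfunc(c(2, 4, 9))      # output to be checked
A0 <- c(2, 3, 5)              # answer worked out by hand: 2/1, 6/2, 15/3

# all.equal() returns TRUE or a character description of the differences,
# so stopifnot() fails unless the two agree within numerical tolerance.
stopifnot(all.equal(A1, A0))
```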



  Spencer



Re: [Rd] R vs. C

2011-01-17 Thread Dominick Samperi

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] R vs. C

2011-01-17 Thread Paul Gilbert
Spencer

Would it not be easier to include this kind of test in a small file in the 
tests/ directory?

Paul


Re: [Rd] R vs. C

2011-01-17 Thread Spencer Graves

Hi, Paul:


  The Writing R Extensions manual says that *.R code in a tests 
directory is run during R CMD check.  I suspect that many R 
programmers do this routinely.  I probably should do that also.  
However, for me, it's simpler to have everything in the examples 
section of *.Rd files.  I think the examples with independently 
developed answers provide useful documentation.
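
For comparison, a file in the tests directory might look something like the sketch below. The file name, the package name, and myfunc itself are hypothetical; in a real package the function would come from library(mypkg) rather than being defined inline. A script that stops with an error causes the check to fail.

```r
# Sketch of a file such as tests/myfunc-tests.R (hypothetical name).
# The function is defined inline only so this sketch runs on its own.
myfunc <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  cumsum(x) / seq_along(x)
}

# Ordinary case, checked against a hand-computed answer.
stopifnot(isTRUE(all.equal(myfunc(c(2, 4, 9)), c(2, 3, 5))))

# Edge case: empty input gives empty output.
stopifnot(length(myfunc(numeric(0))) == 0)

# Error case: invalid input must signal an error rather than return a value.
err <- tryCatch(myfunc("a"), error = function(e) e)
stopifnot(inherits(err, "error"))
```

Edge and error cases like the last two are the kind of checks that would clutter an examples section but fit naturally in a separate test file.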



  Spencer


On 1/17/2011 1:52 PM, Paul Gilbert wrote:

Spencer

Would it not be easier to include this kind of test in a small file in the 
tests/ directory?

Paul

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
process.  I've found this so valuable that I created a Wikipedia entry
by that name and made additions to a Wikipedia entry on software
repository, noting that this process encourages good software
development practices that I have not seen standardized for other
languages.  I encourage people to review this material and make
additions or corrections as they like (or sent me suggestions for me to
make appropriate changes).


While R has other capabilities for unit and regression testing, I
often include unit tests in the examples section of documentation
files.  To keep from cluttering the examples with unnecessary material,
I often include something like the following:


A1- myfunc() # to test myfunc

A0- (manual generation of the correct  answer for A1)

\dontshow{stopifnot(} # so the user doesn't see stopifnot(
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on stopifnot(.


This may not be as good in some ways as a full suite of unit
tests, which could be provided separately.  However, this has the
distinct advantage of including unit tests with the documentation in a
way that should help users understand myfunc.  (Unit tests too
detailed to show users could be completely enclosed in \dontshow.


Spencer


On 1/17/2011 11:38 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
spencer.gra...@structuremonitoring.com   wrote:


   Another point I have not yet seen mentioned:  If your code is
painfully slow, that can often be fixed without leaving R by experimenting
with different ways of doing the same thing -- often after using profiling
your code to find the slowest part as described in chapter 3 of Writing R
Extensions.


   If I'm given code already written in C (or some other language),
unless it's really simple, I may link to it rather than recode it in R.
   However, the problems with portability, maintainability, transparency to
others who may not be very facile with C, etc., all suggest that it's well
worth some effort experimenting with alternate ways of doing the same thing
in R before jumping to C or something else.

   Hope this helps.
   Spencer



On 1/17/2011 10:57 AM, David Henderson wrote:


I think we're also forgetting something, namely testing.  If you write your
routine in C, you have placed an additional burden upon yourself to test your
C code through unit tests, etc.  If you write your code in R, you still need
the unit tests, but you can rely on the well-tested nature of R to allow you
to reduce the number of tests of your algorithm.  I routinely tell people at
Sage Bionetworks, where I am working now, that new C code needs to deliver at
least an order-of-magnitude increase in performance to warrant the effort of
moving from R to C.

But, then again, I am working with scientists who are not primarily, or even
secondarily, coders...

Dave H



This makes sense, but I have seen some very transparent algorithms turned
into vectorized R code that is difficult to read (and thus to maintain or to
change). These chunks of optimized R code are like embedded assembly, in the
sense that nobody is likely to want to mess with them. This could be addressed
by including pseudo code for the original (more transparent) algorithm as a
comment, but I have never seen this done in practice (perhaps it could be
enforced by R CMD check?!).
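
One way the suggestion could look in practice, as a sketch only. The function names and the sign-change example are hypothetical; the loop version is written out so the vectorized code can be checked against it.

```r
# Count sign changes in a numeric series (hypothetical example).
#
# Original, more transparent algorithm, kept as pseudo code:
#   count <- 0
#   for i in 2..n:
#       if sign(x[i]) differs from sign(x[i - 1]): count <- count + 1
#   return count
count_sign_changes <- function(x) {
  s <- sign(x)
  sum(s[-1] != s[-length(s)])   # compare each element with its predecessor
}

# The loop version, written out so the two can be checked against each other.
count_sign_changes_loop <- function(x) {
  count <- 0
  for (i in seq_along(x)[-1]) {
    if (sign(x[i]) != sign(x[i - 1])) count <- count + 1
  }
  count
}

x <- c(1, -2, -3, 4, 5, -6)
stopifnot(count_sign_changes(x) == count_sign_changes_loop(x))
```

Keeping the plain loop next to the optimized code, as a comment or as a test reference, gives a later maintainer something readable to compare against.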

On the other hand, in principle a well-documented piece of C/C++ code could
be much easier to understand, without paying a performance penalty...but
coders are not likely to place this high on their list of priorities.

The bottom line is that R is an adaptor (glue) language like Lisp that
makes it easy to mix and match functions (using classes and generic
functions), many of which are written in C (or C++ or Fortran) for
performance reasons. Like any object-based system there can be a lot of
object copying, and like any functional programming system, there can be a
lot of function calls, resulting in poor performance for some applications.

If you can vectorize your R

Re: [Rd] R vs. C

2011-01-17 Thread Dominick Samperi
On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves 
spencer.gra...@structuremonitoring.com wrote:

 Hi, Paul:


  The Writing R Extensions manual says that *.R code in a tests
 directory is run during R CMD check.  I suspect that many R programmers do
 this routinely.  I probably should do that also.  However, for me, it's
 simpler to have everything in the examples section of *.Rd files.  I think
 examples with independently developed answers provide useful
 documentation.
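
For comparison, a sketch of what a small script in the tests directory might contain. The file name tests/myfunc-test.R, the package name mypkg, and the body of myfunc are all hypothetical.

```r
# Sketch of a file such as tests/myfunc-test.R (file and package names are
# hypothetical). In a real package the first line would load the package,
# e.g. library(mypkg); here myfunc is defined inline so the sketch runs alone.
myfunc <- function(x) x[order(-abs(x))]   # sort by decreasing absolute value

A1 <- myfunc(c(-1, 3, -2))
A0 <- c(3, -2, -1)                        # answer worked out by hand

# A failed check raises an error, which makes the test script fail.
stopifnot(identical(A1, A0))
```

The same comparison can live in either place; the examples section also documents the function, while a tests file keeps detailed checks out of the help page.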


This is a unit test function, and I think it would be better if there were a
way to unit test packages *before* they are released to CRAN. Otherwise, this
is not really a release, it is a test or beta version. This is currently
possible under Windows using http://win-builder.r-project.org/, for example.

My earlier remark about the release process was more about documentation
than about unit testing, more about the gentle nudging that the R release
process does to help ensure consistent documentation and organization, and
about how this nudging might be extended to the C/C++ part of a package.

Dominick



  Spencer



 On 1/17/2011 1:52 PM, Paul Gilbert wrote:

 Spencer

 Would it not be easier to include this kind of test in a small file in the
 tests/ directory?

 Paul

 -----Original Message-----
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
 On Behalf Of Spencer Graves
 Sent: January 17, 2011 3:58 PM
 To: Dominick Samperi
 Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
 Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
 process.  I've found this so valuable that I created a Wikipedia entry
 by that name and made additions to a Wikipedia entry on software
 repository, noting that this process encourages good software
 development practices that I have not seen standardized for other
 languages.  I encourage people to review this material and make
 additions or corrections as they like (or send me suggestions for me to
 make appropriate changes).


While R has other capabilities for unit and regression testing, I
 often include unit tests in the examples section of documentation
 files.  To keep from cluttering the examples with unnecessary material,
 I often include something like the following:


 A1 <- myfunc() # to test myfunc

 A0 <- (manual generation of the correct answer for A1)

 \dontshow{stopifnot(} # so the user doesn't see stopifnot(
 all.equal(A1, A0) # compare myfunc output with the correct answer
 \dontshow{)} # close paren on stopifnot(.


This may not be as good in some ways as a full suite of unit
 tests, which could be provided separately.  However, this has the
 distinct advantage of including unit tests with the documentation in a
 way that should help users understand myfunc.  (Unit tests too
 detailed to show users could be completely enclosed in \dontshow.)


Spencer


 On 1/17/2011 11:38 AM, Dominick Samperi wrote:

 On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
 spencer.gra...@structuremonitoring.com   wrote:

Another point I have not yet seen mentioned:  If your code is
 painfully slow, that can often be fixed without leaving R by experimenting
 with different ways of doing the same thing -- often after profiling
 your code to find the slowest part, as described in chapter 3 of Writing R
 Extensions.


   If I'm given code already written in C (or some other language),
 unless it's really simple, I may link to it rather than recode it in R.
 However, the problems with portability, maintainability, transparency to
 others who may not be very facile with C, etc., all suggest that it's well
 worth some effort experimenting with alternate ways of doing the same thing
 in R before jumping to C or something else.

   Hope this helps.
   Spencer



 On 1/17/2011 10:57 AM, David Henderson wrote:

  I think we're also forgetting something, namely testing.  If you write your
 routine in C, you have placed an additional burden upon yourself to test your
 C code through unit tests, etc.  If you write your code in R, you still need
 the unit tests, but you can rely on the well-tested nature of R to allow you
 to reduce the number of tests of your algorithm.  I routinely tell people at
 Sage Bionetworks, where I am working now, that new C code needs to deliver at
 least an order-of-magnitude increase in performance to warrant the effort of
 moving from R to C.

 But, then again, I am working with scientists who are not primarily, or even
 secondarily, coders...

 Dave H


  This makes sense, but I have seen some very transparent algorithms turned
 into vectorized R code that is difficult to read (and thus to maintain or to
 change). These chunks of optimized R code are like embedded assembly, in the
 sense that nobody is likely to want to mess with them. This could be addressed
 by including pseudo code for the original (more transparent) algorithm

Re: [Rd] R vs. C

2011-01-17 Thread Spencer Graves

Hi, Dominick, et al.:


  Demanding complete unit test suites with all software contributed 
to CRAN would likely cut contributions by a factor of 10 or 100.  For 
me, the R package creation process is close to perfection in providing a 
standard process for documentation with places for examples and test 
suites of various kinds.  I mention perfection, because it makes 
developing trustworthy software (Chambers' prime directive) 
relatively easy without forcing people to do things they don't feel 
comfortable doing.



  If you need more confidence in the software you use, you can 
build your own test suites -- maybe in packages you write yourself -- or 
pay someone else to develop test suites to your specifications.  For 
example, Revolution Analytics offers Package validation, development 
and support.



   Spencer


On 1/17/2011 3:27 PM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves
spencer.gra...@structuremonitoring.com  wrote:


Hi, Paul:


  The Writing R Extensions manual says that *.R code in a tests
directory is run during R CMD check.  I suspect that many R programmers do
this routinely.  I probably should do that also.  However, for me, it's
simpler to have everything in the examples section of *.Rd files.  I think
examples with independently developed answers provide useful
documentation.


This is a unit test function, and I think it would be better if there were a
way to unit test packages *before* they are released to CRAN. Otherwise, this
is not really a release, it is a test or beta version. This is currently
possible under Windows using http://win-builder.r-project.org/, for example.

My earlier remark about the release process was more about documentation
than about unit testing, more about the gentle nudging that the R release
process does to help ensure consistent documentation and organization, and
about how this nudging might be extended to the C/C++ part of a package.

Dominick



  Spencer



On 1/17/2011 1:52 PM, Paul Gilbert wrote:


Spencer

Would it not be easier to include this kind of test in a small file in the
tests/ directory?

Paul

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
On Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
process.  I've found this so valuable that I created a Wikipedia entry
by that name and made additions to a Wikipedia entry on software
repository, noting that this process encourages good software
development practices that I have not seen standardized for other
languages.  I encourage people to review this material and make
additions or corrections as they like (or send me suggestions for me to
make appropriate changes).


While R has other capabilities for unit and regression testing, I
often include unit tests in the examples section of documentation
files.  To keep from cluttering the examples with unnecessary material,
I often include something like the following:


A1 <- myfunc() # to test myfunc

A0 <- (manual generation of the correct answer for A1)

\dontshow{stopifnot(} # so the user doesn't see stopifnot(
all.equal(A1, A0) # compare myfunc output with the correct answer
\dontshow{)} # close paren on stopifnot(.


This may not be as good in some ways as a full suite of unit
tests, which could be provided separately.  However, this has the
distinct advantage of including unit tests with the documentation in a
way that should help users understand myfunc.  (Unit tests too
detailed to show users could be completely enclosed in \dontshow.)


Spencer


On 1/17/2011 11:38 AM, Dominick Samperi wrote:


On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
spencer.gra...@structuremonitoring.com wrote:

Another point I have not yet seen mentioned:  If your code is
painfully slow, that can often be fixed without leaving R by experimenting
with different ways of doing the same thing -- often after profiling
your code to find the slowest part, as described in chapter 3 of Writing R
Extensions.


   If I'm given code already written in C (or some other language),
unless it's really simple, I may link to it rather than recode it in R.
However, the problems with portability, maintainability, transparency to
others who may not be very facile with C, etc., all suggest that it's well
worth some effort experimenting with alternate ways of doing the same thing
in R before jumping to C or something else.

   Hope this helps.
   Spencer



On 1/17/2011 10:57 AM, David Henderson wrote:

  I think we're also forgetting something, namely testing.  If you write your
routine in C, you have placed additional burden upon yourself to test your
C code through unit tests, etc.  If you write your code in R, you still need
the unit

Re: [Rd] R vs. C

2011-01-17 Thread Dominick Samperi
On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves 
spencer.gra...@structuremonitoring.com wrote:

 Hi, Dominick, et al.:


  Demanding complete unit test suites with all software contributed to
 CRAN would likely cut contributions by a factor of 10 or 100.  For me, the R
 package creation process is close to perfection in providing a standard
 process for documentation with places for examples and test suites of
 various kinds.  I mention perfection, because it makes developing
 trustworthy software (Chambers' prime directive) relatively easy without
 forcing people to do things they don't feel comfortable doing.


I don't think I made myself clear, sorry. I was not suggesting that package
developers include a complete unit test suite. I was suggesting that unit
testing should be done outside of the CRAN release process. Packages should
be submitted for release to CRAN after they have been tested (the
responsibility of the package developers). I understand that the main problem
here is that package developers do not have access to all supported
platforms, so the current process is not likely to change.

Dominick



  If you need more confidence in the software you use, you can build
 your own test suites -- maybe in packages you write yourself -- or pay
 someone else to develop test suites to your specifications.  For example,
 Revolution Analytics offers Package validation, development and support.


   Spencer



 On 1/17/2011 3:27 PM, Dominick Samperi wrote:

 On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves
 spencer.gra...@structuremonitoring.com  wrote:

  Hi, Paul:


  The Writing R Extensions manual says that *.R code in a tests
 directory is run during R CMD check.  I suspect that many R programmers do
 this routinely.  I probably should do that also.  However, for me, it's
 simpler to have everything in the examples section of *.Rd files.  I think
 examples with independently developed answers provide useful
 documentation.

  This is a unit test function, and I think it would be better if there were
 a way to unit test packages *before* they are released to CRAN. Otherwise,
 this is not really a release, it is a test or beta version. This is currently
 possible under Windows using http://win-builder.r-project.org/, for example.

 My earlier remark about the release process was more about documentation
 than about unit testing, more about the gentle nudging that the R release
 process does to help ensure consistent documentation and organization, and
 about how this nudging might be extended to the C/C++ part of a package.

 Dominick


   Spencer



 On 1/17/2011 1:52 PM, Paul Gilbert wrote:

  Spencer

 Would it not be easier to include this kind of test in a small file in the
 tests/ directory?

 Paul

 -----Original Message-----
 From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
 On Behalf Of Spencer Graves
 Sent: January 17, 2011 3:58 PM
 To: Dominick Samperi
 Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
 Subject: Re: [Rd] R vs. C


For me, a major strength of R is the package development
 process.  I've found this so valuable that I created a Wikipedia entry
 by that name and made additions to a Wikipedia entry on software
 repository, noting that this process encourages good software
 development practices that I have not seen standardized for other
 languages.  I encourage people to review this material and make
 additions or corrections as they like (or send me suggestions for me to
 make appropriate changes).


While R has other capabilities for unit and regression testing, I
 often include unit tests in the examples section of documentation
 files.  To keep from cluttering the examples with unnecessary material,
 I often include something like the following:


 A1 <- myfunc() # to test myfunc

 A0 <- (manual generation of the correct answer for A1)

 \dontshow{stopifnot(} # so the user doesn't see stopifnot(
 all.equal(A1, A0) # compare myfunc output with the correct answer
 \dontshow{)} # close paren on stopifnot(.


This may not be as good in some ways as a full suite of unit
 tests, which could be provided separately.  However, this has the
 distinct advantage of including unit tests with the documentation in a
 way that should help users understand myfunc.  (Unit tests too
 detailed to show users could be completely enclosed in \dontshow.)


Spencer


 On 1/17/2011 11:38 AM, Dominick Samperi wrote:

  On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves
 spencer.gra...@structuremonitoring.com wrote:

Another point I have not yet seen mentioned:  If your code is
 painfully slow, that can often be fixed without leaving R by experimenting
 with different ways of doing the same thing -- often after profiling
 your code to find the slowest part, as described in chapter 3 of Writing R
 Extensions.


   If I'm given code already written in C (or some other language