Re: [Rd] R vs. C now rather: how to ease package checking
On 01/18/2011 01:13 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:

Hi, Dominick, et al.:

Demanding complete unit test suites with all software contributed to CRAN would likely cut contributions by a factor of 10 or 100. For me, the R package creation process is close to perfection in providing a standard process for documentation with places for examples and test suites of various kinds. I mention perfection because it makes developing trustworthy software (Chambers' prime directive) relatively easy without forcing people to do things they don't feel comfortable doing.

I don't think I made myself clear, sorry. I was not suggesting that package developers include a complete unit test suite. I was suggesting that unit testing should be done outside of the CRAN release process. Packages should be submitted for release to CRAN after they have been tested (the responsibility of the package developers). I understand that the main problem here is that package developers do not have access to all supported platforms, so the current process is not likely to change.

Regarding access to all platforms: But there's r-forge, where building and checks are done nightly for Linux, Win, and Mac (though for some months now the check protocols have not been available for 32-bit Linux and Windows, but I hope they'll be back soon). I found it extremely easy to get an account and project space and to get building. Many thanks to r-forge!

complete unit test suites: To me, it seems nicer and better to favour packages that have them than to enforce them mechanically. E.g. show icons that announce whether a package comes with a vignette, a test suite (code coverage), etc.

My 2 ct,

Claudia

Dominick

If you need more confidence in the software you use, you can build your own test suites -- maybe in packages you write yourself -- or pay someone else to develop test suites to your specifications. For example, Revolution Analytics offers Package validation, development and support.

Spencer

On 1/17/2011 3:27 PM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:

Hi, Paul:

The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provide useful documentation.

This is a unit test function, and I think it would be better if there was a way to unit test packages *before* they are released to CRAN. Otherwise, this is not really a release, it is a test or beta version. This is currently possible under Windows using http://win-builder.r-project.org/, for example.

My earlier remark about the release process was more about documentation than about unit testing, more about the gentle nudging that the R release process does to help ensure consistent documentation and organization, and about how this nudging might be extended to the C/C++ part of a package.

Dominick

Spencer

On 1/17/2011 1:52 PM, Paul Gilbert wrote:

Spencer

Would it not be easier to include this kind of test in a small file in the tests/ directory?

Paul

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto: r-devel-boun...@r-project.org] On Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C

For me, a major strength of R is the package development process. I've found this so valuable that I created a Wikipedia entry by that name and made additions to a Wikipedia entry on software repository, noting that this process encourages good software development practices that I have not seen standardized for other languages. I encourage people to review this material and make additions or corrections as they like (or send me suggestions so I can make appropriate changes).

While R has other capabilities for unit and regression testing, I often include unit tests in the examples section of documentation files. To keep from cluttering the examples with unnecessary material, I often include something like the following:

    A1 <- myfunc() # to test myfunc
    A0 <- (manual generation of the correct answer for A1)
    \dontshow{stopifnot(} # so the user doesn't see "stopifnot("
    all.equal(A1, A0) # compare myfunc output with the correct answer
    \dontshow{)} # close paren on "stopifnot("

This may not be as good in some ways as a full suite of unit tests, which could be provided separately. However, this has the distinct advantage of including unit tests with the documentation in a way that should help users understand myfunc. (Unit tests too detailed to show users could be completely enclosed in \dontshow.)
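For readers weighing Paul's suggestion against Spencer's \dontshow pattern, the same comparison can be sketched as a small file in the tests/ directory. This is only an illustration under stated assumptions: the file name is made up, and myfunc() here is a toy stand-in (a mean of four numbers), not anything from a real package.

```r
# Hypothetical file: tests/test-myfunc.R (files here run during R CMD check).
# myfunc() is a toy stand-in for a package function, assumed for illustration.
myfunc <- function(x = c(1, 2, 3, 4)) sum(x) / length(x)

A1 <- myfunc()             # output under test
A0 <- (1 + 2 + 3 + 4) / 4  # the correct answer, worked out by hand

# all.equal() compares with a floating-point tolerance. On a mismatch it
# returns a character description rather than FALSE, hence isTRUE().
stopifnot(isTRUE(all.equal(A1, A0)))
```

If the comparison fails, stopifnot() raises an error, which is what makes R CMD check report the problem. The trade-off discussed in the thread remains: a check placed here is invisible to someone reading the help page.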
Re: [Rd] R vs. C
I'm not at all a fan of thinking of the examples as being tests. Examples should clarify the thinking of potential users. Tests should clarify the space in which the code is correct. These two goals are generally at odds.

On 17/01/2011 22:15, Spencer Graves wrote:

Hi, Paul:

The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provide useful documentation.

Spencer

On 1/17/2011 1:52 PM, Paul Gilbert wrote:

Spencer

Would it not be easier to include this kind of test in a small file in the tests/ directory?

Paul

-----Original Message-----
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Spencer Graves
Sent: January 17, 2011 3:58 PM
To: Dominick Samperi
Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
Subject: Re: [Rd] R vs. C

For me, a major strength of R is the package development process. I've found this so valuable that I created a Wikipedia entry by that name and made additions to a Wikipedia entry on software repository, noting that this process encourages good software development practices that I have not seen standardized for other languages. I encourage people to review this material and make additions or corrections as they like (or send me suggestions so I can make appropriate changes).

While R has other capabilities for unit and regression testing, I often include unit tests in the examples section of documentation files. To keep from cluttering the examples with unnecessary material, I often include something like the following:

    A1 <- myfunc() # to test myfunc
    A0 <- (manual generation of the correct answer for A1)
    \dontshow{stopifnot(} # so the user doesn't see "stopifnot("
    all.equal(A1, A0) # compare myfunc output with the correct answer
    \dontshow{)} # close paren on "stopifnot("

This may not be as good in some ways as a full suite of unit tests, which could be provided separately. However, this has the distinct advantage of including unit tests with the documentation in a way that should help users understand myfunc. (Unit tests too detailed to show users could be completely enclosed in \dontshow.)

Spencer

On 1/17/2011 11:38 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:

Another point I have not yet seen mentioned: If your code is painfully slow, that can often be fixed without leaving R by experimenting with different ways of doing the same thing -- often after profiling your code to find the slowest part, as described in chapter 3 of Writing R Extensions.

If I'm given code already written in C (or some other language), unless it's really simple, I may link to it rather than recode it in R. However, the problems with portability, maintainability, transparency to others who may not be very facile with C, etc., all suggest that it's well worth some effort experimenting with alternate ways of doing the same thing in R before jumping to C or something else.

Hope this helps.

Spencer

On 1/17/2011 10:57 AM, David Henderson wrote:

I think we're also forgetting something, namely testing. If you write your routine in C, you have placed additional burden upon yourself to test your C code through unit tests, etc. If you write your code in R, you still need the unit tests, but you can rely on the well-tested nature of R to allow you to reduce the number of tests of your algorithm. I routinely tell people at Sage Bionetworks, where I am working now, that your new C code needs to deliver at least an order-of-magnitude increase in performance to warrant the effort of moving from R to C.

But, then again, I am working with scientists who are not primarily, or even secondarily, coders...

Dave H

This makes sense, but I have seen some very transparent algorithms turned into vectorized R code that is difficult to read (and thus to maintain or to change). These chunks of optimized R code are like embedded assembly, in the sense that nobody is likely to want to mess with them. This could be addressed by including pseudocode for the original (more transparent) algorithm as a comment, but I have never seen this done in practice (perhaps it could be enforced by R CMD check?!).

On the other hand, in principle a well-documented piece of C/C++ code could be much easier to understand, without paying a performance penalty...but coders are not likely to place this high on their list of priorities.

The bottom line is that R is an adaptor (glue) language like Lisp that makes it easy to mix and match functions (using classes and generic functions), many of which are written in C (or C++ or Fortran) for performance reasons. Like any object-based system there can be a lot of object
Re: [Rd] R vs. C
On 01/18/2011 10:53 AM, Patrick Burns wrote:

I'm not at all a fan of thinking of the examples as being tests. Examples should clarify the thinking of potential users. Tests should clarify the space in which the code is correct. These two goals are generally at odds.

Patrick,

I completely agree with you that

- Tests should not clutter the documentation and should go to their proper place.
- Examples are there for the user's benefit and must be written accordingly.
- Often, tests should cover far more situations than good examples.

Yet it seems to me that (part of the) examples are justly considered a (small) subset of the tests. As a potential user, I request two things from good examples that have an implicit testing message/side effect:

- I like the examples to roughly outline the space in which the code works: they should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of the correctness for some example calculation. (I don't want to see all further tests; I can look them up if I feel the need.)

The fact that the very same line of example code serves a testing (side) purpose doesn't mean that it should be copied into the tests, does it? Thus, I think of the public part (the preface) of the tests as living in the examples.

My 2 ct,

Best regards,

Claudia

On 17/01/2011 22:15, Spencer Graves wrote:

Hi, Paul:

The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provide useful documentation.

Spencer

On 1/17/2011 1:52 PM, Paul Gilbert wrote:

Spencer

Would it not be easier to include this kind of test in a small file in the tests/ directory?
Re: [Rd] R vs. C
Claudia,

I think we agree. Having the examples run in the tests is a good thing, I think. They might strengthen the tests some (especially if there are no other tests). But mainly, if examples don't work, then it's hard to have much faith in the code.

On 18/01/2011 11:36, Claudia Beleites wrote:

On 01/18/2011 10:53 AM, Patrick Burns wrote:

I'm not at all a fan of thinking of the examples as being tests. Examples should clarify the thinking of potential users. Tests should clarify the space in which the code is correct. These two goals are generally at odds.

Patrick,

I completely agree with you that

- Tests should not clutter the documentation and should go to their proper place.
- Examples are there for the user's benefit and must be written accordingly.
- Often, tests should cover far more situations than good examples.

Yet it seems to me that (part of the) examples are justly considered a (small) subset of the tests. As a potential user, I request two things from good examples that have an implicit testing message/side effect:

- I like the examples to roughly outline the space in which the code works: they should tell me what I'm supposed to do.
- Depending on the function's purpose, I like to see a demonstration of the correctness for some example calculation. (I don't want to see all further tests; I can look them up if I feel the need.)

The fact that the very same line of example code serves a testing (side) purpose doesn't mean that it should be copied into the tests, does it? Thus, I think of the public part (the preface) of the tests as living in the examples.

My 2 ct,

Best regards,

Claudia

On 17/01/2011 22:15, Spencer Graves wrote:

Hi, Paul:

The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files.
Re: [Rd] R vs. C now rather: how to ease package checking
On Tue, Jan 18, 2011 at 4:48 AM, Claudia Beleites <cbelei...@units.it> wrote:

On 01/18/2011 01:13 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:

Hi, Dominick, et al.:

Demanding complete unit test suites with all software contributed to CRAN would likely cut contributions by a factor of 10 or 100. For me, the R package creation process is close to perfection in providing a standard process for documentation with places for examples and test suites of various kinds. I mention perfection because it makes developing trustworthy software (Chambers' prime directive) relatively easy without forcing people to do things they don't feel comfortable doing.

I don't think I made myself clear, sorry. I was not suggesting that package developers include a complete unit test suite. I was suggesting that unit testing should be done outside of the CRAN release process. Packages should be submitted for release to CRAN after they have been tested (the responsibility of the package developers). I understand that the main problem here is that package developers do not have access to all supported platforms, so the current process is not likely to change.

Regarding access to all platforms: But there's r-forge, where building and checks are done nightly for Linux, Win, and Mac (though for some months now the check protocols have not been available for 32-bit Linux and Windows, but I hope they'll be back soon). I found it extremely easy to get an account and project space and to get building. Many thanks to r-forge!

Good point Claudia,

There are packages released to CRAN that do not build on some platforms because the unit tests fail. It seems to me that this kind of issue could be ironed out with the help of r-forge before release, in which case there is no need to run the unit tests for released packages.

Dominick

[[alternative HTML version deleted]]

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] R vs. C now rather: how to ease package checking
On 1/18/2011 8:44 AM, Dominick Samperi wrote:

On Tue, Jan 18, 2011 at 4:48 AM, Claudia Beleites <cbelei...@units.it> wrote:

On 01/18/2011 01:13 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves <spencer.gra...@structuremonitoring.com> wrote:

Hi, Dominick, et al.:

Demanding complete unit test suites with all software contributed to CRAN would likely cut contributions by a factor of 10 or 100. For me, the R package creation process is close to perfection in providing a standard process for documentation with places for examples and test suites of various kinds. I mention perfection because it makes developing trustworthy software (Chambers' prime directive) relatively easy without forcing people to do things they don't feel comfortable doing.

I don't think I made myself clear, sorry. I was not suggesting that package developers include a complete unit test suite. I was suggesting that unit testing should be done outside of the CRAN release process. Packages should be submitted for release to CRAN after they have been tested (the responsibility of the package developers). I understand that the main problem here is that package developers do not have access to all supported platforms, so the current process is not likely to change.

Regarding access to all platforms: But there's r-forge, where building and checks are done nightly for Linux, Win, and Mac (though for some months now the check protocols have not been available for 32-bit Linux and Windows, but I hope they'll be back soon). I found it extremely easy to get an account and project space and to get building. Many thanks to r-forge!

Good point Claudia,

There are packages released to CRAN that do not build on some platforms because the unit tests fail. It seems to me that this kind of issue could be ironed out with the help of r-forge before release, in which case there is no need to run the unit tests for released packages.

Dominick

CRAN also runs R CMD check on its contributed packages. By reviewing the repeated checks on both R-Forge and CRAN, I've found (and fixed) problems that I couldn't replicate myself.

Spencer

--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
Re: [Rd] R vs. C
Tests and examples are different things. The fact that your example runs only means that your code does not bomb on execution and not that it runs correctly. Plus, the code in examples is meant as an aid to the user; a way to help them understand how to use your code. Proper tests are there to make sure your code executes properly and computes things correctly.
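The distinction drawn above can be made concrete with a small sketch. The function below is hypothetical and deliberately buggy (it divides by n rather than n - 1), purely to show an example call that runs cleanly while a test against a hand-computed answer does not pass.

```r
# Hypothetical function with a deliberate bug: it divides by n instead
# of n - 1, so it is not the sample variance its name promises.
sample_var <- function(x) sum((x - mean(x))^2) / length(x)

x <- c(2, 4, 4, 6)

# "Example" style: the call runs without error, which proves little.
v <- sample_var(x)

# "Test" style: compare with an independently derived answer.
# The mean is 4, the deviations are -2, 0, 0, 2, the squares sum to 8,
# and the sample variance is therefore 8 / 3.
expected <- 8 / 3
test_passes <- isTRUE(all.equal(v, expected))
test_passes  # FALSE: the test catches what the example could not
```

In a real tests/ file the last two lines would typically be wrapped in stopifnot(), so that the failure stops R CMD check instead of merely printing FALSE.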
[Rd] R vs. C
A question, please, about development of R packages:

Are there any guidelines or best practices for deciding when and why to implement an operation in R, vs. implementing it in C? The "Writing R Extensions" manual recommends "working in interpreted R code . . . this is normally the best option". But we do write C functions and access them in R. The question is, when/why is this justified, and when/why is it NOT justified?

While I have identified helpful documents on R coding standards, I have not seen notes/discussions on when/why to implement in R, vs. when to implement in C.

Thanks, Patrick

On Sun, Jan 16, 2011 at 3:00 AM, <r-devel-requ...@r-project.org> wrote:

Send R-devel mailing list submissions to r-devel@r-project.org

To subscribe or unsubscribe via the World Wide Web, visit https://stat.ethz.ch/mailman/listinfo/r-devel or, via email, send a message with subject or body 'help' to r-devel-requ...@r-project.org

You can reach the person managing the list at r-devel-ow...@r-project.org

When replying, please edit your Subject line so it is more specific than "Re: Contents of R-devel digest..."

Today's Topics:

1. RPostgreSQL 0.1.7 for Windows 64 causes R.2.12.1 Win64 crash (Xiaobo Gu)

Message: 1
Date: Sat, 15 Jan 2011 10:34:55 +0800
From: Xiaobo Gu <guxiaobo1...@gmail.com>
To: r-devel@r-project.org
Subject: [Rd] RPostgreSQL 0.1.7 for Windows 64 causes R.2.12.1 Win64 crash
Message-ID: <aanlktinvoub-z_le1gvpyswnqtsw1p6mzzlzsztoi...@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I built the binary package file of RPostgreSQL 0.1.7 for Windows 2003 Server R2 64 bit SP2. The software environment is as follows:

R 2.12.1 for Win64
RTools212 for Win64
DBI 0.2.5
RPostgreSQL 0.1.7
Postgresql related binaries shipped with postgresql-9.0.2-1-windows_x64.exe from EnterpriseDB

The package can be loaded, and the driver can be created, but the dbConnect function causes the whole RGui to crash:

    driver <- dbDriver("PostgreSQL")
    con <- dbConnect(driver, dbname="demo", host="192.168.8.1", user="postgres", password="postgres", port=5432)

R-devel@r-project.org mailing list DIGESTED
https://stat.ethz.ch/mailman/listinfo/r-devel

End of R-devel Digest, Vol 95, Issue 14
Re: [Rd] R vs. C
Everyone has their own utility function. Mine is whether the boredom of waiting for the pure R function to finish is going to outweigh the boredom of writing the C code.

Another issue is that adding C code increases the hassle for users who might want the code to run on different architectures.

On 17/01/2011 17:13, Patrick Leyshock wrote:

A question, please, about development of R packages:

Are there any guidelines or best practices for deciding when and why to implement an operation in R, vs. implementing it in C? The "Writing R Extensions" manual recommends "working in interpreted R code . . . this is normally the best option". But we do write C functions and access them in R. The question is, when/why is this justified, and when/why is it NOT justified?

While I have identified helpful documents on R coding standards, I have not seen notes/discussions on when/why to implement in R, vs. when to implement in C.

Thanks, Patrick

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
Re: [Rd] R vs. C
On 17/01/2011 12:41 PM, Patrick Burns wrote:
> Everyone has their own utility function. Mine is if the boredom of waiting for the pure R function to finish is going to outweigh the boredom of writing the C code.
>
> Another issue is that adding C code increases the hassle for users who might want the code to run on different architectures.

... and also makes it harder for you and your users to tweak your code for different uses.

It is not uncommon for C code to run 100 times faster than R code (but it is also not uncommon to see very little speedup, if the R code is well vectorized). So if you have something that's really slow, think about the fundamental operations, and write those in C, then use R code to glue them together. But if it is fast enough without doing that, then leave it all in R.

Duncan Murdoch
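A minimal sketch of the split Duncan describes above, assuming the slow part is a recurrence that does not vectorize easily. The function name ewma and its arguments are hypothetical and not from the thread; the void return type and pointer-only signature are what R's .C() interface expects.

```c
#include <stddef.h>

/* Hypothetical kernel: exponentially weighted moving average.
 *   out[0] = x[0]
 *   out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
 * Every argument is a pointer because .C() passes R vectors by pointer;
 * the caller allocates out with the same length as x. */
void ewma(const double *x, const int *n, const double *alpha, double *out)
{
    if (*n <= 0)
        return;                 /* nothing to do for an empty vector */

    out[0] = x[0];
    for (int i = 1; i < *n; i++)
        out[i] = (*alpha) * x[i] + (1.0 - *alpha) * out[i - 1];
}
```

After building with R CMD SHLIB and loading the result with dyn.load(), the R side could stay a thin wrapper around something like .C("ewma", as.double(x), as.integer(length(x)), as.double(alpha), out = double(length(x)))$out, with argument checking, plotting and everything else left in R.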
Re: [Rd] R vs. C
On 17 January 2011 at 09:13, Patrick Leyshock wrote:
| A question, please about development of R packages:
|
| Are there any guidelines or best practices for deciding when and why to
| implement an operation in R, vs. implementing it in C? The Writing R
| Extensions recommends working in interpreted R code . . . this is normally
| the best option. But we do write C-functions and access them in R - the
| question is, when/why is this justified, and when/why is it NOT justified?
|
| While I have identified helpful documents on R coding standards, I have not
| seen notes/discussions on when/why to implement in R, vs. when to implement
| in C.

The (still fairly recent) book 'Software for Data Analysis: Programming with R' by John Chambers (Springer, 2008) has a lot to say about this. John also gave a talk in November which stressed 'multilanguage' approaches; see e.g. http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html

In short, it all depends, and it is unlikely that you will get a coherent answer that is valid for all circumstances. We all love R for how expressive and powerful it is, yet there are times when something else is called for. Exactly when that time is depends on a great many things and you have not mentioned a single metric in your question. So I'd start with John's book.

Hope this helps, Dirk

--
Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
Re: [Rd] R vs. C
I think we're also forgetting something, namely testing. If you write your routine in C, you have placed additional burden upon yourself to test your C code through unit tests, etc. If you write your code in R, you still need the unit tests, but you can rely on the well tested nature of R to allow you to reduce the number of tests of your algorithm.

I routinely tell people at Sage Bionetworks where I am working now that your new C code needs to experience at least one order of magnitude increase in performance to warrant the effort of moving from R to C.

But, then again, I am working with scientists who are not primarily, or even secondarily, coders...

Dave H
Re: [Rd] R vs. C
Another point I have not yet seen mentioned: If your code is painfully slow, that can often be fixed without leaving R by experimenting with different ways of doing the same thing -- often after profiling your code to find the slowest part, as described in chapter 3 of Writing R Extensions.

If I'm given code already written in C (or some other language), unless it's really simple, I may link to it rather than recode it in R. However, the problems with portability, maintainability, transparency to others who may not be very facile with C, etc., all suggest that it's well worth some effort experimenting with alternate ways of doing the same thing in R before jumping to C or something else.

Hope this helps.
Spencer
Re: [Rd] R vs. C
On Mon, Jan 17, 2011 at 6:57 PM, David Henderson dnadav...@yahoo.com wrote:
> I think we're also forgetting something, namely testing. If you write your routine in C, you have placed additional burden upon yourself to test your C code through unit tests, etc. If you write your code in R, you still need the unit tests, but you can rely on the well tested nature of R to allow you to reduce the number of tests of your algorithm.
>
> I routinely tell people at Sage Bionetworks where I am working now that your new C code needs to experience at least one order of magnitude increase in performance to warrant the effort of moving from R to C.
>
> But, then again, I am working with scientists who are not primarily, or even secondarily, coders...

If you write your code in C but interface to it in R, you can use the same R test harness system. I recently coded something up in R, tested it on small data, discovered it was waaay too slow on the real data, rewrote the likelihood calculation in C, and then used the same test set to make sure it was giving the same answers as the R code. It wasn't. So I fixed that bug until it was. If I'd written the thing in C to start with I might not have spotted it.

Sometimes writing a prototype in R is a useful testing tool even when you know it'll be too slow - as an interpreted language R gives you a rapid development cycle and handy interactive debugging possibilities. Things that do exist in C but require compilation.

Barry
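The check Barry describes, keeping the slow version around as an oracle for the fast one, can be sketched as follows. This is only an illustration under stated assumptions: his reference was R code and his calculation was a likelihood, whereas here both versions are C and a sample variance stands in for it. The names var_ref, var_fast and agree are hypothetical.

```c
#include <stddef.h>

/* Reference: textbook two-pass sample variance (slow but easy to trust).
 * Assumes n >= 2. */
double var_ref(const double *x, size_t n)
{
    double mean = 0.0, ss = 0.0;
    for (size_t i = 0; i < n; i++)
        mean += x[i];
    mean /= (double)n;
    for (size_t i = 0; i < n; i++)
        ss += (x[i] - mean) * (x[i] - mean);
    return ss / (double)(n - 1);
}

/* Candidate rewrite: one-pass Welford update. Assumes n >= 2. */
double var_fast(const double *x, size_t n)
{
    double mean = 0.0, m2 = 0.0;
    for (size_t i = 0; i < n; i++) {
        double delta = x[i] - mean;
        mean += delta / (double)(i + 1);
        m2 += delta * (x[i] - mean);
    }
    return m2 / (double)(n - 1);
}

static double absd(double v) { return v < 0.0 ? -v : v; }

/* Returns 1 if a and b agree to within a relative tolerance,
 * in the spirit of all.equal() in R. */
int agree(double a, double b, double tol)
{
    double mag = absd(a) > absd(b) ? absd(a) : absd(b);
    return absd(a - b) <= tol * mag;
}
```

Running both functions over the same test vectors and comparing with agree() is the C analogue of reusing the R test set; when the reference lives in R, the same comparison would simply be done from the R side.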
Re: [Rd] R vs. C
On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:
> Another point I have not yet seen mentioned: If your code is painfully slow, that can often be fixed without leaving R by experimenting with different ways of doing the same thing -- often after profiling your code to find the slowest part, as described in chapter 3 of Writing R Extensions.
>
> If I'm given code already written in C (or some other language), unless it's really simple, I may link to it rather than recode it in R. However, the problems with portability, maintainability, transparency to others who may not be very facile with C, etc., all suggest that it's well worth some effort experimenting with alternate ways of doing the same thing in R before jumping to C or something else.

This makes sense, but I have seen some very transparent algorithms turned into vectorized R code that is difficult to read (and thus to maintain or to change). These chunks of optimized R code are like embedded assembly, in the sense that nobody is likely to want to mess with them. This could be addressed by including pseudo code for the original (more transparent) algorithm as a comment, but I have never seen this done in practice (perhaps it could be enforced by R CMD check?!).

On the other hand, in principle a well-documented piece of C/C++ code could be much easier to understand, without paying a performance penalty...but coders are not likely to place this high on their list of priorities.

The bottom line is that R is an adaptor (glue) language like Lisp that makes it easy to mix and match functions (using classes and generic functions), many of which are written in C (or C++ or Fortran) for performance reasons. Like any object-based system there can be a lot of object copying, and like any functional programming system, there can be a lot of function calls, resulting in poor performance for some applications.

If you can vectorize your R code then you have effectively found a way to benefit from somebody else's C code, thus saving yourself some time. For operations other than pure vector calculations you will have to do the C/C++ programming yourself (or call a library that somebody else has written).

Dominick
Re: [Rd] R vs. C
For me, a major strength of R is the package development process. I've found this so valuable that I created a Wikipedia entry by that name and made additions to a Wikipedia entry on software repository, noting that this process encourages good software development practices that I have not seen standardized for other languages. I encourage people to review this material and make additions or corrections as they like (or send me suggestions for me to make appropriate changes).

While R has other capabilities for unit and regression testing, I often include unit tests in the examples section of documentation files. To keep from cluttering the examples with unnecessary material, I often include something like the following:

  A1 <- myfunc()  # to test myfunc
  A0 <- (manual generation of the correct answer for A1)
  \dontshow{stopifnot(}  # so the user doesn't see "stopifnot("
  all.equal(A1, A0)  # compare myfunc output with the correct answer
  \dontshow{)}  # close paren on "stopifnot("

This may not be as good in some ways as a full suite of unit tests, which could be provided separately. However, this has the distinct advantage of including unit tests with the documentation in a way that should help users understand myfunc. (Unit tests too detailed to show users could be completely enclosed in \dontshow.)

Spencer
Re: [Rd] R vs. C
Spencer

Would it not be easier to include this kind of test in a small file in the tests/ directory?

Paul
Re: [Rd] R vs. C
Hi, Paul:

The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provide useful documentation.

Spencer
Re: [Rd] R vs. C
On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote: Hi, Paul: The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provides useful documentation. This is a unit test function, and I think it would be better if there was a way to unit test packages *before* they are released to CRAN. Otherwise, this is not really a release, it is test or beta version. This is currently possible under Windows using http://win-builder.r-project.org/, for example. My earlier remark about the release process was more about documentation than about unit testing, more about the gentle nudging that the R release process does to help insure consistent documentation and organization, and about how this nudging might be extended to the C/C++ part of a package. Dominick Spencer On 1/17/2011 1:52 PM, Paul Gilbert wrote: Spencer Would it not be easier to include this kind of test in a small file in the tests/ directory? Paul -Original Message- From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On Behalf Of Spencer Graves Sent: January 17, 2011 3:58 PM To: Dominick Samperi Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel Subject: Re: [Rd] R vs. C For me, a major strength of R is the package development process. I've found this so valuable that I created a Wikipedia entry by that name and made additions to a Wikipedia entry on software repository, noting that this process encourages good software development practices that I have not seen standardized for other languages. I encourage people to review this material and make additions or corrections as they like (or sent me suggestions for me to make appropriate changes). 
While R has other capabilities for unit and regression testing, I often include unit tests in the examples section of documentation files. To keep from cluttering the examples with unnecessary material, I often include something like the following:

    A1 <- myfunc() # to test myfunc
    A0 <- (manual generation of the correct answer for A1)
    \dontshow{stopifnot(} # so the user doesn't see stopifnot(
    all.equal(A1, A0) # compare myfunc output with the correct answer
    \dontshow{)} # close paren on stopifnot(.

This may not be as good in some ways as a full suite of unit tests, which could be provided separately. However, this has the distinct advantage of including unit tests with the documentation in a way that should help users understand myfunc. (Unit tests too detailed to show users could be completely enclosed in \dontshow.)

Spencer

On 1/17/2011 11:38 AM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:

Another point I have not yet seen mentioned: If your code is painfully slow, that can often be fixed without leaving R by experimenting with different ways of doing the same thing -- often after profiling your code to find the slowest part as described in chapter 3 of Writing R Extensions. If I'm given code already written in C (or some other language), unless it's really simple, I may link to it rather than recode it in R. However, the problems with portability, maintainability, transparency to others who may not be very facile with C, etc., all suggest that it's well worth some effort experimenting with alternate ways of doing the same thing in R before jumping to C or something else. Hope this helps.

Spencer

On 1/17/2011 10:57 AM, David Henderson wrote:

I think we're also forgetting something, namely testing. If you write your routine in C, you have placed additional burden upon yourself to test your C code through unit tests, etc.
If you write your code in R, you still need the unit tests, but you can rely on the well tested nature of R to allow you to reduce the number of tests of your algorithm. I routinely tell people at Sage Bionetworks where I am working now that your new C code needs to experience at least one order of magnitude increase in performance to warrant the effort of moving from R to C. But, then again, I am working with scientists who are not primarily, or even secondarily, coders...

Dave H

This makes sense, but I have seen some very transparent algorithms turned into vectorized R code that is difficult to read (and thus to maintain or to change). These chunks of optimized R code are like embedded assembly, in the sense that nobody is likely to want to mess with them. This could be addressed by including pseudo code for the original (more transparent) algorithm
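The idea of keeping the transparent algorithm as a comment next to its vectorized form could be sketched roughly as follows. The function names sum_pos_sq() and sum_pos_sq_loop() and the toy task (sum of squares of the positive elements) are invented for illustration and do not come from the thread.

```r
# Original (transparent) algorithm, kept as pseudo code:
#   total <- 0
#   for each element xi of x:
#       if xi > 0: total <- total + xi^2
#   return total

# Vectorized form: select the positive elements, square them, add them up.
sum_pos_sq <- function(x) sum(x[x > 0]^2)

# Loop version of the same algorithm, kept as a reference implementation
# so the vectorized code can be checked against it.
sum_pos_sq_loop <- function(x) {
  total <- 0
  for (xi in x) {
    if (xi > 0) total <- total + xi^2
  }
  total
}

x <- c(-2, 1, 3, -0.5, 2)
stopifnot(isTRUE(all.equal(sum_pos_sq(x), sum_pos_sq_loop(x))))
```

Keeping the loop version around as a test oracle serves both concerns raised in the thread: the readable algorithm stays documented, and the optimized code is checked against it.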
Re: [Rd] R vs. C
Hi, Dominick, et al.: Demanding complete unit test suites with all software contributed to CRAN would likely cut contributions by a factor of 10 or 100. For me, the R package creation process is close to perfection in providing a standard process for documentation with places for examples and test suites of various kinds. I mention perfection, because it makes developing trustworthy software (Chambers' prime directive) relatively easy without forcing people to do things they don't feel comfortable doing.

If you need more confidence in the software you use, you can build your own test suites -- maybe in packages you write yourself -- or pay someone else to develop test suites to your specifications. For example, Revolution Analytics offers Package validation, development and support.

Spencer

On 1/17/2011 3:27 PM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:

Hi, Paul: The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provide useful documentation.

This is a unit test function, and I think it would be better if there were a way to unit test packages *before* they are released to CRAN. Otherwise, this is not really a release, it is a test or beta version. This is currently possible under Windows using http://win-builder.r-project.org/, for example. My earlier remark about the release process was more about documentation than about unit testing, more about the gentle nudging that the R release process does to help ensure consistent documentation and organization, and about how this nudging might be extended to the C/C++ part of a package.
Re: [Rd] R vs. C
On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:

Hi, Dominick, et al.: Demanding complete unit test suites with all software contributed to CRAN would likely cut contributions by a factor of 10 or 100. For me, the R package creation process is close to perfection in providing a standard process for documentation with places for examples and test suites of various kinds. I mention perfection, because it makes developing trustworthy software (Chambers' prime directive) relatively easy without forcing people to do things they don't feel comfortable doing.

I don't think I made myself clear, sorry. I was not suggesting that package developers include a complete unit test suite. I was suggesting that unit testing should be done outside of the CRAN release process. Packages should be submitted for release to CRAN after they have been tested (the responsibility of the package developers). I understand that the main problem here is that package developers do not have access to all supported platforms, so the current process is not likely to change.

Dominick

If you need more confidence in the software you use, you can build your own test suites -- maybe in packages you write yourself -- or pay someone else to develop test suites to your specifications. For example, Revolution Analytics offers Package validation, development and support.

Spencer

On 1/17/2011 3:27 PM, Dominick Samperi wrote:

On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:

Hi, Paul: The Writing R Extensions manual says that *.R code in a tests directory is run during R CMD check. I suspect that many R programmers do this routinely. I probably should do that also. However, for me, it's simpler to have everything in the examples section of *.Rd files. I think the examples with independently developed answers provide useful documentation.