[Rd] Re: [R] A long digression on packages
Hi. I think this discussion is more relevant to R-devel, so that's where I've sent my reply.

Jim Lemon wrote:

Hello again,

First, thanks for the help that got the latest plotrix package finished. I had been planning to write something about packages since Scott Waichler offered the gantt.chart function. Then Ben Bolker (who helped me to write the axis.break function) asked if I would be willing to include some of his plotting functions, and almost immediately after that Sander Oom kindly donated the soil texture plotting function in the same way. I could procrastinate no longer.

There are now about 500 packages on CRAN. Some are focused, covering a particular area well, so that the prospective user can easily discover their potential usefulness, while others are less so. I consider the plotrix package one of the former, and so as not to upset too many people, I will use the other package I contributed to CRAN as an example of the latter.

When I initially wrote concord, it was intended as a package of functions dealing with concordance and reliability. Okay, but I found Kendall's W so useful that I couldn't help including it, and somehow Page's test of ordered alternatives crept in and invited the Jonckheere test to the party, and at that point I realized that I had maybe forty or fifty more or less useful functions floating around my R directory.

Now many of these are probably floating around other people's R directories as well. Consider Cohen's kappa. The tabular method is included in e1071, my version has Cohen's plus two additional methods, and the recently contributed psy package has yet another version. Maybe there are still more encrypted in packages that I haven't even looked at.

The point of all this is that it would make many users' lives easier if there were less pandemonium in packages. The mistakes I have made in concord I have tried not to repeat in plotrix.
Unless a user search of the documentation in packages materializes, it's become mighty hard to work out if the function you don't want to write has already been written. We also spend a lot of time responding to or deriding correspondents who ask about such things. Would it be an idea to have informal R periphery teams, or even individual package lords, who would bear with, or maybe welcome, other people's functions? That is, I think plotrix has been greatly enhanced by recent contributions. Conversely, I wonder if it would be possible to shrink or maybe even evaporate concord by discovering duplicate methods in other packages or by contributing concord functions or parts thereof myself. It's not that I don't like maintaining concord or think the functions are worthless, just that I am mildly embarrassed to be adding to the duplication of effort and unnecessary volume of packages.

Feel free to comment upon this, although if you really want to rave, try it out on me first before clagging the list.

Thanks for your attention.

A difficulty with multi-author packages is that it's harder to maintain consistency within the package, and it's harder to handle maintenance.

Another approach is to try to keep your packages small and focussed. The problem with this is what you mentioned above: there are already 500 packages, and it's hard to know what's there. The task views should help with this; there are 5 online so far. (See http://cran.us.r-project.org/src/contrib/Views.) There is also a need for Misc packages for things too small to be a package on their own, but I think we need better ways to expose what is in them.

Of course, with disk sizes as they are now, it's not unreasonable to install all of the contributed CRAN packages on a PC. Then help.search() *will* do searches through them all.

Duncan Murdoch

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Re: [R] A long digression on packages
Dirk Eddelbuettel [EMAIL PROTECTED] writes:

| Jim raises good points, as do the replies. On the topic of '500+ and growing', let me add my pet peeve: It is mighty impossible to know /what/ changed /when/ in CRANland.
|
| Being Debian maintainer for a fair number of packages, I owe users of those packages timely updates. But the best I can do is to look at the timestamp-sorted source directory http://cran.r-project.org/src/contrib/?M=D That is tedious, as well as error-prone. Moreover, as an R user, I'd like to know what is being added, and what is being changed. There is no way to know right now.
|
| It would not be hard to write a little monitoring script that looks at the directory (and keeps tabs in an Rdata structure, or SQLite db, or ...) and spits out either emails, or maybe rss-feed updates, of either or both of 'new packages' or 'new versions'. If additionally we would enforce (err let's start with encourage) a standardised changelog (say $SRC/inst/CHANGES or $SRC/inst/ChangeLog) then that could get parsed too. I had meant to play with some code for this for a while now but it just hasn't happened. Whining on a list is easier than writing code, unfortunately...
|
| Comments?

You might want to have a closer look at the way recommended packages are handled by an R distribution build, using rsync, links, timestamps, and makefile rules.

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Blegdamsvej 3, 2200 Cph. N, Denmark
Ph: (+45) 35327918  FAX: (+45) 35327907
([EMAIL PROTECTED])
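A minimal sketch of the monitoring idea quoted above, written in Python purely for illustration. It assumes source tarballs are named `<package>_<version>.tar.gz`, as in CRAN's src/contrib directory, and that two directory listings have already been saved as lists of file names. The names `parse_tarball`, `snapshot`, and `diff_listings` are hypothetical, the sample listings are made up, and fetching the listing, keeping state between runs, and sending email or RSS output are all left out.

```python
import re

# Assumption: source tarballs follow the "<package>_<version>.tar.gz" pattern.
TARBALL_RE = re.compile(r"^([A-Za-z0-9.]+)_([0-9][0-9.\-]*)\.tar\.gz$")


def parse_tarball(filename):
    """Return (package, version) for a tarball name, or None if it does not match."""
    match = TARBALL_RE.match(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


def snapshot(filenames):
    """Map package name -> version string for one directory listing."""
    result = {}
    for name in filenames:
        parsed = parse_tarball(name)
        if parsed is not None:
            package, version = parsed
            result[package] = version
    return result


def diff_listings(old_files, new_files):
    """Compare two listings; return (new packages, changed versions)."""
    old, new = snapshot(old_files), snapshot(new_files)
    new_packages = sorted(p for p in new if p not in old)
    new_versions = sorted(
        (p, old[p], new[p]) for p in new if p in old and old[p] != new[p]
    )
    return new_packages, new_versions


if __name__ == "__main__":
    # Hypothetical listings, for illustration only.
    yesterday = ["plotrix_1.0.tar.gz", "concord_1.1.tar.gz", "README"]
    today = ["plotrix_1.1.tar.gz", "concord_1.1.tar.gz", "psy_0.5.tar.gz"]
    added, changed = diff_listings(yesterday, today)
    print("new packages:", added)
    print("new versions:", changed)
```

Versions are compared as plain strings, so any change counts as a "new version"; deciding which of two versions is newer would need a proper version comparison.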
Re: [Rd] Re: [R] A long digression on packages
On 5 June 2005 at 17:31, Peter Dalgaard wrote:
| You might want to have a closer look at the way recommended packages
| are handled by an R distribution build, using rsync, links,
| timestamps, and makefile rules.

And recode/adapt that for the packages I am interested in? Works, but doesn't scale. But maybe I am misunderstanding you here.

Dirk

--
Statistics: The (futile) attempt to offer certainty about uncertainty.
         -- Roger Koenker, 'Dictionary of Received Ideas of Statistics'
Re: [Rd] Re: [R] A long digression on packages
On 6/5/05, M. Edward (Ed) Borasky [EMAIL PROTECTED] wrote:

Duncan Murdoch wrote:

Of course, with disk sizes as they are now, it's not unreasonable to install all of the contributed CRAN packages on a PC. Then help.search() *will* do searches through them all.

Some of them are very specialized, and some of them have non-CRAN dependencies. I've done a few "load everything from CRAN" operations on my Linux boxes, only to overflow the warnings list with missing Linux software. And, as an example, I have zero use for molecular biology packages.

Dirk Eddelbuettel has done a lot of work integrating the CRAN and other R package collections with the Debian GNU/Linux package management system. This rather neatly solves the non-CRAN dependency problems, at least for Debian. Other people have done similar things for Perl packages and Common Lisp packages, both in Debian and in Gentoo's Portage package management system. CRAN could easily be integrated into Portage, but nobody has stepped forward to volunteer. Maybe when I retire ... :)

And where does this leave Windows users? There's nothing like Debian or Portage for them; CRAN would have to build it from scratch.

I think that some time ago there was a discussion of having a downloadable file that could be used to help.search through so that a relatively small download and no package installation would allow a comprehensive offline help.search of all CRAN packages. An online version of help.search might be another possibility.
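The downloadable-index idea can be sketched roughly as follows. This is an illustration in Python, not a description of how help.search() works: the index format (one record per help topic, serialised as JSON), the function names, and the sample entries are all assumptions made up for the example.

```python
import json


def build_index(entries):
    """Turn (package, topic, title) records into searchable dicts."""
    return [
        {
            "package": package,
            "topic": topic,
            "title": title,
            # Pre-computed lower-case text so searches are case-insensitive.
            "text": f"{package} {topic} {title}".lower(),
        }
        for package, topic, title in entries
    ]


def dump_index(index):
    """Serialise the index to a JSON string (the 'downloadable file')."""
    return json.dumps(index)


def load_index(text):
    """Read an index back from its JSON text."""
    return json.loads(text)


def search(index, pattern):
    """Return (package, topic) pairs whose text contains the pattern."""
    needle = pattern.lower()
    return [(e["package"], e["topic"]) for e in index if needle in e["text"]]


if __name__ == "__main__":
    # Hypothetical entries; a real index would be generated from package docs.
    entries = [
        ("plotrix", "gantt.chart", "Display a Gantt chart"),
        ("plotrix", "axis.break", "Place a break mark on an axis"),
        ("concord", "cohen.kappa", "Kappa reliability coefficient"),
    ]
    downloaded = dump_index(build_index(entries))  # stands in for the download
    index = load_index(downloaded)
    print(search(index, "kappa"))
    print(search(index, "plotrix"))
```

Because the index holds only names and titles, it stays small compared with installing every package, which is the point of the proposal; a plain substring match is used here in place of whatever matching an actual tool would offer.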
Re: [Rd] Re: [R] A long digression on packages
Gabor Grothendieck wrote:

| I think that some time ago there was a discussion of having a downloadable file that could be used to help.search through so that a relatively small download and no package installation would allow a comprehensive offline help.search of all CRAN packages. An online version of help.search might be another possibility.

There are some great open source indexing and search tools available, given documentation in HTML or PDF formats. One I'm rather fond of is swish-e, which can be found at http://www.swish-e.org/

There is a Windows native version available, IIRC, although I've only used it on Linux systems.
Re: [Rd] Re: [R] A long digression on packages
On 5 June 2005 at 12:48, M. Edward (Ed) Borasky wrote:
| Dirk Eddelbuettel has done a lot of work integrating the CRAN and other
| R package collections with the Debian GNU/Linux package management
| system. This rather neatly solves the non-CRAN dependency problems, at
| least for Debian.

Thanks for the kind mention but I'm afraid that is actually not quite correct. We do now have some 50 or so CRAN packages in Debian ... but that does not solve the updating problem. I.e. if you install those Debian packages, and then ask R to do update.packages() it has no notion of what came from manual installation (and should get updated) and what came from Debian and should get a new package via apt-get. In Quantian, I use the existing Debian package and then fill 'by hand' to get fairly complete coverage.

Also, compared to CRAN, we're not providing that much coverage. There is, however, work going on behind the scenes to provide /most/ of CRAN via auto-generated Debian packages, preferably in an apt-get'able archive. We're not ready yet to lift the curtain. But if there's someone out here in the Debian and R intersection interested and willing to help (with some crude Perl coding), let me/us know and we'll get you involved.

Regards, Dirk

--
Statistics: The (futile) attempt to offer certainty about uncertainty.
         -- Roger Koenker, 'Dictionary of Received Ideas of Statistics'