[Rd] Re: [R] A long digression on packages

2005-06-05 Thread Duncan Murdoch
Hi.  I think this discussion is more relevant to R-devel, so that's 
where I've sent my reply.


Jim Lemon wrote:

Hello again,

First, thanks for the help that got the latest plotrix package finished. 
I had been planning to write something about packages since Scott 
Waichler offered the gantt.chart function. Then Ben Bolker (who helped 
me to write the axis.break function) asked if I would be willing to 
include some of his plotting functions and almost immediately after that 
Sander Oom kindly donated the soil texture plotting function in the same 
way. I could procrastinate no longer.


There are now about 500 packages on CRAN. Some are focused: they cover a 
particular area well, and a prospective user can easily discover their 
potential usefulness. Others are less so. I consider the plotrix 
package one of the former and, so as not to upset too many people, I 
will use the other package I contributed to CRAN as an example of the 
latter.


When I initially wrote concord, it was intended as a package of 
functions dealing with concordance and reliability. Okay, but I found 
Kendall's W so useful that I couldn't help including it, and somehow 
Page's test of ordered alternatives crept in and invited the Jonckheere 
test to the party and at that point I realized that I had maybe forty or 
fifty more or less useful functions floating around my R directory. Now 
many of these are probably floating around other people's R directories 
as well. Consider Cohen's kappa. The tabular method is included in 
e1071, my version has Cohen's plus two additional methods, and the 
recently contributed psy package has yet another version. Maybe there 
are still more encrypted in packages that I haven't even looked at.


The point of all this is that it would make many users' lives easier if 
there were less pandemonium in packages. The mistakes I have made in 
concord I have tried not to repeat in plotrix. Unless a user search of 
the documentation in packages materializes, it's become mighty hard to 
work out if the function you don't want to write has already been 
written. We also spend a lot of time responding to or deriding 
correspondents who ask about such things.


Would it be an idea to have informal R periphery teams, or even 
individual package lords, who would bear with, or maybe welcome, other 
people's functions? That is, I think plotrix has been greatly enhanced 
by recent contributions. Conversely, I wonder if it would be possible to 
shrink or maybe even evaporate concord by discovering duplicate methods 
in other packages or by contributing concord functions or parts thereof 
myself. It's not that I don't like maintaining concord or think the 
functions are worthless, just that I am mildly embarrassed to be adding 
to the duplication of effort and unnecessary volume of packages.


Feel free to comment upon this, although if you really want to rave, try 
it out on me first before clagging the list. Thanks for your attention.


A difficulty with multi-author packages is that it's harder to maintain 
consistency within the package, and it's harder to handle maintenance.


Another approach is to try to keep your packages small and focussed. 
The problem with this is what you mentioned above:  there are already 
500 packages, and it's hard to know what's there.  The task views 
should help with this; there are 5 online so far.  (See 
http://cran.us.r-project.org/src/contrib/Views.)  There is also a need 
for Misc packages for things too small to be a package on their own, but 
I think we need better ways to expose what is in them.


Of course, with disk sizes as they are now, it's not unreasonable to 
install all of the contributed CRAN packages on a PC.  Then 
help.search() *will* do searches through them all.


Duncan Murdoch

__
R-devel@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Re: [R] A long digression on packages

2005-06-05 Thread Peter Dalgaard
Dirk Eddelbuettel [EMAIL PROTECTED] writes:

 Jim raises good points, as do the replies. On the topic of '500+ and
 growing', let me add my pet peeve: It is mighty impossible to know /what/
 changed /when/ in CRANland.
 
 Being Debian maintainer for a fair number of packages, I owe users of those
 packages timely updates. But the best I can do is to look at the
 timestamp-sorted source directory http://cran.r-project.org/src/contrib/?M=D
 That is tedious, as well as error-prone.  Moreover, as an R user, I'd like to
 know what is being added, and what is being changed. There is no way to
 know right now.
 
 It would not be hard to write a little monitoring script that looks at the
 directory (and keeps tabs in an Rdata structure, or SQLite db, or ...) and
 spits out either emails, or maybe rss-feed updates, of either or both of 'new
 packages' or 'new versions'.  If additionally we would enforce (err let's
 start with encourage) a standardised changelog (say $SRC/inst/CHANGES or
 $SRC/inst/ChangeLog) then that could get parsed too.  I had meant to play
 with some code for this for a while now but it just hasn't happened.
 Whining on a list is easier than writing code, unfortunately...
 
 Comments?

You might want to have a closer look at the way recommended packages
are handled by an R distribution build, using rsync, links,
timestamps, and makefile rules. 
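
For readers following the thread, the monitoring idea quoted above can be sketched roughly as follows. This is a minimal illustration in Python, not code from any of the posters. It assumes the directory listing has already been fetched as a list of file names of the form `name_version.tar.gz`; the function names, the table name `seen`, and the version numbers in the demo are all made up for the example.

```python
import re
import sqlite3

# Matches source tarball names such as "plotrix_1.0.tar.gz" (name_version.tar.gz).
TARBALL = re.compile(r"^([A-Za-z0-9.]+)_([0-9][0-9.\-]*)\.tar\.gz$")


def parse_listing(filenames):
    """Return {package: version} for every name that looks like a source tarball."""
    found = {}
    for name in filenames:
        match = TARBALL.match(name)
        if match:
            found[match.group(1)] = match.group(2)
    return found


def record_snapshot(conn, filenames):
    """Compare a listing with the stored state; return (new_packages, new_versions)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen (package TEXT PRIMARY KEY, version TEXT)"
    )
    known = dict(conn.execute("SELECT package, version FROM seen"))
    current = parse_listing(filenames)
    new_packages = sorted(p for p in current if p not in known)
    new_versions = sorted(
        (p, known[p], v) for p, v in current.items() if p in known and known[p] != v
    )
    # Remember what we saw so the next run only reports differences.
    conn.executemany(
        "INSERT OR REPLACE INTO seen (package, version) VALUES (?, ?)",
        current.items(),
    )
    conn.commit()
    return new_packages, new_versions


# Demo with hypothetical version numbers; a real script would use a file on disk.
conn = sqlite3.connect(":memory:")
record_snapshot(conn, ["plotrix_1.0.tar.gz", "concord_1.0.tar.gz"])
new, changed = record_snapshot(
    conn, ["plotrix_1.1.tar.gz", "concord_1.0.tar.gz", "psy_1.0.tar.gz"]
)
print(new)      # ['psy']
print(changed)  # [('plotrix', '1.0', '1.1')]
```

The two returned lists correspond to the 'new packages' and 'new versions' notifications mentioned in the proposal; turning them into emails or a feed, and parsing a changelog, are left out of the sketch.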

-- 
   O__   Peter Dalgaard Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics 2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark  Ph: (+45) 35327918
~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907



Re: [Rd] Re: [R] A long digression on packages

2005-06-05 Thread Dirk Eddelbuettel

On 5 June 2005 at 17:31, Peter Dalgaard wrote:
| Dirk Eddelbuettel [EMAIL PROTECTED] writes:
| 
|  Jim raises good points, as do the replies. On the topic of '500+ and
|  growing', let me add my pet peeve: It is mighty impossible to know /what/
|  changed /when/ in CRANland.
|  
|  Being Debian maintainer for a fair number of packages, I owe users of those
|  packages timely updates. But the best I can do is to look at the
|  timestamp-sorted source directory http://cran.r-project.org/src/contrib/?M=D
|  That is tedious, as well as error-prone.  Moreover, as an R user, I'd like to
|  know what is being added, and what is being changed. There is no way to
|  know right now.
|  
|  It would not be hard to write a little monitoring script that looks at the
|  directory (and keeps tabs in an Rdata structure, or SQLite db, or ...) and
|  spits out either emails, or maybe rss-feed updates, of either or both of 'new
|  packages' or 'new versions'.  If additionally we would enforce (err let's
|  start with encourage) a standardised changelog (say $SRC/inst/CHANGES or
|  $SRC/inst/ChangeLog) then that could get parsed too.  I had meant to play
|  with some code for this for a while now but it just hasn't happened.
|  Whining on a list is easier than writing code, unfortunately...
|  
|  Comments?
| 
| You might want to have a closer look at the way recommended packages
| are handled by an R distribution build, using rsync, links,
| timestamps, and makefile rules. 

And recode/adapt that for the packages I am interested in? Works, but doesn't
scale.  But maybe I am misunderstanding you here.

Dirk

-- 
Statistics: The (futile) attempt to offer certainty about uncertainty.
 -- Roger Koenker, 'Dictionary of Received Ideas of Statistics'



Re: [Rd] Re: [R] A long digression on packages

2005-06-05 Thread Gabor Grothendieck
On 6/5/05, M. Edward (Ed) Borasky [EMAIL PROTECTED] wrote:
 
 
 Duncan Murdoch wrote:
 
  Of course, with disk sizes as they are now, it's not unreasonable to
  install all of the contributed CRAN packages on a PC.  Then
  help.search() *will* do searches through them all.
 
 Some of them are very specialized, and some of them have non-CRAN
 dependencies. I've done a few "load everything from CRAN" operations on
 my Linux boxes, only to overflow the warnings list with missing Linux
 software. And, as an example, I have zero use for molecular biology
 packages.
 
 Dirk Eddelbuettel has done a lot of work integrating the CRAN and other
 R package collections with the Debian GNU/Linux package management
 system. This rather neatly solves the non-CRAN dependency problems, at
 least for Debian.
 
 Other people have done similar things for Perl packages and Common Lisp
 packages, both in Debian and in Gentoo's Portage package management
 system. CRAN could easily be integrated into Portage, but nobody has
 stepped forward to volunteer. Maybe when I retire ... :)
 
 And where does this leave Windows users? There's nothing like Debian or
 Portage for them; CRAN would have to build it from scratch.

I think that some time ago there was a discussion of having a downloadable
file that could be used to help.search through so that a relatively small
download and no package installation would allow a comprehensive
offline help.search of all CRAN packages.   An online version of 
help.search might be another possibility.
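
As a rough illustration of the downloadable-index idea, here is a minimal sketch in Python. It is not an existing CRAN facility: it assumes the index ships as a list of (package, topic, title) records, and the entries in the demo are illustrative placeholders rather than text copied from the packages' actual help pages.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(entries):
    """Map each word to the set of (package, topic) pairs whose text contains it."""
    index = defaultdict(set)
    for package, topic, title in entries:
        for word in tokenize(topic + " " + title):
            index[word].add((package, topic))
    return index


def search(index, query):
    """Return the sorted (package, topic) pairs that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


# Placeholder records; a real index file would be generated from package help files.
entries = [
    ("plotrix", "gantt.chart", "Display a Gantt chart"),
    ("plotrix", "axis.break", "Place a break mark on an axis"),
    ("concord", "cohen.kappa", "Kappa reliability coefficient"),
    ("psy", "ckappa", "Cohen's kappa for two raters"),
]
index = build_index(entries)
print(search(index, "kappa"))        # [('concord', 'cohen.kappa'), ('psy', 'ckappa')]
print(search(index, "gantt chart"))  # [('plotrix', 'gantt.chart')]
```

A search like this needs only the index file, not the packages themselves, which is the property the suggestion is after: duplicated functionality (two kappa implementations in the demo) shows up in a single query.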



Re: [Rd] Re: [R] A long digression on packages

2005-06-05 Thread M. Edward (Ed) Borasky
Gabor Grothendieck wrote:

I think that some time ago there was a discussion of having a downloadable
file that could be used to help.search through so that a relatively small
download and no package installation would allow a comprehensive
offline help.search of all CRAN packages.   An online version of 
help.search might be another possibility.

There are some great open source indexing and search tools available,
given documentation in HTML or PDF formats. One I'm rather fond of is
swish-e, which can be found at

http://www.swish-e.org/

There is a Windows native version available, IIRC, although I've only
used it on Linux systems.



Re: [Rd] Re: [R] A long digression on packages

2005-06-05 Thread Dirk Eddelbuettel

On 5 June 2005 at 12:48, M. Edward (Ed) Borasky wrote:
| Dirk Eddelbuettel has done a lot of work integrating the CRAN and other
| R package collections with the Debian GNU/Linux package management
| system. This rather neatly solves the non-CRAN dependency problems, at
| least for Debian.

Thanks for the kind mention but I'm afraid that is actually not quite
correct.  We do now have some 50 or so CRAN packages in Debian ... but that
does not solve the updating problem.  I.e. if you install those Debian
packages, and then ask R to do update.packages() it has no notion of what
came from manual installation (and should get updated) and what came from
Debian and should get a new package via apt-get.  In Quantian, I use the
existing Debian package and then fill 'by hand' to get fairly complete
coverage.
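
One way to picture the updating problem: something has to decide, per package, whether it is managed by the distribution or was installed by hand. The sketch below is a minimal illustration in Python and only an assumption about how such a split could work; it supposes that distribution-managed packages live in known library directories, and the paths and the function name are hypothetical.

```python
from pathlib import PurePosixPath


def split_by_origin(installed, distro_libs):
    """Split {package: library dir} into (distro_managed, manually_installed) lists."""
    # Normalise paths so a trailing slash does not change the result.
    distro_libs = {str(PurePosixPath(p)) for p in distro_libs}
    distro, manual = [], []
    for package, libdir in sorted(installed.items()):
        if str(PurePosixPath(libdir)) in distro_libs:
            distro.append(package)
        else:
            manual.append(package)
    return distro, manual


# Hypothetical layout: one directory owned by the package manager, one for manual installs.
installed = {
    "plotrix": "/usr/local/lib/R/site-library",
    "e1071": "/usr/lib/R/site-library/",
    "concord": "/usr/local/lib/R/site-library",
}
distro, manual = split_by_origin(installed, ["/usr/lib/R/site-library"])
print(distro)  # ['e1071']
print(manual)  # ['concord', 'plotrix']
```

Under that assumption, only the second list would be candidates for update.packages(), while the first would be left to apt-get.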

Also, compared to CRAN, we're not providing that much coverage.  There is,
however, work going on behind the scenes to provide /most/ of CRAN via
auto-generated Debian packages, preferably in an apt-get'able archive. We're
not ready yet to lift the curtain.  But if there's someone out here in the
Debian and R intersection interested and willing to help (with some crude
Perl coding), let me/us know and we'll get you involved.

Regards, Dirk

-- 
Statistics: The (futile) attempt to offer certainty about uncertainty.
 -- Roger Koenker, 'Dictionary of Received Ideas of Statistics'
