[R] Interfacing C++, MySQL and R
Hello! After a presentation of some statistical analysis of process data (where the few R possibilities I was able to show made quite a big impression), I was asked if it was possible to program a statistical application which could be used directly by the end user. Such an application would include a user-friendly interface (developed in C++), a database, a core statistical program, and standard output; the necessary queries and statistical procedures would be interactively generated from the user input by the C++ program. As I do not intend to reprogram the necessary statistical functions if I can help it, I'm interested to know: a) is it possible to integrate R in such a way? b) naturally, as I would sell the end product, what are the royalty arrangements? c) does anybody on the list have experience with such a project? Thanks for the help! Anne __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] how to print a plot
You did use dev.off() to finish the plots before trying to look at them? The symptoms you report are what happens if you did not. There is no ps() function in R: the postscript device is postscript(), not ps(). If you want to print a plot, try dev.print(). If you want to copy to a file, try dev.copy2eps(). (You are on Linux, where EPS is more widely acceptable than PDF.) On 15 Sep 2003, Weiming Zhang wrote: Hi, Thank both of you. I tried everything. pdf(file="out.pdf") gave me a damaged pdf file. ps() did not print. ps("out.ps") gave me a ps file with a badly drawn graph that could not be printed. I am using RH Linux 7.2. Thanks again. Weiming Zhang On Mon, 2003-09-15 at 16:06, Jason Turner wrote: On Tue, 2003-09-16 at 08:56, Weiming Zhang wrote: Hi, I am using R-1.7.1 on Linux. I integrated XEmacs with R. Could anybody tell me how to print a plot? I used the plot function to make some graphs and then I wanted to print them or to save them to files. But I could not find out how to do it. Have you tried: help(Devices) help(pdf) What I do: pdf(file="myplots.pdf") plot(...) dev.off() Use Acrobat or gv to view the pdf files. Postscript is also good, but not as universally understood; I have many colleagues who work in very standard Windows environments, where ghostscript is unknown. PDF is a very sensible choice for e-mailing graphs. -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz +64-(0)21-343-545 -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
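To make the advice above concrete, here is a minimal sketch of the two routes mentioned (screen-copy versus plotting straight into a file); the file names are illustrative:

```r
x <- rnorm(100)

## Route 1: draw on the screen device, then copy or print it
plot(x)
dev.copy2eps(file = "myplot.eps")  # copy current plot to an EPS file
dev.print()                        # or send it to the default printer

## Route 2: plot directly into a PDF file
pdf(file = "myplots.pdf")
plot(x)
dev.off()  # essential: closes the device and finishes the file
```

Forgetting the final dev.off() is exactly what produces the "damaged" PDF described above: the file trailer is never written.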
Re: [R] POSIX and identify
Hi On 15 Sep 2003 at 20:09, Troels Ring wrote: Thanks a lot, but something else may be awry? - at least as.POSIXct(Dato), although now of length 84, still elicits a report of different argument lengths, even though the length is now 84 for both arguments. try plot(as.POSIXct(Dato), Crea) and then identify(as.POSIXct(Dato), Crea) identify(as.POSIXct(Dato), Crea, 5, plot=TRUE) Error in identify(x, y, as.character(labels), n, plot, offset) : different argument lengths length(as.POSIXct(Dato)) [1] 84 length(Crea) [1] 84 length(Dato) [1] 9 Best wishes Troels Ring Aalborg At 18:43 9/15/03, you wrote: You need to convert to POSIXct before using Dato in identify(). This will work as you expected in R 1.8.0. On Mon, 15 Sep 2003, Troels Ring wrote: Dear Friends, I'm using WinXP and R 1.7.1, plotting some data using dates on the x-axis, and wanted to use identify to show some points, but was told by identify that the x and y vectors producing a fine graph with 84 points were not equal in length. Below is Dato for the dates - length(Dato) finds 9 but str finds 84, as known. Will identify not work in this context? Best wishes Troels Ring Aalborg, Denmark Dato [1] 2000-01-04 2000-01-07 2000-01-10 2000-01-13 2000-01-17 ... [81] 2003-04-23 2003-05-14 2003-07-30 2003-08-14 length(Dato) [1] 9 str(Dato) `POSIXlt', format: chr [1:84] 2000-01-04 2000-01-07 2000-01-10 2000-01-13 ... -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 Cheers Petr Pikal [EMAIL PROTECTED]
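For anyone puzzled by the length(Dato) result above: a POSIXlt object is internally a list of nine components (sec, min, hour, mday, mon, year, wday, yday, isdst), and in the R versions discussed here length() counted those components rather than the dates (later R versions report the number of dates instead). A small sketch of the conversion being recommended:

```r
## Dates stored as POSIXlt, the class that caused the confusion above
Dato <- as.POSIXlt(c("2000-01-04", "2000-01-07", "2000-01-10"))

## Under the R version in this thread, length(Dato) counted the nine
## internal list components, not the three dates.
unclass(Dato)  # shows the underlying list structure

## Converting to POSIXct gives one element per date, which is what
## plot() and identify() need:
Dato.ct <- as.POSIXct(Dato)
length(Dato.ct)  # 3, matching the y vector
```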
[R] Fourth R Mailing List : R-packages
We (mainly the R core team) have been discussing the creation of another R mailing list, with the goal to fill the gap between R-help (very high volume, with its great merits) and R-announce (only for important R announcements (mostly R-core), hence __MODERATED__ and *very* low volume, and hence highly recommended for almost all users of R; *** all messages are forwarded to R-help ***). In the past, several CRAN package authors have rightly felt that they would like the announcement of a major update of a package to be a bit more prominent than the flood of messages on R-help, but (most of the time) they still weren't supposed nor granted to use R-announce for this. This has been one main motivation for this new mailing list: R-packages o all messages forwarded to R-help o moderated (i.e. not accepting posts by anyone), but CRAN package authors (and others, similarly qualified) can freely post without moderator interaction {unless there's abuse}. The corresponding (new) web page, http://www.stat.math.ethz.ch/mailman/listinfo/r-packages/ now has TITLE: R Packages / Extensions Announcements DESCRIPTION: A moderated board for announcements about contributed R packages and similar R project extensions. All messages are forwarded to R-help automatically, so please do not subscribe to this list if you are subscribed to R-help. For major announcements on the R project, see the R-announce mailing list instead. And R-project.org's Mailing Lists web page will describe it from tomorrow as: R-packages -- This list is for announcements as well, usually on the availability of new or enhanced contributed packages (on CRAN, typically). Note that the list is moderated. However, CRAN package authors (and others, similarly qualified) can freely post. As with R-announce, all messages to R-packages are automatically forwarded to the main R-help mailing list; hence you should only subscribe to R-packages if you are not subscribed to R-help. Use the web interface for information, subscription, archives, etc. 
Amount of mail to expect: Of course, we don't know yet, but I'd expect to see only a few messages per week. Finally, just re-iterating the obvious: o This is *NOT* a list for discussion, just announcements of extensions to R. o Only subscribe if you are *NOT* subscribed to R-help (but then, strongly consider doing it)! For more info, subscription, etc., please use the URL above. Your R mailing list maintainer, Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16, Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228
[R] simplifying randomForest(s)
Dear All, I have been using the randomForest package for a couple of difficult prediction problems (which also share p >> n). The performance is good, but since all the variables in the data set are used, interpretation of what is going on is not easy, even after looking at the variable importance as produced by the randomForest run. I have tried a simple variable selection scheme, and it does seem to perform well (as judged by leave-one-out), but I am not sure if it makes any sense. The idea is, in a kind of backwards elimination, to eliminate one by one the variables with the smallest importance (or all the ones with negative importance in one go) until the out-of-bag estimate of the classification error becomes larger than that of the previous model (or of the initial model). So nothing really new. But I haven't been able to find any comments in the literature about simplification of random forests. Any suggestions/comments? Best, Ramón -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +34-91-224-6972 Phone: +34-91-224-6900 http://bioinfo.cnio.es/~rdiaz
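The backwards-elimination scheme described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: the stopping rule and the importance column used ("MeanDecreaseAccuracy", and the "OOB" column of err.rate, as named in the randomForest package) are assumptions, and the caveat in the replies below about the OOB error becoming biased under this kind of selection applies in full.

```r
library(randomForest)

## Illustrative backwards elimination by variable importance.
## x: predictor data frame/matrix; y: factor response.
backward.rf <- function(x, y, ntree = 500) {
  rf <- randomForest(x, y, ntree = ntree, importance = TRUE)
  best.err <- rf$err.rate[ntree, "OOB"]
  while (ncol(x) > 2) {
    imp  <- importance(rf)[, "MeanDecreaseAccuracy"]
    drop <- names(which.min(imp))          # least important variable
    x.new  <- x[, colnames(x) != drop, drop = FALSE]
    rf.new <- randomForest(x.new, y, ntree = ntree, importance = TRUE)
    err.new <- rf.new$err.rate[ntree, "OOB"]
    if (err.new > best.err) break          # stop when OOB error worsens
    x <- x.new; rf <- rf.new; best.err <- err.new
  }
  rf
}
```

As the discussion later in this thread makes clear, the OOB error of the final forest is no longer an honest error estimate; an outer cross-validation loop around the whole procedure is needed for that.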
Re: [R] Persp and color
Hi, If you run the demo for persp (I have R 1.7), you will see that there is a good example of 'colouring' a volcano according to different heights; just try demo(persp) and check out the code. You probably will find it too complicated, as I did; I was trying to do the same and honestly I wasn't able to. However, there is a way around, and it is to use the function wireframe from the lattice package: library(lattice) ?wireframe If you run through the help examples you'll see that it is a lot easier to colour the surfaces the way you want using this function. However, wireframe is extremely slow, so if you have a big matrix it might be a pain in the behind. Also, the way you feed the data to wireframe is different to the way you do it with the persp function. I hope this is of some help. M.
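A minimal sketch of the wireframe route being suggested, with the surface coloured by height via drape (the test surface here is made up; drape and col.regions are arguments documented on the wireframe help page):

```r
library(lattice)

## A small test surface: a 2-D Gaussian bump on a 30 x 30 grid
g <- seq(-3, 3, length = 30)
z <- outer(g, g, function(x, y) exp(-(x^2 + y^2) / 2))

## drape = TRUE colours the facets by height; col.regions sets the palette
wireframe(z, drape = TRUE, col.regions = terrain.colors(100))
```

Unlike persp(), no hand-built colour vector is needed; the height-to-colour mapping is done for you, which is the "built-in" convenience referred to above.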
[R] package documentation
Dear all, I am writing my first package and everything seems to work (at least up to now). However, when I try to build the documentation (.dvi or .pdf) using Rcmd Rd2dvi.sh --pdf mypack.Rd I get a mypack.pdf whose title is R documentation of mypack.Rd instead of The mypack package as it should be. Is that right? Also Version, Title, License, namely the info from the DESCRIPTION file, are missing from the first page. Where is the problem? Many thanks, vito
Re: [R] package documentation
On Tue, 16 Sep 2003, Vito Muggeo wrote: I am writing my first package and everything seems to work (at least up to now). However, when I try to build the documentation (.dvi or .pdf) using Rcmd Rd2dvi.sh --pdf mypack.Rd I get a mypack.pdf whose title is R documentation of mypack.Rd instead of The mypack package as it should be. Is that right? It is right, rather than you: it did as you asked and not as you wanted. Also Version, Title, License, namely the info from the DESCRIPTION file, are missing from the first page. Where is the problem? gannet% R CMD Rd2dvi --help Usage: R CMD Rd2dvi [options] files Generate DVI (or PDF) output from the Rd sources specified by files, by either giving the paths to the files, or the path to a directory with the sources of a package. You haven't called this with the option to give what you expected. Note what follows the `or': give the path to the package directory, not to a single .Rd file. That will give Package 'mypack' as the title. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
RE: [R] simplifying randomForest(s)
Ramon, From: Ramon Diaz-Uriarte [mailto:[EMAIL PROTECTED]] Dear All, I have been using the randomForest package for a couple of difficult prediction problems (which also share p >> n). The performance is good, but since all the variables in the data set are used, interpretation of what is going on is not easy, even after looking at the variable importance as produced by the randomForest run. I have tried a simple variable selection scheme, and it does seem to perform well (as judged by leave-one-out), but I am not sure if it makes any sense. The idea is, in a kind of backwards elimination, to eliminate one by one the variables with the smallest importance (or all the ones with negative importance in one go) until the out-of-bag estimate of the classification error becomes larger than that of the previous model (or of the initial model). So nothing really new. But I haven't been able to find any comments in the literature about simplification of random forests. This is quite a hazardous game. We've been burned by this ourselves. I'll send you a paper we submitted on variable selection for random forests off-line. (Those who are interested, let me know.) The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing while the error rate on an independent test set will be flat or increase). I was naïve enough to ask Breiman about this, and his reply was something like any competent statistician would know that you need something like cross-validation to do that... In the upcoming version 5 of Breiman's Fortran code, he offers an option to run RF twice, the first time with all variables, and the second with the k (selected by the user) most important variables from the 1st run. 
The OOB error rate from the 2nd run is no longer unbiased, but the bias is probably not too severe with only one iteration. Best, Andy Any suggestions/comments? Best, Ramón -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +34-91-224-6972 Phone: +34-91-224-6900 http://bioinfo.cnio.es/~rdiaz
[R] RSPython crashes (using R 1.7.1 under Solaris 5.9)
Hello. I tried to install RSPython on Solaris 5.9. After compiling R with --enable-R-shlib, I tried to install RSPython using R INSTALL --clean RSPython. This led to an error complaining about a missing libutil, which seems not to exist on Solaris. Therefore I just removed the -lutil entry in configure and tried to install again. The installation worked without problems, but after calling python and importing RS the python interpreter immediately crashes with a segmentation fault. Thanks for any hints. Michael
RE: [R] Interfacing C++ , MysQL and R
From: Anne Piotet [mailto:[EMAIL PROTECTED]] Hello! After a presentation of some statistical analysis of process data (where the few R possibilities I was able to show made quite a big impression), I was asked if it was possible to program a statistical application which could be used directly by the end user. Such an application would include a user-friendly interface (developed in C++), a database, a core statistical program, and standard output; the necessary queries and statistical procedures would be interactively generated from the user input by the C++ program. As I do not intend to reprogram the necessary statistical functions if I can help it, I'm interested to know if a) it is possible to integrate R in such a way? b) naturally, as I would sell the end product, what the royalty arrangements are Others will know more about this, but that never stopped me from tossing in my $0.02... As R is licensed under the GPL, if you distribute (e.g., sell) your code, it will have to be GPL'ed as well. I believe that means while you can sell it for money, 1. You have to make it clear to whoever gets the code that it's GPL'ed. 2. You have to distribute the source code, or allow a way for people to get the source code. 3. You cannot restrict further distribution of the code, free or otherwise. My understanding of how RedHat deals with this (at least in their enterprise server product) is by tacking onto the GPL a term that whoever installs their software agrees to purchase a service/support contract from them. Another company that has software linked to R does a similar thing, by not selling the software but the service (installation and training). HTH, Andy c) has anybody on the list experience with such a project? Thanks for the help! 
Anne
Re[2]: [R] Persp and color and adding a color vector
Following Prof. Uwe's idea, and after checking up some docs, I was able to build a colour vector with the correct colours and then call it from persp using the col = option... Nevertheless I still have a small problem... using something like: colorvect <- rainbow(length(mat3), start=0.1, end=0.8) persp(mat3, col=colorvect, box=FALSE, theta=30) works something like I need... But... if I try to visualize a specific part like mat3[1:900,2:78]... persp(mat3[1:900,2:78], col=colorvect, box=FALSE, theta=30) What I get is a bleached result with only part of the colours... I know that I was expecting this, but how can I avoid it without remaking the colorvect vector each time I call persp? The other idea is making an equivalent matrix with each cell carrying the colour info... but how can I automate that kind of procedure? Thanks Mark Marques
[R] gam and concurvity
Hello, in the paper Avoiding the effects of concurvity in GAM's of Figueiras et al. (2003) it is mentioned that in GLM collinearity is taken into account in the calculation of the standard errors, but not in GAM (resulting in confidence intervals that are too narrow and understated p-values; GAM S-Plus version). I haven't found any references to GAM and concurvity or collinearity on the R page, and I wonder if the R version of GAM differs on this point. Another question would be, what is the best manual way of doing variable selection, given the lack of a stepwise procedure for GAM? Include the first variables, add var1, and if the GCV improves (what would be considered an improvement?) or the P-value is significant, keep it, otherwise drop it - then add var2, and so on? thanks in advance, cheers Martin
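The manual forward-selection loop described here can be sketched with mgcv as follows. This is only one possible reading of the procedure, not an endorsed method: the data set and variable names are made up, and the gcv.ubre component holding the GCV score is the name used in mgcv's documentation.

```r
library(mgcv)

## Compare nested smooth models by GCV: keep var2 only if adding
## s(var2) lowers the GCV score (mydata, y, var1, var2 are illustrative).
fit1 <- gam(y ~ s(var1),           data = mydata)
fit2 <- gam(y ~ s(var1) + s(var2), data = mydata)

if (fit2$gcv.ubre < fit1$gcv.ubre) {
  fit <- fit2   # var2 improved the GCV score; keep it
} else {
  fit <- fit1   # no improvement; drop var2 and try the next candidate
}
```

What counts as a meaningful GCV improvement (the question raised above) is a judgment call; repeated greedy comparisons like this inherit the usual risks of stepwise selection.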
Re: [R] Persp and color
Hi, first of all I would like to say that honestly the persp demo is quite impressive, I won't take that away from you. The only problem I had was that the code that actually builds the matrix of topo-colours used in the demo is quite complicated (at least for me), and that code is poorly commented. So I was left with a series of help() calls to try to see what each function would do, etc., etc. While I was in that process I remembered the wireframe function, and after checking its documentation I found out that it has 'built-in' the ability to create these topo colours, which I think is a great advantage. Maybe a good idea would be to insert the procedure you used to create the colours into the persp function itself, so humble neophyte users can easily plot striking volcano surfaces. This is actually the bit of code I couldn't work out; I know I would if I could just invest more of my precious time in it: fcol <- fill zi <- volcano[-1, -1] + volcano[-1, -61] + volcano[-87, -1] + volcano[-87, -61] fcol[-i1, -i2] <- terrain.colors(20)[cut(zi, quantile(zi, seq(0, 1, len = 21)), include.lowest = TRUE)] persp(x, y, 2 * z, theta = 110, phi = 40, col = fcol, scale = FALSE, ltheta = -120, shade = 0.4, border = NA, box = FALSE) Just another thing: I have realised that the demo runs from beginning to end without stopping (not always); that is not very nice because the plots are displayed too quickly to appreciate, so the user is left to 'run' the demo manually, i.e. copying and pasting each bit of code in order to see each plot in detail. I am aware that R is the product of the cooperation of many people, contributing part of their work-time into making it better; I think your demo is fine, and perhaps you won't have time to improve on it, don't worry about that (no bad feelings). You correctly pointed out that a better way around was to ask R-help directly. Certainly, that is what I intended to do with my own problem, but first I wanted to write my code properly. 
Tomorrow, I will post a thread about making persp representations of fractals in R, and maybe you will be able to help me in showing how to correctly apply the colours to the surface. Maybe you will find this interesting, and who knows, perhaps you will put it in your demo! By the way, I did also check help(persp), and how the colours are assigned to the surface facets is not well specified, not even in the examples (as far as I am aware). Thanks, Mario. At 14:41 16/09/03 +0200, you wrote: ucgamdo == ucgamdo [EMAIL PROTECTED] on Tue, 16 Sep 2003 11:46:18 +0100 writes: ucgamdo Hi, If you run the demo for persp (I have R 1.7), ucgamdo you will see that there is a good example of ucgamdo 'colouring' a volcano according to different ucgamdo heights, just try demo(persp) ucgamdo and check out the code. You probably will find it ucgamdo too complicated as I did, I was trying to do the ucgamdo same and honestly I wasn't able to. Thank you for your honesty. As a main author of that part of demo(persp) I'm quite interested to find out what the problem was. I assume you have also looked at help(persp)? ucgamdo However, there is a way around [ another way around would be to ask on R-help or ask someone who knows R better ... ] ucgamdo and it is to use the ucgamdo function wireframe from the lattice package library(lattice) ?wireframe ucgamdo If you run through the help examples you'll see ucgamdo that it is a lot easier to colour the surfaces the ucgamdo way you want using this function. However, ucgamdo wireframe is extremely slow, so, if you have a big ucgamdo matrix it might be a pain in the behind. Also, the ucgamdo way you feed the data to wireframe is different to ucgamdo the way you do it with the persp function. I hope ucgamdo this is of some help. ucgamdo M.
Re: [R] Persp and color and adding a color vector
Mark Marques wrote: Following Prof. Uwe's Who's that? If you mean me, I am not a Professor... idea, and after checking up some docs, I was able to build a colour vector with the correct colours and then call it from persp using the col = option... Nevertheless I still have a small problem... using something like: colorvect <- rainbow(length(mat3), start=0.1, end=0.8) Attention: ?persp tells you there are (nx-1)*(ny-1) facets given you have a matrix of dimension nx x ny. Additionally, this was not my idea; at the least you have to select the colors by height, if I understood your question correctly. persp(mat3, col=colorvect, box=FALSE, theta=30) works something like I need... But... if I try to visualize a specific part like mat3[1:900,2:78]... persp(mat3[1:900,2:78], col=colorvect, box=FALSE, theta=30) What I get is a bleached result with only part of the colours... What about writing a little function along the lines of foo <- function(M){ colvect <- ...(M)... persp(M, ...) } and calling it with foo(mat3[1:900,2:78]) I know that I was expecting this, but how can I avoid it without remaking the colorvect vector each time I call persp? As already mentioned, generate some colors and then take one color for a certain range of values of your matrix. Anyway, wireframe() in package lattice might have some features that are much more convenient for performing your task, as mentioned by someone else (too lazy to look into the archives). Uwe Ligges The other idea is making an equivalent matrix with each cell carrying the colour info... but how can I automate that kind of procedure? Thanks Mark Marques
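Filling in the foo() outline above, here is one hedged way to write it: recompute the height-based facet colours for whatever submatrix is passed in, approximating each facet's height by the mean of its four corners (as demo(persp) does). The function name, palette choice, and number of colour bins are all illustrative.

```r
foo <- function(M, n.col = 100, ...) {
  nx <- nrow(M); ny <- ncol(M)
  ## one height per facet: average the four corners of each of the
  ## (nx-1) * (ny-1) facets, as noted in ?persp
  zfacet <- (M[-1, -1] + M[-1, -ny] + M[-nx, -1] + M[-nx, -ny]) / 4
  pal <- rainbow(n.col, start = 0.1, end = 0.8)
  ## cut() bins the facet heights; the bin codes index the palette
  colvect <- pal[cut(zfacet, n.col)]
  persp(M, col = colvect, box = FALSE, theta = 30, ...)
}

## colours now follow the heights of the submatrix itself:
foo(volcano[1:50, 2:50])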
[R] Retrieve ... argument values
Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim plot(x, ...) } x <- rnorm(100) myfunc(x, ylim=c(-0.5, 0.5)) Error in myfunc(x, ylim = c(-0.5, 0.5)) : Object ylim not found I need to retrieve the values of ylim (if it is defined when the function is called) for later use in the function. Can anybody give me some hint? Thanks a lot. Huan
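One way around the error above (hasArg() only tells you whether the argument is present; it does not bind a variable called ylim): capture the dots with list(...) and pull out ylim by name. A minimal sketch:

```r
myfunc <- function(x, ...) {
  dots <- list(...)            # capture all ... arguments, named
  if (!is.null(dots$ylim)) {
    a <- dots$ylim             # the ylim value, available for later use
  }
  plot(x, ...)
}

x <- rnorm(100)
myfunc(x, ylim = c(-0.5, 0.5))  # no error; inside, a is c(-0.5, 0.5)
```

Note that list(...) evaluates the dotted arguments; if lazy evaluation matters, match.call() is an alternative worth looking at.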
R: [R] gam and concurvity
As someone (Simon Wood, for instance) could explain much better, and as is stressed in the help files of the mgcv package (the package including the gam() function), gam in R is not a clone of gam in S+. S+ uses backfitting while R uses penalized splines (see the references inside the gam() function). The approaches are quite different and can lead to substantial differences in particular cases, for instance with concurvity. best, vito PS Can you point out the exact reference for Figueiras et al. (2003)? - Original Message - From: Martin Wegmann [EMAIL PROTECTED] To: R-list [EMAIL PROTECTED] Sent: Tuesday, September 16, 2003 3:47 PM Subject: [R] gam and concurvity Hello, in the paper Avoiding the effects of concurvity in GAM's of Figueiras et al. (2003) it is mentioned that in GLM collinearity is taken into account in the calculation of the standard errors, but not in GAM (resulting in confidence intervals that are too narrow and understated p-values; GAM S-Plus version). I haven't found any references to GAM and concurvity or collinearity on the R page, and I wonder if the R version of GAM differs on this point. Another question would be, what is the best manual way of doing variable selection, given the lack of a stepwise procedure for GAM? Include the first variables, add var1, and if the GCV improves (what would be considered an improvement?) or the P-value is significant, keep it, otherwise drop it - then add var2, and so on? thanks in advance, cheers Martin
Re: [R] Persp and color
On Tue, 16 Sep 2003 [EMAIL PROTECTED] wrote: Just another thing: I have realised that the demo runs from beginning to end without stopping (not always); that is not very nice because the plots are displayed too quickly to appreciate, so the user is left to 'run' the demo manually, i.e. copying and pasting each bit of code in order to see each plot in detail. I am aware that R is the product of the cooperation of many people, contributing part of their work-time into making it better; I think your demo is fine, and perhaps you won't have time to improve on it, don't worry about that (no bad feelings). If you type par(ask=TRUE) you will always be prompted before a new graph is drawn. -thomas
Re: [R] simplifying randomForest(s)
Dear Andy, Thanks a lot for your message. This is quite a hazardous game. We've been burned by this ourselves. I'll send you a paper we submitted on variable selection for random forests off-line. (Those who are interested, let me know.) Thanks! The basic problem is that when you select important variables by RF and then re-run RF with those variables, the OOB error rate becomes biased downward. As you iterate more times, the overfitting becomes more and more severe (in the sense that the OOB error rate will keep decreasing while the error rate on an independent test set will be flat or increase). I was naïve enough to ask Breiman about this, and his reply was something like any competent statistician would know that you need something like cross-validation to do that... Yes, I understand the points you are making. However, I have tried to achieve protection against this problem by assessing the leave-one-out cross-validation error (LOOCVE) of the complete selection process. And the LOOCVE suggests this is working. Within the variable selection routine the OOB error rate is biased, but I guess that does not concern me that much, because I only use it to guide the selection. However, my final estimate of error comes from the LOOCVE. This is the skeleton of the algorithm: n <- length(y) for(i in 1:n) { the.simple.rf <- simplify.the.rf(data = data[-i, ]) prediction[i] <- predict(the.simple.rf, newdata = data[i, ]) } loocve <- sum(y != prediction) / n Thus, the LOOCVE is computed with observations that were never used for the simplification of the tree that is predicting them. [I'll be glad to send my code to anyone interested]. And the interesting thing with the data set I have tried is that it seems to perform reasonably (actually, the LOOCVE of a tree with the reduced set of variables is smaller than the LOOCVE of the original tree). (This is a first shot. 
I have a small sample size (29) so LOOCV is not that bad in terms of computation, although I am aware it can have high variance. I guess I could try the .632+ bootstrap method). Best, Ramón Best, Andy -- Ramón Díaz-Uriarte Bioinformatics Unit Centro Nacional de Investigaciones Oncológicas (CNIO) (Spanish National Cancer Center) Melchor Fernández Almagro, 3 28029 Madrid (Spain) Fax: +34-91-224-6972 Phone: +34-91-224-6900 http://bioinfo.cnio.es/~rdiaz
Re: [R] Persp and color
[EMAIL PROTECTED] wrote: Hi, first of all I would like to say that honestly the persp demo is quite impressive, I won't take that away from you. The only problem I had was that the code that actually builds the matrix of topo-colours used in the demo is quite complicated (at least for me), and that code is poorly commented. So I was left with a series of help() calls to try to see what each function would do, etc., etc. While I was in that process I remembered the wireframe function, and after checking its documentation I found out that it has 'built-in' the ability to create these topo colours, which I think is a great advantage. Maybe a good idea would be to insert the procedure you used to create the colours into the persp function itself, so humble neophyte users can easily plot striking volcano surfaces. This is actually the bit of code I couldn't work out; I know I would if I could just invest more of my precious time in it: fcol <- fill zi <- volcano[-1, -1] + volcano[-1, -61] + volcano[-87, -1] + volcano[-87, -61] Since dim(volcano) [1] 87 61 you have to throw away some points at the margins, because you need (nx-1)*(ny-1) facets' colors. And you want the color to be specified for the middle of the facets, not one of the 4 corners, so you average the matrices of those 4 corners. fcol[-i1, -i2] <- terrain.colors(20)[cut(zi, quantile(zi, seq(0, 1, len = 21)), include.lowest = TRUE)] You use 20 different colors, chosen (indexed) by quantiles of the matrix calculated above. That's the obvious idea (nicely implemented here, though). persp(x, y, 2 * z, theta = 110, phi = 40, col = fcol, scale = FALSE, ltheta = -120, shade = 0.4, border = NA, box = FALSE) Just another thing: I have realised that the demo runs from beginning to end without stopping (not always); that is not very nice because the plots are displayed too quickly to appreciate, so the user is left to 'run' the demo manually, i.e. 
copying and pasting each bit of code in order to see each plot in detail. I am aware that R is the product of the cooperation of many people contributing part of their work-time to making it better. I think your demo is fine, and perhaps you won't have time to improve on it; don't worry about that (no bad feelings). The improvement seems to be: par(ask = TRUE); demo(persp) Uwe Ligges You correctly pointed out that a better way around was to ask R-help directly. Certainly, that is what I intended to do with my own problem, but first I wanted to write my code properly. Tomorrow I will post a thread about making persp representations of fractals in R, and maybe you will be able to help me by showing how to correctly apply the colours to the surface. Maybe you will find it interesting, and who knows, perhaps you will put it in your demo! By the way, I did also check help(persp), and how the colours are assigned to the surface facets is not well specified, not even in the examples (as far as I am aware). Thanks, Mario. At 14:41 16/09/03 +0200, you wrote: ucgamdo == ucgamdo [EMAIL PROTECTED] on Tue, 16 Sep 2003 11:46:18 +0100 writes: ucgamdo Hi, if you run the demo for persp (I have R 1.7), ucgamdo you will see that there is a good example of ucgamdo 'colouring' a volcano according to different ucgamdo heights; just try demo(persp) ucgamdo and check out the code. You probably will find it ucgamdo too complicated, as I did; I was trying to do the ucgamdo same and honestly I wasn't able to. Thank you for your honesty. As a main author of that part of demo(persp) I'm quite interested to find out what the problem was. I assume you have also looked at help(persp)? ucgamdo However, there is a way around [ another way around would be to ask on R-help or ask someone who knows R better ... ... 
] ucgamdo and it is to use the ucgamdo function wireframe from the lattice package: library(lattice); ?wireframe ucgamdo If you run through the help examples you'll see ucgamdo that it is a lot easier to colour the surfaces the ucgamdo way you want using this function. However, ucgamdo wireframe is EXTREMELY slow, so if you have a big ucgamdo matrix it might be a pain in the behind. Also, the ucgamdo way you feed the data to wireframe is different from ucgamdo the way you do it with the persp function. I hope ucgamdo this is of some help. ucgamdo M. ucgamdo __ ucgamdo [EMAIL PROTECTED] mailing list ucgamdo https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
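The colouring trick explained above can be written out compactly. A minimal, self-contained sketch using only the built-in volcano matrix and base graphics (the theta/phi viewpoint and the choice of 20 colours are illustrative, not from the demo):

```r
z  <- volcano
nx <- nrow(z)   # 87
ny <- ncol(z)   # 61
## each facet's colour is driven by the mean height of its four corners,
## so drop one row/column from each margin and average the four shifts
zfacet <- (z[-1, -1] + z[-1, -ny] + z[-nx, -1] + z[-nx, -ny]) / 4
## cut the facet heights into 20 quantile bins and index the terrain palette
fcol <- terrain.colors(20)[cut(zfacet,
                               quantile(zfacet, seq(0, 1, len = 21)),
                               include.lowest = TRUE)]
persp(z, theta = 135, phi = 30, col = fcol, border = NA)
```

The key constraint is that persp() wants one colour per facet, i.e. a vector of length (nx-1)*(ny-1), which is exactly what the averaged corner matrix provides.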
Re: [R] Retrieve ... argument values
On Tue, 16 Sep 2003 [EMAIL PROTECTED] wrote: Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim; plot(x, ...) } One solution is dots <- substitute(list(...)); a <- dots$ylim which sets a to NULL if there is no ylim argument and to the ylim argument if it exists. -thomas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Retrieve ... argument values
For most purposes a more useful technique is to write the function with a default NULL argument, myfunc <- function(x, ylim = NULL), so that it can be called as myfunc(x) or myfunc(x, y). Inside the function you test for !is.null(ylim) and take appropriate action. Alternatively, and maybe more commonly, you give ylim a sensible default so the caller has to be explicit about setting ylim to NULL if required. As it happens, in your specific case you can write it as you did except for the line if (hasArg(ylim)) a <- ylim, because all arguments will get passed to plot, which will itself recognise and test for an argument called ylim within its own code. HTH -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: 16 September 2003 15:14 To: [EMAIL PROTECTED] Subject: [R] Retrieve ... argument values Security Warning: If you are not sure an attachment is safe to open please contact Andy on x234. There are 0 attachments with this message. Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim; plot(x, ...) } x <- rnorm(100) myfunc(x, ylim=c(-0.5, 0.5)) Error in myfunc(x, ylim = c(-0.5, 0.5)) : Object "ylim" not found I need to retrieve the values of ylim (if it is defined when the function is called) for later use in the function. Can anybody give me some hint? Thanks a lot. Huan This message and any attachments (the message) is\ intende...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help Simon Fear Senior Statistician Syne qua non Ltd Tel: +44 (0) 1379 69 Fax: +44 (0) 1379 65 email: [EMAIL PROTECTED] web: http://www.synequanon.com Number of attachments included with this message: 0 This message (and any associated files) is confidential and\...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
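The default-NULL pattern described above, as a runnable sketch (the range(x) fallback is an illustrative choice of "appropriate action", not from the original post):

```r
myfunc <- function(x, ylim = NULL) {
  if (is.null(ylim))
    ylim <- range(x)            # take appropriate action when not supplied
  plot(x, ylim = ylim)
  invisible(ylim)               # return the limits actually used
}

x <- rnorm(100)
myfunc(x)                       # default: full range of the data
myfunc(x, ylim = c(-0.5, 0.5))  # caller-supplied limits
```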
Re: [R] Retrieve ... argument values
[EMAIL PROTECTED] writes: Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim; plot(x, ...) } x <- rnorm(100) myfunc(x, ylim=c(-0.5, 0.5)) Error in myfunc(x, ylim = c(-0.5, 0.5)) : Object "ylim" not found I need to retrieve the values of ylim (if it is defined when the function is called) for later use in the function. Can anybody give me some hint? Yes, several: "ylim" %in% names(match.call(expand.dots=FALSE)$...) or "ylim" %in% names(list(...)) (use the former if it is somehow important not to evaluate the arguments). Or even a <- list(...)$ylim and then check for is.null(a). -- O__ Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
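The hints above, in runnable form (the helper names are illustrative; each probes the ... list for an argument called ylim):

```r
## unevaluated: inspect the call itself, without forcing the arguments
has_ylim_unevaluated <- function(...)
  "ylim" %in% names(match.call(expand.dots = FALSE)$...)

## evaluated: capture the dots as a list and look at its names
has_ylim_evaluated <- function(...)
  "ylim" %in% names(list(...))

## or just extract it: NULL when ylim is absent
get_ylim <- function(...) list(...)$ylim
```

Inside a real plotting wrapper one would call these idioms directly rather than through helpers; the helpers only make the three variants easy to compare.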
Re: R: [R] gam and concurvity
On Tuesday 16 September 2003 16:28, Vito Muggeo wrote: As someone (Simon Wood, for instance) could explain much better, and as is stressed in the help files of the mgcv package (the package including the gam() function), gam in R is not a clone of gam in S+. S+ uses backfitting while R uses penalized splines (see the references in the gam() help). The approaches are quite different and can lead to substantial differences in particular cases, for instance with concurvity. best, vito PS Can you point out the exact reference for Figueiras et al. (2003)? I haven't found a journal name, but the PDF download is http://isi-eh.usc.es/trabajos/110_70_fullpaper.pdf - Original Message - From: Martin Wegmann [EMAIL PROTECTED] To: R-list [EMAIL PROTECTED] Sent: Tuesday, September 16, 2003 3:47 PM Subject: [R] gam and concurvity Hello, in the paper "Avoiding the effects of concurvity in GAM's" of Figueiras et al. (2003) it is mentioned that in GLM collinearity is taken into account in the calculation of the standard errors but not in GAM (this results in confidence intervals that are too narrow and understated p-values; S-Plus version of GAM). I haven't found any references to GAM and concurvity or collinearity on the R page, and I wonder if the R version of gam differs on this point. Another question would be: what is the best manual way to do variable selection, given the lack of a stepwise procedure for GAM? Include the first variables, then add var1; if GCV improves (what would be considered an improvement?) or the p-value is significant, keep it, otherwise drop it; then add var2, and so on? thanks in advance, cheers Martin __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Retrieve ... argument values
Huan - Look at the function code for order(). To show the function definition, type just order at the command line (no quotes, no parentheses). This example is what I found most useful when I had a similar question. The green book is also useful. - tom blackwell - u michigan medical school - ann arbor - On Tue, 16 Sep 2003 [EMAIL PROTECTED] wrote: Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim; plot(x, ...) } I need to retrieve the values of ylim (if it is defined when the function is called) for later use in the function. Can anybody give me some hint? Thanks a lot. Huan __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Retrieve ... argument values
Try: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ...$ylim; plot(x, ...) } HTH, Andy -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 16, 2003 10:14 AM To: [EMAIL PROTECTED] Subject: [R] Retrieve ... argument values Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim; plot(x, ...) } x <- rnorm(100) myfunc(x, ylim=c(-0.5, 0.5)) Error in myfunc(x, ylim = c(-0.5, 0.5)) : Object "ylim" not found I need to retrieve the values of ylim (if it is defined when the function is called) for later use in the function. Can anybody give me some hint? Thanks a lot. Huan This message and any attachments (the message) is\ intende...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Notice: This e-mail message, together with any attachments,...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] ASA Stat. Computing and Stat. Graphics 2004 Student Paper competition
The Statistical Computing and Statistical Graphics Sections of the ASA are co-sponsoring a student paper competition on the topics of Statistical Computing and Statistical Graphics. Students are encouraged to submit a paper in one of these areas, which might be original methodological research, some novel computing or graphical application in statistics, or any other suitable contribution (for example, a software-related project). The selected winners will present their papers in a topic-contributed session at the 2004 Joint Statistical Meetings. The Sections will pay registration fees for the winners as well as a substantial allowance for transportation to the meetings and lodging. Enclosed below is the full text of the award announcement. More details can be found at the Stat. Computing Section website at http://www.statcomputing.org. Best Regards, --José Pinheiro Awards Chair ASA Statistical Computing Section Statistical Computing and Statistical Graphics Sections American Statistical Association Student Paper Competition 2004 The Statistical Computing and Statistical Graphics Sections of the ASA are co-sponsoring a student paper competition on the topics of Statistical Computing and Statistical Graphics. Students are encouraged to submit a paper in one of these areas, which might be original methodological research, some novel computing or graphical application in statistics, or any other suitable contribution (for example, a software-related project). The selected winners will present their papers in a topic-contributed session at the 2004 Joint Statistical Meetings. The Sections will pay registration fees for the winners as well as a substantial allowance for transportation to the meetings and lodging (which in most cases covers these expenses completely). Anyone who is a student (graduate or undergraduate) on or after September 1, 2003 is eligible to participate. 
An entry must include an abstract, a six page manuscript (including figures, tables and references), a C.V., and a letter from a faculty member familiar with the student's work. The applicant must be the first author of the paper. The faculty letter must include a verification of the applicant's student status and, in the case of joint authorship, should indicate what fraction of the contribution is attributable to the applicant. We prefer that electronic submissions of papers be in Postscript or PDF. All materials must be in English. All application materials MUST BE RECEIVED by 5:00 PM EST, Monday, January 5, 2004 at the address below. They will be reviewed by the Student Paper Competition Award committee of the Statistical Computing and Graphics Sections. The selection criteria used by the committee will include innovation and significance of the contribution. Award announcements will be made in late January, 2004. Additional important information on the competition can be accessed on the website of the Statistical Computing Section, www.statcomputing.org. A current pointer to the website is available from the ASA website at www.amstat.org. Inquiries and application materials should be emailed or mailed to: Student Paper Competition c/o Dr. José Pinheiro Biostatistics, Novartis Pharmaceuticals One Health Plaza, Room 419/2115 East Hanover, NJ 07936 [EMAIL PROTECTED] __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] Old libraries with new R?
Hi Folks, I'm currently installing R-1.7.1 off CRAN. As it happens, I have a CD (kindly made for me by Linux Emporium) containing all the libraries which were on CRAN early this year when I installed R-1.6.1. This is highly convenient, since the alternative would be several hours on-line. While a recent library will on installation announce the fact should it need a newer version of R than the one which is installed, presumably this is not likely to be the case for an old library if a newer version of R is incompatible with it. So is there a way of finding out whether a library dating from some time back is compatible with a recent R, other than simply trying it out to see if it works OK? With thanks, and best wishes to all, Ted. E-Mail: (Ted Harding) [EMAIL PROTECTED] Fax-to-email: +44 (0)870 167 1972 Date: 16-Sep-03 Time: 15:54:16 -- XFMail -- __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Retrieve ... argument values
Yes, and I was wrong to say ylim=NULL was more useful; I should have said much easier to understand and much easier to read, debug and maintain. Of course for certain applications it IS worth getting to grips with ..., and other people's posts have been extremely useful in that regard. -Original Message- From: Ben Bolker [mailto:[EMAIL PROTECTED] Sent: 16 September 2003 16:18 To: Simon Fear Cc: [EMAIL PROTECTED]; R help list Subject: RE: [R] Retrieve ... argument values Yes, although this becomes tedious if (e.g.) you have a function that calls two different functions, each of which has many arguments (e.g. plot() and barplot()); then you have to set up a whole lot of arguments that default to NULL and, more annoyingly, you have to document them all in any .Rd file you create -- rather than just having a ... argument which you can say should contain arguments for either of the subfunctions (as long as the arguments don't overlap, of course). Ben __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Retrieve ... argument values
Yes, although this becomes tedious if (e.g.) you have a function that calls two different functions, each of which has many arguments (e.g. plot() and barplot()); then you have to set up a whole lot of arguments that default to NULL and, more annoyingly, you have to document them all in any .Rd file you create -- rather than just having a ... argument which you can say should contain arguments for either of the subfunctions (as long as the arguments don't overlap, of course). Ben On Tue, 16 Sep 2003, Simon Fear wrote: For most purposes a more useful technique is to write the function with a default NULL argument, myfunc <- function(x, ylim = NULL), so that it can be called as myfunc(x) or myfunc(x, y). Inside the function you test for !is.null(ylim) and take appropriate action. Alternatively, and maybe more commonly, you give ylim a sensible default so the caller has to be explicit about setting ylim to NULL if required. As it happens, in your specific case you can write it as you did except for the line if (hasArg(ylim)) a <- ylim, because all arguments will get passed to plot, which will itself recognise and test for an argument called ylim within its own code. HTH -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: 16 September 2003 15:14 To: [EMAIL PROTECTED] Subject: [R] Retrieve ... argument values Dear R users, I want to retrieve ... argument values within a function. Here is a small example: myfunc <- function(x, ...) { if (hasArg(ylim)) a <- ylim; plot(x, ...) } x <- rnorm(100) myfunc(x, ylim=c(-0.5, 0.5)) Error in myfunc(x, ylim = c(-0.5, 0.5)) : Object "ylim" not found I need to retrieve the values of ylim (if it is defined when the function is called) for later use in the function. Can anybody give me some hint? Thanks a lot. 
Huan This message and any attachments (the message) is\ intende...{{dropped}} __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- 620B Bartram Hall [EMAIL PROTECTED] Zoology Department, University of Florida http://www.zoo.ufl.edu/bolker Box 118525 (ph) 352-392-5697 Gainesville, FL 32611-8525 (fax) 352-392-3704 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Old libraries with new R?
(Ted Harding) wrote: Hi Folks, I'm currently installing R-1.7.1 off CRAN. As it happens, I have a CD (kindly made for me by Linux Emporium) containing all the libraries which were on CRAN early this year when I installed R-1.6.1. This is highly convenient, since the alternative would be several hours on-line. While a recent library will on installation announce the fact should it need a newer version of R than the one which is installed, presumably this is not likely to be the case for an old library if a newer version of R is incompatible with it. So is there a way of finding out whether a library dating from some time back is compatible with a recent R, other than simply trying it out to see if it works OK? With thanks, and best wishes to all, Ted. There is a Depends field in a package's DESCRIPTION file. There the package author *might* give information on a minimal required R version. But this is not checked, and the author does not always know about such dependencies, because he/she is probably developing on recent versions. Uwe Ligges __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
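For concreteness, a hypothetical DESCRIPTION excerpt showing the kind of declaration meant here (the package name, version, and dependencies are invented):

```
Package: somepkg
Version: 0.1-1
Depends: R (>= 1.7.0), nlme
```

As noted above, at this time such a minimal-R-version declaration is advisory rather than enforced, and it says nothing about whether an old package will keep working under a newer R.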
[R] gnls( ) question
Last week (Wed 9/10/2003, "regression questions") I posted a question regarding the use of gnls() and its dissimilarity to the syntax that nls() will accept. No one replied, so I partly answered my own question by constructing indicator variables for use in gnls(). The code I used to construct the indicators is at the end of this email. I do have a nagging, unanswered question: what exactly does Warning message: Step halving factor reduced below minimum in NLS step in: gnls(model = y ~ 5 + ...) mean? I have tried to address this by specifying control = list(maxIter = 1000, pnlsMaxIter = 200, msMaxIter = 1000, tolerance = 1e-06, pnlsTol = 1e-04, msTol = 1e-07, minScale = 1e-10, returnObject = TRUE) in my model calls, but this does not entirely eliminate the problem (I am running gnls() 24 separate times on separate data sets). Much thanks in advance, david paul #Constructing Indicator Variables indicator <- paste("foo$X <- sapply(foo$subject.id, FUN = function(x) if (x == X) 1 else 0)") indicator <- parse(text = indicator)[[1]] subjectID.foo <- as.factor(as.character(unique(foo$animal.id))) for (i in subjectID.foo) { INDICATOR <- do.call("substitute", list(indicator, list(i = i, X = as.character(subjectID.foo[i])))) eval(INDICATOR) } foo$Overall.Effect <- rep(1, length(foo$dose.group)) __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] how to print a plot
Thank you all very much! I did forget to use dev.off(). Everything works great now. Many thanks. Weiming Zhang On Tue, 2003-09-16 at 00:38, Prof Brian Ripley wrote: You did use dev.off() to finish the plots before trying to look at them? The symptoms you report are what happens if you did not. There is no ps() function in R: the postscript device is postscript(), not ps(). If you want to print a plot, try dev.print(). If you want to copy to a file, try dev.copy2eps(). (You are on Linux, where EPS is more widely acceptable than PDF.) On 15 Sep 2003, Weiming Zhang wrote: Hi, Thank both of you. I tried everything. pdf(file="out.pdf") gave me a damaged pdf file. ps() did not print. ps("out.ps") gave me a ps file with a badly drawn graph that could not be printed. I am using RH Linux 7.2. Thanks again. Weiming Zhang On Mon, 2003-09-15 at 16:06, Jason Turner wrote: On Tue, 2003-09-16 at 08:56, Weiming Zhang wrote: Hi, I am using R-1.7.1 on Linux. I integrated XEmacs with R. Could anybody tell me how to print a plot? I used the plot function to make some graphs and then I wanted to print them or save them to files. But I could not find out how to do it. Have you tried: help(Devices) help(pdf) What I do: pdf(file="myplots.pdf") plot(...) dev.off() Use Acrobat or gv to view the pdf files. Postscript is also good, but not as universally understood; I have many colleagues who work in very standard Windows environments, where ghostscript is unknown. PDF is a very sensible choice for e-mailing graphs. -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz +64-(0)21-343-545 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
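The working recipe from this thread, in one self-contained sketch (the filename is illustrative):

```r
pdf(file = "myplots.pdf")  # open the file device *before* plotting
plot(rnorm(100))
dev.off()                  # close the device; skipping this step is what
                           # leaves the file incomplete ("damaged")
```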
[R] Question in Using sink function
Could anyone please explain to me why the following writes nothing into the all.Rout file? If the for loop is removed, the t.test output can be written into all.Rout. Thanks in advance. Minghua Yao .. zz <- file("all.Rout", open="wt") sink(zz) for(i in 1:n) { Cy3 <- X[, 2*i-1]; Cy5 <- X[, 2*i]; t.test(Cy3, Cy5) } sink() close(zz) .. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Question in Using sink function
Dear Minghua Yao, If you throw in a print() or two you'll get some output in your file. You could try print(t.test(Cy3, Cy5)) or whatever you actually want. Regards, Andrew C. Ward CAPE Centre Department of Chemical Engineering The University of Queensland Brisbane Qld 4072 Australia [EMAIL PROTECTED] Quoting Yao, Minghua [EMAIL PROTECTED]: Could anyone please explain to me why the following writes nothing into the all.Rout file? If the for loop is removed, the t.test output can be written into all.Rout. Thanks in advance. Minghua Yao .. zz <- file("all.Rout", open="wt") sink(zz) for(i in 1:n) { Cy3 <- X[, 2*i-1]; Cy5 <- X[, 2*i]; t.test(Cy3, Cy5) } sink() close(zz) .. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] Question in Using sink function
Autoprinting does not work inside a for() {} loop, and you did not print anything. Try for(i in 1:10) {i} Did you try your problem without sink()? On Tue, 16 Sep 2003, Yao, Minghua wrote: Could anyone please explain to me why the following writes nothing into the all.Rout file? If the for loop is removed, the t.test output can be written into all.Rout. Thanks in advance. Minghua Yao .. zz <- file("all.Rout", open="wt") sink(zz) for(i in 1:n) { Cy3 <- X[, 2*i-1]; Cy5 <- X[, 2*i]; t.test(Cy3, Cy5) } sink() close(zz) .. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Question in Using sink function
Thanks, Prof. Ripley. Right. I saw nothing, either, when I tried without the for loop. Is it mentioned anywhere in the documents that autoprinting does not work inside a for() {} loop? Minghua -Original Message- From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 16, 2003 11:35 AM To: Yao, Minghua Cc: R Help (E-mail) Subject: Re: [R] Question in Using sink function Autoprinting does not work inside a for() {} loop, and you did not print anything. Try for(i in 1:10) {i} Did you try your problem without sink()? On Tue, 16 Sep 2003, Yao, Minghua wrote: Could anyone please explain to me why the following writes nothing into the all.Rout file? If the for loop is removed, the t.test output can be written into all.Rout. Thanks in advance. Minghua Yao .. zz <- file("all.Rout", open="wt") sink(zz) for(i in 1:n) { Cy3 <- X[, 2*i-1]; Cy5 <- X[, 2*i]; t.test(Cy3, Cy5) } sink() close(zz) .. __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
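Putting this thread's advice together, a minimal corrected sketch (a toy matrix stands in for the poster's X):

```r
X <- matrix(rnorm(80), ncol = 4)  # toy data: two pairs of columns
n <- ncol(X) / 2

zz <- file("all.Rout", open = "wt")
sink(zz)
for (i in 1:n) {
  Cy3 <- X[, 2*i - 1]
  Cy5 <- X[, 2*i]
  print(t.test(Cy3, Cy5))  # explicit print(): autoprinting is off inside for()
}
sink()
close(zz)
```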
Re: [R] Old libraries with new R?
On Tue, 16 Sep 2003, Uwe Ligges wrote: (Ted Harding) wrote: Hi Folks, I'm currently installing R-1.7.1 off CRAN. As it happens, I have a CD (kindly made for me by Linux Emporium) containing all the libraries which were on CRAN early this year when I installed R-1.6.1. This is highly convenient, since the alternative would be several hours on-line. While a recent library will on installation announce the fact should it need a newer version of R than the one which is installed, presumably this is not likely to be the case for an old library if a newer version of R is incompatible with it. So is there a way of finding out whether a library dating from some time back is compatible with a recent R, other than simply trying it out to see if it works OK? With thanks, and best wishes to all, Ted. There is a Depends field in a package's DESCRIPTION file. There the package author *might* give information on a minimal required R version. But this is not checked, and the author does not always know about such dependencies, because he/she is probably developing on recent versions. I think Ted wants the reverse: will an old package source work with current R? That would need prescience beyond most package authors to know at the time the package was bundled. The best thing to do, I believe, is to first check if the version you have is the same as that on CRAN, and if not, try R CMD check on the old package. (If yes, you could check http://cran.r-project.org/src/contrib/checkSummary.html to see if it works on the CRAN machines.) -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
[R] can predict ignore rows with insufficient info
I need predict to ignore rows that contain levels not in the model. Consider a data frame, const, that has columns for the number of days required to construct a site and the city and state the site was constructed in. g <- lm(days ~ city, data = const) Some of the sites in const have not yet been completed, and therefore they have days == NA. I want to predict how many days these sites will take to complete (I've simplified the above discussion to remove many of the other factors involved.) nconst <- subset(const, is.na(const$days)) x <- predict(g, nconst) Error in model.frame.default(object, data, xlev = xlev) : factor city has new level(s) ALBANY This is because we haven't yet completed a site in Albany. If I just had one to worry about I could easily fix it (choose a nearby market with similar characteristics) but I am dealing with several hundred cities. Instead, for the cities not modeled by g I'd simply like to use the state, even though I don't expect it to be as good: g <- lm(days ~ state, data = const) x <- predict(g, nconst) I'm not sure how to identify the cities in nconst that are not modeled by g (my actual model has many more predictors in the formula). Is there a way to instruct predict to only predict the rows for which it has enough information and not complain about the others? g <- lm(days ~ city, data = const) x <- predict(g, nconst) ## the rows of x with city == ALBANY will be NA g <- lm(days ~ state, data = const) y <- predict(g, nconst) x[is.na(x)] <- y[is.na(x)] thanks, pete __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
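A sketch of the fallback idea sketched at the end of the post, on a toy data frame (column names follow the post, the data are invented). lm() drops the NA-days rows when fitting, so the city model never sees ALBANY, and its xlevels component identifies the cities it can handle:

```r
## toy stand-in for the poster's data: days is NA for unfinished sites
const <- data.frame(
  days  = c(10, 12, 11, 14, NA, NA),
  city  = c("A", "A", "B", "C", "B", "ALBANY"),
  state = c("NY", "NY", "NY", "TX", "NY", "NY"),
  stringsAsFactors = FALSE
)

g.city  <- lm(days ~ city,  data = const)    # NA-days rows dropped in fitting
g.state <- lm(days ~ state, data = const)
nconst  <- subset(const, is.na(days))

ok <- nconst$city %in% g.city$xlevels$city   # rows the city model covers
x  <- rep(NA_real_, nrow(nconst))
x[ok]  <- predict(g.city,  nconst[ok,  , drop = FALSE])
x[!ok] <- predict(g.state, nconst[!ok, , drop = FALSE])  # state fallback
```

With many factor predictors the `ok` test would be repeated per factor (or combined with `&` across factors), which is essentially the approach worked out later in this thread.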
[R] help(print) seems truncated
Dear r-help - I just noticed that in my R-1.7.1 on i386-pc-linux-gnu, the page displayed by help(print) ends with the line ## Printing of factors illustrated for ex and then no more. It looks as though something got truncated here. I think this is an R that I compiled from source off of CRAN, but I can't quite remember. - tom blackwell - u michigan medical school - ann arbor - __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Question in Using sink function
On Tue, 16 Sep 2003, Yao, Minghua wrote: Thanks, Prof. Ripley. Right. I saw nothing, either, when I tried without for loop. Does anywhere in the documents mention that Autoprinting does not work inside a for() {} loop? It is in `An Introduction to R', albeit in a rather sophisticated way, and of course in all good books on R/S. -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UKFax: +44 1865 272595 __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
RE: [R] Question in Using sink function
On Tue, 16 Sep 2003, Yao, Minghua wrote: Thanks, Prof. Ripley. Right. I saw nothing, either, when I tried without for loop. Does anywhere in the documents mention that Autoprinting does not work inside a for() {} loop? It's a FAQ. -thomas __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] help(print) seems truncated
Thomas W Blackwell wrote: Dear r-help - I just noticed that in my R-1.7.1 on i386-pc-linux-gnu, the page displayed by help(print) ends with the line ## Printing of factors illustrated for ex and then no more. It looks as though something got truncated here. I think this is an R that I compiled from source off of CRAN, but I can't quite remember. - tom blackwell - u michigan medical school - ann arbor - It's still in the R-1.8.0 alpha sources from yesterday and was introduced between R-1.5.1 and R-1.6.2. It might be fixed before this message comes through, hence this is not sent as a bug report ... Uwe Ligges __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] path analysis
There is a library sem for structural equation models. Best, Christian Hennig On Mon, 15 Sep 2003, Catherine Stein wrote: Can anyone help me find an R script that does path analysis with family data (like a Beta model)? A script that takes the variance-covariance matrix in as input would be ideal. Thanks! Please email me with any ideas! Cathy Stein [EMAIL PROTECTED] -- *** Christian Hennig Seminar fuer Statistik, ETH-Zentrum (LEO), CH-8092 Zuerich (currently) and Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg [EMAIL PROTECTED], http://stat.ethz.ch/~hennig/ [EMAIL PROTECTED], http://www.math.uni-hamburg.de/home/hennig/ ### I recommend www.boag-online.de
Re: [R] can predict ignore rows with insufficient info
On Tue, Sep 16, 2003 at 11:44:02AM -0500, Peter Whiting wrote: I'm not sure how to identify the cities in nconst that are not modeled by g (my actual model has many more predictors in the formula). I guess I could use some form of subset(const, const$city %in% g$xlevels$city) over and over again for each factor... as usual, there has to be a better way. pete

Is there a way to instruct predict to only predict the rows for which it has enough information and not complain about the others?

g <- lm(days ~ city, data=const)
x <- predict(g, nconst)  ## the rows of x with city == "ALBANY" will be NA
g <- lm(days ~ state, data=const)
y <- predict(g, nconst)
x[is.na(x)] <- y[is.na(x)]

thanks, pete
Re: [R] can predict ignore rows with insufficient info
Peter - Your subsequent email seems just right. You have to determine ahead of time which rows can be estimated. Here's a strategy, and possibly some code to implement it. Let supported(i,y,d) be a user-written function which returns a logical vector indicating rows which should be omitted from the prediction on account of a non-covered covariate in column i of data frame d with outcome variable y. Apply this function to all columns in your data frame using lapply(). Then take the OR of all the logical vectors by calculating the row sums of the numeric (0 or 1) equivalents. Last, convert back to logical, and subscript your data frame with this in the call to predict(). Here's some rough code:

supported <- function(i, y, d) {
  result <- rep(F, dim(d)[1])  # default return value when
  if (is.factor(d[[i]]))       # d[[i]] is not a factor
    result <- d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ])
  result
}
tmp.1 <- lapply(seq(along=const), supported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
x <- predict(g, const[ is.na(const$days) & !tmp.3, ])

This code uses a few arcane maneuvers. Look at help pages for the relevant functions to dope out what it is doing. Particularly for lapply(), seq(), rep(), unlist(), unique(), %*%, %in%. (The last two must be quoted in order to see the help). However, the code might work for you right out of the box ! - tom blackwell - u michigan medical school - ann arbor - On Tue, 16 Sep 2003, Peter Whiting wrote: I need predict to ignore rows that contain levels not in the model. Consider a data frame, const, that has columns for the number of days required to construct a site and the city and state the site was constructed in.

g <- lm(days ~ city, data=const)

Some of the sites in const have not yet been completed, and therefore they have days == NA.
I want to predict how many days these sites will take to complete (I've simplified the above discussion to remove many of the other factors involved.)

nconst <- subset(const, is.na(const$days))
x <- predict(g, nconst)
Error in model.frame.default(object, data, xlev = xlev) :
        factor city has new level(s) ALBANY

This is because we haven't yet completed a site in Albany. If I just had one to worry about I could easily fix it (choose a nearby market with similar characteristics) but I am dealing with several hundred cities. Instead, for the cities not modeled by g I'd simply like to use the state, even though I don't expect it to be as good:

g <- lm(days ~ state, data=const)
x <- predict(g, nconst)

I'm not sure how to identify the cities in nconst that are not modeled by g (my actual model has many more predictors in the formula). Is there a way to instruct predict to only predict the rows for which it has enough information and not complain about the others?

g <- lm(days ~ city, data=const)
x <- predict(g, nconst)  ## the rows of x with city == "ALBANY" will be NA
g <- lm(days ~ state, data=const)
y <- predict(g, nconst)
x[is.na(x)] <- y[is.na(x)]

thanks, pete
Re: [R] can predict ignore rows with insufficient info
Peter - Error !! I forgot a not in the third line inside the function supported(). And, my mail editor doesn't balance parentheses, so I don't guarantee that my code is even syntactically correct. Corrected and re-named version of function:

unsupported <- function(i, y, d) {
  result <- rep(F, dim(d)[1])  # default return value when
  if (is.factor(d[[i]]))       # d[[i]] is not a factor
    result <- !(d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ]))
  result
}
tmp.1 <- lapply(seq(along=const), unsupported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
x <- predict(g, const[ is.na(const$days) & !tmp.3, ])

- tom blackwell - u michigan medical school - ann arbor -
[R] Re: Number of R users
I have been asked again about the numbers of R users. The following is part of my answer, probably of some interest to some. In preparing some notes, I'd like to give some approximate baseline estimate of how many people are using R nowadays. Of course a very interesting question. It came up on R-help in June 2000, with a small heated debate : start at --- http://www.r-project.org/nocvs/mail/r-help/2000/1493.html but first, read on. perhaps the size of the R-help list would be a decent starting point. Any chance you could give me the approximate number of users? Well, as with practical statistics, at first it's trivial, but if you start thinking it becomes quite interesting.. At the moment,
 o R-help has 2005 unique e-mail addresses subscribed
 o All R-lists have 2659 for R-* alone, i.e. w/o bioconductor
 o ALL R-lists have 3189 unique (all R-* lists + bioconductor combined, then uniqued) addresses,
But from the mailman logs, for R-help e.g., this noon,
 1023 got r-help directly
 780 got r-help as digest
which leaves about 200 (~ 10%) who seem to have mail delivery disabled for some reason {explicitly, by bouncing, delivery not disabled but not successful on first try, ..?..} Then I also guess (from the address) that some groups deliver R-help to an `internal mailing list' ((something we pretty strongly discourage, particularly since it complicates unsubscription)). Now you should probably read the R-help discussion thread from two years ago (URL above). Quite interesting. People's guesses then varied wildly, from about 10'000 to 400'000 -- based on about a third of the current number of mailing-list subscribers. The multiplication factor `f' in R_users = f * R-help_readers was conservatively estimated in the range of 10-20 (rather the latter). This would lead to a guess of about 50'000 users (with a wildly estimated [logarithmic] standard error of a factor of 2). I think most would agree that this would *not* count students who only use R during their classes.
(Please, before you comment on this, do read the June 2000 thread ..) -- Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/ Seminar fuer Statistik, ETH-Zentrum LEO C16, Leonhardstr. 27 ETH (Federal Inst. Technology) 8092 Zurich SWITZERLAND phone: x-41-1-632-3408 fax: ...-1228
Re: [R] can predict ignore rows with insufficient info
On Tue, Sep 16, 2003 at 04:17:59PM -0400, Thomas W Blackwell wrote: Peter - Your subsequent email seems just right. You have to determine ahead of time which rows can be estimated.

It seems that predict removes rows with insufficient information (ie, if I replace ALBANY with NA and refactor everything works) - I wonder why it doesn't exhibit the same behavior when it encounters a new level - just eliminate the row and go on... Somewhat related: I had been assuming (incorrectly) that length(x) would equal length(const$days) after x <- predict(g, const) - this isn't the case if any of the rows of const don't contain enough info for the model. Those rows are eliminated - I'd have expected them to just be NAs in the result. I'll go back and look through the documents to see if there is a straightforward way to convert:

> x
  1   3   4
1.5 1.5 1.5

to

> x
  1   2   3   4   5
1.5  NA 1.5 1.5  NA

slowly learning, pete

Here's a strategy, and possibly some code to implement it. Let supported(i,y,d) be a user-written function which returns a logical vector indicating rows which should be omitted from the prediction on account of a non-covered covariate in column i of data frame d with outcome variable y. Apply this function to all columns in your data frame using lapply(). Then take the OR of all the logical vectors by calculating the row sums of the numeric (0 or 1) equivalents. Last, convert back to logical, and subscript your data frame with this in the call to predict(). Here's some rough code:

supported <- function(i, y, d) {
  result <- rep(F, dim(d)[1])  # default return value when
  if (is.factor(d[[i]]))       # d[[i]] is not a factor
    result <- d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ])
  result
}
tmp.1 <- lapply(seq(along=const), supported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
x <- predict(g, const[ is.na(const$days) & !tmp.3, ])

This code uses a few arcane maneuvers.
Look at help pages for the relevant functions to dope out what it is doing. Particularly for lapply(), seq(), rep(), unlist(), unique(), %*%, %in%. (The last two must be quoted in order to see the help). However, the code might work for you right out of the box ! - tom blackwell - u michigan medical school - ann arbor - On Tue, 16 Sep 2003, Peter Whiting wrote: I need predict to ignore rows that contain levels not in the model. Consider a data frame, const, that has columns for the number of days required to construct a site and the city and state the site was constructed in.

g <- lm(days ~ city, data=const)

Some of the sites in const have not yet been completed, and therefore they have days == NA. I want to predict how many days these sites will take to complete (I've simplified the above discussion to remove many of the other factors involved.)

nconst <- subset(const, is.na(const$days))
x <- predict(g, nconst)
Error in model.frame.default(object, data, xlev = xlev) :
        factor city has new level(s) ALBANY

This is because we haven't yet completed a site in Albany. If I just had one to worry about I could easily fix it (choose a nearby market with similar characteristics) but I am dealing with several hundred cities. Instead, for the cities not modeled by g I'd simply like to use the state, even though I don't expect it to be as good:

g <- lm(days ~ state, data=const)
x <- predict(g, nconst)

I'm not sure how to identify the cities in nconst that are not modeled by g (my actual model has many more predictors in the formula). Is there a way to instruct predict to only predict the rows for which it has enough information and not complain about the others?

g <- lm(days ~ city, data=const)
x <- predict(g, nconst)  ## the rows of x with city == "ALBANY" will be NA
g <- lm(days ~ state, data=const)
y <- predict(g, nconst)
x[is.na(x)] <- y[is.na(x)]

thanks, pete
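For the conversion Pete asks about (a short vector named by surviving row numbers, back to a full-length vector with NAs), one sketch not given in the thread is to use the names of the returned vector as indices into a vector of NAs:

```r
x <- c("1" = 1.5, "3" = 1.5, "4" = 1.5)  # predictions, named by surviving row
n <- 5                                   # number of rows in the original data

full <- rep(NA, n)
full[as.integer(names(x))] <- x
names(full) <- 1:n
full
##   1   2   3   4   5
## 1.5  NA 1.5 1.5  NA
```

This relies only on predict() keeping the original row names on its result, which it does for data-frame newdata.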
Re: [R] Re: Number of R users
MM == Martin Maechler [EMAIL PROTECTED] on Tue, 16 Sep 2003 22:45:24 +0200 writes: ^ (too late in the evening !) .. MM Well, as with practical statistics, at first it's trivial, but MM if you start thinking it becomes quite interesting.. MM At the moment, MM o R-help has 2005 unique e-mail addresses subscribed MM o All R-lists have 2659 for R-* alone, i.e. w/o bioconductor MM o ALL R-lists have 3189 unique (all R-* lists + bioconductor MM combined, then uniqued) addresses, As Jeff Gentry has noted (from the size of bioconductor) this seems pretty (too!) astonishing. I have checked, and from the 530 bioconductor subscribers, 112 are on R-help as well. The bug in the above counting: I got the last number manually -- with a mistake -- where the 2659 comes from a reliable perl script. 3189 must be corrected down to 3055. (as it says Never trust a statistic, unless :-) okay, definitely getting late today..) Martin __ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Re: [R] can predict ignore rows with insufficient info
On Tue, 16 Sep 2003, Peter Whiting wrote: It seems that predict removes rows with insufficient information (ie, if I replace ALBANY with NA and refactor everything works) - I wonder why it doesn't exhibit the same behavior when it encounters a new level - just eliminate the row and go on... Somewhat related: I had been assuming (incorrectly) that length(x) would equal length(const$days) after x <- predict(g, const) - this isn't the case if any of the rows of const don't contain enough info for the model. Those rows are eliminated - I'd have expected them to just be NAs in the result. I'll go back and look through the documents to see if there is a straightforward way to convert:

> x
  1   3   4
1.5 1.5 1.5

to

> x
  1   2   3   4   5
1.5  NA 1.5 1.5  NA

slowly learning, pete

Before running predict(...), do options(na.action=na.exclude). This will give the equal-length behavior that you may want ... as long as you have replaced unsupported factor levels with NA. See help(na.omit) and help(options) to see what this is doing. (It won't have any effect, of course, if you subscript the newdata argument to predict() using my strategy.) And, DO use a simple strategy that you cooked up yourself, in preference to anything canned. It's much easier to maintain. - tom blackwell - u michigan medical school - ann arbor -
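The effect Tom describes can be sketched with toy data (not from the thread): under na.exclude, rows dropped during fitting are padded back into the results as NA, so lengths line up with the original data.

```r
d <- data.frame(y = c(1, 2, NA, 4), x = 1:4)

options(na.action = na.exclude)
g <- lm(y ~ x, data = d)

fitted(g)   # length 4: element 3 is NA rather than being dropped
```

Under the default na.omit, fitted(g) would instead have length 3, which is exactly the surprise Pete ran into.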
Re: [R] can predict ignore rows with insufficient info
On Tue, Sep 16, 2003 at 04:31:29PM -0400, Thomas W Blackwell wrote: Peter - Error !! I forgot a not in the third line inside the function supported(). And, my mail editor doesn't balance parentheses, so I don't guarantee that my code is even syntactically correct. Corrected and re-named version of function:

unsupported <- function(i, y, d) {
  result <- rep(F, dim(d)[1])  # default return value when
  if (is.factor(d[[i]]))       # d[[i]] is not a factor
    result <- !(d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ]))
  result
}
tmp.1 <- lapply(seq(along=const), unsupported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
x <- predict(g, const[ is.na(const$days) & !tmp.3, ])

this still suffers from the fact that the factor for city still has ALBANY in it (even though it doesn't occur in the subset). It can be fixed by creating yet another tmp variable and refactoring... Kinda painful with multiple predictors in addition to city, but it is workable.

> const
  state city days
1    s1   c1    1
2    s1   c1   NA
3    s2   c2    1
4    s2   c2    1
5    s1   c3   NA

tmp.1 <- lapply(seq(along=const), unsupported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
x <- predict(g, const[ is.na(const$days) & !tmp.3, ])
Error in model.frame.default(object, data, xlev = xlev) :
        factor city has new level(s) c3
tmp.4 <- subset(const, is.na(const$days) & !tmp.3)
x <- predict(g, tmp.4)
Error in model.frame.default(object, data, xlev = xlev) :
        factor city has new level(s) c3
tmp.4$city <- factor(tmp.4$city)
x <- predict(g, tmp.4)

pete
Re: [R] can predict ignore rows with insufficient info
On Tue, Sep 16, 2003 at 04:31:29PM -0400, Thomas W Blackwell wrote: Corrected and re-named version of function:

unsupported <- function(i, y, d) {
  result <- rep(F, dim(d)[1])  # default return value when
  if (is.factor(d[[i]]))       # d[[i]] is not a factor
    result <- !(d[[i]] %in% unique(d[[i]][ !is.na(d[[y]]) ]))
  result
}
tmp.1 <- lapply(seq(along=const), unsupported, "days", const)
tmp.2 <- matrix(unlist(tmp.1[ names(const) != "days" ]), nrow=dim(const)[1])
tmp.3 <- as.logical(as.vector(tmp.2 %*% rep(1, dim(tmp.2)[2])))
x <- predict(g, const[ is.na(const$days) & !tmp.3, ])

Here is an approach I came up with that appears to work:

predict2 <- function(g, data, ...) {
  for (nm in names(g$xlevels)) {
    cat(paste(nm, "\n"))
    data[[nm]] <- factor(data[[nm]], levels=g$xlevels[[nm]])
  }
  predict(g, data, ...)
}

It bases its operation on refactoring each predictor using the factor's levels= argument. Any element having a level not in g$xlevels ends up as NA, which predict correctly handles. I'm not sure why predict doesn't do something like this by default, but I am just a newbie. pete
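The refactoring trick predict2() relies on can be seen in isolation (a minimal sketch with made-up level names): supplying levels= to factor() turns any value not in that list into NA.

```r
city  <- c("c1", "c2", "c3")   # c3 was never seen by the model
known <- c("c1", "c2")         # e.g. g$xlevels$city

factor(city, levels = known)
## [1] c1   c2   <NA>
## Levels: c1 c2
```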
[R] Quit asking me if I want to save the workspace!
How do you stop R from putting up a dialog box when you quit Rgui? (I use Windows and I never save workspaces that way) Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED] Fax 7 838 4155 Phone +64 7 838 4773 (wk), +64 7 849 6486 (home), Mobile 021 1395 862
Re: [R] Quit asking me if I want to save the workspace!
Rafael A. Irizarry wrote: you can type this: q("no") see the help file for q Still more work than two mouse clicks. -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED] Fax 7 838 4155 Phone +64 7 838 4773 (wk), +64 7 849 6486 (home), Mobile 021 1395 862
Re: [R] Quit asking me if I want to save the workspace!
you can type this: q("no") see the help file for q On Wed, 17 Sep 2003, Murray Jorgensen wrote: How do you stop R from putting up a dialog box when you quit Rgui? (I use Windows and I never save workspaces that way) Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED] Fax 7 838 4155 Phone +64 7 838 4773 (wk), +64 7 849 6486 (home), Mobile 021 1395 862
Re: [R] Quit asking me if I want to save the workspace!
Consider Q <- function(x) q("no") With R 1.7.1 under Windows, Q() caused R to close without asking for confirmation. This does not solve the whole problem, but it might provide a piece of the puzzle. hope this helps. spencer graves Rafael A. Irizarry wrote: you can type this: q("no") see the help file for q On Wed, 17 Sep 2003, Murray Jorgensen wrote: How do you stop R from putting up a dialog box when you quit Rgui? (I use Windows and I never save workspaces that way) Murray -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED] Fax 7 838 4155 Phone +64 7 838 4773 (wk), +64 7 849 6486 (home), Mobile 021 1395 862
Re: [R] Quit asking me if I want to save the workspace!
On Tuesday 16 September 2003 21:26, Murray Jorgensen wrote: Rafael A. Irizarry wrote: you can type this: q("no") see the help file for q Still more work than two mouse clicks. Start R with --no-save (not sure how/whether this will work on Windows).
[R] Date on x-axis of xyplot
xyplot doesn't seem to want to label my x-axis with dates but instead puts the day number for each date. begdate is the number of days since January 1, 1960 and was initially created by

library(date)
...
polls$begdate <- mdy.date(begmm, begdd, begyy)

I create a new dataframe (pollstack) which includes begdate. In the process begdate seems to lose its date attribute so I redo it as:

pollstack$begdate <- as.date(pollstack$begdate)

after which

attach(pollstack)
summary(pollstack)
     begdate             pct              names
 First :15Nov2002   Min.   : 0.000   Clark   : 54
 Last  :10Sep2003   1st Qu.: 2.000   Dean    : 54
                    Median : 5.000   Edwards : 54
                    Mean   : 6.991   Gephardt: 54
                    3rd Qu.:12.000   Graham  : 54
                    Max.   :29.000   Kerry   : 54
                                     (Other) :216

And all seems well. But xyplot continues to use day number on the x-axis. My plots are created by

print(xyplot(pct ~ begdate | names, pch=2, cex=.2,
      prepanel = function(x, y) prepanel.loess(x, y, span = 1),
      main="2004 Democratic Primary Race",
      xlab = "Date of Survey", ylab = "Percent Support",
      panel = function(x, y) {
        panel.grid(h=-1, v=-1)
        panel.xyplot(x, y, pch=1, col=2, cex=.7)
        panel.loess(x, y, span=.65, lwd=2, col=4)
      },
))

What am I missing? Thanks! Charles /** ** Charles H. Franklin ** Professor, Political Science ** University of Wisconsin, Madison ** 1050 Bascom Mall ** Madison, WI 53706 ** 608-263-2022 Office ** 608-265-2663 Fax ** mailto:[EMAIL PROTECTED] (best) ** mailto:[EMAIL PROTECTED] (alt) ** http://www.polisci.wisc.edu/~franklin **/
Re: [R] Quit asking me if I want to save the workspace!
On Wed, 2003-09-17 at 14:26, Murray Jorgensen wrote: Rafael A. Irizarry wrote: you can type this: q("no") see the help file for q Still more work than two mouse clicks. Two clicks! How awful! ;) Actually, it bugs me too, so my desktop shortcut (under Win XP) has this for Target. ### "C:\Program Files\R\rw1071\bin\Rgui.exe" --no-save ### (my mail client might've line-wrapped that by the time you see it. Everything between the ### marks is one line. *Include* the quotes. There is a space between Rgui.exe and --no-save) See Appendix B of An Introduction to R if you need more info. Hope that helps. Jason -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz +64-(0)21-343-545
Re: [R] Quit asking me if I want to save the workspace!
In a message dated 9/16/03 7:20:08 PM Pacific Daylight Time, [EMAIL PROTECTED] writes: How do you stop R from putting up a dialog box when you quit Rgui? (I use Windows and I never save workspaces that way) On Windows-98, an easy solution that works: * Right-click the icon with which you start R. * Go to Properties -- Shortcut -- Target. * In Target add --no-save (I tried without quotation marks). * Click OK, and try. Hope it works if you are using something other than Windows-98. --Anupam.
Re: [R] Quit asking me if I want to save the workspace!
Ah! now that tells me what I want to know. I was trying to type C:\Program Files\R\rw1071\bin\Rgui.exe --no-save instead of "C:\Program Files\R\rw1071\bin\Rgui.exe" --no-save into the Target box. Silly me! Jason Turner wrote: On Wed, 2003-09-17 at 14:26, Murray Jorgensen wrote: Rafael A. Irizarry wrote: you can type this: q("no") see the help file for q Still more work than two mouse clicks. Two clicks! How awful! ;) Actually, it bugs me too, so my desktop shortcut (under Win XP) has this for Target. ### "C:\Program Files\R\rw1071\bin\Rgui.exe" --no-save ### (my mail client might've line-wrapped that by the time you see it. Everything between the ### marks is one line. *Include* the quotes. There is a space between Rgui.exe and --no-save) See Appendix B of An Introduction to R if you need more info. Hope that helps. Jason -- Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html Department of Statistics, University of Waikato, Hamilton, New Zealand Email: [EMAIL PROTECTED] Fax 7 838 4155 Phone +64 7 838 4773 (wk), +64 7 849 6486 (home), Mobile 021 1395 862
[R] Help with glmmML package
Dear R users, I have been using the package glmmML to fit a logistic-normal mixed model to clustered binary data. Along with parameter estimates I would also like to obtain estimates of the random effects. I have noticed that a fitted glmmML object contains a component called frail, a vector, which looks to be an estimate of the random effects. Can anyone confirm this? And if so, how are these estimates obtained from the fitted model? Are they the empirical Bayes estimates? Any reference would also be great. Thanks very much for your help. Farouk
Re: [R] Date on x-axis of xyplot
On Tuesday 16 September 2003 22:00, Charles H. Franklin wrote: xyplot doesn't seem to want to label my x-axis with dates but instead puts the day number for each date. begdate is the number of days since January 1, 1960 and was initially created by

library(date)
...
polls$begdate <- mdy.date(begmm, begdd, begyy)

I create a new dataframe (pollstack) which includes begdate. In the process begdate seems to lose its date attribute so I redo it as:

pollstack$begdate <- as.date(pollstack$begdate)

after which

attach(pollstack)
summary(pollstack)
     begdate             pct              names
 First :15Nov2002   Min.   : 0.000   Clark   : 54
 Last  :10Sep2003   1st Qu.: 2.000   Dean    : 54
                    Median : 5.000   Edwards : 54
                    Mean   : 6.991   Gephardt: 54
                    3rd Qu.:12.000   Graham  : 54
                    Max.   :29.000   Kerry   : 54
                                     (Other) :216

And all seems well. But xyplot continues to use day number on the x-axis. My plots are created by

print(xyplot(pct ~ begdate | names, pch=2, cex=.2,
      prepanel = function(x, y) prepanel.loess(x, y, span = 1),
      main="2004 Democratic Primary Race",
      xlab = "Date of Survey", ylab = "Percent Support",
      panel = function(x, y) {
        panel.grid(h=-1, v=-1)
        panel.xyplot(x, y, pch=1, col=2, cex=.7)
        panel.loess(x, y, span=.65, lwd=2, col=4)
      },
))

What am I missing?

The fact that xyplot doesn't know anything about the date class. I'm not familiar with the date package, but the docs and a few experiments seem to indicate that an object of class date is simply a numeric/integer vector with the class attribute set to "date". xyplot interprets it as plain numeric data. You may be able to get what you want by

print(xyplot(pct ~ factor(as.character(begdate)) | names, pch=2, cex=.2,
      prepanel = function(x, y) prepanel.loess(x, y, span = 1),
      ...

(but this will try to label all unique dates, which may not be good). Is the date class standard enough to warrant including a check for it in lattice ? Deepayan
Re: [R] Date on x-axis of xyplot
On Wed, 2003-09-17 at 16:31, Deepayan Sarkar wrote: ... Is the date class standard enough to warrant including a check for it in lattice ? I've never used it myself, but the lack of POSIXct support in the lattice graphics axes has often caused me to think up new ways around the plot. Unless I'm missing an obvious way to apply that... Cheers Jason -- Indigo Industrial Controls Ltd. http://www.indigoindustrial.co.nz +64-(0)21-343-545
RE: [R] Date on x-axis of xyplot
On Tuesday 16 September 2003 22:00, Charles H. Franklin wrote: xyplot doesn't seem to want to label my x-axis with dates but instead puts the day number for each date. ... What am I missing? Deepayan Sarkar replies: The fact that xyplot doesn't know anything about the date class. I'm not familiar with the date package, but the docs and a few experiments seem to indicate that an object of class date is simply a numeric/integer vector with the class attribute set to "date". xyplot interprets it as plain numeric data. You may be able to get what you want by

print(xyplot(pct ~ factor(as.character(begdate)) | names, pch=2, cex=.2,
      prepanel = function(x, y) prepanel.loess(x, y, span = 1),
      ...

(but this will try to label all unique dates, which may not be good). Is the date class standard enough to warrant including a check for it in lattice ? Deepayan

OK. I was afraid of that. I'm not sure how standard the date class is, or whether there is a better alternative. But I DO think that being able to label dates in some way on the x-axis is a common enough problem to be worth solving. This is especially an issue when data are irregularly spaced, so time series plots are not appropriate. As it stands, my graphs are labeled 15650 to 15950, which is surely not intuitive to anyone! Many thanks to Deepayan for the Lattice package and his support of it. It is so great that I just want one more thing...! Charles
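One workaround along the lines Charles wants, not given in the thread, is to compute tick positions from the underlying day counts and hand xyplot pre-formatted labels through the scales argument. This is a sketch; it assumes the date package's date.ddmmmyy() formatter and that begdate is a "date" object (days since 1 Jan 1960 underneath):

```r
library(lattice)
library(date)

## choose a few tick positions on the numeric (day-count) scale
ticks <- pretty(range(as.numeric(pollstack$begdate)))

print(xyplot(pct ~ begdate | names, data = pollstack,
             scales = list(x = list(at = ticks,
                                    labels = date.ddmmmyy(as.date(ticks))))))
```

This keeps the x variable numeric (so panel functions like panel.loess still work) and only changes how the axis is labeled.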
Re: [R] Date on x-axis of xyplot
On Tuesday 16 September 2003 23:51, Jason Turner wrote: On Wed, 2003-09-17 at 16:31, Deepayan Sarkar wrote: ... Is the date class standard enough to warrant including a check for it in lattice ? I've never used it myself, but the lack of POSIXct support in the lattice graphics axes has often caused me to think up new ways around the plot. Unless I'm missing an obvious way to apply that... Actually, lattice has supported POSIXct for some time now, although the quality of that support (in terms of control over tick locations and labels) is not very good. But it might not be too difficult to add support for date objects as well. Deepayan