Re: [R] memory problem

2007-01-07 Thread Uwe Ligges


Zoltan Kmetty wrote:
 Hi!
 
 I had some memory problem with R - hope somebody could tell me a solution.
 
 I work with very large datasets, but R cannot allocate enough memoty to
 handle these datasets.
 
 I want work a matrix with row= 100 000 000 and column=10
 
 A know this is 1 milliard cases, but i thought R could handle it (other
 commercial software like spss could do), but R wrote out everytime: not
 enough memory..
 
 any good idea?


Buy a machine that has at least 8Gb (better 16Gb) of RAM and proceed ...

Uwe Ligges



 Thanks, Zoltan
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using VGAM's vglm function for ordinal logistic regression

2007-01-07 Thread Prof Brian Ripley
On Sat, 6 Jan 2007, Inman, Brant A.   M.D. wrote:


 R-Experts:

 I am using the vglm function of the VGAM library to perform proportional
 odds ordinal logistic regression.  The issue that I would like help with
 concerns the format in which the response variable must be provided for
 this function to work correctly.

 pneumo2$severity
  [1] normal normal normal normal normal normal normal normal mild   mild
[11] mild   mild   mild   mild   mild   mild   severe severe severe severe
[21] severe severe severe severe
Levels: mild  normal  severe

is different from the ordering in the first example.

The difference between vglm and polr is that the latter uses the 
conventional sign for the coefficients: see MASS4 p.204.

I would never use as.ordered on a character vector, as this leaves it to R 
to choose the ordering.  (Even if you think you intended alphabetical 
order, that depends on the locale: see the warnings on the help page.)


 Consider the following example:

 --

 library(VGAM)
 library(MASS)

 attach(pneumo)
 pneumo# Inspect the format of the original dataset

 # Restructure the pneumo dataset into a different format
 pneumo2 - data.frame(matrix(ncol=3, nrow=24))
 colnames(pneumo2) - c('exposure.time', 'severity', 'freq')
 pneumo2[,1] - rep(pneumo[,1],3)
 pneumo2[,2] -
 as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8)))
 pneumo2[1:8,3]   - pneumo[,2]
 pneumo2[9:16,3]  - pneumo[,3]
 pneumo2[17:24,3] - pneumo[,4]
 pneumo2   # Inspect the format of the new modified dataset

 --

 The problem occurs when I try to analyze these two datasets, which are
 identical in content, with the proportional odds model using vglm:

 --

 # Analyze the original dataset with vglm, get one set of results

 vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time),
 data=pneumo,
 +  family=cumulative(parallel=T))

 Coefficients:
 (Intercept):1  (Intercept):2 log(exposure.time)
  9.676093  10.581725  -2.596807

 Degrees of Freedom: 16 Total; 13 Residual
 Residual Deviance: 5.026826
 Log-likelihood: -204.2742

 # Analyzing the modified dataset with vglm gives another set of results

 vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2,
 + family=cumulative(parallel=T))

 Coefficients:
 (Intercept):1  (Intercept):2 log(exposure.time)
-1.6306499  2.5630170 -0.1937956

 Degrees of Freedom: 48 Total; 45 Residual
 Residual Deviance: 503.7251
 Log-likelihood: -251.8626
 Warning messages:
 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)

 # Analyzing the modified dataset with polr reproduces these second
 results

 polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)

 Coefficients:
 log(exposure.time)
 0.1938154

 Intercepts:
  mild|normal normal|severe
-1.630612  2.563055

 Residual Deviance: 503.7251
 AIC: 509.7251

 --

 Can someone explain what I am doing wrong when using vglm and polr with
 the modified dataset?  I do not understand why these two formulations
 should give different results.

 Brant Inman

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] export many plots to one file

2007-01-07 Thread Jonne Zutt
Another thing to think about is about each individual scatter plot.
Aren't you plotting too much duplicate (x,y) values?
You could try to plot only unique values, or even try to filter out 
points that are very close.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Mark Wardle
Dear all,

The as.Date() function appears to give different results depending on
the order of the vector passed into it.

d1 = c(1900-01-01, 2007-01-01,,2001-05-03)
d2 = c(, 1900-01-01, 2007-01-01,2001-05-03)
as.Date(d1) # gives correct results
as.Date(d2) # fails with error (* see below)

This problem does not arise if the dates are NA rather than an empty
string, but my data is coming via RODBC and I still don't have NAs
passed across properly.

I might add that I initially noticed this behaviour when using RODBC's
sqlQuery() function call, and I initially had difficulty explaining why
one column of dates was passed correctly, but another failed. The
failing column was a date of death column where it was NA () for
most patients.

I've come up with two workarounds that work. The first is to sort the
data at the SQL level, ensuring the initial record is not null. The
second is to use sqlQuery() with as.is=T option, and then do the sorting
and conversion afterwards.

Is the behaviour of as.Date() shown above as expected/designed?

Many thanks,

Mark


(*) Error in fromchar(x) : character string is not in a standard
unambiguous format

sessionInfo():
R version 2.4.0 (2006-10-03) powerpc-apple-darwin8.7.0 locale:
C/en_GB.UTF-8/C/C/C/C
attached base packages:
[1] methods   stats graphics  grDevices utils
datasets base

other attached packages:
rcompletion   RODBC
   0.0-12 1.1-7

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Gavin Simpson
On Sun, 2007-01-07 at 12:01 +, Mark Wardle wrote:
 Dear all,
 
 The as.Date() function appears to give different results depending on
 the order of the vector passed into it.
 
 d1 = c(1900-01-01, 2007-01-01,,2001-05-03)
 d2 = c(, 1900-01-01, 2007-01-01,2001-05-03)
 as.Date(d1)   # gives correct results
 as.Date(d2)   # fails with error (* see below)
 
 This problem does not arise if the dates are NA rather than an empty
 string, but my data is coming via RODBC and I still don't have NAs
 passed across properly.
 
 I might add that I initially noticed this behaviour when using RODBC's
 sqlQuery() function call, and I initially had difficulty explaining why
 one column of dates was passed correctly, but another failed. The
 failing column was a date of death column where it was NA () for
 most patients.
 
 I've come up with two workarounds that work. The first is to sort the
 data at the SQL level, ensuring the initial record is not null. The
 second is to use sqlQuery() with as.is=T option, and then do the sorting
 and conversion afterwards.

Why not just tell R what the format the dates are in, using the format
argument to as.Date?

 d1 = c(1900-01-01, 2007-01-01,,2001-05-03)
 d2 = c(, 1900-01-01, 2007-01-01,2001-05-03)
 as.Date(d1, %Y-%m-%d)
[1] 1900-01-01 2007-01-01 NA   2001-05-03
 as.Date(d2, %Y-%m-%d)
[1] NA   1900-01-01 2007-01-01 2001-05-03

 
 Is the behaviour of as.Date() shown above as expected/designed?

I don't know about expected/designed, but I would have thought
explicitly stating the date format would be the most fool-proof way of
making sure R did what you wanted, and the easiest way to work around
your problem.

HTH

G

 
 Many thanks,
 
 Mark
 
 
 (*) Error in fromchar(x) : character string is not in a standard
 unambiguous format
 
 sessionInfo():
 R version 2.4.0 (2006-10-03) powerpc-apple-darwin8.7.0 locale:
 C/en_GB.UTF-8/C/C/C/C
 attached base packages:
 [1] methods   stats graphics  grDevices utils
 datasets base
 
 other attached packages:
 rcompletion   RODBC
0.0-12 1.1-7
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC  [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK[w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT  [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Prof Brian Ripley
On Sun, 7 Jan 2007, Mark Wardle wrote:

 Dear all,

 The as.Date() function appears to give different results depending on
 the order of the vector passed into it.

 d1 = c(1900-01-01, 2007-01-01,,2001-05-03)
 d2 = c(, 1900-01-01, 2007-01-01,2001-05-03)
 as.Date(d1)   # gives correct results
 as.Date(d2)   # fails with error (* see below)

 This problem does not arise if the dates are NA rather than an empty
 string, but my data is coming via RODBC and I still don't have NAs
 passed across properly.

 I might add that I initially noticed this behaviour when using RODBC's
 sqlQuery() function call, and I initially had difficulty explaining why
 one column of dates was passed correctly, but another failed. The
 failing column was a date of death column where it was NA () for
 most patients.

 I've come up with two workarounds that work. The first is to sort the
 data at the SQL level, ensuring the initial record is not null. The
 second is to use sqlQuery() with as.is=T option, and then do the sorting
 and conversion afterwards.

 Is the behaviour of as.Date() shown above as expected/designed?

Yes.  It uses the first non-NA string to choose the format *if you do not 
specify it*.

The correct work-around is to get non-valid strings returned as NA, not 
.  That is argument 'na.strings' in RODBC (and elsewhere: read.table 
behaves in the same way).

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Mark Wardle
Prof Brian Ripley wrote:

 The correct work-around is to get non-valid strings returned as NA, not
 .  That is argument 'na.strings' in RODBC (and elsewhere: read.table
 behaves in the same way).
 


Thanks for these replies.

As I have mentioned before, my peculiar combination of PostgreSQL,
Actual's ODBC driver on Mac OS X, and RODBC means that for numbers and
dates, NULL values are passed to R as empty strings rather than NAs (2).
This does not occur with PostgreSQL's text column type.

For the benefit of others who in the future may use this combination(1),
my workaround for numbers/integers/boolean values is to essentially have
temporary intermediate tables with columns of type text whatever the
format, and let R/RODBC parse the strings into the correct resulting
format (which it then does faultlessly). This does not work for dates
however, and so I must use one of the two workarounds I mentioned in my
post.


Best wishes,

Mark

(1) unlikely as it may be
(2) I still cannot fathom why integers and dates are not handled correctly.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] different points and lines on the same plot

2007-01-07 Thread antoniababe
Dear all,

I have following data called paitent

daypatient1patient4patient5patient6   
0   -0.27842688 -0.04080808 -0.41948398 -0.04508318 
56  -0.22275425 -0.01767067 -0.30977249 -0.03168185  
112 -0.08217659 -0.26209243 -0.29141451 -0.09876170 
252  0.08044537 -0.26701769  0.05727087 -0.09663701 

where each patient have response values at four time
points. I want to plot each patient's values over time
on the same plot where the value points are connected
by line. That is, the graph will have four lines for
the four patients. I tried the program below but
couldn't make it work correctly. I'm new beginner and
haven't yet learned how functions line and points work
together. Hope you can help me out.

Thanks for your help,

Antonia

par(mfrow=c(1,1))
plot(patient[,1],patient[,2], pch=1,
type=l,col=1,cex=1,lwd=2,
 xlab=Days, ylab=Patient
response,cex.main =1,font.main= 1,
 main=NULL)
   
 
points(patient[,1],patient[,3],col=2,pch=1,cex=1)
lines(patient[,1],patient[,3],col=2,lty=1,cex=1)
   
points(patient[,1],patient[,4],col=3,pch=1,cex=1)
lines(patient[,1],patient[,4],col=2,lty=2,cex=1)
   
points(patient[,1],patient[,5],col=4,pch=1,cex=1)
lines(patient[,1],patient[,5],col=2,lty=1,cex=1)
  points(patient[,1],patient[,6],col=5,pch=1,cex=1)
 lines(patient[,1],patient[,6],col=2,lty=1,cex=1)

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] negative binomial family glm R and STATA

2007-01-07 Thread Lesnoff, Matthieu \(ILRI\)
Dear Patrick

below are some comments.

For ML estimation of negative binomial glim, there is also the function
negbin in the package aod (CRAN). This function uses optim(stats). Based
on your data, we have just detected a small bug in negbin, when the
Hessian matrix (that we use for computing the variances of the ML
estimates) is singular, which seems to be the case in the model you
proposed. We will soon fix this bug and update the package. At the end
of my message, I've provided a corrected (and simplified) version of the
function, negbin0, that you can source for reproducing the code below.
Note that we don't estimate theta but phi = 1 / theta (with E[y] = mu
and Var[y] = mu + phi * mu^2).

#=== FIT OF YOUR MODEL

# The data you provided

mydata - zonesdb4 

# Remove the unused level 0 of axe_routier 

mydata$axe_routier - factor(as.character(mydata$axe_routier), levels =
c(1, 2))

# Your model

negbin0(
formula = nbcas ~ pop + Area + V_100kHab + gares + ports +
axe_routier + lacs,
random = ~ 1,
control = list(maxit = 2000),
data = mydata,
)

$param
  (Intercept)   pop  AreaV_100kHab1gares1
ports1  axe_routier2 lacs1 
 6.008098e+00  1.015842e-05 -3.019320e-06  1.556476e+00  1.267495e+00
-4.549933e+00 -3.156201e+00  4.677113e+00 

8.287353e+00 

$H.singularity
[1] TRUE

$varparam
[1] NA

$logL
[1] -418.5078

$logL.max
[1] -167.6718

$dev
[1] 501.672

$code
[1] 0

#=== END

Here phi = 8.287353 (i.e. theta = 0.1206658), log-likelihood = -418.5078
and deviance = 501.672.

Few remarks:

- our results seem not to be similar to the STATA results you provided.
If I well understood, with STATA, log-likelihood = -597.1477759 (which
is lower than ours) and theta = 1. With R, I considered all the
covariables as factors, except pop and Area (continuous). Did you the
same with STATA?

- In the optimization process used in the example, the Hessian matrix is
singular. That often occurs when the model is overparametrized (and
therefore very instable) compared to the number of data you have (I
think your model is).

- I am not sure that the type of model you proposed is the most adapted.
Why not a model such as log(nbcas / pop) = X b, which is commonly used
(see Agresti, 1990. Categorical data analysis. Wiley) for analysing
rates of occurence of events, for example in epidemiology? With negbin,
this model is (with only considering axe_routier): 
 
 negbin(
+ formula = nbcas ~ axe_routier + offset(log(pop)),
+ random = ~ 1,
+ data = mydata
+ )

Negative-binomial model
---
negbin(formula = nbcas ~ axe_routier + offset(log(pop)), random = ~1, 
data = mydata)

Convergence was obtained after 82 iterations.

Fixed-effect coefficients:
 Estimate Std. Error  z value Pr( |z|)
(Intercept)   -6.5072 0.4888 -13.3114 1e-4
axe_routier2   1.0234 0.6839   1.49640.1346

Overdispersion coefficients:
Estimate Std. Error z value Pr( z)
phi.(Intercept)  10.7611 1.7936  5.9997   1e-4

Log-likelihood statistics
 Log-liknbpar  df res. Deviance  AIC AICc 
-411.1923   89  487.040  828.384  828.656 

- Finally, the response variable nbcas has a lot of values 0. A
zero-inflated model could perhaps much better fit the data.

Best wishes

Matthieu


#==
# FUNCTION negbin0 (TO SOURCE)
#==

negbin0 - function(formula, random, data, phi.ini = NULL, fixpar =
list(), hessian = TRUE,...){
link - log
f - formula
mf - model.frame(formula = f, data = data)
y - model.response(mf)
fam - poisson(link = log)
fm - glm(formula = f, family = fam, data = data)
offset - model.offset(mf)
# Model matrices
modmatrix.b - model.matrix(object = f, data = data)
if(random != ~ 1)
random - update.formula(random, ~ . - 1)
modmatrix.phi - model.matrix(object = random, data = data)
nb.b - ncol(modmatrix.b) ; nb.phi - ncol(modmatrix.phi) ; nbpar -
nb.b + nb.phi
# Initial values
if(is.null(phi.ini)) phi.ini - rep(0.1, nb.phi)
b - fm$coefficients
param.ini - c(b, phi.ini)
if(is.null(unlist(fixpar)) == FALSE) param.ini[fixpar[[1]]] -
fixpar[[2]]  
# minuslogL
minuslogL - function(param){
if(!is.null(unlist(fixpar))) param[fixpar[[1]]] - fixpar[[2]]  
b - param[1:nb.b]
eta - as.vector(modmatrix.b %*% b)
mu - if(is.null(offset)) exp(eta) else exp(offset + eta)
phi - as.vector(modmatrix.phi %*% param[(nb.b + 1):(nb.b +
nb.phi)])
k - 1 / phi 
cnd - phi == 0
f1 - dpois(x = y[cnd], lambda = mu[cnd], log = TRUE) 
y2 - y[!cnd]; k2 - k[!cnd]; mu2 - mu[!cnd]
f2 - lgamma(y2 + k2) - lfactorial(y2) - lgamma(k2) + k2 *
log(k2 / (k2 + mu2)) + y2 * log(mu2 / (k2 + mu2))
fn - sum(c(f1, f2))
if(!is.finite(fn)) fn - -1e20
-fn
}
# Fit
res - 

Re: [R] different points and lines on the same plot

2007-01-07 Thread Gabor Grothendieck
Try this:

matplot(patient[,1], patient[,-1], type = o)


On 1/7/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Dear all,

 I have following data called paitent

 daypatient1patient4patient5patient6
 0   -0.27842688 -0.04080808 -0.41948398 -0.04508318
 56  -0.22275425 -0.01767067 -0.30977249 -0.03168185
 112 -0.08217659 -0.26209243 -0.29141451 -0.09876170
 252  0.08044537 -0.26701769  0.05727087 -0.09663701

 where each patient have response values at four time
 points. I want to plot each patient's values over time
 on the same plot where the value points are connected
 by line. That is, the graph will have four lines for
 the four patients. I tried the program below but
 couldn't make it work correctly. I'm new beginner and
 haven't yet learned how functions line and points work
 together. Hope you can help me out.

 Thanks for your help,

 Antonia

 par(mfrow=c(1,1))
plot(patient[,1],patient[,2], pch=1,
 type=l,col=1,cex=1,lwd=2,
 xlab=Days, ylab=Patient
 response,cex.main =1,font.main= 1,
 main=NULL)


 points(patient[,1],patient[,3],col=2,pch=1,cex=1)
 lines(patient[,1],patient[,3],col=2,lty=1,cex=1)

 points(patient[,1],patient[,4],col=3,pch=1,cex=1)
 lines(patient[,1],patient[,4],col=2,lty=2,cex=1)

 points(patient[,1],patient[,5],col=4,pch=1,cex=1)
 lines(patient[,1],patient[,5],col=2,lty=1,cex=1)
  points(patient[,1],patient[,6],col=5,pch=1,cex=1)
  lines(patient[,1],patient[,6],col=2,lty=1,cex=1)

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using VGAM's vglm function for ordinal logistic regression

2007-01-07 Thread Inman, Brant A. M.D.
Thank you for the help.  Indeed, the differences in the results that I
noted were due to the incorrect ordering of the response variable that
resulted from my unattentive use of as.ordered on a character vector.

Brant


-Original Message-
From: Prof Brian Ripley [mailto:[EMAIL PROTECTED] 
Sent: Sunday, January 07, 2007 2:52 AM
To: Inman, Brant A. M.D.
Cc: r-help@stat.math.ethz.ch; [EMAIL PROTECTED]
Subject: Re: [R] Using VGAM's vglm function for ordinal logistic
regression

On Sat, 6 Jan 2007, Inman, Brant A.   M.D. wrote:


 R-Experts:

 I am using the vglm function of the VGAM library to perform
proportional
 odds ordinal logistic regression.  The issue that I would like help
with
 concerns the format in which the response variable must be provided
for
 this function to work correctly.

 pneumo2$severity
  [1] normal normal normal normal normal normal normal normal mild
mild
[11] mild   mild   mild   mild   mild   mild   severe severe severe
severe
[21] severe severe severe severe
Levels: mild  normal  severe

is different from the ordering in the first example.

The difference between vglm and polr is that the latter uses the 
conventional sign for the coefficients: see MASS4 p.204.

I would never use as.ordered on a character vector, as this leaves it to
R 
to choose the ordering.  (Even if you think you intended alphabetical 
order, that depends on the locale: see the warnings on the help page.)


 Consider the following example:

 --

 library(VGAM)
 library(MASS)

 attach(pneumo)
 pneumo# Inspect the format of the original dataset

 # Restructure the pneumo dataset into a different format
 pneumo2 - data.frame(matrix(ncol=3, nrow=24))
 colnames(pneumo2) - c('exposure.time', 'severity', 'freq')
 pneumo2[,1] - rep(pneumo[,1],3)
 pneumo2[,2] -
 as.ordered(c(rep('normal',8),rep('mild',8),rep('severe',8)))
 pneumo2[1:8,3]   - pneumo[,2]
 pneumo2[9:16,3]  - pneumo[,3]
 pneumo2[17:24,3] - pneumo[,4]
 pneumo2   # Inspect the format of the new modified dataset

 --

 The problem occurs when I try to analyze these two datasets, which are
 identical in content, with the proportional odds model using vglm:

 --

 # Analyze the original dataset with vglm, get one set of results

 vglm(vglm(cbind(normal, mild, severe) ~ log(exposure.time),
 data=pneumo,
 +  family=cumulative(parallel=T))

 Coefficients:
 (Intercept):1  (Intercept):2 log(exposure.time)
  9.676093  10.581725  -2.596807

 Degrees of Freedom: 16 Total; 13 Residual
 Residual Deviance: 5.026826
 Log-likelihood: -204.2742

 # Analyzing the modified dataset with vglm gives another set of
results

 vglm(severity ~ log(exposure.time), weights=freq, data=pneumo2,
 + family=cumulative(parallel=T))

 Coefficients:
 (Intercept):1  (Intercept):2 log(exposure.time)
-1.6306499  2.5630170 -0.1937956

 Degrees of Freedom: 48 Total; 45 Residual
 Residual Deviance: 503.7251
 Log-likelihood: -251.8626
 Warning messages:
 1: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 2: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 3: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 4: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)
 5: 4 elements replaced by 1.819e-12 in: checkwz(wz, M = M, trace =
 trace, wzeps = control$wzepsilon)

 # Analyzing the modified dataset with polr reproduces these second
 results

 polr(severity ~ log(exposure.time), weights=freq, data=pneumo2)

 Coefficients:
 log(exposure.time)
 0.1938154

 Intercepts:
  mild|normal normal|severe
-1.630612  2.563055

 Residual Deviance: 503.7251
 AIC: 509.7251

 --

 Can someone explain what I am doing wrong when using vglm and polr
with
 the modified dataset?  I do not understand why these two formulations
 should give different results.

 Brant Inman

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Partial proportional odds logistic regression

2007-01-07 Thread Inman, Brant A. M.D.

R-experts:

I would like to explore the partial proportional odds models of Peterson
and Harrell (Applied Statistics 1990, 39(2): 205-217) for a dataset that
I am analyzing.  I have not been able to locate a R package that
implements these models.  Is anyone aware of existing R functions,
packages, etc... that might be used to implement the partial
proportional odds models?

Brant Inman

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multiple plots via sapply or lapply?

2007-01-07 Thread Antje
Hi all,

I've got the following problem. I have a vector containing file names. I 
want to read these files as csv and calculate the density-function for 
each file (has just one column with data). Then, I'd like to plot all 
density functions into one window. I did the following to calculate the 
density data:

s - sapply(filelist, function(x) {
if(file.exists(x))
{
file - read.csv(x, sep=\t, header=F)
return( list(density(file$V1)$x, density(file$V1)$y))
}
})

Now I would like to plot these x,y data in a similar way but my result 
s is a matrix containing lists...

  File1.csv File2.csv   File3.csv
[1,] Numeric,512Numeric,512 Numeric,512
[2,] Numeric,512Numeric,512 Numeric,512

Now I don't know how to handle the x,y values for each plot into an 
sapply (or lapply, I don't know)

Any idea? Maybe, I should somehow change the return type?

Antje

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] odfWeave and figures in MS Word Format

2007-01-07 Thread Laurent Rhelp
Dear R-List,

   I try to use the package odfWeave but I missed something. I use the 
command odfWeave(examples.odt,out.odt) to generate the file that I 
can open with OpenOffice : it works fine. Then, I used the Save As  to 
export the document to HTML format : it works fine and creates the .png 
files for every figure from  the out.odt document. But when I export the 
document to MS Word format (OS : XP) there is no figure in the word 
document .doc.

What did I do wrong ?

Thanks

Laurent

##
R  version
   _
platform   i386-pc-mingw32  
arch   i386 
os mingw32  
system i386, mingw32
status  
major  2
minor  3.1  
year   2006 
month  06   
day01   
svn rev38247
language   R
version.string Version 2.3.1 (2006-06-01)

##

OpenOffice Version

2.1

##

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple plots via sapply or lapply?

2007-01-07 Thread talepanda
try apply() :

par(new=F);
apply(s,2,function(x){plot(x[[1]],x[[2]],type=o);par(new=T)})

On 1/8/07, Antje [EMAIL PROTECTED] wrote:
 Hi all,

 I've got the following problem. I have a vector containing file names. I
 want to read these files as csv and calculate the density-function for
 each file (has just one column with data). Then, I'd like to plot all
 density functions into one window. I did the following to calculate the
 density data:

 s - sapply(filelist, function(x) {
   if(file.exists(x))
   {
   file - read.csv(x, sep=\t, header=F)
   return( list(density(file$V1)$x, density(file$V1)$y))
   }
   })

 Now I would like to plot these x,y data in a similar way but my result
 s is a matrix containing lists...

   File1.csv   File2.csv   File3.csv
 [1,] Numeric,512  Numeric,512 Numeric,512
 [2,] Numeric,512  Numeric,512 Numeric,512

 Now I don't know how to handle the x,y values for each plot into an
 sapply (or lapply, I don't know)

 Any idea? Maybe, I should somehow change the return type?

 Antje

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] memory problem

2007-01-07 Thread Bos, Roger
Doing so will also require using the Linux version of R as there is no
open source 64bit compiler to create a windows version (so I am told).
Anyway, memory is getting cheap now.  I got a 64bit machine from Dell
with 16Gb od DDR2 for around $4000.  Good luck.
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Uwe Ligges
Sent: Sunday, January 07, 2007 3:42 AM
To: Zoltan Kmetty
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] memory problem



Zoltan Kmetty wrote:
 Hi!
 
 I had some memory problem with R - hope somebody could tell me a
solution.
 
 I work with very large datasets, but R cannot allocate enough memoty 
 to handle these datasets.
 
 I want work a matrix with row= 100 000 000 and column=10
 
 A know this is 1 milliard cases, but i thought R could handle it 
 (other commercial software like spss could do), but R wrote out 
 everytime: not enough memory..
 
 any good idea?


Buy a machine that has at least 8Gb (better 16Gb) of RAM and proceed ...

Uwe Ligges



 Thanks, Zoltan
 
   [[alternative HTML version deleted]]
 
 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

** * 
This message is for the named person's use only. It may 
contain confidential, proprietary or legally privileged 
information. No right to confidential or privileged treatment 
of this message is waived or lost by any error in 
transmission. If you have received this message in error, 
please immediately notify the sender by e-mail, 
delete the message and all copies from your system and destroy 
any hard copies. You must not, directly or indirectly, use, 
disclose, distribute, print or copy any part of this message 
if you are not the intended recipient.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] as.Date() results depend on order of data within vector?

2007-01-07 Thread Patrick Connolly
On Sun, 07-Jan-2007 at 12:01PM +, Mark Wardle wrote:

| Dear all,
| 
| The as.Date() function appears to give different results depending on
| the order of the vector passed into it.
| 
| d1 = c(1900-01-01, 2007-01-01,,2001-05-03)
| d2 = c(, 1900-01-01, 2007-01-01,2001-05-03)
| as.Date(d1)  # gives correct results
| as.Date(d2)  # fails with error (* see below)
| 
| This problem does not arise if the dates are NA rather than an empty
| string, but my data is coming via RODBC and I still don't have NAs
| passed across properly.
| 
| I might add that I initially noticed this behaviour when using RODBC's
| sqlQuery() function call, and I initially had difficulty explaining why
| one column of dates was passed correctly, but another failed. The
| failing column was a date of death column where it was NA () for
| most patients.
| 
| I've come up with two workarounds that work. The first is to sort the
| data at the SQL level, ensuring the initial record is not null. The
| second is to use sqlQuery() with as.is=T option, and then do the sorting
| and conversion afterwards.

Simpler, I think, is to add one line
d2[d2 == ] - NA

I've not tested the idea extensively, so there might be occasions
where it falls down.  If you're working with a dataframe, you can use
one of the apply functions to effect all columns.


HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.   
   ___Patrick Connolly   
 {~._.~} Great minds discuss ideas
 _( Y )_Middle minds discuss events 
(:_~*~_:)Small minds discuss people  
 (_)-(_)   . Anon
  
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] creating a list of lists

2007-01-07 Thread Paul Boutros
Hello,

I'm trying to create a series of randomForest objects, basically in a  
loop like this:

forests - list();

for (level in 1:10) {

  # do some other things here

# create a random forest
forest - randomForest(
x = x.level,
y = z.level,
ntree = trees
);

forests - c(forests, forest);

}


But instead of creating a list of 10 forests, this creates a list of  
180 elements, 18 for each forest.  Is there a way to create a list of  
randomForest objects for use later on in code like:

for (forest in forests) {
 values - predict(forest, data);
 # do things with these predicted values
 }

Sincerely,
Paul

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eval(parse(text vs. get when accessing a function

2007-01-07 Thread Ramon Diaz-Uriarte
Hi Martin,



On 1/6/07, Martin Morgan [EMAIL PROTECTED] wrote:
 Hi Ramon,

 It seems like a naming convention (f.xxx) and eval(parse(...)) are
 standing in for objects (of class 'GeneSelector', say, representing a
 function with a particular form and doing a particular operation) and
 dispatch (a function 'geneConverter' might handle a converter of class
 'GeneSelector' one way, user supplied ad-hoc functions more carefully;
 inside geneConverter the only real concern is that the converter
 argument is in fact a callable function).

 eval(parse(...)) brings scoping rules to the fore as an explicit
 programming concern; here scope is implicit, but that's probably better
 -- R will get its own rules right.

 Martin

 Here's an S4 sketch:

 setClass(GeneSelector,
  contains=function,
  representation=representation(description=character),
  validity=function(object) {
  msg - NULL
  argNames - names(formals(object))
  if (argNames[1]!=x)
msg - c(msg, \n  GeneSelector requires a first argument 
 named 'x')
  if (!... %in% argNames)
msg - c(msg, \n  GeneSelector requires '...' in its 
 signature)
  if (0==length([EMAIL PROTECTED]))
msg - c(msg, \n  Please describe your GeneSelector)
  if (is.null(msg)) TRUE else msg
  })

 setGeneric(geneConverter,
function(converter, x, ...) standardGeneric(geneConverter),
signature=c(converter))

 setMethod(geneConverter,
   signature(converter=GeneSelector),
   function(converter, x, ...) {
   ## important stuff here
   converter(x, ...)
   })

 setMethod(geneConverter,
   signature(converter=function),
   function(converter, x, ...) {
   message(ad-hoc converter; hope it works!)
   converter(x, ...)
   })

 and then...

  c1 - new(GeneSelector,
 +   function(x, ...) prod(x, ...),
 +   description=Product of x)
 
  c2 - new(GeneSelector,
 +   function(x, ...) sum(x, ...),
 +   description=Sum of x)
 
  geneConverter(c1, 1:4)
 [1] 24
  geneConverter(c2, 1:4)
 [1] 10
  geneConverter(mean, 1:4)
 ad-hoc converter; hope it works!
 [1] 2.5
 
  cvterr - new(GeneSelector, function(y) {})
 Error in validObject(.Object) : invalid class GeneSelector object: 1:
   GeneSelector requires a first argument named 'x'
 invalid class GeneSelector object: 2:
   GeneSelector requires '...' in its signature
 invalid class GeneSelector object: 3:
   Please describe your GeneSelector
  xxx - 10
  geneConverter(xxx, 1:4)
 Error in function (classes, fdef, mtable)  :
 unable to find an inherited method for function geneConverter, for 
 signature numeric




Thanks!! That is actually a rather interesting alternative approach
and I can see it also adds a lot of structure to the problem. I have
to confess, though, that I am not a fan of OOP (nor of S4 classes); in
this case, in particular, it seems there is a lot of scaffolding in
the code above (the counterpoint to the structure?) and, regarding
scoping rules, I prefer to think about them explicitly (I find it much
simpler than inheritance).

Best,

R.



 Ramon Diaz-Uriarte [EMAIL PROTECTED] writes:

  Dear Greg,
 
 
  On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote:
  Ramon,
 
  I prefer to use the list method for this type of thing, here are a couple 
  of reasons why (maybe you are more organized than me and would never do 
  some of the stupid things that I have, so these don't apply to you, but 
  you can see that the general suggestion applys to some of the rest of us).
 
 
 
  Those suggestions do apply to me of course (no claim to being
  organized nor beyond idiocy here). And actually the suggestions on
  this thread are being very useful. I think, though, that I was not
  very clear on the context and my examples were too dumbed down. So
  I'll try to give more detail (nothing here is secret, I am just trying
  not to bore people).
 
  The code is part of a web-based application, so there is no
  interactive user. The R code is passed the arguments (and optional
  user functions) from the CGI.
 
  There is one core function (call it cvFunct) that, among other
  things, does cross-validation. So this is one way to do things:
 
  cvFunct - function(whatever, genefiltertype, whateverelse) {
internalGeneSelect - eval(parse(text = paste(geneSelect,
   genefiltertype, sep = .)))
 
## do things calling internalGeneSelect,
  }
 
  and now define all possible functions as
 
  geneSelect.Fratio - function(x, y, z) {##something}
  geneSelect.Wilcoxon - function(x, y, z) {## something else}
 
  If I want more geneSelect functions, adding them is simple. And I can
  even allow the user to pass her/his own functions, with the only
  restriction that it takes three args, x, y, z, and that the 

Re: [R] eval(parse(text vs. get when accessing a function

2007-01-07 Thread Martin Morgan
Hi Ramon,

It seems like a naming convention (f.xxx) and eval(parse(...)) are
standing in for objects (of class 'GeneSelector', say, representing a
function with a particular form and doing a particular operation) and
dispatch (a function 'geneConverter' might handle a converter of class
'GeneSelector' one way, user supplied ad-hoc functions more carefully;
inside geneConverter the only real concern is that the converter
argument is in fact a callable function).

eval(parse(...)) brings scoping rules to the fore as an explicit
programming concern; here scope is implicit, but that's probably better
-- R will get its own rules right.

Martin

Here's an S4 sketch:

setClass(GeneSelector,
 contains=function,
 representation=representation(description=character),
 validity=function(object) {
 msg - NULL
 argNames - names(formals(object))
 if (argNames[1]!=x)
   msg - c(msg, \n  GeneSelector requires a first argument named 
'x')
 if (!... %in% argNames)
   msg - c(msg, \n  GeneSelector requires '...' in its signature)
 if (0==length([EMAIL PROTECTED]))
   msg - c(msg, \n  Please describe your GeneSelector)
 if (is.null(msg)) TRUE else msg
 })

setGeneric(geneConverter,
   function(converter, x, ...) standardGeneric(geneConverter),
   signature=c(converter))

setMethod(geneConverter,
  signature(converter=GeneSelector),
  function(converter, x, ...) {
  ## important stuff here
  converter(x, ...)
  })

setMethod(geneConverter,
  signature(converter=function),
  function(converter, x, ...) {
  message(ad-hoc converter; hope it works!)
  converter(x, ...)
  })

and then...

 c1 - new(GeneSelector,
+   function(x, ...) prod(x, ...),
+   description=Product of x)
 
 c2 - new(GeneSelector,
+   function(x, ...) sum(x, ...),
+   description=Sum of x)
 
 geneConverter(c1, 1:4)
[1] 24
 geneConverter(c2, 1:4)
[1] 10
 geneConverter(mean, 1:4)
ad-hoc converter; hope it works!
[1] 2.5
 
 cvterr - new(GeneSelector, function(y) {})
Error in validObject(.Object) : invalid class GeneSelector object: 1: 
  GeneSelector requires a first argument named 'x'
invalid class GeneSelector object: 2: 
  GeneSelector requires '...' in its signature
invalid class GeneSelector object: 3: 
  Please describe your GeneSelector
 xxx - 10
 geneConverter(xxx, 1:4)
Error in function (classes, fdef, mtable)  : 
unable to find an inherited method for function geneConverter, for 
signature numeric


Ramon Diaz-Uriarte [EMAIL PROTECTED] writes:

 Dear Greg,


 On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote:
 Ramon,

 I prefer to use the list method for this type of thing, here are a couple of 
 reasons why (maybe you are more organized than me and would never do some of 
 the stupid things that I have, so these don't apply to you, but you can see 
 that the general suggestion applys to some of the rest of us).



 Those suggestions do apply to me of course (no claim to being
 organized nor beyond idiocy here). And actually the suggestions on
 this thread are being very useful. I think, though, that I was not
 very clear on the context and my examples were too dumbed down. So
 I'll try to give more detail (nothing here is secret, I am just trying
 not to bore people).

 The code is part of a web-based application, so there is no
 interactive user. The R code is passed the arguments (and optional
 user functions) from the CGI.

 There is one core function (call it cvFunct) that, among other
 things, does cross-validation. So this is one way to do things:

 cvFunct - function(whatever, genefiltertype, whateverelse) {
   internalGeneSelect - eval(parse(text = paste(geneSelect,
  genefiltertype, sep = .)))

   ## do things calling internalGeneSelect,
 }

 and now define all possible functions as

 geneSelect.Fratio - function(x, y, z) {##something}
 geneSelect.Wilcoxon - function(x, y, z) {## something else}

 If I want more geneSelect functions, adding them is simple. And I can
 even allow the user to pass her/his own functions, with the only
 restriction that it takes three args, x, y, z, and that the function
 is to be called: geneSelect. and a user choosen string. (Yes, I need
 to make sure no calls to system, etc, are in the user code, etc,
 etc, but that is another issue).

 The general idea is not new of course. For instance, in package
 e1071, a somewhat similar thing is done in function tune, and
 David Meyer there uses do.call. However, tune is a lot more general
 than what I had in mind. For instance, tune deals with arbitrary
 functions, with arbitrary numbers and names of parameters, whereas my
 functions above all take only three arguments (x: a matrix, y: a
 vector; z: an integer), so the neat functionality 

Re: [R] creating a list of lists

2007-01-07 Thread Weiwei Shi
change
 forests - c(forests, forest);
into
 forests[[level]] - forest



On 1/6/07, Paul Boutros [EMAIL PROTECTED] wrote:
 Hello,

 I'm trying to create a series of randomForest objects, basically in a
 loop like this:

 forests - list();

 for (level in 1:10) {

   # do some other things here

 # create a random forest
 forest - randomForest(
 x = x.level,
 y = z.level,
 ntree = trees
 );

 forests - c(forests, forest);

 }


 But instead of creating a list of 10 forests, this creates a list of
 180 elements, 18 for each forest.  Is there a way to create a list of
 randomForest objects for use later on in code like:

 for (forest in forests) {
  values - predict(forest, data);
  # do things with these predicted values
  }

 Sincerely,
 Paul

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

Did you always know?
No, I did not. But I believed...
---Matrix III

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] creating a list of lists

2007-01-07 Thread Eric Archer
Howabout using 'lapply'?

forest - lapply(1:10, function(level) {
  # do some other things here
  # create and return a random forest
   randomForest(x = x.level, y = z.level, ntree = trees)
})

# give meaningful names
names(forest) - paste(level, 1:10, sep = _)
 
Paul Boutros wrote:
 Hello,

 I'm trying to create a series of randomForest objects, basically in a  
 loop like this:

 forests - list();

 for (level in 1:10) {

   # do some other things here

   # create a random forest
   forest - randomForest(
   x = x.level,
   y = z.level,
   ntree = trees
   );

   forests - c(forests, forest);

   }


 But instead of creating a list of 10 forests, this creates a list of  
 180 elements, 18 for each forest.  Is there a way to create a list of  
 randomForest objects for use later on in code like:

 for (forest in forests) {
  values - predict(forest, data);
  # do things with these predicted values
  }

 Sincerely,
 Paul

 __
 R-help@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
   


-- 

Eric Archer, Ph.D.
NOAA-SWFSC
8604 La Jolla Shores Dr.
La Jolla, CA 92037
858-546-7121,7003(FAX)
[EMAIL PROTECTED]


Lighthouses are more helpful than churches.
- Benjamin Franklin

...but I'll take a GPS over either one.
- Craig George

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] MDS in 3D

2007-01-07 Thread Atte Tenkanen
Hi,

I have tried to develop multidimensional scaling for 3D space using PCA without 
success, yet;-) Is there some application ready in R?

Cheers,

Atte

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MDS in 3D

2007-01-07 Thread Atte Tenkanen

 Hi,
 
 I have tried to develop multidimensional scaling for 3D space using 
 PCA without success, yet;-) Is there some application ready in R?
 
 Cheers,
 
 Atte
 

I found xgobi, but when I try to run example I get some command not found 
-errors.

Atte


 data(morsecodes) ## from the XGobi/XGvis data, see  ?morsecodes
 mc.row - paste(morsecodes.row[,1],morsecodes.row[,2])
 
 xgvis(dmat = morsecodes.dist,
+   pos = morsecodes.pos,
+   rowlab = mc.row,
+   colors = morsecodes.colors,
+   glyphs = morsecodes.glyphs,
+   lines = morsecodes.lines,
+   linecolors = morsecodes.linecolors)
xgvis /tmp/RtmpDaR3cT/xgvis-6058ed8  
 
 ##   2) Show lines by hitting l with the mouse over the plot.
 ##   3) Examine morsecode labels by hitting i and mousing around on the 
 plot.
 ##   3b) Press r (on the plot) to switch 3D rotation in xgobi.
 ##   4) Run MDS in 3D by clicking Run MDS (in xgvis).
 ##   5) Speed up the optimization by increasing the Stepsize with the 
 slider.
 ##  The Stress function value may go as low as 0.1925 (MM).
 ##   6) When the optimization calms down, click Run MDS to toggle MDS off.
 ##   7) Rotate the MDS configuration in 3D {by r with mouse over plot}.
 ##   8) Increase the rotation speed with the slider in the top left and
 ##  control the rotation direction by dragging the mouse on the plot.
 ##   9) You can check out the initial configuration by
 
 ## In order to have no color warning :
 Mcolors - unique(morsecodes.colors)
/bin/sh: line 1: xgvis: command not found
 (Mcolors - paste(*brushColor, 0:(length(Mcolors)-1),: , Mcolors, sep=))
[1] *brushColor0: SkyBlue *brushColor1: Green  
[3] *brushColor2: Yellow  *brushColor3: HotPink
[5] *brushColor4: Red
 
 xgobi(morsecodes.pos, collab = morsecodes.col, rowlab = mc.row,
+   colors = morsecodes.colors,
+   glyphs = morsecodes.glyphs,
+   lines  = morsecodes.lines,
+   linecolors = morsecodes.linecolors,
+   resources= c(*showLines: True, Mcolors))
xgobi -title 'morsecodes.pos' -std mmx /tmp/RtmpDaR3cT/xgobi-mrscd56e509fe  
/bin/sh: line 1: xgobi: command not found
 
 ##  This XGobi window will be linked with
 ##  the XGvis window for glyph-color brushing and labeling.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MDS in 3D

2007-01-07 Thread Atte Tenkanen
...and my system is OSX 10.4.

Atte

 
  Hi,
  
  I have tried to develop multidimensional scaling for 3D space 
 using 
  PCA without success, yet;-) Is there some application ready in R?
  
  Cheers,
  
  Atte
  
 
 I found xgobi, but when I try to run example I get some command not 
 found -errors.
 
 Atte
 
 
  data(morsecodes) ## from the XGobi/XGvis data, see  ?morsecodes
  mc.row - paste(morsecodes.row[,1],morsecodes.row[,2])
  
  xgvis(dmat = morsecodes.dist,
 +   pos = morsecodes.pos,
 +   rowlab = mc.row,
 +   colors = morsecodes.colors,
 +   glyphs = morsecodes.glyphs,
 +   lines = morsecodes.lines,
 +   linecolors = morsecodes.linecolors)
 xgvis /tmp/RtmpDaR3cT/xgvis-6058ed8  
  
  ##   2) Show lines by hitting l with the mouse over the plot.
  ##   3) Examine morsecode labels by hitting i and mousing 
 around on the plot.
  ##   3b) Press r (on the plot) to switch 3D rotation in xgobi.
  ##   4) Run MDS in 3D by clicking Run MDS (in xgvis).
  ##   5) Speed up the optimization by increasing the Stepsize 
 with the slider.
  ##  The Stress function value may go as low as 0.1925 (MM).
  ##   6) When the optimization calms down, click Run MDS to 
 toggle MDS off.
  ##   7) Rotate the MDS configuration in 3D {by r with mouse 
 over plot}.
  ##   8) Increase the rotation speed with the slider in the top 
 left and
  ##  control the rotation direction by dragging the mouse on 
 the plot.
  ##   9) You can check out the initial configuration by
  
  ## In order to have no color warning :
  Mcolors - unique(morsecodes.colors)
 /bin/sh: line 1: xgvis: command not found
  (Mcolors - paste(*brushColor, 0:(length(Mcolors)-1),: , 
 Mcolors, sep=))
 [1] *brushColor0: SkyBlue *brushColor1: Green  
 [3] *brushColor2: Yellow  *brushColor3: HotPink
 [5] *brushColor4: Red
  
  xgobi(morsecodes.pos, collab = morsecodes.col, rowlab = mc.row,
 +   colors = morsecodes.colors,
 +   glyphs = morsecodes.glyphs,
 +   lines  = morsecodes.lines,
 +   linecolors = morsecodes.linecolors,
 +   resources= c(*showLines: True, Mcolors))
 xgobi -title 'morsecodes.pos' -std mmx /tmp/RtmpDaR3cT/xgobi-
 mrscd56e509fe  
 /bin/sh: line 1: xgobi: command not found
  
  ##  This XGobi window will be linked with
  ##  the XGvis window for glyph-color brushing and labeling.


__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] negative binomial family glm R and STATA

2007-01-07 Thread Mikis Stasinopoulos
Dear  Patrick

Try the package gamlss which allow to fit the negative binomial 
distrbution. For example  with your data I am getting
#---
ga1-gamlss(nbcas~.,data=zonesdb4,family=NBI)
GAMLSS-RS iteration 1: Global Deviance = 817.9027
GAMLSS-RS iteration 2: Global Deviance = 817.9025
 ga1

Family:  c(NBI, Negative Binomial type I)
Fitting method: RS()

Call:  gamlss(formula = nbcas ~ ., family = NBI, data = zonesdb4)

Mu Coefficients:
 (Intercept)   pop  AreaV_100kHab1gares1 
   3.204e+00 1.114e-05 1.354e-05 9.144e-01 7.946e-01 
  ports1  axe_routier1  axe_routier2 lacs1 
  -1.730e+00 1.989e-01NA 3.042e+00 
Sigma Coefficients:
(Intercept) 
  2.313 

 Degrees of Freedom for the fit: 9 Residual Deg. of Freedom   83
Global Deviance: 817.902
AIC: 835.902
SBC: 858.599
#--

Note that the AIC: 835.902 is similar to your fitted model using  glm.nb 
which is  AIC: 836.2.

The coefficients are not identical but this is not suprissing when you 
are using x-variables with extreme values as pop and Area.
The profile function for sigma can be found using

prof.dev(ga1,sigma, min=7, max=16, step=0.1, type=l)

Your discrepancy with STATA come from the fact that in STATA you are 
fitting the model with sigma fixed to 1.
You can see that by fitting the same model in  GAMLSS.

   ga2-gamlss(nbcas~.,data=zonesdb4,family=NBI, sigma.fix=T, 
sigma.start=1)
GAMLSS-RS iteration 1: Global Deviance = 1194.299
GAMLSS-RS iteration 2: Global Deviance = 1194.298

This  is similar  to  the  log likelihod you are  getting in STATA. i.e. 
-2*-597.14778= 1194.296.

You can also use the stepGAIC() function to simplify your model. For  
example

 ga2-stepGAIC(ga1)

will result to a model with only pop and lacs in the mdel.

Note also the the Negative binomial type II fits better to you data.

   ga3-gamlss(nbcas~.,data=zonesdb4,family=NBII)
GAMLSS-RS iteration 1: Global Deviance = 804.5682
...
GAMLSS-RS iteration 10: Global Deviance = 804.4995


Thanks

Mikis

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] eval(parse(text vs. get when accessing a function

2007-01-07 Thread Gabor Grothendieck
The S4 is not essential.  You could do it in S3 too:

 f.a - function(x) x+1
 f.b - function(x) x+2
 f - function(x) UseMethod(f)

 f(structure(10, class = a))
[1] 11
attr(,class)
[1] a

On 1/6/07, Ramon Diaz-Uriarte [EMAIL PROTECTED] wrote:
 Hi Martin,



 On 1/6/07, Martin Morgan [EMAIL PROTECTED] wrote:
  Hi Ramon,
 
  It seems like a naming convention (f.xxx) and eval(parse(...)) are
  standing in for objects (of class 'GeneSelector', say, representing a
  function with a particular form and doing a particular operation) and
  dispatch (a function 'geneConverter' might handle a converter of class
  'GeneSelector' one way, user supplied ad-hoc functions more carefully;
  inside geneConverter the only real concern is that the converter
  argument is in fact a callable function).
 
  eval(parse(...)) brings scoping rules to the fore as an explicit
  programming concern; here scope is implicit, but that's probably better
  -- R will get its own rules right.
 
  Martin
 
  Here's an S4 sketch:
 
  setClass(GeneSelector,
   contains=function,
   representation=representation(description=character),
   validity=function(object) {
   msg - NULL
   argNames - names(formals(object))
   if (argNames[1]!=x)
 msg - c(msg, \n  GeneSelector requires a first argument 
  named 'x')
   if (!... %in% argNames)
 msg - c(msg, \n  GeneSelector requires '...' in its 
  signature)
   if (0==length([EMAIL PROTECTED]))
 msg - c(msg, \n  Please describe your GeneSelector)
   if (is.null(msg)) TRUE else msg
   })
 
  setGeneric(geneConverter,
 function(converter, x, ...) standardGeneric(geneConverter),
 signature=c(converter))
 
  setMethod(geneConverter,
signature(converter=GeneSelector),
function(converter, x, ...) {
## important stuff here
converter(x, ...)
})
 
  setMethod(geneConverter,
signature(converter=function),
function(converter, x, ...) {
message(ad-hoc converter; hope it works!)
converter(x, ...)
})
 
  and then...
 
   c1 - new(GeneSelector,
  +   function(x, ...) prod(x, ...),
  +   description=Product of x)
  
   c2 - new(GeneSelector,
  +   function(x, ...) sum(x, ...),
  +   description=Sum of x)
  
   geneConverter(c1, 1:4)
  [1] 24
   geneConverter(c2, 1:4)
  [1] 10
   geneConverter(mean, 1:4)
  ad-hoc converter; hope it works!
  [1] 2.5
  
   cvterr - new(GeneSelector, function(y) {})
  Error in validObject(.Object) : invalid class GeneSelector object: 1:
GeneSelector requires a first argument named 'x'
  invalid class GeneSelector object: 2:
GeneSelector requires '...' in its signature
  invalid class GeneSelector object: 3:
Please describe your GeneSelector
   xxx - 10
   geneConverter(xxx, 1:4)
  Error in function (classes, fdef, mtable)  :
  unable to find an inherited method for function geneConverter, 
  for signature numeric
 



 Thanks!! That is actually a rather interesting alternative approach
 and I can see it also adds a lot of structure to the problem. I have
 to confess, though, that I am not a fan of OOP (nor of S4 classes); in
 this case, in particular, it seems there is a lot of scaffolding in
 the code above (the counterpoint to the structure?) and, regarding
 scoping rules, I prefer to think about them explicitly (I find it much
 simpler than inheritance).

 Best,

 R.


 
  Ramon Diaz-Uriarte [EMAIL PROTECTED] writes:
 
   Dear Greg,
  
  
   On 1/5/07, Greg Snow [EMAIL PROTECTED] wrote:
   Ramon,
  
   I prefer to use the list method for this type of thing, here are a 
   couple of reasons why (maybe you are more organized than me and would 
   never do some of the stupid things that I have, so these don't apply to 
   you, but you can see that the general suggestion applys to some of the 
   rest of us).
  
  
  
   Those suggestions do apply to me of course (no claim to being
   organized nor beyond idiocy here). And actually the suggestions on
   this thread are being very useful. I think, though, that I was not
   very clear on the context and my examples were too dumbed down. So
   I'll try to give more detail (nothing here is secret, I am just trying
   not to bore people).
  
   The code is part of a web-based application, so there is no
   interactive user. The R code is passed the arguments (and optional
   user functions) from the CGI.
  
   There is one core function (call it cvFunct) that, among other
   things, does cross-validation. So this is one way to do things:
  
   cvFunct - function(whatever, genefiltertype, whateverelse) {
 internalGeneSelect - eval(parse(text = paste(geneSelect,
genefiltertype, sep = .)))
  
 ## do things calling 

[R] Using JRI Calling R function from Java

2007-01-07 Thread freezemanx

Hey guys.

I have Installed and running JRI from Java
and it works. Using : java -cp jri.jar; MainApp.java

The problem is How do I calling R function that need R library 
I have tried these with my Java program : 
   x =  re.eval(glm( y ~ x1 + x2, family = poisson));
   String resultString = x.asString();
but it did'n't work, resultString variable did't give any result it's null

questions : 
Is there any other way to call this R function from Java using JRI?

and  I have heard omegahat package can be use to calling R function
are there any links, examples , tutorials for using omegahat?

Thanks 
-- 
View this message in context: 
http://www.nabble.com/Using-JRI-Calling--R-function-from-Java-tf2937179.html#a8212021
Sent from the R help mailing list archive at Nabble.com.

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R scripts to plot Taylor Diagram

2007-01-07 Thread Linda Smith
 Dear All,

Are there any existing scripts to plot Taylor Diagram (definition see
http://www.ipsl.jussieu.fr/~jmesce/Taylor_diagram/taylor_diagram_definition.html
) ? Thanks a lot!

Linda

[[alternative HTML version deleted]]

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MDS in 3D

2007-01-07 Thread Prof Brian Ripley
On Mon, 8 Jan 2007, Atte Tenkanen wrote:

 I have tried to develop multidimensional scaling for 3D space using PCA 
 without success, yet;-) Is there some application ready in R?

Yes.

stats has cmdscale. MASS has sammon and isoMDS.  All three have a 'k'
argument to determine the output dimension.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] MDS in 3D

2007-01-07 Thread hadley wickham
 I found xgobi, but when I try to run example I get some command not found 
 -errors.

Try the more moden ggobi and ggvis, http://www.ggobi.org.  Installing
on the mac is still a bit of a trial, but we hope to have an installer
soon.

Hadley

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.