Re: [R] merge large number of raster objects

2011-06-10 Thread Henrik Bengtsson
...and ?mget for retrieving multiple objects as a list.  /Henrik

On Thu, Jun 9, 2011 at 10:32 PM, Henrik Bengtsson h...@biostat.ucsf.edu wrote:
 See ?get and ?do.call.  That should be enough.

 ?Reduce may be an alternative for do.call(), but could also be less
 memory efficient.

 My $.02

 /Henrik

 On Thu, Jun 9, 2011 at 7:57 PM, Ben Zaitchik zaitc...@jhu.edu wrote:
 Hello,

 I have a large number of raster objects in memory, with names RH100, RH101,
 RH102, etc. (myobjects - ls(pattern='^RH')).
 These rasters are sections of a continuous map, and I would like to combine
 them using the RasterObject merge tool in package 'raster.'

 Merge expects an input list of raster objects (outmap-merge(x, y, ...),
 where x, y, and ... are raster objects).  I can run the command successfully
 if I type in every raster object that I want to merge, but this is
 impractical for the large number of objects I'm combining.

 I would like to apply merge to a list of object names defined using some
 kind of wildcard-based list command, but I'm struggling to find the right
 data types in R.

 Is there some way to convert a vector of strings (e.g.,
 as.vector(ls(pattern='^RH'))) to a vector of object names (as.names??) that
 could be specified as the input to the merge function?  What I'd really like
 to do is something like outmap-merge(myobjects[1:40]), in order to merge
 the 40 raster objects, but I recognize that it might not be so simple.

 Advance apologies if this is already dealt with on the help list . . . I've
 gone in circles on wildcard, lapply, and names threads and haven't managed
 to get anything to work.

 Thank you,
 Ben

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to assess the accuracy of fitted logistic regression using glm

2011-06-10 Thread Xiaobo Gu
Hi Professor Brian,

Thanks for your reply.

I think there are many statisticians here, and it is somehow R
related, hoping someone can
help me.

I have done a simple test, using a sample csv data which I post if need.

donut - read.csv(file=D:/donut.csv, header = TRUE);
donut[[color]] - as.factor(donut[[color]])
donut[[shape]] - as.factor(donut[[shape]])
donut[[k]] - as.factor(donut[[k]])
donut[[k0]] - as.factor(donut[[k0]])
donut[[bias]] - as.factor(donut[[bias]])

lr - glm(color ~ shape + x + y, family = binomial, data = donut);
summary(lr)

Call:
glm(formula = color ~ shape + x + y, family = binomial, data = donut)

Deviance Residuals:
Min   1Q   Median   3Q  Max
-2.1079  -0.9476   0.5086   0.7518   1.4079

Coefficients:
Estimate Std. Error z value Pr(|z|)
(Intercept)  2.530101.65500   1.529   0.1263
shape22  0.056281.54990   0.036   0.9710
shape23 -0.745681.44813  -0.515   0.6066
shape24 -2.618961.38016  -1.898   0.0578 .
shape25 -2.076481.32818  -1.563   0.1180
x   -0.458851.52863  -0.300   0.7640
y   -0.593111.46999  -0.403   0.6866
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 50.446  on 39  degrees of freedom
Residual deviance: 42.473  on 33  degrees of freedom
AIC: 56.473

Number of Fisher Scoring iterations: 4

In the Coefficients section, is Pr(|z|) the P-value for that
variable, and there
are a few other questions:
1. How to determine the predict power of each variables?
2. How to determine the overall performance of the fitted model, here what's the
difference between and Deviance Residuals and Residual deviance?
3. How to compare Null deviance and Residual deviance?
4. What does AIC mean, and how to use this measure?
5. What does the Signif. codes section mean?

Regards,

Xiaobo Gu



On Mon, Jun 6, 2011 at 9:59 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:
 On Mon, 6 Jun 2011, Xiaobo Gu wrote:

 Hi,

 I am trying glm with family = binomial to do binary logistic
 regression, but how can I assess the accuracy of the fitted model, the
 summary method can print a lot of information about the returned
 object, such as coefficients, because statistics is not my speciality,
 so can you share some rule of thumb to exam the  fitted model from the
 practical perspective.

 It depends entirely on why you did the fit.  People have written whole books
 on assessing the performance of classification procedures such as binary
 logistic regression.  For example, the residual deviance is closely related
 to log-probability scoring: for some purposes that is a good performance
 measure, for others (e.g. when you are going to threshold the predicted
 probabilities) it can be very misleading.

 In short, you need statistical advice, not R advice (the purpose of this
 list).


 Regards,

 Xiaobo Gu

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


 --
 Brian D. Ripley,                  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] how to convert a vector to an upper matrix

2011-06-10 Thread Zhishi Wang
Hi all,

If I have a vector A-1:6, I would like to convert A to an upper
matrix M=[0 1 2 3 ; 0 0 4 5 ; 0 0 0 6; 0 0 0 0].
That is, if the length of the known vector is n, then the order m of
the matrix I'd like to have should satisfy m(m-1)/2=n
How could I do this in R?
I really don't want to do it using a loop. Is there a function in R to do this ?

Thanks!

Best,
Zhishi

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Remove Duplicate numbers from a vector through witing set of commands

2011-06-10 Thread Z B
Hi,

I want to write a set of commands instead of unique(x) to remove duplicate
numbers from a vector.

Would you please help me. I don't know how to compare a vector numbers
together. I am new to R.

Thanks.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Setting up a State Space Model in dlm

2011-06-10 Thread Gavin Simpson
On Tue, 2011-06-07 at 17:24 +0100, Michael Ash wrote:
 This question pertains to setting up a model in the package dlm
 (dynamic linear models,
 http://cran.r-project.org/web/packages/dlm/index.html

The author of the dlm package has just published a paper on state space
models in R including details on setting up dlm:

http://www.jstatsoft.org/v41/i04

That might help with your question - I haven't seen a reply on list, but
am unable to help answer it either.

HTH

G

 I have read both the vignette and An R Package for Dynamic Linear
 Models (http://www.jstatsoft.org/v36/i12/paper), both of which are
 very helpful.  There is also some discussion at
 https://stat.ethz.ch/pipermail/r-help/2009-May/198463.html
 
 I have what I think is a relatively straightforward state-space model
 but am unable to translate it into the terms of dlm.   It would be
 very helpful to get a basic dlm setup for the problem and I would
 guess that I can then modify it with more lags, etc., etc.
 
 The main equation is
 pi[t] = a * pi[t-1] + b*(U[t] - UN[t]) + e[t]
 
 (see 
 http://chart.apis.google.com/chart?cht=txchl=%5Cpi_t=a%5Cpi_{t-1}%2bb%28U_t-U^N_{t}%29%2Be_t
 for a pretty version)
 
 with pi and U observed, a and b fixed coefficients, and e a
 well-behaved error term (gaussian, say, variance unknown).
 The object of interest is the unobserved and time-varying component UN
 which evolves according to
 
 UN[t] = UN[t-1] + w[t]
 
 (see 
 http://chart.apis.google.com/chart?cht=txchl=U%5EN_%7Bt%7D%20=%20U%5EN_%7Bt-1%7D%20%2B%20%5Cepsilon_t
 for a pretty version)
 that is, a random walk with well-behaved error term with var(w) known
 (or assumed).
 
 I'm interested in the estimates of a and b and also in estimating the
 time series of UN.
 
 Note that the term b*(U[t] - UN[t]) makes this a nonlinear model.
 
 Below is code that does not work as expected.  I see the model as
 having four parameters, a, b, var(e), and UN.  (Or do I have a
 parameter UN[t] for every period?)
 
 I do not fully understand the dlm syntax.  Is FF specified properly?
 What should X look like?  How does m0 relate to parm()?
 
 
 I would be grateful if someone would be willing to glance at the code.
  Thanks. Michael
 
 library(quantmod)
 library(dlm)
 ## Get and organize the data
 getSymbols(UNRATE,src=FRED)  ## Unemployment rate
 getSymbols(GDPDEF,src=FRED)  ## Quarterly GDP Implicit Price Deflator
 u - aggregate(UNRATE,as.yearqtr,mean)
 gdpdef - aggregate(GDPDEF,as.yearqtr,mean)
 pi - diff(log(gdpdef))*400
 pilag - lag(pi,-1)
 tvnairu - cbind(pi,pilag,u)
 tvnairu.df - subset(data.frame(tvnairu), !is.na(pi)  !is.na(u) 
 !is.na(pilag))
 
 
 ## First attempt
 buildNAIRU - function(x) {
   modNAIRU - dlm(FF=t(matrix(c(1,1,1,0))),
   GG=diag(4),
   W=matrix(c(0,0,0,0,  0,0,0,0,  0,0,0.04,0,  0,0,0,0),4,4),
   V=exp(x[4]), m0=rep(0,4), C0=diag(1e07,4),
   JFF = t(matrix(c(1,1,0,0))),
   X=cbind( tvnairu.df$pilag, tvnairu.df$u))
   return(modNAIRU)
 }
 
 (fitNAIRU - dlmMLE(tvnairu.df$pi, parm=c(0,0,0,0) , build=buildNAIRU,
 hessian=TRUE, control=list(maxit=500)))
 (dlmNAIRU - buildNAIRU(fitNAIRU$par))
 
 
 ## Second attempt
 buildNAIRU - function(x) {
   modNAIRU - dlm(FF=t(matrix(c(1,1,0,0))),
   GG=diag(4),
   W=matrix(c(0,0,0,0, 0,0,0,0, 0,0,0.04,0, 0,0,0,0 ),4,4),
   V=exp(x[4]), m0=c(0.7,-0.3,5,1), C0=diag(100,4),
   JFF = t(matrix(c(1,1,0,0))),
   X=cbind( tvnairu.df$pilag, tvnairu.df$u - x[3] ))
   return(modNAIRU)
 }
 
 (fitNAIRU - dlmMLE(tvnairu.df$pi, parm=c(0,0,0,0) , build=buildNAIRU,
 hessian=TRUE, control=list(maxit=500)))
 (dlmNAIRU - buildNAIRU(fitNAIRU$par))
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help with plotting plsr loadings

2011-06-10 Thread Bjørn-Helge Mevik
Amit Patel amitrh...@yahoo.co.uk writes:

plot(BHPLS1, loadings, comps = 1:2, legendpos = topleft, labels = 
numbers, 
xlab = nm)

 Error in loadingplot.default(x, ...) : 
   Could not convert variable names to numbers.


  str(BHPLS1_Loadings)
  loadings [1:8892, 1:60] -0.00717 0.00414 0.02611 0.00468 -0.00676 ...
  - attr(*, dimnames)=List of 2
   ..$ : chr [1:8892] PCIList1 PCIList2 PCIList3 PCIList4 ...
   ..$ : chr [1:60] Comp 1 Comp 2 Comp 3 Comp 4 ...
  - attr(*, explvar)= Named num [1:60] 2.67 4.14 4.41 3.55 2.59 ...
   ..- attr(*, names)= chr [1:60] Comp 1 Comp 2 Comp 3 Comp 4 ...

 Can anyone see the problem??

By using `labels = numbers', you are asking the plot function to
convert the names PCIList1 PCIList2 PCIList3 PCIList4 ... to
numbers.  It doesn't know how to do that.  (See ?loadingplot for the
details.)

Your options are using `label = names', provide your own labels, not
using the `labels' argument, or converting the names manually.

-- 
Bjørn-Helge Mevik

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] how to convert a vector to an upper matrix

2011-06-10 Thread Berend Hasselman

Zhishi Wang wrote:
 
 Hi all,
 
 If I have a vector A-1:6, I would like to convert A to an upper
 matrix M=[0 1 2 3 ; 0 0 4 5 ; 0 0 0 6; 0 0 0 0].
 That is, if the length of the known vector is n, then the order m of
 the matrix I'd like to have should satisfy m(m-1)/2=n
 How could I do this in R?
 I really don't want to do it using a loop. Is there a function in R to do
 this ?
 

?upper.tri
?uniroot

A - 1:6

n - length(A)
m - as.integer(uniroot(function(x) {x*(x-1)- 2*n}, c(1, n))$root)

U - matrix(0,nrow=m,ncol=m)
U[upper.tri(U)] - A
U


Berend


--
View this message in context: 
http://r.789695.n4.nabble.com/how-to-convert-a-vector-to-an-upper-matrix-tp3587715p3587810.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scatterplot3d - help assign colors based on multiple conditions

2011-06-10 Thread Uwe Ligges



On 09.06.2011 22:40, Karthik Kota wrote:

Thanks a lot! This is very helpful.

If I have to extend this to one more condition say assign blue if both the 
corresponding labels have _Tongue_dorsum, is there a straight forward function. Doing 
something like below is overriding the colors assigned by the first statement.

col- ifelse(grepl(_Anterior_nares, xlabels)  grepl(_Anterior_nares, ylabels), 
red, black)
col- ifelse(grepl(_Tongue_dorsum, xlabels)  grepl(_Tongue_dorsum, ylabels), blue, 
black)



In that case go for it with your original way by, e.g., replacing

col[grepl(_Anterior_nares, xlabels)   grepl(_Anterior_nares, 
ylabels)] - mycols[4]


Uwe



Again, your time is really appreciated.
Karthik


On Jun 9, 2011, at 1:56 PM, Uwe Ligges wrote:




On 09.06.2011 16:51, Karthik Kota wrote:

Hi

I am relatively new to R and am trying to figure out to plot 3d scatter plot 
using defined colors based on x-axis and y-axis values.  Right now in the code 
below, I assign colors based on certain values in the names of the x-axis.  Now 
if I want to extend the condition to assign a color based on the names of both 
x-axis and y-axis values, what should I be doing? Any help or ideas would be 
greatly appreciated.

For e.g. in my 3 column matrix below, if I want to assign red to all the values whose 
first column and second column contain Anterior_nares and  assign black to any other 
combination.



Both question and answer are not really scatterplot3d related: You probably want

col- ifelse(grepl(_Anterior_nares, xlabels)  grepl(_Anterior_nares, ylabels), 
red, black)

Best,
Uwe Ligges




Thanks!
Karthik

library(scatterplot3d)

chd1=read.table(file=test.out, header=F, sep=\t)
col=as.vector(chd1[,1])
xlabels=as.vector(chd1[,1])
ylabels=as.vector(chd1[,2])

mycols-c(red,blue,green,chocolate,orange, brown)
col[grep(_Stool, xlabels) ]-mycols[1]
#col[grep(_Stool, xlabels)   grep(_Stool, ylabels) ]-mycols[1]
col[grep(_Tongue_dorsum, xlabels) ]-mycols[2]
col[grep(_Posterior_fornix, xlabels) ]-mycols[3]
col[grep(_Anterior_nares, xlabels) ]-mycols[4]
col[grep(_Buccal_mucosa, xlabels) ]-mycols[5]
col[grep(_Supragingival_plaque, xlabels) ]-mycols[6]


png(file=3dplot_test.png, w=700,h=700)

scatterplot3d(chd1[, 1], chd1[, 2], chd1[, 3], main=test, xlab=sample, ylab=sample, 
zlab=kmers, color=col,type=p)
dev.off ()


my test.out matrix looks something like this:

A011132_Anterior_nares  A011263_Anterior_nares  50130
A011132_Anterior_nares  A011397_Stool   34748
A011132_Anterior_nares  A012291_Tongue  40859
A011132_Anterior_nares  A012663_Buccal_mucosa   76213
A011132_Anterior_nares  A013155_Anterior_nares  36841
A011132_Anterior_nares  A013269_Anterior_nares  45619
A011132_Anterior_nares  A013637_Anterior_nares  56995
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge large number of raster objects

2011-06-10 Thread Rolf Turner

On 10/06/11 14:57, Ben Zaitchik wrote:

Hello,

I have a large number of raster objects in memory, with names RH100, 
RH101, RH102, etc. (myobjects - ls(pattern='^RH')).
These rasters are sections of a continuous map, and I would like to 
combine them using the RasterObject merge tool in package 'raster.'


Merge expects an input list of raster objects (outmap-merge(x, y, 
...), where x, y, and ... are raster objects).  I can run the command 
successfully if I type in every raster object that I want to merge, 
but this is impractical for the large number of objects I'm combining.


I would like to apply merge to a list of object names defined using 
some kind of wildcard-based list command, but I'm struggling to find 
the right data types in R.


Is there some way to convert a vector of strings (e.g., 
as.vector(ls(pattern='^RH'))) to a vector of object names (as.names??) 
that could be specified as the input to the merge function?  What I'd 
really like to do is something like outmap-merge(myobjects[1:40]), in 
order to merge the 40 raster objects, but I recognize that it might 
not be so simple.


Advance apologies if this is already dealt with on the help list . . . 
I've gone in circles on wildcard, lapply, and names threads and 
haven't managed to get anything to work.


I think that

outmap - do.call(merge,myobjects)

should work for you.

cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scatterplot3d - help assign colors based on multiple conditions

2011-06-10 Thread Jim Lemon

On 06/10/2011 06:40 AM, Karthik Kota wrote:

Thanks a lot! This is very helpful.

If I have to extend this to one more condition say assign blue if both the 
corresponding labels have _Tongue_dorsum, is there a straight forward function. Doing 
something like below is overriding the colors assigned by the first statement.

col- ifelse(grepl(_Anterior_nares, xlabels)  grepl(_Anterior_nares, ylabels), 
red, black)
col- ifelse(grepl(_Tongue_dorsum, xlabels)  grepl(_Tongue_dorsum, ylabels), blue, 
black)


Hi Karthik,
Your problem is that you are assigning all of the values to col twice. 
If you reverse the order of the statements you will get the red/black 
color set. Try this:


col-rep(black,dim(cdh1)[1])
col[grepl(_Anterior_nares, xlabels) 
 grepl(_Anterior_nares, ylabels)]-red
col[grepl(_Tongue_dorsum, xlabels) 
 grepl(_Tongue_dorsum, ylabels)]-blue

If your two conditions specify disjunct sets, that is, no case satisfies 
both conditions, you should have the correct vector of colors.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] lattice plot query

2011-06-10 Thread Vijayan Padmanabhan
Dear Dennis
Thanks for the comments, I read thru the script  that was originally 
created and gathered that the issue was because of the use of index.cond 
in the xyplot function.
I modified the script to drop the index.cond parameter in the function and 
now it works to give the subject order as indicated by ordered levels of 
the Subj factor.
Following this however, I have another requirement for which i am seeking 
the help of R Group again.

I am providing the illsutrative data and the script to run the lattice 
plot.
What i require is to get the individual panels in each lattice plot carry 
a red or a blue background when the slope of the fitted line is -ve or 
+ve  respectively.
Woule be greatful to receive help on how this could be acheived.

library(lattice)
MyData-structure(list(Subj = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c(S1, S2, S3, 
S4), class = factor), Product = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(a, 
b), class = factor), Arm = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(L, 
R), class = factor), Attribute = structure(c(1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c(A, 
B), class = factor), Time = structure(c(1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), .Label = c(T0, 
T1), class = factor), value = c(44, 33, 31, 34, 36, 33, 38, 
32, 33, 35, 44, 38, 33, 36, 37, 34, 32, 35, 37, 33, 35, 41, 39, 
40, 35, 32, 32, 44, 42, 30, 43, 38, 13, 13, 14, 12, 15, 11, 15, 
14, 10, 15, 11, 15, 15, 10, 13, 13, 15, 11, 11, 12, 14, 10, 13, 
13, 10, 12, 10, 13, 13, 15, 13, 10)), .Names = c(Subj, Product, 
Arm, Attribute, Time, value), row.names = c(NA, 64L), class = 
data.frame, na.action = structure(65:352, .Names = c(65, 
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 
99, 100, 101, 102, 103, 104, 105, 106, 107, 
108, 109, 110, 111, 112, 113, 114, 115, 116, 
117, 118, 119, 120, 121, 122, 123, 124, 125, 
126, 127, 128, 129, 130, 131, 132, 133, 134, 
135, 136, 137, 138, 139, 140, 141, 142, 143, 
144, 145, 146, 147, 148, 149, 150, 151, 152, 
153, 154, 155, 156, 157, 158, 159, 160, 161, 
162, 163, 164, 165, 166, 167, 168, 169, 170, 
171, 172, 173, 174, 175, 176, 177, 178, 179, 
180, 181, 182, 183, 184, 185, 186, 187, 188, 
189, 190, 191, 192, 193, 194, 195, 196, 197, 
198, 199, 200, 201, 202, 203, 204, 205, 206, 
207, 208, 209, 210, 211, 212, 213, 214, 215, 
216, 217, 218, 219, 220, 221, 222, 223, 224, 
225, 226, 227, 228, 229, 230, 231, 232, 233, 
234, 235, 236, 237, 238, 239, 240, 241, 242, 
243, 244, 245, 246, 247, 248, 249, 250, 251, 
252, 253, 254, 255, 256, 257, 258, 259, 260, 
261, 262, 263, 264, 265, 266, 267, 268, 269, 
270, 271, 272, 273, 274, 275, 276, 277, 278, 
279, 280, 281, 282, 283, 284, 285, 286, 287, 
288, 289, 290, 291, 292, 293, 294, 295, 296, 
297, 298, 299, 300, 301, 302, 303, 304, 305, 
306, 307, 308, 309, 310, 311, 312, 313, 314, 
315, 316, 317, 318, 319, 320, 321, 322, 323, 
324, 325, 326, 327, 328, 329, 330, 331, 332, 
333, 334, 335, 336, 337, 338, 339, 340, 341, 
342, 343, 344, 345, 346, 347, 348, 349, 350, 
351, 352), class = omit))


MyData$Product-factor(MyData$Product, levels = c(a,b))
MyData$Time = factor(MyData$Time,levels=c(T0,T1))
MyData$Subj - factor(MyData$Subj, levels = c(paste('S', 1:4, sep = '')))
MyData-MyData
Attributeindata-levels(MyData$Attribute)
productindata-levels(MyData$Product)
MyData-MyData[ order(MyData$Subj) ,]

pdf(file=D:/latticeplotsbysubject.pdf)
for(i in 1:length(Attributeindata))
{
data-subset(MyData,Attribute==Attributeindata[i])
for(j in 1:length(productindata))
{
data2-subset(data,Product==productindata[j])
print(xyplot(value ~ Time | Subj, data=data2,layout = c(6,3), aspect = 
xy,as.table = TRUE, type = c(g, p, r),drop.unused.levels=TRUE,ylab 
= paste(value of,Attributeindata[i],for 

Re: [R] Calculating a mean based on a factor range

2011-06-10 Thread Jim Lemon

On 06/10/2011 08:10 AM, Sam Albers wrote:

Hello all,

I have been using an instrument that collects a temperature profile of a
water column. The instrument records the temperature and depth any time it
takes a reading. I was sampling many times at discrete depth rather than a
complete profile of the water column (e.g. interested in 5m, 10m and 20m
depth position only). The issue was that these measurement were taken with
the instrument hanging off the side of a boat so a big enough wave moved the
instrument such that it recorded a slightly different depth. For my
purposes, however, this difference is negligible and I wish to consider all
those different readings at close depth as a single depth. So for example:



library(ggplot2)

eg- read.csv(http://dl.dropbox.com/u/1574243/example_data.csv;,

header=TRUE, sep=,)



## Calculating an average value from all the readings for each depth reading

eg.avg- ddply(eg, c(site, depth), function(df)

return(c(temp=mean(df$temperature),
+
num_samp=length(df$temperature)
+  )))



## An example of my problem

eg.avg[eg.avg$num_samp10  eg.avg==Station 3,]

  sitedepth temp num_samp
154 Station 3  1.09000 4.073667   30
159 Station 3  2.49744 3.950972   72
175 Station 3  7.96332 3.903188   69
208 Station 3 19.37708 4.066393   61
209 Station 3 19.54096 4.025385   13

## So here you will notice that record 208 and 209, by my criteria, should
be considered a sample at the same depth and lumped together. Yet I can't
figure out a way to coerce R to calculate a mean value of temperature based
on a threshold range depth (say +/- 0.25). Generally speaking this can be
said to be calculating a mean (temperature) based on a factor (depth) range.


Hi Sam,
I suspect that you have a set of nominal depths in which you are 
interested. If so, I would suggest using cut as Peter suggested, but 
specifying the breaks like this:


depth_cat-cut(depth,breaks=c(0,2.5,7.5,15,30),
 labels=c(0,5,10,25))

So that the values nearest to your nominal depths will be aggregated 
into the correct categories.


Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Remove Duplicate numbers from a vector through witing set of commands

2011-06-10 Thread Jim Holtman
?duplicated

Sent from my iPad

On Jun 10, 2011, at 1:22, Z B z20@googlemail.com wrote:

 Hi,
 
 I want to write a set of commands instead of unique(x) to remove duplicate
 numbers from a vector.
 
 Would you please help me. I don't know how to compare a vector numbers
 together. I am new to R.
 
 Thanks.
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] error with geomap in googleVis

2011-06-10 Thread Gesmann, Markus
Hi Mike,

I believe you are trying to put two charts on the same page.
The demo(googleVis) provides you with an example for this.
So applying the same approach to your data sets would require:

library(googleVis)
df-data.frame(foo=c(Brazil,Canada), bar=c(123,456))

map1-gvisGeoMap(df,locationvar='foo',numvar='bar')
m-gvisMotionChart(Fruits,idvar=Fruit,timevar=Year)

## Build a new gvis object based on the exsiting ones
ComboChart - structure(list(type='GeoMapMotionChart',
chartid='myComboChart', 
html=list(header=map1$html$header,
  chart=list(map1$html$chart, # Geo Map
   m$html$chart),   # Motion Chart
  caption=Combination of various googleVis outputs BR
/, 
  footer=map1$html$footer)),  
  class=c(gvis, list)
)
plot(ComboChart)


I hope this helps.

Kind regards,

Markus

-Original Message-
From: Mike Marchywka [mailto:marchy...@hotmail.com] 
Sent: 09 June 2011 19:12
To: Gesmann, Markus; mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
Subject: RE: [R] error with geomap in googleVis



I still got blanks with Firefox with the two examples below, I put html
up here if you want to look at it,

http://98.129.232.232/xxx.html

I just downloaded googlevis from mirror 68 and it claimed it was 0.2.5 (
I thought, but maybe I should check again). 


install.packages(googleVis,dep=T)
library(googleVis)
df-data.frame(foo=c(Brazil,Canada), bar=c(123,456))
map1-gvisGeoMap(df,locationvar='foo',numvar='bar')

cat(map1$html$header,filename=xxx.html,append=F)
cat(map1$html$header,file=xxx.html,append=F)
cat(map1$html$chart,file=xxx.html,append=T)
cat(map1$html$caption,file=xxx.html,append=T)
cat(map1$html$footer,file=xxx.html,append=T)
m-gvisMotionChart(Fruits,idvar=Fruit,timevar=Year)
str(m)
cat(m$html$header,file=xxx.html,append=F)
cat(m$html$chart,file=xxx.html,append=T)
cat(m$html$caption,file=xxx.html,append=T)
cat(m$html$footer,file=xxx.html,append=T)




 Subject: RE: [R] error with geomap in googleVis
 Date: Thu, 9 Jun 2011 14:06:22 +0100
 From: markus.gesm...@lloyds.com
 To: marchy...@hotmail.com; mjphi...@tpg.com.au; 
 r-h...@stat.math.ethz.ch

 Hi all,

 This issue occurs with googleVis 0.2.4 and RJSONIO  0.7.1.
 Version 0.2.5 of the googleVis package has been uploaded to CRAN two 
 days ago and should have fixed this issue.
 Can you please try to update to that version, e.g. from 
 http://cran.r-project.org/web/packages/googleVis/

 Further version 0.2.5 provides new interfaces to more interactive 
 Google
 charts:
 - gvisLineChart
 - gvisBarChart
 - gvisColumnChart
 - gvisAreaChart
 - gvisScatterChart
 - gvisPieChart
 - gvisGauge
 - gvisOrgChart
 - gvisIntensityMap

 Additionally a new demo 'AnimatedGeoMap' has been added which shows 
 how a Geo Map can be animated with additional JavaScript. Thanks to 
 Manoj Ananthapadmanabhan and Anand Ramalingam, who provided the idea 
 and initial code.

 For more information and examples see:
 http://code.google.com/p/google-motion-charts-with-r/

 I hope this helps

 Markus

 -Original Message-
 From: r-help-boun...@r-project.org 
 [mailto:r-help-boun...@r-project.org]
 On Behalf Of Mike Marchywka
 Sent: 09 June 2011 11:19
 To: mjphi...@tpg.com.au; r-h...@stat.math.ethz.ch
 Subject: Re: [R] error with geomap in googleVis



 
  To: r-h...@stat.math.ethz.ch
  From: mjphi...@tpg.com.au
  Date: Wed, 8 Jun 2011 10:14:01 +
  Subject: Re: [R] error with geomap in googleVis
 
  SNV Krishna primps.com.sg writes:
 
  
   Hi All,
  
   I am unable to get the plot geomap in googleVis package. data is 
   as follows
  
head(index.ret)
   country ytd
   1 Argentina -10.18
   2 Australia -3.42
   3 Austria -2.70
   4 Belgium 1.94
   5 Brazil -7.16
   6 Canada 0.56
  
map1 = gvisGeoMap(index.ret,locationvar = 'country', numvar =
'ytd')
plot(map1)
  
   But it just displays a blank page, showing an error symbol at the 
   right bottom corner. I tried demo(googleVis), it also had a 
   similar problem. The demo showed all other plots/maps except for 
   those geomaps. Could any one please hint me what/where could be 
   the problem? Many thanks for the idea and support.


 I had never used this until yesterday but it seems to generate html.
 I didn't manage to get a chart to display but if you are familiar with

 this package and html perhaps you could look at map1$html and see if 
 anything is obvious. One great thing about html/js is that it is human

 readable and you can integrate it well with other page material 
 without much in the way of special tools.








  
   Regards,
  
   SNV Krishna
  
   [[alternative HTML version deleted]]
  
  
 
  Hi All,
 
  I have also encountered this problem. I have tested the problem in 
  Windows XP

 3.0. I
  have latest java and flash and I have tried 

Re: [R] error with geomap in googleVis

2011-06-10 Thread Mike Marchywka




 Subject: RE: [R] error with geomap in googleVis
 Date: Fri, 10 Jun 2011 11:06:47 +0100
 From: markus.gesm...@lloyds.com
 To: marchy...@hotmail.com; r-h...@stat.math.ethz.ch

 Hi Mike,

 I believe you are trying to put two charts on the same page.
 The demo(googleVis) provides you with an example for this.
 So applying the same approach to your data sets would require:


Thanks, I thought however I tried to different examples the first
being my own which failed and  then the Fruits example as a second
test. I was lazy and used install package from probably wrong mirror
as in the past on 'dohs I had problems with CMD INSTALL. I reloaded
as per your suggestion and tried first example. The output did
change but it is still lacking a chart. Do I need API key or something?
Ive never used google visualization before and was just reacting to OP
as I had wanted to try it. Process is below, final result may
be here, http://98.129.232.232/xxx.html , but I will be varying as
I get time. Thanks again.

--2011-06-10 05:17:47--  http://cran.r-project.org/src/contrib/googleVis_0.2.5.t
ar.gz
Resolving cran.r-project.org... 137.208.57.37
Connecting to cran.r-project.org|137.208.57.37|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.1 200 OK
  Date: Fri, 10 Jun 2011 10:17:47 GMT
  Server: Apache/2.2.9 (Debian)
  Last-Modified: Tue, 07 Jun 2011 06:35:56 GMT
  ETag: a83451-b36e9-4a5196ea64b00
  Accept-Ranges: bytes
  Content-Length: 734953
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Content-Type: application/x-gzip
Length: 734953 (718K) [application/x-gzip]
Saving to: `googleVis_0.2.5.tar.gz'

100%[==] 734,953  527K/s   in 1.4s

2011-06-10 05:17:49 (527 KB/s) - `googleVis_0.2.5.tar.gz' saved [734953/734953]

[marchywka@351915-www1 downloads]$ R CMD INSTALL googleVis_0.2.5.tar.gz
* installing to library `/usr/local/lib64/R/library'
* installing *source* package `googleVis' ...
** R
** data
**  moving datasets to lazyload DB
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded

* DONE (googleVis)
[marchywka@351915-www1 downloads]$ R

R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

 library(googleVis)
df-data.frLoading required package: RJSONIO
ame(foo=c(BrazilTo suppress the following message use the statement:
suppressPackageStartupMessages(library(googleVis))

Welcome to googleVis version 0.2.5

Type ?googleVis to access the overall documentation and
vignette('googleVis') for the package vignette.
You can execute the demo of the package via: demo(googleVis)

More information is available on the googleVis project web-site:
http://code.google.com/p/google-motion-charts-with-r/

Please read also the Google Visualisation API Terms of Use:
http://code.google.com/apis/visualization/terms.html

Feel free to send us an email rvisualisat...@gmail.com
if you would like to be keept informed of new versions,
or if you have any feedback, ideas, suggestions or would
like to collaborate.

,C df-data.frame(foo=c(Brazil,Canada), bar=c(123,456))

 map1-gvisGeoMap(df,locationvar='foo',numvar='bar')
 cat(map1$html)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 1 (type 'list') cannot be handled by 'cat'
 cat(map1$html$header)
!DOCTYPE html PUBLIC -//W3C//DTD XHTML 1.0 Strict//EN
    http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd;
html xmlns=http://www.w3.org/1999/xhtml;
head
  titleGeoMapID23b504e5/title
  meta http-equiv=content-type content=text/html;charset=utf-8 /
  style type=text/css
    body {
  color: #44;
  font-family: Arial,Helvetica,sans-serif;
  font-size: 75%;
    }
    a {
  color: #4D87C7;
  text-decoration: none;
    }
  /style
/head
body
 cat(map1$html$header,file=xxx.html,appeand=F)
 cat(map1$html$chart,file=xxx.html,appeand=t)
Error in cat(list(...), file, sep, fill, labels, append) :
  argument 2 (type 'closure') cannot be handled by 'cat'
 str(map1$html)
List of 4
 $ header : chr !DOCTYPE html PUBLIC \-//W3C//DTD XHTML 1.0 Strict//EN\\n
 \http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\;\nht| __truncated__

 $ chart  : Named chr [1:7] !-- GeoMap generated in R 2.13.0 by googleVis 

Re: [R] Logistic Regression

2011-06-10 Thread Frank Harrell
Which statistical principles are you invoking on which to base such analyses?
Frank

Sergio Della Franca wrote:
 
 Dear R-Helpers,
 
 I want to perform a logistic regression on my dataset (y).
 
 I used the following code:
 
 logistic-glm(formula=interest_variable~.,family = binomial(link =
 logit),
 data = y)
 
 
 This run correctly.
 Then i want to develop the logistic regression with three different
 method:
 -forward
 -backward
 -stepwise
 
 I used these procedure:
 forward-step(logistica,direction=forward)
 backward-step(logistica,direction=backward)
 stepwise-step(logistica,direction=both)
 
 Even these run correctly, but i obtained the same results with the three
 different procedures.
 
 Then I tought i made some mistakes.
 
 My question is:
 
 Is correct what i did?
 Is correct that three different methods return the same results?
 
 If i made some mistakes, what is the correct method to correctly perform
 the
 three different logistics regression?
 
 
 Thank you in advance.
 
 
 Sergio Della Franca.
 
   [[alternative HTML version deleted]]
 
 __
 r-h...@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Logistic-Regression-tp821301p3588135.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] k-nn hierarchical clustering

2011-06-10 Thread Carlos Ortega
Hi,

Yes, check for function ann in package yaImpute.

Regards,
Carlos Ortega
www.qualityexcellence.es


On Thu, Jun 9, 2011 at 1:51 PM, Christian Hennig chr...@stats.ucl.ac.ukwrote:

 Hi there,

 is there any R-function for k-nearest neighbour agglomerative hierarchical
 clustering?
 By this I mean standard agglomerative hierarchical clustering as in hclust
 or agnes, but with the k-nearest neighbour distance between clusters used on
 the higher levels where there are at least k1 distances between two
 clusters (single linkage is 1-nearest neighbour clustering)?

 Best regards,
 Christian

 *** --- ***
 Christian Hennig
 University College London, Department of Statistical Science
 Gower St., London WC1E 6BT, phone +44 207 679 1698
 chr...@stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Odp: what is the mistake?? the coding still not function. no result display

2011-06-10 Thread Petr PIKAL
Hi
 
 [R] what is the mistake?? the coding still not function. no result 
display
 
 #lda.r
  #
  #Author:Amsha Nahid, Jairus Bowne, Gerard Murray
  #Purpose:Perform Linear Discriminant Analysis (LDA)
  #
  #Input:Data matrix as specified in Data-matrix-format.pdf
  #Output:LDA plot
  #
  #Notes:Missing values (if any) are replaced by the half of the 
lowest
  #  value in the entire data matrix.
  
  
  #
  #Load necessary libraries, and install them if they are missing
  #
  tryCatch(library(MASS), error=function(err)
 + # if this produces an error:
 + install.packages(MASS,repos=http://cran.ms.unimelb.edu.au/;))
  
  #
  #Prepare the data matrix
  #
  # Read in the .csv file
  data-read.csv(C:/Users/nadya/Desktop/praktikal 
UTM/TASK2/new(40data)S2 100 
 EMS EPI 300-399.csv, sep=,, row.names=1, header=TRUE)
  # Get groups information
  groups-data[,1]
  # Remove groups for data processing
  lda_data-data[,-1]
  # Replace any missing values (see Notes)
  lda_data[is.na(lda_data)]-0.5*(min(lda_data,na.rm=TRUE))
  
  #
  #Perform the LDA
  #
  lda_result-lda(lda_data,groups)
 Error in lda.default(x, grouping, ...) : 
   variables  1  3  5  8 10 15 17 20 27 29 34 appear to be constant 
within groups

The problem seems to be with your data, which in some groups are constant.

Try 
split(lda_data, groups) 

or

sapply(split(lda_data, groups), length)

and see your result.

Regards
Petr
  
  #
  #Generate the figures (on screen)
  #
  #Image generation - function definition
  pic_onscr-function(matrix, title, cex_val=1)
 + {x11()
 + par(mgp=c(5,2,0),   # axis margins
 + # (title, labels, 
line)
 + mar=c(7,4,4,2), # plot margins 
(b,l,t,r)
 + las=1)  # horizontal labels
 + plot(matrix,# data to plot
 + cex=cex_val,# font size
 + dimen=2 # dimensions to plot
 + )
 + title(main=title)   # title of plot
 + }
  # Plot LDA scores with sample names
  pic_onscr(lda_result,Linear Discriminant Analysis)
 Error in plot(matrix, cex = cex_val, dimen = 2) : 
   error in evaluating the argument 'x' in selecting a method for 
function 
 'plot': Error: object 'lda_result' not found
  # For plotting with larger font size, use a different value of cex:
  # pic_onscr(lda_result, LDA Plot, dimen=2, cex=3)
  
  #
  #Generate figures as image files
  #
  #(Uncomment blocks as necessary)
  
  # jpeg #
  # pic_jpg-function(filename, matrix, title, cex_val=1)
  # {# Start jpeg device with basic settings
  # jpeg(filename,
  # quality=100,# image quality 
(percent)
  # bg=white, # background colour
  # res=300,# image resolution 
(dpi)
  # units=in, width=8.3, height=5.8   # image dimensions 
(inches)
  # )
  # par(mgp=c(5,2,0),   # axis margins 
  # #  (title, labels, 
line)
  # mar=c(7,4,4,2), # plot margins 
(b,l,t,r)
  # las=1   # horizontal labels
  # )
  # # Draw the plot
  # plot(matrix,# data to plot
  # cex=cex_val,# font size
  # dimen=2 # dimensions to plot
  # )
  # title(main=title)   # title of plot
  # 
  # dev.off()
  # }
  # pic_jpg(LDA.jpg, lda_result, Linear Discriminant Analysis)
  # end jpeg #
  
  # png #
  # pic_png-function(filename, matrix, title, cex_val=1)
  # {# Start png device with basic settings
  # png(filename,
  # bg=white, # background colour
  # res=300,# image resolution 
(dpi)
  # units=in, width=8.3, height=5.8   # image dimensions 
(inches)
  # )
  # par(mgp=c(5,2,0),   # axis margins 
  # #  (title, labels, 
line)
  # mar=c(7,4,4,2), # plot margins 
(b,l,t,r)
  # las=1   # horizontal labels
  # )
  # # Draw the plot
  # plot(matrix,# data to plot
  # cex=cex_val,# font size
  # dimen=2 # dimensions to plot
  # )
  # title(main=title)   # title of plot
  # 
  # dev.off()
  # }
  # pic_png(LDA.png, lda_result, Linear 

[R] Odp: How to subset based on column name that is a number ?

2011-06-10 Thread Petr PIKAL
Hi
 [R] How to subset based on column name that is a number ?
 
 Hi,
 
 I have a data frame with column names 1, 2, 3, ... and I'd like to 
extract 
 a subset based on the values in the first column.  None of the methods I 
tried 
 worked (below).
 
 
 x - subset(dframe, 1 = = My Text)
 x - subset(dframe, 1 = = My Text)
 x - subset(dframe, names(dframe)[1] = = My Text)
 Q - dframe[1 = = FY11_Q4,]
 Q - dframe['1'==FY11_Q4,]
 Q - dframe[names(dframe)[1]==FY11_Q4,]
 

Did you try

dframe[dframe[,1]==My Text,]

Regards
Petr


 
 Might anyone have a suggestion?
 
 Many thanks,
 Mauricio
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] merge large number of raster objects

2011-06-10 Thread Ben Zaitchik

Thank you Henrik and Rolf.  My migraine just disappeared. : )

 myobjects = ls(pattern='^RH')
 dataobj = mget(myTemp,.GlobalEnv)
 map = Reduce(merge,dataobj)

did the trick.  do.call() didn't cut it for some reason . . . it 
returned an  'x' is missing  error.  But Reduce works quite nicely for 
the rasters I'm working with.


Thanks again,
Ben

On 6/10/2011 2:06 AM, Henrik Bengtsson wrote:

...and ?mget for retrieving multiple objects as a list.  /Henrik

On Thu, Jun 9, 2011 at 10:32 PM, Henrik Bengtssonh...@biostat.ucsf.edu  wrote:

See ?get and ?do.call.  That should be enough.

?Reduce may be an alternative for do.call(), but could also be less
memory efficient.

My $.02

/Henrik

On Thu, Jun 9, 2011 at 7:57 PM, Ben Zaitchikzaitc...@jhu.edu  wrote:

Hello,

I have a large number of raster objects in memory, with names RH100, RH101,
RH102, etc. (myobjects- ls(pattern='^RH')).
These rasters are sections of a continuous map, and I would like to combine
them using the RasterObject merge tool in package 'raster.'

Merge expects an input list of raster objects (outmap-merge(x, y, ...),
where x, y, and ... are raster objects).  I can run the command successfully
if I type in every raster object that I want to merge, but this is
impractical for the large number of objects I'm combining.

I would like to apply merge to a list of object names defined using some
kind of wildcard-based list command, but I'm struggling to find the right
data types in R.

Is there some way to convert a vector of strings (e.g.,
as.vector(ls(pattern='^RH'))) to a vector of object names (as.names??) that
could be specified as the input to the merge function?  What I'd really like
to do is something like outmap-merge(myobjects[1:40]), in order to merge
the 40 raster objects, but I recognize that it might not be so simple.

Advance apologies if this is already dealt with on the help list . . . I've
gone in circles on wildcard, lapply, and names threads and haven't managed
to get anything to work.

Thank you,
Ben

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic Regression

2011-06-10 Thread John Sorkin
First a word of caution: Forward, backward, and stepwise regression analyses 
are not well received among statisticians. There are many reasons for this. 
Some of the reasons include:
(1) The p value computed at each step is computed ignoring all the previous 
steps. This can lead to incorrect inferences. The p value should be conditioned 
(i.e. computed taking into account) all the previous steps, i.e. the path 
used to get to the current model.
(2) When entering interaction terms, the three methods do not make sure that 
the main effects included in the interaction are in the model before the 
interaction is added.
(3) The analysis strategy substitutes the modeler's knowledge of the problem at 
hand for a thoughtless mechanical procedure.
(4) When your data are colinear (i.e. there is significant correlation among 
your independent variables) the three techniques you used may choose different 
models not because one model is better than the other, but rather because of 
colinearity.
(5) None of the three techniques gives absolute truth; each can give a 
glimpse of truth but they can also lead to a false sense of truth, a trip down 
a rabbit hole if you will.

Given these caveats, it remains to explain why your three analyses gave the 
same results. Fortunately the explanation is simple. The three analyses (one 
which starts with no terms in the model and then adds one at a time [forward], 
one that starts with all terms in the model and then removes terms one at a 
time [backward], and one that starts with no terms in the model and then adds 
and removes terms one at a time [stepwise]) all wound up with the same results. 
This is not a problem, it simply reflects the relations in your data. In fact 
the fact that the three methods all give the same result make me, at least, 
feel a bit better about the results you obtained with a method that is far from 
optimal.

In general it is considered better to model your data using standard modeling 
techniques that make use of your knowledge of the field rather than using one 
of the three techniques you used. This being said, sometimes when one has many 
independent variables, the three techniques can help one understand what is 
happening, but any inference drawn from the models must be taken with many a 
grain of salt.
John


John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

 Frank Harrell f.harr...@vanderbilt.edu 6/10/2011 7:06 AM 
Which statistical principles are you invoking on which to base such analyses?
Frank

Sergio Della Franca wrote:
 
 Dear R-Helpers,
 
 I want to perform a logistic regression on my dataset (y).
 
 I used the following code:
 
 logistic-glm(formula=interest_variable~.,family = binomial(link =
 logit),
 data = y)
 
 
 This run correctly.
 Then i want to develop the logistic regression with three different
 method:
 -forward
 -backward
 -stepwise
 
 I used these procedure:
 forward-step(logistica,direction=forward)
 backward-step(logistica,direction=backward)
 stepwise-step(logistica,direction=both)
 
 Even these run correctly, but i obtained the same results with the three
 different procedures.
 
 Then I tought i made some mistakes.
 
 My question is:
 
 Is correct what i did?
 Is correct that three different methods return the same results?
 
 If i made some mistakes, what is the correct method to correctly perform
 the
 three different logistics regression?
 
 
 Thank you in advance.
 
 
 Sergio Della Franca.
 
   [[alternative HTML version deleted]]
 
 __
 r-h...@stat.math.ethz.ch mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help 
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html 
 and provide commented, minimal, self-contained, reproducible code.
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Logistic-Regression-tp821301p3588135.html 
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html 
and provide commented, minimal, self-contained, reproducible code.

Confidentiality Statement:
This email message, including any attachments, is for th...{{dropped:6}}

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] gam() (in mgcv) with multiple interactions

2011-06-10 Thread Ben Haller
On Jun 9, 2011, at 11:35 AM, Simon Wood wrote:

 I think that the main problem here is that smooths are not constrained to 
 pass through the origin, so the covariate taking the value zero doesn't 
 correspond to no effect in the way that you would like it to. Another way of 
 putting this is that smooths are translation invariant, you get essentially 
 the same inference from the model y_i = f(x_i) + e_i as from y_i = f(x_i + k) 
 + e_i (which implies that x_i=0 can have no special status).

  OK, I understand the translation invariance, I think, and why x_i=0 has no 
special status.  I don't understand the consequence that you are saying follows 
from that, though.  Onward...

 All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) +
 te(a, d) is to remove the bases for te(a), te(b) and te(d) from the basis of 
 te(a,b) and te(a,d). Further constraining  te(a,b) and te(a,d) so that 
 te(0,b) = te(a,0) = 0 etc wouldn't make much sense (in general 0 might not 
 even be in the range of a and b).

  Hmm.  Perhaps my question would be better phrased as a question of model 
interpretation.  Can I think of the smooth found for te(a) as the main effect 
of a?  If so, should I just not be bothered by the fact that te(a,b) at b=0 has 
a different shape from te(a)?  Or is the main effect of a really, here, te(a) 
+ te(a,b | b=0) + te(a,d | d=0) (if my notation makes any sense, I'm not much 
at math)?  Or is the whole concept of main effect meaningless for these kinds 
of models -- in which case, how do I interpret what te(a) means?  Or perhaps I 
should not be trying to interpret a by itself; perhaps I can only interpret the 
interactions, not the main effects.  In that case, do I interpret te(a,b) by 
itself, or do I need to conceptually add in te(a) to understand what te(a,b) 
is telling me?  My head is spinning.  Clearly I just don't understand what 
these GAM models even mean, on a fundamental conceptual level.

 In general I find functional ANOVA not entirely intuitive to think about, but 
 there is a very good book on it by Chong Gu (Smoothing spline ANOVA, 2002, 
 Springer), and the associated package gss is on CRAN.

  Is what I'm doing functional ANOVA?  I realize ANOVA and regression are 
fundamentally related; but I think of ANOVA as involving discrete levels 
(factors) in the independent variables, like treatment groups.  My independent 
variables are all continuous, so I would not have thought of this as ANOVA.  
Anyhow, OK.  I will go get that book today, and see if I can figure all this 
out.

  Thanks for your help!

Ben Haller
McGill University

http://biology.mcgill.ca/grad/ben/


Begin forwarded message:

 From: Simon Wood s.w...@bath.ac.uk
 Date: June 9, 2011 11:35:11 AM EDT
 To: r-help@r-project.org, rh...@sticksoftware.com
 Subject: Re: [R] gam() (in mgcv) with multiple interactions
 
 I think that the main problem here is that smooths are not constrained to 
 pass through the origin, so the covariate taking the value zero doesn't 
 correspond to no effect in the way that you would like it to. Another way of 
 putting this is that smooths are translation invariant, you get essentially 
 the same inference from the model y_i = f(x_i) + e_i as from y_i = f(x_i + k) 
 + e_i (which implies that x_i=0 can have no special status).
 
 All mgcv does in the case of te(a) + te(b) + te(d) + te(a, b) +
 te(a, d) is to remove the bases for te(a), te(b) and te(d) from the basis of 
 te(a,b) and te(a,d). Further constraining  te(a,b) and te(a,d) so that 
 te(0,b) = te(a,0) = 0 etc wouldn't make much sense (in general 0 might not 
 even be in the range of a and b).
 
 In general I find functional ANOVA not entirely intuitive to think about, but 
 there is a very good book on it by Chong Gu (Smoothing spline ANOVA, 2002, 
 Springer), and the associated package gss is on CRAN.
 
 best,
 Simon
 
 
 
 On 07/06/11 17:00, Ben Haller wrote:
 Hi!  I'm learning mgcv, and reading Simon Wood's book on GAMs, as
 recommended to me earlier by some folks on this list.  I've run into
 a question to which I can't find the answer in his book, so I'm
 hoping somebody here knows.
 
 My outcome variable is binary, so I'm doing a binomial fit with
 gam().  I have five independent variables, all continuous, all
 uniformly distributed in [0, 1].  (This dataset is the result of a
 simulation model.)  Let's call them a,b,c,d,e for simplicity.  I'm
 interested in interactions such as a*b, so I'm using tensor product
 smooths such as te(a,b).  So far so good.  But I'm also interested
 in, let's say, a*d.  So ok, I put te(a,d) in as well.  Both of these
 have a as a marginal basis (if I'm using the right terminology; all I
 mean is, both interactions involve a), and I would have expected them
 to share that basis; I would have expected them to be constrained
 such that the effect of a when b=0, for one, would be the same as the
 effect of a when d=0, for the other.  This would be just as, in a GLM
 with formula a*b + a*d, that 

Re: [R] How to assess the accuracy of fitted logistic regression using glm

2011-06-10 Thread Uwe Ligges



On 10.06.2011 08:54, Xiaobo Gu wrote:

Hi Professor Brian,

Thanks for your reply.

I think there are many statisticians here, and it is somehow R
related, hoping someone can
help me.

I have done a simple test, using a sample csv data which I post if need.

donut- read.csv(file=D:/donut.csv, header = TRUE);
donut[[color]]- as.factor(donut[[color]])
donut[[shape]]- as.factor(donut[[shape]])
donut[[k]]- as.factor(donut[[k]])
donut[[k0]]- as.factor(donut[[k0]])
donut[[bias]]- as.factor(donut[[bias]])

lr- glm(color ~ shape + x + y, family = binomial, data = donut);
summary(lr)

Call:
glm(formula = color ~ shape + x + y, family = binomial, data = donut)

Deviance Residuals:
 Min   1Q   Median   3Q  Max
-2.1079  -0.9476   0.5086   0.7518   1.4079

Coefficients:
 Estimate Std. Error z value Pr(|z|)
(Intercept)  2.530101.65500   1.529   0.1263
shape22  0.056281.54990   0.036   0.9710
shape23 -0.745681.44813  -0.515   0.6066
shape24 -2.618961.38016  -1.898   0.0578 .
shape25 -2.076481.32818  -1.563   0.1180
x   -0.458851.52863  -0.300   0.7640
y   -0.593111.46999  -0.403   0.6866
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

 Null deviance: 50.446  on 39  degrees of freedom
Residual deviance: 42.473  on 33  degrees of freedom
AIC: 56.473

Number of Fisher Scoring iterations: 4

In the Coefficients section, is Pr(|z|) the P-value for that
variable, and there
are a few other questions:
1. How to determine the predict power of each variables?
2. How to determine the overall performance of the fitted model, here what's the
difference between and Deviance Residuals and Residual deviance?
3. How to compare Null deviance and Residual deviance?
4. What does AIC mean, and how to use this measure?
5. What does the Signif. codes section mean?



To answer your question, we'd need to write half a book, at least. This 
cannot be answered in an e-mail message. Hence please re-read Brian 
Ripley's advice and try to get statistical advice from  a local 
consultant or read elementary textbooks on the subject.


Uwe Ligges




Regards,

Xiaobo Gu



On Mon, Jun 6, 2011 at 9:59 PM, Prof Brian Ripleyrip...@stats.ox.ac.uk  wrote:

On Mon, 6 Jun 2011, Xiaobo Gu wrote:


Hi,

I am trying glm with family = binomial to do binary logistic
regression, but how can I assess the accuracy of the fitted model, the
summary method can print a lot of information about the returned
object, such as coefficients, because statistics is not my speciality,
so can you share some rule of thumb to exam the  fitted model from the
practical perspective.


It depends entirely on why you did the fit.  People have written whole books
on assessing the performance of classification procedures such as binary
logistic regression.  For example, the residual deviance is closely related
to log-probability scoring: for some purposes that is a good performance
measure, for others (e.g. when you are going to threshold the predicted
probabilities) it can be very misleading.

In short, you need statistical advice, not R advice (the purpose of this
list).



Regards,

Xiaobo Gu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: package.sk​eleton() does not create 'data' folder

2011-06-10 Thread Nipesh Bajaj
Dear all, somebody please look into my problem here? Does it not
contain sufficient information? Please let me know how can I post my
question more complete.

Thanks,


-- Forwarded message --
From: Nipesh Bajaj bajaj141...@gmail.com
Date: Fri, Jun 10, 2011 at 12:57 AM
Subject: package.sk​eleton() does not create 'data' folder
To: r-help@r-project.org


Hi again, yesterday I mailed this query however I could not see this
on the mail list. Therefore, I am reposting it again.

I was using package.skeleton() function to create the
skeleton of my package in windows. Here is my attempt:

rm(list = ls())
setwd(F:/R_PackageBuild)
package.skeleton(trial1, namespace = TRUE, code_files =
F:/R_PackageBuild/trial.r)

In the trial.r file, there are 2 objects, one is a function and
another is data. Here they are:

fn1 - Vectorize(function(x,y,z) {
                       return(x + y +z)
               }, SIMPLIFY = TRUE)
Data - rnorm(20)

However my problem is that package.skeleton() does not create any data
folder in the skeleton tree. However in the man folder there are 3 Rd
files (as expected), naming:
Data, fn1, trial1-package

However on the contrary if my code is like below then,
package.skeleton() creates data folder.
 fn1 - Vectorize(function(x,y,z) {
+ return(x + y +z)
+ }, SIMPLIFY = TRUE)
 Data - rnorm(20)

 setwd(F:/R_PackageBuild)
 package.skeleton(trial2)

So is it that if I use 'code_files ' argument then, R would not create
data folder?

Can somebody help me what I am missing in this process? Till now, I
create ___manually the data folder and within that folder
manually put a
RData file where only object is that 'Data'. However I believe there
must be more elegant way to doing that.

While searching over net to settle this issue, I found a thread with
hesding 'How to create .rda file to be used in package building',
however this is not answering my question.

Thanks,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: package.sk​eleton() does not create 'data' folder

2011-06-10 Thread Duncan Murdoch

On 10/06/2011 9:01 AM, Nipesh Bajaj wrote:

Dear all, somebody please look into my problem here? Does it not
contain sufficient information? Please let me know how can I post my
question more complete.


You could try following the posting guidelines, which suggest this 
question belongs in the R-devel list, rather than R-help.  Nevertheless, 
an answer below.

Thanks,


-- Forwarded message --
From: Nipesh Bajajbajaj141...@gmail.com
Date: Fri, Jun 10, 2011 at 12:57 AM
Subject: package.sk​eleton() does not create 'data' folder
To: r-help@r-project.org


Hi again, yesterday I mailed this query however I could not see this
on the mail list. Therefore, I am reposting it again.

I was using package.skeleton() function to create the
skeleton of my package in windows. Here is my attempt:

rm(list = ls())
setwd(F:/R_PackageBuild)
package.skeleton(trial1, namespace = TRUE, code_files =
F:/R_PackageBuild/trial.r)

In the trial.r file, there are 2 objects, one is a function and
another is data. Here they are:

fn1- Vectorize(function(x,y,z) {
return(x + y +z)
}, SIMPLIFY = TRUE)
Data- rnorm(20)

However my problem is that package.skeleton() does not create any data
folder in the skeleton tree. However in the man folder there are 3 Rd
files (as expected), naming:
Data, fn1, trial1-package

However on the contrary if my code is like below then,
package.skeleton() creates data folder.
  fn1- Vectorize(function(x,y,z) {
+ return(x + y +z)
+ }, SIMPLIFY = TRUE)
  Data- rnorm(20)

  setwd(F:/R_PackageBuild)
  package.skeleton(trial2)

So is it that if I use 'code_files ' argument then, R would not create
data folder?

Can somebody help me what I am missing in this process? Till now, I
create ___manually the data folder and within that folder
manually put a
RData file where only object is that 'Data'. However I believe there
must be more elegant way to doing that.


Why?  You only call package.skeleton once in the life of your package.  
You will be making hundreds of other changes to the package.  What's 
wrong with making this one?


If you want to know how package.skeleton decides whether to create a 
data folder, just look at its source.


Duncan Murdoch


While searching over net to settle this issue, I found a thread with
hesding 'How to create .rda file to be used in package building',
however this is not answering my question.

Thanks,

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Setting up a State Space Model in dlm

2011-06-10 Thread Michael Ash
Thank you.    The new article linked below is excellent.  BTW, I have
had trouble getting the sample code to run.In particular, there is
a variable k that does not appear to be set when it is used at line
227.

Here is the line where it is called:
fit - optim(par = rep(0, 2 * k), fn = logLik, method = BFGS,
  control = list(maxit = 500))

Here is the error message:
$ R --vanilla  kfas.R  kfas.Rlog
Error in rep.int(1, length(par)) : object 'k' not found
Calls: optim - rep.int
Execution halted

Any advice?  Thanks.

Best,
Michael





On Fri, Jun 10, 2011 at 8:22 AM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:

 On Tue, 2011-06-07 at 17:24 +0100, Michael Ash wrote:
  This question pertains to setting up a model in the package dlm
  (dynamic linear models,
  http://cran.r-project.org/web/packages/dlm/index.html

 The author of the dlm package has just published a paper on state space
 models in R including details on setting up dlm:

 http://www.jstatsoft.org/v41/i04



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help on selecting genes showing highest variance

2011-06-10 Thread Juliet Hannah
# Let's say your expression data is in a matrix
# named expression in which the rows are genes
# and the columns are samples

myvars - apply(expression,1, var,na.rm=TRUE)
myvars - sort(myvars,decreasing=TRUE)
myvars - myvars[1:200]
expression - expression[names(myvars),]
dim(expression)


Also check out the genefilter package in bioconductor. You may find
the bioconductor
mailing list is better for questions like this one.


On Tue, Jun 7, 2011 at 9:47 AM, GIS Visitor 33 gis...@gis.a-star.edu.sg wrote:
 Hi

 I have a problem for which I would like to know a solution. I have a gene 
 expression data and I would like to choose only lets say top 200 genes that 
 had the highest expression variance across patients.

 How do i do this in R?

 I tried x=apply(leukemiadata,1,var)
 x1=x[order(-1*x)]

 but the problem here is  x and x1 are numeric data , If I choose the first 
 200 after sorting in descending, so I do not know how to choose the 
 associated samples with just the numeric values.

 Kindly help!


 Regards
 Ap
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] ggplot2 avoid automatic color selection

2011-06-10 Thread Immanuel B
Hey all,

I'm trying to replicate some plots with ggplot2. The problem is that I
need to specify the color for every
attribute (drug). If I use the code below the colors get automatically
assigned but I need to plot drug1 in black drug2 in blue
etc.
How do I do that?

q = qplot(days,vol,data = cellLine7064, color = drug, geom = c(line, point))

best regards,
Immanuel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 avoid automatic color selection

2011-06-10 Thread Ista Zahn
See ?scale_colour_manual

Best,
Ista

On Fri, Jun 10, 2011 at 9:55 AM, Immanuel B mane.d...@googlemail.com wrote:
 Hey all,

 I'm trying to replicate some plots with ggplot2. The problem is that I
 need to specify the color for every
 attribute (drug). If I use the code below the colors get automatically
 assigned but I need to plot drug1 in black drug2 in blue
 etc.
 How do I do that?

 q = qplot(days,vol,data = cellLine7064, color = drug, geom = c(line, 
 point))

 best regards,
 Immanuel

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] R program writing standard/practices

2011-06-10 Thread Santosh
Dear Experts,

I notice that there are different ways of writing programs. Was wondering if
there is anything like a standard which could be used to write good/complete
R programs, maintain quality, easy to debug, a standard/practice that can be
consistent in an enterprise environment. Also, are there any beacon lights
that could help navigate through lines of code, programming styles,
understand the scope of a program, identify gaps etc?

Thanks,
Santosh

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear multivariate regression with Robust error

2011-06-10 Thread Michael Friendly

On 6/10/2011 12:23 AM, Barjesh Kochar wrote:

Dear all,

i am doing linear regression with robust error  to know the effect of
a  (x) variable on (y)other if i execute the command i found positive
trend.
  But if i check the effect of number of (x.x1,x2,x3)variables
on same (y)variable then the positive effect shwon by x variable turns
to negative. so plz help me in this situation.

Barjesh Kochar
Research scholar

You don't give any data or provide any code (as the posting guide 
requests) , so I have to guess that you
have just rediscovered Simpson's paradox -- that the coefficient of a 
variable in a marginal regression can have an opposite sign to that in

a joint model with other predictors. I have no idea what you mean
by 'robust error'.

One remedy is an added-variable plot which will show you the partial
contributions of each predictor in the joint model, as well as whether
there are any influential observations that are driving the estimated
coefficients.


--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] giving factor names

2011-06-10 Thread Ethan Brown
Hi Kieran,

I'm not very familiar with lattice, but here's a workaround that works for
me. Basically, I just created a new data.frame column that was a factor
(combo$zf), and forced its levels to be what you're looking for here.

require(lattice)

x-c(1,2,3)
y-c(2,4,6)
z-c(0.1,0.5,2)
combo-expand.grid(x,y,z)
combo-data.frame(combo)
names(combo)-c(x,y,z)
outcome-function(l)
{
(l[1]*l[2])/l[3]
}
resp-apply(combo,1,outcome)

## Create new column and assign levels
combo$zf - as.factor(combo$z)
levels(combo$zf) - paste(z=, levels(combo$zf), sep=)

## Now I use the new variable as the conditioning variable in the plot
levelplot(resp~x*y|zf, data=combo
,pretty=TRUE,region=TRUE,contour=FALSE)

In the future, it would help if you could specify the packages you're using,
since I had to do a little research to find where the levelplot function
is from.

Hope this helps,
Ethan



On Tue, Jun 7, 2011 at 9:30 AM, kieran martin kieranjmar...@gmail.comwrote:

 Hi,

 I've been driving myself insane with this problem. I have a trellis plot of
 contours, and I want each level to have something like z=value for each
 one. I can get each one to say z, or each one to say the value (by using
 as.factor) but not both. Heres an artificial example to show what I mean
 (as
 my actual data set is much larger!)

 x-c(1,2,3)
 y-c(2,4,6)
 z-c(0.1,0.5,2)
 combo-expand.grid(x,y,z)
 combo-data.frame(combo)
 names(combo)-c(x,y,z)
 outcome-function(l)
 {
 (l[1]*l[2])/l[3]
 }
 resp-apply(combo,1,outcome)
 levelplot(resp~x*y|z,data=combo
 ,pretty=TRUE,region=TRUE,contour=FALSE)

 , so in this final graph I want the z=0.1, z=0.5 and z=2 in turn.

 Thanks,

 Kieran Martin
 University of Southampton

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Sorting Data Frame Without Loop

2011-06-10 Thread hi Berven

Hello all!
 
I am currently trying to sort a data frame in a particular way, but I am having 
some difficulties with this. Specifically I want to sort the below dataset in 
such a way that there is only one line per ProteinID and if there are multiple 
GeneID or GeneName entries for a single proteinID, that they be concatenated 
with a comma separating them. The way I have done it earlier worked fine for 
small datasets, but as I am working with around 30,000 entries, it proved too 
slow and I'm not sure how to do it in another way. 
Here is an example of the input.  
 




ProteinID

GeneID

GeneName


M-012847-00

NM_057175

NARG1


M-012847-00

NM_057175

TBDN100


M-015544-00

NM_153008

FLJ30277


M-015544-00

NM_153027

FLJ31659


M-024202-00

NM_207330

NIPAL1


M-024202-00

NM_207330

NPAL1
 
Here is an example showing what I want: 




ProteinID

GeneID

GeneName


M-012847-00

NM_057175

NARG1, TBDN100


M-015544-00

NM_153008, NM_153027

FLJ30277, FLJ31659 


M-024202-00

NM_207330

NIPAL1, NPAL1
 
Here is the code I have been using so far. I have only managed to get this to 
work by using a loop, which I know is not the best way, but at the moment I'm 
stuck.
 
colA - c(M-012847-00 ,M-012847-00 ,M-015544-00, M-015544-00, 
M-024202-00,M-024202-00)
colB - c(NM_057175, NM_057175, NM_153008, NM_153027, NM_207330, 
NM_207330)
colC - c( 'NARG1', 'TBDN100', 'FLJ30277', 'FLJ31659', 'NIPAL1', 'NPAL1')
 
dupes - data.frame(ProteinID=colA, GeneID=colB, GeneName=colC)
 
idVec - character()
geneIDVec - character()
geneNameVec - character()
dataType - ProteinID
annotation - data.frame()
 
for (id in unique(dupes[[dataType]])) {
   print (id)
   idVec - c(idVec, id)
   geneIDVec - c(geneIDVec, paste(unique(dupes$GeneID[dupes[[dataType]] == 
id]), collapse=, ))
   geneNameVec - c(geneNameVec, 
paste(unique(dupes$GeneName[dupes[[dataType]] == id]), collapse=, ))
   annotation[[dataType]][annotation[[dataType]] == id] - NA
}
filtered - data.frame(ProteinID=idVec, GeneID=geneIDVec, GeneName=geneNameVec)
 
 
Thanks!   
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] read.graph: vertex labels in GML file?

2011-06-10 Thread Robinson, David G
I can successfully read a GML file using read.graph (e.g. those at
http://www-personal.umich.edu/~mejn/netdata/ , the football data set in
particular). However, I don't seem to be able to read the vertex labels
that are in the GML file; or maybe I am, but I cannot get at them?

I suspect I'm not reading the original labels since the command
 get.vertex.attribute(net, 'name', index=V(net))

just provides a numeric list of vertex names rather than the vertex labels
as they appeared in the original GML file.


Any suggestions would be greatly appreciated.

Cheers,
=Dave

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear multivariate regression with Robust error

2011-06-10 Thread Mike Marchywka













 Date: Fri, 10 Jun 2011 09:53:20 +0530
 From: bkkoc...@gmail.com
 To: r-help@r-project.org
 Subject: [R] Linear multivariate regression with Robust error

 Dear all,

 i am doing linear regression with robust error to know the effect of
 a (x) variable on (y)other if i execute the command i found positive
 trend.
 But if i check the effect of number of (x.x1,x2,x3)variables
 on same (y)variable then the positive effect shwon by x variable turns
 to negative. so plz help me in this situation.

take y as goodness and x and x1 have something to do with
a product. The first analysis is from company A, second is from company
B and the underlying relationship is given with some noise LOL, 
( I'm still on first cup of cofee, this was fist example to
come to mind as these question keep coming up here everyday )

 x=1:100
 x1=x*x
 y-x-x1+runif(100)
 lm(y~x)

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)    x
   1718 -100

 lm(y~x+x1)

Call:
lm(formula = y ~ x + x1)

Coefficients:
(Intercept)    x   x1
 0.5253   1.0024  -1.














 Barjesh Kochar
 Research scholar

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
  
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Multilevel pseudo maximum likelihood

2011-06-10 Thread Caterina Giusti

Dear all,

I posted this two years ago, getting no answers or suggestions - now I 
am trying again, hoping something new is available in R.


I am interested in an application of linear multilevel model with 
unequal selection probabilities at both levels.


Do you know if there is an R function for multilevel pseudo-maximum 
likelihood estimation? Or is it possible to obtain these estimates using
the nlme package? In practice what I need is to insert in  the 
likelihood the individual and cluster level design weights.


Thanks in advance,
Caterina

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to log-2 transform a matrix

2011-06-10 Thread wcc
CGH-as.matrix(CGH)
mode(CGH)-numeric

log2(CGH)

--
View this message in context: 
http://r.789695.n4.nabble.com/How-to-log-2-transform-a-matrix-tp839455p3588441.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Counting the Number of Letters in a row

2011-06-10 Thread Abraham Mathew
I'm trying to find the total number of letters in a row of a data frame.

Let's say I have the following data frame.

f1 - data.frame(keyword=c(I live in Denver, I live in Kansas City, MO,
Pizza is good))

The following function gives me the number of characters in each string.
So for I live in Denver, I get 1, 4, 2, and 6. However, I want to know the
total
number of characters (13).

sapply(strsplit(as.character(f1$keyword),  ), nchar)


Thanks,
Abraham

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting Data Frame Without Loop

2011-06-10 Thread jim holtman
Is this what you want:

 colA - c(M-012847-00 ,M-012847-00 ,M-015544-00, M-015544-00, 
 M-024202-00,M-024202-00)
 colB - c(NM_057175, NM_057175, NM_153008, NM_153027, NM_207330, 
 NM_207330)
 colC - c( 'NARG1', 'TBDN100', 'FLJ30277', 'FLJ31659', 'NIPAL1', 'NPAL1')

 dupes - data.frame(ProteinID=colA, GeneID=colB, GeneName=colC)
 require(sqldf)
 sqldf(
+ select ProteinID
+ , group_concat(GeneID) as GeneID
+ , group_concat(GeneName) as GeneName
+ from dupes
+ group by ProteinID
+ , method = 'raw')
ProteinID  GeneID  GeneName
1 M-012847-00 NM_057175,NM_057175 NARG1,TBDN100
2 M-015544-00 NM_153008,NM_153027 FLJ30277,FLJ31659
3 M-024202-00 NM_207330,NM_207330  NIPAL1,NPAL1



On Fri, Jun 10, 2011 at 4:35 AM, hi Berven thebe...@hotmail.com wrote:

 Hello all!

 I am currently trying to sort a data frame in a particular way, but I am 
 having some difficulties with this. Specifically I want to sort the below 
 dataset in such a way that there is only one line per ProteinID and if there 
 are multiple GeneID or GeneName entries for a single proteinID, that they be 
 concatenated with a comma separating them. The way I have done it earlier 
 worked fine for small datasets, but as I am working with around 30,000 
 entries, it proved too slow and I'm not sure how to do it in another way.
 Here is an example of the input.





 ProteinID

 GeneID

 GeneName


 M-012847-00

 NM_057175

 NARG1


 M-012847-00

 NM_057175

 TBDN100


 M-015544-00

 NM_153008

 FLJ30277


 M-015544-00

 NM_153027

 FLJ31659


 M-024202-00

 NM_207330

 NIPAL1


 M-024202-00

 NM_207330

 NPAL1

 Here is an example showing what I want:




 ProteinID

 GeneID

 GeneName


 M-012847-00

 NM_057175

 NARG1, TBDN100


 M-015544-00

 NM_153008, NM_153027

 FLJ30277, FLJ31659


 M-024202-00

 NM_207330

 NIPAL1, NPAL1

 Here is the code I have been using so far. I have only managed to get this to 
 work by using a loop, which I know is not the best way, but at the moment I'm 
 stuck.

 colA - c(M-012847-00 ,M-012847-00 ,M-015544-00, M-015544-00, 
 M-024202-00,M-024202-00)
 colB - c(NM_057175, NM_057175, NM_153008, NM_153027, NM_207330, 
 NM_207330)
 colC - c( 'NARG1', 'TBDN100', 'FLJ30277', 'FLJ31659', 'NIPAL1', 'NPAL1')

 dupes - data.frame(ProteinID=colA, GeneID=colB, GeneName=colC)

 idVec - character()
 geneIDVec - character()
 geneNameVec - character()
 dataType - ProteinID
 annotation - data.frame()

 for (id in unique(dupes[[dataType]])) {
       print (id)
       idVec - c(idVec, id)
       geneIDVec - c(geneIDVec, paste(unique(dupes$GeneID[dupes[[dataType]] 
 == id]), collapse=, ))
       geneNameVec - c(geneNameVec, 
 paste(unique(dupes$GeneName[dupes[[dataType]] == id]), collapse=, ))
       annotation[[dataType]][annotation[[dataType]] == id] - NA
 }
 filtered - data.frame(ProteinID=idVec, GeneID=geneIDVec, 
 GeneName=geneNameVec)


 Thanks!
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sorting Data Frame Without Loop

2011-06-10 Thread jim holtman
This will give you the unique gene values:

 colA - c(M-012847-00 ,M-012847-00 ,M-015544-00, M-015544-00, 
 M-024202-00,M-024202-00)
 colB - c(NM_057175, NM_057175, NM_153008, NM_153027, NM_207330, 
 NM_207330)
 colC - c( 'NARG1', 'TBDN100', 'FLJ30277', 'FLJ31659', 'NIPAL1', 'NPAL1')

 dupes - data.frame(ProteinID=colA, GeneID=colB, GeneName=colC)
 require(sqldf)
 sqldf(
+ select ProteinID
+ , group_concat(distinct GeneID) as GeneID
+ , group_concat(distinct GeneName) as GeneName
+ from dupes
+ group by ProteinID
+ , method = 'raw')
ProteinID  GeneID  GeneName
1 M-012847-00   NM_057175 NARG1,TBDN100
2 M-015544-00 NM_153008,NM_153027 FLJ30277,FLJ31659
3 M-024202-00   NM_207330  NIPAL1,NPAL1



On Fri, Jun 10, 2011 at 4:35 AM, hi Berven thebe...@hotmail.com wrote:

 Hello all!

 I am currently trying to sort a data frame in a particular way, but I am 
 having some difficulties with this. Specifically I want to sort the below 
 dataset in such a way that there is only one line per ProteinID and if there 
 are multiple GeneID or GeneName entries for a single proteinID, that they be 
 concatenated with a comma separating them. The way I have done it earlier 
 worked fine for small datasets, but as I am working with around 30,000 
 entries, it proved too slow and I'm not sure how to do it in another way.
 Here is an example of the input.





 ProteinID

 GeneID

 GeneName


 M-012847-00

 NM_057175

 NARG1


 M-012847-00

 NM_057175

 TBDN100


 M-015544-00

 NM_153008

 FLJ30277


 M-015544-00

 NM_153027

 FLJ31659


 M-024202-00

 NM_207330

 NIPAL1


 M-024202-00

 NM_207330

 NPAL1

 Here is an example showing what I want:




 ProteinID

 GeneID

 GeneName


 M-012847-00

 NM_057175

 NARG1, TBDN100


 M-015544-00

 NM_153008, NM_153027

 FLJ30277, FLJ31659


 M-024202-00

 NM_207330

 NIPAL1, NPAL1

 Here is the code I have been using so far. I have only managed to get this to 
 work by using a loop, which I know is not the best way, but at the moment I'm 
 stuck.

 colA - c(M-012847-00 ,M-012847-00 ,M-015544-00, M-015544-00, 
 M-024202-00,M-024202-00)
 colB - c(NM_057175, NM_057175, NM_153008, NM_153027, NM_207330, 
 NM_207330)
 colC - c( 'NARG1', 'TBDN100', 'FLJ30277', 'FLJ31659', 'NIPAL1', 'NPAL1')

 dupes - data.frame(ProteinID=colA, GeneID=colB, GeneName=colC)

 idVec - character()
 geneIDVec - character()
 geneNameVec - character()
 dataType - ProteinID
 annotation - data.frame()

 for (id in unique(dupes[[dataType]])) {
       print (id)
       idVec - c(idVec, id)
       geneIDVec - c(geneIDVec, paste(unique(dupes$GeneID[dupes[[dataType]] 
 == id]), collapse=, ))
       geneNameVec - c(geneNameVec, 
 paste(unique(dupes$GeneName[dupes[[dataType]] == id]), collapse=, ))
       annotation[[dataType]][annotation[[dataType]] == id] - NA
 }
 filtered - data.frame(ProteinID=idVec, GeneID=geneIDVec, 
 GeneName=geneNameVec)


 Thanks!
        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting the Number of Letters in a row

2011-06-10 Thread Ingmar Visser
something like this:
sapply(strsplit(as.character(
f1$keyword),  ), function(x){sum(nchar(x))})
hth, Ingmar


On Fri, Jun 10, 2011 at 6:48 PM, Abraham Mathew abra...@thisorthat.comwrote:

 I'm trying to find the total number of letters in a row of a data frame.

 Let's say I have the following data frame.

 f1 - data.frame(keyword=c(I live in Denver, I live in Kansas City, MO,
 Pizza is good))

 The following function gives me the number of characters in each string.
 So for I live in Denver, I get 1, 4, 2, and 6. However, I want to know
 the
 total
 number of characters (13).

 sapply(strsplit(as.character(f1$keyword),  ), nchar)


 Thanks,
 Abraham

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting the Number of Letters in a row

2011-06-10 Thread Jorge Ivan Velez
Hi Abraham,

Try

foo - function(x){
x - as.character(x)
 sapply(strsplit(x,  ), function(s) sum(nchar(s)))
}
foo(f1$keyword)

HTH,
Jorge


On Fri, Jun 10, 2011 at 12:48 PM, Abraham Mathew  wrote:

 I'm trying to find the total number of letters in a row of a data frame.

 Let's say I have the following data frame.

 f1 - data.frame(keyword=c(I live in Denver, I live in Kansas City, MO,
 Pizza is good))

 The following function gives me the number of characters in each string.
 So for I live in Denver, I get 1, 4, 2, and 6. However, I want to know
 the
 total
 number of characters (13).

 sapply(strsplit(as.character(f1$keyword),  ), nchar)


 Thanks,
 Abraham

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plotting NLS profiles

2011-06-10 Thread R Help
Hello list,

I'm trying to plot nls profiles, but the plot.profile.nls function in
R doesn't seem to accept any plot.default variables.  Specifically,
I'd like to be able to change the x-axis title and the colors to black
and white.  Has anyone had any luck with this?

If not, is there a way to override to plotting colors, perhaps in par()?

Thanks,
Sam

fm1 - nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
pr1 - profile(fm1, alpha = 0.05)
opar - par(mfrow = c(2,2), oma = c(1.1, 0, 1.1, 0), las = 1)
plot(pr1, conf = c(95, 90, 80, 50)/100) # works fine
plot(pr1, conf = c(95, 90, 80, 50)/100,xlab=expression(theta),col=1) #
doesn't change

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Double x grid in ggplot2

2011-06-10 Thread James Rome
I am trying to overlay raw data with a boxplot as follows:
pp = qplot(factor(time, levels=0:60, ordered=TRUE),
error, data=dfsub, size=I(1), main = title, ylab=Error
(min),
xlab=Time before ON (min), alpha=I(1/10),
ylim=c(-30,40),geom=jitter) +
facet_wrap(~ runway, ncol=2) +
geom_boxplot(alpha=.5, color=blue, outlier.colour=green,
outlier.size=1) +
scale_x_discrete(breaks = seq(from=0, to=60, by=10))
print(pp)

But I think ggplot2 is getting confused about factors versus numbers
 sapply(dfsub,class)
 time errorrunwayflight
numeric numeric  factor  factor
because when I do the plot, the vertical grid lines are in pairs, one
line at (0, 10, 20,...) and the other at (-1, 9, 19,)

 If I remove the scale_x_discrete, I get a grid line at every minute,
and things look right, but the labels for the minutes all overlap.

Thanks for the help,
Jim Rome
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Item analysis

2011-06-10 Thread John Kane
Here are two different ways with your data as the data.frame xx

# Basic R 
aggregate(xx$Tip, list(xx$Time), sum)

# Using the rshape package
library(reshape2)
yy - melt(xx, id=c(Time), measure.vars=c(Tip))
dcast(yy , Time ~ variable , sum)


--- On Thu, 6/9/11, Trying To learn again tryingtolearnag...@gmail.com wrote:

 From: Trying To learn again tryingtolearnag...@gmail.com
 Subject: [R] Item analysis
 To: r-help@r-project.org
 Received: Thursday, June 9, 2011, 4:41 PM
 Hi all,
 
 For several reasons I have no used CARN R in monthsI
 have an idea and I
 want to retry to learn CRAN R.
 
 I know I need to formulate more intelligent questions but
 I will expose
 and if someone can help me I would be very gratefull I
 promise to try to
 learn again
 
 The question I have a data file like this:
 
 Date, Time, Tip
 13/11/2008,23:16:00,432
 13/01/2009,23:17:00,633
 13/11/2009,23:16:00,134
 13/12/2009,23:17:00,234
 13/01/2010,23:16:00,111
 
 I want to make an statistic (the sum of tip) but to extract
 one sum by each
 different minute in Time, so you see, in this easy example
 I will have a
 final table with two items:
 
 Item Sum
 16 (432+134+111)
 17 (633+234)
 
 Of course in my file (is bigger) I have minutes from 0 to
 59.
 
 Would it be too much difficult if I want this analysis but
 on table for each
 day (indentified in the column Date) of the week. So you
 see on
 13/11/2008  was thursday so you go each day and sum if
 this day is thrusday
 from 0 to 59 minutes...
 
 MAny thanks for all, each tip you give me I will send
 thousands of
 thanks¡
 
     [[alternative HTML version deleted]]
 
 
 -Inline Attachment Follows-
 
 __
 R-help@r-project.org
 mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained,
 reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem loading packages in R 2.13.0 on Mac

2011-06-10 Thread Adrienne Keller
I am having problem loading packages in the newest version of R  
(2.13.0) on my Mac. I have tried to install various packages (e.g.  
lawstat, Rcmdr, car) and load them using the Package Installer and  
Package Manager menu options but I get the follow error:


 library(lawstat)
Loading required package: mvtnorm
Error: package 'mvtnorm' could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc  
= lib.loc) :

  there is no package called 'mvtnorm'

When I click on the box for loading lawstat in the Package Manager, it  
immediately reverts back to an unchecked box.


I have tried to load mvtnorm and then load lawstat but I get the same  
error.


Help?

Thanks,

Adrienne

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R program writing standard/practices

2011-06-10 Thread vioravis
Check this out:

http://www1.maths.lth.se/help/R/RCC/

--
View this message in context: 
http://r.789695.n4.nabble.com/R-program-writing-standard-practices-tp3588716p3588911.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Kairavi Bhakta
Hello,

I have a bunch of files containing 300 data points each with values from 0
to 1 which also sum to 1 (I don't think  the last element is relevant
though). In addition, each data point is annotated as an a or a b.

I would like to know in which files (if any) the data is uniformly
distributed.

I used Google and found out that a Kolmogorov-Smirnov or a Chi-square
goodness-of-fit test could be used. Then I looked up ?kolmogorov and found
ks.test, but the example there is for the normal distribution and I am not
sure how to adapt it for the uniform distribution. I did ?runif and read
about the uniform distribution but it doesn't say what the cumulative
distribution is. Is it punif, like pnorm? I thought of that because I
found a message on this list where someone was told to use pnorm instead
of dnorm. But the help page on the uniform distribution says punif is the
distribution function. Are the cumulative distribution and the
distribution function the same thing? Having several names for the same
thing has always confused me very much in statistics.

Also, I am not sure whether I need to specify any parameters for the
distribution and which. I thought maybe I should specify min=0 and max=1
but those appear to be the defaults. Do I need to specify q, the vector of
quantiles?

So is
 ks.test(x, punif)
correct or not for what I am attempting to do?

After this I will also need to find out whether the a's and b's are
distributed randomly in each file. I would be greatful for any pointers
although I have not researched this issue yet.

Kairavi.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting NLS profiles

2011-06-10 Thread Prof Brian Ripley

On Fri, 10 Jun 2011, R Help wrote:


Hello list,

I'm trying to plot nls profiles, but the plot.profile.nls function in
R doesn't seem to accept any plot.default variables.  Specifically,
I'd like to be able to change the x-axis title and the colors to black
and white.  Has anyone had any luck with this?

If not, is there a way to override to plotting colors, perhaps in par()?


No.  The authors hardcoded all these.

Take a copy of the function and modify it to suit your purposes.



Thanks,
Sam

fm1 - nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
pr1 - profile(fm1, alpha = 0.05)
opar - par(mfrow = c(2,2), oma = c(1.1, 0, 1.1, 0), las = 1)
plot(pr1, conf = c(95, 90, 80, 50)/100) # works fine
plot(pr1, conf = c(95, 90, 80, 50)/100,xlab=expression(theta),col=1) #
doesn't change

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting NLS profiles

2011-06-10 Thread Dieter Menne

R Help wrote:
 
 I'm trying to plot nls profiles, but the plot.profile.nls function in
 R doesn't seem to accept any plot.default variables.  
 

Package nlstools has a few nice graphical display methods.

Dieter


--
View this message in context: 
http://r.789695.n4.nabble.com/Plotting-NLS-profiles-tp3589020p3589119.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting NLS profiles

2011-06-10 Thread Prof Brian Ripley

On Fri, 10 Jun 2011, Prof Brian Ripley wrote:


On Fri, 10 Jun 2011, R Help wrote:


Hello list,

I'm trying to plot nls profiles, but the plot.profile.nls function in
R doesn't seem to accept any plot.default variables.  Specifically,
I'd like to be able to change the x-axis title and the colors to black
and white.  Has anyone had any luck with this?

If not, is there a way to override to plotting colors, perhaps in par()?


No.  The authors hardcoded all these.

Take a copy of the function and modify it to suit your purposes.


But as they hardcoded the colours to numbers, see ?palette .


Thanks,
Sam

fm1 - nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
pr1 - profile(fm1, alpha = 0.05)
opar - par(mfrow = c(2,2), oma = c(1.1, 0, 1.1, 0), las = 1)
plot(pr1, conf = c(95, 90, 80, 50)/100) # works fine
plot(pr1, conf = c(95, 90, 80, 50)/100,xlab=expression(theta),col=1) #
doesn't change


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] scatterplot3d - help assign colors based on multiple conditions

2011-06-10 Thread Karthik Kota
Thanks! That did the trick. 


On Jun 10, 2011, at 3:16 AM, Jim Lemon wrote:

 On 06/10/2011 06:40 AM, Karthik Kota wrote:
 Thanks a lot! This is very helpful.
 
 If I have to extend this to one more condition say assign blue if both the 
 corresponding labels have _Tongue_dorsum, is there a straight forward 
 function. Doing something like below is overriding the colors assigned by 
 the first statement.
 
 col- ifelse(grepl(_Anterior_nares, xlabels)  grepl(_Anterior_nares, 
 ylabels), red, black)
 col- ifelse(grepl(_Tongue_dorsum, xlabels)  grepl(_Tongue_dorsum, 
 ylabels), blue, black)
 
 Hi Karthik,
 Your problem is that you are assigning all of the values to col twice. If 
 you reverse the order of the statements you will get the red/black color set. 
 Try this:
 
 col-rep(black,dim(cdh1)[1])
 col[grepl(_Anterior_nares, xlabels) 
 grepl(_Anterior_nares, ylabels)]-red
 col[grepl(_Tongue_dorsum, xlabels) 
 grepl(_Tongue_dorsum, ylabels)]-blue
 
 If your two conditions specify disjunct sets, that is, no case satisfies both 
 conditions, you should have the correct vector of colors.
 
 Jim

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Plotting NLS profiles

2011-06-10 Thread Sam Stewart
Wow, that's really clever, I didn't know you could manipulate the
palette like that.

Thanks Dr Ripley,
Sam

On Fri, Jun 10, 2011 at 3:45 PM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 On Fri, 10 Jun 2011, Prof Brian Ripley wrote:

 On Fri, 10 Jun 2011, R Help wrote:

 Hello list,

 I'm trying to plot nls profiles, but the plot.profile.nls function in
 R doesn't seem to accept any plot.default variables.  Specifically,
 I'd like to be able to change the x-axis title and the colors to black
 and white.  Has anyone had any luck with this?

 If not, is there a way to override to plotting colors, perhaps in par()?

 No.  The authors hardcoded all these.

 Take a copy of the function and modify it to suit your purposes.

 But as they hardcoded the colours to numbers, see ?palette .

 Thanks,
 Sam

 fm1 - nls(demand ~ SSasympOrig(Time, A, lrc), data = BOD)
 pr1 - profile(fm1, alpha = 0.05)
 opar - par(mfrow = c(2,2), oma = c(1.1, 0, 1.1, 0), las = 1)
 plot(pr1, conf = c(95, 90, 80, 50)/100) # works fine
 plot(pr1, conf = c(95, 90, 80, 50)/100,xlab=expression(theta),col=1) #
 doesn't change

 --
 Brian D. Ripley,                  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem loading packages in R 2.13.0 on Mac

2011-06-10 Thread Ethan Brown
Hi Adrienne,

I'm not familiar with your interface, but it sounds like R thinks the
package mvtnorm is not installed. You can see the packages you've
installed with:

row.names(installed.packages())

Is mvtnorm in the output of that command? You could test with the command,

mvtnorm %in% row.names(installed.packages())

If the result of the above command is FALSE, you can install it with:

install.packages(mvtnorm)

Best,
Ethan

On Fri, Jun 10, 2011 at 12:00 PM, Adrienne Keller 
adrienne.kel...@umontana.edu wrote:

 I am having problem loading packages in the newest version of R (2.13.0) on
 my Mac. I have tried to install various packages (e.g. lawstat, Rcmdr, car)
 and load them using the Package Installer and Package Manager menu options
 but I get the follow error:

  library(lawstat)
 Loading required package: mvtnorm
 Error: package 'mvtnorm' could not be loaded
 In addition: Warning message:
 In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc =
 lib.loc) :
  there is no package called 'mvtnorm'

 When I click on the box for loading lawstat in the Package Manager, it
 immediately reverts back to an unchecked box.

 I have tried to load mvtnorm and then load lawstat but I get the same
 error.

 Help?

 Thanks,

 Adrienne

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem loading packages in R 2.13.0 on Mac

2011-06-10 Thread Berend Hasselman

Adrienne Keller wrote:
 
 I am having problem loading packages in the newest version of R  
 (2.13.0) on my Mac. I have tried to install various packages (e.g.  
 lawstat, Rcmdr, car) and load them using the Package Installer and  
 Package Manager menu options but I get the follow error:
 
   library(lawstat)
 Loading required package: mvtnorm
 Error: package 'mvtnorm' could not be loaded
 In addition: Warning message:
 In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc  
 = lib.loc) :
there is no package called 'mvtnorm'
 
 When I click on the box for loading lawstat in the Package Manager, it  
 immediately reverts back to an unchecked box.
 
 I have tried to load mvtnorm and then load lawstat but I get the same  
 error.
 

The error message would imply that the package mvtnorm is not installed.

1. This is a Mac issue. Use the R-SIG-Mac mailing list.

2. Assuming you are using the R Mac GUI, you use the Package Installer to
install  packages.
  If you want dependencies installed don't forget to check the Install
dependencies checkbox in the lower right corner of the Package Installer
window. Select lawstat and then click Install Selected.

3. Now you should be able to load.

Berend


--
View this message in context: 
http://r.789695.n4.nabble.com/Problem-loading-packages-in-R-2-13-0-on-Mac-tp3589079p3589219.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Greg Snow
Yes, punif is the function to use, however the KS test (and the others) are 
based on an assumption of independence, and if you know that your data points 
sum to 1, then they are not independent (and not uniform if there are more than 
2).  Also note that these tests only rule out distributions (with a given type 
I error rate), but cannot confirm that the data comes from a given distribution 
(just that either they do, or there is not enough power to distinguish between 
the actual and the test distributions).

What is your ultimate question/goal?  Why do you care if the data is uniform or 
not?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Kairavi Bhakta
 Sent: Friday, June 10, 2011 11:24 AM
 To: r-help@r-project.org
 Subject: [R] Test if data uniformly distributed (newbie)
 
 Hello,
 
 I have a bunch of files containing 300 data points each with values
 from 0
 to 1 which also sum to 1 (I don't think  the last element is relevant
 though). In addition, each data point is annotated as an a or a b.
 
 I would like to know in which files (if any) the data is uniformly
 distributed.
 
 I used Google and found out that a Kolmogorov-Smirnov or a Chi-square
 goodness-of-fit test could be used. Then I looked up ?kolmogorov and
 found
 ks.test, but the example there is for the normal distribution and I
 am not
 sure how to adapt it for the uniform distribution. I did ?runif and
 read
 about the uniform distribution but it doesn't say what the cumulative
 distribution is. Is it punif, like pnorm? I thought of that
 because I
 found a message on this list where someone was told to use pnorm
 instead
 of dnorm. But the help page on the uniform distribution says punif is
 the
 distribution function. Are the cumulative distribution and the
 distribution function the same thing? Having several names for the
 same
 thing has always confused me very much in statistics.
 
 Also, I am not sure whether I need to specify any parameters for the
 distribution and which. I thought maybe I should specify min=0 and
 max=1
 but those appear to be the defaults. Do I need to specify q, the vector
 of
 quantiles?
 
 So is
  ks.test(x, punif)
 correct or not for what I am attempting to do?
 
 After this I will also need to find out whether the a's and b's are
 distributed randomly in each file. I would be greatful for any pointers
 although I have not researched this issue yet.
 
 Kairavi.
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-
 guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Greg Snow
OK, that is not the correct format for the KS test (which is expecting data 
ranging from 0 to 1 with a fairly flat histogram).  You could possibly test 
this with a Chi-squared test.  Can you tell us more about how the numbers you 
are looking at are generated?  The Chi-squared test could be used on counts of 
1-5 and compared to the assumption that each is equally likely, but there still 
is the question of power and how close to uniform is uniform enough.  You would 
need huge samples to find a difference if the true distribution is only 
slightly non uniform.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

From: kairavibha...@googlemail.com [mailto:kairavibha...@googlemail.com] On 
Behalf Of Kairavi Bhakta
Sent: Friday, June 10, 2011 2:16 PM
To: Greg Snow; r-help@r-project.org
Subject: RE: [R] Test if data uniformly distributed (newbie)

Thanks for your answer. The reason I want the data to be uniform: It's the 
first step in a machine learning project I am working on. If I know the data 
isn't uniformly distributed, then this means there is probably something wrong 
and the following steps will be biased by the non-uniform input data. I'm not 
checking an assumption for another statistical test.

Actually, the data has been normalized because it is supposed to represent a 
probability distribution. That's why it sums to 1. My assumption is that, for a 
vector of 5, the data at that point should look like 0.20 0.20 0.20 0.20 0.20, 
but of course there is variation, and I would like to test whether the data 
comes close enough or not.

At the moment I am only testing whether there are more a's than b's in the top 
and bottom portion of the each file (with a wilcoxon test, I have 8 reps of the 
model I am trying to build). But that sort of felt like a very adhoc solution 
and I figured maybe testing for uniformity would be better, or at least a 
important addition. I've also been looking into testing for the randomness of 
the sequence of a's and b's instead of the wilcoxon test, although that may or 
may not involve R.

Kairavi.


 Yes, punif is the function to use, however the KS test (and the others) are 
 based on an assumption of independence, and if you know that your data points 
 sum to 1, then they are not independent (and not uniform if there are more 
 than 2).  Also note that these tests only rule out distributions (with a 
 given type I error rate), but cannot confirm that the data comes from a given 
 distribution (just that either they do, or there is not enough power to 
 distinguish between the actual and the test distributions).

 What is your ultimate question/goal?  Why do you care if the data is uniform 
 or not?

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599
 801.408.8111

[Hide Quoted Text]
-Original Message-
From: 
r-help-boun...@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599
 
[mailto:r-help-bounces@r-https://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599
project.orghttp://project.org] On Behalf Of Kairavi Bhakta
Sent: Friday, June 10, 2011 11:24 AM
To: 
r-help@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599
Subject: [R] Test if data uniformly distributed (newbie)

Hello,

I have a bunch of files containing 300 data points each with values from 0 to 1 
which also sum to 1 (I don't think  the last element is relevant though). In 
addition, each data point is annotated as an a or a b.

I would like to know in which files (if any) the data is uniformly distributed.

I used Google and found out that a Kolmogorov-Smirnov or a Chi-square 
goodness-of-fit test could be used. Then I looked up ?kolmogorov and found 
ks.test, but the example there is for the normal distribution and I am not 
sure how to adapt it for the uniform distribution. I did ?runif and read about 
the uniform distribution but it doesn't say what the cumulative distribution 
is. Is it punif, like pnorm? I thought of that because I found a message on 
this list where someone was told to use pnorm instead of dnorm. But the 
help page on the uniform distribution says punif is the distribution 
function. Are the cumulative distribution and the distribution function 
the same thing? Having several names for the same thing has always confused me 
very much in statistics.

Also, I am not sure whether I need to specify any parameters for the 
distribution and which. I thought maybe I should specify min=0 and max=1 
but those appear to be the defaults. Do I need to specify q, the vector
of quantiles?

So is
ks.test(x, punif)
correct or not for what I am attempting to do?
After this I will also need to find out whether the a's and b's are distributed 
randomly in each file. I would be greatful for any pointers although I have not 

Re: [R] 'PROC CONTENTS' = ?

2011-06-10 Thread Frank Harrell
See the contents function in the Hmisc package.


karena wrote:
 
 Is there any function in R which does the same thing as what 'PROC
 CONTENTS' does in SAS?
 
 thanks,
 
 karena
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/PROC-CONTENTS-tp3589035p3589368.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] running R commands asynchronously

2011-06-10 Thread Richard M. Heiberger
I am interested in running R commands asynchronously.

My first choice is in the same R session that I am currently in.
Here, the goal would be to run something like

 RunAsynchSameSession(myfunction(), outputname.rda)

Once RunAsynchSameSession had started myfunction(),
RunAsynchSameSession would complete immediately.  myfunction would
keep going.  It is OK if execution of the myfunction() command
prevents new input to R until it has completed.  The important feature
is that RunAsynchSameSession must tell the progam that called it that
it was done.

Second choice is to start an independent process, BATCH or something
similar, and save the resulting data objects in an .rda file.

 RunAsynchBatch(myfile.r, outputname.rda)

The RunAsynchBatch would start a batch process and complete
immediately after starting the batch file.  The batch file would run
independently until it was completed.  While I know how to do this,
for example with system(wait=FALSE), I would appreciate seeing a
worked collection of statements, including getting outputname.rda back
to the original R session.  I am working on Windows.

Rich

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] How do I make proper use of the by() function?

2011-06-10 Thread Fredrik Karlsson
Dear list,

I have a function that uses values from two vectors  and spits out one new
vector  based on all the values of the two original vectors, and with the
same length as them.
Ok, I would now like to be able to apply that function simply on two columns
in a data frame, divided by the levels of factors in the data frame.

I'm trying to use by() for this, but the output is too hard to use. What I
am doing is this:

by(df, list(df$Factor1,df$Factor2),function(x)
my_function(x$col1,x$col2),simplify=TRUE)

and the output is too complex to be used in a simple way. Actually, I just
want something like a data frame, where the results vectors are placed in
one column and the conditions under which they were produced (i.e. the
values of the factors according to which the data set were divided) in other
columns.

This does not seem to be provided by by(), and aggregate only provides me
with the ability to use values from one column, right?
So, are there other functions I could use?

Thanks!

/Fredrik

-- 
Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Double x grid in ggplot2

2011-06-10 Thread Dennis Murphy
This toy example works as you seem to expect:

library(ggplot2)
dat - data.frame(time = factor(rep(0:59, each = 100), levels = 0:59),
   error = rnorm(6000))
qplot(time, error, data = dat, size = I(1), main = title,
  ylab=Error (min), xlab=Time before ON (min), alpha=I(1/10),
  geom=jitter) +
geom_boxplot(alpha=.5, color=blue, outlier.colour=green,
 outlier.size=1) +
scale_x_discrete(breaks = seq(from=0, to=60, by=10))

 sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55523)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  grid  methods
[8] base

other attached packages:
 [1] raster_1.8-31   geoR_1.6-35 sp_0.9-81   googleVis_0.2.5
 [5] RJSONIO_0.5-0   sos_1.3-0   brew_1.0-6  lattice_0.19-26
 [9] ggplot2_0.8.9   proto_0.3-9.2   reshape_0.8.4   plyr_1.5.2

loaded via a namespace (and not attached):
[1] digest_0.5.0 tools_2.13.0

I don't see any of the problems that you've experienced, so you should
provide a minimal reproducible example that illustrates the problem -
it doesn't have to be your actual data, just something that mimics it
and reproduces the error you're seeing. There's likely to be something
in the way your data are constructed that generates the error, but
since you failed to provide it, who can tell?

Q: Why do you have levels 0 to 60 instead of either 0 to 59 or 1 to
60? Do you think that might be the cause of the problem?

Dennis

On Fri, Jun 10, 2011 at 11:08 AM, James Rome jamesr...@gmail.com wrote:
 I am trying to overlay raw data with a boxplot as follows:
        pp = qplot(factor(time, levels=0:60, ordered=TRUE),
            error, data=dfsub, size=I(1), main = title, ylab=Error
 (min),
            xlab=Time before ON (min), alpha=I(1/10),
 ylim=c(-30,40),    geom=jitter) +
            facet_wrap(~ runway, ncol=2) +
            geom_boxplot(alpha=.5, color=blue, outlier.colour=green,
 outlier.size=1) +
            scale_x_discrete(breaks = seq(from=0, to=60, by=10))
        print(pp)

 But I think ggplot2 is getting confused about factors versus numbers
 sapply(dfsub,class)
     time     error    runway    flight
 numeric numeric  factor  factor
 because when I do the plot, the vertical grid lines are in pairs, one
 line at (0, 10, 20,...) and the other at (-1, 9, 19,)

  If I remove the scale_x_discrete, I get a grid line at every minute,
 and things look right, but the labels for the minutes all overlap.

 Thanks for the help,
 Jim Rome

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Linear multivariate regression with Robust error

2011-06-10 Thread Daniel Malter
I am with Michael. It is almost impossible to figure out what you are trying.
However, I assume, like Michael, that you regress y on x2 and find, say, a
negative effect. But when you regress y on x1 and x2, then you find a
positive effect of x2. The short answer to your question is that in this
case your restricted model (the one only containing x2) suffers from omitted
variable bias. Here is an example:

Let's assume you are interested in the effect of x2 in this example! Let's
say we have 100 observations and that y depends on x1 and x2. Furthermore,
let us assume that x1 and x2 are positively correlated. 

x1=rnorm(100)
e1=rnorm(100) #random error term

x2=x1+rnorm(100) #x2 is correlated with x2

e=rnorm(100) #random error term

y=-3*x1+x2+e #dependent variable



Note that x1 has a negative relationship to y, but x2 has a positive
relationship to y. Note also that the effect of x1 on y is larger in size
(minus 3) than the effect of x2 on y (positive 1). Now let's run some
regressions.

First, let's run y on x1 only. An unbiased estimate should reproduce the
coefficient of -3 within the confidence interval. However, the estimated x1
is much smaller than we would expect. The reason is that because we omit x2,
x1 picks up some of the effect of x2 because x1 and x2 are correlated. Hence
the coefficient for x1 is diluted.

reg1-lm(y~x1)
summary(reg1)


Now, let's run y on x2. An unbiased estimate should reproduce the
coefficient of 1 within the confidence interval. However, the estimated
effect of x2 is negative and significant. Obviously, the estimate for x2 is
severely biased. The reasons are the following. First, x2 correlates with
x1. Hence, when you regress y only on x2, the coefficient will pickup some
of the effect of x1 on y. This will generally lead to biased estimates of
the coefficient for x2. The reason why the coefficient has the opposite sign
that it is supposed to have (and why it is not just a little bit biased like
the coefficient on x1 in the previous regression) is that 1. x1 and x2
correlated positively, 2. x1 has a negative effect, while x2 has a positive
effect on y (opposite signs), and 3. the effect of x1 is much larger in size
than the effect of x2.

reg2-lm(y~x2)
summary(reg2)


Hence, if we accounted for x1 and x2 in our regression of y, both
coefficients should be consistently estimated because then we do not suffer
from the omission of important predictors of y that are correlated among
each other.

reg3-lm(y~x1+x2)
summary(reg3)

Taahtaah. Problem solved (most likely). So the answer to your question is
that the correct coefficient is likely the one in which you include the
other control variables. You should read up on omitted variable bias. If
that is not the problem, you have to give us more information/reproducible
code.

Hope that helps,
Daniel

--
View this message in context: 
http://r.789695.n4.nabble.com/Linear-multivariate-regression-with-Robust-error-tp3587531p3589083.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Counting the Number of Letters in a row

2011-06-10 Thread Weidong Gu
Is this what you want?
sapply(sapply(strsplit(as.character(f1$keyword),' '),nchar),sum)

Weidong Gu

On Fri, Jun 10, 2011 at 1:25 PM, Jorge Ivan Velez
jorgeivanve...@gmail.com wrote:
 Hi Abraham,

 Try

 foo - function(x){
 x - as.character(x)
  sapply(strsplit(x,  ), function(s) sum(nchar(s)))
 }
 foo(f1$keyword)

 HTH,
 Jorge


 On Fri, Jun 10, 2011 at 12:48 PM, Abraham Mathew  wrote:

 I'm trying to find the total number of letters in a row of a data frame.

 Let's say I have the following data frame.

 f1 - data.frame(keyword=c(I live in Denver, I live in Kansas City, MO,
 Pizza is good))

 The following function gives me the number of characters in each string.
 So for I live in Denver, I get 1, 4, 2, and 6. However, I want to know
 the
 total
 number of characters (13).

 sapply(strsplit(as.character(f1$keyword),  ), nchar)


 Thanks,
 Abraham

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Test if data uniformly distributed (newbie)

2011-06-10 Thread Kairavi Bhakta
Thanks for your answer. The reason I want the data to be uniform: It's the
first step in a machine learning project I am working on. If I know the data
isn't uniformly distributed, then this means there is probably something
wrong and the following steps will be biased by the non-uniform input data.
I'm not checking an assumption for another statistical test.

Actually, the data has been normalized because it is supposed to represent a
probability distribution. That's why it sums to 1. My assumption is that,
for a vector of 5, the data at that point should look like 0.20 0.20 0.20
0.20 0.20, but of course there is variation, and I would like to test
whether the data comes close enough or not.

At the moment I am only testing whether there are more a's than b's in the
top and bottom portion of the each file (with a wilcoxon test, I have 8 reps
of the model I am trying to build). But that sort of felt like a very adhoc
solution and I figured maybe testing for uniformity would be better, or at
least a important addition. I've also been looking into testing for the
randomness of the sequence of a's and b's instead of the wilcoxon test,
although that may or may not involve R.

Kairavi.


 Yes, punif is the function to use, however the KS test (and the others)
are based on an assumption of independence, and if you know that your data
points sum to 1, then they are not independent (and not uniform if there are
more than 2).  Also note that these tests only rule out distributions (with
a given type I error rate), but cannot confirm that the data comes from a
given distribution (just that either they do, or there is not enough power
to distinguish between the actual and the test distributions).

 What is your ultimate question/goal?  Why do you care if the data is
uniform or not?

 --
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 greg.s...@imail.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599#
 801.408.8111


[Hide Quoted Text]
-Original Message-
From: 
r-help-boun...@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599#[mailto:
r-help-bounces@r-https://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599#
project.org] On Behalf Of Kairavi Bhakta
Sent: Friday, June 10, 2011 11:24 AM
To: 
r-help@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599#
Subject: [R] Test if data uniformly distributed (newbie)

Hello,

I have a bunch of files containing 300 data points each with values from 0
to 1 which also sum to 1 (I don't think  the last element is relevant
though). In addition, each data point is annotated as an a or a b.

I would like to know in which files (if any) the data is uniformly
distributed.

I used Google and found out that a Kolmogorov-Smirnov or a Chi-square
goodness-of-fit test could be used. Then I looked up ?kolmogorov and found
ks.test, but the example there is for the normal distribution and I am not
sure how to adapt it for the uniform distribution. I did ?runif and read
about the uniform distribution but it doesn't say what the cumulative
distribution is. Is it punif, like pnorm? I thought of that because I
found a message on this list where someone was told to use pnorm instead
of dnorm. But the help page on the uniform distribution says punif is the
distribution function. Are the cumulative distribution and the
distribution function the same thing? Having several names for the same
thing has always confused me very much in statistics.

Also, I am not sure whether I need to specify any parameters for the
distribution and which. I thought maybe I should specify min=0 and max=1
but those appear to be the defaults. Do I need to specify q, the vector
of quantiles?

So is
ks.test(x, punif)
correct or not for what I am attempting to do?
After this I will also need to find out whether the a's and b's are
distributed randomly in each file. I would be greatful for any pointers
although I have not researched this issue yet.

Kairavi.

[[alternative HTML version deleted]]

__
R-help@r-project.orghttps://webmail.uni-saarland.de/imp/message.php?mailbox=INBOXindex=81599#mailing
list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/postinghttp://www.r-project.org/posting
-
guide.html
and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Help with Median test and Coxon-Mann-Whittney test

2011-06-10 Thread Robert Johnson
Hi All,

I have the following dataset (triplicates values of 5 independent
measurements and duplicate vaues of a control):

12  3   4   5  C
181.8  58.2 288.9 273.2290.953.9
120.3 116.8108.9 281.3 446  39.6
86.1  148.5 52.9  126   150.3

My aim was to find if mean values of Samples 1 - 5 are significantly higher
than the mean value for C (control).

 At first, I carried out mean, SD and t-test (one-tail). Although SD error
bars are large, two of the samples have mean values that are significantly
higher than that of C.

Second, I carried out median and Coxon-Mann-Whittney test because of my
worry about the size of my data and the high variations in the replicates. I
was surprised however, that median and Coxon-Mann-Whittney tests did not
reveal statistical significant results.

I will be happy if anyone could adivce me on the best way to analyse this
dataset.

Regards,

Rob

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] change value in one cell

2011-06-10 Thread jour4life
Hello all,

I am wondering if there is a way to change the value of one cell in R. For
instance let's say I have a hypothetical data frame that looks like this:

Obs X Y Z
11 0 1
20 0 1
31 1 1
40 1 1
51 1 0
60 0 0

I would like to change the value of the 4th observation in the Y column from
1 to 0. It should look like this:

Obs X Y Z
11 0 1
20 0 1
31 1 1
40 0 1
51 1 0
60 0 0

Is it possible to change the value in one command?

Thanks,

Carlos


--
View this message in context: 
http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3589456.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RES: Linear multivariate regression with Robust error

2011-06-10 Thread Filipe Leme Botelho
---BeginMessage---
Hi Barjesh,

I am not sure which data you analyze, but once I had a similar situation
and it was a multicolinearity issue. I realized this after finding a
huge correlation between the independent variables, then I dropped one
of them and signs of slopes made sense.

Beforehand, have a glance at your correlation matrix of independent
variables to see the relationship between them.

Not sure how helpful my advice will be, but it did the trick for me back
then ;)

Cheers,
Filipe Botelho

-Mensagem original-
De: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
Em nome de Michael Friendly
Enviada em: sexta-feira, 10 de junho de 2011 12:51
Para: Barjesh Kochar
Cc: r-help@r-project.org
Assunto: Re: [R] Linear multivariate regression with Robust error

On 6/10/2011 12:23 AM, Barjesh Kochar wrote:
 Dear all,

 i am doing linear regression with robust error  to know the effect of
 a  (x) variable on (y)other if i execute the command i found positive
 trend.
   But if i check the effect of number of (x.x1,x2,x3)variables
 on same (y)variable then the positive effect shwon by x variable turns
 to negative. so plz help me in this situation.

 Barjesh Kochar
 Research scholar

You don't give any data or provide any code (as the posting guide
requests) , so I have to guess that you
have just rediscovered Simpson's paradox -- that the coefficient of a
variable in a marginal regression can have an opposite sign to that in
a joint model with other predictors. I have no idea what you mean
by 'robust error'.

One remedy is an added-variable plot which will show you the partial
contributions of each predictor in the joint model, as well as whether
there are any influential observations that are driving the estimated
coefficients.


--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
---End Message---
This message and its attachments may contain confidential and/or privileged 
information. If you are not the addressee, please, advise the sender 
immediately by replying to the e-mail and delete this message.

Este mensaje y sus anexos pueden contener información confidencial o 
privilegiada. Si ha recibido este e-mail por error por favor bórrelo y envíe un 
mensaje al remitente.

Esta mensagem e seus anexos podem conter informação confidencial ou 
privilegiada. Caso não seja o destinatário, solicitamos a imediata notificação 
ao remetente e exclusão da mensagem.__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Memory(RAM) issues

2011-06-10 Thread Abhisek Saha
Dear All,
I have been working with R(desktop version) in VISTA. I have the latest
version R 2.13.0. I have been working with few data-sets of size 12Lakh * 15
and my code is quite computing intensive ( applying MCMC gibbs sampler on a
posterior of 144 variables) that often runs into a memory issue such as
memory can not allocate the vector ,full size(shows to have reached
something like 1.5 GB) reached or something to this effect. I have a RAM of
2GB.  I checked with the option like memory.size and it says a 64 bit R if
sat on 64 bit windows then max memory capability is 8TB.

Now I don't have  background to understand the definitions and differences
between 32 and 64 bit machines and other technical requirements like servor
etc but it would be of great help if anyone can let me have a feel of it.
Could any of you tell me whether some servor version of R would resolve my
issue or not (I am not sure now what kind of servor my company would allow R
to be installed at this point ,may be linux type) and if that's the case
could any of you guide me about how to go about installing that onto a
sevor.

Thank you,
Abhisek

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Is there an implementation loess with more than 4 parametric predictors or a trick to similar effect?

2011-06-10 Thread Dr. D. P. Kreil (Boku)
Dear R experts,

I have a problem that is a related to the question raised in this earlier
post
https://stat.ethz.ch/pipermail/r-help/2007-January/124064.html

My situation is different in that I have only 2 predictors (coordinates x,y)
for local regression but a number of global (parametric) offsets that I
need to consider.

Essentially, I have a spatial distortion overlaid over a number of
measurements. These measurements can be grouped by the same underlying value
for each group. The groups are known, but the values are not. We need to
estimate the spatial trend, which we then want to remove. In our
application, the spatial trend is two-dimensional (x,y), and there are about
20 groups of about 50 measurements each. The measurements are randomly
placed. Taking the first group as reference, there are thus 19 unknown
offsets.

The below code for toy data (spatial trend in one dimension x) works for two
or three offset groups, although I have not yet found out how to extract the
fitted values for the globally fit parameters (the offsets).

Unfortunately, the loess call fails for four or more offset groups with the
error message
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square,
normalize,  :
  only 1-4 predictors are allowed

Does anyone know of an implementation of local regression with global
(parametric) offset groups that could be applied here, or is there a better
way of dealing with this?

Any comments would be greatly appreciated!

Best regards,
David Kreil.



###
#
# loess with parametric offsets - toy data demo
#

x-seq(0,9,.1);
x.N-length(x);

o-c(0.4,-0.8,1.2#,-0.2  # works for three but not four
 );  # these are the (unknown) offsets
o.N-length(o);
f-sapply(seq(o.N),
  function(n){
ifelse((seq(x.N)= n   *x.N/(o.N+1) 
seq(x.N) (n-1)*x.N/(o.N+1)),
1,0);
  });
f-f[sample(NROW(f)),];

y-sin(x)+rnorm(length(x),0,.1)+f%*%o;
s.fs-sapply(seq(NCOL(f)),function(i){paste('f',i,sep='')});
s-paste(c('y~x',s.fs),collapse='+');
d-data.frame(x,y,f)
names(d)-c('x','y',s.fs);

l-loess(formula(s),parametric=s.fs,drop.square=s.fs,normalize=F,data=d,
 span=0.4);
yp-predict(l,newdata=d);

plot(x,y-f%*%o,ylim=c(-3,3)); # spatial distortion
lines(x,yp-f%*%o,pch='+');# estimate of that
points(x,y,pch='+',col='red');# input data
points(x,yp,pch='o',col='blue');  # fit of that

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change value in one cell

2011-06-10 Thread jim holtman
you probably need to read the Introduction to R to understand indexing:

 x
  Obs X Y Z
1   1 1 0 1
2   2 0 0 1
3   3 1 1 1
4   4 0 1 1
5   5 1 1 0
6   6 0 0 0
 x[4, Y] - 0
 x
  Obs X Y Z
1   1 1 0 1
2   2 0 0 1
3   3 1 1 1
4   4 0 0 1
5   5 1 1 0
6   6 0 0 0


On Fri, Jun 10, 2011 at 5:42 PM, jour4life jour4l...@gmail.com wrote:
 Hello all,

 I am wondering if there is a way to change the value of one cell in R. For
 instance let's say I have a hypothetical data frame that looks like this:

 Obs X Y Z
 1    1 0 1
 2    0 0 1
 3    1 1 1
 4    0 1 1
 5    1 1 0
 6    0 0 0

 I would like to change the value of the 4th observation in the Y column from
 1 to 0. It should look like this:

 Obs X Y Z
 1    1 0 1
 2    0 0 1
 3    1 1 1
 4    0 0 1
 5    1 1 0
 6    0 0 0

 Is it possible to change the value in one command?

 Thanks,

 Carlos


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/change-value-in-one-cell-tp3589456p3589456.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] running R commands asynchronously

2011-06-10 Thread Greg Snow
Tk windows run asynchronous to the rest of R, so you could write a small 
function that lets you type the command into a Tk window and runs it while you 
continue to work in the main R, then the window could signal you when the 
function finishes.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Richard M. Heiberger
Sent: Friday, June 10, 2011 3:30 PM
To: r-help
Subject: [R] running R commands asynchronously

I am interested in running R commands asynchronously.

My first choice is in the same R session that I am currently in.
Here, the goal would be to run something like

 RunAsynchSameSession(myfunction(), outputname.rda)

Once RunAsynchSameSession had started myfunction(),
RunAsynchSameSession would complete immediately.  myfunction would
keep going.  It is OK if execution of the myfunction() command
prevents new input to R until it has completed.  The important feature
is that RunAsynchSameSession must tell the progam that called it that
it was done.

Second choice is to start an independent process, BATCH or something
similar, and save the resulting data objects in an .rda file.

 RunAsynchBatch(myfile.r, outputname.rda)

The RunAsynchBatch would start a batch process and complete
immediately after starting the batch file.  The batch file would run
independently until it was completed.  While I know how to do this,
for example with system(wait=FALSE), I would appreciate seeing a
worked collection of statements, including getting outputname.rda back
to the original R session.  I am working on Windows.

Rich

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] R program writing standard/practices

2011-06-10 Thread Rolf Turner

On 11/06/11 03:35, Santosh wrote:

Dear Experts,

I notice that there are different ways of writing programs. Was wondering if
there is anything like a standard which could be used to write good/complete
R programs, maintain quality, easy to debug, a standard/practice that can be
consistent in an enterprise environment. Also, are there any beacon lights
that could help navigate through lines of code, programming styles,
understand the scope of a program, identify gaps etc?


This issue was recently discussed on this list.  One suggested source of 
info

was

http://4dpiecharts.com/r-code-style-guide/


cheers,

Rolf Turner

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] NDCG in R?

2011-06-10 Thread Jim Cheng
Dose R have a function to calculate NDCG?
http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG

Thanks!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] problems with geom_vline, histograms, scale=free and facets in ggplot

2011-06-10 Thread Julian Burgos
Dear list,

I´m having a little trouble with adding vertical lines to a histogram.
 I need to draw a matrix of histograms (using facets), and in each
histogram add a vertical line indicating the mean value of the data in
each facet.  According to the last example in the geom_hline help, to
display different lines in each facet I need to provide a data frame.
I modified this example to make a plot similar to what I need:


# BEGIN R CODE 
# This example works and does exactly what I need
library(reshape)
library(ggplot2)

# First, create a data frame with the mean value, to add to each facet
mean.data=tapply(mtcars$wt,INDEX=list(mtcars$vs,mtcars$am),mean)
mean.data=melt(mean.data)
colnames(mean.data)=c(vs,am,wt.mean)

# Now do the plot
p - qplot(wt, data=mtcars)
p - p + facet_grid(vs ~ am,scale=free)
p - p + geom_vline(aes(xintercept = wt.mean), mean.data,col=red)
p
###

###
# This one does not work
# Create a test data set
set.seed(100)
my.data=data.frame(code1=rep(c(A,B,C),each=30),
  code2=rep(rep(c(D,E,F),each=10),3),
  n=runif(90))

# Create a data frame with the mean value, to add to each facet
mean.data=tapply(my.data$n,INDEX=list(my.data$code1,my.data$code2),mean)
mean.data=melt(mean.data)
colnames(mean.data)=c(code1,code2,n.mean)

# Now do the plot
p = qplot(n, data=my.data) +
  facet_grid(code2 ~ code1, scales=free)
p + geom_vline(aes(xintercept = n.mean), mean.data,col=red)

#With this I get the following error: Error in if (length(range) == 1
|| diff(range) == 0) { :   missing value where TRUE/FALSE needed.

# If I remove the scales=free argument. I get the lines but also a
lot of extra empty facets.

p = qplot(n, data=my.data) +
  facet_grid(code2 ~ code1)
p + geom_vline(aes(xintercept = n.mean), mean.data,col=red)

# END R CODE ##

I can´t figure out why the second example does not work when having
the scales=free argument, and why I get the extra facets when
removing it.  Is this a bug?  Any help will be very welcomed.

Julian

--
Julian Mariano Burgos
Hafrannsóknastofnunin/Marine Research Institute
Skúlagata 4, 121 Reykjavík, Iceland
Sími/Telephone : +354-5752037
Bréfsími/Telefax:  +354-5752001
Netfang/Email: jul...@hafro.is, jmbur...@uw.edu

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Memory(RAM) issues

2011-06-10 Thread Anupam
It will be helpful on this forum to use metric measures: 12 Lakh is 1.2
million, thus your data is 1.2 million observations x 15 variables. I do not
know the intricacies of R. You may have to wait for someone with that
knowledge to respond.

Including some relevant portions of error messages and code in your query
can also help someone to respond to your message.

Anupam.
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Abhisek Saha
Sent: Saturday, June 11, 2011 6:25 AM
To: r-help@r-project.org
Subject: [R] Memory(RAM) issues

Dear All,
I have been working with R(desktop version) in VISTA. I have the latest
version R 2.13.0. I have been working with few data-sets of size 12Lakh * 15
and my code is quite computing intensive ( applying MCMC gibbs sampler on a
posterior of 144 variables) that often runs into a memory issue such as
memory can not allocate the vector ,full size(shows to have reached
something like 1.5 GB) reached or something to this effect. I have a RAM of
2GB.  I checked with the option like memory.size and it says a 64 bit R if
sat on 64 bit windows then max memory capability is 8TB.

Now I don't have  background to understand the definitions and differences
between 32 and 64 bit machines and other technical requirements like servor
etc but it would be of great help if anyone can let me have a feel of it.
Could any of you tell me whether some servor version of R would resolve my
issue or not (I am not sure now what kind of servor my company would allow R
to be installed at this point ,may be linux type) and if that's the case
could any of you guide me about how to go about installing that onto a
sevor.

Thank you,
Abhisek

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I make proper use of the by() function?

2011-06-10 Thread Anupam
You may want to read about several apply functions. They may help you do
what you are trying.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Fredrik Karlsson
Sent: Saturday, June 11, 2011 3:13 AM
To: R-Help List
Subject: [R] How do I make proper use of the by() function?

Dear list,

I have a function that uses values from two vectors  and spits out one new
vector  based on all the values of the two original vectors, and with the
same length as them.
Ok, I would now like to be able to apply that function simply on two columns
in a data frame, divided by the levels of factors in the data frame.

I'm trying to use by() for this, but the output is too hard to use. What I
am doing is this:

by(df, list(df$Factor1,df$Factor2),function(x)
my_function(x$col1,x$col2),simplify=TRUE)

and the output is too complex to be used in a simple way. Actually, I just
want something like a data frame, where the results vectors are placed in
one column and the conditions under which they were produced (i.e. the
values of the factors according to which the data set were divided) in other
columns.

This does not seem to be provided by by(), and aggregate only provides me
with the ability to use values from one column, right?
So, are there other functions I could use?

Thanks!

/Fredrik

--
Life is like a trumpet - if you don't put anything into it, you don't get
anything out of it.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] NDCG in R?

2011-06-10 Thread Dennis Murphy
Hi:

I couldn't find one by searching with RSiteSearch() or package sos,
but it's not hard to write a simple function that reproduces the
example on the Wikipedia page:

ndcg - function(x) {
  # x is a vector of relevance scores
  ideal_x - rev(sort(x))
  DCG - function(y) y[1] + sum(y[-1]/log(2:length(y), base = 2))
  DCG(x)/DCG(ideal_x)
 }

rel - c(3,2,3,0,1,2)  # example from Wikipedia
ndcg(rel)
[1] 0.9315085

The difference between this and the result on the Wikipedia page is
due to rounding. You can see this by pulling out the DCG function and
evaluating it separately:

 DCG - function(y) y[1] + sum(y[-1]/log(2:length(y), base = 2))
 DCG(rel)
[1] 8.097171
 DCG(rev(sort(rel)))
[1] 8.692536

The Wikipedia page clearly indicates that there are several
considerations that need to be addressed in the generation and
processing of relevance scores. The above function obviously does not
cover all the bases, but it does cover the situation corresponding to
the given example. It's a start...

HTH,
Dennis

On Fri, Jun 10, 2011 at 6:14 PM, Jim Cheng jxch...@gmail.com wrote:
 Dose R have a function to calculate NDCG?
 http://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG

 Thanks!

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] running R commands asynchronously

2011-06-10 Thread Prof Brian Ripley

On Fri, 10 Jun 2011, Greg Snow wrote:

Tk windows run asynchronous to the rest of R, so you could write a 
small function that lets you type the command into a Tk window and 
runs it while you continue to work in the main R, then the window 
could signal you when the function finishes.


That is really not advised.  R is not designed to run asynchronous R 
commands and is not protected against quite a lot of things which can 
happen if you try that (many low-level routines are not re-entrant, 
for example).  Allowing callbacks to either run simple things (as 
happens from e.g. the GUI menus) whilst a task is running, or to do 
almost all the running (as happens with a menu system built in Tk) is 
fairly safe just because only one task is likely to run at a time.


For over a decade Duncan TL has promised a facility to run tasks in 
parallel in R (as I recall the estimate at DSC 2001 was 12 months).


So the only safe way at present (and the foreseeable future) is to run 
separate processes.  Packages snow and multicore provide means to do 
that (and to wait for and collect results): in the case of multicore 
the parallel tasks work on a copy of the current session and so you do 
come close to the appearance of running aysnchronous tasks.  (By 
choosing to use Windows you exclude yourself from multicore.)




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf Of Richard M. Heiberger
Sent: Friday, June 10, 2011 3:30 PM
To: r-help
Subject: [R] running R commands asynchronously

I am interested in running R commands asynchronously.

My first choice is in the same R session that I am currently in.
Here, the goal would be to run something like

RunAsynchSameSession(myfunction(), outputname.rda)

Once RunAsynchSameSession had started myfunction(),
RunAsynchSameSession would complete immediately.  myfunction would
keep going.  It is OK if execution of the myfunction() command
prevents new input to R until it has completed.  The important feature
is that RunAsynchSameSession must tell the progam that called it that
it was done.

Second choice is to start an independent process, BATCH or something
similar, and save the resulting data objects in an .rda file.

RunAsynchBatch(myfile.r, outputname.rda)

The RunAsynchBatch would start a batch process and complete
immediately after starting the batch file.  The batch file would run
independently until it was completed.  While I know how to do this,
for example with system(wait=FALSE), I would appreciate seeing a
worked collection of statements, including getting outputname.rda back
to the original R session.  I am working on Windows.

Rich


--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.