Re: [R] panel.first problem when plotting with formula

2011-05-25 Thread Gene Leynes
Peter,

Good idea!  (why didn't I think of that?)

If it stumped the R list, I think there is probably a slight bug in the
formula method for plot.

Problems like this make me realize how amazingly full featured and
relatively bug free R is.  A problem like this would never happen in Excel,
because this level of functionality does not exist.  However, if it did, it
would probably never be fixed... and the same goes for any commercial
software, not just Excel.

Gene


On Tue, May 24, 2011 at 3:13 AM, Peter Ehlers ehl...@ucalgary.ca wrote:

 On 2011-05-23 16:54, Gene Leynes wrote:

 I wrote a little function called bgfun that adds gridlines and a
 background, but it's not working when I plot using the formula.

 I have some theories on what's happening, but even if my theory is right,
 I don't know how to fix it.

 Does anyone have a straightforward silver bullet?


 No silver bullet, but this seems to work:

  plot(y ~ x, data=dat, type="n")
  points(y ~ x, data=dat, panel.first=bgfun())

 (I think that plot.formula may need a fix but
 offhand I can't see whether that's easy or hard.)

 Peter Ehlers


 Thank you,

 Gene



 bgfun = function(color='honeydew2', linecolor='grey45', addgridlines=TRUE){
   tmp = par("usr")   # current plot limits: x1, x2, y1, y2
   rect(tmp[1], tmp[3], tmp[2], tmp[4], col=color)
   if(addgridlines){
     ylimits = par()$usr[c(3,4)]
     abline(h=pretty(ylimits,10), lty=2, col=linecolor)
   }
 }
 dat = data.frame(x=1:10,y=1:10)

 ## Works
 plot(dat$x, dat$y, panel.first=bgfun())

 ## Why doesn't this work?
 plot(y ~ x, data=dat, panel.first=bgfun())
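One plausible explanation, offered as an assumption rather than a confirmed diagnosis: plot.default() only evaluates panel.first after it has set up the new plot's coordinates, whereas the formula method of that era evaluated its extra arguments up front, so bgfun() ran before the new plot existed. The toy functions below (record, lazy_plot and eager_plot are hypothetical names, not part of R) only illustrate the ordering difference between a lazily forced argument and one collected with list(...):

```r
events <- character(0)
record <- function(what) events <<- c(events, what)

# Lazy, like plot.default: the panel.first promise is only forced
# after the coordinate system has been set up
lazy_plot <- function(panel.first = NULL) {
  record("set up coordinates")
  panel.first
  record("draw points")
  invisible(NULL)
}

# Eager: collecting ... with list() forces every argument immediately,
# before any coordinates exist
eager_plot <- function(...) {
  dots <- list(...)
  record("set up coordinates")
  record("draw points")
  invisible(NULL)
}

lazy_plot(panel.first = record("draw background"))
lazy_order <- events    # background drawn between setup and points

events <- character(0)
eager_plot(panel.first = record("draw background"))
eager_order <- events   # background drawn before setup
```

If that is the cause, any call that reaches plot.default directly, such as with(dat, plot(x, y, panel.first = bgfun())), should keep the background behind the points, as does Peter's type = "n" workaround.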


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.






[R] plotting texas school district using shape files

2011-05-25 Thread Shant Ch
Hi,

I am trying to create a map of Texas school districts using a shapefile 
of Texas. I could not find any helpful posts in the mailing list archives.


txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))

Error: read.shape not found. But read.shape is in maptools. 

If anyone can help me out, that would be great. Thanks in advance.

Shant


[R] Processing large datasets

2011-05-25 Thread Roman Naumenko
Hi R list,

I'm new to R, so I'd like to ask about its capabilities.
What I'd like to do is run some statistical tests on quite large 
tables of aggregated quotes from a market feed.

This is a typical set of data.
Each day contains millions of records (up to 10 unfiltered).

2011-05-24  750  Bid  DELL  14130770  400  15.4800  BATS  35482391  Y  1  1  0  0
2011-05-24  904  Bid  DELL  14130772  300  15.4800  BATS  35482391  Y  1  0  0  0
2011-05-24  904  Bid  DELL  14130773  135  15.4800  BATS  35482391  Y  1  0  0  0

I'll need to filter it first based on some criteria.
Since I keep it in a MySQL database, that can be done with a query. It is not 
super efficient; I've checked that already.

Then I need to aggregate the dataset into different time frames (time is 
represented in ms from midnight, like 35482391).
Again, that can be done with a database query; I'm not sure which is going to be faster.
The aggregated tables are going to be much smaller, like thousands of rows per 
observation day.

Then I need to calculate basic statistics: mean, standard deviation, sums etc.
After the stats are calculated, I need to perform some statistical 
hypothesis tests.
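A minimal base-R sketch of the aggregation and summary steps, assuming the filtered rows are already in a data frame with time stored as milliseconds after midnight. The column names and values are made up (loosely based on the sample rows), and one-minute bins are an arbitrary choice; for data that do not fit in memory, this step is better pushed into the database:

```r
# Hypothetical filtered quotes: time in ms after midnight, price and size
quotes <- data.frame(ms    = c(35482391, 35482391, 35540000, 35600100),
                     price = c(15.48, 15.48, 15.49, 15.47),
                     size  = c(400, 300, 135, 200))

bin_ms <- 60 * 1000                      # one-minute bins
quotes$minute <- quotes$ms %/% bin_ms    # integer division: minute of the day

# Per-bin statistics; tapply() and table() both order groups by sorted bin value
per_minute <- data.frame(
  minute     = sort(unique(quotes$minute)),
  mean_price = as.vector(tapply(quotes$price, quotes$minute, mean)),
  sd_price   = as.vector(tapply(quotes$price, quotes$minute, sd)),
  total_size = as.vector(tapply(quotes$size,  quotes$minute, sum)),
  n          = as.vector(table(quotes$minute))
)
per_minute
```

The resulting table is small enough that the hypothesis tests can then be run on it directly in R.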

So, my question is: which tool is faster for data aggregation and filtering 
on big datasets, MySQL or R?

Thanks,
--Roman N.



Re: [R] How to call an external program/web page under R for Mac OS?

2011-05-25 Thread Paul Hiemstra
 On 05/24/2011 10:56 PM, jbrezmes wrote:
 I would like to be able to call external programs, such as Java programs (*.jar
 files), or open the browser at a given address. Can that be done from
 R?

 I am running R on a Mac OS X system.

 Thanks again for any suggestions or solutions.

 Best regards,

 Jesus Brezmes

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-call-an-external-program-web-page-under-R-for-Mac-OS-tp3548479p3548479.html
 Sent from the R help mailing list archive at Nabble.com.

See ?system for executing external calls...
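A short sketch of what that looks like, using system2() (a newer interface alongside system()) and browseURL() from utils. The echo command is only a stand-in for a real program, and the .jar path and URL are placeholders left commented out, so the sketch assumes nothing beyond a Unix-like shell such as the one on Mac OS X:

```r
# Run an external command and capture what it prints to stdout
out <- system2("echo", args = "hello from the shell", stdout = TRUE)
print(out)

# A Java archive can be launched the same way; wait = FALSE returns to R
# immediately (the path below is a hypothetical placeholder):
# system2("java", args = c("-jar", shQuote("/path/to/program.jar")), wait = FALSE)

# Open a page in the default web browser:
# browseURL("https://www.r-project.org/")
```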

cheers,
Paul

-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770



[R] questions about rpart

2011-05-25 Thread carol white
Hi,
I have applied rpart to my data set, and for cp=.01 the cross-validation error 
(xerror) is lower (min 0.05) than for other values of cp. However, in the final tree, an 
important predictor is not retained. Moreover, another predictor contains 
missing values in 40% of samples. So I don't know whether the important predictor is 
not retained as a result of the missing values or whether I should have selected other 
values of cp. Note that the response is a binary class.

My other question is how to interpret the relative or 
cross-validation error in terms of, e.g., the number of samples. I know that they are 
scaled to 1 at the root node of the tree, but for a given number of splits, how much 
error do we make per sample? (printcp does not report the number of samples in each 
split.)

Any other information is welcome.

Look forward to your reply,

Carol
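On interpreting the relative errors: printcp() reports a root node error, and both rel error and xerror are scaled relative to it, so multiplying by the root node error turns them back into approximate error rates per observation (for method = "class" with the default losses, misclassification rates). A sketch on the kyphosis data that ships with rpart, used here as a stand-in for the original data; xerror comes from random cross-validation folds, hence set.seed():

```r
library(rpart)

set.seed(42)  # cross-validation folds are random, so xerror varies between runs
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")
printcp(fit)

# Root node error: misclassification rate of always predicting the majority class
root_err <- 1 - max(table(kyphosis$Kyphosis)) / nrow(kyphosis)

# Convert the relative errors back into error rates per observation
cp_tab <- fit$cptable
abs_err <- data.frame(nsplit     = cp_tab[, "nsplit"],
                      train_rate = cp_tab[, "rel error"] * root_err,
                      cv_rate    = cp_tab[, "xerror"] * root_err)
abs_err
```

For the missing-value question, summary(fit) lists the surrogate splits rpart uses for observations with missing predictor values, as well as competitor splits, which can show whether the "important" predictor narrowly lost out at a node.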



Re: [R] plotting single variables common to multiple data frames

2011-05-25 Thread Mathew Brown



Hi John,
First off, thanks again for your help with this. Much appreciated.

I've attached a file of the original data (yes, as you can see, there are
header names). These hour-long files are zipped together on a computer
(which is actually an analyzer) and sent to a server each morning. I
then run the code below, which extracts the data files and binds them
into daily files. I then save these files in the .RData format (the
'stuff' file I sent you). I agree that the way I am saving these files
must be causing the problem. I'm very new to R and this was the only way
I found to save the files that 'seemed' to work. Please tell me if you
know a better way (I'm sure you do)!

###Function used to extract and save##
# this function extracts 1 hour iso data from zip files and creates daily files

#
path = "Y:\\Data\\"
pathout = "Y:\\Daily\\"
time = Sys.time()
tt <- as.numeric(format.Date(time, "%Y%m%d"))
#tt <- 20110523 #used to manually enter in date
end = ".zip"
ind = paste(path, tt, end, sep="")
xx = unzip(ind)

#merge all files to one and split by day
iso <- c(); d <- c()
for (x in xx) {
  u <- read.table(x, header=TRUE, sep="", dec=".")
  u$dataset = x
  iso = rbind(iso, u)
  udate <- unique(iso$DATE)
  d = split(iso, iso$DATE)
}

#create directories and load if files exist. Then merge data from same day and save
fname <- c()
finame <- c()
old <- c()
udate = gsub("[^0-9]", "", udate)
for (i in 1:length(udate)){
  #fname[i] <- paste(pathout, udate[i], sep="")
  finame[i] = paste(pathout, udate[i], ".RData", sep="")
  deskdir <- dir.create(pathout, showWarnings=FALSE)
  if (file.exists(finame[i])){
    old = load(finame[i])
    e3 <- new.env()
    old <- get('isot', e3)
    isot = merge(old, d[i], all = TRUE)

    save(isot, file=finame[i])
  } else {
    isot = d[i]
    save(isot, file=finame[i])
  }

}
rm(list = ls(all = TRUE))
#End
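One detail in this script that could explain the strange structure John describes (a hedged guess, not a confirmed diagnosis): split() returns a list, and d[i] is still a one-element list, whereas d[[i]] is the data frame itself. When such a list is converted to a data frame, which merge() does to its arguments, the list element's name is pasted in front of every column name, giving names like X2011.05.20.TIME. A small illustration with made-up data:

```r
# Made-up stand-in for a few rows of analyzer output
iso <- data.frame(DATE = c("2011-05-20", "2011-05-20", "2011-05-21"),
                  TIME = c("10:00", "10:01", "00:00"))
d <- split(iso, iso$DATE)   # a list of data frames, one per day

class(d[1])      # "list": single brackets keep the list wrapper
class(d[[1]])    # "data.frame": double brackets extract the element

# Converting the wrapped version prefixes the day onto every column name
names(as.data.frame(d[1]))

# Saving the element itself keeps the original column names
isot <- d[[1]]
names(isot)
```

If that is the issue, saving and merging d[[i]] rather than d[i] should keep the original column names.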


Once I have these daily RData files (e.g. 20110520.RData), I'd like to be
able to grab any number of them and plot them all together. I'm trying
to get this process as streamlined as possible so I can come into
work each day and plot the data from the last week with 'a click of a
button'.
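For the "click of a button" step, a sketch of a hypothetical helper, load_days(), that loads the object saved as isot from each daily file into its own environment and stacks the days with rbind(). It assumes each file holds a plain data frame with the same columns in every file (i.e. saved from d[[i]] rather than d[i]); the demo writes two throwaway files to a temporary directory instead of the real daily folder:

```r
load_days <- function(files, objname = "isot") {
  days <- lapply(files, function(f) {
    e <- new.env()                 # keep each file's objects out of the workspace
    load(f, envir = e)
    get(objname, envir = e)
  })
  do.call(rbind, days)             # stack the days into one data frame
}

# Demo with two made-up daily files in a temporary directory
daily_dir <- file.path(tempdir(), "daily_demo")
dir.create(daily_dir, showWarnings = FALSE)
for (day in c("20110520", "20110521")) {
  isot <- data.frame(TIME = 1:3, value = c(1.5, 2.5, 3.5))
  save(isot, file = file.path(daily_dir, paste0(day, ".RData")))
}

# File names are dates, so sorting them sorts by day; tail() keeps the last week
files <- tail(sort(list.files(daily_dir, pattern = "\\.RData$", full.names = TRUE)), 7)
week <- load_days(files)
plot(week$value, type = "l")
```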

Thanks again!

Mat


On 5/24/2011 7:15 PM, John Kane wrote:

 Whoa, more data than I needed. I called the RData file from your dput results 
'stuff', so any commands referring to stuff refer to that file.

 You say "The structure is kind of strange" and I have to agree with you. As 
it stands I cannot get it to do anything. A str(stuff) command shows that it is a 
data.frame with 8258 obs. of 38 variables. However, it also shows a variable called 
X2011.05.20.TIME which is a Factor w/ 114230 levels, and this is patently nonsense.

 It is almost certainly something about the code you are using to 
load the data, or the original structure of the file, that is causing the problem.

 Simple commands like:
 names(stuff)
 stuff[1,1]
 stuff[,1]
 dim(stuff)

 are not working or are returning nonsense.


 I took the file, wrote it back out of R as a csv file and read it back in, and 
I seem to have something I can work with; of course, that does not mean it 
looks like your original data. See my code below.

 Some quick questions

 1. What is the format of the original data files?

 2. What commands are you using to read the data into R?
 Please supply the code.

 3.  Do the files actually have header names? It looks to me as if the read 
command thinks you have variable names at the top of the columns, but you 
don't, and so it's using the first row of data as the variable names.

 My steps
 #===
 #I took the stuff file and did a write.table on it,
 # storing the file as a text (or csv) file called mystuff
 #===
 write.table(stuff, file="c:/rdata/mystuff.csv",
             row.names=FALSE, sep=",", col.names=FALSE)
 #

 # I then read the data back into a new data frame, new.data
 #
 new.data <- read.csv("c:/rdata/mystuff.csv",
                      sep=",", header=FALSE)
 #

 #now commands like
   names(new.data)
  new.data[,1]
  dim(new.data)

 # are working the way we would expect.
 #=











 --- On Tue, 5/24/11, Mathew Brown mathew.br...@forst.uni-goettingen.de wrote:


 From: Mathew Brown mathew.br...@forst.uni-goettingen.de
 Subject: Re: [R] plotting single variables common to multiple data frames
 To: John Kane jrkrid...@yahoo.ca
 Cc: r-help@r-project.org
 Received: Tuesday, May 24, 2011, 10:38 AM
 Here is some data. Only one day, as
 two days were too big. The structure
 is kind of strange and I'm not sure how to 'grab' a single
 variable from it to plot. I would be happy if someone could
 tell me how to do that.

 Cheers


 On 5/24/2011 3:55 PM, John Kane wrote:

 

Re: [R] Count of rows while looping through data

2011-05-25 Thread Kenn Konstabel
An alternative approach would be to `split` the data frame by family,
then `lapply` a function selecting random row from each slice, and
then `rbind` it all together.

x = data.frame(family = rep(1:20,sample(2:5,20,replace=TRUE)), xyz=1)
randomrow <- function(x) x[sample(1:nrow(x),1),]

# step by step
x.split <- split(x, x$family)
x.rnd <- lapply(x.split, randomrow)
x.togetheragain <- do.call(rbind, x.rnd)

# or more concisely
do.call(rbind,  lapply(split(x, x$family), randomrow)  )

Best regards,

Kenn

On Wed, May 25, 2011 at 12:54 AM, Phil Spector
spec...@stat.berkeley.edu wrote:
 Jeanna -
    I can't imagine how you could solve this problem with a loop, but here's
 one way to solve it using R:

 First, I'll create a data frame with a family variable:

 x = data.frame(family = rep(1:20,sample(2:5,20,replace=TRUE)))

 Next, I'll number each family member within each family:

 x$seq = ave(x$family,x$family,FUN=seq)

 Now I'll choose a random number within each family:

 x$use = ave(x$family,x$family,FUN=function(x)sample(1:length(x),1))

 Finally, I'll select the family member whose sequence number matches the
 random number:

 answer = subset(x,seq == use)

 Hope this helps.  Take a look at the help page for the ave function
 to understand how it works.
                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu


 On Tue, 24 May 2011, Jeanna wrote:

 I have a data table with one column that indicates families, and
 subsequent columns with other characteristics. I want to randomly assign one
 member of each family to a separate table. My approach is to count the number
 of members, set up a random number generator, and pick the family member
 based on where they fall within the random number spectrum.

 Is there a way to count the number of family members as I loop through the
 whole table?

 Something like this:
 for (j in 1:15){
        if (x$family[j] == x$family[j+1]){
       count = count +1
 (which doesn't work)

 as I do the larger:
 for (i in 2:nrow(x.tab)){


 --
 View this message in context:
 http://r.789695.n4.nabble.com/Count-of-rows-while-looping-through-data-tp3547949p3547949.html
 Sent from the R help mailing list archive at Nabble.com.
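On the original question of counting members: the counts can be computed for all rows at once rather than inside a loop. A small sketch with a made-up family column; seq_along is used rather than seq because seq(n) with a single number n returns 1:n, which would go wrong for a family with only one member:

```r
x <- data.frame(family = c(1, 1, 2, 2, 2, 3, 3))

# Size of each row's family
x$size <- ave(x$family, x$family, FUN = length)

# Running position within the family (the count the loop was building)
x$member <- ave(x$family, x$family, FUN = seq_along)

# For a table sorted by family, rle() gives one run length per family
runs <- rle(x$family)
runs$lengths

x
```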



[R] help with tune.svm() e1071

2011-05-25 Thread Salih Tuna
Hi,
I am trying to use tune.svm in the e1071 package.
The command I use is

tobj <- tune.svm(labels, data= data, cost = 10^(1:2))

Should the last column of 'data' contain the labels as well? I want to
use the linear kernel, but it gives me the error
Error in model.frame.default(formula, data) : 'data' must be a data.frame,
not a matrix or an array

Do you know why this might happen?
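The error text comes from model.frame(), which formula-based fitting uses and which rejects a plain matrix as data. A hedged sketch, assuming the features sit in a numeric matrix and the classes in a separate vector (names and data are made up): put both in one data frame, make the labels a factor so that svm() does classification, and pass kernel = "linear" through tune.svm()'s ... argument. The e1071 call is guarded so the rest of the sketch runs without the package:

```r
# Hypothetical example data: numeric feature matrix plus a separate class vector
set.seed(1)
X <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("f1", "f2")))
labels <- factor(rep(c("a", "b"), each = 10))

# A plain matrix is refused by model.frame(); capture the error message
msg <- tryCatch(model.frame(labels ~ ., data = X), error = conditionMessage)

# Labels and features in one data frame; a factor response means classification
train <- data.frame(labels = labels, X)

if (requireNamespace("e1071", quietly = TRUE)) {
  tobj <- e1071::tune.svm(labels ~ ., data = train,
                          kernel = "linear", cost = 10^(1:2))
  print(summary(tobj))
}
```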

best,
salih



[R] Fw: questions about rpart - cont.

2011-05-25 Thread carol white
I forgot to mention that the cross-validation error cannot be decreased below 0.91. 
Note that for values of cp smaller than 0.01, the cross-validation error increases.

Is the cross-validation error a sum of squared errors, a relative error for a 
classification problem (method = "class" in the rpart function), or another type of error?

Is it possible to determine the true positives and false positives using rpart?
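On true and false positives: one way is to cross-tabulate the classes returned by predict(fit, type = "class") against the observed classes. A sketch on rpart's kyphosis data, treating "present" as the positive class (an arbitrary choice for illustration); these are training-set counts, so they will be more optimistic than cross-validated ones:

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Predicted class for each observation of the training data
pred <- predict(fit, type = "class")
conf <- table(observed = kyphosis$Kyphosis, predicted = pred)
conf

# Treat "present" as the positive class
tp <- conf["present", "present"]
fp <- conf["absent",  "present"]
fn <- conf["present", "absent"]
tn <- conf["absent",  "absent"]
c(sensitivity = tp / (tp + fn), false_positive_rate = fp / (fp + tn))
```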

Thanks



- Forwarded Message 
From: carol white wht_...@yahoo.com
To: r-h...@stat.math.ethz.ch
Sent: Wed, May 25, 2011 9:06:15 AM
Subject: questions about rpart

Hi,
I have applied rpart to my data set, and for cp=.01 the cross-validation error 
(xerror) is lower (min 0.05) than for other values of cp. However, in the final tree, an 
important predictor is not retained. Moreover, another predictor contains 
missing values in 40% of samples. So I don't know whether the important predictor is 
not retained as a result of the missing values or whether I should have selected other 
values of cp. Note that the response is a binary class.

My other question is how to interpret the relative or 
cross-validation error in terms of, e.g., the number of samples. I know that they are 
scaled to 1 at the root node of the tree, but for a given number of splits, how much 
error do we make per sample? (printcp does not report the number of samples in each 
split.)

Any other information is welcome.

Look forward to your reply,

Carol



Re: [R] RGL package installation problem on Centos

2011-05-25 Thread john herbert
Hi.
Thank you for your help. From your suggestions, I tried the following;

R CMD INSTALL --no-test-load rgl_0.92.798.tar.gz

This seemed to load and install (starting R and issuing library(rgl) did not
flag any problems).
But running the sphere example from rgl, it causes big problems :-)

# R
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
  Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(rgl)
> open3d()
[1] 1
> spheres3d(rnorm(10), rnorm(10), rnorm(10), radius=runif(10),
color=rainbow(10))
X Error of failed request:  GLXUnsupportedPrivateRequest
  Major opcode of failed request:  143 (GLX)
  Minor opcode of failed request:  16 (X_GLXVendorPrivate)
  Serial number of failed request:  75
  Current serial number in output stream:  80
 *** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
 1: .C(rgl_spheres, success = as.integer(FALSE), idata,
as.numeric(vertex), as.numeric(radius), NAOK = TRUE)
 2: rgl.spheres(x = c(0.506515614656334, -0.610549216480097,
1.08552683577513, 0.189935807154803, 1.3670636776769, 1.0181689602839,
-1.51133180077403, 1.41127485066926, 0.199668469858345, -1.22523054947931),
y = c(-0.323499291411831, -1.00507951141751, -0.901821819799205,
1.41189828512003, -0.131573335707317, -0.308459525548042, 1.50221794165404,
-0.154047787639801, 0.44717002689869, -0.93671163236924), z =
c(0.836709660070246, -0.251235618242673, -2.02289120416259,
0.499914144749108, -0.458094619767492, 1.48047512280956, 0.80987242929676,
-1.17963322744287, 0.81492625128413, 0.475181724036684), radius =
c(0.174093995941803, 0.75503840832971, 0.562892300076783, 0.541058518458158,
0.724675815086812, 0.828356854617596, 0.423405217472464, 0.540400178171694,
0.0765824350528419, 0.55016236170195), color = c("#FF0000FF", "#FF9900FF",
"#CCFF00FF", "#33FF00FF", "#00FF66FF", "#00FFFFFF", "#0066FFFF",
"#3300FFFF", "#CC00FFFF", "#FF0099FF"), alpha = 1, lit = TRUE, ambient =
"#000000", specular = "#FFFFFF", emission = "#000000", shininess = 50,
smooth = TRUE, front = "filled", back = "filled", size = 3, lwd = 1, fog
= FALSE, point_antialias = FALSE, line_antialias = FALSE, texture =
NULL, textype = "rgb", texmipmap = FALSE, texminfilter = "linear",
texmagfilter = "linear", texenvmap = FALSE)
 3: do.call(rgl.spheres, c(list(x = x, y = y, z = z, radius = radius),
.fixMaterialArgs(..., Params = save)))
 4: spheres3d(rnorm(10), rnorm(10), rnorm(10), radius = runif(10), color
= rainbow(10))
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Does this error message make anything clearer?





On Mon, May 23, 2011 at 2:43 PM, john herbert arraystrugg...@gmail.com wrote:

 Dear R users,
 I have installed the latest version of R from source on CentOS (using
 configure and make install).
 This seemed to work fine, with no errors reported, and typing R at the command
 line starts R.

 However, if I try to install the package rgl using
 install.packages("rgl")
 I get the following error:

 installing to /usr/local/lib64/R/library/rgl/libs
 ** R
 ** demo
 ** inst
 ** preparing package for lazy loading
 ** help
 *** installing help indices
 ** building package indices ...
 ** testing if installed package can be loaded
  *** caught segfault ***
 address (nil), cause 'memory not mapped'
 aborting ...
 sh: line 1: 23732 Segmentation fault  '/usr/local/lib64/R/bin/R'
 --no-save --slave  /tmp/RtmpkvIjOb/file6d97876
 ERROR: loading failed
 * removing ‘/usr/local/lib64/R/library/rgl’
 The downloaded packages are in
  ‘/tmp/Rtmp5OaGuQ/downloaded_packages’
 Updating HTML index of packages in '.Library'
 Making packages.html  ... done
 Warning message:
 In install.packages("rgl") :
   installation of package 'rgl' had non-zero exit status
 I read that the OpenGL header files have to be present, and they are in
 /usr/include/GL.
 I also read that different graphics cards can cause problems, but I don't
 know how to find out which one I have.

 Any help appreciated; the full error message is included below.

 Thanks,

  sessionInfo()
 R version 2.13.0 (2011-04-13)
 Platform: x86_64-unknown-linux-gnu (64-bit)
 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] 

Re: [R] RGL package installation problem on Centos

2011-05-25 Thread Duncan Murdoch

On 11-05-25 6:08 AM, john herbert wrote:

Hi.
Thank you for your help. From your suggestions, I tried the following;

R CMD INSTALL --no-test-load rgl_0.92.798.tar.gz

This seemed to load and install (starting R and issuing library(rgl) did not
flag any problems).
But running the sphere example from rgl, it causes big problems :-)

# R
R version 2.13.0 (2011-04-13)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
   Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rgl)
> open3d()
[1] 1
> spheres3d(rnorm(10), rnorm(10), rnorm(10), radius=runif(10),
color=rainbow(10))
X Error of failed request:  GLXUnsupportedPrivateRequest
   Major opcode of failed request:  143 (GLX)
   Minor opcode of failed request:  16 (X_GLXVendorPrivate)
   Serial number of failed request:  75
   Current serial number in output stream:  80
  *** caught segfault ***
address (nil), cause 'memory not mapped'
Traceback:
  1: .C(rgl_spheres, success = as.integer(FALSE), idata,
as.numeric(vertex), as.numeric(radius), NAOK = TRUE)
  2: rgl.spheres(x = c(0.506515614656334, -0.610549216480097,
1.08552683577513, 0.189935807154803, 1.3670636776769, 1.0181689602839,
-1.51133180077403, 1.41127485066926, 0.199668469858345, -1.22523054947931),
y = c(-0.323499291411831, -1.00507951141751, -0.901821819799205,
1.41189828512003, -0.131573335707317, -0.308459525548042, 1.50221794165404,
-0.154047787639801, 0.44717002689869, -0.93671163236924), z =
c(0.836709660070246, -0.251235618242673, -2.02289120416259,
0.499914144749108, -0.458094619767492, 1.48047512280956, 0.80987242929676,
-1.17963322744287, 0.81492625128413, 0.475181724036684), radius =
c(0.174093995941803, 0.75503840832971, 0.562892300076783, 0.541058518458158,
0.724675815086812, 0.828356854617596, 0.423405217472464, 0.540400178171694,
0.0765824350528419, 0.55016236170195), color = c("#FF0000FF", "#FF9900FF",
"#CCFF00FF", "#33FF00FF", "#00FF66FF", "#00FFFFFF", "#0066FFFF",
"#3300FFFF", "#CC00FFFF", "#FF0099FF"), alpha = 1, lit = TRUE, ambient =
"#000000", specular = "#FFFFFF", emission = "#000000", shininess = 50,
smooth = TRUE, front = "filled", back = "filled", size = 3, lwd = 1, fog
= FALSE, point_antialias = FALSE, line_antialias = FALSE, texture =
NULL, textype = "rgb", texmipmap = FALSE, texminfilter = "linear",
texmagfilter = "linear", texenvmap = FALSE)
  3: do.call(rgl.spheres, c(list(x = x, y = y, z = z, radius = radius),
.fixMaterialArgs(..., Params = save)))
  4: spheres3d(rnorm(10), rnorm(10), rnorm(10), radius = runif(10), color
= rainbow(10))
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Does this error message make anything clearer?


The problem is being reported by your X Window System, because 
something that rgl is doing is not supported by it.  If you Google for 
GLXUnsupportedPrivateRequest, you'll see a lot of similar reports for 
various systems, but I don't see a lot of solutions.


I suspect it's a badly implemented graphics driver for your graphics 
card.  All I can suggest is that you contact the vendor to see if 
there's an update.


Duncan Murdoch







On Mon, May 23, 2011 at 2:43 PM, john herbert arraystrugg...@gmail.com wrote:


Dear R users,
I have installed the latest version of R from source on CentOS (using
configure and make install).
This seemed to work fine, with no errors reported, and typing R at the command
line starts R.

However, if I try to install the package rgl using
install.packages("rgl")
I get the following error:

installing to /usr/local/lib64/R/library/rgl/libs
** R
** demo
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
** testing if installed package can be loaded
  *** caught segfault ***
address (nil), cause 'memory not mapped'
aborting ...
sh: line 1: 23732 Segmentation fault  '/usr/local/lib64/R/bin/R'
--no-save --slave  /tmp/RtmpkvIjOb/file6d97876
ERROR: loading failed
* removing ‘/usr/local/lib64/R/library/rgl’
The downloaded packages are in
 ‘/tmp/Rtmp5OaGuQ/downloaded_packages’
Updating HTML index of packages in '.Library'
Making packages.html  ... done
Warning message:
In install.packages("rgl") :
   installation of package 'rgl' had non-zero exit status
I read that the OpenGL header files have to be present, and they are in
/usr/include/GL.
I also read about different graphics cards causing problems but I don't

Re: [R] plotting texas school district using shape files

2011-05-25 Thread Ben Bolker
Shant Ch sha1one at yahoo.com writes:

 
 Hi,
 
 I am trying to create a map of Texas school districts 
 using a shapefile 
 of Texas. I could not find any helpful posts in the mailing list archives.
 
 txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))
 
 Error: read.shape not found. But read.shape is in maptools. 
 

   A couple of things: that's probably not the *exact* error you
got.  Did you remember to load the package first with library(maptools) ... ?
(You did install the package first, too, right?)
  
  After you have done that, I suspect you will still have a problem
finding the file; I think you want something like

library(maptools)
txtshp <- read.shape("S:\\Districts_10_11.shp")

  Ben Bolker
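A note on why the system.file() wrapper cannot work here, easy to check in base R: system.file() only looks inside an installed package's directory and returns an empty string when nothing matches, so read.shape() would be handed "" instead of the S: path. A minimal check, using the stats package as a stand-in and a hypothetical file name:

```r
# system.file() searches inside the installed package, not the file system at large
inside_pkg <- system.file("no_such_file.shp", package = "stats")
inside_pkg   # an empty string: nothing by that name ships with stats

# An ordinary path is used as-is; file.exists() confirms it before reading
shp <- "S:/Districts_10_11.shp"   # forward slashes also work on Windows
if (file.exists(shp)) {
  message("found ", shp, "; pass this path to the shapefile reader")
} else {
  message("not found: ", shp)
}
```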



Re: [R] Thiessen method

2011-05-25 Thread Ben Bolker
federico.eccel federico.eccel at gmail.com writes:

 I have searched the web and the R forum for a package for computing
 the Thiessen method, but I didn't find anything. I would like to ask whether
 there is any package in R that can compute the Thiessen
 method for interpolating rain gauge data.
 

  Do any of the hits provided by

library(sos)
findFn("thiessen")

  help?
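For context, the Thiessen method assigns every point of the catchment to its nearest gauge and weights each gauge by the share of the area it "owns". A rough base-R sketch on a regular grid (the gauge coordinates, rainfall values and rectangular catchment are all made up); packages such as deldir compute the exact polygons (the Dirichlet/Voronoi tessellation) instead of approximating them on a grid:

```r
# Hypothetical gauges: coordinates (km) and rainfall totals (mm)
gauges <- data.frame(x = c(0, 10, 5), y = c(0, 0, 8), rain = c(12, 20, 16))

# Regular grid over a hypothetical rectangular catchment
grid <- expand.grid(x = seq(0, 10, by = 0.1), y = seq(0, 8, by = 0.1))

# Thiessen assignment: index of the nearest gauge for every grid cell
nearest <- apply(as.matrix(grid), 1, function(p)
  which.min((gauges$x - p[1])^2 + (gauges$y - p[2])^2))

# Thiessen weights: fraction of the area closest to each gauge
weights <- tabulate(nearest, nbins = nrow(gauges)) / nrow(grid)

# Areal mean rainfall as the area-weighted mean of the gauge values
areal_rain <- sum(weights * gauges$rain)
round(weights, 3)
areal_rain
```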



Re: [R] Processing large datasets

2011-05-25 Thread Jonathan Daily
In cases where I have to parse through large datasets that will not
fit into R's memory, I will grab relevant data using SQL and then
analyze said data using R. There are several packages designed to do
this, like [1] and [2] below, that allow you to query a database using
SQL and end up with that data in an R data.frame.

[1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
[2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html
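A minimal sketch of that pattern, letting the database do the filtering and per-minute aggregation so that only the small summary table comes back into R. It uses DBI with an in-memory RSQLite database so that it is self-contained (the table, columns and values are made up); with MySQL the connection line would change, and FLOOR() is the usual way to bin there. The block is skipped if the packages are not installed:

```r
quotes <- data.frame(ms    = c(35482391, 35482391, 35540000, 35600100),
                     price = c(15.48, 15.48, 15.49, 15.47),
                     size  = c(400, 300, 135, 200))

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "quotes", quotes)

  # Filtering (WHERE) and aggregation (GROUP BY) run inside the database;
  # only the per-minute summary is returned as an R data.frame
  per_minute <- DBI::dbGetQuery(con, "
    SELECT CAST(ms / 60000 AS INTEGER) AS minute,
           AVG(price) AS mean_price,
           SUM(size)  AS total_size,
           COUNT(*)   AS n
    FROM quotes
    WHERE size > 0
    GROUP BY minute
    ORDER BY minute")
  DBI::dbDisconnect(con)
  print(per_minute)
}
```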

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote:
 Hi R list,

 I'm new to R, so I'd like to ask about its capabilities.
 What I'd like to do is run some statistical tests on quite large
 tables of aggregated quotes from a market feed.

 This is a typical set of data.
 Each day contains millions of records (up to 10 unfiltered).

 2011-05-24  750  Bid  DELL  14130770  400  15.4800  BATS  35482391  Y  1  1  0  0
 2011-05-24  904  Bid  DELL  14130772  300  15.4800  BATS  35482391  Y  1  0  0  0
 2011-05-24  904  Bid  DELL  14130773  135  15.4800  BATS  35482391  Y  1  0  0  0

 I'll need to filter it first based on some criteria.
 Since I keep it in a MySQL database, that can be done with a query. It is not
 super efficient; I've checked that already.

 Then I need to aggregate the dataset into different time frames (time is
 represented in ms from midnight, like 35482391).
 Again, that can be done with a database query; I'm not sure which is going to be faster.
 The aggregated tables are going to be much smaller, like thousands of rows per
 observation day.

 Then I need to calculate basic statistics: mean, standard deviation, sums etc.
 After the stats are calculated, I need to perform some statistical
 hypothesis tests.

 So, my question is: which tool is faster for data aggregation and filtering
 on big datasets, MySQL or R?

 Thanks,
 --Roman N.





-- 
===
Jon Daily
Technician
===
#!/usr/bin/env outside
# It's great, trust me.



[R] barplot groups of different size i.e. height is NOT a matrix

2011-05-25 Thread Victor Gabillon

Hello,

I want to use the function barplot do display several group of bars.
A standard example is given at this link
http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html

But in their example the 4 groups of bars are all composed of 8 bars.
I want to be able to display the same kind of graph, but where the number
of bars in each group is not the same. For example, the first group of
bars would have 2 bars and the second group of bars would have 10 bars.


The first parameter of barplot, height, is a matrix in which (with
beside = TRUE) each column holds the values for the bars of one group.
One solution could be to fill the height matrix with NA values, but then
the space occupied by each group equals the size of the largest
group, so you end up with empty gaps where the NAs are.


Do you know how to solve this problem?
Do I have to consider multiple barplots in the same plot with the same
axis? (By the way, I don't know how to do that.)


In fact, each bar represents the performance of an algorithm.
A group of bars shows the performance of one algorithm with different
parameters.
But when comparing different algorithms it is possible that we don't 
want to display the same number of parameters for each algorithm.
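A base-graphics sketch that avoids the matrix altogether (the algorithm names and scores below are made up): pass barplot() one vector holding every bar height and give the space argument one value per bar, 0 inside a group and 1 before the first bar of each new group, so each group only takes as much room as it has bars.

```r
# Hypothetical performance scores: algorithm A was run with 2 parameter
# settings, algorithm B with 5, so the groups have different sizes
perf <- list(A = c(0.71, 0.83),
             B = c(0.64, 0.69, 0.75, 0.80, 0.78))

heights <- unlist(perf)
n_per_group <- lengths(perf)

# space is given per bar: no gap inside a group, a gap of one bar width
# before the first bar of every group except the first
sp <- unlist(lapply(seq_along(perf), function(i)
  c(if (i == 1) 0 else 1, rep(0, n_per_group[i] - 1))))

# barplot() returns the x positions of the bar midpoints
mids <- barplot(heights, space = sp,
                col = rep(c("grey40", "grey80"), times = n_per_group),
                names.arg = unlist(lapply(n_per_group, seq_len)),
                ylim = c(0, 1), ylab = "performance")

# Label each group below the axis, centred under its bars
group_id <- rep(names(perf), times = n_per_group)
mtext(names(perf), side = 1, line = 2.5,
      at = tapply(mids, group_id, mean)[names(perf)])
```

The returned midpoints are what mtext() uses to centre the group labels; the same positions can be reused with text() or axis().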


Thanks for your help.
Victor



Re: [R] barplot groups of different size i.e. height is NOT a matrix

2011-05-25 Thread ONKELINX, Thierry
Dear Victor,

Here is a basic solution using ggplot2:

library(ggplot2)
dataset <- data.frame(Main = c("A", "A", "A", "B", "B"),
                      Detail = c("a", "b", "c", "1", "2"),
                      value = runif(5, min = 0.5, max = 1))
ggplot(dataset, aes(x = Detail, y = value)) + geom_bar(stat = "identity") +
  facet_grid(. ~ Main, scales = "free_x")

Best regards,

Thierry

 -Original message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On behalf of Victor Gabillon
 Sent: Wednesday 25 May 2011 14:56
 To: r-help@r-project.org
 Subject: [R] barplot groups of different size i.e. height is NOT a matrix
 



Re: [R] help with tune.svm() e1071

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 4:54 AM, Salih Tuna saliht...@gmail.com wrote:
 Hi,
 I am trying to use tune.svm in e1071 package.
 the command i use is

 tobj <- tune.svm(labels, data = data, cost = 10^(1:2))

The first few arguments from the method signature for tune.svm are:

tune.svm(x, y = NULL, data = NULL, ...)

I'm assuming your `labels` variable is a vector of class labels (or
real values if you are doing regression) -- this corresponds to the
`y` in the method signature.

Also note in your call to tune.svm, you are missing a correct value
for the `x` parameter.

 Should the last column of the 'data' contain the labels as well?

This depends on whether you are using a formula for x.

 I want to use the linear kernel. But it gives me the error
 Error in model.frame.default(formula, data) : 'data' must be a data.frame,
 not a matrix or an array

What type of object is `data`? What is the result of:

R> is(data)

 Do you know why this might happen?

You aren't calling the function correctly.

Either

(1) create a matrix of predictor variables (rows are observations,
columns are features, dimensions, whatever you want to call them) and
a vector of class labels (I guess this is your `labels` variable?).

Do *not* put the class labels as an extra column in your predictor
variable matrix.

Then do:

R> tune.svm(predictors, labels, ...)

or

(2) Use a formula interface and pass in a data.frame as the data argument:

R> tune.svm(y ~ some + thing, data=your.data.frame)

(where 'some' and 'thing' are names of feature columns in
your.data.frame, and y is the name of your label column)

Please read through the help pages ?tune and ?tune.svm for more examples.
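A minimal sketch of both calling styles, assuming the e1071 package is installed and using the built-in iris data as a stand-in for the poster's data; kernel = "linear" is passed through to svm().

```r
# Assumes e1071 is installed; iris stands in for the real data
library(e1071)

set.seed(1)  # tune() uses random cross-validation folds

# (2) Formula interface: 'data' must be a data.frame, and the label column
#     (Species) is named in the formula instead of being passed separately
tobj <- tune.svm(Species ~ ., data = iris,
                 kernel = "linear", cost = 10^(-1:1))
summary(tobj)          # cross-validated error for each cost value
tobj$best.parameters   # the cost with the lowest error

# (1) Matrix interface: predictors only in x, labels as a separate vector y
x <- as.matrix(iris[, 1:4])
y <- iris$Species
tobj_xy <- tune.svm(x, y, kernel = "linear", cost = 10^(-1:1))
```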

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Processing large datasets

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote:
 Hi R list,

 I'm new to R software, so I'd like to ask about its capabilities.
 What I'm looking to do is to run some statistical tests on quite big
 tables which are aggregated quotes from a market feed.

 This is a typical set of data.
 Each day contains millions of records (up to 10 non filtered).

 2011-05-24      750     Bid     DELL    14130770        400     15.4800         BATS    35482391        Y       1       1       0       0
 2011-05-24      904     Bid     DELL    14130772        300     15.4800         BATS    35482391        Y       1       0       0       0
 2011-05-24      904     Bid     DELL    14130773        135     15.4800         BATS    35482391        Y       1       0       0       0

 I'll need to filter it first based on some criteria.
 Since I keep it in a MySQL database, this can be done with a query. It's not
 super efficient; I've checked it already.

 Then I need to aggregate the dataset into different time frames (time is
 represented in ms from midnight, like 35482391).
 Again, this can be done with a database query; I'm not sure which is going to be faster.
 The aggregated tables are going to be much smaller, on the order of thousands of rows per
 observation day.

 Then I'll calculate basic statistics: mean, standard deviation, sums, etc.
 After the stats are calculated, I need to perform some statistical
 hypothesis tests.

 So, my question is: which tool is faster for data aggregation and filtering
 on big datasets, MySQL or R?

Why not try a few experiments and see for yourself -- I guess the
answer will depend on what exactly you are doing.

If your datasets are *really* huge, check out some packages listed
under the "Large memory and out-of-memory data" section of the
HighPerformanceComputing task view at CRAN:

http://cran.r-project.org/web/views/HighPerformanceComputing.html

Also, if you find yourself needing to do lots of
grouping/summarizing type of calculations over large data frame-like
objects, you might want to check out the data.table package:

http://cran.r-project.org/web/packages/data.table/index.html
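For the time-frame aggregation step, a small base R sketch (plain aggregate(), not data.table) that buckets milliseconds-from-midnight into 1-minute bars. The column names time_ms, price and size, and the values, are hypothetical and only loosely modelled on the sample records above.

```r
# Hypothetical quote records; column names and values are assumptions
quotes <- data.frame(
  time_ms = c(35482391, 35482391, 35482391, 35501000, 35545000, 35561000),
  price   = c(15.48, 15.48, 15.48, 15.49, 15.47, 15.50),
  size    = c(400, 300, 135, 200, 100, 500)
)

# Milliseconds from midnight -> index of the 1-minute bar (60,000 ms)
quotes$minute <- quotes$time_ms %/% 60000

# Per-bar statistics with the formula interface of aggregate()
mean_price <- aggregate(price ~ minute, data = quotes, FUN = mean)
total_size <- aggregate(size ~ minute, data = quotes, FUN = sum)
bars <- merge(mean_price, total_size, by = "minute")

# Clock time of each bar, e.g. minute 591 -> "09:51"
bars$clock <- sprintf("%02d:%02d", bars$minute %/% 60, bars$minute %% 60)
bars
```

Other frame lengths only change the divisor (300000 for 5 minutes), and sd or any other summary function can be passed as FUN in the same way.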

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



[R] transpose ?

2011-05-25 Thread Mohamed Lajnef
Dear All,
Suppose this data.frame:
> D

V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C
G  G  T   T   A   A   A   A   T   A   T   T   C   C   G   G
C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C


I would like to transform D as follows (just for the first line):
C (V7)
C (V9)
T   G   A   C   G   C
C (V8)
C (V10)
T   G   A   C   G   C


Any help would be appreciated

Regards
M

-- 

Mohamed Lajnef,IE INSERM U955 eq 15#
Pôle de Psychiatrie#
Hôpital CHENEVIER  #
40, rue Mesly  #
94010 CRETEIL Cedex FRANCE #
mohamed.laj...@inserm.fr   #
tel : 01 49 81 32 79   #
Sec : 01 49 81 32 90   #
fax : 01 49 81 30 99   #






Re: [R] transpose ?

2011-05-25 Thread Scott Chamberlain
See ?t

__Scott Chamberlain
Rice University, EEB Dept.

On Wednesday, May 25, 2011 at 9:07 AM, Mohamed Lajnef wrote: 



[R] Fwd: transpose ?

2011-05-25 Thread Mohamed Lajnef
Dear All,
Sorry for the previous mail. Suppose this data.frame:
> D


V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C
G  G  T   T   A   A   A   A   T   A   T   T   C   C   G   G
C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C





I would like to transform D as follows (just for the first line):

C C T G A C G C
C C T G A C G C
(V8 under V7) (V10 under V9) ...

Any help would be appreciated

Regards
M
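Since t() is defined for matrices, converting with as.matrix() first is the usual route. For the layout asked for here (odd-numbered columns in one row, even-numbered columns in the row below), a sketch that rebuilds D from the printed values and relies on matrix() filling its result column by column:

```r
# The three rows printed in the post, rebuilt as a character data frame
D <- as.data.frame(rbind(
  strsplit("CCCCTTGGAACCGGCC", "")[[1]],
  strsplit("GGTTAAAATATTCCGG", "")[[1]],
  strsplit("CCCCTTGGAACCGGCC", "")[[1]]
), stringsAsFactors = FALSE)
names(D) <- paste0("V", 7:22)

# t() works on matrices; every column is character, so M is a character matrix
M <- as.matrix(D)
t(M)

# matrix() fills column by column, so with nrow = 2 the 1st, 3rd, 5th ...
# values (V7, V9, V11, ...) land in row 1 and V8, V10, V12, ... in row 2
pairs_row1 <- matrix(M[1, ], nrow = 2)
pairs_row1

# The same reshaping for every row of D, one 2 x 8 matrix per row
all_pairs <- lapply(seq_len(nrow(D)), function(i) matrix(M[i, ], nrow = 2))
```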



[R] issues with rJava; cannot run JRI example

2011-05-25 Thread Ajaya Mishra

Hello,

I am trying to run the JRI example from rJava, but I have some issues. I have read
many posts but didn't find any solution to my problem.

I have the following code:

Rengine re = new Rengine(null, false, null);
System.out.println("Rengine created, waiting for R");

if (!re.waitForR()) {
    System.out.println("Cannot load R");
    return;
}

System.out.println("re-routing stdout/err into R console");


But I get the message:

Creating Rengine
Java Result: 10

when I run the program. It never reaches the statement after
Rengine re = new Rengine(null, false, null); so I think there might be some
problem while creating the Rengine.

Then I tried something like this, without any parameters in Rengine:

Rengine re = new Rengine();
System.out.println("Rengine created, waiting for R");

if (!re.waitForR()) {
    System.out.println("Cannot load R");
    return;
}

System.out.println("re-routing stdout/err into R console");

Now it shows me the following message:
Creating Rengine
Rengine created, waiting for R
re-routing stdout/err into R console

which means that the Rengine was created. But if I try to add some lines like
these:

double[] d = {1.0, 2.0, 3.0};
re.assign("a", d);

after the lines above and run it again, it shows me the following error messages:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc005) at pc=0x6c731a9e, pid=5152, tid=492
#
# JRE version: 6.0_25-b06
# Java VM: Java HotSpot(TM) Client VM (20.0-b11 mixed mode windows-x86 )
# Problematic frame:
# C  [R.dll+0x31a9e]
#
# An error report file with more information is saved as:
# C:\Documents and Settings\ajayami\My Documents\NetBeansProjects\JAVA_R\hs_err_pid5152.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Java Result: 1
BUILD SUCCESSFUL (total time: 2 seconds)

I don't know how to solve this problem. Does anyone have an idea how to solve
this?
I have kept all the .dll files in the System32 folder.


Any kind of help is appreciated.

Regards,
Ajaya
  


[R] Print the content of several columns in only one

2011-05-25 Thread zoe.cryocla
Hi,

I'm an R beginner and I'd really appreciate a hand.
I'd like to create a new column in a data frame that contains the
content of several other columns.

For instance:
I've got 2 columns, site and sampling (a sampling number), and I would like to create a
third column, ID, in which both the site name and the sampling
number appear, like:

site    sampling   ID
site1   1          site1.1
site1   2          site1.2
site2   1          site2.1
site3   1          site3.1

How could I do that in R? If someone could help me it'd be great. Thanks in
advance!

Zoé



[R] select levels of factor variables

2011-05-25 Thread zoe.cryocla
Hi again,
I've got another question...


I often use the == operator to select some levels of factor variables, like:
data[data$var == "blabla", ]

But this time, I'd like to select all the levels of my variable that contain
the letter B. Is there a way to express this condition?

Thanks a lot !

Zoé
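A sketch using grepl(), which returns TRUE wherever a pattern occurs anywhere in a string (for a factor, the level labels are searched). The data frame and its values are hypothetical:

```r
# Hypothetical data frame with a factor column 'var'
data <- data.frame(var = factor(c("Ba", "aB", "cc", "B1", "dd")),
                   value = 1:5)

# Rows whose level label contains "B" anywhere
data[grepl("B", data$var), ]

# Case-insensitive version (would also match "b")
data[grepl("b", data$var, ignore.case = TRUE), ]

# Just the levels that contain "B"
grep("B", levels(data$var), value = TRUE)
```

The pattern is treated as a regular expression; for literal strings containing characters such as . or ( add fixed = TRUE.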




[R] approximate function and find local peaks (Maxima or Minima)

2011-05-25 Thread Michael S.
Hi,

I have a data-matrix:

> CB
                     Zeit                Low
2   2011-05-02 08:05:05 7596.0
3   2011-05-02 08:10:06 7593.5
4   2011-05-02 08:15:11 7594.5
5   2011-05-02 08:20:15 7597.5
6   2011-05-02 08:25:18 7595.0
7   2011-05-02 08:30:20 7593.5
8   2011-05-02 08:35:21 7593.0
9   2011-05-02 08:40:21 7593.0
10  2011-05-02 08:45:25 7599.0
11  2011-05-02 08:50:34 7596.0
12  2011-05-02 08:55:59 7591.0
13  2011-05-02 09:01:00 7590.5
14  2011-05-02 09:06:00 7590.5
15  2011-05-02 09:11:04 7590.5
16  2011-05-02 09:16:04 7591.0
17  2011-05-02 09:21:06 7593.0
18  2011-05-02 09:26:08 7596.0
19  2011-05-02 09:31:09 7596.0
20  2011-05-02 09:36:10 7599.0
21  2011-05-02 09:41:11 7601.5
22  2011-05-02 09:46:11 7608.0
23  2011-05-02 09:51:18 7611.5
24  2011-05-02 09:56:20 7605.5
25  2011-05-02 10:01:20 7601.5

I want to approximate this data with a function (I don't actually care whether
I keep the time information or lose it in the process).

With approxfun(), it seems I managed to approximate a function:
f <- approxfun(2:nrow(CB), CB[2:nrow(CB),2])

But how do I differentiate f()?
g <- deriv(f(2:nrow(CB)), "x")
did not work for me, or at least I don't know how to get the values of x
with g(x) = 0.

My ultimate goal is to find all the local minima of CB[,2] (min() gives
only the global minimum).

Any suggestions on how to do it?

Thanks for your help in advance.
Michael
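deriv() differentiates R expressions symbolically, so it cannot be applied to the function returned by approxfun(). For finding local minima of a sampled series it may be simpler to skip the approximation and compare each point with its neighbours. A sketch on the Low values printed above; runs of equal values are collapsed with rle() so that a flat bottom such as 7590.5, 7590.5, 7590.5 counts once:

```r
# The 'Low' column as printed in the post
low <- c(7596.0, 7593.5, 7594.5, 7597.5, 7595.0, 7593.5, 7593.0, 7593.0,
         7599.0, 7596.0, 7591.0, 7590.5, 7590.5, 7590.5, 7591.0, 7593.0,
         7596.0, 7596.0, 7599.0, 7601.5, 7608.0, 7611.5, 7605.5, 7601.5)

# Positions of local minima; a plateau is reported at its first index, and
# the first and last points are never reported (only one neighbour known)
local_minima <- function(x) {
  r <- rle(x)                      # collapse plateaus into runs
  v <- r$values
  n <- length(v)
  left  <- c(NA, v[-n])            # value of the previous run
  right <- c(v[-1], NA)            # value of the next run
  is_min <- !is.na(left) & !is.na(right) & v < left & v < right
  starts <- cumsum(r$lengths) - r$lengths + 1L   # first index of each run
  starts[is_min]
}

idx <- local_minima(low)
idx                  # positions within 'low'
low[idx]             # the local minimum values
local_minima(-low)   # local maxima, by flipping the sign
```

The indices refer to positions within low (the first printed row of CB is position 1). Whether endpoints should count as extrema is a judgment call; this version never reports them.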





[R] combined odds ratio

2011-05-25 Thread linda Porz
Dear all,

I am looking for an R function that does stepwise selection for a Cox model
(using the likelihood-ratio test, delta chi-square), similar to "stepwise, pe (0.05) lr:
stcox" in Stata.

I am very thankful for any reply.

Regards,
Linda



[R] stepwise selection cox model

2011-05-25 Thread linda Porz
Sorry, I wrote the wrong subject in the first email!

Regards,
Linda

-- Forwarded message --
From: linda Porz linda.p...@gmail.com
Date: 2011/5/25
Subject: combined odds ratio
To: r-help@r-project.org
Cc: r-help-requ...@stat.math.ethz.ch


Dear all,

I am looking for an R function which does stepwise selection cox model in r
(delta chisq likelihood ratio test) similar to the stepwise, pe (0.05) lr:
stcox in STATA.

I am very thankful for any reply.

Regards,
Linda



[R] connection problem

2011-05-25 Thread rgui

Hi, 

I have a problem choosing a CRAN mirror; this error message appears:
  In open.connection(con, "r")
 connection to 'cran.r-project.org' impossible to prt 80.
I don't know why. Can you help me choose a CRAN mirror?

Thanks for any suggestions.




[R] What does smaller than comparison do on strings?

2011-05-25 Thread Niklaus Kuehnis
What's the logic behind the following, and where can I find any 
documentation about it? In particular, why are 2:9 - as characters - not 
regarded as being smaller than 10?


# R-Code:
a <- as.character(1:12)

a < 10
#  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


Thanks in advance!

Niklaus
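A sketch of what happens: comparing a character vector with the number 10 first coerces 10 to the string "10" (see ?Comparison), and strings are then compared lexicographically, character by character, in the collating sequence of the current locale:

```r
a <- as.character(1:12)

# 10 becomes "10"; "2" vs "10" is decided at the first character
# ("2" sorts after "1"), so only "1", a prefix of "10", is smaller
a < 10

# Compare as numbers instead
as.numeric(a) < 10

# Sorting shows the same lexicographic order
sort(a)
```

So "2" > "10" for the same reason that "b" > "ab".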



Re: [R] Print the content of several columns in only one

2011-05-25 Thread zoe.cryocla
OK, I found out how to do it,
with the function paste() 
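For reference, a sketch of the paste() approach on a data frame like the one in the question (the values are taken from the example table):

```r
# Data frame matching the example table in the question
df <- data.frame(site = c("site1", "site1", "site2", "site3"),
                 sampling = c(1, 2, 1, 1))

# paste() works element-wise on whole columns, so no loop is needed;
# sep = "." puts a dot between the two parts
df$ID <- paste(df$site, df$sampling, sep = ".")
df
```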



Re: [R] R as.numeric()

2011-05-25 Thread Lutz Fischer


Thanks a lot for both replies.

If I set up the option as proposed, everything works as I wanted it to.

I guess as.character would work as well, only then I would need
to loop through the data frame.

Lutz
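A sketch of the conversion described in the R FAQ for data that have already been read as factors; no explicit loop over cells is needed. The small data frame is a hypothetical stand-in for the Excel subset. (From R 4.0.0 on, data.frame() and read.table() already default to stringsAsFactors = FALSE.)

```r
# Hypothetical stand-in for the subset read from Excel: numbers stored as
# factor levels, with "n/a" marking missing values
s <- data.frame(cI  = factor(c("30.94346629", "0.026307482", "n/a")),
                cII = factor(c("1.5", "n/a", "2.25")))

# as.numeric() on a factor returns the internal level codes, not the labels
as.numeric(s$cI)

# Convert via the labels instead; lapply() handles all columns at once
s[] <- lapply(s, function(col) {
  x <- as.character(col)
  x[x %in% "n/a"] <- NA        # turn the "n/a" markers into real NAs
  as.numeric(x)
})
n <- data.matrix(s)
n
```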


On 24/05/11 22:42, Ista Zahn wrote:
 This is a FAQ:

 http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

 Please try there before posting a question to the list.

 Best,
 Ista
 On Tue, May 24, 2011 at 5:33 PM, David Scott d.sc...@auckland.ac.nz wrote:
 On 25/05/2011 9:20 a.m., Lutz Fischer wrote:
 Hi,

 I have a bit of a problem with as.numeric or as.double.

 I read in an Excel file (either xlsx::read.xlsx2 or gdata::read.xls).
 Select a subset and then try to make it numeric:


 # read in the Excel file
 alldata <- read.xlsx2("input.xls", 1)
 # select the subset
 s <- subset(alldata, select = c(cI, cII, cIII, cIV, cV))
 # unluckily we have n/a for missing values in the file - so we turn them
 # into proper missing values
 s[s == "n/a"] <- NA

 n <- data.matrix(s)




 The problem I have is that it does not convert the data the way I would
 expect.

 just as an example:
 > s[1,2]
 [1] 30.94346629
 3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 0.030248034 0.030841352 0.032966308 ... n/a

 turned into:
 > n[1,2]
 [1] 3020

 And I would like to have there 30.94346629 as well. I assume that has to
 do with the Levels attribute - but not sure what to make of these in
 the first place.

 I also tried to convert each value on its own:

 # make some space that holds the actual numeric data
 n <- array(dim = c(length(s[,1]), length(s)))
 # now turn everything into doubles
 for (c in 1:length(s)) {
   for (r in 1:length(s[,1])) {
     n[r,c] <- as.double(s[r,c])
   }
 }

 but that gave the same result - just a lot slower.



 Thanks
 Lutz


 Your problem is the conversion to factors when the data is read. Use

 options(stringsAsFactors = FALSE)

 before you read the data, then the mixed columns of numeric and missing will
 be read as character data and the conversion to numeric will go as you
 expect. (But I haven't tested this.)

 David Scott
 --
 _
 David Scott Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,NEW ZEALAND
 Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
 Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018








Re: [R] Multinomial Logistical Model

2011-05-25 Thread Mark Difford
On May 24, 2011; 11:06pm Belle wrote:

 Does anyone know how to run Multinomial logistical Model in R in order to
 get predicted probability?

Yes. I could stop there but you shouldn't. The author of the package
provides plenty of examples (and two good vignettes) showing you how to do
this. Suggest you do some work in that area. Look especially at how model
formulas are used/specified. This is at least one area where you have gone
wrong, as the error message clearly tells you.

Good luck.
Mark.
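One possible route, using multinom() from the nnet package (a recommended package shipped with R); this is not necessarily the package referred to above, and iris is only a stand-in data set:

```r
# nnet ships with R as a recommended package
library(nnet)

# Multinomial logistic regression of the three iris species on two predictors
fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris,
                trace = FALSE)

# Predicted probability of each category for every observation
p <- predict(fit, type = "probs")
head(round(p, 3))

# Probabilities for new, hypothetical observations
newdat <- data.frame(Sepal.Length = c(5.0, 6.5), Sepal.Width = c(3.4, 2.8))
predict(fit, newdata = newdat, type = "probs")
```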

-
Mark Difford (Ph.D.)
Research Associate
Botany Department
Nelson Mandela Metropolitan University
Port Elizabeth, South Africa


[R] Adjusted Rate Ratios in R

2011-05-25 Thread matthew.benigni
I am trying to calculate Poisson-regression-based adjusted rate ratios in R,
but R's default in glm does not code the intercept as the global rate. In
SAS I use cell means coding so that the intercept is the global rate, but
I do not know how to do this in R. If anyone knows a way to make glm use
cell means coding, or how to find adjusted rate ratios, I would be grateful.

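In R, cell-means coding is obtained by dropping the intercept from the model formula (0 + or - 1). A sketch with made-up counts and person-years, using log(person-years) as an offset so that the coefficients are log rates:

```r
# Hypothetical data: event counts and person-years by exposure group and age band
d <- data.frame(
  events = c(12, 20, 30, 15, 25, 40),
  pyears = c(1000, 1200, 1100, 900, 1000, 1050),
  group  = factor(rep(c("low", "mid", "high"), times = 2),
                  levels = c("low", "mid", "high")),
  age    = factor(rep(c("young", "old"), each = 3))
)

# Cell-means coding: '0 +' removes the intercept, so each group gets its
# own coefficient, the log event rate of that group
fit_cm <- glm(events ~ 0 + group + offset(log(pyears)),
              family = poisson, data = d)
exp(coef(fit_cm))   # events per person-year in each group

# Rate ratios vs. the reference level 'low', adjusted for age band,
# with Wald confidence intervals from confint.default()
fit_rr <- glm(events ~ group + age + offset(log(pyears)),
              family = poisson, data = d)
exp(cbind(RR = coef(fit_rr), confint.default(fit_rr)))
```

With a single factor, the cell-means coefficients reproduce the crude rates (total events / total person-years per group). To use another reference group for the rate ratios, relevel() the factor first, e.g. d$group <- relevel(d$group, ref = "mid").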


Re: [R] plotting texas school district using shape files

2011-05-25 Thread Shant Ch
Yes, I had included library(maptools) in my code, and the package is already installed on
my computer, but it is still showing the same error.





From: Ben Bolker bbol...@gmail.com
To: r-h...@stat.math.ethz.ch
Sent: Wed, May 25, 2011 8:06:19 AM
Subject: Re: [R] plotting texas school district using shape files

Shant Ch sha1one at yahoo.com writes:

 
 Hi,
 
 I was plotting or creating a map for Texas school districts 
 using the shape file 
 of Texas. I could not find any other helpful mail in the mailing list.
 
 txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package = "maptools"))
 
 Error-  read.shape no found. But read.shape is there in maptools. 
 

   A couple of things: that's probably not the *exact* error you
got.  Did you remember to load the package first with library(maptools) ... ?
(You did install the package first, too, right?)
  
  After you have done that I suspect you will still have a problem with
finding the file -- I think you want something like

library(maptools)
txtshp <- read.shape("S:\\Districts_10_11.shp")

  Ben Bolker



Re: [R] Processing large datasets

2011-05-25 Thread Roman Naumenko
Thanks Jonathan. 

I'm already using RMySQL to load data for a couple of days.
I wanted to know what the relevant R capabilities are if I want to process much
bigger tables.

R always reads the whole set into memory and this might be a limitation in the case
of big tables, correct?
Doesn't it use temporary files or something similar to deal with such amounts of
data?

As an example I know that SAS handles sas7bdat files up to 1TB on a box with 
76GB memory, without noticeable issues. 
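Base R keeps data frames in memory, but it can read a file through a connection in fixed-size chunks and keep only running summaries, so the whole table never has to fit in RAM at once. A minimal sketch; the CSV file, its columns and the tiny chunk size are hypothetical:

```r
# Write a small hypothetical CSV so the sketch is self-contained; in practice
# this would be an export of the quote table
path <- tempfile(fileext = ".csv")
write.csv(data.frame(time_ms = c(35482391, 35501000, 35545000, 35561000, 35600000),
                     size    = c(400, 300, 135, 200, 100)),
          path, row.names = FALSE)

con <- file(path, open = "r")
invisible(readLines(con, n = 1))   # skip the header line once
chunk_size <- 2                    # tiny for the demo; much larger in practice
total_size <- 0
n_rows <- 0

repeat {
  # read.csv() continues from the current position of an open connection
  # and signals an error once no lines are left, which ends the loop
  chunk <- tryCatch(
    read.csv(con, header = FALSE, nrows = chunk_size,
             col.names = c("time_ms", "size")),
    error = function(e) NULL)
  if (is.null(chunk) || nrow(chunk) == 0) break
  total_size <- total_size + sum(chunk$size)   # keep only running totals
  n_rows <- n_rows + nrow(chunk)
}
close(con)
unlink(path)

c(rows = n_rows, total_size = total_size)
```

The catch-all tryCatch() would also hide genuine read errors, so a real script might inspect the error message before stopping.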

--Roman 

- Original Message -

 In cases where I have to parse through large datasets that will not
 fit into R's memory, I will grab relevant data using SQL and then
 analyze said data using R. There are several packages designed to do
 this, like [1] and [2] below, that allow you to query a database
 using
 SQL and end up with that data in an R data.frame.

 [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
 [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html


 --
 ===
 Jon Daily
 Technician
 ===
 #!/usr/bin/env outside
 # It's great, trust me.



Re: [R] Processing large datasets

2011-05-25 Thread Roman Naumenko
 Hi,


 Why not try a few experiments and see for yourself -- I guess the
 answer will depend on what exactly you are doing.
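A minimal sketch of such an experiment on the R side, in base R on simulated data. The column names `time_ms` (milliseconds from midnight, as in the sample records) and `price` are hypothetical, the one-minute bucket width is an arbitrary choice, and timings will vary by machine; running the equivalent query in MySQL on the same data would give the comparison Roman asked about.

```r
# Simulated stand-in for one day of quotes (hypothetical columns time_ms, price)
set.seed(1)
n <- 2e5
quotes <- data.frame(
  time_ms = sort(sample.int(86400000L, n, replace = TRUE)),  # ms from midnight
  price   = 15.48 + rnorm(n, sd = 0.01)
)

# Assign each record to a one-minute bucket (60000 ms per minute)
quotes$minute <- quotes$time_ms %/% 60000L

# Filter, then aggregate per bucket; system.time() reports how long it took.
# Assignments inside the braces persist after system.time() returns.
timing <- system.time({
  kept <- quotes[quotes$price > 15.47, ]
  summary_by_minute <- aggregate(price ~ minute, data = kept,
                                 FUN = function(p) c(mean = mean(p), sd = sd(p), n = length(p)))
})
print(timing)
head(summary_by_minute)
```

Because `FUN` returns three values, the `price` column of the result is a matrix with columns `mean`, `sd` and `n`.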

 If your datasets are *really* huge, check out some packages listed
 under the "Large memory and out-of-memory data" section of the
 HighPerformanceComputing task view at CRAN:

 http://cran.r-project.org/web/views/HighPerformanceComputing.html

 Also, if you find yourself needing to do lots of
 grouping/summarizing type of calculations over large data
 frame-like objects, you might want to check out the data.table package:

 http://cran.r-project.org/web/packages/data.table/index.html

 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact

I don't think data.table is fundamentally different from data.frame type, but 
thanks for the suggestion. 

http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
Just like data.frames, data.tables must fit inside RAM

The ff package by Adler, listed under "Large memory and out-of-memory data", is 
probably the most interesting.

--Roman Naumenko



Re: [R] Fwd: transpose ?

2011-05-25 Thread Jeff Newmiller
Then use as.matrix. Transpose is not a well-defined operation for data frames.
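A minimal sketch of that suggestion, using the first row of `D` as posted (columns V7 to V22). `as.matrix()` turns an all-character data frame into a character matrix, which `t()` can transpose. The last lines show one possible reading of the requested layout (first allele of each column pair, then the second); that reading is an assumption, although for this row it does produce the sequence given in the question.

```r
# First row of D as posted (columns V7 to V22)
first_row <- c("C", "C", "C", "C", "T", "T", "G", "G",
               "A", "A", "C", "C", "G", "G", "C", "C")
D <- as.data.frame(matrix(first_row, nrow = 1,
                          dimnames = list(NULL, paste0("V", 7:22))),
                   stringsAsFactors = FALSE)

m  <- as.matrix(D)   # all-character data frame -> character matrix
tm <- t(m)           # transpose: one row per original column
print(tm)

# One reading of the request: first allele of each pair (V7, V9, ...)
# followed by the second allele of each pair (V8, V10, ...)
odd    <- seq(1, ncol(m), by = 2)
even   <- seq(2, ncol(m), by = 2)
result <- paste(m[1, c(odd, even)], collapse = " ")
print(result)
```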
---
Jeff Newmiller The . . Go Live...
DCN:jdnew...@dcn.davis.ca.us Basics: ##.#. ##.#. Live Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Mohamed Lajnef mohamed.laj...@inserm.fr wrote:

Dear All,

Sorry for the previous mail. Suppose this data.frame D:

   V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
   C   C   C   C   T   T   G   G   A   A   C   C   G   G   C   C
   G   G   T   T   A   A   A   A   T   A   T   T   C   C   G   G
   C   C   C   C   T   T   G   G   A   A   C   C   G   G   C   C

I would translate D as follows (just for the first line):

   C C T G A C G C C C T G A C G C

(V8 under V7) (V9 under V10) ... Any help would be appreciated.

Regards,
M




Re: [R] panel.first problem when plotting with formula

2011-05-25 Thread David Winsemius


On May 24, 2011, at 11:42 PM, Gene Leynes wrote:


Peter,

Good idea!  (why didn't I think of that?)

If it stumped the r-list, I think there is probably a slight bug with the
plot formula.

Problems like this make me realize how amazingly full featured and
relatively bug free R is.  A problem like this would never happen in Excel,
because this level of functionality does not exist.  However, if it did, it
would probably never be fixed... and you could substitute Excel with any
commercial software.



plot(dat, panel.first=bgfun() ) # succeeds

So the problem is not with plot.data.frame.

So somewhere in the processing of the dots and the handoff to

do.call(funname, c(list(mf[[i]], y, ylab = yl,  xlab = xl), dots))

... where funname = "plot", the identities of the dot arguments are not
honored. The 'plot' function is where it all started, but the first
argument is now mf[[i]], which is a numeric vector. So I think it gets
handed off to plot.default, which sets panel.first to NULL.
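Whatever the exact cause inside plot.formula, one way to sidestep it is to skip `panel.first` altogether and draw the background explicitly between setting up the plot and adding the points, extending Peter's `type = "n"` idea. This sketch repeats Gene's `bgfun` (with the quotes restored in `par("usr")`) so it runs on its own; the `pdf(NULL)` device is only there so the example works without a screen.

```r
bgfun <- function(color = "honeydew2", linecolor = "grey45", addgridlines = TRUE) {
  usr <- par("usr")                        # current plot region: x1, x2, y1, y2
  rect(usr[1], usr[3], usr[2], usr[4], col = color)
  if (addgridlines) {
    abline(h = pretty(usr[3:4], 10), lty = 2, col = linecolor)
  }
}

dat <- data.frame(x = 1:10, y = 1:10)

pdf(NULL)                                  # off-screen device for this sketch
plot(y ~ x, data = dat, type = "n")        # set up axes and coordinates only
bgfun()                                    # background and grid, now that par("usr") is valid
points(y ~ x, data = dat)                  # data drawn on top of the background
usr_after <- par("usr")
dev.off()
```

The design choice is simply ordering: `bgfun()` runs only after `plot()` has established the coordinate system it reads from `par("usr")`.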


--
David.


Gene


On Tue, May 24, 2011 at 3:13 AM, Peter Ehlers ehl...@ucalgary.ca  
wrote:



On 2011-05-23 16:54, Gene Leynes wrote:


I wrote a little function called bgfun that adds gridlines and a
background, but it's not working when I plot using the formula.

I have some theories on what's happening, but even if my theory is
right, I don't know how to fix it.

Does someone have a straightforward silver bullet?



No silver bullet, but this seems to work:

plot(y ~ x, data=dat, type="n")
points(y ~ x, data=dat, panel.first=bgfun())

(I think that plot.formula may need a fix but
offhand I can't see whether that's easy or hard.)

Peter Ehlers



Thank you,

Gene



bgfun = function(color='honeydew2', linecolor='grey45', addgridlines=TRUE){
   tmp=par("usr")
   rect(tmp[1], tmp[3], tmp[2], tmp[4], col=color)
   if(addgridlines){
      ylimits=par()$usr[c(3,4)]
      abline(h=pretty(ylimits,10), lty=2, col=linecolor)
   }
}
dat = data.frame(x=1:10,y=1:10)

## Works
plot(dat$x, dat$y, panel.first=bgfun())

## Why doesn't this work?
plot(y ~ x, data=dat, panel.first=bgfun())



David Winsemius, MD
West Hartford, CT



Re: [R] What does smaller than comparison do on strings?

2011-05-25 Thread Duncan Murdoch

On 25/05/2011 6:06 AM, Niklaus Kuehnis wrote:

What's the logic behind the following, and where can I find any
documentation about it? In particular, why are 2:9 - as characters - not
regarded as being smaller than 10?

# R-Code:
a <- as.character(1:12)

a < "10"
#  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE


See ?Comparison for help.  There are lots of details given there.

In summary:  your second comparison is of "2" to "10".  Since the 
character "2" sorts later than the character "1", "2" < "10" is FALSE.
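A short sketch contrasting the two kinds of comparison: with character vectors, `<` compares strings in the collation order of the current locale (see `?Comparison`), so "2" comes after "10", while converting to numbers first gives the comparison the poster expected.

```r
a <- as.character(1:12)

by_string <- a < "10"                 # string comparison, character by character
by_number <- as.numeric(a) < 10       # numeric comparison

print(by_string)   # only "1" sorts before "10"
print(by_number)   # 1 through 9 are smaller than 10
```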


Duncan Murdoch



Re: [R] Processing large datasets

2011-05-25 Thread Marc Schwartz

Take a look at the High-Performance and Parallel Computing with R CRAN Task 
View:

  http://cran.us.r-project.org/web/views/HighPerformanceComputing.html

specifically at the section labeled "Large memory and out-of-memory data".

There are some specific R features that have been implemented in a fashion to 
enable out of memory operations, but not all.

I believe that Revolution's commercial version of R has developed 'big data' 
functionality, but I would defer to them for additional details.

You can of course use a 64 bit version of R on a 64 bit OS to increase 
accessible RAM, however, there will still be object size limitations predicated 
upon the fact that R uses 32 bit signed integers for indexing into objects. See 
?Memory-limits for more information.
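When only summary statistics are needed, one base-R alternative to the packages in the task view is to stream a file through a connection in fixed-size chunks and keep running totals, so only one chunk is in memory at a time. This is a sketch under assumptions: a hypothetical headerless, space-separated file in the quote layout from the original question (price in column 7), with an arbitrary chunk size.

```r
# Write a small hypothetical file in the quote layout shown in the question
path <- tempfile(fileext = ".txt")
writeLines(rep(c("2011-05-24 750 Bid DELL 14130770 400 15.4800 BATS 35482391 Y 1 1 0 0",
                 "2011-05-24 904 Bid DELL 14130772 300 15.5000 BATS 35482391 Y 1 0 0 0"),
               5000), path)

chunked_stats <- function(path, chunk_size = 1000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  n <- 0; total <- 0; total_sq <- 0
  repeat {
    # read.table() on an open connection continues where the last call stopped;
    # it signals an error once no lines are left, which ends the loop
    chunk <- tryCatch(read.table(con, nrows = chunk_size, header = FALSE,
                                 stringsAsFactors = FALSE),
                      error = function(e) NULL)
    if (is.null(chunk) || nrow(chunk) == 0L) break
    price <- chunk[[7]]
    n <- n + length(price)
    total <- total + sum(price)
    total_sq <- total_sq + sum(price^2)
  }
  mu <- total / n
  # Textbook one-pass formula; it can lose precision when the sd is tiny
  # relative to the mean
  c(n = n, mean = mu, sd = sqrt((total_sq - n * mu^2) / (n - 1)))
}

res <- chunked_stats(path)
print(res)
```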

HTH,

Marc Schwartz


On May 25, 2011, at 8:49 AM, Roman Naumenko wrote:

 Thanks Jonathan. 
 
 I'm already using RMySQL to load data for a couple of days. 
 I wanted to know what the relevant R capabilities are if I want to process 
 much bigger tables. 
 
 R always reads the whole set into memory and this might be a limitation in 
 the case of big tables, correct? 
 Doesn't it use temporary files or something similar to deal with such amounts of 
 data? 
 
 As an example I know that SAS handles sas7bdat files up to 1TB on a box with 
 76GB memory, without noticeable issues. 
 
 --Roman 
 
 - Original Message -
 
 In cases where I have to parse through large datasets that will not
 fit into R's memory, I will grab relevant data using SQL and then
 analyze said data using R. There are several packages designed to do
 this, like [1] and [2] below, that allow you to query a database
 using
 SQL and end up with that data in an R data.frame.
 
 [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
 [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html
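A minimal sketch of that workflow, assuming the DBI and RSQLite packages are installed. An in-memory SQLite table with a hypothetical `quotes` schema stands in for Roman's MySQL table (with RMySQL, presumably only the `dbConnect()` call would change). The filtering and per-minute aggregation run inside the database, so only the small summary comes back into R as a data.frame.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for the quote table
set.seed(2)
dbWriteTable(con, "quotes", data.frame(
  side    = sample(c("Bid", "Ask"), 1000, replace = TRUE),
  price   = round(15.48 + rnorm(1000, sd = 0.01), 4),
  time_ms = sort(sample.int(3600000L, 1000))   # first hour after midnight
))

# Filter and aggregate in SQL; integer division gives one-minute buckets
per_minute <- dbGetQuery(con, "
  SELECT time_ms / 60000 AS minute,
         COUNT(*)        AS n,
         AVG(price)      AS mean_price
  FROM quotes
  WHERE side = 'Bid'
  GROUP BY minute
  ORDER BY minute")

dbDisconnect(con)
head(per_minute)
```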
 


Re: [R] select levels of factor variables

2011-05-25 Thread David Winsemius


On May 25, 2011, at 4:46 AM, zoe.cryocla wrote:


Hi again,
I've got another question...


I often use the symbol == to select some levels of factor variables
like:

data[data$var=="blabla", ]

But this time, I'd like to select all the levels of my variable which
contain the letter B. Is there a way to express this condition?


Perhaps with grep and/or  %in%

Got a reproducible example? ...preferably constructed with dput
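A sketch of David's suggestion on a hypothetical data frame (the poster did not supply one): `grepl()` tests each value for the letter, while `grep(..., value = TRUE)` on the levels plus `%in%` selects whole levels. Matching is case-sensitive unless `ignore.case = TRUE` is given.

```r
d <- data.frame(var = factor(c("AB", "CD", "BE", "FG", "AB")),
                x   = 1:5)

# Rows whose value contains "B" (grepl coerces the factor to character)
rows_b <- d[grepl("B", d$var), ]

# Same selection via the matching levels and %in%
levels_b <- grep("B", levels(d$var), value = TRUE)
rows_b2  <- d[d$var %in% levels_b, ]

print(levels_b)
print(rows_b)
```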

--

David Winsemius, MD
West Hartford, CT



Re: [R] Processing large datasets

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 10:18 AM, Roman Naumenko ro...@bestroman.com wrote:
[snip]
 I don't think data.table is fundamentally different from data.frame type, but 
 thanks for the suggestion.

 http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
 Just like data.frames, data.tables must fit inside RAM

Yeah, I know -- I only mentioned it in the context of manipulating
data.frame-like objects -- sorry if I wasn't clear.

If you've got data.frame-like data that you can store in RAM
AND you find yourself wanting to do some summary calcs over different
subgroups of it, you might find that data.table will be a quicker way
to get that done -- the larger your data.frame/table, the more
noticeable the speed-up.

To give you an idea of the scenarios I'm talking about, other
packages you'd use to do the same would be plyr and sqldf.
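To make the data.table point concrete, a sketch of a grouped summary, assuming the data.table package is installed and using a hypothetical table with `minute` and `price` columns. The `by =` argument computes the statistics for every group in one call.

```r
library(data.table)

set.seed(3)
dt <- data.table(minute = sample(0:59, 1e5, replace = TRUE),
                 price  = 15.48 + rnorm(1e5, sd = 0.01))

# One row per minute with mean, standard deviation and count (.N)
summ <- dt[, list(mean_price = mean(price), sd_price = sd(price), n = .N),
           by = minute]
setkey(summ, minute)   # sorts the summary by minute
print(head(summ))
```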

For out of memory datasets, you're in a different realm -- hence the
HPC Task view link.

 The ff package by Adler, listed in Large memory and out-of-memory data is 
 probably most interesting.

Cool.

I've had some luck using the bigmemory package (and friends) in the
past as well.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



Re: [R] Processing large datasets/ non answer but Q on writing data frame derivative.

2011-05-25 Thread Mike Marchywka






 Date: Wed, 25 May 2011 09:49:00 -0400
 From: ro...@bestroman.com
 To: biomathjda...@gmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] Processing large datasets

 Thanks Jonathan.

 I'm already using RMySQL to load data for couple of days.
 I wanted to know what are the relevant R capabilities if I want to process 
 much bigger tables.

 R always reads the whole set into memory and this might be a limitation in 
 case of big tables, correct?

OK, now I ask: perhaps for my first R effort I will try to find the source
code for data frame and make a paging or streaming derivative. That is, at
least for fixed-size things, it could supply things like the total number of
rows but have facilities for paging in and out of memory. Presumably all users
of data frame have to work through a limited interface, which I guess could be
expanded with various hints such as "prefetch this". I haven't looked
at this idea in a while but the issue keeps coming up; maybe for the dev list?

Anyway, for your immediate issues with a few statistics you could
probably write a simple C++ program that ultimately becomes part of
an R package. It is a good idea to see what is available, but these
questions come up here a lot and the normal suggestion is a DB, which
is exactly the opposite of what you want if you have predictable
access patterns (although even here prefetch could probably be implemented).






 Doesn't it use temporary files or something similar to deal such amount of 
 data?

 As an example I know that SAS handles sas7bdat files up to 1TB on a box with 
 76GB memory, without noticeable issues.

 --Roman

 - Original Message -

  In cases where I have to parse through large datasets that will not
  fit into R's memory, I will grab relevant data using SQL and then
  analyze said data using R. There are several packages designed to do
  this, like [1] and [2] below, that allow you to query a database
  using
  SQL and end up with that data in an R data.frame.

  [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
  [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html



Re: [R] barplot groups of different size i.e. height is NOT a matrix

2011-05-25 Thread Marc Schwartz
On May 25, 2011, at 7:56 AM, Victor Gabillon wrote:

 Hello,
 
 I want to use the function barplot to display several groups of bars.
 A standard example is given at this link
 http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html
 
 But in their example the 4 groups of bars are all composed of 8 bars.
 I want to be able to display the same kind of graph but where the number of 
 bars in each group is not the same. For example the first group of bars 
 would have 2 bars and the second group of bars would have 10 bars.
 
 The barplot function has a first parameter named height which is a matrix where 
 each line holds the values for the bars of one particular group.
 One solution could be to have a height matrix with NA values, but then the 
 space occupied by each group is equal to the size of the largest group! So 
 you end up with empty gaps where there are NAs.
 
 Do you know how to solve this problem?
 Do I have to consider multiple barplots in the same plot with the same axis? 
 (BTW, I don't know how to do that.)
 
 In fact the bars would represent the performance of an algorithm.
 A group of bars would be the performance of an algorithm with different 
 parameters.
 But when comparing different algorithms it is possible that we don't want to 
 display the same number of parameters for each algorithm.
 
 Thanks for your help.
 Victor


barplot() is fundamentally built upon the use of rect() to construct the bars, 
so you could always create your own variant to allow for the flexibility that 
you desire.

That being said, if your performance measures (the bar heights) are other than 
discrete counts or proportions, I would advise you to consider using other 
visual presentation forms, as these are really the only two types of data for 
which barplots are generally considered satisfactory. A key to barplots of 
course is that they are based at 0 for proper visual comparison. Thus, if you 
need to have the minima of the relevant axis at a value other than 0, this is 
another reason to not use them.

Even then, many folks have moved away from barplots to use point or dot plots 
and similar formats, especially where you also need to include some type of 
confidence interval for each measure.
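Two sketches with made-up performance scores for groups of 2 and 10 bars. The first stays with `barplot()` by passing a plain vector of heights and a `space` vector that opens a wider gap before the first bar of each group; this is a swapped-in technique, not the `rect()`-based variant described above. The second follows the advice to use a dot plot: `dotchart()` takes a `groups` factor, which may have groups of unequal size. `pdf(NULL)` only keeps the example off-screen, and the group names are hypothetical.

```r
set.seed(4)
sizes  <- c(algoA = 2, algoB = 10)                 # bars per group (hypothetical)
scores <- round(runif(sum(sizes)), 2)              # made-up performance values
group  <- factor(rep(names(sizes), sizes), levels = names(sizes))
labels <- unlist(lapply(sizes, function(n) paste0("p", seq_len(n))), use.names = FALSE)

# Space before each bar, in units of bar width: 1 before a group's first bar,
# 0.1 between bars within a group
gaps <- unlist(lapply(sizes, function(n) c(1, rep(0.1, n - 1))), use.names = FALSE)

pdf(NULL)
mids <- as.vector(barplot(scores, space = gaps, names.arg = labels, ylim = c(0, 1)))
dotchart(scores, labels = labels, groups = group, xlim = c(0, 1))
dev.off()
```

`barplot()` returns the bar midpoints, which can be used to place group labels or error bars.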

HTH,

Marc Schwartz



Re: [R] plotting texas school district using shape files

2011-05-25 Thread Ben Bolker
Shant Ch sha1one at yahoo.com writes:

 
 Yes, I had included library(maptools) in my code, and it is already
 installed on my computer, but it is still showing the same error.
 

  In that case you should (1) read the posting guide, (2) copy and
paste the code you ran, and the precise error you got, into an 
email to the list that also includes (3) the results of running
sessionInfo() during your R session (after loading the maptools
package).

  Then perhaps we will have enough information to help diagnose
the problem.

  PS: a little more poking around shows that, at least on my
system, read.shape() is *not* part of the maptools package.

help.search("read.shape")  finds maptools::readShapeSpatial.
library(sos); findFn("read.shape") discovers that there
is a read.shape() function in the spsurvey package.

  Ben Bolker

  I was plotting or creating a map for Texas school districts 
  using the shape file of Texas. I could not find any other helpful mail in the mailing list.
  
  txshp <- read.shape(system.file("S:\\Districts_10_11.shp", package="maptools"))
  
  Error: read.shape not found. But read.shape is there in maptools. 
  
 
   After you have done that I suspect you will still have a problem with
 finding the file -- I think you want something like
 
 library(maptools)
 txtshp <- read.shape("S:\\Districts_10_11.shp")
 
   Ben Bolker



Re: [R] Multinomial Logistical Model

2011-05-25 Thread Frank Harrell
I suggest a couple of courses before proceeding.  Multinomial logistic models
have special challenges.  And note that you have two nomenclature errors in
your note, which is usually a sign of not having taken the relevant
coursework.
Frank

Belle wrote:
 
 Does anyone know how to run Multinomial logistical Model in R in order to
 get predicted probability?
 
 The response is content (5 levels: 1, 2, 3, 4, 5)
 The covariance are:
 assignment - int (0, 1)
 dr0 - int (0, 1)
 dr1 - int (0, 1)
 yr_exp - num
 yr_exp_s - num
 ismgdr - int (0, 1)
 ismgyr_t_A - int (0, 1)
 pair - int (41 pairs: 1001, 1002, ...)
 
 There is no random effect involved, all the variables are fixed. 
 
 I have tried mlogit, but it does not work.
 
 x <- SciContent
 x$content <- as.factor(x$content)
 mldata <- mlogit.data(x, varying=NULL, choice="content", shape="wide")
 SciCt <- mlogit(mldata$content | mldata$assignment + mldata$dr0 +
 mldata$dr1 + mldata$yr_tch_exp + mldata$yr_tch_exp_s + mldata$ismgdr +
 mldata$ismgyr_t_A + mldata$pair)
 
 Error: inherits(object, formula) is not TRUE
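For the predicted probabilities themselves, a hedged sketch using `multinom()` from the nnet package (a recommended package shipped with R) in place of mlogit. It takes an ordinary formula with the factor response on the left, which is what the error above is complaining about. The data frame `sci` is a simulated stand-in that mimics only a few of the listed variables, and Frank's caution about the modelling choices still applies.

```r
library(nnet)

set.seed(5)
n <- 300
sci <- data.frame(
  content    = factor(sample(1:5, n, replace = TRUE)),   # 5-level response
  assignment = rbinom(n, 1, 0.5),
  dr0        = rbinom(n, 1, 0.5),
  yr_exp     = round(runif(n, 0, 30))
)

fit <- multinom(content ~ assignment + dr0 + yr_exp, data = sci, trace = FALSE)

# One column of predicted probabilities per response level
probs <- predict(fit, newdata = sci, type = "probs")
head(round(probs, 3))
```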
 


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/Multinomial-Logistical-Model-tp3548239p3550003.html
Sent from the R help mailing list archive at Nabble.com.



[R] the mgcv package can not be loaded

2011-05-25 Thread gbrenes
Hi.

I have been trying to load the mgcv package but I always get the error
message:

 there is no package called 'nlme'
Error: package/namespace load failed for 'mgcv'


I load the package nlme and still I get the same message.  I have noticed
that there are some problems in using nlme in recent versions of R.  Is
there any suggestion or any special issue that I should know about nlme or
mgcv?

Thanks


Gilbert



Re: [R] stepwise selection cox model

2011-05-25 Thread David Winsemius


On May 25, 2011, at 5:28 AM, linda Porz wrote:


Sorry, I wrote the wrong subject in the first email!

Regards,
Linda

-- Forwarded message --
From: linda Porz linda.p...@gmail.com
Date: 2011/5/25
Subject: combined odds ratio
To: r-help@r-project.org
Cc: r-help-requ...@stat.math.ethz.ch


Dear all,

I am looking for an R function which does stepwise selection for a Cox
model (delta chi-square likelihood ratio test), similar to
stepwise, pe(0.05) lr: stcox in Stata.




Does the Stata method apply appropriate penalization to its stepwise
procedures?


I suspect you will find that the experts in survival analysis around  
these parts take a very dim view of stepwise procedures and I would  
not be surprised if they purposely put a barrier in front of naive  
users to protect them from falling into the well-described but perhaps  
not widely understood pitfalls of such methods.


I do know that Harrell provides some support for penalized methods
in his cph-related functions. See the function pentrace.  He also has
a fastbw function in rms which is provided mainly so one can
investigate and demonstrate those aforementioned pitfalls.


Note: You should not be adding the address r-help-requ...@stat.math.ethz.ch 
 to your postings. You may get cryptic replies in your Inbox. It is  
the address for interacting with the mail-server to manage your  
subscription options.



David Winsemius, MD
West Hartford, CT



Re: [R] R as.numeric()

2011-05-25 Thread David Winsemius


On May 25, 2011, at 7:25 AM, Lutz Fischer wrote:




Thanks a lot for both replies.

If I set up the option as proposed, everything works as I wanted it to.

I guess as.character would work as well. Only then I guess I would need
to loop through the data frame.


as.character is vectorized. You should not need loops.
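A sketch of the vectorized conversion from the R FAQ entry linked below, applied to every factor column at once with `lapply()` instead of nested loops. The small data frame `s` is a hypothetical stand-in for the spreadsheet subset, with "n/a" marking missing values.

```r
s <- data.frame(cI  = factor(c("30.94346629", "n/a", "0.026307482")),
                cII = factor(c("1.5", "2.25", "n/a")))

# Convert each factor column through its labels, not its integer codes;
# "n/a" is turned into NA before the conversion
to_num <- function(f) {
  x <- as.character(f)
  x[x == "n/a"] <- NA
  as.numeric(x)
}
s[] <- lapply(s, to_num)   # s[] keeps s a data frame

n <- data.matrix(s)        # now a numeric matrix with the expected values
print(n)
```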

--
David.


Lutz


On 24/05/11 22:42, Ista Zahn wrote:

This is a FAQ:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-do-I-convert-factors-to-numeric_003f

Please try there before posting a question to the list.

Best,
Ista
On Tue, May 24, 2011 at 5:33 PM, David Scott d.sc...@auckland.ac.nz wrote:

On 25/05/2011 9:20 a.m., Lutz Fischer wrote:

Hi,

I have a bit of a problem with as.numeric or as.double.

I read in an excel-file (either xlsx::read.xlsx2 or gdata::read.xls).

Select a subset and then try to make it numeric:


# read in the excel-file
alldata <- read.xlsx2("input.xls", 1)
# select the subset
s <- subset(alldata, select=c(cI,cII,cIII,cIV,cV))
# unluckily we have n/a for missing values in the file - so we turn it
# into proper missing values
s[s == "n/a"] <- NA

n <- data.matrix(s)




The problem I have is that it does not convert the data the way I would
expect.

Just as an example:

s[1,2]

[1] 30.94346629
3136 Levels: 0.026307482 0.028239812 0.02849896 0.029054564 0.029540352 0.030248034 0.030841352 0.032966308 ... n/a

turned into:

n[1,2]

[1] 3020

And I would like to have 30.94346629 there as well. I assume that has to
do with the Levels attribute - but I'm not sure what to make of these in
the first place.

I also tried to convert each value on its own:

# make some space that holds the actual numeric data
n <- array(dim=c(length(s[,1]),length(s)))
# now turn everything into doubles
for (c in 1:length(s)) {
  for (r in 1:length(s[,1])) {
    n[r,c] <- as.double(s[r,c])
  }
}

but that gave the same result - just a lot slower.



Thanks
Lutz



Your problem is the conversion to factors when the data is read. Use

options(stringsAsFactors = FALSE)

before you read the data, then the mixed columns of numeric and missing will
be read as character data and the conversion to numeric will go as you
expect. (But I haven't tested this.)

David Scott
--
_
David Scott Department of Statistics
  The University of Auckland, PB 92019
  Auckland 1142,NEW ZEALAND
Phone: +64 9 923 5055, or +64 9 373 7599 ext 85055
Email:  d.sc...@auckland.ac.nz,  Fax: +64 9 373 7018



David Winsemius, MD
West Hartford, CT



Re: [R] the mgcv package can not be loaded

2011-05-25 Thread Sarah Goslee
We really need some more information to be able to help you (as
requested in the posting guide):

What OS?
What version of R?

How did you install nlme? Were there any messages?

What happens when you type library(nlme) at the R prompt?

How did you install mgcv? Were there any messages?
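A sketch of checks that answer some of these questions from within R, using only base functions. If `find.package()` returns an empty vector, R cannot see nlme on any of the library paths it is searching.

```r
print(R.version.string)   # which version of R
print(.libPaths())        # library directories searched, in order

# Character vector of install locations; empty if nlme cannot be found
nlme_path <- find.package("nlme", quiet = TRUE)
print(nlme_path)

if (length(nlme_path) > 0) {
  print(packageVersion("nlme"))
}
```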


On Wed, May 25, 2011 at 11:13 AM,  gbre...@ssc.wisc.edu wrote:
 Hi.

 I have been trying to load the mgcv package but I always get the error
 message:

  there is no package called 'nlme'
 Error: package/namespace load failed for 'mgcv'


 I load the package nlme and still I get the same message.  I have noticed
 that there are some problems in using nlme in recent versions of R.  Is
 there any suggestion or any special issue that I should know about nlme or
 mgcv?

 Thanks


 Gilbert


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Print the content of several columns in only one

2011-05-25 Thread Scott Chamberlain
Or use melt in the reshape2 package to melt all columns to one with an indexing 
column to boot...
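A base-R sketch of both answers on a hypothetical two-column data frame: `do.call(paste, ...)` joins the columns row by row (the `paste()` route the poster found), and `stack()` is used here as a base-R stand-in for `reshape2::melt`, putting all values in one column with an index column naming the source column.

```r
d <- data.frame(a = c("x", "y"), b = c("z", "w"), stringsAsFactors = FALSE)

# Row-wise concatenation of all columns into one string per row
joined <- do.call(paste, c(d, sep = "-"))
print(joined)

# Long format: one "values" column plus an "ind" column with the source column name
long <- stack(d)
print(long)
```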

__Scott Chamberlain
Rice University, EEB Dept.

On Wednesday, May 25, 2011 at 7:06 AM, zoe.cryocla wrote: 
 OK, I found how to do it, 
 with the function paste() 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Print-the-content-of-several-columns-in-only-one-tp3549114p3549514.html
 Sent from the R help mailing list archive at Nabble.com.
 


Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka




 Date: Wed, 25 May 2011 10:18:48 -0400
 From: ro...@bestroman.com
 To: mailinglist.honey...@gmail.com
 CC: r-help@r-project.org
 Subject: Re: [R] Processing large datasets

  Hi,
  If your datasets are *really* huge, check out some packages listed
  under the Large memory and out-of-memory data section of the
  HighPerformanceComputing task view at CRAN:

  http://cran.r-project.org/web/views/HighPerformanceComputing.html

Does this have any specific limitations? It sounds offhand like it
does paging and all the needed buffering for arbitrary-size
data. Does it work with everything? I seem to recall bigmemory came up
before in this context and there was some problem.

Thanks.




  Also, if you find yourself needing to do lots of
  grouping/summarizing type of calculations over large data
  frame-like objects, you might want to check out the data.table package:

  http://cran.r-project.org/web/packages/data.table/index.html

  --
  Steve Lianoglou
  Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
  Contact Info: http://cbio.mskcc.org/~lianos/contact

 I don't think data.table is fundamentally different from data.frame type, but 
 thanks for the suggestion.

 http://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.pdf
 Just like data.frames, data.tables must fit inside RAM

 The ff package by Adler, listed under "Large memory and out-of-memory data",
 is probably the most interesting.

 --Roman Naumenko



Re: [R] stepwise selection cox model

2011-05-25 Thread Marc Schwartz
Hi,

You are unlikely to find one, as fundamentally, stepwise procedures are a bad 
way to engage in covariate selection. Search the list archives at rseek.org 
using 'stepwise' as the keyword to see a plethora of discussion on this point.

This is not a new issue BTW, as I happened to stumble upon this 1998 Stata FAQ 
recently during a related search:

  http://www.stata.com/support/faqs/stat/stepwise.html

and there are more recent literature citations and books that reinforce those 
points.

HTH,

Marc Schwartz

On May 25, 2011, at 4:28 AM, linda Porz wrote:

 Sorry, I wrote the wrong subject in the first email!
 
 Regards,
 Linda
 
 -- Forwarded message --
 From: linda Porz linda.p...@gmail.com
 Date: 2011/5/25
 Subject: combined odds ratio
 To: r-help@r-project.org
 Cc: r-help-requ...@stat.math.ethz.ch
 
 
 Dear all,
 
 I am looking for an R function which does stepwise selection cox model in r
 (delta chisq likelihood ratio test) similar to the stepwise, pe (0.05) lr:
 stcox in STATA.
 
 I am very thankful for any reply.
 
 Regards,
 Linda
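Linda's question concerns the likelihood-ratio (change in chi-square) test that drives Stata's stepwise, pe(0.05) lr. Without endorsing stepwise selection (see the replies in this thread), here is a minimal sketch of a single such test between two nested Cox models. It uses the survival package, which ships with R as a recommended package, and its lung data set; the covariates chosen are purely illustrative.

```r
library(survival)

# Complete cases only, so both models are fitted to exactly the same rows
lung2 <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog")])

fit_small <- coxph(Surv(time, status) ~ age + sex, data = lung2)
fit_big   <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung2)

# coxph()$loglik holds the log-likelihood at the initial and final coefficients
lr_stat <- 2 * (fit_big$loglik[2] - fit_small$loglik[2])
df_diff <- length(coef(fit_big)) - length(coef(fit_small))
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(LR = lr_stat, df = df_diff, p = p_value))

# anova() on the two nested fits reports the same likelihood-ratio test
print(anova(fit_small, fit_big))
```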



[R] Importing fixed-width data

2011-05-25 Thread James Rome
I have a data set where the lines look like:
2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
Some lines are missing the field before and after the NON:
2011-05-13 00:00:05 EONBHS229 mia13001621NON

I read them into R using
df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
   col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
   colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))

The documentation for read.fwf says that the data are read into a
dataframe. Yet, I get a list, and the conversions I specified do not
seem to have been obeyed:
> df[1:20,]
 DateTime  Flight Dest  ArrTime MsgType Conf
Runway Source
1  2011-05-13 00:00:00 AAL330   dfa 13002516  PSCNON  A
2  2011-05-13 00:00:01 AAL223   laa 13044510  AS.NON  M
. . .
> sapply(df, mode)
   DateTime      Flight        Dest     ArrTime     MsgType        Conf 
  "numeric"   "numeric"   "numeric"   "numeric" "character"   "numeric" 
     Runway      Source 
  "numeric"   "numeric" 
> dfn = df[!is.na(df$Source),]
> mode(df)
[1] "list"

What am I doing wrong?

Thanks,
Jim Rome



Re: [R] Importing fixed-width data

2011-05-25 Thread Ian Gow
Everything looks OK. Does this help?

> test <- data.frame(alpha=as.factor(c("A","A","B","B","C")), number=c(1,2,3,4,5))
> mode(test)
[1] "list"
> class(test)
[1] "data.frame"
> sapply(test, mode)
    alpha    number 
"numeric" "numeric" 
> sapply(test, class)
    alpha    number 
 "factor" "numeric" 

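To connect Ian's point back to read.fwf: a data frame is stored as a list, so mode() reports "list" for the frame and "numeric" for factor and POSIXct columns. class() is the check that shows whether colClasses took effect. A minimal sketch, feeding two of Jim's sample lines (as they appear in the post) through textConnection() and assuming the column layout implied by his widths. Every entry in col.names and colClasses needs quotes, and NA in colClasses simply lets R guess the type.

```r
# Two sample lines from the post, read through a connection instead of a file
lines <- c("2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA",
           "2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM")

dat <- read.fwf(textConnection(lines),
                widths = c(19, -4, 7, 3, 8, 2, 1, 3, 1),  # -4 skips " EON"
                col.names = c("DateTime", "Flight", "Dest", "ArrTime",
                              "MsgType", "Conf", "Runway", "Source"),
                colClasses = c("POSIXct", NA, "factor", "factor",
                               "character", "factor", "factor", "factor"))

mode(dat)    # "list": every data frame is a list underneath
class(dat)   # "data.frame"

# class(), not mode(), shows whether the requested conversions happened
# (POSIXct objects have two classes, so keep only the first for a compact summary)
sapply(dat, function(col) class(col)[1])
```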

On 5/25/11 10:42 AM, James Rome jamesr...@gmail.com wrote:

[snip]


Re: [R] stepwise selection cox model

2011-05-25 Thread Bert Gunter
See the Vignette in the glmnet package for one alternative approach to
variable selection. Of course, you need to gain some background to
know what you're doing here.

-- Bert

On Wed, May 25, 2011 at 8:38 AM, Marc Schwartz marc_schwa...@me.com wrote:
[snip]




-- 
Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions.

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics



Re: [R] Processing large datasets

2011-05-25 Thread Hugo Mildenberger
With PostgreSQL at least, R can also be used as the implementation
language for stored procedures. Hence data transfers between
processes can be avoided altogether.

   http://www.joeconway.com/plr/

Implementation of such a procedure in R appears to be straightforward:

   CREATE OR REPLACE FUNCTION overpaid (emp) RETURNS bool AS '
      if (20 < arg1$salary) {
         return(TRUE)
      }
      if (arg1$age < 30 && 10 < arg1$salary) {
         return(TRUE)
      }
      return(FALSE)
   ' LANGUAGE 'plr';

   CREATE TABLE emp (name text, age int, salary numeric(10,2));
   INSERT INTO emp VALUES ('Joe', 41, 25.00);
   INSERT INTO emp VALUES ('Jim', 25, 12.00);
   INSERT INTO emp VALUES ('Jon', 35, 5.00);

   SELECT name, overpaid(emp) FROM emp;
    name | overpaid
   ------+----------
    Joe  | t
    Jim  | t
    Jon  | f
   (3 rows)


Best 



On Wednesday 25 May 2011 14:12:23 Jonathan Daily wrote:
 In cases where I have to parse through large datasets that will not
 fit into R's memory, I will grab relevant data using SQL and then
 analyze said data using R. There are several packages designed to do
 this, like [1] and [2] below, that allow you to query a database using
 SQL and end up with that data in an R data.frame.
 
 [1] http://cran.cnr.berkeley.edu/web/packages/RMySQL/index.html
 [2] http://cran.cnr.berkeley.edu/web/packages/RSQLite/index.html
 
 On Wed, May 25, 2011 at 12:29 AM, Roman Naumenko ro...@bestroman.com wrote:
  Hi R list,
 
  I'm new to R software, so I'd like to ask about its capabilities.
  What I'm looking to do is to run some statistical tests on quite big
  tables which are aggregated quotes from a market feed.
 
  This is a typical set of data.
  Each day contains millions of records (up to 10 non filtered).
 
  2011-05-24  750 Bid DELL14130770400
  15.4800 BATS35482391Y   1   1   0   0
  2011-05-24  904 Bid DELL14130772300
  15.4800 BATS35482391Y   1   0   0   0
  2011-05-24  904 Bid DELL14130773135
  15.4800 BATS35482391Y   1   0   0   0
 
  I'll need to filter it out first based on some criteria.
  Since I keep it in a MySQL database, it can be done with a query. Not
  super efficient, I checked it already.
  
  Then I need to aggregate the dataset into different time frames (time is
  represented in ms from midnight, like 35482391).
  Again, this can be done with a database query; I'm not sure which is going to be faster.
  The aggregated tables are going to be much smaller, like thousands of rows per
  observation day.
 
  Then calculate basic statistic: mean, standard deviation, sums etc.
  After stats are calculated, I need to perform some statistical
  hypothesis tests.
 
  So, my question is: which tool is faster for data aggregation and filtering
  on big datasets: MySQL or R?
 
  Thanks,
  --Roman N.
 

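For the aggregation step Roman describes, a minimal base-R sketch once the filtered rows are in a data frame: bucket the milliseconds-since-midnight timestamps into 1-minute frames with integer division, then compute per-frame counts, means and standard deviations. The column names and values are hypothetical, and no claim is made here about whether this beats the equivalent GROUP BY in MySQL.

```r
# Hypothetical filtered quotes: time in ms since midnight, as in the post
quotes <- data.frame(
  ms    = c(35482391, 35485000, 35541000, 35550000, 35610000),
  price = c(15.48, 15.49, 15.47, 15.50, 15.52)
)

# 1-minute frames: 60000 ms per minute (35482391 ms falls in minute 591, i.e. 09:51)
quotes$minute <- quotes$ms %/% 60000

n_obs <- tapply(quotes$price, quotes$minute, length)
means <- tapply(quotes$price, quotes$minute, mean)
sds   <- tapply(quotes$price, quotes$minute, sd)   # NA for one-row frames

summary_df <- data.frame(minute = as.numeric(names(n_obs)),
                         n      = as.vector(n_obs),
                         mean   = as.vector(means),
                         sd     = as.vector(sds))
print(summary_df)
```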



[R] Trouble Combining With Paste

2011-05-25 Thread Sparks, John James
Dear R Helpers,

I am having trouble combining some pieces of programming that work fine
individually, but fall down when I try to get them to work together.

The end goal is to take a data frame, and if any of the variables has more
than 10 values, then use cut2 to reduce the number of (effective) values
to 10.  I want to do this in automated fashion, which is where the
combining comes in.

For example all of these pieces work as I would expect:


tables<-lapply(infert,table)
lengths<-lapply(tables,length)
toolong<-which(lengths>10)

require(Hmisc)

foo<-as.numeric(cut2(infert$age,g=10,levels.mean=TRUE))
str(foo)
#num [1:248] 2 10 9 7 7 8 1 6 1 3 ...

bar<-paste("inftert$",attr(toolong[1],"names"),sep="")
bar
#[1] "inftert$age"

But the following gives an error:

foobar<-as.numeric(cut2(paste("inftert$",attr(toolong[1],"names"),sep=""),g=10,levels.mean=TRUE))
Error in min(diff(x.unique))/2 : non-numeric argument to binary operator
In addition: Warning message:
In min(diff(x.unique)) : no non-missing arguments, returning NA


Your guidance would be much appreciated.

--John J. Sparks, Ph.D.



Re: [R] the mgcv package can not be loaded

2011-05-25 Thread Sarah Goslee
Well, that answered some of my questions, though you forgot to send
your answer to the r-help list rather than just to me. I don't use
windows, so someone else may have better advice.

On Wed, May 25, 2011 at 12:02 PM,  gbre...@ssc.wisc.edu wrote:
 Sorry, I forgot to be more specific.

 I am using Windows XP.

 I am using R.12.2


 I installed both packages from the install packages menu.

And were there any messages?

 I always write library(name.of.library), and it is enough.

 But when I write library(nlme), R does not find nlme right away

 I load nlme first and it says package was downloaded successfully.

load? Installed? Downloaded successfully is not the same as installed
successfully. How about the actual wording?

 However, when I try to do this again in another day, R cannot find nlme,
 so I try to load mgcv with library(mgcv), then I get this message:

 Error: package 'nlme' could not be loaded
 In addition: Warning message:
 In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc =
 lib.loc) :
  there is no package called 'nlme'



 Is there any problem with nlme that I need to install it every time I open R?

I wouldn't think so. But obviously something is not right, and you
still haven't provided enough information to be able to diagnose the
problem.

Sarah


 Gilbert



 We really need some more information to be able to help you (as
 requested in the posting guide):

 What OS?
 What version of R?

 How did you install nlme? Were there any messages?

 What happens when you type library(nlme) at the R prompt?

 How did you install mgcv? Were there any messages?


 On Wed, May 25, 2011 at 11:13 AM,  gbre...@ssc.wisc.edu wrote:
 Hi.

 I have been trying to load the mgcv package but I always get the error
 message:

  there is no package called 'nlme'
 Error: package/namespace load failed for 'mgcv'


 I load the package nlme and still I get the same message.  I have
 noticed
 that there are some problems in using nlme in recent versions of R.  Is
 there any suggestion or any special issue that I should know about nlme
 or
 mgcv?

 Thanks


 Gilbert




-- 
Sarah Goslee
http://www.functionaldiversity.org
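Most of the information Sarah asks for can be collected with a few base-R calls. A minimal sketch: sessionInfo() reports the R version and platform, .libPaths() lists the library folders this session searches, and find.package() and requireNamespace() check whether nlme is installed in any of them without stopping with an error. One thing worth comparing across sessions is whether the folder nlme was installed into still appears in .libPaths().

```r
# R version and platform
info <- sessionInfo()
print(info$R.version$version.string)

# Library folders this session searches, in order
lib_dirs <- .libPaths()
print(lib_dirs)

# Where (if anywhere) nlme is installed; character(0) means not found
nlme_dir <- find.package("nlme", quiet = TRUE)
print(nlme_dir)

# TRUE/FALSE instead of an error, so the check can run unattended
has_nlme <- requireNamespace("nlme", quietly = TRUE)
print(has_nlme)
```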



Re: [R] Trouble Combining With Paste

2011-05-25 Thread Joshua Wiley
Hi John,

The issue is that:

"infert$age" != infert$age

One is a text string, the other references the information stored in
the age variable of the infert object.  If you need to pass the names
as a string, use [ instead:

## for a data frame
infert[, "age"]
## for a list
infert[["age"]]

It looks like from your code maybe: infert[, attr(toolong[1], "names")]

HTH,

Josh


On Wed, May 25, 2011 at 9:02 AM, Sparks, John James jspa...@uic.edu wrote:
[snip]




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
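A minimal sketch of Josh's point, using the infert data set that ships with R: paste() builds a character string that merely looks like code, while [[ or [ with the column name retrieves the actual column, which is what cut2() (or any numeric function) needs.

```r
data(infert)

col_name <- "age"

as_text <- paste("infert$", col_name, sep = "")   # just a string
print(as_text)                                     # the text infert$age
print(class(as_text))                              # "character"

by_name1 <- infert[[col_name]]     # list-style extraction
by_name2 <- infert[, col_name]     # data-frame-style extraction
print(head(by_name1))
```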



Re: [R] Trouble Combining With Paste

2011-05-25 Thread Sarah Goslee
You need to use get() so that you are acting on the dataframe,
and not the string that names the dataframe.

Sarah

On Wed, May 25, 2011 at 12:02 PM, Sparks, John James jspa...@uic.edu wrote:
[snip]



-- 
Sarah Goslee
http://www.functionaldiversity.org
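A minimal sketch of Sarah's suggestion: get() looks up an object by its name, so it works when the data frame's name is stored as a string. It takes an object name, not an expression, so get("infert$age") fails; the column still has to be extracted with [[ afterwards.

```r
data(infert)

df_name  <- "infert"   # name of the data frame, as a string
col_name <- "age"      # name of the column, as a string

x <- get(df_name)[[col_name]]

# get() does not evaluate code: there is no object literally called "infert$age"
res <- tryCatch(get("infert$age"), error = function(e) "lookup failed")
print(res)
```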



Re: [R] barplot groups of different size i.e. height is NOT a matrix

2011-05-25 Thread Walmes Zeviani
Victor,

I agree with Marc's point of view. So, if you can use another representation
of your data, like points, consider looking at
http://lmdvr.r-forge.r-project.org/figures/figures.html figures 10.20 and
10.21 as a starting point.

Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==
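In the spirit of Walmes's suggestion, one base-R way to show groups of unequal size with points rather than bars is stripchart(). It takes a value ~ group formula, so no rectangular height matrix is needed. A minimal sketch with hypothetical data; it is swapped in here as an alternative to the lattice figures linked above, not a reproduction of them.

```r
# Hypothetical data: three groups with 3, 5 and 2 observations
d <- data.frame(
  group = factor(c("A", "A", "A", "B", "B", "B", "B", "B", "C", "C")),
  value = c(4.1, 3.8, 4.5, 5.2, 5.0, 4.9, 5.6, 5.1, 3.2, 3.5)
)

pdf(NULL)   # draw to a null device; drop this line to plot on screen
stripchart(value ~ group, data = d, vertical = TRUE, pch = 19,
           method = "jitter", xlab = "group", ylab = "value")

# Mark each group's mean with a short horizontal segment
grp_means <- tapply(d$value, d$group, mean)
segments(seq_along(grp_means) - 0.2, grp_means,
         seq_along(grp_means) + 0.2, grp_means, lwd = 2)
dev.off()
```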



[R] [Fwd: Re: the mgcv package can not be loaded]

2011-05-25 Thread gbrenes
Sorry, I forgot to be more specific.

I am using Windows XP.

I am using R.12.2


I installed both packages from the install packages menu.

I always write library(name.of.library), and it is enough.

But when I write library(nlme), R does not find nlme right away

I load nlme first and it says package was downloaded successfully.

However, when I try to do this again in another day, R cannot find nlme,
so I try to load mgcv with library(mgcv), then I get this message:

Error: package 'nlme' could not be loaded
In addition: Warning message:
In library(pkg, character.only = TRUE, logical.return = TRUE, lib.loc =
lib.loc) :
  there is no package called 'nlme'



Is there any problem with nlme that I need to install it every time I open R?


Gilbert



 We really need some more information to be able to help you (as
 requested in the posting guide):

 What OS?
 What version of R?

 How did you install nlme? Were there any messages?

 What happens when you type library(nlme) at the R prompt?

 How did you install mgcv? Were there any messages?


 On Wed, May 25, 2011 at 11:13 AM,  gbre...@ssc.wisc.edu wrote:
 Hi.

 I have been trying to load the mgcv package but I always get the error
 message:

  there is no package called 'nlme'
 Error: package/namespace load failed for 'mgcv'


 I load the package nlme and still I get the same message.  I have
 noticed
 that there are some problems in using nlme in recent versions of R.  Is
 there any suggestion or any special issue that I should know about nlme
 or
 mgcv?

 Thanks


 Gilbert


 --
 Sarah Goslee
 http://www.functionaldiversity.org




Re: [R] Trouble Combining With Paste

2011-05-25 Thread Phil Spector

John -
   Try

infert[,toolong] = sapply(infert[,toolong],cut2,g=10,levels.mean=TRUE)


- Phil Spector
 Statistical Computing Facility
 Department of Statistics
 UC Berkeley
 spec...@stat.berkeley.edu
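Phil's one-liner applies cut2() to every flagged column at once. A base-R sketch of the same idea for readers without Hmisc: the helper bin10() below is hypothetical and uses cut() on quantile breaks. That is similar in spirit to cut2(x, g = 10) but does not reproduce its exact group boundaries or its levels.mean labels. lapply() is used instead of sapply() so each column comes back as its own vector.

```r
data(infert)

# Hypothetical stand-in for Hmisc::cut2(x, g = 10): group number 1..10 by quantiles
bin10 <- function(x) {
  br <- unique(quantile(x, probs = seq(0, 1, length.out = 11), na.rm = TRUE))
  as.numeric(cut(x, breaks = br, include.lowest = TRUE))
}

# Same test as in the question: more than 10 distinct values
n_unique <- sapply(infert, function(col) length(unique(col)))
toolong  <- names(n_unique)[n_unique > 10]
print(toolong)

binned <- infert
binned[toolong] <- lapply(binned[toolong], bin10)
str(binned[toolong])
```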


On Wed, 25 May 2011, Sparks, John James wrote:


[snip]


Re: [R] Processing large datasets

2011-05-25 Thread Steve Lianoglou
Hi,

On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka marchy...@hotmail.com wrote:
[snip]
  If your datasets are *really* huge, check out some packages listed
  under the Large memory and out-of-memory data section of the
  HighPerformanceComputing task view at CRAN:

  http://cran.r-project.org/web/views/HighPerformanceComputing.html

 Does this have any specific limitations? It sounds offhand like it
 does paging and all the needed buffering for arbitrary size
 data. Does it work with everything?

I'm not sure what limitations ... I know the bigmemory (and ff)
packages try hard to make using out-of-memory datasets as
transparent as possible.

That having been said, I guess you will have to port more advanced
methods to use such packages, hence the existence of the biglm,
biganalytics, and bigtabulate packages.

 I seem to recall bigmemory came up
 before in this context and there was some problem.

Well -- I don't often see emails on this list complaining about their
functionality. That doesn't mean they're flawless (I also don't
scrutinize the list traffic too closely). It could be that not too
many people use them, or that people give up before they come knocking
when there is a problem.

Has something specifically failed for you in the past, or?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
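To illustrate what "porting" a method to out-of-memory data involves (the reason packages like biglm exist), here is a minimal base-R sketch. It computes a mean and standard deviation by reading a connection a few lines at a time and keeping only running totals. A textConnection() stands in for a large file; this is not how bigmemory or ff work internally.

```r
# Stand-in for a large file: one number per line
con <- textConnection(as.character(1:10))

n <- 0; total <- 0; total_sq <- 0
repeat {
  chunk <- readLines(con, n = 3)            # read at most 3 lines per pass
  if (length(chunk) == 0) break
  x <- as.numeric(chunk)
  n        <- n + length(x)
  total    <- total + sum(x)
  total_sq <- total_sq + sum(x^2)
}
close(con)

chunk_mean <- total / n
# Sample SD from running sums; when values have a large mean relative to
# their spread this formula loses precision, and Welford's update is safer
chunk_sd <- sqrt((total_sq - n * chunk_mean^2) / (n - 1))
print(c(mean = chunk_mean, sd = chunk_sd))
```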



Re: [R] Adjusted Rate Ratios in R

2011-05-25 Thread Walmes Zeviani
Matthew,

You can change the contrast matrix (restriction) involved. Start from

help(contr.sum)

to see how to specify this.

Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==
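A minimal sketch of what Walmes describes, with hypothetical event counts and person-time. In a Poisson GLM with a log person-time offset, the exponentiated coefficients are rates and rate ratios, and the contrast matrix decides what the ratios are relative to. With the default contr.treatment each group is compared with the first; with contr.sum each group is compared with the geometric mean of all group rates.

```r
# Hypothetical data: events and person-time for three groups
d <- data.frame(group  = factor(c("a", "b", "c")),
                events = c(10, 20, 30),
                ptime  = c(100, 100, 100))

print(contr.treatment(3))   # each level vs the first
print(contr.sum(3))         # last level coded -1: effects sum to zero

fit_trt <- glm(events ~ group + offset(log(ptime)), family = poisson, data = d)
fit_sum <- glm(events ~ group + offset(log(ptime)), family = poisson, data = d,
               contrasts = list(group = "contr.sum"))

exp(coef(fit_trt))   # rate in group a, then rate ratios b/a and c/a
exp(coef(fit_sum))   # geometric mean rate, then rate ratios a/gm and b/gm
```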



Re: [R] stepwise selection cox model

2011-05-25 Thread David Winsemius


On May 25, 2011, at 12:11 PM, linda Porz wrote:

Many thanks for your reply. I have run a stepwise selection in Stata  
and R using the function fastbw (rule="p") from the Design package. Both  
functions give the same results. Is this because both functions do  
the same job or can it be that for different data one will have  
different results?


I don't understand your question. Why would giving the same results  
be a concern? And why would one expect that with different data one  
would _not_ get different results? The point of the critique against  
stepwise procedures is that they assume too much determinism (i.e.  
that all of the internal structure of the small sample of data will be  
present in the wider universe) and that they generate too much  
confidence on the part of the unwary and insufficiently educated user.


--
David.



Many thanks,
Linda



2011/5/25 Bert Gunter gunter.ber...@gene.com
[snip]



David Winsemius, MD
West Hartford, CT



Re: [R] How do I assign boolean (o,1) values to a column?

2011-05-25 Thread Xenimes
Thank you very much. I managed to count the number of Markers 2 linked to
Markers 3, and Markers 1 to Markers 3, with the aggregate function:

with(data,aggregate(Marker1,list(Marker2=Marker2),length))
data2<-with(data,aggregate(Marker1,list(Marker2=Marker2,Marker3=Marker3),length))

So now it is easy: I only need to apply an if, and it's solved.

I want to thank Steve and David; the info you gave was actually useful,
and I learned ave() now. I hope I can start being useful in the R blog
myself soon.

Regards

--
View this message in context: 
http://r.789695.n4.nabble.com/How-do-I-assign-boolean-o-1-values-to-a-column-tp3544304p3550309.html
Sent from the R help mailing list archive at Nabble.com.
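For readers landing on this thread, a minimal sketch of the two steps described above, with hypothetical marker data: aggregate() with length counts rows per combination of markers, and ave() attaches a per-row group count from which a 0/1 column is derived. The threshold of 2 is arbitrary.

```r
# Hypothetical marker data
d <- data.frame(Marker1 = c("m1", "m2", "m3", "m4", "m5"),
                Marker2 = c("x", "x", "y", "y", "y"),
                Marker3 = c("p", "p", "p", "q", "q"),
                stringsAsFactors = FALSE)

# Count rows for each Marker2/Marker3 combination (count column is named "x")
counts <- with(d, aggregate(Marker1, list(Marker2 = Marker2, Marker3 = Marker3), length))
print(counts)

# ave() returns a group statistic repeated for every row of that group
d$n_marker2 <- ave(seq_len(nrow(d)), d$Marker2, FUN = length)

# Boolean (0/1) column: 1 where the Marker2 group has more than 2 rows
d$flag <- as.integer(d$n_marker2 > 2)
print(d)
```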



Re: [R] How to intantiate a list of data.frames?

2011-05-25 Thread Rui Maximo

Hi Josh,

You are definitely right, and you were all along.
Yes, the problem was always with write.csv(). I thought it was with ds.
Thank you very much.

Cheers,
Rui
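Josh's suggestion of writing each element of ds to its own file can be sketched as below. The two data frames (with different numbers of rows) and the file names are hypothetical, and tempdir() is used so the example does not touch the working directory.

```r
# Hypothetical stand-in for ds: two data frames with different row counts
ds <- list(data.frame(timeHR = 1:3, hr = c(60, 62, 61)),
           data.frame(timeVG = 1:5, vg = c(0.1, 0.2, 0.2, 0.3, 0.1)))

# write.csv() expects a single table, so write one file per list element
files <- file.path(tempdir(), paste0("mussel_signal", seq_along(ds), ".csv"))
for (k in seq_along(ds)) {
  write.csv(ds[[k]], files[k], row.names = FALSE)
}

# Read them back to check the row counts survived
back <- lapply(files, read.csv)
sapply(back, nrow)
```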

 Date: Tue, 24 May 2011 14:30:56 -0700
 Subject: Re: [R] How to intantiate a list of data.frames?
 From: jwiley.ps...@gmail.com
 To: ruimax...@hotmail.com
 CC: r-help@r-project.org
 
 Hi Rui,
 
 Please look at the documentation for ?write.csv
 
 I do not have oilDF, but my guess is that you make the object, ds
 fine, but then you are trying to pass a list to write.csv which works
 on matrices or data frames (or attempts to coerce to such).  The
 easiest answer is probably to write each element of ds (that is,
 each data frame) to a separate file.
 
 Cheers,
 
 Josh
 
 On Sun, May 22, 2011 at 12:11 PM, Rui Maximo ruimax...@hotmail.com wrote:
  I will post the whole function, but I believe the problem is in the third
  part.
  The issue is that oilDF has a different number of rows than oilDF2.
 
  Thank you,
  Rui
 
  myScan <- function(dirPath, num)
  {
  #dirPath is the name of the directory where we want to apply the function.
  #It should be called from the immediate above level without the last 3
  #characters. For example dirPath="oil_0"
  #num is the mussel number
  
  #Heart rate
  startPath=getwd()
  workPath=paste(startPath,"/", dirPath,"_HR", sep="")
  setwd(workPath)
  temp=dir()
  d=sort(temp)
  oilDF=read.table (d[1], header=TRUE)
  oilDF=data.frame(oilDF[,1], oilDF[,2], oilDF[,num+2])
  for(i in 2:length(d))
  {
  temp <- read.table(d[i], header=TRUE)
  temp=data.frame(temp[,1], temp[,2], temp[,num+2])
  colnames(temp) <- colnames(oilDF)
  oilDF=rbind(oilDF,temp)
  }
  setwd(startPath)
  
  #Valve Gape
  workPath=paste(startPath,"/", dirPath,"_VG", sep="")
  setwd(workPath)
  temp=dir()
  d=sort(temp)
  oilDF2=read.table (d[1], header=FALSE)
  oilDF2=data.frame(oilDF2[,1],oilDF2[,2],oilDF2[,num+3])
  for(i in 2:length(d))
  {
  temp <- read.table(d[i], header=FALSE)
  temp=data.frame(temp[,1], temp[,2], temp[,num+3])
  colnames(temp) <- colnames(oilDF2)
  oilDF2=rbind(oilDF2,temp)
  }
  
  #Pack both signals in a vector of dataframes for each Mussel.
  ds <- vector("list", 2)
  timeHR = as.numeric(strptime(paste(oilDF[,1],oilDF[,2]), "%m/%d/%y %H:%M:%OS"))
  timeVG = as.numeric(strptime(paste(oilDF2[,1],oilDF2[,2]), "%d/%m/%y %H:%M:%OS"))
  ds[[1]] <- data.frame(timeHR,oilDF[,3])
  ds[[2]] <- data.frame(timeVG,oilDF2[,3])
  write.csv(ds,paste(startPath, "/", "mussel_", i, dirPath, ".csv", sep=""))
  return(ds)
  }
 
  Date: Sun, 22 May 2011 11:33:38 -0700
  Subject: Re: [R] How to intantiate a list of data.frames?
  From: jwiley.ps...@gmail.com
  To: ruimax...@hotmail.com
  CC: r-help@r-project.org
 
  Hi Rui,
 
  The columns of a data frame must all have the same number of rows, but two
  different data frames stored within a list do not need to have the same
  number of rows. Can you please post the code that is giving the error?
 
  Josh
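The distinction Josh draws can be sketched with two small hypothetical data frames. A list holds them without complaint, but combining them column-wise into one data frame raises the same "differing number of rows" error Rui reported:

```r
# Two hypothetical data frames with different numbers of rows
df1 <- data.frame(time = 1:3, hr = c(60, 62, 61))
df2 <- data.frame(time = 1:5, vg = c(0.1, 0.4, 0.3, 0.2, 0.5))

# A list can hold them side by side without complaint
ds <- list(df1, df2)
sapply(ds, nrow)

# Combining them column-wise into one data frame fails instead,
# because all columns of a data frame must have the same length
msg <- tryCatch(data.frame(df1, df2), error = function(e) conditionMessage(e))
msg
```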
 
  On Sun, May 22, 2011 at 9:41 AM, Rui Maximo ruimax...@hotmail.com wrote:
   Hi Josh,
  
   Sorry, your examples have an equal number of rows in both df and df2.
   In my situation they don't.
   Strangely, your solution worked only when I copy-pasted the code into
   the command line.
   If I use the code inside of a function I get an error at:
   return(ds)
   ERROR: arguments imply differing number of rows
  
   Thanks,
   Rui
  
   Date: Sat, 21 May 2011 11:46:05 -0700
   Subject: Re: [R] How to intantiate a list of data.frames?
   From: jwiley.ps...@gmail.com
   To: ruimax...@hotmail.com
   CC: r-help@r-project.org
  
   Hi Rui,
  
   Here is one option:
  
   ds <- vector("list", 6)
   for(i in 1:6) ds[[i]] <- list(df = mtcars[, c(i, i + 2)], df2 =
   mtcars[, c(i, i + 2)] + 10)
  
   another could be:
  
   altds <- lapply(1:6, function(x) {
   list(df = mtcars[, c(x, x + 2)], df2 = mtcars[, c(x, x + 2)] + 10)
   })
  
   all.equal(ds, altds)
  
   For some documentation, see
  
   ?vector
   ?lapply
  
   Cheers,
  
   Josh
  
   On Sat, May 21, 2011 at 10:47 AM, Rui Maximo ruimax...@hotmail.com
   wrote:
   
Hello,
   
 I am a newbie to R and I want to do this:
   
for(i in 1:6)
{
   ds[i] <- list(df=data.frame(oilDF[,1],oilDF[,i+2]),
df2=data.frame(oilDF2[,1],oilDF2[,i+2]))
}
   
#oilDF and oilDF2 are 2 data frames with several columns. They have
#different numbers of rows
   
#I want to have for example ds[1]$df, ds[1]$df2 with the respective
#data.frames.
#How can I instantiate a list of data.frame pairs with different
#numbers of rows?
   
Thank you,
Rui
   

Re: [R] stepwise selection cox model

2011-05-25 Thread linda Porz
Many thanks for your reply. I have run a stepwise selection in Stata and in R
using the function fastbw (rule="p") from the Design package. Both give the
same results. Is this because both functions do the same job, or could they
give different results on other data?

Many thanks,
Linda
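A minimal sketch of one backward-elimination step driven by likelihood-ratio tests. It uses coxph() and drop1() from the survival package and that package's lung data, which stand in for Linda's model and data. drop1(..., test = "Chisq") reports the change in -2 log-likelihood for dropping each term, the kind of test Stata's lr option uses. As far as I know, fastbw() works from Wald statistics rather than refitted likelihoods, so agreement between the two is not guaranteed. Marc's caveats about stepwise selection still apply.

```r
library(survival)  # recommended package shipped with standard R installs

# Keep complete cases for the covariates used, so every refit uses the
# same observations (drop1 refuses to compare fits on different rows)
lung2 <- na.omit(lung[, c("time", "status", "age", "sex", "ph.ecog", "wt.loss")])

fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog + wt.loss, data = lung2)

# Likelihood-ratio test (change in -2 log-likelihood) for dropping each term
lrt <- drop1(fit, test = "Chisq")
print(lrt)

# One backward step with p-to-remove 0.05: drop the term with the largest
# p-value, but only if that p-value exceeds 0.05
p <- lrt[["Pr(>Chi)"]][-1]          # first row is the full model, "<none>"
names(p) <- rownames(lrt)[-1]
worst <- names(which.max(p))
if (p[worst] > 0.05) {
  fit <- update(fit, as.formula(paste(". ~ . -", worst)))
}
fit
```

Repeating the step until no p-value exceeds the threshold gives a full backward elimination.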



2011/5/25 Bert Gunter gunter.ber...@gene.com

 See the Vignette in the glmnet package for one alternative approach to
 variable selection. Of course, you need to gain some background to
 know what you're doing here.

 -- Bert

 On Wed, May 25, 2011 at 8:38 AM, Marc Schwartz marc_schwa...@me.com
 wrote:
  Hi,
 
  You are unlikely to find one, as fundamentally, stepwise procedures are a
 bad way to engage in covariate selection. Search the list archives at
 rseek.org using 'stepwise' as the keyword to see a plethora of discussion
 on this point.
 
  This is not a new issue BTW, as I happened to stumble upon this 1998
 Stata FAQ recently during a related search:
 
   http://www.stata.com/support/faqs/stat/stepwise.html
 
  and there are more recent literature citations and books that reinforce
 those points.
 
  HTH,
 
  Marc Schwartz
 
  On May 25, 2011, at 4:28 AM, linda Porz wrote:
 
  Sorry, I wrote the wrong subject in my first email!
 
  Regards,
  Linda
 
  -- Forwarded message --
  From: linda Porz linda.p...@gmail.com
  Date: 2011/5/25
  Subject: combined odds ratio
  To: r-help@r-project.org
  Cc: r-help-requ...@stat.math.ethz.ch
 
 
  Dear all,
 
  I am looking for an R function that does stepwise selection for a Cox model
  (delta chi-square likelihood-ratio test), similar to stepwise, pe(0.05)
  lr: stcox in Stata.
 
  I am very thankful for any reply.
 
  Regards,
  Linda
 
 



 --
 Men by nature long to get on to the ultimate truths, and will often
 be impatient with elementary studies or fight shy of them. If it were
 possible to reach the ultimate truths without the elementary studies
 usually prefixed to them, these would not be preparatory studies but
 superfluous diversions.

 -- Maimonides (1135-1204)

 Bert Gunter
 Genentech Nonclinical Biostatistics




[R] Subtracting rows by id

2011-05-25 Thread Sara Maxwell
Dear R users,

I have two datasets:

id1 <- c(rep(1,10), rep(2,10), rep(3,10))
value1 <- sample(1:100, 30, replace=TRUE)
dataset1 <- cbind(id1,value1)

id2 <- c(1,2,3)
subtract.value <- c(1,3,5)
dataset2 <- cbind(id2, subtract.value)

For each id, I want to remove from dataset1 the number of leading rows given
by the subtract.value that corresponds to that id. So for id 1 I want to
remove the first row, for id 2 the first 3 rows, and for id 3 the first 5
rows, finally creating a new data frame with the remaining values.

I am having trouble structuring a loop that can do this by the unique  
ids in the first dataset while matching the ids in the datasets.

Any thoughts would be greatly appreciated.

Thank you,
Sara
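A loop-free sketch. It assumes the rows within each id are already in the order in which "first" should be counted. For readability it stores the data as data frames rather than the cbind() matrices above. set.seed() is added only so the random values repeat.

```r
set.seed(1)  # only so that sample() gives repeatable values
id1 <- c(rep(1,10), rep(2,10), rep(3,10))
value1 <- sample(1:100, 30, replace=TRUE)
dataset1 <- data.frame(id1, value1)
dataset2 <- data.frame(id2 = c(1,2,3), subtract.value = c(1,3,5))

# Position of each row within its id group: 1, 2, ..., n for each id
pos <- ave(dataset1$id1, dataset1$id1, FUN = seq_along)

# How many leading rows to drop, looked up by the id of each row
drop_n <- dataset2$subtract.value[match(dataset1$id1, dataset2$id2)]

# Keep only the rows that come after the dropped ones
result <- dataset1[pos > drop_n, ]
table(result$id1)
```

Using match() rather than relying on row order in dataset2 keeps the lookup correct even if dataset2 is sorted differently.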




Re: [R] barplot groups of different size i.e. height is NOT a matrix

2011-05-25 Thread Walmes Zeviani
You can produce a graph similar to the ggplot2 one with lattice::barchart:

require(lattice)
dataset <- data.frame(Main=c("A","A","A","B","B"),
  Detail=c("a","b","c","1","2"),
  value=runif(5, min= 0.5, max=1))
barchart(value~Detail|Main, data=dataset,
 scales=list(x=list(relation="free")))

Walmes.

==
Walmes Marques Zeviani
LEG (Laboratório de Estatística e Geoinformação, 25.450418 S, 49.231759 W)
Departamento de Estatística - Universidade Federal do Paraná
fone: (+55) 41 3361 3573
VoIP: (3361 3600) 1053 1173
e-mail: wal...@ufpr.br
twitter: @walmeszeviani
homepage: http://www.leg.ufpr.br/~walmes
linux user number: 531218
==


On Wed, May 25, 2011 at 12:04 PM, Marc Schwartz marc_schwa...@me.com wrote:

 On May 25, 2011, at 7:56 AM, Victor Gabillon wrote:

  Hello,
 
  I want to use the function barplot to display several groups of bars.
  A standard example is given at this link
 
 http://onertipaday.blogspot.com/2007/05/make-many-barplot-into-one-plot.html
 
  But in their example the 4 groups of bars are all composed of 8 bars.
  I want to be able do display the same kind of graph but where the number
 of bars in each group are not the same. For example the first group of bars
 would have 2 bars and the second group of bars would have 10 bars.
 
  The barplot function has a first parameter named height, which is a matrix
 where each column holds the values for the bars of one particular group
 (with beside=TRUE).
  One solution could be to pad the height matrix with NA values, but then the
 space occupied by each group equals the size of the largest group, so
 you end up with empty gaps where the NAs are.
 
  Do you know how to solve this problem?
  Do I have to consider multiple barplots in the same plot with the same
 axis? (BTW, I don't know how to do that.)
 
  In fact the bars would represent the performance of an algorithm.
  A group of bars would be the performance of one algorithm with different
 parameters.
  But when comparing different algorithms, we may not want to display the
 same number of parameter settings for each algorithm.
 
  Thanks for your help.
  Victor


 barplot() is fundamentally built upon the use of rect() to construct the
 bars, so you could always create your own variant to allow for the
 flexibility that you desire.
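Short of writing a rect()-based variant, unequal groups can be sketched with barplot()'s space argument on a plain vector of heights. A wider gap before the first bar of each group separates the groups, so no NA padding is needed. The performance values and the A/B group names are made up for illustration.

```r
# Hypothetical performance values: algorithm A with 2 settings, B with 5
perf <- list(A = c(0.81, 0.77), B = c(0.62, 0.70, 0.74, 0.69, 0.66))
heights <- unlist(perf)          # names A1, A2, B1, ..., B5
n <- sapply(perf, length)        # bars per group

# Gap of 0.2 bar widths between bars, but a full bar width before the
# first bar of every group after the first
first_in_group <- cumsum(c(1, head(n, -1)))
sp <- rep(0.2, length(heights))
sp[first_in_group[-1]] <- 1

# barplot() returns the x positions of the bar midpoints
mids <- barplot(heights, space = sp, names.arg = names(heights), ylim = c(0, 1))

# Label each group under the centre of its bars
grp_centres <- tapply(mids, rep(names(perf), n), mean)
mtext(names(grp_centres), side = 1, line = 2, at = grp_centres)
```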

 That being said, if your performance measures (the bar heights) are other
 than discrete counts or proportions, I would advise you to consider using
 other visual presentation forms, as these are really the only two types of
 data for which barplots are generally considered satisfactory. A key to
 barplots of course is that they are based at 0 for proper visual comparison.
 Thus, if you need to have the minima of the relevant axis at a value other
 than 0, this is another reason to not use them.

 Even then, many folks have moved away from barplots to use point or dot
 plots and similar formats, especially where you also need to include some
 type of confidence interval for each measure.
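As a sketch of the dot-plot alternative Marc mentions, base R's dotchart() accepts a groups factor, and the groups may differ in size. The values, setting labels and algorithm names are hypothetical.

```r
# Hypothetical performance of algorithm A (2 settings) and B (5 settings)
perf  <- c(0.81, 0.77, 0.62, 0.70, 0.74, 0.69, 0.66)
param <- c("p1", "p2", "q1", "q2", "q3", "q4", "q5")
algo  <- factor(c("A", "A", "B", "B", "B", "B", "B"))

# One labelled row per setting, grouped by algorithm; unlike a barplot,
# the x axis does not need to start at 0
dotchart(perf, labels = param, groups = algo, xlab = "Performance")
```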

 HTH,

 Marc Schwartz



Re: [R] Data Frame housekeeping

2011-05-25 Thread David Winsemius


On May 25, 2011, at 1:16 PM, Scott Hatcher wrote:


Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply, and for
providing something I hoped would come from joining this mailing list:
a means of discovering incredibly useful packages such as reshape2,
which you have introduced me to.


I have a follow up question to your solution (which should produce  
exactly what I need):


When I run the cast function to reassemble the data frame I get:


I used `dcast`.



Error in names(data) <- array_names(res$labels[[2]]) :
 'names' attribute [7] must be the same length as the vector [1]


And I obviously didn't get that error, so there might be a difference  
in either the code (which you did not show), or the data (which you  
did not offer in a reproducible form).




This signaled to me that the function was returning 7 values where
it expected only 1. To test this I applied the summary function mean
to the cast, and it ran (however, it only produced NAs
because my values were factors). What I don't understand is
where these multiple values are coming from; there should be only a
single value corresponding to the 4 id.vars given in the cast
function (STN_ID, YEAR, MM, variable).


If you want further effort, you should address the inadequacies of your
question. It is very possible that you will need to acquaint yourself
with the use of either `dump` or `dput`.


--
David.


Thanks again for your help,

Scott Hatcher

On 24/05/2011 5:16 PM, David Winsemius wrote:


On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:


Hello,

I have a large data frame that is organized by date in a peculiar way. I
am seeking advice on how to transform the data into a format that is of
more use to me.

The data is organized as follows:

   STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
2 2402594 1997 10    1       -3       -5       -1 -00039 -00031 -00036 -00033
3 2402594 1997 11    1       25       65       70     69 000115     72     93


Where MM is the month of the year, and ELEM identifies the variable that
the values in the X* columns describe (in the actual data there are 31 X
columns, one for each day of the month). The values in bold (marked with
asterisks) are the values that are transferred into the small chart below
(which is the result I hope to get). This is to give a sense of how the
data is picked out of the original data frame.


assuming this dataframe is named 'tst':

require(reshape2)
mtst <- melt(tst[, 1:7], id.vars=1:4)  # only select id vars and X1:X3
str(mtst)
#--
'data.frame':54 obs. of  6 variables:
 $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
 $ YEAR    : num  1997 1997 1997 1997 1998 ...
 $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
 $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
 $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : chr  "-00233" "-3" "25" "000160" ...

dcast(mtst, STN_ID +YEAR+ MM  + variable ~ ELEM)
#-
   STN_ID YEAR MM variable  1  2
1  2402594 1997  9   X1 -00233 -00339
2  2402594 1997  9   X2 -00204 -00339
3  2402594 1997  9   X3 -00119 -00343
4  2402594 1997 10   X1 -3 -00207
5  2402594 1997 10   X2 -5 -00289
6  2402594 1997 10   X3 -1 -00278
7  2402594 1997 11   X1 25 -00242
snipped output



I would like to organize the data so it looks like this:

   STN_ID YEAR MM DAY  ELEM1  ELEM2
1 2402594 1997  9  X1 -00233 -00339
2 2402594 1997  9  X2 -00204     77
3 2402594 1997  9  X3 -00119     30


Where is that second column coming from? I don't see it in the data
example.


Such that I create a new column named DAY that is made up of the
numbers following X in the original data.frame columns. Also, the ELEM
values are converted to columns named by the ELEM code (in this case 1
and 2).

I have tried to split apart the columns, transform them, and bind them
back together, but my ability to do so just isn't there yet. I am still
fairly new to R, and would really appreciate some help in working
towards organizing this data frame.

Thanks in advance,
Scott Hatcher
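For readers without reshape2, the melt-then-cast idea can be sketched with base R's reshape(). The data are a small hypothetical stand-in for tst: two ELEM codes, three day columns, and values kept as character. The first call stacks the X columns into a DAY column, here holding the numbers 1 to 3 rather than "X1" to "X3". The second call spreads ELEM into one column per code, named value.1 and value.2.

```r
# Hypothetical stand-in for 'tst': one station, months 9 and 10, two ELEM
# codes and only three day columns; values kept as character strings
tst <- data.frame(
  STN_ID = 2402594, YEAR = 1997,
  MM   = c(9, 9, 10, 10),
  ELEM = c(1, 2, 1, 2),
  X1 = c("-00233", "-00339", "-3", "-00207"),
  X2 = c("-00204", "-00339", "-5", "-00289"),
  X3 = c("-00119", "-00343", "-1", "-00278"),
  stringsAsFactors = FALSE
)

# Step 1 (like melt): stack X1..X3 into one 'value' column, with the day
# number (1, 2, 3) recorded in a new DAY column
long <- reshape(tst, direction = "long",
                varying = c("X1", "X2", "X3"), v.names = "value",
                timevar = "DAY", times = 1:3,
                idvar = c("STN_ID", "YEAR", "MM", "ELEM"))

# Step 2 (like dcast): one column per ELEM code, named value.1, value.2
wide <- reshape(long, direction = "wide",
                idvar = c("STN_ID", "YEAR", "MM", "DAY"),
                timevar = "ELEM")

# Sort by date and tidy the row names
wide <- wide[order(wide$YEAR, wide$MM, wide$DAY), ]
rownames(wide) <- NULL
wide
```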



David Winsemius, MD
West Hartford, CT



David Winsemius, MD
West Hartford, CT



Re: [R] Fwd: transpose ?

2011-05-25 Thread Dennis Murphy
Hi:

Does this work?

dd <- read.table(textConnection("
C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C
G  G  T   T   A   A   A   A   T   A   T   T   C   C   G   G
C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C
"), stringsAsFactors = FALSE)

# Convert the data frame to a character matrix
# To do this, you need to make sure that the variables in
# your data frame are character rather than factor
dm <- as.matrix(dd)
dm  # elements should be quoted if character

# Create an empty list of nrow(dm) components
mm <- vector('list', nrow(dm))

# Create a two-row matrix from each row of dm
for(i in seq_len(nrow(dm))) mm[[i]] <- matrix(dm[i, ], nrow = 2)
 mm
[[1]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"
[2,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"

[[2]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "G"  "T"  "A"  "A"  "T"  "T"  "C"  "G"
[2,] "G"  "T"  "A"  "A"  "A"  "T"  "C"  "G"

[[3]]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"
[2,] "C"  "C"  "T"  "G"  "A"  "C"  "G"  "C"

HTH,
Dennis
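The same reshaping can be sketched without the explicit loop by using lapply(). This version reads the data with read.table(text = ...) and colClasses = "character". The colClasses setting is an adjustment added here, not part of Dennis's code: without it, a column consisting only of T or F values would be converted to logical.

```r
# colClasses = "character" keeps every base as a string; otherwise a
# column holding only T/F values would be read as logical
dd <- read.table(text = "
C C C C T T G G A A C C G G C C
G G T T A A A A T A T T C C G G
C C C C T T G G A A C C G G C C
", colClasses = "character")
dm <- as.matrix(dd)

# For each row, fill a 2-row matrix column by column, so consecutive
# pairs of columns (V1/V2, V3/V4, ...) end up stacked on top of each other
mm <- lapply(seq_len(nrow(dm)), function(i) matrix(dm[i, ], nrow = 2))
mm[[2]]
```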

On Wed, May 25, 2011 at 7:19 AM, Mohamed Lajnef
mohamed.laj...@inserm.fr wrote:
 Dear All,
 Sorry for the previous mail; suppose this data.frame
 D


 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22
 C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C
 G  G  T   T   A   A   A   A   T   A   T   T   C   C   G   G
 C  C  C   C   T   T   G   G   A   A   C   C   G   G   C   C





 I would like to transform D as follows (just for the first line):

 C C T G A C G C
 C C T G A C G C
 (V8 under V7) (V10 under V9) ...

 Any help would be appreciated

 Regards
 M



Re: [R] What does smaller than comparison do on strings?

2011-05-25 Thread Dennis Murphy
Hi:

Here are two alternatives that do work as you expect; sprintf() is your friend:

 sprintf("%2d", 1:12)
 [1] " 1" " 2" " 3" " 4" " 5" " 6" " 7" " 8" " 9" "10" "11" "12"
 sprintf("%02d", 1:12)
 [1] "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12"
 sprintf("%2d", 1:12) < "10"
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE
 sprintf("%02d", 1:12) < "10"
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE

A leading space or leading 0 on the digits 1-9 'fixes' the problem for
the reason Duncan mentioned.

HTH,
Dennis
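When one side of < is character, R converts the number to a string and compares strings in the collation order of the current locale (see ?Comparison). So "2" sorts after "10" because its first character is larger. A minimal sketch of two ways around this: compare numbers as numbers, or zero-pad so that string order matches numeric order.

```r
a <- as.character(1:12)

# Convert back to numbers so that < is a numeric comparison
as.numeric(a) < 10

# Or zero-pad to a fixed width: "01" ... "09" then sort before "10"
padded <- sprintf("%02d", as.integer(a))
padded < "10"
```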

On Wed, May 25, 2011 at 3:06 AM, Niklaus Kuehnis
kuehnik_0...@gmx-topmail.de wrote:
 What's the logic behind the following, and where can I find any
 documentation about it? In particular, why are 2:9 - as characters - not
 regarded as being smaller than 10?

 # R-Code:
 a <- as.character(1:12)

 a < 10
 #  [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

 Thanks in advance!

 Niklaus



Re: [R] grep pattern

2011-05-25 Thread jim holtman
Try this using strsplit():

 x <- round(runif(10)*10, digits=0)
 y <- as.Date(x, origin="1970-01-01")
 str(y)
Class 'Date'  num [1:10] 26551 37212 57285 90821 20168 ...
 y1 <- as.character(y)
 str(y1)
 chr [1:10] "2042-09-11" "2071-11-19" "2126-11-04" "2218-08-30" "2025-03-21" "2215-12-22" ...
 x <- strsplit(y1, '-')
 x[1:3]
[[1]]
[1] "2042" "09"   "11"

[[2]]
[1] "2071" "11"   "19"

[[3]]
[1] "2126" "11"   "04"

 x.1 <- sapply(x, '[', 3)
 str(x.1)
 chr [1:10] "11" "19" "04" "30" "21" "22" "24" "03" "31" "02"



On Tue, May 24, 2011 at 10:19 AM, Kang Min ngokang...@gmail.com wrote:
 I have another question -

 I'd like to extract the day from a vector of yyyy-mm-dd dates, so I just want
 the dd.

 x <- round(runif(10)*10, digits=0)
 y <- as.Date(x, origin="1970-01-01")

 I tried this based on the code that Jim provided, but it just printed
 the whole date. I think I just need to tweak it a little, but haven't
 been able to figure it out.

 y[grep("[[:digit:]]{2}$", y)]

 Thanks.
 Kang Min
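Because y is already a Date, the day can also be sketched without any regular expression by formatting it with the %d conversion (see ?strptime). The three dates below are the first three from Jim's output.

```r
y <- as.Date(c("2042-09-11", "2071-11-19", "2126-11-04"))

# "%d" gives the day of the month as a two-digit string
dd <- format(y, "%d")
dd

# As numbers, if needed
as.integer(dd)

# A regex route on the character form: drop everything up to the last "-"
sub("^.*-", "", as.character(y))
```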

 On May 23, 7:22 am, jim holtman jholt...@gmail.com wrote:
 If you want to only match names of length 6, you will have to use this
 pattern:

  x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ", "ZAAZ", "ZAZ",
 +     "ZZAZ", "ZRITEZ")
  # match exactly values of length 6
  len6 <- "^Z[[:alpha:]]{4}Z$"
  grep(len6, x)
 [1] 2 5 9

 On Sun, May 22, 2011 at 5:10 PM, Kang Min ngokang...@gmail.com wrote:
  Thanks!

  On May 21, 7:09 am, David Winsemius dwinsem...@comcast.net wrote:
  On May 20, 2011, at 11:57 AM, Kang Min wrote:

   Hi all,

   I'm trying to subset a pattern in a vector. Each element has 6
   letters, and I need those that start with Z and end with Z.

   e.g.
   x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")

   I've looked up other discussions but still can't seem to find the
   answer.

  You may need to study the regex page a bit longer

  the ^ is the beginning of a string
  .+ will match an arbitrarily long string of anything (at least one character)
  and $ indicates the end of a string

    x <- c("ZFHSJK", "ZFHJKZ","ZIOPWE","ZLKJSD","ZKFLPZ")
   grep("^Z.+Z$", x)
  [1] 2 5
   grep("^Z.+Z$", x, value=TRUE)
  [1] "ZFHJKZ" "ZKFLPZ"

   Thanks.
   Kangmin


  David Winsemius, MD
  West Hartford, CT


 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?





-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?



Re: [R] matrix Manipulation...

2011-05-25 Thread Jim Silverton
Hello everyone,


I have a 2 x 5 matrix, say:

0.2   0.3   1   -1   3
0.2   0.4   5   0.5  -1

I want to replace all the values greater than or equal to 1 with 1 and those
less than or equal to 0 with 0. So I should end up with a matrix looking
like:

0.2   0.3   1   0   1
0.2   0.4   1   0.5  0

Any ideas how to do this?

-- 
Thanks,
Jim.



Re: [R] matrix Manipulation...

2011-05-25 Thread Sarah Goslee
It's very easy to do in two steps:
 testmat <- matrix(c(.2, .3, 1, -1, 3, .2, .4, 5, .5, -1), byrow=TRUE, nrow=2)
 testmat
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.2  0.3    1 -1.0    3
[2,]  0.2  0.4    5  0.5   -1
 testmat[testmat >= 1] <- 1
 testmat[testmat < 0] <- 0
 testmat
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.2  0.3    1  0.0    1
[2,]  0.2  0.4    1  0.5    0

This is pretty basic. You might want to read one of the many excellent
intro to R guides, especially the subsetting section.
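The same clamping can also be sketched in one step with pmin() and pmax(). Both copy attributes such as dim from their first argument, so the result is still a 2 x 5 matrix.

```r
testmat <- matrix(c(.2, .3, 1, -1, 3, .2, .4, 5, .5, -1), byrow=TRUE, nrow=2)

# Cap values at 1, then raise anything below 0 to 0
clamped <- pmax(pmin(testmat, 1), 0)
clamped
```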

Sarah

On Wed, May 25, 2011 at 2:51 PM, Jim Silverton jim.silver...@gmail.com wrote:
 Hello everyone,


 I have a 2 x 5 matrix, say:

 0.2   0.3   1   -1   3
 0.2   0.4   5   0.5  -1

 I want to replace all the values greater than or equal to 1 with 1 and those
 less than or equal to 0 with 0. So I should end up with a matrix looking
 like:

 0.2   0.3   1   0   1
 0.2   0.4   1   0.5  0

 Any ideas how to do this?

 --
 Thanks,
 Jim.


-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] Importing fixed-width data

2011-05-25 Thread Dennis Murphy
I get a data frame on my end:

lines <- "2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
2011-05-13 00:00:05 EONBHS229 mia13001621NON"

df = read.fwf(textConnection(lines), widths=c(19,-4,7,3,8,2,1,3,1),
col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))
 df
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
1 2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
2 2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
3 2011-05-13 00:00:05 BHS229   mia 13001621      NO    N   <NA>   <NA>
 str(df)
'data.frame':   3 obs. of  8 variables:
 $ DateTime: POSIXct, format: "2011-05-13 00:00:00" "2011-05-13 00:00:01" ...
 $ Flight  : Factor w/ 3 levels "AAL223 ","AAL330 ",..: 2 1 3
 $ Dest    : Factor w/ 3 levels "dfa","laa","mia": 1 2 3
 $ ArrTime : Factor w/ 3 levels "13001621","13002516",..: 2 3 1
 $ MsgType : chr  "PS" "AS" "NO"
 $ Conf    : Factor w/ 3 levels ".","C","N": 2 1 3
 $ Runway  : Factor w/ 1 level "NON": 1 1 NA
 $ Source  : Factor w/ 2 levels "A","M": 1 2 NA

 sessionInfo()
R version 2.13.0 Patched (2011-04-19 r55523)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  grid  methods
[8] base

other attached packages:
 [1] gplots_2.8.0caTools_1.12bitops_1.0-4.1  gdata_2.8.2
 [5] gtools_2.6.2sos_1.3-0   brew_1.0-6  lattice_0.19-26
 [9] ggplot2_0.8.9   proto_0.3-9.2   reshape_0.8.4   plyr_1.5.2

loaded via a namespace (and not attached):
[1] tools_2.13.0

Dennis

On Wed, May 25, 2011 at 8:42 AM, James Rome jamesr...@gmail.com wrote:
 I have a data set where the lines look like:
 2011-05-13 00:00:00 EONAAL330 dfa13002516PSCNONA
 2011-05-13 00:00:01 EONAAL223 laa13044510AS.NONM
 Some lines are missing the field before and after the NON:
 2011-05-13 00:00:05 EONBHS229 mia13001621NON

 I read them into R using
    df = read.fwf(file, widths=c(19,-4,7,3,8,2,1,3,1),
      col.names=c("DateTime","Flight","Dest","ArrTime","MsgType","Conf","Runway","Source"),
      colClasses=c("POSIXct",NA,"factor","factor","character","factor","factor","factor"))

 The documentation for read.fwf says that the data are read into a
 data frame. Yet I get a list, and the conversions I specified do not
 seem to have been obeyed:
 df[1:20,]
             DateTime  Flight Dest  ArrTime MsgType Conf Runway Source
 1  2011-05-13 00:00:00 AAL330   dfa 13002516      PS    C    NON      A
 2  2011-05-13 00:00:01 AAL223   laa 13044510      AS    .    NON      M
 . . .
 sapply(df, mode)
    DateTime      Flight        Dest     ArrTime     MsgType        Conf
   "numeric"   "numeric"   "numeric"   "numeric" "character"   "numeric"
      Runway      Source
   "numeric"   "numeric"
 dfn = df[!is.na(df$Source),]
 mode(df)
 [1] "list"

 What am I doing wrong?

 Thanks,
 Jim Rome
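The checks in the question cannot show whether read.fwf() honoured colClasses. mode() of any data frame is "list", because a data frame is stored as a list of columns. The mode of a factor is "numeric", because factors are stored as integer codes. A small sketch with a hypothetical two-column data frame shows that class() and is.data.frame() are the more informative checks.

```r
# Hypothetical two-column data frame: one factor, one character column
df <- data.frame(Dest = factor(c("dfa", "laa")), MsgType = c("PS", "AS"),
                 stringsAsFactors = FALSE)

mode(df)             # "list": every data frame is stored as a list of columns
class(df)            # "data.frame"
is.data.frame(df)    # TRUE
sapply(df, mode)     # Dest is "numeric" (factor codes), MsgType "character"
sapply(df, class)    # Dest is "factor", MsgType "character"
```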



[R] Job opening at Harvard Business School

2011-05-25 Thread Ao, Xiang
Dear Colleagues,

I'd like to draw your attention to the following job opening at Harvard
Business School. We are looking for a candidate with a strong
statistical/econometric background and strong programming skills in R or
Stata. Please apply through the following link. If you know someone with these
qualifications, please forward this email. Thanks!


Job Number: 23803BR

Statistician/Analyst
Harvard Business School
Boston, Massachusetts

Application:

http://jobs.brassring.com/1033/asp/tg/cim_jobdetail.asp?partnerID=25240&siteID=5341&AReq=23803br


Duties & Responsibilities
Reporting to the Director of Research Computing Services, works directly with
faculty and other RCS staff in support of research-related projects. Provides
advanced statistical consultation for faculty researchers and doctoral
students. Maintains expertise in new research methodologies and techniques.
Provides design and statistical consultation for researchers, as well as
primary support and training for two or more statistical software packages
(e.g., R, STATA, MATLAB, SAS, Mathematica). Employs methodological approaches
such as multinomial logit and similar models, time-series analysis, random
effects models, survival analysis, text analysis, and other appropriate
tools. Manages and manipulates data using tools such as Python and MySQL.
Produces results as reports, presentations, graphics, and web sites.
Explores and tests statistical software. Develops statistical and technical
documents for the RCS web site.

Basic Qualifications
Advanced degree in a quantitative field required. 3+ years of
statistical/programming experience in a research-based setting; mathematics
background; broad training and good habits in data management and analysis; 
expertise with multiple statistical software packages, including R, Stata, 
MatLab, SAS, or Mathematica; problem solving skills, organizational ability, 
communication skills, initiative. Strong customer service orientation. Ability 
to work independently and on a team. Demonstrated ability and desire to develop 
and maintain expertise in emerging research methods and technologies.

Additional Qualifications
Ph.D. preferred.  Desired abilities include experience in Linux-based parallel 
processing computing environments; business-related research experience; 
experience with large data sets; familiarity with computer programming 
languages, such as Python or C++.

Chase H. Harrison
Director, Research Computing Services
Principal Survey Methodologist

Harvard Business School
Baker Library | Bloomberg Center B-93
Soldiers Field Rd.
Boston, MA 02163

617.495.6100 (Main)
617.496.6252 (Direct)
617.495.5287 (FAX)
charri...@hbs.edu



Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset

2011-05-25 Thread Michael Dewey

At 00:41 25/05/2011, Greg Snow wrote:
The only statistical method that I know of that can be applied to 
any dataset without further definition of the nature of the data or 
the question being asked is 
SnowsCorrectlySizedButOtherwiseUselessTestOfAnything which is found 
in the TeachingDemos package for R.


Greg, have you overlooked the intra-ocular trauma test?


However this test is not common (for a couple of very good reasons).

If you want a more useful method, you first need to decide what question
you want answered and provide more detail about the dataset.


-Original Message-
From: r-help-boun...@r-project.org 
[mailto:r-help-boun...@r-project.org] On Behalf Of Ramnath R

Sent: Monday, May 23, 2011 12:12 PM
To: r-help@r-project.org
Subject: [R] What are the common Standard Statistical methods used 
for the analysis of a dataset


Hi,

Does anybody know what the common standard statistical methods used for the
analysis of a dataset are, and which of these methods give similar results?

Ram




Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html



[R] Accessing elements of a list

2011-05-25 Thread Seth W Bigelow
I have a list that is made of lists of varying length. I wish to create a 
new vector that contains the last element of each list. So far I have used 
sapply to determine the length of each list, but I'm stymied at the part 
where I index the list to make a new vector containing only the last item 
of each list.

mylist = list(c(1,2,3), c("cat","dog"), c("x","y","z","zz"))  # Create list

last <- sapply(mylist, length)  # Make vector with list lengths

last_only <- mylist[[1:length(mylist)]][last]  # Crash and burn trying to make new vector with last items!

How do I do this last step?


Dr. Seth  W. Bigelow
Biologist, USDA-FS Pacific Southwest Research Station
1731 Research Park Drive, Davis California
sbige...@fs.fed.us /  ph. 530 759 1718



Re: [R] Accessing elements of a list

2011-05-25 Thread David Winsemius


On May 25, 2011, at 3:25 PM, Seth W Bigelow wrote:

I have a list that is made of lists of varying length. I wish to create a
new vector that contains the last element of each list. So far I have used
sapply to determine the length of each list, but I'm stymied at the part
where I index the list to make a new vector containing only the last item
of each list.

mylist = list(c(1,2,3), c("cat","dog"), c("x","y","z","zz"))  # Create list

last <- sapply(mylist, length)  # Make vector with list lengths

last_only <- mylist[[1:length(mylist)]][last]  # Crash and burn trying to make new vector with last items!

How do I do this last step?


> lapply(mylist, tail, 1)
[[1]]
[1] 3

[[2]]
[1] "dog"

[[3]]
[1] "zz"

> unlist(lapply(mylist, tail, 1))
[1] "3"   "dog" "zz"






David Winsemius, MD
West Hartford, CT



Re: [R] Accessing elements of a list

2011-05-25 Thread Marc Schwartz
On May 25, 2011, at 2:25 PM, Seth W Bigelow wrote:

 I have a list that is made of lists of varying length. I wish to create a
 new vector that contains the last element of each list. So far I have used
 sapply to determine the length of each list, but I'm stymied at the part
 where I index the list to make a new vector containing only the last item
 of each list.
 
 mylist = list(c(1,2,3), c("cat","dog"), c("x","y","z","zz"))  # Create list
 
 last <- sapply(mylist, length)  # Make vector with list lengths
 
 last_only <- mylist[[1:length(mylist)]][last]  # Crash and burn trying to make new vector with last items!
 
 How do I do this last step?


See ?tail

> lapply(mylist, tail, 1)
[[1]]
[1] 3

[[2]]
[1] "dog"

[[3]]
[1] "zz"


You can't actually create a vector that preserves the types, since your list
contains both numeric and character (alpha) data and an atomic vector can only
hold a single data type. The 3 would be coerced to "3" (a character "3", not
the number 3).

If your actual data contains the same type in each element, replace lapply()
above with sapply() and that will return a vector.
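To make the point about types concrete, here is a small base-R sketch comparing sapply(), which silently coerces the mixed results to character, with vapply(), where the expected result type is stated up front. The second list, numlist, is a hypothetical all-numeric example, not data from the original post.

```r
mylist <- list(c(1, 2, 3), c("cat", "dog"), c("x", "y", "z", "zz"))

# sapply() simplifies to an atomic vector, so the numeric 3 becomes "3"
last_sapply <- sapply(mylist, function(v) v[length(v)])

# vapply() requires a template for each result; as.character() makes the
# coercion explicit instead of leaving it to simplification
last_vapply <- vapply(mylist, function(v) as.character(v[length(v)]), character(1))

# With a (hypothetical) all-numeric list, the numeric type is kept
numlist <- list(c(1, 2, 3), c(10, 20), 7)
last_num <- vapply(numlist, function(v) v[length(v)], numeric(1))

print(last_sapply)  # values "3", "dog", "zz"
print(last_num)     # values 3, 20, 7
```

vapply() stops with an error if any element returns something that is not a single value compatible with the template, which is usually what you want when the list comes from real data.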

HTH,

Marc Schwartz



Re: [R] Accessing elements of a list

2011-05-25 Thread David Winsemius


On May 25, 2011, at 3:25 PM, Seth W Bigelow wrote:

I have a list that is made of lists of varying length. I wish to create a
new vector that contains the last element of each list. So far I have used
sapply to determine the length of each list, but I'm stymied at the part
where I index the list to make a new vector containing only the last item
of each list.

mylist = list(c(1,2,3), c("cat","dog"), c("x","y","z","zz"))  # Create list

last <- sapply(mylist, length)  # Make vector with list lengths

last_only <- mylist[[1:length(mylist)]][last]  # Crash and burn trying to make new vector with last items!


If you wanted to apply the successive values of `last` using `[` to
successive values of `mylist`, there is a list-ish method via mapply:


> mapply("[", mylist, last)
[1] "3"   "dog" "zz"

`mapply` is also the function underlying `Vectorize`.
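The mapply() call can be compared with Map(), which is mapply() with SIMPLIFY = FALSE and so keeps each element's type, and with a Vectorize()d helper, which dispatches to mapply() internally. elem_at is a hypothetical helper name used only for this sketch.

```r
mylist <- list(c(1, 2, 3), c("cat", "dog"), c("x", "y", "z", "zz"))
last <- sapply(mylist, length)  # 3 2 4

# mapply() calls `[`(mylist[[i]], last[i]) for each i and simplifies the results
simplified <- mapply("[", mylist, last)

# Map() returns a list, so the numeric 3 stays numeric
kept_types <- Map("[", mylist, last)

# Vectorize() wraps a scalar function in a call to mapply()
elem_at <- Vectorize(function(v, i) v[i])  # hypothetical helper
via_vectorize <- elem_at(mylist, last)

str(kept_types)
```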
--

David Winsemius, MD
West Hartford, CT



Re: [R] Subtracting rows by id

2011-05-25 Thread Dennis Murphy
Hi:

Interesting problem. Here's one approach:

library(plyr)
# Read in your datasets as data frames rather than matrices
dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                       value1 = sample(seq_len(100), 30, replace = TRUE))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# The idea is to use the rows of dataset2 as parameters for
# subsetting and removing the first n_i rows. The tail() function
# serves the purpose:
foo <- function(id2, subtract.value)
    tail(subset(dataset1, id1 == id2), -subtract.value)

# Use the mdply function in the plyr package:
> mdply(dataset2, foo)[, -(1:2)]
   id1 value1
1    1      2
2    1     55
3    1     18
4    1      4
5    1      3
6    1     76
7    1     74
8    1     21
9    1     97
10   2     19
11   2     49
12   2     20
13   2     73
14   2     79
15   2     95
16   2     52
17   3     60
18   3     58
19   3     68
20   3     59
21   3     13
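For comparison, a base-R sketch of the same idea without plyr: ave() numbers the rows within each id, match() looks up how many rows to drop for that id, and a logical index keeps the rest. set.seed() is added only so the random example is reproducible; these are not real data.

```r
set.seed(1)  # only so the random example is reproducible
dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                       value1 = sample(seq_len(100), 30, replace = TRUE))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# Position of each row within its id1 group: 1, 2, ..., 10 for each id
pos <- ave(seq_len(nrow(dataset1)), dataset1$id1, FUN = seq_along)

# Number of leading rows to drop for each row's id
n_drop <- dataset2$subtract.value[match(dataset1$id1, dataset2$id2)]

# Keep rows that come after the first n_drop rows of their group
result <- dataset1[pos > n_drop, ]
table(result$id1)
```

Like the tail() version, this assumes the rows within each id are already in the order in which the first ones should be dropped.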


HTH,
Dennis

On Wed, May 25, 2011 at 9:55 AM, Sara Maxwell smaxw...@ucsc.edu wrote:
 Dear R users,

 I have two datasets:

 id1 <- c(rep(1,10), rep(2,10), rep(3,10))
 value1 <- sample(1:100, 30, replace=TRUE)
 dataset1 <- cbind(id1,value1)

 id2 <- c(1,2,3)
 subtract.value <- c(1,3,5)
 dataset2 <- cbind(id2, subtract.value)

 I want to subtract the number of rows in the subtract.value that
 corresponds to the id value in dataset1.  So for the 1 in id1, I want
 to remove the first row, for 2 in id1 I want to remove the first 3
 rows, for 3 in id1 I want to remove the first 5 rows, finally creating
 a new dataframe with the remaining values.

 I am having trouble structuring a loop that can do this by the unique
 ids in the first dataset while matching the ids in the datasets.

 Any thoughts would be greatly appreciated.

 Thank you,
 Sara




Re: [R] What are the common Standard Statistical methods used fo

2011-05-25 Thread Ted Harding
[See in-line below]

On 25-May-11 19:14:11, Michael Dewey wrote:
 At 00:41 25/05/2011, Greg Snow wrote:
The only statistical method that I know of that can be
applied to any dataset without further definition of the
nature of the data or the question being asked is
SnowsCorrectlySizedButOtherwiseUselessTestOfAnything
which is found in the TeachingDemos package for R.
 
 Greg, have you overlooked the intra-ocular trauma test?

No, Greg has not overlooked it. He invented it. However, he
never published it, preferring to communicate it by causing
others to feel its impact whenever he writes anything.

Ted.

However this test is not common (for a couple of very good reasons).

If you want a more useful method you first need to decide on what 
your question is that you want answered and have some more detail 
about the dataset.



E-Mail: (Ted Harding) ted.hard...@wlandres.net
Fax-to-email: +44 (0)870 094 0861
Date: 25-May-11   Time: 21:15:36
-- XFMail --



Re: [R] Processing large datasets

2011-05-25 Thread Mike Marchywka

 Date: Wed, 25 May 2011 12:32:37 -0400
 Subject: Re: [R] Processing large datasets
 From: mailinglist.honey...@gmail.com
 To: marchy...@hotmail.com
 CC: ro...@bestroman.com; r-help@r-project.org

 Hi,

 On Wed, May 25, 2011 at 11:00 AM, Mike Marchywka  wrote:
 [snip]
   If your datasets are *really* huge, check out some packages listed
   under the Large memory and out-of-memory data section of the
   HighPerformanceComputing task view at CRAN:
 
   http://cran.r-project.org/web/views/HighPerformanceComputing.html
 
  Does this have any specific limitations? It sounds offhand like it
  does paging and all the needed buffering for arbitrary-size
  data. Does it work with everything?

 I'm not sure what limitations ... I know the bigmemory (and ff)
 packages try hard to make using out-of-memory datasets as
 transparent as possible.

 That having been said, I guess you will have to port more advanced
 methods to use such packages, hence the existence of the biglm,
 biganalytics, and bigtabulate packages.

  I seem to recall bigmemory came up
  before in this context and there was some problem.

 Well -- I don't often see emails on this list complaining about their
 functionality. That doesn't mean they're flawless (I also don't
 scrutinize the list traffic too closely). It could be that not too
 many people use them, or that people give up before they come knocking
 when there is a problem.

 Has something specifically failed for you in the past, or?

No, I haven't tried. I may have it confused with something else.
But this question does come up a bit, usually along the lines of
"I tried to read a huge file into a data frame and wanted to pass
it to something with predictable memory access patterns, and it
ran out of memory. What can I do?" I guess I also stopped reading
anything after "using a DB", as this is generally not a replacement
for a data structure. I'll take a look when I have a big dataset that
I can't condense easily.
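A minimal base-R sketch of the chunked pattern that kind of question usually calls for, assuming a plain CSV file with a header row and a numeric column to summarise. chunked_column_sum is a hypothetical helper, not part of bigmemory, ff, or any other package mentioned above; it only avoids holding the whole file in memory at once.

```r
# chunked_column_sum() is a hypothetical helper: it sums one numeric column of a
# CSV file while holding at most chunk_rows data rows in memory at a time.
chunked_column_sum <- function(path, column, chunk_rows = 10000L) {
  # Read the header (and a single data row) only to learn the column names
  col_names <- names(read.csv(path, nrows = 1))

  con <- file(path, open = "r")
  on.exit(close(con))
  invisible(readLines(con, n = 1))  # skip the header line

  total <- 0
  repeat {
    # On an open connection read.csv() continues from the current position;
    # it signals an error once no lines are left, which ends the loop
    chunk <- tryCatch(
      read.csv(con, header = FALSE, col.names = col_names, nrows = chunk_rows),
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    total <- total + sum(chunk[[column]])
    if (nrow(chunk) < chunk_rows) break  # last, partial chunk
  }
  total
}

# Usage on a temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:25000, x = rnorm(25000)), tmp, row.names = FALSE)
chunked_column_sum(tmp, "x", chunk_rows = 5000L)
```

The tryCatch() treats any read error as end of file, so a malformed line would silently stop the loop; a more careful version would inspect the error message.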

 -steve

 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact
  


Re: [R] Data Frame housekeeping

2011-05-25 Thread Scott Hatcher

Hello Dr. Winsemius,

First of all, thank you for your prompt and helpful reply, and for
providing something I hoped would come from joining this mailing
list: a means of discovering incredibly useful packages such as the
reshape2 one you have introduced me to.


I have a follow up question to your solution (which should produce 
exactly what I need):


When I run the cast function to reassemble the data frame I get:

Error in names(data) <- array_names(res$labels[[2]]) :
  'names' attribute [7] must be the same length as the vector [1]

This signaled to me that the function was returning 7 values where it
expected only 1. To test this I applied a summary function (mean) to the
cast, and it ran (however it only produced NAs because my values were
factors). What I don't understand is where these multiple values are
coming from; there should be only a single value corresponding to the 4
id.vars given in the cast function (STN_ID, YEAR, MM, variable).
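One possible cause (an assumption, not confirmed from Scott's data) is that more than one row falls into the same output cell, which forces cast() to aggregate. A quick way to look for such rows is duplicated() over every column that defines a cell, meaning both sides of the formula, so ELEM as well as the four id variables. The small melted data frame below is hypothetical, with one deliberately repeated record.

```r
# Hypothetical melted data: the last row repeats the first record's keys
mtst <- data.frame(
  STN_ID   = 2402594, YEAR = 1997, MM = 9,
  ELEM     = c(1, 1, 2, 1),
  variable = c("X1", "X2", "X1", "X1"),
  value    = c("-00233", "-00204", "-00339", "-00233"),
  stringsAsFactors = FALSE
)

# Every column that identifies one output cell of
# dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM)
key_cols <- c("STN_ID", "YEAR", "MM", "variable", "ELEM")

# Flag all rows whose key combination occurs more than once
dup <- duplicated(mtst[key_cols]) | duplicated(mtst[key_cols], fromLast = TRUE)
mtst[dup, ]
```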


Thanks again for your help,

Scott Hatcher

On 24/05/2011 5:16 PM, David Winsemius wrote:


On May 24, 2011, at 3:03 PM, Scott Hatcher wrote:


Hello,

I have a large data frame that is organized by date in a peculiar way. I
am seeking advice on how to transform the data into a format that is of
more use to me.

The data is organized as follows:

   STN_ID YEAR MM ELEM       X1       X2       X3     X4     X5     X6     X7
1 2402594 1997  9    1 *-00233* *-00204* *-00119* -00190 -00251 -00243 -00249
2 2402594 1997 10    1       -3       -5       -1 -00039 -00031 -00036 -00033
3 2402594 1997 11    1       25       65       70     69 000115     72     93


Where MM is the month of the year, and ELEM is the variable to which
the values in the X* columns describe (in the actual data there are 31 X
columns, one for each day of the month). The values in bold are the
values that are transferred into the small chart below (which is the
result I hope to get). This is to give a sense of how the data is picked
out of the original data frame.


assuming this dataframe is named 'tst':

require(reshape2)
mtst <- melt(tst[, 1:7], id.vars = 1:4)  # Only select id vars and X1:X3
> str(mtst)
#--
'data.frame':   54 obs. of  6 variables:
 $ STN_ID  : num  2402594 2402594 2402594 2402594 2402594 ...
 $ YEAR    : num  1997 1997 1997 1997 1998 ...
 $ MM      : num  9 10 11 12 1 2 3 4 5 9 ...
 $ ELEM    : num  1 1 1 1 1 1 1 1 1 2 ...
 $ variable: Factor w/ 3 levels "X1","X2","X3": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : chr  "-00233" "-3" "25" "000160" ...

> dcast(mtst, STN_ID + YEAR + MM + variable ~ ELEM)
#-
STN_ID YEAR MM variable  1  2
1  2402594 1997  9   X1 -00233 -00339
2  2402594 1997  9   X2 -00204 -00339
3  2402594 1997  9   X3 -00119 -00343
4  2402594 1997 10   X1 -3 -00207
5  2402594 1997 10   X2 -5 -00289
6  2402594 1997 10   X3 -1 -00278
7  2402594 1997 11   X1 25 -00242
... snipped output ...



I would like to organize the data so it looks like this:

   STN_ID YEAR MM DAY  ELEM1  ELEM2
1 2402594 1997  9  X1 -00233 -00339
2 2402594 1997  9  X2 -00204     77
3 2402594 1997  9  X3 -00119     30


Where is that second column coming from? I don't see it in the data
example.


In other words, I want to create a new column named DAY made up of the
numbers following X in the original data frame's column names. Also, the
ELEM values should become columns, labelled with the ELEM code (in this
case 1 and 2).

I have tried to split apart the columns, transform them, and bind them
back together, but my ability to do so just isn't there yet. I am still
fairly new to R, and would really appreciate some help in working
towards organizing this data frame.

Thanks in advance,
Scott Hatcher



David Winsemius, MD
West Hartford, CT
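For readers without reshape2, the same two-step reshaping can be sketched with base R's reshape() from the stats package: first wide-to-long over the day columns, then long-to-wide over ELEM. The four-row tst below is hypothetical toy data in the layout Scott describes (only X1 to X3), with values borrowed loosely from the examples in this thread.

```r
# Hypothetical toy data in the layout described above: one row per
# station/year/month/element and one column per day (only X1 to X3 here)
tst <- data.frame(
  STN_ID = 2402594, YEAR = 1997,
  MM   = c(9, 9, 10, 10),
  ELEM = c(1, 2, 1, 2),
  X1 = c("-00233", "-00339", "-3", "-00207"),
  X2 = c("-00204", "77",     "-5", "-00289"),
  X3 = c("-00119", "30",     "-1", "-00278"),
  stringsAsFactors = FALSE
)

# Step 1: wide-to-long over the day columns; DAY holds "X1", "X2", ...
day_cols <- grep("^X[0-9]+$", names(tst), value = TRUE)
long <- reshape(tst, direction = "long", varying = day_cols,
                v.names = "value", timevar = "DAY", times = day_cols)
long$id <- NULL  # drop the row id that reshape() adds in long format

# Step 2: long-to-wide over ELEM, one value column per element code
wide <- reshape(long, direction = "wide",
                idvar = c("STN_ID", "YEAR", "MM", "DAY"),
                timevar = "ELEM", v.names = "value")
names(wide) <- sub("^value\\.", "ELEM", names(wide))  # value.1 -> ELEM1
rownames(wide) <- NULL
wide
```

In the wide step, reshape() warns when more than one row maps to the same ELEM cell (it keeps the first), which is another way to notice duplicated records.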





[R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution

2011-05-25 Thread rudi
Hi,

can anyone help me figure out how to compute the percentile of an
individual observation with respect to a reference distribution?

What I mean is this: let's assume I have a vector consisting of 10 numbers
{3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the
number 4.9 corresponds to. I failed to find any reference to such a
function, although I would assume this must frequently be necessary.

Thanks in advance for your help.

/Rudi



Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution

2011-05-25 Thread David Winsemius


On May 25, 2011, at 3:42 PM, rudi wrote:


Hi,

can anyone help me figure out how to compute the percentile of an
individual observation with respect to a reference distribution?

What I mean is this: let's assume I have a vector consisting of 10 numbers
{3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the
number 4.9 corresponds to. I failed to find any reference to such a
function, although I would assume this must frequently be necessary.


?quantile

Talking about percentiles when you only have 10 numbers seems rather  
misleading, don't you think?
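quantile() answers the reverse question (which value sits at a given probability). A short sketch with rudi's ten numbers: type = 1, which ?quantile describes as the inverse of the empirical distribution function, returns the smallest observation whose empirical CDF reaches the requested probability.

```r
x <- c(3, 5, 8, 1, 9, 5, 4, 3, 5.5, 7)

# Probability -> value (the default, type 7, interpolates between order statistics)
quantile(x, probs = c(0.25, 0.5, 0.75))

# With type = 1, the 40% quantile is the smallest x with F(x) >= 0.4
q40 <- quantile(x, probs = 0.4, type = 1)
q40  # named "40%", value 4
```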


--
David Winsemius, MD
West Hartford, CT



Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution

2011-05-25 Thread Jorge Ivan Velez
Hi Rudi,

Take a look at ?ecdf

HTH,
Jorge


On Wed, May 25, 2011 at 3:42 PM, rudi  wrote:

 Hi,

 can anyone help me figure out how to compute the percentile of an
 individual observation with respect to a reference distribution?

 What I mean is this: let's assume I have a vector consisting of 10 numbers
 {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the
 number 4.9 corresponds to. I failed to find any reference to such a
 function, although I would assume this must frequently be necessary.

 Thanks in advance for your help.

 /Rudi



Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset

2011-05-25 Thread Greg Snow
How can anyone overlook the intra-ocular trauma test (sometimes also called the 
inter-ocular concussion test)?  But the i-o trauma test needs either a small 
data set or an appropriate graph of the data (or can you look at a dataset of a 
hundred columns and a million rows and do an intra-ocular trauma test?).  We 
were not told the size of the dataset or enough information to know what type 
of graph to make.

You do make a good point, though, that with minimal additional information the 
intra-ocular trauma test can be useful (well, if it is significant; there are 
many datasets that fail the intra-ocular trauma test but still yield 
interesting results after careful study).  And for any dataset that has a 
significant intra-ocular trauma test result, that should trump the results of 
SnowsCorrectlySizedButOtherwiseUselessTestOfAnything.

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111


 -Original Message-
 From: Michael Dewey [mailto:i...@aghmed.fsnet.co.uk]
 Sent: Wednesday, May 25, 2011 1:14 PM
 To: Greg Snow; Ramnath R; r-help@r-project.org
 Subject: Re: [R] What are the common Standard Statistical methods used
 for the analysis of a dataset
 
 At 00:41 25/05/2011, Greg Snow wrote:
 The only statistical method that I know of that can be applied to
 any dataset without further definition of the nature of the data or
 the question being asked is
 SnowsCorrectlySizedButOtherwiseUselessTestOfAnything which is found
 in the TeachingDemos package for R.
 
 Greg, have you overlooked the intra-ocular trauma test?
 
 However this test is not common (for a couple of very good reasons).
 
 If you want a more useful method you first need to decide on what
 your question is that you want answered and have some more detail
 about the dataset.
 
 
 Michael Dewey
 i...@aghmed.fsnet.co.uk
 http://www.aghmed.fsnet.co.uk/home.html



Re: [R] Fwd: Opening R in 64-bit version by default

2011-05-25 Thread John C Frain
I have no problems configuring .r files to start in Emacs or RStudio
and then using Emacs or RStudio to call the required version of R.

You might check, when you use "Open with" from Windows Explorer, that
the "Always open with this program" check box is ticked.

If you are using Windows 7 you can set and change default programs as follows:
1. Open Control Panel.
2. Click Programs.
3. Click Default Programs and follow the options to set the required defaults.

I have never used Vista and don't know if this works in Vista.

Best regards

John


On 25 May 2011 01:55, Michael Sumner mdsum...@gmail.com wrote:
 When you installed R there should be shortcuts on your desktop, or under "R"
 in the Start menu, unless you opted for the installation to not create
 those.

 Click (or double-click) the one that has a name like "R x64 2.13.0"; the
 "x64" indicates that the shortcut is for the 64-bit R. You won't have this if
 you opted not to install the 64-bit R components.

 Use that shortcut every time to start R, and when it's running right-click
 the taskbar item and click "Pin this program to taskbar" to make it super
 accessible.

 If you have R older than 2.12.0 then the 32-bit and 64-bit installers are
 separate, but you don't specify your version and you should use the latest
 in any case.

 If you have shortcuts for 32-bit R, or other versions then you'll need to
 clean up or organize them in whatever way works best for you.

 Cheers, Mike.

 On Wed, May 25, 2011 at 5:23 AM, Duncan Murdoch 
 murdoch.dun...@gmail.com wrote:

 On 24/05/2011 1:27 PM, Josh Browning wrote:

 Oh, of course, sorry.  I'm running Windows 7.  Thanks!


 Your question is probably a question for Microsoft.  Why doesn't whatever
 you did work?

 Someone here might be able to help if you describe what you did.  I just
 tried "Open with..." and selected Rgui.exe from the bin/x64 directory, and
 that failed.   A couple of other things I tried worked:

 1.  Edit the registry key

 HKEY_CLASSES_ROOT\RWorkspace\shell\open\command

 2.  Rename the bin/x64/Rgui.exe file to something else, and ask to open
 with that.

 Duncan Murdoch

  Josh

 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net]
 Sent: Tuesday, May 24, 2011 11:25 AM
 To: Josh Browning
 Cc: r-help@r-project.org
 Subject: Re: [R] Opening R in 64-bit version by default


 On May 24, 2011, at 11:03 AM, Josh Browning wrote:

   Hi Everyone,
 
   This may be a dumb question, but I can't seem to figure it out.  I have
   32 and 64 bit versions of R installed on my machine, and I'd really like
   the 64-bit version to be the default (i.e. what opens when I open up a
   workspace).  I've tried right-clicking on the workspace and setting the
   default option as the 64 bit version, but it still opens the workspace
   in 32-bit.  Am I missing something here?  Any help would be greatly
   appreciated!

 Shirley, you don't expect us to read your mind. OS?

 --
 David Winsemius, MD
 West Hartford, CT





 --
 Michael Sumner
 Institute for Marine and Antarctic Studies, University of Tasmania
 Hobart, Australia
 e-mail: mdsum...@gmail.com





-- 
John C Frain
Economics Department
Trinity College Dublin
Dublin 2
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com



Re: [R] What are the common Standard Statistical methods used for the analysis of a dataset

2011-05-25 Thread Stephan Kolassa

Dear all,

may I suggest the acronym IOTT for the inter-ocular trauma test?

Now we just need someone to implement iot.test(). I assume it will 
appear on CRAN within the next 24 hours.


Looking forward to yet another base package,
Stephan



On 25.05.2011 23:36, Greg Snow wrote:

How can anyone overlook the intra-ocular trauma test (sometimes also
called the inter-ocular concussion test)?  But the i-o trauma test
needs either a small data set or an appropriate graph of the data (or
can you look at a dataset of a hundred columns and a million rows and
do an intra-ocular trauma test?).  We were not told the size of the
dataset or enough information to know what type of graph to make.

You do make a good point, though, that with minimal additional
information the intra-ocular trauma test can be useful (well, if it is
significant; there are many datasets that fail the intra-ocular
trauma test but still yield interesting results after careful
study).  And for any dataset that has a significant intra-ocular
trauma test result, that should trump the results of
SnowsCorrectlySizedButOtherwiseUselessTestOfAnything.





Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution

2011-05-25 Thread Dennis Murphy
Hi:

On Wed, May 25, 2011 at 12:42 PM, rudi rudi.stras...@gmail.com wrote:
 Hi,

 can anyone help me figure out how to compute the percentile of an
 individual observation with respect to a reference distribution?

 What I mean is this: let's assume I have a vector consisting of 10 numbers
 {3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the
 number 4.9 corresponds to. I failed to find any reference to such a
 function, although I would assume this must frequently be necessary.

The simple answer is, I believe,

x <- c(3,5,8,1,9,5,4,3,5.5,7)
plot(ecdf(x))
sum(x <= 4.9)/length(x)
[1] 0.4

This would correspond to the empirical cumulative distribution
function (ecdf) to which Jorge alluded.
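The same number can come straight from ecdf(): it returns a step function that can be evaluated at any value, so one call covers several observations at once, and multiplying by 100 puts the answer on the percentile scale. A short sketch with the ten numbers from the question:

```r
x <- c(3, 5, 8, 1, 9, 5, 4, 3, 5.5, 7)
Fn <- ecdf(x)        # empirical CDF of the reference sample

Fn(4.9)              # 0.4: proportion of x at or below 4.9
Fn(c(1, 4.9, 9))     # 0.1 0.4 1.0
100 * Fn(4.9)        # 40, i.e. the 40th percentile
mean(x <= 4.9)       # 0.4, same as sum(x <= 4.9)/length(x)
```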

HTH,
Dennis

 Thanks in advance for your help.

 /Rudi



Re: [R] Fwd: Opening R in 64-bit version by default

2011-05-25 Thread Duncan Murdoch

On 25/05/2011 5:43 PM, John C Frain wrote:

I have no problems configuring .r files to start in Emacs or RStudio
and then using Emacs or RStudio to call the required version of R.

You might check, when you use "Open with" from Windows Explorer, that
the "Always open with this program" check box is ticked.

If you are using Windows 7 you can set and change default programs as follows:
1. Open Control Panel.
2. Click Programs.
3. Click Default Programs and follow the options to set the required defaults.

I have never used Vista and don't know if this works in Vista.


I suspect the latter method will fail, since it's using the same tools
as "Open with..." uses, and that appears to be buggy.  But I'm on 32-bit
XP right now, so I can't verify.


Duncan Murdoch
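Whichever route is used to set the file association, one way to confirm which build actually opened a workspace is to ask the running session. A minimal sketch using values built into base R; .Platform$r_arch is the architecture sub-directory of the build (on Windows builds of R 2.12.0 and later this is typically "x64" or "i386", and it is often empty on other platforms).

```r
# On 64-bit builds pointers are 8 bytes; on 32-bit builds they are 4
is_64bit <- .Machine$sizeof.pointer == 8

# Architecture reported by this build, e.g. "x86_64" or "i386"
arch <- R.version$arch

# Architecture sub-directory of this build (may be "")
sub_arch <- .Platform$r_arch

cat("64-bit:", is_64bit, " arch:", arch, " sub-arch:", sub_arch, "\n")
```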



Best regards

John


On 25 May 2011 01:55, Michael Sumner mdsum...@gmail.com wrote:

When you installed R there should be shortcuts on your desktop, or under "R"
in the Start menu, unless you opted for the installation to not create
those.

Click (or double-click) the one that has a name like "R x64 2.13.0"; the
"x64" indicates that the shortcut is for the 64-bit R. You won't have this if
you opted not to install the 64-bit R components.

Use that shortcut every time to start R, and when it's running right-click
the taskbar item and click "Pin this program to taskbar" to make it super
accessible.

If you have R older than 2.12.0 then the 32-bit and 64-bit installers are
separate, but you don't specify your version and you should use the latest
in any case.

If you have shortcuts for 32-bit R, or other versions then you'll need to
clean up or organize them in whatever way works best for you.
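To confirm which build a given shortcut or file association actually started, you can ask the running session itself. This is a minimal sketch using only base R: `.Machine$sizeof.pointer` is 8 in a 64-bit build and 4 in a 32-bit one, and `R.version$arch` names the architecture. The variable names `is_64bit` and `arch` are illustrative.

```r
# Check whether the current R session is a 64-bit or a 32-bit build.
# Pointers are 8 bytes in 64-bit builds and 4 bytes in 32-bit builds.
is_64bit <- .Machine$sizeof.pointer == 8

# Architecture string reported by R, e.g. "x86_64" for a 64-bit Intel/AMD build
arch <- R.version$arch

cat(R.version.string, "\n")
cat("Architecture:", arch, "\n")
cat("64-bit build:", is_64bit, "\n")
```

Running this right after double-clicking a workspace shows whether the association launched the 64-bit front end.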

Cheers, Mike.

On Wed, May 25, 2011 at 5:23 AM, Duncan Murdoch <murdoch.dun...@gmail.com> wrote:


On 24/05/2011 1:27 PM, Josh Browning wrote:


Oh, of course, sorry.  I'm running Windows 7.  Thanks!



Your question is probably a question for Microsoft.  Why doesn't whatever
you did work?

Someone here might be able to help if you describe what you did.  I just
tried "Open with..." and selected Rgui.exe from the bin/x64 directory, and
that failed.   A couple of other things I tried worked:

1.  Edit the registry key

HKEY_CLASSES_ROOT\RWorkspace\shell\open\command

2.  Rename the bin/x64/Rgui.exe file to something else, and ask to open
with that.
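Before editing the key from option 1 in regedit, it may help to see what it currently contains. A read-only sketch, assuming a Windows build of R (`utils::readRegistry()` exists only there) and that the installer registered the `RWorkspace` file type; the `tryCatch()` covers a missing key. The name `workspace_cmd` is illustrative, and nothing here changes the registry.

```r
# Read (not modify) the command Windows runs to open an R workspace.
# utils::readRegistry() is available only in Windows builds of R.
workspace_cmd <- NULL
if (.Platform$OS.type == "windows") {
  workspace_cmd <- tryCatch(
    utils::readRegistry("RWorkspace\\shell\\open\\command", hive = "HCR"),
    error = function(e) NULL  # key absent: no association registered
  )
  # A path containing bin\x64 would indicate the 64-bit Rgui.exe.
  print(workspace_cmd)
} else {
  message("readRegistry() is only available on Windows; nothing to inspect.")
}
```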

Duncan Murdoch

  Josh


-----Original Message-----
From: David Winsemius [mailto:dwinsem...@comcast.net]
Sent: Tuesday, May 24, 2011 11:25 AM
To: Josh Browning
Cc: r-help@r-project.org
Subject: Re: [R] Opening R in 64-bit version by default


On May 24, 2011, at 11:03 AM, Josh Browning wrote:


  Hi Everyone,

  This may be a dumb question, but I can't seem to figure it out.  I have
  32- and 64-bit versions of R installed on my machine, and I'd really like
  the 64-bit version to be the default (i.e. what opens when I open up a
  workspace).  I've tried right-clicking on the workspace and setting the
  default option as the 64-bit version, but it still opens the workspace
  in 32-bit.  Am I missing something here?  Any help would be greatly
  appreciated!


Shirley, you don't expect us to read your mind. OS?

--
David Winsemius, MD
West Hartford, CT









--
Michael Sumner
Institute for Marine and Antarctic Studies, University of Tasmania
Hobart, Australia
e-mail: mdsum...@gmail.com










Re: [R] how to compute the inverse percentile of a given observation w.r.t. a reference distribution

2011-05-25 Thread David Winsemius


On May 25, 2011, at 5:50 PM, Dennis Murphy wrote:


Hi:

On Wed, May 25, 2011 at 12:42 PM, rudi <rudi.stras...@gmail.com> wrote:

Hi,

can anyone help me figure out how to compute the percentile of an
individual observation with respect to a reference distribution?

What I mean is this: let's assume I have a vector consisting of 10 numbers
{3,5,8,1,9,5,4,3,5.5,7} and I want to figure out what percentile the
number 4.9 corresponds to. I failed to find any reference to such a
function, although I would assume this must frequently be necessary.


The simple answer is, I believe,

x <- c(3,5,8,1,9,5,4,3,5.5,7)


Try instead:

ecdf(x)(4.9)

[1] 0.4



ecdf returns a function, so why not use it as such? It is also linked  
from the quantile help page where it is called the inverse of  
quantile.
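Putting the suggestion together as a self-contained sketch: `ecdf()` returns a step function, so the same object can be evaluated at several observations at once, and `quantile(..., type = 1)` (the inverse of the empirical distribution function) maps a proportion back to an observation. `Fn` and the other variable names are illustrative. Note that this percentile counts values less than or equal to the observation; a "strictly less than" convention gives different answers when the observation ties with data values.

```r
x <- c(3, 5, 8, 1, 9, 5, 4, 3, 5.5, 7)

# ecdf() returns a function: Fn(v) is the proportion of x that is <= v
Fn <- ecdf(x)
p <- Fn(4.9)                 # 0.4: four of the ten values (1, 3, 3, 4) are <= 4.9
p_many <- Fn(c(1, 4.9, 9))   # vectorised: 0.1 0.4 1.0

# The same proportion computed directly
p_direct <- mean(x <= 4.9)

# quantile() with type = 1 inverts the empirical CDF:
# the smallest observation whose ecdf value reaches 0.4
q <- unname(quantile(x, 0.4, type = 1))   # 4
```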



--
David.

plot(ecdf(x))
sum(x <= 4.9)/length(x)
[1] 0.4



(Somewhat more complicated than necessary.)


This would correspond to the empirical cumulative distribution
function (ecdf) to which Jorge alluded.

HTH,
Dennis


Thanks in advance for your help.



David Winsemius, MD
West Hartford, CT



Re: [R] Subtracting rows by id

2011-05-25 Thread Sara Maxwell
That worked perfectly. Thank you, Dennis; I very much appreciate the help!

Sara Maxwell, PhD, Postdoctoral Fellow
Marine Conservation Institute
University of California Santa Cruz
Long Marine Laboratory
100 Shaffer Road
Santa Cruz CA 95060 USA
+1 206 355 3249
sara.maxw...@marine-conservation.org
www.Marine-Conservation.org



On May 25, 2011, at 12:58 PM, Dennis Murphy wrote:

 Hi:

 Interesting problem. Here's one approach:

 library(plyr)
 # Read in your datasets as data frames rather than matrices
 dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                        value1 = sample(seq_len(100), 30, replace = TRUE))
 dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

 # The idea is to use the rows of dataset2 as parameters for
 # subsetting and removing the first n_i rows. The tail() function
 # serves the purpose (a negative n drops the first n rows):
 foo <- function(id2, subtract.value)
   tail(subset(dataset1, id1 == id2), -subtract.value)

 # Use the mdply function in the plyr package, then drop the
 # id2 and subtract.value columns it prepends:
 mdply(dataset2, foo)[, -(1:2)]
    id1 value1
 1    1      2
 2    1     55
 3    1     18
 4    1      4
 5    1      3
 6    1     76
 7    1     74
 8    1     21
 9    1     97
 10   2     19
 11   2     49
 12   2     20
 13   2     73
 14   2     79
 15   2     95
 16   2     52
 17   3     60
 18   3     58
 19   3     68
 20   3     59
 21   3     13


 HTH,
 Dennis
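For comparison, a base-R sketch of the same operation without plyr (a swapped-in technique, not Dennis's code). Variant 1 keeps his per-id `tail()` idea but uses `Map()`; variant 2 is vectorised: `ave(..., FUN = seq_along)` numbers the rows within each id, `match()` looks up how many rows to drop for that id, and a logical index keeps the rest. The `set.seed()` value and the names `drop_first`, `pos_in_id` and `drop_n` are illustrative assumptions.

```r
set.seed(42)  # illustrative seed so the random data are reproducible
dataset1 <- data.frame(id1 = rep(1:3, each = 10),
                       value1 = sample(seq_len(100), 30, replace = TRUE))
dataset2 <- data.frame(id2 = 1:3, subtract.value = c(1, 3, 5))

# Variant 1: per-id tail(), stacked with rbind()
drop_first <- function(id, n) {
  rows <- dataset1[dataset1$id1 == id, , drop = FALSE]
  tail(rows, -n)  # negative n drops the first n rows
}
result1 <- do.call(rbind, Map(drop_first, dataset2$id2, dataset2$subtract.value))
rownames(result1) <- NULL

# Variant 2: vectorised logical index
pos_in_id <- ave(dataset1$id1, dataset1$id1, FUN = seq_along)  # 1, 2, ... within each id
drop_n <- dataset2$subtract.value[match(dataset1$id1, dataset2$id2)]
result2 <- dataset1[pos_in_id > drop_n, ]
rownames(result2) <- NULL

print(result2)
```

Variant 2 avoids building one data frame per id, which may matter when there are many ids.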

 On Wed, May 25, 2011 at 9:55 AM, Sara Maxwell <smaxw...@ucsc.edu> wrote:
 Dear R users,

 I have two datasets:

 id1 <- c(rep(1,10), rep(2,10), rep(3,10))
 value1 <- sample(1:100, 30, replace=TRUE)
 dataset1 <- cbind(id1,value1)

 id2 <- c(1,2,3)
 subtract.value <- c(1,3,5)
 dataset2 <- cbind(id2, subtract.value)

 I want to remove the number of rows given in subtract.value for the
 corresponding id value in dataset1.  So for the 1 in id1, I want to
 remove the first row, for 2 in id1 I want to remove the first 3 rows,
 and for 3 in id1 I want to remove the first 5 rows, finally creating
 a new data frame with the remaining values.

 I am having trouble structuring a loop that can do this by the unique
 ids in the first dataset while matching the ids in the datasets.

 Any thoughts would be greatly appreciated.

 Thank you,
 Sara







Re: [R] panel.first problem when plotting with formula

2011-05-25 Thread David Winsemius


On May 25, 2011, at 5:56 PM, Gene Leynes wrote:


David, Peter (and others),

If you're interested, I submitted this as a bug, and was informed of
the error of my ways by Professor Ripley.

*His informative reply is copied below.*

The short answer is that panel.first is not a documented argument of
plot.formula, the method the generic plot() calls when given a formula.




Apparently not the first time he has been called upon to do so. Here  
is a similar question, albeit with no answer (at least in Baron's  
archive) at that time.


http://finzi.psych.upenn.edu/Rhelp10/2009-September/210328.html

(... the link to the ancient bug is broken.)

But plot.formula promises to pass `...` arguments on to later
hand-offs, and apparently it munges up the 'dots' in a manner that
plot.data.frame does not. In fact, plot.formula hands back to the
generic `plot`. Prof Ripley obviously has an understanding of the term
`expression` that surpasses mine. Does your understanding of his reply
extend to explaining why plot.data.frame works with our naive
invocation of panel.first while his suggested syntax does not:


plot(dat, panel.first=quote( bgfun() ) ) # Fails.
plot(dat, panel.first= bgfun()  )   # Succeeds.

So it still appears there is a demonstrable degree of inconsistency,
even if there is no bug.




The solution gives me some insight into how the lazy evaluation works.

## Note: It's still not a documented use of the function!
 plot(y ~ x, data=dat, panel.first=quote(bgfun()))
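For anyone wanting to rerun this, here is a self-contained version using `bgfun()` and `dat` from the original post, with the quotes around "usr" and the colour names (lost in the archive) restored. It writes to a temporary PDF (`out_file` is an illustrative name) so it also runs without a screen device. This reproduces the 2011 thread; later R versions may handle these arguments in plot.formula differently, so check ?plot.formula for yours.

```r
# bgfun() and dat as in the original post, with the string quotes restored
bgfun <- function(color = "honeydew2", linecolor = "grey45", addgridlines = TRUE) {
  tmp <- par("usr")  # plot region limits: x1, x2, y1, y2
  rect(tmp[1], tmp[3], tmp[2], tmp[4], col = color)
  if (addgridlines) {
    ylimits <- par()$usr[c(3, 4)]
    abline(h = pretty(ylimits, 10), lty = 2, col = linecolor)
  }
}
dat <- data.frame(x = 1:10, y = 1:10)

out_file <- tempfile(fileext = ".pdf")  # illustrative output file
pdf(out_file)
# quote() passes the call unevaluated, so bgfun() can run after the axes are set
plot(y ~ x, data = dat, panel.first = quote(bgfun()))
invisible(dev.off())
```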


On Wed, May 25, 2011 at 2:13 AM, r-b...@r-project.org wrote:
https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14591

Brian Ripley <rip...@stats.ox.ac.uk> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |CLOSED
         Resolution|                            |INVALID

--- Comment #1 from Brian Ripley <rip...@stats.ox.ac.uk> 2011-05-25 03:13:34 EDT ---

panel.first is not a documented argument to plot.formula: please do
read the help.


Yes, I did read the help page. I also looked at the code (of  
plot.formula, plot.data.frame, and plot.default)  and made a good  
faith effort at following the flow of data through that code by  
inserting print and str statements at what appeared to be critical  
points so I could see where plot.formula was going and what it was  
being given to work with.



It is a documented argument to plot.default(), as

panel.first: an expression to be evaluated after the plot axes are set
                ^^^^^^^^^^
but you passed an evaluated function call.  It first ran bgfun() and then
the plot call.  It worked for plot.default() by lazy evaluation.


I also tried using just panel.first=bgfun as I would have with lattice  
calls, and it did not succeed in any application.




You needed

plot(y ~ x, data=dat, panel.first=quote(bgfun()))
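Ripley's point about lazy evaluation can be seen without any graphics. In this sketch, `lazy_plot()` is a hypothetical stand-in for plot.default() that touches `panel.first` only after it has "set up the axes", and `eager_wrapper()` is a hypothetical stand-in for a method that evaluates its `...` arguments first and then forwards them with `do.call()`. That is our assumption about what plot.formula did at the time, consistent with "It first ran bgfun() and then the plot call". `record()` just logs the order of events. An already-evaluated call runs too early through the wrapper; a quote()d call is spliced back into the call by `do.call()` and runs at the right moment; the same quote()d call passed directly is never run, which is consistent with the plot(dat, panel.first = quote(bgfun())) failure above.

```r
# Event log kept in an environment so functions can append to it
log_env <- new.env()
log_env$events <- character(0)
record <- function(tag) {
  log_env$events <- c(log_env$events, tag)
  invisible(NULL)
}
reset_log <- function() log_env$events <- character(0)

# Stand-in for plot.default(): panel.first is a promise, forced only here
lazy_plot <- function(panel.first = NULL) {
  record("axes set")
  panel.first   # forcing the promise evaluates the caller's expression now
  record("points drawn")
  invisible(NULL)
}

# Stand-in for a method that evaluates ... early, then forwards via do.call()
eager_wrapper <- function(...) {
  args <- list(...)                    # forces every ... argument immediately
  invisible(do.call(lazy_plot, args))  # language objects are spliced into the call
}

reset_log(); lazy_plot(panel.first = record("background"))
direct <- log_env$events            # "axes set" "background" "points drawn"

reset_log(); eager_wrapper(panel.first = record("background"))
eager_evaluated <- log_env$events   # "background" "axes set" "points drawn"

reset_log(); eager_wrapper(panel.first = quote(record("background")))
eager_quoted <- log_env$events      # "axes set" "background" "points drawn"

reset_log(); lazy_plot(panel.first = quote(record("background")))
direct_quoted <- log_env$events     # "axes set" "points drawn" (call never run)
```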




--
David Winsemius, MD
West Hartford, CT


