[R] Summary.Formula: prmsd and test statistic
Hello,

I'm a new user to R so apologies if this is a basic question, but after scouring the web on information for summary.formula, I still am searching for an answer.

I made a function to analyze my data - I have a categorical variable and three continuous variables. I am analyzing my continuous variables on the basis of my categorical variables.

    radioanal <- function(a) {
      # Educational status first - pulling variables from my database.
      # categorical is 13 = Edu. numerical is 48=Kyph, 50=Vert, 53=HL.
      a1= a[,c(13,48,50,53)]
      # make sure they are in numeric form
      a2= transform(a1, Kyph=as.numeric(as.character(Kyph)),
                    Vert=as.numeric(as.character(Vert)),
                    HL=as.numeric(as.character(HL)))
      # see boxplots of the individual variables
      boxplot(a2$Kyph~a2$Edu, main="Education vs Kyphosis angle",
              xlab="Education", ylab="Kyphosis angle")
      boxplot(a2$Vert~a2$Edu, main="Education vs # of vertebrae affected",
              xlab="Education", ylab="#of vertebrae affected")
      boxplot(a2$HL~a2$Edu, main="Education vs %HL",
              xlab="Education", ylab="%HL")
      # see distribution of data
      d=summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, method="reverse",
                        overall=T, continuous=5, add=TRUE, test=T)
      # perform MANOVA
      a3=manova(cbind(Kyph, Vert, HL)~as.factor(Edu), data=a2)
      # return results
      a4=list("Results of Educational Status MANOVA", print(d),
              summary(a3, test="Hotelling-Lawley"), summary(a3, test="Roy"),
              summary(a3, test="Pillai"), summary(a3, test="Wilks"),
              summary.aov(a3))
      print(a4)
    }

This function works as is, but I want to add the mean and standard deviation to my table. When I add the following code to line 36 where I print d

    print(d, prmsd=TRUE)

The numbers in my table disappear. When I use the same commands from the command line, the same thing happens.

After reading the manual, I think the error might be due to the missing numbers in my database, so I tried adding na.action to my set of commands:

    print(summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, na.action,
                          method="reverse", overall=T, continuous=5,
                          add=TRUE, test=T), prmsd=TRUE)

but then I get the following error:

    Error in as.data.frame.default(data, optional = TRUE) :
      cannot coerce class 'function' into a data.frame

Any ideas?

Also, does anyone know what kind of test statistic this function calculates? I compared the F and p values to a manual ANOVA but they were different.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behaviour as.data.frame
Forget I asked. There was a typo in my example (stringsAsFactor instead of stringsAsFactors), which explained the difference. My apologies.

My second question, however, still stands: how does one create a data.frame with given column types and given dimensions?

Thanks.

Regards,
Jan

Quoting Jan van der Laan rh...@eoos.dds.nl:

> I use the following code to create two data.frames d1 and d2 from a list:
>
>     types <- c("integer", "character", "double")
>     nlines <- 10
>     d1 <- as.data.frame(lapply(types, do.call, list(nlines)), stringsAsFactor=FALSE)
>     l2 <- lapply(types, do.call, list(nlines))
>     d2 <- as.data.frame(l2, stringsAsFactors=FALSE)
>
> I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect):
>
>     > str(d1)
>     'data.frame': 10 obs. of 3 variables:
>      $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>      $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>      $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
>     > str(d2)
>     'data.frame': 10 obs. of 3 variables:
>      $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>      $ c: chr ...
>      $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
>
> As a different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my C++ routines. Is there a simpler or more elegant way of creating this data.frame?
>
> Regards,
> Jan
>
> PS: I am running R on 64 bit Ubuntu 11.04:
>
>     > sessionInfo()
>     R version 2.12.1 (2010-12-16)
>     Platform: x86_64-pc-linux-gnu (64-bit)
>     locale:
>      [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>      [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>      [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>      [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
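One possible answer to the second question can be sketched in base R. This is a minimal sketch, not a reply from the thread: the helper name empty_df and the default column names V1, V2, ... are hypothetical. It relies on vector(mode, length), which returns a zero-filled (or empty-string) vector of the requested type.

```r
# Hypothetical helper: build a data.frame with the given column types and row count.
empty_df <- function(types, nrows, col_names = paste0("V", seq_along(types))) {
  # vector("integer", 10), vector("character", 10), ... one column per type
  cols <- lapply(types, vector, length = nrows)
  names(cols) <- col_names
  # spell stringsAsFactors in full so character columns stay character
  as.data.frame(cols, stringsAsFactors = FALSE)
}

d <- empty_df(c("integer", "character", "double"), nrows = 10)
str(d)
```

Naming the list before conversion also avoids the long deparsed column names seen in the str() output above.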
[R] changing the day of the week in dates format
Dear all,

I have a question related to the POSIXlt function in R. I have a set of dates and times, for example:

    startx <- as.POSIXct("2011-01-01 00:00:00")
    finx <- as.POSIXct("2011-12-31 00:00:00")
    daysx <- seq(startx, finx, by="24 hours")

I want to change the dates of all the days falling on a Saturday to the next working day (i.e. Monday). So I convert the dates to POSIXlt:

    mydaysx <- as.POSIXlt(daysx)

Then I select all the Saturdays and move them on to Monday:

    select <- mydaysx$wday==6
    mydaysx$mday[select] <- mydaysx$mday[select] + 2

However, although all the new dates (i.e. mydaysx) are actual days of the year, the $wday values have not been updated and the $mday values have not all been corrected (i.e. those falling into the next month). So if I do

    select <- mydaysx$wday==6

I still get the same set of days as before. Is there a way to do this?

Thanks,
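A minimal sketch of one way around this, assuming the time of day does not matter so that Date objects can be used instead of POSIXlt (the variable names are illustrative, not from the post). Adding 2 to a Date moves it two calendar days and rolls over month and year boundaries, and wday is recomputed whenever the dates are converted again.

```r
# All days of 2011 as Date objects (no time-of-day or time-zone component).
days <- seq(as.Date("2011-01-01"), as.Date("2011-12-31"), by = "day")

# wday runs from 0 (Sunday) to 6 (Saturday); it is derived fresh on each conversion.
is_sat <- as.POSIXlt(days)$wday == 6

# Move every Saturday forward two days, to the following Monday.
shifted <- days
shifted[is_sat] <- days[is_sat] + 2

# No Saturdays should remain after the shift.
any(as.POSIXlt(shifted)$wday == 6)
```

If the POSIXlt fields must be edited directly, a round trip such as as.POSIXlt(as.POSIXct(mydaysx)) should normalise out-of-range mday values and refresh wday.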
Re: [R] help with mysql and R: partitioning by quintile
Here's how I'm trying to solve the diversity problem inherent in the data (see below for a definition of the problem):

    if (interquintile ranges have >=4 ranges at the same freq) then (use rating=3)
    else (use rating as described in jim's code)

i'll have a go and post an update. in the mean time, if you see that I'm going straight into the ditch with my solution please do let me know.

regards
gawesh

On Sun, May 15, 2011 at 12:28 AM, gj gaw...@gmail.com wrote:

Jim's suggestion did the trick:

    tqm <- do.call(rbind, tq) + 0.001

    > head(x.new)
         userid freq track rating
    [1,]      1    1     1      1
    [2,]      1   10     2      5
    [3,]      1    1     3      1
    [4,]      1    1     4      1
    [5,]      1   15     5      5
    [6,]      1    4     6      3

Dennis, what you suggested didn't work. Thanks a lot guys! :-)

But before I can smile, I need to resolve a problem inherent in the data. When the play history lacks diversity (in terms of frequency), I want to assign a neutral rating of 3 (for my recommender system I use rating 1 is 'don't like' and 5 is 'i like'). Can that be done in R? For example:

input:

    userid,track,freq
    1,1,1 1,2,1 1,3,1 1,4,1 1,5,2 1,6,2 1,7,1 1,8,1 1,9,1 1,10,1
    1,11,2 1,12,2 1,13,1 1,14,1 1,15,1 1,16,1 1,17,2 1,18,2 1,19,2 1,20,2
    1,21,2 1,22,1 1,23,1 1,24,1 1,25,1 1,26,1 1,27,1 1,28,1 1,29,1 1,30,1
    1,31,1 1,32,1 1,33,1 1,34,1 1,35,1

gives output:

    > head(x.new)
         userid freq track rating
    [1,]      1    1     1      1
    [2,]      1    1     2      1
    [3,]      1    1     3      1
    [4,]      1    1     4      1
    [5,]      1    2     5      4
    [6,]      1    2     6      4

Ideally I want to give a neutral rating in this case:

         userid freq track rating
    [1,]      1    1     1      3
    [2,]      1    1     2      3
    [3,]      1    1     3      3
    [4,]      1    1     4      3
    [5,]      1    2     5      3
    [6,]      1    2     6      3

Regards
Gawesh

On Sat, May 14, 2011 at 11:52 PM, jim holtman jholt...@gmail.com wrote:

An easy way is to just offset the quantiles by a small increment so that boundary condition is less likely. If you change the line

    tqm <- do.call(rbind, tq) + 0.001

in my example, that should do the trick.

On Sat, May 14, 2011 at 6:09 PM, gj gaw...@gmail.com wrote:

Hi,

I think I haven't been able to explain correctly what I want. Here another try:

Given that I have the following input:

    userid,track,freq
    1,1,1 1,2,10 1,3,1 1,4,1 1,5,15 1,6,4 1,7,16 1,8,6 1,9,1 1,10,1
    1,11,2 1,12,2 1,13,1 1,14,6 1,15,7 1,16,13 1,17,3 1,18,2 1,19,5 1,20,2
    1,21,2 1,22,6 1,23,4 1,24,1 1,25,1 1,26,16 1,27,4 1,28,1 1,29,4 1,30,4
    1,31,4 1,32,1 1,33,14 1,34,2 1,35,7

It is a sample of the history of tracks played: userid, track and frequency. What I want is to convert the frequency into a rating scale (1-5) based on the frequency at which a user has played a track, using the following interquintile ranges for the cfd: 0%-20% = rating 1, 20%-40% = rating 2, ..., 80%-100% = rating 5

Jim kindly provided the following code:

    # cheers jim holtman
    x=read.csv(file="C:\\Data\\lastfm\\ratings\\play_history_3.csv",header=T, sep=',')
    # get the quantiles for each user (we want the frequency distribution to be based on user)
    tq <- tapply(x$freq,x$userid,quantile,prob=c(0.2,0.4,0.6,0.8,1))
    # create a matrix with the rownames as the tracks to use in the findInterval
    tqm <- do.call(rbind, tq)
    # now put the ratings
    require(data.table)
    x.dt <- data.table(x)
    x.new <- x.dt[,list(freq = freq,track=track,rating = findInterval(freq,tqm[as.character(userid[1L]),], rightmost.closed = TRUE) + 1L),by=userid]

    > head(x.new)
         userid freq track rating
    [1,]      1    1     1      2
    [2,]      1   10     2      5
    [3,]      1    1     3      2
    [4,]      1    1     4      2
    [5,]      1   15     5      5
    [6,]      1    4     6      4

which is almost what I wanted except that the ratings are 1 point higher for tracks where the frequency is at the cut-off points in the interquintile range. To illustrate the quintiles are:

    > tq$`1`
     20%  40%  60%  80% 100%
       1    2    4    7   16

So, ideally I want (note the different ratings):

         userid freq track rating
    [1,]      1    1     1      1
    [2,]      1   10     2      5
    [3,]      1    1     3      1
    [4,]      1    1     4      1
    [5,]      1   15     5      5
    [6,]      1    4     6      3

Can anybody help me? I'm new to R (as you have probably guessed). Sorry for the long explanation.

Regards
Gawesh

On Sat, May 14, 2011 at 7:37 PM, Dennis Murphy djmu...@gmail.com wrote:

Hi:

Is this what you're after?
    tq <- with(ds, quantile(freq, seq(0.2, 1, by = 0.2)))
    ds$int <- with(ds, cut(freq, c(0, tq)))
    with(ds, table(int))
    int
      (0,1]   (1,2]   (2,4]   (4,7]  (7,16]
         10       6       7       6       6

HTH,
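The two requirements discussed in this thread (a count that sits on a quintile boundary takes the lower rating, and a flat play history gets a neutral 3) can be sketched together in base R. This is only a sketch under stated assumptions: the helper name rate_by_quintile is hypothetical, play counts are assumed to be positive, and "lacks diversity" is interpreted here as "any two quintile boundaries coincide", which is stricter than the rule proposed in the thread.

```r
# Hypothetical helper: map play counts to ratings 1-5 by quintile.
rate_by_quintile <- function(freq, neutral = 3L) {
  tq <- quantile(freq, probs = seq(0.2, 1, by = 0.2))
  # Assumption: tied quintile boundaries signal too little diversity to rate.
  if (anyDuplicated(tq) > 0) return(rep(neutral, length(freq)))
  # Right-closed bins (0, q20], (q20, q40], ... so a count equal to a
  # boundary falls into the lower bin.
  as.integer(cut(freq, breaks = c(0, tq)))
}

# Play counts for user 1 from the sample input posted in the thread.
freq <- c(1, 10, 1, 1, 15, 4, 16, 6, 1, 1, 2, 2, 1, 6, 7, 13, 3, 2,
          5, 2, 2, 6, 4, 1, 1, 16, 4, 1, 4, 4, 4, 1, 14, 2, 7)
head(rate_by_quintile(freq))
```

For several users, something like ave(x$freq, x$userid, FUN = rate_by_quintile) should apply the helper within each user, assuming x is the data frame read in Jim's code.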
Re: [R] L'abbe plot
On 05/14/2011 07:20 AM, whitney.mel...@colorado.edu wrote:
> I cannot seem to get a L'abbe plot to work on R. I do not understand what the X coordinates, or alternatively an object of class metabin, is supposed to mean. What is a class of metabin?

Hi Whitney,

The L'Abbe plot is a relatively simple illustration that shows the results of intervention trials as two proportions on a Cartesian plane. The outcomes must be dichotomous (dead/alive, cured/not cured, improved/not improved, etc.) and the comparisons are between two interventions. Say that I was asked to evaluate an intervention for excessive drinkers that randomly assigned the subjects to either a session with a behavioral therapist or a session of equal duration with an ex-drinker. The outcome might be whether the subject drank more or less over the succeeding month. Thus:

    didf<-data.frame(subject=1:50,
     interv=rep(c("therapist","ex-drinker"),each=25),
     outcome=sample(c("more","less"),50,TRUE))
    didf.tab<-table(didf$interv,didf$outcome)
    didf.tab

                 less more
      ex-drinker   14   11
      therapist    12   13

    chisq.test(didf.tab)

            Pearson's Chi-squared test with Yates' continuity correction

    data:  didf.tab
    X-squared = 0.0801, df = 1, p-value = 0.7771

Apparently ex-drinkers are no better or worse than therapists. So we want to illustrate this with a L'Abbe plot.

    library(plotrix)
    labbePlot<-function(x,main="L'Abbe plot",
     xlab="Positive response with placebo (%)",
     ylab="Positive response with treatment (%)",...) {

     plot(0,xlim=c(0,100),ylim=c(0,100),main=main,xlab=xlab,
      ylab=ylab,type="n",...)
     for(trial in 1:length(x)) {
      sum_treat<-sum(x[[trial]][1,])
      sum_interv<-sum(x[[trial]][2,])
      xpos<-100*x[[trial]][1,1]/sum_treat
      ypos<-100*x[[trial]][2,1]/sum_interv
      rad<-sqrt(sum_treat+sum_interv)/2
      draw.circle(xpos,ypos,rad)
     }
     segments(0,0,100,100)
    }
    x<-list(didf.tab)
    labbePlot(x)

This shows that the therapists, whom we expected to do better, were slightly, but not significantly, worse than the ex-drinkers. This can't be right, so let's follow it up with a bigger trial.

    didf2<-data.frame(subject=1:200,
     interv=rep(c("therapist","ex-drinker"),each=100),
     outcome=c(sample(c("more","less"),100,TRUE,prob=c(0.3,0.7)),
      sample(c("more","less"),100,TRUE,prob=c(0.7,0.3))))
    didf2.tab<-table(didf2$interv,didf2$outcome)
    x<-list(didf.tab,didf2.tab)
    labbePlot(x)

That's better, isn't it? This basic plot can be tarted up with colors for the different circles, and other decorations so beloved of those who use presentation packages. Now that I've written it, I might as well add it to the plotrix package.

Jim
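To make the coordinates concrete, the circle position for the first trial can be worked out by hand. This is a small illustrative sketch that rebuilds the 2x2 table printed above as a matrix (the object names tab and pct_less are not from the post) and repeats the percentage step that labbePlot uses for xpos and ypos.

```r
# The 2x2 table from the small trial: rows are interventions, columns are outcomes.
tab <- matrix(c(14, 12, 11, 13), nrow = 2,
              dimnames = list(c("ex-drinker", "therapist"), c("less", "more")))

# Percentage of "less" outcomes in each arm: row 1 gives xpos, row 2 gives ypos.
pct_less <- 100 * tab[, "less"] / rowSums(tab)
pct_less

# Circle radius as computed in the function: half the square root of the trial size.
rad <- sqrt(sum(tab)) / 2
```

So the first circle is centred at 56 on the horizontal axis and 48 on the vertical axis, slightly below the diagonal drawn by segments().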
[R] Low Pain Unicode Characters in pdf graph?
Dear R-experts---is there a relatively low-pain way to get unicode characters into a plot to a pdf device?

    pdf(file="cardsymbols.pdf")
    plot( 0, xlim=c(0,5), ylim=c(0,5), type="n")
    text(1,1, "&spades;")
    text(2,2, "&hearts;")
    text(3,3, "&diams;")
    text(4,4, "&clubs;")
    dev.off()

(these are the characters that I need the most NOW, but this is a more generic question.)

sincerely,

/iaw
Ivo Welch (ivo.we...@gmail.com)
Re: [R] Find String Between Characters
Hi Jim,

Thanks for your note. Unfortunately, when I attempt your solution in my exact setting, I get a weird and slightly different answer.

First, let me be more clear. What I am attempting to do is pull the CIK number out of the information from the web page itself after it has loaded to R (this may not be optimal, but I am new at this), not from the web page reference (as you have done).

So, when I execute the following as per your suggestion:

    require(scrapeR)
    mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
    num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)

I get

    [1] "<pointer: 0x001265c0>"

Is this just a hex representation of the same number, or is something else going on here?

Comments from any and all would be much appreciated.

--John J. Sparks, Ph.D.

On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
> Is this what you want:
>
>     > mmm<-"http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40"
>     > num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
>     > num
>     [1] "320193"
>
> On Sat, May 14, 2011 at 8:20 PM, Sparks, John James jspa...@uic.edu wrote:
>> Dear R Helpers,
>>
>> I am trying to isolate a set of characters between two other characters in a long string file. I tried some of the examples on the R help pages and elsewhere, but I am not able to get it. Your help would be much appreciated.
>>
>>     require(scrapeR)
>>     mmm<-scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
>>     str(mmm)
>>
>> I want to get the number 320193 that is between the CIK= and the &.
>>
>> I have tried
>>
>>     g <- grep( CIK=|, mmm )
>>
>> and
>>
>>     temp<-grep(mmm,\CIK=\)
>>
>> and variations on these themes, but all won't run or come back as an empty object.
>>
>> How can I grab this number?
>>
>> Best wishes,
>> --John J. Sparks, Ph.D.
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
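The extraction step itself works on any character string. The sketch below is self-contained and deliberately avoids scrapeR: it assumes the page text (or, here, just the URL) is already available as a character vector, and it assumes the "&" separators that are missing from the archived URL belong where shown. The pointer printout above suggests that the object returned by scrape() is not a plain string, so it would need to be converted to text before sub() can find anything; how to do that is not shown here.

```r
# Assumption: the "&" separators were lost in the archive and are restored here.
url <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40"

# Capture the run of digits that follows "CIK=" and return only that group.
cik <- sub("^.*CIK=([0-9]+).*$", "\\1", url)

# Equivalent two-step form: locate "CIK=<digits>", then strip the prefix.
cik2 <- sub("CIK=", "", regmatches(url, regexpr("CIK=[0-9]+", url)))

cik
cik2
```

The second form has the advantage that regmatches() returns an empty vector when there is no match, whereas sub() silently returns its input unchanged.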
[R] Question on approximations of full logistic regression model
Hi,

I am trying to construct a logistic regression model from my data (104 patients and 25 events). I built a full model consisting of five predictors, using penalization from the rms package (lrm, pentrace, etc.) because of the events-per-variable issue. Then I tried to approximate the full model by the step-down technique, predicting L from all of the component variables using ordinary least squares (ols in the rms package), as follows. I would like to know whether I am doing this right or not.

    library(rms)
    plogit <- predict(full.model)
    full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
    fastbw(full.ols, aics=1e10)

     Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
     stenosis       1.41  1    0.2354   1.41   1    0.2354  -0.59 0.991
     x2            16.78  1    0.0000  18.19   2    0.0001  14.19 0.882
     procedure     26.12  1    0.0000  44.31   3    0.0000  38.31 0.711
     ClinicalScore 25.75  1    0.0000  70.06   4    0.0000  62.06 0.544
     x1            83.42  1    0.0000 153.49   5    0.0000 143.49 0.000

Then I fitted an approximation to the full model using the most important variables (keeping variables until the R^2 for predictions from the reduced model against the original Y would drop below 0.95), that is, dropping stenosis.

    full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
    full.ols.approx$stats

              n  Model L.R.        d.f.          R2           g       Sigma
    104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622

This approximate model had an R^2 against the full model of 0.99. Therefore, I updated the original full logistic model, dropping stenosis as a predictor.

    full.approx.lrm <- update(full.model, ~ . -stenosis)

    validate(full.model, bw=F, B=1000)

              index.orig training   test optimism index.corrected    n
    Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
    R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
    Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
    Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
    Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000

    validate(full.approx.lrm, bw=F, B=1000)

              index.orig training   test optimism index.corrected    n
    Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
    R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
    Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
    Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
    Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000

Validation revealed this approximation was not bad. Then, I made a nomogram.

    full.approx.lrm.nom <- nomogram(full.approx.lrm,
      fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
    plot(full.approx.lrm.nom)

Another nomogram, using the ols model:

    full.ols.approx.nom <- nomogram(full.ols.approx,
      fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
    plot(full.ols.approx.nom)

These two nomograms are very similar but a little bit different.

My questions are:

1. Am I doing this right?
2. Which nomogram is correct?

I would appreciate your help in advance.

--
KH
Re: [R] problem with makeSOCKcluster depending on R patch version
On 15.05.2011 12:27, Søren Højsgaard wrote:
> That raises another question: Will that patched version (2011-05-13 r55886) be made available as a windows binary - and if so: when?

Daily builds for Windows of R-patched are available from CRAN.

Best,
uwe

> Regards
> Søren
>
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Uwe Ligges [lig...@statistik.tu-dortmund.de]
> Sent: 14 May 2011 18:23
> To: Ulrich Halekoh
> Cc: r-help@r-project.org
> Subject: Re: [R] problem with makeSOCKcluster depending on R patch version
>
> On 13.05.2011 14:01, Ulrich Halekoh wrote:
>> Dear,
>>
>> I encountered a problem using the makeSOCKcluster function depending on the patched version of R-2.13.0 I used.
>>
>>     library(snow)
>>     cl <- makeSOCKcluster(rep("localhost", 2))
>>
>> this works fine for the R-13.0 patch (2011-04-28 r55678) but not for the patch R-13.0 patch (2011-05-10 r55826)
>
> If R-2.13.0 patched is meant: I do not see this with a recent snapshot (2011-05-13 r55886).
>
> Uwe Ligges
>
>> In the latter case the command keeps running. Interrupting the command I get the error message
>>
>>     Error in socketConnection(port = port, server = TRUE, blocking = TRUE, :
>>       cannot open the connection
>>     In addition: Warning message:
>>     In socketConnection(port = port, server = TRUE, blocking = TRUE, :
>>       problem in listening on this socket
>>
>> Does work
>>
>>     R version 2.13.0 Patched (2011-04-28 r55678)
>>     Platform: i386-pc-mingw32/i386 (32-bit)
>>     locale:
>>     [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>>     [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>>     [5] LC_TIME=Danish_Denmark.1252
>>     attached base packages:
>>     [1] stats graphics grDevices utils datasets methods base
>>     other attached packages:
>>     [1] snow_0.3-3
>>
>> Does not work
>>
>>     R version 2.13.0 Patched (2011-05-10 r55826)
>>     Platform: i386-pc-mingw32/i386 (32-bit)
>>     locale:
>>     [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>>     [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>>     [5] LC_TIME=Danish_Denmark.1252
>>     attached base packages:
>>     [1] stats graphics grDevices utils datasets methods base
>>     other attached packages:
>>     [1] snow_0.3-3
>>
>> Kind regards
>> Ulrich Halekoh
>> Associate Professor
>> Aarhus University
>> Email: ulrich.hale...@agrsci.dk
[R] Unexpected behaviour as.data.frame
I use the following code to create two data.frames d1 and d2 from a list:

    types <- c("integer", "character", "double")
    nlines <- 10
    d1 <- as.data.frame(lapply(types, do.call, list(nlines)), stringsAsFactor=FALSE)
    l2 <- lapply(types, do.call, list(nlines))
    d2 <- as.data.frame(l2, stringsAsFactors=FALSE)

I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect):

    > str(d1)
    'data.frame': 10 obs. of 3 variables:
     $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
     $ c: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
     $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0
    > str(d2)
    'data.frame': 10 obs. of 3 variables:
     $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
     $ c: chr ...
     $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0

As different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my c++ routines. Is there a more simple/elegant way of creating this data.frame?

Regards,
Jan

PS: I am running R on 64 bit Ubuntu 11.04:

    > sessionInfo()
    R version 2.12.1 (2010-12-16)
    Platform: x86_64-pc-linux-gnu (64-bit)
    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    attached base packages:
    [1] stats graphics grDevices utils datasets methods base
Re: [R] problem with makeSOCKcluster depending on R patch version
That raises another question: Will that patched version (2011-05-13 r55886) be made available as a windows binary - and if so: when?

Regards
Søren

From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] on behalf of Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: 14 May 2011 18:23
To: Ulrich Halekoh
Cc: r-help@r-project.org
Subject: Re: [R] problem with makeSOCKcluster depending on R patch version

On 13.05.2011 14:01, Ulrich Halekoh wrote:
> Dear,
>
> I encountered a problem using the makeSOCKcluster function depending on the patched version of R-2.13.0 I used.
>
>     library(snow)
>     cl <- makeSOCKcluster(rep("localhost", 2))
>
> this works fine for the R-13.0 patch (2011-04-28 r55678) but not for the patch R-13.0 patch (2011-05-10 r55826)

If R-2.13.0 patched is meant: I do not see this with a recent snapshot (2011-05-13 r55886).

Uwe Ligges

> In the latter case the command keeps running. Interrupting the command I get the error message
>
>     Error in socketConnection(port = port, server = TRUE, blocking = TRUE, :
>       cannot open the connection
>     In addition: Warning message:
>     In socketConnection(port = port, server = TRUE, blocking = TRUE, :
>       problem in listening on this socket
>
> Does work
>
>     R version 2.13.0 Patched (2011-04-28 r55678)
>     Platform: i386-pc-mingw32/i386 (32-bit)
>     locale:
>     [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>     [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>     [5] LC_TIME=Danish_Denmark.1252
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>     other attached packages:
>     [1] snow_0.3-3
>
> Does not work
>
>     R version 2.13.0 Patched (2011-05-10 r55826)
>     Platform: i386-pc-mingw32/i386 (32-bit)
>     locale:
>     [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>     [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>     [5] LC_TIME=Danish_Denmark.1252
>     attached base packages:
>     [1] stats graphics grDevices utils datasets methods base
>     other attached packages:
>     [1] snow_0.3-3
>
> Kind regards
> Ulrich Halekoh
> Associate Professor
> Aarhus University
> Email: ulrich.hale...@agrsci.dk
[R] R function that returns an object's search path position XXXX
Hello everyone,

Is there an R function that returns an object's search path position?

Thank you,
Dan
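A minimal sketch of one way to answer this with base R only. The helper name search_pos is hypothetical; it simply asks exists() about each position on the search path in turn, without inheritance, and returns every position at which the name is defined.

```r
# Hypothetical helper: positions on search() at which an object name is defined.
search_pos <- function(name) {
  hits <- vapply(seq_along(search()),
                 function(i) exists(name, where = i, inherits = FALSE),
                 logical(1))
  which(hits)
}

pos <- search_pos("median")
search()[pos]   # the attached environment(s) that define median
```

The built-in find("median") returns the corresponding names rather than positions, so match(find("median"), search()) is an alternative for objects that find() reports.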
Re: [R] Summary.Formula: prmsd and test statistic
On May 14, 2011, at 11:23 AM, Eli Kamara wrote:

> Hello, I'm a new user to R so apologies if this is a basic question, but after scouring the web on information for summary.formula, I still am searching for an answer.
>
> I made a function to analyze my data - I have a categorical variable and three continuous variables. I am analyzing my continuous variables on the basis of my categorical variables.
>
>     radioanal <- function(a) {
>       # Educational status first - pulling variables from my database.
>       # categorical is 13 = Edu. numerical is 48=Kyph, 50=Vert, 53=HL.
>       a1= a[,c(13,48,50,53)]
>       # make sure they are in numeric form
>       a2= transform(a1, Kyph=as.numeric(as.character(Kyph)),
>                     Vert=as.numeric(as.character(Vert)),
>                     HL=as.numeric(as.character(HL)))
>       # see boxplots of the individual variables
>       boxplot(a2$Kyph~a2$Edu, main="Education vs Kyphosis angle",
>               xlab="Education", ylab="Kyphosis angle")
>       boxplot(a2$Vert~a2$Edu, main="Education vs # of vertebrae affected",
>               xlab="Education", ylab="#of vertebrae affected")
>       boxplot(a2$HL~a2$Edu, main="Education vs %HL",
>               xlab="Education", ylab="%HL")
>       # see distribution of data
>       d=summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, method="reverse",
>                         overall=T, continuous=5, add=TRUE, test=T)

I noticed that you were addressing the columns individually. That rather defeats the strategy of passing a data argument to a function and using only the column names in the formula. It often causes strange errors in model calls, and I wouldn't be surprised if you got better results with something like:

    d=summary.formula( Edu~ Kyph+ HL+ Vert, data=a2, method="reverse",
                       overall=T, continuous=5, add=TRUE, test=T)

-- David

>       # perform MANOVA
>       a3=manova(cbind(Kyph, Vert, HL)~as.factor(Edu), data=a2)
>       # return results
>       a4=list("Results of Educational Status MANOVA", print(d),
>               summary(a3, test="Hotelling-Lawley"), summary(a3, test="Roy"),
>               summary(a3, test="Pillai"), summary(a3, test="Wilks"),
>               summary.aov(a3))
>       print(a4)
>     }
>
> This function works as is, but I want to add the mean and standard deviation to my table. When I add the following code to line 36 where I print d
>
>     print(d, prmsd=TRUE)
>
> The numbers in my table disappear. When I use the same commands from the command line, the same thing happens.
>
> After reading the manual, I think the error might be due to the missing numbers in my database, so I tried adding na.action to my set of commands:
>
>     print(summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, na.action,
>                           method="reverse", overall=T, continuous=5,
>                           add=TRUE, test=T), prmsd=TRUE)
>
> but then I get the following error:
>
>     Error in as.data.frame.default(data, optional = TRUE) :
>       cannot coerce class 'function' into a data.frame

It may be trying to do something with 'data' and doesn't find a 'data' object until it gets to the 'data' function.

> Any ideas?
>
> Also, does anyone know what kind of test statistic this function calculates?

Huh. You do realize this function in the Hmisc package has a help page, right?

> I compared the F and p values to a manual ANOVA but they were different.

I think you should break further questions down into components and post something that is reproducible.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
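One reading of the quoted error, offered as a sketch rather than a confirmed diagnosis: in the call above, na.action is passed by position, so it lands in the second argument slot, which for formula-style functions is usually data. The bare name na.action refers to the function stats::na.action, and coercing a function to a data frame fails with a message of the same shape. The example below uses only base R; lm() stands in for summary.formula, and the data frame d is made up for illustration.

```r
# Coercing the function na.action to a data frame reproduces the kind of message quoted.
msg <- tryCatch(as.data.frame(na.action),
                error = function(e) conditionMessage(e))
msg

# Naming the arguments avoids the mix-up (illustrated with lm() and toy data).
d <- data.frame(y = c(1, 2, NA, 4, 5), x = c(1, 2, 3, NA, 5))
fit <- lm(y ~ x, data = d, na.action = na.omit)
nobs(fit)   # rows with a missing y or x are dropped
```

By the same logic, the summary.formula call would be written with data=a2 and na.action=<some function> as named arguments; which na.action functions it accepts is documented on its help page.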
Re: [R] Low Pain Unicode Characters in pdf graph?
On May 15, 2011, at 9:06 AM, ivo welch wrote:

> Dear R-experts---is there a relatively low-pain way to get unicode characters into a plot to a pdf device?
>
>     pdf(file="cardsymbols.pdf")
>     plot( 0, xlim=c(0,5), ylim=c(0,5), type="n")
>     text(1,1, "&spades;")
>     text(2,2, "&hearts;")
>     text(3,3, "&diams;")
>     text(4,4, "&clubs;")
>     dev.off()
>
> (these are the characters that I need the most NOW, but this is a more generic question.)

The last examples in ?points should be reviewed and tested. It is cited by the ?plotmath page as a way of getting at symbols.

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
[R] DCC-GARCH model
Hello, I have a few questions concerning the DCC-GARCH model and its programming in R. So here is what I want to do: I take quotes of two indices - SP500 and DJ. And the aim is to estimate coefficients of the DCC-GARCH model for them. This is how I do it: library(tseries) p1 = get.hist.quote(instrument = ^gspc,start = 2005-01-07,end = 2009-09-04,compression = w, quote=AdjClose) p2 = get.hist.quote(instrument = ^dji,start = 2005-01-07,end = 2009-09-04,compression = w, quote=AdjClose) p = cbind(p1,p2) y = diff(log(p))*100 y[,1] = y[,1]-mean(y[,1]) y[,2] = y[,2]-mean(y[,2]) T = length(y[,1]) library(ccgarch) library(fGarch) f1 = garchFit(~ garch(1,1), data=y[,1],include.mean=FALSE) f1 = f1@fit$coef f2 = garchFit(~ garch(1,1), data=y[,2],include.mean=FALSE) f2 = f2@fit$coef a = c(f1[1], f2[1]) A = diag(c(f1[2],f2[2])) B = diag(c(f1[3], f2[3])) dccpara = c(0.2,0.6) dccresults = dcc.estimation(inia=a, iniA=A, iniB=B, ini.dcc=dccpara,dvar=y, model=diagonal) dccresults$out DCCrho = dccresults$DCC[,2] matplot(DCCrho, type='l') dccresults$out deliver me the estimated coefficients of the DCC-GARCH model. And here is my first question: How can I check if these coefficients are significant or not? How can I test them for significance? second question would be: Is this true that matplot(DCCrho, type='l') shows conditional correlation between the two indices in question? and the third one: What is actually dccpara and why do I get totally different DCC-alpha and DCC-beta coefficients if I change dccpara from c(0.2,0.6) to, let's say, c(0.01, 0.98) ? What determines which values should be chosen? Hopefully someone will find time to give me a hand. Thank you very much in advance, people of good will, for looking at/checking what I wrote and helping me. 
Best regards
Marcin
Re: [R] Low Pain Unicode Characters in pdf graph?
Hi:

For your specific problem, one way is:

plot(0, xlim=c(0,5), ylim=c(0,5), type="n", cex = 2)
text(1, 1, expression(symbol('\252')))
text(2, 2, expression(symbol('\251')))
text(3, 3, expression(symbol('\250')))
text(4, 4, expression(symbol('\247')))

More generally, David's advice is sound; see ?plotmath and focus on the sections 'Other symbols' and 'References'; the last reference provides a summary table of standard symbols and their codes in several formats.

HTH,
Dennis

On Sun, May 15, 2011 at 6:06 AM, ivo welch ivo.we...@gmail.com wrote:
> Dear R-experts---is there a relatively low-pain way to get unicode
> characters into a plot to a pdf device?
>
> pdf(file="cardsymbols.pdf")
> plot(0, xlim=c(0,5), ylim=c(0,5), type="n")
> text(1,1, "&spades;")
> text(2,2, "&hearts;")
> text(3,3, "&diams;")
> text(4,4, "&clubs;")
> dev.off()
>
> (these are the characters that I need the most NOW, but this is a more
> generic question.)
>
> sincerely,
> /iaw
> Ivo Welch (ivo.we...@gmail.com)
Re: [R] problem with makeSOCKcluster depending on R patch version
I just downloaded the patched version from the Danish mirror, http://mirrors.dotsrc.org/cran/

That gave me R version 2.13.0 Patched (2011-05-10 r55826), which is *not* the version you refer to. Where may one get the latest patch then?

Regards
Søren

From: Uwe Ligges [lig...@statistik.tu-dortmund.de]
Sent: 15 May 2011 15:25
To: Søren Højsgaard
Cc: Ulrich Halekoh; r-help@r-project.org
Subject: Re: SV: [R] problem with makeSOCKcluster depending on R patch version

On 15.05.2011 12:27, Søren Højsgaard wrote:
> That raises another question: Will that patched version (2011-05-13 r55886) be made available as a windows binary - and if so: when?

Daily builds for Windows of R-patched are available from CRAN.

Best,
uwe

> Regards
> Søren
>
> From: r-help-boun...@r-project.org [r-help-boun...@r-project.org] On behalf of Uwe Ligges [lig...@statistik.tu-dortmund.de]
> Sent: 14 May 2011 18:23
> To: Ulrich Halekoh
> Cc: r-help@r-project.org
> Subject: Re: [R] problem with makeSOCKcluster depending on R patch version
>
> On 13.05.2011 14:01, Ulrich Halekoh wrote:
>> Dear,
>> I encountered a problem using the makeSOCKcluster function depending on the patched version of R-2.13.0 I used.
>>
>> library(snow)
>> cl <- makeSOCKcluster(rep("localhost", 2))
>>
>> this works fine for the R-13.0 patch (2011-04-28 r55678) but not for the patch R-13.0 patch (2011-05-10 r55826)
>
> If R-2.13.0 patched is meant: I do not see this with a recent snapshot (2011-05-13 r55886).
>
> Uwe Ligges
>
>> In the latter case the command keeps running. Interrupting the command I get the error message
>>
>> Error in socketConnection(port = port, server = TRUE, blocking = TRUE, :
>>   cannot open the connection
>> In addition: Warning message:
>> In socketConnection(port = port, server = TRUE, blocking = TRUE, :
>>   problem in listening on this socket
>>
>> Does work
>> R version 2.13.0 Patched (2011-04-28 r55678)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>> locale:
>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>> [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>> [5] LC_TIME=Danish_Denmark.1252
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] snow_0.3-3
>>
>> Does not work
>> R version 2.13.0 Patched (2011-05-10 r55826)
>> Platform: i386-pc-mingw32/i386 (32-bit)
>> locale:
>> [1] LC_COLLATE=Danish_Denmark.1252 LC_CTYPE=Danish_Denmark.1252
>> [3] LC_MONETARY=Danish_Denmark.1252 LC_NUMERIC=C
>> [5] LC_TIME=Danish_Denmark.1252
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>> other attached packages:
>> [1] snow_0.3-3
>>
>> Kind regards
>> Ulrich Halekoh
>> Associate Professor
>> Aarhus University
>> Email: ulrich.hale...@agrsci.dk
Re: [R] changing the day of the week in dates format
Hi Dave,

your problem is that you are working with an S3 class, which is essentially a list with a naming convention. It is therefore possible to change just one entry of the list, but doing so is almost never advisable. A slight change to your code should give you the required output:

mydaysx[select] <- mydaysx[select] + 2*24*60*60
select <- mydaysx$wday==6
sum(select)
[1] 0

In this case not only the $mday entry of the list is changed; the whole object is updated.

Cheers
Adrian

On 14.05.2011 20:44, Dave Evens wrote:
> Dear all,
>
> I have a question related to the POSIXlt function in R. I have a set of dates and times, for example:
>
> startx <- as.POSIXct("2011-01-01 00:00:00")
> finx <- as.POSIXct("2011-12-31 00:00:00")
> daysx <- seq(startx, finx, by="24 hours")
>
> I want to change the dates of all the days falling on a Saturday to the next working day (i.e. Monday). So I convert the dates to POSIXlt
>
> mydaysx <- as.POSIXlt(daysx)
>
> Then I select all the Saturdays and move them on to Monday
>
> select <- mydaysx$wday==6
> mydaysx$mday[select] <- mydaysx$mday[select] + 2
>
> However, although all the new dates (i.e. mydaysx) are actual days of the year, the $wday have not been updated and the $mday values have not all been corrected (i.e. those falling into the next month). So if I do
>
> select <- mydaysx$wday==6
>
> I still get the same set of days as before. Is there a way to do this?
>
> Thanks,
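A self-contained sketch of the approach described above, for readers who want to run it end to end. It is not the poster's exact code: it assumes the time zone is fixed to UTC (so that adding 2*24*60*60 seconds never crosses a daylight-saving change), and the variable names days and is_sat are chosen here for illustration.

```r
# One time stamp per day of 2011, at midnight UTC (assumption: UTC avoids DST shifts)
days <- seq(as.POSIXct("2011-01-01", tz = "UTC"),
            as.POSIXct("2011-12-31", tz = "UTC"), by = "day")

# POSIXlt exposes the weekday as $wday (0 = Sunday, ..., 6 = Saturday)
is_sat <- as.POSIXlt(days)$wday == 6
n_sat  <- sum(is_sat)

# Add two days' worth of seconds to the whole date-time object,
# rather than editing the $mday component of a POSIXlt list
days[is_sat] <- days[is_sat] + 2 * 24 * 60 * 60

# Recompute the weekday from the updated object
n_sat_after <- sum(as.POSIXlt(days)$wday == 6)
n_mon_after <- sum(as.POSIXlt(days)$wday == 1)
print(c(saturdays_before = n_sat, saturdays_after = n_sat_after))
```

Because the arithmetic is done on the date-time values themselves, month boundaries (for example 30 April moving to 2 May) are handled without any special casing.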
[R] hotelling and confidence region
Good morning,

I've run a PCA and I'd like to plot a confidence region based on Hotelling's T2. Does anyone know how to compute it?

Thank you
[R] integrate
Dear R-users,

I am really new at R. That's why I probably have a basic question. I have a function like

f(x,y) = \int_{y}^{0} 2x \exp(y-t) dt

or

f(x,y) = \int_{y}^{0} 2x \exp(\int_{t}^{0} x k dk) dt

and I can also define some basic loops for x and y, like x in 1:3 and y in 1:2. Could anybody please help me?

best wishes,
mgm
Re: [R] Unexpected behaviour as.data.frame
In your post, you're missing the final "s" on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it works as expected.

-- Bert

On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan rh...@eoos.dds.nl wrote:
> I use the following code to create two data.frames d1 and d2 from a list:
>
> types <- c("integer", "character", "double")
> nlines <- 10
> d1 <- as.data.frame(lapply(types, do.call, list(nlines)), stringsAsFactor=FALSE)
> l2 <- lapply(types, do.call, list(nlines))
> d2 <- as.data.frame(l2, stringsAsFactors=FALSE)
>
> I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect):
>
> str(d1)
> 'data.frame': 10 obs. of 3 variables:
>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>  $ c                                        : Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>  $ c.0..0..0..0..0..0..0..0..0..0.          : num 0 0 0 0 0 0 0 0 0 0
> str(d2)
> 'data.frame': 10 obs. of 3 variables:
>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0
>  $ c                                        : chr "" "" "" "" ...
>  $ c.0..0..0..0..0..0..0..0..0..0.          : num 0 0 0 0 0 0 0 0 0 0
>
> As a different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my c++ routines. Is there a more simple/elegant way of creating this data.frame?
>
> Regards,
> Jan
>
> PS: I am running R on 64 bit Ubuntu 11.04:
>
> sessionInfo()
> R version 2.12.1 (2010-12-16)
> Platform: x86_64-pc-linux-gnu (64-bit)
> locale:
>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>  [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base

--
Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions.
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml
Re: [R] Powerful PC to run R
On Fri, May 13, 2011 at 6:38 AM, Michael Haenlein haenl...@escpeurope.eu wrote:
> I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my calculations run for several days sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating.

If you are on Windows press the Windows key and type in Power Options. When the associated dialog pops up choose Power Saver. Now your PC will use less power so it won't heat up so much although your performance could suffer a bit. Also ensure that there is sufficient air circulation around the machine.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] Powerful PC to run R
Also: A previous post in this thread suggested Rprof [sec. 3.2 in "Writing R Extensions", available via help.start()]. This should identify the functions that consume the most time.

The standard procedure to improve speed is as follows:

1. Experiment with different ways of computing the same thing in R. In many cases, this can help you reduce the compute time by a factor of 10 or even 1,000 or more. Try this, perhaps using proc.time and system.time with portions of your code, then rerun Rprof.

2. After you feel you have done the best you can with R, you might try coding the most compute-intensive portion of the algorithm in a compiled language like C, C++ or Fortran. Then rerun Rprof, etc.

3. After trying (or not) compiled code, it may be appropriate to consider "CRAN Task View: High-Performance and Parallel Computing with R". (From a CRAN mirror, select Task Views -> HighPerformanceComputing: High-Performance and Parallel Computing with R.) You may also want to try the foreach package from Revolution Computing (revolutionanalytics.com). These capabilities can help you get the most out of a multi-core computer.

NOTE: While your code is running, you can check the Performance tab in Windows Task Manager to see what percent of your CPUs and physical memory you are using. I mention this because without foreach you might get at most 1 of your 4 CPUs running R. With foreach, you might be able to get all of them working for you.

Then, after you have done this and satisfied yourself that you've done the best you can with all of this, I suggest you try the Amazon Cloud.

If you have not already solved your problem and have not yet tried these three steps, I suggest you try them. It may take more of your time, but you will likely learn much that will help you in the future, as well as help you make a better choice of a new computer if you ultimately decide to do that.

Hope this helps.
Spencer

--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph: 408-655-4567
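To illustrate step 1 of the procedure above, here is a small sketch that times two ways of computing the same result with system.time(). The functions slow_squares and fast_squares are made up for this illustration and have nothing to do with the original poster's simulation; the actual speed-up will depend on the code being tuned.

```r
# Hypothetical example: two ways of squaring every element of a vector.
slow_squares <- function(x) {
  out <- numeric(0)
  for (xi in x) out <- c(out, xi^2)  # re-allocates the result on every iteration
  out
}
fast_squares <- function(x) x^2      # vectorized, single allocation

set.seed(1)
x <- runif(2e4)

t_slow <- system.time(r_slow <- slow_squares(x))
t_fast <- system.time(r_fast <- fast_squares(x))

# user, system and elapsed time in seconds for each version
print(rbind(slow = t_slow[1:3], fast = t_fast[1:3]))

# Both versions must give the same answer before the timing means anything
same_result <- isTRUE(all.equal(r_slow, r_fast))
```

For a whole script, the same comparison can be made with Rprof("prof.out"), running the code, Rprof(NULL), and then summaryRprof("prof.out") to see which functions dominate.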
[R] Again on Data Mining
Dear All,

I have already posted on the list about data mining before, and it proved very useful.

I now have a training dataset consisting of N objects of M << N different kinds (actually, M is usually 3 to 5, whereas N is of the order of 1000). Every object has its own label L_i, i=1...N, which is known. For each of these objects I measure some property in time (let's say I measure it Q times in a given time interval), i.e. the i-th object has an associated file {t, y}, where t=(t_1,t_2,...,t_Q) and y=(y_1,y_2,...,y_Q).

My problem is then to come up with an algorithm that, after learning on the training dataset, can guess the labels of a testing dataset. The difference with respect to the data mining I have done so far is that I do not have a set of properties for every object (e.g. age, sex, income, etc.) but rather an associated function y=f(t).

Any suggestion (either conceptual or about which R package I should turn to) is greatly appreciated.

Many thanks
Lorenzo
Re: [R] changing the day of the week in dates format
Dear Dave,

please always reply to the whole list.

To answer your question: a quick check shows that your proposed code will not work as you expect:

tt <- as.POSIXlt(strptime("2011-01-01 00:00:00", "%Y-%m-%d %H:%M:%S", tz="GMT"))
tt
[1] "2011-01-01 GMT"
tt + 365.25*24*60*60
[1] "2012-01-01 06:00:00 GMT"
tt + 365.25*24*60*60*2
[1] "2012-12-31 12:00:00 GMT"

I am not aware of an addtime function in the base package (similar to difftime()); maybe there is one in one of the packages on CRAN. A quick Google search turned up the following link on adding years to a date, where seq.Date() was proposed: http://tolstoy.newcastle.edu.au/R/help/05/10/13700.html

Regards
Adrian

On 15.05.2011 16:25, Dave Evens wrote:
> Hi Adrian,
>
> Many thanks for your reply. Suppose I wanted to increment the date by a year - how would I account for things like leap years? Would I just do
>
> mydaysx[select] <- mydaysx[select] + 365.25*24*60*60
>
> Regards,
> Dave
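A short sketch contrasting the two approaches discussed in this message: adding a fixed number of seconds versus stepping by calendar years with seq(). It assumes a POSIXct value in GMT; the names drift and yearly are illustrative only.

```r
tt <- as.POSIXct("2011-01-01 00:00:00", tz = "GMT")

# Fixed number of seconds per "year": drifts off midnight, and off the
# intended date once the 2012 leap year is crossed
drift <- tt + 365.25 * 24 * 60 * 60 * (0:2)

# Calendar steps: same month, day and clock time in each successive year
yearly <- seq(tt, by = "year", length.out = 3)

print(format(drift,  "%Y-%m-%d %H:%M:%S"))
print(format(yearly, "%Y-%m-%d %H:%M:%S"))
```

seq() dispatches to seq.POSIXt here; the same by = "year" argument works with Date objects via seq.Date(), which is the function proposed in the linked post.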
Re: [R] R function that returns an object's search path position XXXX
On 14/05/2011 9:41 PM, Dan Abner wrote:
> Hello everyone,
>
> Is there an R function that returns an object's search path position?

Does find() do what you want? It doesn't give the position in the search path, but you could get that from something like

name <- "plot"
which( search() %in% find(name) )

Be aware that a name can appear in more than one place in the path, and not all available objects are on the search path. So I'm not sure the above solves your real problem.

Duncan Murdoch
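The suggestion above can be wrapped in a small helper. This is only a sketch: search_pos is a name invented here, and "median" is used as the example object on the assumption that the stats package is attached, as it is in a default R session.

```r
# Hypothetical helper: positions on the search path where `name` is found
search_pos <- function(name) which(search() %in% find(name))

pos <- search_pos("median")
print(pos)             # position(s) on the search path
print(search()[pos])   # the corresponding environment name(s)

# find() can also report positions directly via its `numeric` argument
print(find("median", numeric = TRUE))
```

The result may have length greater than one when the same name exists in several attached packages, and length zero when the object is not on the search path at all, which matches the caveat in the reply.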
Re: [R] Unexpected behaviour as.data.frame
Thanks. I also noticed it myself, minutes after sending my message to the list. My 'please ignore my question, it was just a stupid typo' message was sent from the wrong account and is now awaiting moderation.

However, my other question still stands: what is the preferred/fastest/simplest way to create a data.frame with given column types and dimensions?

Regards,
Jan

On 05/15/2011 04:43 PM, Bert Gunter wrote:
> In your post, you're missing the final "s" on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it works as expected.
>
> -- Bert
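One possible answer to the remaining question, offered as a sketch rather than as the preferred way: vector(mode, length) creates a zero-filled (or empty-string) vector of a given type, so it can replace the do.call construction. The column names V1 to V3 are arbitrary choices made here.

```r
types  <- c("integer", "character", "double")
nlines <- 10

# vector(mode, length) returns a vector of the requested type and length
cols <- lapply(types, vector, length = nlines)
names(cols) <- paste0("V", seq_along(types))   # arbitrary column names

# Spell the argument stringsAsFactors in full so the character column stays character
d <- as.data.frame(cols, stringsAsFactors = FALSE)
str(d)
```

Naming the list before conversion also avoids the long deparsed column names (c.0L..0L.. and so on) seen in the str() output earlier in the thread.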
Re: [R] Powerful PC to run R
On Sun, May 15, 2011 at 9:31 AM, Spencer Graves spencer.gra...@structuremonitoring.com wrote:
> Also: A previous post in this thread suggested Rprof [sec. 3.2 in "Writing R Extensions", available via help.start()]. This should identify the functions that consume the most time.
>
> The standard procedure to improve speed is as follows:
>
> 1. Experiment with different ways of computing the same thing in R. In many cases, this can help you reduce the compute time by a factor of 10 or even 1,000 or more. Try this, perhaps using proc.time and system.time with portions of your code, then rerun Rprof.

I second this one; if you have things running for weeks, and you haven't done any serious optimization already, you most likely can bring that down to days or hours by investigating where the bottlenecks are. Here is a good illustration of how a simple piece of R code is made 12,000 times faster:

http://rwiki.sciviews.org/doku.php?id=tips:programming:code_optim2

> On 5/15/2011 8:28 AM, Gabor Grothendieck wrote:
>> On Fri, May 13, 2011 at 6:38 AM, Michael Haenlein haenl...@escpeurope.eu wrote:
>>> I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my calculations run for several days sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating.
>>
>> If you are on Windows press the Windows key and type in Power Options. When the associated dialog pops up choose Power Saver. Now your PC will use less power so it won't heat up so much although your performance could suffer a bit. Also ensure that there is sufficient air circulation around the machine.

To move this hardware-specific discussion off the R-help list, I strongly recommend the 'Thinkpad.com Support Community' (open community/non-Lenovo) with lots of experts and users:

http://forum.thinkpads.com/

I've seen discussions on overheating/emergency shutdowns there.

/Henrik
Re: [R] integrate
On May 15, 2011, at 6:51 AM, meltem gölgeli wrote:
> Dear R-users,
>
> I am really new at R. That's why I probably have a basic question. I have a function like
>
> f(x,y) = \int_{y}^{0} 2x \exp(y-t) dt
>
> or
>
> f(x,y) = \int_{y}^{0} 2x \exp(\int_{t}^{0} x k dk) dt
>
> and I can also define some basic loops for x and y, like x in 1:3 and y in 1:2. Could anybody please help me?

You should take one of the following paths:

--- Stumble around using the help pages starting with ?Control
--- Buy an introductory text http://www.r-project.org/doc/bib/R-books.html
--- Read the Introduction to R http://cran.r-project.org/doc/manuals/R-intro.pdf

You should also

--- PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
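For readers landing on this thread, a minimal sketch of how the two integrals in the question could be evaluated numerically with integrate() from the stats package. It assumes the limits are read as "from y to 0" and "from t to 0"; the names f1, f2 and inner are chosen here for illustration. Each integral is computed from 0 up to the other limit and the sign is flipped, which gives the same value.

```r
# f1(x, y) = integral from y to 0 of 2*x*exp(y - t) dt
f1 <- function(x, y) {
  -integrate(function(t) 2 * x * exp(y - t), lower = 0, upper = y)$value
}

# f2(x, y) = integral from y to 0 of 2*x*exp( integral from t to 0 of x*k dk ) dt
f2 <- function(x, y) {
  inner <- function(t) {
    # integrate() passes a vector of t values, so evaluate the inner
    # integral once per element
    sapply(t, function(ti)
      -integrate(function(k) x * k, lower = 0, upper = ti)$value)
  }
  -integrate(function(t) 2 * x * exp(inner(t)), lower = 0, upper = y)$value
}

# The loops mentioned in the question
for (x in 1:3) for (y in 1:2)
  cat("x =", x, " y =", y, " f1 =", f1(x, y), " f2 =", f2(x, y), "\n")
```

As a check, the first integral has the closed form 2x(1 - e^y), and the inner integral of the second is -x t^2 / 2, so for x = 1 the second reduces to a multiple of the normal distribution function.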
Re: [R] Find String Between Characters
It looks like you can get the text of the document with

as(mmm[[1]], "character")

and you can use grep, strsplit, gsub, etc. on that text. Look at the functions in the XML package for ways to use the XML structure of the data instead of pattern matching to extract meaningful parts of the document.

class?HTMLInternalDocument

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Sparks, John James
Sent: Saturday, May 14, 2011 7:14 PM
To: jim holtman
Cc: r-help@r-project.org
Subject: Re: [R] Find String Between Characters

Hi Jim,

Thanks for your note. Unfortunately, when I attempt your solution in my exact setting, I get a weird and slightly different answer.

First, let me be more clear. What I am attempting to do is pull the CIK number out of the information from the web page itself after it has loaded into R (this may not be optimal, but I am new at this), not from the web page reference (as you have done).

So, when I execute the following as per your suggestion:

require(scrapeR)
mmm <- scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)

I get

[1] pointer: 0x001265c0

Is this just a hex representation of the same number, or is something else going on here?

Comments from any and all would be much appreciated.

--John J. Sparks, Ph.D.

On Sat, May 14, 2011 7:57 pm, jim holtman wrote:
> Is this what you want:
>
> mmm <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40"
> num <- sub("^.*CIK=([0-9]+).*", "\\1", mmm)
> num
> [1] "320193"
>
> On Sat, May 14, 2011 at 8:20 PM, Sparks, John James jspa...@uic.edu wrote:
>> Dear R Helpers,
>>
>> I am trying to isolate a set of characters between two other characters in a long string file. I tried some of the examples on the R help pages and elsewhere, but I am not able to get it. Your help would be much appreciated.
>>
>> require(scrapeR)
>> mmm <- scrape(url="http://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40")
>> str(mmm)
>>
>> I want to get the number 320193 that is between the CIK= and the &. I have tried
>>
>> g <- grep("CIK=|&", mmm)
>>
>> and
>>
>> temp <- grep(mmm, \CIK=\)
>>
>> and variations on these themes, but all won't run or come back as an empty object.
>>
>> How can I grab this number?
>>
>> Best wishes,
>> --John J. Sparks, Ph.D.
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
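A small sketch of the pattern-matching step applied to page text rather than to the parsed document object. The string txt is a made-up stand-in for what as(mmm[[1]], "character") might return; nothing is downloaded here, and the real page will of course be much longer.

```r
# Stand-in for the page text (assumption: one long character string)
txt <- '<a href="/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude&count=40">filings</a>'

# Capture the digits that follow "CIK=" and discard everything else
cik <- sub("^.*CIK=([0-9]+).*$", "\\1", txt)
print(cik)

# Alternative: locate the match first, then strip the "CIK=" prefix
m    <- regmatches(txt, regexpr("CIK=[0-9]+", txt))
cik2 <- sub("CIK=", "", m, fixed = TRUE)
print(cik2)
```

Both approaches work on character vectors only, which is why applying sub() directly to the scraped object printed a pointer instead of the number.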
Re: [R] Summary.Formula: prmsd and test statistic
I tried the modification but no luck. Here is exactly what I'm seeing. The command works fine, but when I add prmsd=TRUE the numbers disappear.

print(summary.formula(S~Kyph+Vert, data=radio, method="reverse", overall=T, continuous=5, add=TRUE, test=T))

Descriptive Statistics by S

+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+
|    |N  |Guru Teg Bahadur Hospital|St. Stephens Hospital  |VIMHANS Hospital  |Combined              |Test                     |
|    |   |(N=37)                   |(N=1)                  |(N=75)            |(N=113)               |Statistic                |
+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+
|Kyph|109|13.6625/20.7100/29.1325  |11.6400/11.6400/11.6400|0./ 9.6200/17.1650|2.1600/12.0100/21.1900|F=9.23 d.f.=2,106 P<0.001|
+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+
|Vert|113|2/3/3                    |2/2/2                  |2/2/3             |2/2/3                 |F=2.65 d.f.=2,110 P=0.075|
+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+

print(summary.formula(S~Kyph+Vert, data=radio, method="reverse", overall=T, continuous=5, add=TRUE, test=T), prmsd=TRUE)

Descriptive Statistics by S

+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+
|    |N  |Guru Teg Bahadur Hospital|St. Stephens Hospital  |VIMHANS Hospital  |Combined              |Test                     |
|    |   |(N=37)                   |(N=1)                  |(N=75)            |(N=113)               |Statistic                |
+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+
|Kyph|109|                         |                       |                  |                      |F=9.23 d.f.=2,106 P<0.001|
+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+
|Vert|113|                         |                       |                  |                      |F=2.65 d.f.=2,110 P=0.075|
+----+---+-------------------------+-----------------------+------------------+----------------------+-------------------------+

On Sun, May 15, 2011 at 10:03 AM, David Winsemius dwinsem...@comcast.net wrote:
> On May 14, 2011, at 11:23 AM, Eli Kamara wrote:
>> #see distribution of data
>> d=summary.formula(a2$Edu~a2$Kyph+a2$HL+a2$Vert, method="reverse", overall=T, continuous=5, add=TRUE, test=T)
>
> I noticed that you were addressing the columns individually. That rather defeats the strategy of passing a data argument to a function and using only the column names in the formula. It often causes strange errors in model calls, and I wouldn't be surprised if you got better results with something like:
>
> d=summary.formula(Edu ~ Kyph + HL + Vert, data=a2, method="reverse", overall=T, continuous=5, add=TRUE, test=T)
>
> -- David
Re: [R] changing the day of the week in dates format
Hi Adrian, Many thanks for your reply. Suppose I wanted to increment the date by a year - how would I account for things like leap years? Would I just do mydaysx[select] - mydaysx[select] + 365.25*24*60*60 Regards,Dave From: Adrian Duffner duffn...@googlemail.com Cc: r-help@r-project.org r-help@r-project.org Sent: Sunday, 15 May 2011, 14:21 Subject: Re: [R] changing the day of the week in dates format Hi Dave, your problem is that you are working with a S3 class, what is mainly a list with naming convention. Hence it is possible to change just one entry of the list, but it is nearly never recommendable. So a slight change to your code should provide you the required output: mydaysx[select] - mydaysx[select] + 2*24*60*60 select - mydaysx$wday==6 sum(select) [1] 0 In this case not only the entry $mday of the list is changed, but the whole object is updated. Cheers Adrian Am 14.05.2011 20:44, schrieb Dave Evens: Dear all, I have a question related to the POSIXlt function in R. I have a set of dates and times, for exmaple: startx- as.POSIXct(2011-01-01 00:00:00) finx- as.POSIXct(2011-12-31 00:00:00) daysx- seq(startx, finx, by=24 hours) I want to change the dates of all the days falling on a Saturday to the next working day (i.e. Monday). So I convert dates to POSIXlt mydaysx- as.POSIXlt(daysx) Then I change select all the Saturday's and move them on to Monday select- mydaysx$wday==6 mydaysx$mday[select]- mydaysx$mday[select] + 2 However, although all the new dates (i.e. mydaysx) are actual days of the year - the $wday have not been updated and the $mdays have not all been corrected (i.e. those falling into the next month). So if I do select- mydaysx$wday==6 I still get the same set of days as before. Is there a way to do this? 
Thanks,
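A minimal sketch of the approach Adrian describes, assuming the dates are kept as POSIXct. The tz = "UTC" argument is an assumption added here so that daylight saving does not interfere with adding whole days; it is not in the original post. Arithmetic on the POSIXct vector changes the whole date-time, so the weekday is recomputed when the result is converted to POSIXlt again:

```r
# Daily sequence for 2011 (time zone fixed to UTC as an assumption)
startx <- as.POSIXct("2011-01-01 00:00:00", tz = "UTC")
finx   <- as.POSIXct("2011-12-31 00:00:00", tz = "UTC")
daysx  <- seq(startx, finx, by = "day")

# Identify Saturdays: wday is 0 (Sunday) to 6 (Saturday)
is_sat <- as.POSIXlt(daysx)$wday == 6

# Shift Saturdays by two days, in seconds, on the POSIXct vector itself
daysx[is_sat] <- daysx[is_sat] + 2 * 24 * 60 * 60

# Count the Saturdays that remain after the shift
remaining_sat <- sum(as.POSIXlt(daysx)$wday == 6)
print(remaining_sat)
```

In a local time zone that observes daylight saving, adding a fixed number of seconds can move the clock time by an hour; working with Date objects (as.Date) and adding 2 is one way to avoid that.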
Re: [R] Row names and matrixs
Thank you - it is refreshing to have a helpful answer. I am glad some people remember the days when they were first learning too. On Thu, May 12, 2011 at 4:58 PM, jlemaitre [via R] ml-node+3518836-766936252-236...@n4.nabble.com wrote: Nielsen, The numbers in the brackets reference a component of a matrix/data frame/vector. So if you have: x - c(1:10) # a vector of integers in sequence from 1-10 x[3] # the third component of x [1] 3 For 2-way matrices or data frames, the formatting is [row,column]. So, for a 10 x 10 matrix x: x - matrix(1:100, ncol = 10, byrow = T) x [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,]12345678910 [2,] 11 12 13 14 15 16 17 18 1920 [3,] 21 22 23 24 25 26 27 28 2930 [4,] 31 32 33 34 35 36 37 38 3940 [5,] 41 42 43 44 45 46 47 48 4950 [6,] 51 52 53 54 55 56 57 58 5960 [7,] 61 62 63 64 65 66 67 68 6970 [8,] 71 72 73 74 75 76 77 78 7980 [9,] 81 82 83 84 85 86 87 88 8990 [10,] 91 92 93 94 95 96 97 98 99 100 x[,1] # return the first column of x [1] 1 11 21 31 41 51 61 71 81 91 x[1,] # return the first row of x [1] 1 2 3 4 5 6 7 8 9 10 when there's a minus, it just means that component is omitted x[-1,] # return x less the first row [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,] 11 12 13 14 15 16 17 18 1920 [2,] 21 22 23 24 25 26 27 28 2930 [3,] 31 32 33 34 35 36 37 38 3940 [4,] 41 42 43 44 45 46 47 48 4950 [5,] 51 52 53 54 55 56 57 58 5960 [6,] 61 62 63 64 65 66 67 68 6970 [7,] 71 72 73 74 75 76 77 78 7980 [8,] 81 82 83 84 85 86 87 88 8990 [9,] 91 92 93 94 95 96 97 98 99 100 Given this context, I would double check the contents of test vs. test1. And don't let arrogant posts on this help forum discourage you. I hope this helps. 
-- Lindsey Nielsen, Ph.D. Los Alamos National Lab (505) 667-2835
[R] pls help: lattice graph with both negative and positive value, x and y cross at 0 and negative value bars are plotted just oppositive direction in contrast to positive
Dear R experts: Here is my problem: #Data 1 Y - c(0.5, 0.1, 0.5, 1.3, 1.4, 1.6, 1.65, 2.4, 2.6, 3.4, 3.6, 4.3, 4.42, 4.8, 4.7, 3.4, 3.3, 2.8, 2.8, 1.2, 1.1, 0.5, 0.2, 0.1, -0.2, -1.5, -2.5, -1.3, -0.5, -0.1) X - seq(1:30) X1 - c(rep(T1, 24), rep(T2, 6)) dat1 - data.frame(Y, X, X1) require(lattice) mcol - c(green, red) barchart(Y ~ factor (X), group = X1, data = dat1, col = mcol , ylab= y var, xlab = x var, ylim = c(-3.0, 5.0), pos = 0, scales = list(x=list(rot= 90, font = 1, cex = 1) , y = list(rot= 90, font = 1, cex = 1) )) The output is not what I want. I want the orientation of graph like the following in base R but axis label are in Y axis line and other parameters as in lattice: barplot(Y, names.arg = X) I know this is simple question, but I could not find a true solution. -- Thanks in advance. Ram H [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Adding dates to time series
Hi there, I have a spreadsheet in excel which consists of first column of dates and then subsequent columns that refer to prices of different securities on those dates. (the first row contains each series name) I saved the excel file as type csv and then imported to excel using prices=read.csv(file=C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv,header = TRUE, sep = ,) This creates the correct time series data x-ts(prices[,2]) but does not have the dates attached. However the dates refer to working days. So although in general they represent Monday-Friday this is not always the case because of holidays etc. How then can I create a time series where the dates are read in from the first column of the csv file? I can not find an example in R documentation where this is done? Thanks -- View this message in context: http://r.789695.n4.nabble.com/Adding-dates-to-time-series-tp3524679p3524679.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Need help with text processing / string split
I used screen scraping to extract some information and put it into a table called tbl. Now I want to modify the table a bit so the data can be more useful. Here's the code I used: library(XML) rm(list=ls()) url - http://webapp.montcopa.org/sherreal/salelist.asp?saledate=05/25/2011; tbl -data.frame(readHTMLTable(url))[2:405, c(3,5,6,8,9)] names(tbl) - c(Address, Township, Parcel, Sale Date, Costs) tbl is attached as txt for your convenience. Entries in the last column of the dataframe (tbl$Cost) appear as follows: $173,933.60$2,410.28 . http://r.789695.n4.nabble.com/file/n3524793/tbl.txt tbl.txt How do I: 1. Split the string 2. Have the two values show up as actual numbers that can be used 3. Put the numbers in two separate columns of the dataframe. In other words $173,933.60$2,410.28 would show up as 173933.60 in one column and 2410.28 would show up in a second column of tbl I tried using strsplit but I could not get it working properly. -- View this message in context: http://r.789695.n4.nabble.com/Need-help-with-text-processing-string-split-tp3524793p3524793.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Powerful PC to run R
On May 13, 2011, at 6:38 AM, Michael Haenlein wrote: Dear all, I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my calculations run for several days sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating. You didn't mention whether you are using a 64-bit OS or not. A single 32-bit process can not use more than 2 GB RAM. If your calculations would benefit from the full 8 GB RAM on your machine, you need to be able to run 64-bit R. My understanding is that, on Windows, you either have to install the OS as 32-bit and use all 32-bit software or install 64-bit Windows and run all 64-bit software. A Mac can run 32-bit and 64-bit software simultaneously and I'm not sure about Linux. In the case of Linux, it probably doesn't matter so much because most Linux software is available as open source and you can compile it yourself either way. I'm now thinking about buying a more powerful desktop PC or laptop. Can anybody advise me on the best configuration to run R as fast as possible? I will use this PC exclusively for R so any other factors are of limited importance. You need to evaluate whether RAM or raw processor speed is most critical for what you're doing. In my case, I upgraded my Mac Pro to 16 GB RAM and was able to do hierarchical clustering heatmaps overnight which previously took more than a week to compute. Using the Activity Monitor utility, it looks like some of the, even larger, heatmap computations would benefit from 32 GB RAM or more. Linux runs on the widest range of hardware and that allows you the greatest ability to shop around. If RAM is the deciding factor, then you can look around for a machine which can hold as much RAM as possible. If processor speed is the factor, then you can optimize for that. 
Windows runs on a reasonable array of hardware but has the disadvantage that the OS, itself, uses a lot of resources. The Mac has the advantage of flexibility. When you download the precompiled R package, it comes with both a 32-bit and a 64-bit executable. This is because 32-bit processes run a little faster if you don't need large amounts of RAM. If you do need the RAM, then you run the 64-bit version. Aram Fingal __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Unexpected behaviour as.data.frame
Inline below. On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan rh...@eoos.dds.nl wrote: Thanks. I also noticed myself minutes after sending my message to the list. My 'please ignore my question it was just a stupid typo' message was sent with the wrong account and is now awaiting moderation. However, my other question still stands: what is the preferred/fastest/simplest way to create a data.fame with given column types and dimensions? I do not know, but why is simply data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE) not acceptable? Note that if you had, say, 500, numeric (= double) and 100 character columns to add, you might do something like: z - matrix(numeric(5000),nr=10) u - matrix(character(1000),nr=10) frm - data.frame(z,u, stringsAsFactors = FALSE) ## 600 columns While this might save some typing, it may not be much more efficient than typing it all out -- maybe just some parsing time is saved. You can experiment and see. However, since a data.frame **is** a list with added attributes and a great deal of the work of the constructor is in constructing and checking these attributes (e.g. row and column names), I see nothing terribly inefficient with what you did. It's just a bit obscure. But maybe someone with greater expertise will set us both straight. Cheers, Bert Regards, Jan On 05/15/2011 04:43 PM, Bert Gunter wrote: In your post, you're missing the final s on the stringsAsFactors argument in the d1 assignment. When I typed it correctly, it works as expected. 
-- Bert On Sun, May 15, 2011 at 4:25 AM, Jan van der Laanrh...@eoos.dds.nl wrote: I use the following code to create two data.frames d1 and d2 from a list: types- c(integer, character, double) nlines- 10 d1- as.data.frame(lapply(types, do.call, list(nlines)), stringsAsFactor=FALSE) l2- lapply(types, do.call, list(nlines)) d2- as.data.frame(l2, stringsAsFactors=FALSE) I would expect d1 and d2 to be the same, however, in d1 the second column is a factor while in d2 it is a character (which I would expect): str(d1) 'data.frame': 10 obs. of 3 variables: $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0 $ c: Factor w/ 1 level : 1 1 1 1 1 1 1 1 1 1 $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0 str(d2) 'data.frame': 10 obs. of 3 variables: $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int 0 0 0 0 0 0 0 0 0 0 $ c: chr ... $ c.0..0..0..0..0..0..0..0..0..0. : num 0 0 0 0 0 0 0 0 0 0 As different but related question: I use the commands above to create an 'empty' data.frame with specified column types and dimensions. I need this data.frame to pass on to my c++ routines. Is there a more simple/elegant way of creating this data.frame? Regards, Jan PS: I am running R on 64 bit Ubuntu 11.04: sessionInfo() R version 2.12.1 (2010-12-16) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
-- Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions. -- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
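For the second question in this thread, one possible sketch (not necessarily the fastest way) builds each column with vector() and names the columns explicitly. The column names V1, V2, V3 are placeholders chosen here. Note that the argument is spelled stringsAsFactors; the misspelled stringsAsFactor in the original post was apparently not matched to it, so the default (TRUE in R 2.12) was used:

```r
types  <- c("integer", "character", "double")
nlines <- 10

# One zero-filled (or empty-string) vector of the requested type per column
cols <- lapply(types, function(type) vector(mode = type, length = nlines))
names(cols) <- paste0("V", seq_along(cols))   # placeholder column names

# Keep character columns as character, not factor
d <- as.data.frame(cols, stringsAsFactors = FALSE)
str(d)
```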
Re: [R] Powerful PC to run R
On 15/05/2011 3:02 PM, Aram Fingal wrote: On May 13, 2011, at 6:38 AM, Michael Haenlein wrote: Dear all, I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 Ghz, 8 GB RAM). The problem is that some of my calculations run for several days sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating. You didn't mention whether you are using a 64-bit OS or not. A single 32-bit process can not use more than 2 GB RAM. If your calculations would benefit from the full 8 GB RAM on your machine, you need to be able to run 64-bit R. My understanding is that, on Windows, you either have to install the OS as 32-bit and use all 32-bit software or install 64-bit Windows and run all 64-bit software. A Mac can run 32-bit and 64-bit software simultaneously and I'm not sure about Linux. In the case of Linux, it probably doesn't matter so much because most Linux software is available as open source and you can compile it yourself either way. No, 64 bit Windows can run either 32 or 64 bit Windows programs. I'm now thinking about buying a more powerful desktop PC or laptop. Can anybody advise me on the best configuration to run R as fast as possible? I will use this PC exclusively for R so any other factors are of limited importance. You need to evaluate whether RAM or raw processor speed is most critical for what you're doing. In my case, I upgraded my Mac Pro to 16 GB RAM and was able to do hierarchical clustering heatmaps overnight which previously took more than a week to compute. Using the Activity Monitor utility, it looks like some of the, even larger, heatmap computations would benefit from 32 GB RAM or more. Linux runs on the widest range of hardware and that allows you the greatest ability to shop around. If RAM is the deciding factor, then you can look around for a machine which can hold as much RAM as possible. 
If processor speed is the factor, then you can optimize for that. Windows runs on a reasonable array of hardware but has the disadvantage that the OS, itself, uses a lot of resources. The Mac has the advantage of flexibility. When you download the precompiled R package, it comes with both a 32-bit and a 64-bit executable. This is because 32-bit processes run a little faster if you don't need large amounts of RAM. If you do need the RAM, then you run the 64-bit version. The same is true for Windows binaries on CRAN. Duncan Murdoch __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
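To see which build of R is actually running, which matters for the 32-bit versus 64-bit discussion in this thread, a quick check using only base R could look like this:

```r
# Pointer size in bytes: 4 on a 32-bit build, 8 on a 64-bit build
bits <- 8 * .Machine$sizeof.pointer
cat("This R session is a", bits, "bit build; architecture:", R.version$arch, "\n")
```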
Re: [R] pls help: lattice graph with both negative and positive value, x and y cross at 0 and negative value bars are plotted just oppositive direction in contrast to positive
Hi:

Try this:

barchart(Y ~ factor(X), group = X1, data = dat1, col = mcol, origin = 0,
    ylab = "y var", xlab = "x var", ylim = c(-3.0, 5.0),
    scales = list(x = list(rot = 90, font = 1, cex = 1),
                  y = list(rot = 90, font = 1, cex = 1)))

The origin = argument comes from panel.barchart(); see its help page for more details.

HTH,
Dennis

On Sun, May 15, 2011 at 8:17 AM, Ram H. Sharma sharma.ra...@gmail.com wrote:

Dear R experts: Here is my problem:

#Data 1
Y <- c(0.5, 0.1, 0.5, 1.3, 1.4, 1.6, 1.65, 2.4, 2.6, 3.4, 3.6, 4.3, 4.42, 4.8, 4.7, 3.4, 3.3, 2.8, 2.8, 1.2, 1.1, 0.5, 0.2, 0.1, -0.2, -1.5, -2.5, -1.3, -0.5, -0.1)
X <- seq(1:30)
X1 <- c(rep("T1", 24), rep("T2", 6))
dat1 <- data.frame(Y, X, X1)
require(lattice)
mcol <- c("green", "red")
barchart(Y ~ factor (X), group = X1, data = dat1, col = mcol, ylab = "y var", xlab = "x var", ylim = c(-3.0, 5.0), pos = 0, scales = list(x = list(rot = 90, font = 1, cex = 1), y = list(rot = 90, font = 1, cex = 1)))

The output is not what I want. I want the orientation of graph like the following in base R but axis label are in Y axis line and other parameters as in lattice:

barplot(Y, names.arg = X)

I know this is simple question, but I could not find a true solution.

-- Thanks in advance. Ram H
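A self-contained version of the suggestion in this thread, with the data from the original question and the quotation marks and assignment arrows written out. The only deliberate changes are groups = (the full argument name) and storing the plot in an object p, which is a choice made here so it can be printed explicitly from a script:

```r
library(lattice)

Y <- c(0.5, 0.1, 0.5, 1.3, 1.4, 1.6, 1.65, 2.4, 2.6, 3.4, 3.6, 4.3,
       4.42, 4.8, 4.7, 3.4, 3.3, 2.8, 2.8, 1.2, 1.1, 0.5, 0.2, 0.1,
       -0.2, -1.5, -2.5, -1.3, -0.5, -0.1)
X  <- 1:30
X1 <- c(rep("T1", 24), rep("T2", 6))
dat1 <- data.frame(Y, X, X1)
mcol <- c("green", "red")

# origin = 0 makes bars start at zero, so negative values hang downwards
p <- barchart(Y ~ factor(X), groups = X1, data = dat1, col = mcol,
              origin = 0, ylab = "y var", xlab = "x var",
              ylim = c(-3.0, 5.0),
              scales = list(x = list(rot = 90, font = 1, cex = 1),
                            y = list(rot = 90, font = 1, cex = 1)))
print(p)  # lattice objects must be printed to be drawn in a script
```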
Re: [R] Adding dates to time series
Hi: I'd suggest using the zoo package; it allows you to use an index vector such as dates to map to the series. It is well documented and well maintained, with vignettes and an FAQ that can be found on its package help page (among other places). Here is a small example: dd - data.frame(time = seq(as.Date('1993-01-01'), by = 'months', length = 200), s = rnorm(200)) head(dd, 3) time s 1 2003-01-01 1.4292491 2 2003-02-01 -1.0713998 3 2003-03-01 -0.4738791 library(zoo) ser - with(dd, zoo(s, time)) # s is the series, time is the index vector str(ser) # ser is of class zoo plot(ser) # apply the plot method For finance applications, other possibilities include the xts and quantmod packages, both of which are built on zoo. HTH, Dennis On Sun, May 15, 2011 at 11:42 AM, Bazman76 h_a_patie...@hotmail.com wrote: Hi there, I have a spreadsheet in excel which consists of first column of dates and then subsequent columns that refer to prices of different securities on those dates. (the first row contains each series name) I saved the excel file as type csv and then imported to excel using prices=read.csv(file=C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv,header = TRUE, sep = ,) This creates the correct time series data x-ts(prices[,2]) but does not have the dates attached. However the dates refer to working days. So although in general they represent Monday-Friday this is not always the case because of holidays etc. How then can I create a time series where the dates are read in from the first column of the csv file? I can not find an example in R documentation where this is done? Thanks -- View this message in context: http://r.789695.n4.nabble.com/Adding-dates-to-time-series-tp3524679p3524679.html Sent from the R help mailing list archive at Nabble.com. 
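As a rough sketch of how the zoo suggestion could be applied to a CSV file whose first column holds the dates: the inline text below stands in for the real file, and the column names Date, SecA, SecB and the ISO date format are assumptions made for illustration. The zoo step is guarded so the sketch still runs where zoo is not installed:

```r
# Hypothetical stand-in for the CSV file (first column: dates, then prices)
csv_text <- "Date,SecA,SecB
2011-05-09,10.1,20.5
2011-05-10,10.3,20.1
2011-05-12,10.2,20.7
2011-05-13,10.6,21.0"

prices <- read.csv(text = csv_text, header = TRUE)

# Convert the first column to Date; adjust format to match the real file
dates <- as.Date(as.character(prices$Date), format = "%Y-%m-%d")

if (requireNamespace("zoo", quietly = TRUE)) {
  # Irregular dates (missing holidays) are fine: the index is the date itself
  ser <- zoo::zoo(as.matrix(prices[, -1]), order.by = dates)
  print(ser)
}
```

With a real file, read.csv(file = ...) would replace the text = argument; the rest stays the same.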
Re: [R] Need help with text processing / string split
try this:

x <- read.table('/temp/tbl.txt', sep = ',', header = TRUE, as.is = TRUE)
# remove commas from the Cost column
x$Cost <- gsub(',', '', x$Cost)
# split the Cost
temp <- strsplit(x$Cost, "\\$")  # $ is special, so it is escaped
temp <- do.call(rbind, temp)  # create a matrix
mode(temp) <- 'numeric'  # convert to numeric
x$Cost1 <- temp[, 2]
x$Cost2 <- temp[, 3]
head(x)

                                  Address      Township       Parcel                Sale.Date
2                          10 PACER LN East Norriton 330006712005  Bnkrptcy-PP to6/29/2011
3                           6 BALA AVE  Lower Merion     43292007          STAYED5/25/2011
4             109 STONY WAY, Condo 109 East Norriton 330008575662  Bnkrptcy-PP to6/29/2011
5                   613 NORTHAMPTON RD East Norriton 330006103002    Postponed to5/25/2011
6                      67 HIGH GATE LN      Whitpain 660002716764 Pstpnd by CO to5/25/2011
7 236 Arundel Ave aka 236 Arundel Road       Horsham     36136008        For Sale5/25/2011
                 Costs               Cost     Cost1   Cost2
2 $173,933.60$2,410.28 $173933.60$2410.28 173933.60 2410.28
3   $264,640.36$168.00  $264640.36$168.00 264640.36  168.00
4  $70,029.04$1,483.59  $70029.04$1483.59  70029.04 1483.59
5 $254,873.19$1,772.62 $254873.19$1772.62 254873.19 1772.62
6 $404,507.59$1,947.90 $404507.59$1947.90 404507.59 1947.90
7 $252,472.27$1,034.51 $252472.27$1034.51 252472.27 1034.51

On Sun, May 15, 2011 at 3:50 PM, eric ericst...@aol.com wrote:

I used screen scraping to extract some information and put it into a table called tbl. Now I want to modify the table a bit so the data can be more useful. Here's the code I used:

library(XML)
rm(list=ls())
url <- "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=05/25/2011"
tbl <- data.frame(readHTMLTable(url))[2:405, c(3,5,6,8,9)]
names(tbl) <- c("Address", "Township", "Parcel", "Sale Date", "Costs")

tbl is attached as txt for your convenience. Entries in the last column of the dataframe (tbl$Cost) appear as follows: $173,933.60$2,410.28 . http://r.789695.n4.nabble.com/file/n3524793/tbl.txt tbl.txt

How do I:
1. Split the string
2. Have the two values show up as actual numbers that can be used
3. Put the numbers in two separate columns of the dataframe.

In other words $173,933.60$2,410.28 would show up as 173933.60 in one column and 2410.28 would show up in a second column of tbl. I tried using strsplit but I could not get it working properly.

-- Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
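The same idea on a tiny stand-alone example, for anyone without the attached file. The two input strings are taken from the output in this thread; the data frame name tbl_costs is made up here. Using fixed = TRUE is an alternative to escaping the dollar sign:

```r
costs <- c("$173,933.60$2,410.28", "$264,640.36$168.00")

# Drop the thousands separators, then split on the literal "$"
clean <- gsub(",", "", costs, fixed = TRUE)
parts <- strsplit(clean, "$", fixed = TRUE)

# Each element starts with an empty string (text before the first "$"),
# so the two amounts are in positions 2 and 3
mat <- do.call(rbind, parts)
tbl_costs <- data.frame(Cost1 = as.numeric(mat[, 2]),
                        Cost2 = as.numeric(mat[, 3]))
print(tbl_costs)
```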
Re: [R] Find String Between Characters
I would assume that you have lines of text that do not include 'CIK=' and therefore the 'sub' fails and you get the original string. If you only want the lines with CIK, then use 'grepl' to just extract those lines before processing. On Sat, May 14, 2011 at 10:14 PM, Sparks, John James jspa...@uic.edu wrote: Hi Jim, Thanks for your note. Unfortunately, when I attempt your solution in my exact setting, I get a weird and slightly different answer. First, let me be more clear. What I am attempting to do is pull the CIK number out of the information from the web page itself after it has loaded to R (this may not be optimal, but I am new at this), not from the web page reference (as you have done). So, when I execute the following as per your suggestion: require(scrapeR) mmm-scrape(url=http://www.sec.gov/cgi-bin/browse-edgar?action=getcompanyCIK=320193owner=excludecount=40;) num - sub(^.*CIK=([0-9]+).*, \\1, mmm) I get [1] pointer: 0x001265c0 Is this just a hex representation of the same number, or is something else going on here? Comments from any and all would be much appreciated. --John J. Sparks, Ph.D. On Sat, May 14, 2011 7:57 pm, jim holtman wrote: Is this what you want: mmm-http://www.sec.gov/cgi-bin/browse-edgar?action=getcompanyCIK=320193owner=excludecount=40; num - sub(^.*CIK=([0-9]+).*, \\1, mmm) num [1] 320193 On Sat, May 14, 2011 at 8:20 PM, Sparks, John James jspa...@uic.edu wrote: Dear R Helpers, I am trying to isolate a set of characters between two other characters in a long string file. I tried some of the examples on the R help pages and elsewhere, but I am not able to get it. Your help would be much appreciated. require(scrapeR) mmm-scrape(url=http://www.sec.gov/cgi-bin/browse-edgar?action=getcompanyCIK=320193owner=excludecount=40;) str(mmm) I want to get the number 320193 that is between the CIK= and the . I have tried g - grep( CIK=|, mmm ) and temp-grep(mmm,\CIK=\) and variations on these themes, but all won't run or come bask as an empty object. 
How can I grab this number? Best wishes, --John J. Sparks, Ph.D. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
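A small sketch of the grepl-then-sub approach suggested at the top of this message, assuming the page content is available as a character vector of lines. The lines below are invented for illustration; scrape() appears to return parsed XML document objects rather than plain strings, which would explain the pointer output, so the text would first have to be extracted or read with something like readLines():

```r
# Hypothetical page content as plain text lines
page_lines <- c(
  '<html><body>',
  '<a href="/cgi-bin/browse-edgar?action=getcompany&CIK=320193&owner=exclude">link</a>',
  '<p>no identifier on this line</p>'
)

# Keep only the lines that contain the pattern, then extract the digits
has_cik <- grepl("CIK=[0-9]+", page_lines)
cik <- unique(sub("^.*CIK=([0-9]+).*$", "\\1", page_lines[has_cik]))
print(cik)
```

Filtering with grepl() first matters because sub() returns the input unchanged when the pattern does not match.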
Re: [R] how to merge within range?
I'd've first said it's simply

sapply(df1$time, function(x) if (any(foo <- (x >= df2$from & x <= df2$to)) > 0) df2$value[which(foo)] else NA)

but the following are much nicer (except that instead of NA you'll have 0, but that's easy to change if necessary):

colSums(sapply(df1$time, function(x) (x >= df2$from & x <= df2$to) * df2$value))

drop((outer(df1$time, df2$from, ">=") * outer(df1$time, df2$to, "<=")) %*% df2$value)

On Sat, May 14, 2011 at 10:08 PM, René Mayer ma...@psychologie.tu-dresden.de wrote:

sqldf is impressive - compiled it now; the trick with findInterval is nice, too. thanks guys!!

Quoting David Winsemius dwinsem...@comcast.net:

On May 14, 2011, at 2:27 PM, William Dunlap wrote:

You could use findInterval() along with a trick with c(rbind(...)):

i <- findInterval(x=df.1$time, vec=c(rbind(df.2$from, df.2$to)))
i
[1] 1 1 1 2 3 3 3 5 5 6

That's nice. I was working on a slightly different trick

findInterval(df.1[,1], t(df.2[,1:2]))
[1] 1 1 1 2 3 3 3 5 5 6

I was then trying to get the right indices with (.) '%%' 2 and (.) '%/%' 2

The even-valued outputs would map to NA's, the odds to value[(i+1)/2], but you can use the c(rbind(...)) trick again:

c(rbind(df.2$value, NA))[i]
[1] 1 1 1 NA 3 3 3 5 5 NA

I'd like to understand that. Maybe, maybe... ah, got it. At first I didn't realize those were the final answers since they looked like indices. My t(.) trick doesn't generalize as well. My earlier suggestion that two merges would do it was based on my erroneous interpretation of the example, since I thought the task was to match on the end points of the intervals.

Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com

-Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of René Mayer Sent: Saturday, May 14, 2011 11:06 AM To: David Winsemius Cc: r-help@r-project.org Subject: Re: [R] how to merge within range?
thanks David and Ian, let me make a better example as the first one was flawed df.1=data.frame(round((1:10)*100+rnorm(10)), value=NA) names(df.1) = c(time, value) df.1 time value 1 101 NA 2 199 NA 3 301 NA 4 401 NA 5 501 NA 6 601 NA 7 700 NA 8 800 NA 9 900 NA 10 1000 NA # from and to define ranges within time, # note that from and to may not match the numbers given in time df.2=data.frame(from=c(99,500,799),to=c(303,702,950), value=c(1,3,5)) df.2 from to value 1 99 303 1 2 500 702 3 3 799 950 5 what I want is: time value 1 101 1 2 199 1 3 301 1 4 401 NA 5 501 3 6 601 3 7 700 3 8 800 5 9 900 5 10 1000 NA @David I don't know what you mean by 2 merges, René Zitat von David Winsemius dwinsem...@comcast.net: On May 14, 2011, at 9:16 AM, Ian Gow wrote: If I assume that the third column in data.frame.2 is named val then in SQL terms it _seems_ you want SELECT a.time, b.val FROM data.frame.1 AS a LEFT JOIN data.frame.2 AS b ON a.time BETWEEN b.start AND b.end; Not sure how to do that elegantly using R subsetting/merge, Huh? It's just two merge()'s (... once you fix the error in the example.) -- David but you might try a package that allows you to use SQL, such as sqldf. On 5/14/11 8:03 AM, David Winsemius dwinsem...@comcast.net wrote: On May 14, 2011, at 8:12 AM, René Mayer wrote: Hello, how can one merge And what happened when you typed: ?merge two data frames when in the second data frame one column defines the start values and another defines the end value of the to be merged range. data.frame.1 time ... 13 24 35 46 55 ... data.frame.2 start end 24 37 ?h? ? ... should result in this 13 NA 24 ?h? 35 ?h? 46 NA 55 ? And _why_ would that be? thanks, René __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
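For readers who want to try the findInterval() approach from this thread, here is a minimal sketch. It uses fixed times in place of the random ones in the example, and it assumes the ranges in df.2 are sorted and do not overlap. The leading NA and the "i + 1" are a small addition of this sketch (not from the posts) so that times below the first "from" also come out as NA. Note that a time exactly equal to a "to" value is treated as outside the range.

```r
# Fixed version of the example data from the thread (no random noise).
df.1 <- data.frame(time = c(101, 199, 301, 401, 501, 601, 700, 800, 900, 1000),
                   value = NA)
df.2 <- data.frame(from = c(99, 500, 799), to = c(303, 702, 950),
                   value = c(1, 3, 5))

# Interleave the endpoints: from1, to1, from2, to2, ...
breaks <- c(rbind(df.2$from, df.2$to))

# Odd results fall inside a range, even results fall in a gap, 0 is below
# the first range.
i <- findInterval(df.1$time, breaks)

# Interleave the values with NA for the gaps; the extra leading NA handles i == 0.
lookup <- c(NA, rbind(df.2$value, NA))
df.1$value <- lookup[i + 1]

df.1
```

The same result can be obtained with the SQL BETWEEN join mentioned earlier in the thread; the lookup above simply avoids building the full cross product.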
Re: [R] changing the day of the week in dates format
What is it that you want to do? If you move the dates forward a year, then what does it mean to add one year to 2/29/2008? You did mention accounting for leap year. It goes the other way with 2/28/2007 and 3/1/2007; what is your expectation in these cases? You can always convert everything to characters and then substring out the year and put the new one in, and then check for the leap year condition and do the appropriate action. The equation that you used would add 6 hours to each succeeding year's date. So I ask my favorite question: what is the problem that you are trying to solve?

On Sun, May 15, 2011 at 11:13 AM, Dave Evens daveeve...@yahoo.co.uk wrote:

Hi Adrian,

Many thanks for your reply. Suppose I wanted to increment the date by a year - how would I account for things like leap years? Would I just do

    mydaysx[select] <- mydaysx[select] + 365.25*24*60*60

Regards, Dave

From: Adrian Duffner duffn...@googlemail.com
Cc: r-help@r-project.org r-help@r-project.org
Sent: Sunday, 15 May 2011, 14:21
Subject: Re: [R] changing the day of the week in dates format

Hi Dave,

your problem is that you are working with an S3 class, which is mainly a list with a naming convention. Hence it is possible to change just one entry of the list, but it is almost never advisable. So a slight change to your code should provide you the required output:

    mydaysx[select] <- mydaysx[select] + 2*24*60*60
    select <- mydaysx$wday==6
    sum(select)
    [1] 0

In this case not only the entry $mday of the list is changed, but the whole object is updated.

Cheers
Adrian

On 14.05.2011 20:44, Dave Evens wrote:

Dear all,

I have a question related to the POSIXlt function in R. I have a set of dates and times, for example:

    startx <- as.POSIXct("2011-01-01 00:00:00")
    finx <- as.POSIXct("2011-12-31 00:00:00")
    daysx <- seq(startx, finx, by="24 hours")

I want to change the dates of all the days falling on a Saturday to the next working day (i.e. Monday).
So I convert dates to POSIXlt

    mydaysx <- as.POSIXlt(daysx)

Then I select all the Saturdays and move them on to Monday

    select <- mydaysx$wday==6
    mydaysx$mday[select] <- mydaysx$mday[select] + 2

However, although all the new dates (i.e. mydaysx) are actual days of the year, the $wday have not been updated and the $mday have not all been corrected (i.e. those falling into the next month). So if I do

    select <- mydaysx$wday==6

I still get the same set of days as before. Is there a way to do this?

Thanks,

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
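As an illustration of the advice above (shift the date-time values themselves rather than editing single POSIXlt fields), here is a small sketch. It assumes all times are in UTC, which is an assumption of this example and not something stated in the thread; in a time zone with daylight saving, adding a fixed number of seconds can land an hour off. The last lines show one calendar-aware way to step a year ahead with seq(); it is worth checking what it returns for 29 February before relying on it.

```r
# Assumption: UTC, so adding whole days never crosses a daylight-saving change.
days <- seq(as.POSIXct("2011-01-01", tz = "UTC"),
            as.POSIXct("2011-12-31", tz = "UTC"), by = "day")

# wday runs from 0 (Sunday) to 6 (Saturday). Shifting the POSIXct values
# keeps every derived field (mday, mon, wday, ...) consistent.
is_sat <- as.POSIXlt(days)$wday == 6
days[is_sat] <- days[is_sat] + 2 * 24 * 60 * 60

# Count the Saturdays that remain after the shift.
n_sat <- sum(as.POSIXlt(days)$wday == 6)

# Calendar-aware "plus one year": seq() steps by calendar year instead of
# a fixed number of seconds such as 365.25 * 24 * 60 * 60.
next_year <- seq(as.Date("2011-05-15"), by = "year", length.out = 2)[2]
```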
Re: [R] R function that returns an object's search path position XXXX
On 15/05/11 13:41, Dan Abner wrote:

Hello everyone, Is there an R function that returns an object's search path position?

?find

cheers,

Rolf Turner
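A short sketch of what the suggested help page leads to, using the base function mean as the object to look up (the choice of object is just for illustration):

```r
# Name of the search path entry that holds an object called "mean".
where_name <- find("mean")

# With numeric = TRUE, find() returns the position(s) on the search path,
# named by the corresponding entry.
where_pos <- find("mean", numeric = TRUE)

# The position indexes directly into search().
search()[where_pos]
```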
Re: [R] problem converting character to dates
I have imported the data from Excel and it comes like this:

    Calendar Year/Month   01.2008 01.2008 01.2008 01.2008 01.2008 02.2008 02.2008
    Calendar year / week  01.2008 02.2008 03.2008 04.2008 05.2008 05.2008 06.2008

Some weeks are repeated because they belong to two months. It's the same at the end of the year. So I think it probably has to do with the data acquisition and how it is saved in the database.

Thanks for your help and interest.
Re: [R] High standard error
To judge whether a coefficient is significant, you should look at the p-value of its estimate. If the coefficient is significantly different from zero, you can start to interpret it from there. It might help to look up the definition of the standard error; it is somewhat similar to the standard deviation. Good luck.
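To make the relationship concrete, here is a small sketch assuming an ordinary linear model fitted with lm() on the built-in cars data (the model in the original question is not shown, so this is only an illustration). It reproduces the t statistic and p-value by hand from the estimate and its standard error.

```r
# Simple linear model on a data set that ships with R.
fit <- lm(dist ~ speed, data = cars)
coefs <- summary(fit)$coefficients

# t statistic = estimate divided by its standard error.
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom.
p_by_hand <- 2 * pt(abs(t_by_hand), df = df.residual(fit), lower.tail = FALSE)

cbind(coefs, t_by_hand, p_by_hand)
```

A large standard error relative to the estimate gives a small t statistic and hence a large p-value.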
Re: [R] Snow/Snowfall hangs on windows 7
Same problem as Anna here. Windows 7 64-bit. Running R 2.13.0. snow + snowfall installed. Testing:

    library(snow)
    library(snowfall)
    sfInit(parallel=TRUE, cpus=2, type="SOCK")

Then R spins forever (yes, I disabled the Windows firewall). On the same box, tried the same on Ubuntu under Virtualbox. No problem. Runs well.

Any suggestions/ideas appreciated.

David
Re: [R] Adding dates to time series
OK, I got it to work thanks to your example

    plot(ser)

however, ultimately I need a ts-type object. So I used

    xts <- as.ts(ser)
    xts
    Time Series:
    Start = 1
    End = 732
    Frequency = 1

which just gets me back to where I started, with the correct raw data but no attached dates. Is it possible to have a time series ts() object with irregular dates?
Re: [R] Adding dates to time series
additional!! I now realise that the time series created below is in the wrong order! Clearly the column of dates is not being interpreted as dates by R. Is it possible for R to read column one as dates? How can I do this?

    dd <- data.frame(prices[,1], prices[,2])
    head(dd, 3)
      prices...1. prices...2.
    1  16/12/2004  0.13675654
    2  17/12/2004  0.22967560
    3  20/12/2004  0.01841611
    ser <- with(dd, zoo(prices[,2], prices[,1]))
    plot(ser)
    xts <- as.ts(xzoo)
    Error in as.ts(xzoo) : object 'xzoo' not found
    xts <- as.ts(ser)
    xts
    Time Series:
    Start = 1
    End = 732
    Frequency = 1
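One possible way to get the first column read as dates, sketched with the three rows shown above (deliberately shuffled here) and hypothetical column names date and price: convert the day/month/year strings with as.Date() and an explicit format, then sort. The resulting Date vector could then be used as the index of a zoo series, e.g. zoo(dd$price, order.by = dd$date).

```r
# Three rows from the post, in shuffled order; column names are illustrative.
dd <- data.frame(date  = c("20/12/2004", "16/12/2004", "17/12/2004"),
                 price = c(0.01841611, 0.13675654, 0.22967560),
                 stringsAsFactors = FALSE)

# Tell R the strings are day/month/four-digit-year.
dd$date <- as.Date(dd$date, format = "%d/%m/%Y")

# Real dates sort chronologically, unlike the original strings.
dd <- dd[order(dd$date), ]
dd
```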
[R] rbind with partially overlapping column names
Hello,

I would like to merge two data frames with partially overlapping column names with an rbind-like operation. For the following data frames,

    df1 <- data.frame(a=c("A","A"), b=c("B","B"))
    df2 <- data.frame(b=c("b","b"), c=c("c","c"))

I would like the output frame to be (with NAs where the frames don't overlap)

    a  b  c
    A  B  NA
    A  B  NA
    NA b  c
    NA b  c

I am familiar with ?merge and ?rbind, but neither seems to offer a means to accomplish this.

Thanks in advance.

Jonathan
Re: [R] rbind with partially overlapping column names
Hi:

This is a bit of a kluge, but works for your test case:

    df2[, setdiff(names(df1), names(df2))] <- NA
    df1[, setdiff(names(df2), names(df1))] <- NA
    df3 <- rbind(df1, df2)
    df3
         a b    c
    1    A B   NA
    2    A B   NA
    3   NA b    c
    4   NA b    c

-Ian

On 5/15/11 7:41 PM, Jonathan Flowers jonathanmflow...@gmail.com wrote:

I would like to merge two data frames with partially overlapping column names with an rbind-like operation.
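The same idea can be wrapped in a small helper. The function name rbind_fill below is made up for this sketch; it is not a base R function (the plyr package offers rbind.fill for the same purpose). It adds each missing column as NA and then lets rbind() match columns by name.

```r
# Hypothetical helper: rbind two data frames, filling absent columns with NA.
rbind_fill <- function(x, y) {
  for (nm in setdiff(names(y), names(x))) x[[nm]] <- NA  # columns only in y
  for (nm in setdiff(names(x), names(y))) y[[nm]] <- NA  # columns only in x
  rbind(x, y[names(x)])                                  # same column order
}

df1 <- data.frame(a = c("A", "A"), b = c("B", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(b = c("b", "b"), c = c("c", "c"), stringsAsFactors = FALSE)

df3 <- rbind_fill(df1, df2)
df3
```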
Re: [R] Adding dates to time series
On Sun, May 15, 2011 at 2:42 PM, Bazman76 h_a_patie...@hotmail.com wrote:

Hi there,

I have a spreadsheet in Excel which consists of a first column of dates and then subsequent columns that refer to prices of different securities on those dates (the first row contains each series name). I saved the Excel file as type csv and then imported it into R using

    prices=read.csv(file="C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv", header = TRUE, sep = ",")

This creates the correct time series data

    x <- ts(prices[,2])

but does not have the dates attached. However, the dates refer to working days. So although in general they represent Monday-Friday, this is not always the case because of holidays etc. How then can I create a time series where the dates are read in from the first column of the csv file? I cannot find an example in the R documentation where this is done.

    Lines <- "time,s
    2003-01-01,1.4292491
    2003-02-01,-1.0713998
    2003-03-01,-0.4738791"

    library(zoo)
    # F <- "C:/Documents and Settings/Hugh/My Documents/PhD/Option prices.csv"
    # z <- read.zoo(F, header = TRUE, sep = ",")

    # in reality we would read from the file as shown in the comments above
    # but here we do it this way so we can just copy it and paste it
    # verbatim into the R session
    z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",")

If you want xts then:

    library(xts)
    x <- as.xts(z)

Note that ts is not good for dates so use zoo or xts. See ?read.zoo in the zoo package.

--
Statistics Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Re: [R] rbind with partially overlapping column names
Hi:

Another way, with a little less typing but using the same principle, is

    df1$c <- df2$a <- NA
    rbind(df1, df2)

Dennis

On Sun, May 15, 2011 at 5:50 PM, Ian Gow iand...@gmail.com wrote:

Hi: This is a bit of a kluge, but works for your test case:

    df2[, setdiff(names(df1), names(df2))] <- NA
    df1[, setdiff(names(df2), names(df1))] <- NA
    df3 <- rbind(df1, df2)
Re: [R] Snow/Snowfall hangs on windows 7
btw, I installed R 2.10.1 on the same box (Windows 7, 64bit, 4 cores). snow/snowfall work fine. Here is my sessionInfo():

    R version 2.10.1 (2009-12-14)
    i386-pc-mingw32

    locale:
    [1] LC_COLLATE=English_United States.1252
    [2] LC_CTYPE=English_United States.1252
    [3] LC_MONETARY=English_United States.1252
    [4] LC_NUMERIC=C
    [5] LC_TIME=English_United States.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] snowfall_1.84 snow_0.3-3

    loaded via a namespace (and not attached):
    [1] tools_2.10.1

David
Re: [R] Question on approximations of full logistic regression model
I think you are doing this correctly except for one thing. The validation and other inferential calculations should be done on the full model. Use the approximate model to get a simpler nomogram but not to get standard errors. With only dropping one variable you might consider just running the nomogram on the entire model.

Frank

細田弘吉 wrote:

Hi,

I am trying to construct a logistic regression model from my data (104 patients and 25 events). I built a full model consisting of five predictors with the use of penalization by the rms package (lrm, pentrace etc.) because of the events-per-variable issue. Then I tried to approximate the full model by a step-down technique, predicting L from all of the component variables using ordinary least squares (ols in the rms package), as follows. I would like to know whether I am doing this right or not.

    library(rms)
    plogit <- predict(full.model)
    full.ols <- ols(plogit ~ stenosis+x1+x2+ClinicalScore+procedure, sigma=1)
    fastbw(full.ols, aics=1e10)

     Deleted       Chi-Sq d.f. P      Residual d.f. P      AIC    R2
     stenosis        1.41 1    0.2354   1.41   1    0.2354  -0.59 0.991
     x2             16.78 1    0.0000  18.19   2    0.0001  14.19 0.882
     procedure      26.12 1    0.0000  44.31   3    0.0000  38.31 0.711
     ClinicalScore  25.75 1    0.0000  70.06   4    0.0000  62.06 0.544
     x1             83.42 1    0.0000 153.49   5    0.0000 143.49 0.000

Then I fitted an approximation to the full model using the most important variables (R^2 for predictions from the reduced model against the original Y drops below 0.95), that is, dropping stenosis.

    full.ols.approx <- ols(plogit ~ x1+x2+ClinicalScore+procedure)
    full.ols.approx$stats
              n  Model L.R.        d.f.          R2           g       Sigma
    104.0000000 487.9006640   4.0000000   0.9908257   1.3341718   0.1192622

This approximate model had R^2 against the full model of 0.99. Therefore, I updated the original full logistic model, dropping stenosis as a predictor.

    full.approx.lrm <- update(full.model, ~ . -stenosis)

    validate(full.model, bw=F, B=1000)
              index.orig training   test optimism index.corrected    n
    Dxy           0.6425   0.7017 0.6131   0.0887          0.5539 1000
    R2            0.3270   0.3716 0.3335   0.0382          0.2888 1000
    Intercept     0.0000   0.0000 0.0821  -0.0821          0.0821 1000
    Slope         1.0000   1.0000 1.0548  -0.0548          1.0548 1000
    Emax          0.0000   0.0000 0.0263   0.0263          0.0263 1000

    validate(full.approx.lrm, bw=F, B=1000)
              index.orig training   test optimism index.corrected    n
    Dxy           0.6446   0.6891 0.6265   0.0626          0.5820 1000
    R2            0.3245   0.3592 0.3428   0.0164          0.3081 1000
    Intercept     0.0000   0.0000 0.1281  -0.1281          0.1281 1000
    Slope         1.0000   1.0000 1.1104  -0.1104          1.1104 1000
    Emax          0.0000   0.0000 0.0444   0.0444          0.0444 1000

Validation revealed this approximation was not bad. Then I made a nomogram.

    full.approx.lrm.nom <- nomogram(full.approx.lrm, fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
    plot(full.approx.lrm.nom)

Another nomogram using the ols model:

    full.ols.approx.nom <- nomogram(full.ols.approx, fun.at=c(0.05,0.1,0.2,0.4,0.6,0.8,0.9,0.95), fun=plogis)
    plot(full.ols.approx.nom)

These two nomograms are very similar but a little bit different. My questions are:

1. Am I doing this right?
2. Which nomogram is correct?

I would appreciate your help in advance.

--
KH

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
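The idea of approximating a full model can be tried without the rms package. The sketch below is only an illustration in base R on simulated data (the variable names, coefficients and sample size are invented here): it fits a full logistic model, takes its linear predictor, and regresses that linear predictor on a subset of the predictors by ordinary least squares. The R^2 of that regression says how well the smaller model reproduces the full model's predictions. It does not use penalization, fastbw() or validate().

```r
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * d$x1 + 0.6 * d$x2 + 0.1 * d$x3))

# Full logistic model and its linear predictor (the "L" being approximated).
full <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
d$lp <- predict(full, type = "link")

# R^2 of an OLS fit computed by hand from the residuals.
r2 <- function(fit, y) 1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)

# Regressing L on all predictors reproduces it exactly ...
r2_all <- r2(lm(lp ~ x1 + x2 + x3, data = d), d$lp)

# ... while dropping the weakest predictor gives the approximate model.
r2_reduced <- r2(lm(lp ~ x1 + x2, data = d), d$lp)

c(all = r2_all, reduced = r2_reduced)
```

As in the reply above, the reduced fit is a convenience for presentation; standard errors and validation still belong to the full model.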
Re: [R] rbind with partially overlapping column names
-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jonathan Flowers
Sent: Sunday, May 15, 2011 5:41 PM
To: r-help@r-project.org
Subject: [R] rbind with partially overlapping column names

I am familiar with ?merge and ?rbind, but neither seem to offer a means to accomplish this.

What is wrong with merge(all=TRUE,...)?

    merge(df1, df2, all=TRUE)
      b    a    c
    1 B    A   NA
    2 B    A   NA
    3 b   NA    c
    4 b   NA    c

Rearrange the columns if that is necessary

    merge(df1, df2, all=TRUE)[c("a","b","c")]
         a b    c
    1    A B   NA
    2    A B   NA
    3   NA b    c
    4   NA b    c

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
[R] Extracting the dimnames of an array with variable dimensions
Hi list,

In a function I am writing, I need to extract the dimension names of an array. I know this can be achieved easily using dimnames(), but my problem is that I want my function to be robust when the number of dimensions varies. Consider the following case:

    foo <- array(data = rnorm(32), dim = c(4,4,2),
                 dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))

    # What I want is to extract the *names of the dimensions* for which foo has positive values:
    ind <- which(foo > 0, arr.ind = TRUE)

    # A first solution is:
    t(apply(ind, 1, function(x) unlist(dimnames(foo[x[1], x[2], x[3], drop=FALSE]))))

    # But it does require knowing the dimensions of foo

I would like to do something like:

    ind <- which(foo > 0, arr.ind = TRUE)
    t(apply(ind, 1, function(x) unlist(dimnames(foo[x, drop=FALSE]))))

but in that case the dimnames are dropped. Any suggestion?

Cheers,

Pierre

--
Scientist
Landcare Research, New Zealand
Re: [R] rbind with partially overlapping column names
That approach relies on df1 and df2 not having overlapping values in b. A slight variation in df2 gives different results:

    df1 <- data.frame(a=c("A","A"), b=c("B","B"))
    df2 <- data.frame(b=c("B","B"), c=c("c","c"))
    merge(df1, df2, all=TRUE)
      b a c
    1 B A c
    2 B A c
    3 B A c
    4 B A c

On 5/15/11 11:19 PM, William Dunlap wdun...@tibco.com wrote:

What is wrong with merge(all=TRUE,...)?

    merge(df1, df2, all=TRUE)
Re: [R] Question on approximations of full logistic regression model
Thank you for your reply, Prof. Harrell.

I agree with you. Dropping only one variable does not actually help a lot.

I have one more question. During analysis of this model I found that the confidence intervals (CIs) of some coefficients provided by bootstrapping (bootcov function in the rms package) were narrower than the CIs provided by the usual variance-covariance matrix, and the CIs of other coefficients were wider. My data have no cluster structure. I am wondering which CIs are better. I guess the bootstrap ones, but is that right?

I would appreciate your help in advance.

--
KH

(11/05/16 12:25), Frank Harrell wrote:

I think you are doing this correctly except for one thing. The validation and other inferential calculations should be done on the full model. Use the approximate model to get a simpler nomogram but not to get standard errors. With only dropping one variable you might consider just running the nomogram on the entire model.

Frank

E-mail address Office: khos...@med.kobe-u.ac.jp
Re: [R] graphs of gamma, normal fit to a histogram are about half as large as they should be
Hmm; still missing something - hist defaults to frequencies, not prob. densities; and I thought I'd scaled the fitted lines to the values in the data frame. Just going with it, I specified freq=FALSE, and the prob. density was of course at a different order of magnitude than the lines. What are you trying to hint at?

On Fri, May 13, 2011 at 6:05 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

On 14/05/11 10:00, Benjamin Caldwell wrote:

Hello,

I'm trying to compare the fit of two distributions, normal and gamma, to a histogram of my response variable.

    rate <- mean(na.omit(rwb$post.f.crwn.length))/var(na.omit(rwb$post.f.crwn.length))
    shape <- rate*mean(na.omit(rwb$post.f.crwn.length))
    hist((rwb$post.f.crwn.length), main="rwb$post.f.crwn.length")
    lines(seq(0.01,70,0.01), length(rwb$post.f.crwn.length)*dgamma(seq(0.01,70,0.01), shape, rate))
    lines(seq(0,70,0.1), length(na.omit(rwb$post.f.crwn.length))*dnorm(seq(0,70,.1), mean(na.omit(rwb$post.f.crwn.length)), sqrt(var(na.omit(rwb$post.f.crwn.length)))))

However, the heights of the two curves are about 1/3 to 1/4 of what they should be compared to the histogram. Any ideas?

Yes. Read the help on hist! (Hint: Pay particular attention to the freq and/or probability arguments.)

cheers,

Rolf Turner
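A sketch of the scaling issue, on simulated gamma data rather than the poster's rwb data frame (the sample size and parameters are invented here). On a frequency histogram with equal-width bins, each bar has height n * binwidth * density, so a fitted density has to be multiplied by n times the bin width, not by n alone. On a freq=FALSE histogram the density is drawn unscaled.

```r
set.seed(2)
x <- rgamma(500, shape = 4, rate = 0.2)
n <- length(x)

# Method-of-moments gamma parameters, as in the post.
rate  <- mean(x) / var(x)
shape <- rate * mean(x)

h <- hist(x, plot = FALSE)        # default: equal-width bins
binwidth <- diff(h$breaks)[1]

# Counts and densities differ exactly by the factor n * binwidth.
scale_factor <- n * binwidth

if (interactive()) {
  grid <- seq(0.01, max(h$breaks), length.out = 400)
  plot(h, main = "Frequency histogram with scaled fits")
  lines(grid, scale_factor * dgamma(grid, shape, rate))
  lines(grid, scale_factor * dnorm(grid, mean(x), sd(x)), lty = 2)
}
```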
Re: [R] Extracting the dimnames of an array with variable dimensions
Hi:

Does it have to be an array? If all you're interested in is the dimnames, how about this?

    library(plyr)
    foo <- array(data = rnorm(32), dim = c(4,4,2),
                 dimnames = list(letters[1:4], LETTERS[1:4], letters[5:6]))
    foo
    , , e

               A          B          C          D
    a -0.2183877 -0.8912908 -2.0175612 -0.8080548
    b  0.4870784 -0.8626293 -0.5641368 -0.5219722
    c  0.8821044  0.3187850  1.2203297 -0.3151186
    d -0.9894656 -1.1779108  0.9853935  0.3560747

    , , f

               A          B          C          D
    a  0.7357773 -1.7591637  1.6320887  1.2248529
    b  0.4662315  0.1131432 -0.9790887 -0.6575306
    c -0.3564725 -0.9202688  0.1017894  0.7382683
    d  0.2825117  0.9242299  0.3577063 -1.3297339

    # flatten array into a data frame with dimnames as factors
    # adply() converts an array to a data frame, applying a function
    # along the stated dimensions
    u <- adply(foo, c(1, 2, 3), as.vector)
    subset(u, V1 > 0)[, 1:3]
       X1 X2 X3
    2   b  A  e
    3   c  A  e
    7   c  B  e
    11  c  C  e
    12  d  C  e
    16  d  D  e
    17  a  A  f
    18  b  A  f
    20  d  A  f
    22  b  B  f
    24  d  B  f
    25  a  C  f
    27  c  C  f
    28  d  C  f
    29  a  D  f
    31  c  D  f

HTH,
Dennis

On Sun, May 15, 2011 at 9:20 PM, Pierre Roudier pierre.roud...@gmail.com wrote:

In a function I am writing, I need to extract the dimension names of an array. I know this can be achieved easily using dimnames(), but my problem is that I want my function to be robust when the number of dimensions varies.
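If the array has to stay an array, a base R variant that works for any number of dimensions might look like the sketch below. The function name positive_labels is invented for this example. It looks up, dimension by dimension, the names belonging to the index matrix returned by which(arr.ind = TRUE), and it assumes every dimension has names.

```r
# Hypothetical helper: dimension labels of all positive cells, one row per cell.
positive_labels <- function(a) {
  ind <- which(a > 0, arr.ind = TRUE)          # one column per dimension
  out <- vapply(seq_along(dim(a)),
                function(k) dimnames(a)[[k]][ind[, k]],
                character(nrow(ind)))
  matrix(out, nrow = nrow(ind))                # keep matrix shape for 1 row
}

# Small deterministic 2 x 2 x 2 example.
a <- array(c(1, -1, -1, 1, -1, -1, -1, 1), dim = c(2, 2, 2),
           dimnames = list(c("r1", "r2"), c("c1", "c2"), c("k1", "k2")))
res <- positive_labels(a)
res
```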
Re: [R] Powerful PC to run R
On Sun, 15 May 2011, Duncan Murdoch wrote:

On 15/05/2011 3:02 PM, Aram Fingal wrote:

On May 13, 2011, at 6:38 AM, Michael Haenlein wrote:

Dear all, I'm currently running R on my laptop -- a Lenovo Thinkpad X201 (Intel Core i7 CPU, M620, 2.67 GHz, 8 GB RAM). The problem is that some of my calculations run for several days, sometimes even weeks (mainly simulations over a large parameter space). Depending on the external conditions, my laptop sometimes shuts down due to overheating.

You didn't mention whether you are using a 64-bit OS or not. A single 32-bit process cannot use more than 2 GB RAM.

And that is also false. For Windows, see the rw-FAQ. It is address space (not RAM) that is limited, and it is limited to 4GB *by definition* in a 32-bit process. Many OSes can give your process 4GB of address space, but may reserve some of it for the OS.

If your calculations would benefit from the full 8 GB RAM on your machine, you need to be able to run 64-bit R. My understanding is that, on Windows, you either have to install the OS as 32-bit and use all 32-bit software or install 64-bit Windows and run all 64-bit software. A Mac can run 32-bit and 64-bit software simultaneously and I'm not sure about Linux. In the case of Linux, it probably doesn't matter so much because most Linux software is available as open source and you can compile it yourself either way.

For the record, all modern 64-bit OSes on x86_64 cpus can do this provided you install 32-bit versions of core dynamic libraries. I run 32- and 64-bit R on 64-bit Linux, Solaris, FreeBSD, Darwin (the OS of Mac OS X), Windows. As can AIX and IRIX on their CPUs.

No, 64 bit Windows can run either 32 or 64 bit Windows programs.

I'm now thinking about buying a more powerful desktop PC or laptop. Can anybody advise me on the best configuration to run R as fast as possible? I will use this PC exclusively for R so any other factors are of limited importance.
You need to evaluate whether RAM or raw processor speed is most critical for what you're doing. In my case, I upgraded my Mac Pro to 16 GB RAM and was able to do hierarchical clustering heatmaps overnight which previously took more than a week to compute. Using the Activity Monitor utility, it looks like some of the even larger heatmap computations would benefit from 32 GB RAM or more.

Linux runs on the widest range of hardware and that allows you the greatest ability to shop around. If RAM is the deciding factor, then you can look around for a machine which can hold as much RAM as possible. If processor speed is the factor, then you can optimize for that.

Windows runs on a reasonable array of hardware but has the disadvantage that the OS, itself, uses a lot of resources.

Nothing like as much as Mac OS X, though. (I would say the main disadvantage of Windows for R is the slowness of the file systems.)

The Mac has the advantage of flexibility. When you download the precompiled R package, it comes with both a 32-bit and a 64-bit executable. This is because 32-bit processes run a little faster if you don't need large amounts of RAM. If you do need the RAM, then you run the 64-bit version.

The same is true for Windows binaries on CRAN.

And of e.g. the Fedora binaries.

Duncan Murdoch

R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

Mr Fingal: please do! You are clearly unfamiliar with the R manuals.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
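To see which build a given R session actually is, something like the following can be run at the prompt. This is only a sketch and was not part of the thread; `ptr_bytes`, `bits` and `address_space_gb` are illustrative names, and the address-space figure is the theoretical limit implied by the pointer size, not what the OS will actually grant.

```r
# Pointer size in bytes: 8 on a 64-bit build of R, 4 on a 32-bit build
ptr_bytes <- .Machine$sizeof.pointer
bits <- 8 * ptr_bytes

cat("This R session is a", bits, "bit build; architecture:", R.version$arch, "\n")

# Theoretical address space of the process in GB (2^32 bytes = 4 GB for 32-bit)
address_space_gb <- 2^bits / 2^30
cat("Theoretical address space:", address_space_gb, "GB\n")
```

On a 32-bit build this reports 4 GB, which matches the "4GB by definition" remark above; how much of that is usable depends on the OS.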
Re: [R] graphs of gamma, normal fit to a histogram are about half as large as they should be
In your example it appears that you are plotting a histogram (on the frequency scale) and then superimposing scalar multiples of gamma and Gaussian densities. You should just plot a histogram (with freq=FALSE) and then superimpose the densities --- without any scalar multipliers.

If that doesn't work, please provide a minimal *reproducible* (no one but you has the ``rwb'' data object) example of the problem that you are having (as the posting guide requests).

cheers,
Rolf Turner

On 16/05/11 17:01, Benjamin Caldwell wrote:

Hmm; still missing something - hist defaults to frequencies, not prob. densities; and, I thought I'd scaled the fitted lines to the values in the data frame. Just going with it, I specified freq=FALSE, and the prob density was of course at a different order of magnitude than the lines. What are you trying to hint at?

On Fri, May 13, 2011 at 6:05 PM, Rolf Turner rolf.tur...@xtra.co.nz wrote:

On 14/05/11 10:00, Benjamin Caldwell wrote:

Hello, I'm trying to compare the fit of two distributions, normal and gamma, to a histogram of my response variable.

rate <- mean(na.omit(rwb$post.f.crwn.length))/var(na.omit(rwb$post.f.crwn.length))
shape <- rate*mean(na.omit(rwb$post.f.crwn.length))
hist((rwb$post.f.crwn.length), main="rwb$post.f.crwn.length")
lines(seq(0.01,70,0.01),length(rwb$post.f.crwn.length)*dgamma(seq(0.01,70,0.01),shape,rate))
lines(seq(0,70,0.1),length(na.omit(rwb$post.f.crwn.length))*dnorm(seq(0,70,.1),mean(na.omit(rwb$post.f.crwn.length)),sqrt(var(na.omit(rwb$post.f.crwn.length)))))

However, the height of the two curves is about 1/3 to 1/4 of what it should be compared to the histogram. Any ideas?

Yes. Read the help on hist! (Hint: Pay particular attention to the freq and/or probability arguments.)

cheers,
Rolf Turner
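The advice above can be illustrated roughly as follows. This is a sketch on simulated data, since the ``rwb'' object is not available; the variable names and the simulated gamma sample are assumptions, not the poster's data. It also shows what the multiplier would have to be on the frequency scale: the expected count in a bin is n times the bin width times the density, so multiplying by n alone leaves the curves too low or too high by a factor of the bin width.

```r
set.seed(42)
# Simulated stand-in for the response variable, with one missing value
x <- c(rgamma(200, shape = 4, rate = 0.2), NA)
x <- x[!is.na(x)]

# Method-of-moments gamma parameters, as in the original post
rate  <- mean(x) / var(x)
shape <- rate * mean(x)

grid <- seq(0.01, max(x), length.out = 500)

# Density scale: superimpose the densities with no multiplier
h <- hist(x, freq = FALSE, main = "Density scale")
lines(grid, dgamma(grid, shape, rate))
lines(grid, dnorm(grid, mean(x), sd(x)), lty = 2)

# Frequency scale: scale the densities by n * bin width
h2 <- hist(x, freq = TRUE, main = "Frequency scale")
binwidth <- diff(h2$breaks)[1]   # default breaks are equally spaced
lines(grid, length(x) * binwidth * dgamma(grid, shape, rate))
lines(grid, length(x) * binwidth * dnorm(grid, mean(x), sd(x)), lty = 2)
```

Either plot should show the curves at a height comparable to the bars; the density-scale version is the simpler of the two because it needs no knowledge of the bin width.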
Re: [R] Extracting the dimnames of an array with variable dimensions
Hi Dennis,

Thanks for your answer, it works very well - clever way to sort the problem!

Cheers,
Pierre

--
Scientist
Landcare Research, New Zealand