[R] An entire data frame which is a time-series?
I have:

  raw <- read.table("monthly.text", skip=3, sep="|",
                    col.names=c("junk", "junk2", "wpi", "g.wpi", "wpi.primary",
                                "g.wpi.primary", "wpi.fuel", "g.wpi.fuel",
                                "wpi.manuf", "g.wpi.manuf", "cpi.iw", "g.cpi.iw",
                                "cpi.unme", "g.cpi.unme", "cpi.al", "g.cpi.al",
                                "cpi.rl", "g.cpi.rl"))

Now I can do things like:

  g.wpi <- ts(raw$g.wpi, frequency=12, start=c(1994,7))

and it works fine. One by one, I can make time-series objects. Is there a way to tell R that the entire data frame is a set of time series, so that I don't have to go column by column and make a new ts() out of each? I tried:

  M <- ts(raw, frequency=12, start=c(1994,7))
  ts.plot(M[,wpi], M[,wpi.manuf])

but this gives nonsense results. Also, syntax like M$wpi is a lot nicer than M[,wpi]. Any ideas about what might work?

An unrelated suggestion: I found the documentation of ts() quite daunting. I have been around time series and computer programming for decades, but it still took me a while to handle the basics: reading in a file, making time-series vectors, running ARMA models. This stuff ought to be easier to learn. I wrote an ARMA example and put it up on the web (http://www.mayin.org/~ajayshah/KB/R/tsa.html); it would have been a godsend to me had I found it earlier. I believe the R documentation framework would do well by always having a 2000-word conceptual introduction plus a little tutorial for each package, instead of jumping straight into man pages on each function (which is the only documentation we have presently).

-- 
Ajay Shah, Consultant
Department of Economic Affairs, Ministry of Finance, New Delhi
[EMAIL PROTECTED]
http://www.mayin.org/ajayshah

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
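A minimal sketch of what the poster seems to be after (with stand-in data, since monthly.text is not available): ts() accepts a whole data frame and returns a multivariate "mts" object in one call, and the "nonsense results" come from using unquoted column names in the subscript.

```r
## Sketch with stand-in data: one ts() call converts every column.
raw <- data.frame(wpi = cumsum(rnorm(24)), wpi.manuf = cumsum(rnorm(24)))
M <- ts(raw, frequency = 12, start = c(1994, 7))
is.ts(M)                      # a single multivariate time series
colnames(M)                   # "wpi" "wpi.manuf"
## Columns must be selected by *quoted* name; M[, wpi] without quotes
## looks up a variable called wpi, which is why the plot was nonsense.
ts.plot(M[, "wpi"], M[, "wpi.manuf"])
```

There is no M$wpi for an mts object, but M[, "wpi"] keeps the time-series attributes when only a column is selected.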
[R] Cross-variograms
Jacques,

Provided that X and Y are colocated (i.e., have exactly the same observation locations), you have the cross variogram right; its definition is:

  gamma(h) = E[(X(s) - X(s+h)) * (Y(s) - Y(s+h))]

Also, where you select

  cv <- v$gamma[1:14]

you may be better off using the more general

  cv <- v$gamma[v$id == "X.Y"]

Best regards,
-- Edzer
[R] strptime() bug? And additional problem in package tseries
Hi all,

I've got some problems with irts objects, one of which could be a bug:

1) Read a table with several columns from Postgres; the first column is a timestamp with time zone (this is OK). An extract of raincida$ts:

  [2039] 25/03/2000 22:00:00 UTC 25/03/2000 23:00:00 UTC
  [2041] 26/03/2000 00:00:00 UTC 26/03/2000 01:00:00 UTC
  [2043] 26/03/2000 02:00:00 UTC 26/03/2000 03:00:00 UTC
  [2045] 26/03/2000 04:00:00 UTC 26/03/2000 05:00:00 UTC

2) Try to extract the time from this column of the data frame (bug?):

  lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")

An extract is:

  [2038] 2000-03-25 21:00:00 2000-03-25 22:00:00 2000-03-25 23:00:00
  [2041] 2000-03-26 00:00:00 2000-03-26 01:00:00 2000-03-26 03:00:00
  [2044] 2000-03-26 03:00:00 2000-03-26 04:00:00 2000-03-26 05:00:00

Note that element [2043] is wrong. This happens several times in the dataset, and will eventually produce errors because of omitted and duplicated values.

3) The additional problem is related to the time() function for irts objects. I try to make an irts from several columns of the table read:

  rain.irts <- irts(as.POSIXct(lluvia.strptime, tz="GMT"),
                    cbind(raincida[[8]], raincida[[9]], raincida[[10]],
                          raincida[[11]], raincida[[12]], raincida[[13]],
                          raincida[[14]]))

This step doesn't seem to have any further problem.
An extract is:

  2000-03-25 22:00:00 GMT 0.275 0 0.07875  0.2      0 0.025 23.65
  2000-03-25 23:00:00 GMT 0.275 0 0.07875  0.2      0 0.025 23.65
  2000-03-26 00:00:00 GMT 0     0 0.001667 0.008333 0 0     0.5322
  2000-03-26 01:00:00 GMT 0     0 0.001667 0.008333 0 0     0.5322
  2000-03-26 03:00:00 GMT 0     0 0.001667 0.008333 0 0     0.5322
  2000-03-26 03:00:00 GMT 0     0 0.001667 0.008333 0 0     0.5322
  2000-03-26 04:00:00 GMT 0     0 0.001667 0.008333 0 0     0.5322

But when I try to extract the time part:

  time(rain.irts, tz="GMT")

An extract is:

  [2039] 2000-03-25 23:00:00 CET  2000-03-26 00:00:00 CET
  [2041] 2000-03-26 01:00:00 CET  2000-03-26 03:00:00 CEST
  [2043] 2000-03-26 05:00:00 CEST 2000-03-26 05:00:00 CEST

Isn't there a way for this time to be shown as 'GMT'? I guess it is shown sometimes as 'CET' and sometimes as 'CEST' depending on the lag between the locale and GMT (UTC) times. But for me this is an additional problem, as the output shows one or two hours more than UTC time.

Thanks all, and best regards,
Javier G.
NO bug in Re: [R] strptime() bug? And additional problem in package tseries
There is no bug in R here. There was a change to DST in Spain at 2am on 2000-03-26, and the times are *printed* in your locale, as documented. Please read the posting guide and FAQ about what is a bug. Also, please try not to confuse an object and its printed representation.

On Tue, 17 Aug 2004, javier garcia - CEBAS wrote:

> Hi all, I've got some problems with irts objects, one of which could be
> a bug:
>
> 1) Read a table with several columns from Postgres; the first column is
> a timestamp with time zone (this is OK). An extract of raincida$ts:
>
>   [2039] 25/03/2000 22:00:00 UTC 25/03/2000 23:00:00 UTC
>   [2041] 26/03/2000 00:00:00 UTC 26/03/2000 01:00:00 UTC
>   [2043] 26/03/2000 02:00:00 UTC 26/03/2000 03:00:00 UTC
>   [2045] 26/03/2000 04:00:00 UTC 26/03/2000 05:00:00 UTC
>
> 2) Try to extract time from this column of the dataframe (bug?)
>
>   lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")
>
> An extract is:

NO! That is an extract of *printing* lluvia.strptime, which will give you the times in your current time zone, as documented.

>   [2038] 2000-03-25 21:00:00 2000-03-25 22:00:00 2000-03-25 23:00:00
>   [2041] 2000-03-26 00:00:00 2000-03-26 01:00:00 2000-03-26 03:00:00
>   [2044] 2000-03-26 03:00:00 2000-03-26 04:00:00 2000-03-26 05:00:00
>
> Note that element [2043] is wrong. This happens several times in the
> dataset. This will produce an eventual error because of omitted and
> duplicated values.

I think you want to use

  as.POSIXct(lluvia.strptime, tz="GMT")

to get what you may have intended.

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA); Fax: +44 1865 272595
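A minimal sketch (not from the thread) of the fix: if the strings are parsed in GMT from the start, the local (Spanish) DST rules, which remove the hour 02:00 on 2000-03-26, never get a chance to apply. The tz argument to strptime() is assumed to be available (it is in current R; in older versions the as.POSIXct(lluvia.strptime, tz="GMT") route above is the way).

```r
## Sketch: parse character timestamps directly in GMT so local DST
## rules cannot shift or duplicate any value.
x <- c("26/03/2000 01:00:00", "26/03/2000 02:00:00", "26/03/2000 03:00:00")
gmt <- as.POSIXct(strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = "GMT"),
                  tz = "GMT")
format(gmt, tz = "GMT")   # hours stay exactly as written
diff(gmt)                 # three consecutive hours, no gap or duplicate
```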
[R] table and getting rownames
hi there

say that i have this table

  > x <- table(adoc, oarb)
  > x
        oarb
  adoc    0  1
    ab    1  0
    am    5  1
    ba   14  1
    cc  271  3
    ch   87  2
    dz  362  6
    fl    7  0
    fs   84  2

is there an easy way to get the row names or row numbers of rows with oarb==0, i.e. (ab, fl) or (1, 7)?

regards
soren
Re: [R] table and getting rownames
On Tue, 17 Aug 2004 [EMAIL PROTECTED] wrote:

> say that i have this table
>
>   x <- table(adoc, oarb)
>
> is there an easy way to get the row names or row numbers of rows with
> oarb==0

That seems to be with *entry* zero, not oarb == 0?

> i.e. (ab, fl) or (1, 7)

  row(x)[x == 0]
  rownames(x)[row(x)[x == 0]]

will do what I think you meant to ask.
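A runnable sketch of the suggested answer, with made-up counts shaped like the table in the question:

```r
## Hypothetical two-column table of counts, like x <- table(adoc, oarb).
x <- as.table(rbind(ab = c(1, 0), fl = c(7, 0), fs = c(84, 2)))
colnames(x) <- c("0", "1")
## row() gives each cell's row index; keep the rows where a cell is 0:
row(x)[x == 0]                # row numbers of zero entries
rownames(x)[row(x)[x == 0]]   # the corresponding row names
```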
[R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R
First, many thanks to Frank Harrell for once again helping me out. This actually relates to the next point, which is my contribution to the 'why don't social scientists use R' discussion.

I am a hybrid social scientist (child psychiatrist) who trained on SPSS. Many of my difficulties in coming to terms with R have been to do with trying to apply the logic underlying SPSS, with dire results. You do not want to know how long I spent looking for a 'recode' command in R, to change factor names and classes. I think the solution is to combine a graphical interface that encourages command-line use (such as Rcommander) with the analyse(this) paradigm suggested, but also explaining how one can a) display the code in a separate window ('page' is only an obvious command once you know it), and b) save one's modification, make it generally available, and not overwrite the unmodified version (again, thanks, Frank). Finally, one would need to change the emphasis in basic statistical teaching from 'the right test' to 'the right model'. That should get people used to R's logic.

If a rabbit starts to use R, s/he is likely to head for the help files associated with each function, which can assume that the reader can make sense of gnomic utterances like "Omit 'var' to impute all variables, creating new variables in 'search' position 'where'". I still don't know what that one means (as I don't understand search positions, or why they're important). This can be very off-putting, and could lead the rabbit to return to familiar SPSS territory.

Finally, friendlier error messages would also help. It took me 3 days, and opening every function I could, to work out that '...cannot find function xxx.data.frame...' meant that MICE was unable to make a polychotomous logistic imputation model converge for the variable immediately preceding it.
I am now off to the help files and FAQs to find out how to change graph parameters, as the plot.mids function in MICE a) doesn't allow one to select a subset of variables, and b) tells me that the graph it wants to produce on the whole of my 26-variable dataset is too big to fit on the (windows) plotting device. Unless anyone wants to tell me how/where? (Which of course is why, in the end, R is EASIER to use than SPSS.)
Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R
I'm just curious, but how do social scientists, or anyone else for that matter, learn SPSS, besides taking a class?

-roger

[EMAIL PROTECTED] wrote:
> First, many thanks to Frank Harrell for once again helping me out. This
> actually relates to the next point, which is my contribution to the 'why
> don't social scientists use R' discussion. [...]
Re: [R] using nls to fit a four parameter logistic model
Shalini,

I think your hill equation is meant to just be an alternative parameterization of the four-parameter logistic (BTW, the hill *coefficient* is a function of the slope parameter of the FPL, but I don't believe "hill equation" is standard terminology). Note that conc is the input in this parameterization, not log(conc):

  > nls(log(il10) ~ A + (B-A)/(1 + (conc/xmid)^scal), data=test,
  +     start = list(A=3.5, B=15, xmid=600, scal=1/2.5))
  Nonlinear regression model
    model:  log(il10) ~ A + (B - A)/(1 + (conc/xmid)^scal)
     data:  test
            A           B        xmid        scal
   14.7051665   3.7964534 607.9822962   0.3987786
  residual sum-of-squares:  0.1667462

To see the equivalence to the other parameterization that you used, note:

  > 1/2.507653
  [1] 0.3987793
  > log(607.9822962)
  [1] 6.410146

--Jim

Message: 17
Date: Mon, 16 Aug 2004 11:25:57 -0500
From: [EMAIL PROTECTED]
Subject: [R] using nls to fit a four parameter logistic model

I am working on what appears to be a fairly simple problem for the following data:

  test = data.frame(cbind(conc=c(25000, 12500, 6250, 3125, 1513, 781, 391,
                                 195, 97.7, 48.4, 24, 12, 6, 3, 1.5, 0.001),
                          il10=c(330269, 216875, 104613, 51372, 26842, 13256,
                                 7255, 3049, 1849, 743, 480, 255, 241, 128,
                                 103, 50)))

I am able to fit the above data to the equation

  > nls(log(il10) ~ A + (B-A)/(1 + exp((xmid-log(conc))/scal)), data=test,
  +     start = list(A=log(0.001), B=log(10), xmid=log(6000), scal=0.8))
  Nonlinear regression model
    model:  log(il10) ~ A + (B - A)/(1 + exp((xmid - log(conc))/scal))
     data:  test
           A         B      xmid      scal
    3.796457 14.705159  6.410144  2.507653
  residual sum-of-squares:  0.1667462

But in attempting to achieve a fit to what is commonly known as the hill equation, which is a four parameter fit that is used widely in biological data analysis:

  > nls(log(il10) ~ A + (B-A)/(1 + (log(conc)/xmid)^scal), data=test,
  +     start = list(A=log(0.001), B=log(10), xmid=log(6000), scal=0.8))
  Nonlinear regression model
    model:  log(il10) ~ A + (B - A)/(1 +
  (log(conc)/xmid)^scal)
  Error in numericDeriv(form[[3]], names(ind), env) :
          Missing value or an Infinity produced when evaluating the model

Please would someone offer a suggestion?

Shalini

James A. Rogers
Manager, Nonclinical Statistics
PGRD Groton Labs
Eastern Point Road (MS 260-1331)
Groton, CT 06340
office: (860) 686-0786; fax: (860) 715-5445
[R] Bug in colnames of data.frames?
Hi,

I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0. I have a data.frame, e.g.:

  myData <- data.frame( var1 = c(1:4), var2 = c(5:8) )

If I add a new column by

  myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]

everything is fine, but if I omit the commas:

  myData$var4 <- myData[ "var1" ] + myData[ "var2" ]

the name shown above the 4th column is not "var4":

  > myData
    var1 var2 var3 var1
  1    1    5    6    6
  2    2    6    8    8
  3    3    7   10   10
  4    4    8   12   12

but names() and colnames() return the expected name:

  > names( myData )
  [1] "var1" "var2" "var3" "var4"
  > colnames( myData )
  [1] "var1" "var2" "var3" "var4"

And it is even worse: I am not able to change the name shown above the 4th column:

  > names( myData )[ 4 ] <- "var5"
  > myData
    var1 var2 var3 var1
  1    1    5    6    6
  2    2    6    8    8
  3    3    7   10   10
  4    4    8   12   12

I guess that this is a bug, isn't it?

Arne
Re: [R] Bug in colnames of data.frames?
Arne Henningsen wrote:

> I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0. I have a
> data.frame, e.g.:
>
>   myData <- data.frame( var1 = c(1:4), var2 = c(5:8) )
>
> If I add a new column by
>
>   myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]
>
> everything is fine, but if I omit the commas:
>
>   myData$var4 <- myData[ "var1" ] + myData[ "var2" ]

This bug is the user ... ;-)

Type:

  > str(myData)
  `data.frame':   4 obs. of  3 variables:
   $ var1: int  1 2 3 4
   $ var2: int  5 6 7 8
   $ var4:`data.frame':      4 obs. of  1 variable:
    ..$ var1: int  6 8 10 12

Aha! You have created a data.frame consisting of one column! What you really mean is:

  myData$var5 <- myData[[ "var1" ]] + myData[[ "var2" ]]

Uwe Ligges
[R] Fwd: strptime() problem?
Hi all;

I've already sent a similar e-mail to the list and Prof. Brian Ripley answered me, but my doubts remain unresolved. Thanks for the clarification, but perhaps I wasn't clear enough in posting my questions.

I've got a postgres database which I read into R. The first column is a timestamp with time zone, and my data are already in UTC format. A printed extract of the R character column resulting from the timestamptz field is:

raincida$ts:

  [2039] 25/03/2000 22:00:00 UTC 25/03/2000 23:00:00 UTC
  [2041] 26/03/2000 00:00:00 UTC 26/03/2000 01:00:00 UTC
  [2043] 26/03/2000 02:00:00 UTC 26/03/2000 03:00:00 UTC
  [2045] 26/03/2000 04:00:00 UTC 26/03/2000 05:00:00 UTC

And I need to convert this character column into POSIXct for eventual work. As I can see in the documentation, the process is to use strptime(), which creates a POSIXlt object and doesn't allow one to specify that the time zone of the data is already UTC, followed by as.POSIXct():

  lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")
  lluvia.strptime.POSIXct <- as.POSIXct(lluvia.strptime, tz="GMT")

A printed extract is:

  [2039] 2000-03-25 22:00:00 GMT 2000-03-25 23:00:00 GMT
  [2041] 2000-03-26 00:00:00 GMT 2000-03-26 01:00:00 GMT
  [2043] 2000-03-26 03:00:00 GMT 2000-03-26 03:00:00 GMT
  [2045] 2000-03-26 04:00:00 GMT 2000-03-26 05:00:00 GMT

As we can see, elements [2043] differ. Shouldn't they be the same as the rest of the shown elements? I thought this was a bug, but it seems that I've got a conceptual error(?). This happens several times in my data, and produces eventual errors.

Please, how could I resolve this?

Thanks all, and best regards,
Javier G.
Re: [R] Bug in colnames of data.frames?
Arne Henningsen [EMAIL PROTECTED] writes:

> I am using R 1.9.1 on an i686 PC with SuSE Linux 9.0. I have a
> data.frame, e.g.:
>
>   myData <- data.frame( var1 = c(1:4), var2 = c(5:8) )
>
> If I add a new column by
>
>   myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]
>
> everything is fine, but if I omit the commas:
>
>   myData$var4 <- myData[ "var1" ] + myData[ "var2" ]
>
> the name shown above the 4th column is not "var4" [...]
>
> I guess that this is a bug, isn't it?

Nope:

  > str(myData)
  `data.frame':   4 obs. of  4 variables:
   $ var1: int  1 2 3 4
   $ var2: int  5 6 7 8
   $ var3: int  6 8 10 12
   $ var4:`data.frame':      4 obs. of  1 variable:
    ..$ var1: int  6 8 10 12

It's slightly peculiar, but if a column of a data frame is itself a rectangular structure (data frame or matrix), then the innermost names are used. Cf.

  > myData[,"var4"] <- cbind(xyzzy=5:2)
  > myData
    var1 var2 var3 xyzzy
  1    1    5    6     5
  2    2    6    8     4
  3    3    7   10     3
  4    4    8   12     2

Arguably, one might prefer

    var1 var2 var3 var4
  1    1    5    6     5
  2    2    6    8     4
  3    3    7   10     3
  4    4    8   12     2

or something like that, but it's hardly a bug.

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
Re: [R] Bug in colnames of data.frames? -- NOT
This is not a bug, and BTW data frames have names, not colnames. As I have said already today, don't confuse the printed representation of an object with the object itself.

On Tue, 17 Aug 2004, Arne Henningsen wrote:

> And it is even worse: I am not able to change the name shown above the
> 4th column [...]
>
> I guess that this is a bug, isn't it?

No. Take a look at the fourth column more carefully.

  > myData[4]
    var1
  1    6
  2    8
  3   10
  4   12
  > class(myData[4])
  [1] "data.frame"

You included a single-column data frame in your data frame.
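A runnable sketch of the distinction the replies above turn on: single brackets on a data frame return a one-column data frame, double brackets (or $) return the underlying vector, so new columns should be built from vectors.

```r
## Single vs. double brackets on a data frame:
myData <- data.frame(var1 = 1:4, var2 = 5:8)
class(myData["var1"])       # "data.frame" -- a one-column data frame
class(myData[["var1"]])     # "integer"    -- the column vector itself
## Building the new column from vectors gives an ordinary column:
myData$var4 <- myData[["var1"]] + myData[["var2"]]
str(myData$var4)            # plain integer column, prints as "var4"
```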
[R] levels of factor
R-help,

I have a data frame which I subset like:

  a <- subset(df, df$column2 %in% c("factor1","factor2") & df$column2 == 1)

But when I type levels(a$column2) I still get the same levels as in df (my original data frame). Why is that? Is it right?

Luis

Luis Ridao Cruz
Fiskirannsóknarstovan
Nóatún 1, P.O. Box 3051, FR-110 Tórshavn, Faroe Islands
Phone: +298 353900; Phone (direct): +298 353912; Mobile: +298 580800
Fax: +298 353901
E-mail: [EMAIL PROTECTED]
Web: www.frs.fo
Re: [R] using nls to fit a four parameter logistic model
In your second model, log(conc) is negative for conc = 0.001. This observation will generate NA for (log(conc)/xmid)^scal unless scal is an integer or xmid is also negative. In the latter case, (log(conc)/xmid)^scal will be NA for all but that last value unless scal is an integer. What do your biological references do with this model for concentrations less than 1?

If you delete that observation, the algorithm can still die testing a value for xmid <= 0. To avoid these cases, I routinely parameterize problems like this in terms of ln.xmid, something like the following:

  log(il10) ~ A + (B-A)/(1 + (log(conc)/exp(ln.xmid))^scal)

Hope this helps.
Spencer Graves

[EMAIL PROTECTED] wrote:

> Shalini Raghavan, 3M Pharmaceuticals Research
> Building 270-03-A-10, 3M Center, St. Paul, MN 55144
> E-mail: [EMAIL PROTECTED]  Tel: 651-736-2575  Fax: 651-733-5096
>
> I am working on what appears to be a fairly simple problem for the
> following data:
>
>   test = data.frame(cbind(conc=c(25000, 12500, 6250, 3125, 1513, 781, 391,
>                                  195, 97.7, 48.4, 24, 12, 6, 3, 1.5, 0.001),
>                           il10=c(330269, 216875, 104613, 51372, 26842,
>                                  13256, 7255, 3049, 1849, 743, 480, 255,
>                                  241, 128, 103, 50)))
>   > test
>           conc   il10
>   1  25000.000 330269
>   2  12500.000 216875
>   3   6250.000 104613
>   4   3125.000  51372
>   5   1513.000  26842
>   6    781.000  13256
>   7    391.000   7255
>   8    195.000   3049
>   9     97.700   1849
>   10    48.400    743
>   11    24.000    480
>   12    12.000    255
>   13     6.000    241
>   14     3.000    128
>   15     1.500    103
>   16     0.001     50
>
> I am able to fit the above data to the equation
>
>   > nls(log(il10) ~ A + (B-A)/(1 + exp((xmid-log(conc))/scal)), data=test,
>   +     start = list(A=log(0.001), B=log(10), xmid=log(6000), scal=0.8))
>   Nonlinear regression model
>     model:  log(il10) ~ A + (B - A)/(1 + exp((xmid - log(conc))/scal))
>      data:  test
>            A         B      xmid      scal
>     3.796457 14.705159  6.410144  2.507653
>   residual sum-of-squares:  0.1667462
>
> But in attempting to achieve a fit to what is commonly known as the hill
> equation, which is a four parameter fit that is used widely in biological
> data analysis:
>
>   > nls(log(il10) ~ A + (B-A)/(1 + (log(conc)/xmid)^scal), data=test,
>   +     start = list(A=log(0.001), B=log(10), xmid=log(6000), scal=0.8))
>   Nonlinear regression model
>     model:  log(il10) ~ A + (B - A)/(1 + (log(conc)/xmid)^scal)
>   Error in numericDeriv(form[[3]], names(ind), env) :
>           Missing value or an Infinity produced when evaluating the model
>
> Please would someone offer a suggestion
>
> Shalini
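A runnable sketch combining the two suggestions in this thread: Jim's conc-based parameterization with Spencer's exp(ln.xmid) trick, using the data from the original post. The starting values are guesses, informed by the fitted values quoted above.

```r
## Data from the original post.
test <- data.frame(conc = c(25000, 12500, 6250, 3125, 1513, 781, 391, 195,
                            97.7, 48.4, 24, 12, 6, 3, 1.5, 0.001),
                   il10 = c(330269, 216875, 104613, 51372, 26842, 13256,
                            7255, 3049, 1849, 743, 480, 255, 241, 128,
                            103, 50))
## exp(ln.xmid) is positive for any real ln.xmid, so the optimizer can
## never wander into xmid <= 0 and the power term stays well defined.
fit <- nls(log(il10) ~ A + (B - A)/(1 + (conc/exp(ln.xmid))^scal),
           data = test,
           start = list(A = 3.5, B = 15, ln.xmid = log(600), scal = 0.4))
coef(fit)
```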
Re: [R] Bug in colnames of data.frames?
On Tue, 2004-08-17 at 09:01, Arne Henningsen wrote:

> If I add a new column by
>
>   myData$var3 <- myData[ , "var1" ] + myData[ , "var2" ]
>
> everything is fine, but if I omit the commas:
>
>   myData$var4 <- myData[ "var1" ] + myData[ "var2" ]
>
> the name shown above the 4th column is not "var4" [...]
>
> I guess that this is a bug, isn't it?
>
> Arne

Here is a hint:

  # This returns an integer vector
  > str(myData[ , "var1" ] + myData[ , "var2" ])
   int [1:4] 6 8 10 12

  # This returns a data.frame
  > str(myData[ "var1" ] + myData[ "var2" ])
  `data.frame':   4 obs. of  1 variable:
   $ var1: int  6 8 10 12

  > str(myData)
  `data.frame':   4 obs. of  4 variables:
   $ var1: int  1 2 3 4
   $ var2: int  5 6 7 8
   $ var3: int  6 8 10 12
   $ var4:`data.frame':      4 obs. of  1 variable:
    ..$ var1: int  6 8 10 12

Take a look at the details, value and coercion sections of ?.data.frame

HTH,

Marc Schwartz
Re: [R] Bug in colnames of data.frames?
On Tue, 2004-08-17 at 09:34, Marc Schwartz wrote:

> Take a look at the details, value and coercion sections of ?.data.frame

This must be my week for typos. That should be:

  ?'[.data.frame'   (in ESS)

or

  ?"[.data.frame"   (otherwise)

Marc
Re: [R] levels of factor
On Tue, 2004-08-17 at 09:30, Luis Rideau Cruz wrote:

> I have a data frame which I subset like:
>
>   a <- subset(df, df$column2 %in% c("factor1","factor2") & df$column2 == 1)
>
> But when I type levels(a$column2) I still get the same levels as in df
> (my original data frame)
>
> Why is that?

The default for "[.factor" is:

  x[i, drop = FALSE]

Hence, unused factor levels are retained.

> Is it right?

Yes. If you want to explicitly recode the factor based upon only those levels that are actually in use, you can do something like the following:

  a <- factor(a)

However, I am a bit unclear as to the logic of the subset statement that you are using, perhaps b/c I don't know what your data is. You seem to be subsetting 'column2' on both the factor levels and a presumed numeric code. Is that really what you want to do? You might want to review the Warning section in ?factor.

BTW, when using subset(), the evaluation takes place within the data frame, so you do not need to use df$column2 in the function call. You can just use column2, for example:

  subset(df, column2 %in% c("factor1", "factor2"))

See ?factor and ?"[.factor" for more information.

HTH,

Marc Schwartz
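A runnable sketch of the behavior with hypothetical data: subsetting keeps the unused levels, and re-applying factor() drops them (later versions of R also provide droplevels() for this).

```r
## Hypothetical data frame with a three-level factor column.
df2 <- data.frame(column2 = factor(c("a", "a", "b", "c")))
a <- subset(df2, column2 %in% c("a", "b"))
levels(a$column2)             # still "a" "b" "c": unused "c" is retained
a$column2 <- factor(a$column2)
levels(a$column2)             # now just the levels actually in use
```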
Re: [R] Fwd: strptime() problem?
javier garcia - CEBAS <rn001 at cebas.csic.es> writes:
: 
: Hi all;
: I've already sent a similar e-mail to the list and Prof. Brian Ripley
: answered me, but my doubts remain unresolved. Thanks for the clarification,
: but perhaps I wasn't clear enough in posting my questions.
: 
: I've got a postgres database which I read into R. The first column is
: timestamp with timezone, and my data are already in UTC format. A printed
: extract of the R character column, resulting from the timestamptz field, is:
: 
: raincida$ts:
: 
: [2039] "25/03/2000 22:00:00 UTC" "25/03/2000 23:00:00 UTC"
: [2041] "26/03/2000 00:00:00 UTC" "26/03/2000 01:00:00 UTC"
: [2043] "26/03/2000 02:00:00 UTC" "26/03/2000 03:00:00 UTC"
: [2045] "26/03/2000 04:00:00 UTC" "26/03/2000 05:00:00 UTC"
: 
: # And I need to convert this character column into POSIXct for eventual work.
: # As I can see in the documentation, the process is to use strptime(), which
: # creates a POSIXlt object and doesn't allow one to specify that the time
: # zone of the data is already UTC; followed by as.POSIXct():
: 
: lluvia.strptime <- strptime(raincida$ts, format="%d/%m/%Y %H:%M:%S")
: lluvia.strptime.POSIXct <- as.POSIXct(lluvia.strptime, tz="GMT")
: 
: A printed extract is:
: 
: [2039] "2000-03-25 22:00:00 GMT" "2000-03-25 23:00:00 GMT"
: [2041] "2000-03-26 00:00:00 GMT" "2000-03-26 01:00:00 GMT"
: [2043] "2000-03-26 03:00:00 GMT" "2000-03-26 03:00:00 GMT"
: [2045] "2000-03-26 04:00:00 GMT" "2000-03-26 05:00:00 GMT"
: 
: As we can see, elements [2043] differ. Shouldn't they match, like the rest
: of the shown elements? I thought this was a bug, but it seems that I've got
: a conceptual error (?). This happens several times in my data and produces
: eventual errors.
: 
: Please, how could I resolve this?

[Sorry if this gets posted twice. I had a problem posting and am not sure if the first one ever got sent.]
I am in a different time zone, EDT, on Windows XP, and can't replicate this, but you might try reading the latest R News article on dates and times for some ideas, viz. page 32 of:

  http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

In particular, try converting the datetimes to chron and then doing your manipulations in chron, or else converting them from chron to POSIXct, rather than going through POSIXlt:

  require(chron)
  r.asc <- raincida$ts
  r.chron <- chron(substring(r.asc, 1, 10),
                   substring(r.asc, 12, 19),
                   format = c("d/m/y", "h:m:s"))
  r.ct <- as.POSIXct(r.chron)
  format(r.ct, tz="GMT")  # display in GMT
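For what it's worth, the jump at element [2043] looks like a daylight-saving artifact: 2000-03-26 02:00 does not exist in many European time zones, so a conversion that goes through local time shifts it. A sketch of parsing directly in GMT, which sidesteps local DST entirely (this assumes your strptime() accepts a tz argument, as current versions do):

```r
x <- c("25/03/2000 22:00:00", "26/03/2000 02:00:00", "26/03/2000 03:00:00")
lt <- strptime(x, format = "%d/%m/%Y %H:%M:%S", tz = "GMT")  # parse as GMT
ct <- as.POSIXct(lt, tz = "GMT")
format(ct, tz = "GMT")   # hours stay 22:00, 02:00, 03:00 -- no DST jump
```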
RE: [R] Fwd: strptime() problem?
Javier, I recently had a problem with dates. This example might shed some light on your problem.

  > x <- ISOdate(rep(2000,2), rep(3,2), rep(26,2), hour=0)
  > x
  [1] "2000-03-26 GMT" "2000-03-26 GMT"
  > unclass(x)
  [1] 954028800 954028800
  attr(,"tzone")
  [1] "GMT"

When one creates a date with ISOdate, the resulting object is of class POSIXct and is given the attribute "tzone", which is set to "GMT". When one prints an object of class POSIXct, the function print.POSIXct is called:

  > print.POSIXct
  function (x, ...)
  {
      print(format(x, usetz = TRUE, ...), ...)
      invisible(x)
  }
  <environment: namespace:base>

So, that function is just calling format, which gets dispatched to format.POSIXct:

  > format.POSIXct
  function (x, format = "", tz = "", usetz = FALSE, ...)
  {
      if (!inherits(x, "POSIXct"))
          stop("wrong class")
      if (missing(tz) && !is.null(tzone <- attr(x, "tzone")))
          tz <- tzone
      structure(format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...),
          names = names(x))
  }
  <environment: namespace:base>

Now, if one looks carefully at this code, you will see that it tests for the attribute "tzone" on the object that is passed in. If it finds that attribute, then it is passed on to format.POSIXlt (which is the function that ultimately does the printing). If there is no "tzone" attribute, then "" is passed to format.POSIXlt as the tzone, which causes the object to be printed in your locale-specific format. See:

  > attr(x, "tzone") <- ""
  > x
  [1] "2000-03-25 19:00:00 Eastern Standard Time" "2000-03-25 19:00:00 Eastern Standard Time"
  > attr(x, "tzone") <- "GMT"
  > x
  [1] "2000-03-26 GMT" "2000-03-26 GMT"

Now this is the part that really got me confused:

  > x
  [1] "2000-03-26 GMT" "2000-03-26 GMT"
  > x[1]
  [1] "2000-03-25 19:00:00 Eastern Standard Time"

What happens in the above case is that the code for "[.POSIXct" looks like this:

  > get("[.POSIXct")
  function (x, ..., drop = TRUE)
  {
      cl <- oldClass(x)
      class(x) <- NULL
      val <- NextMethod("[")
      class(val) <- cl
      val
  }
  <environment: namespace:base>

The attribute "tzone" is not preserved!!
When val is created from the call to NextMethod, its class is restored, but not its "tzone" attribute. So any dates of class POSIXct that are printed after they have been subscripted ([) will have their "tzone" attribute stripped, and will print in the locale-specific format.

For your specific case, I would convert all my dates to POSIXct, then set the attribute "tzone" to "GMT". After that, be very careful when subscripting them, or you will find them printing in locale-specific formats again.

For you:

  > y <- strptime("4/3/2000", format="%m/%d/%Y")
  > y
  [1] "2000-04-03"
  > y <- as.POSIXct(y, "GMT")
  > y
  [1] "2000-04-03 GMT"
  > unclass(y)
  [1] 954720000
  attr(,"tzone")
  [1] "GMT"

I think that should straighten out your problem.

Hope that helps, Whit
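A sketch of a workaround for the behaviour Whit describes, re-attaching the attribute after subscripting (in the R version discussed, "[" drops the attribute; later versions may preserve it):

```r
x <- ISOdate(rep(2000, 2), rep(3, 2), rep(26, 2), hour = 0)  # POSIXct, tzone "GMT"
x1 <- x[1]                             # subscripting may strip the tzone
attr(x1, "tzone") <- attr(x, "tzone")  # put the attribute back
x1                                     # prints in GMT again
```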
Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R
A few comments: First, your remarks are interesting and, I would say, mainly well founded. However, I think they are in many respects irrelevant, although they do point to the much bigger underlying issue, which Roger Peng also hinted at in his reply. I think they are sensible because R IS difficult; the documentation is often challenging, which is not surprising given (a) the inherent complexity of R; (b) the difficulty in writing good documentation, especially when many of the functions being documented are inherently technical, so subject matter knowledge (CS, statistics, numerical analysis, ...) must be assumed; (c) the documentation has been written by a variety of mostly statistical types as a sidelight of their main professional activities -- none of these writers are **professional documenters** (whatever that may mean) and some of them even speak English as a second or third language. My own take is that the documentation for Core R and many of the packages is remarkably well done given these realities, and my hat is off to those who have produced it. Nevertheless, I agree, it is challenging -- it MUST be. But the remarks are irrelevant because the fundamental issue **is** that there is an inherent tension between ease of use and power/flexibility. Writing good GUI's for anything is hard, very hard. For a project such as R, it doesn't make sense, although it may make sense to write GUI's for small subsets of R targeted at specific audiences (as in BioConductor, RCommander, etc.). But even this is hard to do well and takes a lot of time and effort. So, IMHO, there never will be, nor ever should/could be, an overall GUI for R: it is too complex and needs to be too extensible and flexible to constrain it in that way.
However, I believe the larger question that both you and Roger Peng hint at is more important: not how does a social scientist learn to use R, but how does any scientist/technologist for whom experimental design and data analysis form a large component of their work gain the necessary technical background in statistics and related disciplines (linear algebra, numerical analysis, ...) to **know how to use the statistical tools they need that R provides**. Software like SPSS must assume a limited collection of methods to present to their customers in an effective GUI. Their strategy **must** be (this is NOT a criticism) to dumb it down so that they can provide coherent, albeit limited, data analysis strategies. As you have explicitly stated, users who wish to venture outside those narrow paradigms are simply out of luck. R was designed from the outset not to be so constrained, but the cost is that you must know a good deal to use it effectively. It is obvious from the questions posted to this list that even something as simple as lm() often demands from users technical statistical understanding far beyond what they have. So we see fairly frequently indications of misunderstanding and confusion in using R. But the problem isn't R -- it's that users don't know enough statistics. I wish I could say I had an answer for this, but I don't have a clue. I do not think it's fair to expect a mechanical engineer or psychologist or biologist to have the numerous math and statistics courses and the experience in their training that would provide the base they need. For one thing, they don't have the time in their studies for this; for another, they may not have the background or interest -- they are, after all, mechanical engineers or biologists, not statisticians. Unfortunately, they could do their jobs as engineers and scientists a lot better if they did know more statistics. To me, it's a fundamental conundrum, and no one is to blame.
It's just the reality, but it is the source of all kinds of frustrations on both sides of the statistical divide, which both you and Roger expressed in your own ways. Obviously, all of this is just personal ranting, so I would love to hear alternative views. And thanks again for your clear and interesting comments. Cheers, Bert

[EMAIL PROTECTED] wrote: First, many thanks to Frank Harrell for once again helping me out. This actually relates to the next point, which is my contribution to the 'why don't social scientists use R' discussion. I am a hybrid social scientist (child psychiatrist) who trained on SPSS. Many of my difficulties in coming to terms with R have been to do with trying to apply the logic underlying SPSS, with dire results. You do not want to know how long I spent looking for a 'recode' command in R, to change factor names and classes. I think the solution is to combine a graphical interface that encourages command line use (such as Rcommander) with the analyse(this) paradigm suggested, but also explaining how one can a) display the code in a separate window ('page' is only an obvious command once you know it), and b) how one can then save one's modification, make it generally available, and
Re: [R] An entire data frame which is a time-series?
Ajay Shah <ajayshah at mayin.org> writes:
: 
: I have:
: 
: raw <- read.table("monthly.text", skip=3, sep="|",
:     col.names=c("junk", "junk2",
:                 "wpi", "g.wpi", "wpi.primary", "g.wpi.primary",
:                 "wpi.fuel", "g.wpi.fuel", "wpi.manuf", "g.wpi.manuf",
:                 "cpi.iw", "g.cpi.iw", "cpi.unme", "g.cpi.unme",
:                 "cpi.al", "g.cpi.al", "cpi.rl", "g.cpi.rl"))
: 
: Now I can do things like:
: 
: g.wpi = ts(raw$g.wpi, frequency=12, start=c(1994,7))
: 
: and it works fine. One by one, I can make time-series objects.
: 
: Is there a way to tell R that the entire data frame is a set of
: time-series, so that I don't have to go column by column and make a
: new ts() out of each?
: 
: I tried:
: 
: M = ts(raw, frequency=12, start=c(1994,7))
: ts.plot(M[,"wpi"], M[,"wpi.manuf"])
: 
: but this gives nonsense results.

Converting a data frame to a ts object seems to work for me:

  R> my.df <- data.frame(a = 1:4, b = 5:8)
  R> my.ts <- ts(my.df, start=c(2000,4), freq=12)
  R> my.ts.a <- my.ts[,"a"]
  R> my.ts.a
       Apr May Jun Jul
  2000   1   2   3   4

Suggest you provide a small reproducible example that illustrates the problem.

: Also, syntax like M$wpi is a lot
: nicer than M[,"wpi"]. Any ideas about what might work?

  R> "$.ts" <- function(x, i) x[,i]
  R> my.ts$a
       Apr May Jun Jul
  2000   1   2   3   4
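A compact sketch tying the pieces of this reply together, with toy data standing in for the monthly series (column names borrowed from the question for illustration):

```r
raw <- data.frame(wpi = cumsum(rnorm(24)), wpi.manuf = cumsum(rnorm(24)))
M <- ts(raw, frequency = 12, start = c(1994, 7))  # multivariate ts ("mts")
ts.plot(M[, "wpi"], M[, "wpi.manuf"], col = 1:2, lty = 1:2)
"$.ts" <- function(x, i) x[, i]  # the convenience method suggested above
M$wpi                            # now works like M[, "wpi"]
```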
[R] aov summary to matrix
Is there an easy way of converting an aov summary into a matrix in which the rows are the factor names and the columns are Df, Sum Sq, Mean Sq, F value and Pr? For example, convert

              Df  Sum Sq  Mean Sq  F value    Pr(>F)
  block        5  343.29    68.66   4.4467  0.015939 *
  N            1  189.28   189.28  12.2587  0.004372 **
  P            1    8.40     8.40   0.5441  0.474904
  K            1   95.20    95.20   6.1657  0.028795 *
  N:P          1   21.28    21.28   1.3783  0.263165
  N:K          1   33.14    33.14   2.1460  0.168648
  P:K          1    0.48     0.48   0.0312  0.862752
  Residuals   12  185.29    15.44
  ---
  Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

to

  Factor      Df  Sum Sq  Mean Sq  F value        Pr
  block        5  343.29    68.66   4.4467  0.015939
  N            1  189.28   189.28  12.2587  0.004372
  P            1    8.40     8.40   0.5441  0.474904
  K            1   95.20    95.20   6.1657  0.028795
  N:P          1   21.28    21.28   1.3783  0.263165
  N:K          1   33.14    33.14   2.1460  0.168648
  P:K          1    0.48     0.48   0.0312  0.862752
  Residuals   12  185.29    15.44       NA        NA

Thanks, - Moises
Re: [R] aov summary to matrix
On Tue, 17 Aug 2004, Moises Hassan wrote:

> Is there an easy way of converting an aov summary into a matrix in
> which the rows are the factor names and the columns are Df, Sum Sq,
> Mean Sq, F value and Pr?

You are confusing the printed representation with the object (which seems to be today's favourite misconception).

  as.matrix(summary(npk.aov)[[1]])

is a matrix (to full precision) as you seek, although I would prefer to work with the data frame which is returned. (Note: your output is from MASS4 example(aov), unattributed.)

> For example, convert
>
>               Df  Sum Sq  Mean Sq  F value    Pr(>F)
>   block        5  343.29    68.66   4.4467  0.015939 *
>   N            1  189.28   189.28  12.2587  0.004372 **
>   P            1    8.40     8.40   0.5441  0.474904
>   K            1   95.20    95.20   6.1657  0.028795 *
>   N:P          1   21.28    21.28   1.3783  0.263165
>   N:K          1   33.14    33.14   2.1460  0.168648
>   P:K          1    0.48     0.48   0.0312  0.862752
>   Residuals   12  185.29    15.44
>   ---
>   Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> to
>
>   Factor      Df  Sum Sq  Mean Sq  F value        Pr
>   block        5  343.29    68.66   4.4467  0.015939
>   N            1  189.28   189.28  12.2587  0.004372
>   P            1    8.40     8.40   0.5441  0.474904
>   K            1   95.20    95.20   6.1657  0.028795
>   N:P          1   21.28    21.28   1.3783  0.263165
>   N:K          1   33.14    33.14   2.1460  0.168648
>   P:K          1    0.48     0.48   0.0312  0.862752
>   Residuals   12  185.29    15.44       NA        NA

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
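A sketch of the suggested conversion in full (the npk objects here come from the example(aov) code mentioned above; object names assumed):

```r
example(aov)                            # creates npk.aov, among others
tab <- as.matrix(summary(npk.aov)[[1]]) # numeric matrix, full precision
rownames(tab)                           # term names, e.g. "block", "N", ...
tab[, "Pr(>F)"]                         # p-values, NA for Residuals
```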
Re: [R] survdiff
Krista Haanstra <[EMAIL PROTECTED]> writes:

> As I am quite an ignorant user of R, excuse me for any wrongful usage
> of all the terms. My question relates to the statistics behind the
> survdiff function in the package survival. My textbook knowledge of the
> logrank test tells me that if I want to compare two survival curves, I
> have to take the sum of the factors (O-E)^2/E of both groups, which
> will give me the Chisq. If I calculate this by hand, I get a different
> value than the one R is giving me. Actually, the (O-E)^2/E values that
> R gives me I agree with, but if I then take the sum, this is not the
> Chisq R gives. Two questions:
> - How is Chisq calculated in R?
> - What does the column (O-E)^2/V mean? What is V, and how does this
>   possibly relate to the calculated Chisq?

You really need to read a theory book for this, but here's the basic idea: V is the theoretical variance of O-E for the first group. If O-E is approximately normally distributed, as it will be in large samples, then (O-E)^2/V will be approximately chi-squared distributed on 1 DF. In *other* models, notably those for contingency tables, the same idea works out as the familiar sum((O-E)^2/E) formula. That formula has historically been used for the logrank test too, and it still appears in some textbooks, but as it turns out, it is not actually correct (although often quite close).

[To fix ideas, consider testing for a given p in the binomial distribution. You can either say O = x, E = np, V = npq and get

  chisq = (x-np)^2/npq

or have O = (x, n-x), E = (np, nq) and get

  chisq = (x-np)^2/np + ((n-x) - nq)^2/nq

and a little algebra shows that the latter expression is

  = (x-np)^2 * (1/np + 1/nq) = (x-np)^2 * (p+q)/npq

so the two formulas are one and the same. In this case!]

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED])                     FAX: (+45) 35327907
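The closing binomial identity is easy to check numerically; a quick sketch with invented numbers:

```r
n <- 50; x <- 18; p <- 0.3; q <- 1 - p
O <- x; E <- n * p; V <- n * p * q
chisq.v <- (O - E)^2 / V                                  # (O-E)^2/V form
chisq.e <- (x - n*p)^2/(n*p) + ((n - x) - n*q)^2/(n*q)    # sum((O-E)^2/E) form
all.equal(chisq.v, chisq.e)   # TRUE: identical for the binomial case
```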
Re: [R] aov summary to matrix
Moises Hassan wrote:

> Is there an easy way of converting an aov summary into a matrix in
> which the rows are the factor names and the columns are Df, Sum Sq,
> Mean Sq, F value and Pr? For example, convert
>
>               Df  Sum Sq  Mean Sq  F value    Pr(>F)
>   block        5  343.29    68.66   4.4467  0.015939 *
>   N            1  189.28   189.28  12.2587  0.004372 **
>   P            1    8.40     8.40   0.5441  0.474904
>   K            1   95.20    95.20   6.1657  0.028795 *
>   N:P          1   21.28    21.28   1.3783  0.263165
>   N:K          1   33.14    33.14   2.1460  0.168648
>   P:K          1    0.48     0.48   0.0312  0.862752
>   Residuals   12  185.29    15.44
>   ---
>   Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> to
>
>   Factor      Df  Sum Sq  Mean Sq  F value        Pr
>   block        5  343.29    68.66   4.4467  0.015939
>   N            1  189.28   189.28  12.2587  0.004372
>   P            1    8.40     8.40   0.5441  0.474904
>   K            1   95.20    95.20   6.1657  0.028795
>   N:P          1   21.28    21.28   1.3783  0.263165
>   N:K          1   33.14    33.14   2.1460  0.168648
>   P:K          1    0.48     0.48   0.0312  0.862752
>   Residuals   12  185.29    15.44       NA        NA

Try this:

  example(aov)

  as.data.frame.summary.aovlist <- function(x) {
    if (length(x) == 1) {
      as.data.frame(x[[1]])
    } else {
      lapply(unlist(x, FALSE), as.data.frame)
    }
  }

  x1 <- summary(npk.aov)
  x2 <- summary(npk.aovE)
  as.data.frame(x1)
  as.data.frame(x2)

--sundar
Re: [R] Bug in colnames of data.frames?
On Tue, 17 Aug 2004, Arne Henningsen wrote:

> Thank you for all your answers! I agree with you that it is not a bug.
> My mistake was that I thought that a data frame is similar to a matrix,
> but as ?data.frame says, they "... share many of the properties of
> matrices and of lists. ..." I think the current presentation
>
>   > myData
>     var1 var2 var3 xyzzy
>   1    1    5    6     5
>   2    2    6    8     4
>   3    3    7   10     3
>   4    4    8   12     2
>
> is confusing because it is not directly (without another command like
> str()) apparent why myData[[ "var1" ]] works fine while
> myData[[ "xyzzy" ]] does not.

In some ways it is a bug -- in the documentation, print.data.frame, or format.data.frame. Consider assigning a wider data frame to var4:

  myData <- data.frame(matrix(1:12, 4),
                       var4 = I(data.frame(xyzzy = 5:2, plugh = 1:4)))
  myData  # error
  class(myData[["var4"]]) <- "data.frame"
  myData  # gives indications of sub-variables by var4.xyzzy, var4.plugh
  myData[["var4.plugh"]]  # NULL
  myData[["var4"]][["plugh"]]
  str(myData)

By the way, is there a way of making such an assignment in one step without the I() class() hack?

dave
-- 
Dave Forrest
[EMAIL PROTECTED]  (804)684-7900w
[EMAIL PROTECTED]  (804)642-0662h
http://maplepark.com/~drf5n/
Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R
On Tuesday 17 August 2004 06:14, Roger D. Peng wrote:

> I'm just curious, but how do social scientists, or anyone else for that
> matter, learn SPSS, besides taking a class?

They sit down with a book, a computer, and data they desperately need to analyze, and start working. SPSS documentation and some of the third-party works are fairly thorough and pretty gentle, and the writing fits the expectations of someone who has had only the initiatory stats courses. Your teacher emphasizes checking the normality of the data, so you look for the means of measuring it and the tests that tell you whether it is significant or not, after very carefully considering the nature of your data in the light of the assumptions the SPSS tests make. You are far less concerned with the real mathematical mechanics than you are about meeting the expectations of the professor. SPSS, SYSTAT, NCSS and similar programs all support this kind of work. Many social science professors don't really know enough to judge your work beyond similar expectations THEY learned from their own professors. It's sad, but that's the way it works in many schools. J
Re: [R] Re: Thanks Frank, setting graph parameters, and why social scientists don't use R
On Tuesday 17 August 2004 09:20, Berton Gunter wrote:

> A few comments:

It has been decades since I used SPSS. At that time, to really work with it you edited a text-file program that identified the data file and the variable columns you wanted to work with. You assembled the flow of work commands after carefully going through the SPSS documentation. After you were ready, you ran the program and crossed your fingers. R IS complex, but usability at a basic level is readily achievable. What it lacks is simply the "Stat 1" and "Stat 101" packages that lead users from the very basics covered in introductory statistics texts into the more profound analyses that so many R users are interested in. There are some texts, such as Peter Dalgaard's Introductory Statistics with R, which is a very useful book. However, from a student's viewpoint, Chapter 1 focuses on R, everything from the R language to R programming. The statistics chapters that follow almost seem to be used as an adjunct to teaching R rather than vice versa. For some social science students, a package that leads more gradually into R would probably be a big help in learning the language while getting their feet wet in statistics. John
[R] logistic -normal model
I am working with a logistic-normal model (i.e., a GLMM with a random intercept) by Bayesian methods. But I have met some difficulties in programming it in R. Does anyone have experience with this model, or R code I can refer to as an example? Thanks for your help. Syl
[R] creating a plot
Hi, I have a time series plot to produce, yet I want the x-axis to be labelled with dates (stored in another array) and not with observation numbers. Can anyone suggest how? Thanks. Konrad
Re: [R] creating a plot
--- Konrad Banachewicz <[EMAIL PROTECTED]> wrote:

> Hi, I have a time series plot to produce, yet I want the x-axis to be
> labelled with dates (stored in another array) and not with observation
> numbers. Can anyone suggest how? Thanks. Konrad

Try checking out http://www.medepi.net/data/wnv/index.html (the example at the bottom of the page).

Tomas

=====
Tomas Aragon, MD, DrPH, Director
Center for Infectious Disease Preparedness
UC Berkeley School of Public Health
1918 University Ave., 4th Fl., MC-7350
Berkeley, CA 94720-7350
http://www.idready.org
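A minimal sketch of the usual base-graphics approach (toy data; the idea is to suppress the default numeric axis and draw labels from your own date vector):

```r
dates <- seq(as.Date("2004-01-01"), by = "month", length = 12)
y <- rnorm(12)
plot(dates, y, type = "l", xaxt = "n", xlab = "", ylab = "value")
axis(1, at = dates, labels = format(dates, "%b %Y"), las = 2)  # date labels
```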