Re: [R] R and SPSS
Hi, There exists a R plug-in for SPSS. You can find it on the SPSS website. Hope it helps. Alain Liviu Andronic wrote: Hello, On Wed, Nov 26, 2008 at 9:25 PM, Applejus [EMAIL PROTECTED] wrote: I have a code in R. Could anyone give me the best possible way (or just ways!) to integrate it in SPSS? I would doubt you could do this, but for the least provide commented, minimal, self-contained, reproducible code. It would help if you were more specific. Liviu -- Alain Guillet Statistician and Computer Scientist SMCS - Institut de statistique - Université catholique de Louvain Bureau d.126 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in sqlCopy in RODBC
BKMooney wrote: I tried your suggestion... library(RODBC) channel = odbcConnectAccess(db.mdb) sqlCopy(channel,Select * from tab,newtab,destchannel=channel, safer=TRUE,append=TRUE,rownames=FALSE,fast=FALSE) odbcClose(channel) however, I am still running into errors, both when appending to an existing table, or creating a table if the destination table does not exist. The code I am using is: query - select * from tblHistorical where MyDate between '2008-11-21' and '2008-11-25' ; sqlCopy(RemoteChannel, query, NewTable, destchannel=LocalChannel, safer=TRUE, append=TRUE, rownames=FALSE, fast=FALSE) This is confusing: did you get an error trying my code, or did you get an error with your code? RODBC is very context dependent, and chances are close to zero that you get a useful reply when you do not provide an example or even not run the example provided in your environment. Did you try the select statement directly in a query? '2008-11-25' or #2008-11-25#? Dieter -- View this message in context: http://www.nabble.com/Error-in-sqlCopy-in-RODBC-tp20691929p20715135.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Welcome to the R-help mailing list (Digest mode)
Hi friends, Is there anyone who happened to import data set in the DL format into R for further analysis? In the package of network, there's only a method named read.paj(). So now, I have to get the dl file from the original data set, and use UCINET to convert it to .net file. It's too complicated. :( In the package of igraph, edgelist is one the format which can be imported, however, weight can not be expressed. As for the format ncol, weight is considered, but it's always undirected. Thank you for comments and suggestions! Best! Weijia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Welcome to the R-help mailing list (Digest mode)
With igraph, do read.graph(filename, format=ncol, directed=TRUE) I just noticed that the 'directed' option is not documented, sorry about that, Gabor ps. please give a proper subject to your emails... On Thu, Nov 27, 2008 at 9:54 AM, Weijia You [EMAIL PROTECTED] wrote: Hi friends, Is there anyone who happened to import data set in the DL format into R for further analysis? In the package of network, there's only a method named read.paj(). So now, I have to get the dl file from the original data set, and use UCINET to convert it to .net file. It's too complicated. :( In the package of igraph, edgelist is one the format which can be imported, however, weight can not be expressed. As for the format ncol, weight is considered, but it's always undirected. Thank you for comments and suggestions! Best! Weijia [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gabor Csardi [EMAIL PROTECTED] UNIL DGM __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Regression Problem for loop
Dear all, I have wrote a code for a linear regression. I want to write a loop for so, that I can get estimate for pavlues for six predictors. But I am getting for estmate for only last one. How can I get pvalues for all my predictors in a loop?? Anticipating your help Thanks Ales mat-matrix(rnorm(36),nrow=6) mat [,1] [,2] [,3] [,4] [,5][,6] [1,] 1.10536338 -0.7613770 -1.7100569 -1.8762241 -0.36579280 0.6465219 [2,] -1.34836804 -0.2174270 -0.1153477 -0.1727683 -1.88406206 1.7484955 [3,] 0.96814418 -2.1483727 0.5839668 -1.2361659 0.04592844 1.9937995 [4,] 0.01960219 -1.2339691 0.8290761 0.1002795 -0.15952881 0.3969251 [5,] 1.62343073 1.3741222 -1.2045854 0.4180127 -0.09898615 1.3575119 [6,] -0.95260509 -0.1522824 -1.4257526 1.0057412 -1.20068336 -0.4306761 res-rnorm(6) res [1] 0.2045252 -0.9824761 0.7727004 0.6439993 1.8005737 1.0167214 pval-NULL for(i in c(1:6)) + { + reg-lm(res~mat[,i]) + reg + pval[i]-reg$p.value + } pval NULL reg Call: lm(formula = res ~ mat[, i]) Coefficients: (Intercept) mat[, i] 0.8195 -0.2557 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] what is there in a numeric (0)?
Hello! I will repeat my question :what is there in a numeric(0)? In a function, I have several matrices with dimensions 288x1, also matrices with dimensions 0x1 (numeric(0)). But I really do not know in which matrix there are elements or not. So, these matrices that do not contain anything, if I am saying this correctly, I would like to make them matrices with the same dimensions as the first ones but with NA elements and the ones with elements to remain as they are. For this reason, I thought to use the ifelse command. But I have problems determining the arguments of this ifelse... I tried to use: A1 - ifelse(nrow(A)==0,matrix(rep(NA,288),288,1),A) If A is a matrix with elements, then I get a A1 matrix with replicated 288 times its first element, but I want the initial A matrix. If A is a matrix with no elements (numeric(0)) then I get only 1 NA. So, how can I test if there are elements in my matrix and if there are to return itself the matrix and if there are not to get a matrix 288x1 with NAs? Thank you for your time! Ismini __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R and SPSS
There exists a R plug-in for SPSS. You can find it on the SPSS website. ... and there is a page on the R wiki: http://wiki.r-project.org/rwiki/doku.php?id=tips:callingr:spss HTH, Tobias I have a code in R. Could anyone give me the best possible way (or just ways!) to integrate it in SPSS? I would doubt you could do this, but for the least provide commented, minimal, self-contained, reproducible code. It would help if you were more specific. Liviu -- Alain Guillet Statistician and Computer Scientist SMCS - Institut de statistique - Université catholique de Louvain Bureau d.126 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Odp: what is there in a numeric (0)?
Hi [EMAIL PROTECTED] napsal dne 27.11.2008 11:48:32: Hello! I will repeat my question :what is there in a numeric(0)? In a function, I have several matrices with dimensions 288x1, also matrices with dimensions 0x1 (numeric(0)). But I really do not know in which matrix there are elements or not. So, these matrices that do not contain anything, if I am saying this correctly, I would like to make them matrices with the same dimensions as the first ones but with NA elements and the ones with elements to remain as they are. For this reason, I thought to use the ifelse command. But I have problems determining the arguments of this ifelse... I tried to use: A1 - ifelse(nrow(A)==0,matrix(rep(NA,288),288,1),A) If A is a matrix with elements, then I get a A1 matrix with replicated 288 times its first element, but I want the initial A matrix. If A is a matrix with no elements (numeric(0)) then I get only 1 NA. It is not a work for ifelse but for if. You want to test only one comparison not a vector of values. if(nrow(mat)==0) A1 - matrix(NA, 288,1) else A1 - A Regards Petr So, how can I test if there are elements in my matrix and if there are to return itself the matrix and if there are not to get a matrix 288x1 with NAs? Thank you for your time! Ismini __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Troubles with the format of dates
Dear useRs, I'm struggling again with date-related stuff: I am using R to draw water levels at certain measuring stations. My data comes as a tab-delimited text file and looks like this: DATUM P1 P2 ... 2006-11-16 425.21 423.99 2006-12-15 425.12 423.97 2007-01-16 425.16 424.06 ... (measurements started in July 2004 and still continue on a monthly or bi-weekly basis) This is then plotted using this code: a-read.table(dummy_data.tab, sep=\t, header=TRUE, na.strings=c(0)) x-as.Date(a$DATUM) plot(x, a$P1, axes=FALSE, xlim=c(as.Date(2004-07-01), as.Date(2009-01-01)), ylim=c(423,428), col=red, pch=15, type=o, ylab=Kote [m a.s.l.], main=Dummy Plot) points(x, a$P2, col=red, pch=17, type=o, lty=dotted) axis(2, at=423:428, tck=1, col=gray60) axis.Date(1, at=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), labels=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), tck=1, col=gray60) So far, I've been using this type of date format (-mm-dd) because that was the only way I could get the whole thing to work when I started using R. However, living in Switzerland, I would prefer the dates to read 16.11.2006 i.e. dd.mm. or 16. Nov. 2006 (preferably using German names for the months, if possible). I've used the chron package to convert my dates to day mon year (16 Nov 2006, with or without spaces), i.e. I replaced the second line of code with: x-format(chron(as.character(a$DATUM), format=y-m-d, out.format=day mon year)) but then I can't plot them any more, because as.Date() no longer is the correct function and R gives me an error saying there is an invalid xlim-value. Can someone please point me in the right direction? I could easily change the input data to be in dd.mm. format, if that helps. Thanks for your help, Kathi -- DropNet AG - Das Unternehmen fuer Ihren Internet-Auftritt! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help processing large data
Dear all, I have one problem to handle a large dataset... It looks like: read no length 2 2 144 7 7 47490 9 9 310944 11 11 10089 14 14 13152 17 17 27363 and so on There are 13 rows From this table I need to make a table like 2_1 2 100 2_2 2 44 7_1 7 100 7_2 7 100 ... ... 7_474 7 100 7_475 7 90 9_1 9 100 9_2 9 100 and so on... In words: I want to divide the 3rd column by 100 to keep the length 100 and increasing no of rows needed, where no will be same for all increased rows, but the read will be changed like 2_1,2_2 and so on.. Please let me know if any one can help. Thanks a lot in advance. Best, Mitra. -- View this message in context: http://www.nabble.com/Help-processing-large-data-tp20716564p20716564.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Error in Comprting Risks Regression
Dear All, I am trying to run the following function (a CRR=Competing Risks Regressionmodel) and receive the error in solve.default. Can anyone give me some insights into where the problem is? Thanks print(z-crr(J3500,CD3500,cov)) Error in solve.default(v[[1]]) : Lapack routine dgesv : system is exactly singular [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] par(ask=TRUE) and devAskNewPage(ask=TRUE) not working
Hi, First, I do some calculation, then a plot. Add some lines and texts to the plot. Second, do some further calculation, then a plot. Add again some lines and texts to the plot. Third, do some further calculation, then Fourth, . After a plot is complete (means the plot itself, lines and texts) I would like to click 'enter' to see the next complete plot (again the plot itself, lines and texts) and so on. par(ask=TRUE) and devAskNewPage(ask=TRUE) is not working, unfortunately? Any Ideas? Regards -- View this message in context: http://www.nabble.com/par%28ask%3DTRUE%29-and-devAskNewPage%28ask%3DTRUE%29-not-working-tp20718320p20718320.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] lmer refuses nested random factors
Dear Martin, How many levels has PLANT? Does it is an ID for the individual plants? In that case it is pointless to add it to a random effect. Because then each group at the higest level would contain only one observation. There is a mailing list dedicated to mixed effects models: R-Sig-mixed-models. HTH, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 [EMAIL PROTECTED] www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -Oorspronkelijk bericht- Van: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Namens HENERY Martin Verzonden: donderdag 27 november 2008 13:29 Aan: r-help@r-project.org Onderwerp: [R] lmer refuses nested random factors I am trying to run the following model in R lmer(leaves.eaten~Geocytotype+(1|TEST/ PLANT),data=cyphoplantfeeding,family=poisson) My experimental setup is 41 replicates (TEST) of an experiment in which there are three Geocytotypes of a plant species in each TEST, and two plant pseudoreplicates per Geocytotype in each test (i.e. 3*2=6 plants per test). So my random factors are trying to examine/ account for variation between replicates and between pairs of plants in each test. The response variable is counts of damaged leaves on each plant hence the poisson distribution. When I try and run the model I get this error message (sorry but it is in French): Erreur : length(f1) == length(f2) is not TRUE De plus : Warning messages: 1: In PLANT:TEST : l'expression numérique a 246 éléments : seul le premier est utilisé 2: In PLANT:TEST : l'expression numérique a 246 éléments : seul le premier est utilisé This however is not connected as far as I can tell to the structure of the data because I checked the lengths of all the variables and if I run the same model design using lme function (without the family=poisson of course) and the model computes perfectly fine with the correct group sizes: Number of Observations: 246 Number of Groups: TEST PLANT %in% TEST 41 123 I couldn't find this error mentioned in any other posts on the list. Can someone enlighten me as to how to get around this intractable problem? My crude solution is to go around the obstacle by computing the mean of each plant pair and using a GLM with poisson distribution and ignore random effects. Martin Martin Henery Post doctoral researcher Département de Biologie/Ecologie Evolution Université de Fribourg [[alternative HTML version deleted]] Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survreg and pweibull solved for any distribution
Dear all - Following up my own post, having found a Terry Therneau post about the value of predict(model,type=quantile), http://tolstoy.newcastle.edu.au/R/e4/help/08/03/5335.html the following code ammendment produces what I was intending to see. It clear that the 0.5 quantile is the inflection point. And predict() has the value of being distribution independent, and producing valid SE estimates. data(lung) lung.wbs - survreg( Surv(time, status)~ 1, data=lung, dist='weibull') curve(pweibull(x, scale=exp(coef(lung.wbs)), shape=1/lung.wbs $scale,lower.tail=FALSE),from=0, to=max(lung$time)) lines(survfit(Surv(time,status)~1, data=lung), col=red) abline(h=0.5,lty=2) abline(v=unique(predict(lung.wbs,type=quantile,p=0.5)),lty=2,col=4) However, one last technical question. In an intercept only model, how do you specify newdata to get just a single estimate? In the above example I used unique against the model predictions evaluated at every datapoint. But with no terms on the RHS of the model, how do you (can you) specify new data? Best wishes, Andrew On 26 Nov 2008, at 16:27, Andrew Beckerman wrote: Dear all - I have followed the thread the reply to which was lead by Thomas Lumley about using pweibull to generate fitted survival curves for survreg models. http://tolstoy.newcastle.edu.au/R/help/04/11/7766.html Using the lung data set, data(lung) lung.wbs - survreg( Surv(time, status)~ 1, data=lung, dist='weibull') curve(pweibull(x, scale=exp(coef(lung.wbs)), shape=1/lung.wbs $scale,lower.tail=FALSE),from=0, to=max(lung$time)) lines(survfit(Surv(time,status)~1, data=lung), col=red) Assuming this is correct, why does the inflection point of this curve not match up to the exp(scale parameter)? Am I wrong in assuming that the scale represents the inflection, and the shape adjusts the shape around this point? I think I am perhaps confusing the scale and the median with the inflection point calcuation? One can visualise the mismatch with: abline(v=exp(coef(lung.wbs)),lty=2) abline(h=0.5,lty=2) Many thanks for the clarification R version 2.8.0 (2008-10-20) i386-apple-darwin8.11.1 locale: en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] splines datasets utils stats graphics grDevices methods base other attached packages: [1] survival_2.34-1 Hmisc_3.4-3 lattice_0.17-15 MASS_7.2-44 loaded via a namespace (and not attached): [1] cluster_1.11.11 grid_2.8.0 tools_2.8.0 Andrew - Dr. Andrew Beckerman Department of Animal and Plant Sciences, University of Sheffield, Alfred Denny Building, Western Bank, Sheffield S10 2TN, UK ph +44 (0)114 222 0026; fx +44 (0)114 222 0002 http://www.beckslab.staff.shef.ac.uk/ http://www.flickr.com/photos/apbeckerman/ http://www.warblefly.co.uk __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] xreg in ARIMA modelling.
Hello, Does anyone know how the parameter estimates are calculated for xreg variables when called as part of an arima() command, or know of any literature that provides this info? In particular, I was wondering if there is a quick way to compare different combinations of xreg variables in the arima() fit in the same way that you would in multiple regression (using AIC R^2 etc.). Thanks very much! -- View this message in context: http://www.nabble.com/%22xreg%22-in-ARIMA-modelling.-tp20718058p20718058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Odp: dataframe
Hi [EMAIL PROTECTED] napsal dne 25.11.2008 05:30:52: hi there I have a dataframe abc 123 345 abc 345 456 lmn 567 345 hkl 568 535 lmn 096 456 lmn 768 094 i want the uniques of column 1 and there corresponsing column 2 and 3 output abc 123 345 lmn 567 345 hkl 568 535 cbind(DF1[,1],DF1[which(unique(DF1[,1]),c(2,3)]) but didnt work If test is your data frame test[match(unique(test$V1),test$V1),] shall selecte first unique items. Regards Petr P.S. Not necessary to post your question several times. kindly let me know how to go abt it ramya -- View this message in context: http://www.nabble.com/dataframe-tp20675022p20675022.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lmer refuses nested random factors
I am trying to run the following model in R lmer(leaves.eaten~Geocytotype+(1|TEST/ PLANT),data=cyphoplantfeeding,family=poisson) My experimental setup is 41 replicates (TEST) of an experiment in which there are three Geocytotypes of a plant species in each TEST, and two plant pseudoreplicates per Geocytotype in each test (i.e. 3*2=6 plants per test). So my random factors are trying to examine/ account for variation between replicates and between pairs of plants in each test. The response variable is counts of damaged leaves on each plant hence the poisson distribution. When I try and run the model I get this error message (sorry but it is in French): Erreur : length(f1) == length(f2) is not TRUE De plus : Warning messages: 1: In PLANT:TEST : l'expression numérique a 246 éléments : seul le premier est utilisé 2: In PLANT:TEST : l'expression numérique a 246 éléments : seul le premier est utilisé This however is not connected as far as I can tell to the structure of the data because I checked the lengths of all the variables and if I run the same model design using lme function (without the family=poisson of course) and the model computes perfectly fine with the correct group sizes: Number of Observations: 246 Number of Groups: TEST PLANT %in% TEST 41 123 I couldn't find this error mentioned in any other posts on the list. Can someone enlighten me as to how to get around this intractable problem? My crude solution is to go around the obstacle by computing the mean of each plant pair and using a GLM with poisson distribution and ignore random effects. Martin Martin Henery Post doctoral researcher Département de Biologie/Ecologie Evolution Université de Fribourg [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Odp: Troubles with the format of dates
Hi [EMAIL PROTECTED] napsal dne 27.11.2008 14:20:17: Dear useRs, I'm struggling again with date-related stuff: I am using R to draw water levels at certain measuring stations. My data comes as a tab-delimited text file and looks like this: DATUM P1 P2 ... 2006-11-16 425.21 423.99 2006-12-15 425.12 423.97 2007-01-16 425.16 424.06 ... (measurements started in July 2004 and still continue on a monthly or bi-weekly basis) This is then plotted using this code: a-read.table(dummy_data.tab, sep=\t, header=TRUE, na.strings=c(0)) x-as.Date(a$DATUM) plot(x, a$P1, axes=FALSE, xlim=c(as.Date(2004-07-01), as.Date(2009-01-01)), ylim=c(423,428), col=red, pch=15, type=o, ylab=Kote [m a.s.l.], main=Dummy Plot) points(x, a$P2, col=red, pch=17, type=o, lty=dotted) axis(2, at=423:428, tck=1, col=gray60) axis.Date(1, at=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), labels=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), tck=1, col=gray60) So far, I've been using this type of date format (-mm-dd) because that was the only way I could get the whole thing to work when I started using R. However, living in Switzerland, I would prefer the dates to read 16.11.2006 i.e. dd.mm. or 16. Nov. 2006 (preferably using German names for the months, if possible). I've used the chron package to convert my dates to day mon year (16 Nov 2006, with or without spaces), i.e. I replaced the second line of code with: x-format(chron(as.character(a$DATUM), format=y-m-d, out.format=day mon year)) Although I am not an expert in time/date functions it seems to me that you just want your labels to be in your preferred format. If it is the case labels = format(seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), %d.%m.%Y) in your call to axisDate can help you to achieve it. Regards Petr but then I can't plot them any more, because as.Date() no longer is the correct function and R gives me an error saying there is an invalid xlim-value. Can someone please point me in the right direction? I could easily change the input data to be in dd.mm. format, if that helps. Thanks for your help, Kathi -- DropNet AG - Das Unternehmen fuer Ihren Internet-Auftritt! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to create a string containing '\/' to be used with SED?
What do you mean by doesn't work and the \\ are removed? a - ..\\/path\\/file a [1] ..\\/path\\/file cat(a) ..\/path\/file -- David Winsemius On Nov 26, 2008, at 7:46 PM, ikarus wrote: Thanks Sean!! I followed you suggestion to use gsub() and it worked perfectly! I still can't create a string with inside \/ (e.g., a - ..\\/path\\/file doesn't work, R complains and the \\ are removed), but I don't care, gsub() does the same job as sed and without using any system call. seanpor wrote: Good morning, You do not need to quote a forward slash / in R, but you do need to quote a backslash when you're inputting it... so to get a string which actually contains blah\/blah... you need to use blah\\/blah http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-does-backslash-behave-strangely-inside-strings_003f Unless this is a very very big file you shouldn't need to go out to sed, as gsub() should work adequately... and probably quicker and cleaner. So something along the lines of.. (UNTESTED!!! since I don't have a reproduceable example) tmp1 - readLines(configurationFile) tmp1 - gsub(^instance .*, paste(instance = , data $instancePath, /, data$newInstance, sep = ), tmp1) I'm working on 50mb text files, and doing all sorts of manipulations and I do it all inside R under windows XP... reading a 50mb text file across the 100mb network and doing a gsub() on most lines takes an elapsed 16 seconds on this office desktop. hth... Regards, Sean ikarus wrote: Hi guys, I've been struggling to find a solution to the following issue: I need to change strings in .ini files that are given in input to a program whose output is processed by R. The strings to be changed looks like: instance = /home/TSPFiles/TSPLIB/berlin52.tsp I normally use Sed for this kind of things. So, inside R I'd like to write something like: command - paste(sed -i 's/^instance .*/instance = , data$instancePath, data$newInstance, /' , configurationFile, sep = ) system(command) This will overwrite the line starting with instance using instance = the_new_instance In the example I gave, data$instancePath = /home/TSPFiles/TSPLIB/ and data$newInstance = berlin52.tsp The problem is that I need to pass the above path string to sed in the form: \/home\/TSPFiles\/TSPLIB\/ However, I couldn't find a way to create such a string in R. I tried in several different ways, but it always complains saying that '\/' is an unrecognized escape! Any suggestion? Thanks! -- View this message in context: http://www.nabble.com/How-to-create-a-string-containing-%27%5C-%27-to-be-used-with-SED--tp20694319p20711848.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] 1-Pearson's R Distance
Hi again List, Well this time Im writing for a friend (really J). He needs to create a distance matrix based on an abundance matrix using the 1-Pearsons R index. Well I told him to look at the proxy package, but there is only Pearson Index. He needs it to perform a clustering. Well, as soon as he told me there proxy only had the Pearson index I thought: He could just do something like NewObject-1-PearsonMatrixObject But I didnt tell him that because Im not sure its the same thing, and it probably will generate a strange cluster with the braches ends distant from the base He told me that Statistica 7.0 has this Index, but he doesnt own it. So Is there a way on R to do this correctly? Thanks for the attention. ___ MSc. mailto:[EMAIL PROTECTED] Rodrigo Aluizio Centro de Estudos do Mar/UFPR Laboratório de Micropaleontologia Avenida Beira Mar s/n - CEP 83255-000 Pontal do Paraná - PR - BRASIL Fone: (0**41) 3455-1496 ramal 217 Fax: (0**41) 3455-1105 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression Problem for loop
Have you looked at results of str on a regression object? I would not think that there would be a single p.value associated with such a beast, but that there might be if you examined individual coefficients. ? coefficients ?coef -- David Winsemius On Nov 27, 2008, at 4:03 AM, ales grill wrote: Dear all, I have wrote a code for a linear regression. I want to write a loop for so, that I can get estimate for pavlues for six predictors. But I am getting for estmate for only last one. How can I get pvalues for all my predictors in a loop?? Anticipating your help Thanks Ales mat-matrix(rnorm(36),nrow=6) mat [,1] [,2] [,3] [,4] [,5][,6] [1,] 1.10536338 -0.7613770 -1.7100569 -1.8762241 -0.36579280 0.6465219 [2,] -1.34836804 -0.2174270 -0.1153477 -0.1727683 -1.88406206 1.7484955 [3,] 0.96814418 -2.1483727 0.5839668 -1.2361659 0.04592844 1.9937995 [4,] 0.01960219 -1.2339691 0.8290761 0.1002795 -0.15952881 0.3969251 [5,] 1.62343073 1.3741222 -1.2045854 0.4180127 -0.09898615 1.3575119 [6,] -0.95260509 -0.1522824 -1.4257526 1.0057412 -1.20068336 -0.4306761 res-rnorm(6) res [1] 0.2045252 -0.9824761 0.7727004 0.6439993 1.8005737 1.0167214 pval-NULL for(i in c(1:6)) + { + reg-lm(res~mat[,i]) + reg + pval[i]-reg$p.value + } pval NULL reg Call: lm(formula = res ~ mat[, i]) Coefficients: (Intercept) mat[, i] 0.8195 -0.2557 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression Problem for loop
On Nov 27, 2008, at 9:49 AM, David Winsemius wrote: Have you looked at results of str on a regression object? I would not think that there would be a single p.value associated with such a beast, but that there might be if you examined individual coefficients. ? coefficients ?coef That wasn't on as on point as I thought. Take a look at this screen dialog: x - 1:5; coef(lm(c(1:3,7,6) ~ x)) (Intercept) x -0.7 1.5 str(coef(lm(c(1:3,7,6) ~ x))) Named num [1:2] -0.7 1.5 - attr(*, names)= chr [1:2] (Intercept) x anova(lm(c(1:3,7,6) ~ x)) Analysis of Variance Table Response: c(1:3, 7, 6) Df Sum Sq Mean Sq F value Pr(F) x 1 22.5000 22.5000 15.698 0.02872 * Residuals 3 4.3000 1.4333 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 str(anova(lm(c(1:3,7,6) ~ x))) Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: $ Df : int 1 3 $ Sum Sq : num 22.5 4.3 $ Mean Sq: num 22.5 1.43 $ F value: num 15.7 NA $ Pr(F) : num 0.0287 NA - attr(*, heading)= chr Analysis of Variance Table\n Response: c(1:3, 7, 6) anova(lm(c(1:3,7,6) ~ x))$Pr(F) [1] 0.02871561 NA anova(lm(c(1:3,7,6) ~ x))$Pr(F)[1] [1] 0.02871561 -- David Winsemius On Nov 27, 2008, at 4:03 AM, ales grill wrote: Dear all, I have wrote a code for a linear regression. I want to write a loop for so, that I can get estimate for pavlues for six predictors. But I am getting for estmate for only last one. How can I get pvalues for all my predictors in a loop?? Anticipating your help Thanks Ales mat-matrix(rnorm(36),nrow=6) mat [,1] [,2] [,3] [,4] [,5][,6] [1,] 1.10536338 -0.7613770 -1.7100569 -1.8762241 -0.36579280 0.6465219 [2,] -1.34836804 -0.2174270 -0.1153477 -0.1727683 -1.88406206 1.7484955 [3,] 0.96814418 -2.1483727 0.5839668 -1.2361659 0.04592844 1.9937995 [4,] 0.01960219 -1.2339691 0.8290761 0.1002795 -0.15952881 0.3969251 [5,] 1.62343073 1.3741222 -1.2045854 0.4180127 -0.09898615 1.3575119 [6,] -0.95260509 -0.1522824 -1.4257526 1.0057412 -1.20068336 -0.4306761 res-rnorm(6) res [1] 0.2045252 -0.9824761 0.7727004 0.6439993 1.8005737 1.0167214 pval-NULL for(i in c(1:6)) + { + reg-lm(res~mat[,i]) + reg + pval[i]-reg$p.value + } pval NULL reg Call: lm(formula = res ~ mat[, i]) Coefficients: (Intercept) mat[, i] 0.8195 -0.2557 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] par(ask=TRUE) and devAskNewPage(ask=TRUE) not working
Have you looked at the TeachingDemos package? -- David Winsemius On Nov 27, 2008, at 7:18 AM, mentor_ wrote: Hi, First, I do some calculation, then a plot. Add some lines and texts to the plot. Second, do some further calculation, then a plot. Add again some lines and texts to the plot. Third, do some further calculation, then Fourth, . After a plot is complete (means the plot itself, lines and texts) I would like to click 'enter' to see the next complete plot (again the plot itself, lines and texts) and so on. par(ask=TRUE) and devAskNewPage(ask=TRUE) is not working, unfortunately? Any Ideas? Regards -- View this message in context: http://www.nabble.com/par%28ask%3DTRUE%29-and-devAskNewPage%28ask%3DTRUE%29-not-working-tp20718320p20718320.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reshape with var as fun.aggregate
Thanks, that works fine. ___ Regards, Jason Locklin, B.Sc. Hons. University of Waterloo http://artsweb.uwaterloo.ca/~jalockli On Wednesday 26 November 2008 19:49:48 you wrote: Hi Jason, The reason you are getting this error is that the latest version of reshape uses fun.aggregate(numeric(0)) to figure out what missing values should be filled in with. Unfortunately there was a brief period in R versions when var(numeric(0)) returned an error rather than a missing value. You can work around the problem by manually specifying fill = NA. Regards, Hadley On Wed, Nov 26, 2008 at 1:05 PM, [EMAIL PROTECTED] wrote: I used to be able to use variance for fun.aggregate, but since I upgraded from R 2.6 to 2.7 (Ubuntu hardy to intrepid), it no longer works. library(reshape) data(french_fries) ffm - melt(french_fries, id=1:4, na.rm = TRUE) cast(ffm, rep ~ treatment, length) rep 1 2 3 1 1 579 578 575 2 2 580 579 580 cast(ffm, rep ~ treatment, mean) rep123 1 1 3.207772 3.177336 3.024522 2 2 3.181207 3.115544 3.277759 cast(ffm, rep ~ treatment, var) Error in fun.aggregate(data$value[0]) : 'x' is empty Any ideas? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Error in Comprting Risks Regression
Hi, That will be difficult to help with the little information you gave. Please read the posting guide and what's at the bottom of this email. Best regards, Arthur Allignol kende jan wrote: Dear All, I am trying to run the following function (a CRR=Competing Risks Regressionmodel) and receive the error in solve.default. Can anyone give me some insights into where the problem is? Thanks print(z-crr(J3500,CD3500,cov)) Error in solve.default(v[[1]]) : Lapack routine dgesv : system is exactly singular [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Finding Stopping time
Hi Debanjan, It would be more likely that you get a response if your question was more clear. Your code is very difficult to read, and it doesn't help that you don't provide any context, or comment your code with ### This is calculating the average kind of statements. What are you trying to do? Anyhow, after quite a bit of effort trying to understand what you've done, I found your (simple!) mistake: Since you are resetting the k counter after your first try, you need to change your k constant in that big quantity you're calculating to (k-N[j-1]), like this: T[k] - (((k-N[j-1])/2)*log(theta1/theta2))+(((theta2-theta1)/(2*theta1*theta2))*smm[k])-((k-N[j-1])*(theta2-theta1)/2) As an aside, try not to use variables defined outside a function in the function code (in this case your x). It makes the code more difficult to follow, and far more likely to break. Regards, Gustaf On Wed, Nov 26, 2008 at 4:04 PM, Debanjan Bhattacharjee [EMAIL PROTECTED] wrote: Can any one help me to solve problem in my code? I am actually trying to find the stopping index N. So first I generate random numbers from normals. There is no problem in finding the first stopping index. Now I want to find the second stopping index using obeservation starting from the one after the first stopping index. E.g. If my first stopping index was 5. I want to set 6th observation from the generated normal variables as the first random number, and I stop at second stopping index. This is my code, alpha - 0.05 beta - 0.07 a - log((1-beta)/alpha) b - log(beta/(1-alpha)) theta1 - 2 theta2 - 3 cumsm-function(n) {y-NULL for(i in 1:n) {y[i]=x[i]^2} s=sum(y) return(s) } psum - function(p,q) {z - NULL for(l in p:q) { z[l-p+1] - x[l]^2} ps - sum(z) return(ps) } smm - NULL sm - NULL N - NULL Nout - NULL T - NULL k-0 x - rnorm(100,theta1,theta1) for(i in 1:length(x)) { sm[i] - psum(1,i) T[i] - ((i/2)*log(theta1/theta2))+(((theta2-theta1)/(2*theta1*theta2))*sm[i])-(i*(theta2-theta1)/2) if (T[i]=b | T[i]=a){N[1]-i break} } for(j in 2:200) { for(k in (N[j-1]+1):length(x)) { smm[k] - psum((N[j-1]+1),k) T[k] - ((k/2)*log(theta1/theta2))+(((theta2-theta1)/(2*theta1*theta2))*smm[k])-(k*(theta2-theta1)/2) if (T[k]=b | T[k]=a){N[j]-k break} } } But I cannot get the stopping index after the first one. Tanks -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address:Essingetorget 40,112 66 Stockholm, SE skype:gustaf_rydevik __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Simultaneous Comparison of Factor Structures
Dear R Users, I have a situation where I want to compare the factor structures of two samples that were administered then same test. AMOS as well as SAS provide the option of simultaneous comparison of factor structures like it was proposed by Jöreskog (1971, Simultaneous factor analysis in several populations). Is there a way to do that in R? I am not sure if it is possible to model this kind of comparison with e.g. the sem package and if it is possible, how to do it. A lot of thanks in advance! Regards, Mark Heckmann [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] as.numeric in data.frame, but only where it is possible
Hello, On 26 Nov 2008 at 22:53, Kinoko wrote: Hi, I would like to convert my character sequences in my matrix/ data.frame into numeric where it is possible. I would also like to retain my alphabetic character strings in their original forms. 5.15.1 hm hm k-matrix(c(aa, bb, 1,2, 4.3, 0), nrow=2) mode(k) - numeric # ln1 coerces numeric chars into character and # ln2 replaces alphabet chars with NA (with warnings) # = OK as matrix can't have mixed types Perhaps there is a simpler solution, but this should work: k - matrix(c(aa, bb, 1 ,2, 4.3, 0), nrow=2) write.table(k,'xx.txt, col.names=TRUE, row.names=FALSE, sep='\t, quote=FALSE) k - read.table(xx.txt, header=TRUE, sep=\t) summary(k) k-matrix(c(aa, bb, 1,2, 4.3, 0), nrow=2) g-as.data.frame(k, stringsAsFactos=FALSE) g[,2] # returns: # [1] 1 2 # Levels: 1 2 # hm... let them be levels then... sum(g[,2]) # going down in flames g[,2] - as.numeric(g[,2]) sum(g[,2]) # is the winner of the day, # but I have a hunch that there must be something more trivial/general solution. - I'm looking for something like, as.data.frame(... as.numeric.if.possible=TRUE) , if possible. - Mainly, I don't know in advance if a column has alphabetical or numeric characters. - I have relatively large matrices (or a relatively slow PC) (max 100,000 x 100 cells), if that matters *** *** *** *** *** Another related problem of mine is to create data.frames with mixed types (it should be possible, shouldn't it). d-data.frame(b=seq(1,3)) d-cbind(a,b) typeof(d) # d was coerced into character ## In this case you can do: a - c(aa, bb, cc) d - data.frame(a=a, b=seq(1, 3)) summary(d) ## or d - data.frame(b=seq(1, 3)) d$a - c(aa, bb, cc) summary(d) any help is greatly appreciated, gabor __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Héctor Villalobos [EMAIL PROTECTED] CICIMAR - IPN La Paz, Baja California Sur, MÉXICO [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Troubles with the format of dates
Thank you _SO_ much, Petr, you helped me again. That worked like a charm, exactly what I needed! Cheers, Kathi On Thu, 27 Nov 2008 15:01:29 +0100, Petr PIKAL wrote Hi [EMAIL PROTECTED] napsal dne 27.11.2008 14:20:17: Dear useRs, I'm struggling again with date-related stuff: I am using R to draw water levels at certain measuring stations. My data comes as a tab-delimited text file and looks like this: DATUM P1 P2 ... 2006-11-16 425.21 423.99 2006-12-15 425.12 423.97 2007-01-16 425.16 424.06 ... (measurements started in July 2004 and still continue on a monthly or bi-weekly basis) This is then plotted using this code: a-read.table(dummy_data.tab, sep=\t, header=TRUE, na.strings=c(0)) x-as.Date(a$DATUM) plot(x, a$P1, axes=FALSE, xlim=c(as.Date(2004-07-01), as.Date(2009-01-01)), ylim=c(423,428), col=red, pch=15, type=o, ylab=Kote [m a.s.l.], main=Dummy Plot) points(x, a$P2, col=red, pch=17, type=o, lty=dotted) axis(2, at=423:428, tck=1, col=gray60) axis.Date(1, at=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), labels=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), tck=1, col=gray60) So far, I've been using this type of date format (-mm-dd) because that was the only way I could get the whole thing to work when I started using R. However, living in Switzerland, I would prefer the dates to read 16.11.2006 i.e. dd.mm. or 16. Nov. 2006 (preferably using German names for the months, if possible). I've used the chron package to convert my dates to day mon year (16 Nov 2006, with or without spaces), i.e. I replaced the second line of code with: x-format(chron(as.character(a$DATUM), format=y-m-d, out.format=day mon year)) Although I am not an expert in time/date functions it seems to me that you just want your labels to be in your preferred format. If it is the case labels = format(seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), %d.%m.%Y) in your call to axisDate can help you to achieve it. Regards Petr but then I can't plot them any more, because as.Date() no longer is the correct function and R gives me an error saying there is an invalid xlim-value. Can someone please point me in the right direction? I could easily change the input data to be in dd.mm. format, if that helps. Thanks for your help, Kathi -- DropNet AG - Das Unternehmen fuer Ihren Internet-Auftritt! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- DropNet AG - Das Unternehmen fuer Ihren Internet-Auftritt! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 1-Pearson's R Distance
Hi Rodrigo, afaik, (1 - r_Pearson)/2 is used rather than 1 - r_Pearson. This gives a distance measure ranging between 0 and 1 rather than 0 and 2. But after all, dies does not change anything substantial. see e.g. Theodoridis Koutroumbas: Pattern Recognition. I didn't know of the proxy package, but the calculation it straightforward (though a bit wasteful I suspect: first the whole matrix is produced, and as.dist cuts it down again to a triangular matrix): as.dist (0.5 - cor (t(x) / 2)) Take care wheter you want to use x or t(x). HTH Claudia -- Claudia Beleites Dipartimento dei Materiali e delle Risorse Naturali Università degli Studi di Trieste Via Alfonso Valerio 6/a I-34127 Trieste phone: +39 (0 40) 5 58-34 47 email: [EMAIL PROTECTED] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] as.numeric in data.frame, but only where it is possible
On Thu, Nov 27, 2008 at 12:53 AM, Kinoko [EMAIL PROTECTED] wrote: Hi, I would like to convert my character sequences in my matrix/ data.frame into numeric where it is possible. I would also like to retain my alphabetic character strings in their original forms. 5.15.1 hm hm k-matrix(c(aa, bb, 1,2, 4.3, 0), nrow=2) mode(k) - numeric # ln1 coerces numeric chars into character and # ln2 replaces alphabet chars with NA (with warnings) # = OK as matrix can't have mixed types k-matrix(c(aa, bb, 1,2, 4.3, 0), nrow=2) g-as.data.frame(k, stringsAsFactos=FALSE) Try this: g -as.data.frame(k, stringsAsFactors = FALSE) str(g) g[] - lapply(g, type.convert) str(g) Hadley -- http://had.co.nz/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 1-Pearson's R Distance
Rodrigo Aluizio wrote: Hi again List, Well this time I?m writing for a friend (really J). He needs to create a distance matrix based on an abundance matrix using the 1-Pearson?s R index. Well I told him to look at the proxy package, but there is only Pearson Index. He needs it to perform a clustering. Well, as soon as he told me there proxy only had the Pearson index I thought: ?He could just do something like NewObject-1-PearsonMatrixObject? But I didn?t tell him that because I?m not sure it?s the same thing, and it probably will generate a strange cluster with the braches ends distant from the base? He told me that Statistica 7.0 has this Index, but he doesn?t own it. So? Is there a way on R to do this correctly? Thanks for the attention. The help file of dist has an example how to do this: ?dist ## Use correlations between variables as distance dd - as.dist((1 - cor(USJudgeRatings))/2) Note the division by 2! Correlations can be between -1 ... +1 and negative distances make no sense. Another approach is r^2, but this has, of course, a different interpretation. ThPe __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R and SPSS
Thanks all! Tobias Verbeke wrote: There exists a R plug-in for SPSS. You can find it on the SPSS website. ... and there is a page on the R wiki: http://wiki.r-project.org/rwiki/doku.php?id=tips:callingr:spss HTH, Tobias I have a code in R. Could anyone give me the best possible way (or just ways!) to integrate it in SPSS? I would doubt you could do this, but for the least provide commented, minimal, self-contained, reproducible code. It would help if you were more specific. Liviu -- Alain Guillet Statistician and Computer Scientist SMCS - Institut de statistique - Université catholique de Louvain Bureau d.126 Voie du Roman Pays, 20 B-1348 Louvain-la-Neuve Belgium tel: +32 10 47 30 50 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- View this message in context: http://www.nabble.com/R-and-SPSS-tp20708367p20723360.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] 1-Pearson's R Distance
At 5:00 PM +0100 11/27/08, Claudia Beleites wrote: Hi Rodrigo, afaik, (1 - r_Pearson)/2 is used rather than 1 - r_Pearson. This gives a distance measure ranging between 0 and 1 rather than 0 and 2. But after all, dies does not change anything substantial. see e.g. Theodoridis Koutroumbas: Pattern Recognition. I didn't know of the proxy package, but the calculation it straightforward (though a bit wasteful I suspect: first the whole matrix is produced, and as.dist cuts it down again to a triangular matrix): as.dist (0.5 - cor (t(x) / 2)) Take care wheter you want to use x or t(x). HTH Claudia From the law of cosines, d = sqrt(2(1-r)) is a somewhat more appropriate transformation of a Pearson correlation to a distance. Although this is monotonically related to the (1-r)/2, by taking the square root it will lead to somewhat different solutions in clustering. Bill -- William Revelle http://personality-project.org/revelle.html Professor http://personality-project.org/personality.html Department of Psychology http://www.wcas.northwestern.edu/psych/ Northwestern University http://www.northwestern.edu/ Attend ISSID/ARP:2009 http://issid.org/issid.2009/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to optimize a trading singal generator (for-loop)?
Hi guys, I've got some performance troubles with my trading signal generator, which indicates when the system goes long or short. I'm playing with some historical data and the for-loop isn't doing his job very efficient. Is there some vectorial solution for this? Here the for-loop: trade.long - 0 trade.short - 0 for (j in peak.days : dim(commodities[[i]])[1]) { # Trading Signal Long if (commodities[[i]][j, High] = commodities[[i]][j, HighestHigh] trade.long == 0) { commodities[[i]][j, Long] - 1 trade.long - 1 } if (commodities[[i]][j, Low] = commodities[[i]][j, emaH] trade.long == 1) { commodities[[i]][j, Long] - -1 trade.long - 0 } # Trading Signal Short if (commodities[[i]][j, Low] = commodities[[i]][j, LowestLow] trade.short == 0) { commodities[[i]][j, Short] - 1 trade.short - 1 } if (commodities[[i]][j, High] = commodities[[i]][j, emaL] trade.short == 1) { commodities[[i]][j, Short] - -1 trade.short - 0 } } # for (j in peak.days : dim(commodities[[i]])[1]) Any ideas are very appreciated, because this for-loop takes about 2 - 3 hours to finish... Thank you, Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] if then statement problem
I really need help with an if then statement I am having trouble with. A brief explanation of what I am trying to do: I have a p matrix of all permutations of a vector of length 3 filled with either 1s or -1s (these make up the columns). The p.unique matrix is the unique columns of the p matrix; here, they are identical but in my real script this will not always be the case. I want to be able to take a randomly generated vector and see which column in the p.unique matrix, if any, match the random vector. If there is a match, I get an integer returned that tells me which column is the match. If there is no match, I get the response integer (0). I want to write an if then statement that allows me to do different things based upon if there is a match to one of the columns in the p.unique matrix or not. Below is a sample script plucked from bits of my master script that show this problem. The vector r was made to not match any columns in p.unique. If it did, c wou! ld equal the integer representing the matching column, and since it does not, I want c to equal the vector (0,0,0). My if then statement is not working (I keep getting the error that my argument is of length zero), so I would really appreciate any help anyone can provide me. Thanks for all of your help previously and in advance for the help with this problem!! Happy Thanksgiving! p=as.matrix(do.call(expand.grid,rep(list(c(-1,1)),3))) p=t(p) p.unique=unique(p,MARGIN=2) r=c(2,2,2) q=which(apply(p.unique,2,function(x)all(x==r))) if(q0){c=p.unique[,q]};{c=c(0,0,0)} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Help with replicate
Hello everyone, I have a file called trans which has 100 records (rows) and 60 columns. I need to make 2 samples (with 5 records in each) out of this file with no replacement, so I used replicate. The sampling part works fine and I called the outcome result. my problem is then I need to calculate the correlation matrix for each sample. I used do.call and called the outcome CORE. but the problem is when I get my out put it's not separated for each sample. they are all together and I need them to be separated. just to give U an idea I copy and paste the out put: 616783805134 217 616 1. -0.08522865 -0.15018785 -0.12462239 -0.04279605 783 -0.08522865 1. -0.02986738 -0.04731350 0.33191790 805 -0.15018785 -0.02986738 1. 0.15483873 -0.08998425 134 -0.12462239 -0.04731350 0.15483873 1. 0.16290948 217 -0.04279605 0.33191790 -0.08998425 0.16290948 1. 607 1. -0.12662932 0.16116459 -0.03479445 -0.03479445 *(from here is for the 2nd sample) 604 -0.12662932 1. 0.01298701 -0.06168397 -0.06168397 698 0.16116459 0.01298701 1. -0.03925343 -0.03925343 639 -0.03479445 -0.06168397 -0.03925343 1. -0.01694915 989 -0.03479445 -0.06168397 -0.03925343 -0.01694915 1. does any one has any idea how can I separate the result for 2 samples? This is my code: result=replicate(2,trans[,sample(colnames(trans),5,replace = FALSE)],simplify=FALSE) CORE=do.call(rbind,lapply(result,function(x) cor(x))) -- View this message in context: http://www.nabble.com/Help-with-%22replicate%22-tp20722891p20722891.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Simultaneous Comparison of Factor Structures
Dear Mark, Although I would like, at some point, to add this feature, the sem package doesn't fit multiple-group models. Sorry, John -- John Fox, Professor Department of Sociology McMaster University Hamilton, Ontario, Canada web: socserv.mcmaster.ca/jfox -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mark Heckmann Sent: November-27-08 10:10 AM To: r-help@r-project.org Subject: [R] Simultaneous Comparison of Factor Structures Dear R Users, I have a situation where I want to compare the factor structures of two samples that were administered then same test. AMOS as well as SAS provide the option of simultaneous comparison of factor structures like it was proposed by Jvreskog (1971, Simultaneous factor analysis in several populations). Is there a way to do that in R? I am not sure if it is possible to model this kind of comparison with e.g. the sem package and if it is possible, how to do it. A lot of thanks in advance! Regards, Mark Heckmann [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regression Problem for loop
Dear ales, Try this: # Data # set.seed(123) mat=matrix(rnorm(36),ncol=6) colnames(mat)=paste('x',1:6,sep=) res=rnorm(6) # All info for coefficients. First 4 columns # correspond to the intercept and the next for to the slope t(apply(mat,2,function(x){ sm=summary(lm(res~x))$coeff res=matrix(sm,ncol=prod(dim(sm))) names(res)=rep(colnames(sm),nrow(sm)) res } ) ) Estimate Std. Error t value Pr(|t|) Estimate Std. Error t value Pr(|t|) x1 -0.3740663 0.3416852 0.2586267 0.1991565 -1.4463559 1.7156617 0.2216156 0.16136742 x2 -0.3972439 -0.5951867 0.2622604 0.3475569 -1.5146930 -1.7124872 0.2044207 0.16196830 x3 -0.1486117 -0.4973452 0.2063296 0.1890952 -0.7202638 -2.6301317 0.5112057 0.05817818 x4 -0.2408166 0.2314570 0.3159543 0.3260189 -0.7621881 0.7099497 0.4884137 0.51693194 x5 -0.4113849 0.4442002 0.1950218 0.1539688 -2.1094303 2.8850004 0.1025563 0.04478734 x6 -0.2652903 0.1900581 0.3264551 0.4945158 -0.8126393 0.3843318 0.4620218 0.72029059 HTH, Jorge On Thu, Nov 27, 2008 at 4:03 AM, ales grill [EMAIL PROTECTED] wrote: Dear all, I have wrote a code for a linear regression. I want to write a loop for so, that I can get estimate for pavlues for six predictors. But I am getting for estmate for only last one. How can I get pvalues for all my predictors in a loop?? Anticipating your help Thanks Ales mat-matrix(rnorm(36),nrow=6) mat [,1] [,2] [,3] [,4] [,5][,6] [1,] 1.10536338 -0.7613770 -1.7100569 -1.8762241 -0.36579280 0.6465219 [2,] -1.34836804 -0.2174270 -0.1153477 -0.1727683 -1.88406206 1.7484955 [3,] 0.96814418 -2.1483727 0.5839668 -1.2361659 0.04592844 1.9937995 [4,] 0.01960219 -1.2339691 0.8290761 0.1002795 -0.15952881 0.3969251 [5,] 1.62343073 1.3741222 -1.2045854 0.4180127 -0.09898615 1.3575119 [6,] -0.95260509 -0.1522824 -1.4257526 1.0057412 -1.20068336 -0.4306761 res-rnorm(6) res [1] 0.2045252 -0.9824761 0.7727004 0.6439993 1.8005737 1.0167214 pval-NULL for(i in c(1:6)) + { + reg-lm(res~mat[,i]) + reg + pval[i]-reg$p.value + } pval NULL reg Call: lm(formula = res ~ mat[, i]) Coefficients: (Intercept) mat[, i] 0.8195 -0.2557 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] if then statement problem
If I understood your problem correctly, you are just missing a couple of things: q = which(apply(p.unique,2,function(x)all(x==r)) == TRUE) Also, you should probably change this line: if(q0){c=p.unique[,q]};{c=c(0,0,0)} To something like this: if(length(q)0){c=p.unique[,q]};{c=c(0,0,0)} Regards, Gustavo On Thu, Nov 27, 2008 at 3:02 PM, Salas, Andria Kay [EMAIL PROTECTED] wrote: I really need help with an if then statement I am having trouble with. A brief explanation of what I am trying to do: I have a p matrix of all permutations of a vector of length 3 filled with either 1s or -1s (these make up the columns). The p.unique matrix is the unique columns of the p matrix; here, they are identical but in my real script this will not always be the case. I want to be able to take a randomly generated vector and see which column in the p.unique matrix, if any, match the random vector. If there is a match, I get an integer returned that tells me which column is the match. If there is no match, I get the response integer (0). I want to write an if then statement that allows me to do different things based upon if there is a match to one of the columns in the p.unique matrix or not. Below is a sample script plucked from bits of my master script that show this problem. The vector r was made to not match any columns in p.unique. If it did, c w! ou! ld equal the integer representing the matching column, and since it does not, I want c to equal the vector (0,0,0). My if then statement is not working (I keep getting the error that my argument is of length zero), so I would really appreciate any help anyone can provide me. Thanks for all of your help previously and in advance for the help with this problem!! Happy Thanksgiving! p=as.matrix(do.call(expand.grid,rep(list(c(-1,1)),3))) p=t(p) p.unique=unique(p,MARGIN=2) r=c(2,2,2) q=which(apply(p.unique,2,function(x)all(x==r))) if(q0){c=p.unique[,q]};{c=c(0,0,0)} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] s4 object
Hola muy buen dia!! Me podrían explicar de una manera sencilla en que consiste y como funciona la clase s4 object. Se les agredeceria mucho su mas pronta respuesta! [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xreg in ARIMA modelling.
The help file states: The exact likelihood is computed via a state-space representation of the ARIMA process, and the innovations and their variance found by a Kalman filter. It is possible to include exogenous variables (xreg) this way, but one can only assume this is done [only one person knows for sure... the person who wrote the final version of arima(), and I hope he chimes in to this]. If this is the case, then there is a likelihood evaluation and AIC [or similar criteria, e.g., BIC and so on] would apply. 00alastair00 wrote: Hello, Does anyone know how the parameter estimates are calculated for xreg variables when called as part of an arima() command, or know of any literature that provides this info? In particular, I was wondering if there is a quick way to compare different combinations of xreg variables in the arima() fit in the same way that you would in multiple regression (using AIC R^2 etc.). Thanks very much! - The power of accurate observation is commonly called cynicism by those who have not got it. George Bernard Shaw -- View this message in context: http://www.nabble.com/%22xreg%22-in-ARIMA-modelling.-tp20718058p20727625.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Package Iso
I have just uploaded a revised version of the Iso package to ``incoming'' on CRAN. It should be available soon. I have fixed the bug that was pointed out to me by the two addressees. I have also added a new feature to the package; it returns a step function representation of the isotonic regression if requested to do so. Please let me know if I have introduced any new bugs in the revision process! cheers, Rolf ## Attention:\ This e-mail message is privileged and confid...{{dropped:9}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Troubles with the format of dates
Perhaps just this: Lines - DATUM P1 P2 2006-11-16 425.21 423.99 2006-12-15 425.12 423.97 2007-01-16 425.16 424.06 library(chron) DF - read.table(textConnection(Lines), header = TRUE, as.is = TRUE) DF$DATUM - chron(DF$DATUM, format = y-m-d, out.format = d.m.y) plot(P1 ~ DATUM, DF) # or using the zoo package this: library(zoo) z - read.zoo(textConnection(Lines), header = TRUE, FUN = function(x) chron(x, format = y-m-d, out.format = d.m.y)) plot(z) On Thu, Nov 27, 2008 at 8:20 AM, Kathi [EMAIL PROTECTED] wrote: Dear useRs, I'm struggling again with date-related stuff: I am using R to draw water levels at certain measuring stations. My data comes as a tab-delimited text file and looks like this: DATUM P1 P2 ... 2006-11-16 425.21 423.99 2006-12-15 425.12 423.97 2007-01-16 425.16 424.06 ... (measurements started in July 2004 and still continue on a monthly or bi-weekly basis) This is then plotted using this code: a-read.table(dummy_data.tab, sep=\t, header=TRUE, na.strings=c(0)) x-as.Date(a$DATUM) plot(x, a$P1, axes=FALSE, xlim=c(as.Date(2004-07-01), as.Date(2009-01-01)), ylim=c(423,428), col=red, pch=15, type=o, ylab=Kote [m a.s.l.], main=Dummy Plot) points(x, a$P2, col=red, pch=17, type=o, lty=dotted) axis(2, at=423:428, tck=1, col=gray60) axis.Date(1, at=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), labels=seq(as.Date(2004-07-01), as.Date(2009-01-01), by=month), tck=1, col=gray60) So far, I've been using this type of date format (-mm-dd) because that was the only way I could get the whole thing to work when I started using R. However, living in Switzerland, I would prefer the dates to read 16.11.2006 i.e. dd.mm. or 16. Nov. 2006 (preferably using German names for the months, if possible). I've used the chron package to convert my dates to day mon year (16 Nov 2006, with or without spaces), i.e. I replaced the second line of code with: x-format(chron(as.character(a$DATUM), format=y-m-d, out.format=day mon year)) but then I can't plot them any more, because as.Date() no longer is the correct function and R gives me an error saying there is an invalid xlim-value. Can someone please point me in the right direction? I could easily change the input data to be in dd.mm. format, if that helps. Thanks for your help, Kathi -- DropNet AG - Das Unternehmen fuer Ihren Internet-Auftritt! __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] AIC function and Step function
I would like to figure out the equations for calculating AIC in both step() function and AIC () function. They are different. Then I just type step in the R console, and found the AIC used in step() function is extractAIC. I went to the R help, and found: The criterion used is AIC = - 2*log L + k * edf, where L is the likelihood and edf the equivalent degrees of freedom (i.e., the number of free parameters for usual parametric models) of fit. For linear models with unknown scale (i.e., for lm and aov), -2log L is computed from the deviance and uses a different additive constant to logLik and hence AIC. If RSS denotes the (weighted) residual sum of squares then extractAIC uses for - 2log L the formulae RSS/s - n (corresponding to Mallows' Cp) in the case of known scale s and n log (RSS/n) for unknown scale. AIC only handles unknown scale and uses the formula n log (RSS/n) - n + n log 2π - sum log w where w are the weights. Now, my question is what code I should use to look at the exact calculation process in the AIC()function and extractAIC() function in R? Thanks! Dana -- View this message in context: http://www.nabble.com/AIC-function-and-Step-function-tp20728043p20728043.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] AIC function and Step function
Hi Dana, Of course the only true way to know what the AIC calculations are is to read the source code. From within R, what with namespaces, that is becoming increasingly difficult. The AIC() function is not too hard to find from R. AIC function (object, ..., k = 2) UseMethod(AIC) environment: namespace:stats So now we know that AIC is in the stats package. objects(package:stats, all=TRUE) [1] .MFclass .checkMFClasses .getXlevels AIC [5] ARMAacf ARMAtoMA Box.test C so all we are seeing is the AIC generic method shown above. Very often an S3 method will have a 'default' method, so let's see if we get lucky: stats:::AIC.default function (object, ..., k = 2) { ll - if (stats4 %in% loadedNamespaces()) stats4:::logLik else logLik if (length(list(...))) { object - list(object, ...) val - lapply(object, ll) val - as.data.frame(t(sapply(val, function(el) c(attr(el, df), AIC(el, k = k) names(val) - c(df, AIC) Call - match.call() Call$k - NULL row.names(val) - as.character(Call[-1]) val } else AIC(ll(object), k = k) } environment: namespace:stats so there is the basic AIC method. There are some references to logLik, so let's look for a 'logLik' method too. stats:::AIC.logLik function (object, ..., k = 2) -2 * c(object) + k * attr(object, df) environment: namespace:stats If anyone knows how to reveal all these methods without having to guess, I'd like to know of easier ways to find these methods. objects() doesn't reveal them, showMethods() doesn't appear to, and in my attempts, this occurred: methods(class = AIC) Error : cannot allocate vector of size 1.9 Gb R(352,0xa0350074) malloc: *** mmap(size=2023526400) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug Error in as.environment(pos) : no item called get(.__S3MethodsTable__., envir = asNamespace(i)) on the search list R(352,0xa0350074) malloc: *** mmap(size=2023526400) failed (error code=12) *** error: can't allocate region *** set a breakpoint in malloc_error_break to debug So it's just not easy finding these functions at the command prompt. Thankfully R is open source, so I downloaded the latest R-2.8.0.tar.gz to directory /Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/ and untarred it all. Navigate to /Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/R-2.8.0/src/library/stats/R and you're looking at all the stats package R scripts. There's 155 files in there, one named AIC.R so that looks promising. Look in AIC.R and you'll see the two functions shown above, but not the extractAIC bunch. Where to look for the rest? There are many ways to look for extractAIC - you could grep the files from a unix shell, etc. Here's a script I use to allow easier perusal of R source code. This script will concatenate all the R source files in a directory into one file, with file name and path descriptors between files in the output. ### 8 ### inputDir - /Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/R-2.8.0/src/library/stats/R outputDir - paste(inputDir, /temp, sep = ) outputFile - AllRScriptFiles.R outputDirFile - paste(outputDir, outputFile, sep = /) if ( !file.exists(outputDir) ) { dir.create(outputDir) } cat(paste(### All files from dir, inputDir, \n\n), file = outputDirFile) ## Get list of all files in dir fileNames - list.files(path = inputDir, pattern = \\.R$, full.names = TRUE) ofcon - file(outputDirFile, open = at) for (fi in seq(along = fileNames)) { fni - fileNames[fi] fc - readLines(fni, n = -1) cat(paste(\n\n# File, fni, #\n\n), file = outputDirFile, append = TRUE) writeLines(fc, con = ofcon) flush(ofcon) } close(ofcon) ### 8 ### After running this script, navigate to /Volumes/KilroyHD/kilroy/Software/R/R-2.8.0/R-2.8.0/src/library/stats/R/temp with your favourite editor, and edit the file AllRScriptFiles.R (Of course you will substitute your own directory names in all this.) Find extractAIC, and you'll see it came from file add.R You'll see extractAIC methods for classes coxph, survreg, glm, ... and you can see all the calculations. extractAIC - function(fit, scale, k = 2, ...) UseMethod(extractAIC) extractAIC.coxph - function(fit, scale, k = 2, ...) { ## seems that coxph sometimes gives one and sometimes gives two values ## for loglik edf - length(fit$coef) loglik - fit$loglik[length(fit$loglik)] c(edf, -2 * loglik + k * edf) } extractAIC.survreg - function(fit, scale, k = 2, ...) { edf - sum(fit$df) c(edf, -2 * fit$loglik[2] + k * edf) } extractAIC.glm - function(fit, scale = 0, k = 2, ...) { n - length(fit$residuals) edf - n - fit$df.residual aic - fit$aic c(edf, aic + (k-2) * edf) } extractAIC.lm - function(fit, scale = 0, k = 2, ...)
Re: [R] AIC function and Step function
Hi Dana, Many thanks to Christos Hatzis who sent me an offline response, pointing out the new functions that make this much easier than my last suggestions: methods() and getAnywhere() methods(extractAIC) [1] extractAIC.aov* extractAIC.coxph* extractAIC.glm* extractAIC.lm* extractAIC.negbin* [6] extractAIC.survreg* Non-visible functions are asterisked getAnywhere(extractAIC.coxph) A single object matching extractAIC.coxph was found It was found in the following places registered S3 method for extractAIC from namespace stats namespace:stats with value function (fit, scale, k = 2, ...) { edf - length(fit$coef) loglik - fit$loglik[length(fit$loglik)] c(edf, -2 * loglik + k * edf) } environment: namespace:stats Thank you Christos. That said, one of the advantages of getting the source code is that it has comments that are stripped out when the code is sourced into R e.g. from the command line getAnywhere(AIC.default) A single object matching AIC.default was found It was found in the following places registered S3 method for AIC from namespace stats namespace:stats with value function (object, ..., k = 2) { ll - if (stats4 %in% loadedNamespaces()) stats4:::logLik else logLik if (length(list(...))) { object - list(object, ...) val - lapply(object, ll) val - as.data.frame(t(sapply(val, function(el) c(attr(el, df), AIC(el, k = k) names(val) - c(df, AIC) Call - match.call() Call$k - NULL row.names(val) - as.character(Call[-1]) val } else AIC(ll(object), k = k) } environment: namespace:stats From the source file # File src/library/stats/R/AIC.R # Part of the R package, http://www.R-project.org # # This program is free software; you can redistribute it and/or modify # it under the terms of the GNU General Public License as published by # the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # This program is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # GNU General Public License for more details. # # A copy of the GNU General Public License is available at # http://www.r-project.org/Licenses/ Return the object's value of the Akaike Information Criterion (or An Inf.. Crit..) AIC - function(object, ..., k = 2) UseMethod(AIC) ## AIC for logLik objects AIC.logLik - function(object, ..., k = 2) -2 * c(object) + k * attr(object, df) AIC.default - function(object, ..., k = 2) { ## AIC for various fitted objects --- any for which there's a logLik() method: ll - if(stats4 %in% loadedNamespaces()) stats4:::logLik else logLik if(length(list(...))) {# several objects: produce data.frame object - list(object, ...) val - lapply(object, ll) val - as.data.frame(t(sapply(val, function(el) c(attr(el, df), AIC(el, k = k) names(val) - c(df, AIC) Call - match.call() Call$k - NULL row.names(val) - as.character(Call[-1]) val } else AIC(ll(object), k = k) } Steven McKinney Statistician Molecular Oncology and Breast Cancer Program British Columbia Cancer Research Centre email: smckinney +at+ bccrc +dot+ ca tel: 604-675-8000 x7561 BCCRC Molecular Oncology 675 West 10th Ave, Floor 4 Vancouver B.C. V5Z 1L3 Canada __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.