[R] gsub syntax
Hello

I know that R's string functions are not as extensive as those of Unix, but I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities (sed, awk, etc.).

Can anyone explain the following gsub phenomenon to me:

    dates <- c("73","74","02","1973","1974","2002")

I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year. I should be able to use substr, but measurement from the string end (with a negative counter or something) is not implemented:

    substr(dates,3,4)
    [1] ""   ""   ""   "73" "74" "02"
    substr(dates,-2,4)
    [1] "73"   "74"   "02"   "1973" "1974" "2002"
    substr(dates,4,-2)
    [1] "" "" "" "" "" ""

So I tried gsub:

    gsub("[19|20]([0-9][0-9])","\\1",dates)
    [1] "73"  "74"  "02"  "973" "974" "002"

As I understand it (and comparing with sed), the \\1 should take the first bracketed string, but clearly this doesn't work. If I try what should also work:

    gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
    [1] "73"  "74"  "02"  "973" "974" "002"

On the other hand the following does work:

    gsub("[19|20]([0-9])([0-9])","\\2",dates)
    [1] "73" "74" "02" "73" "74" "02"

So it appears that the substitution takes one character extra to the left, but the following indicates that the lower limit of the selected range is also at fault:

    s <- c("1","12","123","1234","12345","123456")
    gsub("[12]([4-6]*)","",s)
    [1] ""     ""     "3"    "34"   "345"  "3456"

Probably more elegant examples could be constructed that could home in on the issue.

The version is R 2.0.1 on Linux so perhaps it is a little old now.

Questions:

1) Am I misunderstanding the gsub use?
2) Was it a bug that has since been corrected?
3) Is it still a bug in the latest version?

TIA

John

John Logsdon
Quantex Research Ltd, Manchester UK
Try to make things as simple as possible but not simpler
[EMAIL PROTECTED] [EMAIL PROTECTED]
+44(0)161 445 4951/G:+44(0)7717758675
www.quantex-research.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] gsub syntax
you could use something like:

    dates <- c("73", "74", "02", "1973", "1974", "2002")
    ###
    nd <- nchar(dates)
    substr(dates, ifelse(nd == 2, 1, 3), nd)

I hope it helps.

Best,
Dimitris

Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: John Logsdon [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Sunday, November 27, 2005 11:04 AM
Subject: [R] gsub syntax

> Hello
>
> I know that R's string functions are not as extensive as those of Unix, but I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities (sed, awk, etc.).
>
> Can anyone explain the following gsub phenomenon to me:
>
>     dates <- c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year. I should be able to use substr, but measurement from the string end (with a negative counter or something) is not implemented:
>
>     substr(dates,3,4)
>     [1] ""   ""   ""   "73" "74" "02"
>     substr(dates,-2,4)
>     [1] "73"   "74"   "02"   "1973" "1974" "2002"
>     substr(dates,4,-2)
>     [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
>     gsub("[19|20]([0-9][0-9])","\\1",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first bracketed string, but clearly this doesn't work. If I try what should also work:
>
>     gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
>     gsub("[19|20]([0-9])([0-9])","\\2",dates)
>     [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left, but the following indicates that the lower limit of the selected range is also at fault:
>
>     s <- c("1","12","123","1234","12345","123456")
>     gsub("[12]([4-6]*)","",s)
>     [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on the issue.
>
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?
> 2) Was it a bug that has since been corrected?
> 3) Is it still a bug in the latest version?
>
> TIA
>
> John

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
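A minimal sketch of the substr() approach from the reply above, assuming the years are stored as character strings (the quotes are not visible in the archived post). The start position is chosen per element from the string length, so 2-digit years come back whole and 4-digit years lose their century.

```r
# Assumption: years are character strings, as in the original question
dates <- c("73", "74", "02", "1973", "1974", "2002")

nd <- nchar(dates)                       # length of each string
start <- ifelse(nd == 2, 1, 3)           # 2-digit: from char 1; 4-digit: from char 3
last2 <- substr(dates, start, nd)        # substr() recycles start/stop per element

print(last2)
# [1] "73" "74" "02" "73" "74" "02"
```

If every string has at least two characters, substr(dates, nd - 1, nd) should give the same result without the ifelse().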
Re: [R] gsub syntax
John Logsdon wrote:
> Hello
>
> I know that R's string functions are not as extensive as those of Unix, but I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities (sed, awk, etc.).
>
> Can anyone explain the following gsub phenomenon to me:
>
>     dates <- c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year. I should be able to use substr, but measurement from the string end (with a negative counter or something) is not implemented:
>
>     substr(dates,3,4)
>     [1] ""   ""   ""   "73" "74" "02"
>     substr(dates,-2,4)
>     [1] "73"   "74"   "02"   "1973" "1974" "2002"
>     substr(dates,4,-2)
>     [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
>     gsub("[19|20]([0-9][0-9])","\\1",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first bracketed string, but clearly this doesn't work. If I try what should also work:
>
>     gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
>     gsub("[19|20]([0-9])([0-9])","\\2",dates)
>     [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left, but the following indicates that the lower limit of the selected range is also at fault:
>
>     s <- c("1","12","123","1234","12345","123456")
>     gsub("[12]([4-6]*)","",s)
>     [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on the issue.
>
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?
> 2) Was it a bug that has since been corrected?
> 3) Is it still a bug in the latest version?
>
> TIA
>
> John

Hi, John,

I cannot comment on your questions since I'm no regexpr guru. However, it seems to me you can do the following instead:

    gsub(".*([0-9][0-9])", "\\1", dates)

This works fine on Linux and Windows, R-2.2.0.

HTH,

--sundar
[R] the output of coxph
Dear All:

I have some questions about the output of coxph. Below is the input and output:

    > coxph(formula = Surv(futime, fustat) ~ age + rx + ecog.ps, data = ovarian, x = TRUE)
    Call:
    coxph(formula = Surv(futime, fustat) ~ age + rx + ecog.ps, data = ovarian, x = TRUE)

              coef exp(coef) se(coef)     z      p
    age      0.147     1.158   0.0463  3.17 0.0015
    rx      -0.815     0.443   0.6342 -1.28 0.2000
    ecog.ps  0.103     1.109   0.6064  0.17 0.8600

    Likelihood ratio test=15.9  on 3 df, p=0.00118  n= 26

Question One: As far as I know, the p-value for age gives its significance level. But what exactly does the parameter (coef) mean, and how is it calculated? If the sample size is small (20 to 40), is the estimate still reliable?

Question Two: According to John Fox's "Cox Proportional-Hazards Regression for Survival Data" (http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf), the p-value in the last line (Likelihood ratio test=15.9 on 3 df, p=0.00118) comes from one of several asymptotically equivalent tests of the omnibus null hypothesis that all of the β's are zero. Can anybody explain why this is true? (As far as I know, the p-value is obtained as 1-pchisq(2*log(likelihood ratio), df), because 2*log(likelihood ratio) is approximately chi-square for nested models.)

Thank you very much.

Sincerely,
Alan

2005-11-27
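The numbers in the printed output above can be reproduced by hand, which may help with both questions. This is a sketch that only reuses the rounded values shown in the post (it does not refit the model): z is the coefficient divided by its standard error and is referred to a standard normal, exp(coef) is the estimated hazard ratio per unit increase in the covariate, and the likelihood ratio statistic is referred to a chi-square distribution with 3 degrees of freedom.

```r
# Values copied from the printed coxph output (rounded, so results are approximate)
coef_age <- 0.147
se_age   <- 0.0463

z_age  <- coef_age / se_age                 # Wald statistic, about 3.17
p_age  <- 2 * pnorm(-abs(z_age))            # two-sided normal p-value, about 0.0015
hr_age <- exp(coef_age)                     # hazard ratio per year of age, about 1.158

lrt   <- 15.9                               # 2 * (loglik(full model) - loglik(null model))
p_lrt <- pchisq(lrt, df = 3, lower.tail = FALSE)   # about 0.0012

print(round(c(z = z_age, p = p_age, hazard_ratio = hr_age, p_lrt = p_lrt), 4))
```

Both p-values rest on large-sample approximations, so with n = 26 they are probably best read as rough guides rather than exact significance levels.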
Re: [R] Using an editor with R
Duncan Murdoch wrote:
> On 11/26/2005 4:53 PM, Walter R. Paczkowski wrote:
>> Hello,
>>
>> I changed the setting in options$editor to allow me to use my favorite editor. In R 2.1.1 on Windows XP, I entered at the command line:
>>
>>     options(editor="c:\\program files\\winedit\\winedit.exe")
>>
>> When I edited a function, say test, using fix(test), the editor opened perfectly. But, when I saved the file and closed the editor, the R gui screen was white, blank, and completely unresponsive. The only thing I could do was close R by clicking on the X in the upper right corner of the window.
>>
>> How can I use my editor but be able to continue using R after I close the editor? What extra setting am I missing?

1. Probably you do not want to use fix() (or only under very rare circumstances), but use the code in your editor and source the file into R, so you do not need to close the editor.

2. Are you talking about winedit or the editor WinEdt (just one "i" in it ...)?

> This sounds like you didn't really close your editor. R isn't smart enough to know that the editor closed a file, it can only see when the process finishes.
>
> I'd recommend using the RWinEdt package instead for a different way to integrate winedit with R.

Well, at least to integrate WinEdt. ;-)

Best,
Uwe Ligges

> Duncan Murdoch
Re: [R] r question
Please,

1. read the posting guide
2. use a sensible subject line
3. this is NOT an r question
4. ask your teacher to explain your homework, but not this list

Uwe Ligges

yuying shi wrote:
> If there are two random variables X1 and X2 which have a bivariate normal distribution with mean vector (10, 10) and variance-covariance matrix [21.95 1.953], how do I calculate the mean and variance of the function Y=X1/X2?
>
> Thanks a lot!
> xingyu
Re: [R] coherency-Time Series
[EMAIL PROTECTED] wrote:
> hello!
>
> My name is Stefanos, from Athens. I'm a new user of R and I'm studying multivariate time series. I can't find in the help menu how to calculate the cross spectrum and the coherency of 2 Time Series. Would you like to help me?

See ?spectrum and ?cor

Uwe Ligges

> Thanks
Re: [R] coherency-Time Series
[EMAIL PROTECTED] wrote:
> hello!
>
> My name is Stefanos, from Athens. I'm a new user of R and I'm studying multivariate time series. I can't find in the help menu how to calculate the cross spectrum and the coherency of 2 Time Series. Would you like to help me?
>
> Thanks

See ?spectrum, and especially component $coh of the output.

Kjetil
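A small sketch of the $coh component mentioned above, using simulated data (the two series and the smoothing spans are made up for illustration). It calls spec.pgram(), the periodogram-based estimator behind spectrum(); some smoothing is needed, because the raw unsmoothed estimate of squared coherency is identically 1. As far as I know the returned object holds the squared coherency and the phase rather than the complex cross spectrum itself.

```r
set.seed(1)
n <- 256

# Two artificial series that share a common signal (assumption for illustration)
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 0.8 * x + rnorm(n)

xy <- ts(cbind(x = x, y = y))

# Smoothed periodogram with modified Daniell smoothers; spans chosen arbitrarily
sp <- spec.pgram(xy, spans = c(5, 5), taper = 0.1, plot = FALSE)

# sp$coh: squared coherency, one column per pair of series
# sp$phase: phase of the cross spectrum, same layout
print(head(cbind(freq = sp$freq, coh = sp$coh[, 1], phase = sp$phase[, 1])))
```

Calling plot(sp, plot.type = "coherency") on the same object draws the squared coherency against frequency.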
Re: [R] Newton iteration questions
Yet another time we shall solve your homework? Please stop sending your homework questions to R-help!

Uwe Ligges

yuying shi wrote:
> Dear Sir/Madam,
>
> If I have a sample of observations that come from an extreme value distribution, the density function for the extreme value distribution is:
>
>     f(x)=(1/b)exp[-(x-a)/b]exp{-exp[-(x-a)/b]}, b>0, x can be any value,
>
> my question is how to implement the Newton iteration to estimate the parameters of this distribution to an accuracy of epsilon=0.0001?
>
> The n = 100 observations are given as follows:
>
>     x <- c(8.8, 9.4, 8.7, 9.3, 9.6, 9.4, 9.1, 9.4, 8.4, 6.8, 8.4,
>     ?.2, 9.4, 7.4, 8.7, 9.4, 9.2, 9.3, 8.0, 8.5, 8.7, 9.7, 9.8,
>     ?.5, 7.1, 7.8, 9.0, 8.6, 9.4, 6.9, 9.1, 9.9, 7.3, 8.5, 8.8,
>     ?.4, 9.0, 8.6, 8.5, 9.2, 9.7, 9.2, 9.2, 8.4, 8.7, 9.6, 9.2,
>     ?.8, 8.5, 9.0, 8.9, 9.6, 8.0, 9.7, 8.4, 7.5, 9.1, 9.2, 8.9,
>     ?.2, 9.8, 9.4, 8.5, 9.3, 9.8, 9.6, 9.7, 8.9, 9.7, 8.7, 8.6,
>     ?.7, 8.6, 9.7, 7.7, 8.6, 9.7, 8.5, 9.4, 9.4, 9.7, 8.1, 9.5,
>     ?.3, 8.0, 9.8, 8.9, 9.5, 9.0, 8.7, 9.1, 8.5, 8.7, 8.4, 9.3,
>     ?.5, 8.9, 9.3, 9.0, 9.9)
>
> thanks in advance!
> xingyu
Re: [R] rescale x-axis
singyee ling wrote:
> Dear all,
>
> I am trying to draw a survival curve with probability of surviving as the y-axis and days (0-500 days) as the x-axis. However, I do not want the days to be equally spaced on the x-axis, as I am more interested in looking at the behaviour of the curve in the first 50 days. I am reluctant to use xlim=c(0,1000) as I want to see the whole picture. Hence, what I am interested in is a scale in which the days are not equally spaced. By that, I mean the length of the interval between the days gets smaller and smaller, which gives greater emphasis to the initial period (i.e. the length of the interval between 0-1 days is longer than the interval between 1-2 days and so on). I hope what I say above makes sense.
>
> Any advice?

What about applying a logarithm such as in

    plot(1:10, log="x")

Uwe Ligges

> thanks!
>
> sing yee
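A sketch of the log-axis suggestion above applied to a survival-style curve. The days and survival probabilities here are invented for illustration. One caveat: day 0 cannot be drawn on a logarithmic axis, so the made-up data start at day 1 (plotting days + 1 would be another option).

```r
# Made-up data: survival probability at selected days (illustration only)
days <- c(1, 2, 5, 10, 20, 50, 100, 200, 500)
surv <- exp(-days / 150)

# log = "x" stretches the early days and compresses the late ones
plot(days, surv, type = "s", log = "x", ylim = c(0, 1),
     xlab = "Days (log scale)", ylab = "Probability of surviving")

# Width of the intervals 1-2, 2-3, 3-4 days on the log10 axis: they shrink
gaps <- diff(log10(1:4))
print(round(gaps, 3))
# [1] 0.301 0.176 0.125
```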
Re: [R] creating a factor from other factors and ifelse
[EMAIL PROTECTED] writes:
> Hi,
>
> Given

nevermind...

Five identical messages in five minutes and five seconds! Perhaps a little more patience next time?

--
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
Re: [R] r question
Uwe Ligges [EMAIL PROTECTED] writes:
> Please,
>
> 1. read the posting guide
> 2. use a sensible subject line
> 3. this is NOT an r question
> 4. ask your teacher to explain your homeworks, but not this list
>
> Uwe Ligges

And, btw, neither the mean nor the variance exists, so the question is incomplete, and any answer approximate.

> yuying shi wrote:
>> If there are two random variables X1 and X2 which have a bivariate normal distribution with mean vector (10, 10) and variance-covariance matrix [21.95 1.953], how do I calculate the mean and variance of the function Y=X1/X2?
>>
>> Thanks a lot!
>> xingyu

--
   O__       Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])                             FAX: (+45) 35327907
Re: [R] IRT Package
I do not believe another IRT package exists. However, I have recently used the rasch() function in ltm for a study I am doing and have found it very useful.

I'm curious (as I'm sure the ltm developer is) as to what you are doing that ltm cannot handle.

Harold

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Caio Lucidius Naberezny Azevedo
Sent: Sat 11/26/2005 4:50 PM
To: r-help@stat.math.ethz.ch
Cc:
Subject: [R] IRT Package

Hi all,

Could anyone tell me if there is some package that fits any Item Response Model (other than the ltm package)?

Regards,
Caio
Re: [R] gsub syntax
On 11/27/05, John Logsdon [EMAIL PROTECTED] wrote:
> Hello
>
> I know that R's string functions are not as extensive as those of Unix but

I don't think this statement is true although I have seen it repeated.

> I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities, sed, awk etc.

Free versions of these utilities are available for Windows although they don't come with Windows. e.g. Google for gawk.

> Can anyone explain the following gsub phenomenon to me:
>
>     dates <- c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year. I should be able to use substr, but measurement from the string end (with a negative counter or something) is not implemented:
>
>     substr(dates,3,4)
>     [1] ""   ""   ""   "73" "74" "02"
>     substr(dates,-2,4)
>     [1] "73"   "74"   "02"   "1973" "1974" "2002"
>     substr(dates,4,-2)
>     [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
>     gsub("[19|20]([0-9][0-9])","\\1",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first bracketed string, but clearly this doesn't work. If I try what should also work:
>
>     gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
>     gsub("[19|20]([0-9])([0-9])","\\2",dates)
>     [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left, but the following indicates that the lower limit of the selected range is also at fault:
>
>     s <- c("1","12","123","1234","12345","123456")
>     gsub("[12]([4-6]*)","",s)
>     [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on the issue.
>
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?
> 2) Was it a bug that has since been corrected?
> 3) Is it still a bug in the latest version?

It works the same on my system which is 2.2.0 Windows patched (2005-10-24). At first I too thought it was a bug but I noticed it works the same in perl so now I am not sure. The following perl program, using perl 5.8.6 on Windows, gives 002 as the answer too:

    $_ = 2002;
    s/[19|20]([0-9])([0-9])/\1\2/g;
    print;

In any case, it could be done like this:

    sub(".*(..)$", "\\1", dates)

or

    substring(dates, nchar(dates)-1)

or the following, which appends -01-01 to the year, converts it to Date class, implicitly converts it back to character and then extracts the 3rd to 4th characters of the result:

    substring(as.Date(sprintf("%s-01-01", dates)), 3, 4)

or
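The first two alternatives in the reply above can be wrapped into a small helper for taking the last n characters of each string, which is the "measure from the end" operation the original question was after. The function name last_chars is hypothetical, introduced here only for illustration.

```r
# Hypothetical helper: last n characters of each element of x
last_chars <- function(x, n = 2) {
  x <- as.character(x)
  len <- nchar(x)
  # start n - 1 characters before the end, but never before position 1
  substring(x, pmax(len - n + 1, 1), len)
}

dates <- c("73", "74", "02", "1973", "1974", "2002")

by_position <- last_chars(dates)              # substring() counted from the end
by_regex    <- sub(".*(..)$", "\\1", dates)   # greedy .* leaves the final two characters

print(by_position)
print(by_regex)
# both print: [1] "73" "74" "02" "73" "74" "02"
```

The positional version does not depend on the content being digits, and strings shorter than n are returned unchanged.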
Re: [R] gsub syntax
R is blameless here: it works as documented and in the same way as POSIX tools.

It agrees with 'sed' using the same syntax (modulo the shell-specific quoting rules) e.g. in csh

    % echo 1973 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
    973
    % echo 1973 | sed 's/\([19|20]\)\([0-9][0-9]\)/-\1-\2-/g'
    -1-97-3
    % echo 73 74 02 1973 1974 2002 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
    73 74 02 973 974 002

so what happened when you were 'comparing with sed'?

[19|20] is a character class (containing five characters) matching one character, not a match for two characters as you seem to imagine. It does not mean the same as 19|20, which is what you seem to have intended (and you seem only to want to do the substitution once on each string, so why use gsub?):

    sub("19|20([0-9][0-9])", "\\1", dates)
    [1] "73" "74" "02" "73" "74" "02"

A more direct way which would work e.g. for 1837 would be

    sub(".*([0-9]{2}$)", "\\1", dates)

or even better (locale-independent)

    sub(".*([[:digit:]]{2}$)", "\\1", dates)

Current versions of R have a help page ?regexp explaining what regexps are. Even 2.0.1 did, although you were asked to update *before* posting (see the posting guide). It was unambiguous:

    A _character class_ is a list of characters enclosed by '[' and ']'
    matches any single character in that list ...
                ^^^^^^
    ...
    Note that alternation does not work inside character classes,
    where \code{|} has its literal meaning.

On Sun, 27 Nov 2005, John Logsdon wrote:

> Hello
>
> I know that R's string functions are not as extensive as those of Unix, but I need to do some text handling totally within an R environment because the target is a Windows system which will not have the corresponding shell utilities (sed, awk, etc.).
>
> Can anyone explain the following gsub phenomenon to me:
>
>     dates <- c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and both digits when it is a 2-digit year. I should be able to use substr, but measurement from the string end (with a negative counter or something) is not implemented:

Why 'should' it work in a different way to that documented?

>     substr(dates,3,4)
>     [1] ""   ""   ""   "73" "74" "02"
>     substr(dates,-2,4)
>     [1] "73"   "74"   "02"   "1973" "1974" "2002"
>     substr(dates,4,-2)
>     [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
>     gsub("[19|20]([0-9][0-9])","\\1",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first bracketed string, but clearly this doesn't work. If I try what should also work:
>
>     gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
>     [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
>     gsub("[19|20]([0-9])([0-9])","\\2",dates)
>     [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left, but the following indicates that the lower limit of the selected range is also at fault:
>
>     s <- c("1","12","123","1234","12345","123456")
>     gsub("[12]([4-6]*)","",s)
>     [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on the issue.
>
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?

Yes.

> 2) Was it a bug that has since been corrected?

Unfortunately the bug reported two years ago in

    library(fortunes); fortune("WTFM")

still seems extant. See the posting guide for advice on how to correct it.

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
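The difference between a character class and alternation, as explained in the reply above, can be checked directly. This is a sketch assuming the years are character strings; the anchored pattern in the last step is one possible stricter variant and not a pattern taken from the thread.

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# [19|20] is a character class: ONE character out of 1, 9, |, 2, 0
class_chars <- grepl("[19|20]", c("1", "9", "|", "2", "0", "7"))
print(class_chars)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

# So "1973" matches as "1" + "97": the match is replaced by "97", and "3" is left over
bad <- gsub("[19|20]([0-9][0-9])", "\\1", dates)
print(bad)
# [1] "73"  "74"  "02"  "973" "974" "002"

# Variant (assumption, not from the thread): grouped alternation, anchored to the
# whole string, so only 4-digit years starting with 19 or 20 are changed
good <- sub("^(19|20)([0-9]{2})$", "\\2", dates)
print(good)
# [1] "73" "74" "02" "73" "74" "02"
```

This also explains why the "\\2" version in the question only appeared to work: the match "197" was replaced by "7", and the unmatched trailing "3" was left in place, giving "73" by coincidence.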
[R] multilevel models and sample size
It is not a pure R question, but I hope someone can give me some advice.

I want to analyse my data with a multilevel model. The data have 2 levels: the second level has 52 units, and each second-level unit has 19-23 units. I think the sample size is quite small, but for now I can't make it much bigger. So I want to ask: if I use a multilevel model to analyse the data set, will it be acceptable, or unacceptable because of the small sample size?

Thank you very much!

ronggui

2005-11-28

--
Department of Sociology
Fudan University
My new mail address is [EMAIL PROTECTED]
Blog: http://sociology.yculblog.com
[R] Counting the occurence of each unique charecter string
LS,

I would really like to know how to count the frequency/occurrence of characters inside a dataset. I am working with extremely large datasets of forest inventory data with a large variety of different species inside them. Each row of the dataframe represents one individual tree, and the simplified dataframe looks something like this:

    num  species  dbh
    1    sp1      30
    2    sp1      20
    3    sp2      30
    4    sp1      40

I need to be able to count the number of individuals per species, so I need a command that will return, for each unique species, its occurrence inside the dataframe:

    [sp1] 3
    [sp2] 1

After a long search through help.search() and the web I found very little, and any alternative like exporting the dataset to another program (Excel) is not really an option because the dataset is far too large.

I am using R 2.2.0 on Windows, and if anyone knows a solution please help!

Many sincere thanks in advance,
Marco
Re: [R] Counting the occurence of each unique charecter string
use table() to get what you want. see ?table

=== 2005-11-28 01:49:19, you wrote: ===

> LS,
>
> I would really like to know how to count the frequency/occurrence of characters inside a dataset. I am working with extremely large datasets of forest inventory data with a large variety of different species inside them. Each row of the dataframe represents one individual tree, and the simplified dataframe looks something like this:
>
>     num  species  dbh
>     1    sp1      30
>     2    sp1      20
>     3    sp2      30
>     4    sp1      40
>
> I need to be able to count the number of individuals per species, so I need a command that will return, for each unique species, its occurrence inside the dataframe:
>
>     [sp1] 3
>     [sp2] 1
>
> After a long search through help.search() and the web I found very little, and any alternative like exporting the dataset to another program (Excel) is not really an option because the dataset is far too large.
>
> I am using R 2.2.0 on Windows, and if anyone knows a solution please help!
>
> Many sincere thanks in advance,
> Marco

= = = = = = = = = = = = = = = = = = = =

2005-11-28

--
Department of Sociology
Fudan University
My new mail address is [EMAIL PROTECTED]
Blog: http://sociology.yculblog.com
Re: [R] Counting the occurence of each unique charecter string
    ?table

    table(mydata$species)

Marco Visser wrote:
> LS,
>
> I would really like to know how to count the frequency/occurrence of characters inside a dataset. I am working with extremely large datasets of forest inventory data with a large variety of different species inside them. Each row of the dataframe represents one individual tree, and the simplified dataframe looks something like this:
>
>     num  species  dbh
>     1    sp1      30
>     2    sp1      20
>     3    sp2      30
>     4    sp1      40
>
> I need to be able to count the number of individuals per species, so I need a command that will return, for each unique species, its occurrence inside the dataframe:
>
>     [sp1] 3
>     [sp2] 1
>
> After a long search through help.search() and the web I found very little, and any alternative like exporting the dataset to another program (Excel) is not really an option because the dataset is far too large.
>
> I am using R 2.2.0 on Windows, and if anyone knows a solution please help!
>
> Many sincere thanks in advance,
> Marco

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894
Re: [R] multilevel models and sample size
All models are wrong, but some are useful. --George Box

I do not understand what you mean by acceptable, nor levels nor units. Specifying your model would help clarify things, I think.

If by levels you mean the number of different values of a random factor, then 2 levels is unlikely to tell you much useful about the variability of that factor. On the other hand, 50 values might be. It depends on the model, the data, and the scientific objectives, none of which you have stated clearly enough for me to understand, anyway.

-- Bert

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of ronggui
Sent: Sunday, November 27, 2005 9:34 AM
To: r-help@stat.math.ethz.ch
Subject: [R] multilevel models and sample size

It is not a pure R question, but I hope someone can give me some advice.

I want to analyse my data with a multilevel model. The data have 2 levels: the second level has 52 units, and each second-level unit has 19-23 units. I think the sample size is quite small, but for now I can't make it much bigger. So I want to ask: if I use a multilevel model to analyse the data set, will it be acceptable, or unacceptable because of the small sample size?

Thank you very much!

ronggui

2005-11-28

--
Department of Sociology
Fudan University
My new mail address is [EMAIL PROTECTED]
Blog: http://sociology.yculblog.com
Re: [R] Counting the occurence of each unique charecter string
On 27-Nov-05 Marco Visser wrote:

> LS,
>
> I would really like to know how to count the frequency/occurrence of character strings inside a dataset. I am working with extremely large datasets of forest inventory data with a large variety of different species inside them. Each row of the dataframe represents one individual tree, and the simplified dataframe looks something like this:
>
>   num species dbh
>   1   sp1     30
>   2   sp1     20
>   3   sp2     30
>   4   sp1     40
>
> I need to be able to count the number of individuals per species, so I need a command that will return, for each unique species, its number of occurrences in the dataframe:
>
>   [sp1] 3
>   [sp2] 1

Does the following help? (Using an artificial example a bit more complicated than yours.) The dataframe "trees" consists of a list of species names under "Species", and values of a numeric variable under "X".

  trees
              Species   X
  1     Larix decidua 203
  2  Pinus sylvestris 303
  3     Larix decidua 202
  4  Pinus sylvestris 301
  5       Picea abies 102
  6       Picea abies 103
  7  Pinus sylvestris 302
  8       Picea abies 101
  9     Larix decidua 201
  10      Picea abies 104
  11      Picea abies 105
  12 Pinus sylvestris 304

  freqs <- as.data.frame(table(trees$Species))
  colnames(freqs) <- c("Species", "Counts")
  freqs
             Species Counts
  1    Larix decidua      3
  2      Picea abies      5
  3 Pinus sylvestris      4

  mean(freqs$Counts)
  [1] 4
  sd(freqs$Counts)
  [1] 1

Just using table() would give you the same information, but converting it to a dataframe makes that information more readily accessible by familiar methods.

Hoping this helps,
Ted.

E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 27-Nov-05 Time: 18:27:10
-- XFMail --
Re: [R] multilevel models and sample size
ronggui wrote:

> It is not a pure R question, but I hope someone can give me some advice.
>
> I want to analyse my data with a multilevel model. The data have 2 levels: the second level has 52 units, and each second-level unit has 19-23 units. I think the sample size is quite small, but just now I can't make the sample size much bigger. So I want to ask: if I use a multilevel model to analyse the data set, will it be acceptable, or unacceptable because of the small sample size?

This kind of question I usually try to answer by simulation, which is very easy in R.

Kjetil

> Thank you very much!
>
> ronggui
> 2005-11-28
>
> -- 
> Department of Sociology
> Fudan University
> My new mail address is [EMAIL PROTECTED]
> Blog: http://sociology.yculblog.com
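One way such a simulation could look is sketched below. Everything in it is an assumption made for illustration: a random-intercept model with 52 groups of 19-23 units, made-up standard deviations (1 between groups, 2 within), and, to stay within base R, the classical one-way ANOVA (method of moments) estimator of the variance components rather than a full multilevel fit. The function name simulate_once is hypothetical.

```r
set.seed(1)

# Simulate one data set and estimate the two variance components.
# Assumed design: n_groups second-level units, each with 19-23 members.
simulate_once <- function(n_groups = 52, sizes = 19:23,
                          sd_between = 1, sd_within = 2) {
  n_i <- sample(sizes, n_groups, replace = TRUE)      # group sizes
  g   <- factor(rep(seq_len(n_groups), times = n_i))  # group labels
  u   <- rnorm(n_groups, sd = sd_between)             # random intercepts
  y   <- u[as.integer(g)] + rnorm(sum(n_i), sd = sd_within)

  ms <- anova(lm(y ~ g))[["Mean Sq"]]  # between- and within-group mean squares
  N  <- sum(n_i)
  n0 <- (N - sum(n_i^2) / N) / (n_groups - 1)  # effective group size (unbalanced)

  c(between = (ms[1] - ms[2]) / n0, within = ms[2])
}

# Repeat many times and look at how well the true values (1 and 4) are recovered.
est <- t(replicate(200, simulate_once()))
colMeans(est)      # average estimates of the variance components
apply(est, 2, sd)  # their spread across simulated data sets
```

The spread of the estimates across replications gives a rough idea of how precisely a design of this size pins down each variance component. Swapping in the model actually intended, and plausible parameter values, would be the next step.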
Re: [R] multilevel models and sample size
On Sun, 27 Nov 2005, Berton Gunter wrote:

> "All models are wrong, but some are useful." --George Box
>
> I do not understand what you mean by "acceptable", nor "levels", nor "units". Specifying your model would help clarify things, I think. If by "levels" you mean the number of different values of a random factor, then 2 levels is unlikely to tell you much useful about the variability of that factor. On the other hand, 50 values might be. It depends on the model, the data, and the scientific objectives, none of which you have stated clearly enough for me to understand, anyway.

My guess is that he means this is a nested design with e.g. 52 classes containing 19-23 pupils each. (It always helps to state the real problem!)

If so, this is quite a large problem for multilevel models. The classical nested designs for measurement errors typically have two replications at the lowest level - you get an idea of the variability from the many differences between matched pairs. Of course the homogeneity assumptions have to be approximately true.

> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of ronggui
> Sent: Sunday, November 27, 2005 9:34 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] multilevel models and sample size
>
> It is not a pure R question, but I hope someone can give me some advice.
>
> I want to analyse my data with a multilevel model. The data have 2 levels: the second level has 52 units, and each second-level unit has 19-23 units. I think the sample size is quite small, but just now I can't make the sample size much bigger. So I want to ask: if I use a multilevel model to analyse the data set, will it be acceptable, or unacceptable because of the small sample size?

-- 
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Re: [R] obtaining a ROC curve
Anjali Karve wrote:

> Hello,
>
> I have a classification tree. I want to obtain a ROC curve for this test. What is the easiest way to obtain one?
>
> -Anjali

ROC curves have a number of problems, chief among them leading to the temptation of dichotomizing test results. ROC areas are useful statistics though. In the Hmisc package see somers2 and rcorr.cens for getting the ROC area nonparametrically.

Frank

-- 
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University
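For readers who want to see what a nonparametric ROC area amounts to, here is a base-R sketch using the Wilcoxon rank-sum identity. It is not the Hmisc implementation; the function name roc_area and the toy scores are hypothetical, and the scores stand in for predicted probabilities from a fitted classifier.

```r
# ROC area (concordance probability) from ranks:
# the proportion of (positive, negative) pairs in which the positive case
# has the higher score, with ties counted as one half.
roc_area <- function(score, outcome) {
  stopifnot(length(score) == length(outcome), all(outcome %in% c(0, 1)))
  n1 <- sum(outcome == 1)  # number of positives
  n0 <- sum(outcome == 0)  # number of negatives
  stopifnot(n1 > 0, n0 > 0)
  r <- rank(score)         # midranks, so ties are handled
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy data: 3 of the 4 positive/negative pairs are ordered correctly.
score   <- c(0.1, 0.4, 0.35, 0.8)
outcome <- c(0,   0,   1,    1)
roc_area(score, outcome)  # 0.75
```

Working with the area alone avoids choosing a cut-off, which is in line with the caution about dichotomizing test results.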
Re: [R] Counting the occurence of each unique charecter string
On 11/27/05, Ted Harding [EMAIL PROTECTED] wrote:

> On 27-Nov-05 Marco Visser wrote:
>
>> LS,
>>
>> I would really like to know how to count the frequency/occurrence of character strings inside a dataset. I am working with extremely large datasets of forest inventory data with a large variety of different species inside them. Each row of the dataframe represents one individual tree, and the simplified dataframe looks something like this:
>>
>>   num species dbh
>>   1   sp1     30
>>   2   sp1     20
>>   3   sp2     30
>>   4   sp1     40
>>
>> I need to be able to count the number of individuals per species, so I need a command that will return, for each unique species, its number of occurrences in the dataframe:
>>
>>   [sp1] 3
>>   [sp2] 1
>
> Does the following help? (Using an artificial example a bit more complicated than yours.) The dataframe "trees" consists of a list of species names under "Species", and values of a numeric variable under "X".
>
>   trees
>               Species   X
>   1     Larix decidua 203
>   2  Pinus sylvestris 303
>   3     Larix decidua 202
>   4  Pinus sylvestris 301
>   5       Picea abies 102
>   6       Picea abies 103
>   7  Pinus sylvestris 302
>   8       Picea abies 101
>   9     Larix decidua 201
>   10      Picea abies 104
>   11      Picea abies 105
>   12 Pinus sylvestris 304
>
>   freqs <- as.data.frame(table(trees$Species))
>   colnames(freqs) <- c("Species", "Counts")
>   freqs
>              Species Counts
>   1    Larix decidua      3
>   2      Picea abies      5
>   3 Pinus sylvestris      4
>
>   mean(freqs$Counts)
>   [1] 4
>   sd(freqs$Counts)
>   [1] 1
>
> Just using table() would give you the same information, but converting it to a dataframe makes that information more readily accessible by familiar methods.
>
> Hoping this helps,
> Ted.

or, using the iris dataset that comes with R and making use of as.data.frame.table, we can shorten that slightly to just:

  as.data.frame.table(table(Species = iris$Species), responseName = "Count")

Incidentally, I just noticed that there is an inconsistency between as.data.frame and as.data.frame.table that makes it impossible to shorten as.data.frame.table to as.data.frame in the above, because the responseName= argument is not referenced in the generic.

  args(as.data.frame)
  function (x, row.names = NULL, optional = FALSE)
  NULL
  args(as.data.frame.table)
  function (x, row.names = NULL, optional = FALSE, responseName = "Freq")
  NULL
  R.version.string # Windows
  [1] "R version 2.2.0, 2005-10-24"
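The one-liner above can be run as follows, as a small sketch using the built-in iris data. The second call goes through the generic, on the assumption that a current R release is in use: there the generic has a `...` argument, so responseName is passed on to the table method, which was not the case in the R 2.2.0 signature shown above. The object names counts and via_generic are illustrative.

```r
# Count rows per species and name the count column directly.
counts <- as.data.frame.table(table(Species = iris$Species),
                              responseName = "Count")
print(counts)

# In current R the generic forwards extra arguments to the method,
# so the shorter spelling gives the same result.
via_generic <- as.data.frame(table(Species = iris$Species),
                             responseName = "Count")
identical(counts, via_generic)
```

Naming the dimension inside table() (Species = ...) is what gives the first column its name, so no separate colnames() step is needed.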
Re: [R] R-help Digest, Vol 33, Issue 27
From: Duncan Murdoch [EMAIL PROTECTED]

> I'd recommend using the RWinEdt package instead for a different way to integrate winedit with R.

winedit and winedt are two different editors, last I checked.

best,
-tony

[EMAIL PROTECTED]
Muttenz, Switzerland.
"Commit early, commit often, and commit in a repository from which we can easily roll-back your mistakes" (AJR, 4Jan05).
[R] fixed, random effects with variable weights
Hi everyone,

I have tried to solve a simple problem for days but I can't figure out how to run it properly. If someone could give me a hint, this would be really great.

Basically, I want to run a standard economist's fixed- and random-effects regression (corresponding to xtreg in Stata) but with _variable_ weights (they correspond to changing industry shares in the market).

Here is what I do:

  regsc <- lme(dsc ~ dcomp + dperc, random = ~1 | ind7090)
  update(regsc, weights = varFixed(~wt))

1. However, my results are different from what I obtain in Stata using areg (the weighted fixed-effects time series regression). Any ideas?

2. How do I read off the random-effects results from this regression (i.e. the coefficients on dcomp and dperc)?

Any hint would be greatly appreciated.

Best,
-Raphael
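A possible point of comparison for question 1, offered as a sketch rather than a diagnosis: a fixed-effects fit that absorbs the industry factor can be written in base R as lm() with the factor included as dummies and the weights passed to weights=. lm() treats weights as inversely proportional to the error variance, whereas varFixed(~wt) in nlme declares the variance to be proportional to wt, so the two calls weight in opposite directions unless the covariate is inverted (for example varFixed(~I(1/wt))). The variable names follow the post; the data are simulated and the true coefficients (0.5 and -0.3) are made up.

```r
set.seed(42)

# Simulated panel: 20 industries observed 10 times each (assumed sizes).
n_ind <- 20
n_t   <- 10
d <- data.frame(
  ind7090 = factor(rep(seq_len(n_ind), each = n_t)),
  dcomp   = rnorm(n_ind * n_t),
  dperc   = rnorm(n_ind * n_t),
  wt      = runif(n_ind * n_t, 0.5, 2)
)

# Industry-specific intercepts plus noise whose variance is proportional to 1/wt.
alpha <- rnorm(n_ind)
d$dsc <- alpha[as.integer(d$ind7090)] + 0.5 * d$dcomp - 0.3 * d$dperc +
  rnorm(nrow(d), sd = 1 / sqrt(d$wt))

# Weighted fixed-effects regression: industry dummies absorb the intercepts.
fe <- lm(dsc ~ dcomp + dperc + ind7090, data = d, weights = wt)
coef(fe)[c("dcomp", "dperc")]
```

For question 2, the nlme extractors fixef() and ranef() return the fixed-effect coefficients (here those on dcomp and dperc) and the estimated group effects of an lme fit, and summary() prints both together with the variance components.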
[R] 'For each file in folder F do....'
Hello,

I have 2700 text files in a folder and need to apply the same program/procedure to each individually. I'm trying to find how to code something like:

  For each file in Folder do {Procedure}

Is there an easy way to do this? Other suggestions?

I have tried to list all the file names in a vector, e.g.

  listfiles[1:10,1]
  1  H:/Rtest/AXP.txt
  2  H:/Rtest/BA.txt
  3  H:/Rtest/C.txt
  4  H:/Rtest/CAT.txt
  5  H:/Rtest/DD.txt
  6  H:/Rtest/DIS.txt
  7  H:/Rtest/EK.txt
  8  H:/Rtest/GE.txt
  9  H:/Rtest/GM.txt
  10 H:/Rtest/HD.txt

but R doesn't like statements of the type

  read.table(file=listfiles[1,1])

since 'file' must be a character string or connection...

Any thoughts?

Many thanks in advance,
Ron Piccinini.
Re: [R] 'For each file in folder F do....'
Ron Piccinini wrote:

> Hello,
>
> I have 2700 text files in a folder and need to apply the same program/procedure to each individually. I'm trying to find how to code something like:
>
>   For each file in Folder do {Procedure}
>
> Is there an easy way to do this? Other suggestions?

  files <- list.files()
  results <- lapply(files, yourprocessing)

where yourprocessing is a function taking a file name as its argument and returning whatever you want.

Kjetil

> I have tried to list all the file names in a vector, e.g.
>
>   listfiles[1:10,1]
>   1  H:/Rtest/AXP.txt
>   2  H:/Rtest/BA.txt
>   3  H:/Rtest/C.txt
>   4  H:/Rtest/CAT.txt
>   5  H:/Rtest/DD.txt
>   6  H:/Rtest/DIS.txt
>   7  H:/Rtest/EK.txt
>   8  H:/Rtest/GE.txt
>   9  H:/Rtest/GM.txt
>   10 H:/Rtest/HD.txt
>
> but R doesn't like statements of the type
>
>   read.table(file=listfiles[1,1])
>
> since 'file' must be a character string or connection...
>
> Any thoughts?
>
> Many thanks in advance,
> Ron Piccinini.
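The lapply() pattern above can be tried end to end with a few throwaway files. In this sketch the folder, the file contents, and the function process_file are all made up for illustration; in the original problem the folder would be H:/Rtest and the function would hold the real procedure.

```r
# Create a scratch folder with three small text files (stand-ins for the real data).
folder <- file.path(tempdir(), "Rtest")
dir.create(folder, showWarnings = FALSE)
for (nm in c("AXP", "BA", "C")) {
  write.table(data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)),
              file = file.path(folder, paste0(nm, ".txt")),
              row.names = FALSE)
}

# list.files() returns a character vector; full.names = TRUE keeps the folder
# in each path, and pattern restricts the listing to .txt files.
files <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)

# Hypothetical procedure: read one file and return its column means.
process_file <- function(path) {
  dat <- read.table(path, header = TRUE)
  colMeans(dat)
}

# Apply the procedure to every file; pass the function itself, without parentheses.
results <- lapply(files, process_file)
names(results) <- basename(files)
results[["BA.txt"]]
```

Because list.files() already yields character strings, the "'file' must be a character string or connection" error from the question does not arise with this approach.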
Re: [R] 'For each file in folder F do....'
On 11/27/2005 3:51 PM, Ron Piccinini wrote:

> Hello,
>
> I have 2700 text files in a folder and need to apply the same program/procedure to each individually. I'm trying to find how to code something like:
>
>   For each file in Folder do {Procedure}
>
> Is there an easy way to do this? Other suggestions?
>
> I have tried to list all the file names in a vector, e.g.
>
>   listfiles[1:10,1]
>   1  H:/Rtest/AXP.txt
>   2  H:/Rtest/BA.txt
>   3  H:/Rtest/C.txt
>   4  H:/Rtest/CAT.txt
>   5  H:/Rtest/DD.txt
>   6  H:/Rtest/DIS.txt
>   7  H:/Rtest/EK.txt
>   8  H:/Rtest/GE.txt
>   9  H:/Rtest/GM.txt
>   10 H:/Rtest/HD.txt
>
> but R doesn't like statements of the type
>
>   read.table(file=listfiles[1,1])
>
> since 'file' must be a character string or connection...
>
> Any thoughts?

From the look of it, the listfiles column that you created has been converted to a factor. You can convert it back to character using as.character(); the as.is=TRUE parameter in the file reading functions will prevent the conversion in the first place, if that's how it happened.

Then something like

  results <- list()
  for (f in as.character(listfiles[,1]))
      results[[f]] <- read.table(file=f)

will read all the files and put them in a list.

Duncan Murdoch
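The factor explanation can be checked with a small sketch. The file here is a temporary one created only for the demonstration, and the data frame listfiles is rebuilt by hand with an explicit factor column to mimic what the question describes.

```r
# A throwaway data file standing in for one of the 2700 text files.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(a = 1:2), file = tmp, row.names = FALSE)

# A one-column data frame of paths, stored as a factor as in the question.
listfiles <- data.frame(path = factor(tmp))
is.character(listfiles[1, 1])  # FALSE: the entry is a factor, not a string

# Converting to character gives read.table() the string it expects.
dat <- read.table(file = as.character(listfiles[1, 1]), header = TRUE)

# The same conversion inside a loop over all listed files.
results <- list()
for (f in as.character(listfiles[, 1])) {
  results[[f]] <- read.table(file = f, header = TRUE)
}
names(results)
```

Each list element is named after its path, so a particular file's data can be retrieved with results[[path]].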
[R] creating a factor from other factors and ifelse
Hi,

Given

  sec98 <- factor(rep(1:2,3), labels=c("A", "B"))
  sec99 <- factor(rep(2:1,3), labels=c("A", "B"))
  sec99[c(2,5)] <- NA
  sec00 <- factor( c( rep(1,3), rep(2,3) ), labels=c("A", "B"))
  sec00[c(2,4)] <- NA
  sec1 <- ifelse(!is.na(sec99), sec99, ifelse(!is.na(sec00), sec00, NA ))

we get

  sec1; class(sec1)
  [1]  2 NA  2  1  2  1
  [1] "integer"

I wonder why sec1 as defined above is not a factor, since it has been created from (logical operations and) factors.

Of course, one could do

  sec1 <- factor(sec1, labels=levels(sec99))

but this would be a problem if I had (as I actually do) sec99 and sec00 instead defined as

  sec99 <- factor(c(1,2,3,2,3,3), labels=c("A", "B", "C"))
  sec99[c(2,5)] <- NA
  sec00 <- factor(c(4,1,1,2,4,2), labels=c("A", "B", "D"))
  sec00[c(2,4)] <- NA

because

  sec1 <- ifelse(!is.na(sec99), sec99, ifelse(!is.na(sec00), sec00, NA ))

gives us

  sec1; class(sec1)
  [1]  1 NA  3  2  3  3
  [1] "integer"

and now it's hard to tell whether each 3 in sec1 means C or D.

What I actually wanted was

  sec1; class(sec1)
  [1] A  NA C  B  D  C
  [1] "factor"

Any suggestions on how to do it in a simple way will be welcome.

Thanks,
Dimitri
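One simple way to get the wanted result, sketched with the second pair of factors from the post: ifelse() does not carry over the factor attributes of its yes and no arguments, so only the underlying integer codes survive, and those codes mean different things in sec99 and sec00. Converting both factors to character before combining, and rebuilding the factor afterwards, sidesteps this. The intermediate name lab and the choice of level order (the union of both level sets) are assumptions made for this example.

```r
sec99 <- factor(c(1, 2, 3, 2, 3, 3), labels = c("A", "B", "C"))
sec99[c(2, 5)] <- NA
sec00 <- factor(c(4, 1, 1, 2, 4, 2), labels = c("A", "B", "D"))
sec00[c(2, 4)] <- NA

# Combine the labels, not the integer codes: take sec99 where available,
# otherwise fall back to sec00 (which may itself be NA).
lab <- ifelse(!is.na(sec99), as.character(sec99), as.character(sec00))

# Rebuild a factor whose levels cover both inputs.
sec1 <- factor(lab, levels = union(levels(sec99), levels(sec00)))

sec1         # A <NA> C B D C, with levels A B C D
class(sec1)  # "factor"
```

The fallback to sec00 replaces the nested ifelse() in the post, since a missing value in sec00 simply stays NA in the result.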