[R] gsub syntax

2005-11-27 Thread John Logsdon
Hello

I know that R's string functions are not as extensive as those of Unix but
I need to do some text handling totally within an R environment because
the target is a Windows system which will not have the corresponding shell
utilities, sed, awk etc.

Can anyone explain the following gsub phenomenon to me:

> dates<-c("73","74","02","1973","1974","2002")

I want to take just the last two digits where it is a 4-digit year and
both digits when it is a 2-digit year.  I should be able to use substr but
measurement from the string end (with a negative counter or something) is
not implemented:

> substr(dates,3,4)
[1] ""   ""   ""   "73" "74" "02"
> substr(dates,-2,4)
[1] "73"   "74"   "02"   "1973" "1974" "2002"
> substr(dates,4,-2)
[1] "" "" "" "" "" ""

So I tried gsub:

> gsub("[19|20]([0-9][0-9])","\\1",dates)
[1] "73"  "74"  "02"  "973" "974" "002"

As I understand it (and comparing with sed), the \\1 should take the first
bracketed string but clearly this doesn't work.  If I try what should also
work:

> gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
[1] "73"  "74"  "02"  "973" "974" "002"

On the other hand the following does work:

> gsub("[19|20]([0-9])([0-9])","\\2",dates)
[1] "73" "74" "02" "73" "74" "02"

So it appears that the substitution takes one character extra to the left
but the following indicates that the lower limit of the selected range is
also at fault:

> s<-c("1","12","123","1234","12345","123456")
> gsub("[12]([4-6]*)","",s)
[1] ""     ""     "3"    "34"   "345"  "3456"

Probably more elegant examples could be constructed that could home in on
the issue.

The version is R 2.0.1 on Linux so perhaps it is a little old now.

Questions:

1) Am I misunderstanding the gsub use?

2) Was it a bug that has since been corrected?

3) Is it still a bug in the latest version?

TIA

John

John Logsdon   Try to make things as simple
Quantex Research Ltd, Manchester UK as possible but not simpler
[EMAIL PROTECTED]  [EMAIL PROTECTED]
+44(0)161 445 4951/G:+44(0)7717758675   www.quantex-research.com

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] gsub syntax

2005-11-27 Thread Dimitris Rizopoulos
you could use something like:

dates <- c("73", "74", "02", "1973", "1974", "2002")
###
nd <- nchar(dates)
substr(dates, ifelse(nd == 2, 1, 3), nd)


I hope it helps.

Best,
Dimitris


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
 http://www.student.kuleuven.be/~m0390867/dimitris.htm
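
The substr()/nchar() idea in this reply can be generalised into a small helper. The following is only a sketch: the name last_chars and its argument n are made up for illustration, and it relies on substr() treating a start position below 1 as 1, which is the behaviour visible in the original post's substr(dates,-2,4) output.

```r
# Hypothetical helper (not part of base R): keep the last n characters
# of each element of a character vector.
last_chars <- function(x, n = 2) {
  x <- as.character(x)
  len <- nchar(x)
  # A start below 1 is treated as 1, so strings shorter than n
  # are returned unchanged.
  substr(x, len - n + 1, len)
}

dates <- c("73", "74", "02", "1973", "1974", "2002")
last_chars(dates)
```

Because both the start and the stop are computed per element, no ifelse() on the string length is needed.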


- Original Message - 
From: John Logsdon [EMAIL PROTECTED]
To: r-help@stat.math.ethz.ch
Sent: Sunday, November 27, 2005 11:04 AM
Subject: [R] gsub syntax




Re: [R] gsub syntax

2005-11-27 Thread Sundar Dorai-Raj



Hi, John,

I cannot comment on your questions since I'm no regexpr guru. However, 
it seems to me you can do the following instead:

gsub(".*([0-9][0-9])", "\\1", dates)

This works fine on Linux & Windows, R-2.2.0.

HTH,

--sundar



[R] the output of coxph

2005-11-27 Thread Zheng Zhao
Dear All:

I have some questions about the output of coxph.

Below is the input and output:


> coxph(formula = Surv(futime, fustat) ~ age + rx + ecog.ps, data =
+  ovarian, x = TRUE)

Call:
coxph(formula = Surv(futime, fustat) ~ age + rx + ecog.ps, data =
ovarian, x = TRUE)


          coef exp(coef) se(coef)     z      p
age      0.147     1.158   0.0463  3.17 0.0015
rx      -0.815     0.443   0.6342 -1.28 0.2000
ecog.ps  0.103     1.109   0.6064  0.17 0.8600

Likelihood ratio test=15.9  on 3 df, p=0.00118  n= 26
---
Question One:
As far as I know, the p-value for age is its significance level. However, what
is the exact meaning of the parameter, and how is it calculated? If the sample
size is small (20~40), is this estimate still reliable?

Question Two:
The p-value in the last line (Likelihood ratio test=15.9 on 3 df,
p=0.00118) belongs to one of the asymptotically equivalent tests of the
omnibus null hypothesis that all of the β's are zero, according to John Fox's
"Cox Proportional-Hazards Regression for Survival Data"
(http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf).
Can anybody explain why this is true? (As far as I know, the p-value is
obtained as 1 - pchisq(2*log(likelihood ratio), df), because
2*log(likelihood ratio) is approximately chi-square for nested models.)

Thank you very much.

Sincerely,
Alan
2005-11-27
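
The numbers in the printed output above can be checked by hand in base R. This is a sketch that only uses the rounded values shown in the post, so it agrees with the printed p-values to about two significant digits; it assumes the z column is coef/se(coef) with a two-sided normal p-value, and that the printed likelihood ratio statistic is already on the 2*log scale.

```r
# Rounded values copied from the coxph printout in the post.
coef_age <- 0.147
se_age   <- 0.0463

# Hazard ratio per one-unit increase in age (the exp(coef) column).
hr_age <- exp(coef_age)

# Wald test for age: z = coef / se, two-sided normal p-value.
z_age <- coef_age / se_age
p_age <- 2 * pnorm(-abs(z_age))

# Likelihood ratio test: the statistic is compared with a chi-square
# distribution whose df equals the number of coefficients (age, rx, ecog.ps).
lrt_stat <- 15.9
lrt_df   <- 3
p_lrt    <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

round(c(hr_age = hr_age, z_age = z_age, p_age = p_age, p_lrt = p_lrt), 5)
```

Using lower.tail = FALSE is equivalent to 1 - pchisq(...) but avoids loss of precision for very small p-values.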



Re: [R] Using an editor with R

2005-11-27 Thread Uwe Ligges
Duncan Murdoch wrote:

> On 11/26/2005 4:53 PM, Walter R. Paczkowski wrote:
>
>> Hello,
>>
>> I changed the setting in options$editor to allow me to use my
>> favorite editor.  In R 2.1.1 on Windows XP, I entered at the
>> command line:
>>
>> options(editor="c:\\program files\\winedit\\winedit.exe")
>>
>> When I edited a function, say test, using fix(test), the editor
>> opened perfectly.  But, when I saved the file and closed the
>> editor, the R gui screen was white, blank, and completely
>> unresponsive.  The only thing I could do was close R by clicking on
>> the X in the upper right corner of the window.  How can I use my
>> editor but be able to continue using R after I close the editor?
>> What extra setting am I missing?


1. Probably you do not want to use fix() (or only under very rare
circumstances), but use the code in your editor and source the file into
R, so you do not need to close the editor.

2. Are you talking about "winedit" or the editor "WinEdt" (just one "i"
in it ...)?


> This sounds like you didn't really close your editor.  R isn't smart
> enough to know that the editor closed a file, it can only see when
> the process finishes.
>
> I'd recommend using the RWinEdt package instead for a different way
> to integrate winedit with R.

Well, at least to integrate WinEdt. ;-)

Best,
Uwe Ligges
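
The edit-and-source workflow suggested in point 1 might look roughly like this. It is only a sketch: the file is written from R here so that the example is self-contained, whereas in practice the file would be kept open and saved in the external editor; the function name test comes from the question, and its body is invented.

```r
# Stand-in for a file that would normally be edited in the external editor.
script <- tempfile(fileext = ".R")
writeLines(c(
  "test <- function(x) {",
  "  x + 1",
  "}"
), script)

# After each save in the editor, re-read the definitions into R.
source(script)
test(1)
```

With this approach R never waits for the editor process to finish, so the editor can stay open between changes.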



> Duncan Murdoch



Re: [R] r question

2005-11-27 Thread Uwe Ligges
Please,

1. read the posting guide
2. use a sensible subject line
3. this is NOT an r question
4. ask your teacher to explain your homeworks, but not this list

Uwe Ligges



yuying shi wrote:

> If there are two random variables X1 and X2 which have
> a bivariate normal distribution with mean vector (10,
> 10) and variance covariance matrix
> [2    1.95
>  1.95 3   ]
>
> How to calculate the mean and variance of the function
> Y=X1/X2?
>
> Thanks a lot!
> xingyu



Re: [R] coherency-Time Series

2005-11-27 Thread Uwe Ligges
[EMAIL PROTECTED] wrote:

> hello!
> My name is Stefanos, from Athens.
> I'm a new user of R and I'm studying multivariate time series. I can't find
> in the help menu how to calculate the cross spectrum and the coherency of 2
> Time Series. Would you like to help me?

See ?spectrum and ?cor

Uwe Ligges


> Thanks



Re: [R] coherency-Time Series

2005-11-27 Thread Kjetil Brinchmann Halvorsen
[EMAIL PROTECTED] wrote:
> hello!
> My name is Stefanos, from Athens.
> I'm a new user of R and I'm studying multivariate time series. I can't find
> in the help menu how to calculate the cross spectrum and the coherency of 2
> Time Series. Would you like to help me?
> Thanks

See ?spectrum, and especially the component $coh of the output.

Kjetil
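
A minimal sketch of the $coh component mentioned above, using simulated data (the two series, the seed and the smoothing spans are all made up for illustration). Some smoothing is needed: with a raw, unsmoothed periodogram the estimated squared coherency is identically 1.

```r
set.seed(1)
n <- 256

# Two related series: y is a noisy multiple of x.
x <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y <- 0.8 * x + rnorm(n)
xy <- ts(cbind(x, y))

# Smoothed periodogram of the bivariate series; plot = FALSE returns the
# estimate without drawing it.
sp <- spectrum(xy, spans = c(7, 7), plot = FALSE)

head(cbind(freq = sp$freq, coh = sp$coh[, 1], phase = sp$phase[, 1]))
```

For two series, sp$coh and sp$phase each have one column (squared coherency and phase of the cross spectrum); plot(sp, plot.type = "coherency") draws the former.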



Re: [R] Newton iteration questions

2005-11-27 Thread Uwe Ligges
Yet another time we shall solve your homeworks?
Please stop sending your homework questions to R-help!

Uwe Ligges


yuying shi wrote:

> Dear Sir/Madam,
>  If I have a sample of observations that come from
> an extreme value distribution, the density function
> for the extreme value distribution is:
>
> f(x)=(1/b)exp[-(x-a)/b]exp{-exp[-(x-a)/b]}, b>0, x can
> be any value,
>
> my question is how to implement the Newton iteration
> and estimate the parameters for this distribution and
> the accuracy of epsilon=0.0001?
>
>  The n= 100 observations are given as follows:
> x<- c(8.8, 9.4, 8.7, 9.3, 9.6, 9.4, 9.1, 9.4, 8.4,
> 6.8, 8.4,
> ?.2, 9.4, 7.4, 8.7, 9.4, 9.2, 9.3, 8.0, 8.5, 8.7, 9.7,
> 9.8,
> ?.5, 7.1, 7.8, 9.0, 8.6, 9.4, 6.9, 9.1, 9.9, 7.3, 8.5,
> 8.8,
> ?.4, 9.0, 8.6, 8.5, 9.2, 9.7, 9.2, 9.2, 8.4, 8.7, 9.6,
> 9.2,
> ?.8, 8.5, 9.0, 8.9, 9.6, 8.0, 9.7, 8.4, 7.5, 9.1, 9.2,
> 8.9,
> ?.2, 9.8, 9.4, 8.5, 9.3, 9.8, 9.6, 9.7, 8.9, 9.7, 8.7,
> 8.6,
> ?.7, 8.6, 9.7, 7.7, 8.6, 9.7, 8.5, 9.4, 9.4, 9.7, 8.1,
> 9.5,
> ?.3, 8.0, 9.8, 8.9, 9.5, 9.0, 8.7, 9.1, 8.5, 8.7, 8.4,
> 9.3,
> ?.5, 8.9, 9.3, 9.0, 9.9)?
>
> thanks in advance!
> xingyu



Re: [R] rescale x-axis

2005-11-27 Thread Uwe Ligges
singyee ling wrote:

> Dear all,
>
> I am trying to draw a survival curve with probability of surviving as
> the y-axis and days (0-500 days) as the x-axis. However, I do not
> want the days to be equally spaced on the x-axis as I am more
> interested in looking at the behaviour of the curve in the first 50
> days. I am reluctant to use xlim=c(0,1000) as I want to see the whole
> picture. Hence, what I am interested in is a scale in which the days
> are not equally spaced. By that, I mean the length of the interval
> between the days gets smaller and smaller, which gives greater emphasis
> to the initial period (i.e. the length of the interval between 0-1 days
> is longer than the interval between 1-2 days and so on). I hope what I
> say above makes sense. Any advice?


What about applying a logarithm such as in

plot(1:10, log="x")

Uwe Ligges
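
Applied to a survival-type curve, the log-axis suggestion could be sketched as follows. The data are invented for illustration (an exponential curve with an arbitrary rate); note that a log axis cannot display day 0, so the times here start at day 1.

```r
# Made-up follow-up times and survival probabilities.
days <- c(1, 2, 5, 10, 20, 50, 100, 200, 500)
surv <- exp(-days / 150)

# log = "x" stretches the early days and compresses the late ones.
plot(days, surv, type = "s", log = "x", ylim = c(0, 1),
     xlab = "Days (log scale)", ylab = "Probability of surviving")
```

If day 0 has to appear on the axis, one option is to plot against a transformed time such as sqrt(days) and label the axis ticks by hand with axis().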


> thanks!
>
> sing yee



Re: [R] creating a factor from other factors and ifelse

2005-11-27 Thread Peter Dalgaard
[EMAIL PROTECTED] writes:

> Hi,
>
> Given

nevermind...

Five identical messages in five minutes and five seconds! Perhaps a
little more patience next time?


-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] r question

2005-11-27 Thread Peter Dalgaard
Uwe Ligges [EMAIL PROTECTED] writes:

> Please,
>
> 1. read the posting guide
> 2. use a sensible subject line
> 3. this is NOT an r question
> 4. ask your teacher to explain your homeworks, but not this list
>
> Uwe Ligges

And, btw, neither the mean nor the variance exists, so the question is
incomplete, and any answer approximate.

> yuying shi wrote:
>
> > If there are two random variables X1 and X2 which have
> > a bivariate normal distribution with mean vector (10,
> > 10) and variance covariance matrix
> > [2    1.95
> >  1.95 3   ]
> >
> > How to calculate the mean and variance of the function
> > Y=X1/X2?
> >
> > Thanks a lot!
> > xingyu
 

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907



Re: [R] IRT Package

2005-11-27 Thread Doran, Harold
I do not believe another IRT package exists. However, I have recently used the 
rasch() function in ltm for a study I am doing and have found it very useful. 
I'm curious (as I'm sure the ltm developer is) as to what you are doing that
ltm cannot handle.

Harold  


-Original Message-
From:   [EMAIL PROTECTED] on behalf of Caio Lucidius Naberezny Azevedo
Sent:   Sat 11/26/2005 4:50 PM
To: r-help@stat.math.ethz.ch
Cc: 
Subject:[R] IRT Package

Hi all,
   
  Could anyone tell me if there is some package that fits any Item Response
Model (besides the ltm package)?
   
  Regards,
  
Caio




Re: [R] gsub syntax

2005-11-27 Thread Gabor Grothendieck
On 11/27/05, John Logsdon [EMAIL PROTECTED] wrote:
> Hello
>
> I know that R's string functions are not as extensive as those of Unix but

I don't think this statement is true although I have seen it repeated.

> I need to do some text handling totally within an R environment because
> the target is a Windows system which will not have the corresponding shell
> utilities, sed, awk etc.

Free versions of these utilities are available for Windows although they
don't come with Windows, e.g. Google for gawk.


> Can anyone explain the following gsub phenomenon to me:
>
> > dates<-c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and
> both digits when it is a 2-digit year.  I should be able to use substr but
> measurement from the string end (with a negative counter or something) is
> not implemented:
>
> > substr(dates,3,4)
> [1] ""   ""   ""   "73" "74" "02"
> > substr(dates,-2,4)
> [1] "73"   "74"   "02"   "1973" "1974" "2002"
> > substr(dates,4,-2)
> [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
> > gsub("[19|20]([0-9][0-9])","\\1",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first
> bracketed string but clearly this doesn't work.  If I try what should also
> work:
>
> > gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
> > gsub("[19|20]([0-9])([0-9])","\\2",dates)
> [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left
> but the following indicates that the lower limit of the selected range is
> also at fault:
>
> > s<-c("1","12","123","1234","12345","123456")
> > gsub("[12]([4-6]*)","",s)
> [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on
> the issue.
>
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?
>
> 2) Was it a bug that has since been corrected?
>
> 3) Is it still a bug in the latest version?


It works the same on my system, which is R 2.2.0 patched (2005-10-24)
on Windows. At first I too thought it was a bug, but I noticed it
works the same in perl, so now I am not sure. The following perl
program, run with perl 5.8.6 on Windows, gives 002 as the answer too:

   $_ = 2002;
   s/[19|20]([0-9])([0-9])/\1\2/g;
   print;

In any case, it could be done like this:

   sub(".*(..)$", "\\1", dates)

or

   substring(dates, nchar(dates)-1)

or the following, which appends -01-01 to the year, converts it to Date
class, implicitly converts it back to character and then extracts
the 3rd to 4th character of the result:

   substring(as.Date(sprintf("%s-01-01", dates)), 3, 4)

or



Re: [R] gsub syntax

2005-11-27 Thread Prof Brian Ripley
R is blameless here: it works as documented and in the same way as 
POSIX tools.  It agrees with 'sed' using the same syntax (modulo the 
shell-specific quoting rules) e.g. in csh

% echo 1973 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
973
% echo 1973 | sed 's/\([19|20]\)\([0-9][0-9]\)/-\1-\2-/g'
-1-97-3
% echo 73 74 02 1973 1974 2002 | sed 's/[19|20]\([0-9][0-9]\)/\1/g'
73 74 02 973 974 002

so what happened when you were 'comparing with sed'?

[19|20] is a character class (containing five characters) matching one 
character, not a match for two characters as you seem to imagine.  It does 
not mean the same as 19|20, which is what you seem to have intended (and 
you seem only to want to do the substitution once on each string, so why 
use gsub?):

> sub("19|20([0-9][0-9])", "\\1", dates)
[1] "73" "74" "02" "73" "74" "02"

A more direct way which would work e.g. for 1837 would be

sub(".*([0-9]{2}$)", "\\1", dates)

or even better (locale-independent)

sub(".*([[:digit:]]{2}$)", "\\1", dates)

Current versions of R have a help page ?regexp explaining what regexps
are.  Even 2.0.1 did, although you were asked to update *before* posting
(see the posting guide).  It was unambiguous:

     A _character class_ is a list of characters enclosed by '[' and
     ']' matches any single character in that list ...
                     ^^^^^^
     ...  Note that alternation does not work inside character classes,
     where \code{|} has its literal meaning.


On Sun, 27 Nov 2005, John Logsdon wrote:

> Hello
>
> I know that R's string functions are not as extensive as those of Unix but
> I need to do some text handling totally within an R environment because
> the target is a Windows system which will not have the corresponding shell
> utilities, sed, awk etc.
> Can anyone explain the following gsub phenomenon to me:
>
> > dates<-c("73","74","02","1973","1974","2002")
>
> I want to take just the last two digits where it is a 4-digit year and
> both digits when it is a 2-digit year.  I should be able to use substr but
> measurement from the string end (with a negative counter or something) is
> not implemented:

Why 'should' it work in a different way to that documented?

> > substr(dates,3,4)
> [1] ""   ""   ""   "73" "74" "02"
> > substr(dates,-2,4)
> [1] "73"   "74"   "02"   "1973" "1974" "2002"
> > substr(dates,4,-2)
> [1] "" "" "" "" "" ""
>
> So I tried gsub:
>
> > gsub("[19|20]([0-9][0-9])","\\1",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> As I understand it (and comparing with sed), the \\1 should take the first
> bracketed string but clearly this doesn't work.
> If I try what should also work:
>
> > gsub("[19|20]([0-9])([0-9])","\\1\\2",dates)
> [1] "73"  "74"  "02"  "973" "974" "002"
>
> On the other hand the following does work:
>
> > gsub("[19|20]([0-9])([0-9])","\\2",dates)
> [1] "73" "74" "02" "73" "74" "02"
>
> So it appears that the substitution takes one character extra to the left
> but the following indicates that the lower limit of the selected range is
> also at fault:
> > s<-c("1","12","123","1234","12345","123456")
> > gsub("[12]([4-6]*)","",s)
> [1] ""     ""     "3"    "34"   "345"  "3456"
>
> Probably more elegant examples could be constructed that could home in on
> the issue.
> The version is R 2.0.1 on Linux so perhaps it is a little old now.
>
> Questions:
>
> 1) Am I misunderstanding the gsub use?

Yes.

> 2) Was it a bug that has since been corrected?

Unfortunately the bug reported two years ago in

> library(fortunes); fortune("WTFM")

still seems extant.  See the posting guide for advice on how to correct
it.


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595
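
The difference between the bracket expression and the alternation described in this reply can be seen side by side in a short sketch. The anchored pattern in the second call is an illustrative variant, not one of the patterns posted in the thread.

```r
dates <- c("73", "74", "02", "1973", "1974", "2002")

# "[19|20]" is a bracket expression: it matches ONE character that is
# any of 1, 9, |, 2 or 0, so only the first digit of a year is consumed.
one_char <- sub("[19|20]([0-9][0-9])", "\\1", dates)

# "(19|20)" is a grouped alternation: it matches the two-character
# prefix 19 or 20.  Anchoring with ^ and $ restricts it to 4-digit years.
two_char <- sub("^(19|20)([0-9]{2})$", "\\2", dates)

# The literal "|" really is a member of the bracket expression.
pipe_matches <- grepl("[19|20]", "a|b")

one_char
two_char
pipe_matches
```

Two-digit inputs such as "02" are left alone by the anchored pattern because the whole string has to be a 19xx or 20xx year before anything is replaced.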



[R] multilevel models and sample size

2005-11-27 Thread ronggui
It is not a pure R question, but I hope someone can give me advice.

I want to analyse my data with a multilevel model. The data have 2
levels: the second level has 52 units, and each second-level unit has 19-23
units. I think the sample size is quite small, but just now I can't make the
sample size much bigger. So I want to ask: if I use a multilevel model to
analyse the data set, will it be acceptable, or unacceptable because of the
small sample size?

Thank you very much!

ronggui 

2005-11-28

--
Department of Sociology
Fudan University

My new mail address is [EMAIL PROTECTED]
Blog:http://sociology.yculblog.com


[R] Counting the occurence of each unique charecter string

2005-11-27 Thread Marco Visser
LS,

  I would really like to know how to count the frequency/occurrence of
character strings inside a dataset. I am working with extremely large datasets of
forest inventory data with a large variety of different species inside them.
  Each row inside the data frame represents one individual tree, and the
simplified data frame looks something like this:

  num species dbh
  1   sp1     30
  2   sp1     20
  3   sp2     30
  4   sp1     40

  I need to be able to count the number of individuals per species, so I need
a command that will return, for each unique species, its number of occurrences
inside the data frame:

  [sp1] 3
  [sp2] 1

  After a long search through help.search() and the web I found very little,
and any alternative like exporting the dataset to another program (Excel) is
not really an option because the dataset is far too large.
  
  I am using R 2.2.0 in Windows and if anyone knows a solution please help!
  
  Many sincere thanks in advance,
  
  Marco 
  
  
  



Re: [R] Counting the occurence of each unique charecter string

2005-11-27 Thread ronggui
use table() to get what you want.
see ?table


Re: [R] Counting the occurence of each unique charecter string

2005-11-27 Thread Chuck Cleland
?table

table(mydata$species)
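
A small sketch of this on the example data from the question; the data frame name trees is made up here, and the four rows are the ones shown in the original message.

```r
# The four example rows from the question.
trees <- data.frame(
  num     = 1:4,
  species = c("sp1", "sp1", "sp2", "sp1"),
  dbh     = c(30, 20, 30, 40)
)

# Number of individual trees per species.
counts <- table(trees$species)
counts

# The same counts as a data frame, which is easier to merge or export.
count_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(count_df) <- c("species", "n")
count_df
```

Sorting the result with sort(counts, decreasing = TRUE) lists the most common species first.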


-- 
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 452-1424 (M, W, F)
fax: (917) 438-0894



Re: [R] multilevel models and sample size

2005-11-27 Thread Berton Gunter
"All models are wrong, but some are useful."  --George Box

I do not understand what you mean by "acceptable," nor "levels" nor "units."
Specifying your model would help clarify things, I think. If by "levels" you
mean number of different values of a random factor, then 2 levels is
unlikely to tell you much useful about the variability of that factor. On
the other hand, 50 values might be. Depends on the model, the data, and the
scientific objectives, none of which you have stated clearly enough for me
to understand, anyway.

-- Bert

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of ronggui
Sent: Sunday, November 27, 2005 9:34 AM
To: r-help@stat.math.ethz.ch
Subject: [R] multilevel models and sample size

It is not a pure R question, but I hope someone can give me advice.

I want to analyse my data with a multilevel model. The data has 2
levels: the second level has 52 units, and each second-level unit has
19-23 units. I think the sample size is quite small, but just now I can't make
the sample size much bigger. So I want to ask: if I use a multilevel model
to analyse the data set, will it be acceptable, or unacceptable because of
the small sample size?

Thank you very much!

ronggui 

2005-11-28

--
Department of Sociology
Fudan University

My new mail address is [EMAIL PROTECTED]
Blog:http://sociology.yculblog.com



Re: [R] Counting the occurence of each unique charecter string

2005-11-27 Thread Ted Harding
On 27-Nov-05 Marco Visser wrote:
> LS,
>
> I would really like to know how to count the frequency/occurrence of
> characters inside a dataset. I am working with extremely large datasets
> of forest inventory data with a large variety of different species
> inside it.
> Each row inside the dataframe represents one individual tree and the
> simplified dataframe looks something like this:
>
>    num species dbh
>    1   sp1     30
>    2   sp1     20
>    3   sp2     30
>    4   sp1     40
>
> I need to be able to count the number of individuals per species, so
> I need a command that will return for each unique species its
> occurrence inside the dataframe;
>
>    [sp1] 3
>    [sp2] 1

Does the following help? (Using an artificial example a bit more
complicated than yours). The dataframe "trees" consists of a list
of species names under "Species", and values of a numeric variable
under "X".


  > trees
              Species   X
  1     Larix decidua 203
  2  Pinus sylvestris 303
  3     Larix decidua 202
  4  Pinus sylvestris 301
  5       Picea abies 102
  6       Picea abies 103
  7  Pinus sylvestris 302
  8       Picea abies 101
  9     Larix decidua 201
  10      Picea abies 104
  11      Picea abies 105
  12 Pinus sylvestris 304


  > freqs<-as.data.frame(table(trees$Species))
  > colnames(freqs)<-c("Species","Counts")
  > freqs
             Species Counts
  1    Larix decidua      3
  2      Picea abies      5
  3 Pinus sylvestris      4


  > mean(freqs$Counts)
  [1] 4
  > sd(freqs$Counts)
  [1] 1


Just using table() would give you the same information, but
converting it to a dataframe makes that information more
readily accessible by familiar methods.
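
To illustrate what "familiar methods" might look like, here is a small sketch that rebuilds the example data by hand and then sorts and filters the counts. The names freqs_sorted and common are made up for this illustration.

```r
# Hypothetical stand-in for the 'trees' data frame shown above
trees <- data.frame(
  Species = c("Larix decidua", "Pinus sylvestris", "Larix decidua",
              "Pinus sylvestris", "Picea abies", "Picea abies",
              "Pinus sylvestris", "Picea abies", "Larix decidua",
              "Picea abies", "Picea abies", "Pinus sylvestris"),
  X = c(203, 303, 202, 301, 102, 103, 302, 101, 201, 104, 105, 304)
)

freqs <- as.data.frame(table(trees$Species))
colnames(freqs) <- c("Species", "Counts")

# Ordinary data-frame idioms now apply: sort by count, largest first
freqs_sorted <- freqs[order(-freqs$Counts), ]

# ... or keep only the species with at least 4 individuals
common <- subset(freqs, Counts >= 4)

print(freqs_sorted)
print(common)
```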

Hoping this helps,
Ted.



E-Mail: (Ted Harding) [EMAIL PROTECTED]
Fax-to-email: +44 (0)870 094 0861
Date: 27-Nov-05   Time: 18:27:10
-- XFMail --



Re: [R] multilevel models and sample size

2005-11-27 Thread Kjetil Brinchmann Halvorsen
ronggui wrote:
> It is not a pure R question, but I hope someone can give me advice.
>
> I want to analyse my data with a multilevel model. The data has 2
> levels: the second level has 52 units, and each second-level unit has 19-23
> units. I think the sample size is quite small, but just now I can't make the
> sample size much bigger. So I want to ask: if I use a multilevel model to
> analyse the data set, will it be acceptable, or unacceptable because of the
> small sample size?
>

This kind of question I usually try to answer by
simulation, which is very easy in R.
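
One possible shape for such a simulation, as a rough sketch only: a random-intercept model with 52 groups of 19-23 units, where the variance components are recovered with the classical one-way ANOVA (method of moments) estimator rather than a dedicated mixed-model package. The function name simulate_once and the values sigma_u = 1 and sigma_e = 2 are assumptions chosen for illustration, not anything taken from the original question.

```r
set.seed(1)

# One simulated data set and its variance-component estimates
simulate_once <- function(J = 52, sizes = 19:23, sigma_u = 1, sigma_e = 2) {
  n_j   <- sample(sizes, J, replace = TRUE)        # units per group
  group <- factor(rep(seq_len(J), times = n_j))
  u     <- rnorm(J, sd = sigma_u)                  # group-level effects
  y     <- u[as.integer(group)] + rnorm(sum(n_j), sd = sigma_e)

  ms <- anova(lm(y ~ group))[["Mean Sq"]]          # c(between, within)
  N  <- sum(n_j)
  n0 <- (N - sum(n_j^2) / N) / (J - 1)             # effective group size
  c(var_u = (ms[1] - ms[2]) / n0, var_e = ms[2])
}

# Repeat many times and look at how the estimates scatter
est <- t(replicate(200, simulate_once()))
colMeans(est)        # average estimates across simulations
apply(est, 2, sd)    # their simulation-to-simulation spread
```

Comparing the spread of the estimates with the true values (here 1 and 4) gives a feel for whether this design is informative enough for the question at hand.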

Kjetil


> Thank you very much!
>
> ronggui
 

Re: [R] multilevel models and sample size

2005-11-27 Thread Prof Brian Ripley
On Sun, 27 Nov 2005, Berton Gunter wrote:

> "All models are wrong, but some are useful."  --George Box
>
> I do not understand what you mean by "acceptable," nor "levels" nor "units."
> Specifying your model would help clarify things, I think. If by "levels" you
> mean number of different values of a random factor, then 2 levels is
> unlikely to tell you much useful about the variability of that factor. On
> the other hand, 50 values might be. Depends on the model, the data, and the
> scientific objectives, none of which you have stated clearly enough for me
> to understand, anyway.

My guess is that he means this is a nested design with e.g. 52 classes
containing 19-23 pupils each.  (It always helps to state the real 
problem!)

If so, this is quite a large problem for multilevel models.  The classical 
nested designs for measurement errors typically have two replications at 
the lowest level - you get an idea of the variability from the many 
differences between matched pairs.  Of course the homogeneity assumptions 
have to be approximately true.
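
The matched-pairs idea can be sketched numerically. This is an illustration under assumed values (200 units, between-unit SD 3, measurement-error SD 1), not a description of any particular data set: with two replicates per unit the unit effect cancels in the difference, so the differences carry information about the measurement error alone.

```r
set.seed(42)
n_units <- 200   # hypothetical number of lowest-level units
sigma_u <- 3     # assumed between-unit SD
sigma_e <- 1     # assumed measurement-error SD

truth <- rnorm(n_units, mean = 10, sd = sigma_u)
rep1  <- truth + rnorm(n_units, sd = sigma_e)
rep2  <- truth + rnorm(n_units, sd = sigma_e)

# The unit effect cancels: d depends only on the two measurement errors
d <- rep1 - rep2

# Var(d) = 2 * sigma_e^2 and E(d) = 0, so mean(d^2) / 2 estimates sigma_e^2
var_e_hat <- mean(d^2) / 2
var_e_hat
```

The estimate relies on the homogeneity assumption mentioned above: every unit is taken to have the same error variance.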

> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of ronggui
> Sent: Sunday, November 27, 2005 9:34 AM
> To: r-help@stat.math.ethz.ch
> Subject: [R] multilevel models and sample size
>
> It is not a pure R question, but I hope someone can give me advice.
>
> I want to analyse my data with a multilevel model. The data has 2
> levels: the second level has 52 units, and each second-level unit has
> 19-23 units. I think the sample size is quite small, but just now I can't make
> the sample size much bigger. So I want to ask: if I use a multilevel model
> to analyse the data set, will it be acceptable, or unacceptable because of
> the small sample size?


-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [R] obtaining a ROC curve

2005-11-27 Thread Frank E Harrell Jr
Anjali Karve wrote:
> Hello,
>
> I have a classification tree. I want to obtain a ROC curve for this test.
> What is the easiest way to obtain one?
>
> -Anjali

ROC curves have a number of problems, chief among them leading to the 
temptation of dichotomizing test results.  ROC areas are useful 
statistics though.  In the Hmisc package see somers2 and rcorr.cens for 
getting the ROC area nonparametrically.
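
For readers without Hmisc at hand, the nonparametric ROC area can also be sketched in base R through the rank (Mann-Whitney) formula. The function name roc_area and the toy data are made up for illustration; the "C" value reported by somers2 is, as far as I know, this same quantity.

```r
# Area under the ROC curve via the rank (Mann-Whitney) formula.
# 'score' is a numeric prediction, 'outcome' is coded 0/1.
roc_area <- function(score, outcome) {
  r  <- rank(score)                 # midranks, so ties count as one half
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

score   <- c(0.1, 0.4, 0.35, 0.8)
outcome <- c(0,   0,   1,    1)

# 3 of the 4 (positive, negative) pairs are ordered correctly, giving 0.75
roc_area(score, outcome)
```

For a classification tree, score would typically be the predicted class probability for each observation.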

Frank

-- 
Frank E Harrell Jr   Professor and Chair   School of Medicine
  Department of Biostatistics   Vanderbilt University



Re: [R] Counting the occurence of each unique charecter string

2005-11-27 Thread Gabor Grothendieck
On 11/27/05, Ted Harding [EMAIL PROTECTED] wrote:
> Just using table() would give you the same information, but
> converting it to a dataframe makes that information more
> readily accessible by familiar methods.
>
> Hoping this helps,
> Ted.



or using the iris dataset that comes with R and making use
of as.data.frame.table we can shorten that slightly to just:

as.data.frame.table(table(Species = iris$Species), responseName = "Count")

Incidentally, I just noticed that there is an inconsistency between as.data.frame
and as.data.frame.table making it impossible to shorten as.data.frame.table
to as.data.frame in the above due to the responseName= argument
which is not referenced in the generic.

> args(as.data.frame)
function (x, row.names = NULL, optional = FALSE)
NULL
> args(as.data.frame.table)
function (x, row.names = NULL, optional = FALSE, responseName = "Freq")
NULL
> R.version.string # Windows
[1] "R version 2.2.0, 2005-10-24"



Re: [R] R-help Digest, Vol 33, Issue 27

2005-11-27 Thread A.J. Rossini
> From: Duncan Murdoch [EMAIL PROTECTED]
>
> I'd recommend using the RWinEdt package instead for a different way to
> integrate winedit with R.

winedit and winedt are two different editors, last I checked.

best,
-tony

[EMAIL PROTECTED]
Muttenz, Switzerland.
Commit early,commit often, and commit in a repository from which we can easily
roll-back your mistakes (AJR, 4Jan05).



[R] fixed, random effects with variable weights

2005-11-27 Thread Raphael Schoenle
Hi everyone,


I have tried to solve a simple problem for days but I can't figure out 
how to run it properly. If someone could give me a hint, this would be 
really great.

Basically, I want to run a standard economist's fixed, and random 
effects regression (corresponds to xtreg in STATA) but with _variable_ 
weights (they correspond to changing industry shares in the market).

Here is what I do:

regsc <- lme(dsc~dcomp+dperc,random=~1|ind7090)
update(regsc,weights=varFixed(~wt))

1. However, my results are different from what I obtain in Stata using 
areg (the weighted fixed effects time series regression). Any ideas?
2. How do I read off the random effects results from this regression 
(i.e. coefficients on dcomp and dperc)?

Any hint would greatly be appreciated.

Best,

-Raphael


[R] 'For each file in folder F do....'

2005-11-27 Thread Ron Piccinini
Hello,

I have 2700 text files in a folder and need to apply
the same program/procedure to each individually. I'm
trying to find how to code something like:

For each file in Folder do {Procedure}

is there an easy way to do this? other suggestions? 

I have tried to list all the files names in a vector
e.g.

> listfiles[1:10,1]

1   H:/Rtest/AXP.txt
2    H:/Rtest/BA.txt
3     H:/Rtest/C.txt
4   H:/Rtest/CAT.txt
5    H:/Rtest/DD.txt
6   H:/Rtest/DIS.txt
7    H:/Rtest/EK.txt
8    H:/Rtest/GE.txt
9    H:/Rtest/GM.txt
10   H:/Rtest/HD.txt

but R doesn't like statements of type

read.table(file=listfiles[1,1])

since 'file' must be a character string or
connection...

Any thoughts?

Many thanks in advance,

Ron Piccinini.



Re: [R] 'For each file in folder F do....'

2005-11-27 Thread Kjetil Brinchmann Halvorsen
Ron Piccinini wrote:
> Hello,
>
> I have 2700 text files in a folder and need to apply
> the same program/procedure to each individually. I'm
> trying to find how to code something like:
>
> For each file in Folder do {Procedure}
>
> is there an easy way to do this? other suggestions?

files <- list.files()
results <- lapply(files, yourprocessing)

where yourprocessing is a function taking as argument a file name and 
returning whatever you want.
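
A self-contained sketch of this pattern follows. It writes three small files to a temporary folder so that it can be run anywhere; the folder name, the file contents, and the body of yourprocessing are all invented for the demonstration.

```r
# Write three small text files to a temporary folder
folder <- file.path(tempdir(), "Rtest_demo")   # hypothetical folder name
dir.create(folder, showWarnings = FALSE)
for (nm in c("AXP", "BA", "C")) {
  write.table(data.frame(day = 1:3, price = c(10, 11, 12)),
              file = file.path(folder, paste0(nm, ".txt")),
              row.names = FALSE)
}

# A stand-in for "yourprocessing": takes a file name, returns a result
yourprocessing <- function(path) {
  dat <- read.table(path, header = TRUE)
  mean(dat$price)
}

# full.names = TRUE returns paths that can be passed straight to read.table()
files   <- list.files(folder, pattern = "\\.txt$", full.names = TRUE)
results <- lapply(files, yourprocessing)
names(results) <- basename(files)
results
```

Note that the function itself is passed to lapply(), without parentheses; lapply() then calls it once per file name.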

Kjetil


 


Re: [R] 'For each file in folder F do....'

2005-11-27 Thread Duncan Murdoch
On 11/27/2005 3:51 PM, Ron Piccinini wrote:
> Hello,
>
> I have 2700 text files in a folder and need to apply
> the same program/procedure to each individually. I'm
> trying to find how to code something like:
>
> For each file in Folder do {Procedure}
>
> is there an easy way to do this? other suggestions?
>
> I have tried to list all the files names in a vector
> e.g.
>
> > listfiles[1:10,1]
>
> 1   H:/Rtest/AXP.txt
> 2    H:/Rtest/BA.txt
> 3     H:/Rtest/C.txt
> 4   H:/Rtest/CAT.txt
> 5    H:/Rtest/DD.txt
> 6   H:/Rtest/DIS.txt
> 7    H:/Rtest/EK.txt
> 8    H:/Rtest/GE.txt
> 9    H:/Rtest/GM.txt
> 10   H:/Rtest/HD.txt
>
> but R doesn't like statements of type
>
> > read.table(file=listfiles[1,1])
>
> since 'file' must be a character string or
> connection...
>
> Any thoughts?

 From the look of it, the listfiles column that you created has been 
converted to a factor.  You can convert back to character using 
as.character(); the as.is=TRUE parameter in the file reading functions 
will prevent the conversion in the first place, if that's how it happened.

Then something like

results <- list()
for (f in as.character(listfiles[,1])) results[[f]] <- read.table(file=f)

will read all the files and put them in a list.
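
The factor issue can be reproduced in a small sketch. The temporary folder, the two file names, and the column name are made up; stringsAsFactors = TRUE is set explicitly here so that the conversion happens regardless of the R version's default.

```r
# Two demo files in a temporary folder (hypothetical names)
folder <- file.path(tempdir(), "factor_demo")
dir.create(folder, showWarnings = FALSE)
paths <- file.path(folder, c("GE.txt", "GM.txt"))
for (p in paths) write.table(data.frame(x = 1:2), file = p, row.names = FALSE)

# stringsAsFactors = TRUE reproduces the conversion described above
listfiles <- data.frame(name = paths, stringsAsFactors = TRUE)
is.factor(listfiles[, 1])        # the column is a factor, not character

# Converting back to character makes each element usable as a file name
results <- list()
for (f in as.character(listfiles[, 1])) {
  results[[f]] <- read.table(file = f, header = TRUE)
}
names(results)
```

Indexing the list by file name, as here, keeps track of which data frame came from which file.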

Duncan Murdoch



[R] creating a factor from other factors and ifelse

2005-11-27 Thread Dimitri Joe
Hi,

Given

> sec98 <- factor(rep(1:2,3), labels=c("A", "B"))
> sec99 <- factor(rep(2:1,3), labels=c("A", "B"))
> sec99[c(2,5)] <- NA
> sec00 <- factor( c( rep(1,3), rep(2,3) ), labels=c("A", "B"))
> sec00[c(2,4)] <- NA
> sec1 <- ifelse(!is.na(sec99), sec99,
+ ifelse(!is.na(sec00), sec00, NA ))

We get

> sec1; class(sec1)
[1]  2 NA  2  1  2  1
[1] "integer"

I wonder why sec1 as defined above is not a factor, since it has been 
created from (logical operations and) factors. Of course, one could do

> sec1 <- factor(sec1, labels=levels(sec99))

but this would be a problem if I had (as I actually do) sec99 and sec00 
instead defined as

> sec99 <- factor(c(1,2,3,2,3,3), labels=c("A", "B", "C"))
> sec99[c(2,5)] <- NA
> sec00 <- factor(c(4,1,1,2,4,2), labels=c("A", "B", "D"))
> sec00[c(2,4)] <- NA

# because
> sec1 <- ifelse(!is.na(sec99), sec99,
+ ifelse(!is.na(sec00), sec00, NA ))

# gives us
> sec1; class(sec1)
[1]  1 NA  3  2  3  3
[1] "integer"

Now it's hard to tell whether each 3 in sec1 means C or D. What I 
actually wanted was

> sec1; class(sec1)
[1]  A NA  C  B  D  C
[1] "factor"

Any suggestions on how to do it in a simple way will be welcome.
Thanks,
Dimitri
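
One possible approach, offered as a sketch rather than the only way: ifelse() builds its result from the underlying values of its arguments and does not carry over their attributes, so a factor contributes only its integer codes. Converting both factors to character first keeps the labels, and the combined vector can then be turned back into a factor.

```r
sec99 <- factor(c(1, 2, 3, 2, 3, 3), labels = c("A", "B", "C"))
sec99[c(2, 5)] <- NA
sec00 <- factor(c(4, 1, 1, 2, 4, 2), labels = c("A", "B", "D"))
sec00[c(2, 4)] <- NA

# Work on the labels, not the integer codes; where sec99 is missing,
# fall back to sec00 (which may itself be NA)
sec1 <- factor(ifelse(!is.na(sec99),
                      as.character(sec99),
                      as.character(sec00)))

sec1
class(sec1)
```

The levels of the result are the union of the labels that actually occur, here A, B, C and D.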
