[R] regular expressions, sub
Hi,

I am trying to use sub, regexpr on expressions like

    log(D) ~ log(N)+I(log(N)^2)+log(t)

being a model specification.

The aim is to produce:

    "ln D ~ ln N + ln^2 N + ln t"

The variable names N, t may change, the number of terms too.

I succeeded only partially; the help on regular expressions is hard for me to understand, and examples for my case are rare. The help page on R-help for grep etc. and regular expressions

What I am doing:

    (f <- log(D) ~ log(N)+I(log(N)^2)+log(t))
    (ft <- sub("","",f))  # creates string with parts of formula, how to do it simpler?
    (fu <- paste(ft[c(2,1,3)],collapse=" "))  # converts to one string

Then I want to use \1 for backreferences, something like

    (fv <- sub("log( [:alpha:] N )^ [:alpha:)","ln \\1^\\2",fu))

to change "log(g)^7" to "ln^7 g", and to eliminate I():

    sub("I(blabla)","\\1",fv)  # I(xxx) -> xxx

The special characters are making trouble, sub accepts (, ) only in pairs.

Code for experimentation:

    trysub <- function(s,t,e) {
      ii<-0;
      for (i1 in c(TRUE,FALSE))
        for (i2 in c(TRUE,FALSE))
          for (i3 in c(TRUE,FALSE))
            for (i4 in c(TRUE,FALSE))
              print(paste(ii<-ii+1,ifelse(i1," "," ~"),"ext",ifelse(i2," "," ~"),"perl",
                          ifelse(i3," "," ~"),"fixed ",ifelse(i4," "," ~"),"useBytes: ",
                          try(sub(s,t,e, extended=i1, perl=i2, fixed=i3, useBytes=i4)),sep=""));
      invisible(0)
    }

    trysub("I(log(N)^2)","ln n^2",fu)
    # A: desired result for cases 5,6,13..16, the rest unsubstituted

    trysub("log(","ln ",fu)
    # B: no substitutions; errors for cases 1..4,7..12
    # typical errors: "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n"

    trysub("log\(","ln ",fu)
    # C: same as A

    trysub("log\\(","ln ",fu)
    # D: no substitutions; errors for cases 15,16
    # typical errors: "15 ~ext ~perl ~fixed useBytes: Error in sub(pattern, replacement, x, ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression 'log\\('\n"

    trysub("log\\(([:alpha:]+)\\)","ln \1",fu)
    # no substitutions, no errors
    # E: typical errors: "3 ext perl ~fixed useBytes: Error in sub.perl(pattern, replacement, x, ignore.case, useBytes) : \n\tinvalid regular expression 'log\\(([:alpha:]+)\\)'\n"

Thanks for help
Christian

PS. The explanations in the documents

--
Dr. Christian W. Hoffmann,
Swiss Federal Research Institute WSL
Mathematics + Statistical Computing
Zuercherstrasse 111
CH-8903 Birmensdorf, Switzerland
Tel +41-44-7392-277 (office) -111 (exchange)
Fax +41-44-7392-215 (fax)
[EMAIL PROTECTED]
http://www.wsl.ch/staff/christian.hoffmann

International Conference 5.-7.6.2006 Ekaterinburg Russia
Climate changes and their impact on boreal and temperate forests
http://ecoinf.uran.ru/conference/

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] regular expressions, sub
Note that [:alpha:] is a pre-defined character class and should only be used inside []. And metacharacters need to be quoted. See ?regexp.

    > f <- log(D) ~ log(N)+I(log(N)^2)+log(t)
    > f1 <- deparse(f)
    > f1
    [1] "log(D) ~ log(N) + I(log(N)^2) + log(t)"

Now we have a string.

    > (f2 <- gsub("I\\((.*)\\) ", "\\1 ", f1))
    [1] "log(D) ~ log(N) + log(N)^2 + log(t)"
    > (f3 <- gsub("(?U)log\\((.*)\\)", "ln \\1", f2, perl=TRUE))
    [1] "ln D ~ ln N + ln N^2 + ln t"
    > (f4 <- gsub("ln ([[:alpha:]])\\^([[:digit:]])", "ln^\\2 \\1", f3))
    [1] "ln D ~ ln N + ln^2 N + ln t"

That should give you some ideas to be going on with.

On Fri, 27 Jan 2006, Christian Hoffmann wrote:

> The special characters are making trouble, sub accepts (, ) only in pairs.

From ?regexp:

    Any metacharacter with special meaning may be quoted by preceding it
    with a backslash. The metacharacters are '. \ | ( ) [ { ^ $ * + ?'.

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford
1 South Parks Road
Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self)
     +44 1865 272866 (PA)
Fax: +44 1865 272595
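The three substitutions in the reply above can be bundled into one function. What follows is a minimal sketch, not code from the thread: the name log_to_ln is hypothetical, and the patterns use the negated class [^()] in place of the (?U) switch, on the assumption that every log() argument is a plain variable name and every power is a whole number.

```r
# Hypothetical helper (not from the thread): formula -> "ln" notation string.
log_to_ln <- function(f) {
  # deparse() can split a long formula over several strings; rejoin them
  s <- paste(deparse(f), collapse = " ")
  # 1. I(log(x)^n) -> log(x)^n ; "(", ")" and "^" are escaped as metacharacters
  s <- gsub("I\\((log\\([^()]*\\)\\^[0-9]+)\\)", "\\1", s)
  # 2. log(x) -> ln x ; [^()]* cannot run past the closing parenthesis
  s <- gsub("log\\(([^()]*)\\)", "ln \\1", s)
  # 3. ln x^n -> ln^n x ; POSIX classes sit inside an outer [ ]
  gsub("ln ([[:alnum:]._]+)\\^([[:digit:]]+)", "ln^\\2 \\1", s)
}

log_to_ln(log(D) ~ log(N) + I(log(N)^2) + log(t))
# [1] "ln D ~ ln N + ln^2 N + ln t"
log_to_ln(log(y) ~ log(x1) + I(log(x1)^3))
# [1] "ln y ~ ln x1 + ln^3 x1"
```

Nested calls such as log(log(N)) are outside what these patterns handle; for those, working on the parsed formula rather than on its text is the safer route.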
Re: [R] regular expressions, sub
Hello,

Here is what I got after playing a little bit with your problem:

    # First of all, if you prefer 'ln' instead of 'log', why not define:
    ln <- function(x) log(x)
    ln2 <- function(x) log(x)^2
    ln3 <- function(x) log(x)^3
    ln4 <- function(x) log(x)^4
    # ... as many functions as powers you need

    # Then, your formula is now closer to what you want,
    # which makes the whole code easier to read for you:
    Form <- ln(D) ~ ln(N) + ln2(N) + ln(t) # Same as your original formula

    # Here is the function to transform it into a more readable string:
    formulaTransform <- function(form, as.expression = FALSE) {
        if (!inherits(form, "formula"))
            stop("'form' must be a 'formula' object!")
        # Transform the formula into a string (is there a better way?)
        Res <- paste(as.character(form)[c(2, 1, 3)], collapse = " ")
        if (as.expression) { # Transform the formula into a nice expression
            # Change '~' into '=='
            Res <- sub("~", "%~~%", Res)  # How to do '~' in an expression?
            # Eliminate brackets
            Res <- gsub("[(]([A-Za-z0-9._]*)[)]", "~ \\1", Res)
            # Transform powers
            Res <- gsub("ln([2-9])", "ln^\\1", Res)
            Res <- eval(parse(text = Res))
        } else { # Make a nicer string
            # Eliminate brackets
            Res <- gsub("[(]([A-Za-z0-9._]*)[)]", " \\1", Res)
            # Transform powers
            Res <- gsub("ln([2-9])", "ln^\\1", Res)
        }
        # Return the result
        return(Res)
    }

    # Here is a nicer presentation as a string
    formulaTransform(Form)

    # Here is an even nicer presentation (creating an expression)
    plot(1:3, type = "n")
    text(2, 2, formulaTransform(Form, TRUE))

    # The latter form is really interesting when you use, for instance,
    # Greek letters for variables, or so...
    Form2 <- ln(alpha) ~ ln(beta) + ln2(beta) + ln3(beta)
    formulaTransform(Form2)
    plot(1:3, type = "n")
    text(2, 2, formulaTransform(Form2, TRUE))
    # ... but this could be refined even more!

Best,

Philippe Grosjean

--
Prof. Philippe Grosjean
Numerical Ecology of Aquatic Systems
Mons-Hainaut University, Pentagone (3D08)
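The helpers ln2, ln3, ln4 in the reply above are written out one by one, "as many as powers you need". A possible shortcut, shown here only as a sketch, is to generate them in a loop. The factory name make_ln_power is hypothetical and not part of the original post.

```r
# Hypothetical factory (not from the thread): returns function(x) log(x)^p
make_ln_power <- function(p) {
  force(p)               # fix the value of p inside the returned closure
  function(x) log(x)^p
}

ln <- function(x) log(x)
# Create ln2, ln3, ln4 in the current environment
for (p in 2:4) {
  assign(paste0("ln", p), make_ln_power(p))
}

ln3(exp(2))  # log(exp(2))^3 = 2^3 = 8
```

The pattern "ln([2-9])" used in formulaTransform only recognises single-digit powers, so generating helpers beyond ln9 would also require widening that pattern.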
Re: [R] regular expressions, sub
There are some interactive regex tools around. I use a Python one sometimes. You then just have to be careful about escaping, and about differences between the style of regular expressions used in the tool you worked with and the one used in the target environment.
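Within R itself, the effect of escaping and of the regex flavour can be checked by running one pattern under each matching mode of sub(). The sketch below is an illustration only: try_modes is a hypothetical name, and it covers the three modes of current R (default, perl = TRUE, fixed = TRUE) rather than the older extended argument used in the original question.

```r
# Hypothetical helper (not from the thread): apply one pattern in three modes
try_modes <- function(pattern, replacement, x) {
  modes <- list(default = list(),
                perl    = list(perl = TRUE),
                fixed   = list(fixed = TRUE))
  vapply(modes, function(extra) {
    tryCatch(
      # suppressWarnings() hides the engine's compile warning; the error
      # itself is still caught and reported as text below
      suppressWarnings(do.call(sub, c(list(pattern, replacement, x), extra))),
      error = function(e) paste("error:", conditionMessage(e))
    )
  }, character(1))
}

x <- "log(D) ~ log(N)"
try_modes("log(", "ln ", x)    # unescaped "(": both regex modes fail, fixed works
try_modes("log\\(", "ln ", x)  # escaped "(": regex modes work, fixed finds no match
```

With fixed = TRUE the pattern is taken literally, so no escaping is needed, but backreferences and character classes are then unavailable.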
Re: [R] regular expressions, sub
In this post:

    http://finzi.psych.upenn.edu/R/Rhelp02a/archive/30590.html

Thomas Lumley provided a function to traverse a formula recursively. We can modify it as shown to transform ln(m)^n to ln^n(m), producing proc2. We then bundle everything up into proc3, which uses substitute to translate log to ln and to replace I with (, then calls proc2 to do the aforementioned transformation, and finally uses simple character processing to clean up the rest.

Although this is substantially longer in terms of lines of code, we did not have to write many of them, because proc2 is actually just a modification of the code in the indicated post and the character processing becomes extremely simple. It is also more powerful, being able to handle expressions like:

    log(D) ~ log(log(N)^2)^3

    proc2 <- function(formula){
       process <- function(expr){
           if (length(expr)==1)
              return(expr)
           if (length(expr)==2) {
              expr[[2]] <- process(expr[[2]])
              return(expr)
           }
           if ( expr[[1]]==as.name("^") && length(expr[[2]])==2 &&
              expr[[2]][[1]] == as.name("ln") &&
              class(idx <- expr[[3]]) == "numeric") {
                 expr <- as.call(list(as.name(paste("ln",idx,sep = "^")),
                    expr[[2]][[2]]))
                 expr[[2]] <- process(expr[[2]])
                 return(expr)
           }
           expr[[2]] <- process(expr[[2]])
           expr[[3]] <- process(expr[[3]])
           return(expr)
       }
       formula[[3]] <- process(formula[[3]])
       formula
    }

    proc3 <- function(f) {
       # replace log with ln
       result <- do.call("substitute", list(f, list(log = as.name("ln"))))
       # remove I
       result <- do.call("substitute", list(result, list(I = as.name("("))))
       # transform ln(m)^n to ln^n(m)
       result <- proc2(result)
       # now clean up using simple character substitutions
       result <- deparse(result)
       # ( -> space
       result <- gsub("[(]", " ", result)
       # remove ` and )
       result <- gsub("[`)]", "", result)
       # collapse runs of spaces left by the substitutions above
       gsub(" +", " ", result)
    }

    # tests
    proc3( log(D) ~ log(N)+I(log(N)^2)+log(t) )  # "ln D ~ ln N + ln^2 N + ln t"
    proc3( log(D) ~ log(log(N)^2)^3)  # "ln D ~ ln^3 ln^2 N"
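A variation on the recursive idea is to build the output string directly while walking the formula, so that no character clean-up is needed afterwards. The sketch below is not the code from the reply above or from the linked post: render and render_formula are hypothetical names, and it assumes a two-sided formula containing only log(), I(), parentheses, numeric constants and binary operators.

```r
# Hypothetical renderer (not from the thread): walks the parse tree of a
# formula and emits "ln" notation directly.
render <- function(e) {
  # leaves: variable names and numeric constants
  if (is.name(e) || is.atomic(e)) return(as.character(e))
  fn   <- as.character(e[[1]])   # assumes the function position is a plain name
  args <- as.list(e)[-1]
  # I(x) and (x): drop the wrapper
  if (fn == "I" || fn == "(") return(render(args[[1]]))
  # log(x) -> "ln x"
  if (fn == "log") return(paste("ln", render(args[[1]])))
  # log(x)^n -> "ln^n x"
  if (fn == "^" && is.call(args[[1]]) &&
      identical(args[[1]][[1]], as.name("log"))) {
    return(paste0("ln^", render(args[[2]]), " ", render(args[[1]][[2]])))
  }
  # any other binary operator: render both sides around it
  if (length(args) == 2) return(paste(render(args[[1]]), fn, render(args[[2]])))
  stop("unsupported call: ", fn)
}

render_formula <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)  # two-sided formulas only
  paste(render(f[[2]]), "~", render(f[[3]]))
}

render_formula(log(D) ~ log(N) + I(log(N)^2) + log(t))
# [1] "ln D ~ ln N + ln^2 N + ln t"
render_formula(log(D) ~ log(log(N)^2)^3)
# [1] "ln D ~ ln^3 ln^2 N"
```

Anything the renderer does not recognise stops with an error instead of passing through silently, which makes unsupported model terms easy to spot.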
[R] Regular expressions sub
Dear all,

I am struggling with the use of regular expressions.

I got

    > as.character(test$sample.id)
    [1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"

and need

    [1] "11" "11" "11" "31" "2" "3" "8"

I.e. remove everything before the ".".

TIA,

Bernd
Re: [R] Regular expressions sub
    > x <- scan("clipboard", what="")
    Read 7 items
    > x
    [1] "1.11"   "10.11"  "11.11"  "113.31" "114.2"  "114.3"  "114.8"
    > gsub("[0-9]*\\.", "", x)
    [1] "11" "11" "11" "31" "2"  "3"  "8"
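The pattern in the reply above is not anchored, so it removes a run of digits followed by a dot wherever that run occurs. For the ids in the question this makes no difference, but the small comparison below shows what changes once an id has a non-digit prefix. The value "A12.7" is made up for illustration and does not come from the thread.

```r
# "A12.7" is a hypothetical id used only to show the effect of anchoring
ids <- c("114.2", "A12.7")

# Unanchored: the "12." inside "A12.7" is removed, leaving "A7"
gsub("[0-9]*\\.", "", ids)
# [1] "2"  "A7"

# Anchored with ^: only a digits-and-dot prefix at the start is removed,
# so "A12.7" does not match and is returned unchanged
sub("^[0-9]*\\.", "", ids)
# [1] "2"     "A12.7"
```

Which behaviour is wanted depends on the data; anchoring makes unexpected input visible instead of silently rewriting it.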
Re: [R] Regular expressions sub
One solution is

    test <- c("1.11","10.11","11.11","113.31","114.2","114.3")
    id <- unlist(lapply(strsplit(test,"[.]"),function(x) {x[2]}))
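The same split can be written without a regular expression at all. The following sketch assumes the values hold exactly one dot each; it uses fixed = TRUE so that "." is taken literally, and vapply so that the result is guaranteed to be a character vector.

```r
test <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")

# fixed = TRUE: split on a literal ".", no escaping or [.] needed
parts <- strsplit(test, ".", fixed = TRUE)

# take the second piece of every element; gives NA where there is no dot
id <- vapply(parts, function(p) p[2], character(1))
id
# [1] "11" "11" "11" "31" "2"  "3"  "8"
```

Wrapping the result in as.integer() would turn the pieces into numbers if they are needed as such.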
Re: [R] Regular expressions sub
Bernd Weiss <bernd.weiss at uni-koeln.de> writes:
> I.e. remove everything before the ".".

Define the dot as the hard separator, and allow for multiple digits before it:

    > sample.id <- c("1.11", "10.11", "11.11", "113.31", "114.2", "114.3", "114.8")
    > gsub("^[0-9]*\\.", "", sample.id)
    [1] "11" "11" "11" "31" "2"  "3"  "8"

Hope this helps,

Dirk
Re: [R] Regular expressions sub
Dirk Eddelbuettel [EMAIL PROTECTED] writes:

> Define the dot as the hard separator, and allow for multiple digits before it:
>
> > gsub("^[0-9]*\\.", "", sample.id)
> [1] "11" "11" "11" "31" "2"  "3"  "8"

Or, more longwinded, but with fewer assumptions about what goes before the dot:

    > gsub("^.*\\.(.*)$","\\1",sample.id)
    [1] "11" "11" "11" "31" "2"  "3"  "8"

or,

    > gsub("^.*\\.([^.]*)$","\\1",sample.id)
    [1] "11" "11" "11" "31" "2"  "3"  "8"

--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B
PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918
FAX: (+45) 35327907
([EMAIL PROTECTED])
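Both patterns in the reply above start with a greedy ".*", so they keep what follows the last dot. The comparison below, a sketch with made-up inputs "a.b.c" and "nodot", contrasts that with a pattern that stops at the first dot; the third pattern is an addition for illustration and was not posted in the thread.

```r
# "a.b.c" and "nodot" are hypothetical inputs chosen to expose the differences
x <- c("113.31", "a.b.c", "nodot")

# Greedy ".*" runs to the LAST dot, so only the final piece is kept
sub("^.*\\.(.*)$", "\\1", x)
# [1] "31"    "c"     "nodot"
sub("^.*\\.([^.]*)$", "\\1", x)
# [1] "31"    "c"     "nodot"

# [^.]* cannot cross a dot, so only the part up to the FIRST dot is removed
sub("^[^.]*\\.", "", x)
# [1] "31"    "b.c"   "nodot"
```

Strings without a dot are returned unchanged in all three cases, because the pattern does not match and no substitution takes place.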
Re: [R] Regular expressions sub
On 18 Aug 2005 at 21:17, Peter Dalgaard wrote:

> Or, more longwinded, but with less assumptions about what goes before the dot:
>
> > gsub("^.*\\.(.*)$","\\1",sample.id)
> [1] "11" "11" "11" "31" "2"  "3"  "8"

Wow, thanks a lot for all the valuable suggestions.

Bernd