Re: [R] substituting dots in the names of the columns (sub, gsub, regexpr)

Felix Andrews Thu, 26 Jul 2007 07:40:29 -0700

Hi,

A dot in a regular expression matches any character, so you have to
escape each dot with a backslash, written \\ in R code (the backslash
itself has to be escaped inside the string, to confuse things...).
A plus symbol will match one or more repetitions of the preceding item.
A dollar symbol will match the end of a string.
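
As a small illustration of these three rules, here is a sketch using one
of the names from the question quoted below. The variable name x is only
for this example, and the outputs in the comments are what I would expect
from base R's default regular expressions.

```r
x <- "SP.g..g."

# Unescaped dots: "." matches any character, so ".." matches the first
# two characters "SP", and sub() replaces them with a single dot.
sub("..", ".", x)
# [1] "..g..g."

# Escaped dot followed by an unescaped one: "\\.." matches a literal dot
# plus any one character, here ".g", which is why the "g" disappears.
sub("\\..", ".", x)
# [1] "SP...g."

# "\\.+" matches a run of one or more literal dots; gsub() replaces
# every such run, not just the first.
gsub("\\.+", ".", x)
# [1] "SP.g.g."

# "\\.$" matches a literal dot only when it is the last character.
sub("\\.$", "", x)
# [1] "SP.g..g"
```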


So:

gsub("\\.$", "", gsub("\\.+", ".", str))
[1] "y.m"        "BD.g.cm3"   "PR.Mpa"     "Ks.m.s"     "SP.g.g"
[6] "P.m3.m3"    "theta1.g.g" "theta2.g.g" "AWC.g.g"
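
The nested call may be easier to read when split into its two steps. This
is only a sketch of the same idea; the names nms, collapsed and cleaned
are made up for the example, and the input is the vector from the
question quoted below.

```r
# Column names as produced by read.table(..., check.names = TRUE)
nms <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
         "P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

# Step 1: collapse every run of one or more dots into a single dot.
collapsed <- gsub("\\.+", ".", nms)

# Step 2: remove the single dot left at the end of each name, if any.
# sub() is enough here because "$" can match only once per string.
cleaned <- sub("\\.$", "", collapsed)

cleaned
```

The result could then be assigned back to a data frame with something
like names(dat) <- cleaned, where dat is a hypothetical name for the
imported data.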

Learn more at ?regexp

Felix


On 7/26/07, 8rino-Luca Pantani <[EMAIL PROTECTED]> wrote:

Dear R users,
I have the following two problems, related to the functions sub, grep,
regexpr and similar.

The header of the file(s) I have to import is like this.

c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
"P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

To get rid of spaces and symbols in the names of the columns,
I use read.table(... check.names=TRUE) and I get:
str <- c("y..m.", "BD..g.cm3.", "PR..Mpa.", "Ks..m.s.", "SP.g..g.",
"P..m3.m3.", "theta1..g.g.", "theta2..g.g.", "AWC..g.g.")

Now, my problem is to remove the trailing dots, as well as the double
dots, in order to get names like the following:
c("y.m", "BD.g.cm3", "PR.Mpa", "Ks.m.s", "SP.g.g", "P.m3.m3",
"theta1.g.g", "theta2.g.g", "AWC.g.g")

I've searched the help pages for sub, regexpr and similar functions, and
also searched the help archives.
I understand that the dot is a special character, since
sub("..", ".", str)
[1] "..m."        "...g.cm3."   "...Mpa."     "...m.s."     "..g..g."
[6] "..m3.m3."    ".eta1..g.g." ".eta2..g.g." ".C..g.g."

Therefore I tried
sub("\\..", ".", str)
[1] "y.m."        "BD.g.cm3."   "PR.Mpa."     "Ks.m.s."     "SP...g."
[6] "P.m3.m3."    "theta1.g.g." "theta2.g.g." "AWC.g.g."
and I was surprised by the (to me) strange behaviour of "SP.g..g."
being changed to "SP...g."
And this is the first problem I cannot solve.

Then there's the problem of trailing dot removal.
In
http://tolstoy.newcastle.edu.au/R/e2/help/07/01/8665.html
I've found a somewhat similar problem, but the solution does not work in
this case, since:
gsub("[.].*", "", str)
[1] "y"      "BD"     "PR"     "Ks"     "SP"     "P"      "theta1" "theta2"
[9] "AWC"
And this is the second problem.

Apart from these particular problems, I would like to know more about
regexp, sub and so on, since each time I have strings to manipulate, I
must face my ignorance of regular expressions and their syntax.

Is there any page with examples, where I can improve my knowledge and
stop being frustrated each time I have to manipulate strings?

8rino

--
Ottorino-Luca Pantani, Università di Firenze
Dip. Scienza del Suolo e Nutrizione della Pianta
P.zle Cascine 28 50144 Firenze Italia
Tel 39 055 3288 202 (348 lab) Fax 39 055 333 273
[EMAIL PROTECTED]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



--
Felix Andrews / 安福立
PhD candidate
Integrated Catchment Assessment and Management Centre
The Fenner School of Environment and Society
The Australian National University (Building 48A), ACT 0200
Beijing Bag, Locked Bag 40, Kingston ACT 2604
http://www.neurofractal.org/felix/
voice:+86_1051404394 (in China)
mobile:+86_13522529265 (in China)
mobile:+61_410400963 (in Australia)
xmpp:[EMAIL PROTECTED]
3358 543D AAC6 22C2 D336  80D9 360B 72DD 3E4C F5D8
