Hi
I have run into a problem when using gregexpr() and regmatches(); I get the
following error message:
Error in iconv(x, "latin1", "ASCII") :
'x' must be a list of NULL or raw vectors
The data:
(1)
I have two journal articles, and after some regex manipulation I have the
following:
# work with only two full-text articles
author.test <- articles1[1:2]
# extract author information
r <- gregexpr("(\"authors\":(.*?)\"(.*?)\")|(\"authors\": \\[(.*?)\\],)",
author.test)
authors.raw <- regmatches(author.test, r)
authors.raw
[[1]]
[1] "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D. BEAN\"],"
[[2]]
[1] "\"authors\": \"Chris Baldry\", \""
(2)
Now, if I try to apply a further regex step to this list, I get the error
shown above.
r <- gregexpr("([^(\"authors\":)])(.*?)(\"(.*?)\")", authors.raw)
authors.raw <- regmatches(authors.raw, r)
Error in iconv(x, "latin1", "ASCII") :
'x' must be a list of NULL or raw vectors
(3)
One way to avoid this is to unlist(authors.raw) - see below - but the
problem is that I then lose the grouping that the list provided. The first
element contains three character strings, which are the authors of the first
paper, and I want to keep them together in that list format.
> authors.raw <- unlist(regmatches(authors.raw, r))
> authors.raw
[1] " [\"Allan G. KING\"" ", \"B. Lindsay LOWELL\"" ", \"Frank D. BEAN\"" " \"Chris Baldry\""
(4)
So what I want to do is avoid unlist() and apply gregexpr() and regmatches()
several times in a row while keeping the list structure. Any ideas?
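To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes that the second pass can be applied to each list element separately with lapply(), so that gregexpr() and regmatches() only ever see a character vector. The helper name extract_all, the simplified quoted-string pattern, and the hard-coded input are my own placeholders for illustration, not the real data or the pattern from step (2).

```r
# Placeholder input: mirrors the list returned by the first regmatches() call.
authors.raw <- list(
  "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D. BEAN\"],",
  "\"authors\": \"Chris Baldry\", \""
)

# Hypothetical helper: run gregexpr()/regmatches() on a character vector
# and flatten the matches for that vector only.
extract_all <- function(x, pattern) {
  unlist(regmatches(x, gregexpr(pattern, x)))
}

# Process each list element on its own, so the per-article grouping is kept.
authors.list <- lapply(authors.raw, function(x) {
  x <- sub("^\"authors\":", "", x)           # drop the leading key
  quoted <- extract_all(x, "\"[^\"]+\"")     # simplified pattern: quoted strings
  gsub("\"", "", quoted)                     # strip the surrounding quotes
})

authors.list
```

With this approach the result is still a list with one element per article, and any further regex step could be chained the same way with another lapply() call.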
Thanks in advance
Adel