Dear Andrew,
I messed it up a little. The object was not a character vector but an
XML object (the original XML with the abstracts):
str(x)
List of 2
$ node:<externalptr>
$ doc :<externalptr>
- attr(*, "class")= chr [1:2] "xml_document" "xml_node"
I pasted the R code for a function, but it contained an error that stopped
the parsing of the function. The subsequent lines were still executed, however:
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found
The variable x was actually the XML object; my mistake. It still takes 1-2
minutes to produce the final error.
Is regexpr() first converting the XML object with as.character() (I have
not checked this)?
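The delay is consistent with R's lazy evaluation: an argument such as `patt` is only evaluated (its promise forced) when the function body first uses it. Base R's regexpr() appears to coerce a non-character `text` with as.character() before it passes as.character(pattern) to its internal code; this is worth confirming by printing `regexpr` at the console. The sketch below reproduces that ordering with a hypothetical class `fake_doc` (standing in for "xml_document") and a hypothetical wrapper `demo_regexpr()`; neither is the actual base R or xml2 code.

```r
# Records the order in which things happen.
calls <- character(0)

# Hypothetical S3 method: coercing a fake_doc logs the event.
# For a real xml_document this coercion step is the slow part.
as.character.fake_doc <- function(x, ...) {
  calls <<- c(calls, "as.character(text)")
  "coerced text"
}

# Hypothetical wrapper mimicking the assumed order inside regexpr().
demo_regexpr <- function(pattern, text) {
  if (!is.character(text)) text <- as.character(text)  # coercion runs first
  calls <<- c(calls, "about to force pattern")
  regexpr(pattern, text, perl = TRUE)  # the promise `pattern` is evaluated here
}

doc <- structure(list(), class = "fake_doc")
msg <- tryCatch(demo_regexpr(patt_undefined, doc),
                error = function(e) conditionMessage(e))
print(calls)  # the coercion is logged before the pattern is ever evaluated
print(msg)    # "object 'patt_undefined' not found"
```

With a plain character vector, as in Andrew's test, the coercion step is skipped, so the undefined pattern is reported almost immediately.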
It would make more sense to evaluate the regex pattern first.
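One way to get that fail-fast behaviour today is to evaluate the pattern explicitly before touching the text. The helper below, `regexpr_fail_fast()`, is hypothetical (not part of base R): it forces and sanity-checks the pattern, compiles it against an empty string, and refuses non-character input instead of silently coercing it. For an XML document, the text would first have to be extracted explicitly, e.g. with xml2::xml_text(); that step is an assumption about the reader's data and is not shown.

```r
# Hypothetical helper, not base R: validate the pattern before the text.
regexpr_fail_fast <- function(pattern, text, perl = TRUE, ...) {
  force(pattern)  # errors at once if the variable is undefined
  stopifnot(is.character(pattern), length(pattern) == 1L)
  regexpr(pattern, "", perl = perl, ...)  # compiles the regex on a tiny input
  if (!is.character(text)) {
    stop("`text` must be a character vector, not class '",
         class(text)[1L], "'")
  }
  regexpr(pattern, text, perl = perl, ...)
}

# Usage (assumes `abstracts` is an existing character vector):
# npos <- regexpr_fail_fast(patt, abstracts)
```

The design choice is simply to reorder the checks: the cheap pattern validation runs before any potentially expensive work on `text`.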
Sincerely,
Leonard
On 5/19/2022 3:26 AM, Andrew Simmons wrote:
Hello,
I tried this myself, something like:
dat <- utils::read.csv(
"https://raw.githubusercontent.com/discoleo/R/master/TextMining/Pubmed/Example_Abstracts_Title_Pubmed.csv",
check.names = FALSE
)
regexpr(patt, dat$Abstract, perl = TRUE)
regexpr(patt, dat$Title, perl = TRUE)
and I can't reproduce your issue. Mine raises the error that object 'patt'
does not exist within a second or less. I'm using R 4.2.0 and Windows 11,
though that shouldn't make a difference: if you look at Sys.info(), it
still reports Windows 10 with build version 22000. I don't really know
what else to say; have you tried it again since?
Regards,
Andrew Simmons
On Wed, May 18, 2022 at 5:09 PM Leonard Mada via R-help
<[email protected]> wrote:
Dear R Users,
I have run the following command in R:
# x = larger vector of strings (1200 Pubmed abstracts);
# patt = not defined;
npos = regexpr(patt, x, perl=TRUE);
# Error in regexpr(patt, x, perl = TRUE) : object 'patt' not found
The problem:
R becomes unresponsive, and it takes 1-2 minutes to return the error. The
operation completes almost instantaneously with a valid pattern.
Is there a reason for this behavior?
Tested with R 4.2.0 on MS Windows 10.
I have uploaded a set of 1200 PubMed abstracts to GitHub, if anyone
wants to check:
- see file: Example_Abstracts_Title_Pubmed.csv;
https://github.com/discoleo/R/tree/master/TextMining/Pubmed
The variable patt was not defined because of an earlier error, but it took
a very long time to exit the operation and report the error.
Many thanks,
Leonard
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.