[R] prevent XML::readHTMLTable from suppressing <br/>

Spencer Graves Fri, 24 Jul 2020 21:00:28 -0700

Hello, All:

Thanks to Rasmus Liland, William Michels, and Luke Tierney for their help with my earlier web scraping question. With their help, I've made progress. Sadly, I still have a problem: one field contains "<br/>", which gets suppressed by XML::readHTMLTable:

sosURL <- "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"

sosChars <- RCurl::getURL(sosURL)
MOcan <- XML::readHTMLTable(sosChars)
MOcan[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"


(Seneca <- regexpr('SENECA', sosChars))
substring(sosChars, Seneca-22, Seneca+14)


[1] "4476 FIVE MILE RD<br/>SENECA MO 64865"

How can I get essentially the same result but without having XML::readHTMLTable suppress "<br/>"?
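One possible workaround, sketched on a toy table rather than the live page: replace each "<br/>" in the raw HTML with a visible separator before parsing, so the line break survives as ordinary text. The helper name mark_breaks(), the ", " separator, and the toy HTML are all hypothetical and only for illustration; the real page may need a different pattern or separator.

```r
# Sketch: turn <br> tags into a visible separator before parsing.
# mark_breaks() and the toy HTML are hypothetical, for illustration only.
mark_breaks <- function(html, sep = ", ") {
  # Matches <br>, <br/>, and <br /> in any letter case
  gsub("<br\\s*/?>", sep, html, ignore.case = TRUE, perl = TRUE)
}

# Toy stand-in for the downloaded page (not the real sosChars)
toy <- paste0(
  "<html><body><table>",
  "<tr><th>Name</th><th>Address</th></tr>",
  "<tr><td>A CANDIDATE</td>",
  "<td>4476 FIVE MILE RD<br/>SENECA MO 64865</td></tr>",
  "</table></body></html>")

marked <- mark_breaks(toy)

# Parse only if the XML package is installed
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::htmlParse(marked, asText = TRUE)
  tab <- XML::readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
  print(tab[1, 2])  # the address cell, with the separator where <br/> was
}
```

On the real data the same idea would be mark_breaks(sosChars) followed by XML::readHTMLTable, after which the address can be split on the separator with strsplit().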

NOTE: I get something very similar with xml2::read_html and rvest::html_table:



sosPointers <- xml2::read_html(sosChars)
MOcan2 <- rvest::html_table(sosPointers)
MOcan2[[2]][1, 2]
[1] "4476 FIVE MILE RDSENECA MO 64865"

MOcan2 does not have names, and some of the fields are automatically converted to integers, which I think is not smart in this application.
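To illustrate why the automatic conversion can be a problem, here is a small sketch on made-up values. It assumes rvest >= 1.0.0, where html_table() accepts a convert argument; older versions do not have it. The toy table, the ZIP code "02134", and the list name "candidates" are hypothetical.

```r
# Sketch: automatic type conversion can silently change ZIP-like fields.
zips <- c("02134", "64865")                      # hypothetical values
converted <- utils::type.convert(zips, as.is = TRUE)
print(converted)                                 # integers; the leading zero is gone

# Toy stand-in for the parsed page (not the real sosChars)
toy <- paste0(
  "<html><body><table>",
  "<tr><th>Name</th><th>Zip</th></tr>",
  "<tr><td>A CANDIDATE</td><td>02134</td></tr>",
  "</table></body></html>")

# Assumes rvest >= 1.0.0, which has the `convert` argument
if (requireNamespace("rvest", quietly = TRUE) &&
    requireNamespace("xml2", quietly = TRUE)) {
  page <- xml2::read_html(toy)
  tabs <- rvest::html_table(page, convert = FALSE)  # keep columns as character
  names(tabs) <- "candidates"                       # hypothetical label for the list element
  str(tabs$candidates)
}
```

With convert = FALSE every column stays character, so any conversion to numbers can be done explicitly, column by column, where it makes sense.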



      Thanks,
      Spencer Graves

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
