On 2020-07-24 08:20, luke-tier...@uiowa.edu wrote:
Maybe try something like this:

url <- "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
h <- xml2::read_html(url)


Error in open.connection(x, "rb") : HTTP error 404.


      Thanks for the suggestion, but this failed for me on the platform described in "sessionInfo" below.


tbl <- rvest::html_table(h)


      As I previously noted, RCurl::getURL returned a single character string of roughly 218 KB, from which I've so far gotten most but not all of what I want.  Unfortunately, when I fed that character vector to rvest::html_table, I got:


Error in UseMethod("html_table") :
  no applicable method for 'html_table' applied to an object of class "character"
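
The error above arises because rvest::html_table has methods for parsed HTML documents and nodes, not for raw character strings. A minimal sketch of one possible workaround, assuming the xml2 and rvest packages are installed, is to parse the string with xml2::read_html first. The string html_txt below is a made-up stand-in for the text returned by RCurl::getURL, not the actual page:

```r
# Toy stand-in for the character string returned by RCurl::getURL
# (hypothetical content, not the real Missouri SOS page).
html_txt <- "<html><body><table>
<tr><th>Name</th><th>Party</th></tr>
<tr><td>A. Smith</td><td>X</td></tr>
<tr><td>B. Jones</td><td>Y</td></tr>
</table></body></html>"

# read_html() treats a string that contains markup as literal HTML
# rather than as a file path or URL.
doc <- xml2::read_html(html_txt)

# html_table() now dispatches on the parsed document and returns
# a list with one table per <table> element.
tbls <- rvest::html_table(doc)
length(tbls)
names(tbls[[1]])
```

Whether this gives cleaner tables than XML::readHTMLTable for the real page is untested here.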


      I don't know for sure yet, but I believe I'll be able to get what I want from the single character string using, e.g., gregexpr and other functions.
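
The gregexpr idea could look roughly like the sketch below. It uses base R only. The <h3> tags and the office names are assumptions made for illustration; the real page may mark offices differently, so the pattern would need adjusting:

```r
# Hypothetical fragment: each office heading is assumed to sit in an <h3> tag.
html_txt <- paste0(
  "<h3>Governor</h3><table><tr><td>A. Smith</td></tr></table>",
  "<h3>Auditor</h3><table><tr><td>B. Jones</td></tr></table>"
)

# Find every <h3>...</h3> span; ".*?" is a lazy match, so each hit stops
# at the first closing tag instead of running to the last one.
m <- gregexpr("<h3>.*?</h3>", html_txt, perl = TRUE)

# regmatches() pulls out the matched substrings (one list element per input string).
hits <- regmatches(html_txt, m)[[1]]

# Drop the opening and closing tags to leave only the heading text.
offices <- gsub("</?h3>", "", hits)
offices
```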


      Thanks again,
      Spencer Graves


Best,

luke

On Fri, 24 Jul 2020, Spencer Graves wrote:

Hi Bill et al.:


      That broke the dam:  It gave me a character vector of length 1 consisting of 218 KB.  I fed that to XML::readHTMLTable and purrr::map_chr, both of which returned lists of 337 data.frames. The former retained names for all the tables, absent from the latter.  The columns of the former are all character;  that's not true for the latter.


      Sadly, it's not quite what I want:  It's one table for each office-party combination, but it's lost the office designation. However, I'm confident I can figure out how to hack that.
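
One possible way to recover the office for each table, sketched under the assumption that every group of tables is preceded by a heading element, is an XPath lookup with xml2. The <h3> headings and the names below are hypothetical; the actual page structure has not been checked:

```r
# Toy page: two tables under one office heading, one table under another.
html_txt <- "<html><body>
<h3>Governor</h3>
<table><tr><th>Name</th></tr><tr><td>A. Smith</td></tr></table>
<table><tr><th>Name</th></tr><tr><td>B. Jones</td></tr></table>
<h3>Auditor</h3>
<table><tr><th>Name</th></tr><tr><td>C. Lee</td></tr></table>
</body></html>"

doc    <- xml2::read_html(html_txt)
tables <- xml2::xml_find_all(doc, "//table")

# For each table, "preceding::h3[1]" selects the nearest <h3> that
# appears before it in the document.
heads  <- xml2::xml_find_first(tables, "preceding::h3[1]")
office <- xml2::xml_text(heads)
office
```

The resulting vector has one entry per table, in document order, so it could be attached to the list of data.frames as names or as an extra column.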


      Thanks,
      Spencer Graves


On 2020-07-23 17:46, William Michels wrote:
Hi Spencer,

I tried the code below on an older R-installation, and it works fine.
Not a full solution, but it's a start:

library(RCurl)
Loading required package: bitops
url <- "https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"
M_sos <- getURL(url)
print(M_sos)
[1] "\r\n<!DOCTYPE html>\r\n\r\n<html
lang=\"en-us\">\r\n<head><title>\r\n\tSOS, Missouri - Elections:
Offices Filed in Candidate Filing\r\n</title><meta name=\"viewport\"
content=\"width=device-width, initial-scale=1.0\" [...remainder
truncated].

HTH, Bill.

W. Michels, Ph.D.



On Thu, Jul 23, 2020 at 2:55 PM Spencer Graves
<spencer.gra...@effectivedefense.org> wrote:
Hello, All:


        I've failed with multiple attempts to scrape the table of
candidates from the website of the Missouri Secretary of State:


https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975


        I've tried base::url, base::readLines, xml2::read_html, and
XML::readHTMLTable; see summary below.


        Suggestions?
        Thanks,
        Spencer Graves


sosURL <-
"https://s1.sos.mo.gov/CandidatesOnWeb/DisplayCandidatesPlacement.aspx?ElectionCode=750004975"

str(baseURL <- base::url(sosURL))
# this might give me something, but I don't know what

sosRead <- base::readLines(sosURL) # 404 Not Found
sosRb <- base::readLines(baseURL) # 404 Not Found

sosXml2 <- xml2::read_html(sosURL) # HTTP error 404.

sosXML <- XML::readHTMLTable(sosURL)
# List of 0;  does not seem to be XML

sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK:
/Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets
[6] methods   base

loaded via a namespace (and not attached):
[1] compiler_4.0.2 tools_4.0.2    curl_4.3
[4] xml2_1.3.2     XML_3.99-0.3

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
