This looks as though you need to be a little XML old-school.
readHTMLTable() is a summary function drawing on:
?htmlTreeParse(), which parses the page into an XML tree
?xpathApply(), which applies a function to the nodes matching an XPath query
and more.
# xpathApply(doc, "//td", function(x) xmlValue(x)) splits the document at
# the end of each table cell and extracts the value
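A minimal sketch of that old-school route, using the same URL that appears later in this thread (the "//td" XPath grabs every table cell on the page):
library(XML)
# Fetch the raw page, then parse it into an XML tree
mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
doc <- htmlTreeParse(paste(mylines, collapse = "\n"),
                     asText = TRUE, useInternalNodes = TRUE)
# Extract the text of every table cell via XPath
cellvals <- unlist(xpathApply(doc, "//td", xmlValue))
head(cellvals)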
On 4 Nov 2013 19:30, "David Winsemius" wrote:
> Maybe you should use their "download" facility rather than trying to
> deparse a complex webpage with lots of special user interaction "features":
>
> http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do
>
That web page depends on the user already ...
On Mon, 04 Nov 2013 20:26:46 +0100, David Winsemius wrote:
On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
Thanks.
I had already introduced these minor adjustments in the code, but the
real problem (to me) is the information that gets lost: the informative
name of the columns, the indicator type and the units.
On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
> Thanks.
> I had already introduced these minor adjustments in the code, but the real
> problem (to me) is the information that gets lost: the informative name of
> the columns, the indicator type and the units.
Maybe you should use their "download" facility rather than trying to
deparse a complex webpage with lots of special user interaction "features":
http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do
Thanks.
I had already introduced these minor adjustments in the code, but the real
problem (to me) is the information that gets lost: the informative name of
the columns, the indicator type and the units.
Cheers
Lorenzo
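One way to recover the headers that readHTMLTable() drops is to query the parsed tree directly. A sketch, reusing mylines and mytable from the code further down the thread, and assuming the table of interest is the second one on the page with its headers in <th> cells:
library(XML)
doc <- htmlParse(paste(mylines, collapse = "\n"), asText = TRUE)
# header cells of the second table; the index is an assumption about the page
hdr <- xpathSApply(doc, "(//table)[2]//th", xmlValue)
hdr <- gsub("^\\s+|\\s+$", "", hdr)   # trim surrounding whitespace
if (length(hdr) == ncol(mytable)) names(mytable) <- hdr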
On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas wrote:
Hello,
If you want to get rid of the (bp) stuff, you can use lapply/gsub. ...
Hello,
If you want to get rid of the (bp) stuff, you can use lapply/gsub. Using
Jean's code, slightly changed:
library(XML)
mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
mytable <- readHTMLTable(mylines, which = 2, asText=TRUE,
stringsAsFactors = FALSE)
str(mytable)
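A minimal sketch of the lapply/gsub step, assuming the flags are lower-case letters in parentheses such as "(p)" or "(bp)" appended to the values:
# Drop annotations like "(p)" or "(bp)" from every column of mytable
mytable[] <- lapply(mytable, function(col) gsub("\\s*\\([a-z]+\\)", "", col))
# Value columns can then be converted, e.g.:
# mytable[[2]] <- as.numeric(mytable[[2]])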
Hi Lorenzo,
Perhaps package "pxR" can help you out.
http://cran.at.r-project.org/web/packages/pxR/index.html
pxR: PC-Axis with R
The pxR package provides a set of functions for reading and writing PC-Axis
files, used by different statistical organizations around the globe for
data dissemination.
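For what it's worth, a minimal sketch of pxR in action, assuming a PC-Axis file has already been downloaded ("mydata.px" below is a hypothetical file name):
library(pxR)
px_obj <- read.px("mydata.px")   # "mydata.px" is a placeholder file name
df <- as.data.frame(px_obj)      # long format: dimension columns plus a value column
str(df)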
Hello,
And thanks a lot.
This is indeed very close to what I need.
I am trying to figure out how not to "lose" the headers and how to avoid
downloading labels like "(p)" together with the numerical data I am
interested in.
If anyone on the list knows how to make these minor modifications, s/he ...
Lorenzo,
You might want to post this as a new question to get some new eyes on it.
Or, you could try posting your question to http://stackoverflow.com/.
Scraping the web is a common topic for that group.
Jean
On Mon, Nov 4, 2013 at 3:53 AM, Lorenzo Isella wrote:
> Hello,
> And thanks a lot.
Lorenzo,
I may be able to help you get started. You can use the XML package to grab
the information off the internet.
library(XML)
mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
mylist <- readHTMLTable(mylines, asText=TRUE)
mytable <- mylist$xTable   # pick the table element by name
However, when I l...
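To pick the right element out of mylist, it helps to see what readHTMLTable() actually found; a quick sketch (the names and counts depend on the page):
length(mylist)          # how many tables were found on the page
names(mylist)           # their names, often taken from the tables' HTML ids
lapply(mylist, head, 2) # peek at the first rows of each candidate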
Dear All,
I often need to do some work on some data which is publicly available on
the EUROSTAT website.
I have seen several ways to automatically download (mainly the bulk) data
from EUROSTAT and post-process it later with R, for instance
http://bit.ly/HrDICj
http://bit.ly/HrDL10
http://bit.ly/HrD
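For what it's worth, the bulk files are typically gzipped tab-separated text, so once the direct URL of a dataset is known, reading it looks roughly like this (the URL below is a placeholder, not a real dataset):
# hypothetical URL -- substitute the real bulk-download link for your dataset
myurl <- "http://example.com/eurostat/tps00001.tsv.gz"
tmp <- tempfile(fileext = ".tsv.gz")
download.file(myurl, tmp, mode = "wb")
dat <- read.delim(gzfile(tmp), check.names = FALSE, stringsAsFactors = FALSE)
str(dat)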