On 4 Nov 2013 19:30, David Winsemius dwinsem...@comcast.net wrote:
Maybe you should use their download facility rather than trying to
deparse a complex webpage with lots of special user interaction features:
http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do
That web page depends on
This looks as though you need to be a little XML old-school.
readHTMLTable is a summary function drawing on:
?htmlTreeParse() turns the table into xml
?xpathApply()
and more.
# xpathApply(doc, "//td", function(x) xmlValue(x)) breaks each line at
# the end of a table cell and extracts the value
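Putting David's two building blocks together, a minimal sketch of the "old-school" approach might look like this (it reuses the shortened Eurostat URL that appears later in the thread; fetching it requires a live network connection):

```r
library(XML)

# Parse the HTML page into an internal XML tree
doc <- htmlTreeParse("http://bit.ly/1coCohq", useInternalNodes = TRUE)

# Pull the text out of every table cell on the page
cells <- xpathApply(doc, "//td", function(x) xmlValue(x))

head(unlist(cells))
```

From there one can reassemble the cells into a data frame by hand, which is exactly the fine-grained control readHTMLTable hides.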
Lorenzo,
You might want to post this as a new question to get some new eyes on it.
Or, you could try posting your question to http://stackoverflow.com/.
Scraping the web is a common topic for that group.
Jean
On Mon, Nov 4, 2013 at 3:53 AM, Lorenzo Isella lorenzo.ise...@gmail.com wrote:
Hello,
And thanks a lot.
This is indeed very close to what I need.
I am trying to figure out how not to lose the headers and how to avoid
downloading labels like (p) together with the numerical data I am
interested in.
If anyone on the list knows how to make these minor modifications, s/he
Hi Lorenzo,
Perhaps package pxR can help you out.
http://cran.at.r-project.org/web/packages/pxR/index.html
pxR: PC-Axis with R
The pxR package provides a set of functions for reading and writing PC-Axis
files, used by different statistical organizations around the globe for
data dissemination.
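If the EUROSTAT data is fetched as a PC-Axis file instead of scraped HTML, the pxR workflow is roughly as follows (a sketch: "mydata.px" is a placeholder filename, not a file from this thread):

```r
library(pxR)

# Parse the PC-Axis file into a px object
px_obj <- read.px("mydata.px")

# Convert it to a long-format data frame; the dimension labels
# (indicator, unit, geo, time) come along as columns
df <- as.data.frame(px_obj)

str(df)
```

This sidesteps the lost-header problem entirely, since PC-Axis files carry their metadata with them.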
Hello,
If you want to get rid of the (bp) stuff, you can use lapply/gsub. Using
Jean's code, slightly changed,
library(XML)
mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
mytable <- readHTMLTable(mylines, which = 2, asText = TRUE,
                         stringsAsFactors = FALSE)
str(mytable)
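The lapply/gsub step Rui alludes to could be sketched like this on a toy stand-in for the scraped table (the column names and flag values are invented for illustration):

```r
# Toy stand-in for the scraped table: character columns carrying
# trailing flag labels such as "(p)" or "(bp)"
mytable <- data.frame(geo   = c("AT", "BE"),
                      v2012 = c("1.5 (p)", "2.3 (bp)"),
                      stringsAsFactors = FALSE)

# Strip the flag labels from every column except the first,
# then convert the cleaned strings to numeric
mytable[-1] <- lapply(mytable[-1], function(x)
  as.numeric(gsub("\\s*\\([a-z]+\\)", "", x)))

str(mytable)
```

The same two lines apply unchanged to the real table returned by readHTMLTable.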
Thanks.
I had already introduced these minor adjustments in the code, but the real
problem (to me) is the information that gets lost: the informative name of
the columns, the indicator type and the units.
Cheers
Lorenzo
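One way to probe the lost-metadata problem Lorenzo describes is to keep the header rows and inspect every table readHTMLTable finds before picking one. This is a sketch, untested against the live page, reusing the shortened URL from Jean's message:

```r
library(XML)

mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()

# Ask for header rows explicitly and keep all tables on the page
mylist <- readHTMLTable(mylines, asText = TRUE, header = TRUE,
                        stringsAsFactors = FALSE)

names(mylist)         # names of every table found on the page
lapply(mylist, head)  # peek at each; units/indicators may sit in a
                      # separate small table rather than the data table
</code>
```

If the column names and units live in a separate header table, they can then be reattached to the data table by hand.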
On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas ruipbarra...@sapo.pt wrote:
On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
Thanks.
I had already introduced these minor adjustments in the code, but the real
problem (to me) is the information that gets lost: the informative name of
the columns, the indicator type and the units.
Maybe you should use their
On Mon, 04 Nov 2013 20:26:46 +0100, David Winsemius
dwinsem...@comcast.net wrote:
On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
Thanks.
I had already introduced these minor adjustments in the code, but the
real problem (to me) is the information that gets lost: the informative
name
Lorenzo,
I may be able to help you get started. You can use the XML package to grab
the information off the internet.
library(XML)
mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
mylist <- readHTMLTable(mylines, asText = TRUE)
mytable <- mylist$xTable  # pick the table named "xTable" from the list
However, when I look
Dear All,
I often need to do some work on some data which is publicly available on
the EUROSTAT website.
I have seen several ways to automatically download (mainly) the bulk data
from EUROSTAT and later postprocess it with R, for instance
http://bit.ly/HrDICj
http://bit.ly/HrDL10