On 4 Nov 2013 19:30, "David Winsemius" <dwinsem...@comcast.net> wrote:

> Maybe you should use their "download" facility rather than trying to
> deparse a complex webpage with lots of special user interaction "features":
>
> http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do
>

That web page depends on the user already having been to the previous page
to set up a session and so directly downloading a dataset requires setting
up cookies and making sure the request has all the right parameters. Looks
like a right pain.

--
> David.
> >
>
> On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:
>
> > Thanks.
> > I had already introduced these minor adjustments in the code, but the
> > real problem (to me) is the information that gets lost: the informative
> > names of the columns, the indicator type and the units.
>
> > Cheers
> >
> > Lorenzo
> >
> > On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> >
> >> Hello,
> >>
> >> If you want to get rid of the (bp) stuff, you can use lapply/gsub.
> >> Using a slightly changed version of Jean's code,
> >>
> >> library(XML)
> >>
> >> mylines <- readLines(url("http://bit.ly/1coCohq"))
> >> closeAllConnections()
> >> mytable <- readHTMLTable(mylines, which = 2, asText = TRUE,
> >>                         stringsAsFactors = FALSE)
> >>
> >> str(mytable)
> >>
> >> mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
> >> mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
> >> mytable[] <- lapply(mytable, as.numeric)
> >>
> >> colnames(mytable) <- 2000:2013
> >>
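
A minimal sketch of the same lapply/gsub cleanup that keeps the column headers and the label column instead of overwriting them. The data frame `raw`, its column names and its values are made up for illustration, and treating ":" as a "not available" marker is an assumption about the page, not something taken from the code above.

```r
# Toy stand-in for the scraped table; values and column names are invented.
raw <- data.frame(
  geo    = c("Belgium", "Italy"),
  `2012` = c("1,234.5", "987.0(p)"),
  `2013` = c("1,300.1(b)", ":"),
  check.names = FALSE,          # keep "2012"/"2013" as they are
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip flags and separators, then convert to numeric.
clean_values <- function(x) {
  x <- gsub("\\([a-z]+\\)", "", x)      # drop flags such as (p) or (b)
  x <- gsub(",", "", x, fixed = TRUE)   # drop thousands separators
  x[trimws(x) == ":"] <- NA             # assumed "not available" marker
  as.numeric(x)
}

# Only the value columns are converted; the label column is left untouched.
value_cols <- setdiff(names(raw), "geo")
cleaned <- raw
cleaned[value_cols] <- lapply(raw[value_cols], clean_values)
str(cleaned)
```

Restricting the conversion to `value_cols` is what preserves the row labels; assigning through `cleaned[value_cols]` keeps the existing column names.
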
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >> Em 04-11-2013 09:53, Lorenzo Isella escreveu:
> >>> Hello,
> >>> And thanks a lot.
> >>> This is indeed very close to what I need.
> >>> I am trying to figure out how not to "lose" the headers and how to avoid
> >>> downloading labels like "(p)" together with the numerical data I am
> >>> interested in.
> >>> If anyone on the list knows how to make these minor modifications, s/he
> >>> will make my life much easier.
> >>> Cheers
> >>>
> >>> Lorenzo
> >>>
> >>>
> >>> On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean <jvad...@usgs.gov> wrote:
> >>>
> >>>> Lorenzo,
> >>>>
> >>>> I may be able to help you get started.  You can use the XML package to
> >>>> grab the information off the internet.
> >>>>
> >>>> library(XML)
> >>>>
> >>>> mylines <- readLines(url("http://bit.ly/1coCohq"))
> >>>> closeAllConnections()
> >>>> mylist <- readHTMLTable(mylines, asText=TRUE)
> >>>> mytable <- mylist$xTable
> >>>>
> >>>> However, when I look at the resulting object, mytable, it doesn't have
> >>>> informative row or column headings.  Perhaps someone else can figure
> >>>> out how to get that information.
> >>>>
> >>>> Jean
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
> >>>> <lorenzo.ise...@gmail.com> wrote:
> >>>>> Dear All,
> >>>>> I often need to do some work on some data which is publicly available
> >>>>> on the EUROSTAT website.
> >>>>> I saw several ways to download automatically mainly the bulk data
> >>>>> from EUROSTAT to later on postprocess it with R, for instance
> >>>>>
> >>>>> http://bit.ly/HrDICj
> >>>>> http://bit.ly/HrDL10
> >>>>> http://bit.ly/HrDTgT
> >>>>>
> >>>>> However, what I would like to do is to be able to download directly
> >>>>> the csv file corresponding to a properly formatted dataset
> >>>>> (typically a dynamic dataset) from EUROSTAT.
> >>>>> To fix the ideas, please consider the dataset at the following link
> >>>>>
> >>>>> http://bit.ly/1coCohq
> >>>>>
> >>>>> what I would like to do is to automatically read its content into R,
> >>>>> or at least to automatically download it as a csv file (full
> >>>>> extraction, single file, no flags and footnotes) which I can then
> >>>>> manipulate easily.
> >>>>> Any suggestion is appreciated.
> >>>>> Cheers
> >>>>>
> >>>>> Lorenzo
> >>>>>
> >>>>> ______________________________________________
> >>>>> R-help@r-project.org mailing list
> >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>> PLEASE do read the posting guide
> >>>>> http://www.R-project.org/posting-guide.html
> >>>>> and provide commented, minimal, self-contained, reproducible code.
> >
>
> David Winsemius
> Alameda, CA, USA
>
