Re: [R] Download CSV Files from EUROSTAT Website

2013-11-05 Thread Barry Rowlingson
On 4 Nov 2013 19:30, David Winsemius dwinsem...@comcast.net wrote:

 Maybe you should use their download facility rather than trying to
deparse a complex webpage with lots of special user interaction features:

 http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do


That web page depends on the user having already visited the previous page
to set up a session, so downloading a dataset directly requires setting
up cookies and making sure the request has all the right parameters. Looks
like a right pain.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Download CSV Files from EUROSTAT Website

2013-11-05 Thread Paul Bivand
This looks as though you need to go a little XML old-school.
readHTMLTable is a higher-level function that draws on:

?htmlTreeParse (parses the HTML page into an XML tree)
?xpathApply
and more.

# xpathApply(doc, "//td", xmlValue) returns one element per table cell,
# holding that cell's value

# "//th" picks out the table headings, without distinguishing between
# row headings and column headings

This is followed by various gsub() calls and turning the result into a
matrix, as it comes out as a flat list of values with no column
structure. I couldn't identify the headings, but the table body is
definitely doable.

readHTMLTable seems to assume that the column headings form a single
row, which isn't always the case.
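A minimal sketch of the approach described above, using htmlParse(), xpathSApply() (the simplifying variant of xpathApply()), getNodeSet() and xmlValue() from the XML package. The small HTML table is a made-up stand-in for the EUROSTAT page: its id, labels, flags and values are hypothetical. The XPath expressions also assume that each body row starts with a <th> row label and that the last header-only row holds the column labels, which may not match the real page.

```r
library(XML)

# Hypothetical stand-in for the EUROSTAT page: the header spans two rows
html <- '<table id="xTable">
  <tr><th></th><th colspan="2">Year</th></tr>
  <tr><th>geo</th><th>2012</th><th>2013</th></tr>
  <tr><th>BE</th><td>1,234 (p)</td><td>1,301</td></tr>
  <tr><th>DE</th><td>9,876</td><td>10,002 (b)</td></tr>
</table>'

doc <- htmlParse(html, asText = TRUE)

# One string per <td> cell, in document order (row by row)
cells <- xpathSApply(doc, "//td", xmlValue)

# Strip flags such as "(p)", then the commas, then convert to numbers
vals <- as.numeric(gsub(",", "", gsub("\\s*\\([^)]*\\)", "", cells)))

# Number of data cells per body row, taken from the first row that has <td>s
ncols <- length(getNodeSet(doc, "//tr[td][1]/td"))
body <- matrix(vals, ncol = ncols, byrow = TRUE)

# Row labels: the <th> that starts each body row
rownames(body) <- xpathSApply(doc, "//tr[td]/th", xmlValue)

# Column labels: the last row made only of <th> cells, minus its corner cell
hdr <- xpathSApply(doc, "//tr[not(td)][last()]/th", xmlValue)
colnames(body) <- hdr[-1]

body
```

Selecting header rows with XPath, rather than relying on readHTMLTable's header guess, is what lets a multi-row header be handled; the upper "Year" row is simply ignored here.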

Paul Bivand


On 5 November 2013 18:44, Barry Rowlingson b.rowling...@lancaster.ac.uk wrote:
 On 4 Nov 2013 19:30, David Winsemius dwinsem...@comcast.net wrote:

 Maybe you should use their download facility rather than trying to
 deparse a complex webpage with lots of special user interaction features:

 http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do


 That web page depends on the user having already visited the previous page
 to set up a session, so downloading a dataset directly requires setting
 up cookies and making sure the request has all the right parameters. Looks
 like a right pain.


Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread Adams, Jean
Lorenzo,

You might want to post this as a new question to get some new eyes on it.

Or, you could try posting your question to http://stackoverflow.com/.
Scraping the web is a common topic for that group.

Jean


On Mon, Nov 4, 2013 at 3:53 AM, Lorenzo Isella lorenzo.ise...@gmail.com wrote:

  Hello,
 And thanks a lot.
 This is indeed very close to what I need.
 I am trying to figure out how to keep the headers and how to avoid
 downloading labels like (p) together with the numerical data I am
 interested in.
 If anyone on the list knows how to make these minor modifications, s/he
 will make my life much easier.
 Cheers

 Lorenzo




Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread Lorenzo Isella

Hello,
And thanks a lot.
This is indeed very close to what I need.
I am trying to figure out how to keep the headers and how to avoid
downloading labels like (p) together with the numerical data I am
interested in.
If anyone on the list knows how to make these minor modifications, s/he
will make my life much easier.

Cheers

Lorenzo


On Fri, 01 Nov 2013 14:25:49 +0100, Adams, Jean jvad...@usgs.gov wrote:


Lorenzo,

I may be able to help you get started.  You can use the XML package to  
grab the information off the internet.


library(XML)

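mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()  # destroy the url() connection (and any others) left in the session
mylist <- readHTMLTable(mylines, asText = TRUE)
# readHTMLTable() returns a list of data frames, named after each table's id where present
mytable <- mylist$xTable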


However, when I look at the resulting object, mytable, it doesn't have  
informative row or column headings.  Perhaps someone else can figure  
out how to get that information.


Jean







Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread Carlos Ortega
Hi Lorenzo,

Perhaps the pxR package can help you out.

http://cran.at.r-project.org/web/packages/pxR/index.html
pxR: PC-Axis with R

The pxR package provides a set of functions for reading and writing PC-Axis
files, used by different statistical organizations around the globe for
data dissemination.

Regards,
Carlos Ortega.


2013/11/4 Adams, Jean jvad...@usgs.gov

 Lorenzo,

 You might want to post this as a new question to get some new eyes on it.

 Or, you could try posting your question to http://stackoverflow.com/.
 Scraping the web is a common topic for that group.

 Jean


-- 
Regards,
Carlos Ortega
www.qualityexcellence.es



Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread Rui Barradas

Hello,

If you want to get rid of the flags such as (bp), you can use lapply/gsub.
Using Jean's code, slightly changed:


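library(XML)

mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
# which = 2: return only the second table on the page, as a data frame
mytable <- readHTMLTable(mylines, which = 2, asText = TRUE,
                         stringsAsFactors = FALSE)

str(mytable)

# Remove flags such as "(p)" or "(bp)" from every column
mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
# Remove the commas (e.g. thousands separators)
mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
mytable[] <- lapply(mytable, as.numeric)

colnames(mytable) <- 2000:2013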


Hope this helps,

Rui Barradas
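A small self-contained sketch of the same lapply/gsub cleaning step, run on a made-up data frame instead of the EUROSTAT page (the column names, values and the ":" placeholder are hypothetical). One deliberate change from Rui's pattern: "\\s*\\([^)]*\\)" stops at the first closing parenthesis, so a cell carrying two flags cannot lose the text between them, as the greedy "\\(.*\\)" could. The helper name clean_cells() is also hypothetical.

```r
# Hypothetical table, as it might come out of readHTMLTable(..., stringsAsFactors = FALSE)
raw <- data.frame(`2012` = c("1,234 (p)", "987"),
                  `2013` = c("1,301 (bp)", ":"),
                  check.names = FALSE, stringsAsFactors = FALSE)

clean_cells <- function(x) {
  x <- gsub("\\s*\\([^)]*\\)", "", x)  # drop flags such as "(p)" or "(bp)"
  x <- gsub(",", "", x)                # drop commas (e.g. thousands separators)
  suppressWarnings(as.numeric(x))      # anything non-numeric, like ":", becomes NA
}

cleaned <- raw
cleaned[] <- lapply(raw, clean_cells)  # assigning into cleaned[] keeps the data.frame class
str(cleaned)
```

The flags are simply discarded here, so keep a copy of the raw table if they matter; as Lorenzo notes, this step does not recover the indicator or unit information either.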

On 04-11-2013 09:53, Lorenzo Isella wrote:

Hello,
And thanks a lot.
This is indeed very close to what I need.
I am trying to figure out how to keep the headers and how to avoid
downloading labels like (p) together with the numerical data I am
interested in.
If anyone on the list knows how to make these minor modifications, s/he
will make my life much easier.
Cheers

Lorenzo




Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread Lorenzo Isella

Thanks.
I had already introduced these minor adjustments in the code, but the real
problem (to me) is the information that gets lost: the informative column
names, the indicator type and the units.

Cheers

Lorenzo

On Mon, 04 Nov 2013 19:52:51 +0100, Rui Barradas ruipbarra...@sapo.pt  
wrote:



Hello,

If you want to get rid of the flags such as (bp), you can use lapply/gsub.
Using Jean's code, slightly changed:


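library(XML)

mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()
# which = 2: return only the second table on the page, as a data frame
mytable <- readHTMLTable(mylines, which = 2, asText = TRUE,
                         stringsAsFactors = FALSE)

str(mytable)

# Remove flags such as "(p)" or "(bp)" from every column
mytable[] <- lapply(mytable, function(x) gsub("\\(.*\\)", "", x))
# Remove the commas (e.g. thousands separators)
mytable[] <- lapply(mytable, function(x) gsub(",", "", x))
mytable[] <- lapply(mytable, as.numeric)

colnames(mytable) <- 2000:2013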


Hope this helps,

Rui Barradas



Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread David Winsemius

On Nov 4, 2013, at 11:03 AM, Lorenzo Isella wrote:

 Thanks.
 I had already introduced these minor adjustments in the code, but the real
 problem (to me) is the information that gets lost: the informative column
 names, the indicator type and the units.

Maybe you should use their download facility rather than trying to deparse a 
complex webpage with lots of special user interaction features:

http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do

-- 
David.


David Winsemius
Alameda, CA, USA



Re: [R] Download CSV Files from EUROSTAT Website

2013-11-04 Thread Lorenzo Isella
On Mon, 04 Nov 2013 20:26:46 +0100, David Winsemius  
dwinsem...@comcast.net wrote:




Maybe you should use their download facility rather than trying to  
deparse a complex webpage with lots of special user interaction  
features:


http://appsso.eurostat.ec.europa.eu/nui/setupDownloads.do




Of course, for a single data set, I agree.
In my case, I need to download and analyze several tens of data sets, and I
need to do this at regular time intervals, hence the need to automate the
download as well.

Cheers

Lorenzo



Re: [R] Download CSV Files from EUROSTAT Website

2013-11-01 Thread Adams, Jean
Lorenzo,

I may be able to help you get started.  You can use the XML package to grab
the information off the internet.

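library(XML)

mylines <- readLines(url("http://bit.ly/1coCohq"))
closeAllConnections()  # destroy the url() connection (and any others) left in the session
mylist <- readHTMLTable(mylines, asText = TRUE)
# readHTMLTable() returns a list of data frames, named after each table's id where present
mytable <- mylist$xTable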

However, when I look at the resulting object, mytable, it doesn't have
informative row or column headings.  Perhaps someone else can figure out
how to get that information.

Jean
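A minimal sketch, not taken from the thread, of reading the page without closeAllConnections(), which closes every connection in the R session rather than only the one opened here. The helper name read_page() is hypothetical. readLines() opens an unopened url() connection for the duration of the call but does not destroy it, so the sketch closes it explicitly via on.exit(). Passing the URL string straight to readLines() has a similar effect, since readLines() then opens and closes the connection itself.

```r
# Hypothetical helper: read all lines from a URL and close only that connection
read_page <- function(u) {
  con <- url(u)
  on.exit(close(con))  # runs even if readLines() fails
  readLines(con, warn = FALSE)
}

# Usage with the shortened link from the thread (it may no longer resolve):
# mylines <- read_page("http://bit.ly/1coCohq")
```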





On Thu, Oct 31, 2013 at 10:38 AM, Lorenzo Isella
lorenzo.ise...@gmail.com wrote:

 Dear All,
 I often need to work on data that is publicly available on
 the EUROSTAT website.
 I have seen several ways to automatically download data from EUROSTAT,
 mainly the bulk data, for later post-processing in R, for instance

 http://bit.ly/HrDICj
 http://bit.ly/HrDL10
 http://bit.ly/HrDTgT

 However, what I would like to do is to download directly the
 csv file corresponding to a properly formatted dataset (typically a dynamic
 dataset) from EUROSTAT.
 To make things concrete, please consider the dataset at the following link

 http://bit.ly/1coCohq

 What I would like to do is to automatically read its content into R, or at
 least to automatically download it as a csv file (full extraction, single
 file, no flags and footnotes) which I can then manipulate easily.
 Any suggestion is appreciated.
 Cheers

 Lorenzo
