Thanks, Duncan, for your input. However, I could not install the package "RHTMLForms"; R reports that it is not available:
> install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")
Warning in install.packages("RHTMLForms", repos = "http://www.omegahat.org/R") :
  argument 'lib' is missing: using 'C:\Users\Arrun's\Documents/R/win-library/2.9'
Warning message:
In getDependencies(pkgs, dependencies, available, lib) :
  package ‘RHTMLForms’ is not available

I found this package on the web: http://www.omegahat.org/RHTMLForms/
However, it is a .gz file, which I could not use as I am a Windows user. Can you please point me to an alternate source?

Thanks,

Duncan Temple Lang wrote:
>
> Bogaso wrote:
>> Thank you so much for the help. However, I need a little more help. On the site
>> "http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php",
>> if I scroll down there is an option "Historical CPI Index For USA".
>> If I then click on "Get Data", another table pops up, without any
>> significant change in the address bar. This table holds more data, starting
>> from 1999. Can you please help me get the values of this table?
>>
>
> Hi again
>
> Well, this is a little bit more involved, as this is an HTML form
> and so we need to be able to emulate submitting a form with
> values for the different parameters the form expects, along with
> ensuring they are correct inputs. Ordinarily, this would involve
> looking at the source of the HTML document, finding the relevant
> <form> element, getting its action attribute, and all its inputs
> and figuring out the possible inputs. This is "straightforward"
> but involved. But we have an R package that does this reasonably
> well in an automated form. This is the RHTMLForms package from the
> www.omegahat.org/R repository.
>
> We can install it with
>
> install.packages("RHTMLForms", repos = "http://www.omegahat.org/R")
>
> Then
>
> library(RHTMLForms)
>
> ff = getHTMLFormDescription("http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php")
>
> # The form we want is the third one.
> # We can determine this
> # from the names of the parameters.
> # So we request that this form description be turned into an R function
>
> g = createFunction(ff[[3]])
>
> # Now we call this.
> xx = g("2001", "2008")
>
> # This returns the content of an HTML document
> # so we parse it and then pass this to readHTMLTable()
> # This is why we have methods for
>
> library(XML)
> doc = htmlParse(xx, asText = TRUE)
> tbls = readHTMLTable(doc)
>
> # we want the last of the tables.
> tbls[[length(tbls)]]
>
> So hopefully that helps solve your problem and introduces another Omegahat package that
> we hope people find through Google. The RHTMLForms package is an approach to the
> poor man's Web services - HTML forms - rather than REST and SOAP, which are becoming
> more relevant each day. The RCurl and SSOAP packages address the latter.
>
> D.
>
>> Thanks
>>
>> Duncan Temple Lang wrote:
>>>
>>> Thanks for explaining this, Charlie.
>>>
>>> Just for completeness and to make things a little easier,
>>> the XML package has a function named readHTMLTable()
>>> and you can call it with a URL and it will attempt
>>> to read all the tables in the page.
>>>
>>> tbls = readHTMLTable('http://www.rateinflation.com/consumer-price-index/usa-cpi.php')
>>>
>>> yields a list with 10 elements, and the table of interest with the data is
>>> the 10th one.
>>>
>>> tbls[[10]]
>>>
>>> The function does the XPath voodoo and sapply() work for you and uses some
>>> heuristics.
>>> There are various controls one can specify and also various methods for working
>>> with sub-parts of the HTML document directly.
>>>
>>> D.
>>>
>>> cls59 wrote:
>>>>
>>>> Bogaso wrote:
>>>>> Hi all,
>>>>>
>>>>> I want to download data from these two different sources directly into R:
>>>>>
>>>>> http://www.rateinflation.com/consumer-price-index/usa-cpi.php
>>>>> http://eaindustry.nic.in/asp2/list_d.asp
>>>>>
>>>>> The first one is the CPI of the US and the second one is the WPI of India.
>>>>> Can anyone please give me
>>>>> any clue how to download them directly into R? I want to make them zoo
>>>>> objects for further analysis.
>>>>>
>>>>> Thanks,
>>>>>
>>>> The following site did not load for me:
>>>>
>>>> http://eaindustry.nic.in/asp2/list_d.asp
>>>>
>>>> But I was able to extract the table from the US CPI site using Duncan Temple
>>>> Lang's XML package:
>>>>
>>>> library(XML)
>>>>
>>>> First, download the website into R:
>>>>
>>>> html.raw <- readLines( 'http://www.rateinflation.com/consumer-price-index/usa-cpi.php' )
>>>>
>>>> Then, convert to an HTML object using the XML package:
>>>>
>>>> html.data <- htmlTreeParse( html.raw, asText = TRUE, useInternalNodes = TRUE )
>>>>
>>>> A quick scan of the page source in the browser reveals that the table you
>>>> want is encased in a div with a class of "dynamicContent" -- we will use an
>>>> XPath specification[1] to retrieve all rows in that table:
>>>>
>>>> table.html <- getNodeSet( html.data, '//div[@class="dynamicContent"]/table/tr' )
>>>>
>>>> Now, the data values can be extracted from the cells in the rows using a
>>>> little sapply and xpathSApply voodoo:
>>>>
>>>> table.data <- t( sapply( table.html, function( row ){
>>>>
>>>>   row.data <- xpathSApply( row, './td', xmlValue )
>>>>   return( row.data )
>>>>
>>>> }))
>>>>
>>>> Good luck!
>>>>
>>>> -Charlie
>>>>
>>>> [1]: http://www.w3schools.com/XPath/xpath_syntax.asp
>>>>
>>>> -----
>>>> Charlie Sharpsteen
>>>> Undergraduate
>>>> Environmental Resources Engineering
>>>> Humboldt State University
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
--
View this message in context: http://www.nabble.com/Downloading-data-from-from-internet-tp25568930p25622550.html
Sent from the R help mailing list archive at Nabble.com.
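
For reference, the manual route described in the quoted reply (find the <form> element, read its action attribute, list its inputs and their possible values) might look roughly like the sketch below. It uses only the XML package and is not how RHTMLForms itself works. The inline HTML, the form actions and the field names "from" and "to" are made up for illustration, so that the code runs without a network connection; the real page will differ.

```r
library(XML)

# Hypothetical stand-in for the real page (assumption: two forms,
# the second one carrying the year selectors).
html <- '<html><body>
<form action="/search.php" method="get">
  <input type="text" name="q"/>
</form>
<form action="/cpi.php" method="post">
  <select name="from">
    <option value="1999">1999</option>
    <option value="2000">2000</option>
  </select>
  <select name="to">
    <option value="2008">2008</option>
  </select>
  <input type="submit" name="submit" value="Get Data"/>
</form>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# All <form> elements in the document.
forms <- getNodeSet(doc, "//form")

# For each form, collect its action, its method and the names of its fields.
form.info <- lapply(forms, function(f) {
  list(action = xmlGetAttr(f, "action"),
       method = xmlGetAttr(f, "method", "get"),
       inputs = xpathSApply(f, ".//input[@name] | .//select[@name]",
                            xmlGetAttr, "name"))
})

# Values the (hypothetical) "from" selector accepts.
from.values <- xpathSApply(forms[[2]], ".//select[@name='from']/option",
                           xmlGetAttr, "value")
```

Inspecting form.info shows which form has the parameters of interest; submitting those values to the form's action URL is the step that a package such as RHTMLForms automates.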