Hi, no need for RCurl - this should suffice:
require(XML)
input = "panthera-uncia"
h <- htmlParse(paste("http://api.iucnredlist.org/go/", input, sep = ""))
(status <- xpathSApply(h, '//div[@id="red_list_category_code"]', xmlValue))
[1] "EN"

Many thanks for pointing out the IUCN API, Eduard - it is awesome!

Best,
Kay

2012/6/27 Eduard Szöcs <szoe8...@uni-landau.de>

> Hi Augusto,
>
> regarding question #3:
> You could use the Red List API with the RCurl and XML packages.
> Here is an example:
>
> > require(RCurl)
> > require(XML)
> > get_IUCN_status <- function(x) {
> + spec <- tolower(x)
> + spec <- gsub(" ", "-", spec)
> + url <- paste("http://api.iucnredlist.org/go/", spec, sep="")
> + get <- getURL(url, followlocation = TRUE)
> + h <- htmlParse(get)
> + status <- xpathSApply(h, '//div[@id ="red_list_category_code"]', xmlValue)
> + return(status)
> + }
> >
> > get_IUCN_status("Panthera uncia")
> [1] "EN"
>
> For more resources, just type 'webscraping R' into your favourite search
> engine.
>
> HTH,
>
> Eduard
>
>
> On 26/06/12 20:57, Augusto Ribas wrote:
>
>> Hello.
>> I'm having problems with a web-scraping function.
>> I have a long list like this example:
>>
>> data<-data.frame(especie=c("Rana pipiens","Rana vaillanti","Ctenosaura similis","Bos taurus"),group=c("sapo","sapo","reptil","mamifero"))
>>
>> Some of the species names are out of date, so I am trying to write a
>> function that checks the Catalogue of Life (http://www.catalogueoflife.org/)
>> and gets the current names.
>> This has some problems, such as species names that have been split, but
>> it helps as a first check.
>>
>> So I wrote this function to scrape the data.
>> It's simple: it searches the site by building a link from the keywords,
>> then follows the first link in the list of results and extracts the
>> accepted name and author, returning the results as a list.
>> For example:
>>
>> > sp.check("Rana pipiens")
>> $sp.aceito
>> [1] "Lithobates pipiens"
>>
>> $autor
>> [1] "Schreber, 1782"
>>
>> But sometimes the function cannot access the internet and gives an error.
>>
>> I wrote this function by adapting some examples from forums, but I
>> have some questions:
>>
>> 01) How do I suppress the readLines() warnings?
>>
>> 02) How can I make the function try again when it cannot access the
>> internet, or just print something like "Can't access internet", so
>> that a loop like this one does not stop?
>>
>> data$check<-NA
>> for(i in 1:nrow(data)) {
>>   data$check[i]<-sp.check(data$especie[i])
>> }
>>
>> My example list is short, but with 500 or more rows the loop usually
>> stops in the middle.
>>
>> 03) Does anyone have an example of how to scrape the status of species
>> from http://www.iucnredlist.org/, given that the site does not use the
>> keyword in the link? Is there any simple tutorial for someone without
>> any background in programming or computer science?
>>
>> Thanks for your attention.
>>
>> # function sp.check
>>
>> sp.check<-function(especie) {
>>   # split species name
>>   especie<-as.character(especie)
>>
>>   gen<-strsplit(especie,"\\ ")[[1]][1]
>>   esp<-strsplit(especie,"\\ ")[[1]][2]
>>
>>   # build the first link
>>   link<-paste("http://www.catalogueoflife.org/col/search/all/key/",gen,"+",esp,"/match/1",sep="")
>>   link <- iconv(link, 'latin1', 'UTF-8')
>>   Encoding(link) <- 'bytes'
>>
>>   # read the table of results
>>   pagina <- readLines(url(link))
>>
>>   n.linhas<-which(pagina%in%" <td class=\"field_header_black\">")
>>
>>   # are there any results?
>>   if(length(n.linhas)>0) {
>>
>>     pag.sp<-strsplit(pagina[n.linhas[1]+1],'\\"')[[1]][2]
>>
>>     # second link
>>     link2 <- paste("http://www.catalogueoflife.org",pag.sp,sep="")
>>     link2 <- iconv(link2, 'latin1', 'UTF-8')
>>     Encoding(link2) <- 'bytes'
>>     link2
>>
>>     # read
>>     pagina2 <- readLines(url(link2))
>>
>>     # get line of interest
>>     linha2<-grep('(accepted name)',pagina2)
>>     sp.final<-pagina2[linha2]
>>
>>     # get species name
>>     corte1<-strsplit(sp.final,'<i>')[[1]][2]
>>     sp.aceito<-strsplit(corte1,'</i>')[[1]][1]
>>
>>     # get author
>>     corte2<-strsplit(sp.final,'\\(')[[1]][2]
>>     autor<-strsplit(corte2,')')[[1]][1]
>>   }else {
>>     sp.aceito<-c("Not found")
>>     autor<-c("Not found")
>>   }
>>   return(list(sp.aceito=sp.aceito,autor=autor))
>> }
>>
>> --
>> Thanks,
>> Augusto C. A. Ribas
>>
>> Personal site:
>> http://augustoribas.heliohost.org
>> Lattes:
>> http://lattes.cnpq.br/7355685961127056
>>
>> _______________________________________________
>> R-sig-ecology mailing list
>> R-sig-ecology@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
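Questions 01 and 02 can probably be handled with base R alone. Below is a minimal sketch under that assumption: readLines() has a `warn` argument that turns off the "incomplete final line" warning, and when it is given the URL as a plain string it opens and closes the connection itself; suppressWarnings() muffles any remaining warnings; and tryCatch() turns a failed download or a failed sp.check() call into a value instead of an error, so a loop over 500 species can keep going. The helper names read_with_retry() and check_all(), and their tries, wait, fetch and check_fun parameters, are hypothetical and not part of any package; check_all() assumes sp.check() as posted above, returning a list with sp.aceito and autor.

```r
# Hypothetical helper (not from the thread): try to read `link` up to
# `tries` times, pausing `wait` seconds between attempts. Warnings are
# muffled and errors are caught, so a failure returns NA instead of
# stopping the caller.
read_with_retry <- function(link, tries = 3, wait = 2,
                            fetch = function(x) readLines(x, warn = FALSE)) {
  for (attempt in seq_len(tries)) {
    result <- tryCatch(suppressWarnings(fetch(link)),
                       error = function(e) NULL)
    if (!is.null(result)) return(result)
    message("Can't access ", link, " (attempt ", attempt, " of ", tries, ")")
    if (attempt < tries) Sys.sleep(wait)
  }
  NA
}

# Hypothetical helper: run `check_fun` (by default sp.check) on every
# species. A failure is reported with message() and recorded as NA,
# and the loop moves on to the next species instead of stopping.
check_all <- function(species, check_fun = sp.check) {
  species <- as.character(species)
  n <- length(species)
  out <- data.frame(especie = species,
                    sp.aceito = rep(NA_character_, n),
                    autor = rep(NA_character_, n),
                    stringsAsFactors = FALSE)
  for (i in seq_len(n)) {
    res <- tryCatch(check_fun(species[i]), error = function(e) {
      message("Failed for ", species[i], ": ", conditionMessage(e))
      NULL
    })
    if (!is.null(res)) {
      # store both fields, not just the first element of the list
      out$sp.aceito[i] <- res$sp.aceito
      out$autor[i] <- res$autor
    }
  }
  out
}
```

With `data` and sp.check() defined as in the message above, `res <- check_all(data$especie)` returns one row per species, with NA wherever the lookup failed, so those rows can be re-run later. Inside sp.check(), `readLines(url(link))` could likewise be swapped for `read_with_retry(link)`, followed by an `identical(pagina, NA)` check before parsing.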