The [au] portion seems to be causing the problem. Escape the [ and ] by mapping them to %5B and %5D respectively _before_ handing the URL string to xmlTreeParse(). (The error message suggests that the internals have already performed the conversion, but doing it yourself does work: I can reproduce your error message and get the desired result by escaping the [ and ] first.)

There is more information about what needs to be escaped at

  http://publib.boulder.ibm.com/infocenter/discover/v8r4/index.jsp?topic=/com.ibm.discovery.ds.ref.doc/t_RG_Escape_Sequences.htm

The HTTP/FTP code built into the xmlTreeParse(), htmlTreeParse() and xmlEventParse() functions (specifically from libxml2) is minimalistic. For better or worse, it is the same code that R uses to implement url() connections. It does not handle aspects of HTTP other than simple requests.

So when I run into problems with xmlTreeParse() and a URL, I first fetch the content of the document using the RCurl package. And

  library(RCurl)
  getURL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer[au]")

does fetch the document, and the result can be passed directly to xmlTreeParse().

RCurl is an interface to libcurl, which is a very solid, stable and feature-rich library for performing HTTP, HTTPS, FTP, ... client queries. It allows us to do, in R, pretty much anything a Web browser can do, but programmatically.

 D.

Armin Goralczyk wrote:
> Hello
>
> In the following thread (R-help) the possibilities of analyzing
> publications from pubmed via XML were discussed:
>
> http://www.nabble.com/Analyzing-Publications-from-Pubmed-via-XML-to14328779.html#a14343090
>
> Using xmlTreeParse in a function results in a failure message on my
> Mac which is not reproduced in R for Windows:
>
>> esearch <- function (term){
> + srch.stem <-
> "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"
> + srch.mode <- "db=pubmed&retmax=10000&retmode=xml&term="
> + doc <-xmlTreeParse(paste(srch.stem,srch.mode,term,sep=""),isURL = TRUE,
> + useInternalNodes = TRUE)
> + sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
> + }
>> term <- 'meyer'
>> pmid <- esearch(term) # works fine
>>
>> term <- 'meyer[au]'
>> pmid <- esearch(term)
> Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
> as.logical(ignoreBlanks), :
> error in creating parser for
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer[au]
> I/O warning : failed to load external entity
> "http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&retmax=10000&retmode=xml&term=meyer%5Bau%5D"
>
> The problem seems to be the search tag [au].
> I am not very familiar with XML or the xmlTreeParse function, so I
> don't know what is wrong. Can anybody help?
>
> Thanks
>
> My version:
>> R.Version()
> $platform
> [1] "powerpc-apple-darwin8.10.1"
>
> $arch
> [1] "powerpc"
>
> $os
> [1] "darwin8.10.1"
>
> $system
> [1] "powerpc, darwin8.10.1"
>
> $status
> [1] "Patched"
>
> $major
> [1] "2"
>
> $minor
> [1] "6.0"
>
> $year
> [1] "2007"
>
> $month
> [1] "11"
>
> $day
> [1] "09"
>
> $`svn rev`
> [1] "43408"
>
> $language
> [1] "R"
>
> $version.string
> [1] "R version 2.6.0 Patched (2007-11-09 r43408)"
>

_______________________________________________
R-SIG-Mac mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mac
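
For reference, the two suggestions in the reply (escape the brackets first, and fetch with RCurl before parsing) might be combined roughly as below. This is only a sketch under stated assumptions: escape_brackets() and esearch_url() are hypothetical helper names, not part of the XML or RCurl packages, and passing asText = TRUE to xmlTreeParse() is a choice made here to be explicit that the input is document text rather than a file name or URL.

```r
# Hypothetical helper: percent-encode the square brackets that trip up
# the URL handling in xmlTreeParse().
escape_brackets <- function(x) {
  x <- gsub("[", "%5B", x, fixed = TRUE)
  gsub("]", "%5D", x, fixed = TRUE)
}

# Hypothetical helper: build the esearch URL from the original post,
# escaping the search term before it is pasted in.
esearch_url <- function(term) {
  paste0("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?",
         "db=pubmed&retmax=10000&retmode=xml&term=",
         escape_brackets(term))
}

# Fetch the document with RCurl, then parse the returned text.
# Needs network access plus the RCurl and XML packages, so it is only
# defined here and not called.
esearch <- function(term) {
  txt <- RCurl::getURL(esearch_url(term))
  doc <- XML::xmlTreeParse(txt, asText = TRUE, useInternalNodes = TRUE)
  unlist(XML::xpathApply(doc, "//Id", XML::xmlValue))
}

print(esearch_url("meyer[au]"))
# [1] "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&retmax=10000&retmode=xml&term=meyer%5Bau%5D"
```

With both packages installed, pmid <- esearch("meyer[au]") would be the equivalent of the failing call in the original post.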
