Re: [R] Assistance converting to R a python function that extracts from an XML file
Hi Don

library(XML)

readxmldate <- function(xmlfile) {
  doc <- xmlParse(xmlfile)
  xpathSApply(doc, '//Esri/CreaDate | //Esri/CreaTime', xmlValue)
}

D.

On 12/13/14, 12:36 PM, MacQueen, Don wrote:

I would appreciate assistance doing in R what a colleague has done in python. Unfortunately (for me), I have almost no experience with either python or xml.

Within an xml file there is

<CreaDate>20120627</CreaDate><CreaTime>07322600</CreaTime>

and I need to extract those two values, 20120627 and 07322600.

Here is the short python function. Even without knowing python, it's conceptually clear what it does. I would like to do the same in R.

def readxmldate(xmlfile):
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    for lev1 in root.findall('Esri'):
        xdate = lev1.find('CreaDate').text
        xtime = lev1.find('CreaTime').text
    return xdate, xtime

Thanks in advance
-Don

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
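[Editor's note: for readers without Don's file at hand, the XPath approach in the reply can be checked against a small inline document. The <Esri> wrapper element below is assumed from the '//Esri/...' XPath in the reply; Don's actual file may differ.]

```r
library(XML)

# A minimal stand-in for the metadata file; the <Esri> wrapper is
# assumed from the '//Esri/...' XPath used in the reply.
txt <- '<metadata><Esri><CreaDate>20120627</CreaDate><CreaTime>07322600</CreaTime></Esri></metadata>'

doc <- xmlParse(txt, asText = TRUE)
xpathSApply(doc, '//Esri/CreaDate | //Esri/CreaTime', xmlValue)
# Should return the two values "20120627" and "07322600" in document order
```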
Re: [R] saveXML() prefix argument
Thanks Earl and Milan.

Yes, the C code that serializes does branch and do things differently for the different combinations of file, encoding and indent. I have updated the code to use a different routine in libxml2 for this case, and that routine honors the indentation. The fix will be in the next release of XML. In the meantime, you can use

cat(saveXML(doc, encoding = "UTF-8", indent = TRUE), file = "bob.xml")

rather than

saveXML(doc, file = "bob.xml", encoding = "UTF-8", indent = TRUE)

i.e. move the file argument to cat().

Thanks,
D.

On 10/19/13 4:36 AM, Milan Bouchet-Valat wrote:

On Friday, October 18, 2013 at 13:27 -0400, Earl Brown wrote:

Thanks Duncan. However, now I can't get the Spanish and Portuguese accented vowels to come out correctly and still keep the indents in the saved document, even when I set encoding = "UTF-8":

library(XML)

concepts <- c("español", "português")
info <- c("info about español", "info about português")

doc <- newXMLDoc()
root <- newXMLNode("tips", doc = doc)
for (i in 1:length(concepts)) {
  cur.concept <- concepts[i]
  cur.info <- info[i]
  cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
  newXMLNode("h1", cur.concept, parent = cur.tip)
  newXMLNode("p", cur.info, parent = cur.tip)
}

# accented vowels don't come through correctly, but the indents are correct:
saveXML(doc, file = "test1.xml", indent = TRUE)

Resulting file looks like this:

<?xml version="1.0"?>
<tips>
  <tip id="1">
    <h1>espa&#xF1;ol</h1>
    <p>info about espa&#xF1;ol</p>
  </tip>
  <tip id="2">
    <h1>portugu&#xEA;s</h1>
    <p>info about portugu&#xEA;s</p>
  </tip>
</tips>

# accented vowels are correct, but the indents are no longer correct:
saveXML(doc, file = "test2.xml", indent = TRUE, encoding = "UTF-8")

Resulting file:

<?xml version="1.0" encoding="UTF-8"?>
<tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip id="2"><h1>português</h1><p>info about português</p></tip></tips>

I tried to work around the problem by simply loading in that resulting file and saving it again:

doc2 <- xmlInternalTreeParse(file = "test2.xml", asTree = TRUE)
saveXML(doc2, file = "test_work_around.xml", indent = TRUE)

but still don't get the indents. Does setting encoding = "UTF-8" override indent = TRUE in saveXML()?

I can confirm the same issue happens here. What is interesting is that without the 'file' argument, the returned string includes the expected line breaks and spacing. These do not appear when redirecting the output to a file.

> saveXML(doc, encoding = "UTF-8", indent = TRUE)
[1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<tips>\n  <tip id=\"1\">\n    <h1>español</h1>\n    <p>info about español</p>\n  </tip>\n  <tip id=\"2\">\n    <h1>português</h1>\n    <p>info about português</p>\n  </tip>\n</tips>\n"

> saveXML(doc, encoding = "UTF-8", indent = TRUE, file = "test.xml")

Contents of test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip id="2"><h1>português</h1><p>info about português</p></tip></tips>

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8
 [5] LC_MONETARY=fr_FR.utf8    LC_MESSAGES=fr_FR.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] XML_3.96-1.1

Regards
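[Editor's note: the workaround from the reply, put together with a cut-down version of Earl's example; the file name test3.xml is arbitrary.]

```r
library(XML)

doc <- newXMLDoc()
root <- newXMLNode("tips", doc = doc)
tip <- newXMLNode("tip", attrs = c(id = 1), parent = root)
newXMLNode("h1", "español", parent = tip)

# Serialize to a string first (which honors indent even with an
# encoding), then let cat() do the writing to disk:
cat(saveXML(doc, encoding = "UTF-8", indent = TRUE), file = "test3.xml")
```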
Re: [R] saveXML() prefix argument
Hi Earl

Unfortunately, the code works for me, i.e. it indents _and_ displays the accented vowels correctly. Can you send me the output of the function call libxmlVersion() and also sessionInfo(), please?

D.

On 10/18/13 10:27 AM, Earl Brown wrote:

Thanks Duncan. However, now I can't get the Spanish and Portuguese accented vowels to come out correctly and still keep the indents in the saved document, even when I set encoding = "UTF-8":

library(XML)

concepts <- c("español", "português")
info <- c("info about español", "info about português")

doc <- newXMLDoc()
root <- newXMLNode("tips", doc = doc)
for (i in 1:length(concepts)) {
  cur.concept <- concepts[i]
  cur.info <- info[i]
  cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
  newXMLNode("h1", cur.concept, parent = cur.tip)
  newXMLNode("p", cur.info, parent = cur.tip)
}

# accented vowels don't come through correctly, but the indents are correct:
saveXML(doc, file = "test1.xml", indent = TRUE)

Resulting file looks like this:

<?xml version="1.0"?>
<tips>
  <tip id="1">
    <h1>espa&#xF1;ol</h1>
    <p>info about espa&#xF1;ol</p>
  </tip>
  <tip id="2">
    <h1>portugu&#xEA;s</h1>
    <p>info about portugu&#xEA;s</p>
  </tip>
</tips>

# accented vowels are correct, but the indents are no longer correct:
saveXML(doc, file = "test2.xml", indent = TRUE, encoding = "UTF-8")

Resulting file:

<?xml version="1.0" encoding="UTF-8"?>
<tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip id="2"><h1>português</h1><p>info about português</p></tip></tips>

I tried to work around the problem by simply loading in that resulting file and saving it again:

doc2 <- xmlInternalTreeParse(file = "test2.xml", asTree = TRUE)
saveXML(doc2, file = "test_work_around.xml", indent = TRUE)

but still don't get the indents. Does setting encoding = "UTF-8" override indent = TRUE in saveXML()?

Thanks. Earl
Re: [R] saveXML() prefix argument
Milan is correct. The prefix is used when saving XML content that is represented in a different format in R. To get the prefix <?xml version="1.0"?> on the XML content that you save, use a document object:

doc = newXMLDoc()
root = newXMLNode("foo", doc = doc)
saveXML(doc)

<?xml version="1.0"?>
<foo/>

Sorry for the confusion.
D

On 10/17/13 2:36 AM, Milan Bouchet-Valat wrote:

On Wednesday, October 16, 2013 at 23:45 -0400, Earl Brown wrote:

I'm using the XML package, and specifically the saveXML() function, but I can't get the prefix argument of saveXML() to work:

library(XML)

concepts <- c("one", "two", "three")
info <- c("info one", "info two", "info three")

root <- newXMLNode("root")
for (i in 1:length(concepts)) {
  cur.concept <- concepts[i]
  cur.info <- info[i]
  cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
  newXMLNode("h1", cur.concept, parent = cur.tip)
  newXMLNode("p", cur.info, parent = cur.tip)
}

# None of the following output a prefix on the first line of the exported document
saveXML(root)
saveXML(root, file = "test.xml")
saveXML(root, file = "test.xml", prefix = '<?xml version="1.0"?>\n')

Am I missing something obvious? Any ideas?

It looks like the function XML:::saveXML.XMLInternalNode() does not use the 'prefix' parameter at all. So it won't be taken into account when calling saveXML() on objects of class XMLInternalNode. I think you should report this to Duncan Temple Lang, as this is probably an oversight.

Regards

Thanks in advance.

Earl Brown
-
Earl K. Brown, PhD
Assistant Professor of Spanish Linguistics
Advisor, TEFL MA Program
Department of Modern Languages
Kansas State University
www-personal.ksu.edu/~ekbrown
Re: [R] RCurl cookiejar
Hi Earl

The cookies will only be written to the file specified by the cookiejar option when the curl handle is garbage collected. If you use

rm(ch)
gc()

the cookie.txt file should be created. This is the way libcurl behaves, rather than something RCurl introduces.

If you don't explicitly specify a curl handle in a request, the cookiejar option works as one expects, because the implicit curl handle is destroyed at the end of the call and garbage collection often occurs then.

D.

On 8/24/13 11:01 PM, Earl Brown wrote:

R-helpers,

When I use cURL in the Terminal:

curl --cookie-jar cookie.txt --url "http://corpusdelespanol.org/x.asp" --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0" --location --include

a cookie file cookie.txt is saved to my working directory. However, when I try what I think is the equivalent command in R with RCurl:

ch <- getCurlHandle(followlocation = TRUE, header = TRUE, useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0")
getURL(url = "http://www.corpusdelespanol.org/x.asp", cookiejar = "cookie.txt", curl = ch)

no cookie file is saved. What am I missing to reproduce in RCurl what I'm successfully doing in the Terminal? Thank you for your time and help.

Earl Brown
-
Earl K. Brown, PhD
Assistant Professor of Spanish Linguistics
Advisor, TEFL MA Program
Department of Modern Languages
Kansas State University
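[Editor's note: the fix described in the reply, sketched end to end; the URL and options are taken from Earl's post, with the user-agent string shortened.]

```r
library(RCurl)

ch <- getCurlHandle(followlocation = TRUE,
                    cookiejar = "cookie.txt",
                    useragent = "Mozilla/5.0")
getURL("http://www.corpusdelespanol.org/x.asp", curl = ch)

# The cookie jar is only flushed when libcurl cleans up the handle,
# which happens when R garbage-collects it:
rm(ch)
invisible(gc())
# cookie.txt should now exist in the working directory
```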
Re: [R] XML package installation -- an old question
Hi Tao

In the same R session as you call install.packages(), what does

system("which xml2-config", intern = TRUE)

return? Basically, the error message from the configuration script for the XML package is complaining that it cannot find the executable xml2-config in your PATH. (You can also send _me_ the config.log file from the attempted installation.)

D.

On 8/15/13 10:13 AM, Shi, Tao wrote:

Hi list,

I have encountered the "Cannot find xml2-config" problem too during XML package installation on my 64-bit Redhat (v. 6.4) linux machine. After looking through the old posts I checked all the necessary libraries and they all seem to be properly installed (see below). I don't understand why R can't see the xml2-config during the installation process. Help, please!

Many thanks!
Tao

==
[root ~]# yum install libxml2
Setting up Install Process
Package matching libxml2-2.7.6-8.el6_3.3.x86_64 already installed. Checking for update.
Nothing to do

[root ~]# yum install libxml2-devel
Setting up Install Process
Package matching libxml2-devel-2.7.6-8.el6_3.3.x86_64 already installed. Checking for update.
Nothing to do

[root ~]# xml2-config --version
2.7.6

[root ~]# curl-config --version
libcurl 7.19.7

R session
==
> install.packages("XML")
Installing package into '/usr/lib64/R/library' (as 'lib' is unspecified)
trying URL 'http://cran.stat.ucla.edu/src/contrib/XML_3.98-1.1.tar.gz'
Content type 'application/x-tar' length 1582216 bytes (1.5 Mb)
opened URL
==
downloaded 1.5 Mb

* installing *source* package 'XML' ...
** package 'XML' successfully unpacked and MD5 sums checked
checking for gcc... gcc
checking for C compiler default output file name... rm: cannot remove `a.out.dSYM': Is a directory
a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking how to run the C preprocessor... gcc -E
checking for sed... /bin/sed
checking for pkg-config... /usr/bin/pkg-config
checking for xml2-config... no
Cannot find xml2-config
ERROR: configuration failed for package 'XML'
* removing '/usr/lib64/R/library/XML'

The downloaded source packages are in '/tmp/RtmpwnAIFH/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages("XML") :
  installation of package 'XML' had non-zero exit status

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] BiocInstaller_1.10.3

loaded via a namespace (and not attached):
[1] tcltk_3.0.1 tools_3.0.1
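[Editor's note: a sketch of the diagnosis the reply asks for. The point is that install.packages() runs the configure script with the PATH of the R session, which can differ from the interactive shell's PATH; the /opt/local/bin directory below is purely illustrative.]

```r
# What does the R session itself see?
Sys.getenv("PATH")
system("which xml2-config", intern = TRUE)

# If xml2-config lives in a directory missing from that PATH
# (illustrative path only), prepend it and retry the install:
Sys.setenv(PATH = paste("/opt/local/bin", Sys.getenv("PATH"), sep = ":"))
install.packages("XML")
```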
Re: [R] How to download this data?
Hi Ron

Yes, you can use ssl.verifypeer = FALSE. Alternatively, you can also use

getURLContent(..., cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

to specify where libcurl can find the certificates to verify the SSL signature.

The error you are encountering appears to be coming from a garbled R expression. This may have arisen as a result of an HTML mailer adding the <a href=...> markup into the expression where it found an https://... URL. What we want to do is end up with a string of the form

https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=adasdasdad?expiryDates=specId=219

We have to substitute the text adasdasdad, which we assigned to jsession in a previous command. So take the literal text

c("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=", jsession, "?expiryDates=specId=219")

and combine it into a single string with paste0(). We need the literal strings as they appear when you view the mail for R to make sense of them, not what the mailer adds.

As to where I found this: it is in the source of the original HTML page, in rawDoc.

scripts = getNodeSet(rawDoc, "//body//script")
scripts[[ length(scripts) ]]

Look at the text, specifically app.urls and its 'expiry' field:

<script type="text/javascript">//<![CDATA[
var app = {};
app.isOption = false;
app.urls = {
  'spec':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?details=specId=219',
  'data':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?data=specId=219',
  'confirm':'/reports/dealreports/getSampleConfirm.do;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?hubId=403productId=254',
  'reports':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?reports=specId=219',
  'expiry':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?expiryDates=specId=219'
};
app.Router = Backbone.Router.extend({
  routes: { spec: "spec", data: "data", confirm: "confirm", ...

On 8/3/13 1:05 AM, Ron Michael wrote:

In the meantime I have this problem sorted out; hopefully I did it correctly. I have modified the line of your code as:

rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry", ssl.verifypeer = FALSE)

However, next I faced another problem executing:

u = sprintf(a href=https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;, jsession)
Error: unexpected symbol in "u = sprintf(a href=https"

Can you or someone else help me get out of this error? Also, my other question is: where did you get the expression

a href=https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;

I would really appreciate it if someone helped me understand that. Thank you.

- Original Message -
From: Ron Michael ron_michae...@yahoo.com
To: Duncan Temple Lang dtemplel...@ucdavis.edu; r-help@r-project.org
Sent: Saturday, 3 August 2013 12:58 PM
Subject: Re: [R] How to download this data?

Hello Duncan,

Thank you very much for your pointer. However, when I tried to run your code, I got the following error:

rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
Error in function (type, msg, asError = TRUE)  :
  SSL certificate problem, verify that the CA cert is OK. Details:
  error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Can someone help me to understand what could be the cause of this error? Thank you.

- Original Message -
From: Duncan Temple Lang dtemplel...@ucdavis.edu
To: r-help@r-project.org
Sent: Saturday, 3 August 2013 4:33 AM
Subject: Re: [R] How to download this data?

That URL is an HTTPS (secure HTTP), not an HTTP. The XML parser cannot retrieve the file. Instead, use the RCurl package to get the file.

However, it is more complicated than that. If you look at the source of the HTML page in a browser, you'll see a jsessionid, and that is a session identifier. The following retrieves the content of your URL, then parses it and extracts the value of the jsessionid:

library(RCurl)
library(XML)

rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
rawDoc = htmlParse(rawOrig)
tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
jsession = gsub(".*jsessionid=([^?]+)\\?.*", "\\1", tmp)
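[Editor's note: the paste0() construction described in the reply, written out; the jsession value here is the same placeholder used in the reply, not a real session id.]

```r
jsession <- "adasdasdad"  # placeholder for the session id extracted earlier

# Combine the literal pieces and the session id into one URL string:
u <- paste0("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=",
            jsession,
            "?expiryDates=specId=219")
u
```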
Re: [R] How to download this data?
That URL is an HTTPS (secure HTTP), not an HTTP. The XML parser cannot retrieve the file. Instead, use the RCurl package to get the file.

However, it is more complicated than that. If you look at the source of the HTML page in a browser, you'll see a jsessionid, and that is a session identifier. The following retrieves the content of your URL, then parses it and extracts the value of the jsessionid. Then we create the full URL to the actual data page (which is actually in the HTML content, but in JavaScript code):

library(RCurl)
library(XML)

rawOrig = getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
rawDoc = htmlParse(rawOrig)
tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
jsession = gsub(".*jsessionid=([^?]+)\\?.*", "\\1", tmp)
u = sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219", jsession)
doc = htmlParse(getURLContent(u))
tbls = readHTMLTable(doc)
data = tbls[[1]]
dim(data)

I did this quickly, so it may not be the best way or completely robust, but hopefully it gets the point across and does get the data.

D.

On 8/2/13 2:42 PM, Ron Michael wrote:

Hi all,

I need to download the data from this web page:

https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry

I used the function readHTMLTable() from package XML, however I could not download it. Can somebody help me get the data into my R session? Thank you.
Re: [R] xmlToDataFrame very slow
Hi Stavros

xmlToDataFrame() is very generic and so doesn't know anything about the particulars of the XML it is processing. If you know something about the structure of the XML, you should be able to leverage that for performance. xmlToDataFrame() is also not optimized, as it is just a convenience routine for people who want to work with XML without much effort.

If you send me the file and the code you are using to read the file, I'll take a look at it.

D.

On 7/30/13 11:10 AM, Stavros Macrakis wrote:

I have a modest-size XML file (52MB) in a format suited to xmlToDataFrame (package XML).

I have successfully read it into R by splitting the file 10 ways, then running xmlToDataFrame on each part, then rbind.fill (package plyr) on the result. This takes about 530 s total, and results in a data.frame with 71k rows and object.size of 21MB.

But trying to run xmlToDataFrame on the whole file takes forever (> 1 s so far). xmlParse of this file takes only 0.8 s.

I tried running xmlToDataFrame on the first 10% of the file, then the first 10% repeated twice, then three times (with the outer tags adjusted, of course). Timings:

1 copy:   111 s = 111 s per copy
2 copies: 311 s = 155
3 copies: 626 s = 209

The runtime is superlinear. What is going on here? Is there a better approach?

Thanks,
-s
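[Editor's note: an illustration of "leveraging the structure" as the reply suggests. When every record is a node with known children, pulling each field with one vectorized XPath call and assembling the data frame directly is typically much faster than the generic xmlToDataFrame(). The <rec>/<id>/<value> structure here is invented for the sketch; Stavros's file was not posted.]

```r
library(XML)

# Hypothetical structure: <recs><rec><id>..</id><value>..</value></rec>...</recs>
txt <- "<recs><rec><id>1</id><value>a</value></rec><rec><id>2</id><value>b</value></rec></recs>"
doc <- xmlParse(txt, asText = TRUE)

# One XPath query per column, instead of one R call per node:
df <- data.frame(
  id    = as.integer(xpathSApply(doc, "//rec/id", xmlValue)),
  value = xpathSApply(doc, "//rec/value", xmlValue),
  stringsAsFactors = FALSE
)
df
```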
Re: [R] downloading web content
Hi Daisy

Use getURLContent() rather than getURL(). The former handles binary content, and this appears to be a zip file. You can write it to a file or read its contents directly in memory, e.g.

library(RCurl)
z = getURLContent("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
attributes(z)

library(Rcompression)
ar = zipArchive(z)
names(ar)
getZipInfo(ar)
ar[["data.csv"]]
dd = read.csv(textConnection(ar[["data.csv"]]))

D.

On 7/23/13 2:59 AM, Daisy Englert Duursma wrote:

Hello,

I am trying to use R to download a bunch of .csv files such as:

http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia

I have tried the following and neither works:

a <- getURL("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  embedded nul in string:

and

a <- httpPOST("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
Error: Internal Server Error

Any help would be appreciated.

Daisy
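[Editor's note: if the Rcompression package (not on CRAN) is unavailable, a sketch of the same idea using only base R alongside RCurl: write the raw bytes to a temporary zip file and read the CSV out of it with unz(). The archive member name data.csv is taken from the reply above and is an assumption about the download's contents.]

```r
library(RCurl)

z <- getURLContent("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")

# z is a raw vector (with some attributes attached); save the bytes
# as a zip file, then read the CSV from inside the archive:
zf <- tempfile(fileext = ".zip")
writeBin(as.vector(z), zf)
dd <- read.csv(unz(zf, "data.csv"))
```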
Re: [R] Weird 'xmlEventParse' encoding issue
Hi Sascha

Your code gives the correct results on my machine (OS X), either reading from the file directly or via readLines() and passing the text to xmlEventParse(). The problem might be the version of the XML package or your environment settings. It is important to report the session information, so you should provide the output from

sessionInfo()
Sys.getenv()
libxmlVersion()

D

On 7/15/13 4:41 AM, Sascha Wolfer wrote:

Dear list,

I have got a weird encoding problem with the xmlEventParse() function from the 'XML' package. I tried finding an answer on the web for several hours, and a Stack Exchange question came back without success :( So here's the problem.

I created a small XML test file, which looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE testFile>
<s type="manual">auch der Schulleiter steht dafür zur Verfügung. Das ist seßhaft mit ä und ö...</s>

This file is encoded with the iso-8859-1 encoding, which is also declared in its header.

I have 3 handler functions, defined as follows:

sE2 <- function (name, attrs) {
  if (name == "s") {
    get.text <<- TRUE
  }
}

eE2 <- function (name, attrs) {
  if (name == "s") {
    get.text <<- FALSE
  }
}

tS2 <- function (content, ...) {
  if (get.text && nchar(content) > 0) {
    collected.text <<- c(collected.text, content)
  }
}

I have one wrapper function around xmlEventParse(), defined as follows:

get.all.text <- function (file) {
  t1 <- Sys.time()
  read.file <- paste(readLines(file, encoding = ""), collapse = "")
  print(read.file)
  assign("collected.text", c(), env = .GlobalEnv)
  assign("get.text", FALSE, env = .GlobalEnv)
  xmlEventParse(read.file, asText = TRUE,
                list(startElement = sE2, endElement = eE2, text = tS2),
                error = function (...) { }, saxVersion = 1)
  t2 <- Sys.time()
  cat("That took", round(difftime(t2, t1, units = "secs"), 1), "seconds.\n")
  cat("Result of reading is in variable 'collected.text'.\n")
  collected.text
}

The output of calling get.all.text("test file") is as follows:

[1] "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?> <!DOCTYPE testFile> <s type=\"manual\">auch der Schulleiter steht dafür zur Verfügung. Das ist seßhaft mit ä und ö...</s>"
That took 0 seconds.
Result of reading is in variable 'collected.text'.
[1] "auch der Schulleiter steht dafür zur Verfügung. Das ist seßhaft mit ä und ö..."

Now the REALLY weird thing (for me) is that R obviously reads in the file correctly (first output) with readLines(). Then this output is passed to xmlEventParse(). Afterwards the output is broken, and it sometimes also inserts weird breaks where special characters occur.

Do you have any ideas how to solve this problem? I cannot use the xmlParse() function because I need the SAX functionality of xmlEventParse(). I also tried reading the file with xmlEventParse() directly (with asText = FALSE). No changes...

Thanks a lot,
Sascha W.
Re: [R] htmlParse (from XML library) working sporadically in the same code
When readHTMLTable(), or more generally the HTML/XML parser, fails to retrieve a URL, I suggest you check whether a different approach will work. You can use the download.file() function, or readLines(url(...)), or getURLContent() from the RCurl package to get the content of the URL. Then you can pass that content to readHTMLTable() via

readHTMLTable(htmlParse(text, asText = TRUE))

or

readHTMLTable(text, asText = TRUE)

D.

On 3/20/13 10:07 AM, Andre Zege wrote:

I am using htmlParse from the XML library on a particular website. Sometimes the code fails, sometimes it works; most of the time it doesn't, and I cannot see why. The file I am trying to parse is

http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0

Sometimes the following code works:

n <- readHTMLTable(htmlParse(url))

But most of the time it returns the following error coming from htmlParse:

Error: failed to load HTTP resource

The error is coming from the following line in the htmlParse code:

ans <- .Call("RS_XML_ParseTree", as.character(file), handlers,
    as.logical(ignoreBlanks), as.logical(replaceEntities),
    as.logical(asText), as.logical(trim), as.logical(validate),
    as.logical(getDTD), as.logical(isURL),
    as.logical(addAttributeNamespaces), as.logical(useInternalNodes),
    as.logical(isHTML), as.logical(isSchema),
    as.logical(fullNamespaceInfo), as.character(encoding),
    as.logical(useDotNames), xinclude, error, addFinalizer,
    as.integer(options), PACKAGE = "XML")

By the way, readHTMLTable(htmlParse(url)) works fine on other pages, so the problem is somehow related to this page. I am using 64-bit R 2.15.3 on a windows machine.

Thanks much
Andre
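[Editor's note: the suggested fallback, put together in one piece; the URL is the one from Andre's post.]

```r
library(RCurl)
library(XML)

u <- "http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0"

# Fetch the page ourselves rather than letting the parser do the
# HTTP request...
txt <- getURLContent(u)

# ...then hand the already-downloaded text to the parser:
tbls <- readHTMLTable(htmlParse(txt, asText = TRUE))
```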
Re: [R] Create a Data Frame from an XML
Hi Adam

[You seem to have sent the same message twice to the mailing list.]

There are various strategies/approaches to creating the data frame from the XML. Perhaps the approach that most closely follows yours is

xmlRoot(doc)["row"]

which returns a list of the XML nodes named "row" that are children of the root node <data>. So

sapply(xmlRoot(doc)["row"], xmlAttrs)

yields a matrix with as many columns as there are <row> nodes and one row per attribute (BRAND, NUM, YEAR and VALUE). So

d = t(sapply(xmlRoot(doc)["row"], xmlAttrs))

gives you a matrix with the correct rows and column orientation, and now you can turn that into a data frame, converting the columns into numbers, etc. as you want with regular R commands (i.e. independently of the XML).

D.

On 1/22/13 1:43 PM, Adam Gabbert wrote:

Hello,

I'm attempting to read information from an XML into a data frame in R using the XML package. I am unable to get the data into a data frame as I would like. I have some sample code below.

XML Code (header, then the data I want in a data frame):

<data>
  <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
  <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
  <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
  <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
  <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
  <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
  <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
  <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
  <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
  <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
</data>

R Code:

doc <- xmlInternalTreeParse("Sample2.xml")
top <- xmlRoot(doc)
xmlName(top)
names(top)
art <- top[["row"]]
art

Output:

<row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1"/>

This is where I am having difficulties. I am unable to access additional rows (i.e. <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />), and I am unable to access the individual entries to actually create the data frame.

The data frame I would like is as follows:

BRAND  NUM  YEAR  VALUE
GMC    1    1999  1
FORD   2    2000  12000
GMC    1    2001  12500
etc.

Any help or suggestions would be appreciated. Conversely, my eventual goal would be to take a data frame and write it into an XML in the previously shown format.

Thank you

AG
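[Editor's note: one way to finish the conversion sketched in the reply; Sample2.xml is the file from the original post, and type.convert() is used here as an assumption about how to turn the character columns into numbers where possible (VALUE stays character because of "PRICLESS").]

```r
library(XML)

doc <- xmlInternalTreeParse("Sample2.xml")

# sapply() gives one column per <row> node and one row per attribute;
# transpose it to get observations in rows:
m <- t(sapply(xmlRoot(doc)["row"], xmlAttrs))

# Build the data frame (row.names = NULL avoids the duplicated "row"
# names) and convert each column to numbers where the values allow it:
d <- as.data.frame(m, stringsAsFactors = FALSE, row.names = NULL)
d[] <- lapply(d, type.convert, as.is = TRUE)
d
```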
Re: [R] Reading JSON files from R
Hi m.dr. Reading data from MongoDB is no problem, so the RJSONIO or rjson packages should work. Can you send me the sample file that is causing the problem, please? The error about a method looks like a potential oversight in the combinations of inputs. Thanks D.

On 12/3/12 7:30 PM, m.dr wrote: Hello All - I am trying to use RJSONIO to read in some JSON files. I was wondering if anyone could please comment on the level of complexity of the files it can be used to read - exports from, or directly from, NoSQL DBMSs like MongoDB and such. Also, I understand that in reading the JSON file RJSONIO will automatically create the necessary structures. However I cannot seem to use it to read the file properly, and get this error:

  Error in function (classes, fdef, mtable) :
    unable to find an inherited method for function "fromJSON", for signature "missing", "NULL"

The call I am making is:

  noSqlData <- fromJSON(file = 'data.json')

It is a small file - with 3 levels of nested records. Some links to examples with a file and usage would be great. My JSON file validates - so I do not believe there is anything wrong with the file. Thanks for your help. -- View this message in context: http://r.789695.n4.nabble.com/Reading-JSON-files-from-R-tp4651976.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] reading json tables
Hi Michael

  The actual result I want is two data frames, wheat and monarch, whereas fromJSON returns a list of lists. I'll try to figure that part out.

do.call(rbind, data[[1]]) will do the job, but there are elements in each of data[[1]] and data[[2]] that are incomplete and which need to be filled in with NAs before rbinding. Best, D.

On 12/2/12 6:26 AM, Michael Friendly wrote: On 12/1/2012 4:08 PM, Duncan Temple Lang wrote: Hi Michael The problem is that the content of the .js file is not JSON, but actual JavaScript code. You could use something like the following

  tt = readLines("http://mbostock.github.com/protovis/ex/wheat.js")
  txt = c("[", gsub(";", ",", gsub("var [a-zA-Z]+ = ", "", tt)), "]")
  tmp = paste(txt, collapse = "\n")
  tmp = gsub("([a-zA-Z]+):", '"\\1":', tmp)
  o = fromJSON(tmp)
  data = structure(o[1:2], names = c("wheat", "monarch"))

Basically, this removes the 'var variable name =' part, replaces the ; with a , to separate elements, quotes the names of the fields (e.g. year, wheat, wages), and puts the two global data objects into a top-level array ([]) container. This isn't ideal (as the regular expressions are not sufficiently specific and could modify the actual values incorrectly). However, it does the job for this particular file.

Thanks for this, Duncan. I hadn't understood that the data had to be pure JSON. The actual result I want is two data frames, wheat and monarch, whereas fromJSON returns a list of lists. I'll try to figure that part out. -Michael __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
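The fill-in-with-NAs step mentioned above can be sketched in base R. The records below are a hypothetical stand-in for data[[1]], with some elements missing their wages field, as in the trailing years of the wheat data:

```r
# Hypothetical stand-in for data[[1]]: two records are missing 'wages',
# like the trailing wheat-only years in the real data.
recs <- list(
  list(year = 1810, wheat = 99, wages = 30),
  list(year = 1815, wheat = 78),            # no 'wages' field
  list(year = 1820, wheat = 54)
)

# Union of all field names across the records.
fields <- unique(unlist(lapply(recs, names)))

# Pad each record with NAs for its missing fields, then rbind into one data frame.
padded <- lapply(recs, function(r) {
  missing <- setdiff(fields, names(r))
  if (length(missing)) r[missing] <- NA
  as.data.frame(r[fields], stringsAsFactors = FALSE)
})
wheat <- do.call(rbind, padded)
wheat
```

The same two steps (pad, then do.call(rbind, ...)) apply to data[[2]] for the monarch records, whose optional field is commonwealth.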
Re: [R] reading json tables
Hi Michael The problem is that the content of the .js file is not JSON, but actual JavaScript code. You could use something like the following

  tt = readLines("http://mbostock.github.com/protovis/ex/wheat.js")
  txt = c("[", gsub(";", ",", gsub("var [a-zA-Z]+ = ", "", tt)), "]")
  tmp = paste(txt, collapse = "\n")
  tmp = gsub("([a-zA-Z]+):", '"\\1":', tmp)
  o = fromJSON(tmp)
  data = structure(o[1:2], names = c("wheat", "monarch"))

Basically, this removes the 'var variable name =' part, replaces the ; with a , to separate elements, quotes the names of the fields (e.g. year, wheat, wages), and puts the two global data objects into a top-level array ([]) container. This isn't ideal (as the regular expressions are not sufficiently specific and could modify the actual values incorrectly). However, it does the job for this particular file.

On 12/1/12 12:47 PM, Michael Friendly wrote: I'm trying to read two data sets in json format from a single .js file. I've tried fromJSON() in both the RJSONIO and rjson packages, but they require that the lines be pre-parsed somehow in ways I don't understand. Can someone help?

  wheat <- readLines("http://mbostock.github.com/protovis/ex/wheat.js")
  str(wheat)
  chr [1:70] "var wheat = [" "{ year: 1565, wheat: 41, wages: 5 }," ...

The wheat.js file looks like this and defines two tables: wheat and monarch:

  var wheat = [
    { year: 1565, wheat: 41, wages: 5 },
    { year: 1570, wheat: 45, wages: 5.05 },
    { year: 1575, wheat: 42, wages: 5.08 },
    { year: 1580, wheat: 49, wages: 5.12 },
    { year: 1585, wheat: 41.5, wages: 5.15 },
    { year: 1590, wheat: 47, wages: 5.25 },
    { year: 1595, wheat: 64, wages: 5.54 },
    { year: 1600, wheat: 27, wages: 5.61 },
    { year: 1605, wheat: 33, wages: 5.69 },
    { year: 1610, wheat: 32, wages: 5.78 },
    { year: 1615, wheat: 33, wages: 5.94 },
    { year: 1620, wheat: 35, wages: 6.01 },
    ...
    { year: 1800, wheat: 79, wages: 28.5 },
    { year: 1805, wheat: 81, wages: 29.5 },
    { year: 1810, wheat: 99, wages: 30 },
    { year: 1815, wheat: 78 }, // TODO
    { year: 1820, wheat: 54 },
    { year: 1821, wheat: 54 }
  ];

  var monarch = [
    { name: "Elizabeth", start: 1565, end: 1603 },
    { name: "James I", start: 1603, end: 1625 },
    { name: "Charles I", start: 1625, end: 1649 },
    { name: "Cromwell", start: 1649, end: 1660, commonwealth: true },
    { name: "Charles II", start: 1660, end: 1685 },
    { name: "James II", start: 1685, end: 1689 },
    { name: "W&M", start: 1689, end: 1702 },
    { name: "Anne", start: 1702, end: 1714 },
    { name: "George I", start: 1714, end: 1727 },
    { name: "George II", start: 1727, end: 1760 },
    { name: "George III", start: 1760, end: 1820 },
    { name: "George IV", start: 1820, end: 1821 }
  ];

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
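The gsub() pipeline from the reply can be exercised on a tiny embedded stand-in for wheat.js (two lines instead of the fetched file, so no download is needed); it carries the same caveat that the regular expressions are not very specific:

```r
# A two-line stand-in for the protovis wheat.js content (the real file is
# fetched with readLines() as in the reply).
tt <- c('var wheat = [ { year: 1565, wheat: 41, wages: 5 } ];',
        'var monarch = [ { name: "Elizabeth", start: 1565, end: 1603 } ];')

# Drop the 'var name = ' prefixes, turn each trailing ';' into ',',
# and wrap everything in a top-level JSON array.
txt <- c("[", gsub(";", ",", gsub("var [a-zA-Z]+ = ", "", tt)), "]")
tmp <- paste(txt, collapse = "\n")

# Quote the bare field names (year, wheat, ...) so the text parses as JSON.
tmp <- gsub("([a-zA-Z]+):", '"\\1":', tmp)
cat(tmp)
```

fromJSON(tmp) would then, as in the thread, return a list whose first two elements are the wheat and monarch records.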
Re: [R] problem with XML package
Hi Arvin 2.9.2 is very old. 2.13 is still old. Why not upgrade to 2.15.*? However, the problem is that the object you are passing to xmlName() is NULL. This will give an error in the latest version of the XML package and most likely in any version of the XML package. I imagine the structure of the XML document has changed. However, I can't tell what the problem is without some context. D.

On 11/15/12 3:00 PM, Torus Insurance wrote: Hi List, I have used XML in R version 2.9.2. The code is working fine using R v2.9.2 and its related XML package. Now I am using R v2.13.1 and its related XML package, but I get the following error:

  Error in UseMethod("xmlName", node) :
    no applicable method for 'xmlName' applied to an object of class "NULL"

Any idea? Thanks Arvin [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RCurl - curlPerform - Time out?!?
Hi Florian Yes, there are several options for a curl operation that control the timeout. The timeout option is the top-level general one. There is also timeout.ms. You can also control the timeout length for different parts of the operation/request, such as via the connecttimeout option for just establishing the connection. See the Connection Options in the libcurl help page for curl_easy_setopt. Best, D.

On 10/30/12 9:30 AM, Florian Umlauf (CRIE) wrote: Hi, I am working with the RCurl package and I am using the curlPerform function for a SOAP query. The problem is that the code is usually working well, but sometimes the connection gets lost. So I wrote a while-loop to repeat the query if anything might have happened, so that the same query runs again; but if the query faults it takes a very long time for the repetition. My question is whether there is any possibility to force a time out for the curlPerform function, or something like that? Thanks!

  run = 1
  i = 0
  while (run == 1) {
    i = i + 1
    try(
      run <- curlPerform(url = "http://search.webofknowledge.com/esti/wokmws/ws/WokSearchLite.cgi",
                httpheader = c("Accept-Encoding" = "gzip,deflate",
                               "Content-Type" = "text/xml; charset=UTF-8",
                               'SOAPAction' = '',
                               Cookie = paste('SID=', s_session, '', sep = ""),
                               "Content-Length" = paste(nchar(s_body)),
                               Host = "search.webofknowledge.com",
                               Connection = "Keep-Alive",
                               "User-Agent" = "Apache-HttpClient/4.1.1 (java 1.5)"),
                postfields = s_body,
                writefunction = h$update,
                verbose = TRUE)
    , TRUE)
    print(i)
  }

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
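To complement the timeout options above, the poster's while-loop can be written as a small generic retry helper in base R. The flaky() function below is a stand-in for the curlPerform() call; with RCurl one would additionally pass something like .opts = curlOptions(timeout = 30, connecttimeout = 10) (option names as in libcurl's curl_easy_setopt) so a hung request fails quickly and the retry takes over:

```r
# A generic retry helper in base R, standing in for the poster's while-loop.
retry <- function(f, times = 3) {
  for (i in seq_len(times)) {
    res <- try(f(), silent = TRUE)
    if (!inherits(res, "try-error")) return(res)
  }
  stop("all ", times, " attempts failed")
}

# Stand-in for a flaky curlPerform() call: fails twice, then succeeds.
n <- 0
flaky <- function() {
  n <<- n + 1
  if (n < 3) stop("connection lost")
  "response body"
}

out <- retry(flaky)
out
```

Bounding the attempts (times) avoids the original loop's risk of retrying forever when the server is genuinely down.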
Re: [R] XML namespace control
Hi Ben Can you tell us the slightly bigger picture, please? Do you want to create a single similar node entirely in isolation, or do you want to create it as part of an XML tree/document? Who will be reading the resulting XML? You can use a parent node

  top = newXMLNode("storms",
                   namespaceDefinitions = c(weather = "http://my.weather.com/events"))

Then

  newXMLNode("storm", "ripsnorter", namespace = "weather",
             attrs = c(type = "hurricane", name = "Sandy"),
             parent = top)

That gives you

  <weather:storm type="hurricane" name="Sandy">ripsnorter</weather:storm>

So now what are you going to do with that node? The namespace prefix is local to a document, chosen by the author of that XML document. The namespace URI is the global key that authors and consumers must agree upon. While your database may use udf, you may choose a different prefix, or even the default prefix, to correspond to that same URI. So each document must explicitly declare the prefix = URI mapping for it to be understood. D.

On 10/29/12 5:54 AM, Ben Tupper wrote: Hello, I am working with a database system from which I can retrieve these kinds of user defined fields formed as XML ...

  <udf:field unit="uM" type="Numeric" name="facs.Stain final concentration">5</udf:field>

You can see in the above example that field is defined in the namespace udf, but that the udf namespace is not defined along with the attributes of the node. That is, 'xmlns:udf="http://blah.blah.com/blah"' doesn't appear. I would like to create a similar node from scratch, but I can't seem to define the node with a namespace without providing the namespace definition.
  library(XML)
  node1 <- newXMLNode("storm", "ripsnorter", namespace = "weather",
                      namespaceDefinitions = c(weather = "http://my.weather.com/events"),
                      attrs = c(type = "hurricane", name = "Sandy"))
  node1
  # this returns the new node with the namespace prefix (which I want)
  # and the definition (which I don't want)
  # <weather:storm xmlns:weather="http://my.weather.com/events" type="hurricane" name="Sandy">ripsnorter</weather:storm>

  node2 <- newXMLNode("storm", "ripsnorter", namespace = "weather",
                      attrs = c(type = "hurricane", name = "Sandy"),
                      suppressNamespaceWarning = TRUE)
  node2
  # produces the node without the namespace prefix and without the definition
  # <storm type="hurricane" name="Sandy">ripsnorter</storm>

Is there some way to create a node with a namespace prefix but without embedding the namespace definition along with the attributes? Thanks! Ben

Ben Tupper Bigelow Laboratory for Ocean Sciences 180 McKown Point Rd. P.O. Box 475 West Boothbay Harbor, Maine 04575-0475 http://www.bigelow.org

sessionInfo() R version 2.15.0 (2012-03-30) Platform: i386-apple-darwin9.8.0/i386 (32-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] tripack_1.3-4 RColorBrewer_1.0-5 Biostrings_2.24.1 IRanges_1.14.2 BiocGenerics_0.2.0 RCurl_1.91-1 [7] bitops_1.0-4.1 XML_3.9-4 loaded via a namespace (and not attached): [1] stats4_2.15.0 tools_2.15.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
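If nothing structural works, one blunt workaround (an assumption on my part, not something the XML package documents) is to serialize the node with its definition and then strip the xmlns attribute textually; this only makes sense when the consumer, like Ben's database, already knows the prefix-to-URI mapping:

```r
# The serialized node, including the xmlns definition that serialization
# would normally emit (URL and content as in the example above).
node_txt <- '<weather:storm xmlns:weather="http://my.weather.com/events" type="hurricane" name="Sandy">ripsnorter</weather:storm>'

# Drop the xmlns:prefix="..." attribute, keeping the prefix on the element.
stripped <- gsub(' xmlns:[a-zA-Z]+="[^"]*"', "", node_txt)
stripped
```

The result is no longer namespace-well-formed XML on its own, which is exactly the shape of the fragments the database returns.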
Re: [R] Parsing very large xml datafiles with SAX: How to profile anonymous functions?
Hi Frederic Perhaps the simplest way to profile the individual functions in your handlers is to write the individual handlers as regular named functions, i.e. assigned to a variable in your work space (or function body), and then to write the handler functions as wrappers that call these by name:

  startElement = function(name, attr, ...) {
     # code you want to run when we encounter the start of an XML element
  }

  myText = function(...) {
     # code
  }

Now, when calling xmlEventParse():

  xmlEventParse(filename,
                handlers = list(.startElement = function(...) startElement(...),
                                .text = function(...) myText(...)))

Then the profiler will see the calls to startElement and myText. There is a small overhead from the extra layers, but you will get the profile information. D.

On 10/26/12 9:49 AM, Frederic Fournier wrote: Hello everyone, I'm trying to parse a very large XML file using SAX with the XML package (i.e., mainly the xmlEventParse function). This function takes as an argument a list of other functions (handlers) that will be called to handle particular xml nodes. When I use Rprof(), all the handler functions are lumped together under the <Anonymous> label, and I get something like this:

  $by.total
                            total.time total.pct self.time self.pct
  system.time                   151.22     99.99      0.00     0.00
  MyParsingFunction             149.38     98.77      0.00     0.00
  xmlEventParse                 149.38     98.77      0.00     0.00
  .Call                         149.32     98.73      3.04     2.01
  <Anonymous>                   146.74     97.02    141.26    93.40   <--- !!
  xmlValue                        3.04      2.01      0.46     0.30
  xmlValue.XMLInternalNode        2.58      1.71      0.14     0.09
  standardGeneric                 2.12      1.40      0.50     0.33
  gc                              1.86      1.23      1.86     1.23
  ...

Is there a way to make Rprof() identify the different handler functions, so I can know which one might be a bottleneck? Is there another profiling tool that would be more appropriate in a case like this? Thank you very much for your help!
Frederic [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
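The named-function-plus-wrapper pattern from the reply can be checked without any XML at all: the fake event loop below stands in for xmlEventParse() and dispatches to the anonymous wrappers, which in turn call the named functions that Rprof() can attribute time to (names and events here are illustrative):

```r
# Named handler functions, visible to Rprof() by name.
startElement <- function(name, attrs = NULL) paste("start:", name)
textNode     <- function(txt) paste("text:", txt)

# Thin anonymous wrappers, in the shape of the 'handlers' list.
handlers <- list(startElement = function(...) startElement(...),
                 text         = function(...) textNode(...))

# A fake event stream standing in for xmlEventParse() driving the handlers.
events <- list(list(type = "startElement", args = list("row")),
               list(type = "text",         args = list("hello")))
out <- vapply(events,
              function(e) do.call(handlers[[e$type]], e$args),
              character(1))
out
```

Only the wrappers are anonymous; the bodies doing the real work are named, which is all the profiler needs.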
Re: [R] Downloading a html table
Rather than requiring manual tweaking,

  library(XML)
  readHTMLTable("http://www.worldatlas.com/aatlas/populations/usapoptable.htm")

will do the job for us. D.

On 10/22/12 8:17 PM, David Arnold wrote: All, A friend of mine would like to use this data with his stats class: http://www.worldatlas.com/aatlas/populations/usapoptable.htm I can't figure a way of capturing this data due to the mysql commands in the source code. Any thoughts? David. -- View this message in context: http://r.789695.n4.nabble.com/Downloading-a-html-table-tp4647091.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Extracting results from Google Search
Hi Eduardo Scraping the coordinates from the HTML page can be a little tricky in this case. Also, Google may not want you using their search engine for that. Instead, you might use their Geocoding API (https://developers.google.com/maps/documentation/geocoding), but do ensure that this fits within their terms of use. If you do use the Geocoding API, you can do it with the following code:

  library(RJSONIO)
  library(RCurl)

  DB <- data.frame(town = c('Ingall', 'Dogondoutchi', 'Tera'),
                   country = rep('Niger', 3))
  location = with(DB, paste(town, country))
  ans = lapply(location, function(loc)
          fromJSON(getForm("http://maps.googleapis.com/maps/api/geocode/json",
                           address = loc, sensor = "false"))$results[[1]]$geometry$location)
  DB = cbind(DB, do.call(rbind, ans))

And now the data frame has the lat and lng variables. Again, check that the Geocoding terms of use allow you to do this. HTH D.

On 10/23/12 6:33 AM, ECAMF wrote: Dear list, I have a long list of towns in Africa and would need to get their geographical coordinates. The Google query [TownName Country coordinates] works for most of the TownNames I have and gives a nicely formatted Google output (try "Ingall Niger coordinates" for an example). I would like to launch a loop on the list of names I have and automatically extract the coordinates given by Google. Does anyone know how it can be done? ex.

  DB <- data.frame(town = c('Ingall', 'Dogondoutchi', 'Tera'),
                   country = rep('Niger', 3))
  # Get lat and lon from the Google search on:
  for (i in 1:3) {
    paste(DB$town[i], DB$country[i], 'coordinates', sep = " ")
  }

Many thanks! Eduardo. -- View this message in context: http://r.789695.n4.nabble.com/Extracting-results-from-Google-Search-tp4647136.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
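The combining step at the end of the reply (do.call(rbind, ans) followed by cbind) can be shown with canned results in place of the Geocoding calls; the lat/lng numbers below are illustrative values, not real API output:

```r
# The towns, as in the thread (minus the third, for brevity).
DB <- data.frame(town = c("Ingall", "Dogondoutchi"),
                 country = rep("Niger", 2), stringsAsFactors = FALSE)

# Canned stand-ins for ...$results[[1]]$geometry$location from each response;
# these coordinates are illustrative, not real API output.
ans <- list(c(lat = 16.78, lng = 6.93),
            c(lat = 13.64, lng = 4.03))

# Stack the coordinate pairs into a two-column matrix and bind them on.
DB <- cbind(DB, do.call(rbind, ans))
DB
```

Because each list element is a named vector with the same names, do.call(rbind, ...) produces a matrix whose lat/lng column names carry straight into the data frame.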
Re: [R] saving to docx
Just to let people know: on the Omegahat site (and source on github), there are packages for working with Office Open XML documents (and LibreOffice too), including RWordXML, RExcelXML and the generic package OOXML on which they rely. These are prototypes in the sense that they do not comprehensively cover the entire OOXML specification. Instead, the packages do have functionality for some common things to get data in and out of OO documents, and they have foundation functions for building new features. D.

On 10/19/12 3:19 PM, David Winsemius wrote: On Oct 19, 2012, at 2:48 PM, Daróczi Gergely wrote: Hi Javad, saving R output to jpeg depends on what you want to save. For example, saving an `lm` object to an image would be fun :) But you could export that quite easily to e.g. docx after installing Pandoc[1] and the pander[2] package. You can find some examples in the README[3]. Best, Gergely [1] http://johnmacfarlane.net/pandoc/installing.html [2] http://cran.r-project.org/web/packages/pander/index.html [3a] brew syntax: http://rapporter.github.com/pander/#brew-to-pandoc [3b] in a live R session: http://rapporter.github.com/pander/#live-report-generation

I guess I need to retract my comment that such packages only existed on Windows. Despite 'pander' not passing its CRAN package check for Mac, it does build from source, and the Pandoc installer does succeed in Snow Leopard with R 2.15.1. Thank you for writing the pander package, Daróczi.

On Fri, Oct 19, 2012 at 9:54 PM, javad bayat j.bayat...@gmail.com wrote: hi all, how can I save R output to docx or JPEG format? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problems with getURL (RCurl) to obtain list files of an ftp directory
Hi Francisco The code gives me the correct results, and it works for you on a Windows machine. So while it could be different versions of software (e.g. libcurl, RCurl, etc.), the presence of the word squid in the HTML suggests to me that your machine/network is using the proxy/caching software Squid. This intercepts requests and caches the results locally and shares them across local users. So if Squid has retrieved that page for an HTML target (e.g. a browser, or with a Content-Type set to text/html), it may be using that cached copy for your FTP request. One thing I like to do when debugging RCurl calls is to add verbose = TRUE to the .opts argument and then look at the information about the communication. D.

On 10/11/12 11:37 AM, Francisco Zambrano wrote: Dear all, I have a problem with the command 'getURL' from the RCurl package, which I have been using to obtain an FTP directory listing from the MOD16 (ET, DSI) products, and then to download them (part of the script by Tomislav Hengl, spatial-analyst). Instead of the list of files (from FTP), I am getting the complete HTML code. Does anyone know why this might happen?
These are the steps I have been doing:

  MOD16A2.doy <- 'ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/'
  items <- strsplit(getURL(MOD16A2.doy, .opts = curlOptions(ftplistonly = TRUE)), "\n")[[1]]
  items

  # results
  [1] "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\"
  \"http://www.w3.org/TR/html4/loose.dtd\">\n<!-- HTML listing generated by Squid 2.7.STABLE9 -->\n
  <!-- Wed, 10 Oct 2012 13:43:53 GMT -->\n<HTML><HEAD><TITLE>\nFTP Directory:
  ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\n</TITLE> ...

[... the rest of the result is the same Squid-generated HTML directory listing: a <PRE> block of <A HREF=...> entries, each with an icon image served from http://localhost:3128/squid-internal-static/icons/, for the parent directory, GEOTIFF_0.05degree/ and GEOTIFF_0.5degree/ (Jun 3 18:00 and 18:01), and the year directories Y2000/ through Y2009/ (Dec 23 2010), Y2010/ (Feb 20 2011) and Y2011/ (Mar 12 2012) ...]
Re: [R] scraping with session cookies
Hi ? The key is that you want to use the same curl handle for both the postForm() and for getting the data document.

  site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"

  library(RCurl)
  curl = getCurlHandle(cookiefile = "", verbose = TRUE)
  postForm(site, disclaimer_action = "I Agree", curl = curl)

Now we have the cookie in the curl handle, so we can use that same curl handle to request the data document:

  txt = getURLContent(u, curl = curl)

Now we can use readHTMLTable() on the local document content:

  library(XML)
  tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)

Rather than knowing how to post the form, I like to read the form programmatically and generate an R function to do the submission for me. The RHTMLForms package can do this.

  library(RHTMLForms)
  forms = getHTMLFormDescription(u, FALSE)
  fun = createFunction(forms[[1]])

Then we can use fun(.curl = curl) instead of postForm(site, disclaimer_action = "I Agree", curl = curl). This helps to abstract the details of the form. D.

On 9/18/12 5:57 PM, CPV wrote: Hi, I am starting coding in R, and one of the things that I want to do is to scrape some data from the web. The problem that I am having is that I cannot get past the disclaimer page (which produces a session cookie). I have been able to collect some ideas and combine them in the code below, but I don't get past the disclaimer page. I am trying to agree to the disclaimer with postForm and write the cookie to a file, but I cannot do it successfully. The webpage cookies are written to the file but the value is FALSE... So any ideas of what I should do or what I am doing wrong?
Thank you for your help,

  library(RCurl)
  library(XML)

  site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"

  postForm(site, disclaimer_action = "I Agree")

  cf <- "cookies.txt"

  no_cookie <- function() {
    curlHandle <- getCurlHandle(cookiefile = cf, cookiejar = cf)
    getURL(site, curl = curlHandle)
    rm(curlHandle)
    gc()
  }

  if (file.exists(cf) == TRUE) {
    file.create(cf)
    no_cookie()
  }

  allTables <- readHTMLTable(site)
  allTables

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] scraping with session cookies
You don't need to use getHTMLFormDescription() and createFunction(). Instead, you can use the postForm() call. However, getHTMLFormDescription(), etc. is more general. But you need the very latest version of the package to deal with degenerate forms that have no inputs (other than button clicks). You can get the latest version of the RHTMLForms package from github:

  git clone git@github.com:omegahat/RHTMLForms.git

and that has the fixes for handling the degenerate forms with no arguments. D.

On 9/19/12 7:51 AM, CPV wrote: Thank you for your help Duncan, I have been trying what you suggested; however, I am getting an error when trying to create the function

  fun <- createFunction(forms[[1]])

It says:

  Error in isHidden | hasDefault :
    operations are possible only for numeric, logical or complex types

On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang dtemplel...@ucdavis.edu wrote: Hi ? The key is that you want to use the same curl handle for both the postForm() and for getting the data document.

  site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"

  library(RCurl)
  curl = getCurlHandle(cookiefile = "", verbose = TRUE)
  postForm(site, disclaimer_action = "I Agree", curl = curl)

Now we have the cookie in the curl handle so we can use that same curl handle to request the data document:

  txt = getURLContent(u, curl = curl)

Now we can use readHTMLTable() on the local document content:

  library(XML)
  tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)

Rather than knowing how to post the form, I like to read the form programmatically and generate an R function to do the submission for me. The RHTMLForms package can do this.

  library(RHTMLForms)
  forms = getHTMLFormDescription(u, FALSE)
  fun = createFunction(forms[[1]])

Then we can use fun(.curl = curl) instead of postForm(site, disclaimer_action = "I Agree"). This helps to abstract the details of the form. D.
On 9/18/12 5:57 PM, CPV wrote: Hi, I am starting coding in R, and one of the things that I want to do is to scrape some data from the web. The problem that I am having is that I cannot get past the disclaimer page (which produces a session cookie). I have been able to collect some ideas and combine them in the code below, but I don't get past the disclaimer page. I am trying to agree to the disclaimer with postForm and write the cookie to a file, but I cannot do it successfully. The webpage cookies are written to the file but the value is FALSE... So any ideas of what I should do or what I am doing wrong? Thank you for your help,

  library(RCurl)
  library(XML)
  site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
  postForm(site, disclaimer_action = "I Agree")
  cf <- "cookies.txt"
  no_cookie <- function() {
    curlHandle <- getCurlHandle(cookiefile = cf, cookiejar = cf)
    getURL(site, curl = curlHandle)
    rm(curlHandle)
    gc()
  }
  if (file.exists(cf) == TRUE) {
    file.create(cf)
    no_cookie()
  }
  allTables <- readHTMLTable(site)
  allTables

[[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] memory leak using XML readHTMLTable
Hi James

Unfortunately, I am not certain whether the latest version of the XML package has the garbage collection activated for the nodes. It is quite complicated, and that feature was turned off in some versions of the package. I suggest that you install the version of the package on github

  git@github-omg:omegahat/XML.git

I believe that will handle the garbage collection of nodes, and I'd like to know if it doesn't.

 Best, D.

On 9/16/12 8:30 PM, J Toll wrote:

Hi, I'm using the XML package to scrape data and I'm trying to figure out how to eliminate the memory leak I'm currently experiencing. In the searches I've done, it sounds like the existence of the leak is fairly well known. What isn't as clear is exactly how to solve it. The general process I'm using is this:

  require(XML)
  myFunction <- function(URL) {
    html <- readLines(URL)
    tables <- readHTMLTable(html, stringsAsFactors = FALSE)
    myData <- data.frame(Value = tables[[1]][, 2],
                         row.names = make.unique(tables[[1]][, 1]),
                         stringsAsFactors = FALSE)
    rm(list = c("html", "tables"))  # here, and
    free(tables)                    # here, my attempt to solve the memory leak
    return(myData)
  }
  x <- lapply(myURLs, myFunction)

I've tried using rm() and free() to try to free up the memory each time the function is called, but it hasn't worked as far as I can tell. By the time lapply is finished working through my list of URLs, I'm swapping about 3GB of memory. I've also tried using gc(), but that seems to also have no effect on the problem. I'm running RStudio 0.96.330 and the latest version of XML.

  R version 2.15.1 (2012-06-22) -- Roasted Marshmallows
  Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

Any suggestions on how to solve this memory issue? Thanks.

James
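A hedged sketch of one way to restructure the function above (an assumption on my part, not the thread's confirmed fix): parse each page into an internal document yourself, so that free() can be applied to the parsed document rather than to an ordinary R list such as `tables`, where it has no effect.

```r
library(XML)

# Sketch under stated assumptions: readHTMLTable() accepts an
# already-parsed HTML document, and free() releases the C-level
# libxml2 document it wraps.
readOne <- function(u) {
  doc <- htmlParse(u)
  on.exit(free(doc))  # release the internal document even on error
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# x <- lapply(myURLs, readOne)  # myURLs as in the original post
```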
Re: [R] memory leak using XML readHTMLTable
Thanks Yihui for normalizing my customized git URL. The version of the package on github is in the standard R format, and that part of the README is no longer relevant. Sorry for the confusion. It might be simplest to pick up a tar.gz file of the source at http://www.omegahat.org/RSXML/XML_3.94-0.tar.gz

 D

On 9/17/12 12:31 PM, J Toll wrote:

On Mon, Sep 17, 2012 at 12:51 PM, Yihui Xie x...@yihui.name wrote:

I think the correct address for git should be git://github.com/omegahat/XML.git :) Or just https://github.com/omegahat/XML

Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: 515-294-2465 Web: http://yihui.name Department of Statistics, Iowa State University 2215 Snedecor Hall, Ames, IA

On Mon, Sep 17, 2012 at 11:16 AM, Duncan Temple Lang dun...@wald.ucdavis.edu wrote:

Hi James

Unfortunately, I am not certain whether the latest version of the XML package has the garbage collection activated for the nodes. It is quite complicated, and that feature was turned off in some versions of the package. I suggest that you install the version of the package on github

  git@github-omg:omegahat/XML.git

I believe that will handle the garbage collection of nodes, and I'd like to know if it doesn't.

 Best, D.

Hi, thanks for your response, and I'm sorry, I should have been more specific regarding the version of XML: I'm using XML 3.9-4. As a sort of follow-on question: is there a preferable way to install this version of XML from github? Do I have to use git to clone it, or maybe use the install_github function from Hadley's devtools package? I note that the README indicates that:

  This R package is not in the R package format in the github repository. It was initially developed in 1999 and was intended for use in both S-Plus and R and so requires a different structure for each.

So I was wondering what the general procedure is and whether there's anything special I need to do to install it? Thanks.
James
Re: [R] Parsing large XML documents in R - how to optimize the speed?
Hi Frederic

You definitely want to be using xmlParse() (or equivalently xmlTreeParse(..., useInternalNodes = TRUE)). This then allows use of getNodeSet(). I would suggest you use Rprof() to find out where the bottlenecks arise, e.g. in the XML functions, in S4 code, or in your code that assembles the R objects from the XML. I'm happy to take a look at speeding it up if you can make the test file available and show me your code.

 D.

On 8/10/12 3:46 PM, Frederic Fournier wrote:

Hello everyone,

I would like to parse very large xml files from MS/MS experiments and create R objects from their content. (By very large, I mean going up to 5-10Gb, although I am using a 'small' 40M file to test my code.) My first attempt at parsing the 40M file, using the XML package, took more than 2200 seconds and left me quite disappointed. I managed to cut that down to around 40 seconds by:

-using the 'useInternalNodes' option of the XML package when parsing the xml tree;
-vectorizing the parsing (i.e., replacing loops like for(node in group.of.nodes) {...} by sapply(group.of.nodes, function(node) {...}))

I gained another 5 seconds by making small changes to the functions used (like replacing getNodeSet() by xmlElementsByTagName() when I don't need to navigate to the children nodes). Now I am blocked at around 35 seconds, and I would still like to cut this time by 5x, but I have no clue what to do to achieve this gain. I'll try to describe as briefly as possible the relevant structure of the xml file I am parsing, the structure of the R object I want to create, and the type of functions I am using to do it. I hope that one of you will be able to point me towards a better and quicker way of doing the parsing!
Here is the (simplified) structure of the relevant nodes of the xml file:

  <model>          (many many nodes)
    <protein>      (a couple of proteins per model node)
      <peptide>    (1 per protein node)
        <domain>   (1 or more per peptide node)
          <aa>     (0 or more per domain node)
          </aa>
        </domain>
      </peptide>
    </protein>
  </model>

Here is the basic structure of the R object that I want to create:

  'result' object that contains:
    -various attributes
    -a list of 'protein' objects, each of which containing:
      -various attributes
      -a list of 'peptide' objects, each of which containing:
        -various attributes
        -a list of 'aa' objects, each of which consisting of a couple of attributes.

Here is the basic structure of the code:

  xml.doc <- xmlTreeParse(file, getDTD = FALSE, useInternalNodes = TRUE)
  result <- new('S4_result_class')
  result@proteins <- xpathApply(xml.doc, "//model/protein", function(protein.node) {
    protein <- new('S4_protein_class')
    ## fill in a couple of attributes of the protein object using xmlValue and xmlAttrs(protein.node)
    protein@peptides <- xpathApply(protein.node, "./peptide", function(peptide.node) {
      peptide <- new('S4_peptide_class')
      ## fill in a couple of attributes of the peptide object using xmlValue and xmlAttrs(peptide.node)
      peptide@aas <- sapply(xmlElementsByTagName(peptide.node, name = "aa"), function(aa.node) {
        aa <- new('S4_aa_class')
        ## fill in a couple of attributes of the 'aa' object using xmlValue and xmlAttrs(aa.node)
      })
    })
  })
  free(xml.doc)

Does anyone know a better and quicker way of doing this? Sorry for the very long message, and thank you very much for your time and help!

Frederic
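The Rprof() step suggested at the top of this thread can be sketched as follows (a hedged sketch: the profile file name and the parsing function are placeholders, not names from the thread):

```r
# Profile one run of the parser to see where the time actually goes
Rprof("parse.prof")                          # start writing profile samples
result <- my_parse_function("test_40M.xml")  # placeholder for the parsing code above
Rprof(NULL)                                  # stop profiling

# by.self shows the time spent inside each function itself, which
# separates XML-package time from S4 construction time
summaryRprof("parse.prof")$by.self
```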
Re: [R] readHTMLTable function - unable to find an inherited method ~ for signature NULL
The second page (mmo-champion.com) doesn't contain a table node. To scrape the data from the page, you will have to explore its HTML structure.

 D.

On 6/14/12 9:31 AM, Moon Eunyoung wrote:

Hi R experts,

I have been playing with library(XML) recently and found out that readHTMLTable works flawlessly for some websites, but it does give me an error like the one below:

  Error in function (classes, fdef, mtable) : unable to find an inherited method for function readHTMLTable, for signature "NULL"

Let's say, for example, this code works fine:

  a <- "http://www.zam.com/forum.html?forum=21&p=2"
  table_a <- readHTMLTable(a, header = TRUE, which = 1, stringsAsFactors = FALSE)

but this website gives me an error:

  b <- "http://www.mmo-champion.com/forums/266-General-Discussions/page2"
  table_b <- readHTMLTable(b, header = TRUE, which = 1, stringsAsFactors = FALSE)
  Error in function (classes, fdef, mtable) : unable to find an inherited method for function readHTMLTable, for signature "NULL"

I think this is due to the structure of the website, but I'm not very familiar with HTML files, so I'm curious what part of the HTML file makes this happen. (Also, it would be great (!) if someone could point out how to output the second example (in the format that the first example outputs) without an error.)

Thanks,
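One hedged way to do the exploration suggested above (a sketch of mine, not code from the thread, and it requires the page to be reachable): parse the page and tabulate the element names it actually contains.

```r
library(XML)

# Parse the page that readHTMLTable() rejected
doc <- htmlParse("http://www.mmo-champion.com/forums/266-General-Discussions/page2")

# Count the element names under <body>; if "table" is absent from
# this tally, readHTMLTable() has nothing to work with and you must
# target other nodes (e.g. divs or lists) with getNodeSet().
table(xpathSApply(doc, "//body//*", xmlName))
```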
Re: [R] How to set cookies in RCurl
To just enable cookies and their management, use the cookiefile option, e.g.

  txt = getURLContent(url, cookiefile = "")

Then you can pass this to readHTMLTable(), best done as

  content = readHTMLTable(htmlParse(txt, asText = TRUE))

The function readHTMLTable() doesn't use RCurl and doesn't handle cookies.

 D.

On 6/7/12 7:33 AM, mdvaan wrote:

Hi, I am trying to access a website and read its content. The website is a restricted-access website that I access through a proxy server (which therefore requires me to enable cookies). I have problems in allowing RCurl to receive and send cookies. The following lines give me:

  library(RCurl)
  library(XML)
  url <- "http://www.theurl.com"
  content <- readHTMLTable(url)
  content
  $`NULL`
                   V1
  1
  2  Cookies disabled
  3
  4  Your browser currently does not accept cookies. Cookies need to be enabled for Scopus to function properly. Please enable session cookies in your browser and try again.
  $`NULL`
     V1 V2 V3
  1
  $`NULL`
                   V1
  1  Cookies disabled
  $`NULL`
     V1
  1
  2
  3

I have carefully read section 4.4 from this: http://www.omegahat.org/RCurl/RCurlJSS.pdf and tried the following without success:

  curl <- getCurlHandle()
  curlSetOpt(cookiejar = 'cookies.txt', curl = curl)

Any suggestions on how to allow for cookies? Thanks.

Math

--
View this message in context: http://r.789695.n4.nabble.com/How-to-set-cookies-in-RCurl-tp4632693.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] How to set cookies in RCurl
Apologies for following up on my own mail, but I forgot to explicitly mention that you will need to specify the appropriate proxy information in the call to getURLContent().

 D.

On 6/7/12 8:31 AM, Duncan Temple Lang wrote:

To just enable cookies and their management, use the cookiefile option, e.g.

  txt = getURLContent(url, cookiefile = "")

Then you can pass this to readHTMLTable(), best done as

  content = readHTMLTable(htmlParse(txt, asText = TRUE))

The function readHTMLTable() doesn't use RCurl and doesn't handle cookies.

 D.
Re: [R] using XML package to read RSS
Hi James. Yes, you need to identify the namespace in the query, e.g.

  getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes. (getNodeSet() is more convenient to use when you don't specify a function to apply to the nodes. Also, you don't need xmlRoot(doc), as the query works on the entire document with "//".) BTW, you want to use xmlParse() and not xmlTreeParse().

 D.

On 5/16/12 6:40 PM, J Toll wrote:

Hi, I'm trying to use the XML package to read an RSS feed. To get started, I was trying to use this post as an example: http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/ I can replicate the beginning section of the post, but when I try to use another RSS feed I have an issue. The RSS feed I would like to use is:

  URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"
  library(XML)
  doc <- xmlTreeParse(URL)
  src <- xpathApply(xmlRoot(doc), "//entry")

I get an empty list rather than a list of each entry:

  src
  list()
  attr(,"class")
  [1] "XMLNodeSet"

I'm not sure how to fix this. Any suggestions? Do I need to provide a namespace, or is the RSS malformed?

Thanks,

James
Re: [R] Scraping a web page.
Hi Keith

Of course, it doesn't necessarily matter how you get the job done if it actually works correctly. But for a general approach, it is useful to use general tools, which can lead to more correct, more robust, and more maintainable code. Since htmlParse() in the XML package can both retrieve and parse the HTML document,

  doc = htmlParse(the.url)

is much more succinct than using curlPerform(). However, if you want to use RCurl, just use

  txt = getURLContent(the.url)

and that replaces

  h = basicTextGatherer()
  curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
  h$value()

If you have parsed the HTML document, you can find the "a" nodes that have an href attribute starting with /en/Ships via

  hrefs = unlist(getNodeSet(doc, "//a[starts-with(@href, '/en/Ships')]/@href"))

The result is a character vector, and you can extract the relevant substrings with substring() or gsub() or any wrapper of those functions. There are many benefits to parsing the HTML, including not falling foul of "as far as I can tell the a tag is always on its own line" being not true.

 D.

On 5/15/12 4:06 AM, Keith Weintraub wrote:

Thanks, that was very helpful. I am using readLines and grep. If grep isn't powerful enough I might end up using the XML package, but I hope that won't be necessary. Thanks again,

KW

--

On May 14, 2012, at 7:18 PM, J Toll wrote:

On Mon, May 14, 2012 at 4:17 PM, Keith Weintraub kw1...@gmail.com wrote:

Folks, I want to scrape a series of web-page sources for strings like the following:

  /en/Ships/A-8605507.html
  /en/Ships/Aalborg-8122830.html

which appear in an href inside an "a" tag inside a "div" tag inside a table. In fact, all I want is the (exactly) 7-digit number before ".html". The good news is that, as far as I can tell, the "a" tag is always on its own line, so some kind of line-by-line grep should suffice once I figure out the following: what is the best package/command to use to get the source of a web page?
I tried using something like:

  if (url.exists("http://www.omegahat.org/RCurl")) {
    h = basicTextGatherer()
    curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
    # Now read the text that was cumulated during the query response.
    h$value()
  }

which works, except that I get one long streamed html doc without the line breaks.

You could use:

  h <- readLines("http://www.omegahat.org/RCurl")

-- or --

  download.file(url = "http://www.omegahat.org/RCurl", destfile = "tmp.html")
  h = scan("tmp.html", what = "", sep = "\n")

and then use grep or the XML package for processing.

HTH

James
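The gsub() step Duncan alludes to above can be sketched as follows (the example href values are taken from the original question; in practice `hrefs` would come from the getNodeSet() call in his reply):

```r
# Example values from the thread; normally hrefs comes from getNodeSet()
hrefs <- c("/en/Ships/A-8605507.html", "/en/Ships/Aalborg-8122830.html")

# Keep only the 7-digit number immediately before ".html":
# ".*-" greedily consumes up to the last hyphen, the group captures
# exactly seven digits, and the anchor pins ".html" at the end.
ids <- gsub(".*-([0-9]{7})\\.html$", "\\1", hrefs)
ids
# "8605507" "8122830"
```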
Re: [R] how to download data from soap server using R
There is a kegg package available from the BioConductor repository. Also, you can generate an interface via the SSOAP package:

  library(SSOAP)
  w = processWSDL("http://soap.genome.jp/KEGG.wsdl")
  iface = genSOAPClientInterface(, w)
  iface@functions$list_databases()

 D.

On 5/6/12 3:01 AM, sagarnikam123 wrote:

I don't know perl, but on the server site they give soap:lite using perl; go to http://www.kegg.jp/kegg/soap/doc/keggapi_manual.html I want to download data from the kegg server using R only. How do I proceed? What is meant by a SOAP client driver? Also go to http://soap.genome.jp/KEGG.wsdl

--
View this message in context: http://r.789695.n4.nabble.com/how-to-download-data-from-soap-server-using-R-tp4612595.html
Sent from the R help mailing list archive at Nabble.com.
Re: [R] readHTLMTable help
Hi Lucas

The HTML page is formatted by using tables in each of the cells of the top-most table. As a result, the simple table is much more complex. readHTMLTable() is intended for quick and easy tables. For tables such as this, you have to implement more customized processors.

  doc = htmlParse("http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980")
  tb = getNodeSet(doc, "//table")[[1]]

This gives the top-most table. xmlSize(tb) tells us the number of rows. We want to skip the first 3 to get to the data. Then, in each of the remaining rows, you can process the cells that have the data. And the details go on.

 D.

On 3/27/12 10:57 AM, Lucas wrote:

Hello to everyone. I'm using this function to download some information from a website. This is the URL: http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980 If you go to that website you'll find a table with meteorological information. One column is called "Intensidad Máxima Diaria", and that is the one I need. I've been trying to extract that column, but I'm unable to do it. First I tried simply downloading the complete table and then doing some kind of filter to extract the column, but for some reason, when I call the function a <- readHTMLTable(url), the table is downloaded in an unfriendly format and I cannot differentiate the column. If anyone could help me I'll appreciate it. Thank you.

Lucas.
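Continuing the sketch in the reply above, the per-row processing might look like the following (hedged: the row offset of 3 comes from the reply, but the assumption that data cells are plain td nodes directly under tr children is mine, and the page must be reachable):

```r
library(XML)

doc <- htmlParse("http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980")
tb  <- getNodeSet(doc, "//table")[[1]]

# Skip the first 3 (header) rows of the top-most table, then collect
# the text of every cell in each remaining row as a character vector
rows <- getNodeSet(tb, ".//tr")[-(1:3)]
vals <- lapply(rows, function(r) xpathSApply(r, ".//td", xmlValue))
```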
Re: [R] SSOAP and Chemspider: Security token?
Hi Michael

Thanks for the report and for digging into the actual XML documents that are sent. It turns out that if I remove the redundant namespace definitions and just use a single one on the SimpleSearch node, all is apparently fine. I've put a pre-release version of the SSOAP package that does that at http://www.omegahat.org/Prerelease/SSOAP_0.9-1.tar.gz You can try that. I'll release this version when I also fix the issue with XMLSchema that causes the error in genSOAPClientInterface(). BTW, the if(!is.character(token)) in the example in chemSpider.R is an error - a mixture of !is.null() and then checking only if it is a character.

 Best, Duncan

On 3/7/12 4:58 AM, Stravs, Michael wrote:

Dear community,

has anyone managed to get SSOAP working with the ChemSpider Web APIs, using functions which need the security token? I use SSOAP 0.9-0 from the OmegaHat repository. In the example code from SSOAP there is a sample which uses a token function. Interestingly, it checks if(!is.character(token)) first (and proceeds if the token is NOT character). I can't test that function since I have no idea how to get the token into non-character form :) My code:

  library(SSOAP)
  chemspider_sectoken <- ---- # (token was here)
  cs <- processWSDL("http://www.chemspider.com/Search.asmx?WSDL")
  # intf <- genSOAPClientInterface(, cs)
  # (this fails, see below. The Mass Spec API is correctly parsed. Therefore by hand:)
  csidlist <- .SOAP(server = cs@server, method = "SimpleSearch",
                    .soapArgs = list(query = "Azithromycin", token = token),
                    action = I("http://www.chemspider.com/SimpleSearch"),
                    xmlns = c("http://www.chemspider.com/"))

  Error: Error occurred in the HTTP request: Unauthorized web service usage. Please request access to this service. --- Unauthorized web service usage. Please request access to this service.

If one looks into the request, the doc seems to be correct:

  <?xml version="1.0"?>
  <SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <SOAP-ENV:Body>
      <ns:SimpleSearch xmlns:ns="http://www.chemspider.com/">
        <ns:query xmlns:ns="http://www.chemspider.com/" xsi:type="xsd:string">Azithromycin</ns:query>
        <ns:token xmlns:ns="http://www.chemspider.com/" xsi:type="xsd:string">----</ns:token>
      </ns:SimpleSearch>
    </SOAP-ENV:Body>
  </SOAP-ENV:Envelope>

Compared to the sample request from the ChemSpider homepage:

  <?xml version="1.0" encoding="utf-8"?>
  <soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
      <SimpleSearch xmlns="http://www.chemspider.com/">
        <query>string</query>
        <token>string</token>
      </SimpleSearch>
    </soap:Body>
  </soap:Envelope>

To me, both look like perfectly fine XML and should be identical to the ChemSpider XML parser. Do I need to write the token in another format? Is something else wrong? I also tried to use .literal = T, to no avail:

  Error occurred in the HTTP request: Empty query

Best regards,
-Michael

PS: the error message from the genSOAPClientInterface call is:

  Note: Method with signature ClassDefinition#list chosen for function 'resolve', target signature ExtendedClassDefinition#SchemaCollection.
  SOAPType#SchemaCollection would also be valid
  Error in makePrototypeFromClassDef(properties, ClassDef, immediate, where) :
    'name' must be a non-null character string
  In addition: Warning message:
  undefined slot classes in definition of ExactStructureSearchOptions: NA (class EMatchType)
Re: [R] RCurl format
Hi KTD Services (!)

I assume by DELETE you mean the HTTP method, and not the value of a parameter named _method that is processed by the URL script. If that is the case, then you want to use the customrequest option for the libcurl operation, and you don't need or want to use postForm(). Either

  curlPerform(url = url, customrequest = "DELETE", userpwd = "user:password")

or, with a recent version of the RCurl package,

  httpDELETE(url, userpwd = "user:password")

The parameter _method you are using is being passed on to the form script. It is not recognized by postForm() as being something controlling the request, but just part of the form submission.

 D.

On 1/30/12 2:55 AM, KTD Services wrote:

I am having trouble with the postForm function in RCurl. I want to send the command DELETE https://somewebsite.com.json but I can't seem to manage it. I could try:

  postForm(url, _method=DELETE, .opts = list(username:password))

but I get the error:

  Error: unexpected input in postForm(url4, _

This error seems to be due to the underscore _ before method. Any ideas how I can do a DELETE command another way in RCurl? Thanks.
Re: [R] Getting htmlParse to work with Hebrew? (on windows)
With some off-line interaction and testing by Tal, the latest version of the XML package (3.9-4) should resolve these issues. So the encoding from the document is used in more cases as the default. It is often important to specify the encoding for HTML files in the call to htmlParse(), and to use "UTF-8" rather than the lower case. I'll add code to make this simpler when I get a chance. Thanks Tal.

 D.

On 1/30/12 5:35 AM, Tal Galili wrote:

Hello dear R-help mailing list.

I wish to be able to have htmlParse work well with Hebrew, but it keeps scrambling the Hebrew text in pages I feed into it. For example:

  # why can't I parse the Hebrew correctly?
  library(RCurl)
  library(XML)
  u = "http://humus101.com/?p=2737"
  a = getURL(u)
  a  # Here - the hebrew is fine.
  a2 <- htmlParse(a)
  a2 # Here it is a mess...

None of these seem to fix it:

  htmlParse(a, encoding = "utf-8")
  htmlParse(a, encoding = "iso8859-8")

This is my locale:

  Sys.getlocale()
  [1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"

Any suggestions? Thanks up front,

Tal

Contact Details:---
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
--
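A minimal sketch of the advice above (an assumption on my part about how to combine the pieces, not code from the thread): pass the downloaded string as text and name the encoding explicitly, in upper case.

```r
library(RCurl)
library(XML)

# Fetch the page as a string, then parse it as text with an
# explicitly named, upper-case encoding
a  <- getURL("http://humus101.com/?p=2737")
a2 <- htmlParse(a, asText = TRUE, encoding = "UTF-8")
```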
Re: [R] Custom XML Readers
In addition to the general tools of the XML package, I also had code that reads documents with a similar structure to the ones Andy illustrated. I put them, and simple examples of using them, at the bottom of the http://www.omegahat.org/RSXML/ page.

 D.

On 12/23/11 5:50 PM, Ben Tupper wrote:

Hi Andy,

On Dec 23, 2011, at 2:51 PM, pl.r...@gmail.com wrote:

I need to construct a custom XML reader; the files I'm working with are in a funky XML format:

  <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>

I want to read the file so it looks like:

  author = Paul H
  country = USA
  created_date = 2010-02-16

Does anyone know how to go about this problem, or know of good references I could access?

Have you tried Duncan Temple Lang's XML package for R? It works very well for parsing and building XML formatted data. http://www.omegahat.org/RSXML/

Cheers, Ben

Thanks, Andy

--
View this message in context: http://r.789695.n4.nabble.com/Custom-XML-Readers-tp4229614p4229614.html
Sent from the R help mailing list archive at Nabble.com.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine 04575-0475
http://www.bigelow.org
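For the format shown in this thread, a hedged sketch with the XML package (my own illustration, not the code Duncan posted on the RSXML page): map each element's "name" attribute to its text value, which yields the desired name = value pairs as a named character vector.

```r
library(XML)

# Inline copy of the funky format, wrapped in a root element so it
# parses as a well-formed document
doc <- xmlParse('<doc>
  <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>
</doc>', asText = TRUE)

# Values come from the element text, names from the "name" attribute
vals <- xpathSApply(doc, "//*[@name]", xmlValue)
names(vals) <- xpathSApply(doc, "//*[@name]", xmlGetAttr, "name")
vals
```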
Re: [R] Text Mining with Facebook Reviews (XML and FQL)
Hi Kenneth First off, you probably don't need to use xmlParseDoc(), but rather xmlParse(). (Both are fine, but xmlParseDoc() allows you to control many of the options in the libxml2 parser, which you don't need here.) xmlParse() has some capabilities to fetch the content of URLs. However, it cannot deal with HTTPS requests which this call to facebook is. The approach to this is to i) make the request ii) parse the resulting string via xmlParse(txt, asText = TRUE) As for i), there are several ways to do this, but the RCurl package allows you to do it entirely within R and gives you more control over the request than you would ever want. library(RCurl) txt = getForm('https://api.facebook.com/method/fql.query', query = QUERY) mydata.xml = xmlParse(txt, asText = TRUE) However, you are most likely going to have to login / get a token before you make this request. And then, if you are using RCurl, you will want to use the same curl object with the token or cookies, etc. D. On 10/10/11 3:52 PM, Kenneth Zhang wrote: Hello, I am trying to use XML package to download Facebook reviews in the following way: require(XML) mydata.vectors - character(0) Qword - URLencode('#IBM') QUERY - paste('SELECT review_id, message, rating from review where message LIKE %',Qword,'%',sep='') Facebook_url = paste('https://api.facebook.com/method/fql.query?query= ',QUERY,sep='') mydata.xml - xmlParseDoc(Facebook_url, asText=F) mydata.vector - xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue, namespaces =c('s'='http://www.w3.org/2005/Atom')) The mydata.xml is NULL therefore no further step can be execute. I am not so familiar with XML or FQL. Any suggestion will be appreciated. Thank you! Best regards, Kenneth [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
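Putting Duncan's two steps together in one runnable form (the QUERY string is rebuilt from Kenneth's original post; the Facebook FQL endpoint has long since been retired, so this is purely the request-then-parse pattern):

```r
library(RCurl)
library(XML)

QUERY <- "SELECT review_id, message, rating FROM review WHERE message LIKE '%IBM%'"

# Step i: make the HTTPS request ourselves, since xmlParse() cannot:
txt <- getForm("https://api.facebook.com/method/fql.query", query = QUERY)

# Step ii: parse the string that came back:
mydata.xml <- xmlParse(txt, asText = TRUE)
```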

Re: [R] Add png image outside plot borders
Amelia You can persuade rasterImage() (and other functions) to draw outside of the data region using xpd = NA or xpd = TRUE. See the help for the par function. D. On 9/18/11 1:59 PM, Amelia McNamara wrote: If you run this, you'll see that I have some text at the bottom, but the logo is within the plot borders. plot(c(1.1, 2.3, 4.6), c(2.0, 1.6, 3.2), ylab=, xlab=) mtext(X axis label, side=1, line=3) mtext(Copyright statement, side=1, line=4, adj=0, cex=0.7) library(png) z - readPNG(Cc.logo.circle.png) rasterImage(z, 1, 1.6, 1.2, 1.7) I've tried doing things like rasterImage(z, 1, 0.5, 1.2, 1) but nothing shows up. The documentation for rasterImage() says that the corner values have to be within the plot region. As I said before, I want the logo to be down on the level of my copyright text, outside the plot region. Thanks! On Sun, Sep 18, 2011 at 1:26 PM, Joshua Wiley jwiley.ps...@gmail.com wrote: Hi Amelia, Can you give an example (using text where you want the CC is fine)? Two angles I would try would be A) changing the regions or related but more flexible (and hence complex) B) use grid of course if you're making these with, say, ggplot2, you're already in grid (but then mtext probably would not work, though I have not tried it offhand). Anyway, an example (code please, not just the picture), will clear up all these questions and we can offer a solution tailored to what you are doing. Cheers, Josh On Sun, Sep 18, 2011 at 1:18 PM, Amelia McNamara amelia.mcnam...@stat.ucla.edu wrote: I am trying to add a copyright disclaimer outside the plot borders of some images I have created. I can use mtext() to add the written portion, but I would like to have the Creative Commons license image (http://en.wikipedia.org/wiki/File:Cc.logo.circle.svg) before the text. I've found that I can plot a .png image inside the plot boundaries using rasterImage() but I can't figure out how to do it outside the boundaries. Any help would be great. 
If you know unicode or Adobe Symbol encoding for the CC logo, that might work too. ~Amelia McNamara Statistics PhD student, UCLA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
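Duncan's xpd suggestion, applied to Amelia's own code (the PNG file name and the target coordinates are hers; adjust the corners to taste):

```r
library(png)

plot(c(1.1, 2.3, 4.6), c(2.0, 1.6, 3.2), ylab = "", xlab = "")
mtext("X axis label", side = 1, line = 3)
mtext("Copyright statement", side = 1, line = 4, adj = 0, cex = 0.7)

z <- readPNG("Cc.logo.circle.png")

# xpd = NA disables clipping to the plot region, so rasterImage() can
# draw in the margin, down at the level of the copyright text:
par(xpd = NA)
rasterImage(z, 1, 0.5, 1.2, 1)
par(xpd = FALSE)   # restore default clipping for later plotting
```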
Re: [R] htmlParse hangs or crashes
Hi Simon Unfortunately, it works for me on my OS X machine. So I can't reproduce the problem. I'd be curious to know which version of libxml2 you are using. That might be the cause of the problem. You can find this with library(XML) libxmlVersion() You might install a more recent version (e.g. libxml = 2.07.0) You can send the info to me off list and we can try to resolve the problem. htmlParse() returns a reference to the internal C-level XML tree/document. When you print the value of the variable .x, we then serialize that C-level data structure to a string. htmlTreeParse(), by default, converts that C-level XML tree/document into regular R objects. So it traverses the tree and creates those R list()s before it returns and then throws the C-level tree away. D. On 9/5/11 2:48 PM, Simon Kiss wrote: Dear colleagues, each time I use htmlParse, R crashes or hangs. The url I'd like to parse is included below as is the results of a series of basic commands that describe what I'm experiencing. The results of sessionInfo() are attached at the bottom of the message. The thing is, htmlTreeParse appears to work just fine, although it doesn't appear to contain the information I need (the URLs of the articles linked to on this search page). Regardless, I'd still like to understand why htmlParse doesn't work. Thank you for any insight. 
Yours, Simon Kiss myurl-c(http://timesofindia.indiatimes.com/searchresult.cms?sortorder=scoresearchtype=2maxrow=10startdate=2001-01-01enddate=2011-08-25article=2pagenumber=1isphrase=noquery=IIMsearchfield=section=kdaterange=30date1mm=01date1dd=01date1=2001date2mm=08date2dd=25date2=2011;) .x-htmlParse(myurl) class(.x) #returns HTMLInternalDocument XMLInternalDocument .x #returns *** caught segfault *** address 0x1398754, cause 'memory not mapped' Traceback: 1: .Call(RS_XML_dumpHTMLDoc, doc, as.integer(indent), as.character(encoding), as.logical(indent), PACKAGE = XML) 2: saveXML(from) 3: saveXML(from) 4: asMethod(object) 5: as(x, character) 6: cat(as(x, character), \n) 7: print.XMLInternalDocument(pointer: 0x11656d3e0) 8: print(pointer: 0x11656d3e0) Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] XML_3.4-0 RCurl_1.5-0bitops_1.0-4.1 * Simon J. Kiss, PhD Assistant Professor, Wilfrid Laurier University 73 George Street Brantford, Ontario, Canada N3T 2C9 Cell: +1 905 746 7606 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] R hangs after htmlTreeParse
Hi Simon I tried this on OS X, Linux and Windows and it works without any problem. So there must be some strange interaction with your configuration. So below are some things to try in order to get more information about the problem. It would be more informative to give us the explicit version information about the packages, e.g. use sessionInfo(). Details are very important in cases like this. In addition the versions of the packages, it is also important to identify the version of libxml via the libxmlVersion() function. (Mine is 2.07.03. Yours may still be in the 2.6.16 region. I can't recall the defaults on OS X 10.6.) Are you doing this in a GUI or at the command-line? If the former, try the latter, i.e. run the commands in a terminal and see if that changes anything, e.g. if any characters are causing problems. Since you are seeing some of the HTML document appear on the console, the problem is in the implicit call to print when after the call to htmlTreeParse(). The problem is likely to be delayed if you assign the result of htmlTreeParse() to a variable and do not induce this call to print(). Then you can explore the tree and see if it is corrupted in some way. Furthermore, you might use htmlParse(). It returns the tree in a very different form, but which can be manipulated with the same R functions, and also XPath queries. I very rarely (i.e. never) use htmlTreeParse() anymore. D. 
On 8/25/11 8:41 AM, Simon Kiss wrote: Dear colleagues, I'm trying to parse the html content from this webpage: http://timesofindia.indiatimes.com/searchresult.cms?sortorder=scoresearchtype=2maxrow=10startdate=2001-01-01enddate=2011-08-25article=2pagenumber=1isphrase=noquery=IIMsearchfield=section=kdaterange=30date1mm=01date1dd=01date1=2001date2mm=08date2dd=25date2=2011 Using the following code library(RCurl) library(XML) myurl-c(http://timesofindia.indiatimes.com/searchresult.cms?sortorder=scoresearchtype=2maxrow=10startdate=2001-01-01enddate=2011-08-25article=2pagenumber=1isphrase=noquery=IIMsearchfield=section=kdaterange=30date1mm=01date1dd=01date1=2001date2mm=08date2dd=25date2=2011;) .x-getURL(myurl) htmlTreeParse(.x, asText=T) This prints approximately 15 lines of the output from the html document and then mysteriously stops. The command line prompt does not reappear and force quit is the only option. I'm running R 2.13 on Mac os 10.6 and the latest versions of XML and RCURL are installed. Yours, Simon Kiss __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
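Following the diagnostics Duncan lists — report libxmlVersion(), assign the parse result instead of printing it, and prefer htmlParse() over htmlTreeParse() — a sketch looks like this (the Times of India URL is shortened here; use Simon's full query string):

```r
library(RCurl)
library(XML)

libxmlVersion()   # report this when filing the problem

myurl <- "http://timesofindia.indiatimes.com/searchresult.cms"  # full query string omitted

x <- getURL(myurl)
doc <- htmlParse(x, asText = TRUE)   # assign; don't trigger the implicit print()

# Query the tree with XPath instead of printing it, e.g. the article links:
links <- xpathSApply(doc, "//a/@href")
```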
Re: [R] convert an xml object into a list on R 2.13
Hi Samuel The xmlToList() function is still in the XML package. I suspect you are making some simple mistake, like not loading the XML package, not having installed it, or not capitalizing the name of the package correctly (you refer to the xml package rather than by its actual name, XML). You haven't told us about your operating system or the output of sessionInfo(). We don't even know which version of the XML package you are using. D. On 8/16/11 8:52 AM, Samuel Le wrote: Hi, I am manipulating xml objects using the package xml. With version 2.10.1 this package included the function xmlToList that converted the xml into a list straight away. This function seems to have gone when I moved to 2.13.0. Does someone have an equivalent for it? Thanks, Samuel [[alternative HTML version deleted]]
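A minimal check that xmlToList() is indeed still there (the tiny document is invented for illustration):

```r
library(XML)   # note: the package name is "XML", capitalized

txt <- "<config><user>sam</user><n>10</n></config>"
doc <- xmlParse(txt, asText = TRUE)
xmlToList(doc)
# returns an R structure with components "user" and "n"
```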
Re: [R] Reading XML files masquerading as XL files
Hi Dennis That those files are in a directory/folder suggests that they were extracted from their zip (.xlsx) file. The following are the basic contents of the .xlsx file 1484 02-28-11 12:48 [Content_Types].xml 733 02-28-11 12:48 _rels/.rels 972 02-28-11 12:48 xl/_rels/workbook.xml.rels 846 02-28-11 12:48 xl/workbook.xml 940 02-28-11 12:48 xl/styles.xml 1402 02-28-11 12:48 xl/worksheets/sheet2.xml 7562 02-28-11 12:48 xl/theme/theme1.xml 1888 02-28-11 12:48 xl/worksheets/sheet1.xml 470 02-28-11 12:48 xl/sharedStrings.xml 196 02-28-11 12:48 xl/calcChain.xml 21316 02-28-11 12:48 docProps/thumbnail.jpeg 629 02-28-11 12:48 docProps/core.xml 828 02-28-11 12:48 docProps/app.xml If most of these are present, I would explore whether the sender could give them to you without unzipping them or make sure that your software isn't automatically unzipping them for you. Note that not all files in the .xlsx are sheets and the WorkSheet is the basic entity that corresponds to a .csv file. The xlsx package and my REXcelXML packages will probably get you a fair bit of the way in extracting the content, but they probably will need some tinkering since they expect the different components to be in a zip archive. There is also an office2010 package which seems to have an overlap with what is in xlsx, and ROOXML, RWordXML and RExcelXML. D. On 8/10/11 7:26 AM, Dennis Fisher wrote: R version 2.13.1 OS X (or Windows) Colleagues, I received a number of files with a .xls extension. These files open in XL and, by all appearances, are XL files. However, it appears to me that the files are actually XML: readLines(dir()[16])[1:10] [1] ?xml version=\1.0\? 
[2] Workbook xmlns=\urn:schemas-microsoft-com:office:spreadsheet\ [3] xmlns:o=\urn:schemas-microsoft-com:office:office\ [4] xmlns:x=\urn:schemas-microsoft-com:office:excel\ [5] xmlns:ss=\urn:schemas-microsoft-com:office:spreadsheet\ [6] xmlns:html=\http://www.w3.org/TR/REC-html40\; [7] DocumentProperties xmlns=\urn:schemas-microsoft-com:office:office\ [8] Version12.0/Version [9] /DocumentProperties [10] OfficeDocumentSettings xmlns=\urn:schemas-microsoft-com:office:office\ I had initially tried to read the files using read.xls (gdata) but that failed (not surprisingly). I could open each Excel file, then save as csv, then use read.csv. However, there are many files so I would love to have a solution that does not require this brute force approach. Are there any packages that would allow me to read these files without the additional steps? Dennis Dennis Fisher MD P (The P Less Than Company) Phone: 1-866-PLessThan (1-866-753-7784) Fax: 1-866-PLessThan (1-866-753-7784) www.PLessThan.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
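Dennis's files are in the older Excel 2003 SpreadsheetML dialect (note the urn:schemas-microsoft-com:office:spreadsheet namespace in his readLines() output), which the XML package can read directly. A sketch, assuming a hypothetical file name; turning the flat cell vector back into rows and columns depends on the actual sheet layout:

```r
library(XML)

doc <- xmlParse("file.xls")   # despite the extension, the content is XML

ns <- c(ss = "urn:schemas-microsoft-com:office:spreadsheet")

# Pull the text of every data cell; reshape afterwards as needed:
cells <- xpathSApply(doc, "//ss:Worksheet//ss:Row/ss:Cell/ss:Data",
                     xmlValue, namespaces = ns)
```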
Re: [R] SSOAP chemspider
Hi Paul I've been gradually filling in the XMLSchema packages for different cases that arise. My development versions of SSOAP and XMLSchema get a long way further and I have been trying to find time to finish them off. Fortunately, it is on my todo list for the next few weeks. I have released new (source) versions of the packages (XMLSchema 0.2-0 and SSOAP 0.6-0) on the Omegahat repository. These succeed in the genSOAPClientInterface(, processWSDL( url )) for each of the 3 WSDLs in your email Also, there are numerous WSDLs in the source of the package and also mentioned in the Todo.xml file and the code works for almost all of those. Thanks for the report D. On 8/2/11 9:10 AM, Benton, Paul wrote: Has anyone got SSOAP working on anything besides KEGG? I just tried another 3 SOAP servers. Both the WSDL and constructing the .SOAP call. Again the perl and ruby interface worked without any hitches. Paul library(SSOAP) massBank-processWSDL(http://www.massbank.jp/api/services/MassBankAPI?wsdl;) Error in parse(text = paste(txt, collapse = \n)) : text:1:29: unexpected input 1: function(x, ..., obj = new( ‚ ^ In addition: Warning message: In processWSDL(http://www.massbank.jp/api/services/MassBankAPI?wsdl;) : Ignoring additional serviceport ... elements metlin-processWSDL(http://metlin.scripps.edu/soap/metlin.wsdl;) Error in parse(text = paste(txt, collapse = \n)) : text:1:29: unexpected input 1: function(x, ..., obj = new( ‚ ^ pubchem-processWSDL(http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl;) Error in parse(text = paste(txt, collapse = \n)) : text:1:29: unexpected input 1: function(x, ..., obj = new( ‚ ^ On 20 Jul 2011, at 01:54, Benton, Paul wrote: Dear all, I've been trying on and off for the past few months to get SSOAP to work with chemspider. 
First I tried the WSDL file: cs-processWSDL(http://www.chemspider.com/MassSpecAPI.asmx?WSDL;) Error in parse(text = paste(txt, collapse = \n)) : text:1:29: unexpected input 1: function(x, ..., obj = new( ‚ ^ In addition: Warning message: In processWSDL(http://www.chemspider.com/MassSpecAPI.asmx?WSDL;) : Ignoring additional serviceport ... elements Next I've tried using just the pure .SOAP to call the database. s - SOAPServer(http://www.chemspider.com/MassSpecAPI.asmx;) csid- .SOAP(s, SearchByMass2, mass=89.04767, range=0.01, action = I(http://www.chemspider.com/SearchByMass2;), xmlns = c(http://www.chemspider.com;), .opts = list(verbose = TRUE)) This seems to work and gives back a result. However, this result isn't the right result. It's seems to have converted the mass into 0. When I run the similar program in perl I get the correct id's. So this isn't a server side problem but SSOAP. Any thoughts or suggestions on other packages to use? Further infomation about the SeachByMass2 method and it's xml that it's expecting. http://www.chemspider.com/MassSpecAPI.asmx?op=SearchByMass2 Cheers, Paul PS Placing a fake error in the .SOAP code I can look at the xml it's sending to the server: Browse[1] doc ?xml version=1.0? SOAP-ENV:Envelope xmlns:SOAP-ENC=http://schemas.xmlsoap.org/soap/encoding/; xmlns:SOAP-ENV=http://schemas.xmlsoap.org/soap/envelope/; xmlns:xsi=http://www.w3.org/2001/XMLSchema-instance; xmlns:xsd=http://www.w3.org/2001/XMLSchema; SOAP-ENV:encodingStyle=http://schemas.xmlsoap.org/soap/encoding/; SOAP-ENV:Body ns:SearchByMass2 xmlns:ns=http://www.chemspider.com; ns:mass89.04767/ns:mass ns:range0.01/ns:range /ns:SearchByMass2 /SOAP-ENV:Body /SOAP-ENV:Envelope __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] reading data from password protected url
Hi Steve RCurl can help you when you need to have more control over Web requests. The details vary from Web site to Web site and the different ways to specify passwords, etc. If the JSESSIONID and NCES_JSESSIONID are regular cookies and returned in the first request as cookies, then you can just have RCurl handle the cookies But the basics for your case are library(RCurl) h = getCurlHandle( cookiefile = ) Then make your Web request using getURLContent(), getForm() or postForm() but making certain to pass the curl handle stored in h in each call, e.g. ans = getForm(yourURL, login = bob, password = jane, curl = h) txt = getURLContent(dataURL, curl = h) If JSESSIONID and NCES_JSESSIONID are not returned as cookies but HTTP header fields, then you need to process the header. Something like rdr = dynCurlReader(h) ans = getForm(yourURL, login = bob, password = jane, curl = h, header = rdr$update) Then the header from the HTTP response is available as rdr$header() and you can use parseHTTPHeader(rdr$header()) to convert it into a named vector. HTH, D. On 6/24/11 2:12 PM, Steven R Corsi wrote: I am trying to retrieve data from a password protected database. I have login information and the proper url. When I make a request to the url, I get back some info, but need to read the hidden header information that has JSESSIONID and NCES_JSESSIONID. They need to be used to set cookies before sending off the actual url request that will result in the data transfer. Any help would be much appreciated. Thanks Steve __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
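The cookie-handling case Duncan describes first, as one self-contained sketch. The URL, login and password field names are placeholders — they must match whatever Steve's site actually expects:

```r
library(RCurl)

# An empty cookiefile turns on libcurl's cookie engine, so any cookies set
# by the login response are replayed on later requests using this handle:
h <- getCurlHandle(cookiefile = "")

ans <- getForm("https://example.org/login",
               login = "bob", password = "jane", curl = h)

# Same handle, so JSESSIONID etc. travel along automatically:
txt <- getURLContent("https://example.org/data", curl = h)
```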
Re: [R] read.csv fails to read a CSV file from google docs
Thanks David for fixing the early issues. The reason for the failure is that the response from the Web server is a to redirect the requester to another page, specifically https://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv Note that this is https, not http, and the built-in URL reading facilities in R don't suport https. One way to see this is to use look at the headers in your browser (e.g. Live HTTP Headers), or to use curl, or the RCurl package tt = getForm(http://spreadsheets0.google.com/spreadsheet/pub;, hl =en, key = 0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE, single = true, gid =0, output = csv, .opts = list(followlocation = TRUE, verbose = TRUE)) The verbose option shows the entire dialog, and tt contains the text of the CSV document. read.csv(textConnection(tt)) then yields the data frame D. On 4/29/11 10:36 AM, David Winsemius wrote: On Apr 29, 2011, at 11:19 AM, Tal Galili wrote: Hello all, I wish to use read.csv to read a google doc spreadsheet. I try using the following code: data_url - http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv read.csv(data_url) Which results in the following error: Error in file(file, rt) : cannot open the connection I'm on windows 7. And the code was tried on R 2.12 and 2.13 I remember trying this a few months ago and it worked fine. I am always amused at such claims. Occasionally they are correct, but more often a crucial step has been omitted. In this case you have at a minimum embedded line-feeds in your URL string and have not established a connection, so it could not possibly have succeeded as presented. But now it's time to admit I do not know why it is not succeeding when I correct those flaws. 
closeAllConnections() data_url - url(http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv;) read.csv(data_url) Error in open.connection(file, rt) : cannot open the connection closeAllConnections() dd - read.csv(con - url(http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv;)) Error in open.connection(file, rt) : cannot open the connection So, I guess I'm not reading the help pages for `url` and `read.csv` as well I thought I was. Any suggestion what might be causing this or how to solve it? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] read.csv fails to read a CSV file from google docs
Hi Tal You can add ssl.verifypeer = FALSE in the .opts list so that the certificate is simply accepted. Alternatively, you can tell libcurl where to find the certification authority file containing signatures. This can be done via the cainfo option, e.g. cainfo = system.file(CurlSSL, cacert.pem, package = RCurl), Often such a collection of certificates is installed with the ssl library. D. On 4/29/11 2:42 PM, Tal Galili wrote: Hello Duncan, Thank you for having a look at this. I tried the code you provided but it failed in the getForm stage. running this: tt = getForm(http://spreadsheets0.google.com/spreadsheet/pub;, + hl =en, key = 0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE, + single = true, gid =0, + output = csv, + .opts = list(followlocation = TRUE, verbose = TRUE)) Resulted in the following error: Error in curlPerform(url = url, headerfunction = header$update, curl = curl, : SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed Did I miss some step? Contact Details:--- Contact me: tal.gal...@gmail.com mailto:tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com http://www.talgalili.com (Hebrew) | www.biostatistics.co.il http://www.biostatistics.co.il (Hebrew) | www.r-statistics.com http://www.r-statistics.com (English) -- On Fri, Apr 29, 2011 at 9:18 PM, Duncan Temple Lang dun...@wald.ucdavis.edu mailto:dun...@wald.ucdavis.edu wrote: Thanks David for fixing the early issues. 
The reason for the failure is that the response from the Web server is a to redirect the requester to another page, specifically https://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv https://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv Note that this is https, not http, and the built-in URL reading facilities in R don't suport https. One way to see this is to use look at the headers in your browser (e.g. Live HTTP Headers), or to use curl, or the RCurl package tt = getForm(http://spreadsheets0.google.com/spreadsheet/pub;, hl =en, key = 0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE, single = true, gid =0, output = csv, .opts = list(followlocation = TRUE, verbose = TRUE)) The verbose option shows the entire dialog, and tt contains the text of the CSV document. read.csv(textConnection(tt)) then yields the data frame D. On 4/29/11 10:36 AM, David Winsemius wrote: On Apr 29, 2011, at 11:19 AM, Tal Galili wrote: Hello all, I wish to use read.csv to read a google doc spreadsheet. I try using the following code: data_url - http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv read.csv(data_url) Which results in the following error: Error in file(file, rt) : cannot open the connection I'm on windows 7. And the code was tried on R 2.12 and 2.13 I remember trying this a few months ago and it worked fine. I am always amused at such claims. Occasionally they are correct, but more often a crucial step has been omitted. In this case you have at a minimum embedded line-feeds in your URL string and have not established a connection, so it could not possibly have succeeded as presented. 
But now it's time to admit I do not know why it is not succeeding when I correct those flaws. closeAllConnections() data_url - url(http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv) read.csv(data_url) Error in open.connection(file, rt) : cannot open the connection closeAllConnections() dd - read.csv(con - url(http://spreadsheets0.google.com/spreadsheet/pub?hl=enhl=enkey=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REEsingle=truegid=0output=csv http://spreadsheets0.google.com/spreadsheet/pub?hl
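Duncan's two SSL fixes, combined with the getForm() call from earlier in the thread (the Google Spreadsheets key is Tal's; the service endpoints have changed since 2011, so treat this as the pattern for any HTTPS redirect target):

```r
library(RCurl)

u <- "http://spreadsheets0.google.com/spreadsheet/pub"
key <- "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE"

# Option 1: skip certificate verification entirely:
tt <- getForm(u, hl = "en", key = key, single = "true", gid = "0",
              output = "csv",
              .opts = list(followlocation = TRUE, ssl.verifypeer = FALSE))

# Option 2 (preferred): point libcurl at a CA certificate bundle:
tt <- getForm(u, hl = "en", key = key, single = "true", gid = "0",
              output = "csv",
              .opts = list(followlocation = TRUE,
                           cainfo = system.file("CurlSSL", "cacert.pem",
                                                package = "RCurl")))

read.csv(textConnection(tt))
```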
Re: [R] RCurl and postForm()
Hi Ryan postForm() is using a different style (or specifically Content-Type) of submitting the form than the curl -d command. Switching the style = 'POST' uses the same type, but at a quick guess, the parameter name 'a' is causing confusion and the result is the empty JSON array - []. A quick workaround is to use curlPerform() directly rather than postForm() r = dynCurlReader() curlPerform(postfields = 'Archbishop Huxley', url = 'http://www.datasciencetoolkit.org/text2people', verbose = TRUE, post = 1L, writefunction = r$update) r$value() This yields [1] [{\gender\:\u\,\first_name\:\\,\title\:\archbishop\,\surnames\:\Huxley\,\start_index\:0,\end_index\:17,\matched_string\:\Archbishop Huxley\}] and you can use fromJSON() to transform it into data in R. D. On 4/29/11 12:14 PM, Elmore, Ryan wrote: Hi everybody, I think that I am missing something fundamental in how strings are passed from a postForm() call in R to the curl or libcurl functions underneath. For example, I can do the following using curl from the command line: $ curl -d Archbishop Huxley http://www.datasciencetoolkit.org/text2people; [{gender:u,first_name:,title:archbishop,surnames:Huxley,start_index:0,end_index:17,matched_string:Archbishop Huxley}] Trying the same thing, or what I *think* is the same thing (obvious not) in R (Mac OS 10.6.7, R 2.13.0) produces: library(RCurl) Loading required package: bitops api - http://www.datasciencetoolkit.org/text2people; postForm(api, a=Archbishop Huxley) [1] [{\gender\:\u\,\first_name\:\\,\title\:\archbishop\,\surnames\:\Huxley\,\start_index\:44,\end_index\:61,\matched_string\:\Archbishop Huxley\},{\gender\:\u\,\first_name\:\\,\title\:\archbishop\,\surnames\:\Huxley\,\start_index\:88,\end_index\:105,\matched_string\:\Archbishop Huxley\}] attr(,Content-Type) charset text/html utf-8 I can match the result given on the DSTK API's website by using system(), but doesn't seem like the R-like way of doing something. 
system(curl -d 'Archbishop Huxley' 'http://www.datasciencetoolkit.org/text2people') 158 141 141 141 0[{gender:u,first_name:,title:archbishop,surnames:Huxley,start_index:0,end_index:17,matched_string:Archbishop Huxley}]17599 72 --:--:-- --:--:-- --:--:-- 670 If you want to see some additional information related to this question, I posted on StackOverflow a few days ago: http://stackoverflow.com/questions/5797688/post-request-using-rcurl I am working on this R wrapper for the data science toolkit as a way of illustrating how to make an R package for the Denver RUG and ran into this problem. Any help to this problem will be greatly appreciated by the Denver RUG! Cheers, Ryan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
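Duncan's curlPerform() workaround from the top of the thread, restated as a clean block. This mimics `curl -d` by posting the raw body with no form-field name; fromJSON() is from RJSONIO here (rjson would also work), and the Data Science Toolkit endpoint may no longer be online:

```r
library(RCurl)
library(RJSONIO)

r <- dynCurlReader()
curlPerform(url = "http://www.datasciencetoolkit.org/text2people",
            post = 1L,
            postfields = "Archbishop Huxley",   # raw body, like curl -d
            writefunction = r$update)

fromJSON(r$value())   # the matches, as R data
```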
Re: [R] Treatment of xml-stylesheet processing instructions in XML module
Hi Adam To use XPath and getNodeSet on an XML document, you will want to use xmlParse() and not xmlTreeParse() to parse the XML content. So t = xmlParse(I(a)) # or asText = TRUE elem = getNodeSet(t, /rss/channel/item)[[1]] works fine. You don't need to specify the root node, but rather the document in getNodeSet. Also, if you have the package loaded, you don't need the XML:: prefix before the function names. HTH D. On 4/6/11 11:32 AM, Adam Cooper wrote: Hello again, Another stumble here that is defeating me. I try: a-readLines(url(http://feeds.feedburner.com/grokin;)) t-XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE, asText=TRUE) elem- XML::getNodeSet(XML::xmlRoot(t),/rss/channel/item)[[1]] And I get: Start tag expected, '' not found Error: 1: Start tag expected, '' not found When I modify the second line in a to remove the following (just leaving the rss tag with its attributes), I do not get the error. I removed: ?xml-stylesheet type=\text/xsl\ media=\screen\ href= \/~d/styles/rss2full.xsl\??xml-stylesheet type=\text/css\ media= \screen\ href=\http://feeds.feedburner.com/~d/styles/itemcontent.css \? I would have expected the PI to be totally ignored by default. Have I missed something?? Thanks in advance... Cheers, Adam __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
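Adam's task end-to-end, following Duncan's advice — parse with xmlParse() (straight from the URL, so the xml-stylesheet processing instructions are handled by libxml2 and can simply be ignored) and pass the document, not the root node, to getNodeSet(). The feedburner feed may no longer exist:

```r
library(XML)

t <- xmlParse("http://feeds.feedburner.com/grokin")
item <- getNodeSet(t, "/rss/channel/item")[[1]]
xmlValue(item[["title"]])   # e.g. the first item's title
```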
Re: [R] Package XML: Parse Garmin *.tcx file problems
Hi Michael Almost certainly, the problem is that the document has a default namespace. You need to identify the namespace in the XPath query. xpathApply() endeavors to make this simple: xpathApply(doc2, "//x:TotalTimeSeconds", xmlValue, namespaces = "x") I suspect that will give you back something more meaningful. The x in the query (x:TotalTimeSeconds) is mapped to the URI given in namespaces, and since no URI is specified, we use the default namespace on the root node of the document. Some documents don't have a default namespace, and then you can use the prefix on the root node corresponding to the namespace of interest. D On 3/30/11 1:15 PM, Folkes, Michael wrote: I'm struggling with package XML to parse a Garmin file (named *.tcx). I wonder if its form is incomplete, but am reluctant to paste even a shortened version. The output below shows I can get nodes, but an attempt at the value of a single node comes up empty (even though there is data there). One question: Has anybody succeeded parsing Garmin .tcx (xml) files? Thanks! Michael ___ doc2 = xmlRoot(xmlTreeParse("HR.reduced3.tcx", useInternalNodes = TRUE)) xpathApply(doc2, "//*", xmlName) [[1]] [1] TrainingCenterDatabase [[2]] [1] Activities [[3]] [1] Activity [[4]] [1] Id [[5]] [1] Lap [[6]] [1] TotalTimeSeconds xpathApply(doc2, "//TotalTimeSeconds", xmlValue) list() __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
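The effect of a default namespace can be seen with a toy document (the namespace URI, element names, and value below are invented for illustration; a real .tcx file uses Garmin's schema URI): the unprefixed query matches nothing, while binding a prefix to the default namespace succeeds:

```r
library(XML)

tcx <- '<TrainingCenterDatabase xmlns="http://example.com/ns">
  <Activities>
    <Activity><Lap><TotalTimeSeconds>1924.45</TotalTimeSeconds></Lap></Activity>
  </Activities>
</TrainingCenterDatabase>'

doc <- xmlParse(tcx, asText = TRUE)

# Unqualified names never match elements that live in a default namespace
n0 <- length(getNodeSet(doc, "//TotalTimeSeconds"))

# Bind the prefix "x" to the root's default namespace and qualify the query
secs <- xpathSApply(doc, "//x:TotalTimeSeconds", xmlValue, namespaces = "x")
secs
```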
Re: [R] Scrap java scripts and styles from an html document
On 3/28/11 11:38 PM, antujsrv wrote: Hi, I am working on developing a web crawler in R and I needed some help with regard to removal of JavaScript and style sheets from the HTML document of a web page. I tried using the XML package, hence the function xpathApply: library(XML) txt = xpathApply(html, "//body//text()[not(ancestor::script)][not(ancestor::style)]", xmlValue) The output comes out as text lines, without any html tags. I want the html tags to remain intact and strip only the javascript and styles from it. Well then you would be best served to use that approach, i.e. find the nodes named script and style and then remove them from the tree. Then you have the document as a single object rather than a bunch of individual elements. So nodes = xpathApply(html, "//body//script | //body//style") removeNodes(nodes) saveXML(html) But you don't say what you want to end up with, what you are doing with the resulting content, or why you have to remove the JavaScript content, etc. D. Any help would be highly appreciated. Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/Scrap-java-scripts-and-styles-from-an-html-document-tp3413894p3413894.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
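A self-contained sketch of the node-removal approach (page content invented for illustration): the script and style nodes are excised in place, and the rest of the markup survives intact:

```r
library(XML)

page <- '<html><head><style>p { color: red }</style></head>
<body><p>keep me</p><script>alert("drop me")</script></body></html>'

doc <- htmlParse(page, asText = TRUE)

# Find the unwanted nodes and remove them from the tree in place
removeNodes(getNodeSet(doc, "//script | //style"))

out <- saveXML(doc)   # the whole document, minus script/style
```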
Re: [R] RCurl HTTP Post ?
On 2/17/11 3:54 PM, Hasan Diwan wrote: According to [1] and [2], using RCurl to post a form with basic authentication is done using the postForm method. I'm trying to post generated interpolation data from R onto an HTTP form. The call I'm using is page - postForm('http://our.server.com/dbInt/new', opts = curlOptions=(userpwd=test:test, verbose=T), profileid = -1, value=1.801, type=history). The page instance shows the HTTP response 500 screen and I get a nullpointerexception in the server logs. Do you mean that the R variable page gives information about the request error and contains the 500 error code? Not sure what you mean by screen here. Client-server interactions are hard to debug as the problems can be on either side or in the communication. The error can be in your request, in RCurl, on the server side receiving the request or in the script processing the request on the server. So it is imperative to try to get diagnostic information. You used verbose = T (TRUE). What did that display? postForm() has a style parameter. It controls how the POST request is submitted, either application/x-www-form-urlencoded or multipart/form-data. Your server script might be expecting the data in a different format than is being sent. postForm() defaults to the www-form-urlencoded. But we will need more information to help you if these are not the cause of the problem. D. The line it points to is dealing with getting an integer out of profileid. Help? Many thanks in advance... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
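For reference, a hedged sketch of how the two submission encodings are selected via postForm()'s style parameter. The endpoint and credentials below are the poster's own placeholders; note also that the original call's `opts = curlOptions=(...)` is not valid R syntax, and the options belong in `.opts`:

```r
library(RCurl)

# style chooses the encoding of the POSTed form:
#   "POST"     -> application/x-www-form-urlencoded
#   "HTTPPOST" -> multipart/form-data
# Try whichever one the server-side script expects.
page <- postForm("http://our.server.com/dbInt/new",   # placeholder URL
                 profileid = "-1", value = "1.801", type = "history",
                 style = "POST",
                 .opts = curlOptions(userpwd = "test:test", verbose = TRUE))
```

With verbose = TRUE, libcurl echoes the request headers and body to the console, which is usually the quickest way to see what the server actually received.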
Re: [R] Using open calais in R
fayazvf wrote: I am using calais api in R for text analysis. But im facing a some problem when fetching the rdf from the server. I'm using the getToHost() method for the api call but i get just a null string. You haven't told us nearly enough for us to be able to reproduce what you are doing. Where and how is the R function getToHost() ? is it in an R package? The same url in browser returns an RDF document. getToHost(www.api.opencalais.com,/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269gcontent=HomeparamsXML=) [1] http://api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269gcontent=HomeparamsXML=; Yes, and library(RCurl) getURLContent(http://api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269gcontent=HomeparamsXML=;) returns RDF content as does download.file(http://api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269gcontent=HomeparamsXML=;, eg.txt) But since we have no way of knowing what getToHost() does (or the postToHost() in your earlier mail), we cannot figure out what is happening for you. Please do read the posting guidelines, specifically telling us about your session and what packages you are using. D. -- View this message in context: http://r.789695.n4.nabble.com/Using-open-calais-in-R-tp3235597p3235597.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- There are men who can think no deeper than a fact - Voltaire Duncan Temple Langdun...@wald.ucdavis.edu Department of Statistics work: (530) 752-4782 4210 Mathematical Sciences Bldg. fax: (530) 752-7099 One Shields Ave. 
University of California at Davis, Davis, CA 95616, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Accessing data via url
Just for the record, you don't need to manually find the URL to which your are being redirected by using the followlocation option in any of the RCurl functions: tt = getURLContent(https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv;, followlocation = TRUE) (Same with getBinaryURL, but the file is not binary so no need to ask it to return the file as binary. getURLContent() figures out the right thing to do.) D. On 1/7/11 11:08 AM, David Winsemius wrote: I don't know how Henrique did it, but in Firefox one can go to the Downloads panel and right click on the downloaded file and choose Copy Download link (or something similar) and get: https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7cpNemjCFz14tAP3IPYCsAnvo-JJbgPNnPEWN_evBHG2jEYaNFOIT6GZF4M3VuKzioPZwvX7QSvMDWfJ3pHac5JK5BHyflOGBLOo_v44C0oU2V6teTwnjeg4TFufeltT-i5T3ThkuyesCztr6g2yLl65YcckwlEGEDtS-L9yzVe1B6tFEu2n6sjAOV9EHokEFx8e-HDFyf-u5mVIGMPgCHvaQL8pupVz-1p1rEdPpS0f6pqApTc%3Dattredirects=0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
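As a single sketch (the Google Sites URL from the thread may no longer resolve), the redirect-following request and reading the CSV straight from memory look like:

```r
library(RCurl)

u <- "https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv"

# followlocation tells libcurl to chase the 3xx redirect to the real file
tt <- getURLContent(u, followlocation = TRUE)

# The content arrives as text, so it can be parsed without touching disk
df <- read.csv(textConnection(tt))
```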
Re: [R] toJSON question
On 12/11/10 8:00 AM, Santosh Srinivas wrote: Hello, I am trying to use RJSONIO. I have: x <- c(0,4,8,9) y <- c(3,8,5,13) z <- cbind(x,y) Any idea how to convert z into the JSON format below? I want to get the following JSON output to put into a php file. [[0, 3], [4, 8], [8, 5], [9, 13]] The toJSON() function is the basic mechanism. In this case, z has names on the columns. Remove these: colnames(z) = NULL Then toJSON(z) gives you what you want. If you want to remove the newline (\n) characters, use gsub(): gsub("\\\n", "", toJSON(z)) D. Thank you. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
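Putting the whole exchange together as one runnable chunk (RJSONIO's exact whitespace can vary between versions, so no exact output is asserted):

```r
library(RJSONIO)

x <- c(0, 4, 8, 9)
y <- c(3, 8, 5, 13)
z <- cbind(x, y)

# toJSON() reflects the column names in its output;
# dropping them yields a plain array of row arrays
colnames(z) <- NULL

json <- gsub("\n", "", toJSON(z), fixed = TRUE)
json   # one line of JSON, e.g. [ [ 0, 3 ], [ 4, 8 ], [ 8, 5 ], [ 9, 13 ] ]
```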
Re: [R] Is there an implementation for URL Encoding (/format) in R?
On 11/25/10 7:53 AM, Tal Galili wrote: Hello all, I would like some R function that can translate a string to a URL encoding (see here: http://www.w3schools.com/tags/ref_urlencode.asp) Is it implemented? (I wasn't able to find any reference to it) I expect there are several implementations, spread across different packages. The function curlEscape() in RCurl is one. D. Thanks, Tal Contact Details:--- Contact me: tal.gal...@gmail.com | 972-52-7275845 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
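For completeness, base R's utils package also ships URLencode() (and URLdecode()), so no extra package is strictly required. A quick comparison with RCurl's curlEscape():

```r
library(RCurl)

s <- "a b & c=d"

esc_curl <- curlEscape(s)                   # RCurl / libcurl
esc_base <- URLencode(s, reserved = TRUE)   # base R (utils)

esc_base   # "a%20b%20%26%20c%3Dd"
# curlUnescape() / URLdecode() reverse the encoding
```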
Re: [R] RCurl and cookies in POST requests
Hi Christian There is a new version of the RCurl package on the Omegahat repository and that handles this case. The issue was running the finalizer to garbage collect the curl handle, and it was correctly not being released as the dynCurlReader() update function was precious and had a reference to the curl handle. The new package has finer grained control in curlSetOpt() to control whether functions are made 'precious' or not and we can use this when we know we will leave the curl handle in a correct and consistent state even when the values of the function options are to be garbage collected before the curl handle. D. On 11/15/10 7:06 AM, Christian M. wrote: Hello Duncan. Thanks for having a look at this. As soon as I get home I'll try your suggestion. BTW, the link to the omega-help mailing list seems to be broken: http://www.omegahat.org/mailman/listinfo/ Thank you. chr Duncan Temple Lang (Monday 15 November 2010, 01:02): Hi Christian Thanks for finding this. The problem seems to be that the finalizer on the curl handle seems to disappear and so is not being called when the handle is garbage collected. So there is a bug somewhere and I'll try to hunt it down quickly. In the meantime, you can achieve the same effect by calling the C routine curl_easy_cleanup. You can't do this directly with a .Call() or .C() as there is no explicit interface in the RCurl package to this routine. However, you can use the Rffi package (on the omegahat repository) library(Rffi) cif = CIF(voidType, list(pointerType)) callCIF(cif, curl_easy_cleanup, c...@ref) I'll keep looking for why the finalizer is getting discarded. Thanks again, D. On 11/14/10 6:30 AM, Christian M. wrote: Hello. I know that it's usually possible to write cookies to a cookie file by removing the curl handle and doing a gc() call. I can do this with getURL(), but I just can't obtain the same results with postForm(). 
If I use: curlHandle - getCurlHandle(cookiefile=FILE, cookiejar=FILE) and then do: getURL(http://example.com/script.cgi, curl=curlHandle) rm(curlHandle) gc() it's OK, the cookie is there. But, if I do (same handle; the parameter is a dummy): postForm(site, .params=list(par=cookie), curl=curlHandle, style=POST) rm(curlHandle) gc() no cookie is written. Probably I'm doing something wrong, but don't know what. Is it possible to store cookies read from the output of a postForm() call? How? Thanks. Christian PS.: I'm attaching a script that can be sourced (and its .txt version). It contains an example. The expected result is a file (cookies.txt) with two cookies. The script currently uses getURL() and two cookies are stored. If postForm() is used (currently commented), only 1 cookie is written. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
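The pattern under discussion, collected into one sketch (FILE and the URL are placeholders from the thread): the cookie jar is only written when libcurl cleans up the handle, which in R happens when the last reference is garbage collected:

```r
library(RCurl)

cookies <- "cookies.txt"   # placeholder cookie-jar path

# cookiefile reads cookies in; cookiejar asks libcurl to write them back out
curlHandle <- getCurlHandle(cookiefile = cookies, cookiejar = cookies)

getURL("http://example.com/script.cgi", curl = curlHandle)   # placeholder URL

# Cookies are flushed to the jar only when the handle is finalized
rm(curlHandle)
gc()
```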
Re: [R] RCurl and cookies in POST requests
Hi Christian Thanks for finding this. The problem seems to be that the finalizer on the curl handle seems to disappear and so is not being called when the handle is garbage collected. So there is a bug somewhere and I'll try to hunt it down quickly. In the meantime, you can achieve the same effect by calling the C routine curl_easy_cleanup. You can't do this directly with a .Call() or .C() as there is no explicit interface in the RCurl package to this routine. However, you can use the Rffi package (on the omegahat repository) library(Rffi) cif = CIF(voidType, list(pointerType)) callCIF(cif, curl_easy_cleanup, c...@ref) I'll keep looking for why the finalizer is getting discarded. Thanks again, D. On 11/14/10 6:30 AM, Christian M. wrote: Hello. I know that it's usually possible to write cookies to a cookie file by removing the curl handle and doing a gc() call. I can do this with getURL(), but I just can't obtain the same results with postForm(). If I use: curlHandle - getCurlHandle(cookiefile=FILE, cookiejar=FILE) and then do: getURL(http://example.com/script.cgi, curl=curlHandle) rm(curlHandle) gc() it's OK, the cookie is there. But, if I do (same handle; the parameter is a dummy): postForm(site, .params=list(par=cookie), curl=curlHandle, style=POST) rm(curlHandle) gc() no cookie is written. Probably I'm doing something wrong, but don't know what. Is it possible to store cookies read from the output of a postForm() call? How? Thanks. Christian PS.: I'm attaching a script that can be sourced (and its .txt version). It contains an example. The expected result is a file (cookies.txt) with two cookies. The script currently uses getURL() and two cookies are stored. If postForm() is used (currently commented), only 1 cookie is written. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RGoogleDocs stopped working
Hi Harlan I just tried to connect to Google Docs and I had ostensibly the same problem. However, the password was actually different from what I had specified. After resetting it with GoogleDocs, the getGoogleDocsConnection() worked fine. So I don't doubt that the login and password are correct, but you might just try it again to ensure there are no typos. The other thing to look at is the values for Email and Passwd sent in the URL, i.e. the string in url in your debugging below. (Thanks for that by the way). If either has special characters, e.g. , it is imperative that they are escaped correctly, i.e. converted to %24. This should happen and nothing should have changed, but it is worth verifying. So things still seem to work for me. It is a data point, but not one that gives you much of a clue as to what is wrong on your machine. D. On 11/10/10 7:36 AM, Harlan Harris wrote: Hello, Some code using RGoogleDocs, which had been working smoothly since the summer, just stopped working. I know that it worked on November 3rd, but it doesn't work today. I've confirmed that the login and password still work when I log in manually. I've confirmed that the URL gives the same error when I paste it into Firefox. I don't know enough about this web service to figure out the problem myself, alas... 
Here's the error and other info (login/password omitted): ss.con - getGoogleDocsConnection(login=gd.login, password=gd.password, service='wise', error=FALSE) Error: Forbidden Enter a frame number, or 0 to exit 1: getGoogleDocsConnection(login = gd.login, password = gd.password, service = wise, error = FALSE) 2: getGoogleAuth(..., error = error) 3: getForm(https://www.google.com/accounts/ClientLogin;, accountType = HOSTED_OR_GOOGLE, Email = login, Passw 4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary, curl = curl) 5: stop.if.HTTP.error(http.header) Selection: 4 Called from: eval(expr, envir, enclos) Browse[1] http.header Content-Type Cache-control Pragma text/plainno-cache, no-store no-cache ExpiresDate X-Content-Type-Options Mon, 01-Jan-1990 00:00:00 GMT Wed, 10 Nov 2010 15:24:39 GMT nosniff X-XSS-Protection Content-Length Server 1; mode=block 24 GSE status statusMessage 403 Forbidden\r\n Browse[1] url [1] https://www.google.com/accounts/ClientLogin?accountType=HOSTED%5FOR%5FGOOGLEEmail=***Passwd=***service=wisesource=R%2DGoogleDocs%2D0%2E1 Browse[1] .opts $ssl.verifypeer [1] FALSE R.Version() $platform [1] i386-apple-darwin9.8.0 $arch [1] i386 $os [1] darwin9.8.0 $system [1] i386, darwin9.8.0 $status [1] $major [1] 2 $minor [1] 10.1 $year [1] 2009 $month [1] 12 $day [1] 14 $`svn rev` [1] 50720 $language [1] R $version.string [1] R version 2.10.1 (2009-12-14) installed.packages()[c('RCurl', 'RGoogleDocs'), ] Package LibPath Version Priority Bundle Contains RCurl RCurl /Users/hharris/Library/R/2.10/library 1.4-3 NA NA NA RGoogleDocs RGoogleDocs /Library/Frameworks/R.framework/Resources/library 0.4-1 NA NA NA Depends Imports LinkingTo Suggests Enhances OS_type License Built RCurl R (= 2.7.0), methods, bitops NA NARcompression NA NA BSD 2.10.1 RGoogleDocs RCurl, XML, methods NA NANA NA NA BSD 2.10.1 Any ideas? Thank you! 
-Harlan [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] postForm() in RCurl and library RHTMLForms
On 11/4/10 11:31 PM, sayan dasgupta wrote: Thanks a lot thats exactly what I was looking for Just a quick question I agree the form gets submitted to the URL http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp; and I am filling up the form in the page http://www.nseindia.com/content/indices/ind_histvalues.htm; How do I submit the arguments like FromDate, ToDate, Symbol using postForm() and submit the query to get the similar table. Well that is what the function that RHTMLForms creates does. So you can look at that code and see that it calls formQuery() which ends in a call to postForm(). You could use debug(postForm) and examine the arguments to it. postForm(...jsp, FromDate = 10- The answer is o = postForm(http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp;, FromDate = 01-11-2010, ToDate = 04-11-2010, IndexType = SP CNX NIFTY, check = new, style = POST ) On Fri, Nov 5, 2010 at 6:43 AM, Duncan Temple Lang dun...@wald.ucdavis.eduwrote: On 11/4/10 2:39 AM, sayan dasgupta wrote: Hi RUsers, Suppose I want to see the data on the website url - http://www.nseindia.com/content/indices/ind_histvalues.htm; for the index SP CNX NIFTY for dates FromDate=01-11-2010,ToDate=02-11-2010 then read the html table from the page using readHTMLtable() I am using this code webpage - postForm(url,.params=list( FromDate=01-11-2010, ToDate=02-11-2010, IndexType=SP CNX NIFTY, Indicesdata=Get Details), .opts=list(useragent = getOption(HTTPUserAgent))) But it doesn't give me desired result You need to be more specific about how it fails to give the desired result. You are in fact posting to the wrong URL. 
The form is submitted to a different URL - http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp Also I was trying to use the function getHTMLFormDescription from the package RHTMLForms but there we can't use the argument .opts=list(useragent = getOption(HTTPUserAgent)) which is needed for this particular website That's not the case. The function RHTMLForms will generate for you does support the .opts parameter. What you want is something along the lines: # Set default options for RCurl # requests options(RCurlOptions = list(useragent = R)) library(RCurl) # Read the HTML page since we cannot use htmlParse() directly # as it does not specify the user agent or an # Accept:*.* url - http://www.nseindia.com/content/indices/ind_histvalues.htm; wp = getURLContent(url) # Now that we have the page, parse it and use the RHTMLForms # package to create an R function that will act as an interface # to the form. library(RHTMLForms) library(XML) doc = htmlParse(wp, asText = TRUE) # need to set the URL for this document since we read it from # text, rather than from the URL directly docName(doc) = url # Create the form description and generate the R # function call the form = getHTMLFormDescription(doc)[[1]] fun = createFunction(form) # now we can invoke the form from R. We only need 2 # inputs - FromDate and ToDate o = fun(FromDate = 01-11-2010, ToDate = 04-11-2010) # Having looked at the tables, I think we want the the 3rd # one. table = readHTMLTable(htmlParse(o, asText = TRUE), which = 3, header = TRUE, stringsAsFactors = FALSE) table Yes it is marginally involved. But that is because we cannot simply read the HTML document directly from htmlParse() because the lack of Accept( useragent) HTTP header. 
Thanks and Regards Sayan Dasgupta [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RBloomberg on R-2.12.0
On 11/5/10 5:20 AM, Tolga I Uzuner wrote: Dear R Users, Tried to install RBloomberg with R-2.12.0 and appears RDComclient has not been built for this version of R, so failed. I then tried to get RBloombergs' Java API version to work, but ran into problems with RJava which does not appear to exist for Windows. My platform is Windows XP SP3. Will RDcomclient be built for R-2.12.0 anytime soon ? It is on the Omegahat site. Just that the directories weren't linked to the appropriate place. You can install it now. D. Does a version of RBloomberh with a Java API really exist ? An obvious Google search like Java api rbloomberg throws up a bunch of discussions but somehow, I cannot locate a package ? Will RJava work on Windows ? Thanks in advance for any pointers. Regards, Tolga This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email. [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] postForm() in RCurl and library RHTMLForms
On 11/4/10 2:39 AM, sayan dasgupta wrote: Hi RUsers, Suppose I want to see the data on the website url - http://www.nseindia.com/content/indices/ind_histvalues.htm; for the index SP CNX NIFTY for dates FromDate=01-11-2010,ToDate=02-11-2010 then read the html table from the page using readHTMLtable() I am using this code webpage - postForm(url,.params=list( FromDate=01-11-2010, ToDate=02-11-2010, IndexType=SP CNX NIFTY, Indicesdata=Get Details), .opts=list(useragent = getOption(HTTPUserAgent))) But it doesn't give me desired result You need to be more specific about how it fails to give the desired result. You are in fact posting to the wrong URL. The form is submitted to a different URL - http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp Also I was trying to use the function getHTMLFormDescription from the package RHTMLForms but there we can't use the argument .opts=list(useragent = getOption(HTTPUserAgent)) which is needed for this particular website That's not the case. The function RHTMLForms will generate for you does support the .opts parameter. What you want is something along the lines: # Set default options for RCurl # requests options(RCurlOptions = list(useragent = R)) library(RCurl) # Read the HTML page since we cannot use htmlParse() directly # as it does not specify the user agent or an # Accept:*.* url - http://www.nseindia.com/content/indices/ind_histvalues.htm; wp = getURLContent(url) # Now that we have the page, parse it and use the RHTMLForms # package to create an R function that will act as an interface # to the form. library(RHTMLForms) library(XML) doc = htmlParse(wp, asText = TRUE) # need to set the URL for this document since we read it from # text, rather than from the URL directly docName(doc) = url # Create the form description and generate the R # function call the form = getHTMLFormDescription(doc)[[1]] fun = createFunction(form) # now we can invoke the form from R. 
We only need 2 # inputs - FromDate and ToDate o = fun(FromDate = 01-11-2010, ToDate = 04-11-2010) # Having looked at the tables, I think we want the the 3rd # one. table = readHTMLTable(htmlParse(o, asText = TRUE), which = 3, header = TRUE, stringsAsFactors = FALSE) table Yes it is marginally involved. But that is because we cannot simply read the HTML document directly from htmlParse() because the lack of Accept( useragent) HTTP header. Thanks and Regards Sayan Dasgupta [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] File Downloading Problem
I got this working almost immediately with RCurl although with that one has to specify any value for the useragent option, or the same error occurs. The issue is that R does not add an Accept entry to the HTTP request header. It should add something like Accept: *.* Using RCurl, u = http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip; o = getURLContent(u, verbose = TRUE, useragent = getOption(HTTPUserAgent)) succeeds (but not if there is no useragent). We could fix R's download.file() to send Accept: *.*, or allow general headers to be specified either as an option for all requests, or as a parameter of download.file() (or both). Or we could have the makeUserAgent() function in utils be more customizable through options, or allow the R user specify the function herself. But while this would be good, the HTTP facilities in R are not intended to be as general something like libcurl (and hence RCurl). Unless there is a compelling reason to enhance R's internal facilities, I suggest people use something like libcurl. This approach also has the advantage of having the data directly in memory and avoiding writing it to disk and then reading it back in, e.g. library(Rcompression) z = zipArchive(o) names(z) read.csv(textConnection(z[[1]])) D. On 11/1/10 8:27 AM, Santosh Srinivas wrote: It's strange and the internet connection is fine because I am able to get data from yahoo. This was working till just yesterday ... strange if the website is creating issues with public access of basic data! -Original Message- From: David Winsemius [mailto:dwinsem...@comcast.net] Sent: 01 November 2010 20:48 To: Duncan Murdoch Cc: Santosh Srinivas; 'Rhelp' Subject: Re: [R] File Downloading Problem On Nov 1, 2010, at 10:41 AM, Duncan Murdoch wrote: On 01/11/2010 10:37 AM, Santosh Srinivas wrote: Nope Duncan ... no changes .. the same old way without a proxy ... actually the download.file is being returned 403 forbidden which is strange. 
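If Rcompression is not available, the same bytes can be written to disk and unpacked with base R. A sketch following the RCurl request above (URL from the thread, which may no longer be served):

```r
library(RCurl)

u <- "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip"

# The server refuses requests that carry no User-Agent header
o <- getBinaryURL(u, useragent = getOption("HTTPUserAgent"))

# Write the raw vector out and let base R's unzip()/read.csv() take over
zipfile <- tempfile(fileext = ".zip")
writeBin(o, zipfile)
csvfile <- unzip(zipfile, exdir = tempdir())
bhav <- read.csv(csvfile[1])
```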
These are just two lines that I am trying to run. sURL- http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bha v.csv.zip download.file(sURL,test.zip) Put the same URL in a browser and it works fine. It doesn't work for me, so presumably there is some kind of security setting at the site (a cookie?), which allows your browser, but doesn't allow you to use R, or me to use anything. Firefox in a Mac platform will download and unzip the file with no security complaints and no cookie appears to be set when downloading, but that code will not access the file, nor will my efforts to wrap the URL in url() or unz() so it seems more likely that Santosh and I do not understand the file opening processes that R supports. con= unz(description=http://www.nseindia.com/content/historical/EQUITIES/2010/NO V/cm01NOV2010bhav.csv.zip , file=~/cm01NOV2010bhav.csv) test.df - read.csv(file=con) Error in open.connection(file, rt) : cannot open the connection In addition: Warning message: In open.connection(file, rt) : cannot open zip file 'http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bha v.csv.zip' __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] XML getNodeSet syntax for PUBMED XML export
Hi Rob

doc = xmlParse("url for document")
dn = getNodeSet(doc, "//DescriptorName[@MajorTopicYN = 'Y']")

will do what you want, I believe. XPath - a language for expressing such queries - is quite simple and based on a few simple primitive concepts from which one can create complex compound queries. The //DescriptorName is a node test. The [...] is a predicate that includes/discards some of the resulting nodes. D. On 9/8/10 9:09 AM, Rob James wrote: I am looking for the syntax to capture XML tags marked with <DescriptorName MajorTopicYN="Y">, but the combination of the internal space (between Name and Major) and the embedded quote marks are defeating me. I can get all the DescriptorName tags, but these include both MajorTopicYN = "Y" and "N" variants. Any suggestions? Thanks in advance. Prototype text from PUBMED:

<MeshHeadingList>
 <MeshHeading>
  <DescriptorName MajorTopicYN="Y">Antibodies, Monoclonal</DescriptorName>
 </MeshHeading>
 <MeshHeading>
  <DescriptorName MajorTopicYN="N">Blood Platelets</DescriptorName>
  <QualifierName MajorTopicYN="N">immunology</QualifierName>
  <QualifierName MajorTopicYN="Y">physiology</QualifierName>
  <QualifierName MajorTopicYN="N">ultrastructure</QualifierName>
 </MeshHeading>
</MeshHeadingList>
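The node-test-plus-predicate idea is not specific to the XML package; the same expression works anywhere libxml2-style XPath is available. For readers without the (long-archived) XML package at hand, here is a self-contained sketch of the same predicate using Python's standard xml.etree module, with the PUBMED fragment from the thread inlined:

```python
import xml.etree.ElementTree as ET

# The PUBMED fragment from the thread, as a parseable string.
snippet = """
<MeshHeadingList>
  <MeshHeading>
    <DescriptorName MajorTopicYN="Y">Antibodies, Monoclonal</DescriptorName>
  </MeshHeading>
  <MeshHeading>
    <DescriptorName MajorTopicYN="N">Blood Platelets</DescriptorName>
    <QualifierName MajorTopicYN="Y">physiology</QualifierName>
  </MeshHeading>
</MeshHeadingList>
"""

root = ET.fromstring(snippet)
# Same shape as getNodeSet(doc, "//DescriptorName[@MajorTopicYN = 'Y']"):
# DescriptorName is the node test, [@...='Y'] the filtering predicate.
major = [el.text for el in root.findall(".//DescriptorName[@MajorTopicYN='Y']")]
print(major)  # ['Antibodies, Monoclonal']
```

Note that the predicate filters on the *attribute*, so the N-flagged DescriptorName and the Y-flagged QualifierName are both excluded.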
Re: [R] R program google search
Hi there One way to use Google's search service from R is

library(RCurl)
library(RJSONIO)  # or library(rjson)
val = getForm("http://ajax.googleapis.com/ajax/services/search/web",
              q = "Google search AJAX", v = "1.0")
results = fromJSON(val)

Google requests that you provide your Google API key:

val = getForm("http://ajax.googleapis.com/ajax/services/search/web",
              q = "Google search AJAX", v = "1.0", key = "my google api key")

Similarly, you should provide header information to identify your application, e.g.

xx = getForm("http://ajax.googleapis.com/ajax/services/search/web",
             q = "Google search AJAX", v = "1.0",
             .opts = list(useragent = "RGoogleSearch", verbose = TRUE))

D. On 9/3/10 10:33 PM, Waverley @ Palo Alto wrote: My question is how to use R to program google search. I found this information: The SOAP Search API was created for developers and researchers interested in using Google Search as a resource in their applications. Unfortunately google no longer supports that. They are supporting the AJAX Search API. What about R? Thanks. On Fri, Sep 3, 2010 at 2:23 PM, Waverley @ Palo Alto waverley.paloa...@gmail.com wrote: Hi, Can someone help as to how to use R to program google search in the R code? I know that other languages can allow or have the google search API. If someone can give me some links or sample code I would greatly appreciate it. Thanks. -- Waverley @ Palo Alto
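What `getForm()` does under the hood is assemble the name = value pairs into a URL-encoded query string and issue a GET. The endpoint above has long since been retired by Google, so the sketch below (Python standard library, offline) only demonstrates the encoding step; the key value is a placeholder:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Retired AJAX Search endpoint from the thread, used purely to show how
# getForm()-style parameters become a query string.
base = "http://ajax.googleapis.com/ajax/services/search/web"
params = {"q": "Google search AJAX", "v": "1.0", "key": "my-google-api-key"}

url = base + "?" + urlencode(params)
print(url)

# Round-trip check: the encoded URL carries exactly the parameters supplied.
sent = parse_qs(urlsplit(url).query)
```

The JSON body that such an endpoint returned would then be deserialized, which is the role `fromJSON()` plays in the R snippet.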
Re: [R] getNodeSet - what am I doing wrong?
Johannes Graumann wrote: Thanks! but:

library(XML)
xmlDoc <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml")

You need to use xmlParse() or xmlTreeParse(url, useInternalNodes = TRUE) (which are equivalent) in order to be able to use getNodeSet(). The error you are getting is because you are using xmlTreeParse() and the result is a tree represented in R rather than the internal C-level data structures on which getNodeSet() can operate. xmlParse() is faster than xmlTreeParse() and one can use XPath to query it. D.

getNodeSet(xmlDoc, "//x:modifications_row", "x")
Error in function (classes, fdef, mtable) :
  unable to find an inherited method for function saveXML, for signature XMLDocument

? Thanks, Joh Duncan Temple Lang wrote: Hi Johannes This is a common issue. The document has a default XML namespace, e.g. the root node is defined as <unimod xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ...>. So you need to specify which namespace to match in the XPath expression in getNodeSet(). The XML package provides a convenient facility for this. You need only specify the prefix, such as x, and that will be bound to the default namespace. You need to specify this in two places - where you use it in the XPath expression and in the namespaces argument of getNodeSet(). So

getNodeSet(test, "//x:modifications_row", "x")

gives you probably what you want. D. On 8/30/10 8:02 AM, Johannes Graumann wrote:

library(XML)
test <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml", useInternalNodes = TRUE)
getNodeSet(test, "//modifications_row")

-- There are men who can think no deeper than a fact - Voltaire Duncan Temple Lang dun...@wald.ucdavis.edu Department of Statistics work: (530) 752-4782 4210 Mathematical Sciences Bldg. fax: (530) 752-7099 One Shields Ave.
University of California at Davis, Davis, CA 95616, USA
Re: [R] getNodeSet - what am I doing wrong?
Hi Johannes This is a common issue. The document has a default XML namespace, e.g. the root node is defined as <unimod xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ...>. So you need to specify which namespace to match in the XPath expression in getNodeSet(). The XML package provides a convenient facility for this. You need only specify the prefix, such as x, and that will be bound to the default namespace. You need to specify this in two places - where you use it in the XPath expression and in the namespaces argument of getNodeSet(). So

getNodeSet(test, "//x:modifications_row", "x")

gives you probably what you want. D. On 8/30/10 8:02 AM, Johannes Graumann wrote:

library(XML)
test <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml", useInternalNodes = TRUE)
getNodeSet(test, "//modifications_row")
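The default-namespace trap described here bites in every XPath implementation, not just the XML package. A self-contained illustration with Python's xml.etree (a small document shaped like the unimod root, inlined so nothing is fetched): a bare path matches nothing, and binding an arbitrary prefix to the default namespace fixes it, exactly as the `"x"` argument to getNodeSet() does.

```python
import xml.etree.ElementTree as ET

# A document with a default namespace, like unimod_tables.xml: every element
# lives in the http://www.unimod.org/... namespace even without a prefix.
doc = """
<unimod xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1">
  <modifications>
    <modifications_row record_id="1"/>
    <modifications_row record_id="2"/>
  </modifications>
</unimod>
"""
root = ET.fromstring(doc)

assert root.findall(".//modifications_row") == []  # no namespace -> no match

# Bind a prefix of our own choosing ("x") to the default namespace, the same
# move as getNodeSet(test, "//x:modifications_row", "x").
ns = {"x": "http://www.unimod.org/xmlns/schema/unimod_tables_1"}
rows = root.findall(".//x:modifications_row", ns)
print(len(rows))  # 2
```

The prefix name is arbitrary; only the URI it maps to has to match the document.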
Re: [R] Parsing a XML file
xmlDoc() is not the function to use to parse a file. Use

doc = xmlParse("Malaria_Grave.xml")

xmlDoc() is for programmatically creating a new XML document within R. It could be more robust to being called with a string, but the key thing here is that it is not the appropriate function for what you want. Also, if there had been a problem with the parsing, you'd need to give me/us the offending XML file so that we could have a chance of reproducing the problem. D. On 8/24/10 2:35 PM, Orvalho Augusto wrote: I have one XML file of 30MB from which I need to read the data. I try this:

library(XML)
doc <- xmlDoc("Malaria_Grave.xml")

And R answers like this:

*** caught segfault *** address 0x5, cause 'memory not mapped'
Traceback:
1: .Call("RS_XML_createDocFromNode", node, PACKAGE = "XML")
2: xmlDoc("Malaria_Grave.xml")
Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

Or I try this:

doc <- xmlTreeParse("Malaria_Grave.xml")

I get this:

xmlParseEntityRef: no name
xmlParseEntityRef: no name
Error: 1: xmlParseEntityRef: no name 2: xmlParseEntityRef: no name

Please guys help this simple mortal! Caveman
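The underlying mistake - handing a filename to a function that builds documents rather than parses them - exists in most XML APIs. A hedged Python stdlib sketch of the same distinction (the filename is invented for the example): `ET.parse()` is the file-parsing entry point, `ET.fromstring()` builds from in-memory text, and feeding a filename to the latter fails, just as calling `xmlDoc()` on a filename is the wrong entry point in R.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

xml_text = "<cases><case id='1'/></cases>"

# Write a small file so the file-parsing entry point has something to read.
path = os.path.join(tempfile.mkdtemp(), "Malaria_Grave.xml")
with open(path, "w") as fh:
    fh.write(xml_text)

tree = ET.parse(path)            # parse a *file* (xmlParse()'s job in R)
root2 = ET.fromstring(xml_text)  # build from an in-memory string

assert tree.getroot().tag == root2.tag == "cases"

# Feeding a filename to the string entry point is the same category of
# mistake as calling xmlDoc() on a filename: wrong function for the input.
try:
    ET.fromstring(path)
except ET.ParseError:
    print("fromstring rejects a filename: it expects XML text")
```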
Re: [R] RGoogleDocs ability to write to spreadsheets broken as of yesterday - CAN PAY FOR FIX
Hi Harlan Can you send some code so that we can reproduce the problem? That will enable me to fix it more quickly. D. On 7/21/10 8:26 AM, Harlan Harris wrote: I unfortunately haven't received any responses about this problem. We (the company I work for) are willing to discuss payment to someone who is willing to quickly contribute a fix to the RGoogleDocs/RCurl toolchain that will restore write access. Please contact me directly if you're interested. Thank you, -Harlan Harris On Tue, Jul 20, 2010 at 10:19 AM, Harlan Harris har...@harris.name wrote: Hi, I'm using RGoogleDocs/RCurl to update a Google Spreadsheet. Everything worked OK until this morning, when my ability to write into spreadsheet cells went away. I get the following weird error:

Error in els[[type + 1]] : subscript out of bounds

Looking at the Google Docs API changelog, I see the following: http://code.google.com/apis/spreadsheets/changelog.html Release 2010-01 (July 14, 2010) This is an advance notice about an upcoming change. * Starting July 19, 2010, all links returned by all Spreadsheets API feeds will use HTTPS. This is being done in the interests of increased security. If you require the use of HTTP, we recommend that you replace |https| with |http| in these links. Another announcement will be made on July 19, 2010, when this change goes to production. I suspect this is the problem. Fixing it is above my head, I'm afraid. Could anyone help? This is urgent. Thank you, -Harlan Harris
Re: [R] RGoogleDocs ability to write to spreadsheets broken as of yesterday - CAN PAY FOR FIX
Hi Harlan If you install the latest version of RCurl from source via

install.packages("RCurl", repos = "http://www.omegahat.org/R")

that should solve the problem, assuming I have been reproducing the same problem you mentioned. You haven't mentioned what operating system you are on. If you are on Windows, that will pick up the binary version. If you are on the Mac, you will have to build it from source. D. On 7/21/10 8:26 AM, Harlan Harris wrote: I unfortunately haven't received any responses about this problem. We (the company I work for) are willing to discuss payment to someone who is willing to quickly contribute a fix to the RGoogleDocs/RCurl toolchain that will restore write access. Please contact me directly if you're interested. Thank you, -Harlan Harris On Tue, Jul 20, 2010 at 10:19 AM, Harlan Harris har...@harris.name wrote: Hi, I'm using RGoogleDocs/RCurl to update a Google Spreadsheet. Everything worked OK until this morning, when my ability to write into spreadsheet cells went away. I get the following weird error:

Error in els[[type + 1]] : subscript out of bounds

Looking at the Google Docs API changelog, I see the following: http://code.google.com/apis/spreadsheets/changelog.html Release 2010-01 (July 14, 2010) This is an advance notice about an upcoming change. * Starting July 19, 2010, all links returned by all Spreadsheets API feeds will use HTTPS. This is being done in the interests of increased security. If you require the use of HTTP, we recommend that you replace |https| with |http| in these links. Another announcement will be made on July 19, 2010, when this change goes to production. I suspect this is the problem. Fixing it is above my head, I'm afraid. Could anyone help? This is urgent.
Thank you, -Harlan Harris
Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)
Hi Ryusuke I would use the encoding parameter of htmlParse() and download and parse the content in one operation:

htmlParse("http://home.sina.com", encoding = "UTF-8")

If you want to use getURL() in RCurl, use the .encoding parameter. You didn't tell us the output of Sys.getlocale() or how your terminal/console is configured, so the above may vary under your configuration, but it works on various machines for me with different settings. D. Ryusuke Kenji wrote: Hi All, First method:

library(XML)
theurl <- "http://home.sina.com"
download.file(theurl, "tmp.html")
txt <- readLines("tmp.html")
txt <- htmlTreeParse(txt, error = function(...){}, useInternalNodes = TRUE)
g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
head(grep(" ", g, value = T))
[1] ?? | ?? | ENGLISH ??? ???
[3] ??? ?? ??(???) ??
[5] ???? ??! ? ??! !

Second method:

library(RCurl)
theurl <- getURL("http://home.sina.com", encoding = 'GB2312')
Encoding(theurl)
[1] "unknown"
txt <- readLines(con = textConnection(theurl), encoding = 'GB2312')
txt[5:10]  # show the lines where the encoding problem occurred
[1] <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
[2] <title>SINA.com US ? -??</title>
[3] <meta name="Keywords" content=", ???, ???, ??,, SINA, US, News, Chinese, Asia" />
[4] <meta name="Description" content="???, ???24, , , ??, , ?BBS, ???." />
[5]
[6] <link rel="stylesheet" type="text/css" href="http://ui.sina.com/assets/css/style_home.css" />

I am trying to read data from a Chinese-language website, but the Chinese characters always come out unreadable. May I know if there is any good way to cope with such an encoding problem in RCurl and XML? Regards, Ryusuke
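The mojibake in that thread is the generic decode-with-the-wrong-charset problem: the raw bytes are fine, and readable text appears only when they are decoded with the charset the page declares. A tiny offline Python illustration (the bytes are produced locally rather than fetched):

```python
# Raw bytes as an HTTP fetch of a GB2312 page might return them; here just
# the two characters "Zhongwen" encoded locally so no network is needed.
raw = "\u4e2d\u6587".encode("gb2312")

wrong = raw.decode("latin-1")  # mojibake: bytes reinterpreted as Latin-1
right = raw.decode("gb2312")   # decode with the charset the page declares

print(right)
assert wrong != right
```

This is exactly what `encoding = "UTF-8"` (htmlParse) or `.encoding` (getURL) control on the R side: which decoding is applied to the response bytes.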
Re: [R] Do colClasses in readHTMLTable (XML Package) work?
On 3/17/10 6:52 PM, Marshall Feldman wrote: Hi, I can't get the colClasses option to work in the readHTMLTable function of the XML package. Here's a code fragment:

require(XML)
doc <- "http://www.nber.org/cycles/cyclesmain.html"
table <- getNodeSet(htmlParse(doc), "//table")[[2]]  # The main table is the second one because it's embedded in the page table.
xt <- readHTMLTable(table,
        header = c("peak", "trough", "contraction", "expansion", "trough2trough", "peak2peak"),
        colClasses = c("character", "character", "character", "character", "character", "character"),
        trim = TRUE)

Does anyone know what's wrong? The coercion of the table columns is done before the call to as.data.frame. You can add stringsAsFactors = FALSE in the call to readHTMLTable() and you'll get what you expect, I believe. D. Marsh Feldman
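The colClasses idea - read every cell as character, then coerce only the columns you name - is easy to state outside R as well. A hedged Python stdlib sketch with invented sample data shaped like a business-cycle table:

```python
import csv
import io

# colClasses-style control: the reader yields everything as character, and we
# coerce the chosen columns explicitly instead of letting anything guess types.
raw = "peak,trough,duration\n1948-11,1949-10,11\n1953-07,1954-05,10\n"
rows = list(csv.DictReader(io.StringIO(raw)))

col_classes = {"duration": int}  # leave the date columns as character
table = [{k: col_classes.get(k, str)(v) for k, v in row.items()} for row in rows]
print(table[0])
```

The date columns stay strings, so nothing silently becomes a factor-like or numeric value behind your back.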
Re: [R] parse an HTML page with verbose error message (using XML)
Hi Yihui It took me a moment to see the error message as the latest development version of the XML package suppresses/hides them by default for htmlParse(). You can provide your own function via the error parameter. If you just want to see more detailed error messages on the console, you can use a function like the following:

fullInfoErrorHandler = function(msg, code, domain, line, col, level, file)
{
   # level tells how significant the error is.
   # These are 1, 2, 3 for WARNING, ERROR, FATAL,
   # meaning simple warning, recoverable error and fatal/unrecoverable error.
   # See XML:::xmlErrorLevel
   #
   # code is an error code. See the values in XML:::xmlParserErrors,
   # e.g. XML_HTML_UNKNOWN_TAG, XML_ERR_DOCUMENT_EMPTY
   #
   # domain tells what part of the library raised this error.
   # See XML:::xmlErrorDomain
   codeMsg = switch(level, "warning", "recoverable error", "fatal error")
   cat("There was a", codeMsg, "in the", file, "at line", line, "column", col, "\n", msg, "\n")
}

doc = htmlParse("~/htmlErrors.html", error = fullInfoErrorHandler)

And of course you can mimic xmlErrorCumulator() to form a closure that collects the different details of each message into an object. If you look in the error.R and xmlErrorEnums.R files within the R code of the XML package, you'll find some additional functions that give us further support for working with errors in the XML/HTML parsers. Best, D. Yihui Xie wrote: I'm using the function htmlParse() in the XML package, and I need a little bit of help on error handling while parsing an HTML page.
So far I can use either the default way:

# error = xmlErrorCumulator(), by default
library(XML)
doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/")
# the error message is:
# htmlParseStartTag: invalid element name

or the tryCatch() approach:

# error = NULL, errors to be caught by tryCatch()
tryCatch({
  doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/", error = NULL)
}, XMLError = function(e) {
  cat("There was an error in the XML at line", e$line, "column", e$col, "\n", e$message, "\n")
})
# verbose error message as:
# There was an error in the XML at line 90 column 2
# htmlParseStartTag: invalid element name

I wish to get the verbose error messages without really stopping the parsing process; the first approach cannot return detailed error messages, while the second one will stop the program... Thanks! Regards, Yihui -- Yihui Xie xieyi...@gmail.com Phone: 515-294-6609 Web: http://yihui.name Department of Statistics, Iowa State University 3211 Snedecor Hall, Ames, IA
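The pattern under discussion - intercept a parse error and report its line/column rather than letting it kill the program - can be shown compactly with Python's stdlib parser, which (unlike libxml2) stops at the first error but does expose the position; a hedged, offline sketch with a deliberately broken fragment:

```python
import xml.etree.ElementTree as ET

def parse_reporting_position(text):
    """Parse XML text; on failure, report the (line, column) of the error
    instead of raising, mirroring the custom error-handler idea."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        line, col = err.position
        return None, "error at line %d column %d: %s" % (line, col, err)

doc, report = parse_reporting_position("<a><b></a>")  # mismatched tag
print(report)
```

The custom handler in the R reply goes further because libxml2 keeps parsing after recoverable errors, so the handler can accumulate many such reports into one object.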
Re: [R] Making FTP operations with R
R does provide support for basic FTP requests, but not for DELETE requests, and not for communication on the same connection. I think your best approach is to use the RCurl package (http://www.omegahat.org/RCurl). D. Orvalho Augusto wrote: Dears, I need to make some very basic FTP operations with R. I need to do a lot of gets and issue a respective delete command too, on the same connection. How can I do that? Thanks in advance Caveman
Re: [R] Working with combinations
I think there are several packages that implement combinations, and several that allow you to specify a function to be called when each vector of combinations is generated. I can't recall the names of all such packages, but the Combinations package at www.omegahat.org/Combinations is one. D. Herm Walsh wrote: I am working with the combinations function (available in the gtools package). However, rather than store all of the possible combinations, I would like to check each combination to see if it meets a certain criterion. If it does, I will then store it. I have looked at the combinations code but am unsure where in the algorithm I would be able to operate on each combination. Thanks!
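The check-then-store pattern asked about - visit each combination as it is generated and keep only the ones passing a criterion, never materialising the full set - is a one-liner with a lazy generator. A self-contained Python sketch (the even-sum criterion is invented for illustration):

```python
from itertools import combinations

def filtered_combinations(items, r, keep):
    """Generate r-combinations lazily, storing only those that pass `keep`,
    so the full set of combinations is never held in memory at once."""
    return [c for c in combinations(items, r) if keep(c)]

# Example criterion: keep only pairs whose values sum to an even number.
pairs = filtered_combinations([1, 2, 3, 4], 2, lambda c: sum(c) % 2 == 0)
print(pairs)  # [(1, 3), (2, 4)]
```

This is the per-combination callback idea the reply attributes to packages like Combinations, expressed with a predicate instead of a callback.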
Re: [R] help with EXPASY HTML form submission in RCurl package
Sunando Roy wrote: Hi Duncan, Thanks for your help. I changed the P but the output that I get is not what I expect. The form gets aborted without any actual output. I get the same result with

postForm("http://www.expasy.ch/tools/protscale.html")

That URL (...protscale.html) is the HTML page that contains the form. It is not the URL to which you are supposed to submit the form request. That information is in the attributes of the <form> ... </form> element. That is http://www.expasy.ch/cgi-bin/protscale.pl?1 So you have to know a little about HTML forms in order to figure out how to map the HTML description to a request. That is what your browser does (including hidden fields in the form, etc.). It is also the purpose of the R package RHTMLForms (www.omegahat.org/RHTMLForms, and

install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")

) e.g.

library(RHTMLForms)
f = getHTMLFormDescription("http://www.expasy.ch/tools/protscale.html")
fun = createFunction(f[[2]])
o = fun(prot_id = "P05130", weight_var = "exponential", style = "POST")

(The protscale.html is malformed in a way that messes up parsing the linear option.) Now, of course, you have to parse the resulting HTML (in the string given in the variable o) to get the information the form submission generated. D. just with an added message that there was no input passed on. But with the input like I presented I get the same output. I could make some of the examples work, for e.g.

postForm("http://www.omegahat.org/RCurl/testPassword/form_validation.cgi",
         your_name = "Duncan", your_age = "35-55", your_sex = "m",
         submit = "submit", .opts = list(userpwd = "bob:welcome"))

which would suggest at least the setup is correct. I parsed the expasy protscale source code to identify the variables but the form does not seem to go through. I can post the html body code if needed.
Regards Sunando On Fri, Feb 12, 2010 at 3:54 PM, Duncan Temple Lang dun...@wald.ucdavis.edu wrote: Sunando Roy wrote: Hi, I am trying to submit a form to the EXPASY protscale server (http://www.expasy.ch/tools/protscale.html). I am using the RCurl package and the postForm function available in it. I have extracted the variables for the form from the HTML source page. According to the syntax of postForm, I just need to mention the url and assign values to the input mentioned in the HTML code. The code that I am using is:

postForm("http://www.expasy.ch/tools/protscale.html",
         sequence = "", scale = "Molecular weight", window = 5,
         weight_edges = 100, weight_var = "linear", norm = "no",
         submit = "Submit"), .checkparams = TRUE)

I don't think that is what you actually submitted to R. It is a syntax error, as you end the call to postForm() after "Submit" and then have an extra `, .checkparams = TRUE)` afterwards. But when you remove the ')' after "Submit", the problem you get is that .checkparams is not a parameter of postForm(), but .checkParams is. R is case-sensitive, so the problem is that .checkparams is being treated as a parameter of your form. So change the p to P in .checkparams, and it works. D. The constant error that I get is:

Error in postForm("http://www.expasy.ch/tools/protscale.html", .params = list(sequence = "not", :
  STRING_ELT() can only be applied to a 'character vector', not a 'logical'

Is there any other way to submit an HTML form in R? Thanks for the help Regards Sunando
Re: [R] help with EXPASY HTML form submission in RCurl package
Sunando Roy wrote: Hi, I am trying to submit a form to the EXPASY protscale server (http://www.expasy.ch/tools/protscale.html). I am using the RCurl package and the postForm function available in it. I have extracted the variables for the form from the HTML source page. According to the syntax of postForm, I just need to mention the url and assign values to the input mentioned in the HTML code. The code that I am using is:

postForm("http://www.expasy.ch/tools/protscale.html",
         sequence = "", scale = "Molecular weight", window = 5,
         weight_edges = 100, weight_var = "linear", norm = "no",
         submit = "Submit"), .checkparams = TRUE)

I don't think that is what you actually submitted to R. It is a syntax error, as you end the call to postForm() after "Submit" and then have an extra `, .checkparams = TRUE)` afterwards. But when you remove the ')' after "Submit", the problem you get is that .checkparams is not a parameter of postForm(), but .checkParams is. R is case-sensitive, so the problem is that .checkparams is being treated as a parameter of your form. So change the p to P in .checkparams, and it works. D. The constant error that I get is:

Error in postForm("http://www.expasy.ch/tools/protscale.html", .params = list(sequence = "not", :
  STRING_ELT() can only be applied to a 'character vector', not a 'logical'

Is there any other way to submit an HTML form in R? Thanks for the help Regards Sunando
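The key point of this thread - a form must be POSTed to the URL in the form element's action attribute, with the fields url-encoded into the request body, not to the page that merely displays the form - can be sketched offline with Python's stdlib. The action URL below is a made-up placeholder standing in for the real CGI endpoint, and the request is built but never sent:

```python
import urllib.request
from urllib.parse import urlencode

# The action URL from the <form> element (not the page displaying the form);
# hypothetical here, standing in for the server's real CGI script.
action = "http://example.org/cgi-bin/protscale.pl"

fields = {
    "sequence": "",
    "scale": "Molecular weight",
    "window": "5",
    "weight_var": "linear",
}

# A POST carries the url-encoded fields in the request *body*, which is what
# postForm() assembles for you from its name = value arguments.
body = urlencode(fields).encode("ascii")
req = urllib.request.Request(action, data=body, method="POST")

print(req.get_method())  # POST
```

Sending it would be `urllib.request.urlopen(req)`; the response HTML would then still need parsing, as the reply notes for the variable `o`.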
Re: [R] write.zip?
Hi Spencer I just put a new source version (0.9-0) of the Rcompression package on the www.omegahat.org/R repository, and it has a new function zip() that creates or appends to a zip file, allowing one to provide alternative names. I'll add support for writing content from memory (i.e. AsIs character strings and raw vectors) soon. It doesn't yet handle replacing or removing elements. I may use a different approach (e.g. the 7-zip lzma SDK) for that and other things. D. spencerg wrote: Thanks to Dieter Menne and Prof. Ripley for the replies. For certain definitions of better (e.g., transportability?), the Rcompression package might be superior to the system call I mentioned. I also just found the tar function in the utils package, which looks like it might be more transportable than my system call. However, as Prof. Ripley noted, there may not be a simpler way than my system call, especially considering the time I would have to invest to learn how to use it. Thanks again very much, Spencer Graves Prof Brian Ripley wrote: On Tue, 9 Feb 2010, spencerg wrote: Can one write a zip file from R? I want to create a file with a name like dat.zip, being a zip file containing dat.csv. I can create dat.csv, then call

system('zip -r9 dat.zip dat.csv')

Is there a better way? Not really. One could use compiled code like that in unzip() to do this, but as nothing inside R is involved, the only gain would be to not need to have the zip executable present. The Omegahat package Rcompression does have means to manipulate .zip files from compiled code with an R interface, but not, AFAICS, a simpler way to do what you want. I can use gzfile to write a gz file, but I don't know how to give that a structure that would support unzipping to a *.csv file. A zip file is not a gzipped file, even though gzip compression is used for parts of the contents. The header and trailer are quite different.
Thanks, Spencer
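The task in this thread - build dat.zip containing dat.csv, without shelling out to an external zip executable - is a few lines in Python's stdlib zipfile module, which also illustrates the "alternative names" feature of Rcompression's zip() via arcname (paths below use a temporary directory invented for the example):

```python
import os
import tempfile
import zipfile

# Equivalent of system('zip -r9 dat.zip dat.csv'), with no external zip
# executable: write dat.csv, then store it in dat.zip at max compression.
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "dat.csv")
zip_path = os.path.join(workdir, "dat.zip")

with open(csv_path, "w") as fh:
    fh.write("x,y\n1,2\n")

with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
    zf.write(csv_path, arcname="dat.csv")  # arcname = alternative member name

with zipfile.ZipFile(zip_path) as zf:
    print(zf.namelist())  # ['dat.csv']
    data = zf.read("dat.csv").decode()
```

As the thread notes, this produces a genuine zip archive (central directory and all), which `gzfile`-style stream compression cannot.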
Re: [R] convert R plots into annotated web-graphics
Hi While there are different levels of support for SVG in the different browsers, basic SVG (non-animation) does work in all of them (with a plugin for IE). In addition to the 2 SVG packages on CRAN, there is SVGAnnotation at www.omegahat.org/SVGAnnotation, and that is quite a bit more powerful. There is a link on that page to some examples that are similar to yours. Image maps are a perfectly good way of achieving the interactivity you describe, and Barry's imagemap package should make this pretty straightforward. If all you need is to have event handlers for regions, then image maps will be fine. And some JavaScript code will allow you to connect the image map events to changing characteristics of the table. The rest of this mail is about richer approaches. However, there are other styles of interaction and animation that require working at the level of objects on the plot, i.e. points, lines, text, etc. When we have these objects at rendering time, rather than pixels and regions, we can, e.g., change a point's appearance (color or size), hide it, or move it. You need this to do linked plots, for example, i.e. where we mouse over a point in one plot or the data table and highlight the corresponding observations in other plots. If you want this richer framework, you can generate the plot in R in such a way that it will be displayed in your browser not as a PNG file, but with real objects being created within the rendering. The SVGAnnotation package does this reasonably comprehensively. You can also generate a plot in R that will be displayed on the JavaScript canvas. Again, this will create objects, and they can then be manipulated by JavaScript event handlers that work on the plot elements and the table. There is a prototype of such an R-JavaScript canvas graphics device in the RGraphicsDevice package at www.omegahat.org/RGraphicsDevice.
Also, there is a beta-level Flash device that works at the object level and allows an R programmer to annotate the resulting plot in either R or ActionScript. (This is at www.omegahat.org/FlashMXML.) There is another Flash graphics device for R at https://r-forge.r-project.org/projects/swfdevice/ but this doesn't work at the object level (at this point in time, at least). Both the FlashMXML and JavaScript packages rely on the RGraphicsDevice package, and that could be fixed up slightly to handle font metric calculations with more accuracy (e.g. using RFreetype). Instead of using an HTML table and modifying it programmatically via CSS properties, etc., you might use a widget, e.g. a DataTable widget from the Yahoo UI JavaScript library, or a Flash DataGrid, to display the data as an interactive table. As I said, image maps are probably simplest if your needs are reasonably simple. These other approaches allow for potentially richer Web-based graphics. Barry Rowlingson wrote: On Sun, Feb 7, 2010 at 2:35 PM, Rainer Tischler rainer_...@yahoo.de wrote: Dear all, I would like to make a large scatter plot created with R available as an interactive web graphic, in combination with additional text-annotations for each data point in the plot. The idea is to present the text-annotations in an HTML table and inter-link the data points in the plot with their corresponding entries in the table, i.e. when clicking on a data point in the plot, the corresponding entry in the table should be highlighted or centered, and vice-versa, when clicking on a table entry, the corresponding point in the plot should be highlighted. I have seen that CRAN contains various R packages for SVG-based output of interactive graphics (with hyperlinks and tool-tip annotations for each data point); however, SVG is not supported by all browsers. Is anybody aware of another solution for this problem (maybe based on image maps and javascript)?
If you have alternative ideas for interlinking tabular annotations with plotted data points, I would appreciate any recommendation/suggestion. (I work with R 2.8.1 on different 32-bit PCs with both Linux and Windows operating systems). My 'imagemaps' package? https://r-forge.r-project.org/projects/imagemap/ Barry __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
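As a concrete sketch of the image-map route discussed in this thread (the data, the file name, and the JavaScript highlight() handler are invented for illustration, not taken from Barry's package), the pixel coordinates needed for HTML area tags can be read off an open png() device with grconvertX()/grconvertY():

```r
# Sketch: compute pixel coordinates of plotted points for an HTML image map.
x <- runif(10); y <- runif(10)
f <- file.path(tempdir(), "scatter.png")
png(f, width = 480, height = 480)
plot(x, y)
# While the device is still open, convert user coordinates to
# device coordinates (pixels, for a png device).
px <- grconvertX(x, from = "user", to = "device")
py <- grconvertY(y, from = "user", to = "device")
dev.off()
# Each point becomes an <area> element whose onclick handler
# (a hypothetical JavaScript function highlight()) would select
# the matching row of the annotation table.
areas <- sprintf('<area shape="circle" coords="%d,%d,5" onclick="highlight(%d)">',
                 round(px), round(py), seq_along(x))
```

Barry's imagemap package automates essentially this bookkeeping.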
Re: [R] create zip archive in R
Uwe Ligges wrote: On 04.02.2010 03:31, mkna005 mkna005 wrote: Hello all! I was wondering if it is possible to create a zip archive within R and add files to it? No. Well, the Rcompression package in the Omegahat repository does have some facilities for it. It doesn't do it in memory, but it does handle the issues of moving disparate files to a common temporary directory and getting things in order generally to create the zip file. But it currently uses the external zip executable to create the archive. I probably will get around to implementing a version in memory, as it has been an issue that has nagged me for a while. And we have the code for it. I know it is possible to unzip files but is it possible the other way round? No. For (compressed) archives see ?tar For other compression formats of single files see ?file Uwe Ligges Thanks in advance Christoph __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] RCurl : limit of downloaded Urls ?
Alexis-Michel Mugabushaka wrote: Dear Rexperts, I am using R to query google. I believe that Google would much prefer that you use their API rather than their regular HTML form to make programmatic search queries. I am getting different results (in size) for manual queries and queries sent through getForm of RCurl. It seems that RCurl limits the size of the text retrieved (the maximum I could get is around 32 k bits). _bytes_ I assume. zz = getForm("http://www.google.com/search", q = 'google+search+api', num = 100) nchar(zz) [1] 109760 So that is more than 3 times 32KB, and there isn't a limit of 32K. The results will most likely be chunked, i.e. returned in blocks, but getForm() and other functions will, by default, combine the chunks and return the entire answer. If you were to provide your own function for the writefunction option in RCurl functions, then your function will be called for each chunk. So to be able to figure out why things are not working for you, we need to see the R code you are using, and to know the operating system and the versions of the RCurl package and R. D. Any idea how to get around this? Thanks in advance [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
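The writefunction mechanism mentioned above can be sketched with RCurl's basicTextGatherer(); the URL and query below are illustrative. The gatherer's update() method is invoked once per chunk that libcurl receives, and the accumulated text is retrieved at the end (which is what getForm() effectively does for you by default):

```r
library(RCurl)

# A text gatherer: h$update is called for each chunk of the response
# as it arrives; the chunks are accumulated internally.
h <- basicTextGatherer()
getForm("http://www.google.com/search", q = "r project",
        .opts = list(writefunction = h$update))

txt <- h$value()   # the full response, all chunks combined
nchar(txt)         # no 32KB ceiling here
```

If the response really were being truncated, a custom writefunction like this is the place where that would become visible.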
Re: [R] SSOAP XML-RPC
Hi Jan Is .XMLRPC("http://localhost:9000", "Cytoscape.test", .opts = list(verbose = TRUE)) the command you used? If not, what did you use? Can you debug the .XMLRPC function (e.g. with options(error = recover)) and see the XML that was sent to the server, i.e. the cmd variable in the .XMLRPC() function? Can you find out what the Perl, Python or Ruby modules send? It is easy to fix if we know what should be sent, but we do need more details. D. Jan Bot wrote: Hi, I'm trying to use the XML-RPC client in the SSOAP package to connect to a service that I have created. From other languages (Perl, Python, Ruby) this is not a problem, but the SSOAP client gives the following error: Error in .XMLRPC("http://localhost:9000", "Cytoscape.test", .opts = list(verbose = TRUE)) : Failed to parse XML-RPC request: Content is not allowed in prolog. It looks like the SSOAP XML-RPC client is not creating the right type of XML-RPC message. Does anyone know how to fix this or has successfully used the SSOAP XML-RPC client? Thanks, Jan __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Data import export zipped files from URLs
Dieter Menne wrote: Velappan Periasamy wrote: I am not able to import zipped files from the following link. How can I get the same into R? mydata <- read.csv("http://nseindia.com/content/historical/EQUITIES/2010/JAN/cm15JAN2010bhav.csv.zip") As Brian Ripley noted in http://markmail.org/message/7dsauipzagq5y36o you will have to download it first and then unzip it. Well, if downloading to disk first does need to be avoided, you can use the RCurl and Rcompression packages to do the computations in memory: library(RCurl) ctnt = getURLContent("http://nseindia.com/content/historical/EQUITIES/2010/JAN/cm15JAN2010bhav.csv.zip") library(Rcompression) zz = zipArchive(ctnt) names(zz) txt = zz[[1]] read.csv(textConnection(txt)) D. Dieter __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
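For completeness, the download-then-unzip route Brian Ripley suggested can be sketched in base R alone; the file name inside the archive is an assumption read off the URL, not verified against the actual zip:

```r
# Download the zip to a temporary file, then read the csv it contains.
url <- "http://nseindia.com/content/historical/EQUITIES/2010/JAN/cm15JAN2010bhav.csv.zip"
tf <- tempfile(fileext = ".zip")
download.file(url, tf, mode = "wb")   # binary mode matters on Windows
# unz() opens a single named member of the archive as a connection;
# the member name here is assumed to match the URL's base name.
mydata <- read.csv(unz(tf, "cm15JAN2010bhav.csv"))
unlink(tf)
```

This touches disk, unlike the RCurl/Rcompression approach above, but uses no extra packages.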
Re: [R] xmlToDataFrame#Help!!!
Christian Ritter wrote: I'm struggling with interpreting XML files created by ADODB as data.frames and I'm looking for advice (see attached example file). You'll have to attach it (or give us a URL for it). Also, you should tell us what you have tried and how it failed. And of course, your sessionInfo(). D. Note: This file contains a result set which comes from a rectangular data array. I've been trying to play with parameters to the xmlToDataFrame function in the XML package but I don't get it to extract the data frame. This is what the result should look like:

      Name Sex Age Height Weight
1   Alfred   M  14   69.0  112.5
2    Alice   F  13   56.5   84.0
3  Barbara   F  13   65.3   98.0
4    Carol   F  14   62.8  102.5
5    Henry   M  14   63.5  102.5
6    James   M  12   57.3   83.0
7     Jane   F  12   59.8   84.5
8    Janet   F  15   62.5  112.5
9  Jeffrey   M  13   62.5   84.0
10    John   M  12   59.0   99.5
11   Joyce   F  11   51.3   50.5
12    Judy   F  14   64.3   90.0
13  Louise   F  12   56.3   77.0
14    Mary   F  15   66.5  112.0
15  Philip   M  16   72.0  150.0
16  Robert   M  12   64.8  128.0
17  Ronald   M  15   67.0  133.0
18  Thomas   M  11   57.5   85.0
19 William   M  15   66.5  112.0

Thanks in advance ... Chris P.S.: In return, I'll continue developing a small package called R2sas2R, with obvious meaning, and I'll release it on CRAN as soon as I'm a bit further along. (First tests under Windows using the StatconnDCOM connector and the rcom package are encouraging.) -- There are men who can think no deeper than a fact - Voltaire Duncan Temple Lang dun...@wald.ucdavis.edu Department of Statistics work: (530) 752-4782 4210 Mathematical Sciences Bldg. fax: (530) 752-7099 One Shields Ave. University of California at Davis Davis, CA 95616, USA __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
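Since the ADODB file itself was never attached, here is a minimal self-contained sketch of what xmlToDataFrame() expects: a root node whose children are uniform record elements. The element names below are invented for illustration and will not match the ADODB schema:

```r
library(XML)

# Hypothetical rectangular data serialized as uniform <row> records.
txt <- '<rows>
  <row><Name>Alfred</Name><Sex>M</Sex><Age>14</Age></row>
  <row><Name>Alice</Name><Sex>F</Sex><Age>13</Age></row>
</rows>'

doc <- xmlParse(txt)
df <- xmlToDataFrame(doc)   # one data frame row per <row> element
df
```

With ADODB output, the records typically sit below some namespace-qualified wrapper, so one usually has to pass the record nodes explicitly, e.g. xmlToDataFrame(nodes = getNodeSet(doc, "//z:row", namespaces)) with the appropriate namespace mapping; without the actual file, that part is guesswork.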
Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)
Hi Lauri. I am in the process of making some changes to the encoding handling in the XML package. I'll take a look over the next few days. (Not certain precisely when.) D. Lauri Nikkinen wrote: Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below library(RCurl) library(XML) site <- getURL("http://www.aarresaari.net/jobboard/jobs.html") txt <- readLines(tc <- textConnection(site)); close(tc) txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE) g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) head(grep( , g, value=T)) [1]   PART-TIME EXPORT SALES ASSOCIATES (ALSO SUMMER WORK)  Valuatum Oy  Helsinki  Ilmoitus lisätty: 31.12.2009. Viimeinen hakupäivä: 28.02.2010 [2]   MSN EDITOR / ONLINE PRODUCER  Manpower Oy  Espoo  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 15.1.2010 [3]   MYYNTINEUVOTTELIJA  Rand Customer Contact Oy  Helsinki  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 30.1.2010 [4]   HALUATKO IT-ARKKITEHDIKSI SHANGHAIHIN?  HALUATKO IT-ARKKITEHDIKSI SHANGHAIHIN?  Shanghai, China  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 28.2.2010 [5]   HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN?  HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN?  Shanghai, China  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 28.2.2010 [6]   Korkeakouluharjoittelija/ työelämävalmennettava  Suomen suurlähetystö Pristina, Kosovo  Pristina, Kosovo  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 20.1.2010 This won't help: txt <- readLines(tc <- textConnection(site)); close(tc) txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE, encoding = "latin1") g <- xpathSApply(txt, "//p", function(x) xmlValue(x)) head(grep( , g, value=T)) [1]   PART-TIME EXPORT SALES ASSOCIATES (ALSO SUMMER WORK)  Valuatum Oy  Helsinki  Ilmoitus lisätty: 31.12.2009. 
Viimeinen hakupäivä: 28.02.2010 [2]   MSN EDITOR / ONLINE PRODUCER  Manpower Oy  Espoo  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 15.1.2010 [3]   MYYNTINEUVOTTELIJA  Rand Customer Contact Oy  Helsinki  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 30.1.2010 [4]   HALUATKO IT-ARKKITEHDIKSI SHANGHAIHIN?  HALUATKO IT-ARKKITEHDIKSI SHANGHAIHIN?  Shanghai, China  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 28.2.2010 [5]   HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN?  HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN?  Shanghai, China  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 28.2.2010 [6]   Korkeakouluharjoittelija/ työelämävalmennettava  Suomen suurlähetystö Pristina, Kosovo  Pristina, Kosovo  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 20.1.2010 Any ideas? Thanks, Lauri sessionInfo() R version 2.10.0 (2009-10-26) i386-pc-mingw32 locale: [1] LC_COLLATE=Finnish_Finland.1252 LC_CTYPE=Finnish_Finland.1252 LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C [5] LC_TIME=Finnish_Finland.1252 attached base packages: [1] grDevices datasets splines graphics utils grid stats methods base other attached packages: [1] RDCOMClient_0.92-0 XML_2.6-0 RCurl_1.3-1 Hmisc_3.7-0 survival_2.35-8 ggplot2_0.8.5 digest_0.4.2 reshape_0.8.3 [9] plyr_0.1.9 proto_0.3-8 gplots_2.7.4 caTools_1.10 bitops_1.0-4.1 gtools_2.6.1 gmodels_2.15.0 gdata_2.6.1 [17] lattice_0.17-26 loaded via a namespace (and not attached): [1] cluster_1.12.1 MASS_7.3-4 tools_2.10.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Have you used RGoogleDocs and RGoogleData?
Farrel Buchinsky wrote: It works! Thanks a lot! It's great. Thanks for letting me know. Glad that fixed things for you. What were your few minor, but important, changes, in a nutshell? I will not understand unless you describe them as high-level issues. Basically, recognizing the type of a document, e.g. a spreadsheet or word processing document or generic document. The changes made the detection more robust or more consistent with any changes at Google. D. Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Fri, Dec 11, 2009 at 19:07, Duncan Temple Lang dun...@wald.ucdavis.edu wrote: Hi Farrel I have taken a look at the problems using RGoogleDocs to read spreadsheets and was able to reproduce the problem I believe you were having. A few minor, but important, changes and I can read spreadsheets again and apparently still other types of documents. I have put up an updated version of the source of the package with these changes. It is available from http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.tar.gz There is a binary for Windows in http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.zip Hopefully this will cure the problems you have been experiencing. I'd appreciate knowing either way. Thanks, D. Farrel Buchinsky wrote: Both of these applications fulfill a great need of mine: to read data directly from google spreadsheets that are private to myself and one or two collaborators. Thanks to the authors. I had been using RGoogleDocs for about 6 months (maybe more) but have had to stop using it in the past month since, for some reason that I do not understand, it no longer reads google spreadsheets. I loved it. Its loss depresses me. I started using RGoogleData, which works. I have noticed that both packages read data slowly. RGoogleData is much slower than RGoogleDocs used to be. 
Both seem a lot slower than if one manually downloaded a google spreadsheet as a csv and then used read.csv function - but then I would not be able to use scripts and execute without finding and futzing. Can anyone explain in English why these packages read slower than a csv download? Can anyone explain what the core difference is between the two packages? Can anyone share their experience with reading Google data straight into R? Farrel Buchinsky Google Voice Tel: (412) 567-7870 Sent from Pittsburgh, Pennsylvania, United States [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Have you used RGoogleDocs and RGoogleData?
Hi Farrel I have taken a look at the problems using RGoogleDocs to read spreadsheets and was able to reproduce the problem I believe you were having. A few minor, but important, changes and I can read spreadsheets again and apparently still other types of documents. I have put an updated version of the source of the package with these changes. It is available from http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.tar.gz There is a binary for Windows in http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.zip Hopefully this will cure the problems you have been experiencing. I'd appreciate knowing either way. Thanks, D. Farrel Buchinsky wrote: Both of these applications fulfill a great need of mine: to read data directly from google spreadsheets that are private to myself and one or two collaborators. Thanks to the authors. I had been using RGoogleDocs for the about 6 months (maybe more) but have had to stop using it in the past month since for some reason that I do not understand it no longer reads google spreadsheets. I loved it. Its loss depresses me. I started using RGoogleData which works. I have noticed that both packages read data slowly. RGoogleData is much slower than RGoogleDocs used to be. Both seem a lot slower than if one manually downloaded a google spreadsheet as a csv and then used read.csv function - but then I would not be able to use scripts and execute without finding and futzing. Can anyone explain in English why these packages read slower than a csv download? Can anyone explain what the core difference is between the two packages? Can anyone share their experience with reading Google data straight into R? 
Farrel Buchinsky Google Voice Tel: (412) 567-7870 Sent from Pittsburgh, Pennsylvania, United States [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Scraping a web page
Hi Michael If you just want all of the text that is displayed in the HTML document, then you might use an XPath expression to get all the text() nodes and get their value. An example is doc = htmlParse("http://www.omegahat.org/") txt = xpathSApply(doc, "//body//text()", xmlValue) The result is a character vector that contains all the text. By limiting the nodes to the body, we avoid the content in the head element, such as inlined JavaScript or CSS. It is also possible that a document may have script elements in the body containing JavaScript that you don't want. You can omit these: txt = xpathSApply(doc, "//body//text()[not(ancestor::script)]", xmlValue) And if there were other elements we wanted to ignore, then you could use txt = xpathSApply(doc, "//body//text()[not(ancestor::script) and not(ancestor::otherElement)]", xmlValue) HTH, D. Michael Conklin wrote: I would like to be able to submit a list of URLs of various webpages and extract the content, i.e. not the mark-up, of those pages. I can find plenty of examples in the XML library of extracting links from pages but I cannot seem to find a way to extract the text. Any help would be greatly appreciated - I will not know the structure of the URLs I would submit in advance. Any suggestions on where to look would be greatly appreciated. Mike W. Michael Conklin Chief Methodologist MarketTools, Inc. | www.markettools.com 6465 Wayzata Blvd | Suite 170 | St. Louis Park, MN 55426. PHONE: 952.417.4719 | CELL: 612.201.8978 This email and attachment(s) may contain confidential and/or proprietary information and is intended only for the intended addressee(s) or its authorized agent(s). Any disclosure, printing, copying or use of such information is strictly prohibited. 
If this email and/or attachment(s) were received in error, please immediately notify the sender and delete all copies [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading from Google Docs
Farrel Buchinsky wrote: Please oh please could someone help me or at least confirm that they are having the same problem. Why am I getting the error message from RGoogleDocs getDocs(sheets.con) Error in getDocs(sheets.con) : problems connecting to get the list of documents You are using a connection to the "wise" service (for worksheets) to get the list of documents from the document service. If you call getDocs() with a connection to "writely", I imagine it will succeed. So you have a token, but it is for the wrong thing. How do I troubleshoot? The first thing is to learn about debugging in R. For example, options(error = recover) getDocs(sheets.con) The error occurs and you are presented with a menu prompt that allows you to select the call frame of interest. There is only one - getDocs(). Enter 1 Return. Now you have an R prompt that allows you to explore the call frame. objects() body() Take a look at status status WWW-Authenticate GoogleLogin realm="http://www.google.com/accounts/ClientLogin", service="writely" Content-Type text/html; charset=UTF-8 Date Sat, 28 Nov 2009 17:36:16 GMT Expires Sat, 28 Nov 2009 17:36:16 GMT Cache-Control private, max-age=0 X-Content-Type-Options nosniff X-XSS-Protection 0 X-Frame-Options SAMEORIGIN Server GFE/2.0 Transfer-Encoding chunked status 401 statusMessage Token invalid This is the parsed header of the reply from the GoogleDocs server. x contains the result of the query and it is an HTML document with the (same) error message. Farrel Buchinsky Google Voice Tel: (412) 567-7870 On Wed, Nov 25, 2009 at 17:08, Farrel Buchinsky fjb...@gmail.com wrote: Oh OH! Could you please help with a problem that I never used to get. library(RGoogleDocs) ps <- readline(prompt = "get the password in ") sheets.con = getGoogleDocsConnection(getGoogleAuth("fjb...@gmail.com", ps, service = "wise")) ts2 = getWorksheets("OnCall", sheets.con) Those opening lines of script used to work flawlessly. Now I get. 
Error in getDocs(con) : problems connecting to get the list of documents Yet I got it to work earlier while I had been toying with the RGoogleData package in another session. Could RGoogleData have opened something for RGoogleDocs to use? Farrel Buchinsky Google Voice Tel: (412) 567-7870 Sent from Pittsburgh, Pennsylvania, United States On Wed, Nov 25, 2009 at 16:34, Farrel Buchinsky fjb...@gmail.com wrote: That was painless. I had already installed Rtools and had already put it on my path. Your line worked very well. [Thanks for telling me. How I did it last time was worse than sticking daggers in my eyes.] install.packages("RGoogleDocs", repos = "http://www.omegahat.org/R", type = "source") I now have Package: RGoogleDocs Version: 0.4-0 Title: Maintainer: Duncan Temple Lang dun...@wald.ucdavis.edu Packaged: 2009-10-27 22:10:22 UTC; duncan Built: R 2.10.0; ; 2009-11-25 20:59:03 UTC; windows I am providing the following link to a copy of my RGoogleDocs zipped directory. It is for people who run R on Windows and do not want to go through the pain of setting things up so that they can install source. http://dl.dropbox.com/u/23200/RGoogleDocs/RGoogleDocs.zip I BELIEVE that if one downloads the zip and extracts it to an empty directory called RGoogleDocs in one's Library
Re: [R] Build of XML package failed
Hi Luis. You can change the two lines PROBLEM buf WARN; to the one line warning(buf); That should compile. If not, please show us the compilation command for DocParse.c, i.e. all the arguments to the compiler, just above the error messages. D. Luis Tito de Morais wrote: Hi list, It may be a FAQ, but I searched the web and Uni of Newcastle Maths and Stats and R mailing list archive on this issue but was unable to find a solution. I would appreciate any pointer to help me solving this. I am using R version 2.10.0 (2009-10-26) on linux mandriva 2010.0 I tried to install the XML_2.6-0.tar.gz package both with install.packages('XML', dep=T) from within R and the R CMD INSTALL using a local tar.gz file. I am having the following error message (sorry it is partly in french): Dans le fichier inclus à partir de DocParse.c:13: Utils.h:175:2: attention : #warning Redefining COPY_TO_USER_STRING to use encoding from XML parser DocParse.c: In function ‘notifyError’: DocParse.c:1051: erreur: le format n'est pas une chaîne littérale et pas d'argument de format This last error message means: error: format not a string literal and no format arguments In the past when having such errors with other packages, I have been able to solve it with the help of this tip: http://mario79t.wordpress.com/2009/06/23/warning-format-not-a-string-literal-and-no-format-arguments/ and modifying the faulty source file accordingly. But in this specific case, I have been unable to find what to modify in the source file. Line 1051 in the DocParse.c source file only has the command WARN. I don't know anything about C programming and could not figure out what to modify in this case. I would appreciate any help on this issue. Best regards, Tito __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
Re: [R] How to suppress errors generated by readHTMLTable?
Just this morning, I made suppressing these parser messages the default behavior for htmlParse(), and that will apply to readHTMLTable() also. Until I release that (along with another potentially non-backward-compatible change regarding character encoding), you can use readHTMLTable(htmlParse("index.html", error = function(...){})) i.e. parse the document yourself and hand it to readHTMLTable(). D. Peng Yu wrote: library(XML) download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit', 'index.html') tables = readHTMLTable("index.html", error = function(...){}) tables readHTMLTable gives me the following errors. Could somebody let me know how to suppress them?

Opening and ending tag mismatch: center and table
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
Opening and ending tag mismatch: td and tr
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
htmlParseEntityRef: expecting ';'
Unexpected end tag : form
Opening and ending tag mismatch: body and center
Opening and ending tag mismatch: body and center

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, 
reproducible code.
Re: [R] XML package example code?
Peng Yu wrote: On Wed, Nov 25, 2009 at 12:19 AM, cls59 ch...@sharpsteen.net wrote: Peng Yu wrote: I'm interested in parsing an html page. I should use XML, right? Could somebody show me some example code? Is there a tutorial for this package? Did you try looking through the help pages for the XML package or browsing the Omegahat website? Look at: library(XML) ?htmlTreeParse And the relevant web page for documentation and examples is: http://www.omegahat.org/RSXML/ http://www.omegahat.org/RSXML/shortIntro.html I'm trying the example on the above webpage. But I'm not sure why I got the following error. Would you help to take a look? $ Rscript main.R library(XML) download.file('http://www.omegahat.org/RSXML/index.html', 'index.html') trying URL 'http://www.omegahat.org/RSXML/index.html' Content type 'text/html; charset=ISO-8859-1' length 3021 bytes opened URL == downloaded 3021 bytes doc = xmlInternalTreeParse("index.html") You are trying to parse an HTML document as if it were XML, but HTML is often not well-formed. So use htmlParse() for a more forgiving parser. Or use the RTidyHTML package (www.omegahat.org/RTidyHTML) to make the HTML well-formed before passing it to xmlTreeParse() (aka xmlInternalTreeParse()). That package is an interface to libtidy. D. 
Opening and ending tag mismatch: dd line 68 and dl
Opening and ending tag mismatch: li line 67 and body
Opening and ending tag mismatch: dt line 66 and html
Premature end of data in tag dd line 64
Premature end of data in tag li line 63
Premature end of data in tag dt line 62
Premature end of data in tag dl line 61
Premature end of data in tag body line 5
Premature end of data in tag html line 1
Error: 1: Opening and ending tag mismatch: dd line 68 and dl 2: Opening and ending tag mismatch: li line 67 and body 3: Opening and ending tag mismatch: dt line 66 and html 4: Premature end of data in tag dd line 64 5: Premature end of data in tag li line 63 6: Premature end of data in tag dt line 62 7: Premature end of data in tag dl line 61 8: Premature end of data in tag body line 5 9: Premature end of data in tag html line 1
Execution halted
__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] problem post request with RCurl
Use curlPerform(url = 'http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi', postfields = q) That gives me:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="running"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>31406321645402938</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>

Rajarshi Guha wrote: Hi, I am trying to use a CGI service (Pubchem PUG) via RCurl and am running into a problem where the data must be supplied via POST - but I don't know the keyword for the argument. The data to be sent is an XML fragment. I can do this via the command line using curl: I save the XML string to a file called query.xml and then do curl -d @query.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi" I get the expected response. More importantly, the verbose option shows: Accept: */* Content-Length: 1227 Content-Type: application/x-www-form-urlencoded However, when I try to do this via RCurl, the data doesn't seem to get sent:

q <- '<PCT-Data>
  <PCT-Data_input>
    <PCT-InputData>
      <PCT-InputData_query>
        <PCT-Query>
          <PCT-Query_type>
            <PCT-QueryType>
              <PCT-QueryType_qas>
                <PCT-QueryActivitySummary>
                  <PCT-QueryActivitySummary_output value="summary-table">0</PCT-QueryActivitySummary_output>
                  <PCT-QueryActivitySummary_type value="assay-central">0</PCT-QueryActivitySummary_type>
                  <PCT-QueryActivitySummary_scids>
                    <PCT-QueryUids>
                      <PCT-QueryUids_ids>
                        <PCT-ID-List>
                          <PCT-ID-List_db>pccompound</PCT-ID-List_db>
                          <PCT-ID-List_uids>
                            <PCT-ID-List_uids_E>3243128</PCT-ID-List_uids_E>
                          </PCT-ID-List_uids>
                        </PCT-ID-List>
                      </PCT-QueryUids_ids>
                    </PCT-QueryUids>
                  </PCT-QueryActivitySummary_scids>
                </PCT-QueryActivitySummary>
              </PCT-QueryType_qas>
            </PCT-QueryType>
          </PCT-Query_type>
        </PCT-Query>
      </PCT-InputData_query>
    </PCT-InputData>
  </PCT-Data_input>
</PCT-Data>'

postForm(url, q, style = "POST", .opts = list(verbose = TRUE)) * About to connect() to 
pubchem.ncbi.nlm.nih.gov port 80 (#0) * Trying 130.14.29.110... * connected * Connected to pubchem.ncbi.nlm.nih.gov (130.14.29.110) port 80 (#0) POST /pug/pug.cgi HTTP/1.1 Host: pubchem.ncbi.nlm.nih.gov Accept: */* Content-Length: 0 Content-Type: application/x-www-form-urlencoded As you can see, the data in q doesn't seem to get sent (content-length = 0). Does anybody have any suggestions as to why the call to postForm doesn't work, but the command line call does? Thanks, Rajarshi Guha| NIH Chemical Genomics Center http://www.rguha.net | http://ncgc.nih.gov Q: Why did the mathematician name his dog Cauchy? A: Because he left a residue at every pole. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
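[Editor's note] Duncan's fix works because curlPerform's postfields option sends the string verbatim as the request body, which is exactly what `curl -d @query.xml` does; postForm instead expects name=value form parameters, which is why Content-Length came out as 0. For comparison, a minimal Python sketch of the same idea using only the standard library (the URL is the one from the thread; the body is a placeholder, and the request is only built, not sent):

```python
import urllib.request

# Placeholder for the PCT-Data query XML; the real document is shown above.
query = b"<PCT-Data>...</PCT-Data>"

# Attaching a body turns the request into a POST, and the body is sent
# verbatim - the analogue of `curl -d @query.xml` / RCurl's postfields = q.
req = urllib.request.Request(
    "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi",
    data=query,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())   # POST
print(len(req.data))      # non-zero, unlike the failing postForm call
```

urllib.request.urlopen(req) would actually perform the POST; it is omitted here to keep the sketch network-free.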
Re: [R] XML: Reading transition matrices into R
stefan.d...@gmail.com wrote:

Hello, from a software I have the following output in xml (see below). It is a series of matrices, one for each age. I have 3 categories (the number might vary in the application), hence 3x3 matrices where each element gives the probability of transition from i to j. I would like to read this data into R (preferably into a list, where each list element is one of the age-specific matrices) and - after altering the values in R - write it back into the file. I know that there is an XML package in R with which I have already struggled, but I have to admit my understanding is too limited. Maybe somebody had a similar problem or knows the code off the top of his or her head.

Hi Stefan

There are many approaches for handling this. I assume that the primary obstacle you are facing is extracting the values from the XML. The following will do that for you. We start with the content in transition.xml (or in a string in R). Since the XML is very shallow, i.e. not very hierarchical, and all the information is in the transition nodes under the root, we can use xmlToList(). This returns a list with an element for each transition element, and each such element is a character vector containing the values from age, sex, from, to, and percent. So I've put these into a matrix and you are now back entirely within R and can group the values by age and arrange them into the individual transition matrices.

  doc = xmlParse("transition.xml")
  matrix(as.numeric(unlist(xmlToList(doc))), , 5, byrow = TRUE,
         dimnames = list(NULL, names(xmlRoot(doc)[[1]])))

D.

Any help appreciated.
Thanks and best, Stefan

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<transitionmatrix>
<transition><age>0</age><sex>0</sex><from>1</from><to>1</to><percent>99.9</percent></transition>
<transition><age>0</age><sex>0</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>0</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>0</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>0</sex><from>2</from><to>2</to><percent>99.85</percent></transition>
<transition><age>0</age><sex>0</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>0</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>0</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>0</sex><from>3</from><to>3</to><percent>99.85</percent></transition>
<transition><age>0</age><sex>1</sex><from>1</from><to>1</to><percent>100.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>2</from><to>2</to><percent>100.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
<transition><age>0</age><sex>1</sex><from>3</from><to>3</to><percent>100.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>1</from><to>1</to><percent>99.9</percent></transition>
<transition><age>1</age><sex>0</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>2</from><to>2</to><percent>99.85</percent></transition>
<transition><age>1</age><sex>0</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>0</sex><from>3</from><to>3</to><percent>99.85</percent></transition>
<transition><age>1</age><sex>1</sex><from>1</from><to>1</to><percent>100.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>2</from><to>2</to><percent>100.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
<transition><age>1</age><sex>1</sex><from>3</from><to>3</to><percent>100.0</percent></transition>
<transition><age>2</age><sex>0</sex><from>1</from><to>1</to><percent>99.35205</percent></transition>
<transition><age>2</age><sex>0</sex><from>1</from><to>2</to><percent>0.6479474</percent></transition>
<transition><age>2</age><sex>0</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
<transition><age>2</age><sex>0</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
<transition><age>2</age><sex>0</sex><from>2</from><to>2</to><percent>97.101456</percent></transition>
<transition><age>2</age><sex>0</sex><from>2</from><to>3</to><percent>2.8985496</percent></transition>
<transition><age>2</age><sex>0</sex>
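[Editor's note] For readers coming from the Python side, the grouping into age- and sex-specific matrices that Duncan describes can also be sketched with the standard library's ElementTree. The XML sample below is an abbreviated 2x2 stand-in with invented percentages, in the same shape as the transitionmatrix document above:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated 2x2 stand-in with invented percentages, shaped like the
# transitionmatrix document quoted above.
xml_doc = """<transitionmatrix>
<transition><age>0</age><sex>0</sex><from>1</from><to>1</to><percent>99.9</percent></transition>
<transition><age>0</age><sex>0</sex><from>1</from><to>2</to><percent>0.1</percent></transition>
<transition><age>0</age><sex>0</sex><from>2</from><to>1</to><percent>0.15</percent></transition>
<transition><age>0</age><sex>0</sex><from>2</from><to>2</to><percent>99.85</percent></transition>
</transitionmatrix>"""

def read_matrices(text, ncat=2):
    """Group the flat <transition> records into one ncat x ncat matrix
    per (age, sex) pair, indexed by the 1-based from/to categories."""
    matrices = defaultdict(lambda: [[0.0] * ncat for _ in range(ncat)])
    for tr in ET.fromstring(text).findall("transition"):
        key = (tr.findtext("age"), tr.findtext("sex"))
        i, j = int(tr.findtext("from")) - 1, int(tr.findtext("to")) - 1
        matrices[key][i][j] = float(tr.findtext("percent"))
    return dict(matrices)

mats = read_matrices(xml_doc)
print(mats[("0", "0")])   # [[99.9, 0.1], [0.15, 99.85]]
```

The same idea underlies the R solution: each transition record is one row, and the (age, sex) pair selects which matrix the (from, to) cell belongs to.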
Re: [R] XML: Reading transition matrices into R
stefan.d...@gmail.com wrote:

Hello, thanks a lot. This is a form which I can work with in R. Another question, which I hope is not a bridge too far: can I write the R matrix which your code created back into the xml format (i.e. with the same tags and structure) from which it came, and hence feed it back to the original software?

trans = apply(xx, 1, function(x) {
           tr = newXMLNode("transition")
           mapply(newXMLNode, names(x), x, MoreArgs = list(parent = tr))
           tr
        })
top = newXMLNode("transitionmatrix", .children = trans)
saveXML(top, "newTransition.xml")

Best, Stefan

On Thu, Nov 12, 2009 at 3:17 PM, Duncan Temple Lang dun...@wald.ucdavis.edu wrote:

stefan.d...@gmail.com wrote: Hello, from a software I have the following output in xml (see below). It is a series of matrices, one for each age. I have 3 categories (the number might vary in the application), hence 3x3 matrices where each element gives the probability of transition from i to j. I would like to read this data into R (preferably into a list, where each list element is one of the age-specific matrices) and - after altering the values in R - write it back into the file. I know that there is an XML package in R with which I have already struggled, but I have to admit my understanding is too limited. Maybe somebody had a similar problem or knows the code off the top of his or her head.

Hi Stefan

There are many approaches for handling this. I assume that the primary obstacle you are facing is extracting the values from the XML. The following will do that for you. We start with the content in transition.xml (or in a string in R). Since the XML is very shallow, i.e. not very hierarchical, and all the information is in the transition nodes under the root, we can use xmlToList(). This returns a list with an element for each transition element, and each such element is a character vector containing the values from age, sex, from, to, and percent.
So I've put these into a matrix and you are now back entirely within R and can group the values by age and arrange them into the individual transition matrices.

  doc = xmlParse("transition.xml")
  matrix(as.numeric(unlist(xmlToList(doc))), , 5, byrow = TRUE,
         dimnames = list(NULL, names(xmlRoot(doc)[[1]])))

D.

Any help appreciated. Thanks and best, Stefan
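[Editor's note] Duncan's newXMLNode()/saveXML() answer builds one <transition> node per matrix row and nests the field nodes under it. A comparable round-trip sketch with Python's ElementTree (tag names and order follow the thread; the row values are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Tag names and order follow the transitionmatrix documents in the thread.
FIELDS = ("age", "sex", "from", "to", "percent")

def write_transitions(rows):
    """Serialize rows of (age, sex, from, to, percent) values back into
    the <transitionmatrix>/<transition> layout."""
    top = ET.Element("transitionmatrix")
    for row in rows:
        tr = ET.SubElement(top, "transition")
        for name, value in zip(FIELDS, row):
            ET.SubElement(tr, name).text = str(value)
    return ET.tostring(top, encoding="unicode")

# Illustrative, made-up values.
xml_text = write_transitions([(0, 0, 1, 1, 99.9), (0, 0, 1, 2, 0.1)])
print(xml_text)
```

As in the R version, the serializer is just the reader run in reverse: each row becomes one <transition> element, so the output can be fed straight back to the original software.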
Re: [R] help with SSOAP (can't find working examples)
Hi Steffen et al.

The development version of SSOAP and XMLSchema I have on my machine does complete the processWSDL() call without errors. I have to finish off some tests before releasing these. It may take a few days before I have time to work on this, but hopefully soon. Thanks for the info.

D.

Steffen Neumann wrote:

Hi, I can confirm this; just today I tried to write a web service client. Affected are both SSOAP 0.5-4 and SSOAP 0.4-6. I can't access anonymous CVS at the moment to check for recent fixes. I am unable to map the error message to any of the items in http://www.omegahat.org/SSOAP/Todo.html - is this already known?

Yours, Steffen

> library(SSOAP)
Loading required package: XML
Loading required package: RCurl
Loading required package: bitops
> w = processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
Error: Cannot resolve ns:searchPeakDiff in SchemaCollection
In addition: Warning messages:
1: In function (node) : skipping import node with no schemaLocation attribute
2: In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") :
   Ignoring additional serviceport ... elements
> sessionInfo()
R version 2.8.1 (2008-12-22)
x86_64-pc-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SSOAP_0.4-6 RCurl_1.3-0 bitops_1.0-4.1 XML_2.5-3

> w = processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
Error: Cannot resolve ns:searchPeakDiff in SchemaCollection
In addition: Warning messages:
1: In function (node) : skipping import node with no schemaLocation attribute
2: In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") :
   Ignoring additional serviceport ... elements

Enter a frame number, or 0 to exit
 1: processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
 2: lapply(tmp, processWSDLBindings, doc, types)
 3: FUN(X[[1]], ...)
 4: lapply(els, processWSDLOperation, types, doc, namespaceDefinitions, typeDef
 5: FUN(X[[1]], ...)
 6: xmlSApply(msg, function(x) {
 7: xmlSApply.XMLNode(msg, function(x) {
 8: sapply(xmlChildren(X), FUN, ...)
 9: lapply(X, FUN, ...)
10: FUN(X[[1]], ...)
11: resolve(el, typeDefinitions)
12: resolve(el, typeDefinitions)
13: resolveError("Cannot resolve ", obj, " in ", class(context))

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
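[Editor's note] The "Cannot resolve ns:searchPeakDiff" failure is a namespace-resolution problem inside the WSDL's schema: the `ns:` prefix in an attribute value could not be matched to an imported schema (note the "skipping import node with no schemaLocation attribute" warning). As a generic illustration of that resolution step (not the SSOAP internals), here is the lookup done by hand with Python's ElementTree; the tiny WSDL-like fragment and the `tns` prefix are invented for the example:

```python
import xml.etree.ElementTree as ET

# Invented minimal fragment in the spirit of a WSDL <message> referencing
# a schema element through a namespace prefix.
doc = """<definitions xmlns:tns="http://example.org/massbank"
                      xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <types>
    <xs:schema targetNamespace="http://example.org/massbank">
      <xs:element name="searchPeakDiff"/>
    </xs:schema>
  </types>
  <message name="searchPeakDiffRequest" element="tns:searchPeakDiff"/>
</definitions>"""

root = ET.fromstring(doc)
ns = {"xs": "http://www.w3.org/2001/XMLSchema"}

# ElementTree expands prefixed *tags* automatically, but attribute *values*
# like "tns:searchPeakDiff" are plain strings: the reader must split off the
# prefix and look the local name up in the right schema - the step that
# fails in the report above when a schema import was skipped.
ref = root.find("message").get("element")        # "tns:searchPeakDiff"
prefix, local = ref.split(":")
schema = root.find("types/xs:schema", ns)
target = schema.get("targetNamespace")
resolved = schema.find(f"xs:element[@name='{local}']", ns)
print(resolved is not None, target)
```

If the schema that declares `searchPeakDiff` was never loaded, this lookup returns None, which is the situation the SchemaCollection error reports.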
Re: [R] Error installing RSPerl.
Hi Grainne

There is one likely cause. But before getting into the explanation, can you send me the output from when you installed the package, e.g. the output from

  R CMD INSTALL RSPerl

and any configuration arguments you specified. You can send this to me off-list and we can summarize at the end.

Thanks, D.

Grainne Kerr wrote:

Dear list, I have updated to version R-2.10.0. When I try to load the RSPerl library I get the following error:

> library(RSPerl)
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared library '/usr/local/lib/R/library/RSPerl/libs/RSPerl.so':
  /usr/local/lib/R/library/RSPerl/libs/RSPerl.so: undefined symbol: boot_DB_File__Glob
Error: package/namespace load failed for 'RSPerl'

I do not know how to fix this. Can anyone please help? I'm running R on Ubuntu 9.04.

Many thanks, Grainne.

> sessionInfo()
R version 2.10.0 (2009-10-26)
i686-pc-linux-gnu
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.