Re: [R] Assistance converting to R a python function that extracts from an XML file

2014-12-13 Thread Duncan Temple Lang
Hi Don

library(XML)
readxmldate = 
function(xmlfile) 
{
  doc = xmlParse(xmlfile)
  xpathSApply(doc, '//Esri/CreaDate | //Esri/CreaTime', xmlValue)
}

 D.

On 12/13/14, 12:36 PM, MacQueen, Don wrote:
 I would appreciate assistance doing in R what a colleague has done in
 python. Unfortunately (for me), I have almost no experience with either
 python or xml.
 
 Within an xml file there is
 <CreaDate>20120627</CreaDate><CreaTime>07322600</CreaTime>
 and I need to extract those two values, 20120627 and 07322600
 
 
 Here is the short python function. Even without knowing python, it's
 conceptually clear what it does. I would like to do the same in R.
 
 def readxmldate(xmlfile):
   tree = ET.parse(xmlfile)
   root = tree.getroot()
    for lev1 in root.findall('Esri'):
        xdate = lev1.find('CreaDate').text
        xtime = lev1.find('CreaTime').text
    return xdate, xtime
 
 
 Thanks in advance
 -Don


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] saveXML() prefix argument

2013-10-20 Thread Duncan Temple Lang

Thanks Earl and Milan.
Yes, the C code to serialize does branch and do things
differently for the different combinations of file, encoding and indent.
I have updated the code to use a different routine in libxml2 for this case
and that honors the indentation in this case. That will be in the next release
of XML.

In the meantime, you can use

   cat(saveXML(doc, encoding = "UTF-8", indent = TRUE), file = "bob.xml")

rather than 
saveXML(doc, file = "bob.xml", encoding = "UTF-8", indent = TRUE)
i.e. move the file argument to cat().
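For reference, here is a complete sketch of that workaround (assuming the XML package and a document built as in Earl's example; "bob.xml" is just a stand-in output path):

```r
library(XML)

# Build a small document the same way as in the thread
doc  <- newXMLDoc()
root <- newXMLNode("tips", doc = doc)
tip  <- newXMLNode("tip", attrs = c(id = 1), parent = root)
newXMLNode("h1", "español", parent = tip)

# Serialize to a string first (indentation is honored in memory),
# then let cat() do the file writing:
cat(saveXML(doc, encoding = "UTF-8", indent = TRUE), file = "bob.xml")
```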

 Thanks,
 D.

On 10/19/13 4:36 AM, Milan Bouchet-Valat wrote:
 Le vendredi 18 octobre 2013 à 13:27 -0400, Earl Brown a écrit :
 Thanks Duncan. However, now I can't get the Spanish and Portuguese accented 
 vowels to come out correctly and still keep the indents in the saved 
 document, even when I set encoding = "UTF-8":

 library(XML)
 concepts <- c("español", "português")
 info <- c("info about español", "info about português")

 doc <- newXMLDoc()
 root <- newXMLNode("tips", doc = doc)
 for (i in 1:length(concepts)) {
  cur.concept <- concepts[i]
  cur.info <- info[i]
  cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
  newXMLNode("h1", cur.concept, parent = cur.tip)
  newXMLNode("p", cur.info, parent = cur.tip)
 }

 # accented vowels don't come through correctly, but the indents are correct:
 saveXML(doc, file = "test1.xml", indent = T)

 Resulting file looks like this:
 <?xml version="1.0"?>
 <tips>
   <tip id="1">
 <h1>espa&#xF1;ol</h1>
 <p>info about espa&#xF1;ol</p>
   </tip>
   <tip id="2">
 <h1>portugu&#xEA;s</h1>
 <p>info about portugu&#xEA;s</p>
   </tip>
 </tips>

 # accented vowels are correct, but the indents are no longer correct:
 saveXML(doc, file = "test2.xml", indent = T, encoding = "UTF-8")

 Resulting file:
 <?xml version="1.0" encoding="UTF-8"?>
 <tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip
 id="2"><h1>português</h1><p>info about português</p></tip></tips>

 I tried to work around the problem by simply loading in that resulting
 file and saving it again:
 doc2 <- xmlInternalTreeParse(file = "test2.xml", asTree = T)
 saveXML(doc2, file = "test_work_around.xml", indent = T)

 but still don't get the indents.

 Does setting encoding = "UTF-8" override indent = TRUE in saveXML()?
 I can confirm the same issue happens here. What is interesting is that
 without the 'file' argument, the returned string includes the expected
 line breaks and spacing. These do not appear when redirecting the output
 to a file.
 
 saveXML(doc, encoding="UTF-8", indent=T)
 [1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<tips>\n  <tip id=\"1
 \">\n<h1>español</h1>\n<p>info about español</p>\n  </tip>\n
 <tip id=\"2\">\n<h1>português</h1>\n<p>info about
 português</p>\n  </tip>\n</tips>\n"
 
 saveXML(doc, encoding="UTF-8", indent=T, file="test.xml")
 
 Contents of test.xml:
 <?xml version="1.0" encoding="UTF-8"?>
 <tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip 
 id="2"><h1>português</h1><p>info about português</p></tip></tips>
 
 
 sessionInfo()
 R version 3.0.1 (2013-05-16)
 Platform: x86_64-redhat-linux-gnu (64-bit)
 
 locale:
  [1] LC_CTYPE=fr_FR.utf8   LC_NUMERIC=C 
  [3] LC_TIME=fr_FR.utf8LC_COLLATE=fr_FR.utf8
  [5] LC_MONETARY=fr_FR.utf8LC_MESSAGES=fr_FR.utf8   
  [7] LC_PAPER=CLC_NAME=C
  [9] LC_ADDRESS=C  LC_TELEPHONE=C   
 [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C  
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods
 base 
 
 other attached packages:
 [1] XML_3.96-1.1
 
 
 Regards
 




Re: [R] saveXML() prefix argument

2013-10-18 Thread Duncan Temple Lang
Hi Earl

Unfortunately, the code works for me, i.e. indents _and_ displays the accented 
vowels correctly.

Can you send me the output of the function call

 libxmlVersion()

and also sessionInfo(), please?

 D.

On 10/18/13 10:27 AM, Earl Brown wrote:
 Thanks Duncan. However, now I can't get the Spanish and Portuguese accented 
 vowels to come out correctly and still keep the indents in the saved 
  document, even when I set encoding = "UTF-8":
 
 library(XML)
  concepts <- c("español", "português")
  info <- c("info about español", "info about português")
 
  doc <- newXMLDoc()
  root <- newXMLNode("tips", doc = doc)
  for (i in 1:length(concepts)) {
    cur.concept <- concepts[i]
    cur.info <- info[i]
    cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
    newXMLNode("h1", cur.concept, parent = cur.tip)
    newXMLNode("p", cur.info, parent = cur.tip)
  }
 
 # accented vowels don't come through correctly, but the indents are correct:
  saveXML(doc, file = "test1.xml", indent = T)
 
 Resulting file looks like this:
  <?xml version="1.0"?>
  <tips>
    <tip id="1">
  <h1>espa&#xF1;ol</h1>
  <p>info about espa&#xF1;ol</p>
    </tip>
    <tip id="2">
  <h1>portugu&#xEA;s</h1>
  <p>info about portugu&#xEA;s</p>
    </tip>
  </tips>
 
 # accented vowels are correct, but the indents are no longer correct:
  saveXML(doc, file = "test2.xml", indent = T, encoding = "UTF-8")
 
 Resulting file:
  <?xml version="1.0" encoding="UTF-8"?>
  <tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip 
  id="2"><h1>português</h1><p>info about português</p></tip></tips>
 
  I tried to work around the problem by simply loading in that resulting file 
  and saving it again:
  doc2 <- xmlInternalTreeParse(file = "test2.xml", asTree = T)
  saveXML(doc2, file = "test_work_around.xml", indent = T)
 
 but still don't get the indents.
 
  Does setting encoding = "UTF-8" override indent = TRUE in saveXML()?
 
 Thanks. Earl
 
 




Re: [R] saveXML() prefix argument

2013-10-17 Thread Duncan Temple Lang
Milan is correct.
The prefix is used when saving the XML content that is represented in
a different format in R.

To get the prefix 
 <?xml version="1.0"?>
on the XML content that you save, use a document object

doc = newXMLDoc()
root = newXMLNode("foo", doc = doc)

saveXML(doc)


<?xml version="1.0"?>
<foo/>

Sorry for the confusion.
 D

On 10/17/13 2:36 AM, Milan Bouchet-Valat wrote:
 Le mercredi 16 octobre 2013 à 23:45 -0400, Earl Brown a écrit :
 I'm using the XML package and specifically the saveXML() function but I 
 can't get the prefix argument of saveXML() to work:

 library(XML)
  concepts <- c("one", "two", "three")
  info <- c("info one", "info two", "info three")
  root <- newXMLNode("root")
  for (i in 1:length(concepts)) {
   cur.concept <- concepts[i]
   cur.info <- info[i]
   cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
   newXMLNode("h1", cur.concept, parent = cur.tip)
   newXMLNode("p", cur.info, parent = cur.tip)
  }

 # None of the following output a prefix on the first line of the exported 
 document
  saveXML(root)
  saveXML(root, file = "test.xml")
  saveXML(root, file = "test.xml", prefix = '<?xml version="1.0"?>\n')

 Am I missing something obvious? Any ideas?
 It looks like the function XML:::saveXML.XMLInternalNode() does not use
 the 'prefix' parameter at all. So it won't be taken into account when
 calling saveXML() on objects of class XMLInternalNode.
 
 I think you should report this to Duncan Temple Lang, as this is
 probably an oversight.
 
 
 Regards
 
 
 Thanks in advance. Earl Brown

 -
 Earl K. Brown, PhD
 Assistant Professor of Spanish Linguistics
 Advisor, TEFL MA Program
 Department of Modern Languages
 Kansas State University
 www-personal.ksu.edu/~ekbrown





Re: [R] RCurl cookiejar

2013-08-27 Thread Duncan Temple Lang
Hi Earl

 The cookies will only be written to the file specified by the cookiejar option
when the curl handle is garbage collected.

 If you use

   rm(ch)
   gc()

the cookie.txt file should be created.

 This is the way libcurl behaves rather than something RCurl introduces.

 If you don't explicitly specify a curl handle in a request, the cookiejar
 option works as one expects because the implicit curl handle is destroyed
 at the end of the call and garbage collection often occurs then.
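A minimal end-to-end sketch of the above (assuming the RCurl package and network access; the URL is the one from Earl's post):

```r
library(RCurl)

ch <- getCurlHandle(followlocation = TRUE)
invisible(getURL("http://www.corpusdelespanol.org/x.asp",
                 cookiejar = "cookie.txt", curl = ch))

# libcurl only flushes the cookie jar when the handle is cleaned up,
# which in R happens when the handle object is garbage collected:
rm(ch)
invisible(gc())

file.exists("cookie.txt")
```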


  D.

On 8/24/13 11:01 PM, Earl Brown wrote:
 R-helpers,
 
 When I use cURL in the Terminal:
 
  curl --cookie-jar cookie.txt --url "http://corpusdelespanol.org/x.asp" 
  --user-agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:16.0) 
  Gecko/20100101 Firefox/23.0" --location --include
 
 a cookie file cookie.txt is saved to my working directory. However, when I 
 try what I think is the equivalent command R with RCurl:
 
  ch <- getCurlHandle(followlocation = T, header = T, useragent = "Mozilla/5.0 
  (Macintosh; Intel Mac OS X 10.7; rv:16.0) Gecko/20100101 Firefox/23.0")
  getURL(url = "http://www.corpusdelespanol.org/x.asp", cookiejar = 
  "cookie.txt", curl = ch)
 
 no cookie file is saved. 
 
 What am I missing to reproduce in RCurl what I'm successfully doing in the 
 Terminal?
 
 Thank you for your time and help. Earl Brown
 
 -
 Earl K. Brown, PhD
 Assistant Professor of Spanish Linguistics
 Advisor, TEFL MA Program
 Department of Modern Languages
 Kansas State University
 
 




Re: [R] XML package installation -- an old question

2013-08-15 Thread Duncan Temple Lang
Hi Tao

In the same R session as you call install.packages(),
what does

  system("which xml2-config", intern = TRUE)

return?

Basically, the error message from the configuration script for the
XML package is complaining that it cannot find the executable xml2-config
in your PATH.


(You can also send _me_ the config.log file from the attempted installation.)

  D.



On 8/15/13 10:13 AM, Shi, Tao wrote:
 Hi list,
 
  I have encountered the "Cannot find xml2-config" problem too during XML 
  package installation on my 64-bit Redhat (v. 6.4) linux machine.  After 
  looking through the old posts I checked all the necessary libraries and they 
  all seem to be properly installed (see below).  I don't understand why R 
  can't see xml2-config during the installation process.  Help, please!
 
 Many thanks!
 
 Tao
 
 
 
 ==
 
 [root ~]# yum install libxml2
 Setting up Install Process
 Package matching libxml2-2.7.6-8.el6_3.3.x86_64 already installed. Checking 
 for update.
 Nothing to do
 [root ~]# yum install libxml2-devel
 Setting up Install Process
 Package matching libxml2-devel-2.7.6-8.el6_3.3.x86_64 already installed. 
 Checking for update.
 Nothing to do
 [root ~]# xml2-config --version
 2.7.6
 [root ~]# curl-config --version
 libcurl 7.19.7
 
 
 
   R session ==
 
 
 install.packages("XML")
 Installing package into ‘/usr/lib64/R/library’
 (as ‘lib’ is unspecified)
 trying URL 'http://cran.stat.ucla.edu/src/contrib/XML_3.98-1.1.tar.gz'
 Content type 'application/x-tar' length 1582216 bytes (1.5 Mb)
 opened URL
 ==
 downloaded 1.5 Mb
 
  * installing *source* package ‘XML’ ...
  ** package ‘XML’ successfully unpacked and MD5 sums checked
 checking for gcc... gcc
 checking for C compiler default output file name... rm: cannot remove 
 `a.out.dSYM': Is a directory
 a.out
 checking whether the C compiler works... yes
 checking whether we are cross compiling... no
 checking for suffix of executables... 
 checking for suffix of object files... o
 checking whether we are using the GNU C compiler... yes
 checking whether gcc accepts -g... yes
 checking for gcc option to accept ISO C89... none needed
 checking how to run the C preprocessor... gcc -E
 checking for sed... /bin/sed
 checking for pkg-config... /usr/bin/pkg-config
 checking for xml2-config... no
 Cannot find xml2-config
  ERROR: configuration failed for package ‘XML’
  * removing ‘/usr/lib64/R/library/XML’
  
  The downloaded source packages are in
  ‘/tmp/RtmpwnAIFH/downloaded_packages’
  Updating HTML index of packages in '.Library'
  Making 'packages.html' ... done
  Warning message:
  In install.packages("XML") :
    installation of package ‘XML’ had non-zero exit status
 
 
 sessionInfo()
 R version 3.0.1 (2013-05-16)
 Platform: x86_64-redhat-linux-gnu (64-bit)
 
 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8   
  [7] LC_PAPER=C LC_NAME=C 
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base 
 
 other attached packages:
 [1] BiocInstaller_1.10.3
 
 loaded via a namespace (and not attached):
 [1] tcltk_3.0.1 tools_3.0.1

 
 
 




Re: [R] How to download this data?

2013-08-03 Thread Duncan Temple Lang
Hi Ron

   Yes, you can use ssl.verifypeer = FALSE.  Or alternatively, you can also 
use

    getURLContent(..., cainfo = system.file("CurlSSL", "cacert.pem", 
package = "RCurl"))

 to specify where libcurl can find the certificates to verify the SSL signature.


 The error you are encountering appears to be coming from a garbled R 
expression. This may have
arisen as a result of an HTML mailer adding the <a href="..."> into the 
expression
where it found an https://... URL.

 What we want to do is end up with a string of the form

   
https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=adasdasdad?expiryDates=&specId=219

We have to substitute the text adasdasdad which we assigned to jsession in a 
previous command.
So, take the literal text

   c("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=",
 jsession,
 "?expiryDates=&specId=219")

and combine it into a single string with paste0.

We need the literal strings as they appear when you view the mail for R to make 
sense of them, not what the mailer adds.
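Putting that together (using the placeholder jsessionid value from above):

```r
jsession <- "adasdasdad"  # placeholder; in practice this comes from the gsub() step

u <- paste0("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=",
            jsession,
            "?expiryDates=&specId=219")
# u is now the single-string URL we wanted
```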


As to where I found this, it is in the source of the original HTML page in 
rawDoc

 scripts = getNodeSet(rawDoc, "//body//script")
 scripts[[ length(scripts) ]]

and look at the text, specifically the app.urls and its 'expiry' field.


<script type="text/javascript"><![CDATA[

var app = {};

app.isOption = false;

app.urls = {


'spec':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?details=&specId=219',


'data':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?data=&specId=219',


'confirm':'/reports/dealreports/getSampleConfirm.do;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?hubId=403&productId=254',


'reports':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?reports=&specId=219',


'expiry':'/productguide/ProductSpec.shtml;jsessionid=22E9BE9DB19FC6F3446C9ED4AFF2BE3F?expiryDates=&specId=219'

};

app.Router = Backbone.Router.extend({

routes:{

"spec":"spec",

"data":"data",

"confirm":"confirm",


On 8/3/13 1:05 AM, Ron Michael wrote:
 In the mean time I have this problem sorted out, hopefully I did it 
 correctly. I have modified the line of your code as:
  
 rawOrig = 
 getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry",
  ssl.verifypeer = FALSE)
  
 However next I faced with another problem to executing:
   u = sprintf(a 
 href=https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;,
  jsession) 
 Error: unexpected symbol in u = sprintf(a href=https
 
 Can you or someone else help me to get out of this error?
  
 Also, my another question is: from where you got the expression:
 a 
 href=https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=specId=219;
  
 I really appreciate if someone help me to understand that.
  
 Thank you.
 
 
 - Original Message -
 From: Ron Michael ron_michae...@yahoo.com
 To: Duncan Temple Lang dtemplel...@ucdavis.edu; r-help@r-project.org 
 r-help@r-project.org
 Cc: 
 Sent: Saturday, 3 August 2013 12:58 PM
 Subject: Re: [R] How to download this data?
 
 Hello Duncan,
  
 Thank you very much for your pointer.
  
 However when I tried to run your code, I got following error:
   rawOrig = 
 getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
  
 Error in function (type, msg, asError = TRUE)  : 
   SSL certificate problem, verify that the CA cert is OK. Details:
 error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify 
 failed
 
 Can someone help me to understand what could be the cause of this error?
  
 Thank you.
 
 
 - Original Message -
 From: Duncan Temple Lang dtemplel...@ucdavis.edu
 To: r-help@r-project.org
 Cc: 
 Sent: Saturday, 3 August 2013 4:33 AM
 Subject: Re: [R] How to download this data?
 
 
 That URL is an HTTPS (secure HTTP), not an HTTP.
 The XML parser cannot retrieve the file.
 Instead, use the RCurl package to get the file.
 
 However, it is more complicated than that. If
 you look at source of the HTML page in a browser,
 you'll see a jsessionid and that is a session identifier.
 
 The following retrieves the content of your URL and then
 parses it and extracts the value of the jsessionid.
 Then we create the full URL to the actual data page (which is actually in the 
 HTML
 content but in JavaScript code)
 
 library(RCurl)
 library(XML)
 
 rawOrig = 
 getURLContent(https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry;)
 rawDoc = htmlParse(rawOrig)
 tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
 jsession = gsub(".*jsessionid=([^?]+)?.*", "\\1", tmp)

Re: [R] How to download this data?

2013-08-02 Thread Duncan Temple Lang

That URL is an HTTPS (secure HTTP), not an HTTP.
The XML parser cannot retrieve the file.
Instead, use the RCurl package to get the file.

However, it is more complicated than that. If
you look at source of the HTML page in a browser,
you'll see a jsessionid and that is a session identifier.

The following retrieves the content of your URL and then
parses it and extracts the value of the jsessionid.
Then we create the full URL to the actual data page (which is actually in the 
HTML
content but in JavaScript code)

library(RCurl)
library(XML)

rawOrig = 
getURLContent("https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry")
rawDoc = htmlParse(rawOrig)
tmp = getNodeSet(rawDoc, "//@href[contains(., 'jsessionid=')]")[[1]]
jsession = gsub(".*jsessionid=([^?]+)?.*", "\\1", tmp)

u = 
sprintf("https://www.theice.com/productguide/ProductSpec.shtml;jsessionid=%s?expiryDates=&specId=219",
 jsession)

doc = htmlParse(getURLContent(u))
tbls = readHTMLTable(doc)
data = tbls[[1]]

dim(data)


I did this quickly so it may not be the best way or completely robust, but 
hopefully
it gets the point across and does get the data.

  D.

On 8/2/13 2:42 PM, Ron Michael wrote:
 Hi all,
  
 I need to download the data from this web page:
  
 https://www.theice.com/productguide/ProductSpec.shtml?specId=219#expiry
  
 I used the function readHTMLTable() from package XML, however could not 
 download that.
  
 Can somebody help me how to get the data onto my R window?
  
 Thank you.
 




Re: [R] xmlToDataFrame very slow

2013-07-31 Thread Duncan Temple Lang
Hi Stavros

 xmlToDataFrame() is very generic and so doesn't know anything
about the particulars of the XML it is processing. If you know
something about the structure of the XML, you should be able to leverage that
for performance.

xmlToDataFrame is also not optimized as it is just a convenience routine for 
people who want to work with
XML without much effort.
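As an illustration of leveraging structure, here is one common pattern: if every record node is flat and uniform, each column can be pulled with a single XPath query instead of visiting the nodes one at a time. The record/id/name element names below are hypothetical, not from Stavros's file:

```r
library(XML)

# A tiny stand-in document with a flat, uniform record structure
xml <- "<records>
          <record><id>1</id><name>a</name></record>
          <record><id>2</id><name>b</name></record>
        </records>"
doc <- xmlParse(xml, asText = TRUE)

# One XPath query per column, then assemble the data frame in one go
fields <- c("id", "name")
df <- as.data.frame(
  lapply(setNames(fields, fields),
         function(f) xpathSApply(doc, paste0("//record/", f), xmlValue)),
  stringsAsFactors = FALSE)
```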

If you send me the file and the code you are using to read the file, I'll take a
look at it.

 D.

On 7/30/13 11:10 AM, Stavros Macrakis wrote:
 I have a modest-size XML file (52MB) in a format suited to xmlToDataFrame 
 (package XML).
 
 I have successfully read it into R by splitting the file 10 ways then running 
 xmlToDataFrame on each part, then
 rbind.fill (package plyr) on the result. This takes about 530 s total, and 
 results in a data.frame with 71k rows and
 object.size of 21MB.
 
 But trying to run xmlToDataFrame on the whole file takes forever ( 1 s 
 so far). xmlParse of this file takes only 0.8 s.
 
 I tried running xmlToDataFrame on the first 10% of the file, then the first 
 10% repeated twice, then three times (with
 the outer tags adjusted of course). Timings:
 
 1 copy: 111 s = 111 per copy
 2 copy: 311 s = 155
 3 copy: 626 s = 209
 
 The runtime is superlinear.  What is going on here? Is there a better 
 approach?
 
 Thanks,
 
   -s




Re: [R] downloading web content

2013-07-23 Thread Duncan Temple Lang
Hi Daisy

 Use getURLContent() rather than getURL().
The former handles binary content and this appears to be a zip file.

You can write it to a file or read its contents directly in memory, e.g

  library(RCurl)
  z = 
getURLContent("http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
  attributes(z)

  library(Rcompression)
  ar = zipArchive(z)
  names(ar)
  getZipInfo(ar)
  ar[["data.csv"]]
  dd = read.csv(textConnection(ar[["data.csv"]]))

  D.




On 7/23/13 2:59 AM, Daisy Englert Duursma wrote:
 Hello,
 I am trying to use R to download a bunch of .csv  files such as:
 
 http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia
 
 I have tried the following and neither work:
 
  a <- getURL(
  "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
  
  Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
    embedded nul in string:
  and
  
  a <- httpPOST(
  "http://biocache.ala.org.au/ws/occurrences/download?q=Banksia+ericifolia")
 
 Error: Internal Server Error
 
 
 Any help would be appreciated.
 
 Daisy




Re: [R] Weird 'xmlEventParse' encoding issue

2013-07-16 Thread Duncan Temple Lang
Hi Sascha

 Your code gives the correct results on my machine (OS X),
either reading from the file directly or via readLines() and passing
the text to xmlEventParse().

 The problem might be the version of the XML package or your environment
settings.  And it is important to report the session information.
So you should provide the output from

   sessionInfo()
   Sys.getenv()
   libxmlVersion()


 D

On 7/15/13 4:41 AM, Sascha Wolfer wrote:
 Dear list,
 
 I have got a weird encoding problem with the xmlEventParse() function from 
 the 'XML' package.
 
 I tried finding an answer on the web for several hours and a Stack Exchange 
 question came back without success :(
 
 So here's the problem. I created a small XML test file, which looks like this:
 
  <?xml version="1.0" encoding="iso-8859-1"?>
  <!DOCTYPE testFile>
  <s type="manual">auch der Schulleiter steht dafür zur Verfügung. Das ist 
  seßhaft mit ä und ö...</s>
 
 This file is encoded with the iso-8859-1 encoding which is also defined in 
 its header.
 
 I have 3 handler functions, definitions as follows:
 
  sE2 <- function (name, attrs) {
    if (name == "s") {
  get.text <<- T }
  }
  
  eE2 <- function (name, attrs) {
    if (name == "s") {
  get.text <<- F
    }
  }
  
  tS2 <- function (content, ...) {
    if (get.text && nchar(content) > 0) {
  collected.text <<- c(collected.text, content)
    }
  }
 
 I have one wrapper function around xmlEventParse(), definition as follows:
 
  get.all.text <- function (file) {
    t1 <- Sys.time()
    read.file <- paste(readLines(file, encoding = ""), collapse = " ")
    print(read.file)
    assign("collected.text", c(), env = .GlobalEnv)
    assign("get.text", F, env = .GlobalEnv)
    xmlEventParse(read.file, asText = T, list(startElement = sE2,
 endElement = eE2,
 text = tS2),
 error = function (...) { },
 saxVersion = 1)
    t2 <- Sys.time()
    cat("That took", round(difftime(t2, t1, units = "secs"), 1), "seconds.\n")
    cat("Result of reading is in variable 'collected.text'.\n")
    collected.text
  }
 
  The output of calling get.all.text("test file") is as follows:
  [1] "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?> <!DOCTYPE testFile> <s 
  type=\"manual\">auch der Schulleiter steht
  dafür zur Verfügung. Das ist seßhaft mit ä und ö...</s>"
 That took 0 seconds.
 Result of reading is in variable 'collected.text'.
 [1] auch der Schulleiter steht dafür zur 
 Verfügung. Das ist seßhaft mit ä und ö...
 
 Now the REALLY weird thing (for me) is that R obviously reads in the file 
 correctly (first output) with 'readLines()'.
 Then this output is passed to xmlEventParse. Afterwards the output is broken 
 and it sometimes also inserts weird breaks
  where special characters occur.
 
 Do you have any ideas how to solve this problem?
 
 I cannot use the xmlParse() function because I need the SAX functionality of 
 xmlEventParse(). I also tried reading the
 file with xmlEventParse() directly (with asText = F). No changes...
 
 Thanks a lot,
 Sascha W.
 



Re: [R] htmlParse (from XML library) working sporadically in the same code

2013-03-20 Thread Duncan Temple Lang

When readHTMLTable() or, more generally, the HTML/XML parser fails to retrieve
a URL, I suggest you check to see if a different approach will work.
You can use the download.file() function or readLines(url()) or
getURLContent() from the RCurl package to get the content of the URL.

The you can pass that content to readHTMLTable() via
  readHTMLTable(htmlParse(text, asText = TRUE))
or
  readHTMLTable(text,  asText = TRUE)
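A sketch of that fallback for the page in question (assuming the RCurl and XML packages and network access):

```r
library(RCurl)
library(XML)

u <- "http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0"

# Fetch the content ourselves, then hand the text to the parser:
txt  <- getURLContent(u)
tbls <- readHTMLTable(htmlParse(txt, asText = TRUE))
```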

 D.

On 3/20/13 10:07 AM, Andre Zege wrote:
  I am using htmlParse from the XML library on a particular website. Sometimes the code 
  fails, sometimes it works; most of the time it doesn't, and I cannot see why. 
  The file I am trying to parse is 
 
 http://www.londonstockexchange.com/exchange/prices-and-markets/international-markets/indices/home/sp-500.html?page=0
 
 
 Sometimes the following code works
  n <- readHTMLTable(htmlParse(url))
 
 
 But most of the time it would return the following error coming from 
 htmlParse:
 
 Error: failed to load HTTP resource
 
 
 Error is coming from the following line in htmlParse code:
  
    ans <- .Call("RS_XML_ParseTree", as.character(file), handlers, 
  as.logical(ignoreBlanks), as.logical(replaceEntities), as.logical(asText), 
  as.logical(trim), as.logical(validate), as.logical(getDTD), 
  as.logical(isURL), as.logical(addAttributeNamespaces), 
  as.logical(useInternalNodes), as.logical(isHTML), as.logical(isSchema), 
  as.logical(fullNamespaceInfo), as.character(encoding), 
  as.logical(useDotNames), xinclude, error, addFinalizer, as.integer(options), 
  PACKAGE = "XML")
 
 
 
 By the way, readHTMLTable(htmlParse(url)) works fine on other pages, so the 
 problem is somehow related to this page. 
 
  I am using 64-bit R 2.15.3 on a Windows machine
 
 Thanks much
 Andre
 
 
 




Re: [R] Create a Data Frame from an XML

2013-01-22 Thread Duncan Temple Lang

Hi Adam

 [You seem to have sent the same message twice to the mailing list.]

There are various strategies/approaches to creating the data frame
from the XML.

Perhaps the approach that most closely follows your approach is

   xmlRoot(doc)[ "row" ]

 which returns a list of XML nodes whose node name is "row" that are
 children of the root node "data".

So
   sapply(xmlRoot(doc)[ "row" ], xmlAttrs)

 yields a matrix with as many columns as there are "row" nodes
 and one row per attribute - i.e. the BRAND, NUM, YEAR and VALUE attributes.

So

   d = t( sapply(xmlRoot(doc)[ "row" ], xmlAttrs) )

gives you a matrix with the correct rows and column orientation
and now you can turn that into a data frame, converting the
columns into numbers, etc. as you want with regular R commands
(i.e. independently of the XML).
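
A minimal sketch of that final step, assuming the XML above is saved as Sample2.xml (the numeric conversions are illustrative; convert whichever columns suit your data):

```r
library(XML)

doc <- xmlParse("Sample2.xml")
# attributes of each <row> node, transposed to one row per node
d <- t(sapply(xmlRoot(doc)["row"], xmlAttrs))
df <- as.data.frame(d, stringsAsFactors = FALSE)
rownames(df) <- NULL
df$NUM  <- as.numeric(df$NUM)    # convert columns as appropriate
df$YEAR <- as.numeric(df$YEAR)
df
```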


 D.

On 1/22/13 1:43 PM, Adam Gabbert wrote:
  Hello,
 
 I'm attempting to read information from an XML into a data frame in R using
 the XML package. I am unable to get the data into a data frame as I would
 like.  I have some sample code below.
 
 *XML Code:*
 
 Header...
 
 Data I want in a data frame:
 
 <data>
   <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1" />
   <row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000" />
   <row BRAND="GMC" NUM="1" YEAR="2001" VALUE="12500" />
   <row BRAND="FORD" NUM="1" YEAR="2002" VALUE="13000" />
   <row BRAND="GMC" NUM="1" YEAR="2003" VALUE="14000" />
   <row BRAND="FORD" NUM="1" YEAR="2004" VALUE="17000" />
   <row BRAND="GMC" NUM="1" YEAR="2005" VALUE="15000" />
   <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />
   <row BRAND="FORD" NUM="1" YEAR="2007" VALUE="17500" />
   <row BRAND="GMC" NUM="1" YEAR="2008" VALUE="22000" />
 </data>
 
 *R Code:*
 
 doc <- xmlInternalTreeParse("Sample2.xml")
 top <- xmlRoot(doc)
 xmlName(top)
 names(top)
 art <- top[["row"]]
 art
 
 *Output:*
 
 <row BRAND="GMC" NUM="1" YEAR="1999" VALUE="1"/>
 
 
 This is where I am having difficulties.  I am unable to access additional
 rows (i.e. <row BRAND="GMC" NUM="1" YEAR="1967" VALUE="PRICLESS" />),
 
 and I am unable to access the individual entries to actually create the
 data frame.  The data frame I would like is as follows:
 
 BRAND   NUM   YEAR   VALUE
 GMC     1     1999   1
 FORD    2     2000   12000
 GMC     1     2001   12500
 etc
 
 Any help or suggestions would be appreciated.  Conversly, my eventual goal
 would be to take a data frame and write it into an XML in the previously
 shown format.
 
 Thank you
 
 AG
 
 
 




Re: [R] Reading JSON files from R

2012-12-03 Thread Duncan Temple Lang

Hi m.dr.

  Reading data from MongoDB is no problem. So the RJSONIO or rjson
packages should work.

  Can you send me the sample file that is causing the problem, please?

 The error about a method looks like a potential oversight in the combinations
of inputs.

  Thanks
D.




On 12/3/12 7:30 PM, m.dr wrote:
 Hello All -
 
 I am trying to use RJSONIO to read in some JSON files.
 
 I was wondering if anyone could please comment on the level of complexity of
 the files it can be used to read, exports from or directly from NoSQL DBMS
 like MongoDB and such.
 
 Also, i understand that in reading the JSON file RJSONIO will automatically
 create the necessary structures. However I cannot seem to use to to read the
 file properly and get this error:
 
 Error in function (classes, fdef, mtable)  : 
   unable to find an inherited method for function "fromJSON", for signature
 "missing", "NULL"
 
 The call I am making is:
 noSqlData <- fromJSON(file='data.json')
 
 It is a small file - with 3 levels of nested records.
 
 And if there were some links to some examples with a file and usage would be
 great. My JSON file validates - so do not believe there is anything wrong
 with the file.
 
 Thanks for your help.
 
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Reading-JSON-files-from-R-tp4651976.html
 Sent from the R help mailing list archive at Nabble.com.
 
 




Re: [R] reading json tables

2012-12-02 Thread Duncan Temple Lang

Hi Michael

 The actual result I want is two data frames, wheat and monarch, whereas 
 fromJSON returns a list of lists.  I'll try to
 figure that part out.

 do.call(rbind, data[[1]])

will do the job, but there are elements in each of data[[1]] and data[[2]]
that are incomplete and which need to be filled in with NAs before rbinding.
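
A sketch of that NA-filling step (the helper name fillRbind is just illustrative; field names are taken from whatever appears in the records):

```r
# Bind a list of named lists (e.g. from fromJSON) into a data frame,
# filling fields that are missing from individual records with NA.
fillRbind <- function(recs) {
  nms <- unique(unlist(lapply(recs, names)))
  rows <- lapply(recs, function(r) {
    r[setdiff(nms, names(r))] <- NA   # add the missing fields
    as.data.frame(r[nms], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

wheat <- fillRbind(data[[1]])
```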

Best,
  D.

On 12/2/12 6:26 AM, Michael Friendly wrote:
 On 12/1/2012 4:08 PM, Duncan Temple Lang wrote:
 Hi Michael

The problem is that the content of the .js file is not JSON,
 but actual JavaScript code.

 You could use something like the following

 tt = readLines("http://mbostock.github.com/protovis/ex/wheat.js")

 txt = c("[", gsub(";", ",", gsub("var [a-zA-Z]+ = ", "", tt)), "]")
 tmp = paste(txt, collapse = "\n")
 tmp = gsub("([a-zA-Z]+):", '"\\1":', tmp)
 o = fromJSON(tmp)
 data = structure(o[1:2], names = c("wheat", "monarch"))

 Basically, this
  removes the 'var variable name =' part
  replaces the ; with a , to separate elements
  quotes the names of the fields, e.g. year, wheat, wages
  puts the two global data objects into a top-level array ([]) container

 This isn't ideal (as the regular expressions are not sufficiently specific
 and could modify the actual values incorrectly). However, it does the job
 for this particular file.
 
 Thanks for this, Duncan
 
 I hadn't understood that the data had to be pure JSON.
 
 The actual result I want is two data frames, wheat and monarch, whereas 
 fromJSON returns a list of lists.  I'll try to
 figure that part out.
 
 -Michael




Re: [R] reading json tables

2012-12-01 Thread Duncan Temple Lang

Hi Michael

  The problem is that the content of the .js file is not JSON,
but actual JavaScript code.

You could use something like the following

tt = readLines("http://mbostock.github.com/protovis/ex/wheat.js")

txt = c("[", gsub(";", ",", gsub("var [a-zA-Z]+ = ", "", tt)), "]")
tmp = paste(txt, collapse = "\n")
tmp = gsub("([a-zA-Z]+):", '"\\1":', tmp)
o = fromJSON(tmp)
data = structure(o[1:2], names = c("wheat", "monarch"))

Basically, this
removes the 'var variable name =' part
replaces the ; with a , to separate elements
quotes the names of the fields, e.g. year, wheat, wages
puts the two global data objects into a top-level array ([]) container

This isn't ideal (as the regular expressions are not sufficiently specific
and could modify the actual values incorrectly). However, it does the job
for this particular file.

On 12/1/12 12:47 PM, Michael Friendly wrote:
 I'm trying to read two data sets in json format from a single .js file. I've 
 tried fromJSON()
 in both RJSONIOIO and RJSON packages, but they require that the lines be
 pre-parsed somehow in ways I don't understand.  Can someone help?
 
 wheat <- readLines("http://mbostock.github.com/protovis/ex/wheat.js")
 str(wheat)
  chr [1:70] "var wheat = [" "  { year: 1565, wheat: 41, wages: 5 }," ...

 
 The wheat.js file looks like this and defines two tables:  wheat and monarch:
 
 var wheat = [
   { year: 1565, wheat: 41, wages: 5 },
   { year: 1570, wheat: 45, wages: 5.05 },
   { year: 1575, wheat: 42, wages: 5.08 },
   { year: 1580, wheat: 49, wages: 5.12 },
   { year: 1585, wheat: 41.5, wages: 5.15 },
   { year: 1590, wheat: 47, wages: 5.25 },
   { year: 1595, wheat: 64, wages: 5.54 },
   { year: 1600, wheat: 27, wages: 5.61 },
   { year: 1605, wheat: 33, wages: 5.69 },
   { year: 1610, wheat: 32, wages: 5.78 },
   { year: 1615, wheat: 33, wages: 5.94 },
   { year: 1620, wheat: 35, wages: 6.01 },
 ...
   { year: 1800, wheat: 79, wages: 28.5 },
   { year: 1805, wheat: 81, wages: 29.5 },
   { year: 1810, wheat: 99, wages: 30 },
   { year: 1815, wheat: 78 }, // TODO
   { year: 1820, wheat: 54 },
   { year: 1821, wheat: 54 }
 ];
 
 var monarch = [
   { name: Elizabeth, start: 1565, end: 1603 },
   { name: James I, start: 1603, end: 1625 },
   { name: Charles I, start: 1625, end: 1649 },
   { name: Cromwell, start: 1649, end: 1660, commonwealth: true },
   { name: Charles II, start: 1660, end: 1685 },
   { name: James II, start: 1685, end: 1689 },
   { name: WM, start: 1689, end: 1702 },
   { name: Anne, start: 1702, end: 1714 },
   { name: George I, start: 1714, end: 1727 },
   { name: George II, start: 1727, end: 1760 },
   { name: George III, start: 1760, end: 1820 },
   { name: George IV, start: 1820, end: 1821 }
 ];




Re: [R] problem with XML package

2012-11-15 Thread Duncan Temple Lang

Hi Arvin

 2.9.2 is very old.  2.13 is still old.
Why not upgrade to 2.15.*?

However, the problem is that the object you are passing to xmlName()
is NULL.   This will give an error in the latest version of the XML package
and most likely any version of the XML package. I imagine the structure
of the XML document has changed. However, I can't tell what the problem is
without some context.

   D.

On 11/15/12 3:00 PM, Torus Insurance wrote:
 Hi List,
 I have used XML in R version 2.9.2. The code is working fine using Rv2.9.2
 and its related XML package. Now I am using Rv2.13.1 and its related XML
 package, but I get the following error:
 
 
 Error in UseMethod("xmlName", node) :
   no applicable method for 'xmlName' applied to an object of class "NULL"
 
 Any idea?
 
 Thanks
 
 Arvin
 
 
 




Re: [R] RCurl - curlPerform - Time out?!?

2012-10-30 Thread Duncan Temple Lang

Hi Florian

 Yes, there are several options for a curl operation that control the timeout.
The timeout option is the top-level general one.  There is also timeout.ms.
You can also control the timeout length for different parts of the 
operation/request
such as via the connecttimeout for just establishing the connection.
See the Connection Options in the libcurl help page for curl_easy_setopt.
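
For instance, something along these lines (the time limits here are purely illustrative):

```r
library(RCurl)

# Fail after 10 seconds overall; allow at most 3 seconds to establish
# the connection.
txt <- getURL("http://search.webofknowledge.com/",
              .opts = curlOptions(timeout = 10, connecttimeout = 3))
```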

  Best,
   D.


On 10/30/12 9:30 AM, Florian Umlauf (CRIE) wrote:
 
 
 Hi,
 
 I am working with the RCurl package and I am using the curlPerform 
 function for an soap-query.
 The problem is that the code is usually working well, but sometimes the 
 connection gets lost.
 
 So I wrote a while-loop to repeat the query if anything might happened 
 so that the same query runs again, but if the query-faults it takes a 
 very long time for the repetition.
 
 My question is if there is any possibility to force a time out for the 
 curlPerform function or something like that?
 
 
 Thanks!
 
 
 
  run = 1
  i = 0
  while (run == 1)
  {
    i = i + 1
    try(
      run <- curlPerform(url = "http://search.webofknowledge.com/esti/wokmws/ws/WokSearchLite.cgi",
        httpheader = c("Accept-Encoding" = "gzip,deflate",
                       "Content-Type" = "text/xml;charset=UTF-8",
                       'SOAPAction' = '',
                       Cookie = paste('SID="', s_session, '"', sep = ""),
                       "Content-Length" = paste(nchar(s_body)),
                       Host = "search.webofknowledge.com",
                       Connection = "Keep-Alive",
                       "User-Agent" = "Apache-HttpClient/4.1.1 (java 1.5)"),
        postfields = s_body,
        writefunction = h$update,
        verbose = TRUE)
    , TRUE)
    print(i)
  }
 
 
 




Re: [R] XML namespace control

2012-10-29 Thread Duncan Temple Lang

Hi Ben

  Can you tell us the slightly bigger picture, please?
Do you want to create a single similar node entirely in isolation
or do you want to create it as part of an XML tree/document?
Who will be reading the resulting XML.

You can use a parent node

  top = newXMLNode("storms", namespaceDefinitions = c(weather = 
"http://my.weather.com/events"))

Then

newXMLNode("storm", "ripsnorter", namespace = "weather",
attrs = c(type = "hurrican", name = "Sandy"),
   parent = top )


That gives you

   <weather:storm type="hurrican" name="Sandy">ripsnorter</weather:storm>

So now what are you going to do with that node?

The namespace prefix is local to a document, chosen by the author of that XML 
document.
The namespace URI is the global key that authors and consumers must agree upon.
While your database may use udf, you may chose a different prefix or even the 
default
prefix to correspond to that same URI.  So each document must explicitly 
declare the
prefix = URI mapping for it to be understood.

D.


On 10/29/12 5:54 AM, Ben Tupper wrote:
 Hello,
 
 I am working with a database system from which I can retrieve these kinds of 
 user defined fields formed as XML ...
 
 <udf:field unit="uM" type="Numeric" name="facs.Stain final 
 concentration">5</udf:field>
 
 You can see in the above example that "field" is defined in the namespace 
 "udf", but that the "udf" namespace is not defined along with the attributes 
 of the node.  That is, 'xmlns:udf = "http://blah.blah.com/blah"' doesn't 
 appear.  
 
 I would like to create a similar node from scratch, but I can't seem to 
 define the node with a namespace without providing the namespace definition. 
 
 
 library(XML)
 
 node1 <- newXMLNode("storm", "ripsnorter",
namespace = "weather",
namespaceDefinitions = c(weather = "http://my.weather.com/events"),
attrs = c(type = "hurricane",  name = "Sandy"))
 node1
 
 # this returns the new node with the namespace prefix (which I want)
 # and the definition (which I don't want)
 
 # <weather:storm xmlns:weather="http://my.weather.com/events" 
 # type="hurricane" name="Sandy">ripsnorter</weather:storm>
 
 
 node2 <- newXMLNode("storm", "ripsnorter",
namespace = "weather",
attrs = c(type = "hurricane",  name = "Sandy"),
suppressNamespaceWarning = TRUE)
 node2
 
 # produces the node without the namespace prefix and without the definition
 
 # <storm type="hurricane" name="Sandy">ripsnorter</storm>
 
 Is there some way to create a node with a namespace prefix but without 
 embedding the namespace definition along with the attributes?
 
 Thanks!
 Ben
 
 Ben Tupper
 Bigelow Laboratory for Ocean Sciences
 180 McKown Point Rd. P.O. Box 475
 West Boothbay Harbor, Maine   04575-0475 
 http://www.bigelow.org
 
 
 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: i386-apple-darwin9.8.0/i386 (32-bit)
 
 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base 
 
 other attached packages:
 [1] tripack_1.3-4  RColorBrewer_1.0-5 Biostrings_2.24.1  IRanges_1.14.2   
   BiocGenerics_0.2.0 RCurl_1.91-1  
 [7] bitops_1.0-4.1 XML_3.9-4 
 
 loaded via a namespace (and not attached):
 [1] stats4_2.15.0 tools_2.15.0
 
 




Re: [R] Parsing very large xml datafiles with SAX: How to profile anonymous functions?

2012-10-26 Thread Duncan Temple Lang
Hi Frederic

 Perhaps the simplest way to profile the individual functions in your
handlers is to write the individual handlers as regular
named functions, i.e. assigned to a variable in your work space (or function 
body),
and then to write the handler functions as wrappers that call these
by name

  startElement = function(name, attr, ...) {
 # code you want to run when we encounter the start of an XML element
  }

  myText = function(...) {
 # code
  }

  Now, when calling xmlEventParse()

   xmlEventParse(filename,
                 handlers = list(.startElement = function(...) startElement(...),
                                 .text = function(...) myText(...)))

Then the profiler will see the calls to startElement and myText.

There is small overhead of the extra layers, but you will get the profile 
information.

  D.

On 10/26/12 9:49 AM, Frederic Fournier wrote:
 Hello everyone,
 
 I'm trying to parse a very large XML file using SAX with the XML package
 (i.e., mainly the xmlEventParsing function). This function takes as an
 argument a list of other functions (handlers) that will be called to handle
 particular xml nodes.
 
 When I use Rprof(), all the handler functions are lumped together under
 the "Anonymous" label, and I get something like this:
 
 $by.total
total.time total.pct self.time self.pct
 system.time  151.22 99.99  0.00 0.00
 MyParsingFunction149.38 98.77  0.00 0.00
 xmlEventParse149.38 98.77  0.00 0.00
 .Call149.32 98.73  3.04 2.01
 "Anonymous"  146.74 97.02    141.26    93.40   <--- !!
 xmlValue   3.04  2.01  0.46 0.30
 xmlValue.XMLInternalNode   2.58  1.71  0.14 0.09
 standardGeneric2.12  1.40  0.50 0.33
 gc 1.86  1.23  1.86 1.23
 ...
 
 
 Is there a way to make Rprof() identify the different handler functions, so
 I can know which one might be a bottleneck? Is there another profiling tool
 that would be more appropriate in a case like this?
 
 Thank you very much for your help!
 
 Frederic
 
 
 




Re: [R] Downloading a html table

2012-10-23 Thread Duncan Temple Lang
Rather than requiring manual tweaking,

library(XML)
readHTMLTable("http://www.worldatlas.com/aatlas/populations/usapoptable.htm")

will do the job for us.

 D.

On 10/22/12 8:17 PM, David Arnold wrote:
 All,
 
 A friend of mine would like to use this data with his stats class:
 
 http://www.worldatlas.com/aatlas/populations/usapoptable.htm
 
 I can't figure a way of capturing this data due to the mysql commands in the
 source code.
 
 Any thoughts?
 
 David.
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Downloading-a-html-table-tp4647091.html
 Sent from the R help mailing list archive at Nabble.com.
 
 




Re: [R] Extracting results from Google Search

2012-10-23 Thread Duncan Temple Lang

Hi Eduardo

 Scraping the coordinates from the HTML page can be a little tricky
in this case.  Also, Google may not want you using their search engine
for that. Instead, you might use their Geocoding API
(https://developers.google.com/maps/documentation/geocoding),
but do ensure that this fits within their terms of use.

If you do use the Geocoding API, you can do with the following code:

library(RJSONIO)
library(RCurl)

DB <- data.frame(town = c('Ingall', 'Dogondoutchi', 'Tera'),
   country = rep('Niger', 3))

location = with(DB, paste(town, country))

ans = lapply(location,
  function(loc)
     fromJSON(getForm("http://maps.googleapis.com/maps/api/geocode/json",
                      address = loc, sensor = "false"))$results[[1]]$geometry$location
 )

DB = cbind(DB, do.call(rbind, ans))

And now the data frame has the lat and lng variables.

Again, check that the Geocoding terms of use allows you to do this.

 HTH
   D.




On 10/23/12 6:33 AM, ECAMF wrote:
 Dear list,
 
 I have a long list of towns in Africa and would need to get their
  geographical coordinates. The Google query [TownName Country coordinates]
 works for most of the TownNames I have and give a nicely formatted Google
 output (try Ingall Niger coordinates for an example). I would like to launch
 a loop on the list of names I have and automatically extract the coordinates
 given by Google. Does anyone knows how it can be done?
 
 ex.
 DB <- data.frame(town = c('Ingall', 'Dogondoutchi', 'Tera'),
 country = rep('Niger', 3))
 # Get lat and lon from the Google search on : 
 for (i in 1:3)   {
   paste(DB$town[i], DB$country[i], 'coordinates', sep="")
 }
 
 Many thanks!
 
 Eduardo.
 
 
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Extracting-results-from-Google-Search-tp4647136.html
 Sent from the R help mailing list archive at Nabble.com.
 
 




Re: [R] saving to docx

2012-10-20 Thread Duncan Temple Lang

Just to let people know

On the Omegahat site (and source on github),
there are packages for working with Office Open
documents (and LibreOffice too), including
RWordXML, RExcelXML and the generic package OOXML
on which they rely.

These are prototypes in the sense that they
do not comprehensively cover the entire OOXML specification.
Instead, the packages do have functionality for some common things to get
data in and out of OO documents, and they have
foundation functions for building new features.

  D.

On 10/19/12 3:19 PM, David Winsemius wrote:
 
 On Oct 19, 2012, at 2:48 PM, Daróczi Gergely wrote:
 
 Hi Javad,

 saving R output to jpeg depends on what you want to save. For example,
 saving an `lm` object to an image would be fun :)
 But you could export it quite easily to e.g. docx after installing
 Pandoc[1] and the pander[2] package. You can find some examples in the
 README[3].

 Best,
 Gergely

 [1] http://johnmacfarlane.net/pandoc/installing.html
 [2] http://cran.r-project.org/web/packages/pander/index.html
 [3a] brew syntax: http://rapporter.github.com/pander/#brew-to-pandoc
 [3b] in a live R session:
 http://rapporter.github.com/pander/#live-report-generation
 
 I guess I need to retract my comment that such packages only existed on 
 Windows. Despite 'pander' not passing its CRAN package check for Mac, it does 
 build from source, and the Pandoc installer does succeed in Snow Leopard with R 
 2.15.1. Thank you for writing the pander package, Daróczi.
 
 

 On Fri, Oct 19, 2012 at 9:54 PM, javad bayat j.bayat...@gmail.com wrote:

 hi all,
 how can I save R output to docx or JPEG format?





Re: [R] Problems with getURL (RCurl) to obtain list files of an ftp directory

2012-10-12 Thread Duncan Temple Lang

Hi Francisco

  The code gives me the correct results, and it works for you on a Windows 
machine.
So while it could be different versions of software (e.g. libcurl, RCurl, etc.),
the presence of the word squid in the HTML suggests to me that
your machine/network is using the proxy/caching software Squid. This intercepts
requests and caches the results locally and shares them across
local users.  So if squid has retrieved that page for an HTML target (e.g. a 
browser or
with a Content-Type set to text/html), it may be using that cached copy for 
your FTP request.

One thing I like to do when debugging RCurl calls is to add
  verbose = TRUE
to the .opts argument and then see the information about the communication.
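
For the getURL() call in the original message, that would look something like this (a sketch; the rest of the call is unchanged):

```r
library(RCurl)

# verbose = TRUE makes libcurl print the request/response headers,
# which shows whether a proxy such as Squid is answering the request.
items <- strsplit(getURL(MOD16A2.doy,
                         .opts = curlOptions(ftplistonly = TRUE,
                                             verbose = TRUE)),
                  "\n")[[1]]
```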

   D.

On 10/11/12 11:37 AM, Francisco Zambrano wrote:
 Dear all,
 
 I have a problem with the command 'getURL' from the RCurl package, which I
 have been using to obtain an FTP directory list of the MOD16 (ET, DSI)
 products, and then to download them (part of the script by Tomislav
 Hengl, spatial-analyst). Instead of the list of files (from FTP), I am
 getting the complete HTML code. Does anyone know why this might happen?
 
 This are the steps i have been doing:
 
 MOD16A2.doy <- 'ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/'
 
 items <- strsplit(getURL(MOD16A2.doy,
 .opts=curlOptions(ftplistonly=TRUE)), "\n")[[1]]
 
 items #results
 
 [1] !DOCTYPE HTML PUBLIC \-//W3C//DTD HTML 4.01 Transitional//EN\ \
 http://www.w3.org/TR/html4/loose.dtd\;\n!-- HTML listing generated by
 Squid 2.7.STABLE9 --\n!-- Wed, 10 Oct 2012 13:43:53 GMT
 --\nHTMLHEADTITLE\nFTP Directory:
 ftp://ftp.ntsg.umt.edu/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\n/TITLE\nSTYLE
 type=\text/css\!--BODY{background-color:#ff;font-family:verdana,sans-serif}--/STYLE\n/HEADBODY\nH2\nFTP
 Directory: A HREF=\/\ftp://ftp.ntsg.umt.edu/A/A
 HREF=\/pub/\pub/A/A HREF=\/pub/MODIS/\MODIS/A/A
 HREF=\/pub/MODIS/Mirror/\Mirror/A/A
 HREF=\/pub/MODIS/Mirror/MOD16/\MOD16/A/A
 HREF=\/pub/MODIS/Mirror/MOD16/MOD16A2.105_MERRAGMAO/\MOD16A2.105_MERRAGMAO/A//H2\nPRE\nA
 HREF=\../\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dirup.gif\;
 ALT=\[DIRUP]\/A A HREF=\../\Parent Directory/A \nA
 HREF=\GEOTIFF_0.05degree/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\GEOTIFF_0.05degree/\GEOTIFF_0.05degree/A
 . . . . . . . Jun  3 18:00\nA HREF=\GEOTIFF_0.5degree/\IMG
 border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\GEOTIFF_0.5degree/\GEOTIFF_0.5degree/A. .
 . . . . . . Jun  3 18:01\nA HREF=\Y2000/\IMG border=\0\
 SRC=\http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2000/\Y2000/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2001/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2001/\Y2001/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2002/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2002/\Y2002/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2003/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2003/\Y2003/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2004/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2004/\Y2004/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2005/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2005/\Y2005/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2006/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2006/\Y2006/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2007/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2007/\Y2007/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2008/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2008/\Y2008/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2009/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2009/\Y2009/A. . . . . . . . . . . . . .
 Dec 23  2010\nA HREF=\Y2010/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2010/\Y2010/A. . . . . . . . . . . . . .
 Feb 20  2011\nA HREF=\Y2011/\IMG border=\0\ SRC=\
 http://localhost:3128/squid-internal-static/icons/anthony-dir.gif\;
 ALT=\[DIR] \/A A HREF=\Y2011/\Y2011/A. . . . . . . . . . . . . .
 Mar 12  2012   

Re: [R] scraping with session cookies

2012-09-19 Thread Duncan Temple Lang
Hi ?

The key is that you want to use the same curl handle
for both the postForm() and for getting the data document.

site = u =
"http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"

library(RCurl)
curl = getCurlHandle(cookiefile = "", verbose = TRUE)

postForm(site, disclaimer_action="I Agree")

Now we have the cookie in the curl handle so we can use that same curl handle
to request the data document:

txt = getURLContent(u, curl = curl)

Now we can use readHTMLTable() on the local document content:

library(XML)
tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)



Rather than knowing how to post the form, I like to read
the form programmatically and generate an R function to do the submission
for me. The RHTMLForms package can do this.

library(RHTMLForms)
forms = getHTMLFormDescription(u, FALSE)
fun = createFunction(forms[[1]])

Then we can use

 fun(.curl = curl)

instead of

  postForm(site, disclaimer_action="I Agree")

This helps to abstract the details of the form.

  D.

On 9/18/12 5:57 PM, CPV wrote:
 Hi, I am starting coding in r and one of the things that i want to do is to
 scrape some data from the web.
 The problem that I am having is that I cannot get past the disclaimer
 page (which produces a session cookie). I have been able to collect some
 ideas and combine them in the code below, but I don't get past the
 disclaimer page.
 I am trying to agree to the disclaimer with postForm and write the cookie
 to a file, but I cannot do it successfully.
 The webpage cookies are written to the file but the value is FALSE... So
 any ideas of what I should do or what I am doing wrong?
 Thank you for your help,
 
 library(RCurl)
 library(XML)
 
 site <- 
 "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
 
 postForm(site, disclaimer_action="I Agree")
 
 cf <- "cookies.txt"
 
 no_cookie <- function() {
 curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
 getURL(site, curl=curlHandle)
 
 rm(curlHandle)
 gc()
 }
 
 if ( file.exists(cf) == TRUE ) {
 file.create(cf)
 no_cookie()
 }
 allTables - readHTMLTable(site)
 allTables
 
 
 




Re: [R] scraping with session cookies

2012-09-19 Thread Duncan Temple Lang
 You don't need to use the  getHTMLFormDescription() and createFunction().
Instead, you can use the postForm() call.  However, getHTMLFormDescription(),
etc. is more general. But you need the very latest version of the package
to deal with degenerate forms that have no inputs (other than button clicks).

 You can get the latest version of the RHTMLForms package
 from github

  git clone git@github.com:omegahat/RHTMLForms.git

 and that has the fixes for handling the degenerate forms with
 no arguments.

   D.

On 9/19/12 7:51 AM, CPV wrote:
 Thank you for your help Duncan,
 
 I have been trying what you suggested however  I am getting an error when
 trying to create the function fun <- createFunction(forms[[1]])
 it says Error in isHidden | hasDefault :
 operations are possible only for numeric, logical or complex types
 
 On Wed, Sep 19, 2012 at 12:15 AM, Duncan Temple Lang 
 dtemplel...@ucdavis.edu wrote:
 
 Hi ?

 The key is that you want to use the same curl handle
 for both the postForm() and for getting the data document.

 site = u = "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
 

 library(RCurl)
 curl = getCurlHandle(cookiefile = "", verbose = TRUE)

 postForm(site, disclaimer_action = "I Agree")

 Now we have the cookie in the curl handle so we can use that same curl
 handle
 to request the data document:

 txt = getURLContent(u, curl = curl)

 Now we can use readHTMLTable() on the local document content:

 library(XML)
 tt = readHTMLTable(txt, asText = TRUE, which = 1, stringsAsFactors = FALSE)



 Rather than knowing how to post the form, I like to read
 the form programmatically and generate an R function to do the submission
 for me. The RHTMLForms package can do this.

 library(RHTMLForms)
 forms = getHTMLFormDescription(u, FALSE)
 fun = createFunction(forms[[1]])

 Then we can use

  fun(.curl = curl)

 instead of

   postForm(site, disclaimer_action = "I Agree")

 This helps to abstract the details of the form.

   D.

 On 9/18/12 5:57 PM, CPV wrote:
 Hi, I am starting coding in R and one of the things that I want to do is
 to
 scrape some data from the web.
 The problem that I am having is that I cannot get past the disclaimer
 page (which produces a session cookie). I have been able to collect some
 ideas and combine them in the code below but I don't get past the
 disclaimer page.
 I am trying to agree to the disclaimer with postForm and write the
 cookie
 to a file, but I cannot do it successfully.
 The webpage cookies are written to the file but the value is FALSE... So
 any ideas of what I should do or what I am doing wrong with?
 Thank you for your help,

 library(RCurl)
 library(XML)

 site <- "http://www.wateroffice.ec.gc.ca/graph/graph_e.html?mode=text&stn=05ND012&prm1=3&syr=2012&smo=09&sday=15&eyr=2012&emo=09&eday=18"
 

 postForm(site, disclaimer_action = "I Agree")

 cf <- "cookies.txt"

 no_cookie <- function() {
 curlHandle <- getCurlHandle(cookiefile=cf, cookiejar=cf)
 getURL(site, curl=curlHandle)

 rm(curlHandle)
 gc()
 }

 if ( file.exists(cf) == TRUE ) {
 file.create(cf)
 no_cookie()
 }
 allTables <- readHTMLTable(site)
 allTables





 
 




Re: [R] memory leak using XML readHTMLTable

2012-09-17 Thread Duncan Temple Lang
Hi James

  Unfortunately, I am not certain if the latest version
of the XML package has the garbage collection activated for the nodes.
It is quite complicated and that feature was turned off in some versions
of the package.  I suggest that you install the version of the package on github

  git@github-omg:omegahat/XML.git

I believe that will handle the garbage collection of nodes, and I'd like
to know if it doesn't.
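As a small offline illustration of the parse-then-release pattern being discussed (the document string below is invented for the example; real use would pass a file or URL to xmlParse()):

```r
library(XML)

# Parse an in-memory document rather than a file (illustrative input).
doc <- xmlParse("<root><a>1</a><a>2</a></root>", asText = TRUE)
vals <- xpathSApply(doc, "//a", xmlValue)

# Explicitly release the C-level document once finished with it; with
# node-level garbage collection active this should be unnecessary, but it
# is a safe habit in long-running scrapes.
free(doc)
```

After free(), only the extracted R values (here `vals`) should be used, not the document or its nodes.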

   Best,
D.

On 9/16/12 8:30 PM, J Toll wrote:
 Hi,
 
 I'm using the XML package to scrape data and I'm trying to figure out
 how to eliminate the memory leak I'm currently experiencing.  In the
 searches I've done, it sounds like the existence of the leak is fairly
 well known.  What isn't as clear is exactly how to solve it.  The
 general process I'm using is this:
 
 require(XML)
 
 myFunction <- function(URL) {
 
   html <- readLines(URL)
 
   tables <- readHTMLTable(html, stringsAsFactors = FALSE)
 
   myData <- data.frame(Value = tables[[1]][, 2],
 row.names = make.unique(tables[[1]][, 1]),
 stringsAsFactors = FALSE)
 
  rm(list = c("html", "tables"))   # here, and
  free(tables)  # here, my attempt to solve the
 memory leak
 
   return(myData)
 
 }
 
 x <- lapply(myURLs, myFunction)
 
 
 I've tried using rm() and free() to try to free up the memory each
 time the function is called, but it hasn't worked as far as I can
 tell.  By the time lapply is finished working through my list of URLs,
 I'm swapping about 3GB of memory.
 
 I've also tried using gc(), but that seems to also have no effect on
 the problem.
 
 I'm running RStudio 0.96.330 and latest version of XML.
 R version 2.15.1 (2012-06-22) -- Roasted Marshmallows
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 Any suggestions on how to solve this memory issue?  Thanks.
 
 
 James
 
 




Re: [R] memory leak using XML readHTMLTable

2012-09-17 Thread Duncan Temple Lang

Thanks Yihui for normalizing my customized git URL.

The version of the package on github is in the
standard R format and that part of the README is
no longer relevant. Sorry for the confusion.

It might be simplest to pick up a tar.gz file of the source at

 http://www.omegahat.org/RSXML/XML_3.94-0.tar.gz


 D

On 9/17/12 12:31 PM, J Toll wrote:
 On Mon, Sep 17, 2012 at 12:51 PM, Yihui Xie x...@yihui.name wrote:
 I think the correct address for GIT should be
 git://github.com/omegahat/XML.git :) Or just
 https://github.com/omegahat/XML

 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Phone: 515-294-2465 Web: http://yihui.name
 Department of Statistics, Iowa State University
 2215 Snedecor Hall, Ames, IA


 On Mon, Sep 17, 2012 at 11:16 AM, Duncan Temple Lang
 dun...@wald.ucdavis.edu wrote:
 Hi James

   Unfortunately, I am not certain if the latest version
 of the XML package has the garbage collection activated for the nodes.
 It is quite complicated and that feature was turned off in some versions
 of the package.  I suggest that you install the version of the package on 
 github

   git@github-omg:omegahat/XML.git

 I believe that will handle the garbage collection of nodes, and I'd like
 to know if it doesn't.

Best,
 D.
 
 Hi,
 
 Thanks for your response and I'm sorry, I should have been more
 specific regarding the version of XML.  I'm using XML 3.9-4.
 
 As a sort of follow-on question?  Is there a preferable way to install
 this version of XML from github?  Do I have to use git to clone it, or
 maybe use the install_github function from Hadley's devtools package?
 I note that the README indicates that:
 
 This R package is not in the R package format in the github repository.
 It was initially developed in 1999 and was intended for use in both
 S-Plus and R and so requires a different structure for each.
 
 So I was wondering what the general procedure is and whether there's
 anything special I need to do to install it?  Thanks.
 
 
 James
 
 




Re: [R] Parsing large XML documents in R - how to optimize the speed?

2012-08-11 Thread Duncan Temple Lang

Hi Frederic

  You definitely want to be using xmlParse() (or equivalently
  xmlTreeParse( , useInternalNodes = TRUE)).

  This then allows use of getNodeSet()

  I would suggest you use Rprof() to find out where the bottlenecks arise,
   e.g. in the XML functions or in S4 code, or in your code that assembles the
R objects from the XML.
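A minimal sketch of that Rprof() workflow; the loop below is an arbitrary stand-in for the XML-parsing code being profiled:

```r
# Write profiling samples to a temporary file while the work runs.
out <- tempfile()
Rprof(out)
x <- 0
for (i in 1:2e6) x <- x + sqrt(i)   # stand-in for the XML parsing work
Rprof(NULL)

# summaryRprof(out)$by.total ranks functions by total time, showing
# whether the XML functions, S4 dispatch, or user code dominate.
```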

  I'm happy to take a look at speeding it up if you can make the test file 
available
and show me your code.

D.
On 8/10/12 3:46 PM, Frederic Fournier wrote:
 Hello everyone,
 
 I would like to parse very large xml files from MS/MS experiments and
 create R objects from their content. (By very large, I mean going up to
 5-10Gb, although I am using a 'small' 40M file to test my code.)
 
 My first attempt at parsing the 40M file, using the XML package, took more
 than 2200 seconds and left me quite disappointed.
 I managed to cut that down to around 40 seconds by:
 -using the 'useInternalNodes' option of the XML package when parsing
 the xml tree;
 -vectorizing the parsing (i.e., replacing loops like for(node in
 group.of.nodes) {...} by sapply(group.of.nodes, function(node) {...}))
 I gained another 5 seconds by making small changes to the functions used
 (like replacing 'getNodeSet' by 'xmlElementsByTagName' when I don't need to
 navigate to the children nodes).
 Now I am blocked at around 35 seconds and I would still like to cut this
 time by 5x, but I have no clue what to do to achieve this gain. I'll try
 to expose as briefly as possible the relevant structure of the xml file I
 am parsing, the structure of the R object I want to create, and the type of
 functions I am using to do it. I hope that one of you will be able to point
 me towards a better and quicker way of doing the parsing!
 
 
 Here is the (simplified) structure of the relevant nodes of the xml file:
 
 <model>  (many many nodes)
   <protein>  (a couple of proteins per model node)
     <peptide>  (1 per protein node)
       <domain>  (1 or more per peptide node)
         <aa>  (0 or more per domain node)
         </aa>
       </domain>
     </peptide>
   </protein>
 </model>
 
 Here is the basic structure of the R object that I want to create:
 
 'result' object that contains:
   -various attributes
   -a list of 'protein' objects, each of which containing:
   -various attributes
   -a list of 'peptide' objects, each of which containing:
 -various attributes
 -a list of 'aa' objects, each of which consisting of a couple of
 attributes.
 
 Here is the basic structure of the code:
 
 xml.doc <- xmlTreeParse(file, getDTD = FALSE, useInternalNodes = TRUE)
 result <- new('S4_result_class')
 result@proteins <- xpathApply(xml.doc, "//model/protein",
 function(protein.node) {
   protein <- new('S4_protein_class')
   ## fill in a couple of attributes of the protein object using xmlValue
 and xmlAttrs(protein.node)
   protein@peptides <- xpathApply(protein.node, "./peptide",
 function(peptide.node) {
 peptide <- new('S4_peptide_class')
 ## fill in a couple of attributes of the peptide object using xmlValue
 and xmlAttrs(peptide.node)
 peptide@aas <- sapply(xmlElementsByTagName(peptide.node, name = "aa"),
 function(aa.node) {
   aa <- new('S4_aa_class')
   ## fill in a couple of attributes of the 'aa' object using xmlValue
 and xmlAttrs(aa.node)
 })
   })
 })
 free(xml.doc)
 
 
 Does anyone know a better and quicker way of doing this?
 
 Sorry for the very long message and thank you very much for your time and
 help!
 
 Frederic
 
 




Re: [R] readHTMLTable function - unable to find an inherited method ~ for signature NULL

2012-06-14 Thread Duncan Temple Lang
The second page (mmo-champion.com) doesn't contain a <table>
node.

To scrape the data from the page, you will have to explore its
HTML structure.
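A quick offline way to see the failure mode: when a parsed page has no table node at all, there is nothing for readHTMLTable() to dispatch on. The HTML string here is an invented stand-in for such a page:

```r
library(XML)

# A page with no <table> element anywhere (invented stand-in).
doc <- htmlParse("<html><body><div>no tables here</div></body></html>",
                 asText = TRUE)

# Zero matches: readHTMLTable() has nothing to work on.
n.tables <- length(getNodeSet(doc, "//table"))
```

Checking `getNodeSet(doc, "//table")` first is a cheap guard before calling readHTMLTable() on an arbitrary page.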

   D.

On 6/14/12 9:31 AM, Moon Eunyoung wrote:
 Hi R experts,
 
 I have been playing with library(XML) recently and found out that
 readHTMLTable workls flawlessly for some website, but it does give me an
 error like below
 
  ... Error in function (classes, fdef, mtable)  :
   unable to find an inherited method for function "readHTMLTable", for
 signature "NULL"
 let's say..for example, this code works fine
 
 a <- "http://www.zam.com/forum.html?forum=21&p=2"
 table_a <- readHTMLTable(a, header = TRUE, which = 1, stringsAsFactors =
 FALSE)
 
 but, this website gives me an error  -
 
 b <- "http://www.mmo-champion.com/forums/266-General-Discussions/page2"
 table_b <- readHTMLTable(b, header = TRUE, which = 1, stringsAsFactors =
 FALSE)
 
 Error in function (classes, fdef, mtable)  :
   unable to find an inherited method for function "readHTMLTable", for
 signature "NULL"
 I think this is due to the structure of the website but i'm not very
 familiar with HTML file, so I'm curious what part of HTML file makes this
 happen. (Also it will be great (!) if someone can point out how to output
 the second example (like the format that first example outputs..) without
 an error
 
 Thanks,
 



Re: [R] How to set cookies in RCurl

2012-06-07 Thread Duncan Temple Lang
To just enable cookies and their management, use the cookiefile
option, e.g.

  txt = getURLContent(url,  cookiefile = "")

Then you can pass this to readHTMLTable(), best done as

  content = readHTMLTable(htmlParse(txt, asText = TRUE))


The function readHTMLTable() doesn't use RCurl and doesn't
handle cookies.

   D.

On 6/7/12 7:33 AM, mdvaan wrote:
 Hi,
 
 I am trying to access a website and read its content. The website is a
 restricted access website that I access through a proxy server (which
 therefore requires me to enable cookies). I have problems in allowing RCurl
 to receive and send cookies. 
 
 The following lines give me:
 
 library(RCurl)
 library(XML)
 
 url <- "http://www.theurl.com"
 content <- readHTMLTable(url)
 
 content
 $`NULL`
   
   
  
 V1
 1 
   
 
 2 
  
 Cookies disabled
 3 
   
 
 4 Your browser currently does not accept cookies.\rCookies need to be
 enabled for Scopus to function properly.\rPlease enable session cookies in
 your browser and try again.
 
 $`NULL`
   V1 V2 V3
 1 
 
 $`NULL`
 V1
 1 Cookies disabled
 
 $`NULL`
   V1
 1   
 2   
 3  
 
 I have carefully read section 4.4. from this:
 http://www.omegahat.org/RCurl/RCurlJSS.pdf and tried the following without
 success:
 
 curl <- getCurlHandle()
 curlSetOpt(cookiejar = 'cookies.txt', curl = curl)
 
 Any suggestions on how to allow for cookies?
 
 Thanks.
 
 Math
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/How-to-set-cookies-in-RCurl-tp4632693.html
 Sent from the R help mailing list archive at Nabble.com.
 



Re: [R] How to set cookies in RCurl

2012-06-07 Thread Duncan Temple Lang
Apologies for following up on my own mail, but I forgot
to explicitly mention that you will need to specify the
appropriate proxy information in the call to getURLContent().

  D.

On 6/7/12 8:31 AM, Duncan Temple Lang wrote:
 To just enable cookies and their management, use the cookiefile
 option, e.g.
 
    txt = getURLContent(url,  cookiefile = "")
 
 Then you can pass this to readHTMLTable(), best done as
 
   content = readHTMLTable(htmlParse(txt, asText = TRUE))
 
 
 The function readHTMLTable() doesn't use RCurl and doesn't
 handle cookies.
 
D.
 



Re: [R] using XML package to read RSS

2012-05-16 Thread Duncan Temple Lang
Hi James.

 Yes, you need to identify the namespace in the query, e.g.

   getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))

This yields 40 matching nodes.

(getNodeSet() is more convenient to use when you don't specify a function
to apply to the nodes. Also, you don't need xmlRoot(doc), as it works on the
entire document with the query //)

 BTW, you want to use xmlParse() and not xmlTreeParse().
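The same namespace-qualified query can be exercised on an inline Atom-like fragment, without touching the network (the feed content here is invented for the example):

```r
library(XML)

txt <- '<feed xmlns="http://www.w3.org/2005/Atom">
          <entry><title>first</title></entry>
          <entry><title>second</title></entry>
        </feed>'
doc <- xmlParse(txt, asText = TRUE)

# An unqualified name never matches elements in the default namespace...
n.plain <- length(getNodeSet(doc, "//entry"))

# ...but binding a prefix to that namespace URI finds them.
entries <- getNodeSet(doc, "//x:entry", c(x = "http://www.w3.org/2005/Atom"))
```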

   D.


On 5/16/12 6:40 PM, J Toll wrote:
 Hi,
 
 I'm trying to use the XML package to read an RSS feed.  To get
 started, I was trying to use this post as an example:
 
 http://www.r-bloggers.com/how-to-build-a-dataset-in-r-using-an-rss-feed-or-web-page/
 
 I can replicate the beginning section of the post, but when I try to
 use another RSS feed I have an issue.  The RSS feed I would like to
 use is:
 
 URL <- "http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent&type=&company=&dateb=&owner=include&start=0&count=40&output=atom"
 
 library(XML)
 doc <- xmlTreeParse(URL)
 
 src <- xpathApply(xmlRoot(doc), "//entry")
 
 I get an empty list rather than a list of the entry nodes:
 
 src
 list()
 attr(,"class")
 [1] "XMLNodeSet"
 
 I'm not sure how to fix this.  Any suggestions?  Do I need to provide
 a namespace, or is the RSS malformed?
 
 Thanks,
 
 
 James
 



Re: [R] Scraping a web page.

2012-05-15 Thread Duncan Temple Lang

Hi Keith

 Of course, it doesn't necessarily matter how you get the job done
if it actually works correctly.  But for a general approach,
it is useful to use general tools, which can lead to more correct,
more robust, and more maintainable code.

Since htmlParse() in the XML package can both retrieve and parse the HTML 
document
  doc = htmlParse(the.url)

is much more succinct than using curlPerform().
However, if you want to use RCurl, just use

txt = getURLContent(the.url)

and  that replaces

  h = basicTextGatherer()
  curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = h$update)
  h$value()


If you have parsed the HTML document, you can find the a nodes that have an
href attribute that start with /en/Ships via

  hrefs = unlist(getNodeSet(doc, "//a[starts-with(@href, '/en/Ships')]/@href"))


The result is a character vector and you can extract the relevant substrings 
with
substring() or gsub() or any wrapper of those functions.
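Given such a character vector, the 7-digit numbers can be pulled out with base-R regex functions; the sample hrefs below are the ones quoted in the question:

```r
# hrefs as they might come back from the getNodeSet()/@href extraction.
hrefs <- c("/en/Ships/A-8605507.html", "/en/Ships/Aalborg-8122830.html")

# Keep only the 7 digits immediately before ".html".
ids <- sub(".*-([0-9]{7})\\.html$", "\\1", hrefs)
```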

There are many benefits of parsing the HTML, including not falling foul of
"as far as I can tell the <a> tag is always on it's own line" being not
true.

D.



On 5/15/12 4:06 AM, Keith Weintraub wrote:
 Thanks,
   That was very helpful.
 
 I am using readLines and grep. If grep isn't powerful enough I might end up 
 using the XML package but I hope that won't be necessary.
 
 Thanks again,
 KW
 
 --
 
 On May 14, 2012, at 7:18 PM, J Toll wrote:
 
 On Mon, May 14, 2012 at 4:17 PM, Keith Weintraub kw1...@gmail.com wrote:
 Folks,
  I want to scrape a series of web-page sources for strings like the 
 following:

 /en/Ships/A-8605507.html
 /en/Ships/Aalborg-8122830.html

 which appear in an href inside an <a> tag inside a <div> tag inside a <table>.

 In fact all I want is the (exactly) 7-digit number before .html.

  The good news is that as far as I can tell the <a> tag is always on 
 it's own line so some kind of line-by-line grep should suffice once I 
 figure out the following:

 What is the best package/command to use to get the source of a web page. I 
 tried using something like:
 if(url.exists("http://www.omegahat.org/RCurl")) {
  h = basicTextGatherer()
  curlPerform(url = "http://www.omegahat.org/RCurl", writefunction = 
 h$update)
   # Now read the text that was cumulated during the query response.
  h$value()
 }

 which works except that I get one long streamed html doc without the line 
 breaks.

 You could use:

 h <- readLines("http://www.omegahat.org/RCurl")

 -- or --

 download.file(url = "http://www.omegahat.org/RCurl", destfile = "tmp.html")
 h = scan("tmp.html", what = "", sep = "\n")

 and then use grep or the XML package for processing.

 HTH

 James
 
 



Re: [R] how to download data from soap server using R

2012-05-06 Thread Duncan Temple Lang
There is a kegg package available from the BioConductor repository.
Also, you can generate an interface via the SSOAP package:

  library(SSOAP)
   w = processWSDL("http://soap.genome.jp/KEGG.wsdl")
   iface = genSOAPClientInterface(, w)
 
   iface@functions$list_databases()

D.


On 5/6/12 3:01 AM, sagarnikam123 wrote:
 I don't know perl, but on the server site they give SOAP::Lite using perl;
 go to http://www.kegg.jp/kegg/soap/doc/keggapi_manual.html
 I want to download data from the kegg server, using R only.
 How do I proceed?
 What is meant by a SOAP client driver?
 also go to http://soap.genome.jp/KEGG.wsdl
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/how-to-download-data-from-soap-server-using-R-tp4612595.html
 Sent from the R help mailing list archive at Nabble.com.
 



Re: [R] readHTLMTable help

2012-03-27 Thread Duncan Temple Lang
Hi Lucas

 The HTML page is formatted by using tables in each of the cells
of the top-most table.   As a result, the simple table is much more
complex. readHTMLTable() is intended for quick and easy tables.
For tables such as this, you have to implement more customized processors.

doc = htmlParse("http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980")

tb = getNodeSet(doc, "//table")[[1]]

This gives the top-most table.

xmlSize(tb) tells us the number of rows. We want to skip the first 3 to get to 
the data.
Then in each of these you can process each row and the cells that have the data.
And the details go on
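A toy version of that drilling-down, using a nested layout table invented for the example (the real page wraps the data table in an outer layout table the same way):

```r
library(XML)

html <- "<table><tr><td>layout shell</td></tr>
         <tr><td><table><tr><td>dato1</td><td>dato2</td></tr></table></td></tr>
         </table>"
doc <- htmlParse(html, asText = TRUE)

# The top-most table is only layout; the values live in the inner table.
inner <- xpathSApply(doc, "//table//table//td", xmlValue)
```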

  D.


On 3/27/12 10:57 AM, Lucas wrote:
 Hello to everyone.
 I'm using this function to download some information from a website.
 This is the URL:
 http://164.77.222.61/climatologia/php/vientoMaximo8.php?IdEstacion=330007&FechaIni=01-1-1980
 If you go to that website you'll find a table with meteorological
 information. One column is called "Intensidad Máxima Diaria", and that is
 the one I need.
 I've been trying to extract that column, but I'm unable to do it.
 First I tried simply to download the complete table and then do some kind
 of filter to extract the column but, for some reason, when I call the
 function
 a <- readHTMLTable(url), the table is downloaded in an unfriendly format and I
 cannot differentiate the column.
 
 If anyone could help me I'll appreciate it.
 Thank you.
 
 Lucas.
 



Re: [R] SSOAP and Chemspider: Security token?

2012-03-07 Thread Duncan Temple Lang
Hi Michael

Thanks for the report and digging into the actual XML documents
that are sent.

It turns out that if I remove the redundant namespace definitions
and just use a single one on the SimpleSearch node, all is apparently fine.

I've put a pre-release version of the SSOAP package that does at

  http://www.omegahat.org/Prerelease/SSOAP_0.9-1.tar.gz

You can try that.

I'll release this version when I also fix the issue with
XMLSchema  that causes the error in genSOAPClientInterface()


BTW, the if(!is.character(token)) in the example in chemSpider.R
is an error - a mixture of !is.null() and then checking only if it
is a character.


  Best,
Duncan

On 3/7/12 4:58 AM, Stravs, Michael wrote:
 Dear community,
 
 has anyone managed to get SSOAP working with the ChemSpider Web APIs, using 
 functions which need the security token?
 I use SSOAP 0.9-0 from the OmegaHat repository.
 In the example code from SSOAP there is a sample which uses a token function. 
 Interestingly, it checks if(!is.character(token)) first (and proceeds if the 
 token is NOT character.) I can't test that function since I have no idea how 
 to get the token into non-character form :)
 
 My code:
 
 library(SSOAP)
 chemspider_sectoken <- "----"
 # (token was here)
 cs <- processWSDL("http://www.chemspider.com/Search.asmx?WSDL")
 # intf <- genSOAPClientInterface(, cs)
 # (this fails, see below. The Mass Spec API is correctly parsed. Therefore by 
 hand:)
 
 csidlist <- .SOAP(server = cs@server,
   method = "SimpleSearch",
   .soapArgs = list(
 query = "Azithromycin",
 token = token
   ),
   action = I("http://www.chemspider.com/SimpleSearch"),
   xmlns = c("http://www.chemspider.com/")
   )
 
 Error: Error occurred in the HTTP request:  Unauthorized web service usage. 
 Please request access to this service. --- Unauthorized web service usage. 
 Please request access to this service.
 
 If one looks into the request, the doc seems to be correct:
 
 <?xml version="1.0"?>
 
 <SOAP-ENV:Envelope xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/"
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
 
   <SOAP-ENV:Body>
 
     <ns:SimpleSearch xmlns:ns="http://www.chemspider.com/">
 
       <ns:query xmlns:ns="http://www.chemspider.com/" xsi:type="xsd:string">Azithromycin</ns:query>
 
       <ns:token xmlns:ns="http://www.chemspider.com/" xsi:type="xsd:string">----</ns:token>
 
     </ns:SimpleSearch>
 
   </SOAP-ENV:Body>
 
 </SOAP-ENV:Envelope>
 
 
 
 Compared to the sample request from the ChemSpider homepage:
 
 <?xml version="1.0" encoding="utf-8"?>
 
 <soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
 
   <soap:Body>
 
     <SimpleSearch xmlns="http://www.chemspider.com/">
 
       <query>string</query>
 
       <token>string</token>
 
     </SimpleSearch>
 
   </soap:Body>
 
 </soap:Envelope>
 
 
 
 To me, both look like perfectly fine XML and should be identical to the 
 ChemSpider XML parser. Do I need to write the token in another format? Is 
 something else wrong? I also tried to use .literal = T to no avail: Error 
 occurred in the HTTP request:  Empty query
 
 Best regards,
 -Michael
 
 PS: the error message from the genSOAPClientInterface call is:
 Note: Method with signature ClassDefinition#list chosen for function 
 resolve,
 target signature ExtendedClassDefinition#SchemaCollection.
 SOAPType#SchemaCollection would also be valid
 Error in makePrototypeFromClassDef(properties, ClassDef, immediate, where) :
   'name' must be a non-NULL character string
 Additionally: Warning message:
 undefined slot classes in definition of ExactStructureSearchOptions: 
 NA(class EMatchType)
 
 
 



Re: [R] RCurl format

2012-01-30 Thread Duncan Temple Lang

Hi KTD Services (!)

 I assume by DELETE you mean the HTTP method
and not the value of a parameter named "_method"
that is processed by the URL script.

 If that is the case, then you want to use the
 customrequest option for the libcurl operation,
 and you don't need or want to use postForm().


 Either

curlPerform(url = url, customrequest = "DELETE",
  userpwd = "user:password")

 or with a recent version of the RCurl package


httpDELETE(url, userpwd = "user:password")


  The parameter _method you are using is being passed on to the form
script.  It is not recognized by postForm() as being something controlling
the request, but just part of the form submission.

  D.



On 1/30/12 2:55 AM, KTD Services wrote:
 I am having trouble with the postForm function in RCurl.
 
 I want to send the command DELETE https://somewebsite.com.json
 
 but I can't seem to find it.  I could try:
 
 postForm(url, _method="DELETE", .opts = list("username:password"))
 
 but I get the error:
 
 Error: unexpected input in postForm(url4, _
 
 this error seems to be due to the underscore _ before method
 
 Any ideas how I can do a DELETE command another way in RCurl?
 
 Thanks.
 
 
 
 


Re: [R] Getting htmlParse to work with Hebrew? (on windows)

2012-01-30 Thread Duncan Temple Lang
With some off-line interaction and testing by Tal, the latest
version of the XML package (3.9-4) should resolve these issues.
So the encoding from the document is used in more cases as the default.

It is often important to specify the encoding for HTML files in
the call to htmlParse() and to use "UTF-8" rather than the lower-case form.

I'll add code to make this simpler when I get a chance.
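In the meantime, the pattern looks roughly like this (a sketch, untested
against that page here; it assumes the RCurl and XML packages are installed
and that the page really is served as UTF-8):

```r
library(RCurl)
library(XML)

u <- "http://humus101.com/?p=2737"
txt <- getURL(u)
# Specify the encoding explicitly, and in upper case: "UTF-8", not "utf-8"
doc <- htmlParse(txt, asText = TRUE, encoding = "UTF-8")
```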

  Thanks Tal

D.

On 1/30/12 5:35 AM, Tal Galili wrote:
 Hello dear R-help mailing list.
 
 
 
 I wish to be able to have htmlParse work well with Hebrew, but it keeps to
 scramble the Hebrew text in pages I feed into it.
 
 For example:
 
 # why can't I parse the Hebrew correctly?
 
 library(RCurl)
 library(XML)
 u = "http://humus101.com/?p=2737"
 a = getURL(u)
 a # Here - the hebrew is fine.
 a2 <- htmlParse(a)
 a2 # Here it is a mess...
 
 None of these seem to fix it:
 
 htmlParse(a, encoding = "utf-8")
 
 htmlParse(a, encoding = "iso8859-8")
 
 This is my locale:
 
 Sys.getlocale()
 
 [1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"

 
 Any suggestions?
 
 
 Thanks up front,
 Tal
 
 
 
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --
 
 


Re: [R] Custom XML Readers

2011-12-25 Thread Duncan Temple Lang

In addition to the general tools of the XML package,
I also had code that read documents with a similar structure
to the ones Andy illustrated. I put them and simple examples
of using them at the bottom of

   http://www.omegahat.org/RSXML/

page.

  D.
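To give a flavor of what those tools do, here is a small sketch (not the
Omegahat code itself) using the XML package; it assumes the name/value
fragments are wrapped in a single root element:

```r
library(XML)

txt <- '<doc>
  <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>
</doc>'

doc <- xmlParse(txt, asText = TRUE)
nodes <- getNodeSet(doc, "/doc/*")
# Use each element's name attribute as the name, its text content as the value
vals <- structure(sapply(nodes, xmlValue),
                  names = sapply(nodes, xmlGetAttr, "name"))
vals["author"]   # "Paul H"
```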

On 12/23/11 5:50 PM, Ben Tupper wrote:
 Hi Andy,
 
 On Dec 23, 2011, at 2:51 PM, pl.r...@gmail.com wrote:
 
 I need to construct a custom XML reader, the files I'm working with are in
 funky XML format:

 <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>

 I want to read the file so it looks like:

 author = Paul H
 country = USA
 created_date=2010-02-16

 Does anyone know how to go about this problem, or know of good references I
 could access?

 
 
 Have you tried Duncan Temple Lang's XML package for R?  It works very well 
 for parsing and building XML formatted data.
 
 http://www.omegahat.org/RSXML/
 
 Cheers,
 Ben
 
 
 
 Thanks,
 Andy


 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Custom-XML-Readers-tp4229614p4229614.html
 Sent from the R help mailing list archive at Nabble.com.

 
 Ben Tupper
 Bigelow Laboratory for Ocean Sciences
 180 McKown Point Rd. P.O. Box 475
 West Boothbay Harbor, Maine   04575-0475 
 http://www.bigelow.org
 


Re: [R] Text Mining with Facebook Reviews (XML and FQL)

2011-10-11 Thread Duncan Temple Lang

Hi Kenneth

  First off, you probably don't need to use xmlParseDoc(), but rather
  xmlParse().  (Both are fine, but xmlParseDoc() allows you to control many of
  the options in the libxml2 parser, which you don't need here.)

  xmlParse() has some capabilities to fetch the content of URLs. However,
 it cannot deal with HTTPS requests which this call to facebook is.
 The approach to this is to
i) make the request
   ii) parse the resulting string via xmlParse(txt, asText = TRUE)

 As for i), there are several ways to do this, but the RCurl
 package allows you to do it entirely within R and gives you
 more control over the request than you would ever want.

   library(RCurl)
   txt = getForm('https://api.facebook.com/method/fql.query', query = QUERY)

   mydata.xml = xmlParse(txt, asText = TRUE)

However, you are most likely going to have to login / get a token
before you make this request. And then, if you are using RCurl,
you will want to use the same curl object with the token or cookies, etc.

D.

On 10/10/11 3:52 PM, Kenneth Zhang wrote:
 Hello,
 
 I am trying to use XML package to download Facebook reviews in the following
 way:
 
 require(XML)
 mydata.vectors <- character(0)
 Qword <- URLencode('#IBM')
 QUERY <- paste('SELECT review_id, message, rating from review where message
 LIKE "%', Qword, '%"', sep = '')
 Facebook_url <- paste('https://api.facebook.com/method/fql.query?query=', QUERY, sep = '')
 mydata.xml <- xmlParseDoc(Facebook_url, asText = F)
 mydata.vector <- xpathSApply(mydata.xml, '//s:entry/s:title', xmlValue,
 namespaces = c('s' = 'http://www.w3.org/2005/Atom'))
 
 The mydata.xml is NULL therefore no further step can be execute. I am not so
 familiar with XML or FQL. Any suggestion will be appreciated. Thank you!
 
 Best regards,
 Kenneth
 
 


Re: [R] Add png image outside plot borders

2011-09-18 Thread Duncan Temple Lang

Amelia

  You can persuade rasterImage() (and other functions) to draw
outside of the data region using xpd = NA or xpd = TRUE.
See the help for the par function.
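A sketch of how that fits Amelia's example (it assumes the PNG file from her
code is present in the working directory):

```r
library(png)

plot(c(1.1, 2.3, 4.6), c(2.0, 1.6, 3.2), ylab = "", xlab = "")
mtext("Copyright statement", side = 1, line = 4, adj = 0, cex = 0.7)

z <- readPNG("Cc.logo.circle.png")
op <- par(xpd = NA)              # disable clipping to the plot region
rasterImage(z, 1, 0.5, 1.2, 1)   # y below the data range, i.e. in the margin
par(op)                          # restore the previous clipping setting
```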

D.

On 9/18/11 1:59 PM, Amelia McNamara wrote:
 If you run this, you'll see that I have some text at the bottom, but
 the logo is within the plot borders.
 
 plot(c(1.1, 2.3, 4.6), c(2.0, 1.6, 3.2), ylab = "", xlab = "")
 mtext("X axis label", side = 1, line = 3)
 mtext("Copyright statement", side = 1, line = 4, adj = 0, cex = 0.7)
 library(png)
 z <- readPNG("Cc.logo.circle.png")
 rasterImage(z, 1, 1.6, 1.2, 1.7)
 
 I've tried doing things like
 
 rasterImage(z, 1, 0.5, 1.2, 1)
 
 but nothing shows up. The documentation for rasterImage() says that
 the corner values have to be within the plot region. As I said before,
 I want the logo to be down on the level of my copyright text, outside
 the plot region.
 
 Thanks!
 
 
 On Sun, Sep 18, 2011 at 1:26 PM, Joshua Wiley jwiley.ps...@gmail.com wrote:
 Hi Amelia,

 Can you give an example (using text where you want the CC is fine)?
 Two angles I would try would be A) changing the regions or related but
 more flexible (and hence complex) B) use grid of course if you're
 making these with, say, ggplot2, you're already in grid (but then
 mtext probably would not work, though I have not tried it offhand).
 Anyway, an example (code please, not just the picture), will clear up
 all these questions and we can offer a solution tailored to what you
 are doing.

 Cheers,

 Josh

 On Sun, Sep 18, 2011 at 1:18 PM, Amelia McNamara
 amelia.mcnam...@stat.ucla.edu wrote:
 I am trying to add a copyright disclaimer outside the plot borders of
 some images I have created. I can use mtext() to add the written
 portion, but I would like to have the Creative Commons license image
 (http://en.wikipedia.org/wiki/File:Cc.logo.circle.svg) before the
 text. I've found that I can plot a .png image inside the plot
 boundaries using rasterImage() but I can't figure out how to do it
 outside the boundaries.

 Any help would be great. If you know unicode or Adobe Symbol encoding
 for the CC logo, that might work too.

 ~Amelia McNamara
 Statistics PhD student, UCLA





 --
 Joshua Wiley
 Ph.D. Student, Health Psychology
 Programmer Analyst II, ATS Statistical Consulting Group
 University of California, Los Angeles
 https://joshuawiley.com/

 


Re: [R] htmlParse hangs or crashes

2011-09-06 Thread Duncan Temple Lang

Hi Simon

 Unfortunately, it works for me on my OS X machine, so I can't reproduce the
problem. I'd be curious to know which version of libxml2 you are using; that
might be the cause of the problem.
You can find this with

  library(XML)
  libxmlVersion()

 You might install a more recent version (e.g. libxml >= 2.07.0)

 You can send the info to me off list and we can try to resolve the problem.


 htmlParse() returns a reference to the internal C-level XML tree/document.
When you print the value of the variable .x, we then serialize that C-level
data structure to a string.

 htmlTreeParse(), by default, converts that C-level XML tree/document into
regular R objects. So it traverses the tree and creates those R list()s
before it returns, and then throws the C-level tree away.
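The difference shows up directly in the classes of the return values (a quick
sketch; it needs only the XML package):

```r
library(XML)

txt <- "<html><body><p>hi</p></body></html>"

d1 <- htmlParse(txt, asText = TRUE)
class(d1)   # includes "HTMLInternalDocument": a reference to the C-level tree

d2 <- htmlTreeParse(txt, asText = TRUE)
class(d2)   # plain R objects (an "XMLDocumentContent" in current versions)
```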

D.

On 9/5/11 2:48 PM, Simon Kiss wrote:
 Dear colleagues,
 each time I use htmlParse, R crashes or hangs.  The url I'd like to parse is 
 included below as is the results of a series of basic commands that describe 
 what I'm experiencing.  The results of sessionInfo() are attached at the 
 bottom of the message.
 The thing is, htmlTreeParse appears to work just fine, although it doesn't 
 appear to contain the information I need (the URLs of the articles linked to 
 on this search page).  Regardless, I'd still like to understand why htmlParse 
 doesn't work.
 Thank you for any insight.
 Yours, 
 Simon Kiss
 
 
 myurl <- c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")
 
 .x <- htmlParse(myurl)
 
 class(.x)
 #returns "HTMLInternalDocument" "XMLInternalDocument"
 
 .x
 #returns
 *** caught segfault ***
 address 0x1398754, cause 'memory not mapped'
 
 Traceback:
  1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), 
 as.character(encoding), as.logical(indent), PACKAGE = "XML")
  2: saveXML(from)
  3: saveXML(from)
  4: asMethod(object)
  5: as(x, character)
  6: cat(as(x, character), \n)
  7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
  8: print(<pointer: 0x11656d3e0>)
 
 Possible actions:
 1: abort (with core dump, if enabled)
 2: normal R exit
 3: exit R without saving workspace
 4: exit R saving workspace
 
 sessionInfo()
 R version 2.13.0 (2011-04-13)
 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
 
 locale:
 [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base 
 
 other attached packages:
 [1] XML_3.4-0  RCurl_1.5-0bitops_1.0-4.1
 *
 Simon J. Kiss, PhD
 Assistant Professor, Wilfrid Laurier University
 73 George Street
 Brantford, Ontario, Canada
 N3T 2C9
 Cell: +1 905 746 7606
 


Re: [R] R hangs after htmlTreeParse

2011-08-25 Thread Duncan Temple Lang

Hi Simon

 I tried this on OS X, Linux and Windows and it works without any problem.
So there must be some strange interaction with your configuration.
So below are some things to try in order to get more information about the 
problem.

It would be more informative to give us the explicit version information
about the packages, e.g. use sessionInfo().  Details are very important
in cases like this.

In addition the versions of the packages, it is also important to identify the
version of libxml via the  libxmlVersion() function.
(Mine is 2.07.03. Yours may still be in the 2.6.16 region. I can't recall the 
defaults on OS X 10.6.)

Are you doing this in a GUI or at the command-line? If the former, try the
latter, i.e. run the commands in a terminal and see if that changes anything,
e.g. if any characters are causing problems.

Since you are seeing some of the HTML document appear on the console, the 
problem is
in the implicit call to print when after the call to htmlTreeParse().
The problem is likely to be delayed if you assign the result of htmlTreeParse()
to a variable and do not induce this call to print().
Then you can explore the tree and see if it is corrupted in some way.

Furthermore, you might use htmlParse(). It returns the tree in a very different
form, but which can be manipulated with the same R functions, and also XPath 
queries.
I very rarely (i.e. never) use htmlTreeParse() anymore.

 D.



On 8/25/11 8:41 AM, Simon Kiss wrote:
 Dear colleagues,
 I'm trying to parse the html content from this webpage:
 http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011
 
 Using the following code
 library(RCurl)
 library(XML)
 myurl <- c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")
 
 .x <- getURL(myurl)
 htmlTreeParse(.x, asText=T)
 
 This prints approximately 15 lines of the output from the html document and 
 then mysteriously stops. The command line prompt does not reappear and force 
 quit is the only option. 
 I'm running R 2.13 on Mac os 10.6 and the latest versions of XML and RCURL 
 are installed.
 Yours, Simon Kiss
 


Re: [R] convert an xml object into a list on R 2.13

2011-08-16 Thread Duncan Temple Lang

Hi Samuel

 The xmlToList() function is still in the XML package. I suspect
you are making some simple mistake, such as not loading the XML package,
not having installed it, or not capitalizing the name of the package
correctly (you refer to the xml package rather than to its actual name, XML).
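For example, with a current version of the XML package this works (a minimal
sketch with an inline document):

```r
library(XML)

txt <- "<config><a>1</a><b><c>2</c></b></config>"
res <- xmlToList(xmlParse(txt, asText = TRUE))
str(res)
# res$a is "1"; res$b is itself a list with element c = "2"
```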

You haven't told us about your operating system or the output of sessionInfo().
We don't even know which version of the XML package you seem to be using.

  D.

On 8/16/11 8:52 AM, Samuel Le wrote:
 
 
 Hi,
 
 
 
 I am manipulating xml objects using the package xml. With the version 2.10.1 
 this package included the function xmlToList that was converting the xml into 
 a list straight away.
 
 This function seems to have gone when I moved to 2.13.0. Does someone have an 
 equivalent for it?
 
 
 
 Thanks,
 
 Samuel
 
 
 


Re: [R] Reading XML files masquerading as XL files

2011-08-10 Thread Duncan Temple Lang

Hi Dennis

 That those files are in a directory/folder suggests that they were extracted
from their zip (.xlsx) file. The following are the basic contents of a
.xlsx file:
1484  02-28-11 12:48   [Content_Types].xml
  733  02-28-11 12:48   _rels/.rels
  972  02-28-11 12:48   xl/_rels/workbook.xml.rels
  846  02-28-11 12:48   xl/workbook.xml
  940  02-28-11 12:48   xl/styles.xml
 1402  02-28-11 12:48   xl/worksheets/sheet2.xml
 7562  02-28-11 12:48   xl/theme/theme1.xml
 1888  02-28-11 12:48   xl/worksheets/sheet1.xml
  470  02-28-11 12:48   xl/sharedStrings.xml
  196  02-28-11 12:48   xl/calcChain.xml
21316  02-28-11 12:48   docProps/thumbnail.jpeg
  629  02-28-11 12:48   docProps/core.xml
  828  02-28-11 12:48   docProps/app.xml
If most of these are present, I would explore whether the sender could give 
them to you without
unzipping them or make sure that your software isn't automatically unzipping 
them for you.


Note that not all files in the .xlsx are sheets and the WorkSheet is the
basic entity that corresponds to a .csv file.

The xlsx package and my RExcelXML package will probably get you a fair bit
of the way in extracting the content, but they will probably need some
tinkering since they expect the different components to be in a zip archive.
There is also an office2010 package which seems to have an overlap with what is 
in
xlsx, and ROOXML, RWordXML and RExcelXML.

  D.



On 8/10/11 7:26 AM, Dennis Fisher wrote:
 R version 2.13.1
 OS X (or Windows)
 
 Colleagues,
 
 I received a number of files with a .xls extension.  These files open in XL 
 and, by all appearances, are XL files.  However, it appears to me that the 
 files are actually XML:
 
 readLines(dir()[16])[1:10]
  [1] "<?xml version=\"1.0\"?>"
  [2] "<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\""
  [3] " xmlns:o=\"urn:schemas-microsoft-com:office:office\""
  [4] " xmlns:x=\"urn:schemas-microsoft-com:office:excel\""
  [5] " xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\""
  [6] " xmlns:html=\"http://www.w3.org/TR/REC-html40\">"
  [7] " <DocumentProperties xmlns=\"urn:schemas-microsoft-com:office:office\">"
  [8] "  <Version>12.0</Version>"
  [9] " </DocumentProperties>"
 [10] " <OfficeDocumentSettings xmlns=\"urn:schemas-microsoft-com:office:office\">"
 
  I had initially tried to read the files using read.xls (gdata) but that 
 failed (not surprisingly).  I could open each Excel file, then save as csv, 
 then use read.csv.  However, there are many files so I would love to have a 
 solution that does not require this brute force approach.
 
 Are there any packages that would allow me to read these files without the 
 additional steps?
 
 Dennis
 
 
 Dennis Fisher MD
 P  (The P Less Than Company)
 Phone: 1-866-PLessThan (1-866-753-7784)
 Fax: 1-866-PLessThan (1-866-753-7784)
 www.PLessThan.com
 


Re: [R] SSOAP chemspider

2011-08-08 Thread Duncan Temple Lang

Hi Paul

 I've been gradually filling in the XMLSchema packages for different cases that 
arise.
My development versions of SSOAP and XMLSchema get a long way further and I 
have been trying
to find time to finish them off.  Fortunately, it is on my todo list for the 
next few weeks.
I have released new (source) versions of the packages (XMLSchema 0.2-0 and 
SSOAP 0.6-0)
on the Omegahat repository.

These succeed in the genSOAPClientInterface(, processWSDL( url )) for each of 
the 3 WSDLs
in your email

Also, there are numerous WSDLs in the source of the package and also mentioned 
in the
Todo.xml file and the code works for almost all of those.

 Thanks for the report

   D.




On 8/2/11 9:10 AM, Benton, Paul wrote:
 Has anyone got SSOAP working on anything besides KEGG?
 
 I just tried another 3 SOAP servers. Both the WSDL and constructing the .SOAP 
 call. Again the perl and ruby interface worked without any hitches.
 
 Paul
 
 library(SSOAP)
 massBank <- processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
 Error in parse(text = paste(txt, collapse = "\n")) : 
   <text>:1:29: unexpected input
 1: function(x, ..., obj = new( ‚
                              ^
 In addition: Warning message:
 In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") :
   Ignoring additional serviceport ... elements

 
 metlin <- processWSDL("http://metlin.scripps.edu/soap/metlin.wsdl")
 Error in parse(text = paste(txt, collapse = "\n")) : 
   <text>:1:29: unexpected input
 1: function(x, ..., obj = new( ‚
                              ^
 pubchem <- processWSDL("http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl")
 Error in parse(text = paste(txt, collapse = "\n")) : 
   <text>:1:29: unexpected input
 1: function(x, ..., obj = new( ‚
                              ^
 
 
 
 On 20 Jul 2011, at 01:54, Benton, Paul wrote:
 
 Dear all,

 I've been trying on and off for the past few months to get SSOAP to work 
 with chemspider. First I tried the WSDL file:

 cs <- processWSDL("http://www.chemspider.com/MassSpecAPI.asmx?WSDL")
 Error in parse(text = paste(txt, collapse = "\n")) : 
  <text>:1:29: unexpected input
 1: function(x, ..., obj = new( ‚
                             ^
 In addition: Warning message:
 In processWSDL("http://www.chemspider.com/MassSpecAPI.asmx?WSDL") :
  Ignoring additional serviceport ... elements

 Next I've tried using just the pure .SOAP to call the database. 

 s <- SOAPServer("http://www.chemspider.com/MassSpecAPI.asmx")
 csid <- .SOAP(s, "SearchByMass2", mass = 89.04767, range = 0.01,
    action = I("http://www.chemspider.com/SearchByMass2"),
    xmlns = c("http://www.chemspider.com"), .opts = list(verbose = TRUE))

 This seems to work and gives back a result. However, this result isn't the 
 right result: it seems to have converted the mass into 0. When I run the 
 similar program in perl I get the correct ids, so this isn't a server-side 
 problem but SSOAP. Any thoughts or suggestions on other packages to use?
 Further information about the SearchByMass2 method and the XML it is 
 expecting:
 http://www.chemspider.com/MassSpecAPI.asmx?op=SearchByMass2

 Cheers,


 Paul


 PS Placing a fake error in the .SOAP code I can look at the xml it's sending 
 to the server:
 Browse[1]> doc
 <?xml version="1.0"?>
 <SOAP-ENV:Envelope 
  xmlns:SOAP-ENC="http://schemas.xmlsoap.org/soap/encoding/" 
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
  xmlns:xsd="http://www.w3.org/2001/XMLSchema" 
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ns:SearchByMass2 xmlns:ns="http://www.chemspider.com">
      <ns:mass>89.04767</ns:mass>
      <ns:range>0.01</ns:range>
    </ns:SearchByMass2>
  </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>
 


Re: [R] reading data from password protected url

2011-06-25 Thread Duncan Temple Lang

Hi Steve

 RCurl can help you when you need to have more control over Web requests.
The details vary from Web site to Web site, as do the different ways to specify
passwords, etc.

If the JSESSIONID and NCES_JSESSIONID are regular cookies and returned in the 
first request as cookies, then you can just have RCurl handle the cookies.
The basics for your case are:

  library(RCurl)
  h = getCurlHandle(cookiefile = "")

Then make your Web request using getURLContent(), getForm() or postForm()
but making certain to pass the curl handle  stored in h in each call, e.g.

  ans = getForm(yourURL, login = "bob", password = "jane", curl = h)

  txt = getURLContent(dataURL, curl = h)


If JSESSIONID and NCES_JSESSIONID are not returned as cookies but HTTP header 
fields, then you
need to process the header.
Something like

  rdr = dynCurlReader(h)

  ans = getForm(yourURL, login = "bob", password = "jane", curl = h,
                header = rdr$update)

Then the header  from the HTTP response is available as
  rdr$header()

and you can use parseHTTPHeader(rdr$header()) to convert it into a named vector.


 HTH,
D.

On 6/24/11 2:12 PM, Steven R Corsi wrote:
 I am trying to retrieve data from a password protected database. I have login 
 information and the proper url. When I
 make a request to the url, I get back some info, but need to read the hidden 
 header information that has JSESSIONID
 and NCES_JSESSIONID. They need to be used to set cookies before sending off 
 the actual url request that will result in
 the data transfer. Any help would be much appreciated.
 Thanks
 Steve
 


Re: [R] read.csv fails to read a CSV file from google docs

2011-04-29 Thread Duncan Temple Lang

Thanks David for fixing the early issues.

The reason for the failure is that the response
from the Web server is a redirect to another page, specifically

 
https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv

Note that this is https, not http, and the built-in URL reading facilities in R 
don't support https.


One way to see this is to use look at the headers in your browser (e.g. Live 
HTTP Headers),
or to use curl, or the RCurl package

tt = getForm("http://spreadsheets0.google.com/spreadsheet/pub",
  hl = "en", key = "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE",
  single = "true", gid = "0",
  output = "csv",
 .opts = list(followlocation = TRUE, verbose = TRUE))


The verbose option shows the entire dialog, and tt contains the
text of the CSV document.

 read.csv(textConnection(tt))

then yields the data frame

  D.


On 4/29/11 10:36 AM, David Winsemius wrote:
 
 On Apr 29, 2011, at 11:19 AM, Tal Galili wrote:
 
 Hello all,
 I wish to use read.csv to read a google doc spreadsheet.

 I try using the following code:

 data_url <- 
 "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"

 
 read.csv(data_url)

 Which results in the following error:

 Error in file(file, rt) : cannot open the connection


 I'm on windows 7.  And the code was tried on R 2.12 and 2.13

 I remember trying this a few months ago and it worked fine.
 
 I am always amused at such claims. Occasionally they are correct, but more 
 often a crucial step has been omitted. In
 this case you have at a minimum embedded line-feeds in your URL string and 
 have not established a connection, so it
 could not possibly have succeeded as presented.
 
 But now it's time to admit I do not know why it is not succeeding when I 
 correct those flaws.
 
 closeAllConnections()
 data_url <-
 url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv")
 
 read.csv(data_url)
 Error in open.connection(file, rt) : cannot open the connection
 
 closeAllConnections()
 dd <- read.csv(con <- 
 url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"))
 
 Error in open.connection(file, rt) : cannot open the connection
 
 
 So, I guess I'm not reading the help pages for `url` and `read.csv` as well as 
 I thought I was.
 
 
 Any suggestion what might be causing this or how to solve it?
 




Re: [R] read.csv fails to read a CSV file from google docs

2011-04-29 Thread Duncan Temple Lang
Hi Tal

You can add

  ssl.verifypeer = FALSE

in the .opts list so that the certificate is simply accepted.

Alternatively, you can tell libcurl where to find the certification
authority file containing signatures. This can be done via the cainfo
option, e.g.

   cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"),

Often such a collection of certificates is installed with the ssl library.

  D.
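
Concretely, either option can be passed to an RCurl call for the https URL
in question. A minimal sketch (the URL here is a placeholder, not the Google
endpoint from the thread):

```r
library(RCurl)

u <- "https://example.com/data.csv"  # placeholder https URL

# Option 1: skip peer verification and accept any certificate
txt <- getURL(u, ssl.verifypeer = FALSE)

# Option 2: point libcurl at the CA bundle shipped with RCurl
txt <- getURL(u, cainfo = system.file("CurlSSL", "cacert.pem",
                                      package = "RCurl"))
```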

On 4/29/11 2:42 PM, Tal Galili wrote:
 Hello Duncan,
 Thank you for having a look at this.
 
 I tried the code you provided but it failed in the getForm stage.  running 
 this:
 
 tt = getForm("http://spreadsheets0.google.com/spreadsheet/pub",
 +  hl = "en", key = 
 "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE",
 +  single = "true", gid = "0",
 +  output = "csv",
 + .opts = list(followlocation = TRUE, verbose = TRUE))
 
 Resulted in the following error:
 
 Error in curlPerform(url = url, headerfunction = header$update, curl = 
 curl,  : 
   SSL certificate problem, verify that the CA cert is OK. Details:
 error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate 
 verify failed
 
 
 Did I miss some step?
 
 
 
 
 
 Contact 
 Details:---
 Contact me: tal.gal...@gmail.com |  
 972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | 
 www.r-statistics.com (English)
 --
 
 
 
 
 On Fri, Apr 29, 2011 at 9:18 PM, Duncan Temple Lang dun...@wald.ucdavis.edu 
 wrote:
 
 
 Thanks David for fixing the early issues.
 
 The reason for the failure is that the response
 from the Web server is a to redirect the requester
 to another page, specifically
 
  
 https://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv
 
 Note that this is https, not http, and the built-in URL reading 
 facilities in R don't support https.
 
 
 One way to see this is to use look at the headers in your browser (e.g. 
 Live HTTP Headers),
 or to use curl, or the RCurl package
 
 tt = getForm("http://spreadsheets0.google.com/spreadsheet/pub",
  hl = "en", key = 
 "0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE",
  single = "true", gid = "0",
  output = "csv",
 .opts = list(followlocation = TRUE, verbose = TRUE))
 
 
 The verbose option shows the entire dialog, and tt contains the
 text of the CSV document.
 
  read.csv(textConnection(tt))
 
 then yields the data frame
 
  D.
 
 
 On 4/29/11 10:36 AM, David Winsemius wrote:
 
  On Apr 29, 2011, at 11:19 AM, Tal Galili wrote:
 
  Hello all,
  I wish to use read.csv to read a google doc spreadsheet.
 
  I try using the following code:
 
  data_url <- 
  "http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"
 
  
  read.csv(data_url)
 
  Which results in the following error:
 
  Error in file(file, rt) : cannot open the connection
 
 
  I'm on windows 7.  And the code was tried on R 2.12 and 2.13
 
  I remember trying this a few months ago and it worked fine.
 
  I am always amused at such claims. Occasionally they are correct, but 
 more often a crucial step has been omitted. In
  this case you have at a minimum embedded line-feeds in your URL string 
 and have not established a connection, so it
  could not possibly have succeeded as presented.
 
  But now it's time to admit I do not know why it is not succeeding when 
 I correct those flaws.
 
  closeAllConnections()
  data_url <-
  url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv")
 
  read.csv(data_url)
  Error in open.connection(file, rt) : cannot open the connection
 
  closeAllConnections()
  dd <- read.csv(con <-
  url("http://spreadsheets0.google.com/spreadsheet/pub?hl=en&hl=en&key=0AgMhDTVek_sDdGI2YzY2R1ZESDlmZS1VYUxvblQ0REE&single=true&gid=0&output=csv"))

Re: [R] RCurl and postForm()

2011-04-29 Thread Duncan Temple Lang

Hi Ryan

 postForm() is using a different style (or specifically Content-Type) of 
submitting the form than the curl -d command.
Switching the style = 'POST' uses the same type, but at a quick guess, the 
parameter name 'a' is causing confusion
and the result is the empty JSON array - [].

A quick workaround is to use curlPerform() directly rather than postForm()

 r = dynCurlReader()
 curlPerform(postfields = 'Archbishop Huxley', url = 
'http://www.datasciencetoolkit.org/text2people', verbose = TRUE,
  post = 1L, writefunction = r$update)
 r$value()

This yields

[1]
"[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":0,\"end_index\":17,\"matched_string\":\"Archbishop
Huxley\"}]"

and you can use fromJSON() to transform it into data in R.

  D.
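
To complete the picture, fromJSON() from the RJSONIO package can be applied
to the reader's value. A sketch building on the call above (the field names
reflect the service's JSON response shown earlier in the thread):

```r
library(RCurl)
library(RJSONIO)

# Perform the POST request and capture the response body
r = dynCurlReader()
curlPerform(postfields = 'Archbishop Huxley',
            url = 'http://www.datasciencetoolkit.org/text2people',
            post = 1L, writefunction = r$update)

# Parse the JSON array of match records into an R list
matches = fromJSON(r$value())
matches[[1]][["surnames"]]
```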

On 4/29/11 12:14 PM, Elmore, Ryan wrote:
 Hi everybody,
 
 I think that I am missing something fundamental in how strings are passed 
 from a postForm() call in R to the curl or libcurl functions underneath.  For 
 example, I can do the following using curl from the command line:
 
 $ curl -d "Archbishop Huxley" "http://www.datasciencetoolkit.org/text2people"
 [{"gender":"u","first_name":"","title":"archbishop","surnames":"Huxley","start_index":0,"end_index":17,"matched_string":"Archbishop
  Huxley"}]
 
 Trying the same thing, or what I *think* is the same thing (obvious not) in R 
 (Mac OS 10.6.7, R 2.13.0) produces:
 
 library(RCurl)
 Loading required package: bitops
 api <- "http://www.datasciencetoolkit.org/text2people"
 postForm(api, a = "Archbishop Huxley")
 [1] 
 "[{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":44,\"end_index\":61,\"matched_string\":\"Archbishop
  
 Huxley\"},{\"gender\":\"u\",\"first_name\":\"\",\"title\":\"archbishop\",\"surnames\":\"Huxley\",\"start_index\":88,\"end_index\":105,\"matched_string\":\"Archbishop
  Huxley\"}]"
 attr(,"Content-Type")
 charset
 text/html utf-8
 
 I can match the result given on the DSTK API's website by using system(), but 
 doesn't seem like the R-like way of doing something.
 
 system("curl -d 'Archbishop Huxley' 
 'http://www.datasciencetoolkit.org/text2people'")
 158   141  141   141
 0[{"gender":"u","first_name":"","title":"archbishop","surnames":"Huxley","start_index":0,"end_index":17,"matched_string":"Archbishop
  Huxley"}]17599 72 --:--:-- --:--:-- --:--:--   670
 
 If you want to see some additional information related to this question, I 
 posted on StackOverflow a few days ago:
 http://stackoverflow.com/questions/5797688/post-request-using-rcurl
 
 I am working on this R wrapper for the data science toolkit as a way of 
 illustrating how to make an R package for the Denver RUG and ran into this 
 problem.  Any help to this problem will be greatly appreciated by the Denver 
 RUG!
 
 Cheers,
 Ryan
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Treatment of xml-stylesheet processing instructions in XML module

2011-04-06 Thread Duncan Temple Lang
Hi Adam

To use XPath and getNodeSet on an XML document,
you will want to use xmlParse() and not xmlTreeParse()
to parse the XML content. So

t = xmlParse(I(a)) # or asText = TRUE
elem = getNodeSet(t, "/rss/channel/item")[[1]]

works fine.

You don't need to specify the root node, but rather the document
in getNodeSet.

Also, if you have the package loaded, you don't need the XML::
prefix before the function  names.

  HTH
D.

On 4/6/11 11:32 AM, Adam Cooper wrote:
 Hello again,
 Another stumble here that is defeating me.
 
 I try:
 a <- readLines(url("http://feeds.feedburner.com/grokin"))
 t <- XML::xmlTreeParse(a, ignoreBlanks=TRUE, replaceEntities=FALSE,
 asText=TRUE)
 elem <- XML::getNodeSet(XML::xmlRoot(t), "/rss/channel/item")[[1]]
 
 And I get:
 Start tag expected, '<' not found
 Error: 1: Start tag expected, '<' not found
 
 When I modify the second line in a to remove the following (just
 leaving the rss tag with its attributes), I do not get the error.
 I removed:
 <?xml-stylesheet type="text/xsl" media="screen" href=
 "/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css" media=
 "screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css
 "?>
 
 I would have expected the PI to be totally ignored by default.
 Have I missed something??
 
 Thanks in advance...
 
 Cheers, Adam
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Package XML: Parse Garmin *.tcx file problems

2011-03-30 Thread Duncan Temple Lang
Hi Michael

 Almost certainly, the problem is that the document has a default namespace.
You need to identify the namespace in the XPath query.
xpathApply() endeavors to make this simple:

  xpathApply(doc2, "//x:TotalTimeSeconds", xmlValue, namespaces = "x")

I suspect that will give you back something more meaningful.

 The x in the query (x:TotalTimeSeconds) is mapped to x = URI
in namespaces and since the URI is not specified, we use the default
namespace on the root node of the document.  Some documents
don't have a default namespace, and then you can use the prefix on the root node
corresponding to the namespace of interest.

  D
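
The effect of the default namespace can be seen with a small self-contained
document. A sketch using a made-up namespace URI, not the actual Garmin
schema:

```r
library(XML)

txt = '<Root xmlns="http://example.org/ns">
         <Lap><TotalTimeSeconds>123.4</TotalTimeSeconds></Lap>
       </Root>'
doc = xmlParse(txt, asText = TRUE)

# An unqualified query matches nothing, because every element
# lives in the default namespace
xpathApply(doc, "//TotalTimeSeconds", xmlValue)

# Mapping a prefix to the document's default namespace finds the node
xpathApply(doc, "//x:TotalTimeSeconds", xmlValue, namespaces = "x")
```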



On 3/30/11 1:15 PM, Folkes, Michael wrote:
 I'm struggling with package XML to parse a Garmin file (named *.tcx).
 I wonder if its form is incomplete, but am appreciably reluctant to paste
 even a shortened version.
 The output below shows I can get nodes, but an attempt at the value of a
 single node comes up empty (even though there is data there).
 
 One question: Has anybody succeeded parsing Garmin .tcx (xml) files?
 Thanks!
 Michael
 ___
 
 doc2 = xmlRoot(xmlTreeParse("HR.reduced3.tcx", useInternalNodes = TRUE))
 xpathApply(doc2, "//*", xmlName)
 [[1]]
 [1] TrainingCenterDatabase
 
 [[2]]
 [1] Activities
 
 [[3]]
 [1] Activity
 
 [[4]]
 [1] Id
 
 [[5]]
 [1] Lap
 
 [[6]]
 [1] TotalTimeSeconds
 
 
 xpathApply(doc2, "//TotalTimeSeconds", xmlValue)
 list()

 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Scrap java scripts and styles from an html document

2011-03-29 Thread Duncan Temple Lang


On 3/28/11 11:38 PM, antujsrv wrote:
 Hi,
 
 I am working on developing a web crawler in R and I needed some help with
 regard to removal of javascripts and style sheets from the html document of
 a web page.
 
 I tried using the XML package, hence the function xpathApply
 library(XML)
 txt =
 xpathApply(html, "//body//text()[not(ancestor::script)][not(ancestor::style)]",
 xmlValue)
 
 The output comes out as text lines, without any html tags. I want the html
 tags to remain intact and scrap only the javascript and styles from it. 

Well then you would be best served to use that approach, i.e.
find the nodes named script and style and then remove them from
the tree. Then you have the document as a single object
rather than a bunch of individual elements.

So

 nodes = xpathApply(html, "//body//script | //body//style")
 removeNodes(nodes)

 saveXML(html)


But you don't say what you want to end up with or what you are doing with
the resulting content or why you have to remove the JavaScript content, etc.

  D.

 
 Any help would be highly appreciated.
 Thanks in advance.
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Scrap-java-scripts-and-styles-from-an-html-document-tp3413894p3413894.html
 Sent from the R help mailing list archive at Nabble.com.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RCurl HTTP Post ?

2011-02-19 Thread Duncan Temple Lang


On 2/17/11 3:54 PM, Hasan Diwan wrote:
 According to [1] and [2], using RCurl to post a form with basic
 authentication is done using the postForm method. I'm trying to post
 generated interpolation data from R onto an HTTP form. The call I'm using is
 page <- postForm('http://our.server.com/dbInt/new', .opts =
 curlOptions(userpwd = "test:test", verbose = T), profileid = "-1",
 value = "1.801", type = "history"). The page instance shows the HTTP response
 500 screen and I get a nullpointerexception in the server logs. 

Do you mean that the R variable page gives information about the request error
and contains the 500 error code? Not sure what you mean by screen here.

Client-server interactions are hard to debug as the problems can be on either
side or in the communication.  The error can be in your request, in RCurl,
on the server side receiving the request or in the script processing the request
on the server.
So it is imperative to try to get diagnostic information.

You used verbose = T  (TRUE).  What did that display?

postForm() has a style parameter. It controls how the POST request is
submitted, either application/x-www-form-urlencoded or multipart/form-data.
Your server script might be expecting the data in a different format
than is being sent. postForm() defaults to the www-form-urlencoded.

But we will need more information to help you if these are not the
cause of the problem.

  D.
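
To experiment with the two content types, the style argument can be set
explicitly on otherwise identical calls. A sketch; the URL and credentials
are the placeholders from the original post, not a working endpoint:

```r
library(RCurl)

u <- "http://our.server.com/dbInt/new"   # placeholder from the post

# application/x-www-form-urlencoded (the default style)
postForm(u, profileid = "-1", value = "1.801", type = "history",
         style = "POST",
         .opts = list(userpwd = "test:test", verbose = TRUE))

# multipart/form-data
postForm(u, profileid = "-1", value = "1.801", type = "history",
         style = "HTTPPOST",
         .opts = list(userpwd = "test:test", verbose = TRUE))
```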

 The line it
 points to is dealing with getting an integer out of profileid. Help?
 Many thanks in advance...

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Using open calais in R

2011-01-25 Thread Duncan Temple Lang
fayazvf wrote:
 
 I am using calais api in R for text analysis.
 But im facing a some problem when fetching the rdf from the server.
 I'm using the getToHost() method for the api call but i get just a null
 string.

You haven't told us nearly enough for us to be able to reproduce
what you are doing.  Where and how is the R function getToHost() defined?
Is it in an R package?

 The same url in browser returns an RDF document. 
 
 getToHost("www.api.opencalais.com", "/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")
 [1] 
 "http://api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML="

Yes, and 
  library(RCurl)
  
getURLContent("http://api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=")
returns  RDF content
as does
 
download.file("http://api.opencalais.com/enlighten/rest/?licenseID=dkzdggsre232ur97c6be269g&content=Home&paramsXML=",
"eg.txt")

But since we have no way of knowing what getToHost() does (or the postToHost() 
in your earlier mail),
we cannot figure out what is happening for you.

Please do read the posting guidelines, specifically telling us about your 
session and
what packages you are using.

  D.


 -- 
 View this message in context: 
 http://r.789695.n4.nabble.com/Using-open-calais-in-R-tp3235597p3235597.html
 Sent from the R help mailing list archive at Nabble.com.
 

-- 
There are men who can think no deeper than a fact - Voltaire


Duncan Temple Langdun...@wald.ucdavis.edu
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Accessing data via url

2011-01-08 Thread Duncan Temple Lang
Just for the record, you don't need to manually find the
URL to which your are being redirected by using the followlocation
option in any of the RCurl functions:


 tt = 
getURLContent("https://sites.google.com/site/jrkrideau/home/general-stores/duplicates.csv",
  followlocation = TRUE)


(Same with getBinaryURL, but the file is not binary so no need to ask it to 
return
the file as binary. getURLContent() figures out the right thing to do.)

  D.

On 1/7/11 11:08 AM, David Winsemius wrote:
 I don't know how Henrique did it, but in Firefox one can go to the Downloads 
 panel and right click on the downloaded
 file and choose Copy Download link (or something similar) and get:
 
 https://6326258883408400442-a-1802744773732722657-s-sites.googlegroups.com/site/jrkrideau/home/general-stores/duplicates.csv?attachauth=ANoY7cpNemjCFz14tAP3IPYCsAnvo-JJbgPNnPEWN_evBHG2jEYaNFOIT6GZF4M3VuKzioPZwvX7QSvMDWfJ3pHac5JK5BHyflOGBLOo_v44C0oU2V6teTwnjeg4TFufeltT-i5T3ThkuyesCztr6g2yLl65YcckwlEGEDtS-L9yzVe1B6tFEu2n6sjAOV9EHokEFx8e-HDFyf-u5mVIGMPgCHvaQL8pupVz-1p1rEdPpS0f6pqApTc%3D&attredirects=0
 


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] toJSON question

2010-12-11 Thread Duncan Temple Lang


On 12/11/10 8:00 AM, Santosh Srinivas wrote:
 Hello,
 
 I am trying to use RJSONIO
 
 I have:
 x - c(0,4,8,9)
 y - c(3,8,5,13)
 z - cbind(x,y)
 
 Any idea how to convert z into the JSON format below?
 
 I want to get the following JSON output to put into a php file.
  [[0, 3], [4, 8], [8, 5], [9, 13]]

The toJSON() function is the basic mechanism.
In this case, z has names on the columns.
Remove these
  colnames(z) = NULL

Then toJSON(z) gives you want you want.

If you want to remove the new line (\n) characters,
use gsub().

  gsub("\\\n", "", toJSON(z))


  D.
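
Putting the steps together (a sketch assuming the RJSONIO package):

```r
library(RJSONIO)

x <- c(0, 4, 8, 9)
y <- c(3, 8, 5, 13)
z <- cbind(x, y)

# Drop the column names so they are not serialized into the JSON
colnames(z) <- NULL

# Serialize, then strip the newlines toJSON() inserts between rows
json <- gsub("\n", "", toJSON(z))
```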


 
 Thank you.
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Is there an implementation for URL Encoding (/format) in R?

2010-11-25 Thread Duncan Temple Lang


On 11/25/10 7:53 AM, Tal Galili wrote:
 Hello all,
 
 I would like some R function that can translate a string to a URL encoding
 (see here: http://www.w3schools.com/tags/ref_urlencode.asp)
 
 Is it implemented? (I wasn't able to find any reference to it)


I expect there are several implementations, spread across
different packages.  The function curlEscape() in RCurl is one.

 D.
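
For comparison, base R also provides URLencode() in the utils package; with
reserved = TRUE it percent-encodes reserved characters much as curlEscape()
does, though the two differ in exactly which characters they leave alone.
A sketch:

```r
library(RCurl)

s <- "a b&c=d"
curlEscape(s)                  # RCurl / libcurl escaping
URLencode(s, reserved = TRUE)  # base R equivalent
```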


 
 Thanks,
 Tal
 
 
 
 
 
 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)
 --
 
   [[alternative HTML version deleted]]
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RCurl and cookies in POST requests

2010-11-21 Thread Duncan Temple Lang
Hi Christian

 There is a new version of the RCurl package on the Omegahat repository
and that handles this case.
The issue was running the finalizer to garbage
collect the curl handle, and it was correctly not being released as
the dynCurlReader() update function was precious and had a reference
to the curl handle.

The new package has finer grained control in curlSetOpt() to control
whether functions are made 'precious' or not and we can use this
when we know we will leave the curl handle in a correct and consistent
state even when the values of the function options are to be garbage
collected before the curl handle.

  D.



On 11/15/10 7:06 AM, Christian M. wrote:
 Hello Duncan.
 
 Thanks for having a look at this. As soon as I get home I'll try
 your suggestion.
 
 BTW, the link to the omega-help mailing list seems to be broken:
 http://www.omegahat.org/mailman/listinfo/
 
 Thank you.
 
 chr
 
 
 Duncan Temple Lang (Monday 15 November 2010, 01:02):
 Hi Christian

  Thanks for finding this. The problem seems to be that the finalizer
 on the curl handle seems to disappear and so is not being called
 when the handle is garbage collected.  So there is a bug somewhere
 and I'll try to hunt it down quickly.

   In the meantime, you can achieve the same effect by calling the
 C routine curl_easy_cleanup.  You can't do this directly with a
 .Call() or .C() as there is no explicit interface in the RCurl
 package to this routine. However, you can use the Rffi package
 (on the omegahat  repository)

  library(Rffi)
  cif = CIF(voidType, list(pointerType))
  callCIF(cif, "curl_easy_cleanup", c...@ref)

  I'll keep looking for why the finalizer is getting discarded.

  Thanks again,

  D.

 On 11/14/10 6:30 AM, Christian M. wrote:
 Hello.

 I know that it's usually possible to write cookies to a cookie
 file by removing the curl handle and doing a gc() call. I can do
 this with getURL(), but I just can't obtain the same results with
 postForm().

 If I use:

 curlHandle - getCurlHandle(cookiefile=FILE, cookiejar=FILE)

 and then do:

 getURL("http://example.com/script.cgi", curl=curlHandle)
 rm(curlHandle)
 gc()

 it's OK, the cookie is there. But, if I do (same handle; the
 parameter is a dummy):

 postForm(site, .params=list(par=cookie), curl=curlHandle,
   style="POST")
 rm(curlHandle)
 gc()

 no cookie is written.

 Probably I'm doing something wrong, but don't know what.

 Is it possible to store cookies read from the output of a
 postForm() call? How?

 Thanks.

 Christian

 PS.: I'm attaching a script that can be sourced (and its .txt
 version). It contains an example. The expected result is a file
 (cookies.txt) with two cookies. The script currently uses
 getURL() and two cookies are stored. If postForm() is used
 (currently commented), only 1 cookie is written.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RCurl and cookies in POST requests

2010-11-14 Thread Duncan Temple Lang
Hi Christian

 Thanks for finding this. The problem seems to be that the finalizer
on the curl handle seems to disappear and so is not being called
when the handle is garbage collected.  So there is a bug somewhere
and I'll try to hunt it down quickly.

  In the meantime, you can achieve the same effect by calling the
C routine curl_easy_cleanup.  You can't do this directly with a
.Call() or .C() as there is no explicit interface in the RCurl
package to this routine. However, you can use the Rffi package
(on the omegahat  repository)

 library(Rffi)
 cif = CIF(voidType, list(pointerType))
 callCIF(cif, "curl_easy_cleanup", c...@ref)

 I'll keep looking for why the finalizer is getting discarded.

 Thanks again,

 D.

On 11/14/10 6:30 AM, Christian M. wrote:
 Hello.
 
 I know that it's usually possible to write cookies to a cookie
 file by removing the curl handle and doing a gc() call. I can do
 this with getURL(), but I just can't obtain the same results with
 postForm().
 
 If I use:
 
 curlHandle - getCurlHandle(cookiefile=FILE, cookiejar=FILE)
 
 and then do:
 
  getURL("http://example.com/script.cgi", curl=curlHandle)
 rm(curlHandle)
 gc()
 
 it's OK, the cookie is there. But, if I do (same handle; the
 parameter is a dummy):
 
 postForm(site, .params=list(par=cookie), curl=curlHandle,
    style="POST")
 rm(curlHandle)
 gc()
 
 no cookie is written.
 
 Probably I'm doing something wrong, but don't know what.
 
 Is it possible to store cookies read from the output of a
 postForm() call? How?
 
 Thanks.
 
 Christian
 
 PS.: I'm attaching a script that can be sourced (and its .txt
 version). It contains an example. The expected result is a file
 (cookies.txt) with two cookies. The script currently uses
 getURL() and two cookies are stored. If postForm() is used
 (currently commented), only 1 cookie is written.
 
 
 
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RGoogleDocs stopped working

2010-11-10 Thread Duncan Temple Lang

Hi Harlan

 I just tried to connect to Google Docs and I had ostensibly the same problem.
However, the password was actually different from what I had specified.
After resetting it with GoogleDocs, the getGoogleDocsConnection() worked
fine. So I don't doubt that the login and password are correct, but
you might just try it again to ensure there are no typos.
The other thing to look at is the values for Email and Passwd
sent in the URL, i.e. the string in url in your debugging
below. (Thanks for that by the way). If either has special characters,
e.g. $, it is imperative that they are escaped correctly, i.e. converted
to %24.  This should happen and nothing should have changed, but it is
worth verifying.

 So things still seem to work for me. It is a data point, but not one
that gives you much of a clue as to what is wrong on your machine.

  D.
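
The escaping itself can be checked directly with curlEscape() from RCurl
(a sketch with a made-up password, not a real credential):

```r
library(RCurl)

# A password containing $ or & must reach the server percent-encoded,
# e.g. $ becomes %24 and & becomes %26
curlEscape("my$ecret&pw")
```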

On 11/10/10 7:36 AM, Harlan Harris wrote:
 Hello,
 
 Some code using RGoogleDocs, which had been working smoothly since the
 summer, just stopped working. I know that it worked on November 3rd, but it
 doesn't work today. I've confirmed that the login and password still work
 when I log in manually. I've confirmed that the URL gives the same error
 when I paste it into Firefox. I don't know enough about this web service to
 figure out the problem myself, alas...
 
 Here's the error and other info (login/password omitted):
 
 ss.con - getGoogleDocsConnection(login=gd.login, password=gd.password,
 service='wise', error=FALSE)
 Error: Forbidden
 
 Enter a frame number, or 0 to exit
 
 1: getGoogleDocsConnection(login = gd.login, password = gd.password, service
 = wise, error = FALSE)
 2: getGoogleAuth(..., error = error)
 3: getForm(https://www.google.com/accounts/ClientLogin;, accountType =
 HOSTED_OR_GOOGLE, Email = login, Passw
 4: getURLContent(uri, .opts = .opts, .encoding = .encoding, binary = binary,
 curl = curl)
 5: stop.if.HTTP.error(http.header)
 
 Selection: 4
 Called from: eval(expr, envir, enclos)
 Browse[1] http.header
Content-Type
 Cache-control  Pragma
text/plainno-cache,
 no-store  no-cache
 ExpiresDate
 X-Content-Type-Options
 Mon, 01-Jan-1990 00:00:00 GMT Wed, 10 Nov 2010 15:24:39
 GMT   nosniff
X-XSS-Protection
 Content-Length  Server
 1; mode=block
 24   GSE
  status   statusMessage
   403 Forbidden\r\n
 Browse[1] url
 [1] 
 "https://www.google.com/accounts/ClientLogin?accountType=HOSTED%5FOR%5FGOOGLE&Email=***&Passwd=***&service=wise&source=R%2DGoogleDocs%2D0%2E1"
 
 Browse[1] .opts
 $ssl.verifypeer
 [1] FALSE
 
 
 R.Version()
 $platform
 [1] i386-apple-darwin9.8.0
 
 $arch
 [1] i386
 
 $os
 [1] darwin9.8.0
 
 $system
 [1] i386, darwin9.8.0
 
 $status
 [1] 
 
 $major
 [1] 2
 
 $minor
 [1] 10.1
 
 $year
 [1] 2009
 
 $month
 [1] 12
 
 $day
 [1] 14
 
 $`svn rev`
 [1] 50720
 
 $language
 [1] R
 
 $version.string
 [1] R version 2.10.1 (2009-12-14)
 
 
 installed.packages()[c('RCurl', 'RGoogleDocs'), ]
 Package
 LibPath Version Priority Bundle
 Contains
 RCurl   RCurl
 /Users/hharris/Library/R/2.10/library 1.4-3 NA   NA
 NA
 RGoogleDocs RGoogleDocs
 /Library/Frameworks/R.framework/Resources/library 0.4-1 NA   NA
 NA
 Depends Imports LinkingTo Suggests
 Enhances OS_type License Built
 RCurl   R (= 2.7.0), methods, bitops NA  NARcompression
 NA   NA  BSD   2.10.1
 RGoogleDocs RCurl, XML, methods   NA  NANA
 NA   NA  BSD   2.10.1
 
 
 Any ideas? Thank you!
 
  -Harlan
 
   [[alternative HTML version deleted]]
 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] postForm() in RCurl and library RHTMLForms

2010-11-05 Thread Duncan Temple Lang


On 11/4/10 11:31 PM, sayan dasgupta wrote:
 Thanks a lot thats exactly what I was looking for
 
 Just a quick question: I agree the form gets submitted to the URL
 "http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp"
 
 and I am filling up the form in the page
 "http://www.nseindia.com/content/indices/ind_histvalues.htm"
 
 How do I submit the arguments like FromDate, ToDate, Symbol using postForm()
 and submit the query to get the similar table.
 

Well that is what the function that RHTMLForms creates does.
So you can look at that code and see that it calls formQuery()
which ends in a call to postForm(). You could use

   debug(postForm)

and examine the arguments to it.

postForm(...jsp, FromDate = 10-


The answer is

o = 
postForm("http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp",
  FromDate = "01-11-2010", ToDate = "04-11-2010",
  IndexType = "S&P CNX NIFTY", check = "new",
 style = "POST" )


 
 
 
 
 
 
 On Fri, Nov 5, 2010 at 6:43 AM, Duncan Temple Lang
 dun...@wald.ucdavis.edu wrote:
 


 On 11/4/10 2:39 AM, sayan dasgupta wrote:
 Hi RUsers,

 Suppose I want to see the data on the website
 url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"

 for the index S&P CNX NIFTY for
 dates FromDate=01-11-2010, ToDate=02-11-2010

 then read the html table from the page using readHTMLtable()

 I am using this code
 webpage <- postForm(url, .params=list(
FromDate="01-11-2010",
ToDate="02-11-2010",
IndexType="S&P CNX NIFTY",
Indicesdata="Get Details"),
  .opts=list(useragent = getOption("HTTPUserAgent")))

 But it doesn't give me desired result

 You need to be more specific about how it fails to give the desired result.

 You are in fact posting to the wrong URL. The form is submitted to a
 different
 URL -
 http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp




 Also I was trying to use the function getHTMLFormDescription from the
 package RHTMLForms but there we can't use the argument
 .opts=list(useragent = getOption(HTTPUserAgent)) which is needed for
 this
 particular website

 That's not the case. The function RHTMLForms will generate for you does
 support
 the .opts parameter.

 What you want is something along the lines:


  # Set default options for RCurl
  # requests
  options(RCurlOptions = list(useragent = "R"))
 library(RCurl)

  # Read the HTML page since we cannot use htmlParse() directly
  # as it does not specify the user agent or an
  # Accept:*.*

  url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
 wp = getURLContent(url)

  # Now that we have the page, parse it and use the RHTMLForms
  # package to create an R function that will act as an interface
  # to the form.
 library(RHTMLForms)
 library(XML)
 doc = htmlParse(wp, asText = TRUE)
  # need to set the URL for this document since we read it from
  # text, rather than from the URL directly

 docName(doc) = url

  # Create the form description and generate the R
  # function call the

 form = getHTMLFormDescription(doc)[[1]]
 fun = createFunction(form)


  # now we can invoke the form from R. We only need 2
  # inputs  - FromDate and ToDate

  o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")

  # Having looked at the tables, I think we want the the 3rd
  # one.
 table = readHTMLTable(htmlParse(o, asText = TRUE),
which = 3,
header = TRUE,
stringsAsFactors = FALSE)
 table




  Yes, it is marginally involved. But that is because we cannot simply read
  the HTML document directly with htmlParse(), because of the missing
  Accept (useragent) HTTP header.



 Thanks and Regards
 Sayan Dasgupta


 


Re: [R] RBloomberg on R-2.12.0

2010-11-05 Thread Duncan Temple Lang


On 11/5/10 5:20 AM, Tolga I Uzuner wrote:
 Dear R Users,
 
 Tried to install RBloomberg with R-2.12.0 and appears RDComclient has not 
 been built for this version of R, so failed. I then tried to get RBloombergs' 
 Java API version to work, but ran into problems with RJava which does not 
 appear to exist for Windows. My platform is Windows XP SP3.
 
 Will RDcomclient be built for R-2.12.0 anytime soon ?

It is on the Omegahat site. Just that the directories weren't linked to the 
appropriate place.
You can install it now.

 D.

 
 Does a version of RBloomberg with a Java API really exist? An obvious Google 
 search like Java api rbloomberg throws up a bunch of discussions but 
 somehow, I cannot locate a package ?
 
 Will RJava work on Windows ?
 
 Thanks in advance for any pointers.
 Regards,
 Tolga
 
 
 This email is confidential and subject to important disclaimers and
 conditions including on offers for the purchase or sale of
 securities, accuracy and completeness of information, viruses,
 confidentiality, legal privilege, and legal entity disclaimers,
 available at http://www.jpmorgan.com/pages/disclosures/email.  


Re: [R] postForm() in RCurl and library RHTMLForms

2010-11-04 Thread Duncan Temple Lang


On 11/4/10 2:39 AM, sayan dasgupta wrote:
 Hi RUsers,
 
 Suppose I want to see the data on the website
 url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
 
 for the index SP CNX NIFTY for
 dates FromDate=01-11-2010,ToDate=02-11-2010
 
 then read the html table from the page using readHTMLtable()
 
 I am using this code
 webpage <- postForm(url, .params = list(
FromDate = "01-11-2010",
ToDate = "02-11-2010",
IndexType = "S&P CNX NIFTY",
Indicesdata = "Get Details"),
  .opts = list(useragent = getOption("HTTPUserAgent")))
 
 But it doesn't give me desired result

You need to be more specific about how it fails to give the desired result.

You are in fact posting to the wrong URL. The form is submitted to a different
URL - http://www.nseindia.com/marketinfo/indices/histdata/historicalindices.jsp



 
 Also I was trying to use the function getHTMLFormDescription from the
 package RHTMLForms but there we can't use the argument
 .opts=list(useragent = getOption(HTTPUserAgent)) which is needed for this
 particular website

That's not the case. The function RHTMLForms will generate for you does support
the .opts parameter.

What you want is something along the lines:


 # Set default options for RCurl
 # requests
options(RCurlOptions = list(useragent = "R"))
library(RCurl)

 # Read the HTML page since we cannot use htmlParse() directly
 # as it does not specify the user agent or an
 # Accept:*.*

url <- "http://www.nseindia.com/content/indices/ind_histvalues.htm"
wp = getURLContent(url)

 # Now that we have the page, parse it and use the RHTMLForms
 # package to create an R function that will act as an interface
 # to the form.
library(RHTMLForms)
library(XML)
doc = htmlParse(wp, asText = TRUE)
  # need to set the URL for this document since we read it from
  # text, rather than from the URL directly

docName(doc) = url

  # Create the form description and generate the R
  # function call the

form = getHTMLFormDescription(doc)[[1]]
fun = createFunction(form)


  # now we can invoke the form from R. We only need 2
  # inputs  - FromDate and ToDate

o = fun(FromDate = "01-11-2010", ToDate = "04-11-2010")

  # Having looked at the tables, I think we want the the 3rd
  # one.
table = readHTMLTable(htmlParse(o, asText = TRUE),
which = 3,
header = TRUE,
stringsAsFactors = FALSE)
table




Yes, it is marginally involved. But that is because we cannot simply read
the HTML document directly with htmlParse(), because of the missing
Accept (useragent) HTTP header.

 
 
 Thanks and Regards
 Sayan Dasgupta
 


Re: [R] File Downloading Problem

2010-11-01 Thread Duncan Temple Lang

I got this working almost immediately with RCurl although with that
one has to specify any value for the useragent option, or the same error occurs.

The issue is that R does not add an Accept entry to the HTTP request header.
It should add something like
   Accept: *.*

Using RCurl,
 u = "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip"
 o = getURLContent(u, verbose = TRUE, useragent = getOption("HTTPUserAgent"))

succeeds (but not if there is no useragent).


We could fix R's download.file() to send Accept: *.*,
or allow general headers to be specified either as an option for
all requests, or as a parameter of download.file() (or both).
Or we could have the makeUserAgent() function in utils be more customizable
through options, or allow the R user specify the function herself.
But while this would be good, the HTTP facilities in R are not
intended to be as general something like libcurl (and hence RCurl).

Unless there is a compelling reason to enhance R's internal facilities,
I suggest people use something like libcurl.  This approach also has
the advantage of having the data directly in memory and avoiding writing
it to disk and then reading it back in, e.g.

  library(Rcompression)
  z = zipArchive(o)
  names(z)
  read.csv(textConnection(z[[1]]))
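
If Rcompression is not available, a minimal sketch using only base R gets the same effect; this assumes `o` is the raw vector returned by getURLContent() for the zip above, and the temp-file name is illustrative:

```r
# Write the downloaded bytes to a temporary zip file, then read the
# first entry back in with unz().
tmp <- tempfile(fileext = ".zip")
writeBin(o, tmp)
entry <- unzip(tmp, list = TRUE)$Name[1]   # first file in the archive
dat <- read.csv(unz(tmp, entry))
```

This does touch the disk, unlike the in-memory Rcompression approach, but avoids the extra dependency.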


  D.


On 11/1/10 8:27 AM, Santosh Srinivas wrote:
 It's strange and the internet connection is fine because I am able to get
 data from yahoo.
 This was working till just yesterday ... strange if the website is creating
 issues with public access of basic data!
 
 -Original Message-
 From: David Winsemius [mailto:dwinsem...@comcast.net] 
 Sent: 01 November 2010 20:48
 To: Duncan Murdoch
 Cc: Santosh Srinivas; 'Rhelp'
 Subject: Re: [R] File Downloading Problem
 
 
 On Nov 1, 2010, at 10:41 AM, Duncan Murdoch wrote:
 
 On 01/11/2010 10:37 AM, Santosh Srinivas wrote:
 Nope Duncan ... no changes .. the same old way without a proxy ...  
 actually
 the download.file is being returned 403 forbidden which is strange.

 These are just two lines that I am trying to run.

 sURL <- "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip"
 download.file(sURL, "test.zip")

 Put the same URL in a browser and it works fine.

 It doesn't work for me, so presumably there is some kind of security  
 setting at the site (a cookie?), which allows your browser, but  
 doesn't allow you to use R, or me to use anything.
 
 Firefox in a Mac platform will download and unzip the file with no  
 security complaints and no cookie appears to be set when downloading,  
 but that code will not access the file, nor will my efforts to wrap  
 the URL in url() or unz() so it seems more likely that Santosh and I  
 do not understand the file opening processes that R supports.
 
   con <- unz(description = "http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bhav.csv.zip",
  file = "~/cm01NOV2010bhav.csv")
   test.df <- read.csv(file = con)
 Error in open.connection(file, rt) : cannot open the connection
 In addition: Warning message:
 In open.connection(file, rt) :
cannot open zip file
 'http://www.nseindia.com/content/historical/EQUITIES/2010/NOV/cm01NOV2010bha
 v.csv.zip'
 
 




Re: [R] XML getNodeSet syntax for PUBMED XML export

2010-09-08 Thread Duncan Temple Lang

Hi Rob

  doc = xmlParse("url for document")

  dn = getNodeSet(doc, "//DescriptorName[@MajorTopicYN = 'Y']")

will do what you want, I believe.

XPath - a language for expressing such queries - is quite
simple and based on a few simple primitive concepts from which
one can create complex compound queries. The //DescriptorName
is a node test. The [] is a predicate that includes/discards
some of the resulting nodes.
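
Putting the two primitives together, a hedged sketch that keeps only the text content of the matching nodes (assuming `doc` was created with xmlParse() on the PubMed export):

```r
library(XML)
# Node test (//DescriptorName) plus predicate ([@MajorTopicYN = 'Y']),
# with xmlValue applied to each match to extract its text.
major <- xpathSApply(doc, "//DescriptorName[@MajorTopicYN = 'Y']", xmlValue)
```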

   D.

On 9/8/10 9:09 AM, Rob James wrote:
  I am looking for the syntax to capture XML tags marked with
 <DescriptorName MajorTopicYN="Y">, but the combination of the internal
 space (between Name and Major) and the embedded quote marks is
 defeating me. I can get all the DescriptorName tags, but these include
 both MajorTopicYN = Y and N variants. Any suggestions?
 
 Thanks in advance.
 
 Prototype text from PUBMED
 
 <MeshHeadingList>
 <MeshHeading>
 <DescriptorName MajorTopicYN="Y">Antibodies, Monoclonal</DescriptorName>
 </MeshHeading>
 <MeshHeading>
 <DescriptorName MajorTopicYN="N">Blood Platelets</DescriptorName>
 <QualifierName MajorTopicYN="N">immunology</QualifierName>
 <QualifierName MajorTopicYN="Y">physiology</QualifierName>
 <QualifierName MajorTopicYN="N">ultrastructure</QualifierName>
 </MeshHeading>
 </MeshHeadingList>
 
 
 


Re: [R] R program google search

2010-09-04 Thread Duncan Temple Lang
Hi there

One way to use Google's search service from R is

library(RCurl)
library(RJSONIO)  # or library(rjson)

val = getForm("http://ajax.googleapis.com/ajax/services/search/web",
              q = "Google search AJAX", v = "1.0")
results = fromJSON(val)

Google requests that you provide your GoogleAPI key

  val = getForm("http://ajax.googleapis.com/ajax/services/search/web",
                q = "Google search AJAX", v = "1.0",
                key = "my google api key")

Similarly, you should provide header information to identify your application, 
e.g

xx = getForm("http://ajax.googleapis.com/ajax/services/search/web",
             q = "Google search AJAX", v = "1.0",
             .opts = list(useragent = "RGoogleSearch", verbose = TRUE))
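
To get at the individual hits, the AJAX Search API returned them under `responseData$results` in the JSON; a sketch under that assumption (the field name `titleNoFormatting` is per that API's response format — verify against the live response):

```r
# Parse the JSON response and pull out the plain-text titles of the
# returned search results.
res    <- fromJSON(val)
hits   <- res$responseData$results
titles <- sapply(hits, `[[`, "titleNoFormatting")
```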




  D.

On 9/3/10 10:33 PM, Waverley @ Palo Alto wrote:
 My question is how to use R to program google search.
 I found this information:
 The SOAP Search API was created for developers and researchers
 interested in using Google Search as a resource in their
 applications.  Unfortunately google no longer supports that.  They
 are supporting the AJAX Search API.  What about R?
 
 Thanks.
 
 
 
 On Fri, Sep 3, 2010 at 2:23 PM, Waverley @ Palo Alto
 waverley.paloa...@gmail.com wrote:
 Hi,

 Can someone help as how to use R to program google search in the R
 code?  I know that other languages can allow or have the google search
 API

 If someone can give me some links or sample code I would greatly appreciate.

 Thanks.

 --
 Waverley @ Palo Alto

 
 




Re: [R] getNodeSet - what am I doing wrong?

2010-08-31 Thread Duncan Temple Lang
Johannes Graumann wrote:
 Thanks!
 but:
  library(XML)
  xmlDoc <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml")

You need to use xmlParse() or xmlTreeParse(url, useInternalNodes = TRUE)
(which are equivalent) in order to be able to use getNodeSet().

The error you are getting is because you are using xmlTreeParse()
and the result is a tree represented in R rather than internal
C-level data structures on which getNodeSet() can operate.

xmlParse() is faster than xmlTreeParse() 
and one can use XPath to query it.
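
For this particular document, which declares a default namespace, a combined sketch (the prefix binding follows the namespace discussion elsewhere in this thread):

```r
library(XML)
# Parse into the internal C-level representation, then query with
# XPath; "x" is bound to the document's default namespace.
doc  <- xmlParse("http://www.unimod.org/xml/unimod_tables.xml")
rows <- getNodeSet(doc, "//x:modifications_row", "x")
```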

  D.

  getNodeSet(xmlDoc, "//x:modifications_row", "x")
 Error in function (classes, fdef, mtable)  : 
   unable to find an inherited method for function "saveXML", for signature 
 "XMLDocument"
 
 ?
 
 Thanks, Joh
 
 
 Duncan Temple Lang wrote:
 
  
  Hi Johannes
  
   This is a common issue.  The document has a default XML namespace, e.g.
  the root node is defined as
  
    <unimod xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ... >
  
   So you need to specify which namespace to match in the XPath expression
  in getNodeSet().  The XML package  provides a convenient facility for
  this. You need only specify the prefix such as x and that will
  be bound to the default namespace. You need to specify this in
  two places - where you use it in the XPath expression and
  in the namespaces argument of getNodeSet()
  
  So
  getNodeSet(test, "//x:modifications_row", "x")
  
  gives you probably what you want.
  
   D.
  
  
  
  On 8/30/10 8:02 AM, Johannes Graumann wrote:
   library(XML)
   test <- xmlTreeParse(
   "http://www.unimod.org/xml/unimod_tables.xml", useInternalNodes = TRUE)
   getNodeSet(test, "//modifications_row")
 

-- 
There are men who can think no deeper than a fact - Voltaire


Duncan Temple Langdun...@wald.ucdavis.edu
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA







Re: [R] getNodeSet - what am I doing wrong?

2010-08-30 Thread Duncan Temple Lang

Hi Johannes

 This is a common issue.  The document has a default XML namespace, e.g.
the root node is defined as

 <unimod xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ... >

 So you need to specify which namespace to match in the XPath expression
in getNodeSet().  The XML package  provides a convenient facility for
this. You need only specify the prefix such as x and that will
be bound to the default namespace. You need to specify this in
two places - where you use it in the XPath expression and
in the namespaces argument of getNodeSet()

So
   getNodeSet(test, "//x:modifications_row", "x")

gives you probably what you want.

 D.



On 8/30/10 8:02 AM, Johannes Graumann wrote:
 library(XML)
 test <- xmlTreeParse(
 "http://www.unimod.org/xml/unimod_tables.xml", useInternalNodes = TRUE)
 getNodeSet(test, "//modifications_row")



Re: [R] Parsing a XML file

2010-08-24 Thread Duncan Temple Lang

xmlDoc() is not the function to use to parse a file.

Use

   doc = xmlParse("Malaria_Grave.xml")


xmlDoc() is for programmatically creating a new XML within R.
It could be more robust to being called with a string, but
the key thing here is that it is not the appropriate function for what
you want.
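
For contrast, a hypothetical sketch of what xmlDoc() is actually for — promoting an existing internal node to a standalone document. The element name "//record" is made up for illustration:

```r
library(XML)
doc  <- xmlParse("Malaria_Grave.xml")
node <- getNodeSet(doc, "//record")[[1]]  # hypothetical element name
sub  <- xmlDoc(node)                      # new document rooted at that node
```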


Also, if there had been a problem with the parsing, you'd need to give
me/us the offending XML  file so that we could have a chance of reproducing
the problem.

   D.


On 8/24/10 2:35 PM, Orvalho Augusto wrote:
 I have one XML file with 30MB that I need to read the data.
 
 I try this;
 library(XML)
 doc <- xmlDoc("Malaria_Grave.xml")
 
 And R answers like this
  *** caught segfault ***
 address 0x5, cause 'memory not mapped'
 
 Traceback:
 1: .Call("RS_XML_createDocFromNode", node, PACKAGE = "XML")
 2: xmlDoc("Malaria_Grave.xml")
 
 Possible actions:
 1: abort (with core dump, if enabled)
 2: normal R exit
 3: exit R without saving workspace
 4: exit R saving workspace
 
 
 Or I try this:
 doc <- xmlTreeParse("Malaria_Grave.xml")
 
 I get this
 xmlParseEntityRef: no name
 xmlParseEntityRef: no name
 Error: 1: xmlParseEntityRef: no name
 2: xmlParseEntityRef: no name
 
 Please guys help this simple mortal!
 Caveman
 


Re: [R] RGoogleDocs ability to write to spreadsheets broken as of yesterday - CAN PAY FOR FIX

2010-07-21 Thread Duncan Temple Lang
Hi Harlan

  Can you send some code so that we can reproduce the problem.
That will enable me to fix the problem quicker.

  D.

On 7/21/10 8:26 AM, Harlan Harris wrote:
 I unfortunately haven't received any responses about this problem. We
 (the company I work for) are willing to discuss payment to someone who
 is willing to quickly contribute a fix to the RGoogleDocs/RCurl
 toolchain that will restore write access. Please contact me directly if
 you're interested. Thank you,
 
  -Harlan Harris
 
 On Tue, Jul 20, 2010 at 10:19 AM, Harlan Harris har...@harris.name
 mailto:har...@harris.name wrote:
 
 Hi,
 
 I'm using RGoogleDocs/RCurl to update a Google Spreadsheet.
 Everything worked OK until this morning, when my ability to write
 into spreadsheet cells went away. I get the following weird error:
 
 Error in els[[type + 1]] : subscript out of bounds
 
 Looking at the Google Docs API changelog, I see the following:
 
 http://code.google.com/apis/spreadsheets/changelog.html
 
 
 Release 2010-01 (July 14, 2010)
 
 This is an advanced notice about an upcoming change.
 
 * Starting July 19, 2010, all links returned by all Spreadsheets
   API feeds will use HTTPS. This is being done in the interests
   of increased security. If you require the use of HTTP, we
   recommend that you remove the replace |https| with |http| in
   these links. Another announcement will be made on July 19,
   2010, when this change goes to production.
 
 
 I suspect this is the problem. Fixing it is above my head, I'm
 afraid. Could anyone help? This is urgent. Thank you,
 
  -Harlan Harris
 




Re: [R] RGoogleDocs ability to write to spreadsheets broken as of yesterday - CAN PAY FOR FIX

2010-07-21 Thread Duncan Temple Lang

Hi Harlan

 If you install the latest version of RCurl from source via

  install.packages("RCurl", repos = "http://www.omegahat.org/R")

 that should solve the problem, assuming I have been reproducing the same
 problem you mentioned.


 You haven't mentioned what operating system you are on. If you are on Windows,
 that will pick up the binary version. If you are on the Mac, you will have to
 build it from source.


D.

On 7/21/10 8:26 AM, Harlan Harris wrote:
 I unfortunately haven't received any responses about this problem. We
 (the company I work for) are willing to discuss payment to someone who
 is willing to quickly contribute a fix to the RGoogleDocs/RCurl
 toolchain that will restore write access. Please contact me directly if
 you're interested. Thank you,
 
  -Harlan Harris
 
 On Tue, Jul 20, 2010 at 10:19 AM, Harlan Harris har...@harris.name
 mailto:har...@harris.name wrote:
 
 Hi,
 
 I'm using RGoogleDocs/RCurl to update a Google Spreadsheet.
 Everything worked OK until this morning, when my ability to write
 into spreadsheet cells went away. I get the following weird error:
 
 Error in els[[type + 1]] : subscript out of bounds
 
 Looking at the Google Docs API changelog, I see the following:
 
 http://code.google.com/apis/spreadsheets/changelog.html
 
 
 Release 2010-01 (July 14, 2010)
 
 This is an advanced notice about an upcoming change.
 
 * Starting July 19, 2010, all links returned by all Spreadsheets
   API feeds will use HTTPS. This is being done in the interests
   of increased security. If you require the use of HTTP, we
   recommend that you remove the replace |https| with |http| in
   these links. Another announcement will be made on July 19,
   2010, when this change goes to production.
 
 
 I suspect this is the problem. Fixing it is above my head, I'm
 afraid. Could anyone help? This is urgent. Thank you,
 
  -Harlan Harris
 




Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Duncan Temple Lang
Hi Ryusuke

 I would use the encoding parameter of htmlParse() and 
 download and parse the content in one operation:

  htmlParse("http://home.sina.com", encoding = "UTF-8")

 If you want to use getURL() in RCurl, use the .encoding parameter

  You didn't tell us the output of Sys.getlocale()
  or how your terminal/console is configured, so the above
  may vary under your configuration, but works on various
  machines for me with different settings.
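
Combining the two pieces of advice above into one hedged sketch (what renders correctly will still depend on your locale and console configuration):

```r
library(XML)
# Parse directly from the URL with an explicit encoding, then pull
# the paragraph text in one XPath pass.
doc <- htmlParse("http://home.sina.com", encoding = "UTF-8")
txt <- xpathSApply(doc, "//p", xmlValue)
```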

D.


Ryusuke Kenji wrote:
 
 Hi All,
 
 First method:-
 library(XML)
 
  theurl <- "http://home.sina.com"
  download.file(theurl, "tmp.html")
  
  txt <- readLines("tmp.html")
  
  txt <- htmlTreeParse(txt, error = function(...){}, useInternalNodes = TRUE)
  
  g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
  
  head(grep( , g, value = TRUE))
 
 
 [1] ?? | ?? | ENGLISH   
 ??? ???
 [3] ??? ?? ??(???)  
 ?? 
 [5]  ???? ??! 
 ? ??! !  
 
 
 
 SecondMethod:-
 library(RCurl)
 
  theurl <- getURL("http://home.sina.com", encoding = 'GB2312')
 
 Encoding(theurl)
 
  [1] "unknown"
 
  txt <- readLines(con = textConnection(theurl), encoding = 'GB2312')
 txt[5:10] #show the lines which occurred encoding problem.
 [1] meta http-equiv=\Content-Type\ content=\text/html; charset=utf-8\ 
 /
 [2] titleSINA.com US ? -??/title
 [3] meta name=\Keywords\ content=\, ???, 
 ???, ??,, SINA, US, News, Chinese, 
 Asia\ /
 [4] meta name=\Description\ 
 content=\???, 
 ???24, , 
 , ??, , ?BBS, 
 ???.\ /
 [5] 
   
   

 [6] link rel=\stylesheet\ type=\text/css\ 
 href=\http://ui.sina.com/assets/css/style_home.css\; /
 
 i am trying to read data from a Chinese language website, but the Chinese 
 characters always unreadable, may I know if any good idea to cope such 
 encoding problem in RCurl and XML?
 
 
 Regards,
 Ryusuke
 
 _
 
 









Re: [R] Do colClasses in readHTMLTable (XML Package) work?

2010-03-20 Thread Duncan Temple Lang



On 3/17/10 6:52 PM, Marshall Feldman wrote:
 Hi,
 
 I can't get the colClasses option to work in the readHTMLTable function 
 of the XML package. Here's a code fragment:
 
 require(XML)
  doc <- "http://www.nber.org/cycles/cyclesmain.html"
  table <- getNodeSet(htmlParse(doc), "//table")[[2]]   # The main table
  # is the second one because it's embedded in the page table.
  xt <- readHTMLTable(
   table,
   header = c("peak", "trough", "contraction", "expansion",
              "trough2trough", "peak2peak"),
   colClasses = c("character", "character", "character",
                  "character", "character", "character"),
   trim = TRUE
   )
 
 Does anyone know what's wrong?

The coercion of the table columns is done before the call to
as.data.frame. You can add

  stringsAsFactors = FALSE

in the call to readHTMLTable() and you'll get what you expect,
I believe.
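
In other words, the earlier call becomes something like this sketch (header and colClasses values are from the original post):

```r
library(XML)
xt <- readHTMLTable(table,
                    header = c("peak", "trough", "contraction", "expansion",
                               "trough2trough", "peak2peak"),
                    colClasses = rep("character", 6),
                    trim = TRUE,
                    stringsAsFactors = FALSE)   # keep columns as character
```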

   D.

 
  Marsh Feldman
 


Re: [R] parse an HTML page with verbose error message (using XML)

2010-03-11 Thread Duncan Temple Lang
Hi Yihui

  It took me a moment to see the error message as the latest
development version of the XML package suppresses/hides them by default
for htmlParse().

 You can provide your own function via the error parameter.
If you just want to see more detailed error messages on the console
you can use a function like the following


fullInfoErrorHandler =
function(msg, code, domain, line, col, level, file)
{
   # level tells how significant the error is
   #   These are 0, 1, 2, 3 for WARNING, ERROR, FATAL
   # meaning simple warning, recoverable error and fatal/unrecoverable 
error.
   #  See XML:::xmlErrorLevel
   #
   # code is an error code, See the values in XML:::xmlParserErrors
   #  XML_HTML_UNKNOWN_TAG, XML_ERR_DOCUMENT_EMPTY
   #
   # domain tells what part of the library raised this error.
   #  See XML:::xmlErrorDomain

  codeMsg = switch(level, "warning", "recoverable error", "fatal error")
  cat("There was a", codeMsg, "in the", file, "at line", line, "column",
 col, "\n", msg, "\n")
}

doc = htmlParse("~/htmlErrors.html", error = fullInfoErrorHandler)

And of course you can mimic xmlErrorCumulator() to form a closure that
collects the different details of each message into an object.  If you
look in the error.R and xmlErrorEnums.R files within the R code of the
XML package, you'll find some additional functions that give us further
support for working with errors in the XML/HTML parsers.
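
A minimal sketch of such a closure — accumulate the messages instead of printing them, without stopping the parse. The names here are illustrative, and the handler signature follows the one shown above:

```r
library(XML)
makeCollector <- function() {
  msgs <- character()
  list(handler  = function(msg, ...) msgs <<- c(msgs, msg),
       messages = function() msgs)
}
col <- makeCollector()
doc <- htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
                 error = col$handler)
col$messages()   # all accumulated parser messages; parsing was not interrupted
```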

 Best,
   D.

Yihui Xie wrote:
 I'm using the function htmlParse() in the XML package, and I need a
 little bit help on error handling while parsing an HTML page. So far I
 can use either the default way:
 
 # error = xmlErrorCumulator(), by default
 library(XML)
  doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/")
 # the error message is:
 # htmlParseStartTag: invalid element name
 
 or the tryCatch() approach:
 
 # error = NULL, errors to be caught by tryCatch()
 tryCatch({
  doc = htmlParse("http://www.public.iastate.edu/~pdixon/stat500/",
  error = NULL)
  }, XMLError = function(e) {
  cat("There was an error in the XML at line", e$line, "column",
  e$col, "\n", e$message, "\n")
 })
 # verbose error message as:
 # There was an error in the XML at line 90 column 2
 # htmlParseStartTag: invalid element name
 
 I wish to get the verbose error messages without really stopping the
 parsing process; the first approach cannot return detailed error
 messages, while the second one will stop the program...
 
 Thanks!
 
 Regards,
 Yihui
 --
 Yihui Xie xieyi...@gmail.com
 Phone: 515-294-6609 Web: http://yihui.name
 Department of Statistics, Iowa State University
 3211 Snedecor Hall, Ames, IA
 


Re: [R] Making FTP operations with R

2010-03-08 Thread Duncan Temple Lang

R does provide support for basic FTP requests. Not for DELETE
requests. And not for communication on the same connection.

I think your best approach is to use the RCurl package
(http://www.omegahat.org/RCurl).
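For the get-then-delete pattern, something along these lines might work. This is an untested sketch (host, path, and credentials are placeholders); it relies on libcurl's "quote" option to send raw FTP commands such as DELE, and on reusing a single curl handle so the same connection is kept alive.

```r
# Untested sketch: fetch a file over FTP, then delete it on the server.
# Host, path and credentials below are placeholders.
library(RCurl)

h <- getCurlHandle(userpwd = "user:password")

# download the file
txt <- getURL("ftp://ftp.example.com/incoming/data.csv", curl = h)

# issue the corresponding DELE command, reusing the same handle
# (libcurl's "quote" option runs arbitrary FTP commands)
curlPerform(url = "ftp://ftp.example.com/incoming/",
            quote = "DELE data.csv", curl = h)
```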

  D.

Orvalho Augusto wrote:
 Dears I need to make some very basic FTP operations with R.
 
 I need to do a lot of get and issue a respective delete command
 too on the same connection.
 
 How can I do that?
 
 Thanks in advance
 
 Caveman
 



Re: [R] Working with combinations

2010-03-02 Thread Duncan Temple Lang

I think there are several packages that implement combinations and several
that allow you to specify a function  to be called when each vector of 
combinations
is generated.  I can't recall the names of all such packages, but the
Combinations package on www.omegahat.org/Combinations is one.

 D.
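As a base-R alternative to the packages mentioned above, combn() in the utils package accepts a FUN argument that is applied to each combination as it is generated, which is often enough for the filter-and-keep pattern the poster describes:

```r
# Apply a criterion to each combination as it is generated, keeping only
# the ones that pass (here: triples from 1:10 whose sum is 15).
hits <- combn(1:10, 3,
              FUN = function(x) if (sum(x) == 15) x else NULL,
              simplify = FALSE)
hits <- Filter(Negate(is.null), hits)   # drop the rejected combinations
length(hits)
```

Note that this still generates every combination; FUN just decides which ones are stored.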

Herm Walsh wrote:
 I am working with the combinations function (available in the gtools 
 package).  However, rather than store all of the possible combinations I 
 would like to check each combination to see if it meets a certain criteria.  
 If it does, I will then store it.
 
 I have looked at the combinations code but am unsure where in the algorithm I 
 would be able to operate on each combination.
 
 Thanks!
 
 
   
 
 
 
 
 



Re: [R] help with EXPASY HTML form submission in RCurl package

2010-02-13 Thread Duncan Temple Lang


Sunando Roy wrote:
 
 Hi Duncan,
 
 Thanks for your help. I changed the P but the output that I get is not
 what I expect. The form gets aborted without any actual output. I get
 the same result with
 
 postForm("http://www.expasy.ch/tools/protscale.html")


That URL  (...protscale.html) is the HTML page that contains the form.
It is not the URL to which you are supposed to submit the form request.
That information is in the attributes of the form .. /form.
That is

   http://www.expasy.ch/cgi-bin/protscale.pl?1

So you have to know a little about HTML forms in order to
figure out how to map the HTML description to a request.
That is what your browser does ( including hidden fields in
the form, etc.)
It is also the purpose of the R package RHTMLForms (www.omegahat.org/RHTMLForms
and
 install.packages("RHTMLForms", repos = "http://www.omegahat.org/R", type = "source")
)

e.g.

 library(RHTMLForms)
 f = getHTMLFormDescription("http://www.expasy.ch/tools/protscale.html")
 fun = createFunction(f[[2]])


 o = fun(prot_id = "P05130", weight_var = "exponential", style = "POST")

(The protscale.html is malformed with an unescaped & that messes up parsing the
 "linear" option.)

Now, of course, you have to parse the resulting HTML (in the string given in 
the variable o)
to get the information the form submission generated.


  D.


 
 just with an added message that there was no input passed on. But with
 the input like I presented I get the same output. I could make some of
 the examples work like for e.g
 
 postForm("http://www.omegahat.org/RCurl/testPassword/form_validation.cgi",
 your_name = "Duncan",
 your_age = "35-55",
 your_sex = "m",
 submit = "submit",
 .opts = list(userpwd = "bob:welcome"))
 
 which would suggest atleast the setup is correct.
 I parsed the expasy protscale source code to identify the variables but
 the form does not seem to go through. I can post the html body code if
 needed.
 
 Regards
 
 Sunando
 On Fri, Feb 12, 2010 at 3:54 PM, Duncan Temple Lang
 dun...@wald.ucdavis.edu mailto:dun...@wald.ucdavis.edu wrote:
 
 
 
 Sunando Roy wrote:
  Hi,
 
  I am trying to submit a form to the EXPASY protscale server (
  http://www.expasy.ch/tools/protscale.html). I am using the RCurl
 package and
  the postForm function available in it. I have extracted the
 variables for
  the form from the HTML source page. According to the syntax of
 postForm, I
  just need to mention the url and assign values to the input
 mentioned in the
  HTML code.
  The code that I am using is:
  postForm("http://www.expasy.ch/tools/protscale.html",
  sequence = 
  ,
  scale = "Molecular weight",
  window = 5,
  weight_edges = 100,
  weight_var = "linear",
  norm = "no",
  submit = "Submit"), .checkparams = TRUE)
 
 
 I don't think that is what you actually submitted to R.
  It is a syntax error, as you end the call to postForm() after "Submit"
  and then have an extra ", .checkparams = TRUE)" afterwards.
 
  But, when you remove the ')' after "Submit",
 the problem you get is that .checkparams is not a parameter
 of postForm(), but .checkParams is.  R is case-sensitive
 so the problem is that .checkparams is being treated as a parameter
 of your form.
 
 So change the p to P in .checkparams, and it works.
 
  D.
 
  the constant error that I get is:
   Error in postForm("http://www.expasy.ch/tools/protscale.html", .params =
 .params =
  list(sequence = not,  :  STRING_ELT() can only be applied to a
 'character
  vector', not a 'logical'
 
  Is there any other way to submit an HTML form in R ?
 
  Thanks for the help
 
  Regards
 
  Sunando
 
 
 




Re: [R] help with EXPASY HTML form submission in RCurl package

2010-02-12 Thread Duncan Temple Lang


Sunando Roy wrote:
 Hi,
 
 I am trying to submit a form to the EXPASY protscale server (
 http://www.expasy.ch/tools/protscale.html). I am using the RCurl package and
 the postForm function available in it. I have extracted the variables for
 the form from the HTML source page. According to the syntax of postForm, I
 just need to mention the url and assign values to the input mentioned in the
 HTML code.
 The code that I am using is:
 postForm("http://www.expasy.ch/tools/protscale.html",
 sequence = 
 ,
 scale = "Molecular weight",
 window = 5,
 weight_edges = 100,
 weight_var = "linear",
 norm = "no",
 submit = "Submit"), .checkparams = TRUE)


I don't think that is what you actually submitted to R.
It is a syntax error, as you end the call to postForm() after "Submit"
and then have an extra ", .checkparams = TRUE)" afterwards.

But, when you remove the ')' after "Submit",
the problem you get is that .checkparams is not a parameter
of postForm(), but .checkParams is.  R is case-sensitive
so the problem is that .checkparams is being treated as a parameter
of your form.

So change the p to P in .checkparams, and it works.

  D.

 the constant error that I get is:
 Error in postForm("http://www.expasy.ch/tools/protscale.html", .params =
 list(sequence = not,  :  STRING_ELT() can only be applied to a 'character
 vector', not a 'logical'
 
 Is there any other way to submit an HTML form in R ?
 
 Thanks for the help
 
 Regards
 
 Sunando
 
 



Re: [R] write.zip?

2010-02-10 Thread Duncan Temple Lang
Hi Spencer

  I just put a new source version (0.9-0) of the Rcompression package
on the www.omegahat.org/R  repository and it has a new function zip()
that creates or appends to a zip file, allowing one to provide
alternative names.

I'll add support for writing content from memory (i.e. AsIs
character strings and raw vectors) soon.

It doesn't yet handle replacing or removing elements.
I may use a different approach (e.g. the 7-zip lzma SDK)
to that and other things.

   D.
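Usage might look roughly like the following; note that the argument names here are guesses from the description above, not the documented signature of Rcompression's zip().

```r
# Hypothetical usage sketch -- argument names are assumptions, not the
# documented interface of Rcompression's new zip().
library(Rcompression)
zip("results.zip", c("out/a.csv", "out/b.csv"),
    altNames = c("a.csv", "b.csv"))   # store the files under alternative names
```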

spencerg wrote:
 Thanks to Dieter Menne and Prof. Ripley for replies.
 
  For certain definitions of better (e.g., transportability?), the
 Rcompression package might be superior to the system call I mentioned. 
 I also just found the tar function in the utils package, which looks
 like it might be more transportable than my system call.
 
  However, as Prof. Ripley noted, there may not be a simpler way
 than my system call, especially considering the time I would have to
 invest to learn how to use it.
 
  Thanks again very much,
  Spencer Graves
 
 
 Prof Brian Ripley wrote:
 On Tue, 9 Feb 2010, spencerg wrote:

 Can one write a zip file from R?
 I want to create a file with a name like dat.zip, being a zip
 file containing dat.csv.  I can create dat.csv, then call
 system('zip -r9 dat.zip dat.csv').  Is there a better way?

 Not really.  One could use compiled code like that in unzip() to do
 this, but as nothing inside R is involved the only gain would be to
 not need to have the zip executable present.  Omegahat package
 Rcompression does have means to manipulate .zip files from compiled
 code with an R interface, but not AFAICS a simpler way to do what you
 want.

 I can use gzfile to write a gz file, but I don't know how to
 give that a structure that would support unzipping to a *.csv file.

 A zip file is not a gzipped file, even though gzip compression is used
 for parts of the contents.  The header and trailer are quite different.


 Thanks,
 Spencer



Re: [R] convert R plots into annotated web-graphics

2010-02-09 Thread Duncan Temple Lang
Hi

While there is different level of support for SVG in the different browsers,
basic SVG (non-animation) does work on all of them (with a plugin for IE).

In addition to the 2 SVG packages on CRAN, there is SVGAnnotation at
www.omegahat.org/SVGAnnotation and that is quite a bit more powerful.
There is a link on that page to some examples that are similar to yours.

Imagemaps are a perfectly good way of achieving the interactivity you describe 
and Barry's
imagemap package should make this pretty straightforward.
If all you need is to have event handlers for regions, then imagemaps will be 
fine.
And some JavaScript code will allow you to connect the image map events to 
changing
characteristics of the table.

The rest of this mail is about richer approaches

However, there are other styles of interaction and animation that require 
working at the level of objects on the plot,
i.e. points, lines, text, etc.  When we have these objects at rendering time 
rather than pixels and regions,
we can, e.g., change a point's appearance (color or
size), hide or move a point, etc.
You need this to do linked plots, for example, i.e. where we mouse over a point 
in one plot or the data table
and highlight the corresponding observations in other plots.

If you want this richer framework, you can generate the plot in R in such a way 
that it will be
displayed in your browser not as a PNG file, but with real objects being 
created within the
rendering.  The SVGAnnotation package does this reasonably comprehensively.

You can generate a plot in R that will be displayed on the JavaScript canvas.
Again, this will create objects and they can then be manipulated by JavaScript
event handlers that work on the plot elements and the table.
There is a prototype of such an R-JavaScript canvas graphics device
in the RGraphicsDevice package at www.omegahat.org/RGraphicsDevice.

Also, there is a beta-level Flash device that works at the object level and
allows an R programmer to annotate the resulting plot in either R or 
ActionScript.
(This is at www.omegahat.org/FlashMXML.)
There is another Flash graphics device for R at
https://r-forge.r-project.org/projects/swfdevice/
but this doesn't work at the object-level (at this point in time, at least).


Both the FlashMXML and JavaScript packages rely on the RGraphicsDevice
package, and that could be extended with minor changes to handle font metric calculations
with more accuracy (e.g. using RFreetype).


Instead of using an HTML table and modifying it programmatically via CSS
properties, etc., you might use a widget.

   a DataTable widget from the Yahoo UI javascript library.
   a Flash DataGrid to display the data as an interactive table.


As I said, image maps are probably simplest if your needs are reasonably simple.
These other approaches allow for potentially richer Web-based graphics.



Barry Rowlingson wrote:
 On Sun, Feb 7, 2010 at 2:35 PM, Rainer Tischler rainer_...@yahoo.de wrote:
 Dear all,

 I would like to make a large scatter plot created with R available as an 
 interactive web graphic, in combination with additional text-annotations for 
 each data point in the plot. The idea is to present the text-annotations in 
 an HTML-table and inter-link the data points in the plot with their 
 corresponding entries in the table, i.e. when clicking on a data point in 
 the plot, the corresponding entry in the table should be highlighted or 
 centered and vice-versa, when clicking on a table-entry, the corresponding 
 point in the plot should be highlighted.

 I have seen that CRAN contains various R-packages for SVG-based output of 
 interactive graphics (with hyperlinks and tool-tip annotations for each data 
 point); however, SVG is not supported by all browsers. Is anybody aware of 
 another solution for this problem (maybe based on image-maps and javascript)?
 If you have alternative ideas for interlinking tabular annotations with 
 plotted data points, I would appreciate any recommendation/suggestion.
 (I work with R 2.8.1 on different 32-bit PCs with both Linux and Windows 
 operating systems).

 
  My 'imagemaps' package?
 
 https://r-forge.r-project.org/projects/imagemap/
 
 Barry
 



Re: [R] create zip archive in R

2010-02-04 Thread Duncan Temple Lang


Uwe Ligges wrote:
 
 
 On 04.02.2010 03:31, mkna005 mkna005 wrote:
 Hello all!
 I was wondering if it is possible to create a zip archive within R and
 add files to it?
 
 No.

Well, the Rcompression package on the Omegahat repository does have some
facilities for it.
It doesn't do it in memory, but does handle issues of moving disparate files to
a common temporary directory and getting things in order generally to create 
the zip
file. But it currently uses the external zip executable to create the archive.

I probably will get around to implementing a version in memory as it has been
an issue that has nagged me for a while. And we have the code for it.




 
 
 I know it is possible to unzip files but is it
 possible the other way round?
 
 No.
 
 
 For (compressed) archives see ?tar
 For other compression formats of single files see ?file
 
 Uwe Ligges
 
 

 Thanks in advance

 Christoph

 



Re: [R] RCurl : limit of downloaded Urls ?

2010-01-31 Thread Duncan Temple Lang


Alexis-Michel Mugabushaka wrote:
  Dear Rexperts,
 
 I am using R to query google.

I believe that Google would much prefer that you use their API
rather than their regular HTML form to make programmatic search queries.

 
 I am getting different results (in size) for manual queries and queries sent
 through getForm of RCurl.
 
 It seems that RCurl limits the size of the text retrieved (the maximum I
 could get is around 32 k bits).

  _bytes_   I assume

 

zz = getForm("http://www.google.com/search", q='google+search+api', num = 100)
nchar(zz)
[1] 109760

So that's more than 3 times 32KB, and there isn't a 32K limit.

The results will most likely be chunked, i.e. returned in blocks,
but getForm() and other functions will, by default, combine the chunks
and return the entire answer. If you were to provide your own function
for the writefunction option in RCurl functions, then your
function will be called for each chunk.
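A sketch of that, collecting the chunks as they arrive; the return-value convention noted in the comment follows libcurl's write callback, and is an assumption as far as RCurl's R-level handling goes.

```r
# Sketch: a custom writefunction is invoked once per chunk of the response,
# so we can observe (or stream-process) the blocks as they arrive rather
# than receiving the combined answer.
library(RCurl)

chunks <- character()
collect <- function(txt) {
  chunks[length(chunks) + 1] <<- txt
  nchar(txt, type = "bytes")   # bytes consumed, per libcurl's convention
}

invisible(getForm("http://www.google.com/search", q = 'google+search+api',
                  num = 100, .opts = list(writefunction = collect)))
length(chunks)                          # number of chunks received
sum(nchar(chunks, type = "bytes"))      # total size of the response
```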

So to be able to figure out why things are not working for you,
we need to see the R code  you are using, and know the operating
system and versions of the RCurl package and R.

 D.


 Any idea how to get around this ?
 
 Thanks in advance
 
 



Re: [R] SSOAP XML-RPC

2010-01-27 Thread Duncan Temple Lang

Hi Jan

  Is
 .XMLRPC("http://localhost:9000", "Cytoscape.test", .opts = list(verbose = TRUE))

  the command you used? If not, what did you use?
  Can you debug the .XMLRPC function (e.g. with options(error = recover))
  and see what the XML that was sent to the server, i.e. the cmd variable
  in the .XMLRPC() function.
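Concretely, that debugging session might look like this (a sketch, assuming the SSOAP package from Omegahat is installed and a server is listening on localhost:9000):

```r
# Sketch: inspect the XML payload that .XMLRPC() builds before sending it.
library(SSOAP)
options(error = recover)   # on error, get a frame-by-frame browser
.XMLRPC("http://localhost:9000", "Cytoscape.test",
        .opts = list(verbose = TRUE))
# At the recover prompt, select the .XMLRPC frame and print `cmd`
# to see the XML-RPC request that was sent to the server.
```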

  Can you find out what the Perl, Python or Ruby modules send?

  It is easy to fix if we know what should be sent, but we do need more details.

   D.

Jan Bot wrote:
 Hi,
 
 I'm trying to use the XML-RPC client in the SSOAP package to connect to a
 service that I have created. From other languages (Perl, Python, Ruby) this
 is not a problem but the SSOAP client gives the following error:
 
 Error in .XMLRPC("http://localhost:9000", "Cytoscape.test", .opts =
 list(verbose = TRUE)) :
   Failed to parse XML-RPC request: Content is not allowed in prolog.
 
 It looks like the SSOAP XML-RPC client is not creating the right type of
 XML-RPC message. Does anyone know how to fix this or has successfully used
 the SSOAP XML-RPC client?
 
 Thanks,
 
 Jan



Re: [R] Data import export zipped files from URLs

2010-01-19 Thread Duncan Temple Lang


Dieter Menne wrote:
 
 Velappan Periasamy wrote:
 I am not able to import zipped files from the following link.
 How to get thw same in to R?.
 mydata <-
 read.csv("http://nseindia.com/content/historical/EQUITIES/2010/JAN/cm15JAN2010bhav.csv.zip")

 
 As Brian Ripley noted in 
 
 http://markmail.org/message/7dsauipzagq5y36o
 
 you will have to download it first and then to unzip.



Well if downloading to disk first does need to be avoided, you can use
the RCurl and Rcompression packages to do the computations in memory:

library(RCurl)
ctnt = getURLContent("http://nseindia.com/content/historical/EQUITIES/2010/JAN/cm15JAN2010bhav.csv.zip")


library(Rcompression)
zz = zipArchive(ctnt)
names(zz)
txt = zz[[1]]
read.csv(textConnection(txt))

 D.

 
 Dieter
 




Re: [R] xmlToDataFrame#Help!!!

2010-01-10 Thread Duncan Temple Lang
Christian Ritter wrote:
 I'm struggling with interpreting XML files created by ADODB as 
 data.frames and I'm looking for advice (see attached example file).

You'll have to attach it (or give us a URL for it).

Also, you should tell us what you have tried and how it failed.
And of course, your sessionInfo(). 

 D.

 
 Note:
 This file contains a result set which comes from a rectangular data array.
 I've been trying to play with parameters to the xmlToDataFrame function 
 in the XML package but I don't get it to extract the data frame.
 
 This is what the result should look like:
  Name Sex Age Height Weight
 1   Alfred   M  14   69.0  112.5
 2Alice   F  13   56.5   84.0
 3  Barbara   F  13   65.3   98.0
 4Carol   F  14   62.8  102.5
 5Henry   M  14   63.5  102.5
 6James   M  12   57.3   83.0
 7 Jane   F  12   59.8   84.5
 8Janet   F  15   62.5  112.5
 9  Jeffrey   M  13   62.5   84.0
 10John   M  12   59.0   99.5
 11   Joyce   F  11   51.3   50.5
 12Judy   F  14   64.3   90.0
 13  Louise   F  12   56.3   77.0
 14Mary   F  15   66.5  112.0
 15  Philip   M  16   72.0  150.0
 16  Robert   M  12   64.8  128.0
 17  Ronald   M  15   67.0  133.0
 18  Thomas   M  11   57.5   85.0
 19 William   M  15   66.5  112.0
 
 Thanks in advance ...
 
 Chris
 
 P.S.:
 In return, I'll continue developing a small package called R2sas2R with 
 obvious meaning and I'll release it on CRAN as soon as I'm a bit 
 further. (first tests under Windows using the StatconnDCOM connector and 
 the rcom package are encouraging).
 


-- 
There are men who can think no deeper than a fact - Voltaire


Duncan Temple Lang  dun...@wald.ucdavis.edu
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA







Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Duncan Temple Lang
Hi Lauri.

I am in the process of making some changes
to the encoding in the XML package. I'll take a look
over the next few days. (Not certain precisely when.)

 D.



Lauri Nikkinen wrote:
 Hi,
 
 I'm trying to get data from web page and modify it in R. I have a
 problem with encoding. I'm not able to get
 encoding right in htmlTreeParse command. See below
 
 library(RCurl)
 library(XML)

 site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
 txt <- readLines(tc <- textConnection(site)); close(tc)
 txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)

 g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
 head(grep( , g, value=T))
 
 [1] Â Â PART-TIME EXPORT SALES ASSOCIATES (ALSO SUMMER WORK) Â
 Valuatum Oy  Helsinki  Ilmoitus lisätty: 31.12.2009. Viimeinen
 hakupäivä: 28.02.2010
 [2]   MSN EDITOR / ONLINE PRODUCER  Manpower Oy  Espoo  Ilmoitus
 lisätty: 30.12.2009. Viimeinen hakupäivä: 15.1.2010
 [3]   MYYNTINEUVOTTELIJA  Rand Customer Contact Oy  Helsinki Â
 Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 30.1.2010
 [4] Â Â HALUATKO IT-ARKKITEHDIKSI SHANGHAIHIN? Â HALUATKO
 IT-ARKKITEHDIKSI SHANGHAIHIN?  Shanghai, China  Ilmoitus lisätty:
 30.12.2009. Viimeinen hakupäivä: 28.2.2010
 [5]   HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN? Â
 HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN?  Shanghai, China
  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 28.2.2010
 [6]   Korkeakouluharjoittelija/ työelämävalmennettava  Suomen
 suurlähetystö Pristina, Kosovo  Pristina, Kosovo  Ilmoitus
 lisätty: 30.12.2009. Viimeinen hakupäivä: 20.1.2010
 
 This won't help:
 
 txt <- readLines(tc <- textConnection(site)); close(tc)
 txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE,
 encoding="latin1")
 g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
 head(grep( , g, value=T))
 
 [1] Â Â PART-TIME EXPORT SALES ASSOCIATES (ALSO SUMMER WORK) Â
 Valuatum Oy  Helsinki  Ilmoitus lisätty: 31.12.2009. Viimeinen
 hakupäivä: 28.02.2010
 [2]   MSN EDITOR / ONLINE PRODUCER  Manpower Oy  Espoo  Ilmoitus
 lisätty: 30.12.2009. Viimeinen hakupäivä: 15.1.2010
 [3]   MYYNTINEUVOTTELIJA  Rand Customer Contact Oy  Helsinki Â
 Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 30.1.2010
 [4] Â Â HALUATKO IT-ARKKITEHDIKSI SHANGHAIHIN? Â HALUATKO
 IT-ARKKITEHDIKSI SHANGHAIHIN?  Shanghai, China  Ilmoitus lisätty:
 30.12.2009. Viimeinen hakupäivä: 28.2.2010
 [5]   HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN? Â
 HALUATKO J2EE-OHJELMISTOKEHITTÄJÄKSI SHANGHAIHIN?  Shanghai, China
  Ilmoitus lisätty: 30.12.2009. Viimeinen hakupäivä: 28.2.2010
 [6]   Korkeakouluharjoittelija/ työelämävalmennettava  Suomen
 suurlähetystö Pristina, Kosovo  Pristina, Kosovo  Ilmoitus
 lisätty: 30.12.2009. Viimeinen hakupäivä: 20.1.2010
 
 Any ideas?
 
 Thanks,
 Lauri
 
 sessionInfo()
 R version 2.10.0 (2009-10-26)
 i386-pc-mingw32
 
 locale:
 [1] LC_COLLATE=Finnish_Finland.1252  LC_CTYPE=Finnish_Finland.1252
 LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C
 [5] LC_TIME=Finnish_Finland.1252
 
 attached base packages:
 [1] grDevices datasets  splines   graphics  utils grid  stats
methods   base
 
 other attached packages:
  [1] RDCOMClient_0.92-0 XML_2.6-0  RCurl_1.3-1
 Hmisc_3.7-0survival_2.35-8ggplot2_0.8.5  digest_0.4.2
  reshape_0.8.3
  [9] plyr_0.1.9 proto_0.3-8gplots_2.7.4
 caTools_1.10   bitops_1.0-4.1 gtools_2.6.1
 gmodels_2.15.0 gdata_2.6.1
 [17] lattice_0.17-26
 
 loaded via a namespace (and not attached):
 [1] cluster_1.12.1 MASS_7.3-4 tools_2.10.0
 



Re: [R] Have you used RGoogleDocs and RGoogleData?

2009-12-12 Thread Duncan Temple Lang


Farrel Buchinsky wrote:
 It Works! Thanks a lot! It's great.

Thanks for letting me know. Glad that fixed things for you.

 
 What were your few minor, but important, changes - in a nutshell. I will
 not understand unless you describe it as high level issues.

Basically, recognizing the type of a document, e.g. a spreadsheet
or word processing document or generic document.
The changes made the detection more robust or more consistent
with any changes at Google.

  D.

 Farrel Buchinsky
 Google Voice Tel: (412) 567-7870
 
 
 
 On Fri, Dec 11, 2009 at 19:07, Duncan Temple Lang
 dun...@wald.ucdavis.eduwrote:
 
 Hi Farrel

  I have taken a look at the problems using RGoogleDocs to read
 spreadsheets and was able to reproduce the problem I believe you
 were having. A few minor, but important, changes and I can read
 spreadsheets again and apparently still other types of documents.

 I have put an updated version of the source of the package with
 these changes. It is available from

  http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.tar.gz

 There is a binary for Windows  in
  http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.zip

 Hopefully  this will cure the problems you have been experiencing.
 I'd appreciate knowing either way.

  Thanks,

   D.


 Farrel Buchinsky wrote:
 Both of these applications fulfill a great need of mine: to read data
 directly from google spreadsheets that are private to myself and one or
 two
 collaborators. Thanks to the authors. I had been using RGoogleDocs for
 about 6 months (maybe more) but have had to stop using it in the past
 month
 since for some reason that I do not understand it no longer reads google
 spreadsheets. I loved it. Its loss depresses me. I started using
 RGoogleData
 which works.

 I have noticed that both packages read data slowly. RGoogleData is much
 slower than RGoogleDocs used to be. Both seem a lot slower than if one
 manually downloaded a google spreadsheet as a csv and then used read.csv
 function - but then I would not be able to use scripts and execute
 without
 finding and futzing.

 Can anyone explain in English why these packages read slower than a csv
 download?
 Can anyone explain what the core difference is between the two packages?
 Can anyone share their experience with reading Google data straight into
 R?
 Farrel Buchinsky
 Google Voice Tel: (412) 567-7870

 Sent from Pittsburgh, Pennsylvania, United States


 
 



Re: [R] Have you used RGoogleDocs and RGoogleData?

2009-12-11 Thread Duncan Temple Lang

Hi Farrel

 I have taken a look at the problems using RGoogleDocs to read
spreadsheets and was able to reproduce the problem I believe you
were having. A few minor, but important, changes and I can read
spreadsheets again and apparently still other types of documents.

I have put an updated version of the source of the package with
these changes. It is available from

  http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.tar.gz

There is a binary for Windows  in
  http://www.omegahat.org/RGoogleDocs/RGoogleDocs_0.4-1.zip

Hopefully  this will cure the problems you have been experiencing.
I'd appreciate knowing either way.

 Thanks,

   D.


Farrel Buchinsky wrote:
 Both of these applications fulfill a great need of mine: to read data
 directly from google spreadsheets that are private to myself and one or two
 collaborators. Thanks to the authors. I had been using RGoogleDocs for
 about 6 months (maybe more) but have had to stop using it in the past month
 since for some reason that I do not understand it no longer reads google
 spreadsheets. I loved it. Its loss depresses me. I started using RGoogleData
 which works.
 
 I have noticed that both packages read data slowly. RGoogleData is much
 slower than RGoogleDocs used to be. Both seem a lot slower than if one
 manually downloaded a google spreadsheet as a csv and then used read.csv
 function - but then I would not be able to use scripts and execute without
 finding and futzing.
 
 Can anyone explain in English why these packages read slower than a csv
 download?
 Can anyone explain what the core difference is between the two packages?
 Can anyone share their experience with reading Google data straight into R?
 
 Farrel Buchinsky
 Google Voice Tel: (412) 567-7870
 
 Sent from Pittsburgh, Pennsylvania, United States
 
   [[alternative HTML version deleted]]
 


Re: [R] Scraping a web page

2009-12-03 Thread Duncan Temple Lang

Hi Michael

If you just want all of the text that is displayed in the
HTML docment, then you might use an XPath expression to get
all the text() nodes and get their value.

An example is

  doc = htmlParse("http://www.omegahat.org/")
  txt = xpathSApply(doc, "//body//text()", xmlValue)

The result is a character vector that contains all the text.

By limiting the nodes to the body, we avoid the content in head
such as inlined JavaScript or CSS.

It is also possible that a document may have script elements
in the document containing JavaScript that you don't want.
You can omit these

  txt = xpathSApply(doc, "//body//text()[not(ancestor::script)]", xmlValue)

And if there were other elements we wanted to ignore, then you could use

 txt = xpathSApply(doc,
         "//body//text()[not(ancestor::script) and not(ancestor::otherElement)]",
         xmlValue)
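A self-contained version of the same idea, using an in-memory document rather than a live URL (a sketch; the HTML string is made up for illustration):

```r
library(XML)

html <- '<html><head><script>var x = 1;</script></head>
<body><p>Hello</p><script>skip()</script><p>world</p></body></html>'
doc <- htmlParse(html, asText = TRUE)
# keep displayed text only: text() nodes under body, outside any <script>
txt <- xpathSApply(doc, "//body//text()[not(ancestor::script)]", xmlValue)
```

Here txt should contain only the displayed words; the script bodies in both the head and the body are excluded.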


HTH,

 D.


Michael Conklin wrote:
 I would like to be able to submit a list of URLs of various webpages and 
 extract the content i.e. not the mark-up of those pages. I can find plenty 
 of examples in the XML library of extracting links from pages but I cannot 
 seem to find a way to extract the text.  Any help would be greatly 
 appreciated - I will not know the structure of the URLs I would submit in 
 advance.  Any suggestions on where to look would be greatly appreciated.
 
 Mike
 
 W. Michael Conklin
 Chief Methodologist
 
 MarketTools, Inc. | www.markettools.com
 6465 Wayzata Blvd | Suite 170 |  St. Louis Park, MN 55426.  PHONE: 
 952.417.4719 | CELL: 612.201.8978
 This email and attachment(s) may contain confidential and/or proprietary 
 information and is intended only for the intended addressee(s) or its 
 authorized agent(s). Any disclosure, printing, copying or use of such 
 information is strictly prohibited. If this email and/or attachment(s) were 
 received in error, please immediately notify the sender and delete all copies
 
 
   [[alternative HTML version deleted]]
 


Re: [R] Reading from Google Docs

2009-11-28 Thread Duncan Temple Lang


Farrel Buchinsky wrote:
 Please oh please could someone help me or at least confirm that they are
 having the same problem.
 
 Why am I getting the error message from RGoogleDocs
 
 getDocs(sheets.con)
 Error in getDocs(sheets.con) :
   problems connecting to get the list of documents

You are using a connection to the wise service (for worksheets)
to get the list of documents from the document service.

If you call getDocs() with a connection to writely, I
imagine it will succeed.

So you have a token, but it is for the wrong thing.
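To illustrate, a sketch of authenticating against the document service instead (the address and the `password` variable are placeholders; service names as used by RGoogleDocs at the time):

```r
library(RGoogleDocs)

# "wise" is the spreadsheet service; "writely" is the document service.
# getDocs() talks to the latter, so authenticate against it explicitly.
# (Address and `password` are placeholders.)
auth     <- getGoogleAuth("you@example.com", password, service = "writely")
docs.con <- getGoogleDocsConnection(auth)
docs     <- getDocs(docs.con)
```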

 
 
 How do I troubleshoot?

The first thing is to learn about debugging in R.
For example,

options(error = recover)

getDocs(sheets.con)

The error occurs and you are presented with a menu prompt that allows you
to select the call frame of interest. There is only one - getDocs().
Enter 1 Return.  Now you have an R prompt  that allows you to explore
the call frame.

 objects()

 body()


Take a look at status

  status


WWW-Authenticate        GoogleLogin realm="http://www.google.com/accounts/ClientLogin", service="writely"
Content-Type            text/html; charset=UTF-8
Date                    Sat, 28 Nov 2009 17:36:16 GMT
Expires                 Sat, 28 Nov 2009 17:36:16 GMT
Cache-Control           private, max-age=0
X-Content-Type-Options  nosniff
X-XSS-Protection        0
X-Frame-Options         SAMEORIGIN
Server                  GFE/2.0
Transfer-Encoding       chunked
status                  401
statusMessage           Token invalid


This is the parsed header of the reply from the GoogleDocs server.

x contains the result of the query and it is an HTML document with the (same) 
error message.
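The same debugging workflow can be tried on a toy function, independent of RGoogleDocs (a sketch; the frame variable is made up):

```r
f <- function(x) {
  status <- list(status = "401", statusMessage = "Token invalid")
  stop("problems connecting")
}

options(error = recover)   # on error, offer a menu of call frames to browse
# f(1)   # run this interactively: select frame 1, then use objects() and
#        # print `status` to inspect the failing call's local variables
options(error = NULL)      # restore the default error handler
```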


 
 
 Farrel Buchinsky
 Google Voice Tel: (412) 567-7870
 
 
 
 On Wed, Nov 25, 2009 at 17:08, Farrel Buchinsky fjb...@gmail.com wrote:
 
 Oh OH! Could you please help with a problem that I never used to get.

 library(RGoogleDocs)
 ps <- readline(prompt = "get the password in ")
 sheets.con = getGoogleDocsConnection(getGoogleAuth("fjb...@gmail.com", ps,
   service = "wise"))
 ts2 = getWorksheets("OnCall", sheets.con)

 Those opening lines of script used to work flawlesly. Now I get.
 Error in getDocs(con) : problems connecting to get the list of documents

 Yet I got it to work earlier while I had been toying with RGoogleData
 package in another session. Could RGoogleData have opened something for
 RGoogleDocs to use?

 Farrel Buchinsky
 Google Voice Tel: (412) 567-7870

 Sent from Pittsburgh, Pennsylvania, United States

 On Wed, Nov 25, 2009 at 16:34, Farrel Buchinsky fjb...@gmail.com wrote:

 That was painless. I had already installed Rtools and had already put it
 on my path.

  Your line worked very well. [Thanks for telling me. The way I did it last
  time was worse than sticking daggers in my eyes.]
  install.packages("RGoogleDocs", repos = "http://www.omegahat.org/R",
                   type = "source")

 I now have
 Package: RGoogleDocs
 Version: 0.4-0
 Title:
 
 Maintainer: Duncan Temple Lang dun...@wald.ucdavis.edu
 Packaged: 2009-10-27 22:10:22 UTC; duncan
 Built: R 2.10.0; ; 2009-11-25 20:59:03 UTC; windows

 I am providing the following link to a copy of my RGoogleDocs zipped
 directory. It is for people who run R in windows and do not want to go
 through the pain of setting things up so that they can install source.
 http://dl.dropbox.com/u/23200/RGoogleDocs/RGoogleDocs.zip

 I BELIEVE that if one downloads the zip and extracts it to an empty
 directory called RGoogleDocs in one's Library

Re: [R] Build of XML package failed

2009-11-27 Thread Duncan Temple Lang
Hi Luis.

You can change the two lines

 PROBLEM buf
WARN;

to the one line

  warning(buf);


That should compile.

If not, please show us the compilation command for DocParse.c, i.e. all the 
arguments
to the compiler, just above the error messages.

 D.

Luis Tito de Morais wrote:
 Hi list,
 
 It may be a FAQ, but I searched the web and Uni of Newcastle Maths and Stats 
 and
 R mailing list archive on this issue but was unable to find a solution. I 
 would
 appreciate any pointer to help me solving this.
 
 I am using R version 2.10.0 (2009-10-26) on linux mandriva 2010.0
 
 I tried to install the XML_2.6-0.tar.gz package both with
 install.packages('XML', dep=T) from within R and the R CMD INSTALL using a
 local tar.gz file.
 
 I am having the following error message (sorry it is partly in french):
 
 Dans le fichier inclus à partir de DocParse.c:13:
 Utils.h:175:2: attention : #warning Redefining COPY_TO_USER_STRING to use
 encoding from XML parser
 DocParse.c: In function ‘notifyError’:
 DocParse.c:1051: erreur: le format n'est pas une chaîne littérale et pas
 d'argument de format
 
 This last error message means:
 error: format not a string literal and no format arguments
 
 In the past when having such errors with other packages, I have been able to
 solve it with the help of this tip:
 http://mario79t.wordpress.com/2009/06/23/warning-format-not-a-string-literal-and-no-format-arguments/
 
 and modifying the faulty source file accordingly.
 
 But in this specific case, I have been unable to find what to modify in the
 source file. Line 1051 in the DocParse.c source file only has the command
 WARN. I don't know anything about C programming and could not figure out 
 what
 to modify in this case.
 
 I would appreciate any help on this issue.
 
 Best regards,
 
 Tito
 


Re: [R] How to suppress errors generated by readHTMLTable?

2009-11-26 Thread Duncan Temple Lang

Just this morning, I made suppressing these parser messages
the default behavior for htmlParse() and that will apply
to readHTMLTable() also.

Until I release that (along with another potentially
non-backward compatible change regarding character encoding),
you can use

 readHTMLTable(htmlParse("index.html", error = function(...){}))

i.e. parse the document yourself and hand it to readHTMLTable().
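Since the error argument is an arbitrary function, the messages can also be collected rather than discarded. A sketch (assuming the handler's first argument is the message text; see ?xmlTreeParse for the exact callback signature):

```r
library(XML)

msgs <- character()
# malformed-on-purpose HTML, parsed with a handler that accumulates messages
doc <- htmlParse("<center><table><tr><td>x</td></tr></center></table>",
                 asText = TRUE,
                 error = function(msg, ...) msgs <<- c(msgs, msg))
tbl <- readHTMLTable(doc)
# msgs now holds any parser complaints; the document is still usable
```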

 D.

Peng Yu wrote:
 library(XML)
 
  download.file('http://polya.umdnj.edu/polya_db2/gene.php?llid=109079&unigene=&submit=Submit', 'index.html')
  tables = readHTMLTable("index.html", error = function(...){})
 tables
 
 
 readHTMLTable gives me the following errors. Could somebody let me
 know how to suppress them?
 
 
 Opening and ending tag mismatch: center and table
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 Opening and ending tag mismatch: td and tr
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 htmlParseEntityRef: expecting ';'
 Unexpected end tag : form
 Opening and ending tag mismatch: body and center
 Opening and ending tag mismatch: body and center
 


Re: [R] XML package example code?

2009-11-25 Thread Duncan Temple Lang


Peng Yu wrote:
 On Wed, Nov 25, 2009 at 12:19 AM, cls59 ch...@sharpsteen.net wrote:

 Peng Yu wrote:
  I'm interested in parsing an html page. I should use XML, right? Could
  somebody show me some example code? Is there a tutorial for this
 package?

 Did you try looking through the help pages for the XML package or browsing
 the Omegahat website?

 Look at:

  library(XML)
  ?htmlTreeParse

 And the relevant web page for documentation and examples is:

  http://www.omegahat.org/RSXML/
 
 
 http://www.omegahat.org/RSXML/shortIntro.html
 
 I'm trying the example on the above webpage. But I'm not sure why I
 got the following error. Would you help to take a look?
 
 
 $ Rscript main.R
 library(XML)

 download.file('http://www.omegahat.org/RSXML/index.html','index.html')
 trying URL 'http://www.omegahat.org/RSXML/index.html'
 Content type 'text/html; charset=ISO-8859-1' length 3021 bytes
 opened URL
 ==
 downloaded 3021 bytes
 
  doc = xmlInternalTreeParse("index.html")


You are trying to parse an HTML document as if it were XML.
But HTML is often not well-formed.  So use htmlParse()
for a more forgiving parser.

Or use the RTidyHTML package (www.omegahat.org/RTidyHTML)
to make the HTML well-formed before passing it to xmlTreeParse()
(aka xmlInternalTreeParse()). That package is an interface to
libtidy.
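For example, with a fragment that is acceptable HTML but not well-formed XML (a minimal sketch):

```r
library(XML)

bad <- "<html><body><dl><dt>term<dd>definition</body></html>"  # unclosed dt/dd
# xmlParse(bad, asText = TRUE)        # would fail: not well-formed XML
doc <- htmlParse(bad, asText = TRUE)  # the HTML parser supplies the closes
res <- xpathSApply(doc, "//dd", xmlValue)
```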

 D.


 Opening and ending tag mismatch: dd line 68 and dl
 Opening and ending tag mismatch: li line 67 and body
 Opening and ending tag mismatch: dt line 66 and html
 Premature end of data in tag dd line 64
 Premature end of data in tag li line 63
 Premature end of data in tag dt line 62
 Premature end of data in tag dl line 61
 Premature end of data in tag body line 5
 Premature end of data in tag html line 1
 Error: 1: Opening and ending tag mismatch: dd line 68 and dl
 2: Opening and ending tag mismatch: li line 67 and body
 3: Opening and ending tag mismatch: dt line 66 and html
 4: Premature end of data in tag dd line 64
 5: Premature end of data in tag li line 63
 6: Premature end of data in tag dt line 62
 7: Premature end of data in tag dl line 61
 8: Premature end of data in tag body line 5
 9: Premature end of data in tag html line 1
 Execution halted
 


Re: [R] problem post request with RCurl

2009-11-18 Thread Duncan Temple Lang
Use

 curlPerform(url = "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi",
             postfields = q)


That gives me:

<PCT-Data>
  <PCT-Data_output>
    <PCT-OutputData>
      <PCT-OutputData_status>
        <PCT-Status-Message>
          <PCT-Status-Message_status>
            <PCT-Status value="running"/>
          </PCT-Status-Message_status>
        </PCT-Status-Message>
      </PCT-OutputData_status>
      <PCT-OutputData_output>
        <PCT-OutputData_output_waiting>
          <PCT-Waiting>
            <PCT-Waiting_reqid>31406321645402938</PCT-Waiting_reqid>
          </PCT-Waiting>
        </PCT-OutputData_output_waiting>
      </PCT-OutputData_output>
    </PCT-OutputData>
  </PCT-Data_output>
</PCT-Data>
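If you want the reply in an R variable rather than echoed to the console, RCurl's text gatherer can be attached via the writefunction option. A sketch (the real call, commented out, assumes `q` is the XML query string from the original message and that the PubChem endpoint is reachable):

```r
library(RCurl)

# The gatherer accumulates text chunks handed to it by libcurl:
reader <- basicTextGatherer()

# In the real call, attach it via writefunction:
# curlPerform(url = "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi",
#             postfields = q, writefunction = reader$update)

reader$update("<PCT-Data>...")   # what libcurl does with each received chunk
response <- reader$value()       # everything gathered so far, as one string
```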

Rajarshi Guha wrote:
 Hi, I am trying to use a CGI service (Pubchem PUG) via RCurl and am
 running into a problem where the data must be supplied via POST - but I
 don't know the keyword for the argument.
 
 The data to be sent is an XML fragment. I can do this via the command
 line using curl: I save the XML string to a file called query.xml and
 then do
 
  curl -d @query.xml "http://pubchem.ncbi.nlm.nih.gov/pug/pug.cgi"
 
 I get the expected response. More importantly, the verbose option shows:
 
 Accept: */*
 Content-Length: 1227
 Content-Type: application/x-www-form-urlencoded
 
 However, when I try to do this via RCurl, the data doesn't seem to get
 sent:
 
  q <- "<PCT-Data>
    <PCT-Data_input><PCT-InputData>
      <PCT-InputData_query><PCT-Query>
        <PCT-Query_type><PCT-QueryType>
          <PCT-QueryType_qas><PCT-QueryActivitySummary>
            <PCT-QueryActivitySummary_output value=\"summary-table\">0</PCT-QueryActivitySummary_output>
            <PCT-QueryActivitySummary_type value=\"assay-central\">0</PCT-QueryActivitySummary_type>
            <PCT-QueryActivitySummary_scids>
              <PCT-QueryUids><PCT-QueryUids_ids><PCT-ID-List>
                <PCT-ID-List_db>pccompound</PCT-ID-List_db>
                <PCT-ID-List_uids><PCT-ID-List_uids_E>3243128</PCT-ID-List_uids_E></PCT-ID-List_uids>
              </PCT-ID-List></PCT-QueryUids_ids></PCT-QueryUids>
            </PCT-QueryActivitySummary_scids>
          </PCT-QueryActivitySummary></PCT-QueryType_qas>
        </PCT-QueryType></PCT-Query_type>
      </PCT-Query></PCT-InputData_query>
    </PCT-InputData></PCT-Data_input>
  </PCT-Data>"
 
  postForm(url, q, style = "post", .opts = list(verbose = TRUE))
 * About to connect() to pubchem.ncbi.nlm.nih.gov port 80 (#0)
 *   Trying 130.14.29.110... * connected
 * Connected to pubchem.ncbi.nlm.nih.gov (130.14.29.110) port 80 (#0)
 POST /pug/pug.cgi HTTP/1.1
 Host: pubchem.ncbi.nlm.nih.gov
 Accept: */*
 Content-Length: 0
 Content-Type: application/x-www-form-urlencoded
 
 As you can see, the data in q doesn't seem to get sent (content-length =
 0).
 
 Does anybody have any suggestions as to why the call to postForm doesn't
 work, but the command line call does?
 
 Thanks,
 
 
 Rajarshi Guha| NIH Chemical Genomics Center
 http://www.rguha.net | http://ncgc.nih.gov
 
 Q:  Why did the mathematician name his dog Cauchy?
 A:  Because he left a residue at every pole.
 


Re: [R] XML: Reading transition matrices into R

2009-11-12 Thread Duncan Temple Lang


stefan.d...@gmail.com wrote:
 Hello,
 from a software I have the following output in xml (see below):
 It is a series of matrices, for each age one. I have 3 categories
 (might vary in the application), hence, 3x3 matrices where each
 element gives the probability of transition from i to j. I would like
 read this data into R (preferably in a list, where each list element
 is one of the age specific matrices) and - after altering the values
 in R - write it back into the file.  I know that there is an xml
 package in R with which I have already struggled, but I have to admit
 my understanding is too limited. Maybe somebody had a similar problem
 or know the code of the top of his or her head.

Hi Stefan

  There are many approaches for handling this. I assume that the primary
obstacle you are facing is extracting the values from the XML.  The following
will do that for you.
We start with the content in transition.xml (or in a string in R).
Since the XML is very shallow, i.e. not very hierarchical, and all
the information is in the transition nodes under the root, we can
use xmlToList().
This returns a list with an element for each transition
element, and such elements are character vectors containing the
values from age, sex, from, to, and percent.
So I've put these into a matrix and you are now back entirely
within R and can group the values by age and arrange them into
the individual transition matrices.


   doc = xmlParse("transition.xml")
   matrix(as.numeric(unlist(xmlToList(doc))), , 5, byrow = TRUE,
          dimnames = list(NULL, names(xmlRoot(doc)[[1]])))
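From there, regrouping the rows into one 3x3 matrix per age/sex combination is plain R. A sketch with synthetic rows standing in for the parsed values (same column layout as above; the numbers are made up):

```r
# synthetic stand-in for the parsed data: two ages, one sex, a 3x3 block each,
# rows ordered from-major within each block, as in the XML
df <- data.frame(age  = rep(0:1, each = 9), sex = 0,
                 from = rep(rep(1:3, each = 3), 2),
                 to   = rep(1:3, times = 6),
                 percent = runif(18))

# one transition matrix per (age, sex) group; rows index 'from', columns 'to'
byGroup <- lapply(split(df, interaction(df$age, df$sex), drop = TRUE),
                  function(d) matrix(d$percent, nrow = 3, byrow = TRUE,
                                     dimnames = list(from = 1:3, to = 1:3)))
```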


 D.



 
 Any help appreciated.
 
 Thanks and best,
 Stefan
 
 
 
 
 
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 <transitionmatrix>
 <transition><age>0</age><sex>0</sex><from>1</from><to>1</to><percent>99.9</percent></transition>
 <transition><age>0</age><sex>0</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>0</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>0</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>0</sex><from>2</from><to>2</to><percent>99.85</percent></transition>
 <transition><age>0</age><sex>0</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>0</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>0</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>0</sex><from>3</from><to>3</to><percent>99.85</percent></transition>
 <transition><age>0</age><sex>1</sex><from>1</from><to>1</to><percent>100.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>2</from><to>2</to><percent>100.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>0</age><sex>1</sex><from>3</from><to>3</to><percent>100.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>1</from><to>1</to><percent>99.9</percent></transition>
 <transition><age>1</age><sex>0</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>2</from><to>2</to><percent>99.85</percent></transition>
 <transition><age>1</age><sex>0</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>0</sex><from>3</from><to>3</to><percent>99.85</percent></transition>
 <transition><age>1</age><sex>1</sex><from>1</from><to>1</to><percent>100.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>1</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>2</from><to>2</to><percent>100.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>2</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>3</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>3</from><to>2</to><percent>0.0</percent></transition>
 <transition><age>1</age><sex>1</sex><from>3</from><to>3</to><percent>100.0</percent></transition>
 <transition><age>2</age><sex>0</sex><from>1</from><to>1</to><percent>99.35205</percent></transition>
 <transition><age>2</age><sex>0</sex><from>1</from><to>2</to><percent>0.6479474</percent></transition>
 <transition><age>2</age><sex>0</sex><from>1</from><to>3</to><percent>0.0</percent></transition>
 <transition><age>2</age><sex>0</sex><from>2</from><to>1</to><percent>0.0</percent></transition>
 <transition><age>2</age><sex>0</sex><from>2</from><to>2</to><percent>97.101456</percent></transition>
 <transition><age>2</age><sex>0</sex><from>2</from><to>3</to><percent>2.8985496</percent></transition>
 

Re: [R] XML: Reading transition matrices into R

2009-11-12 Thread Duncan Temple Lang


stefan.d...@gmail.com wrote:
 Hello,
 thanks a lot. This is a form which I can work with in R.
 
 Another question, which I hope is not a bridge too far: Can I write
 the R matrix which your code created back into the xml format (i.e.
 with the same tags and structure) from which it came and hence feed it
 back to the original software?


trans = apply(xx, 1, function(x) {
           tr = newXMLNode("transition")
           mapply(newXMLNode, names(x), x, MoreArgs = list(parent = tr))
           tr
        })
top = newXMLNode("transitionmatrix", .children = trans)
saveXML(top, "newTransition.xml")




 
 Best,
 Stefan
 
 
 On Thu, Nov 12, 2009 at 3:17 PM, Duncan Temple Lang
 dun...@wald.ucdavis.edu wrote:

 stefan.d...@gmail.com wrote:
 Hello,
 from a software I have the following output in xml (see below):
 It is a series of matrices, for each age one. I have 3 categories
 (might vary in the application), hence, 3x3 matrices where each
 element gives the probability of transition from i to j. I would like
 read this data into R (preferably in a list, where each list element
 is one of the age specific matrices) and - after altering the values
 in R - write it back into the file.  I know that there is an xml
 package in R with which I have already struggled, but I have to admit
 my understanding is too limited. Maybe somebody had a similar problem
 or know the code of the top of his or her head.
 Hi Stefan

  There are many approaches for handling this. I assume that the primary
 obstacle you are facing is extracting the values from the XML.  The following
 will do that for you.
 We start with the content in transition.xml (or in a string in R).
 Since the XML is very shallow, i.e. not very hierarchical, and all
 the information is in the transition nodes under the root, we can
 use xmlToList().
 This returns a list with an element for each transition
 element, and such elements are character vectors containing the
 values from age, sex, from, to, and percent.
 So I've put these into a matrix and you are now back entirely
 within R and can group the values by age and arrange them into
 the individual transition matrices.


    doc = xmlParse("transition.xml")
    matrix(as.numeric(unlist(xmlToList(doc))), , 5, byrow = TRUE,
           dimnames = list(NULL, names(xmlRoot(doc)[[1]])))


  D.



 Any help appreciated.

 Thanks and best,
 Stefan






Re: [R] help with SSOAP (can't find working examples)

2009-11-03 Thread Duncan Temple Lang

Hi Steffen et al.

 The development version of SSOAP and XMLSchema I have on my machine
does complete the processWSDL() call without errors. I have to finish
off some tests before releasing these. It may take a few days before
I have time to work on this, but hopefully soon.

Thanks for the info.

 D.


Steffen Neumann wrote:
 Hi,
 
 I can confirm this, just today 
 I tried to write a web service client.
 Affected are both SSOAP-0.5-4 and SSOAP_0.4-6.
 
 I can't access anonymous CVS atm. to check for recent fixes.
 I am unable to map the error message to any of the items in 
 http://www.omegahat.org/SSOAP/Todo.html; is this already known?
 
 Yours,
 Steffen
 
 library(SSOAP)
 Loading required package: XML
 Loading required package: RCurl
 Loading required package: bitops
 w = processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
 Error: Cannot resolve ns:searchPeakDiff in SchemaCollection
 In addition: Warning messages:
 1: In function (node)  :
   skipping import node with no schemaLocation attribute
 2: In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") :
   Ignoring additional serviceport ... elements
 sessionInfo()
 R version 2.8.1 (2008-12-22) 
 x86_64-pc-linux-gnu 
 
 locale:
 LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base 
 
 other attached packages:
 [1] SSOAP_0.4-6RCurl_1.3-0bitops_1.0-4.1 XML_2.5-3   
 
 
 w = processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
 Error: Cannot resolve ns:searchPeakDiff in SchemaCollection
 In addition: Warning messages:
 1: In function (node)  :
   skipping import node with no schemaLocation attribute
 2: In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") :
   Ignoring additional serviceport ... elements
 
 Enter a frame number, or 0 to exit   
 
  1: processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl")
  2: lapply(tmp, processWSDLBindings, doc, types)
  3: FUN(X[[1]], ...)
  4: lapply(els, processWSDLOperation, types, doc, namespaceDefinitions, 
 typeDef
  5: FUN(X[[1]], ...)
  6: xmlSApply(msg, function(x) {
  7: xmlSApply.XMLNode(msg, function(x) {
  8: sapply(xmlChildren(X), FUN, ...)
  9: lapply(X, FUN, ...)
 10: FUN(X[[1]], ...)
 11: resolve(el, typeDefinitions)
 12: resolve(el, typeDefinitions)
 13: resolveError("Cannot resolve ", obj, " in ", class(context))
 
 
 




Re: [R] Error installing RSPerl.

2009-10-29 Thread Duncan Temple Lang
Hi Grainne

There is one likely cause. But before getting into the explanation,
can you send me the output from when you installed the package, e.g. the output 
from

 R CMD INSTALL   RSPerl

and any configuration arguments you specified.

You can send this to me off-list and we can summarize at the end.

 Thanks,
   D.

Grainne Kerr wrote:
 Dear list,
 
 I have updated to version R-2.10.0. When I try to load the RSPerl
 library I get the following error:
 
 library(RSPerl)
 Error in dyn.load(file, DLLpath = DLLpath, ...) :
   unable to load shared library
 '/usr/local/lib/R/library/RSPerl/libs/RSPerl.so':
   /usr/local/lib/R/library/RSPerl/libs/RSPerl.so: undefined symbol:
 boot_DB_File__Glob
 Error: package/namespace load failed for 'RSPerl'
 
 I do not know how to fix this. Can anyone please help?
 
 I'm runninn R on Ubuntu 9.04
 
 Many thanks,
 Grainne.
 
 sessionInfo()
 R version 2.10.0 (2009-10-26)
 i686-pc-linux-gnu
 
 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C  LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
  [9] LC_ADDRESS=C   LC_TELEPHONE=C
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 

