Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Duncan Temple Lang
Hi Ryusuke, I would use the encoding parameter of htmlParse() and download and parse the content in one operation: htmlParse("http://home.sina.com", encoding = "UTF-8"). If you want to use getURL() in RCurl, use the .encoding parameter. You didn't tell us the output of Sys.getlocale()
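A minimal sketch of the encoding argument suggested above, not taken from the thread. It parses an in-memory UTF-8 string instead of the live page so that it runs offline; the sample markup and the variable names are assumptions for illustration only. For a real page, the URL would be passed as the first argument, as in the htmlParse() call above.

```r
library(XML)

# Hypothetical stand-in for downloaded content: UTF-8 text with no
# charset declaration, so the parser has to be told the encoding.
html <- enc2utf8("<html><body><p>\u7e41\u9ad4</p></body></html>")

# encoding = "UTF-8" tells the parser how to interpret the input.
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Extract the text of every <p> node.
paragraphs <- xpathSApply(doc, "//p", xmlValue)
print(paragraphs)
```

With RCurl, the analogous step would be getURL(url, .encoding = "UTF-8") before handing the returned text to htmlParse(..., asText = TRUE). Whether the result then prints correctly can still depend on the locale reported by Sys.getlocale().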

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Ryusuke Kenji
Hi Prof, Thank you for your reply. Sorry that I left out the information below.
  Sys.getlocale()
  [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
I have just noticed that

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-02 Thread Ryusuke Kenji
Hi All, First method:
  library(XML)
  theurl <- "http://home.sina.com"
  download.file(theurl, "tmp.html")
  txt <- readLines("tmp.html")
  txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
  g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
  head(grep(" ", g, value=T))
  [1] 繁體 |

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-01-01 Thread Lauri Nikkinen
Thanks. Interestingly, your code works on my Mac (10.6.1) but not on my Win XP machine. See the sessionInfo() output below.
Mac R:
  sessionInfo()
  R version 2.9.2 (2009-08-24)
  i386-apple-darwin8.11.1
  locale: fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8
  attached base packages:
  [1] stats graphics

[R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Lauri Nikkinen
Hi, I'm trying to get data from a web page and modify it in R. I have a problem with encoding: I'm not able to get the encoding right in the htmlTreeParse command. See below.
  library(RCurl)
  library(XML)
  site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
  txt <- readLines(tc <-

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Duncan Temple Lang
Hi Lauri. I am in the process of making some changes to the encoding in the XML package. I'll take a look over the next few days. (Not certain precisely when.) D. Lauri Nikkinen wrote: Hi, I'm trying to get data from web page and modify it in R. I have a problem with encoding. I'm not

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Lauri Nikkinen
Thanks, looking forward to that! Happy New Year! -Lauri 2009/12/31 Duncan Temple Lang dun...@wald.ucdavis.edu: Hi Lauri. I am in the process of making some changes to the encoding in the XML package. I'll take a look over the next few days. (Not certain precisely when.)  D. Lauri

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2009-12-31 Thread Eduardo Leoni
In the meantime, try this.
  library(XML)
  theurl <- "http://www.aarresaari.net/jobboard/jobs.html"
  download.file(theurl, "tmp.html")
  txt <- readLines("tmp.html")
  txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
  g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
  head(grep(" ", g,