Hi Ryusuke
I would use the encoding parameter of htmlParse() and
download and parse the content in one operation:
htmlParse("http://home.sina.com", encoding = "UTF-8")
If you want to use getURL() in RCurl, use the .encoding parameter.
You didn't tell us the output of Sys.getlocale().
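A minimal offline sketch of the encoding argument, assuming the XML package is installed. The HTML string is a made-up stand-in for a downloaded page; the commented calls are the ones suggested above and need network access.

```r
library(XML)

# Calls suggested in the message above (need network access, so left commented):
# doc  <- htmlParse("http://home.sina.com", encoding = "UTF-8")
# page <- RCurl::getURL("http://home.sina.com", .encoding = "UTF-8")

# Hypothetical stand-in for a UTF-8 page, so the example runs offline.
html <- enc2utf8("<html><body><p>caf\u00e9</p></body></html>")

# asText = TRUE parses the string itself rather than treating it as a file name;
# encoding = "UTF-8" declares how the bytes should be interpreted.
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# Extract the text of every <p> node.
p <- xpathSApply(doc, "//p", xmlValue)
print(p)
```

Whether the text then prints correctly at the console still depends on the locale, which is why the output of Sys.getlocale() matters.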
Hi Prof,
Thank you for your reply. Sorry, I left out the information below.
Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
I have just noticed that
Hi All,
First method:-
library(XML)
theurl <- "http://home.sina.com"
download.file(theurl, "tmp.html")
txt <- readLines("tmp.html")
txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
head(grep(" ", g, value=TRUE))
[1] ç¹é« |
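The garbled output above is consistent with UTF-8 bytes being interpreted as a single-byte Western encoding. A small sketch of that effect in base R follows; the two-character string is an assumed example, not text confirmed to be on the page.

```r
# Assumed example text: two CJK characters given as Unicode escapes.
x <- "\u7e41\u9ad4"

# The same UTF-8 bytes with the encoding mark dropped.
bytes <- rawToChar(charToRaw(enc2utf8(x)))

# Misreading those bytes as Latin-1 turns each byte into its own character.
garbled <- iconv(bytes, from = "latin1", to = "UTF-8")

# Declaring the correct encoding recovers the original text.
fixed <- bytes
Encoding(fixed) <- "UTF-8"

print(nchar(garbled, type = "chars"))
print(identical(fixed, x))
```

The bytes themselves are unchanged throughout; only the declared encoding decides whether they display as the intended characters.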
Thanks. Interestingly, your code works on my Mac (10.6.1) but not on my
Win XP machine. See the sessionInfo() output below.
Mac R:
sessionInfo()
R version 2.9.2 (2009-08-24)
i386-apple-darwin8.11.1
locale:
fi_FI.UTF-8/fi_FI.UTF-8/C/C/fi_FI.UTF-8/fi_FI.UTF-8
attached base packages:
[1] stats graphics
Hi,
I'm trying to get data from a web page and modify it in R. I have a
problem with encoding. I'm not able to get the
encoding right in the htmlTreeParse command. See below.
library(RCurl)
library(XML)
site <- getURL("http://www.aarresaari.net/jobboard/jobs.html")
txt <- readLines(tc <-
Hi Lauri.
I am in the process of making some changes
to the encoding in the XML package. I'll take a look
over the next few days. (Not certain precisely when.)
D.
Lauri Nikkinen wrote:
Hi,
I'm trying to get data from web page and modify it in R. I have a
problem with encoding. I'm not
Thanks, looking forward to that!
Happy New Year!
-Lauri
2009/12/31 Duncan Temple Lang <dun...@wald.ucdavis.edu>:
Hi Lauri.
I am in the process of making some changes
to the encoding in the XML package. I'll take a look
over the next few days. (Not certain precisely when.)
D.
Lauri
In the meantime, try this.
library(XML)
theurl <- "http://www.aarresaari.net/jobboard/jobs.html"
download.file(theurl, "tmp.html")
txt <- readLines("tmp.html")
txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
head(grep(" ", g,