Dear colleagues,
Each time I use htmlParse, R crashes or hangs.  The URL I'd like to parse is 
included below, as are the results of a series of basic commands that show 
what I'm experiencing.  The output of sessionInfo() is attached at the bottom 
of the message.
The thing is, htmlTreeParse appears to work just fine, although its result 
doesn't appear to contain the information I need (the URLs of the articles 
linked to on this search page).  Regardless, I'd still like to understand why 
htmlParse doesn't work.
Thank you for any insight.
Yours, 
Simon Kiss


myurl<-c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1yyyy=2001&date2mm=08&date2dd=25&date2yyyy=2011")

.x<-htmlParse(myurl)

class(.x)
#returns "HTMLInternalDocument" "XMLInternalDocument" 

.x
#returns
*** caught segfault ***
address 0x1398754, cause 'memory not mapped'

Traceback:
 1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent), as.character(encoding), as.logical(indent), PACKAGE = "XML")
 2: saveXML(from)
 3: saveXML(from)
 4: asMethod(object)
 5: as(x, "character")
 6: cat(as(x, "character"), "\n")
 7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
 8: print(<pointer: 0x11656d3e0>)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
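
One thing the traceback suggests: the fault is raised in print.XMLInternalDocument / saveXML, i.e. when the document is auto-printed at the prompt, rather than in htmlParse itself. Below is a minimal sketch, assuming the parsed tree is otherwise usable (not confirmed for this page), of pulling the link targets out with XPath without ever printing the object. The inline HTML and its hrefs are made-up stand-ins for the search page, so the example runs offline.

```r
library(XML)  # same package as listed in sessionInfo()

# Hypothetical stand-in for the search-results page (not the real markup).
html <- '<html><body>
  <a href="/articleshow/1.cms">First</a>
  <a href="/articleshow/2.cms">Second</a>
</body></html>'

# Parse from a string rather than from a URL.
doc <- htmlParse(html, asText = TRUE)

# Query the tree directly. Typing `doc` at the prompt would call
# print -> saveXML, which is the step that appears in the traceback.
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")
```

With the real page, htmlParse(myurl) would take the place of the asText call; whether the XPath query then succeeds where printing fails is an open question here.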

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] XML_3.4-0      RCurl_1.5-0    bitops_1.0-4.1
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
