Re: [R] htmlParse hangs or crashes

2011-09-28 Thread Jorge Cornejo
Hi, I have been having the same problem all afternoon, and I just realized that
it only happens when I use 64-bit R (R64); the 32-bit version does not crash.

This must be a bug in the 64-bit build of this R version.

I hope this helps.
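
Since the crash seems to depend on whether R is a 32-bit or 64-bit build, it can help to state the build explicitly when reporting it. The following is only a minimal sketch using base R; the variable names pointer_bits and build_info are illustrative and not part of any package.

```r
# Pointer size in bytes is 8 on a 64-bit build of R and 4 on a 32-bit build.
pointer_bits <- .Machine$sizeof.pointer * 8L

# Collect the R version, the architecture string, and the word size in one line.
build_info <- sprintf("R %s, arch %s, %d-bit",
                      as.character(getRversion()),
                      R.version$arch,
                      pointer_bits)

cat(build_info, "\n")
```

Pasting that one line next to the output of sessionInfo() makes it clear which build produced the crash.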

--
View this message in context: 
http://r.789695.n4.nabble.com/htmlParse-hangs-or-crashes-tp3792285p3853858.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] htmlParse hangs or crashes

2011-09-06 Thread Duncan Temple Lang

Hi Simon

 Unfortunately, it works for me on my OS X machine, so I can't reproduce the
problem.
 I'd be curious to know which version of libxml2 you are using. That might be
the cause of the problem.
You can find this with

  library(XML)
  libxmlVersion()

 You might try installing a more recent version (e.g. libxml2 >= 2.7.0).

 You can send the info to me off list and we can try to resolve the problem.


 htmlParse() returns a reference to the internal C-level XML tree/document.
When you print the value of the variable .x, we then serialize that C-level
data structure to a string.

 htmlTreeParse(), by default, converts that C-level XML tree/document into
regular R objects. So it traverses the tree and creates those R list()s before
it returns, and then throws the C-level tree away.
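
The difference between the two functions can be sketched on a small in-memory page. This is only an illustration under stated assumptions: the HTML string and the file names article1.html and article2.html are made up and stand in for the real search-results page, and nothing here is claimed to reproduce or fix the crash. It also shows one way to pull link targets out of the C-level document with an XPath query, without printing the document.

```r
library(XML)

# A tiny inline page stands in for the real search-results URL (assumption).
html <- '<html><body>
  <a href="article1.html">First</a>
  <a href="article2.html">Second</a>
</body></html>'

# htmlParse() returns a reference to the C-level document.
doc <- htmlParse(html, asText = TRUE)
is_internal <- inherits(doc, "XMLInternalDocument")

# XPath queries run against the C-level tree; the document is never printed.
links <- xpathSApply(doc, "//a", xmlGetAttr, "href")

# Printing the document serializes it to a string; saveXML() is that same
# step called explicitly (it appears in the traceback of the crash).
serialized <- saveXML(doc)

# htmlTreeParse() converts the tree to regular R objects by default.
tree <- htmlTreeParse(html, asText = TRUE)
is_r_tree <- !inherits(tree, "XMLInternalDocument")
```

If the links are all that is needed, the xpathSApply() call alone may be enough, since it does not go through the print method.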

D.

On 9/5/11 2:48 PM, Simon Kiss wrote:
> Dear colleagues,
> Each time I use htmlParse, R crashes or hangs.  The URL I'd like to parse is
> included below, as are the results of a series of basic commands that describe
> what I'm experiencing.  The results of sessionInfo() are attached at the
> bottom of the message.
> The thing is, htmlTreeParse appears to work just fine, although it doesn't
> appear to contain the information I need (the URLs of the articles linked to
> on this search page).  Regardless, I'd still like to understand why htmlParse
> doesn't work.
> Thank you for any insight.
> Yours,
> Simon Kiss
>
>
> myurl <- c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")
>
> .x <- htmlParse(myurl)
>
> class(.x)
> # returns "HTMLInternalDocument" "XMLInternalDocument"
>
> .x
> # returns
> *** caught segfault ***
> address 0x1398754, cause 'memory not mapped'
>
> Traceback:
>  1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent),
>     as.character(encoding), as.logical(indent), PACKAGE = "XML")
>  2: saveXML(from)
>  3: saveXML(from)
>  4: asMethod(object)
>  5: as(x, "character")
>  6: cat(as(x, "character"), "\n")
>  7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
>  8: print(<pointer: 0x11656d3e0>)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
>
> sessionInfo()
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_3.4-0      RCurl_1.5-0    bitops_1.0-4.1
> *
> Simon J. Kiss, PhD
> Assistant Professor, Wilfrid Laurier University
> 73 George Street
> Brantford, Ontario, Canada
> N3T 2C9
> Cell: +1 905 746 7606
 


[R] htmlParse hangs or crashes

2011-09-05 Thread Simon Kiss
Dear colleagues,
Each time I use htmlParse, R crashes or hangs.  The URL I'd like to parse is
included below, as are the results of a series of basic commands that describe
what I'm experiencing.  The results of sessionInfo() are attached at the bottom
of the message.
The thing is, htmlTreeParse appears to work just fine, although it doesn't 
appear to contain the information I need (the URLs of the articles linked to on 
this search page).  Regardless, I'd still like to understand why htmlParse 
doesn't work.
Thank you for any insight.
Yours, 
Simon Kiss


myurl <- c("http://timesofindia.indiatimes.com/searchresult.cms?sortorder=score&searchtype=2&maxrow=10&startdate=2001-01-01&enddate=2011-08-25&article=2&pagenumber=1&isphrase=no&query=IIM&searchfield=&section=&kdaterange=30&date1mm=01&date1dd=01&date1=2001&date2mm=08&date2dd=25&date2=2011")

.x <- htmlParse(myurl)

class(.x)
# returns "HTMLInternalDocument" "XMLInternalDocument"

.x
# returns
*** caught segfault ***
address 0x1398754, cause 'memory not mapped'

Traceback:
 1: .Call("RS_XML_dumpHTMLDoc", doc, as.integer(indent),
    as.character(encoding), as.logical(indent), PACKAGE = "XML")
 2: saveXML(from)
 3: saveXML(from)
 4: asMethod(object)
 5: as(x, "character")
 6: cat(as(x, "character"), "\n")
 7: print.XMLInternalDocument(<pointer: 0x11656d3e0>)
 8: print(<pointer: 0x11656d3e0>)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/C/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] XML_3.4-0      RCurl_1.5-0    bitops_1.0-4.1
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606
