Howdy all,
I'm working on a short project where I am parsing a page that happens to
contain some nodes that cause REXML to die -- some specific examples are:
<page _extended="true" user:="user:" per="per" Views="Views" />
<[EMAIL PROTECTED] _extended="true" />
<j _extended="true" 221,546="221,546" />
The nodes with @, : and , all throw:
c:/ruby/lib/ruby/site_ruby/1.8/rexml/parsers/treeparser.rb:90:in `parse':
#<REXML::ParseException: malformed XML: missing tag start
(REXML::ParseException)
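For reference, here is a minimal sketch that reproduces the failure outside Watir, using only REXML and the `:` and `,` examples above. The `parses?` helper is a hypothetical name introduced for illustration, and the exact exception message may differ between REXML versions.

```ruby
require 'rexml/document'

# Hypothetical helper: true if REXML can parse the fragment, false otherwise.
def parses?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

samples = [
  # Control case: the same node without the offending attribute.
  '<page _extended="true" per="per" Views="Views" />',
  # Attribute name ending in a colon.
  '<page _extended="true" user:="user:" per="per" Views="Views" />',
  # Attribute name starting with a digit and containing a comma.
  '<j _extended="true" 221,546="221,546" />'
]

samples.each do |xml|
  puts "#{parses?(xml) ? 'ok  ' : 'FAIL'} #{xml}"
end
```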
I've hacked in a workaround (see below) that massages the HTML source
before passing it to REXML, but then I have to search the resulting
Document object myself for the nodes I am looking for (instead of using
the spiffy IE.elements_by_xpath).
Any tips on getting Watir to be happy with lousy XML source?
--john
# Hack for the Watir::IE object to return an XML document that has been
# scrubbed of offending node names from the html source
#
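module Watir
  class IE
    def xml_source
      # Build the document with an XML declaration and a single root node.
      xmlSource = html_source(document.body,
        "<?xml version=\"1.0\" encoding=\"us-ascii\"?>\n<HTML>\n", " ")
      xmlSource += "\n</HTML>\n"
      # REXML does not define the HTML entity &nbsp;, so swap it for a space.
      xmlSource = xmlSource.gsub(/&nbsp;/, ' ')
      # Strip the characters that are not legal in the node names.
      xmlSource = xmlSource.gsub(/user:/, 'user')
      xmlSource = xmlSource.gsub(/@/, '_')
      xmlSource = xmlSource.gsub(/,/, '')
      return REXML::Document.new(xmlSource)
    end
  end
end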
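
The scrub above also rewrites every `@` and `,` in attribute values and text. A narrower variant, as a rough sketch only: rewrite just the attribute names, then query the resulting document with REXML::XPath. The `scrub_attribute_names` helper and its regex are assumptions made for illustration. It does not handle bad tag names, it turns legitimate prefixed names such as `xml:lang` into `xml_lang`, and it could touch text content that happens to look like ` name="`.

```ruby
require 'rexml/document'

# Hypothetical helper: make attribute names XML-safe while leaving
# attribute values untouched. Only names followed by =" are considered.
def scrub_attribute_names(xml)
  xml.gsub(/(\s)([^\s=<>"'\/]+)(=")/) do
    lead, name, tail = $1, $2, $3
    # Replace anything outside a conservative name alphabet with "_".
    safe = name.gsub(/[^A-Za-z0-9_.-]/, '_')
    # XML names may not start with a digit, "." or "-".
    safe = "_#{safe}" unless safe =~ /\A[A-Za-z_]/
    "#{lead}#{safe}#{tail}"
  end
end

source = <<XML
<HTML>
<page _extended="true" user:="user:" per="per" Views="Views" />
<j _extended="true" 221,546="221,546" />
</HTML>
XML

doc  = REXML::Document.new(scrub_attribute_names(source))
page = REXML::XPath.first(doc, '//page')
j    = REXML::XPath.first(doc, '//j')

puts page.attributes['user_']
puts j.attributes['_221_546']
```

Because the values survive unchanged, an XPath such as `//j[@_extended="true"]` can still be matched against the original attribute text.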