I once had pretty good success parsing some sloppy HTML straight off the
web, through an HTTP proxy server, with a parser called Neko (NekoHTML).
I can provide code samples off-list if you need them.

It is also an Apache offering.
 
Timothy Jones
Sr. Systems Engineer
Syniverse Technologies
Development, Tampa, FL
Work: (813) 637-5366
Cell: (813) 857-7650

________________________________

From: Dave Brosius [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 21, 2007 9:37 AM
To: Michael Bauer
Cc: xalan-j-users@xml.apache.org
Subject: Re: Ignoring errors



No, but there are various html 'tidying' tools that you could use to
preparse the html before passing to the transformer. 
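
As a rough sketch of that preparse-then-transform flow, using only the
JDK's JAXP classes: the strict parser rejects the unquoted attribute, but
once the markup has been cleaned up the resulting DOM can be handed to the
Transformer through a DOMSource. The class name, method names, sample
markup and stylesheet below are all illustrative assumptions, not from
this thread, and the hand-cleaned string merely stands in for the output
of whichever tidying tool is actually used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class name, for illustration only.
public class PreparseSketch {

    /** Parses markup with the JDK's XML parser; throws if it is not well-formed. */
    public static Document parseStrict(String markup) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { /* ignored */ }
            public void error(SAXParseException e) { /* ignored */ }
            // Well-formedness violations (such as an unquoted attribute
            // value) are reported here as fatal errors, so rethrow them.
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        return builder.parse(new InputSource(new StringReader(markup)));
    }

    /** Runs a small stylesheet over an already-parsed DOM and returns each href on its own line. */
    public static String extractLinks(Document doc) throws Exception {
        String xslt =
            "<xsl:stylesheet version='1.0'"
          + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='/'>"
          + "<xsl:for-each select='//a'>"
          + "<xsl:value-of select='@href'/><xsl:text>&#10;</xsl:text>"
          + "</xsl:for-each>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // The Transformer never sees the raw page, only the clean DOM.
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String sloppy =
            "<html><body><a href=http://example.org/>link</a></body></html>";
        try {
            parseStrict(sloppy);
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // Stand-in for the output of a tidying tool (assumption).
        String tidied =
            "<html><body><a href=\"http://example.org/\">link</a></body></html>";
        System.out.print(extractLinks(parseStrict(tidied)));
    }
}
```

With a real tidying parser, the parseStrict step on the cleaned string
would be replaced by whatever call that tool provides to produce a DOM (or
SAX events); the Transformer side stays the same.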



From: Michael Bauer <[EMAIL PROTECTED]>
Sent: 08/21/2007 09:33 AM
To: xalan-j-users@xml.apache.org
Subject: Ignoring errors




I am using Xalan/Xerces to parse some data out of a web page.  The
problem is that the web page is not well-formed, and running the
Transformer on it produces:

ERROR:  'Open quote is expected for attribute "href|".'
ERROR:  'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException:
Open quote is expected for attribute "href|".'

Is there any way to instruct the parser/Transformer to ignore such errors?
