NekoHTML is built on top of Xerces-J 2.x, but it's not an Apache project. I think Andy Clark (the creator) maintains it on SourceForge these days.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:24:44 AM:

> I once had pretty good success parsing some sloppy HTML right off
> the web through an HTTP proxy server with a parser called neko. I
> can provide code samples off-list if you need them.
>
> It is also an apache offering.
>
> Timothy Jones
> Syniverse Technologies
> Work
> (813) 637-5366
> Sr. Systems Engineer
> Cell
> (813) 857-7650
> Development, Tampa, FL
>
> From: Dave Brosius [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 21, 2007 9:37 AM
> To: Michael Bauer
> Cc: xalan-j-users@xml.apache.org
> Subject: Re: Ignoring errors
>
> No, but there are various html 'tidying' tools that you could use to
> preparse the html before passing to the transformer.
>
> Michael Bauer <[EMAIL PROTECTED]>
> 08/21/2007 09:33 AM
>
> To
> xalan-j-users@xml.apache.org
> cc
> Subject
> Ignoring errors
>
> I am using Xalan/Xerces to parse out some data from a web page. The
> problem is that the web page is not well-formed, and running the
> Transformer on it produces:
> ERROR: 'Open quote is expected for attribute "href|".'
> ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException
> : Open quote is expected for attribute "href|".'
> Is there anyway to instruct the Parse/Transformer to ignore such errors?
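
As a rough illustration of why the answer to the original question is "no": an unquoted attribute value is a well-formedness violation, which the XML parser reports as a fatal error, and swallowing that error in an ErrorHandler does not make the parse succeed. The sketch below uses only the JAXP parser bundled with the JDK; the class name IgnoreErrorsDemo and the method parsesWhenErrorsAreSwallowed are made up for this example, and the sample markup is an assumption, not the page from the original report.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of Xalan or Xerces.
public class IgnoreErrorsDemo {

    /**
     * Tries to parse the markup with an ErrorHandler that silently
     * swallows every warning, error and fatal error it is given.
     * Returns true only if the parser still manages to build a DOM.
     */
    public static boolean parsesWhenErrorsAreSwallowed(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) { }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (Exception e) {
            // The parser gives up after a fatal error even though the
            // handler above did not rethrow it.
            return false;
        }
    }

    public static void main(String[] args) {
        // Unquoted attribute value: not well-formed XML.
        System.out.println(parsesWhenErrorsAreSwallowed("<a href=page.html>x</a>"));
        // Same element with the attribute quoted: well-formed.
        System.out.println(parsesWhenErrorsAreSwallowed("<a href=\"page.html\">x</a>"));
    }
}
```

With the default JDK parser this should report false for the unquoted attribute and true for the quoted one, which is why the input has to be repaired by a tidying step or an HTML-aware parser before it reaches the transformer.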