FWIW, here's a link to the project on SF.net: http://sourceforge.net/projects/nekohtml/

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]

"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:58:56 AM:

> Hi, Michael -
>
> Sorry to be contrary, but I don't see it on SF.net.
> http://sourceforge.net/search/?type_of_search=soft&words=neko
>
> The page I found was at
> http://people.apache.org/~andyc/neko/doc/index.html. It is a personal
> page, but on the apache.org site. Official or not, NEKO did the trick
> for me!
>
> tlj
>
> -----Original Message-----
> From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 21, 2007 10:47 AM
> To: xalan-j-users@xml.apache.org
> Cc: Michael Bauer; Dave Brosius; Timothy Jones
> Subject: RE: Ignoring errors
>
> NekoHTML is built on top of Xerces-J 2.x but it's not an Apache project. I
> think Andy Clark (the creator) maintains it on SourceForge these days.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
> E-mail: [EMAIL PROTECTED]
>
> "Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:24:44 AM:
>
> > I once had pretty good success parsing some sloppy HTML right off the
> > web through an HTTP proxy server with a parser called neko. I can
> > provide code samples off-list if you need them.
> >
> > It is also an apache offering.
> >
> > Timothy Jones
> > Syniverse Technologies
> > Work: (813) 637-5366
> > Sr. Systems Engineer
> > Cell: (813) 857-7650
> > Development, Tampa, FL
> >
> > From: Dave Brosius [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, August 21, 2007 9:37 AM
> > To: Michael Bauer
> > Cc: xalan-j-users@xml.apache.org
> > Subject: Re: Ignoring errors
> >
> > No, but there are various HTML 'tidying' tools that you could use to
> > preparse the HTML before passing it to the transformer.
> >
> > Michael Bauer <[EMAIL PROTECTED]>
> > 08/21/2007 09:33 AM
> > To: xalan-j-users@xml.apache.org
> > cc:
> > Subject: Ignoring errors
> >
> > I am using Xalan/Xerces to parse out some data from a web page. The
> > problem is that the web page is not well-formed, and running the
> > Transformer on it produces:
> > ERROR: 'Open quote is expected for attribute "href|".'
> > ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException
> > : Open quote is expected for attribute "href|".'
> > Is there any way to instruct the Parser/Transformer to ignore such
> > errors?
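
On the original question in this thread: the XML specification treats well-formedness violations as fatal errors, so a conforming parser cannot simply be told to skip them. That is why the replies point to preparsing with a tidying tool or NekoHTML. The following is a minimal sketch using only the JDK's JAXP classes. The names `WellFormedCheck`, `parseOrNull`, and `serialize` and the sample markup are made up for illustration and do not come from the thread. It shows an unquoted attribute being rejected, while the quoted version parses and passes through an identity Transformer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class, for illustration only.
public class WellFormedCheck {

    /** Parses the markup as XML; returns null if it is not well-formed. */
    public static Document parseOrNull(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler so nothing is printed to stderr.
            // Fatal (well-formedness) errors are rethrown: they end the parse.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) { }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (SAXException e) {
            return null; // not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs an identity transform over the document and returns the text. */
    public static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sloppy = "<a href=x>link</a>";      // unquoted attribute value
        String clean = "<a href=\"x\">link</a>";   // same markup, well-formed

        System.out.println("sloppy parsed: " + (parseOrNull(sloppy) != null));
        Document doc = parseOrNull(clean);
        System.out.println("clean parsed:  " + (doc != null));
        if (doc != null) {
            System.out.println(serialize(doc));
        }
    }
}
```

A tidying step such as NekoHTML would sit in front of `parseOrNull`, turning the sloppy HTML into a well-formed tree before Xalan sees it.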