NekoHTML is built on top of Xerces-J 2.x, but it's not an Apache project. I
think Andy Clark (the creator) maintains it on SourceForge these days.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:24:44 AM:

> I once had pretty good success parsing some sloppy HTML right off
> the web through an HTTP proxy server with a parser called neko.  I
> can provide code samples off-list if you need them.
>
> It is also an apache offering.
>
>
> Timothy Jones
> Sr. Systems Engineer
> Syniverse Technologies
> Development, Tampa, FL
> Work: (813) 637-5366
> Cell: (813) 857-7650
>
> From: Dave Brosius [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 21, 2007 9:37 AM
> To: Michael Bauer
> Cc: xalan-j-users@xml.apache.org
> Subject: Re: Ignoring errors

>
> No, but there are various html 'tidying' tools that you could use to
> preparse the html before passing to the transformer.
>

>
> Michael Bauer <[EMAIL PROTECTED]>
> 08/21/2007 09:33 AM
>
> To: xalan-j-users@xml.apache.org
> cc:
> Subject: Ignoring errors
>
> I am using Xalan/Xerces to parse out some data from a web page.  The
> problem is that the web page is not well-formed, and running the
> Transformer on it produces:
> ERROR:  'Open quote is expected for attribute "href|".'
> ERROR:  'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException
> : Open quote is expected for attribute "href|".'
> Is there any way to instruct the Parser/Transformer to ignore such errors?
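
For anyone wanting to reproduce the failure described above, here is a minimal sketch using only the JAXP classes that ship with the JDK. The class name WellFormednessCheck and the sample markup are made up for illustration and are not from the original poster's code. It shows that an unquoted attribute value is reported as a fatal error by the XML parser, which is why the markup has to be cleaned up by a tidying or HTML-tolerant parser before it reaches the Transformer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper, for illustration only.
public class WellFormednessCheck {

    /** Returns true if the markup parses as well-formed XML, false on a fatal parse error. */
    public static boolean isWellFormed(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Replace the default handler, which prints "[Fatal Error] ..." to stderr.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { }
            public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // well-formedness violations end the parse
            }
        });

        try {
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Unquoted attribute value, as commonly found in real-world HTML.
        System.out.println(isWellFormed("<a href=page.html>link</a>"));
        // Same element with the attribute value quoted.
        System.out.println(isWellFormed("<a href=\"page.html\">link</a>"));
    }
}
```

The check only tells you whether the input is acceptable to an XML parser; it does not repair anything. A tolerant parser such as NekoHTML would take the place of the DocumentBuilder here and hand a DOM to the Transformer.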
