FWIW, here's a link to the project on SF.net:
http://sourceforge.net/projects/nekohtml/

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:58:56 AM:

> Hi, Michael -
>
> Sorry to be contrary, but I don't see it on SF.net.
>    http://sourceforge.net/search/?type_of_search=soft&words=neko
>
> The page I found was at
> http://people.apache.org/~andyc/neko/doc/index.html.  It is a personal
> page, but on the apache.org site.  Official or not, NEKO did the trick
> for me!
>
> tlj
> -----Original Message-----
> From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, August 21, 2007 10:47 AM
> To: xalan-j-users@xml.apache.org
> Cc: Michael Bauer; Dave Brosius; Timothy Jones
> Subject: RE: Ignoring errors
>
> NekoHTML is built on top of Xerces-J 2.x but it's not an Apache project. I
> think Andy Clark (the creator) maintains it on SourceForge these days.
>
> Thanks.
>
> Michael Glavassevich
> XML Parser Development
> IBM Toronto Lab
> E-mail: [EMAIL PROTECTED]
>
> "Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:24:44 AM:
>
> > I once had pretty good success parsing some sloppy HTML right off the
> > web through an HTTP proxy server with a parser called neko.  I can
> > provide code samples off-list if you need them.
> >
> > It is also an apache offering.
> >
> >
> > Timothy Jones
> > Syniverse Technologies    Work  (813) 637-5366
> > Sr. Systems Engineer      Cell  (813) 857-7650
> > Development, Tampa, FL
> >
> >
> > From: Dave Brosius [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, August 21, 2007 9:37 AM
> > To: Michael Bauer
> > Cc: xalan-j-users@xml.apache.org
> > Subject: Re: Ignoring errors
> >
> > No, but there are various HTML 'tidying' tools that you could use to
> > preparse the HTML before passing it to the transformer.
> >
> > Michael Bauer <[EMAIL PROTECTED]>
> > 08/21/2007 09:33 AM
> >
> > To: xalan-j-users@xml.apache.org
> > cc:
> > Subject: Ignoring errors
> >
> > I am using Xalan/Xerces to parse out some data from a web page.  The
> > problem is that the web page is not well-formed, and running the
> > Transformer on it produces:
> > ERROR:  'Open quote is expected for attribute "href|".'
> > ERROR:  'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Open quote is expected for attribute "href|".'
> > Is there any way to instruct the Parser/Transformer to ignore such
> > errors?
>
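
The "preparse, then transform" approach suggested in the thread can be sketched with the JDK's JAXP classes alone. This is only a rough illustration, not code from anyone in the thread: the class name `TidyThenTransform`, the helpers `parseStrict`, `isWellFormed` and `transform`, the sample markup and the stylesheet are all hypothetical. It shows that a strict XML parser rejects an unquoted attribute outright, and that a `Transformer` will happily take an already-built DOM through `DOMSource`, which is where the output of a tidying parser would be plugged in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TidyThenTransform {

    // Hypothetical stylesheet for illustration: writes out the href of every <a>.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"//a\"><xsl:value-of select=\"@href\"/></xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Strict XML parse. Stand-in for the step where a tidying HTML parser
    // would build the DOM instead; fails on any well-formedness error.
    static Document parseStrict(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors rather than printing them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }

    // True if the strict parser accepts the markup.
    static boolean isWellFormed(String markup) {
        try {
            parseStrict(markup);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // Runs the stylesheet over a DOM that has already been built.
    static String transform(Document doc, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String sloppy = "<html><body><a href=page.html>link</a></body></html>";
        String clean = "<html><body><a href=\"page.html\">link</a></body></html>";
        System.out.println("sloppy accepted: " + isWellFormed(sloppy));
        System.out.println("clean accepted:  " + isWellFormed(clean));
        System.out.println("hrefs: " + transform(parseStrict(clean), XSLT));
    }
}
```

With NekoHTML on the classpath, the idea would be to build the `Document` with its `org.cyberneko.html.parsers.DOMParser` (`parse(InputSource)` followed by `getDocument()`) in place of `parseStrict`, and hand that to `transform`. Check its documentation for how it cases element names before writing XPath expressions against the result.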
