Thanks for the hand, guys! NekoHTML worked perfectly. I had to use it as a DOM parser first and then pass the resulting document to the Transformer as a DOMSource, but that's fine. Now I just need to adjust my XSL and I should be set!
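
For anyone finding this thread later, here is a rough sketch of the "parse to a DOM first, then hand it to the Transformer as a DOMSource" step. It is not the exact code I ended up with: the class and method names (DomSourceSketch, parseToDom, transform) are made up for illustration, and the JDK's own DocumentBuilder stands in for NekoHTML so the sketch runs without extra jars. With NekoHTML on the classpath, the assumption is that parseToDom would instead use org.cyberneko.html.parsers.DOMParser (call parse(InputSource), then getDocument()).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the names are illustrative only.
final class DomSourceSketch {

    // Stand-in for the NekoHTML step. This uses the strict JDK parser, so it
    // only accepts well-formed markup; a tolerant HTML parser would go here.
    static Document parseToDom(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not build DOM", e);
        }
    }

    // Runs the stylesheet against an already-built DOM tree via DOMSource,
    // so the Transformer never parses the original page text itself.
    static String transform(Document doc, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"http://example.org/\">link</a></body></html>";
        String xsl =
              "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"text\"/>"
            + "<xsl:template match=\"/\">"
            + "<xsl:value-of select=\"//a/@href\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";
        System.out.println(transform(parseToDom(page), xsl));
    }
}
```

One thing to check when adjusting the XSL: as far as I can tell, NekoHTML reports element names in upper case by default, so match patterns written for lower-case names may need changing (or the parser's name-case property can be set).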

Begin forwarded message:

From: Michael Glavassevich <[EMAIL PROTECTED]>
Date: August 21, 2007 11:22:28 AM EDT
To: xalan-j-users@xml.apache.org
Subject: [OT]: RE: Ignoring errors

FWIW, here's a link to the project on SF.net:
http://sourceforge.net/projects/nekohtml/

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:58:56 AM:

Hi, Michael -

Sorry to be contrary, but I don't see it on SF.net.
   http://sourceforge.net/search/?type_of_search=soft&words=neko

The page I found was at
http://people.apache.org/~andyc/neko/doc/index.html
It is a personal page, but on the apache.org site. Official or not, NEKO did the trick for me!

tlj
-----Original Message-----
From: Michael Glavassevich [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 21, 2007 10:47 AM
To: xalan-j-users@xml.apache.org
Cc: Michael Bauer; Dave Brosius; Timothy Jones
Subject: RE: Ignoring errors

NekoHTML is built on top of Xerces-J 2.x, but it's not an Apache project. I think Andy Clark (the creator) maintains it on SourceForge these days.

Thanks.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:24:44 AM:

I once had pretty good success parsing some sloppy HTML right off the
web through an HTTP proxy server with a parser called neko.  I can
provide code samples off-list if you need them.

It is also an apache offering.


Timothy Jones, Sr. Systems Engineer
Syniverse Technologies, Development, Tampa, FL
Work: (813) 637-5366
Cell: (813) 857-7650

From: Dave Brosius [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 21, 2007 9:37 AM
To: Michael Bauer
Cc: xalan-j-users@xml.apache.org
Subject: Re: Ignoring errors


No, but there are various HTML 'tidying' tools that you could use to
pre-parse the HTML before passing it to the transformer.



From: Michael Bauer <[EMAIL PROTECTED]>
Sent: 08/21/2007 09:33 AM
To: xalan-j-users@xml.apache.org
Subject: Ignoring errors




I am using Xalan/Xerces to parse out some data from a web page.  The
problem is that the web page is not well-formed, and running the
Transformer on it produces:

ERROR: 'Open quote is expected for attribute "href|".'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Open quote is expected for attribute "href|".'

Is there any way to instruct the parser/Transformer to ignore such
errors?


