Thanks for the hand, guys! NekoHTML worked perfectly. I had to use
it as a DOM parser first and then pass the resulting document to the
Transformer as a DOMSource, but that's fine. Now I just need to adjust
my XSL and I should be set!
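
For anyone finding this thread later, the parse-to-DOM-then-DOMSource
approach described above might look roughly like the sketch below. The
class name, method names, and stylesheet are made up for illustration.
To keep the sketch runnable with only the JDK, the JDK's own
DocumentBuilder stands in for the lenient parser; the comment in
parseToDom shows where NekoHTML's DOMParser would go instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DomSourceSketch {

    // Hypothetical stylesheet: prints the href of every <a> element, one per line.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//a'>"
      + "<xsl:value-of select='@href'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Step 1: turn markup into a DOM Document.
    // Stand-in only: this uses the strict JDK parser, so the input must be
    // well-formed. With NekoHTML on the classpath, the body would instead be:
    //   org.cyberneko.html.parsers.DOMParser p =
    //       new org.cyberneko.html.parsers.DOMParser();
    //   p.parse(new InputSource(new StringReader(markup)));
    //   return p.getDocument();
    static Document parseToDom(String markup) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XSLT processors expect namespace-aware DOMs
        return dbf.newDocumentBuilder()
                  .parse(new InputSource(new StringReader(markup)));
    }

    // Step 2: hand the already-parsed DOM to the Transformer as a DOMSource,
    // so the Transformer never runs its own XML parser over the raw page.
    static String transform(Document doc, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseToDom(
            "<html><body><a href=\"http://example.org/\">x</a></body></html>");
        System.out.print(transform(doc, XSL));
    }
}
```

One thing to check when adjusting the XSL: as far as I recall, NekoHTML
reports element names in upper case by default, so match patterns may
need to use A and HTML rather than a and html unless the parser is
configured otherwise.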
Begin forwarded message:
From: Michael Glavassevich <[EMAIL PROTECTED]>
Date: August 21, 2007 11:22:28 AM EDT
To: xalan-j-users@xml.apache.org
Subject: [OT]: RE: Ignoring errors
FWIW, here's a link to the project on SF.net:
http://sourceforge.net/projects/nekohtml/
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:58:56 AM:
Hi, Michael -
Sorry to be contrary, but I don't see it on SF.net.
http://sourceforge.net/search/?type_of_search=soft&words=neko
The page I found was at
http://people.apache.org/~andyc/neko/doc/index.html. It is a personal
page, but on the apache.org site. Official or not, NEKO did the trick
for me!
tlj
-----Original Message-----
From: Michael Glavassevich [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 21, 2007 10:47 AM
To: xalan-j-users@xml.apache.org
Cc: Michael Bauer; Dave Brosius; Timothy Jones
Subject: RE: Ignoring errors
NekoHTML is built on top of Xerces-J 2.x but it's not an Apache
project. I think Andy Clark (the creator) maintains it on SourceForge
these days.
Thanks.
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
"Timothy Jones" <[EMAIL PROTECTED]> wrote on 08/21/2007 10:24:44 AM:
I once had pretty good success parsing some sloppy HTML right off the
web through an HTTP proxy server with a parser called neko. I can
provide code samples off-list if you need them.
It is also an Apache offering.
Timothy Jones
Sr. Systems Engineer
Syniverse Technologies
Development, Tampa, FL
Work: (813) 637-5366
Cell: (813) 857-7650
From: Dave Brosius [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, August 21, 2007 9:37 AM
To: Michael Bauer
Cc: xalan-j-users@xml.apache.org
Subject: Re: Ignoring errors
No, but there are various HTML 'tidying' tools that you could use to
pre-parse the HTML before passing it to the transformer.
From: Michael Bauer <[EMAIL PROTECTED]>
Date: 08/21/2007 09:33 AM
To: xalan-j-users@xml.apache.org
Subject: Ignoring errors
I am using Xalan/Xerces to parse out some data from a web page. The
problem is that the web page is not well-formed, and running the
Transformer on it produces:
ERROR: 'Open quote is expected for attribute "href|".'
ERROR: 'com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Open quote is expected for attribute "href|".'
Is there any way to instruct the Parser/Transformer to ignore such
errors?
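
For context on the error above: an unquoted attribute value is a
well-formedness violation, which an XML parser has to treat as a fatal
error, so there is no switch to skip it and carry on. The sketch below
reproduces the failure with the JDK's built-in parser. The class and
method names are made up for illustration, and the exact message text
depends on the parser and locale, so the code only checks whether
parsing failed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class UnquotedAttributeSketch {

    // Returns null if the markup parses as XML, otherwise the parser's
    // error message. The default DocumentBuilder may also print the
    // fatal error to stderr.
    static String tryParse(String markup) throws Exception {
        try {
            DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Quoted attribute value: well-formed, parses fine.
        System.out.println(tryParse("<a href=\"page.html\">link</a>"));
        // Unquoted attribute value: common in real-world HTML, fatal in XML.
        System.out.println(tryParse("<a href=page.html>link</a>"));
    }
}
```

This is why the fix has to happen before the Transformer sees the page,
either by tidying the HTML or by using a lenient HTML parser to build
the DOM.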