I've been trying to rewrite the Jazilla parser code so that it doesn't run documents through JTidy when XML or XHTML is detected.

This is the code:
public void parse( InputStream input ) throws Exception {
    long time1 = System.currentTimeMillis();
    BufferedReader in = new BufferedReader(new InputStreamReader(input));
    final StringBuffer doctype_ = new StringBuffer();
    String document = "";
    int currentLine = 0;

    while (in.ready()) {
        currentLine++;
        String currentline = in.readLine();
        if (currentLine == 1) {
            doctype_.append(currentline);
        }
        document += currentline;
    }
    in.close();
    final String doctype = new String(doctype_);
    final StringReader strings = new StringReader(document);
    final String doc = document;
    // new Thread() {
    //     public void run() {
    try {
        if (doctype.startsWith("<?xml")) {
            context.getLogger().message(this, "XML detected");
            xmlReader.parse( new InputSource(strings));
        }
        else if (doctype.startsWith("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML")) {
            context.getLogger().message(this, "XHTML detected");
            xmlReader.parse( new InputSource(strings));
        }
        else {
            /* We resort to HTML Tidy when we haven't detected:
             * 1) XML
             * 2) An XHTML 1/2 DOCTYPE
             * JTidy can't do XHTML 2, and it isn't actively
             * maintained anyway */
            // in.reset();
            TidyInputSource tidy = new TidyInputSource( context );
            /* StringBufferInputStream is deprecated, but it's the easiest way
             * to feed Tidy */
            xmlReader.parse( tidy.getInputSource( new StringBufferInputStream(doc)) );
        }
    }
    catch (Exception e) {
        javax.swing.JOptionPane.showMessageDialog(null, e);
    }
    //     }
    // }.start();

    long time2 = System.currentTimeMillis();
    context.getLogger().message( this, "XML Parser time:" + ( time2 - time1 ) );
}
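Two details of the reading loop above may be worth ruling out before blaming Xerces. BufferedReader.ready() only says whether the next read won't block, so it can return false before the real end of the stream. And readLine() strips line terminators, so concatenating the lines of the sample below glues "-//W3C//DTD XHTML 1.0 Strict//EN" directly onto "http://www.w3.org/...", while the XML 1.0 grammar requires whitespace between the public and system identifiers. The class and method names here are hypothetical (not Jazilla code), and it assumes UTF-8 input, whereas the original uses the platform default charset; it is a sketch, not a drop-in replacement:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Jazilla: reads a whole stream into a String
// and decides whether it can go straight to the XML parser.
public class DocumentSniffer {

    // Read until readLine() returns null (the true end of stream) instead of
    // looping on ready(), and restore the line break that readLine() strips.
    // Assumption: the input is UTF-8.
    static String readAll(InputStream input) throws IOException {
        StringBuilder document = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                document.append(line).append('\n');
            }
        }
        return document.toString();
    }

    // The same two checks as the original parse(): an XML declaration or an
    // XHTML DOCTYPE at the very start of the document.
    static boolean looksLikeXml(String document) {
        return document.startsWith("<?xml")
            || document.startsWith("<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML");
    }

    public static void main(String[] args) throws IOException {
        String sample = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
            + "<html><body/></html>\n";
        String doc = readAll(new ByteArrayInputStream(
                sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.equals(sample)); // true: line breaks preserved
        System.out.println(looksLikeXml(doc));  // true
    }
}
```

Keeping the text in one String means the same content can be handed to either the XML parser or the Tidy path without rereading the input stream.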


The original code is at http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/jazilla/jazilla/org/netbeans/netbrowser/parser/DefaultParser.java?rev=1.2&content-type=text/vnd.viewcvs-markup

This is the XHTML:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<title>Jazilla Start Page</title>
</head>
<body><h1>Jazilla Milestone 2</h1>
<hr/>
Welcome to Jazilla. This is only a experimental release so don't expect
everything to work.
<hr/>
Need a changelog? See the CHANGES file at the root in the Jazilla distribution
<hr/>


  </body>
</html>

Xerces locks up when it reaches <body>. I've tried this code with Xerces 2.4 and 2.5.
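One possibility worth ruling out (a guess, not a confirmed diagnosis): even when it isn't validating, Xerces loads the external DTD by default, so this page makes it try to fetch xhtml1-strict.dtd from www.w3.org, and a slow or blocked fetch can look like a hang. Xerces documents a feature, http://apache.org/xml/features/nonvalidating/load-external-dtd, that can be set to false. The sketch below instead uses the standard SAX EntityResolver to hand back an empty DTD, so no network access happens. The class name is hypothetical, it uses the JDK's bundled parser rather than a specific Xerces jar, and with the DTD skipped, XHTML named entities such as &nbsp; become undefined (a local copy of the DTDs would avoid that):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical test harness, not Jazilla code: parses an XHTML string with an
// EntityResolver that never goes to the network for the external DTD.
public class OfflineXhtmlParse {

    static int countElements(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        XMLReader reader = SAXParserFactory.newInstance()
                .newSAXParser().getXMLReader();

        // Return an empty document for every external entity (here the
        // xhtml1-strict.dtd system ID) instead of fetching it from www.w3.org.
        // Caveat: named entities such as &nbsp; become undefined.
        reader.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("")));

        // Count start tags so the caller can see the whole body was parsed.
        int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xhtml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String page = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
            + "<html><head><title>Jazilla Start Page</title></head>\n"
            + "<body><h1>Jazilla Milestone 2</h1><hr/></body></html>\n";
        System.out.println(countElements(page)); // 6
    }
}
```

The lambda needs Java 8 or later; on the older JDKs that Xerces 2.4/2.5 targeted, an anonymous EntityResolver class does the same job.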


