This goes to the whole valid vs. legal argument for xml. Most parsers I am aware of, and the ones I have used, default to parsing legal (that is, well-formed) xml, and requiring valid xml is an option. Also, the dom documentation in witango talks only of parsing, not validating, and the two are usually not considered one and the same.
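
To illustrate the distinction, here is a minimal sketch in Python. Python's standard library is used only for illustration; it is not what witango runs. Its expat-based parser checks well-formedness only, so a document that breaks its own DTD still parses, while a document with a missing end tag does not. The helper name is_well_formed and both sample documents are made up for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the bytes parse as XML; no DTD or schema validation is done."""
    try:
        minidom.parseString(xml_bytes)
        return True
    except ExpatError:
        return False


# Well-formed, but NOT valid: the DTD says <doc> must contain <myvalue>.
legal_but_invalid = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (myvalue)>
  <!ELEMENT myvalue (#PCDATA)>
]>
<doc><other>text</other></doc>"""

# Not well-formed: <myvalue> is never closed.
broken = b"<doc><myvalue>text</doc>"

print(is_well_formed(legal_but_invalid))  # True
print(is_well_formed(broken))             # False
```

A validating parser would reject the first document as well; a non-validating one only rejects the second.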

The NWS xml is definitely LEGAL xml; however (and the jury is not quite in on this for me), it is not valid according to the implementation of xerces in witango.

Finally, after all this time with these issues on the list, we are seeing why some of the problems are occurring: it seems that witango is requiring XML that is both strictly VALID and LEGAL. IMO, it should only require legal, unless we specify otherwise in an option, and at the very least this should be documented.

Phil, you make it sound like we should know better than to try to parse invalid XML. Yet we have no guidance on what witango wants or requires.

Also, to recap the xml implementation in witango 5.5: first, it has major text encoding issues. It will only accept ascii and iso-8859-1 encoded xml, regardless of the xml declaration. Ignoring the xml declaration, in and of itself, seems to make the witango implementation invalid.
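
A short Python sketch of what that kind of mismatch looks like in general. It only shows what happens when UTF-8 bytes are read as iso-8859-1; that witango does exactly this internally is an assumption based on the behaviour described above, not something taken from its documentation.

```python
text = "Moiré, ü©®"
utf8_bytes = text.encode("utf-8")

# Correct: honour the encoding named in the xml declaration.
as_declared = utf8_bytes.decode("utf-8")

# What a consumer that assumes iso-8859-1 sees instead: every two-byte
# UTF-8 sequence turns into two separate Latin-1 characters.
as_latin1 = utf8_bytes.decode("iso-8859-1")

print(as_declared)
print(as_latin1)
print(len(text), "characters,", len(utf8_bytes), "bytes")
```

Plain ascii characters survive either way, which is why the problem only shows up once a character above 127 appears in the data.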

And now, we find out for the first time, it requires strictly validated xml. I guess someone needs to contact the national weather service and set them straight.

As I am writing, I just saw your recent post. The list is littered with issues of text encoding and xml. I have also posted primers and tools to aid in this process. Sometimes it errors, and sometimes it just screws up the encoding of special chars, like é and others. Here is an example that won't work properly in witango; it may parse, but the characters will be messed up.

<?xml version="1.0" encoding="UTF-8"?>
<doc>
<myvalue>Moiré, ü©®</myvalue>
</doc>
 
I believe this xml is both valid and legal, but witango can't handle it. I have to send the raw binary out to a bean, transform it to ascii, and return it to witango in order to parse and use it. I have found other examples (I will have to dig around for them) that won't even parse, but will if you remove the declaration.
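
The "transform to ascii" step can be sketched roughly as follows, in Python rather than as a bean. This is not the witangoxmlcleaner code; the function name to_ascii_xml and the choice of a US-ASCII declaration are assumptions made for the example. It replaces every non-ascii character with a numeric character reference, which is only safe for text and attribute values, not for CDATA sections, comments, or non-ascii element names.

```python
import re
from xml.dom import minidom

XML_DECL = re.compile(r"^<\?xml[^>]*\?>\s*")


def to_ascii_xml(raw: bytes, source_encoding: str = "utf-8") -> bytes:
    """Re-encode XML bytes as pure ASCII, using numeric character references."""
    text = raw.decode(source_encoding).lstrip("\ufeff")  # drop a BOM if present
    text = XML_DECL.sub("", text, count=1)               # drop the old declaration
    body = text.encode("ascii", "xmlcharrefreplace")     # é becomes &#233;
    return b'<?xml version="1.0" encoding="US-ASCII"?>\n' + body


sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<doc>\n<myvalue>Moiré, ü©®</myvalue>\n</doc>\n"
).encode("utf-8")

ascii_xml = to_ascii_xml(sample)
print(ascii_xml.decode("ascii"))

# Round trip: a conforming parser turns the references back into characters.
value = minidom.parseString(ascii_xml).getElementsByTagName("myvalue")[0]
print(value.firstChild.data)
```

Because the output contains nothing above 127, it reads the same whether the consumer treats it as ascii, iso-8859-1, or UTF-8.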

You say we should fix the xml that is not valid. But we didn't create it. We are just trying to use the xml that the world provides. All of it is legal, but some of it is not valid, like the feed from the national weather service, or from banks, retailers, or whoever. And as far as I know, UTF-8 is the standard, and usually the default, encoding for xml, but witango can't handle that. So how do we fix that? You have given us a tool with big limitations and strict compliance, and no methods to work around or fix them.

Even the bean approach has limits. It works great if the xml is relatively small. But if it is a large chunk of xml, it stalls in the bean, like the blob-to-bean issue I mentioned the other day with jmagick. The bean handler seems to choke on larger amounts of data. External actions have limitations on env var size, and take a long time to instantiate. So the only REAL safe approach is to inefficiently write the xml out to disk without touching it (for fear of screwing up the encoding), run it through an external action, and read it back from disk. Or don't use witango at all. Or don't use large xml files, and use my witangoxmlcleaner bean.
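
For the disk round trip, the external step could look something like this Python sketch. It is an illustration only, not an actual witango external action; the function name, file names, and chunk size are made up. It streams the file in chunks so a large document never has to sit in memory at once, and it leaves the xml declaration alone, since pure ascii output is still correct under a UTF-8 declaration. The same caveat applies as with any character-reference rewrite: it is not safe inside CDATA sections or comments.

```python
import codecs
import os
import tempfile


def transcode_file_to_ascii(src_path, dst_path, source_encoding="utf-8",
                            chunk_size=64 * 1024):
    """Stream src_path to dst_path, replacing non-ASCII characters with &#N; references."""
    decoder = codecs.getincrementaldecoder(source_encoding)()
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            # final=True on the last (empty) read flushes buffered bytes and
            # raises if the input ends in the middle of a character.
            text = decoder.decode(chunk, final=not chunk)
            dst.write(text.encode("ascii", "xmlcharrefreplace"))
            if not chunk:
                break


# Usage: a tiny chunk size forces multi-byte characters to straddle reads.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "in.xml")
dst_path = os.path.join(workdir, "out.xml")
with open(src_path, "wb") as f:
    f.write("<doc><myvalue>Moiré, ü©®</myvalue></doc>".encode("utf-8"))

transcode_file_to_ascii(src_path, dst_path, chunk_size=3)
with open(dst_path, "rb") as f:
    result = f.read()
print(result.decode("ascii"))
```

The incremental decoder is what keeps a character split across two reads from being corrupted.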

And for the most part, we had to figure all this out without your help, or documentation.

And for the record (Jason), I am moving to php, that is true, but I have several clients on this list who use witango every day and pay me to help them work through these issues.

-- 

Robert Garcia
President - BigHead Technology
VP Application Development - eventpix.com
13653 West Park Dr
Magalia, Ca 95954
ph: 530.645.4040 x222 fax: 530.645.4040

On Apr 26, 2006, at 11:59 PM, Robert Garcia wrote:

Yes, it seems that the witango xerces implementation is validating the schema against the xsd schema file. And for whatever reason, is declaring the xml schema invalid.

Now, this xml is from the National Weather Service. RB, VB, Java 1.5, Xerces-J without the extra validations, and so forth, all parse it without error. Could it be that your implementation of Xerces in witango is too strict, and is causing errors where they are not seen elsewhere? The text encoding was a hunch, cuz this xml is gzipped from NWS, and I have had to work through a ton of text encoding issues in witango; as you know, those are present.

I looked at the XSD file, and it seems valid to me, except for the double mention as you saw, but whether that is allowed or not, I have no idea. But the "two_day_history_url" element doesn't seem to be a problem.

My point is, there have been several examples of xml that parses just about everywhere else, and not in witango. And the error this throws, should probably be a warning, or a setting as to whether or not the parser should be that strict.

If the xerces implementation in witango is validating to the nth degree, don't you think it should be torqued down a bit, or that there should be a parameter in the <@dom> tag to choose a strict or a non-strict implementation?

Cuz the issue is, many of us have seen witango error on xml that parses fine elsewhere, and so we take the approach, "You can't trust witango to parse xml reliably", and find other workarounds. With the current method, we would have to create a bean or something to test the xml for encoding issues, then for validity, before passing it to the xml parser. Which is why I had to write the witangoxmlcleaner bean.

This also goes back to documentation. You said to look at the readme to see what library witango uses, and also to "read the xerces manual". But we have no idea how you have implemented xerces in witango, and I have shown that the same library will parse this xml. So our only recourse is pull-your-hair-out trial and error. And it is very frustrating. This kind of development without documentation is what negates the claim of ease of use and rapid development.

On Apr 26, 2006, at 11:14 PM, Ben Johansen wrote:



From: Phil Wade [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, April 26, 2006 10:48 PM
To: [email protected]
Subject: Re: Witango-Talk: Parsing DOM Response for element by name

 

Robert,

If it is text encoding, as you suggest, there would have to be some character above ASCII 127 to cause a problem, and as far as I can see there isn't. Removing the namespace makes the XML parse, which tends to indicate there is a problem with validation against the schema.

Your sample code is a pretty simplistic approach: you do not initialise the validator, so your parser may not be getting the schema and validating the XML against it, which would allow it to parse without error. You may want to check out the following xerces methods to see what state the parser validator has been initialised to. There is more to it than just getting the text and parsing it when schemas are involved.

getValidationEnabled
setSchemaValidationEnabled
getSchemaValidationEnabled
setSchemaFullCheckingEnabled
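
As a rough analogue of querying the validator state, here is a Python sketch using the standard SAX feature flags. These are not the xerces methods listed above and say nothing about how witango configures xerces; the sketch only shows the general idea of asking a parser whether validation is on before drawing conclusions from a successful parse.

```python
import xml.sax
from xml.sax import SAXNotSupportedException
from xml.sax.handler import feature_validation

parser = xml.sax.make_parser()  # expat-based reader by default

# Ask the parser whether it is currently validating.
validation_on = bool(parser.getFeature(feature_validation))
print("validation enabled:", validation_on)

# Try to switch validation on; a non-validating parser refuses.
try:
    parser.setFeature(feature_validation, True)
    can_validate = True
except SAXNotSupportedException:
    can_validate = False
print("can validate:", can_validate)
```

A parser that reports validation as off will accept any well-formed document, so a clean parse there does not show the document is valid against its schema.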

 

 

Regards

Phil

________________________________________________________________________
TO UNSUBSCRIBE: Go to http://www.witango.com/developer/maillist.taf
