On 06/07/2018 12:54 AM, Eric S. Eberhard wrote:
> I know I am the oddball here but -- why use DTDs at all?
I gave reasons above. I am working on a tool, and how people use the tool
is not under my control. Maybe we can focus on the opportunity to
improve libxml2 a bit here.
> I supply software to a lot of companies (thousands through
> dealers). Many exchange millions of XML docs per day. I've used this
> since it was libxml. Even have some patches in there. My application
> is proprietary (meaning the XML to place an order or tell a customer our
> availability is simply XML I designed, documented, and give to my
> customers' customers via download from a Web page). Once they get
> it working it pretty much always works. They write software to create
> orders and send them to us -- it is consistent (I know, not everyone
> has this luxury so this may not apply to everyone). So why check them?
>
> I also found that I was getting a gagillion support tickets because of
> DTDs ... simple things like a date ... seem to escape people -- take
> June 7, 2018
>
> In our date fields we will take:
> Jun 7 2018
> June 7 2018
> the above with commas and any case (upper/lower/mixed)
> 6/7/18
> 6/7/2018
> 2018/6/7
> 20180607
> 180607
> 06-07-18
>
> And actually many many more. Anything that is a date goes through
> this one routine and if there is any way in the world to extract a
> date, we do.
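For anyone who wants to try the same idea, here is a minimal C sketch of
a "one routine for every date" parser. It is an illustration, not the
routine described above: `parse_lenient_date` and its helpers are
hypothetical names, it assumes US month/day/year order for inputs like
6/7/18 and 06-07-18, it treats two-digit years as 20yy, and it only does
a basic range check on month and day.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Map "jan".."dec" (any case, or a longer spelling like "June") to 1..12, else 0. */
static int month_from_name(const char *name)
{
    static const char *months[] = {"jan", "feb", "mar", "apr", "may", "jun",
                                   "jul", "aug", "sep", "oct", "nov", "dec"};
    if (strlen(name) < 3)
        return 0;
    for (int i = 0; i < 12; i++)
        if (strncasecmp(name, months[i], 3) == 0)
            return i + 1;
    return 0;
}

/* Assumption: two-digit years mean 20yy. */
static int expand_year(int y) { return y < 100 ? 2000 + y : y; }

/* Basic range check only (it does not know that June has 30 days). */
static int store(int *year, int *month, int *day, int y, int m, int d)
{
    if (m < 1 || m > 12 || d < 1 || d > 31)
        return -1;
    *year = y;
    *month = m;
    *day = d;
    return 0;
}

/* Hypothetical lenient parser: returns 0 and fills year/month/day, or -1. */
int parse_lenient_date(const char *s, int *year, int *month, int *day)
{
    char buf[64], name[16], s1, s2;
    size_t n = 0, k = 0;
    int a, b, c;

    /* Copy the input with commas turned into spaces: "June 7, 2018" -> "June 7  2018". */
    for (; *s != '\0' && n < sizeof buf - 1; s++)
        buf[n++] = (*s == ',') ? ' ' : *s;
    buf[n] = '\0';

    const char *p = buf;
    while (isspace((unsigned char)*p))
        p++;

    /* Month name first: "Jun 7 2018", "JUNE 7, 2018", ... */
    if (isalpha((unsigned char)*p)) {
        while (isalpha((unsigned char)*p) && k < sizeof name - 1)
            name[k++] = *p++;
        name[k] = '\0';
        a = month_from_name(name);
        if (a == 0 || sscanf(p, "%2d %4d", &b, &c) != 2)
            return -1;
        return store(year, month, day, expand_year(c), a, b);
    }

    /* Digits only: 20180607 (YYYYMMDD) or 180607 (YYMMDD). */
    n = strspn(p, "0123456789");
    if (n == 8 && sscanf(p, "%4d%2d%2d", &a, &b, &c) == 3)
        return store(year, month, day, a, b, c);
    if (n == 6 && sscanf(p, "%2d%2d%2d", &a, &b, &c) == 3)
        return store(year, month, day, expand_year(a), b, c);

    /* Separated: 2018/6/7 (year first), or 6/7/18 and 06-07-18 (assumed month/day/year). */
    if (sscanf(p, "%4d%c%4d%c%4d", &a, &s1, &b, &s2, &c) == 5 &&
        (s1 == '/' || s1 == '-') && s2 == s1) {
        if (a > 999)
            return store(year, month, day, a, b, c);
        return store(year, month, day, expand_year(c), a, b);
    }
    return -1;
}
```

The field widths in the `sscanf` formats keep oversized digit runs from
overflowing an `int`. Ambiguous inputs such as 6/7/18 are exactly where a
lenient parser has to pick a convention; this sketch picks month first.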
>
> Ditto money -- say $1,245.56
>
> We accept:
> $1,245.56
> 1245.56
> 124556 (a decimal point is implied 2 places from the right if none is
> found)
> 1,245.56
>
> And many more - same thing, one routine reads it and if we can
> possibly get a reasonable number, we do.
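A similar sketch for money, again in C and again only an illustration:
`parse_lenient_money` is a hypothetical name. It works in integer cents
to avoid floating-point rounding, skips `$`, commas, and spaces, applies
the implied two decimal places when no decimal point is present, and
returns -1 for anything else.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical lenient parser: stores the amount in cents and returns 0,
   or returns -1 if the text cannot be read as money. */
int parse_lenient_money(const char *s, long long *cents)
{
    long long whole = 0, frac = 0;
    int seen_digit = 0, seen_point = 0, frac_digits = 0;

    for (; *s != '\0'; s++) {
        unsigned char ch = (unsigned char)*s;
        if (isdigit(ch)) {
            seen_digit = 1;
            if (!seen_point) {
                if (whole > LLONG_MAX / 10000)
                    return -1;                 /* refuse absurd amounts rather than overflow */
                whole = whole * 10 + (ch - '0');
            } else if (frac_digits < 2) {      /* truncate beyond two decimal places */
                frac = frac * 10 + (ch - '0');
                frac_digits++;
            }
        } else if (ch == '.' && !seen_point) {
            seen_point = 1;
        } else if (ch == '$' || ch == ',' || isspace(ch)) {
            continue;                          /* decoration: skip it */
        } else {
            return -1;                         /* anything else: give up */
        }
    }
    if (!seen_digit)
        return -1;
    if (!seen_point)
        *cents = whole;                        /* "124556": last two digits are cents */
    else
        *cents = whole * 100 + (frac_digits == 1 ? frac * 10 : frac); /* "12.5" -> 1250 */
    return 0;
}
```

All four example inputs from the list above come out as 124556 cents.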
>
> This, in turn, reduced our CONSTANT support tickets for silly things
> like a format of something to ZERO. Which I like.
>
> Even sicker -- we ignore case on tags. All of our XML is designed to
> not use duplicate names with different cases (a stupid thing to do
> anyway -- expecting orderNumber and OrderNumber to both be used, as
> different things).
>
> As long as the customer is consistent and the XML is well formed we
> scan the tree and compare tags without regard to case. A WHOLE LOT
> more support tickets gone.
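Case-insensitive tag matching on an already parsed tree can be sketched
as a depth-first walk. To keep the example self-contained, `struct node`
below is a hypothetical stand-in for libxml2's `xmlNode` with only
`name`, `children`, and `next`; with libxml2 itself the same walk would
also check `node->type == XML_ELEMENT_NODE` and could compare names with
`xmlStrcasecmp()`. `find_tag_nocase` is likewise a made-up name.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical stand-in for libxml2's xmlNode: only the fields this sketch needs. */
struct node {
    const char *name;
    struct node *children;  /* first child */
    struct node *next;      /* next sibling */
};

/* Search n, its following siblings, and all their descendants (depth first)
   for the first node whose name equals tag, ignoring case. */
const struct node *find_tag_nocase(const struct node *n, const char *tag)
{
    for (; n != NULL; n = n->next) {
        if (strcasecmp(n->name, tag) == 0)
            return n;
        const struct node *hit = find_tag_nocase(n->children, tag);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

This only works under the condition stated above: the schema never uses
two names that differ only in case, so the first match is the right one.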
>
> A lot of the people we deal with are not sophisticated. As the
> receiver of XML we decided it was much better to be as flexible as
> possible and take what we can. After all -- a DTD
> can indeed tell you if an address comes in without a city name, and
> reject it, and usually generate a support ticket. But we use an
> on-line AVS system (more XML), and if we have the zip and the address
> otherwise matches ... we don't need the city and state ... the AVS
> system provides them. And if it fails they will get an error back from
> us (from the application) anyway. So why use a DTD to see if the city
> or state were sent? A LOT MORE support calls removed.
>
> And, of course, performance without the DTDs is much better.
>
> As a result we are able to give documentation to new customers and
> they are able to get it up and running with little to no help. Any
> serious errors we cannot fix are clearly explained in the responses BY
> THE APPLICATION and not by a DTD.
>
> Being flexible on our end reduces support tickets, which is all I
> care about. I would rather code for every mistake I can think of that
> an end user would make (and we add new ones when they crop up) than be
> strict and do a lot of support. We don't think DTDs are flexible
> enough. And I hate making them :-)
>
> We do offer a page with DTDs they can use manually to check their
> document if they like -- or they can send it to our test system. Once
> they are running they seem to do just fine.
>
> As programmers it is hard to believe but sometimes it is better for us
> to make slightly less efficient code in order to make the human aspect
> much more efficient. I once had someone send me a link to a "contest"
> that was a convoluted C statement, asking readers to work out what the
> result would be. My response -- "fire the programmer!"
>
> If hundreds of competent C programmers try to read a line of code and
> only a small percentage get the right answer, it is bad code. And for
> people's information, modern processors read ahead and execute code
> speculatively, guided by branch prediction and other heuristics.
> Simple C code is easy for them to handle ... but convoluted code ends
> up defeating that speculative execution and is actually slower -- it
> may have fewer lines of code, but it will be slower. I see nothing
> wrong with short, clear, clean code with as little craziness as
> possible. This is the same with XML
> -- one c