At 04:04 PM 1/6/00 -0800, Assaf Arkin wrote:
>+1 on on/off feature
>+1 on on by default (i.e. no whitespace unless said otherwise)
>
>+1 on documenting that you have to go trim your text nodes and ignore
>others if the DTD is missing (conclusion: always use some DTD)

Seems to me it's kind of a waste of time for an app writer to count on help from the DOM here, simply because there are going to be lots of times when you don't have a DTD or schema or whatever. Thus, you're going to have to write the code to nuke the superfluous whitespace anyhow, so why not just curse a little bit and then do it? Your app essentially *knows* which elements don't have #PCDATA content anyhow, so it has the information it needs.

What would be nice, though, would be a library somewhere in Apache-land that does with whitespace more or less exactly what HTML does, given as input a set of tags that are:

- element-content-only (e.g. <html:dl>)
- block-level text containers (e.g. <html:p>)
- inline text containers (e.g. <html:i>)

It turns out there are quite a lot of subtleties, but given the above information, you can get whitespace pretty well right.

One of the nice things about doing database rather than document processing is that you don't have to deal with whitespace. Sigh. -T.
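
For illustration only, here is a rough sketch of what such a helper might look like, written against Python's standard xml.dom.minidom rather than any Apache library. The function name normalize_whitespace and its three set parameters are hypothetical, the tag names are unprefixed for brevity, and the rules are a deliberate simplification: whitespace-only text is dropped inside element-content-only tags, runs of whitespace are collapsed to one space inside block and inline text containers, and the leading and trailing space of a block is trimmed. It ignores most of the subtleties mentioned above (xml:space, nested blocks, whitespace adjacent to inline boundaries).

```python
import re
from xml.dom import Node

# XML whitespace characters: space, tab, carriage return, newline.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(elem, element_only, block, inline):
    """Hypothetical helper: tidy whitespace under `elem` in place.

    element_only, block and inline are sets of tag names supplied by
    the application (an assumption of this sketch, not a DOM feature).
    """
    name = elem.tagName
    # Iterate over a copy because children may be removed along the way.
    for child in list(elem.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if name in element_only:
                # Whitespace-only text here is just indentation: drop it.
                if child.data.strip(" \t\r\n") == "":
                    elem.removeChild(child)
            elif name in block or name in inline:
                # Collapse each run of whitespace to a single space.
                child.data = _WS_RUN.sub(" ", child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            normalize_whitespace(child, element_only, block, inline)

    if name in block:
        # Trim the outer edges of a block-level text container.
        first, last = elem.firstChild, elem.lastChild
        if first is not None and first.nodeType == Node.TEXT_NODE:
            first.data = first.data.lstrip(" ")
        if last is not None and last.nodeType == Node.TEXT_NODE:
            last.data = last.data.rstrip(" ")
```

A caller would parse a document with xml.dom.minidom.parseString and then invoke something like normalize_whitespace(doc.documentElement, {"dl"}, {"p"}, {"i"}). Elements not named in any of the three sets are left untouched, which seems the safest default when the app has said nothing about them.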