Smith, Peter wrote:

Hi,

I want to filter the contents of an XML document using a DTD, so that
only the elements and attributes that are valid according to the DTD are
passed through to the output. Any data that does not conform to the DTD
should be filtered out.  The document can be large, so I suspect a
SAX-type filter is the way to go.  My problem is figuring out how to
integrate a validating parser and a filter. It
looks like something that should be easy to do, but I can't find any
javadoc or examples to help me.

Thanks in advance,
Pete Smith



This is not as easy as it may seem. If the DTD is complicated, you must decide which elements to filter out. That decision needs a heuristic, and a heuristic is hard to design when the filter has to work with any DTD. For example, take the following rule: element A may contain either one B or any number of Cs, i.e. <!ELEMENT A (B | C*)>. Suppose that an A contains many Cs, but a B accidentally occurs before all of them. What do you do now, discard the B or all the Cs? Either choice yields valid content, and the DTD does not say which one was intended. Because of this, I think the filter cannot be implemented so easily.
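To make the ambiguity concrete, here is a minimal Java sketch. It is only an illustration: it encodes the content model (B | C*) as a regular expression over one-letter child names, which is an assumption made for brevity and not how a DTD validator works internally. The class name ContentModelAmbiguity is hypothetical.

```java
import java.util.regex.Pattern;

final class ContentModelAmbiguity {
    // Content model of A, (B | C*), written as a regex over one-letter
    // child names. This encoding is an illustrative assumption.
    static final Pattern MODEL = Pattern.compile("B|C*");

    // True if the whole sequence of children matches the content model.
    static boolean valid(String children) {
        return MODEL.matcher(children).matches();
    }

    public static void main(String[] args) {
        String input = "BCCC";                     // a stray B before the Cs
        String withoutB = input.replace("B", "");  // repair 1: drop the B
        String withoutCs = input.replace("C", ""); // repair 2: drop the Cs

        System.out.println(input + " valid: " + valid(input));         // false
        System.out.println(withoutB + " valid: " + valid(withoutB));   // true
        System.out.println(withoutCs + " valid: " + valid(withoutCs)); // true
    }
}
```

Both repairs produce valid content, so the filter needs some extra rule to choose between them.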
However, there is a way. You can use the automaton model of MSV (the Multi-Schema Validator, a Sun product; just google "msv"). In this model you have an Acceptor object that consumes the elements of the document one by one. If an element (or text) is rejected, the Acceptor signals the failure, so you know it was invalid and should be discarded. The problem is that this simple algorithm keeps the B and discards all the C elements in the previous example, so a more sophisticated algorithm is needed if that is not the result you want. If you are interested, I can post some examples of how to work with the Acceptor. However, the MSV package itself contains some good examples, so try downloading it first.
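The simple discard-on-reject algorithm can be sketched as follows. Note that this is not MSV code: ChildAcceptor is a hypothetical, hand-written stand-in that hard-codes the content model (B | C*) and has a single accept method, whereas the real MSV Acceptor interface is richer and has different methods. The sketch only shows why the greedy approach commits to whichever branch it sees first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an acceptor of (B | C*); NOT MSV's Acceptor API. */
final class ChildAcceptor {
    private enum State { START, SAW_B, SAW_C }

    private State state = State.START;

    /** Consumes one child name; on rejection returns false and keeps its state. */
    boolean accept(String name) {
        switch (state) {
            case START:
                if (name.equals("B")) { state = State.SAW_B; return true; }
                if (name.equals("C")) { state = State.SAW_C; return true; }
                return false;
            case SAW_C:
                return name.equals("C"); // only further Cs may follow a C
            default:
                return false;            // SAW_B: nothing may follow the B
        }
    }
}

final class GreedyChildFilter {
    /** Keeps every child the acceptor takes and drops every child it rejects. */
    static List<String> filter(List<String> children) {
        ChildAcceptor acceptor = new ChildAcceptor();
        List<String> kept = new ArrayList<>();
        for (String child : children) {
            if (acceptor.accept(child)) {
                kept.add(child);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The stray B arrives first, so every later C is rejected.
        System.out.println(filter(Arrays.asList("B", "C", "C", "C"))); // [B]
        // If a C arrives first, the B is the one that gets dropped.
        System.out.println(filter(Arrays.asList("C", "B", "C")));      // [C, C]
    }
}
```

A less naive filter would have to look ahead, or try both repairs and keep the one that discards less, before committing to a branch.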
Sincerely,
Martin Vysny

