Thank you, Mike. This is great. Here is the URL to the schema that Mike references:
https://github.com/DFDLSchemas/mil-std-2045/blob/master/src/main/resources/com/owlcyberdefense/mil-std-2045/xsd/milstd2045.common.dfdl.xsd

From: Mike Beckerle <mbecke...@apache.org>
Sent: Friday, October 13, 2023 9:26 AM
To: users@daffodil.apache.org
Subject: [EXT] Re: How to generate an error for invalid data, discard the invalid data, and continue parsing?

The mil-std-2045 schema on github uses techniques to achieve this sort of thing.

There are a few such techniques. One I like is to capture the invalid data in an element named invalid, which has facets such that any content is deemed invalid (e.g., a pattern that can't be matched). This detail, that the <invalid>message</invalid> element is in fact invalid, is important, because an element named "invalid" can of course be entirely valid, which is a mistake we want to avoid.

If you download that schema from github, look for the <group name="urn_unit_name_group"> ... definition. It detects when a URN and a UNIT_NAME field are both present, and creates an invalid element containing an error message to that effect.

Alternatively, one could just issue a warning diagnostic using dfdl:assert with "recoverableError" as the failure type. You will then get a warning, and you can decide whether you want the failure represented in the infoset with an invalid element, with a valid element, or not at all.

For CDS use, I really like putting in the "guaranteed to be invalid" element, because it makes it clear and testable that the schema is detecting what is wrong. It allows use of the schema in situations where you want to see what the invalidity was, but a CDS will still block it.

Adding a DFDL variable that controls which thing the schema does can be helpful for various testing scenarios.
You can have a "fail fast" setting which causes the parse to fail, a "message only" setting for getting the warning only, or a "capture invalid" setting which creates the <invalid>...</invalid> element.

On Wed, Oct 11, 2023 at 7:44 AM Roger L Costello <coste...@mitre.org> wrote:

My input consists of a series of label-colon-message lines:

Dear Sir: Thank you for your response
Dear Madam: How are you
Dear Foo: Have a good day
Dear Sir: Nice work

The stuff before the colon is the “label” and the stuff after the colon is the “message”. There are two legal labels, Dear Sir and Dear Madam.

I want this output:

<Tests>
  <line>
    <label>Dear Sir</label>
    <message>Thank you for your response</message>
  </line>
  <line>
    <label>Dear Madam</label>
    <message>How are you</message>
  </line>
  <line>
    <label>Dear Sir</label>
    <message>Nice work</message>
  </line>
</Tests>

The third line (Dear Foo: Have a good day) contains invalid data (Dear Foo is not a valid label), so I want that line discarded, an error reported for that line, and parsing to continue to the next line.

In other words, I want an error generated for erroneous data, the erroneous data discarded, and parsing to continue. How do I do that?

/Roger
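
For readers who want to see the three settings Mike describes ("fail fast", "message only", "capture invalid") side by side, here is a rough model of their behavior on Roger's sample input. It is plain Python, not DFDL, and it is not taken from the mil-std-2045 schema. The function name parse_lines, the mode strings, the ParseFailure exception, and the dictionary layout are all made up for illustration. Treat it as a sketch of the intended outcomes only.

```python
# Illustrative model only: this is NOT DFDL and NOT Daffodil's API.
# All names here (parse_lines, ParseFailure, the mode strings) are hypothetical.

LEGAL_LABELS = {"Dear Sir", "Dear Madam"}
MODES = {"fail_fast", "message_only", "capture_invalid"}

SAMPLE = (
    "Dear Sir: Thank you for your response\n"
    "Dear Madam: How are you\n"
    "Dear Foo: Have a good day\n"
    "Dear Sir: Nice work"
)


class ParseFailure(Exception):
    """Stands in for a parse that fails outright ("fail fast")."""


def parse_lines(text, mode="capture_invalid"):
    """Return (records, diagnostics) for label-colon-message lines.

    mode is a stand-in for the DFDL variable mentioned in the thread:
      fail_fast       -> raise on the first bad line
      message_only    -> record a diagnostic, drop the bad line
      capture_invalid -> record a diagnostic and keep an "invalid" record
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")

    records = []
    diagnostics = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        label, sep, message = line.partition(":")
        label, message = label.strip(), message.strip()

        # A line is good only if it has a colon and a legal label.
        if sep and label in LEGAL_LABELS:
            records.append({"label": label, "message": message})
            continue

        problem = f"line {lineno}: illegal label {label!r}"
        if mode == "fail_fast":
            raise ParseFailure(problem)

        diagnostics.append(problem)
        if mode == "capture_invalid":
            # Plays the role of the <invalid>message</invalid> element.
            records.append({"invalid": problem})
        # In message_only mode the bad line is simply discarded.

    return records, diagnostics
```

Calling parse_lines(SAMPLE, "message_only") corresponds to what Roger asked for: three good records, one diagnostic for the Dear Foo line, and parsing continues. With "capture_invalid" the bad line stays visible in the result as an extra record, which is the property Mike finds useful for CDS testing.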