Sweet, golden... re:...

> If you want the final CRLF missing to be tolerated on parsing, and whether it is there or not preserved when unparsing,
> then you actually have to model it as a data element:
> <element name="finalLineEnding" type="xs:string" minOccurs="0"
> dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%CR;%LF;"/>
|OR|
> dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%NL;"/>

In my particular case, I have no way of knowing what form the record delimiters take or how the file is terminated: whether "%NL;" will be present or not, and if so, what form it will take.

I struggled w/ this a bit b/c: [1] I'm a schema newbie and [2] I didn't know where to apply the script snippet.

With incorporation of Mike B's suggested script snippet (above), I've reverted to "infix" instead of "postfix". After some trial/error and lots of rereading, I narrowed down where the snippet belongs and verified it works...

[image: image.png]

As usual, thx for the assist.

Attila

On 2021/03/01 19:32:33, "Beckerle, Mike" <mbecke...@owlcyberdefense.com> wrote:
> Ah, so you have some simple problems here, and this thorny little issue about the NUL character.
>
> In your regex, each character entity must have a trailing ";" to terminate the character entity.
>
> However, &#x00; is just plain disallowed by XML, period. Can't put a NUL into XML even using a character entity to do so. This is one of the things I distinctly dislike about XML.
>
> To cope with this, given that in DFDL people have to talk about real data with NUL in it, DFDL does a bi-directional remapping from 0 to &#xE000;
>
> But, you are trying to express a numeric range that is from char code 0 to char code 7F. So you can't just change your regex to use &#xE000; because that's not the bottom of the range.
>
> To do what you want you need your regex to say [-]
> Notice the semicolons in there.
>
> With respect to the final CRLF at end of file, there are techniques to cope with this.
> We need to clarify what the canonical/preferred representation is, and whether you want your schema to accept data that is missing this final CRLF.
>
> Assuming the final CRLF is required, non-optional, you can change the newline separator to add the DFDL property
>
> dfdl:separatorPosition="postfix"
>
> Just on the sequence that contains the rows of data.
>
> This means you get all the infix separator line-endings, plus one more at the end.
>
> However, that one at the end is NOT optional. If not present, you'll get parse errors.
>
> If you want the final CRLF missing to be tolerated on parsing, and whether it is there or not preserved when unparsing, then you actually have to model it as a data element:
>
> <element name="finalLineEnding" type="xs:string" minOccurs="0"
> dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%CR;%LF;"/>
>
> That final element will absorb, and represent, a final CRLF, and on unparsing, lay it down so it matches the input data.
>
> ________________________________
> From: Attila Horvath <attila.j.horv...@gmail.com>
> Sent: Monday, March 1, 2021 2:03 PM
> To: users@daffodil.apache.org <users@daffodil.apache.org>
> Subject: Re: regex |AND| left over data
>
> 1) b) should read ...value="&#x00-&#x7F"
>
> On 2021/03/01 18:58:08, Attila Horvath <attila.j.horv...@gmail.com> wrote:
> > All - two quick questions...
> >
> > 1) regex
> >
> > I am trying to use a character range query in a regex-pression like:
> > a)...
> > <xs:restriction base="xs:string">
> > <xs:pattern value="[\x00-\x7F]{0,10}"/>
> > </xs:restriction>
> > |OR|
> > b)...
> > <xs:restriction base="xs:string">
> > <xs:pattern value="[�- ]{0,10}"/>
> > </xs:restriction>
> > - either way both throw error(s) re: invalid regex expression syntax.
> > - what is the correct syntax for a range of hex values?
> >
> > 2) my CSV file has CR/LF at the end of the last line in the file
> > - when parsing, I get numerous warnings, ultimately "left over data"
> > ...starting at byte xyz (0x0d0a...)
> > a) how to consume (parse) the last two bytes and avoid the warnings
> > b) how to reconstitute (unparse) so the last two bytes are included
> >
> > Thx in advance
> >
> > Attila (newbie)
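
The idea behind modeling the final line ending as its own optional data element can be tried outside of DFDL. Below is a minimal Python sketch, not Daffodil code: the function names `parse` and `unparse` and the `sep` parameter are hypothetical, and it assumes a single known row separator (CRLF by default). The trailing separator is captured as a separate value, so it is tolerated when missing and laid down again on output only if it was present on input.

```python
from typing import List, Tuple


def parse(text: str, sep: str = "\r\n") -> Tuple[List[str], str]:
    """Split text into rows (infix separators) plus an optional final line ending.

    Hypothetical helper for illustration only; not part of Daffodil or DFDL.
    """
    final_ending = ""
    if text.endswith(sep):
        # Absorb the trailing separator as data instead of leaving it over.
        final_ending = sep
        text = text[: -len(sep)]
    rows = text.split(sep)
    return rows, final_ending


def unparse(rows: List[str], final_ending: str, sep: str = "\r\n") -> str:
    """Rebuild the text, writing the final line ending only if it was captured."""
    return sep.join(rows) + final_ending


if __name__ == "__main__":
    for sample in ("a,b\r\nc,d\r\n", "a,b\r\nc,d"):
        rows, ending = parse(sample)
        # Round trip: the output matches the input with or without the final CRLF.
        assert unparse(rows, ending) == sample
        print(rows, repr(ending))
```

With "postfix" the last separator would be mandatory; keeping the rows "infix" and carrying the final ending as a separate value is what lets both inputs round-trip unchanged.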