Mike,

Appreciate the response.

I'm trying to follow customer's data spec to only allow 'printable characters' in certain fields though spec doesn't define what is/isn't printable. Wikipedia has its own definition of printable characters <https://en.wikipedia.org/wiki/ASCII#Printable_characters>. Technically, for example, bell ^G [0x07] may/could be considered a printable character. (I know, I'm showing my age but such is life. ;)

re: [-]
This doesn't work - Daffodil throws errors, as does Notepad++. Daffodil throws an error re: "[-]" as well. The best I can do is "[ -]" - anything under 0x20 throws an error in Daffodil, which may be a problem as anything is allowed in certain fields.

re: dfdl:separatorPosition="postfix"
I made your recommended change. It successfully suppresses "left over data" warnings.

Thx - Attila

On Mon, Mar 1, 2021 at 2:32 PM Beckerle, Mike <mbecke...@owlcyberdefense.com> wrote:

> Ah, so you have some simple problems here, and this thorny little issue
> about the NUL character.
>
> Your regex, the character entities say  this must have a trailing ";"
> to terminate the character entity
>
> However, � is just plain disallowed by XML period. Can't put a NUL
> into XML even using a character entity to do so. This is one of the things
> I distinctly dislike about XML.
>
> To cope with this, given that in DFDL people have to talk about real data
> with NUL in it, DFDL does a bi-directional remapping from 0 to 
>
> But, you are trying to express a numeric range that is from char code 0 to
> char code 7F. So you can't just change your regex to use  because
> that's not the bottom of the range.
>
> To do what you want you need your regex to say [-]
> Notice the semicolons in there.
>
> With respect to the final CRLF at end of file, there are techniques to
> cope with this.
> We need to clarify, what is the canonical/preferred representation, and
> whether you want your schema to accept data that is missing this final CRLF.
>
> Assuming the final CRLF is required, non-optional, you can change the
> newline separator to add the DFDL property
>
> dfdl:separatorPosition="postfix"
>
> Just on the sequence that contains the rows of data.
>
> This means you get all the infix separator line-endings, plus one more at
> the end.
>
> However, that one at the end is NOT optional. If not present, you'll get
> parse errors.
>
> If you want the final CRLF missing to be tolerated on parsing, and whether
> it is there or not preserved when unparsing, then you actually have to
> model it as a data element:
>
> <element name="finalLineEnding" type="xs:string" minOccurs="0"
> dfdl:lengthKind="explicit" dfdl:length="0"
> dfdl:initiator="%CR;%LF;"/>
>
> That final element will absorb, and represent, a final CRLF, and on
> unparsing, lay it down so it matches the input data.
>
> ------------------------------
> *From:* Attila Horvath <attila.j.horv...@gmail.com>
> *Sent:* Monday, March 1, 2021 2:03 PM
> *To:* users@daffodil.apache.org <users@daffodil.apache.org>
> *Subject:* Re: regex |AND| left over data
>
> 1) b) should read ...value="�-"
>
> On 2021/03/01 18:58:08, Attila Horvath <attila.j.horv...@gmail.com>
> wrote:
> > All - two quick questions...
> >
> > 1) regex
> >
> > I am trying to use character range query in regex-pression like:
> > a)...
> > <xs:restriction base="xs:string">
> > <xs:pattern value="[\x00-\x7F]{0,10}"/>
> > </xs:restriction>
> > |OR|
> > b)...
> > <xs:restriction base="xs:string">
> > <xs:pattern value="[�- ]{0,10}"/>
> > </xs:restriction>
> > - either way both throw error(s) re: invalid regex expression syntax.
> > - what is correct syntax for range of hex values?
> >
> > 2) my CSV file has CR/LF at end of last line in file
> > - when parsing, I get numerous warnings ultimately "left over data"
> > ...starting at byte xyz (0x0d0a...)
> > a) how to consume (parse) last two bytes and avoid warnings
> > b) how to reconstitute (unparse) so last two bytes are included
> >
> > Thx in advance
> >
> > Attila (newbie)
> >
>
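
On the 'printable characters' question raised at the top of this thread: one possible reading, following the Wikipedia definition linked there, is space (0x20) through tilde (0x7E). Written with literal characters, that range needs no character entities at all. A minimal sketch, assuming that reading matches the customer's spec; the type name printableField10 is hypothetical, and the {0,10} bound is carried over from the patterns in the original question:

```xml
<!-- Hypothetical simple type; the name is illustrative only. -->
<xs:simpleType name="printableField10">
  <xs:restriction base="xs:string">
    <!-- Assumption: "printable" means space (0x20) through tilde (0x7E). -->
    <!-- Both range endpoints are ordinary characters, so no &#x..; entities are needed. -->
    <xs:pattern value="[ -~]{0,10}"/>
  </xs:restriction>
</xs:simpleType>
```

Under this reading BEL (0x07), DEL (0x7F), and every other control character are rejected, so it would not suit the fields where anything is allowed.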