Re: "...you need your regex to say [-]" as it relates to all printable ASCII characters: the syntax suggested above throws a Daffodil error:... [image: image.png]

I also tried the following pattern, which does not throw a syntax error but does throw a validation error (it does not recognize a valid data pattern):... [image: image.png]

The only thing that seems to work without throwing an error is:... [image: image.png]

For posterity, I'd like to be able to specify a specific subset of printable characters as hexadecimals. Suggestions? Recommendations?

Thx in advance - Attila

On Mon, Mar 1, 2021 at 2:32 PM Beckerle, Mike <mbecke...@owlcyberdefense.com> wrote:

> Ah, so you have some simple problems here, and this thorny little issue
> about the NUL character.
>
> Your regex, the character entities say  this must have a trailing ";"
> to terminate the character entity
>
> However, � is just plain disallowed by XML period. Can't put a NUL
> into XML even using a character entity to do so. This is one of the things
> I distinctly dislike about XML.
>
> To cope with this, given that in DFDL people have to talk about real data
> with NUL in it, DFDL does a bi-directional remapping from 0 to 
>
> But, you are trying to express a numeric range that is from char code 0 to
> char code 7F. So you can't just change your regex to use  because
> that's not the bottom of the range.
>
> To do what you want you need your regex to say [-]
> Notice the semicolons in there.
>
> With respect to the final CRLF at end of file, there are techniques to
> cope with this.
> We need to clarify, what is the canonical/preferred representation, and
> whether you want your schema to accept data that is missing this final CRLF.
>
> Assuming the final CRLF is required, non-optional, you can change the
> newline separator to add the DFDL property
>
> dfdl:separatorPosition="postfix"
>
> Just on the sequence that contains the rows of data.
>
> This means you get all the infix separator line-endings, plus one more at
> the end.
>
> However, that one at the end is NOT optional. If not present, you'll get
> parse errors.
>
> If you want the final CRLF missing to be tolerated on parsing, and whether
> it is there or not preserved when unparsing, then you actually have to
> model it as a data element:
>
> <element name="finalLineEnding" type="xs:string" minOccurs="0"
> dfdl:lengthKind="explicit" dfdl:length="0"
> dfdl:initiator="%CR;%LF;"/>
>
> That final element will absorb, and represent, a final CRLF, and on
> unparsing, lay it down so it matches the input data.
>
> ------------------------------
> *From:* Attila Horvath <attila.j.horv...@gmail.com>
> *Sent:* Monday, March 1, 2021 2:03 PM
> *To:* users@daffodil.apache.org <users@daffodil.apache.org>
> *Subject:* Re: regex |AND| left over data
>
> 1) b) should read ...value="�-"
>
> On 2021/03/01 18:58:08, Attila Horvath <attila.j.horv...@gmail.com>
> wrote:
> > All - two quick questions...
> >
> > 1) regex
> >
> > I am trying to use a character range in a regular expression like:
> > a)...
> > <xs:restriction base="xs:string">
> > <xs:pattern value="[\x00-\x7F]{0,10}"/>
> > </xs:restriction>
> > |OR|
> > b)...
> > <xs:restriction base="xs:string">
> > <xs:pattern value="[�- ]{0,10}"/>
> > </xs:restriction>
> > - either way, both throw error(s) re: invalid regex expression syntax.
> > - what is the correct syntax for a range of hex values?
> >
> > 2) my CSV file has CR/LF at the end of the last line in the file
> > - when parsing, I get numerous warnings, ultimately "left over data"
> > ...starting at byte xyz (0x0d0a...)
> > a) how to consume (parse) the last two bytes and avoid warnings
> > b) how to reconstitute (unparse) so the last two bytes are included
> >
> > Thx in advance
> >
> > Attila (newbie)
> >
>
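
For anyone reading this thread later, here is a rough sketch of how the pieces discussed above might fit together in one schema. It is not taken from anyone's actual schema and has not been run against Daffodil: the names "printable10", "fileStrict", "fileTolerant" and "row" are made up for illustration, the dfdl:format defaults a real schema needs are left out, and the pattern assumes "printable ASCII" means 0x20 through 0x7E. The pattern relies on the XML parser resolving the semicolon-terminated character references before the regex is compiled, since XSD regular expressions have no \xHH escape.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: names are hypothetical and dfdl:format defaults are omitted. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Printable ASCII given as hex character references (assumed range 0x20 to 0x7E).
       Each reference ends with ";". After XML parsing the regex is "[ -~]{0,10}". -->
  <xs:simpleType name="printable10">
    <xs:restriction base="xs:string">
      <xs:pattern value="[&#x20;-&#x7E;]{0,10}"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Option A: the final CRLF is required. A postfix separator means one
       CRLF after every row, including the last; a missing one is a parse error. -->
  <xs:element name="fileStrict">
    <xs:complexType>
      <xs:sequence dfdl:separator="%CR;%LF;" dfdl:separatorPosition="postfix">
        <xs:element name="row" type="printable10" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Option B: the final CRLF is tolerated if missing. Rows use an infix
       separator, and the trailing CRLF is modeled as an optional zero-length
       element so that unparsing lays it down only when it was present. -->
  <xs:element name="fileTolerant">
    <xs:complexType>
      <xs:sequence>
        <xs:sequence dfdl:separator="%CR;%LF;" dfdl:separatorPosition="infix">
          <xs:element name="row" type="printable10" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:element name="finalLineEnding" type="xs:string" minOccurs="0"
                    dfdl:lengthKind="explicit" dfdl:length="0"
                    dfdl:initiator="%CR;%LF;"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Option A is the simpler choice when the trailing CRLF is guaranteed; Option B costs an extra infoset element but round-trips files with or without the final line ending.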