Ah, so you have some simple problems here, and this thorny little issue about 
the NUL character.

In your regex, the character entities are written like &#x7f with nothing to 
terminate them. Each one must have a trailing ";" to terminate the character 
entity, i.e. &#x7f;

However, &#x0; is just plain disallowed by XML, period. You can't put a NUL 
into XML, even using a character entity to do so. This is one of the things I 
distinctly dislike about XML.

To cope with this, given that in DFDL people have to talk about real data with 
NUL in it, DFDL does a bi-directional remapping from 0 to &#xE000; (U+E000, a 
Unicode private-use-area character).
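
As a rough illustration of that remapping, assuming NUL maps to U+E000: the 
element name "field" is hypothetical and the bytes are made-up sample data. 
Five data bytes containing a NUL would appear in the XML infoset roughly like 
this:

```xml
<!-- data bytes (hex): 41 42 00 43 44, i.e. "AB", NUL, "CD" -->
<!-- the NUL shows up as U+E000 on parsing and goes back to NUL on unparsing -->
<field>AB&#xE000;CD</field>
```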

But you are trying to express a numeric range that runs from char code 0 to 
char code 7F. So you can't just change your regex to use &#xE000; in place of 
the NUL, because that's not the bottom of the range.

To do what you want, you need your regex to say [&#xE000;&#x1;-&#x7F;]
Notice the semicolons in there.

With respect to the final CRLF at end of file, there are techniques to cope 
with this.
First we need to clarify what the canonical/preferred representation is, and 
whether you want your schema to accept data that is missing this final CRLF.

Assuming the final CRLF is required (non-optional), you can add the DFDL 
property

dfdl:separatorPosition="postfix"

to the sequence that contains the rows of data, i.e. the one whose separator 
is the newline, and only to that sequence.

This means you get all the infix separator line-endings, plus one more at the 
end.

However, that one at the end is NOT optional. If not present, you'll get parse 
errors.
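
A minimal sketch of what that might look like, assuming the rows are modeled 
as a repeating element inside one sequence. The names "row" and "ex:rowType" 
are hypothetical, and all other required DFDL properties are assumed to come 
from a dfdl:format already in scope:

```xml
<!-- hypothetical row sequence: CRLF after every row, including the last -->
<xs:sequence dfdl:separator="%CR;%LF;" dfdl:separatorPosition="postfix">
  <xs:element name="row" type="ex:rowType" maxOccurs="unbounded"/>
</xs:sequence>
```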

If you want a missing final CRLF to be tolerated on parsing, and its presence 
or absence to be preserved when unparsing, then you actually have to model it 
as a data element:

<xs:element name="finalLineEnding" type="xs:string" minOccurs="0"
      dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%CR;%LF;"/>

That final element will absorb, and represent, a final CRLF, and on unparsing, 
lay it down so it matches the input data.
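
For placement, here is an untested sketch of one way this could sit in a 
schema: the rows keep an ordinary infix separator, and the optional element 
follows them. The names "file", "row" and "ex:rowType" are hypothetical. The 
outer sequence is assumed to have no separator of its own, and the format in 
scope is assumed to let a minOccurs="0" element be absent:

```xml
<xs:element name="file">
  <xs:complexType>
    <xs:sequence>
      <!-- rows separated (infix) by CRLF; no CRLF consumed after the last row -->
      <xs:sequence dfdl:separator="%CR;%LF;" dfdl:separatorPosition="infix">
        <xs:element name="row" type="ex:rowType" maxOccurs="unbounded"/>
      </xs:sequence>
      <!-- present in the infoset only if the data ends with CRLF -->
      <xs:element name="finalLineEnding" type="xs:string" minOccurs="0"
        dfdl:lengthKind="explicit" dfdl:length="0" dfdl:initiator="%CR;%LF;"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```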

________________________________
From: Attila Horvath <attila.j.horv...@gmail.com>
Sent: Monday, March 1, 2021 2:03 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: regex |AND| left over data

1) b) should read ...value="&#x00-&#x7f"

On 2021/03/01 18:58:08, Attila Horvath <attila.j.horv...@gmail.com> wrote:
> All - two quick questions...
>
> 1) regex
>
> I am trying to use character range query in regex-pression like:
>  a)...
>    <xs: restriction base="xs:string">
>      <xs:pattern value="[\x00-\x7F]{0,10}"/>
>    </cs:restriction>
>  |OR|
>  b)...
>    <xs: restriction base="xs:string">
>      <xs:pattern value="[�-]{0,10}"/>
>    </cs:restriction>
>  - either way both throw error(s) re: invalid regex expression syntax.
>  - what is correct syntax for range of hex values?
>
> 2) my CSV files has CR/LF at end of last line in file
>  - when parsing, I get numerous warnings ultimately "left over data"
> ...starting at byte xyz (0x0d0a...)
>  a) how to consume (parse) last two bytes and avoid warnings
>  b) how to reconstitute (unparse) so last two bytes are included
>
> Thx in advance
>
> Attila (newbie)
>
