I really like your interpretation of "CSV" as "Character Separated" rather than "Comma Separated", because so many formats are referred to as "just CSV" when they are actually quite complex and vary in the delimiters used, escaping (or not), etc.
I am hoping to get the DFDL community to contribute examples of CSV variations to the DFDLSchemas CSV repository on GitHub:
https://github.com/DFDLSchemas/CSV

I am also curious whether your STX-separated data can be opened by Excel or OpenOffice Calc, and whether their import wizard can intuit that STX is the separator or is completely fooled by the use of a C0 control character for this.
________________________________
From: Attila Horvath <attila.j.horv...@gmail.com>
Sent: Friday, July 23, 2021 7:21 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>; Beckerle, Mike <mbecke...@owlcyberdefense.com>
Subject: Re: CSV - hex char separator?

re: "...I hope that helps."

Very much so. :)

On review, after the fact, I came upon section "6.3.1 DFDL String Literals" <https://daffodil.apache.org/docs/dfdl/#_Toc62570072> in the DFDL spec, which elaborates on your response.

Mastering navigation of documents/examples to answer questions for the 'uninitiated' is the challenge.

Thx Mike

Attila

On Thu, Jul 22, 2021 at 4:48 PM Beckerle, Mike <mbecke...@owlcyberdefense.com> wrote:

You can do dfdl:separator="%#x02;" or even dfdl:separator="%STX;" (Section 6.3.1.2, Table 4, DFDL Entities).

The "%" introduces a DFDL-specific character entity. I generally recommend that people use the DFDL "%" syntax instead of the XML "&" syntax.

You are only stuck with dealing with the E000 stuff when those control characters appear in the values of elements. Delimiters that are explicit in the DFDL schema aren't part of the infoset (they won't show up in your XML), so none of the E000 remapping occurs for those strings.

I hope that helps.

-mike beckerle
________________________________
From: Attila Horvath <attila.j.horv...@gmail.com>
Sent: Thursday, July 22, 2021 1:56 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: CSV - hex char separator?
If I have a Character Separated Value [CSV] file where the separator is an arbitrary 7-bit character identified by its hex code, e.g. STX [0x02], rather than simply a comma, how can that be specified in a '<xs:sequence dfdl:separator="..."' attribute?

I tried '<xs:sequence dfdl:separator=""'. Syntactically it is correct, but Daffodil is not recognizing the STX [0x02] character as a field delimiter.

[image.png]

Thx in advance,
Attila
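
For anyone finding this thread later, the dfdl:separator="%#x02;" suggestion from the reply could look roughly like the schema below. This is only a minimal, untested sketch. The "ex" namespace, the example.com URI, and the element names "file", "record" and "field" are made up for illustration, and it assumes that the DFDLGeneralFormat.dfdl.xsd file bundled with Daffodil is reachable at the include path shown and supplies the remaining format properties (text representation, delimited lengths, and so on).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/stx-csv"
           targetNamespace="http://example.com/stx-csv">

  <!-- Assumption: Daffodil's bundled general format is on the classpath at this path. -->
  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Inherit default property values from the included general format. -->
      <dfdl:format ref="ex:GeneralFormat"/>
    </xs:appinfo>
  </xs:annotation>

  <!-- Hypothetical element names: a file is a series of newline-terminated records. -->
  <xs:element name="file">
    <xs:complexType>
      <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
        <xs:element name="record" maxOccurs="unbounded">
          <xs:complexType>
            <!-- Fields are separated by STX (0x02), written as a DFDL "%" entity.
                 dfdl:separator="%STX;" is the equivalent named form. -->
            <xs:sequence dfdl:separator="%#x02;" dfdl:separatorPosition="infix">
              <xs:element name="field" type="xs:string" maxOccurs="unbounded"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Because the STX separator is declared in the schema instead of appearing in element values, it should not show up in the resulting infoset, in line with the explanation in the reply.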