I really like your interpretation of "CSV" as "Character Separated" vs. "Comma 
Separated" because so many formats are referred to as "just CSV" when they are 
actually quite complex and varied in terms of the delimiters used, escaping (or 
not), etc.

I am hoping to get the DFDL community to contribute examples of CSV variations 
to the DFDLSchemas CSV repository on GitHub.

https://github.com/DFDLSchemas/CSV

I am also curious whether your STX-separated data can be opened by Excel or 
OpenOffice Calc, and whether their import wizard can intuit that STX is the 
separator or is completely fooled by the use of a C0 control character for this.




________________________________
From: Attila Horvath <attila.j.horv...@gmail.com>
Sent: Friday, July 23, 2021 7:21 AM
To: users@daffodil.apache.org <users@daffodil.apache.org>; Beckerle, Mike 
<mbecke...@owlcyberdefense.com>
Subject: Re: CSV - hex char separator?

re: "...I hope that helps."

Very much so. :)

On review, I came upon section "6.3.1 DFDL String Literals" 
<https://daffodil.apache.org/docs/dfdl/#_Toc62570072> in the DFDL spec after 
the fact, which elaborates on your response.

Mastering navigation of documents/examples to answer questions for the 
'uninitiated' is the challenge.

Thx Mike

Attila


On Thu, Jul 22, 2021 at 4:48 PM Beckerle, Mike 
<mbecke...@owlcyberdefense.com> wrote:
You can do dfdl:separator="%#x02;" or even dfdl:separator="%STX;" (Section 
6.3.1.2 Table 4 DFDL Entities)

The "%" introduces a DFDL-specific character entity.

I generally recommend people use the DFDL "%" instead of the XML "&".

You only have to deal with the E000 remapping (for example, STX showing up as 
U+E002) when those control characters appear in the values of elements. 
Delimiters that are explicit in the DFDL schema aren't part of the infoset 
(they won't show up in your XML), so none of the E000 remapping occurs for 
those strings.
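
As a rough illustration of the two separator forms above, here is a minimal 
schema fragment. It is a sketch only: the element name "field" is made up, and 
it assumes the enclosing schema binds the xs: and dfdl: prefixes and supplies 
the other required format properties (for example via a shared format 
definition).

```xml
<!-- Fragment only, not a complete schema. -->
<!-- The separator is written as a DFDL entity ("%"), not an XML one ("&"). -->
<xs:sequence dfdl:separator="%#x02;" dfdl:separatorPosition="infix">
  <!-- "field" is a hypothetical element name used for illustration. -->
  <xs:element name="field" type="xs:string" maxOccurs="unbounded"/>
</xs:sequence>
```

Swapping "%#x02;" for "%STX;" should behave the same, since both entities name 
the same character.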

I hope that helps.

-mike beckerle

________________________________
From: Attila Horvath <attila.j.horv...@gmail.com>
Sent: Thursday, July 22, 2021 1:56 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: CSV - hex char separator?

If I have a Character Separated Value [CSV] file where the separator is an 
arbitrary 7-bit character given by its hex code rather than simply a comma, 
e.g. STX [0x02], how can that be specified in a 
'<xs:sequence dfdl:separator="..."' attribute?

I tried '<xs:sequence dfdl:separator="&#xE002;"'. Syntactically it is correct, 
but Daffodil is not recognizing the STX [0x02] character as a field delimiter.

Thx in advance,

Attila
