Actually, I think both your comment stripping and line folding are instances of 
a general layer that takes a recognizer regex and a replacement string, and 
replaces every match of the regex with that string.
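
To make that idea concrete, here is a minimal sketch in Python rather than in 
Daffodil itself. The names RegexLayer, STRIP_COMMENTS, UNFOLD and apply_layers 
are hypothetical and are not part of any Daffodil API; the unfold pattern is 
the one given elsewhere in this thread, and running comment stripping before 
unfolding is an assumption about the format.

```python
import re
from typing import Iterable, NamedTuple


class RegexLayer(NamedTuple):
    """Hypothetical layer: a recognizer regex plus the text replacing each match."""
    recognizer: str
    replacement: str

    def apply(self, text: str) -> str:
        # MULTILINE lets "^" anchor at the start of every line, not just the first.
        return re.sub(self.recognizer, self.replacement, text, flags=re.MULTILINE)


# Whole-line comment: "--" in column one, through the end of that line.
STRIP_COMMENTS = RegexLayer(r"^--[^\r\n]*(?:\r?\n|\Z)", "")
# Unfold: a line ending (CRLF or LF) followed by one space joins two lines.
UNFOLD = RegexLayer(r"\r?\n ", "")


def apply_layers(text: str, layers: Iterable[RegexLayer]) -> str:
    """Apply each layer in order; the order is an assumption of this sketch."""
    for layer in layers:
        text = layer.apply(text)
    return text


sample = "-- header\r\nname: Al\r\n ice\nage: 4\n 2\n-- trailing"
cleaned = apply_layers(sample, [STRIP_COMMENTS, UNFOLD])
```

With this sample, cleaned contains only the two unfolded data lines. Note that 
the replacement is passed to re.sub as-is, so backslashes in it would be read 
as group references.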

Question: Do you need to unparse this data? If so, is there a length limit 
after which lines are supposed to be folded?
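
The reason for asking: on unparse, folding has to be re-introduced, and that 
needs a rule for where to break. A rough Python sketch of one possible rule 
follows. The function name fold_line, the default limit of 75 characters and 
the CRLF default are all placeholders, not something taken from your format.

```python
def fold_line(line: str, limit: int = 75, newline: str = "\r\n") -> str:
    """Fold one logical line so that no physical line exceeds `limit` characters.

    Each continuation line starts with a single space, and that space counts
    toward the limit. The limit itself is a placeholder value.
    """
    if limit < 2:
        raise ValueError("limit must allow the fold space plus one character")
    pieces = [line[:limit]]
    rest = line[limit:]
    while rest:
        # Reserve one column for the leading space of the continuation line.
        pieces.append(" " + rest[: limit - 1])
        rest = rest[limit - 1:]
    return newline.join(pieces)


folded = fold_line("abcdefghij", limit=4, newline="\n")
```

Removing every newline-plus-space pair from the result gives back the original 
logical line, so parse followed by unparse would round-trip under this rule.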
________________________________
From: Depth Painter <depth.pain...@mail.com>
Sent: Wednesday, April 29, 2020 3:49 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: DFDL Layers


In this case only full lines can be comments. In my specific case the "--" must 
be the first characters on the line; if the first character were a space 
instead, the line would get folded into the line above it.
I think if a comment feature were created, being able to specify the 
initiating bytes and terminating bytes would make it easy to support multiple 
comment styles.
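
As a rough illustration of what I mean, here is a sketch in Python (working on 
text rather than bytes for brevity). The names comment_pattern and 
strip_comments are made up for this example, and the "/* ... */" style is only 
there to show a second comment style; my format only uses "--".

```python
import re
from typing import Iterable, Optional, Pattern, Tuple


def comment_pattern(initiator: str, terminator: Optional[str] = None) -> Pattern[str]:
    """Build a recognizer for one comment style (hypothetical helper)."""
    start = re.escape(initiator)
    if terminator is None:
        # Line comment: initiator in column one, through the end of that line.
        return re.compile("^" + start + r"[^\r\n]*(?:\r?\n|\Z)", re.MULTILINE)
    # Delimited comment: shortest run from initiator to terminator, across lines.
    return re.compile(start + ".*?" + re.escape(terminator), re.DOTALL)


def strip_comments(text: str, styles: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Remove every comment matching any (initiator, terminator) style, in order."""
    for initiator, terminator in styles:
        text = comment_pattern(initiator, terminator).sub("", text)
    return text


stripped = strip_comments(
    "-- note\nA=1\n/* x\ny */B=2\n",
    [("--", None), ("/*", "*/")],
)
```

A terminator of None here stands for "the comment ends at the end of the 
line", which is the only case I need today.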

Sent: Wednesday, April 29, 2020 at 2:53 PM
From: "Beckerle, Mike" <mbecke...@tresys.com>
To: "users@daffodil.apache.org" <users@daffodil.apache.org>
Subject: Re: DFDL Layers
Alas no. Plugins for layering are not something we've implemented yet.

I get your point: the comment syntax layer and the line-folding layer really 
should be separate layers.

The way you are putting in comments, as optional strings initiated by "--", 
works, but only if the comment is the entire line, not a partial line. The 
"--" also must be the first 2 characters of the line. Can the "--" for a 
comment be preceded on a line by spaces/tabs (a common convention)?

________________________________
From: Depth Painter <depth.pain...@mail.com>
Sent: Wednesday, April 29, 2020 2:30 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: DFDL Layers

re: plugin system
Is the plugin system in place yet? If it is, could you link me to an example?

re: comment syntax
Perhaps; I haven't run into it myself.
If I were to keep it, what would be the best way of interleaving comments? 
Currently I've got something like the below:

<xs:sequence dfdl:sequenceKind="ordered" dfdl:separator="%NL;"
       dfdl:separatorPosition="infix"
       dfdl:separatorSuppressionPolicy="anyEmpty">
  <!-- zero or more whole-line comments, each initiated by "--" -->
  <xs:element name="comment" type="xs:string" minOccurs="0"
       maxOccurs="unbounded" dfdl:initiator="--"/>
  <xs:element name="A" type="ns:myComplexTypeA" />
  <xs:element name="B" type="ns:myComplexTypeB" />
</xs:sequence>

And then in the complexTypes I have something similar, up to the point where I 
no longer separate on lines.

re: created issue
I think there might have been a bit of a misunderstanding. The line folding 
and comments are two different things. The behaviour for the line folding would 
be replacing matches of the regex "(\r\n |\n )" with "", while comments are any 
lines starting with the comment marker.

Sent: Wednesday, April 29, 2020 at 1:39 PM
From: "Beckerle, Mike" <mbecke...@tresys.com>
To: "users@daffodil.apache.org" <users@daffodil.apache.org>
Subject: Re: DFDL Layers

Created https://issues.apache.org/jira/browse/DAFFODIL-2333

re: comment syntax. While the spec may say comments should be stripped, I have 
seen that formats get extended with things that are placed into structured 
comments, so the spec de-facto evolves to need the comments preserved. May not 
match your need, but I thought it worth mentioning.

In addition, often we find use cases where people want to parse and then 
unparse data, and get back their input, perhaps canonicalized, but preserving 
everything "of value" in the data. This would also require preserving the 
comments.

This is all by way of just saying that your effort to put elements into your 
schema to model the comments is potentially valuable, despite the spec saying 
comments are to be stripped/ignored.


________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, April 29, 2020 1:23 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: DFDL Layers


The layering feature is not extensible the way you need it to be.

Our hope is to make it pluggable so that you could write a tiny Java class, 
plug it in, and introduce new layering transforms easily.
Alas, we don't have that feature yet.

The layering feature is also not entirely baked yet. For example we'd like to 
be able to use it to compute/verify parity and/or checksums over parts of the 
data, but we don't have that figured out yet.

One of the beauties of open source software is that you can add the layer 
transform that you need. If what you need is really a variation on one we have, 
that would be pretty easy to do. If you are a Java developer you can probably 
pull this off.

Either way we'll create a JIRA ticket to add this feature so that if you can't 
do this someone else may be able to quickly do it for you.

In terms of using Daffodil 2.6.0 today to solve your issue, I think a 
preprocessor could convert all the folded lines to the kind we already have a 
transform for (which I think requires CRLFs?). Perhaps that is just a matter of 
standardizing line endings to CRLF? If so, existing preprocessor tools for 
converting files from Unix conventions (LF only) to MS-Windows (CRLF) style 
might solve your problem.
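
If no such tool is handy, the conversion is small enough to sketch in Python. 
This is only an illustration: the function name to_crlf is made up, and whether 
CRLF-only input is really what the existing transform expects is my guess 
above, not something this code confirms.

```python
import re


def to_crlf(data: bytes) -> bytes:
    """Rewrite bare LF line endings as CRLF, leaving existing CRLF pairs alone."""
    # The lookbehind skips any LF that is already preceded by CR.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


converted = to_crlf(b"a\n b\r\n c\n")
```

Working on bytes rather than decoded text keeps the preprocessor independent 
of the file's character encoding, as long as LF and CR are single bytes in it.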




________________________________
From: Depth Painter <depth.pain...@mail.com>
Sent: Wednesday, April 29, 2020 1:08 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: DFDL Layers

Hello,

I'm currently trying to make a DFDL schema for a textual file format. While I 
have mostly finished, there are two things that are currently giving me 
difficulty: line folding and line comments.

For line folding, the layer transforms, specifically the lineFolded_* family, 
look promising; however, the two that are implemented don't match my file 
specification because crlf%x20; and lf%x20; can both be used to fold lines.

As for line comments, I currently have some Comment elements peppered 
throughout my schema; however, I'm sure I missed some. And that strategy isn't 
even up to spec, because comments are supposed to be ignored during parsing.

Other than preprocessing the file, do you guys have any recommendations for 
handling these issues I'm facing?

Thanks

DP
