I would need to unparse the data. In my case I wouldn't need to fold it.
 
 
Sent: Wednesday, April 29, 2020 at 4:09 PM
From: "Beckerle, Mike" <mbecke...@tresys.com>
To: "users@daffodil.apache.org" <users@daffodil.apache.org>
Subject: Re: DFDL Layers
Actually, I think both your comment stripping and line folding are instances of a general layer that takes a recognizer regex, and a replacement string to replace matches of it.
 
Question: Do you need to unparse this data? If so is there a length limit after which lines are supposed to be folded?
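
As a rough illustration of the "recognizer regex plus replacement string" idea, here is a minimal Python sketch of the parse direction only. It is not Daffodil code: `make_regex_layer`, `unfold`, and `strip_comments` are hypothetical names, and the sample data is invented. It assumes folding is a line ending followed by exactly one space, and that a comment is a whole line whose first two characters are "--".

```python
import re

def make_regex_layer(pattern: bytes, replacement: bytes):
    """Return a parse-direction transform that replaces every match of pattern."""
    compiled = re.compile(pattern, re.MULTILINE)

    def transform(data: bytes) -> bytes:
        return compiled.sub(replacement, data)

    return transform

# Line unfolding: CRLF or LF followed by one space joins a line to the one above.
unfold = make_regex_layer(rb"\r?\n ", b"")

# Comment stripping: drop whole lines (including their line ending) that start with "--".
strip_comments = make_regex_layer(rb"^--[^\r\n]*(?:\r?\n|\Z)", b"")

# Invented sample mixing both line-ending styles.
sample = b"-- header comment\r\nNAME:long va\r\n lue\nKEY:ab\n c\n-- trailing\n"

# Unfold first, so a continuation line after a comment is folded into that comment.
result = strip_comments(unfold(sample))
print(result)
```

Both transforms discard information, so this sketch cannot be reversed on unparse; preserving comments or re-folding lines would need a different design.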
 

From: Depth Painter <depth.pain...@mail.com>
Sent: Wednesday, April 29, 2020 3:49 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: DFDL Layers
 
 
In this case only full lines can be comments. In my specific case the "--" must be the first characters on the line; if the first character were a space, the line would get folded into the line above it.
I think if a comment feature were created, being able to specify the initiating bytes and terminating bytes would make it easy to support multiple comment styles.
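
A minimal Python sketch of what such a feature might do when given initiating and terminating bytes. This is only an illustration, not an existing Daffodil feature: the function name and the `line_start_only` flag are hypothetical, and an unterminated comment is assumed to run to the end of the data.

```python
def strip_comments(data: bytes, initiator: bytes, terminator: bytes,
                   line_start_only: bool = False) -> bytes:
    """Remove every span from initiator through terminator (inclusive)."""
    out = bytearray()
    pos = 0     # start of text not yet copied to the output
    search = 0  # where to look for the next initiator
    while True:
        start = data.find(initiator, search)
        if start == -1:
            out += data[pos:]
            return bytes(out)
        # Optionally ignore initiators that are not the first bytes of a line.
        if line_start_only and start > 0 and data[start - 1:start] != b"\n":
            search = start + 1
            continue
        out += data[pos:start]
        end = data.find(terminator, start + len(initiator))
        if end == -1:
            # Assumption: an unterminated comment extends to the end of the data.
            return bytes(out)
        pos = search = end + len(terminator)

# Full-line "--" comments ending at the line feed; a mid-line "--" is kept.
lines = strip_comments(b"-- note\r\nA=1 -- kept\r\n-- x\r\nB=2\r\n",
                       b"--", b"\n", line_start_only=True)

# A block-comment style handled by the same function.
block = strip_comments(b"a/* x */b", b"/*", b"*/")
```

With a "\n" terminator, a preceding "\r" is removed as part of the comment, so the same call covers both CRLF and LF files.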
 
Sent: Wednesday, April 29, 2020 at 2:53 PM
From: "Beckerle, Mike" <mbecke...@tresys.com>
To: "users@daffodil.apache.org" <users@daffodil.apache.org>
Subject: Re: DFDL Layers
Alas, no. Plugins for layering are not something we've implemented yet.
 
I get your point: really, the comment syntax layer and the line-folding layer should be separate layers.
 
The way you are putting in comments, as optional strings initiated by "--"  works, but won't work if the comment is a partial line, only if it is the entire line. The "--" also must be the first 2 characters of the line. Can the "--" for a comment be preceded on a line by spaces/tabs (a common convention)?
 

From: Depth Painter <depth.pain...@mail.com>
Sent: Wednesday, April 29, 2020 2:30 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: DFDL Layers
 
re:plugin system
Is the plugin system in place yet? And if it is could you link me to an example.
 
re: comment syntax
Perhaps, I haven't run into it myself.
If I were to keep it, what would be the best way of interleaving comments? Currently I've got something like the below:
 
<xs:sequence dfdl:sequenceKind="ordered" dfdl:separator="%NL;"
       dfdl:separatorPosition="infix" dfdl:separatorSuppressionPolicy="anyEmpty">
  <xs:element name="comment" type="xs:string" minOccurs="0" maxOccurs="unbounded" dfdl:initiator="--"/>
  <xs:element name="A" type="ns:myComplexTypeA" />
  <xs:element name="B" type="ns:myComplexTypeB" />
</xs:sequence>
 
And then in the complexTypes I have something similar until I don't separate on lines anymore.
 
re: created issue
I think there might have been a bit of a misunderstanding. The line folding and comments are two different things. The behaviour for the line folding would be replacing the regex "(\r\n |\n )" with "",
while comments are any lines starting with the comment marker.
 
Sent: Wednesday, April 29, 2020 at 1:39 PM
From: "Beckerle, Mike" <mbecke...@tresys.com>
To: "users@daffodil.apache.org" <users@daffodil.apache.org>
Subject: Re: DFDL Layers
 
 
re: comment syntax. While the spec may say comments should be stripped, I have seen that formats get extended with things that are placed into structured comments, so the spec de-facto evolves to need the comments preserved. May not match your need, but I thought it worth mentioning.
 
In addition, often we find use cases where people want to parse and then unparse data, and get back their input, perhaps canonicalized, but preserving everything "of value" in the data. This would also require preserving the comments.
 
This is all by way of just saying that your effort to put elements into your schema to model the comments is potentially valuable, despite the spec saying comments are to be stripped/ignored.
 
 

From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, April 29, 2020 1:23 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: Re: DFDL Layers
 
 
The layering feature is not extensible the way you need it to be.
 
Our hope is to make it pluggable so that you could write a tiny Java class, plug it in, and introduce new layering transforms easily.
Alas, we don't have that feature yet.
 
The layering feature is also not entirely baked yet. For example we'd like to be able to use it to compute/verify parity and/or checksums over parts of the data, but we don't have that figured out yet.
 
One of the beauties of open source software is that one can add the layer transform that you need. If what you need is really a variation on one we have that would be pretty easy to do. If you are a Java developer you can probably pull this off.
 
Either way we'll create a JIRA ticket to add this feature so that if you can't do this someone else may be able to quickly do it for you.
 
In terms of using Daffodil 2.6.0 today to solve your issue, I think a preprocessor could convert all the folded lines to the form we already have a transform for (which I think requires CRLFs?). Perhaps that just means standardizing line endings to CRLF? If so, existing preprocessor tools for converting files from Unix conventions (LF only) to MS-Windows style (CRLF) might solve your problem.
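
A minimal Python sketch of such a preprocessor, assuming the goal is simply to turn every bare LF into CRLF so that all folds become CRLF followed by a space. Whether the existing transform actually requires CRLF is the open guess above, and `to_crlf` is a hypothetical helper name.

```python
import re

def to_crlf(data: bytes) -> bytes:
    """Convert bare LF line endings to CRLF, leaving existing CRLF pairs untouched."""
    # The negative lookbehind skips any LF that already follows a CR.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

# Invented sample with one LF fold and one CRLF line ending.
mixed = b"A:1\n 2\r\nB:3\n"
normalized = to_crlf(mixed)
print(normalized)
```

Note that after this step the original mix of line endings can no longer be recovered when unparsing.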
 
 
 
 

From: Depth Painter <depth.pain...@mail.com>
Sent: Wednesday, April 29, 2020 1:08 PM
To: users@daffodil.apache.org <users@daffodil.apache.org>
Subject: DFDL Layers
 
Hello,
 
I'm currently trying to make a DFDL schema for a textual file format. While I have mostly finished, there are two things that are currently giving me difficulty: line folding and line comments.

For line folding, the layer transforms, specifically the lineFolded_* family, look promising. However, the two that are implemented don't match my file specification, because crlf%x20; and lf%x20; can both be used to fold lines.

As for line comments, I currently have some Comment elements peppered throughout my schema, but I'm sure I missed some. That strategy isn't even up to spec, because comments are supposed to be ignored during parsing.

Other than preprocessing the file, do you guys have any recommendations for handling these issues?

Thanks
 
DP
