I believe this is the latest document describing "indirection", the feature to
support pointer offsets:
https://github.com/OpenGridForum/DFDL/raw/master/docs/current/gwde-dfdl-experience-8-experimental-indirection.docx
I believe it is a candidate for DFDL 2.0 and is currently implemented in IBM
z/TPF. I imagine at some point Daffodil will implement it as an extension, but
there are no plans for it at the moment.
Note that Daffodil supports an experimental extension called "layers" which can
be used to support compressed data--it decompresses data on parse and compresses
data on unparse. Here is more information about the layer feature:
https://daffodil.apache.org/layers/
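As a rough illustration of the round trip a layer performs, here is a small
Python sketch. It is not Daffodil's layer API; the function names are made up
for this example, and it assumes the compressed data is a plain zlib stream
(the codec behind PDF's FlateDecode filter).

```python
import zlib


def parse_layer(raw: bytes) -> bytes:
    """Parse direction: turn the compressed bytes into the logical data."""
    return zlib.decompress(raw)


def unparse_layer(logical: bytes) -> bytes:
    """Unparse direction: turn the logical data back into compressed bytes."""
    return zlib.compress(logical)


# Hypothetical payload standing in for the content of a compressed stream.
payload = b"BT /F1 12 Tf (Hello) Tj ET"
compressed = unparse_layer(payload)
restored = parse_layer(compressed)
```

The schema author only describes the decompressed data; the layer is what
hides the compression step in both directions.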
- Steve
On 2024-08-10 08:49 AM, John Dziurlaj wrote:
Hi Steve,
Thanks for your response.
Yes, PDF files use 10-digit byte offsets from the start of the file to locate
objects, which are defined in the cross-reference table. Objects can also be
compressed, and in such cases, the byte offsets are from the beginning of the
stream to the target object.
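To make the offset mechanism concrete, here is a minimal Python sketch of
reading one classic cross-reference entry and following its offset. The
function names and the tiny sample file are made up for illustration, and it
only covers uncompressed objects, not objects inside object streams.

```python
import re

# A classic cross-reference entry is 20 bytes: a 10-digit byte offset,
# a 5-digit generation number, a flag ('n' = in use, 'f' = free),
# and a 2-byte end-of-line sequence.
XREF_ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])")


def parse_xref_entry(entry: bytes) -> tuple[int, int, bool]:
    """Return (byte offset, generation number, in-use flag) for one entry."""
    match = XREF_ENTRY.match(entry)
    if match is None:
        raise ValueError(f"not a cross-reference entry: {entry!r}")
    offset, generation, flag = match.groups()
    return int(offset), int(generation), flag == b"n"


def bytes_at(data: bytes, offset: int, length: int) -> bytes:
    """Return `length` bytes starting `offset` bytes from the start of the file."""
    return data[offset:offset + length]


# Hypothetical miniature file: a 9-byte header followed by one object.
sample = b"%PDF-1.7\n" + b"1 0 obj\n<< >>\nendobj\n"
offset, generation, in_use = parse_xref_entry(b"0000000009 00000 n \n")
header = bytes_at(sample, offset, 7)  # the bytes b"1 0 obj"
```

The parser has to jump to an absolute position taken from the data itself,
which is the part a purely sequential description cannot express.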
Could you please send me a link to the proposed extensions? I'd love to review
them!
Regards,
John Dziurlaj
-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Friday, August 9, 2024 10:00 AM
To: users@daffodil.apache.org
Subject: Re: Parsing PDF with DFDL
This isn't the first time I've heard of someone wanting to model PDF with a
DFDL schema, but as far as I'm aware it hasn't been done.
I'm also not sure if this is true, but I feel like I've heard that the PDF
format uses byte offsets, which aren't supported by the current version of DFDL.
There are proposed extensions to support them, but Daffodil does not implement
them (yet). So it might not even be possible with current DFDL if those are
required.
There are a number of publicly available schemas here:
https://github.com/orgs/DFDLSchemas/repositories?type=all
Though I'm not sure any of them are as complicated as PDF.
On 2024-08-08 10:48 AM, John Dziurlaj wrote:
Hello,
Long time listener, first time caller.
I am looking to implement the XFA Format
<https://en.wikipedia.org/wiki/XFA>,
for which PDF is one of the “wrappers”. In order to inject XFA (which
is XML) into PDF (which is binary), the native XML toolchain won’t
suffice. Has anyone attempted a DFDL schema for PDF? Are there any
schemas available for it or for a similarly complex format?
Thanks!
John Dziurłaj /d͡ʑurwaj/
Sr. Solutions Architect, The Turnout
e: john@turnout.rocks
s: +1 (330) 714-8935
x: @dziurlaj
work hours: 7am-3pm ET
http://turnout.rocks
@turnoutrocks