Hi Steve,

Thanks for your response.

Yes, PDF files locate objects by 10-digit byte offsets from the start of the file, which are recorded in the cross-reference table. Objects can also be stored in compressed object streams, and in that case the byte offsets are relative to the first object within the decoded stream rather than to the start of the file.

Could you please send me a link to the proposed extensions? I'd love to review them!

Regards,
John Dziurlaj

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org>
Sent: Friday, August 9, 2024 10:00 AM
To: users@daffodil.apache.org
Subject: Re: Parsing PDF with DFDL

This isn't the first I've heard of wanting to model PDF with a DFDL schema, but as far as I'm aware it hasn't been done.

I'm also not sure if this is true, but I feel like I've heard that the PDF format uses byte offsets, which aren't supported by the current version of DFDL. There are proposed extensions to support them, but Daffodil does not implement them (yet). So it might not even be possible with current DFDL if those are required.

There are a number of publicly available schemas here:

https://github.com/orgs/DFDLSchemas/repositories?type=all

Though I'm not sure any of them are as complicated as PDF.

On 2024-08-08 10:48 AM, John Dziurlaj wrote:
> Hello,
>
> Long time listener, first time caller.
>
> I am looking to implement the XFA Format
> <https://en.wikipedia.org/wiki/XFA>,
> for which PDF is one of its “wrappers”. In order to inject XFA (which
> is XML) into PDF (which is binary), the native XML toolchain won’t
> suffice. Has anyone attempted a DFDL schema for PDF? Are there any
> schemas available for it or for a similarly complex format?
>
> Thanks!
>
> John Dziurłaj /d͡ʑurwaj/
>
> Sr. Solutions Architect, The Turnout
>
> e: john@turnout.rocks <mailto:john@turnout.rocks>
>
> s: +1 (330) 714-8935
> x: @dziurlaj
> work hours: 7am-3pm ET
>
> http://turnout.rocks <http://turnout.rocks/>
>
> @turnoutrocks
>
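
To make the cross-reference point at the top of this thread concrete, here is a minimal Python sketch of how a classic (uncompressed) cross-reference table maps object numbers to absolute byte offsets. It assumes the fixed 20-byte entry layout (10-digit offset, 5-digit generation number, an `n` or `f` flag, and a two-byte end-of-line). The function names and the tiny sample document are made up for illustration, and the sketch ignores cross-reference streams, object streams, and incremental updates.

```python
import re

# One classic xref entry: 10-digit byte offset, 5-digit generation number,
# then 'n' (in use) or 'f' (free). The remaining 2 bytes are the end-of-line.
ENTRY_RE = re.compile(rb"(\d{10}) (\d{5}) ([nf])")
ENTRY_SIZE = 20  # fixed entry size in bytes, end-of-line included


def parse_xref_entry(entry):
    """Return (offset, generation, in_use) for one 20-byte entry."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("xref entries are exactly 20 bytes long")
    match = ENTRY_RE.match(entry)
    if match is None:
        raise ValueError(f"malformed xref entry: {entry!r}")
    offset = int(match.group(1))
    generation = int(match.group(2))
    in_use = match.group(3) == b"n"
    return offset, generation, in_use


def parse_xref_subsection(data):
    """Parse a 'first count' header line followed by `count` entries.

    Returns {object_number: (offset, generation, in_use)}.
    """
    header_line, _, rest = data.partition(b"\n")
    first, count = (int(field) for field in header_line.split())
    table = {}
    for i in range(count):
        chunk = rest[i * ENTRY_SIZE:(i + 1) * ENTRY_SIZE]
        table[first + i] = parse_xref_entry(chunk)
    return table


# Hypothetical, heavily simplified document: one object plus its xref table.
# The offset is computed rather than hard-coded so the sample stays consistent.
header = b"%PDF-1.4\n"
obj1 = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
offset1 = len(header)
free_entry = b"0000000000 65535 f \n"
obj1_entry = b"%010d %05d n \n" % (offset1, 0)
xref = b"xref\n0 2\n" + free_entry + obj1_entry
document = header + obj1 + xref

# A real reader would find the table through the 'startxref' value near the
# end of the file; searching for the keyword is a shortcut for this sample.
xref_start = document.index(b"xref\n") + len(b"xref\n")
table = parse_xref_subsection(document[xref_start:])

offset, generation, in_use = table[1]
print(document[offset:offset + 7])  # the bytes at the recorded offset
```

The relevant part for DFDL is the last step: the parser has to jump to an absolute position taken from data it has already read, which is the kind of offset-driven access discussed in the thread.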