RE: Parsing PDF with DFDL

John Dziurlaj Sat, 10 Aug 2024 05:50:05 -0700

Hi Steve,

Thanks for your response.

Yes, PDF files use 10-digit byte offsets, measured from the start of the file and 
recorded in the cross-reference table, to locate objects. Objects can also be 
compressed into object streams; in that case, the offsets are measured from the 
start of the object data within the stream rather than from the start of the file.
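To make the offset problem concrete: in a classic cross-reference table, each entry is a fixed 20-byte record made up of a 10-digit byte offset, a space, a 5-digit generation number, a space, an `n` (in use) or `f` (free) keyword, and a two-character end-of-line. The sketch below is plain Python, not DFDL. The names `XrefEntry`, `parse_xref_entry`, and `read_object_header` are hypothetical, made up for illustration. It decodes one entry and then jumps to the recorded offset, which is the random-access step a purely sequential parse cannot express.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    offset: int      # byte offset from the start of the file (for in-use entries)
    generation: int  # 5-digit generation number
    in_use: bool     # True for 'n' entries, False for 'f' (free) entries


def parse_xref_entry(record: bytes) -> XrefEntry:
    """Decode one 20-byte classic cross-reference table entry (hypothetical helper)."""
    if len(record) != 20:
        raise ValueError("classic xref entries are exactly 20 bytes long")
    offset = int(record[0:10])       # bytes 0-9: zero-padded byte offset
    generation = int(record[11:16])  # bytes 11-15: zero-padded generation number
    keyword = record[17:18]          # byte 17: 'n' or 'f'
    if keyword not in (b"n", b"f"):
        raise ValueError(f"unexpected keyword {keyword!r}")
    return XrefEntry(offset, generation, keyword == b"n")


def read_object_header(pdf: bytes, entry: XrefEntry) -> bytes:
    """Seek to the entry's offset and return the 'N G obj' header found there."""
    # This jump to an absolute position is what sequential parsing cannot model.
    end = pdf.index(b"obj", entry.offset) + len(b"obj")
    return pdf[entry.offset:end]
```

For example, with `pdf = b"%PDF-1.7\n1 0 obj\n<< >>\nendobj\n"`, the entry `b"0000000009 00000 n\r\n"` points at byte 9, and `read_object_header` returns `b"1 0 obj"`. Compressed objects would need an additional step of decoding the object stream before applying their offsets, which this sketch does not attempt.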

Could you please send me a link to the proposed extensions? I'd love to review 
them!

Regards,

John Dziurlaj

-----Original Message-----
From: Steve Lawrence <slawre...@apache.org> 
Sent: Friday, August 9, 2024 10:00 AM
To: users@daffodil.apache.org
Subject: Re: Parsing PDF with DFDL

This isn't the first I've heard of someone wanting to model PDF with a DFDL 
schema, but as far as I'm aware it hasn't been done.

I'm also not sure if this is true, but I feel like I've heard that PDF format 
uses byte offsets, which isn't supported by the current version of DFDL. There 
are proposed extensions to support it, but Daffodil does not implement them 
(yet). So it might not even be possible with current DFDL if those are required.

There are a number of publicly available schemas here:

https://github.com/orgs/DFDLSchemas/repositories?type=all

Though I'm not sure any of them are as complicated as PDF.

On 2024-08-08 10:48 AM, John Dziurlaj wrote:
> Hello,
> 
> Long time listener, first time caller.
> 
> I am looking to implement the XFA Format 
> <https://en.wikipedia.org/wiki/XFA>,
> for which PDF is one of its “wrappers”. In order to inject XFA (which 
> is XML) into PDF (which is binary), the native XML toolchain won’t 
> suffice. Has anyone attempted a DFDL schema for PDF? Are there any 
> schemas available for it, or for a similarly complex format?
> 
> Thanks!
> 
> John Dziurłaj /d͡ʑurwaj/
> 
> Sr. Solutions Architect, The Turnout
> 
> e: john@turnout.rocks <mailto:john@turnout.rocks>
> 
> s: +1 (330) 714-8935
> x: @dziurlaj
> work hours: 7am-3pm ET
> 
> http://turnout.rocks <http://turnout.rocks/>
> 
> @turnoutrocks
>
