-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On Fri, Apr 17, 2026 at 11:55:19PM -0700, Jayant Saxena wrote:
> Hi @marmarek @ben-grande @marmarta,
>
> I've been working on the file converter project for GSoC 2026 and have
> several PRs in progress across the ecosystem:
>
> - PR #38 (pdf-converter): Added password-protected PDF support with
>   zenity GUI prompts
> - PR #39 (pdf-converter): Batch conversion for multiple files
> - PR #463 (core-admin-client): Propagating device assign options
> - PR #448 (qubes-manager): Template manager crash fixes
>
> I've studied the codebase and the lessons from PR #9, and I want to ask for
> guidance on the architecture before implementing broader file format
> support.
>
> *Key Questions:*
>
> 1. *Format Scope*: Should the initial GSoC work focus on Office documents
>    (DOCX, ODT, XLSX, PPTX) and exclude audio/video? I understand FFmpeg's
>    attack surface is very large.

IMO Office formats are the hardest of those parts; better to move them to
the end and focus on the simpler parts first.

> 2. *qrexec Protocol Design*: For handling multiple formats with different
>    options (passwords, sheet selection, resolution), what's the preferred
>    approach?
>    - Extend the current protocol with a format header: --format=docx
>      --password=X\n[data]
>    - Create a new service: qubes.FileConvert
>    - Keep format-specific services separate?

I think the service should be tied to the output format. If the output is
going to be a PDF document (transferred as a raw image), use the current
service regardless of the input format. If the output is going to be video,
use another one, and so on. Source detection can be done in the service
itself, to minimize the parsing required on the client side.

> PR #9 was criticized for using "raw sockets" -- what approach would you
> prefer?

That part was related to multiple files, not alternative formats, no?
Anyway, I see the `uno` Python module is used there; maybe it provides a
more elegant interface? If not, I guess sockets can be used to control
LibreOffice programmatically...

> 3. *Output Standardization*: Should all formats (DOCX, XLSX, PPTX) convert
>    to PDF via the existing bitmap pipeline, or preserve format?

This is a very important question, as it strongly influences how the file
is converted. In practice, I don't think it's realistic to keep the file
both (safely) editable and accurate, especially in terms of formatting.
There are a lot of files that use custom styles, fonts, and sometimes even
scripts (for example in spreadsheets). Sanitizing them with sufficient
confidence is a lot of work, and even then I think some information will be
lost in some cases. Theoretically, there could be a mode that produces a
safe, editable output file at the cost of lost formatting, but in practice
it may not be that useful. And in Qubes OS, the user can always open a file
in a disposable qube, getting it both accurate and sandboxed.
So, I think it's okay to focus mainly on static output formats (like PDF
for documents), and just extend which source files can be used, and maybe
make it more convenient (see below).

> 4. *File Manager Integration*: Should the Nautilus extension use magic
>    bytes for format detection (not extension), and return non-zero for
>    unsupported formats?

See above about file type detection. I guess the simplest option would be
to allow conversion for all files, but fail on unsupported formats. I'd
rather avoid parsing too much of the untrusted file on the client side.
Maybe filtering on file extension would be enough in practice?

> 5. *What should I read?* Beyond the original Qubes PDF converter blog post,
>    are there specific security papers or design docs on file conversion
>    risks I should study?
>
> I want to get the architecture right before investing time in
> implementation. Your guidance would be really helpful.

There is also the Dangerzone project, which has implemented (or is in the
process of implementing) some of the above. I would propose to:

1. Investigate the Dangerzone project. For example, I was told its server
   part is significantly extended compared to the Qubes one, and should be
   fully compatible (at least in theory) with the Qubes client part.
2. Then move on to video formats.
3. Bring OCR support to text document formats (Dangerzone already has it,
   but it may require changes to use on Qubes). It isn't fully editable
   output, but it closes the critical limitation of the current PDF
   output: copying text from the file.
4. And only at this point consider editable output formats.

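
As a rough illustration of the client/service split discussed above, here
is a minimal, untested sketch. Everything in it is an assumption made for
illustration: the extension table, the magic-byte list, the helper names
`pick_service` and `sniff_format`, and the `qubes.VideoConvert` service
name (which does not exist). `qubes.PdfConvert` is assumed to be the name
of the current service. The client chooses a service from the file name
only and never opens the untrusted file; the service side, running in the
disposable qube, identifies the real input type from its first bytes and
fails on anything it does not recognize.

```python
import os

# Assumed table: file extension -> kind of output the conversion produces.
OUTPUT_KIND_BY_EXT = {
    ".pdf": "document", ".png": "document", ".jpg": "document",
    ".odt": "document", ".docx": "document",
    ".mkv": "video", ".webm": "video",
}

# One service per output kind, not per input format.
SERVICE_BY_OUTPUT_KIND = {
    "document": "qubes.PdfConvert",   # assumed name of the current service
    "video": "qubes.VideoConvert",    # hypothetical, for illustration only
}


def pick_service(path):
    """Client side: decide from the file name alone, without reading the file."""
    ext = os.path.splitext(path)[1].lower()
    kind = OUTPUT_KIND_BY_EXT.get(ext)
    if kind is None:
        return None  # unsupported extension: do not offer conversion
    return SERVICE_BY_OUTPUT_KIND[kind]


# Well-known file signatures; Office formats (DOCX, ODT, ...) are ZIP containers.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"PK\x03\x04", "zip-container"),
    (b"\x1a\x45\xdf\xa3", "matroska"),
]


def sniff_format(head):
    """Service side: identify the input from its leading bytes, or return None."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    return None  # caller should exit non-zero for unsupported input
```

With this split, a misnamed file is still handled safely: the extension
only selects which service to call, and the service decides for itself
whether it can process what it actually received.
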
- -- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEhrpukzGPukRmQqkK24/THMrX1ywFAmnyAiMACgkQ24/THMrX
1yyWjggAkTob4pLVlIFz00WxKGROpNBWvQOxBhqFSeOPxbXUWFacnKWTlaeRUawn
p8SDyeufeUhBuUv4yg1Lw/m/LDZK2//KfzeKaIDHTcjJVFtRlwLYUdiu+V28cf89
VXYanaU7LmcphtQtsWb4ojqOjiFrNfIcGya7hwN5bfOFjlpa3IAvfiDHnUlhWgnP
BpFkynY5SikpKIl/WEjlqMHGEMonVuSITVj6SYIfUXnqWU/oR6f3/9M8K2r6TiMQ
At6C+J1wlXCv/9AmZvjhB9UhxYiS8Hjq37rQ4w3YYDg0KeDzAqZZ57/6Yx2wwKGm
t0PpLuMSHHK1y8xL/1Xtz4nGGQrP/g==
=jG6q
-----END PGP SIGNATURE-----

-- 
You received this message because you are subscribed to the Google Groups "qubes-devel" group.
