On 2020-04-02 21:46, Marek Marczykowski-Górecki wrote:
>> Marek: is OCR on a converted PDF safe? Being able to reconstruct the
>> text is very much useful.
>
> That's a tricky question. qpdf-convert-server have significant control
> over input for such OCR (within realm of valid image data). So, given
> complexity of OCR software, I think nothing can be completely ruled out.
> But also, I think (because of guaranteed proper input format) some
> catastrophic failure is unlikely.
>
> In fact, I consider another method for preserving text data. Enhanced
> "simpler representation", which besides pure image, contains also text
> annotations. Thing like series of (coordinates, text) pairs. This needs
> careful design, to be reasonably safe (for example defining what "text"
> could contain, to not risk re-interpreting it as something else in the
> PDF, or some intermediate tool).

That would be absolutely awesome. The biggest problem with
qvm-convert-pdf is the loss of text, and keeping the text would make
it far more usable.

>> Also, could this be integrated into CUPS?
>
> I don't see why not.

Given how insecure printers are, this would be a very good idea.
Perhaps similar technology (possibly based on something like seL4
instead of Xen) could be incorporated into printers themselves.

Sincerely,

Demi
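
To make the "(coordinates, text) pairs" idea quoted above a bit more concrete, here is a rough Python sketch of how the receiving side might validate such annotations before using them. Everything in it is an assumption made for illustration, not part of any existing Qubes tool: the `TextAnnotation` record, the `validate_annotation` helper, the 256-character limit, and the choice of allowed Unicode categories are all hypothetical. It only shows one possible way of "defining what text could contain".

```python
import unicodedata
from dataclasses import dataclass

# Hypothetical upper bound on the length of one annotation's text.
MAX_TEXT_LEN = 256

# Major Unicode categories accepted in annotation text (an assumed policy):
# letters, marks, numbers, punctuation, symbols. Control, format,
# surrogate, private-use and unassigned code points (C*) are rejected,
# as are all separators (Z*) other than the plain ASCII space.
ALLOWED_MAJOR = {"L", "M", "N", "P", "S"}


@dataclass(frozen=True)
class TextAnnotation:
    """One (coordinates, text) pair; coordinates are in page pixels."""
    x: int
    y: int
    width: int
    height: int
    text: str


def is_safe_char(ch: str) -> bool:
    """Return True if ch is a plain space or in an allowed category."""
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] in ALLOWED_MAJOR


def validate_annotation(ann: TextAnnotation,
                        page_width: int,
                        page_height: int) -> TextAnnotation:
    """Return ann unchanged if it passes all checks, else raise ValueError."""
    coords = (ann.x, ann.y, ann.width, ann.height)
    # type(...) is int also rejects bool and float values.
    if not all(type(v) is int for v in coords):
        raise ValueError("coordinates must be plain integers")
    if ann.x < 0 or ann.y < 0 or ann.width <= 0 or ann.height <= 0:
        raise ValueError("box must have a non-negative origin and positive size")
    if ann.x + ann.width > page_width or ann.y + ann.height > page_height:
        raise ValueError("box extends beyond the page")
    if not isinstance(ann.text, str):
        raise ValueError("text must be a string")
    if not 0 < len(ann.text) <= MAX_TEXT_LEN:
        raise ValueError("text is empty or too long")
    if not all(is_safe_char(ch) for ch in ann.text):
        raise ValueError("text contains a disallowed character")
    return ann
```

Note that this only restricts what the text may contain. Characters such as parentheses and backslashes are still allowed, so whatever finally writes the text into the PDF would still need to escape it properly for that format.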