Hi,

Before I start work on this, I just want to check that I'm not missing anything obvious. There isn't currently any interface exposed to let users progressively read a filtered PDF stream or do random I/O in an unfiltered stream, is there?
I'd like to provide a PdfInputStream-like interface for PdfStream, so that users can read huge streams in small segments. For streams that don't have any filters applied (be they file or memory based) it could also do random I/O.

This is different from GetFilteredCopy(PdfOutputStream*) in that there's no need for the caller to implement a custom PdfOutputStream to do whatever work they need to do, and for file streams it doesn't have to allocate a temporary copy of the whole stream in RAM in order to filter it. It'd also be an easier interface to use for most work, especially where you might not even want to decode the whole stream.

The main use I have for this is in PoDoFoBrowser, where we really shouldn't have to load a whole stream into memory and possibly allocate another decompressed copy of it if it's Flate-filtered or similar. The same principle applies to other programs processing big PDF streams (say, huge images), though.

I'd like to preserve the existing interfaces in PdfStream, but rewrite GetCopy and GetFilteredCopy to use the underlying progressive reading interfaces. PdfStream would no longer assume that a stream has an "internal buffer" that may be accessed; instead, it'll request data from the stream in small chunks and feed those to the output or to any required filter. The chunk size can be big enough that the (minimal) overhead of the function calls etc. for the progressive reading should be basically undetectable, and concrete stream implementations can override the methods if they have a simpler way to do it anyway.

Once I've got the PdfStream interface adjustments done, it should be possible to do something like extract and write a 100 MB image from a PDF without using more than a few hundred KB of RAM. Sound good?
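To make the idea a bit more concrete, here is a rough, self-contained sketch of the pull-style reading described above. None of these names exist in PoDoFo: ChunkReader, MemoryChunkReader, HexDecodingReader and CopyInChunks are placeholders made up for illustration, and the hex decoder is only a toy stand-in for a real filter such as Flate. It is meant to show the shape of the interface, not a proposed implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pull-style reader (placeholder name, not an existing PoDoFo class).
class ChunkReader {
public:
    virtual ~ChunkReader() = default;
    // Copies up to len bytes into buf; returns the number copied, 0 at end of stream.
    virtual std::size_t Read(char* buf, std::size_t len) = 0;
};

// Unfiltered in-memory stream: sequential reads plus random access via Seek().
class MemoryChunkReader : public ChunkReader {
public:
    explicit MemoryChunkReader(std::string data) : m_data(std::move(data)) {}

    std::size_t Read(char* buf, std::size_t len) override {
        const std::size_t n = std::min(len, m_data.size() - m_pos);
        std::memcpy(buf, m_data.data() + m_pos, n);
        m_pos += n;
        return n;
    }

    // Random I/O is only offered here because no filter sits in front of the data.
    void Seek(std::size_t pos) { m_pos = std::min(pos, m_data.size()); }

private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// Toy "filter": decodes pairs of hex digits pulled from another reader.
// Assumes well-formed input (no whitespace, no '>' terminator). A real filter
// would pull from the source in larger blocks instead of one byte at a time.
class HexDecodingReader : public ChunkReader {
public:
    explicit HexDecodingReader(ChunkReader& src) : m_src(src) {}

    std::size_t Read(char* buf, std::size_t len) override {
        std::size_t out = 0;
        char c = 0;
        while (out < len && m_src.Read(&c, 1) == 1) {
            const int v = (c <= '9') ? c - '0' : (c | 0x20) - 'a' + 10;
            if (m_haveHigh) {
                buf[out++] = static_cast<char>((m_high << 4) | v);
            } else {
                m_high = v;
            }
            m_haveHigh = !m_haveHigh;
        }
        return out;
    }

private:
    ChunkReader& m_src;
    int m_high = 0;
    bool m_haveHigh = false;
};

// What a GetCopy-style call could reduce to: pull fixed-size chunks and hand
// each one to a sink, so peak buffer use is one chunk whatever the stream size.
template <typename Sink>
std::size_t CopyInChunks(ChunkReader& in, Sink&& sink, std::size_t chunkSize = 4096) {
    assert(chunkSize > 0);
    std::vector<char> chunk(chunkSize);
    std::size_t total = 0;
    while (const std::size_t n = in.Read(chunk.data(), chunk.size())) {
        sink(chunk.data(), n);
        total += n;
    }
    return total;
}
```

A caller would wrap the raw reader in whatever filter readers the stream needs and pass a lambda (or a file writer) as the sink. The sink never sees more than chunkSize bytes at a time, which is the property that would let a huge image be extracted with a small, fixed amount of RAM.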
If so, the next thing I'll want to do is write a variant on PdfFileStream that uses an external temp file instead of a view into the original PDF, so it's possible to edit a stream without having to load the whole thing into RAM at once. Again, I'm sure you can see uses outside the obvious ones in PoDoFoBrowser.

--
Craig Ringer
