Re: [Podofo-users] PDF streams - progressive and random reads

Leonard Rosenthol Sat, 25 Oct 2008 15:03:26 -0700

Can this be done by using standard C++ iostreams, rather than creating
a new model? What about the Boost stream extensions?
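
For illustration only, a minimal sketch of what the iostreams route might
look like: a std::streambuf subclass that refills a small buffer on demand,
so ordinary std::istream code can read a large stream piece by piece.
ChunkSource, ChunkStreamBuf and ReadAllThroughIstream are hypothetical
names invented for this sketch; none of them are PoDoFo classes, and the
source here is just an in-memory string standing in for a PDF stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PDF stream that hands out data in pieces.
// NOT a PoDoFo class; it simply serves bytes from a string.
class ChunkSource {
public:
    explicit ChunkSource(std::string data) : data_(std::move(data)) {}

    // Copy up to `len` bytes into `buf`; returns bytes copied (0 at end).
    std::size_t Read(char* buf, std::size_t len) {
        const std::size_t n = std::min(len, data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Adapter: exposes a ChunkSource through the standard streambuf interface.
// Only `chunkSize` bytes are held in memory at any time.
class ChunkStreamBuf : public std::streambuf {
public:
    ChunkStreamBuf(ChunkSource& src, std::size_t chunkSize)
        : src_(src), buf_(chunkSize) {
        assert(chunkSize > 0);
        // Empty get area, so the first read triggers underflow().
        setg(buf_.data(), buf_.data(), buf_.data());
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        const std::size_t n = src_.Read(buf_.data(), buf_.size());
        if (n == 0)
            return traits_type::eof();
        setg(buf_.data(), buf_.data(), buf_.data() + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    ChunkSource& src_;
    std::vector<char> buf_;
};

// Usage: drain the source through a plain std::istream.
std::string ReadAllThroughIstream(const std::string& payload,
                                  std::size_t chunkSize) {
    ChunkSource src(payload);
    ChunkStreamBuf sb(src, chunkSize);
    std::istream in(&sb);
    std::string out;
    char tmp[16];
    // A short final read sets failbit but still reports gcount() > 0.
    while (in.read(tmp, sizeof tmp) || in.gcount() > 0)
        out.append(tmp, static_cast<std::size_t>(in.gcount()));
    return out;
}
```

Random access on unfiltered streams would additionally need seekoff() and
seekpos() overrides, which this sketch leaves out.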


Leonard

On Oct 23, 2008, at 3:25 PM, Craig Ringer wrote:

> Hi
>
> Before I start work on this, I just want to check to make sure I'm not
> missing anything obvious. There isn't currently any interface exposed to
> permit users to progressively read a filtered PDF stream or do random
> I/O in an unfiltered stream, is there?
>
> I'd like to provide a PdfInputStream-like interface for PdfStream, so
> that users can read huge streams in small segments. For streams that
> don't have any filters applied (be they file or memory based) it could
> also do random I/O.
>
> This is different from GetFilteredCopy(PdfOutputStream*), in that
> there's no need for the caller to implement a custom PdfOutputStream to
> do whatever work they need to do, and for file streams it doesn't have
> to allocate a temporary copy of the whole stream in RAM in order to
> filter it. It'd also be an easier interface to use for most work,
> especially where you might not even want to decode all the stream.
>
> The main use I have for this is in PoDoFoBrowser, where we really
> shouldn't have to allocate a whole stream in memory and possibly
> allocate another decompressed copy of it if it's flate filtered or
> similar. The same principle will apply to other programs processing big
> PDF streams (say, huge images) though.
>
> I'd like to preserve the existing interfaces in PdfStream, but rewrite
> GetCopy and GetFilteredCopy to use the underlying progressive reading
> interfaces. PdfStream would no longer make any assumption that a stream
> has an "internal buffer" that may be accessed; instead, it'll request
> data from the stream in small chunks and feed those to the output or to
> any required filter. The chunk size can be big enough that the (minimal)
> overhead of the function calls etc for the progressive reading should be
> basically undetectable, and concrete stream implementations can override
> the methods if they have a simpler way to do it anyway.
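
The chunked copy described above can be sketched roughly as follows. This
is an illustration under stated assumptions, not the proposed PdfStream
interface: ProgressiveReader, ByteSink, MemoryReader, StringSink,
CopyInChunks and CopyString are hypothetical names made up for the
example, and the "stream" is an in-memory string. Seek() stands in for
the random I/O that would only be available on unfiltered streams.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader interface: hands out data in caller-sized pieces.
struct ProgressiveReader {
    virtual ~ProgressiveReader() = default;
    // Returns the number of bytes written to `buf` (0 at end of data).
    virtual std::size_t Read(char* buf, std::size_t len) = 0;
};

// Hypothetical output interface.
struct ByteSink {
    virtual ~ByteSink() = default;
    virtual void Write(const char* buf, std::size_t len) = 0;
};

// Unfiltered in-memory source; supports random access via Seek().
class MemoryReader : public ProgressiveReader {
public:
    explicit MemoryReader(std::string data) : data_(std::move(data)) {}

    std::size_t Read(char* buf, std::size_t len) override {
        const std::size_t n = std::min(len, data_.size() - pos_);
        std::memcpy(buf, data_.data() + pos_, n);
        pos_ += n;
        return n;
    }

    // Offsets past the end are clamped to the end of the data.
    void Seek(std::size_t offset) { pos_ = std::min(offset, data_.size()); }

private:
    std::string data_;
    std::size_t pos_ = 0;
};

// Sink that collects everything into a string (for demonstration).
class StringSink : public ByteSink {
public:
    void Write(const char* buf, std::size_t len) override {
        out_.append(buf, len);
    }
    const std::string& Str() const { return out_; }

private:
    std::string out_;
};

// Core loop: only one chunk of `chunkSize` bytes is buffered at a time.
std::size_t CopyInChunks(ProgressiveReader& in, ByteSink& out,
                         std::size_t chunkSize) {
    std::vector<char> chunk(chunkSize);
    std::size_t total = 0;
    for (;;) {
        const std::size_t n = in.Read(chunk.data(), chunk.size());
        if (n == 0)
            break;  // end of data (or chunkSize was 0)
        out.Write(chunk.data(), n);
        total += n;
    }
    return total;
}

// Usage: copy `data` starting at `startOffset`, `chunkSize` bytes at a time.
std::string CopyString(const std::string& data, std::size_t chunkSize,
                       std::size_t startOffset) {
    MemoryReader reader(data);
    reader.Seek(startOffset);
    StringSink sink;
    CopyInChunks(reader, sink, chunkSize);
    return sink.Str();
}
```

A filter stage would sit between Read() and Write() in the loop; the
working set stays at one chunk plus whatever state the filter keeps.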
>
> Once I've got the PdfStream interface adjustments done it should be
> possible to do something like extract and write a 100MB image from a PDF
> without using more than a few hundred kb of RAM.
>
> Sound good? If so, the next thing I'll want to do is write a variant on
> PdfFileStream that uses an external temp file instead of a view into the
> original PDF, so it's possible to edit a stream without having to load
> the whole thing into RAM at once. Again, I'm sure you can see uses
> outside the obvious ones in PoDoFoBrowser.
>
> --
> Craig Ringer


_______________________________________________
Podofo-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/podofo-users
