Hi all,

This week I'm trying to make CairoOutputDev emit CCITT G4 or JBIG2 data as-is, to reduce the file size (my impression is that transcoding from CCITT G4/JBIG2 to Deflate at least doubles the data size). Cairo does not support CCITT emission yet, but it already supports JBIG2 emission (to the PDF surface). However, because JBIG2-coded data is sometimes not self-contained, there are a few points to consider, so I would like to ask for your comments on an appropriate design.

What is JBIG2Globals?
---------------------

The problem is the handling of "Globals". Some JBIG2 streams in a PDF may refer to another binary data stream, "Globals", that is shared by multiple JBIG2 images (by storing the common content once as an external, shared resource, PDF can reduce the file size). Here is an example quoted from the PDF spec (PDF 32000-1:2008), p.33:

    5 0 obj
    << /Type /XObject
       /Subtype /Image
       /Width 52
       /Height 66
       /ColorSpace /DeviceGray
       /BitsPerComponent 1
       /Length 224
       /Filter [/ASCIIHexDecode /JBIG2Decode]
       /DecodeParms [null << /JBIG2Globals 6 0 R >>]
    >>
    stream
    000000013000010000001300000034000000420000000000
    ...

JBIG2Globals is a shared data stream stored outside the JBIG2 image stream.

Cairo interface to manage JBIG2Globals
--------------------------------------

In cairo, we can pass three kinds of JBIG2-related data via the cairo_surface_set_mime_data() API:

1) the JBIG2 data itself (the stream in "5 0 obj" itself, in the example above),
2) the JBIG2 global data (the stream in "6 0 R", in the example above),
3) a unique ID specifying which JBIG2 global data should be used in the decoding process.

I don't fully understand cairo's official design yet, but it seems that the unique-id (3) is passed first, the JBIG2 image (1) next, and the JBIG2 global data (2) last. When the JBIG2 image is passed, cairo binds it to the most recently declared unique-id, and when the JBIG2 global data (2) is passed, cairo likewise binds it to the most recently declared unique-id. Therefore, even if we send the same JBIG2 global data (2) repeatedly, only one copy of it is emitted to the PDF output as long as we don't change the unique-id (3). The question is how we should determine the unique-id for the JBIG2 global data.

Problem of making a unique-id for JBIG2Globals in PDF
-----------------------------------------------------

The easiest and most straightforward idea would be to form the unique-id from the object number and generation number of the reference to the JBIG2 global data.
In the example above, we could declare it as "pdf-jbig2-globals-6-0". However, it seems that the current design of JBIG2Stream holds the stream itself, not the indirect object referring to the stream (in the example above, the JBIG2Stream class can access the content of the "6 0 R" stream, but cannot know how it was referred to: the object number (=6) and the generation number (=0)).

Furthermore, we can imagine a worse case, where the same object is reached through differently chained references:

    1 0 obj
    << /Length 100 >>
    stream
    .....
    endstream
    endobj

    2 0 obj
    1 0 R
    endobj

    3 0 obj
    << ... /Filter /JBIG2Decode /DecodeParms << /JBIG2Globals 1 0 R >>
    >>
    stream
    ...
    endstream
    endobj

    4 0 obj
    << ... /Filter /JBIG2Decode /DecodeParms << /JBIG2Globals 2 0 R >>
    >>
    stream
    ...
    endstream
    endobj

I'm not sure whether such a chained indirect object is prohibited (I could not find a statement to that effect in PDF 32000-1:2008, p.21-22). If it is not prohibited, and we use "1 0 R" and "2 0 R" to make the unique-ids, the global data would be duplicated in the output.

Question
--------

There might be two ideas to solve this problem:

A) Copy the global data content to a temporary buffer (this is not wasted work, because we have to pass the data to cairo anyway), calculate some hash value over it, and use the hash as the unique-id.

B) Track the chain of references down to the stream object, and use the last reference before the stream to make the unique-id. However, we may have to extend the JBIG2Stream class to hold the referring object (or its object number and generation number).

Which is better? Or is there any other good idea for making a unique-id for the JBIG2 global data?

Regards,
mpsuzuki
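
P.S. To make ideas A and B a bit more concrete, here is a rough C++ sketch. Everything in it is hypothetical: Ref, Bytes, Obj, XRefTable, idFromContent() and idFromRef() are toy stand-ins I made up for illustration, not poppler's real classes, and FNV-1a is only one possible choice of hash. The "pdf-jbig2-globals-" prefix simply follows the naming used in this mail.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Toy model of indirect objects (hypothetical, NOT poppler's API).
struct Ref {
    int num;
    int gen;
    bool operator<(const Ref &o) const {
        return num != o.num ? num < o.num : gen < o.gen;
    }
};
using Bytes = std::vector<unsigned char>;
// An indirect object is either another reference or raw stream data.
using Obj = std::variant<Ref, Bytes>;
using XRefTable = std::map<Ref, Obj>;

// Idea A: derive the unique-id from the content of the global data.
// Uses 64-bit FNV-1a plus the data length; a stronger hash could be
// substituted if collisions are a concern.
std::string idFromContent(const Bytes &data) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a offset basis
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV-1a prime
    }
    char buf[64];
    std::snprintf(buf, sizeof buf, "pdf-jbig2-globals-%016llx-%zu",
                  static_cast<unsigned long long>(h), data.size());
    return buf;
}

// Idea B: follow chained references until the object that actually
// holds the stream, and build the unique-id from that last reference.
// The depth limit is an arbitrary guard against reference loops.
std::string idFromRef(const XRefTable &xref, Ref r) {
    for (int depth = 0; depth < 32; ++depth) {
        auto it = xref.find(r);
        if (it == xref.end())
            break;                               // dangling reference
        if (const Ref *next = std::get_if<Ref>(&it->second))
            r = *next;                           // one more hop
        else
            break;                               // reached the stream
    }
    return "pdf-jbig2-globals-" + std::to_string(r.num) + "-" +
           std::to_string(r.gen);
}
```

With the chained example (object 2 containing only "1 0 R"), idFromRef() gives "pdf-jbig2-globals-1-0" for both "1 0 R" and "2 0 R", so the global data would be emitted once. idFromContent() needs no knowledge of the reference at all, at the cost of hashing the buffer and a small chance of treating two different global streams as the same one if their hashes and lengths collide.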
