I can't seem to reproduce the slow down any more. I am assuming something was not rebuilt correctly when the slowdown occurred. I've pushed out the patch as I am now confident there is no regression.
On 22/01/15 21:11, Adrian Johnson wrote: > I'll take a look and see if I can work out what is going on. > > On 22/01/15 16:58, suzuki toshiya wrote: >> Dear Adrian, >> >> I've tested with The_Miseries_of_Human_Life.pdf, but >> strangely I could not reproduce the slowdown issue. >> Furthermore, pdftocairo with JBIG2 patch finishes >> the rendering faster than one without JBIG2 patch. >> Maybe my machine (Core2Duo 1GHz, RAM 2GB) is quite >> slower than yours - in fact, my machine spent 3-5 minutes >> to render the PDF by pdftocairo. >> >> I uploaded Linux/i386 static binaries built with "-pg -g -O2" at: >> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo >> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2 >> >> Also I uploaded sample profiles (gmon.out) at: >> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.out >> >> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.out >> >> >> >> Their analysis outputs by gprof are at: >> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.log >> >> http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.log >> >> >> If I made the binaries without profiler (without -pg) >> the processing times are almost same. So I think this is >> not because the profiler made the rendering slower. >> >> Comparing the profile results, the remarkable difference >> would be that 3 functions OUT of poppler spent long time >> in original pdftocairo: >> * _cairo_pdf_surface_show_page() spent 35.77s (spent 0.08s in patched) >> * longest_match() (of zlib) spent 18.72s (spent 0.72s in patched) >> * _cairo_image_analyze_color() spent 13.02s (not called in patched) >> >> I guess they are used to reencoding of pixel data from >> JBIG2 to Deflate. If it's required, I will make a profiled >> Cairo library and check more detail. >> >> -- >> >> At present, I don't have an access to the machines that >> can render The_Miseries_of_Human_Life.pdf within 1 min. >> So, please help me to analyze the cause of slowdown. >> >> Could you build the binary with profiler and show the >> profiled results? Of course you can use my binaries to >> check if our difference is because of the machines, or, >> because of different configurations. >> >> Regards, >> mpsuzuki >> >> >> suzuki toshiya wrote: >>> Dear Adrian, >>> >>> Great Thank you for benchmark evaluation. >>> I will try to analyze the slowdown and >>> improve the performance. >>> >>> Regards, >>> mpsuzuki >>> >>> Adrian Johnson wrote: >>>> I've tested the lastest patch. It works fine but I noticed about a 50% >>>> slow down with pdfs containing a lot of JBIG2 images. >>>> >>>> I've uploaded a couple of sample pdfs here: >>>> http://people.freedesktop.org/~ajohnson/jbig2/ >>>> >>>> Below are the timings for each document using git master and git master >>>> with the jbig2 patch. I used the command line >>>> pdftocairo -pdf input.pdf output.pdf >>>> >>>> >>>> - AnnualReport.pdf >>>> master >>>> real 0m13.562s >>>> user 0m12.620s >>>> sys 0m0.820s >>>> >>>> jbig2 >>>> real 0m21.045s >>>> user 0m20.000s >>>> sys 0m1.060s >>>> >>>> >>>> - The_Miseries_of_Human_Life.pdf >>>> master >>>> real 0m32.690s >>>> user 0m29.716s >>>> sys 0m2.832s >>>> >>>> jbig2 >>>> real 0m42.793s >>>> user 0m39.680s >>>> sys 0m2.848s >>>> >>>> >>>> While it is still worth a small performance reduction to get the >>>> significant reduction in the output file size, it may be worth seeing if >>>> there is a simple solution to the slowdown. My first guess is to try >>>> attaching the global data to only one image instead of to every image. >>>> >>>> >>>> On 20/01/15 02:42, suzuki toshiya wrote: >>>>> Dear Carlos, >>>>> >>>>> Sorry! I had a silly mistake, I should pass ref->getRef() >>>>> (in my previous patch, I passed ref->isRef()). >>>>> I attach corrected patch. >>>>> >>>>> Regards, >>>>> mpsuzuki >>>>> >>>>> suzuki toshiya wrote: >>>>>> Dear Carlos, >>>>>> >>>>>> Thank you for finding out the points! I attached revised patch. >>>>>> The difference from my previous revision is following. >>>>>> >>>>>> Regards, >>>>>> mpsuzuki >>>>>> >>>>>> >>>>>> diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc >>>>>> index a428d2c..17b6b3b 100644 >>>>>> --- a/poppler/CairoOutputDev.cc >>>>>> +++ b/poppler/CairoOutputDev.cc >>>>>> @@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState >>>>>> *state, Stream *str, Object *ref, >>>>>> #endif >>>>>> >>>>>> if (getStreamData (str->getNextStream(), &strBuffer, &len)) { >>>>>> - cairo_status_t status; >>>>>> + cairo_status_t status = CAIRO_STATUS_SUCCESS; >>>>>> >>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2) >>>>>> if (ref && ref->isRef()) { >>>>>> - Ref imgRef = ref->getRef(); >>>>>> status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID, >>>>>> - "poppler-surface-", imgRef); >>>>>> + "poppler-surface-", ref->isRef()); >>>>>> } >>>>>> #endif >>>>>> if (!status) { >>>>>> >>>>>> >>>>>> >>>>>> Carlos Garcia Campos wrote: >>>>>>> Adrian Johnson <[email protected]> writes: >>>>>>> >>>>>>>> The latest patch looks good. I'll wait a week before committing >>>>>>>> to see >>>>>>>> if anyone else has any comments. This will give me time to do some >>>>>>>> testing of the patch. >>>>>>> Looks good to me as well in general. There's only one thing I've >>>>>>> noticed. >>>>>>> >>>>>>>> @@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState >>>>>>>> *state, Stream *str, Object *ref, >>>>>>>> if (!colorMapHasIdentityDecodeMap(colorMap)) >>>>>>>> return; >>>>>>>> >>>>>>>> +#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0) >>>>>>>> + if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, >>>>>>>> image)) >>>>>>>> + return; >>>>>>>> +#endif >>>>>>>> + >>>>>>>> if (getStreamData (str->getNextStream(), &strBuffer, &len)) { >>>>>>>> - cairo_status_t st; >>>>>>>> + cairo_status_t status; >>>>>>>> >>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2) >>>>>>>> if (ref && ref->isRef()) { >>>>>>>> Ref imgRef = ref->getRef(); >>>>>>>> - GooString *surfaceId = new GooString("poppler-surface-"); >>>>>>>> - surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num); >>>>>>>> - char *idBuffer = copyString(surfaceId->getCString()); >>>>>>>> - st = cairo_surface_set_mime_data (image, >>>>>>>> CAIRO_MIME_TYPE_UNIQUE_ID, >>>>>>>> - (const unsigned char >>>>>>>> *)idBuffer, >>>>>>>> - surfaceId->getLength(), >>>>>>>> - gfree, idBuffer); >>>>>>>> - if (st) >>>>>>>> - gfree(idBuffer); >>>>>>>> - delete surfaceId; >>>>>>>> + status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID, >>>>>>>> + "poppler-surface-", imgRef); >>>>>>> We could pass ref->getRef() directly here. >>>>>>> >>>>>>>> } >>>>>>>> #endif >>>>>>>> + if (!status) { >>>>>>> status might be uninitialized at this point if cairo version is not >>>>>>> recent enough or the ref && ref->isRef() condition is False. >>>>>>> >>>>>>>> + status = cairo_surface_set_mime_data (image, mime_type, >>>>>>>> + (const unsigned char *)strBuffer, len, >>>>>>>> + gfree, strBuffer); >>>>>>>> + } >>>>>>> >>>>>>>> - st = cairo_surface_set_mime_data (image, >>>>>>>> - str->getKind() == strDCT ? >>>>>>>> - CAIRO_MIME_TYPE_JPEG : CAIRO_MIME_TYPE_JP2, >>>>>>>> - (const unsigned char *)strBuffer, len, >>>>>>>> - gfree, strBuffer); >>>>>>>> - if (st) >>>>>>>> + if (status) >>>>>>>> gfree (strBuffer); >>>>>>>> } >>>>>>>> } >>>>>>>> On 11/01/15 23:05, suzuki toshiya wrote: >>>>>>>>> Dear Adrian, >>>>>>>>> >>>>>>>>> OK - considering that there is a lag to the day that Linux >>>>>>>>> distribution ship the poppler stable branch including >>>>>>>>> CairoOutputDev with JBIG2 support even if I support some >>>>>>>>> snapshot of Cairo in current development branch of poppler >>>>>>>>> (and there would be no need to support Cairo snapshot in >>>>>>>>> the day), I decided to follow original version checking. >>>>>>>>> >>>>>>>>> Also I removed the default value (NULL) for Ref object to >>>>>>>>> JBIG2Stream constructor and removed NULL pointer checking >>>>>>>>> in JBIG2Stream constructor (as same as the argument to pass >>>>>>>>> Globals object). >>>>>>>>> >>>>>>>>> I replaced "st" in CairoOutputDev, by "status". >>>>>>>>> >>>>>>>>> Here I attached revised patch - oh, I'm quite sorry, my >>>>>>>>> previous posts were misunderstanding this year as 2014. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> mpsuzuki >>>>>>>>> >>>>>>>>> Adrian Johnson wrote: >>>>>>>>>> On 08/01/15 13:35, suzuki toshiya wrote: >>>>>>>>>>> Dear Adrian, >>>>>>>>>>> >>>>>>>>>>> Great Thank you for prompt review! I attached the revised patch, >>>>>>>>>>> but please let me discuss a few points. >>>>>>>>>>> >>>>>>>>>>> Adrian Johnson wrote: >>>>>>>>>>>> On 07/01/15 02:58, suzuki toshiya wrote: >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> Here I submit a preliminary patch for git head, >>>>>>>>>>>>> which enables JBIG2 embedding to PDF surface via Cairo. >>>>>>>>>>>> +static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, >>>>>>>>>>>> int >>>>>>>>>>>> refGen, >>>>>>>>>>>> + >>>>>>>>>>>> cairo_surface_t >>>>>>>>>>>> *image) >>>>>>>>>>>> +{ >>>>>>>>>>>> + GooString *globalsStrId; >>>>>>>>>>>> + char *idBuffer; >>>>>>>>>>>> + cairo_status_t st; >>>>>>>>>>>> + >>>>>>>>>>>> + >>>>>>>>>>>> + globalsStrId = new GooString; >>>>>>>>>>>> + globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen); >>>>>>>>>>>> + idBuffer = copyString(globalsStrId->getCString()); >>>>>>>>>>>> + st = cairo_surface_set_mime_data (image, >>>>>>>>>>>> CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID, >>>>>>>>>>>> >>>>>>>>>>>> The cairo JBIG2 mime types are new to 1.14.0 so will need to be >>>>>>>>>>>> inside a >>>>>>>>>>>> #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0). >>>>>>>>>>> If it's acceptable, I want to use the conditional like >>>>>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID >>>>>>>>>>> The official release of cairo including JBIG2 support is 1.14.0, >>>>>>>>>>> however, some distributions use a snapshot before 1.14 which >>>>>>>>>>> supports JBIG2. For example, Ubuntu 14.04 (long time support) >>>>>>>>>>> ships with 1.13.0~20140204. >>>>>>>>>>> >>>>>>>>>>> http://packages.ubuntu.com/search?suite=trusty§ion=all&arch=any&keywords=libcairo2&searchon=names >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> It includes JBIG2 support. Thus I prefer checking by the JBIG2 >>>>>>>>>>> macros, instead of the version checking. >>>>>>>>>> I'm not keen on making an exception for this case. The target >>>>>>>>>> audience >>>>>>>>>> for this is very small. It would only help users that compile a >>>>>>>>>> new >>>>>>>>>> version of poppler on an old distribution without also recompiling >>>>>>>>>> cairo. JBIG2 support for pdftocairo is a very minor feature >>>>>>>>>> that not >>>>>>>>>> many people would be using (so far no one else has requested this >>>>>>>>>> feature). >>>>>>>>>> >>>>>>>>>> If you really want to support distributions using a cairo 1.13 >>>>>>>>>> snapshot >>>>>>>>>> I suggest adding a comment before the first occurrence of >>>>>>>>>> #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the >>>>>>>>>> version macro >>>>>>>>>> is not used. Also include >>>>>>>>>> "#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the >>>>>>>>>> comment so >>>>>>>>>> that: >>>>>>>>>> a) searching for CAIRO_VERSION will find it, and >>>>>>>>>> b) to document that JBIG2 support is in 1.14 or later. >>>>>>>>>> >>>>>>>>>>> BTW, when the string for UNIQUE_ID is constructed, its syntax >>>>>>>>>>> is RefGen-RefNum. In PDF data, RefNum then RefGen is popular >>>>>>>>>>> syntax. Is there any reason why the order is reverted? >>>>>>>>>> No reason. Feel free to change it to num-gen order. >>>>>>>>>> >>>>>>>>>>>> +GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream *str, >>>>>>>>>>>> + >>>>>>>>>>>> cairo_surface_t >>>>>>>>>>>> *image) >>>>>>>>>>>> +{ >>>>>>>>>>>> + JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str); >>>>>>>>>>>> + Object* globalsStr = jbig2Str->getGlobalsStream(); >>>>>>>>>>>> + char *globalsBuffer; >>>>>>>>>>>> + int globalsLength; >>>>>>>>>>>> + >>>>>>>>>>>> + // According to PDF spec, Globals should be stream (or >>>>>>>>>>>> nothing) >>>>>>>>>>>> + >>>>>>>>>>>> + if (globalsStr->isNull()) >>>>>>>>>>>> + return gTrue; >>>>>>>>>>>> + >>>>>>>>>>>> + if (!globalsStr->isStream()) >>>>>>>>>>>> + return gFalse; >>>>>>>>>>>> + >>>>>>>>>>>> + // See JBIG2Stream constructor for valid reference number >>>>>>>>>>>> + if (!jbig2Str->getGlobalsStreamRefNum()) >>>>>>>>>>>> + return gFalse; >>>>>>>>>>>> >>>>>>>>>>>> The globalsStr->isStream() check should be sufficient. The >>>>>>>>>>>> reference >>>>>>>>>>>> number should always be valid of the global stream is valid. >>>>>>>>>>> Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to >>>>>>>>>>> it must be always valid as far as GlobalsStream is used" is >>>>>>>>>>> intuitive solution, but public JBIG2Stream constructor has not >>>>>>>>>>> requested the Ref to GlobalsStream before in earlier version. >>>>>>>>>>> >>>>>>>>>>> If some developers use JBIG2Stream constructor with valid >>>>>>>>>>> GlobalsStream but without Ref to it (as the syntax before my >>>>>>>>>>> patch), there is the situation that JBIG2Stream::globalsStream >>>>>>>>>>> is valid but Ref to it is invalid (if I supply the default >>>>>>>>>>> value for the argument to pass Ref). How do we detect or prevent >>>>>>>>>>> such case? >>>>>>>>>>> >>>>>>>>>>> A) Supplying the default value (NULL pointer) in the argument >>>>>>>>>>> to pass the Ref, and JBIG2Stream returns an error when >>>>>>>>>>> globalsStream is valid stream but its Ref is not passed - and >>>>>>>>>>> prevent the case that globalsStream is valid but its Ref is >>>>>>>>>>> invalid. >>>>>>>>>>> >>>>>>>>>>> B) Not supplying the default value, and prevent the construction >>>>>>>>>>> without the Ref to the globalsStream. >>>>>>>>>>> >>>>>>>>>>> C) Add new boolean value in JBIG2Stream class, to indicate if >>>>>>>>>>> Ref to globalsStream is explicitly set. >>>>>>>>>>> >>>>>>>>>>> D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535) >>>>>>>>>>> internally when Ref is not passed, and deal the case as Ref to >>>>>>>>>>> the globalsStream is not initialized (my last patch design) >>>>>>>>>>> >>>>>>>>>>> Which is the best? Or, no need to case such case? >>>>>>>>>> JBIG2Stream is not part of the public API so there is no need to >>>>>>>>>> preserve compatibility. >>>>>>>>>> >>>>>>>>>> Something else I just noticed - could you rename "st" to "status". >>>>>>>>>> CairoOutputdev.cc always uses the name "status" for cairo >>>>>>>>>> status so we >>>>>>>>>> should be consistent with this convention. >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> poppler mailing list >>>>>>>> [email protected] >>>>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler >>>>>> >>>>>> ------------------------------------------------------------------------ >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> poppler mailing list >>>>>> [email protected] >>>>>> http://lists.freedesktop.org/mailman/listinfo/poppler >>>>> >>>>> >>>>> _______________________________________________ >>>>> poppler mailing list >>>>> [email protected] >>>>> http://lists.freedesktop.org/mailman/listinfo/poppler >>>>> >>>> >>> >>> _______________________________________________ >>> poppler mailing list >>> [email protected] >>> http://lists.freedesktop.org/mailman/listinfo/poppler >>> >> _______________________________________________ >> poppler mailing list >> [email protected] >> http://lists.freedesktop.org/mailman/listinfo/poppler > > _______________________________________________ > poppler mailing list > [email protected] > http://lists.freedesktop.org/mailman/listinfo/poppler > _______________________________________________ poppler mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/poppler
