Dear Adrian,

I've tested with The_Miseries_of_Human_Life.pdf, but
strangely I could not reproduce the slowdown issue.
Furthermore, pdftocairo with JBIG2 patch finishes
the rendering faster than one without JBIG2 patch.
Maybe my machine (Core2Duo 1GHz, RAM 2GB) is quite
slower than yours - in fact, my machine spent 3-5 minutes
to render the PDF by pdftocairo.

I uploaded Linux/i386 static binaries built with "-pg -g -O2" at:
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2

Also I uploaded sample profiles (gmon.out) at:
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.out
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.out


Their analysis outputs by gprof are at:
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-4m20s.gmon.log
http://gyvern.ipc.hiroshima-u.ac.jp/~mpsuzuki/JBIG2/pdftocairo-jbig2-3m03s.gmon.log

If I made the binaries without profiler (without -pg)
the processing times are almost same. So I think this is
not because the profiler made the rendering slower.

Comparing the profile results, the remarkable difference
would be that 3 functions OUT of poppler spent long time
in original pdftocairo:
* _cairo_pdf_surface_show_page() spent 35.77s (spent 0.08s in patched)
* longest_match() (of zlib) spent 18.72s (spent 0.72s in patched)
* _cairo_image_analyze_color() spent 13.02s (not called in patched)

I guess they are used to reencoding of pixel data from
JBIG2 to Deflate. If it's required, I will make a profiled
Cairo library and check more detail.

--

At present, I don't have an access to the machines that
can render The_Miseries_of_Human_Life.pdf within 1 min.
So, please help me to analyze the cause of slowdown.

Could you build the binary with profiler and show the
profiled results? Of course you can use my binaries to
check if our difference is because of the machines, or,
because of different configurations.

Regards,
mpsuzuki


suzuki toshiya wrote:
Dear Adrian,

Great Thank you for benchmark evaluation.
I will try to analyze the slowdown and
improve the performance.

Regards,
mpsuzuki

Adrian Johnson wrote:
I've tested the lastest patch. It works fine but I noticed about a 50%
slow down with pdfs containing a lot of JBIG2 images.

I've uploaded a couple of sample pdfs here:
http://people.freedesktop.org/~ajohnson/jbig2/

Below are the timings for each document using git master and git master
with the jbig2 patch. I used the command line
pdftocairo -pdf input.pdf output.pdf


- AnnualReport.pdf
master
real    0m13.562s
user    0m12.620s
sys     0m0.820s

jbig2
real    0m21.045s
user    0m20.000s
sys     0m1.060s
        

- The_Miseries_of_Human_Life.pdf
master
real    0m32.690s
user    0m29.716s
sys     0m2.832s

jbig2
real    0m42.793s
user    0m39.680s
sys     0m2.848s


While it is still worth a small performance reduction to get the
significant reduction in the output file size, it may be worth seeing if
there is a simple solution to the slowdown. My first guess is to try
attaching the global data to only one image instead of to every image.


On 20/01/15 02:42, suzuki toshiya wrote:
Dear Carlos,

Sorry! I had a silly mistake, I should pass ref->getRef()
(in my previous patch, I passed ref->isRef()).
I attach corrected patch.

Regards,
mpsuzuki

suzuki toshiya wrote:
Dear Carlos,

Thank you for finding out the points! I attached revised patch.
The difference from my previous revision is following.

Regards,
mpsuzuki


diff --git a/poppler/CairoOutputDev.cc b/poppler/CairoOutputDev.cc
index a428d2c..17b6b3b 100644
--- a/poppler/CairoOutputDev.cc
+++ b/poppler/CairoOutputDev.cc
@@ -2833,13 +2833,12 @@ void CairoOutputDev::setMimeData(GfxState *state, 
Stream *str, Object *ref,
  #endif

    if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
-    cairo_status_t status;
+    cairo_status_t status = CAIRO_STATUS_SUCCESS;

  #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
      if (ref && ref->isRef()) {
-      Ref imgRef = ref->getRef();
        status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
-                                "poppler-surface-", imgRef);
+                                "poppler-surface-", ref->isRef());
      }
  #endif
      if (!status) {



Carlos Garcia Campos wrote:
Adrian Johnson <[email protected]> writes:

The latest patch looks good. I'll wait a week before committing to see
if anyone else has any comments. This will give me time to do some
testing of the patch.
Looks good to me as well in general. There's only one thing I've
noticed.

@@ -2746,31 +2827,28 @@ void CairoOutputDev::setMimeData(GfxState *state, 
Stream *str, Object *ref,
    if (!colorMapHasIdentityDecodeMap(colorMap))
      return;

+#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)
+  if (strKind == strJBIG2 && !setMimeDataForJBIG2Globals(str, image))
+    return;
+#endif
+
    if (getStreamData (str->getNextStream(), &strBuffer, &len)) {
-    cairo_status_t st;
+    cairo_status_t status;

  #if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 11, 2)
      if (ref && ref->isRef()) {
        Ref imgRef = ref->getRef();
-      GooString *surfaceId = new GooString("poppler-surface-");
-      surfaceId->appendf("{0:d}-{1:d}", imgRef.gen, imgRef.num);
-      char *idBuffer = copyString(surfaceId->getCString());
-      st = cairo_surface_set_mime_data (image, CAIRO_MIME_TYPE_UNIQUE_ID,
-                                        (const unsigned char *)idBuffer,
-                                        surfaceId->getLength(),
-                                        gfree, idBuffer);
-      if (st)
-        gfree(idBuffer);
-      delete surfaceId;
+      status = setMimeIdFromRef(image, CAIRO_MIME_TYPE_UNIQUE_ID,
+                                "poppler-surface-", imgRef);
We could pass ref->getRef() directly here.

      }
  #endif
+    if (!status) {
status might be uninitialized at this point if cairo version is not
recent enough or the ref && ref->isRef() condition is False.

+      status = cairo_surface_set_mime_data (image, mime_type,
+                                           (const unsigned char *)strBuffer, 
len,
+                                           gfree, strBuffer);
+    }

-    st = cairo_surface_set_mime_data (image,
-                                     str->getKind() == strDCT ?
-                                     CAIRO_MIME_TYPE_JPEG : 
CAIRO_MIME_TYPE_JP2,
-                                     (const unsigned char *)strBuffer, len,
-                                     gfree, strBuffer);
-    if (st)
+    if (status)
        gfree (strBuffer);
    }
  }
On 11/01/15 23:05, suzuki toshiya wrote:
Dear Adrian,

OK - considering that there is a lag to the day that Linux
distribution ship the poppler stable branch including
CairoOutputDev with JBIG2 support even if I support some
snapshot of Cairo in current development branch of poppler
(and there would be no need to support Cairo snapshot in
the day), I decided to follow original version checking.

Also I removed the default value (NULL) for Ref object to
JBIG2Stream constructor and removed NULL pointer checking
in JBIG2Stream constructor (as same as the argument to pass
Globals object).

I replaced "st" in CairoOutputDev, by "status".

Here I attached revised patch - oh, I'm quite sorry, my
previous posts were misunderstanding this year as 2014.

Regards,
mpsuzuki

Adrian Johnson wrote:
On 08/01/15 13:35, suzuki toshiya wrote:
Dear Adrian,

Great Thank you for prompt review! I attached the revised patch,
but please let me discuss a few points.

Adrian Johnson wrote:
On 07/01/15 02:58, suzuki toshiya wrote:
Hi all,

Here I submit a preliminary patch for git head,
which enables JBIG2 embedding to PDF surface via Cairo.
+static cairo_status_t setJBIG2GlobalsIdByRefNums(int refNum, int
refGen,
+                                                 cairo_surface_t
*image)
+{
+  GooString *globalsStrId;
+  char *idBuffer;
+  cairo_status_t st;
+
+
+  globalsStrId = new GooString;
+  globalsStrId->appendf("{0:d}-{1:d}", refNum, refGen);
+  idBuffer = copyString(globalsStrId->getCString());
+  st = cairo_surface_set_mime_data (image,
CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID,

The cairo JBIG2 mime types are new to 1.14.0 so will need to be
inside a
#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0).
If it's acceptable, I want to use the conditional like
     #ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID
The official release of cairo including JBIG2 support is 1.14.0,
however, some distributions use a snapshot before 1.14 which
supports JBIG2. For example, Ubuntu 14.04 (long time support)
ships with 1.13.0~20140204.

http://packages.ubuntu.com/search?suite=trusty&section=all&arch=any&keywords=libcairo2&searchon=names


It includes JBIG2 support. Thus I prefer checking by the JBIG2
macros, instead of the version checking.
I'm not keen on making an exception for this case. The target audience
for this is very small. It would only help users that compile a new
version of poppler on an old distribution without also recompiling
cairo. JBIG2 support for pdftocairo is a very minor feature that not
many people would be using (so far no one else has requested this
feature).

If you really want to support distributions using a cairo 1.13 snapshot
I suggest adding a comment before the first occurrence of
#ifdef CAIRO_MIME_TYPE_JBIG2_GLOBAL_ID to explain why the version macro
is not used. Also include
"#if CAIRO_VERSION >= CAIRO_VERSION_ENCODE(1, 14, 0)" in the comment so
that:
a) searching for CAIRO_VERSION will find it, and
b) to document that JBIG2 support is in 1.14 or later.

BTW, when the string for UNIQUE_ID is constructed, its syntax
is RefGen-RefNum. In PDF data, RefNum then RefGen is popular
syntax. Is there any reason why the order is reverted?
No reason. Feel free to change it to num-gen order.

+GBool CairoOutputDev::setMimeDataForJBIG2Globals(Stream  *str,
+                                                 cairo_surface_t
*image)
+{
+  JBIG2Stream *jbig2Str = static_cast<JBIG2Stream *>(str);
+  Object* globalsStr = jbig2Str->getGlobalsStream();
+  char *globalsBuffer;
+  int globalsLength;
+
+  // According to PDF spec, Globals should be stream (or nothing)
+
+  if (globalsStr->isNull())
+    return gTrue;
+
+  if (!globalsStr->isStream())
+    return gFalse;
+
+  // See JBIG2Stream constructor for valid reference number
+  if (!jbig2Str->getGlobalsStreamRefNum())
+    return gFalse;

The globalsStr->isStream() check should be sufficient. The reference
number should always be valid of the global stream is valid.
Umm. Requesting "both of JBIG2Stream::globalsStream and Ref to
it must be always valid as far as GlobalsStream is used" is
intuitive solution, but public JBIG2Stream constructor has not
requested the Ref to GlobalsStream before in earlier version.

If some developers use JBIG2Stream constructor with valid
GlobalsStream but without Ref to it (as the syntax before my
patch), there is the situation that JBIG2Stream::globalsStream
is valid but Ref to it is invalid (if I supply the default
value for the argument to pass Ref). How do we detect or prevent
such case?

A) Supplying the default value (NULL pointer) in the argument
to pass the Ref, and JBIG2Stream returns an error when
globalsStream is valid stream but its Ref is not passed - and
prevent the case that globalsStream is valid but its Ref is
invalid.

B) Not supplying the default value, and prevent the construction
without the Ref to the globalsStream.

C) Add new boolean value in JBIG2Stream class, to indicate if
Ref to globalsStream is explicitly set.

D) Initialize Ref with irregular value (e.g. 0,0 or 0,65535)
internally when Ref is not passed, and deal the case as Ref to
the globalsStream is not initialized (my last patch design)

Which is the best? Or, no need to case such case?
JBIG2Stream is not part of the public API so there is no need to
preserve compatibility.

Something else I just noticed - could you rename "st" to "status".
CairoOutputdev.cc always uses the name "status" for cairo status so we
should be consistent with this convention.

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler

------------------------------------------------------------------------

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler


_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler



_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler

_______________________________________________
poppler mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/poppler

Reply via email to