Hello,

If I remember correctly, some time ago someone proposed caching the TextOutputDev/TextPage used in Poppler::Page::search to improve performance. Instead, I would propose adding another search method to Poppler::Page that searches the whole page at once and returns a list of all occurrences. Applications using the qt4 frontend and this method could then decide for themselves whether to cache this information. The implementation of the current search method would not change.

The appended patch does this. However, the two search methods now share some duplicated code, and I am not sure what the best way to avoid this is.

Testing this with some sample files shows large improvements (above 100% as measured by runtime) when searching the whole document, especially for short phrases that occur often.

Thanks for any comments and advice.

Best regards, Adam.
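To illustrate the caching that an application could do on its side, here is a rough sketch using only the standard library. Everything in it is hypothetical: `Rect` stands in for QRectF, `PageSearchCache` is an invented name, and the `searchPage` callback stands in for the proposed whole-page search method. It is not part of the patch and not Poppler API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for QRectF; not a Qt or Poppler type.
struct Rect {
    double left, top, right, bottom;
};

// Hypothetical application-side cache keyed by (page index, search text).
class PageSearchCache {
public:
    // Stand-in for a call to the proposed whole-page search method.
    using SearchFn = std::function<std::vector<Rect>(int page, const std::string &text)>;

    explicit PageSearchCache(SearchFn searchPage) : m_searchPage(std::move(searchPage)) {
        assert(m_searchPage != nullptr);  // an empty callback would throw on first use
    }

    // Returns all hits on a page, running the expensive search at most
    // once per (page, text) pair.
    const std::vector<Rect> &hits(int page, const std::string &text) {
        const auto key = std::make_pair(page, text);
        auto it = m_cache.find(key);
        if (it == m_cache.end()) {
            it = m_cache.emplace(key, m_searchPage(page, text)).first;
        }
        return it->second;
    }

    // Drops all cached results, e.g. when the document is reloaded.
    void clear() { m_cache.clear(); }

private:
    SearchFn m_searchPage;
    std::map<std::pair<int, std::string>, std::vector<Rect>> m_cache;
};
```

Whether such a cache pays off depends on the application, which is why I would leave that decision to the caller rather than build it into Poppler::Page.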
From 46cc6f78fe17df89751e31f1fed8cf89dca64858 Mon Sep 17 00:00:00 2001
From: Adam Reichold <[email protected]>
Date: Thu, 28 Jun 2012 17:42:17 +0200
Subject: [PATCH] add whole-page search method to Poppler::Page

---
 qt4/src/poppler-page.cc | 38 ++++++++++++++++++++++++++++++++++++++
 qt4/src/poppler-qt4.h   |  9 +++++++++
 2 files changed, 47 insertions(+)

diff --git a/qt4/src/poppler-page.cc b/qt4/src/poppler-page.cc
index 6a16d03..6a794e3 100644
--- a/qt4/src/poppler-page.cc
+++ b/qt4/src/poppler-page.cc
@@ -427,6 +427,44 @@ bool Page::search(const QString &text, QRectF &rect, SearchDirection direction,
   return found;
 }
 
+QList<QRectF> Page::search(const QString &text, SearchMode caseSensitive, Rotation rotate) const
+{
+  const QChar * str = text.unicode();
+  int len = text.length();
+  QVector<Unicode> u(len);
+  for (int i = 0; i < len; ++i) u[i] = str[i].unicode();
+
+  GBool sCase;
+  if (caseSensitive == CaseSensitive) sCase = gTrue;
+  else sCase = gFalse;
+
+  int rotation = (int)rotate * 90;
+
+  QList<QRectF> results;
+  double sLeft = 0.0, sTop = 0.0, sRight = 0.0, sBottom = 0.0;
+
+  TextOutputDev td(NULL, gTrue, 0, gFalse, gFalse);
+  m_page->parentDoc->doc->displayPage( &td, m_page->index + 1, 72, 72, rotation, false, true, false );
+  TextPage *textPage=td.takeText();
+
+  while(textPage->findText( u.data(), len,
+        gFalse, gTrue, gTrue, gFalse, sCase, gFalse, gFalse, &sLeft, &sTop, &sRight, &sBottom ))
+  {
+      QRectF result;
+
+      result.setLeft(sLeft);
+      result.setTop(sTop);
+      result.setRight(sRight);
+      result.setBottom(sBottom);
+
+      results.append(result);
+  }
+
+  textPage->decRefCnt();
+
+  return results;
+}
+
 QList<TextBox*> Page::textList(Rotation rotate) const
 {
   TextOutputDev *output_dev;
diff --git a/qt4/src/poppler-qt4.h b/qt4/src/poppler-qt4.h
index 827ea53..a363f78 100644
--- a/qt4/src/poppler-qt4.h
+++ b/qt4/src/poppler-qt4.h
@@ -602,6 +602,15 @@ delete it;
            \since 0.14
         **/
         bool search(const QString &text, double &rectLeft, double &rectTop, double &rectRight, double &rectBottom, SearchDirection direction, SearchMode caseSensitive, Rotation rotate = Rotate0) const;
+
+        /**
+           Returns a list of all occurrences of the specified text on the page.
+
+           \param text the text to search
+           \param caseSensitive whether to be case sensitive
+           \param rotate the rotation to apply for the search order
+        **/
+        QList<QRectF> search(const QString &text, SearchMode caseSensitive, Rotation rotate = Rotate0) const;
 
         /**
            Returns a list of text of the page
-- 
1.7.11.1
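The loop in the new method follows a simple pattern: prepare the page text once, then keep asking for the next hit, resuming from the previous one, until nothing more is found. As a rough analogue on plain strings (a sketch only; `findAll` is a hypothetical helper, and resuming after the end of each hit is my choice here, not a statement about how TextPage::findText treats overlapping matches):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collects the start offset of every non-overlapping occurrence of
// needle in haystack, resuming each search just past the previous hit.
std::vector<std::size_t> findAll(const std::string &haystack, const std::string &needle) {
    std::vector<std::size_t> hits;
    if (needle.empty()) {
        return hits;  // nothing sensible to report for an empty query
    }
    std::size_t pos = haystack.find(needle);
    while (pos != std::string::npos) {
        hits.push_back(pos);
        // Resume after the end of the hit that was just recorded.
        pos = haystack.find(needle, pos + needle.size());
    }
    return hits;
}
```

The haystack is set up once and only the cheap "find next" step is repeated, which is the same idea as building the TextPage once per page and calling findText until it returns false.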
