Hi All! Today I happened to look into the poppler sources and the usage of TextOutputDev, and noticed a few things:
(1) TextOutputDev is used inside page::search(), but a new instance is created every time page::search() is called. Wouldn't it be better to keep a cached instance of TextOutputDev for searches? This looks like an explanation for why search in Okular is slow on large documents (800-1200 pages; think CPU instruction manual) and stays equally slow even when one searches for the same thing a second time. The same pattern appears in Qt4's Page::search(), with the difference that the TextOutputDev parameters are not constant. But that is (theoretically) not a problem either: one can remember the c'tor parameters of the cached TextOutputDev and, if they need to change, discard the old copy and create a new cached copy with the new parameters. That would be a great performance enhancement, if it is of course possible to implement.

(2) More of a question. page::text() and Page::textList() both use TextOutputDev to extract text as plain text. Do I understand correctly that this is the reason why poppler-based viewers cannot "Copy" text into the clipboard with styles like bold or italic? Is that on any TODO? Is there any open-source PDF viewer which can copy text with formatting into the clipboard?

Thanks.
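
P.S. To make the idea in (1) concrete, here is a rough sketch of the "remember the c'tor parameters, rebuild only when they change" logic. It does not use the real poppler classes: FakeTextDev, DevParams and TextDevCache are names I made up for illustration, and the two bool parameters merely stand in for whatever the real TextOutputDev constructor takes. Whether a TextOutputDev can actually be reused like this is exactly what I am asking.

```cpp
#include <memory>

// Hypothetical stand-in for TextOutputDev (NOT the real poppler class;
// the real constructor takes different arguments).
struct FakeTextDev {
    bool physLayout;
    bool rawOrder;
    FakeTextDev(bool phys, bool raw) : physLayout(phys), rawOrder(raw) {}
};

// The constructor parameters we remember alongside the cached device.
// Illustrative fields only.
struct DevParams {
    bool physLayout = false;
    bool rawOrder = false;
    bool operator==(const DevParams &o) const {
        return physLayout == o.physLayout && rawOrder == o.rawOrder;
    }
};

// Keeps one device alive between searches and rebuilds it only when
// the requested parameters differ from the remembered ones.
class TextDevCache {
public:
    FakeTextDev &get(const DevParams &p) {
        if (!dev_ || !(params_ == p)) {
            // Parameters changed (or first use): discard the old copy
            // and create a new cached copy with the new parameters.
            dev_ = std::make_unique<FakeTextDev>(p.physLayout, p.rawOrder);
            params_ = p;
            ++builds_;
        }
        return *dev_;
    }

    // Drop the cached device, e.g. when the page or document goes away.
    void invalidate() { dev_.reset(); }

    // Number of times a device was actually constructed.
    int builds() const { return builds_; }

private:
    std::unique_ptr<FakeTextDev> dev_;
    DevParams params_;
    int builds_ = 0;
};
```

In this sketch, repeated get() calls with the same parameters return the same object, so a second search would skip the construction step. How much that saves in practice would depend on how much of the search time is spent building the device versus scanning the text, which I have not measured.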
