On Monday, 9 March 2009, Ilya Gorenbein wrote:
> Hello,
>
> I need to extract the text out of the document/page.
> I tried the
>
> void Page::display(OutputDev *out, double hDPI, double vDPI,
>                    int rotate, GBool useMediaBox, GBool crop,
>                    GBool printing, Catalog *catalog,
>                    GBool (*abortCheckCbk)(void *data),
>                    void *abortCheckCbkData,
>                    GBool (*annotDisplayDecideCbk)(Annot *annot, void *user_data),
>                    void *annotDisplayDecideCbkData);
>
> function (poppler version 0.10.4). When I measured the performance of this
> function, I got ~1.5 Mb/sec on a dual-core 2.33 GHz CPU with 2 GB of RAM,
> kernel 2.6.24-17, Debian lenny.

Hope you are using a TextOutputDev there and not a renderer like Splash or Cairo.

> Please advise me how the performance of this function could be
> improved.

Get a profiler like callgrind and send us patches for the hot spots in the code.

> Is there another (cheaper) way to extract text out of the
> document/page?

I would say not; that's what pdftotext uses.

Albert
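
To tell whether a patch for a hot spot actually helps, the throughput figure quoted above has to be measured the same way before and after the change. The following is only a minimal sketch of such a measurement in standard C++. The helper names `mibPerSecond` and `secondsTaken` are hypothetical and not part of poppler, and it assumes the "Mb/sec" in the original message means mebibytes of input PDF per second of wall-clock time.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: convert a byte count and an elapsed time
// into MiB per second (1 MiB = 1024 * 1024 bytes).
double mibPerSecond(std::size_t bytes, double seconds) {
  assert(seconds > 0.0);
  return (static_cast<double>(bytes) / (1024.0 * 1024.0)) / seconds;
}

// Hypothetical helper: run `work` once and return the wall-clock
// time it took, in seconds. steady_clock is used because it is
// monotonic and not affected by system clock adjustments.
template <typename Work>
double secondsTaken(Work work) {
  const auto start = std::chrono::steady_clock::now();
  work();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

One way to use it would be to wrap the loop that calls Page::display for every page in a lambda passed to `secondsTaken`, then pass the size of the PDF file in bytes and the returned time to `mibPerSecond`. Wall-clock numbers like this only show whether the total changed; a profiler such as callgrind is still needed to see where the time goes.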
