PDF conversion and scanning performance problem

Mridul Kashatria Tue, 01 Jun 2010 23:49:56 -0700

Hello,

I'm new to the great ocropus library. I want to take a PDF file as
input and output page segmentation data. So far, the steps I can see
are as follows:


1. Convert each page of the PDF file to a 300 DPI PNG (or JPEG, PPM, etc.) image
2. Save the PNG to disk
3. Call iulib::read_image_gray(gray, filename); passing the filename of the PNG
4. Make a binarizer and call binarizer.binarize(bin, gray);
5. Make a page segmenter and call segmenter.segment(out, bin)
6. Use RegionExtractor to find the rectangular regions
7. Save the region data to a sqlite database for later use

The problem is that saving each PNG file to disk and then reading it
back takes a lot of time, especially when a PDF has 100+ pages.

I'm using Cairo graphics to render the PDF to images, and I want to know
whether there is a way to save time by passing an in-memory reference to
the PNG-encoded data directly to the iulib::read_image method.
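
One possible way to avoid the disk round trip, sketched under assumptions
and not taken from iulib itself: skip PNG encoding entirely and convert the
rendered pixels straight to grayscale. The sketch assumes the page was
painted onto an opaque white CAIRO_FORMAT_ARGB32 image surface, so that the
buffer from cairo_image_surface_get_data() (after cairo_surface_flush())
holds native-endian 0xAARRGGBB pixels with rows of
cairo_image_surface_get_stride() bytes. GrayImage is a hypothetical
stand-in for an iulib bytearray; the (x, y) indexing with y = 0 at the
bottom row is an assumption about iulib's layout that should be checked
against its headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for an iulib bytearray: pixels are addressed as
// at(x, y), with y = 0 assumed to be the bottom row of the page.
struct GrayImage {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> pixels;  // x-major: index = x * height + y

    unsigned char &at(int x, int y) {
        return pixels[static_cast<std::size_t>(x) * height + y];
    }
};

// Convert a buffer laid out like a CAIRO_FORMAT_ARGB32 image surface into
// 8-bit grayscale without touching the disk.
GrayImage argb32_to_gray(const unsigned char *data, int width, int height,
                         int stride) {
    assert(stride >= 4 * width);  // each row must hold width 32-bit pixels
    GrayImage out;
    out.width = width;
    out.height = height;
    out.pixels.resize(static_cast<std::size_t>(width) * height);
    for (int row = 0; row < height; ++row) {
        const unsigned char *line = data + static_cast<std::size_t>(row) * stride;
        for (int x = 0; x < width; ++x) {
            std::uint32_t px;
            std::memcpy(&px, line + 4 * x, sizeof px);  // safe unaligned read
            unsigned r = (px >> 16) & 0xFF;
            unsigned g = (px >> 8) & 0xFF;
            unsigned b = px & 0xFF;
            // Integer approximation of the Rec. 601 luma weights
            // (0.299, 0.587, 0.114); the three factors sum to 256.
            unsigned gray = (77 * r + 150 * g + 29 * b) >> 8;
            // Cairo rows run top to bottom; flip so y = 0 is the bottom row.
            out.at(x, height - 1 - row) = static_cast<unsigned char>(gray);
        }
    }
    return out;
}
```

In this sketch the result would be copied into the gray array handed to the
binarizer in step 4, replacing steps 2 and 3. Because Cairo stores
premultiplied alpha, the color channels are only meaningful as-is when every
pixel is fully opaque, which is why the white background is assumed.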

I'm new to C and C++ as well, so forgive me if I'm missing something
obvious.

Thanks

--
Regards

Mridul






