Thanks for the help. I have more or less solved this minor problem by relying on the internal buffering of FILE pointers. After checking the iulib header file, I found that read_image_gray has a version which accepts a file pointer as an argument. So I just pass a file pointer which has not been closed yet, so that the buffer is not flushed, as far as I understand this stuff :)
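To illustrate the idea, here is a minimal sketch of handing a still-open stream from the writer to the reader. It does not use iulib: write_pgm and read_pgm are hypothetical stand-ins for the renderer and for a reader that takes a FILE*, and PGM is used only to keep the example short. Note that stdio keeps only a limited amount of data in its own buffer, so a large page image is still handed to the operating system; I am assuming the saving comes mostly from skipping the close and reopen and from OS caching.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the renderer: writes a binary PGM (P5) image
// to a stream that is already open.
bool write_pgm(std::FILE *f, const std::vector<uint8_t> &pix, int w, int h) {
    assert(w > 0 && h > 0 && pix.size() == static_cast<std::size_t>(w) * h);
    if (std::fprintf(f, "P5\n%d %d\n255\n", w, h) < 0) return false;
    return std::fwrite(pix.data(), 1, pix.size(), f) == pix.size();
}

// Hypothetical stand-in for a reader that accepts a FILE* instead of a
// filename (NOT the iulib function itself).
bool read_pgm(std::FILE *f, std::vector<uint8_t> &pix, int &w, int &h) {
    int maxval = 0;
    if (std::fscanf(f, "P5 %d %d %d", &w, &h, &maxval) != 3) return false;
    if (w <= 0 || h <= 0 || maxval != 255) return false;
    if (std::fgetc(f) == EOF) return false;  // one whitespace byte after maxval
    pix.resize(static_cast<std::size_t>(w) * h);
    return std::fread(pix.data(), 1, pix.size(), f) == pix.size();
}

// Write an image, then read it back through the same stream without
// closing and reopening it in between.
bool roundtrip_through_open_stream(const std::vector<uint8_t> &in, int w, int h,
                                   std::vector<uint8_t> &out, int &rw, int &rh) {
    std::FILE *f = std::tmpfile();  // anonymous temporary file, opened "wb+"
    if (!f) return false;
    bool ok = write_pgm(f, in, w, h);
    std::rewind(f);  // a positioning call is required between writing and reading
    ok = ok && read_pgm(f, out, rw, rh);
    std::fclose(f);  // the temporary file is removed on close
    return ok;
}
```

In the real program the stream would be passed to the FILE*-accepting reader found in the iulib header in place of read_pgm.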
So far I've found that XYCUTS segmentation (which I'm using) is faster than the others, so disk I/O time becomes significant when the rest of the processing is fast. I've also seen that PNG is not the right format for a temporary file during PDF conversion, as PNG compression, encoding, and then decoding add a lot of unnecessary overhead. PPM or raw bitmap files are better suited for this purpose. I eventually want to know how to create PPM files (simple RGBRGB data) from a cairo ARGB32 surface, but so far PNG is working fine, and I'm happy.

Thanks
--
Mridul

> You could also create cairo surfaces over arrays.
>
> Ilya
>
> On Wed, 2010-06-02 at 00:39 -0700, Tom wrote:
> Disk I/O may take significant amounts of time, but usually is only a
> small fraction of overall processing.
>
> If it is really a concern, just set up a RAM disk and read and write
> the images there; that way, you only have to do a little scripting
> (since you need to clean up things so that the RAM disk doesn't
> overflow).
>
> Tom
>
> On May 23, 6:37 pm, Mridul Kashatria <[email protected]> wrote:
> > Hello,
> >
> > I'm a newbie to the great ocropus library. What I'm doing here is to
> > input a PDF file and output page segmentation data. So far I can see
> > the steps to do this as follows:
> >
> > 1. Convert each page of the PDF file to a 300 DPI PNG (or JPEG, PPM, etc.) image
> > 2. Save the PNG to disk
> > 3. Call iulib::read_image_gray(gray, filename); passing the filename of the PNG
> > 4. Make a binarizer and call binarizer.binarize(bin, gray);
> > 5. Make a page segmenter and call segmenter.segment(out, bin)
> > 6. Use RegionExtractor to find the rectangular regions
> > 7. Save the region data to a sqlite database for later use
> >
> > The problem now is that saving each PNG file to disk and then
> > reading it back from disk takes a lot of time, especially when there
> > are 100+ pages in a PDF.
> >
> > I'm using Cairo graphics to render the PDF to images, and want to know
> > if there is a way I can save time by directly passing some in-memory
> > reference to the PNG-encoded data to the iulib::read_image method.
> >
> > I'm a newbie to C and C++ as well, so forgive me if I'm missing
> > something obvious.
> >
> > Thanks
> >
> > --
> > Regards
> >
> > Mridul

--
You received this message because you are subscribed to the Google Groups "ocropus" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/ocropus?hl=en.
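On the PPM question above: a minimal sketch of turning ARGB32 pixel data into a binary PPM (P6) file. It assumes the layout cairo documents for CAIRO_FORMAT_ARGB32 (native-endian 32-bit pixels with alpha in the top byte, then red, green, blue) and fully opaque pixels, so premultiplied alpha is ignored. argb32_to_rgb and write_ppm are hypothetical helper names, not part of cairo or iulib.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Convert ARGB32 pixel rows into packed RGBRGB... bytes.
// `stride` is the number of bytes per row, which may exceed width * 4.
// Assumption: every pixel is opaque, so no un-premultiplying is done.
std::vector<uint8_t> argb32_to_rgb(const unsigned char *data,
                                   int width, int height, int stride) {
    assert(data && width > 0 && height > 0 && stride >= width * 4);
    std::vector<uint8_t> rgb;
    rgb.reserve(static_cast<std::size_t>(width) * height * 3);
    for (int y = 0; y < height; ++y) {
        const unsigned char *row = data + static_cast<std::size_t>(y) * stride;
        for (int x = 0; x < width; ++x) {
            uint32_t p;
            std::memcpy(&p, row + 4 * x, sizeof p);  // native-endian pixel
            rgb.push_back(static_cast<uint8_t>((p >> 16) & 0xff));  // red
            rgb.push_back(static_cast<uint8_t>((p >> 8) & 0xff));   // green
            rgb.push_back(static_cast<uint8_t>(p & 0xff));          // blue
        }
    }
    return rgb;
}

// Write the pixels as a binary PPM: a short text header, then raw RGB bytes.
bool write_ppm(std::FILE *out, const unsigned char *data,
               int width, int height, int stride) {
    const std::vector<uint8_t> rgb = argb32_to_rgb(data, width, height, stride);
    if (std::fprintf(out, "P6\n%d %d\n255\n", width, height) < 0) return false;
    return std::fwrite(rgb.data(), 1, rgb.size(), out) == rgb.size();
}
```

With a cairo image surface, the arguments would come from cairo_image_surface_get_data(), cairo_image_surface_get_width(), cairo_image_surface_get_height() and cairo_image_surface_get_stride(), after calling cairo_surface_flush() on the surface.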
