Image processing within Tesseract is done by Leptonica: https://github.com/DanBloomberg/leptonica

+ dan bloomberg

ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jun 12, 2017 at 11:25 AM, Hari.K <[email protected]> wrote:

> Thanks Shree.
>
> Hello Quan,
>
> Here are my further updates / observations on the post:
>
> - The error which I had mentioned in this post is actually occurring in
>   the yellow highlighted line below.
> - As per my analysis, when a bitmap image is newly created and its
>   dimensions exceed *1900 x 2475*, then in the next line, where the same
>   bitmap is converted to *Pix*, I get the error which I was talking
>   about in the post.
>
>     for (int i = 0; i <= document.Pages.Count; i++)
>     {
>         bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>
>         BitmapToPixConverter b = new BitmapToPixConverter();
>         Pix pix = b.Convert(bitmap);
>         .........
>     }
>
> So as per what I understand, Tesseract is not able to convert since the
> generated bitmap is of higher dimensions, and it is throwing the error
> we are talking about in the post.
>
> Is anyone sure that Tesseract has this kind of limitation while
> converting a bitmap of higher dimensions?
>
> Now, is the only way to get rid of this issue to resize the bitmap image
> before I try to convert it to Pix? Am I heading in the right direction?
> Any other ideas, please?
>
> Thanks in Advance,
> Hari
>
> On Friday, 9 June 2017 11:59:08 UTC+5:30, shree wrote:
>>
>> + quan
>>
>> Quan will be better able to advise regarding .NET
>>
>> also see https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jun 9, 2017 at 10:44 AM, Hari.K <[email protected]> wrote:
>>
>>> Thank you Shree for replying back on the issue.
>>> Yes, I know about ghostscript and its commands, but with the present
>>> architecture of the project we are restricted from accommodating the
>>> ghostscript commands. Besides, I am also aware of "gsdll32.dll", but as
>>> it is not a .NET managed library, we can't reference it directly in a
>>> project and would have to go through the PInvoke procedure. For all
>>> those reasons and limitations we are supposed to stay away from
>>> ghostscript.
>>>
>>> Do you think there are any better alternative libraries which I can
>>> make use of so that I would not get the error which I mentioned in
>>> this post?
>>>
>>> Thanks in Advance,
>>> Hari
>>>
>>> On Thursday, 8 June 2017 21:16:15 UTC+5:30, shree wrote:
>>>>
>>>> Have you tried using ghostscript to convert pdf to tif files instead?
>>>> Example commands
>>>>
>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=106 -dLastPage=109 -o ./tulasi/tulasikrishna%00d.tif "TulasiPuja.pdf"
>>>>
>>>> for one tif per page
>>>>
>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=126 -dLastPage=131 -o ./tulasi/tulasIviShNupUjA.tif "TulasiPuja.pdf"
>>>>
>>>> for multipage tif
>>>>
>>>> you can reduce resolution to -r300x300
>>>>
>>>> ShreeDevi
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Thu, Jun 8, 2017 at 7:25 PM, Hari.K <[email protected]> wrote:
>>>>
>>>>> Hi There,
>>>>>
>>>>> I sometimes receive an error - "Failed to create pix, this
>>>>> normally occurs because the requested image size is too large, please
>>>>> check Standard Error Output" when doing OCR on a bitmap image.
>>>>>
>>>>> The highlighted line below is where it's breaking for me:
>>>>>
>>>>> Bitmap bitmap;
>>>>> Spire.Pdf.PdfDocument document = new Spire.Pdf.PdfDocument(pdfPath);
>>>>>
>>>>> for (int i = 0; i <= document.Pages.Count; i++)
>>>>> {
>>>>>     // where 200 is the DPI which I am setting for a bitmap image
>>>>>     bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>>>>>     ...................
>>>>>     .................
>>>>> }
>>>>>
>>>>> More details on what I am trying to do here:
>>>>> 1) Upload a PDF document which is hardly 600 KB
>>>>> 2) Iterate through each PDF page and convert it into a Bitmap image
>>>>> 3) Then input this Bitmap image to Tesseract for performing OCR
>>>>>
>>>>> Please note, I don't get this error often. Any ideas on why this error
>>>>> occurs, given that I do not receive it every time?
>>>>>
>>>>> Looking forward to some inputs on this.
>>>>>
>>>>> Thanks in Advance,
>>>>> Hari

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVNA5Y%3DXZQqX6Z-d8229NHFNVBns3k5DNa19J%2BcJfEgLg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
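
For anyone reasoning about the sizes mentioned in this thread, here is a rough sketch of the arithmetic, not a fix. It is written in Python purely for the calculation (the thread's own code is C#), and every function name is made up for illustration. The 684 x 891 pt page is a hypothetical size chosen because it renders to 1900 x 2475 pixels at 200 DPI, and 32 bits per pixel is an assumption about the bitmap format. The 1900 x 2475 threshold itself is only the poster's observation, not a documented limit of Tesseract or Leptonica.

```python
# Rough sizing arithmetic for a PDF page rendered to a bitmap (illustrative only).

POINTS_PER_INCH = 72  # PDF page sizes are normally given in points


def rendered_size(width_pt, height_pt, dpi):
    """Pixel dimensions of a page (size in points) rendered at the given DPI."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def raw_bytes(width_px, height_px, bits_per_pixel=32):
    """Approximate uncompressed size in bytes, ignoring any row padding.

    bits_per_pixel=32 is an assumption, not something stated in the thread.
    """
    return width_px * height_px * bits_per_pixel // 8


def fit_within(width_px, height_px, max_w, max_h):
    """Scale down (never up) to fit inside max_w x max_h, keeping aspect ratio."""
    scale = min(max_w / width_px, max_h / height_px, 1.0)
    return max(1, int(width_px * scale)), max(1, int(height_px * scale))


if __name__ == "__main__":
    # Hypothetical page of 684 x 891 pt rendered at the thread's 200 DPI.
    w, h = rendered_size(684, 891, 200)
    print(w, h, raw_bytes(w, h))
    # Target size a larger render would be reduced to before conversion.
    print(fit_within(3800, 4950, 1900, 2475))
```

Under these assumptions a 1900 x 2475 bitmap is under 20 MB uncompressed, so it may be worth checking the standard error output the message refers to before concluding that size alone is the cause. Separately, the quoted loop runs while `i <= document.Pages.Count`; if page indexes are zero-based, as starting from 0 suggests, the last iteration would ask for a page that does not exist, which may be worth ruling out.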

