Thanks, Dan. Forwarding your message to the group and the original poster, who was getting errors with large bitmaps:
>> when a bitmap image is created newly, and if the image dimensions are
>> exceeding *1900 x 2475*, and in the next line when the same bitmap is
>> being tried to convert to *Pix* then at that point of time, I am getting
>> the error which I was talking about in the post.

On Mon, Jun 12, 2017 at 7:52 PM, Dan Bloomberg <[email protected]> wrote:

> >> BitmapToPixConverter b = new BitmapToPixConverter();
> >> Pix pix = b.Convert(bitmap);
>
> This is not leptonica code. It shouldn't compile, with b being a ptr
> that is dereferenced with a ".". This is then set equal to a pix which is
> (as written) not a ptr either, causing a copy if it were correct.
>
> On Mon, Jun 12, 2017 at 12:16 AM, ShreeDevi Kumar <[email protected]>
> wrote:
>
>> image processing within tesseract is done by leptonica.
>>
>> https://github.com/DanBloomberg/leptonica
>>
>> + dan bloomberg
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Mon, Jun 12, 2017 at 11:25 AM, Hari.K <[email protected]> wrote:
>>
>>> Thanks Shree.
>>>
>>> Hello Quan,
>>>
>>> Here are my further updates / observations on the post :
>>>
>>> - The error which I had mentioned in this post is actually occurring in
>>>   the below yellow highlighted line.
>>> - As per my analysis, when a bitmap image is created newly, and if the
>>>   image dimensions are exceeding *1900 x 2475*, and in the next line when
>>>   the same bitmap is being tried to convert to *Pix* then at that point
>>>   of time, I am getting the error which I was talking about in the post.
>>>
>>> for (int i = 0; i <= document.Pages.Count; i++)
>>> {
>>>     bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>>>
>>>     BitmapToPixConverter b = new BitmapToPixConverter();
>>>     Pix pix = b.Convert(bitmap);
>>>     .........
>>> }
>>>
>>> So as per what I understand the Tesseract is not able to convert since
>>> the generated bitmap is of higher dimensions and it is throwing that error
>>> what we are talking about in the post.
>>>
>>> Is anyone sure that Tesseract has these kind of limitations while
>>> converting a bitmap of higher dimensions ??
>>>
>>> Now, the only way to get rid of this issue is to resize the bitmap image
>>> before I try to convert it to Pix ? Am I in the right direction, any other
>>> ideas please ?
>>>
>>> Thanks in Advance,
>>> Hari
>>>
>>> On Friday, 9 June 2017 11:59:08 UTC+5:30, shree wrote:
>>>>
>>>> + quan
>>>>
>>>> Quan will be better able to advise regarding .net
>>>>
>>>> also see
>>>> https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>>>>
>>>> ShreeDevi
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Fri, Jun 9, 2017 at 10:44 AM, Hari.K <[email protected]> wrote:
>>>>
>>>>> Thank you Shree for replying back on the issue. Yes I know about
>>>>> ghostscript and its commands, but with the present architecture of project
>>>>> we are restricted to accommodate the ghostscript commands. Besides, I am
>>>>> also aware of "gsdll32.dll", but as it is not a .Net managed library, and
>>>>> we can't reference it directly in a project and moreover we will have to
>>>>> go by the PInvoke procedure, hence for all those above reasons and
>>>>> limitations we are supposed to stay away from ghostscript.
>>>>>
>>>>> Do you think we have any better alternative libraries which I can make
>>>>> use of so that I would not be getting that error which I mentioned in this
>>>>> post ?
>>>>>
>>>>> Thanks in Advance,
>>>>> Hari
>>>>>
>>>>> On Thursday, 8 June 2017 21:16:15 UTC+5:30, shree wrote:
>>>>>>
>>>>>> Have you tried using ghostscript to convert pdf to tif files instead?
>>>>>> Example commands
>>>>>>
>>>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=106 -dLastPage=109
>>>>>> -o ./tulasi/tulasikrishna%00d.tif "TulasiPuja.pdf"
>>>>>>
>>>>>> for one tif per page
>>>>>>
>>>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=126 -dLastPage=131
>>>>>> -o ./tulasi/tulasIviShNupUjA.tif "TulasiPuja.pdf"
>>>>>>
>>>>>> for multipage tif
>>>>>>
>>>>>> you can reduce resolution to -r300x300
>>>>>>
>>>>>> ShreeDevi
>>>>>> ____________________________________________________________
>>>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>>>
>>>>>> On Thu, Jun 8, 2017 at 7:25 PM, Hari.K <[email protected]> wrote:
>>>>>>
>>>>>>> Hi There,
>>>>>>>
>>>>>>> I sometimes receive an error - "Failed to create pix, this
>>>>>>> normally occurs because the requested image size is too large, please
>>>>>>> check Standard Error Output" when doing OCR on a bitmap image.
>>>>>>>
>>>>>>> Below highlighted line is where it's breaking for me -
>>>>>>>
>>>>>>> Bitmap bitmap;
>>>>>>> Spire.Pdf.PdfDocument document = new Spire.Pdf.PdfDocument(pdfPath);
>>>>>>>
>>>>>>> for (int i = 0; i <= document.Pages.Count; i++)
>>>>>>> {
>>>>>>>     bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200); // where 200 is the DPI which I am setting for a bitmap image
>>>>>>>     ...................
>>>>>>>     .................
>>>>>>>
>>>>>>> }
>>>>>>>
>>>>>>> More details on what I am trying to do here:
>>>>>>> 1) Uploaded a PDF document which is of hardly 600KB
>>>>>>> 2) Iterate through each PDF page and convert it into a BitMap image
>>>>>>> 3) Then input this BitMap image to Tesseract for performing OCR
>>>>>>>
>>>>>>> Please note, I don't get this error often. Any ideas on why this
>>>>>>> error as I do not receive this every time ?
>>>>>>>
>>>>>>> Looking forward for some inputs on this..
>>>>>>>
>>>>>>> Thanks in Advance,
>>>>>>> Hari
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "tesseract-ocr" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> To post to this group, send email to [email protected].
>>>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>>>> To view this discussion on the web visit
>>>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcfe7918-707b-4b56-9720-b3e39ae1a658%40googlegroups.com.
>>>>>>> For more options, visit https://groups.google.com/d/optout.
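
On the resize question raised in the thread: nothing above confirms that Tesseract or Leptonica has a fixed limit at 1900 x 2475, so the following is only a minimal sketch of the arithmetic involved, under that caveat. It shows how a page size and DPI translate into pixel dimensions, and how to compute a scale factor that fits a bitmap inside a chosen bound. `BitmapSizing`, `PixelsAtDpi` and `FitScale` are hypothetical names introduced here for illustration; they are not part of Tesseract, Leptonica or Spire.Pdf, and the bound you pass in is your own assumption.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Tesseract,
// Leptonica, or Spire.Pdf.
public static class BitmapSizing
{
    // Pixel length of one page side, given its length in inches and the
    // DPI used when rendering the page to a bitmap.
    public static int PixelsAtDpi(double inches, int dpi)
    {
        return (int)Math.Round(inches * dpi);
    }

    // Scale factor (never above 1.0) that fits width x height inside
    // maxWidth x maxHeight while preserving the aspect ratio.
    // Returns 1.0 when the image is already within the bound.
    public static double FitScale(int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(width), "dimensions must be positive");
        }

        double scale = Math.Min((double)maxWidth / width, (double)maxHeight / height);
        return Math.Min(1.0, scale);
    }
}
```

For example, `BitmapSizing.PixelsAtDpi(8.5, 200)` and `BitmapSizing.PixelsAtDpi(11, 200)` give 1700 and 2200 for a US Letter page, while 1900 x 2475 pixels at 200 DPI corresponds to a page of 9.5 x 12.375 inches. If only some pages of the PDF are larger than that, it might be consistent with the error appearing only occasionally, but that is a guess and not something established in this thread. The value returned by `FitScale` could be applied to the bitmap's width and height before handing it to the converter.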

