Image processing within Tesseract is done by Leptonica.

https://github.com/DanBloomberg/leptonica

+ Dan Bloomberg
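
On the resizing idea raised in the quoted message below, here is a minimal sketch of how the target dimensions could be computed while keeping the aspect ratio. The names ImageSizing and FitWithin are hypothetical, and the 1900 x 2475 bound is only the size reported in the quoted message, not a documented Leptonica or Tesseract limit.

```csharp
using System;

public static class ImageSizing
{
    // Hypothetical helper: returns the (width, height) obtained by scaling
    // uniformly so that neither side exceeds the given maximum.
    // Images that already fit are returned unchanged (never upscaled).
    public static (int Width, int Height) FitWithin(
        int width, int height, int maxWidth, int maxHeight)
    {
        if (width <= 0 || height <= 0 || maxWidth <= 0 || maxHeight <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        // The smaller of the two ratios keeps both sides within bounds;
        // capping at 1.0 avoids enlarging small images.
        double scale = Math.Min(1.0,
            Math.Min((double)maxWidth / width, (double)maxHeight / height));

        int w = Math.Max(1, (int)Math.Round(width * scale));
        int h = Math.Max(1, (int)Math.Round(height * scale));
        return (w, h);
    }
}
```

The result could then be passed to the System.Drawing constructor new Bitmap(original, new Size(w, h)) before the bitmap is handed to BitmapToPixConverter. Whether resizing actually removes the error would still need to be confirmed on the failing pages.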



ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jun 12, 2017 at 11:25 AM, Hari.K <[email protected]> wrote:

> Thanks Shree.
>
> Hello Quan,
>
> Here are my further updates and observations on this post:
>
> - The error I mentioned in this post actually occurs in the code below,
> at the line that converts the bitmap to Pix.
> - As per my analysis, when a new bitmap is created and its dimensions
> exceed *1900 x 2475*, the error occurs on the next line, where that
> bitmap is converted to *Pix*.
>
>
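>             // Page indexes are zero-based, so stop before Pages.Count.
>             for (int i = 0; i < document.Pages.Count; i++)
>             {
>                 // Render page i as a bitmap at 200 x 200 DPI.
>                 bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>
>                 // Convert the bitmap to a Pix for Tesseract.
>                 BitmapToPixConverter b = new BitmapToPixConverter();
>                 Pix pix = b.Convert(bitmap);
>               .........
>             }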
> As I understand it, Tesseract cannot convert the generated bitmap because
> its dimensions are too large, and that is why it throws the error discussed
> in this post.
>
> Can anyone confirm whether Tesseract has this kind of limitation when
> converting a large bitmap?
>
> If so, is the only way around this issue to resize the bitmap before
> converting it to Pix? Am I heading in the right direction, or are there
> any other ideas?
>
> Thanks in Advance,
> Hari
>
> On Friday, 9 June 2017 11:59:08 UTC+5:30, shree wrote:
>>
>> + Quan
>>
>> Quan will be better able to advise regarding .NET.
>>
>> Also see https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>>
>> ShreeDevi
>> ____________________________________________________________
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jun 9, 2017 at 10:44 AM, Hari.K <[email protected]> wrote:
>>
>>> Thank you, Shree, for replying on this issue. Yes, I know about
>>> Ghostscript and its commands, but the present architecture of our project
>>> does not let us accommodate Ghostscript commands. I am also aware of
>>> "gsdll32.dll", but since it is not a .NET managed library we cannot
>>> reference it directly in the project and would have to call it through
>>> P/Invoke. For those reasons we have to stay away from Ghostscript.
>>>
>>> Do you know of any alternative libraries I could use that would avoid the
>>> error I mentioned in this post?
>>>
>>> Thanks in Advance,
>>> Hari
>>>
>>> On Thursday, 8 June 2017 21:16:15 UTC+5:30, shree wrote:
>>>>
>>>> Have you tried using Ghostscript to convert the PDF to TIFF files instead?
>>>> Example commands:
>>>>
>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=106 -dLastPage=109 -o ./tulasi/tulasikrishna%00d.tif "TulasiPuja.pdf"
>>>>
>>>> for one TIFF per page, and
>>>>
>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=126 -dLastPage=131 -o ./tulasi/tulasIviShNupUjA.tif "TulasiPuja.pdf"
>>>>
>>>> for a multipage TIFF.
>>>>
>>>> You can reduce the resolution to -r300x300.
>>>>
>>>> ShreeDevi
>>>> ____________________________________________________________
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>
>>>> On Thu, Jun 8, 2017 at 7:25 PM, Hari.K <[email protected]> wrote:
>>>>
>>>>> Hi There,
>>>>>
>>>>> I sometimes receive the error "Failed to create pix, this normally
>>>>> occurs because the requested image size is too large, please check
>>>>> Standard Error Output" when doing OCR on a bitmap image.
>>>>>
>>>>>
>>>>> The highlighted line below is where it breaks for me:
>>>>>
>>>>> Bitmap bitmap;
>>>>> Spire.Pdf.PdfDocument document = new Spire.Pdf.PdfDocument(pdfPath);
>>>>>
>>>>>             // Page indexes are zero-based, so stop before Pages.Count.
>>>>>             for (int i = 0; i < document.Pages.Count; i++)
>>>>>             {
>>>>>                 // 200 is the DPI I am setting for the bitmap image.
>>>>>                 bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>>>>>                 ...................
>>>>>                 .................
>>>>>             }
>>>>>
>>>>> More details on what I am trying to do here:
>>>>> 1) Upload a PDF document of barely 600 KB
>>>>> 2) Iterate through each PDF page and convert it into a Bitmap image
>>>>> 3) Pass this Bitmap image to Tesseract to perform OCR
>>>>>
>>>>> Please note that I don't get this error often. Any ideas on why it
>>>>> occurs only some of the time?
>>>>>
>>>>> Looking forward to some input on this.
>>>>>
>>>>> Thanks in Advance,
>>>>> Hari
>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to [email protected].
>>>>> To post to this group, send email to [email protected].
>>>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/dcfe7918-707
>>>>> b-4b56-9720-b3e39ae1a658%40googlegroups.com
>>>>> <https://groups.google.com/d/msgid/tesseract-ocr/dcfe7918-707b-4b56-9720-b3e39ae1a658%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>

