[tesseract-ocr] how to recognize the image buffers

2017-06-12 Thread Tao Deng
I have a set of image buffers (base64-encoded). How can I recognize them 
one by one using Tesseract? It seems Tesseract only supports local image 
files, not in-memory buffers. Does anyone have a solution that could 
help?
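
One possible approach, sketched in Python under assumptions that are not part of the question: decode each base64 string to raw bytes and pipe the bytes to the tesseract command-line tool, which accepts "stdin" as the input name and "stdout" as the output base. The helper names decode_image_buffers and ocr_image_bytes are hypothetical, the buffers are assumed to hold encoded images (PNG, TIFF, ...), and a tesseract executable is assumed to be on the PATH.

```python
import base64
import subprocess


def decode_image_buffers(encoded_buffers):
    """Decode base64 strings into raw image bytes, one entry per buffer."""
    return [base64.b64decode(item) for item in encoded_buffers]


def ocr_image_bytes(image_bytes, lang="eng"):
    """Pipe one encoded image to the tesseract CLI and return the text.

    Assumption: `tesseract` is on the PATH. Passing `stdin` and `stdout`
    as the file arguments makes it read the image from standard input and
    write the recognized text to standard output.
    """
    result = subprocess.run(
        ["tesseract", "stdin", "stdout", "-l", lang],
        input=image_bytes,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

With the base64 strings in a list named buffers, [ocr_image_bytes(b) for b in decode_image_buffers(buffers)] would recognize them one by one without writing any files.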

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/32c6ab6f-c0e1-48aa-9b9b-c00ad6c50b57%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread tdhi...@gmail.com
The more I think about this, the more it makes sense that it's just running 
out of memory because the pix didn't get disposed.
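
A rough back-of-the-envelope check of that idea, sketched in Python. It assumes 32 bits per pixel, rows padded to 32-bit words, the 1900 x 2475 dimensions reported elsewhere in this thread, and a hypothetical 100-page document; the function names are made up for illustration.

```python
def raster_bytes(width, height, depth_bits=32):
    """Bytes for one uncompressed raster whose rows are padded to 32-bit words."""
    words_per_line = (width * depth_bits + 31) // 32
    return 4 * words_per_line * height


def retained_bytes(pages, width, height, depth_bits=32, disposed=False):
    """Memory still held after `pages` iterations of a conversion loop.

    With disposed=True each buffer is released before the next page, so at
    most one page is held; otherwise every page's buffer accumulates.
    """
    per_page = raster_bytes(width, height, depth_bits)
    return per_page * min(pages, 1) if disposed else per_page * pages


print(raster_bytes(1900, 2475))                        # 18810000 bytes, about 17.9 MiB
print(retained_bytes(100, 1900, 2475))                 # 1881000000 bytes, about 1.75 GiB
print(retained_bytes(100, 1900, 2475, disposed=True))  # 18810000
```

Under these assumptions a single page is small, but undisposed buffers add up quickly, which would fit an error that only shows up some of the time.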



[tesseract-ocr] Re: Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread Quan Nguyen
Leptonica provides many different methods for creating a Pix object. You can 
read from a file, a memory buffer, etc. So you may need to write your bitmap 
to such an intermediate format and read it back as a Pix.

pixRead
pixReadMem
pixReadMemPng

Check its API doc:

https://github.com/DanBloomberg/leptonica/blob/master/src/allheaders.h


On Thursday, June 8, 2017 at 8:55:04 AM UTC-5, Hari.K wrote:
>
> Hi There,
>
> I sometimes receive an error - "Failed to create pix, this normally 
> occurs because the requested image size is too large, please check Standard 
> Error Output" when doing OCR on a bitmap image.
>
>
> Below highlighted line is where it's breaking for me - 
>
> Bitmap bitmap;
> Spire.Pdf.PdfDocument document = new Spire.Pdf.PdfDocument(pdfPath);
>
> // '<' rather than '<=': page indices are zero-based, so the last valid index is Count - 1
> for (int i = 0; i < document.Pages.Count; i++)
> {
>     // 200 is the DPI I am setting for the bitmap image
>     bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>     ...
> }
>
> More details on what I am trying to do here:
> 1) Upload a PDF document of barely 600 KB
> 2) Iterate through each PDF page and convert it into a Bitmap image
> 3) Pass this Bitmap image to Tesseract for OCR
>
> Please note, I don't get this error often. Any ideas why it occurs only 
> some of the time?
>
> Looking forward to your input on this.
>
> Thanks in Advance,
> Hari
>
>
>



Re: [tesseract-ocr] Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread THintz
This is the Charles Weld .NET wrapper code.  The first thing Convert() does 
is call this method.  Apparently Leptonica's pixCreate() returns a null 
pointer.



public static Pix Create(int width, int height, int depth)
{
    if (!AllowedDepths.Contains(depth))
        throw new ArgumentException("Depth must be 1, 2, 4, 8, 16, or 32 bits.", "depth");

    if (width <= 0) throw new ArgumentException("Width must be greater than zero", "width");
    if (height <= 0) throw new ArgumentException("Height must be greater than zero", "height");

    var handle = Interop.LeptonicaApi.Native.pixCreate(width, height, depth);
    if (handle == IntPtr.Zero)
        throw new InvalidOperationException("Failed to create pix, this normally occurs because the requested image size is too large, please check Standard Error Output.");

    return Create(handle);
}





Re: [tesseract-ocr] Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread THintz
That's charlesw's .NET Tesseract/Leptonica wrapper code.  One problem is 
that "pix" is an IDisposable and must be disposed.

>
> On Mon, Jun 12, 2017 at 7:52 PM, Dan Bloomberg wrote:
>
>> >> BitmapToPixConverter b = new BitmapToPixConverter();
>> >> Pix pix = b.Convert(bitmap);
>>
>> This is not leptonica code.  It shouldn't compile, with b being a ptr 
>> that is dereferenced with a ".".  This is then set equal to a pix which is 
>> (as written) not a ptr either, causing a copy if it were correct.
>>
>



Re: [tesseract-ocr] Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread ShreeDevi Kumar
Hari,

Please also look in the leptonica program directory for pdf2tiff, pdf2mtiff, 
etc.



Re: [tesseract-ocr] Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread ShreeDevi Kumar
Thanks, Dan.

Forwarding your message to the group and to the original poster, who was
getting errors with large bitmaps:

>> When a new bitmap is created with dimensions exceeding *1900 x 2475* and
>> the next line tries to convert that bitmap to *Pix*, I get the error
>> described in the post.
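
Whether 1900 x 2475 is a real limit is not established in this thread (another reply suspects undisposed pix objects instead). If a smaller raster is still wanted, the arithmetic relating page size, DPI and pixel dimensions can be sketched in Python as below. The function names are hypothetical, the page sizes are example values in PDF points (1/72 inch), and the rounding used by any particular PDF renderer is an assumption.

```python
import math

POINTS_PER_INCH = 72  # PDF user-space units per inch


def pixel_size(width_pt, height_pt, dpi):
    """Approximate raster size for a page of the given size in points."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def max_dpi_within(width_pt, height_pt, max_width_px, max_height_px):
    """Largest whole DPI whose raster stays within the pixel bounds."""
    return math.floor(min(max_width_px * POINTS_PER_INCH / width_pt,
                          max_height_px * POINTS_PER_INCH / height_pt))


print(pixel_size(612, 792, 200))              # US Letter at 200 DPI: (1700, 2200)
print(max_dpi_within(842, 1191, 1900, 2475))  # A3-sized page: 149
```

Lowering the DPI at render time in this way avoids resizing the bitmap afterwards, at the cost of fewer pixels per character for OCR.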


On Mon, Jun 12, 2017 at 7:52 PM, Dan Bloomberg wrote:

> >> BitmapToPixConverter b = new BitmapToPixConverter();
> >> Pix pix = b.Convert(bitmap);
>
> This is not leptonica code.  It shouldn't compile, with b being a ptr
> that is dereferenced with a ".".  This is then set equal to a pix which is
> (as written) not a ptr either, causing a copy if it were correct.
>
>

[tesseract-ocr] Is There Library for Swift on Mac OS?

2017-06-12 Thread James Lee
I'm looking for a Swift library for macOS to integrate into an app. I found 
one for iOS on GitHub, but not for macOS. I understand you can install the 
command-line tool using Homebrew, but that's not ideal for app distribution. 
Thanks!



Re: [tesseract-ocr] Detect Multiple Images by Command Line

2017-06-12 Thread Ibr
*much appreciated*

On Monday, June 12, 2017 at 1:35:20 PM UTC+3, shree wrote:
>
> see  https://github.com/tesseract-ocr/tesseract/issues/928
>
> ShreeDevi
> 
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
> On Mon, Jun 12, 2017 at 3:58 PM, Ibr  
> wrote:
>
>> Hi,
>>
>> When I want to run recognition on an image with Tesseract 4.00alpha, I run 
>> the command *tesseract image results -l lang --tessdata-dir ./tessdata 
>> --oem 1*.
>>
>> My question: when I need to process, say, 10 images (image1, image2, 
>> image3, etc.) in one command and include all the results in the same 
>> result file, "result", what should the command look like?
>>
>> Thanks
>>


Re: [tesseract-ocr] Detect Multiple Images by Command Line

2017-06-12 Thread ShreeDevi Kumar
see  https://github.com/tesseract-ocr/tesseract/issues/928

ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
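
As far as I know, the tesseract command-line tool also accepts a plain-text file listing one image path per line in place of the image argument, and writes all pages to the single output base. A minimal Python sketch of that idea follows; it has not been checked against the linked issue, and the helper names, the images.txt file name and the .png extension are hypothetical.

```python
from pathlib import Path


def write_image_list(image_paths, list_path):
    """Write one image path per line; returns the list file's path."""
    list_file = Path(list_path)
    list_file.write_text("".join(f"{p}\n" for p in image_paths), encoding="utf-8")
    return list_file


def build_command(list_file, output_base, lang, tessdata_dir="./tessdata"):
    """Mirror the single-image command from the question, with the list file as input."""
    return [
        "tesseract", str(list_file), output_base,
        "-l", lang,
        "--tessdata-dir", tessdata_dir,
        "--oem", "1",
    ]
```

For ten images, write_image_list([f"image{i}.png" for i in range(1, 11)], "images.txt") followed by subprocess.run(build_command("images.txt", "result", "lang"), check=True) would produce a single "result" output, assuming tesseract is on the PATH.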

On Mon, Jun 12, 2017 at 3:58 PM, Ibr  wrote:

> Hi,
>
> When I want to run recognition on an image with Tesseract 4.00alpha, I run
> the command *tesseract image results -l lang --tessdata-dir ./tessdata
> --oem 1*.
>
> My question: when I need to process, say, 10 images (image1, image2,
> image3, etc.) in one command and include all the results in the same
> result file, "result", what should the command look like?
>
> Thanks
>


[tesseract-ocr] Detect Multiple Images by Command Line

2017-06-12 Thread Ibr
Hi,

When I want to run recognition on an image with Tesseract 4.00alpha, I run 
the command *tesseract image results -l lang --tessdata-dir ./tessdata 
--oem 1*.

My question: when I need to process, say, 10 images (image1, image2, 
image3, etc.) in one command and include all the results in the same 
result file, "result", what should the command look like?

Thanks



Re: [tesseract-ocr] Tesseract on Bitmap images giving error - Error: "Failed to create pix, this normally occurs because...

2017-06-12 Thread ShreeDevi Kumar
image processing within tesseract is done by leptonica.

https://github.com/DanBloomberg/leptonica

+ dan bloomberg



ShreeDevi

भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jun 12, 2017 at 11:25 AM, Hari.K  wrote:

> Thanks Shree.
>
> Hello Quan,
>
> Here are my further updates / observations on the post :
>
> - The error I mentioned in this post actually occurs on the line
> highlighted in yellow below.
> - As per my analysis, when a new bitmap is created with dimensions
> exceeding *1900 x 2475* and the next line tries to convert that bitmap
> to *Pix*, I get the error described in the post.
>
>
> for (int i = 0; i < document.Pages.Count; i++) // '<': page indices are zero-based
> {
>     bitmap = (Bitmap)document.SaveAsImage(i, PdfImageType.Bitmap, 200, 200);
>
>     BitmapToPixConverter b = new BitmapToPixConverter();
>     Pix pix = b.Convert(bitmap);
>     .
> }
>
> So, as I understand it, Tesseract cannot convert the generated bitmap
> because its dimensions are too large, and it throws the error discussed
> in this post.
>
> Can anyone confirm that Tesseract has this kind of limitation when
> converting a large bitmap?
>
> Is resizing the bitmap before converting it to Pix the only way to avoid
> this issue? Am I heading in the right direction, or are there other
> ideas?
>
> Thanks in Advance,
> Hari
>
> On Friday, 9 June 2017 11:59:08 UTC+5:30, shree wrote:
>>
>> + quan
>>
>> Quan will be better able to advise regarding .NET
>>
>> also see https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/
>>
>> ShreeDevi
>> 
>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>
>> On Fri, Jun 9, 2017 at 10:44 AM, Hari.K  wrote:
>>
>>> Thank you, Shree, for replying on the issue. Yes, I know about
>>> ghostscript and its commands, but the present architecture of the project
>>> does not allow us to accommodate ghostscript commands. I am also
>>> aware of "gsdll32.dll", but it is not a .NET managed library, so we
>>> can't reference it directly in a project and would have to go through
>>> P/Invoke. For all those reasons and limitations, we have to stay away
>>> from ghostscript.
>>>
>>> Do you think there are better alternative libraries I could use to
>>> avoid the error I mentioned in this post?
>>>
>>> Thanks in Advance,
>>> Hari
>>>
>>> On Thursday, 8 June 2017 21:16:15 UTC+5:30, shree wrote:
>>>>
>>>> Have you tried using ghostscript to convert pdf to tif files instead?
>>>> Example commands
>>>>
>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=106 -dLastPage=109 -o ./tulasi/tulasikrishna%00d.tif "TulasiPuja.pdf"
>>>>
>>>> for one tif per page
>>>>
>>>> gs -r600x600 -sDEVICE=tiffg4 -dFirstPage=126 -dLastPage=131 -o ./tulasi/tulasIviShNupUjA.tif "TulasiPuja.pdf"
>>>>
>>>> for multipage tif
>>>>
>>>> you can reduce resolution to -r300x300
>>>>
>>>> ShreeDevi
>>>>
>>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>>