Re: [tesseract-ocr] Standalone tesseract 3.5 or higher with font detection for Windows

2017-12-06 Thread Quan Nguyen
The Java version of VietOCR bundles a Tesseract Windows executable. You may 
want to check it out.

https://sourceforge.net/projects/vietocr/files/vietocr/

On Wednesday, December 6, 2017 at 11:30:50 AM UTC-6, Amir Vahid wrote:
>
> Either would be helpful. My real issue is finding a standalone portable 
> tesseract.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/41d3905d-f619-4888-8020-98691552ab91%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Black line deleted

2017-12-06 Thread lelive
Hi all,
I use Tesseract on technical documents to produce searchable PDFs. But if 
the picture contains lines, those lines are deleted in the resulting PDF file.

Is there a solution or a parameter to tell Tesseract not to "clean" the 
picture?

Many thanks for your help!

Olivier



[tesseract-ocr] Re: Need Help with extracting info from Invoice

2017-12-06 Thread Djibril Kaba
Hi Vinay,

I am trying to solve the same problem here. Have you managed to find a 
solution to your problem? Your help would be greatly appreciated. Looking 
forward to hearing from you.

Many thanks!!

On Tuesday, November 18, 2014 at 8:53:08 PM UTC+1, Vinay Matam wrote:
>
> Hi All,
>
> I really need your help with one of the projects that I am working on. I 
> am using Tesseract 3.02 on an Ubuntu machine.
>
> I have an invoice (please see the attached file). I want to extract some 
> information from that invoice, such as Advisor Name, Invoice Number, 
> Invoice Date, License No, Mileage, etc.
>
> I have tried to extract all the data from the image to a text file. By 
> doing some pre-processing on the image using ImageMagick, I was able to 
> extract the info to some extent. However, I am not totally satisfied with 
> the output. 
> I need your input on how I should extract the information. Should I first 
> crop specific portions of the image into separate rectangles and then OCR 
> them individually? I tried this and got great results. But the images are 
> not all the same size and resolution, so fixed rectangle coordinates will 
> not work in all cases. I concluded that this method will not work on all 
> images (scanned, taken with a mobile phone, or PDF files).
>
> Then I thought of applying regular expressions to the extracted data and 
> picking the data that I require out of the whole text file. But this 
> method does not seem to be working either. 
>
> I am totally confused now. Any help or input is much appreciated. :) 
> I have attached a sample image and the extracted output.
>
> Thanks,
> Vinay.
>
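
For anyone landing on this thread, the regular-expression approach Vinay 
describes could look roughly like the sketch below. It is only an 
illustration: the field labels come from his message, but the patterns, the 
extract_fields helper, and the sample text are assumptions, since the actual 
invoice and OCR output are not included here. It works on text that Tesseract 
has already produced and does not call Tesseract itself.

```python
import re

# Hypothetical patterns keyed on the labels named in the thread.
# The real invoice layout is not shown, so these are assumptions.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:number|no)\.?\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"invoice\s*date\s*:?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
    ),
    "mileage": re.compile(r"mileage\s*:?\s*([\d,]+)", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return the first match for each field, or None when a label is missing."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up OCR output, with the uneven spacing OCR often produces.
SAMPLE_TEXT = """Advisor Name: J SMITH
Invoice  Number : A-10234
Invoice Date: 11/18/2014
Mileage: 45,120
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_TEXT))
```

Anchoring each pattern on its label rather than on a position in the page 
avoids depending on fixed rectangle coordinates, though noisy OCR output may 
still need looser patterns.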



Re: [tesseract-ocr] Standalone tesseract 3.5 or higher with font detection for Windows

2017-12-06 Thread Amir Vahid
Either would be helpful. My real issue is finding a standalone portable 
tesseract.



Re: [tesseract-ocr] Standalone tesseract 3.5 or higher with font detection for Windows

2017-12-06 Thread Amir Vahid
Well, Tesseract 3.5 and 4 have an option to detect font types per character.



Re: [tesseract-ocr] Standalone tesseract 3.5 or higher with font detection for Windows

2017-12-06 Thread Zdenko Podobný
What do you mean by font detection?

Zdenko

On Wed, Dec 6, 2017 at 9:21 AM, Amir Vahid wrote:

> I am working in a Windows environment in which I do not have admin rights.
> I need a standalone Tesseract version (I guess that would be 3.5 or higher)
> which can do font detection. Any help will be very much appreciated.
