Re: [tesseract-ocr] OCR missing out single Characters - why ?

2017-02-28 Thread ShreeDevi Kumar
Try with --psm 6.

Here is the output I got using the English traineddata with version 4.0 (via
gImageReader):

‘IO

E F D

7

E F D

5

E F D

12

E F D

—

E F D

—

E F D

Wlie viele verschiedene Plagen gab es in

Agypten, bevor Moses das israelische

Volk befreite (2, Buch Mose)?

E F D
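
The --psm 6 suggestion can also be scripted. Below is a minimal Python sketch that only assembles the command line; build_tesseract_cmd is a hypothetical helper name, the file names are taken from the command quoted in this thread, and running the result assumes a tesseract 4.x binary on PATH.

```python
def build_tesseract_cmd(image, out_base, psm=6, oem=None, lang="eng"):
    """Return the argv list for a tesseract CLI call (nothing is executed here)."""
    cmd = ["tesseract", image, out_base, "--psm", str(psm), "-l", lang]
    if oem is not None:
        # --oem selects the OCR engine mode; left out unless requested.
        cmd += ["--oem", str(oem)]
    return cmd


# Same settings as in this thread: single uniform block of text, German data.
cmd = build_tesseract_cmd("./screen-vertikal.png", "./screen-vertikal",
                          psm=6, oem=1, lang="deu")
print(" ".join(cmd))
```

The list can be handed to subprocess.run(cmd) as is; keeping the arguments in a list avoids shell quoting problems with file names.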



​---
tesseract ./screen-vertikal.png ./screen-vertikal --oem 1 --psm 6 -l deu

Output file attached

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUBEYZ9qBfNQko_5CXrd5kk-A1Mh46MojHgJiKkTr6gLA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
10
E F D
7
E F D
5
E F D
12
E F D
n
E F D
en
E F D
Wie viele verschiedene Plagen gab es in
Ägypten, bevor Moses das israelische
Volk befreite (2. Buch Mose)?
E F D



[tesseract-ocr] Re: Simple Tesseract OCR in .NET 4+?

2017-02-28 Thread Cory Blissitte
Building on Quan Nguyen's suggestion:

charlesw's Tesseract wrapper project is pretty easy to work with.
In its simplest form you can do the following, provided the image file is
saved to your filesystem:

// Needs: using System; using System.IO; using Tesseract;
// The engine, image and page are all disposable, hence the using blocks.
using (var engine = new TesseractEngine(
    Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "tessdata"),
    "eng",
    EngineMode.TesseractOnly))
using (var image = Pix.LoadFromFile(fileName))
{
    engine.DefaultPageSegMode = PageSegMode.AutoOsd;

    using (var pageOutput = engine.Process(image))
    {
        var hOcr = pageOutput.GetHOCRText(0);   // hOCR markup for page 0
        var imageText = pageOutput.GetText();   // plain recognized text
    }
}

The hOcr string is an HTML document that contains the text and the placement of
that text on the page (most useful for incorporation into searchable PDFs).
The imageText string is just the recognized text from the image.


Cory
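
To illustrate what the hOCR string carries, here is a small Python sketch (not part of the wrapper) that pulls each word and its bounding box out of hOCR markup. It relies on the hOCR convention that word spans have class ocrx_word and a title attribute containing "bbox x0 y0 x1 y1"; the HocrWords class name and the sample markup and coordinates are made up for the example.

```python
import re
from html.parser import HTMLParser

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWords(HTMLParser):
    """Collect (text, (x0, y0, x1, y1)) for every ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._bbox = None  # bounding box of the word span we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or ""):
            match = BBOX.search(attrs.get("title") or "")
            self._bbox = tuple(map(int, match.groups())) if match else None

    def handle_data(self, data):
        if self._bbox is not None and data.strip():
            self.words.append((data.strip(), self._bbox))

    def handle_endtag(self, tag):
        if tag == "span":
            self._bbox = None


# Hand-written sample in hOCR style; real output has more attributes.
sample = (
    "<span class='ocr_line' title='bbox 10 10 200 40'>"
    "<span class='ocrx_word' title='bbox 10 10 60 40; x_wconf 95'>Wie</span> "
    "<span class='ocrx_word' title='bbox 70 10 140 40; x_wconf 93'>viele</span>"
    "</span>"
)
parser = HocrWords()
parser.feed(sample)
print(parser.words)
```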

On Monday, February 27, 2017 at 9:37:47 PM UTC-10, Cetor Notorious wrote:
>
> Hi everybody,
>
> I was wondering if anyone had a tutorial / example code that is really 
> simple.
> It just needs to recognize text from a web image and return the recognized
> text.
>
> I would like to make it where I can have this entire piece in one DLL so 
> it's easy to use.
>
> Is anyone able to help me?
>
> Have a wonderful day :)
>



[tesseract-ocr] OCR missing out single Characters - why ?

2017-02-28 Thread Bernhard Gramberg
Hi,

I am running OCR on a special picture
that is built from several pictures.

I put separators in between so that the real content is picked up better.

In the middle there are single characters (in this example 5 and 7)
that are not recognized. (It makes no difference if they are letters.)

This is on Windows, with version 3.5 and with 4.0 as well:
the short digits/letters (same problem) are not recognized.

Any idea what to do to have the 5 and 7 detected?

Yours, Bernhard

10

E F D

E F D

E F D

12

E F D

E F D

E F D

Wie Viele verschiedene Plagen gab es in
Ägypten, bevor Moses das israelische
Volk befreite (2‚ Buch Mose)?

E F D



[tesseract-ocr] Re: Simple Tesseract OCR in .NET 4+?

2017-02-28 Thread Quan Nguyen
Check out the .NET wrapper for Tesseract:

https://github.com/charlesw/tesseract

On Tuesday, February 28, 2017 at 1:37:47 AM UTC-6, Cetor Notorious wrote:
>
> Hi everybody,
>
> I was wondering if anyone had a tutorial / example code that is really 
> simple.
> It just needs to recognize text from a web image and return the recognized
> text.
>
> I would like to make it where I can have this entire piece in one DLL so 
> it's easy to use.
>
> Is anyone able to help me?
>
> Have a wonderful day :)
>



[tesseract-ocr] Re: Whitelisting characters to get characters from a particular set does not work

2017-02-28 Thread 'kolomiyets' via tesseract-ocr


On Tuesday, February 28, 2017 at 10:12:55 AM UTC+1, Subhodeep Maji wrote:
>
> Thank you for replying. I have 2 questions now.
> 1. Why would they discontinue the whitelisting feature?
>
I don't know.

> 2. Is there anything else that you know of for handwriting detection? I get
> decent results with Tesseract if the letters are not touching each other
> and are neat. Will training Tesseract with handwritten letters help?

I am not sure; Tesseract is not designed for this.



[tesseract-ocr] Re: Whitelisting characters to get characters from a particular set does not work

2017-02-28 Thread Subhodeep Maji
Thank you for replying. I have 2 questions now.
1. Why would they discontinue the whitelisting feature?
2. Is there anything else that you know of for handwriting detection? I get
decent results with Tesseract if the letters are not touching each other and
are neat. Will training Tesseract with handwritten letters help?



[tesseract-ocr] Re: OCR identifications

2017-02-28 Thread 'kolomiyets' via tesseract-ocr

Hi,

You may consider OpenCV for finding the text within the image.

See: http://stackoverflow.com/questions/23506105/extracting-text-opencv

Best regards,
Alex
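
To make the "crop first, then OCR" idea concrete, here is a dependency-free Python sketch. It does not use OpenCV's API; it assumes a simplified case of dark text on a light background, with the image given as a list of rows of grayscale values, and the function names and threshold are hypothetical.

```python
def ink_bounding_box(gray, threshold=128):
    """Return (top, left, bottom, right) around all pixels darker than
    threshold, with bottom/right exclusive, or None if there are none."""
    rows = [y for y, row in enumerate(gray) if any(p < threshold for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(gray[0]))
            if any(row[x] < threshold for row in gray)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def crop(gray, box, margin=0):
    """Cut the box out of the image, optionally padded by margin pixels."""
    top, left, bottom, right = box
    top = max(0, top - margin)
    left = max(0, left - margin)
    bottom = min(len(gray), bottom + margin)
    right = min(len(gray[0]), right + margin)
    return [row[left:right] for row in gray[top:bottom]]


# Tiny 5x6 "image": white (255) with two dark pixels standing in for text.
image = [[255] * 6 for _ in range(5)]
image[1][2] = 0
image[2][3] = 0

box = ink_bounding_box(image)
print(box)
print(crop(image, box))
```

Real photos usually need more than a global threshold (the linked answer discusses text detection with OpenCV), but the last step is the same: only the cropped region is passed on to Tesseract.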


On Monday, February 27, 2017 at 7:36:10 PM UTC+1, Aman Dalmia wrote:
>
> I have an image containing a few characters. I want to identify them, and
> it has to be processed in real time.
>
> How do I crop only the text/digits part from the complete image?
> Then I need to send that cropped image to the Tesseract engine to
> identify the characters and digits.
>
> I am a newbie. (Preferred language: C++)
>
> Please help me in detail. If there is an existing project, please provide
> the code.
> Thanks
>
> Aman
>



[tesseract-ocr] Re: Whitelisting characters to get characters from a particular set does not work

2017-02-28 Thread 'kolomiyets' via tesseract-ocr
Hi,

1. Tesseract is not designed to recognize handwritten text.
2. The whitelist method you are referring to is a feature of Tesseract versions
before 4.00. I am not sure whether it is supported in 4.00, other than by
training on a corpus with a limited character set.

Best regards,
Alex
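
If the whitelist is not honoured at recognition time, one fallback (not proposed in this thread, just a sketch) is to filter the text afterwards. The keep_allowed name is hypothetical; the sample line is the first line of output quoted below.

```python
import string


def keep_allowed(text, allowed=string.ascii_lowercase, keep_whitespace=True):
    """Drop every character that is not in allowed (whitespace optionally kept)."""
    return "".join(
        ch for ch in text
        if ch in allowed or (keep_whitespace and ch.isspace())
    )


raw = "h\\ hello wor|d go to - keywords"
print(keep_allowed(raw))
```

Note the limitation: this only deletes characters. A real whitelist steers the recognizer towards an allowed character, whereas here "wor|d" simply becomes "word" rather than "world".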

On Monday, February 27, 2017 at 7:36:10 PM UTC+1, Subhodeep Maji wrote:
>
> Hello all, I am using Tesseract for the first time. I want it to detect
> only letters in my image, so I made a file called "letters" with the content
> "tessedit_char_whitelist abcdefghijklmnopqrstuvwxyz" and used the command
> "tesseract meaningful_words.tiff out letters" after seeing this link:
> http://stackoverflow.com/questions/2363490/limit-characters-tesseract-is-looking-for
>
> I got this as the result:
> "h\ hello wor|d go to - keywords
>
> python these are meaningfd words ."
>
> When I used the command "tesseract meaningful_words.tiff out digits", I got
> "h\ hello wor|d go to - keywords
>
> python these are 101 90   370words ."
>
> My question is: why does the first case output characters other than
> letters, and why isn't the output blank in the second? Am I missing
> something here?
>
> I have also attached the image which I used for detection. I am using
> Tesseract 4.
>
> P.S. This is the first time I am asking a question on a forum.
>
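
For reference, the config-file approach described in the quoted question can be reproduced with a short Python sketch. The config file is a plain text file with one "variable value" pair per line; write_whitelist_config is a hypothetical helper name, and the config is written to a temporary directory and passed by path here. As the reply above says, Tesseract 4.00 may not honour the setting.

```python
import tempfile
from pathlib import Path


def write_whitelist_config(path, allowed):
    """Write a one-line Tesseract config file: '<variable> <value>'."""
    path = Path(path)
    path.write_text(f"tessedit_char_whitelist {allowed}\n", encoding="ascii")
    return path


workdir = Path(tempfile.mkdtemp())
config = write_whitelist_config(workdir / "letters", "abcdefghijklmnopqrstuvwxyz")

# Command line as in the question, with the config file given as a path.
cmd = ["tesseract", "meaningful_words.tiff", "out", str(config)]
print(config.read_text(), end="")
print(" ".join(cmd))
```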
