Hi Navaneetha
I am also looking to start training tesseract using handwritten fonts and
am about to start setting up my training environment. Are you training
tesseract 4 by following the guide
at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 ?
If so are you fine
The text I want Tesseract to read will only contain the most basic
characters. Is there therefore a way of fine-tuning it to include only
basic upper/lower case letters, digits and punctuation marks? That way I
could avoid 'c' getting misinterpreted as '¢' etc. Would simply passing in
a
>>>> Then I trained the TIFF files (fonts) using Serak trainer.
>>>>
>>>>
>>>> If you got the accuracy, just forward the results so everyone can know
>>>> and follow you.
>>>>
>>>>
Hi Andreas, have you managed to get this installed on Windows 10?
On Wednesday, June 27, 2018 at 8:29:25 AM UTC+1, Andreas R wrote:
>
> Hello,
>
> is the new Tesseract 4 viable for Handwriting recognition?
>
> The FAQ says no.
> (
>
aining tesseract 3.5.
>
>
>
> On Tue, Jun 19, 2018 at 9:29 PM, James Q wrote:
>
>> Hi Navaneetha
>> I am also looking to start training tesseract using handwritten fonts and
>> am about to start setting up my training environment. Are you training
>>
I had the same problem. I edited the csproj file for each DLL to always
copy the content item to the output directory like this:
<CopyToOutputDirectory>Always</CopyToOutputDirectory>
After that it worked for me, but I still have trouble converting a Bitmap to a
Pix.
Thanks
James
On Tuesday, January 2, 2018 at 7:27:54 AM UTC,
I'm trying to use this wrapper:
https://github.com/tdhintz/tesseract4win64
It's an x64 .Net assembly with one main DLL (Tesseract.dll) and two
dependency DLLs (liblept1741.dll and libtesseract400.dll). To start with
I'm just trying to get a Visual Studio console app running. I've added
5, 2018 at 8:38:08 PM UTC+3:30, James Q wrote:
>>
>> I'm trying to use this wrapper:
>> https://github.com/tdhintz/tesseract4win64
>>
>> It's an x64 .Net assembly with one main DLL (Tesseract.dll) and two
>> dependency DLLs (liblept1741.dll and libtesseract400.d
Here is my code:
string text = "";
string tessDataPath = ConfigurationManager.AppSettings["TessPath"];
using (var engine = new TessBaseAPI(@tessDataPath, @"eng"))
{
    engine.SetVariable("tessedit_ocr_engine_mode", "0");
    engine.SetPageSegMode(PageSegmentationMode.SINGLE_LINE);
>>
>> On Mon, Jan 8, 2018 at 3:33 PM, James Q <james.qu...@taina.tech> wrote:
>>
>>> By the way I do have the Tesseract.net nuget package working (
>>> https://www.n
On Thursday, January 11, 2018 at 7:15:58 PM UTC, James Q wrote:
>
> Is anyone else using tesseract 4.0alpha from C# ?
>
> On Wednesday, January 10, 2018 at 1:07:28 PM UTC, James Q wrote:
>>
>> Here is my code:
>> string text = "";
>>
>> string tessD
I haven't done this myself, but I believe you should be able to generate a
box file from the source image and use this to crop character subimages
from that source image. Tesseract won't always get the boxes right though.
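As a rough sketch of the cropping idea (not something I have run against real Tesseract output), the snippet below parses a box-file line of the usual form "<char> <left> <bottom> <right> <top> <page>" and converts it to a top-left-origin crop rectangle. Box files measure y from the bottom of the image, so the image height is needed. The function names are my own, purely for illustration.

```python
def parse_box_line(line):
    """Parse one box-file line: '<char> <left> <bottom> <right> <top> <page>'."""
    parts = line.rstrip("\n").rsplit(" ", 5)
    if len(parts) != 6:
        raise ValueError("not a box-file line: %r" % line)
    glyph = parts[0]
    left, bottom, right, top, page = (int(p) for p in parts[1:])
    return glyph, left, bottom, right, top, page


def box_to_crop(box, image_height):
    """Convert a parsed box (bottom-left origin) to (left, upper, right, lower)
    in top-left-origin image coordinates."""
    _glyph, left, bottom, right, top, _page = box
    return (left, image_height - top, right, image_height - bottom)


box = parse_box_line("A 10 20 30 50 0")
crop = box_to_crop(box, image_height=100)
print(box[0], crop)
```

The resulting tuple uses the (left, upper, right, lower) order that image libraries such as Pillow expect for a crop rectangle. Since the boxes are not always right, it is worth eyeballing a few crops before using them.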
On Thursday, January 18, 2018 at 12:49:22 PM UTC, Hardik Sutaria wrote:
>
1:07:28 PM UTC, James Q wrote:
>
> Here is my code:
> string text = "";
>
> string tessDataPath = ConfigurationManager.AppSettings["TessPath"];
> using (var engine = new TessBaseAPI(@tessDataPath, @"eng"))
> {
> engine.SetVariab
In my experience Tesseract gives poor results with lines within the text.
You can test this by manually whiting out the lines in a paint editor and
retrying Tesseract with the new image. If the results are improved then you
will likely need to do this programmatically. This is not
Thanks for the reply. In my project I have tried all 3 DLLs in all
potential folders as follows:
D:\csharp\repos\cwtess4
|__ D:\csharp\repos\cwtess4\cwtess4
|__ liblept1741.dll
|__ libtesseract400.dll
|__ Tesseract.dll
|__ D:\csharp\repos\cwtess4\cwtess4\bin
|__
Have you tried using the OEM_TESSERACT_CUBE_COMBINED engine mode?
On Sunday, January 14, 2018 at 7:01:17 AM UTC, conman wrote:
>
> Hello!
>
> After trying out a lot I would like to ask for help on improving my OCR
> results.
>
> I am using tesseract 3.05.01 and have experimented with different
You are correct, the runtime version is 140. That doesn't appear to be my
problem though as x64 Dependency Walker finds this DLL. It fails to find
several DLLs though which begin 'API-MS-WIN-CORE...'. I would have expected
these to be present by way of the Win10 SDK but they are not. I tried
Is anyone else using tesseract 4.0alpha from C# ?
On Wednesday, January 10, 2018 at 1:07:28 PM UTC, James Q wrote:
>
> Here is my code:
> string text = "";
>
> string tessDataPath = ConfigurationManager.AppSettings["TessPath"];
> using (var engine
On Thursday, February 1, 2018 at 11:01:08 AM UTC, James Q wrote:
>
> The following appear to be both Latin, so can anyone tell me what the
> difference is between:
> Latin.traineddata
> and:
> lat.traineddata
> apart from the fact that the first one is 10 times bigge
The following appear to be both Latin, so can anyone tell me what the
difference is between:
Latin.traineddata
and:
lat.traineddata
apart from the fact that the first one is 10 times bigger?
Thanks
James
Thanks Shree. So presumably then there is no Latin Script traineddata for
Tesseract_Only mode?
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to
Assuming you are using eng.traineddata, have you tried using it with the
dictionary off, or just using osd.traineddata?
On Friday, February 2, 2018 at 8:56:23 AM UTC, Scott Stekel wrote:
>
> In the attached images (original and preprocessed before OCR), I have some
> lines of text which
I've noticed on Tesseract 4 that on some occasions, if the first letter of
a word is 'B' it gets interpreted by Tesseract as 'R', and if the first
letter of a word is 'E' it gets interpreted by Tesseract as 'F'. It's as if
the bottom horizontal stroke of the character is getting lost/ignored.
If you are using tesseract 4 then whitelists/blacklists do not yet work (at
least not in LSTM mode). I also get the impression that the 'Control
Parameters' list you obtain by typing 'tesseract --print-parameters' on the
command line has not been updated to reflect what tesseract 4 supports.
Tesseract prefers no noise around the image so you'll need to pre-process
this image. For example, if you are using opencv, you could find contours
(within the centre that have the appropriate height/width to be a digit),
draw those onto a blank mat and send that to tesseract.
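A minimal sketch of the filtering step only, assuming you already have bounding boxes as (x, y, w, h) tuples (for example from OpenCV's cv2.boundingRect on each contour). The size limits and the margin that defines "the centre" are made-up values to tune for your own images, and the function name is hypothetical.

```python
def digit_like_boxes(boxes, image_size, min_h=10, max_h=60,
                     min_w=3, max_w=40, margin=0.1):
    """Keep boxes that lie inside the central region of the image and
    whose width and height are plausible for a digit.

    boxes      -- iterable of (x, y, w, h) tuples in pixels
    image_size -- (width, height) of the image in pixels
    margin     -- fraction of each dimension treated as the noisy edge
    """
    img_w, img_h = image_size
    x0, y0 = img_w * margin, img_h * margin
    x1, y1 = img_w * (1 - margin), img_h * (1 - margin)
    kept = []
    for x, y, w, h in boxes:
        inside = x >= x0 and y >= y0 and x + w <= x1 and y + h <= y1
        if inside and min_h <= h <= max_h and min_w <= w <= max_w:
            kept.append((x, y, w, h))
    return kept


candidates = [(40, 40, 12, 30), (0, 0, 200, 3), (45, 45, 2, 2)]
print(digit_like_boxes(candidates, image_size=(200, 100)))
```

The contours belonging to the kept boxes are the ones you would then draw onto the blank image before passing it to tesseract.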
On Wednesday,
>>>>> The above link has 1900+ fonts. From that site I downloaded the
>>>>> TTF files of the fonts and converted them to TIFF files online.
>>>>>
>>>>> Then I trained the TIFF files (fonts) using Serak trainer.
>>>>>
>
Hi Shree, I'm trying out the script you posted earlier which is great so
thank you! I was wondering how many fonts I can specify at once in the
'fonts_for_training' list. I have run it with 9 fonts at once and that
seems fine but I would like to do 100s or even 1000s if I can. Is this the
best
It could be that a threshold operation is taking place at a lower
brightness than your grey text. Try binarizing the image with a high
threshold value (e.g. 200) before sending it to tesseract; this should make
all the text black.
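To illustrate what I mean by a high threshold, here is a small sketch in plain Python. It assumes the image is a list of rows of 8-bit grey values (0 = black, 255 = white); in practice you would use your imaging library's own threshold call, and the function name here is just for illustration.

```python
def binarize(gray, threshold=200):
    """Map every pixel darker than `threshold` to black (0) and the rest
    to white (255). A high threshold pulls light-grey text down to black."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


# Grey text (150) on a white background (255): the text becomes pure black.
sample = [
    [255, 150, 255],
    [255, 150, 255],
]
print(binarize(sample))
```

With a lower threshold such as 128, the 150-valued text pixels would be pushed to white instead, which is the failure I suspect is happening.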
On Saturday, July 28, 2018 at 4:00:16 PM UTC+1, Yogesh Sanchihar wrote:
It looks like you may need to fine-tune Tesseract on this particular
font. From the letters in your images it looks like 'Bevan', which you can
download from here:
https://www.fontsquirrel.com/fonts/bevan
If you are unable to train Tesseract, I have sometimes had success by
stretching
Correct me if I am wrong, but shouldn't each character be bound by its own
box? Try opening this in JTessBoxEditor (
http://vietocr.sourceforge.net/training.html ).
On Thursday, August 23, 2018 at 12:33:07 PM UTC+1, eng.ahmed@gmail.com
wrote:
>
> I want to train tesseract 4 using images
I will try to do some fine-tuning with this font.
> Thanks again for letting me know this font's name :)
>
>
> On Wednesday, August 15, 2018 at 9:49:01 PM UTC+8, James Q wrote:
>>
>> It looks like you may need to fine-tune Tesseract on this
>> particular font. From the letters in your im
Hi
I'm trying to create a traineddata with a specific word list. What I have
done so far is:
1.) Create specific files langdata/eng
- eng.wordlist (containing my specific words)
- eng.finetune.training_text (representative text containing only chars
found in my words)
- eng.numbers
As far as I can tell these look OK to me. They open correctly in
JTessBoxEditor. If you are creating lstmf files for Tesseract 4, I think
you may need space+tab in your end-of-line boxes (this is what worked for me
anyway).
On Tuesday, July 3, 2018 at 7:15:46 AM UTC+1, chandra churh chatterjee
Have you tried removing all surrounding whitespace from the image except
for a thin border (say 8px thick)?
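In case it helps, a minimal sketch of that trimming in plain Python. It assumes a greyscale image held as a list of rows of 8-bit values with a white background; the ink cut-off of 128 and the function name are my own assumptions.

```python
def crop_with_border(gray, border=8, ink_below=128, background=255):
    """Trim surrounding whitespace, then pad with a thin uniform border.

    A pixel counts as ink when its value is below `ink_below`.
    Returns the image unchanged if no ink is found.
    """
    rows = [y for y, row in enumerate(gray) if any(px < ink_below for px in row)]
    if not rows:
        return gray
    cols = [x for x in range(len(gray[0]))
            if any(row[x] < ink_below for row in gray)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    content = [row[left:right + 1] for row in gray[top:bottom + 1]]
    width = (right - left + 1) + 2 * border
    pad_row = [background] * width
    side = [background] * border
    out = [list(pad_row) for _ in range(border)]
    out += [side + list(r) + side for r in content]
    out += [list(pad_row) for _ in range(border)]
    return out


image = [[255] * 5 for _ in range(5)]
image[2][2] = 0  # a single ink pixel in the middle
print(crop_with_border(image, border=1))
```

The border argument is the thickness in pixels, so border=8 gives the 8px frame suggested above.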
On Friday, July 6, 2018 at 4:52:08 PM UTC+1, Alberto Andreotti wrote:
>
> Hi,
>
> tried it with same results, also, all other cases work well.
>
> 23.78
> 15
> 1.6
> 1.7
> 1.2
> 1.3
> 1.4
No tool I can think of. What I would do is edit the file in a large text
file editor (such as EmEditor) to remove duplicate words. You could do this
by replacing all spaces with newlines, then sorting and removing duplicates.
After that you can randomize the unique list of words, add an
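If you would rather script it than use an editor, the de-duplicate and randomize steps could be sketched like this in Python. This is only my illustration of those steps; the function name and the optional seed are not from any Tesseract tool.

```python
import random


def unique_shuffled_words(text, seed=None):
    """Split on whitespace, drop duplicate words, and return the unique
    words in random order. Pass `seed` for a repeatable shuffle."""
    words = sorted(set(text.split()))  # sort first so a given seed is repeatable
    random.Random(seed).shuffle(words)
    return words


words = unique_shuffled_words("Republic Island New Island Republic", seed=0)
print("\n".join(words))
```

For a very large file you would read it in chunks and add the words to the set as you go, but the idea is the same.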
I would like to improve accuracy by training tesseract 4 to use a context
specific list of words, for example countries. I have created an
eng.finetune.training_text file containing country names as well as common
country words (e.g. Republic, Island, New etc.). This (as far as I can tell)
Hi Chinmay
How did you get on with this? I'd be interested to know your accuracy rate
in interpreting block handwriting...
Thanks
James
On Tuesday, November 8, 2016 at 2:48:20 PM UTC, chinmay dhumal wrote:
>
> Handwriting would be that of a random NGO worker; the language would be English.
> and yea,
orking ?
>
> Thanks
> Vipin
>
> On Monday, 8 January 2018 15:33:50 UTC+5:30, James Q wrote:
>>
>> By the way I do have the Tesseract.net nuget package working (
>> https://www.nuget.org/packages/tesseract.net/ ), but have 2 issues with
>> this:
>> 1.) I nee
On Tuesday, September 11, 2018 at 1:57:34 PM UTC+1, ProgressNotPerfection
wrote:
>
> Hi Tesseract Group
> I am trying to train tesseract to recognize handwritten characters and
> have prepared several thousand lstmf files (from tif/box sets) so I can
> finetune best trained eng.traineddata, I
Thank you Shree
I ran with --debug_interval -1 as you suggested and I can see 1 iteration
showing 1 text line from a given font (lstmf) and then the next iteration
showing 1 text line from the next font. This suggests I would need number
of iterations calculated from *[number of training_text