[tesseract-ocr] Re: tesseract-ocr

2018-06-19 Thread James Q
Hi Navaneetha I am also looking to start training tesseract using handwritten fonts and am about to start setting up my training environment. Are you training tesseract 4 by following the guide at https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00 ? If so are you fine

[tesseract-ocr] Tesseract Training using basic characters only

2018-06-25 Thread James Q
The text I want Tesseract to read will only contain the most basic characters. Is there a way of finetuning it therefore so as to only include basic upper/lower case letters, digits and punctuation marks? That way I could avoid 'c' getting misinterpreted as '¢' etc.? Would simply passing in a

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-26 Thread James Q
> >>>> then i have trained the tiff files(fonts) using serak trainer. >>>> >>>> >>>> If you got the accuracy just forward the results so everyone can know >>>> and will follow you. >>>> >>>>

[tesseract-ocr] Re: Tesseract 4 Handwriting recognition

2018-06-27 Thread James Q
Hi Andreas, Have you managed to get this installed on windows 10? On Wednesday, June 27, 2018 at 8:29:25 AM UTC+1, Andreas R wrote: > > Hello, > > is the new Tesseract 4 viable for Handwriting recognition? > > The FAQ says no. > ( >

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-20 Thread James Q
aining tesseract 3.5. > > > > On Tue, Jun 19, 2018 at 9:29 PM, James Q > wrote: > >> Hi Navaneetha >> I am also looking to start training tesseract using handwritten fonts and >> am about to start setting up my training environment. Are you training >>

[tesseract-ocr] Re: why tesseract always detect number 6 to number 8 in a good image??

2018-01-05 Thread James Q
I had the same problem. I edited the csproj file for each dll to always copy the content item to the output directory like this: <CopyToOutputDirectory>Always</CopyToOutputDirectory> After that it worked for me, but I still have trouble converting a Bitmap to a Pix. Thanks James On Tuesday, January 2, 2018 at 7:27:54 AM UTC,

[tesseract-ocr] I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-01-05 Thread James Q
I'm trying to use this wrapper: https://github.com/tdhintz/tesseract4win64 It's an x64 .Net assembly with one main DLL (Tesseract.dll) and two dependency DLLs (liblept1741.dll and libtesseract400.dll). To start with I'm just trying to get a Visual Studio console app running. I've added

[tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-01-08 Thread James Q
5, 2018 at 8:38:08 PM UTC+3:30, James Q wrote: >> >> I'm trying to use this wrapper: >> https://github.com/tdhintz/tesseract4win64 >> >> It's an x64 .Net assembly with one main DLL (Tesseract.dll) and two >> dependency DLLs (liblept1741.dll and libtesseract400.d

[tesseract-ocr] Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-10 Thread James Q
Here is my code: string text = ""; string tessDataPath = ConfigurationManager.AppSettings["TessPath"]; using (var engine = new TessBaseAPI(@tessDataPath, @"eng")) { engine.SetVariable("tessedit_ocr_engine_mode", "0"); engine.SetPageSegMode(PageSegmentationMode.SINGLE_LINE);

Re: [tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-01-08 Thread James Q
_____ >> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com >> >> On Mon, Jan 8, 2018 at 3:33 PM, James Q <james.qu...@taina.tech >> > wrote: >> >>> By the way I do have the Tesseract.net nuget package working ( >>> https://www.n

[tesseract-ocr] Re: Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-15 Thread James Q
. On Thursday, January 11, 2018 at 7:15:58 PM UTC, James Q wrote: > > Is anyone else using tesseract 4.0alpha from C# ? > > On Wednesday, January 10, 2018 at 1:07:28 PM UTC, James Q wrote: >> >> Here is my code: >> string text = ""; >> >> string tessD

[tesseract-ocr] Re: How to extract character by character using tesseract and pass it to other engine for detection.

2018-01-18 Thread James Q
I haven't done this myself, but I believe you should be able to generate a box file from the source image and use this to crop character subimages from that source image. Tesseract won't always get the boxes right though. On Thursday, January 18, 2018 at 12:49:22 PM UTC, Hardik Sutaria wrote: >
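
The box-file approach suggested above can be sketched as follows. This is a minimal illustration, not a tested pipeline: it assumes the classic box-file line layout `symbol left bottom right top page`, with coordinates measured from the bottom-left corner of the image, and it represents the image as a plain list of pixel rows. The helper names are hypothetical.

```python
def parse_box_line(line):
    """Parse one box-file line: 'symbol left bottom right top page'."""
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return symbol, int(left), int(bottom), int(right), int(top), int(page)


def box_to_crop_rect(box, image_height):
    """Convert a bottom-left-origin box to (left, upper, right, lower)
    in the top-left-origin convention used by most image libraries."""
    _, left, bottom, right, top, _ = box
    return (left, image_height - top, right, image_height - bottom)


def crop(gray, rect):
    """Cut the rectangle out of an image stored as a list of pixel rows."""
    left, upper, right, lower = rect
    return [row[left:right] for row in gray[upper:lower]]


if __name__ == "__main__":
    # Hypothetical 40x100 white page and a single made-up box entry.
    page = [[255] * 40 for _ in range(100)]
    box = parse_box_line("T 10 20 30 50 0")
    glyph = crop(page, box_to_crop_rect(box, image_height=len(page)))
    print(box[0], len(glyph[0]), "x", len(glyph))
```

The y-axis flip is the step that is easiest to get wrong. Because the boxes themselves are not always right, it is worth inspecting a few crops by eye before passing them to another engine.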

[tesseract-ocr] Re: Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-18 Thread James Q
1:07:28 PM UTC, James Q wrote: > > Here is my code: > string text = ""; > > string tessDataPath = ConfigurationManager.AppSettings["TessPath"]; > using (var engine = new TessBaseAPI(@tessDataPath, @"eng")) > { > engine.SetVariab

[tesseract-ocr] Re: Criminal record JPGs: Improving image quality

2018-01-18 Thread James Q
In my experience Tesseract gives poor results with lines within the text. You can test this by manually whiting out the lines in a paint editor and retrying Tesseract with the new image. If the results are improved then you will likely need to do this programmatically. This is not

[tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-01-12 Thread James Q
Thanks for the reply, In my project I have tried all 3 DLLs in all potential folders as follows: D:\csharp\repos\cwtess4 |__ D:\csharp\repos\cwtess4\cwtess4 |__ liblept1741.dll |__ libtesseract400.dll |__ Tesseract.dll |__ D:\csharp\repos\cwtess4\cwtess4\bin |__

[tesseract-ocr] Re: Need help to improve quality

2018-01-15 Thread James Q
Have you tried using the OEM_TESSERACT_CUBE_COMBINED engine mode? On Sunday, January 14, 2018 at 7:01:17 AM UTC, conman wrote: > > Hello! > > After trying out a lot I would like to ask for help on improving my OCR > results. > > I am using tesseract 3.05.01 and have experimented with different

[tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-01-15 Thread James Q
You are correct, the runtime version is 140. That doesn't appear to be my problem though as x64 Dependency Walker finds this DLL. It fails to find several DLLs though which begin 'API-MS-WIN-CORE...'. I would have expected these to be present by way of the Win10 SDK but they are not. I tried

[tesseract-ocr] Re: Variables having no effect on C# Tesseract.net 4.0.0.6 wrapper

2018-01-11 Thread James Q
Is anyone else using tesseract 4.0alpha from C# ? On Wednesday, January 10, 2018 at 1:07:28 PM UTC, James Q wrote: > > Here is my code: > string text = ""; > > string tessDataPath = ConfigurationManager.AppSettings["TessPath"]; > using (var engine

[tesseract-ocr] Re: tessdata_best traineddata FIles

2018-02-01 Thread James Q
On Thursday, February 1, 2018 at 11:01:08 AM UTC, James Q wrote: > > The following appear to be both Latin, so can anyone tell me what the > difference is between: > Latin.traineddata > and: > lat.traineddata > apart from the fact that the first one is 10 times bigge

[tesseract-ocr] tessdata_best traineddata FIles

2018-02-01 Thread James Q
The following appear to be both Latin, so can anyone tell me what the difference is between: Latin.traineddata and: lat.traineddata apart from the fact that the first one is 10 times bigger? Thanks James -- You received this message because you are subscribed to the Google Groups

[tesseract-ocr] Re: tessdata_best traineddata FIles

2018-02-01 Thread James Q
Thanks Shree, So presumably then there is no Latin Script traineddata for Tesseract_Only mode?

[tesseract-ocr] Re: Inconsistent results with slashes, even on same line

2018-02-02 Thread James Q
Assuming you are using eng.traineddata - have you tried using it with the dictionary off or just using osd.traineddata ? On Friday, February 2, 2018 at 8:56:23 AM UTC, Scott Stekel wrote: > > In the attached images (original and preprocessed before OCR), I have some > lines of text which

[tesseract-ocr] Tesseract 4 B -> R and E -> F

2018-02-05 Thread James Q
I've noticed on Tesseract 4 that on some occasions, if the first letter of a word is 'B' it gets interpreted by Tesseract as 'R', and if the first letter of a word is 'E' it gets interpreted by Tesseract as 'F'. It's as if the bottom horizontal stroke of the character is getting lost/ignored.

[tesseract-ocr] Re: Why the SetVariable don't work normally?

2018-01-31 Thread James Q
If you are using tesseract 4 then whitelists/blacklists do not yet work (at least not in LSTM mode). I also get the impression that the 'Control Parameters' list you obtain by typing 'tesseract --print-parameters' on the command line has not been updated to reflect the functionality supported in tesseract 4.

[tesseract-ocr] Re: tesseract to recognize the cropped digits

2018-02-14 Thread James Q
Tesseract prefers no noise around the image so you'll need to pre-process this image. For example, if you are using opencv, you could find contours (within the centre that have the appropriate height/width to be a digit), draw those onto a blank mat and send that to tesseract. On Wednesday,

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread James Q
> >>>>> the above link has 1900+ fonts from that site i have downloaded the >>>>> ttf files of fonts and converted to tiff files online. >>>>> >>>>> then i have trained the tiff files(fonts) using serak trainer. >>>>> >

Re: [tesseract-ocr] Re: tesseract-ocr

2018-06-21 Thread James Q
Hi Shree, I'm trying out the script you posted earlier which is great so thank you! I was wondering how many fonts I can specify at once in the 'fonts_for_training' list. I have run it with 9 fonts at once and that seems fine but I would like to do 100s or even 1000s if I can. Is this the best

[tesseract-ocr] Re: tesseract does not recognize grey colored fonts in the images..

2018-07-31 Thread James Q
It could be that a threshold operation is taking place at a lower brightness than your grey text. Try binarizing the image with a high threshold value (e.g. 200) before sending it to tesseract; this should make all the text black. On Saturday, July 28, 2018 at 4:00:16 PM UTC+1, Yogesh Sanchihar wrote:
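
A minimal sketch of the high-threshold binarization suggested above. It assumes an 8-bit greyscale image held as a list of pixel rows; the function name and the sample pixel values are illustrative only, and in practice an image library would do the same job.

```python
def binarize(gray, threshold=200):
    """Turn every pixel darker than `threshold` black (0) and the rest white (255)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


if __name__ == "__main__":
    # Hypothetical row: black ink, mid-grey ink, light background, white background.
    sample = [[0, 150, 210, 255]]
    print(binarize(sample))
```

With a threshold of 200, mid-grey text ends up black alongside normal black text, whereas a low threshold would have pushed it to white.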

[tesseract-ocr] Re: why such simple word can't be recognized?

2018-08-15 Thread James Q
It looks like you may need to fine-tune Tesseract on this particular font. From the letters in your images it looks like 'Bevan', which you can download from here: https://www.fontsquirrel.com/fonts/bevan If you are unable to train Tesseract, I have sometimes had success by stretching

[tesseract-ocr] Re: BOX File Automatic Generation using the word coordinates

2018-08-24 Thread James Q
Correct me if I am wrong, but shouldn't each character be bound by its own box? Try opening this in JTessBoxEditor ( http://vietocr.sourceforge.net/training.html ). On Thursday, August 23, 2018 at 12:33:07 PM UTC+1, eng.ahmed@gmail.com wrote: > > I want to train tesseract 4 using images

[tesseract-ocr] Re: why such simple word can't be recognized?

2018-08-18 Thread James Q
> I will try to do some fine-tune training with this font. > Thanks again for letting me know this font name :) > > > On Wednesday, August 15, 2018 at 9:49:01 PM UTC+8, James Q wrote: >> >> It looks like you may need to fine tune train Tesseract on this >> particular font. From the letters in you im

[tesseract-ocr] Creating traineddata with specific wordlist

2018-07-17 Thread James Q
Hi I'm trying to create a traineddata with a specific word list. What I have done so far is: 1.) Create specific files langdata/eng - eng.wordlist (containing my specific words) - eng.finetune.training_text (representative text containing only chars found in my words) - eng.numbers

[tesseract-ocr] Re: Check validity of box and image files

2018-07-04 Thread James Q
As far as I can tell these look ok to me. They open correctly in JTessBoxEditor. If you are creating lstmf files for Tesseract 4, I think you may need space+tab in your end-of-line boxes (This is what worked for me anyway). On Tuesday, July 3, 2018 at 7:15:46 AM UTC+1, chandra churh chatterjee

Re: [tesseract-ocr] Really poor performance with decimal numbers

2018-07-06 Thread James Q
Have you tried removing all surrounding whitespace from the image except for a thin border (say 8px thick)? On Friday, July 6, 2018 at 4:52:08 PM UTC+1, Alberto Andreotti wrote: > > Hi, > > tried it with same results, also, all other cases work well. > > 23.78 > 15 > 1.6 > 1.7 > 1.2 > 1.3 > 1.4
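
One way to try the cropping suggested above is sketched below. It assumes a greyscale image stored as a list of pixel rows with dark text on a light background; the function name, the `ink_threshold` cut-off and the toy image are all assumptions made for illustration.

```python
def crop_to_content(gray, border=8, white=255, ink_threshold=128):
    """Trim surrounding whitespace, then re-pad with a `border`-pixel white margin."""
    height, width = len(gray), len(gray[0])
    # Rows and columns that contain at least one "ink" pixel.
    rows = [y for y in range(height) if any(px < ink_threshold for px in gray[y])]
    cols = [x for x in range(width)
            if any(gray[y][x] < ink_threshold for y in range(height))]
    if not rows:
        return [list(row) for row in gray]  # blank image: nothing to crop
    top, bottom, left, right = rows[0], rows[-1], cols[0], cols[-1]
    padded_width = (right - left + 1) + 2 * border
    margin = [[white] * padded_width for _ in range(border)]
    body = [[white] * border + list(gray[y][left:right + 1]) + [white] * border
            for y in range(top, bottom + 1)]
    return margin + body + [list(row) for row in margin]


if __name__ == "__main__":
    # Hypothetical 20x20 white page with a small dark block standing in for text.
    page = [[255] * 20 for _ in range(20)]
    for y in (5, 6):
        for x in (7, 8, 9):
            page[y][x] = 0
    cropped = crop_to_content(page)
    print(len(cropped[0]), "x", len(cropped))
```

The margin width is a parameter so that values other than 8px can be compared against the recognition results.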

[tesseract-ocr] Re: Explanation for training_text and wordlist files

2018-07-06 Thread James Q
No tool I can think of. What I would do is edit the file in a large text file editor (such as EmEditor) to remove duplicate words. You could do this by replacing all spaces with newlines then sorting and removing duplicates. After that you can randomize the unique list of words, add an
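
If a script is preferred to a text editor, the de-duplicate-and-randomize steps described above could be sketched like this. It uses only the Python standard library; the function name, the `seed` parameter and the sample text are assumptions for illustration.

```python
import random


def unique_shuffled_words(text, seed=None):
    """Split on whitespace, drop duplicate words, and return them in random order."""
    # Sorting first means a given seed always reproduces the same shuffle.
    words = sorted(set(text.split()))
    random.Random(seed).shuffle(words)
    return words


if __name__ == "__main__":
    # Hypothetical input; in practice this would be the contents of the word file.
    sample = "Republic of Island New Republic Island of"
    print("\n".join(unique_shuffled_words(sample, seed=0)))
```

For a large file, the text would be read from disk and the result written back one word per line.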

[tesseract-ocr] Training for specific words

2018-07-04 Thread James Q
I would like to improve accuracy by training tesseract 4 to use a context-specific list of words, for example countries. I have created an eng.finetune.training_text file containing country names as well as common country words (e.g. Republic, Island, New etc.). This (as far as I can tell)

Re: [tesseract-ocr] Re: Need Help To recognise handwriting using OCR

2018-06-29 Thread James Q
Hi Chinmay How did you get on with this? I'd be interested to know your accuracy rate in interpreting block handwriting... Thanks James On Tuesday, November 8, 2016 at 2:48:20 PM UTC, chinmay dhumal wrote: > > Handwriting would of a random NGO worker, the language would be English. > and yea,

[tesseract-ocr] Re: I Need help getting Tesseract 4.0 C# .Net Wrapper working please!

2018-09-26 Thread James Q
orking ? > > Thanks > Vipin > > On Monday, 8 January 2018 15:33:50 UTC+5:30, James Q wrote: >> >> By the way I do have the Tesseract.net nuget package working ( >> https://www.nuget.org/packages/tesseract.net/ ), but have 2 issues with >> this: >> 1.) I nee

[tesseract-ocr] Re: Training with a large number of LSTMF files

2018-09-11 Thread James Q
On Tuesday, September 11, 2018 at 1:57:34 PM UTC+1, ProgressNotPerfection wrote: > > Hi Tesseract Group > I am trying to train tesseract to recognize handwritten characters and > have prepared several thousand lstmf files (from tif/box sets) so I can > finetune best trained eng.traineddata, I

[tesseract-ocr] Re: Training with a large number of LSTMF files

2018-09-11 Thread James Q
Thank you Shree I ran with --debug_interval -1 as you suggested and I can see 1 iteration showing 1 text line from a given font (lstmf) and then the next iteration showing 1 text line from the next font. This suggests I would need number of iterations calculated from *[number of training_text