Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2020-01-05 Thread Ibr
Not at all bro :) Tell me if you get good results, I'm interested to know. Selam -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to

Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2020-01-01 Thread Ibr
> > In my opinion "Theoretically" since Farsi has more letters than Arabic >> and also exists in Ottoman, Farsi should work better, but Shree is more >> well informed than me in this matter. >> > I remember fine tuning for Arabic fonts took too much time, I mean more than a week for the

Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2019-12-31 Thread Ibr
> > Hi Serkan, >> > Well, that's great, in this case with the corresponding Unicode you can definitely have a workaround to solve the issue at hand. Which traineddata are you planning to fine-tune? Greetings to you :)

Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2019-12-24 Thread Ibr
rs of ottoman in a document, > > On Thu, Dec 19, 2019 at 11:10 AM Ibr > > wrote: > >> Hi Serkan, >>> >> >> My pleasure brother, any time :) >> >> *"**Do I need a new model for ottoman, what you think ?"* of course I >> think It wou

Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2019-12-19 Thread Ibr
> > Hi Serkan, > My pleasure brother, any time :) *"**Do I need a new model for ottoman, what you think ?"* of course I think It would help you a lot but honestly I really have no clue how to create a trained data for Ottoman or any other language, that's why maybe your best shot is Farsi

Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2019-12-18 Thread Ibr
> > Hi Serkan, > > ** "*I wonder if the existing language models generated for Arabic and/or Farsi*" yes there is one for Arabic and one for Farsi, they are called lang-name.traineddata such as ara.traineddata and eng.traineddata you can find them and download them from GitHub here

Re: Ynt: [tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2019-12-17 Thread Ibr
> > Hi Serkan, > Tesseract works as follows: each language or writing system has a model that it depends on to recognize the characters in the image. I guess it depends on something called (stroke width transformation), which actually detects the shapes, if while

[tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2019-12-15 Thread Ibr
you still on the subject ? > > On Tuesday, 26 September 2017 at 11:09:46 UTC+3, Ibr wrote: >> >> hi, as shree has advised, to detect Arabic writing use tesseract 4alpha, >> but in your case if you want to use it to detect ottoman text, you have to >> consider two things, if th

[tesseract-ocr] Different Versions of Tesseract

2018-03-13 Thread Ibr
Hi, I have installed more than one version of tesseract on the same machine (Ubuntu 14.04). Every time I want to install a newer version I create a different folder; inside it I install and compile Leptonica, then inside it I download and compile Tesseract. How can I use a specified version of

[tesseract-ocr] Detection PSM

2017-11-08 Thread Ibr
Hi, I was running detection on an image of a Japanese document. The command was: *tesseract image results_text -l jpn --tessdata-dir ./tessdata -c preserve_interword_spaces=1 --oem 1 *, and I noticed that when I add the argument --psm 12 the accuracy is noticeably better, as far as I know that the

Re: [tesseract-ocr] oem Tesseract + lstm

2017-10-30 Thread Ibr
5:34:06 PM UTC+2, shree wrote: > The same traineddata file should have files for both engines - legacy > tesseract and lstm. > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Sun, O

[tesseract-ocr] oem Tesseract + lstm

2017-10-29 Thread Ibr
Hi, I'm using tesseract 4.00.00dev-692-gad5ee18 and leptonica-1.74.4. I want to use oem 2, which is "Tesseract + LSTM", for the English language. That means I need two traineddata: the traineddata with LSTM, which is integrated with tesseract 4, and the traineddata which doesn't contain

[tesseract-ocr] Re: fine tune Tesseract

2017-10-29 Thread Ibr
ata refer to this link <https://groups.google.com/forum/#!topic/tesseract-ocr/zVjfHEyqbgg> for more about the "vert" issue On Thursday, October 26, 2017 at 4:35:26 PM UTC+3, wangdon...@gmail.com wrote: > > > On Tuesday, October 24, 2017 at 10:52:02 PM UTC+8, Ibr wrote: >> >> Hi,

[tesseract-ocr] Re: fine tune Tesseract

2017-10-25 Thread Ibr
l_output ./trained/engoutput/eng.traineddata > > To "finish" the training > > On Tuesday, October 24, 2017 at 10:52:02 AM UTC-4, Ibr wrote: >> >> Hi, >> I have the latest version of Tesseract and leptonica 1.74.4, ran the >> command >> training/lstmtrai

[tesseract-ocr] fine tune Tesseract

2017-10-24 Thread Ibr
Hi, I have the latest version of Tesseract and leptonica 1.74.4, ran the command training/lstmtraining --model_output /home/ibr/latest_leptonica_4/lstmf_old_jpn/jpn \ --continue_from /home/ibr/latest_leptonica_4/jpn_tune/extracted/jpn.lstm \ --traineddata /home/ibr/latest_leptonica_4

[tesseract-ocr] Re: How to use Tesseract Arabic OCR.

2017-09-26 Thread Ibr
hi, as shree has advised, to detect Arabic writing use tesseract 4alpha, but in your case if you want to use it to detect ottoman text, you have to consider two things, if the font is uncommon, you need to do some enhancing to the Arabic model (ara.traineddata) against that font -it is a

Re: [tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread Ibr
how can I know how many lines are in each lstmf file? I opened one with Notepad++ and it had about 7 lines, and that can't be correct since I tried 61 fonts with 10 iterations

Re: [tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread Ibr
cover all files once. > > You can use --target_error_rate 0.01 instead of number of iterations > as a guide for how long to train. > > > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
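
The suggestion above, stopping on an error rate rather than a fixed iteration count, can be sketched as a small wrapper. This is a minimal sketch, not the thread's own script: `finetune_until` and the `LSTMTRAINING_BIN` override are hypothetical names, and only flags that appear in this thread are used. Any extra arguments (for example the `--traineddata` flag seen in the later "fine tune Tesseract" thread) are passed through unchanged.

```shell
# Hypothetical wrapper: fine-tune until a target error rate is reached.
# LSTMTRAINING_BIN is an assumed override so the call can be dry-run.
finetune_until() {
  local bin="${LSTMTRAINING_BIN:-training/lstmtraining}"
  local target="$1" model_out="$2" start="$3" listfile="$4"
  shift 4
  # Remaining arguments, if any, are forwarded to lstmtraining as-is.
  "$bin" --model_output "$model_out" \
    --continue_from "$start" \
    --train_listfile "$listfile" \
    --target_error_rate "$target" "$@"
}
```

Usage would look like `finetune_until 0.01 ~/tesstutorial/full_japanese/new ~/tesstutorial/extracted_lstm/jpn.lstm ~/tesstutorial/jpntrain/jpn.training_files.txt`, with the paths taken from the original question.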

[tesseract-ocr] Fine Tuning Iterations

2017-06-22 Thread Ibr
Hi, if I want to run the command: training/lstmtraining --model_output ~/tesstutorial/full_japanese/new \ --continue_from ~/tesstutorial/extracted_lstm/jpn.lstm \ --train_listfile ~/tesstutorial/jpntrain/jpn.training_files.txt \ --max_iterations 10 how can I match the --max_iterations

[tesseract-ocr] Fine Tuning all Fonts List

2017-06-19 Thread Ibr
Hi, engtrain and engeval have almost the same command, but for eval you specify the font using the argument --fontlist, while in train you define the fonts in language-specific.sh. I ran both commands and noticed that they produce the same result files, except in engtrain case

Re: [tesseract-ocr] Re: Font List

2017-06-18 Thread Ibr
Hi, engtrain and engeval have almost the same command, but for eval you specify the font using the argument --fontlist, while in train you define the fonts in language-specific.sh. I ran both commands and noticed that they produce the same result files, except in engtrain case

Re: [tesseract-ocr] oem Detection

2017-06-14 Thread Ibr
yes, I already extracted the lstm file and specified it in the continue argument: *--continue_from ~/tesstutorial/impact_from_full/jpn.lstm* shouldn't this step do it? Yet the error keeps coming: *Loaded file /home/ibr/tesstutorial/impact_from_full/jpn.lstm, unpacking...Failed to continue

Re: [tesseract-ocr] oem Detection

2017-06-14 Thread Ibr
isting model? because when I run the command above it says:- Loaded file /home/ibr/tesstutorial/impact_from_full/jpn.traineddata, unpacking... Failed to continue from: /home/ibr/tesstutorial/impact_from_full/jpn.traineddata On Tuesday, June 13, 2017 at 4:28:21 PM UTC+3, shree wrote: > combine
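
For reference, the extraction step discussed in this thread (pulling the `.lstm` recognition model out of a `.traineddata` file so it can be given to `--continue_from`) can be sketched as below. This is a minimal sketch assuming `combine_tessdata` is on the PATH; `extract_lstm` and the `COMBINE_TESSDATA_BIN` override are hypothetical names. It does not by itself explain the "Failed to continue from" error reported above.

```shell
# Hypothetical helper: extract <lang>.lstm from <lang>.traineddata.
# COMBINE_TESSDATA_BIN is an assumed override so the call can be dry-run.
extract_lstm() {
  local bin="${COMBINE_TESSDATA_BIN:-combine_tessdata}"
  local traineddata="$1"
  local out_dir="$2"
  local lang
  # Language code is the file name without the .traineddata suffix.
  lang="$(basename "$traineddata" .traineddata)"
  # -e extracts the named component from the traineddata file.
  "$bin" -e "$traineddata" "$out_dir/$lang.lstm"
}
```

Usage would look like `extract_lstm ./tessdata/jpn.traineddata ~/tesstutorial/impact_from_full`, which writes `jpn.lstm` into that directory.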

Re: [tesseract-ocr] oem Detection

2017-06-14 Thread Ibr
as for --continue_from, it's mentioned here <https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact>; it can be a recognition model, which would be .lstm. If not, what is the existing model? Because when I run the command above it says:- Loaded file /ho

[tesseract-ocr] Re: Font List

2017-06-14 Thread Ibr
> > UPDATE > I figured out how to use the list, and it seems the two commands are the same, so the question remains: what is the difference between engtrain and engeval?

[tesseract-ocr] Re: Font List

2017-06-14 Thread Ibr
> > I think this command is a substitute for the command above, correct? > training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \ --noextract_font_properties --langdata_dir ../langdata \ --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain because the only

[tesseract-ocr] Font List

2017-06-14 Thread Ibr
Hi, for the command: training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \ --noextract_font_properties --langdata_dir ../langdata \ --tessdata_dir ./tessdata \ --fontlist "Impact Condensed" --output_dir ~/tesstutorial/engeval for the argument --fontlist how can

Re: [tesseract-ocr] oem Detection

2017-06-14 Thread Ibr
___ > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Tue, Jun 13, 2017 at 6:09 PM, Ibr <ibr.h...@gmail.com > > wrote: > >> thanks for the response, well actually I wrote the command wrong, I >> wanted to co

Re: [tesseract-ocr] oem Detection

2017-06-13 Thread Ibr
@ http://bhajans.ramparivar.com > > On Tue, Jun 13, 2017 at 5:25 PM, Ibr <ibr.h...@gmail.com > > wrote: > >> seems so, to add or merge the new LSTM files in the traineddata this >> command to user correct: *training/combine_tessdata -o >> tessda

Re: [tesseract-ocr] oem Detection

2017-06-13 Thread Ibr
> *You need to create a new traineddata with the new lstm files and then > test with it.* > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Tue, Jun 13, 2017 at 3:17 PM, Ibr <ibr.h...@gmail.com

[tesseract-ocr] oem Detection

2017-06-13 Thread Ibr
Hi, when I run detection using tesseract 4.00.00alpha with the command: *tesseract image results -l ara --tessdata-dir ./tessdata --oem 1 *the oem here means "Neural nets LSTM only", but there is no argument in tesseract to specify where to find the LSTM files, how the tesseract find

Re: [tesseract-ocr] Detect Multiple Images by Command Line

2017-06-12 Thread Ibr
> On Mon, Jun 12, 2017 at 3:58 PM, Ibr <ibr.h...@gmail.com > > wrote: > >> Hi, >> >> When I want to detect an image on the tesseract 4.00alpha I run the >> command *tesseract image results -l lang --tessdata-dir ./tessdata --oem >> 1* . >>

[tesseract-ocr] Detect Multiple Images by Command Line

2017-06-12 Thread Ibr
Hi, when I want to detect an image on tesseract 4.00alpha I run the command *tesseract image results -l lang --tessdata-dir ./tessdata --oem 1* . My question is, when I need to detect, say, 10 images, for example image1, image2, image3 etc., but I want to do that all in one command, and
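
One way to approach the question above is a shell loop around the same command. This is a minimal sketch, not an answer taken from the thread: `ocr_batch` and the `TESSERACT_BIN` override are hypothetical names, and the flags are the ones from the original command.

```shell
# Hypothetical helper: OCR every image given on the command line.
# TESSERACT_BIN is an assumed override so the loop can be dry-run.
ocr_batch() {
  local bin="${TESSERACT_BIN:-tesseract}"
  local lang="$1"; shift
  local img
  for img in "$@"; do
    # Output base is the image name without its extension;
    # tesseract appends .txt to it.
    "$bin" "$img" "${img%.*}" -l "$lang" --tessdata-dir ./tessdata --oem 1
  done
}
```

Usage would look like `ocr_batch ara image1.tif image2.tif image3.tif`, producing one text file per image.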

[tesseract-ocr] Re: how to use tesseract to detect table?

2017-06-05 Thread Ibr
Hi, I think for detecting an image which contains a table you should use the argument --psm # with the detection command. psm stands for Page Segmentation Mode; the default is 3. I think for a table use 6, so it will be --psm 6. Anyway, just type tesseract and it will be printed on the terminal
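
Since the best mode for a table is only a guess here, it can help to run the same image through a few modes and compare the outputs. This is a minimal sketch: `try_psm` and the `TESSERACT_BIN` override are hypothetical names, and the choice of which modes to try is left to the caller.

```shell
# Hypothetical helper: run one image through several page segmentation
# modes, writing one output per mode for side-by-side comparison.
# TESSERACT_BIN is an assumed override so the loop can be dry-run.
try_psm() {
  local bin="${TESSERACT_BIN:-tesseract}"
  local img="$1"; shift
  local psm
  for psm in "$@"; do
    # e.g. table.png with --psm 6 is written to table_psm6.txt
    "$bin" "$img" "${img%.*}_psm${psm}" --psm "$psm"
  done
}
```

Usage would look like `try_psm table.png 3 6`. On tesseract 4.x, `tesseract --help-psm` prints the list of available modes.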

[tesseract-ocr] Detection Using LSTM Files

2017-06-05 Thread Ibr
Hi, assume that I have created 20 LSTM files for English, for example, each LSTM file for a different font. When I run detection against an image with the command: *tesseract image results -l eng --tessdata-dir ./tessdata --oem 1* does the tesseract check the image against all LSTM

Re: [tesseract-ocr] Same Font with Multible Styles

2017-06-05 Thread Ibr
se names in your fontlist for training. > > If they are all listed as test, then it may not work. > > ShreeDevi > ____ > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Thu, Jun 1, 2017 at 6:36 PM, Ibr <

[tesseract-ocr] Same Font with Multible Styles

2017-06-01 Thread Ibr
Hi, If we assume that we have a set of font files, and all of these font files are for the same font, but each one of them is for a different style, for example if we have font "test" there will be a file for test regular, a file for test bold and a file for test italic, but all of these files

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
replied to it On Thursday, May 4, 2017 at 3:06:34 PM UTC+3, Ahmad Moawad wrote: > > check ur email > > On Thursday, May 4, 2017 at 1:51:04 PM UTC+2, Ibr wrote: >> >> ibr.h...@gmail.com >> >> On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad w

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
ibr.ham...@gmail.com On Thursday, May 4, 2017 at 2:47:12 PM UTC+3, Ahmad Moawad wrote: > > Ibr give me your email! > > On Thursday, May 4, 2017 at 1:06:22 PM UTC+2, Ibr wrote: >> >> while I was creating lstmf files so I can use them in recognition text >

Re: [tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
ursday, May 4, 2017 at 12:52:42 PM UTC+3, shree wrote: > Ibr, > > You are incorrect in your description of LSTM training. > > What you are doing will use the ara.traineddata provided in the repo, > there will be no change in output. > > Once lstmf files are created, you

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
, Ahmad Moawad wrote: > My Scenario is related to make training from images not from text base, I > want to finetune characters such as: > لمجرد not ملجرد > and soon on > > On Thursday, May 4, 2017 at 11:28:13 AM UTC+2, Ibr wrote: >> >> if you are referring to tess

[tesseract-ocr] Re: How to make training for Arabic in Tesseract 4.0

2017-05-04 Thread Ibr
if you are referring to tesseract 4.00alpha with leptonica 1.74.1, and if you compiled them in the correct way and got the binaries that you need for training lstmf files, then I recommend following the suggestions made by the tesseract devs, which is: once you create an .lstmf file for a

[tesseract-ocr] enhance Tesseract 4 accuracy

2017-04-24 Thread Ibr
Hi, I'm using tesseract 4.00alpha with leptonica 1.74.1 on Ubuntu 14. I use it to create LSTM files; when I want to run detection on any image I use both the trained data and the LSTM file, and the command is tesseract image.tif output -l ara --oem 1. The results are good except for

[tesseract-ocr] Re: Tesseract 4 Tesstrain

2017-04-19 Thread Ibr
answer is at the last thread at this <https://groups.google.com/forum/#!topic/tesseract-ocr/TbhPAzPzqWo> On Wednesday, April 19, 2017 at 2:37:29 PM UTC+3, Ibr wrote: > > Hi, > im trying to run this command on bash (on windows 10): *training/tesstrain.sh > --fonts_dir /usr/shar

[tesseract-ocr] Tesseract 4 Tesstrain

2017-04-19 Thread Ibr
Hi, im trying to run this command on bash (on windows 10): *training/tesstrain.sh --fonts_dir /usr/share/fonts --lang ara --langdata_dir ../langdata --tessdata_dir ./tessdata --output_dir ./output1* yet it keeps generating the error: *ERROR: text2image not found* although the text2image exists

Re: [tesseract-ocr] Re: Tesseract Installation

2017-04-19 Thread Ibr
indows via mobaxterm, which > makes it easier to use > > see http://mobaxterm.mobatek.net/download-home-edition.html > > > ShreeDevi > > भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Wed, Apr 19, 201

Re: [tesseract-ocr] Re: Tesseract Installation

2017-04-19 Thread Ibr
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com > > On Tue, Apr 11, 2017 at 6:53 PM, shree <shree...@gmail.com > > wrote: > >> >> On Tuesday, April 11, 2017 at 4:10:26 PM UTC+5:30, Ibr wrote: >>> >>> >>> Note: I'm using windows 1

[tesseract-ocr] Creating fonts list Tesserocr

2017-04-13 Thread Ibr
Hi, im using tesseract 3.05.00dev on windows CMD, i was able to create .tif files and .box file when i run the command for a single font, yet when i try to create fontlist an error is generated for every font, the error message is *Font Arial failed with 7909 hits = 17.73%* and there is an

[tesseract-ocr] Tesseract Installation

2017-04-11 Thread Ibr
Hi, I'm trying to install tesseract following the steps from this website. I ran the commands for step 5 and all worked fine except the command *sudo ldconfig*, which returned the error *sudo: unable to