Re: [tesseract-ocr] Original training data for eng.traineddata

2023-06-20 Thread Zdenko Podobny
With open-source data you will not be able to create (from scratch) traineddata of the same quality as the files Google provided. However, some projects have successfully fine-tuned Google's models, e.g. UB-Mannheim (https://madoc.bib.uni-mannheim.de/53748/). Zdenko On Wed, 21 Jun 2023 at 4:38, Duy Khanh

Re: [tesseract-ocr] What are Langdata repository given for retraining Tesseract

2023-06-20 Thread Duy Khanh
Hi! Do you have the answer yet? I am currently looking for it too :D On Friday, 16 April 2021 at 12:27:57 UTC+7, venkat...@gmail.com wrote: > Thank you, that was helpful. So is it the same training set used for > creating the default traineddata files available in the repo?

[tesseract-ocr] Original training data for eng.traineddata

2023-06-20 Thread Duy Khanh
Hi. Is the existing eng.training_text in langdata_lstm the full text corpus used for training eng.traineddata? Do we have a list of the fonts used for generating the images?

Re: [tesseract-ocr] Runic OCR with tesseract

2023-06-20 Thread Zdenko Podobny
https://github.com/tesseract-ocr/langdata and https://github.com/tesseract-ocr/langdata_lstm provide input data that could be useful for Tesseract training. I am not aware of any Runic traineddata released by Google or contributors, so you will need to create it yourself. Zdenko On Tue, 20 Jun 2023 at
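Before training Runic from the langdata inputs, one sanity check worth doing (a suggestion, not part of the official procedure) is confirming that every character in your training text appears in the unicharset you start from, such as the Runic.unicharset mentioned later in this listing. The Python below is a minimal sketch that assumes the langdata file layout: the first line holds the entry count, each following line begins with the character itself, and the space entry is written as `NULL`. It compares single code points only, which suits the Runic block but not multi-code-point entries.

```python
def parse_unicharset(contents: str) -> set[str]:
    """Return the characters listed in a Tesseract .unicharset file.

    Assumption: line 1 is the entry count and every following line starts
    with the character itself; the space entry is spelled "NULL".
    """
    chars = set()
    for line in contents.splitlines()[1:]:  # skip the entry count
        fields = line.split()
        if fields and fields[0] != "NULL":
            chars.add(fields[0])
    return chars


def uncovered(text: str, charset: set[str]) -> set[str]:
    """Characters of `text` (whitespace ignored) missing from `charset`."""
    return {ch for ch in text if not ch.isspace() and ch not in charset}
```

Usage, with a hypothetical training-text file name: `uncovered(Path("my_runic.training_text").read_text(encoding="utf-8"), parse_unicharset(Path("Runic.unicharset").read_text(encoding="utf-8")))`. An empty result means every rune in the text is already in the unicharset.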

Re: [tesseract-ocr] Unable to generate Hindi line images using text2image

2023-06-20 Thread Zdenko Podobny
Please follow the official training procedure [1] and read the official docs [2], or complain to the author of the tutorial you decided to follow. [1] https://github.com/tesseract-ocr/tesstrain [2] https://tesseract-ocr.github.io/tessdoc/ Zdenko On Tue, 20 Jun 2023 at 10:39, abhilash rao wrote:

Re: [tesseract-ocr] Building for iOS arm-64 produces x86_64 library

2023-06-20 Thread Zdenko Podobny
Please do not post only the last error: usually there is a problem earlier, and e.g. the configure output could indicate a lot of... Make sure you check the issue tracker, which already has some hints on what to check, e.g. https://github.com/tesseract-ocr/tesseract/issues/3980

Re: [tesseract-ocr] Unable to generate Hindi line images using text2image

2023-06-20 Thread Harsha Perera
I want to create a number plate reading application. Can you help? On Tue, 20 Jun 2023 at 14:09, abhilash rao wrote: > Hi guys, > I need help generating line images using Tesseract's text2image > for the Hindi language. > Details: > I am using the Python script available in this repository

Re: [tesseract-ocr] Building for iOS arm-64 produces x86_64 library

2023-06-20 Thread Maria Vilensky
We have now tried the same script on an M1-based machine, and the error is as follows: CXXLD libtesseract_native.la CXXLD libtesseract_ccutil.la CXXLD libtesseract_lstm.la CXXLD libtesseract.la CXXLD tesseract Undefined symbols for architecture arm64: "tesseract::IntSimdMatrix::intSimdMatrixNEON",

[tesseract-ocr] Re: iOS Simulator (macOS) arm64 compilation failed on linking with NEON error

2023-06-20 Thread Maria Vilensky
Hi, we have the same error. Did you end up finding a solution? On Wednesday, February 16, 2022 at 12:36:31 PM UTC+2 Oskar Gargas wrote: > Hello, > > I am trying to compile Tesseract for macOS on arm64. I use the following > commands: > > ``` > export LIBS="-lz -lpng -ljpeg -ltiff" > export >

[tesseract-ocr] Re: Runic OCR with tesseract

2023-06-20 Thread serhat acar
I am trying hard to create a training model. However, I get "unrecognised arguments" errors when generating the data with tesstrain.sh, and it does not recognise my font. Could you please help me? Thanks, Serhat On Tuesday, 20 June 2023 at 11:03:07 UTC+2, Meth Man wrote: > Hey all. I'm

[tesseract-ocr] Runic OCR with tesseract

2023-06-20 Thread Meth Man
Hey all. I'm new to Tesseract and I'm trying to recognise Futhark runes in an image via OCR. I can see there are training files for runes (Runic.unicharset), but is there also a pre-generated language file that I can use, or will I have to train it myself? I can't see anything related to

[tesseract-ocr] Unable to generate Hindi line images using text2image

2023-06-20 Thread abhilash rao
Hi guys, I need help generating line images using Tesseract's text2image for the Hindi language. Details: I am using the Python script available in this repository https://github.com/astutejoe/tesseract_tutorial.git, which basically reads a text file, splits it into lines and saves each as
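The tutorial script itself is not reproduced here. As a rough sketch of the approach described (split a text file into lines, one output per line), the Python below writes each line to its own `<prefix>_<n>.gt.txt` file, following the `*.gt.txt` ground-truth naming that tesstrain uses, and builds a `text2image` command that should render that line to a matching `<base>.tif` (text2image also writes a `<base>.box`) via `--outputbase`. The prefix `hin`, the font name and the fonts directory are placeholders; the Devanagari font must really be installed, and files are written as UTF-8 so the Hindi text survives. Real scripts usually add page-size and point-size options as well; check which flags your text2image build supports.

```python
from pathlib import Path


def write_line_files(text: str, out_dir: Path, prefix: str = "hin") -> list[Path]:
    """Write each non-empty line of `text` to its own <prefix>_<n>.gt.txt file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    paths = []
    for n, line in enumerate(lines):
        path = out_dir / f"{prefix}_{n:05d}.gt.txt"
        path.write_text(line + "\n", encoding="utf-8")  # UTF-8 for Devanagari
        paths.append(path)
    return paths


def text2image_command(gt_path: Path, font: str, fonts_dir: str) -> list[str]:
    """Build a text2image call rendering one line to <base>.tif / <base>.box.

    `font` and `fonts_dir` are placeholders for your installed Hindi font.
    """
    base = str(gt_path)[: -len(".gt.txt")]  # same base name as the .gt.txt file
    return [
        "text2image",
        f"--text={gt_path}",
        f"--outputbase={base}",
        f"--font={font}",
        f"--fonts_dir={fonts_dir}",
    ]
```

Each command list can then be passed to `subprocess.run(cmd, check=True)`, so a line that fails to render (for example because the font is missing) stops the run instead of silently producing no image.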

Re: [tesseract-ocr] Building for iOS arm-64 produces x86_64 library

2023-06-20 Thread Maria Vilensky
Thanks for your reply. After much more googling and many build attempts, I am not even sure this is supposed to work: compiling an arm64 library on an Intel-based Mac? Can anyone who has tried this confirm it or suggest how to deal with it? Thanks! On Sunday, June 18, 2023 at 5:02:10 PM
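To confirm which architecture a build actually produced (the complaint in this thread is an arm64 target yielding an x86_64 library), the usual tools on macOS are `lipo -info` or `file`. The Python below is a minimal sketch of the same check done by reading the Mach-O header directly; the constant values come from Apple's `<mach-o/loader.h>`, `<mach-o/fat.h>` and `<mach/machine.h>`. It only handles thin Mach-O files and 32-bit universal (fat) headers: a static `.a` archive must first be unpacked (e.g. with `ar x`) so each object file can be inspected, and it does not tell an iOS device slice from a simulator slice.

```python
import struct
import sys

# Magic numbers from <mach-o/loader.h> and <mach-o/fat.h>.
MH_MAGIC = 0xFEEDFACE      # thin 32-bit Mach-O
MH_MAGIC_64 = 0xFEEDFACF   # thin 64-bit Mach-O
FAT_MAGIC = 0xCAFEBABE     # universal binary; header is always big-endian

# CPU types from <mach/machine.h>.
CPU_ARCH_ABI64 = 0x01000000
CPU_NAMES = {
    7: "i386",
    7 | CPU_ARCH_ABI64: "x86_64",
    12: "arm",
    12 | CPU_ARCH_ABI64: "arm64",
}


def macho_archs(data: bytes) -> list[str]:
    """Return the architectures declared in a thin Mach-O or fat header."""
    if len(data) < 8:
        raise ValueError("too short to be a Mach-O file")
    (magic_be,) = struct.unpack_from(">I", data, 0)
    if magic_be == FAT_MAGIC:
        # fat_header is (magic, nfat_arch), followed by nfat_arch fat_arch
        # records of five big-endian uint32 fields; cputype comes first.
        (nfat,) = struct.unpack_from(">I", data, 4)
        archs = []
        for i in range(nfat):
            (cputype,) = struct.unpack_from(">I", data, 8 + 20 * i)
            archs.append(CPU_NAMES.get(cputype, hex(cputype)))
        return archs
    # Thin files use the target's byte order, so try both orders.
    for order in ("<", ">"):
        magic, cputype = struct.unpack_from(order + "II", data, 0)
        if magic in (MH_MAGIC, MH_MAGIC_64):
            return [CPU_NAMES.get(cputype, hex(cputype))]
    raise ValueError("not a thin Mach-O file or universal binary")


if __name__ == "__main__":
    # Usage: python macho_archs.py path/to/object.o path/to/libfoo.dylib ...
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, macho_archs(fh.read(4096)))
```

If an object file from an "arm64" build reports `x86_64`, the compiler was most likely not given the intended target, which points back to the configure output mentioned earlier in this thread.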