[tesseract-ocr] Tesseract OCR for analysing handwritten exam papers

2024-04-30 Thread Oscar Gledel
Hi, I've come here after quite a few attempts and tests with tesseract as part of a university study project in France. The aim of this project is to analyse exam papers written by students in order to facilitate marking. Our teacher wanted an open-source OCR tool, so we turned to Tesseract.

[tesseract-ocr] tesseract-ocr is not converting or extracting the text properly

2023-11-14 Thread Arul Britto Kumar Abraham
Hi, I am using tesseract-ocr in my Python code to convert a non-searchable PDF into a searchable PDF document, but it is not converting fully... I am using "poppler-23.08.0" to convert each PDF page to an image, and then the "pytesseract.image_to_pdf_or_hocr" method to convert that image to a PDF file

Re: [tesseract-ocr] Tesseract-ocr in quiet mode

2023-07-23 Thread astro
Hi, just found the solution. Here is a code snippet in case anyone is interested:

    Dim p As New ProcessStartInfo("command", args)
    p.WindowStyle = ProcessWindowStyle.Hidden
    p.CreateNoWindow = True
    Process.Start(p)

Cheers, Nor

On 7/23/2023 10:13 AM, astro wrote: Hi Zdenko, thanks for

Re: [tesseract-ocr] Tesseract-ocr in quiet mode

2023-07-23 Thread astro
Hi Zdenko, thanks for that reply. I wasn't sure if that was the case or not. Guess I just have to live with it. Cheers, Nor On 7/23/2023 10:01 AM, Zdenko Podobny wrote: It is not a Tesseract problem but a VB one. Proof of this can be found in pytesseract, which calls the tesseract executable without

Re: [tesseract-ocr] Tesseract-ocr in quiet mode

2023-07-23 Thread Zdenko Podobny
It is not a Tesseract problem but a VB one. Proof of this can be found in pytesseract, which calls the tesseract executable without console windows. Zdenko On Sun, 23 Jul 2023 at 15:55, nor s wrote: > Is there a way to have Tesseract run without producing a DOS window? I'm > incorporating a call to
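For readers calling the executable from Python rather than VB.NET, the same idea (start the process without a console window) can be sketched with the standard `subprocess` module. This is a minimal sketch, not necessarily how pytesseract does it internally; the helper names `hidden_window_kwargs` and `run_tesseract` are hypothetical, and it assumes a `tesseract` executable is on PATH.

```python
import subprocess
import sys


def hidden_window_kwargs():
    """Return extra subprocess.run() keyword arguments that suppress the
    console window on Windows; on other platforms return an empty dict."""
    if sys.platform != "win32":
        return {}
    # CREATE_NO_WINDOW is only defined in the Windows build of subprocess.
    return {"creationflags": subprocess.CREATE_NO_WINDOW}


def run_tesseract(image_path, extra_args=()):
    """Run the tesseract CLI on image_path and return the recognized text.

    Assumption of this sketch: `tesseract` is on PATH. The output base
    "stdout" makes tesseract print the text instead of writing a file.
    """
    cmd = ["tesseract", str(image_path), "stdout", *extra_args]
    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, **hidden_window_kwargs()
    )
    return result.stdout
```

Usage would look like `text = run_tesseract("date.png", ["--psm", "7"])`, where the file name and page segmentation mode are only illustrative.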

[tesseract-ocr] Tesseract-ocr in quiet mode

2023-07-23 Thread nor s
Is there a way to have Tesseract run without producing a DOS window? I'm incorporating a call to Tesseract-ocr in my VB.NET application to read some date info from an image. Each time I execute Tesseract I get a DOS window popping up. I'm on Windows 10 and Tesseract 5.0. Thanks, Nor -- You

Re: [tesseract-ocr] Tesseract OCR on PDF without converting into images

2022-08-12 Thread Merlijn B.W. Wajer
Hi Banti, On 11/08/2022 12:11, Banti Kumar wrote: Can I use tesseract on pdf without converting pages into images? I have some pdf pages with digital text and Images with text, I just want to apply ocr on images but not on the digital text regions so I can get better accuracy for searchable

Re: [tesseract-ocr] Tesseract OCR on PDF without converting into images

2022-08-12 Thread Zdenko Podobny
No. On Thu, 11 Aug 2022, 12:11 Banti Kumar, wrote: > Can I use tesseract on pdf without converting pages into images? > I have some pdf pages with digital text and Images with text, I just want > to apply ocr on images but not on the digital text regions so I can get > better accuracy for

[tesseract-ocr] Tesseract OCR on PDF without converting into images

2022-08-11 Thread Banti Kumar
Can I use tesseract on pdf without converting pages into images? I have some pdf pages with digital text and Images with text, I just want to apply ocr on images but not on the digital text regions so I can get better accuracy for searchable pdfs TIA -- You received this message because you

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-27 Thread Hervé
The decimal point is not a problem, I can divide by 100 or 10 and it works :) Could you share the whole code with me? Thanks On Monday, 27 June 2022 at 20:44:42 UTC+2, zdenop wrote: > not sure what are you doing, but try something like this: > > def autoinvert(binarized_img, tresh=0.5): > """Invert

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-27 Thread Zdenko Podobny
Not sure what you are doing, but try something like this:

    def autoinvert(binarized_img, tresh=0.5):
        """Invert binarized image if amount of black pixels is higher than tresh.
        """
        height, width = binarized_img.shape
        non_zero = cv2.countNonZero(binarized_img)
        white_rate =
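The snippet above is cut off in the archive, so the rest of the function is not known. The following is only a guess at the idea its docstring describes, written with NumPy instead of OpenCV; everything after the `white_rate` line is an assumption, not the original code.

```python
import numpy as np


def autoinvert(binarized_img, tresh=0.5):
    """Invert a binarized image if its share of black pixels exceeds tresh.

    Assumes an 8-bit array in which 0 is black and any non-zero value is white.
    """
    non_zero = np.count_nonzero(binarized_img)  # number of white pixels
    white_rate = non_zero / binarized_img.size
    black_rate = 1.0 - white_rate
    if black_rate > tresh:
        # Mostly black: probably light text on a dark background, so flip it
        # to the dark-text-on-light-background form Tesseract prefers.
        return 255 - binarized_img
    return binarized_img
```

With OpenCV, the binarized input would typically come from `cv2.threshold` and the inversion could equally be done with `cv2.bitwise_not`.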

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-26 Thread Zdenko Podobny
Check your tesseract version (tesseract -v). Here is mine: tesseract 5.1.0-70-g0df5 leptonica-1.83.0 (Jun 24 2022, 17:48:50) [MSC v.1929 LIB Release x64] libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.0.91) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.2 : libopenjp2 2.5.0 Found

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Hervé
Sorry, I am a real beginner. When I run: tesseract pH_treshr.png - I get: Empty page!! Empty page!! How did you obtain this image, and why can't I run tesseract on it like you did? I am on Buster with tesseract 5.1. Is there a way to discuss this, e.g. on Discord? Thanks for your patience and help. On Saturday

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Zdenko Podobny
Sorry - I mean Rescaling: Tesseract works best on images which have a DPI of at least 300 dpi, so it may be beneficial to resize images. For more information see the FAQ. "Willus Dotkom" made an interesting test for optimal image resolution, with a suggestion for the optimal height of capital letters in
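The 300 dpi guideline quoted above translates into a simple resize factor when the resolution of the source image is known. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the choice never to downscale is an assumption of this sketch rather than advice from the thread.

```python
def scale_factor_for_dpi(current_dpi, target_dpi=300):
    """Return the factor by which to enlarge an image so that it reaches
    target_dpi. Images already at or above the target are left unscaled."""
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def scaled_size(width, height, factor):
    """Return the (width, height) in pixels after applying the scale factor."""
    return round(width * factor), round(height * factor)
```

For a 72 dpi screenshot the factor is 300 / 72, roughly 4.17. The resulting size can be passed to whichever resize routine is in use, for example `cv2.resize` or Pillow's `Image.resize`.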

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Hervé
I am on tesseract 5. "Inverting images: While tesseract version 3.05 (and older) handles inverted images (dark background and light text) without problem, for 4.x versions use dark text on a light background." Isn't that the same as: (thresh, im_bw) = cv2.threshold(gray, 128, 255,

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Zdenko Podobny
Why did you not try more relevant hints like inverting and resizing? Zdenko On Sat, 25 Jun 2022 at 10:56, Hervé wrote: > I tried gray image, black and white, and I use > > custom_psm = r'--psm 7' > > didn't try others parameters > On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote: > >>

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Hervé
I tried a gray image and a black-and-white one, and I use custom_psm = r'--psm 7'. I didn't try other parameters. On Saturday, 25 June 2022 at 10:32:14 UTC+2, zdenop wrote: > > > On Sat, 25 Jun 2022 at 8:15, Hervé wrote: > >> Hi >> I just tried some, without real success >> >> Please be specific: what did

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Zdenko Podobny
On Sat, 25 Jun 2022 at 8:15, Hervé wrote: > Hi > I just tried some, without real success > > Please be specific: what did you try and what was the result? > could I learn digits from pictures ? maybe this font is not well recognized > Any training is useless if the failure is at the image

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-25 Thread Hervé
Hi, I just tried some, without real success. Could I train on digits from pictures? Maybe this font is not well recognized. Thanks On Friday, 24 June 2022 at 17:12:44 UTC+2, zdenop wrote: > Did try to implement suggestion from documentation? >

Re: [tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-24 Thread Zdenko Podobny
Did you try to implement the suggestions from the documentation? https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md Zdenko On Fri, 24 Jun 2022 at 16:59, Hervé wrote: > Hi, I need some help to make tesseract-OCR recognize digits : can't > achieve to make this work with > > >

[tesseract-ocr] Tesseract OCR LCD digits doesn't work

2022-06-24 Thread Hervé
Hi, I need some help to make tesseract-OCR recognize digits : can't achieve to make this work with https://img.super-h.fr/images/2022/06/24/9a03414616bc4c6bd6e4bdb78e9d6783.jpg here is my code : import cv2 import pytesseract pytesseract.pytesseract.tesseract_cmd ="C:\\Program

[tesseract-ocr] tesseract-ocr on other mac MacPorts necessary

2022-03-30 Thread polki paul
Hello, I created a shared library of tesseract-ocr on my Mac, and it works. But when I try my app on another Mac, I first need to install MacPorts and run: sudo port install autoconf \ automake \ libtool \

[tesseract-ocr] Tesseract-OCR on VM

2021-05-25 Thread amjad Baidas
Hi, I'm a new user of tesseract-OCR and I want to put together some specs for a project that I want to do for one of our clients. I just have a question: can tesseract-OCR be used on a virtual machine or not? And if yes, which VM specs would give the best performance?

Re: [tesseract-ocr] Tesseract ocr

2021-04-24 Thread Zdenko Podobny
Hi, pdf is a document format (like odt, doc, docx, rtf). tesseract processes images. You did not mention which programming language(s) you plan to use, but there are plenty of tools for pdf text extraction, e.g. textract (python) [1] If you have a "stupid pdf" (somebody just embedded into the pdf scanned

Re: [tesseract-ocr] Tesseract ocr

2021-04-24 Thread Mohammad Waqas Shoukat Ali
Hi Zdenko, My input is different pdf documents that contain things like salary slips and some other financial documents. We want to use tesseract feature to extract the name,email address,amounts type of fields from documents. On Sat, Apr 24, 2021 at 2:50 PM Zdenko Podobny wrote: > Please be

Re: [tesseract-ocr] Tesseract ocr

2021-04-24 Thread Zdenko Podobny
Please be more specific: provide an example of what your input is and what you want to achieve. Zdenko On Sat, 24 Apr 2021 at 7:58, Mohammad Waqas Shoukat Ali wrote: > hi team, > > i want to understand how i can teach my tesseract model for different > files format.

[tesseract-ocr] Tesseract ocr

2021-04-23 Thread Mohammad Waqas Shoukat Ali
Hi team, I want to understand how I can teach my tesseract model different file formats.

Re: [tesseract-ocr] tesseract-ocr for train persian language

2020-12-11 Thread Shree Devi Kumar
I don't think jTessBoxEditor supports RTL languages like Persian. You can try using tesstrain.sh On Fri, Dec 11, 2020 at 8:57 PM alireza m wrote: > hi i want to train persian language by b nazanin font but jTessBoxEditor > doesn't have b nazanin font how can i add new font???

[tesseract-ocr] tesseract-ocr for train persian language

2020-12-11 Thread alireza m
Hi, I want to train the Persian language with the B Nazanin font, but jTessBoxEditor doesn't have the B Nazanin font. How can I add a new font?

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-10-27 Thread Sorosh Shiwa
hello thanks a lot for information but how can i use it in flutter? please reply my question sorosh shiwa On Tue, Oct 27, 2020 at 2:36 PM write2...@gmail.com wrote: > not able to extract this. can anyone able to extract this? > > On Thursday, August 13, 2020 at 3:31:19 PM UTC+3 Mahmoud Mabrouk

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-10-27 Thread write2...@gmail.com
I am not able to extract this. Is anyone able to extract this? On Thursday, August 13, 2020 at 3:31:19 PM UTC+3 Mahmoud Mabrouk wrote: > for numbers i used this and works fine with AEN numbers > https://github.com/ahmed-tea/tessdata_Arabic_Numbers > > > On Thursday, 13 August 2020 13:41:12 UTC+2,

[tesseract-ocr] Tesseract OCR numbers in figures not found

2020-10-20 Thread MaSei
I want to extract numbers from an image. Usually the numbers are around some figure and sometimes within the figure. I'm using Tesseract for this task. Tesseract works quite well for documents with a lot of text but I have not really found the right parameters to get good results for this

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-08-19 Thread Anuradha B
Thanks Mahmoud... Do we just have to copy the ara_number.traineddata file from https://github.com/ahmed-tea/tessdata_Arabic_Numbers to the tessdata folder on the local system? I am using Google Colab

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-08-13 Thread Mahmoud Mabrouk
for numbers i used this and works fine with AEN numbers https://github.com/ahmed-tea/tessdata_Arabic_Numbers On Thursday, 13 August 2020 13:41:12 UTC+2, Anuradha B wrote: > > I am trying to extract the arabic dates and numbers from the national ID > card.I am using the following code in

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-14 Thread Shree Devi Kumar
@Eliyaz I do not know Arabic or any other RTL. I suggest you try running training with the latest code and tesstrain. You may have to experiment to get the best result. I will try to do a test run with the data you provided, does it include numbers and dates? On Tue, Jul 14, 2020, 13:18 Eliyaz L

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-14 Thread Eliyaz L
Hi, sorry to bother you, just a follow-up. I tried the latest tesseract; it works fine with Arabic text and numbers, and the only issue is with Arabic dates. So if the issue is still open, can I prepare a dataset and train a separate custom model for only numbers and dates? if possible then pls

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-13 Thread Eliyaz L
Thanks for the support, it saves a lot of time and effort. I tried the latest tesseract; it works fine with Arabic text and numbers, and the only issue is with Arabic dates. So if the issue is still open, can I prepare a dataset and train a separate custom model for only numbers and dates? if

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
If I recall correctly, ara_number.traineddata has been trained for legacy engine. You cannot use two traineddata files each using a different engine. Regarding training of Arabic numbers and punctuation, it is currently an open issue. If you use the latest code from tesstrain repo it should

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Eliyaz L
Hi Shree, I was using the version below. I guess you are right, it is a 2016 file. Let me test with the latest traineddata. https://tesseract-ocr.github.io/tessdoc/Data-Files https://github.com/tesseract-ocr/tessdata/raw/4.00/ara.traineddata Meanwhile, can you please help me with Arabic numbers? i tried

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
See https://github.com/tesseract-ocr/tesseract/issues/758 and other similar issues On Sun, Jul 12, 2020 at 6:52 PM Shree Devi Kumar wrote: > @Eliyaz What version of tesseract are you using? Which traineddata? > > >Always the letter "لا" is predicted as "ال" . > > I think this was fixed by Ray

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
@Eliyaz What version of tesseract are you using? Which traineddata? >Always the letter "لا" is predicted as "ال" . I think this was fixed by Ray Smith in 2017 and should be ok in the traineddata files in tessdata_fast and tessdata_best repos. On Sun, Jul 12, 2020 at 6:45 PM Rainer Verteidiger

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Rainer Verteidiger
Always the letter "لا" is predicted as "ال" . Not sure how much relevancy that bears in the context of training models, but لا is no letter! It's a ligature ("Arabic Ligature Lam with Alef") formed by combining ل ("Arabic Letter Lam") with ا ("Arabic Letter Alef") whereas ال is ا followed
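The distinction drawn above can be checked with Python's standard `unicodedata` module. This is a small sketch that assumes the two strings in the thread are stored as ordinary two-character sequences (which is how such text is usually encoded); the `describe` helper is hypothetical.

```python
import unicodedata


def describe(text):
    """List the code point and Unicode name of every character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


LAM_ALEF = "\u0644\u0627"  # Lam followed by Alef, rendered as the lam-alef ligature
ALEF_LAM = "\u0627\u0644"  # Alef followed by Lam, the definite article
LIGATURE = "\uFEFB"        # presentation form: a single code point for the ligature

print(describe(LAM_ALEF))
print(describe(ALEF_LAM))
print(describe(LIGATURE))

# Compatibility normalization maps the presentation form back to two letters.
print(unicodedata.normalize("NFKC", LIGATURE) == LAM_ALEF)
```

The two sequences contain the same two letters in opposite order, so a model that outputs one for the other is swapping characters, not confusing two similar single glyphs.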

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Eliyaz L
Always the letter "لا" is predicted as "ال" . My training data here My prediction document will be in Traditional Arabic font here . Below shell command

Re: [tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Shree Devi Kumar
What character are you trying to add? Please share the training data to try and replicate the issue. On Sun, Jul 12, 2020, 15:35 Eliyaz L wrote: > Hi, > > > My use case is on Arabic document, the pre retrained ara.traineddata are > good but not perfect. so i wish to fine tune ara.traineddata,

[tesseract-ocr] Tesseract-OCR Training Arabic text & numbers

2020-07-12 Thread Eliyaz L
Hi, My use case is Arabic documents. The pre-trained ara.traineddata is good but not perfect, so I wish to fine-tune ara.traineddata and, if the results are not satisfactory, train my own custom data. Please advise me on the following: 1. for my use case in Arabic text,

[tesseract-ocr] Tesseract ocr acts weird while scaling up image size. How to know which scale factor is best for some particular types of image?

2020-06-19 Thread Navpreet Devpuri
How can we figure out which scale factor is best for some particular types of image without running OCR at every scale factor? Explained in detail in a Stack Overflow question

Re: [tesseract-ocr] Tesseract OCR Failing to Read Cleaned Numbers. Suggestions Please?

2020-04-30 Thread tristan gordon
I now know that the resolution and headers were the issue. For Tesseract OCR with PHP, the following should help anyone looking for a solution in future: 1. Create your Imagick instance, i.e. $image = new Imagick('image.jpg'); 2. Then set the resolution using two lines, first:

Re: [tesseract-ocr] Tesseract OCR Failing to Read Cleaned Numbers. Suggestions Please?

2020-04-30 Thread tristan gordon
Thank you. Now to look at imagick to set the resolution! On Thursday, 30 April 2020 10:36:56 UTC+1, shree wrote: > > Looks like the image resolution is not set correctly. You can specify dpi > while processing. > > ubuntu@tesseract-ocr:~/TEST$ tesseract 82.png - --dpi 300 > 82 >

Re: [tesseract-ocr] Tesseract OCR Failing to Read Cleaned Numbers. Suggestions Please?

2020-04-30 Thread Shree Devi Kumar
Looks like the image resolution is not set correctly. You can specify dpi while processing. ubuntu@tesseract-ocr:~/TEST$ tesseract 82.png - --dpi 300 82 ubuntu@tesseract-ocr:~/TEST$ tesseract 81.png - --dpi 300 81 On Thu, Apr 30, 2020 at 2:57 PM tristan gordon wrote: > Hello all, > > Could
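When tesseract is driven from a script, the `--dpi` option shown above can be added to the argument list in the same way. A minimal sketch follows, assuming a `tesseract` executable on PATH; the helper names are hypothetical, and `-` as the output base sends the text to standard output as in the commands quoted above.

```python
import subprocess


def tesseract_command(image_path, dpi=300, output_base="-"):
    """Build the argument list for a tesseract call with an explicit DPI."""
    return ["tesseract", str(image_path), output_base, "--dpi", str(dpi)]


def ocr_with_dpi(image_path, dpi=300):
    """Run tesseract with the given DPI and return the recognized text.

    Assumption of this sketch: tesseract is installed and on PATH.
    """
    result = subprocess.run(
        tesseract_command(image_path, dpi),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

Passing the resolution on the command line avoids having to rewrite the image metadata, which is the alternative route discussed elsewhere in this thread.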

[tesseract-ocr] Tesseract OCR Failing to Read Cleaned Numbers. Suggestions Please?

2020-04-30 Thread tristan gordon
Hello all, Could you help? Attached are two images containing two numbers, 81 and 82, which I am attempting to get Tesseract OCR to read. Each time Tesseract OCR is returning empty page and producing an empty text.txt document. The error is displaying as follows: # tesseract 82.png out

[tesseract-ocr] Tesseract OCR table data not printing correctly

2020-03-01 Thread Srikanth Vijayakumar
Hello, In the pdf file , it contains table which contains serial numbers (1-50) and relevant to that dollar values (Eg. $10655.9) are present. When I try to extract, it is printing the below output [| oRoUP | RAYE [ | | 1 | $106559| | | 2 | seoatslodPomeroySweetsutes | | [3 |

Re: [tesseract-ocr] tesseract ocr to pdf from .tif file send from fax machine

2020-02-06 Thread Zdenko Podobny
Try to use the latest version. Zdenko On Thu, 6 Feb 2020 at 20:09, George Varghese wrote: > tesseract installed on Windows 2012 R2 server > > tesseract version v5.0.0-alpha.20191030 with Leptonica > > command line tesseract a1.tif a -l eng -psm 4 --oem 1 -pdf - > > does create a pdf with

[tesseract-ocr] tesseract ocr to pdf from .tif file send from fax machine

2020-02-06 Thread George Varghese
tesseract installed on Windows 2012 R2 server tesseract version v5.0.0-alpha.20191030 with Leptonica command line tesseract a1.tif a -l eng -psm 4 --oem 1 -pdf - does create a pdf with white font and a black background. Please let me know which option I should pass to get a white background

[tesseract-ocr] tesseract-ocr 4.1.1 release

2019-12-26 Thread Zdenko Podobny
Hello all, Stable version 4.1.1 of the tesseract-ocr engine was released today [1]. This is a bugfix release for the 4.x branch. With this release the cppan build system was marked as obsolete. Its successor, software-network (aka sw), was implemented instead. Autotools and cmake are supported as main build

Re: [tesseract-ocr] Tesseract-OCR giving different results for same image on different systems.

2019-12-19 Thread adesh gautam
Both AVX and AVX2 are enabled on both the systems. I am not using specific tessdata_fast or tessdata_best. I am using the default eng.traineddata that comes with windows installer. On Tuesday, December 17, 2019 at 9:36:16 PM UTC+5:30, shree wrote: > > >There is the same version of tesseract on

Re: [tesseract-ocr] Tesseract-OCR giving different results for same image on different systems.

2019-12-17 Thread Shree Devi Kumar
>There is the same version of tesseract on the two systems as i mentioned before. OK. But is there any difference in specs of the 2 systems in terms of AVX etc. Hence tesseract -v would be useful. Also, just check the results via CLI. I get different results when using eng.traineddata from

Re: [tesseract-ocr] Tesseract-OCR giving different results for same image on different systems.

2019-12-16 Thread adesh gautam
The file size of eng.traineddata is same - 3.92MB. On Tuesday, December 17, 2019 at 12:47:28 PM UTC+5:30, shree wrote: > > Please check file sizes for eng.traineddata - they maybe different > versions even though they are called the same. > > On Mon, Dec 16, 2019 at 9:06 PM adesh gautam >

Re: [tesseract-ocr] Tesseract-OCR giving different results for same image on different systems.

2019-12-16 Thread Shree Devi Kumar
Run tesseract --version on the different systems. Are the traineddata files being used on the different systems the same? Share an image and the different output received in each case. On Mon, Dec 16, 2019, 17:58 adesh gautam wrote: > Hi, > > I am using tesseract-ocr on my images, and i am

[tesseract-ocr] Tesseract-OCR giving different results for same image on different systems.

2019-12-16 Thread adesh gautam
Hi, I am using tesseract-ocr on my images, and i am getting different results by running tesseract on different systems for same image. I am using *pytesseract *library. I am setting the following parameters: *--psm 6 -c classify_enable_learning=0 -c classify_enable_adaptive_matcher=0*

Re: [tesseract-ocr] Tesseract ocr failed to recognize number from number plate images

2019-10-22 Thread Sangharsh Kamble
Yes I apply various image filtering process on image and also go through the OpenAlpr project site. But I have to create my own alpr project so I need this code. On Tue 22 Oct, 2019, 9:27 PM Timothy Snyder, wrote: > Yes you're going to have to do a significant amount of image processing to >

Re: [tesseract-ocr] Tesseract ocr failed to recognize number from number plate images

2019-10-22 Thread Timothy Snyder
Yes you're going to have to do a significant amount of image processing to transform those license plates into straight black text on a white background. Have you tried out the OpenALPR project? On Tue, Oct 22, 2019 at 4:00 AM Sangharsh Kamble wrote: > [image: 2.jpeg] > > [image: 4.jpeg] > >

Re: [tesseract-ocr] Tesseract ocr failed to recognize number from number plate images

2019-10-22 Thread Zdenko Podobny
Unless you provide clear images (black letters on white background) (maybe with straight text, but this could be handled by leptonica) you cannot expect tesseract to give you correct results. Zdenko On Tue, 22 Oct 2019 at 10:00, Sangharsh Kamble wrote: > [image: 2.jpeg] > > [image:

[tesseract-ocr] Tesseract ocr failed to recognize number from number plate images

2019-10-22 Thread Sangharsh Kamble
[image: 2.jpeg] [image: 4.jpeg] [image: ALARM_ANPR_2019_09_26_00_09_48_0878_crop.jpg] [image: ALARM_ANPR_2019_09_26_00_48_02_0976_crop.jpg] import cv2 import numpy as np from PIL import Image import pytesseract from scipy import ndimage from scipy.ndimage import rotate #from matplotlib

Re: [tesseract-ocr] tesseract-ocr pdf input to searchable pdf (ocr-ed) and djvu input to searchable pdf

2019-10-21 Thread Zdenko Podobny
Yes, tesseract can create searchable pdf (I am not sure how you define whether the process is reliable...). Tesseract input must be an image (or a list of images in a text file), so you cannot directly convert pdf or djvu files to searchable pdf. But there are tools like OCRmyPDF[1] that can help you with
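The "list of images in a text file" input mentioned above can be prepared with a few lines of Python. This is a minimal sketch under the assumption that the pages have already been rendered to image files by some other tool; the helper names and file names are hypothetical.

```python
from pathlib import Path


def write_image_list(image_paths, list_file):
    """Write one image path per line to list_file and return its Path."""
    list_path = Path(list_file)
    list_path.write_text(
        "".join(f"{path}\n" for path in image_paths), encoding="utf-8"
    )
    return list_path


def searchable_pdf_command(list_file, output_base):
    """Build the tesseract call that turns the listed images into one
    searchable PDF named <output_base>.pdf."""
    return ["tesseract", str(list_file), str(output_base), "pdf"]
```

For example, after `write_image_list(["page-1.png", "page-2.png"], "pages.txt")`, the command returned by `searchable_pdf_command("pages.txt", "scan")` could be passed to `subprocess.run` to produce `scan.pdf`.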

[tesseract-ocr] tesseract-ocr pdf input to searchable pdf (ocr-ed) and djvu input to searchable pdf

2019-10-21 Thread tuxcrafter
Hello everybody, Our Xerox machine, which has the option to do standalone scans to searchable pdf, died again. We only have Linux workstations. I am now looking at buying a cheaper scanning solution and converting pdf to searchable pdf on the workstations. Can tesseract-ocr be used to convert pdf's

Re: [tesseract-ocr] Tesseract OCR 4 paper

2019-09-11 Thread Jennil Thiyam
okay thanks, I will go through it On Wed, Sep 11, 2019 at 6:48 PM Shree Devi Kumar wrote: > did you see the following links > > >- > >NeuralNetsInTesseract4.00 > >- > >VGSLSpecs

Re: [tesseract-ocr] Tesseract OCR 4 paper

2019-09-11 Thread Shree Devi Kumar
did you see the following links - NeuralNetsInTesseract4.00 - VGSLSpecs - VGSLSpecs info from Tensorflow

Re: [tesseract-ocr] Tesseract OCR 4 paper

2019-09-11 Thread Jennil Thiyam
Shree, do you have any other links that talk about how LSTM works in tesseract OCR? On Wed, Sep 11, 2019 at 6:33 PM Shree Devi Kumar wrote: > https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#documentation > > > > > On Wed, Sep 11, 2019 at 6:29 PM Jennil Thiyam > wrote: > >> Does

Re: [tesseract-ocr] Tesseract OCR 4 paper

2019-09-11 Thread Shree Devi Kumar
https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#documentation On Wed, Sep 11, 2019 at 6:29 PM Jennil Thiyam wrote: > Does anyone has the link that describes the working of Tessercat 4, I > found paper that talks about the processing steps of tesseract 3, but > failed to get any

[tesseract-ocr] Tesseract OCR 4 paper

2019-09-11 Thread Jennil Thiyam
Does anyone have a link that describes how Tesseract 4 works? I found a paper that talks about the processing steps of tesseract 3, but failed to find any research paper that describes tesseract 4. Please let me know

Re: [tesseract-ocr] Tesseract-ocr 3.05 vb6 integration

2019-05-03 Thread Giuseppe Romano
Thanks, I will try to use this in my vb6 project. Best regards, Giuseppe On Fri, 3 May 2019 at 17:04, Zdenko Podobny wrote: > it is just older version, but basics are the same: > https://github.com/tesseract-ocr/tesseract/blob/3.05/api/baseapi.h > > Or C-API: >

Re: [tesseract-ocr] Tesseract-ocr 3.05 vb6 integration

2019-05-03 Thread Zdenko Podobny
It is just an older version, but the basics are the same: https://github.com/tesseract-ocr/tesseract/blob/3.05/api/baseapi.h Or C-API: https://github.com/tesseract-ocr/tesseract/blob/3.05/api/capi.h Zdenko On Fri, 3 May 2019 at 16:57, Giuseppe Romano wrote: > Thanks for the answer but my tesseract

Re: [tesseract-ocr] Tesseract-ocr 3.05 vb6 integration

2019-05-03 Thread Giuseppe Romano
Thanks for the answer, but my tesseract version is 3.05 and tesseract40.dll unfortunately does not exist in the installation folder. Tesseract.exe uses libtesseract-3.dll and I don't have the API entry points; can you help me with the API declarations? Thanks On Fri, 3 May 2019 at 16:09, Zdenko

Re: [tesseract-ocr] Tesseract-ocr 3.05 vb6 integration

2019-05-03 Thread Zdenko Podobny
What do you mean by "i haven't find the way to use it by dll"? "is this possible?" Yes, it is. tesseract.exe uses tesseract40.dll, so you can use it like any other library. Zdenko On Fri, 3 May 2019 at 9:16, Giuseppe Romano wrote: > Hi, > my name is Giuseppe, i have this problem. I use

[tesseract-ocr] Tesseract-ocr 3.05 vb6 integration

2019-05-03 Thread Giuseppe Romano
Hi, my name is Giuseppe, and I have this problem. I use tesseract-ocr 3.05 with shell integration, but it is really slow. I haven't found a way to use it via the dll; is this possible? The project unfortunately must be developed on the vb6 platform. Thanks

[tesseract-ocr] tesseract ocr

2019-04-25 Thread widi yanto
[image: images1.png] I want to change this image so that it becomes the letter I will print. I ask: 1. How do I analyze the components of the image in the picture? 2. What software should I use for simulation? 3. What method do you use? Thank you for the help.

[tesseract-ocr] tesseract ocr for wide images in landscape mode

2018-12-28 Thread Chirabrata Bhaumik
Hi, I have a picture in landscape mode, 1660 pixels (W) by 470 (H). While running tesseract 4.0 with LSTM on it, tesseract is not performing OCR on the whole document. From the output it looks like after about 80 or 120 columns, tesseract is not doing the OCR. Any idea how to overcome

Re: [tesseract-ocr] tesseract ocr wont read the letters in the attached chart

2018-12-18 Thread JJ
Thanks a lot. Actually tesseract 3.0 does recognize it. Unfortunately the C# API was based on tesseract 3.1, so I had to write a P/Invoke wrapper for tesseract 3.0, and the problem was solved. On Thursday, December 13, 2018 at 12:49:44 PM UTC-8, zdenop wrote: > > I am afraid you need to first implement some

[tesseract-ocr] tesseract ocr wont read the letters in the attached chart

2018-12-10 Thread JJ
Hi I have been trying to get tesseract ocr api and command line to recognize and locate the letters in the attached pic with no success. I have modified the image, added blur and/or sharpened with no luck. To me it doesn't seem it should be that challenging Anyone has any idea why? I am

Re: [tesseract-ocr] tesseract-ocr

2018-06-19 Thread Navaneetha Bitla
Using serak trainer I have trained the 1300 handwritten fonts. It doesn't show the accuracy level or the iterations. Is that important? I actually don't know, which is why I'm asking. Thank you for the immediate reply. On Tue, Jun 19, 2018 at 4:00 PM, Shree Devi Kumar wrote: > Which version of

Re: [tesseract-ocr] tesseract-ocr

2018-06-19 Thread Shree Devi Kumar
Which version of tesseract? How did you train the fonts? What was the accuracy level for training? How many iterations? On Tue, Jun 19, 2018 at 3:00 PM Navaneetha Bitla wrote: > Hi, this is Navaneetha > > i'm working in hand written character recognition project. > > I have trained 1300 different

[tesseract-ocr] tesseract-ocr

2018-06-19 Thread Navaneetha Bitla
Hi, this is Navaneetha i'm working in hand written character recognition project. I have trained 1300 different hand written fonts of english and moved the files into tessdata directory. tested tesseract using the below commands: $convert -density 300 input.png -depth 8 -strip -background

[tesseract-ocr] Tesseract OCR quality issues with python

2018-06-12 Thread Vidur Malhotra
Hi, I tried running tesseract OCR on the same image using below 2 approach: 1. Command line (tesseract version 3.05.01) tesseract image.jpg out.txt 2. using pytesseract in python (pytesseract version 0.2.2) import PIL from PIL import Image import pytesseract text =

[tesseract-ocr] tesseract ocr How to shield the influence of decimal point on recognition

2017-11-17 Thread 强华东
when i ocr this pic ,the out is “20174108090长期 ” my traineddata only was trained by “1234567890” and “长期 ” how to refuse to

[tesseract-ocr] Tesseract OCR shows random alphabets only

2017-09-21 Thread adeel . noor
I'm implementing Tesseract OCR in my iOS application using real time camera update method - (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection I am passing the input image

Re: [tesseract-ocr] Tesseract OCR 4.0.0 Alpha how to train a new font

2017-08-29 Thread ShreeDevi Kumar
Try first with best/Latin.traineddata that should handle text with diacritics --- >>Pango suggested font Gandhari Unicode. Use "Gandhari Unicode" within quotes as Font name >>ERROR: Could not find training text file /usr/local/share/tessdata// eng/eng.training_text give script_dir

[tesseract-ocr] Tesseract OCR 4.0.0 Alpha how to train a new font

2017-08-29 Thread Anand Akella
Hi, I'm new to tesseract and have a PDF file with diacritical marks. I tried to run tesseract 4.0.0 with language eng, and it does not recognize the text with diacritical marks. I found a font that can render diacritical marks. Gandhari Unicode 5.1

[tesseract-ocr] tesseract-ocr vs SwiftOCR

2017-08-16 Thread Kiran Patil
Hi, Has anybody tried comparing SwiftOCR with tesseract-ocr (LSTM engine)? https://github.com/garnele007/SwiftOCR Please post your findings here. Regards, Kiran.

[tesseract-ocr] tesseract-ocr-ell, tesseract-ocr-grc: improvements

2017-07-31 Thread dimitrDimitr
At http://www.elspell.gr/myspell there is the OpenOffice Greek Dictionary v0.9 with 800,000 Greek words, encoded in windows-1253, under the MPL 1.1/GPL 2.0/LGPL 2.1 licenses. Polytonic characters haven't been used since 1982 and we don't have

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-07-17 Thread akhil katpally
Thanks Shree for pointing that out. master is for tesseract 4.0. On Friday, July 7, 2017 at 9:36:37 AM UTC-7, shree wrote: > > for 3.05 don't you need to check out the 3.05 branch? > master is for 4.0 development. > > ShreeDevi > > भजन -

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-07-07 Thread ShreeDevi Kumar
For 3.05, don't you need to check out the 3.05 branch? master is for 4.0 development. ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Jul 7, 2017 at 9:22 PM, akhil katpally wrote: >

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-07-07 Thread akhil katpally
Steven .. Here is the list of commands to install tesseract 3.05 on Redhat 6 ... Hope this should work for Redhat 5 ... if not please try to downgrade the tesseract and try .. sudo yum update sudo yum install wget unzip sudo yum install gcc gcc-c++ make sudo

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-06-27 Thread Steven Heydendahl
Is tesseract 3.05 available for redhat 5? Can we just rpm it or do we have to add a repository? On Tuesday, June 27, 2017 at 2:07:59 PM UTC-6, zdenop wrote: > > 2.04 is too old. > Please ask install 3.05 + language data (at least eng and osd) > > Zdenko > > On Tue, Jun 27, 2017 at 9:58 PM,

Re: [tesseract-ocr] Tesseract-ocr on Redhat 5

2017-06-27 Thread Zdenko Podobný
2.04 is too old. Please ask them to install 3.05 + language data (at least eng and osd). Zdenko On Tue, Jun 27, 2017 at 9:58 PM, Steven Heydendahl wrote: > Hi all, > > Novice here. I had made a request at my company to install tesseract-ocr > on our redhat 5 OS. > > They ended up

[tesseract-ocr] Tesseract-ocr on Redhat 5

2017-06-27 Thread Steven Heydendahl
Hi all, Novice here. I had made a request at my company to install tesseract-ocr on our redhat 5 OS. They ended up installing the following: rpm -Vp "tesseract-2.04-1.el5.rf.x86_64.rpm" which is apparently an older version of tesseract. Now, that completed successfully however, every

[tesseract-ocr] tesseract ocr + swt algorithm

2016-11-27 Thread 吉亦玮
Hello everyone, I am currently developing an application to detect text in natural scenes. After locating the text in the scene (the images are taken by a 3D camera), I use tesseract OCR to extract the text from the image. Based on this idea, could I use the

[tesseract-ocr] Tesseract ocr recognition rate

2016-09-02 Thread Suin You
I have two questions. I'm only using tess4j for Java, and it may not have a function that returns the accuracy rate of a result. Can I find out the accuracy rate of a tesseract result? And how do I distinguish graphic text from a 2D graphic image such as a logo or emblem? Sometimes tesseract recognized the 2D
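One way to approximate a per-result confidence figure, sketched here in Python and not taken from the thread: Tesseract builds that provide the `tsv` output config write one row per layout element, with a `conf` column that is -1 for non-word rows. The helper name `mean_word_confidence` is hypothetical, and whether tess4j exposes an equivalent value is not stated in the post.

```python
import csv
import io
from typing import Optional


def mean_word_confidence(tsv_text: str) -> Optional[float]:
    """Average the `conf` column over word rows of Tesseract TSV output.

    Rows with conf < 0 or with empty text are layout rows (page, block,
    line) and are skipped. Returns None when no word rows are present.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    confidences = []
    for row in reader:
        text = (row.get("text") or "").strip()
        raw_conf = row.get("conf")
        if not text or raw_conf is None:
            continue
        try:
            conf = float(raw_conf)
        except ValueError:
            continue  # malformed row; ignore it
        if conf >= 0:
            confidences.append(conf)
    if not confidences:
        return None
    return sum(confidences) / len(confidences)
```

The TSV text would come from a run such as `tesseract image.png out tsv`, which writes `out.tsv`. The value is the engine's own confidence estimate, not a measured accuracy rate; measuring accuracy needs ground-truth text to compare against.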

Re: [tesseract-ocr] Tesseract-ocr duration time calculation

2016-07-19 Thread Zdenko Podobný
Have a look at Text Fairy app[1] video 0:22. Are you looking for something like that? If yes, source code is available at github[2]. rmtheis wrote short blog post about it too[3]... [1] https://play.google.com/store/apps/details?id=com.renard.ocr [2] https://github.com/renard314/textfairy [3]

[tesseract-ocr] Tesseract-ocr duration time calculation

2016-07-10 Thread beeingtime
Hi, is there any way to calculate how long OCR of an image will take before the recognition itself starts? I would like to make some kind of status bar showing how long it will take to process the image. I tried to find a dependence on image size or volume but there is no any

[tesseract-ocr] tesseract ocr for ocv

2016-07-08 Thread chrisobvofc
Hello, I want to use tesseract to check whether the correct freshness date is printed on a product. After training tesseract I still have about 10% of dates read wrong. Mostly 5 and 6 are mixed up. However, since I know what the date should be I can set a whitelist with the according
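The whitelist idea in this post can be sketched as follows, assuming a Tesseract engine that honours the `tessedit_char_whitelist` config variable and a known expected date string. The helper names (`whitelist_for`, `tesseract_command`, `matches_expected`) and the date format are hypothetical, chosen only for illustration.

```python
from typing import List


def whitelist_for(expected: str) -> str:
    """Return the distinct non-space characters of the expected text, sorted."""
    return "".join(sorted({ch for ch in expected if not ch.isspace()}))


def tesseract_command(image_path: str, output_base: str, expected: str) -> List[str]:
    """Build a tesseract CLI call restricted to the expected characters."""
    return [
        "tesseract",
        image_path,
        output_base,
        "-c",
        f"tessedit_char_whitelist={whitelist_for(expected)}",
    ]


def matches_expected(ocr_text: str, expected: str) -> bool:
    """Compare OCR output with the expected date, ignoring all whitespace."""
    def squash(s: str) -> str:
        return "".join(s.split())

    return squash(ocr_text) == squash(expected)
```

The command list can be passed to `subprocess.run`. Note the limitation: a whitelist only removes characters that never appear in the expected date, so a 5 and 6 mix-up remains possible when both digits occur in it. The final comparison in `matches_expected` is what performs the verification.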
