[tesseract-ocr] Detection with whitelist

2021-02-16 Thread Kostas
Hi team

Great job with the library. I am trying to find a specific word in a text, and I 
use a whitelist to maximize true positives. For example, when I 
search for "accept", I define the whitelist "acept". However, words like ac4cept 
or accefpht are also identified. Can I work around that? Is there a way, 
e.g., to do a letter-by-letter analysis, or any other good idea? 

Thanks a lot! 
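
One possible workaround, sketched here under assumptions rather than taken from the thread: run the OCR without the whitelist, so that stray characters such as "4" stay visible, and then keep only tokens that match the target word exactly. The helper names exact_matches and ocr_tokens are hypothetical; ocr_tokens assumes the pytesseract and Pillow packages are installed.

```python
import re


def exact_matches(tokens, target):
    """Return tokens equal to target, ignoring case and surrounding punctuation."""
    pattern = re.compile(r"^\W*" + re.escape(target) + r"\W*$", re.IGNORECASE)
    return [t for t in tokens if pattern.match(t)]


def ocr_tokens(image_path, whitelist=None):
    """Return the words Tesseract recognizes in an image (hypothetical helper).

    Assumes pytesseract and Pillow are installed; they are imported lazily so
    that the filtering logic above can be used without them.
    """
    import pytesseract
    from PIL import Image

    # Only pass the whitelist option when one is requested.
    config = f"-c tessedit_char_whitelist={whitelist}" if whitelist else ""
    data = pytesseract.image_to_data(
        Image.open(image_path),
        config=config,
        output_type=pytesseract.Output.DICT,
    )
    return [word for word in data["text"] if word.strip()]
```

With this approach a token like ac4cept is rejected by the exact match instead of being squeezed into the whitelisted letters.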

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/50b37a35-e8dd-4784-9086-485725414ccfn%40googlegroups.com.


[tesseract-ocr] Re: Data in Excel sheet

2021-02-16 Thread Kostas
I just read the documentation; perhaps it goes like this: 

Table recognition

It is known that Tesseract has problems recognizing text/data from tables (see the 
issue tracker) without custom segmentation/layout analysis. You can try 
Sintun's proposal, or get ideas from "Text Extraction from a Table Image, 
using PyTesseract and OpenCV" and the code for Text-Extraction-Table-Image.
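
As a rough illustration of the custom layout step mentioned above (a sketch under assumptions, not the method from the linked article): word boxes, for example the "text", "left" and "top" fields returned by pytesseract.image_to_data, can be grouped into rows by vertical position and written out as CSV. The function names and the row_tolerance parameter are hypothetical, and each word is naively treated as one cell.

```python
import csv
import io


def boxes_to_rows(words, row_tolerance=10):
    """Group (text, left, top) word boxes into table rows.

    Words whose top coordinate lies within row_tolerance pixels of the first
    word of the current row are placed in that row; cells are then ordered
    from left to right.
    """
    rows = []
    for text, left, top in sorted(words, key=lambda w: w[2]):
        if rows and abs(rows[-1]["top"] - top) <= row_tolerance:
            rows[-1]["cells"].append((left, text))
        else:
            rows.append({"top": top, "cells": [(left, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


def rows_to_csv(rows):
    """Serialize a list of rows to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Real tables usually need more than this, such as merging multi-word cells and detecting column boundaries, which is where OpenCV line detection tends to come in.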


murtuz...@gmail.com wrote on Tuesday, 16 February 2021 at 13:16:34 UTC+1:

> +1
>
> On Wednesday, October 9, 2019 at 10:33:34 PM UTC+5:30 myquest wrote:
>
>> Hi Friends,
>>
>> Please advise me how to get the table data from image in csv format using 
>> tesseract?
>>
>> Inam
>>
>



[tesseract-ocr] Re: Data in Excel sheet

2021-02-16 Thread Murtuza Dahodwala
+1

On Wednesday, October 9, 2019 at 10:33:34 PM UTC+5:30 myquest wrote:

> Hi Friends,
>
> Please advise me how to get the table data from image in csv format using 
> tesseract?
>
> Inam
>



[tesseract-ocr] Re: Table Detection using Tesseract

2021-02-16 Thread Murtuza Dahodwala
You can do this easily with YOLOv4.

On Wednesday, February 19, 2020 at 8:05:17 PM UTC+5:30 mit wrote:

> Hi,
>
> Just wanted to know if there is any way to detect tables using 
> Tesseract (both bordered and borderless), e.g. whether it is possible to train 
> Tesseract to recognise tables.
>
> TIA
>



Re: [tesseract-ocr] Re: seven segment display - 4.0 traineddata

2021-02-16 Thread Shubham Trivedi

Hello, where can I find training data that contains only numbers and no 
letters or special characters?
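
Until a digits-only traineddata file turns up, one stopgap is to restrict recognition to digits at run time and to map look-alike letters back to digits afterwards. This is a sketch under assumptions: the helper names are hypothetical, the config string is meant to be passed to Tesseract (for example via pytesseract's config argument), and the mapping covers only the B/8, O/0 and S/5 confusions mentioned in the quoted messages below. Z is left out because its digit counterpart depends on the font.

```python
# Letters that a seven-segment display renders like digits (assumed mapping).
DIGIT_LOOKALIKES = str.maketrans({"B": "8", "O": "0", "S": "5"})


def digits_only_config(psm=7):
    """Build a Tesseract config string limiting output to digits and '.'.

    --psm 7 treats the image as a single text line.
    """
    return f"--psm {psm} -c tessedit_char_whitelist=0123456789."


def normalize_reading(text):
    """Map look-alike letters to digits and drop everything else."""
    cleaned = text.strip().upper().translate(DIGIT_LOOKALIKES)
    return "".join(ch for ch in cleaned if ch.isdigit() or ch == ".")
```

Whether the whitelist is honoured depends on the Tesseract version and engine in use, so the post-processing step is worth keeping either way.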
On Friday, 6 March 2020 at 10:07:51 UTC+5:30 Bhakti Shah wrote:

> Hey Dominik, Is ssd.traineddata working for you?
> Are you getting decimal point?
>
>
> On Saturday, April 20, 2019 at 6:59:36 PM UTC+5:30, Dominik Helleberg 
> wrote:
>
>> thx will give it a try!
>> Cheers
>> Dominik
>>
>> On Thu, 18 Apr 2019 at 14:52, Shree Devi Kumar <
>> shree...@gmail.com> wrote:
>>
>>> Hi Dominik,
>>>
>>> Please see https://github.com/Shreeshrii/tessdata_ssd/issues/1
>>>
>>> The repo has a few traineddata files that you can try.
>>>
>>> On Wed, Apr 17, 2019 at 1:24 AM Dominik Helleberg  
>>> wrote:
>>>
>> Hi,

 I want to read 7-segment LED displays like this. I'm also 
 investigating CNNs at the moment, as I'm not sure whether Tesseract is the right 
 choice here.

 Any opinions?
 Cheers

 Dominik


 [image: lcd3.png]


 On Saturday, April 13, 2019 at 11:05:15 PM UTC+2, shree wrote:
>
> I had deleted the files but can rerun the training again.
>
> It will be helpful if you can provide a sample text of the kind of 
> display you are trying to recognize - eg. is it just digits,  or also 
> includes letters - which ones, only uppercase??? Please provide at least 
> 100 lines of text.
>
> A sample image as well as any info on similar looking fonts will be 
> helpful.
>
>
>
>
> On Sat, Apr 13, 2019 at 11:07 PM Dominik Helleberg <
> dominik@gmail.com> wrote:
>
>> Hi, the download gives a 404 error :( 
>> Since I'm trying to detect 7-segment displays with Tesseract 4, I 
>> would really like to try it out.
>> Do you still have the download at hand somewhere?
>> Thanks!
>>
>> On Wednesday, March 29, 2017 at 5:40:32 PM UTC+2, shree wrote:
>>>
>>> Hi,
>>>
>>> I have built a 4.0 traineddata using some seven segment display 
>>> fonts. Trained mostly on numbers 0-9, capital letters A-Z, : etc.
>>>
>>> It is uploaded as a zip file at 
>>> https://github.com/Shreeshrii/tessdata4alpha/raw/master/ssd1.zip
>>>
>>> unzip to get ssd1.traineddata 
>>>
>>> I have not tested it much. Seemed to work with the sample images 
>>> provided in this email.
>>>
>>> Since B and 8, O and 0, and S, Z and 5 all look similar in this display, 
>>> there will be errors.
>>>
>>> If this were trained just for numbers, it might have better 
>>> accuracy.
>>>
>>> I am doing another run of training to see if there are improvements.
>>>
>>> Meanwhile, those who actually want this should give it a try and 
>>> provide feedback.
>>>
>>>
>>> ShreeDevi
>>> 
>>> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>>>
>>> On Fri, Mar 24, 2017 at 3:04 PM,  wrote:
>>>

 Hello, 
 I mainly work in electronics and am new to C#. I am currently 
 working on an image-processing project in C#, and in one part I have to 
 detect the text or digits in an image of a 7-segment display. Searching on 
 Google, I found Tesseract as a solution.

 As an experiment, I first tried converting images of normal text into a 
 text file. That works fine for some basic images, but it does not work 
 with 7-segment displays, so I learned that I need a trained data file 
 for 7-segment fonts.

 To train on 7-segment data, I followed the steps shown in the video at 
 https://www.youtube.com/watch?v=i_1-hGsXxy8.
 However, the output.txt file shown in that video is not generated in 
 my case. As a result, when I use the trained 7-segment data file, I get 
 garbage values in the text file. To check whether my trained file is 
 correct, I followed the procedure shown in the video, but it gives an 
 error like "output.txt file not found". Is this happening because of the 
 missing output.txt file, or am I missing something else? I have followed 
 all the steps shown in the video for training 7-segment data.

 I have also installed jTessBoxEditorFX.jar, serak trainer and 
 Tesseract-ocr v3.02. I am now stuck at the point where I don't know 
 where I am going wrong: is my procedure wrong, or is the software 
 installation faulty? After installing Tesseract, there is a red cross 
 mark against tesseract.

 Please, somebody help me figure this out. If possible, please provide 
 me a 7-segment trained data file and also the exact steps to train 7