[tesseract-ocr] Re: use jTesseractEdit training but box edit is empty

2017-06-07 Thread Shaw Ryan
Yes, the box file is empty.
I will try preprocessing the image first.
I appreciate what you have done for me. Thank you.
On Thursday, June 8, 2017 at 2:31:41 AM UTC+8, Quan Nguyen wrote:
>
> I don't see any box file, but from the appearance of the image, Tesseract 
> probably had problems recognizing it, therefore, producing an empty box 
> file. You'll need to perform some image processing first to make the image 
> more amenable to Tesseract.
>
> On Tuesday, June 6, 2017 at 9:44:58 PM UTC-5, Shaw Ryan wrote:
>>
>> Thank you 
>> I have uploaded box and tiff
>> Please help
>> On Monday, June 5, 2017 at 6:27:14 PM UTC+8, Shaw Ryan wrote:
>>>
>>>
>>> 
>>> How can I edit the data?
>>>
>>
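The preprocessing Quan suggests does not have to be elaborate. Below is a minimal sketch of one common recipe (grayscale, upscale, global threshold); it assumes Pillow and NumPy are installed, and the threshold value of 128 is illustrative, not tuned:

```python
import numpy as np
from PIL import Image

def preprocess(img: Image.Image, scale: int = 2, thresh: int = 128) -> Image.Image:
    """Grayscale, upscale, and binarize an image before handing it to Tesseract."""
    img = img.convert("L")                                    # grayscale
    img = img.resize((img.width * scale, img.height * scale),
                     Image.LANCZOS)                           # upscale small text
    arr = np.asarray(img)
    binary = np.where(arr > thresh, 255, 0).astype(np.uint8)  # global threshold
    return Image.fromarray(binary)

# Quick demo on a synthetic light-gray tile.
demo = Image.fromarray(np.full((10, 10), 200, dtype=np.uint8))
out = preprocess(demo)
print(out.size)  # doubled in both dimensions
```

A cleaner, larger, higher-contrast image like this typically yields a non-empty box file; adaptive thresholding or deskewing may help further on difficult scans.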

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/8434a139-94a9-4949-ad9f-5f897ea18f65%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[tesseract-ocr] Re: What is the "Confidence" value returned by Tesseract and how is it calculated?

2017-06-07 Thread akhil katpally
1 -> I don't know; I am looking for that as well. Hopefully this helps: whenever 
Tesseract tries to recognize a particular character, it considers several 
candidate choices for that letter, takes the one with the maximum confidence 
value, and returns it to us. You can also get the alternative choices and 
their confidences with the tesseract::ChoiceIterator class.
2 -> What do you mean by changing the accuracy levels of Tesseract?
3 -> Yes, you can get the confidence at the character level; please see the 
Tesseract API examples: 
https://github.com/tesseract-ocr/tesseract/wiki/APIExample#example-of-iterator-over-the-classifier-choices-for-a-single-symbol
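For reference, the wiki example linked above boils down to iterating at symbol level and opening a ChoiceIterator on each symbol. A condensed C++ sketch (the input file name is hypothetical; error handling omitted):

```cpp
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <cstdio>

int main() {
    tesseract::TessBaseAPI api;
    api.Init(nullptr, "eng");
    Pix *image = pixRead("word.png");           // hypothetical input image
    api.SetImage(image);
    api.SetVariable("save_blob_choices", "T");  // keep per-symbol alternatives
    api.Recognize(nullptr);

    tesseract::ResultIterator *ri = api.GetIterator();
    if (ri != nullptr) {
        do {
            const char *symbol = ri->GetUTF8Text(tesseract::RIL_SYMBOL);
            if (symbol != nullptr) {
                printf("symbol %s, conf: %.2f\n",
                       symbol, ri->Confidence(tesseract::RIL_SYMBOL));
                // Alternative classifier choices for this symbol:
                tesseract::ChoiceIterator ci(*ri);
                do {
                    printf("  choice %s, conf: %.2f\n",
                           ci.GetUTF8Text(), ci.Confidence());
                } while (ci.Next());
            }
            delete[] symbol;
        } while (ri->Next(tesseract::RIL_SYMBOL));
    }
    pixDestroy(&image);
    api.End();
    return 0;
}
```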
On Thursday, June 1, 2017 at 4:09:12 AM UTC-7, Thilina Jayathilaka wrote:
>
> Hello, 
>
> 1. I need to know what the confidence value (returned by the Tesseract API) 
> is and how that value is calculated. 
>
> 2. Is there any possibility that I can change the accuracy levels of 
> tesseract? 
>
> 3. Can I detect the confidence value for *each letter* separately when I 
> pass an image which contains a *word*?
>



[tesseract-ocr] Re: how to use tesseract to detect table?

2017-06-07 Thread akhil katpally
You can use Tesseract parameters: internally, Tesseract detects the tables, 
and you can leverage that information and print it out. Some of the 
parameters will also print out the detected table information (coordinates): 
textord_dump_table_images --- show table regions (this dumps intermediate 
debug images) 
textord_tablefind_show_stats --- show page stats used in table finding 
There are more you can try. To set these parameters on the command line, use 
the -c option followed by parameter=value.

On Monday, April 17, 2017 at 1:03:03 AM UTC-7, Azka Gilani wrote:
>
> @johnny did you find anything on that? I am stuck on the same problem.
> @dinh van Chinh that method doesn't use the Tesseract API!
>
> On Monday, July 18, 2016 at 6:05:54 PM UTC-4, Johnny ho wrote:
>>
>> Are there any examples showing how to use Tesseract to detect tables in 
>> an image?
>>
>
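Putting the advice above together, a command line using -c to enable those table-debugging parameters might look like this (file names are placeholders):

```shell
# OCR page.png into page.txt while dumping table-detection debug info.
# Each -c sets one Tesseract configuration variable for this run only.
tesseract page.png page \
  -c textord_tablefind_show_stats=1 \
  -c textord_dump_table_images=1
```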





[tesseract-ocr] Place of feature extraction in optical character recognition

2017-06-07 Thread PashaTurkish
Hi all

My question is closer to OCR theory than to tesseract-ocr, but I am posting 
it here because it is still related to OCR and OCR software.

I am learning OCR and reading this book: 
https://www.amazon.com/Character-Recognition-Different-Languages-Computing/dp/3319502514

The authors define eight OCR processes, each following the previous one (2 
after 1, 3 after 2, etc.):

   1. Optical scanning
   2. Location segmentation
   3. Pre-processing
   4. Segmentation
   5. Representation
   6. Feature extraction
   7. Recognition
   8. Post-processing

This is what they write about representation (#5)

The fifth OCR component is representation. The image representation plays 
one of the most important roles in any recognition system. In the simplest 
case, gray level or binary images are fed to a recognizer. However, in most 
of the recognition systems in order to avoid extra complexity and to 
increase the accuracy of the algorithms, a more compact and characteristic 
representation is required. For this purpose, a set of features is 
extracted for each class that helps distinguish it from other classes while 
remaining invariant to characteristic differences within the class. The 
character image representation methods are generally categorized into three 
major groups: (a) global transformation and series expansion (b) 
statistical representation and (c) geometrical and topological 
representation.

This is what they write about feature extraction (#6)

The sixth OCR component is feature extraction. The objective of feature 
extraction is to capture essential characteristics of symbols. Feature 
extraction is accepted as one of the most difficult problems of pattern 
recognition. The most straightforward way of describing a character is by 
its actual raster image. Another approach is to extract certain features 
that characterize symbols but leave out the unimportant attributes. The 
techniques for extracting such features are divided into three groups, viz. 
(a) distribution of points, (b) transformations and series expansions, and 
(c) structural analysis.

Please explain why feature extraction comes after representation rather than 
before it. As I understand it, during representation we derive from the 
image (!) a certain model of the character, so after that we should match 
this model to a certain class. I do not understand what we do during feature 
extraction, or perhaps I am misunderstanding everything. Please help.
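A toy sketch may make the distinction concrete (this is my own illustration, not from the book): the representation is what you hand to the recognizer, here a raw binary raster, while feature extraction reduces that representation to a short numeric vector, here via the "distribution of points" (zoning) approach:

```python
import numpy as np

# Representation: a raw 5x5 binary raster of a toy character.
char = np.array([
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
])

def zoning_features(img, zones=(2, 2)):
    """Feature extraction by zoning: fraction of ink pixels per zone."""
    h, w = img.shape
    zh, zw = zones
    feats = []
    for i in range(zh):
        for j in range(zw):
            block = img[i * h // zh:(i + 1) * h // zh,
                        j * w // zw:(j + 1) * w // zw]
            feats.append(float(block.mean()))  # ink density of this zone
    return feats

# The 25-pixel raster collapses to 4 numbers a classifier can compare.
print(zoning_features(char))
```

The recognizer then matches this short feature vector, rather than the raw raster, against each class; that matching is the recognition step that follows feature extraction.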


The question was also asked on SO 
https://stackoverflow.com/questions/44396721/place-of-feature-extraction-in-optical-character-recognition
 



Best regards, Pavel



[tesseract-ocr] Improve image so I can OCR it better

2017-06-07 Thread eliav schmulewitz


Hi

I posted this on stackoverflow but got no response...


I am trying to read subtitles from a news image using Tesseract in Python. 
For some reason, I get better results when I save the image using plt and 
have Tesseract read it from the saved file:

   1. Why is that?
   2. How can I refine my results using cv2?

import urllib3
import requests
import numpy as np
import pytesseract
import matplotlib.pyplot as plt
from PIL import Image

def downloadFile():
    url = 'https://drive.google.com/uc?export=download&id=0B7t_yZLolnbiaVpicnEwbDRjTmc'
    http = urllib3.PoolManager()
    r = http.request('GET', url)
    f = open('testing.npy', 'wb')
    f.write(r.data)
    f.close()

downloadFile()
frame = np.load('testing.npy')
new_frame = frame[170:210, 8:195]
plt.imshow(new_frame)
plt.axis('off')
plt.savefig('plt.png')
print('from array: ' + pytesseract.image_to_string(Image.fromarray(new_frame), lang='eng'))
print('from plt: ' + pytesseract.image_to_string(Image.open('plt.png'), lang='eng'))

Thank you!



[tesseract-ocr] Reading Japanese Text (Kanji)

2017-06-07 Thread akshat garg
Hi, 

As a part of exploring Tesseract, I have been trying to read Japanese text 
with it. 

I was able to read Katakana and Hiragana charts with a certain degree of 
accuracy (around 60-70%), but the Kanji symbols remain a mystery. 

Any suggestions are welcome. 
