Re: [tesseract-ocr] Could anyone help me about pytessract?

2019-09-19 Thread luffy monky
Sorry, I can't understand why the output is empty. Other code that works 
the same way does output a string, but it shows 03 instead of 09. 

I just want to debug these problems.

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/c705fe96-2e3e-450e-adf6-af3a3860f667%40googlegroups.com.


Re: [tesseract-ocr] Could anyone help me about pytessract?

2019-09-19 Thread Lorenzo Bolzani
Try to invert the images.
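
A minimal sketch of that suggestion, assuming Pillow and pytesseract are
installed and reusing the file name test3.jpg from the quoted code. The
helper name ocr_inverted is made up for illustration; Tesseract 4.x is
generally reported to work best with dark text on a light background.

```python
def ocr_inverted(path):
    """Run OCR on a colour-inverted copy of the image at `path`."""
    # Third-party imports live inside the function so that this module can
    # still be imported on a machine without the OCR stack installed.
    from PIL import Image, ImageOps
    import pytesseract

    with Image.open(path) as image:
        # Convert to 8-bit grayscale first: ImageOps.invert does not
        # accept every image mode (RGBA, for example).
        gray = image.convert("L")
        inverted = ImageOps.invert(gray)
    return pytesseract.image_to_string(inverted)


# Usage (needs the tesseract binary on PATH):
# print(ocr_inverted("test3.jpg"))
```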


Lorenzo

On Thu, 19 Sep 2019 at 05:52, luffy monky wrote:

> Hi all,
> I tried some sample code that I found through Google,
> but it prints nothing.
> Could I trouble you for some advice?
> Here is my sample code:
> 
> import pytesseract
> from PIL import Image
>
> image = Image.open("test3.jpg")
> code = pytesseract.image_to_string(image)
> print(code)
> 
>
> I have attached the test image.
> If you can get any output from it, please help me.
>



Re: [tesseract-ocr] Trained data for E13B font

2019-09-19 Thread ElGato ElMago
Hello,

CMC-7 is a totally different font from E13B.  Only E13B is used around 
here.  I've never seen CMC-7 in person.

I had about 100 sample checks and read them with a check-reading machine, 
one of those used at banks.  Thus they all have the same image quality and 
character quality.

Although it's a small sample, in the end there were no phantom characters 
and no misreads of symbols or numerals.  There was one isolated word with 
two characters that had been skipped.  The number of spaces between words 
tends to be smaller than in reality, which causes no problem in parsing.

I'm more or less done for the moment and am not going for extensive 
training.  I'd think you could improve the training text for CMC-7.  
Training with a neural network (LSTM) works like magic, but it depends 
somewhat on how the training text is prepared.  I analyzed bad boxing with 
hOCR output and put more of those patterns in the training text.

Hope this helps.

ElMagoElGato

On Tuesday, September 17, 2019 at 2:44:59 PM UTC+9, Mamadou wrote:

> Hello,
>
>
> Thanks again for sharing your E-13B traineddata; it was helpful. 
> We’ve managed to get good accuracy for E-13B with Tesseract but failed with 
> CMC-7. So we ended up using TensorFlow for both fonts.
>
> I’m curious to know what level of accuracy you’ve reached. You can check our 
> accuracy for Tesseract using the app at 
> https://github.com/DoubangoTelecom/tesseractMICR#the-recognizer-app. For 
> TensorFlow, see https://www.doubango.org/webapps/micr/. 
>
> Also, have you tried with real life samples (e.g. random images from Google 
> search)? Why are you including the SPACE in your charset and training data? 
> It makes the convergence harder.
>
> As promised, the dataset is hosted at 
> https://github.com/DoubangoTelecom/tesseractMICR
>
>
> On Friday, August 9, 2019 at 10:40:15 AM UTC+2, ElGato ElMago wrote:
>>
>> I added eng.traineddata and LICENSE.  I used my account name in the 
>> license file.  I don't know if it's appropriate or not.  Please tell me if 
>> it's not.
>>
>> On Friday, August 9, 2019 at 4:17:41 PM UTC+9, Mamadou wrote:
>>>
>>>
>>>
>>> On Friday, August 9, 2019 at 7:31:03 AM UTC+2, ElGato ElMago wrote:

 Here's my sharing on GitHub.  Hope it's of any use for somebody.

 https://github.com/ElMagoElGato/tess_e13b_training

>>> Thanks for sharing your experience with us.
>>> Is it possible to share your Tesseract model (xxx.traineddata)?
>>> We're building a dataset using real life images like what we have 
>>> already done for MRZ (
>>> https://github.com/DoubangoTelecom/tesseractMRZ/tree/master/dataset).
>>> Your model would help us automate the annotation and would speed up 
>>> our development. Of course we'll have to manually correct the 
>>> annotations, but it will be faster for us. 
>>> Also, please add a license to your repo so that we know whether we have 
>>> the right to use it.
>>>


 On Thursday, August 8, 2019 at 9:35:17 AM UTC+9, ElGato ElMago wrote:
>
> OK, I'll do so.  I need to reorganize naming and so on a little bit.  
> Will be out there soon.
>
> On Wednesday, August 7, 2019 at 9:11:01 PM UTC+9, Mamadou wrote:
>>
>>
>>
>> On Wednesday, August 7, 2019 at 2:36:52 AM UTC+2, ElGato ElMago wrote:
>>>
>>> Hi,
>>>
>>> I'm thinking of sharing it, of course.  What is the best way to do 
>>> it?  After all this, my only contribution is how I prepared the 
>>> training text.  Even that consists of Shree's text and mine.  The 
>>> instructions and tools I used already exist.
>>>
>> If you have a Github account just create a repo and publish the data 
>> and instructions. 
>>
>>>
>>> ElMagoElGato
>>>
>>> On Wednesday, August 7, 2019 at 8:20:02 AM UTC+9, Mamadou wrote:
>>>
 Hello,
 Are you planning to release the dataset or models?
 I'm working on the same subject and planning to share both under 
 BSD terms

 On Tuesday, August 6, 2019 at 10:11:40 AM UTC+2, ElGato ElMago 
 wrote:
>
> Hi,
>
> FWIW, I got to the point where I'm happy with the accuracy. 
> As the images in the previous post show, the symbols, especially the 
> on-us symbol and the amount symbol, were being confused with each other 
> or with other characters.  I added many more symbols to the training 
> text and formed words that start with a symbol.  One example is as 
> follows:
>
> 9;:;=;<;< <0<1<3<4;6;8;9;:;=;
>
>
> I randomly made 8,000 lines like this.  When fine-tuning from eng, 
> 5,000 iterations were almost enough.  The amount symbol is still 
> confused a little when it's followed by 0.  Fine-tuning tends to be 
> thrown off by small particles.  I'll have to think of something to make 
> further improvements.
>
> Training from scratch produced a bit more stable traineddata.  It 
> doesn't get confused with symbols so often but tends to generate 
> 
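
The training-text recipe quoted above (random lines of words that start
with a symbol) could be sketched roughly as follows. This is not the
poster's actual script: the character set is only inferred from the example
line, where `:`, `;`, `<` and `=` appear to stand in for the four E13B
symbols, and the function names and word lengths are made up for
illustration.

```python
import random

DIGITS = "0123456789"
SYMBOLS = ":;<="  # assumption: stand-ins for the four E13B symbols


def make_word(rng, min_len=2, max_len=10):
    """Build one word that starts with a symbol, then mixes digits and symbols."""
    length = rng.randint(min_len, max_len)
    first = rng.choice(SYMBOLS)
    rest = "".join(rng.choice(DIGITS + SYMBOLS) for _ in range(length - 1))
    return first + rest


def make_training_lines(count=8000, words_per_line=(1, 4), seed=0):
    """Return `count` random lines of symbol-led words separated by spaces."""
    rng = random.Random(seed)  # seeded so the text can be regenerated
    lines = []
    for _ in range(count):
        n_words = rng.randint(*words_per_line)
        lines.append(" ".join(make_word(rng) for _ in range(n_words)))
    return lines
```

Writing the result to a file with `"\n".join(make_training_lines())` would
give a training text of 8,000 lines, the count mentioned in the post.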

[tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
Thanks for your responses.
@Timothy Snyder: I don't think I can do this in postprocessing, because 
both spellings can occur and I have to differentiate them. Or 
what exactly did you do?
@zdenop: Unfortunately it is not possible for me to send a longer text.

Does anyone else have any ideas? 

On Wednesday, September 18, 2019 at 5:19:22 PM UTC+2, Sandra M. wrote:
>
> I'm using Tesseract with Python. I have an image with 1-6 words in it and 
> need to read the text. Sometimes the character "C", which looks the same in 
> upper and lower case, is detected as lowercase c instead of uppercase C. 
> I see why this is hard, but in the context of the following letters it 
> should be possible to detect the right case. Is there any configuration 
> option or similar to improve this?
>
> I had a look at the configuration option config='-psm x' with 
> different values for x, but nothing fits my problem.
>



[tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr


[image: currentImage.png]
@Lorenzo Blz: This is an example image. The output of my code is 
"calibrations". The letters are not all the same height. Of course the case 
cannot be recognized if there is only a "c", but in the context of the 
other letters Tesseract should be able to detect whether it is a lowercase 
or capital letter, I think. This image has no noise or anything else, so I 
don't understand the problem. Nevertheless, your suggestion to change the 
size helped! If I resize it to 150% or 75%, for example, it works. I just 
don't know how to handle this later on, when I don't have a reference 
value. How do I decide which spelling is right, the one from 100% image 
size or the one from 150%? Or is it fair to say that the result is always 
more reliable if I resize the image in preprocessing?

On Wednesday, September 18, 2019 at 5:19:22 PM UTC+2, Sandra M. wrote:
>
> I'm using Tesseract with Python. I have an image with 1-6 words in it and 
> need to read the text. Sometimes the character "C", which looks the same in 
> upper and lower case, is detected as lowercase c instead of uppercase C. 
> I see why this is hard, but in the context of the following letters it 
> should be possible to detect the right case. Is there any configuration 
> option or similar to improve this?
>
> I had a look at the configuration option config='-psm x' with 
> different values for x, but nothing fits my problem.
>



Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Please provide more information (version info, how you do OCR - it seems like
you are using some code).
I just tried the tesseract command line (tesseract 5.0.0-alpha-416-g408d6) with
tessdata_best and it works for me:
tesseract unnamed.png -
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 497
Calibrations

Zdenko


On Thu, 19 Sep 2019 at 10:43, 'Sandra M.' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> [image: currentImage.png]
> @Lorenzo Blz: This is an example image. The output of my code is
> "calibrations". The letters are not all the same height. Of course the case
> cannot be recognized if there is only a "c", but in the context of the
> other letters Tesseract should be able to detect whether it is a lowercase
> or capital letter, I think. This image has no noise or anything else, so I
> don't understand the problem. Nevertheless, your suggestion to change the
> size helped! If I resize it to 150% or 75%, for example, it works. I just
> don't know how to handle this later on, when I don't have a reference
> value. How do I decide which spelling is right, the one from 100% image
> size or the one from 150%? Or is it fair to say that the result is always
> more reliable if I resize the image in preprocessing?
>
> On Wednesday, September 18, 2019 at 5:19:22 PM UTC+2, Sandra M. wrote:
>>
>> I'm using Tesseract with Python. I have an image with 1-6 words in it and
>> need to read the text. Sometimes the character "C", which looks the same in
>> upper and lower case, is detected as lowercase c instead of uppercase C.
>> I see why this is hard, but in the context of the following letters it
>> should be possible to detect the right case. Is there any configuration
>> option or similar to improve this?
>>
>> I had a look at the configuration option config='-psm x' with
>> different values for x, but nothing fits my problem.
>>



Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Lorenzo Bolzani
I tried to upscale, downscale, with and without the white border and I
always get Calibrations. I even tried a few psm modes.

I'm using:

tesseract 4.0.0
 leptonica-1.76.0
  libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib
1.2.11

What I would do is this:
- prepare a test set with some data so that you can check what gives you an
improvement and what not on average
- remove the white border (see here)
- now rescale the text so that it is about 35/55px, try a few values and
see what works best. I would also try a few completely different values
(75, 100) while I'm there (just make sure you always start from the
original images when you rescale not to mess the images too much, I would
use find+imagemagick).
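
The rescaling step in the list above could be sketched in Python instead of
find+imagemagick. This is only an illustration under assumptions: Pillow and
pytesseract are installed, the current text height in pixels is measured by
hand and passed in, and the helper names are invented here.

```python
def scale_factor(current_height_px, target_height_px):
    """Factor that brings text of the current height to the target height."""
    if current_height_px <= 0:
        raise ValueError("current_height_px must be positive")
    return target_height_px / current_height_px


def scaled_size(width, height, factor):
    """New (width, height) in whole pixels, never smaller than 1x1."""
    return max(1, round(width * factor)), max(1, round(height * factor))


def ocr_at_text_heights(path, text_height_px, targets=(35, 45, 55, 75, 100)):
    """OCR the same original image rescaled to several text heights."""
    # Third-party imports are local so the pure helpers above can be used
    # and tested without the OCR stack.
    from PIL import Image
    import pytesseract

    results = {}
    with Image.open(path) as original:
        for target in targets:
            # Always rescale from the original, never from a resized copy.
            factor = scale_factor(text_height_px, target)
            size = scaled_size(original.width, original.height, factor)
            resized = original.resize(size, Image.LANCZOS)
            results[target] = pytesseract.image_to_string(resized).strip()
    return results
```

Running `ocr_at_text_heights` over every image in the test set and counting
correct results per target height shows which size works best on average.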

If this doesn't work, you could look at the character boxes size. If the
text height is fixed you should be able to tell immediately what is what.
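
One way to use the character boxes is sketched below. It assumes the box
text comes from pytesseract.image_to_boxes, which returns one line per
character in the form "char left bottom right top page". The letter list and
the 0.85 height ratio are guesses made for this sketch and would need tuning
on real data.

```python
# Assumption: letters whose upper and lower case differ mainly in size.
SAME_SHAPE = set("cosvwxz")


def parse_boxes(box_text):
    """Turn box-file text into a list of (char, height_in_pixels)."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split(" ")
        if len(parts) < 5 or not parts[0]:
            continue  # skip blank or malformed lines
        left, bottom, right, top = (int(value) for value in parts[1:5])
        boxes.append((parts[0], top - bottom))
    return boxes


def fix_case_by_height(boxes, cap_ratio=0.85):
    """Re-case ambiguous letters by comparing them with the tallest box."""
    if not boxes:
        return ""
    tallest = max(height for _, height in boxes)
    fixed = []
    for char, height in boxes:
        if char.lower() in SAME_SHAPE and tallest > 0:
            is_capital = height >= cap_ratio * tallest
            char = char.upper() if is_capital else char.lower()
        fixed.append(char)
    return "".join(fixed)


# Usage (needs pytesseract and an opened image):
# text = fix_case_by_height(parse_boxes(pytesseract.image_to_boxes(image)))
```

The box output carries no spaces, so this sketch suits single words; for
several words the boxes would first have to be grouped by position.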

If this doesn't work and if you have some data, you could consider doing
some fine tuning (for example with ocrd-train) but if your text is so clear
and standard you should not need it.


I just saw that you are using version 3.x, this is the old version and does
not use neural networks. Current stable version is 4.1.


Lorenzo

On Thu, 19 Sep 2019 at 10:43, 'Sandra M.' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> [image: currentImage.png]
> @Lorenzo Blz: This is an example image. The output of my code is
> "calibrations". The letters are not all the same height. Of course the case
> cannot be recognized if there is only a "c", but in the context of the
> other letters Tesseract should be able to detect whether it is a lowercase
> or capital letter, I think. This image has no noise or anything else, so I
> don't understand the problem. Nevertheless, your suggestion to change the
> size helped! If I resize it to 150% or 75%, for example, it works. I just
> don't know how to handle this later on, when I don't have a reference
> value. How do I decide which spelling is right, the one from 100% image
> size or the one from 150%? Or is it fair to say that the result is always
> more reliable if I resize the image in preprocessing?
>
> On Wednesday, September 18, 2019 at 5:19:22 PM UTC+2, Sandra M. wrote:
>>
>> I'm using Tesseract with Python. I have an image with 1-6 words in it and
>> need to read the text. Sometimes the character "C", which looks the same in
>> upper and lower case, is detected as lowercase c instead of uppercase C.
>> I see why this is hard, but in the context of the following letters it
>> should be possible to detect the right case. Is there any configuration
>> option or similar to improve this?
>>
>> I had a look at the configuration option config='-psm x' with
>> different values for x, but nothing fits my problem.
>>



Re: [tesseract-ocr] text2image: No such file or directory

2019-09-19 Thread Zdenko Podobny
Does /usr/local/bin/text2image exist? Did you
install the text2image/training tools?
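
A quick way to answer both questions from Python, using only the standard
library (the helper names are made up for this sketch):

```python
import shutil


def find_tool(name="text2image"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def describe_tool(name="text2image"):
    """Human-readable summary of whether the tool can be found."""
    path = find_tool(name)
    if path is None:
        return f"{name} not found on PATH; the training tools may not be installed"
    return f"{name} found at {path}"


# Usage:
# print(describe_tool("text2image"))
```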

Zdenko


On Thu, 19 Sep 2019 at 13:59, Ajinkya Khalwadekar wrote:

> I am following https://github.com/tesseract-ocr/tesseract/issues/1453 for
> tesseract 4.0 learning.
> I am using macOS Mojave.
>
> All was good until I tried 'text2image --list_available_fonts
> --fonts_dir=/Library/Fonts'.
>
> The output I get is '-bash: /usr/local/bin/text2image: No such file or
> directory'.
>
> Any leads on this?
>
> Thanks in advance.
>



[tesseract-ocr] text2image issue

2019-09-19 Thread Ajinkya Khalwadekar
I am following https://github.com/tesseract-ocr/tesseract/issues/1453 for 
tesseract 4.0 learning.
I am using macOS Mojave.

All was good until I tried 'text2image --list_available_fonts 
--fonts_dir=/Library/Fonts'.

The output I get is '-bash: /usr/local/bin/text2image: No such file or 
directory'.

Any leads on this?

Thanks in advance.



Re: [tesseract-ocr] text2image issue

2019-09-19 Thread Zdenko Podobny
You already sent this to the forum, and I already replied. Did you read it?

Zdenko


On Thu, 19 Sep 2019 at 15:04, Ajinkya Khalwadekar wrote:

> I am following https://github.com/tesseract-ocr/tesseract/issues/1453 for
> tesseract 4.0 learning.
> I am using macOS Mojave.
>
> All was good until I tried 'text2image --list_available_fonts
> --fonts_dir=/Library/Fonts'.
>
> The output I get is '-bash: /usr/local/bin/text2image: No such file or
> directory'.
>
> Any leads on this?
>
> Thanks in advance.
>



[tesseract-ocr] text2image: No such file or directory

2019-09-19 Thread Ajinkya Khalwadekar
I am following https://github.com/tesseract-ocr/tesseract/issues/1453 for 
tesseract 4.0 learning.
I am using macOS Mojave.

All was good until I tried 'text2image --list_available_fonts 
--fonts_dir=/Library/Fonts'.

The output I get is '-bash: /usr/local/bin/text2image: No such file or 
directory'.

Any leads on this?

Thanks in advance.



Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
You were both right - updating to version 5 fixed the problem more or less! 
Only in one case there is still a problem with lower and upper case 
letters, but for the other cases it's working now!

On Thursday, September 19, 2019 at 12:49:43 PM UTC+2, zdenop wrote:
>
> Your Tesseract version is old. The current version is 4.1 (the dev version 
> is 5.0).
> For 4.x and above you can use different tessdata: best, fast, or the one 
> with the 3.x model.
>
> Zdenko
>
>
> On Thu, 19 Sep 2019 at 11:55, 'Sandra M.' via tesseract-ocr <
> tesser...@googlegroups.com > wrote:
>
>> I use Tesseract 3.02 with leptonica-1.68. What do you mean by 
>> tessdata_best? I'm new to this field and only know how to call Tesseract 
>> with the given line of code. How can the resolution be 0 dpi?
>>
>> I'm using this Python code:
>>
>> import pytesseract
>> import argparse
>> import cv2
>> import os
>>
>> # construct the argument parser and parse the arguments
>> ap = argparse.ArgumentParser()
>> ap.add_argument("-i", "--image", required=True,
>>     help="path to input image to be OCR'd")
>> args = vars(ap.parse_args())
>>
>> # load the example image and convert it to grayscale
>> image = cv2.imread(args["image"])
>> gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
>>
>> # write the grayscale image to disk as a temporary file so we can
>> # apply OCR to it
>> filename = "{}.png".format(os.getpid())
>> cv2.imwrite(filename, gray)
>>
>> # load the image as a PIL/Pillow image, apply OCR, and then delete
>> # the temporary file
>> text = pytesseract.image_to_string(gray)
>> print("Output: " + text)
>>
>>
>> On Thursday, September 19, 2019 at 11:23:50 AM UTC+2, zdenop wrote:
>>>
>>> Please provide more information (version info, how you do OCR - it 
>>> seems like you are using some code).
>>> I just tried the tesseract command line (tesseract 
>>> 5.0.0-alpha-416-g408d6) with tessdata_best and it works for me:
>>> tesseract unnamed.png -
>>> Warning: Invalid resolution 0 dpi. Using 70 instead.
>>> Estimating resolution as 497
>>> Calibrations
>>>
>>> Zdenko
>>>
>>>
>>> On Thu, 19 Sep 2019 at 10:43, 'Sandra M.' via tesseract-ocr <
>>> tesser...@googlegroups.com> wrote:
>>>
 [image: currentImage.png]
 @Lorenzo Blz: This is an example image. The output of my code is 
 "calibrations". The letters are not all the same height. Of course the case 
 cannot be recognized if there is only a "c", but in the context of the 
 other letters Tesseract should be able to detect whether it is a lowercase 
 or capital letter, I think. This image has no noise or anything else, so I 
 don't understand the problem. Nevertheless, your suggestion to change the 
 size helped! If I resize it to 150% or 75%, for example, it works. I just 
 don't know how to handle this later on, when I don't have a reference 
 value. How do I decide which spelling is right, the one from 100% image 
 size or the one from 150%? Or is it fair to say that the result is always 
 more reliable if I resize the image in preprocessing?

 On Wednesday, September 18, 2019 at 5:19:22 PM UTC+2, Sandra M. wrote:
>
> I'm using Tesseract with Python. I have an image with 1-6 words in it 
> and need to read the text. Sometimes the character "C", which looks the 
> same in upper and lower case, is detected as lowercase c instead of 
> uppercase C. I see why this is hard, but in the context of the following 
> letters it should be possible to detect the right case. Is there any 
> configuration option or similar to improve this?
>
> I had a look at the configuration option config='-psm x' with 
> different values for x, but nothing fits my problem.
>

Re: [tesseract-ocr] Re: Compile Tesseract with vcpkg to get dynamic libraries

2019-09-19 Thread Zdenko Podobny
I have not tried it, but if you have leptonica installed, you can build
tesseract from source; just adjust the relevant part of the cmake configuration.
AFAIK vcpkg uses cmake and ninja, so this tutorial (last part) can
help you:
http://www.sk-spell.sk.cx/building-tesseract-and-leptonica-with-cmake-and-clang-on-windows


Zdenko


On Thu, 19 Sep 2019 at 13:59, Anon ymous wrote:

> I have the same problem
>
> On Sunday, September 30, 2018 at 1:19:37 PM UTC-4, PLOBEXRIME wrote:
>>
>> Hi, I'm searching for a way to compile Tesseract and get the tesseract.dll
>> library file. CPPAN works for me from time to time but always fails for x64
>> builds, so I've tried vcpkg instead. However, vcpkg compiles Tesseract as an
>> executable plus the 3rd-party libraries that it needs, i.e.
>> leptonica-1.74.4.dll, gif.dll etc., but no tesseract.dll.
>> The command line that I use:
>> vcpkg install tesseract:x64-windows --head
>> So my question is whether, and how, vcpkg can be configured to compile
>> Tesseract as a DLL. The reason I need a DLL file is that I can't use static
>> libraries from Delphi.
>>



Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
You were both right - updating to version 5 fixed the problem more or less! 
Only in one case there is still a problem with lower and upper case 
letters, but for the other cases it's working now!



[tesseract-ocr] OCR results are different on different OS (Linux and Windows)

2019-09-19 Thread Karan Singh
I am using Tesseract to get text output from the same image on two systems, 
but the output is worse on Linux (RHEL) than on Windows (Windows 10). I 
have made sure that all the installed dependencies and versions are the 
same. 

Kindly let me know how to deal with this.

Thanks



Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread Zdenko Podobny
Please provide an image for testing.

Zdenko


On Thu, 19 Sep 2019 at 18:06, 'Sandra M.' via tesseract-ocr <
tesseract-ocr@googlegroups.com> wrote:

> But now I get empty strings instead, because the image contains a symbol
> that Tesseract does not know. I had this problem before as well, and could
> fix it, for whatever reason, with config='--psm 7'. That no longer works.
> Do you have an idea for this as well? I don't need to detect the
> symbol; I just don't want the rest of the string to be "thrown away".
>



Re: [tesseract-ocr] Re: problems with upper-case character

2019-09-19 Thread 'Sandra M.' via tesseract-ocr
But now I get empty strings instead, because the image contains a symbol 
that Tesseract does not know. I had this problem before as well, and could 
fix it, for whatever reason, with config='--psm 7'. That no longer works. 
Do you have an idea for this as well? I don't need to detect the 
symbol; I just don't want the rest of the string to be "thrown away".



[tesseract-ocr] Which mode is better if i crop the exact text with multiple words and pass to tesseract for accuracy?

2019-09-19 Thread Purushotham Rao Eravalli
I am using another model for text detection, which gives me a text box for 
each line in the image (sometimes identity cards). Now I need to pass 
these boxes through Tesseract for recognition. Which psm mode do you think 
will give higher accuracy, psm 13 or psm 7? 
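
Which mode wins probably depends on the images, so one option is to measure
it on a small labelled set of cropped lines. The sketch below is only an
illustration: the helper names are made up, the recognizer is passed in as a
function, and tesseract_recognize assumes pytesseract is installed. In
Tesseract, psm 7 treats the image as a single text line and psm 13 is the
"raw line" mode.

```python
def accuracy(predictions, expected):
    """Fraction of predictions that match the expected text exactly."""
    if not expected:
        raise ValueError("expected must not be empty")
    matches = sum(p.strip() == e.strip() for p, e in zip(predictions, expected))
    return matches / len(expected)


def compare_psm_modes(samples, recognize, modes=(7, 13)):
    """Score each psm mode on `samples`, a list of (image, expected_text).

    `recognize(image, psm)` must return the recognized string.
    """
    expected = [text for _, text in samples]
    scores = {}
    for psm in modes:
        predictions = [recognize(image, psm) for image, _ in samples]
        scores[psm] = accuracy(predictions, expected)
    return scores


def tesseract_recognize(image, psm):
    """Recognizer backed by pytesseract (imported lazily)."""
    import pytesseract
    return pytesseract.image_to_string(image, config=f"--psm {psm}")


# Usage with real crops (PIL images or file paths):
# print(compare_psm_modes(samples, tesseract_recognize))
```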



Re: [tesseract-ocr] OCR results are different on different OS (Linux and Windows)

2019-09-19 Thread Zdenko Podobny
Do you really think that somebody can reproduce the problem based on
the information you provided?

Zdenko


On Thu, 19 Sep 2019 at 18:10, Karan Singh wrote:

> I am using Tesseract to get text output from the same image on two systems,
> but the output is worse on Linux (RHEL) than on Windows (Windows 10). I
> have made sure that all the installed dependencies and versions are the
> same.
>
> Kindly let me know how to deal with this.
>
> Thanks
>



[tesseract-ocr] Re: Compile Tesseract with vcpkg to get dynamic libraries

2019-09-19 Thread Anon ymous
I have the same problem

On Sunday, September 30, 2018 at 1:19:37 PM UTC-4, PLOBEXRIME wrote:
>
> Hi, I'm searching for a way to compile Tesseract and get the tesseract.dll 
> library file. CPPAN works for me from time to time but always fails for x64 
> builds, so I've tried vcpkg instead. However, vcpkg compiles Tesseract as an 
> executable plus the 3rd-party libraries that it needs, i.e. 
> leptonica-1.74.4.dll, gif.dll etc., but no tesseract.dll. 
> The command line that I use:
> vcpkg install tesseract:x64-windows --head
> So my question is whether, and how, vcpkg can be configured to compile 
> Tesseract as a DLL. The reason I need a DLL file is that I can't use static 
> libraries from Delphi.
>
