Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Zdenko Podobny
If the command line works for you, the easiest way is to follow the code of the
tesseract executable [1].
IMO you need to use the variable user_words_file; AFAIR user_words_suffix specifies
only the file extension.
Then it should work [2], i.e. tesseract will load the user words (their effect on
recognition is another topic).

[1]
https://github.com/tesseract-ocr/tesseract/blob/4c8b7d5e3539bae18eb8337d5ebc1fccf56c1f93/src/api/tesseractmain.cpp#L357
[2]
https://github.com/tesseract-ocr/tesseract/blob/aa78a720a34708eece6e498c32e3593a24aa1e74/src/dict/dict.cpp#L254
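
For a script that shells out to the tesseract executable rather than linking
the library, the command-line route mentioned above can be sketched as follows.
This is only a sketch under assumptions: it relies on the --user-words PATH
option of the tesseract 4 command-line tool, the helper names
write_user_words and build_tesseract_command and all file names are
hypothetical, and it only builds the argument list without running OCR.

```python
import tempfile
from pathlib import Path


def write_user_words(words, directory):
    """Write a user-words file: plain text, one word per line."""
    path = Path(directory) / "custom.user-words"  # hypothetical file name
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


def build_tesseract_command(image_path, output_base, user_words_path, lang="eng"):
    """Build the argument list for the tesseract CLI (nothing is executed here)."""
    return [
        "tesseract", str(image_path), str(output_base),
        "-l", lang,
        "--user-words", str(user_words_path),  # assumes the tesseract 4 CLI option
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        words_file = write_user_words(["Tesseract", "Leptonica"], tmp)
        # "page.png" and "out" are placeholder names for illustration only.
        print(build_tesseract_command("page.png", "out", words_file))
```

The resulting list can be handed to subprocess.run; whether the loaded words
change the recognition result is a separate question.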


Zdenko


On Wed, 3 Jul 2019 at 19:59, Jochen Naumann wrote:

> Thanks, I already tried api->SetVariable("user_words_suffix", "user-words");
> It did not work, whereas specifying it in a config file and using the
> command-line tesseract tool does work.
> I used a file monitor tool to see whether the process tries to open a
> user-words file, and it did not. The tesseract tool, however, does.
> I am aware of https://github.com/tesseract-ocr/tesseract/issues/960
> but I am using 4.1, where this is fixed.
> Do you have a working example?
>
>
>
> On Wed, 3 Jul 2019 at 18:16, Quan Nguyen wrote:
>
>> https://github.com/tesseract-ocr/tesseract/wiki/APIExample
>> https://github.com/tesseract-ocr/tesseract/issues/960
>>
>> api->SetVariable("user_words_suffix", "user-words");
>>
>>
>> On Wednesday, July 3, 2019 at 10:29:59 AM UTC-5, Jochen Naumann wrote:
>>>
>>> Hi, I can set the user-words file on the command line with the tesseract
>>> tool, but how do I set this using the API?
>>> I searched for it in the source code but could not find it; I would
>>> appreciate any help.
>>>

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8xdXJfsVc9wuiJBjEKm%2BDPw389yg2mXShGZ6%2BRYT%2BDavw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Jochen Naumann
Thanks, I already tried api->SetVariable("user_words_suffix", "user-words");
It did not work, whereas specifying it in a config file and using the
command-line tesseract tool does work.
I used a file monitor tool to see whether the process tries to open a user-words
file, and it did not. The tesseract tool, however, does.
I am aware of https://github.com/tesseract-ocr/tesseract/issues/960
but I am using 4.1, where this is fixed.
Do you have a working example?



On Wed, 3 Jul 2019 at 18:16, Quan Nguyen wrote:

> https://github.com/tesseract-ocr/tesseract/wiki/APIExample
> https://github.com/tesseract-ocr/tesseract/issues/960
>
> api->SetVariable("user_words_suffix", "user-words");
>
>
> On Wednesday, July 3, 2019 at 10:29:59 AM UTC-5, Jochen Naumann wrote:
>>
>> Hi, I can set the user-words file on the command line with the tesseract
>> tool, but how do I set this using the API?
>> I searched for it in the source code but could not find it; I would
>> appreciate any help.
>>



[tesseract-ocr] Re: Not able to extract table contents for Scanned pdf's and Normal Pdf's using Tesseract-ocr?

2019-07-03 Thread Akhil Dixit
I am also facing the same issue with scanned PDFs, especially ones with multiple
columns and text mixed with numbers. Please share some input here if anyone has
tried this with tesseract or other APIs.

On Friday, May 31, 2019 at 3:55:08 PM UTC+5:30, Sayali begampure wrote:
>
> We are trying to extract text content from normal PDFs and scanned PDFs
> (images) using tesseract-ocr.
>
> We have observed the following issues for PDFs containing tables; the table
> contents are not extracted properly.
>
>    1. Contents of a few cells (rows/columns) are missing. Sometimes the
>    heading of the table is missing.
>    2. If there are numbers inside the table, not all of them are
>    extracted.
>    3. Some letters are extracted wrongly, e.g. i is misinterpreted as l.
>    4. The column sequence gets interchanged because the page is parsed
>    horizontally.
>    5. Some extra characters are extracted along with the real ones.
>
> We tried image_to_string, image_to_data, and an OpenCV-based approach.
>
> Sample code used is:
>
> from PIL import Image
>
> import pytesseract
> from pytesseract import image_to_string
> from pytesseract import image_to_boxes
>
> # OCR the whole page as plain text
> image = pytesseract.image_to_string(Image.open('table_number.jpg'))
> print(image)
>
>
> It should extract rows and columns properly, which it currently does not.
> Kindly suggest a function or method to improve the results of table
> content extraction using tesseract.
>
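
One possible post-processing step for table pages is to rebuild rows from
word-level boxes instead of relying on the order of the plain text. The sketch
below assumes word boxes given as (left, top, width, height, text) tuples, the
fields that image_to_data reports per word; the function name
group_words_into_rows and the tolerance value are hypothetical, and real tables
will likely need per-document tuning.

```python
def group_words_into_rows(words, y_tolerance=10):
    """Cluster word boxes into rows by vertical centre, then order each row left to right.

    words: iterable of (left, top, width, height, text) tuples in pixels.
    A word joins the current row when its centre lies within y_tolerance
    pixels of the centre of the first word of that row.
    """
    rows = []  # each entry: [row_centre_y, [(left, text), ...]]
    for left, top, width, height, text in sorted(words, key=lambda w: w[1] + w[3] / 2):
        centre = top + height / 2
        if rows and abs(centre - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([centre, [(left, text)]])
    # Sorting the (left, text) pairs restores the left-to-right column order.
    return [[text for _, text in sorted(cells)] for _, cells in rows]


if __name__ == "__main__":
    sample = [
        (200, 12, 40, 20, "Qty"),
        (10, 10, 50, 20, "Item"),
        (10, 50, 60, 20, "Bolt"),
        (205, 52, 20, 20, "4"),
    ]
    for row in group_words_into_rows(sample):
        print(row)
```

This only fixes the reading order; missing cells or misread characters need
other measures such as image preprocessing.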



[tesseract-ocr] Re: setting user-words in api?

2019-07-03 Thread Quan Nguyen
https://github.com/tesseract-ocr/tesseract/wiki/APIExample
https://github.com/tesseract-ocr/tesseract/issues/960

api->SetVariable("user_words_suffix", "user-words");


On Wednesday, July 3, 2019 at 10:29:59 AM UTC-5, Jochen Naumann wrote:
>
> Hi, I can set the user-words file on the command line with the tesseract tool,
> but how do I set this using the API?
> I searched for it in the source code but could not find it; I would
> appreciate any help.
>
>



[tesseract-ocr] setting user-words in api?

2019-07-03 Thread Jochen Naumann
Hi, I can set the user-words file on the command line with the tesseract tool,
but how do I set this using the API?
I searched for it in the source code but could not find it; I would appreciate
any help.



Re: [tesseract-ocr] tesseract bug in windows

2019-07-03 Thread Shree Devi Kumar
Bugs are to be reported on GitHub under Issues. If the bug is specific to Windows
and you use prebuilt binaries, please report it in the repo of the source of
those binaries.

On Wed, 3 Jul 2019, 20:26 _ Flaviu wrote:

> Sorry for this topic, but I think that the tesseract library has a bug when
> run on Windows 10. My question is: where can I report a bug? I have tried
> the tesseract-dev forum, but that was not the right place ...
>
> It is about these issues:
> https://groups.google.com/forum/#!topic/tesseract-ocr/tT9KmOukA9g
> and
> https://groups.google.com/forum/#!topic/tesseract-ocr/E-LURyGiqUw
>



[tesseract-ocr] tesseract bug in windows

2019-07-03 Thread _ Flaviu
Sorry for this topic, but I think that the tesseract library has a bug when run
on Windows 10. My question is: where can I report a bug? I have tried the
tesseract-dev forum, but that was not the right place ...

It is about these issues:
https://groups.google.com/forum/#!topic/tesseract-ocr/tT9KmOukA9g
and
https://groups.google.com/forum/#!topic/tesseract-ocr/E-LURyGiqUw



[tesseract-ocr] Re: Rect and real area to be recognized

2019-07-03 Thread Abstract
Btw, another strange thing happens.

During LSTM training, the output looks nearly perfect: all the recognized text
looks exactly the same as in the training images. After several hours of
training, all the error rates fall to very low values.

But then I take the resulting traineddata file and run recognition on the same
images with my custom tool, i.e. I parse the corresponding box file, group the
per-line boxes into one box, and make tesseract recognize it.
The result is hard to understand. About 20% of the cases are faulty. Sometimes
the recognized text is empty, sometimes characters are different, and sometimes
foreign characters are inserted (as I wrote before, it looks like some
characters are recognized from outside the filter rect).

Any ideas ?
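
One thing worth ruling out in such a tool is a coordinate mismatch: box files
store left, bottom, right, top with the origin at the bottom-left corner of the
image, while SetRectangle takes left, top, width, height measured from the
top-left corner. The sketch below shows the grouping and conversion under those
assumptions; the function names, the padding parameter and the sample values
are hypothetical, and this is not a claim about what the custom tool does.

```python
def parse_box_line(line):
    """Parse one box-file line: '<symbol> <left> <bottom> <right> <top> <page>'.

    Splitting from the right keeps a symbol that is itself a space or a tab.
    """
    symbol, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return symbol, int(left), int(bottom), int(right), int(top), int(page)


def line_rectangle(box_lines, image_height, padding=0):
    """Union the boxes of one text line and return (left, top, width, height)
    measured from the top-left corner of the image."""
    boxes = [parse_box_line(entry) for entry in box_lines]
    left = max(min(b[1] for b in boxes) - padding, 0)
    bottom = max(min(b[2] for b in boxes) - padding, 0)
    right = max(b[3] for b in boxes) + padding  # not clamped: image width is unknown here
    top = min(max(b[4] for b in boxes) + padding, image_height)
    # Flip the vertical axis: box files count y upwards from the bottom edge.
    return left, image_height - top, right - left, top - bottom


if __name__ == "__main__":
    sample = ["T 10 80 20 95 0", "e 22 80 30 90 0", "  31 80 35 90 0"]
    print(line_rectangle(sample, image_height=100))
```

A small positive padding may help if glyph edges are cut off, at the cost of
possibly including neighbouring characters.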



[tesseract-ocr] Rect and real area to be recognized

2019-07-03 Thread Abstract
Is it possible that the real recognition area is wider than the rectangle set
with TessBaseAPISetRectangle?

I noticed that there are sometimes extra characters in the recognition output,
quite often exactly the characters that lie near the Rect bounds (but outside of it).



Re: [tesseract-ocr] Trying to use "-c tessedit_page_number=1" at the end of command to process only page two of a multipage tiff

2019-07-03 Thread Laurent Sabourin
Just created issue #2537 


On Wednesday, 3 July 2019 01:36:25 UTC-4, zdenop wrote:
>
> I see the same behaviour on Windows. Can you please create an issue?
>
> Zdenko
>
>
> On Tue, 2 Jul 2019 at 22:14, Laurent Sabourin wrote:
>
>> I am attaching a sample tiff that reproduces the issue; this one uses G3
>> compression and shows the same problem...
>>
>> On Tuesday, 2 July 2019 15:58:29 UTC-4, Laurent Sabourin wrote:
>>>
>>> It is a G4 tiff compression not JPEG...
>>>
>>> On Tuesday, 2 July 2019 15:36:19 UTC-4, zdenop wrote:

>>>> I guess you have a tiff with jpeg compression
>>>> ...
>>>>
>>>> You need to use the latest tesseract code and leptonica >1.77
>>>>
>>>> Zdenko
>>>>
>>>>
>>>> On Tue, 2 Jul 2019 at 21:22, Laurent Sabourin wrote:
>>>>
>>>>> I am using tesseract to extract text from a multi-page tiff image, but
>>>>> I only want to process the second page. I am using the following command:
>>>>>
>>>>> tesseract.exe FILE.TIF OUT --tessdata-dir ."\tessdata" -l eng --psm 1 --oem 1 -c tessedit_page_number=1
>>>>>
>>>>> For some reason it always processes the first page no matter what page
>>>>> number I put in the option. If I remove that option, it processes all the
>>>>> pages.
>>>>>
>>>>> Is it a known issue? Am I doing it wrong?
>>>>>
>>>>> Thank you.
>>>>>
>>>>> See below for details on my environment:
>>>>>
>>>>> Operating system:
>>>>> Windows 10 Enterprise 1903 10.0.18362.175 Client
>>>>>
>>>>> Here is my version output:
>>>>> tesseract --version
>>>>> tesseract 4.0.0
>>>>>  leptonica-1.76.0 (May 30 2019, 11:18:56) [MSC v.1916 LIB Release x86]
>>>>>   libgif 5.1.4 : libjpeg 6b (libjpeg-turbo 2.0.1) : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11
>>>>>  Found AVX
>>>>>  Found SSE



Re: [tesseract-ocr] how to train tesseract to detect superscripts and subscripts

2019-07-03 Thread Shree Devi Kumar
See
https://github.com/Shreeshrii/tess4training#additional-training-scripts---replace-top-layer-bash

On Wed, Jul 3, 2019 at 6:03 PM fady taher  wrote:

> I am trying to detect a superscript like the one in the attached image. I
> added "Cr⁶⁺" to the training set about 15 times, but it still could not be
> recognized correctly.
>
> The source file can be found at
>
>
> http://download.siliconexpert.com/pdfs2/2019/6/4/10/44/32/882174/pns_/manual/ecqe2394kt_rohs.pdf
>
>


-- 


भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com



[tesseract-ocr] how to train tesseract to detect superscripts and subscripts

2019-07-03 Thread fady taher
I am trying to detect a superscript like the one in the attached image. I added
"Cr⁶⁺" to the training set about 15 times, but it still could not be
recognized correctly.

The source file can be found at

http://download.siliconexpert.com/pdfs2/2019/6/4/10/44/32/882174/pns_/manual/ecqe2394kt_rohs.pdf




Re: [tesseract-ocr] Re: Install Tesseract 4 on CentOS and Red Hat [SOLVED!]

2019-07-03 Thread Александр Поздняков
Hi.
You need to add the repository key:

wget http://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/repodata/repomd.xml.key
rpm --import repomd.xml.key


On Tue, 2 Jul 2019, 19:45, Ivan Auffret wrote:

> Hi Alexander,
>
> I am trying to install from your repo but I am getting the following error:
>
> Public key for tesseract-langpack-eng-4.00~git30-5.1.noarch.rpm is not
> installed
>
> Anyone knows what I can do?
>
> On Wednesday, April 25, 2018 at 9:47:15 AM UTC-7, Александр Поздняков
> wrote:
>>
>> for CentOS
>>
>>> yum-config-manager --add-repo https://download.opensuse.org/repositories/home:/Alexander_Pozdnyakov/CentOS_7/
>>> yum update
>>> yum install tesseract
>>
>>
>> for example
>>
>>> yum install tesseract-langpack-deu
>>
>>
>>
>> On Wednesday, 25 April 2018 at 16:30:01 UTC+3, Eugene Huang wrote:
>>>
>>> Hello Александр!
>>>
>>> I took a look at your stuff; it is very extensive. If all the
>>> installations work, this should be front-paged! I have never used openSUSE.
>>> Could you point me to some resources to figure out how to use your
>>> installation packages?
>>>
>>>
>>> @shree
>>> Thanks for the info. I definitely notice that Tesseract 4 is more
>>> accurate; for example, Tesseract 4 can read small italic fonts whereas
>>> Tesseract 3 makes lots of mistakes. Seems like Tesseract 4 is the future!
>>>



[tesseract-ocr] Re: Off Topic: merge Spaces in words by OCR old books

2019-07-03 Thread _ Flaviu
No problem with your English; I am not a native English speaker either, but
I understand you.

An extension is a good idea, but I don't know of one. How are things
going? Are you managing well?

On Tuesday, July 2, 2019 at 11:31:48 AM UTC+3, Martin Jenniges wrote:
>
> Hello,
>
> I scan and OCR old books for a historical society.
>
> In some books, the lines contain words with extra spaces, for example
> "B ue rg er m ei ster ei" for "Buergermeisterei".
>
> Now, after OCRing these books, I have to remove these extra spaces one by one.
>
> My question: is there a macro or extension for LibreOffice Writer or
> Notepad++ that can help me?
>
> I hope you understand me, and I thank you for your answers!
>
> Martin Jenniges
>
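
Outside of LibreOffice or Notepad++, one way to automate this is a small script
that re-joins runs of fragments whenever the joined result is a known word. The
sketch below assumes a word list is available (for example, collected from a
clean text in the same language); the function name merge_split_words and the
max_parts limit are hypothetical. It ignores punctuation attached to fragments
and collapses line breaks into single spaces, so it is a starting point rather
than a finished tool.

```python
def merge_split_words(text, vocabulary, max_parts=12):
    """Join consecutive fragments when their concatenation is a known word.

    At each position the longest run of up to max_parts fragments is tried
    first, so 'B ue rg er m ei ster ei' becomes 'Buergermeisterei' if that
    word is in the vocabulary. Unknown fragments are left unchanged.
    """
    known = {word.lower() for word in vocabulary}
    tokens = text.split()
    result = []
    i = 0
    while i < len(tokens):
        merged = None
        for j in range(min(len(tokens), i + max_parts), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if candidate.lower() in known:
                merged = candidate
                i = j
                break
        if merged is None:
            merged = tokens[i]
            i += 1
        result.append(merged)
    return " ".join(result)


if __name__ == "__main__":
    print(merge_split_words("Die B ue rg er m ei ster ei Trier", ["Buergermeisterei"]))
```

The quality of the result depends entirely on the word list: a word missing
from it stays split, and short vocabulary entries can cause wrong merges.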



[tesseract-ocr] Second Guess from LSTM

2019-07-03 Thread john


api->SetImage(image);
api->Recognize(0);
api->GetIterator()->GetBestLSTMSymbolChoices(); // returns size zero every time

The API returns size 0.
Am I using it correctly?
