Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-30 Thread bácsi Kazi
Thanks!
In the end, the issue was this one:
https://groups.google.com/forum/#!msg/tesseract-ocr/FSURCa9m7Ko/
So I suppose the files from the download page caused the error, while the
newer files from Git build fine on Cygwin.
Greetings,

Kazi
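A minimal sketch of the from-Git build on Cygwin, assuming Leptonica is already installed under /usr/local (as in Kazi's CPPFLAGS/LDFLAGS configure call) and that git, autoconf, automake, libtool, make and a C++ compiler were installed through Cygwin's setup. The autotools sequence (./autogen.sh, ./configure, make, make install) follows the 3.x-era Compiling wiki page; newer checkouts may need other dependencies or steps. The script name build-tesseract.sh and the LEPT_PREFIX variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: build Tesseract from a Git checkout on Cygwin.
# Assumption: Leptonica is installed under $LEPT_PREFIX (default /usr/local).
LEPT_PREFIX="${LEPT_PREFIX:-/usr/local}"

# Flags that point ./configure at the Leptonica headers and libraries.
lept_cppflags() { printf '%s\n' "-I$LEPT_PREFIX/include"; }
lept_ldflags()  { printf '%s\n' "-L$LEPT_PREFIX/lib"; }

# Body runs in a subshell "( ... )" so set -e and cd do not leak out.
build_tesseract() (
    set -e
    # Clone on the first run; reuse the existing checkout afterwards.
    [ -d tesseract ] || git clone https://github.com/tesseract-ocr/tesseract.git
    cd tesseract
    ./autogen.sh
    CPPFLAGS="$(lept_cppflags)" LDFLAGS="$(lept_ldflags)" ./configure
    make
    make install
)

# Build only when asked explicitly, e.g.:  sh build-tesseract.sh build
if [ "${1:-}" = build ]; then
    build_tesseract
fi
```

Run it from a Cygwin shell as `sh build-tesseract.sh build`; without the `build` argument it only defines the functions.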

Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-30 Thread ShreeDevi Kumar
On Cygwin, Marco Atzeri has packaged Tesseract 3.04.00 as well as the
training utilities, along with some training data. Instructions for installing
Cygwin are here: https://cygwin.com/cygwin-ug-net/setup-net.html

Tesseract-specific packages to install:

tesseract-ocr            3.04.00-2
tesseract-ocr-eng        3.04-1
tesseract-training-core  3.04-1
tesseract-training-eng   3.04-1
tesseract-training-util  3.04.00-2
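A minimal sketch for installing the packages listed above unattended instead of clicking through the setup GUI. It assumes the 64-bit installer setup-x86_64.exe and its -q (unattended mode) and -P (comma-separated package list) options; the script only prints the command, which is then run from a Windows prompt in the installer's directory. Versions are not pinned: setup installs whatever the chosen mirror currently offers, which may be newer than the builds named above.

```shell
#!/bin/sh
# Package names from the list above, separated by single spaces.
TESS_PACKAGES="tesseract-ocr tesseract-ocr-eng tesseract-training-core tesseract-training-eng tesseract-training-util"

# Join the names with commas, the separator setup's -P option expects.
join_packages() {
    printf '%s' "$TESS_PACKAGES" | tr ' ' ','
}

# Print (do not run) the unattended-install command for the 64-bit installer.
print_setup_cmd() {
    printf 'setup-x86_64.exe -q -P %s\n' "$(join_packages)"
}

print_setup_cmd
```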


ShreeDevi

Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com

Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-29 Thread bácsi Kazi
Dear Zdenko!

Thank you for your reply! Even though the original file was in Italian,
your output is quite impressive!
I found a guide on how to compile with Cygwin:
http://vorba.ch/2014/tesseract-cygwin.html
So I installed 64-bit Cygwin with the necessary packages. Everything went
fine with Leptonica, but I got stuck with Tesseract: during make, when
processing ccutil/ambigs.cpp, the strtok_r.h file is missing. It is in the
vs2010/port folder, but if I copy it there, the compiler reports it as
ambiguous. Because of my Leptonica installation, I ran configure as:
CPPFLAGS="-I/usr/local/include" LDFLAGS="-L/usr/local/lib" ./configure
So I can't even get a "normal" installation, let alone the one described
here: https://github.com/tesseract-ocr/tesseract/wiki/Compiling
I'm not familiar with this stuff; that's why I was asking for an installer
(I couldn't find the one you were referring to).
I also couldn't work out exactly what you suggested in your last line.
Greetings,

Kazi

Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-28 Thread zdenko podobny
When you ask for support, please provide example files.
Have you tried the latest version of Tesseract?

Zdenko

On Sun, Dec 27, 2015 at 9:43 PM, bácsi Kazi wrote:

> Could you help? Have I missed something blatantly trivial?
> Any help would be highly appreciated!
>
> Kazi
>
> On Tuesday, December 15, 2015 at 8:33:27 AM UTC+1, bácsi Kazi wrote:
>
>> Hi there!
>>
>> I'm experimenting with Tesseract 3.02, and I need precise recognition of
>> capital letters. Unfortunately my files are full of all caps and small
>> caps. During training, if I included such words in the sample, I got
>> random capitals in the rest of the text. I tried putting them into a new
>> font instead: same result. Adding them to the dictionary files helped
>> somewhat, but letters such as o, u and v are still problematic, e.g.
>> HELLo WoRLD & friends, despite HELLO WORLD being in the dictionary.
>> It's quite similar to this:
>> https://code.google.com/p/tesseract-ocr/issues/detail?id=691
>> What is your experience? How should I train Tesseract for caps? Is it
>> better in later versions? Is there a configuration parameter to set?
>> Thanks!
>>
>> Kazi
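One possible way to act on the dictionary idea, not something proposed in this thread: make sure every word also has an all-caps form before the word list is compiled into a word DAWG. A minimal sketch assuming a one-word-per-line list, the Tesseract 3.x training tool wordlist2dawg (usage: wordlist2dawg wordlist dawgfile unicharset), and hypothetical file names (words-with-caps.txt, ita.word-dawg, ita.unicharset). The resulting DAWG would still have to be packed into the .traineddata file, e.g. with combine_tessdata, which is not shown. Kazi reports that the names were already in the dictionary in all caps and the errors remained, so this is no guaranteed fix.

```shell
#!/bin/sh
# Print every word of a one-word-per-line list plus its all-caps form,
# de-duplicated byte-wise (LC_ALL=C keeps "hello" and "HELLO" distinct).
# Caveat: many tr implementations (e.g. GNU tr) only map single-byte
# characters, so accented letters may not be upper-cased.
with_caps_variants() {
    { cat "$1"; tr '[:lower:]' '[:upper:]' < "$1"; } | LC_ALL=C sort -u
}

# Hypothetical file names; ita.unicharset must come from your own training.
build_word_dawg() {
    with_caps_variants "$1" > words-with-caps.txt
    wordlist2dawg words-with-caps.txt ita.word-dawg ita.unicharset
}
```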

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to tesseract-ocr@googlegroups.com.
Visit this group at https://groups.google.com/group/tesseract-ocr.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8x_As34bgBffwGhBiyEax6TzjP49xkk%2BbLtRzKmZ0Nf_g%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Re: [tesseract-ocr] Re: All-caps, small-caps

2015-12-28 Thread zdenko podobny
First of all, there is no policy of not providing Windows installers.
There is no installer because there is nobody who would maintain it and
solve its problems (e.g. NSIS destroys my PATH variable on Windows ;-) ).
Everybody is busy programming :-) (something else).

Next: there is a Windows build based on Cygwin, so if you need a portable
Windows version, you can get it (search this forum).

Next: attached you can find the output created with the current Tesseract
code, using:
tesseract example.png example -l spa
(I renamed your file, and I hope I chose the correct language for OCR.) It
seems the result is better than yours, including capitalization.
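Kazi points out that the sample was Italian, not Spanish. A minimal sketch of rerunning the same command with the Italian language code, assuming ita.traineddata is installed in Tesseract's tessdata directory (it is not among the Cygwin packages ShreeDevi lists); the script name ocr-ita.sh is hypothetical.

```shell
#!/bin/sh
# Sketch: OCR scans as Italian ("-l ita") instead of Spanish ("-l spa").
# Assumes ita.traineddata is installed in Tesseract's tessdata directory.

# tesseract writes <base>.txt, so derive <base> from the image file name.
out_base() {
    name=${1##*/}                # drop any leading directory
    printf '%s\n' "${name%.*}"   # drop the extension
}

ocr_italian() {
    tesseract "$1" "$(out_base "$1")" -l ita
}

# Process every image named on the command line (none given: do nothing).
for img in "$@"; do
    ocr_italian "$img"
done
```

For example, `sh ocr-ita.sh example.png` should write example.txt to the current directory, since tesseract appends .txt to the output base name.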

IMO the tesseract executable is a nice example of how to use the Tesseract
library. Maybe you should try using the library directly.


Zdenko

On Mon, Dec 28, 2015 at 7:00 PM, bácsi Kazi wrote:

> Dear Zdenko,
>
> I've attached an example file. You can see Enrico, Antonio and
> Roberto in the output with this mistake, even though all these names are
> present in the dictionary in all caps.
> I haven't tried later versions, because you have a policy of not providing
> Windows installers, and I was busy with other programming. But if you say
> it's worth it, I'll try. Is there any guide on how to create a portable
> version for Windows?
> Thanks again!
>
> Kazi

Parliamentary Proceedings

Chamber of Deputies

XII LEGISLATURE - DEBATES - SITTING OF 6 OCTOBER 1994

the facts referred to in the first and second paragraphs,
if the usurious interest or advantages had
turned out to be seven and a half times higher than the
rate