I have exactly the same problem with Amharic. Three characters are
missing, and they are ruining the OCR result.
Dear Shree, can you help me please?
On Friday, January 6, 2017 at 3:50:38 PM UTC+3 shree wrote:
> I have uploaded modified nor.traineddata at
>
>
I think I might stick with the postprocessing for now; there are too many
oddities I would need to learn to be able to compile it ;). Still, I think this
project is awesome, and I might take it up a notch and try the same thing I am
doing now using just .NET code :)
Actually, postprocessing with a replace for AE will be the best bet, as 4.0 is
slower than the Tesseract engine for Latin-based scripts.
You can experiment with 4.0.0alpha.
See https://github.com/tesseract-ocr/tesseract/wiki/Compiling
You will also need to compile the latest version of Leptonica.
No worries, I will play around and see what I can get working. For now I am
using a simple replace in my script to handle the Æ.
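
For anyone who wants to do the same, here is a minimal Python sketch of such a replace step. The thread does not say what Tesseract actually outputs in place of Æ, so the `MISREADS` list and the function name `restore_capital_ae` are assumptions made up for illustration; adjust the list to whatever appears in your own OCR output.

```python
import re

# Hypothetical misreadings of "Æ". These are assumptions, not observed
# Tesseract output; replace them with what your OCR result really contains.
MISREADS = ("AE", "Ae")


def restore_capital_ae(text, misreads=MISREADS):
    """Replace assumed misreadings of 'Æ' at the start of a word."""
    alternatives = "|".join(re.escape(m) for m in misreads)
    # \b anchors the match at a word boundary, so text in the middle of a
    # word is never touched. The lookahead requires a following letter,
    # which leaves a standalone "AE" (for example an abbreviation) alone.
    pattern = re.compile(r"\b(?:" + alternatives + r")(?=[A-Za-zÆØÅæøå])")
    return pattern.sub("Æ", text)
```

Restricting the replace to the start of a word fits the observation that uppercase Æ does not occur in the middle of a word. It can still damage genuine words or names that begin with the same letters, so a short exception list may be needed.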
How would I go about compiling Tesseract 4.0 alpha using git and cmake? The
wiki says the 4.0 alpha source code is available in the master
branch of the
Sorry, I am not familiar with PowerShell and NuGet.
If you are on Windows, you can try the experimental 4.0.0alpha binaries
for gImageReader, a GUI front end to Tesseract. You can OCR a PDF
directly or load multiple images at the same time.
- excuse the brevity, sent from mobile
On
Thanks Shree :D. Really appreciate it. Will this work with v3.03 too? I am
basing my code on this: https://github.com/jourdant/powershell-paperless
and there is a script to initialize the environment that is getting the
tesseract files from here: https://nuget.org/api/v2/package/tesseract-ocr.
I have uploaded modified nor.traineddata at
https://github.com/Shreeshrii/tessdata4alpha/blob/master/nor.traineddata
See attached log and info file for commands used in training. It took about
9 hours on my PC, for only about 1700 iterations, and then my PC froze, so I
rebooted and created the
On Thursday, January 5, 2017 at 4:39:01 AM UTC+1 shree wrote:
>
> Ray is planning to retrain the languages for the new 4.0.0 version
> sometime in January. So it would be helpful if you could open an issue on
> https://github.com/tesseract-ocr/langdata/issues with this information.
>
Is it
Tried 'Finetune' - that does not help with adding a character.
Trying 'Add a layer' now.
ShreeDevi
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
On Thu, Jan 5, 2017 at 8:59 PM, Ludvig F Aarstad wrote:
>
Fantastic, thanks :).
--
You received this message because you are subscribed to the Google Groups
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to tesseract-ocr+unsubscr...@googlegroups.com.
To post to this group, send email to
I will give it a try and let you know.
I can come up with several samples, if that helps.
I also realized that the occurrence of Æ in the beginning of a sentence is
quite rare. It will in most cases only be for names of people (surnames
mostly) and names of places and streets in addition to some specific
Norwegian words that can
Ray is planning to retrain the languages for the new 4.0.0 version sometime
in January. So it would be helpful if you could open an issue on
https://github.com/tesseract-ocr/langdata/issues with this information.
Also, if you can provide a sample representative Norwegian text including Æ,
I will
If someone feels up to it, is there any chance of dumbing down the procedure
for adding a missing letter to the Norwegian language data? I am happy to do
the legwork; I just need to understand the concept, and I don't quite
understand it when reading the guides.
An easy list containing the steps would do
>
> Hm, in Norwegian it isn't that rare. Or at least shouldn't be ;). Æ is
>> the uppercase version of æ, and it would never occur in the middle of a
>> word.
>>
>
> I find it strange that it has been left out altogether. What must I do to
>> get it in there?
>>
>
Tuesday, January 3, 2017
First, the latest version is 3.04 (although there's also a tag for 3.05).
Second, there will soon (hopefully) be a release for 4.00 which will make
3.x obsolete.
Having said that, it looks like the root cause of your problem is that
Tesseract doesn't know Æ is a possible letter for Norwegian.
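
One way to check this is to look at the unicharset inside the traineddata file. The sketch below is a rough Python illustration, not an official tool. It assumes the unicharset has already been extracted to a text file (as far as I know, `combine_tessdata -u nor.traineddata nor.` does this for 3.x data), that the first line of that file is the entry count, and that every following line starts with the symbol itself. The function names are made up for this example.

```python
def unicharset_symbols(lines):
    """Return the set of symbols listed in the lines of a unicharset file."""
    symbols = set()
    # Assumption: line 0 is the entry count and each later line begins
    # with the symbol, followed by whitespace-separated properties.
    for line in lines[1:]:
        fields = line.split()
        if fields:
            symbols.add(fields[0])
    return symbols


def missing_symbols(lines, wanted):
    """List the characters from `wanted` that the unicharset lacks."""
    present = unicharset_symbols(lines)
    return [ch for ch in wanted if ch not in present]


def missing_from_file(path, wanted):
    """Same check, reading the unicharset from a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return missing_symbols(handle.read().splitlines(), wanted)
```

Calling `missing_from_file("nor.unicharset", "ÆØÅæøå")` would then list whichever of those letters the language data does not contain; the file name here is just the one the extraction step above would be expected to produce.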