Re: [tesseract-ocr] Improving text recognition in musical scores

ShreeDevi Kumar Mon, 22 Jan 2018 00:23:20 -0800

You could try Tesseract 4.0.0-alpha (latest commit from the master branch),
which will allow you to use the 'Latin' traineddata. It supports most
languages written in Latin script. See if that gives you better recognition
for the text.
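
A minimal sketch of such a run, driven from Python through the Tesseract 4
command line. It assumes that Latin.traineddata (from the script/ folder of
the tessdata repository) has been copied directly into the tessdata
directory; if it is left in the script/ subfolder, the language name would
be "script/Latin" instead. The helper names and file names are hypothetical.
Page segmentation mode 3 corresponds to PSM_AUTO.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="Latin", psm=3):
    """Return the argument list for a Tesseract 4 command-line run.

    lang="Latin" assumes Latin.traineddata sits directly in the tessdata
    directory (an assumption, see above). psm=3 is fully automatic page
    segmentation without orientation detection (PSM_AUTO).
    """
    return [
        "tesseract",
        str(image_path),
        str(output_base),
        "-l", lang,
        "--psm", str(psm),
    ]


def run_ocr(image_path, output_base, lang="Latin", psm=3):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, lang, psm)
    subprocess.run(command, check=True)
    # Tesseract appends ".txt" to the output base name by default.
    return f"{output_base}.txt"
```

Calling run_ocr("score_no_staves.png", "score_text") would then write the
recognized text to score_text.txt, which can be compared against the output
of the 3.05.01 setup on the same image.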


ShreeDevi
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Mon, Jan 22, 2018 at 6:49 AM, 'Max Poliakovski' via tesseract-ocr <
[email protected]> wrote:

> Hello,
>
> the Audiveris music scanner <http://www.audiveris.org> utilizes Tesseract
> OCR v3.05.01 for recognition of textual items. The OCR is invoked after all
> basic musical objects (staves, notes, beams) have been recognized.
>
> Text recognition is performed on the preprocessed image with staves
> removed. Tesseract is currently executed in PSM_AUTO mode. The text
> language(s) are usually specified a priori by the user.
>
> We're currently looking for ways to improve text recognition because the
> current results we obtain with Tesseract are far from being satisfactory.
>
> Needless to say, musical scores usually represent a very difficult target
> for OCR systems. In order to understand why, let us analyze textual items
> in such a score (see attachment):
>
>    1. the title of the piece, its composer and the arranger's name are
>    set in a bold typeface
>    2. there is a tempo indication ("With conviction") that contains a
>    musical symbol (the crotchet) which Tesseract fails to recognize
>    properly
>    3. the lyrics are scattered between the staves in the form of
>    syllables followed by whitespace and hyphens ("-"/"_")
>    4. chord symbols are located above the staves and usually contain
>    characters and character sequences that confuse the OCR
>
> The above is just the tip of the iceberg, because items from
> categories 1-3 can be written in different languages or even mix
> several languages together.
>
> Improved recognition of lyrics (3) and chords (4) is crucial because of
> their importance for the musical context.
>
> What can be done in order to tweak Tesseract towards better recognition
> of scattered syllables (as in the case of lyrics) and unusual character
> sequences (as in the case of chords)?
>
> We'd greatly appreciate any suggestions.
>
> Thank you in advance!
> Cheers
> Max Poliakovski from Audiveris project
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/f062f430-35ac-4010-8e80-e1864d3f1cb3%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

