[ocropus] Split up page into words

OCRopus newbie Mon, 14 Jan 2013 17:21:10 -0800

Hi there,

I would like to split scanned books of handwritten text into individual 
word images. No OCR should be attempted.


The idea is as follows:

1) Split each scanned page first into lines, then into word images, 
maintaining the word <-> page relationship.

2) Possibly discard some of the word images based on some criteria

3) Use some algorithm to sort the word images by "similarity". Ideally, 
similar words would end up close to each other.

Use all of this to create an index of the book.
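
To make step 3 concrete, here is a minimal sketch of one possible 
ordering, assuming each word image is a 2-D NumPy array tagged with its 
page number. The names to_feature and order_by_similarity are 
hypothetical and not part of OCRopus, and plain pixel distance on 
resampled images is only a crude stand-in for a real handwriting 
similarity measure.

```python
import numpy as np

def to_feature(img, shape=(16, 48)):
    """Resample a 2-D word image to a fixed grid (nearest neighbour) and flatten it."""
    img = np.asarray(img, dtype=float)
    rows = np.arange(shape[0]) * img.shape[0] // shape[0]
    cols = np.arange(shape[1]) * img.shape[1] // shape[1]
    return img[np.ix_(rows, cols)].ravel()

def order_by_similarity(words):
    """Greedy nearest-neighbour ordering.

    words: list of (page_no, image) pairs.
    Returns the same pairs, reordered so that each entry is followed by the
    not-yet-used entry whose feature vector is closest (Euclidean distance).
    """
    if not words:
        return []
    feats = np.stack([to_feature(img) for _, img in words])
    remaining = set(range(1, len(words)))
    order = [0]  # start the chain at the first word (arbitrary choice)
    while remaining:
        last = feats[order[-1]]
        nxt = min(sorted(remaining),
                  key=lambda i: np.linalg.norm(feats[i] - last))
        order.append(nxt)
        remaining.remove(nxt)
    return [words[i] for i in order]
```

Because every entry keeps its page number, the ordered list can be 
walked to collect, for each run of similar-looking words, the pages on 
which they occur.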

Is this something OCRopus can be useful for?

I've tried OCRopus for the first part. It works well for splitting a page 
into lines, but then goes directly to characters; there is no word step.
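
A minimal sketch of the missing word step, assuming each line has 
already been extracted as a binarized 2-D NumPy array (ink = 1, 
background = 0). It cuts the line wherever a run of ink-free columns is 
at least min_gap wide. The function name and the min_gap and 
ink_threshold parameters are hypothetical, this is not an OCRopus API, 
and handwritten text would likely need the gap width tuned per writer.

```python
import numpy as np

def split_line_into_words(line, min_gap=3, ink_threshold=0.5):
    """Return (start_col, end_col) ranges of words in a line image.

    end_col is exclusive. Runs of blank columns shorter than min_gap are
    treated as gaps between letters and stay inside the word.
    """
    ink_cols = (np.asarray(line) > ink_threshold).any(axis=0)
    words = []
    start = None  # first column of the word currently being scanned
    gap = 0       # number of consecutive blank columns seen so far
    for x, has_ink in enumerate(ink_cols):
        if has_ink:
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # The word ended where the blank run began.
                words.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the last word, dropping any trailing blank columns.
        words.append((start, len(ink_cols) - gap))
    return words
```

Each range can then be used to crop the word out of the line, for 
example line[:, start:end], while recording the page and line it came 
from.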

Thanks for any input!

Cheers,
Gerhard



