> I am currently using the OCR system Cuneiform. For flexibility I want
> to use the hOCR format.

Great!

> In order to standardize the output from Cuneiform, I want to follow
> the standard as closely as possible.
> OCRopus refers to this page for the standard:
> http://docs.google.com/View?docid=dfxcv4vc_67g844kf
>
> I have not been able to find any other spec so I suppose this is still
> the official standard (last update 2007).

Yes, that's the official document.

> Who would be the owner of the hOCR spec?

I maintain it.

> Are any changes foreseen/planned?

No; most of the hard parts of OCR output formats (styles, fonts,
script-dependent issues) are taken care of by the HTML spec.  hOCR
just describes how to denote OCR-specific information like bounding
boxes.
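
To make the bounding-box part concrete, here is a minimal sketch of
reading boxes back out of hOCR output. It assumes the usual convention
of a "bbox x0 y0 x1 y1" property in the element's title attribute; the
sample fragment, its coordinates, and the names BBoxCollector and
collect_boxes are made up for illustration and are not part of the spec.

```python
from html.parser import HTMLParser

# Hypothetical hOCR fragment; the coordinates and text are invented.
SAMPLE = (
    '<div class="ocr_page" title="bbox 0 0 1000 1400">'
    '<span class="ocr_line" title="bbox 100 120 900 160">Hello world</span>'
    '</div>'
)


class BBoxCollector(HTMLParser):
    """Collect (class, (x0, y0, x1, y1)) for every element carrying a bbox."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        title = attrs.get("title") or ""
        # The title may hold several properties separated by semicolons.
        for prop in title.split(";"):
            parts = prop.split()
            if len(parts) == 5 and parts[0] == "bbox":
                self.boxes.append((cls, tuple(int(p) for p in parts[1:])))


def collect_boxes(html):
    parser = BBoxCollector()
    parser.feed(html)
    return parser.boxes


if __name__ == "__main__":
    for cls, box in collect_boxes(SAMPLE):
        print(cls, box)
```

Everything else in the fragment is ordinary HTML, so fonts, styles and
script issues need no special handling here.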

If there is something completely different you need (e.g.,
bibliographic markup, etc.), just use and/or define a separate
microformat to represent it.

If there is something engine-specific you need, pick an ocrx_... tag
that doesn't conflict with an existing one.

ocr_... tags are intended to represent engine-independent information,
so for that, it's probably a good idea to talk about it before picking
a new tag.
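
As a rough sketch of that naming rule, the check below sorts a class
name by prefix and refuses an engine-specific name that is already
taken. The helper names and the example tag "ocrx_cuneiform_zone" are
hypothetical, and the set of existing tags is something you would have
to fill in from the spec yourself.

```python
def classify_hocr_class(name):
    """Sort a class name into the two hOCR prefixes, or 'other'."""
    if name.startswith("ocrx_"):
        return "engine-specific"
    if name.startswith("ocr_"):
        return "engine-independent"
    return "other"


def can_add_engine_tag(name, existing):
    """True if name is an ocrx_ tag that does not clash with an existing one.

    'existing' is an assumed, caller-supplied collection of tags in use.
    """
    return classify_hocr_class(name) == "engine-specific" and name not in existing


if __name__ == "__main__":
    in_use = {"ocr_page", "ocr_line", "ocrx_word"}
    # Hypothetical new engine-specific tag.
    print(can_add_engine_tag("ocrx_cuneiform_zone", in_use))
```

A new ocr_... name deliberately fails this check, since those should be
discussed first rather than picked unilaterally.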

Tom
