> I am currently using the ocr system Cuneiform. For flexibility I want
> to use the hocr format.

Great!

> In order to standardize the output from Cuneiform, I want to follow
> the standard as close as possible.
> Ocropus refers to this page for the standard:
> http://docs.google.com/View?docid=dfxcv4vc_67g844kf
>
> I have not been able to find any other spec so I suppose this is still
> the official standard (last update 2007).

Yes, that's the official document.

> Who would be the owner of the hocr spec?

I maintain it.

> Are any changes foreseen/planned?

No; most of the hard parts of OCR output formats (styles, fonts,
script-dependent issues) are taken care of by the HTML spec. hOCR just
describes how to denote OCR-specific information like bounding boxes.

If there is something completely different you need (e.g.,
bibliographic markup), just use and/or define a separate microformat
to represent it.

If there is something engine-specific you need, pick an ocrx_... tag
that doesn't conflict with an existing one. ocr_... tags are intended
to represent engine-independent information, so for those it's
probably a good idea to discuss it before picking a new tag.

Tom
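To make the bounding-box and tag-prefix points concrete, here is a minimal sketch of how an engine might emit hOCR-style elements. It assumes the usual convention of carrying the box as "bbox x0 y0 x1 y1" in the element's title attribute. The helper name hocr_span and the class name ocrx_cuneiform_word are made up for illustration; they are not part of the spec or of Cuneiform.

```python
from html import escape


def hocr_span(css_class, bbox, text):
    """Wrap recognized text in a span whose title holds an hOCR-style bbox.

    hocr_span is a hypothetical helper, not part of any OCR engine or library.
    """
    x0, y0, x1, y1 = bbox
    title = f"bbox {x0} {y0} {x1} {y1}"
    # Escape the class name and the text so the result stays valid HTML.
    return (
        f'<span class="{escape(css_class)}" title="{title}">'
        f"{escape(text)}</span>"
    )


# Engine-independent information uses an ocr_ class such as ocr_line.
line = hocr_span("ocr_line", (10, 20, 300, 48), "Hello & welcome")

# Engine-specific information uses an ocrx_ class; this name is invented.
word = hocr_span("ocrx_cuneiform_word", (10, 20, 80, 48), "Hello")

print(line)
print(word)
```

Everything else about the output (fonts, styles, text direction) would be ordinary HTML and CSS, which is why the format itself needs so few changes.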
