The code and logic have changed rather dramatically, and it's probably easier to do this differently in 0.4.
If you want to do this, here's my recommendation:

Look at the buildhtml command in C++. You can either use that and modify it, or you can rewrite it in Python.

You get the information about the line bounding boxes from book/*.pseg.png. For each of those lines, you get the corresponding character bounding boxes from book/*/*.cseg.png and the corresponding characters from book/0000/*.txt. You put all that information together and generate the HTML you like. That's how we're going to do it in 0.5, and also in our PDF generation code.

However, there is one caveat: right now, the file names for text lines are of the form book/0000/0000.png; this will change to book/0000/000000.png, where the six-digit number is the hex digits corresponding to the line color in the book/*.pseg.png image. Right now, it's hard to relate the text line file names to the rectangles on the page.

Have a look here:

https://docs.google.com/View?id=dd7kkvh9_0hrnvq2cv
https://docs.google.com/View?id=dfxcv4vc_92c8xxp7

Tom

On Fri, Jul 10, 2009 at 23:59, travis<[email protected]> wrote:
>
> would it be possible for me to transfer the code that did this in 0.3
> to 0.4? would it be bad (impossible?) to do this?
>
> .travis
>
> On Jul 10, 12:59 pm, tmbdev <[email protected]> wrote:
>> Not yet, but it's a frequently requested feature, we need it for other
>> applications, and we're working on it.
>>
>> I've added an issue to keep track of it.
>>
>> Tom
>>
>> On Jul 8, 1:37 am, travis <[email protected]> wrote:
>>
>> > i read a post that said getting word bounding boxes was possible in
>> > 0.3 and thus i am hopeful that it is also possible in 0.4. my question
>> > is this: what is the easiest way to get word bounding boxes? i am
>> > quite willing to modify code, but i would like to do it the most
>> > "correct" way possible.
>>
>> > .travis
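
For anyone who wants to try the Python route, here is a rough sketch of the idea described in the reply: take the line colors from a page segmentation image, compute one bounding box per color, and derive the planned six-hex-digit line file name from the same color. This is not OCRopus code. It works on a plain 2D grid of packed RGB integers rather than a real book/*.pseg.png, and the names pack_rgb, line_boxes and line_image_name are made up for illustration. It also assumes that white (0xFFFFFF) is the background and that the hex digits in the file name are lowercase; check both against your own data.

```python
from collections import namedtuple

# Inclusive pixel coordinates of a rectangle on the page.
Box = namedtuple("Box", "x0 y0 x1 y1")

# Assumption: white pixels in the segmentation image are background.
BACKGROUND = 0xFFFFFF


def pack_rgb(r, g, b):
    """Pack an RGB triple into a single integer, e.g. (1, 0, 2) -> 0x010002."""
    return (r << 16) | (g << 8) | b


def line_boxes(labels):
    """Return {color: Box} for every non-background color in a 2D grid."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, color in enumerate(row):
            if color == BACKGROUND:
                continue
            if color in boxes:
                b = boxes[color]
                boxes[color] = Box(min(b.x0, x), min(b.y0, y),
                                   max(b.x1, x), max(b.y1, y))
            else:
                boxes[color] = Box(x, y, x, y)
    return boxes


def line_image_name(page_dir, color):
    """Build the planned line file name from the six hex digits of the color.

    Assumption: lowercase hex digits.
    """
    return "%s/%06x.png" % (page_dir, color)


# Tiny synthetic "page" with two text lines, A and B.
W = BACKGROUND
A = pack_rgb(1, 0, 1)
B = pack_rgb(1, 0, 2)
page = [
    [W, W, W, W, W],
    [W, A, A, A, W],
    [W, W, W, W, W],
    [W, B, B, W, W],
    [W, B, B, W, W],
]

for color, box in sorted(line_boxes(page).items()):
    print(line_image_name("book/0000", color), box)
```

With a real page you would first load the pseg image into such a grid with an image library of your choice. The same bounding-box loop should apply to the character segmentation of a single line, with each box then paired with the matching character from the line's .txt file.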
