The code and logic have changed rather dramatically, and it's probably
easier to do this differently in 0.4.

If you want to do this, here's my recommendation:

Look at the buildhtml command, which is written in C++. You can either
modify it or rewrite it in Python.

The line bounding boxes come from book/*.pseg.png. For each of those
lines, you get the corresponding character bounding boxes from
book/*/*.cseg.png and the corresponding characters from
book/0000/*.txt.
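A minimal Python sketch of the bounding-box step. It assumes that each line (in a .pseg.png) or character (in a .cseg.png) is drawn in its own solid color, that the background is white (0xFFFFFF), and that the image has already been loaded as rows of packed 0xRRGGBB integers. The name bboxes_by_color is hypothetical and is not part of OCRopus.

```python
from typing import Dict, List, Tuple

# A bounding box as (x0, y0, x1, y1), inclusive pixel coordinates,
# with the origin at the top-left corner of the image.
BBox = Tuple[int, int, int, int]

def bboxes_by_color(pixels: List[List[int]],
                    background: int = 0xFFFFFF) -> Dict[int, BBox]:
    """Return the bounding box of every non-background color.

    `pixels` is a list of rows, each a list of packed 0xRRGGBB ints.
    ASSUMPTION: `background` is white; check what your version writes.
    """
    boxes: Dict[int, List[int]] = {}
    for y, row in enumerate(pixels):
        for x, color in enumerate(row):
            if color == background:
                continue  # not part of any segment
            box = boxes.get(color)
            if box is None:
                boxes[color] = [x, y, x, y]  # first pixel of this segment
            else:
                # Grow the box to include this pixel.
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return {c: (b[0], b[1], b[2], b[3]) for c, b in boxes.items()}
```

With Pillow, `Image.open(path).convert("RGB").getdata()` yields (r, g, b) tuples in row-major order that can be packed with `(r << 16) | (g << 8) | b`. The same function works for both the page-level line segmentation and the per-line character segmentation.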

Put all that information together and generate whatever HTML you like.
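Since the original question was about word bounding boxes, here is one way to get them, sketched in Python under stated assumptions: the k-th non-space character of a line's .txt is drawn with segment color k (k = 1, 2, ...) in that line's .cseg.png; character boxes are relative to the line image; and the line's top-left corner on the page is known from the .pseg.png. A word's box is then the union of its character boxes, emitted here as hOCR-style `ocrx_word` spans. The helper names are hypothetical.

```python
from html import escape
from typing import Dict, List, Optional, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

def union(a: BBox, b: BBox) -> BBox:
    """Smallest box containing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def word_boxes(text: str, char_boxes: Dict[int, BBox]) -> List[Tuple[str, BBox]]:
    """Group characters into space-separated words with their boxes.

    ASSUMPTION: the k-th non-space character of `text` corresponds to
    segment number k in `char_boxes`.
    """
    words: List[Tuple[str, BBox]] = []
    current = ""
    box: Optional[BBox] = None
    k = 0
    for ch in text:
        if ch.isspace():
            if current:
                words.append((current, box))
            current, box = "", None
            continue
        k += 1
        cb = char_boxes[k]
        box = cb if box is None else union(box, cb)
        current += ch
    if current:
        words.append((current, box))
    return words

def words_to_html(words: List[Tuple[str, BBox]],
                  offset: Tuple[int, int] = (0, 0)) -> str:
    """Render words as hOCR-style spans in page coordinates.

    `offset` is the top-left corner of the line's box on the page.
    """
    dx, dy = offset
    spans = []
    for word, (x0, y0, x1, y1) in words:
        spans.append('<span class="ocrx_word" title="bbox %d %d %d %d">%s</span>'
                     % (x0 + dx, y0 + dy, x1 + dx, y1 + dy, escape(word)))
    return " ".join(spans)
```

The character-to-segment alignment is the weak point: if the recognizer merged or split segments, the lookup either raises KeyError or silently shifts boxes, so verify that alignment against your version's output before trusting the word boxes.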

That's how we're going to do it in 0.5, and it's also how our PDF
generation code will work.

However, there is one caveat: right now, the file names for text lines
are of the form book/0000/0000.png, which makes it hard to relate the
text line files to the rectangles on the page. This will change to
book/0000/000000.png, where the six-digit number is the hexadecimal
value of the line's color in the book/*.pseg.png image.
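A small Python sketch of the planned naming scheme, showing how a line image's file name maps straight back to the line's color, and therefore to its rectangle in the page's .pseg.png. It assumes lowercase hex digits, a four-digit page directory as in book/0000, and colors packed as 0xRRGGBB integers; the function names are hypothetical.

```python
import os

def line_image_path(book: str, page: int, color: int) -> str:
    """Path of a line image under the planned scheme:
    book/PPPP/RRGGBB.png, where RRGGBB is the line's color in hex.
    ASSUMPTION: lowercase hex, four-digit page directory."""
    return os.path.join(book, "%04d" % page, "%06x.png" % color)

def color_from_path(path: str) -> int:
    """Recover the packed 0xRRGGBB color from a line image path."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return int(stem, 16)  # accepts upper- or lowercase hex
```

Given a dictionary of line boxes keyed by packed color, `color_from_path` gives the key for any line image directly, with no need to rely on the order in which lines were numbered.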

Have a look here:

https://docs.google.com/View?id=dd7kkvh9_0hrnvq2cv

https://docs.google.com/View?id=dfxcv4vc_92c8xxp7

Tom

On Fri, Jul 10, 2009 at 23:59, travis <[email protected]> wrote:
>
> would it be possible for me to transfer the code that did this in 0.3
> to 0.4? would it be bad (impossible?) to do this?
>
>   .travis
>
> On Jul 10, 12:59 pm, tmbdev <[email protected]> wrote:
>> Not yet, but it's a frequently requested feature, we need it for other
>> applications, and we're working on it.
>>
>> I've added an issue to keep track of it.
>>
>> Tom
>>
>> On Jul 8, 1:37 am, travis <[email protected]> wrote:
>>
>> > i read a post that said getting word bounding boxes was possible in
>> > 0.3 and thus i am hopeful that it is also possible in 0.4. my question
>> > is this: what is the easiest way to get word bounding boxes? i am
>> > quite willing to modify code, but i would like to do it the most
>> > "correct" way possible.
>>
>> >    .travis
>

You received this message because you are subscribed to the Google Groups
"ocropus" group.