https://bugzilla.wikimedia.org/show_bug.cgi?id=57807

George Orwell III <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected],
                   |                            |[email protected]

--- Comment #5 from George Orwell III <[email protected]> ---
(In reply to vladjohn2013 from comment #0)
> Merge proofread text back into Djvu files
> 
> . . . The idea is to create an
> export tool that will get word positions and confidence levels using
> Tesseract and then re-map the text layer back into the DjVu file. If
> possible, word coordinates should be kept.

Isn't some of that already possible using DjVuLibre's built-in DjVu-to-XML
scheme? (See attachments.)
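
A minimal Python sketch of what that XML already carries. It assumes djvutoxml's
output nests WORD elements, each with a comma-separated `coords` attribute, under
HIDDENTEXT, as in the DjVuXML layout. It pulls out every word together with its
bounding box, which is the word-position data the proposal wants to keep. The
function name and the hand-written SAMPLE below are hypothetical and are not real
output from the attachments.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (word text, four coords as they appear in the attribute).
# The corner order of the four integers is NOT asserted here; check it
# against the DjVuXML DTD before relying on it.
Word = Tuple[str, Tuple[int, int, int, int]]


def words_from_djvuxml(xml_text: str) -> List[Word]:
    """Collect every WORD element and its coords from DjVuXML text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    words: List[Word] = []
    for el in root.iter("WORD"):
        coords = tuple(int(c) for c in el.get("coords", "").split(",") if c.strip())
        if len(coords) != 4:
            continue  # skip words without a usable bounding box
        words.append(((el.text or "").strip(), coords))
    return words


# Hand-written, hypothetical sample shaped like djvutoxml output.
SAMPLE = """<?xml version="1.0" ?>
<DjVuXML><HEAD></HEAD><BODY>
<OBJECT data="file.djvu" type="image/x.djvu" height="3300" width="2550">
<HIDDENTEXT><PAGECOLUMN><REGION><PARAGRAPH><LINE>
<WORD coords="100,200,180,160">Merge</WORD>
<WORD coords="190,200,260,160">proofread</WORD>
</LINE></PARAGRAPH></REGION></PAGECOLUMN></HIDDENTEXT>
</OBJECT></BODY></DjVuXML>"""

words = words_from_djvuxml(SAMPLE)
```

Keeping the coords untouched, rather than normalizing them, leaves the choice of
corner convention to whatever later writes the text layer back into the DjVu file.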

As far as I can tell, this method was once feasible and pursued, then "abandoned"
some 7+ years ago in favor of the current plain-text dump approach, apparently
because of resource(?) issues at the time. Most of the related bits still seem
(to me) to be in place, judging by these files in
https://git.wikimedia.org/tree/mediawiki%2Fcore :

/includes/media/DjVu.php
/includes/media/DjVuImage.php

It seems (again, to me) that the first step toward making the proposal a
reality is to check whether it's still possible to generate XML from a DjVu
file using MediaWiki et al. as they stand today. I know this is possible on a
vanilla local x86 install of the DjVuLibre software package (again, see the
attachments), but all that online server, Linux, Debian, Ubuntobama stuff is
beyond me, and something along those lines is what is in play here.

So: can anyone successfully generate the DjVuLibre-defined XML from a .djvu
file using only the MediaWiki setup already in place?
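
One way to answer that on a given server before touching MediaWiki at all: a
minimal Python sketch that shells out to DjVuLibre's djvutoxml. It assumes the
tool is on PATH and that, when no output file is given, it writes the XML to
standard output (worth confirming with `man djvutoxml` on the machine in
question). The helper names are hypothetical and are not taken from DjVu.php or
DjVuImage.php.

```python
import shutil
import subprocess
from typing import List


def djvutoxml_command(djvu_path: str) -> List[str]:
    """Build the djvutoxml command line for one DjVu file (hypothetical helper).

    No output file is passed, on the assumption that djvutoxml then
    writes the XML to standard output.
    """
    return ["djvutoxml", djvu_path]


def djvu_to_xml(djvu_path: str) -> str:
    """Run djvutoxml and return its XML output as a string (hypothetical helper)."""
    if shutil.which("djvutoxml") is None:
        raise FileNotFoundError("djvutoxml (part of DjVuLibre) was not found on PATH")
    result = subprocess.run(
        djvutoxml_command(djvu_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="utf-8",     # decode the output as UTF-8 text
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Usage would be something like `djvu_to_xml("Example.djvu")`; a FileNotFoundError
there already answers part of the question for that server.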

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
