Re: [Wikisource-l] Two requests for MediaViewer

Alex Brollo Wed, 01 Oct 2014 13:23:44 -0700

2014-10-01 15:15 GMT+02:00 Jane Darnell <[email protected]>:

> I have seen many messy text-image mixes on Google books, especially older
> texts from manual typesetting days.  That's why I was wondering if it would
> be possible to have a tool that stores pages as you go, so you can step in
> and adjust it on a per page basis. I am not familiar with abbyy.xml files,
> but this may be the way to go
>


I burned out a few million neurons trying to parse abbyy xml files,
since I'm not a professional programmer, but what I vaguely saw and
understood is very, very exciting.  Unluckily my scripts are so rough that
they can't be shared, but I'm certain that real programmers could get
unbelievable results from such tons of data. I also found OCR confidence
values for every character and every word, so that uncertain words
could be highlighted when imported... or passed to a recaptcha tool. But
using abbyy xml would be a next step; what I'd like for now is simply a
mapped text layer from djvu files, made simple and useful for any
wikisource user.
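
To give an idea of the "uncertain words" part, here is a minimal sketch,
not my real scripts. It assumes each character sits in a charParams element
carrying charConfidence and wordStart attributes, and that confidence runs
from 0 to 100; the function name, the threshold of 50 and the tiny sample
are made up for illustration, so check them against a real abbyy xml file.

```python
import xml.etree.ElementTree as ET

# Tiny hand-made sample; the element and attribute names are assumptions
# about the abbyy xml layout, not copied from a real file.
SAMPLE = """<document xmlns="http://www.abbyy.com/FineReader_xml/FineReader6-schema-v1.xml">
<page><block><text><par><line><formatting>
<charParams wordStart="true" charConfidence="98">t</charParams>
<charParams wordStart="false" charConfidence="95">h</charParams>
<charParams wordStart="false" charConfidence="97">e</charParams>
<charParams wordStart="false" charConfidence="100"> </charParams>
<charParams wordStart="true" charConfidence="40">c</charParams>
<charParams wordStart="false" charConfidence="90">a</charParams>
<charParams wordStart="false" charConfidence="22">t</charParams>
</formatting></line></par></text></block></page></document>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code ignores the schema version."""
    return tag.rsplit("}", 1)[-1]


def uncertain_words(xml_text, threshold=50):
    """Return (word, lowest character confidence) for every word whose
    weakest character scores below the threshold (hypothetical cutoff)."""
    root = ET.fromstring(xml_text)
    results = []
    chars, confs = [], []

    def flush():
        # Close the current word and keep it only if it looks uncertain.
        if chars and confs and min(confs) < threshold:
            results.append(("".join(chars), min(confs)))
        chars.clear()
        confs.clear()

    for elem in root.iter():
        name = local_name(elem.tag)
        if name == "line":
            flush()  # a word never continues across lines here
        if name != "charParams":
            continue
        ch = elem.text or " "
        if ch.isspace() or elem.get("wordStart") == "true":
            flush()
        if ch.isspace():
            continue
        chars.append(ch)
        conf = elem.get("charConfidence")
        if conf is not None:
            confs.append(int(conf))
    flush()
    return results


if __name__ == "__main__":
    for word, conf in uncertain_words(SAMPLE):
        print(word, conf)
```

The list it returns could then be used to highlight words in the imported
text or to feed a recaptcha-like tool; taking the weakest character as the
score of the word is just one possible choice.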

Alex

_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
