It is not a trivial matter. The best bet would be to take an existing PDF import tool for a word processor and try to write a similar tool for wikitext.
There is the Oracle PDF Import Extension for OpenOffice; its code can be browsed, and it may give you some ideas:
http://extensions.services.openoffice.org/project/pdfimport

Micru

On Wed, Jun 12, 2013 at 12:38 PM, Alex Brollo <[email protected]> wrote:

> When we tried to convert a PDF file, which is an opaque, closed format,
> into wiki code (a needed step to add links and to convert files into a
> "wiki hypertext"), the work turned into a nightmare. If we simply load
> free PDF books "as they are", I don't see any advantage other than to
> "feed wikisource numbers/statistics", and this is presently far from my
> personal interest.
>
> As you guess, I'm one of the users who don't support Aubrey's enthusiasm
> about texts born digital, even if free. :-)
>
> Alex
>
> 2013/6/12 David Cuenca <[email protected]>
>
>> Nobody is saying anything about using copyrighted works; there are many
>> books that have an open license that would allow including them in
>> Wikisource.
>>
>> For instance in ca-ws we have this translation from 2009:
>>
>> http://ca.wikisource.org/wiki/Llibre:El_secret_de_l%E2%80%99or_que_creix_%282009%29.djvu
>>
>> The original is in the PD, and the translator gave away his rights. It
>> would have been much easier to work directly with the pdf, instead of
>> converting to djvu.
>>
>> Micru
>>
>> On Wed, Jun 12, 2013 at 10:47 AM, Aarti K. Dwivedi <[email protected]> wrote:
>>
>>> If I am not wrong, as of today, most books that were born digital are
>>> still under copyright. Of course, they are available freely on the
>>> internet. But we can't use the pirated copies. How would we go about the
>>> procurement of these books?
>>> If we procure these copyrighted books, then the only thing we would have
>>> to do is to check for proper formatting. Isn't it?
>>>
>>> On Wed, Jun 12, 2013 at 7:58 PM, Lars Aronsson <[email protected]> wrote:
>>>
>>>> On 06/12/2013 02:48 PM, Andrea Zanni wrote:
>>>>
>>>>> We could define some tasks as
>>>>> * corrected the page
>>>>> * OPTIONAL added optional templates/links/annotations
>>>>> *...
>>>>
>>>> Geotagged all the photos, ...
>>>>
>>>> The list doesn't end. You need a generic mechanism
>>>> for any new feature you can invent. But aren't our
>>>> existing templates and categories the best way to
>>>> do this? You could just add to each page:
>>>> {{done|proofread=user1|validated=user2|geotagged=user4|...}}
>>>>
>>>> --
>>>> Lars Aronsson ([email protected])
>>>> Project Runeberg - free Nordic literature - http://runeberg.org/
>>>
>>> --
>>> Aarti K. Dwivedi
>>
>> --
>> Etiamsi omnes, ego non

--
Etiamsi omnes, ego non
_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
