2013/6/12 David Cuenca <[email protected]>:
> It is not a trivial matter. The best bet would be to take an existing pdf
> import tool for a word processor, and try to write a similar tool for
> wikitext.
>
> There is the Oracle PDF Import Extension for Open Office, the code can be
> browsed, maybe it can give you some ideas
> http://extensions.services.openoffice.org/project/pdfimport

PDF scraping is a technique that is gaining more and more attention, since a
lot of data on the web is hidden in PDFs, so some libraries for this task
are under development. My favorite language being Python, I would suggest
this blog post:
http://blog.scraperwiki.com/2010/12/17/scraping-pdfs-now-26-less-unpleasant-with-scraperwiki/

C

_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
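
To give a rough idea of what PDF scraping in Python can look like, here is a minimal sketch. It assumes the PDF has already been converted to an XML layout in the style produced by `pdftohtml -xml` (pages containing `text` elements with `top` and `left` coordinates). The sample data and the function names `page_lines` and `xml_to_text` are made up for illustration and are not taken from the blog post or from any existing import tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample, modeled on the XML layout that `pdftohtml -xml` emits.
# A real file would come from running that converter on the PDF first.
SAMPLE_XML = """<pdf2xml>
<page number="1" height="1188" width="918">
<text top="100" left="80" width="200" height="16" font="0"><b>Chapter I</b></text>
<text top="140" left="175" width="120" height="14" font="1">dark night.</text>
<text top="140" left="80" width="90" height="14" font="1">It was a</text>
<text top="160" left="80" width="150" height="14" font="1">The rain fell.</text>
</page>
</pdf2xml>"""


def page_lines(page):
    """Group the text fragments of one <page> into lines.

    Fragments sharing the same 'top' coordinate are treated as one line
    and joined left to right. This is a simplification: real PDFs often
    need a tolerance on 'top' and extra handling for columns.
    """
    rows = defaultdict(list)
    for el in page.iter("text"):
        content = "".join(el.itertext()).strip()
        if content:
            rows[int(el.get("top"))].append((int(el.get("left")), content))
    return [
        " ".join(text for _, text in sorted(fragments))
        for _, fragments in sorted(rows.items())
    ]


def xml_to_text(xml_string):
    """Return a dict mapping page number to its list of text lines."""
    root = ET.fromstring(xml_string)
    return {int(p.get("number")): page_lines(p) for p in root.iter("page")}


if __name__ == "__main__":
    for number, lines in xml_to_text(SAMPLE_XML).items():
        print(f"== Page {number} ==")
        print("\n".join(lines))
```

The hard part for wikitext import is everything this sketch leaves out: detecting headings, paragraphs, footnotes and columns from the coordinates and font information.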
