Re: [Wikisource-l] Converting pdf files into wiki markup

Cristian Consonni Thu, 13 Jun 2013 17:04:30 -0700

2013/6/12 David Cuenca <[email protected]>:
> It is not a trivial matter. The best bet would be to take an existing PDF
> import tool for a word processor and try to write a similar tool for
> wikitext.
>
> There is the Oracle PDF Import Extension for OpenOffice; its code can be
> browsed and may give you some ideas:
> http://extensions.services.openoffice.org/project/pdfimport


PDF scraping is a technique that is gaining more and more attention,
since a lot of data on the web is hidden in PDF files, so several
libraries for this task are under development.
Since my favorite language is Python, I suggest this blog post:
http://blog.scraperwiki.com/2010/12/17/scraping-pdfs-now-26-less-unpleasant-with-scraperwiki/
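To give an idea of what a PDF-to-wikitext step could look like in Python, here is a minimal sketch. It assumes the PDF has already been converted to XML with poppler's `pdftohtml -xml` command, which emits `<page>` elements containing `<fontspec>` entries and positioned `<text>` lines. The function name `pdfxml_to_wikitext` and the `heading_size` / `para_gap` thresholds are hypothetical: they are crude layout heuristics (big font means heading, big vertical gap means new paragraph), not a general solution.

```python
import xml.etree.ElementTree as ET


def pdfxml_to_wikitext(xml_data, heading_size=16.0, para_gap=5.0):
    """Turn `pdftohtml -xml` output into rough MediaWiki wikitext.

    Hypothetical heuristics, to be tuned per document:
      * a line set in a font of at least `heading_size` becomes a == heading ==
      * a vertical gap larger than `para_gap` between lines starts a paragraph
    """
    root = ET.fromstring(xml_data)
    font_sizes = {}  # fontspec ids can be reused on later pages, so keep one map
    blocks = []      # finished wikitext blocks (headings and paragraphs)
    para = []        # lines of the paragraph currently being built

    def flush():
        # Close the current paragraph, if any, by joining its lines.
        if para:
            blocks.append(" ".join(para))
            para.clear()

    for page in root.iter("page"):
        for spec in page.iter("fontspec"):
            font_sizes[spec.get("id")] = float(spec.get("size"))
        prev_bottom = None
        # Sort by vertical, then horizontal position to approximate reading order.
        texts = sorted(page.iter("text"),
                       key=lambda t: (float(t.get("top")), float(t.get("left"))))
        for t in texts:
            line = "".join(t.itertext()).strip()  # also picks up <b>/<i> children
            if not line:
                continue
            top = float(t.get("top"))
            if font_sizes.get(t.get("font"), 0.0) >= heading_size:
                flush()
                blocks.append(f"== {line} ==")
            else:
                if prev_bottom is not None and top - prev_bottom > para_gap:
                    flush()
                para.append(line)
            prev_bottom = top + float(t.get("height"))
        flush()  # in this sketch paragraphs never continue across pages
    return "\n\n".join(blocks)


# Hand-written sample in the shape of pdftohtml's XML output.
sample = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<fontspec id="0" size="20" family="Times" color="#000000"/>
<fontspec id="1" size="12" family="Times" color="#000000"/>
<text top="100" left="100" width="200" height="22" font="0"><b>Chapter I</b></text>
<text top="140" left="100" width="600" height="15" font="1">It was a dark</text>
<text top="156" left="100" width="600" height="15" font="1">and stormy night.</text>
<text top="190" left="100" width="600" height="15" font="1">The end.</text>
</page>
</pdf2xml>"""

print(pdfxml_to_wikitext(sample))
```

On the sample this prints a `== Chapter I ==` heading followed by two paragraphs. Real scans will need more work (hyphenation across lines, running headers, footnotes), which is where the ideas from an existing import tool would come in.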

C

_______________________________________________
Wikisource-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikisource-l
