PDFBox and COVID-19

Peter Murray-Rust Thu, 26 Mar 2020 03:15:08 -0700

One of the tools in tackling epidemics is to collect, clean and analyze the
science. I have set up a site https://github.com/petermr/openVirus to
download scientific papers and convert them to semantic, queryable form.
There can be many thousands of papers on topics such as ventilators, or
epidemics and schools, as well as genomes and chemistry. Many of us
believe these could hold information useful to current and future COVID-19
strategies (e.g. the Liberian Ebola outbreak was actually predicted in a
paper).
  Most papers are exposed as PDF, which makes it hard for machines to read
them reliably. PDFBox is an essential tool in my workflow. Work on parsing
PDF is never complete, so any volunteers would be welcome (mail me, NOT this
list). I work on diagrams as well as text.


And sincere thanks to the small team that keep PDFBox going. Simply seeing
the daily stream of factual issues is an encouragement to all of us.

P.

-- 
"I always retain copyright in my papers, and nothing in any contract I sign
with any publisher will override that fact. You should do the same".

Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
