RE: Question re installing Tika

Nick Burch Thu, 26 Jun 2014 05:24:20 -0700

On Thu, 26 Jun 2014, Richard wrote:

You wouldn't by chance happen to have programmatically looped through a directory full of PDFs and used Tika to extract each one's contents into separate text or XML files? If so, what do you recommend for doing the extraction?

For a proof of concept, how about something simple like a bash for loop and the tika app?


for i in *.pdf; do
  j="${i%.pdf}"  # file name without the .pdf extension
  java -jar tika-app.jar --text "$i" > "$j.txt"
  java -jar tika-app.jar --xml "$i" > "$j.xml"
done
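
If the loop needs to cope with an empty directory or a file Tika cannot parse, it can be wrapped in a small function. This is only a sketch: TIKA_JAR, run_tika and extract_dir are made-up names, and it assumes tika-app.jar sits in the current directory unless TIKA_JAR says otherwise. The only Tika options it uses are the --text and --xml flags from the loop above.

```shell
#!/usr/bin/env bash
# Sketch only: TIKA_JAR, run_tika and extract_dir are illustrative names,
# not part of Tika itself.

TIKA_JAR="${TIKA_JAR:-tika-app.jar}"   # assumed location of the Tika app jar

# Run the Tika app in the given mode (--text or --xml) on one file.
run_tika() {
  java -jar "$TIKA_JAR" "$1" "$2"
}

# Extract every PDF in a directory to sibling .txt and .xml files.
# Returns non-zero if any extraction failed.
extract_dir() {
  local dir="${1:-.}" pdf base failed=0
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue          # the glob matched nothing
    base="${pdf%.pdf}"
    if ! run_tika --text "$pdf" > "$base.txt" ||
       ! run_tika --xml  "$pdf" > "$base.xml"; then
      echo "extraction failed: $pdf" >&2
      failed=1                         # remember the failure, keep going
    fi
  done
  return "$failed"
}

# Usage (hypothetical path): extract_dir /path/to/pdfs
```

Keeping the java call in its own function makes it easy to swap in a different jar path or extra JVM options without touching the loop.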

Nick
