looks like a great example to put on the website too ;)

------------------------
Chris Mattmann
[email protected]

-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, June 26, 2014 5:23 AM
To: "[email protected]" <[email protected]>
Subject: RE: Question re installing Tika

>On Thu, 26 Jun 2014, Richard wrote:
>> You haven't by chance happened to programmatically loop through a
>> directory full of pdfs and use Tika to extract each of their pdf
>> contents into separate text or xml files? If so, what do you recommend
>> to do the extraction?
>
>For a proof of concept, how about something simple like a bash for loop
>and the tika app?
>
>for i in *.pdf; do
>  j=`echo "$i" | sed 's/\.pdf$//'`
>  java -jar tika-app.jar --text "$i" > "$j.txt"
>  java -jar tika-app.jar --xml "$i" > "$j.xml"
>done
>
>Nick
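
If the one-liner above needs to live in a script, a slightly more defensive version might look like the sketch below. It is only an illustration of the same loop, not an official Tika recipe. The names TIKA_JAR, out_base and extract_dir are made up for this example, and the jar path is an assumption: point it at wherever your tika-app jar actually lives. The --text and --xml options are the ones already used in the loop above.

```shell
#!/usr/bin/env bash
# Sketch: extract plain text and XML from every PDF in a directory with tika-app.
# TIKA_JAR is an assumed location; override it in the environment if needed.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

# Strip only a trailing ".pdf" from a file name (hypothetical helper).
# Parameter expansion avoids the sed call and leaves the rest of the name alone.
out_base() {
  printf '%s\n' "${1%.pdf}"
}

# Loop over the PDFs in a directory (default: current directory) and write
# NAME.txt and NAME.xml next to each NAME.pdf (hypothetical helper).
extract_dir() {
  local dir="${1:-.}" pdf base
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue   # glob matched nothing, so skip the literal pattern
    base="$(out_base "$pdf")"
    java -jar "$TIKA_JAR" --text "$pdf" > "$base.txt"
    java -jar "$TIKA_JAR" --xml "$pdf" > "$base.xml"
  done
}
```

To try it, source the file and call extract_dir with the directory holding the PDFs. Starting a JVM twice per file is slow, so this is still proof-of-concept territory, the same as the original loop.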
