On Thu, 26 Jun 2014, Chris Mattmann wrote:
Looks like a great example to put on the website too ;)
To be fair to all users, we probably ought to have an example that works
on Windows as well. Any PowerShell gurus around who care to take a stab at
the Windows equivalent?
Nick
-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: <[email protected]>
Date: Thursday, June 26, 2014 5:23 AM
To: "[email protected]" <[email protected]>
Subject: RE: Question re installing Tika
On Thu, 26 Jun 2014, Richard wrote:
Have you by any chance programmatically looped through a
directory full of PDFs and used Tika to extract the contents of
each one into separate text or XML files? If so, what do you recommend
for doing the extraction?
For a proof of concept, how about something simple like a bash for loop
and the tika app?
for i in *.pdf; do
  j="${i%.pdf}"   # strip only a trailing ".pdf" from the name
  java -jar tika-app.jar --text "$i" > "$j.txt"
  java -jar tika-app.jar --xml "$i" > "$j.xml"
done
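
If the proof of concept needs to be a little more defensive, the same loop can be wrapped in a function. This is only a sketch: the names TIKA_JAR and extract_dir are made up for illustration and are not part of Tika, and it assumes java is on the PATH and that tika-app.jar sits in the current directory unless TIKA_JAR says otherwise. It skips directories with no PDFs and reports a failed extraction instead of silently moving on.

```shell
# Illustrative sketch; TIKA_JAR and extract_dir are hypothetical names.
# Assumes `java` is on the PATH and TIKA_JAR points at the Tika app jar.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

extract_dir() {
  dir="${1:-.}"                      # directory to scan, default: current
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue        # the glob matched nothing
    base="${pdf%.pdf}"               # strip only the trailing ".pdf"
    java -jar "$TIKA_JAR" --text "$pdf" > "$base.txt" \
      || echo "text extraction failed: $pdf" >&2
    java -jar "$TIKA_JAR" --xml "$pdf" > "$base.xml" \
      || echo "XML extraction failed: $pdf" >&2
  done
}
```

Something like `extract_dir /path/to/pdfs` would then write a .txt and an .xml next to each PDF in that directory.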
Nick