+1000. I'm not a Windows guru, but I'll try to look it up.

-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, June 26, 2014 5:55 AM
To: "[email protected]" <[email protected]>
Subject: Re: Question re installing Tika

>On Thu, 26 Jun 2014, Chris Mattmann wrote:
>> looks like a great example to put on the website too ;)
>
>To be fair to all users, we probably ought to have an example that works
>on windows as well. Any powershell gurus around who care to take a stab at
>the windows equivalent?
>
>Nick
>
>> -----Original Message-----
>> From: Nick Burch <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Thursday, June 26, 2014 5:23 AM
>> To: "[email protected]" <[email protected]>
>> Subject: RE: Question re installing Tika
>>
>>> On Thu, 26 Jun 2014, Richard wrote:
>>>> You haven't by chance happen to have programmatically looped through a
>>>> directory full of pdfs and used Tika to extract each of their pdf
>>>> contents into separate text or xml files? If so, what do you recommend
>>>> to do the extraction?
>>>
>>> For a proof of concept, how about something simple like a bash for loop
>>> and the tika app?
>>>
>>> for i in *.pdf; do j=`echo "$i" | sed 's/.pdf//'`; java -jar tika-app.jar
>>> --text "$i" > "$j.txt"; java -jar tika-app.jar --xml "$i" > "$j.xml";
>>> done
>>>
>>> Nick
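
If the quoted bash loop does end up on the website, a slightly more defensive variant might look like the sketch below. It is not an official Tika example. The function names run_tika and extract_pdfs and the TIKA_JAR variable are made up for illustration, and it assumes tika-app.jar sits in the current directory unless TIKA_JAR says otherwise. The main change is stripping the extension with ${pdf%.pdf} instead of sed 's/.pdf//', since that unanchored pattern treats the dot as a wildcard and removes the first match anywhere in the name.

```shell
#!/usr/bin/env bash
# Sketch only: batch-extract text and XML from every PDF in a directory.
# TIKA_JAR is an assumed location for the tika-app jar; adjust as needed.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

# Hypothetical wrapper around the Tika app, so the command lives in one place.
run_tika() {
  java -jar "$TIKA_JAR" "$@"
}

# Hypothetical helper: writes NAME.txt and NAME.xml next to each NAME.pdf.
extract_pdfs() {
  dir="${1:-.}"
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue     # nothing matched: skip the literal pattern
    base="${pdf%.pdf}"            # strip only a trailing ".pdf"
    run_tika --text "$pdf" > "$base.txt"
    run_tika --xml  "$pdf" > "$base.xml"
  done
}
```

To use it, source the file and run something like: extract_pdfs /path/to/pdfs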
