+1000. I'm not a Windows guru, but I'll try to look it up.



-----Original Message-----
From: Nick Burch <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, June 26, 2014 5:55 AM
To: "[email protected]" <[email protected]>
Subject: Re: Question re installing Tika

>On Thu, 26 Jun 2014, Chris Mattmann wrote:
>> looks like a great example to put on the website too ;)
>
>To be fair to all users, we probably ought to have an example that works
>on Windows as well. Any PowerShell gurus around who care to take a stab
>at the Windows equivalent?
>
>Nick
>
>> -----Original Message-----
>> From: Nick Burch <[email protected]>
>> Reply-To: <[email protected]>
>> Date: Thursday, June 26, 2014 5:23 AM
>> To: "[email protected]" <[email protected]>
>> Subject: RE: Question re installing Tika
>>
>>> On Thu, 26 Jun 2014, Richard wrote:
>>>> Have you by chance programmatically looped through a
>>>> directory full of PDFs and used Tika to extract each file's
>>>> contents into separate text or XML files? If so, what do you recommend
>>>> for the extraction?
>>>
>>> For a proof of concept, how about something simple like a bash for loop
>>> and the tika app?
>>>
>>> for i in *.pdf; do j=`echo "$i" | sed 's/\.pdf$//'`;
>>>   java -jar tika-app.jar --text "$i" > "$j.txt";
>>>   java -jar tika-app.jar --xml "$i" > "$j.xml";
>>> done
>>>
>>> Nick
>>
>>
>>
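
A slightly more defensive take on the loop quoted above, as a rough sketch rather than an official Tika example. It assumes tika-app.jar is in the current directory (or that TIKA_JAR points at it); the TIKA_JAR variable and the extract_all function name are made up here for illustration. It strips the extension with shell parameter expansion instead of sed and skips the loop body when no PDFs match.

```shell
#!/bin/sh
# Sketch: write a .txt and an .xml file next to every PDF in a directory.
# TIKA_JAR and extract_all are hypothetical names, not part of Tika.
TIKA_JAR="${TIKA_JAR:-tika-app.jar}"

extract_all() {
    dir="${1:-.}"                      # directory to scan, default: current
    for i in "$dir"/*.pdf; do
        [ -e "$i" ] || continue        # glob matched nothing, skip
        j="${i%.pdf}"                  # strip only a trailing .pdf
        java -jar "$TIKA_JAR" --text "$i" > "$j.txt"
        java -jar "$TIKA_JAR" --xml "$i" > "$j.xml"
    done
}
```

After sourcing the snippet, something like `extract_all /path/to/pdfs` should leave one .txt and one .xml beside each PDF. Quoting "$i" and "$j" keeps file names with spaces intact.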
