RE: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-05-02 Thread Allison, Timothy B.
>> While PDFBox is a part of TIKA and the two projects are kind of "best friends forever" Thank you, Tilman! :) -----Original Message----- From: Tilman Hausherr [mailto:thaush...@t-online.de] Sent: Saturday, April 30, 2016 5:24 PM To: users@pdfbox.apache.org Subject: Re: is it possible to

Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-05-01 Thread Tilman Hausherr
On 01.05.2016 at 03:06, David Green wrote: Sorry for using the wrong forum. Is there a Tika forum? https://mail-archives.apache.org/mod_mbox/tika-user/ Your suggested command is working after a fashion: java -jar c:\jars\tika-app-1.12.jar -J -t -i f: -o g: The directory structure is being

Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-04-30 Thread David Green
Sorry for using the wrong forum. Is there a Tika forum? Your suggested command is working after a fashion: java -jar c:\jars\tika-app-1.12.jar -J -t -i f: -o g: The directory structure is being reproduced, but the zip files are being copied as zip files (I think). The copied files retain the original

Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-04-30 Thread Tilman Hausherr
On 30.04.2016 at 19:46, David Green wrote: You may gather that I am new to this. My original zip files containing PDF files are on my F: drive. I want the unpacked text files saved in an identical directory structure on my G: drive. I have tried: java -jar tika-app.X.Y.jar -J -t -i -o

Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-04-30 Thread David Green
You may gather that I am new to this. My original zip files containing PDF files are on my F: drive. I want the unpacked text files saved in an identical directory structure on my G: drive. I have tried: java -jar tika-app.X.Y.jar -J -t -i -o which resulted in "syntax error". Can you please suggest
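
For the "identical directory structure" part of this question, one possible approach (an assumption for illustration, not something proposed in the thread) is to compute each output path relative to the input root and swap the extension for .txt. The names MirrorPath and mirror are hypothetical.

```java
import java.nio.file.Path;

class MirrorPath {
    /**
     * Maps a source file under inRoot to the matching location under outRoot,
     * replacing the file extension with ".txt".
     * Assumes source is a file located somewhere below inRoot.
     */
    static Path mirror(Path inRoot, Path outRoot, Path source) {
        // Path of the source file relative to the input root, e.g. a/b/doc.pdf
        Path relative = inRoot.relativize(source);

        // Strip the last extension, if any, from the file name
        String fileName = relative.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;

        // Same relative folders under the output root, new file name
        return outRoot.resolve(relative).resolveSibling(base + ".txt");
    }
}
```

Called with an input root of in, an output root of out, and a source of in/a/b/doc.pdf, this yields out/a/b/doc.txt. The parent folders still have to be created (for example with Files.createDirectories) before anything is written there.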

Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-04-20 Thread Tilman Hausherr
On 20.04.2016 at 21:51, David Green wrote: . . . and save the text files in the same tree structure on another drive? Sure... but this is not a PDFBox problem; it is a matter of iterating through a ZIP file. Read about ZipInputStream and ZipEntry. Tilman
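
As a rough illustration of the ZipInputStream and ZipEntry pointer above, a minimal sketch of walking an archive and picking out the PDF entries might look like this. The class name ZipPdfLister and the method listPdfEntries are hypothetical; the sketch only collects entry names and leaves the actual text extraction (for example with PDFBox) as a step to plug in.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class ZipPdfLister {
    /**
     * Returns the names of all non-directory entries whose name ends in
     * ".pdf" (case-insensitive), in the order they appear in the archive.
     */
    static List<String> listPdfEntries(InputStream zipData) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(zipData)) {
            ZipEntry entry;
            // getNextEntry() returns null once the archive is exhausted
            while ((entry = zin.getNextEntry()) != null) {
                boolean isPdf = entry.getName()
                        .toLowerCase(Locale.ROOT)
                        .endsWith(".pdf");
                if (!entry.isDirectory() && isPdf) {
                    // zin is now positioned at this entry's bytes; a real tool
                    // would read them here and pass them to a text extractor.
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

Entry names keep their folder prefix (such as a/b/doc.pdf), so the tree inside the archive can be recreated on the output side. Entry names from untrusted archives should be validated before being used as file paths.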

Re: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-04-20 Thread Branden Visser
PDFBox can extract the text from the PDF files for you; however, unpacking the zip file, locating the PDF documents, saving in a different format and rezipping is, I believe, something you'll have to handle with other libraries like commons-compress [1]. Hope that helps. Branden [1]

RE: is it possible to batch extract text from pdf files within a tree of folders within a zip file ?

2016-04-20 Thread Allison, Timothy B.
Might want to look at Tika (which uses PDFBox) for that. Let's say you have an that contains your zips. java -jar tika-app.jar -J -t -i -o See if that gets you close enough. -----Original Message----- From: davidgreen.co...@gmail.com [mailto:davidgreen.co...@gmail.com] On Behalf Of David