Tika calling exiftool and ffmpeg?

2016-09-01 Thread Chris Bamford
ernal-parsers.xml file? I am using Tika 1.12. Thanks - Chris Chris Bamford Lead Software Engineer m: +44 7860 405292 p: +44 207 847 8700 w: www.mimecast.com Address click here: www.mimecast.com/About-us/Contact-us/

Migrating to 2.0.0

2015-11-26 Thread Chris Bamford
()) { // give up } } What is the 2.0.0 equivalent? Thanks, - Chris

A question about extracting files embedded in a PDF

2015-11-04 Thread Chris Bamford
n be streamed out so the memory impact is low. Is there a way to do this? Thanks - Chris

Runtime exception in PDFParser.parse()

2015-01-21 Thread Chris Bamford
n(message); } I have also tried to turn logging up in my log4j.properties: log4j.logger.org.apache.pdfbox=TRACE But I see nothing except the slf4j line above. Can anyone tell me what I am doing wrong? Thanks - Chris

Re: Extraction of text from secured PDFs throws runtime exception

2015-01-15 Thread Chris Bamford
Thanks Tilman - I'll try your suggestions. On 15 Jan 2015, at 17:45, Tilman Hausherr wrote: Hi,

Extraction of text from secured PDFs throws runtime exception

2015-01-15 Thread Chris Bamford
decryptPDFDoc() call? My Maven pom dependencies are: org.bouncycastle:bcprov-jdk15:1.44 and org.bouncycastle:bcmail-jdk15:1.44. Have I got this right? Thanks for any pointers. - Chris

Text ordering

2014-09-10 Thread Chris Bamford
processing chokes on. I have tried following it in the debugger but it is quite involved and I cannot see the conditions which cause it. Question: could a space or '\n' be inserted after each of these virtual sections? Thanks - Chris

Re: Handling PDFs with missing version in header

2013-10-10 Thread Chris Bamford
On 10 Oct 2013, at 11:23, Maruan Sahyoun wrote: put in 1.4 and you are fine. In fact PDFBox doesn't take the version number into account. Aw

Handling PDFs with missing version in header

2013-10-10 Thread Chris Bamford
Hi there, I am attempting text extraction with PDFBox 1.8.2. For reasons I cannot explain, I am sometimes sent PDFs with no version number in the header, e.g. %PDF-\r\n instead of, say %PDF-1.7\r\n (I have checked, the version number does not appear in the next couple of lines, either.) Th
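
The header problem described in this snippet can be illustrated with a small pre-processing step. The following is only a sketch, not anything taken from PDFBox: the class name `PdfHeaderFix` and its methods are hypothetical, and the fallback value "1.4" is simply the one suggested in the reply to this thread. It checks whether the bytes after `%PDF-` look like a version number and, if not, inserts one before the data is handed to a parser.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of PDFBox): patches a PDF header that lacks a version.
final class PdfHeaderFix {
    // Every PDF starts with this marker; a version such as "1.7" normally follows.
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfHeaderFix() {}

    private static boolean isAsciiDigit(byte b) {
        return b >= '0' && b <= '9';
    }

    /** Returns true if data begins with the "%PDF-" marker. */
    static boolean startsWithMagic(byte[] data) {
        if (data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Returns true if the marker is followed by digit, '.', digit (e.g. "1.7"). */
    static boolean hasVersion(byte[] data) {
        int i = MAGIC.length;
        return startsWithMagic(data)
                && data.length >= i + 3
                && isAsciiDigit(data[i])
                && data[i + 1] == '.'
                && isAsciiDigit(data[i + 2]);
    }

    /**
     * Returns a copy with the given version inserted after "%PDF-" when it is missing.
     * Data that already has a version, or that is not a PDF at all, is returned unchanged.
     */
    static byte[] withFallbackVersion(byte[] data, String version) {
        if (!startsWithMagic(data) || hasVersion(data)) {
            return data;
        }
        byte[] v = version.getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[data.length + v.length];
        System.arraycopy(data, 0, out, 0, MAGIC.length);
        System.arraycopy(v, 0, out, MAGIC.length, v.length);
        System.arraycopy(data, MAGIC.length, out, MAGIC.length + v.length,
                data.length - MAGIC.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] broken = "%PDF-\r\n%rest of file".getBytes(StandardCharsets.US_ASCII);
        byte[] patched = withFallbackVersion(broken, "1.4");
        System.out.println(new String(patched, StandardCharsets.US_ASCII).trim());
    }
}
```

One caveat with this approach: inserting bytes shifts every byte offset recorded in the file's cross-reference table, so a parser that relies on those offsets may reject or need to repair the patched data. The sketch does not attempt to correct them.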

Re-post: Extract images in a memory-friendly way

2013-08-01 Thread Chris Bamford
Hello, I recently posted a question about an alternative approach to extracting images but got no reply. Does anyone have any ideas? Ideally I'd like to be able to pull the images out one by one, preferably via a stream so they can be written straight to file with minimum impact on heap space -

Extract images in a memory-friendly way

2013-07-29 Thread Chris Bamford
Hi folks, Is there an approach I can use to extract images from a PDF file one at a time so they are not all loaded into memory at once? Thanks, - Chris

(PDFBOX-953) PDFBox fails to ExtractText from Adobe Acrobat X 256-bit AES encrypted documents

2012-08-28 Thread Chris Bamford
Hi. Is there any news on when PDFBox will support this? Thanks, - Chris

Fwd: PDFBOX-737

2012-02-22 Thread Chris Bamford
Digging into the code I see that Adam's patch /does/ appear in 1.6.0 (sorry, missed that). What I'd really like to know is if there is any chance of the underlying problem being fixed? Thanks again, - Chris Begin forwarded message: Date: 22 February 2012 14:32:08 GMT To: mailto:users@pdfbox

PDFBOX-737

2012-02-22 Thread Chris Bamford
Hi, Which release is this fix going into? I can't see it in 1.6.0 or 1.7.0 (roadmap). https://issues.apache.org/jira/browse/PDFBOX-737 Thanks, - Chris

Text extraction from PDF file fails

2012-02-20 Thread Chris Bamford
= document.getDocumentCatalog().getAllPages().size(); for (int i = 1; i <= pageCount; i++) { stripper.setStartPage( i ); stripper.setEndPage( i ); rtnBuffer.append(stripper.getText(document)); } Is there something wrong with my code or is the document malformed? Thanks, - Chris

Fwd: Pushback buffer is full

2012-02-14 Thread Chris Bamford
Hi - sorry to repeat my question, but I have so far had no response. Can anyone help? Have I posted to the wrong place? Thanks Chris Begin forwarded message: From: Chris Bamford <cbamf...@mimecast.com> Date: 7 February 2012 17:46:44 GMT To: "users@pdfbox.apache.org&l

Pushback buffer is full

2012-02-07 Thread Chris Bamford
ng lots of text and images work perfectly. Unfortunately I cannot supply an example file at this time for customer confidentiality reasons, but I was hoping someone might have overcome the same issue elsewhere? Thanks for any tips ... - Chris