Got it, thanks! Any idea why Tika might not be working? I've been testing
and while no exceptions are being thrown, neither is anything being
appended when I call pdfText.append(contenthandler.toString());
On Fri, Dec 5, 2014 at 6:21 PM, Pradeep Gollakota
wrote:
A static variable is not necessary... a simple instance variable is just
fine.
On Fri Dec 05 2014 at 2:27:53 PM Ryan wrote:
After running it with updated code, it seems like the problem has to do
with something related to Tika since my output says that my input is the
correct number of bytes (i.e. it's actually being sent in correctly). Going
to test further to narrow down the problem.
Pradeep, would you recommend usin
Thanks Pradeep! I'll give it a try and report back
Ryan
On Fri, Dec 5, 2014 at 12:30 PM, Pradeep Gollakota
wrote:
I forgot to mention earlier that you should probably move the PdfParser
initialization code out of the evaluate method. This will probably cause a
significant overhead both in terms of gc and runtime performance. You'll
want to initialize your parser once and evaluate all your docs against it.
- P
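To illustrate the point about initializing the parser once, here is a rough sketch using only the standard library. CostlyParser and ExtractText are hypothetical stand-ins, not Pig or Tika classes, and the counter exists only to show how many parsers get built. The idea is that the parser lives in a plain instance field and evaluate just reuses it.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for an expensive-to-build parser (not a Tika class).
final class CostlyParser {
    // Counts constructions, for illustration only.
    static int instancesCreated = 0;

    CostlyParser() {
        instancesCreated++;
    }

    // Placeholder "parsing": decodes the bytes as UTF-8 text.
    String parse(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

// Hypothetical stand-in for the UDF class (not Pig's EvalFunc).
final class ExtractText {
    // Plain instance field: built once per ExtractText object,
    // then reused by every evaluate call. No static needed.
    private final CostlyParser parser = new CostlyParser();

    String evaluate(byte[] raw) {
        return parser.parse(raw);
    }
}
```

Calling evaluate many times on one ExtractText object builds a single CostlyParser. Creating the parser inside evaluate instead would build one per document.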
Java strings are immutable. So "pdfText.concat()" returns a new string and
the original string is left unmolested. So at the end, all you're doing is
returning an empty string. Instead, you can do "pdfText =
pdfText.concat(...)". But the better way to write it is to use a
StringBuilder.
StringBui
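For anyone reading this later, the three variants discussed above can be sketched side by side. This is a minimal illustration, not the original UDF: the class name ConcatDemo, the method names, and the chunks parameter are all made up for the example.

```java
// Illustrative only: names here are hypothetical, not from the original UDF.
final class ConcatDemo {

    // Buggy pattern: concat() returns a new String and the result is discarded.
    static String discardResult(String[] chunks) {
        String pdfText = "";
        for (String chunk : chunks) {
            pdfText.concat(chunk); // return value ignored, pdfText is unchanged
        }
        return pdfText;
    }

    // Minimal fix: assign the result of concat() back to the variable.
    static String reassign(String[] chunks) {
        String pdfText = "";
        for (String chunk : chunks) {
            pdfText = pdfText.concat(chunk);
        }
        return pdfText;
    }

    // Preferred: StringBuilder appends in place and avoids a new String per step.
    static String withBuilder(String[] chunks) {
        StringBuilder pdfText = new StringBuilder();
        for (String chunk : chunks) {
            pdfText.append(chunk);
        }
        return pdfText.toString();
    }
}
```

With any non-empty input, discardResult returns an empty string while the other two return the chunks joined in order.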
Hi,
I'm working on an open source project attempting to convert raw content
from a pdf (stored as a databytearray) into plain text using a Pig UDF and
Apache Tika. I could use your help. For some reason, the UDF I'm using
isn't working. The script succeeds but no output is written. *This is the
Pi