Re: Help with Pig UDF?

2014-12-06 Thread Ryan
Got it, thanks! Any idea why Tika might not be working? I've been testing and while no exceptions are being thrown, neither is anything being appended when I call pdfText.append(contenthandler.toString()); On Fri, Dec 5, 2014 at 6:21 PM, Pradeep Gollakota wrote: > A static variable is not necess

Re: Help with Pig UDF?

2014-12-05 Thread Pradeep Gollakota
A static variable is not necessary... a simple instance variable is just fine. On Fri Dec 05 2014 at 2:27:53 PM Ryan wrote: > After running it with updated code, it seems like the problem has to do > with something related to Tika since my output says that my input is the > correct number of byt

Re: Help with Pig UDF?

2014-12-05 Thread Ryan
After running it with updated code, it seems like the problem has to do with something related to Tika since my output says that my input is the correct number of bytes (i.e. it's actually being sent in correctly). Going to test further to narrow down the problem. Pradeep, would you recommend usin

Re: Help with Pig UDF?

2014-12-05 Thread Ryan
Thanks Pradeep! I'll give it a try and report back Ryan On Fri, Dec 5, 2014 at 12:30 PM, Pradeep Gollakota wrote: > I forgot to mention earlier that you should probably move the PdfParser > initialization code out of the evaluate method. This will probably cause a > significant overhead both in

Re: Help with Pig UDF?

2014-12-05 Thread Pradeep Gollakota
I forgot to mention earlier that you should probably move the PdfParser initialization code out of the evaluate method. Initializing the parser on every call will probably cause significant overhead, both in terms of GC and runtime performance. You'll want to initialize your parser once and evaluate all your docs against it. - P
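
A rough sketch of the "initialize once, evaluate many" advice above. Everything here is hypothetical: `ExpensiveParser` is only a stand-in for a costly-to-build parser (it is not Tika's API), and `ExtractText` with its `evaluate` method is a stand-in for the UDF class (it does not extend any Pig class), so the snippet runs with the JDK alone.

```java
import java.nio.charset.StandardCharsets;

public class ParserReuseDemo {

    /** Hypothetical stand-in for a parser that is expensive to construct. */
    static class ExpensiveParser {
        // Counts constructions so the reuse is observable.
        static int constructed = 0;

        ExpensiveParser() {
            constructed++;
        }

        String parse(byte[] doc) {
            return new String(doc, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical UDF-like class: the parser is built once per instance. */
    static class ExtractText {
        // Plain instance field, created when the object is constructed,
        // not on every call to evaluate().
        private final ExpensiveParser parser = new ExpensiveParser();

        String evaluate(byte[] doc) {
            return parser.parse(doc);
        }
    }

    public static void main(String[] args) {
        ExtractText udf = new ExtractText();
        udf.evaluate("first doc".getBytes(StandardCharsets.UTF_8));
        udf.evaluate("second doc".getBytes(StandardCharsets.UTF_8));
        // Two documents evaluated, one parser constructed.
        System.out.println(ExpensiveParser.constructed); // prints 1
    }
}
```

An instance field is enough for this; nothing in the sketch needs to be static apart from the counter used for the demonstration.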

Re: Help with Pig UDF?

2014-12-05 Thread Pradeep Gollakota
Java strings are immutable, so "pdfText.concat()" returns a new string and the original string is left untouched. At the end, all you're doing is returning an empty string. Instead, you can do "pdfText = pdfText.concat(...)". But the better way to write it is to use a StringBuilder. StringBui
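
A minimal sketch of the point above, using only the JDK. The class and method names (`ConcatDemo`, `joinDiscarded`, `joinWithBuilder`) are made up for illustration and are not from the original UDF.

```java
public class ConcatDemo {

    /** Buggy pattern: the result of concat() is thrown away. */
    static String joinDiscarded(String[] parts) {
        String text = "";
        for (String part : parts) {
            // concat() returns a NEW String; 'text' itself never changes.
            text.concat(part);
        }
        return text; // still the empty string
    }

    /** Preferred pattern: accumulate into a StringBuilder. */
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part); // mutates the builder in place
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] pages = {"page one ", "page two"};
        System.out.println("[" + joinDiscarded(pages) + "]");   // prints []
        System.out.println("[" + joinWithBuilder(pages) + "]"); // prints [page one page two]
    }
}
```

Reassigning with `text = text.concat(part)` would also fix the first method, but each call copies the whole string so far, which is why a StringBuilder tends to be the better fit for accumulating text in a loop.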

Help with Pig UDF?

2014-12-05 Thread Ryan
Hi, I'm working on an open source project attempting to convert raw content from a pdf (stored as a databytearray) into plain text using a Pig UDF and Apache Tika. I could use your help. For some reason, the UDF I'm using isn't working. The script succeeds but no output is written. *This is the Pi