On Wed, 22 Dec 2010, Shaun Cutts wrote:
Ok -- I have rewritten my simple output formatter in Java (see below). The Unicode problems were perhaps Java/Python problems, as you suspected. The real problem is that Tika passes an "Attributes" object for which getValue() returns null.
Are you able to use a debugger to figure out which attribute has a null value?
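
If attaching a debugger is awkward, a small helper called from your handler's startElement() might narrow it down just as well. The following is only a rough sketch: the class name NullAttributeFinder and the method findNullValues are made up for illustration, and it relies on nothing beyond the standard org.xml.sax classes shipped with the JDK. It assumes nothing about how Tika builds the Attributes object internally.

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper (not part of Tika): reports attributes whose value is null.
public class NullAttributeFinder {

    /**
     * Returns one description per attribute for which getValue(i) is null.
     * elementName is only used to label the output.
     */
    public static List<String> findNullValues(String elementName, Attributes atts) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < atts.getLength(); i++) {
            if (atts.getValue(i) == null) {
                problems.add("<" + elementName + "> attribute #" + i
                        + " qName=" + atts.getQName(i)
                        + " localName=" + atts.getLocalName(i)
                        + " uri=" + atts.getURI(i));
            }
        }
        return problems;
    }

    // Small demonstration with a hand-built Attributes object.
    public static void main(String[] args) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "name", "name", "CDATA", "author");
        atts.addAttribute("", "content", "content", "CDATA", null); // the null case
        for (String problem : findNullValues("meta", atts)) {
            System.out.println(problem);
        }
    }
}
```

Calling findNullValues(qName, atts) at the top of startElement(uri, localName, qName, atts) and printing the result should show which element and attribute name carry the null value.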
Nick
