Nick

Thanks for the response.

For the moment I have been able to work around this by writing my own 
subclass of org.xml.sax.helpers.DefaultHandler and bypassing the core (calling 
OfficeParser directly).
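
A minimal sketch of that kind of handler, for anyone who wants to try the same 
workaround. The class name TextCollector and the helper collect() are made up 
for illustration, and the demo feeds the handler from the JDK's own SAX parser 
so that it runs without Tika on the classpath. Presumably it sidesteps the 
exception because no XML serializer is involved, but that is a guess.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects character data and ignores attributes,
// so nothing is ever passed to an XML serializer.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Attributes are deliberately ignored in this sketch.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Separate the content of consecutive elements with a newline.
        text.append('\n');
    }

    public String getText() {
        return text.toString();
    }

    // Demo helper (an assumption, not part of Tika): drive the handler
    // with the JDK SAX parser on an in-memory XML string.
    static String collect(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(collect("<r a=\"1\"><c>Bank A</c><c>Bank B</c></r>"));
        // With Tika on the classpath, the same handler would be passed to
        // the parser instead, roughly (untested here):
        //   new OfficeParser().parse(stream, handler,
        //                            new Metadata(), new ParseContext());
    }
}
```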

Tika does seem to work with a few other xls files. I tried installing the most 
recent version of JAXP, but that didn't seem to have any effect.

Unfortunately I can't upload the XLS publicly, though it's not *that* 
confidential, and I would happily send it to a developer who promises not to 
publish it.

I wonder whether it could be a character encoding issue? -- the spreadsheet in 
question is itself a "mash-up" of spreadsheets from various banks all over the 
world, presumably some using different character encodings. It was "cut and 
pasted" together in Excel.

I am not an active Java developer -- these days I code mostly in Python (Jython 
for this project). Do you spot anything unusual in the stack trace I sent 
-- could I have something on my CLASSPATH that is better left out?
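
One way to narrow down the CLASSPATH question is to print which XML 
implementation classes the JVM actually picks up and where they were loaded 
from. A rough sketch follows; the class name XmlImplCheck and the method 
describe() are made up for illustration. A jar path in the output would point 
at something on the CLASSPATH, while a class with no code source normally 
means the JDK's built-in implementation.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;

// Hypothetical diagnostic: reports the concrete JAXP implementation
// classes and the location they were loaded from.
class XmlImplCheck {
    static String describe(Object impl) {
        Class<?> cls = impl.getClass();
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes bundled with the JDK usually report no code source.
        String where = (src == null || src.getLocation() == null)
            ? "JDK built-in (no code source)"
            : src.getLocation().toString();
        return cls.getName() + " from " + where;
    }

    public static void main(String[] args) {
        System.out.println(describe(TransformerFactory.newInstance()));
        System.out.println(describe(SAXParserFactory.newInstance()));
    }
}
```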

-- Shaun

On Dec 19, 2010, at 7:43 PM, Nick Burch wrote:

> On Fri, 17 Dec 2010, Shaun Cutts wrote:
>> Caused by: java.lang.NullPointerException
>>      at 
>> com.sun.org.apache.xml.internal.serializer.ToStream.writeAttrString(ToStream.java:1962)
>>      at 
>> com.sun.org.apache.xml.internal.serializer.ToStream.processAttributes(ToStream.java:1942)
> 
> This doesn't look like the sort of code that should be giving problems...
> 
> Can you try with some other excel files and see if they work though? If they 
> do, any chance you could upload the problem file to jira so we can try to 
> track down why the core JVM xml code is null pointering
> 
> Nick
