Nick,

Thanks for the response.

For the moment I have been able to work around this by writing my own org.xml.sax.helpers.DefaultHandler subclass and avoiding the core (calling OfficeParser directly).

Tika seems to work with a few other XLS files. I tried installing the most recent version of JAXP, but that didn't seem to have any effect.

Unfortunately I can't upload the XLS publicly, though it's not *that* confidential, and I would happily send it to a developer if they promise not to publish it.

I wonder whether it could be a character encoding issue? The spreadsheet in question is itself a "mash-up" of spreadsheets from various banks all over the world, presumably some using different character encodings. It was cut and pasted together in Excel.

I am not an active Java developer -- these days I code mostly in Python (Jython for this project). I wonder if you spot anything unusual in the stack trace I sent. Could I have something on my CLASSPATH that is better left out?

-- Shaun

On Dec 19, 2010, at 7:43 PM, Nick Burch wrote:

> On Fri, 17 Dec 2010, Shaun Cutts wrote:
>> Caused by: java.lang.NullPointerException
>>     at com.sun.org.apache.xml.internal.serializer.ToStream.writeAttrString(ToStream.java:1962)
>>     at com.sun.org.apache.xml.internal.serializer.ToStream.processAttributes(ToStream.java:1942)
>
> This doesn't look like the sort of code that should be giving problems...
>
> Can you try with some other excel files and see if they work though? If they
> do, any chance you could upload the problem file to jira so we can try to
> track down why the core JVM xml code is null pointering
>
> Nick
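
For reference, the kind of handler mentioned in the workaround above might look roughly like the sketch below. It is not the actual code from this project: the class name TextCollector and its methods are made up for illustration, and the demo feeds it SAX events from the JDK's own parser rather than from Tika. The assumption is that a handler which only collects text never goes through the JDK's internal XML serializer (the ToStream class in the stack trace), which may be why the workaround avoids the NullPointerException.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects character data and counts elements,
// without ever serializing attributes back out as XML.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int elementCount = 0;

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        // Attributes are deliberately ignored here, so a null
        // attribute value cannot cause trouble in this handler.
        elementCount++;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public String getText() {
        return text.toString();
    }

    public int getElementCount() {
        return elementCount;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in input: a small XML string parsed with the JDK's SAX parser.
        // With Tika, the handler would instead be passed as the ContentHandler
        // argument of the parser's parse(...) call (an assumption about usage,
        // not shown here).
        String xml = "<html><body><p>Hello</p><p>World</p></body></html>";
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        System.out.println(handler.getText());
    }
}
```

The handler keeps only what it is given through characters(), so any extracted cell text ends up in one plain string that can be inspected for odd characters if the encoding theory is worth checking.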
