Ok, sorry about that, Nick.

Actually, I now see this is happening in the Nutch classes for the MS parse
plugin, and possibly not in POI, so I've posted on the Nutch list instead. I
just wanted to reply to your question first, although it no longer seems to be
a problem with POI failing to handle the Excel document.

So unless you're curious, or involved with Nutch and the MS parse plugins,
you don't need to read on. :)

Here's the trace I get when I print the stacktrace of any exception
occurring as I attempt to call the MSExcelParser's getParse(Content). It
seems I get an NPE in MSBaseParser.getParse().

[#|2006-10-04T09:13:15.102+0200|WARNING|sun-appserver-ee9.1|javax.enterprise.system.stream.err|_ThreadID=16;_ThreadName=httpWorkerThread-8080-1;_RequestID=0b18e2ae-0f79-4241-9e29-a322c8ae2bc6;|
java.lang.NullPointerException
        at org.apache.nutch.parse.ms.MSBaseParser.getParse(MSBaseParser.java:94)
        at org.apache.nutch.parse.msexcel.MSExcelParser.getParse(MSExcelParser.java:40)
        at <my_package>.DocumentParser.parseDocument(DocumentParser.java:154)
        ...

The source (MSBaseParser.java) around this line reads:

****SNIP****
      extractor.extract(new ByteArrayInputStream(raw));
      text = extractor.getText();
      properties = extractor.getProperties();
      outlinks = OutlinkExtractor.getOutlinks(text, content.getUrl(), getConf());
      
    } catch (Exception e) {
      return new ParseStatus(ParseStatus.FAILED,
                             "Can't be handled as micrsosoft document. " + e)
                             .getEmptyParse(this.conf);
    }
    
    // collect meta data
    Metadata metadata = new Metadata();
    title = properties.getProperty(DublinCore.TITLE);      <========== This is line 94
    properties.remove(DublinCore.TITLE);
****SNIP****

So I can only gather that my properties object is null. As the snippet above
from the MSBaseParser class shows, properties is initially null and is then
assigned from the ExcelExtractor / MSExtractor (properties =
extractor.getProperties();). I assume that call is returning null, although I
would have expected an empty Properties object instead, which would avoid the
NPE at line 94.
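
To illustrate the kind of guard I would have expected, here is a minimal
standalone sketch. It is not the Nutch code: the class NullSafeProperties and
the helper orEmpty() are hypothetical names of my own, the null variable only
stands in for extractor.getProperties() returning null, and the literal key
"title" stands in for DublinCore.TITLE.

```java
import java.util.Properties;

// Hypothetical helper, not part of Nutch or POI.
final class NullSafeProperties {

    // Returns the given Properties, or a fresh empty one if it is null.
    static Properties orEmpty(Properties props) {
        return props != null ? props : new Properties();
    }

    public static void main(String[] args) {
        // Assumption: stands in for extractor.getProperties() returning null.
        Properties fromExtractor = null;

        Properties properties = orEmpty(fromExtractor);

        // Assumption: "title" stands in for DublinCore.TITLE.
        // getProperty() on an empty Properties returns null instead of throwing.
        String title = properties.getProperty("title");
        properties.remove("title");

        System.out.println("title = " + title);
    }
}
```

With a guard like this the title would simply come back as null and the later
metadata handling could decide what to do with it, rather than failing with an
NPE.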

Hopefully someone on the Nutch list can shed some light on that.


Thanks,
Trym


Nick Burch wrote:
> 
> On Tue, 3 Oct 2006, tryma wrote:
>> Anyone know about any patches or can suggest a work-around for this?
> 
> You'll need to give us more to go on than this.... stack traces, problem 
> files, failing unit tests etc
> 
> (I personally haven't noticed any problems with processing 
> word/excel/powerpoint 2003 files with POI)
> 
> Nick
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
> The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Excel-2003-problem-tf2241908.html#a6635249
Sent from the POI - User mailing list archive at Nabble.com.

