Hi,

Some modifications are necessary because the xls plugin still uses an old interface. The changes are not difficult, but I still observe some other problems with this plugin.
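
As a rough sketch of the first fix (not the actual patch): the first compiler error quoted below says that getParse in the Parser interface does not declare ParseException, so the plugin method cannot declare it either and has to handle the failure internally. The types below are simplified, hypothetical stand-ins so that the example compiles on its own. The real classes live in org.apache.nutch.parse and look different, and Content is replaced by a plain String here. The second error (the ParseData constructor) suggests that its signature changed as well, so that call has to be adapted against the 0.7 source.

```java
// Hypothetical stand-ins for the Nutch types named in the compiler output.
// The real org.apache.nutch.parse classes differ; this only shows the pattern.
class ParseException extends Exception {
    ParseException(String message) { super(message); }
}

class Parse {
    final boolean success;
    final String text;
    Parse(boolean success, String text) {
        this.success = success;
        this.text = text;
    }
}

// The interface method declares no checked exception, as the error implies.
interface Parser {
    Parse getParse(String content);
}

class ExcelParserSketch implements Parser {
    // No "throws ParseException" here: an implementing method may not add
    // checked exceptions that the interface method does not declare.
    @Override
    public Parse getParse(String content) {
        try {
            return new Parse(true, extractText(content));
        } catch (ParseException e) {
            // Report the failure through the return value instead of throwing.
            return new Parse(false, e.getMessage());
        }
    }

    // Hypothetical helper that can still fail with the checked exception.
    private String extractText(String content) throws ParseException {
        if (content == null || content.isEmpty()) {
            throw new ParseException("empty document");
        }
        return content.trim();
    }

    public static void main(String[] args) {
        Parse result = new ExcelParserSketch().getParse("  cell A1  ");
        System.out.println(result.success + ": " + result.text);
    }
}
```

Catching the exception inside the method keeps the old parsing code unchanged and only moves the error reporting into the returned object.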

Regards

        Michael

Ayyanar Inbamohan wrote:

Hi Jérôme,

I am now trying Nutch 0.7. I am using the plugin from
JIRA, but while building the plugin with ant I am still
getting two errors from the Excel plugin:


compile:
     [echo] Compiling plugin: parse-msexcel
    [javac] Compiling 3 source files to /home/oss/nutch-0.7/build/parse-msexcel/classes
    [javac] /home/oss/nutch-0.7/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel/MSExcelParser.java:35: getParse(org.apache.nutch.protocol.Content) in org.apache.nutch.parse.msexcel.MSExcelParser cannot implement getParse(org.apache.nutch.protocol.Content) in org.apache.nutch.parse.Parser; overridden method does not throw org.apache.nutch.parse.ParseException
    [javac]     public Parse getParse(final Content content)throws ParseException {
    [javac]                  ^
    [javac] /home/oss/nutch-0.7/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel/MSExcelParser.java:103: cannot resolve symbol
    [javac] symbol  : constructor ParseData (java.lang.String,org.apache.nutch.parse.Outlink[],java.util.Properties)
    [javac] location: class org.apache.nutch.parse.ParseData
    [javac]    final ParseData parseData = new ParseData(resultTitle, outlinks, metadata);
    [javac]                                ^
    [javac] 2 errors

How can I avoid the above errors?



thanks,
Ayyanar...

--- Jérôme Charron <[EMAIL PROTECTED]> wrote:


Sample lines taken while crawling, where excel is
taken as application/pdf


I don't think that your xls file is being treated as a PDF,
but rather as an unknown file type (Content-Type: null).
In Nutch 0.6, if the HTTP server is badly configured and
doesn't return a valid content type, Nutch can't determine
it on its own (and processing is then aborted).
In Nutch 0.7, the mime-type detector tries to determine
the document's type if the server does not send one (this is
a first step in detection; the next is to check that the type
returned by the server is the correct one). If you can, try
Nutch 0.7, which should solve your problem
(http://lucene.apache.org/nutch/release/)
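
To illustrate the idea of falling back to content-based detection when the server sends no Content-Type, here is a minimal sketch. It is not Nutch's actual mime-type detector: the class name, method names, and the fallback type are assumptions made for this example, and it only covers the first step described above (it does not verify a type that the server did send). It relies on two well-known file signatures: the 8-byte header of an OLE2 compound file, the container used by .xls, and the "%PDF-" prefix of PDF files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Nutch.
class MimeSniffSketch {
    // First eight bytes of an OLE2 compound file. The same container is also
    // used by .doc and .ppt, so mapping it to Excel is a simplification.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given magic bytes.
    static boolean startsWith(byte[] data, byte[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    // Use the server's type if present, otherwise guess from the content.
    static String resolveType(String headerType, byte[] data) {
        if (headerType != null && !headerType.isEmpty()) {
            return headerType;
        }
        if (startsWith(data, OLE2)) {
            return "application/vnd.ms-excel";
        }
        if (startsWith(data, PDF)) {
            return "application/pdf";
        }
        return "application/octet-stream"; // assumed fallback for unknown content
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII);
        System.out.println(resolveType(null, pdf));
    }
}
```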

Regards

Jérôme

--
Michael Nebel
http://www.nebel.de/
http://www.netluchs.de/
