Hi,
some modifications are necessary because the xls plugin still uses
an old interface. The changes are not difficult, but I still
see some other problems with this plugin.
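To illustrate the first javac error quoted below: in Java, a method that implements an interface method may not declare a checked exception that the interface method does not declare. The following is only a minimal sketch of that rule and of one way to adapt an implementation. All types here (Parse, Parser, ParseException, ExcelParserSketch and their fields) are hypothetical stand-ins written for this example, not the real Nutch classes, and the way a failure is reported through the return value is an assumption. The second error (the missing ParseData constructor) is a separate signature mismatch and is not covered here.

```java
// Hypothetical stand-ins for the types named in the compiler output.
// These are NOT the real Nutch classes.
final class OverrideSketch {

    /** Stand-in for a checked exception such as ParseException. */
    static class ParseException extends Exception {
        ParseException(String message) { super(message); }
    }

    /** Stand-in for a parse result; 'success' and 'text' are invented fields. */
    static class Parse {
        final boolean success;
        final String text;
        Parse(boolean success, String text) {
            this.success = success;
            this.text = text;
        }
    }

    /** Stand-in for the newer interface: getParse declares no checked exception. */
    interface Parser {
        Parse getParse(String content);
    }

    /** Implementation adapted to the interface: no 'throws' clause on getParse. */
    static class ExcelParserSketch implements Parser {
        @Override
        public Parse getParse(String content) {
            try {
                return new Parse(true, extract(content));
            } catch (ParseException e) {
                // Report the failure through the return value instead of throwing.
                return new Parse(false, e.getMessage());
            }
        }

        // An internal helper may still throw the checked exception.
        private String extract(String content) throws ParseException {
            if (content == null || content.isEmpty()) {
                throw new ParseException("empty content");
            }
            return content.trim();
        }
    }

    public static void main(String[] args) {
        Parser parser = new ExcelParserSketch();
        System.out.println(parser.getParse(" cell A1 ").text); // prints "cell A1"
        System.out.println(parser.getParse("").success);       // prints "false"
    }
}
```

Adding "throws ParseException" back to getParse in ExcelParserSketch reproduces the same kind of compile error as in the build log.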
Regards
Michael
Ayyanar Inbamohan wrote:
Hi Jérôme,
I am now trying Nutch 0.7. I am using the plugin from
JIRA, but when building the plugin with ant I still
get two compile errors from the Excel plugin:
compile:
[echo] Compiling plugin: parse-msexcel
[javac] Compiling 3 source files to /home/oss/nutch-0.7/build/parse-msexcel/classes
[javac] /home/oss/nutch-0.7/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel/MSExcelParser.java:35: getParse(org.apache.nutch.protocol.Content) in org.apache.nutch.parse.msexcel.MSExcelParser cannot implement getParse(org.apache.nutch.protocol.Content) in org.apache.nutch.parse.Parser; overridden method does not throw org.apache.nutch.parse.ParseException
[javac] public Parse getParse(final Content content)throws ParseException {
[javac] ^
[javac] /home/oss/nutch-0.7/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel/MSExcelParser.java:103: cannot resolve symbol
[javac] symbol : constructor ParseData (java.lang.String,org.apache.nutch.parse.Outlink[],java.util.Properties)
[javac] location: class org.apache.nutch.parse.ParseData
[javac] final ParseData parseData = new ParseData(resultTitle, outlinks, metadata);
[javac] ^
[javac] 2 errors
How can I avoid the above errors?
Thanks,
Ayyanar...
--- Jérôme Charron <[EMAIL PROTECTED]> wrote:
Sample lines taken while crawling, where the Excel file is
taken as application/pdf
I don't think that your xls file is taken as a PDF,
but as an unknown file type (Content-Type: null).
In Nutch 0.6, if the HTTP server is badly configured
and doesn't return a good content type, Nutch can't
determine it by itself (and processing is then aborted).
In Nutch 0.7, the MIME type detector tries to determine
the document's type if it is not sent by the server
(this is the first step in detection; the next is to
check that the type returned by the server is the
correct one). If you can, try Nutch 0.7; that should
solve your problem
(http://lucene.apache.org/nutch/release/)
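As a rough illustration of the first detection step described above (falling back to the content when the server sends no Content-Type), here is a minimal sketch based on file signatures. It is not Nutch's actual detector: the class and method names are hypothetical, and the returned label for OLE2 files is only a placeholder chosen for this example. The two signatures themselves are well known: PDF files start with "%PDF-", and legacy .xls files are OLE2 compound files starting with the bytes D0 CF 11 E0 A1 B1 1A E1. Note that this signature is shared by other legacy Office formats, so it does not identify Excel on its own.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names are invented for this example.
final class MagicSniffSketch {

    // First 8 bytes of an OLE2 compound file (container of legacy .xls files).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Leading bytes of a PDF file.
    private static final byte[] PDF_MAGIC =
        "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if 'data' begins with all bytes of 'prefix'. */
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Uses the server-supplied type if present; otherwise guesses from
     * the leading bytes of the content.
     */
    static String guessType(String headerType, byte[] data) {
        if (headerType != null && !headerType.trim().isEmpty()) {
            return headerType.trim();
        }
        if (startsWith(data, PDF_MAGIC)) {
            return "application/pdf";
        }
        if (startsWith(data, OLE2_MAGIC)) {
            // Placeholder label for this sketch: OLE2 container, subtype unknown.
            return "application/x-ole-storage";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] pdf = "%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guessType(null, pdf)); // prints "application/pdf"
    }
}
```

The second step mentioned above, checking a type that the server did send against the content, is not covered by this sketch.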
Regards
Jérôme
--
Michael Nebel
http://www.nebel.de/
http://www.netluchs.de/