Hi Jérôme,
I am now trying Nutch 0.7. I am using the plugin from
JIRA, but while building it with ant I still
get two compile errors from the Excel plugin:
compile:
[echo] Compiling plugin: parse-msexcel
[javac] Compiling 3 source files to /home/oss/nutch-0.7/build/parse-msexcel/classes
[javac] /home/oss/nutch-0.7/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel/MSExcelParser.java:35: getParse(org.apache.nutch.protocol.Content) in org.apache.nutch.parse.msexcel.MSExcelParser cannot implement getParse(org.apache.nutch.protocol.Content) in org.apache.nutch.parse.Parser; overridden method does not throw org.apache.nutch.parse.ParseException
[javac] public Parse getParse(final Content content)throws ParseException {
[javac] ^
[javac] /home/oss/nutch-0.7/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel/MSExcelParser.java:103: cannot resolve symbol
[javac] symbol : constructor ParseData (java.lang.String,org.apache.nutch.parse.Outlink[],java.util.Properties)
[javac] location: class org.apache.nutch.parse.ParseData
[javac] final ParseData parseData = new ParseData(resultTitle, outlinks, metadata);
[javac] ^
[javac] 2 errors
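As far as I understand it, the first error is the general javac rule that a method implementing an interface method may not declare a checked exception that the interface method does not declare. Below is a small standalone sketch of that rule and one possible way around it (catching the exception inside the method). All names in it are made up for illustration; they are not the real Nutch classes, and I do not know whether this is the right fix for the plugin itself.

```java
// Hypothetical names throughout; these are NOT the real Nutch classes.
public class OverrideSketch {

    // Stand-in for a checked exception such as ParseException.
    static class SketchParseException extends Exception {
        SketchParseException(String message) { super(message); }
    }

    // Stand-in for an interface whose method declares no checked exception.
    interface SketchParser {
        String getParse(String content);
    }

    // Adding "throws SketchParseException" to getParse below would not compile,
    // because the interface method does not declare it. The checked exception
    // is therefore caught inside the method and turned into a result.
    static class SketchExcelParser implements SketchParser {
        @Override
        public String getParse(String content) {
            try {
                return extract(content);
            } catch (SketchParseException e) {
                return "failed: " + e.getMessage();
            }
        }

        // Private helpers are free to throw the checked exception.
        private String extract(String content) throws SketchParseException {
            if (content == null || content.isEmpty()) {
                throw new SketchParseException("empty content");
            }
            return "parsed: " + content;
        }
    }

    public static void main(String[] args) {
        SketchParser parser = new SketchExcelParser();
        System.out.println(parser.getParse("sheet1")); // prints "parsed: sheet1"
        System.out.println(parser.getParse(""));       // prints "failed: empty content"
    }
}
```

The second error looks different in kind: the compiler cannot find a ParseData constructor with that argument list, so the plugin presumably has to call whatever constructor the ParseData class in my Nutch version actually provides.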
How can I avoid the above errors?
Thanks,
Ayyanar...
--- Jérôme Charron <[EMAIL PROTECTED]> wrote:
> > Sample lines taken while crawling, where excel is
> > taken as application/pdf
>
>
> I don't think your xls file is being treated as a PDF,
> but as an unknown file
> type (Content-Type: null).
> In Nutch 0.6, if the HTTP server is badly
> configured and doesn't return a
> correct content type, Nutch can't determine it itself (and
> processing is then aborted).
> In Nutch 0.7, the mime-type detector tries to find
> the document's type if it is
> not sent by the server (this is a first step in
> detection; the next is to
> check that the type returned by the server is the
> correct one). If you can, try
> Nutch 0.7, which should solve your problem (
> http://lucene.apache.org/nutch/release/)
>
> Regards
>
> Jérôme
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
>