Hi, 
 
Sorry for the delay; I was offline for a few days.
The problem is that within the plugin I check for the MIME type
"application/vnd.ms-powerpoint" but not for "application/powerpoint".
I think I have to improve the plugin so that it reads the supported MIME types
from plugin.xml.
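Reading the supported types from the descriptor instead of hard-coding one string could look roughly like the Java sketch below. It assumes the descriptor lists each accepted type in a contentType attribute on an <implementation> element, and that a second <implementation> entry can be added for an alias. The element and attribute names are assumptions about plugin.xml, and the class PluginXmlTypes, its methods, and the id values are hypothetical, not Nutch API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PluginXmlTypes {

    /**
     * Collects the contentType attribute of every <implementation> element.
     * ASSUMPTION: the descriptor declares one contentType per implementation.
     */
    public static Set<String> supportedTypes(String pluginXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
            Set<String> types = new LinkedHashSet<>();
            NodeList impls = doc.getElementsByTagName("implementation");
            for (int i = 0; i < impls.getLength(); i++) {
                // getAttribute returns "" when the attribute is missing
                String type = ((Element) impls.item(i)).getAttribute("contentType").trim();
                if (!type.isEmpty()) {
                    types.add(type.toLowerCase(Locale.ROOT));
                }
            }
            return types;
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot read plugin descriptor", e);
        }
    }

    /** Accepts the content if its lower-cased type is in the supported set. */
    public static boolean accepts(Set<String> supported, String contentType) {
        return contentType != null
            && supported.contains(contentType.trim().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Hypothetical descriptor with two implementation entries for the same parser.
        String xml = "<plugin id=\"parse-mspowerpoint\">"
            + "<extension point=\"org.apache.nutch.parse.Parser\">"
            + "<implementation id=\"ppt-standard\" contentType=\"application/vnd.ms-powerpoint\"/>"
            + "<implementation id=\"ppt-legacy\" contentType=\"application/powerpoint\"/>"
            + "</extension></plugin>";
        Set<String> types = supportedTypes(xml);
        System.out.println(types);   // [application/vnd.ms-powerpoint, application/powerpoint]
        System.out.println(accepts(types, "application/powerpoint"));   // true
    }
}
```

With this approach, adding an alias becomes a descriptor change rather than a code change.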
 
On the other hand, is it really necessary to check the MIME type in the source,
or can I be sure that it is correct at this point because it is defined in
plugin.xml?
 
Thanks for your suggestions,
 
Stephan

________________________________

From: Ayyanar Inbamohan [mailto:[EMAIL PROTECTED]
Sent: Thu 08.09.2005 07:23
To: [email protected]
Subject: Re: nutch 7.0 not fetching powerpoint, plugin is present



Hi all,

As you all suggested:
1. I have added PowerPoint to the MIME types,
2. I have added the PowerPoint plugin to the plugins list in
nutch-default.xml,
3. I have added the content type application/powerpoint in plugin.xml,

but I am still getting the problem:


050908 105407 fetching http://localhost:8080/search_sample/kmportal3.ppt
050908 105407 fetching http://localhost:8080/search_sample/testpdf.pdf
050908 105407 fetching http://localhost:8080/search_sample/kmportal10.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal2.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal4.ppt
050908 105407 fetching http://localhost:8080/search_sample/kmportal6.ppt
050908 105407 fetching http://localhost:8080/search_sample/testexcel.xls
050908 105407 fetching http://localhost:8080/search_sample/javaCertStudyNotes.pdf
050908 105407 fetching http://localhost:8080/search_sample/kmportal7.ppt
050908 105408 fetching http://localhost:8080/search_sample/testdoc.doc
050908 105408 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal3.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105408 fetching http://localhost:8080/search_sample/kmportal8.ppt
050908 105409 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal8.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105409 fetching http://localhost:8080/search_sample/kmportal9.ppt
050908 105410 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal9.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105410 fetching http://localhost:8080/search_sample/kmportal11.ppt
050908 105411 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal10.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105411 fetching http://localhost:8080/search_sample/kmportal5.ppt
050908 105412 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal11.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105413 fetching http://localhost:8080/search_sample/kmportal1.ppt
050908 105413 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal1.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105415 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal5.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105416 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal2.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105417 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal4.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
050908 105418 fetch okay, but can't parse http://localhost:8080/search_sample/kmportal7.ppt, reason: failed(2,203): Content-Type not application/msword: application/powerpoint
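Every failure in the log has the same shape: the server labelled the .ppt files application/powerpoint, and the check that rejected them compared the type against one fixed string. Below is a minimal Java sketch of a more tolerant comparison. It assumes the plugin may treat application/powerpoint as an alias of application/vnd.ms-powerpoint. It lower-cases the type, drops parameters such as "; charset=...", and maps known aliases to one canonical name before comparing. The class MimeAliases and its one-entry alias table are hypothetical, not part of Nutch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class MimeAliases {

    // Hypothetical alias table: non-standard name -> canonical type.
    private static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("application/powerpoint", "application/vnd.ms-powerpoint");
    }

    /** Lower-cases the type, strips parameters after ';', and resolves aliases. */
    public static String canonical(String contentType) {
        if (contentType == null) {
            return "";
        }
        String base = contentType;
        int semi = base.indexOf(';');
        if (semi >= 0) {
            base = base.substring(0, semi);
        }
        base = base.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(base, base);
    }

    /** True if both content types resolve to the same canonical type. */
    public static boolean matches(String expected, String actual) {
        return canonical(expected).equals(canonical(actual));
    }

    public static void main(String[] args) {
        String expected = "application/vnd.ms-powerpoint";
        String fromServer = "application/powerpoint";
        // A plain string comparison rejects the server's alias...
        System.out.println(expected.equals(fromServer));    // false
        // ...while comparing canonical forms accepts it.
        System.out.println(matches(expected, fromServer));  // true
        System.out.println(canonical("Application/PowerPoint; charset=binary"));
        // application/vnd.ms-powerpoint
    }
}
```

Any further aliases a server might send would need their own entries in the table.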



thanks,
Ayyanar...

--- Jérôme Charron <[EMAIL PROTECTED]> wrote:

> > 3. implement a catch-all plugin, which is equivalent to the Unix
> > command strings(1) (I have an implementation of that which I can
> > contribute), and turn it on or off in the config: if it's off, the
> > unknown content is skipped and logged; if it's on, make a best effort
> > to extract text.
>
> Andrzej, I really like this solution... +1
> In that case, the other parse plugins no longer need to check the
> content type: if they get some content, they can assume it is of the
> right content type.
>
> Regards
>
> Jérôme
>
>
> --
> http://motrech.free.fr/
> http://www.frutch.org/
>
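Andrzej's catch-all proposal, together with Jérôme's point that parsers would then not need to re-check the type, can be sketched in Java. This is not Andrzej's implementation (his code is not shown in the thread) and it does not use the Nutch plugin API. CatchAllDispatch, register, parse and the catchAllEnabled flag are all hypothetical. Known types go to their registered parser. Unknown types are either skipped and logged or, when the catch-all is on, reduced to runs of at least four printable ASCII characters, similar in spirit to strings(1).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CatchAllDispatch {

    /** Returns runs of printable ASCII (0x20-0x7E, plus tab) at least minLen long. */
    public static List<String> printableRuns(byte[] data, int minLen) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (byte b : data) {
            int c = b & 0xFF;
            if ((c >= 0x20 && c <= 0x7E) || c == '\t') {
                current.append((char) c);
            } else {
                if (current.length() >= minLen) {
                    runs.add(current.toString());
                }
                current.setLength(0);
            }
        }
        if (current.length() >= minLen) {
            runs.add(current.toString());
        }
        return runs;
    }

    private final Map<String, Function<byte[], String>> parsers = new HashMap<>();
    private final boolean catchAllEnabled;   // hypothetical config switch

    public CatchAllDispatch(boolean catchAllEnabled) {
        this.catchAllEnabled = catchAllEnabled;
    }

    public void register(String contentType, Function<byte[], String> parser) {
        parsers.put(contentType, parser);
    }

    /** Returns extracted text, or null when unknown content is skipped. */
    public String parse(String contentType, byte[] content) {
        Function<byte[], String> parser = parsers.get(contentType);
        if (parser != null) {
            // The type was matched here, so the parser itself need not re-check it.
            return parser.apply(content);
        }
        if (!catchAllEnabled) {
            System.err.println("skipping unknown content type: " + contentType);
            return null;
        }
        return String.join(" ", printableRuns(content, 4));
    }

    public static void main(String[] args) {
        byte[] blob = "\0\1Slide title\0ab\2Speaker notes".getBytes(StandardCharsets.ISO_8859_1);
        CatchAllDispatch on = new CatchAllDispatch(true);
        on.register("text/plain", bytes -> new String(bytes, StandardCharsets.ISO_8859_1));
        System.out.println(on.parse("application/powerpoint", blob));   // Slide title Speaker notes
        CatchAllDispatch off = new CatchAllDispatch(false);
        System.out.println(off.parse("application/powerpoint", blob));  // null
    }
}
```

In this design the type check happens once, in the dispatcher. The question Stephan raises then becomes a question of how the registry maps types, including aliases, to parsers.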

