> 
> I have enabled the ppt extension in
> crawl-urlfilter.txt. Now it is fetching the PowerPoint
> files,
> but I am getting the following error, because the ppt files'
> content type is not recognized by Nutch.

Looking at the code, here is the comment from ParserFactory (the 
class that chooses which parser to use):
/*****
Content type has priority: the first plugin found whose
"contentType" attribute matches the beginning of the content's type is
used. If none match, then the first whose "pathSuffix" attribute matches
the end of the url's path is used. If neither of these match, then the
first plugin whose "pathSuffix" is the empty string is used.
*******/

In your case, neither the content type nor the path suffix matched, so the 
plugin used is the first one whose "pathSuffix" is the empty string (in 
effect an arbitrary one).
I think your HTTP server is misconfigured.
It returns application/powerpoint as the content type for PowerPoint files, 
whereas the mspowerpoint plugin is registered for the content type 
application/vnd.ms-powerpoint.
Please change your HTTP server configuration and everything should work fine.
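
To make the selection order in the ParserFactory comment quoted above more concrete, here is a rough sketch of it in Java. This is not the actual Nutch code: the class ParserSelectionSketch, the Plugin holder and the select method are hypothetical names invented for illustration, and the plugin attributes in the usage note are assumptions, not the real plugin descriptors.

```java
import java.util.List;

/** Illustrative sketch only; NOT the real Nutch ParserFactory. */
final class ParserSelectionSketch {

    /** Hypothetical stand-in for the attributes a parser plugin declares. */
    static final class Plugin {
        final String name;
        final String contentType; // may be null if the plugin declares none
        final String pathSuffix;  // may be null; "" marks a catch-all plugin

        Plugin(String name, String contentType, String pathSuffix) {
            this.name = name;
            this.contentType = contentType;
            this.pathSuffix = pathSuffix;
        }
    }

    /** Returns the name of the selected plugin, or null if none applies. */
    static String select(List<Plugin> plugins, String contentType, String urlPath) {
        // 1. Content type has priority: the plugin's contentType must match
        //    the beginning of the content type reported for the document.
        if (contentType != null) {
            for (Plugin p : plugins) {
                if (p.contentType != null && !p.contentType.isEmpty()
                        && contentType.startsWith(p.contentType)) {
                    return p.name;
                }
            }
        }
        // 2. Otherwise, a non-empty pathSuffix must match the end of the URL path.
        if (urlPath != null) {
            for (Plugin p : plugins) {
                if (p.pathSuffix != null && !p.pathSuffix.isEmpty()
                        && urlPath.endsWith(p.pathSuffix)) {
                    return p.name;
                }
            }
        }
        // 3. Otherwise, fall back to the first plugin whose pathSuffix is "".
        for (Plugin p : plugins) {
            if ("".equals(p.pathSuffix)) {
                return p.name;
            }
        }
        return null;
    }
}
```

With an assumed plugin list of an "html" plugin (contentType text/html, pathSuffix "") followed by an "mspowerpoint" plugin (contentType application/vnd.ms-powerpoint, no pathSuffix), a document served as application/powerpoint falls through to step 3 and gets the "html" plugin, while the same document served as application/vnd.ms-powerpoint is matched in step 1.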

Regards

Jérôme
