I believe it can. Check your configuration files, nutch-site.xml and nutch-default.xml.
You will find something like:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|swf|pdf)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>

Add "msword" to the parsers: change

  parse-(text|html|swf|pdf)

to

  parse-(text|html|swf|pdf|msword)

There is a plugin in the plugins folder that parses MS Word documents: parse-msword. I have not tried it so far.

Jair Piedrahita Vargas wrote:
> Can Nutch search inside the content of an msword file? I've tried, but it
> says "parser not found for contentType=application/msword"
> What can I do to correct this error?
>
> Thanks
>
> JAIR PIEDRAHITA VARGAS
> Gerencia de Investigación y Nuevas Tecnologías
> Teléfono: 4040000 Ext 41632
> Av. los Industriales Cra 48 # 26-85 piso 6B
> BANCOLOMBIA S.A
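
Since I have not tried the plugin myself, here is a small sanity check for the edited expression. It is only a sketch: it assumes Nutch compares each plugin directory name against the whole plugin.includes expression (so "protocol-http" would not also enable "protocol-httpclient"), and the helper name is_included is made up for this example, not part of Nutch.

```python
import re

# plugin.includes value after adding "msword" to the parse-(...) group.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(text|html|swf|pdf|msword)|"
    "index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|"
    "summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def is_included(plugin_dir: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the directory name matches the whole expression.

    Assumption: whole-name matching, which re.fullmatch models here.
    """
    return re.fullmatch(includes, plugin_dir) is not None


if __name__ == "__main__":
    # Print which of a few plugin directory names the expression would select.
    for name in ("parse-msword", "parse-pdf", "parse-rtf", "protocol-httpclient"):
        print(name, is_included(name))
```

If parse-msword comes back as included and the "parser not found" message still appears, the next thing to check would be that the parse-msword directory actually exists in your plugins folder.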
