Hi Jorge, I can indeed reproduce your problem using your code.
After some debugging... You have to add a contentType parameter to your implementation in plugin.xml:

<implementation id="ImageThumbnailParser" class="...ImageThumbnailParser">
  <parameter name="contentType" value="image/png"/>
</implementation>

Good luck!

Sent from my iPhone,
Mathijs Homminga

On Jun 28, 2012, at 0:12, Jorge Luis Betancourt Gonzalez <jlbetanco...@uci.cu> wrote:

> Of course Mathijs, thank you for your time and the replies. Here is my
> parse-plugins.xml (as an attachment).
>
> Greetings!
>
> ----- Original message -----
> From: "Mathijs Homminga" <mathijs.hommi...@kalooga.com>
> To: user@nutch.apache.org
> Sent: Wednesday, June 27, 2012 17:44:43
> Subject: Re: Problema with NullPointerException on custom Parser
>
> Hmmm, looking at the ParserFactory code, there can actually be several causes
> for a NullPointerException...
> Can you also send the parse-plugins.xml?
>
> Mathijs Homminga
>
> On Jun 27, 2012, at 23:23, Jorge Luis Betancourt Gonzalez
> <jlbetanco...@uci.cu> wrote:
>
>> This is the content of my plugin.xml:
>>
>> <plugin
>>    id="image-thumbnail"
>>    name="Image thumbnailer for Orion"
>>    version="1.0.0"
>>    provider-name="nutch.org">
>>
>>    <runtime>
>>       <library name="image-thumbnail.jar">
>>          <export name="*"/>
>>       </library>
>>    </runtime>
>>
>>    <requires>
>>       <import plugin="nutch-extensionpoints"/>
>>    </requires>
>>
>>    <extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"
>>               name="Image thumbnailer parser"
>>               point="org.apache.nutch.parse.Parser">
>>       <implementation id="ImageThumbnailParser"
>>                       class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
>>    </extension>
>>
>>    <extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailIndexingFilter"
>>               name="Image thumbnail indexing filter"
>>               point="org.apache.nutch.indexer.IndexingFilter">
>>       <implementation id="ImageThumbnailIndexingFilter"
>>                       class="org.apache.nutch.parse.thumbnail.ImageThumbnailIndexingFilter"/>
>>    </extension>
>>
>> </plugin>
>>
>>
>> ----- Original message -----
>> From: "Mathijs Homminga" <mathijs.hommi...@kalooga.com>
>> To: user@nutch.apache.org
>> Sent: Wednesday, June 27, 2012 17:17:12
>> Subject: Re: Problema with NullPointerException on custom Parser
>>
>> No need for Tika. Can you send your plugin.xml?
>>
>> Mathijs Homminga
>>
>> On Jun 27, 2012, at 23:07, Jorge Luis Betancourt Gonzalez
>> <jlbetanco...@uci.cu> wrote:
>>
>>> Hi,
>>>
>>> I agree with you, and it is a great idea to rely on Tika to parse the
>>> files, but in this particular case, when all I want to do is encode the
>>> content into base64, should I write a custom parser for Tika and rely on
>>> the parser-tika plugin to do its magic?
>>>
>>> Jorge
>>>
>>> ----- Original message -----
>>> From: "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com>
>>> To: user@nutch.apache.org
>>> Sent: Wednesday, June 27, 2012 16:55:12
>>> Subject: Re: Problema with NullPointerException on custom Parser
>>>
>>> Hi,
>>>
>>> I think you are partly correct.
>>>
>>> The core Nutch code itself doesn't do any parsing as such. All parsing
>>> is delegated to external parsing libraries.
>>>
>>> Basically we need to define a parser to do the parsing; using Tika as
>>> a wrapper for mimeType detection and subsequent parsing saves us a bit
>>> of overhead.
>>>
>>> Lewis
>>>
>>> On Wed, Jun 27, 2012 at 9:44 PM, Jorge Luis Betancourt Gonzalez
>>> <jlbetanco...@uci.cu> wrote:
>>>> Hi Lewis, thank you for the reply. Is it mandatory to write a wrapper
>>>> around Tika? I thought this was optional, since I don't really parse the
>>>> content searching for anything; I only get the content, transform it
>>>> into an Image object, resize it, and then encode it with base64 to store
>>>> in the Solr backend.
>>>>
>>>> So I thought that all this processing could be done in the getParse
>>>> method.
>>>>
>>>> Is my assumption correct, or is it mandatory to write my desired logic
>>>> using Tika?
>>>>
>>>> ----- Original message -----
>>>> From: "Lewis John Mcgibbney" <lewis.mcgibb...@gmail.com>
>>>> To: user@nutch.apache.org
>>>> Sent: Wednesday, June 27, 2012 16:33:01
>>>> Subject: Re: Problema with NullPointerException on custom Parser
>>>>
>>>> Hi Jorge,
>>>>
>>>> It doesn't look like you're actually using Tika as a wrapper for your
>>>> custom parser at all...
>>>>
>>>> You would need to specify the correct Tika config by calling
>>>> tikaConfig.getParser
>>>>
>>>> hth
>>>>
>>>> On Wed, Jun 27, 2012 at 7:46 PM, Jorge Luis Betancourt Gonzalez
>>>> <jlbetanco...@uci.cu> wrote:
>>>>> Hi all:
>>>>>
>>>>> I'm working on a custom parser plugin to generate thumbnails from images
>>>>> fetched with Nutch 1.4. I'm doing this because the thumbnails will be
>>>>> converted into a base64 encoded string and stored in a Solr backend.
>>>>>
>>>>> So I basically wrote a custom parser (to which I send all png images, for
>>>>> example). I enabled the plugin (image-thumbnail) in nutch-site.xml and
>>>>> set some custom properties to load the width and height of the thumbnail.
>>>>> I also set the alias in parse-plugins.xml and set the plugin to handle
>>>>> image/png files, also in this file.
>>>>>
>>>>> The plugin is being loaded, but every time I get a png image to parse I
>>>>> get this:
>>>>>
>>>>> Error parsing:
>>>>> http://localhost/sites/all/themes/octavitos/images/iconos/audiointernet.png:
>>>>> java.lang.NullPointerException
>>>>>   at org.apache.nutch.parse.ParserFactory.match(ParserFactory.java:388)
>>>>>   at org.apache.nutch.parse.ParserFactory.getExtension(ParserFactory.java:397)
>>>>>   at org.apache.nutch.parse.ParserFactory.matchExtensions(ParserFactory.java:296)
>>>>>   at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:262)
>>>>>   at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:234)
>>>>>   at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:119)
>>>>>   at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
>>>>>   at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:86)
>>>>>   at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:42)
>>>>>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>>
>>>>> The thing is that I have put some log messages inside the getParse()
>>>>> method, but none of these messages are being logged in the hadoop.log
>>>>> file, so from what I can tell the method is not being executed.
>>>>>
>>>>> Does anyone have any idea what I'm doing wrong?
>>>>>
>>>>> P.S.: I've attached the source of the ImageThumbnailParser.
>>>>>
>>>>> Greetings!
>>>>>
>>>>>
>>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>>>> INFORMATICAS...
>>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>>
>>>>> http://www.uci.cu
>>>>> http://www.facebook.com/universidad.uci
>>>>> http://www.flickr.com/photos/universidad_uci
>>>>
>>>>
>>>>
>>>> --
>>>> Lewis
>>>
>>>
>>>
>>> --
>>> Lewis
>
> <parse-plugins.xml>
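
For readers following the thread: the processing Jorge describes (take the fetched bytes, turn them into an image, resize it, and base64-encode the result for Solr) could look roughly like the sketch below. This is only an illustration using plain JDK classes, not the ImageThumbnailParser attached to the original message, whose source is not shown here. The class name ThumbnailSketch and the method toBase64Thumbnail are hypothetical, the sketch assumes Java 8 or later for java.util.Base64 (newer than the JDKs typically used with Nutch 1.4), and it stretches the image to the requested size without preserving the aspect ratio.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Nutch and not the code from the thread.
public class ThumbnailSketch {

    /**
     * Decodes raw image bytes, scales them to width x height, and returns
     * the thumbnail as a base64-encoded PNG string.
     */
    public static String toBase64Thumbnail(byte[] content, int width, int height)
            throws IOException {
        // ImageIO.read returns null when no registered reader understands the bytes.
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(content));
        if (source == null) {
            throw new IOException("Content is not a readable image");
        }

        // Draw the source into a new image of the target size.
        BufferedImage thumb = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = thumb.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }

        // Serialize the thumbnail as PNG and encode the bytes as base64 text.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(thumb, "png", out);
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}
```

In a parser plugin, a helper like this would presumably be called from getParse() with the fetched content bytes and the configured thumbnail width and height. How the resulting string is attached to the parse result and indexed is left out, since that depends on code not shown in the thread.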