No need for Tika.
Can you send your plugin.xml?

Mathijs Homminga

On Jun 27, 2012, at 23:07, Jorge Luis Betancourt Gonzalez <[email protected]> wrote:

> Hi,
>
> I agree with you, and it's a genius idea to rely on Tika to parse the files, but
> in this particular case, when all I want to do is encode the content into
> base64, should I write a custom parser for Tika and rely on the parse-tika
> plugin to do its magic?
>
> Jorge
>
> ----- Original Message -----
> From: "Lewis John Mcgibbney" <[email protected]>
> To: [email protected]
> Sent: Wednesday, June 27, 2012 16:55:12
> Subject: Re: Problema with NullPointerException on custom Parser
>
> Hi,
>
> I think you are partly correct.
>
> The core Nutch code itself doesn't do any parsing as such. All parsing
> is delegated to external parsing libraries.
>
> Basically we need to define a parser to do the parsing; using Tika as
> a wrapper for mimeType detection and subsequent parsing saves us a bit
> of overhead.
>
> Lewis
>
> On Wed, Jun 27, 2012 at 9:44 PM, Jorge Luis Betancourt Gonzalez
> <[email protected]> wrote:
>> Hi Lewis, thank you for the reply. Is it mandatory to write a wrapper around
>> Tika? I thought this was optional, since I don't really parse the content
>> searching for anything; I only get the content, transform it into an Image
>> object, resize it, and then encode it with base64 to store in the Solr backend.
>>
>> So I thought that all this processing could be done in the getParse method.
>>
>> Is my assumption correct, or is it mandatory to write my desired logic using
>> Tika?
>>
>> ----- Original Message -----
>> From: "Lewis John Mcgibbney" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, June 27, 2012 16:33:01
>> Subject: Re: Problema with NullPointerException on custom Parser
>>
>> Hi Jorge,
>>
>> It doesn't look like you're actually using Tika as a wrapper for your
>> custom parser at all...
>>
>> You would need to specify the correct Tika config by calling
>> tikaConfig.getParser
>>
>> hth
>>
>> On Wed, Jun 27, 2012 at 7:46 PM, Jorge Luis Betancourt Gonzalez
>> <[email protected]> wrote:
>>> Hi all:
>>>
>>> I'm working on a custom parser plugin to generate thumbnails from images
>>> fetched with Nutch 1.4. I'm doing this because the thumbnails will be
>>> converted into a base64-encoded string and stored in a Solr backend.
>>>
>>> So I basically wrote a custom parser (to which I send all PNG images, for
>>> example). I enabled the plugin (image-thumbnail) in nutch-site.xml and set
>>> some custom properties to load the width and height of the thumbnail. I also
>>> set the alias in parse-plugins.xml and set the plugin to handle
>>> image/png files, also in this file.
>>>
>>> The plugin is being loaded, but every time I get a PNG image to parse I get
>>> this:
>>>
>>> Error parsing:
>>> http://localhost/sites/all/themes/octavitos/images/iconos/audiointernet.png:
>>> java.lang.NullPointerException
>>>     at org.apache.nutch.parse.ParserFactory.match(ParserFactory.java:388)
>>>     at org.apache.nutch.parse.ParserFactory.getExtension(ParserFactory.java:397)
>>>     at org.apache.nutch.parse.ParserFactory.matchExtensions(ParserFactory.java:296)
>>>     at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:262)
>>>     at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:234)
>>>     at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:119)
>>>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
>>>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:86)
>>>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:42)
>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>
>>> The thing is that I have put some log messages inside the getParse() method,
>>> but none of these messages are being logged in the hadoop.log file, so as far
>>> as I can tell the method is not being executed.
>>>
>>> Does anyone have any idea what I'm doing wrong?
>>>
>>> P.S.: I've attached the source of the ImageThumbnailParser.
>>>
>>> Greetings!
>>>
>>>
>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>>> INFORMATICAS...
>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>
>>> http://www.uci.cu
>>> http://www.facebook.com/universidad.uci
>>> http://www.flickr.com/photos/universidad_uci
>>
>> --
>> Lewis
>
> --
> Lewis
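
For reference, the processing Jorge describes (read the fetched bytes into an image, resize it, and base64-encode the result) can be sketched with plain JDK classes, independent of Nutch or Tika. This is only an illustrative sketch, not the attached ImageThumbnailParser: the class name `ThumbnailEncoder`, the method `toBase64Thumbnail`, and the choice of PNG as the output format are all assumptions, and `java.util.Base64` requires Java 8 or later.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper; the names are illustrative and not taken from the thread.
final class ThumbnailEncoder {

    private ThumbnailEncoder() {
    }

    /**
     * Decodes raw image bytes, scales the image to width x height (without
     * preserving the aspect ratio), and returns the PNG-encoded thumbnail
     * as a Base64 string.
     */
    static String toBase64Thumbnail(byte[] content, int width, int height)
            throws IOException {
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(content));
        if (source == null) {
            // ImageIO.read returns null when no registered reader understands the data.
            throw new IOException("Content is not a readable image");
        }

        BufferedImage thumbnail =
                new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D graphics = thumbnail.createGraphics();
        try {
            graphics.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Draw the source image scaled down into the thumbnail canvas.
            graphics.drawImage(source, 0, 0, width, height, null);
        } finally {
            graphics.dispose();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(thumbnail, "png", out);
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}
```

Inside a parser's getParse() method, the fetched content bytes and the configured thumbnail width and height would presumably be passed to such a helper, with the returned string stored as a field for Solr. How that wiring looks depends on the plugin code, which is not shown in the thread.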

