Hi,

I think you are partly correct.

The core Nutch code itself doesn't do any parsing as such. All parsing
is delegated to external parsing libraries.

Basically we need to define a parser to do the parsing. Using Tika as
a wrapper for MIME type detection and subsequent parsing saves us a bit
of overhead.
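
For the processing you describe (read the fetched bytes, resize,
base64-encode), here is a minimal sketch using only the JDK. It is an
illustration rather than the attached ImageThumbnailParser: the class name
ThumbnailSketch and the method toBase64Thumbnail are hypothetical, it leaves
out all Nutch and Tika wiring, it always writes the thumbnail as PNG, and it
assumes Java 8 or later for java.util.Base64.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of Nutch or Tika.
class ThumbnailSketch {

    /**
     * Decodes raw image bytes, scales the image to exactly width x height
     * (aspect ratio is NOT preserved), and returns the result as a
     * base64-encoded PNG.
     */
    static String toBase64Thumbnail(byte[] imageBytes, int width, int height)
            throws IOException {
        BufferedImage source = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (source == null) {
            // ImageIO.read returns null when no registered reader understands the data.
            throw new IOException("Unsupported or corrupt image data");
        }

        BufferedImage thumb = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = thumb.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            // Draw the source scaled into the thumbnail's bounds.
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(thumb, "png", out);
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        // Build a small in-memory PNG so the example needs no input file.
        BufferedImage sample = new BufferedImage(64, 48, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ImageIO.write(sample, "png", buf);

        String encoded = toBase64Thumbnail(buf.toByteArray(), 16, 12);
        System.out.println("Base64 thumbnail length: " + encoded.length());
    }
}
```

In a parser plugin, something like this would presumably be called with the
fetched content bytes and the configured width and height, and the returned
string stored as a field for Solr.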

Lewis

On Wed, Jun 27, 2012 at 9:44 PM, Jorge Luis Betancourt Gonzalez
<[email protected]> wrote:
> Hi Lewis, thank you for the reply. Is it mandatory to write a wrapper around
> Tika? I thought this was optional, since I don't really parse the content
> looking for anything; I only get the content, transform it into an Image
> object, resize it, and then encode it with base64 to store in the Solr backend.
>
> So I thought that all this processing could be done in the getParse method.
>
> Is my assumption correct, or is it mandatory to write my desired logic using Tika?
>
> ----- Original Message -----
> From: "Lewis John Mcgibbney" <[email protected]>
> To: [email protected]
> Sent: Wednesday, June 27, 2012 16:33:01
> Subject: Re: Problema with NullPointerException on custom Parser
>
> Hi Jorge,
>
> It doesn't look like you're actually using Tika as a wrapper for your
> custom parser at all...
>
> You would need to specify the correct Tika config by calling
> tikaConfig.getParser
>
> hth
>
> On Wed, Jun 27, 2012 at 7:46 PM, Jorge Luis Betancourt Gonzalez
> <[email protected]> wrote:
>> Hi all:
>>
>> I'm working on a custom parser plugin to generate thumbnails from images
>> fetched with Nutch 1.4. I'm doing this because the thumbnails will be
>> converted into a base64-encoded string and stored in a Solr backend.
>>
>> So I basically wrote a custom parser (to which I send all PNG images, for
>> example). I enabled the plugin (image-thumbnail) in nutch-site.xml and set
>> some custom properties to load the width and height of the thumbnail. I also
>> set the alias in parse-plugins.xml and, in the same file, set the plugin to
>> handle image/png files.
>>
>> The plugin is being loaded, but every time I get a PNG image to parse I get
>> this:
>>
>> Error parsing: http://localhost/sites/all/themes/octavitos/images/iconos/audiointernet.png: java.lang.NullPointerException
>>        at org.apache.nutch.parse.ParserFactory.match(ParserFactory.java:388)
>>        at org.apache.nutch.parse.ParserFactory.getExtension(ParserFactory.java:397)
>>        at org.apache.nutch.parse.ParserFactory.matchExtensions(ParserFactory.java:296)
>>        at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:262)
>>        at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:234)
>>        at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:119)
>>        at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
>>        at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:86)
>>        at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:42)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>
>> I have put some log messages inside the getParse() method, but none of
>> these messages are being logged in the hadoop.log file, so as far as I
>> can tell the method is not being executed.
>>
>> Does anyone have any idea what I'm doing wrong?
>>
>> P.S: I've attached the source of the ImageThumbnailParser.
>>
>> Greetings!
>>
>>
>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS 
>> INFORMATICAS...
>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>>
>> http://www.uci.cu
>> http://www.facebook.com/universidad.uci
>> http://www.flickr.com/photos/universidad_uci
>
>
>
> --
> Lewis



-- 
Lewis
