Of course, Mathijs. Thank you for your time and the replies. My 
parse-plugins.xml is attached.

Greetings!

----- Original message -----
From: "Mathijs Homminga" <[email protected]>
To: [email protected]
Sent: Wednesday, June 27, 2012 17:44:43
Subject: Re: Problema with NullPointerException on custom Parser

Hmmm looking at the ParserFactory code, there can actually be several causes 
for a NullPointerException...
Can you also send the parse-plugins.xml?

Mathijs Homminga

On Jun 27, 2012, at 23:23, Jorge Luis Betancourt Gonzalez <[email protected]> 
wrote:

> This is the content of my plugin.xml
>
> <plugin
>   id="image-thumbnail"
>   name="Image thumbnailer for Orion"
>   version="1.0.0"
>   provider-name="nutch.org">
>
>   <runtime>
>      <library name="image-thumbnail.jar">
>         <export name="*"/>
>      </library>
>   </runtime>
>
>   <requires>
>      <import plugin="nutch-extensionpoints"/>
>   </requires>
>
>   <extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"
>              name="Image thumbnailer parser"
>              point="org.apache.nutch.parse.Parser">
>      <implementation id="ImageThumbnailParser"
>                      class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
>   </extension>
>
>   <extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailIndexingFilter"
>              name="Image thumbnail indexing filter"
>              point="org.apache.nutch.indexer.IndexingFilter">
>      <implementation id="ImageThumbnailIndexingFilter"
>                      class="org.apache.nutch.parse.thumbnail.ImageThumbnailIndexingFilter"/>
>   </extension>
>
> </plugin>
>
>
> ----- Original message -----
> From: "Mathijs Homminga" <[email protected]>
> To: [email protected]
> Sent: Wednesday, June 27, 2012 17:17:12
> Subject: Re: Problema with NullPointerException on custom Parser
>
> No need for Tika. Can you send your plugin.xml?
>
> Mathijs Homminga
>
> On Jun 27, 2012, at 23:07, Jorge Luis Betancourt Gonzalez 
> <[email protected]> wrote:
>
>> Hi,
>>
>> I agree with you, and relying on Tika to parse the files is a great idea. But 
>> in this particular case, where all I want to do is encode the content as 
>> base64, should I write a custom parser for Tika and rely on the parse-tika 
>> plugin to do its magic?
>>
>> Jorge
>>
>> ----- Original message -----
>> From: "Lewis John Mcgibbney" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, June 27, 2012 16:55:12
>> Subject: Re: Problema with NullPointerException on custom Parser
>>
>> Hi,
>>
>> I think you are partly correct.
>>
>> The core Nutch code itself doesn't do any parsing as such. All parsing
>> is delegated to external parsing libraries.
>>
>> Basically, we need to define a parser to do the parsing. Using Tika as
>> a wrapper for MIME type detection and subsequent parsing saves us a bit
>> of overhead.
>>
>> Lewis
>>
>> On Wed, Jun 27, 2012 at 9:44 PM, Jorge Luis Betancourt Gonzalez
>> <[email protected]> wrote:
>>> Hi Lewis, thank you for the reply. Is it mandatory to write a wrapper around 
>>> Tika? I thought this was optional, since I don't really parse the content 
>>> looking for anything. I only get the content, transform it into an Image 
>>> object, resize it, and then encode it with base64 to store in the Solr 
>>> backend.
>>>
>>> So I thought that all this processing could be done in the getParse method.
>>>
>>> Is my assumption correct, or is it mandatory to write my desired logic using 
>>> Tika?
>>>
>>> ----- Original message -----
>>> From: "Lewis John Mcgibbney" <[email protected]>
>>> To: [email protected]
>>> Sent: Wednesday, June 27, 2012 16:33:01
>>> Subject: Re: Problema with NullPointerException on custom Parser
>>>
>>> Hi Jorge,
>>>
>>> It doesn't look like you're actually using Tika as a wrapper for your
>>> custom parser at all...
>>>
>>> You would need to specify the correct Tika config by calling
>>> tikaConfig.getParser
>>>
>>> hth
>>>
>>> On Wed, Jun 27, 2012 at 7:46 PM, Jorge Luis Betancourt Gonzalez
>>> <[email protected]> wrote:
>>>> Hi all:
>>>>
>>>> I'm working on a custom parser plugin to generate thumbnails from images 
>>>> fetched with Nutch 1.4. I'm doing this because the thumbnails will be 
>>>> converted into a base64-encoded string and stored in a Solr backend.
>>>>
>>>> So I basically wrote a custom parser (to which I send all PNG images, for 
>>>> example). I enabled the plugin (image-thumbnail) in nutch-site.xml and set 
>>>> some custom properties to load the width and height of the thumbnail. I also 
>>>> set the alias in parse-plugins.xml and, in the same file, set the plugin to 
>>>> handle image/png files.
>>>>
>>>> The plugin is being loaded, but every time I get a PNG image to parse I 
>>>> get this:
>>>>
>>>> Error parsing: http://localhost/sites/all/themes/octavitos/images/iconos/audiointernet.png:
>>>>  java.lang.NullPointerException
>>>>       at org.apache.nutch.parse.ParserFactory.match(ParserFactory.java:388)
>>>>       at org.apache.nutch.parse.ParserFactory.getExtension(ParserFactory.java:397)
>>>>       at org.apache.nutch.parse.ParserFactory.matchExtensions(ParserFactory.java:296)
>>>>       at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:262)
>>>>       at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:234)
>>>>       at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:119)
>>>>       at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
>>>>       at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:86)
>>>>       at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:42)
>>>>       at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>       at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>>       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>>
>>>> The thing is that I have put some log messages inside the getParse() 
>>>> method, but none of these messages are being logged in the hadoop.log 
>>>> file, so as far as I can tell the method is not being executed.
>>>>
>>>> Does anyone have any idea what I'm doing wrong?
>>>>
>>>> P.S.: I've attached the source of the ImageThumbnailParser.
>>>>
>>>> Greetings!
>>>>
>>>>
>>>> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS 
>>>> INFORMATICAS...
>>>> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>>>>
>>>> http://www.uci.cu
>>>> http://www.facebook.com/universidad.uci
>>>> http://www.flickr.com/photos/universidad_uci
>>>
>>>
>>>
>>> --
>>> Lewis
>>>
>>
>>
>>
>> --
>> Lewis
>>
>

Attachment: parse-plugins.xml
Description: XML document
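
The attached parse-plugins.xml is not reproduced in this archive. As a rough sketch only, a file that routes image/png to the plugin described in the thread might look like the one below. The plugin id `image-thumbnail` and the MIME type come from the thread; the `extension-id` value is an assumption, chosen here to match the `<implementation id="...">` in the quoted plugin.xml, and should be checked against how ParserFactory resolves aliases in the Nutch version in use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sketch; this is NOT the poster's actual attachment. -->
<parse-plugins>

  <!-- Route PNG images to the thumbnail plugin (plugin id taken from the thread). -->
  <mimeType name="image/png">
    <plugin id="image-thumbnail" />
  </mimeType>

  <aliases>
    <!-- Assumption: extension-id should equal the <implementation id="...">
         declared for the parser in plugin.xml. -->
    <alias name="image-thumbnail"
           extension-id="ImageThumbnailParser" />
  </aliases>

</parse-plugins>
```

The parser plugins shipped with Nutch also declare a `<parameter name="contentType" .../>` child inside the parser's `<implementation>` element in plugin.xml. The plugin.xml quoted in this thread has none, which may be worth checking given that the exception is raised in ParserFactory.match.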
