This is the content of my plugin.xml:
<plugin
   id="image-thumbnail"
   name="Image thumbnailer for Orion"
   version="1.0.0"
   provider-name="nutch.org">

   <runtime>
      <library name="image-thumbnail.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="nutch-extensionpoints"/>
   </requires>

   <extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"
              name="Image thumbnailer parser"
              point="org.apache.nutch.parse.Parser">
      <implementation id="ImageThumbnailParser"
                      class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"/>
   </extension>

   <extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailIndexingFilter"
              name="Image thumbnail indexing filter"
              point="org.apache.nutch.indexer.IndexingFilter">
      <implementation id="ImageThumbnailIndexingFilter"
                      class="org.apache.nutch.parse.thumbnail.ImageThumbnailIndexingFilter"/>
   </extension>

</plugin>
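
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the parser plugins bundled with Nutch 1.x (parse-html, for example) declare `contentType` and `pathSuffix` parameters inside the Parser `<implementation>` element, and the stack trace in this thread ends in `ParserFactory.match`, which reads the extension's `contentType` attribute. The descriptor above declares neither parameter. A minimal sketch of the Parser extension with them added, assuming the plugin should only handle `image/png` (that value is illustrative):

```xml
<extension id="org.apache.nutch.parse.thumbnail.ImageThumbnailParser"
           name="Image thumbnailer parser"
           point="org.apache.nutch.parse.Parser">
   <implementation id="ImageThumbnailParser"
                   class="org.apache.nutch.parse.thumbnail.ImageThumbnailParser">
      <!-- Assumption: MIME type(s) this parser should handle; "image/png" is illustrative -->
      <parameter name="contentType" value="image/png"/>
      <!-- Assumption: left empty, following the bundled parser plugins -->
      <parameter name="pathSuffix" value=""/>
   </implementation>
</extension>
```

Only the Parser extension is shown; the IndexingFilter extension and the rest of the descriptor would stay as they are.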
----- Original message -----
From: "Mathijs Homminga" <[email protected]>
To: [email protected]
Sent: Wednesday, June 27, 2012 17:17:12
Subject: Re: Problema with NullPointerException on custom Parser
No need for Tika. Can you send your plugin.xml?
Mathijs Homminga
On Jun 27, 2012, at 23:07, Jorge Luis Betancourt Gonzalez <[email protected]>
wrote:
> Hi,
>
> I agree with you, and relying on Tika to parse the files is a great idea, but
> in this particular case, where all I want to do is encode the content into
> base64, should I write a custom parser for Tika and rely on the parse-tika
> plugin to do its magic?
>
> Jorge
>
> ----- Original message -----
> From: "Lewis John Mcgibbney" <[email protected]>
> To: [email protected]
> Sent: Wednesday, June 27, 2012 16:55:12
> Subject: Re: Problema with NullPointerException on custom Parser
>
> Hi,
>
> I think you are partly correct.
>
> The core Nutch code itself doesn't do any parsing as such. All parsing
> is delegated to external parsing libraries.
>
> Basically, we need to define a parser to do the parsing. Using Tika as
> a wrapper for MIME type detection and subsequent parsing saves us a bit
> of overhead.
>
> Lewis
>
> On Wed, Jun 27, 2012 at 9:44 PM, Jorge Luis Betancourt Gonzalez
> <[email protected]> wrote:
>> Hi Lewis, thank you for the reply. Is it mandatory to write a wrapper around
>> Tika? I thought this was optional, since I don't really parse the content
>> looking for anything; I only get the content, transform it into an Image
>> object, resize it, and then encode it with base64 to store it in the Solr
>> backend.
>>
>> So I thought that all this processing could be done in the getParse method.
>>
>> Is my assumption correct, or is it mandatory to write my logic using
>> Tika?
>>
>> ----- Original message -----
>> From: "Lewis John Mcgibbney" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, June 27, 2012 16:33:01
>> Subject: Re: Problema with NullPointerException on custom Parser
>>
>> Hi Jorge,
>>
>> It doesn't look like you're actually using Tika as a wrapper for your
>> custom parser at all...
>>
>> You would need to specify the correct Tika config by calling
>> tikaConfig.getParser
>>
>> hth
>>
>> On Wed, Jun 27, 2012 at 7:46 PM, Jorge Luis Betancourt Gonzalez
>> <[email protected]> wrote:
>>> Hi all:
>>>
>>> I'm working on a custom parser plugin to generate thumbnails from images
>>> fetched with Nutch 1.4. I'm doing this because the thumbnails will be
>>> converted into base64-encoded strings and stored in a Solr backend.
>>>
>>> So I basically wrote a custom parser (to which I send all PNG images, for
>>> example). I enabled the plugin (image-thumbnail) in nutch-site.xml and set
>>> some custom properties for the width and height of the thumbnail. I also
>>> set the alias in parse-plugins.xml and mapped the plugin to image/png
>>> files in the same file.
>>>
>>> The plugin is being loaded, but every time I get a PNG image to parse I get
>>> this:
>>>
>>> Error parsing:
>>> http://localhost/sites/all/themes/octavitos/images/iconos/audiointernet.png:
>>> java.lang.NullPointerException
>>>     at org.apache.nutch.parse.ParserFactory.match(ParserFactory.java:388)
>>>     at org.apache.nutch.parse.ParserFactory.getExtension(ParserFactory.java:397)
>>>     at org.apache.nutch.parse.ParserFactory.matchExtensions(ParserFactory.java:296)
>>>     at org.apache.nutch.parse.ParserFactory.findExtensions(ParserFactory.java:262)
>>>     at org.apache.nutch.parse.ParserFactory.getExtensions(ParserFactory.java:234)
>>>     at org.apache.nutch.parse.ParserFactory.getParsers(ParserFactory.java:119)
>>>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:71)
>>>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:86)
>>>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:42)
>>>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>>>
>>> I have put some log messages inside the getParse() method, but none of
>>> them show up in the hadoop.log file, so as far as I can tell the method
>>> is not being executed.
>>>
>>> Does anyone have any idea what I'm doing wrong?
>>>
>>> P.S: I've attached the source of the ImageThumbnailParser.
>>>
>>> Greetings!
>>>
>>>
>>> 10mo. ANIVERSARIO DE LA CREACION DE LA UNIVERSIDAD DE LAS CIENCIAS
>>> INFORMATICAS...
>>> CONECTADOS AL FUTURO, CONECTADOS A LA REVOLUCION
>>>
>>> http://www.uci.cu
>>> http://www.facebook.com/universidad.uci
>>> http://www.flickr.com/photos/universidad_uci
>>
>>
>>
>> --
>> Lewis
>
>
>
> --
> Lewis