I suppose you've also tried
https://issues.apache.org/jira/browse/NUTCH-783 as suggested in the
previous discussion?
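
For reference, here is a minimal, standalone sketch of the kind of extraction this thread is about: it collects the name/content pairs of meta tags and stores them under "metatag."-prefixed keys, which is the naming to look for in the parse-metadata. It is not the code from either patch. The class name MetaTagSketch, the regex-based matching, and the lower-casing of tag names are assumptions made purely for illustration; a real parse plugin would work on the parsed DOM instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Nutch: collects meta name/content pairs. */
public class MetaTagSketch {

    // Assumption: only handles <meta name="X" content="Y"> with the
    // attributes in exactly that order and double-quoted values.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"\\s*/?>",
        Pattern.CASE_INSENSITIVE);

    /** Returns a map keyed "metatag.<lower-cased name>", in document order. */
    public static Map<String, String> extract(String html) {
        Map<String, String> tags = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            // Lower-casing the tag name is an assumption of this sketch.
            String key = "metatag." + m.group(1).toLowerCase(Locale.ROOT);
            tags.put(key, m.group(2));
        }
        return tags;
    }

    public static void main(String[] args) {
        // The two tags from the original question.
        String html = "<html><head>"
            + "<meta name=\"TITLE\" content=\"Some title\" />"
            + "<meta name=\"KEYWORDS\" content=\"Forum, help, build, stuff\" />"
            + "</head><body></body></html>";
        extract(html).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Each key produced this way (here metatag.title and metatag.keywords) would then need a matching field definition in the Solr schema before it can show up in the index.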

On 21 May 2010 16:18, Julien Nioche <[email protected]> wrote:

> You can:
> - run 'bin/nutch org.apache.nutch.parse.ParserChecker' and check that you
>   are getting metatag.* in the parse-metadata
> - check in the log that the parse-metatags plugin is really loaded
> - run 'ant test-plugins' and see the output in build/parse-metatags
> - check that you've added the field definitions in the SOLR schema
> - index with Lucene and use Luke to check that the fields are created
>
>
>
> On 21 May 2010 15:54, Claus Daldorph Nielsen <[email protected]> wrote:
>
>> I never got this to work. So if anybody has ideas for debugging, please
>> post them.
>>
>> The problem is that the meta tags are never found or added to the Solr
>> index. I have no idea why.
>>
>>
>>
>> Claus Daldorph Nielsen
>>
>> Theilgaard Mortensen a/s
>> Niels Hemmingsens gade 9
>> 1153 København K
>>
>> Tlf: 33448555
>>
>>
>>
>> Julien Nioche <[email protected]>
>> 21-05-2010 13:33
>> Please respond to [email protected]
>> To: [email protected]
>> Subject: Re: Parse and index meta tags in Nutch 1.0
>>
>> Have you checked the discussion in
>> http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
>> ?
>> What have you modified in nutch-site.xml?
>>
>> j.
>>
>> On 21 May 2010 12:15, Claus Daldorph Nielsen <[email protected]> wrote:
>>
>> > Julien,
>> >
>> > Thanks, it looks much like what I need. I have applied the patch, added
>> > the lines to nutch-site.xml, and then rebuilt the Nutch project. But I
>> > still don't see any metatags in my index. Do you have any suggestions as
>> > to what I might be doing wrong? Perhaps some configuration that I missed?
>> >
>> >
>> >
>> > Claus Daldorph Nielsen
>> >
>> >
>> >
>> > Julien Nioche <[email protected]>
>> > 21-05-2010 09:39
>> > Please respond to [email protected]
>> > To: [email protected]
>> > Subject: Re: Parse and index meta tags in Nutch 1.0
>> >
>> > Claus,
>> >
>> > See https://issues.apache.org/jira/browse/NUTCH-809 and a related
>> > discussion on
>> > http://lucene.472066.n3.nabble.com/description-and-keywords-td690681.html
>> >
>> > Julien
>> >
>> > --
>> > DigitalPebble Ltd
>> > http://www.digitalpebble.com
>> >
>> > On 21 May 2010 08:26, Claus Daldorph Nielsen <[email protected]> wrote:
>> >
>> > > Hi,
>> > >
>> > > I am new to Nutch and am trying to get it to index meta tags from HTML
>> > > pages and store them for searching in Solr. The tags are in this form:
>> > > <meta name="TITLE" content="Some title" />
>> > > <meta name="KEYWORDS" content="Forum, help, build, stuff" />
>> > >
>> > > I would like to store the tags as two different fields in the index. I
>> > > have tried the example explaining how to create a plugin, but the
>> > > example is for Nutch 0.9 and only helps me get started.
>> > >
>> > > I think that I should look at:
>> > >
>> > > $NUTCH_HOME/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/HtmlParser.java
>> > >
>> > > and find the line:
>> > > HTMLMetaProcessor.getMetaTags(metaTags, root, base);
>> > >
>> > > But I'm not sure how to go on from here. Any help would be appreciated,
>> > > and you are welcome to tell me if you know of an existing plugin that
>> > > will index the meta tags.
>> > >
>> > >
>> > >
>> > > Claus Daldorph Nielsen
>> > >
>> > > Theilgaard Mortensen a/s
>> >
>> >
>>
>>
>> --
>> DigitalPebble Ltd
>> http://www.digitalpebble.com
>>
>>
>
>
> --
> DigitalPebble Ltd
> http://www.digitalpebble.com
>



-- 
DigitalPebble Ltd
http://www.digitalpebble.com
