Hi Alexander, 
I debugged this by deliberately introducing XML parsing errors into the config
files and found that Nutch wasn't looking at the right location for them. Once
that was fixed, some URLs were still skipped, and it turned out that the site's
robots.txt file forbids Nutch from crawling certain URLs.
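
For anyone who hits the same symptom, here is a rough sketch of how the
robots.txt side could be checked outside of Nutch, using Python's standard
urllib.robotparser. The rules in SAMPLE_ROBOTS and the "nutch" agent name are
assumptions for illustration only; the site's real robots.txt and the
configured agent name are not shown in this thread.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


# Hypothetical rules, NOT the real robots.txt of the site discussed here.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /literature/
"""

if __name__ == "__main__":
    urls = [
        "http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=670",
        "http://xlaevis.cpsc.ucalgary.ca/index.html",
    ]
    # "nutch" is an assumed agent name; use whatever the crawler is configured with.
    for url in blocked_urls(SAMPLE_ROBOTS, "nutch", urls):
        print("blocked by robots.txt:", url)
```

To test against the live site instead, the robots.txt text would have to be
downloaded first and passed in as robots_txt.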

Thanks. kenan.

Alexander Aristov wrote:
> 
> parse-plugins.xml says which plugin should be invoked for a particular mime
> type, but activation/deactivation is in nutch-site.xml.
> 
> Check activated plugins
> 
> Best Regards
> Alexander Aristov
> 
> 
> 2009/5/8 Kenan Azam <[email protected]>
> 
>> Thanks Alexander. I tried that, but again the plugin is registered but
>> not used. The mime type is html. I had not entered my other plugins in
>> parse-plugins.xml, but they were still running.
>>
>> The other thing I don't get is that all urls starting with
>> literature/article.do are not being indexed by any of my plugins. Maybe the
>> fetching process is somehow scoring them and deciding that they are not
>> worth indexing.
>>
>> I am using boost values, so could this be a possibility?
>>
>>  Again, these urls get fetched but never indexed.
>> hadoop.log file shows
>> 2009-05-07 14:32:23,048 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
>> 2009-05-07 14:32:23,049 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
>> 2009-05-07 14:32:23,051 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
>> 2009-05-07 14:32:23,052 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6223
>>
>> Thanks, Kenan.
>> On Thu, May 7, 2009 at 11:12 PM, Alexander Aristov <
>> [email protected]> wrote:
>>
>> > Did you assign a mime type to this plugin? What is it?
>> >
>> > It's in the parse-plugins.xml file. Unless you do that, Nutch won't know
>> > whether it should invoke your plugin for processing particular pages.
>> >
>> >
>> > Best Regards
>> > Alexander Aristov
>> >
>> >
>> > 2009/5/8 kazam <[email protected]>
>> >
>> > >
>> > > Hi there,
>> > > I am using nutch-0.8.1 with 5 custom plugins. According to the logs,
>> > > all of them get used except one. The urls that plugin was written for
>> > > are also skipped altogether.
>> > >
>> > > Here are some pieces from hadoop.log file
>> > > 2009-05-07 14:27:41,227 INFO  plugin.PluginRepository - Registered Plugins:
>> > > .....
>> > > .........
>> > > 2009-05-07 14:27:41,228 INFO  plugin.PluginRepository -         Xenbase Indexer (index-xenbase)
>> > > 2009-05-07 14:27:41,228 INFO  plugin.PluginRepository -         Article Display Page Parser (parse-articlePage)
>> > >
>> > > The last plugin --> parse-articlePage is never used.
>> > >
>> > > I wrote this plugin to index urls of the type
>> > > http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=670
>> > >
>> > > Again, these urls get fetched but never indexed.
>> > > hadoop.log file shows
>> > > 2009-05-07 14:32:23,048 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=5966
>> > > 2009-05-07 14:32:23,049 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=9196
>> > > 2009-05-07 14:32:23,051 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6247
>> > > 2009-05-07 14:32:23,052 INFO  fetcher.Fetcher - fetching http://xlaevis.cpsc.ucalgary.ca/literature/article.do?method=display&articleId=6223
>> > >
>> > > Am I missing some configuration, or is there a bug in the plugin? I
>> > > don't see any exceptions being thrown.
>> > >
>> > > Thanks for any pointers.
>> > >
>> > >
>> > > --
>> > > View this message in context:
>> > > http://www.nabble.com/Registered-plugin-never-invoked-and-urls-skipped-tp23435093p23435093.html
>> > > Sent from the Nutch - User mailing list archive at Nabble.com.
>> > >
>> > >
>> >
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Registered-plugin-never-invoked-and-urls-skipped-tp23435093p23491215.html
Sent from the Nutch - User mailing list archive at Nabble.com.
