OK,

I'm trying to use the SolrIndexer with Nutch 1.0 and nothing seems to
be sent to Solr.

I've put some more debug logging into the SolrIndexer and SolrWriter
classes. It seems like although the SolrWriter class is told to open()
and close() it is never told to write() anything in between.
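
To show what I mean, here is a rough sketch of the kind of counting I
added. It is not the actual Nutch code: DocWriter, CountingWriter and
WriterLifecycleDemo are made-up names for illustration, and the real
SolrWriter method signatures are not reproduced here.

```java
// Hypothetical stand-in for a writer interface; NOT the real Nutch API.
interface DocWriter {
    void open();
    void write(String doc);
    void close();
}

// Counts write() calls between open() and close() and reports on close().
class CountingWriter implements DocWriter {
    private int writeCount = 0;
    private boolean isOpen = false;

    public void open() {
        isOpen = true;
        writeCount = 0;
        System.err.println("open() called");
    }

    public void write(String doc) {
        if (!isOpen) {
            throw new IllegalStateException("write() called before open()");
        }
        writeCount++;
    }

    public void close() {
        isOpen = false;
        System.err.println("close() called after " + writeCount + " write() calls");
    }

    public int getWriteCount() {
        return writeCount;
    }
}

class WriterLifecycleDemo {
    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter();
        writer.open();
        // No write() calls in between, which mirrors what my logging shows.
        writer.close();
    }
}
```

With logging along these lines, a count of zero at close() would suggest
that no documents reach the writer at all, rather than Solr rejecting them.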

Why would that be? Surely Nutch should be sending everything to Solr?
Is there some other kind of filtering going on? How could I find out?
Hadoop is taking ages to do the "map" and then quite quickly the
reduce results in nothing...

Here is the previous email on the subject in case your emailer hasn't
tied the two together.

Alex


2009/8/11 Alex McLintock <alex.mclint...@gmail.com>:
> Further information to this....
>
> I'm running on a single machine in fake clustering mode.
>
> A tmp directory gets created, with nothing but another empty directory
> inside of it.
>
> The hadoop log file just says the same thing over and over every 30 
> seconds....
>
> 2009-08-11 20:20:57,803 INFO  plugin.PluginRepository - Plugins:
> looking in: /local/apps/software/nutch/plugins
> 2009-08-11 20:20:58,158 INFO  plugin.PluginRepository - Plugin
> Auto-activation mode: [true]
> 2009-08-11 20:20:58,159 INFO  plugin.PluginRepository - Registered Plugins:
> 2009-08-11 20:20:58,159 INFO  plugin.PluginRepository -         the
> nutch core extension points (nutch-extensionpoints)
> 2009-08-11 20:20:58,159 INFO  plugin.PluginRepository -         Basic
> Query Filter (query-basic)
> 2009-08-11 20:20:58,159 INFO  plugin.PluginRepository -         Basic
> URL Normalizer (urlnormalizer-basic)
> 2009-08-11 20:20:58,159 INFO  plugin.PluginRepository -         Basic
> Indexing Filter (index-basic)
> 2009-08-11 20:20:58,159 INFO  plugin.PluginRepository -         Html
> Parse Plug-in (parse-html)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         Site
> Query Filter (query-site)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         Basic
> Summarizer Plug-in (summary-basic)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         HTTP
> Framework (lib-http)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -
> Pass-through URL Normalizer (urlnormalizer-pass)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         Regex
> URL Filter (urlfilter-regex)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         Http
> Protocol Plug-in (protocol-http)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         XML
> Response Writer Plug-in (response-xml)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         Regex
> URL Normalizer (urlnormalizer-regex)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -         OPIC
> Scoring Plug-in (scoring-opic)
> 2009-08-11 20:20:58,160 INFO  plugin.PluginRepository -
> CyberNeko HTML Parser (lib-nekohtml)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         Anchor
> Indexing Filter (index-anchor)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         URL
> Query Filter (query-url)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         Regex
> URL Filter Framework (lib-regex-filter)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         JSON
> Response Writer Plug-in (response-json)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository - Registered
> Extension-Points:
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         Nutch
> Summarizer (org.apache.nutch.searcher.Summarizer)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         Nutch
> Protocol (org.apache.nutch.protocol.Protocol)
> 2009-08-11 20:20:58,161 INFO  plugin.PluginRepository -         Nutch
> Analysis (org.apache.nutch.analysis.NutchAnalyzer)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> Field Filter (org.apache.nutch.indexer.field.FieldFilter)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         HTML
> Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> Query Filter (org.apache.nutch.searcher.QueryFilter)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> Search Results Response Writer
> (org.apache.nutch.searcher.response.ResponseWriter)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> URL Normalizer (org.apache.nutch.net.URLNormalizer)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> URL Filter (org.apache.nutch.net.URLFilter)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> Online Search Results Clustering Plugin
> (org.apache.nutch.clustering.OnlineClusterer)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 2009-08-11 20:20:58,162 INFO  plugin.PluginRepository -         Nutch
> Content Parser (org.apache.nutch.parse.Parser)
> 2009-08-11 20:20:58,163 INFO  plugin.PluginRepository -         Nutch
> Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2009-08-11 20:20:58,163 INFO  plugin.PluginRepository -
> Ontology Model Loader (org.apache.nutch.ontology.Ontology)
> 2009-08-11 20:20:58,171 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2009-08-11 20:20:58,202 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter
>
>
>
> Is Solr output a plugin, and is it not set up above?
>
> 2009/8/11 Alex McLintock <alex.mclint...@gmail.com>:
>> I'm trying to send my Nutch crawl to Solr. I've "generated, fetched,
>> updated", several times. I've done an invertlinks.
>> But when I try to do the solrindex it just sits there for ages and
>> doesn't seem to stress the Solr server at all.
>>
>> I'm using Nutch 1.0, Sun Java 1.6, Ubuntu Linux 9.04.
>>
>> /local/apps/software/nutch$ bin/nutch solrindex
>> http://rio23:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
>>
>> Is there some kind of "verbose" option so that I can better see what
>> it is doing? I could maybe insert some extra debugging, or do I need to
>> run this in Eclipse?
>>
>> The Java process seems to be using up most of a core's CPU time so it
>> seems to be doing *something*.
>>
>> This is my first Solr project, so I have proved that it is up and
>> running, but haven't actually added any data to it yet...
>>
>> Alex
>>
>
