I don't know the answer, but I'd also check the Tomcat/J2EE container logs for clues. That helped me solve a problem with nutch solrindex once. Also, I think the Solr data directory should be growing as documents are added to the index.
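One rough way to see whether the data directory is actually growing is to total up its file sizes a few times while solrindex is running. Here is a minimal sketch in Python. The function names and the path are my own placeholders, not anything from Nutch or Solr, so point it at wherever your Solr home keeps its data directory:

```python
import os
import time


def dir_size_bytes(path):
    """Return the total size in bytes of all files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file disappeared between listing and stat; skip it.
                pass
    return total


def watch(path, interval_seconds=30, samples=5):
    """Print the directory size a few times so growth is easy to spot."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), dir_size_bytes(path), "bytes")
        time.sleep(interval_seconds)
```

Calling watch("/path/to/solr/data") (a hypothetical path) prints one line per sample. If the number never changes while the Java process is busy, the documents are probably not reaching Solr yet.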

|-----Original Message-----
|From: Alex McLintock [mailto:[email protected]]
|Sent: Tuesday, August 11, 2009 12:22 PM
|To: [email protected]
|Subject: Re: Nutch to SolR. First steps
|
|Further information to this....
|
|I'm running on a single machine in fake clustering mode.
|
|A tmp directory gets created, with nothing but another empty directory
|inside of it.
|
|The hadoop log file just says the same thing over and over every 30
|seconds....
|
|2009-08-11 20:20:57,803 INFO plugin.PluginRepository - Plugins: looking in: /local/apps/software/nutch/plugins
|2009-08-11 20:20:58,158 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
|2009-08-11 20:20:58,159 INFO plugin.PluginRepository - Registered Plugins:
|2009-08-11 20:20:58,159 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
|2009-08-11 20:20:58,159 INFO plugin.PluginRepository - Basic Query Filter (query-basic)
|2009-08-11 20:20:58,159 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
|2009-08-11 20:20:58,159 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
|2009-08-11 20:20:58,159 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - Site Query Filter (query-site)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - Basic Summarizer Plug-in (summary-basic)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - HTTP Framework (lib-http)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - XML Response Writer Plug-in (response-xml)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
|2009-08-11 20:20:58,160 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - URL Query Filter (query-url)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - JSON Response Writer Plug-in (response-json)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - Registered Extension-Points:
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
|2009-08-11 20:20:58,161 INFO plugin.PluginRepository - Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch Field Filter (org.apache.nutch.indexer.field.FieldFilter)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch Search Results Response Writer (org.apache.nutch.searcher.response.ResponseWriter)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
|2009-08-11 20:20:58,162 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
|2009-08-11 20:20:58,163 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
|2009-08-11 20:20:58,163 INFO plugin.PluginRepository - Ontology Model Loader (org.apache.nutch.ontology.Ontology)
|2009-08-11 20:20:58,171 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
|2009-08-11 20:20:58,202 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
|
|Is Solr output a plugin, and is it not set up above?
|
|2009/8/11 Alex McLintock <[email protected]>:
|> I'm trying to send my Nutch crawl to SolR. I've "generated, fetched,
|> updated", several times. I've done an invertlinks.
|> But when I try to do the solrindex it just sits there for ages and
|> doesnt seem to stress the solr server at all.
|>
|> I'm using Nutch 1.0, Sun Java 1.6, Ubuntu Linux 9.04.
|>
|> /local/apps/software/nutch$ bin/nutch solrindex http://rio23:8983/solr/ crawl/crawldb crawl/linkdb crawl/segments/*
|>
|> Is there some kind of "verbose" option so that I can better see what
|> it is doing? I could maybe insert some extra deugging, or do i need to
|> run this in Eclipse?
|>
|> The Java process seems to be using up most of a core's CPU time so it
|> seems to be doing *something*.
|>
|> This is my first Solr project so I have proved that it is up and
|> running, but havent actually added any data to it yet...
|>
|> Alex
|>
