Lewis,

Thanks for looking at this. Solr has the newest patched schema and I restarted Tomcat.
I set DEBUG for SolrIndexerJob in the log4j.properties file:

log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=DEBUG,cmdstdout

> Can I also suggest that you experiment with the crawl script (which
> accompanies the nutch script) instead of using the deprecated crawl
> command.

Where is this script? The bin folder has only the nutch script.

> perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
> your hadoop.log as well. I can confirm that I was able to get Nutch
> trunk working with a standalone Solr 4.0 multicore server with the
> patch applied just last week.

I am using Nutch 2.1, not trunk. Does that make any difference to the behavior of the nutch script? Can you give me the main points, or maybe a script, of the full steps you followed when you tested and got this working last week?

I am getting this in hadoop.log:

2012-11-13 10:34:50,466 INFO solr.SolrIndexerJob - SolrIndexerJob: starting
2012-11-13 10:34:50,805 INFO plugin.PluginRepository - Plugins: looking in: /home/eakarsu/searchProject/apache-nutch-2.1/runtime/local/plugins
2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Registered Plugins:
2012-11-13 10:34:50,867 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Registered Extension-Points:
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Parse Filter (org.apache.nutch.parse.ParseFilter)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2012-11-13 10:34:50,872 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2012-11-13 10:34:50,872 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-13 10:34:50,875 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-13 10:34:50,875 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-13 10:34:51,891 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2012-11-13 10:34:52,765 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: content dest: content
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: site dest: site
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: title dest: title
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: host dest: host
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: segment dest: segment
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: boost dest: boost
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: digest dest: digest
2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2012-11-13 10:34:52,821 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2012-11-13 10:34:52,821 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2012-11-13 10:34:52,821 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2012-11-13 10:34:52,821 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2012-11-13 10:34:55,434 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2012-11-13 10:34:56,455 ERROR solr.SolrIndexerJob - SolrIndexerJob: org.apache.solr.common.SolrException: Not Found
Not Found
request: http://localhost:8080/sol40/update
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:60)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:84)

On Tue, Nov 13, 2012 at 9:53 AM, Lewis John Mcgibbney <[email protected]> wrote:

> Hi,
>
> On Tue, Nov 13, 2012 at 2:36 PM, Erol Akarsu <[email protected]> wrote:
> > Lewis,
> >
> > I applied the patch you told me. I replaced schema.xml of sol4 installation
> > with schme-sol4.xml. Solr 4.0 system is up and running and I can see its
> > web page with http://localhost:8080/sol40.
>
> You would need to either rename schema-solr4.xml to schema, then copy
> this to your tomcat solr installation before starting/restarting the
> server or alternatively copy the contents of the newly patched file to
> the solr existing schema.xml
>
> >
> > I followed tutorial blindly. Crawling went fine but it seem very slow
> > compared to previous before patch applied
>
> Considering the patch only applies to the Solr indexing stage crawl
> performance should not be affected in the slightest. Especially when
> you are not passing the solr server URL during the crawl phase. Can I
> also suggest that you experiment with the crawl script (which
> accompanies the nutch script) instead of using the deprecated crawl
> command.
>
> perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
> your hadoop.log as well. I can confirm that I was able to get Nutch
> trunk working with a standalone Solr 4.0 multicore server with the
> patch applied just last week.
>
> As I said, Markus has also suggested some additions to the patch so
> maybe try catching some irregularities... trial and error.
>
> hth
>
> Lewis
>
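
The "Not Found" response for http://localhost:8080/sol40/update can be checked outside Nutch. Below is a minimal sketch, assuming Python 3 is available on the same machine. It is only a guess that the multicore setup mentioned above serves the update handler under a per-core path rather than directly under /sol40; the core name "collection1" and both helper functions are hypothetical placeholders, not something taken from this installation.

```python
import urllib.error
import urllib.request


def candidate_update_urls(base_url, core_names):
    """Return the bare update URL followed by one update URL per core name."""
    base = base_url.rstrip("/")
    return [base + "/update"] + [f"{base}/{core}/update" for core in core_names]


def probe(url, timeout=5.0):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status, e.g. 404 when nothing
        # is mapped at this path.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure and similar.
        return None


if __name__ == "__main__":
    # Assumption: "collection1" is a placeholder; substitute the core names
    # actually declared in this installation's solr.xml.
    for url in candidate_update_urls("http://localhost:8080/sol40", ["collection1"]):
        print(url, "->", probe(url))
```

If the bare URL reports 404 while a per-core URL reports some other status, that would suggest the Solr URL handed to the indexer needs the core name appended. It does not show whether the patched schema itself is accepted.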

