Hello Karl,

I have restarted a new one, please let me know if that helps.

Regards,
Mourad

On 13 Nov 2012, at 15:45, Erol Akarsu <[email protected]> wrote:

> Lewis,
>
> Thanks for looking at this. Solr has the newest patched schema and I
> restarted Tomcat.
>
> I set DEBUG for SolrIndexerJob in the log4j.properties file:
>
> log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=DEBUG,cmdstdout
>
>> Can I
>> also suggest that you experiment with the crawl script (which
>> accompanies the nutch script) instead of using the deprecated crawl
>> command.
>
> Where is this script? The bin folder has only the nutch script.
>
>> perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
>> your hadoop.log as well. I can confirm that I was able to get Nutch
>> trunk working with a standalone Solr 4.0 multicore server with the
>> patch applied just last week.
>
> I am using Nutch 2.1, not trunk. Does that make any difference to the
> behavior of the nutch script?
> Can you give me the main points, maybe a script of your full steps,
> on how you tested and got this working last week?
>
> I am getting this in hadoop.log:
>
> 2012-11-13 10:34:50,466 INFO solr.SolrIndexerJob - SolrIndexerJob: starting
> 2012-11-13 10:34:50,805 INFO plugin.PluginRepository - Plugins: looking in: /home/eakarsu/searchProject/apache-nutch-2.1/runtime/local/plugins
> 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Registered Plugins:
> 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
> 2012-11-13 10:34:50,867 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - HTTP Framework (lib-http)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Registered Extension-Points:
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Parse Filter (org.apache.nutch.parse.ParseFilter)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
> 2012-11-13 10:34:50,868 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2012-11-13 10:34:50,872 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> 2012-11-13 10:34:50,872 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2012-11-13 10:34:50,875 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2012-11-13 10:34:50,875 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2012-11-13 10:34:51,891 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2012-11-13 10:34:52,765 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: content dest: content
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: site dest: site
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: title dest: title
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: host dest: host
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: segment dest: segment
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: boost dest: boost
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: digest dest: digest
> 2012-11-13 10:34:52,818 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> 2012-11-13 10:34:52,821 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
> 2012-11-13 10:34:52,821 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2012-11-13 10:34:52,821 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> 2012-11-13 10:34:52,821 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2012-11-13 10:34:55,434 WARN mapred.FileOutputCommitter - Output path is null in cleanup
> 2012-11-13 10:34:56,455 ERROR solr.SolrIndexerJob - SolrIndexerJob: org.apache.solr.common.SolrException: Not Found
>
> Not Found
>
> request: http://localhost:8080/sol40/update
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
>     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>     at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
>     at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
>     at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:60)
>     at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:75)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:84)
>
>
> On Tue, Nov 13, 2012 at 9:53 AM, Lewis John Mcgibbney <[email protected]> wrote:
>
>> Hi,
>>
>> On Tue, Nov 13, 2012 at 2:36 PM, Erol Akarsu <[email protected]> wrote:
>>> Lewis,
>>>
>>> I applied the patch you told me about. I replaced the schema.xml of the
>>> Solr 4 installation with schema-solr4.xml. The Solr 4.0 system is up and
>>> running and I can see its web page at http://localhost:8080/sol40.
>>
>> You would need to either rename schema-solr4.xml to schema.xml, then copy
>> this to your tomcat solr installation before starting/restarting the
>> server, or alternatively copy the contents of the newly patched file to
>> the existing solr schema.xml.
>>
>>>
>>> I followed the tutorial blindly. Crawling went fine, but it seems very
>>> slow compared to before the patch was applied.
>>
>> Considering the patch only applies to the Solr indexing stage, crawl
>> performance should not be affected in the slightest, especially when
>> you are not passing the solr server URL during the crawl phase. Can I
>> also suggest that you experiment with the crawl script (which
>> accompanies the nutch script) instead of using the deprecated crawl
>> command.
>>
>> Perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
>> your hadoop.log as well. I can confirm that I was able to get Nutch
>> trunk working with a standalone Solr 4.0 multicore server with the
>> patch applied just last week.
>>
>> As I said, Markus has also suggested some additions to the patch, so
>> maybe try catching some irregularities... trial and error.
>>
>> hth
>>
>> Lewis
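
The stack trace above shows the commit failing with "Not Found" at http://localhost:8080/sol40/update. One possible explanation, and this is only a guess, is that in a multicore Solr setup the update handler sits under a core name rather than directly under the webapp root. The thread does not confirm this. A minimal Python sketch for checking which update URL actually answers might look like the following. The core name `collection1` in the usage note is an assumption (it is not mentioned anywhere in the thread), and both helper functions are hypothetical names made up for this illustration.

```python
import urllib.error
import urllib.request


def candidate_update_urls(base_url, core_names):
    """Return the bare update URL followed by one update URL per core name."""
    base = base_url.rstrip("/")
    urls = [base + "/update"]
    urls += [f"{base}/{core}/update" for core in core_names]
    return urls


def probe(url, timeout=5):
    """Return the HTTP status code for a GET on url, or None if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered with an error status, e.g. 404 as in the log.
        return err.code
    except (urllib.error.URLError, OSError):
        # No HTTP answer at all (connection refused, timeout, bad host).
        return None
```

Calling `probe(url)` on each entry of `candidate_update_urls("http://localhost:8080/sol40", ["collection1"])` (core name assumed) prints enough to tell the cases apart: a 404 means no handler is mapped at that path, and `None` means the server did not answer at all. If one of the per-core URLs responds with something other than 404, that base URL may be the one to pass to the Nutch solrindex step.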

