Hello Karl,

I have restarted a new one, please let me know if that helps.

Regards,

Mourad
On 13 Nov 2012, at 15:45, Erol Akarsu <[email protected]> wrote:

> Lewis,
> 
> Thanks for looking at this. Solr has the newest patched schema and I restarted
> Tomcat.
> 
> I set DEBUG for SolrIndexerJob in the log4j.properties file:
> 
> log4j.logger.org.apache.nutch.indexer.solr.SolrIndexerJob=DEBUG,cmdstdout
> 
>> Can I
>> also suggest that you experiment with the crawl script (which
>> accompanies the nutch script) instead of using the deprecated crawl
>> command.
> 
> Where is this script? The bin folder has only the nutch script.
> 
>> Perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
>> your hadoop.log as well. I can confirm that I was able to get Nutch
>> trunk working with a standalone Solr 4.0 multicore server with the
>> patch applied just last week.
> 
> I am using Nutch 2.1, not trunk. Does that make any difference to the
> behavior of the nutch script?
> Can you give me the main points, or maybe a script of your full steps,
> showing how you tested and got this working last week?
> 
> 
> I am getting this in hadoop.log:
> 
> 2012-11-13 10:34:50,466 INFO  solr.SolrIndexerJob - SolrIndexerJob: starting
> 2012-11-13 10:34:50,805 INFO  plugin.PluginRepository - Plugins: looking
> in: /home/eakarsu/searchProject/apache-nutch-2.1/runtime/local/plugins
> 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository - Plugin
> Auto-activation mode: [true]
> 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository - Registered Plugins:
> 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository -     the nutch core
> extension points (nutch-extensionpoints)
> 2012-11-13 10:34:50,867 INFO  plugin.PluginRepository -     Basic URL
> Normalizer (urlnormalizer-basic)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Basic Indexing
> Filter (index-basic)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Html Parse
> Plug-in (parse-html)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     HTTP Framework
> (lib-http)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Pass-through
> URL Normalizer (urlnormalizer-pass)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Regex URL
> Filter (urlfilter-regex)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Http Protocol
> Plug-in (protocol-http)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Regex URL
> Normalizer (urlnormalizer-regex)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Tika Parser
> Plug-in (parse-tika)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     OPIC Scoring
> Plug-in (scoring-opic)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     CyberNeko HTML
> Parser (lib-nekohtml)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Anchor Indexing
> Filter (index-anchor)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Regex URL
> Filter Framework (lib-regex-filter)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository - Registered
> Extension-Points:
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch URL
> Normalizer (org.apache.nutch.net.URLNormalizer)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Protocol
> (org.apache.nutch.protocol.Protocol)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Parse Filter
> (org.apache.nutch.parse.ParseFilter)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch URL
> Filter (org.apache.nutch.net.URLFilter)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Indexing
> Filter (org.apache.nutch.indexer.IndexingFilter)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Content
> Parser (org.apache.nutch.parse.Parser)
> 2012-11-13 10:34:50,868 INFO  plugin.PluginRepository -     Nutch Scoring
> (org.apache.nutch.scoring.ScoringFilter)
> 2012-11-13 10:34:50,872 INFO  basic.BasicIndexingFilter - Maximum title
> length for indexing set to: 100
> 2012-11-13 10:34:50,872 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2012-11-13 10:34:50,875 INFO  anchor.AnchorIndexingFilter - Anchor
> deduplication is: off
> 2012-11-13 10:34:50,875 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2012-11-13 10:34:51,891 WARN  util.NativeCodeLoader - Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> 2012-11-13 10:34:52,765 INFO  mapreduce.GoraRecordReader -
> gora.buffer.read.limit = 10000
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: content
> dest: content
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: site dest:
> site
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: title dest:
> title
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: host dest:
> host
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: segment
> dest: segment
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: boost dest:
> boost
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: digest dest:
> digest
> 2012-11-13 10:34:52,818 INFO  solr.SolrMappingReader - source: tstamp dest:
> tstamp
> 2012-11-13 10:34:52,821 INFO  basic.BasicIndexingFilter - Maximum title
> length for indexing set to: 100
> 2012-11-13 10:34:52,821 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.basic.BasicIndexingFilter
> 2012-11-13 10:34:52,821 INFO  anchor.AnchorIndexingFilter - Anchor
> deduplication is: off
> 2012-11-13 10:34:52,821 INFO  indexer.IndexingFilters - Adding
> org.apache.nutch.indexer.anchor.AnchorIndexingFilter
> 2012-11-13 10:34:55,434 WARN  mapred.FileOutputCommitter - Output path is
> null in cleanup
> 2012-11-13 10:34:56,455 ERROR solr.SolrIndexerJob - SolrIndexerJob:
> org.apache.solr.common.SolrException: Not Found
> 
> Not Found
> 
> request: http://localhost:8080/sol40/update
>    at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
>    at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>    at
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)
>    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:75)
>    at
> org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:60)
>    at
> org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:75)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>    at
> org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:84)
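
A quick way to narrow down the "Not Found" above is to probe the update URL directly. The sketch below is only a starting point: the helper names `solr_update_url` and `http_status` are made up for illustration, and the idea that a multicore setup may expect a core name in the path (for example `collection1`) is an assumption, not something confirmed in this thread.

```shell
# Sketch only: helper names are hypothetical, not part of Nutch or Solr.

# Build the update handler URL from a Solr base URL and an optional core name.
solr_update_url() {
  local base="${1%/}"    # drop one trailing slash, if present
  local core="${2:-}"    # optional core name (assumption: multicore layout)
  if [ -n "$core" ]; then
    printf '%s/%s/update\n' "$base" "$core"
  else
    printf '%s/update\n' "$base"
  fi
}

# Print the HTTP status code for a URL (needs curl and a running server).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For example, `http_status "$(solr_update_url http://localhost:8080/sol40)"` can be compared with `http_status "$(solr_update_url http://localhost:8080/sol40 collection1)"`. A 404 from one and not the other would suggest which path the indexer should be pointed at.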
> 
> 
> On Tue, Nov 13, 2012 at 9:53 AM, Lewis John Mcgibbney <
> [email protected]> wrote:
> 
>> Hi,
>> 
>> On Tue, Nov 13, 2012 at 2:36 PM, Erol Akarsu <[email protected]> wrote:
>>> Lewis,
>>> 
>>> I applied the patch you told me about. I replaced the schema.xml of the
>>> Solr 4 installation with schema-solr4.xml. Solr 4.0 is up and running and
>>> I can see its web page at http://localhost:8080/sol40.
>> 
>> You would need to either rename schema-solr4.xml to schema.xml, then copy
>> it to your Tomcat Solr installation before starting/restarting the
>> server, or alternatively copy the contents of the newly patched file into
>> the existing Solr schema.xml.
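
The rename-and-copy step described above could be scripted roughly as follows. This is a minimal sketch: `install_schema` is a hypothetical helper, and both arguments are placeholders for paths that depend on the local Nutch and Solr layout.

```shell
# Sketch only: install_schema is a hypothetical helper, paths are placeholders.

# Install the patched Solr 4 schema as schema.xml, keeping a backup of the old one.
install_schema() {
  local patched_schema="$1"   # path to the patched schema-solr4.xml
  local solr_conf_dir="$2"    # conf directory of the Solr installation or core
  local target="$solr_conf_dir/schema.xml"

  [ -f "$patched_schema" ] || { echo "missing: $patched_schema" >&2; return 1; }
  [ -d "$solr_conf_dir" ] || { echo "missing: $solr_conf_dir" >&2; return 1; }

  # Keep the previous schema so the change can be reverted.
  if [ -f "$target" ]; then
    cp -p "$target" "$target.bak" || return 1
  fi
  cp "$patched_schema" "$target"
}
```

The server still has to be started or restarted afterwards so that Solr picks up the new schema.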
>> 
>>> 
>>> I followed the tutorial blindly. Crawling went fine, but it seems very slow
>>> compared to before the patch was applied.
>> 
>> Considering the patch only applies to the Solr indexing stage, crawl
>> performance should not be affected in the slightest, especially when
>> you are not passing the Solr server URL during the crawl phase. Can I
>> also suggest that you experiment with the crawl script (which
>> accompanies the nutch script) instead of using the deprecated crawl
>> command.
>> 
>> Perhaps attempt to set the SolrIndexerJob logging to DEBUG and review
>> your hadoop.log as well. I can confirm that I was able to get Nutch
>> trunk working with a standalone Solr 4.0 multicore server with the
>> patch applied just last week.
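
When reviewing hadoop.log as suggested above, it can help to filter out the INFO noise first. A minimal sketch, assuming the log line layout shown elsewhere in this thread (date, time, level) and a default path of `logs/hadoop.log` relative to the Nutch runtime directory; `log_problems` is a hypothetical helper name.

```shell
# Sketch only: log_problems is a hypothetical helper, default path is an assumption.

# Print only the WARN and ERROR entries from a Nutch hadoop.log.
log_problems() {
  local log_file="${1:-logs/hadoop.log}"
  # Match lines such as "2012-11-13 10:34:56,455 ERROR solr.SolrIndexerJob - ..."
  grep -E '^[0-9-]+ [0-9:,]+ (WARN|ERROR) ' "$log_file"
}
```

Note that this prints only the first line of each entry, so stack traces that follow an ERROR line still need to be read in the full log.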
>> 
>> As I said, Markus has also suggested some additions to the patch so
>> maybe try catching some irregularities... trial and error.
>> 
>> hth
>> 
>> Lewis
>> 
