Hi,

The following is part of my hadoop.log. As I am new to Nutch and Solr, these lines go over my head.

2014-07-19 10:43:45,314 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: content dest: content
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: title dest: title
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: host dest: host
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: batchId dest: batchId
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: boost dest: boost
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: digest dest: digest
2014-07-19 10:43:45,341 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-07-19 10:43:45,343 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2014-07-19 10:43:45,343 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2014-07-19 10:43:45,343 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-07-19 10:43:45,343 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2014-07-19 10:43:45,393 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'crawl_GF_webpage' , assuming they are the same.
2014-07-19 10:43:45,442 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-07-19 10:43:46,161 INFO solr.SolrIndexerJob - SolrIndexerJob: done.
2014-07-19 10:43:47,405 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: starting...
2014-07-19 10:43:47,405 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: Solr url: http://localhost:8983/solr/
2014-07-19 10:43:47,749 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-07-19 10:43:47,761 WARN mapred.JobClient - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-07-19 10:43:48,692 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-07-19 10:43:49,323 INFO solr.SolrDeleteDuplicates - SolrDeleteDuplicates: done.

I have also attached the complete hadoop.log file.

Regards,
Ankur Dulwani

On Saturday, 19 July 2014 10:07 AM, remi tassing [via Lucene] <[email protected]> wrote:

Can you check the log file for more info?
default location: $NUTCH_HOME/logs/hadoop.log
Ref: http://www.opensourceconnections.com/blog/2014/05/24/crawling-with-nutch/

On Fri, Jul 18, 2014 at 8:52 PM, Ankur Dulwani <[hidden email]> wrote:
> Hi,
> I am using Nutch to crawl data from different sources. It works for
> almost all websites, but it gives an empty result for some sites, such as
> https://www.google.com/finance.
>
> Fetcher: throughput threshold sequence: 5
> 0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs
> in 0 queues
>
> This is what I get after crawling.
>
> Do I need to add any configuration or properties?
>
> Thanks in advance.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-returns-empty-result-set-for-some-websites-tp4147874.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

hadoop.log (124K) <http://lucene.472066.n3.nabble.com/attachment/4148018/0/hadoop.log>

--
View this message in context: http://lucene.472066.n3.nabble.com/Nutch-returns-empty-result-set-for-some-websites-tp4147874p4148018.html
Sent from the Nutch - User mailing list archive at Nabble.com.
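
For anyone following the advice in this thread to check $NUTCH_HOME/logs/hadoop.log, a small script can make a long log easier to read by showing only the WARN and ERROR entries. The following is a rough sketch in Python, not part of Nutch. It assumes each entry has the "timestamp LEVEL logger - message" shape seen in the excerpt in this message; the function name summarize and the SAMPLE list are made up for illustration.

```python
import re
from collections import Counter

# Assumed entry shape, based on the excerpt in this thread:
#   2014-07-19 10:43:45,442 WARN mapred.FileOutputCommitter - Output path is null in cleanup
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)$"
)


def summarize(lines, levels=("WARN", "ERROR", "FATAL")):
    """Count entries per level and collect those whose level is in `levels`.

    Hypothetical helper: lines that do not match the assumed shape
    (for example stack-trace continuation lines) are skipped.
    """
    counts = Counter()
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m is None:
            continue
        counts[m["level"]] += 1
        if m["level"] in levels:
            hits.append((m["ts"], m["level"], m["logger"], m["msg"]))
    return counts, hits


# Three entries copied from the log excerpt in this message.
SAMPLE = [
    "2014-07-19 10:43:45,442 WARN mapred.FileOutputCommitter - Output path is null in cleanup",
    "2014-07-19 10:43:46,161 INFO solr.SolrIndexerJob - SolrIndexerJob: done.",
    "2014-07-19 10:43:47,749 WARN util.NativeCodeLoader - Unable to load native-hadoop "
    "library for your platform... using builtin-java classes where applicable",
]

counts, hits = summarize(SAMPLE)
print(dict(counts))
for ts, level, logger, msg in hits:
    print(ts, level, logger, "-", msg)
```

To run it on a real log, pass an open file instead of SAMPLE, for example summarize(open(path)) with path pointing at your own hadoop.log.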

