Hi, I'm new to Nutch. I have crawled several sites using Nutch and it works, with a few websites as exceptions. I've looked through hadoop.log but can't find any suspicious errors for the sites that fail to crawl. For those sites no documents are added, and the console never shows a line like this one, which appears for every successful crawl:
2014-05-15 00:46:32,669 INFO solr.SolrWriter - Adding 5 documents

I assume the problem is in the generating or feeding step? Here is my log from an attempt to crawl http://www.okezone.com in DEBUG mode:

2014-05-15 12:18:16,110 INFO crawl.InjectorJob - InjectorJob: starting at 2014-05-15 12:18:16
2014-05-15 12:18:16,110 INFO crawl.InjectorJob - InjectorJob: Injecting urlDir: urls/okezone.com.txt
2014-05-15 12:18:17,464 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:17,528 INFO crawl.InjectorJob - InjectorJob: Using class org.apache.gora.hbase.store.HBaseStore as the Gora storage class.
2014-05-15 12:18:17,564 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-05-15 12:18:17,628 WARN snappy.LoadSnappy - Snappy native library not loaded
2014-05-15 12:18:18,138 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:18,145 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2014-05-15 12:18:18,217 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local1586545423_0001.xml, instantiating a new object cache
2014-05-15 12:18:18,267 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2014-05-15 12:18:18,434 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-15 12:18:18,916 INFO crawl.InjectorJob - InjectorJob: total number of urls rejected by filters: 0
2014-05-15 12:18:18,916 INFO crawl.InjectorJob - InjectorJob: total number of urls injected after normalization and filtering: 1
2014-05-15 12:18:18,917 INFO crawl.InjectorJob - Injector: finished at 2014-05-15 12:18:18, elapsed: 00:00:02
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: starting at 2014-05-15 12:18:19
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: Selecting best-scoring urls due for fetch.
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: starting
2014-05-15 12:18:19,914 INFO crawl.GeneratorJob - GeneratorJob: filtering: false
2014-05-15 12:18:19,915 INFO crawl.GeneratorJob - GeneratorJob: normalizing: false
2014-05-15 12:18:19,915 INFO crawl.GeneratorJob - GeneratorJob: topN: 50000
2014-05-15 12:18:20,261 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2014-05-15 12:18:20,261 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:20,262 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-05-15 12:18:20,262 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-05-15 12:18:21,231 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:21,445 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:21,500 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-05-15 12:18:21,632 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,210 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,346 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,408 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,423 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2014-05-15 12:18:22,583 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local445951503_0001.xml, instantiating a new object cache
2014-05-15 12:18:22,610 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local445951503_0001.xml, instantiating a new object cache
2014-05-15 12:18:22,610 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:22,610 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-05-15 12:18:22,610 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-05-15 12:18:22,882 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:22,892 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2014-05-15 12:18:23,015 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-15 12:18:23,043 INFO crawl.GeneratorJob - GeneratorJob: finished at 2014-05-15 12:18:23, time elapsed: 00:00:03
2014-05-15 12:18:23,044 INFO crawl.GeneratorJob - GeneratorJob: generated batch id: 1400131099-4513
2014-05-15 12:18:24,067 INFO fetcher.FetcherJob - FetcherJob: starting
2014-05-15 12:18:24,068 INFO fetcher.FetcherJob - FetcherJob: batchId: 1400131099-4513
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob: threads: 50
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob: parsing: false
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob: resuming: false
2014-05-15 12:18:24,071 INFO fetcher.FetcherJob - FetcherJob : timelimit set for : 1400141904071
2014-05-15 12:18:24,919 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2014-05-15 12:18:24,919 INFO http.Http - http.proxy.host = null
2014-05-15 12:18:24,919 INFO http.Http - http.proxy.port = 8080
2014-05-15 12:18:24,919 INFO http.Http - http.timeout = 2147483640
2014-05-15 12:18:24,919 INFO http.Http - http.content.limit = 999999999
2014-05-15 12:18:24,919 INFO http.Http - http.agent = My Nutch Spider/Nutch-2.2.1
2014-05-15 12:18:24,919 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-15 12:18:24,919 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-15 12:18:25,672 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:25,869 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:25,927 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-05-15 12:18:26,066 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,641 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,781 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,852 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:26,868 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2014-05-15 12:18:26,945 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local270323155_0001.xml, instantiating a new object cache
2014-05-15 12:18:27,242 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:27,258 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2014-05-15 12:18:27,261 INFO fetcher.FetcherJob - Using queue mode : byHost
2014-05-15 12:18:27,261 INFO fetcher.FetcherJob - Fetcher: threads: 50
2014-05-15 12:18:27,282 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local270323155_0001.xml, instantiating a new object cache
2014-05-15 12:18:27,284 INFO fetcher.FetcherJob - QueueFeeder finished: total 1 records. Hit by time limit :0
2014-05-15 12:18:27,291 INFO fetcher.FetcherJob - -finishing thread FetcherThread1, activeThreads=1
2014-05-15 12:18:27,291 INFO fetcher.FetcherJob - -finishing thread FetcherThread2, activeThreads=1
2014-05-15 12:18:27,292 INFO fetcher.FetcherJob - -finishing thread FetcherThread4, activeThreads=1
2014-05-15 12:18:27,292 INFO fetcher.FetcherJob - fetching http://www.okezone.com/ (queue crawl delay=5000ms)
2014-05-15 12:18:27,292 INFO fetcher.FetcherJob - -finishing thread FetcherThread3, activeThreads=1
2014-05-15 12:18:27,293 INFO http.Http - http.proxy.host = null
2014-05-15 12:18:27,293 INFO http.Http - http.proxy.port = 8080
2014-05-15 12:18:27,293 INFO fetcher.FetcherJob - -finishing thread FetcherThread6, activeThreads=2
2014-05-15 12:18:27,293 INFO fetcher.FetcherJob - -finishing thread FetcherThread5, activeThreads=1
2014-05-15 12:18:27,293 INFO http.Http - http.timeout = 2147483640
2014-05-15 12:18:27,293 INFO http.Http - http.content.limit = 999999999
2014-05-15 12:18:27,293 INFO http.Http - http.agent = My Nutch Spider/Nutch-2.2.1
2014-05-15 12:18:27,293 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-15 12:18:27,293 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-15 12:18:27,293 INFO fetcher.FetcherJob - -finishing thread FetcherThread8, activeThreads=1
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread FetcherThread9, activeThreads=1
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread FetcherThread10, activeThreads=1
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread FetcherThread11, activeThreads=2
2014-05-15 12:18:27,294 INFO fetcher.FetcherJob - -finishing thread FetcherThread12, activeThreads=1
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread FetcherThread13, activeThreads=5
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread FetcherThread14, activeThreads=3
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread FetcherThread15, activeThreads=4
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread FetcherThread16, activeThreads=1
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread FetcherThread7, activeThreads=2
2014-05-15 12:18:27,306 INFO fetcher.FetcherJob - -finishing thread FetcherThread17, activeThreads=1
2014-05-15 12:18:27,307 INFO fetcher.FetcherJob - -finishing thread FetcherThread19, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread FetcherThread21, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread FetcherThread18, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread FetcherThread20, activeThreads=1
2014-05-15 12:18:27,308 INFO fetcher.FetcherJob - -finishing thread FetcherThread23, activeThreads=1
2014-05-15 12:18:27,309 INFO fetcher.FetcherJob - -finishing thread FetcherThread24, activeThreads=1
2014-05-15 12:18:27,309 INFO fetcher.FetcherJob - -finishing thread FetcherThread25, activeThreads=1
2014-05-15 12:18:27,309 INFO fetcher.FetcherJob - -finishing thread FetcherThread26, activeThreads=1
2014-05-15 12:18:27,310 INFO fetcher.FetcherJob - -finishing thread FetcherThread27, activeThreads=1
2014-05-15 12:18:27,310 INFO fetcher.FetcherJob - -finishing thread FetcherThread28, activeThreads=1
2014-05-15 12:18:27,310 INFO fetcher.FetcherJob - -finishing thread FetcherThread29, activeThreads=1
2014-05-15 12:18:27,311 INFO fetcher.FetcherJob - -finishing thread FetcherThread30, activeThreads=1
2014-05-15 12:18:27,311 INFO fetcher.FetcherJob - -finishing thread FetcherThread31, activeThreads=1
2014-05-15 12:18:27,311 INFO fetcher.FetcherJob - -finishing thread FetcherThread32, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread FetcherThread33, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread FetcherThread22, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread FetcherThread34, activeThreads=1
2014-05-15 12:18:27,312 INFO fetcher.FetcherJob - -finishing thread FetcherThread35, activeThreads=1
2014-05-15 12:18:27,313 INFO fetcher.FetcherJob - -finishing thread FetcherThread36, activeThreads=1
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread FetcherThread38, activeThreads=1
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread FetcherThread39, activeThreads=2
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread FetcherThread37, activeThreads=1
2014-05-15 12:18:27,317 INFO fetcher.FetcherJob - -finishing thread FetcherThread40, activeThreads=2
2014-05-15 12:18:27,318 INFO fetcher.FetcherJob - -finishing thread FetcherThread42, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread FetcherThread43, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread FetcherThread44, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread FetcherThread41, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread FetcherThread45, activeThreads=1
2014-05-15 12:18:27,319 INFO fetcher.FetcherJob - -finishing thread FetcherThread46, activeThreads=1
2014-05-15 12:18:27,320 INFO fetcher.FetcherJob - -finishing thread FetcherThread47, activeThreads=1
2014-05-15 12:18:27,320 INFO fetcher.FetcherJob - Fetcher: throughput threshold: -1
2014-05-15 12:18:27,320 INFO fetcher.FetcherJob - Fetcher: throughput threshold sequence: 5
2014-05-15 12:18:27,321 INFO fetcher.FetcherJob - -finishing thread FetcherThread48, activeThreads=1
2014-05-15 12:18:27,321 INFO fetcher.FetcherJob - -finishing thread FetcherThread49, activeThreads=1
2014-05-15 12:18:27,448 INFO fetcher.FetcherJob - rules.isAllowed(fit.u.toString()):true
2014-05-15 12:18:28,121 INFO fetcher.FetcherJob - -finishing thread FetcherThread0, activeThreads=0
2014-05-15 12:18:32,321 INFO fetcher.FetcherJob - 0/0 spinwaiting/active, 1 pages, 0 errors, 0.2 0 pages/s, 209 209 kb/s, 0 URLs in 0 queues
2014-05-15 12:18:32,321 INFO fetcher.FetcherJob - -activeThreads=0
2014-05-15 12:18:32,330 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-15 12:18:32,489 INFO fetcher.FetcherJob - FetcherJob: done
2014-05-15 12:18:33,550 INFO parse.ParserJob - ParserJob: starting
2014-05-15 12:18:33,552 INFO parse.ParserJob - ParserJob: resuming: false
2014-05-15 12:18:33,552 INFO parse.ParserJob - ParserJob: forced reparse: false
2014-05-15 12:18:33,552 INFO parse.ParserJob - ParserJob: batchId: 1400131099-4513
2014-05-15 12:18:33,860 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2014-05-15 12:18:34,897 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-05-15 12:18:35,602 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:35,888 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:35,959 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-05-15 12:18:36,096 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,632 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,751 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,820 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:36,836 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2014-05-15 12:18:37,137 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:37,147 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2014-05-15 12:18:37,149 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local295157817_0001.xml, instantiating a new object cache
2014-05-15 12:18:37,152 INFO crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2014-05-15 12:18:37,290 INFO parse.ParserJob - Parsing http://www.okezone.com/
2014-05-15 12:18:37,292 DEBUG parse.ParseUtil - Parsing [ http://www.okezone.com/] with [org.apache.nutch.parse.html.HtmlParser@8d5fa]
2014-05-15 12:18:37,905 INFO regex.RegexURLNormalizer - can't find rules for scope 'fetcher', using default
2014-05-15 12:18:38,141 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-15 12:18:38,475 INFO parse.ParserJob - ParserJob: success
2014-05-15 12:18:39,534 INFO crawl.DbUpdaterJob - DbUpdaterJob: starting
2014-05-15 12:18:39,896 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2014-05-15 12:18:41,090 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:41,325 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:41,390 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-05-15 12:18:41,526 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,115 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,269 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,343 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,362 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000
2014-05-15 12:18:42,454 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local873829540_0001.xml, instantiating a new object cache
2014-05-15 12:18:42,678 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:42,687 INFO mapreduce.GoraRecordWriter - gora.buffer.write.limit = 10000
2014-05-15 12:18:42,691 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, file:/tmp/hadoop-irfan/mapred/local/localRunner/job_local873829540_0001.xml, instantiating a new object cache
2014-05-15 12:18:42,691 INFO crawl.FetchScheduleFactory - Using FetchSchedule impl: org.apache.nutch.crawl.DefaultFetchSchedule
2014-05-15 12:18:42,692 INFO crawl.AbstractFetchSchedule - defaultInterval=2592000
2014-05-15 12:18:42,692 INFO crawl.AbstractFetchSchedule - maxInterval=7776000
2014-05-15 12:18:42,849 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2014-05-15 12:18:42,930 INFO crawl.DbUpdaterJob - DbUpdaterJob: done
2014-05-15 12:18:43,976 INFO solr.SolrIndexerJob - SolrIndexerJob: starting
2014-05-15 12:18:44,350 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2014-05-15 12:18:44,454 INFO basic.BasicIndexingFilter - Maximum title length for indexing set to: 100
2014-05-15 12:18:44,454 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.basic.BasicIndexingFilter
2014-05-15 12:18:44,457 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-05-15 12:18:44,457 INFO indexer.IndexingFilters - Adding org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2014-05-15 12:18:45,546 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:45,689 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-05-15 12:18:45,827 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,454 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,584 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,657 INFO store.HBaseStore - Keyclass and nameclass match but mismatching table names mappingfile schema is 'webpage' vs actual schema 'okezone.com_webpage' , assuming they are the same.
2014-05-15 12:18:46,667 INFO mapreduce.GoraRecordReader - gora.buffer.read.limit = 10000

Does anyone have any suggestions on how to solve this problem?

Thanks.
Irfan R.

