Hi all,

I recently configured Nutch with Gora on my Cassandra DB. My colleague referred me to the link below, which is awesome:
http://sujitpal.blogspot.in/2012/01/exploring-nutch-gora-with-cassandra.html

I followed the steps in the blog as is. The problem I am having is that the first time everything goes well (inject, generate, fetch, and parse), but when I iterate, nutch fetch does not fetch any data. As a result, my Solr index only has 10 records (from the first successful run) and is not picking up the data from the subsequent runs.

Results from my nutch fetch -

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855266-1107256220
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329855266-1107256220
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records.
Hit by time limit :0
-finishing thread FetcherThread0, activeThreads=0
-finishing thread FetcherThread1, activeThreads=0
-finishing thread FetcherThread2, activeThreads=0
-finishing thread FetcherThread3, activeThreads=0
-finishing thread FetcherThread4, activeThreads=0
-finishing thread FetcherThread5, activeThreads=0
-finishing thread FetcherThread6, activeThreads=0
-finishing thread FetcherThread7, activeThreads=0
-finishing thread FetcherThread8, activeThreads=0
-finishing thread FetcherThread9, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done

*************************************

vs the author of the above blog -

sujit@cyclone:local$ bin/nutch fetch 1325709400-776802111
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1325709400-776802111
Using queue mode : byHost
Fetcher: threads: 10
/*fetching http://www.parathyroid.com/parathyroid.htm
QueueFeeder finished: total 47 records.
Hit by time limit :0
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=46
fetching http://www.parathyroid.com/Parathyroid-Surgeon.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=45
fetching http://www.parathyroid.com/paratiroide/index.html
fetching http://www.parathyroid.com/diagnosis.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=43
fetching http://www.parathyroid.com/parathyroid-adenoma.htm
fetching http://www.parathyroid.com/age.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=41
fetching http://www.parathyroid.com/FHH.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=40
fetching http://www.parathyroid.com/treatment-surgery.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=39
fetching http://www.parathyroid.com/who's_eligible.htm
fetching http://www.parathyroid.com/parathyroid-disease.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=37
fetching http://www.parathyroid.com/FAQ.htm
fetching http://www.parathyroid.com/finding-parathyroid.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=35
fetching http://www.parathyroid.com/hyperparathyroidism-diagnosis.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=34
fetching http://www.parathyroid.com/index.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=33
fetching http://www.parathyroid.com/parathyroid-pictures.htm
fetching http://www.parathyroid.com/Parathyroid-Surgeon-Map.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=31
fetching http://www.parathyroid.com/mini-surgery.htm
fetching http://www.parathyroid.com/about-Parathyroid.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=29
fetching http://www.parathyroid.com/disclaimer.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=28
fetching http://www.parathyroid.com/parathyroid-function.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=27
fetching http://www.parathyroid.com/paratiroide
fetching http://www.parathyroid.com/low-vitamin-d.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=25
fetching http://www.parathyroid.com/parathyroid-symptoms-cartoon.htm
fetching http://www.parathyroid.com/sestamibi.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=23
fetching http://www.parathyroid.com/osteoporosis.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=22
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=22
fetching http://www.parathyroid.com/surgery_cure_rates.htm
fetching http://www.parathyroid.com/low-calcium.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=20
fetching http://www.parathyroid.com/Sensipar-high-calcium.htm
fetching http://www.parathyroid.com/Dr.Norman.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=18
fetching http://www.parathyroid.com/parathyroid-anatomy.htm
fetching http://www.parathyroid.com/parathyroid-surgery.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=16
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=16
fetching http://www.parathyroid.com/hypoparathyroidism.htm
fetching http://www.parathyroid.com/endocrinology.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=14
fetching http://www.parathyroid.com/parathyroid-cancer.htm
fetching http://www.parathyroid.com/testimonials.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=12
fetching http://www.parathyroid.com/hyperparathyroidism-videos.htm
fetching http://www.parathyroid.com/high-calcium.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=10
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=10
fetching http://www.parathyroid.com/osteoporosis2.htm
fetching http://www.parathyroid.com/MEN-Syndrome.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=8
fetching http://www.parathyroid.com/causes.htm
fetching http://www.parathyroid.com/MIRP-Surgery.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=6
fetching http://www.parathyroid.com/Re-Operation.htm
fetching http://www.parathyroid.com/pregnancy.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=4
* queue: http://www.parathyroid.com*/

I am thinking that a depth needs to be specified somewhere? If yes, where? I followed all the steps in the blog, and I don't see a single error in my log file.

My seed list directory is in /home/andrew/nutch/ -

andrew@andrew-ubuntu:pwd
/home/andrew/nutch/
andrew@andrew-ubuntu:~/nutch$ ls -ltr
total 20
drwxrwxr-x 5 pooja pooja 4096 2012-02-19 19:38 workspace
drwxrwxr-x 3 pooja pooja 4096 2012-02-19 21:23 install
drwxrwxr-x 13 pooja pooja 4096 2012-02-20 08:06 gora
drwxrwxr-x 9 pooja pooja 4096 2012-02-20 09:21 branch
drwxrwxr-x 2 pooja pooja 4096 2012-02-21 12:05 web_seeds
andrew@andrew-ubuntu:~/nutch$ cd web_seeds/
andrew@andrew-ubuntu:~/nutch/web_seeds$ ls -ltr
total 4
-rwxr-xr-x 1 andrew andrew 19 2012-02-21 11:03 nutch.txt
andrew@andrew-ubuntu:~/nutch/web_seeds$ cat *
http://www.cnn.com

For your reference, I have also pasted below the output of nutch inject, generate, fetch, parse, and updatedb from my first run.

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch inject /home/andrew/nutch/web_seeds
InjectorJob: starting
InjectorJob: urlDir: /home/andrew/nutch/web_seeds
InjectorJob: finished
andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch generate
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: done
GeneratorJob: generated batch id: 1329855121-1496717092
andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855121-1496717092
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329855121-1496717092
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records.
Hit by time limit :0
fetching http://www.cnn.com/
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread2, activeThreads=1
-finishing thread FetcherThread4, activeThreads=1
-finishing thread FetcherThread5, activeThreads=1
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done
andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch parse 1329855121-1496717092
ParserJob: starting
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: batchId: 1329855121-1496717092
Parsing http://www.cnn.com/
ParserJob: success
andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch updatedb
DbUpdaterJob: starting
DbUpdaterJob: done

--
View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p3764751.html
Sent from the Nutch - User mailing list archive at Nabble.com.
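
For anyone reproducing the iteration described in this post, here is a minimal sketch of one way to script the generate / fetch / parse / updatedb rounds, with the number of rounds standing in for "depth". It is an illustration under assumptions, not a fix for the problem: the function names (extract_batch_id, crawl_round, crawl) and the NUTCH variable are hypothetical, and it assumes that generate prints a "generated batch id:" line in the same format as the log in this post and that each bin/nutch step exits non-zero on failure.

```shell
#!/bin/sh
# Hypothetical helper: read `bin/nutch generate` output on stdin and print
# the batch id from the "generated batch id:" line (format assumed from the
# log in this post). Prints nothing if no such line is present.
extract_batch_id() {
  sed -n 's/^.*generated batch id: *\([0-9][0-9]*-[0-9][0-9]*\).*$/\1/p' | tail -n 1
}

# One round: generate a batch, then fetch, parse, and update the db for it.
# NUTCH is an assumed variable holding the command to run, e.g. bin/nutch.
crawl_round() {
  batch_id=$("$NUTCH" generate 2>&1 | extract_batch_id)
  if [ -z "$batch_id" ]; then
    echo "no batch id found in generate output" >&2
    return 1
  fi
  "$NUTCH" fetch "$batch_id" &&
    "$NUTCH" parse "$batch_id" &&
    "$NUTCH" updatedb
}

# crawl DEPTH: run DEPTH rounds, stopping at the first round that fails.
crawl() {
  depth=$1
  round=1
  while [ "$round" -le "$depth" ]; do
    crawl_round || return 1
    round=$((round + 1))
  done
}
```

Usage would look something like `NUTCH=bin/nutch; crawl 3` from the runtime/local directory, after a one-time `bin/nutch inject` of the seed directory. Passing the batch id from each generate into the matching fetch and parse keeps every round working on the batch that was just generated.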

