Hi apachenutch, I am the author of the blog post...thanks for the kind words...
Did you miss the updatedb by any chance? This takes the outlinks from the parsed pages and adds them back to the fetch list so generate can then make these available for fetching...

So
initial cycle: inject, generate, fetch, parse, updatedb
next cycle: generate, fetch, parse, updatedb
...
finally: solrindex

-sujit

On Feb 21, 2012, at 12:32 PM, apachenutch wrote:

> Hi all,
>
> I recently configured nutch-GORA on my cassandra DB. My colleague referred
> me to the below link, which is awesome.
> http://sujitpal.blogspot.in/2012/01/exploring-nutch-gora-with-cassandra.html
>
> I followed the steps in the blog as is. The problem I am having is, the
> first time, everything goes well - inject, generate, fetch, and parse. But
> when I iterate, nutch fetch does not fetch the data. As a result, my solr
> index only has 10 records (from the first successful run), and is not
> picking the data from the subsequent runs.
>
> Results from my nutch fetch -
>
> andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855266-1107256220
> FetcherJob: starting
> FetcherJob : timelimit set for : -1
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob: batchId: 1329855266-1107256220
> Using queue mode : byHost
> Fetcher: threads: 10
> QueueFeeder finished: total 0 records. Hit by time limit :0
> -finishing thread FetcherThread0, activeThreads=0
> -finishing thread FetcherThread1, activeThreads=0
> -finishing thread FetcherThread2, activeThreads=0
> -finishing thread FetcherThread3, activeThreads=0
> -finishing thread FetcherThread4, activeThreads=0
> -finishing thread FetcherThread5, activeThreads=0
> -finishing thread FetcherThread6, activeThreads=0
> -finishing thread FetcherThread7, activeThreads=0
> -finishing thread FetcherThread8, activeThreads=0
> -finishing thread FetcherThread9, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
> -activeThreads=0
> FetcherJob: done
>
> *************************************
> vs the author of the above blog -
>
> sujit@cyclone:local$ bin/nutch fetch 1325709400-776802111
> FetcherJob: starting
> FetcherJob : timelimit set for : -1
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob: batchId: 1325709400-776802111
> Using queue mode : byHost
> Fetcher: threads: 10
> /*fetching http://www.parathyroid.com/parathyroid.htm
> QueueFeeder finished: total 47 records. Hit by time limit :0
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=46
> fetching http://www.parathyroid.com/Parathyroid-Surgeon.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=45
> fetching http://www.parathyroid.com/paratiroide/index.html
> fetching http://www.parathyroid.com/diagnosis.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=43
> fetching http://www.parathyroid.com/parathyroid-adenoma.htm
> fetching http://www.parathyroid.com/age.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=41
> fetching http://www.parathyroid.com/FHH.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=40
> fetching http://www.parathyroid.com/treatment-surgery.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=39
> fetching http://www.parathyroid.com/who's_eligible.htm
> fetching http://www.parathyroid.com/parathyroid-disease.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=37
> fetching http://www.parathyroid.com/FAQ.htm
> fetching http://www.parathyroid.com/finding-parathyroid.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=35
> fetching http://www.parathyroid.com/hyperparathyroidism-diagnosis.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=34
> fetching http://www.parathyroid.com/index.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=33
> fetching http://www.parathyroid.com/parathyroid-pictures.htm
> fetching http://www.parathyroid.com/Parathyroid-Surgeon-Map.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=31
> fetching http://www.parathyroid.com/mini-surgery.htm
> fetching http://www.parathyroid.com/about-Parathyroid.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=29
> fetching http://www.parathyroid.com/disclaimer.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=28
> fetching http://www.parathyroid.com/parathyroid-function.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=27
> fetching http://www.parathyroid.com/paratiroide
> fetching http://www.parathyroid.com/low-vitamin-d.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=25
> fetching http://www.parathyroid.com/parathyroid-symptoms-cartoon.htm
> fetching http://www.parathyroid.com/sestamibi.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=23
> fetching http://www.parathyroid.com/osteoporosis.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=22
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=22
> fetching http://www.parathyroid.com/surgery_cure_rates.htm
> fetching http://www.parathyroid.com/low-calcium.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=20
> fetching http://www.parathyroid.com/Sensipar-high-calcium.htm
> fetching http://www.parathyroid.com/Dr.Norman.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=18
> fetching http://www.parathyroid.com/parathyroid-anatomy.htm
> fetching http://www.parathyroid.com/parathyroid-surgery.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=16
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=16
> fetching http://www.parathyroid.com/hypoparathyroidism.htm
> fetching http://www.parathyroid.com/endocrinology.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=14
> fetching http://www.parathyroid.com/parathyroid-cancer.htm
> fetching http://www.parathyroid.com/testimonials.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=12
> fetching http://www.parathyroid.com/hyperparathyroidism-videos.htm
> fetching http://www.parathyroid.com/high-calcium.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=10
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=10
> fetching http://www.parathyroid.com/osteoporosis2.htm
> fetching http://www.parathyroid.com/MEN-Syndrome.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=8
> fetching http://www.parathyroid.com/causes.htm
> fetching http://www.parathyroid.com/MIRP-Surgery.htm
> -activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=6
> fetching http://www.parathyroid.com/Re-Operation.htm
> fetching http://www.parathyroid.com/pregnancy.htm
> -activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=4
> * queue: http://www.parathyroid.com*/
>
> I am thinking somewhere, depth needs to be specified? - If yes, where?
> I followed all the steps in the blog, and don't see a single error in my log
> file. My seed list directory is in - /home/andrew/nutch/
>
> andrew@andrew-ubuntu:pwd
> /home/andrew/nutch/
> andrew@andrew-ubuntu:~/nutch$ ls -ltr
> total 20
> drwxrwxr-x 5 pooja pooja 4096 2012-02-19 19:38 workspace
> drwxrwxr-x 3 pooja pooja 4096 2012-02-19 21:23 install
> drwxrwxr-x 13 pooja pooja 4096 2012-02-20 08:06 gora
> drwxrwxr-x 9 pooja pooja 4096 2012-02-20 09:21 branch
> drwxrwxr-x 2 pooja pooja 4096 2012-02-21 12:05 web_seeds
>
> andrew@andrew-ubuntu:~/nutch$ cd web_seeds/
>
> andrew@andrew-ubuntu:~/nutch/web_seeds$ ls -ltr
> total 4
> -rwxr-xr-x 1 andrew andrew 19 2012-02-21 11:03 nutch.txt
>
> andrew@andrew-ubuntu:~/nutch/web_seeds$ cat *
> http://www.cnn.com
>
> For your reference, I have also pasted below the nutch inject, generate,
> fetch, and parse from my first run.
>
> andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch inject /home/andrew/nutch/web_seeds
> InjectorJob: starting
> InjectorJob: urlDir: /home/andrew/nutch/web_seeds
> InjectorJob: finished
> andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch generate
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: true
> GeneratorJob: done
> GeneratorJob: generated batch id: 1329855121-1496717092
>
> andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855121-1496717092
> FetcherJob: starting
> FetcherJob : timelimit set for : -1
> FetcherJob: threads: 10
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob: batchId: 1329855121-1496717092
> Using queue mode : byHost
> Fetcher: threads: 10
> QueueFeeder finished: total 1 records. Hit by time limit :0
> fetching http://www.cnn.com/
> -finishing thread FetcherThread1, activeThreads=1
> -finishing thread FetcherThread3, activeThreads=1
> -finishing thread FetcherThread2, activeThreads=1
> -finishing thread FetcherThread4, activeThreads=1
> -finishing thread FetcherThread5, activeThreads=1
> -finishing thread FetcherThread6, activeThreads=1
> -finishing thread FetcherThread7, activeThreads=1
> -finishing thread FetcherThread8, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=1
> -finishing thread FetcherThread0, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
> -activeThreads=0
> FetcherJob: done
>
> andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch parse 1329855121-1496717092
> ParserJob: starting
> ParserJob: resuming: false
> ParserJob: forced reparse: false
> ParserJob: batchId: 1329855121-1496717092
> Parsing http://www.cnn.com/
> ParserJob: success
>
> andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch updatedb
> DbUpdaterJob: starting
> DbUpdaterJob: done
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p3764751.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
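
For what it's worth, the cycle order given in the reply at the top of this message (inject once, then generate, fetch, parse, updatedb per round) could be scripted roughly as below. This is only a sketch, not something from the blog post: the function names (extract_batch_id, run_cycle, crawl) and the NUTCH variable are made up for illustration, and it assumes generate prints a "generated batch id:" line to the console, as it does in the logs quoted above.

```shell
#!/bin/sh
# Sketch of the crawl loop: inject once, then repeat
# generate -> fetch -> parse -> updatedb.
# NUTCH is a hypothetical override for the launcher; default is bin/nutch.
NUTCH="${NUTCH:-bin/nutch}"

# Read generate's console output on stdin and print the batch id from a line
# such as "GeneratorJob: generated batch id: 1329855121-1496717092".
extract_batch_id() {
    sed -n 's/.*generated batch id: *\([0-9][0-9-]*\).*/\1/p' | tail -n 1
}

# One round. Stops early if generate reports no batch id.
run_cycle() {
    # 2>&1 because we do not assume which stream the log line goes to.
    batch_id=$("$NUTCH" generate 2>&1 | extract_batch_id)
    if [ -z "$batch_id" ]; then
        echo "no batch id from generate; skipping this round" >&2
        return 1
    fi
    "$NUTCH" fetch "$batch_id" &&
    "$NUTCH" parse "$batch_id" &&
    "$NUTCH" updatedb
}

# crawl SEED_DIR ROUNDS: inject the seeds, then run ROUNDS cycles.
crawl() {
    "$NUTCH" inject "$1" || return 1
    i=0
    while [ "$i" -lt "$2" ]; do
        run_cycle || break
        i=$((i + 1))
    done
}
```

From runtime/local you would source this and call something like `crawl /home/andrew/nutch/web_seeds 3`, then run solrindex once at the end as in the blog post. The point of the loop is that updatedb runs at the end of every round, so the next generate has the new outlinks to pick from.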

