Hi all,

I recently set up Nutch with Gora on top of a Cassandra database. A colleague
referred me to the blog post below, which is excellent:
http://sujitpal.blogspot.in/2012/01/exploring-nutch-gora-with-cassandra.html

I followed the steps in the blog exactly. On the first pass everything goes
well: inject, generate, fetch, and parse. But when I run a second iteration,
nutch fetch does not fetch anything. As a result, my Solr index only has 10
records (from the first successful run) and does not pick up any data from
the subsequent runs.

Output of my nutch fetch on the second iteration:

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855266-1107256220
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329855266-1107256220
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread0, activeThreads=0
-finishing thread FetcherThread1, activeThreads=0
-finishing thread FetcherThread2, activeThreads=0
-finishing thread FetcherThread3, activeThreads=0
-finishing thread FetcherThread4, activeThreads=0
-finishing thread FetcherThread5, activeThreads=0
-finishing thread FetcherThread6, activeThreads=0
-finishing thread FetcherThread7, activeThreads=0
-finishing thread FetcherThread8, activeThreads=0
-finishing thread FetcherThread9, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done
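
The line that matters in the output above is the "QueueFeeder finished" one:
the batch contained 0 records, so the fetcher threads had nothing to do. Below
is a minimal sketch for pulling that count out of the fetch output. The
function name queued_records is my own (hypothetical), not part of Nutch, and
I am assuming the log line always has the shape shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): read fetch output on stdin and
# print the N from the first "QueueFeeder finished: total N records" line.
queued_records() {
    sed -n 's/^QueueFeeder finished: total \([0-9][0-9]*\) records.*/\1/p' | head -n 1
}
```

It could be used as: bin/nutch fetch <batchId> 2>&1 | queued_records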

*************************************
Compare this with the output the author of the above blog gets:

sujit@cyclone:local$ bin/nutch fetch 1325709400-776802111
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1325709400-776802111
Using queue mode : byHost
Fetcher: threads: 10
/*fetching http://www.parathyroid.com/parathyroid.htm
QueueFeeder finished: total 47 records. Hit by time limit :0
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=46
fetching http://www.parathyroid.com/Parathyroid-Surgeon.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=45
fetching http://www.parathyroid.com/paratiroide/index.html
fetching http://www.parathyroid.com/diagnosis.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=43
fetching http://www.parathyroid.com/parathyroid-adenoma.htm
fetching http://www.parathyroid.com/age.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=41
fetching http://www.parathyroid.com/FHH.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=40
fetching http://www.parathyroid.com/treatment-surgery.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=39
fetching http://www.parathyroid.com/who's_eligible.htm
fetching http://www.parathyroid.com/parathyroid-disease.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=37
fetching http://www.parathyroid.com/FAQ.htm
fetching http://www.parathyroid.com/finding-parathyroid.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=35
fetching http://www.parathyroid.com/hyperparathyroidism-diagnosis.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=34
fetching http://www.parathyroid.com/index.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=33
fetching http://www.parathyroid.com/parathyroid-pictures.htm
fetching http://www.parathyroid.com/Parathyroid-Surgeon-Map.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=31
fetching http://www.parathyroid.com/mini-surgery.htm
fetching http://www.parathyroid.com/about-Parathyroid.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=29
fetching http://www.parathyroid.com/disclaimer.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=28
fetching http://www.parathyroid.com/parathyroid-function.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=27
fetching http://www.parathyroid.com/paratiroide
fetching http://www.parathyroid.com/low-vitamin-d.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=25
fetching http://www.parathyroid.com/parathyroid-symptoms-cartoon.htm
fetching http://www.parathyroid.com/sestamibi.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=23
fetching http://www.parathyroid.com/osteoporosis.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=22
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=22
fetching http://www.parathyroid.com/surgery_cure_rates.htm
fetching http://www.parathyroid.com/low-calcium.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=20
fetching http://www.parathyroid.com/Sensipar-high-calcium.htm
fetching http://www.parathyroid.com/Dr.Norman.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=18
fetching http://www.parathyroid.com/parathyroid-anatomy.htm
fetching http://www.parathyroid.com/parathyroid-surgery.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=16
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=16
fetching http://www.parathyroid.com/hypoparathyroidism.htm
fetching http://www.parathyroid.com/endocrinology.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=14
fetching http://www.parathyroid.com/parathyroid-cancer.htm
fetching http://www.parathyroid.com/testimonials.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=12
fetching http://www.parathyroid.com/hyperparathyroidism-videos.htm
fetching http://www.parathyroid.com/high-calcium.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=10
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=10
fetching http://www.parathyroid.com/osteoporosis2.htm
fetching http://www.parathyroid.com/MEN-Syndrome.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=8
fetching http://www.parathyroid.com/causes.htm
fetching http://www.parathyroid.com/MIRP-Surgery.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=6
fetching http://www.parathyroid.com/Re-Operation.htm
fetching http://www.parathyroid.com/pregnancy.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=4
* queue: http://www.parathyroid.com*/

I suspect a crawl depth needs to be specified somewhere. If so, where?
I followed all the steps in the blog and don't see a single error in my log
file. My seed list directory is under /home/andrew/nutch/:

andrew@andrew-ubuntu:~/nutch$ pwd
/home/andrew/nutch/
andrew@andrew-ubuntu:~/nutch$ ls -ltr
total 20
drwxrwxr-x  5 pooja pooja 4096 2012-02-19 19:38 workspace
drwxrwxr-x  3 pooja pooja 4096 2012-02-19 21:23 install
drwxrwxr-x 13 pooja pooja 4096 2012-02-20 08:06 gora
drwxrwxr-x  9 pooja pooja 4096 2012-02-20 09:21 branch
drwxrwxr-x  2 pooja pooja 4096 2012-02-21 12:05 web_seeds

andrew@andrew-ubuntu:~/nutch$ cd web_seeds/

andrew@andrew-ubuntu:~/nutch/web_seeds$ ls -ltr
total 4
-rwxr-xr-x 1 andrew andrew 19 2012-02-21 11:03 nutch.txt

andrew@andrew-ubuntu:~/nutch/web_seeds$ cat *
http://www.cnn.com

For reference, here is the output of nutch inject, generate, fetch, parse,
and updatedb from my first run.

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch inject /home/andrew/nutch/web_seeds
InjectorJob: starting
InjectorJob: urlDir: /home/andrew/nutch/web_seeds
InjectorJob: finished
andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch generate
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: done
GeneratorJob: generated batch id: 1329855121-1496717092

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855121-1496717092
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329855121-1496717092
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
fetching http://www.cnn.com/
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread2, activeThreads=1
-finishing thread FetcherThread4, activeThreads=1
-finishing thread FetcherThread5, activeThreads=1
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch parse 1329855121-1496717092
ParserJob: starting
ParserJob: resuming:    false
ParserJob: forced reparse:      false
ParserJob: batchId:     1329855121-1496717092
Parsing http://www.cnn.com/
ParserJob: success

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch updatedb
DbUpdaterJob: starting
DbUpdaterJob: done
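
To make the cycle explicit, this is roughly what I repeat on each iteration,
written as a small script. It is only a sketch: ROUNDS and the helper names
batch_id_from_generate and crawl_round are my own (hypothetical), not Nutch
options. If depth simply means the number of such rounds (an assumption on my
part), it would correspond to the ROUNDS value here.

```shell
#!/bin/sh
# Sketch of the manual generate/fetch/parse/updatedb cycle shown above.
# Assumes it is run from runtime/local so that bin/nutch resolves.

# Read generate output on stdin and print the id from the
# "GeneratorJob: generated batch id: ..." line (empty if not present).
batch_id_from_generate() {
    sed -n 's/^GeneratorJob: generated batch id: \([0-9][0-9]*-[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Run one round; fail if generate reports no batch id or any step fails.
crawl_round() {
    batch_id=$(bin/nutch generate 2>&1 | batch_id_from_generate)
    [ -n "$batch_id" ] || { echo "no batch id in generate output" >&2; return 1; }
    bin/nutch fetch "$batch_id" &&
    bin/nutch parse "$batch_id" &&
    bin/nutch updatedb
}

# Loop only when ROUNDS is set, e.g.: ROUNDS=3 sh crawl_rounds.sh
if [ -n "${ROUNDS:-}" ]; then
    i=1
    while [ "$i" -le "$ROUNDS" ]; do
        crawl_round || exit 1
        i=$((i + 1))
    done
fi
```

The seed list is injected once beforehand with bin/nutch inject, as in the
first run above.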




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p3764751.html
Sent from the Nutch - User mailing list archive at Nabble.com.
