CLASSIFICATION: UNCLASSIFIED

I am working through the tutorial for v1 of Nutch.
urls/seed.txt contains:
https://the.website.mil/inside/

regex-urlfilter.txt contains these edits:

# accept anything else
#+.

# limit to the.website.mil
+^https://([a-z0-9]*\.)the.website.mil/inside

Yet nothing gets populated in the crawl db when I run the inject step:

bin/nutch inject crawl/crawldb urls
Injector: starting at 2016-07-21 07:32:02
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Total number of urls rejected by filters: 1
Injector: Total number of urls after normalization: 0
Injector: Merging injected urls into crawl db.
Injector: overwrite: false
Injector: update: false
Injector: URLs merged: 0
Injector: Total new urls injected: 0
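
One way to check the accept rule against the seed URL outside of Nutch is the minimal sketch below. It is not Nutch code. It assumes the filter applies each pattern as an unanchored regex search, and that Python's re module treats this particular pattern the same way Java's regex engine would. VARIANT is a hypothetical alternative added only for comparison; it is not taken from the tutorial.

```python
import re

# Rule copied from the regex-urlfilter.txt edit above, without the leading '+'.
RULE = r"^https://([a-z0-9]*\.)the.website.mil/inside"

# Hypothetical variant (an assumption, not from the tutorial): the trailing '*'
# lets the subdomain group repeat zero or more times, so it becomes optional.
VARIANT = r"^https://([a-z0-9]*\.)*the.website.mil/inside"

# Seed URL copied from urls/seed.txt.
SEED = "https://the.website.mil/inside/"


def accepts(pattern: str, url: str) -> bool:
    """Return True if the pattern is found anywhere in the URL."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    for name, pattern in (("rule", RULE), ("variant", VARIANT)):
        print(name, accepts(pattern, SEED))
```

Run as a script, this prints "rule False" and "variant True". The rule as written requires at least one label followed by a dot in front of "the", which the seed URL does not have, and that would be consistent with "rejected by filters: 1" in the log above.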

Thanks,
Kris

~~~~~~~~~~~~~~~~~~~~~~~~~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.      
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
[email protected]
~~~~~~~~~~~~~~~~~~~~~~~~~~