Hi list,
jaya@jayapc:~/opt/nutch2$ bin/nutch readdb -crawlId someid_webpage
WebTableReader: java.lang.Exception: Select one of -url | -stat | -dump
at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:472)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
jaya@jayapc:~/opt/nutch2$ bin/nutch readdb -crawlId someid_webpage -stat
WebTableReader: java.lang.Exception: Select one of -url | -stat | -dump
at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:472)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)
Shame on me: because I was in a hurry, I had not read the usage instructions first:
jaya@jayapc:~/opt/nutch2$ bin/nutch readdb
Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
[-crawlId <id>] [-content] [-headers] [-links] [-text]
-crawlId <id> - the id to prefix the schemas to operate on,
(default: storage.crawl.id)
-stats [-sort] - print overall statistics to System.out
[-sort] - list status sorted by host
-url <url> - print information on <url> to System.out
-dump <out_dir> [-regex regex] - dump the webtable to a text file in
<out_dir>
-content - dump also raw content
-headers - dump protocol headers
-links - dump links
-text - dump extracted text
[-regex] - filter on the URL of the webtable entry
The problem was the difference between '-stat' and '-stats': the tool only accepts the plural form '-stats', so my first attempts fell through to the "Select one of -url | -stat | -dump" error.
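For anyone hitting the same error, the invocation that matches the usage message above (using the same example crawl id) should be:

```shell
# Note the plural flag: -stats, not -stat
bin/nutch readdb -crawlId someid_webpage -stats
```

This prints the overall crawl statistics to stdout; add -sort to list the status counts sorted by host, as shown in the usage text.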