Hi list,

jaya@jayapc:~/opt/nutch2$ bin/nutch readdb -crawlId someid_webpage
WebTableReader: java.lang.Exception: Select one of -url | -stat | -dump
  at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:472)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)

jaya@jayapc:~/opt/nutch2$ bin/nutch readdb -crawlId someid_webpage -stat
WebTableReader: java.lang.Exception: Select one of -url | -stat | -dump
  at org.apache.nutch.crawl.WebTableReader.run(WebTableReader.java:472)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
  at org.apache.nutch.crawl.WebTableReader.main(WebTableReader.java:412)

Shame on me: because I was in a hurry, I had not read the usage instructions first:
jaya@jayapc:~/opt/nutch2$ bin/nutch readdb
Usage: WebTableReader (-stats | -url [url] | -dump <out_dir> [-regex regex])
       [-crawlId <id>] [-content] [-headers] [-links] [-text]
    -crawlId <id>  - the id to prefix the schemas to operate on,
      (default: storage.crawl.id)
    -stats [-sort] - print overall statistics to System.out
    [-sort]        - list status sorted by host
    -url <url>     - print information on <url> to System.out
    -dump <out_dir> [-regex regex] - dump the webtable to a text file in
      <out_dir>
    -content       - dump also raw content
    -headers       - dump protocol headers
    -links         - dump links
    -text          - dump extracted text
    [-regex]       - filter on the URL of the webtable entry

The difference was '-stat' vs. '-stats': the tool expects '-stats' (note the trailing 's'), so the working command is:

jaya@jayapc:~/opt/nutch2$ bin/nutch readdb -crawlId someid_webpage -stats
