Hi Nana,

Replies below.

On Thu, Apr 21, 2016 at 2:29 AM, <[email protected]> wrote:
> From: Nana Pandiawan <[email protected]>
> To: User Apache Nutch <[email protected]>
> Cc:
> Date: Thu, 21 Apr 2016 10:45:03 +0700
> Subject: Dump Command in Apache Nutch 2.x
> Dear All,
>
> I use Nutch 2.3.1 with HBase, how to find the command like this on nutch
> version 2.3.1 :
>
> /"bin/nutch dump -outputDir DATA_DUMP -segment TestCrawl/segments
> -mimetype image/jpeg image/png text/html"/
>
> This dump command worked in Apache Nutch version 1.x and return the image
> file
>

The dump command does not exist in Nutch 2.X. You can confirm this by running ./bin/nutch, which lists the available commands.

Although Nutch 1.X and 2.X share the concept of common tools, they are separate codebases, and Nutch 1.X is under more active development. The discrepancy here is simply that the FileDumper [0] tool has not been ported to the Nutch 2.X codebase.

If this is something you would like, then by all means please open a Jira ticket and submit a patch.

Thanks

[0] https://github.com/apache/nutch/blob/master/src/java/org/apache/nutch/tools/FileDumper.java

