Tool to recover partial fetcher output
--------------------------------------

                 Key: NUTCH-451
                 URL: https://issues.apache.org/jira/browse/NUTCH-451
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 0.9.0
            Reporter: Andrzej Bialecki
         Assigned To: Andrzej Bialecki
             Fix For: 0.9.0
         Attachments: LocalFetchRecover.java

This class may help you to recover partial data from a failed Fetcher run.

NOTE 1: this works ONLY if you ran Fetcher using the "local" file system, i.e. you didn't use DFS. Partial output to DFS is permanently lost if a process fails to properly close the output streams.

NOTE 2: if Fetcher was stopped abruptly (killed or crashed), then the partial SequenceFile-s will be corrupted at the end. This means that it won't be possible to recover all data from them; most likely only the data up to the last sync marker can be recovered.

The recovery process requires some preparation:

* determine the map directories corresponding to the map task outputs of the failed job. These map directories contain SequenceFile-s consisting of pairs of <Text, FetcherOutput>, named e.g. part-0.out, or file.out, or spill0.out.

* create the new input directory, let's say input/. Copy all SequenceFile-s into this directory, renaming them sequentially like this:

    input/part-00000
    input/part-00001
    input/part-00002
    input/part-00003
    ...

* specify the "input" directory as the input to this tool.

If all goes well, a new segment will be created as a subdirectory of the output dir.
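
The copy-and-rename step in the preparation list above could be sketched roughly as follows. This is only an illustration using the standard java.nio.file API: the class name StageRecoveryInput and its stage() method are hypothetical and are not part of Nutch or of the attached LocalFetchRecover.java. It assumes you have already located the partial SequenceFile-s yourself and pass them in the order you want them numbered.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of Nutch): stages partial map outputs as input/part-NNNNN. */
final class StageRecoveryInput {

    /**
     * Copies each source file into inputDir as part-00000, part-00001, ...
     * in the order given, and returns the paths of the copies.
     */
    static List<Path> stage(List<Path> sources, Path inputDir) throws IOException {
        Files.createDirectories(inputDir);
        List<Path> staged = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            // Locale.ROOT keeps the zero-padded number in plain ASCII digits.
            Path target = inputDir.resolve(String.format(Locale.ROOT, "part-%05d", i));
            // Without REPLACE_EXISTING, Files.copy throws if the target already
            // exists, so files from an earlier staging run are never overwritten.
            Files.copy(sources.get(i), target);
            staged.add(target);
        }
        return staged;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: StageRecoveryInput <inputDir> <sequenceFile>...");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        for (Path p : stage(sources, Paths.get(args[0]))) {
            System.out.println(p);
        }
    }
}
```

The originals are copied rather than moved, so the failed job's map directories stay untouched in case the recovery has to be repeated.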
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers