[ https://issues.apache.org/jira/browse/NUTCH-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633370#action_12633370 ]
Andrzej Bialecki commented on NUTCH-451:
-----------------------------------------

I'm closing this issue, as the tool is not general enough to be included in Nutch. The code stays attached here, so anyone can still use it.

> Tool to recover partial fetcher output
> --------------------------------------
>
>                 Key: NUTCH-451
>                 URL: https://issues.apache.org/jira/browse/NUTCH-451
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Minor
>             Fix For: 1.0.0
>
>         Attachments: LocalFetchRecover-0.8.1.java, LocalFetchRecover.java
>
>
> This class may help you to recover partial data from a failed Fetcher run.
> NOTE 1: this works ONLY if you ran Fetcher using the "local" file system, i.e.
> you didn't use DFS. Partial output to DFS is permanently lost if a process
> fails to properly close the output streams.
> NOTE 2: if Fetcher was stopped abruptly (killed or crashed), then the partial
> SequenceFile-s will be corrupted at the end. This means that it won't be
> possible to recover all data from them; most likely only the data up to the
> last sync marker can be recovered.
> The recovery process requires some preparation:
> * determine the map directories corresponding to the map task outputs of the
> failed job. These map directories contain SequenceFile-s consisting of pairs
> of <Text, FetcherOutput>, named e.g. part-0.out, or file.out, or spill0.out.
> * create the new input directory, let's say input/. Copy all SequenceFile-s
> into this directory, renaming them sequentially like this:
> input/part-00000
> input/part-00001
> input/part-00002
> input/part-00003
> ...
>
> * specify the "input" directory as the input to this tool.
> If all goes well, a new segment will be created as a subdirectory of the
> output dir.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
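
The preparation step described in the issue (copying the partial SequenceFile-s into input/ under sequential part-NNNNN names) can be sketched roughly as below. This is only an illustration using the plain Java file API, not part of the attached LocalFetchRecover code: the class name PrepareRecoveryInput, its methods, and the command-line layout are hypothetical, and it assumes you have already located the map output files yourself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of Nutch): stages partial map outputs
// for recovery by copying them under sequential part-NNNNN names.
final class PrepareRecoveryInput {

    // Builds the sequential name used in the issue, e.g. 3 -> "part-00003".
    static String partName(int index) {
        return String.format(Locale.ROOT, "part-%05d", index);
    }

    // Copies each source file into inputDir, in the order given.
    // Files.copy fails if the target already exists, so an earlier
    // staging attempt is never silently overwritten.
    static List<Path> stage(List<Path> mapOutputs, Path inputDir) throws IOException {
        Files.createDirectories(inputDir);
        List<Path> staged = new ArrayList<>();
        int index = 0;
        for (Path source : mapOutputs) {
            Path target = inputDir.resolve(partName(index++));
            Files.copy(source, target);
            staged.add(target);
        }
        return staged;
    }

    // Assumed usage: PrepareRecoveryInput <inputDir> <mapOutputFile>...
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Usage: PrepareRecoveryInput <inputDir> <mapOutputFile>...");
            return;
        }
        List<Path> sources = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            sources.add(Paths.get(args[i]));
        }
        for (Path staged : stage(sources, Paths.get(args[0]))) {
            System.out.println(staged);
        }
    }
}
```

The sketch copies rather than moves the files, so the original map output directories stay untouched if the recovery has to be retried.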