Hi,

I think this is not directly supported in Nutch 2. One way would be to write a tool that simply deletes all fields not needed for general crawling (you want to keep the fields that indicate that the URL has already been fetched, for example). The big fields that can be deleted after indexing include 'content' and 'text'.
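
To make the idea concrete, here is a rough sketch in Java of what such a tool would do to each row. It is only an illustration: the row is modeled as a plain Map, not as a real Nutch or Gora object, and RowStripper, strip and the "status" key are names I made up for the example. Only 'content' and 'text' come from the description above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a row is a plain Map here, NOT a Nutch WebPage or Gora object.
final class RowStripper {

    // The two bulky fields named above; everything else in the row is left alone.
    static final Set<String> BULKY_FIELDS =
            new HashSet<>(Arrays.asList("content", "text"));

    /** Removes the bulky fields in place and returns how many were actually present. */
    static int strip(Map<String, Object> row) {
        int removed = 0;
        for (String field : BULKY_FIELDS) {
            if (row.containsKey(field)) {
                row.remove(field);
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("status", "fetched");            // assumed bookkeeping field, must survive
        row.put("content", new byte[] {1, 2, 3}); // raw fetched bytes
        row.put("text", "parsed page text");      // parsed text

        int removed = strip(row);
        System.out.println(removed + " field(s) removed, remaining keys: " + row.keySet());
    }
}
```

A real tool would iterate over all rows in the datastore and write each stripped row back, which is where the store-specific part comes in.
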
Delete support is currently not optimal in Gora, so you might want to implement a workaround by using your store-specific API directly. (Of course, this would not be of any benefit to the other datastores.)

If you do not need inlinks (anchor texts), you could strip out some of the functionality of the DbUpdateReducer that writes the inlinks for every row: just skip the actual writing of the inlinks to every row, but keep the scoring functionality that depends on the inlinks. This requires some coding too.

Feel free to share other suggestions.

Ferdy.

On Fri, Aug 3, 2012 at 4:17 PM, Bai Shen <[email protected]> wrote:
> In Nutch 1.4, after I indexed a segment, I could delete it to save space.
> Is something like this possible with Nutch 2?
>
> Thanks.

