[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated NUTCH-677:
------------------------------------

    Fix Version/s:     (was: 1.1)

- pushing this out per http://bit.ly/c7tBv9

> Segment merge filtering based on segment content
> ------------------------------------------------
>
>                 Key: NUTCH-677
>                 URL: https://issues.apache.org/jira/browse/NUTCH-677
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 0.9.0
>            Reporter: Marcin Okraszewski
>         Attachments: MergeFilter.patch, MergeFilter_for_1.0.patch,
> SegmentMergeFilter.java, SegmentMergeFilter.java, SegmentMergeFilters.java,
> SegmentMergeFilters.java
>
>
> I needed segment filtering based on metadata detected during the parse phase.
> Unfortunately, the current URL-based filtering does not allow for this, so I
> created a new SegmentMergeFilter extension that receives each segment entry
> being merged and decides whether it should be included. Although I needed
> only ParseData for my purpose, I made the extension more general, so the
> filter receives all of the merged data.
> The attached patch is for version 0.9, which I use. Unfortunately, I didn't
> have time to check how it fits the trunk version. Sorry :(
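
The filtering idea described in the issue can be illustrated with a minimal, self-contained Java sketch. This is not the code from the attached patch and does not use the Nutch API: `SegmentEntry`, `MergeFilter`, `merge`, and the `lang` metadata key are hypothetical names chosen for illustration. It only shows the general shape of a filter that sees a merged entry's parse metadata and decides whether the entry is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SegmentMergeFilterSketch {

    /** Hypothetical stand-in for one merged segment entry (not a Nutch class). */
    static final class SegmentEntry {
        final String url;
        final Map<String, String> parseMeta; // metadata detected during parsing

        SegmentEntry(String url, Map<String, String> parseMeta) {
            this.url = url;
            this.parseMeta = parseMeta;
        }
    }

    /** Hypothetical filter contract: return true to keep the entry. */
    interface MergeFilter {
        boolean filter(SegmentEntry entry);
    }

    /** Keeps an entry only if every configured filter accepts it. */
    static List<SegmentEntry> merge(List<SegmentEntry> entries,
                                    List<MergeFilter> filters) {
        List<SegmentEntry> kept = new ArrayList<>();
        for (SegmentEntry entry : entries) {
            boolean accepted = true;
            for (MergeFilter f : filters) {
                if (!f.filter(entry)) {
                    accepted = false;
                    break;
                }
            }
            if (accepted) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> en = new HashMap<>();
        en.put("lang", "en");
        Map<String, String> de = new HashMap<>();
        de.put("lang", "de");

        List<SegmentEntry> entries = Arrays.asList(
            new SegmentEntry("http://example.com/a", en),
            new SegmentEntry("http://example.com/b", de));

        // A decision that URL-based filtering cannot make: it depends on
        // metadata that exists only after the page has been parsed.
        MergeFilter englishOnly =
            entry -> "en".equals(entry.parseMeta.get("lang"));

        for (SegmentEntry e : merge(entries, Arrays.asList(englishOnly))) {
            System.out.println(e.url);
        }
    }
}
```

Requiring every filter to accept an entry is one possible policy, assumed here for simplicity; the patch may combine filters differently.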