All,

+5 on NUTCH-61. So far, we have been trying to use this patch with partial success on 0.8.1. We would be happy to help with work on updating/testing this.
Obviously we are hardly impartial, and we would also like to have NUTCH-422 (index-extra plugin) incorporated (although we are aware that we still have some cleanup to do and JUnit tests to provide). We have done some further work on NUTCH-185 (XMLParser is configurable xml parser plugin), but haven't posted it yet because the work is perhaps too highly customized (we generate fields automatically without any need to configure a specific XPath). We are still deliberating over the desired configuration to do this without conflicting with those implementations where it is necessary to specify which fields go into the index.

Apart from these, we would find the following candidates very useful; we hope to use/work on them very soon (but perhaps not soon enough for this release):

NUTCH-48 "Did you mean" query enhancement/refinement feature
NUTCH-251 Administration GUI
NUTCH-36 Chinese in Nutch
NUTCH-92 DistributedSearch incorrectly scores results

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
http://blog.idna-solutions.com

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: 16 January 2007 16:19
To: nutch-dev@lucene.apache.org
Subject: Re: Next Nutch release

Sami Siren wrote:
> Hello,
>
> It has been a while from a previous release (0.8.1) and looking at the
> great fixes done in trunk I'd start thinking about baking a new release
> soon.
>
> Looking at the jira roadmaps there are 1 blocking issues (fixing the
> license headers) for 0.8.2 and two other blocking issues for 0.9.0 of
> which I think NUTCH-233 is safe to put in.
>

Agreed. The replacement regex mentioned in the original comment seems safe enough, and simpler.

> The top 10 voted issues are currently:
>
> NUTCH-61 Adaptive re-fetch interval. Detecting unmodified content
>

Well ... I'm of a split mind on this. I can bring this patch up to date and apply it before 0.9.0, if we understand that this is a "0" release ...
;) Otherwise I'd prefer to wait with it until right after the release.

I would also like to proceed with NUTCH-339 (Fetcher2 patches plus some changes I made in the meantime), since I'd like to expose the new fetcher to a broader audience, and it doesn't affect the existing implementation.

> NUTCH-48 "Did you mean" query enhancement/refinement feature
> NUTCH-251 Administration GUI
> NUTCH-289 CrawlDatum should store IP address
>

I'm still not entirely convinced about this - and there is already a mechanism in place to support it if someone really wishes to keep this particular info (CrawlDatum.metaData).

> NUTCH-36 Chinese in Nutch
> NUTCH-185 XMLParser is configurable xml parser plugin.
> NUTCH-59 meta data support in webdb
> NUTCH-92 DistributedSearch incorrectly scores results

This is too intrusive to fix just before the release - and needs additional discussion.

> NUTCH-68 A tool to generate arbitrary fetchlists

Easy to port this to 0.9.0 - I can do this.

> NUTCH-87 Efficient site-specific crawling for a large number of sites
>

--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com