All,

+5 on NUTCH-61
So far, we have been trying to use this patch with partial success on 0.8.1.
We would be happy to help with work on updating/testing this.

Obviously we are hardly impartial, but we would also like to have NUTCH-422
(index-extra plugin) incorporated, although we are aware that we still have
some cleanup to do and JUnit tests to provide.

We have done some further work on NUTCH-185 (XMLParser is configurable xml
parser plugin), but haven't posted it yet because the work is perhaps too
highly customized (we generate fields automatically without any need to
configure a specific XPath).  We are still deliberating over how to make
this configurable without conflicting with those implementations where it
is necessary to specify which fields go into the index.

Apart from these, we would find the following candidates very useful. We
hope to use or work on them very soon, though perhaps not soon enough for
this release:

NUTCH-48        "Did you mean" query enhancement/refinement feature
NUTCH-251       Administration GUI
NUTCH-36        Chinese in Nutch
NUTCH-92        DistributedSearch incorrectly scores results

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
http://blog.idna-solutions.com

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED] 
Sent: 16 January 2007 16:19
To: nutch-dev@lucene.apache.org
Subject: Re: Next Nutch release

Sami Siren wrote:
> Hello,
>
> It has been a while since the previous release (0.8.1), and looking at the
> great fixes done in trunk I'd start thinking about baking a new release
> soon.
>
> Looking at the jira roadmaps there is 1 blocking issue (fixing the
> license headers) for 0.8.2 and two other blocking issues for 0.9.0, of
> which I think NUTCH-233 is safe to put in.
>   

Agreed. The replacement regex mentioned in the original comment seems 
safe enough, and simpler.

> The top 10 voted issues are currently:
>
> NUTCH-61       Adaptive re-fetch interval. Detecting unmodified content
>   

Well ... I'm of a split mind on this. I can bring this patch up to date 
and apply it before 0.9.0, if we understand that this is a "0" release 
... ;) Otherwise I'd prefer to hold it until right after the release.

I would also like to proceed with NUTCH-339 (Fetcher2 patches plus 
some changes I made in the meantime), since I'd like to expose the new 
fetcher to a broader audience, and it doesn't affect the existing 
implementation.


> NUTCH-48      "Did you mean" query enhancement/refinement feature
> NUTCH-251     Administration GUI
> NUTCH-289     CrawlDatum should store IP address
>   

I'm still not entirely convinced about this - and there is already a 
mechanism in place to support it if someone really wishes to keep this 
particular info (CrawlDatum.metaData).

> NUTCH-36      Chinese in Nutch
> NUTCH-185     XMLParser is configurable xml parser plugin.
> NUTCH-59      meta data support in webdb
> NUTCH-92      DistributedSearch incorrectly scores results
>   

This is too intrusive to fix just before the release - and needs 
additional discussion.


> NUTCH-68      A tool to generate arbitrary fetchlists
>   

Easy to port this to 0.9.0 - I can do this.


> NUTCH-87      Efficient site-specific crawling for a large number of sites
>   



-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



