> Hi all,
>
> The following issues need to be discussed and appropriate action taken
> before the 0.9 release:
>
> Blocker
> ========
> * NUTCH-400 (Update & add missing license headers) - I believe this is
> fixed and should be closed
>
> * NUTCH-353 (pages that serverside forwards will be refetched every
> time) - this was partially fixed in NUTCH-273, but a more complete
> solution would require significant changes to LinkDb. As there are no
> patches implementing this, I left it open, but it's no longer as
> critical as it was before. I propose to move it to "Major" and address
> it in the next release.
>
> * NUTCH-233 (wrong regular expression hang reduce process for ever) - I
> propose to apply the fix provided by Sean Dean and close this issue for
> now.
>
> Critical
> ========
> * NUTCH-436 (Incorrect handling of relative paths when the embedded URL
> path is empty). There is no patch available yet. If someone could
> contribute a patch I'd like to see this fixed before the release.

I am starting to take a look at this. I will try to get it fixed before we release.

>
> * NUTCH-427 (protocol-smb). This relies on a LGPL library, and it's
> certainly not critical (as this is an optional new feature). I propose
> to change it to Major, and make a decision - do we want another plugin
> like parse-mp3 or parse-rtf, or not.
>
> * NUTCH-381 (Ignore external link not work as expected) - I'll try to
> reproduce it, and if I find an easy fix I'd like to apply it before the
> release.
>
> * NUTCH-277 (Fetcher dies because of "max. redirects") - I wasn't able
> to reproduce it. If there is no updated information on this I propose to
> close it with "Can't reproduce".
>
> * NUTCH-167 (Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE">) -
> there's a patch which I tested in a limited production env. If there are
> no objections I'd like to apply it before the release.
>
> Major
> =====
> There are 84 major issues, but some of them are either invalid, or
> should be "minor", or no longer apply and should be closed. Please
> review them if you can and provide some comments or recommendations if
> you think you have some new information.
>
>
> One decision also that we need to make is which version of Hadoop should
> be included in the release. Current trunk uses 0.10.1, I have a set of
> production-tested patches that use 0.11.2, and today the Hadoop team
> released 0.12.0 (to be followed shortly by a 0.12.1, most likely in time
> before our release). The most conservative option is to stay with
> 0.10.1, but by the time people start using Nutch this will be a fairly
> old version already. I propose to upgrade to 0.11.2. We could use 0.12.1
> - but in this case with the expectation that we release less than stable
> version of Nutch to be soon followed by a minor stable release ...

+1 for using 0.11.2.
I looked through the release notes for 0.12, and there were some niceties, such as HADOOP-432 for undeletes, and a lot of bug fixes, but it didn't look like there were any critical issues as far as Nutch is concerned.

Dennis Kubes

>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers