Hello,

I have looked a little into the Nutch code and mailing lists. I think the nutchbase branch (http://issues.apache.org/jira/browse/NUTCH-650) is very interesting, with good potential to improve code clarity and flexibility (I find the data structures quite obscure in the current version). The issue has been untouched since last August, so my questions are: can nutchbase really be part of Nutch 1.1? Is there still much work to do, or is it almost ready? Is it a worthwhile issue for an interested developer with (still!) limited knowledge of the project?
So far I have only tried to run nutchbase in Eclipse by following the tutorial (http://wiki.apache.org/nutch/RunNutchInEclipse1.0), but I run into errors when building, mostly from the Parser classes and the tests. I may start by cleaning this up.

Eclipse build errors (columns: Description, Resource, Path, Location, Type):

FetcherOutputFormat cannot be resolved to a type ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 362 Java Problem
Generator.GENERATE_MAX_PER_HOST_BY_IP cannot be resolved TestGenerator.java /nutchbase/src/test/org/apache/nutch/crawl line 202 Java Problem
ParseImpl cannot be resolved to a type ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 229 Java Problem
ParseImpl cannot be resolved to a type BasicFields.java /nutchbase/src/java/org/apache/nutch/indexer/field line 335 Java Problem
ParseImpl cannot be resolved to a type ExtParser.java /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext line 138 Java Problem
ParseImpl cannot be resolved to a type MSBaseParser.java /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms line 108 Java Problem
ParseImpl cannot be resolved to a type OOParser.java /nutchbase/src/plugin/parse-oo/src/java/org/apache/nutch/parse/oo line 103 Java Problem
ParseImpl cannot be resolved to a type PdfParser.java /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf line 155 Java Problem
ParseImpl cannot be resolved to a type RSSParser.java /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss line 187 Java Problem
ParseImpl cannot be resolved to a type SWFParser.java /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf line 113 Java Problem
ParseImpl cannot be resolved to a type TestIndexingFilters.java /nutchbase/src/test/org/apache/nutch/indexer line 45 Java Problem
ParseImpl cannot be resolved to a type TestMoreIndexingFilter.java /nutchbase/src/plugin/index-more/src/test/org/apache/nutch/indexer/more line 61 Java Problem
ParseImpl cannot be resolved to a type TextParser.java /nutchbase/src/plugin/parse-text/src/java/org/apache/nutch/parse/text line 55 Java Problem
ParseImpl cannot be resolved to a type ZipParser.java /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip line 105 Java Problem
ParseResult cannot be resolved ExtParser.java /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext line 137 Java Problem
ParseResult cannot be resolved MSBaseParser.java /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms line 107 Java Problem
ParseResult cannot be resolved OOParser.java /nutchbase/src/plugin/parse-oo/src/java/org/apache/nutch/parse/oo line 103 Java Problem
ParseResult cannot be resolved PdfParser.java /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf line 155 Java Problem
ParseResult cannot be resolved RSSParser.java /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss line 187 Java Problem
ParseResult cannot be resolved SWFParser.java /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf line 113 Java Problem
ParseResult cannot be resolved TextParser.java /nutchbase/src/plugin/parse-text/src/java/org/apache/nutch/parse/text line 55 Java Problem
ParseResult cannot be resolved ZipParser.java /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip line 105 Java Problem
ParseResult cannot be resolved to a type ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 159 Java Problem
ParseResult cannot be resolved to a type CCParseFilter.java /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch line 267 Java Problem
ParseResult cannot be resolved to a type CCParseFilter.java /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch line 267 Java Problem
ParseResult cannot be resolved to a type ExtParser.java /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext line 69 Java Problem
ParseResult cannot be resolved to a type FeedParser.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed line 106 Java Problem
ParseResult cannot be resolved to a type FeedParser.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed line 108 Java Problem
ParseResult cannot be resolved to a type FeedParser.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed line 108 Java Problem
ParseResult cannot be resolved to a type FeedParser.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed line 211 Java Problem
ParseResult cannot be resolved to a type FeedParser.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed line 221 Java Problem
ParseResult cannot be resolved to a type HTMLLanguageParser.java /nutchbase/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang line 90 Java Problem
ParseResult cannot be resolved to a type HTMLLanguageParser.java /nutchbase/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang line 90 Java Problem
ParseResult cannot be resolved to a type MSBaseParser.java /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms line 64 Java Problem
ParseResult cannot be resolved to a type MSExcelParser.java /nutchbase/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel line 40 Java Problem
ParseResult cannot be resolved to a type MSPowerPointParser.java /nutchbase/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint line 44 Java Problem
ParseResult cannot be resolved to a type MSWordParser.java /nutchbase/src/plugin/parse-msword/src/java/org/apache/nutch/parse/msword line 43 Java Problem
ParseResult cannot be resolved to a type OOParser.java /nutchbase/src/plugin/parse-oo/src/java/org/apache/nutch/parse/oo line 63 Java Problem
ParseResult cannot be resolved to a type PdfParser.java /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf line 69 Java Problem
ParseResult cannot be resolved to a type RSSParser.java /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss line 80 Java Problem
ParseResult cannot be resolved to a type RelTagParser.java /nutchbase/src/plugin/microformats-reltag/src/java/org/apache/nutch/microformats/reltag line 68 Java Problem
ParseResult cannot be resolved to a type RelTagParser.java /nutchbase/src/plugin/microformats-reltag/src/java/org/apache/nutch/microformats/reltag line 68 Java Problem
ParseResult cannot be resolved to a type SWFParser.java /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf line 64 Java Problem
ParseResult cannot be resolved to a type SWFParser.java /nutchbase/src/plugin/parse-swf/src/java/org/apache/nutch/parse/swf line 125 Java Problem
ParseResult cannot be resolved to a type TestFeedParser.java /nutchbase/src/plugin/feed/src/test/org/apache/nutch/parse/feed line 94 Java Problem
ParseResult cannot be resolved to a type TextParser.java /nutchbase/src/plugin/parse-text/src/java/org/apache/nutch/parse/text line 41 Java Problem
ParseResult cannot be resolved to a type ZipParser.java /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip line 55 Java Problem
The constructor Fetcher(Configuration) is undefined TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 100 Java Problem
The constructor Fetcher(Configuration) is undefined TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 177 Java Problem
The constructor Generator(Configuration) is undefined TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 94 Java Problem
The constructor Generator(Configuration) is undefined TestGenerator.java /nutchbase/src/test/org/apache/nutch/crawl line 312 Java Problem
The constructor Injector(Configuration) is undefined TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 90 Java Problem
The constructor Injector(Configuration) is undefined TestInjector.java /nutchbase/src/test/org/apache/nutch/crawl line 70 Java Problem
The constructor NutchWritable(ParseImpl) is undefined ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 229 Java Problem
The import org.apache.nutch.fetcher.FetcherOutputFormat cannot be resolved ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 44 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 50 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved BasicFields.java /nutchbase/src/java/org/apache/nutch/indexer/field line 61 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved ExtParser.java /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext line 26 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved MSBaseParser.java /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms line 39 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved PdfParser.java /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf line 41 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved RSSParser.java /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss line 41 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved TestExtParser.java /nutchbase/src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext line 26 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved TestIndexingFilters.java /nutchbase/src/test/org/apache/nutch/indexer line 26 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved TestMSWordParser.java /nutchbase/src/plugin/parse-msword/src/test/org/apache/nutch/parse/msword line 26 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved TestMoreIndexingFilter.java /nutchbase/src/plugin/index-more/src/test/org/apache/nutch/indexer/more line 29 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved TestZipParser.java /nutchbase/src/plugin/parse-zip/src/test/org/apache/nutch/parse/zip line 26 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved ZipParser.java /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip line 33 Java Problem
The import org.apache.nutch.parse.ParseImpl cannot be resolved ZipTextExtractor.java /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip line 41 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 51 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved ExtParser.java /nutchbase/src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext line 21 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved FeedParser.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/parse/feed line 43 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved HTMLLanguageParser.java /nutchbase/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang line 33 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved MSBaseParser.java /nutchbase/src/plugin/lib-parsems/src/java/org/apache/nutch/parse/ms line 40 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved MSExcelParser.java /nutchbase/src/plugin/parse-msexcel/src/java/org/apache/nutch/parse/msexcel line 20 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved MSPowerPointParser.java /nutchbase/src/plugin/parse-mspowerpoint/src/java/org/apache/nutch/parse/mspowerpoint line 20 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved MSWordParser.java /nutchbase/src/plugin/parse-msword/src/java/org/apache/nutch/parse/msword line 21 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved PdfParser.java /nutchbase/src/plugin/parse-pdf/src/java/org/apache/nutch/parse/pdf line 37 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved RSSParser.java /nutchbase/src/plugin/parse-rss/src/java/org/apache/nutch/parse/rss line 36 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved RelTagParser.java /nutchbase/src/plugin/microformats-reltag/src/java/org/apache/nutch/microformats/reltag line 38 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved TestFeedParser.java /nutchbase/src/plugin/feed/src/test/org/apache/nutch/parse/feed line 32 Java Problem
The import org.apache.nutch.parse.ParseResult cannot be resolved ZipParser.java /nutchbase/src/plugin/parse-zip/src/java/org/apache/nutch/parse/zip line 34 Java Problem
The method calculate(WebTableRow, Parse) in the type Signature is not applicable for the arguments (Content, Parse) ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 187 Java Problem
The method calculate(WebTableRow, Parse) in the type Signature is not applicable for the arguments (Content, Parse) ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 208 Java Problem
The method fetch(String, int, boolean) from the type Fetcher is not visible TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 178 Java Problem
The method fetch(String, int, boolean) in the type Fetcher is not applicable for the arguments (Path, int, boolean) TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 101 Java Problem
The method generate(String, long, long, boolean) in the type Generator is not applicable for the arguments (Path, Path, int, int, long, boolean, boolean) TestGenerator.java /nutchbase/src/test/org/apache/nutch/crawl line 313 Java Problem
The method generate(String, long, long, boolean) in the type Generator is not applicable for the arguments (Path, Path, int, long, long, boolean, boolean) TestFetcher.java /nutchbase/src/test/org/apache/nutch/fetcher line 95 Java Problem
The method getData() is undefined for the type Parse ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 200 Java Problem
The method getData() is undefined for the type Parse ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 211 Java Problem
The method getData() is undefined for the type Parse ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 213 Java Problem
The method getData() is undefined for the type Parse ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 216 Java Problem
The method getData() is undefined for the type Parse ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 230 Java Problem
The method getData() is undefined for the type Parse ArcSegmentCreator.java /nutchbase/src/java/org/apache/nutch/tools/arc line 244 Java Problem
The method getData() is undefined for the type Parse BasicFields.java /nutchbase/src/java/org/apache/nutch/indexer/field line 386 Java Problem
The method getData() is undefined for the type Parse BasicFields.java /nutchbase/src/java/org/apache/nutch/indexer/field line 395 Java Problem
The method getData() is undefined for the type Parse CCIndexingFilter.java /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch line 55 Java Problem
The method getData() is undefined for the type Parse CCParseFilter.java /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch line 280 Java Problem
The method getData() is undefined for the type Parse CCParseFilter.java /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch line 286 Java Problem
The method getData() is undefined for the type Parse CCParseFilter.java /nutchbase/src/plugin/creativecommons/src/java/org/creativecommons/nutch line 291 Java Problem
The method getData() is undefined for the type Parse FeedIndexingFilter.java /nutchbase/src/plugin/feed/src/java/org/apache/nutch/indexer/feed line 76 Java Problem