This thread seems to have gotten very little attention. Jérôme - I'm all for extracting sub-libraries that can really live on their own and are substantial enough to warrant "their own identity".
Personally, I'm most interested in the Language Identifier plugin becoming a standalone, Nutch-independent piece. Doug had suggested we move it to Lucene's contrib section. If you think it makes sense to have some of these things lumped together, that's fine, too. It looks like Language Identifier and Charset Detector may go well together.

Is this something you want to push for and will make happen?

Otis

----- Original Message ----
From: Jérôme Charron <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, April 7, 2006 4:26:54 AM
Subject: [Proposal] New Lucene sub-project

Hi all,

While chatting with Chris Mattmann, it became evident to us that there is a need for a new sub-project within Lucene.

For now, Lucene's sub-projects used in Nutch are:
1. Lucene-java - The basis for search technology
2. Hadoop - The distributed computing platform
3. Nutch - The search engine that relies on Lucene and Hadoop.

Since Nutch contains some value-added pieces of code that focus on content analysis, we think it would be a good idea to split Nutch into a new sub-project based on content analysis manipulation.

The components we have identified are:
1. MimeType Repository
2. Language Identifier
3. Content Signature (MD5Signature / TextProfileSignature / ...)
(4. Generic Meta Data Infrastructure)
(5. Charset Detector)
(6. Parse Plugins Framework)

The idea is to expose these pieces of code as a standalone lib, since we are convinced they could be useful in many projects other than Nutch. The benefit will be to have this code more widely used / tested / contributed to.

If this proposal is accepted, we have a candidate name for this new project: Tika (comes from my son ;-) )

Any comment is welcome.

Jérôme
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
