Hi Otis,

> This thread seems to have gotten very little attention.
> Jérôme - I'm all for extracting sub-libraries that can really live on their
> own and are substantial enough to warrant "their own identity".
> 
> Personally, I'm the most interested in Language Identifier plugin becoming
> a standalone, Nutch-independent piece.  Doug had suggested we move it to
> Lucene's contrib section.  If you think it makes sense to have some of
> these things lumped together, that's fine, too.  It looks like Language
> Identifier and Charset Detector may go well together.
> 
> Is this something you want/will push for and make happen?

Just to add to this, it's something that I would push for wholeheartedly.
In addition to Jérôme, I would be happy to dedicate time to this
sub-project, and I feel it's quite worthy of being its own standalone
library.

Just my two cents, thanks!

Cheers,
  Chris


> 
> Otis
> 
> ----- Original Message ----
> From: Jérôme Charron <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, April 7, 2006 4:26:54 AM
> Subject: [Proposal] New Lucene sub-project
> 
> Hi all,
> 
> While chatting with Chris Mattmann, it seems to be evident to us that there
> is a need for a new sub-project within Lucene.
> 
> For now, Lucene's sub-projects used in Nutch are :
> 1. Lucene-java - The basis for search technology
> 2. Hadoop - The distributed computing platform
> 3. Nutch - The search engine that relies on Lucene and Hadoop.
> 
> Since Nutch contains some value-added pieces of code that focus on content
> analysis, we think it would be a good idea to split Nutch into a new
> sub-project based on content analysis manipulation. The components we have
> identified are :
> 
> 1. MimeType Repository
> 2. Language Identifier
> 3. Content Signature (MD5Signature / TextProfileSignature / ...)
> (4. Generic Meta Data Infrastructure)
> (5. Charset Detector)
> (6. Parse Plugins Framework)
> 
> The idea is to expose these pieces of code as a standalone lib, since we
> are convinced they could be useful in many projects other than Nutch.
> The benefit will be to have this code more widely used / tested /
> contributed to.
> If this proposal is accepted, we have a candidate name for this new project:
> Tika (comes from my son  ;-) )
> 
> Any comment is welcome.
> 
> Jérôme
> 
