This thread seems to have gotten very little attention.
Jérôme - I'm all for extracting sub-libraries that can really live on their own 
and are substantial enough to warrant "their own identity".

Personally, I'm most interested in the Language Identifier plugin becoming a 
standalone, Nutch-independent piece.  Doug had suggested we move it to Lucene's 
contrib section.  If you think it makes sense to have some of these things 
lumped together, that's fine, too.  It looks like Language Identifier and 
Charset Detector may go well together.

Is this something you want/will push for and make happen?

Otis

----- Original Message ----
From: Jérôme Charron <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, April 7, 2006 4:26:54 AM
Subject: [Proposal] New Lucene sub-project

Hi all,

While I was chatting with Chris Mattmann, it became evident to us that there
is a need for a new sub-project within Lucene.

For now, Lucene's sub-projects used in Nutch are:
1. Lucene-java - The basis for search technology
2. Hadoop - The distributed computing platform
3. Nutch - The search engine that relies on Lucene and Hadoop.

Since Nutch contains some value-added pieces of code that focus on content
analysis, we think it would be a good idea to split them out of Nutch into a
new sub-project based on content analysis and manipulation. The components we
have identified are:

1. MimeType Repository
2. Language Identifier
3. Content Signature (MD5Signature / TextProfileSignature / ...)
(4. Generic Meta Data Infrastructure)
(5. Charset Detector)
(6. Parse Plugins Framework)

The idea is to expose these pieces of code as a standalone library, since we
are convinced they could be useful in many projects other than Nutch.
The benefit would be code that is more widely used, tested, and
contributed to.
If this proposal is accepted, we have a candidate name for this new project:
Tika (comes from my son  ;-) )

Any comment is welcome.

Jérôme





_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
