Jerome,

 I think this is a great idea: it ensures that so-called "management
information" isn't duplicated across the system. It could easily be
implemented as a utility method, because we already have utility Java classes
that represent the ParsePluginList, from which you could get the mimeTypes.
Additionally, we could create a utility method that searches the extension
point list for parsing plugins and returns a boolean indicating whether each
one is activated. Using this information, I believe the URL filtering would
be a snap.
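
To make the idea concrete, here is a rough sketch of the two utility methods. It is only an illustration under stated assumptions, not actual Nutch code: the class ExtensionFilterSketch, its method names, and the plain Map/Set inputs are all hypothetical stand-ins for whatever ParsePluginList, the plugin repository, and the mime-type registry would really provide. The sample plugin ids and extensions in main() are likewise just example data.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration only; not part of Nutch. */
public class ExtensionFilterSketch {

    /**
     * Returns true if at least one parsing plugin mapped to the given
     * mime type is in the set of activated plugins.
     */
    public static boolean isParsable(String mimeType,
                                     Map<String, List<String>> pluginsByMimeType,
                                     Set<String> activatedPlugins) {
        List<String> plugins =
            pluginsByMimeType.getOrDefault(mimeType, Collections.<String>emptyList());
        for (String pluginId : plugins) {
            if (activatedPlugins.contains(pluginId)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Builds the sorted set of file extensions to include in the fetch:
     * those associated with a mime type that an activated plugin can parse.
     * Every other extension is implicitly excluded.
     */
    public static Set<String> allowedExtensions(
            Map<String, List<String>> pluginsByMimeType,
            Set<String> activatedPlugins,
            Map<String, List<String>> extensionsByMimeType) {
        Set<String> allowed = new TreeSet<>();
        for (String mimeType : pluginsByMimeType.keySet()) {
            if (isParsable(mimeType, pluginsByMimeType, activatedPlugins)) {
                allowed.addAll(extensionsByMimeType.getOrDefault(
                    mimeType, Collections.<String>emptyList()));
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the mime-type mappings in parse-plugins.xml.
        Map<String, List<String>> pluginsByMimeType = new HashMap<>();
        pluginsByMimeType.put("text/html", Arrays.asList("parse-html"));
        pluginsByMimeType.put("application/pdf", Arrays.asList("parse-pdf"));

        // Assumed stand-in for the list of activated plugins.
        Set<String> activated = new HashSet<>(Arrays.asList("parse-html"));

        // Assumed stand-in for the content-type to file-extension mapping.
        Map<String, List<String>> extensionsByMimeType = new HashMap<>();
        extensionsByMimeType.put("text/html", Arrays.asList("html", "htm"));
        extensionsByMimeType.put("application/pdf", Arrays.asList("pdf"));

        // Only the HTML extensions survive, since parse-pdf is not activated.
        System.out.println(
            allowedExtensions(pluginsByMimeType, activated, extensionsByMimeType));
        // prints [htm, html]
    }
}
```

The resulting set could then drive the URL filter, so that the list of extensions never has to be maintained by hand alongside the plugin configuration.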

 

+1

Cheers,
  Chris



On 12/1/05 12:11 PM, "Jérôme Charron" <[EMAIL PROTECTED]> wrote:

> Suggestion:
> For consistency, and ease of Nutch management, why not filter the
> extensions based on the activated plugins?
> By looking at the mime-types defined in the parse-plugins.xml file and the
> activated plugins, we know which content-types will be parsed.
> So, by getting the file extensions associated with each content-type, we can
> build a list of file extensions to include (other ones will be excluded) in
> the fetch process.
> No?
> 
> Jérôme
> 
> --
> http://motrech.free.fr/
> http://www.frutch.org/

______________________________________________
Chris A. Mattmann
[EMAIL PROTECTED]
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group

_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
