Doug Cutting wrote:
> Stefan Groschupf wrote:
>> Before we start adding meta data and more meta data, why not add
>> general-purpose metadata to the CrawlDatum once? Then we can have any
>> kind of plugin that adds and processes metadata belonging to a URL.
> +1
> This feature strikes me as something that might prove very useful, but
> might also prove unworkable, or at least not useful to everyone. Thus
> it would be best if it doesn't require changes to a core class like
> CrawlDatum. If it does eventually prove generally useful, as
> something that everyone will use and that should be enabled by
> default, then we could promote its data from metadata to a field for
> efficiency.
> In this vein, should modifiedTime be moved to metadata, once metadata
> is added?
I'm of a split mind on this, because I hope that the detection of
unmodified content will be the default mode of operation... OTOH,
perhaps it's a premature micro-optimization. We can move it to metadata
for now, but I see it as a strong candidate to be moved back...
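
To make the trade-off concrete, here is a rough sketch of a CrawlDatum-like record that carries a generic per-URL metadata map, with modifiedTime kept in that map instead of in its own field. It is hypothetical: the class name `DatumSketch`, the `MODIFIED_TIME_KEY` constant and the String-valued map are illustrative assumptions, not Nutch's API, and it ignores the Writable serialization a real CrawlDatum needs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified stand-in for a CrawlDatum-like record.
 * Not Nutch's actual class; names are illustrative only.
 */
public class DatumSketch {
    // A core value kept as a dedicated field: direct, cheap access.
    private long fetchTime;

    // Generic per-URL metadata that plugins may add and read
    // without any change to this class.
    private final Map<String, String> metaData = new HashMap<>();

    // Hypothetical key under which modifiedTime lives while it is metadata.
    public static final String MODIFIED_TIME_KEY = "modifiedTime";

    public long getFetchTime() { return fetchTime; }

    public void setFetchTime(long t) { fetchTime = t; }

    public Map<String, String> getMetaData() { return metaData; }

    // modifiedTime stored as metadata: each access costs a map lookup
    // and a parse, which is the efficiency argument for promoting it
    // back to a field if everyone ends up using it.
    public long getModifiedTime() {
        String v = metaData.get(MODIFIED_TIME_KEY);
        return v == null ? 0L : Long.parseLong(v);
    }

    public void setModifiedTime(long t) {
        metaData.put(MODIFIED_TIME_KEY, Long.toString(t));
    }

    public static void main(String[] args) {
        DatumSketch d = new DatumSketch();
        // A plugin attaches its own (hypothetical) metadata key.
        d.getMetaData().put("x-plugin-lang", "en");
        d.setModifiedTime(1136073600000L);
        System.out.println(d.getModifiedTime());
        System.out.println(d.getMetaData().get("x-plugin-lang"));
    }
}
```

Because callers only go through getModifiedTime()/setModifiedTime(), moving the value back to a dedicated field later would change just those two accessors, which keeps a future promotion cheap.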
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers