[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364683 ]
Stefan Groschupf commented on NUTCH-192:
----------------------------------------
Andrzej, Doug, I'm not sure I understand you correctly: do you suggest having
string keys and values, or just string keys?
This confuses me a bit, though I may be misunderstanding because of my
English, since I remember that one reason for having no meta data until now was
performance and the size of the data.
In one of my personal use cases I have a set of meta data keys that is
definitely smaller than 255, and I only need to store some long values.
So I would love to use key:ByteWritable and value:LongWritable.
Storing new LongWritable(23) instead of new UTF8("23") should make a
significant difference in size. Parsing a byte, int, or long from a string also
takes some time.
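
A rough way to check the size argument is to serialize one meta data entry both ways. This is only a sketch: it uses plain java.io streams as a stand-in for the Writable classes, assumes a long value is written as 8 fixed bytes and a string as a 2-byte length prefix plus its bytes, and the key name "fetchTime" is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class MetaSizeSketch {
    // Size of one entry with a 1-byte key and a fixed 8-byte long value.
    static int binaryEntrySize(byte key, long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeByte(key);
            out.writeLong(value);
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Size of one entry where key and value are both length-prefixed strings.
    static int stringEntrySize(String key, long value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(key);
            out.writeUTF(Long.toString(value));
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("byte key + long value:     " + binaryEntrySize((byte) 1, 23L));
        System.out.println("string key + string value: " + stringEntrySize("fetchTime", 23L));
        System.out.println("string entry, large value: " + stringEntrySize("fetchTime", 1138579200000L));
    }
}
```

The binary entry stays at a fixed size, while the string entry grows with the key length and the number of digits in the value.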
There is also a nice side effect: since this map is itself a Writable, we can
store a Map in a Map, which allows hierarchical meta data.
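
The map-in-a-map idea can be sketched as follows. This is not the MapWritable from the patch: the class name, the type tags, and the use of plain java.util maps with java.io streams are all assumptions made for illustration. Each value is preceded by a one-byte type tag, and a nested map is written by calling the same write method recursively.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class NestedMetaSketch {
    // Hypothetical type tags written before each value.
    private static final byte TYPE_LONG = 1;
    private static final byte TYPE_MAP = 2;

    // Writes a map whose values are either Long or a nested Map<Byte, Object>.
    static void write(Map<Byte, Object> map, DataOutputStream out) throws IOException {
        if (map.size() > 255) {
            throw new IllegalArgumentException("sketch supports at most 255 entries");
        }
        out.writeByte(map.size());
        for (Map.Entry<Byte, Object> e : map.entrySet()) {
            out.writeByte(e.getKey());
            Object v = e.getValue();
            if (v instanceof Long) {
                out.writeByte(TYPE_LONG);
                out.writeLong((Long) v);
            } else if (v instanceof Map) {
                out.writeByte(TYPE_MAP);
                @SuppressWarnings("unchecked")
                Map<Byte, Object> nested = (Map<Byte, Object>) v;
                write(nested, out); // recursion gives the hierarchy
            } else {
                throw new IllegalArgumentException("unsupported value type");
            }
        }
    }

    // Reads back what write() produced, rebuilding nested maps.
    static Map<Byte, Object> read(DataInputStream in) throws IOException {
        int size = in.readUnsignedByte();
        Map<Byte, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            byte key = in.readByte();
            byte type = in.readByte();
            if (type == TYPE_LONG) {
                map.put(key, in.readLong());
            } else if (type == TYPE_MAP) {
                map.put(key, read(in));
            } else {
                throw new IOException("unknown type tag: " + type);
            }
        }
        return map;
    }

    // Serializes to a byte array and reads it back.
    static Map<Byte, Object> roundTrip(Map<Byte, Object> map) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(map, new DataOutputStream(buf));
            return read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<Byte, Object> inner = new LinkedHashMap<>();
        inner.put((byte) 1, 42L);
        Map<Byte, Object> outer = new LinkedHashMap<>();
        outer.put((byte) 1, 23L);
        outer.put((byte) 2, inner);
        System.out.println(roundTrip(outer).equals(outer));
    }
}
```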
I fully agree with having a manually created mapping table stored in the
MapWritable class. I will change this and commit a new patch.
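
A manually created mapping table could look roughly like the sketch below. The class name, the ids, and the registered types are hypothetical and only stand in for whatever the patch ends up using. The point is that the serialized data carries a one-byte id instead of a class name, and the table is fixed in code.

```java
import java.util.HashMap;
import java.util.Map;

class ClassIdTableSketch {
    private static final Map<Class<?>, Byte> CLASS_TO_ID = new HashMap<>();
    private static final Map<Byte, Class<?>> ID_TO_CLASS = new HashMap<>();

    // Adds one hand-assigned id to both lookup directions.
    private static void register(Class<?> type, int id) {
        CLASS_TO_ID.put(type, (byte) id);
        ID_TO_CLASS.put((byte) id, type);
    }

    static {
        // Hypothetical entries; ids must never change once data has been written.
        register(Long.class, 1);
        register(String.class, 2);
        register(HashMap.class, 3);
    }

    // Id to write in front of a value of the given type.
    static byte idFor(Class<?> type) {
        Byte id = CLASS_TO_ID.get(type);
        if (id == null) {
            throw new IllegalArgumentException("unregistered type: " + type.getName());
        }
        return id;
    }

    // Type to instantiate when the given id is read back.
    static Class<?> classFor(byte id) {
        Class<?> type = ID_TO_CLASS.get(id);
        if (type == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        return type;
    }

    public static void main(String[] args) {
        byte id = idFor(Long.class);
        System.out.println(id + " -> " + classFor(id).getName());
    }
}
```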
Thanks for your comments!
> meta data support for CrawlDatum
> --------------------------------
>
> Key: NUTCH-192
> URL: http://issues.apache.org/jira/browse/NUTCH-192
> Project: Nutch
> Type: Improvement
> Versions: 0.8-dev
> Reporter: Stefan Groschupf
> Fix For: 0.8-dev
> Attachments: metadata300106.patch
>
> Supporting meta data in CrawlDatum would help realize a set of new Nutch
> features and would make a lot possible for smaller, specially focused search
> engines.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers