[ http://issues.apache.org/jira/browse/NUTCH-192?page=all ]

Stefan Groschupf updated NUTCH-192:
-----------------------------------

    Attachment: metadata300106.patch

Attached is a first suggested patch for adding metadata support to
CrawlDatum.
In general, I created a MapWritable and added it to CrawlDatum. If no metadata
are added to a CrawlDatum, only one additional int is written to the output
stream. MapWritable works like a HashMap but requires Writables as keys and
values. Besides the serialized key and value, it writes two additional ints
per entry into the stream to identify the classes of the key and value. If we
are willing to change WritableName a bit more, we could reduce that to two
additional bytes for storing the classes (this would limit the number of
possible classes, but I guess we will never have that many Writable types.
:-o). However, I started with a patch that changes as little as possible, and
I'm sure there is room for improvement, so feedback and suggestions are
welcome.
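The serialization layout described above can be sketched in Java roughly as
follows. This is a minimal, self-contained illustration under stated
assumptions, not the code from metadata300106.patch: `SimpleWritable`,
`TextValue`, `MetaDataMap`, and the list-based class registry are hypothetical
names standing in for Hadoop's `Writable`, a concrete Writable type, the
patch's MapWritable, and WritableName. The sketch writes one int for the entry
count (so an empty map costs a single int) and, per entry, an int class id
before the key and another before the value.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for Hadoop's Writable interface (write/readFields). */
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

/** Hypothetical string Writable, used here as both key and value type. */
class TextValue implements SimpleWritable {
  String value = "";
  TextValue() {}
  TextValue(String value) { this.value = value; }
  public void write(DataOutput out) throws IOException { out.writeUTF(value); }
  public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
  @Override public boolean equals(Object o) {
    return o instanceof TextValue && ((TextValue) o).value.equals(value);
  }
  @Override public int hashCode() { return value.hashCode(); }
}

/** Hypothetical metadata map: an int class id precedes each key and each value. */
class MetaDataMap implements SimpleWritable {
  // Hypothetical registry: a class's position in CLASSES is the id written to the stream.
  private static final List<Class<?>> CLASSES = new ArrayList<>();
  private static final List<Supplier<? extends SimpleWritable>> FACTORIES = new ArrayList<>();

  private final Map<SimpleWritable, SimpleWritable> entries = new LinkedHashMap<>();

  static <T extends SimpleWritable> void register(Class<T> cls, Supplier<T> factory) {
    if (!CLASSES.contains(cls)) {
      CLASSES.add(cls);
      FACTORIES.add(factory);
    }
  }

  void put(SimpleWritable key, SimpleWritable value) { entries.put(key, value); }
  SimpleWritable get(SimpleWritable key) { return entries.get(key); }
  int size() { return entries.size(); }

  public void write(DataOutput out) throws IOException {
    out.writeInt(entries.size()); // an empty map costs exactly this one int
    for (Map.Entry<SimpleWritable, SimpleWritable> e : entries.entrySet()) {
      out.writeInt(classId(e.getKey()));   // extra int identifying the key class
      e.getKey().write(out);
      out.writeInt(classId(e.getValue())); // extra int identifying the value class
      e.getValue().write(out);
    }
  }

  public void readFields(DataInput in) throws IOException {
    entries.clear();
    int count = in.readInt();
    for (int i = 0; i < count; i++) {
      SimpleWritable key = newInstance(in.readInt());   // class id, then key data
      key.readFields(in);
      SimpleWritable value = newInstance(in.readInt()); // class id, then value data
      value.readFields(in);
      entries.put(key, value);
    }
  }

  private static int classId(SimpleWritable w) throws IOException {
    int id = CLASSES.indexOf(w.getClass());
    if (id < 0) throw new IOException("unregistered class: " + w.getClass().getName());
    return id;
  }

  private static SimpleWritable newInstance(int id) throws IOException {
    if (id < 0 || id >= FACTORIES.size()) throw new IOException("unknown class id: " + id);
    return FACTORIES.get(id).get();
  }
}
```

After `MetaDataMap.register(TextValue.class, TextValue::new)`, an empty map
serializes to 4 bytes and a map holding one `"a"` -> `"b"` entry to 18 bytes
(4 for the count, 4 + 3 for the key, 4 + 3 for the value). Switching the two
per-entry class-id `writeInt`/`readInt` calls to `writeByte`/`readUnsignedByte`
would cut that overhead from 8 bytes to 2 per entry, at the cost of capping the
registry at 256 classes, which is the trade-off mentioned in the post.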



> meta data support for CrawlDatum
> --------------------------------
>
>          Key: NUTCH-192
>          URL: http://issues.apache.org/jira/browse/NUTCH-192
>      Project: Nutch
>         Type: Improvement
>     Versions: 0.8-dev
>     Reporter: Stefan Groschupf
>      Fix For: 0.8-dev
>  Attachments: metadata300106.patch
>
> Supporting metadata in CrawlDatum would help realize a set of new Nutch
> features and would make a lot possible for smaller, specially focused search
> engines.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira



_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
