ParseData's contentMeta accumulates unnecessary values during parse
-------------------------------------------------------------------
Key: NUTCH-535
URL: https://issues.apache.org/jira/browse/NUTCH-535
Project: Nutch
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Doğacan Güney
Assignee: Doğacan Güney
Fix For: 1.0.0
After NUTCH-506, running parse on a segment makes ParseData's contentMeta
accumulate the metadata of every Content parsed so far. This is because NUTCH-506
changed the Content constructor to create the Metadata object once (before
NUTCH-506, a new Metadata was created on every call to readFields). Hadoop
appears to reuse the same Content instance across records, so each call to
Content.readFields during ParseSegment adds to the existing metadata instead of
replacing it. As a result, one can end up with a *huge* parse_data directory
(something like 10 times larger than the content directory).
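The accumulation can be reproduced in miniature. The sketch below is self-contained Java and deliberately does not use Nutch or Hadoop classes: Record, add, valueCount and sizesWhenReused are hypothetical names, and the multi-valued map only approximates Nutch's Metadata. It serializes three records, then deserializes all of them into a single reused instance, the way a record reader that reuses its value object would. Without a clear() at the start of readFields the metadata grows with every record; with it, each read starts fresh.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReuseDemo {

    /** Hypothetical stand-in for a Writable with multi-valued metadata; NOT Nutch's Content. */
    static class Record {
        // Allocated once in the constructor (the post-NUTCH-506 pattern), not per readFields call.
        final Map<String, List<String>> metadata = new TreeMap<>();
        private final boolean clearOnRead;

        Record(boolean clearOnRead) { this.clearOnRead = clearOnRead; }

        void add(String name, String value) {
            metadata.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }

        int valueCount() {
            int n = 0;
            for (List<String> values : metadata.values()) n += values.size();
            return n;
        }

        void write(DataOutput out) throws IOException {
            out.writeInt(metadata.size());
            for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeInt(e.getValue().size());
                for (String v : e.getValue()) out.writeUTF(v);
            }
        }

        void readFields(DataInput in) throws IOException {
            if (clearOnRead) {
                metadata.clear(); // discard values left over from the previous record
            }
            int names = in.readInt();
            for (int i = 0; i < names; i++) {
                String name = in.readUTF();
                int values = in.readInt();
                for (int j = 0; j < values; j++) add(name, in.readUTF()); // appends, never replaces
            }
        }
    }

    /** Writes `count` records, reads them back into ONE reused instance, returns the value count after each read. */
    public static List<Integer> sizesWhenReused(int count, boolean clearOnRead) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int r = 0; r < count; r++) {
                Record rec = new Record(false);
                rec.add("Content-Type", "text/html");
                rec.add("url", "http://example.com/" + r);
                rec.write(out);
            }
            out.flush();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            Record reused = new Record(clearOnRead); // mimics a value object reused across records
            List<Integer> sizes = new ArrayList<>();
            for (int r = 0; r < count; r++) {
                reused.readFields(in);
                sizes.add(reused.valueCount());
            }
            return sizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + sizesWhenReused(3, false)); // [2, 4, 6]
        System.out.println("with clear:    " + sizesWhenReused(3, true));  // [2, 2, 2]
    }
}
```

Clearing at the start of readFields, or allocating a fresh map there as was done before NUTCH-506, are two ways to make a reused instance safe; this sketch does not say which one the actual fix uses.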
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers