We are working on a cluster output post-processor that writes cluster-specific data. If that is what you need, you can apply the latest patch available at https://issues.apache.org/jira/browse/MAHOUT-843 and use ClusterOutputPostProcessor's distribute method. You won't run out of memory there.

Paritosh

On 18-11-2011 12:09, zou.cl wrote:
Hi guys,

I just noticed an out-of-memory problem in the ClusterDumper class. It 
seems to load all the data (for example, the clusteredPoints) into a Map, 
which costs a huge amount of memory when we have gigabytes of data. I think we 
could use MapReduce to print the results instead of loading everything into memory.
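
To illustrate the memory concern, here is a minimal plain-Java sketch of the streaming alternative: each point is appended to a per-cluster file as soon as it is read, so memory grows with the number of clusters rather than the number of points. This is not Mahout code and not the MAHOUT-843 patch. The class name StreamingClusterWriter, the method writeByCluster, the (clusterId, point) string pairs, and the cluster-N.txt file naming are all assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper, not part of Mahout.
public class StreamingClusterWriter {

    /**
     * Writes each (clusterId, point) pair to a per-cluster file as it is read.
     * Only one open writer per cluster is held in memory, never the points themselves.
     * Returns the number of points written for each cluster.
     */
    public static Map<Integer, Long> writeByCluster(
            Iterator<Map.Entry<Integer, String>> points, Path outDir) throws IOException {
        Files.createDirectories(outDir);
        Map<Integer, BufferedWriter> writers = new HashMap<>();
        Map<Integer, Long> counts = new HashMap<>();
        try {
            while (points.hasNext()) {
                Map.Entry<Integer, String> point = points.next();
                int clusterId = point.getKey();
                BufferedWriter writer = writers.get(clusterId);
                if (writer == null) {
                    // Assumed file naming scheme: one text file per cluster.
                    Path file = outDir.resolve("cluster-" + clusterId + ".txt");
                    writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
                    writers.put(clusterId, writer);
                }
                writer.write(point.getValue());
                writer.newLine();
                counts.merge(clusterId, 1L, Long::sum);
            }
        } finally {
            // Close every writer so buffered output is flushed to disk.
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
        return counts;
    }
}
```

With a very large number of clusters the open-writer map itself becomes the limit, which is one reason a MapReduce job keyed by cluster id may scale better.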

zou.cl via foxmail