We are working on a cluster output post processor that writes
cluster-specific data.
If that is what you want, you can apply the latest patch available at
https://issues.apache.org/jira/browse/MAHOUT-843 and use
ClusterOutputPostProcessor's distribute method. You won't run out of
memory there.
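
To illustrate the general idea of streaming instead of loading everything,
here is a minimal sketch. It is not the code from the MAHOUT-843 patch and
it does not use the Mahout or Hadoop APIs. It assumes a hypothetical plain
text input with one "clusterId<TAB>point" pair per line; the class name
StreamingClusterWriter and the output layout (one file per cluster) are
made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: streams clustered points to
// one file per cluster instead of collecting them in a Map first.
final class StreamingClusterWriter {

  // Assumed input format: "clusterId<TAB>point", one pair per line.
  // Returns the number of points written for each cluster.
  static Map<String, Long> distribute(Path input, Path outDir) throws IOException {
    Files.createDirectories(outDir);
    Map<String, BufferedWriter> writers = new HashMap<>();
    Map<String, Long> counts = new HashMap<>();
    try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
      String line;
      // Only the current line is held in memory, never the whole data set.
      while ((line = reader.readLine()) != null) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
          continue; // skip lines that do not match the assumed format
        }
        String clusterId = line.substring(0, tab);
        String point = line.substring(tab + 1);
        BufferedWriter writer = writers.get(clusterId);
        if (writer == null) {
          // One open writer per cluster; this assumes a modest cluster count.
          writer = Files.newBufferedWriter(
              outDir.resolve(clusterId + ".txt"), StandardCharsets.UTF_8);
          writers.put(clusterId, writer);
        }
        writer.write(point);
        writer.newLine();
        counts.merge(clusterId, 1L, Long::sum);
      }
    } finally {
      for (BufferedWriter writer : writers.values()) {
        writer.close();
      }
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java StreamingClusterWriter.java <input file> <output dir>
    Map<String, Long> counts = distribute(Paths.get(args[0]), Paths.get(args[1]));
    counts.forEach((id, n) -> System.out.println(id + ": " + n + " points"));
  }
}
```

Memory use here grows with the number of clusters (one writer and one
counter each), not with the number of points. On a real cluster the same
pattern would read the clusteredPoints files through Hadoop rather than
java.nio.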
Paritosh
On 18-11-2011 12:09, zou.cl wrote:
Hi guys,
I just noticed an out-of-memory problem in the ClusterDumper class. It
seems to load all the data (for example, the clusteredPoints) into a
Map, which costs a huge amount of memory when we have gigabytes of data.
I think we could use MapReduce to print the results instead of loading
everything into memory.
zou.cl via foxmail