Reducing the dimensionality (drastically; try fewer than 100 dimensions if your application allows it) can be a solution.

Which vector implementation are you using? If the vectors are sparsely populated (i.e. have lots of uninitialized/unused dimensions), you can use RandomAccessSparseVector or SequentialAccessSparseVector, which store only the dimensions you are actually using. This can also decrease memory consumption.
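
For example, here is a minimal sketch of the sparse-vector idea, assuming Mahout's org.apache.mahout.math API (the 1000-dimension cardinality and the sample indices/values below are made up for illustration):

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.SequentialAccessSparseVector;
import org.apache.mahout.math.Vector;

public class SparseVectorExample {
  public static void main(String[] args) {
    // Cardinality 1000 matches the dimensionality mentioned below;
    // only the entries that are explicitly set consume memory.
    Vector v = new RandomAccessSparseVector(1000);
    v.setQuick(3, 1.5);    // set a handful of non-zero dimensions
    v.setQuick(742, 0.25);

    // A sequential-access copy is usually better for iteration-heavy code
    // such as the distance computations done during clustering.
    Vector sv = new SequentialAccessSparseVector(v);
    System.out.println("non-zero entries: " + sv.getNumNondefaultElements());
  }
}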

On 04-11-2011 11:19, gaurav redkar wrote:
Hi,

Yes Paritosh, I think the same. Actually, I am using a test data set
that has 5000 tuples with 1000 dimensions each. The thing is that too
many files are created in the pointsDir folder, and I think the program
tries to open a path to all the files (i.e. read all the files into memory
at once). Is my interpretation correct? Also, how do I go about fixing it?

Thanks



On Fri, Nov 4, 2011 at 11:03 AM, Paritosh Ranjan <[email protected]> wrote:

Reading the points keeps everything in memory, which might be what crashed it:
pointList.add(record.getSecond());

Your dataset is only 40 MB, but the vectors might be too large. How many
dimensions do your vectors have?
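
(For context, a rough sketch of the accumulation pattern being described, assuming Mahout's SequenceFileIterable and Pair helpers; the method name, path handling, and value type here are illustrative, not the exact ClusterDumper source:)

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Writable;
import org.apache.mahout.common.Pair;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class ReadPointsSketch {
  // Illustrative only: every point vector is retained in a single in-memory
  // list, so heap usage grows with the total number of points rather than
  // with the size of any one file.
  public static List<Vector> readPoints(Path pointsDir, Configuration conf) {
    List<Vector> pointList = new ArrayList<Vector>();
    for (Pair<Writable, Writable> record :
         new SequenceFileIterable<Writable, Writable>(pointsDir, conf)) {
      pointList.add(((VectorWritable) record.getSecond()).get());
    }
    return pointList;
  }
}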


On 04-11-2011 10:57, gaurav redkar wrote:

Hello,

I am in a fix with the ClusterDumper utility. The clusterdump utility
crashes when it tries to output the clusters, throwing an out-of-memory
exception: Java heap space.

When I checked the stack trace, it seems the program crashed in the
readPoints() function; I guess it is unable to build the "result" map. Any
idea how I can fix this?

I am working on a dataset of size 40 MB. I tried increasing the heap
space, but with no luck.

Thanks

Gaurav


