Reducing the dimensionality (drastically; try fewer than 100 dimensions if your
use case allows it) can be a solution.
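For example, here is a rough sketch of projecting 1000-dimensional vectors down
to 100 dimensions with a random Gaussian projection, using Mahout's math
classes. The RandomProjection class name and the scaling choice are my own
illustration, not a Mahout utility:

    import java.util.Random;

    import org.apache.mahout.math.DenseMatrix;
    import org.apache.mahout.math.Matrix;
    import org.apache.mahout.math.Vector;

    // Illustration only: project vectors into a much smaller space
    // with a fixed random Gaussian matrix.
    public class RandomProjection {

      private final Matrix projection; // targetDim x originalDim

      public RandomProjection(int originalDim, int targetDim, long seed) {
        Random random = new Random(seed);
        projection = new DenseMatrix(targetDim, originalDim);
        for (int row = 0; row < targetDim; row++) {
          for (int col = 0; col < originalDim; col++) {
            // Scale so projected distances stay roughly comparable.
            projection.set(row, col, random.nextGaussian() / Math.sqrt(targetDim));
          }
        }
      }

      public Vector project(Vector input) {
        // Matrix.times(Vector) returns projection * input, a targetDim-sized vector.
        return projection.times(input);
      }
    }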
Which Vector implementation are you using? If the vectors are sparsely
populated (i.e. have lots of uninitialized/unused dimensions), you can use
RandomAccessSparseVector or SequentialAccessSparseVector, which store only the
dimensions you actually use. This can also reduce memory consumption.
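A minimal sketch of the difference, assuming the org.apache.mahout.math vector
classes (names as of roughly the 0.5/0.6 releases):

    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.RandomAccessSparseVector;
    import org.apache.mahout.math.SequentialAccessSparseVector;
    import org.apache.mahout.math.Vector;

    public class SparseVectorExample {

      public static void main(String[] args) {
        int cardinality = 1000;

        // A dense vector allocates storage for all 1000 dimensions up front.
        Vector dense = new DenseVector(cardinality);

        // A random-access sparse vector stores only the dimensions that are set,
        // so memory grows with the number of non-zero entries, not the cardinality.
        Vector sparse = new RandomAccessSparseVector(cardinality);
        sparse.set(3, 1.5);
        sparse.set(742, 0.25);

        // Once built, a sequential-access copy is more compact and faster to
        // iterate, e.g. for distance computations during clustering.
        Vector sequential = new SequentialAccessSparseVector(sparse);

        System.out.println("dense size: " + dense.size()
            + ", sparse non-zeros: " + sequential.getNumNondefaultElements());
      }
    }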
On 04-11-2011 11:19, gaurav redkar wrote:
Hi,
Yes Paritosh, I think the same. I am using a test data set of 5000 tuples with
1000 dimensions each. The thing is that too many files get created in the
pointsDir folder, and I think the program tries to open a path to all the files
(i.e. read all the files into memory at once). Is my interpretation correct?
Also, how do I go about fixing it?
Thanks
On Fri, Nov 4, 2011 at 11:03 AM, Paritosh Ranjan<[email protected]> wrote:
Reading the points keeps everything in memory, which might have crashed it:

pointList.add(record.getSecond());

Your dataset is 40 MB, but the vectors might be too large. How many dimensions
do your vectors have?
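As a workaround, here is a rough sketch of reading the clustered points one
record at a time instead of collecting them all into a list. It assumes the
points directory holds SequenceFiles of IntWritable / WeightedVectorWritable
pairs (the exact key/value classes and package can vary by Mahout version), and
processPoint is just a placeholder:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.mahout.clustering.WeightedVectorWritable;
    import org.apache.mahout.math.Vector;

    public class StreamingPointReader {

      public static void readPoints(Path pointsDir, Configuration conf) throws IOException {
        FileSystem fs = pointsDir.getFileSystem(conf);
        for (FileStatus status : fs.listStatus(pointsDir)) {
          if (status.getPath().getName().startsWith("_")) {
            continue; // skip _SUCCESS / _logs entries
          }
          SequenceFile.Reader reader = new SequenceFile.Reader(fs, status.getPath(), conf);
          try {
            IntWritable clusterId = new IntWritable();
            WeightedVectorWritable point = new WeightedVectorWritable();
            while (reader.next(clusterId, point)) {
              // Handle each point here instead of keeping them all in memory.
              processPoint(clusterId.get(), point.getVector());
            }
          } finally {
            reader.close();
          }
        }
      }

      private static void processPoint(int clusterId, Vector vector) {
        // Placeholder: write the point to the dump output, update counters, etc.
      }
    }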
On 04-11-2011 10:57, gaurav redkar wrote:
Hello,
I am in a fix with the clusterdump utility. It crashes with an out-of-memory
error (Java heap space) when it tries to output the clusters.

When I checked the stack trace, it seems the program crashed in the
readPoints() function; I guess it is unable to build the "result" map. Any
idea how I can fix this?
I am working on a dataset of about 40 MB. I had tried increasing the heap
space, but with no luck.
Thanks
Gaurav