Actually, I have to run the mean shift algorithm on a large dataset for my project. The ClusterDumper facility works on smaller data sets.
But my project will mostly involve large-scale data (the size will often extend to gigabytes), so I need to modify the ClusterDumper facility to work on such datasets. Also, the vectors are densely populated. I probably need to read each file from pointsDir one at a time while constructing the "result" map. Any pointers on how to do this? What I currently have in mind is the rough sketch at the end of this mail.

Thanks

On Fri, Nov 4, 2011 at 11:27 AM, Paritosh Ranjan <[email protected]> wrote:

> Reducing dimensions (drastically; try fewer than 100 if functionality allows this) can be a solution.
>
> Which vector implementation are you using? If the vectors are sparsely populated (have lots of uninitialized/unused dimensions), you can use RandomAccessSparseVector or SequentialAccessSparseVector, which will populate only the dimensions you are actually using. This can also decrease memory consumption.
>
> On 04-11-2011 11:19, gaurav redkar wrote:
>
>> Hi,
>>
>> Yes Paritosh, I think the same. Actually I am using a test data set that has 5000 tuples with 1000 dimensions each. The thing is, there are too many files created in the pointsDir folder, and I think the program tries to open a path to all the files (i.e. read all the files into memory at once). Is my interpretation correct? Also, how do I go about fixing it?
>>
>> Thanks
>>
>> On Fri, Nov 4, 2011 at 11:03 AM, Paritosh Ranjan <[email protected]> wrote:
>>
>>> Reading the points is keeping everything in memory, which might have crashed it:
>>>
>>> pointList.add(record.getSecond());
>>>
>>> Your dataset size is 40 MB, but the vectors might be too large. How many dimensions do your vectors have?
>>>
>>> On 04-11-2011 10:57, gaurav redkar wrote:
>>>
>>>> Hello,
>>>>
>>>> I am in a fix with the ClusterDumper utility. The clusterdump utility crashes when it tries to output the clusters, with an out-of-memory exception: Java heap space.
>>>>
>>>> When I checked the error stack, it seems the program crashed in the readPoints() function. I guess it is unable to build the "result" map. Any idea how I can fix this?
>>>>
>>>> I am working on a dataset of size 40 MB. I tried increasing the heap space, but with no luck.
>>>>
>>>> Thanks
>>>>
>>>> Gaurav
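The sketch mentioned above, in case it clarifies the question: it assumes the files under pointsDir are Hadoop SequenceFiles keyed by the cluster id (IntWritable) and treats the value as a generic Writable, since the exact value class depends on the Mahout version. The idea is to open one file at a time, stream its records, and close it before moving to the next, instead of holding the whole "result" map in memory. Untested:

// Untested sketch: stream the clustered points one SequenceFile at a time
// instead of building the full in-memory "result" map.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamingPointsReader {

  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path pointsDir = new Path(args[0]);            // the pointsDir / clusteredPoints directory
    FileSystem fs = pointsDir.getFileSystem(conf);

    for (FileStatus status : fs.listStatus(pointsDir)) {
      String name = status.getPath().getName();
      // skip subdirectories and non-data files such as _SUCCESS or _logs
      if (status.isDir() || name.startsWith("_") || name.startsWith(".")) {
        continue;
      }
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, status.getPath(), conf);
      try {
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        while (reader.next(key, value)) {
          // process one point at a time here (e.g. append it to a per-cluster
          // output file) instead of adding it to a big in-memory map
          System.out.println(key + "\t" + value);
        }
      } finally {
        reader.close();
      }
    }
  }
}

Inside ClusterDumper itself, I suppose readPoints() would have to stop returning the fully populated map and instead emit each cluster's points as it reads them, but the iteration pattern would be the same as above.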
