Hello everyone,

I'm having problems using Streaming K Means.

I'm trying to use StreamingKMeansDriver.runMapReduce, but it throws java.lang.RuntimeException: Failed to instantiate distanceMeasure, and I can't figure out what the problem is. This is the code:

        Configuration configuration = new Configuration();
        configuration.set("--estimatedNumMapClusters", "18");
        configuration.set("-k", "6");
        configuration.set("--distanceMeasure", "org.apache.mahout.common.distance.CosineDistanceMeasure");
        Path input = new Path("/home/marko/mahout/synthetic_control.seq");
        Path output = new Path("/home/marko/mahout/pera/");
        StreamingKMeansDriver.runMapReduce(configuration, input, output);

Can anyone tell me what the problem is, and maybe give an example of how to run StreamingKMeans from code?

I have successfully used Streaming K Means by calling

        String[] args1 = new String[] {
            "-i", "/home/marko/mahout/synthetic_control.seq",
            "-o", "/home/marko/mahout/pera/",
            "--estimatedNumMapClusters", "18",
            "-k", "6"
        };
        StreamingKMeansDriver.main(args1);

This is the output I get:

key = 0, weight = 1.00, vector = {0:27.5606,31:38.8558,34:43.7514,4:28.7735,32:36.093,5:33.618,8:30.85,30:38.0233,24:42.096,29:39.5394,51:48.046,36:43.0201,2:36.1555,23:39.4361,60:49.8759,21:33.8983,11:38.77,7:33.3357,20:40.0553,14:30.62,28:37.2912,55:47.1712,54:51.0787,37:41.79,49:53.8093,41:40.7436,1:31.8836,48:52.1046,26:37.5231,44:51.0195,56:49.0151,42:50.7942,27:35.0382,50:46.8279,40:48.9718,46:47.4958,43:49.1814,19:31.1634,59:47.1264,17:29.8724,18:31.2151,22:31.5504,57:56.9343,45:47.7454,47:52.8485,58:53.8417,35:39.4066,3:35.1986,53:47.6176} key.toString()=0

key = 1, weight = 2.00, vector = {0:28.65235,31:49.86865,34:45.503,32:43.94945,5:32.89505,8:33.266400000000004,6:28.8538,30:49.078900000000004,24:45.3078,29:44.7095,17:12.7058,51:49.219849999999994,36:44.011399999999995,12:33.53985,23:35.05445,58:51.2766,21:31.50855,11:35.09185,60:52.6501,15:12.74985,7:30.71435,28:47.908500000000004,46:51.8624,33:44.8596,55:50.411199999999994,45:24.0696,54:49.2177,37:46.48245,49:53.45715,41:42.6875,1:13.1755,26:47.34955,44:49.88765,56:48.9574,27:47.8778,50:52.697900000000004,16:34.44915,40:45.8294,48:49.9206,39:43.791,9:35.781499999999994,52:44.336349999999996,13:31.0088,43:49.61055,19:12.0178,59:54.974900000000005,2:17.4499,20:34.2177,18:31.79565,22:36.2613,35:51.31335,57:24.2226,25:22.37745,38:49.172200000000004,47:47.85,14:32.7834,3:30.582500000000003,53:48.9705} key.toString()=1

key = 2, weight = 152.00, vector = {0:30.308929166666662,31:37.89175694444444,34:40.313206944444445,4:28.234209722222218,52:43.319116666666666,5:28.583401388888888,8:30.41192361111111,6:27.122180555555556,30:35.36717083333335,24:32.6571611111111,60:47.45788194444444,51:44.96369305555557,36:38.223752777777776,12:30.085509722222223,23:33.20938750000001,58:44.61068194444445,21:32.97969305555554,11:30.22750277777778,10:32.060152777777766,15:29.185484722222228,7:29.902281944444447,20:31.756131944444444,14:29.41984027777778,28:35.05381388888889,33:37.02411527777778,55:44.240726388888895,45:41.017775,54:42.935356944444436,1:26.75311805555556,49:43.76183194444445,41:40.06038888888889,9:30.28303472222222,48:40.979955555555556,26:36.07801249999999,44:42.008533333333325,29:35.69802222222223,37:39.81415833333333,56:47.06050277777777,42:41.27901527777778,27:32.251562500000006,50:45.150658333333325,16:31.263650000000002,40:40.04733611111111,46:40.36827777777778,39:39.23484583333334,43:42.384634722222216,19:31.756208333333333,32:37.107061111111115,2:27.071493055555557,17:32.05645833333333,18:31.90512222222223,47:41.31990416666666,35:38.52181805555554,57:44.64460555555556,25:35.03711805555556,38:38.438113888888886,22:32.69226944444444,59:46.562286111111106,13:32.09874861111111,3:27.104297222222222,53:43.85713611111111} key.toString()=2

key = 3, weight = 2.00, vector = {0:30.185449999999996,31:41.9578,34:46.91155,4:15.68445,32:45.0681,5:35.30145,8:32.85745,6:14.111,30:44.63145,24:40.68435,60:56.05995,17:40.0015,51:25.0513,36:46.43435,12:32.0613,23:37.361450000000005,58:52.3468,21:39.6964,11:33.4141,10:34.94435,15:36.535849999999996,7:18.4441,28:40.1027,46:50.7803,33:40.2361,55:56.43835,54:49.87345,25:42.783649999999994,3:31.87645,49:48.6065,41:49.6939,1:31.4409,26:19.1482,44:47.217349999999996,29:39.50865,37:41.2697,56:55.546350000000004,42:46.4043,27:42.9353,50:47.81725,16:33.53155,40:46.72375,48:44.8384,39:44.48655,9:16.81675,52:46.7161,13:36.255250000000004,43:48.7649,19:40.079,59:55.7936,2:31.533549999999998,20:40.4979,18:40.2089,22:38.1516,35:44.39035,57:53.22595,45:49.1731,38:24.55595,47:48.40375,14:33.2802,53:53.8623} key.toString()=3

key = 4, weight = 258.00, vector = {0:29.9348880794702,31:28.56757814569536,34:29.028467549668882,4:29.866621192052982,32:27.076078807947024,5:29.772399337748347,8:24.439425165562916,6:27.648106622516554,30:26.647501324503317,24:27.587005960264904,60:27.925100066225173,51:28.524571125827812,36:27.22875576158941,12:25.618866887417223,23:26.076735099337753,58:28.460376225165568,21:27.01749668874172,11:25.109786754966883,10:23.211491390728472,15:27.11530132450331,7:26.046012582781458,20:28.690811920529804,14:26.972223841059602,28:26.995324503311256,33:28.264741059602645,55:27.806986092715228,45:26.724292052980132,54:27.049585695364232,37:27.759665430463574,49:27.010873509933777,41:28.14905894039735,1:30.426772185430465,48:26.921705298013247,26:24.756764900662258,44:27.09524900662252,29:26.225837086092714,57:27.513937019867555,56:26.426737152317887,42:27.73037483443709,27:26.162309271523174,50:29.218896026490064,16:26.66407682119205,40:27.55462781456954,46:26.596096026490063,39:27.3768417218543,9:24.289268874172187,43:26.64814172185431,19:28.645127152317883,59:26.575245496688737,2:29.291917218543038,17:28.56763907284768,18:28.538417880794707,22:26.21464701986755,35:28.40589933774834,52:29.422719271523185,25:26.593215894039734,38:26.70580264900662,47:28.733517483443713,13:24.650416556291386,3:31.447106622516564,53:26.132733708609273} key.toString()=4

key = 5, weight = 185.00, vector = {0:30.340515286624203,31:19.232557707006375,34:18.889934713375787,4:27.13570700636943,32:19.52735280254777,5:26.769334394904458,8:25.991707006369428,6:24.50922866242038,30:21.127041528662417,24:22.67118808917197,60:11.410238299363058,51:13.493395675159237,36:18.50240305732484,12:25.1683127388535,23:22.72065859872611,58:11.735817652866244,21:23.42202547770701,11:24.488473248407644,10:25.515975796178346,15:23.7855796178344,7:25.499776433121017,1:27.108324840764322,28:20.16715490445861,46:13.467933439490444,33:18.75225267515923,55:12.225422923566876,45:14.32592140127388,54:12.782745123566876,37:16.604375668789814,49:13.57119076433121,41:14.572259808917199,9:24.29511847133758,26:21.875732484076433,44:13.453237133757959,29:20.456985796178348,57:12.012338025477707,56:13.078008216560509,42:14.120310891719742,27:20.739581337579615,50:13.693988471337576,16:23.62958280254777,40:15.625883630573249,48:13.919134140127385,39:15.97887770700637,13:24.82823057324841,43:14.302185031847129,19:25.15577770700637,59:11.290807967515926,2:27.252857961783434,17:25.255168152866244,18:23.945151592356687,22:23.400647070063695,20:24.469106369426754,35:18.195827388535033,52:12.425490063694266,25:21.124606815286626,38:16.10359210191083,47:14.002531783439489,14:25.53967579617834,3:25.328435668789812,53:12.843222356687896} key.toString()=5


I have used the driver for starting K Means, but somehow it seems that its run method is more intuitively parametrized... Is there a reason to use the run method instead of main?

Also, when I run StreamingKMeans through the main method, the clusters I get are definitely not what I expected. My dataset should give an almost equal distribution of points across clusters, but out of 6 clusters, 2 or 3 have only 1 or 2 points assigned to them (their weight is 1.00 or 2.00; correct me if the weight is not the number of points assigned to the cluster). What could be the reason?
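As a sanity check on my reading of the weights: the six weights printed above add up to 600, which I believe matches the number of rows in synthetic_control (that row count is an assumption on my part, I have not counted the records in the .seq file). A small sketch of that check in plain Java, with no Mahout calls involved and a class name I made up:

```java
public class WeightCheck {
    // Weights copied by hand from the six centroids printed above.
    static final double[] WEIGHTS = {1.0, 2.0, 152.0, 2.0, 258.0, 185.0};

    // Sums the centroid weights; if weight means "points in this cluster",
    // the total should equal the number of input points.
    static double totalWeight(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("total weight = " + totalWeight(WEIGHTS));
    }
}
```

If the total matches the input size, then the weights at least behave like point counts, and the odd part is only how unevenly they are spread.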

Also, can I somehow find out which point belongs to which cluster, and the distance between each point and the centroid of the cluster it has been assigned to?
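To make that last question concrete, this is roughly the post-processing I have in mind: a minimal sketch in plain Java that assigns each point to its nearest centroid and reports the distance. It does not use any Mahout API. The class name, the toy vectors, and the choice of cosine distance computed as 1 - cos(angle) are my own assumptions for illustration, and I don't know whether Mahout already offers this out of the box.

```java
import java.util.Arrays;

public class NearestCentroid {
    // Cosine distance, assumed here to be 1 - cos(angle between a and b).
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 1.0; // my own convention for zero vectors
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    // Returns the index of the centroid closest to the given point.
    static int nearest(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = cosineDistance(point, centroids[c]);
            if (d < bestDistance) {
                bestDistance = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy data standing in for the real centroids and input vectors.
        double[][] centroids = {{1.0, 0.0}, {0.0, 1.0}};
        double[][] points = {{2.0, 0.1}, {0.2, 3.0}};
        for (double[] p : points) {
            int c = nearest(p, centroids);
            System.out.printf("point %s -> cluster %d, distance %.4f%n",
                    Arrays.toString(p), c, cosineDistance(p, centroids[c]));
        }
    }
}
```

In the real case the centroids would come from the StreamingKMeans output and the points from the input sequence file, which is the part I don't know how to wire up.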

Thanks
