Thanks, Satya. I tried setting initSteps to 25 and the maximum number of iterations to 500 in both R and Spark. The results below were produced with those settings.
Also, within Spark and within R the centers remain almost the same across runs, but the Spark centers differ from the R centers.

Thanks & Regards
Saroj

From: Satya Varaprasad Allumallu <alluma...@gmail.com>
To: Saroj C <saro...@tcs.com>
Cc: User <user@spark.apache.org>
Date: 01/02/2017 08:53 PM
Subject: Re: Difference in R and Spark Output

Can you run the Spark KMeans algorithm multiple times and see if the centers remain stable? I am guessing it is related to the random initialization of centers.

On Mon, Jan 2, 2017 at 1:34 AM, Saroj C <saro...@tcs.com> wrote:

Dear Felix,
Thanks. Please find the differences below.

Cluster   Spark - Size   R - Size
0         69             114
1         79             141
2         77             93
3         90             44
4         130            53

Spark - Centers
0.807554406 0.123759 -0.58642 -0.17803 0.624278 -0.06752 0.033517 -0.01504 -0.02794 0.016699 0.20841 -0.00149 -0.05598 0.039746 0.030756 -0.19788 -0.07906 -0.14881 0.0056 0.01479 0.066883 0.002491
-0.428583581 -0.81975 0.347356 -0.18664 0.047582 0.058692 -0.0721 -0.13873 -0.08666 0.085334 0.054398 -0.0228 0.008369 0.073103 0.022246 -0.15439 -0.06016 -0.15073 -0.03734 0.004299 0.089258 -0.00694
0.692744675 0.148123 0.087253 0.851781 -0.2179 0.003407 -0.12357 -0.01795 0.016427 0.088004 0.021502 -0.04616 -0.00847 0.023397 0.057656 -0.12036 -0.03947 -0.13338 -0.02975 0.012217 0.090547 -0.00232
-0.677692276 0.581091 0.446125 -0.13087 0.037225 0.018936 0.055286 0.01146 -0.08648 0.053719 0.072753 -0.00873 -0.04448 0.042067 0.089221 -0.1977 -0.07368 -0.14674 -0.00641 0.020815 0.058425 0.016745
1.03518389 0.228964 0.539982 -0.3581 -0.13488 -0.00525 -0.1267 -0.04439 -0.01923 0.111272 -0.05181 -0.05508 -0.04143 0.046479 0.059224 -0.16148 -0.07541 -0.12046 -0.03535 0.003049 0.070862 0.010083

R - Centers
0.7710882 0.86271 0.249609 0.074961 0.251188 -0.05293 -0.11106 -0.08063 0.01516 0.054043 0.056937 -0.0287 -0.03291 0.056607 0.045214 -0.15237 -0.05442 -0.14038 -0.02326 0.013882 0.078523 -0.0087
-0.644077 0.022256 0.368266 -0.06912 0.123979 0.009181 -0.04506 -0.04179 -0.0255 0.041568 0.04081 -0.02936 -0.04849 0.049712 0.062894 -0.16736 -0.06679 -0.12705 -0.007 0.018079 0.062337 0.00349
0.9772678 -0.57499 0.523792 -0.27319 0.163677 0.053579 -0.07616 0.074556 0.00662 0.087303 0.088835 -0.01923 -0.04938 0.07299 0.059872 -0.19137 -0.04737 -0.1536 0.002926 0.049441 0.079147 0.02771
0.5172924 0.167666 -0.16523 -0.82951 -0.77577 -0.00981 0.018531 -0.09629 -0.1654 0.273644 -0.05433 -0.03593 0.115834 0.021465 -0.00981 -0.15112 -0.16178 -0.04783 -0.19962 -0.12418 0.07286 0.03266
0.717927 -0.34229 -0.33544 0.817617 -0.21383 0.02735 0.01675 -0.10814 -0.1747 0.033743 0.038308 -0.0495 -0.05961 -0.01977 0.092247 -0.16017 -0.04787 -0.20766 0.040038 0.024614 0.090587 -0.0236

Please let me know if any additional information would help to explain these anomalies.

Thanks & Regards
Saroj

From: Felix Cheung <felixcheun...@hotmail.com>
To: User <user@spark.apache.org>, Saroj C <saro...@tcs.com>
Date: 12/31/2016 10:36 AM
Subject: Re: Difference in R and Spark Output

Could you elaborate more on the huge difference you are seeing?

From: Saroj C <saro...@tcs.com>
Sent: Friday, December 30, 2016 5:12:04 AM
To: User
Subject: Difference in R and Spark Output

Dear All,
For the attached input file, there is a huge difference between the clusters produced by R and by Spark (ML). Any idea what could be causing the difference?
Note that we wanted to create five (5) clusters.
Please find the snippets for Spark and R below.

Spark
    // Load the data file
    // Create the K-means cluster
    KMeans kmeans = new KMeans().setK(5).setMaxIter(500)
        .setFeaturesCol("features").setPredictionCol("prediction");

In R
    # Load the data file into df
    # Create the K-means cluster
    model <- kmeans(df, 5)

Thanks & Regards
Saroj
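
When comparing the centers from the two tools, keep in mind that K-means cluster IDs are arbitrary: cluster 0 in Spark need not correspond to cluster 0 in R, so the rows have to be paired up before they are compared. Below is a minimal sketch in plain Java of one way to do that, using a greedy nearest-center pairing. The class name CenterMatcher, its methods, and the toy 2-D centers are made up for illustration; they are not part of Spark or R, and the numbers are not the data from this thread.

```java
import java.util.Arrays;

// Hypothetical helper (not a Spark or R API): pairs up the centers of two
// K-means runs so they can be compared row by row.
final class CenterMatcher {

    // Squared Euclidean distance between two centers of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Greedy matching: repeatedly pair the closest still-unmatched centers.
    // Returns match[], where match[i] is the index in b paired with a[i].
    // Assumes both runs have the same number of centers.
    static int[] matchCenters(double[][] a, double[][] b) {
        int k = a.length;
        int[] match = new int[k];
        Arrays.fill(match, -1);
        boolean[] usedB = new boolean[k];
        for (int step = 0; step < k; step++) {
            int bestI = -1;
            int bestJ = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < k; i++) {
                if (match[i] != -1) continue;      // a[i] already paired
                for (int j = 0; j < k; j++) {
                    if (usedB[j]) continue;        // b[j] already paired
                    double d = squaredDistance(a[i], b[j]);
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            match[bestI] = bestJ;
            usedB[bestJ] = true;
        }
        return match;
    }

    public static void main(String[] args) {
        // Toy centers, invented for illustration only.
        double[][] runA = {{0.0, 0.0}, {5.0, 5.0}, {10.0, 0.0}};
        double[][] runB = {{9.8, 0.1}, {0.2, -0.1}, {5.1, 4.9}};
        System.out.println(Arrays.toString(matchCenters(runA, runB))); // prints [1, 2, 0]
    }
}
```

If the matched pairs are still far apart, the two runs have probably converged to genuinely different solutions rather than the same solution with shuffled labels. Greedy pairing is only a heuristic; for an optimal pairing an assignment algorithm such as the Hungarian method could be swapped in.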