ok i will share a simple example soon. meantime you will be able to see this behavior using example here,
https://github.com/apache/spark/blob/branch-1.2/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala slightly modify it to include model.latestModel.clusterCenters.foreach(println) (after model.trainOn) add new files to trainingDir periodically I have 3 dimensions per data-point - they look like these, [1, 1, 385.2241452777778] [3, 1, 384.7529463888889] [4,1, 3083.2778025] [2, 4, 6226.402321388889] [1, 2, 785.8426655555555] [5, 1, 6706.054241388889] ........ and monitor. please let know if I missed something Krishna On Fri, Feb 19, 2016 at 10:59 AM, Bryan Cutler <[email protected]> wrote: > Can you share more of your code to reproduce this issue? The model should > be updated with each batch, but can't tell what is happening from what you > posted so far. > > On Fri, Feb 19, 2016 at 10:40 AM, krishna ramachandran <[email protected]> > wrote: > >> Hi Bryan >> Agreed. It is a single statement to print the centers once for *every* >> streaming batch (4 secs) - remember this is in streaming mode and the >> receiver has fresh data every batch. That is, as the model is trained >> continuously so I expect the centroids to change with incoming streams (at >> least until convergence) >> >> But am seeing same centers always for the entire duration - ran the app >> for several hours with a custom receiver. >> >> Yes I am using the latestModel to predict using "labeled" test data. But >> also like to know where my centers are >> >> regards >> Krishna >> >> >> >> On Fri, Feb 19, 2016 at 10:18 AM, Bryan Cutler <[email protected]> wrote: >> >>> Could you elaborate where the issue is? You say calling >>> model.latestModel.clusterCenters.foreach(println) doesn't show an updated >>> model, but that is just a single statement to print the centers once.. >>> >>> Also, is there any reason you don't predict on the test data like this? >>> >>> model.predictOnValues(testData.map(lp => (lp.label, >>> lp.features))).print() >>> >>> >>> >>> On Thu, Feb 18, 2016 at 5:59 PM, ramach1776 <[email protected]> wrote: >>> >>>> I have streaming application wherein I train the model using a receiver >>>> input >>>> stream in 4 sec batches >>>> >>>> val stream = ssc.receiverStream(receiver) //receiver gets new data every >>>> batch >>>> model.trainOn(stream.map(Vectors.parse)) >>>> If I use >>>> model.latestModel.clusterCenters.foreach(println) >>>> >>>> the value of cluster centers remain unchanged from the very initial >>>> value it >>>> got during first iteration (when the streaming app started) >>>> >>>> when I use the model to predict cluster assignment with a labeled input >>>> the >>>> assignments change over time as expected >>>> >>>> testData.transform {rdd => >>>> rdd.map(lp => (lp.label, >>>> model.latestModel().predict(lp.features))) >>>> }.print() >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://apache-spark-user-list.1001560.n3.nabble.com/StreamingKMeans-does-not-update-cluster-centroid-locations-tp26275.html >>>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: [email protected] >>>> For additional commands, e-mail: [email protected] >>>> >>>> >>> >> >
