OK, I will share a simple example soon. In the meantime, you can see this
behavior using the example here:

https://github.com/apache/spark/blob/branch-1.2/examples/src/main/scala/org/apache/spark/examples/mllib/StreamingKMeansExample.scala

Slightly modify it to include:

model.latestModel.clusterCenters.foreach(println)

(after model.trainOn)

Then add new files to trainingDir periodically

I have 3 dimensions per data point; the points look like this:

[1, 1, 385.2241452777778]

[3, 1, 384.7529463888889]

[4, 1, 3083.2778025]

[2, 4, 6226.402321388889]

[1, 2, 785.8426655555555]

[5, 1, 6706.054241388889]

........
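In case it helps reproduce, here is a minimal, Spark-free sketch of a helper that periodically drops batches of such points into trainingDir as new files for the streaming job to pick up. The `TrainingFileWriter` object, the `batch-N.txt` naming, and the value ranges are all hypothetical stand-ins chosen to resemble the samples above.

```scala
import java.io.File
import java.nio.file.{Files, Paths}
import scala.util.Random

object TrainingFileWriter {
  // Writes one file of numPoints 3-dimensional points into dir,
  // each line in the "[x, y, z]" format shown above.
  def writeBatch(dir: String, batchId: Int, numPoints: Int): File = {
    val rng = new Random(batchId) // seeded so each batch is reproducible
    val lines = (1 to numPoints).map { _ =>
      val x = rng.nextInt(5) + 1          // small integer, like the samples
      val y = rng.nextInt(5) + 1
      val z = rng.nextDouble() * 7000.0   // large double, like the samples
      s"[$x, $y, $z]"
    }
    val file = new File(dir, s"batch-$batchId.txt")
    Files.write(Paths.get(file.getPath),
      lines.mkString("\n").getBytes("UTF-8"))
    file
  }
}
```

Calling `writeBatch` from a loop with a short sleep (a few seconds, roughly the streaming batch interval) while the app watches the same directory should feed the model fresh data every batch.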

and monitor the output. Please let me know if I missed something.

Krishna





On Fri, Feb 19, 2016 at 10:59 AM, Bryan Cutler <[email protected]> wrote:

> Can you share more of your code to reproduce this issue?  The model should
> be updated with each batch, but I can't tell what is happening from what
> you posted so far.
>
> On Fri, Feb 19, 2016 at 10:40 AM, krishna ramachandran <[email protected]>
> wrote:
>
>> Hi Bryan
>> Agreed. It is a single statement to print the centers once for *every*
>> streaming batch (4 secs). Remember, this is in streaming mode and the
>> receiver has fresh data every batch. That is, the model is trained
>> continuously, so I expect the centroids to change with incoming streams
>> (at least until convergence).
>>
>> But I am seeing the same centers for the entire duration; I ran the app
>> for several hours with a custom receiver.
>>
>> Yes, I am using latestModel to predict on "labeled" test data. But I
>> would also like to know where my centers are.
>>
>> regards
>> Krishna
>>
>>
>>
>> On Fri, Feb 19, 2016 at 10:18 AM, Bryan Cutler <[email protected]> wrote:
>>
>>> Could you elaborate where the issue is?  You say calling
>>> model.latestModel.clusterCenters.foreach(println) doesn't show an updated
>>> model, but that is just a single statement to print the centers once.
>>>
>>> Also, is there any reason you don't predict on the test data like this?
>>>
>>> model.predictOnValues(testData.map(lp => (lp.label,
>>> lp.features))).print()
>>>
>>>
>>>
>>> On Thu, Feb 18, 2016 at 5:59 PM, ramach1776 <[email protected]> wrote:
>>>
>>>> I have a streaming application wherein I train the model on a receiver
>>>> input stream in 4-second batches:
>>>>
>>>> // the receiver gets new data every batch
>>>> val stream = ssc.receiverStream(receiver)
>>>> model.trainOn(stream.map(Vectors.parse))
>>>>
>>>> If I use
>>>>
>>>> model.latestModel.clusterCenters.foreach(println)
>>>>
>>>> the cluster centers remain unchanged from the initial values they got
>>>> during the first iteration (when the streaming app started).
>>>>
>>>> When I use the model to predict cluster assignments with labeled input,
>>>> the assignments change over time as expected:
>>>>
>>>>   testData.transform { rdd =>
>>>>     rdd.map(lp => (lp.label, model.latestModel().predict(lp.features)))
>>>>   }.print()
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/StreamingKMeans-does-not-update-cluster-centroid-locations-tp26275.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>>
>>
>
