Hi,
Thank you for your reply. My aim is not to use the global clustering step,
but rather to use BIRCH for online clustering (of a possibly infinite
stream). I have also been trying to set the BIRCH threshold automatically.
To do so, I use the Gap Statistic (which I developed on top of Apache
Spark) on a certain 'window' of the data stream, and I am able to produce
the BIRCH threshold with high accuracy (based on the tests I have done so
far). Since BIRCH is highly dependent on the order of the data, and because
of the way I set the threshold, there is a certain possibility that some
clusters have to be merged (while the possibility of having to split
clusters is small). To keep it "online", I want to merge those clusters in
an "intermediate" step if there is a need. So basically I want to do the
merging, if needed, in "partial_fit", before I proceed with the next batch
and a possibly modified threshold.
That is why I can't use the global clustering step with a predefined number
of clusters or another clustering model. I hope this makes sense now.
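A minimal sketch of the online setup described above, using scikit-learn's public `Birch` API: each window of the stream is passed to `partial_fit`, and `n_clusters=None` skips the global clustering step so the leaf subclusters are returned as they are. The `stream_windows` generator is a hypothetical stand-in for the real stream, and the threshold value is an arbitrary assumption. The intermediate merge step is only marked by a comment, since as far as I know scikit-learn has no public hook for it.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])


def stream_windows(n_windows=5, window_size=200):
    """Hypothetical stand-in for the data stream: yields fixed-size windows."""
    for _ in range(n_windows):
        idx = rng.integers(0, len(centers), size=window_size)
        yield centers[idx] + rng.normal(scale=0.3, size=(window_size, 2))


# n_clusters=None skips the global clustering step, so the leaf
# subclusters themselves are the clusters reported after each call.
model = Birch(threshold=0.5, n_clusters=None)

for window in stream_windows():
    model.partial_fit(window)
    # A (hypothetical) intermediate merge step would go here,
    # before the next window is processed.
    print("subclusters so far:", len(model.subcluster_centers_))
```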
Thanks again.
Dzeno
On Sun, Feb 7, 2016 at 9:58 PM, Joel Nothman <joel.noth...@gmail.com> wrote:
> It's not clear *why* you're doing this. The model will automatically
> recluster the subclusters after identifying them, as long as you pass
> either a number of clusters or a clustering model as the n_clusters
> parameter. Can you fit this post-processing into that "final clustering"
> framework?
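For comparison, this is roughly what the "final clustering" framework looks like in scikit-learn: a clustering estimator passed as `n_clusters` is fitted on the subcluster centroids, and each sample then inherits the label of its nearest subcluster. The data and parameter values below are illustrative assumptions.

```python
from sklearn.cluster import AgglomerativeClustering, Birch
from sklearn.datasets import make_blobs

X, _ = make_blobs(
    n_samples=300,
    centers=[[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]],
    cluster_std=0.5,
    random_state=0,
)

# Passing a clustering model (instead of an int) as n_clusters makes Birch
# run that model on the subcluster centroids as its final clustering step.
model = Birch(threshold=0.5, n_clusters=AgglomerativeClustering(n_clusters=3))
labels = model.fit_predict(X)
```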
>
> On 8 February 2016 at 07:12, Dženan Softić <dzen...@gmail.com> wrote:
>
>> Hi,
>>
>> I am doing some experiments with BIRCH. When BIRCH finishes, I would like
>> to merge subclusters based on some criteria. I am doing this by calling
>> the "merge_subcluster" method on the subcluster that I want to merge with,
>> passing it the subcluster object of the second cluster:
>>
>> cluster1.merge_subcluster(cluster2, self.threshold)
>>
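The counts N, LS and SS are additive, which is what makes such a merge cheap. The sketch below reproduces that arithmetic in plain NumPy with a hypothetical `CF` class and `try_merge` helper (not scikit-learn API). As far as I can tell from the scikit-learn source, `merge_subcluster` likewise performs the merge only when the radius of the combined subcluster, sqrt(SS/N - ||LS/N||^2), is at most `threshold`, returning `True` in that case and `False` (leaving the subcluster unchanged) otherwise, so its return value is worth checking before `cluster2` is removed.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CF:
    """Hypothetical clustering feature of a BIRCH subcluster."""
    n: int          # N: number of samples
    ls: np.ndarray  # LS: linear sum of the samples
    ss: float       # SS: sum of squared norms of the samples

    @classmethod
    def from_points(cls, X):
        X = np.asarray(X, dtype=float)
        return cls(len(X), X.sum(axis=0), float((X ** 2).sum()))

    @property
    def centroid(self):
        return self.ls / self.n

    @property
    def radius(self):
        # sqrt(SS/N - ||LS/N||^2), clipped at 0 against rounding error
        c = self.centroid
        return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))


def try_merge(a, b, threshold):
    """Return the merged CF if its radius is within threshold, else None."""
    merged = CF(a.n + b.n, a.ls + b.ls, a.ss + b.ss)
    return merged if merged.radius <= threshold else None


a = CF.from_points([[0.0, 0.0], [0.2, 0.0]])
b = CF.from_points([[0.4, 0.0]])
m = try_merge(a, b, threshold=0.5)  # merged radius is about 0.163
far = try_merge(a, CF.from_points([[5.0, 0.0]]), threshold=0.5)  # None
```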
>> It seems to work, since it correctly updates N, LS and SS (n_samples_,
>> linear_sum_, squared_sum_). What is left is to remove the merged
>> subcluster (cluster2) from the subclusters list and to update the
>> centroids:
>>
>> import numpy as np
>>
>> # index of cluster1, whose centroid must be updated
>> ind = leaf.subclusters_.index(cluster1)
>> # index of cluster2, which must be removed because it has been merged
>> ind_remove = leaf.subclusters_.index(cluster2)
>> leaf.init_centroids_[ind] = cluster1.centroid_  # update the centroid
>> leaf.init_sq_norm_[ind] = cluster1.sq_norm_
>> # remove the centroid of cluster2
>> leaf.centroids_ = np.delete(leaf.centroids_, ind_remove, 0)
>> # remove the centroid from the root
>> self.root_.init_centroids_ = np.delete(self.root_.init_centroids_,
>>                                        ind_remove, 0)
>> leaf.subclusters_.remove(cluster2)  # remove the subcluster itself
>>
>> I am not sure I am doing it the right way. Any suggestion/comment would be
>> very much appreciated.
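On the removal step: as far as I can tell from the scikit-learn source, a node keeps `centroids_` and `squared_norm_` as views into preallocated `init_centroids_` / `init_sq_norm_` buffers, re-sliced to the number of subclusters whenever one is appended. `np.delete` returns a new array, so assigning its result to `leaf.centroids_` breaks that link. Also, `ind_remove` is a position in the leaf, so deleting that row from `self.root_.init_centroids_` edits a different node's buffer (unless the root is the leaf itself). The sketch below uses hypothetical `Sub` and `Node` stand-ins that assume this private layout (it is not public API and may differ between versions): it shifts the later rows up in place and re-slices the views. If both subclusters sit in the same leaf, the parent's summary already contains both, so its N, LS and SS should not need changes. The estimator's `subcluster_centers_`, which is built from the leaves at the end of `fit`/`partial_fit`, would still be stale after such a manual edit.

```python
import numpy as np


class Sub:
    """Hypothetical minimal subcluster: only the fields a node mirrors."""

    def __init__(self, centroid):
        self.centroid_ = np.asarray(centroid, dtype=float)
        self.sq_norm_ = float(self.centroid_ @ self.centroid_)


class Node:
    """Hypothetical BIRCH leaf; assumes centroids_/squared_norm_ are views
    into the preallocated init_centroids_/init_sq_norm_ buffers."""

    def __init__(self, branching_factor, n_features):
        self.subclusters_ = []
        self.init_centroids_ = np.zeros((branching_factor + 1, n_features))
        self.init_sq_norm_ = np.zeros(branching_factor + 1)
        self._reslice()

    def _reslice(self):
        n = len(self.subclusters_)
        self.centroids_ = self.init_centroids_[:n]  # basic slice: a view
        self.squared_norm_ = self.init_sq_norm_[:n]

    def append_subcluster(self, sub):
        n = len(self.subclusters_)
        self.subclusters_.append(sub)
        self.init_centroids_[n] = sub.centroid_
        self.init_sq_norm_[n] = sub.sq_norm_
        self._reslice()


def remove_subcluster(node, sub):
    """Remove `sub` by shifting later rows up in place (no np.delete)."""
    i = node.subclusters_.index(sub)
    n = len(node.subclusters_)
    node.init_centroids_[i:n - 1] = node.init_centroids_[i + 1:n]
    node.init_sq_norm_[i:n - 1] = node.init_sq_norm_[i + 1:n]
    node.init_centroids_[n - 1] = 0.0  # clear the now-unused row
    node.init_sq_norm_[n - 1] = 0.0
    node.subclusters_.pop(i)
    node._reslice()


def absorb(node, keep, drop):
    """Refresh `keep`'s row, then drop `drop`; both must be in `node`."""
    i = node.subclusters_.index(keep)
    node.init_centroids_[i] = keep.centroid_
    node.init_sq_norm_[i] = keep.sq_norm_
    remove_subcluster(node, drop)


# Usage: three subclusters; pretend `a` absorbed `b` (centroid now 0.5, 0).
leaf = Node(branching_factor=4, n_features=2)
a, b, c = Sub([0.0, 0.0]), Sub([1.0, 0.0]), Sub([5.0, 5.0])
for s in (a, b, c):
    leaf.append_subcluster(s)
a.centroid_ = np.array([0.5, 0.0])
a.sq_norm_ = 0.25
absorb(leaf, a, b)
```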
>>
>> Thanks,
>> Dzeno
>>
>>
>> ------------------------------------------------------------------------------
>> Site24x7 APM Insight: Get Deep Visibility into Application Performance
>> APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
>> Monitor end-to-end web transactions and take corrective actions now
>> Troubleshoot faster and improve end-user experience. Signup Now!
>> http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>