Okay, please disregard the previous e-mail.
That hypothesis is toast; clustering works just fine with ball k-means.

So, the problem lies in streaming k-means somewhere.

On Thu, Dec 6, 2012 at 12:06 AM, Dan Filimon
<[email protected]> wrote:
> Hi,
>
> One of the most basic tests for streaming k-means (and k-means in
> general) is whether it works well for points that are multi-normally
> distributed around the vertices of a unit cube.
>
> So, for a cube, there'd be 8 vertices in 3d space. Clustering
> thousands of generated points should recover those 8 clusters, and
> the centroids should be relatively close to the means of these
> multinormal distributions.
>
> I decided to generalize it to more than 3 dimensions, and see how it
> works for hypercubes with n dimensions and 2^n vertices.
>
> Not well, it turns out.
>
> The clusters become less balanced as the number of dimensions increases.
> I'm not sure if this is to be expected. I understand that in high
> dimensional spaces, it becomes more likely for distances to be equal
> and vectors to be orthogonal, but I'm seeing issues starting at 5
> dimensions and this doesn't seem like a particularly high number of
> dimensions to me.
>
> Is this normal?
> Should the hypercube no longer have all sides equal to 1? The variance
> of the multinormals is also 1.
>
> Thanks!
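
For reference, the test data described in the quoted message can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the actual test code; the function names (hypercube_vertices, sample_points, nearest_vertex_fraction), the seed, and the sample sizes are all hypothetical. It assumes "variance of 1" means an independent normal with standard deviation 1 on each coordinate, centered on each vertex of a hypercube with side 1.

```python
import itertools
import math
import random


def hypercube_vertices(n):
    """Return the 2^n vertices of the unit hypercube in n dimensions."""
    return [tuple(float(b) for b in bits)
            for bits in itertools.product((0, 1), repeat=n)]


def sample_points(n, points_per_vertex, sigma=1.0, seed=0):
    """Draw points around every vertex.

    Each coordinate is an independent normal centered on the vertex
    coordinate with standard deviation sigma (an assumption about what
    "multinormal with variance 1" means in the quoted message).
    Returns a list of (vertex_index, point) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for idx, vertex in enumerate(hypercube_vertices(n)):
        for _ in range(points_per_vertex):
            point = tuple(rng.gauss(c, sigma) for c in vertex)
            samples.append((idx, point))
    return samples


def nearest_vertex_fraction(n, samples):
    """Fraction of samples whose nearest vertex is the one that generated them."""
    vertices = hypercube_vertices(n)
    hits = 0
    for idx, point in samples:
        nearest = min(range(len(vertices)),
                      key=lambda j: math.dist(point, vertices[j]))
        hits += (nearest == idx)
    return hits / len(samples)


if __name__ == "__main__":
    for n in (3, 5, 7):
        samples = sample_points(n, points_per_vertex=100)
        print(n, len(samples), round(nearest_vertex_fraction(n, samples), 3))
```

Under these assumptions, a point stays on its own side of the midpoint along one coordinate with probability Phi(0.5), about 0.69, so the fraction reported by nearest_vertex_fraction should fall off roughly as 0.69^n. That is a property of the generated data only and says nothing by itself about which clustering step is misbehaving; passing a smaller sigma gives better-separated clusters for comparison.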
