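Random low-dimensional projections of high-dimensional data tend to look
like normal distributions. This is essentially the central limit theorem
at work: each projected coordinate is a weighted sum of many original
coordinates. I think it is hard to diagnose anything from such a plot.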

On the other hand, projections onto the leading principal components tend
to show more structure.
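
To make the contrast concrete, here is a minimal sketch in plain Python. It
is not the Mahout code and not the 20 newsgroups vectors: the data is
synthetic (two clusters separated along one coordinate, plus Gaussian
noise), and the sizes, the seed, and the use of excess kurtosis as a rough
"does this look normal?" score are all my assumptions for illustration.
Excess kurtosis is about 0 for a normal distribution and clearly negative
for a two-humped one.

```python
import math
import random

rng = random.Random(0)   # fixed seed so the run is repeatable
N, DIM = 300, 100        # hypothetical sizes, far smaller than real text vectors

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(rows, direction):
    """1-D projection: dot product of every row with the direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in rows]

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3 (roughly 0 for normal data)."""
    mean = sum(xs) / len(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m4 = sum((x - mean) ** 4 for x in xs) / len(xs)
    return m4 / (m2 * m2) - 3.0

# Synthetic data (an assumption): unit Gaussian noise in every coordinate,
# with two clusters pushed apart along coordinate 0.
rows = []
for i in range(N):
    row = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    row[0] += 4.0 if i % 2 == 0 else -4.0
    rows.append(row)

# Center each column so the principal direction is not just the mean.
means = [sum(col) / N for col in zip(*rows)]
rows = [[x - m for x, m in zip(row, means)] for row in rows]
cols = list(zip(*rows))

# Projections onto a few random unit directions.
random_kurtoses = []
for _ in range(5):
    direction = normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])
    random_kurtoses.append(excess_kurtosis(project(rows, direction)))

# Leading principal direction via power iteration on X^T X,
# computed as X^T (X v) so the covariance matrix is never formed.
pc = normalize([rng.gauss(0.0, 1.0) for _ in range(DIM)])
for _ in range(30):
    scores = project(rows, pc)
    pc = normalize([sum(s * c for s, c in zip(scores, col)) for col in cols])

pca_kurtosis = excess_kurtosis(project(rows, pc))

print("random directions:", [round(k, 2) for k in random_kurtoses])
print("first principal component:", round(pca_kurtosis, 2))
```

In this toy setup the random directions mix all coordinates about equally,
so the cluster separation is mostly washed out, while the first principal
component lines up with the direction that separates the clusters. Real
text vectors are sparse and much larger, so treat this only as an
illustration of the effect, not as a diagnosis of the plots above.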

On Thu, Dec 27, 2012 at 11:53 AM, Dan Filimon <[email protected]> wrote:

> Hi!
>
> I'm finally getting back to work on Streaming KMeans! :)
> The last thing I did was experiment with different ways of vectorizing
> the 20 newsgroups data set and I wanted to project them in 3D and
> check out what I get.
>
> The result is pretty odd, but I get it regardless of the method I use
> to generate vectors.
> It looks like someone splashed a 2D normal distribution on a sphere.
>
> Here's an image from Ted's algorithm [2] and one from mine [3] using
> log term-frequency scoring.
> Ted's uses vectors of size 9000 with hashing (using
> StaticWordValueEncoder) while mine uses vectors of size ~90000 with a
> manual approach.
>
> I think the vectorization actually went okay for both algorithms, but
> maybe the projection is off?
>
> The shape is odd. What am I doing wrong? :/
>
> [1] https://gist.github.com/4391252
> [2] http://swarm.cs.pub.ro/~dfilimon/skm-mahout/ted-projected.png
> [3] http://swarm.cs.pub.ro/~dfilimon/skm-mahout/log-projected.png
>
