What about gauging its ability to predict the topics of labeled data?

1) Grab RSS feeds of blog posts and use the tags as labels
2) Delicious bookmarks & their content versus user tags
3) Other examples abound...
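One way to turn the labeled-data idea into a single number, sketched under assumptions not taken from this thread: take each document's topic distribution from the LDA run, let each topic "predict" the tag most common among the documents it dominates, and score the fraction of documents whose own tags include that prediction. This is essentially a multi-label variant of cluster purity; the function names and the majority-vote mapping are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def dominant_topic(topic_dist):
    """Index of the highest-probability topic in one document's distribution."""
    return max(range(len(topic_dist)), key=lambda k: topic_dist[k])


def topic_tag_accuracy(topic_dists, tag_sets):
    """Fraction of documents whose tags contain the tag predicted by their dominant topic.

    topic_dists: one list of topic probabilities per document (e.g. LDA output).
    tag_sets: one set of user tags per document (e.g. RSS or Delicious tags).
    """
    if not topic_dists or len(topic_dists) != len(tag_sets):
        raise ValueError("need a non-empty list with one tag set per document")
    # Group document indices by the topic that dominates each document.
    docs_by_topic = defaultdict(list)
    for i, dist in enumerate(topic_dists):
        docs_by_topic[dominant_topic(dist)].append(i)
    correct = 0
    for doc_ids in docs_by_topic.values():
        counts = Counter(tag for i in doc_ids for tag in tag_sets[i])
        if not counts:
            continue  # none of these documents carry any tags
        # Hypothetical mapping: the topic "predicts" the tag most common among its documents.
        predicted = counts.most_common(1)[0][0]
        correct += sum(1 for i in doc_ids if predicted in tag_sets[i])
    return correct / len(topic_dists)


dists = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4], [0.1, 0.9]]
tags = [{"java", "jvm"}, {"java"}, {"scala"}, {"python"}]
print(topic_tag_accuracy(dists, tags))  # 0.75
```

Like purity, this score drifts upward as the number of topics grows (one topic per tagged document scores 1.0), so it is only meaningful when comparing runs with the same topic count, e.g. LDA over raw term vectors versus LDA over hashed vectors.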

On Tue, Jan 4, 2011 at 10:33 AM, Jake Mannix <[email protected]> wrote:

> Saying we have hashing is different from saying we know what will happen to
> an algorithm once it's running over hashed features (as the continuing work
> on our Stochastic SVD demonstrates).
>
> I can certainly try to run LDA over a hashed vector set, but I'm not sure
> what criteria for correctness / quality of the topic model I should use if
> I do.
>
>  -jake
>
> On Jan 4, 2011 7:21 AM, "Robin Anil" <[email protected]> wrote:
>
> We already have the second part: the hashing trick, thanks to Ted, and he
> has a mechanism to partially reverse engineer the features as well. You
> might be able to drop it directly into the job itself, or even vectorize
> and then run LDA.
>
> Robin
>
> On Tue, Jan 4, 2011 at 8:44 PM, Jake Mannix <[email protected]> wrote:
> >
> > Hey Robin,
> >
> > Vowp...
>
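Robin's suggestion in the quoted message can be illustrated with a toy version of the hashing trick: each word is hashed into one of a fixed number of buckets, and an optional trace records which words landed in each bucket so that a bucket can later be mapped back, partially, to words. This is a sketch under assumptions not taken from the thread (crc32 as the hash, plain unsigned counts, a dict as the trace); it is not Ted's encoder code in Mahout, and `hashed_vector` is a hypothetical name.

```python
import zlib


def hashed_vector(tokens, dim, trace=None):
    """Term counts in a fixed-length vector using the hashing trick.

    tokens: iterable of words from one document.
    dim: vector length, i.e. the number of hash buckets.
    trace: optional dict mapping bucket index -> set of words hashed there,
           which lets a bucket be mapped back to candidate words later.
    """
    vec = [0.0] * dim
    for tok in tokens:
        # crc32 gives the same bucket on every run, unlike Python's salted hash().
        idx = zlib.crc32(tok.encode("utf-8")) % dim
        vec[idx] += 1.0  # unsigned counts: LDA expects non-negative term counts
        if trace is not None:
            trace.setdefault(idx, set()).add(tok)
    return vec


trace = {}
vec = hashed_vector("the cat sat on the mat".split(), dim=16, trace=trace)
print(sum(vec))  # 6.0
```

Because different words can collide in one bucket, `trace[i]` may hold several candidate words, which is why reversing hashed features is only partial; the same collisions are one way hashing could change the topics LDA learns. The sketch keeps counts unsigned because signed variants of the hashing trick, common for linear models, would produce negative entries that do not make sense as LDA term counts.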
