What about gauging its ability to predict the topics of labeled data? For example:

1) Grab RSS feeds of blog posts and use the tags as labels.
2) Delicious bookmarks and their content versus the users' tags.
3) Other examples abound...
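A minimal sketch of how such a check could be scored, assuming LDA has already produced a per-document topic distribution for every tagged post or bookmark. The tag-centroid scheme, the function names (`tag_centroids`, `predict_tags`, `precision_at_k`) and the toy data are all hypothetical illustrations, not anything in Mahout. Each tag is represented by the average topic vector of the training documents carrying it. Each held-out document is assigned the k tags whose centroids are most cosine-similar to its own topic vector, and we report the fraction of those guesses that match the tags the user actually applied:

```python
import math
from collections import defaultdict

def tag_centroids(doc_topics, doc_tags):
    """Average the topic distributions of all training docs carrying each tag."""
    sums = {}
    counts = defaultdict(int)
    for theta, tags in zip(doc_topics, doc_tags):
        for tag in tags:
            if tag not in sums:
                sums[tag] = [0.0] * len(theta)
            for k, p in enumerate(theta):
                sums[tag][k] += p
            counts[tag] += 1
    return {tag: [s / counts[tag] for s in vec] for tag, vec in sums.items()}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def predict_tags(theta, centroids, k):
    """Rank tags by cosine similarity between a doc's topics and each tag centroid."""
    ranked = sorted(centroids, key=lambda t: cosine(theta, centroids[t]), reverse=True)
    return ranked[:k]

def precision_at_k(test_topics, test_tags, centroids, k=1):
    """Fraction of the top-k predicted tags the user actually applied, averaged over docs."""
    total = 0.0
    for theta, gold in zip(test_topics, test_tags):
        predicted = predict_tags(theta, centroids, k)
        total += sum(1 for t in predicted if t in gold) / k
    return total / len(test_topics)

# Toy data (made up): 2 topics, documents tagged "java" or "cooking".
train_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_tags = [{"java"}, {"java"}, {"cooking"}, {"cooking"}]
centroids = tag_centroids(train_topics, train_tags)

test_topics = [[0.85, 0.15], [0.15, 0.85]]
test_tags = [{"java"}, {"cooking"}]
print(precision_at_k(test_topics, test_tags, centroids, k=1))  # 1.0
```

To get at the hashing question, the same score could be computed once for a model trained on the original dictionary-based vectors and once for a model trained on hashed vectors; a drop in precision would be one rough measure of what hashing costs the topic model.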
On Tue, Jan 4, 2011 at 10:33 AM, Jake Mannix <[email protected]> wrote:
> Saying we have hashing is different than saying we know what will happen to
> an algorithm once it's running over hashed features (as the continuing work
> on our Stochastic SVD demonstrates).
>
> I can certainly try to run LDA over a hashed vector set, but I'm not sure
> what criteria for correctness / quality of the topic model I should use if
> I do.
>
> -jake
>
> On Jan 4, 2011 7:21 AM, "Robin Anil" <[email protected]> wrote:
>
> We already have the second part: the hashing trick. Thanks to Ted, who also
> has a mechanism to partially reverse-engineer the features. You might be
> able to drop it directly into the job itself, or even vectorize and then
> run LDA.
>
> Robin
>
> On Tue, Jan 4, 2011 at 8:44 PM, Jake Mannix <[email protected]> wrote:
>
> > Hey Robin,
> >
> > Vowp...
