On Mon, Jan 17, 2011 at 12:28 AM, Claudia Grieco <gri...@crmpa.unisa.it> wrote:
> > If you don't have truly massive volumes, then SGD is almost certainly a
> > better choice because it is simpler.
>
> By "simpler" you mean "faster" or "easier to code"?

I mean that the code itself is simpler. That simplicity is what makes it both faster and easier to code.

> As for the multiple categories problem... I was thinking of returning the
> top N categories to the user, or the ones whose score is more than a certain
> threshold... do you think it's fine?

Top N works. A threshold can be a little difficult because longer documents may give larger scores on average, and the effect may not be linear in document length. Try it.
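This is a minimal Python sketch of the two selection strategies discussed above. It does not use Mahout. The category names and scores are made up, and the 3x scaling is only a stand-in for "a longer document gives larger scores." Real classifier scores need not grow uniformly or linearly with length. The point is that top N depends only on the ranking within one document. A fixed threshold depends on the absolute score scale, so the same document can yield a different category set at different lengths.

```python
import heapq


def top_n_categories(scores, n):
    """Return the n highest-scoring category names, best first."""
    # heapq.nlargest copes with n larger than the number of categories.
    best = heapq.nlargest(n, scores.items(), key=lambda kv: kv[1])
    return [cat for cat, _ in best]


def categories_above(scores, threshold):
    """Return every category whose score exceeds a fixed threshold, best first."""
    hits = [(cat, s) for cat, s in scores.items() if s > threshold]
    hits.sort(key=lambda kv: kv[1], reverse=True)
    return [cat for cat, _ in hits]


# Hypothetical raw scores for a short document.
short_doc = {"sports": 2.0, "politics": 1.2, "science": 0.4}
# Hypothetical: the same content in a longer document, crudely modelled
# as every score being multiplied by 3.
long_doc = {cat: 3.0 * s for cat, s in short_doc.items()}

print(top_n_categories(short_doc, 2))    # ['sports', 'politics']
print(top_n_categories(long_doc, 2))     # ['sports', 'politics']
print(categories_above(short_doc, 1.5))  # ['sports']
print(categories_above(long_doc, 1.5))   # ['sports', 'politics']
```

Under this toy scaling, top N returns the same answer for both lengths. The 1.5 threshold admits an extra category for the longer document, which is the difficulty described above. If you do want a threshold, it would likely need to be calibrated or normalized per document.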