Least squares techniques in general depend on an assumption of normally distributed errors. With counts, that assumption is only plausible at large values.
Also, a decomposition like this makes linearity assumptions which imply that all items/words are independent. They clearly are not.

Sent from my iPhone

> On Mar 27, 2014, at 7:18, Tevfik Aytekin <[email protected]> wrote:
>
> Interesting topic,
> Ted, can you give examples of those mathematical assumptions
> under-pinning ALS which are violated by the real world?
>
>> On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning <[email protected]> wrote:
>>
>> How can there be any other practical method? Essentially all of the
>> mathematical assumptions under-pinning ALS are violated by the real world.
>> Why would any mathematical consideration of the number of features be much
>> more than heuristic?
>>
>> That said, you can make an information content argument. You can also make
>> the argument that if you take too many features, it doesn't much hurt, so
>> you should always take as many as you can compute.
>>
>>> On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> does anyone know of a principled approach to choosing the number of
>>> features for ALS (other than cross-validation)?
>>>
>>> --sebastian
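The normality point can be illustrated numerically (this sketch is not from the thread, just an illustration): a Poisson count with mean lambda has skewness 1/sqrt(lambda), so the Gaussian error assumption behind least squares only becomes defensible when counts are large.

```python
import math

# Skewness of a Poisson(lam) count is 1 / sqrt(lam).
# It approaches zero (i.e. the distribution looks roughly normal)
# only as the mean count grows large.
for lam in [1, 4, 25, 100, 10000]:
    skew = 1.0 / math.sqrt(lam)
    print(f"mean count {lam:>6}: skewness {skew:.3f}")
```

For a mean count of 1 the skewness is 1.0 (badly non-normal); by a mean count of 10000 it has dropped to 0.01, which is where the least-squares assumption starts to look reasonable.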
