Least-squares techniques in general depend on the assumption that errors are 
normally distributed. With counts, that is only plausible for large values. 
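
A quick numerical sketch of that point (mine, not from the thread): a
Poisson(lambda) count has skewness 1/sqrt(lambda), so the residuals around the
mean are strongly skewed for small counts and only approximately normal for
large ones. All names here are illustrative.

```python
# Sketch: squared-error/normality assumptions on counts are only
# plausible when the counts are large (Poisson skewness = 1/sqrt(lambda)).
import numpy as np

rng = np.random.default_rng(0)

skews = {}
for lam in (0.5, 5.0, 500.0):
    x = rng.poisson(lam, size=100_000)
    resid = x - lam                                   # "errors" around the mean
    skews[lam] = float(np.mean(resid**3) / np.std(resid)**3)
    print(f"lambda={lam:6.1f}  sample skewness={skews[lam]:5.2f}  "
          f"theory={1/np.sqrt(lam):.2f}")
```

For lambda = 0.5 the residuals are far from normal; by lambda = 500 the
skewness is close to zero and the normal approximation is reasonable.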

Also, a decomposition like this makes linearity assumptions, which imply that 
all items/words are independent. They clearly are not. 
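
For reference, the cross-validation baseline from Sebastian's original question
below can be sketched as follows. This is a toy numpy ALS on synthetic data,
with illustrative names and parameters throughout; it is not Mahout's
implementation.

```python
# Hypothetical sketch: fit regularized ALS at several ranks on a train split
# and pick the rank with the lowest held-out RMSE. Synthetic data of true
# rank 3, so rank 1 should underfit while rank 8 should hurt only a little.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, true_rank = 60, 40, 3
R = rng.normal(size=(n_users, true_rank)) @ rng.normal(size=(true_rank, n_items))
R += 0.1 * rng.normal(size=R.shape)          # observation noise

mask = rng.random(R.shape) < 0.8             # 80% of entries -> train
test = ~mask                                 # held-out entries

def als(R, mask, k, lam=0.1, iters=20):
    """Alternate ridge solves for user and item factors on observed entries."""
    U = rng.normal(scale=0.1, size=(R.shape[0], k))
    V = rng.normal(scale=0.1, size=(R.shape[1], k))
    for _ in range(iters):
        for i in range(R.shape[0]):          # solve for each user row
            obs = mask[i]
            A = V[obs].T @ V[obs] + lam * np.eye(k)
            U[i] = np.linalg.solve(A, V[obs].T @ R[i, obs])
        for j in range(R.shape[1]):          # solve for each item column
            obs = mask[:, j]
            A = U[obs].T @ U[obs] + lam * np.eye(k)
            V[j] = np.linalg.solve(A, U[obs].T @ R[obs, j])
    return U, V

rmse = {}
for k in (1, 3, 8):
    U, V = als(R, mask, k)
    err = (U @ V.T - R)[test]
    rmse[k] = float(np.sqrt(np.mean(err**2)))
best = min(rmse, key=rmse.get)
print(rmse, "best rank:", best)
```

On data like this the held-out error drops sharply up to the true rank and then
roughly plateaus, which matches the point below that taking too many features
does not hurt much.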


> On Mar 27, 2014, at 7:18, Tevfik Aytekin <[email protected]> wrote:
> 
> Interesting topic,
> Ted, can you give examples of those mathematical assumptions
> underpinning ALS which are violated by the real world?
> 
>> On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning <[email protected]> wrote:
>> How can there be any other practical method?  Essentially all of the
>> mathematical assumptions underpinning ALS are violated by the real world.
>> Why would any mathematical consideration of the number of features be much
>> more than heuristic?
>> 
>> That said, you can make an information content argument.  You can also make
>> the argument that if you take too many features, it doesn't much hurt so
>> you should always take as many as you can compute.
>> 
>> 
>> 
>>> On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> does anyone know of a principled approach to choosing the number of
>>> features for ALS (other than cross-validation)?
>>> 
>>> --sebastian
>>> 
