On 05/15/2012 10:06 PM, David Warde-Farley wrote:
> On 2012-05-15, at 3:23 PM, Andreas Mueller<[email protected]> wrote:
>
>> I am not sure if we want to support sparse data. I have no experience with
>> using MLPs on sparse data.
>> Could this be done efficiently? The weight vector would need to be
>> represented explicitly and densely, I guess.
>>
>> Any ideas?
> People can and do use neural nets with sparse inputs, dense-sparse products
> aren't usually too bad in my experience. Careful regularization and/or lots
> of data (a decent number of examples where each feature is non-zero) will be
> necessary to get good results, but this goes for basically any parametric
> model operating on sparse inputs.
>

Looking at the SequentialDataset implementation and the algorithms again,
I tend to agree with David (M.), in that using numpy arrays might be better.
If we want to support a sparse version, we'd need another implementation
(of the low level functions).

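To make the dense-sparse product concrete, here is a minimal pure-Python
sketch of multiplying a sparse input batch (stored in CSR form) by a dense
weight matrix. The function name csr_dot_dense and the toy data are
hypothetical and not part of scikit-learn; in practice one would more likely
rely on scipy.sparse, whose CSR matrices can be multiplied by dense arrays
directly. The point is only that the cost scales with the number of
non-zeros while the weights stay dense.

```python
# Hypothetical helper (an illustration, not scikit-learn code): multiply a
# CSR-encoded sparse input batch by a dense weight matrix, visiting only
# the stored non-zero entries.

def csr_dot_dense(data, indices, indptr, weights):
    """Return X @ W, with X in CSR form and W a dense list of rows."""
    n_hidden = len(weights[0])
    out = []
    for row in range(len(indptr) - 1):
        acc = [0.0] * n_hidden
        # Only the non-zero features of this sample contribute.
        for k in range(indptr[row], indptr[row + 1]):
            value, feature = data[k], indices[k]
            w_row = weights[feature]
            for j in range(n_hidden):
                acc[j] += value * w_row[j]
        out.append(acc)
    return out


# Two samples, four features, mostly zeros:
#   [[0, 2, 0, 0],
#    [1, 0, 0, 3]]
data = [2.0, 1.0, 3.0]
indices = [1, 0, 3]
indptr = [0, 1, 3]

# Dense input-to-hidden weights, shape (4 features, 2 hidden units).
weights = [[1.0, 0.0],
           [0.5, -1.0],
           [2.0, 2.0],
           [0.0, 1.0]]

hidden_pre_activation = csr_dot_dense(data, indices, indptr, weights)
```

Passing a single sample (an indptr with two entries) gives the vector x matrix
case; passing several samples gives the matrix x matrix case.
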
The SequentialDataset was made for vector x vector operations. Depending on
whether we do online or mini-batch learning, the MLP needs vector x matrix
or matrix x matrix operations, respectively. In particular, matrix x matrix
is probably not feasible with the SequentialDataset, and I think even
vector x matrix might be ugly and possibly slow, though I'm not sure about
that.
What do you think, Mathieu (and the others)?

On the same topic: I'm not sure whether we decided if we want mini-batch,
batch and online learning. I have the feeling that it might be possible to
do particular optimizations for online learning, and this is the algorithm
that I favor the most.

Comments? David M., what do you think?

Btw, two comments on your current code:
I think this looks pretty good already. Atm, the tests are failing, though.
Also, I feel like using squared error for classification is a very bad
habit that for some reason survived the last 20 years in some dark corner.

Did you compare timings and results against my implementation?
Once you are pretty sure that the code is correct, you should disable
the boundscheck in cython, as this can improve speed a lot :)

Cheers,
Andy
