On Mon, Oct 22, 2012 at 10:40:35AM -0500, Daniel Elliott wrote:
> I am not knowledgeable about the "SPSS way" of doing things with respect to creating new functions, but I figured that, with neural nets for example, I could provide the training parameters available and you could tell me what format the input and output data should be. From there, writing the interface wouldn't be too painful.
>
> My assumption is that PSPP users are more focused on analyzing results from a returned model than on the minutiae of implementation detail. From this perspective, I think that the best way to use a neural network from PSPP would be k-fold cross-validation or bootstrap cross-validation, which are described in chapter 6 of Empirical Methods for Artificial Intelligence by Paul Cohen. This would shield the user from as many of the issues in model selection as possible. It would be good if users could specify things like the number of layers, the number of nodes in each layer and the type of activation functions to use, or some subset of these items. Sadly, the approach to machine learning algorithms is pretty undisciplined.
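The k-fold cross-validation mentioned above can be sketched in C roughly as follows. This is a minimal sketch, not PSPP code: a trivial "predict the training mean" model stands in for a neural net, cases are assigned to folds by index rather than shuffled, and `train_mean_model` and `kfold_cv_mse` are hypothetical names. The bootstrap variant is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a real learner such as an MLP: the "model"
   is just the mean of the targets in the training folds. */
static double
train_mean_model (const double *y, size_t n, size_t k, size_t fold)
{
  double sum = 0.0;
  size_t count = 0;

  for (size_t i = 0; i < n; i++)
    if (i % k != fold)          /* Case i is in a training fold. */
      {
        sum += y[i];
        count++;
      }
  return sum / count;           /* count > 0 because 2 <= k <= n. */
}

/* k-fold cross-validation: case i is assigned to fold i % k, so every case
   is held out exactly once; the squared prediction errors on the held-out
   cases are averaged.  Real use would shuffle the cases first. */
double
kfold_cv_mse (const double *y, size_t n, size_t k)
{
  assert (k >= 2 && k <= n);

  double sse = 0.0;
  for (size_t fold = 0; fold < k; fold++)
    {
      double prediction = train_mean_model (y, n, k, fold);
      for (size_t i = fold; i < n; i += k)   /* Held-out cases. */
        {
          double err = y[i] - prediction;
          sse += err * err;
        }
    }
  return sse / n;
}
```

With k equal to the number of cases, this reduces to leave-one-out cross-validation. The user-facing parameters (layers, nodes, activation functions) would only affect what the training step builds; the cross-validation loop stays the same.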
There are several goals for PSPP. One is to provide a free replacement for SPSS, just as LibreOffice does for MS Word and Gnumeric does for Excel. A significant proportion of our users are students doing undergraduate stats courses. These users need a) a user interface which resembles that of SPSS, and b) results which resemble those of SPSS, both in presentation and in values.

Now, SPSS has several NN options; for example, there is an MLP command. If we were to implement an MLP command, its user interface should therefore resemble that of SPSS, although the implementation need not. Alternatively, one could provide a PSPP "extension" which does not claim to be SPSS compatible, so long as that is clear in the documentation.

A second class of users are professional statisticians who process HUGE amounts of data: datasets with hundreds of millions of observations. The routines used in PSPP go to great pains to cope with such datasets. I mention this because it can sometimes be a non-trivial task to convert an existing routine to do that, especially if the implementation dynamically allocates memory to store its data.

> Again, I am very far from being a competent statistician, but would enjoy the opportunity to provide some tools to PSPP. My abilities are primarily in things like logistic regression, mixtures of Gaussians, PCA, and neural networks for classification and prediction. I also do reinforcement learning, but I doubt that is of any use.

PCA is already supported; see the FACTOR command. We also have k-means clustering. Coincidentally, I am already working on logistic regression and hope to complete it very shortly. We don't yet have any neural net routines, nor do we have hierarchical clustering, so we could certainly use some contributions there.

I suggest you have a look at how some of the existing algorithms are implemented, and perhaps post some code to show how you think your contributions could fit.

Thanks for your interest.
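To illustrate the point about very large datasets: a routine that stores every case needs memory proportional to the number of observations, whereas a one-pass (streaming) update keeps only a few numbers per statistic. Below is a minimal C sketch, assuming Welford's online algorithm for the mean and sample variance. `struct running_stats` and its functions are hypothetical names for illustration, not PSPP's actual API.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical accumulator: constant memory however many cases are read. */
struct running_stats
{
  double n;      /* Number of cases seen so far. */
  double mean;   /* Mean of the cases seen so far. */
  double m2;     /* Sum of squared deviations from the current mean. */
};

static void
running_stats_init (struct running_stats *s)
{
  s->n = 0.0;
  s->mean = 0.0;
  s->m2 = 0.0;
}

/* Welford's update: fold one case into the accumulator; the case itself
   need not be kept.  This sketch expects the caller to skip missing values. */
static void
running_stats_add (struct running_stats *s, double x)
{
  assert (!isnan (x));
  s->n += 1.0;
  double delta = x - s->mean;
  s->mean += delta / s->n;
  s->m2 += delta * (x - s->mean);
}

/* Sample variance (n - 1 denominator), or NAN with fewer than two cases. */
static double
running_stats_variance (const struct running_stats *s)
{
  return s->n > 1.0 ? s->m2 / (s->n - 1.0) : NAN;
}
```

Feeding cases through such an accumulator one at a time means the routine never holds the dataset in memory. An iterative learner such as a neural net would presumably need several passes over the cases instead of one in-memory copy, which is the kind of conversion described above.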
Regards
John

-- 
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://keys.gnupg.net or any PGP keyserver for public key.
_______________________________________________
pspp-dev mailing list
pspp-dev@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-dev