Ben, I think what you've outlined is right on the mark. I was very pleased to see the areas you mentioned, particularly with regard to large data files. I still do some work with what used to be considered very large files from an ongoing nationwide household survey, but these days the data files might be thought of as perhaps "small" in size. The devil is in the details, of course, but you've made a big start with the current versions of pspp. I think one of the hard areas will be the graphical interfaces, as there is the Windows XP (now Vista) world, the Linux Gnome/KDE world, the Mac OS X Aqua/Cocoa world, and then say the Solaris world with gtk/gnome. From a statistical procedure standpoint, it might be interesting if someone would think about what's involved in implementing the Complex Samples routines--I suggested such an addition to SPSS several years ago, and they picked up the Wesvar package, then dropped it, and have recently implemented their own version.
Having an open source project like pspp will be of huge benefit to the statistical/data analysis communities, and the time is right for such a project. I would be happy to help out or assist as best I can.

Marshall

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ben Pfaff
Sent: Tuesday, August 02, 2005 12:58 PM
To: [email protected]
Subject: PSPP goals

Jason Stover and I met over lunch yesterday and talked over some of the goals for PSPP. I realized that I haven't ever done a good job of expressing these on the list, although I've talked them over with a few individuals at different times. So I've written up a statement of my long-term goals for PSPP, included below. I think I'd like to include this in the README for 0.4.0. Comments are welcome--please give feedback.

----------------------------------------------------------------------

The long term goals for PSPP are ambitious. We wish to provide the following support to users:

* All of the SPSS transformation language. PSPP already supports a large subset of it.
* All the statistical procedures that someone is willing to implement, whether they exist in SPSS or not. Currently, statistical support is limited, but growing.
* Compatibility with SPSS syntax, including compatibility with known bugs and warts, where it makes sense. We also provide an "enhanced" mode in certain cases where PSPP can output better results that may surprise SPSS users.
* Friendly textual and graphical interfaces. PSPP does not do a good job of this yet.
* Attractive output, including graphs, in a variety of human- and machine-readable formats. PSPP currently produces output in ASCII, PostScript, and HTML formats. We will enhance PSPP's output formatting in the future.
* Good documentation. Currently the PSPP manual describes its language completely, but we would like to add information on how to select statistical procedures and interpret their results.
* Efficient support for very large data sets. For procedures where it is practical, we wish to efficiently support data sets many times larger than physical memory. The framework for this feature is already in place, but it has not been tuned or extensively tested.

Over the long term, we also wish to provide support to developers who wish to extend PSPP with new statistical procedures, by supplying the following:

* Easy-to-use support for parsing language syntax. Currently, parsing is done by writing "recursive descent" code by hand, with some support for automated parsing of the most common constructs. We wish to improve the situation by supplying a more complete and flexible parser generator.
* Easy-to-use support for producing attractive output. Currently, output is done by writing code to explicitly fill in table cells with data. We should be able to supply a more convenient interface that also allows for providing machine-readable output.
* Eventually, a plug-in interface for procedures. Over the short term, the interface between the PSPP core and statistical procedures is evolving quickly enough that a plug-in model does not make sense. Over the long term, it may make sense to introduce plug-ins.

--
Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)
 -- Linus Torvalds

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
