Hi-
This appears to be a bug in PSPP's regression routine that shows up when the
data contain a large number of missing values!
I recently noticed some small discrepancies among the results of simple
bivariate regressions run in IBM SPSS, STATA and PSPP. Until Prof.
Shackman's email, I hadn't realized that the discrepancies occur only
when there are many missing values; I was just confused.
Sadly, I also find problems when running linear regressions in PSPP
on data with missing values. I wish I knew what was causing the problem,
so I have put some data that seems to illustrate the issue on Dropbox.
Using both psppire.exe 0.7.9-gab8ce2 on Windows AND psppire 0.7.8 on
Linux Mint LXDE, PSPP calculates the same descriptive statistics as SPSS
and STATA on the same dataset, but does not calculate identical b
coefficients when running bivariate or multivariate regressions.
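One way this pattern (matching descriptives, different b) can arise is sketched below in Python with made-up numbers. This is a hypothetical illustration, not PSPP's actual code. Descriptive statistics are normally computed per variable from all of that variable's valid values, whereas SPSS and STATA drop a case from a regression if it is missing on any variable in the model (listwise deletion). A routine that instead builds the slope from moments that each use whatever values happen to be valid ("available-case" moments) gives a different b whenever there are missing values, and the same b when there are none. The function names are invented for the example.

```python
# Hypothetical illustration, not PSPP's actual code: two ways of handling
# missing values (None) when estimating a bivariate regression slope b.

def mean(values):
    return sum(values) / len(values)


def complete_pairs(xs, ys):
    """Cases where both x and y are valid (listwise deletion)."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]


def slope_listwise(xs, ys):
    """OLS slope computed only from cases that are complete on x and y."""
    pairs = complete_pairs(xs, ys)
    mx = mean([x for x, _ in pairs])
    my = mean([y for _, y in pairs])
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def slope_available_case(xs, ys):
    """Slope from moments that each use every valid value available:
    mean and variance of x over all valid x, mean of y over all valid y,
    and the cross-product term over complete pairs only."""
    x_ok = [x for x in xs if x is not None]
    y_ok = [y for y in ys if y is not None]
    mx, my = mean(x_ok), mean(y_ok)
    pairs = complete_pairs(xs, ys)
    cov_xy = sum((x - mx) * (y - my) for x, y in pairs) / (len(pairs) - 1)
    var_x = sum((x - mx) ** 2 for x in x_ok) / (len(x_ok) - 1)
    return cov_xy / var_x


# Made-up data: y is missing for case 5 and x is missing for case 6.
xs = [1, 2, 3, 4, 5, None, 7]
ys = [2, 4, 5, 8, None, 12, 14]

print(f"listwise b       = {slope_listwise(xs, ys):.3f}")        # 2.019
print(f"available-case b = {slope_available_case(xs, ys):.3f}")  # 2.357

# With no missing values the two approaches give the same b.
full_x, full_y = [1, 2, 3, 4, 7], [2, 4, 5, 8, 14]
print(f"no missing: {slope_listwise(full_x, full_y):.3f} "
      f"vs {slope_available_case(full_x, full_y):.3f}")
```

Dropping every incomplete case before running the regression (as in ces2004-regtest2.sav) forces the listwise answer no matter which strategy a program uses internally, which is consistent with the complete-data results matching across packages.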
I created the following public opinion survey data files, each consisting
of three variables from the 2004 Canadian Election Study
<http://www.queensu.ca/cora/ces.html>, which I recoded, declaring certain
values to be missing:
http://dl.dropbox.com/u/35198072/ces2004-regtest.sav has many
observations with missing values.
http://dl.dropbox.com/u/35198072/ces2004-regtest2.sav has the same three
variables, but I dropped all of the cases with missing values.
This is the syntax file I used to run the descriptive statistics and the
three regression analyses:
http://dl.dropbox.com/u/35198072/regression-tests.sps
PSPP generates these regression results and descriptive statistics on
the data with missing values:
http://dl.dropbox.com/u/35198072/regression-test-pspp1.html
PSPP generates these regression results and descriptive statistics using
the data without any missing values:
http://dl.dropbox.com/u/35198072/regression-test-pspp2.html
Here is the STATA output on the same data (.log is a text file; email
me if you have a problem opening it):
http://dl.dropbox.com/u/35198072/regression-test-stata.log
The first three regressions should match the output in
regression-test-pspp1.html. They are close, but not close enough. The
bottom three regressions use the data with no missing values, and these
DO match PSPP's output (in regression-test-pspp2.html).
I also ran the data through SPSS and found results consistent with STATA.
There did not seem to be any problems with Pearson's chi-square or
Kendall's tau-b when running a crosstab on the data with missing values.
I am sorry I don't know what has gone wrong; I am making this data
available in the hope that someone can figure out where the mistake is.
In the meantime, I would urge other users to be cautious when running
regressions in PSPP.
Yours,
Renan
On 04-Mar-12 11:37 PM, Gene Shackman wrote:
Hi
I'm using the Windows version, psppire.exe 0.7.8-g997322, which I
downloaded from
http://www.gnu.org/software/pspp/get.html
I'm using Windows Vista Home.
My question is about linear regression. If I use data that has no
missing values, PSPP's regression seems to work fine: I compared
the results with other packages and got the same results (see
http://gsociology.icaap.org/methods/comparing_freestaprograms.html).
However, if I use data that does have missing values, I get results
that differ from those of other programs. The results from the other
programs are here:
http://gsociology.icaap.org/methods/comparing_freestaprograms_missing.html
This page also lists the data set I'm using. Attached below are the
results I get from PSPP (if you view them in a monospaced font such as
Courier, the columns line up).
So, two questions:
1. How does PSPP deal with missing values? By the way, I tried coding
blanks as missing, and I also tried replacing all the missing values
with -99999 and declaring those as missing values; both gave exactly
the same results.
2. There don't appear to be any options for how the regression is done,
such as forward, backward or forced entry, and I didn't see anything
about this in the documentation either. Is it just doing straight
forced-entry regression? Will there be any options for how to do
regression?
Thanks very much.
REGRESSION
/VARIABLES= c_arable climate North phone_kpop
/DEPENDENT= gini
/STATISTICS=COEFF R ANOVA.
Model Summary
#====#========#=================#==========================#
#  R #R Square|Adjusted R Square|Std. Error of the Estimate#
##===#========#=================#==========================#
#|.60#     .36|              .35|                      8.65#
##===#========#=================#==========================#
ANOVA
#===========#==============#===#===========#=====#============#
#           #Sum of Squares| df|Mean Square|  F  |Significance#
##==========#==============#===#===========#=====#============#
#|Regression#       4548.35|  4|    1137.09|15.19|         .00#
#|Residual  #       7933.89|106|      74.85|     |            #
#|Total     #      12482.24|110|           |     |            #
##==========#==============#===#===========#=====#============#
Coefficients
#===========#=====#==========#====#=====#============#
#           #  B  |Std. Error|Beta|  t  |Significance#
##==========#=====#==========#====#=====#============#
#|(Constant)#47.95|      2.06| .00|23.22|         .00#
#| c_arable # -.12|       .05|-.20|-2.28|         .02#
#| climate  #-1.24|      1.04|-.11|-1.20|         .23#
#| North    # -.14|       .03|-.43|-4.96|         .00#
#|phone_kpop#  .00|       .00|-.07| -.81|         .42#
##==========#=====#==========#====#=====#============#
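The Model Summary and ANOVA figures are tied together by standard identities, so they can be cross-checked. The Python sketch below recomputes them from the sums of squares and degrees of freedom printed above; n = 111 is inferred from the total df of 110 and k = 4 is the number of predictors. It uses the textbook adjusted R-square formula, which is an assumption about what the column is meant to report.

```python
import math

# Values copied from the PSPP ANOVA table above.
ss_reg, df_reg = 4548.35, 4
ss_res, df_res = 7933.89, 106

ss_tot = ss_reg + ss_res      # should equal the Total row, 12482.24
n = df_reg + df_res + 1       # cases used in the model: 111
k = df_reg                    # number of predictors: 4

ms_reg = ss_reg / df_reg      # table shows 1137.09
ms_res = ss_res / df_res      # table shows 74.85
f_stat = ms_reg / ms_res      # table shows 15.19
r2 = ss_reg / ss_tot          # table shows .36
r = math.sqrt(r2)             # table shows .60
se_est = math.sqrt(ms_res)    # table shows 8.65
# Textbook adjusted R-square: 1 - (1 - R^2)(n - 1)/(n - k - 1).
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

for name, value in [("MS regression", ms_reg), ("MS residual", ms_res),
                    ("F", f_stat), ("R", r), ("R Square", r2),
                    ("Adjusted R Square", adj_r2),
                    ("Std. Error of the Estimate", se_est)]:
    print(f"{name:<27}{value:10.4f}")
```

Every recomputed value rounds to the printed figure except Adjusted R Square, where the textbook formula gives about .34 against the .35 shown; which formula PSPP 0.7.8 uses has not been checked here.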
Gene
Gene Shackman, Ph.D.
The Global Social Change Research Project
http://gsociology.icaap.org
Free Resources for Methods in Evaluation and Social Research
http://gsociology.icaap.org/methods
----------
Applied Sociologist
----------
_______________________________________________
Pspp-users mailing list
Pspp-users@gnu.org
https://lists.gnu.org/mailman/listinfo/pspp-users
--
Renan Levine
Department of Political Science
University of Toronto - Scarborough
renan.lev...@utoronto.ca
http://individual.utoronto.ca/renan
(416) 208-2651