On Sun, Apr 24, 2005 at 08:45:10PM -0700, Ben Pfaff wrote:
> John Darrington <[EMAIL PROTECTED]> writes:
> > Maybe we should find out exactly what SPSS does.
>
> I think that's the thing to do. I will try to test it out in the
> next few days. If you get to it before me, pass along your
> results, and I will do the same.
My results are in the brief report attached. My conclusion is that
SPSS does indeed keep its long-short name map, and does not allow
short names to magically change. So I think we should do the same.

I don't think it adds too much extra complexity. Variables need only
have one name (the long one). The map needs to be a member of the
dictionary. However, the only modules which will need to use it are
sfm-read and sfm-write.

I suppose the question still remains of what should happen if
variables are renamed. Tom Watson's comments seem to suggest that SPSS
simply ignores the short names and renames only the long ones. We can
probably do better than this.

Another question is the geometry of the long-short name map: should it
be indexed by short name or by long name? I remember wondering if I
made the right choice when I was implementing it.

Any comments?

J'

--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.

Introduction
------------
Version 12 of SPSS introduced long names for its variables. In this
version, variable names can be up to 64 bytes long. Previous versions
limited variable names to 8 bytes. To keep system files backward
compatible, the designers chose a scheme whereby system files are
written with the original 8-byte names, together with a map of short
names to long names.
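
To make the shape of that map concrete, here is a minimal Python
sketch that pulls it out of the raw bytes of a system file. The layout
it assumes is only what can be read off the hex dump in Appendix B:
a little-endian header of four 32-bit integers (7, 13, 1, byte count)
followed by tab-separated SHORT=long pairs. The function names are
made up for this illustration, and the sketch assumes ASCII names and
that the header pattern does not occur earlier in the file by chance.

```python
import struct


def find_long_name_payload(data: bytes) -> bytes:
    """Locate the short/long name map and return its raw payload.

    Assumption: the map is introduced by the little-endian integers
    7, 13, 1 and then a byte count, as seen at offset 0x170 in the dump.
    """
    header = struct.pack("<iii", 7, 13, 1)
    start = data.index(header) + len(header)
    (count,) = struct.unpack_from("<i", data, start)
    return data[start + 4 : start + 4 + count]


def parse_long_name_payload(payload: bytes) -> dict[str, str]:
    """Turn b'SHORT=long<TAB>SHORT=long...' into {short: long}."""
    mapping = {}
    for pair in payload.decode("ascii").split("\t"):
        short_name, long_name = pair.split("=", 1)
        mapping[short_name] = long_name
    return mapping


# The bytes from offset 0x170 onwards in Appendix B, rebuilt by hand.
sample = (
    struct.pack("<iiii", 7, 13, 1, 39)
    + b"FOOBARWI=foobarwiz1\tFOOBAR_A=foobarwiz2"
    + b"\xe7\x03\x00\x00"
)
name_map = parse_long_name_payload(find_long_name_payload(sample))
```

Here name_map comes out keyed by short name, which is simply the order
in which the pairs are stored in the file; it says nothing about how
the in-memory map ought to be indexed.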
The question arises of whether the mapping between short and long
names persists throughout a session. If it does not, then a system
file loaded with a particular set of variable names and subsequently
saved, may end up with a completely different set of variable names,
or the same set, but with a different mapping.
It's considered desirable that PSPP emulate the behaviour of SPSS in
order to ensure compatibility.
Purpose
-------
I wanted to examine the hypothesis:
"SPSS v12 retains its mapping between long variable names and short
variable names throughout a session."
Test Environment and Tools
--------------------------
1. A Windoze operating system running SPSS v12.
2. The Emacs text editor.
Method
------
I prepared a syntax file 'write.sps' containing the following:
DATA LIST LIST /foobarwiz1 * foobarwiz2 *.
BEGIN DATA.
1 2
END DATA.
ECHO 'State of dictionary prior to writing'.
DISPLAY DICTIONARY.
LIST.
SAVE /OUTFILE='out.sav'.
When run through SPSS v12, this file produced the output in appendix A
'OUTPUT1.TXT'.
It is pertinent that the variables foobarwiz1 and foobarwiz2 have been
allocated indices 1 and 2 respectively.
Next, I examined the 'out.sav' system file using the hexl-mode of
Emacs. The hex dump of this file is in Appendix B. SPSS
decided to allocate the short name FOOBARWI to the variable
"foobarwiz1" and the short name FOOBAR_A to the variable "foobarwiz2".
I created a verbatim copy of 'out.sav' which I named 'in.sav'. Then,
using the hexl-mode of Emacs, I modified 'in.sav' as follows: in the
short-long name map I exchanged the strings FOOBARWI and FOOBAR_A,
effectively exchanging the mappings between the two variables. A
hex dump of the modified 'in.sav' is in Appendix C.
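
The same edit can be scripted instead of done by hand in a hex editor.
The following Python sketch is only an illustration of the edit
described above, not the procedure actually used. The function name is
hypothetical, and it relies on the assumption that each short name is
followed by '=' only inside the name map, so the variable records
earlier in the file are left alone.

```python
def swap_short_names(data: bytes, first: bytes, second: bytes) -> bytes:
    """Exchange two short names inside the SHORT=long name map."""
    if len(first) != len(second):
        # Equal lengths keep every offset and the byte count unchanged.
        raise ValueError("short names must have the same length")
    # Searching for NAME= skips the bare names in the variable records.
    pos_first = data.index(first + b"=")
    pos_second = data.index(second + b"=")
    patched = bytearray(data)
    patched[pos_first : pos_first + len(first)] = second
    patched[pos_second : pos_second + len(second)] = first
    return bytes(patched)


# Stand-in for the relevant parts of 'out.sav': two variable records
# followed by the name map.
records = b"FOOBARWI" + b"\x02\x00\x00\x00" + b"FOOBAR_A"
name_map = b"FOOBARWI=foobarwiz1\tFOOBAR_A=foobarwiz2"
patched = swap_short_names(records + name_map, b"FOOBARWI", b"FOOBAR_A")
```

Because both short names are 8 bytes long, the patched data has the
same length as the original and no length field needs adjusting.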
Now I used the following syntax file to read the modified 'in.sav' and
re-write it to a file 'out2.sav'.
GET /FILE='in.sav'.
ECHO 'State of dictionary after reading'.
DISPLAY DICTIONARY.
LIST.
SAVE /OUTFILE='out2.sav'.
The reasoning is that 'in.sav' contains a map which is not the
default mapping, so if the mapping is not preserved during this
session, then 'out2.sav' will be written with the default mapping
instead of the mapping presented to it in 'in.sav'.
The output file from this session is shown in Appendix D.
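It is not entirely unexpected that the indices of variables foobarwiz1
and foobarwiz2 are now reversed relative to the previous session: the
variable records in 'in.sav' still list FOOBARWI first, and the edited
map now ties that short name to foobarwiz2.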
The interesting part, however, comes when examining 'out2.sav'. The
mapping from 'in.sav' has been retained: foobarwiz1 has the short name
FOOBAR_A and foobarwiz2 has the short name FOOBARWI. The hex dump of
'out2.sav' is presented in Appendix E.
Conclusion
----------
The results suggest that SPSS does indeed persist its long/short
name mappings throughout the duration of an SPSS session. It does not
appear to generate new mappings every time a system file is written,
but uses existing mappings if available.
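
As a rough illustration of that behaviour, the Python sketch below
chooses short names at save time by reusing any short name remembered
from an earlier read and generating one only when none exists. This is
not SPSS's actual algorithm, which these tests do not reveal. The
function names are hypothetical, and the generation scheme (truncate
to 8 bytes, then a 6-character prefix plus '_' and a letter on
collision) is an assumption chosen only because it reproduces the two
names seen in Appendix B.

```python
import string


def make_short_name(long_name: str, used: set[str]) -> str:
    """Hypothetical generator for a fresh 8-byte short name."""
    candidate = long_name[:8].upper()
    if candidate not in used:
        return candidate
    for suffix in string.ascii_uppercase:
        candidate = long_name[:6].upper() + "_" + suffix
        if candidate not in used:
            return candidate
    raise ValueError(f"no free short name for {long_name!r}")


def short_names_for_save(long_names: list[str],
                         stored: dict[str, str]) -> dict[str, str]:
    """Map each long name to a short name, preferring stored ones."""
    # Reserve every remembered short name before generating new ones.
    used = {stored[name] for name in long_names if name in stored}
    result = {}
    for name in long_names:
        if name in stored:
            result[name] = stored[name]
        else:
            result[name] = make_short_name(name, used)
            used.add(result[name])
    return result


# First session: nothing is remembered, so names are generated.
fresh = short_names_for_save(["foobarwiz1", "foobarwiz2"], {})

# Second session: the swapped map read from 'in.sav' is reused as is.
swapped = {"foobarwiz2": "FOOBARWI", "foobarwiz1": "FOOBAR_A"}
kept = short_names_for_save(["foobarwiz2", "foobarwiz1"], swapped)
```

In this sketch the map is keyed by long name because the lookup at
save time starts from the variable's long name; that is one possible
choice, not a recommendation.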
Appendix A
----------
OUTPUT1.TXT
State of dictionary prior to writing
File Information
Notes
| -------------------------------------------- | -------------------- |
| Output Created | 26-APR-2005 07:55:10 |
| -------------------------------------------- | -------------------- |
| Comments | |
| ----------- | ------------------------------ | -------------------- |
| Input | Filter | <none> |
| | ------------------------------ | -------------------- |
| | Weight | <none> |
| | ------------------------------ | -------------------- |
| | Split File | <none> |
| | ------------------------------ | -------------------- |
| | N of Rows in Working Data File | 1 |
| ----------- | ------------------------------ | -------------------- |
| Syntax | DISPLAY DICTIONARY. |
| | |
| ----------- | ------------------------------ | -------------------- |
| Resources | Elapsed Time | 0:00:00.00 |
| ----------- | ------------------------------ | -------------------- |
List of variables on the working file
Name (Position) Label
foobarwiz1 (1)
Measurement Level: Scale
Column Width: 10 Alignment: Right
Print Format: F8.2
Write Format: F8.2
foobarwiz2 (2)
Measurement Level: Scale
Column Width: 10 Alignment: Right
Print Format: F8.2
Write Format: F8.2
List
Notes
| -------------------------------------------- | -------------------- |
| Output Created | 26-APR-2005 07:55:10 |
| -------------------------------------------- | -------------------- |
| Comments | |
| ----------- | ------------------------------ | -------------------- |
| Input | Filter | <none> |
| | ------------------------------ | -------------------- |
| | Weight | <none> |
| | ------------------------------ | -------------------- |
| | Split File | <none> |
| | ------------------------------ | -------------------- |
| | N of Rows in Working Data File | 1 |
| ----------- | ------------------------------ | -------------------- |
| Syntax | LIST. |
| | |
| ----------- | ------------------------------ | -------------------- |
| Resources | Elapsed Time | 0:00:00.00 |
| ----------- | ------------------------------ | -------------------- |
foobarwiz1 foobarwiz2
1.00 2.00
Number of cases read: 1 Number of cases listed: 1
Appendix B.
-----------
'out.sav'
00000000: 2446 4c32 4028 2329 2053 5053 5320 4441 $FL2@(#) SPSS DA
00000010: 5441 2046 494c 4520 4d53 2057 696e 646f TA FILE MS Windo
00000020: 7773 2052 656c 6561 7365 2031 322e 3020 ws Release 12.0
00000030: 7370 7373 696f 3332 2e64 6c6c 2020 2020 spssio32.dll
00000040: 0200 0000 0200 0000 0100 0000 0000 0000 ................
00000050: 0100 0000 0000 0000 0000 5940 3236 2041 ..........Y@26 A
00000060: 7072 2030 3530 373a 3535 3a31 3020 2020 pr 0507:55:10
00000070: 2020 2020 2020 2020 2020 2020 2020 2020
00000080: 2020 2020 2020 2020 2020 2020 2020 2020
00000090: 2020 2020 2020 2020 2020 2020 2020 2020
000000a0: 2020 2020 2020 2020 2020 2020 2000 0000 ...
000000b0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0208 0500 0208 0500 464f 4f42 4152 5749 ........FOOBARWI
000000d0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0208 0500 0208 0500 464f 4f42 4152 5f41 ........FOOBAR_A
000000f0: 0700 0000 0300 0000 0400 0000 0800 0000 ................
00000100: 0c00 0000 0000 0000 0000 0000 d002 0000 ................
00000110: 0100 0000 0100 0000 0200 0000 0200 0000 ................
00000120: 0700 0000 0400 0000 0800 0000 0300 0000 ................
00000130: ffff ffff ffff efff ffff ffff ffff ef7f ................
00000140: feff ffff ffff efff 0700 0000 0b00 0000 ................
00000150: 0400 0000 0600 0000 0300 0000 0a00 0000 ................
00000160: 0100 0000 0300 0000 0a00 0000 0100 0000 ................
00000170: 0700 0000 0d00 0000 0100 0000 2700 0000 ............'...
00000180: 464f 4f42 4152 5749 3d66 6f6f 6261 7277 FOOBARWI=foobarw
00000190: 697a 3109 464f 4f42 4152 5f41 3d66 6f6f iz1.FOOBAR_A=foo
000001a0: 6261 7277 697a 32e7 0300 0000 0000 0065 barwiz2........e
000001b0: 66fc 0000 0000
Appendix C.
-----------
'in.sav'
00000000: 2446 4c32 4028 2329 2053 5053 5320 4441 $FL2@(#) SPSS DA
00000010: 5441 2046 494c 4520 4d53 2057 696e 646f TA FILE MS Windo
00000020: 7773 2052 656c 6561 7365 2031 322e 3020 ws Release 12.0
00000030: 7370 7373 696f 3332 2e64 6c6c 2020 2020 spssio32.dll
00000040: 0200 0000 0200 0000 0100 0000 0000 0000 ................
00000050: 0100 0000 0000 0000 0000 5940 3236 2041 ..........Y@26 A
00000060: 7072 2030 3530 373a 3535 3a31 3020 2020 pr 0507:55:10
00000070: 2020 2020 2020 2020 2020 2020 2020 2020
00000080: 2020 2020 2020 2020 2020 2020 2020 2020
00000090: 2020 2020 2020 2020 2020 2020 2020 2020
000000a0: 2020 2020 2020 2020 2020 2020 2000 0000 ...
000000b0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0208 0500 0208 0500 464f 4f42 4152 5749 ........FOOBARWI
000000d0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0208 0500 0208 0500 464f 4f42 4152 5f41 ........FOOBAR_A
000000f0: 0700 0000 0300 0000 0400 0000 0800 0000 ................
00000100: 0c00 0000 0000 0000 0000 0000 d002 0000 ................
00000110: 0100 0000 0100 0000 0200 0000 0200 0000 ................
00000120: 0700 0000 0400 0000 0800 0000 0300 0000 ................
00000130: ffff ffff ffff efff ffff ffff ffff ef7f ................
00000140: feff ffff ffff efff 0700 0000 0b00 0000 ................
00000150: 0400 0000 0600 0000 0300 0000 0a00 0000 ................
00000160: 0100 0000 0300 0000 0a00 0000 0100 0000 ................
00000170: 0700 0000 0d00 0000 0100 0000 2700 0000 ............'...
00000180: 464f 4f42 4152 5f41 3d66 6f6f 6261 7277 FOOBAR_A=foobarw
00000190: 697a 3109 464f 4f42 4152 5749 3d66 6f6f iz1.FOOBARWI=foo
000001a0: 6261 7277 697a 32e7 0300 0000 0000 0065 barwiz2........e
000001b0: 66fc 0000 0000
Appendix D.
-----------
State of dictionary after reading
File Information
Notes
| -------------------------- | -------------------- |
| Output Created | 26-APR-2005 08:07:26 |
| -------------------------- | -------------------- |
| Comments | |
| ----------- | ------------ | -------------------- |
| Input | Data | Z:\Names\in.sav |
| | ------------ | -------------------- |
| | Filter | <none> |
| | ------------ | -------------------- |
| | Weight | <none> |
| | ------------ | -------------------- |
| | Split File | <none> |
| ----------- | ------------ | -------------------- |
| Syntax | DISPLAY DICTIONARY. |
| | |
| ----------- | ------------ | -------------------- |
| Resources | Elapsed Time | 0:00:00.00 |
| ----------- | ------------ | -------------------- |
List of variables on the working file
Name (Position) Label
foobarwiz2 (1)
Measurement Level: Scale
Column Width: 10 Alignment: Right
Print Format: F8.2
Write Format: F8.2
foobarwiz1 (2)
Measurement Level: Scale
Column Width: 10 Alignment: Right
Print Format: F8.2
Write Format: F8.2
List
Notes
| -------------------------------------------- | -------------------- |
| Output Created | 26-APR-2005 08:07:26 |
| -------------------------------------------- | -------------------- |
| Comments | |
| ----------- | ------------------------------ | -------------------- |
| Input | Data | Z:\Names\in.sav |
| | ------------------------------ | -------------------- |
| | Filter | <none> |
| | ------------------------------ | -------------------- |
| | Weight | <none> |
| | ------------------------------ | -------------------- |
| | Split File | <none> |
| | ------------------------------ | -------------------- |
| | N of Rows in Working Data File | 1 |
| ----------- | ------------------------------ | -------------------- |
| Syntax | LIST. |
| | |
| ----------- | ------------------------------ | -------------------- |
| Resources | Elapsed Time | 0:00:00.00 |
| ----------- | ------------------------------ | -------------------- |
foobarwiz2 foobarwiz1
1.00 2.00
Number of cases read: 1 Number of cases listed: 1
Appendix E.
-----------
'out2.sav'
00000000: 2446 4c32 4028 2329 2053 5053 5320 4441 $FL2@(#) SPSS DA
00000010: 5441 2046 494c 4520 4d53 2057 696e 646f TA FILE MS Windo
00000020: 7773 2052 656c 6561 7365 2031 322e 3020 ws Release 12.0
00000030: 7370 7373 696f 3332 2e64 6c6c 2020 2020 spssio32.dll
00000040: 0200 0000 0200 0000 0100 0000 0000 0000 ................
00000050: 0100 0000 0000 0000 0000 5940 3236 2041 ..........Y@26 A
00000060: 7072 2030 3530 383a 3037 3a32 3620 2020 pr 0508:07:26
00000070: 2020 2020 2020 2020 2020 2020 2020 2020
00000080: 2020 2020 2020 2020 2020 2020 2020 2020
00000090: 2020 2020 2020 2020 2020 2020 2020 2020
000000a0: 2020 2020 2020 2020 2020 2020 2000 0000 ...
000000b0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000c0: 0208 0500 0208 0500 464f 4f42 4152 5749 ........FOOBARWI
000000d0: 0200 0000 0000 0000 0000 0000 0000 0000 ................
000000e0: 0208 0500 0208 0500 464f 4f42 4152 5f41 ........FOOBAR_A
000000f0: 0700 0000 0300 0000 0400 0000 0800 0000 ................
00000100: 0c00 0000 0000 0000 0000 0000 d002 0000 ................
00000110: 0100 0000 0100 0000 0200 0000 0200 0000 ................
00000120: 0700 0000 0400 0000 0800 0000 0300 0000 ................
00000130: ffff ffff ffff efff ffff ffff ffff ef7f ................
00000140: feff ffff ffff efff 0700 0000 0b00 0000 ................
00000150: 0400 0000 0600 0000 0300 0000 0a00 0000 ................
00000160: 0100 0000 0300 0000 0a00 0000 0100 0000 ................
00000170: 0700 0000 0d00 0000 0100 0000 2700 0000 ............'...
00000180: 464f 4f42 4152 5749 3d66 6f6f 6261 7277 FOOBARWI=foobarw
00000190: 697a 3209 464f 4f42 4152 5f41 3d66 6f6f iz2.FOOBAR_A=foo
000001a0: 6261 7277 697a 31e7 0300 0000 0000 0065 barwiz1........e
000001b0: 66fc 0000 0000
