Re: updated documentation

Jason Stover Thu, 05 May 2005 15:30:31 -0700

Here's a patch with some minor edits. I created the
patch with diff -Bbc. Just in case it's botched, or
in case you don't want all the changes, I've attached
the modified language.texi too.


-Jason



On Mon, May 02, 2005 at 04:39:40PM -0700, Ben Pfaff wrote:
> As part of my big commit last night I also updated a lot of the
> documentation in the "Language" chapter of the manual.  If anyone
> wants to proofread it, that'd be great.
> 
> -- 
> "Now I have to go wash my mind out with soap."
> --Derick Siddoway
> 
> 
> _______________________________________________
> pspp-dev mailing list
> [email protected]
> http://lists.gnu.org/mailman/listinfo/pspp-dev

-- 
[EMAIL PROTECTED]
SDF Public Access UNIX System - http://sdf.lonestar.org

*** language.texi       2005-05-04 09:30:20.000000000 -0400
--- language.texi.proof 2005-05-04 09:32:34.000000000 -0400
***************
*** 5,11 ****
  
  @quotation
  @strong{Please note:} PSPP is not even close to completion.
! Only a few actual statistical procedures are implemented.  PSPP
  is a work in progress.
  @end quotation
  
--- 5,11 ----
  
  @quotation
  @strong{Please note:} PSPP is not even close to completion.
! Only a few statistical procedures are implemented.  PSPP
  is a work in progress.
  @end quotation
  
***************
*** 50,56 ****
  @end example
  
  @cindex case-sensitivity
! Identifiers may be up any length, but only the first 64 bytes are
  significant.  Identifiers are not case-sensitive: @code{foobar},
  @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
  different representations of the same identifier.
--- 50,56 ----
  @end example
  
  @cindex case-sensitivity
! Identifiers may be any length, but only the first 64 bytes are
  significant.  Identifiers are not case-sensitive: @code{foobar},
  @code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
  different representations of the same identifier.
***************
*** 150,156 ****
  punctuator only as the last character on a line (except white space).
  When it is the last non-space character on a line, a period is not
  treated as part of another token, even if it would otherwise be part
! of e.g.@: an identifier or a floating-point number.
  
  Actually, the character that ends a command can be changed with
  @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
--- 150,156 ----
  punctuator only as the last character on a line (except white space).
  When it is the last non-space character on a line, a period is not
  treated as part of another token, even if it would otherwise be part
! of, e.g.@:, an identifier or a floating-point number.
  
  Actually, the character that ends a command can be changed with
  @cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
***************
*** 174,183 ****
  The command name may be followed by one or more @dfn{subcommands}.
  Each subcommand begins with a subcommand name, which may be
  abbreviated to its first three letters.  Some subcommands accept a
! series of one or more specifications, which follow the subcommand name
! and, optionally separated from it by an equals sign (@samp{=}), and
! optionally separated from each other by commas.  Each subcommand must
! be separated from the next (if any) by a forward slash (@samp{/}).
  
  There are multiple ways to mark the end of a command.  The most common
  way is to end the last line of the command with a period (@samp{.}) as
--- 174,184 ----
  The command name may be followed by one or more @dfn{subcommands}.
  Each subcommand begins with a subcommand name, which may be
  abbreviated to its first three letters.  Some subcommands accept a
! series of one or more specifications, which follow the subcommand
! name, optionally separated from it by an equals sign
! (@samp{=}). Specifications may be separated from each other
! by commas or spaces.  Each subcommand must be separated from the next (if any)
! by a forward slash (@samp{/}).
  
  There are multiple ways to mark the end of a command.  The most common
  way is to end the last line of the command with a period (@samp{.}) as
***************
*** 216,229 ****
  @item File definition commands
  @cindex file definition commands
  Give instructions for reading data from text files or from special
! binary ``system files''.  Most of these commands discard any previous
! data or variables to replace it with the new data and
! variables.  At least one must appear before the first command in any of
  the categories below.  @xref{Data Input and Output}.
  
  @item Input program commands
  @cindex input program commands
! Though rarely used, these provide powerful tools for reading data files
  in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
  
  @item Transformations
--- 217,230 ----
  @item File definition commands
  @cindex file definition commands
  Give instructions for reading data from text files or from special
! binary ``system files''.  Most of these commands replace any previous
! data or variables with new data or
! variables.  At least one file definition command must appear before the first command in any of
  the categories below.  @xref{Data Input and Output}.
  
  @item Input program commands
  @cindex input program commands
! Though rarely used, these provide tools for reading data files
  in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.
  
  @item Transformations
***************
*** 250,256 ****
  @cindex order of commands
  
  PSPP does not place many restrictions on ordering of commands.  The
! main restriction is that variables must be defined they are otherwise
  referenced.  This section describes the details of command ordering,
  but most users will have no need to refer to them.
  
--- 251,257 ----
  @cindex order of commands
  
  PSPP does not place many restrictions on ordering of commands.  The
! main restriction is that variables must be defined before they are otherwise
  referenced.  This section describes the details of command ordering,
  but most users will have no need to refer to them.
  
***************
*** 259,265 ****
  distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
  @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
  
! PSPP starts up in the initial state.  Each successful completion
  of a command may cause a state transition.  Each type of command has its
  own rules for state transitions:
  
--- 260,266 ----
  distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
  @emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)
  
! PSPP starts in the initial state.  Each successful completion
  of a command may cause a state transition.  Each type of command has its
  own rules for state transitions:
  
***************
*** 487,493 ****
  @cindex variables, system
  
  There are seven system variables.  These are not like ordinary
! variables, as they are not stored in each case.  They can only be used
  in expressions.  These system variables, whose values and output formats
  cannot be modified, are described below.
  
--- 488,494 ----
  @cindex variables, system
  
  There are seven system variables.  These are not like ordinary
! variables because system variables are not always stored.  They can be used only
  in expressions.  These system variables, whose values and output formats
  cannot be modified, are described below.
  
***************
*** 565,576 ****
  included then it is assumed to be 0.  Some formats do not allow @var{d}
  to be specified.
  
! When an input format is specified on @cmd{DATA LIST} or another
! command, then
! it is converted to an output format for the purposes of @cmd{PRINT}
! and other
! data output commands.  For most purposes, input and output formats are
! the same; the salient differences are described below.
  
  Below are listed the input and output formats supported by PSPP.  If an
  input format is mapped to a different output format by default, then
--- 566,576 ----
  included then it is assumed to be 0.  Some formats do not allow @var{d}
  to be specified.
  
! When @cmd{DATA LIST} or another command specifies an input format,
! that format is converted to an output format for the purposes of
! @cmd{PRINT} and other data output commands.  For most purposes, input
! and output formats are the same; the salient differences are described
! below.
  
  Below are listed the input and output formats supported by PSPP.  If an
  input format is mapped to a different output format by default, then
***************
*** 654,660 ****
  @item PIB @result{} F: 1 <= iw,ow <= 8
  Positive integer binary format.  The field is interpreted as a
  fixed-point positive binary number.  The location of the decimal point
! is implied.  Endianness is teh same as the host machine.
  
  The default output format follows the rules for IB format.
  
--- 654,660 ----
  @item PIB @result{} F: 1 <= iw,ow <= 8
  Positive integer binary format.  The field is interpreted as a
  fixed-point positive binary number.  The location of the decimal point
! is implied.  Endianness is the same as the host machine.
  
  The default output format follows the rules for IB format.
  
***************
*** 831,837 ****
  
  @item DATETIMEw.d: 17 <= iw,ow <= 40
  Date and time format.  Input format: leader + day + date-delimiter +
! month + date-delimiter + yaer + time-delimiter + hour24 + time-delimiter
  + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
  @var{w} > 19 then seconds @samp{:SS} is added.  If @var{w} > 22 and
  @var{d} > 0 then fractional seconds @samp{.SS} are added.
--- 831,837 ----
  
  @item DATETIMEw.d: 17 <= iw,ow <= 40
  Date and time format.  Input format: leader + day + date-delimiter +
! month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
  + minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
  @var{w} > 19 then seconds @samp{:SS} is added.  If @var{w} > 22 and
  @var{d} > 0 then fractional seconds @samp{.SS} are added.
***************
*** 890,902 ****
  names begin with an octothorpe (@samp{#}).  
  
  Scratch variables have the same properties as variables left with
! @cmd{LEAVE}:
! they retain their values between cases, and for the first case they are
! initialized to 0 or blanks.  They have the additional property that they
! are deleted before the execution of any procedure.  For this reason,
! scratch variables can't be used for analysis.  To obtain the same
! effect, use @cmd{COMPUTE} (@pxref{COMPUTE}) to copy the scratch variable's
! value into an ordinary variable, then analysis that variable.
  
  @node Files, BNF, Variables, Language
  @section Files Used by PSPP
--- 890,902 ----
  names begin with an octothorpe (@samp{#}).  
  
  Scratch variables have the same properties as variables left with
! @cmd{LEAVE}: they retain their values between cases, and for the first
! case they are initialized to 0 or blanks.  They have the additional
! property that they are deleted before the execution of any procedure.
! For this reason, scratch variables can't be used for analysis.  To use
! a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
! to copy its value into an ordinary variable, then use that ordinary
! variable in the analysis.
  
  @node Files, BNF, Variables, Language
  @section Files Used by PSPP
***************
*** 912,919 ****
  @cindex syntax file
  @item command file
  @itemx syntax file
! These names (synonyms) refer to the file that contains instructions to
! PSPP that tell it what to do.  The syntax file's name is specified on
  the PSPP command line.  Syntax files can also be pulled in with
  @cmd{INCLUDE} (@pxref{INCLUDE}).
  
--- 912,919 ----
  @cindex syntax file
  @item command file
  @itemx syntax file
! These names (synonyms) refer to the file that contains instructions
! that tell PSPP what to do.  The syntax file's name is specified on
  the PSPP command line.  Syntax files can also be pulled in with
  @cmd{INCLUDE} (@pxref{INCLUDE}).
  
***************
*** 959,965 ****
  @item
  Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
  often called @dfn{terminals}.  There are some special terminals, which
! are actually written in lowercase for clarity:
  
  @table @asis
  @cindex @code{number}
--- 959,965 ----
  @item
  Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
  often called @dfn{terminals}.  There are some special terminals, which
! are written in lowercase for clarity:
  
  @table @asis
  @cindex @code{number}

@node Language, Expressions, Invocation, Top
@chapter The PSPP language
@cindex language, PSPP
@cindex PSPP, language

@quotation
@strong{Please note:} PSPP is not even close to completion.
Only a few statistical procedures are implemented.  PSPP
is a work in progress.
@end quotation

This chapter discusses elements common to many PSPP commands.
Later chapters will describe individual commands in detail.

@menu
* Tokens::                      Characters combine to form tokens.
* Commands::                    Tokens combine to form commands.
* Types of Commands::           Commands come in several flavors.
* Order of Commands::           Commands combine to form syntax files.
* Missing Observations::        Handling missing observations.
* Variables::                   The unit of data storage.
* Files::                       Files used by PSPP.
* BNF::                         How command syntax is described.
@end menu

@node Tokens, Commands, Language, Language
@section Tokens
@cindex language, lexical analysis
@cindex language, tokens
@cindex tokens
@cindex lexical analysis

PSPP divides most syntax file lines into a series of short chunks
called @dfn{tokens}.  Tokens are then grouped to form commands, each of
which tells PSPP to take some action, such as reading in data, writing
out data, or performing a statistical procedure.  Each type of token is
described below.

@table @strong
@cindex identifiers
@item Identifiers
Identifiers are names that typically specify variables, commands, or
subcommands.  The first character in an identifier must be a letter,
@samp{#}, or @samp{@@}.  The remaining characters in the identifier
must be letters, digits, or one of the following special characters:

@example
@center @.  _  $  #  @@
@end example

@cindex case-sensitivity
Identifiers may be any length, but only the first 64 bytes are
significant.  Identifiers are not case-sensitive: @code{foobar},
@code{Foobar}, @code{FooBar}, @code{FOOBAR}, and @code{FoObaR} are
different representations of the same identifier.

@cindex identifiers, reserved
@cindex reserved identifiers
Some identifiers are reserved.  Reserved identifiers may not be used
in any context besides those explicitly described in this manual.  The
reserved identifiers are:

@example
@center ALL  AND  BY  EQ  GE  GT  LE  LT  NE  NOT  OR  TO  WITH
@end example

@item Keywords
Keywords are a subclass of identifiers that form a fixed part of
command syntax.  For example, command and subcommand names are
keywords.  Keywords may be abbreviated to their first 3 characters if
this abbreviation is unambiguous.  (Unique abbreviations of 3 or more
characters are also accepted: @samp{FRE}, @samp{FREQ}, and
@samp{FREQUENCIES} are equivalent when the last is a keyword.)

Reserved identifiers are always used as keywords.  Other identifiers
may be used both as keywords and as user-defined identifiers, such as
variable names.

@item Numbers
@cindex numbers
@cindex integers
@cindex reals
Numbers are expressed in decimal.  A decimal point is optional.
Numbers may be expressed in scientific notation by adding @samp{e} and
a base-10 exponent, so that @samp{1.234e3} has the value 1234.  Here
are some more examples of valid numbers:

@example
-5  3.14159265359  1e100  -.707  8945.
@end example

Negative numbers are expressed with a @samp{-} prefix.  However, in
situations where a literal @samp{-} token is expected, what appears to
be a negative number is treated as @samp{-} followed by a positive
number.

No white space is allowed within a number token, except for horizontal
white space between @samp{-} and the rest of the number.

The last example above, @samp{8945.}, will be interpreted as two
tokens, @samp{8945} and @samp{.}, if it is the last token on a line.
@xref{Commands, , Forming commands of tokens}.

@item Strings
@cindex strings
@cindex @samp{'}
@cindex @samp{"}
@cindex case-sensitivity
Strings are literal sequences of characters enclosed in pairs of
single quotes (@samp{'}) or double quotes (@samp{"}).  To include the
character used for quoting in the string, double it, e.g.@:
@samp{'it''s an apostrophe'}.  White space and case of letters are
significant inside strings.

Strings can be concatenated using @samp{+}, so that @samp{"a" + 'b' +
'c'} is equivalent to @samp{'abc'}.  Concatenation is useful for
splitting a single string across multiple source lines.  The maximum
length of a string, after concatenation, is 255 characters.

Strings may also be expressed as hexadecimal, octal, or binary
character values by prefixing the initial quote character with
@samp{X}, @samp{O}, or @samp{B} or their lowercase equivalents.  Each
pair of hexadecimal digits, triplet of octal digits, or octet of binary
digits is transformed into a single character with the given value.
If there is an incomplete group of digits, the missing final digits are
assumed to be @samp{0}.  These forms of strings are nonportable
because numeric values are associated with different characters by
different operating systems.  Therefore, their use should be confined
to syntax files that will not be widely distributed.

@cindex characters, reserved
@cindex 0
@cindex white space
The character with value 00 is reserved for internal use by PSPP.
Using it in a string causes an error, and the character is replaced by
a space.

@item Punctuators and Operators
@cindex punctuators
@cindex operators
These tokens are the punctuators and operators:

@example
@center ,  /  =  (  )  +  -  *  /  **  <  <=  <>  >  >=  ~=  &  |  .
@end example

Most of these appear within the syntax of commands, but the period
(@samp{.}) punctuator is used only at the end of a command.  It is a
punctuator only as the last character on a line (except white space).
When it is the last non-space character on a line, a period is not
treated as part of another token, even if it would otherwise be part
of, e.g.@:, an identifier or a floating-point number.

Actually, the character that ends a command can be changed with
@cmd{SET}'s ENDCMD subcommand (@pxref{SET}), but we do not recommend
doing so.  Throughout the remainder of this manual we will assume that
the default setting is in effect.
@end table
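As an illustration of the hexadecimal, octal, and binary string forms described under Strings, here is a minimal Python sketch.  It is not taken from PSPP.  The function @code{decode_radix_string} is hypothetical and receives the prefix letter and the digits between the quotes as separate arguments.  It uses Python's @code{chr}, so it assumes an ASCII-compatible character set, which is the portability caveat noted above.  Following the text, a value of 00 becomes a space; the sketch does not report the error that PSPP raises.

```python
def decode_radix_string(prefix, digits):
    """Decode the body of an X'...', O'...', or B'...' string.

    Hypothetical helper: PREFIX is the letter before the opening quote and
    DIGITS is the text between the quotes.  Each group of 2 hex, 3 octal,
    or 8 binary digits becomes one character.
    """
    kind = prefix.upper()
    if kind == "X":
        group, base = 2, 16
    elif kind == "O":
        group, base = 3, 8
    elif kind == "B":
        group, base = 8, 2
    else:
        raise ValueError("prefix must be X, O, or B")

    # An incomplete final group is completed with trailing '0' digits.
    remainder = len(digits) % group
    if remainder:
        digits += "0" * (group - remainder)

    chars = []
    for start in range(0, len(digits), group):
        value = int(digits[start:start + group], base)
        # Value 00 is reserved by PSPP; the text says it is replaced by a
        # space (PSPP also reports an error, which this sketch omits).
        chars.append(" " if value == 0 else chr(value))
    return "".join(chars)


print(decode_radix_string("X", "414243"))  # ABC
print(decode_radix_string("o", "101102"))  # AB
print(decode_radix_string("X", "41427"))   # ABp (last group padded to 70)
```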

@node Commands, Types of Commands, Tokens, Language
@section Forming commands of tokens

@cindex PSPP, command structure
@cindex language, command structure
@cindex commands, structure

Most PSPP commands share a common structure.  A command begins with a
command name, such as @cmd{FREQUENCIES}, @cmd{DATA LIST}, or @cmd{N OF
CASES}.  The command name may be abbreviated to its first word, and
each word in the command name may be abbreviated to its first three
or more characters, where these abbreviations are unambiguous.
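The abbreviation rules for command names can be sketched in Python as follows.  This illustrates the rules under stated assumptions and is not PSPP's matching code.  @code{SAMPLE_COMMANDS} is a small hypothetical subset of command names.  A typed word is accepted if it equals the full word or is a prefix of at least three characters.  When several commands remain, the sketch prefers an exact full match and otherwise reports an ambiguity.

```python
# A small, hypothetical subset of PSPP command names.
SAMPLE_COMMANDS = (
    "DATA LIST", "DISPLAY", "FILE TYPE", "FREQUENCIES",
    "N OF CASES", "PRINT", "PRINT FORMATS",
)


def word_matches(typed, full):
    """True if TYPED equals FULL or is a prefix of it at least 3 long."""
    typed, full = typed.upper(), full.upper()
    return typed == full or (len(typed) >= 3 and full.startswith(typed))


def match_command(text, commands=SAMPLE_COMMANDS):
    """Resolve a possibly abbreviated command name to its full form."""
    typed = text.split()
    if not typed:
        raise ValueError("empty command name")
    exact, candidates = [], []
    for name in commands:
        words = name.split()
        # Trailing words may be omitted, but extra words never match.
        if len(typed) > len(words):
            continue
        if all(word_matches(t, w) for t, w in zip(typed, words)):
            candidates.append(name)
            if " ".join(typed).upper() == name.upper():
                exact.append(name)
    if len(exact) == 1:
        return exact[0]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError("unknown command: " + text)
    raise ValueError("ambiguous command: " + text)


print(match_command("freq"))      # FREQUENCIES
print(match_command("N OF CAS"))  # N OF CASES
print(match_command("PRINT"))     # PRINT (exact match beats PRINT FORMATS)
```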

The command name may be followed by one or more @dfn{subcommands}.
Each subcommand begins with a subcommand name, which may be
abbreviated to its first three letters.  Some subcommands accept a
series of one or more specifications, which follow the subcommand
name, optionally separated from it by an equals sign (@samp{=}).
Specifications may be separated from each other by commas or spaces.
Each subcommand must be separated from the next (if any) by a forward
slash (@samp{/}).
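A simplified Python sketch of this subcommand structure follows.  It assumes the text after the command name contains no quoted strings or parentheses, so @samp{/}, @samp{=}, commas, and spaces only ever act as separators.  @code{split_subcommands} is a hypothetical helper, and the subcommand names in the example are only illustrative.

```python
import re


def split_subcommands(text):
    """Split the text after a command name into (name, specs) pairs.

    Simplified sketch: assumes no quoted strings or parentheses, so '/',
    '=', ',' and white space act only as separators.
    """
    result = []
    for chunk in text.split("/"):
        chunk = chunk.strip()
        if not chunk:
            continue  # e.g. the empty text before a leading '/'
        # Subcommand name, then an optional '=' before the specifications.
        match = re.match(r"([A-Za-z][A-Za-z0-9_.$#]*)\s*=?\s*(.*)", chunk, re.S)
        if match is None:
            raise ValueError("expected a subcommand name: " + chunk)
        name, rest = match.groups()
        # Specifications may be separated by commas, spaces, or both.
        specs = [spec for spec in re.split(r"[,\s]+", rest) if spec]
        result.append((name.upper(), specs))
    return result


print(split_subcommands("/VARIABLES=X Y, Z /FORMAT NOTABLE"))
# [('VARIABLES', ['X', 'Y', 'Z']), ('FORMAT', ['NOTABLE'])]
```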

There are multiple ways to mark the end of a command.  The most common
way is to end the last line of the command with a period (@samp{.}) as
described in the previous section (@pxref{Tokens}).  A blank line, or
one that consists only of white space or comments, also ends a command
by default, although you can use the NULLINE subcommand of @cmd{SET}
to disable this feature (@pxref{SET}).
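The two default ways of ending a command can be illustrated with a small Python sketch.  The helper @code{split_commands} is hypothetical.  It treats a blank line, or a period as the last non-space character of a line, as the end of a command.  It also assumes that the end of the input ends any unfinished command.  It ignores comments, quoted strings, the batch-mode line rules, and the effect of @cmd{SET} ENDCMD and NULLINE.

```python
def split_commands(lines):
    """Group syntax lines into commands using the default terminators.

    A command ends at a line whose last non-space character is '.', or at
    a blank line.  Simplified sketch: ignores comments, quoted strings,
    batch-mode rules, and SET ENDCMD/NULLINE; end of input also ends a
    command.
    """
    commands, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            # A blank line ends the current command.
            if current:
                commands.append(" ".join(current))
                current = []
            continue
        if text.endswith("."):
            # The final period is a terminator, not part of the last token.
            body = text[:-1].rstrip()
            if body:
                current.append(body)
            if current:
                commands.append(" ".join(current))
                current = []
        else:
            current.append(text)
    if current:
        commands.append(" ".join(current))
    return commands


syntax = [
    "FREQUENCIES",
    "  /VARIABLES=X Y.",
    "COMPUTE Z = 8945.",
    "DISPLAY",
    "",
    "COMPUTE W = 1",
]
print(split_commands(syntax))
# ['FREQUENCIES /VARIABLES=X Y', 'COMPUTE Z = 8945', 'DISPLAY', 'COMPUTE W = 1']
```

Note how @samp{8945.} at the end of a line yields the number @samp{8945} followed by the terminator, as described in @ref{Tokens}.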

In batch mode only, that is, when reading commands from a file instead
of an interactive user, any line that contains a non-space character
in the leftmost column begins a new command.  Thus, each command
consists of a flush-left line followed by any number of lines indented
from the left margin.  In this mode, a plus sign, minus sign, or
period (@samp{+}, @samp{-}, or @samp{.}) as the first character
in a line is ignored and causes that line to begin a new command,
which allows for visual indentation of a command without that command
being considered part of the previous command.
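The batch-mode rule for grouping lines can also be sketched in Python.  The helper @code{group_batch_lines} is hypothetical and applies only two rules: the flush-left rule and the leading @samp{+}, @samp{-}, or @samp{.} rule.  For brevity it skips blank lines and does not look for a terminating period.

```python
def group_batch_lines(lines):
    """Group lines into commands using only the batch-mode column rule.

    A line with a non-space character in the leftmost column starts a new
    command; a leading '+', '-', or '.' is dropped but still starts one.
    Indented lines continue the current command.  Blank lines are skipped
    and terminating periods are not examined in this sketch.
    """
    commands = []
    for line in lines:
        if not line.strip():
            continue
        first = line[0]
        if first in "+-.":
            commands.append([line[1:].strip()])
        elif not first.isspace():
            commands.append([line.strip()])
        elif commands:
            commands[-1].append(line.strip())
        else:
            # Indented text before any command: start a command anyway.
            commands.append([line.strip()])
    return [" ".join(parts) for parts in commands]


syntax = [
    "DATA LIST FREE",
    "  /X Y",
    "+  COMPUTE Z = X",
    "FREQUENCIES",
    "    /VARIABLES=Z",
]
print(group_batch_lines(syntax))
# ['DATA LIST FREE /X Y', 'COMPUTE Z = X', 'FREQUENCIES /VARIABLES=Z']
```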

Sometimes, one encounters syntax files that are intended to be
interpreted in interactive mode rather than batch mode.  When this
occurs, use the @samp{-i} command line option to force interpretation
in interactive mode (@pxref{Language control options}).

@node Types of Commands, Order of Commands, Commands, Language
@section Types of Commands

Commands in PSPP are divided roughly into six categories:

@table @strong
@item Utility commands
@cindex utility commands
Set or display various global options that affect PSPP operations.
May appear anywhere in a syntax file.  @xref{Utilities, , Utility
commands}.

@item File definition commands
@cindex file definition commands
Give instructions for reading data from text files or from special
binary ``system files''.  Most of these commands replace any previous
data or variables with new data or variables.  At least one file
definition command must appear before the first command in any of the
categories below.  @xref{Data Input and Output}.

@item Input program commands
@cindex input program commands
Though rarely used, these provide tools for reading data files
in arbitrary textual or binary formats.  @xref{INPUT PROGRAM}.

@item Transformations
@cindex transformations
Perform operations on data and write data to output files.  Transformations
are not carried out until a procedure is executed.  

@item Restricted transformations
@cindex restricted transformations
Transformations that cannot appear in certain contexts.  @xref{Order
of Commands}, for details.

@item Procedures
@cindex procedures
Analyze data, writing results of analyses to the listing file.  Cause
transformations specified earlier in the file to be performed.  In a
more general sense, a @dfn{procedure} is any command that causes the
active file (the data) to be read.
@end table

@node Order of Commands, Missing Observations, Types of Commands, Language
@section Order of Commands
@cindex commands, ordering
@cindex order of commands

PSPP does not place many restrictions on ordering of commands.  The
main restriction is that variables must be defined before they are otherwise
referenced.  This section describes the details of command ordering,
but most users will have no need to refer to them.

PSPP possesses five internal states, called initial, INPUT PROGRAM,
FILE TYPE, transformation, and procedure states.  (Please note the
distinction between the @cmd{INPUT PROGRAM} and @cmd{FILE TYPE}
@emph{commands} and the INPUT PROGRAM and FILE TYPE @emph{states}.)

PSPP starts in the initial state.  Each successful completion
of a command may cause a state transition.  Each type of command has its
own rules for state transitions:

@table @strong
@item Utility commands
@itemize @bullet
@item
Valid in any state.
@item
Do not cause state transitions.  Exception: when @cmd{N OF CASES}
is executed in the procedure state, it causes a transition to the
transformation state.
@end itemize

@item @cmd{DATA LIST}
@itemize @bullet
@item
Valid in any state.
@item
When executed in the initial or procedure state, causes a transition to
the transformation state.  
@item
Clears the active file if executed in the procedure or transformation
state.
@end itemize

@item @cmd{INPUT PROGRAM}
@itemize @bullet
@item
Invalid in INPUT PROGRAM and FILE TYPE states.
@item
Causes a transition to the INPUT PROGRAM state.  
@item
Clears the active file.
@end itemize

@item @cmd{FILE TYPE}
@itemize @bullet
@item
Invalid in INPUT PROGRAM and FILE TYPE states.
@item
Causes a transition to the FILE TYPE state.
@item
Clears the active file.
@end itemize

@item Other file definition commands
@itemize @bullet
@item
Invalid in INPUT PROGRAM and FILE TYPE states.
@item
Cause a transition to the transformation state.
@item
Clear the active file, except for @cmd{ADD FILES}, @cmd{MATCH FILES},
and @cmd{UPDATE}.
@end itemize

@item Transformations
@itemize @bullet
@item
Invalid in initial and FILE TYPE states.
@item
Cause a transition to the transformation state.
@end itemize

@item Restricted transformations
@itemize @bullet
@item
Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
@item
Cause a transition to the transformation state.
@end itemize

@item Procedures
@itemize @bullet
@item
Invalid in initial, INPUT PROGRAM, and FILE TYPE states.
@item
Cause a transition to the procedure state.
@end itemize
@end table
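The rules above amount to a small state machine.  The Python sketch below transcribes the table directly and is not PSPP's implementation.  The command-kind strings (such as @samp{file definition}) and the function @code{next_state} are hypothetical names chosen for illustration.  The function returns the new state together with a flag that says whether the active file is cleared.

```python
INITIAL = "initial"
INPUT_PROGRAM = "INPUT PROGRAM"
FILE_TYPE = "FILE TYPE"
TRANSFORMATION = "transformation"
PROCEDURE = "procedure"

# (command kind, states in which it is invalid, state it moves to)
RULES = (
    ("input program", (INPUT_PROGRAM, FILE_TYPE), INPUT_PROGRAM),
    ("file type", (INPUT_PROGRAM, FILE_TYPE), FILE_TYPE),
    ("file definition", (INPUT_PROGRAM, FILE_TYPE), TRANSFORMATION),
    ("transformation", (INITIAL, FILE_TYPE), TRANSFORMATION),
    ("restricted transformation",
     (INITIAL, INPUT_PROGRAM, FILE_TYPE), TRANSFORMATION),
    ("procedure", (INITIAL, INPUT_PROGRAM, FILE_TYPE), PROCEDURE),
)

# File definition commands that do not clear the active file.
NON_CLEARING = ("ADD FILES", "MATCH FILES", "UPDATE")


class CommandOrderError(Exception):
    """Raised when a command kind is not valid in the current state."""


def next_state(state, kind, name=""):
    """Return (new_state, clears_active_file) after one successful command."""
    if kind == "utility":
        # Valid anywhere; only N OF CASES in the procedure state moves on.
        if name.upper() == "N OF CASES" and state == PROCEDURE:
            return TRANSFORMATION, False
        return state, False
    if kind == "data list":
        # Valid anywhere; moves on only from the initial or procedure state.
        new_state = TRANSFORMATION if state in (INITIAL, PROCEDURE) else state
        return new_state, state in (PROCEDURE, TRANSFORMATION)
    for rule_kind, invalid_in, target in RULES:
        if rule_kind == kind:
            if state in invalid_in:
                raise CommandOrderError(
                    "%s is invalid in the %s state" % (kind, state))
            clears = kind in ("input program", "file type") or (
                kind == "file definition" and name.upper() not in NON_CLEARING)
            return target, clears
    raise ValueError("unknown command kind: " + kind)


state = INITIAL
for kind in ("data list", "transformation", "procedure"):
    state, cleared = next_state(state, kind)
print(state)  # procedure
```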

@node Missing Observations, Variables, Order of Commands, Language
@section Handling missing observations
@cindex missing values
@cindex values, missing

PSPP includes special support for unknown numeric data values.
Missing observations are assigned a special value, called the
@dfn{system-missing value}.  This ``value'' actually indicates the
absence of a value; it means that the actual value is unknown.  Procedures
automatically exclude from analyses those observations or cases that
have missing values.  Details of missing value exclusion depend on the
procedure and can often be controlled by the user; refer to
descriptions of individual procedures for details.

The system-missing value exists only for numeric variables.  String
variables always have a defined value, even if it is only a string of
spaces.

Variables, whether numeric or string, can have designated
@dfn{user-missing values}.  Every user-missing value is an actual value
for that variable.  However, most of the time user-missing values are
treated in the same way as the system-missing value.  String variables
that are wider than a certain width, usually 8 characters (depending on
computer architecture), cannot have user-missing values.
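The following Python code models one numeric variable as a sketch of how the system-missing value and user-missing values differ.  It assumes that @code{None} represents the system-missing value.  It enforces the limit given in @ref{Attributes}: up to three discrete user-missing values, a range, or a range plus one value.  The class @code{UserMissing} and the function @code{is_missing} are hypothetical, and string variables are not covered.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UserMissing:
    """Hypothetical user-missing specification for one numeric variable."""
    values: Tuple[float, ...] = ()
    low_high: Optional[Tuple[float, float]] = None

    def __post_init__(self):
        # Up to three discrete values, or a range plus at most one value.
        limit = 3 if self.low_high is None else 1
        if len(self.values) > limit:
            raise ValueError("too many discrete user-missing values")


def is_missing(value, spec):
    """True if VALUE is system-missing (None here) or user-missing."""
    if value is None:
        return True  # system-missing: no value at all
    if value in spec.values:
        return True  # a real value that is designated user-missing
    if spec.low_high is not None:
        low, high = spec.low_high
        return low <= value <= high
    return False


codes = UserMissing(values=(-1,), low_high=(90, 99))
print([is_missing(v, codes) for v in (None, -1, 95, 42)])
# [True, True, True, False]
```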

For more information on missing values, see the following sections:
@ref{Variables}, @ref{MISSING VALUES}, and @ref{Expressions}.  See also the
documentation on individual procedures for information on how they
handle missing values.

@node Variables, Files, Missing Observations, Language
@section Variables
@cindex variables
@cindex dictionary

Variables are the basic unit of data storage in PSPP.  All the
variables in a file taken together, apart from any associated data, are
said to form a @dfn{dictionary}.  
Some details of variables are described in the sections below.

@menu
* Attributes::                  Attributes of variables.
* System Variables::            Variables automatically defined by PSPP.
* Sets of Variables::           Lists of variable names.
* Input/Output Formats::        Input and output formats.
* Scratch Variables::           Variables deleted by procedures.
@end menu

@node Attributes, System Variables, Variables, Variables
@subsection Attributes of Variables
@cindex variables, attributes of
@cindex attributes of variables
Each variable has a number of attributes, including:

@table @strong
@item Name
An identifier, up to 64 bytes long.  Each variable must have a different name.
@xref{Tokens}.

Some system variable names begin with @samp{$}, but user-defined
variables' names may not begin with @samp{$}.

@cindex @samp{.}
@cindex period
@cindex variable names, ending with period
The final character in a variable name should not be @samp{.}, because
such an identifier will be misinterpreted when it is the final token
on a line: @code{FOO.} will be divided into two separate tokens,
@samp{FOO} and @samp{.}, indicating end-of-command.  @xref{Tokens}.

@cindex @samp{_}
The final character in a variable name should not be @samp{_}, because
some such identifiers are used for special purposes by PSPP
procedures.

As with all PSPP identifiers, variable names are not case-sensitive.
PSPP capitalizes variable names on output the same way they were
capitalized at their point of definition in the input.

@cindex variables, type
@cindex type of variables
@item Type
Numeric or string.

@cindex variables, width
@cindex width of variables
@item Width
(string variables only) String variables with a width of 8 characters or
fewer are called @dfn{short string variables}.  Short string variables
can be used in many procedures where @dfn{long string variables} (those
with widths greater than 8) are not allowed.

On certain systems, strings longer than 8 characters may also be
treated as short strings.  In other words, 8 characters is a lower
bound on the maximum width of a short string.

@item Position
Variables in the dictionary are arranged in a specific order.
@cmd{DISPLAY} can be used to show this order: see @ref{DISPLAY}.

@item Initialization
Either reinitialized to 0 or spaces for each case, or left at its
existing value.  @xref{LEAVE}.

@cindex missing values
@cindex values, missing
@item Missing values
Optionally, up to three values, or a range of values, or a specific
value plus a range, can be specified as @dfn{user-missing values}.
There is also a @dfn{system-missing value} that is assigned to an
observation when there is no other obvious value for that observation.
Observations with missing values are automatically excluded from
analyses.  User-missing values are actual data values, while the
system-missing value is not a value at all.  @xref{Missing Observations}.

@cindex variable labels
@cindex labels, variable
@item Variable label
A string that describes the variable.  @xref{VARIABLE LABELS}.

@cindex value labels
@cindex labels, value
@item Value label
Optionally, these associate each possible value of the variable with a
string.  @xref{VALUE LABELS}.

@cindex print format
@item Print format
Display width, format, and (for numeric variables) number of decimal
places.  This attribute does not affect how data are stored, just how
they are displayed.  Example: a width of 8, with 2 decimal places.
@xref{PRINT FORMATS}.

@cindex write format
@item Write format
Similar to print format, but used by certain commands that are
designed to write to binary files.  @xref{WRITE FORMATS}.
@end table
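The naming rules from this section and from @ref{Tokens} can be collected into a Python checking function.  The @code{check_variable_name} helper is hypothetical.  It assumes ASCII letters and measures length in UTF-8 bytes.  It treats the rules phrased as ``may not'' as errors, and the ``should not'' rules about a trailing @samp{.} or @samp{_} as warnings.

```python
import string

RESERVED = ("ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH")
FIRST_CHARS = string.ascii_letters + "#@"
OTHER_CHARS = string.ascii_letters + string.digits + "._$#@"


def check_variable_name(name):
    """Return (errors, warnings) for a proposed user-defined variable name."""
    errors, warnings = [], []
    if not name:
        return ["name is empty"], warnings
    if name[0] == "$":
        errors.append("only system variables begin with '$'")
    elif name[0] not in FIRST_CHARS:
        errors.append("must begin with a letter, '#', or '@'")
    if any(c not in OTHER_CHARS for c in name[1:]):
        errors.append("contains a character not allowed in identifiers")
    if len(name.encode("utf-8")) > 64:
        errors.append("longer than 64 bytes")
    if name.upper() in RESERVED:
        errors.append("reserved identifier")
    if name.endswith("."):
        warnings.append("trailing '.' is split off at the end of a line")
    if name.endswith("_"):
        warnings.append("trailing '_' may clash with names PSPP uses")
    return errors, warnings


print(check_variable_name("Income"))  # ([], [])
print(check_variable_name("FOO."))
```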

@node System Variables, Sets of Variables, Attributes, Variables
@subsection Variables Automatically Defined by PSPP
@cindex system variables
@cindex variables, system

There are seven system variables.  These are not like ordinary
variables because their values are not stored in each case.  They can
be used only in expressions.  These system variables, whose values and
output formats cannot be modified, are described below.

@table @code
@cindex @code{$CASENUM}
@item $CASENUM
Case number of the current case.  This changes as cases are
shuffled around.

@cindex @code{$DATE}
@item $DATE
Date the PSPP process was started, in format A9, following the
pattern @code{DD MMM YY}.

@cindex @code{$JDATE}
@item $JDATE
Number of days between 15 Oct 1582 and the time the PSPP process
was started.

@cindex @code{$LENGTH}
@item $LENGTH
Page length, in lines, in format F11.

@cindex @code{$SYSMIS}
@item $SYSMIS
System missing value, in format F1.

@cindex @code{$TIME}
@item $TIME
Number of seconds between midnight 14 Oct 1582 and the time the active file
was read, in format F20.

@cindex @code{$WIDTH}
@item $WIDTH
Page width, in characters, in format F3.
@end table
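The values of @code{$DATE}, @code{$JDATE}, and @code{$TIME} can be reproduced in Python by following the descriptions above literally.  The function names are hypothetical, and this sketch is not PSPP's code.  Python's @code{datetime} uses the proleptic Gregorian calendar, which took effect on 15 Oct 1582, so these subtractions stay within a single calendar.

```python
from datetime import date, datetime

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def date_string(day):
    """Format DAY the way $DATE is described: 'DD MMM YY', 9 characters."""
    return "%02d %s %02d" % (day.day, MONTHS[day.month - 1], day.year % 100)


def jdate(day):
    """Days between 15 Oct 1582 and DAY, per the $JDATE description."""
    return (day - date(1582, 10, 15)).days


def time_seconds(moment):
    """Seconds between midnight 14 Oct 1582 and MOMENT, per $TIME."""
    return (moment - datetime(1582, 10, 14)).total_seconds()


print(date_string(date(2005, 5, 5)))         # 05 MAY 05
print(jdate(date(1582, 10, 16)))             # 1
print(time_seconds(datetime(1582, 10, 15)))  # 86400.0
```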

@node Sets of Variables, Input/Output Formats, System Variables, Variables
@subsection Lists of variable names
@cindex TO convention
@cindex convention, TO

To refer to a set of variables, list their names one after another.
Optionally, their names may be separated by commas.  To include a
range of variables from the dictionary in the list, write the names of
the first and last variables in the range, separated by @code{TO}.  For
instance, if the dictionary contains six variables with the names
@code{ID}, @code{X1}, @code{X2}, @code{GOAL}, @code{MET}, and
@code{NEXTGOAL}, in that order, then @code{X2 TO MET} would include
variables @code{X2}, @code{GOAL}, and @code{MET}.

Commands that define variables, such as @cmd{DATA LIST}, give
@code{TO} an alternate meaning.  With these commands, @code{TO} defines
sequences of variables whose names end in consecutive integers.  The
syntax is two identifiers that begin with the same root and end with
numbers, separated by @code{TO}.  The syntax @code{X1 TO X5} defines 5
variables, named @code{X1}, @code{X2}, @code{X3}, @code{X4}, and
@code{X5}.  The syntax @code{ITEM0008 TO ITEM0013} defines 6
variables, named @code{ITEM0008}, @code{ITEM0009}, @code{ITEM0010},
@code{ITEM0011}, @code{ITEM0012}, and @code{ITEM0013}.  The syntaxes
@code{QUES001 TO QUES9} and @code{QUES6 TO QUES3} are invalid.

After a set of variables has been defined with @cmd{DATA LIST} or
another command with this method, the same set can be referenced on
later commands using the same syntax.
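The expansion rule for @cmd{DATA LIST}-style @code{TO} ranges can be sketched in Python as follows.  The function @code{expand_to} is hypothetical.  For simplicity, it assumes that the two numeric suffixes must have the same number of digits.  That assumption accepts both valid examples above and rejects both invalid ones, but PSPP's actual validity rules may be more permissive.

```python
import re

# Hypothetical sketch of DATA LIST-style "X1 TO X5" expansion.
# Simplifying assumption (not necessarily PSPP's exact rule): both names
# share the same root, the numeric suffixes have the same number of
# digits, and the first number does not exceed the second.
NAME_RE = re.compile(r"([A-Za-z_]+)([0-9]+)$")


def expand_to(first: str, last: str) -> list:
    """Return the list of names defined by 'first TO last'."""
    m1, m2 = NAME_RE.match(first), NAME_RE.match(last)
    if m1 is None or m2 is None:
        raise ValueError("both names must end in digits")
    root1, digits1 = m1.groups()
    root2, digits2 = m2.groups()
    if root1.upper() != root2.upper():
        raise ValueError("roots differ")
    if len(digits1) != len(digits2):
        raise ValueError("numeric suffixes have different widths")
    lo, hi = int(digits1), int(digits2)
    if lo > hi:
        raise ValueError("range is descending")
    width = len(digits1)
    # Zero-pad each number to the suffix width (ITEM0008, ITEM0009, ...).
    return [root1 + str(n).zfill(width) for n in range(lo, hi + 1)]


print(expand_to("X1", "X5"))  # ['X1', 'X2', 'X3', 'X4', 'X5']
```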

@node Input/Output Formats, Scratch Variables, Sets of Variables, Variables
@subsection Input and Output Formats

Data that PSPP inputs and outputs must have one of a number of formats.
These formats are described, in general, by a format specification of
the form @code{NAMEw.d}, where @var{name} is the
format name and @var{w} is a field width.  @var{d} is the optional
desired number of decimal places, if appropriate.  If @var{d} is not
included then it is assumed to be 0.  Some formats do not allow @var{d}
to be specified.
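A format specification such as @code{F8.2} or @code{DATE11} can be split into its parts with a small parser like the one below.  This is a sketch: it applies the rule that @var{d} defaults to 0, but it does not check per-format width limits or whether @var{d} is permitted.  The name @code{parse_format} is hypothetical.

```python
import re

# Hypothetical parser for NAMEw.d format specifications.  Following the
# text above, d is optional and defaults to 0.  It does not check
# per-format width limits or whether d is permitted.
SPEC_RE = re.compile(r"([A-Za-z]+)([0-9]+)(?:\.([0-9]+))?$")


def parse_format(spec: str) -> tuple:
    """Split a specification such as 'F8.2' into ('F', 8, 2)."""
    m = SPEC_RE.match(spec)
    if m is None:
        raise ValueError("not a NAMEw.d specification: " + spec)
    name, w, d = m.groups()
    decimals = int(d) if d is not None else 0
    return (name.upper(), int(w), decimals)


print(parse_format("F8.2"))    # ('F', 8, 2)
print(parse_format("DATE11"))  # ('DATE', 11, 0)
```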

When @cmd{DATA LIST} or another command specifies an input format,
that format is converted to an output format for the purposes of
@cmd{PRINT} and other data output commands.  For most purposes, input
and output formats are the same; the salient differences are described
below.
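The input-to-output conversion rules that the table below gives for several numeric formats can be collected into a single function.  This is only a sketch.  @code{default_output} is a hypothetical name, formats are represented as (name, @var{w}, @var{d}) tuples, and any format the sketch does not handle is returned unchanged.

```python
# Hypothetical sketch of default input-to-output format conversion for
# some numeric formats, following the rules given in the format table.
BINARY_FORMATS = ("IB", "PIB", "P", "PK", "RB")


def default_output(name: str, w: int, d: int) -> tuple:
    """Return the default output format (name, w, d) for an input format."""
    if name in ("F", "N"):
        # N is output as F; both widen to at least 2 + d when d > 1.
        if d > 1:
            w = max(w, 2 + d)
        return ("F", w, d)
    if name == "E":
        # Largest of input w, input d + 7, and 10; at least 3 decimals.
        return ("E", max(w, d + 7, 10), max(d, 3))
    if name in ("DOLLAR", "PCT"):
        return (name, max(w, 2), d)
    if name in BINARY_FORMATS:
        # The IB rule: F8.2 when d is 0, otherwise F(9 + d).d.
        return ("F", 8, 2) if d == 0 else ("F", 9 + d, d)
    if name == "RBHEX":
        return ("F", 8, 2)
    # Anything else is left unchanged in this sketch.
    return (name, w, d)
```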

Below are listed the input and output formats supported by PSPP.  If an
input format is mapped to a different output format by default, then
that mapping is indicated with @result{}.  Each format has the listed
bounds on input width (iw) and output width (ow).

The standard numeric input and output formats are given in the following
table:

@table @asis
@item Fw.d: 1 <= iw,ow <= 40
Standard decimal format with @var{d} decimal places.  If the number is
too large to fit within the field width, it is expressed in scientific
notation (@code{1.2+34}) if @var{w} >= 6, always with at least two digits
in the exponent.  When used as an input format, scientific notation is
allowed but an E or an F must be used to introduce the exponent.

The default output format is the same as the input format, except if
@var{d} > 1.  In that case the output @var{w} is always made to be at
least 2 + @var{d}.

@item Ew.d: 1 <= iw <= 40; 6 <= ow <= 40
For input, this is equivalent to F format except that no E or F is
required to introduce the exponent.  For output, it produces scientific
notation in the form @code{1.2+34}.  There are always at least two
digits given in the exponent.

The default output @var{w} is the largest of the input @var{w}, the
input @var{d} + 7, and 10.  The default output @var{d} is the input
@var{d}, but at least 3.

@item COMMAw.d: 1 <= iw,ow <= 40
Equivalent to F format, except that groups of three digits are
comma-separated on output.  If the number is too large to fit in the
field width, the commas are dropped first; if there is still not enough
space, the number is expressed in scientific notation, provided that
@var{w} >= 6.  Commas are allowed and ignored when this is used as an
input format.

@item DOTw.d: 1 <= iw,ow <= 40
Equivalent to COMMA format, except that the roles of comma and decimal
point are interchanged.  However, if SET /DECIMAL=COMMA is in effect,
then COMMA uses @samp{,} for a decimal point and DOT uses @samp{.} for
a decimal point.

@item DOLLARw.d: 1 <= iw <= 40; 2 <= ow <= 40
Equivalent to COMMA format, except that the number is prefixed by a
dollar sign (@samp{$}) if there is room.  On input the value is allowed
to be prefixed by a dollar sign, which is ignored.

The default output @var{w} is the input @var{w}, but at least 2.

@item PCTw.d: 2 <= iw,ow <= 40
Equivalent to F format, except that the number is suffixed by a percent
sign (@samp{%}) if there is room.  On input the value is allowed to be
suffixed by a percent sign, which is ignored.

The default output @var{w} is the input @var{w}, but at least 2.

@item Nw.d: 1 <= iw,ow <= 40
Only digits are allowed within the field width.  The decimal point is
assumed to be @var{d} digits from the right margin.

The default output format is F with the same @var{w} and @var{d}, except
if @var{d} > 1.  In that case the output @var{w} is always made to be at
least 2 + @var{d}.

@item Zw.d @result{} F: 1 <= iw,ow <= 40
Zoned decimal input.  If you need to use this then you know how.

@item IBw.d @result{} F: 1 <= iw,ow <= 8
Integer binary format.  The field is interpreted as a fixed-point
positive or negative binary number in two's-complement notation.  The
location of the decimal point is implied.  Endianness is the same as
that of the host machine.

The default output format is F8.2 if @var{d} is 0.  Otherwise it is F,
with output @var{w} as 9 + input @var{d} and output @var{d} as input
@var{d}.

@item PIBw.d @result{} F: 1 <= iw,ow <= 8
Positive integer binary format.  The field is interpreted as a
fixed-point positive binary number.  The location of the decimal point
is implied.  Endianness is the same as that of the host machine.

The default output format follows the rules for IB format.

@item Pw.d @result{} F: 1 <= iw,ow <= 16
Binary coded decimal format.  Each byte from left to right, except the
rightmost, represents two digits.  The upper nibble of each byte is more
significant.  The upper nibble of the final byte is the least
significant digit.  The lower nibble of the final byte is the sign; a
value of D represents a negative sign and all other values are
considered positive.  The decimal point is implied.

The default output format follows the rules for IB format.

@item PKw.d @result{} F: 1 <= iw,ow <= 16
Positive binary coded decimal format.  Same as P, except that the last
byte holds two digits like the others, so there is no sign nibble.

The default output format follows the rules for IB format.

@item RBw @result{} F: 2 <= iw,ow <= 8
Native binary floating-point format, the same as the C @code{double}
type on the host machine, so its layout is architecture-dependent.  For
a standard IEEE 754 implementation, @var{w} should be 8.

The default output format follows the rules for IB format.

@item PIBHEXw.d @result{} F: 2 <= iw,ow <= 16
PIB format encoded as textual hex digit pairs.  @var{w} must be even.

The input width is mapped to a default output width as follows:
2@result{}4, 4@result{}6, 6@result{}9, 8@result{}11, 10@result{}14,
12@result{}16, 14@result{}18, 16@result{}21.  No allowances are made
for decimal places.

@item RBHEXw @result{} F: 4 <= iw,ow <= 16
RB format encoded as textual hex digit pairs.  @var{w} must be even.

The default output format is F8.2.

@item CCAw.d: 1 <= ow <= 40
@itemx CCBw.d: 1 <= ow <= 40
@itemx CCCw.d: 1 <= ow <= 40
@itemx CCDw.d: 1 <= ow <= 40
@itemx CCEw.d: 1 <= ow <= 40

User-defined custom currency formats.  May not be used as an input
format.  @xref{SET}, for more details.
@end table
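The binary layouts of the IB, PIB, P, and PK formats described above can be illustrated with the following decoders.  They are a sketch that assumes the field has already been read as raw bytes and that @var{d} gives the number of implied decimal places.  The function names are hypothetical.

```python
import sys

# Hypothetical decoders for some binary input formats; d is the number of
# implied decimal places.


def decode_ib(data: bytes, d: int = 0) -> float:
    """IB: two's-complement integer in host byte order."""
    return int.from_bytes(data, sys.byteorder, signed=True) / 10 ** d


def decode_pib(data: bytes, d: int = 0) -> float:
    """PIB: unsigned integer in host byte order."""
    return int.from_bytes(data, sys.byteorder, signed=False) / 10 ** d


def decode_p(data: bytes, d: int = 0) -> float:
    """P: packed decimal; the low nibble of the last byte is the sign."""
    digits = []
    for byte in data[:-1]:
        # Upper nibble is the more significant digit of each byte.
        digits += [byte >> 4, byte & 0x0F]
    # Upper nibble of the final byte is the least significant digit.
    digits.append(data[-1] >> 4)
    value = 0
    for digit in digits:
        value = value * 10 + digit
    sign = -1 if (data[-1] & 0x0F) == 0x0D else 1
    return sign * value / 10 ** d


def decode_pk(data: bytes, d: int = 0) -> float:
    """PK: like P, but every byte, including the last, holds two digits."""
    value = 0
    for byte in data:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value / 10 ** d


print(decode_p(bytes([0x12, 0x34, 0x5D]), 2))  # -123.45
```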

The date and time numeric input formats accept input in a number of
layouts.  Before describing the formats themselves, it helps to define
the elements from which they are built:

@table @dfn
@item leader
All formats accept an optional white space leader.

@item day
An integer between 1 and 31 representing the day of month.

@item day-count
An integer representing a number of days.

@item date-delimiter
One or more characters of white space or the following characters:
@code{- / . ,}

@item month
A month, in one of the following forms:
@itemize @bullet
@item
An integer between 1 and 12.
@item
Roman numerals representing an integer between 1 and 12.
@item
At least the first three characters of an English month name (January,
February, @dots{}).
@end itemize

@item year
An integer year number between 1582 and 19999, or between 1 and 199.
Years between 1 and 199 will have 1900 added.

@item julian
A single number with a year number in the first 2, 3, or 4 digits (as
above) and the day number within the year in the last 3 digits.

@item quarter
An integer between 1 and 4 representing a quarter.

@item q-delimiter
The letter @samp{Q} or @samp{q}.

@item week
An integer between 1 and 53 representing a week within a year.

@item wk-delimiter
The letters @samp{wk} in any case.

@item time-delimiter
At least one character of white space, @samp{:}, or @samp{.}.

@item hour
An integer greater than 0 representing an hour.

@item minute
An integer between 0 and 59 representing a minute within an hour.

@item opt-second
Optionally, a time-delimiter followed by a real number representing a
number of seconds.

@item hour24
An integer between 0 and 23 representing an hour within a day.

@item weekday
At least the first two characters of an English day name.

@item spaces
Any amount or no amount of white space.

@item sign
An optional positive or negative sign.

@item trailer
All formats accept an optional white space trailer.
@end table
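To make the element definitions concrete, here is a sketch of how the month and year elements might be recognized.  The helper names are hypothetical, and PSPP's own date parser may treat edge cases differently.

```python
# Hypothetical helpers recognizing the month and year elements defined
# above; PSPP's own date parser may differ in edge cases.
MONTH_NAMES = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
               "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER",
               "DECEMBER"]
ROMAN = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X",
         "XI", "XII"]


def parse_month(token: str) -> int:
    """Return 1-12 for an integer, Roman numeral, or English month name."""
    t = token.upper()
    if t.isdigit():
        if 1 <= int(t) <= 12:
            return int(t)
    elif t in ROMAN:
        return ROMAN.index(t) + 1
    elif len(t) >= 3:
        # At least the first three letters of an English month name.
        for number, name in enumerate(MONTH_NAMES, start=1):
            if name.startswith(t):
                return number
    raise ValueError("not a month: " + token)


def parse_year(token: str) -> int:
    """Apply the year rules: 1582-19999 as is, 1-199 plus 1900."""
    year = int(token)
    if 1 <= year <= 199:
        return year + 1900
    if 1582 <= year <= 19999:
        return year
    raise ValueError("year out of range: " + token)
```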

The date input formats are strung together from the above pieces.  On
output, the date formats are always printed in a single canonical
manner, based on field width.  The date input and output formats are
described below:

@table @asis
@item DATEw: 9 <= iw,ow <= 40
Date format.  Input format: leader + day + date-delimiter +
month + date-delimiter + year + trailer.  Output format: DD-MMM-YY for
@var{w} < 11, DD-MMM-YYYY otherwise.

@item EDATEw: 8 <= iw,ow <= 40
European date format.  Input format same as DATE.  Output format:
DD.MM.YY for @var{w} < 10, DD.MM.YYYY otherwise.

@item SDATEw: 8 <= iw,ow <= 40
Standard date format.  Input format: leader + year + date-delimiter +
month + date-delimiter + day + trailer.  Output format: YY/MM/DD for
@var{w} < 10, YYYY/MM/DD otherwise.

@item ADATEw: 8 <= iw,ow <= 40
American date format.  Input format: leader + month + date-delimiter +
day + date-delimiter + year + trailer.  Output format: MM/DD/YY for
@var{w} < 10, MM/DD/YYYY otherwise.

@item JDATEw: 5 <= iw,ow <= 40
Julian date format.  Input format: leader + julian + trailer.  Output
format: YYDDD for @var{w} < 7, YYYYDDD otherwise.

@item QYRw: 4 <= iw <= 40, 6 <= ow <= 40
Quarter/year format.  Input format: leader + quarter + q-delimiter +
year + trailer.  Output format: @samp{Q Q YY} for @var{w} < 8,
@samp{Q Q YYYY} otherwise, where the first @samp{Q} is the quarter
number (1, 2, 3, or 4) and the second is the letter @samp{Q}.

@item MOYRw: 6 <= iw,ow <= 40
Month/year format.  Input format: leader + month + date-delimiter + year
+ trailer.  Output format: @samp{MMM YY} for @var{w} < 8, @samp{MMM
YYYY} otherwise.

@item WKYRw: 6 <= iw <= 40, 8 <= ow <= 40
Week/year format.  Input format: leader + week + wk-delimiter + year +
trailer.  Output format: @samp{WW WK YY} for @var{w} < 10, @samp{WW WK
YYYY} otherwise.

@item DATETIMEw.d: 17 <= iw,ow <= 40
Date and time format.  Input format: leader + day + date-delimiter +
month + date-delimiter + year + time-delimiter + hour24 + time-delimiter
+ minute + opt-second.  Output format: @samp{DD-MMM-YYYY HH:MM}.  If
@var{w} > 19, seconds (@samp{:SS}) are added.  If @var{w} > 22 and
@var{d} > 0, fractional seconds (@samp{.SS}) are also added.

@item TIMEw.d: 5 <= iw,ow <= 40
Time format.  Input format: leader + sign + spaces + hour +
time-delimiter + minute + opt-second.  Output format: @samp{HH:MM}.
Seconds and fractional seconds are available with @var{w} of at least 8
and 10, respectively.

@item DTIMEw.d: 1 <= iw <= 40, 8 <= ow <= 40
Time format with day count.  Input format: leader + sign + spaces +
day-count + time-delimiter + hour + time-delimiter + minute +
opt-second.  Output format: @samp{DD HH:MM}.  Seconds and fractional
seconds are available with @var{w} of at least 11 and 13, respectively.

@item WKDAYw: 2 <= iw,ow <= 40
A weekday as a number between 1 and 7, where 1 is Sunday.  Input format:
leader + weekday + trailer.  Output format: as many characters, in all
capital letters, of the English name of the weekday as will fit in the
field width.

@item MONTHw: 3 <= iw,ow <= 40
A month as a number between 1 and 12, where 1 is January.  Input format:
leader + month + trailer.  Output format: as many characters, in all
capital letters, of the English name of the month as will fit in the
field width.
@end table
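The width-dependent output rules for @code{DATE}, @code{EDATE}, @code{SDATE}, and @code{ADATE} can be sketched as follows.  The function @code{format_date} is hypothetical.  It uses English month abbreviations and does not pad the result to the full field width.

```python
import datetime

# Hypothetical sketch of the width-dependent output rules for four of the
# date formats above; the result is not padded to the field width.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def format_date(fmt: str, w: int, day: datetime.date) -> str:
    """Render day in DATE, EDATE, SDATE, or ADATE format of width w."""
    # Four-digit years only when the field is wide enough for them.
    four = w >= (11 if fmt == "DATE" else 10)
    yy = "%04d" % day.year if four else "%02d" % (day.year % 100)
    dd, mm = "%02d" % day.day, "%02d" % day.month
    if fmt == "DATE":
        return "-".join([dd, MONTHS[day.month - 1], yy])
    if fmt == "EDATE":
        return ".".join([dd, mm, yy])
    if fmt == "SDATE":
        return "/".join([yy, mm, dd])
    if fmt == "ADATE":
        return "/".join([mm, dd, yy])
    raise ValueError("unsupported format: " + fmt)


print(format_date("DATE", 11, datetime.date(2005, 5, 4)))  # 04-MAY-2005
```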

There are only two formats that may be used with string variables:

@table @asis
@item Aw: 1 <= iw <= 255, 1 <= ow <= 254
The entire field is treated as a string value.

@item AHEXw @result{} A: 2 <= iw <= 254; 2 <= ow <= 510
The field is composed of characters in a string encoded as textual hex
digit pairs.

The default output @var{w} is half the input @var{w}.
@end table
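AHEX encoding is easy to illustrate.  Each pair of hex digits encodes one character, so the decoded string is half as wide as the input field, which matches the default output width given above.  This sketch assumes ASCII text, and the function names are hypothetical.

```python
# Hypothetical AHEX helpers, assuming ASCII text.
def decode_ahex(field: str) -> str:
    """Turn a field of hex digit pairs into the string it encodes."""
    if len(field) % 2 != 0:
        raise ValueError("hex digits must come in pairs")
    return bytes.fromhex(field).decode("ascii")


def encode_ahex(text: str) -> str:
    """Inverse of decode_ahex: two uppercase hex digits per character."""
    return text.encode("ascii").hex().upper()


print(decode_ahex("50535050"))  # PSPP
```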

@node Scratch Variables,  , Input/Output Formats, Variables
@subsection Scratch Variables

Most of the time, variables don't retain their values between cases.
Instead, either they are read from a data file or the active file, in
which case they take the value read, or, if they are created with
@cmd{COMPUTE} or another transformation, they are initialized to the
system-missing value or to blanks, depending on type.

However, sometimes it's useful to have a variable that keeps its value
between cases.  You can do this with @cmd{LEAVE} (@pxref{LEAVE}), or you can
use a @dfn{scratch variable}.  Scratch variables are variables whose
names begin with an octothorpe (@samp{#}).  

Scratch variables have the same properties as variables left with
@cmd{LEAVE}: they retain their values between cases, and for the first
case they are initialized to 0 or blanks.  They have the additional
property that they are deleted before the execution of any procedure.
For this reason, scratch variables can't be used for analysis.  To use
a scratch variable in an analysis, use @cmd{COMPUTE} (@pxref{COMPUTE})
to copy its value into an ordinary variable, then use that ordinary
variable in the analysis.
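The difference in behavior can be shown with a toy Python model (not PSPP code).  The model runs the equivalent of @code{COMPUTE TOTAL = TOTAL + X} and @code{COMPUTE #TOTAL = #TOTAL + X} once per case, assuming @code{TOTAL} is an ordinary numeric variable created by an earlier @cmd{COMPUTE}.  @code{None} stands in for the system-missing value, and the function name @code{simulate} is hypothetical.

```python
# Toy model, not PSPP code.  None stands in for the system-missing value.
SYSMIS = None


def simulate(values: list) -> tuple:
    """Run TOTAL = TOTAL + X and #TOTAL = #TOTAL + X once per case."""
    scratch = 0.0                  # #TOTAL starts at 0 and is retained
    ordinary_out, scratch_out = [], []
    for x in values:
        ordinary = SYSMIS          # TOTAL is reinitialized for every case
        # Arithmetic involving a missing value yields a missing value.
        ordinary = SYSMIS if ordinary is SYSMIS else ordinary + x
        scratch = scratch + x      # #TOTAL carries over from the last case
        ordinary_out.append(ordinary)
        scratch_out.append(scratch)
    return ordinary_out, scratch_out


print(simulate([1, 2, 3]))  # ([None, None, None], [1.0, 3.0, 6.0])
```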

@node Files, BNF, Variables, Language
@section Files Used by PSPP

PSPP makes use of many files each time it runs.  Some of these it
reads, some it writes, some it creates.  Here is a table listing the
most important of these files:

@table @strong
@cindex file, command
@cindex file, syntax file
@cindex command file
@cindex syntax file
@item command file
@itemx syntax file
These names (synonyms) refer to the file that contains instructions
that tell PSPP what to do.  The syntax file's name is specified on
the PSPP command line.  Syntax files can also be pulled in with
@cmd{INCLUDE} (@pxref{INCLUDE}).

@cindex file, data
@cindex data file
@item data file
Data files contain raw data in ASCII format suitable for being read in
by @cmd{DATA LIST}.  Data can be embedded in the syntax
file with @cmd{BEGIN DATA} and @cmd{END DATA}: this makes the
syntax file a data file too.

@cindex file, output
@cindex output file
@item listing file
One or more output files are created by PSPP each time it is
run.  The output files receive the tables and charts produced by
statistical procedures.  The output files may be in any number of formats,
depending on how PSPP is configured.

@cindex active file
@cindex file, active
@item active file
The active file is the ``file'' on which all PSPP procedures
are performed.  The active file contains variable definitions and
cases.  The active file is not necessarily a disk file: it is stored
in memory if there is room.
@end table

@node BNF,  , Files, Language
@section Backus-Naur Form
@cindex BNF
@cindex Backus-Naur Form
@cindex command syntax, description of
@cindex description of command syntax

The syntax of some parts of the PSPP language is presented in this
manual using the formalism known as @dfn{Backus-Naur Form}, or BNF.  The
following list describes BNF:

@itemize @bullet
@cindex keywords
@cindex terminals
@item
Words in all-uppercase are PSPP keyword tokens.  In BNF, these are
often called @dfn{terminals}.  There are some special terminals, which
are written in lowercase for clarity:

@table @asis
@cindex @code{number}
@item @code{number}
A real number.

@cindex @code{integer}
@item @code{integer}
An integer number.

@cindex @code{string}
@item @code{string}
A string.

@cindex @code{var-name}
@item @code{var-name}
A single variable name.

@cindex operators
@cindex punctuators
@item @code{=}, @code{/}, @code{+}, @code{-}, etc.
Operators and punctuators.

@cindex @code{.}
@item @code{.}
The end of the command.  This is not necessarily an actual dot in the
syntax file.  @xref{Commands}, for more details.
@end table

@item
@cindex productions
@cindex nonterminals
Other words in all lowercase refer to BNF definitions, called
@dfn{productions}.  These productions are also known as
@dfn{nonterminals}.  Some nonterminals are very common, so they are
defined here in English for clarity:

@table @code
@cindex @code{var-list}
@item var-list
A list of one or more variable names or the keyword @code{ALL}.

@cindex @code{expression}
@item expression
An expression.  @xref{Expressions}, for details.
@end table

@item
@cindex ``is defined as''
@cindex productions
@samp{::=} means ``is defined as''.  The left side of @samp{::=} gives
the name of the nonterminal being defined.  The right side of @samp{::=}
gives the definition of that nonterminal.  If the right side is empty,
then one possible expansion of that nonterminal is nothing.  A BNF
definition is called a @dfn{production}.

@item
@cindex terminals and nonterminals, differences
So, the key difference between a terminal and a nonterminal is that a
terminal cannot be broken into smaller parts---in fact, every terminal
is a single token (@pxref{Tokens}).  On the other hand, nonterminals are
composed of a (possibly empty) sequence of terminals and nonterminals.
Thus, terminals indicate the deepest level of syntax description.  (In
parsing theory, terminals are the leaves of the parse tree; nonterminals
form the branches.)

@item
@cindex start symbol
@cindex symbol, start
The first nonterminal defined in a set of productions is called the
@dfn{start symbol}.  The start symbol defines the entire syntax for
that command.
@end itemize
@setfilename ignored
