These commits, taken together, rip out the PSPP lexer and replace
it by one that is more flexible, easier to use, and better at handling
character encodings.

Ben Pfaff (18):
  data-reader: Remove unreachable "return" statements.
  file-name: Do not make output files line-buffered in fn_open().
  sys-file-reader: Refactor to clean up character encoding support.
  output: New function text_item_create_nocopy().
  str: New function ss_realloc().
  str: Rename ss_chomp() to ss_chomp_byte(), ds_chomp() to
    ds_chomp_byte().
  str: New functions for checking for and removing string suffixes.
  hash-functions: New function hash_case_bytes().
  i18n: New function uc_name().
  i18n: New function recode_string_len().
  i18n: New functions for truncating strings in an arbitrary encoding.
  identifier: Rename token_type_to_string() and make a new version.
  i18n: New functions and data structure for obtaining encoding info.
  encoding-guesser: New library to guess the encoding of a text file.
  u8-istream: New library for reading a text file and recoding to
    UTF-8.
  segment: New library for low-level phase of lexical syntax analysis.
  scan: New library for high-level PSPP syntax lexical analysis.
  lexer: Reimplement for better testability and internationalization.

 NEWS                                               |   46 +-
 Smake                                              |    8 +-
 doc/dev/concepts.texi                              |   25 +-
 doc/flow-control.texi                              |   33 +-
 doc/invoking.texi                                  |   20 +-
 doc/language.texi                                  |   96 +-
 doc/utilities.texi                                 |   69 +-
 perl-module/PSPP.xs                                |   14 +-
 perl-module/t/Pspp.t                               |    4 +-
 src/data/automake.mk                               |    1 +
 src/data/dictionary.c                              |  174 +-
 src/data/dictionary.h                              |   14 +-
 src/data/file-handle-def.c                         |    8 +-
 src/data/file-name.c                               |   11 +-
 src/data/gnumeric-reader.h                         |    8 +-
 src/data/identifier.c                              |  124 +-
 src/data/identifier.h                              |   12 +-
 src/data/identifier2.c                             |  133 ++
 src/data/mrset.c                                   |   33 +-
 src/data/mrset.h                                   |    7 +-
 src/data/por-file-reader.c                         |   25 +-
 src/data/por-file-writer.c                         |    5 +-
 src/data/procedure.c                               |   19 +
 src/data/procedure.h                               |    4 +-
 src/data/sys-file-reader.c                         | 2084 +++++++++++---------
 src/data/sys-file-reader.h                         |    2 +-
 src/data/sys-file-writer.c                         |   20 +-
 src/data/variable.c                                |  139 +-
 src/data/variable.h                                |    7 +-
 src/data/vector.c                                  |   15 +-
 src/data/vector.h                                  |    4 +-
 src/language/automake.mk                           |    6 -
 src/language/command.c                             |  158 +-
 src/language/command.def                           |   11 +-
 src/language/control/automake.mk                   |    3 +-
 src/language/control/do-if.c                       |   12 +-
 src/language/control/loop.c                        |    4 +-
 src/language/control/repeat.c                      |  714 +++-----
 src/language/control/temporary.c                   |    6 +-
 src/language/data-io/combine-files.c               |   21 +-
 src/language/data-io/data-list.c                   |    5 +-
 src/language/data-io/data-parser.c                 |    9 +-
 src/language/data-io/data-reader.c                 |   58 +-
 src/language/data-io/file-handle.q                 |   29 +-
 src/language/data-io/get-data.c                    |    8 +-
 src/language/data-io/inpt-pgm.c                    |   30 +-
 src/language/data-io/save-translate.c              |    2 +
 src/language/data-io/trim.c                        |    5 +-
 src/language/dictionary/apply-dictionary.c         |   11 +-
 src/language/dictionary/attributes.c               |   69 +-
 src/language/dictionary/missing-values.c           |   44 +-
 src/language/dictionary/modify-variables.c         |    4 +-
 src/language/dictionary/mrsets.c                   |   27 +-
 src/language/dictionary/numeric.c                  |   12 +-
 src/language/dictionary/rename-variables.c         |    3 +-
 src/language/dictionary/split-file.c               |    4 +-
 src/language/dictionary/sys-file-info.c            |   18 +-
 src/language/dictionary/value-labels.c             |   22 +-
 src/language/dictionary/variable-label.c           |   17 +-
 src/language/dictionary/vector.c                   |   13 +-
 src/language/dictionary/weight.c                   |    4 +-
 src/language/expressions/parse.c                   |   34 +-
 src/language/expressions/private.h                 |    5 +-
 src/language/lexer/automake.mk                     |    8 +
 src/language/lexer/include-path.c                  |   89 +
 .../msg-ui.h => language/lexer/include-path.h}     |   20 +-
 src/language/lexer/lexer.c                         | 2143 +++++++++++---------
 src/language/lexer/lexer.h                         |  158 ++-
 src/language/lexer/q2c.c                           |    6 +-
 src/language/lexer/scan.c                          |  596 ++++++
 src/language/lexer/scan.h                          |   93 +
 src/language/lexer/segment.c                       | 1631 +++++++++++++++
 src/language/lexer/segment.h                       |  122 ++
 src/language/lexer/token.c                         |  173 ++
 src/language/{prompt.h => lexer/token.h}           |   36 +-
 src/language/lexer/value-parser.c                  |    2 +-
 src/language/lexer/variable-parser.c               |   17 +-
 src/language/lexer/variable-parser.h               |   10 +-
 src/language/prompt.c                              |   75 -
 src/language/stats/aggregate.c                     |   23 +-
 src/language/stats/autorecode.c                    |    3 +-
 src/language/stats/descriptives.c                  |   23 +-
 src/language/stats/flip.c                          |    8 +-
 src/language/stats/frequencies.q                   |   25 +-
 src/language/stats/npar.c                          |   77 +-
 src/language/stats/rank.q                          |   12 +-
 src/language/stats/sort-cases.c                    |    2 +-
 src/language/syntax-file.c                         |  144 --
 src/language/syntax-file.h                         |   25 -
 src/language/syntax-string-source.c                |  151 --
 src/language/syntax-string-source.h                |   33 -
 src/language/tests/format-guesser-test.c           |    2 +-
 src/language/tests/moments-test.c                  |    2 +-
 src/language/tests/paper-size.c                    |    2 +-
 src/language/utilities/cache.c                     |    4 +-
 src/language/utilities/cd.c                        |    8 +-
 src/language/utilities/date.c                      |    4 +-
 src/language/utilities/host.c                      |   14 +-
 src/language/utilities/include.c                   |  198 +-
 src/language/utilities/permissions.c               |   13 +-
 src/language/utilities/set.q                       |   13 +-
 src/language/utilities/title.c                     |   92 +-
 src/language/xforms/compute.c                      |    6 +-
 src/language/xforms/count.c                        |   26 +-
 src/language/xforms/fail.c                         |    8 +-
 src/language/xforms/recode.c                       |   45 +-
 src/language/xforms/sample.c                       |    4 +-
 src/language/xforms/select-if.c                    |    2 +-
 src/libpspp/automake.mk                            |   10 +-
 src/libpspp/encoding-guesser.c                     |  289 +++
 src/libpspp/encoding-guesser.h                     |  126 ++
 src/libpspp/getl.c                                 |  271 ---
 src/libpspp/getl.h                                 |  113 -
 src/libpspp/hash-functions.c                       |   18 +-
 src/libpspp/hash-functions.h                       |    1 +
 src/libpspp/i18n.c                                 |  345 ++++-
 src/libpspp/i18n.h                                 |   87 +-
 src/libpspp/message.c                              |  118 +-
 src/libpspp/message.h                              |   23 +-
 src/libpspp/msg-locator.c                          |   87 -
 src/libpspp/msg-locator.h                          |   34 -
 src/{ui/terminal/msg-ui.c => libpspp/prompt.c}     |   41 +-
 src/{language => libpspp}/prompt.h                 |   17 +-
 src/libpspp/str.c                                  |   54 +-
 src/libpspp/str.h                                  |   11 +-
 src/libpspp/u8-istream.c                           |  475 +++++
 src/libpspp/u8-istream.h                           |   45 +
 src/output/driver.c                                |   56 +-
 src/output/text-item.c                             |   12 +-
 src/output/text-item.h                             |    3 +-
 src/ui/gui/automake.mk                             |    2 -
 src/ui/gui/comments-dialog.c                       |   13 +-
 src/ui/gui/executor.c                              |   19 +-
 src/ui/gui/executor.h                              |    4 +-
 src/ui/gui/main.c                                  |   39 +-
 src/ui/gui/psppire-data-window.c                   |   16 +-
 src/ui/gui/psppire-dict.c                          |    9 +-
 src/ui/gui/psppire-syntax-window.c                 |   16 +-
 src/ui/gui/psppire-syntax-window.h                 |    1 -
 src/ui/gui/psppire-var-store.c                     |    7 +-
 src/ui/gui/psppire.c                               |   34 +-
 src/ui/gui/psppire.h                               |    6 +-
 src/ui/gui/syntax-editor-source.c                  |  130 --
 src/ui/gui/syntax-editor-source.h                  |   34 -
 src/ui/gui/text-data-import-dialog.c               |    4 +-
 src/ui/source-init-opts.c                          |   20 +-
 src/ui/source-init-opts.h                          |    4 +-
 src/ui/terminal/automake.mk                        |   13 +-
 src/ui/terminal/main.c                             |  113 +-
 src/ui/terminal/read-line.h                        |   31 -
 src/ui/terminal/terminal-opts.c                    |   58 +-
 src/ui/terminal/terminal-opts.h                    |    8 +-
 src/ui/terminal/{read-line.c => terminal-reader.c} |  310 ++--
 .../repeat.h => ui/terminal/terminal-reader.h}     |   11 +-
 tests/automake.mk                                  |   43 +
 tests/data/data-in.at                              |   64 +-
 tests/data/sys-file-reader.at                      |  295 +--
 tests/dissect-sysfile.c                            |    8 +-
 tests/language/control/do-repeat.at                |  101 +-
 tests/language/data-io/data-list.at                |    6 +-
 tests/language/data-io/get.at                      |    2 -
 tests/language/data-io/inpt-pgm.at                 |    6 +-
 tests/language/data-io/print.at                    |    2 +-
 tests/language/dictionary/missing-values.at        |    2 +-
 tests/language/dictionary/sys-file-info.at         |    4 +-
 tests/language/expressions/evaluate.at             |    9 +-
 tests/language/expressions/parse.at                |    2 +-
 tests/language/lexer/lexer.at                      |   43 +
 tests/language/lexer/q2c.at                        |    2 +-
 tests/language/lexer/scan-test.c                   |  217 ++
 tests/language/lexer/scan.at                       |  818 ++++++++
 tests/language/lexer/segment-test.c                |  318 +++
 tests/language/lexer/segment.at                    | 1070 ++++++++++
 tests/language/stats/aggregate.at                  |    4 +-
 tests/language/stats/rank.at                       |    6 +-
 tests/language/utilities/insert.at                 |   29 +-
 tests/libpspp/encoding-guesser-test.c              |  102 +
 tests/libpspp/encoding-guesser.at                  |  143 ++
 tests/libpspp/i18n-test.c                          |   68 +-
 tests/libpspp/i18n.at                              |  125 +-
 tests/libpspp/u8-istream-test.c                    |  126 ++
 tests/libpspp/u8-istream.at                        |  142 ++
 182 files changed, 12211 insertions(+), 5364 deletions(-)
 create mode 100644 src/data/identifier2.c
 create mode 100644 src/language/lexer/include-path.c
 rename src/{ui/terminal/msg-ui.h => language/lexer/include-path.h} (66%)
 create mode 100644 src/language/lexer/scan.c
 create mode 100644 src/language/lexer/scan.h
 create mode 100644 src/language/lexer/segment.c
 create mode 100644 src/language/lexer/segment.h
 create mode 100644 src/language/lexer/token.c
 copy src/language/{prompt.h => lexer/token.h} (50%)
 delete mode 100644 src/language/prompt.c
 delete mode 100644 src/language/syntax-file.c
 delete mode 100644 src/language/syntax-file.h
 delete mode 100644 src/language/syntax-string-source.c
 delete mode 100644 src/language/syntax-string-source.h
 create mode 100644 src/libpspp/encoding-guesser.c
 create mode 100644 src/libpspp/encoding-guesser.h
 delete mode 100644 src/libpspp/getl.c
 delete mode 100644 src/libpspp/getl.h
 delete mode 100644 src/libpspp/msg-locator.c
 delete mode 100644 src/libpspp/msg-locator.h
 rename src/{ui/terminal/msg-ui.c => libpspp/prompt.c} (58%)
 rename src/{language => libpspp}/prompt.h (69%)
 create mode 100644 src/libpspp/u8-istream.c
 create mode 100644 src/libpspp/u8-istream.h
 delete mode 100644 src/ui/gui/syntax-editor-source.c
 delete mode 100644 src/ui/gui/syntax-editor-source.h
 delete mode 100644 src/ui/terminal/read-line.h
 rename src/ui/terminal/{read-line.c => terminal-reader.c} (53%)
 rename src/{language/control/repeat.h => ui/terminal/terminal-reader.h} (77%)
 create mode 100644 tests/language/lexer/scan-test.c
 create mode 100644 tests/language/lexer/scan.at
 create mode 100644 tests/language/lexer/segment-test.c
 create mode 100644 tests/language/lexer/segment.at
 create mode 100644 tests/libpspp/encoding-guesser-test.c
 create mode 100644 tests/libpspp/encoding-guesser.at
 create mode 100644 tests/libpspp/u8-istream-test.c
 create mode 100644 tests/libpspp/u8-istream.at

-- 
1.7.2.3


_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev

Reply via email to