Re: [PATCH] Make comm check order of input files

2008-04-21 Thread Bo Borgerson
Hi,

The previous version did not warn if the final record in a file was
out of order and `--check-order' was not in effect.
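To illustrate the kind of check involved, here is a minimal sketch that is not taken from src/comm.c. It compares every line with its predecessor, so the final record is covered as well. The helper name first_disordered_line is hypothetical, and strcmp stands in for whatever comparison comm really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: return the index of the
   first line that sorts before its predecessor, or -1 if LINES is in
   order.  strcmp is an assumption made for brevity.  */
static long
first_disordered_line (const char *const *lines, size_t n_lines)
{
  assert (lines != NULL || n_lines == 0);

  for (size_t i = 1; i < n_lines; i++)
    {
      /* i reaches n_lines - 1, so the last record is compared too.  */
      if (strcmp (lines[i - 1], lines[i]) > 0)
        return (long) i;
    }
  return -1;
}
```

Called on the three lines "a", "c", "b", this returns 2, the index of the misplaced final line.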

Thanks,

Bo
From dc34eed9e6ee34f473a8d74b98bccaf082fe79c2 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Sun, 20 Apr 2008 21:24:16 -0400
Subject: [PATCH] Make comm check order of input files

* NEWS: List new behavior.
* doc/coreutils.texi (checkOrderOption): New macro for
describing `--check-order' and `--nocheck-order', used in
both join and comm.
* src/comm.c (main): Initialize new options.
(usage): Describe new options.
(compare_files): Keep an extra pair of buffers for the previous
line from each file to check the internal order.
(check_order): If an order-check is required, compare and handle
the result appropriately.
(copylinebuffer): Copy a linebuffer; used for copy before read.
* tests/misc/Makefile.am: List new test.
* tests/misc/comm: Tests for the comm program, including the
new order-checking functionality and attendant command-line options.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 NEWS                   |    8 ++
 doc/coreutils.texi     |   39 +++---
 src/comm.c             |  178 +++
 tests/misc/Makefile.am |    1 +
 tests/misc/comm        |  131 +++
 5 files changed, 329 insertions(+), 28 deletions(-)
 create mode 100755 tests/misc/comm

diff --git a/NEWS b/NEWS
index 04893c6..4038da2 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,13 @@
 GNU coreutils NEWS                                    -*- outline -*-
 
+* Noteworthy changes in release ??
+
+** New features
+
+  comm now verifies that the inputs are in sorted order.  This check can
+  be turned off with the --nocheck-order option.
+
+
 * Noteworthy changes in release 6.11 (2008-04-19) [stable]
 
 ** Bug fixes
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f42e736..5ed7f43 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4342,6 +4342,32 @@ status that does not depend on the result of the comparison.
 Upon normal completion @command{comm} produces an exit code of zero.
 If there is an error it exits with nonzero status.
 
+@macro checkOrderOption{cmd}
+If the @option{--check-order} option is given, unsorted inputs will
+cause a fatal error message.  If the option @option{--nocheck-order}
+is given, unsorted inputs will never cause an error message.  If
+neither of these options is given, wrongly sorted inputs are diagnosed
+only if an input file is found to contain unpairable lines.  If an
+input file is diagnosed as being unsorted, the @command{\cmd\} command
+will exit with a nonzero status (and the output should not be used).
+
+Forcing @command{\cmd\} to process wrongly sorted input files
+containing unpairable lines by specifying @option{--nocheck-order} is
+not guaranteed to produce any particular output.  The output will
+probably not correspond with whatever you hoped it would be.
+@end macro
+@checkOrderOption{comm}
+
+@table @samp
+
+@item --check-order
+Fail with an error message if either input file is wrongly ordered.
+
+@item --nocheck-order
+Do not check that both input files are in sorted order.
+
+@end table
+
 
 @node tsort invocation
 @section @command{tsort}: Topological sort
@@ -5183,18 +5209,7 @@ c c1 c2
 b b1 b2
 @end example
 
-If the @option{--check-order} option is given, unsorted inputs will
-cause a fatal error message.  If the option @option{--nocheck-order}
-is given, unsorted inputs will never cause an error message.  If
-neither of these options is given, wrongly sorted inputs are diagnosed
-only if an input file is found to contain unpairable lines.  If an
-input file is diagnosed as being unsorted, the @command{join} command
-will exit with a nonzero status (and the output should not be used).
-
-Forcing @command{join} to process wrongly sorted input files
-containing unpairable lines by specifying @option{--nocheck-order} is
-not guaranteed to produce any particular output.  The output will
-probably not correspond with whatever you hoped it would be.
+@checkOrderOption{join}
 
 The defaults are:
 @itemize
diff --git a/src/comm.c b/src/comm.c
index cbda362..0a9e8b9 100644
--- a/src/comm.c
+++ b/src/comm.c
@@ -52,8 +52,31 @@ static bool only_file_2;
 /* If true, print lines that are found in both files. */
 static bool both;
 
+/* If nonzero, we have seen at least one unpairable line. */
+static bool seen_unpairable;
+
+/* If nonzero, we have warned about disorder in that file. */
+static bool issued_disorder_warning[2];
+
+/* If nonzero, check that the input is correctly ordered. */
+static enum
+  {
+CHECK_ORDER_DEFAULT,
+CHECK_ORDER_ENABLED,
+CHECK_ORDER_DISABLED
+  } check_input_order;
+
+enum
+{
+  CHECK_ORDER_OPTION = CHAR_MAX + 1,
+  NOCHECK_ORDER_OPTION
+};
+
+
 static struct option const long_options[] =
 {
+  {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
+  {nocheck-order

Re: [PATCH] Make comm check order of input files

2008-04-21 Thread Bo Borgerson
Hi,

Pádraig pointed out that there's no reason to copy data around here.

This version avoids the copies.
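As a rough illustration of the general idea only (this is not the code from the revised compare_files), the previous line can be kept without copying by exchanging two buffer pointers after each read. The struct line type and the rotate_lines helper below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for a line buffer.  */
struct line
{
  char text[64];
};

/* Make the line just read the "previous" one and hand the old
   "previous" storage back for the next read.  Only the two pointers
   change; no line data is copied.  */
static void
rotate_lines (struct line **current, struct line **previous)
{
  assert (current != NULL && previous != NULL);

  struct line *tmp = *previous;
  *previous = *current;
  *current = tmp;
}
```

After each read the caller would invoke rotate_lines (&cur, &prev) and then read the next line into *cur.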

Thanks Pádraig,

Bo
From 49ec3883efc8a89e8a4260f25bb50178aced1be4 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Sun, 20 Apr 2008 21:24:16 -0400
Subject: [PATCH] Make comm check order of input files

* NEWS: List new behavior.
* doc/coreutils.texi (checkOrderOption): New macro for
describing `--check-order' and `--nocheck-order', used in
both join and comm.
* src/comm.c (main): Initialize new options.
(usage): Describe new options.
(compare_files): Keep an extra pair of buffers for the previous
line from each file to check the internal order.
(check_order): If an order-check is required, compare and handle
the result appropriately.
(copylinebuffer): Copy a linebuffer; used for copy before read.
* tests/misc/Makefile.am: List new test.
* tests/misc/comm: Tests for the comm program, including the
new order-checking functionality and attendant command-line options.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 NEWS                   |    8 ++
 doc/coreutils.texi     |   39 
 src/comm.c             |  166 ++--
 tests/misc/Makefile.am |    1 +
 tests/misc/comm        |  131 ++
 5 files changed, 313 insertions(+), 32 deletions(-)
 create mode 100755 tests/misc/comm

diff --git a/NEWS b/NEWS
index 04893c6..4038da2 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,13 @@
 GNU coreutils NEWS                                    -*- outline -*-
 
+* Noteworthy changes in release ??
+
+** New features
+
+  comm now verifies that the inputs are in sorted order.  This check can
+  be turned off with the --nocheck-order option.
+
+
 * Noteworthy changes in release 6.11 (2008-04-19) [stable]
 
 ** Bug fixes
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f42e736..5ed7f43 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4342,6 +4342,32 @@ status that does not depend on the result of the comparison.
 Upon normal completion @command{comm} produces an exit code of zero.
 If there is an error it exits with nonzero status.
 
+@macro checkOrderOption{cmd}
+If the @option{--check-order} option is given, unsorted inputs will
+cause a fatal error message.  If the option @option{--nocheck-order}
+is given, unsorted inputs will never cause an error message.  If
+neither of these options is given, wrongly sorted inputs are diagnosed
+only if an input file is found to contain unpairable lines.  If an
+input file is diagnosed as being unsorted, the @command{\cmd\} command
+will exit with a nonzero status (and the output should not be used).
+
+Forcing @command{\cmd\} to process wrongly sorted input files
+containing unpairable lines by specifying @option{--nocheck-order} is
+not guaranteed to produce any particular output.  The output will
+probably not correspond with whatever you hoped it would be.
+@end macro
+@checkOrderOption{comm}
+
+@table @samp
+
+@item --check-order
+Fail with an error message if either input file is wrongly ordered.
+
+@item --nocheck-order
+Do not check that both input files are in sorted order.
+
+@end table
+
 
 @node tsort invocation
 @section @command{tsort}: Topological sort
@@ -5183,18 +5209,7 @@ c c1 c2
 b b1 b2
 @end example
 
-If the @option{--check-order} option is given, unsorted inputs will
-cause a fatal error message.  If the option @option{--nocheck-order}
-is given, unsorted inputs will never cause an error message.  If
-neither of these options is given, wrongly sorted inputs are diagnosed
-only if an input file is found to contain unpairable lines.  If an
-input file is diagnosed as being unsorted, the @command{join} command
-will exit with a nonzero status (and the output should not be used).
-
-Forcing @command{join} to process wrongly sorted input files
-containing unpairable lines by specifying @option{--nocheck-order} is
-not guaranteed to produce any particular output.  The output will
-probably not correspond with whatever you hoped it would be.
+@checkOrderOption{join}
 
 The defaults are:
 @itemize
diff --git a/src/comm.c b/src/comm.c
index cbda362..b2b2bba 100644
--- a/src/comm.c
+++ b/src/comm.c
@@ -52,8 +52,31 @@ static bool only_file_2;
 /* If true, print lines that are found in both files. */
 static bool both;
 
+/* If nonzero, we have seen at least one unpairable line. */
+static bool seen_unpairable;
+
+/* If nonzero, we have warned about disorder in that file. */
+static bool issued_disorder_warning[2];
+
+/* If nonzero, check that the input is correctly ordered. */
+static enum
+  {
+CHECK_ORDER_DEFAULT,
+CHECK_ORDER_ENABLED,
+CHECK_ORDER_DISABLED
+  } check_input_order;
+
+enum
+{
+  CHECK_ORDER_OPTION = CHAR_MAX + 1,
+  NOCHECK_ORDER_OPTION
+};
+
+
 static struct option const long_options[] =
 {
+  {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
+  {nocheck-order

Re: [PATCH] Make comm check order of input files

2008-04-21 Thread Jim Meyering
Bo Borgerson [EMAIL PROTECTED] wrote:
> Pádraig pointed out that there's no reason to copy data around here.
>
> This version avoids the copies.
>
> Thanks Pádraig,

Thanks from me, too.

If you guys can help by reviewing others' changes, that would help me.
In the run-up to 6.11, quite a few large patches have accumulated, and
all by myself, it's going to take a while to work through all that.
If you do review something, and find nothing wrong with it, please say
so, publicly.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


[PATCH] Make comm check order of input files

2008-04-20 Thread Bo Borgerson
On Sun, Apr 20, 2008 at 8:35 PM, Karl Berry [EMAIL PROTECTED] wrote:
> If not, I'll be happy to do it.
>
> Please!

Here's a patch.
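The three modes described in the documentation part of the patch can be summarized with a small sketch. The check_mode states mirror the check_input_order enum in the patch; the disorder_action type and the action_for_disorder function are hypothetical names introduced only for illustration, and this is one reading of the documented behavior rather than the logic in src/comm.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Same three states as the check_input_order enum in the patch.  */
enum check_mode
{
  CHECK_ORDER_DEFAULT,
  CHECK_ORDER_ENABLED,
  CHECK_ORDER_DISABLED
};

/* Hypothetical result type, introduced only for this sketch.  */
enum disorder_action
{
  DISORDER_IGNORE,    /* say nothing */
  DISORDER_DIAGNOSE,  /* report it and exit nonzero at the end */
  DISORDER_FATAL      /* fail with an error message */
};

/* Decide what to do once a wrongly ordered line has been found.  */
static enum disorder_action
action_for_disorder (enum check_mode mode, bool seen_unpairable)
{
  switch (mode)
    {
    case CHECK_ORDER_ENABLED:
      return DISORDER_FATAL;
    case CHECK_ORDER_DISABLED:
      return DISORDER_IGNORE;
    case CHECK_ORDER_DEFAULT:
      /* Without either option, disorder is diagnosed only if an
         unpairable line has been seen.  */
      return seen_unpairable ? DISORDER_DIAGNOSE : DISORDER_IGNORE;
    }
  assert (!"unreachable: unknown check_mode");
  return DISORDER_IGNORE;
}
```

For example, action_for_disorder (CHECK_ORDER_DEFAULT, true) yields DISORDER_DIAGNOSE, while the same call with --nocheck-order in effect yields DISORDER_IGNORE.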

Bo
From 1a651ab6aedea0d0cc383f2e60c82fe7f0d395f0 Mon Sep 17 00:00:00 2001
From: Bo Borgerson [EMAIL PROTECTED]
Date: Sun, 20 Apr 2008 21:24:16 -0400
Subject: [PATCH] Make comm check order of input files

* NEWS: List new behavior.
* doc/coreutils.texi (checkOrderOption): New macro for
describing `--check-order' and `--nocheck-order', used in
both join and comm.
* src/comm.c (main): Initialize new options.
(usage): Describe new options.
(compare_files): Keep an extra pair of buffers for the previous
line from each file to check the internal order.
(check_order): If an order-check is required, compare and handle
the result appropriately.
(copylinebuffer): Copy a linebuffer; used for copy before read.
* tests/misc/Makefile.am: List new test.
* tests/misc/comm: Tests for the comm program, including the
new order-checking functionality and attendant command-line options.

Signed-off-by: Bo Borgerson [EMAIL PROTECTED]
---
 NEWS                   |    8 +++
 doc/coreutils.texi     |   39 
 src/comm.c             |  158 +++-
 tests/misc/Makefile.am |    1 +
 tests/misc/comm        |  121 
 5 files changed, 300 insertions(+), 27 deletions(-)
 create mode 100755 tests/misc/comm

diff --git a/NEWS b/NEWS
index 04893c6..4038da2 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,13 @@
 GNU coreutils NEWS                                    -*- outline -*-
 
+* Noteworthy changes in release ??
+
+** New features
+
+  comm now verifies that the inputs are in sorted order.  This check can
+  be turned off with the --nocheck-order option.
+
+
 * Noteworthy changes in release 6.11 (2008-04-19) [stable]
 
 ** Bug fixes
diff --git a/doc/coreutils.texi b/doc/coreutils.texi
index f42e736..5ed7f43 100644
--- a/doc/coreutils.texi
+++ b/doc/coreutils.texi
@@ -4342,6 +4342,32 @@ status that does not depend on the result of the comparison.
 Upon normal completion @command{comm} produces an exit code of zero.
 If there is an error it exits with nonzero status.
 
+@macro checkOrderOption{cmd}
+If the @option{--check-order} option is given, unsorted inputs will
+cause a fatal error message.  If the option @option{--nocheck-order}
+is given, unsorted inputs will never cause an error message.  If
+neither of these options is given, wrongly sorted inputs are diagnosed
+only if an input file is found to contain unpairable lines.  If an
+input file is diagnosed as being unsorted, the @command{\cmd\} command
+will exit with a nonzero status (and the output should not be used).
+
+Forcing @command{\cmd\} to process wrongly sorted input files
+containing unpairable lines by specifying @option{--nocheck-order} is
+not guaranteed to produce any particular output.  The output will
+probably not correspond with whatever you hoped it would be.
+@end macro
+@checkOrderOption{comm}
+
+@table @samp
+
+@item --check-order
+Fail with an error message if either input file is wrongly ordered.
+
+@item --nocheck-order
+Do not check that both input files are in sorted order.
+
+@end table
+
 
 @node tsort invocation
 @section @command{tsort}: Topological sort
@@ -5183,18 +5209,7 @@ c c1 c2
 b b1 b2
 @end example
 
-If the @option{--check-order} option is given, unsorted inputs will
-cause a fatal error message.  If the option @option{--nocheck-order}
-is given, unsorted inputs will never cause an error message.  If
-neither of these options is given, wrongly sorted inputs are diagnosed
-only if an input file is found to contain unpairable lines.  If an
-input file is diagnosed as being unsorted, the @command{join} command
-will exit with a nonzero status (and the output should not be used).
-
-Forcing @command{join} to process wrongly sorted input files
-containing unpairable lines by specifying @option{--nocheck-order} is
-not guaranteed to produce any particular output.  The output will
-probably not correspond with whatever you hoped it would be.
+@checkOrderOption{join}
 
 The defaults are:
 @itemize
diff --git a/src/comm.c b/src/comm.c
index cbda362..5b1e5a2 100644
--- a/src/comm.c
+++ b/src/comm.c
@@ -52,8 +52,31 @@ static bool only_file_2;
 /* If true, print lines that are found in both files. */
 static bool both;
 
+/* If nonzero, we have seen at least one unpairable line. */
+static bool seen_unpairable;
+
+/* If nonzero, we have warned about disorder in that file. */
+static bool issued_disorder_warning[2];
+
+/* If nonzero, check that the input is correctly ordered. */
+static enum
+  {
+CHECK_ORDER_DEFAULT,
+CHECK_ORDER_ENABLED,
+CHECK_ORDER_DISABLED
+  } check_input_order;
+
+enum
+{
+  CHECK_ORDER_OPTION = CHAR_MAX + 1,
+  NOCHECK_ORDER_OPTION
+};
+
+
 static struct option const long_options[] =
 {
+  {"check-order", no_argument, NULL, CHECK_ORDER_OPTION},
+  {nocheck