Hi,
here is UTF-8 support for fmt(1).
This does not include the -c case; the patch is already large enough.
Because tedu@ said he didn't see value in splitting the cut(1) diff,
I dare to send this as one big patch. If anybody wants it split
into steps for easier review and a safer transition, please
just say so. But I don't think changing this program is particularly
dangerous.
The main changes are in three areas:
1. get_line():
This function can no longer expand tabs up front because their width
depends on the display width of characters earlier on the line.
This change causes minor growth in indent_length().
While here, always NUL-terminate the input buffer. It's safer and
simplifies the code, also reducing the number of arguments for two
functions.
Also delete the contorted spaces_pending logic in get_line(), simply
trim trailing whitespace at the end, and delete the pointless XMALLOC
macro.
2. process_stream():
It used to iterate over bytes, now it iterates over characters. The
code becomes a bit longer, but because it uses mbtowc(3), wcwidth(3),
and iswblank(3) directly, it remains quite readable in this case.
3. output_word():
Needs both the length of the word in bytes and the width in output
positions now. The hand-rolled output_buffer complicated matters
for no gain. Just let stdio do its work. Simplifies new_paragraph,
too. Also simplify calling of output_indent() by doing the 0 check
inside.
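To illustrate why tabs can no longer be expanded up front and why
output_word() needs both a byte count and a display width, here is a
minimal sketch. It is not taken from the patch: line_width() and
first_word() are hypothetical helpers, the tab stop of 8 is an
assumption, and counting an invalid byte as one column is just one
possible policy. The caller is assumed to have called
setlocale(LC_CTYPE, "") beforehand.

```c
#define _XOPEN_SOURCE 700	/* for wcwidth(3) on some systems */
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not part of the patch: return the display
 * width of the NUL-terminated multibyte string s.  A tab advances
 * to the next multiple of 8 columns, so its width depends on the
 * width of everything before it on the line.
 */
static int
line_width(const char *s)
{
	wchar_t	 wc;
	int	 wcl, wcw, col = 0;

	while (*s != '\0') {
		wcl = mbtowc(&wc, s, MB_CUR_MAX);
		if (wcl == -1) {
			/* Assumed policy: invalid byte, one column. */
			(void)mbtowc(NULL, NULL, 0);
			col++;
			s++;
			continue;
		}
		if (wc == L'\t')
			col = (col / 8 + 1) * 8;
		else if ((wcw = wcwidth(wc)) > 0)
			col += wcw;
		s += wcl;
	}
	return col;
}

/*
 * Hypothetical helper, not part of the patch: measure the leading
 * run of non-blank characters in s.  Returns its length in bytes
 * and stores its display width in *width.
 */
static int
first_word(const char *s, int *width)
{
	wchar_t	 wc;
	int	 wcl, wcw, bytes = 0;

	*width = 0;
	while (s[bytes] != '\0') {
		wcl = mbtowc(&wc, s + bytes, MB_CUR_MAX);
		if (wcl == -1) {
			/* Assumed policy: invalid byte, one column. */
			(void)mbtowc(NULL, NULL, 0);
			(*width)++;
			bytes++;
			continue;
		}
		if (iswblank(wc))
			break;
		if ((wcw = wcwidth(wc)) > 0)
			*width += wcw;
		bytes += wcl;
	}
	return bytes;
}
```

For plain ASCII input the byte count and the width coincide. They
diverge as soon as a character occupies more than one byte or more
than one column, which is why the sketch tracks both.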
All told, the patch shortens the code by four lines. Not bad
for adding functionality, right? :-)
OK?
Ingo
Index: fmt.c
===
RCS file: /cvs/src/usr.bin/fmt/fmt.c,v
retrieving revision 1.33
diff -u -p -r1.33 fmt.c
--- fmt.c 9 Oct 2015 01:37:07 - 1.33
+++ fmt.c 8 Dec 2015 21:15:15 -
@@ -176,6 +176,8 @@
#include
#include
#include
+#include
+#include
/* Something that, we hope, will never be a genuine line length,
* indentation etc.
@@ -222,7 +224,6 @@ static int grok_mail_headers = 0; /* tr
static int format_troff = 0; /* Format troff? */
static int n_errors = 0; /* Number of failed files. */
-static char *output_buffer = NULL; /* Output line will be built
here */
static size_t x; /* Horizontal position in
output line */
static size_t x0; /* Ditto, ignoring leading
whitespace */
static size_t pending_spaces; /* Spaces to add before next
word */
@@ -232,17 +233,16 @@ static int output_in_paragraph = 0; /*
static void process_named_file(const char *);
static void process_stream(FILE *, const char *);
-static size_t indent_length(const char *, size_t);
+static size_t indent_length(const char *);
static int might_be_header(const char *);
-static void new_paragraph(size_t, size_t);
-static void output_word(size_t, size_t, const char *, size_t, size_t);
+static void new_paragraph(size_t);
+static void output_word(size_t, size_t, const char *, int, int, int);
static void output_indent(size_t);
static void center_stream(FILE *, const char *);
-static char *get_line(FILE *, size_t *);
+static char *get_line(FILE *);
static void *xrealloc(void *, size_t);
void usage(void);
-#define XMALLOC(x) xrealloc(0, x)
#define ERRS(x) (x >= 127 ? 127 : ++x)
/* Here is perhaps the right place to mention that this code is
@@ -332,7 +332,6 @@ main(int argc, char *argv[])
goal_length = 65;
if (max_length == 0)
max_length = goal_length+10;
- output_buffer = XMALLOC(max_length+1); /* really needn't be longer */
/* 2. Process files. */
@@ -381,25 +380,31 @@ typedef enum {
static void
process_stream(FILE *stream, const char *name)
{
- size_t n;
+ const char *wordp, *cp;
+ wchar_t wc;
size_t np;
size_t last_indent = SILLY; /* how many spaces in last indent? */
size_t para_line_number = 0; /* how many lines already read in this
para? */
size_t first_indent = SILLY; /* indentation of line 0 of paragraph */
+ int wcl; /* number of bytes in wide character */
+ int wcw; /* display width of wide character */
+ int word_length; /* number of bytes in word */
+ int word_width; /* display width of word */
+ int space_width; /* display width of space after word */
+ int line_width; /* display width of line */
HdrType prev_header_type = hdr_ParagraphStart;
HdrType header_type;
/* ^-- header_type of previous line; -1 at para start */
const char *line;
- size_t length;
if (centerP) {
center_stream(stream, name);
return;
}
- while ((lin