Re: mhbuild and long header fields

2023-08-31 Thread David Levine
Philipp wrote:

> I have a version with charstring_t attached. I'm unsure if it's better
> to only fold the body or include the field name. The version attached
> only folds the body.

RFC 5322 §2.2.3 only mentions folding the body.  And field names
can't have whitespace.  So I think that only a body can be folded.
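
To illustrate the point, folding as RFC 5322 describes it can be sketched
as below.  This is only a minimal sketch and not the attached patch:
fold_body(), its parameters, and the use of a bare '\n' as the line break
are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration; not part of nmh or the patch.
 * Folds a one-line header body by inserting '\n' in front of existing
 * whitespace, so unfolding (dropping each '\n' that precedes WSP)
 * restores the input.  name_len is the number of columns the field
 * name and colon already occupy on the first line.
 * Returns 0 on success, -1 if out is too small. */
int
fold_body(char *out, size_t out_size, size_t name_len,
          const char *body, size_t max_col)
{
    size_t col = name_len;  /* columns used on the current line */
    size_t o = 0;           /* bytes written to out */
    const char *p = body;

    if (out_size == 0)
        return -1;

    while (*p) {
        /* One token: optional leading WSP, then a run of non-WSP. */
        const char *q = p;
        while (*q == ' ' || *q == '\t')
            q++;
        const char *word = q;
        while (*q && *q != ' ' && *q != '\t')
            q++;
        size_t tok_len = (size_t)(q - p);

        /* Fold only in front of WSP, only when a word follows (so no
         * whitespace-only line is produced), and only when the
         * current line already holds something. */
        int do_fold = word > p && q > word && col > 0
                      && col + tok_len > max_col;

        /* Keep room for the token, the optional '\n', and the NUL. */
        if (o + tok_len + (do_fold ? 1 : 0) + 1 > out_size)
            return -1;
        if (do_fold) {
            out[o++] = '\n';
            col = 0;
        }
        memcpy(out + o, p, tok_len);
        o += tok_len;
        col += tok_len;
        p = q;
    }
    out[o] = '\0';
    return 0;
}
```

With name_len 11 (for "References:"), max_col 20, and the body
" <a@x> <b@x> <c@x>", the break lands in front of the second message ID,
the first point where the line would pass column 20.  The field name never
enters the function; only its length does, which matches the idea that
only the body is folded.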

I'll exercise your patch for the next week or so.  Unfortunately,
I'll be off line during much of that time so might not respond
quickly.

Thank you for doing this.  As the References field in this message
shows, it's badly needed.

David



Re: mhbuild and long header fields

2023-08-31 Thread Philipp
[2023-08-28 11:53] Philipp 
> [2023-08-27 22:00] David Levine 
> > Philipp wrote:
> >
> > > [2023-08-27 09:29] David Levine 
> > > >
> > > > My only comment on the code itself is that I prefer functions that
> > > > just do one thing.  So I would implement a fold function that just
> > > > modifies a string, and leave the fprintf/fwrite to output_headers().
> > >
> > > I have thought about this, but this would require fold() to allocate
> > > memory.  Because the plan was to use fold() in output_headers(), I
> > > thought this could be avoided.
> >
> > I don't think that allocating memory is a drawback.
>
> I first wrote a long text about the missing string abstraction, then
> I found charstring_t. I'll adjust this in a few days.

I have a version with charstring_t attached. I'm unsure if it's better
to only fold the body or include the field name. The version attached
only folds the body.

Philipp
diff --git a/Makefile.am b/Makefile.am
index 4fc84c1d..168d9fe6 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -378,6 +378,7 @@ noinst_HEADERS = \
 sbr/fmt_new.h \
 sbr/fmt_rfc2047.h \
 sbr/fmt_scan.h \
+sbr/fold.h \
 sbr/folder_addmsg.h \
 sbr/folder_delmsgs.h \
 sbr/folder_free.h \
@@ -1106,6 +1107,7 @@ sbr_libmh_a_SOURCES = \
 sbr/fmt_new.c \
 sbr/fmt_rfc2047.c \
 sbr/fmt_scan.c \
+sbr/fold.c \
 sbr/folder_addmsg.c \
 sbr/folder_delmsgs.c \
 sbr/folder_free.c \
diff --git a/sbr/fold.c b/sbr/fold.c
new file mode 100644
index ..d80cfe2d
--- /dev/null
+++ b/sbr/fold.c
@@ -0,0 +1,63 @@
+/* fold.c -- fold a mail header field
+ *
+ * This code is Copyright (c), by the authors of nmh.  See the
+ * COPYRIGHT file in the root directory of the nmh distribution for
+ * complete copyright information. */
+
+#include "h/mh.h"
+#include "h/mime.h"
+#include "sbr/charstring.h"
+#include "fold.h"
+
+void
+fold(charstring_t dst, size_t namelen, const char *restrict body)
+{
+	const char *restrict body_next;
+	const char *restrict wsp;
+	const char *restrict wsp_next;
+	const bool crlf = strchr(body, '\r');
+	charstring_clear(dst);
+	namelen++;
+
+	while (*body) {
+		body_next = strchr(body, '\n');
+		if ((unsigned long) (body_next - body) <= MAXTEXTPERLN - namelen) {
+			charstring_push_back_chars(dst, body, body_next - body + 1, body_next - body + 1);
+			namelen = 0;
+			body = body_next + 1;
+			continue;
+		}
+		wsp = body;
+		while (namelen == 0 && (*wsp == ' ' || *wsp == '\t')) {
+			wsp++;
+		}
+		wsp = wsp_next = strpbrk(wsp, " \t");
+
+		/* if no whitespace is in the current line, just print the current line as is */
+		if (!wsp_next || wsp_next > body_next) {
+			charstring_push_back_chars(dst, body, body_next - body + 1, body_next - body + 1);
+			namelen = 0;
+			body = body_next + 1;
+			continue;
+		}
+
+		while ((unsigned long)(wsp_next - body) <= MAXTEXTPERLN - namelen) {
+			wsp = wsp_next;
+			wsp_next = strpbrk(wsp+1, " \t");
+			if (!wsp_next) {
+				break;
+			}
+			if (wsp_next > body_next) {
+				break;
+			}
+		}
+
+		charstring_push_back_chars(dst, body, wsp - body, wsp - body);
+		if (crlf) {
+			charstring_push_back(dst, '\r');
+		}
+		charstring_push_back(dst, '\n');
+		namelen = 0;
+		body = wsp;
+	}
+}
diff --git a/sbr/fold.h b/sbr/fold.h
new file mode 100644
index ..8d0fd6d7
--- /dev/null
+++ b/sbr/fold.h
@@ -0,0 +1,7 @@
+/* fold.h -- fold a mail header field
+ *
+ * This code is Copyright (c), by the authors of nmh.  See the
+ * COPYRIGHT file in the root directory of the nmh distribution for
+ * complete copyright information. */
+
+void fold(charstring_t dst, size_t namelen, const char *restrict body);
diff --git a/test/mhbuild/test-mhbuild b/test/mhbuild/test-mhbuild
index 706a804a..f8d4992b 100755
--- a/test/mhbuild/test-mhbuild
+++ b/test/mhbuild/test-mhbuild
@@ -221,5 +221,58 @@ run_test "mhbuild $f" \
 
 check "$f" "$expected"
 
+start_test "Checking for correct header folding"
+
+cat >"`mhpath new`" <<\E
+From: Somebody 
+To: Nobody 
+Subject: Test message
+References:   
+
+This is a test
+E
+
+cat > "$expected" <<\E
+From: Somebody 
+To: Nobody 
+Subject: Test message
+References: 
+  
+MIME-Version: 1.0
+Content-Type: text/plain; charset="us-ascii"
+
+This is a test
+E
+
+run_test "mhbuild -auto `mhpath last`"
+check "`mhpath last`" "$expected"
+
+start_test "Checking header folding with a too long line"
+
+cat >"`mhpath new`" <<\E
+From: Somebody 
+To: Nobody 
+Subject: Test message
+References:   
+
+This is a test
+E
+
+cat > "$expected" <<\E
+From: Somebody 
+To: Nobody 
+Subject: Test message
+References:
+ 
+  
+MIME-Version: 1.0
+Content-Type: text/plain; charset="us-ascii"
+
+This is a test
+E
+
+run_test "mhbuild -auto `mhpath last`"
+check "`mhpath last`" "$expected"
+
 finish_test
 exit $failed
diff --git a/uip/mhoutsbr.c b/uip/mhoutsbr.c
index 6033cad8..6f80bafc 100644
--- a/uip/mhoutsbr.c
+++ b/uip/mhoutsbr.c
@@ -17,6 +17,8 @@
 #include "h/mhparse.h"
 #include "mhoutsbr.h"
 #include