Re: [PATCH] wget-1.8.2: Portability, plus EBCDIC patch

2003-10-08 Thread Martin Kraemer
On Tue, Oct 07, 2003 at 06:06:59PM +0200, Hrvoje Niksic wrote:
> Martin, thanks for the patch and the detailed report.  Note that it
> might have made more sense to apply the patch to the latest CVS
> version, which is somewhat different from 1.8.2.

What must I set CVSROOT to?

> I'm really not sure whether to add this patch.  On the one hand, it's
> nice to support as many architectures as possible.  But on the other
> hand, most systems are ASCII.  All the systems I've ever seen or
> worked on have been ASCII.

Right; that is exactly what makes it so hard for those who must
work on EBCDIC systems: nobody supports them, and most available
software is proprietary. So, getting a patch (even if only distributed
as-is, e.g., in contrib/ebcdic.patch) is a valuable help for those
who don't have it (yet).

>  I am fairly certain that I would not be
> able to support EBCDIC in the long run and that, unless someone were
> to continually support EBCDIC, the existing support would bitrot away.
> 
> Is anyone on the Wget list using an EBCDIC system?

How can they if they don't have the patch? It only works if the socket
"talks ASCII" on the network, and that is what the patch solves ;-)
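
To illustrate what "talking ASCII on the network" involves, here is a minimal sketch of an in-place ASCII-to-EBCDIC translation applied to a received buffer. It is not the BS2000 library routine _a2e_n() used in the patch: the name demo_a2e_n(), the helper functions, and the table are all hypothetical, and the table covers only space, digits, and letters, at the code points used by IBM code page 037. Every other byte is mapped to the EBCDIC '?' (0x6F) as a placeholder. A real port needs the full table for its own EBCDIC variant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical translation table, indexed by ASCII code, holding the
   EBCDIC code point.  Only space, digits and letters are filled in. */
static unsigned char a2e_table[256];
static int a2e_ready = 0;

/* Map COUNT consecutive ASCII codes onto consecutive EBCDIC codes.
   Numeric codes are used instead of character constants so that the
   table does not depend on the compiler's own character set. */
static void
fill_run (unsigned char ascii_first, unsigned char ebcdic_first, int count)
{
  int i;
  for (i = 0; i < count; i++)
    a2e_table[ascii_first + i] = (unsigned char) (ebcdic_first + i);
}

static void
a2e_init (void)
{
  memset (a2e_table, 0x6F, sizeof a2e_table);  /* unmapped -> EBCDIC '?' */
  a2e_table[0x20] = 0x40;      /* space    */
  fill_run (0x30, 0xF0, 10);   /* '0'..'9' */
  fill_run (0x41, 0xC1, 9);    /* 'A'..'I' */
  fill_run (0x4A, 0xD1, 9);    /* 'J'..'R' */
  fill_run (0x53, 0xE2, 8);    /* 'S'..'Z' */
  fill_run (0x61, 0x81, 9);    /* 'a'..'i' */
  fill_run (0x6A, 0x91, 9);    /* 'j'..'r' */
  fill_run (0x73, 0xA2, 8);    /* 's'..'z' */
  a2e_ready = 1;
}

/* Translate LEN bytes of BUF from ASCII to EBCDIC, in place. */
void
demo_a2e_n (unsigned char *buf, size_t len)
{
  size_t i;
  if (!a2e_ready)
    a2e_init ();
  for (i = 0; i < len; i++)
    buf[i] = a2e_table[buf[i]];
}
```

The gaps between the letter runs are why the EBCDIC alphabet is not contiguous, and why code that assumes 'A'..'Z' is a single range needs attention on such systems.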

   Martin
-- 
<[EMAIL PROTECTED]> | Fujitsu Siemens
Fon: +49-89-636-46021, FAX: +49-89-636-47655 | 81730  Munich,  Germany


Re: [PATCH] wget-1.8.2: Portability, plus EBCDIC patch

2003-10-07 Thread Hrvoje Niksic
Martin, thanks for the patch and the detailed report.  Note that it
might have made more sense to apply the patch to the latest CVS
version, which is somewhat different from 1.8.2.

I'm really not sure whether to add this patch.  On the one hand, it's
nice to support as many architectures as possible.  But on the other
hand, most systems are ASCII.  All the systems I've ever seen or
worked on have been ASCII.  I am fairly certain that I would not be
able to support EBCDIC in the long run and that, unless someone were
to continually support EBCDIC, the existing support would bitrot away.

Is anyone on the Wget list using an EBCDIC system?


[PATCH] wget-1.8.2: Portability, plus EBCDIC patch

2003-10-07 Thread Martin Kraemer
Hello Hrvoje and Dan,

I have been using wget for many years now, and finally got to applying
a patch I made long ago (EBCDIC patch against wget-1.5.3) to the
current wget-1.8.2. This patch makes wget compile and run on a
mainframe computer using the EBCDIC character set.

Also, when compiling wget on Solaris (using the SUNWspro "Forte"
compiler), I stumbled over a portability problem (C++-style // comments
in C source), for which I include a patch as well.

About the EBCDIC patch:
* The goal was to create a patch which worked for our EBCDIC system
  (Fujitsu-Siemens' mainframe OS is called BS2000, it runs on /390
  hardware, but is not compatible with OS/390 per se) but would be
  easily adaptable to OS/390 (to which I have no access, but whose
  behaviour I know from similar ports). The code to actually make
  it work for OS/390 is not in place, but I added a tool (called
  safe-ctype-mk.c -- delete if you don't like it) to create the
  additions to safe-ctype.c which are necessary because IBM's
  EBCDIC differs from "our" EBCDIC.

* Because code conversion is necessary for text files, a distinction
  between "text" and "binary" download was added (based on the
  downloaded MIME type; see the routines http_set_convert_flag() and
  http_get_convert_flag(). A future patch may add a new
  --conversion=text/binary/auto switch which is not implemented
  yet.)  Currently, the same heuristics are used as in the Apache
  HTTP server to determine whether conversion is required (for
  several kinds of text files) or not required (for images,
  compressed files etc.)

* Because EBCDIC alphabetic characters live in the range between
  '\xA1' and '\xE9', the getopt_long() numbers have been shifted up
  by 200, beyond the 0xFF boundary, to avoid conflicts between
  single-character options and numeric long-option values. That does
  not change the behaviour on ASCII machines, but allows the source
  to compile on EBCDIC machines (otherwise: error: multiple case in
  switch).

* wget-1.8.2 has been compiled on our BS2000, with the patch applied,
  and with SSL enabled (against openssl-0.9.6k), and has been tested
  to work correctly.
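
The getopt_long() renumbering described above can be sketched as follows. This is only an illustration under assumed names: the option codes, the struct, and demo_parse() are hypothetical and are not wget's actual option table. The point is that values for long-only options start above 0xFF, so they can never equal a character constant used for a short option, whatever the execution character set.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option codes, not wget's real ones.  Long-only options
   get values above 0xFF so that no 'x' character constant, ASCII or
   EBCDIC, can collide with them in the switch below. */
enum {
  OPT_NO_CLOBBER = 0x100 + 1,
  OPT_HTTP_USER  = 0x100 + 2
};

struct demo_opts {
  int quiet;
  int no_clobber;
  const char *http_user;
};

/* Parse ARGV into *O.  Returns 0 on success, -1 on an unknown option. */
int
demo_parse (int argc, char **argv, struct demo_opts *o)
{
  static const struct option longopts[] = {
    { "quiet",      no_argument,       NULL, 'q' },
    { "no-clobber", no_argument,       NULL, OPT_NO_CLOBBER },
    { "http-user",  required_argument, NULL, OPT_HTTP_USER },
    { NULL, 0, NULL, 0 }
  };
  int c;

  o->quiet = 0;
  o->no_clobber = 0;
  o->http_user = NULL;
  optind = 1;   /* restart scanning so the function can be called again */

  while ((c = getopt_long (argc, argv, "q", longopts, NULL)) != -1)
    switch (c)
      {
      case 'q':            o->quiet = 1;          break;
      case OPT_NO_CLOBBER: o->no_clobber = 1;     break;
      case OPT_HTTP_USER:  o->http_user = optarg; break;
      default:             return -1;
      }
  return 0;
}
```

With small integer codes such as 129, a case label could coincide with an EBCDIC letter (lowercase 'a' is 0x81, i.e. 129), which is the "multiple case in switch" error mentioned above.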

If you would add the patch to future versions of wget, then all
users of our BS2000 as well as users of IBM's OS/390 could take
advantage of the availability of wget for EBCDIC-based machines, and
hopefully someone would also contribute the missing IBM-EBCDIC
counterparts to our BS2000-EBCDIC patch.

  Martin
-- 
<[EMAIL PROTECTED]> | Fujitsu Siemens
Fon: +49-89-636-46021, FAX: +49-89-636-47655 | 81730  Munich,  Germany
diff -bur wget-1.8.2/src/ftp.c work/wget-1.8.2/src/ftp.c
--- wget-1.8.2/src/ftp.c.orig   2003-10-06 17:20:58.710178000 +0200
+++ wget-1.8.2/src/ftp.c   2003-10-06 17:17:00.399371000 +0200
@@ -474,7 +474,7 @@
}
 
   err = ftp_size(&con->rbuf, u->file, len);
-//  printf("\ndebug: %lld\n", *len);
+/*  printf("\ndebug: %lld\n", *len); */
   /* FTPRERR */
   switch (err)
{
diff -bur wget-1.8.2/src/http.c work/wget-1.8.2/src/http.c
--- wget-1.8.2/src/http.c.orig  2003-10-06 17:20:58.900182000 +0200
+++ wget-1.8.2/src/http.c   2003-10-06 17:19:16.829836000 +0200
@@ -1777,7 +1777,7 @@
  FREE_MAYBE (dummy);
  return RETROK;
}
-//  fprintf(stderr, "test: hstat.len: %lld, hstat.restval: %lld\n", hstat.dltime);
+/*  fprintf(stderr, "test: hstat.len: %lld, hstat.restval: %lld\n", hstat.dltime); */
   tmrate = retr_rate (hstat.len - hstat.restval, hstat.dltime, 0);
 
   if (hstat.len == hstat.contlen)
diff -bur wget-1.8.2.orig/src/connect.c wget-1.8.2/src/connect.c
--- wget-1.8.2.orig/src/connect.c   Mon Oct  6 17:13:11 2003
+++ wget-1.8.2/src/connect.c   Mon Oct  6 17:10:28 2003
@@ -47,6 +47,10 @@
 #endif
 #endif /* WINDOWS */
 
+#if #system(bs2000)
+#include 
+#endif
+
 #include 
 #ifdef HAVE_STRING_H
 # include 
@@ -73,6 +77,26 @@
to connect_to_one.  */
 static const char *connection_host_name;
 
+#if 'A' == '\xC1' /* CHARSET_EBCDIC */
+/* Start off with convert=1 (headers are always converted) */
+static int convert_flag_last_reply = 1;
+
+void
+http_set_convert_flag(const char *type)
+{
+  convert_flag_last_reply =
+    (strncasecmp(type, "text/", 5) == 0
+     || strncasecmp(type, "message/", 8) == 0
+     || strcasecmp(type, "application/postscript") == 0);
+}
+
+int
+http_get_convert_flag()
+{
+  return convert_flag_last_reply;
+}
+#endif
+ 
 void
 set_connection_host_name (const char *host)
 {
@@ -459,6 +483,11 @@
 }
   while (res == -1 && errno == EINTR);
 
+#if 'A' == '\xC1'
+  if (res > 0 && http_get_convert_flag())
+    _a2e_n(buf,res);
+#endif
+
   return res;
 }
 
@@ -472,6 +501,25 @@
 {
   int res = 0;
 
+#if 'A' == '\xC1' /* CHARSET_EBCDIC */
+  static char *cbuf = NULL;
+  static int csize = 0;
+
+  if (len > csize) {
+    if (cbuf != NULL)
+      free(cbuf);
+    cbuf = malloc(csize = len+8192); /* add arbitrary amount of skew */
+