Re: patch for replacing non-printable chars in filenames

2005-04-01 Thread Vidar Madsen
Hi.

> After trying this a bit, I now think it would read better to use 3-digit
> octal escaping.

I would be perfectly fine with that. And octal is probably more in the
line of how escaping is traditionally done. As long as I can process
the files in the log, I'm all for it.
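
For log post-processing, undoing the escaping should be straightforward. Below is a minimal sketch in C, assuming the scheme proposed in this thread (a backslash written as \\ and each non-printable byte written as a backslash plus three octal digits). The name unescape_fname() is made up for illustration and is not part of rsync.

```c
#include <stddef.h>

/* Hypothetical helper (not part of rsync): decode a logged name in which
 * a backslash is written as \\ and a non-printable byte as \ooo (three
 * octal digits).  Writes at most dstsize-1 bytes plus a terminating NUL
 * to dst.  Returns 0 on success, -1 on malformed input or if dst is too
 * small. */
int unescape_fname(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (dstsize == 0)
        return -1;
    while (*src) {
        unsigned char c;

        if (*src != '\\') {
            /* Ordinary byte: copy through unchanged. */
            c = (unsigned char)*src++;
        } else if (src[1] == '\\') {
            /* Doubled backslash stands for one backslash. */
            c = '\\';
            src += 2;
        } else {
            /* Expect exactly three octal digits after the backslash. */
            int i, val = 0;

            for (i = 1; i <= 3; i++) {
                if (src[i] < '0' || src[i] > '7')
                    return -1;
                val = val * 8 + (src[i] - '0');
            }
            /* A filename byte is 1..255; NUL cannot occur in a name. */
            if (val == 0 || val > 255)
                return -1;
            c = (unsigned char)val;
            src += 4;
        }
        if (n + 1 >= dstsize)
            return -1;
        dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return 0;
}
```

Because every backslash in the log is either doubled or followed by three octal digits, the decoding is unambiguous, which is the property a log-processing script needs.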

Btw, will this change make it into a later rsync version (2.6.5?)? I
would rather not depend on using a custom patched rsync, but if it
will become a standard feature at some point it feels less hacky. ;)

Anyway, thanks. :)

Vidar
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


Re: patch for replacing non-printable chars in filenames

2005-04-01 Thread Wayne Davison
On Fri, Apr 01, 2005 at 10:26:18AM +0200, Vidar Madsen wrote:
> Btw, will this change make it into a later rsync version ?

Yes, I've just committed it for 2.6.5.  Now I need to add configure
checking for setlocale() and locale.h.

..wayne..


Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Vidar Madsen
Hi.

Sorry about picking up a rather ancient thread, but this didn't bite
me until now (when I upgraded to 2.6.4):

Wayne wrote:
> I've also checked
> in an improvement to safe_fname() that makes it use isprint() (instead
> of just looking for newlines).

Is there a chance that this feature will become selectable? I have
some scripts that rely on a specially formatted log (made with
--log-format) to do some post-processing after (or during) the
transfer, and these now fail, since several files (whose names contain
non-ascii chars) might be squashed into the same string.
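
To illustrate the collision, here is a minimal sketch in C. It is a stand-in for the behaviour described above, not the actual rsync source: munge_fname() is a made-up name, and the second test name (with byte 0xf8) is invented for the example. It assumes the program runs in the default "C" locale, where bytes above 0x7f are not printable.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the 2.6.4 behaviour described above (not the
 * actual rsync source): copy src to dst, replacing every byte for which
 * isprint() is false with '?'.  Output is truncated to fit dstsize. */
void munge_fname(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (dstsize == 0)
        return;
    for (; *src && n + 1 < dstsize; src++)
        dst[n++] = isprint((unsigned char)*src) ? *src : '?';
    dst[n] = '\0';
}
```

With this, the ISO-8859-1 names "P\xe5ske" and "P\xf8ske" both come out as the same logged string, so a script reading the log cannot tell them apart.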

Alternatively, how about escaping the chars instead of just munging
them? I.e. output files like "two-line\x0afile name" or "P\xe5ske"
(Norwegian for Easter, for the curious;), or something like that?

Vidar


Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Vidar Madsen
Oops, I should have added that for isprint() (in safe_fname()) to be
locale-aware at all, you need to add a call to setlocale(LC_CTYPE, "").
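
A minimal sketch of what that could look like, assuming it is called once at program startup. The helper names init_ctype_locale() and byte_is_printable() are made up for illustration and are not rsync functions.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical startup helper (not rsync code): adopt the user's LC_CTYPE
 * setting from the environment so that isprint() consults it.  Returns 1
 * if the locale was set, 0 if the environment names an unsupported locale
 * (in which case the previous locale, normally "C", stays in effect). */
int init_ctype_locale(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Wrapper that avoids passing a negative char value to isprint(). */
int byte_is_printable(char c)
{
    return isprint((unsigned char)c) != 0;
}
```

Without the setlocale() call the program stays in the "C" locale, where only ASCII characters are classified as printable, regardless of the user's environment.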

Vidar


Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Wayne Davison
On Thu, Mar 31, 2005 at 01:17:16PM +0200, Vidar Madsen wrote:
> Alternatively, how about escaping the chars instead of just munging
> them? I.e. output files like "two-line\x0afile name" or "P\xe5ske"
> (Norwegian for Easter, for the curious;), or something like that?

I'd be fine with that.  It would mean doubling \ characters as well,
though.  Anyone else have an opinion on this?

Appended is a patch that does the suggested escaping.

..wayne..
--- util.c  30 Mar 2005 19:34:20 -  1.181
+++ util.c  31 Mar 2005 16:09:16 -
@@ -877,11 +877,12 @@ int pop_dir(char *dir)
     return 1;
 }
 
-/* Return the filename, turning any non-printable characters into '?'s.
- * This ensures that outputting it on a line of its own cannot generate an
- * empty line.  This function can return only MAX_SAFE_NAMES values at a
- * time!  The returned value can be longer than MAXPATHLEN (because we
- * may be trying to output an error about a too-long filename)! */
+/* Return the filename, turning any non-printable characters into escaped
+ * characters (e.g. \n -> \x0a, \ -> \\).  This ensures that outputting it
+ * cannot generate an empty line nor corrupt the screen.  This function can
+ * return only MAX_SAFE_NAMES values at a time!  The returned value can be
+ * longer than MAXPATHLEN (because we may be trying to output an error about
+ * a too-long filename)! */
 char *safe_fname(const char *fname)
 {
 #define MAX_SAFE_NAMES 4
@@ -891,13 +892,21 @@ char *safe_fname(const char *fname)
     char *t;
 
     ndx = (ndx + 1) % MAX_SAFE_NAMES;
-    for (t = fbuf[ndx]; *fname; fname++) {
-        if (!isprint(*(uchar*)fname))
-            *t++ = '?';
-        else
+    for (t = fbuf[ndx]; *fname && limit; fname++) {
+        if (*fname == '\\') {
+            if ((limit -= 2) < 0)
+                break;
+            *t++ = '\\';
+            *t++ = '\\';
+        } else if (!isprint(*(uchar*)fname)) {
+            if ((limit -= 3) < 0)
+                break;
+            sprintf(t, "\\%02x", *(uchar*)fname);
+            t += 3;
+        } else {
+            limit--;
             *t++ = *fname;
-        if (--limit == 0)
-            break;
+        }
     }
     *t = '\0';
 

Re: patch for replacing non-printable chars in filenames

2005-03-31 Thread Wayne Davison
On Thu, Mar 31, 2005 at 08:13:52AM -0800, Wayne Davison wrote:
> Appended is a patch that does the suggested escaping.

Actually, that patch didn't put the suggested 'x' in after the '\'.
After trying this a bit, I now think it would read better to use 3-digit
octal escaping.  That would turn a \n into \012 instead of \x0a, for
instance.  The changes to the prior patch are as easy as increasing the
'3's to '4's, changing the sprintf() format to "\\%03o", and fixing the
function comment.
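
Put together, the octal variant could be sketched as a standalone function like the one below. This is an illustrative rewrite, not the committed rsync code: escape_fname() and its explicit destination buffer are assumptions made for the example, whereas safe_fname() in the patch uses a rotating set of static buffers.

```c
#include <ctype.h>
#include <stdio.h>
#include <stddef.h>

/* Illustrative sketch, not the committed rsync code: copy src to dst,
 * doubling each backslash and writing every non-printable byte as a
 * backslash followed by three octal digits.  Output is truncated (never
 * mid-escape) if it would not fit in dstsize bytes including the NUL. */
void escape_fname(const char *src, char *dst, size_t dstsize)
{
    size_t left;

    if (dstsize == 0)
        return;
    left = dstsize - 1;    /* reserve room for the terminating NUL */

    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        if (c == '\\') {
            if (left < 2)
                break;
            *dst++ = '\\';
            *dst++ = '\\';
            left -= 2;
        } else if (!isprint(c)) {
            if (left < 4)
                break;
            /* Writes 4 characters plus a NUL that is overwritten later. */
            sprintf(dst, "\\%03o", (unsigned)c);
            dst += 4;
            left -= 4;
        } else {
            if (left < 1)
                break;
            *dst++ = (char)c;
            left--;
        }
    }
    *dst = '\0';
}
```

In the default "C" locale this turns a name containing a newline into one containing \012, and the ISO-8859-1 name "P\xe5ske" into P\345ske. Checking the remaining space before writing, rather than after decrementing a counter, keeps an escape sequence from being cut in half at the end of the buffer.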

..wayne..


Re: patch for replacing non-printable chars in filenames

2005-02-07 Thread Wayne Davison
On Thu, Nov 25, 2004 at 11:27:58AM +0100, Paul Slootman wrote:
> Not all filenames that are printed are passed through safe_fname()
> AFAICS, e.g. a random piece of code from rsync.c:166 :

I looked at eliminating safe_fname() in favor of putting the filtering
into rwrite(), and there are a bunch of places that expect to be able
to output tabs and newlines as a part of the string.  So, I decided to
try to find all the places that didn't use either safe_fname() or
full_fname() (which calls safe_fname()) and fix them.  I've also checked
in an improvement to safe_fname() that makes it use isprint() (instead
of just looking for newlines).

..wayne..


Re: patch for replacing non-printable chars in filenames

2004-11-26 Thread Stefan Nehlsen
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> +/* Replace non-printing chars in the string, most probably due to
> + * weird filenames. Skip the first and last chars, they may be \n */
> +int i;
> +for (i=1; i<len-1; i++)
> +    if (!isprint(buf[i]))
> +        buf[i] = '?';

Is looping over strings a good idea in times of UTF-8?


cu, Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | [EMAIL PROTECTED] | +49 431 988-1260


Re: patch for replacing non-printable chars in filenames

2004-11-26 Thread Paul Slootman
On Fri 26 Nov 2004, Stefan Nehlsen wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > +/* Replace non-printing chars in the string, most probably due to
> > + * weird filenames. Skip the first and last chars, they may be \n */
> > +int i;
> > +for (i=1; i<len-1; i++)
> > +    if (!isprint(buf[i]))
> > +        buf[i] = '?';
>
> Is looping over strings a good idea in times of UTF-8?

It is if you don't know the strings are in UTF-8, and you want to
prevent garbage chars reaching the tty (the whole point of this
exercise :-)


Paul Slootman


Re: patch for replacing non-printable chars in filenames

2004-11-25 Thread Paul Slootman
On Tue 23 Nov 2004, Wayne Davison wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > Here's a patch. Opinions?
>
> I think that a better place to munge the name would be in the
> safe_fname() routine in util.c (which already munges newline
> characters into question marks).  The reason I didn't change
> any other characters was because I feared that it would mangle
> foreign filenames that use high-bit characters.  I'd want some
> feedback from such users before accepting such a patch.

Not all filenames that are printed are passed through safe_fname()
AFAICS, e.g. a random piece of code from rsync.c:166 :

    if (verbose > 2) {
        if (change_uid) {
            rprintf(FINFO,
                "set uid of %s from %ld to %ld\n",
                fname, (long)st->st_uid, (long)file->uid);
        }
        if (change_gid) {
            rprintf(FINFO,
                "set gid of %s from %ld to %ld\n",
                fname, (long)st->st_gid, (long)file->gid);
        }
    }

Note that isprint() will take into account the locale in effect, i.e.
when using the fr_FR locale things like é should be recognized as
printable. At least, under Linux that would seem to be the case; from
the NOTE section of isprint's manpage:

The  details of what characters belong into which class depend on
the current locale. [...]

setlocale(LC_CTYPE, "") probably needs to be called during program
startup, however...

The bug reporter (a Frenchman I believe) was agreeable to all non-ASCII
chars being replaced, however; that's preferable to having his tty messed
up now and again.

Making it depend on whether stdout is a tty may also be useful.


Paul Slootman


patch for replacing non-printable chars in filenames

2004-11-23 Thread Paul Slootman
There's a bug reported in Debian about the tty being screwed up by weird
filenames, see http://bugs.debian.org/bug=242300

On the one hand, find will also do this. On the other hand, ls will
replace such chars with a question mark. Upon inspection, it appears to
be fairly simple to also do this in rsync (in the rwrite() function).

Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e.
offer some way to turn it off?

Paul Slootman

--- log.c.orig  2004-10-04 11:51:37.0 +0200
+++ log.c   2004-11-23 17:27:29.0 +0100
@@ -180,6 +180,15 @@
 
     buf[len] = 0;
 
+    if (code == FINFO) {
+        /* Replace non-printing chars in the string, most probably due to
+         * weird filenames. Skip the first and last chars, they may be \n */
+        int i;
+        for (i=1; i<len-1; i++)
+            if (!isprint(buf[i]))
+                buf[i] = '?';
+    }
+
     if (am_server && msg_fd_out >= 0) {
         /* Pass the message to our sibling. */
         send_msg((enum msgcode)code, buf, len);


Re: patch for replacing non-printable chars in filenames

2004-11-23 Thread Wayne Davison
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> Here's a patch. Opinions?

I think that a better place to munge the name would be in the
safe_fname() routine in util.c (which already munges newline
characters into question marks).  The reason I didn't change
any other characters was because I feared that it would mangle
foreign filenames that use high-bit characters.  I'd want some
feedback from such users before accepting such a patch.

..wayne..


Re: patch for replacing non-printable chars in filenames

2004-11-23 Thread Dmitry V. Levin
Hi,

On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> There's a bug reported in Debian about the tty being screwed up by weird
> filenames, see http://bugs.debian.org/bug=242300
>
> On the one hand, find will also do this. On the other hand, ls will
> replace such chars with a question mark. Upon inspection, it appears to
> be fairly simple to also do this in rsync (in the rwrite() function).

1. find's output is mostly for another program's input, not for tty.
2. ls does --hide-control-chars by default only if isatty (STDOUT_FILENO).

> Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e.
> offer some way to turn it off?

I'd make it like ls, i.e. when descriptor is a tty; also I'd add some
option to enforce --hide-control-chars also for non-tty.
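
A minimal sketch of that policy in C, assuming a POSIX system. The function name want_hide_control_chars() and the "force" flag are made up for illustration and do not correspond to an existing rsync option.

```c
#include <unistd.h>

/* Hypothetical policy helper (the name and the "force" flag are made up
 * for illustration): decide whether control characters should be hidden
 * on the given descriptor.  Mirrors the ls behaviour described above:
 * hide when writing to a terminal, or when explicitly forced.
 * Returns 1 to hide, 0 to pass names through unchanged. */
int want_hide_control_chars(int fd, int force)
{
    return force || isatty(fd);
}
```

Called as want_hide_control_chars(STDOUT_FILENO, 0), this leaves output untouched when it is piped to another program or redirected to a file, which is the case that matters for scripts consuming the output.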


-- 
ldv

