Re: patch for replacing non-printable chars in filenames
Hi.

> After trying this a bit, I now think it would read better to use 3-digit octal escaping.

I would be perfectly fine with that. And octal is probably more in line with how escaping is traditionally done. As long as I can process the files in the log, I'm all for it.

Btw, will this change make it into a later rsync version (2.4.7?)? I would rather not depend on a custom patched rsync, but if it becomes a standard feature at some point, it feels less hacky. ;)

Anyway, thanks. :)

Vidar

-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
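The point of escaping, from a log post-processor's side, is that each logged name can be turned back into the real filename. A minimal sketch of that reverse direction follows, assuming the 3-digit octal format discussed in this thread (`\ooo` for a non-printable byte, `\\` for a literal backslash). The names `unescape_fname()` and `is_odigit()` are hypothetical helpers, not part of rsync.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if c is an octal digit. */
static int is_odigit(char c)
{
    return c >= '0' && c <= '7';
}

/* Hypothetical helper (not part of rsync): decode "\ooo" and "\\" escapes
 * back into raw bytes.  Returns the decoded length, or -1 if the input is
 * malformed or dst is too small. */
long unescape_fname(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(dst_size > 0);
    while (*src) {
        unsigned char c;

        if (src[0] != '\\') {
            c = (unsigned char)*src++;
        } else if (src[1] == '\\') {
            c = '\\';
            src += 2;
        } else if (is_odigit(src[1]) && is_odigit(src[2])
                   && is_odigit(src[3])) {
            int v = (src[1] - '0') * 64 + (src[2] - '0') * 8 + (src[3] - '0');
            if (v > 0xff)
                return -1; /* e.g. \400 does not fit in a byte */
            c = (unsigned char)v;
            src += 4;
        } else {
            return -1; /* lone or unknown backslash escape */
        }
        if (n + 1 >= dst_size)
            return -1; /* no room for this byte plus the NUL */
        dst[n++] = (char)c;
    }
    dst[n] = '\0';
    return (long)n;
}
```

Rejecting anything other than `\\` or exactly three octal digits after a backslash keeps the decoder strict, so a log line that was not produced by the escaping code is reported as malformed rather than silently misread.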
Re: patch for replacing non-printable chars in filenames
On Fri, Apr 01, 2005 at 10:26:18AM +0200, Vidar Madsen wrote:
> Btw, will this change make it into a later rsync version ?

Yes, I've just committed it for 2.6.5. Now I need to add configure checking for setlocale() and locale.h.

..wayne..
Re: patch for replacing non-printable chars in filenames
Hi.

Sorry about picking up a rather ancient thread, but this didn't bite me until now (when I upgraded to 2.6.4).

Wayne wrote:
> I've also checked in an improvement to safe_fname() that makes it use
> isprint() (instead of just looking for newlines).

Is there a chance that this feature will become selectable? I have some scripts that rely on a specially formatted log (made with --log-format) to do some post-processing after (or during) the transfer, and these now fail, since several files (whose names contain non-ASCII chars) might be squashed into the same string.

Alternatively, how about escaping the chars instead of just munging them? I.e. output files like two-line\x0afile name or P\xe5ske (Norwegian for Easter, for the curious ;), or something like that?

Vidar
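A small sketch of the difference Vidar describes: replacing every non-printable byte with '?' is lossy, so distinct names can collide, while escaping (here in the `\xNN` form he suggests, with literal backslashes doubled) keeps them apart. Both function names are hypothetical, and the behaviour assumes the default "C" locale, where bytes such as 0xe5 are not printable.

```c
#include <ctype.h>
#include <stdio.h>

/* Roughly the 2.6.4 behaviour discussed here: each non-printable byte
 * becomes '?', so different names can produce the same output. */
void squash_name(const char *in, char *out)
{
    for (; *in; in++)
        *out++ = isprint((unsigned char)*in) ? *in : '?';
    *out = '\0';
}

/* Vidar's suggestion: "\xNN" for non-printable bytes and a doubled
 * backslash for a literal one, so the mapping can be reversed.
 * out must have room for 4 * strlen(in) + 1 bytes. */
void escape_name_hex(const char *in, char *out)
{
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;

        if (c == '\\') {
            *out++ = '\\';
            *out++ = '\\';
        } else if (!isprint(c)) {
            out += sprintf(out, "\\x%02x", (unsigned)c);
        } else {
            *out++ = (char)c;
        }
    }
    *out = '\0';
}
```

With these, "P\xe5ske" and "P\xf8ske" both squash to "P?ske", but escape to the distinguishable "P\xe5ske" and "P\xf8ske". Doubling the backslash is what makes the escaped form unambiguous, as Wayne points out in his reply.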
Re: patch for replacing non-printable chars in filenames
Oops, I should have added that for isprint() (in safe_fname()) to be locale-aware at all, you need to add a call to setlocale(LC_CTYPE, "").

Vidar
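A minimal sketch of that startup call, assuming a standard C library: `setlocale(LC_CTYPE, "")` adopts the character-type locale from the environment (LC_ALL, LC_CTYPE, LANG), whereas passing NULL as the second argument would only query the current setting. The helper names `init_ctype_locale()` and `count_unprintable()` are hypothetical.

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Adopt LC_CTYPE from the environment so that isprint() follows the
 * user's locale.  Returns nonzero on success. */
int init_ctype_locale(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Count the bytes of s that isprint() rejects under the current locale. */
size_t count_unprintable(const char *s)
{
    size_t n = 0;

    for (; *s; s++)
        if (!isprint((unsigned char)*s))
            n++;
    return n;
}
```

Until init_ctype_locale() (or an equivalent setlocale() call) runs, a program stays in the "C" locale, where every byte above 0x7f counts as unprintable. In a single-byte locale such as ISO-8859-1, 0xe5 (å) is printable; in a UTF-8 locale the individual bytes of a multibyte character generally are not.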
Re: patch for replacing non-printable chars in filenames
On Thu, Mar 31, 2005 at 01:17:16PM +0200, Vidar Madsen wrote:
> Alternatively, how about escaping the chars instead of just munging
> them? I.e. output files like two-line\x0afile name or P\xe5ske
> (Norwegian for Easter, for the curious ;), or something like that?

I'd be fine with that. It would mean doubling \ characters as well, though. Anyone else have an opinion on this?

Appended is a patch that does the suggested escaping.

..wayne..

--- util.c 30 Mar 2005 19:34:20 - 1.181
+++ util.c 31 Mar 2005 16:09:16 -
@@ -877,11 +877,12 @@ int pop_dir(char *dir)
     return 1;
 }
 
-/* Return the filename, turning any non-printable characters into '?'s.
- * This ensures that outputting it on a line of its own cannot generate an
- * empty line.  This function can return only MAX_SAFE_NAMES values at a
- * time!  The returned value can be longer than MAXPATHLEN (because we
- * may be trying to output an error about a too-long filename)! */
+/* Return the filename, turning any non-printable characters into escaped
+ * characters (e.g. \n -> \x0d, \ -> \\).  This ensures that outputting it
+ * cannot generate an empty line nor corrupt the screen.  This function can
+ * return only MAX_SAFE_NAMES values at a time!  The returned value can be
+ * longer than MAXPATHLEN (because we may be trying to output an error about
+ * a too-long filename)! */
 char *safe_fname(const char *fname)
 {
 #define MAX_SAFE_NAMES 4
@@ -891,13 +892,21 @@ char *safe_fname(const char *fname)
     char *t;
 
     ndx = (ndx + 1) % MAX_SAFE_NAMES;
-    for (t = fbuf[ndx]; *fname; fname++) {
-        if (!isprint(*(uchar*)fname))
-            *t++ = '?';
-        else
+    for (t = fbuf[ndx]; *fname && limit; fname++) {
+        if (*fname == '\\') {
+            if ((limit -= 2) < 0)
+                break;
+            *t++ = '\\';
+            *t++ = '\\';
+        } else if (!isprint(*(uchar*)fname)) {
+            if ((limit -= 3) < 0)
+                break;
+            sprintf(t, "\\%02x", *(uchar*)fname);
+            t += 3;
+        } else {
+            limit--;
             *t++ = *fname;
-        if (--limit == 0)
-            break;
+        }
     }
     *t = '\0';
Re: patch for replacing non-printable chars in filenames
On Thu, Mar 31, 2005 at 08:13:52AM -0800, Wayne Davison wrote:
> Appended is a patch that does the suggested escaping.

Actually, that patch didn't put the suggested 'x' in after the '\'.

After trying this a bit, I now think it would read better to use 3-digit octal escaping. That would turn a \n into \012 instead of \x0a, for instance. The changes to the prior patch are as easy as increasing the '3's to '4's, changing the sprintf() format to "\\%03o", and fixing the function comment.

..wayne..
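The octal variant described above can be sketched as a standalone function. This is an illustration under assumptions, not the code that was committed: it writes into a caller-supplied buffer (rsync's safe_fname() rotates through static buffers instead), and the name `escape_fname_octal()` is hypothetical. As in the patch, a partial escape is never emitted when the buffer runs out.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Escape fname into dst: non-printable bytes become "\ooo" (3-digit
 * octal), a literal backslash becomes "\\", everything else is copied.
 * Output is truncated before any escape that would not fit. */
void escape_fname_octal(const char *fname, char *dst, size_t dst_size)
{
    char *t = dst;
    size_t limit; /* characters still available, excluding the NUL */

    if (dst_size == 0)
        return;
    limit = dst_size - 1;
    for (; *fname && limit; fname++) {
        unsigned char c = (unsigned char)*fname;

        if (c == '\\') {
            if (limit < 2)
                break;
            *t++ = '\\';
            *t++ = '\\';
            limit -= 2;
        } else if (!isprint(c)) {
            if (limit < 4)
                break;
            sprintf(t, "\\%03o", c); /* writes 4 chars plus a NUL */
            t += 4;
            limit -= 4;
        } else {
            *t++ = (char)c;
            limit--;
        }
    }
    *t = '\0';
}
```

A buffer of 4 * strlen(fname) + 1 bytes is always enough for the full escaped name. In the "C" locale, "two-line\nfile" becomes "two-line\012file" and "P\xe5ske" becomes "P\345ske".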
Re: patch for replacing non-printable chars in filenames
On Thu, Nov 25, 2004 at 11:27:58AM +0100, Paul Slootman wrote:
> Not all filenames that are printed are passed through safe_fname()
> AFAICS, e.g. a random piece of code from rsync.c:166:

I looked at eliminating safe_fname() in favor of putting the filtering into rwrite(), and there are a bunch of places that expect to be able to output tabs and newlines as part of the string. So, I decided to try to find all the places that didn't use either safe_fname() or full_fname() (which calls safe_fname()) and fix them.

I've also checked in an improvement to safe_fname() that makes it use isprint() (instead of just looking for newlines).

..wayne..
Re: patch for replacing non-printable chars in filenames
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> +        /* Replace non-printing chars in the string, most probably due to
> +         * weird filenames. Skip the first and last chars, they may be \n */
> +        int i;
> +        for (i=1; i<len-1; i++)
> +            if (!isprint(buf[i]))
> +                buf[i] = '?';

Is looping over strings a good idea in times of UTF-8?

cu, Stefan
-- 
Stefan Nehlsen | ParlaNet Administration | [EMAIL PROTECTED] | +49 431 988-1260
Re: patch for replacing non-printable chars in filenames
On Fri 26 Nov 2004, Stefan Nehlsen wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > +        /* Replace non-printing chars in the string, most probably due to
> > +         * weird filenames. Skip the first and last chars, they may be \n */
> > +        int i;
> > +        for (i=1; i<len-1; i++)
> > +            if (!isprint(buf[i]))
> > +                buf[i] = '?';
>
> Is looping over strings a good idea in times of UTF-8?

It is if you don't know the strings are in UTF-8, and you want to prevent garbage chars from reaching the tty (the whole point of this exercise :-)

Paul Slootman
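To make the trade-off concrete, here is a byte-at-a-time replacement in the spirit of the quoted rwrite() patch. It is a sketch, not the patch itself: it drops the skip-first/last-byte rule, and the name `replace_unprintable()` is hypothetical. It also casts to unsigned char, because passing a negative plain char to isprint() is undefined.

```c
#include <ctype.h>
#include <stddef.h>

/* Replace every byte that isprint() rejects with '?', one byte at a time.
 * A multibyte UTF-8 character is therefore seen as several separate bytes. */
void replace_unprintable(char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (!isprint((unsigned char)buf[i]))
            buf[i] = '?';
}
```

In the "C" locale a UTF-8 "café" (the é is the two bytes 0xC3 0xA9) comes out as "caf??": the tty is protected, as Paul intends, but one character turns into two question marks. A multibyte-aware check would typically decode with mbrtowc() and test with iswprint() after setlocale(); that variant is not shown here.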
Re: patch for replacing non-printable chars in filenames
On Tue 23 Nov 2004, Wayne Davison wrote:
> On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> > Here's a patch. Opinions?
>
> I think that a better place to munge the name would be in the
> safe_fname() routine in util.c (which already munges newline characters
> into question marks). The reason I didn't change any other characters
> was because I feared that it would mangle foreign filenames that use
> high-bit characters. I'd want some feedback from such users before
> accepting such a patch.

Not all filenames that are printed are passed through safe_fname() AFAICS, e.g. a random piece of code from rsync.c:166:

    if (verbose > 2) {
        if (change_uid) {
            rprintf(FINFO, "set uid of %s from %ld to %ld\n",
                fname, (long)st->st_uid, (long)file->uid);
        }
        if (change_gid) {
            rprintf(FINFO, "set gid of %s from %ld to %ld\n",
                fname, (long)st->st_gid, (long)file->gid);
        }
    }

Note that isprint() takes the locale in effect into account, i.e. when using the fr_FR locale, characters like é should be recognized as printable. At least under Linux that seems to be the case; from the NOTES section of isprint's manpage:

    The details of what characters belong into which class depend on the
    current locale. [...]

setlocale(LC_CTYPE, "") probably needs to be called during program startup, however...

The bug reporter (a Frenchman, I believe) was, however, agreeable to all non-ASCII chars being replaced; that's preferable to having his tty messed up now and again. Making it depend on whether stdout is a tty may also be useful.

Paul Slootman
patch for replacing non-printable chars in filenames
There's a bug reported in Debian about the tty being screwed up by weird filenames, see http://bugs.debian.org/bug=242300

On the one hand, find will also mess up the tty this way. On the other hand, ls will replace such chars with a question mark. Upon inspection, it appears to be fairly simple to also do this in rsync (in the rwrite() function).

Here's a patch. Opinions? Perhaps don't do it unconditionally, i.e. offer some way to turn it off?

Paul Slootman

--- log.c.orig 2004-10-04 11:51:37.0 +0200
+++ log.c 2004-11-23 17:27:29.0 +0100
@@ -180,6 +180,15 @@
 
     buf[len] = 0;
 
+    if (code == FINFO) {
+        /* Replace non-printing chars in the string, most probably due to
+         * weird filenames. Skip the first and last chars, they may be \n */
+        int i;
+        for (i=1; i<len-1; i++)
+            if (!isprint(buf[i]))
+                buf[i] = '?';
+    }
+
     if (am_server && msg_fd_out >= 0) {
         /* Pass the message to our sibling. */
         send_msg((enum msgcode)code, buf, len);
Re: patch for replacing non-printable chars in filenames
On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> Here's a patch. Opinions?

I think that a better place to munge the name would be in the safe_fname() routine in util.c (which already munges newline characters into question marks). The reason I didn't change any other characters was because I feared that it would mangle foreign filenames that use high-bit characters. I'd want some feedback from such users before accepting such a patch.

..wayne..
Re: patch for replacing non-printable chars in filenames
Hi,

On Tue, Nov 23, 2004 at 05:29:57PM +0100, Paul Slootman wrote:
> There's a bug reported in Debian about the tty being screwed up by
> weird filenames, see http://bugs.debian.org/bug=242300
> On the one hand, find will also do this. On the other hand, ls will
> replace such chars with a question mark. Upon inspection, it appears
> to be fairly simple to also do this in rsync (in the rwrite() function).

1. find's output is mostly for another program's input, not for a tty.
2. ls does --hide-control-chars by default only if isatty(STDOUT_FILENO).

> Here's a patch. Opinions?
> Perhaps don't do it unconditionally, i.e. offer some way to turn it off?

I'd make it like ls, i.e. do it only when the output descriptor is a tty; I'd also add some option to enforce --hide-control-chars for non-tty output as well.

-- 
ldv
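ldv's suggestion can be sketched as a single decision function, modelled on the ls behaviour he describes: escape by default only when the output descriptor is a terminal, and let an explicit option force it on. The function name `want_name_escaping()` and its `force_option` parameter are hypothetical stand-ins for whatever rsync option would be added; isatty() and STDOUT_FILENO are standard POSIX.

```c
#include <unistd.h>

/* Decide whether filenames written to fd should have control characters
 * escaped: always if the (hypothetical) force option is set, otherwise
 * only when fd refers to a terminal. */
int want_name_escaping(int force_option, int fd)
{
    if (force_option)
        return 1;
    return isatty(fd);
}
```

A caller would evaluate `want_name_escaping(force, STDOUT_FILENO)` once at startup. Output piped into a log-processing script is then left untouched unless the user asks otherwise, which addresses both Paul's "offer some way to turn it off" and Vidar's later post-processing concern.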