Re: locale in locate(1)

2019-01-14 Thread Ingo Schwarze
Hi,

Jan Stary wrote on Thu, Jan 10, 2019 at 09:35:48PM +0100:

> Does locate(1) need to setlocale(3)?

Committed as part of a much larger diff.

Note that answering that question required substantial code review
and code cleanup.

When auditing, don't just mechanically look for suspicious patterns.
You have to actually inspect and understand the code to make an
an audit useful.  Something like the patch below is merely a
starting point for an audit (trivial to find), not a result.

Yours,
  Ingo


> Index: locate/locate.c
> ===
> RCS file: /cvs/src/usr.bin/locate/locate/locate.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 locate.c
> --- locate/locate.c   19 Nov 2015 21:46:05 -  1.31
> +++ locate/locate.c   10 Jan 2019 20:34:16 -
> @@ -73,7 +73,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -122,7 +121,6 @@ main(int argc, char *argv[])
>  {
>   int ch;
>   char **dbv = NULL;
> - (void) setlocale(LC_ALL, "");
>  
>   if (pledge("stdio rpath", NULL) == -1)
>   err(1, "pledge");



Re: locale in locate(1)

2019-01-13 Thread Ingo Schwarze
Hi Ted,

Ted Unangst wrote on Sun, Jan 13, 2019 at 03:22:41AM -0500:
> Ingo Schwarze wrote:

>> So, here is some cleanup:
>>  * garbage collect useless hand-rolled lookup tables
>>  * merge one-line helper functions into callers
>>  * garbage collect obfuscating macros
>>  * fix one format string error (%d used for u_int)
>>  * garbage collect setlocale(3) and 
>>  * garbage collect several unused constants
>>  * minus 45 LOC, no functional change
>> 
>> OK?

> I think you may be fixing a bug here...

Is that an OK?

>> Index: locate/fastfind.c
>> ===
>> RCS file: /cvs/src/usr.bin/locate/locate/fastfind.c,v
>> retrieving revision 1.14
>> diff -u -p -r1.14 fastfind.c
>> --- locate/fastfind.c8 Dec 2017 17:26:42 -   1.14
>> +++ locate/fastfind.c13 Jan 2019 02:07:47 -
>> @@ -126,10 +126,8 @@ fastfind_mmap
>>  u_char bigram1[NBG], bigram2[NBG], path[PATH_MAX];
>>  
>>  #ifdef FF_ICASE
>> -/* use a lookup table for case insensitive search */
>> -u_char table[UCHAR_MAX + 1];
>> -
>> -tolower_word(pathpart);
>> +for (p = pathpart; *p != '\0'; p++)
>> +*p = tolower(*p);
>>  #endif /* FF_ICASE*/
>>  
>>  /* init bigram table */
>> @@ -156,11 +154,8 @@ fastfind_mmap
>>  p = pathpart;
>>  patend = patprep(p);
>>  cc = *patend;
>> -
>>  #ifdef FF_ICASE
>> -/* set patend char to true */
>> -table[TOLOWER(*patend)] = 1;
>> -table[toupper(*patend)] = 1;
>> +cc = tolower(cc);
>>  #endif /* FF_ICASE */

> I cannot see where that table is zeroed.
> It lives on the stack, it could contain anything.

Good point.  Then again, if that bug stings, it only causes bogus
matches in the pre-match phase (and only with -i).  The real
comparison is done later on the complete strings, see the calls to
fnmatch(3) and strstr(3) bwlow.  So the bug probably only ruined -i
performance, but did not lead to invalid results.

Yours,
  Ingo



Re: locale in locate(1)

2019-01-13 Thread Ted Unangst
Ingo Schwarze wrote:
> So, here is some cleanup:
>  * garbage collect useless hand-rolled lookup tables
>  * merge one-line helper functions into callers
>  * garbage collect obfuscating macros
>  * fix one format string error (%d used for u_int)
>  * garbage collect setlocale(3) and 
>  * garbage collect several unused constants
>  * minus 45 LOC, no functional change
> 
> OK?
>   Ingo

I think you may be fixing a bug here...

> Index: locate/fastfind.c
> ===
> RCS file: /cvs/src/usr.bin/locate/locate/fastfind.c,v
> retrieving revision 1.14
> diff -u -p -r1.14 fastfind.c
> --- locate/fastfind.c 8 Dec 2017 17:26:42 -   1.14
> +++ locate/fastfind.c 13 Jan 2019 02:07:47 -
> @@ -126,10 +126,8 @@ fastfind_mmap
>   u_char bigram1[NBG], bigram2[NBG], path[PATH_MAX];
>  
>  #ifdef FF_ICASE
> - /* use a lookup table for case insensitive search */
> - u_char table[UCHAR_MAX + 1];
> -
> - tolower_word(pathpart);
> + for (p = pathpart; *p != '\0'; p++)
> + *p = tolower(*p);
>  #endif /* FF_ICASE*/
>  
>   /* init bigram table */
> @@ -156,11 +154,8 @@ fastfind_mmap
>   p = pathpart;
>   patend = patprep(p);
>   cc = *patend;
> -
>  #ifdef FF_ICASE
> - /* set patend char to true */
> - table[TOLOWER(*patend)] = 1;
> - table[toupper(*patend)] = 1;
> + cc = tolower(cc);
>  #endif /* FF_ICASE */

I cannot see where that table is zeroed. It lives on the stack, it could
contain anything.



Re: locale in locate(1)

2019-01-12 Thread Ingo Schwarze
Hi Scott,

Scott Cheloha wrote on Thu, Jan 10, 2019 at 07:18:43PM -0600:

> locate(1) uses tolower(3), the results of which can be affected by
> locale settings.

In general, yes.  On OpenBSD, not really:

   $ man tolower | grep -A 1 always  
 OpenBSD always uses the C locale for these functions, ignoring the
 global locale, the thread-specific locale, and the locale argument.

That makes sense because:

   $ man tolower | grep -A 1 representable 
 The argument c must be EOF or representable as an unsigned char;
 otherwise, the result is undefined.

So locale(1) support for tolower(3) really only makes sense for
single-byte non-ASCII locales (like LATIN-1) and we don't support
any of those.  The wchar_t characters needed for UTF-8 cannot be
represented as unsigned chars.

> locate(1)'s database is, at least in theory, portable.

I guess you are trying to say the right thing...
More precisely:  Filenames are byte strings, *NOT* character strings,
not even multibyte character strings.  Consequently, a filename
does *NOT* have a character encoding.  The locate datebase simply
stores these byte strings.  That it uses its own, home-grown encoding
scheme does not matter because that is not a *character* encoding
but rather some very naive compression method.  Since it does *NOT*
store characters, the locate database format is locale-independent
in the first place.

> If you don't explicitly set the locale to POSIX/C I think it might
> be possible to create a corrupted database,

No.  The database creation tools - updatedb, mklocatedb, bigram,
code - all explicitly ignore the locale.  There is no way to create
a corrupt database.

You will have fun copying a database from one machine to another
machine where sizeof(int) is different, though.  But that's not
corrupt, just incompatible, and unrelated to character encoding.

> or read a valid database as corrupted.

No, that isn't possible either.  Even if the tolower(3) function were
locale-dependent on OpenBSD, the worst thing that could happen would
be partially broken support for the locale(1) -i option.  But it isn't,
so whether you run locate(1) with the ASCII or with the UTF-8 locale
makes no difference.

That said, something else *can* go wrong.  If filenames contain
terminal control codes, running locate(1) can screw up your terminal:
as far as i can see, it doesn't have the protections that ls(1) has,
but that's a separate matter.


A brief look into /usr/src/usr.bin/locale/ makes it obvious that
it's a rathole of messy code requiring some love gently applied
with Bob's whale flensing knife.

For example, the multiple layers of wrappers around tolower(3)
are totally pointless because native tolower(3) already does table
lookup in the first place.

So, here is some cleanup:
 * garbage collect useless hand-rolled lookup tables
 * merge one-line helper functions into callers
 * garbage collect obfuscating macros
 * fix one format string error (%d used for u_int)
 * garbage collect setlocale(3) and 
 * garbage collect several unused constants
 * minus 45 LOC, no functional change

OK?
  Ingo


Index: locate/fastfind.c
===
RCS file: /cvs/src/usr.bin/locate/locate/fastfind.c,v
retrieving revision 1.14
diff -u -p -r1.14 fastfind.c
--- locate/fastfind.c   8 Dec 2017 17:26:42 -   1.14
+++ locate/fastfind.c   13 Jan 2019 02:07:47 -
@@ -126,10 +126,8 @@ fastfind_mmap
u_char bigram1[NBG], bigram2[NBG], path[PATH_MAX];
 
 #ifdef FF_ICASE
-   /* use a lookup table for case insensitive search */
-   u_char table[UCHAR_MAX + 1];
-
-   tolower_word(pathpart);
+   for (p = pathpart; *p != '\0'; p++)
+   *p = tolower(*p);
 #endif /* FF_ICASE*/
 
/* init bigram table */
@@ -156,11 +154,8 @@ fastfind_mmap
p = pathpart;
patend = patprep(p);
cc = *patend;
-
 #ifdef FF_ICASE
-   /* set patend char to true */
-   table[TOLOWER(*patend)] = 1;
-   table[toupper(*patend)] = 1;
+   cc = tolower(cc);
 #endif /* FF_ICASE */
 
 
@@ -173,10 +168,11 @@ fastfind_mmap
 
/* go forward or backward */
if (c == SWITCH) { /* big step, an integer */
-   if (len < INTSIZE)
+   if (len < sizeof(int))
break;
count += getwm(paddr) - OFFSET;
-   len -= INTSIZE; paddr += INTSIZE;
+   len -= sizeof(int);
+   paddr += sizeof(int);
} else {   /* slow step, =< 14 chars */
count += c - OFFSET;
}
@@ -207,7 +203,7 @@ fastfind_mmap
break; /* SWITCH */
}
 #ifdef FF_ICASE
-   if (table[c])
+   if (tolower(c) == cc)
 #else
if 

Re: locale in locate(1)

2019-01-10 Thread Ted Unangst
Scott Cheloha wrote:
> > On Jan 10, 2019, at 14:35, Jan Stary  wrote:
> > 
> > Does locate(1) need to setlocale(3)?
> 
> locate(1) uses tolower(3), the results of which can be affected by
> locale settings.  locate(1)'s database is, at least in theory, portable.
> If you don't explicitly set the locale to POSIX/C I think it might be
> possible to create a corrupted database, or read a valid database
> as corrupted.

Speaking of portability...

man locate says:

 The locate database is not byte order independent.  It is not possible to
 share the databases between machines with different byte order.  The
 current locate implementation understands databases in host byte order or
 network byte order.  So a little-endian machine can't understand a locate
 database which was built on a big-endian machine.

That doesn't really make sense. If it understands network byte order, then it
can read big endian databases. And then there's this comment in the source:

 * Convert network byte order to host byte order if necessary.
 * So we can read on FreeBSD/i386 (little endian) a locate database
 * which was built on SunOS/sparc (big endian).

So there's a little mystery to unravel for somebody with multiple CPUs.
Does it work or not?



Re: locale in locate(1)

2019-01-10 Thread Theo de Raadt
> Does locate(1) need to setlocale(3)?

Since you ask the question, yes.

Oh wait, was your question rhetorical?

It lacks facts.

Who's time are you wasting?


> Index: locate/locate.c
> ===
> RCS file: /cvs/src/usr.bin/locate/locate/locate.c,v
> retrieving revision 1.31
> diff -u -p -r1.31 locate.c
> --- locate/locate.c   19 Nov 2015 21:46:05 -  1.31
> +++ locate/locate.c   10 Jan 2019 20:34:16 -
> @@ -73,7 +73,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -122,7 +121,6 @@ main(int argc, char *argv[])
>  {
>   int ch;
>   char **dbv = NULL;
> - (void) setlocale(LC_ALL, "");
>  
>   if (pledge("stdio rpath", NULL) == -1)
>   err(1, "pledge");
> 



Re: locale in locate(1)

2019-01-10 Thread Theo de Raadt
Scott Cheloha  wrote:

> The ctype wizard may also have a very elegant explanation clarifying
> whether this is correct or not.

The diffs received are annoying.

What problem does it solve?

What are the downsides?

Is there any education in the diffs as to why the changes are needed,
so that people not freshly in the game can participate?

No.

Questions are not asked nor answered.

Like, seriously come on.  If you don't know why you are writing a diff
or can't be bothered to articulate why you are writing the diff why are
you sending it?

I suggest we all sit this out until there is another try.



Re: locale in locate(1)

2019-01-10 Thread Scott Cheloha
> On Jan 10, 2019, at 14:35, Jan Stary  wrote:
> 
> Does locate(1) need to setlocale(3)?

locate(1) uses tolower(3), the results of which can be affected by
locale settings.  locate(1)'s database is, at least in theory, portable.
If you don't explicitly set the locale to POSIX/C I think it might be
possible to create a corrupted database, or read a valid database
as corrupted.

But tedu says we have a policy so maybe I am mistaken about this.
The ctype wizard may also have a very elegant explanation clarifying
whether this is correct or not.



locale in locate(1)

2019-01-10 Thread Jan Stary
Does locate(1) need to setlocale(3)?

Jan

Index: locate/locate.c
===
RCS file: /cvs/src/usr.bin/locate/locate/locate.c,v
retrieving revision 1.31
diff -u -p -r1.31 locate.c
--- locate/locate.c 19 Nov 2015 21:46:05 -  1.31
+++ locate/locate.c 10 Jan 2019 20:34:16 -
@@ -73,7 +73,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -122,7 +121,6 @@ main(int argc, char *argv[])
 {
int ch;
char **dbv = NULL;
-   (void) setlocale(LC_ALL, "");
 
if (pledge("stdio rpath", NULL) == -1)
err(1, "pledge");