Re: [musl] bbox: musl versus uclibc

2017-08-14 Thread Rich Felker
On Mon, Aug 14, 2017 at 07:43:39PM +0200, Denys Vlasenko wrote:
> As uclibc is increasingly aging, I am finally forced
> to switch to musl: I'm bitten by a nasty bug in
> getopt() - hush is using it in a slightly unusual way,
> which uclibc does not expect.

While I'm glad musl is working for you, this sounds fragile unless
it's actually just a bug in uclibc, in which case it'll hopefully get
fixed. If you're relying on weird internal-state of getopt there's a
chance this could break in the future on musl or on other libcs.
Can you explain the problem you encountered on uclibc?

> I built a toolchain using
> https://github.com/richfelker/musl-cross-make
> (Rich, is this the thing I should be using?)
> and it worked with no issues at all.

Yes, it's the easiest and canonical way to get a full toolchain.

> (I can probably only wish for the README
> to also mention how to make this a _static_
> toolchain... I have a box with 32-bit userspace,
> would be awesome to be able to copy this fresh
> 64-bit toolchain to it and have it working).

If you put LDFLAGS="-static --static" (second form is a hack to
workaround libtool's stripping of -static :) in COMMON_CONFIG you
should get a static build of the toolchain, but it's linked to
whatever the libc in your build environment is.

In my config.mak I use:

ifeq ($(HOST),)
COMMON_CONFIG += CC="i486-linux-musl-gcc -static --static" 
CXX="i486-linux-musl-g++ -static --static"
endif
COMMON_CONFIG += CFLAGS="-g0 -Os" CXXFLAGS="-g0 -Os" LDFLAGS="-s -static 
--static"

to build the toolchain static-linked using an existing
i486/musl-targeting toolchain. The static in $CC/$CXX is needed
because some broken configure tests in GMP try to run test binaries
built without honoring LDFLAGS, and will fail to run if you don't have
ld-musl-i386 installed.

Now that canadian-cross support works, you could probably instead
treat i486-linux-musl as a cross compiler (host!=build) instead.

If you prefer you can use x86_64 instead, too, but I like the
toolchain binaries themselves being 32-bit simply because it saves
roughly half the ram usage when running them.

> Then I built busybox. Impressions:
> 
> Only a few options did not build:
> EXTRA_COMPAT and FEATURE_VI_REGEX_SEARCH
> failed because they need GNU regexp extensions.

Yes. There was a request for the corresponding version of some of
these extensions in the POSIX regex API, which bb vi could be
converted to use if we adopted them, but there is some nontrivial
runtime cost. It probably doesn't matter now since the regexec core is
fairly slow (well, good big-O but bad constant) right now, but it may
impose a more noticable relative cost once we make regexec faster.

> FEATURE_MOUNT_NFS and FEATURE_INETD_RPC do not build
> because they need rpc/rpc.h.
> Not complaining, since them being in libc was a mistake
> in the first place.

I think glibc has dropped these too now, haven't they? AFAIK they're
just keeping the symbols for ABI compat but they're not an exposed API
for linking new code; you're supposed to use tirpc.

> Now, the good news - musl has smaller data!
> 6695 bytes versus 7129 bytes for uclibc:
> 
>text  data   bss dechex filename
>  894902   465  6664  902031  dc38f busybox.uclibc
>  912538   563  6132  919233  e06c1 busybox.musl

Probably getpw*/getgr* static buffers or something. musl's backend for
these just uses the buffer out of getline so as not to impose cost in
apps that don't need the legacy (non-_r) functions or an arbitrary
limit on record length.

It would be interesting to know where the text size increase comes
from. Not that big in abs difference, but given that a large portion
of the text is busybox code and not libc, it's a fairly large relative
difference. Come to think about it, it might actually be regex..

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Re: Possible Vulnerability in httpd.c

2016-11-22 Thread Rich Felker
On Tue, Nov 22, 2016 at 02:30:50AM +0100, Denys Vlasenko wrote:
> On Mon, Nov 21, 2016 at 8:18 PM, Simon Rettberg
>  wrote:
> > On Mon, 21 Nov 2016 20:37:14 +0200
> >  Timo Teras  wrote:
> >>
> >> It is still good practice to fill it with snprintf. If this is done,
> >> proper error checking should be done to check the final 'len' that it
> >> does not exceed IOBUF_SIZE or you have information leak bug (since
> >> snprintf returns the length it would generate if buffer was unbounded;
> >> not the length it actually wrote to the buffer).
> >>
> > Exactly. It typically goes like this: Someone is using functions generally
> > considered "unsafe" because they know for sure it is not exploitable the way
> > it's being used in this specific instance. Then eventually someone else 
> > comes
> > along, adds feature X using unsafe functions as well (you try to do it the 
> > same
> > way the rest is written, right?), and boom, you suddenly got your exploit
> > because X happens to enable the remote user to inject arbitrarily long data
> > (think some %s the user can control).
> 
> Yes, I am aware this happens.
> 
> > It should be fixed properly, handling the case where the return value is 
> > either
> >> BUFSIZ or even < 0.
> 
> If you are afraid that there can be bugs of the "we thought it can't overflow,
> but it in fact can", then why you aren't afraid that length checking code
> wouldn't have bugs?

The latter is more obvious and easier for people not familiar with the
code to check. The former relies on a lot of hidden assumptions
including the value of BUFSIZ and the sizes of types.

> Different projects choose their paranoias differently.
> >From its inception, bbox was paranoid about code size.
> 
> If you see an actual bug where buffer can overflow,
> I'm more than willing to fix it.
> 
> But if there is no actual bug, and it's just a general concern
> that "it looks unsafe", then code size trumps it.

Have you stopped to consider the size from pulling in the deprecated
sprintf function to begin with? If all references to it were removed,
then static-linked busybox would only have snprintf, not sprintf. On
musl/i386 this would only save about 50 bytes but it might save more
on other archs or libcs.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Current git HEAD busybox segfaults on some applets

2016-09-16 Thread Rich Felker
On Fri, Sep 16, 2016 at 04:35:23PM +, Laurent Bercot wrote:
> 
>  A full gdb output is available here: http://pastebin.com/3k6SENiX
> 
>  The issue comes from the fact that fflush(stream) is #define'd as
> fflush_unlocked(stream), so fflush(0) actually runs
> fflush_unlocked(0), which segfaults with the current version of
> musl (but not glibc).
> 
>  fflush_unlocked() and friends are not part of the standard API
> (only getc_unlocked and putc_unlocked are) and it is a mistake
> to assume they 1. exist, 2. behave the same as their non-unlocked
> counterpart.
> 
>  The culprit is this commit:
> https://git.busybox.net/busybox/commit/include/libbb.h?id=aa3576a29b9619f4e1c1b131f5db53ad2bc2cb00
> 
>  Later commits modify the additions in libbb.h, but those
> additions are incorrect in the first place. I will send a
> patch that removes them.

More to the point, they're unnecessary and unsafe. The *_unlocked
functions are not well-specified, and except for getc/putc, they're
not going to yield you any performance benefits anyway.

The issue here happened because musl's fflush_unlocked is just
provided as an alias for the internal function fflush() calls once the
file is locked, or for each function in the list of all open files
when fflush(0) is used. This internal function is not capable of
handling the fflush(0) behavior itself. Maybe it should be able to;
I'm leaning towards getting rid of all the *_unlocked functions except
the standard ones and making the rest aliases for the non-_unlocked
version just to keep this kind of thing from happening in the future.
But I don't think it's a good idea for applications to be using them.
They were added purely for compatibility with broken software that
assumes they exist, not because they're useful for anything.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] getrandom: new applet

2016-07-12 Thread Rich Felker
On Tue, Jul 12, 2016 at 01:09:42PM -0400, Michael Conrad wrote:
> On 7/7/2016 11:49 AM, Rob Landley wrote:
> >On 07/06/2016 11:41 AM, Etienne Champetier wrote:
> >>>Now you really hate the fact that getrandom() is a syscall.
> >I do not hate the fact getrandom is a syscall. I'm asking what the point
> >is of a new applet to call this syscall. You have suggested it could
> >block to show when /dev/urandom is producing a higher grade of
> >randomness than it does before being properly seeded. That is, as far as
> >I can tell, the only reason for your new applet to exist rather than
> >upgrading $RANDOM in ash/hush.
> >
> Actually in my opinion the syscall is inferior to a new character
> device, because blocking syscalls interfere with event-driven
> programming.
> 
> Suppose you want to write a single-thread event-driven web server
> which initializes its SSL library with this randomness source (i.e.
> won't allow SSL until enough entropy is collected for a good
> initialization) but you still want it to be able to accept non-SSL
> connections.  In order to use the syscall you need a thread, or
> child process.  (haha, such as pipe to a "getrandom" applet...)

Thanks for contributing your ideas about what color the bikeshed
should be, but getrandom's already got you covered. Until your csPRNG
is initialized, just call getrandom with the GRND_NONBLOCK once each
time you get any event (i.e. each time poll() returns). There's no
need to busy-wait or periodically check with timeouts since you don't
need the results until there's an event to act on, anyway.

There are very good reasons it's a syscall rather than a device: many
use cases require a never-fails entropy source, and with the device
node approach they're vulnerable to fd-exhaustion attacks. Most
existing bad code, when faced with such a situation, falls back to
some completely insecure seed like time(). The only reliable way to
prevent such idiocy was to provide an interface that can't fail.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] getrandom: new applet

2016-07-11 Thread Rich Felker
On Mon, Jul 11, 2016 at 08:19:29PM +0200, Denys Vlasenko wrote:
> On Mon, Jul 11, 2016 at 8:58 AM, Rob Landley  wrote:
> > Anyway, the point is you can't voluntarily collect _more_ entropy from
> > these sources. They run when there's data to gather, calling the
> > routines when there isn't new data to gather doesn't help much.
> 
> And yet, interestingly, if you try to solve an opposite problem,
> the predictable and accurate benchmarking, you suddenly discover that
> the damn thing just can't be made to run completely deterministically.

The fact that your benchmark timings vary from run to run doesn't have
anything to do with whether they're deterministic or predictable. It's
mostly a function of the full state of the kernel (and all running
processes) -- aside from asynchronous hardware events, the main
factors which influce the timings are the points at which the
scheduler interrupts, the state of the cache, page tables, and tlb,
etc.

On modern x86's there's quite a degree of variation, for complicated
reasons that basically all stem from (1) unpredictable asynchronous
peripheral hardware and (2) internal unpredictability due to the
tricks involved in running at such high clocks. On small embedded
systems with little or no peripheral hardware, though, things are
quite predictable. I don't have figures but my (educated, based on
kernel debugging) guess would be that it's common for the number of
possible states when enterring init to be on the order of tens or at
most hundreds.

> Let's boot in single user, and run just one process, you say.
> Let's pin it to CPU0, you say. Switch off all dynamic CPU scaling,
> you say. No way. Still not totally deterministic.

Your logic is badly^H^H^H^H^Hdisastrously wrong here. Absolute or even
near determinism is not needed for there to be a major vulnerability.
Even if there are on the scale of 2^32 possible initial states,
computing all the resulting keys for each and trying them all is a
trivial computation even for individual attackers. With modern cloud
computing you don't need to be a nation-state to do this kind of brute
forcing.

> I tried timing CPU cycle counter delta across a CPUID(0) instruction.
> That alone had some jitter. I collected a few megabytes of it and ran
> it through entropy estimator, it said there is ~0.9 bit of randomness per 
> byte.
> (I'm not claiming this is true on all CPUs. I'm only sharing my experience
> on my CPU).
> 
> I tried sending signals to my own process and timing how long does that take.
> Also varies.

Your interactive experiments with an interactive system are
meaningless because you have the biggest entropy input of all: the
timing of your own human-generated input. In order for them to be
meaningful you need to start from cold-boot and script the
measurements to automatically run.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] getrandom: new applet

2016-07-08 Thread Rich Felker
On Wed, Jul 06, 2016 at 12:31:01AM -0500, Rob Landley wrote:
> On 07/05/2016 11:49 PM, Rich Felker wrote:
> > On the other hand, /dev/urandom has a problem that it will give
> > results before sufficient entropy has been collected, resulting in
> > duplicate sequences (and thus duplicate keys generated) on identical
> > devices with a fairly high probability.
> 
> Which is why you'd read a byte from /dev/random first if you need to
> wait for the pool to have entropy in it? Given that there's an existing
> mechanism that will block that'sbeen here for 20 years?

The answer to why this is not a solution is in the previous paragraph
that you didn't quote.

> > The getrandom syscall was added both to address this deficiency in
> > urandom, and to address the case where random bytes are needed but no
> > file descriptors are available (to protect against fd exhaustion
> > attacks undermining crypto, basically).
> 
> If no file descriptors are available, you can't launch a getrandom command.

This text is about the syscall not the proposed busybox command.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] getrandom: new applet

2016-07-05 Thread Rich Felker
On Tue, Jul 05, 2016 at 09:33:04PM -0500, Rob Landley wrote:
> On 07/05/2016 06:42 AM, Etienne Champetier wrote:
> >> You've implied that this new API can block until it's initialized, which
> >> reading from /dev/random can already do, and presumably
> >> select/poll/inotify could do on /dev/random without consuming entropy.
> > 
> > As shown in my mail from 29 june 2016 at 15:54
> > [   14.321536] ### getrandom ###
> > [   42.603677] ### dd ###
> > 
> > getrandom() wait until /dev/urandom is initialized, which can be way
> > before /dev/random is initialized
> 
> Ok, this is a new assertion. How does that work? How would /dev/urandom
> get random data to seed it without /dev/random having enough entropy for
> one byte of output?

The key point is that /dev/random is "tinfoil hat compliant". The
things it does (and especially considering entropy to become
"depleted") have no basis in reality but can't change because people
will complain (bikeshedding). /dev/random is safe to use, but very
slow and costly, _even after_ it has collected sufficient entropy
(because it wrongly considers that entropy to become "depleted").

On the other hand, /dev/urandom has a problem that it will give
results before sufficient entropy has been collected, resulting in
duplicate sequences (and thus duplicate keys generated) on identical
devices with a fairly high probability.

The getrandom syscall was added both to address this deficiency in
urandom, and to address the case where random bytes are needed but no
file descriptors are available (to protect against fd exhaustion
attacks undermining crypto, basically).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Bug 9076 - Whois using a non working host for queries by default

2016-07-04 Thread Rich Felker
On Mon, Jul 04, 2016 at 09:36:07PM -0500, Rob Landley wrote:
> On 07/04/2016 02:47 PM, Vito Mulè wrote:
> > stopped using strcpy,
> 
> Why?
> 
> > +size_t query_len = strlen(argv_host);
> > +   char *str_token = malloc(query_len+1 * sizeof(char));
> 
> > +   strncpy(str_token, argv_host, query_len+1);
> 
> Ignoring the way you've taken 3 lines to say strdup() (and your
> sizeof(char) is only applying to the 1, not query_len+1, but it's ok
> because sizeof(char) is a constant 1 on every system linux has ever
> supported)...

Not just Linux. sizeof(char) is literally the answer to the question
"how many chars in a char?" or "how big is char in units of char?"

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: [PATCH] getrandom: new applet

2016-07-03 Thread Rich Felker
On Sun, Jul 03, 2016 at 06:44:22PM +0200, Denys Vlasenko wrote:
> On Tue, Jun 28, 2016 at 6:56 PM, Etienne CHAMPETIER
>  wrote:
> > first user of this applet will be LEDE (OpenWrt) to save an urandom seed
> > using getrandom() (so we are sure /dev/urandom pool is initialized)
> 
> 
> Please be more specific which task you want to achieve with this tool.
> 
> I googled for this name and no such tool exists (yet?) in distros,
> I hesitate to introduce a new tool and then have an API collision
> when they add something similar.
> 
> BTW, I know that security people would scream bloody murder,
> but wouldn't
> 
> cd /proc && cat cpuinfo meminfo stat interrupts diskstats slabinfo
> schedstat buddyinfo >/dev/random
> 
> in practice work quite satisfactorily for adding some entropy at boot time?
> If you don't think so, can you demonstrate a setup where the output is
> predictable?

Probably at most a few bits on SoCs without unpredictable peripherals
involved. I could try running some tests on single-core J2 when I get
a chance, but I would not expect this approch to be suitable for
anything where you depend on the output being secret/unpredictable.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] getrandom: new applet

2016-07-03 Thread Rich Felker
On Sun, Jul 03, 2016 at 06:46:32PM +0200, Denys Vlasenko wrote:
> On Tue, Jun 28, 2016 at 7:47 PM, walter harms  wrote:
> > perhaps a better aim ist to improve the $RANDOM in ash ?
> 
> $RAMDOM generator in my tests passed all "dieharder -g 200 -a" tests.
> How much better than this do you need?

That's not saying much. A tempered LCG with 32-bit state/period can
pass all of dieharder except the period tests.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: kbuild overwrites $CC

2016-07-02 Thread Rich Felker
On Fri, Jul 01, 2016 at 06:16:59PM +0200, Martin Kaiser wrote:
> Dear all,
> 
> I'm trying to compile busysbox with a toolchain created from
> yoctoproject. This toolchain comes with a setup script to set $CC and
> cross-compiler related environment variables. If the toolchain is
> installed to a non-standard path, $CC contains the --sysroot parameter,
> e.g.
> 
> martin@vagrant-ubuntu-trusty-32:~/src/busybox$ echo $CC
> arm-poky-linux-gnueabi-gcc -march=armv5te -marm -mthumb-interwork 
> --sysroot=/opt/poky-1.7.3/sysroots/armv5te-poky-linux-gnueabi
> 
> Busybox' Makefile unconditionally overwrites CC with
> 
> CC  = $(CROSS_COMPILE)gcc
> 
> CROSS_COMPILE ist set up correctly in the environment, so I get the
> right compiler but the --sysroot parameter is stripped. As the include
> path depends on sysroot, the compiler doesn't find some include files
> and busybox compilation fails.
> 
> Do you have an idea how this is supposed to work? Should I submit a
> patch that replaces CC = ... with CC ?= ... ;-)

In general, putting CC= on the make command line rather than in the
environment will override any make variables (unless an obscure GNU
make feature to force them is used, and it's not used here). If you
already have it in the environment, this will work:

make CC="$CC"

or if you want _all_ vars from the environment to take precedence:

make -e

That can be dangerous though since makefiles might happen to use var
names that happen to clash with something in your environment for
purely internal purposes, and in a worst case it could end up leading
to rm -rf of some directory you care about (I doubt this is likely
with the busybox makefile but I still don't recommend -e).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH v3 3/4] nsenter: new applet

2016-03-14 Thread Rich Felker
On Mon, Mar 14, 2016 at 02:27:56PM -0400, Mike Frysinger wrote:
> On 14 Mar 2016 18:11, Bartosz Gołaszewski wrote:
> > 2016-03-14 15:27 GMT+01:00 Mike Frysinger :
> > > On 14 Mar 2016 11:07, Bartosz Golaszewski wrote:
> > >> +#ifndef __BB_NAMESPACE_H
> > >> +#define __BB_NAMESPACE_H
> > >
> > > use a naming style like other busybox headers
> > >
> > >> +/*
> > >> + * Longest possible path to a procfs file used in namespace utils. Must 
> > >> be
> > >> + * able to contain the '/proc/' string, the '/ns/user' string which is 
> > >> the
> > >> + * longest namespace name and a 32-bit integer representing the process 
> > >> ID.
> > >> + */
> > >> +#define NS_PROC_PATH_MAX (sizeof("/proc//ns/user") + sizeof(pid_t) * 3)
> > >
> > > using the sizeof pid_t as a proxy for how many chars it'd take to render
> > > a decimal number in ASCII is wonky.  just hardcode it as "10" since that
> > > is the largest unsigned 32bit number ("4294967296").
> > 
> > Can you elaborate on how it's wonky? While your solution is perfectly
> > fine I think that there's nothing wrong with the way I've done it
> > neither.
> 
> the code seems to assume that the byte size scales into the number of
> chars required to represent the number in base 10 when it's really a
> log scale.  here's a table to show it's wonky:

sizeof is already a log scale. 3*sizeof(pid_t) is
3*log_256(MAX_PID+1), which is greater than 3*ceil(log_1000(MAX_PID)),
the actual space requirement.

> pid_t |sizeof |*3 val|actual|
> size  |(bytes)|(char)|  |
> (bits)|   |  |  |
> --|---|--|--|
>   8   |  1|  3   |  4   |
>  16   |  2|  6   |  6   |
>  32   |  4| 12   | 11   |
>  64   |  8| 24   | 20   |
> 128   | 16| 48   | 40   |
> 
> the "actual" value is one higher than you might expect because pid_t
> is a signed quantity.  for 8bit, "-128" is 0x80 and takes 4 bytes.
> for 32bit ("int"), "-2147483648" is 0x8000 and takes 11 bytes.

While pid_t is signed, negative values are not meaningful as pids;
they're pgids or special sentinels in some contexts. If this code is
trying to print a negative pid into a pathname used in /proc, that's a
bug. But in general, when using the 3*sizeof idiom you should add +1
for a sign if needed and another +1 for null termination if not
already included elsewhere.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: [PATCH v3 3/4] nsenter: new applet

2016-03-14 Thread Rich Felker
On Mon, Mar 14, 2016 at 10:27:19AM -0400, Mike Frysinger wrote:
> On 14 Mar 2016 11:07, Bartosz Golaszewski wrote:
> > +#ifndef __BB_NAMESPACE_H
> > +#define __BB_NAMESPACE_H
> 
> use a naming style like other busybox headers

And in particular, don't use leading underscores, ever. They're not
available for use by applications.

> > +/*
> > + * Longest possible path to a procfs file used in namespace utils. Must be
> > + * able to contain the '/proc/' string, the '/ns/user' string which is the
> > + * longest namespace name and a 32-bit integer representing the process ID.
> > + */
> > +#define NS_PROC_PATH_MAX (sizeof("/proc//ns/user") + sizeof(pid_t) * 3)
> 
> using the sizeof pid_t as a proxy for how many chars it'd take to render
> a decimal number in ASCII is wonky.  just hardcode it as "10" since that
> is the largest unsigned 32bit number ("4294967296").

I disagree that this is "wonky". It's a simple safe bound for number
of characters needed to print an integer type. Hard-coding a number,
while unlikely to affect any future Linux targets, is a bomb in the
code that could become dangerous if someone reused it on a system with
pid_t larger than 32-bit.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 0/2] ntpd: retry name-resolution until success

2016-02-10 Thread Rich Felker
On Thu, Feb 11, 2016 at 05:56:06AM +0100, Denys Vlasenko wrote:
> On Thu, Feb 4, 2016 at 2:43 PM, Mark O'Donovan  wrote:
> > This is an attempt to fix bug 8131:
> >   ntpd: should retry on name resolving error
> 
> Should it?
> Why doesn't ping do this? telnet? netcat? No network utility does this.
> Adding it to ntpd only would be inconsistent.

Because ping, telnet, and netcat are not daemons. A daemon should
respond gracefully to transient failures rather than aborting.

> I can see why this request comes specifically for ntpd. It is often started
> early at boot. Sometimes, network isn't up for some tens of seconds
> after boot, so ntpd fails to resolve names.
> 
> There are other ways to fix this.
> An utterly simplistic one would be to delay ntpd startup by, say, 20 seconds.

That's utterly broken. You end up with an invalid or badly-wrong
clock for the first 20 seconds after boot, which could lead to all
sorts of problems with timestamps.

> A slightly better one is to restart ntpd when it fails.

I don't see how this is any better than having it retry internally.
It's probably more work to implement and yields worse behavior.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH V2]networking: Allow dot at the end of the domain name in dhcp response

2016-02-10 Thread Rich Felker
On Thu, Feb 11, 2016 at 05:59:39AM +0100, Denys Vlasenko wrote:
> On Tue, Feb 9, 2016 at 6:23 PM, Balaji Punnuru  
> wrote:
> > When a dhcp server responds with a domain name that ends with a ".",
> > domain name validation is failing which leads to populating domain bad in
> > resolv.conf
> >
> > Domain name ending with . is a valid syntax according RFC-1034.
> >
> > The Patch fixes the domain name validation which ends with "."
> 
> Well, this may be technically allowed, but wouldn't you think that
> such DHCP response is definitely odd, and it's better to know about it?

Maybe a log message for the oddity, but it should not result in active
misconfiguration. Ending dns names with a dot is valid and IIRC
historically this practice was used to indicate that the name is
anchored to the dns root rather than possibly being interpreted
relative to some domain. The standard APIs for lookups all accept
names ending with a dot, and this should suppress search domains.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Fix compiling with musl's utmp stubs

2016-01-30 Thread Rich Felker
On Sun, Jan 31, 2016 at 01:02:27AM -0500, Kylie McClain wrote:
> >From 4d76d9b69503b1ad61cf857d83cb8c4d5d92465b Mon Sep 17 00:00:00 2001
> From: Kylie McClain 
> Date: Sat, 30 Jan 2016 23:00:18 -0500
> Subject: [PATCH] Fix compiling with musl's utmp stubs
> 
> This patch fixes compiling busybox with FEATURE_UTMP and _WTMP enabled.
> musl, while not really support utmp/wtmp, provides stub functions, as well
> as variables such as _PATH_UTMP, so that programs using utmp or wtmp can
> still compile fine.
> 
> My reasoning for this patch is that on Exherbo, I'm currently trying to get
> us to be able to use the same busybox config file for both glibc and musl
> systems, using utmp/wtmp on systems that support it, and using the stubs
> on musl without needing two different configs.
> 
> As of latest musl git, it provides all utmp functions needed; 1.1.12 doesn't,
> but I sent a patch to Rich to add the utmp{,x}name functions expected to
> exist, which was merged into musl upstream.
> ---
>  include/libbb.h | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/libbb.h b/include/libbb.h
> index a8ceb44..c1516a9 100644
> --- a/include/libbb.h
> +++ b/include/libbb.h
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #if defined __UCLIBC__ /* TODO: and glibc? */
>  /* use inlined versions of these: */
>  # define sigfillset(s)__sigfillset(s)
> @@ -106,7 +107,11 @@
>  #  define updwtmpx updwtmp
>  #  define _PATH_UTMPX _PATH_UTMP
>  # else
> +#  include 
>  #  include 
> +#  if defined _PATH_UTMP && !defined _PATH_UTMPX
> +#   define _PATH_UTMPX _PATH_UTMP
> +#  endif
>  # endif
>  #endif
>  #if ENABLE_LOCALE_SUPPORT
> -- 
> 2.7.0

I think we could just expose these from utmpx.h on musl if it helps.
They're in the reserved namespace so that's certainly permissible.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Remove a compilation warning

2016-01-01 Thread Rich Felker
On Fri, Jan 01, 2016 at 02:44:25PM -0500, Jody Bruchon wrote:
> How about putting the left side in parentheses to silence the
> warning instead of turning the warning off completely?

My point was that the parentheses make the code less readable, not
more readable, and that nobody would reasonably mistake the operator
precedence in this case or in any case reported by
-Wlogical-not-parentheses. It's a bogus warning.

Rich

> On January 1, 2016 2:14:55 PM EST, Rich Felker <dal...@libc.org> wrote:
> >On Thu, Dec 31, 2015 at 11:01:24AM +0100, Cristian Ionescu-Idbohrn
> >wrote:
> >> Reported by gcc (Debian 5.3.1-4) 5.3.1 20151219
> >> 
> >> shell/ash.c: In function 'evaltree':
> >> shell/ash.c:8432:19: warning: logical not is only applied to the left
> >hand side of comparison [-Wlogical-not-parentheses]
> >>if (!exitstatus == is_or)
> >>^
> >
> >I would prefer turning off this warning instead of making the code
> >harder to read. If someone would read !a == b as !(a==b) then they
> >have much bigger C comprehension problems they need to work on.
> >
> >Rich
> >___
> >busybox mailing list
> >busybox@busybox.net
> >http://lists.busybox.net/mailman/listinfo/busybox
> 
> -- 
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Remove a compilation warning

2016-01-01 Thread Rich Felker
On Thu, Dec 31, 2015 at 11:01:24AM +0100, Cristian Ionescu-Idbohrn wrote:
> Reported by gcc (Debian 5.3.1-4) 5.3.1 20151219
> 
> shell/ash.c: In function 'evaltree':
> shell/ash.c:8432:19: warning: logical not is only applied to the left hand 
> side of comparison [-Wlogical-not-parentheses]
>if (!exitstatus == is_or)
>^

I would prefer turning off this warning instead of making the code
harder to read. If someone would read !a == b as !(a==b) then they
have much bigger C comprehension problems they need to work on.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: how to use ntp in busybox

2015-11-24 Thread Rich Felker
On Tue, Nov 24, 2015 at 10:40:51AM +0100, walter harms wrote:
> 
> 
> Am 23.11.2015 07:49, schrieb Wei, Catherine:
> > Hi:
> >  I've checked the ntpclient, and found that it can monitor the time 
> > difference between server and local machine, but seems it cannot support 
> > runtime configuration as Michael D. Setzer II said.
> > 
> 
> We use it busybox ntp for some time now, also with ntpclient since it is 
> convenient.
> btw: ntpdate is removed from ntp some time ago.
> 
> Why do you need a runtime configuration at all ? you can always restart with 
> a new setup,
> write a small wrapper and you can use a file again.

Indeed, this seems ideal. Proper use of ntp only makes extremely-small
adjustments to the clock rate, so stopping the ntp client for a
fraction of a second (or even several hours) and restarting it with
new configuration is not going to have any noticable effect on the
system clock.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] fix hush umask command

2015-10-06 Thread Rich Felker
On Wed, Oct 07, 2015 at 12:31:15AM -0400, Rich Felker wrote:
> The hush umask command is backwards; umask 022 sets the umask to 0755.
> Attached patch should fix it.
> 
> Rich

> diff --git a/shell/hush.c b/shell/hush.c
> index 96c739f..58c8dab 100644
> --- a/shell/hush.c
> +++ b/shell/hush.c
> @@ -8956,9 +8956,7 @@ static int FAST_FUNC builtin_umask(char **argv)
>   if (argv[0]) {
>   mode_t old_mask = mask;
>  
> - mask ^= 0777;
>   rc = bb_parse_mode(argv[0], );
> - mask ^= 0777;
>   if (rc == 0) {
>   mask = old_mask;
>   /* bash messages:

Hmm, this may break symbolic umask specs, a feature I've never seen
used and wasn't aware even existed. I might look into it more later
but it would be great if someone who already knows this stuff could
figure it out and work out a correct patch.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


[PATCH] fix hush umask command

2015-10-06 Thread Rich Felker
The hush umask command is backwards; umask 022 sets the umask to 0755.
Attached patch should fix it.

Rich
diff --git a/shell/hush.c b/shell/hush.c
index 96c739f..58c8dab 100644
--- a/shell/hush.c
+++ b/shell/hush.c
@@ -8956,9 +8956,7 @@ static int FAST_FUNC builtin_umask(char **argv)
if (argv[0]) {
mode_t old_mask = mask;
 
-   mask ^= 0777;
rc = bb_parse_mode(argv[0], );
-   mask ^= 0777;
if (rc == 0) {
mask = old_mask;
/* bash messages:
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: [PATCH] su: support denying accounts with blank password

2015-10-01 Thread Rich Felker
On Thu, Oct 01, 2015 at 03:42:40PM -0300, Alain Mouette wrote:
> Why would you want to completely disable root login?
> 
> If it is a security feature, how can it be used? It can be
> interesting to avoid escalating priviledges, but then how to to
> administrative tasks? (assuming ssh root login is disabled too)

Requiring pubkey-based ssh login is the canonical best-practices way.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [RFC/PATCH v2 3/5] libbb: add ends_with() function

2015-08-21 Thread Rich Felker
On Fri, Aug 21, 2015 at 06:26:27PM +0200, walter harms wrote:
 
 
 Am 21.08.2015 16:23, schrieb Bartosz Golaszewski:
  This function checks if given key can be found at the end of the string.
  
  Signed-off-by: Bartosz Golaszewski bartekg...@gmail.com
  ---
   include/libbb.h  |  1 +
   libbb/compare_string_array.c | 30 ++
   2 files changed, 31 insertions(+)
  
  diff --git a/include/libbb.h b/include/libbb.h
  index a56b684..7b41c9b 100644
  --- a/include/libbb.h
  +++ b/include/libbb.h
  @@ -422,6 +422,7 @@ const char *bb_basename(const char *name) FAST_FUNC;
   char *last_char_is(const char *s, int c) FAST_FUNC;
   const char* endofname(const char *name) FAST_FUNC;
   char *is_prefixed_with(const char *string, const char *key) FAST_FUNC;
  +int ends_with(const char *str, const char *key) FAST_FUNC;
   
   int ndelay_on(int fd) FAST_FUNC;
   int ndelay_off(int fd) FAST_FUNC;
  diff --git a/libbb/compare_string_array.c b/libbb/compare_string_array.c
  index e24815a..a2d77c7 100644
  --- a/libbb/compare_string_array.c
  +++ b/libbb/compare_string_array.c
  @@ -23,6 +23,19 @@ char* FAST_FUNC is_prefixed_with(const char *string, 
  const char *key)
   #endif
   }
   
  +int FAST_FUNC ends_with(const char *str, const char *key)
  +{
  +   size_t str_len = strlen(str), key_len = strlen(key);
  +
  +   if (str_len = key_len) {
  +   if (strcmp(str + str_len - key_len, key) == 0) {
  +   return 1;
  +   }
  +   }
  +
  +   return 0;
  +}
  +
 
 maybe its a bit late but ...
 
 the function name is a bit unfortunate whats about has_suffix() ?
 
 you can improve readability by returning strcmp directly and calculating
 the len immediately.
 
 int has_suffix(const char *str, const char *key)
 {
 size_t len = strlen(str)-strlen(key);
 
 if (len  0 )
   return -1;

No you can't. len0 is always false because len is unsigned. The
original code in the patch was correct; your version is buggy.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: osuosl.org and Spamhaus PBL

2015-08-03 Thread Rich Felker
On Wed, Jul 22, 2015 at 03:16:15PM +0200, Laurent Bercot wrote:
   X-Greylist: delayed 00:06:59 by SQLgrey-1.7.6
 
  (For the record, I read the headers wrong, and it's osuosl.org
 that actually performs that greylisting. My apologies to Numericable,
 for once I accused them wrongly.
  Now to understand what that greylisting does for osuosl.org...
 if they're using PBL in the first place, and only accept SMTP
 connections from known ISP servers... oh well, some things are
 best left unexplained.)

Greylisting (at least if passing results are cached) is a lot less
hostile than outright blocking, but I share your sentiment and would
strongly prefer that this list (and in general, all mail services used
by multiple users, some of whom may not want this hostile blocking)
not use the PBL. I've even had my (non-dynamic, business class) IP
addresses blocked by it in the past and nearly had to pay ransom to
get them delisted.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 1/2] Revert ash: use alloca to get rid of setjmp

2015-07-26 Thread Rich Felker
On Wed, Jul 22, 2015 at 04:02:22PM +0100, Daniel Thompson wrote:
 2015-07-22 5:19 GMT+02:00 Rich Felker dal...@libc.org:
 On Sun, Jul 19, 2015 at 11:07:13PM +0200, Denys Vlasenko wrote:
 I would rather keep it.
 
 What is the most horrible thing which can happen here?
 
 Arbitrary code execution due to stack overflow. Does this really need
 a PoC? alloca is _always_ unsafe unless the argument is bounded and
 tiny.
 
 It would interesting to know if ash ever automatically runs its
 tokenizer over environment variables.
 
 If the tokenizer can only run on the command stream then there's not
 much to be gained from overflowing the stack since anyone who can
 inject an evil token in to command stream already has shell access.

This is not the case. A counterexample is eval acting on a string
constructed from untrusted input that was already validated to be safe
(e.g. to consist entirely of alphanumeric characters).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 1/2] Revert ash: use alloca to get rid of setjmp

2015-07-21 Thread Rich Felker
On Sun, Jul 19, 2015 at 11:07:13PM +0200, Denys Vlasenko wrote:
 I would rather keep it.
 
 What is the most horrible thing which can happen here?

Arbitrary code execution due to stack overflow. Does this really need
a PoC? alloca is _always_ unsafe unless the argument is bounded and
tiny.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 2/2] ash: use alloca to get rid of setjmp

2015-07-15 Thread Rich Felker
On Mon, Jul 13, 2015 at 04:25:02AM +0200, Denys Vlasenko wrote:
 On Thu, Jul 2, 2015 at 10:01 AM, Ron Yorston r...@frippery.org wrote:
  Rich Felker wrote:
 In general alloca is unsafe. It's not obvious to me what the code here
 is doing, so I can't tell for sure if it's safe or not, but I think
 this needs a strong justification of safety before being acceptable.
 
  It's a parser for a POSIXy shell, I doubt that the code is obvious to
  anyone.
 
  My understanding is that it's reading a token and has got to the point
  where a command substitution has been detected.  It wants to save the
  bit of the token it's already processed.  So if we have
 
 echo very long string`date`
 
  the code would allocate space for the very long string.
 
 In practice, it would be a problem if very long string
 is some hundreds of kbytes, even a few mbytes long.
 
  Is this safe?  In most cases it probably is, but not if the script is
  malicious.  If the very long string is too big for your stack you get a
  seg fault or worse.  With a suitably long string and small stack I can
  reliably crash dash.
 
 With a sufficiently small memory limits you can crash any shell.
 
 Let's go with this change, unless someone sees a way to not just
 crash, but do something worse.

I suspect it can easily be made to do arbitrary code execution when
otherwise-safe (e.g. checked against whitelist for special chars)
strings from untrusted input are expanded inside eval commands.

Any new use of VLA/alloca should be completely banned. It's basically
always an exploitable bug.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 2/2] ash: use alloca to get rid of setjmp

2015-07-01 Thread Rich Felker
On Wed, Jul 01, 2015 at 04:46:18PM +0100, Ron Yorston wrote:
 Now that the only thing protected by setjmp/longjmp is the saved string,
 we can allocate it on the stack to get rid of the jump.
 
 Based on commit bd35d8e from git://git.kernel.org/pub/scm/utils/dash/dash.git
 by Herbert Xu.

In general alloca is unsafe. It's not obvious to me what the code here
is doing, so I can't tell for sure if it's safe or not, but I think
this needs a strong justification of safety before being acceptable.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Re: [EDT][PATCH 0/11] typo fixes in busybox

2015-05-21 Thread Rich Felker
On Fri, May 22, 2015 at 03:39:18AM +, Maninder Singh wrote:
  EP-E9D7571734A347E2ADA07C4134AB97EA
  Hi,
 
 Please fix your MUA or MTA to omit that EP- thing in the mail-body.
 Hi Bernhard,
 Already requested for fixing, from Next time hopefullt it will be fixed.
 Do i need to resubmit the patches if changes are acceptable?

You're also breaking threading by sending a new message rather than
replying (the In-Reply-To: header is missing from this reply). From
the headers in your messages, your mail software looks like it's
seriously misconfigured..

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Bionic lacks transparent LFS migrations; provide a workaround

2015-04-28 Thread Rich Felker
On Tue, Apr 28, 2015 at 09:53:26PM -0400, Matt Whitlock wrote:
 On Wednesday, 29 April 2015, at 1:26 am, Tanguy Pruvot wrote:
  For this case i generally cast the param to the biggest possible type.
  
  printf(llu, (uint64_t) val);
 
 You probably shouldn't assume that uint64_t is the largest possible
 integer type. In particular, off_t is not guaranteed to be = 64
 bits.
 
 #include inttypes.h
 printf(%PRIuMAX, (uintmax_t) val);
 
 But I hate the idea of involving all the nasty large-integer
 arithmetic machinery if it's not strictly necessary.

PRIuMAX is useless. There's a standard name for this, %ju.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Bionic lacks transparent LFS migrations; provide a workaround

2015-04-28 Thread Rich Felker
On Tue, Apr 28, 2015 at 03:55:59PM -0400, Matt Whitlock wrote:
 On Tuesday, 28 April 2015, at 9:56 am, Rich Felker wrote:
  What I do think is a bad idea is providing a hackish environment where
  only some things work correctly with 64-bit off_t and others silently
  fail or even misinterpret their arguments and blow up. If that can be
  avoided and the ugliness can be contained in a pit that's isolated to
  Bionic, I'm not opposed to it.
 
 For what it's worth, BusyBox *presently* misinterprets arguments on
 Bionic because Bionic's struct stat has 64-bit st_ino and st_size,
 but BusyBox happily assigns the values of these fields into
 variables of type ino_t and off_t, which breaks at run-time. These

Uhg.

 problems would be apparent at compile-time too, except that
 BusyBox's build system doesn't specify -Wconversion, so no warnings
 are produced. Specifying -Wconversion reveals numerous places
 throughout the BusyBox sources where rvalues are implicitly
 narrowed. My personal belief is that these warnings should always be
 enabled, and implicit narrowing conversions should always be avoided
 by using explicit casts, to make obvious that a change in value is
 possible, but I am not here to convince the BusyBox devs to
 eliminate all the implicit narrowing conversions in BusyBox.

The implicit narrowing warnings are almost all noise. For instance any
arithmetic with types smaller than int will result in implicit
narrowing when you store the result. So I don't think it's practical
to use them.

 Besides, I agree that Bionic is at fault for deviating from the
 standard here by declaring st_ino and st_size as unsigned long long
 and long long rather than as ino_t and off_t.

Yes, this is very broken. This all really needs to be fixed on
Bionic's side -- they should drop the support for 32-bit off_t and
always remap the standard names (at the header level) to the *64
functions. It would have been better if they'd never defined the
non-64-suffixed functions with the wrong 32-bit-off_t semantics to
begin with but that mistake was already done. Still they can fix the
compile-time behavior without breaking their ABI.

 I'll work on finishing the Large File Support in Bionic and adding
 support for _FILE_OFFSET_BITS==64. Is there any chance that a Bionic
 headers patch could be shipped with BusyBox? Again, there is no
 reason to require BusyBox users to compile against the latest Bionic
 headers just to get support for large files. In fact, LFS could be
 implemented on all versions of Bionic with a simple -include in
 CFLAGS. (That's almost no different than implementing it in
 platform.h, but maybe you'd like it better in a separate file.)

For Busybox in the short term, I'd say just use whatever works well
and doesn't uglify non-Bionic support. But considering how broken this
is there should really be a push to fix Bionic in a way all apps can
use, not just Busybox. Whether that means upstreaming a patch to fix
it or having a patch/preinclude that you can just apply/add to CFLAGS
probably depends on what's practical with upstream...

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 5/5] Bionic lacks transparent LFS migrations; provide a workaround

2015-04-27 Thread Rich Felker
On Mon, Apr 27, 2015 at 06:30:59AM +, dietmar.schind...@manroland-web.com 
wrote:
  On Friday, 24 April 2015, at 4:12 am, Matt Whitlock wrote:
   +#if defined(__BIONIC__)  _FILE_OFFSET_BITS == 64
 
  The preprocessor needs to test whether _FILE_OFFSET_BITS is defined before 
  using it in an
  expression.
 
 That's not true. INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x,
 Programming languages - C, 6.10.1 Conditional inclusion, §4:
 ... After all replacements due to macro expansion and the *defined*
 unary operator have been performed, all remaining identifiers ...
 are replaced with the pp-number 0...
 
 Thus, it's not needed to write
 
  #if defined(__BIONIC__)  defined(_FILE_OFFSET_BITS)  _FILE_OFFSET_BITS 
  == 64

It's a matter of policy. While it's valid C, -Wundef generates
warnings for such usage and thus policy is to avoid it.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: [PATCH] Bionic lacks transparent LFS migrations; provide a workaround

2015-04-26 Thread Rich Felker
On Sat, Apr 25, 2015 at 04:06:23PM -0400, Matt Whitlock wrote:
 On Saturday, 25 April 2015, at 11:38 am, Rich Felker wrote:
  On Sat, Apr 25, 2015 at 04:53:33AM -0400, Matt Whitlock wrote:
   This solves some of the problems arising from Bionic's off_t being
   32 bits wide despite _FILE_OFFSET_BITS==64. See BusyBox bug #6908.
   
   Note that this doesn't solve all such problems since Bionic lacks
   64-bit variants of many standard I/O functions: open, creat, lockf,
   posix_fadvise, posix_fallocate, truncate, sendfile, getrlimit,
   setrlimit, fopen, freopen, fseeko, ftello, fgetpos, fsetpos,
   mkstemp.
  
  As long as that's the case, I think this approach of trying to fake
  64-bit off_t on Bionic is dangerous and probably best omitted. IMO we
  should either be pushing for Bionic to fix these things or sticking
  with 32-bit off_t on Bionic and treating that as a limitation of the
  platform.
 
 Don't throw the baby out with the bathwater. Bionic *does* support
 large files. In fact, it's mandatory in some cases: there are no
 32-bit versions of stat(), lstat(), fstat(), fstatat(), statfs(),
 fstatfs(), or getdents(); and struct stat, struct statfs, and struct
 dirent are equivalent to struct stat64, struct statfs64, and struct
 dirent64 from Glibc, respectively.
 
 Furthermore, while Bionic does support 32-bit lseek(), pread(),
 pwrite(), and ftruncate() with 32-bit off_t, it does also have
 lseek64(), pread64(), pwrite64(), and ftruncate64() with off64_t.
 
 In short, Bionic has most of the syscall-level support introduced by
 _LARGEFILE64_SOURCE in Glibc, but they did not implement the
 transparent migration introduced by _FILE_OFFSET_BITS==64. This is
 easy to implement atop Bionic, however, as it is simply a matter of
 mapping the relevant 32-bit types and functions to their 64-bit
 equivalents. (See feature_test_macros(7).)

This really can and should be fixed at the header level for Bionic.
Nobody should be writing foo64 in sources except as a transparent
workaround for broken systems. While you could do the workaround in
Busybox, I think it would be much more useful as a patch to the Bionic
headers that anyone can apply to the installed headers to fix them,
and a patch to send upstream to get the source of the problem fixed.

 Bionic does *not* implement the 64-bit stdio functions, but those
 use a different type (fpos_t instead of off_t) and so are not
 affected by this patch.
 
 Would you be more willing to apply this patch if I were to produce a
 unit test suite demonstrating that 64-bit file syscalls work with
 off_t while stdio functions remain 32-bit with fpos_t?

That's something of a mischaracterization. The only useful stdio
functions use off_t: fseeko and ftello. fgetpos and fsetpos are rather
useless since all they can do is save/restore a position you've
already reached as an abstract position object. They cannot work with
any kind of sequential addresses.

As for being willing to apply the patch, it's not my decision.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Bionic lacks transparent LFS migrations; provide a workaround

2015-04-25 Thread Rich Felker
On Sat, Apr 25, 2015 at 04:53:33AM -0400, Matt Whitlock wrote:
 This solves some of the problems arising from Bionic's off_t being
 32 bits wide despite _FILE_OFFSET_BITS==64. See BusyBox bug #6908.
 
 Note that this doesn't solve all such problems since Bionic lacks
 64-bit variants of many standard I/O functions: open, creat, lockf,
 posix_fadvise, posix_fallocate, truncate, sendfile, getrlimit,
 setrlimit, fopen, freopen, fseeko, ftello, fgetpos, fsetpos,
 mkstemp.

As long as that's the case, I think this approach of trying to fake
64-bit off_t on Bionic is dangerous and probably best omitted. IMO we
should either be pushing for Bionic to fix these things or sticking
with 32-bit off_t on Bionic and treating that as a limitation of the
platform.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 1/2] Bionic lacks ttyname_r; provide a workaround

2015-04-24 Thread Rich Felker
On Fri, Apr 24, 2015 at 05:22:57PM -0400, Matt Whitlock wrote:
 I think providing an alternative implementation of ttyname_r() in
 missing_syscalls.c won't work. Busybox cannot be statically linked
 when ttyname_r is defined in missing_syscalls.o:
 
 ld: error: 
 /usr/local/arm-linux-androideabi/bin/../sysroot/usr/lib/libc.a(stubs.o): 
 multiple definition of 'ttyname_r'
 ld: libbb/lib.a(missing_syscalls.o): previous definition here

This is only happening because another function from stubs.o is being
pulled in. Passing -Wl,-t might be able to show the reason. But it's
probably best to #define ttyname_r to something else when providing a
fallback to avoid the problem.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] libbb: remove unnecessary argument to nonblock_immune_read

2015-04-19 Thread Rich Felker
On Sun, Apr 19, 2015 at 10:50:25AM +0100, Ron Yorston wrote:
 The loop_on_EINTR argument to nonblock_immune_read is always set to 1.
 
 function old new   delta
 xmalloc_reads200 195  -5
 pgetc488 483  -5
 argstr  13131308  -5
 nonblock_immune_read 123  86 -37
 --
 (add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-52) Total: -52 bytes

An even better question is whether the looping code could be
eliminated entirely. Are any of the bb commands that use this code
installing interurpting signal handlers? If not then the code for
supporting EINTR is useless.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [musl] Re: Busybox on musl is affected by CVE-2015-1817

2015-04-02 Thread Rich Felker
On Wed, Apr 01, 2015 at 10:49:40AM +0200, u-w...@aetey.se wrote:
 Hi Harald,
 
 On Wed, Apr 01, 2015 at 10:11:51AM +0200, Harald Becker wrote:
  There is a big difference if you talk about suid
  *root* programs or other suid usage.
  
  The former is definitely very dangerous and should be used with extreme care
  (I think this is the case we are talking about), the later use may even be
  used to drop privileges (not to raise), or to temporarily hop to the
 
 You are right about a remarkable difference between suid root and suid
 somethingelse, just because root has the very special role in Unix.

The difference is there but it's not as remarkable as you think. As
soon as you can get control of uid==suid_target_uid, you can do nasty
things as that user like modifying the binary that's suid to that
user, so that the next user who invokes it invokes malicious code.
suid is pretty much across-the-board evil.

 Besides root vs non-root I doubt there is any clear line between raising
 and dropping privileges, you replace one set of allowed operations by
 a different one, that's it. Note that root does not need suid to drop
 its privileges.

Dropping privileges in the sense of suid just means setting the
effective and saved uids for the process to the real (invoking,
returned by getuid()) uid.

  privileges of a different user (may be allowing access to some files only by
  using specific commands).
 
 As soon as a program is setuid it has to be written for the purpose
 of very reliably checking and limiting what it does on behalf of who,
 independently of how it can potentially be invoked out of context.
 This is known to be hard, I believe it is harder to do reliably than
 e.g. issue a request to a daemon - talking about the complexity level.

Yes. Said differently, the difference is a huge and perpetually
expanding attack surface (all process state inheritable across exec)
versus a tiny attack surface (one unix socket).

Please don't misinterpret this -- I'm not saying that busybox should
be implementing a ping daemon. Rather we should be using the
functionality the kernel gave us for avoiding suid (SOCK_DGRAM ICMP)
when it's available and making it possible for users to get a fully
functional busybox without the need for suid.

  When used with care and as intended, suid and sgid is a nice feature, but
 
 Unfortunately I can not really appreciate its beauty which appears to hide
 the complexity and/or move it to other parties (like the dynamic linker
 or the software maintenance infrastructure). Yes it looks simple and
 efficient but is it, really?

It's not.

  nowadays there are too many Unix novices, who misunderstand or misuse this,
  punching big holes in every security concern.
 
 Unfortunately even seasoned gurus easily create / fail to notice holes!
 :(

I'd much rather eliminate the opportunity for the hole from the start.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Busybox on musl is affected by CVE-2015-1817

2015-03-29 Thread Rich Felker
For details on CVE-2015-1817, see:
http://www.openwall.com/lists/musl/2015/03/30/1

With musl-linked Busybox installed setuid and ping enabled, exploiting
this issue is trivial.

While CVE-2015-1817 is certainly musl's fault, there are two changes
to Busybox I'd like to propose that would have prevented it from being
exploitable:

1. Having setuid utilities like ping obtain the resource they need (in
   the case of ping, SOCK_RAW) without processing user input at all,
   then fully dropping root (setuid(getuid())) before doing anything.
   This has been standard practice for setuid programs since the 90s
   and it feels bad that busybox is not doing it.

2. Reconsider the rejection of the patch to add SOCK_DGRAM support for
   ping, which allows it to run without root.

Do either or both of these sound acceptable?

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [RFC] Proof-of-concept for netlink listener for mdev -i

2015-03-14 Thread Rich Felker
On Thu, Mar 12, 2015 at 05:51:00PM +0100, Laurent Bercot wrote:
 On 12/03/2015 16:19, Natanael Copa wrote:
 netlink listener code that needs to be in memory all the time:
 http://git.alpinelinux.org/cgit/ncopa/nldev/tree/nldev.c
 
  A few comments:
 [...]
  - It may be worth it to write functions similar to die(), edie()
 and dbg() that don't use stdio, and call _exit() instead of exit().
 If you manage to scrap stdio, you gain 2 pages of text (with the
 musl implementation).

Could you elaborate on how you measure that? With musl only the parts
of stdio you actually use will be linked, and use of exit does not
result in linking of any additional code, since the startup code has
to call exit(main(...)) anyway. In any case exit and its dependencies
are tiny.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [RFC] Proof-of-concept for netlink listener for mdev -i

2015-03-14 Thread Rich Felker
On Sun, Mar 15, 2015 at 01:22:35AM +0100, Laurent Bercot wrote:
 On 14/03/2015 20:23, Rich Felker wrote:
 Could you elaborate on how you measure that? With musl only the parts
 of stdio you actually use will be linked, and use of exit does not
 result in linking of any additional code, since the startup code has
 to call exit(main(...)) anyway. In any case exit and its dependencies
 are tiny.
 
  The die(), edie() and dbg() functions/macros in the example use printf
 format strings and variable arguments, I assumed they are built around
 a function of the printf family.
 
  gcc 4.8.1, musl from 2014-12-23, Linux 3.10, x86_64.
 
 $ cat nostdio.c
 #include unistd.h
 
 int main (void)
 {
   write(1, Hello World!\n, 13) ;
   return 0 ;
 }
 
 $ gcc -O2 -static -o nostdio nostdio.c  strip nostdio
 $ wc -c nostdio
 2952 nostdio
 
 $ cat stdio.c
 #include stdio.h
 
 int main (void)
 {
   printf(%s!\n, Hello World) ; /* avoids gcc magic */
   return 0 ;
 }
 
 $ gcc -O2 -static -o stdio stdio.c  strip stdio
 $ wc -c stdio
 15224 stdio
 
  That's almost 3 pages.

Yes. That's printf though, not stdio in general. printf and its
dependencies are a good 8k or so. Anyway if that was your intended
meaning then you're right.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Fwd: [PATCH] ls: no longer assume a 4-digit year on file timestamp.

2015-02-17 Thread Rich Felker
On Tue, Feb 17, 2015 at 02:29:05PM +0100, Laurent Bercot wrote:
 On 17/02/2015 13:52, Explorer wrote:
 My original motivation was to avoid that annoying comment saying that
 it's buggy after year , and I believe it should be written right
 in the first place.
 
  Would you prefer the following comment: It's buggy after year 9 ?
 
  (Not saying that the patch is a bad idea, but if you want to write
 it right, then it's a bit more complex. And totally not worth it.)

Read the new code again. It works fine for that case as far as I can
tell.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] linedit, deluser: use POSIX getpwent instead of getpwent_r

2015-02-07 Thread Rich Felker
On Sat, Feb 07, 2015 at 09:49:19AM -0800, Isaac Dunham wrote:
 On Thu, Feb 05, 2015 at 03:52:24PM -0500, Rich Felker wrote:
  On Thu, Feb 05, 2015 at 09:42:08PM +0100, Denys Vlasenko wrote:
   struct passwd *getpwent()
   {
   static char *line;
   static struct passwd pw;
   size_t size=0;
   if (!f) f = fopen(/etc/passwd, rbe);
   if (!f) return 0;
   return __getpwent_a(f, pw, line, size);
   }
   
   I would prefer that even struct passwd is malloced...
  
  I don't think it would make much practical difference. It could be
  changed though.
  
   But more importantly, bbox can't optimize only for musl.
   Other libc'es  may have static line buffers there.
   
   And musl will eventually be forced to implement getpwent_r()
   if it wants to be usable for more packages... so...
  
  getpwent_r makes no sense; the _r functions are for thread-safe
  versions of their corresponding legacy functions, but getpwent_r has
  inherent global state -- the iterator. Whoever made it just wasn't
  thinking. To make a correct interface like this the caller would need
  to have an iterator object to pass to the function, but I can't see
  much merit in inventing a new interface for this.
 
 Besides having hidden global state, the man page notes:
  Other systems  use  the prototype
   struct passwd *getpwent_r(struct passwd *pwd, char *buf, int buflen);
  or, better,
   int getpwent_r(struct passwd *pwd, char *buf, int buflen, FILE **pw_fp);
 
 In other words, according to the manpage, getpwent_r() is decidedly
 unportable.
 
 Per my investigations, Dragonfly/Net/FreeBSD seem to use the same
 prototype as glibc; apparently Solaris uses the first alternate prototype;
 and the last mentioned seems to be a reference to Tru64, which
 uses pw_fp to keep track of its position instead of an iterator.
 
 OpenBSD and MirBSD do not implement getpwent_r, as far as I can tell.

It should be noted here that multiple conflicting historical
definitions of a nonstandard interface are one of the big exclusion
criteria musl goes by.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] linedit, deluser: use POSIX getpwent instead of getpwent_r

2015-02-05 Thread Rich Felker
On Thu, Feb 05, 2015 at 09:42:08PM +0100, Denys Vlasenko wrote:
 struct passwd *getpwent()
 {
 static char *line;
 static struct passwd pw;
 size_t size=0;
 if (!f) f = fopen(/etc/passwd, rbe);
 if (!f) return 0;
 return __getpwent_a(f, pw, line, size);
 }
 
 I would prefer that even struct passwd is malloced...

I don't think it would make much practical difference. It could be
changed though.

 But more importantly, bbox can't optimize only for musl.
 Other libc'es  may have static line buffers there.
 
 And musl will eventually be forced to implement getpwent_r()
 if it wants to be usable for more packages... so...

getpwent_r makes no sense; the _r functions are for thread-safe
versions of their corresponding legacy functions, but getpwent_r has
inherent global state -- the iterator. Whoever made it just wasn't
thinking. To make a correct interface like this the caller would need
to have an iterator object to pass to the function, but I can't see
much merit in inventing a new interface for this.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] linedit, deluser: use POSIX getpwent instead of getpwent_r

2015-02-05 Thread Rich Felker
On Thu, Feb 05, 2015 at 02:09:36PM +0100, Denys Vlasenko wrote:
 On Fri, Apr 25, 2014 at 9:36 AM, Natanael Copa nc...@alpinelinux.org wrote:
  Prefer use POSIX getpwent over glibc extension getpwent_r. This fixes
  building with musl libc with CONFIG_USE_BB_PWD_GRP disabled.
 
 Sorry, I missed this patch.
 
 How much static data (data or bss increase according to size busybox)
 do these functions pull in in static build?
 
 The comments specifically say they use getpwent_r because that avoids
 static data size penalty, usually about 1k.

The static size penalty on musl is a few bytes. We don't have full
static buffers for the strings but just use the buffer getline()
produces while reading the file, so only this pointer and the passwd
struct itself, not the strings it points to, have static storage.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


[PATCH] fix gzip applet not to store timestamps

2015-02-01 Thread Rich Felker
Storing the original file's modification time in the output file is
harmful (precludes deterministic results) and unlike official gzip,
the busybox version provides no way to suppress this behavior; the -n
option is silently ignored. Rather than trying to make -n work, this
patch just removes the timestamp-storing functionality. It should be
considered deprecated anyway; it's not Y2038-safe and gunzip ignores
it by default.

Per RFC 1952, 0 is the correct value to store to indicate that there
is no timestamp.

Rich
diff --git a/archival/gzip.c b/archival/gzip.c
index 1e779c9..903577b 100644
--- a/archival/gzip.c
+++ b/archival/gzip.c
@@ -2007,7 +2007,7 @@ static void ct_init(void)
  * IN assertions: the input and output buffers are cleared.
  */
 
-static void zip(ulg time_stamp)
+static void zip()
 {
ush deflate_flags = 0;  /* pkzip -es, -en or -ex equivalent */
 
@@ -2018,7 +2018,7 @@ static void zip(ulg time_stamp)
/* compression method: 8 (DEFLATED) */
/* general flags: 0 */
put_32bit(0x00088b1f);
-   put_32bit(time_stamp);
+   put_32bit(0);
 
/* Write deflated file to zip file */
G1.crc = ~0;
@@ -2044,8 +2044,6 @@ static void zip(ulg time_stamp)
 static
 IF_DESKTOP(long long) int FAST_FUNC pack_gzip(transformer_aux_data_t *aux 
UNUSED_PARAM)
 {
-   struct stat s;
-
/* Clear input and output buffers */
G1.outcnt = 0;
 #ifdef DEBUG
@@ -2077,9 +2075,7 @@ IF_DESKTOP(long long) int FAST_FUNC 
pack_gzip(transformer_aux_data_t *aux UNUSED
G2.bl_desc.max_length  = MAX_BL_BITS;
//G2.bl_desc.max_code= 0;
 
-   s.st_ctime = 0;
-   fstat(STDIN_FILENO, s);
-   zip(s.st_ctime);
+   zip();
return 0;
 }
 
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: [PATCH] udhcpd: Handle auto_time timeout overflow

2015-01-27 Thread Rich Felker
On Wed, Jan 21, 2015 at 11:00:03PM +0100, Tim Hentenaar wrote:
 Howdy y'all,
 
 I've noticed an interesting issue with udhcpd and auto_time.
 
 Some paths within the while loop don't go through continue_with_autotime.
 Thus, if it takes a bit too long to reset timeout_end, the monotonic
 timer may be far enough along that the subtraction which sets tv.tv_sec
 will overflow, like so:
 
 Jan 21 19:38:13 10.0.0.1 udhcpd[75]: Waking from select()
 Jan 21 19:38:13 10.0.0.1 udhcpd[75]: tv_sec = 10
 Jan 21 19:38:21 10.0.0.1 udhcpd[75]: Waking from select()
 Jan 21 19:38:23 10.0.0.1 udhcpd[75]: Sending OFFER of 10.0.0.2
 Jan 21 19:38:23 10.0.0.1 udhcpd[75]: tv_sec = -1
 Jan 21 19:38:23 10.0.0.1 udhcpd[75]: Waking from select()
 Jan 21 19:38:23 10.0.0.1 udhcpd[75]: Sending ACK to 10.0.0.2
 Jan 21 19:38:23 10.0.0.1 udhcpd[75]: tv_sec = -1
 Jan 21 19:38:43 10.0.0.2 udhcpc[47]: Sending renew...
 Jan 21 19:38:43 10.0.0.2 udhcpc[47]: Lease of 10.0.0.2 obtained, lease time 30
 Jan 21 19:38:43 10.0.0.1 udhcpd[75]: Waking from select()
 Jan 21 19:38:43 10.0.0.1 udhcpd[75]: Sending ACK to 10.0.0.2
 Jan 21 19:38:43 10.0.0.1 udhcpd[75]: tv_sec = -21
 Jan 21 19:39:03 10.0.0.2 udhcpc[47]: Sending renew...
 Jan 21 19:39:03 10.0.0.1 udhcpd[75]: Waking from select()
 Jan 21 19:39:03 10.0.0.1 udhcpd[75]: Sending ACK to 10.0.0.2
 Jan 21 19:39:03 10.0.0.2 udhcpc[47]: Lease of 10.0.0.2 obtained, lease time 30
 Jan 21 19:39:03 10.0.0.1 udhcpd[75]: tv_sec = -41
 
 This patch adds a quick and easy check for it, resetting tv_sec to 0,
 which should fall through to write_leases() and continue_with_autotime,
 resetting timeout_end again.
 
 Tim
 
 ---
  networking/udhcp/dhcpd.c | 3 +++
  1 file changed, 3 insertions(+)
 
 diff --git a/networking/udhcp/dhcpd.c b/networking/udhcp/dhcpd.c
 index 4b3ed24..d56763f 100644
 --- a/networking/udhcp/dhcpd.c
 +++ b/networking/udhcp/dhcpd.c
 @@ -415,6 +415,9 @@ int udhcpd_main(int argc UNUSED_PARAM, char **argv)
   if (server_config.auto_time) {
   tv.tv_sec = timeout_end - monotonic_sec();
   tv.tv_usec = 0;
 +
 + if ((unsigned)tv.tv_sec  server_config.auto_time)
 + tv.tv_sec = 0;

I don't think this is a valid fix. If overflow occurs, there has
already been an invocation of undefined behavior (assuming it's
actually an overflow and not an implicit conversion into a signed type
from legal unsigned arithmetic, but in that case the result would
still be nonsense and might not look negative!). The check belongs
before the arithmetic that would invoke UB (or just give a
non-meaningful result) rather than after the computation.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] udhcpd: Handle auto_time timeout overflow

2015-01-27 Thread Rich Felker
On Tue, Jan 27, 2015 at 04:02:16PM +0100, Tim Hentenaar wrote:
 On Tue, Jan 27, 2015 at 08:41:30AM -0500, Rich Felker wrote:
  On Wed, Jan 21, 2015 at 11:00:03PM +0100, Tim Hentenaar wrote:
   ---
networking/udhcp/dhcpd.c | 3 +++
1 file changed, 3 insertions(+)
   
   diff --git a/networking/udhcp/dhcpd.c b/networking/udhcp/dhcpd.c
   index 4b3ed24..d56763f 100644
   --- a/networking/udhcp/dhcpd.c
   +++ b/networking/udhcp/dhcpd.c
   @@ -415,6 +415,9 @@ int udhcpd_main(int argc UNUSED_PARAM, char **argv)
 if (server_config.auto_time) {
 tv.tv_sec = timeout_end - monotonic_sec();
 tv.tv_usec = 0;
   +
   + if ((unsigned)tv.tv_sec  server_config.auto_time)
   + tv.tv_sec = 0;
  
  I don't think this is a valid fix. If overflow occurs, there has
  already been an invocation of undefined behavior (assuming it's
  actually an overflow and not an implicit conversion into a signed type
  from legal unsigned arithmetic, but in that case the result would
  still be nonsense and might not look negative!). The check belongs
  before the arithmetic that would invoke UB (or just give a
  non-meaningful result) rather than after the computation.
  
 
 Apologies for the double-response, but I revised the
 patch to use a stack variable for safety, and to save 5 bytes.
 
 To be honest, I think this particular bit of code could benefit from
 some refactoring...
 
 Better?
 
 ---
  networking/udhcp/dhcpd.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)
 
 diff --git a/networking/udhcp/dhcpd.c b/networking/udhcp/dhcpd.c
 index 4b3ed24..75ddc12 100644
 --- a/networking/udhcp/dhcpd.c
 +++ b/networking/udhcp/dhcpd.c
 @@ -399,6 +399,7 @@ int udhcpd_main(int argc UNUSED_PARAM, char **argv)
   fd_set rfds;
   struct dhcp_packet packet;
   int bytes;
 + unsigned int msec;
   struct timeval tv;
   uint8_t *server_id_opt;
   uint8_t *requested_ip_opt;
 @@ -413,8 +414,12 @@ int udhcpd_main(int argc UNUSED_PARAM, char **argv)
  
   max_sock = udhcp_sp_fd_set(rfds, server_socket);
   if (server_config.auto_time) {
 - tv.tv_sec = timeout_end - monotonic_sec();
   tv.tv_usec = 0;
 + tv.tv_sec  = 0;
 +
 + msec = monotonic_sec();
 + if (msec  timeout_end)
 + tv.tv_sec = timeout_end - msec;

Why are we using a 32-bit type for seconds in 2015? msec should be
time_t, and monotonic_sec should be returning time_t, or they could
just use int64_t or something instead (but I feel like there's not
much point in supporting 64-bit time if the underlying system's time_t
isn't 64-bit anyway, and it's more costly, so time_t is probably
best).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: ntpdate-like functionality in ntpd

2015-01-27 Thread Rich Felker
On Tue, Jan 27, 2015 at 10:26:47PM +0100, Denys Vlasenko wrote:
 On Tue, Jan 27, 2015 at 7:27 PM, Guillermo Rodriguez Garcia
 guille.rodrig...@gmail.com wrote:
  Hello all,
 
  Is there a way to make ntpd work just like ntpdate (just use the first
  response received to set the time)? This is to set the approximate
  time at boot as quickly as possible before starting other time sensitive
  services.
 
  The closest I can get is ntpd -nqp server but this seems to need
  5 valid samples in order to set the time.
 
 Would it work for you if you simply background it
 and let it do its work in parallel with the rest of the boot?

I doubt having the boot process continue with the wrong time would be
acceptable. File timestamps will be wrong until the background process
completes. You really want to wait synchronously for an answer.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] udhcpd: Handle auto_time timeout overflow

2015-01-27 Thread Rich Felker
On Tue, Jan 27, 2015 at 10:28:26PM +0100, Tim Hentenaar wrote:
 On Tue, Jan 27, 2015 at 09:51:29PM +0100, Denys Vlasenko wrote:
  Hmm, I think it's a sign-extension bug. Can you try replacing
  
  tv.tv_sec = timeout_end - monotonic_sec();
  
  with
  
  tv.tv_sec = (int)(timeout_end - monotonic_sec());
  
  I suspect this will fix the behavior.
 
 When I make that change, I get:
 
 movq$0, -872(%rbp)  #, tv.tv_usec
 subl%eax, %ecx  # D.8486, D.8486
 testl   %r14d, %r14d#
 movslq  %ecx, %rax  # D.8486,
 movq%rax, -880(%rbp)# D.8494, tv.tv_sec
 je  .L101   #,
 testq   %rax, %rax  # D.8494
 jle .L192   #,
 
 Hmm... Looking at the assembly before the change, it's moving eax - edx
 instead of sign-extending, while here (with the explicit cast) it
 sign-extends the result. It then generates the proper jump instruction
 to boot.
 
 Perhaps it wrongly assumes that since the operands for the subtraction
 are 32-bit unsigned integers, that the result will be also unsigned. Then,
 the sign-extension gets optimized away. Then, making the cast explicit
 forces gcc to sign-extend the result.

This is not a wrong assumption. The result of unsigned subtraction is
unsigned, and it's computed via modular arithmetic.

The problem is that the semantics of the C code as written are wrong,
not anything weird done by the compiler in translating it.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] udhcpd: Handle auto_time timeout overflow

2015-01-27 Thread Rich Felker
On Tue, Jan 27, 2015 at 04:51:33PM -0500, Cathey, Jim wrote:
 At the bottom, some of C's arithmetical rules
 are 'stupid'.  Especially as regards type
 promotion.  At least they're well-defined.

No, unsigned types are just modular arithmetic. There's nothing
'stupid' or unexpected about how they behave unless you're not aware
of that, at least not any more stupid than = and == (and even === in
some langs :) having different meanings or left-shifting cout in
C++...

 Absolutely true in a mathematical sense is that
 the difference between two unsigned numbers is
 SIGNED!

It depends on what number system you're working in. Objects of a fixed
size cannot represent the integers so you have to either pick a number
system that can be represented (C's unsigned types) or work with an
approximation of the integers (C's signed types).

 But that's not what C does.  You can get
 around this, easy enough, but you do have to
 understand exactly what is going on.  It helps
 if you are working on a 2's-complement machine.

Twos complement is irrelevant to the operation of unsigned arithmetic.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: bug-report: ash: fix a memory leak

2014-12-23 Thread Rich Felker
On Wed, Dec 24, 2014 at 10:32:19AM +0800, yhu2 wrote:
 The script which triggers the leak:
 
 while true
   do
 while true
   do
 break;
 done/dev/null
 done
 
 someone had fixed this bug, the commit is:
 http://git.busybox.net/busybox/commit/shell/ash.c?id=4ba6c5d3ba3d2c7922aff6b5c2e73b8325f1cf17
 
 but this commit results in crash running some shell scripts, so it was 
 reverted.
 
 
 I am trying to resolve this defect,any commnent would be appreciated!

There is no easy fix. The patch you sent is incorrect. The problem is
that the stack model used for allocation in ash is semantically
wrong, and needs to be replaced with a dependency-tree model or
something else that can handle freeing one object without freeing all
subsequently-allocated objects.

As a workaround, it might be possible to code a hack to reuse the
existing copy of the string on the allocation stack rather than
allocating a new copy whenever an existing copy already exists. This
would not solve the general case (e.g. when the redirection uses a
different file on each iteration) but it would solve the case above, I
think. It might be worth pursuing this approach if nobody is available
to work on the proper fix in the near future.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: problem of memory leak in re-direction (shell/ash.c: expredir)

2014-12-14 Thread Rich Felker
On Mon, Dec 15, 2014 at 02:55:41PM +0800, shengyong wrote:
 hi, all
 I meet the memory leak problem when use re-direction in a reading-loop, like:
 
   while true
   do
   while true
   do
   break
   done  /dev/null
   done
 
 There is some discussion on this issue:
 * http://lists.busybox.net/pipermail/busybox/2012-December/078738.html
 * https://bugs.busybox.net/show_bug.cgi?id=5822
 
 But it is still not fixed in the later versions?

AFAIK it is not fixed. As I stated in the bug report you linked, a
proper fix would require dropping the (semantically invalid) use of
the current stack-like allocation strategy and replacing it with some
sort of reference-tree structure that allows freeing of allocations
other than the most-recent. The easiest way I can see to do this would
be writing an allocator similar to talloc or adapting talloc itself if
the license and code size are acceptable. But in any case I think it's
a non-trivial task and nobody's volunteered to do it.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] mktemp: don't use mktemp() function

2014-12-11 Thread Rich Felker
On Thu, Dec 11, 2014 at 06:01:57PM +0100, Bartosz Golaszewski wrote:
 The linker emits this warning:
 
   warning: the use of `mktemp' is dangerous, better use `mkstemp' or `mkdtemp'
 
 Fix it by using mkstemp() instead of mktemp().
 
 function old new   delta
 mktemp_main  214 233 +19
 --
 (add/remove: 0/0 grow/shrink: 1/0 up/down: 19/0)   Total: 19 bytes
 
 Signed-off-by: Bartosz Golaszewski bartekg...@gmail.com
 ---
  debianutils/mktemp.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/debianutils/mktemp.c b/debianutils/mktemp.c
 index 983d7a2..5ef557a 100644
 --- a/debianutils/mktemp.c
 +++ b/debianutils/mktemp.c
 @@ -59,6 +59,8 @@ int mktemp_main(int argc UNUSED_PARAM, char **argv)
   const char *path;
   char *chp;
   unsigned opts;
 + int fd;
 +
   enum {
   OPT_d = 1  0,
   OPT_q = 1  1,
 @@ -93,8 +95,9 @@ int mktemp_main(int argc UNUSED_PARAM, char **argv)
   chp = concat_path_file(path, chp);
  
   if (opts  OPT_u) {
 - chp = mktemp(chp);
 - if (chp[0] == '\0')
 + close(fd = mkstemp(chp));
 + unlink(chp);
 + if (fd  0)

How is this an improvement? It increases the code size and performs
unnecessary and potentially harmful filesystem operations. And it's
just covering up the dangerous issue rather than fixing it -- using
mkstemp then deleting the file and reusing the name is even MORE
dangerous than using mktemp, since creating the file even momentarily
exposed its name to an attacker. Of course if the code using the
mktemp utility is written correctly, neither is dangerous anyway.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Writing unicode ncurses applications for busybox

2014-11-18 Thread Rich Felker
On Tue, Nov 18, 2014 at 12:47:36PM +0300, Pugnator wrote:
  Hi all,
 I write ncurses based application which utilizes  russian characters. It 
 looks like this:
 
 wchar_t *unicode_string = LЭто юникод;
 mvwprintw(stdscr,1,5,%ls, unicode_string);    
 
 And it works pretty well on my desktop.
 I enabled all UNCIODE_ config options I found
 
 I enabled terminal: export TERM=linux2.2
 
 I copied terminfo into my busybox system (kernel 3.2)
 When I run my application, only ascii is printed. Unicode chars are absent: 
 no garbage or whatever.
 
 How it can be workarounded if it is possible. Application is dynamically 
 linked and all libraries were transferred too

Busybox has nothing to do with how your own applications behave,
unless they're calling out to the system utilities to do things. I
suspect either your libc (uclibc?) was built without locale/UTF-8
support, or your ncurses was built without wide character support. If
you're using Buildroot or a similar tool, it should offer you the
option to configure these aspects of the build.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: nslookup's Server: 0.0.0.0 but resolves correctly

2014-11-02 Thread Rich Felker
On Sun, Nov 02, 2014 at 09:02:57PM +, Steven Honeyman wrote:
 On 2 November 2014 18:05,  strob...@gmail.com wrote:
  musl:
 
   24
   25 /* unused; purely for broken apps */
   26 typedef struct __res_state {
   27 int retrans;
 
 
  sorry you lost me.
  is this an nslookup.c from your own repo?
  I thought musl only needs header patches with bb.
 
 No - that's in the dns related source code of musl libc (which is also
 broken like you describe android bionic)
 
 I should really say now, I know almost nothing about android/bionic
 which is why I can't offer any direct suggestions for that... but I
 use musl quite a bit so can try to approach that way.
 
 Basically, busybox tries to get dns settings through an (old?) method
 that is not supported in musl (and possibly bionic). The code in musl
 behind the scenes is just a bunch of return 0; to keep broken apps
 happy in thinking the res functions work, but meanwhile it uses its
 own method of parsing resolv.conf - which might be the easiest way I
 can think of to work around this issue... but I'm sure there is a much
 better way that I don't know about.
 
 ^ that's my current understanding at least, but if anyone can explain
 better then I'm sure we'd all benefit!

I'll try to explain what it's like for musl.

The above comment in regards to __res_state is referring to the idea
that this structure was never meant to be a public API, but got abused
as one by applications attempting to override nameservers or settings
or otherwise fiddle with the resolver's internal state.

musl's resolver is completely stateless and parses resolv.conf on each
request. This both ensures that it's always using fresh data (e.g. on
systems where the DNS configuration changes frequently) and avoids all
the issues of having global state that's not thread-safe or
library-safe. (IIRC glibc avoids this by making the resolver state
thread-local, and by providing the new n-prefixed versions of the
res functions which can use a caller-provided __res_state.)

The res_* functions are not stubs/dummies in musl. As of a few
versions ago, they're all intended to work. res_init is a nop simply
because there is no state.

The right way to do lookups with a custom DNS is to call res_mkquery
and then sendto() your preferred DNS.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Considering porting acpi

2014-10-12 Thread Rich Felker
On Sun, Oct 12, 2014 at 03:08:03AM -0500, Rob Landley wrote:
 Heh. The ping.c one is particularly strange because clause 2,
 Redistributions in binary form must reproduce the above notice but
 busybox does not include the word Regents in any text string, so how
 it would emit it at runtime I couldn't tell you. That said it's been
 there like that for _years_ and nobody has ever cared except me.

The subject of the verb produce there is not the program, when
executed but [the] redistribution. This just means you have to have
a text file or paper distributed with the binary containing the
notice. It does not mean that the program itself needs to embed the
notice and be able to display it on an output device; however, doing
so is an easy way to ensure that it stays attached unless somebody
intentionally removes it.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Considering porting acpi

2014-10-10 Thread Rich Felker
On Fri, Oct 10, 2014 at 08:43:15AM -0700, Isaac Dunham wrote:
   The code I have is currently using dirtree_read(), which corresponds to
   recursive_action().
   Basically it looks in /sys/class/*/*/, /sys/class/power_supply,
   and /sys/class/thermal for device status.
   I could make it shorter if I use glob() instead of a recursive callback;
   does anyone know whether glob() is likely to be a bloat problem?
  
  There's a cut-down glob function you could borrow from the linux
  kernel ( lib/glob.c ) if you really wanted it slim.
 
 That would make a smaller static binary where only the acpi applet is
 concerned, but it might well increase the net size; glob uses fnmatch,
 which is use in several parts of busybox.

Yes, I would think replacing glob would be a net increase in size
unless you have an extremely minimal bb config to begin with.

 This reminds me: I keep thinking about writing fngrep, a grep-like
 tool that uses fnmatch instead of regexes. Not widely useful, perhaps
 on occasion.

Since grep already has -E and -F, wouldn't it make more sense just to
add this as a nonstandard option to normal grep? Then you'd get all
the other grep functionality for free.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: netcat nc only ipv4 (ping6 ipv6 already present)

2014-08-30 Thread Rich Felker
On Sun, Aug 31, 2014 at 02:35:33PM +1000, Jaro Gress wrote:
 It would be great  to have netcat nc upgraded to handle ipv6.
 the usual option is nc -6 -l x or nc -6 -l -p x.
 The -p option could be done without. It is kind of non standard.
 ping6 is handling ipv6 already well.
 
 (BeagleBone arm debian is also backward on ipv6 netcat)

Why should an option like -6 be needed at all? If the specified
address is an ipv6 literal, or a hostname that resolves to ipv6, then
it should just work without needing a special option.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-15 Thread Rich Felker
On Thu, Aug 14, 2014 at 11:58:56PM -0700, Isaac Dunham wrote:
 On Thu, Aug 14, 2014 at 04:15:50PM +0100, Laszlo Papp wrote:
  On Thu, Aug 14, 2014 at 4:05 PM, Rich Felker dal...@libc.org wrote:
  
   On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com
wrote:
   
 its the same with bionic libc (arm)

 printf(test) is ok but not printf(buf) with char buf[] = test;
 printf(%s, buf) is ok

   
Yeah, I guess it is about personal preference. I personally do not like
   the
extended code just to make some smart option silent. IMHO, it makes 
the
code needlessly longer by adding another small layer for the content of
   the
variable. If there is no need for formatting, why use printf in the 
first
place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On
   the
other hand, it is too bad that there is nothing between fputs and puts
where you do not need to use the file descriptor and you do not get a 
new
line either.
  
   In principle stdout is more expensive than . stdout is a GOT
   reference in PIC, and  is just PC-relative. For non-PIC it makes no
   difference though.
  
  
  Then my PC is probably lying since the practice does seem to disagree with
  you, apparently as well as this post:
  
  http://stackoverflow.com/questions/17238784/c-puts-without-newline#comment24978835_17238840
  
  (and the one below, ok, it is not critical, but it performs better for
  others, too)
  
  Raw strings are also uglier IMHO than well-defined identifiers, like
  stdout, but that is personal taste, so I do not mind that one.
  
 printf() is almost always going to call vfprintf(stdout, fmt, ap);
 I've checked musl, uclibc, glibc, and netbsd; dietlibc is the only
 exception I've found, and it's ugly and does not have locking/retry/...
 newlib calls _vfprintf_r, which is almost the same concept.
 Meanwhile, printf has to do the formatting.
 
 In other words: 
 Just because printf(%s, string) doesn't name stdout, don't assume that
 it doesn't use a reference to stdout.
 So an ACK for Laszlo Papp's recommendation of fputs().

I was talking about code size overhead, not run time overhead. Of
course there's a reference to stdout in printf. When you reference it
again yourself, that's a second reference. It's not a big deal but
apparently busybox folks care about this kind of micro-optimization...

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-15 Thread Rich Felker
On Fri, Aug 15, 2014 at 12:31:15AM +0200, Harald Becker wrote:
   and how want you behave in case of invalid UTF-8 sequences? My
 functions just skip over stray codes of 0x80..0xBF and synchronize
 on next valid UTF-8 leading byte. How would you count invalid
 sequences?
 
 In general, I would count the whole operation as a failure, returning
 some value such as -1 reserved for failure, since the string is not
 actually UTF-8 and thus how many characters? has no meaning. For
 specific uses, there might be other preferred behaviors. If your goal
 is display, you may want to simply replace illegal sequences with
 U+FFFD in which case you'd count each such sequence as 1, but if
 you're using this character-counting to allocate a buffer for the
 converted string, you need to be sure your conversion function and
 character-counting function agree on how illegal sequences are
 counted, or you might overflow your buffer or end up having to
 truncate the output.
 
 Rich, will you ever use the result of counting the numbers of UTF-8
 characters to allocate a buffer? I don't think so. That would be
 very ill behavior. To allocate buffer space you need the number of
 bytes occupied by a string, not the number of UTF-8 characters.

If your intent is to convert the string to UTF-32/wchar_t/whatever,
then yes, you use the result for allocating a buffer. In my mind
that's the main point of counting characters (since otherwise you
usually care about either bytes, for storage, or columns, for
presentation), and while I personally consider it better to work
character-at-a-time and keep the string as UTF-8, some APIs require a
string in a different format, especially ones that work with a whole
string and prepare it for visual presentation.

The main other place counting characters makes sense is for
implementing languages that do substring operations with character
indexes, which I think is the one you care about.

 So the big question is: Is there anybody who still needs the BB
 internal Unicode handling and can't use the locale functions of a
 libc. Why and for what purpose is this needed? In which environment?

I think the intent was to let uClibc users (and possibly eglibc
users?) omit locale support from the libc, which reduces libc size
quite a bit, and use the UTF-8 code in busybox instead.

 As far as I know, the beginning of those BB internal functions,
 where at times where only glibc had locale support and there where
 no alternatives for small environments. But things changed and there
 are now alternatives. So have we reached a point, where we are able
 to simplify things in BB (which means to focus on correct mb
 function usage everywhere and to strip unnecessary decisions,
 configs and helper code)?

I wouldn't object to this change.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Wed, Aug 13, 2014 at 07:06:38PM +0200, Harald Becker wrote:
 Hi Denys !
 
  The world seems to be standardizing on utf-8.
 Thank God, supporting gazillion of encodings is no fun.
 
 You say this, but libbb/unicode.c contains a unicode_strlen calling
 this complex mb to wc conversion function to count the number of
 characters. Those multi byte functions tend to be highly complex and
 slow (don't know if they have gone better). For just UTF-8, things
 can be optimized.

This depends on your libc. In musl, the only thing slow about them is
having to account for some idiotic special-cases the standard allows
(special meanings for null pointers in each of the arguments) and even
that should not be slow on machines with proper branch prediction.

 e.g.
 
 size_t utf8len( const char* s )
 {
   size_t n = 0;
   while (*s)
 if ((*s++ ^ 0x40)  0xC0)
   n++;
   return n;
 }

This function is only valid if the string is known to be valid UTF-8.
Otherwise it hides errors, which may or may not be problematic
depending on what you're using it for.

 Another fast function I use for UTF-8 ... skip to Nth UTF-8
 character in a string (returns a pointer to trailing \0 if N 
 number of UTF-8 chars in string):
 
 char *utf8skip( char const* s, size_t n )
 {
   for ( ; n  *s; --n )
 while ((*++s ^ 0x40) = 0xC0);
   return (char*)s;
 }

This code is invalid; it's assuming char is unsigned. In practice,
*++s ^ 0x40 is going to be negative on most archs. Better would be
doing an unsigned range check like (unsigned char)*++s-0x800x40U.

Of course it also gets tripped up badly on invalid sequences.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Re: Re: missing format string in applets/usage_pod.c

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 03:28:51PM +0100, Laszlo Papp wrote:
 On Thu, Aug 14, 2014 at 2:44 PM, Tanguy Pruvot tanguy.pru...@gmail.com
 wrote:
 
  its the same with bionic libc (arm)
 
  printf(test) is ok but not printf(buf) with char buf[] = test;
  printf(%s, buf) is ok
 
 
 Yeah, I guess it is about personal preference. I personally do not like the
 extended code just to make some smart option silent. IMHO, it makes the
 code needlessly longer by adding another small layer for the content of the
 variable. If there is no need for formatting, why use printf in the first
 place and not fputs with stdout? IMHO, stdout is nicer than %s :-) On the
 other hand, it is too bad that there is nothing between fputs and puts
 where you do not need to use the file descriptor and you do not get a new
 line either.

In principle stdout is more expensive than . stdout is a GOT
reference in PIC, and  is just PC-relative. For non-PIC it makes no
difference though.

As for the topic at hand, I don't think it's necessary to make this
change if the messages are DOCUMENTED as being format strings that are
required not to contain any format specifiers, and there's a big
warning to this effect in the source wherever they appear. printf
allows format strings chosen (or even constructed) at runtime and the
-W option to make this an error is bogus; at least it cannot be
applied to code that was not written for a specific coding style that
forbids such usage.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 07:16:36PM +0200, Tanguy Pruvot wrote:
 size_t utf8len( const char* s )
 {
size_t n = 0;
while (*s)
  if ((*s++ ^ 0x40)  0xC0)
n++;
return n;
 }
 
 you need to test s != NULL, else *s will crash

Says who? NULL is not a valid pointer. Should you also check for
things like s != (char *)-1 ? What value would you return then,
anyway?

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 07:14:52PM +0200, Harald Becker wrote:
 Hi Rich!
 
  You say this, but libbb/unicode.c contains a unicode_strlen calling
 this complex mb to wc conversion function to count the number of
 characters. Those multi byte functions tend to be highly complex and
 slow (don't know if they have gone better). For just UTF-8, things
 can be optimized.
 
 This depends on your libc.
 
  that is, why I added don't know if gone better ... really
 good when musl is fast on this ... the problem is BB is more likely
 linked with glibc or uClibc ... there the results are not so great
 :(

I think uClibc is pretty fast at this too. It's glibc that's horribly
slow. Rough comparison:

For processing a full string buffer, musl is roughly twice as fast as
uClibc, and uClibc is roughly twice as fast as glibc.

For byte-by-byte processing: musl is roughly 3x as fast as uClibc and
roughly 4x as fast as glibc.

Source: my comparison at http://www.etalabs.net/compare_libcs.html

Presumably you would use a full string operation here (mbstowcs with
null output pointer) for computing length in characters.

 size_t utf8len( const char* s )
 {
size_t n = 0;
while (*s)
  if ((*s++ ^ 0x40)  0xC0)
n++;
return n;
 }
 
 This function is only valid if the string is known to be valid UTF-8.
 
 Yes, I told it's for UTF-8.

Yes, but there's a difference between nominally UTF-8 and
known-valid UTF-8.

 Otherwise it hides errors, which may or may not be problematic
 depending on what you're using it for.
 
 If you know you are using UTF-8 you do not need to check every
 string over and over again, else it's pure paranoia. It is robust,
 as it will not run away on anything which is valid C string.

Well if the string comes from a source outside of your control, you
need to check it at least once. But you might not want to check and
reject it at the original point of input, e.g. if you want to be able
to preserve arbitrary byte sequences that might not be UTF-8, e.g. an
argument that's a filename in an invalid encoding which you're trying
to delete or rename to fix. So IMO it makes a lot more sense to do
your checking at the point of treating the string as a sequence of
characters, even if it happens multiple times. The cost is not high if
your implementation is efficient.

 Of course it also gets tripped up badly on invalid sequences.
 
 How can it get tripped? It silently skip over invalid sequences (of
 0x80 to 0xBF until next leading of a sequence). It shall not get
 stuck in any way. Or tell me exactly how ...

By itself it's not a problem, but the interaction with other code may
be a problem if the other code does not follow exactly the same
conventions.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Possible Unicode Problems in Busybox - Collect and Discussion

2014-08-14 Thread Rich Felker
On Thu, Aug 14, 2014 at 09:09:02PM +0200, Harald Becker wrote:
 Hi Rich!
 
  I think uClibc is pretty fast at this too. It's glibc that's horribly
 slow. Rough comparison:
 
 Pretty fast is still slower than UTF-8 optimized functions.

The standard functions certainly can be UTF-8-optimized, and they are
in at least several implementations. I think glibc still has a pretty
slow path to get to the UTF-8 decoding but hopefully that will be
fixed eventually. I'll remind myself to pursue that in the future.

 For processing a full string buffer, musl is roughly twice as fast as
 uClibc, and uClibc is roughly twice as fast as glibc.
 
 You don't need to make ads for musl here, I' would like to see
 prebuild versions of BB statically linked with musl.

It's not an ad. It's just pointing out that uClibc is probably not
significantly slower for what you care about. My interpretation was
that you trusted me that musl is fast here, but thought other more
commonly used implementations might be slow, so I stated the relative
speeds as a basis for evaluating that.

 Presumably you would use a full string operation here (mbstowcs with
 null output pointer) for computing length in characters.
 
 Do you remember my question only UTF-8 or full multi byte locale? It
 is exactly this decision. The former may be optimized the later more
 accurate.

Yes I remember the question. Assuming the standard function has a fast
path for UTF-8, which it should, the only reason to expect the
standard multibyte functions to be significantly slower than your
custom ones is that they detect illegal sequences rather than blindly
assuming the input is valid.

 If you know you are using UTF-8 you do not need to check every
 string over and over again, else it's pure paranoia. It is robust,
 as it will not run away on anything which is valid C string.
 
 Well if the string comes from a source outside of your control, you
 need to check it at least once. But you might not want to check and
 reject it at the original point of input, e.g. if you want to be able
 to preserve arbitrary byte sequences that might not be UTF-8, e.g. an
 argument that's a filename in an invalid encoding which you're trying
 to delete or rename to fix. So IMO it makes a lot more sense to do
 your checking at the point of treating the string as a sequence of
 characters, even if it happens multiple times. The cost is not high if
 your implementation is efficient.
 
  and how want you behave in case of invalid UTF-8 sequences? My
 functions just skip over stray codes of 0x80..0xBF and synchronize
 on next valid UTF-8 leading byte. How would you count invalid
 sequences?

In general, I would count the whole operation as a failure, returning
some value such as -1 reserved for failure, since the string is not
actually UTF-8 and thus how many characters? has no meaning. For
specific uses, there might be other preferred behaviors. If your goal
is display, you may want to simply replace illegal sequences with
U+FFFD in which case you'd count each such sequence as 1, but if
you're using this character-counting to allocate a buffer for the
converted string, you need to be sure your conversion function and
character-counting function agree on how illegal sequences are
counted, or you might overflow your buffer or end up having to
truncate the output.

 Of course it also gets tripped up badly on invalid sequences.
 
 How can it get tripped? It silently skip over invalid sequences (of
 0x80 to 0xBF until next leading of a sequence). It shall not get
 stuck in any way. Or tell me exactly how ...
 
 By itself it's not a problem, but the interaction with other code may
 be a problem if the other code does not follow exactly the same
 conventions.
 
 Sure, you can't mix multi byte functions with pure UTF-8 functions,
 you always need to look what type of function you call in your code.
 So what's different here.

Interaction with other code was not about mixing your own pure UTF-8
functions with the standard C multibyte functions in
possibly-non-UTF-8 locales. It was about mixing them with other code
that's processing UTF-8 but handling errors differently. One such
example would be the standard C multibyte functions when
nl_langinfo(CODESET) has already been determined to be UTF-8 (so you
know they're processing UTF-8), but pure UTF-8 code outside of the
standard functions might also be handling errors differently from what
you're doing, and mixing it with your handling _could_ be dangerous,
depending on what you do.

  and the convention is just UTF-8 (even with invalid sequences)
 not a mixture with other multi byte codes. Not so much requirement
 of a convention?
 
 The functions have bean designed carefully to be not trapped on
 invalid sequences. I know they look extreme simple, but this is part
 of the optimization.
 
  remember: We are not talking about the ability to work with
 other multi byte locales. The assumption was pure ASCII or UTF-8.

I'm fine with assuming 

Re: How do I (unconditionally) enable unicode support in busybox?

2014-08-11 Thread Rich Felker
On Mon, Aug 11, 2014 at 05:15:21PM +0200, Harald Becker wrote:
 IMO there is still something very strange with sed and unicode
 
 YES! I did not stop looking for this. Looks like this is a problem
 in the regular expression parser.
 
 s /./x/g
 
 shall match every character and replace with a single x, but indeed
 it matches every byte of UTF-8 characters too (which is wrong). But
 this doesn't seam to depend on setting of LANG (which confused me).
 Is it possible, it only worked when BB is linked with glibc in a
 fully functional environment. Maybe than an UTF-8 aware regex
 scanner is used. We need to look further on this!

I think this is the result of using uclibc with a broken regex
implementation -- either as a result of a build time option (omitting
locale? omitting full regex?) or just a deficiency in uclibc. Using
glibc or musl would solve it.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: How do I (unconditionally) enable unicode support in busybox?

2014-08-06 Thread Rich Felker
On Wed, Aug 06, 2014 at 02:51:21PM +0200, Harald Becker wrote:
 Hi James!
 
   The problem is mainly when I use a busybox script
 as /init inside an initrd (initramfs).  It runs as process 1.
 The /init script is called directly by the bootloader.  Its
 environment is controlled by the command line parameters which
 are set by the *user*.   Worse, in this situation:
 
  export LANG=...
 
 DOES NOT WORK for changing subsequent unicode behavior.
 
 Why shall it not work in this situation? It works for me.
 
 At the start of the init script I use export LANG=... whatever I
 need. As it is exported it goes to the environment of every process
 started from init script, so UTF-8 support is kept.

Perhaps the issue is that it doesn't work in the initial shell itself
-- for example, glob bracket expressions treat a character as a set of
bytes and match any one of those bytes rather than the character.

 The
 behavior is set and locked in by the value of LANG in the initial
 environment which I don't have control over.
 
 No control? Why?
 
 It is usally not needed, but in case you really want to achieve this:
 
 first init script:
 
 export LANG=...
 exec /bin/sh second_init_script MAY EVEN SET COMMAND LINE ARGUMENTS
 
 second init script runs as process 1 with environment set as you like.

I agree this is a good workaround.

 If exporting worked then the original code I posted would work:
 
  echo -n $x | LANG=utf sed 's/./x/g' | wc -c
 
 Argh ... this kind of exporting makes trouble with ash (and many
 other Unix shells, not all behave correct as expected). Always set
 it at the beginning of your script:

Are you sure? It's well-documented, required by the standard, and has
always worked fine with ash for me. I would consider it the preferred
way of exporting when you don't want to affect subsequent commands or
the state of the current shell, only the current command.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: How do I (unconditionally) enable unicode support in busybox?

2014-08-04 Thread Rich Felker
On Sun, Aug 03, 2014 at 11:50:29PM -0600, James Bowlin wrote:
 I run busybox in an initrd (initramfs) environment using both
 legacy Grub and isolinux as boot loaders.  I want to be able to
 get the correct length of unicode strings in characters, not
 bytes.  I always have these two options set:
 
 CONFIG_UNICODE_SUPPORT=y
 CONFIG_UNICODE_WIDE_WCHARS=y
 
 and I've played with the 3 combinations of:
 
 CONFIG_UNICODE_USING_LOCALE
 CONFIG_FEATURE_CHECK_UNICODE_IN_ENV
 
 both off, and one or the other on.
 
 At best I get very erratic results that depend on the value the
 LANG variable when the first busybox shell (/init script) starts
 and it seems to be immune to later using export LANG=xxx.
 
 One of the things I've tried is:
 
 echo -n $x | LANG=en_US.UTF-8 sed 's/./x/g' | wc -c
 
 This works *sometimes* but seems to depend on the value of LANG
 when the first busybox shell (/init script) is started but it has
 been flakey at best.  Also, I don't have absolute control over
 that initial value of LANG because it is can be set by users with
 a lang=xxx boot parameter.
 
 Unicode strings always print fine.  I'm just struggling with
 getting the length of a string that has unicode characters.
 Using ${#x} has always failed.  So has wc -c which is what led
 me to the sed trick above.
 
 The code above to get the length always seems to work
 consistently in my development environment (using the right
 busybox .config).  Oddly enough it fails in my development
 environment when CONFIG_FEATURE_CHECK_UNICODE_IN_ENV=y which is
 exactly the opposite of what I would expect.

This option is utterly broken and should never be used. It searches
for the string .utf or .UTF in $LANG to determine if UTF-8 should
be enabled. There is no reason that this string needs to appear in the
name of a locale for the locale to be UTF-8-based (plenty of locales
have no legacy encoding).

This option should really be removed from Busybox, or at least get a
big warning slapped on it that it's broken and doesn't do what it
should.

The correct (and only correct) way to determine if UTF-8 should be
used is to call setlocale(LC_CTYPE, ) and then check
nl_langinfo(CODESET).

 it seems to mostly fail when I choot into a plain busybox
 environment or inside my initrd (initramfs) during boot.

If you're having trouble even with CONFIG_FEATURE_CHECK_UNICODE_IN_ENV
turned off and CONFIG_UNICODE_USING_LOCALE turned on, please let me
know which libc you're using -- this could matter too -- and if it's
uClibc, whether you have locale properly enabled for it.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: How do I (unconditionally) enable unicode support in busybox?

2014-08-04 Thread Rich Felker
On Mon, Aug 04, 2014 at 03:00:07PM -0400, Michael Conrad wrote:
 On 8/4/2014 2:48 PM, James Bowlin wrote:
 BTW, the following code is an infinite loop in my
 initrd:
 
 [ $LANG = en_US.UTF-8 ] || LANG=en_US.UTF-8 exec $0 $@
 
 I think you should focus on this bug.  Which busybox version? Which
 libc version?  Because my busybox runs it just fine. (as should any
 conforming shell)

I suspect it's an issue with some option to run busybox commands in
the shell process itself rather than forking and execing. Seeing a
full config file would be useful.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] top: fix parsing of /proc/meminfo

2014-07-22 Thread Rich Felker
On Tue, Jul 22, 2014 at 08:24:30PM +0300, Timo Teras wrote:
 On Tue, 22 Jul 2014 13:07:15 -0400
 Cathey, Jim jcat...@ciena.com wrote:
 
  Am I missing something here?  There is no structure, just a
  character pointer.  If you leave off static it will be compiled as
  an instruction that pushes a constant onto the stack, by any/every
  compiler.
  
  Which is probably _larger_ code than just referring to something
  stored in the .text/.data segment.  It's copying something that is
  found in text/data into the stack space, then referring to that
  address thereafter.  The compiler has to do:
  
  Copy A-B
  Then use B.
  
  -vs-
  
  Use A.
  
  You do this when you want the ability to modify B.
  If A is fixed and inviolate, you mark it static const
  and be done with it.  What this turns into, exactly,
  depends upon your compiler, CPU, and ABI, but I can't
  think of any case where the extra copy is going to
  turn out _better_ than the simpler case.
 
 Usage of 'static' depends whether it's const char * or const
 char[]. When defining const char * static does not make sense, as
 in theory it should enforce *also* the pointer value to go to .rodata
 (the string literal of course goes there too). So you might end up
 wasting one extra pointer in .rodata.
 
 OTOH, with const char[] you definitely want the static as other
 wise you'd end up with the copy described above.
 
 I only suggested removing 'static' since I had it a pointer. The latest
 patch has array, and thus is static.

I agree that using a static const array is the best approach here, and
that with a *single* pointer, making a static object to hold the
pointer rather than just an automatic const-qualified variable is a
pessimization, if anything. However static const pointers are useful
when they're arrays of pointers, or part of structures, etc., but
their usefulness is limited except with static linking and non-PIE
main programs, since they'll need relocations for PIC/PIE which would
throw them out of rodata.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Use the standard libc types in fatattr

2014-07-05 Thread Rich Felker
On Sat, Jul 05, 2014 at 12:20:51PM +0100, Laszlo Papp wrote:
  To be pedantic, uint32_t was introduced in the Open Group Base
  Specifications, Issue 5 (released in 1997, basis for UNIX98).
  At that point it was defined in inttypes.h, which only defined
  (u)int*_t.
 
  As far as when inttypes.h was available, it was first shipped in glibc
  2.1
  (and prereleases thereof), meaning it was first packaged in Debian 2.1
  (slink) on sparc.  Slink shipped with kernel 2.0.36.
  But the names of sized types in the kernel _could_ have been switched
  over when the standard was released, or rather during any of the unstable
  series after that; this is not a compiler change but a header change,
  and the kernel has its own headers. Kernel 2.0 was released in 1996
  (before the standard), but 2.2, 2.4, and 2.6 were released in 1999,
  2001, and 2003.
 
 
 Glibc is not the only thing, in fact it has not been used much in embedded,
 and even today, it is mostly for computing embedded.

From a header standpoint, uClibc effectively is (an old version of)
glibc. From well after the standard types were added. So this is not
an argument against using them.

  The real reason is mentioned in the kernel coding style manual:
  some people don't like the new types.
 
 
 Tes... for good: do not fix what is not broken.

Do not fix what is not broken does not apply here. The kernel
headers have ALWAYS been broken for use in userspace. The UAPI effort
has partly fixed this (and is evidence that they _are_ trying to fix
this, albeit poorly).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Use the standard libc types in fatattr

2014-07-04 Thread Rich Felker
On Fri, Jul 04, 2014 at 04:25:11PM +0200, Denys Vlasenko wrote:
 On Fri, Jul 4, 2014 at 4:06 PM, Laszlo Papp lp...@kde.org wrote:
  On Fri, Jul 4, 2014 at 2:47 PM, Denys Vlasenko vda.li...@googlemail.com
  wrote:
 
  On Fri, Jul 4, 2014 at 3:26 PM, Laszlo Papp lp...@kde.org wrote:
-# define FAT_IOCTL_GET_ATTRIBUTES_IOR('r', 0x10, __u32)
-# define FAT_IOCTL_SET_ATTRIBUTES_IOW('r', 0x11, __u32)
+# define FAT_IOCTL_GET_ATTRIBUTES_IOR('r', 0x10, uint32_t)
+# define FAT_IOCTL_SET_ATTRIBUTES_IOW('r', 0x11, uint32_t)
 #endif
   
 /* Currently supports only the FAT flags, not the NTFS ones.
  
  
   Applied, thanks!
  
   (why kernel doesn't just use std types?...)
  
   What do you mean by std types?
 
  Like uint32_t
 
 
  As indicated before, it was only introduced in C99. The kernel project
  predates that for one.
 
 I understand that.
 It's 2014.
 15 years to convert to (now-)standard type should be doable.

Yes. If the kernel would convert to using the stdint.h types and
provide its own stdint.h for use with -nostdinc when building the
kernel itself, the kernel headers would automatically be compatible
with userspace when used directly by applications like busybox.

On the other hand, if the kernel headers are intended to be used for
implementing some libc headers, they cannot depend on or pull in
stdint.h unless the corresponding libc header is allowed to do so; if
they did so, it would be namespace pollution. I suspect this is one of
the reasons they may be hesitant to use these types. Right now, all of
their wacky types are in the reserved namespace (double-underscore
prefix) and thus they don't have to worry about it.

Of course libc implementations should not be using kernel headers -- I
thought we learned that lesson 20 years ago with libc5 -- but uclibc
does so heavily and glibc even resurrected the practice at some point
for various things (especially socket stuff, and lots more on some
archs).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [Question] Questions about the BusyBox specification.

2014-07-02 Thread Rich Felker
On Tue, Jul 01, 2014 at 08:47:45PM +0200, Harald Becker wrote:
 Hi Rich!
 
  Obviously something like that isn't acceptable for inclusion.
  It was probably just a hacked-up version of upstream iptables.
 
 Just as a question. I did not look into that very deep.
 
 You are talking about iptables. I thought newer kernel have a
 different firewall, with a complete different language/interpreter.
 Is that really intentional to look still at the old iptables?
 Wouldn't it be better to implement applets of the new firewall
 rules, giving also other users a push to use the new firewall
 infrastructure.

I was under the impression that most users/products are still using
the iptables interface, despite it having a new backend that they
could use directly. It wouldn't hurt to have both, but a
command-line-compatible version of iptables is probably more important
from a user perspective.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [Question] Questions about the BusyBox specification.

2014-07-01 Thread Rich Felker
On Mon, Jun 30, 2014 at 07:57:56PM -0700, Isaac Dunham wrote:
 On Fri, Jun 27, 2014 at 09:26:27AM -0400, Rich Felker wrote:
  On Fri, Jun 27, 2014 at 10:06:07AM +0200, Frank Ihle wrote:
 (6) Is there a (stateless/statefull) firewall for BusyBox ?
 think this is not related to busybox. Use iptables?
  
  The lack of an iptables command in Busybox is something that would be
  nice to fix, especially since the official iptables is bloated and
  (last I checked) requires dynamic linking. But this would still not be
  a firewall for Busybox (because Busybox is NOT AN OS), just an
  alternate implementation of the low-level firewall configuration tool.
 
 At one point someone ported iptables to busybox.
 However, that was in the busybox 1.00/1.1 era, so it's probably missing
 several bugfixes and features, as well as having a lot of bitrot;
 it also runs ~14000 lines and requires rpc/rpc.h.
 The copyright notices say (C) 2000-2002.
 (Rob Landley was not interested.)

Obviously something like that isn't acceptable for inclusion. It was
probably just a hacked-up version of upstream iptables.

The right way to add iptables would be to reverse engineer it via
strace (i.e. only look at the syscalls it makes to perform the actual
commands, not the source code, which is likely hideous) and write code
that does the same thing.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [Question] Questions about the BusyBox specification.

2014-06-27 Thread Rich Felker
On Fri, Jun 27, 2014 at 10:06:07AM +0200, Frank Ihle wrote:
   (6) Is there a (stateless/statefull) firewall for BusyBox ?
   think this is not related to busybox. Use iptables?

The lack of an iptables command in Busybox is something that would be
nice to fix, especially since the official iptables is bloated and
(last I checked) requires dynamic linking. But this would still not be
a firewall for Busybox (because Busybox is NOT AN OS), just an
alternate implementation of the low-level firewall configuration tool.

   (7.1) Does BusyBox provide a Network stack ?
  Run: make menuconfig
  and look under Networking utilities
 
 Alright, according to my question, is the answer in this Thread
 (http://stackoverflow.com/questions/19349923/tcp-ip-stack-in-busybox)
 then wrong? There they say there is no TCP/IP stack in BusyBox.

There's not, because Busybox is not an OS/kernel. Busybox is simply an
alternate implementation of some of the essential userspace
utilities for providing things like standard POSIX commands, network
configuration, filesystem creation and repair, (de)compression and
archive utilities, etc.

The TCP/IP stack (a somewhat dated term, BTW) is in the kernel.

   (10) What is the long-term availability of BusyBox? Can it be
   expected to be available for at least 10 years ?
  You download the source, you will have it forever.
 
 Maybe I should have asked differently: will BusyBox be developped
 and supported in future or e.g.: will there be a successor program
 and therefore BusyBox would be left on a final state ?

There's little guarantee of anything like this, but considering how
widely Busybox is used, I would be really surprised if somebody else
didn't pick up development if the current maintainers got tired of it.
Of course there's no guarantee that someone who did so would continue
to develop in a direction that's appealing to your needs, but since
it's free software, you're always free to do your own fork or hire
someone to do it.

There's also Toybox, which I suspect Rob would call a successor to
Busybox.. ;-)

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: ftpd authentication [PATCH] ftpd: NOMMU/chroot fix

2014-06-27 Thread Rich Felker
On Fri, Jun 27, 2014 at 01:55:26PM +0200, Denys Vlasenko wrote:
 On Fri, Jun 27, 2014 at 1:46 PM, Morten Kvistgaard
 m...@pch-engineering.dk wrote:
  A small detail, why do you check if the root_fd is valid? Eg.
 
  ...
  if (G.root_fd = 0) {
  ...
 
  It shouldn't be possible for it to be invalid and the old code didn't 
  validate it either.
 
 I added this in the next patch:
 
  #if !BB_MMU
 -   G.root_fd = xopen(/, O_RDONLY | O_DIRECTORY);
 -   close_on_exec_on(G.root_fd);
 +   G.root_fd = -1;
  #endif
 argv += optind;
 if (argv[0]) {
 +#if !BB_MMU
 +   G.root_fd = xopen(/, O_RDONLY | O_DIRECTORY);
 +   close_on_exec_on(G.root_fd);
 +#endif

It would be nice to have an assume O_CLOEXEC build option and use
O_CLOEXEC wherever close_on_exec_on is currently used. Might save a
little bit of bloat if nothing else. This could probably be achieved
just by adding O_CLOEXEC at the call sites, defining it to 0 if the
system doesn't define it, and defining close_on_exec_on(x) away to
nothing if the build option is selected.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH 0/6] Add a unit-testing framework to Busybox

2014-06-26 Thread Rich Felker
On Thu, Jun 26, 2014 at 12:35:53PM +0200, Denys Vlasenko wrote:
 On Thu, Jun 26, 2014 at 12:00 PM, Bartosz Gołaszewski
 bartekg...@gmail.com wrote:
  2014-06-22 16:31 GMT+02:00 Denys Vlasenko vda.li...@googlemail.com:
  Applied all patches with some editing.
  Thanks a lot!
 
  Hi Denys,
 
  I see, that you removed the INIT_FIRST macro from my implementation.
  Are you sure llist_t *tests will always be initialized before calling
  the test registering functions?
 
 Global data is initialized to zero at program load time.

And even if you need a non-zero initial value, static initialization
is usually suitable; there's no need for code for this kind of thing.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: ftpd authentication [PATCH] ftpd: NOMMU/chroot fix

2014-06-26 Thread Rich Felker
On Thu, Jun 26, 2014 at 04:47:04PM +0200, Denys Vlasenko wrote:
 On Thu, Jun 26, 2014 at 1:37 PM, Morten Kvistgaard
 m...@pch-engineering.dk wrote:
  ...
  execve(proc/self/exe, [ftpd, -l, /], [/* 9 vars */]) = -1
  ENOENT (No such file or directory) ...
 
 This is strange. Any ideas why this fails on your machine?
 
  Yes, the fchdir(G.root_fd) is not enough to break the jail. (And it's not 
  just my machine. It's all of our Ubuntu versions and all of our uClinux 
  versions. Which made me assume that it was a general issue.)
 
  There's a nice quote, I think: Ref: http://m.oschina.net/blog/113399. (One 
  of the first hits on google. There're prolly better sources.)
 
  ===
 
 
  /* Partially break out of the chroot jail by doing an fchdir()
   This only partially breaks out of the chroot() jail since whilst
   our current working directory is outside the chroot jail, our
   root directory is still within it. Thus anything which refers to
/ will refer to files under the chroot point.
   */
  if (fchdir(dir_fd)0) {
  fprintf(stderr, Failed to fchdir - %s\n,
  strerror(errno));
  exit(1);
  }
 
 The point is, we *do not* refer to /.
 We exec proc/self/exe, NOT /proc/self/exe.
 
 It does work on my machine.
 
 How come it doesn't work on your machine?

If proc/self/exe make be relative to the working directory, but
/lib/ld-whatever.so from proc/self/exe's PT_INTERP header is certainly
not relative. This is the cause of the ENOENT.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Add coreutils/unlink.c

2014-06-23 Thread Rich Felker
On Mon, Jun 23, 2014 at 07:44:23PM +0200, Denys Vlasenko wrote:
 On Mon, Jun 23, 2014 at 7:24 PM, Cathey, Jim jcat...@ciena.com wrote:
  The entire point of unlink, the reason it even
  exists, is that it never takes _any_ options.
  Anything you feed it is a filename, and it
  will delete it.
 
 coreutils disagree:
 
 $ unlink --version
 unlink (GNU coreutils) 8.17
 Copyright (C) 2012 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html.
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.
 
 Written by Michael Stone.
 
 $ unlink -qwerty
 unlink: invalid option -- 'q'
 Try 'unlink --help' for more information.

This seems to be a bug in coreutils then.

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/unlink.html

Under OPTIONS it says None. For other utilities that take options,
the text reads something like:

The rm utility shall conform to XBD Utility Syntax Guidelines.

I think the specification is clear in that the coreutils behavior is
not permitted.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] Add coreutils/unlink.c

2014-06-23 Thread Rich Felker
On Mon, Jun 23, 2014 at 07:36:50PM +0100, Laurent Bercot wrote:
 Well, unlink takes '--version' and '--help' as options. I think
 there's a conflict between open standard and coreutils' oddity to
 bring command syntax and version information with command line
 switches.
 
  unlink is not the only nonconforming executable in GNU coreutils.
 true and false, for instance, also accept GNU options. (Because
 being able to perform true --help and true --version is vital.)
 It's just a symptom of GNU's tendency to think itself as better than
 the standard while pretending to conform to it.
 
  Busybox has to make a choice: either be conformant or mimic GNU
 coreutils' behaviour. Both paths are valid, but Busybox has to be
 consistent with it: all utilities have to make the same choice.
 
  I vote for a general compile-time option BB_MIMIC_GNU, if there
 isn't anything similar already.

If this is the path taken, I think the config option should be clearly
name and documented to explain that it is not adding useful features
but rather compatibility with gratuitous GNU brokenness. For example
in the case of unlink, the coreutils brokenness defeats the whole
purpose of having the utility.

Of course having the utility is rather useless anyway since rm --
filename is just as good. I would argue that unlink is useless and
could just be replaced by such a script if needed.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


netstat output issues

2014-06-07 Thread Rich Felker
Two small issues I found in busybox's netstat vs non-busybox versions:

1. The any address 0.0.0.0 or :: is not printed as '*' but actually
   looked up via the resolver. If nothing else this is a waste of
   time; it's also (at least in my opinion) uglier and harder to read.

2. IPv6 literals are printed without brackets around them, resulting
   in confusion between the port number and the ':' characters within
   the address.

Neither is a big deal, but these might be nice to fix at some point if
others agree.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Issues removing files with certain characters in their names.

2014-05-30 Thread Rich Felker
On Thu, May 29, 2014 at 06:41:17PM -0400, Joshua Judson Rosen wrote:
 But why is ls able to match the files when rm is not able to remove them?

I have no idea. Have you tried running them under strace and seeing
where the failure occurs?

 Is it perhaps because ls is not actually doing any operations on the files
 themselves (not even a stat?), and just reporting the dirent-d_name strings
 that it got from readdir()? In which case ls -l * would fail on the same
 files even when ls * doesn't?

I doubt it.

 Or is there something deeper whereby stat() succeeds but unlink() fails?

I'm guessing it's a failure at in the userspace code for rm, not the
syscall. Again strace could help you confirm this and possibly
determine where rm's idea of the filename is getting corrupted.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Proposition: use a hashtable instead of bsearch to locate applets

2014-05-30 Thread Rich Felker
On Fri, May 30, 2014 at 12:13:41PM +0100, Laurent Bercot wrote:
 
 On 05/30/2014 10:16 AM, Bartosz Gołaszewski wrote:
 I've checked the times just by looking up all the applets in a loop
 and measuring the time using gettimeofday() - the results are: ~220
 microseconds for bsearch and ~150 microseconds for hashtable on my
 linux laptop. Is it significant? I think so. Is it noticeable?
 Probably not. The idea came to me, when thinking about unifying the
 hashtables used in busybox. I guess you're right and it isn't really
 worth the size increase.
 
  I think it would definitely be a worthwhile improvement if all busybox
 is doing was looking up the applets in a loop. ;)
  A more realistic test, however, would be to fork+exec a trivial applet
 (true, for instance) in a loop, and measure the times with bsearch vs.
 with the hash table. And I'm certain you'll find that the difference
 becomes quite insignificant.

The lowest time I've ever seen for exec (not even counting fork;
measured from immediately before the exec syscall to entry into main)
is over 200µs, and can easily exceed 1ms with dynamic linking. So even
if the program did nothing but applet lookup and exit, I think we'd be
looking at a performance increase of ~30% at best.

Realistically, as soon as you throw in some actual work and other
syscalls that actually do something, I suspect the difference to be
less than 5%.

If there is a desire to change the way applet lookup is done, I would
suggest trying to make it optimal in both size and performance. 150µs
is not impressive at all. A perfect hash constructed at build time for
the list of busybox applet-name strings should make it possible to do
the applet lookup in ~1µs with trivial code/table size (all constant
in .text).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Issues removing files with certain characters in their names.

2014-05-30 Thread Rich Felker
On Fri, May 30, 2014 at 08:06:24PM +0200, Harald Becker wrote:
 Hi Joshua !
 
 On 29-05-2014 18:41 Joshua Judson Rosen jro...@harvestai.com
 wrote:
 
 But why is ls able to match the files when rm is not able to
 remove them?
 
 The problem happens on the opposite direction you expect. It is
 not the create/open/unlink which modify the file name it is the
 directory scan. ls doesn't really match the filename. It just
 scans the directory, filters with given file name masks, then
 display what it got.

ls never performas any filtering. If ls is given a directory name, it
just reads the entries with readdir(); no 'matching' needs to take
place at all. If it's given a glob pattern, that glob never reaches
ls; it's expanded by the shell. If you quote the glob to actually pass
it to ls, ls will not process it as a glob pattern; it's treated as a
literal filename containing *'s or ?'s. So I don't understand what
you're claiming happens.

 But on some unusual file names the directory
 scan gives names which does not exactly match the name stored on
 file system. They can be displayed, but used for an remove or
 move operation the stat/unlink fails. This usually happens
 when the names contain control or unprintable characters
 (e.g. a byte with all zero) which get removed by the
 kernel/file system driver.

Printability has nothing to do with processing the filename. And a
zero byte fundamentally cannot be in a filename (the filename in the
directory table consists of those characters up to, and not including,
any zero byte stored).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Issues removing files with certain characters in their names.

2014-05-30 Thread Rich Felker
On Fri, May 30, 2014 at 09:18:19PM +0200, Harald Becker wrote:
 Hi Rich !
 
 My statement was imprecise; of course to support users still
 stuck on legacy locales, nl_langinfo(CODESET) should be
 consulted.
 
 How do you determine the correct code set of a foreign file
 system on an external drive? How can you tell if all systems
 which accessed this drive has handled translations in the correct
 way?

All modern filesystems used on external devices (fat32, ntfs, udf,
...) use Unicode-based encodings for filenames, so the foreign
encoding is known and fixed.

   and not only unzip may produce such results. Think of
  using an USB stick at an Windows machine, then carry that over
  to an Linux machine.
 
 The filenames are stored in UCS-2. No problem.
 
 UCS-2 with different code page translations from an 8 bit
 charset. Translations which leave name mapping in inconsistent
 state when further translations occur.

I don't follow what you think the problem is.

 If you mount it incorrectly, then this is user error.
 
 Correct, all those trouble arrives due to anybody having an
 incorrect setup. This will ripple trough and may produce trouble
 on other ends.

All modern Linux-based systems use the utf8 option by default when
mounting filesystems that don't store filenames as pure byte strings
but in a Unicode-based form. You have to be rolling your own or else
actively breaking your system's default setup to get this wrong.

 All programs are not affected. Only programs which read
 filenames as byte strings from foreign sources (such as the
 directory table of a zip file) are affected.
 
  but how do you know the code page the zip archive uses. How
 do you know you need to do translations? I'm unsure if the archiv
 contains this information, so it needs to be provided by a much
 more error prone user.

When encountering such an archive, the unzip utility could simply exit
with an error when there are non-ASCII names unless the user specifies
the encoding. To be less error-prone, it could print the names as
interpreted in several different encodings as part of the error
message, to help the user identify which one is correct. IMO it should
also automatically assume UTF-8 and suppress the error condition if
the names all decode as valid UTF-8, since the probability of
meaningful non-UTF-8 text decoding successful as UTF-8 is negligible.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: busybox nslookup slow on x86_64

2014-05-30 Thread Rich Felker
On Sat, May 31, 2014 at 02:02:35AM +0200, Denys Vlasenko wrote:
 On Thursday 29 May 2014 20:47, Rich Felker wrote:
  On Wed, May 28, 2014 at 04:34:23PM +0200, Denys Vlasenko wrote:
   On Tue, May 27, 2014 at 9:34 AM, muddyboot emu...@gmail.com wrote:
Hi, I found nslookup resolve very slow on x86_64 system , it cost 5 
seconds or longer almost everytime.
   
Tested OS: Debian 7 x86_64 with kernel 3.2.5   LFS x86_64 with kernel 
3.12
   
No IPv6 enabled in kernel config.
DNS server works fine
nslookup program from bind-9.7 works fine
nslookup from busybox test on i686 system OK
   
target busybox version: 1.17.4、1.20.2、1.21.1、1.22.1
   
Any response for this problem is great appreciated.
   
   bbox nslookup uses libc to perform the lookup.
   
   glibc maintainers known to be quite.. er.. stubborn
   about how DNS should work.
   
   For example, they insist that IPv6 DNS requests must be sent
   even if the machine has no IPv6 support in kernel
   (let alone a more typical case where machine
   has no IPv6 connectivity).
   
   Your DNS server does not respond to IPv6 requests,
   but glibc waits for them.
  
  Unless the caller requested AI_ADDRCONFIG or requested AF_INET
  explicitly as opposed to AF_INET6, it's required to do this. And I
  don't think it's a bug. It may be useful to know all DNS results even
  if some of them (v6) won't be used for your current client setup. The
  bug is in whatever broken nameserver is ignoring  requests rather
  than properly looking them up and returning a result.
 
 Well, in many cases users have no power over that nameserver.
 
 Forcing them to suffer instead of giving them ways to disable
  requests is arrogant.

As I said an option in nslookup to do this would be nice. I believe
glibc also has an option in resolv.conf (mentioned earlier in this
thread, IIRC) to disable the  requests globally. But I don't think
it makes sense to complain about glibc doing both lookups by default,
since that is the behavior specified/required by the standard. IMO
this is one of the few cases of glibc doing the right thing (properly
supporting modern usage rather than focusing on bug-compatibility for
broken legacy setups).

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: busybox nslookup slow on x86_64

2014-05-29 Thread Rich Felker
On Wed, May 28, 2014 at 04:34:23PM +0200, Denys Vlasenko wrote:
 On Tue, May 27, 2014 at 9:34 AM, muddyboot emu...@gmail.com wrote:
  Hi, I found nslookup resolve very slow on x86_64 system , it cost 5 seconds 
  or longer almost everytime.
 
  Tested OS: Debian 7 x86_64 with kernel 3.2.5   LFS x86_64 with kernel 3.12
 
  No IPv6 enabled in kernel config.
  DNS server works fine
  nslookup program from bind-9.7 works fine
  nslookup from busybox test on i686 system OK
 
  target busybox version: 1.17.4、1.20.2、1.21.1、1.22.1
 
  Any response for this problem is great appreciated.
 
 bbox nslookup uses libc to perform the lookup.
 
 glibc maintainers known to be quite.. er.. stubborn
 about how DNS should work.
 
 For example, they insist that IPv6 DNS requests must be sent
 even if the machine has no IPv6 support in kernel
 (let alone a more typical case where machine
 has no IPv6 connectivity).
 
 Your DNS server does not respond to IPv6 requests,
 but glibc waits for them.

Unless the caller requested AI_ADDRCONFIG or requested AF_INET
explicitly as opposed to AF_INET6, it's required to do this. And I
don't think it's a bug. It may be useful to know all DNS results even
if some of them (v6) won't be used for your current client setup. The
bug is in whatever broken nameserver is ignoring  requests rather
than properly looking them up and returning a result.

However, it may be nice to have an option for bb nslookup to turn off
v6 lookups if such an option doesn't already exist.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Rich Felker
On Thu, May 29, 2014 at 09:18:26AM +0200, Harald Becker wrote:
 Hi Denys !
 
  For what it's worth the users with this problem were unable to
  remove the files using wildcards. For example, one user had a
  file named:
 
På hjul.mkv
 
  ls P* displayed the file.
  rm P* returned the error can't remove 'På Hjul.mkv': No such
  file or directory
 
 I have hard time believing this.
 Wildcard expansion is done by the shell, not by ls and rm.
 
 IOW: ls and rm see exactly the same expanded names.
 
 Since they don't mangle the names in any way
 (e.g. no UTF-8 decoding) before feeding them to system calls,
 it should work.
 
 I know this problem very well. It happens about every few month,
 that I get a ZIP packaged file from a Windows system. As the
 maintainer is a bit stupid, he can't manage to avoid foreign
 characters and I end up with unusual file names after unzip.

This sounds like a bug in the unzip utility. If it encounters byte
sequences which are not UTF-8, it should convert them from whatever
legacy encoding they're in to UTF-8, possibly issuing an error that
the user needs to specify this encoding if it can't be determined.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox

Re: Issues removing files with certain characters in their names.

2014-05-29 Thread Rich Felker
On Thu, May 29, 2014 at 11:27:07PM +0200, Harald Becker wrote:
 Hi Rich !
 
  I know this problem very well. It happens about every few
  month, that I get a ZIP packaged file from a Windows system.
  As the maintainer is a bit stupid, he can't manage to avoid
  foreign characters and I end up with unusual file names after
  unzip.
 
 This sounds like a bug in the unzip utility. If it encounters
 byte sequences which are not UTF-8, it should convert them from
 whatever legacy encoding they're in to UTF-8, possibly issuing
 an error that the user needs to specify this encoding if it
 can't be determined.
 
 Then you need to consider all programs buggy which don't
 mangle with the file names. There are so many programs which just
 copy filenames through and let the kernel decide what to do. And
 I do not mean BB unzip here, normally I'm using the upstream
 unzip.
 
  and how can you consider all names being UTF-8 ... nowadays
 may be, but what when using 8 bit locales with different
 charsets? UTF-8 mangling would be wrong on those.

My statement was imprecise; of course to support users still stuck on
legacy locales, nl_langinfo(CODESET) should be consulted.

  and not only unzip may produce such results. Think of using
 an USB stick at an Windows machine, then carry that over to an
 Linux machine.

The filenames are stored in UCS-2. No problem.

 Depending on how the file system is mounted you
 may get unusual file names when copying names with foreign
 characters. Now who is bad?

If you mount it incorrectly, then this is user error. Note that
correct versus incorrect does not depend on the contents of the
storage device, only the encoding the local system where you're
mounting it is using.

 Would be nice to have them all fixed ... get them all fixed the
 same way when doing some mapping ... but can that ever reach all
 programs? This is a so long standing problem, nobody really
 cares. 

All programs are not affected. Only programs which read filenames as
byte strings from foreign sources (such as the directory table of a
zip file) are affected.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: busybox nslookup slow on x86_64

2014-05-29 Thread Rich Felker
On Thu, May 29, 2014 at 11:30:58PM +0200, Harald Becker wrote:
 Hi Rich !
 
  bbox nslookup uses libc to perform the lookup.
 
 However, it may be nice to have an option for bb nslookup to
 turn off v6 lookups if such an option doesn't already exist.
 
 The problem has been solved by placing single-request option
 in /etc/resolv.conf. So it was a glibc related problem.

This option is a workaround for buggy nameserver software on some
routers that hangs when you perform multiple requests at the same
time. It's far from being a complete workaround since multiple
processes/threads (or even different machines behind the router) might
make simultaneous requests in a way that hangs the router. The correct
fix is not to use the built-in nameserver on such routers but to
instead either configure a local nameserver on 127.0.0.1 or use a
third-party one (e.g. 8.8.8.8). Or replace the router's firmware with
OpenWRT if possible.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Shell test or quoting mechanism breaks on parens

2014-05-19 Thread Rich Felker
On Mon, May 19, 2014 at 05:02:48PM -0500, dung_ngu...@dell.com wrote:
 Dell - Internal Use - Confidential
 Hello,
 I used to use busybox 1.00 and this command line returns 0 or at least the 
 output is the same as RHEL 5.3 shell:
 
# [ '(' = '(' ] ; rc=$? ; echo $rc ; [ $rc -ne 0 ]  echo EVIL SH 
 TEST/QUOTE
0
 
 However, after I upgraded to busybox 1.20.2 and I see different output:
 
# [ '(' = '(' ] ; rc=$? ; echo $rc ; [ $rc -ne 0 ]  echo EVIL SH 
 TEST/QUOTE
sh: closing paren expected
2
EVIL SH TEST/QUOTE
 
 Does anyone know why the output is different with busybox 1.00 or is this a 
 security issue from cmd line ?

As far as I can tell the issue is in the test command (both standalone
and in the shell), not in the shell itself. The standard test/[
command has very specific rules for how its arguments are interpreted
and Busybox does not seem to be honoring them. See:

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html

In particular, if there are 3 arguments and the second is a binary
primary (which '=' is) then, regardless of the contents of the first
and third, the binary primary is applied to them.

Hopefully someone familiar with the Busybox source can look into this.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: bug in busybox sed with non-ascii chars

2014-05-07 Thread Rich Felker
On Mon, May 05, 2014 at 08:08:32PM +0100, Sam Liddicott wrote:
 One of the advantages of utf-8 encoding was that it was easy to re-sync
 after an invalid sequence.
 
 It's a bit of a waste to then not do that. Minus points for musl.

An application can resync, although the C multibyte interfaces are not
really designed to be used this way (and you have to be careful if the
locale's encoding might be state-dependent, e.g. some legacy CJK
encodings). However the implementation cannot silently resync behind
your back. Doing so introduces serious bugs, some of which may be
security-relevant, since you either silently miss seeing some bytes
from the input when processing input via conversion to wide
characters, or some invalid sequences appear to the application as
valid. Either possibility is dangerous. In particular, it's wrong for
the regex . to match anything that's an illegal sequence, and wrong
for ^.*$ to match a line containing any illegal sequences (since the
. can't match it).

 Can you not run sed with LANG=C or LANG=POSIX?

That's not what they're doing, but it's not a solution anyway. ISO C
leaves the character encoding of the C locale implementation-defined,
and the Rationale text from the 1995 amendments to C explicitly allows
for the possibility that the C locale's character encoding has
multibyte characters (e.g. is UTF-8).

musl presently does not support byte-based characters at all, only
UTF-8. This conforms to the current versions of ISO C and POSIX, but
the Austin Group has adopted a requirement that the C locale be 8
bit clean as a future requirement, which musl will probably support
at some time in the future.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: bug in busybox sed with non-ascii chars

2014-05-04 Thread Rich Felker
On Sun, May 04, 2014 at 04:44:10PM +0200, Denys Vlasenko wrote:
 On Sat, May 3, 2014 at 5:07 PM, Rich Felker dal...@libc.org wrote:
  Lets refuse to find end of line if there is a non UTF-8 sequence inside 
  that line?
  Sounds wrong to me...
 
  sed (also regcomp and regexec) requires text input. Byte streams with
  illegal sequences are not text. Actually since the regex is not trying
  to match the illegal sequence, just the end-of-line, it would
  theoretically be possible to make this work (and it will once we
  overhaul the regex implementation to work with byte-based DFA's rather
  than character-based ones), but that doesn't change the fact that it's
  an invalid test.
 
 Language lawyering is less important that real world usage.

Indeed it's nice to support additional real-world usage when doing so
does not harm any other usage. But we're not talking about real-world
usage here. We're talking about a buggy configure test.

I'd love to improve or even rewrite the regex engine but that's a lot
of work and lower priority than a number of other things on the musl
roadmap.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: bug in busybox sed with non-ascii chars

2014-05-03 Thread Rich Felker
On Sat, May 03, 2014 at 03:17:49PM +0200, Denys Vlasenko wrote:
 On Saturday 03 May 2014 05:10, Rich Felker wrote:
  On Wed, Apr 30, 2014 at 10:31:00AM +0200, Natanael Copa wrote:
   Hi,
   
   I came across a bug (or feature) in busybox sed when trying to build 
   firefox-29.
   
   Testcase based on what firefox's configure scripts does:
   
   ASCII='AA'
   NONASCII=$'\246\246'
   
   echo -e ($ASCII)\n($NONASCII) | busybox sed 's/$/,/'
  
  The above script is invalid; \246\246 is an illegal sequence and thus
  is rejected by regexec. It will work only on non-UTF-8 systems/locales
  (which musl does not support).
 
 Lets refuse to find end of line if there is a non UTF-8 sequence inside that 
 line?
 Sounds wrong to me...

sed (also regcomp and regexec) requires text input. Byte streams with
illegal sequences are not text. Actually since the regex is not trying
to match the illegal sequence, just the end-of-line, it would
theoretically be possible to make this work (and it will once we
overhaul the regex implementation to work with byte-based DFA's rather
than character-based ones), but that doesn't change the fact that it's
an invalid test.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: Changing default shell for Cron

2014-05-02 Thread Rich Felker
On Mon, Apr 28, 2014 at 05:23:07PM +, Rob Anderson wrote:
 Hello,
 
 I have compiled Busybox for Android with a 
 fully working cron. However, when I try to run crond/crontab, I get the 
 following output in the log;
 
 crond: can't execute '/bin/sh' for user root
 
 Unfortunately,
  I have tried all the usual suggestions on the internet e.g. adding PATH
  and SHELL to the cron but to no avail (I can get cron working perfectly
  by symlinking /system/bin/sh to /bin/sh but I do not like this method). 
 Is it possible to hardcode a different default shell path into the 
 Busybox source for cron to execute e.g. the default /system/bin/sh for 
 Android? If so, could you please advise whereabouts in the source this 
 change needs to be made?

This is really just android being gratuitously broken; it would be
really nice to pressure them to fix this instead of adapting
everything to their brokenness...

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: bug in busybox sed with non-ascii chars

2014-05-02 Thread Rich Felker
On Wed, Apr 30, 2014 at 10:31:00AM +0200, Natanael Copa wrote:
 Hi,
 
 I came across a bug (or feature) in busybox sed when trying to build 
 firefox-29.
 
 Testcase based on what firefox's configure scripts does:
 
 ASCII='AA'
 NONASCII=$'\246\246'
 
 echo -e ($ASCII)\n($NONASCII) | busybox sed 's/$/,/'

The above script is invalid; \246\246 is an illegal sequence and thus
is rejected by regexec. It will work only on non-UTF-8 systems/locales
(which musl does not support).

Please file a bug with Firefox.

Rich


P.S. I think you got my response to this on #musl but it's nice to
have the resolution here for the record anyway.
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] memset 0 in obscure is optimized away by compiler

2014-04-16 Thread Rich Felker
On Wed, Apr 16, 2014 at 07:14:05PM +0200, Harald Becker wrote:
 Hi Tito !
 
 I've tried to find out if memset is really optimized away in
 this case with some test code that I've compiled with :
 
 What is wrong with optimization of code, e.g. replacing call to
 memset by a quick loop which does same thing even faster than a
 function call? ... beside that such optimization needs normally a
 bit more of space it is faster and does same thing as original
 function call. Try using size optimization (-Os) to avoid
 replacing function calls by inline replacements.

The quick loop will equally be optimized away, as neither it nor the
memset have any observable effect.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH] memset 0 in obscure is optimized away by compiler

2014-04-16 Thread Rich Felker
On Wed, Apr 16, 2014 at 07:51:42PM +0200, Harald Becker wrote:
 Hi Tito !
 
 my fear is/was that the call to memset is totally
 optimized away when optimization is turned on
 and therefore the memory containing the password
 strings is not zeroed at all.
 
 This would be a completely ill behavior of compiler optimization.

No it's not. It's identical to optimizing the following to an empty
function:

void foo() { int x; x=1; }

This function is equivalent to the empty function because it has no
observable behavior. Someone who expects it to leave a 1 at some
position on the stack that the caller could later observe is smoking
the undefined behavior pipe and needs to forget everything wrong they
learned about the C language and Read The Fine Standard.

 I do only know one case where the memset may be optimized to
 nothing, that is, when you fill content of an auto variable near
 end of a function, and don't further access that memory space.
 That would be same as something like:
 
 int func( int i, j )
 {
   int x;
 
   ... // put any code here using variable x
 
   x = 0;  // this may be optimized away, as x is not accessed
   after setting it to a value
 
   return i + j; 
 }

That's exactly the situation here. The lifetime of the object being
cleared by memset ends sufficiently close to the memset that the
compiler is able to achieve the same observable effects that would be
seen in the abstract machine without actually performing any memory
fill.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: [PATCH v2 1/2] mktemp: avoid removed mktemp()

2014-04-14 Thread Rich Felker
On Mon, Apr 14, 2014 at 02:29:21PM +0200, Denys Vlasenko wrote:
 On Mon, Apr 14, 2014 at 12:20 PM, Bernhard Reutner-Fischer
 rep.dot@gmail.com wrote:
  This is a change in behavior - now we would
  actually create, and then immediately delete the directory
  if run as mktemp -du.
 
  I have mixed feelings about that.
  Do you think it's ok?
 
  I think it's ok, yes. The race-window is about the same as
  before, the creat()/unlink() do make it a bit slower but that
  is IMO ok.
 
 There was no race at all before for mktemp -n case:
 
 if (opts  OPT_u) {
 chp = mktemp(chp);
 if (chp[0] == '\0')
 goto error;
 } else ...
 puts(chp);
 return EXIT_SUCCESS;
 
 No race here.

The 'race' is in the caller, but it's not always a race. If the caller
properly uses O_EXCL or similar and retries on fail, there's no race.
Note that there are legitimate uses of mktemp -u, such as along with
mkfifo or for choosing a name for a socket or shared memory object or
similar.

  An alternative would be to export a uClibc-specific
  __gen_tempname() function
 
 How about exporting a function named mktemp()?
  ;)
 POSIX removing it from its text doesn't mean that
 a conforming libc must *not* provide it.

Agreed. I don't see any value in removing these functions. If support
for libcs that lack them is desired, a proper implementation should be
provided in busybox rather than poor hacks that involve creating and
deleting the temp file, and which could fail due to odd permission
setups, etc.

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


Re: dc hitting a compiler bug, or undefined behavior

2014-04-02 Thread Rich Felker
On Mon, Mar 31, 2014 at 02:17:33PM +0200, Denys Vlasenko wrote:
 On Sun, Mar 30, 2014 at 11:18 AM, Lauri Kasanen cur...@operamail.com wrote:
  Hi,
 
  I'm seeing busybox dc acting funny when compiled with some versions of
  gcc. This is on busybox git. The x86 binary busybox_unstripped and
  config are attached.
 
  gcc 4.2.2 - ok
  gcc 4.7.2:
  nc 10 1 add p
  2.738e+93
 
  So either bb is hitting a compiler bug,
 
 Looks like a compiler bug:
 
 d = strtod(argument, end);
 if (end != argument  *end == '\0') {
 push(d);
 
 is compiled to this:
 
0x08049490 +27:call   0x80488b0 strtod@plt
0x08049495 +32:mov-0x14(%ebp),%eax
0x08049498 +35:pop%ecx
0x08049499 +36:pop%ebx
0x0804949a +37:cmp%esi,%eax
0x0804949c +39:je 0x80494b1 stack_machine+60
0x0804949e +41:cmpb   $0x0,(%eax)
0x080494a1 +44:jne0x80494b5 stack_machine+64
0x080494a3 +46:push   %eax
0x080494a4 +47:push   %eax
0x080494a5 +48:fstpl  (%esp)   HERE
0x080494a8 +51:call   0x8048f10 push
 
 Note how push(double a) argument gets passed on stack.
 But push() starts with this:
 
 Dump of assembler code for function push:
 = 0x08048f10 +0:fldl   (%edi)
 
 which reads argument from some bogus location instead
 of stack:
 
 (gdb) nexti  === EXECUTE fldl
 0x08048f12 in push ()
 (gdb) p $st0
 $1 = 2.0554264135011661055418432229752339e-314
 
 That's not 10!
 10 exists on stack all right:
 
 (gdb) p *(double*)($esp+4)
 $5 = 10
 
 the program just doesn't fetch it correctly.

It really looks to me like some non-default optimization is going on
and misbehaving. I've never seen gcc emit this kind of 'bare' function
without prologue/epilogue (or in this case, the ugly prologue moved
inside the conditional just before the function call). Is the busybox
build system doing anything behind the scenes (or perhaps with #pragma
or hidden attribte tags?) to get gcc to adjust the calling convention
for some functions?

Rich
___
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox


  1   2   3   4   >