from:"Michael Forney"

Re: [hackers] [sbase] libutf: add some const's

2023-04-13 Thread Michael Forney

Applied, thanks!

Re: [hackers] [sbase][PATCH] cp: don't abort when src and dest file are the same

2023-04-13 Thread Michael Forney

Applied, thanks!

Re: [hackers] Re: [sbase][PATCH] dd: fix for ibs * count < obs

2023-04-13 Thread Michael Forney

On 2022-11-29, phoebos  wrote:
> Perhaps I should add some context.
>
> Running:
>
> yes | dd ibs=1 count=1
>
> reports that 512 bytes are read:
>
> 512+0 records in
> 1+0 records out
>
> Only one byte should be read, as occurs with GNU and busybox dd:
>
> 1+0 records in
> 0+1 records out
>
> I believe the reason for this bug in sbase is because of the comparison
> with obs in the below loop, which overlooks cases such as these.

Thanks for reporting this bug! Apologies for the delay.

> All feedback is welcome. Is this an appropriate fix to the bug?
>
> phoebos
>
> On Tue, Nov 22, 2022 at 16:28:35 +, phoebos wrote:
>> Previously, running `dd ibs=1 count=1` read 512 bytes rather than 1.
>> ---
>>  dd.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/dd.c b/dd.c
>> index 6061048..4081eca 100644
>> --- a/dd.c
>> +++ b/dd.c
>> @@ -174,7 +174,7 @@ main(int argc, char *argv[])
>>  /* XXX: handle non-seekable files */
>>  }
>>  while (!eof && (count == -1 || ifull + ipart < count)) {
>> -while (ipos - opos < obs) {
>> +while (ipos - opos < obs && ifull + ipart < count) {

The count == -1 part of the outer condition is still important.
Otherwise, without a count argument, we never read any data.

Rather than duplicate this condition twice, I've pushed a slightly
different fix to just set eof once we've read the specified number of
blocks. I think this should fix the problem.

>>  ret = read(ifd, buf + ipos, ibs);
>>  if (ret == 0) {
>>  eof = 1;
>> --
>> 2.38.1

Re: [hackers] [sbase][PATCH] sort.1: fix typo

2023-04-12 Thread Michael Forney

Applied, thanks!

Re: [hackers] sbase [PATCH] refer to re_format(7) for BSDs

2023-04-12 Thread Michael Forney

Applied, thanks!

Re: [hackers] [sbase][PATCH] Fix 'w' command not respecting '-s' option

2023-04-12 Thread Michael Forney

Applied, thanks!

Re: [hackers] [sbase][PATCH] flock: call exec without forking

2022-03-06 Thread Michael Forney

On 2022-03-06, Arthur Williams  wrote:
> Yeah I noticed that it wasn't defined in POSIX (which makes me wounder
> why it is here in sbase and not ubase).  To clarify what I meant, there
> are popular implementations that don't have the '-F' flag (like busybox)
> and adding it would just encourage people to write code that didn't work
> in many environments. And this may be fine if there was some benefit,
> but I just don't see the any positives of forking.
>
> But if we agree it isn't incorrect to not fork, any objection to the
> patch?

There are a couple of cases I can think of where not forking might not
be what you want:
- If the process forks again and then exits, the new child ends up
holding on to the lock longer than the lifetime of the original
process. There seems to be a -o flag to close the lock before execing
the child to prevent this, but this doesn't work if we exec directly.
- It's debatable whether this is a valid thing to do, but if the
process tries to close all open file descriptors (using something like
closefrom on OpenBSD), it might end up releasing the lock
accidentally.

The first point made me realize that -o with your patch makes flock(1)
a no-op. It would take the lock, then release it, and then exec. So if
we were to apply the patch, I think we'd need to remove the -o option.

Did you check what other implementations of flock(1) do in terms of
forking, and what options they support? It looks like in addition to
busybox, NetBSD and FreeBSD both have it.

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-16 Thread Michael Forney

On 2021-12-16, Laslo Hunhold  wrote:
> I know this thread is already long enough, but I took my time now to
> read deeper into the topic. Please read below, as we might come to a
> conclusion there now.

Thanks for sticking with it. I know this topic is quite pedantic and
hypothetical, but I think it's still important to consider and
understand.

> Interestingly, there was even an internal discussion on the
> gcc-bugtracker[0] about this. They were thinking about adding an
> attribute __attribute__((no_alias)) to the uint8_t typedef so it would
> explicitly lose the aliasing-exception.
>
> There's a nice rant on [1] and a nice discussion on [2] about this
> whole thing. And to be honest, at this point I still wasn't 100%
> satisfied.

Thanks for the links. The aliasing discussion in [0] is very
interesting, and I will definitely bookmark [1] to use as a reference
in the future.

> What convinced me was how they added UTF-8-literals in C11. There you
> can define explicit UTF-8 literals as u8"Hällö Wörld!" and they're of
> type char[]. So even though char * is a bit ambiguous, we document well
> that we expect an UTF-8 string. C11 goes further and accomodates us
> with ways to portably define them.

Interestingly, there is a C23 proposal[0] to introduce char8_t as a
typedef for unsigned char and change the type (!) of UTF-8 string
literals from char * to char8_t * (aka unsigned char *). It has not
been discussed in any meeting yet, but it will be interesting to see
what the committee thinks of it. I don't think u8 string literals are
widely used at this point, but it's weird to see a proposal breaking
backwards compatibility like this.

[0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm

> To also address this point, here's what we can do to make us all happy:
>
>   1) Change the API to accept char*
>   2) Cast the pointers internally to (unsigned char *) for bitwise
>  modifications. We may do that as we may alias with char, unsigned
>  char and signed char.
>   3) Treat it as an invalid code point when any bit higher than the 9th
>  is set. This is actually already in the implementation, as we have
>  strict ranges.
>
> Please take a look at the attached diff and let me know what you think.
> Is this portable and am I correct to assume we might even handle
> chars longer than 8 bit properly?

I agree with all of this. Your patch looks good to me.

> There's just one open question: Do you know of a better way than to do
>
>(char *)(unsigned char[]){ 0xff, 0xef, 0xa0 }
>
> to specify a literal char-array with specific bit-patterns?

I believe "\xff\xef\xa0" also works, but I am not very confident about
this; the wording of the standard is not clear to me.

It says (6.4.4.4p6)

> The hexadecimal digits that follow the backslash and the letter x
> in a hexadecimal escape sequence are taken to be part of the
> construction of a single character for an integer character constant
> or of a single wide character for a wide character constant. The
> numerical value of the hexadecimal integer so formed specifies the
> value of the desired character or wide character.

Okay, so '\xff' constructs a single character with value 255. But, is
'\xff' considered an integer character constant containing a single
character?

Then (6.4.4.4p10):

> An integer character constant has type int. The value of an integer
> character constant containing a single character that maps to a
> single-byte execution character is the numerical value of the
> representation of the mapped character interpreted as an integer.

Does this one apply? Not sure because later sentences mention escape
sequences explicitly, and it's not clear if 255 maps to a single-byte
execution character if CHAR_MAX == 127. Also, I'm not sure how to
parse the last part of the sentence (some grouping parentheses would
be helpful). The representation of 255 is , so what does it
mean to interpret as an integer (of what width)?

> The value of an integer character constant containing more than one
> character (e.g., 'ab'), or containing a character or escape sequence
> that does not map to a single-byte execution character, is
> implementation-defined.

If '\xff' is considered to not map to a single-byte execution
character, then this would indicate that it's implementation-defined.

> If an integer character constant contains
> a single character or escape sequence, its value is the one that
> results when an object with type char whose value is that of the
> single character or escape sequence is converted to type int.

What does it mean for a char to have value of the escape sequence,
since char may not be able to represent 255? Why are there two
sentences that specify the value of an integer character constant
containing a single character? If the first one applies, is this one
ignored?

The main thing that indicates to me that it is defined is example 2 in
that section (6.4.4.4p13):

> Consider implementations that use two's

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-16 Thread Michael Forney

On 2021-12-16, Laslo Hunhold  wrote:
> However, the case
> I'm making is that we can assume that
>
>  1) uint8_t exists
>  2) uint8_t == unsigned char

I think assumption 1 is valid, but not necessarily 2.

> This may not be directly specified in the standard, but follows from
> the following observations:
>
> 1) We make use of POSIX-functions in the code, so compiling
>libgrapheme requires a POSIX-compliant compiler and stdlib. POSIX
>requires CHAR_BIT == 8, which means that we can assume that chars
>are 8 bit, and thus uint8_t exists.
> 2) C99 specifies char to be of at least 8 bit size. Given char is meant
>to be the smallest addressable unit and uint8_t exists, char is
>exactly 8 bits.

Both of these observations are true, but just because uint8_t is 8-bit
and unsigned char is 8-bit doesn't mean that uint8_t == unsigned char.
A C implementation can have implementation-defined extended integer
types, so it is possible that it defines uint8_t as an 8-bit extended
integer type, distinct from unsigned char (similar to how long long
and long may be distinct 64-bit integer types). As far as I know, this
would be still be POSIX compliant.

> However, here you have a problem when suddenly char is 16 bits (might
> be according to the standard). Because then you read in two
> UTF-8-code-units at once, but lg_utf8_decode silently discards half of
> the data in the high bits.
> But this wouldn't even happen, given POSIX mandates char to be 8 bits,
> and given even C99 mandates char to be of integral type, you only have
> one unique way to specify an unsigned integer of certain bit-length,
> given C99 also mandates that char shouldn't have any padding.

Ah, okay, I see what you mean. To be honest I'm not really sure how
something like file encoding and I/O would work on such a system, but
I was assuming that files would contain one code unit per byte, rather
than packing multiple code units into a single byte. For instance, on
a hypothetical system with 9-bit bytes, I wouldn't expect a code unit
to cross the byte boundary.

> So the case can be made that uint8_t == unsigned char, and casting
> between char and unsigned char is fine, so you just cast any char * to
> uint8_t * which will work as you would otherwise not have been able to
> even compile libgrapheme in the first place.
>
> Or am I missing something here except from the standard semantically
> making a difference? Is there any technical possibility to have a
> system that has CHAR_BIT == 8 where uint8_t != unsigned char?

Yes, I believe this is a possibility.

If you are assuming that unsigned char == uint8_t, I think you should
just use unsigned char in your API. You could document the API as
expecting one UTF-8 code unit per byte if you are worried about
confusion regarding CHAR_BIT.

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-15 Thread Michael Forney

On 2021-12-15, Laslo Hunhold  wrote:
> thanks for clearing that up! After more thought I made the decision to
> go with uint8_t, though. I see the point regarding character types, but
> this notion is more of a smelly foot in the C standard. We are moving
> towards UTF-8 as _the_ default encoding format, so considering
> character strings as such is justified.

I think this is a mistake. It makes it very difficult to use the API
correctly if you have data in an array of char or unsigned char, which
is usually the case.

Here's an example of some real code that has a char * buffer:
https://git.sr.ht/~exec64/imv/tree/a83304d4d673aae6efed51da1986bd7315a4d642/item/src/console.c#L54-58

How would you suggest that this code be written for the new API? The
only thing I can think is

if (buffer[position] != 0) {
  size_t bufferlen = strlen(buffer) + 1 - position;
  uint8_t *newbuffer = malloc(bufferlen);
  if (!newbuffer) ...
  memcpy(newbuffer, buffer + position, bufferlen);
  position += grapheme_bytelen(newbuffer);
  free(newbuffer);
}
return position;

This sort of thing would turn me off of using the library entirely.

> Any other way would have introduced too many implicit assumptions.

Like what?

If you really want your code to break when CHAR_BIT != 8, you could
use a static assert (there are also ways to emulate this in C99). But
even if CHAR_BIT > 8, unsigned char is perfectly capable to represent
all the values used in UTF-8 encoding, so I don't see the problem.

> And even if all fails and there simply is no 8-bit-type, one can always
> use the lg_grapheme_isbreak()-function and roll his own de/encoding.

I'm still confused as to what you mean by rolling your own
de/encoding. What would that look like?

If there is no 8-bit type, libgrapheme could not be compiled or used
at all since uint8_t would be missing.

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-12 Thread Michael Forney

On 2021-12-12, Laslo Hunhold  wrote:
> yes, if we were only accessing that would be fine. However, what about
> the other way around? libgrapheme also writes to arrays with
> lg_utf8_encode(), and that's where we can't just write to char.

You can do that with a cast, too:

*(unsigned char *)s = ...;

> But char and unsigned char are of integer type, aren't they?

They are integer types and character types. Character types are a
subset of integer types: char, signed char, and unsigned char.

> So on a
> POSIX-system, which is 99.999% of cases, it makes no difference if we
> cast between (char *) and (unsigned char *) (as you suggested above if
> we went with unsigned char * for the interfaces) and between (char *)
> and (uint_least8_t *), does it? So if the end-user has to cast anyway,
> then he can just cast to an uint* type as well.

The difference is that uint8_t and uint_least8_t are not necessarily
character types. Although the existence of uint8_t implies that
unsigned char has exactly 8 bits, uint8_t could be a separate 8-bit
integer type distinct from the character types. If this were the case,
accessing an array of unsigned char through a pointer to uint8_t would
be undefined behavior (C99 6.5p7).

Here are some examples:

char a[1] = {0};
// always valid, evaluates to 0
*(unsigned char *)a;
// always valid, sets the bits of a[0] to 
// but the value of a[0] depends on the signed-int representation
*(unsigned char *)a = 0xff;
// undefined behavior if uint8_t is not a character type
*(uint8_t *)a;
*(uint8_t *)a = 0xff;

uint8_t b[1] = {0};
// always valid, evaluates to 0
*(unsigned char *)b;
// always valid, sets the bits of a[0] to 
*(unsigned char *)b = 0xff;

> Even more drastically, given UTF-8 is an encoding, I don't really feel
> good about not being strict about the returned arrays in such a way that
> it becomes possible to have an array of e.g. 16-bit integers where only
> the bottom half is used and it become the user's job to then hand-craft
> it into a proper array to send over the network, etc. Surely one can
> hack around this as a library user, but at a certain point I think "to
> hell with it" and just be strict about it in the API. C already has a
> weak type system and I don't want to further weaken it by supporting
> decades-old implicit assumptions on types. So in a way, maybe uint8_t
> is the way to go, and then the library user immediately knows it's not
> going to work with his machine because uint8_t is not defined for him.

Not quite sure what you mean here. Are you talking about the case
where CHAR_BIT is 16? In that case, there'd be no uint8_t, so you
couldn't "hand-craft it into a proper array". I'm not sure how
networking APIs would work on such a system, but maybe they'd consider
only the lowest 8 bits of each byte.

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-12 Thread Michael Forney

On 2021-12-11, Laslo Hunhold  wrote:
> So would you say that the only good way would be to only accept arrays
> of unsigned char in the API? I think this seems to be the logical
> conclusion.

That's one option, but another is to keep using arrays of char, but
cast to unsigned char * before accessing. This is perfectly fine in C
since unsigned char is a character type and you are allowed to access
the representation of any object through a pointer to character type,
regardless of the object's actual type.

Accepting unsigned char * is maybe a bit nicer for libgrapheme's
implementation, but char * is nicer for the users, since that's likely
the type they already have. It also allows them to continue to use
string.h functions such as strlen or strcmp on the same buffer (which
also are defined to interpret characters as unsigned char).

> When I read more I found out that C++ introduced static_cast and
> reinterpret_cast for this simple reason: Assuming some crazy
> signed-int-representation we just make up in our heads (some random
> permutation of 0..255 to -127..128), it is impossible to really know the
> intent of the user passing us a (signed) char-array. Let's say
> "0b01010101" means "0" in our crazy signed type, does the user intend
> to convey to us a null-byte (which is simply "encoded" in the signed
> type), or does he literally mean "0b01010101"? With static_cast and
> reinterpret_cast you can handle both cases separately.

I guess it depends on how that data was obtained in the first place.
Say you have char buf[1024], and read UTF-8 encoded data from a file
into it. fread is defined in terms of fgetc, which "obtains that
character as unsigned char" and stores into an array of unsigned char
overlaying the object. In this case, accessing as unsigned char is the
intention.

I can't really think of a case where the intention would be to
interpret as signed char and convert to unsigned char. With
sign-magnitude, it'd be impossible to encode Ā (UTF-8 0xC4 0x80) this
way, since there is no char value that results in 0x80 when converted
to unsigned char.

I know it's just a thought experiment, but note that there are only
three signed-int representations valid in C: sign-magnitude, one's
complement, and two's complement. They only differ by the meaning of
the sign bit, which is the highest bit of the corresponding unsigned
integer type, so you couldn't go as crazy as the representation you
described.

>  1) Would you also go down the route of just demanding an array of
> unsigned integers of at least 8 bits?

I'd suggest sticking with char *, but unsigned char * seems reasonable as well.

>  2) Would you define it as "unsigned char *" or "uint_least8_t *"?
> I'd almost favor the latter, given the entire library is already
> using the stdint-types.

I don't think uint_least8_t is a good idea, since there is no
guarantee that it is a character type. The API user is unlikely to
have the data in a buffer of this type, so they'd potentially have to
allocate a new one and copy into it. With unsigned char *, they could
just cast if necessary.

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-11 Thread Michael Forney

On 2021-12-11, Michael Forney  wrote:
> Conversion of unsigned char values outside the range of char is
> implementation defined by C99 6.3.1.3p3:
>
>> Otherwise, the new type is signed and the value cannot be represented
>> in it; either the result is implementation-defined or an
>> implementation-defined signal is raised.

Also worth noting, this clause still remains even in the current C23
draft, which requires two's complement. So, assuming that CHAR_MAX ==
127, (char)0xFD will continue to be implementation defined and might
raise a signal. This is different from C++, which went a step further
to define conversion between all integer types to be the unique value
congruent to 2^N (where N is the number of bits of the destination
type).

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-11 Thread Michael Forney

Just want to mention up front that all of below is what I believe to
be true from my interpretation of the standard. I'm happy to be
corrected if I am wrong about any of this.

On 2021-12-11, Laslo Hunhold  wrote:
> Okay, maybe I misunderstood something here, but from what I understand
> casting between signed and unsigned char is well-defined, no matter the
> implementation. However, if you want to work bitwise it's only
> well-defined if you do it on an unsigned type (i.e. unsigned char in
> this case), which is why I cast to unsigned char. Where is the
> undefined behaviour here? Is it undefined behaviour to cast between
> signed and unsigned char when the value is larger than 128?

Neither conversion is undefined behavior, but unsigned char values >
CHAR_MAX converted to char is implementation defined.

Conversion of a negative char value to unsigned char is defined by C99
6.3.1.3p2:

> Otherwise, if the new type is unsigned, the value is converted by
> repeatedly adding or subtracting one more than the maximum value
> that can be represented in the new type until the value is in the
> range of the new type.

Conversion of unsigned char values outside the range of char is
implementation defined by C99 6.3.1.3p3:

> Otherwise, the new type is signed and the value cannot be represented
> in it; either the result is implementation-defined or an
> implementation-defined signal is raised.

>> > -  .arr = (uint8_t[]){ 0xFD },
>> > +  .arr = (char[]){
>> > +  (unsigned char)0xFD,
>> > +  },
>>
>> This cast doesn't do anything here. Both 0xFD and (unsigned char)0xFD
>> have the same value (0xFD), which can't necessarily be represented as
>> char. For example if CHAR_MAX is 127, this conversion is
>> implementation defined and could raise a signal (C99 6.3.1.3p2).
>>
>> I think using hex escapes in a string literal ("\xFD") has the
>> behavior you want here. You could also create an array of unsigned
>> char and cast to char *.
>
> From how I understood the standard it does make a difference. "0xFD" as
> is is an int-literal and it prints a warning stating that this cannot
> be cast to a (signed) char. However, it does not complain with unsigned
> char, so I assumed that the standard somehow safeguards it.

I'm not sure why casting to unsigned char makes the warning go away.
The only difference is the type of the expression (int vs unsigned
char), but the rules in 6.3.1.3 don't care about the source type, only
its value.

I'm not aware of any exception in the standard for unsigned char to
char conversion (but if there is one, I'd be interested to know).

> But when I got it correctly, you are saying that this only works
> because I assume two's complement, right? So what's the portable way to
> work with chars? :)

I guess it depends specifically on what you are trying to do. If you
want a char *, such that when it is cast to unsigned char * and
dereferenced, you get some value 0xAB, you could write "\xAB", or
(char *)(unsigned char[]){0xAB}. There isn't really a nice way to get
a char such that converting to unsigned char results in some value,
since this isn't usually what you want and can't be done in general
(with sign-magnitude, there is no char such that converting to
unsigned char results in 0x80).

Regarding two's complement assumption, consider the UTF-8 encoding of
α: 0xCE 0xB1 or 11001110 10110001. If you interpret that as two's
complement, you get [-50, -79]. Converting to unsigned char will add
256, resulting in [0xCE, 0xB1] like you want. However, with
sign-magnitude you get [-78, -49], converted to unsigned char is
[0xB2, 0xCF] (and something else for one's complement). If you instead
just interpret 11001110 10110001 as unsigned char, you get [0xCE,
0xB1] without depending on the signed integer representation. With
C23, the only possible interpretation of 11001110 10110001 as signed
char is [-50, -79], so it doesn't matter if you go through char or
directly to unsigned char, the result is the same.

Really, I think UTF-8 encoding stored in char * is kind of a lie,
since it doesn't really make sense to talk about negative code units,
but it is useful so that you can still use standard string libc
functions. The string.h functions are even specified to interpret as
unsigned char (C99 7.21.1p3):

> For all functions in this subclause, each character shall be
> interpreted as if it had the type unsigned char (and therefore every
> possible object representation is valid and has a different value).

Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold

2021-12-11 Thread Michael Forney

On 2021-12-11, g...@suckless.org  wrote:
> The type uint32_t is not guaranteed by the standard to be present,
> but it guarantees uint_least32_t. If a libgrapheme-user passes a
> pointer
> to an uint32_t (instead of uint_least32_t) there will be no problem,
> as the presence of uint32_t immediately implies uint32_t ==
> uint_least32_t.

It is true that the existence of uint32_t implies that uint_least32_t
also has exactly 32 bits and no padding bits, but they could still be
distinct types. For instance, on a 32-bit platform with int and long
both being exactly 32 bits, you could define uint32_t as one and
uint_least32_t as the other. In that case, dereferencing an array of
uint32_t as uint_least32_t would be undefined behavior.

That said, I agree with this change. It also has the benefit of
matching the definition of C11's char32_t.

> diff --git a/src/utf8.c b/src/utf8.c
> index 4488359..1cb5e17 100644
> --- a/src/utf8.c
> +++ b/src/utf8.c
> @@ -92,7 +101,7 @@ lg_utf8_decode(const uint8_t *s, size_t n, uint32_t *cp)
>* (i.e. between 0x80 (1000) and 0xBF (1011))
>*/
>   for (i = 1; i <= off; i++) {
> - if(!BETWEEN(s[i], 0x80, 0xBF)) {
> + if(!BETWEEN((unsigned char)s[i], 0x80, 0xBF)) {
>   /*
>* byte does not match format; return
>* number of bytes processed excluding the

Although irrelevant in C23, which will require 2's complement
representation, I want to note the distinction between (unsigned
char)s[i] and ((unsigned char *)s)[i]. The former adds 2^CHAR_BIT to
negative values, while the latter interprets as a CHAR_BIT-bit
unsigned integer (adds 2^CHAR_BIT if the sign bit is set). For
example, if char had sign-magnitude representation, we'd have
(unsigned char)"\x80"[0] == 0, but ((unsigned char *)"\x80")[0] ==
0x80.

The latter is probably what you want, but you could ignore this if you
only care about 2's complement (which is a completely reasonable
position).

> diff --git a/test/utf8-decode.c b/test/utf8-decode.c
> index 1182fb0..ee71cf9 100644
> --- a/test/utf8-decode.c
> +++ b/test/utf8-decode.c
> @@ -9,7 +9,7 @@
>  #define LEN(x) (sizeof(x) / sizeof(*(x)))
>
>  static const struct {
> - uint8_t *arr; /* UTF-8 byte sequence */
> + char*arr; /* UTF-8 byte sequence */
>   size_t   len; /* length of UTF-8 byte sequence */
>   size_t   exp_len; /* expected length returned */
>   uint32_t exp_cp;  /* expected code point returned */
> @@ -29,7 +29,9 @@ static const struct {
>* [ 1101 ] ->
>* INVALID
>*/
> - .arr = (uint8_t[]){ 0xFD },
> + .arr = (char[]){
> + (unsigned char)0xFD,
> + },

This cast doesn't do anything here. Both 0xFD and (unsigned char)0xFD
have the same value (0xFD), which can't necessarily be represented as
char. For example if CHAR_MAX is 127, this conversion is
implementation defined and could raise a signal (C99 6.3.1.3p2).

I think using hex escapes in a string literal ("\xFD") has the
behavior you want here. You could also create an array of unsigned
char and cast to char *.

>   .len = 1,
>   .exp_len = 1,
>   .exp_cp  = LG_CODEPOINT_INVALID,

Re: [hackers] [sbase][PATCH] nohup: Open nohup.out WRONLY

2021-10-23 Thread Michael Forney

Thanks for the patch, and sorry for the delay. Applied now.

Re: [hackers] [sbase][ed][PATCH] Fix use-after-free

2020-10-19 Thread Michael Forney

Thanks, applied.

Re: [hackers] [sbase][PATCH v2 0/5] fold fixes

2020-10-14 Thread Michael Forney

On 2020-10-09, Richard Ipsum  wrote:
> Hi, thanks for the feedback, suggested changes applied.

Thanks, applied.

Re: [hackers] [sbase][ed][PATCH] Fix use-after-free

2020-10-14 Thread Michael Forney

On 2020-10-14, Cág  wrote:
> I think it never made it to the list.
>
> ---
>  ed.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ed.c b/ed.c
> index cee9687..9e29297 100644
> --- a/ed.c
> +++ b/ed.c
> @@ -850,6 +850,7 @@ join(void)
>   delete(line1, line2);
>   inject(s.str, BEFORE);
>   free(s.str);
> + s.str = NULL;

This works, but I'm not sure why `s` is static here, since it is
cleared at the start of the function, and freed at the end of the
function.

Maybe I'm missing something, but I think it would be better to replace
the first `free(s.str)` with `s.str = NULL`, and make `s` have
automatic storage.

>  }
>
>  static void
> --
> 2.25.1

Re: [hackers] [sbase][PATCH 4/5] fold: fix handling of multibyte characters

2020-10-02 Thread Michael Forney

On 2020-10-02, Richard Ipsum  wrote:
> I'm happy to drop this patch from the series but libgrapheme isn't in
> sbase's tree and it doesn't seem reasonable to expect users of sbase to
> install libgrapheme themselves?

For now, I think it's fine to keep your patch (it is definitely an
improvement). Maybe you could add an entry to the TODO about using
libgrapheme to count grapheme clusters instead?

Re: [hackers] [sbase][PATCH 5/5] fold: fix handling of -s

2020-10-01 Thread Michael Forney

On 2020-06-20, Richard Ipsum  wrote:
> diff --git a/fold.c b/fold.c
> index 169064b..10a23cf 100644
> --- a/fold.c
> +++ b/fold.c
> @@ -29,12 +29,14 @@ foldline(struct line *l) {
>   eprintf("fwrite :");
>   if (l->data[i] != '\n')
>   putchar('\n');
> - last = (sflag && spacesect) ? spacesect : i;
> + if (sflag && spacesect)
> + i = spacesect;
> + last = i;
>   col = 0;
>   spacesect = 0;
>   }
>   runelen = chartorune(, l->data + i);
> - if (sflag && isspace(l->data[i]))
> + if (sflag && isblankrune(r))

It seems like this hunk might belong better with patch 4/5? Or maybe
4/5 should use isspacerune, and change it to isblankrune here.

>   spacesect = i + 1;

I think this should be `i + runelen` in case there is a multibyte blank.

>   if (!bflag && iscntrl(l->data[i])) {
>   switch(l->data[i]) {

Re: [hackers] [sbase][PATCH 2/5] fold: fix handling of \b

2020-10-01 Thread Michael Forney

On 2020-06-20, Richard Ipsum  wrote:
> diff --git a/fold.c b/fold.c
> index 9c3c919..a54594b 100644
> --- a/fold.c
> +++ b/fold.c
> @@ -19,7 +19,7 @@ foldline(struct line *l) {
>   for (i = 0, last = 0, col = 0, spacesect = 0; i < l->len; i++) {
>   if (!UTF8_POINT(l->data[i]) && !bflag)
>   continue;
> - if (col >= width) {
> + if (col >= width && (l->data[i] != '\b' || bflag)) {

I think '\r' might also need special handling here.

>   len = ((sflag && spacesect) ? spacesect : i) - last;
>   if (fwrite(l->data + last, 1, len, stdout) != len)
>   eprintf("fwrite :");

Re: [hackers] Re: [sbase][PATCH 0/5] fold fixes

2020-10-01 Thread Michael Forney

Hey Richard,

Thanks for the reminder about the patches.

On 2020-09-30, Richard Ipsum  wrote:
> Hi, looks like these patches got lost/forgotten,
> is there any reason to not merge them?

Sorry about that. I hadn't forgotten about them, I had just been
putting off digging into the code to review them properly. But that's
not fair to contributors like yourself (I really do appreciate all the
fixes you've been sending), so I think I will set myself a deadline of
one week to review patches so that this doesn't happen in the future.
If you don't hear from me within a week, feel free to send a reminder.

> It's been a while so I've almost forgotten how they work,
> but I've got a fair bit of free time to bring them
> up to scratch right now if they need more work.

I've just gone through them now, and they look pretty good to me. I
just have a few comments here and there.

-Michael

Re: [hackers] [sbase][PATCH 4/5] fold: fix handling of multibyte characters

2020-09-30 Thread Michael Forney

POSIX says we should be counting column positions rather than
codepoints, but I think that might be rather difficult to get right
and this is probably an improvement already.

I know Laslo has studied this area for libgrapheme, so maybe he has
suggestions.

On 2020-06-20, Richard Ipsum  wrote:
> diff --git a/fold.c b/fold.c
> index a5f320b..169064b 100644
> --- a/fold.c
> +++ b/fold.c
> @@ -7,6 +7,7 @@
>
>  #include "text.h"
>  #include "util.h"
> +#include "utf.h"
>
>  static intbflag = 0;
>  static intsflag = 0;
> @@ -15,11 +16,14 @@ static size_t width = 80;
>  static void
>  foldline(struct line *l) {
>   size_t i, col, last, spacesect, len;
> + Rune r;
> + int runelen;
> +
> + for (i = 0, last = 0, col = 0, spacesect = 0; i < l->len; i += runelen) 
> {
>

There's a extraneous blank line here.

> - for (i = 0, last = 0, col = 0, spacesect = 0; i < l->len; i++) {
> - if (!UTF8_POINT(l->data[i]) && !bflag)
> - continue;
>   if (col >= width && (l->data[i] != '\b' || bflag)) {
> + if (bflag && col > width)
> + i -= runelen;   /* never split a character */
>   len = ((sflag && spacesect) ? spacesect : i) - last;
>   if (fwrite(l->data + last, 1, len, stdout) != len)
>   eprintf("fwrite :");
> @@ -29,6 +33,7 @@ foldline(struct line *l) {
>   col = 0;
>   spacesect = 0;
>   }
> + runelen = chartorune(, l->data + i);

I think we should use charntorune(, l->data + i, l->len - i) here.

>   if (sflag && isspace(l->data[i]))
>   spacesect = i + 1;
>   if (!bflag && iscntrl(l->data[i])) {
> @@ -46,7 +51,7 @@ foldline(struct line *l) {
>   break;
>   }
>   } else {
> - col++;
> + col += bflag ? runelen : 1;
>   }
>   }
>   if (l->len - last)

Re: [hackers] [sbase][PATCH] Make 'w' command print byte count

2020-09-18 Thread Michael Forney

On 2020-09-14, Tait Hoyem  wrote:
> Here is the new patch:

Thanks, applied.

> + printf("%zd\n", bytecount);

This should be %zu (size_t is an unsigned type), but I fixed this up
when applying.

Re: [hackers] [sbase][PATCH] Make 'w' command print byte count

2020-09-13 Thread Michael Forney

Thanks for the patch.

On 2020-09-07, Tait Hoyem  wrote:
> diff --git a/ed.c b/ed.c
> index cee9687..82b9c2c 100644
> --- a/ed.c
> +++ b/ed.c
> @@ -623,14 +623,16 @@ static void
>  dowrite(const char *fname, int trunc)
>  {
>   FILE *fp;
> - int i, line;
> + int i, line, bytecount;

I think we should use a different type than int here. I'm not sure if
size_t or off_t is more appropriate, but size_t is probably
reasonable.

>
>   if (!(fp = fopen(fname, (trunc) ? "w" : "a")))
>   error("input/output error");
>
>   line = curln;
> - for (i = line1; i <= line2; ++i)
> + for (i = line1; i <= line2; ++i) {
> + bytecount += strlen(gettxt(i));
>   fputs(gettxt(i), fp);

I think we should try to avoid recomputing gettxt(i) here. In fact,
looking at the body of gettxt, it actually stores the result in
text.str, along with its length in text.siz.

What do you think about the following?

gettxt(i);
bytecount += text.siz - 1;
fwrite(text.str, 1, text.siz - 1);

> +  }

This should be indented with a tab.

>
>   curln = line2;
>   if (fclose(fp))
> @@ -638,6 +640,7 @@ dowrite(const char *fname, int trunc)
>   strcpy(savfname, fname);
>   modflag = 0;
>   curln = line;
> + printf("%d\n", bytecount);
>  }
>
>  static void
>
> diff --git a/TODO b/TODO
> index 59c9440..92acfa3 100644
> --- a/TODO
> +++ b/TODO
> @@ -55,7 +55,6 @@ ed
>  line
>  .
>  1g/^$/p
> -* w command doesn't print byte count.
>  * Editing huge files doesn't work well.
>
>  printf
>
>

Re: [hackers] setsid: add optional -f to force fork()

2020-07-25 Thread Michael Forney

On 2020-07-14, Hiltjo Posthuma  wrote:
> The below patch adds an -f flag to force fork(2)ing and creating a new
> process.

Thanks, I applied the patch.

Though I wonder, is Linux the only operating system that typically has
setsid(1)? It doesn't look like it is available on BSDs. It's such a
small tool and it can be implemented portably so I guess it's fine as
a part of sbase, but I wonder if there is some other tool with similar
functionality on other systems.

Re: [hackers] [sbase][PATCH] remove sbase-box from PHONY

2020-07-03 Thread Michael Forney

Thanks, applied.

[hackers] [PATCH sbase v2] libutil/recurse: Split into two functions

2020-06-23 Thread Michael Forney

recurse() is peculiar function since it is used for two purposes:

1. To bootstrap the recursion process.
2. To perform recursion when a directory is encountered.

In the first case, we stat() the directory, record it in the history
list, then depending on the value of DIRFIRST, we call the function
and then recurse or vice-versa. The function then calls recurse()
again on the same directory, but since we have already added it to
the history list, this call gets pruned.

In the second case, we stat() the directory a second time (we already
did this when traversing the parent directory's entries), then call
the function on each entry.

This approach leads to a few problems:

- We stat() each directory in the hierarchy twice, once at the
  beginning of recurse(), and once for each contained directory.
- We need a DIRFIRST flag to specify whether the function recurses
  at the start or end, even though just running the function would
  do things in the correct order.
- History is only checked when we are about to recurse on a directory,
  *after* we have already operated on it. So while we do prevent
  infinite loops, we double-count the directory.

The last problem can be demonstrated easily with du(1):

$ mkdir D && ln -s . D/E
$ du -L D
8   D/E
16  D
$

The correct result is:

$ du -L D
8   D
$

To solve these problems, split recurse() into two functions:
recurse(), which initializes the recursor struct then calls the
function on the given path, and recursedir(), which reads the
directory entries, calling the function on each one.
---
Changes since v1:
- Unconditionally initialize r->path in recurse()

 chgrp.c   |   2 +-
 chmod.c   |   4 +-
 chown.c   |   2 +-
 du.c  |   2 +-
 fs.h  |   5 +-
 libutil/recurse.c | 140 --
 libutil/rm.c  |   2 +-
 tar.c |   4 +-
 8 files changed, 84 insertions(+), 77 deletions(-)

diff --git a/chgrp.c b/chgrp.c
index 4042a0d..6be7e1d 100644
--- a/chgrp.c
+++ b/chgrp.c
@@ -24,7 +24,7 @@ chgrp(int dirfd, const char *name, struct stat *st, void 
*data, struct recursor
weprintf("chown %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(dirfd, name, NULL, r);
+   recursedir(dirfd, name, NULL, r);
}
 }
 
diff --git a/chmod.c b/chmod.c
index c79488b..58f995b 100644
--- a/chmod.c
+++ b/chmod.c
@@ -19,7 +19,7 @@ chmodr(int dirfd, const char *name, struct stat *st, void 
*data, struct recursor
weprintf("chmod %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(dirfd, name, NULL, r);
+   recursedir(dirfd, name, NULL, r);
}
 }
 
@@ -32,7 +32,7 @@ usage(void)
 int
 main(int argc, char *argv[])
 {
-   struct recursor r = { .fn = chmodr, .maxdepth = 1, .follow = 'H', 
.flags = DIRFIRST };
+   struct recursor r = { .fn = chmodr, .maxdepth = 1, .follow = 'H' };
size_t i;
 
argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
diff --git a/chown.c b/chown.c
index 71628eb..27dd3a7 100644
--- a/chown.c
+++ b/chown.c
@@ -28,7 +28,7 @@ chownpwgr(int dirfd, const char *name, struct stat *st, void 
*data, struct recur
weprintf("chown %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(dirfd, name, NULL, r);
+   recursedir(dirfd, name, NULL, r);
}
 }
 
diff --git a/du.c b/du.c
index 1815a02..6388159 100644
--- a/du.c
+++ b/du.c
@@ -42,7 +42,7 @@ du(int dirfd, const char *path, struct stat *st, void *data, 
struct recursor *r)
 
subtotal = nblks(st->st_blocks);
if (S_ISDIR(st->st_mode))
-   recurse(dirfd, path, , r);
+   recursedir(dirfd, path, , r);
*total += subtotal;
 
if (!r->depth)
diff --git a/fs.h b/fs.h
index 00ecd3b..9ee3c5d 100644
--- a/fs.h
+++ b/fs.h
@@ -18,12 +18,12 @@ struct recursor {
int maxdepth;
int follow;
int flags;
+   dev_t dev;
 };
 
 enum {
SAMEDEV  = 1 << 0,
-   DIRFIRST = 1 << 1,
-   SILENT   = 1 << 2,
+   SILENT   = 1 << 1,
 };
 
 extern int cp_aflag;
@@ -41,6 +41,7 @@ extern int rm_status;
 extern int recurse_status;
 
 void recurse(int, const char *, void *, struct recursor *);
+void recursedir(int, const char *, void *, struct recursor *);
 
 int cp(const char *, const char *, int);
 void rm(int, const char *, struct stat *st, void *, struct recursor *);
diff --git a/libutil/recurse.c b/libutil/recurse.c
index e66efaf..110195b 100644
--- a/libutil/recurse.c
+++ b/libutil/recurse.c
@@ -15,22 +15,36 @@
 
 int recurse_status = 0;
 
+/* returns 0 if we have seen this directory, 1 otherwise */
+static int
+histadd(struct recursor *r, struct stat *st)
+{
+   struct history *h;
+
+   if (S_ISDIR(st->st_mode)) {
+   for (h = r->hist; h; h =

Re: [hackers] [sbase] libutil/recurse: Simplify adding trailing slash || Michael Forney

2020-06-23 Thread Michael Forney

On 2020-06-23, Laslo Hunhold  wrote:
> isn't that a bug waiting to happen when anything about the semantics of
> the function later changes?

I don't think it will be a problem. We are not interested in the path
with an appended trailing slash, it is just an intermediate step in
constructing the path to the child. When we add the final path
component, we necessarily check that the buffer has enough space for
the directory entry name and nul-terminator.

Re: [hackers] [PATCH sbase] libutil/recurse: Split into two functions

2020-06-23 Thread Michael Forney

Thanks for testing this out, Richard.

On 2020-06-23, Richard Ipsum  wrote:
> I don't feel qualified to criticise the overall design, but do we not still
> need a way to specify whether traversal should be pre-order or post-order?
> I figure this is what DIRFIRST was for right?

The effective traversal order can be controlled by structuring your
recursor function to call recursedir() at the beginning or end.

DIRFIRST was only useful because a top-level directory was handled
specially by recurse. Normally, recurse() does not "visit" its
argument at all, only its children. For D/f, the call structure was
something like this for DIRFIRST:

recurse(D)
stat(D)
add to history
r->fn(D)
do something with D
recurse(D)
stat(D)
prune since we've already seen D
...
r->fn(D/entry1)
do something with D/entry1
recurse(D/entry1)
...
...
r->fn(D/entry2)
do something with D/entry2
recurse(D/entry2)
...
...

For not DIRFIRST, it is

recurse(D)
stat(D)
add to history
r->fn(D/entry1)
recurse(D/entry1)
...
...
do something with D/entry1
r->fn(D/entry2)
recurse(D/entry2)
...
...
do something with D/entry2

r->fn(D)
do something with D
recurse(D)
stat(D)
prune since we've already seen D
...

As you can see, the DIRFIRST flag is only used at the toplevel for
bootstrapping. In other cases, it is ignored, since the function is
just structured differently depending on whether it needs the children
processed first or last:

fn(D)
do something with D
recurse(D)

or

fn(D)
recurse(D)
do something with D

>> -if (dirfd == AT_FDCWD)
>> -pathlen = estrlcpy(r->path, name, sizeof(r->path));
>
> Now that we no longer do this, r->path is not being reset between
> separate calls, so we have:
>
> % mkdir D
> % echo 'hello world' > f
> % du D f
> 4   D
> 4   f
> % ~/sbase/du D f
> 4   D
> 4   D

Ah, good catch. In recurse(), I still do the initialization, but I
changed the condition to `if (!r->pathlen)` thinking that would be
slightly better than `if (dirfd == AT_FDCWD)` in case the caller were
to pre-initialize r->path itself and pass a special directory FD. But,
nothing actually does this so it's not worth worrying about.

I think a better solution is just unconditionally initialize r->path
in recurse().

-Michael

[hackers] [PATCH sbase] libutil/recurse: Split into two functions

2020-06-23 Thread Michael Forney

recurse() is peculiar function since it is used for two purposes:

1. To bootstrap the recursion process.
2. To perform recursion when a directory is encountered.

In the first case, we stat() the directory, record it in the history
list, then depending on the value of DIRFIRST, we call the function
and then recurse or vice-versa. The function then calls recurse()
again on the same directory, but since we have already added it to
the history list, this call gets pruned.

In the second case, we stat() the directory a second time (we already
did this when traversing the parent directory's entries), then call
the function on each entry.

This approach leads to a few problems:

- We stat() each directory in the hierarchy twice, once at the
  beginning of recurse(), and once for each contained directory.
- We need a DIRFIRST flag to specify whether the function recurses
  at the start or end, even though just running the function would
  do things in the correct order.
- History is only checked when we are about to recurse on a directory,
  *after* we have already operated on it. So while we do prevent
  infinite loops, we double-count the directory.

The last problem can be demonstrated easily with du(1):

$ mkdir D && ln -s . D/E
$ du -L D
8   D/E
16  D
$

The correct result is:

$ du -L D
8   D
$

To solve these problems, split recurse() into two functions:
recurse(), which initializes the recursor struct then calls the
function on the given path, and recursedir(), which reads the
directory entries, calling the function on each one.
---
 chgrp.c   |   2 +-
 chmod.c   |   4 +-
 chown.c   |   2 +-
 du.c  |   2 +-
 fs.h  |   5 +-
 libutil/recurse.c | 141 --
 libutil/rm.c  |   2 +-
 tar.c |   4 +-
 8 files changed, 85 insertions(+), 77 deletions(-)

diff --git a/chgrp.c b/chgrp.c
index 4042a0d..6be7e1d 100644
--- a/chgrp.c
+++ b/chgrp.c
@@ -24,7 +24,7 @@ chgrp(int dirfd, const char *name, struct stat *st, void 
*data, struct recursor
weprintf("chown %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(dirfd, name, NULL, r);
+   recursedir(dirfd, name, NULL, r);
}
 }
 
diff --git a/chmod.c b/chmod.c
index c79488b..58f995b 100644
--- a/chmod.c
+++ b/chmod.c
@@ -19,7 +19,7 @@ chmodr(int dirfd, const char *name, struct stat *st, void 
*data, struct recursor
weprintf("chmod %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(dirfd, name, NULL, r);
+   recursedir(dirfd, name, NULL, r);
}
 }
 
@@ -32,7 +32,7 @@ usage(void)
 int
 main(int argc, char *argv[])
 {
-   struct recursor r = { .fn = chmodr, .maxdepth = 1, .follow = 'H', 
.flags = DIRFIRST };
+   struct recursor r = { .fn = chmodr, .maxdepth = 1, .follow = 'H' };
size_t i;
 
argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
diff --git a/chown.c b/chown.c
index 71628eb..27dd3a7 100644
--- a/chown.c
+++ b/chown.c
@@ -28,7 +28,7 @@ chownpwgr(int dirfd, const char *name, struct stat *st, void 
*data, struct recur
weprintf("chown %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(dirfd, name, NULL, r);
+   recursedir(dirfd, name, NULL, r);
}
 }
 
diff --git a/du.c b/du.c
index 1815a02..6388159 100644
--- a/du.c
+++ b/du.c
@@ -42,7 +42,7 @@ du(int dirfd, const char *path, struct stat *st, void *data, 
struct recursor *r)
 
subtotal = nblks(st->st_blocks);
if (S_ISDIR(st->st_mode))
-   recurse(dirfd, path, , r);
+   recursedir(dirfd, path, , r);
*total += subtotal;
 
if (!r->depth)
diff --git a/fs.h b/fs.h
index 00ecd3b..9ee3c5d 100644
--- a/fs.h
+++ b/fs.h
@@ -18,12 +18,12 @@ struct recursor {
int maxdepth;
int follow;
int flags;
+   dev_t dev;
 };
 
 enum {
SAMEDEV  = 1 << 0,
-   DIRFIRST = 1 << 1,
-   SILENT   = 1 << 2,
+   SILENT   = 1 << 1,
 };
 
 extern int cp_aflag;
@@ -41,6 +41,7 @@ extern int rm_status;
 extern int recurse_status;
 
 void recurse(int, const char *, void *, struct recursor *);
+void recursedir(int, const char *, void *, struct recursor *);
 
 int cp(const char *, const char *, int);
 void rm(int, const char *, struct stat *st, void *, struct recursor *);
diff --git a/libutil/recurse.c b/libutil/recurse.c
index e66efaf..0403e16 100644
--- a/libutil/recurse.c
+++ b/libutil/recurse.c
@@ -15,22 +15,37 @@
 
 int recurse_status = 0;
 
+/* returns 0 if we have seen this directory, 1 otherwise */
+static int
+histadd(struct recursor *r, struct stat *st)
+{
+   struct history *h;
+
+   if (S_ISDIR(st->st_mode)) {
+   for (h = r->hist; h; h = h->prev) {
+   if (h->ino == st->st_ino && h->dev ==

Re: [hackers] [sbase][PATCH] du: recurse: fix path

2020-06-20 Thread Michael Forney

On 2020-06-03, Richard Ipsum  wrote:
> path is not fixed up on exit from recursive step, this leads to
> incorrect paths in du's output.

Thanks, applied with a small tweak to keep the invariant that
r->pathlen == strlen(r->path).

Re: [hackers] [sbase] install: Use fchown to change owner || Michael Forney

2020-05-25 Thread Michael Forney

On 2020-05-24, Quentin Rameau  wrote:
> Hi Michael,
>
>> +if (fchown(f2, owner, group) < 0)
>> +eprintf("lchown %s:", s2);
>
> Little typo here ^

Whoops, missed that. In the follow up commit, I used the correct
function name, so I think it's just a mistake from when I split my
changes into two commits.

Re: [hackers] [PATCH] tar: if first argument doesn't have a leading dash, prepend one

2020-05-18 Thread Michael Forney

On 2020-05-18, Laslo Hunhold  wrote:
> I have objections for this patch. I don't like the old style and it's
> horribly annoying with tools like unrar and 7z as well. I don't know
> why the compression tools all ignore years of established command line
> syntax, but I think we shouldn't chime in to this mad play.

Whether you like it or not, it's the most common usage of tar by far,
and as far as I know, the only one that was ever standardized. You are
not forced to use this syntax, the usage following the Utility Syntax
Guidelines still works after this patch.

> At least, I would print a warning if the old style syntax is seen so
> people start fixing their scripts.

On what basis are scripts written to the SUSv2 specification broken?
Because we said so? Ethan looked at a bunch of tar implementations,
and some did not support the hyphenated options. Therefore, the most
portable way to call tar is with the old-style options.

Personally, I think tar is a hopeless interface and we should
implement pax in sbase. I started on an implementation a while ago,
but it is unfinished. After this, we can remove this tar
implementation, which has some known bugs and deficiencies, and
possibly replace it with a tar compatibility interface to pax. This
tar compatibility interface should probably support the hyphen-less
option key, since its whole purpose is legacy compatibility.

Re: [hackers] [PATCH] tar: if first argument doesn't have a leading dash, prepend one

2020-05-18 Thread Michael Forney

On 2020-05-15, Ethan Sommer  wrote:
> this allows tar to be called in the common form "tar " instead of only
> allowing "tar -"

Thanks for the patch, Ethan.

This patch looks fine to me, but seeing as we used to support this
usage before it was reverted in [0], I'd like to see if anyone else
has comments. The approach you used avoids the code duplication that
may have been the motivation for its removal.

[0] 
https://git.suckless.org/sbase/commit/2334c049528450e46062395b70b9460542000d60.html

Re: [hackers] [PATCH] nl.1: document pages

2020-05-15 Thread Michael Forney

Thanks for the patch, Richard. Applied.

[hackers] [PATCH sbase] Add implementation of dd(1)

2020-04-30 Thread Michael Forney

---
Although ubase has an implementation of dd(1), it is Linux-specific,
it does not conform to the POSIX specification, it is missing support
for operands such as `ibs` and `obs`, and it does not error out on
unknown operands.

This is an initial attempt at a portable implementation for sbase.
I have not tested it thoroughly (only to run the openssh test suite,
which is what prompted this), so any help testing or reviewing is
appreciated.

 .gitignore |   1 +
 Makefile   |   1 +
 dd.1   |  91 +
 dd.c   | 234 +
 4 files changed, 327 insertions(+)
 create mode 100644 dd.1
 create mode 100644 dd.c

diff --git a/.gitignore b/.gitignore
index a533bfa..15a18c8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -17,6 +17,7 @@
 /cron
 /cut
 /date
+/dd
 /dirname
 /du
 /echo
diff --git a/Makefile b/Makefile
index 5473e5a..22339d0 100644
--- a/Makefile
+++ b/Makefile
@@ -100,6 +100,7 @@ BIN =\
cron\
cut\
date\
+   dd\
dirname\
du\
echo\
diff --git a/dd.1 b/dd.1
new file mode 100644
index 000..7ffbb76
--- /dev/null
+++ b/dd.1
@@ -0,0 +1,91 @@
+.Dd 2020-04-28
+.Dt DD 1
+.Os sbase
+.Sh NAME
+.Nm dd
+.Nd convert and copy a file
+.Sh SYNOPSIS
+.Nm
+.Op Ar operand Ns ...
+.Sh DESCRIPTION
+.Nm
+copies its input to its output, possibly after conversion, using
+the specified block sizes,
+.Pp
+The following operands are available:
+.Bl -tag -width ibs=expr
+.It Cm if= Ns Ar file
+Read from the file named by
+.Ar file
+instead of standard input.
+.It Cm of= Ns Ar file
+Write to the file named by
+.Ar file
+instead of standard output.
+.It Cm ibs= Ns Ar expr
+Set the input block size to
+.Ar expr
+(defaults to 512).
+.It Cm obs= Ns Ar expr
+Set the output block size to
+.Ar expr
+(defaults to 512).
+.It Cm bs= Ns Ar expr
+Set the input and output block sizes to
+.Ar expr .
+Additionally, if no conversion other than
+.Cm noerror ,
+.Cm notrunc ,
+or
+.Cm sync
+is specified, input blocks are copied as single output blocks, even
+when the input block is short.
+.It Cm skip= Ns Ar n
+Skip
+.Ar n
+input blocks before starting to copy.
+.It Cm seek= Ns Ar n
+Skip
+.Ar n
+output blocks before starting to copy.
+.It Cm count= Ns Ar n
+Copy at most
+.Ar n
+input blocks.
+.It Cm conv= Ns value Ns Op , Ns Ar value Ns ...
+Apply the conversions specified by
+.Ar value .
+.Bl -tag -width Ds
+.It Cm lcase
+Map uppercase characters to the corresponding lowercase character
+using
+.Fn tolower .
+.It Cm ucase
+Map lowercase characters to the corresponding uppercase character
+using
+.Fn toupper .
+.It Cm swab
+Swap each pair of bytes in the input block.
+If there is an odd number of bytes in a block, the last one is
+unmodified.
+.It Cm noerror
+In case of an error reading from the input, do not fail.
+Instead, print a diagnostic message and a summary of the current
+status.
+.It Cm notrunc
+Do not truncate the output file.
+.It Cm sync
+In case of a partial input block, pad with null bytes to form a
+complete block.
+.El
+.El
+.Sh STANDARDS
+The
+.Nm
+utility is compliant with the
+.St -p1003.1-2008
+specification, except that it does not implement the
+.Cm block
+and
+.Cm unblock
+conversions.
diff --git a/dd.c b/dd.c
new file mode 100644
index 000..5522db5
--- /dev/null
+++ b/dd.c
@@ -0,0 +1,234 @@
+/* See LICENSE file for copyright and license details. */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "util.h"
+
+static off_t ifull, ofull, ipart, opart;
+
+static void
+usage(void)
+{
+   eprintf("usage: %s [operand...]\n", argv0);
+}
+
+static size_t
+parsesize(char *expr)
+{
+   char *s = expr;
+   size_t n = 1;
+
+   for (;;) {
+   n *= strtoumax(s, , 10);
+   switch (*s) {
+   case 'k': n <<= 10; s++; break;
+   case 'b': n <<= 9; s++; break;
+   }
+   if (*s != 'x' || !s[1])
+   break;
+   s++;
+   }
+   if (*s || n == 0)
+   eprintf("invalid block size expression '%s'\n", expr);
+
+   return n;
+}
+
+static void
+bswap(unsigned char *buf, size_t len)
+{
+   int c;
+
+   for (len &= ~1; len > 0; buf += 2, len -= 2) {
+   c = buf[0];
+   buf[0] = buf[1];
+   buf[1] = c;
+   }
+}
+
+static void
+lcase(unsigned char *buf, size_t len)
+{
+   for (; len > 0; buf++, len--)
+   buf[0] = tolower(buf[0]);
+}
+
+static void
+ucase(unsigned char *buf, size_t len)
+{
+   for (; len > 0; buf++, len--)
+   buf[0] = toupper(buf[0]);
+}
+
+static void
+summary(void)
+{
+   fprintf(stderr, "%"PRIdMAX"+%"PRIdMAX" records in\n", (intmax_t)ifull, 
(intmax_t)ipart);
+   fprintf(stderr, "%"PRIdMAX"+%"PRIdMAX" records out\n", (intmax_t)ofull, 
(intmax_t)opart);
+}
+
+int
+main(int argc, char *argv[])
+{
+   enum {
+   LCASE   = 1 << 0,
+

Re: [hackers] Announcing libschrift (a TrueType font rendering library)

2020-04-24 Thread Michael Forney

On 2020-04-24, Silvan Jegen  wrote:
> Yeah, that's where my missing understanding of graphics programming
> becomes apparent. I assumed a font rendering library will just take a
> pointer to some memory as an argument to its rendering function to which
> it will then write the bitmapped glyph. Surely you will have to take the
> different buffer formats into account there though so I am not sure how
> that would work.

You would also have to consider how the glyph should be blended with
the destination, whether you want to use mask image (for example to
render in different colors or with transparency), and other details.
Rendering the glyphs to separate images in a grayscale format gives
the most flexibility. This also allows you to cache the glyphs so you
don't have to re-render them all the time.

> I'm also stomped by how the shared memory (like wl_shm) differs in Wayland
> from a buffer you would get from EGL (like dmabuf-based wl_buffers) or
> another graphics library. I thought I could just use pixman to render
> into both but I don't think that's actually true... This example[0]
> definitely looks relevant to this topic.

The buffers you get from OpenGL/Vulkan are generally tiled in various
vendor-specific (or even model-specific) formats, meaning they don't
use a linear memory layout. If you were to render into it as if it
were linear, you'd get jumbled output on the screen. The GPU memory is
also not necessarily accessible by the CPU. There are ways to allocate
an image in a linear layout backed by host-mappable memory, but that
is likely to come with performance penalty (comparatively). Usually,
you would only do this to upload textures to the GPU.

Recently, most GPU vendors have agreed to settle on an opaque 64-bit
"format modifier", which can be passed between APIs along with the
DMA-BUF fd so they know how the image is laid out in memory.
Previously, the image layout was either assumed, or passed along
through some out-of-band channel (for example, AMD stores this
information in a metadata structure associated with the buffer).

I've been working off-and-on for a while on a library[0] to provide
some sort of API to do simple 2D rendering and presentation operations
that can be implemented with XRender, Vulkan, pixman, and also
directly by submitting command buffers to the GPU. It's meant to be
the successor to wld, but I still have a really long way to go, and
have not begun to thing about how text rendering would work (but when
I do, I'll definitely be looking at libschrift).

[0] https://git.sr.ht/~mcf/libblit/

Re: [hackers] Announcing libschrift (a TrueType font rendering library)

2020-04-23 Thread Michael Forney

On 2020-04-23, Silvan Jegen  wrote:
> I had a quick look and currently it looks like it's mostly useful for
> rendering of fonts in X. I wonder how an interface would look like that
> could also be used for text rendering for a Wayland client. I assume the
> library would instead just render to some graphics memory to be rendered
> by the compositor, but I am not completely sure.

I think the interface would look the same for Wayland, but the missing
piece is something to composite the glyphs into the application's
window buffer, which is handled by XRender in the demo.

If you are rendering to shared memory, pixman (which is essentially
XRender in a library) can be used similarly. You can create a glyph
cache, load the glyph images produced by libschrift into it, and then
use pixman_composite_glyphs to render them onto your frame.

For OpenGL/Vulkan, it's the same on X11 and Wayland, since the client
is doing direct rendering in both cases. I believe it's generally done
by creating an "atlas" texture containing a bunch of glyphs at
different positions, and then rendering subregions of it onto your
frame. Most code using freetype directly could probably be adapted to
libschrift fairly easily.

Re: [hackers] [ubase][PATCH] mount: fix mount -o move

2020-04-16 Thread Michael Forney

This patch looks good to me.

Re: [hackers] [PATCH v4][sbase] paste: Support -d '\0'

2020-04-15 Thread Michael Forney

On 2020-04-15, Richard Ipsum  wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.

Thanks, Richard. Applied.

Sorry for the delay between reviews.

Re: [hackers] [PATCH v3][sbase] paste: Support -d '\0'

2020-04-07 Thread Michael Forney

On 2020-03-27, Richard Ipsum  wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.
> ---
>  libutf/utf.c  | 12 
>  libutf/utftorunestr.c | 12 
>  paste.c   | 27 +++
>  utf.h |  4 +++-
>  4 files changed, 42 insertions(+), 13 deletions(-)
>
> diff --git a/libutf/utf.c b/libutf/utf.c
> index 897c5ef..fc78f29 100644
> --- a/libutf/utf.c
> +++ b/libutf/utf.c
> @@ -62,6 +62,18 @@ utfnlen(const char *s, size_t len)
>   return i;
>  }
>
> +size_t
> +utfmemlen(const char *s, size_t len)
> +{
> + const char *p = s, *end = s + len;
> + size_t i;
> + Rune r;
> +
> + for(i = 0; p < end; i++)
> + p += charntorune(, p, end - p);
> + return i;
> +}

It looks like charntorune can return 0 even if p < end if it
encounters a truncated UTF-8 sequence, which would cause this to
infinite loop.

I think something similar to utfntorunestr should work here.

> +
>  char *
>  utfrune(const char *s, Rune r)
>  {
> diff --git a/libutf/utftorunestr.c b/libutf/utftorunestr.c
> index 005fe8a..d350c77 100644
> --- a/libutf/utftorunestr.c
> +++ b/libutf/utftorunestr.c
> @@ -11,3 +11,15 @@ utftorunestr(const char *str, Rune *r)
>
>   return i;
>  }
> +
> +int
> +utfntorunestr(const char *str, size_t len, Rune *r)
> +{
> + int i, n;
> + const char *p = str, *end = str + len;
> +
> + for(i = 0; (n = charntorune([i], p, end - p)) && p < end; i++)
> + p += n;

I don't think the `&& p < end` is necessary, since if p == end,
charntorune returns 0.

Also, I think this function should return size_t, not int. I pushed a
change to make utftorunestr return size_t as well.

> +
> + return i;
> +}

Re: [hackers] [PATCH 1/1] paste: Support -d '\0'

2020-03-21 Thread Michael Forney

On 2020-03-09, Richard Ipsum  wrote:
> POSIX specifies that -d '\0' sets the delimiter to an empty string.

Hi Richard,

Sorry for the delay on the review. This mostly looks good. Just a few
questions/comments.


> diff --git a/libutf/utf.c b/libutf/utf.c
> index 897c5ef..cf46e57 100644
> --- a/libutf/utf.c
> +++ b/libutf/utf.c
> @@ -62,6 +62,18 @@ utfnlen(const char *s, size_t len)
>   return i;
>  }
>
> +size_t
> +utfmemlen(const char *s, size_t len)
> +{
> + const char *p = s;
> + size_t i;
> + Rune r;
> +
> + for(i = 0; p - s < len; i++)
> + p += chartorune(, p);
> + return i;
> +}
> +
>  char *
>  utfrune(const char *s, Rune r)
>  {
> diff --git a/libutf/utftorunestr.c b/libutf/utftorunestr.c
> index 005fe8a..5da9d5f 100644
> --- a/libutf/utftorunestr.c
> +++ b/libutf/utftorunestr.c
> @@ -11,3 +11,15 @@ utftorunestr(const char *str, Rune *r)
>
>   return i;
>  }
> +
> +int
> +utfntorunestr(const char *str, size_t len, Rune *r)
> +{
> + int i, n;
> + const char *p = str;
> +
> + for(i = 0; (n = chartorune([i], p)) && p - str < len; i++)
> + p += n;
> +
> + return i;
> +}

I have a slight concern here (and in utfmemlen) that if the string
ends with a partial UTF-8 sequence or len == 0, we may read past the
end of the buffer. Perhaps we should use charntorune here?

> diff --git a/libutil/unescape.c b/libutil/unescape.c
> index d8ed2a2..deca948 100644
> --- a/libutil/unescape.c
> +++ b/libutil/unescape.c
> @@ -21,7 +21,8 @@ unescape(char *s)
>   ['n'] = '\n',
>   ['r'] = '\r',
>   ['t'] = '\t',
> - ['v'] = '\v'
> + ['v'] = '\v',
> + ['0'] = '\0'

I think this is not necessary. It should be handled by the octal
escape handling below.

Re: [hackers] ubase musl

2020-03-21 Thread Michael Forney

On 2020-01-5, spaceman  wrote:
> ubase won't compile with musl later than 1.1.23 because it lacks the
> major, minor and makedev function. Added the necessary include to make
> it compile.

Yes, this is a known issue. I sent a patch last August:
https://lists.suckless.org/hackers/1908/16960.html

Unfortunately, ubase does not have an active maintainer.

If you're interested, I've been accumulating some patches in my local branch:
https://github.com/michaelforney/ubase

Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Michael Forney

On 2020-03-05, Quentin Rameau  wrote:
>> Looking at POSIX, I see that `-d '\0'` must be supported, `-d ""` is
>> unspecified
>
> I don't think so, -d "" is just a list with an empty string.
> So -d '\0' is equivalent to -d '', '\0' is here to let the user express
> an empty string in a list, which wouldn't be possible otherwise (like
> how would one specify empty string in a list like 'abcd').

Here is a direct quote from POSIX:

The commands:

paste -d "\0" ...
paste -d "" ...

are not necessarily equivalent; the latter is not specified by
this volume of POSIX.1-2017 and may result in an error.

>> and `-d""` is invalid, since paste(1) must follow the
>> utility syntax guidelines (guideline 7).
>
> Not sure what you mean there, -d"" is the concatenation of -d and '',
> which is standard.
> Did you quote the correct guideline? “Guideline 7: Option-arguments
> should not be optional.” here there's an option-argument, that's an
> empty string.

I guess it's a combination of 6 and 7. Option arguments must be
separate parameters and non-optional (unless specified otherwise).
There is an exception made that allows conforming implementations to
accept an option adjacent with its option argument in the same
argument, but I think this only makes sense if the option-argument is
non-empty.

POSIX says

  The construct '\0' is used to mean "no separator" because historical
versions of paste did not follow the syntax guidelines, and the
command:

  paste -d"" ...

  could not be handled properly by getopt().

I think this implies that `paste -d""` is no longer valid, due to the
requirements of the syntax guidelines.

Re: [hackers] [PATCH][sbase] paste: Allow null delim

2020-03-05 Thread Michael Forney

Hi Richard,

Thanks for the patch. Can you clarify if this change fixes `-d "\0"`,
`-d ""` or `-d""`?

Looking at POSIX, I see that `-d '\0'` must be supported, `-d ""` is
unspecified, and `-d""` is invalid, since paste(1) must follow the
utility syntax guidelines (guideline 7).

I recently investigated a similar issue to this in tr(1). I think a
proper solution would be to add length parameters to utflen and
utftorunestr so that they can handle 0-bytes in the strings.

On 2020-03-02, Richard Ipsum  wrote:
> ---
>  paste.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/paste.c b/paste.c
> index b0ac761..0159fe0 100644
> --- a/paste.c
> +++ b/paste.c
> @@ -53,7 +53,8 @@ nextline:
>
>   for (; efgetrune(, dsc[i].fp, dsc[i].name) ;) {
>   for (m = last + 1; m < i; m++)
> - efputrune(&(delim[m % delimlen]), stdout, 
> "");
> + if (delim[m % delimlen] != '\0')
> + efputrune(&(delim[m % delimlen]), 
> stdout, "");
>   last = i;
>   if (c == '\n') {
>   if (i != fdescrlen - 1)
> @@ -68,7 +69,8 @@ nextline:
>   if (i == fdescrlen - 1)
>   putchar('\n');
>   else
> - efputrune(, stdout, "");
> + if (d != '\0')
> + efputrune(, stdout, "");
>   last++;
>   }
>   }
> @@ -96,7 +98,7 @@ main(int argc, char *argv[])
>   seq = 1;
>   break;
>   case 'd':
> - adelim = EARGF(usage());
> + adelim = ARGF();

I think allowing missing option-argument here breaks POSIX compatibility.

>   unescape(adelim);
>   break;
>   default:
> @@ -109,8 +111,12 @@ main(int argc, char *argv[])
>   /* populate delimiters */
>   /* TODO: fix libutf to accept sizes */
>   delim = ereallocarray(NULL, utflen(adelim) + 1, sizeof(*delim));
> - if (!(delimlen = utftorunestr(adelim, delim)))
> + if (*adelim == '\0') {
> + delimlen = 1;
> + *delim = '\0';
> + } else if (!(delimlen = utftorunestr(adelim, delim))) {
>   usage();
> + }
>
>   /* populate file list */
>   dsc = ereallocarray(NULL, argc, sizeof(*dsc));
> --
> 2.25.1

Re: [hackers] [sbase][PATCH] od: Fix argument parsing for -t flag

2020-03-03 Thread Michael Forney

On 2020-02-16, Quentin Rameau  wrote:
> I think something like that is better, what do you think?

Thanks, this looks good to me. Please resend with a commit message.

> diff --git a/od.c b/od.c
> index 0b1c5c6..9ff8ff2 100644
> --- a/od.c
> +++ b/od.c
> @@ -212,7 +212,7 @@ main(int argc, char *argv[])
>  {
>   int fd;
>   struct type *t;
> - int ret = 0, len;
> + int ret = 0, len, defbytes;
>   char *s;
>
>   big_endian = (*(uint16_t *)"\0\xff" == 0xff);
> @@ -260,6 +260,7 @@ main(int argc, char *argv[])
>   case 'o':
>   case 'u':
>   case 'x':
> + defbytes = 0;
>   /* todo: allow multiple digits */
>   if (*(s+1) > '0' && *(s+1) <= '9') {
>   len = *(s+1) - '0';
> @@ -271,17 +272,17 @@ main(int argc, char *argv[])
>   case 'S':
>   len = sizeof(short);
>   break;
> + default:
> + defbytes = 1;
>   case 'I':
>   len = sizeof(int);
>   break;
>   case 'L':
>   len = sizeof(long);
>   break;
> - default:
> - len = sizeof(int);
>   }
>   }
> - addtype(*s++, len);
> + addtype(defbytes ? *s : *s++, len);

I think the increment should be separate for clarity; defbytes isn't
really controlling the parameter, just whether or not s is advanced.

addtype(*s, len);
if (!defbytes)
s++;

>   break;
>   default:
>   usage();
>
>

[hackers] [PATCH sbase] install: Remove special handling for non-regular files

2020-03-01 Thread Michael Forney

All install(1) implementations I'm aware of don't try to replicate
the source file node like this. Additionally, this reportedly breaks
some scripts that use install(1) in a pipeline.
---
 xinstall.c | 72 +-
 1 file changed, 12 insertions(+), 60 deletions(-)

diff --git a/xinstall.c b/xinstall.c
index fe55396..e102c4c 100644
--- a/xinstall.c
+++ b/xinstall.c
@@ -43,72 +43,24 @@ make_dirs(char *dir, int was_missing)
 static int
 install(const char *s1, const char *s2, int depth)
 {
-   DIR *dp;
int f1, f2;
-   struct dirent *d;
-   struct stat st;
-   ssize_t r;
-   char target[PATH_MAX], ns1[PATH_MAX], ns2[PATH_MAX];
-
-   if (stat(s1, ) < 0)
-   eprintf("stat %s:", s1);
-
-   if (S_ISLNK(st.st_mode)) {
-   if ((r = readlink(s1, target, sizeof(target) - 1)) >= 0) {
-   target[r] = '\0';
-   if (unlink(s2) < 0 && errno != ENOENT)
-   eprintf("unlink %s:", s2);
-   else if (symlink(target, s2) < 0)
-   eprintf("symlink %s -> %s:", s2, target);
-   }
-   } else if (S_ISDIR(st.st_mode)) {
-   if (!(dp = opendir(s1)))
-   eprintf("opendir %s:", s1);
-   if (mkdir(s2, mode | 0111) < 0 && errno != EEXIST)
-   eprintf("mkdir %s:", s2);
-
-   while ((d = readdir(dp))) {
-   if (!strcmp(d->d_name, ".") || !strcmp(d->d_name, ".."))
-   continue;
-
-   estrlcpy(ns1, s1, sizeof(ns1));
-   if (s1[strlen(s1) - 1] != '/')
-   estrlcat(ns1, "/", sizeof(ns1));
-   estrlcat(ns1, d->d_name, sizeof(ns1));
-
-   estrlcpy(ns2, s2, sizeof(ns2));
-   if (s2[strlen(s2) - 1] != '/')
-   estrlcat(ns2, "/", sizeof(ns2));
-   estrlcat(ns2, d->d_name, sizeof(ns2));
-
-   fnck(ns1, ns2, install, depth + 1);
-   }
 
-   closedir(dp);
-   } else if (S_ISBLK(st.st_mode) || S_ISCHR(st.st_mode) ||
-  S_ISSOCK(st.st_mode) || S_ISFIFO(st.st_mode)) {
+   if ((f1 = open(s1, O_RDONLY)) < 0)
+   eprintf("open %s:", s1);
+   if ((f2 = creat(s2, 0600)) < 0) {
if (unlink(s2) < 0 && errno != ENOENT)
eprintf("unlink %s:", s2);
-   else if (mknod(s2, (st.st_mode & ~0) | mode, st.st_rdev) < 
0)
-   eprintf("mknod %s:", s2);
-   } else {
-   if ((f1 = open(s1, O_RDONLY)) < 0)
-   eprintf("open %s:", s1);
-   if ((f2 = creat(s2, 0600)) < 0) {
-   if (unlink(s2) < 0 && errno != ENOENT)
-   eprintf("unlink %s:", s2);
-   if ((f2 = creat(s2, 0600)) < 0)
-   eprintf("creat %s:", s2);
-   }
-   if (concat(f1, s1, f2, s2) < 0)
-   exit(1);
+   if ((f2 = creat(s2, 0600)) < 0)
+   eprintf("creat %s:", s2);
+   }
+   if (concat(f1, s1, f2, s2) < 0)
+   exit(1);
 
-   if (fchmod(f2, mode) < 0)
-   eprintf("fchmod %s:", s2);
+   if (fchmod(f2, mode) < 0)
+   eprintf("fchmod %s:", s2);
 
-   close(f1);
-   close(f2);
-   }
+   close(f1);
+   close(f2);
 
if (lchown(s2, owner, group) < 0)
eprintf("lchown %s:", s2);
-- 
2.25.1

[hackers] [PATCH ubase] Remove mknod; it was imported into sbase

2020-03-01 Thread Michael Forney

---
 Makefile |  2 --
 mknod.1  | 37 -
 mknod.c  | 45 -
 3 files changed, 84 deletions(-)
 delete mode 100644 mknod.1
 delete mode 100644 mknod.c

diff --git a/Makefile b/Makefile
index b526421..b072ef0 100644
--- a/Makefile
+++ b/Makefile
@@ -61,7 +61,6 @@ BIN = \
lsmod \
lsusb \
mesg  \
-   mknod \
mkswap\
mount \
mountpoint\
@@ -102,7 +101,6 @@ MAN1 = \
id.1\
login.1 \
mesg.1  \
-   mknod.1 \
mountpoint.1\
pagesize.1  \
passwd.1\
diff --git a/mknod.1 b/mknod.1
deleted file mode 100644
index 5437fbb..000
--- a/mknod.1
+++ /dev/null
@@ -1,37 +0,0 @@
-.Dd February 2, 2015
-.Dt MKNOD 1
-.Os ubase
-.Sh NAME
-.Nm mknod
-.Nd create a special device file
-.Sh SYNOPSIS
-.Nm
-.Op Fl m Ar mode
-.Ar name
-.Ar type
-.Ar major
-.Ar minor
-.Sh DESCRIPTION
-.Nm
-creates a special device file named
-.Ar name
-with major number
-.Ar major ,
-and minor number
-.Ar minor .
-.Ar type
-specifies what kind of special file will be created and must be one of:
-.Bl -tag -width Ds
-.It Ar u | c
-A character device.
-.It Ar b
-A block device.
-.El
-.Sh OPTIONS
-.Bl -tag -width Ds
-.It Fl m
-Set the mode of the new file based on the octal value of
-.Ar mode .
-.El
-.Sh SEE ALSO
-.Xr mknod 2
diff --git a/mknod.c b/mknod.c
deleted file mode 100644
index 8de35c7..000
--- a/mknod.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* See LICENSE file for copyright and license details. */
-#include 
-#include 
-
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#include "util.h"
-
-static void
-usage(void)
-{
-   eprintf("usage: %s [-m mode] name type major minor\n", argv0);
-}
-
-int
-main(int argc, char *argv[])
-{
-   mode_t type, mode = 0644;
-   dev_t dev;
-
-   ARGBEGIN {
-   case 'm':
-   mode = estrtol(EARGF(usage()), 8);
-   break;
-   default:
-   usage();
-   } ARGEND;
-
-   if (argc != 4)
-   usage();
-
-   if (strlen(argv[1]) != 1 || !strchr("ucb", argv[1][0]))
-   eprintf("mknod: '%s': invalid type\n", argv[1]);
-   type = (argv[1][0] == 'b') ? S_IFBLK : S_IFCHR;
-
-   dev = makedev(estrtol(argv[2], 0), estrtol(argv[3], 0));
-
-   if (mknod(argv[0], type|mode, dev) == -1)
-   eprintf("mknod: '%s':", argv[0]);
-   return 0;
-}
-- 
2.25.1

Re: [hackers] [sbase] [PATCH] *sum: Ignore -b and -t flags

2020-01-02 Thread Michael Forney

On 2020-01-02, Laslo Hunhold  wrote:
> I would print something on stderr. POSIX is ignored often enough and a
> ton of scripts are using the cancerous GNU extensions and other
> extensions. If we just "ignore" them, there is no learning effect or
> push for change for script writers, so maybe we could add a warning
> while we ignore them, so when you run a script that makes use of these
> mostly useless flags (which we could also tell them), then this might a
> push in a good direction. What do you think?

I'm well aware that POSIX is ignored often. I send patches to projects
every time I run into a script using non-POSIX options that break
compatibility with sbase. But in this case, POSIX is not really
relevant since as I mentioned, these are not POSIX tools, and there
aren't many different implementations (BSDs have their own md5(1),
sha256(1), ...), so coreutils is essentially the standard.

Who are we to decide that -b and -t are not be valid flags for sha*sum
and should not be used? I ran into a script that used `sha256sum -b`,
but how could I justify removing that option to upstream?

The flags may be useless on operating systems that sbase supports, but
the "rb" is a valid mode for C99 fopen ("r" opens as a text file), so
there must be operating systems where it makes a difference. Removing
-b may break the script on those operating systems.

Re: [hackers] [sbase] [PATCH] *sum: Ignore -b and -t flags

2020-01-02 Thread Michael Forney

On 2020-01-02, Quentin Rameau  wrote:
> I agree in that silently ignoring commands from the user is bad, as it
> breaks expectations.

Well, in this case ignoring them is a valid implementation, since on
POSIX there is no distinction between opening a file as text or binary
mode. So it will return the correct result with those flags.

> Though as noted in this case, those are not standardized (maybe that
> wasn't a great idea to add them in sbase instead of ubase even if they
> can be implemented in a portable manner), so anything can happen there.

There is http://austingroupbugs.net/view.php?id=1041 which encourages
implementations to provide *some* method of checksum with high
security, but unfortunately still has no specification of what that
looks like.

> I'm not sure we should start adding those kind of half-compability
> parsing with coreutils, where do we stop?

I disagree that it is "half-compatibility", but this is a good point.
It's kind of a similar issue with tar(1), which was removed from
POSIX, but still extremely widely used. I think the best we can do is
implement the options that are commonly expected of the tools. In this
case implementing -b and -t is as simple as adding a case to a switch,
so I don't see much downside to doing so.

> Also for what it's worth:
>
> $ busybox md5sum -b md5sum.c
> md5sum: unrecognized option: b
> BusyBox v1.31.1 (2019-11-29 10:55:12 UTC) multi-call binary.
>
> Usage: md5sum [FILE]...
>
> $ busybox md5sum -t md5sum.c
> md5sum: unrecognized option: t
> BusyBox v1.31.1 (2019-11-29 10:55:12 UTC) multi-call binary.
>
> Usage: md5sum [FILE]...

Looks like with busybox the -b and -t options depend on
ENABLE_FEATURE_MD5_SHA1_SUM_CHECK, which also enables the -c flag.

$ busybox md5sum -b /dev/null
d41d8cd98f00b204e9800998ecf8427e  /dev/null

[hackers] [sbase] [PATCH] *sum: Ignore -b and -t flags

2020-01-01 Thread Michael Forney

These tools are not standardized, but these flags are supported in all
implementations I'm aware of (coreutils, busybox), and are occasionally
used in scripts.

These flags are only meaningful on operating systems which differentiate
between text and binary files, so just ignore them.
---
 md5sum.c| 4 
 sha1sum.c   | 4 
 sha224sum.c | 4 
 sha256sum.c | 4 
 sha384sum.c | 4 
 sha512-224sum.c | 4 
 sha512-256sum.c | 4 
 sha512sum.c | 4 
 8 files changed, 32 insertions(+)

diff --git a/md5sum.c b/md5sum.c
index 86fb40f..224b20e 100644
--- a/md5sum.c
+++ b/md5sum.c
@@ -28,6 +28,10 @@ main(int argc, char *argv[])
uint8_t md[MD5_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha1sum.c b/sha1sum.c
index 4f3ae77..cc8dcae 100644
--- a/sha1sum.c
+++ b/sha1sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA1_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha224sum.c b/sha224sum.c
index 5c4a6cb..e9a10cf 100644
--- a/sha224sum.c
+++ b/sha224sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA224_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha256sum.c b/sha256sum.c
index d863539..686c70f 100644
--- a/sha256sum.c
+++ b/sha256sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA256_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha384sum.c b/sha384sum.c
index f975b61..c76947e 100644
--- a/sha384sum.c
+++ b/sha384sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA384_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha512-224sum.c b/sha512-224sum.c
index 6e4a9d6..53f2e62 100644
--- a/sha512-224sum.c
+++ b/sha512-224sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA512_224_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha512-256sum.c b/sha512-256sum.c
index c2d582e..ea556b8 100644
--- a/sha512-256sum.c
+++ b/sha512-256sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA512_256_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
diff --git a/sha512sum.c b/sha512sum.c
index 65a0c2c..a76e685 100644
--- a/sha512sum.c
+++ b/sha512sum.c
@@ -27,6 +27,10 @@ main(int argc, char *argv[])
uint8_t md[SHA512_DIGEST_LENGTH];
 
ARGBEGIN {
+   case 'b':
+   case 't':
+   /* ignore */
+   break;
case 'c':
cryptfunc = cryptcheck;
break;
-- 
2.24.1

Re: [hackers] [PATCH][sbase] sort: Don't do top-level sort when -c is used with -k

2020-01-01 Thread Michael Forney

On 2020-01-01, Richard Ipsum  wrote:
> ---
>  sort.c | 13 +
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/sort.c b/sort.c
> index fc76738..c586394 100644
> --- a/sort.c
> +++ b/sort.c
> @@ -383,10 +383,15 @@ main(int argc, char *argv[])
>   usage();
>   } ARGEND
>
> - /* -b shall only apply to custom key definitions */
> - if (TAILQ_EMPTY() && global_flags)
> - addkeydef("1", global_flags & ~(MOD_STARTB | MOD_ENDB));
> - addkeydef("1", global_flags & MOD_R);
> + if (TAILQ_EMPTY()) {
> + if (global_flags) {
> + /* -b shall only apply to custom key definitions */
> + addkeydef("1", global_flags & ~(MOD_STARTB | MOD_ENDB));
> + }
> + addkeydef("1", global_flags & MOD_R);
> + } else if (!Cflag && !cflag) {
> + addkeydef("1", global_flags & MOD_R);
> + }
>
>   if (!argc) {
>   if (Cflag || cflag) {

I've been reading up on the various sort(1) options, and trying to
understand the original intent of this code.

My current understanding is that when no -k flag is present, it adds a
keydef for all fields with the global flags, and regardless, it adds a
keydef for all fields with no flags (except -r) so that there is some
tie-breaker for lines that are equal according to the specified flags.
This tie-breaker is fine for printing the lines, but should not be
used when checking if the file is already sorted.

Consider `sort -f -c` with the input

a
A

This should return 0 since -f considers all lowercase characters as
uppercase, but currently it returns 1, so it is not only a problem
with -k.

I think the following diff should cover those cases as well:

diff --git a/sort.c b/sort.c
index a51997f..fbb1abf 100644
--- a/sort.c
+++ b/sort.c
@@ -385,7 +385,8 @@ main(int argc, char *argv[])
/* -b shall only apply to custom key definitions */
if (TAILQ_EMPTY() && global_flags)
addkeydef("1", global_flags & ~(MOD_STARTB | MOD_ENDB));
-   addkeydef("1", global_flags & MOD_R);
+   if (TAILQ_EMPTY() || (!Cflag && !cflag))
+   addkeydef("1", global_flags & MOD_R);

if (!argc) {
if (Cflag || cflag) {

I'm still not convinced of the value of this tie-breaker keydef, so
another option might be to just only add it if no -k flag is
specified:

diff --git a/sort.c b/sort.c
index a51997f..adf1d6d 100644
--- a/sort.c
+++ b/sort.c
@@ -383,9 +383,8 @@ main(int argc, char *argv[])
} ARGEND

/* -b shall only apply to custom key definitions */
-   if (TAILQ_EMPTY() && global_flags)
+   if (TAILQ_EMPTY())
addkeydef("1", global_flags & ~(MOD_STARTB | MOD_ENDB));
-   addkeydef("1", global_flags & MOD_R);

if (!argc) {
if (Cflag || cflag) {

I think I will apply the first diff for now to fix the bug, and
perhaps propose the second as a patch to the list, since it involves
an unrelated behavior change (order of equal lines according to
options passed to sort).

Re: [hackers] [PATCH 1/3][sbase] sort: Fix handling of -k n,n case

2019-12-31 Thread Michael Forney

On 2019-12-31, Richard Ipsum  wrote:
> When field_start is equal to field_end use field N as the sort key.

I think this is not just a problem with `-k n,n`, but any time
field_end is specified. Only up to the beginning of the end field is
compared (if there are more fields after it).

For example

$ sort -t , -k 1,2 -k 4 < diff --git a/sort.c b/sort.c
> index 9706e8f..941b694 100644
> --- a/sort.c
> +++ b/sort.c
> @@ -70,6 +70,9 @@ skipcolumn(struct line *a, int skip_to_next_col)
>   s += fieldseplen;
>   a->data = s;
>   a->len = a->len - (s - a->data);
> + } else {
> + a->data = s;
> + a->len = 1;
>   }
>   } else {
>   a->data += a->len - 1;

I think we should keep the invariant that a->data + a->len is the end
of the line, even though end.len isn't used after the call to
skipcolumn(, 0).

What do you think about the following diff instead?

diff --git a/sort.c b/sort.c
index dfc383f..a51997f 100644
--- a/sort.c
+++ b/sort.c
@@ -66,11 +66,10 @@ skipcolumn(struct line *a, int skip_to_next_col)

if (fieldsep) {
if ((s = memmem(a->data, a->len, fieldsep, fieldseplen))) {
-   if (skip_to_next_col) {
+   if (skip_to_next_col)
s += fieldseplen;
-   a->len -= s - a->data;
-   a->data = s;
-   }
+   a->len -= s - a->data;
+   a->data = s;
} else {
a->data += a->len - 1;
a->len = 1;

Re: [hackers] [PATCH 2/3][sbase] sort: Don't do top-level sort when -k is used

2019-12-31 Thread Michael Forney

On 2019-12-31, Richard Ipsum  wrote:
> ---
>  sort.c | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/sort.c b/sort.c
> index 941b694..e974011 100644
> --- a/sort.c
> +++ b/sort.c
> @@ -196,7 +196,8 @@ slinecmp(struct line *a, struct line *b)
>   x = strtod(col1.line.data, NULL);
>   y = strtod(col2.line.data, NULL);
>   res = (x < y) ? -1 : (x > y);
> - } else if (kd->flags & (MOD_D | MOD_F | MOD_I)) {
> + } else if ((kd->flags & MOD_D) | (kd->flags & MOD_F) |
> +(kd->flags & MOD_I)) {

This is functionally the same, right? Any reason for this change?

>   res = skipmodcmp(, , kd->flags);
>   } else {
>   res = linecmp(, );
> @@ -386,10 +387,13 @@ main(int argc, char *argv[])
>   usage();
>   } ARGEND
>
> - /* -b shall only apply to custom key definitions */
> - if (TAILQ_EMPTY() && global_flags)
> - addkeydef("1", global_flags & ~(MOD_STARTB | MOD_ENDB));
> - addkeydef("1", global_flags & MOD_R);
> + if (TAILQ_EMPTY()) {
> + if (global_flags) {
> + /* -b shall only apply to custom key definitions */
> + addkeydef("1", global_flags & ~(MOD_STARTB | MOD_ENDB));
> + }
> + addkeydef("1", global_flags & MOD_R);
> + }
>
>   if (!argc) {
>   if (Cflag || cflag) {

We use qsort to sort the lines, and the order of elements that compare
equal when applied to qsort is unspecified, so I think this means that
the output of sbase sort(1) could vary from libc to libc. POSIX says
that `The order in which lines that still compare equal are written is
unspecified`, so I don't think there is actually a conformance issue
with that, but it is something to consider.

I do see the problem with the top-level sort when -[cC] is used, so
maybe another option is to only add it when we are not doing order
checking.

What are your thoughts on this?

Re: [hackers] [PATCH 0/3][sbase] sort fixes

2019-12-31 Thread Michael Forney

On 2019-12-31, Richard Ipsum  wrote:
> Two additional fixes to the first one.

Hi Richard,

As always, thanks for the patches. I applied your third patch, and
have a couple small comments for the first two.

> Thanks and happy new year!
> Richard

To you, too!

-Michael

Re: [hackers] [sbase] libutil/recurse: Use while-loop instead of for-loop with only condition || Michael Forney

2019-12-29 Thread Michael Forney

On 2019-12-29, Quentin Rameau  wrote:
> The patch itself shows that, could you explain what it fixes or improve
> in your commit messages?

It fixes the stylistic issue that a for-loop is used with only a
condition (no initialization or increment expression), which is just
an unconventional way of writing a while-loop.

Not really sure what more I could add here...

Re: [hackers] [PATCH 1/2] Use *at functions with appropriate flags instead of lstat/lchown

2019-12-28 Thread Michael Forney

On 2019-12-23, Laslo Hunhold  wrote:
> What is the performance effect of this patch on big directories? It
> should be noticeable, shouldn't it?

There is a slight performance benefit, but it is pretty negligible due
to system call overhead. I tested du(1) on an extracted linux-5.4.6
source tree and saw an improvement of about 4-5% on my system
(comparing average of 20 runs).

I think the main benefit is the decreased stack usage and the
possibility to refactor other tools like cp to share the recurse code
(since now the base name is passed by itself).

[hackers] [PATCH 2/2] libutil/recurse: Use a single path buffer, and directory fd

2019-12-22 Thread Michael Forney

This way, we don't use PATH_MAX bytes on the stack per path component,
and don't have to keep copying the complete path around.
---
 chgrp.c   | 11 +--
 chmod.c   | 11 ++-
 chown.c   | 10 +-
 du.c  | 13 +++--
 fs.h  |  9 ++---
 libutil/recurse.c | 45 ++---
 libutil/rm.c  | 13 +++--
 mv.c  |  3 ++-
 rm.c  |  4 +++-
 tar.c | 10 +-
 10 files changed, 72 insertions(+), 57 deletions(-)

diff --git a/chgrp.c b/chgrp.c
index 04b2dd0..d336c88 100644
--- a/chgrp.c
+++ b/chgrp.c
@@ -14,18 +14,17 @@ static gid_t gid = -1;
 static int   ret = 0;
 
 static void
-chgrp(const char *path, struct stat *st, void *data, struct recursor *r)
+chgrp(int dirfd, const char *name, struct stat *st, void *data, struct 
recursor *r)
 {
int flags = 0;
 
if ((r->maxdepth == 0 && r->follow == 'P') || (r->follow == 'H' && 
r->depth) || (hflag && !(r->depth)))
flags |= AT_SYMLINK_NOFOLLOW;
-
-   if (fchownat(AT_FDCWD, path, -1, gid, flags) < 0) {
-   weprintf("chown %s:", path);
+   if (fchownat(dirfd, name, -1, gid, flags) < 0) {
+   weprintf("chown %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(path, NULL, r);
+   recurse(dirfd, name, NULL, r);
}
 }
 
@@ -71,7 +70,7 @@ main(int argc, char *argv[])
}
 
for (argc--, argv++; *argv; argc--, argv++)
-   recurse(*argv, NULL, );
+   recurse(AT_FDCWD, *argv, NULL, );
 
return ret || recurse_status;
 }
diff --git a/chmod.c b/chmod.c
index 2a0085d..3d63791 100644
--- a/chmod.c
+++ b/chmod.c
@@ -1,4 +1,5 @@
 /* See LICENSE file for copyright and license details. */
+#include 
 #include 
 
 #include "fs.h"
@@ -9,16 +10,16 @@ static mode_t mask= 0;
 static intret = 0;
 
 static void
-chmodr(const char *path, struct stat *st, void *data, struct recursor *r)
+chmodr(int dirfd, const char *name, struct stat *st, void *data, struct 
recursor *r)
 {
mode_t m;
 
m = parsemode(modestr, st->st_mode & ~S_IFMT, mask);
-   if (chmod(path, m) < 0) {
-   weprintf("chmod %s:", path);
+   if (fchmodat(dirfd, name, m, 0) < 0) {
+   weprintf("chmod %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(path, NULL, r);
+   recurse(dirfd, name, NULL, r);
}
 }
 
@@ -76,7 +77,7 @@ done:
usage();
 
for (--argc, ++argv; *argv; argc--, argv++)
-   recurse(*argv, NULL, );
+   recurse(AT_FDCWD, *argv, NULL, );
 
return ret || recurse_status;
 }
diff --git a/chown.c b/chown.c
index dcd4914..3aaedef 100644
--- a/chown.c
+++ b/chown.c
@@ -17,18 +17,18 @@ static gid_t gid = -1;
 static int   ret = 0;
 
 static void
-chownpwgr(const char *path, struct stat *st, void *data, struct recursor *r)
+chownpwgr(int dirfd, const char *name, struct stat *st, void *data, struct 
recursor *r)
 {
int flags = 0;
 
if ((r->maxdepth == 0 && r->follow == 'P') || (r->follow == 'H' && 
r->depth) || (hflag && !(r->depth)))
flags |= AT_SYMLINK_NOFOLLOW;
 
-   if (fchownat(AT_FDCWD, path, uid, gid, flags) < 0) {
-   weprintf("chown %s:", path);
+   if (fchownat(dirfd, name, uid, gid, flags) < 0) {
+   weprintf("chown %s:", r->path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
-   recurse(path, NULL, r);
+   recurse(dirfd, name, NULL, r);
}
 }
 
@@ -99,7 +99,7 @@ main(int argc, char *argv[])
usage();
 
for (argc--, argv++; *argv; argc--, argv++)
-   recurse(*argv, NULL, );
+   recurse(AT_FDCWD, *argv, NULL, );
 
return ret || recurse_status;
 }
diff --git a/du.c b/du.c
index 4ceac66..fa6a8d6 100644
--- a/du.c
+++ b/du.c
@@ -3,6 +3,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -35,19 +36,19 @@ nblks(blkcnt_t blocks)
 }
 
 static void
-du(const char *path, struct stat *st, void *data, struct recursor *r)
+du(int dirfd, const char *path, struct stat *st, void *data, struct recursor 
*r)
 {
off_t *total = data, subtotal;
 
subtotal = nblks(st->st_blocks);
if (S_ISDIR(st->st_mode))
-   recurse(path, , r);
+   recurse(dirfd, path, , r);
*total += subtotal;
 
if (!r->depth)
-   printpath(*total, path);
+   printpath(*total, r->path);
else if (!sflag && r->depth <= maxdepth && (S_ISDIR(st->st_mode) || 
aflag))
-   printpath(subtotal, path);
+   printpath(subtotal, r->path);
 }
 
 static void
@@ -104,11 +105,11 @@ main(int argc, char *argv[])
blksize = 1024;
 
if

[hackers] [PATCH 1/2] Use *at functions with appropriate flags instead of lstat/lchown

2019-12-22 Thread Michael Forney

---
 chgrp.c   | 17 ++---
 chown.c   | 17 ++---
 libutil/cp.c  | 18 ++
 libutil/recurse.c | 28 +++-
 4 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/chgrp.c b/chgrp.c
index 960ed2c..04b2dd0 100644
--- a/chgrp.c
+++ b/chgrp.c
@@ -2,6 +2,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 
@@ -15,19 +16,13 @@ static int   ret = 0;
 static void
 chgrp(const char *path, struct stat *st, void *data, struct recursor *r)
 {
-   char *chownf_name;
-   int (*chownf)(const char *, uid_t, gid_t);
+   int flags = 0;
 
-   if ((r->maxdepth == 0 && r->follow == 'P') || (r->follow == 'H' && 
r->depth) || (hflag && !(r->depth))) {
-   chownf_name = "lchown";
-   chownf = lchown;
-   } else {
-   chownf_name = "chown";
-   chownf = chown;
-   }
+   if ((r->maxdepth == 0 && r->follow == 'P') || (r->follow == 'H' && 
r->depth) || (hflag && !(r->depth)))
+   flags |= AT_SYMLINK_NOFOLLOW;
 
-   if (chownf(path, -1, gid) < 0) {
-   weprintf("%s %s:", chownf_name, path);
+   if (fchownat(AT_FDCWD, path, -1, gid, flags) < 0) {
+   weprintf("chown %s:", path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
recurse(path, NULL, r);
diff --git a/chown.c b/chown.c
index c517acf..dcd4914 100644
--- a/chown.c
+++ b/chown.c
@@ -1,5 +1,6 @@
 /* See LICENSE file for copyright and license details. */
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -18,19 +19,13 @@ static int   ret = 0;
 static void
 chownpwgr(const char *path, struct stat *st, void *data, struct recursor *r)
 {
-   char *chownf_name;
-   int (*chownf)(const char *, uid_t, gid_t);
+   int flags = 0;
 
-   if ((r->maxdepth == 0 && r->follow == 'P') || (r->follow == 'H' && 
r->depth) || (hflag && !(r->depth))) {
-   chownf_name = "lchown";
-   chownf = lchown;
-   } else {
-   chownf_name = "chown";
-   chownf = chown;
-   }
+   if ((r->maxdepth == 0 && r->follow == 'P') || (r->follow == 'H' && 
r->depth) || (hflag && !(r->depth)))
+   flags |= AT_SYMLINK_NOFOLLOW;
 
-   if (chownf(path, uid, gid) < 0) {
-   weprintf("%s %s:", chownf_name, path);
+   if (fchownat(AT_FDCWD, path, uid, gid, flags) < 0) {
+   weprintf("chown %s:", path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
recurse(path, NULL, r);
diff --git a/libutil/cp.c b/libutil/cp.c
index b6f8b23..61c73f4 100644
--- a/libutil/cp.c
+++ b/libutil/cp.c
@@ -26,24 +26,18 @@ int
 cp(const char *s1, const char *s2, int depth)
 {
DIR *dp;
-   int f1, f2;
+   int f1, f2, flags = 0;
struct dirent *d;
struct stat st;
struct timespec times[2];
ssize_t r;
-   int (*statf)(const char *, struct stat *);
-   char target[PATH_MAX], ns1[PATH_MAX], ns2[PATH_MAX], *statf_name;
+   char target[PATH_MAX], ns1[PATH_MAX], ns2[PATH_MAX];
 
-   if (cp_follow == 'P' || (cp_follow == 'H' && depth)) {
-   statf_name = "lstat";
-   statf = lstat;
-   } else {
-   statf_name = "stat";
-   statf = stat;
-   }
+   if (cp_follow == 'P' || (cp_follow == 'H' && depth))
+   flags |= AT_SYMLINK_NOFOLLOW;
 
-   if (statf(s1, ) < 0) {
-   weprintf("%s %s:", statf_name, s1);
+   if (fstatat(AT_FDCWD, s1, , flags) < 0) {
+   weprintf("stat %s:", s1);
cp_status = 1;
return 0;
}
diff --git a/libutil/recurse.c b/libutil/recurse.c
index d2dc3ed..f50347e 100644
--- a/libutil/recurse.c
+++ b/libutil/recurse.c
@@ -1,6 +1,7 @@
 /* See LICENSE file for copyright and license details. */
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -21,20 +22,15 @@ recurse(const char *path, void *data, struct recursor *r)
struct history *new, *h;
struct stat st, dst;
DIR *dp;
-   int (*statf)(const char *, struct stat *);
-   char subpath[PATH_MAX], *statf_name;
+   char subpath[PATH_MAX];
+   int flags = 0;
 
-   if (r->follow == 'P' || (r->follow == 'H' && r->depth)) {
-   statf_name = "lstat";
-   statf = lstat;
-   } else {
-   statf_name = "stat";
-   statf = stat;
-   }
+   if (r->follow == 'P' || (r->follow == 'H' && r->depth))
+   flags |= AT_SYMLINK_NOFOLLOW;
 
-   if (statf(path, ) < 0) {
+   if (fstatat(AT_FDCWD, path, , flags) < 0) {
if (!(r->flags & SILENT)) {
-   weprintf("%s %s:", statf_name, path);
+   weprintf("stat %s:", path);
recurse_status = 1;
}
return;
@@ -66,19 +62,17 @@

[hackers] [PATCH] cp: Default to -P when -R is specified

2019-12-22 Thread Michael Forney

POSIX only specifies the -H, -L, and -P options for use with -R, and
the default is left to the implementation. Without -R, symlinks must
be followed.

Most implementations use -P as the default with -R, which makes sense
to me as default behavior (the source and destination trees are the same).

Since we use the same code for -R and without it, and we allow -H, -L,
and -P without -R, set the default based on the presence of -R. Without
it, use -L as before, as required by POSIX, and with it, use -P to match
the behavior of other implementations.
---
 cp.1 | 5 -
 cp.c | 3 +++
 libutil/cp.c | 2 +-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/cp.1 b/cp.1
index f74127d..a37caa3 100644
--- a/cp.1
+++ b/cp.1
@@ -48,9 +48,12 @@ Dereference
 if it is a symbolic link.
 .It Fl L
 Dereference all symbolic links.
-This is the default.
+This is the default without
+.Fl R .
 .It Fl P
 Preserve symbolic links.
+This is the default with
+.Fl R .
 .It Fl R
 Traverse directories recursively. If this flag is not specified, directories
 are not copied.
diff --git a/cp.c b/cp.c
index d87e77e..6abe02c 100644
--- a/cp.c
+++ b/cp.c
@@ -45,6 +45,9 @@ main(int argc, char *argv[])
if (argc < 2)
usage();
 
+   if (!cp_follow)
+   cp_follow = cp_rflag ? 'P' : 'L';
+
if (argc > 2) {
if (stat(argv[argc - 1], ) < 0)
eprintf("stat %s:", argv[argc - 1]);
diff --git a/libutil/cp.c b/libutil/cp.c
index b6f8b23..bb85c14 100644
--- a/libutil/cp.c
+++ b/libutil/cp.c
@@ -20,7 +20,7 @@ int cp_pflag  = 0;
 int cp_rflag  = 0;
 int cp_vflag  = 0;
 int cp_status = 0;
-int cp_follow = 'L';
+int cp_follow;
 
 int
 cp(const char *s1, const char *s2, int depth)
-- 
2.24.1

[hackers] [sbase] [PATCH] chmod: Remove -HLP flags, and ignore symlinks during traversal

2019-12-22 Thread Michael Forney

---
Proposed patch for option 1 described in
https://lists.suckless.org/dev/1912/33729.html

 chmod.1 | 16 
 chmod.c | 11 +++
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/chmod.1 b/chmod.1
index 4805ce0..b102b64 100644
--- a/chmod.1
+++ b/chmod.1
@@ -6,10 +6,7 @@
 .Nd change file modes
 .Sh SYNOPSIS
 .Nm
-.Oo
-.Fl R
-.Op Fl H | L | P
-.Oc
+.Op Fl R
 .Ar mode
 .Ar file ...
 .Sh DESCRIPTION
@@ -56,18 +53,13 @@ add | remove | set
 .It r|w|x|s|t
 read | write | execute | setuid and setgid | sticky
 .El
+.Pp
+Symbolic links are followed if they are passed as operands, and ignored
+if they are encountered during directory traversal.
 .Sh OPTIONS
 .Bl -tag -width Ds
 .It Fl R
 Change modes recursively.
-.It Fl H
-Dereference
-.Ar file
-if it is a symbolic link.
-.It Fl L
-Dereference all symbolic links.
-.It Fl P
-Preserve symbolic links. This is the default.
 .El
 .Sh SEE ALSO
 .Xr chgrp 1 ,
diff --git a/chmod.c b/chmod.c
index 2a0085d..6004a74 100644
--- a/chmod.c
+++ b/chmod.c
@@ -14,7 +14,7 @@ chmodr(const char *path, struct stat *st, void *data, struct 
recursor *r)
mode_t m;
 
m = parsemode(modestr, st->st_mode & ~S_IFMT, mask);
-   if (chmod(path, m) < 0) {
+   if (!S_ISLNK(st->st_mode) && chmod(path, m) < 0) {
weprintf("chmod %s:", path);
ret = 1;
} else if (S_ISDIR(st->st_mode)) {
@@ -25,14 +25,14 @@ chmodr(const char *path, struct stat *st, void *data, 
struct recursor *r)
 static void
 usage(void)
 {
-   eprintf("usage: %s [-R [-H | -L | -P]] mode file ...\n", argv0);
+   eprintf("usage: %s [-R] mode file ...\n", argv0);
 }
 
 int
 main(int argc, char *argv[])
 {
struct recursor r = { .fn = chmodr, .hist = NULL, .depth = 0, .maxdepth 
= 1,
- .follow = 'P', .flags = DIRFIRST };
+ .follow = 'H', .flags = DIRFIRST };
size_t i;
 
argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
@@ -45,11 +45,6 @@ main(int argc, char *argv[])
case 'R':
r.maxdepth = 0;
break;
-   case 'H':
-   case 'L':
-   case 'P':
-   r.follow = (*argv)[i];
-   break;
case 'r': case 'w': case 'x': case 's': case 't':
/* -[rwxst] are valid modes, so we're done */
if (i == 1)
-- 
2.24.1

Re: [hackers] [ubase][PATCH] df: Fix value in Avail column

2019-11-17 Thread Michael Forney

This patch looks good to me.

On 2019-11-12, Mattias Andrée  wrote:
> and it is maid explicitly clear that  +  shall not
> be :

s/maid/made/

Re: [hackers] [ubase][PATCH] Add vmstat

2019-11-05 Thread Michael Forney

On 2019-11-05, Mattias Andrée  wrote:
> On Sat, 2 Nov 2019 12:33:58 -0700
> Michael Forney  wrote:
>
>> I've never used vmstat before, but this looks pretty good overall and
>> seems to work well.
>>
>> On 2019-10-05, Mattias Andrée  wrote:
>> > +  goto beginning;
>> > +  for (; argc && (argc < 2 || i < count); i++) {
>>
>> Why not just set count = 1 when argc < 2?
>
> Because that would stop the loop after 1 iteration.
> If argc == 1, the loop should be infinite.

Oh, right.

> An alternative that would work is:
>
>   for (;;) {
>   load_vm([i & 1]);
>   print_vm([i & 1], [~i & 1], active_mem, timestamp, 
> one_header ? !i :
> (i % 50 == 0));
>   i++;
>   if (!argc || (argc == 2 && i == count))
>   break;
>   clock_nanosleep(CLOCK_MONOTONIC, 0, , NULL);
>   }

FWIW, I like this approach.

Re: [hackers] [ubase][PATCH] Add vmstat

2019-11-03 Thread Michael Forney

I've never used vmstat before, but this looks pretty good overall and
seems to work well.

On 2019-10-05, Mattias Andrée  wrote:
> + goto beginning;
> + for (; argc && (argc < 2 || i < count); i++) {

Why not just set count = 1 when argc < 2?

> + clock_nanosleep(CLOCK_MONOTONIC, 0, , NULL);
> + beginning:
> + load_vm([i & 1]);
> + print_vm([i & 1], [~i & 1], active_mem, timestamp, 
> one_header ? !i
> : (i % 50 == 0));
> + }

I think it might be a bit clearer to re-arrange the loop body to avoid
the labels and goto.

Something like

count = argc == 2 ? atoll(argv[1]) : 1;
for (;;) {
load_vm([i & 1]);
print_vm([i & 1], [~i & 1], active_mem, timestamp, 
one_header
? !i : (i % 50 == 0));
if (++i == count)
break;
clock_nanosleep(CLOCK_MONOTONIC, 0, , NULL);
}

[hackers] Re: [ubase] [PATCH] Include sys/sysmacros.h when major is not defined in sys/types.h

2019-11-02 Thread Michael Forney

On 2019-08-10, Michael Forney  wrote:
> This fixes the build with glibc >= 2.28 and musl >= 1.1.23.

Ping. Anyone still care about ubase? There are a few patches pending
on the list.

If nobody else is interested, I can start maintaining ubase.

[hackers] [ubase] [PATCH] Include sys/sysmacros.h when major is not defined in sys/types.h

2019-08-10 Thread Michael Forney

This fixes the build with glibc >= 2.28 and musl >= 1.1.23.
---
 libutil/tty.c | 5 -
 mknod.c   | 3 +++
 mountpoint.c  | 3 +++
 stat.c| 3 +++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/libutil/tty.c b/libutil/tty.c
index bceb01e..f56388e 100644
--- a/libutil/tty.c
+++ b/libutil/tty.c
@@ -1,6 +1,9 @@
 /* See LICENSE file for copyright and license details. */
-#include 
 #include 
+#include 
+#ifndef major
+#include 
+#endif
 
 #include 
 #include 
diff --git a/mknod.c b/mknod.c
index 8de35c7..9dbede7 100644
--- a/mknod.c
+++ b/mknod.c
@@ -1,6 +1,9 @@
 /* See LICENSE file for copyright and license details. */
 #include 
 #include 
+#ifndef major
+#include 
+#endif
 
 #include 
 #include 
diff --git a/mountpoint.c b/mountpoint.c
index 8f205a2..726cc80 100644
--- a/mountpoint.c
+++ b/mountpoint.c
@@ -1,6 +1,9 @@
 /* See LICENSE file for copyright and license details. */
 #include 
 #include 
+#ifndef major
+#include 
+#endif
 
 #include 
 #include 
diff --git a/stat.c b/stat.c
index 220a659..3a6569b 100644
--- a/stat.c
+++ b/stat.c
@@ -1,6 +1,9 @@
 /* See LICENSE file for copyright and license details. */
 #include 
 #include 
+#ifndef major
+#include 
+#endif
 
 #include 
 #include 
-- 
2.20.1

[hackers] [sdhcp] [PATCH] Request mask, router and DNS server options

2019-08-10 Thread Michael Forney

I've run into a DHCP server that doesn't send these unless they are
requested.
---
 sdhcp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sdhcp.c b/sdhcp.c
index 1fcf5e6..7e3f7f5 100644
--- a/sdhcp.c
+++ b/sdhcp.c
@@ -267,6 +267,7 @@ hnoptput(unsigned char *p, int opt, uint32_t data, size_t 
len)
 static void
 dhcpsend(int type, int how)
 {
+   static unsigned char params[] = { OBmask, OBrouter, OBdnsserver };
unsigned char *ip, *p;
 
memset(, 0, sizeof(bp));
@@ -290,6 +291,7 @@ dhcpsend(int type, int how)
/* memcpy(bp.ciaddr, client, sizeof bp.ciaddr); */
p = optput(p, ODipaddr, client, sizeof(client));
p = optput(p, ODserverid, server, sizeof(server));
+   p = optput(p, ODparams, params, sizeof(params));
break;
case DHCPrelease:
memcpy(bp.ciaddr, client, sizeof(client));
-- 
2.20.1

Re: [hackers] [sbase][PATCH] which: check AT_EACCESS

2019-07-30 Thread Michael Forney

On 2019-07-29, Mattias Andrée  wrote:
> setuid is inherited (exec(3) never changes the effective user according
> to POSIX unless the new process have the setuid flag and it is not blocked
> by the ST_NOSUID mount option). However, I cannot think of a real world
> scenario where this would matter; it would be if the user have a program
> similar to sudo that only changes the effective user.

Ah, okay, thanks for the explanation. Well, I suppose it doesn't hurt,
and also matches `test -x`, which uses AT_EACCESS as well.

Applied.

Re: [hackers] [sbase][PATCH] which: remove unnecessary third parameter in O_RDONLY call to open(3)

2019-07-29 Thread Michael Forney

Thanks, applied.

Re: [hackers] [sbase][PATCH] which: check AT_EACCESS

2019-07-29 Thread Michael Forney

On 2019-07-27, Mattias Andrée  wrote:
> A file is executable only if the effective user
> have permission to execute it. The real user's
> permissions do not matter.

Thanks for the patch, but doesn't this only make a difference if the
`which` binary itself is setuid? If not, can you provide an example
that is fixed by this patch?

I looked at a few other implementations and they just use access(3),
which behaves like faccessat(3) with no flags.

Re: [hackers] [PATCH v3 2/2][sbase] chgrp: parse gid if operand is not group name

2019-07-08 Thread Michael Forney

On 2019-07-04, Richard Ipsum  wrote:
> +
> + if (gid == -1)
> + usage();

I think this check is not necessary either, since estrtonum will error
if the specified value was negative.

I pushed with the check removed, and also a small style tweak (hope
you don't mind).

Thanks for the patch!

Re: [hackers] [PATCH v2 2/2][sbase] chgrp: parse gid if operand is not group name

2019-07-08 Thread Michael Forney

On 2019-07-05, Richard Ipsum  wrote:
> Forgot to mention that I just took this from chown pretty much.
> I'm not convinced the checks are needed there either, without the checks
> you'll just get an error instead of a usage message.

I think they are needed in chown since the user or group could be
omitted. Well, actually `owner && *owner` could just be `*owner`, but
I think it's better to leave it for consistency with the group check
below.

Re: [hackers] [PATCH v2 1/2][sbase] chown/chgrp: chown target not symlink by default

2019-07-04 Thread Michael Forney

On 2019-07-03, Richard Ipsum  wrote:
> chown on a symlink should only chown the symlink itself when -h
> is specified, when no options are provided the target should be chown'd.

So this makes it so the -P behavior is only used when -R is in effect,
since all of the -H, -L, and -P options are guarded by -R.

Okay, I think I understand. Applied, thanks!

Re: [hackers] [PATCH v2 2/2][sbase] chgrp: parse gid if operand is not group name

2019-07-03 Thread Michael Forney

On 2019-07-03, Richard Ipsum  wrote:
> ---
>  chgrp.c | 22 +++---
>  1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/chgrp.c b/chgrp.c
> index 7ee3553..ddced31 100644
> --- a/chgrp.c
> +++ b/chgrp.c
> @@ -44,6 +44,7 @@ int
>  main(int argc, char *argv[])
>  {
>   struct group *gr;
> + char *group;
>   struct recursor r = { .fn = chgrp, .hist = NULL, .depth = 0, .maxdepth =
> 1,
> .follow = 'P', .flags = 0 };
>
> @@ -66,14 +67,21 @@ main(int argc, char *argv[])
>   if (argc < 2)
>   usage();
>
> - errno = 0;
> - if (!(gr = getgrnam(argv[0]))) {
> - if (errno)
> - eprintf("getgrnam %s:", argv[0]);
> - else
> - eprintf("getgrnam %s: no such group\n", argv[0]);
> + group = argv[0];
> + if (group && *group) {

I don't think you need to check if group != NULL here, since we know
argc >= 2 at this point.

Also, do we need to check *group? Is it an error to call getgrnam(3)
with an empty string? I assume it would just fail to find a group.

> + errno = 0;
> + gr = getgrnam(group);
> + if (gr) {
> + gid = gr->gr_gid;
> + } else {
> + if (errno)
> + eprintf("getgrnam %s:", group);
> + gid = estrtonum(group, 0, UINT_MAX);
> + }
>   }
> - gid = gr->gr_gid;
> +
> + if (gid == -1)
> + usage();
>
>   for (argc--, argv++; *argv; argc--, argv++)
>   recurse(*argv, NULL, );
> --
> 2.21.0

Re: [hackers] [PATCH 1/3][sbase] chown: Ignore failure to find user/group name

2019-07-03 Thread Michael Forney

On 2019-07-03, Richard Ipsum  wrote:
> A fix for this has been applied to OpenBSD's libc.

Awesome, thanks for handling that!

Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)

2019-07-01 Thread Michael Forney

On 2019-06-30, Laslo Hunhold  wrote:
> What do the others think?

Thanks for investigating this, Laslo. Your findings seem to match my
quick experimentation, and at this point I am leaning towards sticking
with ARGBEGIN/ARGEND, but using a simpler implementation like the one
I shared or the one you are using for farbfeld. It's good to know that
we could wrap getopt(3) if necessary, but I had forgotten about
POSIXLY_CORRECT. I think that alone might be reason enough to avoid
it. I think the argv permutation glibc getopt does by default would be
a serious regression.

Additionally, I've been thinking about the argv = { NULL } case.
Currently argv0 will get set to NULL, but there seems to be a lot of
places where it is passed to printf without checking (even though most
implementations will print "(null)"). I'm thinking we should probably
provide a fallback argv0 in this case. Perhaps we could pass it as a
parameter to ARGBEGIN, like

ARGBEGIN("rm") {
...
} ARGEND

Or maybe we could just introduce an idiom always used as the first
statement in main() like

argv0 = argc ? argv[0] : "rm";

rather than implicitly setting it ARGBEGIN.

Re: [hackers] [sbase] Add .gitignore || Michael Forney

2019-07-01 Thread Michael Forney

On 2019-07-01, Anselm Garbe  wrote:
> Hi there,
>
> On Sat, 29 Jun 2019 at 20:21,  wrote:
>> +++ b/.gitignore
>> @@ -0,0 +1,99 @@
>> +*.o
>> +/getconf.h
>> +/libutf.a
>> +/libutil.a
> [..]
>
> I suggest:
>
> *
> !*.c
> !*.h
> gitconf.h
>
> or something along those lines. Then you don't need to maintain all
> the cruft and can also get rid of this Makefile rule.

Thanks for the suggestion. I'm worried this might match too much.
Ideally only build outputs should be ignored.

Re: [hackers] [sbase] Add .gitignore || Michael Forney

2019-07-01 Thread Michael Forney

On 2019-07-01, David Demelier  wrote:
> I think the Makefile rule is redundant and keeping just the .gitignore
> could be enough. Because to me a project/code/application should not
> know which SCM it is based on. That's also why I personnally exclude
> SCMs bits from distributions files (including ignore files).

The idea with the Makefile rule is that the Makefile is the single
source of truth for the list of build artifacts that need to be
ignored. If someone removes a tool from sbase but forgets to remove
the entry from .gitignore, or adds a tool and messes up the sorting
order in .gitignore, that could easily go unnoticed for a long time.

> The Makefile rule could be a simple alias in your shell :-)

I don't think this is as simple as you make it sound, since it depends
on variables within the Makefile. My best attempt is to pipe a
Makefile fragment to make and tell it to read from multiple sources.

printf '.PHONY: .gitignore\n.gitignore:\n\t{ printf '\''*.o\\n'\'' ;
printf '\''/%%s\\n'\'' getconf.h $(LIB) $(BIN) ; } > $@\n' | make -f
Makefile -f - .gitignore

Either way, I don't think the files checked into the sbase repository
should depend on any personal scripts I have. If the maintainer
changes, I want to make sure it is clear how it was generated.
Additionally, having the rule ensures that a contributor of a new tool
can easily update .gitignore with their patch.

I appreciate your concern, but I don't think the two-line Makefile
rule at the end of the file will negatively affect anyone using or
working on sbase. It's there if you need it, but if not it can just be
ignored.

Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)

2019-06-30 Thread Michael Forney

On 2019-06-30, Evan Gates  wrote:
> On Fri, Jun 28, 2019 at 3:08 AM Michael Forney  wrote:
>>
>> Then, maybe something like the following would work for those cases.
>>
>> ARGBEGIN {
>> default:
>> goto done;
>> } ARGEND
>> done:
>
> Seeing as we have more confusion and bugs to deal with in argument
> handling, perhaps now is a time to revist the conversation surrounding
> getopt(3p)?  There is a POSIX mandated libc function that parses POSIX
> options following the utility syntax guidelines.  I strongly suggest
> using it.  I know there has been pushback in the past but it's a
> portable option that solves our exact problems.  This is why it exists.
> It's already part of libc.  We should stop avoiding it in favor of a
> difficult to read block of macros that have trouble dealing with corner
> cases.  Removing arg.h simplifies sbase code and maintenance and helps
> break the cycle of NIH that we often fall into.

I'm okay with switching to getopt(3), but also note that the current
arg.h is probably more complicated than it needs to be. Here's a
version I rewrote that I've been using in my own projects:

https://git.sr.ht/~mcf/samurai/blob/master/arg.h

I agree that getopt(3) would probably be better at handling the corner
cases. Yesterday I was planning to check that tools behaved correctly
with argv = { NULL }. The first thing I tried was rm(1), which tried
to remove files corresponding to my environment (i.e.
"SHELL=/bin/ksh"). Yikes!

I am also not sure how getopt(3) could be used to handle the tricky
cases I mentioned, like printf(1). It doesn't have any options, but
still needs to support `--` and treat arguments that start with `-` as
operands rather than printing a usage message.

[hackers] [sbase] [PATCH] ls, tar: Include sys/sysmacros.h by default

2019-06-30 Thread Michael Forney

Current glibc and the upcoming musl no longer provide the device macros
through sys/types.h. So, include sys/sysmacros.h if HAVE_SYS_SYSMACROS_H
is defined, and define it by default.
---
Not really sure what to do here. This is my best idea apart from adding
a configure script.

 README| 4 
 config.mk | 3 ++-
 ls.c  | 2 +-
 tar.c | 2 +-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/README b/README
index 8653552..8b6b408 100644
--- a/README
+++ b/README
@@ -30,6 +30,10 @@ including Linux, *BSD, OSX, Haiku, Solaris, SCO OpenServer 
and others.
 Various combinations of operating systems and architectures have also
 been built.
 
+If your system does not have the sys/sysmacros.h header, you will need
+to modify config.mk appropriately. This is necessary since there is no
+common header for the major, minor, and makedev macros.
+
 You can build sbase with gcc, clang, tcc, nwcc and pcc.
 
 Status
diff --git a/config.mk b/config.mk
index 9fb18da..5733fa7 100644
--- a/config.mk
+++ b/config.mk
@@ -10,7 +10,8 @@ AR = ar
 RANLIB = ranlib
 
 # for NetBSD add -D_NETBSD_SOURCE
+# remove -DHAVE_SYS_SYSMACROS_H if your system doesn't have sys/sysmacros.h
 # -lrt might be needed on some systems
-CPPFLAGS = -D_DEFAULT_SOURCE -D_BSD_SOURCE -D_XOPEN_SOURCE=700 
-D_FILE_OFFSET_BITS=64
+CPPFLAGS = -D_DEFAULT_SOURCE -D_BSD_SOURCE -D_XOPEN_SOURCE=700 
-D_FILE_OFFSET_BITS=64 -DHAVE_SYS_SYSMACROS_H
 CFLAGS   = -std=c99 -Wall -pedantic
 LDFLAGS  = -s
diff --git a/ls.c b/ls.c
index 7de15ac..8fda438 100644
--- a/ls.c
+++ b/ls.c
@@ -1,7 +1,7 @@
 /* See LICENSE file for copyright and license details. */
 #include 
 #include 
-#ifdef __GLIBC__
+#ifdef HAVE_SYS_SYSMACROS_H
 #include 
 #endif
 
diff --git a/tar.c b/tar.c
index fb0df45..6d7a380 100644
--- a/tar.c
+++ b/tar.c
@@ -2,7 +2,7 @@
 #include 
 #include 
 #include 
-#ifdef __GLIBC__
+#ifdef HAVE_SYS_SYSMACROS_H
 #include 
 #endif
 
-- 
2.20.1

Re: [hackers] [sbase] Add .gitignore || Michael Forney

2019-06-30 Thread Michael Forney

On 2019-06-30, Quentin Rameau  wrote:
>> Add .gitignore
>>
>> Also, add rule to regenerate in case executable list changes.
>
> yuck

Remarks like this aren't constructive, please keep them to yourself.

I already justified why adding .gitignore and rule to regenerate it is
useful. I'm sorry that you don't approve, but not having a .gitignore
was a significant annoyance to me and was making it difficult to do my
job as maintainer.

When I asked you about your workflow to try to understand your
opposition to .gitignore, you ignored my question and *at the same
time* accused me of ignoring your question, which I had answered
multiple times at that point.

I'm tired of arguing with you, so please, just find a way to deal with it.

Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)

2019-06-28 Thread Michael Forney

On 2019-06-28, Laslo Hunhold  wrote:
>> Rather than add a special ENOFLAGS macro, I am tempted to revert
>> 9016d288 instead. The question that remains is how to handle arguments
>> that look like options (start with `-`) for tools that don't have any
>> specified options. It seems for some tools it is important to treat
>> them as operands (e.g. printf, expr), but for others that are commonly
>> extended with additional options, it might make sense to fail with
>> usage() to catch unsupported uses (e.g. hostname, tsort, chroot).
>
> thanks for elaborating about it more! This topic was a big point of
> debate back then. In this matter Posix should have the final word. So
> in this sense, if Posix mandates for e.g. dirname(1) to support -- then
> we should not ignore -- and the user will have to write
>
>   $ dirname -- --
>
> I think though that we all agree that an ENOFLAGS-macro would be
> insanity.

I don't necessarily think it would be insanity, and actually prefer it
to the one-liner used currently. Something like it might even be
necessary to handle `--` without consuming any option arguments (for
the printf and expr case mentioned above) unless we change the
ARGBEGIN macro to not advance the argv strings. Then, maybe something
like the following would work for those cases.

ARGBEGIN {
default:
goto done;
} ARGEND
done:

Though, this isn't particularly nice either...

Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)

2019-06-28 Thread Michael Forney

On 2019-06-28, Michael Forney  wrote:
> As far as I know, unless the documentation states that a utility shall
> conform to the Utility Syntax Guidelines, it is not required to
> support `--`. However, it does use the language "should" which
> means[2]:
>
>   For an implementation that conforms to POSIX.1-2017, describes a
> feature or behavior that is recommended but not mandatory.

I found a section in POSIX that is a bit more explicit about this, and
it looks like handling `--` is indeed required. See the OPTIONS
section in

https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/utilities/V3_chap01.html#tag_17_04

Default Behavior: When this section is listed as "None.",
it means that the implementation need not support any
options. Standard utilities that do not accept options,
but that do accept operands, shall recognize "--" as a first
argument to be discarded.

The requirement for recognizing "--" is because conforming
applications need a way to shield their operands from any
arbitrary options that the implementation may provide as
an extension.

Re: [hackers] [sbase][PATCH] Support -- in all utilities except echo(1)

2019-06-28 Thread Michael Forney

Hi Mattias,

I'm really sorry for ignoring this for so long. Someone asked me about
the `argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;` one-liner,
and I remembered this patch. I finally took the time to investigate
this issue in more detail.

On 2018-07-08, Mattias Andrée  wrote:
> In POSIX-2017 it was clarified via the documentation for
> basename(1) and dirname(1) that all programs should support
> -- unless specified otherwise.

I was confused by this for a bit, but I think you are referring to
http://austingroupbugs.net/view.php?id=192, which was not present in
the 2008 edition[0], but did make it into the 2013 edition[1].

As far as I know, unless the documentation states that a utility shall
conform to the Utility Syntax Guidelines, it is not required to
support `--`. However, it does use the language "should" which
means[2]:

For an implementation that conforms to POSIX.1-2017, describes a
feature or behavior that is recommended but not mandatory.

It looks like these tools formerly supported `--`, but this was
removed in 9016d288[3], with the justification that echo(1) is
mandated to *not* support `--` and that there is no reason why other
tools with no options should handle it. I would argue that there *is*
a reason to support `--`, and that is the Utility Syntax Guidelines.
echo(1) is an exception that is explicitly called out, but a single
exception shouldn't set a precedent for other tools. In fact, tty(1)
is actually required to support the Utility Syntax Guidelines[4], so
9016d288 actually breaks POSIX conformance. The examples you mentioned
in basename(1) and dirname(1), and also a note about `--` I found in
expr(1) seem to support this view.

Rather than add a special ENOFLAGS macro, I am tempted to revert
9016d288 instead. The question that remains is how to handle arguments
that look like options (start with `-`) for tools that don't have any
specified options. It seems for some tools it is important to treat
them as operands (e.g. printf, expr), but for others that are commonly
extended with additional options, it might make sense to fail with
usage() to catch unsupported uses (e.g. hostname, tsort, chroot).

[0] 
http://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/basename.html
[1] 
http://pubs.opengroup.org/onlinepubs/9699919799.2013edition/utilities/basename.html
[2] 
https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/basedefs/V1_chap01.html#tag_01_05_06
[3] 
https://git.suckless.org/sbase/commit/9016d288f1cd03bf514fe84d537b203040e1ccf8.html
[4] 
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/tty.html#tag_20_135_04

Re: [hackers] [sbase][PATCH] Minor optimizations for 'yes'

2019-06-28 Thread Michael Forney

On 2019-06-27, Laslo Hunhold  wrote:
> On Thu, 27 Jun 2019 09:10:28 +0200
> Quentin Rameau  wrote:
>
> Dear Quentin,
>
>> I agree, yes is neither a benchmark tool nor a data generator, it just
>> outputs 'y' for piping into other commands.
>> Keep it simple.
>
> I agree with you in general, but are we really "optimizing" yes(1) here
> for the sake of performance? This looks to me like a case of premature
> optimization.

I think we are all on the same page here about simplifying yes(1)
rather than optimizing it. See aidanwillie0317's most recent patch.

> We don't know if a script relies on the behaviour of yes(1) printing
> all passed strings repeatedly. So, even though the tool is not
> standardized by Posix the common "consensus" is that
>
>   - with no arguments, it shall print y and a newline repeatedly.
>   - with arguments, it shall print them, comma separated, followed by
> newline, repeatedly.

(I think you mean "space separated" here)

How did you determine this consensus? All the BSD implementations I
looked at (OpenBSD, FreeBSD, NetBSD) only support a single argument.

Do you have any examples of scripts relying on this behavior of yes(1)?

> The point about readability is a good one. I will even take the blame
> for writing it as it is now. However, you could greatly improve
> readability without sacrificing features/performance. The line
>
>   argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
>
> is used in many places in sbase and is more or less an idiom. The
> entire tool itself could then be written as follows. We can even
> remove the checks consequently within the loop, which could be regarded
> as a "performance" improvement.
>
>   int i;
>
>   argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
>   
>   if (argc == 0) {
>   for (;;) {
>   fputs("y\n", stdout);
>   }
>   } else {
>   for (;;) {
>   for (i = 0; i < argc; i++) {
>   fputs(argv[i], stdout);
>   fputc(' ', stdout);

This will end up printing an extra space before the newline.

>   }
>   fputc('\n', stdout);
>   }
>   }
>
> What do you think? Patch is attached.

This looks pretty good to me. If we decide to support any number of
operands, this seems like the way to do it.

Re: [hackers] [sbase][PATCH] Minor optimizations for 'yes'

2019-06-27 Thread Michael Forney

On 2019-06-27, aidanwillie0317  wrote:
> Or, even simpler:

Thanks, this looks good. Only a couple minor comments.

Could you update the man page to reflect the fact that now only one
string is supported?

> ---
>  yes.c | 10 +++---
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/yes.c b/yes.c
> index dd97ea6..5f3b2e1 100644
> --- a/yes.c
> +++ b/yes.c
> @@ -6,14 +6,10 @@
>  int
>  main(int argc, char *argv[])
>  {
> -   char **p;
> +   char *p = argv[1] ? argv[1] : "y";
>
> -   argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;

This line is actually necessary. The C standard only guarantees that
argc is non-negative, and argv[argc] is a null pointer[0]. In this
case, argv[1] would be an access beyond the end of the argv array.
Since we technically don't print argv0 anywhere (either directly, or
through eprintf), we could get around this by just checking `argc >
1`, but for consistency with the rest of the tools I would suggest

argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
p = argc ? argv[0] : "y";

[0] http://port70.net/~nsz/c/c11/n1570.html#5.1.2.2.1p2

[hackers] [ubase] [PATCH 2/2] pw_check: Allow empty password in shadow(5)

2019-06-26 Thread Michael Forney

---
 libutil/passwd.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libutil/passwd.c b/libutil/passwd.c
index c1233a1..3171fd5 100644
--- a/libutil/passwd.c
+++ b/libutil/passwd.c
@@ -23,12 +23,6 @@ pw_check(const struct passwd *pw, const char *pass)
struct spwd *spw;
 
p = pw->pw_passwd;
-   if (p[0] == '\0') {
-   if (pass[0] == '\0')
-   return 1;
-   weprintf("incorrect password\n");
-   return 0;
-   }
if (p[0] == 'x' && p[1] == '\0') {
errno = 0;
spw = getspnam(pw->pw_name);
@@ -45,6 +39,12 @@ pw_check(const struct passwd *pw, const char *pass)
weprintf("denied\n");
return -1;
}
+   if (p[0] == '\0') {
+   if (pass[0] == '\0')
+   return 1;
+   weprintf("incorrect password\n");
+   return 0;
+   }
 
cryptpass = crypt(pass, p);
if (!cryptpass) {
-- 
2.20.1

Re: [hackers] [sbase][PATCH] Minor optimizations for 'yes'

2019-06-26 Thread Michael Forney

On 2019-06-26, aidanwillie0317  wrote:
>
> On Wednesday, June 26, 2019 4:04:50 AM EDT you wrote:
>> What's the motivation for this? Are you running into performance
>> issues with yes(1), or is this just for fun to increase the throughput
>> for some benchmark?
> One of yes(1)'s uses (besides the intended use of making interactive
> programs
> work noninteractively) is to generate a high volume stream of data, as
> /dev/
> urandom will only go up around 10MiB/s on most devices. I was doing some
> testing for a personal program to see how well it could handle an influx of
> data, and I noticed that the sbase yes is much slower than the GNU and
> FreeBSD
> implementations. So, I patched the sbase one quickly and got it to run
> faster
> than many implementations (GNU yes being a major exception, running at
> ~4GiB/s
> on my machine).

What about /dev/zero? Or do you need ASCII data? I don't think yes(1)
is the right tool for the job. As you said, it is meant for
auto-responding to prompts.

> As you said before, the code is hard to follow, so I didn't mess with that
> part that much. However, in retrospect, the code can be simplfied to this:
>   for (p = argv; ; p = *(p + 1) ? p + 1 : argv) {
>   fputs(*p, stdout);
>   putchar(!*(p + 1) ? '\n' : ' ');
>   }

Thanks, that looks better. I'm thinking this might be even clearer to
use an index instead of advancing a pointer:

i = 0;
for (;;) {
fputs(argv[i], stdout);
i = (i + 1) % argc;
putchar(i ? '\n' : ' ');
}

In fact, since yes(1) is a non-standard utility, I wonder if we should
even bother supporting multiple arguments. It looks like some BSD
implementations only repeat the first argument.

So, we could make this as simple as

s = argc > 1 ? argv[1] : "y";
for (;;)
puts(s);

>> I think I'd prefer
>>
>>  for (;;)
>>  puts("y");
>>
>> here.
> Originally, I had a puts(3) call there, but this decreased the performance
> from the unpatched sbase, so I changed it to some fputc calls. Calling
> write(2) didn't help either. I researched why and it's due to some
> buffering,
> but adding buffering would overcomplicate the code.

I think the major slowdown here is from locking and unlocking the file
for every loop iteration. Using putchar_unlocked will be much faster
than a plain putchar. But anyway, I think we should stick to puts(3)
for simplicity.

Re: [hackers] [PATCH 1/3][sbase] chown: Ignore failure to find user/group name

2019-06-26 Thread Michael Forney

On 2019-06-26, Richard Ipsum  wrote:
> On Tue, Jun 25, 2019 at 04:55:10PM -0700, Michael Forney wrote:
>> I think the error is coming from
>> https://github.com/openbsd/src/blob/f84583fe5d7899ab1504dc1d58a04742db4a3058/lib/libc/gen/getgrent.c#L217.
>> If I run `touch /var/run/ypbind.lock` in my OpenBSD VM, it fixes the
>> issue.
>
> I see, so to preserve the existing behaviour you'd want to save errno
> before calling access, then restore afterward.

Yes, I think that would work.

>> I think this is unintended, since there appears to be some logic to
>> save and restore errno in grscan(). Looks like it was introduced in
>> https://github.com/openbsd/src/commit/b971f1acd7c34a49359ccefbe512e06f3826a939
>> (first released in OpenBSD 5.9).
>
> Right, exactly. I think the reason this doesn't break OpenBSD chown is
> because it uses `gid_from_group` and the implementation[1] for that
> actually ignores errno, if it didn't they might have spotted this issue
> themselves.
>
> Do you want to raise this with upstream or shall I?

If you don't mind, I'd appreciate it if you did.

> Yes I can see why you would rather not ignore errno to be fair.
> Maybe the better option is to see if the fix can be made in OpenBSD
> and then I could rework these patches?

Yes, that sounds like a good plan.

Thanks!

Re: [hackers] [sbase][PATCH] Minor optimizations for 'yes'

2019-06-26 Thread Michael Forney

What's the motivation for this? Are you running into performance
issues with yes(1), or is this just for fun to increase the throughput
for some benchmark?

I'd be in favor of changing yes if it increases readability (while
terse, the existing logic is a bit hard to follow), but I don't think
this patch does that.

On 2019-06-25, AGitBoy  wrote:
> ---
>  yes.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/yes.c b/yes.c
> index dd97ea6..49f5ed6 100644
> --- a/yes.c
> +++ b/yes.c
> @@ -6,13 +6,20 @@
>  int
>  main(int argc, char *argv[])
>  {
> - char **p;
> + if (argc > 1) {
> + char **p;
>
> - argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
> + argv0 = *argv, argv0 ? (argc--, argv++) : (void *)0;
>
> - for (p = argv; ; p = (*p && *(p + 1)) ? p + 1 : argv) {
> - fputs(*p ? *p : "y", stdout);
> - putchar((!*p || !*(p + 1)) ? '\n' : ' ');
> + for (p = argv; ; p = (*p && *(p + 1)) ? p + 1 : argv) {
> + fputs(*p, stdout);
> + putchar((!*p || !*(p + 1)) ? '\n' : ' ');
> + }

In this case, since we know there is at least one argument, *p is
always true, and I think some simplifications fall out of that.

> + } else {
> + while (1) {
> + fputc('y', stdout);
> + fputc('\n', stdout);
> + }

I think I'd prefer

for (;;)
puts("y");

here.

>   }
>
>   return 1; /* not reached */
> --
> 2.21.0

[hackers] [ubase] [PATCH 1/2] pw_check: Simplify slightly

2019-06-25 Thread Michael Forney

Use common check for password field starting with '*' or '!'.
---
 libutil/passwd.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/libutil/passwd.c b/libutil/passwd.c
index 0798225..c1233a1 100644
--- a/libutil/passwd.c
+++ b/libutil/passwd.c
@@ -23,19 +23,13 @@ pw_check(const struct passwd *pw, const char *pass)
struct spwd *spw;
 
p = pw->pw_passwd;
-   if (p[0] == '!' || p[0] == '*') {
-   weprintf("denied\n");
-   return -1;
-   }
-
-   if (pw->pw_passwd[0] == '\0') {
+   if (p[0] == '\0') {
if (pass[0] == '\0')
return 1;
weprintf("incorrect password\n");
return 0;
}
-
-   if (pw->pw_passwd[0] == 'x' && pw->pw_passwd[1] == '\0') {
+   if (p[0] == 'x' && p[1] == '\0') {
errno = 0;
spw = getspnam(pw->pw_name);
if (!spw) {
@@ -46,10 +40,10 @@ pw_check(const struct passwd *pw, const char *pass)
return -1;
}
p = spw->sp_pwdp;
-   if (p[0] == '!' || p[0] == '*') {
-   weprintf("denied\n");
-   return -1;
-   }
+   }
+   if (p[0] == '!' || p[0] == '*') {
+   weprintf("denied\n");
+   return -1;
}
 
cryptpass = crypt(pass, p);
-- 
2.20.1

Re: [hackers] [PATCH 1/3][sbase] chown: Ignore failure to find user/group name

2019-06-25 Thread Michael Forney

On 2019-06-25, Richard Ipsum  wrote:
> On Tue, Jun 25, 2019 at 03:44:57PM -0700, Michael Forney wrote:
>> On 2019-06-25, Richard Ipsum  wrote:
>> > This fixes an error on OpenBSD,
>> >
>> > % chown 1000:1000 yes.c
>> > ./chown: getgrnam 1000: No such file or directory
>>
>> POSIX says that if the entry cannot be found, errno shall not be
>> changed. Are you saying that OpenBSD doesn't behave this way?
>
> Yeah when I run this on OpenBSD with uid:gid values I get ENOENT.

Do you think that's something they'd be willing to fix?

I think the error is coming from
https://github.com/openbsd/src/blob/f84583fe5d7899ab1504dc1d58a04742db4a3058/lib/libc/gen/getgrent.c#L217.
If I run `touch /var/run/ypbind.lock` in my OpenBSD VM, it fixes the
issue.

I think this is unintended, since there appears to be some logic to
save and restore errno in grscan(). Looks like it was introduced in
https://github.com/openbsd/src/commit/b971f1acd7c34a49359ccefbe512e06f3826a939
(first released in OpenBSD 5.9).

>>
>> > ---
>> >  chown.c | 18 ++
>> >  1 file changed, 2 insertions(+), 16 deletions(-)
>> >
>> > diff --git a/chown.c b/chown.c
>> > index d7dc8e0..748ce97 100644
>> > --- a/chown.c
>> > +++ b/chown.c
>> > @@ -79,26 +79,12 @@ main(int argc, char *argv[])
>> >*group++ = '\0';
>> >
>> >if (owner && *owner) {
>> > -  errno = 0;
>> >pw = getpwnam(owner);
>> > -  if (pw) {
>> > -  uid = pw->pw_uid;
>> > -  } else {
>> > -  if (errno)
>> > -  eprintf("getpwnam %s:", owner);
>> > -  uid = estrtonum(owner, 0, UINT_MAX);
>> > -  }
>> > +  uid = pw ? pw->pw_uid : estrtonum(owner, 0, UINT_MAX);
>>
>> It looks like there is no error with getpwnam, only with getgrname,
>> right?
>>
>
> Yes, but I would argue that realistically the only time this function errors
> is
> when the entry doesn't exist, we don't want to exit in that case,
> instead try parsing it as a numeric uid and if that fails then exit.

The POSIX getgrnam specification also lists a couple other possible
errors: I/O error and out of file descriptors.

Also, the chown specification specifically says

If a numeric group operand exists in the group database as a group
name, the group ID number associated with that group name shall be
used as the group ID.

If we aren't able to read /etc/group for some reason and the the group
name for a number exists, we would end up using the wrong group ID.

Re: [hackers] [PATCH 1/3][sbase] chown: Ignore failure to find user/group name

2019-06-25 Thread Michael Forney

On 2019-06-25, Richard Ipsum  wrote:
> This fixes an error on OpenBSD,
>
> % chown 1000:1000 yes.c
> ./chown: getgrnam 1000: No such file or directory

POSIX says that if the entry cannot be found, errno shall not be
changed. Are you saying that OpenBSD doesn't behave this way?

> ---
>  chown.c | 18 ++
>  1 file changed, 2 insertions(+), 16 deletions(-)
>
> diff --git a/chown.c b/chown.c
> index d7dc8e0..748ce97 100644
> --- a/chown.c
> +++ b/chown.c
> @@ -79,26 +79,12 @@ main(int argc, char *argv[])
>   *group++ = '\0';
>
>   if (owner && *owner) {
> - errno = 0;
>   pw = getpwnam(owner);
> - if (pw) {
> - uid = pw->pw_uid;
> - } else {
> - if (errno)
> - eprintf("getpwnam %s:", owner);
> - uid = estrtonum(owner, 0, UINT_MAX);
> - }
> + uid = pw ? pw->pw_uid : estrtonum(owner, 0, UINT_MAX);

It looks like there is no error with getpwnam, only with getgrname, right?

Re: [hackers] [sbase] [PATCH] Add .gitignore

2019-06-17 Thread Michael Forney

On 2019-06-17, Quentin Rameau  wrote:
>> On 2019-06-17, Quentin Rameau  wrote:
>> >> How do you deal with ~250 lines of "Untracked files:" in the `git
>> >> status` output?

Can you answer this question? I'm curious why this doesn't obstruct
your workflow.

>> >> If you want to see them, you can always run `git status --ignored`.
>> >
>> > If you want it, you can always run `make .gitignore` *once*.
>>
>> Once for every clone of sbase.
>
> Yes, as opposed to once for every git status command issued.

Perhaps you can set up a shorter git alias for `status --ignored`.
This will also work for other repositories that contain a .gitignore.

>> >> > What's the rationale for having it duplicated both in the SCM and in
>> >> > the
>> >> > Makefile then?
>> >>
>> >> So that if a utility is added or removed, .gitignore can easily be kept
>> >> in
>> >> sync.
>> >
>> > So what's the point of having .gitignore tracked by the SCM?
>>
>> So that it gets applied by default.
>
> It seems you're deliberately not answering the question, or are
> suggesting bloating the make/SCM system with a justification for
> lazyness.

I answered the question. I want .gitignore checked in because then the
configuration gets applied by default. I want a rule to update it so
that it is easy to keep in sync with the Makefile.

> You asked if there was any objection to this, you got some, now do what
> you want.

I'm trying to understand your objection and workflow. My current
understanding is that you think it bloats the repository, and you
prefer to see the 250 lines of build artifacts when you run `git
status`. Is that an accurate summary?

In my opinion, since the file is only 700 bytes long, I think its
usefulness outweighs its bloat.

I'll wait a few more days in case someone else has an opinion.

> But pushing it both into the Makefile and the SCM is not justified at
> all, chose either one.

The Makefile rule will help to prevent it from getting out of date
with respect to the Makefile, so I think it is justified. It won't get
run unless you explicitly build it.

Re: [hackers] [sbase] [PATCH] Add .gitignore

2019-06-17 Thread Michael Forney

On 2019-06-17, Quentin Rameau  wrote:
>> How do you deal with ~250 lines of "Untracked files:" in the `git
>> status` output?
>>
>> If you want to see them, you can always run `git status --ignored`.
>
> If you want it, you can always run `make .gitignore` *once*.

Once for every clone of sbase.

>> >> The place for user-specific and repository-specific ignored files is
>> >> .git/info/exclude. But in general, we don't know the location of the
>> >> .git directory, so we'd probably have to use some git command to
>> >> figure out exactly where to put it.
>> >
>> > There's no “we”, that's the user's responsability to figure out where
>> > to put it.
>>
>> There is a "we" if a Makefile rule were to be added, since it needs to
>> create the file at the appropriate location.
>
> The appropriate location is .gitignore.

No, see gitignore(5):

* Patterns which should be version-controlled and distributed to other
  repositories via clone (i.e., files that all developers will want to
  ignore) should go into a .gitignore file.

* Patterns which are specific to a particular repository but which do
  not need to be shared with other related repositories (e.g., auxiliary
  files that live inside the repository but are specific to one user’s
  workflow) should go into the $GIT_DIR/info/exclude file.

* Patterns which a user wants Git to ignore in all situations (e.g.,
  backup or temporary files generated by the user’s editor of choice)
  generally go into a file specified by core.excludesFile in the user’s
  ~/.gitconfig. Its default value is $XDG_CONFIG_HOME/git/ignore. If
  $XDG_CONFIG_HOME is either not set or empty, $HOME/.config/git/ignore
  is used instead.

>> > What's the rationale for having it duplicated both in the SCM and in
>> > the
>> > Makefile then?
>>
>> So that if a utility is added or removed, .gitignore can easily be kept in
>> sync.
>
> So what's the point of having .gitignore tracked by the SCM?

So that it gets applied by default.

Re: [hackers] [sbase] [PATCH] Add .gitignore

2019-06-17 Thread Michael Forney

On 2019-06-17, Quentin Rameau  wrote:
>> What's the downside to having it checked into the repository? Are
>> there any reasons why a user wouldn't want this feature? Personally, I
>> want it in every clone of sbase I make, and I imagine most other
>> people want this as well. It is annoying to have `git status` scroll
>> my changes off the screen due to all the build artifacts.
>
> Well, personnaly I don't want it, I prefer to have a clear view of the
> clone status, hence the “user-side feature”, as in user-specific
> preference.

How do you deal with ~250 lines of "Untracked files:" in the `git
status` output?

If you want to see them, you can always run `git status --ignored`.

>> The place for user-specific and repository-specific ignored files is
>> .git/info/exclude. But in general, we don't know the location of the
>> .git directory, so we'd probably have to use some git command to
>> figure out exactly where to put it.
>
> There's no “we”, that's the user's responsability to figure out where
> to put it.

There is a "we" if a Makefile rule were to be added, since it needs to
create the file at the appropriate location.

> What's the rationale for having it duplicated both in the SCM and in the
> Makefile then?

So that if a utility is added or removed, .gitignore can easily be kept in sync.

Re: [hackers] [sbase] [PATCH] Add .gitignore

2019-06-17 Thread Michael Forney

On 2019-06-13, Quentin Rameau  wrote:
> Hello Michael,
>
>> Also, add rule to regenerate in case executable list changes.
>> ---
>> Any objection to this?
>
> I'm not really keen on pushing this file to the source history, I think
> this should be kept separate, this is a user-side feature.

What's the downside to having it checked into the repository? Are
there any reasons why a user wouldn't want this feature? Personally, I
want it in every clone of sbase I make, and I imagine most other
people want this as well. It is annoying to have `git status` scroll
my changes off the screen due to all the build artifacts.

> But having a rule un the Makefile, as you did, is a good compromise
> there, if the user wants it, he can have it with little additionnal
> effort.
>
> So I'd be for pushing the Makefile rule, not committing the actual file
> into the development history

The place for user-specific and repository-specific ignored files is
.git/info/exclude. But in general, we don't know the location of the
.git directory, so we'd probably have to use some git command to
figure out exactly where to put it.

Re: [hackers] [sbase] [PATCH] mkfifo: Simplify -m handling

2019-06-13 Thread Michael Forney

Thanks for looking this over.

On 2019-06-13, Richard Ipsum  wrote:
> I might be wrong but I think this will cause a subtle change in
> behaviour since the umask is needed to correctly interpret the mode
> string. One example that's documented in chmod(1p) is "-w" vs "a-w".
> "a-w" removes all write perms, but "-w" removes only the permissions
> that weren't filtered by the mask. This could explain why this
> is currently being done in two separate steps.

umask(0) sets the mask to 0 and returns the old mask, so I think the
mode is interpreted the same way before and after this change
(getumask() is even implemented as two calls to umask, returning the
result of the first call).

Since chmod isn't affected by the current umask, I think the final
mode is the same before and after this change.

$ umask 022
$ ./mkfifo.old -m -w testfifo && ls -l testfifo && rm testfifo
pr--rw-rw-1 michael  michael   0 Jun 13 17:02 testfifo
$ ./mkfifo.new -m -w testfifo && ls -l testfifo && rm testfifo
pr--rw-rw-1 michael  michael   0 Jun 13 17:02 testfifo
$ ./mkfifo.old -m a-w testfifo && ls -l testfifo && rm testfifo
pr--r--r--1 michael  michael   0 Jun 13 17:03 testfifo
$ ./mkfifo.new -m a-w testfifo && ls -l testfifo && rm testfifo
pr--r--r--1 michael  michael   0 Jun 13 17:03 testfifo

> That being said looking at mkfifo(1p) it says that if -m is present
> then the value of mkfifo's mode argument is unspecified, but it
> shouldn't be any less restrictive than the argument provided to -m.
> So if I'm reading this correctly the current argument of
> (S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP | S_IROTH | S_IWOTH)
> isn't strictly correct either since it could be less restrictive
> than the argument provided to -m.

Yes, so I think the patch also prevents this issue since the FIFO is
created with the final mode.

> If I'm not mistaken in the second part then maybe the mode argument
> to mkfifo should be 0, then apply the actual mode using chmod?

I think this would work as well.

1 2 3 >

1 - 100 of 247 matches

Mail list logo