Re: [hackers] help
On Thu, 3 Aug 2023 12:29:04 +0100 Christopher Lang wrote: Dear Christopher, > I was trying to subscribe to the mailing list but didn't get a > response from so I tried this. I think I > have figured it out not though. > Sorry if I pinged you. And I'm fine haha. > PS. I'm sorry if this is html mail, still trying to set up a better > email client. how can we be sure it's you and not your kidnapper? HANG IN THERE, CHRISTOPHER, IF YOU CAN HEAR THIS! With best regards Laslo
Re: [hackers] [quark][PATCH] Fix buffer over-read in decode()
On Sun, 21 Aug 2022 20:09:16 + HushBugger wrote: > On Wed, 2022-08-17 at 08:49 +0600, NRK wrote: > > I think the `s++` should be removed from the for loop and `s` should > > be incremented as needed inside the loop instead. > > Agreed. I've changed it. Thank you for working out this patch, I have applied it! :)
Re: [hackers] [quark][PATCH] Fix strftime error handling
On Fri, 8 Jul 2022 11:12:17 -0700 robert wrote:

> Unlike snprintf, strftime buffer contents are undefined when it fails,
> so make sure the buffer is null-terminated. To prevent garbage from
> being printed out, we simply set the timestamp to the empty string,
> but maybe setting it to "unknown time" or something similar would be
> better. Either way, I don't think this can fail until year 1, so
> it's not a big deal.
> ---
>  connection.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/connection.c b/connection.c
> index 8aca2ab..24de809 100644
> --- a/connection.c
> +++ b/connection.c
> @@ -31,7 +31,8 @@ connection_log(const struct connection *c)
>  	if (!strftime(tstmp, sizeof(tstmp), "%Y-%m-%dT%H:%M:%SZ",
>  	              gmtime(&(time_t){time(NULL)}))) {
>  		warn("strftime: Exceeded buffer capacity");
> -		/* continue anyway (we accept the truncation) */
> +		tstmp[0] = '\0'; /* tstmp contents are undefined on failure */
> +		/* continue anyway */
>  	}
>
>  	/* generate address-string */
> --
> 2.17.1

Thank you, I have applied your patch!
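For readers following along, the failure handling discussed in this patch can be sketched in isolation. This is only an illustration under assumptions: format_timestamp() is a hypothetical helper and not part of quark; it merely shows the pattern of emptying the buffer when strftime() returns 0.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format t (interpreted as UTC) as an ISO 8601 timestamp into buf.
 * Returns the number of bytes written (excluding the NUL), or 0 on
 * failure, in which case buf holds the empty string.
 * NOTE: format_timestamp is a hypothetical helper, not quark code.
 */
size_t
format_timestamp(char *buf, size_t len, time_t t)
{
	struct tm *tm;
	size_t n;

	if (len == 0) {
		return 0; /* nothing can be stored, not even the NUL */
	}

	tm = gmtime(&t);
	n = tm ? strftime(buf, len, "%Y-%m-%dT%H:%M:%SZ", tm) : 0;

	if (n == 0) {
		/* buffer contents are unspecified after a failed strftime() */
		buf[0] = '\0';
	}

	return n;
}
```

A caller can then print buf unconditionally and gets an empty timestamp instead of garbage when the buffer was too small.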
Re: [hackers] [quark][PATCH] Remove superfluous byteorder conversion
On Tue, 19 Apr 2022 12:20:40 +0200 Thomas Oltmann wrote:

> When comparing IPv4 addresses in sock_same_addr() we don't need
> to correct their byteorder just to see if they are equal or not.
> Byte swapping would only be needed if we needed to know
> which address had the greater value.
> ---
>  sock.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/sock.c b/sock.c
> index ecb73ef..f1385ca 100644
> --- a/sock.c
> +++ b/sock.c
> @@ -200,8 +200,8 @@ sock_same_addr(const struct sockaddr_storage *sa1, const struct sockaddr_storage
>  		              ((struct sockaddr_in6 *)sa2)->sin6_addr.s6_addr,
>  		              sizeof(((struct sockaddr_in6 *)sa1)->sin6_addr.s6_addr));
>  	case AF_INET:
> -		return ntohl(((struct sockaddr_in *)sa1)->sin_addr.s_addr) ==
> -		       ntohl(((struct sockaddr_in *)sa2)->sin_addr.s_addr);
> +		return ((struct sockaddr_in *)sa1)->sin_addr.s_addr ==
> +		       ((struct sockaddr_in *)sa2)->sin_addr.s_addr;
>  	default: /* AF_UNIX */
>  		return strcmp(((struct sockaddr_un *)sa1)->sun_path,
>  		              ((struct sockaddr_un *)sa2)->sun_path) == 0;
> --
> 2.35.1

Thanks, applied!
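The reasoning in this patch (byte swapping is a one-to-one mapping, so it cannot change whether two values are equal, only how they order) can be sketched as follows. The function names ipv4_equal() and ipv4_less() are made up for this illustration and do not appear in quark.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Equality: both addresses are stored in network byte order, and
 * swapping bytes maps distinct values to distinct values, so the raw
 * fields can be compared directly.
 * (hypothetical helper, for illustration only)
 */
int
ipv4_equal(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return a->sin_addr.s_addr == b->sin_addr.s_addr;
}

/*
 * Ordering: "less than" depends on the numeric value, so here the
 * conversion to host byte order is actually needed.
 * (hypothetical helper, for illustration only)
 */
int
ipv4_less(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	return ntohl(a->sin_addr.s_addr) < ntohl(b->sin_addr.s_addr);
}
```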
Re: [hackers] [quark][PATCH] Fix inverted conditional in sock_same_addr()
On Tue, 19 Apr 2022 12:04:57 +0200 Thomas Oltmann wrote:

> sock_same_addr() is supposed to return 0 if sa1 and sa2 are different
> addresses. Since memcmp() returns 0 if its arguments are equal, we
> need to flip the return value by comparing it to 0.
> ---
>  sock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sock.c b/sock.c
> index ecb73ef..e6e7754 100644
> --- a/sock.c
> +++ b/sock.c
> @@ -198,7 +198,7 @@ sock_same_addr(const struct sockaddr_storage *sa1, const struct sockaddr_storage
>  	case AF_INET6:
>  		return memcmp(((struct sockaddr_in6 *)sa1)->sin6_addr.s6_addr,
>  		              ((struct sockaddr_in6 *)sa2)->sin6_addr.s6_addr,
> -		              sizeof(((struct sockaddr_in6 *)sa1)->sin6_addr.s6_addr));
> +		              sizeof(((struct sockaddr_in6 *)sa1)->sin6_addr.s6_addr)) == 0;
>  	case AF_INET:
>  		return ntohl(((struct sockaddr_in *)sa1)->sin_addr.s_addr) ==
>  		       ntohl(((struct sockaddr_in *)sa2)->sin_addr.s_addr);
> --
> 2.35.1

Thank you, I have applied your patch! You really have eagle-eyes. :)
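As a small standalone illustration of the inverted conditional: memcmp() reports equality with 0, so a predicate that should be true for equal addresses has to compare the result against 0. The helper name ipv6_equal() is an assumption made for this sketch, not a function from sock.c.

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Returns 1 if both IPv6 addresses are identical and 0 otherwise.
 * Without the "== 0" the function would return 0 for equal addresses
 * and nonzero for different ones, i.e. the exact opposite.
 * (hypothetical helper, for illustration only)
 */
int
ipv6_equal(const struct sockaddr_in6 *a, const struct sockaddr_in6 *b)
{
	return memcmp(a->sin6_addr.s6_addr, b->sin6_addr.s6_addr,
	              sizeof(a->sin6_addr.s6_addr)) == 0;
}
```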
Re: [hackers][quark] Quark don't print-out output after dettach
On Sat, 31 Dec 2022 06:35:43 -0500 fo...@dnmx.org wrote: Dear fossy, > Quark does not print-out connection messages to terminal or a log file > once dettached. > Can someone help me? could you please provide a minimal reproducible example? With best regards Laslo Hunhold
Re: [hackers] [slstatus] More LICENSE updates || drkhsh
On Wed, 21 Dec 2022 14:36:31 + drkhsh wrote: Dear Aaron, > So, before this discussion becomes untechnical, here are some facts > from a quick research. > > https://www.copyright.gov/title17/92chap4.html#408 > > "(2) in the case of a work other than an anonymous or pseudonymous > work, the name and nationality or domicile of the author or authors, > and, if one or more of the authors is dead, the dates of their deaths; > > (3) if the work is anonymous or pseudonymous, the nationality or > domicile of the author or authors;" > > For me that means that publishing pseudonymous copyrighted work under > any license should be fine as long as the license does not explicitly > mention it? you are citing US copyright law and it's a whole can of worms across the world. I could go on citing EU, Russian, etc. (all signers of the Berne convention) laws, which are lenient or dismissive in regard to pseudonymous attributions, but always at least require unique identification of a pseudonym, which can always be a matter of dispute. If a license is invalid in a certain country's jurisdiction (e.g. if pseudonyms are used even though not allowed), the code ends up being "all rights reserved" and thus not satisfying the OSI license criteria. Even if pseudonyms were allowed by the Berne convention (didn't check), it would probably at least require unique identifiability of the pseudonym, casting doubt on the overall license text. I know many people and companies who are very careful about only building their software using components with watertight licenses. Regarding the pseudonym uniqueness, I think that "pseudonym" in the law's sense is very thinly stretched in regard to arbitrary web-nicknames, and the only reason, I think, it's included in some laws is that when an author writes a book under pseudonym the book doesn't immediately go into the public domain.
We could discuss this forever, but this ends up being territory where one would have to ask a judge if a pseudonym is unique enough or not. Having real names only in a license has other, more practical, advantages, though: You actually have the chance to reach out to people even years after the software release, and I've had one positive experience with this a few years ago. There is simply no chance if you just have a nickname with a throwaway-e-mail-address, e.g. "PBC ", to ever reach out to this person in most cases after just a few years. Reaching out could be regarding a relicensing (e.g. ISC/MIT -> GPL or the other way round), technical inquiry or simply to invite them to something. I'm amazed about how many people are scared of putting their names on their works; in a sense it reduces the software's trustworthiness, may it only be by a subjective factor. With best regards Laslo
Re: [hackers] [slstatus] Update LICENSE || drkhsh
On Mon, 19 Dec 2022 02:44:40 +0100 (CET) g...@suckless.org wrote: Dear Aaron, > commit 1ae616190cb3f88221571343a284fdf9f55b683f > Author: drkhsh > AuthorDate: Mon Dec 19 02:40:00 2022 +0100 > Commit: drkhsh > CommitDate: Mon Dec 19 02:44:21 2022 +0100 > > Update LICENSE > > diff --git a/LICENSE b/LICENSE > index 70b9fb3..b7e3aa6 100644 > --- a/LICENSE > +++ b/LICENSE > @@ -27,6 +27,8 @@ Copyright 2020 Alexandre Ratchov > Copyright 2020 Mart Lubbers > Copyright 2020 Daniel Moch > Copyright 2022 NRK > +Copyright 2022 Patrick Iacob > +Copyright 2021-2022 planet36 > > Permission to use, copy, modify, and/or distribute this software for > any purpose with or without fee is hereby granted, provided that the > above planet36's real name is "Steven Ward" (as can be extracted from his GitHub[0]) and his canonical E-Mail-address is plane...@gmail.com. It should be avoided to add pseudonyms to license files, as the license is formally and legally binding. With best regards Laslo [0]:https://github.com/planet36/organize-roms/commit/24f10204297b74939a8676a864fa5c605e9f0306
Re: [hackers] [slstatus] config.mk: Fix PREFIX assignment || planet36
On Mon, 19 Dec 2022 02:44:40 +0100 (CET) g...@suckless.org wrote: Dear Aaron, > commit c225c4315161a992b9e44dd990d083ee57f7f713 > Author: planet36 > AuthorDate: Wed May 26 14:29:32 2021 -0400 > Commit: drkhsh > CommitDate: Mon Dec 19 02:44:21 2022 +0100 > > config.mk: Fix PREFIX assignment > > Signed-off-by: drkhsh > > diff --git a/config.mk b/config.mk > index ead1859..8f06800 100644 > --- a/config.mk > +++ b/config.mk > @@ -4,7 +4,7 @@ VERSION = 0 > # customize below to fit your system > > # paths > -PREFIX = /usr/local > +PREFIX ?= /usr/local > MANPREFIX = $(PREFIX)/share/man > > X11INC = /usr/X11R6/include > I would interject here that "?=" is not POSIX and assume that there was push by some packager. Based on my experience, I would recommend to go back to "=" and encourage packagers to simply do make PREFIX=... which overrides any assignments in config.mk. With best regards Laslo
Re: [hackers] [libgrapheme] Do not falsely read entire buffer instead of simply the filled with || Laslo Hunhold
On Thu, 24 Nov 2022 20:32:53 +0600 NRK wrote: Dear NRK, > Small nitpick: ASan (and the other sanitizers) are *dynamic* > analyzers, as they happen during runtime. > > Static analysis is analyzing without executing anything. Examples of > static analyzers would be clang-tidy or cppcheck. Newer GCC versions > also have a `-fanalyzer` flag for statically analyzing C code, but in > my experience it's not mature yet - but the direction looks promising. yes, thanks, you are totally right, of course. :) With best regards Laslo
Re: [hackers] [libgrapheme] Add a check make-target as an alias for test || Laslo Hunhold
On Mon, 21 Nov 2022 11:06:33 + "Tom Schwindl" wrote: Dear Tom, > This should probably be added to the PHONY target as a prerequisite. thank you, I have added this in commit[0]. With best regards Laslo [0]:https://git.suckless.org/libgrapheme/commit/84bd5ee67bb9cbd317c8fa44ae4da768e2af922d.html
Re: [hackers] [lchat] Makefile: add dist target to create release tarballs || Jan Klemkow
On Thu, 20 Oct 2022 19:18:42 -0400 Steve Ward wrote: Dear Steve, > If you want to stick with git, the mkdir, cp, tar, and rm commands > could be replaced with: > git archive --prefix lchat-$(VERSION)/ HEAD | gzip > > lchat-$(VERSION).tar.gz this would add an implicit dependency on git, though. With best regards Laslo
Re: [hackers] [tabbed] Makefile: simplify and remove hiding the build process || Hiltjo Posthuma
On Wed, 12 Oct 2022 23:02:14 +0200 (CEST) g...@suckless.org wrote: > -# Solaris > -#CFLAGS = -fast ${INCS} -DVERSION=\"${VERSION}\" > -#LDFLAGS = ${LIBS} Noo, not Solaris!
Re: [hackers] [libgrapheme] Switch to semantic versioning and improve dynamic library handling || Laslo Hunhold
On Fri, 7 Oct 2022 23:32:08 +0600 NRK wrote: Dear NRK, > Curious, what makes you change you mind about putting these back in > config.mk instead of keeping them in the Makefile ? Since they aren't > meant to be changed by the user. you're totally right. I put them back at first because I would risk not rebuilding when I changed them in the Makefile (which is actually easily fixable by just adding a dependency on the Makefile additionally to config.mk, which I did just now[0]). Another aspect was that the VERSION_* variables are used in some of the variables, but I admit that it makes more sense to simply use them and optionally add a comment at the top. To be honest, though, they are pretty much self-explanatory. With best regards Laslo [0]:https://git.suckless.org/libgrapheme/commit/d42f53b5baafe01caa48477e204b63e065660117.html
Re: [hackers] [libgrapheme] Convert GRAPHEME_STATE to uint_least16_t and remove it || Laslo Hunhold
On Tue, 4 Oct 2022 05:07:13 +0600 NRK wrote: Dear NRK, > Another possibility is wrapping the integer inside a struct: > > typedef struct { unsigned internal_state; } GRAPHEME_STATE; > > the benefit of this is that the type GRAPHEME_STATE clearly states the > purpose, whereas a `uint_least16_t` doesn't. > > Wrapping an enum into a struct is also a common trick to get stronger > type-checking from the compiler; I don't think it matters in this case > though, since the state is always passed via pointer. > > > and I want all of the semantics to be crystal clear to the > > end-user. > > Other way of looking at it is that the state is an internal thing so > the user shouldn't be concerned about what's going on behind the > scene. yeah, you bring up good points that I also thought of. What one should not forget is that those shenanigans also complicate the use of FFIs. I really originally thought that the state type would be used in more than one place, but that's not the case. Enough meaning is given to it by the name of the variable, so it's cool. > The `(uint_least16_t)1` casts don't really do much since `int` is > guaranteed to be 16bits anyways. But if you want to be explicit, you > can still use `UINT16_C(1)`, which is shorter thus less noisy, > instead of casting: > > - out->prop_set = in & (((uint_least16_t)(1)) << 8); > + out->prop_set = in & (UINT16_C(1) << 8); ah yeah, I always seem to forget about this macro, even though I use it so often in the code. Fixed[0] now. > I'd also return by value in these 2 functions. Expressions like these > are more clear compared to out pointers: > > *s = state_deserialize(); > state = state_serialize(*s); I prefer to always pass structs by reference. Call me old-fashioned in this regard. Thank you for reviewing the changes, though. I really appreciate it! :) With best regards Laslo [0]:https://git.suckless.org/libgrapheme/commit/0aa5d262f8d0975341bcc60916e12044c7d64d0d.html
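The two ideas from this exchange, wrapping the integer in a struct so that the type name carries the meaning and using UINT16_C(1) instead of a cast, can be sketched together. Everything here is hypothetical: the type name WRAPPED_STATE, the helper names, and the bit layout (bit 8 as a "property is set" flag, low byte as the property) are assumptions for illustration, not libgrapheme's actual layout.

```c
#include <stdint.h>

/* struct wrapper: the compiler now rejects a plain integer by accident */
typedef struct {
	uint_least16_t internal_state;
} WRAPPED_STATE; /* hypothetical name, not a libgrapheme type */

/* assumed layout: bit 8 is the "property set" flag, bits 0-7 the property */
#define PROP_SET_FLAG (UINT16_C(1) << 8)
#define PROP_MASK     UINT16_C(0xFF)

/* the struct is passed by reference, as preferred in the reply above */
void
state_set_prop(WRAPPED_STATE *s, uint_least8_t prop)
{
	s->internal_state = (uint_least16_t)(PROP_SET_FLAG | (prop & PROP_MASK));
}

int
state_prop_is_set(const WRAPPED_STATE *s)
{
	return (s->internal_state & PROP_SET_FLAG) != 0;
}

uint_least8_t
state_get_prop(const WRAPPED_STATE *s)
{
	return (uint_least8_t)(s->internal_state & PROP_MASK);
}
```

The trade-off mentioned in the reply still applies: a struct type is more awkward to bind through an FFI than a plain uint_least16_t.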
Re: [hackers] [libgrapheme][PATCH] fix manpage
On Sun, 2 Oct 2022 09:29:18 +0600 NRK wrote: Dear NRK, > - to_case: there's no `len` parameter. it should be `srclen` and > `dstlen`. > - is_case: `caselen` should be a pointer. > --- > > P.S: one more thing that caught my eye; the "next" manpages for the > codepoint versions states: > > If len is set to SIZE_MAX the string str is interpreted to be > NUL-terminated and processing stops when a NUL-byte is > encountered. > > is this correct? what if the integer contains a nul-byte? > > it seems to be that it should be an integer (uint_least32_t) with the > value 0, not a nul-byte, which are different things. thanks for reporting these problems! I actually had on my TODO to take a look at the manuals, given there were also some other problems. I now took the time to fix them all[0], including your suggestions. Thank you very much! The wording regarding the "NUL-byte" was a bit unfortunate for the codepoint-based functions. I fixed it up accordingly. With best regards Laslo [0]:https://git.suckless.org/libgrapheme/commit/995e37182dc53da55dc4cf34868513610215c79e.html
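The distinction raised in the P.S. (a terminating codepoint of value 0 versus a NUL byte somewhere inside a codepoint) can be shown with a short sketch. cp_count() is a hypothetical helper written for this illustration and is not part of the libgrapheme API; it only assumes the convention quoted above, namely that len == SIZE_MAX means "terminated".

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the codepoints in str. If len is SIZE_MAX, str is taken to be
 * terminated by a codepoint with the value 0. A zero *byte* inside a
 * codepoint (e.g. the low byte of U+0100) does not end the string.
 * (hypothetical helper, not libgrapheme API)
 */
size_t
cp_count(const uint_least32_t *str, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (len == SIZE_MAX && str[i] == 0) {
			break; /* terminator: a whole codepoint equal to 0 */
		}
	}

	return i;
}
```

With the array { 0x100, 0x41, 0x1F986, 0 } a byte-wise scan would stop inside the first element on a little-endian machine, while the codepoint-wise check only stops at the final 0.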
Re: [hackers] [libgrapheme] Update to Unicode 15.0.0 || Laslo Hunhold
On Thu, 15 Sep 2022 09:44:11 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > Finally support for the duck emoji! > > https://blog.unicode.org/2022/09/announcing-unicode-standard-version-150.html > https://www.unicode.org/announcements/u15-emoji-annc-large.png yes, and not to forget that we can finally express an old old meme in Unicode: . With best regards Laslo
Re: [hackers] [libgrapheme] Add manuals for the grapheme_to_*case_utf8-functions || Laslo Hunhold
On Sun, 28 Aug 2022 20:00:42 +0200 Quentin Rameau wrote: Dear Quentin, > But of course, that's why the construction ${variable} exists, > for this very common case. > It's clear that UNITs isn't a variable, > so you need to separate the variable from the string. > You don't that with a subshell and printf, > you do that with just ${UNIT}s. thanks for your explanation and pointing this out! I totally forgot about this and have now pushed a change to use the proper parameter expansion[0]. For those interested, here's the excerpt from the POSIX-standard[1]. Also thanks to you, Thomas Oltmann, for pointing this out as well. With best regards Laslo [0]:https://git.suckless.org/libgrapheme/commit/6e6c538e4efb4d191a2f0391466556eb758d76bd.html [1]:https://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02_06_02
Re: [hackers] [libgrapheme] Add manuals for the grapheme_to_*case_utf8-functions || Laslo Hunhold
On Sun, 28 Aug 2022 18:33:21 +0200 Quentin Rameau wrote: > > +function returns the number of $(printf $UNIT)s in the array > > resulting > ^-- But… Why?! Doesn't work otherwise in the heredoc. If I write $UNITs it doesn't interpret it correctly, so I reformulate it as a subshell-expression. It's not elegant, but works. Is there a better way to address this?
Re: [hackers] [dwm][PATCH RESEND 0/2] Const-correctness fixes
On Mon, 22 Aug 2022 11:15:19 +0100 Chris Down wrote: Dear Chris, > Hmm? For example, for the FC/Xft types, fontname is declared as const > by xfont_create, but then we cast away its constness when passing it > to FCNameParse. The same goes for text, which we claim is const in > the drw_font_getexts signature, but then we remove its constness. > > In general the existing code seems confused, no? Either we shouldn't > pass them in as const in the first place, or we should maintain the > constness that we declare in the function parameters. > > There shouldn't be any logical change here, but it seems weird to say > things are not mutable up front and then waver about it later. Right > now there's no UB, but making sure we don't cast away the const > mitigates the risk altogether. I agree here. Not only should const be used to at least have a partial "contract" for the function parameters (C doesn't offer a lot in this regard and it's an easy way to prevent problems), it also allows the compiler to optimize the code better. With best regards Laslo
Re: [hackers] [libgrapheme] [PATCH] Remove dead file `src/util.c'
On Wed, 10 Aug 2022 15:03:06 + Tom Schwindl wrote: Dear Tom, > Since commit 072bb271868a3583da1f ("Introduce mostly branchless > character break detection") removed the code from the file, it no > longer serves a purpose. while it was true, it now is used again. :) It often makes more sense to keep util-stubs that are guaranteed to be used later rather than ripping it out of the build-system only having to include it later again. With best regards Laslo
Re: [hackers] [sbase] [PATCH] Use ar(1)'s s-flag instead of invoking ranlib(1)
On Mon, 1 Aug 2022 11:09:16 +0200 "Roberto E. Vargas Caballero" wrote: Dear Roberto, > Because then you will support only the last systems. If you keep > the ranlib you will support systems that support all versions of > the standard. Again, if you find a system without ranlib then > we can talk and consider what to do, but removing only for the sake > of "the standard does not include anymore ranlib" is a horrible idea. > For example, scc requires the use of ranlib, if you remove it then > I will not be able to continue testing scc with suckless software. > What happens if I want to compile sbase in an old SunOs workstation? I thought about it a bit more in the last few weeks and added ranlib again. The main reason is that I find it convincing that POSIX would not try to define varying binary formats, which is why the toolchain-tool ranlib(1) was probably never included. Adding the s-flag to ar is simply an unexpected and ill-fitting feature-creep that bloats up an otherwise simple archive-tool. Thanks for this very interesting discussion and sharing your experience! With best regards Laslo
Re: [hackers][quark][patch] pre-compression
On Tue, 16 Aug 2022 11:19:55 -0400 fo...@dnmx.org wrote: Dear fossy, > Ah.. so very complicated, huh? Oh, well.. it's not a low-hanging-fruit by any means, yes. > Hey, so.. I have a question.. how come that Suckless' web-site isn't > hosted using Quark? If I remember correctly - it's Nginx? > Like - not even OpenBSD's httpd?? The site is currently served using nginx. As far as I can tell, we just haven't gotten around to switching over to OpenBSD's httpd. To be completely frank, quark is a tool for a very limited scope, and I've been battling with certain aspects for quite a while. It's good for quickly hosting something from the command line, but lacks in other aspects. On OpenBSD, I would even recommend its httpd over quark in most cases. > I find Quark fine enough.. just couldn't manage to log messages with a > command.. probably could add a few lines for logging.. but I just.. > I'm too lazy for that :/ It's as simple as quark > log and a daily cron for log-rotation that amounts to mv log log.2022-08-16 or something (fired daily at midnight, the date of course generated on the fly for the filename). With best regards Laslo
Re: [hackers][quark][patch] pre-compression
On Mon, 8 Aug 2022 07:57:51 -0400 fo...@dnmx.org wrote: Dear fossy, > I'll try to do it by myself, but I don't promise anything.. It seems > like the resp files were moved to esnprintf or something. > > Sorry for no original e-mail text, DNMX is broken :d given quark's new structure compression is not easily possible anymore, unless you add a complicated stream-compression on top within the individual connection-structs. Thanks anyway for your offer! With best regards Laslo
Re: [hackers] [libgrapheme] Use (size_t)(-1) instead of SIZE_MAX and fix style || Laslo Hunhold
On Sun, 31 Jul 2022 12:18:22 +0200 Mattias Andrée wrote: Dear Mattias, > Why wouldn't SIZE_MAX be the maximum of size_t? you're totally right and I changed it. Thanks! With best regards Laslo
Re: [hackers] [libgrapheme] Rename reallocarray() to reallocate_array() to prevent mangling || Laslo Hunhold
On Mon, 1 Aug 2022 21:53:45 +0600 NRK wrote: Dear NRK, > Given that this no longer shadows a libc/conventional function, I'd go > one step further and move the `fprintf + exit` check inside > reallocate_array() so the calling code doesn't need to worry about > null returns. thanks for your remark, but I prefer it this way. With best regards Laslo
Re: [hackers] [libgrapheme][PATCH] Add reallocarray implementation
On Sat, 30 Jul 2022 14:29:05 -0700 robert wrote: Dear Robert, > reallocarray is nonstandard and glibc declares it only when > _GNU_SOURCE is defined. Without this patch or _GNU_SOURCE defined, I > get a seg fault from reallocarray being implicitly declared with the > wrong signature. thanks for your patch! I applied it with a few modifications. As a matter of fact, glibc exports reallocarray() with _DEFAULT_SOURCE since version 2.29 (from January 2019), however, you are still totally correct that using this function reduces portability. With best regards Laslo
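For context, a portable replacement for reallocarray() is short. The following is a minimal sketch of the usual overflow-checked approach, not the patch that was applied; the name checked_reallocarray() is chosen here to avoid clashing with a libc declaration and is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Resize p to hold len elements of size bytes each, refusing requests
 * whose total size would overflow size_t. On overflow NULL is returned,
 * errno is set to ENOMEM and p is left untouched.
 * (hypothetical name; a sketch, not the libgrapheme implementation)
 */
void *
checked_reallocarray(void *p, size_t len, size_t size)
{
	if (len > 0 && size > SIZE_MAX / len) {
		errno = ENOMEM;
		return NULL;
	}

	return realloc(p, len * size);
}
```

Note that a total size of 0 is passed straight to realloc(), whose behaviour in that case is implementation-defined, so callers should avoid requesting empty arrays.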
Re: [hackers] [sbase] [PATCH] Use ar(1)'s s-flag instead of invoking ranlib(1)
On Fri, 22 Jul 2022 17:28:38 +0200 "Roberto E. Vargas Caballero" wrote: Dear Roberto, > I disagree with this change. I think it adds nothing and reduce > portability of the Makefiles. why would it reduce the portability of the Makefiles? It can be expected that all ar-implementations support the s-flag, and ranlib is simply legacy. With best regards Laslo
Re: [hackers] [dwm][PATCH] spawn: reduce 2 lines, change fprintf() + perror() + exit() to die("... :")
On Fri, 29 Jul 2022 18:26:04 -0500 explosion0men...@gmail.com wrote: Dear explosion0mental, > when calling die and the last character of the string corresponds to > ':', die() will call perror(). See util.c > > Cuz muh lines of code!1 > - fprintf(stderr, "dwm: execvp %s", ((char > **)arg->v)[0]); > - perror(" failed"); > - exit(EXIT_SUCCESS); > + die("dwm: execvp '%s' failed:", ((char as far as I can tell this is not correct, given the program exits with EXIT_SUCCESS, not EXIT_FAILURE. With best regards Laslo
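To make the objection concrete, here is a rough sketch of a die()-style function with the trailing-colon convention described in the quoted patch. It is written from that description only and is not copied from util.c; the names wants_errno() and die_sketch() are hypothetical, and the failure exit status is assumed from the reply above.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* nonzero if fmt is non-empty and ends in ':' (hypothetical helper) */
static int
wants_errno(const char *fmt)
{
	return fmt[0] != '\0' && fmt[strlen(fmt) - 1] == ':';
}

/*
 * Print a formatted message to stderr; if the format ends in ':',
 * append the description of errno. Then terminate with a failure
 * status (assumed here), which is the behavioural difference to the
 * replaced code that called exit(EXIT_SUCCESS).
 */
void
die_sketch(const char *fmt, ...)
{
	int saved_errno = errno; /* vfprintf() may clobber errno */
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);

	if (wants_errno(fmt)) {
		fprintf(stderr, " %s\n", strerror(saved_errno));
	} else {
		fputc('\n', stderr);
	}

	exit(EXIT_FAILURE);
}
```

So replacing fprintf() + perror() + exit(EXIT_SUCCESS) with such a call is not a pure refactoring: the message stays roughly the same, but the exit status of the child changes.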
Re: [hackers] [quark][PATCH] Fix strftime error handling
On Fri, 8 Jul 2022 11:12:17 -0700 robert wrote: Dear Robert, > Unlike snprintf, strftime buffer contents are undefined when it fails, > so make sure the buffer is null-terminated. To prevent garbage from > being printed out, we simply set the timestamp to the empty string, > but maybe setting it to "unknown time" or something similar would be > better. Either way, I don't think this can fail until year 1, so > it's not a big deal. nice catch, thanks! I'll merge it with the next window. With best regards Laslo
Re: [hackers] [PATCH][libgrapheme] macro-hygiene: wrap arguments in parenthesis
On Wed, 29 Jun 2022 09:07:49 +0600 NRK wrote: Dear NRK, > reported by clang-tidy. thank you very much! I pushed it! Also, even though I appreciate you checking the code, there are admittedly multiple ugly spots that need refactoring and also known bugs (especially for the _utf8-functions) that need fixing or rather refactoring. The library is currently in a phase of "expansion" to check technical feasibility. Shared concepts will be integrated into common concepts to ultimately simplify the code. Stay tuned for more! :) With best regards Laslo
Re: [hackers] [sent] [PATCH 1/3] sent.c: Drop unnecessary NULL checks
On Sun, 26 Jun 2022 21:53:49 +0300 Greg Minshall wrote: Dear Greg, > for what it's worth, i'd probably code with the checks, so as to avoid > future code editors (including myself) doing a double-take, thinking, > "hmm, did the author consider that case?". (though, of course, you > did that same -- if opposite -- double-take when you saw that code.) come on, it is general knowledge that free() accepts NULL arguments. The extra checks just add more cruft. With best regards Laslo
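For anyone unsure about the claim: the C standard specifies that free() does nothing when passed a null pointer, so cleanup code needs no guards. A minimal sketch follows; struct slide and slide_cleanup() are made-up names for illustration and are not taken from sent.c.

```c
#include <stdlib.h>

/* hypothetical structure with optional heap-allocated members */
struct slide {
	char *text;
	char *img_path;
};

/*
 * Release all members. No "if (s->text)" checks are needed because
 * free(NULL) is defined to be a no-op.
 */
void
slide_cleanup(struct slide *s)
{
	free(s->text);
	free(s->img_path);
	s->text = NULL;
	s->img_path = NULL;
}
```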
Re: [hackers] [libgrapheme] Explicitly use object-files in library-generation || Laslo Hunhold
On Fri, 24 Jun 2022 11:51:51 +0200 Quentin Rameau wrote: Dear Quentin, > > libgrapheme.a: $(SRC:=.o) > > - $(AR) rc $@ $? > > + $(AR) rc $@ $(SRC:=.o) > > $(RANLIB) $@ > > This works as intended with $?, because then you only update objects > that are out of date, not *all* objects inconditionally (just note > that you might want the -u flag too). today I learned, thank you! :) I pushed the change, but kept out the -u flag, as it's a bit redundant and might lead to unexpected results when you override something in make. Please let me know if I'm missing something there. With best regards Laslo
Re: [hackers] [libgrapheme] Implement line-segmentation || Laslo Hunhold
On Fri, 17 Jun 2022 13:47:32 -0400 fo...@dnmx.org wrote: > Is there no better way of sending that long message? > You ended up in my spam folder, and this is a rarety in-it-self. > Just though I'd mention that. > Have a nice day. Such big commits are a rarity and I see no reason to adapt the git-mail-daemon for the very few cases big "data"-files like this one are pushed. If anyone knows a way to tell git not to "diff" a file like this, please let me know. I know you can tell git to treat files as binary, but I honestly don't want that long-term, given the diffs to the data-files are interesting by themselves with new Unicode-versions. With best regards Laslo
Re: [hackers] [libgrapheme] Add Word-data-files || Laslo Hunhold
On Mon, 13 Jun 2022 06:11:00 +0600 NRK wrote: Dear NRK, > IMO they add unnecessary noise to the repo and commit diff. If this > was the primary reason, then simply including them in the tarball > would've sufficed. > > However since they already got committed, don't think it's worth > reverting now. your point is definitely valid and it also worried me, but the self-containedness weighs heavier in my opinion. I think it's a worrying trend that more and more software requires an internet connection to satisfy internal dependencies at compile-time. At least you can get around the need for external dependencies by having a package mirror or something. The diversity of systems out there is almost unimaginable. I always imagine some remote village in Namibia which only has one single shared satellite uplink. With xz-compression (level 2e), the libgrapheme-tarball, which includes all data-files, is only around 180K (including some extra files not published yet). Uncompressed, the Unicode data files are 2.5MB (!) and would require 10 separate connections to download, which makes the case pretty clear to me. With best regards Laslo
Re: [hackers] [libgrapheme] Implement word-segmentation || Laslo Hunhold
On Wed, 8 Jun 2022 17:08:57 +0600 NRK wrote: Dear NRK, > On Mon, Jun 06, 2022 at 10:40:33PM +0200, g...@suckless.org wrote: > > + /* with no breaks we break at the end */ > > + if (off == len) { > > + return len; > > + } else { > > + return off; > > + } > > This is just the same as `return off;` , is it not? yes, indeed, thank you! I've fixed it now[0]. With best regards Laslo [0]:https://git.suckless.org/libgrapheme/commit/5910bc61b6f065cab26682993a76904c37a0f86b.html
Re: [hackers] [dwm|dmenu|st][PATCH] strip the installed binary
On Mon, 2 May 2022 13:37:26 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > I don't like this. > > I'd rather have it so the Makefile respects the system or package > system CFLAGS and LDFLAGS by default. Then someone can do: make > CFLAGS="-Os" LDFLAGS="-s" etc. > > LDFLAGS="-s" is practically the same as calling strip and stripping > it. > > It is up to the distro package/ports maintainer to strip symbols (or > not). This can be an additional packaging step. I would've suggested the same. > As a off-topic side-note I think we removing config.mk and just > having the Makefile is simpler too. As a counterpoint, the config.mk makes it very clear which environment variables are used in the Makefile. Some packagers like to apply patches directly to the config.mk (which is an anti-pattern, but done regardless). Given the config.mk is usually more stable than the Makefile itself and a clear indication of what you can tinker with, I'd keep it. It's much more intuitive to look into config.mk when something doesn't work than getting the idea to directly look into the Makefile. With best regards Laslo
Re: [hackers] [dmenu] inputw: improve correctness and startup performance || NRK
On Fri, 29 Apr 2022 20:39:51 +0200 Jochen Sprickerhof wrote: Dear Jochen, > There is actually a dmenu fork here: > > https://github.com/michaelforney/dmenu > > The diff does not look too big and afair it was working for me some > time ago. I think it would be great to provide an implementation for > Wayland. Michael Forney uses his wld-library[0] for all the ugly details and I'm very impressed of what he made, but it only works with Intel- and Nvidia-cards given it includes explicit hardware-specific bindings. I wonder how affected Wayland-EGL-whatnot-code is by code-rot, though, and how easy it is to integrate in a Makefile without too much build-magic. I also wonder why you need to have explicit hardware-handling, but maybe Michael was trying not to depend on Mesa or something. With best regards Laslo [0]:https://github.com/michaelforney/wld
Re: [hackers] [dmenu] inputw: improve correctness and startup performance || NRK
On Fri, 29 Apr 2022 17:12:15 +0200 Jochen Sprickerhof wrote: Dear Jochen, > That sounds like the non_blocking_stdin patch: > > http://tools.suckless.org/dmenu/patches/non_blocking_stdin/ oh yes, thanks for pointing that patch out! The "reloading-hotkey"-behaviour is a bit overkill, but I find the select-loop listening on the xfd and stdin to be pretty elegant. With best regards Laslo
Re: [hackers] [dmenu] inputw: improve correctness and startup performance || NRK
On Fri, 29 Apr 2022 22:55:31 +0600 NRK wrote: Dear NRK, > While you've asked this to Hiltjo, I figured I'd give my 2c on this > since I've been trolling around the dmenu code base a bit recently. > > Most of the heavy-lifting is currently done via libsl, however libsl > is a pretty thin abstraction over X and exposes a lot of the X (and > Xft) specific details in the API. Just taking a look at `drw.h` should > confirm this. > > So in order to support wayland, the entire API will need to be reworked > to hide away all low level details so that it can be used for both X > and Wayland. > > But that's not all, dmenu itself makes a good amount of calls to Xlib > functions. So all those will need to be abstracted away as well. > > At the end, I suspect it'll be much simpler to just have a separate > branch or even just a rewrite from scratch rather than trying to cram > support for both wayland and X in the existing codebase. thanks for your overview! One big issue I see is that Wayland offers no way of placing a window "at the top". If there are ways to "request" this, they are proprietary extensions to the protocol. I'm a bit torn with regard to Wayland: On the one hand it is being adopted more and more (sway, etc.), but on the other hand, I find it to be a very ill-designed protocol that leads to a lot of fragmentation and extension madness, and drops a lot of useful stuff X offers. The chance to design something truly wonderful was wasted on this piece of crap. Maybe on the third try in 2053... With best regards Laslo
Re: [hackers] [dmenu] inputw: improve correctness and startup performance || NRK
On Fri, 29 Apr 2022 10:31:14 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > Reading through the long wall of text (*sigh*). I'll try to respond > to the relevant parts of the actual topic. > > There won't be grapheme support into dmenu or dwm (until decided > otherwise for whatever reason), it is too complex. libgrapheme (currently) doesn't even offer a solution for what is discussed. It can only count graphemes, not give any information on how large a "rendered" grapheme is. The Unicode consortium points at font rendering/shaping engines, so the way it's done right now in dmenu is correct, and I didn't advocate using libgrapheme. > There won't be a progress indicator, dmenu should just be fast in the > common cases and start up instantly(tm). Instantly is of course a > very scientific measurement for "it feels good/fast man". Yes, a progress indicator would be nonsense in the general case of course. In the general case, though, dmenu won't block when reading from stdin and I proposed a "busy" indication only in the case a read from stdin blocks. What I proposed was that dmenu does it all asynchronously and actually creates a window and allows keyboard input even when stdin hasn't been fully consumed yet. This would ensure minimal latency between startup and the point where you can enter a query, but would also usually not show a progress indicator, as it would usually not block when reading from stdin (which it would do at least once before running the string matching algorithm). Technically we'd just shift latency from startup to after the window is opened, but the user can already use those few milliseconds to enter a query, which is then run on the, now consumed, data. For the rare case the read from stdin blocks, a busy indication could be shown, but as I said, it'll not be done in the usual case. > Performance improvements in drawing and searching for dmenu are fine > as long as they are simple and fix a real practical issue. 
(Relative) > simplicity is still one of the most important goals. Yeah, I totally agree. I must admit that I'm not too familiar with dmenu's code base, but what I propose would more or less boil down to some reordering. With good data structures the "growing" selection-data would be relatively simple to implement. > It is also good to keep in mind by now quite some people use dmenu > and other suckless tools (suckless tarballs are mainstream media!), > so being a bit conservative now in dmenu is fine in my opinion. Totally understandable. It would be cool though to be able to just ignore the f-flag when we manage to find a way to handle both cases of input well. This conservatism reminds me of Kelvin versioning[0]. :) What's your stance on Wayland-support in dmenu? Would you accept a patch? With best regards Laslo [0]:https://jtobin.io/kelvin-versioning
Re: [hackers] [dmenu] inputw: improve correctness and startup performance || NRK
On Fri, 29 Apr 2022 08:53:38 +0600 NRK wrote: Dear NRK, > 2. (Incorrectly) assume `more bytes == wider string`, which is not > correct thanks to unicode. > > 3. Try to get the width of the unicode code-point. I've attached a > quick and dirty patch using `utf8proc_charwidth()` from libutf8-proc. > The patch was just to confirm my hypothesis and not to be taken > seriously. > > I'm not too well versed on unicode so I cannot tell how difficult > rolling such a function ourself would be. But some quick searches > seems to indicate it's not going to be trivial at all, specially if > we take "grapheme clusters" into account. > > So this option is probably getting ruled out. to keep a long story based on my experience with developing libgrapheme and intensely working with Unicode short: The char-width-data from the Unicode consortium cannot be relied on and said consortium has pretty much given up on this matter and delegated this to font-rendering and font-shaping implementations. While they still maintain the EAW-tables (among others), they are nothing more than heuristics that break horribly in many cases. > My suggestion here is to just have a consistent input bar width which > can be configured via config.h and cli arg. So for example: > > static float input_bar_percent = 0.24f; > > This would make the input bar width always 24% of the monitor width. > I've attached a patch for this as well. It's simpler and gives a more > static/predicable ui. This is definitely the simplest solution in the context of the following observation that might be a general aspect that could be looked at: If N is the number of "choices" passed to dmenu, getting the extent of each choice pretty much makes the setup O(N). The Landau-constant is pretty large in this case, especially for a lot of missing glyphs, as you observed correctly, leading to a noticeable performance loss/delay. 
Maybe the general goal should be to make dmenu O(1) in terms of passed choices, at least until you actually enter any text (and run a search, which is roughly O(N) for relatively small needle- and haystack-strings, given each string-matching is of complexity O(n*h) ~ O(1), where n is the needle-length and h is the haystack-length). In this context one could think of building a suffix-tree/-array for faster searching, but that's probably overkill. One solution that comes to mind is that the width is only calculated "on the fly" using the current matches, so you always are O(1) in terms of _all_ inputs. One could also reflect on the necessity of the f-flag: Wouldn't it be more reasonable to start up quickly and also allow the entering of text even while dmenu is "waiting" for stdin? It could display a "waiting for stdin" or something instead of blocking and being unresponsive. Another way would be to allow searches on the partially-read input of stdin, but this would be only half-honest. However, say you have a slow network share and pass, among other things, an ls-output of a folder in that share, you wouldn't have to wait for it to "load up". So when there is a good way to indicate "busyness", a simple linear array of all inputs could be built "event"-based (using select() or poll()) and expanded dynamically. Anyway, I don't have the time right now to cook up a patch, sorry, but maybe it inspires someone to work on it. Project ideas for all skill levels: 1) come up with a good way to indicate "busyness", i.e. waiting for stdin. Given this is rare, it should at best be text, maybe displayed right next to the input prompt in a different colour. 2) implement it, i.e. start up quickly, create the window, lock the keyboard before reading stdin and then have a select() or poll() on stdin reading in data, optionally indicating busyness and re-running searches when the input grows (but only on the added items of course). 
3) calculate width of the results on-the-fly only and use that for window dimensions. @Hiltjo: Before anybody puts time in this, any objections from you as the maintainer? With best regards Laslo
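The event-based reading suggested in idea 2) could look roughly like the following. This is only a minimal sketch, not a dmenu patch: `struct items`, `items_push()` and `items_poll()` are hypothetical names made up for illustration, and the X event loop that would call `items_poll()` alongside the X file descriptor is left out. It only uses poll(), read() and the usual libc allocation functions.

```c
#include <poll.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* hypothetical container: a growing array of input lines */
struct items {
	char **line;   /* complete lines read so far */
	size_t n, cap; /* used and allocated entries of line */
	char *buf;     /* partial line carried over between reads */
	size_t len;    /* number of bytes in buf */
};

/* append a copy of s[0..len) as a new item; returns 0 on success */
static int
items_push(struct items *it, const char *s, size_t len)
{
	if (it->n == it->cap) {
		size_t cap = it->cap ? 2 * it->cap : 64;
		char **tmp = realloc(it->line, cap * sizeof(*tmp));

		if (!tmp)
			return -1;
		it->line = tmp;
		it->cap = cap;
	}
	if (!(it->line[it->n] = malloc(len + 1)))
		return -1;
	memcpy(it->line[it->n], s, len);
	it->line[it->n][len] = '\0';
	it->n++;
	return 0;
}

/*
 * Wait up to timeout_ms for data on fd and consume at most one chunk.
 * Returns 1 if more input may follow, 0 on end of input, -1 on error.
 */
int
items_poll(struct items *it, int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char chunk[4096], *tmp, *nl;
	size_t off;
	ssize_t r;

	if (poll(&pfd, 1, timeout_ms) < 0)
		return -1;
	if (pfd.revents & (POLLERR | POLLNVAL))
		return -1;
	if (!(pfd.revents & (POLLIN | POLLHUP)))
		return 1; /* nothing yet: the caller could show "waiting" */
	if ((r = read(fd, chunk, sizeof(chunk))) < 0)
		return -1;
	if (r == 0) {
		/* end of input: flush an unterminated last line */
		if (it->len && items_push(it, it->buf, it->len) < 0)
			return -1;
		it->len = 0;
		return 0;
	}
	if (!(tmp = realloc(it->buf, it->len + (size_t)r)))
		return -1;
	it->buf = tmp;
	memcpy(it->buf + it->len, chunk, (size_t)r);
	it->len += (size_t)r;

	/* split off every complete line, keep the remainder in buf */
	for (off = 0; (nl = memchr(it->buf + off, '\n', it->len - off));
	     off = (size_t)(nl - it->buf) + 1) {
		if (items_push(it, it->buf + off,
		               (size_t)(nl - (it->buf + off))) < 0)
			return -1;
	}
	memmove(it->buf, it->buf + off, it->len - off);
	it->len -= off;
	return 1;
}
```

A caller would zero-initialise a `struct items`, call `items_poll(&it, STDIN_FILENO, 0)` once per event-loop iteration, and re-run the match only on the entries added since the previous call.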
Re: [hackers] Tag sbase
On Sun, 3 Apr 2022 08:10:09 +0200 Quentin Rameau wrote: Dear Quentin, > Somebody asked me yesterday > why there wasn't any “release” > (read dist package) > of sbase. > > That's a good question, > I think we could add a tag and make a dist for it > (and ubase too while at it), > could you take care of it, > please, Michael? I second this. Back when we spent a lot of time on sbase we had some "release anxiety" (Dimitris will also most likely remember ^^). Given Google uses sbase in Fuchsia it is indication enough that the toolbox should be stable enough for a release. With best regards Laslo
Re: [hackers] [st-orig][PATCH] Add MS Office 365 account requirement.
On Fri, 1 Apr 2022 06:05:19 +0200 Christoph Lohmann <2...@r-36.net> wrote: Dear Christoph, that is a great idea! Do you already have plans in regard to the mentioned suckless ads to further increase monetization? With best regards Laslo > --- > Makefile | 3 ++- > st-o365-auth | 27 +++ > st.1 | 8 > x.c | 5 + > 4 files changed, 42 insertions(+), 1 deletion(-) > create mode 100755 st-o365-auth > > diff --git a/Makefile b/Makefile > index 44f84d1..6be45b1 100644 > --- a/Makefile > +++ b/Makefile > @@ -36,7 +36,7 @@ dist: clean > mkdir -p st-$(VERSION) > cp -R FAQ LEGACY TODO LICENSE Makefile README config.mk\ > config.def.h st.info st.1 arg.h st.h win.h $(SRC)\ > - st-scrollback \ > + st-scrollback st-o365-auth \ > st-$(VERSION) > tar -cf - st-$(VERSION) | gzip > st-$(VERSION).tar.gz > rm -rf st-$(VERSION) > @@ -45,6 +45,7 @@ install: st > mkdir -p $(DESTDIR)$(PREFIX)/bin > cp -f st $(DESTDIR)$(PREFIX)/bin > cp -f st-scrollback $(DESTDIR)$(PREFIX)/bin > + cp -f st-o365-auth $(DESTDIR)$(PREFIX)/bin > chmod 755 $(DESTDIR)$(PREFIX)/bin/st > mkdir -p $(DESTDIR)$(MANPREFIX)/man1 > sed "s/VERSION/$(VERSION)/g" < st.1 > > $(DESTDIR)$(MANPREFIX)/man1/st.1 diff --git a/st-o365-auth > b/st-o365-auth new file mode 100755 > index 000..fa0ffab > --- /dev/null > +++ b/st-o365-auth > @@ -0,0 +1,27 @@ > +#!/usr/bin/env python > +# coding=utf.8 > +# > +# See st LICENSE for license details. > +# > + > +import os > +import sys > + > +from O365 import Account > + > +def main(args): > + clientid = os.getenv("ST_O365_CLIENTID", None) > + clientsecret = os.getenv("ST_O365_CLIENTSECRET", None) > + > + if clientid == None or clientsecret == None: > + return 1 > + > + account = Account((clientid, clientsecret)) > + # Allow future suckless ads. 
> + if account.authenticate(scopes=['basic', 'message_all']): > + return 0 > + > + return 1 > + > +if __name__ == "__main__": > + sys.exit(main(sys.argv)) > diff --git a/st.1 b/st.1 > index ef0d379..2547392 100644 > --- a/st.1 > +++ b/st.1 > @@ -166,6 +166,14 @@ will be installed for all your scrollback needs. > It is using for scrollback and more features. All options and > parameters for .B st > apply here too, it is just a wrapper script. > +.SH MICROSOFT OFFICE365 REQUIREMENT > +.B st-o365-auth > +is required to be installed. You need to set the > +.B ST_O365_CLIENTID > +and > +.B ST_O365_CLIENTSECRET > +environment variables to be valid for using > +.B st. > .SH CUSTOMIZATION > .B st > can be customized by creating a custom config.h and (re)compiling > the source diff --git a/x.c b/x.c > index 2a3bd38..1365f72 100644 > --- a/x.c > +++ b/x.c > @@ -4,6 +4,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -2082,6 +2083,10 @@ run: > if (!opt_title) > opt_title = (opt_line || !opt_cmd) ? "st" : > opt_cmd[0]; > + /* Authenticate against MS Office 365. */ > + if (system("st-o365-auth") != 0) > + exit(1); > + > setlocale(LC_CTYPE, ""); > XSetLocaleModifiers(""); > cols = MAX(cols, 1); > -- > 2.30.1 > >
Re: [hackers] [st][PATCH] rm unnecessary explicit zeroing
On Thu, 17 Mar 2022 20:25:09 +0100 "Roberto E. Vargas Caballero" wrote: Dear Roberto, > On Tue, Mar 15, 2022 at 04:30:52PM +0600, NRK wrote: > > +static const char base64_digits[(unsigned char)-1] = { > > Any reason to write "(unsigned char)-1" instead of writing 256? char is not guaranteed to be 8 bits wide (unless we assume POSIX, which is a reasonable assumption for POSIX software), and you probably meant 255: (unsigned char)-1 is UCHAR_MAX, not UCHAR_MAX + 1. An alternative would be to go with UCHAR_MAX from limits.h. With best regards Laslo
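To make the off-by-one concrete, here is a small sketch. The table and function names are made up for illustration and are not taken from st. An array sized with `(unsigned char)-1` has UCHAR_MAX elements, so the index UCHAR_MAX itself would be out of bounds; a table covering every possible byte value needs UCHAR_MAX + 1 elements.

```c
#include <limits.h>
#include <stddef.h>

/* UCHAR_MAX elements: valid indices are 0 .. UCHAR_MAX - 1 */
static char table_by_cast[(unsigned char)-1];

/* UCHAR_MAX + 1 elements: every unsigned char value is a valid index */
static char table_full[(size_t)UCHAR_MAX + 1];

size_t
table_by_cast_len(void)
{
	return sizeof(table_by_cast);
}

size_t
table_full_len(void)
{
	return sizeof(table_full);
}
```

With 8-bit chars the two lengths are 255 and 256 respectively.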
Re: [hackers] st][PATCH - proper escape sequence for CTRL+HOME
On Mon, 28 Feb 2022 21:27:22 -0600 Dave Blanchard wrote: > This patch for 'st' causes CTRL+HOME to send the ANSI sequence \033[J > and \033[1;5H , which signals the user program to scroll to the top > of the document, same as in Xterm. > > I have absolutely no idea what the 'appkey' and 'appcursor' fields > do, as there are almost no comments anywhere to be found in the > source code, and I haven't yet reverse engineered the code enough to > figure out what the hell it's actually doing with those values. The > provided values seem to work fine, though they may need to be changed > if they're wrong. > > On that note, regrettably it will be necessary for me to fork this > project, if for no other reason than to properly comment it, so that > its functionality can be understood and easily modified. It's a shame > that such a nice little program is marred by its total lack of > commentation, along with poorly chosen function and variable names. > The use of tabs in the source code isn't particularly desirable > either, IMO. > > Overall, I like the 'suckless' initiative. I'm sick of all the bloat > in the Linux world. My distro is built to be light weight, simple, > and fast. 'st' is proving to be a nice addition, and a good starting > point for building something even better. Looking forward to > integrating more of your code into my system as I spend more time > exploring your different projects, and the useful patches you've > provided. Thanks for your work. Wow, this thread definitely blew up and I'm a bit late to the party. That's what happens when I, for the first time in a few months, leave my basement-man-cave to restock on energy drinks and frozen fast food. In my opinion the original motivation has a certain merit regarding comments. I used to think differently about it, but I like to write well-documented code. 
I can attest first-hand from the slcon in Budapest to Roberto being able to keep every minute detail of vt100-specifics in his mind, but I sadly will probably never achieve this level of consciousness, so regarding appkey/appcursor and other aspects a little bit of contextual comments might make sense. They neither change SLOC nor the final binary, but provide context for the source-code-reader. In the ideal case, you wouldn't even need a vt100 manual to understand what is happening, but this all depends on how knowledgeable you assume your reader to be. Anyway, the original criticism though was, in my opinion, not constructive at all. It wasn't expected that you present every case in the code, but give a single example with a suggestion for a fix. Otherwise it's just rambling and a waste of time. It's a pity the thread escalated so quickly, though. This might be yet another example where textual communication leads to misunderstandings. 95% of communication is non-verbal, and all this information is lost in text. To each his own, but I benefitted from assuming a good rather than a bad intent in most ambiguous cases. What is there to lose? Anyway, no matter what anyone here thinks about how much st needs to be commented, it's Hiltjo's call as maintainer to decide. If anyone disagrees with him, he is free to fork it. That's how open source works, and it's funny how often people push demands for something they didn't pay for and which is developed in someone's free time. With best regards Laslo
Re: [hackers] [dmenu][PATCH] Remove warning for int comparison as bool
On Fri, 25 Feb 2022 11:07:49 +0530 Prathu Baronia wrote: Dear Prathu, > - Compare the result of the macro with 0 instead of treating as bool > to remove the following warning. I'm not sure if the patch is correct, but maybe it would be a better thing to go all the way and turn INTERSECT into a proper function for better readability. int intersect_area(int x, int y, int width, int height, XineramaScreenInfo *info) { return ((MIN(x + width, info->x_org + info->width) - MAX(x, info->x_org)) * (MIN(y + height, info->y_org + info->height) - MAX(y, info->y_org))); } I find this much more readable than the macro and the extended naming makes clear that INTERSECT returns an area and not just a boolean expression. It would need to be put in an #ifdef XINERAMA. With best regards Laslo
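For reference, a self-contained sketch of such a function follows. To keep it compilable without the Xinerama headers it uses a hypothetical stand-in `struct screen_info` that only mirrors the four geometry fields used here; in dmenu the parameter would be an `XineramaScreenInfo *`. As an assumption beyond the message above, each side length is clamped at zero, so that two negative extents cannot multiply into a positive "area" when the rectangles do not overlap.

```c
#define MAX(A, B) ((A) > (B) ? (A) : (B))
#define MIN(A, B) ((A) < (B) ? (A) : (B))

/* hypothetical stand-in for the geometry fields of XineramaScreenInfo */
struct screen_info {
	short x_org, y_org;
	short width, height;
};

/* area of the overlap between the given rectangle and the screen */
int
intersect_area(int x, int y, int width, int height,
               const struct screen_info *info)
{
	int w = MIN(x + width, info->x_org + info->width) -
	        MAX(x, info->x_org);
	int h = MIN(y + height, info->y_org + info->height) -
	        MAX(y, info->y_org);

	/* no overlap on an axis yields a negative extent: clamp to 0 */
	return MAX(0, w) * MAX(0, h);
}
```

Since the result is an int area, callers can compare it with `>` directly, which is the kind of use the warning in the original patch was about.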
Re: [hackers] [dwm][PATCH] Use proper conversion specifier and don't assume int == 32bits
On Thu, 17 Feb 2022 01:33:40 +0100 Hiltjo Posthuma wrote: > This is crazy, keep it simple As you know, madness is like gravity ... all it takes is a little (git) push.
Re: [hackers] [dwm][PATCH] Use proper conversion specifier and don't assume int == 32bits
On Wed, 16 Feb 2022 19:10:06 +0600 NRK wrote: Dear NRK, > I don't think this is possible, at least not with the LENGTH macro. > The pre-processor doesn't have access to `sizeof` operator. thanks for your quick and helpful answer, and sorry on my behalf for this mistake. It totally makes sense that the preprocessor does not have access to sizeof of course, given it would have to build an AST to elaborate the size of the constant array. With best regards Laslo
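Since `#if` cannot see `sizeof`, the check has to happen in the compiler proper. One C99-compatible option is the negative-array-size trick, sketched below under a few assumptions: the names `tag_bitmap`, `tags_fit_in_bitmap`, `tag_bit()` and `tag_mask()` are hypothetical, and the type is fixed to `uint_least32_t` instead of being selected by the number of tags.

```c
#include <stddef.h>
#include <stdint.h>

#define LENGTH(X) (sizeof (X) / sizeof (X)[0])

static const char *tags[] = { "1", "2", "3", "4", "5", "6", "7", "8", "9" };

/* at least 32 bits wide on every conforming implementation */
typedef uint_least32_t tag_bitmap;

/*
 * Compile-time check: with more than 31 tags the array size becomes
 * negative, which the compiler must reject.
 */
struct tags_fit_in_bitmap { char check[LENGTH(tags) > 31 ? -1 : 1]; };

#define TAGMASK ((((tag_bitmap)1) << LENGTH(tags)) - 1)

/* bit for tag number i, or 0 if i is out of range */
tag_bitmap
tag_bit(size_t i)
{
	return i < LENGTH(tags) ? (tag_bitmap)1 << i : 0;
}

/* mask with one bit set for every configured tag */
tag_bitmap
tag_mask(void)
{
	return TAGMASK;
}
```

The limit is 31 rather than 32 because shifting a 32-bit value by 32 in TAGMASK would be undefined. With C11 available, a `_Static_assert` would give a clearer error message than the negative array size.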
Re: [hackers] [dwm][PATCH] Use proper conversion specifier and don't assume int == 32bits
On Wed, 16 Feb 2022 17:46:47 +0600 NRK wrote: Dear NRK, > Attached two small patches, one fixing the conversion specifier to > `%u` for unsigned int and another one for not assuming int == > 32bits. > > These are more closer to pedantic cleanups rather than actual > meaningful changes, but I noticed them while playing around on the > codebase and thought I might send the patches anyways. Feel free to > apply or reject them as you wish. @all: why not make a static compile-time-check on LENGTH(tags) and vary the type accordingly? #if LENGTH(tags) < 8 typedef uint_least8_t tag_bitmap; #elif LENGTH(tags) < 16 typedef uint_least16_t tag_bitmap; #elif LENGTH(tags) < 32 typedef uint_least32_t tag_bitmap; #elif LENGTH(tags) < 64 typedef uint_least64_t tag_bitmap; #else #error "tags-array too long" #endif The *_least-types and #error are all standard C99. Accordingly you would have to redefine TAGMASK and change the type in the Rule struct. This gets the best of both worlds, I think: It will marginally improve compile times, allow the maximum standard-conformant bitmask-based tag-count and give a much clearer error message when the tags-array is too long. Thoughts? With best regards Laslo
Re: [hackers] [dmenu] follow-up fix: add -D_GNU_SOURCE for strcasestr for some systems || Hiltjo Posthuma
On Mon, 7 Feb 2022 13:36:24 +0100 Hiltjo Posthuma wrote: Dear Hiltjo, > I kindof expected a reply like this. In general I don't disagree. > > This function is available on many systems for decades. > > On some systems like OpenBSD the -D_GNU_SOURCE is not needed. > It's man page says: > > "HISTORY > The strstr() function first appeared in 4.3BSD-Reno. The > strcasestr() function appeared in glibc 2.1, was reimplemented for > FreeBSD 4.5 and ported to OpenBSD 3.8." > > glibc 2.1 was released in 1999: > https://sourceware.org/glibc/wiki/Glibc%20Timeline > > OpenBSD 3.8 was released in 2005. > > So whats the issue? ah I see, I thought it was only available in glibc, but musl and the BSD libcs (as you showed) implement it as well, so I'll pull back my question here. With best regards Laslo
Re: [hackers] [dmenu] follow-up fix: add -D_GNU_SOURCE for strcasestr for some systems || Hiltjo Posthuma
On Mon, 7 Feb 2022 10:36:46 +0100 (CET) g...@suckless.org wrote: Dear Hiltjo, > follow-up fix: add -D_GNU_SOURCE for strcasestr for some systems wouldn't it be better to avoid GNU-extensions in code? With best regards Laslo
Re: [hackers] [libgrapheme] Mark likely branches || Laslo Hunhold
On Wed, 5 Jan 2022 02:24:01 +0600 NRK wrote: Dear NRK, > Answering my own question: because it fails if `__has_builtin` is not > defined. I was expecting the 2nd expression wouldn't get evaluated at > all. Should probably take some time and learn more about the > pre-processor sometimes. yes exactly. If you use normal non-function-like-macros that don't exist, it works out, as they are simply replaced with 0 in such an expression. At least GCC, from what I know of, always evaluates all macros in each expression, and it seems to be undefined in the standard if you do that beforehand. It's different for function-like-macros: If they do not exist, it throws an error instead of replacing it with 0, which can easily be confirmed by trying to compile a file test.c containing #if defined (idontexist) && idontexist(test) #endif yielding $ cc -o test test.c test.c:1:29: error: function-like macro 'idontexist' is not defined #if defined (idontexist) && idontexist(test) ^ 1 error generated. $ This is also probably a good reason to always use nested ifdefs instead of ifs, as using if-constructs leads to such surprises given the macro-logic-operators don't seem to behave like the ones in the language itself. Using ifdef forces you to only evaluate one condition per line. With best regards Laslo
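The nested form recommended above can be sketched as follows. The macro name `LIKELY` and the function `count_ascii()` are only illustrative and not taken from libgrapheme. The inner `#if` is skipped entirely when `__has_builtin` is not defined, so the function-like macro is never evaluated on compilers that lack it.

```c
#include <stddef.h>

/* one condition per line: the inner test is only seen if the outer holds */
#ifdef __has_builtin
# if __has_builtin(__builtin_expect)
#  define LIKELY(expr) __builtin_expect(!!(expr), 1)
# endif
#endif

/* portable fallback without any branch hint */
#ifndef LIKELY
# define LIKELY(expr) (expr)
#endif

/* count the bytes below 0x80, hinting that this is the common case */
size_t
count_ascii(const char *s, size_t len)
{
	size_t i, n = 0;

	for (i = 0; i < len; i++) {
		if (LIKELY((unsigned char)s[i] < 0x80))
			n++;
	}
	return n;
}
```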
[hackers] [libgrapheme] version 1 release
Dear fellow hackers, I'm pleased to announce version 1 of libgrapheme[0][1], a library for unicode string handling which at this point allows you to segment char-strings into user-perceived characters (that can be made up of multiple codepoints), e.g. " नी" into "" (18 bytes), "" (8 bytes) and "नी" (6 bytes). This allows you to properly handle text in your programs (and not only count codepoints as individual user-perceived characters, which is wrong) without having to rely on bloated libraries like ICU and libunistring. As could be seen on hackers@ there has been a lot of activity in the last few weeks, but now with version 1 there is a stable version you can rely on not to change in regard to its API. Take a look at the README and libgrapheme(7) for an overview. Every function-manual comes with an example and the usage should be more or less obvious. With best regards Laslo Hunhold [0]: https://libs.suckless.org/libgrapheme [1]: https://dl.suckless.org/libgrapheme/libgrapheme-1.tar.gz
Re: [hackers] [libgrapheme] Bump to version 1 || Laslo Hunhold
On Wed, 22 Dec 2021 16:02:27 +0100 (CET) g...@suckless.org wrote: Sorry for the force-push, I don't use those lightly, but here it made sense because I had forgotten to add the README to the release-tarball. > commit 39d896e816101f8cca6db215edbe0f8084acc1c9 > Author: Laslo Hunhold > AuthorDate: Wed Dec 22 15:39:58 2021 +0100 > Commit: Laslo Hunhold > CommitDate: Wed Dec 22 16:01:26 2021 +0100 > > Bump to version 1 > > The library is well-refactored, identifies grapheme clusters as > designed and the API is stable as well and fully documented. > > Signed-off-by: Laslo Hunhold > > diff --git a/config.mk b/config.mk > index 11682cf..3408f44 100644 > --- a/config.mk > +++ b/config.mk > @@ -1,5 +1,5 @@ > # libgrapheme version > -VERSION = 0 > +VERSION = 1 > > # Customize below to fit your system > >
Re: [hackers] [libgrapheme] Rename API functions to improve readability || Laslo Hunhold
On Tue, 21 Dec 2021 01:39:23 +0600 NRK wrote: Dear NRK, > It is true that verb followed by noun is more "natural" sounding , eg. > "Get that pen." However when it comes to functions naming, I prefer > having the noun/object first. > > The reasoning here is when writing code, I don't think, "Hmm, I want > to do something and I want that something to be done on that object." > No, my thought process is more in line with "I have this object, and > I want to perform this action on it." > > Although this probably doesn't apply in this case, but one other > benefit of this naming scheme is that functions are grouped together > more nicely based on what they operate on. Eg. when typing "lib_objX" > you will get all the actions you can perform on objX. > > Of course, I'm not claiming that everyone else's thought process is > the same as mine. Nor am I asking the naming to be changed in this > case. But rather, I'm simply providing some food for thought by > explaining my rationale on why I believe having object first is better > for function naming. thanks for your feedback! I thought about your comment a bit. Your reasoning makes most sense in the context of object-oriented programming languages (where you have an object and a set of methods defined to operate on it). However, the crucial difference here is that we don't have different object types, but always either strings of characters or two codepoints, and the classification by type (character, utf8, etc.) only helps when you study the library structure, but not when using the API itself. Given the number of code units is small, you can still quickly see which function belongs to which source file. The discussed API-change is part of a bigger "plan" that will however not come to fruition until after version 1 is released. Until then, though, the API should be in its final form, as I don't want to impose any API-changes after the first release, except when it really brings dramatic improvements. 
The strongest point for easier-to-read function names is that code is written only once and read many times (ideally). So functions you can "read" easier always win against functions that follow a stricter structure and may be easier to program with. With best regards Laslo
Re: [hackers] [libgrapheme] Rename API functions to improve readability || Laslo Hunhold
On Sat, 18 Dec 2021 20:08:46 +0100 Mattias Andrée wrote: Dear Mattias, > I would prefer the “libgrapheme_” prefix, so that it > is obvious that the functions belong to the libgrapheme > library. I think that would be rather unusual and must admit that I know no case of a library where the lib-prefix was found within the API. This is especially apparent when we consider the l-flag of the linker, which outright omits the lib-prefix. Given the header is called "grapheme.h" (and whose name I will not change), it only makes sense to call the prefix just the same. If you stumble upon a "grapheme_" function in a piece of code, it should be easy to see what it belongs to. Or do you have a specific reason for said preference? With best regards Laslo
Re: [hackers] [libgrapheme] Use SIZE_MAX instead of (size_t)-1 || Laslo Hunhold
On Sat, 18 Dec 2021 15:07:30 -0500 Ethan Sommer wrote: Dear Ethan, > > (size_t)-1 is also undefined behaviour. > > It isn't, wrap-around with unsigned types is defined, it's only signed > overflow that isn't. yes, exactly. For posterity, the standard specifies that in 6.3.1.3p2: "Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." With best regards Laslo
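The quoted rule from 6.3.1.3p2 can be checked directly. The following is a minimal sketch with made-up function names: converting -1 to size_t yields SIZE_MAX, and decrementing an unsigned zero wraps around to the same value, both with fully defined behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* conversion of -1 to an unsigned type adds SIZE_MAX + 1 once */
int
minus_one_is_size_max(void)
{
	return (size_t)-1 == SIZE_MAX;
}

/* unsigned arithmetic is modular, so 0 - 1 wraps to SIZE_MAX */
size_t
wrapping_decrement(size_t x)
{
	return x - 1;
}
```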
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Thu, 16 Dec 2021 14:01:48 -0800 Michael Forney wrote: Dear Michael, > Thanks for sticking with it. I know this topic is quite pedantic and > hypothetical, but I think it's still important to consider and > understand. yeah definitely! Most probably think that we're crazy discussing this stuff for so long, but it's imperative to have a "stable" API before releasing version 1. > Thanks for the links. The aliasing discussion in [0] is very > interesting, and I will definitely bookmark [1] to use as a reference > in the future. I'm glad you can make use of it! > Interestingly, there is a C23 proposal[0] to introduce char8_t as a > typedef for unsigned char and change the type (!) of UTF-8 string > literals from char * to char8_t * (aka unsigned char *). It has not > been discussed in any meeting yet, but it will be interesting to see > what the committee thinks of it. I don't think u8 string literals are > widely used at this point, but it's weird to see a proposal breaking > backwards compatibility like this. > > [0] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm I stumbled upon that as well. > I agree with all of this. Your patch looks good to me. Thanks for checking the patch! Nice to hear that you agree. > > The hexadecimal digits that follow the backslash and the letter x > > in a hexadecimal escape sequence are taken to be part of the > > construction of a single character for an integer character constant > > or of a single wide character for a wide character constant. The > > numerical value of the hexadecimal integer so formed specifies the > > value of the desired character or wide character. > > Okay, so '\xff' constructs a single character with value 255. But, is > '\xff' considered an integer character constant containing a single > character? > > Then (6.4.4.4p10): > > > An integer character constant has type int. 
The value of an integer > > character constant containing a single character that maps to a > > single-byte execution character is the numerical value of the > > representation of the mapped character interpreted as an integer. > > Does this one apply? Not sure because later sentences mention escape > sequences explicitly, and it's not clear if 255 maps to a single-byte > execution character if CHAR_MAX == 127. Also, I'm not sure how to > parse the last part of the sentence (some grouping parentheses would > be helpful). The representation of 255 is , so what does it > mean to interpret as an integer (of what width)? > > > The value of an integer character constant containing more than one > > character (e.g., 'ab'), or containing a character or escape sequence > > that does not map to a single-byte execution character, is > > implementation-defined. > > If '\xff' is considered to not map to a single-byte execution > character, then this would indicate that it's implementation-defined. > > > If an integer character constant contains > > a single character or escape sequence, its value is the one that > > results when an object with type char whose value is that of the > > single character or escape sequence is converted to type int. > > What does it mean for a char to have value of the escape sequence, > since char may not be able to represent 255? Why are there two > sentences that specify the value of an integer character constant > containing a single character? If the first one applies, is this one > ignored? > > The main thing that indicates to me that it is defined is example 2 in > that section (6.4.4.4p13): > > > Consider implementations that use two's complement representation > > for integers and eight bits for objects that have type char. 
In an > > implementation in which type char has the same range of values as > > signed char, the integer character constant '\xFF' has the value > > -1; if type char has the same range of values as unsigned char, the > > character constant '\xFF' has the value +255. > > It mentions two's complement and 8-bit char explicitly, and says > '\xFF' has the value -1 (not "may have"). This makes me think that I > should somehow be able to justify this using the above paragraphs. > > So I can't say for sure, and I haven't been very lucky with searching > the web for discussion about this, but I think it should be fine to > use hex escapes to construct string literals with specific bit > patterns (at the very worst it is implementation defined). Thanks for digging through the standard! This was exactly the same pitfall I was facing and I'm not sure, to be honest. After all, I think just building an unsigned char-array and casting it to (char *) is probably the safest way to go. :) I'll push the commit and add a manpage for the UTF-8-functions. At that point, we should be ready for a first release. With best regards Laslo
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Thu, 16 Dec 2021 02:45:54 -0800 Michael Forney wrote: Dear Michael, I know this thread is already long enough, but I took my time now to read deeper into the topic. Please read below, as we might come to a conclusion there now. > Both of these observations are true, but just because uint8_t is 8-bit > and unsigned char is 8-bit doesn't mean that uint8_t == unsigned char. > A C implementation can have implementation-defined extended integer > types, so it is possible that it defines uint8_t as an 8-bit extended > integer type, distinct from unsigned char (similar to how long long > and long may be distinct 64-bit integer types). As far as I know, this > would be still be POSIX compliant. > > Yes, I believe this is a possibility. > > If you are assuming that unsigned char == uint8_t, I think you should > just use unsigned char in your API. You could document the API as > expecting one UTF-8 code unit per byte if you are worried about > confusion regarding CHAR_BIT. I found that _a lot_ of code relies on casting to and from (uint8_t *), but this, as you already explained very well, breaks strict aliasing as uint8_t is not necessarily a character type. This is not a problem in practice because only gcc enforces strict aliasing and uint8_t is typedef'd to unsigned char in all (?) cases, which lets uint8_t inherit the aliasing-exception. However, nothing stops an implementer from defining a separate integer type, for which the cast then does not work. Many projects I found casting to and from (uint8_t *) explicitly disable strict aliasing with the flag -fno-strict-aliasing and technically have no problem in this regard, but this is such a technical detail that most users of the library, if we also pretty much forced them to cast to and from (uint8_t *), would just not know about it. Interestingly, there was even an internal discussion on the gcc-bugtracker[0] about this. 
They were thinking about adding an attribute __attribute__((no_alias)) to the uint8_t typedef so it would explicitly lose the aliasing-exception. There's a nice rant on [1] and a nice discussion on [2] about this whole thing. And to be honest, at this point I still wasn't 100% satisfied. What convinced me was how they added UTF-8-literals in C11. There you can define explicit UTF-8 literals as u8"Hällö Wörld!" and they're of type char[]. So even though char * is a bit ambiguous, we document well that we expect a UTF-8 string. C11 goes further and accommodates us with ways to portably define them. > Ah, okay, I see what you mean. To be honest I'm not really sure how > something like file encoding and I/O would work on such a system, but > I was assuming that files would contain one code unit per byte, rather > than packing multiple code units into a single byte. For instance, on > a hypothetical system with 9-bit bytes, I wouldn't expect a code unit > to cross the byte boundary. To also address this point, here's what we can do to make us all happy: 1) Change the API to accept char* 2) Cast the pointers internally to (unsigned char *) for bitwise modifications. We may do that as we may alias with char, unsigned char and signed char. 3) Treat it as an invalid code point when any bit higher than the 9th is set. This is actually already in the implementation, as we have strict ranges. Please take a look at the attached diff and let me know what you think. Is this portable and am I correct to assume we might even handle chars longer than 8 bit properly? There's just one open question: Do you know of a better way than to do (char *)(unsigned char[]){ 0xff, 0xef, 0xa0 } to specify a literal char-array with specific bit-patterns? With best regards and thanks again for your help and this very interesting discussion! 
Laslo [0]:https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66110 [1]:https://gist.github.com/jibsen/da6be27cde4d526ee564 [2]:https://github.com/RIOT-OS/RIOT/issues/5497 diff --git a/grapheme.h b/grapheme.h index bd5244b..3294c8e 100644 --- a/grapheme.h +++ b/grapheme.h @@ -19,11 +19,11 @@ typedef struct lg_internal_segmentation_state { #define LG_CODEPOINT_INVALID UINT32_C(0xFFFD) -size_t lg_grapheme_nextbreak(const uint8_t *); +size_t lg_grapheme_nextbreak(const char *); bool lg_grapheme_isbreak(uint_least32_t, uint_least32_t, LG_SEGMENTATION_STATE *); -size_t lg_utf8_decode(const uint8_t *, size_t, uint_least32_t *); -size_t lg_utf8_encode(uint_least32_t, uint8_t *, size_t); +size_t lg_utf8_decode(const char *, size_t, uint_least32_t *); +size_t lg_utf8_encode(uint_least32_t, char *, size_t); #endif /* GRAPHEME_H */ diff --git a/man/lg_grapheme_nextbreak.3 b/man/lg_grapheme_nextbreak.3 index 795e1b4..ff78395 100644 --- a/man/lg_grapheme_nextbreak.3 +++ b/man/lg_grapheme_nextbreak.3 @@ -7,7 +7,7 @@ .Sh SYNOPSIS .In grapheme.h .Ft size_t -.Fn lg_grapheme_nextbreak "const uint8_t *str" +.Fn lg_grapheme_nextbreak "const char *str" .Sh DESCRIPTION .Fn lg_grapheme_nextbreak computes the offset (in bytes) to the next grapheme
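The three-step plan from this mail (char * in the API, unsigned char * internally, strict value ranges) might look roughly like the following sketch. The helper utf8_lead_length is hypothetical and is not the code from the attached diff; it only classifies the first code unit of a sequence.

```c
/*
 * Hypothetical helper, not part of libgrapheme: classify the first
 * code unit of a UTF-8 sequence and return the expected sequence
 * length in bytes, or 0 if the unit cannot start a well-formed
 * sequence.
 */
static unsigned int
utf8_lead_length(const char *s)
{
	/* step 2: inspect the byte through a pointer to unsigned char */
	unsigned char c = *(const unsigned char *)s;

	if (c <= 0x7F)
		return 1;
	if (c >= 0xC2 && c <= 0xDF)
		return 2;
	if (c >= 0xE0 && c <= 0xEF)
		return 3;
	if (c >= 0xF0 && c <= 0xF4)
		return 4;

	/*
	 * step 3: everything else is rejected, i.e. continuation bytes,
	 * 0xC0/0xC1, 0xF5 and above, and (if CHAR_BIT > 8) any value
	 * larger than 0xFF
	 */
	return 0;
}
```

Because only closed ranges are accepted, a wider-than-8-bit char with stray high bits would simply fall through to the rejecting branch.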
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Wed, 15 Dec 2021 12:24:21 -0800 Michael Forney wrote: Dear Michael, > I think this is a mistake. It makes it very difficult to use the API > correctly if you have data in an array of char or unsigned char, which > is usually the case. > Here's an example of some real code that has a char * buffer: > https://git.sr.ht/~exec64/imv/tree/a83304d4d673aae6efed51da1986bd7315a4d642/item/src/console.c#L54-58 > > How would you suggest that this code be written for the new API? The > only thing I can think is > > if (buffer[position] != 0) { > size_t bufferlen = strlen(buffer) + 1 - position; > uint8_t *newbuffer = malloc(bufferlen); > if (!newbuffer) ... > memcpy(newbuffer, buffer + position, bufferlen); > position += grapheme_bytelen(newbuffer); > free(newbuffer); > } > return position; > > This sort of thing would turn me off of using the library entirely. yeah, it would be insane to malloc() a new buffer. However, the case I'm making is that we can assume that 1) uint8_t exists 2) uint8_t == unsigned char This may not be directly specified in the standard, but follows from the following observations: 1) We make use of POSIX-functions in the code, so compiling libgrapheme requires a POSIX-compliant compiler and stdlib. POSIX requires CHAR_BIT == 8, which means that we can assume that chars are 8 bit, and thus uint8_t exists. 2) C99 specifies char to be of at least 8 bit size. Given char is meant to be the smallest addressable unit and uint8_t exists, char is exactly 8 bits. > > Any other way would have introduced too many implicit assumptions. > > Like what? I was unclear there. What I actually meant was that "char" carries implicit assumptions in the programming world that are actually not even reflected in the standard. When specifying the UTF-8-array as char *, you basically carry on this tradition instead of being specific with what you actually want. 
> If you really want your code to break when CHAR_BIT != 8, you could > use a static assert (there are also ways to emulate this in C99). But > even if CHAR_BIT > 8, unsigned char is perfectly capable to represent > all the values used in UTF-8 encoding, so I don't see the problem. Let's take a simple example: Say you have a file in UTF-8 encoding of known size and wanted to read it and simply print the code points. You would probably do it as follows in C (no checks to get the point across), and let's assume here that lg_utf8_* accepts char *: FILE *fp; size_t size, off, ret, i; char *data; uint_least32_t cp; /* open */ fp = fopen("file.txt", "r"); /* get file size and allocate buffer */ fseek(fp, 0L, SEEK_END); size = ftell(fp); rewind(fp); data = malloc(size); /* fill buffer */ for (off = 0; (ret = fread(data + off, 1, size - off, fp)) > 0; off += ret) ; /* print code points */ for (i = lg_utf8_decode(data, size, &cp); data[i] != '\0'; i += lg_utf8_decode(data + i, size - i, &cp)) { printf("code point: %"PRIu32"\n", cp); } However, here you have a problem when suddenly char is 16 bits (might be according to the standard). Because then you read in two UTF-8-code-units at once, but lg_utf8_decode silently discards half of the data in the high bits. But this wouldn't even happen, given POSIX mandates char to be 8 bits, and given even C99 mandates char to be of integral type, you only have one unique way to specify an unsigned integer of certain bit-length, given C99 also mandates that char shouldn't have any padding. So the case can be made that uint8_t == unsigned char, and casting between char and unsigned char is fine, so you just cast any char * to uint8_t * which will work as you would otherwise not have been able to even compile libgrapheme in the first place. Or am I missing something here apart from the standard semantically making a difference? Is there any technical possibility to have a system that has CHAR_BIT == 8 where uint8_t != unsigned char? 
> > And even if all fails and there simply is no 8-bit-type, one can > > always use the lg_grapheme_isbreak()-function and roll his own > > de/encoding. > > I'm still confused as to what you mean by rolling your own > de/encoding. What would that look like? > > If there is no 8-bit type, libgrapheme could not be compiled or used > at all since uint8_t would be missing. Yeah, it was a bit of a transitive argument given you would have to tailor grapheme and remove the utf8-encoder/decoder. But then you could simply use the lg_grapheme_isbreak()-function which works on code points. How you obtain the code points is up to the user, but then libgrapheme doesn't care and simply returns a "decision". tl;dr: I don't see what's wrong with simply casting char * to uint8_t * given it's reasonable to assume that uint8_t == unsigned char for the aforementioned reasons. With best regards Laslo
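The static assert on CHAR_BIT that Michael mentions can be emulated in C99 with a negative array size. The following is only a sketch of that well-known trick; the typedef name is made up for the example.

```c
#include <limits.h>

/*
 * C99-compatible emulation of a static assertion: the array size is
 * -1, and thus a constraint violation at compile time, whenever the
 * condition is false. The typedef name is hypothetical.
 */
typedef char assert_char_is_8_bits[(CHAR_BIT == 8) ? 1 : -1];

/*
 * With C11 the same check could be written as
 *   _Static_assert(CHAR_BIT == 8, "char must be 8 bits wide");
 */
```

On a platform with CHAR_BIT != 8 the translation unit fails to compile instead of silently misbehaving at run time.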
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Sun, 12 Dec 2021 12:41:15 -0800 Michael Forney wrote: Dear Michael, > > But char and unsigned char are of integer type, aren't they? > > They are integer types and character types. Character types are a > subset of integer types: char, signed char, and unsigned char. > > > So on a > > POSIX-system, which is 99.999% of cases, it makes no difference if > > we cast between (char *) and (unsigned char *) (as you suggested > > above if we went with unsigned char * for the interfaces) and > > between (char *) and (uint_least8_t *), does it? So if the end-user > > has to cast anyway, then he can just cast to an uint* type as well. > > > > The difference is that uint8_t and uint_least8_t are not necessarily > character types. Although the existence of uint8_t implies that > unsigned char has exactly 8 bits, uint8_t could be a separate 8-bit > integer type distinct from the character types. If this were the case, > accessing an array of unsigned char through a pointer to uint8_t would > be undefined behavior (C99 6.5p7). > > Here are some examples: > > char a[1] = {0}; > // always valid, evaluates to 0 > *(unsigned char *)a; > // always valid, sets the bits of a[0] to > // but the value of a[0] depends on the signed-int representation > *(unsigned char *)a = 0xff; > // undefined behavior if uint8_t is not a character type > *(uint8_t *)a; > *(uint8_t *)a = 0xff; > > uint8_t b[1] = {0}; > // always valid, evaluates to 0 > *(unsigned char *)b; > // always valid, sets the bits of a[0] to > *(unsigned char *)b = 0xff; thanks for clearing that up! After more thought I made the decision to go with uint8_t, though. I see the point regarding character types, but this notion is more of a smelly foot in the C standard. We are moving towards UTF-8 as _the_ default encoding format, so considering character strings as such is justified. Any other way would have introduced too many implicit assumptions. 
> > Even more drastically, given UTF-8 is an encoding, I don't really > > feel good about not being strict about the returned arrays in such > > a way that it becomes possible to have an array of e.g. 16-bit > > integers where only the bottom half is used and it become the > > user's job to then hand-craft it into a proper array to send over > > the network, etc. Surely one can hack around this as a library > > user, but at a certain point I think "to hell with it" and just be > > strict about it in the API. C already has a weak type system and I > > don't want to further weaken it by supporting decades-old implicit > > assumptions on types. So in a way, maybe uint8_t is the way to go, > > and then the library user immediately knows it's not going to work > > with his machine because uint8_t is not defined for him. > > Not quite sure what you mean here. Are you talking about the case > where CHAR_BIT is 16? In that case, there'd be no uint8_t, so you > couldn't "hand-craft it into a proper array". I'm not sure how > networking APIs would work on such a system, but maybe they'd consider > only the lowest 8 bits of each byte. Yes exactly. Trying to import grapheme.h would immediately show that the system is incompatible rather than silently "breaking" on this behalf. Given how smart compilers have become working with "halves" of registers, I'd much rather expect the CPU to offer instructions to work with 8-bit-integers as "halves" of 16 bits (accessing lower and upper). And even if all fails and there simply is no 8-bit-type, one can always use the lg_grapheme_isbreak()-function and roll his own de/encoding. With best regards Laslo
Re: [hackers] [libgrapheme] Refactor Makefile, add dist-target and add test-util || Laslo Hunhold
On Wed, 15 Dec 2021 13:28:04 +0100 Quentin Rameau wrote: Dear Quentin, > > -GEN = gen/grapheme gen/grapheme-test > > -LIB = src/grapheme src/utf8 src/util > > -TEST = test/grapheme test/grapheme-performance test/utf8-decode > > test/utf8-encode - > > -MAN3 = man/lg_grapheme_isbreak.3 man/lg_grapheme_nextbreak.3 > > +GEN =\ > > + gen/grapheme\ > > + gen/grapheme-test > > +SRC =\ > > + src/grapheme\ > > + src/utf8\ > > + src/util > > +TEST =\ > > + test/grapheme\ > > + test/grapheme-performance\ > > + test/utf8-decode\ > > + test/utf8-encode > > +MAN3 =\ > > + man/lg_grapheme_isbreak.3\ > > + man/lg_grapheme_nextbreak.3 > > MAN7 = man/libgrapheme.7 > > > > all: libgrapheme.a libgrapheme.so > > The idiomatic way of using those is to escape the newline on every > macro line. > The goal here is to help producing less noise in patches which add or > remove lines there, so that only the actual concerned lines are > modified, not the one that may be the last because you now need to add > or remove a '\' there. thanks for this! I now pushed a commit that adapts this good idiom. With best regards Laslo
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Sun, 12 Dec 2021 01:22:47 -0800 Michael Forney wrote: Dear Michael, > On 2021-12-11, Laslo Hunhold wrote: > > So would you say that the only good way would be to only accept > > arrays of unsigned char in the API? I think this seems to be the > > logical conclusion. > > That's one option, but another is to keep using arrays of char, but > cast to unsigned char * before accessing. This is perfectly fine in C > since unsigned char is a character type and you are allowed to access > the representation of any object through a pointer to character type, > regardless of the object's actual type. > > Accepting unsigned char * is maybe a bit nicer for libgrapheme's > implementation, but char * is nicer for the users, since that's likely > the type they already have. It also allows them to continue to use > string.h functions such as strlen or strcmp on the same buffer (which > also are defined to interpret characters as unsigned char). yes, if we were only accessing that would be fine. However, what about the other way around? libgrapheme also writes to arrays with lg_utf8_encode(), and that's where we can't just write to char. > I guess it depends on how that data was obtained in the first place. > Say you have char buf[1024], and read UTF-8 encoded data from a file > into it. fread is defined in terms of fgetc, which "obtains that > character as unsigned char" and stores into an array of unsigned char > overlaying the object. In this case, accessing as unsigned char is the > intention. > > I can't really think of a case where the intention would be to > interpret as signed char and convert to unsigned char. With > sign-magnitude, it'd be impossible to encode Ā (UTF-8 0xC4 0x80) this > way, since there is no char value that results in 0x80 when converted > to unsigned char. > > I know it's just a thought experiment, but note that there are only > three signed-int representations valid in C: sign-magnitude, one's > complement, and two's complement. 
They only differ by the meaning of > the sign bit, which is the highest bit of the corresponding unsigned > integer type, so you couldn't go as crazy as the representation you > described. Yeah, it was just a thought-experiment. :) > > 1) Would you also go down the route of just demanding an array of > > unsigned integers of at least 8 bits? > > I'd suggest sticking with char *, but unsigned char * seems > reasonable as well. > > > 2) Would you define it as "unsigned char *" or "uint_least8_t *"? > > I'd almost favor the latter, given the entire library is already > > using the stdint-types. > > I don't think uint_least8_t is a good idea, since there is no > guarantee that it is a character type. The API user is unlikely to > have the data in a buffer of this type, so they'd potentially have to > allocate a new one and copy into it. With unsigned char *, they could > just cast if necessary. But char and unsigned char are of integer type, aren't they? So on a POSIX-system, which is 99.999% of cases, it makes no difference if we cast between (char *) and (unsigned char *) (as you suggested above if we went with unsigned char * for the interfaces) and between (char *) and (uint_least8_t *), does it? So if the end-user has to cast anyway, then he can just cast to an uint* type as well. Even more drastically, given UTF-8 is an encoding, I don't really feel good about not being strict about the returned arrays in such a way that it becomes possible to have an array of e.g. 16-bit integers where only the bottom half is used and it become the user's job to then hand-craft it into a proper array to send over the network, etc. Surely one can hack around this as a library user, but at a certain point I think "to hell with it" and just be strict about it in the API. C already has a weak type system and I don't want to further weaken it by supporting decades-old implicit assumptions on types. 
So in a way, maybe uint8_t is the way to go, and then the library user immediately knows it's not going to work with his machine because uint8_t is not defined for him. Done. I find it much more plausible that maybe even a compiler could "emulate" 8-bit-types even on machines with 16-bit-chars, but this is such an extreme case. The standard consortiums made a good choice to let memcpy operate on void*. They knew chars were a mess and it might be the best option to just not touch them within the library at all and stick with well-defined types. I'll think about it. With best regards Laslo
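Regarding the concern about writing into a char array: one possible approach, sketched below under the assumption that the API takes char *, is to write through a pointer to unsigned char. encode_2byte is a hypothetical, deliberately limited helper (two-byte sequences only) and not libgrapheme's lg_utf8_encode().

```c
#include <stddef.h>

/*
 * Hypothetical two-byte encoder: writes a code point in the range
 * U+0080..U+07FF into a char buffer and returns the number of bytes
 * written, or 0 if the code point is out of range or the buffer is
 * too small.
 */
static size_t
encode_2byte(unsigned long cp, char *out, size_t len)
{
	/* store through unsigned char so no value conversion to char occurs */
	unsigned char *u = (unsigned char *)out;

	if (cp < 0x80 || cp > 0x7FF || len < 2)
		return 0;

	u[0] = (unsigned char)(0xC0 | (cp >> 6));   /* 110xxxxx */
	u[1] = (unsigned char)(0x80 | (cp & 0x3F)); /* 10xxxxxx */

	return 2;
}
```

For U+0100 (the Ā from the quoted text) this stores the bytes 0xC4 0x80 in the buffer, independent of whether plain char is signed.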
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Sun, 12 Dec 2021 08:59:04 +0100 Laslo Hunhold wrote: Dear Michael, > Two questions remain: > > 1) Would you also go down the route of just demanding an array of > unsigned integers of at least 8 bits? > 2) Would you define it as "unsigned char *" or "uint_least8_t *"? > I'd almost favor the latter, given the entire library is already > using the stdint-types. and there's also POSIX to think about. Given we're using POSIX interfaces all over libgrapheme and POSIX states "(The POSIX standard explicitly requires 8-bit char and two's-complement arithmetic.)"[0], maybe simply going with "uint8_t *" is the real deal. This still justifies the use of uint_least32_t, as POSIX does not mandate uint32_t to exist, but we can legally assume an 8-bit-type exists. This might be stronger to convey in the API using the explicit uint8_t rather than using "unsigned char", which still has all the "legacy" attached to it, and FFIs have no open questions about what we are accepting. With best regards Laslo [0]:https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/stdint.h.html
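If one wanted to make the "uint8_t must exist" assumption explicit at compile time, a sketch could rely on the limit macros from stdint.h, which are only defined for the exact-width types an implementation actually provides. The codepoint alias below is hypothetical.

```c
#include <stdint.h>

/* UINT8_MAX is only defined when the implementation provides uint8_t. */
#ifndef UINT8_MAX
#error "uint8_t is not available on this platform"
#endif

/*
 * uint_least32_t is required by C99, so no check is needed for it.
 * The alias name is made up for this example.
 */
typedef uint_least32_t codepoint;
```

This mirrors the argument above: the 8-bit type is assumed (and checked), while the 32-bit type only needs the "least" variant.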
Re: [hackers] [libgrapheme] Refine types (uint8_t -> char, uint32_t -> uint_least32_t) || Laslo Hunhold
On Sat, 11 Dec 2021 12:24:10 -0800 Michael Forney wrote: Dear Michael, thanks for your input. You really know the intrinsics much better than I do. > It is true that the existence of uint32_t implies that uint_least32_t > also has exactly 32 bits and no padding bits, but they could still be > distinct types. For instance, on a 32-bit platform with int and long > both being exactly 32 bits, you could define uint32_t as one and > uint_least32_t as the other. In that case, dereferencing an array of > uint32_t as uint_least32_t would be undefined behavior. > > That said, I agree with this change. It also has the benefit of > matching the definition of C11's char32_t. That's a nice coincidence. The undefined behaviour would be okay for me, given it would be a user error. In 99% of the cases it will not be a problem, and in all cases not libgrapheme's fault which specifies the interfaces well enough, but still it's good to know. > > > diff --git a/src/utf8.c b/src/utf8.c > > index 4488359..1cb5e17 100644 > > --- a/src/utf8.c > > +++ b/src/utf8.c > > @@ -92,7 +101,7 @@ lg_utf8_decode(const uint8_t *s, size_t n, > > uint32_t *cp) > > * (i.e. between 0x80 (1000) and 0xBF (1011)) > > */ > > for (i = 1; i <= off; i++) { > > - if(!BETWEEN(s[i], 0x80, 0xBF)) { > > + if(!BETWEEN((unsigned char)s[i], 0x80, 0xBF)) { > > /* > > * byte does not match format; return > > * number of bytes processed excluding the > > > > Although irrelevant in C23, which will require 2's complement > representation, I want to note the distinction between (unsigned > char)s[i] and ((unsigned char *)s)[i]. The former adds 2^CHAR_BIT to > negative values, while the latter interprets as a CHAR_BIT-bit > unsigned integer (adds 2^CHAR_BIT if the sign bit is set). For > example, if char had sign-magnitude representation, we'd have > (unsigned char)"\x80"[0] == 0, but ((unsigned char *)"\x80")[0] == > 0x80. 
> > The latter is probably what you want, but you could ignore this if you > only care about 2's complement (which is a completely reasonable > position). Okay, maybe I misunderstood something here, but from what I understand casting between signed and unsigned char is well-defined, no matter the implementation. However, if you want to work bitwise it's only well-defined if you do it on an unsigned type (i.e. unsigned char in this case), which is why I cast to unsigned char. Where is the undefined behaviour here? Is it undefined behaviour to cast between signed and unsigned char when the value is larger than 128? > > - .arr = (uint8_t[]){ 0xFD }, > > + .arr = (char[]){ > > + (unsigned char)0xFD, > > + }, > > This cast doesn't do anything here. Both 0xFD and (unsigned char)0xFD > have the same value (0xFD), which can't necessarily be represented as > char. For example if CHAR_MAX is 127, this conversion is > implementation defined and could raise a signal (C99 6.3.1.3p2). > > I think using hex escapes in a string literal ("\xFD") has the > behavior you want here. You could also create an array of unsigned > char and cast to char *. From how I understood the standard it does make a difference. "0xFD" as is is an int-literal and it prints a warning stating that this cannot be cast to a (signed) char. However, it does not complain with unsigned char, so I assumed that the standard somehow safeguards it. But when I got it correctly, you are saying that this only works because I assume two's complement, right? So what's the portable way to work with chars? :) With best regards Laslo
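To make the distinction between (unsigned char)s[i] and ((unsigned char *)s)[i] concrete, here is a small sketch with two hypothetical helpers. On two's-complement implementations (and wherever plain char is unsigned) both return the same value; they could only differ on the sign-magnitude or one's-complement representations Michael describes.

```c
#include <stddef.h>

/*
 * Value conversion: the char value is converted to unsigned char,
 * i.e. UCHAR_MAX + 1 is added to a negative value.
 */
static unsigned char
by_value(const char *s, size_t i)
{
	return (unsigned char)s[i];
}

/*
 * Representation access: the stored bits are read directly as an
 * unsigned char, without any value conversion.
 */
static unsigned char
by_representation(const char *s, size_t i)
{
	return ((const unsigned char *)s)[i];
}
```

Neither variant is undefined behaviour; the question is only which value comes out on exotic representations, and by_representation is the one that always yields the stored bit pattern.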
Re: [hackers] [quark][PATCH 1/7] arg.h: visual separation for blocks
On Sun, 4 Jul 2021 20:54:53 +0500 Nikita Zlobin wrote: Dear Nikita, thanks for your patchset, but I will not merge it, given it just consists of style changes which I do not approve of. Regarding NULL, using it would require importing something like stddef.h, which can be avoided by just casting 0 to a pointer. There are many low-hanging fruits in suckless tools, like porting manpages to mandoc and other things. :) I appreciate that you took the time to work on the patches. With best regards Laslo
Re: [hackers] [dmenu][PATCH] turn -b into a toggle
On Mon, 16 Aug 2021 19:30:03 +0600 NRK wrote: Dear NRK, > Fair enough. I suppose it should be better fit as a user patch in the > wiki then? I personally don't think that this makes sense as a user-patch, given there's maintenance involved and such a change usually just leads to failed hunks when using multiple patches. Everyone is free to upload a patch in the wiki, though, but there are already too many unmaintained and dead patches. With best regards Laslo
Re: [hackers] [dmenu][PATCH] turn -b into a toggle
On Mon, 16 Aug 2021 10:28:36 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > Thanks for the patch. I'd rather not add another option for it. > > I think if the default is not changed it still makes sense. Either > way the option works as documented. I understand, it's your call as the maintainer. Thanks for the quick response! With best regards Laslo
Re: [hackers] [dmenu][PATCH] turn -b into a toggle
On Sun, 15 Aug 2021 23:44:58 +0600 NRK wrote: Dear NRK, > currently config.h allows users to set the value of topbar to 0. > however if one does that, there's no way for him to get a topbar > again. it makes more sense to have -b as a toggle instead. this trades one problem for another given dmenu suddenly changes behaviour unexpectedly. Imagine someone having set up dmenu to be at the top by default (using config.h), but having a certain launcher script that invokes dmenu to be at the bottom by passing "-b". Now said user, using the launcher script often, learns to prefer the bottom bar and sets it as such in config.h. Assuming the b-flag will just be a redundancy, he keeps it in his launcher, only to be surprised that dmenu suddenly shows up at the top. In my opinion, and the maintainer's may differ in that regard, behaviour of flags should not be surprising when the defaults have been changed. Because of that, why not add another flag "-t" that forces dmenu to appear at the top of the screen? This makes it immediately obvious what happens (rather than turning b into a position-toggle, which makes zero phonetic sense and effectively renders the b-flag unusable for scripts because it's inconsistent) and adds very little overhead. See the attached patch (also @Hiltjo, what do you think?). :) With best regards Laslo From 6499e6a6313a7dda8fc75329e01d37e585839ba6 Mon Sep 17 00:00:00 2001 From: Laslo Hunhold Date: Mon, 16 Aug 2021 09:36:49 +0200 Subject: [PATCH] Add t-flag complementing b-flag (top/bottom-positioning) There currently is no way to override the bottom-positioning if it has been set as a default in config.h. The simplest solution is to just add a complementary t-flag which overrides whatever behaviour has been set. This is more favourable compared to turning the b-flag into a toggle, given it would lead to inconsistent behaviour (scripts can't rely on it) and break the phonetic readability of the letter "b". 
Separate t- and b-flags are very clear and add negligible overhead. Signed-off-by: Laslo Hunhold --- LICENSE | 1 + config.def.h | 2 +- dmenu.1 | 6 +- dmenu.c | 20 +++- 4 files changed, 22 insertions(+), 7 deletions(-) diff --git a/LICENSE b/LICENSE index 3afd28e..f4a0e4f 100644 --- a/LICENSE +++ b/LICENSE @@ -10,6 +10,7 @@ MIT/X Consortium License © 2010-2012 Connor Lane Smith © 2014-2020 Hiltjo Posthuma © 2015-2019 Quentin Rameau +© 2021 Laslo Hunhold Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), diff --git a/config.def.h b/config.def.h index 1edb647..6a3839a 100644 --- a/config.def.h +++ b/config.def.h @@ -1,7 +1,7 @@ /* See LICENSE file for copyright and license details. */ /* Default settings; can be overriden by command line. */ -static int topbar = 1; /* -b option; if 0, dmenu appears at bottom */ +static int topbar = 1; /* -b/-t option; if 0, dmenu appears at bottom */ /* -fn option overrides fonts[0]; default X11 font or font set */ static const char *fonts[] = { "monospace:size=10" diff --git a/dmenu.1 b/dmenu.1 index 323f93c..15c5e68 100644 --- a/dmenu.1 +++ b/dmenu.1 @@ -3,7 +3,8 @@ dmenu \- dynamic menu .SH SYNOPSIS .B dmenu -.RB [ \-bfiv ] +.RB [ \-b | \-t ] +.RB [ \-fiv ] .RB [ \-l .IR lines ] .RB [ \-m @@ -75,6 +76,9 @@ defines the selected background color. .BI \-sf " color" defines the selected foreground color. .TP +.B \-t +dmenu appears at the top of the screen. +.TP .B \-v prints version information to stdout, then exits. 
.TP diff --git a/dmenu.c b/dmenu.c index 98507d9..85f1fa5 100644 --- a/dmenu.c +++ b/dmenu.c @@ -709,18 +709,20 @@ int main(int argc, char *argv[]) { XWindowAttributes wa; - int i, fast = 0; + int i, bflag = 0, tflag = 0, fast = 0; for (i = 1; i < argc; i++) /* these options take no arguments */ if (!strcmp(argv[i], "-v")) { /* prints version information */ puts("dmenu-"VERSION); exit(0); - } else if (!strcmp(argv[i], "-b")) /* appears at the bottom of the screen */ - topbar = 0; - else if (!strcmp(argv[i], "-f")) /* grabs keyboard before reading stdin */ + } else if (!strcmp(argv[i], "-b")) { /* appears at the bottom of the screen */ + bflag = 1; + } else if (!strcmp(argv[i], "-t")) { /* appears at the top of the screen */ + tflag = 1; + } else if (!strcmp(argv[i], "-f")) { /* grabs keyboard before reading stdin */ fast = 1; - else if (!strcmp(argv[i], "-i")) { /* case-insensitive item matching */ + } else if (!strcmp(argv[i], "-i")) { /* case-insensitive item matching */ fstrncmp = strncasecmp; fstrstr = cistrstr; } else if (i + 1 == argc) @@ -747,6 +749,14 @@ main(int argc, char *argv[])
Re: [hackers] [sbase] tar: check if reallocarray failed
On Sat, 17 Jul 2021 21:04:04 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > The patch below is for sbase tar: > > From 2eec3e07a5bd1ed1fa41ca02865297ab7d8b5fa8 Mon Sep 17 00:00:00 2001 > From: Hiltjo Posthuma > Date: Sat, 17 Jul 2021 21:03:27 +0200 > Subject: [PATCH] tar: check if reallocarray failed > > --- > tar.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tar.c b/tar.c > index b74c134..122f30a 100644 > --- a/tar.c > +++ b/tar.c > @@ -78,7 +78,7 @@ static const char *filtertools[] = { > static void > pushdirtime(char *name, time_t mtime) > { > - dirtimes = reallocarray(dirtimes, dirtimeslen + 1, > sizeof(*dirtimes)); > + dirtimes = ereallocarray(dirtimes, dirtimeslen + 1, > sizeof(*dirtimes)); dirtimes[dirtimeslen].name = strdup(name); > dirtimes[dirtimeslen].mtime = mtime; > dirtimeslen++; ah yes, good catch! While at it, we might also want to check the strdup() in the consecutive line, right? With best regards Laslo
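For the strdup() in the following line, a checked variant in the spirit of the e*-helpers could look like the sketch below. The name xstrdup_or_die is made up for this example; it is not claimed to be the helper sbase actually provides.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical wrapper: duplicate a string, or print an error and
 * terminate the program if the allocation fails, so that callers
 * never see a NULL pointer.
 */
static char *
xstrdup_or_die(const char *s)
{
	size_t n = strlen(s) + 1; /* include the terminating NUL */
	char *p = malloc(n);

	if (!p) {
		perror("malloc");
		exit(1);
	}
	memcpy(p, s, n);

	return p;
}
```

With such a wrapper, an assignment like dirtimes[dirtimeslen].name = xstrdup_or_die(name) would not need a separate NULL check at the call site.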
Re: [hackers] [PATCH] Add a configuration option for fullscreen locking
On Tue, 13 Jul 2021 14:33:31 -0400 Sebastian LaVine wrote: Dear Sebastian, > I am the "some people" that Quentin mentioned above :) > > I brought this up in the #suckless channel yesterday, when I was > having a problem with Firefox: When I entered into fullscreen mode, I > could no longer switch windows with my Alt+J/K keybindings. I use the > fakefullscreen patch, so for me fullscreen windows just expand as > much as possible. > > Quentin (quinq), Ingvix and I discussed this on IRC. To me, this > behavior seems rather unintuitive and therefore should not be > included in mainline. To my knowledge, it isn't documented anywhere > except the commit message, which is: count me in in that regard. If an application (most likely a game) wants exclusive fullscreen, it can capture the mouse in the window. I always set it like this in wine and have had no problems with that, and it still allows workspace-switching. For what it's worth, in my humble opinion dwm should always guarantee that you can switch workspaces. "Exclusive" fullscreen is a hack as we know from slock. Having a configuration-option is the best compromise, though. The maintainer decides what should be the default. :) With best regards Laslo
Re: [hackers] [st][PATCH] arg.h: optimize & style
On Sun, 4 Jul 2021 11:55:53 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > Thanks, but I prefer the current style one. > > I'm not confident this patch doesn't modify any behaviour. > For example I see the `i` variable was removed, but it is actually > important to not modify argv as this causes issues on NetBSD and > OpenBSD process listing (see commit > a5a928bfc1dd049780a45e072cb4ee42de7219bf). > > This is just one example. Unless it fixes a bug I rather keep the > current code. the arg.h in st has some "low-hanging" fruits regarding improvements, and I modified it accordingly in quark and farbfeld back in 2017 (see [0]) to fix some issues. One example is that in my modified form, you can actually access EARGF()/ARGF() multiple times (instead of silently corrupting the state), and it properly handles the case when argv is NULL (which is allowed by POSIX). There was also a bit of code-deduplication/refactoring with fewer local variables and abort() was replaced with exit(1). Of course and as the license permits, feel free to use it in st or other projects as well, if you like. :) With best regards Laslo [0]:https://git.suckless.org/quark/file/arg.h.html
Re: [hackers] [st][patch] Mild const-correctness improvements.
On Thu, 6 May 2021 17:48:33 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > The patch looks fine. I'm not in favor of some of the const changes, > but I think it makes sense to make function parameters like for > xstrdup() const. > > I'll review and push it later. const-correctness saved me from quite a few bugs in the past, so I personally changed my mind about it a few years ago. It's always good to write down "contracts", because you can only keep so many things in your head at the same time. With best regards Laslo
Re: [hackers] [st][patch] Mild const-correctness improvements.
On Thu, 6 May 2021 16:11:33 +0200 "Markus F.X.J. Oberhumer" wrote: Dear Markus, > this is my first post to this list, so I hope I got the email patch > right. > > GitHub repo is at > https://github.com/markus-oberhumer/suckless-st/compare/mild-const-correctness-improvements thanks for your input, but please save the patch as a file (using git diff or git format-patch) and attach it to your E-Mail. A GitHub-link is not good because it doesn't satisfy archivability and is overkill, among other things. With best regards Laslo
Re: [hackers] [st][PATCH] Set custom environment variables in config.h
On Fri, 2 Apr 2021 07:42:24 + Subhaditya Nath wrote: Dear Subhaditya, > From 79e69338725563e1bdba32e856726e8fa5151e4c Mon Sep 17 00:00:00 2001 > From: Subhaditya Nath > Date: Thu, 1 Apr 2021 19:42:51 +0530 > Subject: [PATCH] Set custom environment variables in config.h > > This patch enables setting custom environment variables in config.h. > This patch changes config.def.h, and sets $EDITOR to /usr/bin/vim by > default. Beware. that's what .profile files are for. I personally don't see the benefit and, to the contrary, see a lot of potential for unexpected behaviour, but maybe I'm missing something. With best regards Laslo
Re: [hackers] [tabbed][PATCH] Remove quotes around variables in Makefile
On Fri, 2 Apr 2021 11:03:05 +0200 Hiltjo Posthuma wrote: Dear Hiltjo, > I prefer with quotes. You can still do make PREFIX=~/.local or > whatever. Otherwise you could use $HOME. aren't the quotes also necessary in case one of the variables (DESTDIR, MANPREFIX, etc.) contains spaces? With best regards Laslo
Re: [hackers] [svkbd] [merge request] various patches for svkbd
On Sat, 27 Mar 2021 14:03:05 +0100 Maarten van Gompel wrote: Dear Maarten, > I wonder if the svkbd patches I submitted last week arrived properly > and if you have the opportunity to look at them soon? > > (I only see 2 of the 24(!) patches in the mailing list archives, there > may be some caught in a filter?) > > Once possible issues are resolved and things are merged we'd like a > new svkbd release tag (0.3.0) so I can pick up the packaging end for > Alpine Linux and we can subsequently do our sxmo 1.4.0 release, for > which the new svkbd is a major dependency. no worries, all 24 patches arrived fine! I'm surprised, though, that the archive apparently only lists 2. Let's wait until the maintainer reaches out to you. With best regards Laslo
Re: [hackers] [quark] Apply (D)DoS-Hardening || Laslo Hunhold
On Sun, 07 Feb 2021 21:41:58 +0300 Greg Minshall wrote: Dear Greg, > thanks for your reply and detailed explanation, which i should have > understood from your earlier e-mail (and, if not, from looking at the > code). don't worry about it; the algorithm-code is a bit convoluted given the two optimizations taking place at the same time. With best regards Laslo
Re: [hackers] [quark] Apply (D)DoS-Hardening || Laslo Hunhold
On Sun, 07 Feb 2021 17:07:24 +0300 Greg Minshall wrote: Dear Greg, > just a comment from the outside. > > if i read get_connection_to_drop_candidate() correctly, your algorithm > selects the first, in terms of location in 'connection' array, "best" > (lowest state) candidate to drop. > > you might think of, when finding an *equally* "best" candidate, > flipping some (weighted, by?) coin, and either taking your current > candidate, or taking the newly discovered "best". as someone's > e-mail tag says, "when in doubt, randomize" :). > > (as a *research* experiment, in some other life, i might flip a coin > for *every* element in the array, based maybe on the relative states?) thanks for your input! I may have been imprecise with the description of my algorithm: Of all connection slots (where every one of them is occupied), it first finds out which in-address takes up the most (e.g. 20 connections from 127.217.17.131). Among those 20, it finds the (first) one with the smallest progression (i.e. state in this case). Indeed, this "minimizer" is not unique and we can have more than one connection from this client in the "minimal" state (e.g. 10 of those connections might be in the state C_RECV_HEADER, so not even finished with sending the request header). One could refine the algorithm to also minimize e.g. over the number of bytes received (for C_RECV_HEADER) or how much data has already been sent (for C_SEND_HEADER, C_SEND_BODY) and then find an even "better" candidate among those connections from this one greedy client, but maybe that goes too far. Your randomness approach might give a little peace of mind to select from multiple candidates, as mentioned before, but the placement in the connection-array itself is non-deterministic if a non-trivial number of clients access the server, especially near saturation, where any slot at any point in the connection-array might become free at any time. 
If I wanted a more refined behaviour, I'd probably just reduce my drop-candidate set further (with the previously mentioned further criteria). With this reduction, we'd be talking about 1-2 minimizers (how likely is it that we have matching byte-progresses?) which would not need a randomized approach anyway. If we still have a large set of candidates despite the refined criteria, one can reasonably assume that the client is just spamming the server with connections, and then it doesn't really matter which one of the connections we drop. Maybe I'll further refine it in the future. Thanks for reaching out and raising this very interesting point about randomization! With best regards for a nice Sunday Laslo
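The two-step selection described above can be sketched roughly as follows. This is a simplified illustration and not quark's get_connection_to_drop_candidate(): addresses are reduced to plain integers, the struct is made up for the example, the ordering of the state enum is an assumption, and the counting is a naive quadratic loop without the optimizations mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* states in order of progression (names from the thread, order assumed) */
enum state { C_RECV_HEADER, C_SEND_HEADER, C_SEND_BODY };

/* hypothetical, simplified connection slot */
struct conn {
	uint32_t addr; /* client address reduced to an integer */
	enum state state;
};

static const struct conn *
drop_candidate(const struct conn *c, size_t n)
{
	const struct conn *best = NULL;
	uint32_t greedy = 0;
	size_t i, j, cnt, max = 0;

	assert(c != NULL || n == 0);

	/* step 1: find the address that occupies the most slots */
	for (i = 0; i < n; i++) {
		for (cnt = 0, j = 0; j < n; j++)
			cnt += (c[j].addr == c[i].addr);
		if (cnt > max) {
			max = cnt;
			greedy = c[i].addr;
		}
	}

	/* step 2: among its slots, take the first one with the lowest state */
	for (i = 0; i < n; i++) {
		if (c[i].addr != greedy)
			continue;
		if (!best || c[i].state < best->state)
			best = &c[i];
	}

	return best; /* NULL only if there are no connections at all */
}
```

Refining the choice by bytes received or sent, as discussed above, would amount to extending the comparison in step 2 with a secondary key.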
Re: [hackers] [quark][PATCH] Return -1 in case of errors in queue event wrapper functions.
On Sat, 30 Jan 2021 13:54:58 +0100 Rainer Holzner wrote: Dear Rainer, > Use same data type for nready (number of events) as returned by > queue_wait(). --- > [...] > - int qfd, nready, fd; > + int qfd, fd; > + ssize_t nready; > [...] > + return -1; thanks for spotting these mistakes and submitting a patch! I've pushed it. With best regards Laslo
Re: [hackers] [quark] Ignore queries and fragments in URIs || Laslo Hunhold
On Sat, 30 Jan 2021 14:30:12 +0100 Hiltjo Posthuma wrote: > Cool story, bro. To be continued ;)
Re: [hackers] [lchat][PATCH] Point that libutf is available in the sbase
On Thu, 28 Jan 2021 16:06:28 -0300 Pedro Lucas Porcellis wrote: > --- > README.md | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/README.md b/README.md > index 6ece4c0..d3a815f 100644 > --- a/README.md > +++ b/README.md > @@ -18,7 +18,7 @@ Programs you can use lchat as a front end for: > Requirements > > > - * libutf > + * [libutf](https://git.suckless.org/sbase) > * tail(1) > * grep(1) > > -- > 2.30.0 > > I don't know if that's really correct, given sbase does not install the library on the system. Instead, sbase just has a local copy, compiles it as a static library (.a) and links it into each binary statically. It would make more sense to do the same for lchat, but I'd rather check and see what really is needed. From what I can tell, reading the slackline-source, one could easily port it to libgrapheme, for example, because the utf.h-requirement is only necessary for "Rune"-handling (i.e. Codepoints). By porting lchat to libgrapheme you would get grapheme-cluster-support for free on top of that. With best regards Laslo
Re: [hackers] [quark][PATCH] Add a config switch to enable/disable NPROC limit
On Mon, 25 Jan 2021 14:17:17 +0100 Giulio Picierro wrote: Dear Giulio, > sorry for the late reply, I had a really busy week (course to teach) > :/. don't worry about it! This is a mailing list and meant to be asynchronous. Don't feel pressured to respond and rather take your time; it's a pastime after all. > I didn't have the chance to test the new code, which I will do soon, > hopefully. > > If I understand correctly your solution is to read the rlimit from > the system and then update accordingly. > > Now it seems to me a good solution, however I have to ask: do we > really need to set the rlimit on the number of processes (which are > fixed)? > > I mean, what is the rationale behind it? Security purposes? > > I'm just asking out of curiosity, to understand better the design > choices :D. There are two aspects at play here, the limit on open file descriptors of the current process (RLIMIT_NOFILE) and the limit on threads per user (RLIMIT_NPROC). The former is simple: It's a per-process-limit and we just apply a heuristic and find a coarse upper bound for file descriptors the quark process will consume. You see we set both cur and max, which is because a program usually gets a signal when it exceeds the "soft" cur-limit, but we don't want that. We want to either succeed or fail hard. The nice thing about setrlimit() is that even if the process is not privileged (i.e. does not have CAP_SYS_RESOURCE set), it can always set the soft limit (up to the hard limit) and irrevocably reduce the hard limit, so in case the limits are already large enough, we don't even need CAP_SYS_RESOURCE. The latter case is much more difficult, because the thread-limit is not per-process but per-user. However, there isn't really a portable way to find out how many current threads a user has. Say we give quark the flag "-t 50" and are thus telling quark to spawn 50 serving-threads. If the user has a thread-limit of 1000 and already has 990 threads active, pthread_create() will fail after creating 10 threads. 
However, if the user has a thread-limit of 90 and only 20 active threads, the thread creation will succeed (we suppose in both cases that the number of "foreign" threads is constant). Usually the thread limit is very high and not an issue, but there might be configurations where that is not the case. The approach I worked out for quark is kind of a hack, because what it does is just ask the system to increase the thread-limit by the number of threads quark needs. If that fails, e.g. if we are already at the kernel limit, we just ignore that error, because that's as much as we can hope to do. The only case where quark now could possibly fail with this heuristic is on a system where a user is close to thread-saturation (in terms of its limits, which must be way below the kernel limits) and some user-program rapidly spawns threads in the few milliseconds between quark's thread-limit-increase (triggering a TOCTOU) and thread-allocation. However, even in this extreme case, quark would just error out on pthread_create() and this is no security issue or anything. One could get rid of setting the thread limit by spawning the worker threads before dropping root, but I just don't feel comfortable with that. The earlier you lobotomize yourself, the better. With best regards, hoping this was helpful to you Laslo
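The "read the limit, then raise it by what we need" idea can be sketched as below. This is an illustration and not the code in quark: the function names are made up for the example, RLIMIT_NPROC is assumed to be available (it is on Linux and the BSDs, but is not part of POSIX), and the saturating addition assumes that rlim_t is unsigned and that RLIM_INFINITY is its largest value.

```c
#include <assert.h>
#include <sys/resource.h>

/* add extra to lim, saturating at RLIM_INFINITY instead of wrapping */
static rlim_t
rlim_add(rlim_t lim, rlim_t extra)
{
	if (lim == RLIM_INFINITY || extra > RLIM_INFINITY - lim)
		return RLIM_INFINITY;
	return lim + extra;
}

/*
 * Raise the per-user thread limit by nthreads. Errors are ignored
 * on purpose: if the limit cannot be raised, a later pthread_create()
 * will report the problem.
 */
static void
raise_nproc(rlim_t nthreads)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NPROC, &rl) < 0)
		return;

	/* both limits grow by the same amount, so cur <= max still holds */
	rl.rlim_cur = rlim_add(rl.rlim_cur, nthreads);
	rl.rlim_max = rlim_add(rl.rlim_max, nthreads);

	(void)setrlimit(RLIMIT_NPROC, &rl);
}
```

The window between getrlimit() and the actual thread creation is the TOCTOU gap mentioned above; the sketch does nothing to close it.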
Re: [hackers] [quark] http: fix default index serving
On Sun, 24 Jan 2021 19:12:30 +0100 Quentin Rameau wrote: Dear Quentin, > I would prefer that you keep rightful authors of patches instead of > changing the style a bit and committing in your own name. > This isn't respectful of contributors and seems to be a recurring > issue with you. > > If you want to change the style, you can discuss it with the authors, > and amend the commit before pushing instead of doing that. > > I would prefer that you revert the commit, and do it properly (which > would be good as it would also be explained in the development > history). as you know I mean neither disrespect nor offense and strictly add even one-time-contributors to the LICENSEs of my projects, because I believe in proper attribution. The way I handled the application of patches is probably due to the fact that I read a lot of OpenBSD-commits. To give you an insight, look at [0] to see how frequently they "credit" external patches in the commit messages. However, I see and agree with your point and have reverted and split up the commits[1][2][3][4] and updated the license[5]. The main reason for this split is that git distinguishes between committer and author, a feature CVS doesn't have and which is likely the reason they chose this form, and it's good to have a distinctive history with clear authorship of patches and credit, as you also stated as your preference. > > The http_prepare_response()-function is pretty messy, especially in > > regard to stale data, which this bug is also based on. I'm working > > on making it more resilient by splitting the discrete sub-problems > > into separate functions. > > Yes, but that's also partly due to the style, these are no > “fallthrough” cases, there are early returns, and it's easier to read > them as such instead of putting them into if-then-else blocks > everywhere. This is a style/code-readability-matter indeed, however, I also added the change to the part regarding mime-type-handling, which is not style. 
As an afterthought, though, it makes more sense to do that in a separate commit, which I did now. Anyways, if I do something wrong or something bothers you, please let me know right then so I can have a chance to correct this. Otherwise, it's likely I won't notice. If you call it "recurrent", it basically implies an ill intent, which I don't have at all. With best regards Laslo [0]:https://freshbsd.org/search?q=heavily+based+on%5B%5D=openbsd=commit_date [1]:https://git.suckless.org/quark/commit/a4ea7cbe676adffd1dbd98b2bb7f68591b24d46c.html [2]:https://git.suckless.org/quark/commit/deeec27c56d8f5049abac0dad3782f5daf95a1a3.html [3]:https://git.suckless.org/quark/commit/8afc6416647585ec2695d57eee7c226216e4111c.html [4]:https://git.suckless.org/quark/commit/67c29aaba8a8194685677586338688e82c619e93.html [5]:https://git.suckless.org/quark/commit/c6a9055e5a30be570e30da8d216c39662c3a3f99.html
Re: [hackers] [quark] http: fix default index serving
On Sun, 24 Jan 2021 14:48:23 +0100 Quentin Rameau wrote: Dear Quentin, > bump sadly this patch was part of the mails that kept bumping due to the DMARC/SPF-signing-issue we discussed earlier at admins@. Now that I see your bump I hope this issue is resolved and I can see your future mails again. :) I took a look at the archives[0] and have merged it in [1], however, changed it into an else-case (to make it not depend on the fallthroughs in the if only) and changed the mime-check so the mime-type is matched against the docindex-path. The http_prepare_response()-function is pretty messy, especially in regard to stale data, which this bug is also based on. I'm working on making it more resilient by splitting the discrete sub-problems into separate functions. Thanks for finding this issue and your patch! I must admit that this issue slipped past me because I only checked quark's behaviour with "curl -I", which effectively masked this problem. With best regards Laslo [0]:https://lists.suckless.org/hackers/2101/17763.html [1]:https://git.suckless.org/quark/commit/87ae2e9212c5cc7309eefa2a3f49a758862db6c7.html
Re: [hackers] [quark][PATCH] Add a config switch to enable/disable NPROC limit
On Mon, 18 Jan 2021 23:03:11 +0100 Laslo Hunhold wrote: > that is a really nice observation! Thanks for pointing it out and your > elaborate explanation. I must honestly admit that I assumed the limit > was per process and not per user. I'll think about how to approach > this the best way; given your aforementioned fact, I only see two > options: > >1) Don't touch the rlimits and let it fail, giving a proper error > message (might be problematic for open file descriptors that > might get exhausted at runtime). One can also check the limits > beforehand and error out (e.g. if we cannot guarantee 4 fds per > slot). >2) Increment the rlimits by first reading them and setting the > incremented value. Possible problems here are TOCTOU (even > though the risk here is not too high) and a possible > interference in things that shouldn't be touched by convention. To give a followup, I went with option 2), because it allows the smoothest operation. Quark is run as root and thus can seize all the assets it needs. I'm sure many set their global resource limits to reasonable values, but if you run your server and give it a thread and slot count, you can estimate that it might exceed your resources. In that respect, it is forgivable for quark to just raise the bar instead of failing. Thanks again, Giulio, for your input and patch suggestion. I hope this fixes your issues in your case! With best regards Laslo
Re: [hackers] [quark][PATCH] Add a config switch to enable/disable NPROC limit
On Sun, 17 Jan 2021 17:29:53 +0100 Giulio Picierro wrote: Dear Giulio, > Quoting the book "The Linux Programming Interface" from Michael > Kerrisk: "the RLIMIT_NPROC limit, which places a limit on the number > of processes that can be created, is measured against not just that > process’s consumption of the corresponding resource, but also against > the sum of resources consumed by all processes with the same real > user ID." > > This leads quark to easily fail on Linux when launched with the same > userid of a logged user. > > For example if the user 'giulio' has an active desktop session, the > following command: > > $ sudo ./quark -p 8080 -u giulio -g giulio -l > > fails with the following error: > > $ ./quark: pthread_create: Resource temporarily unavailable > > No error occurs if instead quark is launched with an userid that does > not have a session, such as the 'http' user, usually reserved for web > servers. > > I don't know if this is expected or this could be considered a bug: > in the end for production servers we could expect that the limit > works correctly. > > In any case, the least invasive way that I have found to solve the > issue is to introduce a config switch to disable the limit, retaining > it enabled by default. that is a really nice observation! Thanks for pointing it out and your elaborate explanation. I must honestly admit that I assumed the limit was per process and not per user. I'll think about how to approach this the best way; given your aforementioned fact, I only see two options: 1) Don't touch the rlimits and let it fail, giving a proper error message (might be problematic for open file descriptors that might get exhausted at runtime). One can also check the limits beforehand and error out (e.g. if we cannot guarantee 4 fds per slot). 2) Increment the rlimits by first reading them and setting the incremented value. 
Possible problems here are TOCTOU (even though the risk here is not too high) and a possible interference in things that shouldn't be touched by convention. What do the others think? With best regards Laslo
Re: [hackers] [quark] Use epoll/kqueue and worker threads to handle connections || Laslo Hunhold
On Sun, 17 Jan 2021 12:48:38 +0100 Hiltjo Posthuma wrote: Dear Hiltjo, > This does not work on OpenBSD and it does not compile. thanks for letting me know! I didn't come around to testing it on OpenBSD yet, but did it now and pushed a fix[0]. With best regards Laslo [0]:https://git.suckless.org/quark/commit/959c855734e3af12f35532d76deb1ab85474f8f4.html
Re: [hackers] [quark] Prevent overflow in strtonum()-parameters || Laslo Hunhold
On Sun, 1 Nov 2020 11:17:42 +0100 Quentin Rameau wrote: Dear Quentin, > SIZE_MAX is the tangible guarantee for the upper limit of size_t. indeed, but strtonum's maxval argument is a signed long long, and given size_t can be unsigned long long, we could overflow it. With best regards Laslo
Re: [hackers] [quark][PATCH] Fix overflow when calling strtonum in parse_range
On Sun, 1 Nov 2020 01:15:32 +0100 José Miguel Sánchez García wrote: Dear José, > Good point! It could be the case that SIZE_MAX is smaller than > LLONG_MAX. Honestly I don't know, but I would do what you are > proposing just to be sure: it is the safest option, and maybe the > compiler will take care of replacing the correct value at compile > time. Way better than leaving another bug lingering until someone > else finds it again. we should be safe at that point and I have committed the MIN-solution in commit 7d26fc695. Thanks for your report! With best regards Laslo
Re: [hackers] [quark][PATCH] Fix overflow when calling strtonum in parse_range
On Sat, 31 Oct 2020 21:58:26 + José Miguel Sánchez García wrote: Dear José, > The value passed as maxval, SIZE_MAX, doesn't fit on a long long int > due to signedness. It was causing legitimate range request to be > discarded as bad. > > I tested it serving an mp4 and opening it with Firefox. A "range=0-" > was requested, and it triggered the bug. this is a great catch, thanks! But wouldn't it be better to use MIN(SIZE_MAX, LLONG_MAX)? I haven't found anything in the standard that puts "long long" and "size_t" into any relation, which means, for me, that any case is possible where either value could be larger, but please correct me if I'm wrong. With best regards Laslo
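To illustrate the MIN(SIZE_MAX, LLONG_MAX) bound, here is a small sketch of parsing a byte offset into a size_t. It uses the standard strtoll() rather than strtonum(), since strtonum() is not available everywhere; parse_offset() is a hypothetical name for this example and is not the parse_range() code in quark.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Parse a non-negative decimal offset into *out.
 * Returns 0 on success and -1 on malformed or out-of-range input.
 */
static int
parse_offset(const char *s, size_t *out)
{
	/* largest value that fits both a long long and a size_t */
	const long long max = MIN(SIZE_MAX, LLONG_MAX);
	char *end;
	long long v;

	assert(s != NULL && out != NULL);

	errno = 0;
	v = strtoll(s, &end, 10);
	if (end == s || *end != '\0' || errno == ERANGE || v < 0 || v > max)
		return -1;

	*out = (size_t)v;
	return 0;
}
```

Passing SIZE_MAX directly as a signed long long bound is what broke the "range=0-" request: on a typical 64-bit system the conversion is implementation-defined and usually yields -1, so every value exceeds the bound.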
Re: [hackers] [quark] Thoughts on CGI and authentication?
On Mon, 26 Oct 2020 11:49:33 +0100 José Miguel Sánchez García wrote: Dear José, > Funny, that's my current use case. All my CGI is through forms, so > I'm currently running a separate server for the form handlers, > regenerating the HTML and then redirecting to the recently updated > page through a "303 See Other" code. > > My motivation behind integrating CGI into quark was leveraging the > quality of its implementation to avoid the security pitfalls of > badly-written HTTP servers out there. I would only have to worry > about writing a simple script to handle the form data. > > Also, if CGI was integrated into the web server itself, I could use > the same domain/port/endpoint to serve the static page (via a GET > request) and to handle the form (via a POST request). Moot point but > it goes a long way towards usability. another approach would be to have a very small interposer that splits GET and POST requests and forwards them to quark and the CGI-handler respectively. > Finally, CGI is often used to customize the content of a page for a > given user. Imagine a logged in user in a forum: they must see a link > that points to their profile. Anonymous users would see a login/signup > bar instead. > > I must say that, even with these advantages in mind, I've come to > think that CGI would not be appropriate for quark. Its goals are at > odds with the needs of a CGI implementation, and that's fine (there > are alternatives for those who want CGI). Feel free to prove me wrong > :) Software gets really complex if you try covering the last 5% of use-cases. Given the massive flexibility of the static web and how many CGI-applications really are just far away from the original idea of the web I really don't see a reason to tailor quark towards CGI. It was there before, but it just made everything really complicated. With best regards Laslo
Re: [hackers] [quark][PATCH] Add skeleton for keep-alive connections
On Thu, 29 Oct 2020 10:16:36 + José Miguel Sánchez García wrote: Dear José, > The bare minimum has been implemented, it is currently unused. It > allows the server to maintain a stateful connection with the client. > Also, keep-alive connections are more efficient than successive > request/response pairs of connections. thanks for your patch, but this can definitely be implemented much simpler. It's sufficient to have a "binary" field "int keepalive" in the response struct and set it when we prepare the request-struct (defaulting to 0 of course, i.e. close) depending on the request-fields. There's no need to add new data-structures or anything. At the end of serve(), we then check the response-struct and either close the connection or return to receiving the header. I respect your systematic approach, but it's not like there will be any more than a binary state (close or keep-alive) to this process. I'd love to chime in further, but I've got a lot to do at the moment and definitely have the keep-alive-connections as a big thing on my todo-list. With best regards Laslo
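A rough sketch of the "single int field" idea follows. The struct, the function names and the way the Connection field is passed in are assumptions made for this example; quark's real request and response structs look different, and the actual header exchange is elided.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* hypothetical, reduced response struct */
struct response {
	int keepalive; /* 0: close afterwards, 1: receive the next header */
};

/* conn is the value of the request's Connection field, or NULL if absent */
static void
set_keepalive(struct response *res, const char *conn)
{
	assert(res != NULL);
	/* defaults to 0 (close) unless keep-alive was requested */
	res->keepalive = (conn != NULL && strcasecmp(conn, "keep-alive") == 0);
}

/*
 * Serve up to n requests on one connection; conn_fields[i] stands in
 * for the parsed Connection field of the i-th request. Returns how
 * many requests were served before the connection was closed.
 */
static size_t
serve(const char *const *conn_fields, size_t n)
{
	struct response res;
	size_t served = 0;

	while (served < n) {
		/* ... receive header, prepare and send the response ... */
		set_keepalive(&res, conn_fields[served]);
		served++;
		if (!res.keepalive)
			break; /* close the connection */
	}

	return served;
}
```

The only state carried between requests is the flag itself, which is the point of the suggestion: no extra data structure is needed for a binary decision.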
Re: [hackers] [quark][PATCH] Don't erase response on http_send_error_response
On Mon, 26 Oct 2020 11:34:17 +0100 José Miguel Sánchez García wrote: Dear José, > > I also don't see a reason for the constraints you mention. Just add > > an array of group-auth-pairs to the server struct and also add a > > group-auth-pair to the req-struct that you then fill when you parse > > the request fields in http_parse_header(). Then later, in > > http_prepare_header_buf(), you check if they match and either send > > an error-header (access denied) or allow access. > > > > In case the auth-field is empty but the file requires a password, > > you, in turn, send the desired header to ask for auth. > > You are absolutely right, and I just didn't see it when I was working > on it. Sorry for wasting your time. no problem! Sometimes it takes a few refactorings of an idea until it is implemented the best way. With best regards Laslo
Re: [hackers] [quark] Thoughts on CGI and authentication?
On Sun, 25 Oct 2020 18:00:30 +0300 Platon Ryzhikov wrote: Dear Platon, > I've recently had an idea that instead of adding support for running > scripts by HTTP server (which in any case leads to new fork() calls) > one could use a library providing HTTP server itself while all the > logic is created separately and is performed using callbacks from > library main loop. In that case one could attempt to handle dynamic > (and static using proper callbacks) content within fixed number of > threads. there is theoretically no limit to that, but IPC is a difficult thing here given you are within a chroot. One could think of another Unix-domain socket (besides the one that would be created with the -U option) that could be used to "send" and "receive" data, but to be honest, it really is not within quark's scope. Tell me one example where you need CGI which isn't a web forum? To give an example how you can solve something statically: A comment section could be built by having a static web server and also a very thin "handler" that is called when the form is submitted that adds the comment to a database and updates the static data on the fly. The advantage of this is that if someone manages to "crash" the comment-handler or kill the database process or something, the website is not affected. Still, maybe I'm missing something here. Please let me know what you need CGI for! With best regards Laslo
Re: [hackers] [quark][PATCH] Don't erase response on http_send_error_response
On Sun, 25 Oct 2020 11:04:26 +0100 José Miguel Sánchez García wrote: Dear José, > I'm currently relying on the req struct NOT being erased, because I'm > storing the realm the file belongs to there. Then, I'm using that > realm information to build the WWW-Authenticate header for the 401 > error response. > > I could just save that field before erasing everything else, but I > wonder if that's the way to go. If you are getting rid of everything, > maybe I shouldn't make exceptions? Definitely don't make exceptions here, because erasing the entire struct is a consistency measure and being inconsistent there complicates the semantics. I also don't see a reason for the constraints you mention. Just add an array of group-auth-pairs to the server struct and also add a group-auth-pair to the req-struct that you then fill when you parse the request fields in http_parse_header(). Then later, in http_prepare_header_buf(), you check if they match and either send an error-header (access denied) or allow access. In case the auth-field is empty but the file requires a password, you, in turn, send the desired header to ask for auth. With best regards Laslo
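The matching step suggested above could look roughly like this. Everything here is hypothetical: the struct, the enum and check_auth() are invented for the illustration, credentials are compared as plain "user:pw" strings, and how the group of a file and the request's auth field are obtained is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum auth_result { AUTH_OK, AUTH_REQUIRED, AUTH_DENIED };

/* hypothetical pair: files of group require the credentials cred */
struct auth {
	const char *group;
	const char *cred; /* e.g. "user:pw" */
};

/*
 * srv: the server's group-auth-pairs, group: the group of the requested
 * file, cred: the credentials parsed from the request (NULL if absent).
 */
static enum auth_result
check_auth(const struct auth *srv, size_t n, const char *group,
           const char *cred)
{
	size_t i;

	assert(group != NULL);

	for (i = 0; i < n; i++) {
		if (strcmp(srv[i].group, group))
			continue;
		if (cred == NULL || cred[0] == '\0')
			return AUTH_REQUIRED; /* ask the client for auth */
		/* strcmp() is not constant-time; acceptable only in a sketch */
		return strcmp(srv[i].cred, cred) ? AUTH_DENIED : AUTH_OK;
	}

	return AUTH_OK; /* the group is not protected */
}
```

Because the pair lives in the request struct and is refilled on every parse, zeroing the whole struct between requests stays consistent with this scheme.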
Re: [hackers] [quark][PATCH] Don't erase response on http_send_error_response
On Sat, 24 Oct 2020 16:19:13 + José Miguel Sánchez García wrote: Dear José, thanks for taking your time reading the code and reporting this! > The comment before the offending line indicated it was intended to > only erase the fields, but it erased the whole response. It was most > likely a bug. > > /* empty all fields */ > - memset(req, 0, sizeof(*req)); > + memset(&(req->fields), 0, sizeof(req->fields)); No, this is supposed to be like this. I agree that the comment is a bit misleading, but http_parse_header() really builds a request from scratch and first sets it all to zero. With "fields" I'm referring to the struct fields in request, and this misleading comment will be fixed in an upcoming commit. With best regards Laslo
Re: [hackers] [quark] Thoughts on CGI and authentication?
On Fri, 23 Oct 2020 17:10:37 +0200 José Miguel Sánchez García wrote: Dear José, > That was the whole reasoning behind supporting digest authentication. > Sure, TLS protects the connection from third parties messing around > with your connection, but nothing prevents an evil/misconfigured > server from stealing your cleartext password. At least with digest > authentication, you know that the server is not seeing your password > either (at least you would if the login UI for HTTP auth were barely > usable and told you info about the security mechanism being used... > I'm getting off track sorry). I see what you mean. Still, when you go via TLS, it makes sure that the authenticity of the server is assured as well. > > Keeping with the spirit of the current set of command line arguments > > (e.g. -m for maps, of which you can specify as many as you want), > > one could have a flag -p (protect/password/whatever) that takes a > > group name and a cleartext password and applies it to all files > > matching that group in the serving folder, for example '-m "nogroup > > user:pw"' for example. > > I like that: simple and intuitive. Will do that, thanks! You might also go with "group user pw", which saves us one more "token"-format. > I hope it ends up being a drop-in solution, looking at the code it > seems like it will. We'll know when it's done ;) It most probably will be. With best regards Laslo