Date: Fri, 11 Mar 2016 06:26:42 +0100
From: Joerg Sonnenberger <jo...@britannica.bec.de>
Message-ID: <20160311052642.gb27...@britannica.bec.de>

| Three questions here. First is, how much work is it to go from NUL
| delimited strings to explicitly sized strings?

Lots. I would not even attempt that particular change alone. While
certainly an attractive idea, it would (I think) require altering the
intermediate format to not just be strings representing commands, and
slightly later in the processing, redirects, assignments and args (the
cmd name is just arg 0).

| Second is, would that allow most of the special chars to go away?

If done properly, yes, they would all vanish, and be incorporated into
the tree structure (or something isomorphic.) I think this could be a
long term project (perhaps even a GSoC in the future.)

| Third is, alternatively, would it allow to move to a more consistent
| scheme using NUL as escape character?

Unfortunately, no. Believe it or not, sh already uses '\0' as a
separator in places (since the code knows that '\0' simply cannot be a
data character.) And, no, you really don't want to know how that
works...

And from a slightly earlier message ...

| We can also switch to using isalpha_l and friends with explicit C locale.

Yes, that would get about half the advantage (avoiding the locale
dependent syntax that it currently has) without the (slight) speedup.

For the shell I doubt that's worth the bother - this stuff is only used
for recognising shell syntax elements, for which the char set is (or
should be) largely fixed. What user data might be, the shell doesn't
care (which is partly why it is totally ignorant of any i18n issues.)
Those are just bytes...

This means that doing our own char -> char_type mapping is just fine
(I have no idea why FreeBSD felt the need to change it, sometime in the
mid 90's ... I suspect that it came to NetBSD without much evaluation,
just "they did it, their shell has less bugs than ours" ... the NetBSD
PR that caused this was one of the myriad of "set -e is broken" that
NetBSD has had over the years - the patches that were incorporated
included stuff related to that, and all kinds of other changes,
including this one.)

The syntax tables & macros that the shell uses are actually one of the
better designed features of the implementation - or at least, once
UPEOF goes away they will be. They actually work as they should, and
are implemented correctly (assuming that sh only needs to deal with
characters that fit in octets, which for shell syntax, is OK.)

If it matters, making this change will make the shell slightly smaller,
not bigger: the char syntax array (even though unused) was never
removed, and it would no longer require the <ctype.h> arrays.

Switching which values represent the special characters should make no
externally visible change at all (other than, if done alone, which user
data characters get squashed.)

kre
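
For anyone who has not looked at this kind of code, here is a minimal sketch of what a private char -> char_type mapping can look like. It is not the shell's actual table: every name in it (syn_table, syn_mark, syn_init, syn_is_name_start, syn_is_in_name, syn_name_len and the SYN_* classes) is hypothetical, and the set of "special" characters is only an example. It assumes, as above, that syntax characters fit in an octet, and it shows that classification needs nothing from <ctype.h> and never consults the locale.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical syntax classes, one bit each. Not the names sh uses. */
enum {
    SYN_ALPHA   = 1 << 0,  /* A-Z and a-z only, whatever the locale says */
    SYN_DIGIT   = 1 << 1,  /* 0-9 */
    SYN_UNDER   = 1 << 2,  /* underscore */
    SYN_SPECIAL = 1 << 3   /* example set of shell-significant characters */
};

/* One entry per possible octet; entries never marked stay 0 (plain data). */
static unsigned char syn_table[UCHAR_MAX + 1];

/* Mark every byte of the string `set` as belonging to class `cls`. */
static void syn_mark(const char *set, unsigned char cls)
{
    for (; *set != '\0'; set++)
        syn_table[(unsigned char)*set] |= cls;
}

/* Build the table once at startup. Spelling the sets out as strings
 * avoids assuming that letters are contiguous in the execution charset. */
void syn_init(void)
{
    syn_mark("abcdefghijklmnopqrstuvwxyz", SYN_ALPHA);
    syn_mark("ABCDEFGHIJKLMNOPQRSTUVWXYZ", SYN_ALPHA);
    syn_mark("0123456789", SYN_DIGIT);
    syn_mark("_", SYN_UNDER);
    syn_mark("$`\"'\\", SYN_SPECIAL);
}

/* May this byte start a variable name? */
int syn_is_name_start(int c)
{
    return (syn_table[(unsigned char)c] & (SYN_ALPHA | SYN_UNDER)) != 0;
}

/* May this byte appear inside a variable name? */
int syn_is_in_name(int c)
{
    return (syn_table[(unsigned char)c] &
            (SYN_ALPHA | SYN_DIGIT | SYN_UNDER)) != 0;
}

/* Length of the variable name at the start of s, or 0 if there is none.
 * The terminating '\0' has class 0, so the loop stops there by itself. */
size_t syn_name_len(const char *s)
{
    size_t n = 0;

    if (!syn_is_name_start(s[0]))
        return 0;
    while (syn_is_in_name(s[n]))
        n++;
    return n;
}
```

After a call to syn_init(), syn_name_len("PATH=/bin") gives 4 and syn_name_len("9lives") gives 0. A byte such as 0xE9 is never part of a name here, whereas isalpha() may accept it in some 8-bit locales, which is the locale dependence being discussed.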