On 4/9/22 11:36, Moritz Christian Weber wrote:
> Hi,
> the previously mentioned git (clone) toy is making progress, even as I
> struggled a lot with my bad C knowledge.
I look forward to this. Learning is part of the process. :)

> I have a 300 LOC multi-command scaffold, which works in most parts.

I'm curious about your design. I'd been thinking of reusing the existing
plumbing in main.c for this. If git.c is a single C file then I'd probably
split toy_find() into two functions, with a new toy_find_list() function for
toy_find() to call, taking the list to search and its length as arguments.
(The "adjust to skip the first entry of toy_list because the toybox
multiplexer is special cased" logic should stay in the toy_find() wrapper
that operates on the global list.) Then git.c could have a function that
declares its own toy_list:

  struct toy_list *find, git_list[] = {
    // Note: keep this alphabetized for binary search
    {"help", git_help_main, "<1>1agimw", 0},
    {"pull", git_pull_main, "(recurse-submodules):qv", 0},
    ...
  };

  if (!(find = toy_find_list(*toys.optargs, git_list, ARRAY_LEN(git_list))))
    error_exit("no %s", *toys.optargs);
  toy_init(find, toys.optargs+1);

Hmmm, I really need to work out how to handle --option and --no-option in the
argument parsing (git pull is full of them). That's an extra prefix
punctuation in the optstr I expect (I haven't used "+" yet), and a round of
lib/args.c modification to do cleanly. Or maybe not a prefix, maybe it's a
new trailing punctuation on a longopt saying this could have a --no- variant,
plus a second bitfield tracking the "no"s we've seen? Given that a command
line can theoretically have --thingy --no-thingy --thingy it would have to be
aware and toggle somehow... Still need the second bitfield so the caller can
use it to switch off DEFAULT values:

  toys.optflags |= DEFAULTS;
  toys.optflags &= ~toys.nopeflags;

Anyway, I look forward to seeing what you've done. Not meaning to overwhelm
you, a lot of cans of worms can stay closed for a while longer. (Wait for
somebody to complain...)
Fred Brooks had a lovely line in The Mythical Man-Month about how you're
probably going to throw away your first implementation, but if you PLAN on
doing this you will definitely throw TWO implementations away. The insight is
that you don't properly understand a problem until after you've solved it and
tried to use the solution to do real stuff.

And then the x11 guys wrote about the "second system effect", where after
you've done a working prototype implementation that's taken off and developed
a userbase, there's a strong temptation to start over and DO IT RIGHT with a
new one that will conclusively solve every problem for everybody... meaning
you hugely overreach, do a lot of "infrastructure in search of a user", and
make a bloated mess.

And then if you're REALLY lucky you get a third system like Unix, in which
survivors fleeing the collapse of a second system (in Unix's case Multics,
which was a second system response to MIT's CTSS and had 36,000 pages of
design documents before the hardware arrived; the project collapsed under its
own weight basically immediately) started over being MINIMAL this time.
Having learned all the second system lessons about what NOT to do, they asked
"what can I now AVOID doing to make a small and simple system that does only
what it needs to?" (This whole plot arc is one of the many talk ideas I never
got around to properly organizing and giving at a conference...)

> It also supports git pack indexing, as that makes things easier to handle,
> saves the lifetime of SD cards on embedded systems (avoiding decompressing
> all pack objects and then removing the objects checkout doesn't touch), and
> opens the option to extend the toy for further commands.
>
> Anyways, I am a little stuck now in the progress due to 3 mandatory library
> refactorings for git fetch:
>
> 1) zlib inflate() is my biggest pain point.

I'm currently poking at that to add gzip and deflate support to wget.
> The git pack format indicates the length of each zlib compressed object in
> the pack file by the size of UNCOMPRESSED chars after decompression.

Most likely it calculates it before compressing the data. (If I recall, the
header and body were compressed as a single block?)

> Consequently, git fetch expects to read a zlib compressed object as long as
> it did not read more than the announced number of uncompressed chars, and
> then expects feedback from inflate up to which compressed
> char/position/offset it read (file pointer/position/offset). After that it
> reads the next object type, uncompressed object length, and compressed
> object data, and so on. I guess it is some optimization for sequential
> write of the pack format, when compressing.
>
> I am not sure if I can get this behavior out of inflate() in deflate.c by
> initializing deflate and bitbuf in a special way, or if I need another
> inflate function signature. Any kind of advice would be helpful.

I'm not sure I understand. Do you need a function something like:

  int deflate_read(char *buf, unsigned length, unsigned *used);

Which sets used to the number of bytes read from the input... Except the
problem there is the input isn't specified (filehandle, memory buffer), and
"used" is going to be a number of BITS not bytes, and the next read may not
start on a byte boundary if the output was stopped suddenly (it needs the
deflate state to resume writing into the next buffer).

I'd have thought that "here's this filehandle to read uncompressed data from"
could be passed off to the next thing that needs to read more uncompressed
data after the previous data? (I vaguely recall deflate marks end-of-data so
it can provide EOF to the filehandle. When you resume from there, do you need
the next compressed or next uncompressed byte after the previous compressed
block? I guess uncompressed is more flexible and then we can feed it through
the decompressor again ourselves if we need to.)
The problem is, fundamentally what you've got is in two pieces: "here's X
bytes left in the buffer" plus "now read more from the original input". I
kind of want to know what the consumer looks like before trying to smooth
that over...

> 2) sha1sum(): I need a sha1sum() to hash decompressed git objects for
> indexing. md5sum.c seems to have the functionality, but again with focus on
> file descriptors. Any indication how to reuse sha1sum in another toy?

I suppose I can move the hash plumbing into lib? Although possibly it belongs
in portability.c because of that library config stuff to use the assembly
optimized version du jour. And moving it raises the question of whether
sha256sum and sha3 should be in the library too... Sigh.

A command reaching out sideways and using another command's plumbing is...
not encouraged. Although toysh does it for some shell builtins. But toysh
either has to be ok with "this command is switched off in menuconfig" or
force the config with dependencies. (Note that toysh is special cased in
scripts/single.sh.) I suppose git could have "depends" lines in its config.
Or "selects"...

Anyway, generally speaking the three "clean" options are:

A) Run it as a child process, ala xpopen("command") or similar. There's
xrun() and toys/*/mount.c has a tortoise() function that should probably be
genericized somehow. (It's on the todo list.)

B) Bundle multiple commands into a single .c file, although that can go
overboard (with ps.c being the prime example of "probably too much").

C) Move the plumbing in question to lib/ like I did with deflate.c a while
back. (And am revisiting for wget.c now...)

> 3) wget_main(): I currently use execv() for both git fetch http requests
> via wget.c and read the files afterwards. It works (except for the fact
> that the file contains 8 chars of git protocol response too), but it would
> be more elegant to have parts of wget_main() in the library for reuse.
> Piping would also be an option, but less elegant than a function. Any
> indication which parts and function signatures of wget.c could/should be
> refactored into the library? I would like to http get/post and get a
> response buffer/char* back.

I'm working on this now. Let me promote it first, and then I can see if it
needs to go in lib...

Could you explain "8 bytes of git protocol response"? (There's a git://
protocol and an https:// protocol and I never really read up on either...)
Does this also need to read a known number of bytes and then pass off the
filehandle to a second consumer without overshooting?

> Best regards,
> Moritz

Rob
_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net