Re: [Toybox] More expand cleanups

Rob Landley Fri, 30 Nov 2012 00:51:35 -0800

On 11/28/2012 03:34:59 AM, Jonathan Clairembault wrote:

> Back to expand_file(). The downside of using readall() is that interactive
> granularity goes way down. I had this problem with "tee" once upon a time,
> it meant that piping the output of anything through tee made it appear in 4k
> chunks, which meant if you logged the result of a build you couldn't really
> see what the build was doing. I'm not sure expand has the same use cases,
> but that's why I did xread().
Well, it seems like the gnu/damnit version does buffering as well; at least
it does not process input on a line-by-line basis. I don't see why
using xread changes anything, you probably need fgets here. Though I
think we can safely buffer until someone comes in and raises an
interactivity need. wdyt?

I was thinking more along the lines of letting fputc() write data into the stdio.h buffer and letting that worry about when to flush it, and then we don't have to keep track of two positions.

> Ah, hang on. Internationalization. This thing is going to need multibyte
> support for utf8, isn't it? (The same general logic as wc -m. Hmmm, I wonder
> if they can share code?)

Ah! I thought toybox was not dealing with internationalization. Though
internationalization is a good thing to have.

I'm not doing full internationalization with date formats and having sort come up with different orders depending on locale, but UTF8 support is worth doing (with a top level config symbol, a bit like floating point support).

> Ok, I'll have to come back to this in the morning.

And it is... no longer morning! (We'll ignore the two missed days in there.)

I updated wc to theoretically deal with buffer wraps better. In reality I haven't got UTF8 test data to run through this, and should probably find some at some point.

I redid the actual expand function to be simpler: read data into toybuf and then write it to stdout using either fputc(char, stdout) or xprintf("%*c", len, ' ') depending on whether it's a tab or something else. It checks for tab (trigger the space behavior) and newline (reset counters).
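
For anyone following along, here is a rough sketch of that loop, not the actual toybox code. It assumes a single fixed tab stop rather than the -t list, and expand_chunk(), col and tabstop are names made up for illustration. It uses plain fprintf() where toybox would use xprintf(), and counts one column per byte.

```c
#include <stdio.h>

/* Hypothetical helper, not the toybox implementation: copy len bytes from
 * buf to out, replacing each tab with spaces up to the next multiple of
 * tabstop. *col carries the output column across calls, so a caller can
 * feed it successive chunks from read(). Assumes tabstop > 0. */
void expand_chunk(const char *buf, size_t len, unsigned tabstop,
                  unsigned *col, FILE *out)
{
  size_t i;

  for (i = 0; i < len; i++) {
    if (buf[i] == '\t') {
      unsigned pad = tabstop - (*col % tabstop);

      /* "%*c" prints one space right-justified in a field pad wide,
       * which comes out as pad spaces */
      fprintf(out, "%*c", (int)pad, ' ');
      *col += pad;
    } else {
      fputc(buf[i], out);
      if (buf[i] == '\n') *col = 0;  /* newline resets the counter */
      else ++*col;                   /* one column per byte (no UTF8 yet) */
    }
  }
}
```

Since everything goes through stdio, the only position left to track is the column; when the bytes actually reach the terminal is up to the FILE buffer.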

What it does _not_ currently do is track "spaces advanced" separately from "bytes advanced", that needs the utf8 stuff to grab groups of bytes that represent a single character, and to make _that_ work I need to copy the logic I just added to wc, which means maybe I should genericize it into lib/lib.c somehow? Needs more thought.

This also assumes that all characters are the same width, which is probably wrong and I need help with if so. (I dunno how to do fontmetrics here?)

I need to catch up on doing the test suite, because I've been testing by hand. My scrollback buffer says:


echo -e 'blah\tblah' | ./toybox expand | hexdump -C
echo -e 'blah\tblah' | ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah' | \
  ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 3,11,11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 3,11,22,33 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 3,11,22,33,44 | hexdump -C

Possibly I should turn that into an actual automated testy thing.

Sleep time now.

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net