On 11/28/2012 03:34:59 AM, Jonathan Clairembault wrote:
> Back to expand_file(). The downside of using readall() is that interactive
> granularity goes way down. I had this problem with "tee" once upon a time,
> it meant that piping the output of anything through tee made it appear in 4k
> chunks, which meant if you logged the result of a build you couldn't really
> see what the build was doing. I'm not sure expand has the same use cases,
> but that's why I did xread().

Well, it seems the gnu/damnit version buffers as well; at least it
does not process input on a line-by-line basis. I don't see why
using xread changes anything; you probably need fgets here. Though I
think we can safely buffer until someone comes along and raises an
interactivity need. wdyt?

I was thinking more along the lines of letting fputc() write data into the stdio.h buffer and letting that worry about when to flush it, and then we don't have to keep track of two positions.

> Ah, hang on. Internationalization. This thing is going to need multibyte
> support for utf8, isn't it? (The same general logic as wc -m. Hmmm, I wonder
> if they can share code?)

Ah! I thought toybox was not dealing with internationalization. Though
internationalization is a good thing to have.

I'm not doing full internationalization with date formats and having sort come up with different orders depending on locale, but UTF8 support is worth doing (with a top level config symbol, a bit like floating point support).

> Ok, I'll have to come back to this in the morning.

And it is... no longer morning! (We'll ignore the two missed days in there.)

I updated wc to theoretically deal with buffer wraps better. In reality I haven't got UTF8 test data to run through this, and should probably find some at some point.

I redid the actual expand function to be simpler: read data into toybuf and then write it to stdout using either fputc(char, stdout) or xprintf("%*c", len, ' ') depending on whether it's a tab or something else. It checks for tab (trigger the space behavior) and newline (reset counters).
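
A minimal sketch of the loop described above, not the actual toybox code. The helper name expand_chunk(), the single fixed tabstop (the real command takes a list via -t), and the caller-owned column counter are all assumptions made for illustration. Output goes through stdio, so stdio decides when to flush.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from toybox): expand the tabs in one chunk of
 * input. tabstop is a single fixed width, an assumption; *col is the
 * current output column, carried across chunks by the caller. */
static void expand_chunk(const char *buf, size_t len, int tabstop,
                         int *col, FILE *out)
{
  for (size_t i = 0; i < len; i++) {
    char c = buf[i];

    if (c == '\t') {
      /* Number of spaces needed to reach the next tab stop (always >= 1). */
      int pad = tabstop - (*col % tabstop);

      /* "%*c" right-justifies one space in a field of width pad,
       * which emits exactly pad spaces. */
      fprintf(out, "%*c", pad, ' ');
      *col += pad;
    } else {
      fputc(c, out);
      if (c == '\n') *col = 0;  /* newline resets the counter */
      else (*col)++;            /* assumes one column per byte */
    }
  }
}
```

A caller would read() into a buffer, hand each chunk to expand_chunk() with the same col variable, and let stdio flush stdout. With a tabstop of 8, the input "blah\tblah" comes out as "blah", four spaces, "blah".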

What it does _not_ currently do is track "spaces advanced" separately from "bytes advanced". That needs the utf8 stuff to grab groups of bytes that represent a single character. To make _that_ work I need to copy the logic I just added to wc, which means maybe I should genericize it into lib/lib.c somehow? Needs more thought.
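
One way to separate "characters advanced" from "bytes advanced" might look like the sketch below. This is not the logic from wc; the function name utf8_chars() is hypothetical, and it assumes the input is valid UTF-8. It relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so any byte that is not a continuation byte starts a new character.

```c
#include <stddef.h>

/* Hypothetical helper (not from toybox): count UTF-8 characters, not
 * bytes, in buf. Assumes valid UTF-8. Each character is counted at its
 * first byte, so nothing has to be remembered between calls even when a
 * multibyte sequence is split across two buffers. */
static size_t utf8_chars(const char *buf, size_t len)
{
  size_t count = 0;

  for (size_t i = 0; i < len; i++) {
    /* Continuation bytes look like 10xxxxxx; skip them. */
    if (((unsigned char)buf[i] & 0xC0) != 0x80) count++;
  }

  return count;
}
```

This still treats every character as one column wide, so it does nothing for wide or combining characters.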

This also assumes that all characters are the same width, which is probably wrong and I need help with if so. (I dunno how to do fontmetrics here?)

I need to catch up on doing the test suite, because I've been testing by hand. My scrollback buffer says:

echo -e 'blah\tblah' | ./toybox expand | hexdump -C
echo -e 'blah\tblah' | ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah' | \
  ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 3,11,11 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 3,11,22,33 | hexdump -C
echo -e 'blah\tblah and then some more because\tblah\n\tand' | \
  ./toybox expand -t 3,11,22,33,44 | hexdump -C

Possibly I should turn that into an actual automated testy thing.

Sleep time now.

Rob
_______________________________________________
Toybox mailing list
[email protected]
http://lists.landley.net/listinfo.cgi/toybox-landley.net
