Rob Landley wrote:
> On 04/13/14 17:18, Felix Janda wrote:
> > Rob Landley wrote:
[..]
> >> Really the only interesting errno case from iconv is illegal sequence.
> >> The rest just say "ran out of input" or "ran out of output" which is
> >> what you expect from a conversion that's not at the end of the file yet.
> >> (Ok, truncated sequence is a synonym for illegal sequence if we're not
> >> at the end of the buffer, which we can special case as at the _start_ of
> >> the buffer with the memmove logic.)
> >
> > You mean "if we're at the end of the buffer"?
>
> No, if we are at the end of the buffer, truncated sequence isn't an
> error. It means the buffer ran out before the sequence did. But if we're
> _not_ at the end of the buffer, it means the

Ah, I was confusing "end of buffer" and "end of file".

> However, if we just zap the parts we handled, do the memmove to the
> front, refill the buffer, and then have the error _again_ that means the
> truncated sequence is invalid, not a problem with running out of data.
>
> (And that means we don't have to care how long the truncated sequence
> is, so we don't care how far from the end of the buffer still counts as
> retrying instead of skipping.)
>
> >> Hmmm... we should probably pass illegal sequence bytes through. (Pass
> >> 'em through.) Except check if output buffer is full before doing that.
> >> (Don't have to check inleft nonzero because if iconv() returns illegal
> >> sequence but used up all the input buffer, that's a libc bug.)
> >
> > Right...
>
> I think the -c flag controls whether or not to pass them through,
> although posix is going "and we refuse to specify the behavior here at
> all because Microsoft paid us money not to".
>
> http://pubs.opengroup.org/onlinepubs/9699919799/utilities/iconv.html
>
> >> Where would I get a test file to convert? I just ran a text file through
> >> it and confirmed it's not making any changes to it, but that doesn't
> >> mean much. :)
> >
> > More interesting would be roundtrip encoding some files.
>
> Except that means "cat" would pass. Not really a test that instills a
> lot of confidence in me...
>
> > For testing, I just used an uim tarball[1] with some eucjp encoded files.
> > The cleaned up version still seems to work properly.
>
> We can echo -e some snippets. Basically if we convert between utf-8 and
> whatever it is windows uses (latin pi) for like japan or korean or
> something, we'll have shown it Did A Thing. We're not trying to test the
> libc implementation of iconv, just show that we're feeding data into it.
>
> > Even more interesting would be a file with some illegal sequences. I didn't
> > test that at all.
>
> The failure paths are always the most interesting thing to test.
> And the most often overlooked...
>
> We'd also want to test retry across 2k boundaries on both input and
> output if we were being serious. _and_ test a file that exactly filled
> up the input and another that exactly filled the output buffer when the
> file ended.
>
> But again, since I dunno what success looks like, I'll wait for somebody
> who does to complain. :)

I think the simplest thing would be to translate between iso-8859-1 and
utf-8. Attached a simple test.

> > Error handling looks more sensible. Have you considered that iconv_open()
> > might also fail because of insufficient memory.
>
> I looked at doing perror_exit(0) but EINVAL is "Invalid argument" which
> isn't necessarily enough to figure out what went wrong. As for other
> failure causes:
[..]

Ok thanks, I see why you changed the error message.

> >> P.S. Posix iconv has several more command line options. -c is easy and
> >> -s is NOP for us, but I dunno how to do -l.
> >
> > glibc's doesn't have them. So I guessed that they are not much used.
> > Now I see that libiconv has them.
>
> When glibc and posix disagree, posix can potentially win. I'll probably
> do the extra 2 posix options on general principles, and fluff out the
> help text before promoting it.

Yeah, -c and -s look sensible.

Felix

#!/bin/bash

[ -f testing.sh ] && . testing.sh

#testing "name" "command" "result" "infile" "stdin"

iso=$(printf '\357')
utf=$(printf '\303\257') # "ï"
printf a > iso
printf a > utf
for i in $(seq 4096)
do
  printf "$iso" >> iso
  printf "$utf" >> utf
done
testing "iconv" "iconv -f iso-8859-1 iso" "$(cat utf)" "" ""
testing "iconv -c" "iconv -c -f utf-8 iso" "a" "" ""
rm iso utf
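
As a quick sanity check on the byte values the test relies on, here is a minimal sketch that dumps what an iconv binary produces for a single "ï" in each direction. It assumes an iconv that accepts both -f and -t and an od that supports -An -to1; the helper name octal_dump is made up for this illustration and is not part of testing.sh or toybox.

```shell
#!/bin/sh
# Hypothetical helper (not part of testing.sh): print stdin as
# space-separated octal byte values on a single line.
octal_dump() {
  od -An -to1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Octal 357 (0xEF) is "ï" in iso-8859-1; convert it to utf-8.
to_utf8=$(printf '\357' | iconv -f iso-8859-1 -t utf-8 | octal_dump)

# Octal 303 257 is the utf-8 encoding of the same character; convert it back.
to_latin1=$(printf '\303\257' | iconv -f utf-8 -t iso-8859-1 | octal_dump)

echo "iso-8859-1 -> utf-8: $to_utf8"
echo "utf-8 -> iso-8859-1: $to_latin1"
```

If the two lines report 303 257 and 357 respectively, the $iso and $utf values in the test describe the same character, so a mismatch in the "iconv" test would point at the buffer handling rather than at the test data.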

_______________________________________________
Toybox mailing list
Toybox@lists.landley.net
http://lists.landley.net/listinfo.cgi/toybox-landley.net