Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
> See docs/README.developers. We don't have a written convention
> for when to use a branch, so it's a judgment call considering
> how invasive the changes will be, duration, likelihood of
> success, and whatever else. (I am planning to remove my old
> merged branches, maybe after the release. We don't do that
> very often, maybe this can give you a clue to how often we
> create branches.)

That pretty much nails it. I think most of us just work on private local branches until we're ready to merge to the head and push back to the main repository. Public branches generally show up when things need wider testing and review before being merged back to the trunk; they show up infrequently.

--lyndon

___
Nmh-workers mailing list
Nmh-workers@nongnu.org
https://lists.nongnu.org/mailman/listinfo/nmh-workers
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Ralph wrote:

> Thanks, that helped quite a bit. I've pushed a trivial fix to master.
> If I've done anything wrong, e.g. not configured my ID properly, then
> let me know.

Looks fine.

One thing that we should add to README.developers, the buildbot results are available at http://orthanc.ca:8010/waterfall . It polls for commits, I'm not sure what the interval is but it seems like a small number of minutes. Green is good. Only three hosts are active on master now.

David
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Hi,

David wrote:

> See docs/README.developers.

Thanks, that helped quite a bit. I've pushed a trivial fix to master. If I've done anything wrong, e.g. not configured my ID properly, then let me know.

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
>:-) Through a cunning bit of social engineering the other day, I did
>apparently gain access to nmh's git repository. Is there anything that
>documents conventions in using it for the project, e.g. whether to check
>in directly on master or use a branch?

I think David covered the (lack of) rules adequately; we don't have any real rules for branch vs master, other than "use your best judgement".

The only other thing to be aware of is that if you're building from the git repo, you need to have more stuff installed than you do if you're building from a distribution tarfile (namely the Autotools suite, yacc, and lex).

--Ken
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Ralph wrote:

> Hi David,
>
> :-) Through a cunning bit of social engineering the other day,

I don't want to know :-)

> I did apparently gain access to nmh's git repository. Is there
> anything that documents conventions in using it for the project,
> e.g. whether to check in directly on master or use a branch?

See docs/README.developers. We don't have a written convention for when to use a branch, so it's a judgment call considering how invasive the changes will be, duration, likelihood of success, and whatever else. (I am planning to remove my old merged branches, maybe after the release. We don't do that very often, maybe this can give you a clue to how often we create branches.)

> > This shouldn't happen very often, so I'd lean toward the simpler code.
>
> By that I guess you mean the status quo?

Yes :-)

David
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Hi David,

> Great! We'll get you to the bleeding edge yet.

:-) Through a cunning bit of social engineering the other day, I did apparently gain access to nmh's git repository. Is there anything that documents conventions in using it for the project, e.g. whether to check in directly on master or use a branch?

> > An alternative to sliding down the remaining unprocessed input with
> > memmove(3) and shortening the next fread, I suppose.
>
> This shouldn't happen very often, so I'd lean toward the simpler code.

By that I guess you mean the status quo?

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Ralph wrote:

> Hi David,
>
> > Might Ken's commit adfed5f72bc07ac7de8dfc62188338d4d4f25a38 have fixed
> > this?
>
> Yes, indeed.

Great! We'll get you to the bleeding edge yet.

> IOW, it seeks to 4Ki and reads 4Ki - 1 so it's left in the right place
> to read the one byte, that we already have, next time. An alternative
> to sliding down the remaining unprocessed input with memmove(3) and
> shortening the next fread, I suppose.

This shouldn't happen very often, so I'd lean toward the simpler code.

> `strace -e desc mhparam foo' shows many lseek(2)s to get the current
> position on .mh_profile; always the same. Triggered by the infamous
> m_getfld()'s ftello(3). :-) Must trigger for every header of every
> email processed too.

Yes, m_getfld() could use another rewrite. Though the last one really wasn't, I tried to maintain the existing logic to avoid too many simultaneous changes.

David
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Hi David,

> Might Ken's commit adfed5f72bc07ac7de8dfc62188338d4d4f25a38 have fixed
> this?

Yes, indeed. I get identical output from iconv(1) and mhshow(1) with the function from http://git.savannah.gnu.org/cgit/nmh.git/tree/uip/mhshowsbr.c.

> +    if (errno == EINVAL) {
> +        /* middle of multi-byte sequence */
> +        if (write (fd, dest_buffer, outbytes_before - outbytes) < 0) {
> +            advise (dest, "write");
> +        }
> +        fseeko (*fp, -inbytes, SEEK_CUR);

Interestingly, that seeking back by the 1 unprocessed byte of input so the top-of-loop's fread can take another whole 8KiB triggers

    fseeko(0xaf20d0, -1, 1, 0x7f1c02ff3530
    SYS_lseek(3, 4096, 0)= 4096
    SYS_read(3, "\315\273\n5)"..., 4095) = 4095
    <... fseeko resumed> ) = 0
    __fread_chk(0x7fff69f40800, 8192, 1, 8192

IOW, it seeks to 4Ki and reads 4Ki - 1 so it's left in the right place to read the one byte, that we already have, next time. An alternative to sliding down the remaining unprocessed input with memmove(3) and shortening the next fread, I suppose.

`strace -e desc mhparam foo' shows many lseek(2)s to get the current position on .mh_profile; always the same. Triggered by the infamous m_getfld()'s ftello(3). :-) Must trigger for every header of every email processed too.

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
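For comparison, the memmove(3) alternative mentioned above might look roughly like the sketch below. It is not nmh code: convert_stream, its parameters, and the small fixed buffer sizes are all made up for illustration, and it ignores stateful encodings (no final flush call). The idea is only that on EINVAL or E2BIG the unconsumed tail is slid to the front of the input buffer and the next fread(3) is shortened to fit behind it, so no seeking is needed.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/*
 * Convert `in` to `out` through a small input buffer of `bufsize` bytes.
 * Unconsumed input is carried over to the next pass with memmove(3).
 * Returns 0 on success, -1 on error.  (Hypothetical helper, not nmh code.)
 */
static int convert_stream(FILE *in, FILE *out, const char *to,
                          const char *from, size_t bufsize)
{
    char inbuf[64], outbuf[256];
    size_t carried = 0;          /* bytes left over from the previous pass */
    int rc = 0;
    iconv_t cd;

    assert(bufsize >= 8 && bufsize <= sizeof inbuf);
    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;

    for (;;) {
        /* Shortened read: only fill the space behind the carried bytes. */
        size_t got = fread(inbuf + carried, 1, bufsize - carried, in);
        size_t inleft = carried + got;
        size_t outleft = sizeof outbuf;
        size_t produced;
        char *ip = inbuf, *op = outbuf;

        if (inleft == 0) {       /* nothing carried, nothing read */
            if (ferror(in))
                rc = -1;
            break;
        }
        if (iconv(cd, &ip, &inleft, &op, &outleft) == (size_t)-1 &&
            errno != EINVAL && errno != E2BIG) {
            rc = -1;             /* EILSEQ or something unexpected */
            break;
        }
        produced = sizeof outbuf - outleft;
        if (fwrite(outbuf, 1, produced, out) != produced) {
            rc = -1;
            break;
        }
        if (got == 0 && inleft == carried) {
            rc = -1;             /* input ended inside a character */
            break;
        }
        memmove(inbuf, ip, inleft);   /* slide the remainder down */
        carried = inleft;
    }
    iconv_close(cd);
    return rc;
}
```

Calling it with a bufsize of 8 on input whose eighth and ninth bytes form one two-byte character exercises the carry-over path. Whether this is actually simpler than the fseeko() approach is a matter of taste.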
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
>> 1.6's mhshow(1) says
>
>Might Ken's commit adfed5f72bc07ac7de8dfc62188338d4d4f25a38
>have fixed this?

I think our generic assumption is that utf8 is the only multibyte sequence we have to deal with. Although I guess that really only matters if we get an EILSEQ and we're substituting a '?'.

--Ken
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Ralph wrote:

> > 1.6's mhshow(1) says
> >
> >     mhshow: unable to convert character set to gb2312, continuing...
>
> I meant to draw attention to that. It was converting *from* gb2312 (to
> UTF-8).

Fixed, thanks.

David
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Ralph wrote:

> 1.6's mhshow(1) says

Might Ken's commit adfed5f72bc07ac7de8dfc62188338d4d4f25a38 have fixed this?

> I took a look at mhshowsbr.c's convert_charset() and I think it's
> failing to handle an EINVAL return.

That commit adds handling of EINVAL and EILSEQ; the relevant portion of the diff is below.

David

+    if (errno == EINVAL) {
+        /* middle of multi-byte sequence */
+        if (write (fd, dest_buffer, outbytes_before - outbytes) < 0) {
+            advise (dest, "write");
+        }
+        fseeko (*fp, -inbytes, SEEK_CUR);
+        if (end > 0) bytes_to_read += inbytes;
+        /* advise(NULL, "convert_charset: EINVAL"); */
+        continue;
+    }
+    if (errno == EILSEQ) {
+        /* invalid multi-byte sequence */
+        if (fromutf8) {
+            for (++ib, --inbytes;
+                 inbytes > 0 &&
+                     (((unsigned char) *ib) & 0xc0) == 0x80;
+                 ++ib, --inbytes)
+                continue;
+        } else {
+            ib++; inbytes--; /* skip it */
+        }
+        (*ob++) = '?'; outbytes --;
+        /* advise(NULL, "convert_charset: EILSEQ"); */
+        goto iconv_start;
+    }
+    advise (NULL, "convert_charset: errno = %d", errno);
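The UTF-8 branch of the EILSEQ handling above can be looked at on its own. The following is a small sketch of just that skipping rule, pulled out into a standalone function for illustration; utf8_skip_invalid is a made-up name, not something in nmh. It returns how many bytes to step over: the offending byte plus any continuation bytes (bit pattern 10xxxxxx) that immediately follow it, after which the caller would emit a single '?'.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return how many bytes to skip at the start of buf after iconv(3) has
 * reported EILSEQ on UTF-8 input: the byte iconv stopped at, plus any
 * continuation bytes (10xxxxxx) directly behind it.
 * (Hypothetical helper; same test as the 0xc0/0x80 mask in the diff.)
 */
static size_t utf8_skip_invalid(const char *buf, size_t len)
{
    size_t n;

    assert(len > 0);   /* EILSEQ implies at least one unprocessed byte */
    n = 1;             /* always skip the byte iconv choked on */
    while (n < len && ((unsigned char)buf[n] & 0xc0) == 0x80)
        n++;
    return n;
}
```

For a non-UTF-8 source charset there is no such rule to lean on, which is presumably why the diff skips exactly one byte in that case.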
Re: [Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Hi again,

> 1.6's mhshow(1) says
>
>     mhshow: unable to convert character set to gb2312, continuing...

I meant to draw attention to that. It was converting *from* gb2312 (to UTF-8).

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
[Nmh-workers] mhshow(1) iconv(3) Bug if Multibyte Straddles Buffer End.
Hi,

I got an email recently, probably spam, its charset is gb2312.

    $ mhlist
     msg part  type/subtype  size description
    8032       text/plain     10K
                 charset="gb2312"
    $

1.6's mhshow(1) says

    mhshow: unable to convert character set to gb2312, continuing...

But I mhstore(1)'d it and used iconv(1) and that was happy so I dug.

I took a look at mhshowsbr.c's convert_charset() and I think it's failing to handle an EINVAL return.

inbytes and outbytes both start at 8Ki. That's E2BIG having processed 5,646 of in and 8,191 of out. After the bump we attempt to continue, 2,546 of in remaining, and 5,093 of out. That's EINVAL with in now at 8,191, out 11,880.

I think the two-byte rune is straddling the 8Ki boundary. I've annotated this with commas between runes.

    $ mhstore -outfile - 8032 | hd | grep -B 1 2000
    storing message 8032 to stdout
    1ff0  d0,c1 a6,0a,0a,b0 cb,a1  a2,b4 d3,bc bc,ca f5,d7  ||
    2000  df,cf f2,b9 dc,c0 ed,b5  c4,cb c4,b8 f6,ba cb,d0  ||
    $

--
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy
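The EINVAL case described above can be reproduced in isolation, without the original message. The sketch below is an illustration only, not nmh code: try_convert is a made-up helper, and it uses UTF-8 to UCS-4LE rather than gb2312 so that it does not depend on which charsets the local iconv has installed. It hands iconv(3) a buffer that ends partway through a two-byte character and reports the errno that comes back, plus how many input bytes were accepted.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Convert the first `len` bytes of `in` from UTF-8 to UCS-4LE in a single
 * iconv(3) call.  Returns 0 on success, otherwise the errno iconv set.
 * *consumed receives the number of input bytes iconv accepted.
 * (Hypothetical helper for illustration.)
 */
static int try_convert(const char *in, size_t len, size_t *consumed)
{
    char out[64];
    char *ip = (char *)in;   /* iconv's prototype wants a non-const pointer */
    char *op = out;
    size_t inleft = len, outleft = sizeof out;
    int err = 0;

    iconv_t cd = iconv_open("UCS-4LE", "UTF-8");
    assert(cd != (iconv_t)-1);

    if (iconv(cd, &ip, &inleft, &op, &outleft) == (size_t)-1)
        err = errno;

    /* On EINVAL, ip is left at the start of the incomplete sequence. */
    *consumed = len - inleft;
    iconv_close(cd);
    return err;
}
```

Passing the three bytes 'a', 0xC3, 0xA9 with len 3 should succeed; with len 2 the buffer ends after the lead byte 0xC3, which is the same situation as the rune straddling the 8Ki boundary. EINVAL here means "give me more input", not "the input is bad", so a caller that treats it as fatal reports a conversion failure on perfectly valid mail.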