We need ragel's internal data structures to match the signedness of the input array, and sometimes you just need a signed type because you're parsing a stream of integers.
Perhaps what might be better is defaulting the C alphtype to unsigned char, if
that's the more common case.

-Adrian

On Thu, Oct 24, 2013 at 12:53:12PM -0700, William Ahern wrote:
> On Thu, Oct 24, 2013 at 08:52:17PM +0200, Peter van Dijk wrote:
> > Hello folks,
> >
> > we (PowerDNS) have a small Ragel parser for segmenting and unescaping DNS
> > TXT record data. Some time ago, we expanded the allowed inputs for this
> > parser to the full 8 bit 'extended ASCII' range (which Ragel calls
> > 'extend').
> >
> > This works well on most platforms - but it failed for us on Debian/s390x.
> >
> > After a lot of digging I found that char is unsigned on s390x, while it is
> > signed on amd64, i386 and many other platforms.
> >
> > I have added 'alphtype unsigned char' to our Ragel file. This makes the
> > parser work reliably on both amd64 and s390x (and, hopefully, many other
> > platforms).
> >
> > However, I feel something is wrong. It seems that on s390x, Ragel is
> > mostly confused about the type of char. It generates a parser that treats
> > extend as -128..127, but maps non-ASCII inputs in the 128..255 range. This
> > discrepancy feels like a Ragel issue to me.
> >
> > A much longer version of this story is at
> > https://www.evernote.com/shard/s344/sh/cb968134-4d58-4e46-8b5e-47366a129038/60fafaf56d5a350edf891cf82cefc66d
> >
> > My question: is this a Ragel bug? Regardless of yes/no, is what I did
> > (alphtype unsigned char) the best workaround?
>
> IMHO it would probably be better for Ragel to use unsigned char arithmetic
> for both char and unsigned char. Off the top of my head it even seems like
> Ragel should treat all input as unsigned.
>
> FWIW, I always use unsigned arithmetic, for Ragel and most everything else.
> Signed arithmetic is for mathematical formulas, not bit twiddling and string
> processing. At the very least, it quickly leads to undefined behavior,
> whereas signed->unsigned conversions in C are always well defined.
>
> Does anybody on the list actually use or depend on signed behavior in their
> machines?
>
>
> _______________________________________________
> ragel-users mailing list
> ragel-users@complang.org
> http://www.complang.org/mailman/listinfo/ragel-users