Re: [sane-devel] Character encoding used for sane_strstatus() strings
Hi,

On 2022-07-18 03:19, Povilas Kanapickas wrote:
> Hi John,
>
> On 2022-07-18 05:25, John Scott wrote:
>> The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-1").
>> However, from inspecting the code for sane_strstatus(), it appears that
>> it just returns ordinary string literals, which use whatever encoding the
>> compiler prescribes for narrow string literals and need not be the same.
> Agreed, going by the letter of standards this is indeed a problem.
>> So, what character encoding should I be assuming for strings coming from
>> sane_strstatus() as an application writer? One solution to this dilemma,
>> since sane_strstatus() appears to only use characters from ASCII in the
>> strings, is to use UTF-8 string literals, like this:
>> u8"Hello, world"
> This would bump compiler requirements to C11. I don't think this is bad,
> because we already require C++ for at least one popular backend, so it's
> unlikely we have many platforms with just an ancient C compiler available.
> I'm CC'ing Ralph for a second opinion on whether we can start requiring C11.
>
> By the way, does the current assumption actually break in practice, that
> is, are there compilers for which ASCII text will not encode to a subset
> of ISO-8859-1?
>> If you can affirm that the specification needs to prevail, I can send a
>> merge request to adjust the string literals accordingly.
> Let's wait until Ralph replies and then we can see how to proceed.
>
> Thanks a lot for noticing this.
>
> Regards,
> Povilas

None of the suggestions that we have seen so far seem very portable, yet this situation is indeed a problem. Since UTF-8 is pretty much the de facto string representation these days, would a better solution be to change the SANE spec to specify UTF-8? If the currently supported text strings are the same in UTF-8 and ISO-8859-1, then there should be no practical fallout from the change.

What would the fallout of such a change be? Would it make frontend support simpler? Do any of our current frontends actually care?

Cheers,
Ralph
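Editor's note: Ralph's argument holds because ISO-8859-1 and UTF-8 agree on every byte from 0x00 to 0x7F; only characters above that range change representation. The sketch below illustrates this under stated assumptions: `is_ascii()` and `latin1_to_utf8()` are hypothetical helper names, not part of SANE. The first is the check a frontend could use to confirm a string is identical in both encodings; the second shows what changes for non-ASCII Latin-1 text.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if s contains only 7-bit ASCII bytes.
   Such strings have the same byte representation in ISO-8859-1 and UTF-8. */
int is_ascii(const char *s)
{
    for (; *s != '\0'; s++)
        if ((unsigned char)*s > 0x7F)
            return 0;
    return 1;
}

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to UTF-8.
   out must have room for 2 * strlen(in) + 1 bytes, since every Latin-1
   character becomes at most two UTF-8 bytes. Returns the number of bytes
   written, not counting the terminating NUL. */
size_t latin1_to_utf8(const char *in, char *out)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c < 0x80) {
            out[n++] = (char)c;                   /* ASCII: byte unchanged */
        } else {
            out[n++] = (char)(0xC0 | (c >> 6));   /* two-byte lead byte */
            out[n++] = (char)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    out[n] = '\0';
    return n;
}
```

For example, the Latin-1 byte 0xE9 ("é") becomes the UTF-8 sequence 0xC3 0xA9, while a purely ASCII message such as "Device busy" comes out byte-for-byte identical, which is why the spec change would be invisible for ASCII-only strings.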
Re: [sane-devel] Character encoding used for sane_strstatus() strings
On 07/19/22 18:04, Ralph Little wrote:
> Hi,
>
> On 2022-07-18 03:19, Povilas Kanapickas wrote:
>> Hi John,
>>
>> On 2022-07-18 05:25, John Scott wrote:
>>> The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-1").
>>> However, from inspecting the code for sane_strstatus(), it appears that
>>> it just returns ordinary string literals, which use whatever encoding the
>>> compiler prescribes for narrow string literals and need not be the same.
>> Agreed, going by the letter of standards this is indeed a problem.
>>> So, what character encoding should I be assuming for strings coming from
>>> sane_strstatus() as an application writer? One solution to this dilemma,
>>> since sane_strstatus() appears to only use characters from ASCII in the
>>> strings, is to use UTF-8 string literals, like this:
>>> u8"Hello, world"
>> This would bump compiler requirements to C11. I don't think this is bad,
>> because we already require C++ for at least one popular backend, so it's
>> unlikely we have many platforms with just an ancient C compiler available.
>> I'm CC'ing Ralph for a second opinion on whether we can start requiring C11.
> I personally don't mind going to C11. Mind you, we did just get an enquiry
> from someone trying to build on Solaris, but they have gcc 4.9, which should
> support it, so it is probably OK. GCC 4.9 is pretty ancient.
>
> Cheers,
> Ralph

hey saners

i'm likely that solaris sparc/gcc 4.9 someone, and everything i've got is far more than 'pretty ancient'. heck, the gcc 4.9 vintage is only 2014.04.27! if that is 'pretty ancient', everything else around here is prehistoric!

gcc 4.9 does provide:

    -std=c++11   Conform to the ISO 2011 C++ standard
    -std=c++14   This switch lacks documentation
    -std=c++1y   Conform to the ISO 2014(?) C++ draft standard (experimental and incomplete support)

and

    -std=c11     Conform to the ISO 2011 C standard (experimental and incomplete support)
    -std=c1x     Deprecated in favor of -std=c11

plus the -std=gnu variants of the above. whether this compiler is in conformance is unknown to me, but it seems to work.
but i am unable to get the sane-backend-1.0.32.86-911be/backend/genesys/* code to yield a functional scanimage for the canon lide 200. the scanner controls work and it reads image data, but the image has loads of vertical bands. any help with that is welcome. the important thing to note is that this system generated a scanimage from the sane-backend-1.0.28/backend/genesys_*.cc code, so the basic c++ infrastructure seems to be ok.

aloha
ras
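Editor's note: since gcc 4.9 labels its -std=c11 support "experimental and incomplete", one way to check whether a given toolchain handles the u8 literals proposed in this thread is a small smoke test. The following is a sketch only; `c11_u8_smoke_test()` is a hypothetical name, and it would be compiled with something like `gcc -std=c11 -c`.

```c
#include <string.h>

/* Hypothetical smoke test: returns 1 when this file was compiled as C11
   or later and the u8 literal below yields the expected UTF-8 bytes.
   Returns 0 in older language modes, where u8"..." would not even parse,
   so the literal is kept behind the preprocessor check. */
int c11_u8_smoke_test(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
    /* "SANE" plus its terminating NUL in UTF-8 (identical to ASCII). */
    static const unsigned char expected[] = { 0x53, 0x41, 0x4E, 0x45, 0x00 };

    return memcmp(u8"SANE", expected, sizeof expected) == 0;
#else
    return 0;
#endif
}
```

Passing this only shows that the feature SANE would need is present; it says nothing about the rest of the compiler's C11 conformance.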
Re: [sane-devel] Character encoding used for sane_strstatus() strings
Hi,

On 2022-07-18 03:19, Povilas Kanapickas wrote:
> Hi John,
>
> On 2022-07-18 05:25, John Scott wrote:
>> The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-1").
>> However, from inspecting the code for sane_strstatus(), it appears that
>> it just returns ordinary string literals, which use whatever encoding the
>> compiler prescribes for narrow string literals and need not be the same.
> Agreed, going by the letter of standards this is indeed a problem.
>> So, what character encoding should I be assuming for strings coming from
>> sane_strstatus() as an application writer? One solution to this dilemma,
>> since sane_strstatus() appears to only use characters from ASCII in the
>> strings, is to use UTF-8 string literals, like this:
>> u8"Hello, world"
> This would bump compiler requirements to C11. I don't think this is bad,
> because we already require C++ for at least one popular backend, so it's
> unlikely we have many platforms with just an ancient C compiler available.
> I'm CC'ing Ralph for a second opinion on whether we can start requiring C11.

I personally don't mind going to C11. Mind you, we did just get an enquiry from someone trying to build on Solaris, but they have gcc 4.9, which should support it, so it is probably OK. GCC 4.9 is pretty ancient.

Cheers,
Ralph
Re: [sane-devel] Character encoding used for sane_strstatus() strings
For Linux systems, C11 is more or less the default, since the Linux kernel now requires C11. So that would cover all the major Linux distros.

On Mon, Jul 18, 2022 at 10:48 AM John Scott wrote:
>
> On Mon, 2022-07-18 at 13:19 +0300, Povilas Kanapickas wrote:
> > By the way, does the current assumption actually break in practice, that
> > is, are there compilers for which ASCII text will not encode to a subset
> > of ISO-8859-1?
> I assume you mean "Are there compilers for which narrow/multibyte string
> literals will not encode to a subset of ISO-8859-1?" In that case, I
> haven't researched the matter and don't know of a system for which this
> is a problem off the top of my head.
>
> Note that if we're unwilling to bump compiler requirements to C11, there
> are still a couple options. GCC has the -fexec-charset option to specify
> what encoding "ordinary" string literals should be in; we can set this
> to ISO-8859-1 when building SANE, but this won't be portable to
> compilers without this option.
>
> We could also define a macro that's compatible with older compilers like
> this:
> #if __STDC_VERSION__ >= 201112L
> #define SANE_STRING(X) u8##X
> #else
> #define SANE_STRING(X) X
> #endif
>
> Lastly, iconv() is always an option.

--
Kelly "STrRedWolf" Price
http://redwolf.ws
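Editor's note: a sketch of how John's SANE_STRING macro might be used. The status enum, message table, and `example_strstatus()` are hypothetical stand-ins, not SANE's actual sane_strstatus() implementation. The messages are stored in char arrays rather than as bare pointers because C23 changed u8 literals to arrays of char8_t (an unsigned char type), yet still allows them to initialize arrays of char.

```c
/* John Scott's macro from this thread: UTF-8 literals on C11 and later,
   ordinary literals on older compilers. */
#if __STDC_VERSION__ >= 201112L
#define SANE_STRING(X) u8##X
#else
#define SANE_STRING(X) X
#endif

/* Hypothetical status codes for illustration; not SANE's SANE_Status. */
enum example_status {
    EXAMPLE_GOOD,
    EXAMPLE_DEVICE_BUSY,
    EXAMPLE_COUNT /* number of known codes */
};

/* char arrays rather than pointers: an array of char may be initialized
   from a u8 literal under C11 and C23 alike, whereas in C23 a u8 literal
   decays to a pointer to char8_t (unsigned char), not to char. */
static const char example_messages[EXAMPLE_COUNT][16] = {
    SANE_STRING("Success"),
    SANE_STRING("Device busy")
};

static const char example_unknown[] = SANE_STRING("Unknown status");

/* Maps a status code to its message, in the spirit of sane_strstatus(). */
const char *example_strstatus(enum example_status status)
{
    if ((unsigned)status >= EXAMPLE_COUNT)
        return example_unknown;
    return example_messages[status];
}
```

With a pre-C11 compiler the macro falls back to ordinary literals, so the encoding question is only settled on toolchains that take the u8 branch.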
Re: [sane-devel] Character encoding used for sane_strstatus() strings
On Mon, 2022-07-18 at 13:19 +0300, Povilas Kanapickas wrote:
> By the way, does the current assumption actually break in practice, that
> is, are there compilers for which ASCII text will not encode to a subset
> of ISO-8859-1?

I assume you mean "Are there compilers for which narrow/multibyte string literals will not encode to a subset of ISO-8859-1?" In that case, I haven't researched the matter and don't know of a system for which this is a problem off the top of my head.

Note that if we're unwilling to bump compiler requirements to C11, there are still a couple options. GCC has the -fexec-charset option to specify what encoding "ordinary" string literals should be in; we can set this to ISO-8859-1 when building SANE, but this won't be portable to compilers without this option.

We could also define a macro that's compatible with older compilers like this:

#if __STDC_VERSION__ >= 201112L
#define SANE_STRING(X) u8##X
#else
#define SANE_STRING(X) X
#endif

Lastly, iconv() is always an option.
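Editor's note: John's last suggestion, iconv(), could look roughly like this on the frontend side. This is a sketch; `frontend_latin1_to_utf8()` is a hypothetical helper that treats the incoming string as ISO-8859-1, as the SANE spec says, and returns a newly allocated UTF-8 copy. It uses only the POSIX iconv_open()/iconv()/iconv_close() calls; with glibc these live in libc, while on some other systems you may need to link with -liconv.

```c
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: converts an ISO-8859-1 string to a newly allocated
   UTF-8 string. Returns NULL on failure; the caller must free() the result. */
char *frontend_latin1_to_utf8(const char *in)
{
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(in);
    size_t out_size = 2 * in_left + 1; /* Latin-1 -> UTF-8 at most doubles */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)in;      /* iconv() takes char **, but does not
                                       modify the input bytes */
    char *out_ptr = out;
    size_t out_left = out_size - 1; /* reserve room for the terminating NUL */

    if (iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *out_ptr = '\0';
    iconv_close(cd);
    return out;
}
```

Because ISO-8859-1 maps every byte value to a character, this conversion cannot hit an invalid input sequence; it only fails if the converter cannot be opened or memory runs out.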
Re: [sane-devel] Character encoding used for sane_strstatus() strings
Hi John,

On 2022-07-18 05:25, John Scott wrote:
> The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-1").
> However, from inspecting the code for sane_strstatus(), it appears that
> it just returns ordinary string literals, which use whatever encoding the
> compiler prescribes for narrow string literals and need not be the same.

Agreed, going by the letter of standards this is indeed a problem.

> So, what character encoding should I be assuming for strings coming from
> sane_strstatus() as an application writer? One solution to this dilemma,
> since sane_strstatus() appears to only use characters from ASCII in the
> strings, is to use UTF-8 string literals, like this:
> u8"Hello, world"

This would bump compiler requirements to C11. I don't think this is bad, because we already require C++ for at least one popular backend, so it's unlikely we have many platforms with just an ancient C compiler available. I'm CC'ing Ralph for a second opinion on whether we can start requiring C11.

By the way, does the current assumption actually break in practice, that is, are there compilers for which ASCII text will not encode to a subset of ISO-8859-1?

> If you can affirm that the specification needs to prevail, I can send a
> merge request to adjust the string literals accordingly.

Let's wait until Ralph replies and then we can see how to proceed.

Thanks a lot for noticing this.

Regards,
Povilas
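Editor's note: Povilas's question of whether the current assumption breaks in practice can be checked per toolchain. The sketch below, using a hypothetical function name, compares the bytes the compiler emits for a narrow (ordinary) string literal against their ASCII codes. On ASCII-based toolchains it returns 1; EBCDIC-based systems, where 'A' is 0xC1 rather than 0x41, are the usual counterexample. It sticks to C89 so it also runs on compilers that a C11 requirement would exclude.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 if the compiler's narrow execution
   character set encodes the sample characters with their ASCII codes
   (and hence as a subset of ISO-8859-1), 0 otherwise. */
int narrow_literals_are_ascii(void)
{
    static const char sample[] = "ABCXYZabcxyz0189 ,.()";
    static const unsigned char ascii[] = {
        0x41, 0x42, 0x43, 0x58, 0x59, 0x5A, /* A B C X Y Z */
        0x61, 0x62, 0x63, 0x78, 0x79, 0x7A, /* a b c x y z */
        0x30, 0x31, 0x38, 0x39,             /* 0 1 8 9     */
        0x20, 0x2C, 0x2E, 0x28, 0x29        /* space , . ( ) */
    };
    size_t i;

    if (sizeof sample - 1 != sizeof ascii)
        return 0;
    for (i = 0; i < sizeof ascii; i++)
        if ((unsigned char)sample[i] != ascii[i])
            return 0;
    return 1;
}
```

This samples only a few characters, so it is a sanity check rather than proof; GCC's -fexec-charset, mentioned elsewhere in the thread, is one way the result can change for the same source.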