Re: [sane-devel] Character encoding used for sane_strstatus() strings

2022-07-24 Thread Ralph Little


Hi,

On 2022-07-18 03:19, Povilas Kanapickas wrote:
> Hi John,
>
> On 2022-07-18 05:25, John Scott wrote:
>> The SANE spec says that all strings are encoded in ISO-8859-1
>> ("Latin-1"). However, from inspecting the code for sane_strstatus(),
>> it appears that it just returns ordinary string literals, which use
>> whatever encoding the compiler prescribes for narrow string literals
>> and need not be the same.
>
> Agreed, going by the letter of standards this is indeed a problem.
>
>> So, what character encoding should I be assuming for strings coming
>> from sane_strstatus() as an application writer? One solution to this
>> dilemma, since sane_strstatus() appears to only use characters from
>> ASCII in the strings, is to use UTF-8 string literals, like this:
>>   u8"Hello, world"
>
> This would bump compiler requirements to C11. I don't think this is bad,
> because we already require C++ for at least one popular backend so it's
> unlikely we have many platforms with only an ancient C compiler
> available.
>
> I'm CC'ing Ralph for a second opinion of whether we can start requiring
> C11.
>
> By the way, does the current assumption actually break in practice, that
> is, are there compilers for which ASCII text will not encode to a subset
> of ISO-8859-1?
>
>> If you can affirm that the specification needs to prevail, I can send a
>> merge request to adjust the string literals accordingly.
>
> Let's wait until Ralph replies and then we can see how to proceed.
>
> Thanks a lot for noticing this.
>
> Regards,
> Povilas


None of the suggestions that we have seen so far seem very portable, yet 
this situation is indeed a problem.


Since UTF-8 is pretty much the de facto string representation these 
days, would a better solution be to change the SANE spec. to specify UTF-8?
If the currently supported text strings are the same in UTF-8 and 
ISO-8859-1 then there should be no practical fallout from the change.
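
To illustrate that point: any string made up only of 7-bit ASCII bytes has exactly the same byte sequence in ISO-8859-1 and in UTF-8. A rough sketch of how a frontend could check this follows; str_is_ascii() is a made-up helper name, not part of the SANE API.

```c
#include <stddef.h>

/* Hypothetical helper (not part of SANE): returns 1 when every byte of s
 * is 7-bit ASCII. Such a string is byte-for-byte identical in ISO-8859-1
 * and in UTF-8, so it can be treated as either encoding. */
static int str_is_ascii(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p) {
        if (*p > 0x7F)
            return 0;   /* high bit set: the two encodings would differ */
    }
    return 1;
}
```

If this returns 1 for every string a backend hands out, switching the spec from ISO-8859-1 to UTF-8 would not change a single byte seen by frontends.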


What would the fallout of such a change be?
Would it make frontend support simpler?
Do any of our current frontends actually care?

Cheers,
Ralph

Re: [sane-devel] Character encoding used for sane_strstatus() strings

2022-07-21 Thread r. a. schmied


On 07/19/22 18:04, Ralph Little wrote:
> Hi,
>
> On 2022-07-18 03:19, Povilas Kanapickas wrote:
>> Hi John,
>>
>> On 2022-07-18 05:25, John Scott wrote:
>>> The SANE spec says that all strings are encoded in ISO-8859-1
>>> ("Latin-1"). However, from inspecting the code for sane_strstatus(),
>>> it appears that it just returns ordinary string literals, which use
>>> whatever encoding the compiler prescribes for narrow string literals
>>> and need not be the same.
>>
>> Agreed, going by the letter of standards this is indeed a problem.
>>
>>> So, what character encoding should I be assuming for strings coming
>>> from sane_strstatus() as an application writer? One solution to this
>>> dilemma, since sane_strstatus() appears to only use characters from
>>> ASCII in the strings, is to use UTF-8 string literals, like this:
>>>   u8"Hello, world"
>>
>> This would bump compiler requirements to C11. I don't think this is
>> bad, because we already require C++ for at least one popular backend
>> so it's unlikely we have many platforms with only an ancient C
>> compiler available.
>>
>> I'm CC'ing Ralph for a second opinion of whether we can start
>> requiring C11.
>
> I personally don't mind going to C11.
> Mind you, we did just get an enquiry from someone trying to build on
> Solaris, but they have gcc 4.9 which should support it so it is probably
> OK.
> GCC 4.9 is pretty ancient.
>
> Cheers,
> Ralph



hey saners

i'm likely that solaris sparc/gcc 4.9 someone and everything i
got is far more than 'pretty ancient'.  heck the gcc4.9 vintage
is only 2014.04.27!  if that is 'pretty ancient' everything else
around here is prehistoric!

gcc 4.9 does provide:
   -std=c++11   Conform to the ISO 2011 C++ standard
   -std=c++14   This switch lacks documentation
   -std=c++1y   Conform to the ISO 2014(?) C++ draft standard
                (experimental and incomplete support)
and
   -std=c11     Conform to the ISO 2011 C standard
                (experimental and incomplete support)
   -std=c1x     Deprecated in favor of -std=c11
plus the -std=gnu variants of the above.

whether this compiler is in conformance is unknown to me, but
it seems to work.  but i am unable to get the
sane-backend-1.0.32.86-911be/backend/genesys/* code to yield a
functional scanimage for the canon lide 200.  scanner controls work,
reads image data, but image has loads of vertical bands.
any help with that is welcome.

the important thing to note is this system generated a scanimage
for the sane-backend-1.0.28/backend/genesys_*.cc code, so the basic
c++ infrastructure seems to be ok.


aloha

ras

Re: [sane-devel] Character encoding used for sane_strstatus() strings

2022-07-19 Thread Ralph Little


Hi,

On 2022-07-18 03:19, Povilas Kanapickas wrote:
> Hi John,
>
> On 2022-07-18 05:25, John Scott wrote:
>> The SANE spec says that all strings are encoded in ISO-8859-1
>> ("Latin-1"). However, from inspecting the code for sane_strstatus(),
>> it appears that it just returns ordinary string literals, which use
>> whatever encoding the compiler prescribes for narrow string literals
>> and need not be the same.
>
> Agreed, going by the letter of standards this is indeed a problem.
>
>> So, what character encoding should I be assuming for strings coming
>> from sane_strstatus() as an application writer? One solution to this
>> dilemma, since sane_strstatus() appears to only use characters from
>> ASCII in the strings, is to use UTF-8 string literals, like this:
>>   u8"Hello, world"
>
> This would bump compiler requirements to C11. I don't think this is bad,
> because we already require C++ for at least one popular backend so it's
> unlikely we have many platforms with only an ancient C compiler
> available.
>
> I'm CC'ing Ralph for a second opinion of whether we can start requiring
> C11.

I personally don't mind going to C11.
Mind you, we did just get an enquiry from someone trying to build on 
Solaris, but they have gcc 4.9 which should support it so it is probably OK.

GCC 4.9 is pretty ancient.

Cheers,
Ralph

Re: [sane-devel] Character encoding used for sane_strstatus() strings

2022-07-18 Thread Kelly Price

For Linux systems, C11 is more or less the default, as the Linux kernel
now requires C11.  So that would cover all the major Linux distros.

On Mon, Jul 18, 2022 at 10:48 AM John Scott wrote:
>
> On Mon, 2022-07-18 at 13:19 +0300, Povilas Kanapickas wrote:
> > By the way, does the current assumption actually break in practice, that
> > is, are there compilers for which ASCII text will not encode to a subset
> > of ISO-8859-1?
> I assume you mean "Are there compilers for which narrow/multibyte string
> literals will not encode to a subset of ISO-8859-1?" In that case, I
> haven't researched the matter and don't know of a system for which this
> is a problem off the top of my head.
>
> Note that if we're unwilling to bump compiler requirements to C11, there
> are still a couple options. GCC has the -fexec-charset option to specify
> what encoding "ordinary" string literals should be in; we can set this
> to ISO-8859-1 when building SANE, but this won't be portable to
> compilers without this option.
>
> We could also define a macro that's compatible with older compilers like
> this:
> #if __STDC_VERSION__ >= 201112L
> #define SANE_STRING(X) u8##X
> #else
> #define SANE_STRING(X) X
> #endif
>
> Lastly, iconv() is always an option.



-- 
Kelly "STrRedWolf" Price
http://redwolf.ws

Re: [sane-devel] Character encoding used for sane_strstatus() strings

2022-07-18 Thread John Scott

On Mon, 2022-07-18 at 13:19 +0300, Povilas Kanapickas wrote:
> By the way, does the current assumption actually break in practice, that
> is, are there compilers for which ASCII text will not encode to a subset
> of ISO-8859-1?
I assume you mean "Are there compilers for which narrow/multibyte string
literals will not encode to a subset of ISO-8859-1?" In that case, I
haven't researched the matter and don't know of a system for which this
is a problem off the top of my head.

Note that if we're unwilling to bump compiler requirements to C11, there
are still a couple options. GCC has the -fexec-charset option to specify
what encoding "ordinary" string literals should be in; we can set this
to ISO-8859-1 when building SANE, but this won't be portable to
compilers without this option.

We could also define a macro that's compatible with older compilers like
this:
#if __STDC_VERSION__ >= 201112L
#define SANE_STRING(X) u8##X
#else
#define SANE_STRING(X) X
#endif
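
For illustration, here is roughly how such a macro could be used. This is only a sketch: example_status_text() is a made-up stand-in, not the real sane_strstatus(), and the string is the example literal from earlier in the thread.

```c
/* The compatibility macro: UTF-8 literal under C11 or later, ordinary
 * literal otherwise. */
#if __STDC_VERSION__ >= 201112L
#define SANE_STRING(X) u8##X
#else
#define SANE_STRING(X) X
#endif

/* Hypothetical stand-in for a function returning a status message. */
static const char *example_status_text(void)
{
    /* The cast keeps this valid under C23, where u8 literals have
     * element type char8_t (unsigned char) rather than char. */
    return (const char *)SANE_STRING("Hello, world");
}
```

On a C11 compiler the returned bytes are guaranteed to be UTF-8 (and, being ASCII only, also valid ISO-8859-1); on an older compiler they are whatever the execution character set dictates, which is the current behaviour.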

Lastly, iconv() is always an option.
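
For a frontend that takes the spec at its word and wants UTF-8, the conversion is simple enough to do by hand, because ISO-8859-1 bytes map one-to-one onto U+0000 to U+00FF. The sketch below is a hand-rolled stand-in for iconv(), not a use of it, and latin1_to_utf8() is a made-up name.

```c
#include <stddef.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to
 * UTF-8. dst must have room for 2 * strlen(src) + 1 bytes. Returns the
 * number of bytes written, not counting the terminating NUL. */
static size_t latin1_to_utf8(const char *src, char *dst)
{
    const unsigned char *in = (const unsigned char *)src;
    unsigned char *out = (unsigned char *)dst;

    for (; *in != '\0'; ++in) {
        if (*in < 0x80) {
            *out++ = *in;                                  /* ASCII: copy as-is */
        } else {
            *out++ = (unsigned char)(0xC0 | (*in >> 6));   /* lead byte */
            *out++ = (unsigned char)(0x80 | (*in & 0x3F)); /* continuation byte */
        }
    }
    *out = '\0';
    return (size_t)(out - (unsigned char *)dst);
}
```

For ASCII-only input, which is all sane_strstatus() appears to return, the output is identical to the input.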

Re: [sane-devel] Character encoding used for sane_strstatus() strings

2022-07-18 Thread Povilas Kanapickas

Hi John,

On 2022-07-18 05:25, John Scott wrote:
> The SANE spec says that all strings are encoded in ISO-8859-1 ("Latin-
> 1"). However, from inspecting the code for sane_strstatus(), it appears
> that it just returns ordinary string literals, which use whatever
> encoding the compiler prescribes for narrow string literals and need not
> be the same.

Agreed, going by the letter of standards this is indeed a problem.

> So, what character encoding should I be assuming for strings coming from
> sane_strstatus() as an application writer? One solution to this dilemma,
> since sane_strstatus() appears to only use characters from ASCII in
> the strings, is to use UTF-8 string literals, like this:
>   u8"Hello, world"

This would bump compiler requirements to C11. I don't think this is bad,
because we already require C++ for at least one popular backend so it's
unlikely we have many platforms with only an ancient C compiler available.

I'm CC'ing Ralph for a second opinion of whether we can start requiring C11.

By the way, does the current assumption actually break in practice, that
is, are there compilers for which ASCII text will not encode to a subset
of ISO-8859-1?
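
One way to probe this on a given toolchain is to let the compiler answer. The sketch below is only an illustration, it needs C11 for _Static_assert, and literal_matches_ascii() is a made-up name. On an EBCDIC target, for example, the build would fail.

```c
#include <string.h>

/* Compile-time spot checks: the build fails if the execution character
 * set does not give these characters their ASCII code values. */
_Static_assert('A' == 0x41, "execution charset is not ASCII-compatible");
_Static_assert('z' == 0x7A, "execution charset is not ASCII-compatible");
_Static_assert('0' == 0x30, "execution charset is not ASCII-compatible");

/* Run-time version of the same idea for a whole string literal:
 * compare the bytes of "OK" against their expected ASCII values. */
static int literal_matches_ascii(void)
{
    static const unsigned char expected[] = { 0x4F, 0x4B, 0x00 };

    return memcmp("OK", expected, sizeof expected) == 0;
}
```

These checks only cover the characters listed; they do not prove anything about the rest of the character set.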

> If you can affirm that the specification needs to prevail, I can send a
> merge request to adjust the string literals accordingly.

Let's wait until Ralph replies and then we can see how to proceed.

Thanks a lot for noticing this.

Regards,
Povilas
