The problem is that getc returns EOF (a negative int, usually -1) to
indicate end of file.  On those C compilers where the default character
type is unsigned, a char always holds a value in the range 0 to 255, so
a data byte never looks negative (although an EOF stored in a char
becomes 255 and no longer compares equal to EOF).  On those C compilers
where the default character type is signed, a char holds values in the
range -128 to 127.  This is a problem when the value returned by getc
is stored in a char and then sign-extended back to int: byte 255 comes
back as -1, the same as EOF.
(Isn't the C "standard" fun?)

George Rogers

At 10:59 AM 6/30/2004 -0400, you wrote:

There might not BE a definition of getc since it returns
an int and the default is to return an int.  I searched
in /usr/include and /usr/include/sys on one of my Unix
machines and it was not explicitly defined...

What exactly is the problem you are running into with
doing IO on 128-255 characters?

WRITING

Writing shouldn't care: the bits just go out,
except that routines that take "strings" as output
(such as printf) may chop off everything after a zero
byte.  Use fwrite instead.
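
A minimal sketch of the zero-byte point, in C.  The helper names
write_bytes and length_seen_by_string_routines are made up for this
illustration; only fwrite and strlen are real library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write exactly len bytes, zero bytes included.
   Returns the number of bytes actually written. */
size_t write_bytes(FILE *out, const unsigned char *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    return fwrite(buf, 1, len, out);
}

/* What a "string" routine sees: strlen() stops at the first zero byte,
   so everything after it would be chopped off. */
size_t length_seen_by_string_routines(const unsigned char *buf)
{
    return strlen((const char *)buf);
}
```

For a buffer holding the five bytes 41 42 00 E8 FF, the string view
sees only 2 bytes while write_bytes puts out all 5.  On systems that
distinguish text from binary files, open the output with "wb".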

READING

Reading shouldn't care either, except that you need
to be careful about telling the -1 from EOF apart from
the -1 that you get when you accidentally (and erroneously)
sign-extend character 255 (the rest of 128-255 come out as
other negative values).

If you do this:

char foo = getc(stream);

you cannot tell an EOF from char 255 since both leave
the bit pattern FF in foo.  If instead you do this:

int foo = getc(stream);

Then foo is an entire integer.  Say it's 16 bits.
Then the 255 char will leave the bit pattern  00FF
in foo and EOF will leave bit pattern FFFF so you
can tell them apart.  If it's 32 bits the patterns
are 000000FF and FFFFFFFF etc.
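
As a sketch of the int version in a complete loop (the function name
count_bytes is only an assumption for the example):

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in a stream, keeping getc()'s result in an int.
   getc() returns each byte as a value 0..255, and EOF is a negative
   int, so the two can never be confused while the value stays an int. */
long count_bytes(FILE *stream)
{
    long n = 0;
    int c;                      /* int, not char */

    assert(stream != NULL);
    while ((c = getc(stream)) != EOF)
        n++;
    return n;
}
```

With a file holding the three bytes 41 FF 42 this counts 3.  If c were
declared char instead, the loop would stop at the FF byte where char is
signed, and might never stop where char is unsigned.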

SO, if you are putting the result of getc into a char,
then checking it for negative, on machines on which
a "char" is a "signed char", the 128-255 characters
will "look like" an EOF.  With careful checking of
the actual bit pattern you can still tell the
difference, except for character 255 itself.

So just keep it in an int like Richard suggested.

Or, you could use fread(&mychar, 1, 1, stream)

Or, use an independent call to feof() to detect
end of file.
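
A rough sketch of the fread route, assuming a made-up function name
sum_bytes_fread and a byte sum as the "work" done per character:

```c
#include <assert.h>
#include <stdio.h>

/* Add up all byte values in a stream, one byte per fread() call.
   fread() signals end of file through its return count, not through
   the data, so mychar is free to take any of the 256 values. */
unsigned long sum_bytes_fread(FILE *stream)
{
    unsigned long sum = 0;
    unsigned char mychar;

    assert(stream != NULL);
    while (fread(&mychar, 1, 1, stream) == 1)
        sum += mychar;
    return sum;
}
```

After the loop ends, feof(stream) and ferror(stream) say whether it
stopped at end of file or on an error.  Note that feof() only becomes
true after a read has already hit the end, so it belongs after the
read, not before it.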

=====

General background:

There is a suite of subroutines defined for IO of ASCII
text characters, which run from 1 to 127 (and which
sometimes use -1 for an EOF flag) and there is a completely
different suite of subroutines that is used for BINARY data,
in which a byte can take any of the 256 possible values.
Usually the problem is either null (zero) which is used as
an end-of-string by some subroutines, or the above-127
characters you are trying to use.

Now, a data item in a computer can be treated as either
SIGNED or UNSIGNED.  A SIGNED byte runs from -127 to 127
(NO! STOP! NO 1's complement vs 2's complement stuff AIEEE!)
while an UNSIGNED byte runs from 0 to 255.

Whether a "char" on your computer is SIGNED or UNSIGNED
depends on the C compiler, which basically wants to generate
the most efficient code, and the fact that your particular
machine hardware architecture may make it easier to do signed
operations or easier to do unsigned operations, so unless you
specifically say, it can choose either, which is a source of
incompatibility in porting programs from one computer
architecture to another.
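
If you want to find out which choice your compiler made, a small
sketch using <limits.h> (the function name plain_char_is_signed is
invented for the example):

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain "char" is signed with this compiler, 0 if it is
   unsigned.  CHAR_MIN is 0 exactly when plain char is unsigned. */
int plain_char_is_signed(void)
{
    /* A conforming compiler gives plain char one of these two ranges. */
    assert(CHAR_MIN == 0 || CHAR_MIN == SCHAR_MIN);
    return CHAR_MIN < 0;
}
```

Some compilers also let you "specifically say" from the command line;
GCC, for instance, has -funsigned-char and -fsigned-char.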

For ASCII which runs from 1 to 127 IT DOESN'T MATTER! whether
the data is signed or unsigned, since the "sign bit" will
never be set.

SO.  You want to use the characters over 127.  You should know
that there may be standardization problems, that these codes
differ between Macintoshes and Suns and Windows.  The major
email programs deal with this by putting in a header that
says

   Content-Type: text/plain; charset="iso-8859-1"

(I copied this from the header of your email message!)
So this specifies the 128-255 characters as the ISO
Standard 8859-1 mapping.

You could declare every byte as UNSIGNED CHAR.  You could keep
your bytes in INT instead of CHAR.  Instead of using the
getc suite you could use fread and fwrite.

If "bar" turns out to be SIGNED CHAR on your machine and
you cannot control this, you might have to use code like:

   foo = 0xFF & bar;

which removes the sign-extend which happens if "foo" is
a data type longer than "bar".
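
To see the mask at work, here is a small sketch with two made-up
helpers, widen_plain and widen_masked.  It uses signed char explicitly
so that it behaves the same whatever plain char is on your machine,
and it assumes the usual 8-bit two's complement byte.

```c
#include <assert.h>

/* Widen a byte held in a signed char with no mask: a byte that had its
   top bit set comes out as a negative int (sign-extended). */
int widen_plain(signed char bar)
{
    return bar;
}

/* The same widening with the mask from the text: only the low 8 bits
   survive, so the result is always in 0..255. */
int widen_masked(signed char bar)
{
    int foo = 0xFF & bar;

    assert(foo >= 0 && foo <= 255);
    return foo;
}
```

For the bit pattern E8 (the letter è in ISO 8859-1), widen_plain gives
-24 while widen_masked gives 232, which is 0xE8 again.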


[EMAIL PROTECTED] wrote:
Good morning,
the problem is that I'm dealing with the extended ASCII code, 8 bits, because
I need characters such as à è ò ù and so on. Do you know if there is a function
I can use for I/O with which I can handle this situation? I can't find the
definition of getc; I've checked STDIO.H.
I use char c=getc(file)
Could you give me some suggestions? Obviously I can't add a massive overhead
to the message. To solve this problem I could use 7-bit ASCII, but I must use
the accented chars. Maybe I could print the char to the file as an int, but
that would add a big overhead!!!
Thanks for your time, best regards!
----- Original Message -----
From: "Richard Levitte - VMS Whacker" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, June 28, 2004 6:00 PM
Subject: Re: OT: problems with crypto and ASCII


In message <[EMAIL PROTECTED]> on Mon, 28 Jun 2004
17:45:23 +0200, <[EMAIL PROTECTED]> said:

deck80> Hi everybody...sorry if it's not a question strictly involving
deck80> openssl but I hope someone can help me.
deck80> I'm writing a simple program that encodes a file with an LFSR
deck80> and a clock-controlled shift register. Basically there is a
deck80> char m, I create a char of "worms" x and I produce the cipher
deck80> c=m^x as output. The problem is that the output can be any of
deck80> the 256 character codes, so also one of the first 31. So when it
deck80> reads the encoded file it also reads the special chars. It seems
deck80> it stops when it finds the char "ÿ",
deck80> which is probably the end of file. So the output is usually
deck80> a small part of the input. How can I solve this? I've
deck80> tried to read the file char by char and also without the
deck80> check for whether I'm reading an EOF
deck80> while(c=getchar(ifile)/*!=EOF*/)
deck80> {...
deck80> }
deck80>  but it still decides the file is finished this way too.
deck80> I've tried to append 10 EOF at the end, trying to recognize it
deck80> as a different EOF sequence but it doesn't work.
deck80> I could try to use a sequence of 10 zeros before the end but
deck80> it doesn't seem to be a smart solution(as the former with the
deck80> 10 EOF;))

What is the type of c?  If it's a 'char', try changing it to 'int'.

This is really a C language question :-).

-----
Please consider sponsoring my work on free software.
See http://www.free.lp.se/sponsoring.html for details.

--
Richard Levitte   \ Tunnlandsvägen 52 \ [EMAIL PROTECTED]
[EMAIL PROTECTED]  \ S-168 36  BROMMA  \ T: +46-708-26 53 44
                   \      SWEDEN       \
Procurator Odiosus Ex Infernis                -- [EMAIL PROTECTED]
Member of the OpenSSL development team: http://www.openssl.org/

Unsolicited commercial email is subject to an archival fee of $400.
See <http://www.stacken.kth.se/~levitte/mail/> for more info.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [EMAIL PROTECTED]
Automated List Manager                           [EMAIL PROTECTED]


-- Charles B (Ben) Cranston mailto: [EMAIL PROTECTED] http://www.wam.umd.edu/~zben
