Re: Error when using XSL with French Characters

2008-09-04 Thread Vincent Hennebert
Hi,

Andreas Delmelle wrote:
 On Sep 3, 2008, at 18:35, Steffanina, Jeff wrote:
 
 Hi Jeff
 
 There is always one MORE option to consider!!

 What would you suggest as the best way to handle this?
 
 I think I'd opt for using (N)umeric (C)haracter (R)eferences. Reasoning
 would be that if one changes the BASIC code to emit the sequence
 '&#xE8;', this will never, ever have to be changed (unless Unicode would
 somehow decide on altering the codepoints). You can change the encoding
 in the XML header all you want, NCRs will always work.
 
 On the other hand, if you have a LOT of those characters, using NCRs
 could make your XML a bit bulky (instead of 1 byte/character, you

Not to mention the fact that this would make the document really tedious
to type, and not very readable...


 actually generate 6-8 bytes to represent one character in the final
 result; the XML parser, instead of needing only one byte, has to parse
 all bytes from '&' up to and including ';').
 The character code you mentioned earlier (130) is the decimal value for
 'é' in ASCII, so if you're concerned with the size of the XML and do not
 want to generate 6 bytes for one character, try specifying US-ASCII as
 encoding for the source XML.

No, US-ASCII is a 7-bit character set, which means it can contain only
128 characters, none of them being an accented letter [1].

From your other message it looks like the default character set on your
system is ISO-8859-15, which is ok for all of the western languages plus
a few more [2]. Your BASIC program probably uses that character set, in
which case you just have to change the header of your xml file:
<?xml version="1.0" encoding="ISO-8859-15"?>

As long as you put the right header in the XML file you can live with
that setup. However, it is safer to switch to UTF-8 now, to avoid
trouble in the future. Indeed, it’s probable that when you change your
computer or upgrade your system the default character set will become
UTF-8. Then if you re-edit that file on the new system, accented letters
will be entered as UTF-8 sequences that are incompatible with
ISO-8859-15, and you’ll basically see garbage in the result, unless your
editor is elaborate enough to recognize that the file is XML and parse
the header to get its encoding. But I doubt many editors do that...
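
To make the mismatch concrete, here is a minimal sketch in Python (my choice of language; nothing in this thread uses it). It shows what happens when the same accented letter is written under one encoding and read back under the other. The variable names are purely illustrative.

```python
# Illustrative only: one accented letter, written under one encoding
# and read back under another.
text = "é"

utf8_bytes = text.encode("utf-8")          # two bytes: 0xC3 0xA9
latin9_bytes = text.encode("iso8859-15")   # one byte:  0xE9

# UTF-8 bytes misread as ISO-8859-15 show up as two unrelated characters.
misread = utf8_bytes.decode("iso8859-15")

# ISO-8859-15 bytes handed to a strict UTF-8 decoder are rejected outright,
# which is the kind of failure an XML parser reports as an invalid byte.
try:
    latin9_bytes.decode("utf-8")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(utf8_bytes, latin9_bytes, repr(misread), rejected)
```

The first direction gives silent garbage, the second gives a hard error; which one you hit depends on which side of the file has the wrong idea about the encoding.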

You can choose to convert your files to UTF-8 later on, but that might
represent a lot of work, plus you will have to edit every file to change
the XML header to UTF-8. Since the use of UTF-8 as the default charset
will happen sooner or later, you had better do that now, while you don’t
have too many files.
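
If the conversion has to be done over many files, it can be scripted. The following is only a sketch in Python, under the assumption that the existing files really are ISO-8859-15 and that each starts with an XML declaration; `convert_to_utf8` and `DECL_ENCODING` are names I made up for the illustration.

```python
import re
from pathlib import Path

# Matches the encoding part of an XML declaration, for example
# <?xml version="1.0" encoding="ISO-8859-15"?>
DECL_ENCODING = re.compile(r"""(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")


def convert_to_utf8(path, source_encoding="iso8859-15"):
    """Re-encode one XML file as UTF-8 and update its declaration.

    source_encoding is an assumption about the existing file; a wrong
    guess produces wrong characters rather than an error.
    """
    file = Path(path)
    # Work on raw bytes so that line endings are left untouched.
    text = file.read_bytes().decode(source_encoding)
    # Rewrite only the first declaration, keeping its original quote style.
    text = DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    file.write_bytes(text.encode("utf-8"))
```

Try it on copies first: the script overwrites the file in place and cannot tell whether the source encoding you passed is the right one.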

Changing the default character set is very system-dependent. Basically
you have to play with the locale settings (the LANG and LC_* environment
variables). You may be able to get a list of available locales by typing
the following command in a terminal:
$ locale -a
C
en_US.iso885915
en_US.utf8
...

If no UTF-8 locale is available it must be generated. Try to find
documentation for your system or ask the system administrator if
applicable...

You find that complicated? It is, it has always been, and I’m afraid it
may forever be. This is historical...

[1] http://en.wikipedia.org/wiki/Ascii
[2] http://en.wikipedia.org/wiki/ISO/IEC_8859-15

HTH,
Vincent

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Error when using XSL with French Characters

2008-09-04 Thread Andreas Delmelle

On Sep 4, 2008, at 12:06, Vincent Hennebert wrote:


<snip />
No, US-ASCII is a 7-bit character set, which means it can contain only
128 characters, none of them being an accented letter [1].


Ouch! Indeed. I'm so used to the basic 7-bit set being extended...
To think that I even tried it over here in an editor. If I had only  
also tried to actually save the file, I would have noticed...


Sorry for the confusion, Jeff.

The conclusion is definitely the right one: if you can somehow manage  
to have the BASIC code write the file as UTF-8, all the encoding  
hassles disappear.



Cheers

Andreas




RE: Error when using XSL with French Characters

2008-09-03 Thread Steffanina, Jeff
Manuel,
 
We create the XML using a version of BASIC.  To create this particular
character, we send  CHR(130) to the XML.  When I open the XML in vi, I
see the proper FRENCH symbol.
 
 

Jeff 





 



Re: Error when using XSL with French Characters

2008-09-03 Thread Jean-François El Fouly
There are four accented forms of "e" in current French use (é è ê ë), so
you should be more precise.
None of them can correspond to CHR(130), either in UTF-8 or in
ISO-8859-1.

On what kind of system/platform/OS are you working?
Mentioning vi makes me guess it should be some kind of Unix, but at the
same time the encoding used makes this improbable...

I guess more information is needed here.
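
The point about CHR(130) can be checked mechanically. Below is a small Python sketch (Python is my choice, not something used in this thread) that decodes the single byte 130 under a few encodings. Including cp437 and cp850, the old DOS code pages, is my own assumption about where such a value might come from; nothing here confirms what the BASIC runtime actually uses.

```python
raw = bytes([130])  # the single byte that CHR(130) would write


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# cp437 and cp850 are guesses added for comparison only.
encodings = ("utf-8", "iso8859-1", "iso8859-15", "cp437", "cp850")
results = {enc: try_decode(raw, enc) for enc in encodings}

for enc, value in results.items():
    print(f"{enc:>10}: {value!r}")
```

Byte 130 is rejected as UTF-8, maps to a control character in the two ISO-8859 variants, and only decodes to é in the DOS code pages.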




RE: Error when using XSL with French Characters

2008-09-03 Thread Steffanina, Jeff
Jean-Francois,

fop-0.95
I am running Redhat Linux 2.4.21-47.0.1. 

The letter I am referring to is:  é è
I assume I am having problems with any French character that includes a glyph.

What are you using for  <?xml version="1.0" encoding=?>

I appreciate any suggestions.  I have not had to deal with international 
character sets before.

Thanks.

Jeff 




RE: Error when using XSL with French Characters

2008-09-03 Thread Steffanina, Jeff

Jean-Francois,
On my Linux box I have this entry in:  /etc/sysconfig/i18n

LANG=en_US.iso885915 



Jeff Steffanina
FOSSE Development,  Bethesda, MD
(301)380-2047
[EMAIL PROTECTED]

This communication contains information from Marriott International, Inc. 
that may be confidential. Except for personal use by the intended recipient, or 
as expressly authorized by the sender, any person who receives this information 
is prohibited from disclosing, copying, distributing, and/or using it. If you 
have received this communication in error, please immediately delete it and all 
copies, and promptly notify the sender. Nothing in this communication is 
intended as an electronic signature under applicable law.





Re: Error when using XSL with French Characters

2008-09-03 Thread Andreas Delmelle

On Sep 3, 2008, at 15:05, Steffanina, Jeff wrote:

Hi Jeff


fop-0.95
I am running Redhat Linux 2.4.21-47.0.1.

The letter I am referring to is:  é è
I assume I am having problems with any French character that  
includes a glyph.


What are you using for  <?xml version="1.0" encoding=?>

I appreciate any suggestions.  I have not had to deal with  
international character sets before.


If all else fails, remember that XML *always* allows Numeric  
Character References, like &#x0A; or &#10; for a linefeed (values are  
always Unicode code points).


In Unicode, the respective character codes are:

&#xE8; - è
&#xE9; - é

If you output those sequences in the BASIC module, then it should  
work, regardless of which encoding is specified in the XML header.
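
For illustration, this is roughly what emitting NCRs looks like in code. The sketch is in Python rather than BASIC (an assumption on my part, since the BASIC dialect is not known), and `to_ncr` is a hypothetical helper name.

```python
def to_ncr(text):
    """Replace every non-ASCII character with a hexadecimal NCR.

    ASCII characters pass through unchanged; markup characters such as
    '&' and '<' are NOT escaped here and still need separate handling.
    """
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


print(to_ncr("è é"))  # &#xE8; &#xE9;

# Python's codecs can also produce the decimal form directly:
print("é".encode("ascii", errors="xmlcharrefreplace"))  # b'&#233;'
```

Either form yields a file containing only ASCII bytes, which is why the declared encoding no longer matters for those characters.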


HTH!

Cheers

Andreas





RE: Error when using XSL with French Characters

2008-09-03 Thread Steffanina, Jeff

There is always one MORE option to consider!!

What would you suggest as the best way to handle this? 


Jeff




Re: Error when using XSL with French Characters

2008-09-03 Thread Andreas Delmelle

On Sep 3, 2008, at 18:35, Steffanina, Jeff wrote:

Hi Jeff


There is always one MORE option to consider!!

What would you suggest as the best way to handle this?


I think I'd opt for using (N)umeric (C)haracter (R)eferences.  
Reasoning would be that if one changes the BASIC code to emit the  
sequence '&#xE8;', this will never, ever have to be changed (unless  
Unicode would somehow decide on altering the codepoints). You can  
change the encoding in the XML header all you want, NCRs will always  
work.


On the other hand, if you have a LOT of those characters, using NCRs
could make your XML a bit bulky (instead of 1 byte/character, you
actually generate 6-8 bytes to represent one character in the final
result; the XML parser, instead of needing only one byte, has to
parse all bytes from '&' up to and including ';').
The character code you mentioned earlier (130) is the decimal value
for 'é' in ASCII, so if you're concerned with the size of the XML and
do not want to generate 6 bytes for one character, try specifying
US-ASCII as encoding for the source XML.



HTH!

Andreas



Error when using XSL with French Characters

2008-09-02 Thread Steffanina, Jeff

My Friends,
Fop-0.95   My style sheet has been working perfectly.  However, the user 
submitted some text in French.  In the text was a letter e with an accent 
above it.

That character caused the following error:
Invalid byte 1 of 1-byte UTF-8 sequence.

My .xml looks fine.   The  e with the accent above it is perfect.
First line in my XML:
<?xml version="1.0" encoding="UTF-8"?>

Here is the first line of my XSL:
<?xml version="1.0" encoding="UTF-8"?>

I am confused over why the UTF-8 for the XML understands the character but the 
UTF-8 in the XSL does not?

I found an article that suggests that the problem would be solved with: 
<?xml version="1.0" encoding="8859-1"?>

Would this be a viable/recommended solution?   Do you have a better idea?


Jeff Steffanina
FOSSE Development,  Bethesda, MD
(301)380-2047
[EMAIL PROTECTED]





RE: Error when using XSL with French Characters

2008-09-02 Thread Manuel Mall
I am suspicious that although you declare the XML file as being in UTF-8 it
actually isn't. How do you produce the XML file? 

 

Manuel
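
One way to test that suspicion without relying on what an editor displays is to run the raw bytes through a strict UTF-8 decoder. A minimal Python sketch follows (Python is an assumption; `first_invalid_utf8_offset` is a name invented for this example):

```python
def first_invalid_utf8_offset(data):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that could not be decoded.
        return err.start
    return None


# A document containing the lone byte 130 is not valid UTF-8 ...
print(first_invalid_utf8_offset(b"<a>\x82</a>"))
# ... whereas the same letter properly encoded as UTF-8 passes.
print(first_invalid_utf8_offset("<a>é</a>".encode("utf-8")))
```

For a real file, read it in binary mode (for example `open(path, "rb").read()`, where `path` is whatever your file is called) and pass the bytes to the function.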

 
