Re: PS (Malformed UTF-8 character)

2003-10-26 Thread Marco Baroni
Thanks for your quick reply! Looks like your theory about the input data being "in ASCII (with entity references...)" is contradicted by the evidence. Indeed. So now you need to determine what character encoding is being used for the non-ASCII codes, which are obviously present in the data. When

Re: Malformed UTF-8 character

2003-10-26 Thread John Delacour
At 1:12 am +0200 26/10/03, Marco Baroni wrote: I am new to (explicit) Unicode handling, and right now I am facing this problem. I have some data (lots of data) that in theory should be in ASCII (with entity references in place of non-ASCII characters). I have no easy way to get to know exact

Re: Bidirectional (bidi) Support?

2003-10-26 Thread Ken Beesley
> Date: Sun, 26 Oct 2003 18:02:36 +0200 > From: Jarkko Hietaniemi <[EMAIL PROTECTED]> Beesley: > > It's curious that the Arabic Presentation Forms got > > into Unicode at all, and a number of people still think > > it was a mistake, a sell-out. One of the Fathers of Unicode > > told me they were

Re: Bidirectional (bidi) Support?

2003-10-26 Thread Ken Beesley
Chris, I think what you've done is very interesting, and useful, so what I have to say below is not intended as criticism of your work in any way. It's curious that the Arabic Presentation Forms got into Unicode at all, and a number of people still think it was a mistake, a sell-out. One of the

Re: PS (Malformed UTF-8 character)

2003-10-26 Thread David Graff
[EMAIL PROTECTED] said: > I see a rhombus with a question mark inside (which is the way my > shell displays non-ASCII characters). I guess it is a c with cedilla > from the context. > So, I would like to ask you or anybody else: is there some kind of > tool (e.g., a text editor) that I could u

Re: Bidirectional (bidi) Support?

2003-10-26 Thread Jarkko Hietaniemi
> It's curious that the Arabic Presentation Forms got > into Unicode at all, and a number of people still think > it was a mistake, a sell-out. One of the Fathers of Unicode > told me they were deprecated. Even the Unicode specification > explains their presence rather apologetically. Well, one

Re: PS (Malformed UTF-8 character)

2003-10-26 Thread Edward Cherlin
On Sunday 26 October 2003 01:27 am, Marco Baroni wrote: > Thanks for your quick reply! > > > When you look at the file and you see > > a c with cedilla, can you tell whether this is actually the > > appropriate character, based on its context? Is this true > > of all such characters? > > I do not