Re: [dev] Unicode---Give us all of it!

2007-03-15 Thread Stephan Bergmann
Stephan Bergmann wrote: Stephan Bergmann wrote: [...] How to proceed --- In a first step, I will try to identify and gather as many places in OOo that need to be adapted, but I need your help for that: IF YOU KNOW OF ANY PLACE IN OOo THAT NEEDS TO BE ADAPTED, PLEASE LET ME

Re: [dev] Unicode---Give us all of it!

2006-11-24 Thread Eike Rathke
Hi Stephan, this slipped under my desk.. On Thursday, 2006-11-16 17:39:28 +0100, Stephan Bergmann wrote: This underlines the need of an iterator that takes care of such things. I just wonder how combining characters should be best treated then. I like the idea of an iterator returning
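A minimal sketch of the kind of iterator discussed here, assuming plain std::u16string in place of the OOo string classes. The helper name nextCodePoint is hypothetical and not part of any OOo API. It steps over UTF-16 surrogate pairs only; it does not group combining characters, which is the open question raised in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not an OOo API): return the code point that starts at
// index i and advance i past it. An unpaired surrogate is returned unchanged.
inline std::uint32_t nextCodePoint(const std::u16string & s, std::size_t & i) {
    assert(i < s.size());
    char16_t high = s[i++];
    if (high >= 0xD800 && high <= 0xDBFF && i < s.size()) {
        char16_t low = s[i];
        if (low >= 0xDC00 && low <= 0xDFFF) {
            ++i; // consume the low surrogate as well
            return 0x10000
                + ((static_cast<std::uint32_t>(high) - 0xD800) << 10)
                + (static_cast<std::uint32_t>(low) - 0xDC00);
        }
    }
    return high;
}
```

Calling it in a loop while i < s.size() visits each code point once, so a string holding U+10300 between two ASCII letters yields three values rather than four.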

Re: [dev] Unicode---Give us all of it!

2006-11-24 Thread Stephan Bergmann
Eike Rathke wrote: Hi Stephan, this slipped under my desk.. On Thursday, 2006-11-16 17:39:28 +0100, Stephan Bergmann wrote: This underlines the need of an iterator that takes care of such things. I just wonder how combining characters should be best treated then. I like the idea of an

Re: [dev] Unicode---Give us all of it!

2006-11-16 Thread Stephan Bergmann
Eike Rathke wrote: Hi Stephan, On Tuesday, 2006-11-14 15:45:46 +0100, Stephan Bergmann wrote: - On Windows, Writer shows a correct glyph; cursor traveling and selection work. On X11, Writer shows two boxes instead of a single correct glyph; cursor traveling left/right works by treating the

Re: [dev] Unicode---Give us all of it!

2006-11-16 Thread Stephan Bergmann
Stephan Bergmann wrote: [...] 1 I installed a font that contains appropriate glyphs (code2001.ttf from ) into a SRC680m192 share/fonts/truetype, imported a UTF-8 encoded text file that contains U+010300 into Writer, copy/pasted that text to various places, on both Windows and X11: With CWS
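For anyone wanting to reproduce a test file like the one described, the following sketch shows how a code point such as U+10300 maps to UTF-8 bytes. The function name toUtf8 is an assumption made for illustration; it is not taken from the message or from OOo.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an OOo API): encode one Unicode code point as UTF-8.
inline std::string toUtf8(std::uint32_t cp) {
    // Only valid scalar values: at most 0x10FFFF and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        // Supplementary planes need four bytes.
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For U+10300 this produces the four bytes F0 90 8C 80, which can be written to a file and imported as UTF-8 text.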

Re: [dev] Unicode---Give us all of it!

2006-11-16 Thread Eike Rathke
Hi Stephan, On Thursday, 2006-11-16 10:29:12 +0100, Stephan Bergmann wrote: With CWS icuupgrade (currently SRC680m193) the situation for X11 changes as follows: - On Windows, Writer shows a correct glyph; cursor traveling and selection work. On X11, Writer shows two boxes instead of a

Re: [dev] Unicode---Give us all of it!

2006-11-16 Thread Mathias Bauer
Michael Meeks wrote: On Fri, 2006-11-10 at 17:12 +0100, Stephan Bergmann wrote: This indicates that an application's concept of character is often best represented by a programming environment's concept of string. An interesting insight indeed. Use sal_uInt32 to represent

Re: [dev] Unicode---Give us all of it!

2006-11-15 Thread Stephan Bergmann
Michael Meeks wrote: Now - as you say, there is some poison chalice of endless ABI stability here, but if some big review of the code is underway, it'd be nice to add some #ifdef NO_DEPRECATED_API around the sal_Unicode * operator, and add a sal_WideUnicode [] operator instead (perhaps)

Re: [dev] Unicode---Give us all of it!

2006-11-15 Thread Eike Rathke
Hi Michael, On Tuesday, 2006-11-14 10:56:09 +0000, Michael Meeks wrote: operator const sal_Unicode *() const SAL_THROW(()) { return pData->buffer; } const sal_Unicode * getStr() const SAL_THROW(()) { return pData->buffer; } And replace them with an inlined [] operator, or

Re: [dev] Unicode---Give us all of it!

2006-11-15 Thread Eike Rathke
Hi Stephan, On Tuesday, 2006-11-14 15:45:46 +0100, Stephan Bergmann wrote: - On Windows, Writer shows a correct glyph; cursor traveling and selection work. On X11, Writer shows two boxes instead of a single correct glyph; cursor traveling left/right works by treating the two boxes as a

Re: [dev] Unicode---Give us all of it!

2006-11-14 Thread Kay Ramme - Sun Germany - Hamburg
Michael, Michael Meeks wrote: There's no chance then of switching to UTF-8 as an underlying string representation :-) and saving a measurable chunk of our string overhead ? this is certainly possible by introducing a new string (I mean exactly _one_ string), which IMHO should address

Re: [dev] Unicode---Give us all of it!

2006-11-14 Thread Kay Ramme - Sun Germany - Hamburg
Stephan, from my point of view, we should have originally followed what the glibc does with wchar_t (see http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html), unfortunately switching to this obviously is incompatible and a lot of work, so your suggestion sounds quite

Re: [dev] Unicode---Give us all of it!

2006-11-14 Thread Stephan Bergmann
Stephan Bergmann wrote: [...] How to proceed --- In a first step, I will try to identify and gather as many places in OOo that need to be adapted, but I need your help for that: IF YOU KNOW OF ANY PLACE IN OOo THAT NEEDS TO BE ADAPTED, PLEASE LET ME KNOW. Once all places have

Re: [dev] Unicode---Give us all of it!

2006-11-14 Thread Michael Meeks
Hi Kay, On Tue, 2006-11-14 at 10:53 +0100, Kay Ramme wrote: Michael Meeks wrote: There's no chance then of switching to UTF-8 as an underlying string representation :-) and saving a measurable chunk of our string overhead ? this is certainly possible by introducing a new string (I

Re: [dev] Unicode---Give us all of it!

2006-11-13 Thread Stephan Bergmann
Niklas Nebel wrote: Philipp Lohmann - Sun Germany wrote: Wouldn't that be more or less any occurrence of sal_Unicode? There's hundreds of them in Calc alone. That depends probably on the details. If for example you are searching for ansi1252 code characters in a unicode string (e.g. '/' for

Re: [dev] Unicode---Give us all of it!

2006-11-13 Thread Niklas Nebel
Stephan Bergmann wrote: I doubt that it is that many places that need to be changed. (For example, what do you think needs to be done for text import/export?) The obvious changes for text import: - Separator characters are user-supplied, so they can no longer be handled as a sal_Unicode. -
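The separator point can be illustrated with a small sketch, assuming std::u16string stands in for the OOo string type and the separator is passed as a full code point instead of a single 16-bit unit. The name splitOnCodePoint is hypothetical. The sketch relies on the text being well-formed UTF-16 and on the separator not being a surrogate value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an OOo API): split UTF-16 text on a separator that
// is given as a code point, so separators above U+FFFF work too.
inline std::vector<std::u16string> splitOnCodePoint(
    const std::u16string & text, char32_t sep)
{
    // Encode the separator as one or two UTF-16 code units.
    std::u16string needle;
    if (sep <= 0xFFFF) {
        needle.push_back(static_cast<char16_t>(sep));
    } else {
        char32_t v = sep - 0x10000;
        needle.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        needle.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    std::vector<std::u16string> fields;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = text.find(needle, start);
        if (pos == std::u16string::npos) {
            fields.push_back(text.substr(start)); // last field
            break;
        }
        fields.push_back(text.substr(start, pos - start));
        start = pos + needle.size();
    }
    return fields;
}
```

The design choice is to keep the text in UTF-16 and widen only the separator's type, which is one possible reading of the change described above and not necessarily what was done in Calc.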

Re: [dev] Unicode---Give us all of it!

2006-11-13 Thread Stephan Bergmann
Michael Meeks wrote: [...] Use sal_uInt32 to represent individual Unicode encoded characters and add any necessary base functionality to rtl::OUString (e.g., operating on the individual Unicode encoded characters represented by an instance of rtl::OUString). There's no chance then of

Re: [dev] Unicode---Give us all of it!

2006-11-13 Thread Stephan Bergmann
Niklas Nebel wrote: Stephan Bergmann wrote: I doubt that it is that many places that need to be changed. (For example, what do you think needs to be done for text import/export?) The obvious changes for text import: - Separator characters are user-supplied, so they can no longer be handled

[dev] Unicode---Give us all of it!

2006-11-10 Thread Stephan Bergmann
Unicode---Give us all of it! Unicode encodes characters in a codespace that ranges from 0 to 0x10FFFF. Much of the OOo code base operates on UTF-16 code units that range from 0 to 0xFFFF: - C/C++ code based on sal_Unicode. - Java code based on Java char. - UNO
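To make the gap between the codespace and a single UTF-16 code unit concrete, here is a minimal sketch of how a code point above U+FFFF is stored as a surrogate pair. It assumes std::u16string as a stand-in for a sal_Unicode buffer, and the helper name appendCodePoint is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not an OOo API): append one Unicode code point to a
// UTF-16 string, using a surrogate pair for code points above U+FFFF.
inline void appendCodePoint(std::u16string & out, std::uint32_t cp) {
    // Only valid scalar values: at most 0x10FFFF and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp <= 0xFFFF) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        std::uint32_t v = cp - 0x10000; // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
}
```

U+10300, for example, becomes the two code units 0xD800 0xDF00, so code that treats every 16-bit unit as a character sees two "characters" where there is one.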

Re: [dev] Unicode---Give us all of it!

2006-11-10 Thread Niklas Nebel
Stephan Bergmann wrote: In a first step, I will try to identify and gather as many places in OOo that need to be adapted, but I need your help for that: IF YOU KNOW OF ANY PLACE IN OOo THAT NEEDS TO BE ADAPTED, PLEASE LET ME KNOW. Wouldn't that be more or less any occurrence of sal_Unicode?

Re: [dev] Unicode---Give us all of it!

2006-11-10 Thread Michael Meeks
On Fri, 2006-11-10 at 17:12 +0100, Stephan Bergmann wrote: This indicates that an application's concept of character is often best represented by a programming environment's concept of string. An interesting insight indeed. Use sal_uInt32 to represent individual Unicode encoded

Re: [dev] Unicode---Give us all of it!

2006-11-10 Thread Philipp Lohmann - Sun Germany
Niklas Nebel wrote: Stephan Bergmann wrote: In a first step, I will try to identify and gather as many places in OOo that need to be adapted, but I need your help for that: IF YOU KNOW OF ANY PLACE IN OOo THAT NEEDS TO BE ADAPTED, PLEASE LET ME KNOW. Wouldn't that be more or less any

Re: [dev] Unicode---Give us all of it!

2006-11-10 Thread Niklas Nebel
Philipp Lohmann - Sun Germany wrote: Wouldn't that be more or less any occurrence of sal_Unicode? There's hundreds of them in Calc alone. That depends probably on the details. If for example you are searching for ansi1252 code characters in a unicode string (e.g. '/' for URL or filename
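The case mentioned here can be sketched as follows, assuming std::u16string in place of a sal_Unicode buffer and a hypothetical helper named findAsciiUnit. Surrogate code units always lie in the range 0xD800 to 0xDFFF, so a unit-by-unit search for an ASCII character such as '/' cannot match inside a surrogate pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an OOo API): return the code unit indices at which
// an ASCII character occurs in a UTF-16 string.
inline std::vector<std::size_t> findAsciiUnit(const std::u16string & s, char ascii) {
    assert(static_cast<unsigned char>(ascii) < 0x80);
    const char16_t unit = static_cast<char16_t>(ascii);
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Surrogates are 0xD800..0xDFFF, so they never compare equal to ASCII.
        if (s[i] == unit) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

The returned positions are code unit indices, not code point counts, which is what existing sal_Unicode-based code would expect.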