Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-23 Thread Bill Spitzak
If you know your UTF-8 is correct (no errors) you can find the start of 
the Nth Unicode code point (not character!) by finding the Nth byte 
that is not in the range 0x80-0xBF.
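
A minimal sketch of that scan, assuming the input is valid, NUL-terminated UTF-8. The helper name utf8_offset_of_codepoint is hypothetical and is not an FLTK function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of FLTK): returns the byte offset at which
// code point number n (0-based) starts, assuming s is valid UTF-8.
// If s holds n or fewer code points, the total byte length is returned.
std::size_t utf8_offset_of_codepoint(const char *s, std::size_t n) {
    assert(s != nullptr);
    std::size_t len = std::strlen(s);
    std::size_t seen = 0;  // number of sequence-start bytes passed so far
    for (std::size_t i = 0; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        bool is_continuation = (c & 0xC0) == 0x80;  // bytes 0x80..0xBF
        if (!is_continuation) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return len;
}
```

The returned offset is also the byte count of the first n code points, which is the kind of length fl_draw(s, n, x, y) expects.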

However if you think you need to print N characters then you are not 
using Unicode correctly. That will return the length of UTF-8 that 
represents N Unicode code points, but those are not characters. There 
are combining accents, invisible characters, direction change indicators, 
etc. And as others pointed out, even the printing ones in monospace 
fonts take different widths.

On 09/21/2011 08:58 AM, MacArthur, Ian (SELEX GALILEO, UK) wrote:

 OK. I have string with some length (m = strlen(s);) and I have n -
 quantity of characters which I can print.
 I can calculate k = fl_utf_nb_char(s, m);

 If k <= n then no problem : fl_draw(s,x,y);

 Otherwise?
 In cycle decrease the source text  by one byte and check for
 length, yes?

 It's pretty easy to scan a UTF8 string and determine where each glyph
 begins, so you'd want to chop a whole glyph off the end of your string,
 not just one byte, I imagine.

 I think there might even be a fltk helper function for finding the glyph
 positions in the string - or am I remembering some other toolkit...?




 SELEX Galileo Ltd
 Registered Office: Sigma House, Christopher Martin Road, Basildon, Essex SS14 3EL
 A company registered in England & Wales.  Company no. 02426132
 
 This email and any attachments are confidential to the intended
 recipient and may also be privileged. If you are not the intended
 recipient please delete it from your system and notify the sender.
 You should not copy it or use it for any purpose nor disclose or
 distribute its contents to any other person.
 


___
fltk-dev mailing list
fltk-dev@easysw.com
http://lists.easysw.com/mailman/listinfo/fltk-dev


Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-23 Thread Bill Spitzak
On 09/21/2011 01:39 PM, Nikita Egorov wrote:

 off topic: I'm not sure the word glyph is a proper one in our case.
 IIRC the glyph can be only part of character. A few glyphs can be at
 one character cell and make up grapheme, symbol. So I'm interested in
 how many character cells will be displayed.

The proper term for what you are finding is Unicode code points.

If you restrict yourself to printing Latin-1 you will approximately get 
what you think you want.

I have also seen a (quite lengthy) piece of code that determines if a 
code, printed in monospace, takes 1, 2, or 0 cells. Not necessarily 
correct but it uses a huge table of ranges and supposedly agrees with 
the majority of monospace fonts in existence. All the combining accents 
and direction and language indicators are claimed to take 0.
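
For illustration only, such a lookup might be sketched as below. The ranges are a small, incomplete sample chosen for this example, and the names Range and cell_width are hypothetical; a usable table would be far longer and would still only approximate what real fonts do.

```cpp
#include <cstddef>
#include <cstdint>

// Inclusive range of Unicode code points.
struct Range { std::uint32_t first, last; };

// Sample of code points assumed to occupy no cell (illustrative subset).
static const Range kZeroWidth[] = {
    {0x0300, 0x036F},  // combining diacritical marks
    {0x200B, 0x200F},  // zero-width space and joiners, LRM, RLM
    {0x202A, 0x202E},  // bidirectional embedding and override controls
};

// Sample of code points assumed to occupy two cells (illustrative subset).
static const Range kDoubleWidth[] = {
    {0x1100, 0x115F},  // Hangul Jamo initial consonants
    {0x2E80, 0xA4CF},  // CJK radicals through Yi (approximate)
    {0xAC00, 0xD7A3},  // Hangul syllables
    {0xFF00, 0xFF60},  // fullwidth forms
};

template <std::size_t N>
static bool in_ranges(std::uint32_t cp, const Range (&ranges)[N]) {
    for (const Range &r : ranges)
        if (cp >= r.first && cp <= r.last) return true;
    return false;
}

// Returns 0, 1 or 2: the number of monospace cells assumed for cp.
int cell_width(std::uint32_t cp) {
    if (in_ranges(cp, kZeroWidth)) return 0;
    if (in_ranges(cp, kDoubleWidth)) return 2;
    return 1;
}
```

Summing cell_width() over the code points of a string gives an estimated column count, which is only as good as the table behind it.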


Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-23 Thread Bill Spitzak
If you know your string is ASCII then the number of glyphs is equal to 
the number of bytes. This may be reasonable if you are just trying to 
print numbers.

Even using fltk's calls, you can use utf8_fwd to move from the start of 
the string to find the Nth code point. This has the advantage that it 
will count erroneous sequences correctly (1 code for each byte). This 
is a lot more efficient than going from the end and measuring over and over.

On 09/21/2011 08:58 AM, MacArthur, Ian (SELEX GALILEO, UK) wrote:

 OK. I have string with some length (m = strlen(s);) and I have n -
 quantity of characters which I can print.
 I can calculate k = fl_utf_nb_char(s, m);

 If k <= n then no problem : fl_draw(s,x,y);

 Otherwise?
 In cycle decrease the source text  by one byte and check for
 length, yes?

 It's pretty easy to scan a UTF8 string and determine where each glyph
 begins, so you'd want to chop a whole glyph off the end of your string,
 not just one byte, I imagine.

 I think there might even be a fltk helper function for finding the glyph
 positions in the string - or am I remembering some other toolkit...?






Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-23 Thread Bill Spitzak
On 09/22/2011 02:44 AM, Duncan Gibson wrote:
 I have changed (r.9055) the Doxygen doc of fl_draw() functions
 to state explicitly that all involved strings are UTF-8 encoded
 and all lengths are in bytes.

 I would suggest to use the fl_utf8decode() function in your case.
 It will successively compute the byte length of each Unicode character
 in your UTF-8 string. When you have reached the maximum allowed
 number of characters or the string end, you'll know how many bytes
 to send to fl_draw(). This would work for LGC scripts provided
 accented characters are encoded with a single Unicode value.

 The Unicode in FLTK section in http://www.fltk.org/doc-1.3/unicode.html
 already explicitly states:

 FLTK will only handle single characters, so composed characters
  consisting of a base character and floating accent characters will
  be treated as multiple characters

 as this was the consensus when this was all being discussed in 2009.
 D.

Not sure exactly what is meant by that.

The low-level utf8fwd/back/decode api will return the Unicode code 
points without any changes or normalization, so this is correct about that.

However printout is *supposed* to render UTF-8 as accurately as 
possible. In the ideal world the accents will print above the character 
before them in exactly the right place.

I think this is working pretty well on the newest OS/X versions. Windows 
is somewhat in between (I think it recognizes anything that can be 
replaced with a precomposed character), and Linux is the worst right now 
(it just draws the glyphs next to each other) (Linux could be the best 
if FLTK called Pango to do the layout, but there may be resistance to 
that). In any case FLTK's behavior when printing is not an indication of 
the intended design.

Correctly handling Unicode strings requires enormous resources, and it 
is hoped that FLTK can avoid writing all that. Supposedly a program just 
calls these resources directly, and FLTK calls them to render strings, 
but otherwise it does not need to provide an API.


Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-22 Thread Manolo Gouy
I have changed (r.9055) the Doxygen doc of fl_draw() functions
to state explicitly that all involved strings are UTF-8 encoded
and all lengths are in bytes.

I would suggest to use the fl_utf8decode() function in your case.
It will successively compute the byte length of each Unicode character
in your UTF-8 string. When you have reached the maximum allowed
number of characters or the string end, you'll know how many bytes
to send to fl_draw(). This would work for LGC scripts provided
accented characters are encoded with a single Unicode value.


Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-22 Thread MacArthur, Ian (SELEX GALILEO, UK)

 I have changed (r.9055) the Doxygen doc of fl_draw() functions
 to state explicitly that all involved strings are UTF-8 encoded
 and all lengths are in bytes.

Good idea.


 I would suggest to use the fl_utf8decode() function in your case.
 It will successively compute the byte length of each Unicode character
 in your UTF-8 string. When you have reached the maximum allowed
 number of characters or the string end, you'll know how many bytes
 to send to fl_draw(). This would work for LGC scripts provided
 accented characters are encoded with a single Unicode value.

Yes - I think that sounds better than my idea of walking backwards along
the string to clip off excess characters.





Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-22 Thread Duncan Gibson
 I have changed (r.9055) the Doxygen doc of fl_draw() functions
 to state explicitly that all involved strings are UTF-8 encoded
 and all lengths are in bytes.

 I would suggest to use the fl_utf8decode() function in your case.
 It will successively compute the byte length of each Unicode character
 in your UTF-8 string. When you have reached the maximum allowed
 number of characters or the string end, you'll know how many bytes
 to send to fl_draw(). This would work for LGC scripts provided
 accented characters are encoded with a single Unicode value.

The Unicode in FLTK section in http://www.fltk.org/doc-1.3/unicode.html
already explicitly states:

   FLTK will only handle single characters, so composed characters
   consisting of a base character and floating accent characters will
   be treated as multiple characters

as this was the consensus when this was all being discussed in 2009.
D.


Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Nikita Egorov
 Without looking into it in more detail, I'm going to say docs problem -
 it should not say n characters but should probably say something like
 the number of bytes needed to represent n characters in UTF8 or some
 such thing... Or...?
Yes, the description should be changed to say number of bytes, but I want
to work with characters.

 To solve the problem we can convert all string to UTF-16 and
 then restrict its by the specified length.

 Which perhaps will not work either, since as soon as you hit any
 character that is not on the BMP you will need a UTF16 surrogate pair,
 and the same sort of problem occurs.

But in UTF-16 all symbols have size two bytes. There is no problem to
set specified size of string as opposed to UTF8 where every symbol can
have own size (from 1 up to 5?) .

 Indeed - the reason for having the length option in these methods is
 explicitly so that the string does not need to be NULL terminated...

There is one more issue - the current implementation ignores a NUL byte.
It prints exactly N bytes from the source. That behavior causes some
problems for me too. Maybe we should check for a zero char and stop
printing the rest?

 Not keen, if we are to have a clean UTF8 API (and I think we should)
 then...

What is the harm if we add one more function which accepts a UTF16
string? In MS Win it would be the part of gd->draw(...) which is
invoked after conversion.

 Because in order to evaluate right length of
 text, user has to convert source line to the UTF-16, restrict
 size and convert into UTF-8 again to invoke fl_draw(s,x,y)
 where string will be converted one more time.  Thus at the
 moment there is two unnecessary conversions.

 Hmm, we have functions that tell you the number of characters in a
 UTF8 string, do these not help?

I see no way to use it. In short - I have to print a column of text lines,
each restricted to a specified length. I use the Courier font.
The lines can contain any symbols - latin, cyrillic etc. I can't
work out the needed size of a line in bytes if I use UTF8. Only via
conversion into some available format with fixed-size symbols.

 Converting to and from UTF16 is a pain (though necessary at times on
 WIN32 since they opted for UTF16 as their internal API, in effect..)

I know.

 Anyway, best post an STR, and maybe a simple example we can use for
 testing.

It's no problem, but... my latest STR (fl_draw_image_mono(...) has no
body at all!) has been sitting there without any interest



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Matthias Melcher

On 21.09.2011, at 14:50, Nikita Egorov wrote:

 Without looking into it in more detail, I'm going to say docs problem -
 it should not say n characters but should probably say something like
 the number of bytes needed to represent n characters in UTF8 or some
 such thing... Or...?
 Yes, description should be replaced to number of bytes, but I want
 to work with characters.

The docs are wrong. fl_draw is a low level function and should not do things 
like character counting.

It would be your responsibility to convert the number of characters into the 
number of bytes. There are functions for that in fltk_utf... . It also makes 
sense, because a string that seems identical can be represented in various ways 
(for example, an umlaut can be composed from a diaeresis and the letter U, or it 
can use the umlaut glyph - both are perfectly legal). And only the caller can know 
how fl_draw is used. So, bytes would be correct.
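
To make the point concrete, here is a small sketch comparing the two legal spellings of the letter u with an umlaut. count_code_points is a hypothetical helper that assumes valid UTF-8; it is not part of FLTK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts code points in valid UTF-8 by counting the
// bytes that are not continuation bytes (0x80..0xBF).
std::size_t count_code_points(const char *s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s)
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++count;
    return count;
}

// One precomposed code point: U+00FC LATIN SMALL LETTER U WITH DIAERESIS.
const char kPrecomposed[] = "\xC3\xBC";

// Two code points: "u" followed by U+0308 COMBINING DIAERESIS.
const char kDecomposed[] = "u\xCC\x88";
```

Both strings are meant to display as the same letter, yet the first is 2 bytes and 1 code point while the second is 3 bytes and 2 code points. A byte count is the only length fl_draw can be given without guessing which of these the caller means.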

 To solve the problem we can convert all string to UTF-16 and
 then restrict its by the specified length.
 
 Which perhaps will not work either, since as soon as you hit any
 character that is not on the BMP you will need a UTF16 surrogate pair,
 and the same sort of problem occurs.
 
 But in UTF-16 all symbols have size two bytes. There is no problem to
 set specified size of string as opposed to UTF8 where every symbol can
 have own size (from 1 up to 5?) .

No, in UTF-16, characters can be composed as well, plus all characters above 
0xffff (outside the BMP) are represented by a four byte sequence, just like 
characters above 0x7f in UTF-8 are represented by longer sequences. The rarer 
Chinese characters outside the BMP will need four bytes in UTF-16, for example.

 Indeed - the reason for having the length option in these methods is
 explicitly so that the string does not need to be NULL terminated...
 
 It is one more issue - the current implementation ignores NULL byte.
 It prints exactly N bytes from the source. Such behavior makes some
 problems for me too. May be we should check for zero char and stop
 printing the rest?

No. This function was written particularly so that a NUL can be printed. This 
may (or may not) be useful for certain terminal style text displays, who knows. 
But I do know that there was a reason to do it - back then.

 Not keen, if we are to have a clean UTF8 API (and I think we should)
 then...
 
 What is the harm if we add one more function which accepts UTF16
 string ? In MS Win it would be part of the gd-draw(...) which would
 invoked after conversion.

It was a conscious and consensual decision to go with UTF-8. If we add a single 
UTF-16 call, we will need to provide more and more calls, and we will need to 
provide support calls for platforms that do not use UTF-16 (All Unixes, for 
example). This is certainly possible, but who will maintain the code? AFAIK 
there are two functions to convert to and from UTF-16 which you can use if you 
prefer UTF-16 when coding. Internally however, FLTK uses UTF-8 (even if it has 
to convert back to UTF-16 to print text).

 Because in order to evaluate right length of
 text, user has to convert source line to the UTF-16, restrict
 size and convert into UTF-8 again to invoke fl_draw(s,x,y)
 where string will be converted one more time.  Thus at the
 moment there is two unnecessary conversions.
 
 Hmm, we have functions that tell you the number of characters in a
 UTF8 string, do these not help?
 
 I see no way to use it. Shortly - I have to print column of text lines
 restricted by specified length. I use the Courier font.
 The lines can contain any symbols - latin, cyrillic etc. I can't
 evaluate needed size of line in bytes if I use UTF8. Only via
 conversion into any accessible format with monosized symbols.

No, even if you use monospace fonts, you can not assume that the number of 
characters times the width of the font will give you the width of the string 
that will be rendered on screen. There are characters and character 
combinations in Unicode that need more or less pixels, even in monospaced 
fonts! There are even character sequences that have different width in 
different combinations, so simply adding up the width on each individual 
character will not work.

The only reliable way to get the width of whatever is printed is using 
fl_width() after setting the font and size.
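
A rough sketch of fitting by measured width rather than by character count. The measure callback is an assumption: in an FLTK program it could wrap fl_width(s, n) after fl_font() has been called, but it is left abstract here so the logic can be read on its own. bytes_that_fit is a hypothetical name.

```cpp
#include <cassert>
#include <cstring>
#include <functional>

// measure(s, nbytes) is assumed to return the rendered width of the first
// nbytes bytes of s in the current font.
using MeasureFn = std::function<double(const char *, int)>;

// Returns the largest byte count, ending on a code point boundary, whose
// measured width does not exceed max_width. Assumes valid UTF-8.
int bytes_that_fit(const char *s, double max_width, const MeasureFn &measure) {
    assert(s != nullptr);
    int len = static_cast<int>(std::strlen(s));
    int best = 0;
    for (int i = 1; i <= len; ++i) {
        // Only cut where the next byte starts a new sequence (or at the end).
        bool at_boundary =
            (i == len) || ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80);
        if (!at_boundary) continue;
        // Simplification: stop at the first prefix that is too wide.
        if (measure(s, i) > max_width) break;
        best = i;
    }
    return best;
}
```

Two caveats: stopping at the first overflow treats width as growing with every added code point, which ligatures can violate, and cutting on a code point boundary can still separate a base letter from a following combining accent.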

Trust me, I know all that because I converted Fl_Text_Editor from ASCII to 
UTF-8, and doing this, I learned much more about Unicode than I ever wanted to 
know.

 Anyway, best post an STR, and maybe a simple example we can use for
 testing.
 
 It's no problem, but... my latest STR (fl_draw_image_mono(...) has no
 body at all!) hang there without any interest

FLTK is an Open Source effort. We are really trying to keep the ball rolling, 
but we all have regular jobs and some of us even regular kids ;-) . Please be 
patient.  Our first goal is to keep FLTK1 as stable and as bug-free as 
possible. 

 - Matthias



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread MacArthur, Ian (SELEX GALILEO, UK)

 But in UTF-16 all symbols have size two bytes. There is no problem to
 set specified size of string as opposed to UTF8 where every symbol can
 have own size (from 1 up to 5?) .

Not true I'm afraid - only glyphs from the BMP are sure to be two bytes
in UTF16.
Any glyph from a higher plane will be 4 bytes.
Check up on surrogate pairs to see what I mean.
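
A small sketch of the surrogate pair rule, using only the arithmetic defined for UTF-16. The function names are hypothetical and are not FLTK calls.

```cpp
#include <cassert>
#include <cstdint>

// Number of 16-bit code units UTF-16 needs for one code point:
// 1 inside the BMP (up to U+FFFF), 2 above it.
int utf16_units(std::uint32_t cp) {
    return cp >= 0x10000 ? 2 : 1;
}

// Writes the UTF-16 encoding of cp into out and returns the number of
// code units written. cp must be a valid, non-surrogate code point.
int utf16_encode(std::uint32_t cp, std::uint16_t out[2]) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x10000) {
        out[0] = static_cast<std::uint16_t>(cp);
        return 1;
    }
    std::uint32_t v = cp - 0x10000;  // 20 bits, split 10 and 10
    out[0] = static_cast<std::uint16_t>(0xD800 + (v >> 10));    // high surrogate
    out[1] = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    return 2;
}
```

So a string cut at a fixed number of 16-bit units can still land in the middle of a character, which is the same problem as cutting UTF-8 at a fixed number of bytes.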


 I see no way to use it. Shortly - I have to print column of text lines
 restricted by specified length. I use the Courier font.
 The lines can contain any symbols - latin, cyrillic etc. I can't
 evaluate needed size of line in bytes if I use UTF8. Only via
 conversion into any accessible format with monosized symbols.

I might need to see an example to help visualize what you are doing...
I'm not sure I'm understanding all the nuances of your problem.


  Anyway, best post an STR, and maybe a simple example we can use for
  testing.
 
 It's no problem, but... my latest STR (fl_draw_image_mono(...) has no
 body at all!) hang there without any interest

Oh we read them - we're just busy, so don't get much time to actually do
anything about them...

I still think that posting the STR with details, examples, etc., greatly
increases the chance that *something* might get done!







Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread MacArthur, Ian (SELEX GALILEO, UK)

 The only reliable way to get the width of whatever is printed 
 is using fl_width() after setting the font and size.

Or my preferred option of fl_text_extents()






Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Nikita Egorov
 No, even if you use monospace fonts, you can not assume that the number of 
 characters times the width of the font will give you the width of the string 
 that will be rendered on screen. There are characters and character 
 combinations in Unicode that need more or less pixels, even in monospaced 
 fonts! There are even character sequences that have different width in 
 different combinations, so simply adding up the width on each individual 
 character will not work.

Hmm, that's quite new to me! Could you give me an example of one of
these combinations?

 The only reliable way to get the width of whatever is printed is using 
 fl_width() after setting the font and size.

I don't have the width of the line in dots. I only have the maximum length
of the string in characters.

 Trust me, I know all that because I converted Fl_Text_Editor from ASCII to 
 UTF-8, and doing this, I learned much more about Unicode than I ever wanted 
 to know.

Thanks, now I know that my method is not quite correct in all cases.

 FLTK is an Open Source effort.
I know. And I have sent in fixes for the bugs I found.

  We are really trying to keep the ball rolling, but we all have regular jobs 
 and some of us even regular kids ;-) . Please be patient.  Our first goal is 
 to keep FLTK1 as stable and as bug-free as possible.

Mine too.



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Matthias Melcher

On 21.09.2011, at 16:52, Nikita Egorov wrote:

 No, even if you use monospace fonts, you can not assume that the number of 
 characters times the width of the font will give you the width of the string 
 that will be rendered on screen. There are characters and character 
 combinations in Unicode that need more or less pixels, even in monospaced 
 fonts! There are even character sequences that have different width in 
 different combinations, so simply adding up the width on each individual 
 character will not work.
 
 Hmm, it's quite new thing for me! Could you tell me a sample of one of
 such combinations?

characters > 0x2e80 for example have a width of two monospace chars. These are 
mostly Chinese, Japanese and Korean. Basically, for monospaced font in Unicode, 
you can have non-spacing, single-width or double-width characters or ligatures.

 The only reliable way to get the width of whatever is printed is using 
 fl_width() after setting the font and size.
 
 I have no width of line in dots. I have only maximal length of string
 in characters.

Oh, OK. I misread that. 

You can use this:

/* OD: returns the number of Unicode chars in the UTF-8 string */
FL_EXPORT int fl_utf_nb_char(const unsigned char *buf, int len);

 - Matthias


Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Nikita Egorov
 characters > 0x2e80 for example have a width of two monospace chars. These 
 are mostly Chinese, Japanese and Korean. Basically, for monospaced font in 
 Unicode, you can have non-spacing, single-width or double-width characters or 
 ligatures.

I know about the double-width characters. Fortunately, it's not my case anyhow.

 You can use this:

 /* OD: returns the number of Unicode chars in the UTF-8 string */
 FL_EXPORT int fl_utf_nb_char(const unsigned char *buf, int len);

OK. I have a string with some length (m = strlen(s);) and n, the number
of characters I can print.
I can calculate k = fl_utf_nb_char(s, m);

If k <= n then no problem : fl_draw(s,x,y);

Otherwise?
Shrink the source text by one byte at a time in a loop and check the length, yes?



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Matthias Melcher

This should work (untested):

int findFirstNCharacters(const char *str, int n) 
{
  int bytes = 0;
  int maxBytes = strlen(str);
  while (n>0 && *str!=0) {
    int bytesInChar = fl_utf_nb_char(*str, maxBytes);
    if (bytesInChar==-1) break; // error in UTF-8
    bytes += bytesInChar;
    maxBytes -= bytesInChar;
    str += bytesInChar;
    n--;
  }
  return bytes;
}



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Ian MacArthur

On 21 Sep 2011, at 18:20, Matthias Melcher wrote:

 
 This should work (untested):
 
 int findFirstNCharacters(const char *str, int n) 
 {
  int bytes = 0;
  int maxBytes = strlen(str);
   while (n>0 && *str!=0) {
     int bytesInChar = fl_utf_nb_char(*str, maxBytes);
     if (bytesInChar==-1) break; // error in UTF-8
     bytes += bytesInChar;
     maxBytes -= bytesInChar;
     str += bytesInChar;
     n--;
  }
  return bytes;
 }

If the intent is to trim glyphs off the end of a string until it only has the 
required number of glyphs left, then I think you could do something useful 
using:

/* F2: Move backward to the previous valid UTF8 sequence start */
FL_EXPORT const char* fl_utf8back(const char* p, const char* start, const char* end);


Starting one byte in from the end, I think you can use this to walk backwards 
through the string until you have removed the necessary number of glyphs... 

Say you have used:

  int num_glyphs = fl_utf_nb_char(const unsigned char *buf, int len);

To determine that there are 10 glyphs in your string, and you know you only 
have room for 6 on the screen, then you could use 

   const char *glyph_begin = fl_utf8back(buf, ...);


4 times to walk back to the start of glyph 7, then by comparing the values of 
buf and glyph_begin you know exactly how many bytes in your string are 
required to create the 6 glyphs that you can fit on the display...

Well, something like that, anyway...
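
A sketch of that backward walk, written without FLTK so it stands alone. prev_code_point is a simplified stand-in for the job fl_utf8back() does in the description above; it assumes valid UTF-8 and is not the FLTK implementation. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Steps back from p to the start of the code point that ends just before p.
// Never moves before start.
static const char *prev_code_point(const char *p, const char *start) {
    if (p == start) return start;
    --p;
    // Skip continuation bytes (0x80..0xBF) until a sequence start is found.
    while (p > start && (static_cast<unsigned char>(*p) & 0xC0) == 0x80) --p;
    return p;
}

// Returns how many leading bytes of s remain after dropping `drop`
// code points from the end of the string.
std::size_t bytes_after_trimming(const char *s, std::size_t drop) {
    assert(s != nullptr);
    const char *p = s + std::strlen(s);
    while (drop > 0 && p > s) {
        p = prev_code_point(p, s);
        --drop;
    }
    return static_cast<std::size_t>(p - s);
}
```

With 10 code points in the string and room for 6, bytes_after_trimming(buf, 4) would give the byte count to draw.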






Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Nikita Egorov
 This should work (untested):

 int findFirstNCharacters(const char *str, int n)
 {
  int bytes = 0;
  int maxBytes = strlen(str);
  while (n>0 && *str!=0) {
    int bytesInChar = fl_utf_nb_char(*str, maxBytes);
    if (bytesInChar==-1) break; // error in UTF-8
    bytes += bytesInChar;
    maxBytes -= bytesInChar;
    str += bytesInChar;
    n--;
  }
  return bytes;
 }

Thank you for this clue!
I suppose you meant something like int bytesInChar =
fl_utf8len1(*str) instead of  = fl_utf_nb_char(*str, maxBytes),
didn't you?
Anyhow I made function which works as I want. I won't try to suggest
to add it to fl_draw(), but such function would be a good helper to
use with fl_draw() as Ian said before.
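
A sketch of what such a helper might look like. This is not the function mentioned above, which was not posted. utf8_len_from_lead is a local stand-in playing the role suggested for fl_utf8len1(); it is written here for illustration and may differ in detail from FLTK's version.

```cpp
#include <cassert>
#include <cstring>

// Length of the UTF-8 sequence implied by its first byte, or 1 for a byte
// that cannot start a sequence, so that bad input still advances.
static int utf8_len_from_lead(unsigned char c) {
    if (c < 0x80) return 1;               // ASCII
    if (c >= 0xC2 && c <= 0xDF) return 2;
    if (c >= 0xE0 && c <= 0xEF) return 3;
    if (c >= 0xF0 && c <= 0xF4) return 4;
    return 1;                             // stray continuation or invalid lead
}

// Returns the number of bytes occupied by the first n code points of str,
// or the whole length if str holds fewer than n.
int findFirstNCharacters(const char *str, int n) {
    assert(str != nullptr);
    int bytes = 0;
    int maxBytes = static_cast<int>(std::strlen(str));
    while (n > 0 && bytes < maxBytes) {
        int bytesInChar =
            utf8_len_from_lead(static_cast<unsigned char>(str[bytes]));
        // Do not run past the end if the last sequence is truncated.
        if (bytesInChar > maxBytes - bytes) bytesInChar = maxBytes - bytes;
        bytes += bytesInChar;
        --n;
    }
    return bytes;
}
```

The result can be passed as the byte length in fl_draw(s, n, x, y). As discussed elsewhere in this thread, it counts code points, not displayed cells.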

Nikita Egorov

PS Though it seems to me fl_draw(s,n,x,y) is a function that gives
confusing results and it needs thorough docs at least.



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Nikita Egorov
 If the intent is to trim glyphs off the end of a string until it only has the 
 required number of glyphs left, then I think you could do something useful 
 using:

 /* F2: Move backward to the previous valid UTF8 sequence start */
 FL_EXPORT const char* fl_utf8back(const char* p, const char* start, const char* end);


 Starting one byte in from the end, I think you can use this to walk backwards 
 through the string until you have removed the necessary number of glyphs...

 Say you have used:

  int num_glyphs = fl_utf_nb_char(const unsigned char *buf, int len);

 To determine that there are 10 glyphs in your string, and you know you only 
 have room for 6 on the screen, then you could use

   const char *glyph_begin = fl_utf8back(buf, ...);

 4 times to walk back to the start of glyph 7, then by comparing the values of 
 buf and glyph_begin you know exactly how many bytes in your string are 
 required to create the 6 glyphs that you can fit on the display...

 Well, something like that, anyway...

Thanks, I found what I need.  It can be approached via several ways,
certainly. :)

off topic: I'm not sure the word glyph is a proper one in our case.
IIRC the glyph can be only part of character. A few glyphs can be at
one character cell and make up grapheme, symbol. So I'm interested in
how many character cells will be displayed.

Nikita



Re: [fltk.development] problem with fl_draw(s,n,x,y)

2011-09-21 Thread Ian MacArthur

On 21 Sep 2011, at 21:39, Nikita Egorov wrote:
 
 off topic: I'm not sure the word glyph is a proper one in our case.
 IIRC the glyph can be only part of character. A few glyphs can be at
 one character cell and make up grapheme, symbol. So I'm interested in
 how many character cells will be displayed.

OK - but you can not know that from just looking at each character in the 
string, in the general case.

In a language (say English) that has simple typography, then it can Just Work; 
1 Unicode codepoint == one character == one glyph == one displayed symbol.

But for languages with more complex layout constraints, e.g. Indic languages, 
or Semitic languages, then you need to see each symbol in the context of the 
surrounding symbols, and any ligatures, re-ordering, variant shapes... before 
you can even tell what needs to be drawn.
And remember that the canonical order of the characters in the string may not 
match the order they are shown on screen, and that eliding some or all of the 
characters may substantially alter the appearance of the rendered text...

Often, it may be better to render the text then clip it to fit, than to try and 
pick out just a few characters from the stream.

But I guess you are not interested in that level of complexity! (I know I'm 
not!)

Anyway, returning to my point before I wandered way off... For English there's 
generally a pretty close mapping between characters in the string, their 
canonical order in the string, and the glyphs that appear on the screen.
I think that's mostly sort of true for LGC languages (though not always - 
Matt's Umlaut example...) and often true for CJK text (again, with many 
exceptions...)

But yes, the final symbol that appears on screen may be composed of multiple 
fragments, and simply counting character entries in the source string is not 
going to help with that at all...

Anyway - fltk has no tools to help with that, we just fire symbols onto the 
screen.
To know what symbols to fire, you need to look at ICU or PanGo or some such 
mechanism to figure out how to compose a view of the string from the canonical 
representation in the bytes - then maybe just clip that to fit into the 
viewable area...


