Re: UTF-8 in string literals and translation strings in particular

2015-10-09 Thread Georg Baum
Guillaume Munch wrote:

> So, is the plan to change char_type from wchar_t to char32_t in 2.3
> and use the syntax U"..." to directly define docstring literals? Do you
> see any issues with this change? Then we do not need any conversion
> method, we can just use docstring for all purposes when non-ASCII chars
> are involved.

Actually I did not think that far, but yes, this is a very good idea. I 
don't think that there will be issues, this stuff is very well defined, and 
easy to implement for compiler vendors. Then we would finally get rid of the 
windows/linux differences (wchar_t vs. uint32_t for char_type) as well.
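
As a rough illustration of what such a literal could look like, here is a minimal sketch. It assumes char_type becomes char32_t and that docstring is a basic_string over it; both typedefs below are stand-ins for illustration, not LyX's actual declarations, and openLabel is a hypothetical name.

```cpp
#include <string>

// Stand-ins for LyX's char_type and docstring (assumption: char_type
// becomes char32_t once C++11 is required).
typedef char32_t char_type;
typedef std::basic_string<char_type> docstring;

// Builds a menu label that ends with U+2026 HORIZONTAL ELLIPSIS.
// The \u2026 escape keeps this source file pure ASCII; with a known
// source encoding the character could also be written directly.
docstring openLabel()
{
	return U"Open\u2026";
}
```

No conversion function is involved here: the U"..." literal already has the element type of the string, on every platform.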

> Now for the patch under discussion the plan becomes:
> 
> 1/ wait for 2.3
> 2/ allow utf8 in translatable strings, using unicode literals.
> 3/ use … instead of ... in the UI (as before)
> 
> Does it make sense?

I think so.


Georg




Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Guillaume Munch

Please, do not scatter the messages that much, and make the discussion
easy to follow.


Recalling the subject of the discussion:


Le 08/10/2015 01:55, Cyrille Artho a écrit :


Why not apply a variant of that hack, using "sed", to all the
language/menu strings in the repository? Of course the changes would
have to be checked by a human in case one wants to keep "..." in
some cases.


I already did these replacements in the previously posted patch. The
issue is that we then need to adapt the translation chain to be
compatible with UTF-8 strings in the source and ensure that all goes
well. This is also done in the patch. I took the time to make these
improvements because I see them as a lifting of restrictions for future
creative improvements to the UI (for which my patch may or may not
suffice, but this is a first step).

Pavel and Jean-Marc have technical objections that they have not yet
made very clear; they are also worried that it might increase the
work of the translators; and Jean-Marc and Abdel suggest alternatives
which are more hackish than my solution.



1. About technical objections


Le 07/10/2015 00:23, Pavel Sanda a écrit :

To clarify, I'm not against having ellipsis in the menu per se by
whatever mechanism you use to pass it to Qt routines. What I am not
too happy about is increasing the % of utf8-based source code itself and
the translations that follow.



But encouraging the use of UTF-8 is the main purpose of the patch, and you
are not providing further arguments.

I would have understood a reluctance about UTF-8 five years ago, but
it is not likely to go away now that it is here. Moreover having sources
in this specific encoding is becoming common practice, for instance it
is even a requirement for Qt5 compatibility:
.

Here's a short example of symbols that cannot be used yet in translation
strings, default menu .inc files, and translatable parts of
layouts/modules, and which we have to hack to get them anywhere else:

“ ” ° § ¶ ½ µ ÷ ✓ £ ␣ ‰


Le 07/10/2015 10:14, Jean-Marc Lasgouttes a écrit :


I would not describe that code as readable ;) This early return in
the loop is pretty weird.



I wanted to leave the algorithm unchanged on the ASCII subset, because I
feared that somebody would object to a possible loss of
performance. I knew I had to think about everything to get this patch
accepted. Now I can simply say that I will write the simpler,
non-optimised version instead if you prefer. I doubt that
there would be any perceptible difference.






(For ellipses though, I find it more useful to read … in the source
rather than \u2026.)


Well, it would if our mechanism for sending patches to lyx-cvs
correctly specified the attachment encoding as utf8. Right now, I just
see garbage.



This is a separate issue that will need to be fixed, but there is no
time pressure. If that becomes too inconvenient then it means that the
patch that I propose has become useful. Also, are you sure that the
problem is with the mechanism that you mention? I also have issues with
the encoding of attached files, but after investigating the headers I
concluded that it was a bug in Thunderbird.




2. About translations and translators



Le 07/10/2015 09:36, Jean-Marc Lasgouttes a écrit :

Le 06/10/2015 21:43, Pavel Sanda a écrit :

Jean-Marc Lasgouttes wrote:

Well, actually I am not sure that we use the same ellipsis as in
English.


You mean as a translator I need to find a French version of the
ellipsis?


I am not sure that it exists.



Jean-Marc's joke ironically shows that we should use Unicode, because if
there was such a convention in France around a different shape of the
ellipsis symbol, then OpenType fonts could provide it as a localised
variant through the feature called 'locl' and it would be taken care of
by e.g. Pango/HarfBuzz (under Linux), not LyX. (This is food for
orthotypographic thoughts.)

But I think that in the rest of the discussion we have already agreed
that we would ideally like to have … in the menus.




On 07/10/2015 10:54, Jean-Marc Lasgouttes wrote:

By doing this change, we put more pressure on our translators to do
things right.


Which I would rather call "showing the good example", and which is good
as long as it does not incur more work for translators.


So I wanted to see if it was as simple as I thought, and solve the
problem at once with a small script that changes ... into … in the po
files whenever the source string has changed in this way. (It takes as
input the translations both before and after the message merge.) Also,
when one is certain about the translation (i.e. up to this substitution
it matches exactly a former non-fuzzy translation), one must remove the
"fuzzy" tag or this might confuse translators.
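
To make the rule concrete, here is a minimal sketch of the decision for a single entry. It is not the actual script: PoEntry, replaceAll and carryOverTranslation are hypothetical names, the po entry is reduced to three fields, and the matching rule is one reading of the description above.

```cpp
#include <string>

// UTF-8 encoding of U+2026 HORIZONTAL ELLIPSIS, spelled as bytes so the
// sketch does not depend on the encoding of this source file.
static std::string const kEllipsis = "\xE2\x80\xA6";

// Replaces every occurrence of `from` in `s` with `to`.
std::string replaceAll(std::string s, std::string const & from,
                       std::string const & to)
{
	if (from.empty())
		return s;
	std::string::size_type pos = 0;
	while ((pos = s.find(from, pos)) != std::string::npos) {
		s.replace(pos, from.size(), to);
		pos += to.size();
	}
	return s;
}

// Hypothetical, simplified view of one po entry.
struct PoEntry {
	std::string msgid;
	std::string msgstr;
	bool fuzzy;
};

// `before` is the entry prior to the message merge, `after` the fuzzy
// entry produced by the merge. The translation is carried over, and the
// fuzzy flag cleared, only if both msgid and msgstr agree up to the
// "..." -> ellipsis substitution. Returns true if `after` was updated.
bool carryOverTranslation(PoEntry const & before, PoEntry & after)
{
	if (before.fuzzy || before.msgstr.empty() || !after.fuzzy)
		return false;
	std::string const new_id = replaceAll(before.msgid, "...", kEllipsis);
	if (new_id == before.msgid || new_id != after.msgid)
		return false;
	std::string const new_str = replaceAll(before.msgstr, "...", kEllipsis);
	if (replaceAll(after.msgstr, "...", kEllipsis) != new_str)
		return false; // gettext suggested something else: keep it fuzzy
	after.msgstr = new_str;
	after.fuzzy = false;
	return true;
}
```

Any entry for which the function returns false stays fuzzy and goes to the translators unchanged.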

I have to give you credit for your reluctance because I have found that
the string suggested by gettext was not always the right one. But I have
also found that in these 

Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Jean-Marc Lasgouttes

Le 08/10/2015 11:58, Guillaume Munch a écrit :

Please, do not scatter the messages that much, and make the discussion
easy to follow.


I will reply only to some of the arguments for the sake of efficiency.


Le 07/10/2015 00:23, Pavel Sanda a écrit :

To clarify, I'm not against having ellipsis in the menu per se by
whatever mechanism you use to pass it to Qt routines. What I am not
too happy about is increasing the % of utf8-based source code itself and
the translations that follow.



But encouraging the use of UTF-8 is the main purpose of the patch, and you
are not providing further arguments.


The problem with the patch is that it does not have a clear goal. The 
discussion would have been much easier if you had split it in 3 from
the start:


1/ easy use of utf8 in docstring
2/ allow utf8 in translatable strings
3/ use … instead of ... in UI

And now we have to discuss it all in one 'for or against my proposal'
discussion.


For the record, concerning these 3 problems:
1/ I would agree with extending docstring so that it considers that char 
const * and std::string represent UTF8. However, I wonder what is the 
best approach for that. Making this work only for some operators seems 
strange to me. Wouldn't it be possible to set up some implicit constructors?


2/ This is possible, as long as we prove that there is a need for it. I
know/think that gettext discourages use of non-ascii, but I do not know 
whether there are valid reasons for that.


3/ I agree that we should use … in UI, but I have reservations about 
whether changing all of our po files is the way to go right now.



I would have understood a reluctance about UTF-8 five years ago, but
it is not likely to go away now that it is here. Moreover having sources
in this specific encoding is becoming common practice, for instance it
is even a requirement for Qt5 compatibility:
.


We do already specify utf8 for our source (comments actually).


I would not describe that code as readable ;) This early return in
the loop is pretty weird.


I wanted to leave the algorithm unchanged on the ASCII subset, because I
feared that somebody would object to a possible loss of
performance. I knew I had to think about everything to get this patch
accepted. Now I can simply say that I will write the simpler,
non-optimised version instead if you prefer. I doubt that
there would be any perceptible difference.


OK.


Well, it would if our mechanism for sending patches to lyx-cvs
correctly specified the attachment encoding as utf8. Right now, I just
see garbage.



This is a separate issue that will need to be fixed, but there is no
time pressure. If that becomes too inconvenient then it means that the
patch that I propose has become useful. Also, are you sure that the
problem is with the mechanism that you mention? I also have issues with
the encoding of attached files, but after investigating the headers I
concluded that it was a bug in Thunderbird.


I agree that it is a separate issue.


I am not sure that it exists.


Jean-Marc's joke ironically shows that we should use Unicode, because if
there was such a convention in France around a different shape of the
ellipsis symbol, then OpenType fonts could provide it as a localised
variant through the feature called 'locl' and it would be taken care of
by e.g. Pango/HarfBuzz (under Linux), not LyX. (This is food for
orthotypographic thoughts.)


Does it mean that the font looks different when I change the language of 
the document? Why not.



The script gives by construction no false positive (i.e. we remove the
fuzzy tag only when we are sure), and it provides a diagnosis, so we can
check that there are very few false negatives (I haven't seen one yet)
(this means that there is always a reason to let a fuzzy translation go
to the translators). I would propose to run it right after the 2.2
string merge and right before they are given to translators.


That is reasonable if we go this way.


Our translators are not all geeks. We have to make life easy for
them.


I did not realise that this particular discussion was about how
*translators* will type the special characters. Because for translators
the solution is pretty simple: it's copy and paste.


You mean that one is supposed to reach for the mouse just to type
something? Not the fastest way to go. But we could possibly provide 
guidance for each language.


Seriously, saying "use your mouse to get it" or "type Alt 2 0 2 6 every 
time you want to get the character" is taking people a bit lightly.
The idea of LyX is to walk the fine line between canonical LaTeX and 
convenience. Likewise, we should consider where the line is between 
strict orthotypography and convenience.



An alternative would be the attached hack.


But, I still do not see how you want to treat my enhancement for
HSpaceUi.ui with that kind of hack. (And, you call it a hack yourself.)


Don't use 

Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Stephan Witt
Am 08.10.2015 um 16:02 schrieb Jean-Marc Lasgouttes :

>>> I have to admit that I did not understand Abdel's idea %-|
>> 
>> I understood it that way: to present the ellipsis character in the UI
>> there is no need to put that character with unicode in the source code.
>> It is possible to translate the sequence "..." to the character "…"
>> within the I18N process. I think the po files were not in the scope
>> of this discussion - they don't count as "source" files here.
> 
> Then it is exactly what my "hack" does. I attach it again for reference.

I think the idea was to let the translators do it manually where it is
appropriate. And to do it for english this way too.

Stephan

> It probably needs a bit more polish, but the idea is here.
> 
> JMarc
> 
> <0001-Transform-.-to-proper-ellipsis-in-translations.patch>



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Stephan Witt
Am 08.10.2015 um 15:45 schrieb Jean-Marc Lasgouttes :

> Le 08/10/2015 11:58, Guillaume Munch a écrit :
>> Please, do not scatter the messages that much, and make the discussion
>> easy to follow.
> 
> I will reply only to some of the arguments for the sake of efficiency.
> 

…

> 
>> Le 07/10/2015 20:32, Abdelrazak Younes a écrit :
>>> 
>>> No need for any hack: Another alternative that works AFAIR and is
>>> cleaner IMO, english to english translation works. So, for english or
>>> french actually, ... can be translated to the ellipsis character.
> 
> I have to admit that I did not understand Abdel's idea %-|

I understood it that way: to present the ellipsis character in the UI
there is no need to put that character with unicode in the source code.
It is possible to translate the sequence "..." to the character "…"
within the I18N process. I think the po files were not in the scope
of this discussion - they don't count as "source" files here.

Stephan

Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Jean-Marc Lasgouttes

I have to admit that I did not understand Abdel's idea %-|


I understood it that way: to present the ellipsis character in the UI
there is no need to put that character with unicode in the source code.
It is possible to translate the sequence "..." to the character "…"
within the I18N process. I think the po files were not in the scope
of this discussion - they don't count as "source" files here.


Then it is exactly what my "hack" does. I attach it again for reference. 
It probably needs a bit more polish, but the idea is here.


JMarc

From c48e487a7c1ba98061d55d6b2d0f2b2a0a219986 Mon Sep 17 00:00:00 2001
From: Jean-Marc Lasgouttes 
Date: Wed, 7 Oct 2015 10:34:06 +0200
Subject: [PATCH] Transform ... to proper ellipsis in translations

---
 src/support/Messages.cpp |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/support/Messages.cpp b/src/support/Messages.cpp
index da1c3cc..177436f 100644
--- a/src/support/Messages.cpp
+++ b/src/support/Messages.cpp
@@ -94,6 +94,7 @@
 #endif
 
 using namespace std;
+using namespace lyx::support;
 using boost::uint32_t;
 
 namespace lyx {
@@ -125,6 +126,10 @@ void cleanTranslation(docstring & trans)
}
break;
}
+
+   static docstring const threedots = from_ascii("...");
+   static docstring const ellipsis = docstring(1, 0x2026);
+   trans = subst(trans, threedots, ellipsis);
 }
 
 } // lyx
@@ -132,8 +137,6 @@ void cleanTranslation(docstring & trans)
 
 #ifdef ENABLE_NLS
 
-using namespace lyx::support;
-
 namespace lyx {
 
 std::string Messages::gui_lang_;
-- 
1.7.9.5
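
For readers without the LyX tree at hand, the three added lines can be tried in isolation with the sketch below. It assumes docstring is a UTF-32 string; substAll is a hypothetical stand-in for lyx::support::subst, and ellipsize is an illustrative name for the new tail of cleanTranslation.

```cpp
#include <string>

// Stand-in for LyX's docstring (assumption: a UTF-32 string).
typedef std::u32string docstring;

// Hypothetical replacement for lyx::support::subst: replaces every
// occurrence of `from` in `s` with `to`.
docstring substAll(docstring s, docstring const & from, docstring const & to)
{
	if (from.empty())
		return s;
	docstring::size_type pos = 0;
	while ((pos = s.find(from, pos)) != docstring::npos) {
		s.replace(pos, from.size(), to);
		pos += to.size();
	}
	return s;
}

// Mirrors the lines added by the patch: every run of three periods in a
// translated message becomes U+2026 HORIZONTAL ELLIPSIS.
void ellipsize(docstring & trans)
{
	static docstring const threedots = U"...";
	static docstring const ellipsis = docstring(1, char32_t(0x2026));
	trans = substAll(trans, threedots, ellipsis);
}
```

Since the substitution is unconditional, a string that should keep its three periods cannot opt out, and a run of four periods turns into an ellipsis followed by one period. That may be part of the polish the hack still needs.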



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Kornel Benko
Am Donnerstag, 8. Oktober 2015 um 16:14:49, schrieb Stephan Witt 

> Am 08.10.2015 um 16:02 schrieb Jean-Marc Lasgouttes :
> 
> >>> I have to admit that I did not understand Abdel's idea %-|
> >> 
> >> I understood it that way: to present the ellipsis character in the UI
> >> there is no need to put that character with unicode in the source code.
> >> It is possible to translate the sequence "..." to the character "…"
> >> within the I18N process. I think the po files were not in the scope
> >> of this discussion - they don't count as "source" files here.
> > 
> > Then it is exactly what my "hack" does. I attach it again for reference.
> 
> I think the idea was to let the translators do it manually where it is
> appropriate. And to do it for english this way too.

I think that too. I did it for sk.po, it looks OK to me.
No change in UI or source. Nobody is forced to change the po files. Encouraging
to do so is OK IMHO.

> Stephan
> 
> > It probably needs a bit more polish, but the idea is here.
> > 
> > JMarc
> > 
> > <0001-Transform-.-to-proper-ellipsis-in-translations.patch>

Kornel



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Guillaume Munch

Le 08/10/2015 16:46, Kornel Benko a écrit :

Am Donnerstag, 8. Oktober 2015 um 16:14:49, schrieb Stephan Witt



I think the idea was to let the translators do it manually where
it is appropriate. And to do it for english this way too.


I think that too. I did it for sk.po, it looks OK to me. No change
in UI or source. Nobody is forced to change the po files. Encouraging
to do so is OK IMHO.



With this approach, Jean-Marc's objections suddenly become meaningful:



You mean that one is supposed to reach for the mouse just to type
something? Not the fastest way to go. But we could possibly provide
guidance for each language.

Seriously, saying "use your mouse to get it" or "type Alt 2 0 2 6
every time you want to get the character" is taking people a bit
lightly. The idea of LyX is to walk the fine line between canonical
LaTeX and convenience. Likewise, we should consider where the line
is between strict orthotypography and convenience.


I could always update fr.po by hand to have ellipses as you suggest, but 
will future translators do it when new cases arise, if the original 
string does not even give the good example and if one has to 
remember/look elsewhere to get this special char?


This approach is fundamentally broken. The sed script that replaces "" 
with “” was only a workaround for the same problem we are discussing now.



Guillaume



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Guillaume Munch

Le 08/10/2015 14:45, Jean-Marc Lasgouttes a écrit :

I will reply only to some of the arguments for the sake of
efficiency.


As we usually say, I lacked the time to make it shorter...



The problem with the patch is that it does not have a clear goal.
The discussion would have been much easier if you had split it in
3 from the start:

1/ easy use of utf8 in docstring
2/ allow utf8 in translatable strings
3/ use … instead of ... in UI


I planned to do that eventually. I have started being less eager in my
rebases after a gentle remark from Richard, but this was after I started
working on this patch. Not today, though.



And now we have to discuss it all in one 'for or  against my
proposal' discussion.


Please discuss 2), for which 1) and 3) are an immediate consequence to
me. This was also what I raised in my original message as witnessed by
the subject of the thread. Indeed I do not think 1) or 3) alone would be
quite worth our time.



For the record, concerning these 3 problems: 1/ I would agree with
extending docstring so that it considers that char const * and
std::string represent UTF8. However, I wonder what is the best
approach for that. Making this work only for some operators seems
strange to me. Wouldn't it be possible to set up some implicit
constructors?


My rationale was that UTF-8 strings are only intermediates for building
docstrings or QStrings, on which we do the real operations. Operating on
strings directly must still be discouraged. So nothing changes here, it
is just a bit easier to build docstrings with non-ASCII chars. We do not
want more operations on strings. Or did I miss any obvious operation we 
want?




2/ This is possible, as long as we prove that there is a need for
it.


How do we proceed? We have provided several examples already.


I know/think that gettext discourages use of non-ascii, but I do not
know whether there are valid reasons for that.


I could not find post-2004 sources about this. Gettext supports UTF-8 
starting from version 0.12 and we now have a proof of concept that it 
works for LyX.





Does it mean that the font looks different when I change the
language of the document? Why not.


Yes, it's already a thing in some languages.




I did not realise that this particular discussion was about how
*translators* will type the special characters. Because for
translators the solution is pretty simple: it's copy and paste.


You mean that one is supposed to reach for the mouse just to type
something? Not the fastest way to go. But we could possibly provide
guidance for each language.

Seriously, saying "use your mouse to get it" or "type Alt 2 0 2 6
every time you want to get the character" is taking people a bit
lightly. The idea of LyX is to walk the fine line between canonical
LaTeX and convenience. Likewise, we should consider where the line
is between strict orthotypography and convenience.



What I mean is, if I am translating fr.po, I get the following:

  #: src/frontends/qt4/ui/HSpaceUi.ui:37
  msgid "…… (dots)"
  msgstr ""

and now I just have to copy-paste the dots:

  msgstr "…… (pointillés)"

I do not see why you refer to the idea of LyX in this context or speak
of grabbing the mouse. I don't understand the confusion. Are you
thinking of the manuals? But I am not suggesting changes to the manuals.
LaTeX's ellipsis is quite different and I do not like it just like you 
don't.




3/ I agree that we should use … in UI, but I have reservations about
whether changing all of our po files is the way to go right now.


The script gives by construction no false positive (i.e. we remove
the fuzzy tag only when we are sure), and it provides a diagnosis,
so we can check that there are very few false negatives (I haven't
seen one yet) (this means that there is always a reason to let a
fuzzy translation go to the translators). I would propose to run
it right after the 2.2 string merge and right before they are given
to translators.


That is reasonable if we go this way.





An alternative would be the attached hack.


But, I still do not see how you want to treat my enhancement for
HSpaceUi.ui with that kind of hack. (And, you call it a hack
yourself.)


Don't use my words against me :) It is a hack because I wrote it in
10 minutes and I would not put that in code before checking what is
missing. Since it operates at message translation time (not only
menus), I would say that it does 95% of the work.



95% is approximately the concrete figure I have for my own script, and
the rest has to be corrected/validated only once by hand in a
distributed fashion ;)





A contributor has to do a first step, unless your "not for today"
is a euphemism for never.


Be my guest! I do not think that anybody coming up with support for
STIX fonts would be turned down.


I meant, I consider that improving LyX's support for Unicode at the UI
level is a first step towards anything Unicode.




Le 07/10/2015 20:32, Abdelrazak Younes a écrit :


No need for 

Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Kornel Benko
Am Donnerstag, 8. Oktober 2015 um 17:35:46, schrieb Guillaume Munch 

> Le 08/10/2015 16:46, Kornel Benko a écrit :
> > Am Donnerstag, 8. Oktober 2015 um 16:14:49, schrieb Stephan Witt
> > 
> >>
> >> I think the idea was to let the translators do it manually where
> >> it is appropriate. And to do it for english this way too.
> >
> > I think that too. I did it for sk.po, it looks OK to me. No change
> > in UI or source. Nobody is forced to change the po files. Encouraging
> > to do so is OK IMHO.
> 
> 
> With this approach, Jean-Marc's objections suddenly become meaningful:
> 
> 
> > You mean that one is supposed to reach for the mouse just to type
> > something? Not the fastest way to go. But we could possibly provide
> > guidance for each language.
> >
> > Seriously, saying "use your mouse to get it" or "type Alt 2 0 2 6
> > every time you want to get the character" is taking people a bit
> > lightly. The idea of LyX is to walk the fine line between canonical
> > LaTeX and convenience. Likewise, we should consider where the line
> > is between strict orthotypography and convenience.
> 
> I could always update fr.po by hand to have ellipses as you suggest, but 
> will future translators do it when new cases arise, if the original 
> string does not even give the good example and if one has to 
> remember/look elsewhere to get this special char?

Makes sense.

> This approach is fundamentally broken. The sed script that replaces "" 
> with “” was only a workaround for the same problem we are discussing now.
> 

True.

> Guillaume

Kornel



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Enrico Forestieri
On Thu, Oct 08, 2015 at 09:48:52PM +0200, Jean-Marc Lasgouttes wrote:
> Le 08/10/15 03:19, Cyrille Artho a écrit :
> >It's possible to embed characters from a font in an SVG graph; this
> >could be used to avoid problems with fonts that do not have all math
> >characters.
> 
> I seem to remember (but git log is your friend) that these icons started
> their life with embedded fonts, and that they were transformed to something
> else later.
> 
> But only Enrico can tell us what the status is.

I don't think they were embedded as they looked different on different
platforms. They were replaced with paths, i.e., a vectorial representation
of the used characters, so that they appear the same on any platform as
they were originally intended. This can be done with inkscape: after
inserting some text using a specific font, it can be transformed into a
path by a menu entry.

-- 
Enrico


Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Georg Baum
Jean-Marc Lasgouttes wrote:

> The problem with the patch is that it does not have a clear goal. The
> discussion would have been much easier if you had split it in 3 from
> the start:
> 
> 1/ easy use of utf8 in docstring
> 2/ allow utf8 in translatable strings
> 3/ use … instead of ... in UI

4) use of unicode string literals in C++ source files

This would have been easier indeed. For example, I have no real opinion 
about 2) and 3).

4) is not possible as long as we support C++98 (because the source encoding 
is not standardized and especially MSVC has a horrible interpretation of 
it).

Concerning 1) I have a strong opinion which needs a bit of history 
explained: When unicode support was introduced in LyX the idea was to 
replace all strings which can contain non-ASCII contents with docstring. The 
only exceptions would be interfaces to third party libraries or 
import/export, where it is sometimes needed to use std::string with a 
certain encoding. Unfortunately this conversion was never completely 
finished (this is the reason for all the "FIXME UNICODE" comments). 
Therefore, after finishing this task, all occurrences of std::string would
contain ASCII contents with very rare exceptions.
The alternative which was also discussed was to use docstring everywhere. 
This would have been less work to do, but the advantages of the mixed 
docstring/std::string approach were bugs found during the transition 
process, more memory and runtime efficiency, and (if it was completed) a 
clear picture where one can expect ASCII and where user visible contents is 
used.

The proposed changes to docstring weaken the clear separation of ASCII/non-
ASCII contents. They are not needed if the unicode transition is finished 
(i.e. all "FIXME UNICODE" comments addressed). They are not needed either if 
we change our mind and use the alternative approach of docstring everywhere 
instead. For me, the disadvantages weigh much more than the advantages;
therefore I would suggest either finishing the unicode transition or using
docstring everywhere. The only exception would be unicode string literals in 
C++11 mode. Support for these in docstring is both safe and useful in any 
case.

> For the record, concerning these 3 problems:
> 1/ I would agree with extending docstring so that it considers that char
> const * and std::string represent UTF8. However, I wonder what is the
> best approach for that. Making this work only for some operators seems
> strange to me. Wouldn't it be possible to set up some implicit
> constructors?

Implicit constructors do not exist on purpose, for the reason explained 
above. Qt has also learned that they are problematic (see
QT_NO_CAST_FROM_ASCII, which is actually misnamed since it disables casts 
from const char * which are implemented using fromUtf8). Recently I also 
learned that a volunteer provided a huge patch some time ago for the kate 
editor which made it use QT_NO_CAST_FROM_ASCII in order to avoid bugs.
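
A small sketch of what "no implicit constructor" means in practice: the only way from std::string to docstring is a named call that refuses non-ASCII input. The helper name asciiToDocstring is hypothetical (chosen in the spirit of the from_ascii call seen in the patch earlier in this thread), docstring is assumed to be a UTF-32 string, and throwing an exception is only one possible way to reject bad input.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for LyX's docstring (assumption: a UTF-32 string).
typedef std::u32string docstring;

// Hypothetical explicit conversion: accepts only 7-bit ASCII, so a
// std::string carrying UTF-8 (or any other 8-bit encoding) cannot slip
// into a docstring unnoticed.
docstring asciiToDocstring(std::string const & s)
{
	docstring result;
	result.reserve(s.size());
	for (std::string::size_type i = 0; i < s.size(); ++i) {
		unsigned char const c = static_cast<unsigned char>(s[i]);
		if (c >= 0x80)
			throw std::invalid_argument("non-ASCII byte in std::string");
		result.push_back(static_cast<char32_t>(c));
	}
	return result;
}
```

With an implicit constructor the same mistake would compile silently and show up only as mangled text at run time.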
 

Georg




Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Guillaume Munch

Le 08/10/2015 22:11, Georg Baum a écrit :

Jean-Marc Lasgouttes wrote:


The problem with the patch is that it does not have a clear goal. The
discussion would have been much easier if you had split it in 3 from
the start:

1/ easy use of utf8 in docstring
2/ allow utf8 in translatable strings
3/ use … instead of ... in UI


4) use of unicode string literals in C++ source files




Thank you for the disambiguation. I included this in 2).



This would have been easier indeed. For example, I have no real opinion
about 2) and 3).

4) is not possible as long as we support C++98 (because the source encoding
is not standardized and especially MSVC has a horrible interpretation of
it).


Then, I agree this cannot go into 2.2. For 2.3, on the other hand, C++11 
opens better possibilities like directly writing docstring literals 
which is no doubt better than extending the string -> docstring conversions.





Concerning 1) I have a strong opinion which needs a bit of history
explained: When unicode support was introduced in LyX the idea was to
replace all strings which can contain non-ASCII contents with docstring. The
only exceptions would be interfaces to third party libraries or
import/export, where it is sometimes needed to use std::string with a
certain encoding. Unfortunately this conversion was never completely
finished



(this is the reason for all the "FIXME UNICODE" comments).


Good to know.


Therefore, after finishing this task, all occurrences of std::string would
contain ASCII contents with very rare exceptions.
The alternative which was also discussed was to use docstring everywhere.
This would have been less work to do, but the advantages of the mixed
docstring/std::string approach were bugs found during the transition
process, more memory and runtime efficiency, and (if it was completed) a
clear picture where one can expect ASCII and where user visible contents is
used.

The proposed changes to docstring weaken the clear separation of ASCII/non-
ASCII contents. They are not needed if the unicode transition is finished
(i.e. all "FIXME UNICODE" comments addressed). They are not needed either if
we change our mind and use the alternative approach of docstring everywhere
instead. For me, the disadvantages weigh much more than the advantages;
therefore I would suggest either finishing the unicode transition or using
docstring everywhere. The only exception would be unicode string literals in
C++11 mode. Support for these in docstring is both safe and useful in any
case.



I am now convinced that string must remain ASCII.

Even independently of the issue 4) with C++98, I agree that it is
better to wait for 2.3 and its C++11 support, and not cast in stone a
situation that would have been created by C++98 limitations.


So, is the plan to change char_type from wchar_t to char32_t in 2.3
and use the syntax U"..." to directly define docstring literals? Do you 
see any issues with this change? Then we do not need any conversion 
method, we can just use docstring for all purposes when non-ASCII chars 
are involved.


Now for the patch under discussion the plan becomes:

1/ wait for 2.3
2/ allow utf8 in translatable strings, using unicode literals.
3/ use … instead of ... in the UI (as before)

Does it make sense?

Thank you for the detailed answer.


Guillaume



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Abdelrazak Younes

On 08/10/2015 16:14, Stephan Witt wrote:

Am 08.10.2015 um 16:02 schrieb Jean-Marc Lasgouttes :


I have to admit that I did not understand Abdel's idea %-|

I understood it that way: to present the ellipsis character in the UI
there is no need to put that character with unicode in the source code.
It is possible to translate the sequence "..." to the character "…"
within the I18N process. I think the po files were not in the scope
of this discussion - they don't count as "source" files here.

Then it is exactly what my "hack" does. I attach it again for reference.

I think the idea was to let the translators do it manually where it is
appropriate. And to do it for English this way too.


Indeed :-)

Abdel


Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Abdelrazak Younes

On 08/10/2015 11:58, Guillaume Munch wrote:

I think that my solution is vastly superior to these proposed
alternatives that misrepresent the problem, and that I have lifted the
technical objections and the burden for translators.


I think I mostly agree with you at this point. That being said you must 
convince the others, not me :-P


Abdel.



Re: UTF-8 in string literals and translation strings in particular

2015-10-08 Thread Jean-Marc Lasgouttes

Le 08/10/15 03:19, Cyrille Artho a écrit :

It's possible to embed characters from a font in an SVG graph; this
could be used to avoid problems with fonts that do not have all math
characters.


I seem to remember (but git log is your friend) that these icons started 
their life with embedded fonts, and that they were transformed to 
something else later.


But only Enrico can tell us what the status is.

JMarc



Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Jean-Marc Lasgouttes

Le 07/10/2015 01:23, Pavel Sanda a écrit :

Le 06/10/2015 21:01, Pavel Sanda a écrit :
I think you might be mixing issues. One thing is allowing to have
UTF-8 string literals especially in translations, another one is
deciding that we now use … instead of ... in the interface to be
consistent with Qt.


To clarify, I'm not against having the ellipsis in the menus per se, by
whatever mechanism you use to pass it to the Qt routines. What I am not too
happy about is increasing the % of utf8-based source code itself and of the
translations that follow.


That would be trivially easy to implement for menus indeed. Or even for 
dialogs, if we put the code in _().


JMarc



Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Jean-Marc Lasgouttes

Le 06/10/2015 21:43, Pavel Sanda a écrit :

Jean-Marc Lasgouttes wrote:

Well, actually I am not sure that we use the same ellipsis as in English.


You mean as a translator I need to find a French version of the ellipsis?


I am not sure that it exists.

JMarc



Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Jean-Marc Lasgouttes

Le 06/10/2015 22:17, Guillaume Munch a écrit :

I think you might be mixing issues. One thing is allowing to have UTF-8
string literals especially in translations, another one is deciding that
we now use … instead of ... in the interface to be consistent with Qt.


What would be worse than using ... in the UI would be to use a random 
mix of ... and … (look Ma, I managed to type it with my compose key!) in 
the UI. By making this change, we put more pressure on our translators to 
do things right.


An alternative would be the attached hack.

JMarc
From c48e487a7c1ba98061d55d6b2d0f2b2a0a219986 Mon Sep 17 00:00:00 2001
From: Jean-Marc Lasgouttes 
Date: Wed, 7 Oct 2015 10:34:06 +0200
Subject: [PATCH] Transform ... to proper ellipsis in translations

---
 src/support/Messages.cpp |7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/support/Messages.cpp b/src/support/Messages.cpp
index da1c3cc..177436f 100644
--- a/src/support/Messages.cpp
+++ b/src/support/Messages.cpp
@@ -94,6 +94,7 @@
 #endif
 
 using namespace std;
+using namespace lyx::support;
 using boost::uint32_t;
 
 namespace lyx {
@@ -125,6 +126,10 @@ void cleanTranslation(docstring & trans)
 		}
 		break;
 	}
+
+	static docstring const threedots = from_ascii("...");
+	static docstring const ellipsis = docstring(1, 0x2026);
+	trans = subst(trans, threedots, ellipsis);
 }
 
 } // lyx
@@ -132,8 +137,6 @@ void cleanTranslation(docstring & trans)
 
 #ifdef ENABLE_NLS
 
-using namespace lyx::support;
-
 namespace lyx {
 
 std::string Messages::gui_lang_;
-- 
1.7.9.5
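
For readers without the LyX tree at hand, the substitution in the patch can be tried out in isolation. The following is only a sketch under assumptions: docstring is replaced by std::u32string, and subst is a simplified re-implementation written for this example rather than the helper from lyx::support; the names ellipsize and check_ellipsize are likewise made up.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

typedef std::u32string docstring; // assumption: stand-in for lyx::docstring

// Replace every occurrence of `from` in `s` with `to` (simplified
// stand-in for the subst() helper used in the patch).
docstring subst(docstring s, docstring const & from, docstring const & to)
{
	if (from.empty())
		return s;
	std::size_t pos = 0;
	while ((pos = s.find(from, pos)) != docstring::npos) {
		s.replace(pos, from.size(), to);
		pos += to.size(); // continue after the inserted text
	}
	return s;
}

// Turn each run of three ASCII dots into U+2026 HORIZONTAL ELLIPSIS.
docstring ellipsize(docstring const & trans)
{
	static docstring const threedots = U"...";
	static docstring const ellipsis(1, U'\u2026');
	return subst(trans, threedots, ellipsis);
}

inline void check_ellipsize()
{
	assert(ellipsize(U"Open...") == U"Open\u2026");
	assert(ellipsize(U"No dots here") == U"No dots here");
}
```

Note that such a blanket replacement also touches strings where three literal dots may be intended, which is why a human check of the result was suggested elsewhere in this thread.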



Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Jean-Marc Lasgouttes

Le 06/10/2015 20:52, Guillaume Munch a écrit :

(For proper French usage, you can also input accented upper-case
characters and I am sure that your question marks are preceded by a
space − though I would agree that a thin space would appear as too
formal in e-mails.)


Ha! You do not know who you are talking to. A former PhD student 
complained that because of me, he had become too obsessed with 
orthotypography to be able to concentrate on the contents of what he was 
reading :)


The fact is that, with the patch that I posted a minute ago, I had 
trouble convincing myself that the ... were replaced with …, since the 
difference is quite subtle. Can you really see the difference 
immediately when you use a menu?



I could propose to hack it into the menu code...


Please, no. Would you also hack my changes to
src/frontends/qt4/ui/HSpaceUi.ui into the dialog code?


Well I did write the patch anyway.


(For ellipses though, I find it more useful to read … in the source
rather than \u2026.)


Well it would if our mechanism for sending patches to lyx-cvs 
correctly specified the attachment encoding as utf8. Right now, I just see 
garbage.


JMarc


Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Cyrille Artho



Unicode characters in some menus (math panel etc.) would indeed look
much better than pixelated bitmaps.


Do you have particular examples in mind? I thought we used svg icons these
days.

In my view, it is better to have a character that looks like the math font
that LyX uses, than a poorly designed unicode character that smells MS Word
from 100 meters. Unless we teach LyX how to use unicode math fonts, but
this is not for today.

JMarc

>
In the math panel on Mac OS X (non-Retina displays), the icons are very 
pixelated. This could be due to the rendering without anti-aliasing but in 
any case most of the icons look very dated due to that.


Examples are the nabla operator ∇, the inequality signs ≤ ≦ ≮ and the 
root symbols in row 2. Examples that *are* anti-aliased and look better 
are "a/b" and the parentheses.


Maybe this is again because people wanted to "play safe" for systems 
without a complete Unicode character set.

--
Regards,
Cyrille Artho - http://artho.com/
I love deadlines. I like the whooshing sound they make as they fly by.
-- Douglas Adams


Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Cyrille Artho



 >
In the math panel on Mac OS X (non-Retina displays), the icons are very
pixelated. This could be due to the rendering without anti-aliasing but in
any case most of the icons look very dated due to that.

Examples are the nabla operator ∇, the inequality signs ≤ ≦ ≮ and the
root symbols in row 2. Examples that *are* anti-aliased and look better
are "a/b" and the parentheses.

Maybe this is again because people wanted to "play safe" for systems
without a complete Unicode character set.

>
It's possible to embed characters from a font in an SVG graph; this could 
be used to avoid problems with fonts that do not have all math characters.


I've found some information on this page:

http://www.xml.com/lpt/a/1410

(roughly in the middle of the text). I'm not sure which tools can automate 
this process, though.

--
Regards,
Cyrille Artho - http://artho.com/
For a list of the ways which technology has
failed to improve our quality of life, press 3.
-- Phil Read, on Slashdot


Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Cyrille Artho

Jean-Marc Lasgouttes wrote:

Le 06/10/2015 22:17, Guillaume Munch a écrit :

I think you might be mixing issues. One thing is allowing to have UTF-8
string literals especially in translations, another one is deciding that
we now use … instead of ... in the interface to be consistent with Qt.


What would be worse than using ... in the UI would be to use a random mix
of ... and … (look Ma, I managed to type it with my compose key!) in the
UI. By making this change, we put more pressure on our translators to do
things right.

An alternative would be the attached hack.

JMarc


Why not apply a variant of that hack, using "sed", to all the 
language/menu strings in the repository? Of course the changes would have 
to be checked by a human in case one wants to keep "..." in some cases.


I think if we replace all "..." (perhaps in all menus, or also in most/all 
of the manuals), then there will be very few new instances where we need to 
distinguish between "..." and a proper ellipsis.


At least on MacOS, the difference is very clear, the ellipsis is easy to 
get through the character viewer, and text-to-speech software and other 
converters may also work better with proper punctuation (although that's 
pure speculation from my side, I have not tested that).

--
Regards,
Cyrille Artho - http://artho.com/
I love deadlines. I like the whooshing sound they make as they fly by.
-- Douglas Adams


Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Abdelrazak Younes

On 07/10/2015 10:54, Jean-Marc Lasgouttes wrote:

Le 06/10/2015 22:17, Guillaume Munch a écrit :

I think you might be mixing issues. One thing is allowing to have UTF-8
string literals especially in translations, another one is deciding that
we now use … instead of ... in the interface to be consistent with Qt.


What would be worse than using ... in the UI would be to use a random 
mix of ... and … (look Ma, I managed to type it with my compose key!) 
in the UI. By making this change, we put more pressure on our 
translators to do things right.


An alternative would be the attached hack.


No need for any hack. Another alternative, which works AFAIR and is 
cleaner IMO, is English-to-English translation. So, for English (or 
French, actually), ... can be translated to the ellipsis character.


Abdel.



Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Jean-Marc Lasgouttes

Le 07/10/2015 01:10, Cyrille Artho a écrit :

I think for UI guidelines, if Apple and Gnome recommend A, and Microsoft
recommends B, then A is definitely the right way. ;-)


Agreed.


who knows how to get an ellipsis on his/her keyboard?


It's trivial. Character viewer -> Punctuation -> there it is right away.
If you type it often you can look up the keyboard shortcut.


Our translators are not all geeks. We have to make life easy for them.


Unicode characters in some menus (math panel etc.) would indeed look
much better than pixelated bitmaps.


Do you have particular examples in mind? I thought we used svg icons 
these days.


In my view, it is better to have a character that looks like the math 
font that LyX uses, than a poorly designed unicode character that smells 
MS Word from 100 meters. Unless we teach LyX how to use unicode math 
fonts, but this is not for today.


JMarc


Re: UTF-8 in string literals and translation strings in particular

2015-10-07 Thread Jean-Marc Lasgouttes

Le 06/10/2015 21:43, Guillaume Munch a écrit :

-LASSERT(static_cast<unsigned char>(*c) < 0x80, return l);
-s.push_back(*c);
+if (static_cast<unsigned char>(*c) < 0x80)
+s.push_back(*c);
+else
+return s += from_utf8(string(c));




In the meantime I understood that the clarification you probably need
is that c is a C string that denotes a suffix of r, and string(c)
converts the whole remainder of r into a C++ string, not just the next
char.


I would not describe that code as readable ;) This early return in the 
loop is pretty weird.


OTOH, I am not opposed to extending docstring to understand utf8. At the 
time it was introduced, it was decided to stick to ASCII because 
there was a big risk of passing latin1 strings. We can probably assume 
that this risk is gone now.


JMarc


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Guillaume Munch

Le 06/10/2015 18:28, Jean-Marc Lasgouttes a écrit :


For the docstring part of the code, I am not sure what code like the
following does:

-LASSERT(static_cast<unsigned char>(*c) < 0x80, return l);
-s.push_back(*c);
+if (static_cast<unsigned char>(*c) < 0x80)
+s.push_back(*c);
+else
+return s += from_utf8(string(c));

There is nothing magic about from_utf8(string(c)), right? This is just
accepting latin1 characters, or am I blind?




To be more precise, it accepts Latin-1 characters that are part of the 
ASCII subset, yes, if that was your remark :)





Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Guillaume Munch

Le 06/10/2015 18:28, Jean-Marc Lasgouttes a écrit :

Le 06/10/2015 18:38, Guillaume Munch a écrit :

I'm trying to come up with examples where we actually "need"
unicode in the interface. The ellipsis case seems to be trivial
indeed.


Is it trivial?  On my ubuntu libreoffice (where I can write the same
string to compare), I would say that what is used is three dots and not
ellipsis. I failed to find documentation on the subject, except:
http://stackoverflow.com/questions/3777072/in-menus-for-should-one-use-ellipsis-sign-or-just-three-dots


Note that the above comments are from 2010.




After more looking around, I see that Apple recommends the ellipsis
character:
https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/OSXHIGuidelines/TerminologyWording.html#//apple_ref/doc/uid/2957-CH15-SW3


Gnome seems to prefer ellipsis character too:
https://developer.gnome.org/hig/stable/writing-style.html.en#ellipses

Microsoft does not say anything, but their examples use ...:
https://msdn.microsoft.com/en-us/library/dn742392.aspx



In addition, Qt already uses … to elide strings so we currently have an 
inconsistent UI.





Seriously, I think this is just going to annoy our translators.


Seriously :) as described in the rationale of my patch, this cannot be 
more transparent for translators. Gettext pre-sets the previous 
translation; translators just have to do a global search-replace. And 
until they do, the old translation is still displayed. I took the time 
to check that it's indeed the case. And you might be underestimating our 
translators, some of whom might be lovers of proper typographic usage.



Seriously, who knows how to get an ellipsis on his/her keyboard?


Seriously :) I am sorry, ignorance is not an argument. A lot of bad 
typographic usage is only the legacy of past technical limitations that 
are long gone. It took me 10 seconds to learn that … is AltGr+Shift+, on 
the French Linux keyboard and Option+; on Mac. (For proper French usage, 
you can also input accented upper-case characters and I am sure that 
your question marks are preceded by a space − though I would agree that 
a thin space would appear as too formal in e-mails.)




I could propose to hack it into the menu code...


Please, no. Would you also hack my changes to 
src/frontends/qt4/ui/HSpaceUi.ui into the dialog code?




In general, our source code is already UTF8 (in particular author names
in .cpp files), but I am not sure that adding weird characters in the
source is always helpful. I would not swear that there is only one
character looking like an ellipsis in the unicode standard.



Do you have an example where this might lead to a confusing situation? 
If I see … in a translation string I would trust the author that he did 
not write U+1D087 BYZANTINE MUSICAL SYMBOL TRIPLI without a good reason. 
And I do not see how … is more confusing than 0x2026, developers are 
free to explain with a comment in both cases.





 > Properly formatted text in general, not just proper ellipses…
 > A revamped IPA toolbar?
 > The math toolbar?
 > See also src/frontends/qt4/ui/HSpaceUi.ui in the patch.

Please note that we have to be very careful with unicode characters. At
some point I advocated using the proper unicode visible space character
in our documentation, but it turned out that several Windows fonts did
not have it. You do not want to force users to use such and such a font
in their text editor.



As already discussed in the "newline char" thread, this is a separate 
issue, easily fixed by providing a fallback font. This is standard practice.


I hear hacks and half-solutions left and right, when Unicode provides a 
standard solution. A fallback font would be a good investment for not 
reinventing the wheel constantly. (And a custom portable font with a few 
special characters taken from various free fonts is incredibly easy to do.)





Of all programs, if LyX does not need a Unicode interface, which
program does? I am sure you will be able to come up with creative uses.


LyX needs an interface that blends well with the environment where it runs.


This repeats the earlier argument, so I would repeat my reply.




LyX already uses a ton of Unicode chars, defined by hand with their code
point. This patch makes it easier to use Unicode in the source, and
enables special chars in translation strings as well.


A code point is more precise than trying to recognize a character in a
unicode table. I am sure that emacs can tell me what the code point is
at cursor position, but life is too short to try to find it.


First, the patch does not prevent you from using code points when 
appropriate. But it now also allows you to use the \u and \U escape 
sequences in string literals, e.g. for translation strings, if you find 
it appropriate. This is C++11, though, so only starting from LyX 2.3. 
(For ellipses though, I find it more useful to read … in the source rather 
than \u2026.)
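
As a small, hedged illustration of the two escape forms in C++11 literals (the function and constant names are made up for this example, and nothing here is taken from the patch): \u takes exactly four hex digits and \U exactly eight, and the same code point occupies a different number of code units depending on the literal prefix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// U+2026 HORIZONTAL ELLIPSIS fits in four hex digits, so \u is enough.
inline std::u32string ellipsis32() { return U"\u2026"; }

// U+1D087 lies outside the Basic Multilingual Plane, so it needs \U.
inline std::u16string tripli16() { return u"\U0001D087"; }
inline std::u32string tripli32() { return U"\U0001D087"; }

// In a UTF-8 literal the ellipsis expands to three bytes (E2 80 A6);
// sizeof also counts the terminating null, hence the "- 1".
constexpr std::size_t ellipsis_utf8_bytes = sizeof(u8"\u2026") - 1;

inline void check_escapes()
{
	assert(ellipsis32().size() == 1 && ellipsis32()[0] == 0x2026);
	assert(tripli32().size() == 1); // one code point in UTF-32
	assert(tripli16().size() == 2); // a surrogate pair in UTF-16
	assert(ellipsis_utf8_bytes == 3);
}
```

Only escapes are used in the code, so the result does not depend on how the compiler interprets the encoding of the source file.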





And the patch is free.


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Richard Heck

On 10/06/2015 12:38 PM, Guillaume Munch wrote:

Le 06/10/2015 17:13, Pavel Sanda a écrit :

Guillaume Munch wrote:

Le 04/10/2015 23:20, Guillaume Munch a écrit :

Dear list,

Has there been some discussion already about allowing UTF-8 in
the source code, in particular for translatable strings? Is this
 something we long for?

Guillaume


Seeing how there is unanimous interest, here's a proof of concept.

Please read the commit log carefully including the rationale and
the "TODO". Skip the first third regarding string updates (yes I
can make them in a separate patch).

While the subject of the patch may seem trivial, I think that
setting this up is a good investment for future use of Unicode in
the interface.


I'm trying to come up with examples where we actually "need"
unicode in the interface. The ellipsis case seems to be trivial
indeed.


Properly formatted text in general, not just proper ellipses…
A revamped IPA toolbar?
The math toolbar?
See also src/frontends/qt4/ui/HSpaceUi.ui in the patch.


There are probably some cases involving HTML output, as well. What about 
lib/symbols and the like?


Richard



Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Guillaume Munch

Le 06/10/2015 20:03, Guillaume Munch a écrit :

Le 06/10/2015 18:28, Jean-Marc Lasgouttes a écrit :


For the docstring part of the code, I am not sure what code like the
following does:

-LASSERT(static_cast<unsigned char>(*c) < 0x80, return l);
-s.push_back(*c);
+if (static_cast<unsigned char>(*c) < 0x80)
+s.push_back(*c);
+else
+return s += from_utf8(string(c));

There is nothing magic about from_utf8(string(c)), right? This is just
accepting latin1 characters, or am I blind?




To be more precise, it accepts Latin-1 characters that are part of the
ASCII subset, yes, if that was your remark :)




In the meantime I understood that the clarification you probably need 
is that c is a C string that denotes a suffix of r, and string(c) 
converts the whole remainder of r into a C++ string, not just the next char.
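
To make that control flow concrete, here is a self-contained sketch of the same idea. It is not the code from the patch: docstring is replaced by std::u32string, decode_utf8 is a hypothetical, minimal decoder standing in for from_utf8 (it assumes well-formed input and does not reject overlong forms or surrogates), and append is a made-up name for the operator being discussed.

```cpp
#include <cassert>
#include <string>

typedef std::u32string docstring; // assumption: stand-in for lyx::docstring

// Minimal UTF-8 decoder for well-formed input (hypothetical helper).
docstring decode_utf8(char const * p)
{
	docstring out;
	while (*p) {
		unsigned char const b = static_cast<unsigned char>(*p++);
		int extra = 0;      // number of continuation bytes expected
		char32_t cp = b;    // ASCII bytes are their own code point
		if (b >= 0xF0)      { extra = 3; cp = b & 0x07; }
		else if (b >= 0xE0) { extra = 2; cp = b & 0x0F; }
		else if (b >= 0xC0) { extra = 1; cp = b & 0x1F; }
		for (; extra > 0 && *p; --extra)
			cp = (cp << 6) | (static_cast<unsigned char>(*p++) & 0x3F);
		out.push_back(cp);
	}
	return out;
}

// Append the C string r to l. ASCII bytes are copied one at a time; at
// the first non-ASCII byte the entire rest of r is decoded in one go,
// which is why the function returns from inside the loop.
docstring & append(docstring & l, char const * r)
{
	assert(r);
	for (char const * c = r; *c; ++c) {
		if (static_cast<unsigned char>(*c) < 0x80)
			l.push_back(static_cast<char32_t>(*c));
		else
			return l += decode_utf8(c);
	}
	return l;
}
```

For example, appending the bytes "b\xC3\xA9" "c\xE2\x80\xA6" to U"a" copies 'b' through the ASCII branch and then decodes "é", "c" and "…" together in the second branch.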





Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Pavel Sanda
Guillaume Munch wrote:
>> Seriously, I think this is just going to annoy our translators.
>
> Seriously :) as described in the rationale of my patch, this cannot be more 
> transparent for translators. Gettext pre-sets the previous translation; 
> translators just have to do a global search-replace. And until they do, the 
> old translation is still displayed. I took the time to check that it's 
> indeed the case. And you might be underestimating our translators, some of 
> whom might be lovers of proper typographic usage.
>
>> Seriously, who knows how to get an ellipsis on his/her keyboard?
>
> Seriously :) I am sorry, ignorance is not an argument. A lot of bad 
> typographic usage is only the legacy of past technical limitations that are 
> long gone. It took me 10 seconds to learn that … is AltGr+Shift+, in the 
> french Linux keyboard and Option+; on Mac. (For proper French usage, you 
> can also input accented upper-case characters and I am sure that your 
> question marks are preceded by a space − though I would agree that a thin 
> space would appear as too formal in e-mails.)

I don't think this is a question of ignorance but of comfort.
It took me 5 min to figure out how to enter the ellipsis directly, and no,
I will not remember the number 2026 or whatever weird shortcut my
keyboard layout might have for the next time.
I'm afraid I'm more on JMarc's side on the points above.

Pavel


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Pavel Sanda
Jean-Marc Lasgouttes wrote:
> Well, actually I am not sure that we use the same ellipsis as in English.

You mean as a translator I need to find a French version of the ellipsis?
P


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Guillaume Munch

Le 06/10/2015 21:01, Pavel Sanda a écrit :

Guillaume Munch wrote:

Seriously, I think this is just going to annoy our translators.


Seriously :) as described in the rationale of my patch, this cannot be more
transparent for translators. Gettext pre-sets the previous translation;
translators just have to do a global search-replace. And until they do, the
old translation is still displayed. I took the time to check that it's
indeed the case. And you might be underestimating our translators, some of
whom might be lovers of proper typographic usage.


Seriously, who knows how to get an ellipsis on his/her keyboard?


Seriously :) I am sorry, ignorance is not an argument. A lot of bad
typographic usage is only the legacy of past technical limitations that are
long gone. It took me 10 seconds to learn that … is AltGr+Shift+, in the
french Linux keyboard and Option+; on Mac. (For proper French usage, you
can also input accented upper-case characters and I am sure that your
question marks are preceded by a space − though I would agree that a thin
space would appear as too formal in e-mails.)


I don't think this is a question of ignorance but of comfort.
It took me 5 min to figure out how to enter the ellipsis directly, and no,
I will not remember the number 2026 or whatever weird shortcut my
keyboard layout might have for the next time.
I'm afraid I'm more on JMarc's side on the points above.

Pavel




I think you might be mixing issues. One thing is allowing to have UTF-8 
string literals especially in translations, another one is deciding that 
we now use … instead of ... in the interface to be consistent with Qt.


If you do not find typographic consistency important personally, then I 
imagine that you would continue typing "..." in UI strings. I would see 
it personally as a minor bug, but who will prevent you from doing it?



Guillaume




Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Jean-Marc Lasgouttes

Le 06/10/15 21:43, Pavel Sanda a écrit :

Jean-Marc Lasgouttes wrote:

Well, actually I am not sure that we use the same ellipsis as in English.


You mean as a translator I need to find a French version of the ellipsis?


I even remember a document that stated that using 3 dots was better in 
French, but I cannot find it right now…


JMarc



Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Pavel Sanda
> Le 06/10/2015 21:01, Pavel Sanda a écrit :
> I think you might be mixing issues. One thing is allowing to have
> UTF-8 string literals especially in translations, another one is
> deciding that we now use … instead of ... in the interface to be
> consistent with Qt.

To clarify, I'm not against having the ellipsis in the menus per se, by
whatever mechanism you use to pass it to the Qt routines. What I am not too
happy about is increasing the % of utf8-based source code itself and of the
translations that follow.

Pavel


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Cyrille Artho




After more looking around, I see that Apple recommends the ellipsis
character:
https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/OSXHIGuidelines/TerminologyWording.html#//apple_ref/doc/uid/2957-CH15-SW3


Gnome seems to prefer ellipsis character too:
https://developer.gnome.org/hig/stable/writing-style.html.en#ellipses

Microsoft does not say anything, but their examples use ...:
https://msdn.microsoft.com/en-us/library/dn742392.aspx

I think for UI guidelines, if Apple and Gnome recommend A, and Microsoft 
recommends B, then A is definitely the right way. ;-)


I think it's important to consider that in many languages, three dots is 
just plain wrong, so we should do things the proper way.

>

Seriously, I think this is just going to annoy our translators. Seriously,
who knows how to get an ellipsis on his/her keyboard?


It's trivial. Character viewer -> Punctuation -> there it is right away.
If you type it often you can look up the keyboard shortcut.
>

I could propose to hack it into the menu code...

In general, our source code is already UTF8 (in particular author names in
.cpp files), but I am not sure that adding weird characters in the source
is always helpful. I would not swear that there is only one character
looking like an ellipsis in the unicode standard.

 > Properly formatted text in general, not just proper ellipses…
 > A revamped IPA toolbar?
 > The math toolbar?
 > See also src/frontends/qt4/ui/HSpaceUi.ui in the patch.

Please note that we have to be very careful with unicode characters. At
some point I advocated using the proper unicode visible space character in
our documentation, but it turned out that several Windows fonts did not have
it. You do not want to force users to use such and such a font in their text
editor.

>
Then maybe we can try to prevent users from configuring LyX with such 
strange fonts. Fonts without important characters are a relic of the past, 
and sooner or later even Windows will adapt.



Of all programs, if LyX does not need a Unicode interface, which
program does? I am sure you will be able to come up with creative uses.


LyX needs an interface that blends well with the environment where it runs.



Unicode characters in some menus (math panel etc.) would indeed look much 
better than pixelated bitmaps.

--
Regards,
Cyrille Artho - http://artho.com/
Solitude is an illusion, for every silence is filled
by a clamorous search for meaning.
-- Steven Erikson, "House of Chains"


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Pavel Sanda
Guillaume Munch wrote:
> Le 04/10/2015 23:20, Guillaume Munch a écrit :
>> Dear list,
>>
>> Has there been some discussion already about allowing UTF-8 in the
>> source code, in particular for translatable strings? Is this
>> something we long for?
>>
>> Guillaume
>>
> Seeing how there is unanimous interest, here's a proof of concept.
>
> Please read the commit log carefully including the rationale and the 
> "TODO". Skip the first third regarding string updates (yes I can make them 
> in a separate patch).
>
> While the subject of the patch may seem trivial, I think that setting
> this up is a good investment for future use of Unicode in the interface.

I'm trying to come up with examples where we actually "need" unicode
in the interface. The ellipsis case seems to be trivial indeed.
So, except that I can't use my keyboard directly for the ellipsis and have
to worry about what happens with python 2.x, what are the gains actually?

Pavel

... Or are you guys secretly preparing to turn the GUI into French? :)


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Jean-Marc Lasgouttes

Le 06/10/2015 18:38, Guillaume Munch a écrit :

I'm trying to come up with examples where we actually "need"
unicode in the interface. The ellipsis case seems to be trivial
indeed.


Is it trivial? On my ubuntu libreoffice (where I can write the same 
string to compare), I would say that what is used is three dots and not 
ellipsis. I failed to find documentation on the subject, except:

http://stackoverflow.com/questions/3777072/in-menus-for-should-one-use-ellipsis-sign-or-just-three-dots

After more looking around, I see that Apple recommends the ellipsis 
character:

https://developer.apple.com/library/mac/documentation/UserExperience/Conceptual/OSXHIGuidelines/TerminologyWording.html#//apple_ref/doc/uid/2957-CH15-SW3

Gnome seems to prefer ellipsis character too:
https://developer.gnome.org/hig/stable/writing-style.html.en#ellipses

Microsoft does not say anything, but their examples use ...:
https://msdn.microsoft.com/en-us/library/dn742392.aspx

Seriously, I think this is just going to annoy our translators. 
Seriously, who knows how to get an ellipsis on his/her keyboard?


I could propose to hack it into the menu code...

In general, our source code is already UTF8 (in particular author names 
in .cpp files), but I am not sure that adding weird characters in the 
source is always helpful. I would not swear that there is only one 
character looking like an ellipsis in the unicode standard.


> Properly formatted text in general, not just proper ellipses…
> A revamped IPA toolbar?
> The math toolbar?
> See also src/frontends/qt4/ui/HSpaceUi.ui in the patch.

Please note that we have to be very careful with unicode characters. At 
some point I advocated using the proper unicode visible space character 
in our documentation, but it turned out that several Windows fonts did 
not have it. You do not want to force users to use such and such a font 
in their text editor.



Of all programs, if LyX does not need a Unicode interface, which
program does? I am sure you will be able to come up with creative uses.


LyX needs an interface that blends well with the environment where it runs.


LyX already uses a ton of Unicode chars, defined by hand with their code
point. This patch makes it easier to use Unicode in the source, and
enables special chars in translation strings as well.


A code point is more precise than trying to recognize a character in a 
unicode table. I am sure that emacs can tell me what the code point is 
at cursor position, but life is too short to try to find it.



And the patch is free.


:)

For the docstring part of the code, I am not sure what code like the 
following does:


-   LASSERT(static_cast<unsigned char>(*c) < 0x80, return l);
-   s.push_back(*c);
+   if (static_cast<unsigned char>(*c) < 0x80)
+   s.push_back(*c);
+   else
+   return s += from_utf8(string(c));

There is nothing magic about from_utf8(string(c)), right? This is just 
accepting latin1 characters, or am I blind?


JMarc


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Jean-Marc Lasgouttes
Well, actually I am not sure that we use the same ellipsis as in English.

JMarc

Le 6 octobre 2015 18:13:25 GMT+02:00, Pavel Sanda  a écrit :
>I'm trying to come up with examples where we actually "need" unicode
>in the interface. The ellipsis case seems to be trivial indeed.
>So, except that I can't use my keyboard directly for the ellipsis and have
>to worry about what happens with python 2.x, what are the gains actually?
>
>Pavel
>
>... Or are you guys secretly preparing for turning the GUI into French? :)

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: UTF-8 in string literals and translation strings in particular

2015-10-06 Thread Guillaume Munch

On 06/10/2015 17:13, Pavel Sanda wrote:

> Guillaume Munch wrote:
>> On 04/10/2015 23:20, Guillaume Munch wrote:
>>> Dear list,
>>>
>>> Has there been some discussion already about allowing UTF-8 in
>>> the source code, in particular for translatable strings? Is this
>>> something we long for?
>>>
>>> Guillaume
>>
>> Seeing how there is unanimous interest, here's a proof of concept.
>>
>> Please read the commit log carefully including the rationale and
>> the "TODO". Skip the first third regarding string updates (yes I
>> can make them in a separate patch).
>>
>> While the subject of the patch may seem trivial, I think that
>> setting this up is a good investment for future use of Unicode in
>> the interface.
>
> I'm trying to come up with examples where we actually "need"
> Unicode in the interface. The ellipsis case seems to be trivial
> indeed.


Properly formatted text in general, not just proper ellipses…
A revamped IPA toolbar?
The math toolbar?
See also src/frontends/qt4/ui/HSpaceUi.ui in the patch.

Of all programs, if LyX does not need a Unicode interface, which
program does? I am sure you will be able to come up with creative uses.

LyX already uses a ton of Unicode chars, defined by hand with their code
point. This patch makes it easier to use Unicode in the source, and
enables special chars in translation strings as well.
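
To make the contrast concrete, here is a minimal sketch of the two styles. It is an illustration under assumptions, not LyX code: std::u32string stands in for docstring, a C++11 U"" literal stands in for from_utf8("…"), the function names are made up, and the source file is assumed to be saved as UTF-8 and read as such by the compiler.

```cpp
#include <cassert>
#include <string>

// Current style: the character is pushed by code point and has to be
// explained in a comment.
std::u32string label_by_code_point()
{
	std::u32string s = U"Open";
	s.push_back(0x2026); // U+2026 HORIZONTAL ELLIPSIS
	return s;
}

// Style the patch aims for: the character is visible in the literal.
// Assumes the compiler reads this file as UTF-8.
std::u32string label_by_literal()
{
	return U"Open…";
}
```

Both functions are meant to build the same five-character string; only the second one shows the reader what the label looks like.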

And the patch is free.


> So except that I can't use my keyboard directly for ellipsis


Well, Mac and Linux users can. For Windows, I do not know.


> and worry what happens with Python 2.x, what are the gains actually?


Nothing will go wrong with Python 2.x. The patch does not touch the 
Python string literals, and it already supports UTF-8 translated strings 
if that is needed, because all languages apart from English are already 
written in UTF-8.




> Pavel
>
> ... Or are you guys secretly preparing for turning the GUI into French? :)



Of course I would let you know one week in advance if that was the case.


Thanks for the feedback

Guillaume



Re: UTF-8 in string literals and translation strings in particular

2015-10-05 Thread Guillaume Munch

On 04/10/2015 23:20, Guillaume Munch wrote:

> Dear list,
>
> Has there been some discussion already about allowing UTF-8 in the
> source code, in particular for translatable strings? Is this
> something we long for?
>
> Guillaume




Seeing how there is unanimous interest, here's a proof of concept.

Please read the commit log carefully including the rationale and the 
"TODO". Skip the first third regarding string updates (yes I can make 
them in a separate patch).


While the subject of the patch may seem trivial, I think that setting
this up is a good investment for future use of Unicode in the interface.



Guillaume
From a3471a5d734cb06cd3f552b079ff94985d755ea0 Mon Sep 17 00:00:00 2001
From: Guillaume Munch 
Date: Sat, 3 Oct 2015 05:16:05 +0100
Subject: [PATCH] U+2026 HORIZONTAL ELLIPSIS instead of "..."
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Set up the translation chain for UTF-8 strings.

* docstring::operator{==,+=,+} are now compatible with UTF-8 (the implementation
  has not changed for the ASCII subset).

Rationale:

We must use UTF-8 strings in the source code instead of defining a global
constant "support::ellipsis = 0x2026" and using concatenation, for two reasons:

 * All translatable strings containing ... are changed, and the former
   translation is correctly suggested as a fuzzy translation. Substituting
   only the ... in the string is the least confusing route for translators,
   who won't have to scratch their heads to adapt all the translation strings.
   It also means that the translation remains correct until somebody updates it.

 * We want to keep the source readable. There are already several places where
   for a single unicode character we push_back() a wide char defined by its code
   point and then we have to describe it in a comment. By allowing UTF-8
   strings, we encourage the use of Unicode in the interface. This could be
   useful for a document processor that has the most advanced Unicode output
   capabilities.

Audit of the usability of UTF-8 strings in the source:

 * to define a docstring:

   * use from_utf8("…")

   * use _() (it uses from_utf8 internally) (source: src/support/Messages.cpp:
 Messages::get())

   * use operator{==,+=,+}. (new)

 * to define a QString:

   * use QString::fromUtf8("…")

   * use qt_() (it uses QString::fromUtf8 internally) (source:
 src/frontends/qt4/qt_helpers.cpp)

Warning: QString::operator{==,+=,+} are NOT compatible with UTF-8 strings in
 source (they use a locale-dependent conversion). Use QString::fromUtf8()
 first. (The locale-dependent conversion can be disabled by defining
 QT_NO_CAST_FROM_ASCII. This will disable all dangerous functions at
 compile-time, see the manual.)

 * Python: only Python 3 is UTF-8 by default. So we change nothing there.

 * Qt designer/creator must be configured to save in UTF-8.

TODO:
 * ensure that the c++ compiler runs under (the equivalent of) LC_ALL=C or
   LC_ALL=UTF-8.
 * Define QT_NO_CAST_FROM_ASCII
---
 lib/ui/stdcontext.inc |  72 ++--
 lib/ui/stdmenus.inc   | 108 +++---
 po/Makevars   |   2 +-
 po/Rules-lyx  |   5 +-
 po/utf-8.pot.in   |   6 ++
 src/BiblioInfo.cpp|   2 +-
 src/Buffer.cpp|  10 +--
 src/BufferView.cpp|   6 +-
 src/Converter.cpp |   2 +-
 src/LyXRC.cpp |   2 +-
 src/Text3.cpp |   3 +-
 src/VCBackend.cpp |   4 +-
 src/frontends/qt4/GuiApplication.cpp  |   6 +-
 src/frontends/qt4/GuiCompare.cpp  |   2 +-
 src/frontends/qt4/GuiDocument.cpp |  10 +--
 src/frontends/qt4/GuiHyperlink.cpp|   4 +-
 src/frontends/qt4/GuiView.cpp |  14 ++--
 src/frontends/qt4/GuiWorkArea.cpp |  11 +--
 src/frontends/qt4/Menus.cpp   |   8 +--
 src/frontends/qt4/ui/BibtexAddUi.ui   |   2 +-
 src/frontends/qt4/ui/BibtexUi.ui  |   8 +--
 src/frontends/qt4/ui/BranchesUi.ui|   4 +-
 src/frontends/qt4/ui/ColorUi.ui   |   8 +--
 src/frontends/qt4/ui/CompareUi.ui |   4 +-
 src/frontends/qt4/ui/ErrorListUi.ui   |   2 +-
 src/frontends/qt4/ui/ExternalUi.ui|   2 +-
 src/frontends/qt4/ui/GraphicsUi.ui|   4 +-
 src/frontends/qt4/ui/HSpaceUi.ui  |  12 ++--
 src/frontends/qt4/ui/IncludeUi.ui |   2 +-
 src/frontends/qt4/ui/IndicesUi.ui |   4 +-
 src/frontends/qt4/ui/LaTeXUi.ui   |   4 +-
 src/frontends/qt4/ui/PrefColorsUi.ui  |   2 +-
 src/frontends/qt4/ui/PrefCompletionUi.ui  |   4 +-
 src/frontends/qt4/ui/PrefFileformatsUi.ui |   2 +-
 src/frontends/qt4/ui/PrefInputUi.ui   |   4 +-
 src/frontends/qt4/ui/PrefPathsUi.ui   |  16 ++---
 

UTF-8 in string literals and translation strings in particular

2015-10-04 Thread Guillaume Munch

Dear list,

Has there been some discussion already about allowing UTF-8 in the 
source code, in particular for translatable strings? Is this something 
we long for?



Guillaume