Re: false positive in test for unencodable preamble

2015-07-07 Thread Jürgen Spitzmüller
Guenter Milde wrote:
> I assumed this and I answered accordingly (as if you had written literal
> and not literate). 
> 
> My argument, however, is that we may consider it more useful to encode
> "unencodable" characters into a LICR instead of throwing an error, as for
> LaTeX this is one representation of the given "character entity".

I understood. However, I still think the preamble content should not be 
changed whatsoever. If a user enters invalid content, inform him.

Jürgen


Re: false positive in test for unencodable preamble

2015-07-06 Thread Jean-Marc Lasgouttes

On 06/07/2015 17:17, Richard Heck wrote:

> I'm slightly confused. Is the idea that this inset can go anywhere you
> like in the document, but its contents end up in the preamble? If not,
> then why do we need insets to do this? Why not just have some sort of
> paragraph layout that does this? of which we already have a lot?


People like insets because they can be folded, for example. Adding to 
InsetLayout the same InPreamble property as Layout has should not be too 
difficult. Or adding an inPreamble() virtual method and the relevant 
parameters for the ERT inset.
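
The virtual-method variant could look roughly like the sketch below. It is only an illustration under stated assumptions: `Inset`, `TextInset`, `PreambleInset`, `Export` and `exportInsets` are hypothetical stand-ins, not the real LyX classes, and only the method name `inPreamble()` comes from the proposal above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are not the real LyX classes.
struct Inset {
	virtual ~Inset() = default;
	// Default: the inset's output belongs to the document body.
	virtual bool inPreamble() const { return false; }
	virtual std::string latex() const = 0;
};

struct TextInset : Inset {
	explicit TextInset(std::string s) : text(std::move(s)) {}
	std::string latex() const override { return text; }
	std::string text;
};

// ERT-like inset whose content is routed to the preamble on export.
struct PreambleInset : Inset {
	explicit PreambleInset(std::string s) : code(std::move(s)) {}
	bool inPreamble() const override { return true; }
	std::string latex() const override { return code; }
	std::string code;
};

struct Export {
	std::string preamble;
	std::string body;
};

// Collect the output of each inset into the preamble or the body,
// depending on what the inset reports about itself.
Export exportInsets(std::vector<std::unique_ptr<Inset>> const & insets)
{
	Export out;
	for (auto const & inset : insets) {
		assert(inset);
		std::string & target = inset->inPreamble() ? out.preamble : out.body;
		target += inset->latex() + '\n';
	}
	return out;
}
```

With this shape the inset can sit anywhere in the document and still be folded in the editor, while the export code decides where its content ends up.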


JMarc



Re: false positive in test for unencodable preamble

2015-07-06 Thread Richard Heck

On 07/05/2015 10:46 AM, Scott Kostyshak wrote:

> On Sun, Jul 5, 2015 at 10:35 AM, Guenter Milde <mi...@users.sf.net> wrote:
>
>> On 2015-07-05, Jean-Marc Lasgouttes wrote:
>>
>>>>> In the preamble, there is no way to make a "LyX-only" comment. :-(
>>
>>> If we implemented InPreamble for insets, then one could create an
>>> ERT-like inset that goes in preamble.
>>
>> This may be an idea...
>>
>>> Actually, we could even replace the current preamble with such insets.
>>
>> but I would not use it to *replace* the current Documents>Settings>preamble.
>> This is a place for advanced user settings that belong to the other
>> document settings (fonts, language, encoding, package/modules, ...) and
>> should not clutter the editor window.
>
> I really like JMarc's idea, and agree with Günter that it should not
> replace the current set up. We could also allow for this inset to
> place preamble code *before* (some of) LyX's automatic preamble
> export. Many issues in the past have come from users not being able
> to get something into the preamble soon enough, such as the order in
> which packages are loaded.


I'm slightly confused. Is the idea that this inset can go anywhere you 
like in the document, but its contents end up in the preamble? If not, 
then why do we need insets to do this? Why not just have some sort of 
paragraph layout that does this? of which we already have a lot?


Richard




Re: false positive in test for unencodable preamble

2015-07-06 Thread Guenter Milde
On 2015-07-05, Jürgen Spitzmüller wrote:
> On Sunday, 5 July 2015, 14:12:55, Georg Baum wrote:
>> The attached patch fixes that and even simplifies the code. Does anybody
>> know of any reason _not_ to use unicodesymbols replacements for the user
>> preamble?

> I consider the preamble to be "literate", just like ERT is. Thus, I
> think no auto-replacements should be performed.

"literal" means that LaTeX' "special characters" (\{}~...) must not be
escaped or transformed. All these characters are in the 7-bit ASCII range.

We could, however, consider using unicodesymbols replacements for
characters that would be lost or lead to an error otherwise. This would
"automagically" regard only characters outside the 7-bit ASCII range and
therefore not interfere with the active characters.

The reason for _not_ using the unicodesymbols replacements would be
"purity": the author can be sure that all in ERT (or the latex preamble)
is put to the LaTeX file as-is.  

However, encoding encodable characters is already a deviation from the
pure doctrine. One can argue that an 8-bit encoded character is still
"the same" as the equivalent Unicode character, but this argument may
also be extended to LICR as for LaTeX, an LICR is also just another
representation of the same character.
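
A minimal sketch of this idea, not LyX code: every character the target encoding can represent is copied unchanged (which covers all of 7-bit ASCII, so `\`, `{`, `}`, `~` and `%` are never touched), and only unencodable characters are looked up in a replacement table. The assumptions are loud ones: the target encoding is taken to be Latin-1, the two-entry `licr` table stands in for the unicodesymbols file, and `encodable` and `toLatex` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy replacement table (assumption); the real unicodesymbols file is far larger.
std::map<char32_t, std::string> const licr = {
	{0x0102, "\\u{A}"},  // LATIN CAPITAL LETTER A WITH BREVE
	{0x0103, "\\u{a}"},  // LATIN SMALL LETTER A WITH BREVE
};

// Assumption: Latin-1 target encoding, so code points up to U+00FF are encodable.
bool encodable(char32_t c) { return c <= 0xFF; }

// Returns the converted text and the characters that could neither be
// encoded nor replaced by a macro from the table.
std::pair<std::u32string, std::u32string> toLatex(std::u32string const & in)
{
	std::u32string out;
	std::u32string uncodable;
	for (char32_t c : in) {
		if (encodable(c)) {
			out += c;  // passed through as-is, including all ASCII
			continue;
		}
		auto const it = licr.find(c);
		if (it == licr.end()) {
			uncodable += c;  // reported to the caller, not silently dropped
			continue;
		}
		for (char ch : it->second) {
			assert(static_cast<unsigned char>(ch) < 0x80);  // macros are ASCII
			out += static_cast<char32_t>(ch);
		}
	}
	return {out, uncodable};
}
```

For the input `% Ă {x} Ș` this yields `% \u{A} {x} ` as text and reports `Ș` as uncodable, because the toy table has no entry for it.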

Günter




Re: false positive in test for unencodable preamble

2015-07-06 Thread Jürgen Spitzmüller
On Monday, 6 July 2015, 17:24:47, Guenter Milde wrote:
> On 2015-07-05, Jürgen Spitzmüller wrote:
> > On Sunday, 5 July 2015, 14:12:55, Georg Baum wrote:
> >> The attached patch fixes that and even simplifies the code. Does anybody
> >> know of any reason _not_ to use unicodesymbols replacements for the user
> >> preamble?
> > 
> > I consider the preamble to be "literate", just like ERT is. Thus, I
> > think no auto-replacements should be performed.

No, I mean "literal" as in: pass to LaTeX as is.

Jürgen


Re: false positive in test for unencodable preamble

2015-07-05 Thread Georg Baum
Guenter Milde wrote:

> I want a concise comment in the LyX file, in order to explain what the
> code is about.
> 
> In the document, I have the option to change the comment into a LyX-note.
> Fine.
> 
> In the preamble, there is no way to make a "LyX-only" comment. :-(

You have a point. Unfortunately we don't know if the user wanted a "LyX-
only" comment or not.

> OTOH, the "unicodesymbols" replacements should be applied in the
> LaTeX-preamble, too. Currently, it seems they are applied for comments in
> the document, but not for the user-preamble. See
> http://www.lyx.org/trac/raw-attachment/ticket/9607/%C4%82%C4%83-in-preamble-comment-warning.lyx

Very good! This was a silent data loss before and is now visible, so the
new warning did already uncover a problem that we did not know before.

The attached patch fixes that and even simplifies the code. Does anybody 
know of any reason _not_ to use unicodesymbols replacements for the user 
preamble?


Georg

diff --git a/src/BufferParams.cpp b/src/BufferParams.cpp
index 7a641e0..a783dbc 100644
--- a/src/BufferParams.cpp
+++ b/src/BufferParams.cpp
@@ -22,6 +22,7 @@
 #include "BranchList.h"
 #include "Buffer.h"
 #include "buffer_funcs.h"
+#include "BufferEncodings.h"
 #include "Bullet.h"
 #include "Color.h"
 #include "ColorSet.h"
@@ -58,6 +59,8 @@
 #include "support/Translator.h"
 #include "support/lstrings.h"
 
+#include <boost/tuple/tuple.hpp>
+
 #include <algorithm>
 #include <sstream>
 
@@ -1367,6 +1370,10 @@ void BufferParams::validate(LaTeXFeatures & features) const
 
 	if (!language->requires().empty())
 		features.require(language->requires());
+
+	docstring const user_preamble = from_utf8(preamble);
+	for (size_t i = 0; i < user_preamble.size(); ++i)
+		BufferEncodings::validate(user_preamble[i], features);
 }
 
 
@@ -1940,29 +1947,14 @@ bool BufferParams::writeLaTeX(otexstream & os, LaTeXFeatures & features,
 
 		// Check if the user preamble contains uncodable glyphs
 		docstring const u_preamble = from_utf8(preamble);
-		odocstringstream user_preamble;
+		docstring user_preamble;
 		docstring uncodable_glyphs;
 		Encoding const * const enc = features.runparams().encoding;
-		if (enc) {
-			for (size_t n = 0; n < u_preamble.size(); ++n) {
-				char_type c = u_preamble[n];
-				if (!enc->encodable(c)) {
-					docstring const glyph(1, c);
-					LYXERR0("Uncodable character '"
-						<< glyph
-						<< "' in user preamble!");
-					uncodable_glyphs += glyph;
-					if (features.runparams().dryrun) {
-						user_preamble << "<" << _("LyX Warning: ")
-						   << _("uncodable character") << " '";
-						user_preamble.put(c);
-						user_preamble << "'>";
-					}
-				} else
-					user_preamble.put(c);
-			}
-		} else
-			user_preamble << u_preamble;
+		if (enc)
+			boost::tie(user_preamble, uncodable_glyphs) =
+				enc->latexString(u_preamble, features.runparams().dryrun);
+		else
+			user_preamble = u_preamble;
 
 		// On BUFFER_VIEW|UPDATE, warn user if we found uncodable glyphs
 		if (!features.runparams().dryrun && !uncodable_glyphs.empty()) {
@@ -1980,7 +1972,7 @@ bool BufferParams::writeLaTeX(otexstream & os, LaTeXFeatures & features,
 "preamble code accordingly."),
   uncodable_glyphs));
 		}
-		atlyxpreamble += user_preamble.str() + '\n';
+		atlyxpreamble += user_preamble + '\n';
 	}
 
 	// footmisc must be loaded after setspace
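
The shape of the refactoring in the patch can be sketched in isolation: the per-character loop moves into one function that returns both the converted text and the uncodable characters, and the caller unpacks the pair with `tie`. This is an illustration under assumptions, not the LyX implementation: the free function `latexString` only borrows its name from the call in the patch, its body (including the ASCII-only `encodable`) is made up for the example, `buildUserPreamble` is hypothetical, and `std::tie` is used in place of `boost::tie`.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <utility>

// Assumption: a 7-bit ASCII target encoding, standing in for Encoding::encodable().
bool encodable(char32_t c) { return c < 0x80; }

// Stand-in for the call used in the patch. Returns the converted text plus
// the characters that could not be encoded; in a dry run the uncodable
// characters are kept in the output, wrapped in a visible warning marker.
std::pair<std::u32string, std::u32string>
latexString(std::u32string const & input, bool dryrun)
{
	std::u32string result;
	std::u32string uncodable;
	for (char32_t c : input) {
		if (encodable(c)) {
			result += c;
			continue;
		}
		uncodable += c;
		if (dryrun) {
			result += U"<LyX Warning: uncodable character '";
			result += c;
			result += U"'>";
		}
	}
	// Outside a dry run every input character lands in exactly one output.
	assert(dryrun || result.size() + uncodable.size() == input.size());
	return {result, uncodable};
}

// Caller side: unpack both results at once, as the patch does with boost::tie.
std::u32string buildUserPreamble(std::u32string const & raw, bool dryrun,
                                 std::u32string & uncodable_glyphs)
{
	std::u32string user_preamble;
	std::tie(user_preamble, uncodable_glyphs) = latexString(raw, dryrun);
	return user_preamble + U'\n';
}
```

Returning both values from one call keeps the warning logic and the output in step, which is what lets the caller drop the stream and the hand-written loop.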



Re: false positive in test for unencodable preamble

2015-07-05 Thread Jürgen Spitzmüller
On Sunday, 5 July 2015, 14:12:55, Georg Baum wrote:
> The attached patch fixes that and even simplifies the code. Does anybody
> know of any reason _not_ to use unicodesymbols replacements for the user
> preamble?

I consider the preamble to be "literate", just like ERT is. Thus, I think no 
auto-replacements should be performed.

Jürgen

> Georg




Re: false positive in test for unencodable preamble

2015-07-05 Thread Jean-Marc Lasgouttes

On 05/07/2015 14:12, Georg Baum wrote:

> Guenter Milde wrote:
>
>> I want a concise comment in the LyX file, in order to explain what the
>> code is about.
>>
>> In the document, I have the option to change the comment into a LyX-note.
>> Fine.
>>
>> In the preamble, there is no way to make a "LyX-only" comment. :-(


If we implemented InPreamble for insets, then one could create an 
ERT-like inset that goes in preamble. Actually, we could even replace 
the current preamble with such insets.


JMarc



Re: false positive in test for unencodable preamble

2015-07-05 Thread Guenter Milde
On 2015-07-05, Jean-Marc Lasgouttes wrote:
> On 05/07/2015 14:12, Georg Baum wrote:
>> Guenter Milde wrote:

>>> I want a concise comment in the LyX file, in order to explain what the
>>> code is about.

>>> In the document, I have the option to change the comment into a LyX-note.
>>> Fine.

>>> In the preamble, there is no way to make a "LyX-only" comment. :-(

> If we implemented InPreamble for insets, then one could create an 
> ERT-like inset that goes in preamble. 

This may be an idea...

> Actually, we could even replace the current preamble with such insets.

but I would not use it to *replace* the current Documents>Settings>preamble.
This is a place for advanced user settings that belong to the other
document settings (fonts, language, encoding, package/modules, ...) and
should not clutter the editor window.

Günter



Re: false positive in test for unencodable preamble

2015-07-05 Thread Scott Kostyshak
On Sun, Jul 5, 2015 at 10:35 AM, Guenter Milde <mi...@users.sf.net> wrote:
> On 2015-07-05, Jean-Marc Lasgouttes wrote:
>
>>>> In the preamble, there is no way to make a "LyX-only" comment. :-(
>
>> If we implemented InPreamble for insets, then one could create an
>> ERT-like inset that goes in preamble.
>
> This may be an idea...
>
>> Actually, we could even replace the current preamble with such insets.
>
> but I would not use it to *replace* the current Documents>Settings>preamble.
> This is a place for advanced user settings that belong to the other
> document settings (fonts, language, encoding, package/modules, ...) and
> should not clutter the editor window.

I really like JMarc's idea, and agree with Günter that it should not
replace the current set up. We could also allow for this inset to
place preamble code *before* (some of) LyX's automatic preamble
export. Many issues in the past have come from users not being able
to get something into the preamble soon enough, such as the order in
which packages are loaded.


Re: false positive in test for unencodable preamble

2015-07-02 Thread Guenter Milde
On 2015-06-30, Georg Baum wrote:
> Guenter Milde wrote:
>> On 2015-06-26, Jürgen Spitzmüller wrote:
>>> 2015-06-26 16:44 GMT+02:00 Guenter Milde <mi...@users.sf.net>:

>>>> Please don't check for unencodable characters in comments.

>>> It's still invalid encoding, since the output file contains invalid
>>> glyphs (no matter if this line is processed by LaTeX or not).

>> I have a different view on this:

>> **Invalid** characters may only occur in utf-8 encoded files,
>> for example in a file generated by LyX with default settings if
>> * the document language defaults to utf8
>> * a second language defaults to an 8-bit encoding:

...

>>   \selectlanguage{ngerman}%
>>   \inputencoding{latin9}%
>>   Gr��e \selectlanguage{mongolian}%
...

>> Here, the German word Grüße contains 2 invalid characters if you want to
>> process/view/edit the whole file as Utf-8.

> This is not invalid. Such a file is not an utf8 file, it is a file with 
> mixed encoding, but each single character is valid. The fact that most 
> editors cannot display such a file correctly is something else, but e.g. 
> emacs can display this file correctly.

OK. However, in 8-bit encoded files, there are no *invalid* characters at all.
There are characters that cannot be encoded correctly which is another problem.



>> Similarly, all text parts in a comment are uncritical if the file is
>> processed by TeX, because comments are not decoded at all.

> This does not matter. If the user enters a comment (remember that this is 
> either in the preamble or in ERT) we must assume that he did that on purpose 
> and wants the comment to be preserved. We should not silently throw away 
> parts of the comment.

> I agree with Jürgen here. Ignoring unencodable characters in comments means 
> that you don't care for the contents of the comment. 

This depends: 

* If the user wants to export the file to LaTeX, yes.

* If the user wants to compile the document or export to PDF/PS/DVI, no.
  As LaTeX "throws away" the comment anyway, there is no need to care about
  missing characters.


> If you don't care, then why don't you omit the characters in the first
> place (or even the whole comment)?

I want a concise comment in the LyX file, in order to explain what the
code is about.

In the document, I have the option to change the comment into a LyX-note.
Fine.

In the preamble, there is no way to make a "LyX-only" comment. :-(

However, as the condition "preamble comment character that cannot be
encoded in the chosen LaTeX encoding and has no LICR in unicodesymbols"
is rare and there are workarounds (select a different encoding, use an
encodable transliteration or description, or ignore the warning and
proceed), I agree now that we don't need to check for "in comment" in the
preamble.
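
For illustration only, since the conclusion above is that such a check is not needed: deciding whether a character sits "in comment" is less trivial than looking for a `%`, because `\%` is a literal percent sign while `\\%` starts a comment again. The sketch below assumes a single source line and ignores verbatim-like environments and catcode changes; `inComment` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns true if position pos of a single LaTeX source line lies inside a
// comment, i.e. after a '%' that is not escaped by a backslash.
// Simplification: verbatim environments and catcode changes are ignored.
bool inComment(std::string const & line, std::size_t pos)
{
	assert(pos <= line.size());
	std::size_t backslashes = 0;  // length of the backslash run just before i
	for (std::size_t i = 0; i < pos; ++i) {
		if (line[i] == '\\') {
			++backslashes;
			continue;
		}
		// An even run of backslashes leaves the '%' unescaped.
		if (line[i] == '%' && backslashes % 2 == 0)
			return true;
		backslashes = 0;
	}
	return false;
}
```

A check built on this would only suppress the warning for characters at positions where `inComment` returns true.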

OTOH, the "unicodesymbols" replacements should be applied in the
LaTeX-preamble, too. Currently, it seems they are applied for comments in
the document, but not for the user-preamble. See 
http://www.lyx.org/trac/raw-attachment/ticket/9607/%C4%82%C4%83-in-preamble-comment-warning.lyx

Günter









Re: false positive in test for unencodable preamble

2015-07-01 Thread Jürgen Spitzmüller
2015-06-29 13:26 GMT+02:00 Guenter Milde <mi...@users.sf.net>:

> Similarily, all text parts in a comment are uncritical if the file is
> processed by TeX, because comments are not decoded at all.
>

But LyX cannot convert this to the target encoding and thus fails. Silently
removing the glyphs without giving a warning is no option, IMHO. We cannot
remove data without asking.

See attached file (try to view PDF with LyX 2.1.x).

Jürgen


enc.lyx
Description: application/lyx


Re: false positive in test for unencodable preamble

2015-06-30 Thread Guenter Milde
On 2015-06-26, Jürgen Spitzmüller wrote:
> 2015-06-26 16:44 GMT+02:00 Guenter Milde <mi...@users.sf.net>:

>> Please don't check for unencodable characters in comments.

> It's still invalid encoding, since the output file contains invalid glyphs
> (no matter if this line is processed by LaTeX or not).

I have a different view on this:

**Invalid** characters may only occur in utf-8 encoded files,
for example in a file generated by LyX with default settings if
* the document language defaults to utf8
* a second language defaults to an 8-bit encoding:

  %% LyX 2.1.3 created this file.  For more info, see http://www.lyx.org/.
  %% Do not edit unless you really know what you are doing.
  \documentclass[a4paper]{article}
  \usepackage{lmodern}
  \renewcommand{\sfdefault}{lmss}
  \renewcommand{\ttdefault}{lmtt}
  \usepackage[T1]{fontenc}
  \usepackage[latin9,utf8]{inputenc}

  \makeatletter

  %% LyX specific LaTeX commands.
  \special{papersize=\the\paperwidth,\the\paperheight}


  \makeatother

  \usepackage[ngerman,mongolian]{babel}
  \begin{document}
  Test

  \selectlanguage{ngerman}%
  \inputencoding{latin9}%
  Gr��e \selectlanguage{mongolian}%

  \end{document}

Here, the German word Grüße contains 2 invalid characters if you want to
process/view/edit the whole file as Utf-8.

With TeX, there is no problem, as the German text part is read using a
different encoding.
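
The point can be checked mechanically. The sketch below is a structural UTF-8 test, written for this example and not taken from LyX; it is deliberately simplified and does not reject overlong forms, surrogates or code points above U+10FFFF. `looksLikeUtf8` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Structural UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (10xxxxxx).
bool looksLikeUtf8(std::string const & bytes)
{
	std::size_t i = 0;
	while (i < bytes.size()) {
		unsigned char const b = static_cast<unsigned char>(bytes[i]);
		std::size_t extra = 0;
		if (b < 0x80)
			extra = 0;                      // ASCII
		else if ((b & 0xE0) == 0xC0)
			extra = 1;                      // 110xxxxx
		else if ((b & 0xF0) == 0xE0)
			extra = 2;                      // 1110xxxx
		else if ((b & 0xF8) == 0xF0)
			extra = 3;                      // 11110xxx
		else
			return false;                   // stray continuation or invalid lead byte
		if (extra > bytes.size() - i - 1)
			return false;                   // sequence truncated at end of input
		for (std::size_t k = 1; k <= extra; ++k)
			if ((static_cast<unsigned char>(bytes[i + k]) & 0xC0) != 0x80)
				return false;
		i += extra + 1;
	}
	assert(i == bytes.size());
	return true;
}
```

The Latin-9 bytes of the word, `G r 0xFC 0xDF e`, fail this test because 0xFC is not a valid UTF-8 lead byte, while the UTF-8 bytes `G r 0xC3 0xBC 0xC3 0x9F e` pass.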

Similarly, all text parts in a comment are uncritical if the file is
processed by TeX, because comments are not decoded at all.

Just like in a code source file, comments may contain characters that are
not valid in the code.


> Change the comment and use valid glyphs.

This would mean I have to be very verbose and write 0218 LATIN CAPITAL
LETTER S WITH COMMA BELOW (or some other unambiguous ASCII representation
of the to-be-tested letter Ș) in the LaTeX preamble of my LyX file to
be able to use LyX unicodesymbols export conversions.

In my view, LyX is overly restrictive here and the new feature stands in the
way.

Günter


PS: A glyph is a graphical representation of a character.
LyX files, TeX files, C++ files and E-Mails only contain characters
(in various character encodings), not glyphs. Glyphs are defined in
font files (otf, ttf, metafont, ...). Unicode defines code-points for
characters and (as a non-binding information) shows sample glyphs for
printable characters.




Re: false positive in test for unencodable preamble

2015-06-30 Thread Georg Baum
Guenter Milde wrote:

> On 2015-06-26, Jürgen Spitzmüller wrote:
>> 2015-06-26 16:44 GMT+02:00 Guenter Milde <mi...@users.sf.net>:
>> 
>>> Please don't check for unencodable characters in comments.
>> 
>> It's still invalid encoding, since the output file contains invalid
>> glyphs (no matter if this line is processed by LaTeX or not).
> 
> I have a different view on this:
> 
> **Invalid** characters may only occur in utf-8 encoded files,
> for example in a file generated by LyX with default settings if
> * the document language defaults to utf8
> * a second language defaults to an 8-bit encoding:
> 
>   %% LyX 2.1.3 created this file.  For more info, see http://www.lyx.org/.
>   %% Do not edit unless you really know what you are doing.
>   \documentclass[a4paper]{article}
>   \usepackage{lmodern}
>   \renewcommand{\sfdefault}{lmss}
>   \renewcommand{\ttdefault}{lmtt}
>   \usepackage[T1]{fontenc}
>   \usepackage[latin9,utf8]{inputenc}
> 
>   \makeatletter
> 
>   %% LyX specific LaTeX commands.
>   \special{papersize=\the\paperwidth,\the\paperheight}
> 
> 
>   \makeatother
> 
>   \usepackage[ngerman,mongolian]{babel}
>   \begin{document}
>   Test
> 
>   \selectlanguage{ngerman}%
>   \inputencoding{latin9}%
>   Gr��e \selectlanguage{mongolian}%
> 
>   \end{document}
> 
> Here, the German word Grüße contains 2 invalid characters if you want to
> process/view/edit the whole file as Utf-8.

This is not invalid. Such a file is not a UTF-8 file; it is a file with
mixed encodings, but each single character is valid. The fact that most
editors cannot display such a file correctly is a separate matter; Emacs,
for example, can display this file correctly.

> With TeX, there is no problem, as the German text part is read using a
> different encoding.

Yes.
 
> Similarly, all text parts in a comment are uncritical if the file is
> processed by TeX, because comments are not decoded at all.

This does not matter. If the user enters a comment (remember that this is 
either in the preamble or in ERT) we must assume that he did that on purpose 
and wants the comment to be preserved. We should not silently throw away 
parts of the comment.
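
To make the distinction under discussion concrete, here is a minimal Python
sketch of a check that reports unencodable characters and records whether each
one sits inside a TeX comment. It is not how LyX implements its test; the
function name find_unencodable, the regular expression, and the sample
preamble are all assumptions made for illustration.

```python
import re

# Simplification: '%' starts a comment unless preceded by a backslash.
# Cases such as "\\%" or verbatim material are not handled.
COMMENT_START = re.compile(r"(?<!\\)%")

def find_unencodable(preamble: str, encoding: str = "ascii"):
    """Yield (line_number, character, in_comment) for every character
    that cannot be represented in the given encoding."""
    for lineno, line in enumerate(preamble.splitlines(), start=1):
        match = COMMENT_START.search(line)
        comment_at = match.start() if match else len(line)
        for col, ch in enumerate(line):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                yield lineno, ch, col >= comment_at

# Hypothetical preamble: one comment with the Romanian letters, one command.
preamble = (
    "% Romanian letters \u0218 and \u021a should be used with \\textcommaaccent\n"
    "\\newcommand{\\foo}{bar}\n"
)
hits = list(find_unencodable(preamble))
```

Nothing is dropped or rewritten here; the caller still has to decide whether a
hit inside a comment is an error, a warning, or nothing at all, which is the
policy question of this thread.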

> This would mean I have to be very verbose and write 0218 LATIN CAPITAL
> LETTER S WITH COMMA BELOW (or some other unambiguous ASCII representation
> of the to-be-tested letter Ș) in the LaTeX preamble of my LyX file to
> be able to use LyX "unicodesymbols" export conversions.

You could also use an ASCII approximation (e.g. "S,").

> In my view, LyX is overly restrictive here and the new feature stands in
> the way.

I agree with Jürgen here. Ignoring unencodable characters in comments means
that you don't care about the contents of the comment. If you don't care, then
why don't you omit the characters in the first place (or even the whole
comment)?


Georg




Re: false positive in test for unencodable preamble

2015-06-26 Thread Jürgen Spitzmüller
2015-06-26 16:44 GMT+02:00 Guenter Milde mi...@users.sf.net:

> Dear LyX developers,
>
> trying the testfile for the comma-accent feature, I came across an
> annoyance:
>
>   With every compilation attempt, a pop-up window tells me that there
>   are unencodable characters in the preamble.
>
> However, the "offending" characters are in a comment:
>
>   % Romanian letters Ș and Ț should be used with \textcommaaccent to
>   % distinguish from the corresponding letters with cedilla.
>
> so there is actually no problem and
>
> * I don't want to change the comment
> * I want to use ASCII encoding (to test the "unicodesymbols").
>
> Please don't check for unencodable characters in comments.


It's still invalid encoding, since the output file contains invalid glyphs
(no matter if this line is processed by LaTeX or not).

Change the comment and use valid glyphs.

Jürgen





> Günter



