Re: LyX reading not UTF-8 encoded file.

2007-12-22 Thread Paul A. Rubin

Paul Schwartz wrote:

> Sorry, I am still on 1.5.1 and I do not know if this has been corrected
> in 1.5.2 or 1.5.3.
> After scanning a document, running it through OCR, and saving the result
> with either WordPad or Notepad (making sure that in WordPad it is saved as
> Unicode UTF-8), I imported it into a LyX document and got the following
> warning window:
> Quote
> LyX : Reading not UTF-8 encoded file
>
> The file is not UTF-8 encoded.
> It will be read as a local 8 bit-encoded. If this does not give the correct
> result then please change the encoding of the file to UTF-8 with a program
> other than LyX.
> Unquote
>
> I re-encoded the file with WordPad without success, getting the same warning.
> However, after closing the warning window, the text is properly imported.
> Because I never had this problem with former LyX versions, I suppose
> that it might be a false alarm?
> Any clue?
>
> Thanks
>
> Paul




I had a file once that Notepad++ indicated was in utf-8, but it 
contained one character that was not, and that was enough to cause LyX 
problems.


Since you mentioned Notepad and Wordpad, I suspect you're on Windows(?). 
 If you don't already have the iconv utility, you might want to 
download it (there's a free Windows port that's part of the GnuWin32 
project on SourceForge, at 
http://gnuwin32.sourceforge.net/packages/libiconv.htm).   Cygwin also 
should contain it.  Once it's installed, try running


  iconv -c -t utf-8 yourfile > newfile

in a DOS shell.  This should convert the original file (yourfile) to a 
utf-8 version (newfile), omitting any characters that are not valid in 
utf-8.  You can then compare newfile to yourfile to see what, if 
anything, was omitted.  (If you want to tell whether yourfile is in fact 
valid utf-8, run iconv without the -c flag.  It will fail with an error 
message if it encounters any invalid characters.)
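
For anyone without iconv at hand, the same two checks can be roughed out in Python. This is only a sketch: the function names are made up for illustration, and it assumes the file is small enough to read into memory in one go. One difference worth noting is that iconv, when not given -f, takes the input encoding from the current locale, whereas the sketch below always treats the input as UTF-8.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, offending_bytes) for the first invalid UTF-8
    sequence in data, or None if the data decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]
    return None


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop bytes that cannot be decoded as UTF-8 (similar in spirit
    to running iconv with the -c flag)."""
    return data.decode("utf-8", errors="ignore").encode("utf-8")


# Hypothetical sample: a lone 0xE9 byte (an e-acute in Latin-1) inside
# otherwise plain ASCII text is not valid UTF-8.
sample = b"abc\xe9def"
problem = find_invalid_utf8(sample)
cleaned = strip_invalid_utf8(sample)
```

To apply this to a real file, read it in binary mode (open(path, "rb")) and pass the bytes to find_invalid_utf8; the reported offset shows where to look in a hex editor.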


/Paul



Re: LyX reading not UTF-8 encoded file.

2007-12-22 Thread Paul Schwartz

Thanks Paul, you are always pretty helpful.

However, before sending that question, I rechecked with a file which I 
used formerly and on which a former LyX version did not issue a warning, and 
I got the same result. That is why it surprised me.
Besides, after clearing the warning everything appeared perfectly correct 
inside the LyX file, OCR errors excepted.
I confirm that I am on XP Pro (do not blame me, please! I have been at it 
since the 80's and am still far from being a pro.)

Paul




Re: LyX reading not UTF-8 encoded file.

2007-12-22 Thread Paul A. Rubin

Paul Schwartz wrote:

> Thanks Paul, you are always pretty helpful.
>
> However, before sending that question, I rechecked with a file which I
> used formerly and on which a former LyX version did not issue a warning, and
> I got the same result. That is why it surprised me.


We'd probably need a developer for a definitive answer to this, but I'm 
pretty sure that LyX uses iconv (or the library version, libiconv) to 
handle encodings, and it's possible that a newer (or different) release 
of libiconv was used in the compilation of LyX 1.5.1.  That might 
account for the worked-with-a-former-version aspect.


> Besides, after clearing the warning everything appeared perfectly correct
> inside the LyX file, OCR errors excepted.


It's possible, of course, that a character was dropped or otherwise 
munged, but it was part of an OCR error and so you're not noticing it.


As I (vaguely) understand this, Windows programs tend to start a UTF-8 
file with a three byte code (EF BB BF) indicating it's UTF-8.  Wikipedia 
says Notepad does this, and that the standard neither requires nor recommends it.  I 
mention this because, in screwing around with iconv and Notepad++ (which 
can do some encoding changes), I once generated an ANSI file from a 
UTF-8 file.  The UTF-8 file contained at least one non-ASCII character 
that came through intact in the ANSI file.  In fact, doing a byte by 
byte comparison, the only difference between the UTF-8 and ANSI files 
was that the latter lacked that initial three byte code.  Nonetheless, 
if I told iconv that the ANSI file was UTF-8, it balked at converting 
the one odd character.  So it's possible that either the presence or 
absence of a header has the version of libiconv used with LyX 1.5.1 
burping, even though the rest of the file comes through ok.
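
A quick way to see whether a given file carries that three-byte prefix, and to drop it if so, is sketched below in Python. The helper names are invented for illustration; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: the kind of content Notepad writes when saving as UTF-8.
with_prefix = codecs.BOM_UTF8 + "hello".encode("utf-8")
without_prefix = strip_utf8_bom(with_prefix)

# The "utf-8-sig" codec does the same thing while decoding: it skips the
# signature when present and otherwise behaves like plain "utf-8".
text = with_prefix.decode("utf-8-sig")
```

Stripping the prefix before importing should at least rule it out as the trigger for the warning, though whether it is the actual cause with 1.5.1 is not established here.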


I don't have 1.5.1 installed anymore, but I ran a little experiment (two 
versions of a file containing one Arabic character and a bunch of 
English text, one with the three byte prefix and one without).  LyX 
1.5.2 and LyX 1.5.3 behaved identically.  Both imported the file (as 
plain text) correctly, and neither threw up a warning.  The only 
difference between file versions was that the three byte prefix, when 
present, was imported as a goofy character (open box), easily deleted. 
The Arabic character came through correctly either way.  I also tried 
LyX 1.4.4, which I still have installed for some reason.  Again, it had 
no complaint with either version, but the Arabic character was imported 
incorrectly (and the prefix was imported as three rather goofy looking 
characters rather than as one).
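
To repeat this kind of experiment, the two test files can be generated rather than produced by hand in an editor. The following Python sketch is one way to do it; the file names, the sample sentence, and the choice of Arabic letter (U+0628) are arbitrary assumptions, not the files used above.

```python
import codecs
import tempfile
from pathlib import Path

# Mostly English text with a single Arabic letter (U+0628, BEH).
TEXT = "Some plain English text with one Arabic letter: \u0628\n"


def write_test_files(directory):
    """Write the same UTF-8 text twice, once without and once with the
    three-byte EF BB BF prefix, and return both paths."""
    directory = Path(directory)
    payload = TEXT.encode("utf-8")
    plain = directory / "no_prefix.txt"
    prefixed = directory / "with_prefix.txt"
    plain.write_bytes(payload)
    prefixed.write_bytes(codecs.BOM_UTF8 + payload)
    return plain, prefixed


# Usage: create the pair in a scratch directory and record their sizes.
with tempfile.TemporaryDirectory() as tmp:
    plain, prefixed = write_test_files(tmp)
    sizes = (plain.stat().st_size, prefixed.stat().st_size)
```

The two files differ only in the leading three bytes, so any difference in how a LyX version imports them can be attributed to the prefix.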


So apparently something in the support for Unicode changed between 1.4.4 
and 1.5.2.  Whether this bears on your experience, I can't say.


> I confirm that I am on XP Pro (do not blame me, please! I have been at it
> since the 80's and am still far from being a pro.)


I'm stuck with Windows myself -- too much investment in Windows-only 
software, plus I work in an environment where students think Micro$oft 
makes the only productivity software in the universe.  So I wouldn't 
dream of pointing fingers.


/Paul



LyX reading not UTF-8 encoded file.

2007-12-21 Thread Paul Schwartz

Sorry, I am still on 1.5.1 and I do not know if this has been corrected
in 1.5.2 or 1.5.3.
After scanning a document, running it through OCR, and saving the result
with either WordPad or Notepad (making sure that in WordPad it is saved as
Unicode UTF-8), I imported it into a LyX document and got the following
warning window:
Quote
LyX : Reading not UTF-8 encoded file

The file is not UTF-8 encoded.
It will be read as a local 8 bit-encoded. If this does not give the correct
result then please change the encoding of the file to UTF-8 with a program
other than LyX.
Unquote

I re-encoded the file with WordPad without success, getting the same warning.
However, after closing the warning window, the text is properly imported.
Because I never had this problem with former LyX versions, I suppose
that it might be a false alarm?
Any clue?

Thanks

Paul





