Re: LyX reading not UTF-8 encoded file.
Paul Schwartz wrote:
> Sorry, I am still on 1.5.1, and I do not know whether this has been fixed in 1.5.2 or 1.5.3.
> After scanning a document, running it through OCR, and saving the result with either WordPad or Notepad (making sure that in WordPad it is saved as Unicode UTF-8), I imported it into a LyX document and got the following warning window:
> Quote
> LyX : Reading not UTF-8 encoded file
>
> The file is not UTF-8 encoded.
> It will be read as a local 8 bit-encoded. If this does not give the correct result then please change the encoding of the file to UTF-8 with a program other than LyX.
> Unquote
>
> I re-encoded it with WordPad again, without success: I got the same warning. However, when I close the warning window, the text is imported properly. Since I never had this problem with earlier LyX versions, I suppose it might be a false alarm?
> Any clue?
>
> Thanks
>
> Paul

I had a file once that Notepad++ indicated was in UTF-8, but it contained one character that was not, and that was enough to cause LyX problems.

Since you mentioned Notepad and WordPad, I suspect you're on Windows(?). If you don't already have the iconv utility, you might want to download it (there's a free Windows port that's part of the GnuWin32 project on SourceForge, at http://gnuwin32.sourceforge.net/packages/libiconv.htm). Cygwin should also include it. Once it's installed, try running

iconv -c -f utf-8 -t utf-8 yourfile > newfile

in a DOS shell. This should copy the original file (yourfile) to newfile, reading it as UTF-8 and omitting any characters that are not valid UTF-8. (The -f utf-8 matters: without it, iconv assumes the input is in the current locale's encoding and will not check it as UTF-8.) You can then compare newfile to yourfile to see what, if anything, was omitted. (If you want to tell whether yourfile is in fact valid UTF-8, run iconv without the -c flag. It will fail with an error message if it encounters any invalid characters.)

/Paul
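For anyone without iconv at hand, the same two checks can be approximated in Python. This is a minimal sketch, not part of LyX or iconv: the function names `invalid_utf8_spans`, `drop_invalid_utf8` and `report` are hypothetical, and Python's own UTF-8 decoder stands in for libiconv. `invalid_utf8_spans` lists every invalid byte sequence with its offset (iconv without -c stops at the first one), and `drop_invalid_utf8` discards them, roughly like -c.

```python
import sys
from typing import List, Tuple


def invalid_utf8_spans(data: bytes) -> List[Tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for every invalid UTF-8 sequence."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            start, end = pos + err.start, pos + err.end
            spans.append((start, data[start:end]))
            pos = end  # resume right after the bad bytes, as the decoder does
    return spans


def drop_invalid_utf8(data: bytes) -> str:
    """Decode as UTF-8, silently skipping invalid bytes (roughly iconv -c)."""
    return data.decode("utf-8", errors="ignore")


def report(path: str) -> int:
    """Print a short verdict for one file; return 1 if it is not valid UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    spans = invalid_utf8_spans(data)
    if not spans:
        print(f"{path}: valid UTF-8")
        return 0
    for offset, bad in spans:
        print(f"{path}: invalid UTF-8 at byte {offset}: {bad.hex()}")
    return 1


if __name__ == "__main__" and len(sys.argv) > 1:
    # Exit status 1 if any file named on the command line is not valid UTF-8.
    sys.exit(max(report(p) for p in sys.argv[1:]))
```

Saved as, say, checkutf8.py (the name is arbitrary), it would be run as `python checkutf8.py yourfile`. Listing byte offsets saves having to diff yourfile against newfile by eye.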
Re: LyX reading not UTF-8 encoded file.
Thanks Paul, you are always very helpful. However, before sending that question, I rechecked with a file I had used before, on which an earlier LyX version did not issue a warning, and I got the same result. That is why it surprised me. Besides, after clearing the warning, everything appeared perfectly correct in the LyX file, OCR errors excepted.

I confirm that I am on XP Pro (please do not blame me! I've been at it since the 80s, and I am far from being a pro.)

Paul
Re: LyX reading not UTF-8 encoded file.
Paul Schwartz wrote:
> Thanks Paul, you are always very helpful. However, before sending that question, I rechecked with a file I had used before, on which an earlier LyX version did not issue a warning, and I got the same result. That is why it surprised me.

We'd probably need a developer for a definitive answer to this, but I'm pretty sure that LyX uses iconv (or rather the library version, libiconv) to handle encodings, and it's possible that a newer (or different) release of libiconv was used to compile LyX 1.5.1. That might account for the "it worked with an earlier version" aspect.

> Besides, after clearing the warning, everything appeared perfectly correct in the LyX file, OCR errors excepted.

It's possible, of course, that a character was dropped or otherwise munged, but that it was part of an OCR error, so you're not noticing it.

As I (vaguely) understand it, Windows programs tend to start a UTF-8 file with a three-byte code (EF BB BF, the byte order mark) indicating that it's UTF-8. Wikipedia says Notepad does this, and that the Unicode standard does not require it. I mention this because, while playing around with iconv and Notepad++ (which can do some encoding changes), I once generated an ANSI file from a UTF-8 file. The UTF-8 file contained at least one non-ASCII character that came through intact in the ANSI file. In fact, in a byte-by-byte comparison, the only difference between the UTF-8 and ANSI files was that the latter lacked that initial three-byte code. Nonetheless, if I told iconv that the ANSI file was UTF-8, it balked at converting the one odd character. So it's possible that the presence or absence of that header is what makes the version of libiconv used with LyX 1.5.1 burp, even though the rest of the file comes through OK.

I don't have 1.5.1 installed anymore, but I ran a little experiment with two versions of a file containing one Arabic character and a bunch of English text, one with the three-byte prefix and one without. LyX 1.5.2 and LyX 1.5.3 behaved identically: both imported the file (as plain text) correctly, and neither threw up a warning. The only difference between the file versions was that the three-byte prefix, when present, was imported as a goofy character (an open box), easily deleted. The Arabic character came through correctly either way.

I also tried LyX 1.4.4, which I still have installed for some reason. Again, it had no complaint with either version, but the Arabic character was imported incorrectly (and the prefix was imported as three rather goofy-looking characters rather than as one). So apparently something in the Unicode support changed between 1.4.4 and 1.5.2. Whether this bears on your experience, I can't say.

> I confirm that I am on XP Pro (please do not blame me! I've been at it since the 80s, and I am far from being a pro.)

I'm stuck with Windows myself: too much investment in Windows-only software, plus I work in an environment where students think Micro$oft makes the only productivity software in the universe. So I wouldn't dream of pointing fingers.

/Paul
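The byte-level side of that experiment can be reproduced in a few lines of Python. This is a hedged sketch: the helper names are hypothetical, the Arabic letter U+0628 and the sample sentence stand in for the actual test file, and cp1252 is assumed as the "local 8-bit" Windows encoding. It shows that the prefix EF BB BF is itself valid UTF-8 and decodes to one invisible character (U+FEFF), whereas reading the same bytes as cp1252 yields three characters, and the two-byte Arabic letter becomes two Latin characters. That would fit the one-box versus three-character difference between 1.5.x and 1.4.4, but it is an inference, not something confirmed from LyX's code.

```python
import codecs

UTF8_BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf", the three-byte prefix Notepad writes


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(UTF8_BOM)


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM, leaving everything else untouched."""
    return data[len(UTF8_BOM):] if has_utf8_bom(data) else data


# Two versions of the same text: one Arabic letter (U+0628) among English words.
text = "Arabic letter \u0628 in English text"
without_bom = text.encode("utf-8")
with_bom = UTF8_BOM + without_bom

# Decoded as plain UTF-8, the prefix survives as one invisible character...
print(repr(with_bom.decode("utf-8")[0]))           # '\ufeff'
# ...while Python's "utf-8-sig" codec drops it.
print(with_bom.decode("utf-8-sig") == text)        # True
# Read as an 8-bit code page (cp1252 assumed), the prefix becomes three
# characters and the Arabic letter's two UTF-8 bytes become two characters.
print(with_bom.decode("cp1252")[:3])               # ï»¿
print("\u0628".encode("utf-8").decode("cp1252"))   # Ø¨
```

Stripping the prefix (with `strip_utf8_bom`, or by saving without a BOM in an editor that offers the option) is a cheap way to test whether the BOM is what triggers the warning.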
LyX reading not UTF-8 encoded file.
Sorry, I am still on 1.5.1, and I do not know whether this has been fixed in 1.5.2 or 1.5.3.
After scanning a document, running it through OCR, and saving the result with either WordPad or Notepad (making sure that in WordPad it is saved as Unicode UTF-8), I imported it into a LyX document and got the following warning window:
Quote
LyX : Reading not UTF-8 encoded file

The file is not UTF-8 encoded.
It will be read as a local 8 bit-encoded. If this does not give the correct result then please change the encoding of the file to UTF-8 with a program other than LyX.
Unquote

I re-encoded it with WordPad again, without success: I got the same warning. However, when I close the warning window, the text is imported properly. Since I never had this problem with earlier LyX versions, I suppose it might be a false alarm?
Any clue?

Thanks

Paul
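The warning text describes a fallback: try UTF-8 first, and if that fails, read the file in the local 8-bit encoding. The Python sketch below imitates that logic to show why the import can still look right. It is not LyX's code; `read_text_with_fallback` is a hypothetical name, and the local encoding defaults to whatever `locale.getpreferredencoding(False)` reports (typically cp1252 on a Western European Windows XP system). Because ASCII bytes mean the same thing in UTF-8 and in cp1252, a mostly English file with a single stray byte decodes almost identically either way, so the warning can appear even though the imported text looks fine.

```python
import locale
from typing import Optional, Tuple


def read_text_with_fallback(data: bytes,
                            local_encoding: Optional[str] = None) -> Tuple[str, bool]:
    """Decode data as UTF-8; if that fails, fall back to an 8-bit encoding.

    Returns (text, used_fallback). used_fallback=True is the case in which
    a warning like LyX's would be shown.
    """
    try:
        return data.decode("utf-8"), False
    except UnicodeDecodeError:
        encoding = local_encoding or locale.getpreferredencoding(False)
        # errors="replace" so that bytes undefined in the code page do not raise
        return data.decode(encoding, errors="replace"), True


if __name__ == "__main__":
    # One stray cp1252 byte (0xE9, "e acute") in otherwise plain English text.
    sample = b"Scanned page, OCR text, caf\xe9 au lait."
    text, warned = read_text_with_fallback(sample, "cp1252")
    print(warned)  # True
    print(text)    # Scanned page, OCR text, café au lait.
```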