Re: Removing BOM from UTF-8
Hi,

Gerard Seibert [EMAIL PROTECTED] wrote on 18.02.2006 - 16:57 (-0500 Zulu-Time):

> Benjamin A'Lee wrote:
>> It shouldn't be writing any new files; it prints the filtered text
>> to stdout.
>
> OK, then that is the problem. I need it to actually write the file.
> It could either rename the old file and then rewrite it, which would
> be nice, or just overwrite the old file. The BOM is just the first
> three bytes in the file. I am assuming that it would not be removing
> anything else in the file.

Use a for-loop in your shell:

    # bash
    # cd to/your/directory
    # for i in *; do
    #   nobom.sh "$i" > "$i.new"
    # done

This takes every file in your directory and processes it with nobom.sh; because the script prints to stdout, the redirect is what writes the result to a new file. Make sure the first line of your Perl script points to the Perl installation on your system. You can use 'which perl' to find its location.

Cheers
Erik

--
J. Erik Heinz
Keyboard-samuraing in process :: All non-mailinglist mail to this email address will be deleted.
OpenBC: https://www.openbc.com/hp/JErik_Heinz
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
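The loop above can also be sketched without depending on the Perl script. This is a minimal sketch, assuming a POSIX shell with `head -c`, `od` and `tail -c` available; the function names `strip_bom` and `strip_all` are made up for illustration and do not come from the thread. It swaps the Perl substitution for a byte comparison, quotes the file name so that names containing spaces survive, and redirects stdout into the `.new` file because the filter itself only prints.

```sh
#!/bin/sh
# strip_bom FILE: print FILE to stdout without a leading UTF-8 BOM.
# (Hypothetical stand-in for the Perl filter discussed in the thread.)
strip_bom() {
    # Hex-dump the first three bytes and drop all whitespace.
    sig=$(head -c 3 < "$1" | od -An -tx1 | tr -d ' \t\n')
    if [ "$sig" = "efbbbf" ]; then
        tail -c +4 < "$1"    # start output at byte 4, skipping the BOM
    else
        cat < "$1"           # no BOM: pass the file through unchanged
    fi
}

# strip_all DIR: for every regular file DIR/name, write DIR/name.new.
strip_all() {
    for i in "$1"/*; do
        [ -f "$i" ] || continue
        case $i in *.new) continue ;; esac   # skip output of an earlier run
        strip_bom "$i" > "$i.new"
    done
}
```

Calling `strip_all .` in the directory of Word exports would leave the originals untouched and put the cleaned copies next to them, which makes it easy to compare before deleting anything.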
Re: Removing BOM from UTF-8
J. Erik Heinz wrote:

> use a for-loop in your shell:
> [...]
> this will take all your files in your directory and proceed each one
> it with nobom.sh, which then will write it to new file.

Thanks! I'll give it a try when I return to work.

--
Gerard
Removing BOM from UTF-8
I have a large number of text files created in MS Word and saved in UTF-8 format. Unfortunately, MS Word adds the BOM to each file. I need to remove the BOM.

Information regarding BOM and UTF-8 can be found here:

http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://www.w3.org/International/questions/qa-utf8-bom

A brief excerpt:

  It has also been suggested to use the UTF-8 encoded BOM (0xEF 0xBB 0xBF) as a signature to mark the beginning of a UTF-8 file. This practice should definitely not be used on POSIX systems for several reasons:

  * On POSIX systems, the locale and not magic file type codes define the encoding of plain text files. Mixing the two concepts would add a lot of complexity and break existing functionality.
  * Adding a UTF-8 signature at the start of a file would interfere with many established conventions such as the kernel looking for “#!” at the beginning of a plaintext executable to locate the appropriate interpreter.
  * Handling BOMs properly would add undesirable complexity even to simple programs like cat or grep that mix contents of several files into one.

It has been suggested that a script could be written to eliminate the BOM from a file(s). My script writing skills suck. I have been unable to locate one using Google, so I was hoping that someone might know where I could either locate such a program, or perhaps give me an idea on how to script one.

Thanks!

--
Gerard Seibert
[EMAIL PROTECTED]

I'm interested in the fact that the less secure a man is, the more likely he is to have extreme prejudice.
Clint Eastwood
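Before removing anything, it can help to confirm which files actually start with the signature bytes 0xEF 0xBB 0xBF quoted in the excerpt. The following is a small sketch under the assumption of a POSIX shell with `head -c` and `od`; `has_bom` is a name invented here, not an existing utility.

```sh
#!/bin/sh
# has_bom FILE: exit status 0 if FILE begins with the bytes EF BB BF,
# non-zero otherwise. ("has_bom" is a hypothetical helper name.)
has_bom() {
    # Dump the first three bytes as hex and strip the whitespace od adds.
    sig=$(head -c 3 < "$1" | od -An -tx1 | tr -d ' \t\n')
    [ "$sig" = "efbbbf" ]
}

# list_bom_files FILE...: print the name of every argument that has a BOM.
list_bom_files() {
    for f in "$@"; do
        if [ -f "$f" ] && has_bom "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Something like `list_bom_files *` would then show which of the Word exports need treatment; files shorter than three bytes simply fail the comparison and are not listed.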
Re: Removing BOM from UTF-8
On Sat, 2006-02-18 at 11:28 -0500, Gerard Seibert wrote:
> It has been suggested that a script could be written to eliminate the
> BOM from a file(s). My script writing skills suck. I have been unable
> to locate one using Google, so I was hoping that someone might know
> where I could either locate such a program, or perhaps give me an idea
> on how to script one.

    #!/usr/bin/perl
    @file = <>;                        # read every input line
    $file[0] =~ s/^\xEF\xBB\xBF//;     # drop a BOM at the start of line 1
    print(@file);

That'll read a file from stdin, remove the BOM from the beginning of the first line if it's present, and print it to stdout.

Hope it helps.

Ben

--
Termisoc Tech Officer: http://termisoc.org/
My Homepage: http://benalee.co.uk/

People demand freedom of speech as compensation for the freedom of thought which they have but seldom use. -- Søren Kierkegaard
Re: Removing BOM from UTF-8
Benjamin A'Lee wrote:
> [...]
> That'll read a file from stdin, remove the BOM from the beginning of
> the first line if it's present, and print it to stdout.

Maybe I am doing something wrong, but it does not appear to be working correctly. I named the file nobom.sh and put it in the same directory as the files I want to convert. I also set the file's permissions to 0755. Typing the program name does nothing; I have to precede it with 'perl'. Even then, it does not appear to work correctly. In the following example, the file is parsed, but not converted:

    perl nobom.sh testfile

Am I doing something incorrectly here?

Thanks!

--
Gerard Seibert
[EMAIL PROTECTED]
PGP: http://www.seibercom.net/sig/gerard.asc
Re: Removing BOM from UTF-8
On Sat, 2006-02-18 at 14:34 -0500, Gerard Seibert wrote:
> Maybe I am doing something wrong, but it does not appear to be working
> correctly. I named the file nobom.sh and put it in the same directory
> as the files I want to convert. I also set the program permission to
> 0755. Typing the program name does nothing; I have to precede it with
> 'perl'. Even then, it does not appear to work correctly. In the
> following example, the file is parsed, but not converted.

Sorry; try changing the first line to

    #!/usr/local/bin/perl

> perl nobom.sh testfile
>
> Am I doing something incorrectly here?

Try:

    cat testfile | nobom.sh

Though the way you describe appears to work here:

    $ cat bom-testfile | hd
    0000  ef bb bf 23 20 42 4f 4d  20 74 65 73 74 20 66 69  |...# BOM test fi|
    0010  6c 65 0a                                          |le.|
    0013
    $ bomkill.pl bom-testfile | hd
    0000  23 20 42 4f 4d 20 74 65  73 74 20 66 69 6c 65 0a  |# BOM test file.|
    0010

Ben
Re: Removing BOM from UTF-8
Benjamin A'Lee wrote:
> Sorry; try changing the first line to
>
>     #!/usr/local/bin/perl
>
> [...]
> Try:
>
>     cat testfile | nobom.sh
>
> Though the way you describe appears to work here:
> [...]

Something appears to be wrong here. First, the file will not run unless I precede it with 'perl'. I have another Perl script in the same directory that runs just fine without any special prefixes. Also, the script does not seem to remove the BOM entity. This is the script as I have it entered:

    #!/usr/local/bin/perl
    use warnings;
    use diagnostics -verbose;
    @file = <>;
    $file[0] =~ s/^\xEF\xBB\xBF//;
    print(@file);

I have the file permissions set to 0755. Is there anything else that could be causing this to fail?

This is the first line of the file I am attempting to fix (well, one of them):

    Subject:

That is what appears when I use pico to view the file.

--
Gerard Seibert
[EMAIL PROTECTED]
PGP: http://www.seibercom.net/sig/gerard.asc
Re: Removing BOM from UTF-8
Gerard Seibert wrote:
> Something appears to be wrong here. First, the file will not run
> unless I precede it with 'perl'. [...] Also, the script does not seem
> to remove the BOM entity.

As I continue to play with this, it has become apparent that the new file is not being written, or at least I cannot locate it. Since I do not know Perl, I have no idea where to look for answers.

--
Gerard Seibert
[EMAIL PROTECTED]
Re: Removing BOM from UTF-8
On Sat, 2006-02-18 at 16:14 -0500, Gerard Seibert wrote:
> As I continue to play with this, it has become apparent that the new
> file is not being written, or at least I cannot locate it. Since I do
> not know perl, I have no idea where to look for answers.

It shouldn't be writing any new files; it prints the filtered text to stdout.

Ben
Re: Removing BOM from UTF-8
I use this to add a BOM:

http://search.cpan.org/~lyokato/UTF8BOM-1.01/lib/UTF8BOM.pm

You shouldn't be so fixated on eliminating BOMs; it's quite a nice concept. It causes less trouble than you think.
Re: Removing BOM from UTF-8
Benjamin A'Lee wrote:
> It shouldn't be writing any new files; it prints the filtered text to
> stdout.

OK, then that is the problem. I need it to actually write the file. It could either rename the old file and then rewrite it, which would be nice, or just overwrite the old file. The BOM is just the first three bytes in the file. I am assuming that it would not be removing anything else in the file.

--
Gerard Seibert
[EMAIL PROTECTED]
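The behaviour asked for here (keep the old file under another name, then rewrite the file without its first three bytes) might look roughly like the sketch below. It assumes a POSIX shell with `head -c`, `od`, `tail -c` and `cp -p`; the function name `strip_bom_inplace` and the `.orig` suffix are choices made for this example, not something proposed in the thread. Only files that really begin with EF BB BF are touched, so nothing else in a file is removed.

```sh
#!/bin/sh
# strip_bom_inplace FILE...: for each FILE that starts with a UTF-8 BOM,
# save the original as FILE.orig and rewrite FILE without the BOM.
# (Hypothetical helper; the .orig suffix is an arbitrary choice.)
strip_bom_inplace() {
    for f in "$@"; do
        [ -f "$f" ] || continue
        sig=$(head -c 3 < "$f" | od -An -tx1 | tr -d ' \t\n')
        [ "$sig" = "efbbbf" ] || continue    # no BOM: leave the file alone
        # Copy first, so the original bytes exist before FILE is truncated.
        cp -p "$f" "$f.orig" || return 1
        # Rewrite FILE from the copy, starting at byte 4.
        tail -c +4 < "$f.orig" > "$f"
    done
}
```

Using `cp -p` followed by a rewrite, rather than `mv`, keeps the permissions and ownership of the original file on the cleaned version. Once the results have been checked, the `.orig` copies can be deleted.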