removing BOM prepended by codecs?

2013-09-25 Thread J. Bagg
So it is just a random sequence of junk. It will be a matter of finding the real start of the record (in this case a %) and throwing the junk away. I was misled by the note in the codecs class that BOMs were being prepended. Should have looked more carefully. Mea culpa.
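A minimal sketch of the "find the real start of the record and throw the junk away" approach described above. The function name, the sample bytes, and the record contents are made up for illustration; only the % start marker comes from the message.

```python
def strip_leading_junk(data: bytes, marker: bytes = b"%") -> bytes:
    """Return data starting at the first occurrence of marker.

    If the marker is absent, the data is returned unchanged so that
    nothing is silently discarded.
    """
    start = data.find(marker)
    return data if start == -1 else data[start:]


# Hypothetical input: three junk bytes followed by a record starting with %.
raw = b"\x00\xfe\x13%A Smith, J.\n%T A title\n"
cleaned = strip_leading_junk(raw)
```

Working on bytes rather than decoded text avoids a decode error if the junk is not valid UTF-8.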

Re: removing BOM prepended by codecs?

2013-09-25 Thread Dave Angel
On 25/9/2013 06:38, J. Bagg wrote:
> So it is just a random sequence of junk. It will be a matter of finding the real start of the record (in this case a %) and throwing the junk away.
Please join the list. Your present habit of starting a new thread for each of your messages is getting old.

removing BOM prepended by codecs?

2013-09-24 Thread J. Bagg
I'm having trouble with the BOM that is now prepended to codecs files. The files have to be read by java servlets which expect a clean file without any BOM. Is there a way to stop the BOM being written? It is seriously messing up my work as the servlets do not expect it to be there. I could

Re: removing BOM prepended by codecs?

2013-09-24 Thread Steven D'Aprano
On Tue, 24 Sep 2013 10:42:22 +0100, J. Bagg wrote:
> I'm having trouble with the BOM that is now prepended to codecs files. The files have to be read by java servlets which expect a clean file without any BOM. Is there a way to stop the BOM being written?
Of course there is :-) but first we

Re: removing BOM prepended by codecs?

2013-09-24 Thread Peter Otten
J. Bagg wrote:
> I'm having trouble with the BOM that is now prepended to codecs files. The files have to be read by java servlets which expect a clean file without any BOM. Is there a way to stop the BOM being written?
I think if you specify the byte order explicitly with UTF-16-LE or
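The snippet above is cut off, but the effect of naming the byte order explicitly can be sketched with the standard codecs. This is an illustration, not a quote from the reply: the "utf-8" and "utf-8-sig" comparison is added here as an assumption about what is relevant to the question.

```python
import codecs

text = "abc"

# Generic "utf-16": the codec picks a byte order and prepends a BOM.
utf16 = text.encode("utf-16")
# Explicit byte order: no BOM is written.
utf16_le = text.encode("utf-16-le")

# Plain "utf-8" never writes a BOM; "utf-8-sig" always does.
utf8 = text.encode("utf-8")
utf8_sig = text.encode("utf-8-sig")

has_bom_utf16 = utf16.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))
has_bom_utf8 = utf8.startswith(codecs.BOM_UTF8)
```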

removing BOM prepended by codecs?

2013-09-24 Thread J. Bagg
I'm using: outputfile = codecs.open (fn, 'w+', 'utf-8', errors='strict') to write as I know that the files are unicode compliant. I run the raw files that are delivered through a Python script to check the unicode and report problem characters which are then edited. The files use a whole
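The checking script mentioned above is not shown in the thread. As a rough sketch of what such a check could look like, the hypothetical helper below reports the offset of every byte that cannot be decoded as strict UTF-8; the function name and sample data are assumptions.

```python
def find_invalid_utf8(data: bytes):
    """Return a list of (offset, offending_bytes) for invalid UTF-8 in data."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first invalid sequence.
            data[pos:].decode("utf-8", errors="strict")
            break
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice being decoded.
            problems.append((pos + exc.start, data[pos + exc.start:pos + exc.end]))
            pos += exc.end
    return problems


# Hypothetical sample: a stray 0xFF and a lead byte with no continuation byte.
sample = b"ok \xff more \xc3( end"
report = find_invalid_utf8(sample)
```

Re-slicing after each error is simple but quadratic in the worst case, which is acceptable for a one-off check and less so for very large files.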

Re: removing BOM prepended by codecs?

2013-09-24 Thread Tim Golden
On 24/09/2013 14:01, J. Bagg wrote:
> I'm using: outputfile = codecs.open (fn, 'w+', 'utf-8', errors='strict')
Well for the life of me I can't make that produce a BOM on 2.7 or 3.4. In other words:
    import codecs
    with codecs.open("temp.txt", "w+", "utf-8", errors="strict") as f:
        f.write("abc")
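The rest of this message is truncated. One way to confirm the observation is to read the file back in binary mode and look at the first bytes. The sketch below assumes Python 3 and uses a temporary file instead of temp.txt; recent Python versions may emit a deprecation warning for codecs.open.

```python
import codecs
import os
import tempfile

# Create a scratch file so the check does not touch the working directory.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Same call as in the thread, with a different file name.
    with codecs.open(path, "w+", "utf-8", errors="strict") as f:
        f.write("abc")
    # Read the raw bytes back to see exactly what was written.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

starts_with_bom = raw.startswith(codecs.BOM_UTF8)
```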

Re: removing BOM prepended by codecs?

2013-09-24 Thread Dave Angel
On 24/9/2013 09:01, J. Bagg wrote:
Why would you start a new thread? Just do a Reply-List (or Reply-All and remove the extra names) to the appropriate message on the existing thread.
> I'm using: outputfile = codecs.open (fn, 'w+', 'utf-8', errors='strict')
That won't be adding a BOM. It

removing BOM prepended by codecs?

2013-09-24 Thread J. Bagg
I've checked the original files using od and they don't have BOMs. I'll remove them in the servlet. The overhead is probably small enough unless somebody is doing a massive search. We have a limit anyway to prevent somebody stealing the entire set of data. I started writing the Python search

Re: removing BOM prepended by codecs?

2013-09-24 Thread Peter Otten
J. Bagg wrote:
> I've checked the original files using od and they don't have BOMs. I'll remove them in the servlet. The overhead is probably small enough unless somebody is doing a massive search. We have a limit anyway to prevent somebody stealing the entire set of data. I started

Re: removing BOM prepended by codecs?

2013-09-24 Thread wxjmfauth
On Tuesday 24 September 2013 11:42:22 UTC+2, J. Bagg wrote:
> I'm having trouble with the BOM that is now prepended to codecs files. The files have to be read by java servlets which expect a clean file without any BOM. Is there a way to stop the BOM being written? It is

removing BOM prepended by codecs?

2013-09-24 Thread J. Bagg
My editor is JEdit. I use it on a Win 7 machine but have everything set up for *nix files as that is the machine I'm normally working on. The files are mailed to me as updates. The library where the indexers work does use MS computers but this is restricted to EndNote with an exporter into the

Re: removing BOM prepended by codecs?

2013-09-24 Thread Chris Angelico
On Wed, Sep 25, 2013 at 4:43 AM, wxjmfa...@gmail.com wrote:
> - The *mark* (once the Unicode.org terminology in FAQ) indicating a unicode encoded raw text file is neither a byte order mark, nor a signature, it is an encoded code point, the encoded U+FEFF, 'ZERO WIDTH NO-BREAK SPACE', code
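Whatever one calls the mark, the relationship between U+FEFF and the bytes at the start of a UTF-8 file can be checked directly. A small sketch using only the standard library; the variable names are illustrative.

```python
import codecs
import unicodedata

mark = "\ufeff"

# The character's official Unicode name.
name = unicodedata.name(mark)
# Encoding U+FEFF as UTF-8 yields the three bytes usually called the UTF-8 BOM.
utf8_bytes = mark.encode("utf-8")

data = codecs.BOM_UTF8 + b"abc"
# Plain "utf-8" keeps the mark as an ordinary leading character.
decoded_plain = data.decode("utf-8")
# "utf-8-sig" drops one leading mark if it is present.
decoded_sig = data.decode("utf-8-sig")
```

A reader that must tolerate files both with and without the mark can decode with "utf-8-sig", which also accepts input that has no mark.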

Re: removing BOM prepended by codecs?

2013-09-24 Thread Piet van Oostrum
J. Bagg j.b...@kent.ac.uk writes:
> I've checked the original files using od and they don't have BOMs. I'll remove them in the servlet. The overhead is probably small enough unless somebody is doing a massive search. We have a limit anyway to prevent somebody stealing the entire set of data.