Re: write html-headers (utf-8)

2005-05-30 Thread Martin v. Löwis
db wrote: In this template I write a few Mysql variables. Those variables often have German characters. These characters are not shown correctly (a garbled form instead of Gösing). The German characters in the html template are shown correctly. The problem is then with these variables: apparently, the Mysql variables are

Re: UTF16 codec doesn't round-trip?

2005-05-28 Thread Martin v. Löwis
John Perks and Sarah Mount wrote: If the ascii can't be recognized as UTF16, then surely the codec shouldn't have allowed it to be encoded in the first place? I could understand if it was trying to decode ascii into (native) UTF32. Please don't call the thing you are trying to decode ascii.

Re: processing a large utf-8 file

2005-05-20 Thread Martin v. Löwis
Ivan Voras wrote: Since the .encoding attribute of file objects is read-only, what is the proper way to process large utf-8 text files? You should use codecs.open, or codecs.getreader to get a StreamReader for UTF-8. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
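
A minimal sketch of the codecs.getreader suggestion, written for current Python 3 rather than the 2005 interpreter. The helper name count_chars and the temporary demo file are illustrative assumptions, not part of the thread. The reader wraps a binary file object and decodes it incrementally, so the whole file is never held in memory.

```python
import codecs
import os
import tempfile


def count_chars(path, encoding="utf-8"):
    """Stream a text file line by line and count the decoded characters."""
    total = 0
    with open(path, "rb") as raw:
        # codecs.getreader returns a StreamReader class for the encoding;
        # instantiating it wraps the binary stream.
        reader = codecs.getreader(encoding)(raw)
        for line in reader:
            total += len(line)
    return total


# Demo data (hypothetical): 1000 lines, each 7 characters but 8 bytes in UTF-8.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(u"G\u00f6sing\n".encode("utf-8") * 1000)

demo_total = count_chars(demo_path)
os.remove(demo_path)
```

The count is in characters, not bytes, which is the visible difference between reading through the StreamReader and reading the raw file.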

Re: Q: The `print' statement over Unicode

2005-05-08 Thread Martin v. Löwis
Jeremy Bowers wrote: Then I'd honor his consistency of belief, but still consider it impolite in general, as asking someone to do tons of work overall to save you a bit is almost always impolite. This is not what he did, though - he did not break the protocol by sending in patches by email

Re: unicode em space in regex

2005-04-17 Thread Martin v. Löwis
Xah Lee wrote: Thanks. Is it true that any unicode chars can also be used inside regex literally? e.g. re.search(ur'<em space>+',mystring,re.U) I tested this case and apparently I can. Yes. In fact, whether you write u"\u2003" or u"<em space>" doesn't matter to re.search. Either way you get a Unicode object with
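
The point that re.search only sees the resulting string can be checked with a short sketch in current Python 3 (an assumption; the thread used Python 2 u'' literals). The variable names are illustrative, and chr(0x2003) stands in for an em space typed directly into the source.

```python
import re

EM_SPACE_ESCAPED = "\u2003"
# Stand-in for a character typed literally into the source file.
EM_SPACE_BUILT = chr(0x2003)

# Both spellings produce the same one-character string.
same_string = EM_SPACE_ESCAPED == EM_SPACE_BUILT

sample = "left" + EM_SPACE_BUILT * 2 + "right"

# The pattern contains the actual em space character.
match = re.search(EM_SPACE_ESCAPED + "+", sample)
span = match.span()

# In a raw string the backslash sequence reaches the re module unchanged,
# and the re module itself interprets \uXXXX in str patterns.
raw_match = re.search(r"\u2003+", sample)
raw_span = raw_match.span()
```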

Re: unicode em space in regex

2005-04-16 Thread Martin v. Löwis
Xah Lee wrote: how to represent the unicode em space in regex? You will have to pass a Unicode literal as the regular expression, e.g. fracture=re.split(u'\u2003*\\|\u2003*',myline,re.U) Notice that, in raw Unicode literals, you can still use \u to escape characters, e.g.
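
A hedged sketch of the same split in current Python 3, where every str literal is already Unicode. The sample line is invented for illustration. Note that in Python 3 the third positional parameter of re.split is maxsplit, so flags are passed by keyword here.

```python
import re

# Hypothetical input: fields separated by "|" with optional em spaces around it.
line = "alpha\u2003|\u2003beta\u2003\u2003|gamma"

# "\\|" is an escaped pipe for the regex; "\u2003" is the em space itself.
fields = re.split("\u2003*\\|\u2003*", line, flags=re.UNICODE)
```

For str patterns in Python 3, Unicode matching is already the default, so flags=re.UNICODE is redundant and shown only to mirror the original call.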

Re: Licensing Python code under the Python license

2005-03-13 Thread Martin v. Löwis
JanC wrote: This is difficult to do right, if you have to consider all the laws in different countries... Right. So he points out that his explanations are for US copyright law only, and then that legislation even in different US states, or perhaps even in districts, might be different.

Re: distutils: binary distribution?

2005-03-09 Thread Martin v. Löwis
Stefan Waizmann wrote: I would like distutils to create a binary distribution only, meaning the distribution file is created with the *.pyc files WITHOUT the *.py files. Any ideas? You will need to create your own command. You can either specialize the build command, to not copy the source code

Re: Unicode BOM marks

2005-03-09 Thread Martin v. Löwis
Steve Horsley wrote: It is my understanding that the BOM (U+feff) is actually the Unicode character Non-breaking zero-width space. My understanding is that this used to be the case. According to http://www.unicode.org/faq/utf_bom.html#38 the application should now specify specific processing,

Re: unicode surrogates in py2.2/win

2005-03-08 Thread Martin v. Löwis
Mike Brown wrote: Very strange how it only shows up after the 1st import attempt seems to succeed, and it doesn't ever show up if I run the code directly or run the code in the command-line interpreter. The reason for that is that the Python byte code stores the Unicode literal in UTF-8. The

Re: Unicode BOM marks

2005-03-07 Thread Martin v. Löwis
Francis Girard wrote: If I understand well, into the UTF-8 unicode binary representation, some systems add at the beginning of the file a BOM mark (Windows?), some don't. (Linux?). Therefore, the exact same text encoded in the same UTF-8 will result in two different binary files, and of a
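
The "two different binary files" observation can be illustrated with a small sketch, assuming a Python version that ships the utf-8-sig codec (it is not mentioned in the thread). The sample text is arbitrary.

```python
import codecs

text = "G\u00f6sing"

plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # BOM (EF BB BF) prepended

# Same text, same encoding form, different byte sequences.
differs = plain != with_bom
bom_is_prefix = with_bom == codecs.BOM_UTF8 + plain

# utf-8-sig strips a leading BOM when present and tolerates its absence.
roundtrip_with_bom = with_bom.decode("utf-8-sig")
roundtrip_plain = plain.decode("utf-8-sig")

# The plain utf-8 codec keeps the BOM as a U+FEFF character in the result.
kept_bom = with_bom.decode("utf-8")
```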

Re: Unicode BOM marks

2005-03-07 Thread Martin v. Löwis
Francis Girard wrote: Well, no text files can't be concatenated ! Sooner or later, someone will use cat on the text files your application did generate. That will be a lot of fun for the new unicode aware super-cat. Well, no. For example, Python source code is not typically concatenated, nor is

Re: unicode(obj, errors='foo') raises TypeError - bug?

2005-02-23 Thread Martin v. Löwis
Steven Bethard wrote: Yeah, I agree it's weird. I suspect if someone supplied a patch for this behavior it would be accepted -- I don't think this should break backwards compatibility (much). Notice that the right thing to do would be to pass encoding and errors to __unicode__. If the string

Re: unicode(obj, errors='foo') raises TypeError - bug?

2005-02-23 Thread Martin v. Löwis
Kent Johnson wrote: Could this be handled with a try / except in unicode()? Something like this: Perhaps. However, this would cause a significant performance hit, and possibly undesired side effects. So due process would require that the interface of __unicode__ is changed first, and then change the actual

Re: unicode encoding usablilty problem

2005-02-21 Thread Martin v. Löwis
aurora wrote: What is the process for getting a PEP worked out? Are the work and discussion carried out on the python-dev mailing list? I would be glad to help out especially on this particular issue. See PEP 1 for the PEP process. The main point is that discussion is *not* carried out on any

Re: unicode encoding usablilty problem

2005-02-20 Thread Martin v. Löwis
aurora wrote: Lots of errors. Among them are gzip (binary?!) and strftime?? For gzip, this is not surprising. It contains things like self.fileobj.write('\037\213') which is not intended to denote characters. How about b'' - 8bit string; '' unicode string and no automatic conversion. This
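
The '\037\213' value written by gzip is the two-byte magic number of the gzip format, which is why it denotes bytes rather than characters. A short sketch in current Python 3, where the b'' / '' split proposed in the message is how the language works; the names below are illustrative.

```python
import gzip

# Octal escapes \037 \213 are the bytes 0x1f 0x8b, the gzip magic number.
GZIP_MAGIC = b"\037\213"

compressed = gzip.compress(b"hello")
starts_with_magic = compressed[:2] == GZIP_MAGIC
same_as_hex = GZIP_MAGIC == b"\x1f\x8b"

# Python 3 refuses to combine bytes and str implicitly.
try:
    GZIP_MAGIC + "text"
    mixing_allowed = True
except TypeError:
    mixing_allowed = False
```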

Re: unicode encoding usablilty problem

2005-02-18 Thread Martin v. Löwis
aurora wrote: Java has a much more usable model, with unicode used internally and encoding/decoding decisions needed only twice, when dealing with input and output. In addition to Fredrik's comment (that you should use the same model in Python) and Walter's comment (that you can enforce it by

Re: supress creation of .pyc files

2005-02-16 Thread Martin v. Löwis
Thomas Guettler wrote: Is there a way to import a file without creating a .pyc file? That is part of PEP 304, which is not implemented yet, and apparently currently stalled due to lack of interest. Regards, Martin

Re: xml parsing escape characters

2005-01-21 Thread Martin v. Löwis
Luis P. Mendes wrote: From your experience, do you think that if this wrong XML code could be meant to be read only by some kind of Microsoft parser, the error will not occur? This is very unlikely. MSXML would never do this incorrectly. Regards, Martin

Re: rotor replacement

2005-01-21 Thread Martin v. Löwis
[EMAIL PROTECTED] wrote: Do you know this for a fact? I'm going by newsgroup messages from around the time that I was proposing to put together a standard block cipher module for Python. Ah, newsgroup messages. Anybody could respond, whether they have insight or not. The PSF does comply with the

Re: xml parsing escape characters

2005-01-20 Thread Martin v. Löwis
Luis P. Mendes wrote: with: DataSetNode = stringNode.childNodes[0] print DataSetNode.toxml() I get: &lt;DataSet&gt; &lt;Order&gt; &lt;Customer&gt;439&lt;/Customer&gt; &lt;/Order&gt; &lt;/DataSet&gt; - so far so good, but when I

Re: xml parsing escape characters

2005-01-20 Thread Martin v. Löwis
Irmen de Jong wrote: The unescaping is usually done for you by the xml parser that you use. Usually, but not in this case. If you have a text that looks like XML, and you want to put it into an XML element, the XML file uses &lt; and &gt;. The XML parser unescapes that as < and >. However, it does not
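
A minimal sketch of this behaviour using xml.etree.ElementTree from the current standard library (the thread itself used a DOM API). The namespace URI and the sample document are hypothetical, modelled on the snippets in this thread. The parser turns &lt; and &gt; into plain text, so the outer element has no child elements; getting at the inner structure takes a second parse of that text.

```python
import xml.etree.ElementTree as ET

# Hypothetical response: XML-looking text stored, escaped, inside <string>.
outer_doc = (
    '<string xmlns="http://example.invalid/ns">'
    "&lt;DataSet&gt;&lt;Order&gt;&lt;Customer&gt;439&lt;/Customer&gt;"
    "&lt;/Order&gt;&lt;/DataSet&gt;"
    "</string>"
)

outer = ET.fromstring(outer_doc)

# The unescaped content is character data, not markup: no child elements.
child_count = len(list(outer))
inner_text = outer.text

# Parsing the text a second time yields the inner elements.
inner = ET.fromstring(inner_text)
customer = inner.find("Order/Customer").text
```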

Re: xml parsing escape characters

2005-01-20 Thread Martin v. Löwis
Irmen de Jong wrote: Usually, but not in this case. If you have a text that looks like XML, and you want to put it into an XML element, the XML file uses &lt; and &gt;. The XML parser unescapes that as < and >. However, it does not then consider the < and > as markup, and it shouldn't. That's also what

Re: xml parsing escape characters

2005-01-20 Thread Martin v. Löwis
Luis P. Mendes wrote: When I access the url via the Firefox browser and look into the source code, I also get: <?xml version="1.0" encoding="utf-8"?> <string xmlns="http...">&lt;DataSet&gt; &lt;Order&gt; &lt;Customer&gt;439&lt;/Customer&gt; &lt;/Order&gt; &lt;/DataSet&gt;</string> Please do try to

Re: ElementTree cannot parse UTF-8 Unicode?

2005-01-20 Thread Martin v. Löwis
Jarek Zgoda wrote: So why are there non-UNICODE versions of wxPython??? To save memory or something??? Win95, Win98, WinME have problems with unicode. This problem can be solved - on W9x, wxPython would have to pass all Unicode strings to WideCharToMultiByte, using CP_ACP, and then pass the

Re: xml parsing escape characters

2005-01-19 Thread Martin v. Löwis
Luis P. Mendes wrote: I get the following result: <?xml version="1.0" encoding="utf-8"?> <string xmlns="http://www..">&lt;DataSet&gt; &lt;Order&gt; Most likely, this result is correct, and your document really does contain &lt;Order&gt; I don't get any elements. But, if I access the same url via a

Re: Unicode conversion in 'print'

2005-01-14 Thread Martin v. Löwis
Ricardo Bugalho wrote: thanks for the information. But what I was really looking for was information on when and why Python started doing it (previously, it always used sys.getdefaultencoding()) and why it was done only for 'print' when stdout is a terminal instead of always. It does that since

Re: Referenz auf Variable an Funktion übergeben? (Passing a reference to a variable to a function?)

2005-01-10 Thread Martin v. Löwis
Torsten Mohr wrote: Is something like this also possible in Python? Not directly. It is customary for functions that deliver results (return values) to do so by means of return: def vokale(string): result = [c for c in string if c in "aeiou"] return "".join(result) x = "Hallo, Welt" x = vokale(x) If one has several
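
The inline example from this message, laid out as runnable code. The string quotes were lost in the archive and are reconstructed here as an assumption; the function name vokale (vowels) is kept from the message. Instead of modifying the caller's variable through a reference, the function returns a new value and the caller rebinds the name.

```python
def vokale(string):
    """Return only the lowercase vowels contained in the given string."""
    result = [c for c in string if c in "aeiou"]
    return "".join(result)


x = "Hallo, Welt"
# Rebind x to the function's return value rather than passing x by reference.
x = vokale(x)
```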