RedNotebook 1.1.2
RedNotebook 1.1.2 has been released. You can get the tarball, the Windows installer and links to distribution packages at http://rednotebook.sourceforge.net/downloads.html

What is RedNotebook? RedNotebook is a **graphical journal** and diary helping you keep track of notes and thoughts. It includes calendar navigation, customizable templates, export functionality and word clouds. You can also format, tag and search your entries. RedNotebook is available in the repositories of most common Linux distributions and a Windows installer is available. It is written in Python and uses GTK+ for its interface.

What's new?
---
* Add fullscreen mode (F11)
* Highlight all found occurrences of the searched word (LP:614353)
* Highlight mixed markups (**__Bold underline__**)
* Highlight structured headers (=Part=, ==Subpart==, ===Section===, Subsection, =Subsubsection=)
* Document structured headers
* Highlight ``, , ''
* Write documentation about ``, , ''
* Let the preview and edit button have the same size
* Fix: Correctly highlight lists (LP:622456)
* Fix: Do not set maximized to True when sending RedNotebook to the tray (LP:657421)
* Fix: Add Ctrl-P shortcut for edit button (LP:685609)
* Fix: Add \ to the list of ignored chars for word clouds
* Fix: Escape characters before adding results to the search list
* Fix: Local links with whitespace in latex
* Windows: Fix opening linked files
* Windows: Do not center window to prevent alignment issues
* Windows: Fix image preview (LP:663944)
* Internal: Replace tabs by whitespace in source code
* Many translations updated

Cheers,
Jendrik
--
http://mail.python.org/mailman/listinfo/python-announce-list
Support the Python Software Foundation: http://www.python.org/psf/donations/
Pogo 0.3.1
I am proud to announce the release of Pogo 0.3.1, probably the simplest and fastest audio player for Linux. You can get the tarball and an Ubuntu deb package at http://launchpad.net/pogo

What is Pogo? Pogo plays your music. Nothing else. It tries to be fast and easy-to-use. Pogo's elementary-inspired design uses the screen-space very efficiently. It is especially well-suited for people who organize their music by albums on the hard drive. The main interface components are a directory tree and a playlist that groups albums in an innovative way. Pogo is a fork of Decibel Audio Player. Supported file formats include Ogg Vorbis, MP3, FLAC, Musepack, Wavpack, and MPEG-4 AAC. Pogo is written in Python and uses GTK+ and gstreamer.

What's new in 0.3.1 "You are a radar detector" (2010-12-26)
* When a track is added from nautilus etc. start playback if not already playing
* Show info messages when no music directories have been added
* Stop old search when user clears search field or enters new search phrase
* Add search shortcut (Ctrl-F)
* Do not allow adding root or home directory to music directories
* Translations updated

Cheers,
Jendrik
Fw: Re: User input masks - Access Style
On 2010-12-27, flebber flebber.c...@gmail.com wrote: Is there any way to use input masks in python? Similar to the function found in Access where a user's input is limited to a type, length and format. So in my case I want to ensure that numbers are saved in a basic format. 1) Currency, so input limited to 000.00, e.g. 1.00, 2.50, 13.80 etc.

Some GUIs provide this functionality or provide callbacks for validation functions that can determine the validity of the input. I don't know of any modules that provide formatted input in a terminal. Most terminal input functions just read from stdin (in this case a buffered line) and output that as a string. It is easy enough to validate whether terminal input is in the proper format. Your example time code might look like:

import re
import sys

# get the input
print("Please enter time in the format 'MM:SS:HH': ", end="")
timeInput = input()

# validate the input is in the correct format (usually this would be in a
# loop that continues until the user enters acceptable data)
if re.match(r'^[0-9]{2}:[0-9]{2}:[0-9]{2}$', timeInput) is None:
    print("I'm sorry, your input is improperly formatted.")
    sys.exit(1)

# break the input into its components
components = timeInput.split(":")
minutes = int(components[0])
seconds = int(components[1])
microseconds = int(components[2])

# output the time
print("Your time is: " + "%02d" % minutes + ":" + "%02d" % seconds + ":" +
      "%02d" % microseconds)

Currency works the same way, validating it against: r'[0-9]+\.[0-9]{2}'

For sports times - that is, a time duration, not a system or date time - should I assume that I would need to convert the user input to a decimal number and then recalculate it to present it to the user?

I am not sure what you are trying to do or asking. Python provides time, date, datetime, and timedelta objects that can be used for date/time calculations, locale based formatting, etc.
What you use, if any, will depend on what you are actually trying to accomplish. Your example doesn't really show you doing much with the time, so it is difficult to give you any concrete recommendations.

Yes, you are right, I should have clarified. The time is a duration over distance, so it's a speed measure. Ultimately I will need to store the times, so I may need to use something like SQLAlchemy, but I am nowhere near that advanced. I know that most DBs (MySQL, PostgreSQL, etc.) don't support time as a duration as such, and I will probably need to store it as a decimal and convert it back for the user.
--
http://mail.python.org/mailman/listinfo/python-list

You can let a user separately input the days, hours, minutes, etc., and use the type timedelta to store the time duration:

datetime.timedelta([days[, seconds[, microseconds[, milliseconds[, minutes[, hours[, weeks]]]]]]])

From 2.7 on, you can use timedelta.total_seconds() to convert the time duration to a number for database use, and later restore the number back to a timedelta with timedelta(seconds=...). Refer to: http://docs.python.org/library/datetime.html?highlight=timedelta#timedelta-objects
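For the archive, a minimal sketch of the round-trip described above (plain standard library, no database involved; the example values are invented):

```python
from datetime import timedelta

# Build a duration from separately entered parts.
race_time = timedelta(minutes=2, seconds=3, milliseconds=450)

# Flatten to a single number for storage (total_seconds() exists from 2.7 on).
stored = race_time.total_seconds()
print(stored)   # 123.45

# Later, restore the timedelta from the stored number.
restored = timedelta(seconds=stored)
print(restored == race_time)   # True
```

The same number can be stored in any numeric database column and converted back losslessly, down to microsecond resolution.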
Re: round in 2.6 and 2.7
On Dec 23, 6:57 pm, Hrvoje Niksic hnik...@xemacs.org wrote: I stumbled upon this. Python 2.6:

>>> round(9.95, 1)
10.0

So it seems that Python is going out of its way to intuitively round 9.95, while the repr retains the unnecessary digits.

No, Python's not doing anything clever here. Python 2.6 uses a simple rounding algorithm that frequently gives the wrong answer for halfway or near-halfway cases. It's just luck that in this particular case it gives the apparently-correct (but actually incorrect) answer. Martin's already explained that the 2.7 behaviour is correct, and agrees with string formatting. However, note that there's still a disconnect between these two operations in Python 2.7:

>>> round(1.25, 1)
1.3
>>> format(1.25, '.1f')
'1.2'

That's because 'round' in Python 2.x (including 2.7) still rounds exact halfway cases away from zero, while string formatting rounds them to the value with even last digit. In Python 3.x, even this discrepancy is fixed---everything does round-halfway-to-even.

Is the change to round() expected? Expected, and intentional. :-)

[Martin] Float-to-string and string-to-float conversions are correctly rounded. The round() function is also now correctly rounded. Not sure that this is correct English; I think it means that the round() function is now correct.

Well, the correct result of the example the OP gave would be 9.9 exactly. But since 9.9 isn't exactly representable as a Python float, we necessarily get an approximation. The language above is intended to convey that it's the 'correctly rounded' approximation---that is, the closest Python float to the true value of 9.9 (with halfway cases rounded to even, as usual).

Mark
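A short sketch (Python 3, where round() and string formatting agree) illustrating both points made above:

```python
from decimal import Decimal

# Exact halfway cases round to the nearest even value:
print(round(0.5), round(1.5), round(2.5))   # 0 2 2

# 9.95 is not actually a halfway case: the closest binary double is
# strictly below 9.95, so the correctly rounded result is 9.9.
print(Decimal(9.95) < Decimal("9.95"))      # True
print(round(9.95, 1))                       # 9.9
print(format(9.95, ".1f"))                  # 9.9
```

Decimal(float) shows the exact stored value, which is why it makes a handy tool for checking what the "true" halfway cases are.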
Re: string u'hyv\xe4' to file as 'hyvä'
On Dec 27, 6:47 am, Mark Tolonen metolone+gm...@gmail.com wrote: gintare g.statk...@gmail.com wrote in message: In the file i find 'hyv\xe4' instead of hyvä. When you open a file with codecs.open(), it expects Unicode strings to be written to the file. Don't encode them again. Also, .writelines() expects a list of strings. Use .write():

import codecs
item = u'hyv\xe4'
F = codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.write(item)
F.close()

Gintare, Mark's code is correct. When you are reading the file back, make sure you understand what you are seeing:

>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'

That might look as though item2 is 7 characters long, and that it contains a backslash followed by x, e, 4. However item2 is identical to item; they both contain 4 characters - the final one being a-umlaut. Python has shown the string using a backslash escape, because printing a non-ascii character might fail. You can see this directly, if your Python session is running in a terminal (or GUI) that can handle non-ascii characters:

>>> print item2
hyvä
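For completeness, a hedged sketch of the same round-trip in Python 3, where the builtin open() takes an encoding directly (the temp-file path is just for illustration):

```python
import os
import tempfile

item = 'hyv\xe4'            # 4 characters; the last one is a-umlaut
assert len(item) == 4

path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(item)           # encoded to UTF-8 on the way out

with open(path, encoding='utf-8') as f:
    item2 = f.read()        # decoded back to a 4-character str on the way in

print(item2 == item, len(item2))   # True 4
```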
Interning own classes like strings for speed and size?
Hi! I'm trying to solve a computational problem and of course speed and size are important there. Apart from picking the right algorithm, I came across an idea that could help speed up things and keep memory requirements down.

What I have is regions described by min and max coordinates. At first, I just modeled these as a simple class containing two values for each axis. In a second step, I derived this class from tuple instead of object. Some code then moved from __init__ to __new__, and some code that modified these objects had to be changed to replace them instead. The upside to this is that they can be used as keys in sets and dicts, which isn't the case for mutable types[1].

What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. What I'm not yet sure of is how much overhead that would give me and/or how to keep it low. The idea is to store each instance in a set, and after creating a new object I would first look up an equal object in the global set and return that instead, otherwise add the new one.

The problem I foresee is that if I define equality as identity, this lookup when creating will never eliminate duplicates. If I only fall back to equality comparison for non-identical objects, I would probably sacrifice most of the gain. If I build a dict mapping between the values and the actual objects, I would have doubled the required memory and uselessly stored the same values twice there.

Am I looking in the wrong direction? Is there some better approach? Please don't tell me to use C, as I'm specifically interested in learning Python; I'm pretty sure I could have solved the problem quickly in C++ otherwise. Other suggestions?

Cheers!

Uli

[1] Somebody correct me if I'm wrong, but I believe I could have defined a hashing function for the type and thus allowed its use in a set or dict, right?
However, goofing up because you accidentally modified an object and changed its hash value is something I don't want to risk anyway. -- http://mail.python.org/mailman/listinfo/python-list
Re: Fw: Re: User input masks - Access Style
On Dec 27, 7:57 pm, linmq li...@neusoft.com wrote:
[full quote of the earlier "User input masks - Access Style" exchange snipped]

Very helpful thanks
Re: Interning own classes like strings for speed and size?
I'm trying to solve a computational problem and of course speed and size are important there. [...] What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. [...] Am I looking in the wrong direction? Is there some better approach? [...]
However, goofing up because you accidentally modified an object and changed its hash value is something I don't want to risk anyway.

I believe what you are looking for is (some variant of) the singleton pattern: http://en.wikipedia.org/wiki/Singleton_pattern For how it's done in Python, see http://www.google.com/search?q=python+singleton

Cheers,
Daniel
--
Psss, psss, put it down! - http://www.cafepress.com/putitdown
Re: Interning own classes like strings for speed and size?
Daniel Fetchinson wrote: I believe what you are looking for is (some variant of) the singleton pattern: http://en.wikipedia.org/wiki/Singleton_pattern Actually, no. What I want is the flyweight pattern instead: http://en.wikipedia.org/wiki/Flyweight_pattern ...but thank you for the approach of looking for a suitable pattern! Cheers! Uli -- http://mail.python.org/mailman/listinfo/python-list
Re: type(d) != type(d.copy()) when type(d).issubclass(dict)
kj no.em...@please.post wrote: In (almost?) all cases any objects constructed by a subclass of a builtin class will be of the original builtin class. What I *really* would like to know is: how do *you* know this (and the same question goes for the other responders who see this behavior of dict as par for the course). Can you show me where it is in the documentation? I'd really appreciate it. TIA!

I know it from experience (and reading source). So far as I can tell it isn't explicitly stated anywhere in the documentation. Mostly the documentation just says a method returns 'a copy of', possibly with some modification. For example:

str.capitalize()
Return a copy of the string with its first character capitalized and the rest lowercased.

That is ambiguous, as it leaves open the question whether it returns a string that is a copy or an object of the type being operated upon. It happens to be the former, but it doesn't actually say.

--
Duncan Booth http://kupuguy.blogspot.com
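A quick sketch confirming the behaviour under discussion (the class names are invented):

```python
class MyDict(dict):
    pass

class MyStr(str):
    pass

d = MyDict(a=1)
s = MyStr('hello world')

# Methods inherited from the builtin base classes return plain builtins,
# not instances of the subclass:
print(type(d.copy()))        # <class 'dict'>
print(type(s.capitalize()))  # <class 'str'>

# To get a copy of the subclass type, re-wrap explicitly:
print(type(MyDict(d.copy())) is MyDict)   # True
```

If the subclass type matters, the usual fix is to override copy() (or the relevant method) to re-wrap the result, as the last line sketches.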
Re: Interning own classes like strings for speed and size?
I believe what you are looking for is (some variant of) the singleton pattern: http://en.wikipedia.org/wiki/Singleton_pattern Actually, no. What I want is the flyweight pattern instead: http://en.wikipedia.org/wiki/Flyweight_pattern Oh I see. I did not know about this pattern, but in my defense it looks like a variant of the singleton pattern :) Thanks! One always learns something new on python-list. Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown -- http://mail.python.org/mailman/listinfo/python-list
Re: How to pop the interpreter's stack?
Steven D'Aprano wrote: On Sun, 26 Dec 2010 09:15:32 -0800, Ethan Furman wrote: Steven D'Aprano wrote: Right. But I have thought of a clever trick to get the result KJ was asking for, with the minimum of boilerplate code. Instead of this:

def _pre_spam(args):
    if condition(args):
        raise SomeException(message)
    if another_condition(args):
        raise AnotherException(message)
    if third_condition(args):
        raise ThirdException(message)

def spam(args):
    _pre_spam(args)
    do_useful_work()

you can return the exceptions instead of raising them (exceptions are just objects, like everything else!), and then add one small piece of boilerplate to the spam() function:

def _pre_spam(args):
    if condition(args):
        return SomeException(message)
    if another_condition(args):
        return AnotherException(message)
    if third_condition(args):
        return ThirdException(message)

def spam(args):
    exc = _pre_spam(args)
    if exc:
        raise exc
    do_useful_work()

-1. You failed to mention that cleverness is not a prime requisite of the python programmer -- in fact, it's usually frowned upon. The big problem with the above code is you are back to passing errors in-band, pretty much completely defeating the point of having an out-of-band channel.

How is that any worse than making _pre_spam() a validation function that returns a bool?

def spam(args):
    flag = _pre_spam(args)
    if flag:
        raise SomeException()
    do_useful_work()

Also -1. Is that also frowned upon for being too clever?

Frowned upon for being out-of-band, and not as much fun as being clever. ;) I'm pretty sure you've expressed similar sentiments in the past (although my memory could be failing me). More to the point, the OP had code that said:

args, kwargs = __pre_spam(*args, **kwargs)

and __pre_spam was either passing back verified (and possibly modified) parameters, or raising an exception.

~Ethan~
Re: User input masks - Access Style
On Sun, 2010-12-26 at 20:37 -0800, flebber wrote: Is there any way to use input masks in python? Similar to the function found in Access where a user's input is limited to a type, length and format.

http://faq.pygtk.org/index.py?file=faq14.022.htpreq=show

Typically this is handled by a callback on a keypress event.
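To illustrate the keypress-callback idea without any GUI toolkit: such a callback must accept *prefixes* of valid input, not just complete values, or the user could never type anything. A hedged sketch for the currency mask from the original question (both regexes are invented for illustration):

```python
import re

FULL = re.compile(r'^[0-9]{1,3}\.[0-9]{2}$')         # a complete amount, e.g. 13.80
PREFIX = re.compile(r'^[0-9]{0,3}(\.[0-9]{0,2})?$')  # anything on the way there

def on_keypress(candidate):
    """Return True to accept the edit, False to reject the keystroke."""
    return PREFIX.match(candidate) is not None

print(on_keypress('13.'))          # True - incomplete, but a valid prefix
print(on_keypress('13a'))          # False - rejected immediately
print(bool(FULL.match('13.80')))   # True - fully formed amount
```

In a real GUI you would wire on_keypress() to the widget's change/validate event and check FULL only when the field loses focus.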
Re: string u'hyv\xe4' to file as 'hyvä'
On 27/12/2010 05:56, gintare wrote: Hello, it STILL does not work. WHAT should be done?

import codecs
item = u'hyv\xe4'
F = codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.writelines(item.encode('utf8'))
F.close()

As I said in my previous post, you shouldn't be using .writelines, and you shouldn't encode it when writing it to the file, because codecs.open will do that for you; that's its purpose:

import codecs
item = u'hyv\xe4'
F = codecs.open('/opt/finnish.txt', 'w+', 'utf8')
F.write(item)
F.close()

In the file i find 'hyv\xe4' instead of hyvä. Sorry for mistyping in the previous letter about 'latin-1'. I was making all possible combinations, when the normal example syntax did not work, before writing to this forum.

regards, gintare

On 27 Gruo, 01:14, MRAB pyt...@mrabarnett.plus.com wrote: On 26/12/2010 22:43, gintare wrote: Could you please help me with special characters saving to file. I need to write the string u'hyv\xe4' to file. I would like to open the file and to have the line 'hyvä'.

import codecs
word = u'hyv\xe4'
F=codecs.open(/opt/finnish.txt, 'w+','Latin-1')

This opens the file using the Latin-1 encoding (although only if you put the filename in quotes).

F.writelines(item.encode('Latin-1'))

This encodes the Unicode item (did you mean 'word'?) to a bytestring using the Latin-1 encoding. You opened the file using Latin-1 encoding, so this is pointless. You should pass a Unicode string; it will encode it for you. You're also passing a bytestring to the .writelines method, which expects a list of strings. What you should be doing is this:

F.write(word)

F.writelines(item.encode('utf8'))

This encodes the Unicode item to a bytestring using the UTF-8 encoding. This is also pointless. You shouldn't be encoding to UTF-8 and then trying to write it to a file which was opened using Latin-1 encoding!
F.writelines(item)
F.close()

All three writelines give the same result in finnish.txt: hyv\xe4. I would like to find 'hyvä'.
Language Detection Library/Code
Can anyone suggest a *language detection library* in python which works on a phrase of, say, 2-5 words?
--
~l0nwlf
Re: Keeping track of the N largest values
Am 26.12.2010 19:51, schrieb Stefan Sonnenberg-Carstens:

l = []
K = 10
while 1:
    a = input()
    if len(l) == K:
        l.remove(min(l))
    l = [x for x in l if x > a] + [a] + [x for x in l if x <= a]
    print l

A minor fault made it into my prog:

l = [0]
K = 10
while 1:
    a = input()
    l = [x for x in l if x > a] + [a] + [x for x in l if x <= a]
    if len(l) == K:
        l.remove(min(l))
    print l
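For the archive, the standard library already solves the keep-the-N-largest problem; a sketch using heapq (Python 3 syntax, sample values invented):

```python
import heapq

# Keep the N largest values from a stream without sorting everything.
stream = [5, 1, 9, 3, 7, 2, 8, 6, 4, 0, 12, 11]
top3 = heapq.nlargest(3, stream)
print(top3)   # [12, 11, 9]

# Or incrementally, with a bounded min-heap whose smallest element
# sits at heap[0]:
heap = []
for value in stream:
    if len(heap) < 3:
        heapq.heappush(heap, value)
    elif value > heap[0]:
        heapq.heapreplace(heap, value)   # drop current minimum, push value
print(sorted(heap, reverse=True))   # [12, 11, 9]
```

The incremental form does O(log N) work per item and never holds more than N values, which is the behaviour the hand-rolled list version above is approximating.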
Re: Interning own classes like strings for speed and size?
On 12/27/2010 6:05 AM, Ulrich Eckhardt wrote: Hi! I'm trying to solve a computational problem and of course speed and size is important there. Apart from picking the right algorithm, I came across an idea that could help speed up things and keep memory requirements down. What I have is regions described by min and max coordinates. What sort of numbers are the coordinates? If integers in a finite range, your problem is a lot simpler than if float of indefinite precision. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: Trying to parse a HUGE(1gb) xml file
On 12/21/2010 3:16 AM, Stefan Behnel wrote: Adam Tauno Williams, 20.12.2010 20:49: ... You need to process the document as a stream of elements; aka SAX. IMHO, this is the worst advice you can give.

Why do you say that? I would have thought that using SAX in this application is an excellent idea.

I agree that for applications for which performance is not a problem, and for which we need to examine more than one or a few element types, a tree implementation is more functional, less programmer intensive, and provides an easier to understand approach to the data. But with huge amounts of data where performance is a problem, SAX will be far more practical. In the special case where only a few elements are of interest in a complex tree, SAX can sometimes also be more natural and easy to use.

SAX might also be more natural for this application. The O.P. could tell us for sure, but I wonder if perhaps his 1 GB XML file is NOT a true single record. You can store an entire text encyclopedia in less than one GB. What he may have is a large number of logically distinct individual records of some kind, each stored as a node in an all-encompassing element wrapper. Building a tree for each record could make sense but, if I'm right about the nature of the data, building a tree for the wrapper gives very little return for the high cost.

If that's so, then I'd recommend one of two approaches: 1. Use SAX, or 2. Parse out individual logical records using string manipulation on an input stream, then build a tree for one individual record in memory using one of the DOM or ElementTree implementations. After each record is processed, discard its tree and start on the next record.

Alan
Re: Trying to parse a HUGE(1gb) xml file
On 12/26/2010 3:15 PM, Tim Harig wrote: ... The problem is that XML has become such a de facto standard that it is used automatically, without thought, even when there are much better alternatives available.

I agree with you but, as you say, it has become a de facto standard. As a result, we often need to use it unless there is some strong reason to use something else. The same thing can be said about relational databases. There are applications for which a hierarchical database makes more sense, is more efficient, and is easier to understand. But anyone who recommends a database that is not relational had better be prepared to defend his choice with some powerful reasoning, because his management, his customers, and the other programmers on his team are probably going to need a LOT of convincing.

And of course there are many applications where XML really is the best. It excels at representing complex textual documents while still allowing programmatic access to individual items of information.

Alan
Re: __delitem__ feature
In 4d181afb$0$30001$c3e8da3$54964...@news.astraweb.com Steven D'Aprano steve+comp.lang.pyt...@pearwood.info writes: We know it because it explains the observable facts. So does Monday-night quarterbacking... -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
Terry Reedy wrote: What sort of numbers are the coordinates? If integers in a finite range, your problem is a lot simpler than if float of indefinite precision. Yes, indeed, I could optimize the amount of data required to store the data itself, but that would require application-specific handling of the data, which is actually not what I want to learn about. If it was that, I'd use a language where I have lower-level access to the system. ;) Thanks nonetheless! Uli -- http://mail.python.org/mailman/listinfo/python-list
Re: Trying to parse a HUGE(1gb) xml file
Alan Meyer, 27.12.2010 21:40: On 12/21/2010 3:16 AM, Stefan Behnel wrote: Adam Tauno Williams, 20.12.2010 20:49: ... You need to process the document as a stream of elements; aka SAX. IMHO, this is the worst advice you can give. Why do you say that? I would have thought that using SAX in this application is an excellent idea.

From my experience, SAX is only practical for very simple cases where little state is involved when extracting information from the parse events. A typical example is gathering statistics based on single tags - not a very common use case. Anything that involves knowing where in the XML tree you are in order to figure out what to do with the event is already too complicated. The main drawback of SAX is that the callbacks run in separate method calls, so you have to do all the state keeping manually through fields of the SAX handler instance.

My serious advice is: don't waste your time learning SAX. It's simply too frustrating to debug SAX extraction code into existence. Given how simple and fast it is to extract data with ElementTree's iterparse() in a memory efficient way, there is really no reason to write complicated SAX code instead.

Stefan
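For readers following along, a hedged sketch of the iterparse() approach (tag names are invented; the in-memory StringIO stands in for the huge file, and clearing each finished element is what keeps memory bounded):

```python
import io
import xml.etree.ElementTree as ET

xml = io.StringIO(
    "<records>"
    "<record><name>a</name></record>"
    "<record><name>b</name></record>"
    "</records>"
)

names = []
for event, elem in ET.iterparse(xml, events=("end",)):
    if elem.tag == "record":
        names.append(elem.findtext("name"))
        elem.clear()          # drop the subtree we no longer need

print(names)   # ['a', 'b']
```

Each <record> is available as a small, fully built tree at its "end" event, so you get tree-style convenience with streaming-style memory use.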
Digitally Signing a XML Document (using SHA1+RSA or SHA1+DSA)
Hi All, I have a requirement to digitally sign an XML document using SHA1+RSA or SHA1+DSA. Could someone give me a lead on a library that I can use to fulfill this requirement? The XML document has values such as:

<RSASK>-----BEGIN RSA PRIVATE KEY-----
MIIBOgIBAAJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqguw76g/jmeO6f4i31rDLVQ
n7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQMCQQCOd2lLpgRm6esMblO18WOG
3h8oCNcaydfUa1QmaX0apHlDFnI7UDXpYaHp2VL9gvtSJT5L3ZASMzxRPXJSvzcT
AiEA/16jQh18BAD4q3yk1gKw19I8OuJOYAxFYX9noCEFWUMCIQDWOiYfPtxK3A1s
AFARsDnnHTL4FbRPpiZ79vP+VgqojwIhAKo/F4Fo/VgApceobeQByzqMKCdBiZVd
g5ZU78AWA5DXAiEAjtFuv389hz1eSAA1YSAmmhN3UA54NRlu/U9NVDlccF8CIBkc
Z52oGxy/skwVwI5TBcB1YqXJTT47/6/hTAVMTwaA
-----END RSA PRIVATE KEY-----</RSASK>

<RSAPUBK>-----BEGIN PUBLIC KEY-----
MFowDQYJKoZIhvcNAQEBBQADSQAwRgJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqgu
w76g/jmeO6f4i31rDLVQn7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQM=
-----END PUBLIC KEY-----</RSAPUBK>

And the XML also has another node that has a Public Key with Modulus and Exponent etc. that I apparently need to utilize:

<RSAPK>
<M>M1bMd8XkGml7gkqV9kOoVSk0uvA1CqC7DvqD+OZ47p/iLfWsMtVCfuxiKW7rkLy836qcQac8Hzbi38DfJ8y7UbQ==</M>
<E>EAw==</E>
</RSAPK>

I am a little thin on this concept and hoping you could guide me to a library/documentation that I could utilize. Thanks a lot for your help.

Regards,
Anurag
Re: Trying to parse a HUGE(1gb) xml file
On Mon, 2010-12-27 at 22:55 +0100, Stefan Behnel wrote: Alan Meyer, 27.12.2010 21:40: On 12/21/2010 3:16 AM, Stefan Behnel wrote: Adam Tauno Williams, 20.12.2010 20:49: ... You need to process the document as a stream of elements; aka SAX. IMHO, this is the worst advice you can give. Why do you say that? I would have thought that using SAX in this application is an excellent idea. From my experience, SAX is only practical for very simple cases where little state is involved when extracting information from the parse events. A typical example is gathering statistics based on single tags - not a very common use case. Anything that involves knowing where in the XML tree you are to figure out what to do with the event is already too complicated. I've found that using a stack-model makes traversing complex documents with SAX quite manageable. For example, I parse BPML files with SAX. If the document is nested and context sensitive then I really don't see how iterparse differs all that much. My serious advice is: don't waste your time learning SAX. It's simply too frustrating to debug SAX extraction code into existence. Given how simple and fast it is to extract data with ElementTree's iterparse() in a memory efficient way, there is really no reason to write complicated SAX code instead. -- http://mail.python.org/mailman/listinfo/python-list
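A minimal sketch of the stack-model Adam describes, using the stdlib xml.sax; the document and tag names are invented for the demo:

```python
import xml.sax

class StackHandler(xml.sax.ContentHandler):
    """Keep the element path on an explicit stack so each callback
    knows where in the tree it currently is."""
    def __init__(self):
        super().__init__()
        self.stack = []   # current path from the root element
        self.text = []
        self.names = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.text = []

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        # act only when the full path matches, not just the tag name
        if self.stack == ["root", "item", "name"]:
            self.names.append("".join(self.text))
        self.stack.pop()

handler = StackHandler()
xml.sax.parseString(
    b"<root><item><name>a</name></item><item><name>b</name></item></root>",
    handler,
)
print(handler.names)  # ['a', 'b']
```

The stack is what replaces the implicit "where am I?" context that a tree API gives you for free.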
Re: Trying to parse a HUGE(1gb) xml file
On 2010-12-27, Alan Meyer amey...@yahoo.com wrote: On 12/26/2010 3:15 PM, Tim Harig wrote: ... The problem is that XML has become such a de facto standard that it is used automatically, without thought, even when there are much better alternatives available. I agree with you but, as you say, it has become a de facto standard. As a result, we often need to use it unless there is some strong reason to use something else. XML should be used where it makes sense to do so. As always, use the proper tool for the proper job. XML became such a de facto standard, in part, because it was abused for many uses in the first place, so using it because it is a de facto standard is just piling more and more mistakes on top of each other. The same thing can be said about relational databases. There are applications for which a hierarchical database makes more sense, is more efficient, and is easier to understand. But anyone who recommends a database that is not relational had better be prepared to defend his choice with some powerful reasoning because his management, his customers, and the other programmers on his team are probably going to need a LOT of convincing. I have no particular problem with using other database models in theory. In practice, at least until recently, there were few decent implementations for alternative model databases. That is starting to change with the advent of the so-called NoSQL databases. There are a few models that I really do like; but, there are also a lot of failed models. A large part of the problem was the push towards object databases, which is one of the failed models IMNSHO. Its failure tended to give some of the other database models a bad name. And of course there are many applications where XML really is the best. It excels at representing complex textual documents while still allowing programmatic access to individual items of information. Much agreed. There are many things that XML does very well. It works great for XML-RPC style interfaces.
I prefer it over binary formats for documents. It does suitably for exporting discrete amounts of information. There are, however, a number of things that it does poorly. I don't condone its use for configuration files. I don't condone its use as a data store, and when you have data approaching gigabytes, that is exactly how you are using it. -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
On 27 December 2010 22:05, Ulrich Eckhardt dooms...@knuut.de wrote: What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. What I'm not yet sure is how much overhead that would give me and/or how to keep it low. The idea is to store each instance in a set and after creating a new object I would first look up an equal object in the global set and return that instead, otherwise add the new one. The problem I foresee is that if I define equality as identity, this lookup when creating will never eliminate duplicates. If I only fall back to equality comparison for non-identical objects, I would probably sacrifice most of the gain. If I build a dict mapping between the values and the actual objects, I would have doubled the required memory and uselessly store the same values twice there. The first thing is to deal with the equality check. The way this is generally done is to first do an identity check, then if that fails fall back to an equality check. This gives you a fast path for the normal case, but still gives full equality checks on a slow path. Your assumption of double storage for a dict is somewhat flawed if I understand you correctly. The mapping: (value1, value2, ...) -> my_object(value1, value2, ...) *could* result in value1, value2, ... being created and stored twice (hence the possibility of double storage) and the mapping tuple being stored + your object. However, if the key and value are the same object, there is only a single additional reference being stored (within the dict structure of course). The way you should probably deal with this is to always create one of your objects for doing the lookup. Then your algorithm is:

new_object = my_object(value1, value2, ...)
try:
    canonical = canonical_dict[new_object]
except KeyError:
    canonical = canonical_dict[new_object] = new_object

You'd have to structure your __new__ appropriately to do it there, but it is possible assuming that everything you need for equality testing is done in __new__. If you further want to reduce storage (if it's an issue) you could also canonicalise the values themselves using a similar technique. You could even use the same canonicalisation dictionary so long as you could ensure that none of the different types compare equal (e.g. floats and integers). Note that as an implementation detail the integers -5...256 are already interned, but you can't rely on that (the range has changed over time). Tim Delaney -- http://mail.python.org/mailman/listinfo/python-list
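Tim's lookup pattern can be sketched as a runnable toy; the Value class and intern_value name are hypothetical stand-ins for the OP's objects:

```python
_canonical = {}  # key and value are the *same* object, so only one extra reference

class Value:
    """Toy immutable value object (hypothetical stand-in for the OP's class)."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def __hash__(self):
        return hash((self.a, self.b))

    def __eq__(self, other):
        if self is other:                 # identity fast path
            return True
        if not isinstance(other, Value):  # slow path: full value comparison
            return NotImplemented
        return (self.a, self.b) == (other.a, other.b)

def intern_value(obj):
    # return the canonical instance, registering obj if it is the first one
    try:
        return _canonical[obj]
    except KeyError:
        _canonical[obj] = obj
        return obj

v1 = intern_value(Value(1, 2))
v2 = intern_value(Value(1, 2))
print(v1 is v2)  # True: the second lookup returns the first instance
```

After interning, equal objects really are identical, so `is` comparisons become safe, which is the gain Ulrich was after.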
Re: Trying to parse a HUGE(1gb) xml file
Alan Meyer amey...@yahoo.com wrote: On 12/26/2010 3:15 PM, Tim Harig wrote: I agree with you but, as you say, it has become a de facto standard. As a result, we often need to use it unless there is some strong reason to use something else. This is certainly true. In the rarified world of usenet, we can all bash XML (and I'm certainly front and center of the XML bashing crowd). In the real world, however, it's a necessary evil. Knowing how to work with it (at least to some extent) should be in every software engineer's bag of tricks. The same thing can be said about relational databases. There are applications for which a hierarchical database makes more sense, is more efficient, and is easier to understand. But anyone who recommends a database that is not relational had better be prepared to defend his choice with some powerful reasoning because his management, his customers, and the other programmers on his team are probably going to need a LOT of convincing. This is also true. In the old days, they used to say, "Nobody ever got fired for buying IBM." Relational databases have pretty much gotten to that point. Suits are comfortable with Oracle and MS SqlServer, and even MySQL. If you want to go NoSQL, the onus will be on you to demonstrate that it's the right choice. Sometimes, even when it is the right choice, it's the wrong choice. You typically have a limited amount of influence capital to spend, and many battles to fight. Sometimes it's right to go along with SQL, even if you know it's wrong from a technology point of view, simply because taking the easy way out on that battle may let you devote the energy you need to win more important battles. And, anyway, when your SQL database becomes the bottleneck, you can always go back and say, "I told you so." Trust me, if you're ever involved in an "I told you so" moment, you really want to be on the transmitting end. And of course there are many applications where XML really is the best.
It excels at representing complex textual documents while still allowing programmatic access to individual items of information. Yup. For stuff like that, there really is no better alternative. To go back to my earlier example of <Parental-Advisory>FALSE</Parental-Advisory> using 432 bits to store 1 bit of information, stuff like that doesn't happen in marked-up text documents. Most of the file is CDATA (do they still use that term in XML, or was that an SGML-ism only?). The markup is a relatively small fraction of the data. I'm happy to pay a factor of 2 or 3 to get structured text that can be machine processed in useful ways. I'm not willing to pay a factor of 432 to get tabular data when there's plenty of other much more reasonable ways to encode it. -- http://mail.python.org/mailman/listinfo/python-list
Re: Interning own classes like strings for speed and size?
On Mon, 27 Dec 2010 12:05:10 +0100, Ulrich Eckhardt wrote: What I'm now considering is to only allow a single instance of these objects for each set of values, similar to interned strings. What I would gain is that I could safely compare objects for identity instead of equality. What I'm not yet sure is how much overhead that would give me and/or how to keep it low. The idea is to store each instance in a set and after creating a new object I would first look up an equal object in the global set and return that instead, otherwise add the new one. Try this technique:

>>> class InternedTuple(tuple):
...     _cache = {}
...     def __new__(cls, *args):
...         t = super().__new__(cls, *args)
...         return cls._cache.setdefault(t, t)
...
>>> t1 = InternedTuple((1.0, 2.0))
>>> t2 = InternedTuple((0.0, 0.0))
>>> t3 = InternedTuple((1.0, 2.0))
>>> t1 is t2
False
>>> t1 is t3
True
>>> t1 == t2
False
>>> t1 == t3
True

-- Steven -- http://mail.python.org/mailman/listinfo/python-list
Re: Trying to parse a HUGE(1gb) xml file
On 12/27/2010 4:55 PM, Stefan Behnel wrote: ... From my experience, SAX is only practical for very simple cases where little state is involved when extracting information from the parse events. A typical example is gathering statistics based on single tags - not a very common use case. Anything that involves knowing where in the XML tree you are to figure out what to do with the event is already too complicated. The main drawback of SAX is that the callbacks run into separate method calls, so you have to do all the state keeping manually through fields of the SAX handler instance. My serious advice is: don't waste your time learning SAX. It's simply too frustrating to debug SAX extraction code into existence. Given how simple and fast it is to extract data with ElementTree's iterparse() in a memory efficient way, there is really no reason to write complicated SAX code instead. Stefan I confess that I hadn't been thinking about iterparse(). I presume that clear() is required with iterparse() if we're going to process files of arbitrary length. I should think that this approach provides an intermediate solution. It's more work than building the full tree in memory because the programmer has to do some additional housekeeping to call clear() at the right time and place. But it's less housekeeping than SAX. I guess I've done enough SAX, in enough different languages, that I don't find it that onerous to use. When I need an element stack to keep track of things I can usually re-use code I've written for other applications. But for a programmer that doesn't do a lot of this stuff, I agree, the learning curve with lxml will be shorter and the programming and debugging can be faster. Alan -- http://mail.python.org/mailman/listinfo/python-list
Re: Language Detection Library/Code
On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand anand.shash...@gmail.com wrote: Can anyone suggest a language detection library in python which works on a phrase of say 2-5 words. Generally such libraries work by bi/trigram frequency analysis, which means you're going to have a fairly high error rate with such small phrases. If you're only dealing with a handful of languages it may make more sense to combine an existing library with a simple dictionary lookup model to improve accuracy. Katie -- CoderStack http://www.coderstack.co.uk/perl-jobs-in-london The Software Developer Job Board -- http://mail.python.org/mailman/listinfo/python-list
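As a toy illustration of the trigram idea Katie mentions: the two hand-built word-set profiles below are stand-ins; a real detector uses trigram *frequency* tables trained on large corpora, which is exactly why short phrases are error-prone.

```python
def trigrams(text):
    # pad with spaces so word boundaries produce trigrams too
    text = " %s " % text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}

# toy "profiles" built from a handful of common words per language
profiles = {
    "french": trigrams("le la les et je suis bonjour merci"),
    "german": trigrams("der die das und ich bin hallo danke"),
}

def guess(phrase):
    grams = trigrams(phrase)
    # pick the language whose profile overlaps the phrase's trigrams most
    return max(profiles, key=lambda lang: len(grams & profiles[lang]))

print(guess("ich bin"))  # german
```

With only a 2-word phrase the overlap counts are tiny, so one or two shared trigrams can flip the result; a per-language dictionary lookup on the actual words is a cheap way to break such ties.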
Re: Trying to parse a HUGE(1gb) xml file
On 12/27/2010 6:21 PM, Roy Smith wrote: ... In the old days, they used to say, "Nobody ever got fired for buying IBM." Relational databases have pretty much gotten to that point. That's _exactly_ the comparison I had in mind too. I once worked for a company that made a pitch to a big potential client (the BBC) and I made the mistake of telling the client that I didn't think a relational database was the best for his particular application. We didn't win that contract and I never made that mistake again! Alan -- http://mail.python.org/mailman/listinfo/python-list
Re: __delitem__ feature
On 12/26/2010 11:49 AM, kj wrote: In mailman.302.1293387041.6505.python-l...@python.org Ian Kelly ian.g.ke...@gmail.com writes: On 12/26/2010 10:53 AM, kj wrote: P.S. If you uncomment the commented-out line, and comment out the last line of the __init__ method (which installs self._delitem as self.__delitem__) then *all* the deletion attempts invoke the __delitem__ method, and are therefore blocked. FWIW. Because subclasses of builtins only check the class __dict__ for special method overrides, not the instance __dict__. How do you know this? From memory, although it seems I remembered it slightly wrong; it's the way new-style classes work in general, not anything to do with builtins in particular. Is this documented? Yes, as others have pointed out. Or is this a case of Monday-night quarterbacking? Do you mean Monday-morning quarterbacking? Either way, I don't know what you mean by that in this context. -- http://mail.python.org/mailman/listinfo/python-list
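Ian's point about special method lookup on new-style classes is easy to demonstrate; the class names below are made up for the demo:

```python
class Plain(dict):
    pass

p = Plain()
p["x"] = 1

# attach a would-be override to the *instance*; the del statement ignores it,
# because special methods are looked up on type(p), not on p itself
p.__delitem__ = lambda key: (_ for _ in ()).throw(TypeError("blocked"))

del p["x"]            # still uses dict.__delitem__ found on the class
print("x" in p)       # False: the deletion went through

class Blocking(dict):
    # overriding on the *class* is what actually takes effect
    def __delitem__(self, key):
        raise TypeError("deletion disabled")

b = Blocking()
b["y"] = 2
try:
    del b["y"]
except TypeError as exc:
    print(exc)        # deletion disabled
```

This is the behaviour kj observed: installing a method on self in __init__ has no effect on the `del` statement, while a class-level override blocks it.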
Re: Language Detection Library/Code
On Tue, Dec 28, 2010 at 6:03 AM, Katie T ka...@coderstack.co.uk wrote: On Mon, Dec 27, 2010 at 7:10 PM, Shashwat Anand anand.shash...@gmail.com wrote: Can anyone suggest a language detection library in python which works on a phrase of say 2-5 words. Generally such libraries work by bi/trigram frequency analysis, which means you're going to have a fairly high error rate with such small phrases. If you're only dealing with a handful of languages it may make more sense to combine an existing library with a simple dictionary lookup model to improve accuracy. Katie In fact I'm dealing with very few languages - German, French, Italian, Portuguese and Russian. I read papers mentioning bi/trigram frequency but was unable to find any library. 'guess-language' doesn't perform at all. The cld (Compact Language Detection) module of Google Chrome performs well but it is not a standalone library (I hope someone ports it). Regarding the dictionary lookup + n-gram approach, I didn't quite understand what you wanted to say. -- http://mail.python.org/mailman/listinfo/python-list
Re: Partition Recursive
# parse_url11.py
# devpla...@gmail.com
# 2010-12 (Dec)-27
# A brute force ugly hack from a novice programmer.
# You're welcome to use the code, clean it up, make positive suggestions
# for improvement. Parse a url string into a list using a generator.

# special_item meaning = ; ? : @ = & # . / //
special_item = [";", "?", ":", "@", "=", "&", "#", ".", "/", "//"]
# drop urls with obviously bad formatting - NOT IMPLEMENTED
drop_item = ["|", "localhost", "..", "///"]
ignore_urls_containing = ["php", "cgi"]

def url_parser_generator(url):
    len_text = len(url)
    index = 0
    start1 = 0  # required here if url contains ONLY specials
    start2 = 0  # required here if url contains ONLY non specials
    while index < len_text:
        # LOOP1 == Get an item in the special_item list; can be any length
        if url[index] in special_item:
            start1 = index
            inloop1 = True
            while inloop1:
                if inloop1:
                    if url[start1:index+1] in special_item:
                        #print "[", start1, ":", index+1, "] = ", url[start1:index+1]
                        inloop1 = True
                    else:  # not in ANYMORE, but was in special_item
                        #print "[", start1, ":", index, "] = ", url[start1:index]
                        yield url[start1:index]
                        start1 = index
                        inloop1 = False
                if inloop1:
                    if index < len_text - 1:
                        index = index + 1
                    else:
                        #yield url[start1:index]  # NEW
                        inloop1 = False
        elif url[index] in drop_item:
            # not properly implemented at all
            raise NotImplementedError(
                "Processing items in the drop_item list is not "
                "implemented.", url[index])
        elif url[index] in ignore_urls_containing:
            # not properly implemented at all
            raise NotImplementedError(
                "Processing items in the ignore_urls_containing list "
                "is not implemented.", url[index])
        # LOOP2 == Get any item not in the special_item list; can be any length
        elif not url[index] in special_item:
            start2 = index
            inloop2 = True
            while inloop2:
                if inloop2:
                    #if not url[start2:index+1] in special_item:  #- doesn't work
                    if not url[index] in special_item:
                        #print "[", start2, ":", index+1, "] = ", url[start2:index+1]
                        inloop2 = True
                    else:  # not in ANYMORE, but item was not in special_item before
                        #print "[", start2, ":", index, "] = ", url[start2:index]
                        yield url[start2:index]
                        start2 = index
                        inloop2 = False
                if inloop2:
                    if index < len_text - 1:
                        index = index + 1
                    else:
                        #yield url[start2:index]  # NEW
                        inloop2 = False
        else:
            print url[index], "Not Implemented"  # should not get here
            index = index + 1
        if index >= len_text - 1:
            break
    # Process any remaining part of URL and yield it to caller.
    # Don't know if last item in url is a special or non special.
    # Used start1 and start2 instead of start and
    # used inloop1 and inloop2 instead of inloop
    # to help debug, as using just start and inloop can be
    # harder to track in a generator.
    if start1 >= start2:
        start = start1
    else:
        start = start2
    yield url[start: index+1]

def parse(url):
    mylist = []
    words = url_parser_generator(url)
    for word in words:
        mylist.append(word)
        #print word
    return mylist

def test():
    urls = {
        0: (True, "http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        1: (True, "/http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        2: (True, "//http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        3: (True, "///http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        4: (True, "/http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition/"),
        5: (True, "//http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition//"),
        6: (True, "///http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition///"),
        7: (True, "/#/http:///#docs.python..org/dev//library/stdtypes./html??highlight=p=partition#str.partition///"),
        8: (True, "httpdocspythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        9: (True, "httpdocs.pythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        10:
Re: Digitally Signing a XML Document (using SHA1+RSA or SHA1+DSA)
On Tue, 2010-12-28 at 03:25 +0530, Anurag Chourasia wrote: Hi All, I have a requirement to digitally sign an XML Document using SHA1+RSA or SHA1+DSA. Could someone give me a lead on a library that I can use to fulfill this requirement? http://stuvel.eu/rsa Never used it though. The XML Document has values such as

<RSASK>-----BEGIN RSA PRIVATE KEY-----
MIIBOgIBAAJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqguw76g/jmeO6f4i31rDLVQ
n7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQMCQQCOd2lLpgRm6esMblO18WOG
3h8oCNcaydfUa1QmaX0apHlDFnI7UDXpYaHp2VL9gvtSJT5L3ZASMzxRPXJSvzcT
AiEA/16jQh18BAD4q3yk1gKw19I8OuJOYAxFYX9noCEFWUMCIQDWOiYfPtxK3A1s
AFARsDnnHTL4FbRPpiZ79vP+VgqojwIhAKo/F4Fo/VgApceobeQByzqMKCdBiZVd
g5ZU78AWA5DXAiEAjtFuv389hz1eSAA1YSAmmhN3UA54NRlu/U9NVDlccF8CIBkc
Z52oGxy/skwVwI5TBcB1YqXJTT47/6/hTAVMTwaA
-----END RSA PRIVATE KEY-----</RSASK>

<RSAPUBK>-----BEGIN PUBLIC KEY-----
MFowDQYJKoZIhvcNAQEBBQADSQAwRgJBANWzHfF5Bppe4JKlfZDqFUpNLrwNQqgu
w76g/jmeO6f4i31rDLVQn7sYilu65C8vN+qnEGnPB824t/A3yfMu1G0CAQM=
-----END PUBLIC KEY-----</RSAPUBK>

Is this any kind of standard or just something someone made up? Is there a namespace for the document? It seems quite odd that the document contains a *private* key. If all you need to do is parse the document to retrieve the values, that seems straightforward enough. And the XML also has another node that has a Public Key with Modulus and Exponent etc. that I apparently need to utilize.

<RSAPK>
<M>M1bMd8XkGml7gkqV9kOoVSk0uvA1CqC7DvqD+OZ47p/iLfWsMtVCfuxiKW7rkLy836qcQac8Hzbi38DfJ8y7UbQ==</M>
<E>EAw==</E>
</RSAPK>

I am a little thin on this concept and hoping you could guide me to a library/documentation that I could utilize. -- http://mail.python.org/mailman/listinfo/python-list
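Assuming the element names from the posting (an RSAPK node with M and E children), the value-retrieval step Adam calls straightforward might look like this with the stdlib ElementTree; the document below is a trimmed stand-in, not the real file:

```python
import xml.etree.ElementTree as ET

# a minimal stand-in document with the structure described in the post
doc = """
<keys>
  <RSAPK>
    <M>M1bMd8XkGml7gkqV9kOoVSk0uvA1CqC7DvqD+OZ47p/i</M>
    <E>EAw==</E>
  </RSAPK>
</keys>
"""

root = ET.fromstring(doc)
# findtext() takes a path relative to the element it is called on
modulus = root.findtext("RSAPK/M")
exponent = root.findtext("RSAPK/E")
print(exponent)  # EAw==
```

The extracted base64 strings would then be decoded and handed to whatever RSA library does the actual signing; that part is outside the stdlib.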
How to programmatically exit from wsgi's serve_forever() loop
Is it possible to programmatically exit from the wsgiref's serve_forever() loop? I tried the following, all without success:

httpd.server_close()
httpd.shutdown()
sys.exit(1)
os._exit(1) (shouldn't this always abort an application?)
raise KeyboardInterrupt (Ctrl+Break from console works)

Thanks, Malcolm -- http://mail.python.org/mailman/listinfo/python-list
Re: Trying to parse a HUGE(1gb) xml file
By the way Stefan, please don't take any of my comments as complaints. I use lxml more and more in my work. It's fast, functional and pretty elegant. I've written a lot of code on a lot of projects in my 35 year career but I don't think I've written anything anywhere near as useful to anywhere near as many people as lxml. Thank you very much for writing lxml and contributing it to the community. Alan -- http://mail.python.org/mailman/listinfo/python-list
Re: Language Detection Library/Code
Hi, I already developed a language detection library in Python. Here is the link: http://code.google.com/p/langdet/ With Regards, Santhosh V.Kumar -- http://mail.python.org/mailman/listinfo/python-list
Re: Trying to parse a HUGE(1gb) xml file
Roy Smith, 28.12.2010 00:21: To go back to my earlier example of <Parental-Advisory>FALSE</Parental-Advisory> using 432 bits to store 1 bit of information, stuff like that doesn't happen in marked-up text documents. Most of the file is CDATA (do they still use that term in XML, or was that an SGML-ism only?). The markup is a relatively small fraction of the data. I'm happy to pay a factor of 2 or 3 to get structured text that can be machine processed in useful ways. I'm not willing to pay a factor of 432 to get tabular data when there's plenty of other much more reasonable ways to encode it. If the above only appears once in a large document, I don't care how much space it takes. If it appears all over the place, it will compress down to a couple of bits, so I don't care about the space, either. It's readability that counts here. Try to reverse engineer a binary format that stores the above information in 1 bit. Stefan -- http://mail.python.org/mailman/listinfo/python-list
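Stefan's compression argument is easy to check with the standard library's zlib; the repetition factor here is arbitrary:

```python
import zlib

# one bit of information, stated 1000 times over in verbose XML
data = b"<Parental-Advisory>FALSE</Parental-Advisory>" * 1000
packed = zlib.compress(data)
# the repeated tag text deflates to a tiny fraction of its original size
print(len(data), len(packed))
```

Highly repetitive markup is almost free once the stream is compressed, which is why the per-occurrence overhead matters far less than it first appears.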
Re: Trying to parse a HUGE(1gb) xml file
Alan Meyer, 28.12.2010 03:18: By the way Stefan, please don't take any of my comments as complaints. I don't. After all, this discussion is more about the general data format than the specific tools. I use lxml more and more in my work. It's fast, functional and pretty elegant. I've written a lot of code on a lot of projects in my 35 year career but I don't think I've written anything anywhere near as useful to anywhere near as many people as lxml. Thank you very much for writing lxml and contributing it to the community. Thanks, I'm happy to read that. You're welcome. Note that lxml also owes a lot to Fredrik Lundh for designing ElementTree and to Martijn Faassen for starting to reimplement it on top of libxml2 (and choosing the name :). Stefan -- http://mail.python.org/mailman/listinfo/python-list
Re: Trying to parse a HUGE(1gb) xml file
Alan Meyer, 28.12.2010 01:29: On 12/27/2010 4:55 PM, Stefan Behnel wrote: From my experience, SAX is only practical for very simple cases where little state is involved when extracting information from the parse events. A typical example is gathering statistics based on single tags - not a very common use case. Anything that involves knowing where in the XML tree you are to figure out what to do with the event is already too complicated. The main drawback of SAX is that the callbacks run into separate method calls, so you have to do all the state keeping manually through fields of the SAX handler instance. My serious advice is: don't waste your time learning SAX. It's simply too frustrating to debug SAX extraction code into existence. Given how simple and fast it is to extract data with ElementTree's iterparse() in a memory efficient way, there is really no reason to write complicated SAX code instead. I confess that I hadn't been thinking about iterparse(). I presume that clear() is required with iterparse() if we're going to process files of arbitrary length. I should think that this approach provides an intermediate solution. It's more work than building the full tree in memory because the programmer has to do some additional housekeeping to call clear() at the right time and place. But it's less housekeeping than SAX. The iterparse() implementation in lxml.etree allows you to intercept on a specific tag name, which is especially useful for large XML documents that are basically an endless sequence of (however deeply structured) top-level elements - arguably the most common format for gigabyte sized XML files. So what I usually do here is to intercept on the top level tag name, clear() that tag after use and leave it dangling around, like this:

for _, element in ET.iterparse(source, tag='toptagname'):
    # ... work on the element and its subtree
    element.clear()

That allows you to write simple in-memory tree handling code (iteration, XPath, XSLT, whatever), while pushing the performance up (compared to ET's iterparse that returns all elements) and keeping the total amount of memory usage reasonably low. Even a series of several hundred thousand empty top level tags don't add up to anything that would truly hurt a decent machine. In many cases where I know that the XML file easily fits into memory anyway, I don't even do any housekeeping at all. And the true advantage is: if you ever find that it's needed because the file sizes grow beyond your initial expectations, you don't have to touch your tested and readily debugged data extraction code, just add a suitable bit of cleanup code, or even switch from the initial all-in-memory parse() solution to an event-driven iterparse()+cleanup solution. I guess I've done enough SAX, in enough different languages, that I don't find it that onerous to use. When I need an element stack to keep track of things I can usually re-use code I've written for other applications. But for a programmer that doesn't do a lot of this stuff, I agree, the learning curve with lxml will be shorter and the programming and debugging can be faster. I'm aware that SAX has the advantage of being available for more languages. But if you are in the lucky position to use Python for XML processing, why not just use the tools that it makes available? Stefan -- http://mail.python.org/mailman/listinfo/python-list
ANN : PySWITCH Release – 0.1alpha
Hi All, I am glad to announce the first alpha release of PySWITCH. http://pyswitch.sf.net The idea of PySWITCH is to offer a complete library to Python and Twisted programmers for interacting with FreeSWITCH using the EventSocket interface. The target is to cover all FreeSWITCH API commands and Dialplan tools. PySWITCH handles all the low level details of executing FreeSWITCH commands, so the programmer can concentrate on quickly building FreeSWITCH applications. As an example, the API functions offered by PySWITCH often execute many FreeSWITCH commands under the hood and finally return the desired result. Suppose you execute a background job: the PySWITCH API will automatically wait for and catch the backgroundjob event, parse the result and fire the deferred. The current release covers a good amount of API commands and a few Dialplan tools. The protocol communication issues are ironed out. It has a nice event callback interface. I'll present its usage in a couple of tutorials soon. -- Thanks Regards, Godson Gera http://godson.in -- http://mail.python.org/mailman/listinfo/python-list
Re: How to programmatically exit from wsgi's serve_forever() loop
On 12/27/2010 6:05 PM, pyt...@bdurham.com wrote: Is it possible to programmatically exit from the wsgiref's serve_forever() loop? I tried the following, all without success: httpd.server_close() httpd.shutdown() sys.exit(1) os._exit(1) (shouldn't this always abort an application?) raise KeyboardInterupt (Ctrl+Break from console works) help(wsgiref.simple_server.WSGIServer.serve_forever) Help on method serve_forever in module SocketServer: serve_forever(self, poll_interval=0.5) unbound wsgiref.simple_server.WSGIServer method Handle one request at a time until shutdown. Polls for shutdown every poll_interval seconds. Ignores self.timeout. If you need to do periodic tasks, do them in another thread. help(wsgiref.simple_server.WSGIServer.shutdown) Help on method shutdown in module SocketServer: shutdown(self) unbound wsgiref.simple_server.WSGIServer method Stops the serve_forever loop. Blocks until the loop has finished. This must be called while serve_forever() is running in another thread, or it will deadlock. Did you try: import threading threading.Thread(target=httpd.shutdown).start() Cheers, Ian -- http://mail.python.org/mailman/listinfo/python-list
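A minimal self-contained version of Ian's suggestion, using the stdlib wsgiref server with a trivial WSGI app; the key point is that shutdown() must be called from a thread other than the one running serve_forever():

```python
import threading
from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

httpd = make_server("127.0.0.1", 0, app)  # port 0: let the OS pick a free port
server_thread = threading.Thread(target=httpd.serve_forever)
server_thread.start()

# ... later, from any *other* thread than the one running serve_forever():
httpd.shutdown()       # returns once the serve_forever() loop has exited
server_thread.join()
httpd.server_close()
print("server stopped cleanly")
```

Calling shutdown() from within the serving thread deadlocks, which is exactly what the help text warns about and why the original attempts failed.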
[issue10771] descriptor protocol documentation has two different definitions of owner class
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: I agree that the owner terminology is imprecise. Will work on a doc fix when I get a chance. -- assignee: d...@python -> rhettinger nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10771 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10774] test_logging leaves temp files
Vinay Sajip vinay_sa...@yahoo.co.uk added the comment: Fix checked into py3k (r87512). -- resolution: -> fixed status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10774 ___
[issue10626] Bad interaction between test_logging and test_concurrent_futures
Vinay Sajip vinay_sa...@yahoo.co.uk added the comment: The reason for the bad interaction is that some of the tests in test_logging disable all existing loggers (due to the configuration tests - disabling of existing loggers is explicitly tested for), but as a side effect this also disabled the concurrent.futures logger. I've made a change to test_logging which preserves the disabled state of all existing loggers across tests, and now all is well when testing regrtest.py test_concurrent_futures test_logging test_concurrent_futures after applying Brian's patch of 24 Dec 2010. The change has been checked into py3k (r87513). However, this raises the wider issue of other loggers in stdlib and the effect on them of logging configuration calls. I'll raise this on python-dev for discussion. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10626 ___
[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly
New submission from Goffi go...@goffi.org: G'day, while translating my software to French, I realised that minidom's writexml method doesn't handle the encoding parameter correctly: it changes the header of the resulting XML, but not the encoding itself (which it should, according to the documentation: http://docs.python.org/library/xml.dom.minidom.html). The given example doesn't work with writexml; but if I save it myself using toxml's encoding parameter (like in the commented line), it works as expected. Anyway, it would be better if minidom could handle unicode strings directly. -- components: XML files: test_minidom.py messages: 124709 nosy: Goffi priority: normal severity: normal status: open title: minidom Node.writexml method doesn't manage encoding parameter correctly versions: Python 2.6 Added file: http://bugs.python.org/file20174/test_minidom.py ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10781 ___
[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly
Martin v. Löwis mar...@v.loewis.de added the comment: The documentation is incorrect; writexml does not support an encoding parameter. Only Document nodes support the encoding parameter in writexml, and it is intentional that its only effect is to fill out the XML declaration. I don't understand the last sentence in your report: what is it that you want to see supported, and how is that related to this issue? -- assignee: -> d...@python components: +Documentation nosy: +d...@python, loewis ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10781 ___
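A short Python 3 sketch of the behaviour Martin describes: on a Document node, writexml's encoding argument only fills in the XML declaration, while toxml(encoding=...) is the call that actually encodes (the sample element is made up):

```python
import io
from xml.dom.minidom import parseString

doc = parseString("<greeting>h\u00e9</greeting>")

# writexml(encoding=...) writes the declaration it is told to, but the
# writer still receives str -- the text itself is not encoded.
buf = io.StringIO()
doc.writexml(buf, encoding="utf-8")
text = buf.getvalue()
print(text)

# toxml(encoding=...) returns the document encoded to bytes.
data = doc.toxml(encoding="utf-8")
print(type(data))
```

So a header claiming utf-8 and data actually written in some other encoding can easily diverge, which is exactly the mismatch reported.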
[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly
Goffi go...@goffi.org added the comment: Thanks for your quick reply. The last sentence has nothing to do with the report; it was just a general remark that it would be nice if minidom could support unicode strings directly. Should I send a mail to d...@python.org to report the doc issue, or is this one sufficient? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10781 ___
[issue8889] test_support.transient_internet fails on Freebsd because socket has no attribute EAI_NODATA
R. David Murray rdmur...@bitdance.com added the comment: I never forward-ported this, but it was fixed in a different way in python3 during a complete rewrite of transient_internet for other reasons. -- resolution: -> fixed stage: commit review -> committed/rejected status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8889 ___
[issue8898] The email package should defer to the codecs module for all aliases
R. David Murray rdmur...@bitdance.com added the comment: Too late for 3.2, will implement for 3.3. -- title: The email package should defer to the codecs module for all aliases -> The email package should defer to the codecs module for all aliases versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8898 ___
[issue1685453] email package should work better with unicode
R. David Murray rdmur...@bitdance.com added the comment: Now that we are primarily focused on Python3 development, collecting unicode issues is not really all that useful (at least not to me, and I'm currently doing the email maintenance), so I'm closing this. All the relevant issues are assigned to me anyway, so I'll be dealing with them by and by. -- dependencies: -Add decode_header_as_string method to email.utils, Add utf8 alias for email charsets, Unicode email address helper, email package and Unicode strings handling, email.Header (via add_header) encodes non-ASCII content incorrectly, email.Header encode() unicode P2.6, email.header unicode fix, email.parser: impossible to read messages encoded in a different encoding, email/base64mime.py cannot work, email/charset.py convert() patch, smtplib is broken in Python3, unicode in email.MIMEText and email/Charset.py resolution: -> out of date stage: unit test needed -> committed/rejected status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1685453 ___
[issue10781] minidom Node.writexml method doesn't manage encoding parameter correctly
Martin v. Löwis mar...@v.loewis.de added the comment: "The last sentence has nothing to do with the report, it was just a general remark that it would be nice if minidom could support unicode string directly." minidom most certainly supports Unicode directly. All element names, attribute names, and text nodes carry Unicode objects. "Should I send a mail to d...@python.org to report the doc issue, or is this one sufficient?" This one is sufficient. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10781 ___
[issue1243730] Big speedup in email message parsing
R. David Murray rdmur...@bitdance.com added the comment: Since this is a performance hack and is considerably invasive of the feedparser code (and needs updating), I'm deferring it to 3.3. -- stage: unit test needed -> patch review versions: +Python 3.3 -Python 2.7, Python 3.1, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1243730 ___
[issue3244] multipart/form-data encoding
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue3244 ___
[issue10764] sysconfig and alternative implementations
Changes by Arfrever Frehtes Taifersar Arahesis arfrever@gmail.com: -- nosy: +Arfrever ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10764 ___
[issue1162477] Parsing failures in parsedate_tz
R. David Murray rdmur...@bitdance.com added the comment: Somehow I missed this in my pre-beta feature request review :( -- versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1162477 ___
[issue9864] email.utils.{parsedate, parsedate_tz} should have better return types
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9864 ___
[issue1043706] External storage protocol for large email messages
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1043706 ___
[issue10764] sysconfig and alternative implementations
Tarek Ziadé ziade.ta...@gmail.com added the comment: Yes, that's what we said we would do; it was the second step after the extraction of sysconfig from distutils. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10764 ___
[issue6533] Make test_xmlrpc_net functional in the absence of time.xmlrpc.com
R. David Murray rdmur...@bitdance.com added the comment: The skip was added and the service is back and has been for a while, so I'm closing this, but see also issue 6027. -- resolution: -> out of date stage: patch review -> committed/rejected status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6533 ___
[issue10753] request_uri method of wsgiref module does not support RFC1808 params.
Senthil Kumaran orsent...@gmail.com added the comment: I agree that semicolon-separated segments (params) can be in the PATH portion of the URL. I was trying to find out how a path;params would be useful in wsgiref request_uri's PATH_INFO variable, wherein I assumed PATH_INFO should be a file-system path or a method name. After doing a bit of study, I find that ';' can be part of PATH_INFO in wsgiref-compliant servers. I found a couple of bugs related to issues where ';' in PATH_INFO is not handled properly in other systems - http://bit.ly/g4UHhX So I think we can have ';' as a safe character so that it is prevented from being quoted. Also, RFC 3986 in Section 3.3 says that ';', '=' and ',' can be considered safe in the PATH component. Should we include those too? -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10753 ___
[issue10782] Not possible to cross-compile due to poor detection of %lld support in printf
New submission from Ben Gamari bgam...@gmail.com: Configure.in assumes that %lld is not supported by printf if cross-compiling. This causes build errors in pyport.h: In file included from Include/Python.h:58:0, from Parser/parser.c:8: Include/pyport.h:243:13: error: #error This platform's pyconfig.h needs to define PY_FORMAT_LONG_LONG ... What is one supposed to do about this, short of changing the configure script to assume support by default? -- components: Build messages: 124722 nosy: bgamari priority: normal severity: normal status: open title: Not possible to cross-compile due to poor detection of %lld support in printf type: compile error versions: Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10782 ___
[issue740495] API enhancement: poplib.MailReader()
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue740495 ___
[issue634412] RFC 2112 in email package
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.3 -Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue634412 ___
[issue795081] email.Message param parsing problem II
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.3 -Python 2.7 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue795081 ___
[issue1025395] email.Utils.parseaddr fails to parse valid addresses
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.1, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1025395 ___
[issue8769] Straightforward usage of email package fails to round-trip
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.1, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8769 ___
[issue9967] encoded_word regular expression in email.header.decode_header()
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.1, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9967 ___
[issue9298] binary email attachment issue with base64 encoding
Changes by R. David Murray rdmur...@bitdance.com: -- versions: +Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9298 ___
[issue10753] request_uri method of wsgiref module does not support RFC1808 params.
R. David Murray rdmur...@bitdance.com added the comment: If the RFC says they are safe it seems like we should include them in the safe list. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10753 ___
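The effect of widening the safe list can be sketched with urllib.parse.quote, the quoting primitive involved; the example path with params is made up:

```python
from urllib.parse import quote

path = "/map;zoom=5,3"

# With the default safe set ("/"), the RFC 3986 sub-delimiters
# ';' '=' ',' are percent-encoded.
print(quote(path))

# Adding ";=," to safe -- as proposed for request_uri's handling of
# PATH_INFO -- leaves path parameters intact.
print(quote(path, safe="/;=,"))
```

The first call mangles the params into %3B/%3D/%2C escapes; the second round-trips the path unchanged, which is what a params-aware PATH_INFO needs.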
[issue1379416] email.Header encode() unicode P2.6
R. David Murray rdmur...@bitdance.com added the comment: Committed to 2.7 in r87515. On second thought there's no reason to forward-port the test, because Python3 doesn't have the equivalent type-promotion issues. -- nosy: -BreamoreBoy resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed versions: -Python 3.1, Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue1379416 ___
[issue8719] buildbot: segfault on FreeBSD (signal 11)
STINNER Victor victor.stin...@haypocalc.com added the comment: https://github.com/haypo/faulthandler/wiki can be tried on this buildbot to get more information about this issue. But the module has to be installed on this host. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8719 ___
[issue6210] Exception Chaining missing method for suppressing context
Changes by Ethan Furman et...@stoneleaf.us: -- nosy: +stoneleaf ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6210 ___
[issue9893] Usefulness of the Misc/Vim/ files?
Brett Cannon br...@python.org added the comment: But if you have a local copy of the Vim files from the community, what is preventing you from editing them for new keywords and sending a patch to the maintainer so that the rest of the community is brought up to speed that much faster? I suspect that not many people beyond core devs use the Misc/Vim file, while more people in the community use the vim.org files. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue9893 ___
[issue10783] struct.pack() and Unicode strings
New submission from David Beazley d...@dabeaz.com: Is the struct.pack() function supposed to automatically encode Unicode strings into binary? For example: struct.pack("10s", "Jalape\u00f1o") b'Jalape\xc3\xb1o\x00' This is Python 3.2b1. -- components: Library (Lib) messages: 124727 nosy: dabeaz priority: normal severity: normal status: open title: struct.pack() and Unicode strings type: behavior versions: Python 3.2 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10779] Change filename encoding to FS encoding in PyErr_WarnExplicit()
STINNER Victor victor.stin...@haypocalc.com added the comment: Fixed by r87517. -- resolution: -> fixed status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10779 ___
[issue7056] regrtest runtest_inner calls findtestdir unnecessarily
R. David Murray rdmur...@bitdance.com added the comment: Committed (redid, actually) 2nd patch in r87516. I may or may not backport it. -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed versions: -Python 2.6, Python 2.7, Python 3.1 ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue7056 ___
[issue10783] struct.pack() and Unicode strings
David Beazley d...@dabeaz.com added the comment: Note: This is what happens in Python 2.6.4: import struct struct.pack("10s", u"Jalape\u00f1o") Traceback (most recent call last): File "<stdin>", line 1, in <module> struct.error: argument for 's' must be a string -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10778] decoding_fgets() (tokenizer.c) decodes the filename from the wrong encoding
STINNER Victor victor.stin...@haypocalc.com added the comment: Fixed by r87518. -- resolution: -> fixed status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10778 ___
[issue10783] struct.pack() and Unicode strings
David Beazley d...@dabeaz.com added the comment: Hmmm. Well, the docs seem to say that it's allowed and that it will be encoded as UTF-8. Given the treatment of Unicode/bytes elsewhere in Python 3, all I can say is that this behavior is rather surprising. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10783] struct.pack() and Unicode strings
R. David Murray rdmur...@bitdance.com added the comment: But clearly intentional, and now enshrined in released code. -- nosy: +mark.dickinson, r.david.murray resolution: -> invalid stage: -> committed/rejected status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue4212] email.LazyImporter does not use absolute imports
R. David Murray rdmur...@bitdance.com added the comment: LazyImporter isn't used in Python3. Without someone motivated to propose a patch this isn't going to be changed, so I'm closing the issue. -- resolution: -> wont fix stage: -> committed/rejected status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue4212 ___
[issue6210] Exception Chaining missing method for suppressing context
Ethan Furman et...@stoneleaf.us added the comment: I like MRAB's suggestion best. MRAB wrote: "Suggestion: an explicit 'raise' in the exception handler excludes the context, but if you want to include it then 'raise with'. For example: # Exclude the context try: command_dict[command]() except KeyError: raise CommandError("Unknown command") # Include the context try: command_dict[command]() except KeyError: raise with CommandError("Unknown command")" I think we can even strike off the verbiage in the exception handler... that way, raise always does the same thing -- raise KeyError will raise a KeyError, always, not sometimes a KeyError and sometimes a KeyError nested in a WhatEverError. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6210 ___
[issue10782] Not possible to cross-compile due to poor detection of %lld support in printf
Roumen Petrov bugtr...@roumenpetrov.info added the comment: Use config.cache to set ac_cv_have_long_long_format. -- nosy: +rpetrov ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10782 ___
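Roumen's workaround can be sketched as a hypothetical cross-build invocation; the cache variable name comes from the configure check in the report, while the target triplet and file paths are placeholders:

```shell
# Pre-seed the autoconf cache so configure does not need to *run* a
# cross-compiled test program to probe printf's %lld support.
echo 'ac_cv_have_long_long_format=yes' > config.cache

# Example cross-compile invocation (triplet is illustrative only).
./configure --host=arm-linux-gnueabi --build=x86_64-linux-gnu \
    --cache-file=config.cache
```

Pre-seeding the cache is the standard autoconf escape hatch for any check that falls back to a pessimistic default when cross-compiling.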
[issue8719] buildbot: segfault on FreeBSD (signal 11)
David Bolen db3l@gmail.com added the comment: Wouldn't that module have to be put into the actual source tree, since the tests run beneath the interpreter/libraries that are built for the test? That may be what you meant, but "installed on this host" made me think I could do something external on the buildbot, which I don't think would work given that the module has to be called from within the tests themselves. -- David -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8719 ___
[issue6210] Exception Chaining missing method for suppressing context
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: I agree with the OP that we need a way to either suppress chaining or have it turned off by default. A person writing an exception handler should have control over what the user sees. -- nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6210 ___
[issue10783] struct.pack() and Unicode strings
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Can we at least offer an optional choice of encoding? -- nosy: +rhettinger ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue6210] Exception Chaining missing method for suppressing context
Changes by Matthew Barnett pyt...@mrabarnett.plus.com: -- nosy: +mrabarnett ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6210 ___
[issue8719] buildbot: segfault on FreeBSD (signal 11)
Changes by Mark Dickinson dicki...@gmail.com: -- nosy: +mark.dickinson ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue8719 ___
[issue10783] struct.pack() and Unicode strings
David Beazley d...@dabeaz.com added the comment: Why is it even encoding at all? Almost every other part of Python 3 forces you to be explicit about bytes/string conversion. For example: struct.pack("10s", x.encode('utf-8')) Given that automatic conversion is documented, it's not clear what can be done at this point. However, there are very few other parts of Python 3 that perform implicit string-byte conversions like this (at least that I know of off-hand). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10783] struct.pack() and Unicode strings
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: Many of these kinds of decisions were made quickly, haphazardly, and with almost no discussion, and were made by contributors who were new to Python core development (not familiar with the API norms). Given the rat's nest of bytes/text problems in Py3.0 and Py3.1, I think it is fair game to fix it now. The APIs have not been shaken out and battle-tested through widespread adoption, so it was fair to expect that the first experienced user to come along would find these rough patches. ISTM, this should get fixed. The most innocuous way to do it is to add a warning for the implicit conversion. That way, any existing 3.x code (probably precious little) would continue to run. Another option is to just finish the job by adding an encoding parameter that defaults to utf-8. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10783] struct.pack() and Unicode strings
Raymond Hettinger rhettin...@users.sourceforge.net added the comment: A possible answer to "why is this encoding at all" is that it was probably meant to ease transitioning code from Python 2.x, where strings were usually ASCII and it would make no difference in output if encoded in utf-8. The 2-to-3 fixer was good at handling name changes but not bytes/text issues. That is just a guess at what the developer may have been thinking. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10783] struct.pack() and Unicode strings
David Beazley d...@dabeaz.com added the comment: I encountered this issue in the context of distributed computing/interprocess communication involving binary-encoded records (and encoding/decoding such records using struct). At its core, this is all about I/O--something where encodings and decodings matter a lot. Frankly, it was quite surprising that a unicode string would silently pass through struct and turn into bytes. IMHO, the fact that this is even possible encourages a sloppy usage of struct that favors programming convenience over correctness--something that's only going to end badly for the poor soul who passes non-ASCII characters into struct without knowing it. A default encoding might be okay as long as it was set to something like ASCII or Latin-1 (not UTF-8). At least then you'd get an encoding error for characters that don't fit into a byte. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
[issue10780] Fix filename encoding in PyErr_SetFromWindowsErrWithFilename() (and PyErr_SetExcFromWindowsErrWithFilename())
STINNER Victor victor.stin...@haypocalc.com added the comment: Fixed by r87519. -- resolution: -> fixed status: open -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10780 ___
[issue10783] struct.pack() and Unicode strings
David Beazley d...@dabeaz.com added the comment: Actually, here's another one of my favorite examples: import struct struct.pack("s", "\xf1") b'\xc3' Not only does this not encode the correct value, it doesn't even encode the entire UTF-8 encoding (just the first byte of it). Like I said, pity the poor bastard who puts something like that in their code and spends a whole day trying to figure out where in the hell '\xf1' magically got turned into '\xc3'. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___
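For comparison, being explicit sidesteps both surprises in this thread: encode first, so the count in the format string applies to bytes you can actually see, and the pack/unpack round trip stays symmetrical (Python 3 syntax; the sample string matches the earlier examples):

```python
import struct

text = "Jalape\u00f1o"            # 8 characters, 9 bytes in UTF-8
payload = text.encode("utf-8")    # encode explicitly, *then* pack

packed = struct.pack("10s", payload)  # the 9 bytes are padded to 10
print(packed)

# Unpack returns exactly the bytes that were packed; decoding is again
# an explicit step, with the padding stripped first.
(raw,) = struct.unpack("10s", packed)
print(raw.rstrip(b"\x00").decode("utf-8"))
```

With the encode step visible, a non-ASCII character can never silently lose bytes to the format's length count.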
[issue2636] Regexp 2.7 (modifications to current re 2.2.2)
Jacques Grove jacq...@tripitinc.com added the comment: Testing issue2636-20101224.zip: Nested modifiers seem to hang the regex compilation when used in a non-capturing group, e.g.: re.compile("(?:(?i)foo)") or re.compile("(?:(?u)foo)") No problem on the stock Python 2.6.5 regex engine. The unnested version of the same regex compiles fine. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue2636 ___
[issue6210] Exception Chaining missing method for suppressing context
Antoine Pitrou pit...@free.fr added the comment: "A person writing an exception handler should have control over what the user sees." There is already support for this in the traceback module (see the chain parameter to various funcs). -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue6210 ___
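The chain parameter Antoine points to can be exercised like this (the command-handler shape mirrors MRAB's example; the exception names are made up):

```python
import traceback

def run():
    try:
        {}["missing"]          # stand-in for command_dict[command]()
    except KeyError:
        raise RuntimeError("unknown command")

try:
    run()
except RuntimeError as exc:
    caught = exc

# chain=True (the default) formats the KeyError context as well;
# chain=False shows only the exception the handler raised.
chained = "".join(traceback.format_exception(
    type(caught), caught, caught.__traceback__, chain=True))
flat = "".join(traceback.format_exception(
    type(caught), caught, caught.__traceback__, chain=False))

print("KeyError" in chained, "KeyError" in flat)
```

This controls what a formatter emits, though--unlike the language-level suppression being requested here, it requires the application to take over traceback printing itself.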
[issue10783] struct.pack() and Unicode strings
STINNER Victor victor.stin...@haypocalc.com added the comment: This feature was introduced in a big commit from Guido van Rossum (made before Python 3.0): r55500. The changelog is strange because it starts with "Make test_zipfile pass. The zipfile module now does all I/O in binary mode using bytes." but ends with "The _struct needed a patch to support bytes, str8 and str for the 's' and 'p' formats." Why was _struct patched at the same time? Implicit conversion between bytes and str is a very bad idea; it is the root of all confusion related to Unicode. The experience with Python 2 demonstrated that it should be changed, and it was changed in Python 3.0. But Python 3.0 is a big project, it has many modules. Some modules were completely broken in Python 3.0, it works better with 3.1, and we hope that it will be even better with 3.2. Attached patch removes the implicit conversion for the 'c', 's' and 'p' formats. I did a similar change in ctypes, 5 months ago: issue #8966. If a program written for Python 3.1 fails because of the patch, it can use explicit conversion to stay compatible with 3.1 and 3.2 (patched). I think that it's better to use explicit conversion. Implicit conversion on the 'c' format is really weird and it was not documented correctly: the note (1) is attached to the b format, not to the c format. Example: struct.pack('c', 'é') struct.error: char format requires bytes or string of length 1 len('é') 1 There is also a length issue with the s format: struct.pack() truncates a unicode string to a length in bytes, not in characters, which is confusing. struct.pack('2s', 'ha') b'ha' struct.pack('2s', 'hé') b'h\xc3' struct.pack('3s', 'hé') b'h\xc3\xa9' Finally, I don't like implicit conversion from unicode to bytes on pack, because it's not symmetrical. struct.pack('3s', 'hé') b'h\xc3\xa9' struct.unpack('3s', b'h\xc3\xa9') (b'h\xc3\xa9',) (str -> pack() -> unpack() -> bytes) -- keywords: +patch nosy: +haypo Added file: http://bugs.python.org/file20175/struct.patch ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue10783 ___