Ophelia 0.1
The first release of Ophelia, 0.1, has just been tagged. From README.txt:

=

Ophelia creates XHTML pages from templates written in TAL, the Zope Template Attribute Language. It is designed to reduce code repetition to zero. At present, Ophelia contains a request handler for the Apache2 web server.

Static content
--------------

Consider Ophelia as SSI on drugs. It's not fundamentally different, just a lot friendlier and more capable. Use Ophelia for sites where you basically write your HTML yourself, except that you need to write the recurring stuff only once.

Reducing repetition to zero comes at a price: your site must follow a pattern for Ophelia to combine your templates the right way. Consider your site's layout to be hierarchical: there's a common look to all your pages, sections have certain characteristics, and each page has unique content. It's crucial to Ophelia that this hierarchy be reflected in the file system organization of your documents; how templates are nested is deduced from their places in the hierarchy of directories.

Dynamic content
---------------

Ophelia makes the Python language available for including dynamic content. Each template file may include a Python script. Python scripts and templates contributing to a page share a common set of variables to modify and use.

Ophelia's content model is very simple and works best if each content object you publish is its own view: the page it is represented on. If you get content from external resources anyway (e.g. a database or a version control repository), it's still OK to use Ophelia even with multiple views per content object, as long as an object's views don't depend on the object's type or even the object itself.

Trying to use Ophelia on a more complex site will lead to an ugly entanglement of logic and presentation. Don't use Ophelia for sites that are actually web interfaces to applications, content management systems and the like.
=

To use Ophelia, you need:

- Apache2
- Python 2.3 or better
- mod_python 3.1 or better
- the zope package from Zope3

Ophelia is released under the Zope Public License, version 2.1.

You can access the source code repository at https://svn.thomas-lotze.de/repos/public/Ophelia/, browse it using ViewCVS at http://svn.thomas-lotze.de/svn-public/Ophelia/, and download the 0.1 release from http://svn.thomas-lotze.de/svn-public/Ophelia/tags/Ophelia-0.1.tar.gz.

Ophelia is currently used to deliver its author's private web site.

-- Thomas

--
http://mail.python.org/mailman/listinfo/python-announce-list
Support the Python Software Foundation: http://www.python.org/psf/donations.html
ANN: Phebe 0.1
Phebe 0.1 has been released and uploaded to the Python package index. From README.txt:

=

Phebe comprises a Python package and a number of executable scripts to operate a mobile phone connected to your computer.

The implementation follows the Sony-Ericsson developer guidelines for using AT commands as of December 7, 2006, see http://developer.sonyericsson.com/getDocument.do?docId=65054. It has been tested only on a SE K750i so far, using Debian and Gentoo Linux distributions with a 2.6 kernel.

The current status of Phebe is "works for me", i.e. it provides the functionality the author immediately needs: get usage stats of the phone, back up the phonebook, dump and delete short messages. See ROADMAP.txt and TODO.txt for prospective further developments.

While neither talking through the AT command interface nor the higher-level data structures implemented by Phebe are operating-system specific, communication with the device is. Phebe currently does this by using a Python module only available on Unix. The author is not going to port Phebe to non-Unix systems any time soon, so if you want it to support your OS, you have to supply an appropriate patch.

Phebe was written by Thomas Lotze. Please contact the author at [EMAIL PROTECTED] to provide feedback or suggestions on or contributions to Phebe.

=

Phebe requires Python 2.5.

The Phebe code base is maintained in a subversion repository at https://svn.thomas-lotze.de/repos/public/Phebe. There is a ViewCVS view on the repository available at http://svn.thomas-lotze.de/public/Phebe.

-- Thomas
ANN: Ophelia 0.2 - Create web sites from TAL templates
Ophelia 0.2 was released today.

Ophelia creates XHTML pages from templates written in TAL, the Zope Template Attribute Language. It is designed to reduce code repetition to zero. At present, Ophelia contains a request handler for the Apache2 web server.

Ophelia is released under the Zope Public License, version 2.1.

To use Ophelia 0.2, you need:

- Apache2
- Python 2.4 or better
- mod_python 3.1 or better
- the zope.tal package from Zope3 and anything it depends upon

WSGI support is planned for a future version, possibly 0.3.

The package is available from the Python package index as a source distribution and as eggs for both Python 2.4 and 2.5: http://cheeseshop.python.org/pypi/Ophelia

You can access the source code repository at https://svn.thomas-lotze.de/repos/public/Ophelia/, browse it using ViewCVS at http://svn.thomas-lotze.de/svn-public/Ophelia/, or visit Ophelia's web page, containing a commented live usage example, at http://www.thomas-lotze.de/en/software/ophelia/.

From the documentation:

What kind of sites is Ophelia good for?
+++++++++++++++++++++++++++++++++++++++

Static content
--------------

Consider Ophelia as SSI on drugs. It's not fundamentally different, just a lot friendlier and more capable. Use Ophelia for sites where you basically write your HTML yourself, except that you need to write the recurring stuff only once.

Reducing repetition to zero comes at a price: your site must follow a pattern for Ophelia to combine your templates the right way. Consider your site's layout to be hierarchical: there's a common look to all your pages, sections have certain characteristics, and each page has unique content. It's crucial to Ophelia that this hierarchy be reflected in the file system organization of your documents; how templates combine is deduced from their places in the hierarchy of directories.

Dynamic content
---------------

Ophelia makes the Python language available for including dynamic content. Each template file may include a Python script.
Python scripts and templates contributing to a page share a common set of variables to modify and use.

Ophelia's content model is very simple and works best if each content object you publish is its own view: the page it is represented on. If you get content from external resources anyway (e.g. a database or a version control repository), it's still OK to use Ophelia even with multiple views per content object, as long as an object's views don't depend on the object's type or even the object itself.

Trying to use Ophelia on a more complex site will lead to an ugly entanglement of logic and presentation. Don't use Ophelia for sites that are actually web interfaces to applications, content management systems and the like.

-- Best regards, Thomas
ANN: Ophelia 0.3 - Create web sites from TAL templates
Ophelia 0.3 has just been released.

Ophelia creates XHTML pages from templates written in TAL, the Zope Template Attribute Language. It is designed to reduce code repetition to zero.

The package contains both a WSGI application running Ophelia and a request handler for mod_python, the Python module for the Apache2 web server. Additionally, a script is included that renders a page and dumps it to stdout, and another one that runs a wsgiref-based HTTP server hosting Ophelia's WSGI application.

Ophelia is released under the Zope Public License, version 2.1.

To use Ophelia 0.3, you need Python 2.4. The mod_python request handler requires mod_python 3.3 or better.

The package is available from the Python package index as a source distribution and a Python 2.4 egg: http://cheeseshop.python.org/pypi/ophelia

The source code contains a zc.buildout configuration for an environment including Apache and mod_python.

You can access the source code repository at https://svn.thomas-lotze.de/repos/public/Ophelia/, browse it using ViewCVS at http://svn.thomas-lotze.de/svn-public/Ophelia/, or visit Ophelia's web page, containing a commented live usage example, at http://www.thomas-lotze.de/en/software/ophelia/.

From the documentation:

What kind of sites is Ophelia good for?
=======================================

Static content
--------------

Consider Ophelia as SSI on drugs. It's not fundamentally different, just a lot friendlier and more capable. Use Ophelia for sites where you basically write your HTML yourself, except that you need to write the recurring stuff only once.

Reducing repetition to zero comes at a price: your site must follow a pattern for Ophelia to combine your templates the right way. Consider your site's layout to be hierarchical: there's a common look to all your pages, sections have certain characteristics, and each page has unique content.
It's crucial to Ophelia that this hierarchy be reflected in the file system organization of your documents; how templates combine is deduced from their places in the hierarchy of directories.

Dynamic content
---------------

Ophelia makes the Python language available for including dynamic content. Each template file may include a Python script. Python scripts and templates contributing to a page share a common set of variables to modify and use.

Ophelia's content model is very simple and works best if each content object you publish is its own view: the page it is represented on. If you get content from external resources anyway (e.g. a database or a version control repository), it's still OK to use Ophelia even with multiple views per content object, as long as an object's views don't depend on the object's type or even the object itself.

Trying to use Ophelia on a more complex site will lead to an ugly entanglement of logic and presentation. Don't use Ophelia for sites that are actually web interfaces to applications, content management systems and the like.

-- Best regards, Thomas
StringIO objects sharing a buffer
Hi,

I want to implement a tokenizer for some syntax. So I thought I'd subclass StringIO and make my new class return tokens on next(). However, if I want to read tokens from two places in the string in turns, I'd either need to do some housekeeping of file pointers outside the tokenizer class (which is ugly) or use two tokenizers on the same data buffer (which seems impossible to me using my preferred approach, as a file-like object has exactly one file pointer).

Is there a way for multiple StringIO objects to share a buffer of data, or do I have to give up on subclassing StringIO for this purpose? (An alternative would be a tokenizer class that has a StringIO instead of being one and does the file pointer housekeeping in there.)

-- Thomas

--
http://mail.python.org/mailman/listinfo/python-list
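[Editor's sketch: the alternative mentioned in parentheses above — a tokenizer class that *has* a buffer instead of *being* a StringIO — can be illustrated like this. Names and the whitespace-delimited token rule are hypothetical; several instances share one string, each with its own position, which is exactly what one file pointer per StringIO rules out.]

```python
class Tokenizer:
    """A tokenizer that wraps a shared string buffer instead of
    subclassing StringIO; each instance keeps its own position, so
    several tokenizers can read from the same data in turns."""

    def __init__(self, buf, pos=0):
        self.buf = buf   # the string shared by all tokenizers
        self.pos = pos   # this tokenizer's private "file pointer"

    def __iter__(self):
        return self

    def __next__(self):
        # Skip whitespace, then collect one whitespace-delimited token.
        while self.pos < len(self.buf) and self.buf[self.pos].isspace():
            self.pos += 1
        if self.pos >= len(self.buf):
            raise StopIteration
        start = self.pos
        while self.pos < len(self.buf) and not self.buf[self.pos].isspace():
            self.pos += 1
        return self.buf[start:self.pos]


data = "one two three"
a = Tokenizer(data)          # reads from the start
b = Tokenizer(data, pos=4)   # a second reader on the same buffer
```

Here `next(a)` yields "one" while `next(b)` independently yields "two"; neither disturbs the other's position.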
Copying data between file-like objects
Hi,

another question: what's the most efficient way of copying data between two file-like objects?

f1.write(f2.read()) doesn't seem to me as efficient as it might be, as a string containing all the contents of f2 will be created and thrown away. In the case of two StringIO objects, this means there's a point when the contents are held in memory three times.

Reading and writing a series of short blocks to avoid a large copy buffer seems ugly to me, and string objects will be created and thrown away all the time. Do I have to live with that? (In C, I would do the same thing, only without having to create and throw away anything while overwriting a copy buffer, and being used to doing everything the pedestrian way, anyway.)

-- Thomas
Re: Copying data between file-like objects
Fredrik Lundh wrote:

> if f2 isn't too large, reading lots of data in one operation is often the most efficient way (trust me, the memory system is a lot faster than your disk)

Sure.

> if you don't know how large f2 can be, use shutil.copyfileobj:
>
> >>> help(shutil.copyfileobj)
> Help on function copyfileobj in module shutil:
> copyfileobj(fsrc, fdst, length=16384)
>     copy data from file-like object fsrc to file-like object fdst

This sounds like what I was looking for. Thanks for the pointer. However, the following doesn't seem like anything is being copied:

>>> from StringIO import StringIO
>>> from shutil import copyfileobj
>>> s = StringIO()
>>> s2 = StringIO()
>>> s.write('asdf')
>>> copyfileobj(s, s2)
>>> s2.getvalue()
''

> to copy stringio objects, you can use f1 = StringIO(f2.getvalue()).

But this should have the same problem as using read(): a string will be created on the way which contains all the content.

> why you would want/need to do this is more than I can figure out, though...

Because I want to manipulate a copy of the data and be able to compare it to the original afterwards.

Another thing I'd like to do is copy parts of a StringIO object's content to another object. This doesn't seem possible with any shutil method. Any idea on that?

What one can really wonder, I admit, is why the difference between holding data two or three times in memory matters that much, especially if the latter is only for a short time. But as I'm going to use the code that handles the long string as a core component of some application, I'd like to make it behave as well as possible.

-- Thomas
Re: Copying data between file-like objects
Fredrik Lundh wrote:

> copyfileobj copies from the current location, and write leaves the file pointer at the end of the file. a s.seek(0) before the copy fixes that.

Damn, this cannot be read from the documentation, and combined with the fact that there's no length parameter for a portion to copy either, I thought copying would mean copying all.

> getvalue() returns the contents of the f2 file as a string, and f1 will use that string as the buffer. there's no extra copying.

Oh, good to know. Then StringIO(f2.getvalue()) or StringIO(f2.read()) would be the way to go.

>> Because I want to manipulate a copy of the data and be able to compare it to the original afterwards.
>
> why not just use a plain string (or a list of strings)? your focus on StringIO sounds like a leftover from some C library you've been using in an earlier life ;-)

Because the data can be a lot, and modifying long strings means a lot of slicing and copying partial strings around, if I understand right. Modifying a StringIO buffer is possible in-place. Plus, it's easier to teach an algorithm that works on a StringIO to use a file instead, so I may be able to avoid reading stuff into memory altogether in certain places without worrying about special cases.

> use a plain string and slicing. (if you insist on using StringIO, use seek and read)

OK.

-- Thomas
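[Editor's sketch: the seek(0) fix restated as a runnable snippet, written with Python 3's io.StringIO rather than the Python 2 StringIO module the thread used. The partial copy asked about earlier has no shutil helper, but seek() plus read(n) covers it.]

```python
import io
import shutil

src = io.StringIO()
src.write('asdf')   # write() leaves the file pointer at the end
src.seek(0)         # rewind, otherwise copyfileobj copies nothing
dst = io.StringIO()
shutil.copyfileobj(src, dst)   # dst now holds 'asdf'

# Copying only a portion: seek to the start of the slice and read a length.
src.seek(1)
part = io.StringIO(src.read(2))   # holds 'sd'
```

The same pattern applies to real files: any write or previous read moves the pointer, and copyfileobj simply starts wherever the pointer happens to be.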
Controlling a generator the pythonic way
Hi,

I'm trying to figure out what is the most pythonic way to interact with a generator. The task I'm trying to accomplish is writing a PDF tokenizer, and I want to implement it as a Python generator. Suppose all the ugly details of tokenizing PDF can be handled (such as embedded streams of arbitrary binary content). There remains one problem, though: in order to get random file access, the tokenizer should not simply spit out a series of tokens read from the file sequentially; it should rather be possible to point it at places in the file at random.

I can see two possibilities to do this: either the current file position has to be read from somewhere (say, a mutable object passed to the generator) after each yield, or a new generator needs to be instantiated every time the tokenizer is pointed to a new file position.

The first approach has both the disadvantage that the pointer value is exposed and that, due to the complex rules for hacking a PDF into tokens, there will be a lot of yield statements in the generator code, which would make for a lot of pointer assignments. This seems ugly to me. The second approach is cleaner in that respect, but pointing the tokenizer to some place now has the added semantics of creating a whole new generator instance. The programmer using the tokenizer needs to remember to throw away any references to the generator each time the pointer is reset, which is also ugly.

Does anybody here have a third way of dealing with this? Otherwise, which ugliness is the more pythonic one?

Thanks a lot for any ideas.

-- Thomas
Re: Controlling a generator the pythonic way
Peter Hansen wrote:

> Thomas Lotze wrote:
>> I can see two possibilities to do this: either the current file position has to be read from somewhere (say, a mutable object passed to the generator) after each yield, [...]
>
> The third approach, which is certain to be cleanest for this situation, is to have a custom class which stores the state information you need, and have the generator simply be a method in that class.

Which is, as far as the generator code is concerned, basically the same as passing a mutable object to a (possibly standalone) generator. The object will likely be called self, and the value is stored in an attribute of it. Probably this is indeed the best way, as it doesn't require the programmer to remember any side-effects. It does, however, require a lot of attribute access, which does cost some cycles.

A related problem is skipping whitespace. Sometimes you don't care about whitespace tokens, sometimes you do. Using generators, you can either set a state variable, say on the object the generator is an attribute of, before each call that requires a deviation from the default, or you can have a second generator for filtering the output of the first. Again, both solutions are ugly (the second more so than the first). One uses side-effects instead of passing parameters, which is what one really wants, while the other is dumb and slow (filtering can be done without taking a second look at things).

All of this makes me wonder whether more elaborate generator semantics (maybe even allowing for passing arguments in the next() call) would not be useful. And yes, I have read the recent postings on PEP 343 - sigh.

-- Thomas
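[Editor's sketch of the approach under discussion: the state (position, whitespace handling) lives on an object, and the generator is a method that re-reads that state between yields, so resetting the pointer from outside takes effect on the next token. All names and the toy token rule are hypothetical.]

```python
class Tokenizer:
    """State lives on the object; the generator method consults it on
    every iteration, so self.pos and self.skip_whitespace may be
    changed from outside between next() calls."""

    def __init__(self, data):
        self.data = data
        self.pos = 0
        self.skip_whitespace = True   # the state-variable "switch"

    def tokens(self):
        while self.pos < len(self.data):
            start = self.pos
            if self.data[self.pos].isspace():
                while self.pos < len(self.data) and self.data[self.pos].isspace():
                    self.pos += 1
                if not self.skip_whitespace:
                    yield self.data[start:self.pos]
            else:
                while self.pos < len(self.data) and not self.data[self.pos].isspace():
                    self.pos += 1
                yield self.data[start:self.pos]


t = Tokenizer("a  b c")
gen = t.tokens()
first = next(gen)   # 'a'
t.pos = 3           # reset the pointer from outside, between calls
second = next(gen)  # 'b' - the generator picks up the new position
```

This shows both sides of the trade-off discussed: it works without re-instantiating the generator, but every token costs several `self.` attribute accesses, and the switch is a side effect rather than a parameter.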
Re: Controlling a generator the pythonic way
Mike Meyer wrote:

> Yes, such a switch gets the desired behavior as a side effect. Then again, a generator that returns tokens has a desired behavior (advancing to the next token) as a side effect(*).

That's certainly true.

> If you think about these things as the state of the object, rather than side effects, it won't seem nearly as ugly. In fact, part of the point of using a class is to encapsulate the state required for some activity in one place. Wanting to do everything via parameters to methods is a very top-down way of looking at the problem. It's not necessarily correct in an OO environment.

What worries me about the approach of changing state before making a next() call instead of doing it at the same time by passing a parameter is that the state change is meant to affect only a single call. The picture might fit better (IMO) if it didn't look so much like working around the fact that the next() call can't take parameters for some technical reason. I agree that decoupling state changes and next() calls would be perfectly beautiful if they were decoupled in the problem one wants to model. They aren't.

> *) It's noticeable that some OO languages/libraries avoid this side effect: the read method updates an attribute, so you do the read then get the object read from the attribute. That's very OO, but not very pythonic.

Just out of curiosity: what makes you state that that behaviour isn't pythonic? Is it because Python happens to do it differently, because of a gut feeling, or because of some design principle behind Python I fail to see right now?

-- Thomas
Re: Controlling a generator the pythonic way
Peter Hansen wrote:

> Fair enough, but who cares what the generator code thinks? It's what the programmer has to deal with that matters, and an object is going to have a cleaner interface than a generator-plus-mutable-object.

That's right, and among the choices discussed, the object is the one I do prefer. I just don't feel really satisfied...

>> It does, however, require a lot of attribute access, which does cost some cycles.
>
> Hmm... premature optimization is all I have to say about that.

But when is the right time to optimize? There's a point when the thing runs, does the right thing and - by the token of "make it run, make it right, make it fast" - might get optimized. And if there are places in a PDF library that might justly be optimized, the tokenizer is certainly one of them, as it gets called really often.

Still, I'm going to focus on cleaner code and, first and foremost, a clean API if it comes to a decision between these goals and optimization - at least as long as I'm talking about pure Python code.

-- Thomas
Re: Controlling a generator the pythonic way
Thomas Lotze wrote:

> Does anybody here have a third way of dealing with this?

Sleeping a night sometimes is an insightful exercise *g*

I realized that there is a reason why fiddling with the pointer from outside the generator defeats much of the purpose of using one. The implementation using a simple method call instead of a generator needs to store some internal state variables on an object to save them for the next call, among them the pointer and a tokenization mode. I could make the thing a generator by turning the single return statement into a yield statement and adding a loop, leaving all the importing and exporting of the pointer intact - after all, someone might reset the pointer between next() calls. This is, however, hardly using all the possibilities a generator allows.

I'd rather like to get rid of the mode switches by doing special things where I detect the need for them, yielding the result, and proceeding as before. But as soon as I move information from explicit (state variables that can be reset along with the pointer) to implicit (the point where the generator is suspended after yielding a token), resetting the pointer will lead to inconsistencies.

So, it seems to me that if I do want to use generators for any practical reason instead of just because generators are way cool, they need to be instantiated anew each time the pointer is reset, for simple consistency reasons.

Now a very simple idea struck me: if one is worried about throwing away a generator as a side-effect of resetting the tokenization pointer, why not define the whole tokenizer as not being resettable? Then the thing needs to be re-instantiated very explicitly every time it is pointed somewhere. While still feeling slightly awkward, it has lost the threat of doing unexpected things.

Does this sound reasonable?

-- Thomas
Re: Controlling a generator the pythonic way
Thomas Lotze wrote:

> A related problem is skipping whitespace. Sometimes you don't care about whitespace tokens, sometimes you do. Using generators, you can either set a state variable, say on the object the generator is an attribute of, before each call that requires a deviation from the default, or you can have a second generator for filtering the output of the first.

Last night's sleep was really productive - I've also found another way to tackle this problem, and it's really simple IMO.

One could pass the parameter at generator instantiation time and simply create two generators behaving differently. They work on the same data and use the same source code, only with a different parametrization. All one has to care about is that they never get out of sync. If the data pointer is an object attribute, it's clear how to do it. Otherwise, both could acquire their data from a common generator that yields the PDF content (or a buffer representing part of it) character by character. This is even faster than keeping a pointer and using it as an index on the data.

-- Thomas
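[Editor's sketch of the instantiation-time parametrization idea: one generator source, two differently-parametrized instances over the same data. The shared-pointer synchronization from the post is left out here for brevity; each instance keeps a local position. Names and the toy token rule are hypothetical.]

```python
def tokens(data, skip_whitespace):
    """One generator definition; the whitespace policy is fixed at
    instantiation time instead of being toggled mid-iteration."""
    pos = 0
    while pos < len(data):
        start = pos
        if data[pos].isspace():
            while pos < len(data) and data[pos].isspace():
                pos += 1
            if skip_whitespace:
                continue          # drop the whitespace run entirely
        else:
            while pos < len(data) and not data[pos].isspace():
                pos += 1
        yield data[start:pos]


data = "a  b"
lean = tokens(data, True)     # yields 'a', 'b'
full = tokens(data, False)    # yields 'a', '  ', 'b'
```

No per-call switch and no second filtering generator: the decision is made once, when the generator is created.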
Re: why python on debian without the module profile?
kyo guan wrote:

> ImportError: No module named profile

They moved it to non-free because the module's license isn't DFSG compliant.

-- Thomas
Re: Controlling a generator the pythonic way
Thomas Lotze wrote:

> I'm trying to figure out what is the most pythonic way to interact with a generator.

JFTR, so you don't think I'd suddenly lost interest: I won't be able to respond for a couple of days because I've just incurred a nice little hospital session... will be back next week.

-- Thomas
Re: OO approach to decision sequence?
Jordan Rastrick wrote:

> Without knowing more about your problem, I think the most obvious OO approach would be to write a separate (simple) class for each of node_type_1, node_type_2, etc.

While I agree that this is the cleanest and usually simplest approach, it does have its drawbacks. I'm currently working on a project where I'd very much like to avoid writing a whole set of classes just for the purpose of avoiding a decision chain.

For a PDF library, I need basic data types that are used in a PDF document. Such are integers, floats, strings, lists, dictionaries and a few others. At some point they have to be written to a file, and at first I was tempted to create types like pdfint, pdffloat, pdfstr etc. which implement the respective file encoding either in a write method or directly in __str__.

However, the whole point of the library is to allow working with the document's data. Beside manipulating existing (as in read from a PDF file) mutable objects, this includes creating new objects of type pdffoo. And I realized it is very bothersome to have to say x = pdfint(5) instead of x = 5 every time I deal with integers that would end up in the document. Similar for, e.g., adding two PDF integers: x = pdfint(y+z) instead of just x = y+z. The latter can be cured by touching all methods returning any pdffoo instances. No sane person would do this, however, and it would not eliminate any pdffoo(x) type conversions in the app code anyway.

So I decided that in this case it is best to go without special types and use those provided by Python, and live with an ugly decision chain or two at defined places in the library.

-- Thomas
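[Editor's sketch of what such a decision chain over Python's built-in types might look like. This is not Thomas's actual library; the function name is hypothetical and the PDF syntax is simplified (real PDF strings need escaping, real names need sanitizing).]

```python
def pdf_repr(value):
    """Serialize built-in Python values to simplified PDF syntax with
    one decision chain instead of pdfint/pdffloat/pdfstr wrappers."""
    if isinstance(value, bool):          # must precede int: bool subclasses int
        return b'true' if value else b'false'
    if isinstance(value, (int, float)):
        return str(value).encode('ascii')
    if isinstance(value, str):
        return b'(' + value.encode('latin-1') + b')'
    if isinstance(value, list):
        return b'[' + b' '.join(pdf_repr(v) for v in value) + b']'
    if isinstance(value, dict):
        return (b'<< '
                + b' '.join(b'/' + k.encode('ascii') + b' ' + pdf_repr(v)
                            for k, v in value.items())
                + b' >>')
    raise TypeError('no PDF encoding for %r' % type(value))
```

Application code then works with plain `5`, `y + z`, or `['a', 'b']`, and the chain is confined to the one place where objects get written out.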
Package organization
Hi,

I have two questions concerning organizing and naming things when writing a Python package.

- Naming of classes: I'm writing a library that reads PDF files. I have a data structure that represents the large-scale structure of a PDF file (header, trailer, incremental updates etc.), and I'll have others, e.g. one that represents the document as a collection of logical objects (page descriptions, images etc.). Assume I have a package called PDF. Should the classes then be called simply File and Objects, as it is clear what they do when they are imported from PDF? Or should they be called PDFFile and PDFObjects, as the names would be too undescriptive otherwise?

- Organizing subpackages and interfaces: I'm using the zope.interface package in order to define interface classes. In a small package called foo, one might define interfaces IReadableFoo and IWritableFoo in foo.interfaces. However, in a large package foo with subpackages bar and baz, interface definitions might either sit in foo.bar.interfaces and foo.baz.interfaces, or in foo.interfaces.bar and foo.interfaces.baz. Which is preferable?

Thanks for any thoughts on this.

-- Thomas
Re: Package organization
F. Petitjean wrote:

> As you wish :-)

Damn freedom of choice *g*

> if in the package ie in the __init__.py (not the best idea)
> from PDF import File as PDFFile # always possible

Technically, this is clear - however, I don't like the idea of giving the same thing different names, especially if there's a chance that other people get to look at and try to understand the code...

Using short names that are unique by virtue of the subpackage hierarchy internally, and leaving it to the user (which might even be another subpackage of the library) to import them as something more descriptive in his context, is probably the easiest, cleanest and least obtrusive thing, as I think about it.

> Have you installed the reportlab package ? It is full of from ... import .. and it generates PDF.

I do know ReportLab. IIRC, last time I looked, it didn't simply expose an API that models and operates on a PDF document's structures, but was designed to produce PDF files with a certain kind of content. It didn't seem to be of much easy use for anything wildly different from that.

-- Thomas
Re: Python Module Exposure
Jacob Page wrote:

> better-named,

Just a quick remark, without even having looked at it yet: the name is not really descriptive and runs a chance of misleading people. The example I'm thinking of is using zope.interface in the same project: it's customary to name interfaces ISomething.

-- Thomas
Re: Should I use if or try (as a matter of speed)?
Steve Juranich wrote:

> I was wondering how true this holds for Python, where exceptions are such an integral part of the execution model. It seems to me that if I'm executing a loop over a bunch of items, and I expect some condition to hold for a majority of the cases, then a try block would be in order, since I could eliminate a bunch of potentially costly comparisons for each item.

Exactly.

> But in cases where I'm only trying a single getattr (for example), using if might be a cheaper way to go.

Relying on exceptions is faster. In the Python world, this coding style is called EAFP (easier to ask forgiveness than permission). You can try it out: just do something 10**n times and measure the time it takes. Do this twice, once with prior checking and once relying on exceptions.

And JFTR: the very example you chose gives you yet another choice: getattr can take a default parameter.

> What do I mean by cheaper? I'm basically talking about the number of instructions that are necessary to set up and execute a try block as opposed to an if block.

I don't know about the implementation of exceptions, but I suspect most of what try does doesn't happen at run-time at all, and things get checked and looked for only if an exception did occur. And I suspect that it's machine code that does that checking and looking, not byte code. (Please correct me if I'm wrong, anyone with more insight.)

-- Thomas
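[Editor's sketch of the measurement suggested above ("do something 10**n times and measure the time it takes"), using the standard timeit module. Which variant wins depends on the Python version and on how often the exception actually fires, so no result is claimed here.]

```python
import timeit

setup = "d = {'a': 1}"

# EAFP: just try it and handle the failure.
eafp = """\
try:
    x = d['a']
except KeyError:
    x = None
"""

# LBYL ("look before you leap"): check first.
lbyl = """\
if 'a' in d:
    x = d['a']
else:
    x = None
"""

t_eafp = timeit.timeit(eafp, setup, number=100000)
t_lbyl = timeit.timeit(lbyl, setup, number=100000)
print('EAFP: %.4fs  LBYL: %.4fs' % (t_eafp, t_lbyl))
```

For the attribute case from the post, the third choice mentioned is simply `x = getattr(obj, 'name', None)`, which needs neither a try block nor a hasattr check.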
Re: Should I use if or try (as a matter of speed)?
Steven D'Aprano wrote:

> On the gripping hand, testing for errors before they happen will be slow if errors are rare:

Hm, might have something to do with why those things intended for handling errors after they happened are called exceptions ;o)

> - If your code has side effects (eg changing existing objects, writing to files, etc), then you might want to test for error conditions first. Otherwise, you can end up with your data in an inconsistent state.

BTW: Has the context management stuff from PEP 343 been considered for implementing transactions?

> - Why are you optimizing your code now anyway? Get it working the simplest way FIRST, then _time_ how long it runs. Then, if and only if it needs to be faster, should you worry about optimizing. The simplest way will often be try...except blocks.

Basically, I agree with the "make it run, make it right, make it fast" attitude. However, FWIW, I sometimes can't resist optimizing routines that probably don't strictly need it. Not only does the resulting code run faster, but it is usually also shorter and more readable and expressive. Plus, I tend to gain further insight into the problem and tools in the process. YMMV, of course.

-- Thomas
Frankenstring
Hi, I think I need an iterator over a string of characters pulling them out one by one, like a usual iterator over a str does. At the same time the thing should allow seeking and telling like a file-like object:

    >>> f = frankenstring("0123456789")
    >>> for c in f:
    ...     print c
    ...     if c == "2":
    ...         break
    ...
    0
    1
    2
    >>> f.tell()
    3L
    >>> f.seek(7)
    >>> for c in f:
    ...     print c
    ...
    7
    8
    9

It's definitely no help that file-like objects are iterable; I do want to get a character, not a complete line, at a time. I can think of more than one clumsy way to implement the desired behaviour in Python; I'd rather like to know whether there's an implementation somewhere that does it fast. (Yes, it's me and speed considerations again; this is for a tokenizer at the core of a library, and I'd really like it to be fast.) I don't think there's anything like it in the standard library, at least not anything that would be obvious to me. I don't care whether this is more of a string iterator with seeking and telling, or a file-like object with a single-character iterator; as long as it does both efficiently, I'm happy. I'd even consider writing such a beast in C, albeit more as a learning exercise than as a worthwhile measure to speed up some code. Thanks for any hints. -- Thomas
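For reference, one of the clumsy pure-Python ways just mentioned, which at least pins down the interface (indexing the string from Python is exactly the slow part the question is about):

```python
class frankenstring:
    """Iterate over a string character by character, with seek/tell."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        c = self.data[self.pos]  # the costly per-character indexing
        self.pos += 1
        return c

    next = __next__  # Python 2 spelling of the iterator protocol

    def tell(self):
        return self.pos

    def seek(self, pos):
        self.pos = pos
```

This reproduces the session above: seeking immediately redirects the iteration, and tell() reports the position of the next character to be returned.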
Re: Fwd: Should I use if or try (as a matter of speed)?
Christopher Subich wrote:

    try:
        f = file('file_here')
    except IOError:  # File doesn't exist
        error_handle
        error_flag = 1
    if not error_flag:
        do_setup_code
        do_stuff_with(f)

which nests on weird, arbitrary error flags, and doesn't seem like good programming to me. Neither does it to me. What about

    try:
        f = file('file_here')
    except IOError:  # File doesn't exist
        error_handle
    else:
        do_setup_code
        do_stuff_with(f)

(Not that I'd want to defend Joel's article, mind you...) -- Thomas
Re: Slicing every element of a list
Alex Dempsey wrote:

    for line in lines:
        line = line[1:-5]
        line = line.split('\t')

This went without returning any errors, but nothing was sliced or split. Next I tried:

    for i in range(len(lines)):
        lines[i] = lines[i][1:-5]
        lines[i] = lines[i].split('\t')

This of course worked, but why didn't the first one work? Because when assigning to line the second time, you just make the identifier reference a new object; you don't touch the list. This is how one might do it without ranging over the length of the list and having to get the lines out by element access:

    for i, line in enumerate(lines):
        line = line[1:-5]
        lines[i] = line.split('\t')

Probably there are even better ways; this is just off the top of my head. Further, why didn't the first one return an error? Because you didn't make any. You just discarded your results; why should anyone stop you from burning cycles? *g -- Thomas
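If a brand-new list is acceptable instead of modifying the existing one in place, a list comprehension says the same thing more compactly (the sample data here is made up to match the slicing):

```python
lines = ['Xa\tbYYYYY', 'Xc\tdYYYYY']

# Strip the first character and the last five, then split on tabs,
# rebinding the name to a freshly built list.
lines = [line[1:-5].split('\t') for line in lines]

print(lines)  # [['a', 'b'], ['c', 'd']]
```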
Re: Frankenstring
jay graves wrote: see StringIO or cStringIO in the standard library. Just as with files, iterating over them returns whole lines, which is unfortunately not what I want. -- Thomas
Re: Frankenstring
Scott David Daniels wrote: Now if you want to do it for a file, you could do: for c in thefile.read(): The whole point of the exercise is that seeking on a file doesn't influence iteration over its content. In the loop you suggest, I can seek() on thefile to my heart's content and will always get its content iterated over exactly from beginning to end. It had been read before any of this started, after all. Similarly, thefile.tell() will always tell me thefile's size or the place I last seek()'ed to instead of the position of the next char I will get. -- Thomas
Re: Frankenstring
Bengt Richter wrote: [lotzefile.py] Thanks. [...]

    byte = self.buf[self.pos]

This is the place where the thing is basically a str whose items are accessed as sequence elements. It has some iterator behaviour and file management which makes it nice to use, of course, and to most this will be enough (and it is a lot indeed). But it loses the efficiency of

    for c in asdf:
        do_something(c)

Actually, relying on string[index] behind the scenes is one of the ways of implementing frankenstring I labelled clumsy in the original posting ;o) I suspect you could get better performance if you made LotzeFile instances able to return iterators over buffer chunks and get characters from them, which would be string iterators supplying the characters rather than the custom .next, but the buffer chunks would have to be of some size to make that pay. Testing is the only way to find out what the crossing point is, if you really have to. If I understand this correctly, you'd have to switch to using a new iterator after seeking, which would make this impossible:

    f = LotzeFile('something')
    for c in iter(f):
        do_something(c)
        if some_condition:
            f.seek(somewhere)
            # the next iteration reads from the new position

And it would break telling, since the class can't know how many characters have been read from an iterator once it returned one after seeking or switching to another buffer chunk. -- Thomas
Re: Frankenstring
Andreas Lobinger wrote: t2 = f.find('2')+1 This is indeed faster than going through a string char by char. It doesn't make for a nice character-based state machine, but of course it avoids making Python objects for every character and uses the C implementation of str for searching. However, it's only fine if you are looking for single characters. As soon as you're looking for classes of characters, you need the (slower) regex machinery (as you well know, but for the sake of discussion...). A string, and a pointer on that string. If you give up the boundary condition to tell backwards, you can start to eat up the string via f = f[p:]. There was a performance difference with that; in fact it was ~4% faster on Python 2.2. When I tried it just now, it was the other way around. Eating up the string was slower, which makes sense to me since it involves creating new string objects all the time. I don't expect any iterator solution to be faster than that. It's not so much an issue of iterators, but of handling Python objects for every char. Iterators would actually be quite helpful for searching: I wonder why there doesn't seem to be an str.iterfind or str.itersplit thing. And I wonder whether there shouldn't be str.findany and str.iterfindany, which take a sequence as an argument and return the next match on any element of it. -- Thomas
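An str.iterfind of the kind wondered about here is easy to sketch as a generator on top of str.find; this is not a stdlib API, just an illustration of the idea:

```python
def iterfind(s, sub, start=0):
    """Yield the index of each non-overlapping occurrence of sub in s."""
    while True:
        i = s.find(sub, start)  # C-level search does the heavy lifting
        if i < 0:
            return
        yield i
        start = i + len(sub)  # use i + 1 instead for overlapping matches
```

Each step still only creates one int object per match rather than one object per character, which is the point being made above.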
Re: Frankenstring
Peter Otten wrote: Not clumsy, just slow. As you wish ;o) I didn't mean clumsy as in clumsy-looking Python code anyway, rather as in clumsy to use the Python machinery for operations that are straight-forward and efficient in C, in which language str and cStringIO are implemented already. I hope you'll let us know how much faster your final approach turns out to be. I'm pretty convinced that implementing an algorithmically nice state machine that goes through a string char by char won't get any faster than using s[index] all the time unless I do a frankenstring in C. Failing that, a more pragmatic approach is what Andreas suggests; see the other subthread. By the way, I'll consider anything that doesn't implement seek() and tell() cheating :-) An implementation of frankenstring would have to have seek and tell; that's the point of doing it. But for half-way simple state machines, hiding the index handling in a Python class that slows things down is just not worth it. Doing index += 1 here and there is fine if it happens only half a dozen times. I know it's not beautiful; that's why I started this thread ;o) -- Thomas
Re: Frankenstring
Thomas Lotze wrote: And I wonder whether there shouldn't be str.findany and str.iterfindany, which take a sequence as an argument and return the next match on any element of it. On second thought, that wouldn't gain much on a loop over finding each sequence, but would add more complexity than it is worth. What would be more useful, especially thinking of a C implementation, is str.findanyof and str.findnoneof. They take a string as an argument and find the first occurrence of any char in that string or any char not in that string, respectively. Especially finding any char not among a given few needs a hoop to jump through now, if I didn't miss anything. -- Thomas
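The hoop in question, sketched with the regex machinery that a C-level implementation would avoid; the names follow the proposal, the behaviour is my reading of its intent:

```python
import re

def findanyof(s, chars, start=0):
    """Index of the first char of s at or after start that is in chars,
    or -1. Assumes chars is non-empty."""
    match = re.compile('[%s]' % re.escape(chars)).search(s, start)
    return match.start() if match else -1

def findnoneof(s, chars, start=0):
    """Index of the first char of s at or after start that is NOT in
    chars, or -1. Assumes chars is non-empty."""
    match = re.compile('[^%s]' % re.escape(chars)).search(s, start)
    return match.start() if match else -1
```

The negated character class is exactly the "char not among a given few" search that plain str methods don't offer.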
Re: using hotshot for timing and coverage analysis
Andreas Lobinger wrote: hotshot.Profile has flags for recording timing per line and line events. Even if I had both set to 1, I still get only the standard data (time per call). Could it be that pstats.Stats doesn't know about hotshot? Haven't checked... What's much more annoying about hotshot is that loading the stats takes ages if one profiles stuff that runs about half a minute or so. At least it does that on Python 2.4.1a0 as shipped with Debian testing a while ago. Is there any document available that has examples how to use hotshot for coverage analysis and to display timing per line? Haven't looked thoroughly yet; all I know is what's in the Python docs. -- Thomas
Re: Python Programming Contest
Brian Quinlan wrote: I've decided that it would be fun to host a weekly Python programming contest. I like the idea, and doing the first problem was fun indeed :o) I'm always looking for feedback, so let me know what you think or if you have any ideas for future problems. It would be nice if you could put up a suite of test data with oracle solutions for download. For those sitting behind a modem line (like me), it would be a great help and would speed up the testing cycle. Thanks for your effort, in any case! -- Thomas
Re: Frankenstring
Peter Otten wrote: I hope you'll let us know how much faster your final approach turns out to be. OK, here's a short report on the current state. Such code as there is can be found at http://svn.thomas-lotze.de/PyASDF/pyasdf/_frankenstring.c, with a Python mock-up in the same directory. Thinking about it (Andreas, thank you for the reminder :o)), doing character-by-character scanning in Python is stupid, both in terms of speed and, given some more search capabilities than str currently has, elegance. So what I did until now (apart from working myself into writing extensions in C) is give the evolving FrankenString some search methods that enable searching for the first occurrence in the string of any character out of a set of characters given as a string, or any character not in such a set. This has nothing to do yet with iterators and seeking/telling. Just letting C do the

    while data[index] not in whitespace:
        index += 1

part speeds up my PDF tokenizer by a factor between 3 and 4. I have never compared that directly to using regular expressions, though... As a bonus, even with this minor addition the Python code looks a little cleaner already:

    c = data[cursor]
    while c in whitespace:
        # Whitespace tokens.
        cursor += 1
        if c == '%':
            # We're just inside a comment, read beyond EOL.
            while data[cursor] not in '\r\n':
                cursor += 1
            cursor += 1
        c = data[cursor]

becomes

    cursor = data.skipany(whitespace, start)
    c = data[cursor]
    while c == '%':
        # Whitespace tokens: comments till EOL and whitespace.
        cursor = data.skipother('\r\n', cursor)
        cursor = data.skipany(whitespace, cursor)
        c = data[cursor]

(removing '%' from the whitespace string, in case you wonder). The next thing to do is make FrankenString behave. Right now there's too much copying of string content going on every time a FrankenString is initialized; I'd like it to share string content with other FrankenStrings or strs much like cStringIO does. I hope it's just a matter of learning from cStringIO.
To justify the "franken" part of the name some more, I consider mixing in yet another ingredient and making the thing behave like a buffer, in that a FrankenString should be possible to make from only part of a string without copying data. After that, the thing about seeking and telling iterators over characters or search results comes in. I don't think it will make much difference in performance now that the stupid character searching has been done in C, but it'll hopefully make for more elegant Python code. -- Thomas
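For readers following along, pure-Python equivalents of the two search methods used in the report above; the C versions just advance a pointer over the buffer, and the signatures here are mine:

```python
def skipany(data, chars, start=0):
    """Index of the first character at or after start that is NOT in chars.

    Returns len(data) if every remaining character is in chars.
    """
    pos = start
    while pos < len(data) and data[pos] in chars:
        pos += 1
    return pos

def skipother(data, chars, start=0):
    """Index of the first character at or after start that IS in chars.

    Returns len(data) if no remaining character is in chars.
    """
    pos = start
    while pos < len(data) and data[pos] not in chars:
        pos += 1
    return pos
```

These run the very per-character Python loop the C extension exists to avoid, but they document the semantics precisely.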
Re: is this pythonic?
Mage wrote: Or is there better way?

    for (i, url) in [(i, links[i]) for i in range(len(links))]:
        ...

links is a list.

    for i, url in enumerate(links):

-- Thomas
Re: Using gnu readline in my own python program?
sboyle55 wrote: Hi...I'm a newbie to python, and very confused. I'm writing a simple program and want the user to be able to edit a line that I display using the full gnu readline capabilities. (For example, control+a to go to the beginning of the line.) Then I want to be able to read the line after it's been edited... Probably the built-in function raw_input already does what you want. It uses readline if available. -- Thomas
Re: Using gnu readline in my own python program?
sboyle55 wrote: raw_input is an excellent suggestion, and almost exactly what I want. But, I want to give the user a string to edit, not have them start from scratch inputting a string. http://svn.thomas-lotze.de/PyASDF/pyasdf/cli.py Take a look at the fancy_input function. -- Thomas
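The trick such a fancy_input function presumably uses is readline's pre-input hook, which stuffs text into the edit buffer before the prompt is shown. A sketch (on Python 2, substitute raw_input; the readline module is only available where GNU readline is):

```python
import readline

def fancy_input(prompt, initial):
    # Pre-load the line buffer so the user edits `initial` in place
    # instead of typing from scratch.
    def startup_hook():
        readline.insert_text(initial)
        readline.redisplay()
    readline.set_pre_input_hook(startup_hook)
    try:
        return input(prompt)
    finally:
        # Don't leave the hook installed for later input() calls.
        readline.set_pre_input_hook(None)
```

All the usual readline editing (control+a and friends) works on the pre-filled text.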
Re: accessor/mutator functions
Dan Sommers wrote: I think I'd add a change_temperature_to method that accepts the target temperature and some sort of timing information, depending on how the rest of the program and/or thread is structured. But then you put application logic into a library function. Doing this consistently leads to a monster of a library that tries to account for all possible applications. Where does this leave the KISS principle? In the case of simply reading the current temperature, and not knowing what's inside that device driver, I'd still lean away from exposing a current temperature attribute directly. I think part of my thinking comes from my old Pascal days, when it made me cringe to think that x:=b; might actually execute a subroutine rather than just copy some memory around. Then you also avoid lists, dicts and, ironically, methods. Accessing methods means to access a callable attribute, after all, with all the stuff going on behind the scenes on attribute access. -- Thomas
Re: Python Graphing Utilities.
Kenneth Miller wrote: I am new to Python and I was wondering what graphing utilities would be available to me. I have already tried BLT and after weeks of unsuccessful installs I'd like to find something else. Anything someone would recommend? You might also want to check out PyX: http://pyx.sf.net/. -- Thomas
Semantics of propagated exceptions
Hi, I wonder how to solve the following problem in the most pythonic way: Suppose you have a function f which, as part of its protocol, raises some standard exception E under certain, well-defined circumstances. Suppose further that f calls other functions which may also raise E. How to best distinguish whether an exception E raised by f has the meaning defined by the protocol or just comes from details of the implementation? As an example, let's inherit from dict and replace __getitem__. It is supposed to raise a KeyError if an item is not found in the mapping. But what if it does some magic to use default values:

    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        defaults = foobar['default']
        return defaults[key]

(Note the explicit dict.__getitem__ call; a plain self[key] would recurse into this very method.) If 'default' is not in foobar, a KeyError is raised by that lookup and propagates to the calling code. However, the problem is not "key can't be found" but "I'm too stupid to find out whether key can be found". In a web context where key identifies the resource requested, this might make the difference between a "404 Not found" and a "500 Internal server error" response. Several solutions come to mind, none of which I'm satisfied with:

- f might catch E exceptions from the implementation and raise some other error in their stead, maybe with an appropriate message or treating the traceback in some helpful way. This destroys the original exception.

- f might catch and re-raise E exceptions, setting some flag on them that identifies them as protocol exceptions or not. This requires calling code to know about the flag.

- Calling code might guess whether the exception comes from some inner working of f from how deep in the calling stack the exception originated. Obviously, this will not be easy or may not even work at all if f calls related functions which might also raise E with the protocol semantics. This requires calling code to do some magic but keeps f from having to catch and raise exceptions all over the place.
Some gut feeling tells me the first option is preferable, but I'd like to read your opinions and maybe other alternatives. -- Thomas
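To make the first option concrete, a sketch of the dict example with implementation-level KeyErrors translated into a distinct error type; the names and the foobar registry are hypothetical:

```python
foobar = {}  # hypothetical registry holding the default values

class ConfigurationError(Exception):
    """Implementation trouble, as opposed to the protocol's KeyError."""

class MagicDict(dict):
    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        try:
            defaults = foobar['default']
        except KeyError:
            # An implementation detail failed; don't let it masquerade
            # as the protocol's "key can't be found".
            raise ConfigurationError("no defaults configured")
        return defaults[key]  # a KeyError here DOES mean "not found"
```

A web layer can then map KeyError to 404 and ConfigurationError (like any other unexpected exception) to 500. The cost is exactly the drawback named above: the original exception's identity is lost, though its traceback survives via exception chaining in Python 3.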
Re: Semantics of propagated exceptions
Sorry for not answering for such a long time. It's because my question originated from a discussion within our company which moved out of focus shortly after I posted, and while waiting for some response from them before replying here, I forgot about it. Steve Holden wrote: - f might catch E exceptions from the implementation and raise some other error in their stead, maybe with an appropriate message or treating the traceback in some helpful way. This destroys the original exception. My solution, of course, takes this approach. Good to see that my gut feeling as to the most pythonic approach seems to coincide with the answers I've received ;o) -- Thomas
Minimally intrusive XML editing using Python
I wonder what Python XML library is best for writing a program that makes small modifications to an XML file in a minimally intrusive way. By that I mean that information the program doesn't recognize is kept, as are comments and whitespace, the order of attributes and even whitespace around attributes. In short, I want to be able to change an XML file while producing minimal textual diffs. Most libraries don't allow controlling the order of and the whitespace around attributes, so what's generally left to do is store snippets of original text along with the model objects and re-use that for writing the edited XML if the model wasn't modified by the program. Does a library exist that helps with this? Does any XML library at all allow structured access to the text representation of a tag with its attributes? Thank you very much. -- Thomas
Re: Minimally intrusive XML editing using Python
Stefan Behnel wrote: Take a look at canonical XML (C14N). In short, that's the only way to get a predictable XML serialisation that can be used for textual diffs. It's supported by lxml. Thank you for the pointer. IIUC, c14n is about changing an XML document so that its textual representation is reproducible. While this representation would certainly solve my problem if I were to deal with input that's already in c14n form, it doesn't help me handle arbitrarily formatted XML in a minimally intrusive way. IOW, I don't want the XML document to obey the rules of a process, but instead I want a process that respects the textual form my input happens to have. -- Thomas
Re: Minimally intrusive XML editing using Python
Chris Rebert wrote: Have you considered using an XML-specific diff tool such as: I'm afraid I'll have to fall back to using such a thing if I don't find a solution to what I actually want to do. I do realize that XML isn't primarily about its textual representation, so I guess I shouldn't be surprised if what I'm looking for doesn't exist. Still, it would be nice if it did... -- Thomas
Re: Minimally intrusive XML editing using Python
Please consider this a reply to any unanswered messages I received in response to my original post. Dave Angel wrote: What's your real problem, or use case? Are you just concerned with diffing, or are others likely to read the xml, and want it formatted the way it already is? I'd like to put the XML under revision control along with other stuff. Other people should be able to make sense of the diffs and I'd rather not require them to configure their tools to use some XML differ. And how general do you need this tool to be? For example, if the only thing you're doing is modifying existing attributes or existing tags, the minimal change would be pretty unambiguous. But if you're adding tags, or adding content on what was an empty element, then the requirement gets fuzzy. And finding an existing library for something fuzzy is unlikely. Sure. I guess it's something like an 80/20 problem: Changing attributes in a way that keeps the rest of the XML intact will go a long way, and as we're talking about XML that is supposed to be looked at by humans, I would base any further requirements on the assumption that it's pretty-printed in some way, so that removing an element, for example, can be defined by touching as few lines as possible, and adding one can be restricted to adding a line in the appropriate place. If more complex stuff isn't as well-defined, that would be entirely OK with me. Sample input, change list, and desired output would be very useful.
I'd like to be able to reliably produce a diff like this using a program that lets me change the value in some useful way, which might be dragging a point across a map with the mouse in this example:

    --- foo.gpx	2009-05-30 19:45:45.0 +0200
    +++ bar.gpx	2009-11-23 17:41:36.0 +0100
    @@ -11,7 +11,7 @@
             <speed>0.792244</speed>
             <fix>2d</fix>
           </trkpt>
    -      <trkpt lat="50.605995000" lon="10.70968">
    +      <trkpt lat="50.605985000" lon="10.70968">
             <ele>508.30</ele>
             <time>2009-05-30T16:37:10Z</time>
             <course>15.15</course>

-- Thomas
[issue30046] csv: Inconsistency re QUOTE_NONNUMERIC
New submission from Thomas Lotze: A csv.writer with quoting=csv.QUOTE_NONNUMERIC does not quote boolean values, which makes a csv.reader with the same quoting behaviour fail on that value:

csvbug.py:

    import csv
    import io

    f = io.StringIO()
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(['asdf', 1, True])
    f.seek(0)
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    for row in reader:
        print(row)

    $ python3 csvbug.py
    Traceback (most recent call last):
      File "csvbug.py", line 12, in <module>
        for row in reader:
    ValueError: could not convert string to float: 'True'

I'd consider this inconsistency a bug, but in any case something that needs documenting.

components: Library (Lib)
messages: 291516
nosy: tlotze
priority: normal
severity: normal
status: open
title: csv: Inconsistency re QUOTE_NONNUMERIC
type: behavior

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue30046>
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
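Until the csv module resolves or documents this, one workaround is to coerce booleans before writing: bool is a subclass of int, so the writer treats it as numeric and leaves it unquoted, but it serializes to a word the reader can't parse back as a float. A sketch:

```python
import csv
import io

def write_row(writer, row):
    # Booleans would be written unquoted as True/False; make them ints
    # so QUOTE_NONNUMERIC round-trips them as numbers.
    writer.writerow([int(v) if isinstance(v, bool) else v for v in row])

f = io.StringIO()
writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
write_row(writer, ['asdf', 1, True])
f.seek(0)
rows = list(csv.reader(f, quoting=csv.QUOTE_NONNUMERIC))
print(rows)  # [['asdf', 1.0, 1.0]]
```

The True/False distinction is reduced to 1/0, of course; if it must survive, quoting the value as a string and converting after reading is the alternative.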