Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))
[reorganized a bit]

Ville Vainio [EMAIL PROTECTED] writes:

> Why don't you use external validation on the created xml? Validating it every time sounds way too much like Javaic B&D to be fun anymore.
>
> Pickle should serve you well, and would probably remove about half of your code. Do the simplest thing that could possibly work and all that.

What is the point in doing validation if it isn't done every time? Why wouldn't I do it every time? It isn't that slow a thing to do.

Pickle doesn't have validation. I am not comfortable using it as a storage format that should be reliable over years when the program evolves. It also doesn't tell me if my program has put something into the data other than what I meant to. The program will just throw some weird exception.

I want to do the simplest thing, but I also want something that helps me keep the program usable in the future as well. I prefer putting some resources into validation initially rather than using more resources later to deal with an undetermined lump of data.

> >> python has shipped with a fast XML parser since 2.1, or so.
>
> Ilpo> With what features? validation? I really want a validating parser with a DOM interface. (Or something better than DOM, must be object oriented.)
>
> Check out (coincidentally) Fredrik's elementtree:
>
> http://effbot.org/zone/element-index.htm

At least the interface looks quite simple and usable. With some validation wrapping over it, it might be ok...

> Ilpo> And my point is that the regular expression compilation can be a problem in python. The current regular expression engine is just unusably slow in short-lived programs with a larger number of regexps. And fixing it should not be that hard: an easy improvement would be to add some kind of storing mechanism for the compiled regexps. Are there any reasons not to do this?
>
> It should start life as a third-party module (perhaps written by you, who knows :-). If it is deemed useful and clean enough, it could be integrated w/ python proper. This is clearly something that should not be in the python core, because the regexps themselves aren't there either.

How can it work automatically in a separate module? Replacing re.compile with something sounds like a possible way of getting the regexps, but how and where to store the compiled data? Is there a way to put it in the byte code file?

Maybe I need to take a look at it when I find the time...

--
Ilpo Nyyssönen # biny # /* :-) */
--
http://mail.python.org/mailman/listinfo/python-list
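On the "how and where to store the compiled data" question, here is a minimal sketch of the separate-file approach using pickle. The function name `load_patterns`, the cache file and the `(pattern, flags)` layout are all hypothetical; nothing like this ships with Python. One caveat that matters: as far as I know, CPython pickles a compiled pattern as its source string plus flags and recompiles it on load, so this shows the mechanics of a cache file rather than a guaranteed saving in compile time.

```python
import pickle
import re
from pathlib import Path


def load_patterns(sources, cache_path):
    """Return {name: compiled pattern}, reusing cache_path when it matches.

    sources maps a name to a (pattern string, flags) pair. The names and
    the file layout are assumptions made for this sketch.
    """
    cache_path = Path(cache_path)
    if cache_path.exists():
        try:
            with cache_path.open("rb") as fh:
                cached_sources, compiled = pickle.load(fh)
            if cached_sources == sources:
                return compiled
        except Exception:
            pass  # unreadable cache: fall through and rebuild it

    # No usable cache (missing, unreadable, or built from other sources).
    compiled = {name: re.compile(pattern, flags)
                for name, (pattern, flags) in sources.items()}
    with cache_path.open("wb") as fh:
        pickle.dump((sources, compiled), fh)
    return compiled
```

A caller would pass something like `{"date": (r"\d{4}-\d{2}-\d{2}", 0)}` and a path next to the program's other data files. Only load cache files the program wrote itself, since unpickling untrusted data is unsafe.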
Re: Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))
Ilpo Nyyssönen wrote:

> What is the point in doing validation if it isn't done every time? Why wouldn't I do it every time? It isn't that slow a thing to do.

DTD validation is useful in two cases: making sure that data from a foreign source has the right structure, and making sure that data you create has the right structure. The former is relevant for deployed code, but the latter really only makes sense during development, and can easily be solved by running an external validator as part of your test suite.

> Pickle doesn't have validation. I am not comfortable using it as a storage format that should be reliable over years when the program evolves. It also doesn't tell me if my program has put something into the data other than what I meant to.

But DTD validation doesn't tell you that either -- it's only concerned with the structure, not the content. You can get a bit further with better schema technologies, but if you want reliable storage, use checksums or digests. Validation is like the helmet used by skydivers; if you think that's all you need, you sure are going to be surprised when you hit the ground.

> I want to do the simplest thing, but I also want something that helps me keep the program usable in the future as well. I prefer putting some resources into validation initially rather than using more resources later to deal with an undetermined lump of data.

If you want the simplest thing, get rid of the DTD, and make your loader ignore things that it doesn't recognize, use default values for fields that are not required (or weren't in the format from the start), and give a nice readable error message if something required is missing. That'll give you a nice, portable, reliable, and extremely future-proof design.

>> Check out (coincidentally) Fredrik's elementtree:
>>
>> http://effbot.org/zone/element-index.htm
>
> At least the interface looks quite simple and usable. With some validation wrapping over it, it might be ok...

I was going to point you to a validating parser for ET, but the "it might be ok" statement is a bit too arrogant for my taste.

/F
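The DTD-free loader described above (ignore what you don't recognize, default what is optional, complain clearly about what is missing) can be sketched with ElementTree, imported here as `xml.etree.ElementTree`, the name under which it later entered the standard library. The `<calendar>`/`<event>` layout, the field names, the defaults and `LoadError` are invented for illustration and are not from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical calendar format: these field names and defaults are assumptions.
REQUIRED = ("title", "start")
DEFAULTS = {"location": "", "priority": "normal"}


class LoadError(Exception):
    """Raised with a readable message when required data is missing."""


def load_events(xml_text):
    root = ET.fromstring(xml_text)
    events = []
    for index, node in enumerate(root.findall("event"), start=1):
        event = dict(DEFAULTS)  # optional fields start at their defaults
        for child in node:
            if child.tag in REQUIRED or child.tag in DEFAULTS:
                event[child.tag] = (child.text or "").strip()
            # any other tag is unknown to this version: ignore it
        missing = [name for name in REQUIRED if name not in event]
        if missing:
            raise LoadError("event #%d is missing required field(s): %s"
                            % (index, ", ".join(missing)))
        events.append(event)
    return events
```

A file written by a newer version of the program (with extra elements) and one written by an older version (lacking `location` or `priority`) both load; only a missing `title` or `start` stops the loader, and the message says which event and which field.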
Re: Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))
Ilpo == Ilpo Nyyssönen iny writes:

Ilpo> Pickle doesn't have validation. I am not comfortable using it as a storage format that should be reliable over years when the program evolves. It also doesn't tell me if

That's why you should implement an xml import/export mechanism and use the xml file as the canonical data, while the pickle is only a cache for the data.

Ilpo> How can it work automatically in a separate module? Replacing re.compile with something sounds like a possible way of getting the regexps, but how and where to store the compiled data? Is there a way to put it in the byte code file?

Do what you already did - dump the regexp cache to a separate file.

--
Ville Vainio http://tinyurl.com/2prnb
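The "xml is canonical, pickle is only a cache" arrangement might look roughly like this. It is a sketch under assumptions: `parse_calendar`, `load_calendar`, the `<event>` layout and the use of file modification times to decide freshness are all choices made here, not something specified in the thread.

```python
import os
import pickle
import xml.etree.ElementTree as ET


def parse_calendar(xml_path):
    """Hypothetical parser: one dict per <event>, keyed by child tag."""
    root = ET.parse(xml_path).getroot()
    return [{child.tag: child.text for child in event}
            for event in root.findall("event")]


def load_calendar(xml_path, cache_path):
    """Use the pickle cache only when it is at least as new as the XML."""
    try:
        if os.path.getmtime(cache_path) >= os.path.getmtime(xml_path):
            with open(cache_path, "rb") as fh:
                return pickle.load(fh)
    except (OSError, pickle.UnpicklingError, EOFError):
        pass  # missing or unreadable cache: fall back to the XML

    # The XML file is the canonical data; rebuild the cache from it.
    data = parse_calendar(xml_path)
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)
    return data
```

Deleting the cache file is always safe with this design, since everything in it can be rebuilt from the XML. Timestamp comparison can miss an edit made within the file system's time resolution; a digest of the XML stored alongside the cache would be a stricter check.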
Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl)
Ilpo == Ilpo Nyyssönen iny writes:

>> so you picked the wrong file format for the task, and the slowest

Ilpo> What would you recommend instead?

Ilpo> I have searched for alternatives, but somehow I still find XML the best there is. It is a standard format with a standard programming API.

Ilpo> I don't want to lose my calendar data. XML as a standard format makes it easier to convert later to some other format. As a textual format it is also readable in raw form, which eases debugging.

Use pickle, perhaps, for optimal speed and code non-ugliness. You can always use xml as import/export format, perhaps even dumping the db to xml at the end of each day.

Ilpo> And my point is that the regular expression compilation can be a problem in python. The current regular expression engine is just unusably slow in short-lived programs with a larger number of regexps. And fixing it should not be that hard: an easy improvement would be to add some kind of storing mechanism for the compiled regexps. Are there any reasons not to do this?

It should start life as a third-party module (perhaps written by you, who knows :-). If it is deemed useful and clean enough, it could be integrated w/ python proper. This is clearly something that should not be in the python core, because the regexps themselves aren't there either.

>> python has shipped with a fast XML parser since 2.1, or so.

Ilpo> With what features? validation? I really want a validating parser with a DOM interface. (Or something better than DOM, must be object oriented.)

Check out (coincidentally) Fredrik's elementtree:

http://effbot.org/zone/element-index.htm

Ilpo> I don't want to make my programs ugly (read: use some more low level interface) and error prone (read: no validation) to make them fast.

Why don't you use external validation on the created xml? Validating it every time sounds way too much like Javaic B&D to be fun anymore.

Pickle should serve you well, and would probably remove about half of your code. Do the simplest thing that could possibly work and all that.

--
Ville Vainio http://tinyurl.com/2prnb
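For the "pickle as the working store, xml as export format" suggestion, an end-of-day dump could be as small as the following sketch. It assumes the pickled database is a list of flat dicts whose keys are valid XML element names; `export_to_xml` and the `<calendar>`/`<event>` layout are hypothetical.

```python
import pickle
import xml.etree.ElementTree as ET


def export_to_xml(pickle_path, xml_path):
    """Write a pickled list of event dicts out as XML (assumed layout)."""
    with open(pickle_path, "rb") as fh:
        events = pickle.load(fh)

    root = ET.Element("calendar")
    for event in events:
        node = ET.SubElement(root, "event")
        # Sorted so that repeated exports of the same data are identical.
        for field, value in sorted(event.items()):
            ET.SubElement(node, field).text = str(value)

    ET.ElementTree(root).write(xml_path, encoding="utf-8",
                               xml_declaration=True)
```

The exported file stays readable in a text editor and convertible to other formats, which covers the debugging and data-safety concerns raised earlier, while the program itself keeps working on the pickle.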