Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))

2005-04-24 Thread Ilpo Nyyssönen

[reorganized a bit]

Ville Vainio [EMAIL PROTECTED] writes:

 Why don't you use external validation on the created xml? Validating
 it every time sounds way too much like Javaic B&D to be fun
 anymore. Pickle should serve you well, and would probably remove about
 half of your code. Do the simplest thing that could possibly work
 and all that.

What is the point in doing validation if it isn't done every time? Why
wouldn't I do it every time? It isn't that slow.

Pickle doesn't have validation. I am not comfortable using it as a
storage format that should stay reliable over years as the program
evolves. It also doesn't tell me if my program has put something into
the data other than what I meant to. The program will just throw some
weird exception.

I want to do the simplest thing, but I also want something that helps
me keep the program usable in the future. I would rather spend some
resources on validation up front than spend more resources later
dealing with an undetermined lump of data.

  python has shipped with a fast XML parser since 2.1, or so.

 Ilpo With what features? Validation? I really want a validating
 Ilpo parser with a DOM interface. (Or something better than DOM; it
 Ilpo must be object oriented.)

 Check out (coincidentally) Fredrik's elementtree:

 http://effbot.org/zone/element-index.htm

At least the interface looks quite simple and usable. With some
validation wrapped around it, it might be ok...

 Ilpo And my point is that regular expression compilation can
 Ilpo be a problem in python. The current regular expression
 Ilpo engine is just unusably slow in short-lived programs with a
 Ilpo larger number of regexps. And fixing it should not be
 Ilpo that hard: an easy improvement would be to add some kind of
 Ilpo storage mechanism for the compiled regexps. Are there any
 Ilpo reasons not to do this?

 It should start life as a third-party module (perhaps written by you,
 who knows :-). If it is deemed useful and clean enough, it could be
 integrated w/ python proper. This is clearly something that should not
 be in the python core, because the regexps themselves aren't there
 either.

How could it work automatically in a separate module? Replacing
re.compile with something sounds like a possible way of getting the
regexps, but how and where would the compiled data be stored? Is there
a way to put it into the byte code file?
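Ilpo's idea of replacing re.compile can be sketched as a context manager that temporarily swaps in a recording wrapper. This is a minimal sketch under stated assumptions: `recording_compile` is a hypothetical helper, it only sees calls made through the `re.compile` attribute (module-level functions such as `re.match` compile through an internal helper instead, and modules that already did `from re import compile` keep the original), and it does not solve the storage question; it only collects the (pattern, flags) pairs.

```python
import contextlib
import re


@contextlib.contextmanager
def recording_compile(log):
    """Hypothetical helper: while active, every re.compile(pattern, flags)
    call appends (pattern, flags) to log before compiling as usual."""
    original = re.compile

    def compile_and_record(pattern, flags=0):
        log.append((pattern, flags))
        return original(pattern, flags)

    re.compile = compile_and_record
    try:
        yield log
    finally:
        re.compile = original  # always put the real function back


seen = []
with recording_compile(seen):
    date_re = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
    word_re = re.compile(r"\w+", re.IGNORECASE)

print(seen)  # the patterns a storage layer would need to persist
```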

Maybe I need to take a look at it when I find the time...

-- 
Ilpo Nyyssönen # biny # /* :-) */
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))

2005-04-24 Thread Fredrik Lundh
Ilpo Nyyssönen wrote:

 What is the point in doing validation if it isn't done every time? Why
 wouldn't I do it every time? It isn't that slow thing to do.

DTD validation is useful in two cases: making sure that data from
a foreign source has the right structure, and making sure that data
you create has the right structure.  The former is relevant for
deployed code, but the latter really only makes sense during
development, and can easily be solved by running an external validator
as part of your test suite.

 Pickle doesn't have validation. I am not comfortable using it as a
 storage format that should stay reliable over years as the program
 evolves. It also doesn't tell me if my program has put something into
 the data other than what I meant to.

But DTD validation doesn't tell you that either -- it's only concerned
with the structure, not the content. You can get a bit further with better
schema technologies, but if you want reliable storage, use checksums
or digests.  Validation is like the helmet used by skydivers; if you think
that's all you need, you're sure going to be surprised when you hit the
ground.
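The digest approach can be sketched with the standard hashlib module: store a SHA-256 digest next to the saved data and refuse to load it if the two no longer match. The file layout (hex digest on the first line, payload after it) and the function names are assumptions for illustration. Like validation, a digest has limits: it catches corruption after the data was written, not a program that wrote the wrong thing in the first place.

```python
import hashlib
import os
import tempfile


def save_with_digest(path, payload: bytes):
    """Hypothetical layout: hex SHA-256 digest, newline, then the raw payload."""
    digest = hashlib.sha256(payload).hexdigest()
    with open(path, "wb") as f:
        f.write(digest.encode("ascii") + b"\n")
        f.write(payload)


def load_with_digest(path) -> bytes:
    """Return the payload, or raise ValueError if it no longer matches its digest."""
    with open(path, "rb") as f:
        stored = f.readline().strip().decode("ascii")
        payload = f.read()
    if hashlib.sha256(payload).hexdigest() != stored:
        raise ValueError(f"{path}: checksum mismatch, data is corrupt")
    return payload


path = os.path.join(tempfile.mkdtemp(), "calendar.xml")
save_with_digest(path, b"<calendar><event/></calendar>")
print(load_with_digest(path))
```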

 I want to do the simplest thing, but I also want something that helps
 me keep the program usable in the future. I would rather spend some
 resources on validation up front than spend more resources later
 dealing with an undetermined lump of data.

If you want the simplest thing, get rid of the DTD, and make your
loader ignore things that it doesn't recognize, use default values for
fields that are not required (or weren't in the format from the start),
and give a nice readable error message if something required is
missing.  That'll give you a nice, portable, reliable, and extremely
future-proof design.
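A loader along those lines might look roughly like this with ElementTree (which later entered the standard library as xml.etree.ElementTree in Python 2.5). The `<event>` element, its child names, and which of them are required are a hypothetical calendar format, not Ilpo's actual one.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: required child elements, and defaults for optional ones.
REQUIRED = ("title", "date")
DEFAULTS = {"location": "", "priority": "normal"}


def load_events(xml_text):
    """Parse <event> elements, ignoring unknown children and filling in defaults."""
    root = ET.fromstring(xml_text)
    events = []
    for n, elem in enumerate(root.iter("event"), start=1):
        event = dict(DEFAULTS)
        for child in elem:
            if child.tag in REQUIRED or child.tag in DEFAULTS:
                event[child.tag] = (child.text or "").strip()
            # any other tag is skipped, so files from newer versions still load
        missing = [name for name in REQUIRED if name not in event]
        if missing:
            raise ValueError(f"event #{n} is missing required field(s): {', '.join(missing)}")
        events.append(event)
    return events


doc = """<calendar>
  <event><title>Meeting</title><date>2005-04-25</date><color>red</color></event>
</calendar>"""
print(load_events(doc))
```

Unknown elements such as `<color>` are dropped, optional fields fall back to defaults, and only a missing required field stops the load, with a message naming the field.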

  Check out (coincidentally) Fredrik's elementtree:
 
  http://effbot.org/zone/element-index.htm

 At least the interface looks quite simple and usable. With some
 validation wrapped around it, it might be ok...

I was going to point you to a validating parser for ET, but the "it might
be ok" statement is a bit too arrogant for my taste.

/F





Re: Object oriented storage with validation (was: Re: Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl))

2005-04-24 Thread Ville Vainio
 Ilpo == Ilpo Nyyssönen iny writes:

Ilpo Pickle doesn't have validation. I am not comfortable
Ilpo using it as a storage format that should stay reliable over
Ilpo years as the program evolves. It also doesn't tell me if

That's why you should implement xml import/export mechanism and use
the xml file as the canonical data, while the pickle is only a cache
for the data.
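That arrangement, with the XML file as canonical data and a pickle as a disposable cache, could be sketched as below. The `load_cached` name, the `parse_xml` callable, the `.cache` suffix, and the modification-time comparison are all assumptions; any missing or unreadable cache is simply rebuilt from the XML.

```python
import os
import pickle


def load_cached(xml_path, parse_xml, cache_path=None):
    """Return parse_xml(xml_path), reusing a pickle cache that is newer than the XML.

    The XML file stays the canonical data; the cache can be deleted at any time.
    """
    cache_path = cache_path or xml_path + ".cache"
    try:
        if os.path.getmtime(cache_path) > os.path.getmtime(xml_path):
            with open(cache_path, "rb") as f:
                return pickle.load(f)
    except Exception:
        pass  # missing or unreadable cache: fall back to the canonical XML
    data = parse_xml(xml_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

In this sketch `parse_xml` would be the slow, validating XML loader, ideally returning plain dicts and lists, which pickle handles well; the fast path only runs while the XML file is older than its cache.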

Ilpo How could it work automatically in a separate module? Replacing
Ilpo re.compile with something sounds like a possible way of getting
Ilpo the regexps, but how and where would the compiled data be stored?
Ilpo Is there a way to put it into the byte code file?

Do what you already did - dump the regexp cache to a separate file. 
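One caveat if such a cache file is written with pickle: CPython's re module registers a pickle reduction for compiled patterns that records only the pattern string and flags, so loading the file sends every pattern through the compiler again. The small check below illustrates that; a cache that really skipped compilation would need the engine's internal compiled form, which re does not offer as a public API.

```python
import pickle
import re

pattern = re.compile(r"^(\d{2}):(\d{2})$", re.MULTILINE)
data = pickle.dumps(pattern)

# The pickle holds the source text and flags, not the compiled program.
print(pattern.pattern.encode() in data)

# Unpickling goes back through re's compile machinery to rebuild the object.
restored = pickle.loads(data)
print(restored.pattern == pattern.pattern, restored.flags == pattern.flags)
```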

-- 
Ville Vainio   http://tinyurl.com/2prnb


Caching compiled regexps across sessions (was Re: Regular Expressions - Python vs Perl)

2005-04-23 Thread Ville Vainio
 Ilpo == Ilpo Nyyssönen iny writes:

 so you picked the wrong file format for the task, and the slowest

Ilpo What would you recommend instead?

Ilpo I have searched for alternatives, but somehow I still find XML
Ilpo the best there is. It is a standard format with a standard
Ilpo programming API.

Ilpo I don't want to lose my calendar data. XML as a standard
Ilpo format makes it easier to convert later to some other
Ilpo format. As a textual format it is also readable in raw form,
Ilpo and this eases debugging.

Use pickle, perhaps, for optimal speed and code non-ugliness. You can
always use xml as import/export format, perhaps even dumping the db to
xml at the end of each day.
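The reverse arrangement, pickle as the working store and XML only as an export format, might look like this sketch. The list-of-dicts event structure, the function names, and the element names are hypothetical; the export assumes every dict key is a valid XML element name.

```python
import pickle
import xml.etree.ElementTree as ET


def save_db(events, path):
    """Working store: pickle the in-memory list of event dicts."""
    with open(path, "wb") as f:
        pickle.dump(events, f)


def load_db(path):
    """Read the working store back."""
    with open(path, "rb") as f:
        return pickle.load(f)


def export_xml(events, path):
    """Periodic export (e.g. once a day) to plain XML that other tools can read.

    Assumes every dict key is a valid XML element name.
    """
    root = ET.Element("calendar")
    for event in events:
        elem = ET.SubElement(root, "event")
        for key, value in sorted(event.items()):
            ET.SubElement(elem, key).text = str(value)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
```

Day-to-day loads stay fast through `load_db`, while the exported XML remains a readable, tool-independent copy of the data.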

Ilpo And my point is that regular expression compilation can
Ilpo be a problem in python. The current regular expression
Ilpo engine is just unusably slow in short-lived programs with a
Ilpo larger number of regexps. And fixing it should not be
Ilpo that hard: an easy improvement would be to add some kind of
Ilpo storage mechanism for the compiled regexps. Are there any
Ilpo reasons not to do this?

It should start life as a third-party module (perhaps written by you,
who knows :-). If it is deemed useful and clean enough, it could be
integrated w/ python proper. This is clearly something that should not
be in the python core, because the regexps themselves aren't there
either.

 python has shipped with a fast XML parser since 2.1, or so.

Ilpo With what features? Validation? I really want a validating
Ilpo parser with a DOM interface. (Or something better than DOM; it
Ilpo must be object oriented.)

Check out (coincidentally) Fredrik's elementtree:

http://effbot.org/zone/element-index.htm

Ilpo I don't want to make my programs ugly (read: use some lower-level
Ilpo interface) and error prone (read: no validation) to make them
Ilpo fast.

Why don't you use external validation on the created xml? Validating
it every time sounds way too much like Javaic B&D to be fun
anymore. Pickle should serve you well, and would probably remove about
half of your code. Do the simplest thing that could possibly work
and all that.

-- 
Ville Vainio   http://tinyurl.com/2prnb