dllimporter -- import dlls for zip archived or frozen applications

2005-05-20 Thread Dieter Maurer
'Dllimporter' is a package to facilitate the import of shared libraries
(aka dlls or extension modules) for Python applications
running from a zip archive or an executable (a frozen application).

The standard Python import mechanism cannot import extension modules
that are part of a zip archived or frozen package. 'Dllimporter' overcomes
this restriction.

The package provides a class 'dllimporter' whose instances implement
Python's importer protocol (see PEP 302). These instances are
designed to be used as importers on Python's 'sys.meta_path' (again,
see PEP 302 for details).
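
For illustration, registration would look roughly like this (a minimal
sketch; the exact class name and constructor arguments are assumptions
based on the description above):

  import sys
  from dllimporter import dllimporter

  # PEP 302: importers on sys.meta_path are consulted before the
  # standard import machinery
  sys.meta_path.append(dllimporter())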


Download: http://www.dieter.handshake.de/pyprojects/dllimporter.tgz


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


[Ann] DMpdb: improvements and extensions for pdb

2005-08-15 Thread Dieter Maurer
DMpdb is a tiny Python package with improvements and extensions for
pdb, Python's built-in debugger:

  *  the 'where' command gets optional arguments 'number' and 'end'
     to control the number of entries printed;

     its output is made more readable;

     by defining the method 'getAdditionalFrameInfo', a derived class
     can show additional debugging info;

  *  the 'do_break' command can now be used from outside
     the debugger to define breakpoints (see the sketch below);

  *  the new 'frame' command allows switching directly to an
     interesting frame.

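For comparison, the standard library already allows programmatic
breakpoints via bdb (pdb's base class); DMpdb additionally makes the
more convenient 'do_break' usable this way. A minimal sketch with the
standard API (file name and line are illustrative):

  import pdb

  debugger = pdb.Pdb()
  # set_break comes from bdb.Bdb: a file/line breakpoint defined
  # from outside the interactive debugger
  debugger.set_break('mymodule.py', 10)
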
The package also defines the module Zpdb providing a debugger class
that understands Zope's additional debugging info ('__traceback_info__' and
'__traceback_supplement__').


Download: http://www.dieter.handshake.de/pyprojects
-- 
http://mail.python.org/mailman/listinfo/python-announce-list

Support the Python Software Foundation:
http://www.python.org/psf/donations.html


Re: Embedding a restricted python interpreter

2005-01-17 Thread Dieter Maurer
Paul Rubin http://[EMAIL PROTECTED] writes on 08 Jan 2005 14:56:43 -0800:
 Dieter Maurer [EMAIL PROTECTED] writes:
It uses a specialized compiler that prevents dangerous bytecode operations
to be generated and enforces a restricted builtin environment.
 
 Does it stop the user from generating his own bytecode strings and
 demarshalling them?

I am not sure I understand you correctly:

  In the standard setup, the code has no access to most
  of Python's runtime library. Only a few selected modules
  are deemed safe and can be imported (and used) in
  RestrictedPython. 'marshal' (and unmarshalling) is not considered safe.
  Security declarations can be used to make more modules importable -- but
  then, this is an explicit decision by the application developer.

  *If* the framework decided to exchange byte code between
  user and interpreter, then there would be no security at
  all, because the interpreter is the standard interpreter
  and security is built into the compilation process.
  Of course, you should not step in *after* the secured step ;-)

  Thus, RestrictedPython expects that the user sends
  Python source code (and not byte code!); it compiles
  this source code into byte code that enforces a strict
  access and facility policy.
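
A minimal sketch of this usage (the names follow RestrictedPython's
documented API; treat the details as illustrative):

  from RestrictedPython import compile_restricted, safe_builtins

  source = "result = 2 + 2"                # untrusted *source* code
  code = compile_restricted(source, '<untrusted>', 'exec')
  namespace = {'__builtins__': safe_builtins}
  exec code in namespace                   # Python 2 spelling
  print namespace['result']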


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: limited python virtual machine (WAS: Another scripting language implemented into Python itself?)

2005-01-27 Thread Dieter Maurer
Steven Bethard [EMAIL PROTECTED] writes on Tue, 25 Jan 2005 12:22:13 -0700:
 Fuzzyman wrote:
 ...
   A better (and of course *vastly* more powerful but unfortunately only
   a dream ;-) is a similarly limited python virtual machine.

I already wrote about the RestrictedPython which is part of Zope,
didn't I?

Please search the archive to find a description...


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Zope: Adding a layer causes valid output to become an object reference?

2005-02-13 Thread Dieter Maurer
Junkmail [EMAIL PROTECTED] writes on 10 Feb 2005 18:26:52 -0800:
 I've been playing with Zope for about a year and took the plunge last
 week into making a product.

You should send Zope related questions to the Zope mailing list.
You will need to subscribe. You can do this at http://www.zope.org.

 ...
 What doesn't work is when I refer to an object I've created from a
 dtml-var or a tal:content or tal:replace statement.  Instead of the
 proper output I receive: <TextileClass at home> where TextileClass is
 my class name and home is the Object ID I'm referencing.

There is a fundamental difference when you call an object
with ZPublisher (i.e. via the Web) and when you use it
in a template.

ZPublisher effectively calls index_html and if
this is None, it calls the object.

A template (usually) calls the object (if it is callable) and
then converts the result into a string.


Apparently, your ZInstance is not callable.
Therefore, it is converted into a string.
Apparently, it does not have a custom __str__,
therefore, you get the standard string conversion for instances
(that's what you see).
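
A minimal illustration of this fallback, independent of Zope:

  class TextileClass:
      pass

  # no custom __str__, so the standard instance representation is used
  print str(TextileClass())   # <__main__.TextileClass instance at 0x...>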

 The fact that
 the < and > surround the output makes it invisible in the browser and
 had me chasing ghosts for a while.  I'd bet I'm not groking something
 here.

This should not happen in ZPT (unless you use structure).

 So if I call /home I get the proper HTML output: <b>What I am looking
 for</b> but when in another object I reference <dtml-var home> I get:
 <TextileClass at home>.
 
 Any thoughts are appreciated.

Call a method on your ZInstance.

To get the same result as via ZPublisher (the Web),
call its index_html. This may need arguments (I do not know).


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: KeyboardInterrupt vs extension written in C

2005-10-22 Thread Dieter Maurer
Tamas Nepusz [EMAIL PROTECTED] writes on 20 Oct 2005 15:39:54 -0700:
 The library I'm working on
 is designed for performing calculations on large-scale graphs (~1
 nodes and edges). I want to create a Python interface for that library,
 so what I want to accomplish is that I could just type from igraph
 import * in a Python command line and then access all of the
 functionalities of the igraph library. Now it works, except the fact
 that if, for example, I start computing the diameter of a random graph
 of ~10 nodes and ~20 edges, I can't cancel it, because the
 KeyboardInterrupt is not propagated to the Python toplevel (or it isn't
 even generated until the igraph library routine returns).

Python installs a SIGINT handler that just notes that
such a signal was received. The note is processed during
bytecode execution. This way, Python handles the (dangerous)
asynchronous signal synchronously (which is much safer).
But it also means that a signal arriving during the execution of your
C extension is only handled when the extension has finished.

What you can do in your wrapper code:

   Temporarily install a new handler for SIGINT that
   uses longjmp to quit the C extension execution when
   the signal occurs.

   Note that longjmp is dangerous. Great care is necessary.

   It is likely that SIGINT occurrences will lead to big
   resource leaks (because your C extension will have no
   way to release resources when it is exited via longjmp).


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python-based Document Management System?

2005-11-15 Thread Dieter Maurer
W. Borgert [EMAIL PROTECTED] writes on Thu, 10 Nov 2005 14:43:14 +0100:
 I'm looking for a Python-based DMS, but I don't know any.
 The following points are relevant:
 
 - suitable for 10..100 users with more than 1 documents
 
 - documents are mostly proprietary formats: MS Word, MS Excel,
   MS PowerPoint, but maybe also: PDF, HTML, DocBook, ...
 
 - typical DMS features: access control, archive older
   versions, search/query, document hierarchy, web frontend

You may have a look at Plone, Silva and CPS (all Zope based).


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Profiling with hotshot and wall clock time

2005-11-26 Thread Dieter Maurer
Geert Jansen [EMAIL PROTECTED] writes on Thu, 24 Nov 2005 21:33:03 +0100:
 ...
 Is possible to use hotshot with wall clock time, i.e. is it possible
 to have the code fragment below show one second as opposed to zero? 
 The old profiler seems to have functionality choosing a timer function
 but it crashed on my code.

I do not know whether it is possible with hotshot, but it is
with profile. Depending on how large the waiting time is,
profile might be adequate to analyse the problem.
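
A minimal sketch of that approach (profile.Profile accepts a timer
function; time.time yields wall-clock measurements; 'some_function'
stands for your own code):

  import profile, pstats, time

  p = profile.Profile(timer=time.time)   # wall clock instead of CPU time
  p.run('some_function()')
  pstats.Stats(p).sort_stats('cumulative').print_stats()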


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python2.4: building '_socket' extension fails with `INET_ADDRSTRLEN' undeclared

2004-12-05 Thread Dieter Maurer
Martin v. Löwis [EMAIL PROTECTED] writes on Sat, 04 Dec 2004 00:37:43 +0100:
...
 So it appears that on your system, INET_ADDRSTRLEN is not defined,
 even if it is supposed to be defined on all systems, regardless
 of whether they support IPv6.

I have met this problem in older Solaris versions: the Solaris
headers did not define the macro.

I fixed my problem by providing the missing definition
in the relevant Python header file.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to set condition breakpoints?

2004-12-18 Thread Dieter Maurer
Christopher J. Bottaro [EMAIL PROTECTED] writes on Fri, 10 Dec 2004 
11:45:19 -0600:
 ...
 Hmm, thanks for the suggestions.  One more quick question.  Is it even
 possible to set a breakpoint in a class method in pdb.py?  I can't even say
 break Class.f without the condition.  I don't think the documentation for
 pdb is very good...=(

What happens? I can do it...

However, I had to fix pdb to prevent it from setting the breakpoint
inside the docstring (where it has no effect).

I hope the fix found its way into the most recent Python versions
(2.3.4 and 2.4).
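
For reference, pdb's documented syntax (pdb evaluates the function
expression, so Class must be visible in the current frame; the
condition here is illustrative):

  (Pdb) break Class.f
  (Pdb) break Class.f, some_condition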

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Embedding a restricted python interpreter

2005-01-08 Thread Dieter Maurer
Doug Holton [EMAIL PROTECTED] writes on Thu, 06 Jan 2005 20:34:31 -0600:
 ...
 Hi, there is a page on this topic here:
 http://www.python.org/moin/SandboxedPython
 
 The short answer is that it is not possible to do this with the
 CPython, but you can run sandboxed code on other virtual machines,
 such as Java's JVM with Jython, or .NET/Mono's CLR with Boo or
 IronPython.

Zope contains a RestrictedPython implementation.

  It uses a specialized compiler that prevents dangerous bytecode operations
  from being generated and enforces a restricted builtin environment.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Simpler transition to PEP 3000 Unicode only strings?

2005-09-21 Thread Dieter Maurer
Petr Prikryl [EMAIL PROTECTED] writes on Tue, 20 Sep 2005 11:21:59 +0200:
 ...
 The idea:
 =
 
 What do you think about the following proposal
 that goes the half way
 
   If the Python source file is stored in UTF-8 (or
   other recognised Unicode file format), then the
   encoding declaration must reflect the format or
   can be omitted entirely. In such case, all
   simple string literals will be treated as
   unicode string literals.
   
 Would this break any existing code?

Yes: modules that construct byte strings (i.e. strings
which should *not* be unicode strings).

Nevertheless, such a module may be stored in UTF-8.
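
An illustrative example of such a module (hypothetical, Python 2):

  # This literal must stay a *byte* string; if plain literals silently
  # became unicode strings, comparing it against binary file data
  # would trigger implicit (and wrong) decoding.
  PNG_SIGNATURE = "\x89PNG\r\n\x1a\n"

  def is_png(data):
      return data.startswith(PNG_SIGNATURE)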
-- 
http://mail.python.org/mailman/listinfo/python-list


Removing nested tuple function parameters (was: C#3.0 and lambdas)

2005-09-21 Thread Dieter Maurer
Fredrik Lundh [EMAIL PROTECTED] writes on Mon, 19 Sep 2005 10:31:48 +0200:
 ...
 meanwhile, over in python-dev land:
 
 Is anyone truly attached to nested tuple function parameters; 'def
 fxn((a,b)): print a,b'?  /.../

Yes, I am...

 Would anyone really throw a huge fit if they went away?  I am willing
 to write a PEP for their removal in 2.6 with a deprecation in 2.5 if
 people are up for it.

I would think of it as a great stupidity.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: strange import phenomenon

2005-09-21 Thread Dieter Maurer
Christoph Zwerschke [EMAIL PROTECTED] writes on Tue, 20 Sep 2005 11:20:37 
+0200:
 Just hitting a strange problem with Python import behavior. It is the
 same on all Python 2.x versions and it is probably correct, but I
 currently don't understand why this happens.
 ...
 --- dir/__init__.py ---
 print "init"
 ---
 
 
 --- dir/hello.py --
 print "hello world"
 ---
 
 
 --- dir/test2.py --
 import sys
 sys.path = []
 
 import hello
 ---
 
 
 The script test2.py removes all entries from the sys.path. So when I
 run test2.py directly, I get an ImportError because the hello module
 cannot be imported. This is as expected.
 
 
 However, if I run test1, the hello module *is* imported and I get the
 hello world message. Why is that??

Because Python first tries to resolve modules inside a package ('dir' in
your example) locally. This local resolution does not
involve sys.path but package.__path__.
Only when the local lookup fails is a global lookup (using sys.path)
tried.

In your case, the local lookup succeeds.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is Python suitable for a huge, enterprise size app?

2005-05-27 Thread Dieter Maurer
Fredrik Lundh [EMAIL PROTECTED] writes on Tue, 24 May 2005 22:38:05 +0200:
 ...
 nothing guarantees that, of course.  but I've never seen that
 happen. and I'm basing my comments on observed behaviour in
 real systems, not on theoretical worst-case scenarios.

I observed in real systems (Zope) that the system got slower
and slower as the amount of allocated memory increased -- although
the OS was far from its memory resource limits (and virtual memory size
was not much larger than resident memory size). Flushing caches
(and thereby releasing most memory) did not speed things up,
but restarting did.

I do not understand this observed behaviour.

 every
 time I've seen serious fragmentation, it's been related to leaks,
 not peak memory usage.

An analysis did not reveal serious leaks, in the cases mentioned above.


Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-27 Thread Dieter Maurer
Martin v. Löwis [EMAIL PROTECTED] writes on Tue, 24 May 2005 23:58:03 +0200:
 ... 10.000 failing opens -- a cause for significant IO during startup ? ...

 So I would agree that IO makes a significant part of startup, but
 I doubt it is directory reading (unless perhaps you have an
 absent NFS server or some such).

We noticed the large difference between warm and cold start even when
we run from a zip archive. We expected that the only relevant IO would
go to the zip archives and therefore, we preloaded them into the
OS cache (by reading them sequentially) before the Python start.
To our great surprise, this did not significantly reduce Python's
(cold) startup time. We concluded that there must be other IO
not directed to the zip archives, started investigating and found
the 10.000 opens of non-existing files as the only other
significant IO contributor.


Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-27 Thread Dieter Maurer
Scott David Daniels [EMAIL PROTECTED] writes on Wed, 25 May 2005 07:10:00 
-0700:
 ...
 I'll bet this means that the 'zope.zip', 'python24.zip' would drop
 you to about 12500 - 1 = 2500 failing opens.  That should be
 an easy test: sys.path.insert(0, 'zope.zip') or whatever.
 If that works and you want to drop even more, make a copy of zope.zip,
 update it with python24.zip, and call the result python24.zip.

We cannot significantly reduce the number of opens further:

  Each module import from a zip archive opens the archive.
  As we have about 2.500 modules, we will get this order of opens
  (as long as we use Python's zipimporter).

The zipimporter uses a sequence of stats to determine
whether it can handle a path item: it drops the last
component until it gets an existing file object and then
checks that it is indeed a zip archive.
Adding a cache for this check could save an additional few
hundreds of opens.
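
A rough Python rendering of that check (the real logic lives in C
inside the zipimport module; this is only a sketch):

  import os

  def find_archive(path):
      # drop trailing components until an existing file is found ...
      while path and not os.path.isfile(path):
          path = os.path.dirname(path)
      # ... then zipimport still has to verify it is a zip archive
      return path or None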


Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: some profiler questions

2005-06-09 Thread Dieter Maurer
Mac [EMAIL PROTECTED] writes on 7 Jun 2005 20:38:51 -0700:
 
 1) I'd still like to run my whole app (i.e., using main()), but I'd
 like to limit the profiling to only the select few subroutines.  That
 is, say I have a set of such fns in mind, call it key_fns, and I
 would like to only profile # of invocations of these fns from key_fns,
 as well as by whom they were called, and how much cumulative time was
 spent in them.  Is such lower-level control possible?  The main reason
 I want this is that I do not want to profile most of the low-level
 routines, like vector addition, at least not yet... I don't want to
 slow down code execution any more than is necessary, as the statistics
 gathering should occur during normal runs (i.e., during normal
 operation).

Please read the "How it works" section.

You find an example how the profiler can be customized
in my ZopeProfiler product (the HLProfiler.py).
It makes profiling slower (as it derives high level profiles
in addition to the low level ones) but might nevertheless
help you as an example.

http://www.dieter.handshake.de/pyprojects/zope

 2) I've only just discovered that pstats has print_callers()!  That's
 very useful info I wasn't aware was available!  What I'm really looking
 for now is profiler output in the style generated by gprof, the GNU
 profiler, as I have found that format terribly useful (a section for
 each fn, with the fn's callers and callees interwoven in each section).
  Does anyone know of a utility which would format the Python profiling
 info in that format, or something very similar?  I haven't actually
 seen any output from print_callers (can't find any samples on Net, and
 my app is currently half-busted, mid-refactor), so if that's what it
 precisely does, ignore this question.

Thus, you look at the internal representation of pstats
and format it in the way you like...

 3) assuming the above-mentioned fine control of (1) is not yet
 possible, I will muddle on with my own selective profiling code; the
 question I have then is, what is the cleanest way to override a class
 instance's method at runtime?

class_.f = some_wrapper(class_.f)
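
where some_wrapper could look like this (a minimal sketch; the
statistics kept are just an example):

  import time

  def some_wrapper(func):
      stats = {'calls': 0, 'time': 0.0}
      def wrapped(*args, **kw):
          start = time.time()
          try:
              return func(*args, **kw)
          finally:
              stats['calls'] += 1
              stats['time'] += time.time() - start
      wrapped.stats = stats   # expose the gathered numbers
      return wrapped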

 What my profiler is doing is overriding
 the key fns/methods of an instance with a stat-gatherer-instrumented
 version, which ends up calling the original method.  I tried reading
 profile.py and pstats.py for ideas, but they are a little too
 complicated for me, for now; I doubt that's how they do their profiling
 anyways.

You should read the "How it works" section.
You will find out that your doubt is justified...


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: subprocess module and blocking

2005-06-13 Thread Dieter Maurer
Robin Becker [EMAIL PROTECTED] writes on Sun, 12 Jun 2005 09:22:52 +:
 I'm using a polling loop in a thread that looks approximately like this
 
 while 1:
     p = find_a_process()
     rc = p.poll()
     if rc is not None:
         out, err = p.communicate()
         # deal with output etc
     sleep(1)

 I notice that under both win32 and freebsd that things are fine
 provided that the subprocess doesn't write too much to
 stdout/stderr. However, the subprocess seems to lock often if too much
 is written (under freebsd I see a process state of POLL). I assume
 that the subprocess is filling up the pipe and then failing to wake up
 again. I had expected that subprocess would take care of this for me,

You just found out that this is not the case.

The warning attached to communicate's docstring might have
warned you: "Note: the data read is buffered in memory...
do not use for large size."

If subprocess did what you expect, it would need to
buffer the output (to make room in the pipes again).
But for large data, this could have dramatic consequences.

Thus, you should use select on the pipes to find out
when to read data.
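
A minimal sketch of that approach (POSIX; assumes the process was
started with stdout=PIPE and stderr=PIPE):

  import os, select

  def drain(p):
      out = {p.stdout.fileno(): [], p.stderr.fileno(): []}
      open_fds = list(out)
      while open_fds:
          ready, _, _ = select.select(open_fds, [], [])
          for fd in ready:
              data = os.read(fd, 4096)
              if data:
                  out[fd].append(data)
              else:
                  open_fds.remove(fd)   # EOF on this pipe
      return (''.join(out[p.stdout.fileno()]),
              ''.join(out[p.stderr.fileno()]))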


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unbound names in __del__

2005-06-18 Thread Dieter Maurer
Peter Hansen [EMAIL PROTECTED] writes on Fri, 17 Jun 2005 08:43:26 -0400:
 ...
 And I don't recall the last time I saw a __del__ in third-party code I
 was examining.
 
 
 What's your use case for del?

I had to use one a few days ago:

 To call the unlink method of a minidom object when
 its container is destroyed.

 It would have been possible to let the cyclic garbage
 collector find and eliminate the cyclic minidom objects.
 But, DOMs may be large and I create lots of them in
 a short time (each in its own container)...
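
A minimal sketch of this container pattern:

  from xml.dom import minidom

  class DomContainer:
      def __init__(self, text):
          self.dom = minidom.parseString(text)
      def __del__(self):
          # break the DOM's internal parent/child cycles promptly,
          # instead of waiting for the cyclic garbage collector
          self.dom.unlink()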


Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: importing packages from a zip file

2005-07-01 Thread Dieter Maurer
Scott David Daniels [EMAIL PROTECTED] writes on Wed, 29 Jun 2005 10:36:29 
-0700:
 Peter Tillotson wrote:
 ...
  from myZip.zip import myModule.py
 
 
 Does this work for you?  It gives me a syntax error.
 
 Typically, put the zip file on the sys.path list, and import modules
 and packages inside it.  If you zip up the above structure, you can use:
 
  sys.path.insert(0, 'myZip.zip')
  import base.branch1.myModule

The alternative is to use a zipimporter (from module zipimport)
and use the importer protocol (documented in PEP 302).
http://mail.python.org/mailman/listinfo/python-list


Re: Use cases for del

2005-07-07 Thread Dieter Maurer
Daniel Dittmar [EMAIL PROTECTED] writes on Wed, 06 Jul 2005 16:12:46 +0200:
 Peter Hansen wrote:
  Arguing the case for del: how would I, in doing automated testing,
  ensure that I've returned everything to a clean starting point in
  all cases if I can't delete variables?  Sometimes a global is the
  simplest way to do something... how do I delete a global if not with
  del?
 
 
 globals ().__delitem__ (varname)
 
 except that the method would probably be called delete.

You now have a uniform way to remove an object from a namespace.

Why would you want to give each namespace its own method to
remove objects from it?

Can you imagine how much code would break if your proposal
to remove del were accepted?


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Conditionally implementing __iter__ in new style classes

2005-07-07 Thread Dieter Maurer
Thomas Heller [EMAIL PROTECTED] writes on Wed, 06 Jul 2005 18:07:10 +0200:
 Thomas Heller [EMAIL PROTECTED] writes:
 ...
   class Base:
       def __getattr__(self, name):
           if name == "__iter__" and hasattr(self, "Iterator"):
               return self.Iterator
           raise AttributeError, name
  
   class Concrete(Base):
       def Iterator(self):
           yield 1
 ...
  If, however, I make Base a newstyle class, this will not work any
  longer.  __getattr__ is never called for __iter__ (neither is
  __getattribute__, btw).  Probably this has to do with data descriptors
  and non-data descriptors, but I'm too tired at the moment to think
  further about this.
 
  Is there any way I could make the above code work with new style
  classes?
 
 I forgot to mention this: The Base class also implements a __getitem__
 method which should be used for iteration if the .Iterator method in the
 subclass is not available.  So it seems impossible to raise an exception
 in the __iter__ method if .Iterator is not found - __iter__ MUST return
 an iterator if present.

Then, it should return an iterator (a new object) that uses
the __getitem__ method to iterate.
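
A minimal sketch of such an iterator object (Python 2 style, matching
the thread):

  class GetitemIterator:
      def __init__(self, obj):
          self.obj = obj
          self.index = 0
      def __iter__(self):
          return self
      def next(self):
          try:
              value = self.obj[self.index]
          except IndexError:
              raise StopIteration
          self.index += 1
          return value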


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Revised PEP 349: Allow str() to return unicode strings

2005-08-24 Thread Dieter Maurer
Neil Schemenauer [EMAIL PROTECTED] writes on Mon, 22 Aug 2005 15:31:42 -0600:
 ...
 Some code may require that str() returns a str instance.  In the
 standard library, only one such case has been found so far.  The
 function email.header_decode() requires a str instance and the
 email.Header.decode_header() function tries to ensure this by
 calling str() on its argument.  The code was fixed by changing
 the line header = str(header) to:
 
 if isinstance(header, unicode):
 header = header.encode('ascii')

Note that this is not equivalent to the old str(header):

  str(header) used Python's default encoding, while the
  new code uses 'ascii'.

  The new code might be more correct than the old one has been.
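
A small illustration of the difference (Python 2; assumes a non-ascii
default encoding has been configured):

  header = u'caf\xe9'
  str(header)              # encodes with the *default* encoding
  header.encode('ascii')   # always ascii -- raises UnicodeEncodeError here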


 ...
 Alternative Solutions
 
 A new built-in function could be added instead of changing str().
 Doing so would introduce virtually no backwards compatibility
 problems.  However, since the compatibility problems are expected to
 rare, changing str() seems preferable to adding a new built-in.

Can we get a new builtin with the exact same behaviour as
the current str, which can be used when we do require a str
(and cannot use a unicode)?



Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode encoding usablilty problem

2005-02-21 Thread Dieter Maurer
Fredrik Lundh [EMAIL PROTECTED] writes on Sat, 19 Feb 2005 18:44:27 +0100:
 aurora [EMAIL PROTECTED] wrote:
 
  I don't want to mix them. But how could I find them? How do I know this  
  statement can be 
  potential problem
 
if a==b:
 
  where a and b can be instantiated individually far away from this line of  
  code that put them 
  together?

I do understand aurora's problems very well.

Me too, I have suffered from this occasionally:

   * some library decides to use unicode (without my asking it to)

   * Python then decides to convert other strings to unicode,
     and boom: UnicodeDecodeError.

I solve these issues with a sys.setdefaultencoding(ourDefaultEncoding)
in sitecustomize.py.

I know that almost all the characters I have to handle
are encoded in ourDefaultEncoding and if something
converts to Unicode without being asked for, then this
is precisely the correct encoding.

I know that Unicode fanatics do not like setdefaultencoding,
but until we have completely converted to Unicode (which we probably
will do in the more distant future), this is essential for staying sane...
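
The workaround looks like this (sitecustomize.py; site.py removes
sys.setdefaultencoding after startup, which is why it must happen
here; 'latin-1' stands in for ourDefaultEncoding):

  import sys
  sys.setdefaultencoding('latin-1')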


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ZoDB's capabilities

2005-03-02 Thread Dieter Maurer
Larry Bates [EMAIL PROTECTED] writes on Mon, 28 Feb 2005 18:48:39 -0600:
 There is a VERY large website that uses Zope/ZODB that takes up to
 9000 hits per second when it gets busy.  ZODB is very fast and
 holds up well under load.

But, you will not get 9000 hits per second from the ZODB (unless
both your disk and your processor are unexpectedly fast).

To get 9000 hits per second, you need extensive caching.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: java crashes in python thread

2005-03-02 Thread Dieter Maurer
Easeway wrote:
 I use os.system invoking java VM, when running in python thread, the
 java application crashes.

Andreas Jung has reported similar behaviour.

He suggested that Java 1.4 and the threads of Linux 2.6
do not work reliably together.
He tried Java 1.5; this combination crashes only occasionally.
But thus, it is still not reliable...

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Tons of stats/opens to non-existing files increases Python's startupon loaded NFS servers

2005-05-06 Thread Dieter Maurer
Fredrik Lundh [EMAIL PROTECTED] writes on Fri, 6 May 2005 00:08:36 +0200:
  ... lots of no such file or directory ...
  Whoa!! After looking at what is being stat'd or
  open'd, it looks like 'encodings' is new in 2.4 and,
  even worse, everything is looked for as a zip first.
 
 so why not build that ZIP?

We, too, saw this problem -- and we had the *.zip files already...

Python is a bit stupid in its import logic.
When, e.g., a package P defined in a zip archive zzz.zip
contains an "import os", then Python checks whether
zzz.zip contains P.os (that is okay). *But*, usually
zzz.zip does not define P.os (as os is a builtin module),
and then Python checks in the file system
for zzz.zip/P/os{,.py,.pyc,.so} and zzz.zip/P/osmodule.so.
Of course, all of them fail, as zzz.zip is a zip archive
and zzz.zip/something is not meaningful as a file system reference.

I improved on this by patching Python's import.c with the
attached patch. The patch makes a path_hook that declares itself
responsible for a path authoritative for negative as well as
positive find_module responses.
Earlier, a negative find_module response caused Python to
try the default module lookup.


Furthermore, it is vital that your sys.path is as small as possible
because a single module lookup can cause file system lookups
in the order of 4 times the number of path elements. 

The standard extension of sys.path often contains far more
path elements than necessary (if you defined python24.zip, you
should remove all other python library directories that do not contain
shared objects).

Dieter

--- Python/import.c~	2004-10-07 08:46:25.0 +0200
+++ Python/import.c	2005-05-04 12:52:19.0 +0200
@@ -1211,6 +1211,9 @@
 				return NULL;
 			/* Note: importer is a borrowed reference */
 			if (importer != Py_None) {
+				/* DM 2005-05-04: ATT: memory leak!
+				   almost surely, we need
+				   a Py_XDECREF(copy) */
 				PyObject *loader;
 				loader = PyObject_CallMethod(importer,
 							     "find_module",
@@ -1223,7 +1226,12 @@
 					return importhookdescr;
 				}
 				Py_DECREF(loader);
-			}
+				/* DM 2005-05-04: do not try the builtin import
+				   when the responsible importer failed.
+				   At least, for zipimport, trying builtin
+				   import would be stupid. */
+				continue;
+			}
 			/* no hook was successful, use builtin import */
 
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Is Python suitable for a huge, enterprise size app?

2005-05-21 Thread Dieter Maurer
Fredrik Lundh [EMAIL PROTECTED] writes on Thu, 19 May 2005 09:54:15 +0200:
 ...
 and unless your operating system is totally braindead, and thus completely 
 unfit
 to run huge enterprise size applications, that doesn't really matter much.  
 leaks
 are problematic, large peak memory use isn't.

Could you elaborate a bit?

Large peak memory use means that the application got a large
address space. What guarantees that the residual memory use
(after the peak) is compact and not evenly spread across
the address space? While the OS probably is able to
reuse complete pages which are either unused for a longer
time or at least rarely accessed, it may become nasty when
almost every page contains a small amount of heavily used
memory.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-21 Thread Dieter Maurer
Martin v. Löwis [EMAIL PROTECTED] writes on Fri, 20 May 2005 18:13:56 +0200:
 Robin Becker wrote:
  Firstly should python start up with non-existent entries on the path?
 
 Yes, this is by design.
 
  Secondly is this entry be the default for some other kind of python
  installation?
 
 Yes. People can package everything they want in python24.zip (including
 site.py). This can only work if python24.zip is already on the path
 (and I believe it will always be sought in the directory where
 python24.dll lives).

The question was:

   should python start up with **non-existent** objects on the path.

I think there is no reason why path needs to contain an object
which does not exist (at the time the interpreter starts).

In your use case, python24.zip does exist and therefore may
be on the path. When python24.zip does not exist, it does
not contain anything and especially not site.py.


I recently analysed excessive import times and
saw thousands of costly and unnecessary filesystem operations due to:

  *  a long sys.path, especially one containing non-existing objects

     Although non-existent, about 5 filesystem operations are
     tried on them for any module not yet located.

  *  a severe weakness in Python's import hook treatment

     When there is an importer i for a path p and
     this importer cannot find module m, then p is
     treated as a directory and 5 file system operations
     are tried to locate p/m. Of course, all of them fail
     when p happens to be a zip archive.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-22 Thread Dieter Maurer
Martin v. Löwis [EMAIL PROTECTED] writes on Sat, 21 May 2005 23:53:31 +0200:
 Dieter Maurer wrote:
 ...
  The question was:
  
 should python start up with **non-existent** objects on the path.
  
  I think there is no reason why path needs to contain an object
  which does not exist (at the time the interpreter starts).
 
 There is. When the interpreter starts, it doesn't know what object
 do or do not exist. So it must put python24.zip on the path
 just in case.

Really?

Is the interpreter unable to call C functions (stat, for example)
to determine whether an object exists before it puts it on the path?

 Yes, but the interpreter cannot know in advance whether
 python24.zip will be there when it starts.

Thus, it checks dynamically when it starts.

  I recently analysed excessive import times and
  saw thousands of costly and unneccesary filesystem operations due to:
 
 Hmm. In my Python 2.4 installation, I only get 154 open calls, and
 63 stat calls on an empty Python file. So somebody must have messed
 with sys.path really badly if you saw thousands of file operations
 (although I wonder what operating system you use so that failing
 open operations are costly; most operating systems should do them
 very efficiently).

The application was Zope importing about 2.500 modules
from 2 zip files zope.zip and python24.zip.
This resulted in about 12.500 opens -- about 4 times more
than would be expected -- about 10.000 of them failing opens.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-22 Thread Dieter Maurer
Steve Holden [EMAIL PROTECTED] writes on Sun, 22 May 2005 09:14:43 -0400:
 ...
 There are some aspects of Python's initialization that are IMHO a bit
 too filesystem-dependent. I mentioned one in
 
 
  
 http://sourceforge.net/tracker/index.php?func=detail&aid=1116520&group_id=5470&atid=105470
 
 
 but I'd appreciate further support. Ideally there should be some means
 for hooked import mechanisms to provide answers that are currently
 sought from the filestore.

There are such hooks. See e.g. the meta_path hooks as
described by PEP 302.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-24 Thread Dieter Maurer
Steve Holden [EMAIL PROTECTED] writes on Sun, 22 May 2005 16:19:10 -0400:
 ...
 Indeed I have written PEP 302-based code to import from a relational
 database, but I still don't believe there's any satisfactory way to
 have [such a hooked import mechanism] be a first-class component of an
 architecture that specifically requires an os.py to exist in the file
 store during initialization.
 
 
 I wasn't asking for an import hook mechanism (since I already knew
 these to exist), but for a way to allow such mechanisms to be the sole
 import support for certain implementations.

We do not have os.py (directly) on the file system.
It lives (like everything else) in a zip archive.

This works because the zipimporter is put on
sys.path_hooks before the interpreter starts executing Python code.

Thus, all you have to do is use a different Python startup
and ensure that your special importer (able to import e.g. os)
is already set up before you start executing Python code.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python24.zip

2005-05-24 Thread Dieter Maurer
Martin v. Löwis [EMAIL PROTECTED] writes on Sun, 22 May 2005 21:24:41 +0200:
 ...
 What do you mean, unable to? It just doesn't.

The original question was: why does Python put non-existing
entries on 'sys.path'.

Your answer seems to be: it just does not do it -- but it might
be changed if someone does the work.

This is fine with me.

 ...
 In the past, there was a silent guarantee that you could add
 items to sys.path, and only later create the directories behind
 these items. I don't know whether people rely on this guarantee.

I do not argue that Python should prevent adding non-existing
items to the path. This would not work, as Python may not
know what "existing" means (due to path_hooks).

I only argue that it should not *itself* (automatically) put items on the path
when it knows the responsible importers and knows (or can
easily determine) that the items do not exist for them.

 ...
  The application was Zope importing about 2.500 modules
  from 2 zip files zope.zip and python24.zip.
  This resulted in about 12.500 opens -- about 4 times more
  than would be expected -- about 10.000 of them failing opens.
 
 I see. Out of curiosity: how much startup time was saved
 when sys.path was explicitly stripped to only contain these
 two zip files?

I cannot tell you precisely because it is very time consuming
to analyse cold start timing behavior (it requires a reboot for
each measurement).

We essentially have the following numbers only:

                       warm start           cold start
                       (filled OS caches)   (empty OS caches)

  from file system           5s                 13s
  from ZIP archives          4s                  8s
  frozen                     3s                  5s

The ZIP archive time was measured after a patch to import.c
that prevents Python from viewing a ZIP archive member as a directory
when it cannot find the currently looked for module (of course,
this lookup fails also when the archive member is viewed as a directory).
Furthermore, all C extensions were loaded via a meta_path hook (and
not sys.path), and sys.path contained just the two zip archives.
These optimizations led to about 3.000 opens (down from originally 12.500).

 I would expect that importing 2500 modules takes *way*
 more time than doing 10.000 failed opens.

You may be wrong: searching for non-existing files may cause
disk IO, which is several orders of magnitude slower than
CPU activity.

The comparison between warm start (little disk IO) and cold start
(much disk IO) tells you that the import process is highly
IO dominated (for cold starts).

I know that this does not prove that the failing opens contribute
significantly. However, a colleague reported that the
import.c patch (essential for the reduction of the number of opens)
resulted in significant (but not specified) improvements.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ZSI and attachments

2008-03-14 Thread Dieter Maurer
Laszlo Nagy [EMAIL PROTECTED] writes on Tue, 11 Mar 2008 15:59:36 +0100:
 I wonder if the newest ZSI has support for attachments? Last time I
 checked (about a year ago) this feature was missing. I desperately
 need it. Alternatively, is there any other SOAP lib for python that
 can handle attachments?

The ZSI 2.0 documentation says:

   ...It can also be used to build applications using SOAP Messages with 
Attachments

I never did it and do not know how ZSI supports this.
You probably will get a more informed answer on
mailto:[EMAIL PROTECTED],
the ZSI mailing list.

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Tremendous slowdown due to garbage collection

2008-04-27 Thread Dieter Maurer
Christian Heimes [EMAIL PROTECTED] writes on Sat, 12 Apr 2008 18:47:32 +0200:
 [EMAIL PROTECTED] schrieb:
  which made me suggest to use these as defaults, but then
  Martin v. Löwis wrote that
  
  No, the defaults are correct for typical applications.
  
  At that point I felt lost and as the general wish in that thread was
  to move
  discussion to comp.lang.python, I brought it up here, in a modified
  and simplified form.
 
 Martin said that the default settings for the cyclic gc works for most
 people. Your test case has found a pathologic corner case which is *not*
 typical for common application but typical for an artificial benchmark.
 Python is optimized for regular apps, not for benchmark (like some video
 drivers).

Martin said it but nevertheless it might not be true.

We observed similar very bad behaviour -- in a Web application server.
Apparently, the standard behaviour is far from optimal when the
system contains a large number of objects and occasionally large
numbers of objects are created in a short time.
We have seen such behaviour during parsing of larger XML documents, for
example (in our Web application).


Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Re: Tremendous slowdown due to garbage collection

2008-04-28 Thread Dieter Maurer
Martin v. Löwis wrote at 2008-4-27 19:33 +0200:
 Martin said it but nevertheless it might not be true.
 
 We observed similar very bad behaviour -- in a Web application server.
 Apparently, the standard behaviour is far from optimal when the
 system contains a large number of objects and occationally, large
 numbers of objects are created in a short time.
 We have seen such behaviour during parsing of larger XML documents, for
 example (in our Web application).

I don't want to claim that the *algorithm* works for all typically
applications well. I just claim that the *parameters* of it are fine.
The OP originally proposed to change the parameters, making garbage
collection run less frequently. This would a) have bad consequences
in terms of memory consumption on programs that do have allocation
spikes, and b) have no effect on the asymptotic complexity of the
algorithm in the case discussed.

In our case, it helped to change the parameters:

  As usual in Python, in our case cyclic garbage is very rare.
  On the other hand, we have large caches with lots of objects,
  i.e. a large number of long living objects.
  Each generation 2 garbage collection visits the complete
  object set. Thus, if it is called too often, matters can
  deteriorate drastically.

  In our case, the main problem has not been the runtime
  but that during GC the GIL is held (understandably).
  This meant that every few minutes we had a scheduling
  distortion on the order of 10 to 20 s (the time of
  our generation 2 gc).

We changed the parameters to let generation 2 GC happen
at about 1/1000 of its former frequency.
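
A sketch of such a reconfiguration ((700, 10, 10) are the defaults;
the factor is illustrative):

  import gc

  t0, t1, t2 = gc.get_threshold()
  # raise the last threshold so full (generation 2) collections
  # run far less frequently
  gc.set_threshold(t0, t1, t2 * 1000)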


I do not argue that Python's default GC parameters must change -- only
that applications with lots of objects may want to consider a
reconfiguration.



-- 
Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Re: Remove multiple inheritance in Python 3000

2008-04-28 Thread Dieter Maurer
Nick Stinemates [EMAIL PROTECTED] writes on Thu, 24 Apr 2008 08:26:57 -0700:
 On Tue, Apr 22, 2008 at 04:07:01AM -0700, GD wrote:
  Please remove ability to multiple inheritance in Python 3000.

I hope your request will not be followed.

  Multiple inheritance is bad for design, rarely used and contains many
  problems for usual users.

Multiple inheritance is very productive because it supports mixin classes.
I use it extensively and it lets me develop clean code quickly.

I hate Java because it does not support multiple inheritance
and forces me to write lots of tedious, error-prone delegations
to work around this limitation.
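
A small illustration of the mixin style (the names are made up):

  class ReprMixin(object):
      def __repr__(self):
          attrs = ', '.join('%s=%r' % kv for kv in sorted(vars(self).items()))
          return '%s(%s)' % (self.__class__.__name__, attrs)

  class Point(object):
      def __init__(self, x, y):
          self.x, self.y = x, y

  class NicePoint(Point, ReprMixin):   # behaviour added without delegation
      pass

  print NicePoint(1, 2)                # NicePoint(x=1, y=2)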

Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Re: Tremendous slowdown due to garbage collection

2008-05-01 Thread Dieter Maurer
John Nagle [EMAIL PROTECTED] writes on Mon, 28 Apr 2008 11:41:41 -0700:
 Dieter Maurer wrote:
  Christian Heimes [EMAIL PROTECTED] writes on Sat, 12 Apr 2008 18:47:32 
  +0200:
  [EMAIL PROTECTED] schrieb:
  which made me suggest to use these as defaults, but then
 
  We observed similar very bad behaviour -- in a Web application server.
  Apparently, the standard behaviour is far from optimal when the
  system contains a large number of objects and occationally, large
  numbers of objects are created in a short time.
  We have seen such behaviour during parsing of larger XML documents, for
  example (in our Web application).
 
 Our solution to that was to modify BeautifulSoup to use weak pointers.
 All the pointers towards the root and towards previous parts of the
 document are weak.  As a result, reference counting alone is sufficient
 to manage the tree.  We still keep GC enabled, but it doesn't find much
 to collect.

It will not help in our setup.

We, too, have almost no cycles -- but the GC does not know this:

  If a large number of objects are created temporarily and not released
  before the generation 1 threshold is reached, then
  the garbage collector will start collections -- even if there
  are no or very few cycles.
  A generation 2 garbage collection takes time proportional
  to the total number of (GC aware) objects -- independent of
  the number of cycles.

Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Re: The Importance of Terminology's Quality

2008-05-13 Thread Dieter Maurer
[EMAIL PROTECTED] [EMAIL PROTECTED] writes on Wed, 7 May 2008 16:13:36 
-0700 (PDT):
 ...
 Let me give a few example.
 
 • “lambda”, widely used as a keyword in functional languages, is named
 just “Function” in Mathematica. The “lambda” happend to be called so
 in the field of symbolic logic, is due to use of the greek letter
 lambda “λ” by happenstance. The word does not convey what it means.
 While, the name “Function”, stands for the mathematical concept of
 “function” as is.

lambda is not a function, but an operator with two operands, a variable
name and an expression; the result is another expression
which behaves like a function (the abstraction
of the variable name in the expression).
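
In Python terms:

  f = lambda x: x * x   # abstraction of 'x' in the expression 'x * x'
  f(5)                  # the resulting expression behaves like a function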


Dieter
--
http://mail.python.org/mailman/listinfo/python-list

Re: Zsi interoperability

2008-09-18 Thread Dieter Maurer
Mailing List SVR [EMAIL PROTECTED] writes on Tue, 16 Sep 2008 08:31:13 +0200:
 ...
 however my server require client
 certificate authentication,
 
 does soaplib or zsi work in this environment?

ZSI allows you to provide an alternative transport.
That's the usual way to let ZSI work over https rather than http.

I do not know whether Python supports a client certificate authentication
transport out of the box -- but at least the problem is split into
two easier parts.

Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Re: Zsi interoperability

2008-09-18 Thread Dieter Maurer
Marco Bizzarri [EMAIL PROTECTED] writes on Mon, 15 Sep 2008 20:26:27 +0200:
 On Mon, Sep 15, 2008 at 8:15 PM, Stefan Behnel [EMAIL PROTECTED] wrote:
  Mailing List SVR wrote:
  I have to implement a soap web services from wsdl, the server is
  developed using oracle, is zsi or some other python library for soap
  interoperable with oracle soa?
 
  No idea, but I'd definitely try soaplib before ZSI.
 
  Stefan
 
 I'm working on a project where I need to write a client for SOAP with
 Attachments; I can see ZSI does not support it

The ZSI documentation (2.0) says that SOAP attachments are supported --
but I never tried it.


Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Nested generator caveat

2008-07-03 Thread Dieter Maurer
I met the following surprising behaviour

>>> def gen0():
...   for i in range(3):
...     def gen1():
...       yield i
...     yield i, gen1()
...
>>> for i,g in gen0(): print i, g.next()
...
0 0
1 1
2 2
>>> for i,g in list(gen0()): print i, g.next()
...
0 2
1 2
2 2


If this is not a bug, it is at least quite confusing.


The apparent reason is that the free variables in
nested generator definitions are not bound (to a value) at invocation
time but only at access time.


Almost surely, the same applies to all locally defined functions
with free variables.
This would mean that locally defined functions with free
variables are very risky in generators.
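
The usual fix is to bind the free variable at definition time via a
default argument:

  def gen0():
      for i in range(3):
          def gen1(i=i):      # i is bound when gen1 is *defined*
              yield i
          yield i, gen1()

  for i, g in list(gen0()):
      print i, g.next()       # now prints 0 0 / 1 1 / 2 2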


-- 
Dieter
--
http://mail.python.org/mailman/listinfo/python-list


Re: pythonic backtrace with gdb

2008-01-24 Thread Dieter Maurer
Hynek Hanke [EMAIL PROTECTED] writes on Wed, 23 Jan 2008 14:30:22 +0100:
 ...
 I've also tried to use the backtrace script here
 http://mashebali.com/?Python_GDB_macros:The_Macros:Backtrace
 But I get a different error:
 (gdb) pbt
 Invalid type combination in ordering comparison.
 
 I'm using GDB version 6.6.90.

I expect that your GDB version is too new and has introduced
some safety checks (which now break).

It will probably help to add explicit casts to long
around the comparisons in the definition of pbt.

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Large production environments using ZODB/ZOE?

2008-08-17 Thread Dieter Maurer
Phillip B Oldham [EMAIL PROTECTED] writes on Thu, 7 Aug 2008 09:26:04 -0700 
(PDT):
 I've been reading a lot recently on ZODB/ZOE, but I've not seen any
 reference to its use in large-scale production envrironments.
 
 Are there any real-world examples of ZODB/ZOE in use for a large
 system? By large, I'm thinking in terms of both horizontally-scaled
 systems and in terms of data storage size.

We are using it to host about a hundred domains of the Haufe Mediengruppe,
among others www.haufe.de, www.lexware.de and www.redmark.de.

Dieter

--
http://mail.python.org/mailman/listinfo/python-list


Re: Interrupt python thread

2008-08-30 Thread Dieter Maurer
On Mon, 25 Aug 2008 05:00:07 -0300, BlueBird [EMAIL PROTECTED]
wrote:
 Unfortunately, this does not map very well with my program. Each of my
 threads are calling foreign code (still written in python though),
 which might be busy for 1 to 10 minutes with its own job.

 I wanted something to easily interrupt every thread to prevent my
 program to stall for 10 minutes if I want to stop it (getting tired of
 killing python all the time).

At the C level, Python has a function to send an exception to a thread.
The thread will see the exception only when it executes Python code
(i.e. not while it is waiting or running in external (e.g. C) code).

You may use (e.g.) Pyrex to make a Python wrapper available
to your Python code.
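
Today this C-level facility (PyThreadState_SetAsyncExc) can also be
reached with ctypes instead of Pyrex -- a sketch:

  import ctypes

  def async_raise(thread_id, exc_type):
      # the target thread sees exc_type the next time it executes
      # Python bytecode -- not while it is blocked in C code
      ctypes.pythonapi.PyThreadState_SetAsyncExc(
          ctypes.c_long(thread_id), ctypes.py_object(exc_type))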

Dieter
--
http://mail.python.org/mailman/listinfo/python-list

Re: SOAPpy, WSDL and objects

2006-03-08 Thread Dieter Maurer
[EMAIL PROTECTED] writes on 7 Mar 2006 08:14:47 -0800:
 <login>
    <request>
       <firstname>FirstName</firstname>
       <lastname>LastName</lastname>
    </request>
 </login>
 
 I am trying to do this with the following code:
 
 from SOAPpy import WSDL
 server = WSDL.Proxy(m_url)
 request = {'firstname': FirstName,
 'lastname': LastName}
 server.login(request)

Try: server.login({'request':request}).


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Possible inaccuracy in Python 2.4 when using profiler calibration

2006-06-16 Thread Dieter Maurer
Brian Quinlan [EMAIL PROTECTED] writes on Thu, 15 Jun 2006 10:36:26 +0200:
 I have a misinformed theory that I'd like to share with the list.
 
 I believe that profiler calibration no longer makes sense in Python
 2.4 because C functions are tracked and they have a different call
 overhead than Python functions (and calibration is done only using
 Python functions). Here is my reasoning (in code form):

I fear it never made sense -- even with pure Python functions:

  I tried to calibrate under Linux and failed miserably:
  apparently, the clock resolution is very coarse
  introducing a very high variance which is very bad for
  calibration.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Remote Boot Manager Scripting (Python)

2006-06-23 Thread Dieter Maurer
[EMAIL PROTECTED] writes on 21 Jun 2006 07:43:17 -0700:
 ...
 I have to remotely start a computer. It has dual boot (WinXP and
 Linux).
 My remote computer is Linux which will send command to remotely boot
 the other computer.
 
 Can we write python script or some utility which would let us select
 the Operating System to boot ? For example If we send parameter WIN
 it boots into Windows and if we send NIX it boots into Linux.

Probably yes -- but it is rather a remote-boot-protocol question
than a Python question.

Your remote boot protocol must in some way specify which
OS should be booted. When you implement the client-side
boot protocol in Python and you know how to select the OS,
then simply do it.


I can tell something about grub (GRand Unified Boot loader).
It can be configured to be controlled by a serial line.
In this case, the boot protocol would be very simple:
send down characters to the serial line until the correct
os is selected; then send a return to boot.

It would be trivial for a small Python script (with access to this
serial line) to implement this.
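
A hypothetical sketch with the third-party pyserial package (device
name, speed and keystrokes all depend on the grub setup):

  import serial

  port = serial.Serial('/dev/ttyS0', 9600, timeout=1)
  port.write('v')      # e.g. move the menu selection down one entry
  port.write('\r')     # send return: boot the selected entry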


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Program slowing down with greater memory use

2006-06-25 Thread Dieter Maurer
Dan Stromberg [EMAIL PROTECTED] writes on Thu, 22 Jun 2006 23:36:00 GMT:
 I have two different python programs that are slowing down quite a bit as
 their memory use goes up.

I have seen this too with Zope.

I do not know where it comes from -- maybe from degraded locality.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Pyrex] pyrex functions to replace a method (Re: replace a method in class: how?)

2006-06-29 Thread Dieter Maurer
Greg Ewing [EMAIL PROTECTED] writes on Wed, 28 Jun 2006 11:56:55 +1200:
...
 I have suggested that builtin functions should be
 given the same method-binding behaviour as interpreted
 functions. The idea wasn't rejected out of hand, but
 I don't think anything has been done about it yet.

You can use:

def wrapAsMethod(to_be_wrapped):
  def wrapped(*args, **kw):
    return to_be_wrapped(*args, **kw)
  return wrapped

and then use in your class:

class ...:
  ...
  myMethod = wrapAsMethod(builtin_function)


Thus, this use case probably does not require a language change.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Detecting socket connection failure

2006-07-15 Thread Dieter Maurer
[EMAIL PROTECTED] writes on 10 Jul 2006 08:42:11 -0700:
 I've tried to RTFM this and am having no luck.First off, I am using
 Mac OSX 10.4.7 with python 2.4.2 from fink.  I am trying to connect to
 a server that should be rejecting connections and I was surprised when
 it did not throw an exception or do something otherwise equally nasty.
 It just connects and never returns any data.  First, the proof that
 something is there and rejecting the connection (or is it that this
 thing actually accepts the connection and then drops it?)...
 
 telnet localhost 31414
 Trying 127.0.0.1...
 Connected to localhost.
 Escape character is '^]'.
 Connection closed by foreign host.

What you see here is that the connection was opened successfully
(the connect succeeded) and then closed again.

 ...
 In [1]: import socket, select
 
 In [2]: remote = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
 
 In [3]: remote.connect(('localhost',31414))
 
 In [4]: remote.recv(200)
 Out[4]: ''

That means that you see the same in Python:
recv returning an empty string indicates that the connection
was closed.

 
 In [5]: r,w,e=select.select([remote],[remote],[remote],1)
 
 In [6]: print r,w,e
 [<socket._socketobject object at 0x7e48d0>] [<socket._socketobject
 object at 0x7e48d0>] []

I have seen something similar recently:

  I can write (send to be precise) to a socket closed by
  the foreign partner without error
  (but of course, the written data does not arrive at the remote side).
  Only the second send raises an exception.

  I expect this is a TCP bug.


--
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Detecting socket connection failure

2006-07-24 Thread Dieter Maurer
[EMAIL PROTECTED] writes on 19 Jul 2006 08:34:00 -0700:
 ...
 Were you also using mac osx?

No, I have observed the problem under Linux.


 Dieter Maurer wrote:
 
  I have seen something similar recently:
 
I can write (send to be precise) to a socket closed by
the foreign partner without error
(but of course, the written data does not arrive at the remote side).
Only the second send raises an exception.
  
I expect this is a TCP bug.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: a bug in list.remove?

2006-08-21 Thread Dieter Maurer
Astan Chee [EMAIL PROTECTED] writes on Sat, 19 Aug 2006 03:34:26 +1000:
 for p in ps:
     if p in qs:
         ps.remove(p)

You are modifying an object (ps) while you iterate over it.
This is a recipe for surprises...

The standard idiom is to iterate over a copy rather than the object itself:

for p in ps[:]:
  


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Input from the same file as the script

2006-08-22 Thread Dieter Maurer
Georg Brandl [EMAIL PROTECTED] writes on Sun, 20 Aug 2006 20:08:38 +0200:
 [EMAIL PROTECTED] wrote:
  Can the input to the python script be given from the same file as the
  script itself. e.g., when we execute a python script with the command
  'python scriptName', can the input be given in someway ?
  When I ran the below the python interpreter gave an error.

The easiest way would be:

 data = '''\
 here comes your data
 ...
 '''

 # and now you use it
 ... data ...

 # you can even wrap it into a file
 from StringIO import StringIO
 data_as_file = StringIO(data)
 ... data_as_file.readline() ...

--
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Don't use __slots__ (was Re: performance of dictionary lookup vs. object attributes)

2006-08-28 Thread Dieter Maurer
Patrick Maupin [EMAIL PROTECTED] writes on 26 Aug 2006 12:51:44 -0700:
 ...
 The only
 problem I personally know of is that the __slots__ aren't inherited,

__slots__ *ARE* inherited, although the rules may be a bit
complex.

>>> class B(object):
...   __slots__ = ('f1', 'f2',)
...
>>> class C(B): pass
...
>>> C.__slots__
('f1', 'f2')
>>> c=C()
>>> c.__dict__
{}
>>> c.f1='f1'
>>> c.__dict__
{}
>>> c.fc='fc'
>>> c.__dict__
{'fc': 'fc'}
>>> class C2(B):
...   __slots__=('f21',)
...
>>> C2.__slots__
('f21',)
>>> c2=C2()
>>> c2.f1='x'
>>> c2.f21='y'

--
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Zope] how do I test for the current item in an iteration

2007-09-28 Thread Dieter Maurer
kamal hamzat wrote at 2007-9-28 16:36 +0100:
I have this error after i added the if statement

Error Type: TypeError
Error Value: mybrains.__cmp__(x,y) requires y to be a 'mybrains', not a 'int'


for i in context.zCatNewsCurrent():
    if i <= 5:
        print "%s: %s: %s" % (i.id, i.author, i.summary)

You are aware that you use i both as an integer (i <= 5)
as well as a structure (i.id, i.author, ...).

Python is quite polymorphic -- but there are some limits.

Andreas' suggestion was good: enumerate may help you...
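
A hedged sketch of that suggestion (assuming zCatNewsCurrent() returns
the brains objects and that "i <= 5" is what was intended):

  for i, item in enumerate(context.zCatNewsCurrent()):
      if i <= 5:
          print "%s: %s: %s" % (item.id, item.author, item.summary)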



-- 
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Survival of the fittest

2006-09-28 Thread Dieter Maurer
Thomas Bartkus [EMAIL PROTECTED] writes on Tue, 26 Sep 2006 22:06:56 -0500:
 ...
 We would be curious to know about those things you can do in C++
 but can't do in Python.

I implemented an incremental search engine in Python.

It worked fine for large, quite specific "and" queries (it
was faster than a C implemented search engine).

However, for large "or" queries, it was two orders of magnitude
slower than the C competitor.

I had to move my implementation to C. Now my search engine is
almost always (significantly) faster and when it is slower,
the difference is negligible.


We learn: a C/C++ implementation can in some cases be drastically
more efficient than a Python one.

My special case was that almost all my data were integers
and the C implementation could exploit this fact -- unlike the Python one.

--
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Survival of the fittest

2006-10-03 Thread Dieter Maurer
Dennis Lee Bieber [EMAIL PROTECTED] writes on Thu, 28 Sep 2006 23:57:51 GMT:
 On 28 Sep 2006 22:48:21 +0200, Dieter Maurer [EMAIL PROTECTED]
 declaimed the following in comp.lang.python:
 
  We learn: a C/C++ implementation can in some cases be drastically
  more efficient than a Python one.
 
   Did we?
 
   When did someone build a C/C++ compiler that generates bytecodes for
 the Python virtual machine interpreter?
 
   What I've learned from this tale is that a C/C++ implementation --
 compiling to native hardware opcodes -- can run faster than an
 implementation compiled to interpreted Python bytecodes... No idea as to
 efficiency of the implementations themselves <G>

I would be really surprised if an optimizing compiler (e.g. a just
in time compiler) would have found the optimizations I implemented.

You may look at the two implementations: They are
IncrementalSearch (Python) and IncrementalSearch2 (C), respectively, on

  http://www.dieter.handshake.de/pyprojects/zope


The Python implementation worked with arbitrary trees;
the C implementation needs quite different code for integer trees
and for arbitrary trees, and only for integer trees can it
gain the two orders of magnitude in speed.

The optimizing compiler would need to recognize that it can
drastically optimize if *ALL* keys in a tree are integer
and would need to efficiently detect for a given tree
that it satisfies this condition.
It would lose immediately when it tried to verify that
a tree has only integer keys (as this takes linear time while
most operations are sub-linear).


--
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Profiling Python

2008-12-06 Thread Dieter Maurer
[EMAIL PROTECTED] writes on Wed, 3 Dec 2008 07:13:14 -0800 (PST):
 To clarify again,
 Is there some function like profile.PrintStats() which dynamically
 shows the stats before stopping the Profiler?

Try to (deep) copy the profiler instance and then call PrintStats()
on the copy.

Of course, you will then profile also the PrintStats in the running
profiler.
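
A minimal sketch (prof stands for a hypothetical, still running
profile.Profile instance; the stock method is spelled print_stats()):

  import copy

  snapshot = copy.deepcopy(prof)   # freeze the current state
  snapshot.print_stats()           # formats the copy via pstats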


Dieter

--
http://mail.python.org/mailman/listinfo/python-list


Re: advice needed for lazy evaluation mechanism

2009-11-11 Thread Dieter Maurer
Steven D'Aprano st...@remove-this-cybersource.com.au writes on 10 Nov 2009 
19:11:07 GMT:
 ...
  So I am trying to restructure it using lazy evaluation.
 
 Oh great, avoiding confusion with something even more confusing.

Lazy evaluation may be confusing if it is misused.
But, it may be very clear and powerful if used appropriately.

Lazy evaluation essentially means:
you describe beforehand how a computation should be
performed but do this computation only when its result is immediately
required.
Of course, this makes it more difficult to predict when the computation
actually happens -- something potentially very confusing when the computation
has side effects. If the computation is side effect free, potentially
great gains can be achieved without loss of clarity.

Python supports lazy evaluation e.g. by its generators (and
generator expressions). Its itertools module provides examples
to efficiently use iterators (and by inclusion generators) without
sacrificing clarity.
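
A minimal sketch with a generator expression (nothing is computed until
islice() consumes values):

  import itertools

  squares = (n * n for n in itertools.count())  # no square computed yet
  wanted = list(itertools.islice(squares, 5))   # now exactly 5 are computed
  print wanted                                  # [0, 1, 4, 9, 16]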

--
Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: setattr() oddness

2010-01-17 Thread Dieter Maurer
Lie Ryan lie.1...@gmail.com writes on Sat, 16 Jan 2010 19:37:29 +1100:
 On 01/16/10 10:10, Sean DiZazzo wrote:
  Interesting.  I can understand the would take time argument, but I
  don't see any legitimate use case for an attribute only accessible via
  getattr().  Well, at least not a pythonic use case.
 
 mostly for people (ab)using attributes instead of dictionary.

Here is one use case:

 A query application. Queries are described by complex query objects.
 For efficiency reasons, query results should be cached.
 For this, it is not unnatural to use query objects as cache keys.
 Then, query objects must not get changed in an uncontrolled way.
 I use __setattr__ to control access to the objects.
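
A minimal sketch of this kind of control (the Query class and its
freeze() method are made up for illustration):

  class Query(object):
      _frozen = False

      def __setattr__(self, name, value):
          if self._frozen:
              raise AttributeError('query may already be used as cache key')
          object.__setattr__(self, name, value)

      def freeze(self):
          object.__setattr__(self, '_frozen', True)

  q = Query()
  q.term = 'python'   # fine
  q.freeze()
  q.term = 'zope'     # raises AttributeError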

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: setattr() oddness

2010-01-19 Thread Dieter Maurer
Steven D'Aprano ste...@remove.this.cybersource.com.au writes on 18 Jan 2010 
06:47:59 GMT:
 On Mon, 18 Jan 2010 07:25:58 +0100, Dieter Maurer wrote:
 
  Lie Ryan lie.1...@gmail.com writes on Sat, 16 Jan 2010 19:37:29 +1100:
  On 01/16/10 10:10, Sean DiZazzo wrote:
   Interesting.  I can understand the would take time argument, but I
   don't see any legitimate use case for an attribute only accessible
   via getattr().  Well, at least not a pythonic use case.
  
  mostly for people (ab)using attributes instead of dictionary.
  
  Here is one use case:
  
   A query application. Queries are described by complex query objects.
   For efficiency reasons, query results should be cached. For this, it is
   not unnatural to use query objects as cache keys. Then, query objects
   must not get changed in an uncontrolled way. I use __setattr__ to
   control access to the objects.
 
 
 (1) Wouldn't it be more natural to store these query keys in a list or 
 dictionary rather than as attributes on an object?
 
 e.g. instead of:
 
 cache.__setattr__('complex query object', value)
 
 use:
 
 cache['complex query object'] = value

Few will use cache.__setattr__(...); most will write cache.attr = ..., which
is nicer than cache['attr'] = 

Moreover, it is not the cache but the query whose modification
I want to prevent. My cache indeed uses cache[query_object] = 
But I want to prevent query_object from being changed after a potential
caching.



 (2) How does __setattr__ let you control access to the object? If a user 
 wants to modify the cache, and they know the complex query object, what's 
 stopping them from using __setattr__ too?

In my specific case, __setattr__ prevents all modifications via attribute
assignment. The class uses __dict__ access to set attributes when
it knows it is still safe.

Of course, this is no real protection against attackers (which could
use __dict__ as well). It only protects against accidental change
of query objects.


Meanwhile, I remembered a more important use case for __setattr__:
providing transparent persistence. The ZODB (Zope Object DataBase)
customizes __setattr__ in order to intercept object modifications
and automatically register that the change needs to be persisted at
the next transaction commit.


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Memory usage problem of twisted server

2010-01-21 Thread Dieter Maurer
Victor Lin borns...@gmail.com writes on Wed, 20 Jan 2010 02:52:25 -0800 (PST):
 Hi,
 
 I encountered an increasing memory usage problem of my twisted server.
 I have posted a question on stackoverflow:
 http://stackoverflow.com/questions/2100192/how-to-find-the-source-of-increasing-memory-usage-of-a-twisted-server
 
 I have read the article Improving Python's Memory Allocator (
 http://evanjones.ca/memoryallocator/ ) and Python Memory Management
 ( http://evanjones.ca/python-memory.html ). And I now know little
 about how Python manages memory. I am wondering, is that the
 increasing memory usage problem of my audio broadcasting caused by the
 how python manage memory?

Your careful reading has already told you that Python delegates
memory allocation for larger blocks (> 256 bytes) to the underlying
C runtime library (malloc and friends).

The C runtime library does not use memory compaction, i.e.
it does not relocate used memory blocks in order to free space
in few large chunks. Therefore, it is susceptible to memory fragmentation:
the free space gets scattered around in a large number of rather small
blocks.
The fragmentation rate is especially high when the memory request sizes
have a high variance.
 
 Is that my guessing correct? How can I monitor the memory allocation
 of Python?

Look at http://guppy-pe.sourceforge.net/

--
Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Use eval() safely?

2010-02-24 Thread Dieter Maurer
Steven D'Aprano ste...@remove.this.cybersource.com.au writes on 22 Feb 2010 
06:07:05 GMT:
 ...
 It's *especially* not safe if you put nothing in the globals dict, 
 because Python kindly rectifies that by putting the builtins into it:
 
 >>> eval("__builtins__.keys()", {}, {})
 ['IndexError', 'all', 'help', 'vars', ... 'OverflowError']
 
 
 >>> eval("globals()", {}, {})
 {'__builtins__': {...}}
 
 >>> eval("globals()", {'__builtins__': None}, {})
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "<string>", line 1, in <module>
 NameError: name 'globals' is not defined
 
 So {'__builtins__': None} is safer than {}. Still not safe, exactly, but 
 safer. Or at least you make the Black Hats work harder before they own 
 your server :)

Using functionality introduced with the class/type homogenization,
it is quite easy to get access to the file type (even when __builtins__
is disabled). With the file type, arbitrary files can be read, written, destroyed...
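
A sketch of the idea (CPython 2; the expression walks from a tuple
literal to object and from there to its subclasses, among them file):

  expr = ("[c for c in ().__class__.__bases__[0].__subclasses__()"
          " if c.__name__ == 'file'][0]")
  file_type = eval(expr, {'__builtins__': None}, {})
  print file_type   # <type 'file'>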


Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: break unichr instead of fix ord?

2009-08-29 Thread Dieter Maurer
Martin v. Löwis mar...@v.loewis.de writes on Fri, 28 Aug 2009 10:12:34 
+0200:
  The PEP says:
   * unichr(i) for 0 <= i < 2**16 (0x10000) always returns a
     length-one string.

   * unichr(i) for 2**16 <= i <= TOPCHAR will return a
     length-one string on wide Python builds. On narrow
     builds it will raise ValueError.
  and
   * ord() is always the inverse of unichr()
  
  which of course we know; that is the current behavior.  But
  there is no reason given for that behavior.
 
 Sure there is, right above the list:
 
 Most things will behave identically in the wide and narrow worlds.
 
 That's the reason: scripts should work the same as much as possible
 in wide and narrow builds.
 
 What you propose would break the property unichr(i) always returns
 a string of length one, if it returns anything at all.

But getting a ValueError in some builds (and not in others)
is rather worse than getting unicode strings of different lengths.

 1) Should surrogate pairs be disallowed on narrow builds?
  That appears to have been answered in the negative and is
  not relevant to my question.
 
 It is, as it does lead to inconsistencies between wide and narrow
 builds. OTOH, it also allows the same source code to work on both
 versions, so it also preserves the uniformity in a different way.

Do you not have the inconsistencies in any case?
... ValueError in some builds and not in others ...

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Things to know about super (was: super() and multiple inheritance failure)

2009-09-28 Thread Dieter Maurer
Michele Simionato michele.simion...@gmail.com writes on Fri, 25 Sep 2009 
22:58:32 -0700 (PDT):
 ...
You know that in an ideal world I would just
 throw
 away multiple inheritance, it is just not worth the complication.

I am a fan of multiple inheritance: it lets the compiler/language runtime
do stupid tasks (implementing delegations) which I would otherwise have to do
explicitly. True, there may be complications - but often, they can be
avoided.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python performance on Solaris

2009-10-16 Thread Dieter Maurer
Antoine Pitrou solip...@pitrou.net writes on Thu, 15 Oct 2009 16:25:43 + 
(UTC):
 Le Wed, 14 Oct 2009 22:39:14 -0700, John Nagle a écrit :
  
  Note that multithreaded compute-bound Python programs really suck
  on multiprocessors.  Adding a second CPU makes the program go slower,
  due to a lame mechanism for resolving conflicts over the global
  interpreter lock.
 
 I'm not sure what you're talking about. Python has no mechanism for 
 resolving conflicts over the global interpreter lock (let alone a lame 
 one :-)), it just trusts the OS to schedule a thread only when it is not 
 waiting on an unavailable resource (a lock). The GIL is just an OS-level 
 synchronization primitive and its behaviour (fairness, performance) will 
 depend on the behaviour of the underlying OS.

But, independent from the OS and the fairness/performance of the GIL
management itself: the GIL is there to prevent concurrent execution
of Python code. Thus, at any time, at most one thread (in a process)
is executing Python code -- other threads may run as well, as long
as they are inside non Python code but cannot be executing Python bytecode,
independent of available CPU resources. This implies that Python cannot
fully exploit the power of multiprocessors.

It is also true that adding CPUs may in fact reduce performance for
compute bound multithreaded Python programs. While the additional
computational resources cannot be used by Python, the additional overhead
(switching between CPUs) may reduce overall performance.
I agree with you that it is difficult to predict when this overhead
becomes really significant.

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: restriction on sum: intentional bug?

2009-10-17 Thread Dieter Maurer
Christian Heimes li...@cheimes.de writes on Fri, 16 Oct 2009 17:58:29 +0200:
 Alan G Isaac schrieb:
  I expected this to be fixed in Python 3:
  
  >>> sum(['ab','cd'],'')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: sum() can't sum strings [use ''.join(seq) instead]
  
  Of course it is not a good way to join strings,
  but it should work, should it not?  Naturally,
 
 It's not a bug. sum() doesn't work on strings deliberately. ''.join()
 *is* the right and good way to concatenate strings.

Apparently, sum special-cases 'str' in order to teach people to use join.
It would have been as much work, and much more friendly, to just use join
internally to implement sum when this is possible.
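
A sketch of the friendlier behaviour suggested here (sum_strings is a
made-up name; Python 2):

  def sum_strings(seq, start=''):
      # fall back to join for strings instead of raising TypeError
      if isinstance(start, basestring):
          return start + ''.join(seq)
      return sum(seq, start)

  print sum_strings(['ab', 'cd'])   # abcd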

Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


[OT] Supporting homework (was: Re: Checking a Number for Palindromic Behavior)

2009-10-22 Thread Dieter Maurer
Steven D'Aprano ste...@remove.this.cybersource.com.au writes on 20 Oct 2009 
05:35:18 GMT:
 As far as I'm concerned, asking for help on homework without being honest 
 up-front about it and making an effort first, is cheating by breaking the 
 social contract. Anyone who rewards cheaters by giving them the answer 
 they want is part of the problem. Whether cheaters prosper in the long 
 run or not, they make life more difficult for the rest of us, and should 
 be discouraged.

A few days ago, I read an impressive book: Albert Jacquard, "Mon utopie".
The author has been a university professor (among others for
population genetics, a discipline between mathematics and biology).
One of the cornerstone theories in his book: mankind has reached the current
level of development not mainly due to exceptional work by individuals
but by the high level of cooperation between individuals.

In this view, asking for help (i.e. seeking communication/cooperation)
with individual tasks should probably be highly encouraged not discouraged.
At least, it is highly doubtful that the paradigm each for himself,
the most ruthless wins will be adequate for the huge problems mankind
will face in the near future (defeating hunger, preventing drastic
climate changes, natural resources exhaustion, ); intensive
cooperation seems to be necessary.

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 2.6 Deprecation Warnings with __new__ Can someone explain why?

2009-10-24 Thread Dieter Maurer
Terry Reedy tjre...@udel.edu writes on Fri, 23 Oct 2009 03:04:41 -0400:
 Consider this:
 
 def blackhole(*args, **kwds): pass
 
 The fact that it accept args that it ignores could be considered
 misleading or even a bug.

Maybe, it could. But, it is by no means necessary.

In mathematics, there is a set of important functions which behave precisely
as described above (they ignore their arguments); they are called
constant functions.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: httplib incredibly slow :-(

2009-08-14 Thread Dieter Maurer
Chris Withers ch...@simplistix.co.uk writes on Thu, 13 Aug 2009 08:20:37 
+0100:
 ...
 I've already established that the file downloads in seconds with
 [something else], so I'd like to understand why python isn't doing the
 same and fix the problem...

A profile might help to understand what the time is used for.

As almost all operations are not done in Python itself (httplib is really
a very tiny wrapper above a socket), a C level profile may be necessary
to understand the behaviour.
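
A Python level profile is a cheap first step (URL and function name are
made up):

  import cProfile
  import urllib2

  def fetch():
      return urllib2.urlopen('http://example.com/bigfile').read()

  cProfile.run('fetch()', sort='cumulative')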

Dieter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Can I get logging.FileHandler to close the file on each emit?

2012-08-30 Thread Dieter Maurer
rikardhul...@gmail.com writes:

 I use logging.FileHandler (on windows) and I would like to be able to delete 
 the file while the process is running and have it create the file again on 
 next log event.

 On windows (not tried linux) this is not possible because the file is locked 
 by the process, can I get it to close the file after each log event?

 If not, would the correct thing to do be to write my own LogHandler with this 
 behavior?

Zope is using Python's logging module and wants to play well
with log rotating (start a new logfile, do something with the old log file
(compress, rename, remove)).
It does this by registering a signal handler which closes its logfiles
when the corresponding signal is received.

Maybe, you can do something like this. Signal handling under
Windows is limited, but maybe you find a usable signal under Windows
(Zope is using SIGUSR1).
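
A minimal sketch of the signal approach (log file path and signal choice
are assumptions; SIGUSR1 does not exist on Windows):

  import logging
  import signal

  handler = logging.FileHandler('app.log', delay=True)
  logging.getLogger().addHandler(handler)

  def reopen(signum, frame):
      handler.close()   # FileHandler reopens the file on the next emit

  signal.signal(signal.SIGUSR1, reopen)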

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Are the property Function really useful?

2012-08-30 Thread Dieter Maurer
levinie...@gmail.com writes:

 Are the property Function really useful?

Someone invested time to implement/document/test it.
Thus, there are people who have use cases for it...

 Where can i use the property function?

You can use it when you have parameterless methods
which you want to access as if they were simple attributes:
i.e. obj.m instead of obj.m().
To phrase it slightly differently: the property function
allows you to implement computed (rather than stored) attributes.

You may find this feature uninteresting: fine, do not use it...

However, there are cases where it is helpful, e.g.:

  You have a base class B with an attribute a.
  Now, you want to derive a class D from B where a is
  not fixed but must be computed from other attributes.
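
A minimal sketch of this use case (class names made up):

  class B(object):
      def __init__(self):
          self.a = 42            # a is stored

  class D(B):
      def __init__(self, b, c):
          self.b = b
          self.c = c

      @property
      def a(self):               # a is computed, same access syntax
          return self.b + self.c

  print B().a       # 42
  print D(1, 2).a   # 3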


The Eiffel programming language even stipulates that
attributes and parameterless methods are essentially the same
and application of the property function is implicit in Eiffel
for parameterless methods: to hide implementation details.

As you see, property can be highly valued ;-)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: sockets,threads and interupts

2012-09-04 Thread Dieter Maurer
loial jldunn2...@gmail.com writes:

 I have threaded python script that uses sockets to monitor network ports.

 I want to ensure that the socket is closed cleanly in all circumstances. This 
 includes if the script is killed or interupted in some other way.

The operating system should close all sockets automatically when
the process dies. Thus, if closing alone is sufficient...
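
For the normal shutdown paths, an explicit try/finally is enough
(address is made up):

  import socket

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  try:
      s.connect(('localhost', 8080))
      s.sendall('ping')
  finally:
      s.close()   # runs on normal exit and on exceptions; on a kill
                  # signal the operating system cleans up instead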

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why derivated exception can not be pickled ?

2012-09-05 Thread Dieter Maurer
Mathieu Courtois mathieu.court...@gmail.com writes:

 Here is my example :


 import cPickle

 ParentClass = object # works
 ParentClass = Exception  # does not

 class MyError(ParentClass):
 def __init__(self, arg):
 self.arg = arg

 def __getstate__(self):
 print '#DBG pass in getstate'
 odict = self.__dict__.copy()
 return odict

 def __setstate__(self, state):
 print '#DBG pass in setstate'
 self.__dict__.update(state)

 exc = MyError('IDMESS')

 fo = open('pick.1', 'w')
 cPickle.dump(exc, fo)
 fo.close()

 fo = open('pick.1', 'r')
 obj = cPickle.load(fo)
 fo.close()


 1. With ParentClass=object, it works as expected.

 2. With ParentClass=Exception, __getstate__/__setstate__ are not called.

The pickle interface is actually more complex and there are several
ways an object can ensure picklability. For example, there is
also a __reduce__ method. I suppose that Exception defines methods
which trigger the use of an alternative picklability approach (different
from __getstate__/__setstate__).

I would approach your case the following way: Use pickle instead
of cPickle and debug pickling/unpickling to find out what
happens in detail.
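
A sketch of a workaround along the __reduce__ line (assuming the goal
is just to restore arg):

  import cPickle

  class MyError(Exception):
      def __init__(self, arg):
          Exception.__init__(self, arg)
          self.arg = arg

      def __reduce__(self):
          # (callable, args) is enough to reconstruct the instance
          return (MyError, (self.arg,))

  copy = cPickle.loads(cPickle.dumps(MyError('IDMESS')))
  print copy.arg   # IDMESS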

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [web] Long-running process: FCGI? SCGI? WSGI?

2012-09-06 Thread Dieter Maurer
Gilles nos...@nospam.com writes:

 To write a long-running web application, I'd like to some feedback
 about which option to choose.

 Apparently, the choice boils down to this:
 - FastCGI
 - SCGI
 - WSGI

 It seems like FCGI and SCGI are language-neutral, while WSGI is
 Python-specific.

 Besides that, how to make an informed choice about which option to
 choose?

Obviously, this depends on your environment. Some hosters, web servers,
applications may directly support one interface and not others.

If you control your whole environment, I would look for a newer
approach. I do not know SCGI but I know that WSGI is fairly recent.
This means that during its design, FastCGI was already known and
not deemed to be sufficient. Thus, you can expect more features
(more modularisation, in this case) in WSGI.
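
For a feel of the interface style, a minimal WSGI application (served
here with the stdlib wsgiref; the port is made up):

  def application(environ, start_response):
      start_response('200 OK', [('Content-Type', 'text/plain')])
      return ['hello\n']

  if __name__ == '__main__':
      from wsgiref.simple_server import make_server
      make_server('localhost', 8000, application).serve_forever()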

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to implement a combo Web and Desktop app in python.

2012-09-14 Thread Dieter Maurer
Shawn McElroy luckysm...@gmail.com writes:

 ...
 So I need to find a way I can implement this in the best way...

It is in general very difficult to say reliable things about "the best way",
because that depends very much on details.

My former employer has created a combo desktop/online application
based on Zope. Zope is a web application framework, platform independent,
easily installable, with an integrated HTTP server. It is one of
the natural choices as a basis for a Python implemented web application.
To get a desktop application, the application and Zope were installed
on the client system and a standard browser was used for the ui.

The main drawback of this scheme came from the limitations of
the browser implemented ui. It has been very difficult to implement
deep integration with the desktop (e.g. drag & drop in and out of
the application; integration with the various other applications
(Outlook, Word, ...)) and to provide gimmicks provided by the
surrounding environment. Thus, after 8 years, the application started
to look old-style and the browser based ui was replaced by a stand alone
desktop application that talked via webservices with an online
system (if necessary).

Thus, *if* the ui requirements are fairly low (i.e. can fairly easily
be implemented via a browser) you could go a similar route. If your
ui requirements are high, you can replace the browser by a
self-developed (thick) ui application that talks via an
abstraction with its backend. Properly designed, the abstraction
could either be implemented by direct calls (to
a local library) or by webservice calls (to an online service).
This way, you could use your client application both for the (local)
desktop only case as well as for the online case.


Your description (stripped) suggests that you need special support
for offline usage. This is separate functionality, independent of
the desktop/online question. For example, highly available distributed database
systems must provide some synchronization mechanism for resynchronization
after temporary network connectivity loss. Another example:
transactional systems must not lose transactions and
can for example use asynchronous message queues to ensure that
messages are safely delivered even in the case of temporary
communication problems or failures.

Thus, look at these aspects independently from the desktop/online
scenario -- these aspects affect any distributed system
and solutions can be found there. Those solutions tend to be
complex (and expensive).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decorators not worth the effort

2012-09-14 Thread Dieter Maurer
 On Sep 14, 3:54 am, Jean-Michel Pichavant jeanmic...@sequans.com
 wrote:
 I don't like decorators, I think they're not worth the mental effort.

Fine.

I like them because they can vastly improve reusability and drastically
reduce redundancies (which I hate). Improved reusability and
reduced redundancies can make applications more readable, easier
to maintain and faster to develop.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to implement a combo Web and Desktop app in python.

2012-09-15 Thread Dieter Maurer
Shawn McElroy luckysm...@gmail.com writes:
 ...
 Although you are correct in the aspect of having 'real' OS level integration. 
 Being able to communicate with other apps as well as contextual menus. 
 Although, could I not still implement those features from python, into the 
 host system from python? There are also tools like 'kivi' which allow you to 
 get system level access to do things. Though im not too sure on how far that 
 extends, or how useful it would be.

In my scenario you have a standard browser as (thin) client and
Python only on the server side. In my scenario, the server could
run on the client's desktop -- however, it ran there as a service,
i.e. not in user space. My knowledge about Windows
is limited. I do not really know whether a Windows service can
fully interact with applications running in the user space and
what limitations may apply.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Decorators not worth the effort

2012-09-15 Thread Dieter Maurer
Dwight Hutto wrote at 2012-9-14 23:42 -0400:
 ...
Reduce redundancy, is argumentative.

To me, a decorator, is no more than a logging function. Correct me if
I'm wrong.

Well, it depends on how you are using decorators and how complex
your decorators are. If what you are using as decorating function
is really trivial, as trivial as @decoratorname, then you
do not gain much.

But your decorator functions need not be trivial.
An example: in a recent project,
I have implemented a SOAP webservice where most services depend
on a valid session and must return specified fields even when
(as in the case of an error) there is no meaningful value.
Instead of putting into each of those function implementations
the check "do I have a valid session?" and at the end
"add required fields not specified", I opted for the following
decorator:

def valid_session(*fields):
! fields = ("errorcode",) + fields
  @decorator
  def valid_session(f, self, sessionkey, *args, **kw):
!   s = get_session(sessionkey)
!   if not s.get("authenticated", False):
!     rd = {"errorcode": u"1000"}
!   else:
!     rd = f(self, sessionkey, *args, **kw)
!   return tuple(rd.get(field, DEFAULTS.get(field, '')) for field in fields)
  return valid_session

The lines starting with ! represent the logic encapsulated by the
decorator -- the logic I would have to copy into each function implementation
without it.

I then use it this way:

  @valid_session()
  def logout(self, sessionkey):
    s = get_session(sessionkey)
    s["authenticated"] = False
    return {}

  @valid_session("amountavail")
  def getStock(self, sessionkey, customer, item, amount):
    info = self._get_article(item)
    return {u"amountavail": info["deliverability"] and u"0" or u"1"}

  @valid_session("item", "shortdescription", "pe", "me", "min", "price", "vpe",
    "stock", "linkpicture", "linkdetail", "linklist", "description", "tax")
  def fetchDetail(self, sessionkey, customer, item):
    return self._get_article(item)
  ...
  ...

I hope you can see that at least in this example, the use of the decorator
reduces redundancy and highly improves readability -- because
boilerplate code (check valid session, add default values for unspecified
fields) is not copied over and over again but isolated in a single place.


The example uses a second decorator (@decorator) --
in the decorator definition itself. This decorator comes from the
decorator module, a module facilitating the definition of signature
preserving decorators (important in my context): such a decorator
ensures that the decoration result has the same parameters as the
decorated function. To achieve this, complex Python implementation
details and Python's introspection must be used. And I am very
happy that I do not have to reproduce this logic in my decorator
definitions but just say @decorator :-)


Example 3: In another project, I had to implement a webservice
where most of the functions should return json serialized data
structures. As I like decorators, I chose a @json decorator.
Its definition looks like this:

@decorator
def json(f, self, *args, **kw):
  r = f(self, *args, **kw)
  self.request.response.setHeader(
    'content-type',
    # application/json made problems with the firewall,
    # try text/json instead
    #'application/json; charset=utf-8'
    'text/json; charset=utf-8'
  )
  return udumps(r)

It calls the decorated function, then adds the correct content-type
header and finally returns the json serialized return value.

The webservice function definitions then look like:

@json
def f1(self, ):
   

@json
def f2(self, ...):
   

The function implementations can concentrate on their primary task.
The json decorator tells that the result is (by magic specified
elsewhere) turned into a json serialized value.

This example demonstrates the improved maintainability (caused by
the redundancy reduction): the json rpc specification stipulates
the use of the application/json content type. Correspondingly,
I used this content-type header initially. However, many enterprise
firewalls try to protect against viruses by banning application/*
responses -- and in those environments, my initial webservice
implementation did not work. Thus, I changed the content type
to text/json. Thanks to the decorator encapsulation of the
json result logic, I could make my change at a single place -- not littered
all over the webservice implementation.


And a final example: Sometimes you are interested to cache (expensive)
function results. Caching involves non-trivial logic (determine the cache,
determine the key, check whether the cache contains a value for the key;
if not, call the function, cache the result). The package plone.memoize
defines a set of decorators (for different caching policies) with
which caching can be as easy as:

  @memoize
  def f():
  

The complete caching logic is encapsulated in the tiny @memoize prefix.
It tells: calls to this function are cached. The function implementation
can concentrate on its primary task.

Re: Decorators not worth the effort

2012-09-18 Thread Dieter Maurer
Jean-Michel Pichavant jeanmic...@sequans.com writes:

 - Original Message -
 Jean-Michel Pichavant wrote:
 [snip]
 One minor note, the style of decorator you are using loses the
 docstring
 (at least) of the original function. I would add the
 @functools.wraps(func)
 decorator inside your decorator.

 Is there a way to not lose the function signature as well?

Look at the decorator module.
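
A minimal sketch with the third party decorator package (log_call and
add are made-up names):

  from decorator import decorator

  @decorator
  def log_call(f, *args, **kw):
      print 'calling %s' % f.__name__
      return f(*args, **kw)

  @log_call
  def add(x, y):
      return x + y

  import inspect
  print inspect.getargspec(add)   # the (x, y) signature is preserved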

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python application file format

2012-09-27 Thread Dieter Maurer
Benjamin Jessup b...@abzinc.com writes:

 ...
 What do people recommend for a file format for a python desktop
 application? Data is complex with 100s/1000s of class instances, which
 reference each other.

 ...
 Use cPickle with a module/class whitelist? (Can't easily port, not
 entirely safe, compact enough, expandable)

This is the approach used by the ZODB (Zope Object DataBase).

I like the ZODB. It is really quite easy to get data persisted.
It uses an elaborate caching scheme to speed up database interaction
and has transaction control to ensure persistent data consistency
in case of errors.

Maybe not so relevant in your context, it does not require
locking to safely access persistent data in a multi thread environment.
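
A minimal sketch of ZODB usage ('data.fs' is a made-up path):

  from ZODB.FileStorage import FileStorage
  from ZODB import DB
  import transaction

  db = DB(FileStorage('data.fs'))
  root = db.open().root()
  root['objects'] = {'a': [1, 2, 3]}   # any picklable object graph
  transaction.commit()
  db.close()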

 ...

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Private methods

2012-10-11 Thread Dieter Maurer
alex23 wuwe...@gmail.com writes:

 On 10 Oct, 17:03, real-not-anti-spam-addr...@apple-juice.co.uk (D.M.
 Procida) wrote:
 It certainly makes it quick to build a class with the attributes I need,
 but it does make tracing logic sometimes a pain in the neck.

 I don't know what the alternative is though.

 Components.

 The examples are in C++ and it's about game development, but I found
 this article to be very good at explaining the approach:
 http://gameprogrammingpatterns.com/component.html

 I've become a big fan of components  adaptation using zope.interface:
 http://wiki.zope.org/zope3/ZopeGuideComponents

If multiple inheritance is deemed complex, adaptation is even more so:

  With multiple inheritance, you can quite easily see from the source
  code how things are put together.
  Adaptation follows the inversion of control principle. With this
  principle, how a function is implemented, is decided outside
  and can very easily be changed (e.g. through configuration).
  This gives great flexibility but also nightmares when things do
  not work as expected...

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: serialization and versioning

2012-10-13 Thread Dieter Maurer
Neal Becker ndbeck...@gmail.com writes:

 I wonder if there is a recommended approach to handle this issue.

 Suppose objects of a class C are serialized using python standard pickling.  
 Later, suppose class C is changed, perhaps by adding a data member and a new 
 constructor argument.

 It would see the pickling protocol does not directly provide for this - but 
 is 
 there a recommended method?

 I could imagine that a class could include a class __version__ property that 
 might be useful - although I would further expect that it would not have been 
 defined in the original version of class C (but only as an afterthought when 
 it 
 became necessary).

The ZODB (Zope Object DataBase) is based on Python's pickle.

In the ZODB world, the following strategy is used:

  *  if the class adds a new data attribute, give it (in addition) a
 corresponding class level attribute acting as default value
 in case the pickled state of an instance lacks this
 instance level attribute

  *  for more difficult cases, define an appropriate __setstate__
     for the class that handles the necessary model upgrades
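
A sketch of both strategies on a made-up class C that gained a color
attribute after instances had already been pickled:

  class C(object):
      color = 'red'   # class level default for old pickles

      def __init__(self, x, color='red'):
          self.x = x
          self.color = color

      def __setstate__(self, state):
          state.setdefault('color', 'red')   # upgrade old states
          self.__dict__.update(state)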

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: deque and thread-safety

2012-10-13 Thread Dieter Maurer
Christophe Vandeplas christo...@vandeplas.com writes:

 ...
 From the documentation I understand that deques are thread-safe:
 Deques are a generalization of stacks and queues (the name is pronounced 
 “deck”
 and is short for “double-ended queue”). Deques support thread-safe, memory
 efficient appends and pops from either side of the deque with approximately 
 the
 same O(1) performance in either direction.

 It seems that appending to deques is indeed thread-safe, but not
 iterating over them.

You are right.

And when you think about it, then there is not much point in striving
for thread safety for iteration (alone).
Iteration is (by nature) a non atomic operation: you iterate because
you want to do something with the intermediate results; this doing
is not part of the iteration itself.
Thus, you are looking for thread safety not only for the iteration
but for the iteration combined with additional operations (which
may well extend beyond the duration of the iteration).

Almost surely, the deque implementation is using locks
to ensure thread safety for its append and pop. Check whether
this lock is exposed to the application. In this case, use
it to protect your atomic sections involving iteration.
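
A sketch with an application level lock (process() and the sharing of
d_lock with the writers are assumptions):

  import threading
  from collections import deque

  d = deque()
  d_lock = threading.Lock()

  def consume():
      with d_lock:            # writers must acquire d_lock as well
          snapshot = list(d)  # copy while protected
      for item in snapshot:   # work outside the lock
          process(item)       # process() is application code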

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: problems with xml parsing (python 3.3)

2012-10-28 Thread Dieter Maurer
janni...@gmail.com writes:

 I am new to Python and have a problem with the behaviour of the xml parser. 
 Assume we have this xml document:

 <?xml version="1.0" encoding="UTF-8"?>
 <bibliography>
 <entry>
 Title of the first book.
 </entry>
 <entry>
 <coauthored/>
 Title of the second book.
 </entry>
 </bibliography>


 If I now check for the text of all 'entry' nodes, the text for the node with 
 the empty element isn't shown



 import xml.etree.ElementTree as ET
 tree = ET.ElementTree(file='test.xml')
 root = tree.getroot()
 resultSet = root.findall(".//entry")
 for r in resultSet:
   print (r.text)

I do not know about xml.etree but the (said) quite compatible
lxml.etree handles text nodes in a quite different way from
that of DOM: they are *not* considered children of the parent
element but are attached as attributes "text" and "tail" to either
the container element (if the first DOM node is a text node) or the
preceding element, otherwise.

Your code snippet suggests that xml.etree behaves identically in
this respect. In this case, you would find "Title of the second book."
as the "tail" attribute of the element "coauthored".
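
A minimal sketch of collecting the full text of each entry under this
model (the text of the element plus the tails of its children):

  for entry in root.findall('.//entry'):
      parts = [entry.text or '']
      for child in entry:
          parts.append(child.tail or '')
      print ''.join(parts).strip()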

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Applying a paid third party ssl certificate

2012-11-04 Thread Dieter Maurer
ehsmenggro...@gmail.com writes:
 I haven't quite figured out how to apply a paid ssl cert, say RapidSSL free 
 SSL test from Python's recent sponsor sslmatrix.com and what to do with that 
 to make Python happy.

 This good fellow suggests using the PEM format. I tried and failed.
 http://www.minnmyatsoe.com/category/python-2/

 The self signed cert recipes found all work swell, but some browsers
 (webkit) gets very upset indeed. I want to use ajax requests from clients 
 (e.g autocompletion, stats collection etc) and put that in a python program 
 without hogging down the main apache stack, but without a proper ssl cert 
 this doesn't work.

 Does anyone have any ideas what to do?

From your description, I derive that you want
your client (python program) to authenticate itself via an
SSL certificate.

If my assumption is correct, I would start with a look at
the Python documentation for HTTPS connections.
If I remember right, they have 2 optional parameters
to specify a client certificate and to specify trusted
certificates (when server presented certificates should be verified).

Once you have determined how to present the client certificate
for the base HTTPS connection, you may need to look at the documentation
or source code of higher level apis (such as urllib2) to learn
how to pass on your certificate down to the real connection.
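
A minimal sketch at the httplib level (host and PEM file paths are made
up; Python 2):

  import httplib

  conn = httplib.HTTPSConnection('example.com',
                                 key_file='client.key',
                                 cert_file='client.pem')
  conn.request('GET', '/')
  print conn.getresponse().status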

You may also have a look at PyPI. You may find there packages
facilitating Python's SSL support.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python garbage collector/memory manager behaving strangely

2012-11-14 Thread Dieter Maurer
a...@pythoncraft.com (Aahz) writes:

 ...
 def readlines(f):
     lines = []
     while "f is not empty":
         line = f.readline()
         if not line: break
         if len(line) > 2 and line[-2:] == '|\n':
             lines.append(line)
             yield ''.join(lines)
             lines = []
         else:
             lines.append(line)
 
 There's a few changes I'd make:
 I'd change the name to something else, so as not to shadow the built-in,
 ...
 Actually, as an experienced programmer, I *do* think it is confusing as
 evidenced by the mistake Dave made!  Segregated namespaces are wonderful
 (per Zen), but let's not pollute multiple namespaces with same name,
 either.

 It may not be literally shadowing the built-in, but it definitely
 mentally shadows the built-in.

I disagree with you. Namespaces are there so that, when working
in one namespace, I do not need to worry much about other
namespaces. Therefore, calling a function readlines
is very much justified (if it reads lines from a file), even
though there is a module around with a similar name.
By the way, the module is named readline (not readlines).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Getting empty attachment with smtplib

2012-11-14 Thread Dieter Maurer
Tobiah t...@tobiah.org writes:

 I just found out that the attachment works fine
 when I read the mail from the gmail website.  Thunderbird
 complains that the attachment is empty.

The MIME standard (a set of RFCs) specifies how valid messages
with attachments should look.

Fetch the mail (unprocessed if possible) and look at its
structure. If it is conformant to the MIME standard, then
Thunderbird made a mistake; otherwise, something went wrong
with the message construction.

I can already say that smtplib is not to blame. It is (mostly) unconcerned
with the internal structure of the message -- and by itself
will not empty attachments.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: error importing smtplib

2012-11-16 Thread Dieter Maurer
Eric Frederich eric.freder...@gmail.com writes:

 I created some bindings to a 3rd party library.
 I have found that when I run Python and import smtplib it works fine.
 If I first log into the 3rd party application using my bindings however I
 get a bunch of errors.

 What do you think this 3rd party login could be doing that would affect the
 ability to import smtplib?

 Any suggestions for debugging this further?  I am lost.

 This works...

 >>> import smtplib
 >>> FOO_login()

 This doesn't...

 >>> FOO_login()
 >>> import smtplib

 Errors.

 >>> import smtplib
 ERROR:root:code for hash sha224 was not found.
 Traceback (most recent call last):
   File "/opt/foo/python27/lib/python2.7/hashlib.py", line 139, in <module>
     globals()[__func_name] = __get_hash(__func_name)
   File "/opt/foo/python27/lib/python2.7/hashlib.py", line 103, in __get_openssl_constructor
     return __get_builtin_constructor(name)
   File "/opt/foo/python27/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
     raise ValueError('unsupported hash type %s' % name)
 ValueError: unsupported hash type sha224

From the error, I suppose it does something bad
to the hash registry.

When I have analysed problems with hashlib (some time ago,
my memory may not be completely trustworthy), I got the
impression that hashlib essentially delegates to the
openssl libraries for the real work and especially
the supported hash types. Thus, I suspect that
your FOO_login() does something which confuses openssl.
One potential reason could be that it loads a bad version
of an openssl shared library.

I would use the trace (shell) command to find out what operating system
calls are executed during FOO_login(), hoping that one of them
gives me a clue.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: error importing smtplib

2012-11-17 Thread Dieter Maurer
Eric Frederich eric.freder...@gmail.com writes:

 ...
 So I'm guessing the problem is that after I log in, the process has a
 conflicting libssl.so file loaded.
 Then when I try to import smtplib it tries getting things from there and
 that is where the errors are coming from.

 The question now is how do I fix this?

Likely, you must relink the shared object containing
your FOO_login. When its current version was linked,
the (really) old libssl was current, and the shared object was linked
against it.
As the binary objects for your shared object might depend on the old
version, it is best not only to relink but to recompile it as well.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Stack_overflow error

2012-11-20 Thread Dieter Maurer
Aung Thet Naing aung.thetna...@gmail.com writes:

 I'm having a Stack_overflow exception in _ctypes_callproc (callproc.c). The
 error actually comes from the:

  cleanup:
     for (i = 0; i < argcount; ++i)
         Py_XDECREF(args[i].keep);

 when args[i].keep->ob_refCnt == 1

Really a stack overflow or a general segmentation violation?
Under *nix, the two are not easy to distinguish -- but maybe you are
working with Windows?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Suitable software stacks for simple python web service

2012-11-21 Thread Dieter Maurer
Kev Dwyer kevin.p.dw...@gmail.com writes:

 I have to build a simple web service which will:

  - receive queries from our other servers
  - forward the requests to a third party SOAP service
  - process the response from the third party
  - send the result back to the original requester

 From the point of view of the requester, this will happen within the scope 
 of a single request.  

 The data exchanged with the original requester will likely be encoded as 
 JSON; the SOAP service will be handled by SUDS.

 The load is likely to be quite light, say a few requests per hour, though 
 this may increase in the future.

 Given these requirements, what do you think might be a suitable software 
 stack, i.e. webserver and web framework (if a web framework is even 
 necessary)?  

From your description (so far), you would not need a web framework
but could use any way to integrate Python scripts into a web server,
e.g. mod_python, cgi, WSGI, 
Check what ways your web server will support.
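
For illustration, a minimal CGI sketch of such a relay (the SOAP round
trip itself is reduced to a placeholder):

  #!/usr/bin/env python
  import cgi
  import json   # in the standard library from Python 2.6 on

  form = cgi.FieldStorage()
  params = dict((k, form.getvalue(k)) for k in form.keys())
  result = {'echo': params}   # placeholder for the SUDS/SOAP call

  print 'Content-Type: application/json'
  print
  print json.dumps(result)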

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: deepcopy questions

2012-11-28 Thread Dieter Maurer
lars van gemerden l...@rational-it.com writes:
 ... deepcopy dropping some items ...
 Any ideas are still more than welcome,

deepcopy is implemented in Python (rather than C).
Thus, if necessary, you can debug what it is doing
and thereby determine where the items have been dropped.
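
A minimal sketch (obj stands for the object losing items):

  import copy
  import pdb

  pdb.run('copy.deepcopy(obj)', globals(), locals())
  # then step with 's' into copy.py and watch the items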

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: os.system and subprocess odd behavior

2012-12-14 Thread Dieter Maurer
py_genetic conor.robin...@gmail.com writes:

 Example of the issue for arguments sake:

 Platform Ubuntu server 12.04LTS, python 2.7

 Say file1.txt has hello world in it.
  ^
Here, you speak of file1.txt (note the extension .txt)

 subprocess.Popen("cat < file1 > file2", shell = True)
 subprocess.call("cat < file1 > file2", shell = True)
 os.system("cat < file1 > file2")

But in your code, you use file1 (without extension).

If your code really references a non-existing file, you may well
get what you are observing.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: need some help with unexpected signal exception when using input from a thread (Pypy 1.9.0 on osx/linux)

2012-12-15 Thread Dieter Maurer
Irmen de Jong irmen.nos...@xs4all.nl writes:

 Using Pypy 1.9.0. Importing readline. Using a background thread to get 
 input() from
 stdin. It then crashes with:

   File "/usr/local/Cellar/pypy/1.9/lib_pypy/pyrepl/unix_console.py", line 400, in restore
     signal.signal(signal.SIGWINCH, self.old_sigwinch)
 ValueError: signal() must be called from the main thread

 Anyone seen this before? What's going on?

Apparently, input is not suited to being called from a background thread.

I have no idea why signal should only be callable from the main thread.
I do not think this makes much sense. Speak with the Pypy developers
about this.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Using pdb with greenlet?

2012-06-13 Thread Dieter Maurer
Salman Malik salma...@live.com writes:

 I am sort of a newbie to Python ( have just started to use pdb).
 My problem is that I am debugging an application that uses greenlets and when
 I encounter something in code that spawns the coroutines or wait for an event,
 I lose control over the application (I mean that after that point I can no
 longer do 'n' or 's' on the code). Can anyone of you tell me how to tame
 greenlet with pdb, so that I can see step-by-step as to what event does a
 coroutine sees and how does it respond to it.
 Any help would be highly appreciated.

Debugging works via the installation of a tracehook function. If such a
function is installed in a thread, the Python interpreter calls back
via the installed hook to report events relevant for debugging.
Usually the hook function is defined by a debugger which examines
whether the event is user relevant (e.g. if a breakpoint has been hit,
or code for a new line has been entered) and in this case informs
the user and may give him control.

It is important that the trace hook installation is thread specific. Otherwise,
debugging in a multithreaded environment would be a nightmare - as
events from multiple threads may arrive and seriously confuse
the debugger as well as the user.


I do not know greenlet. However, I expect that it uses threads
under the hood to implement coroutines. In such a case, it would
be natural that debugging one coroutine would not follow the
execution into a different coroutine.

To change this, greenlet would need to specially support
the tracehook feature: when control is transfered to a different
coroutine, the tracehook would need to be transfered as well.
Personally, I am not sure that this would be a good idea
(I sometimes experience debugging interaction from different threads --
and I can tell you that it is a really nasty experience).

However, you can set the tracehook yourself in each of your
coroutines: import pdb; pdb.set_trace(). This is called a code breakpoint.
It installs the debugger's tracehook function in the current thread
and gives control to the debugger (i.e. it works like a breakpoint).
I use this quite frequently to debug multithreaded web applications
and it works quite well (sometimes with nasty experiences).

pdb is not optimal for multithread debugging because it expects
to interact with a single thread only. For a good experience,
a thread aware extension would be necessary.

--
Dieter


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: validating XML

2012-06-13 Thread Dieter Maurer
andrea crotti andrea.crott...@gmail.com writes:

 Hello Python friends, I have to validate some xml files against some xsd
 schema files, but I can't use any cool library as libxml unfortunately.

Why?
It seems not very rational to implement a complex task (such as
XML-Schema validation) when there are ready solutions around.

 A Python-only validator might be also fine, but all the projects I've
 seen are partial or seem dead..
 So since we define the schema ourselves, I was allowed to only implement
 the parts of the huge XML definition that we actually need.
 Now I'm not quite sure how to do the validation myself, any suggestions?

I would look for a command line tool available
on your platform which performs the validation and
call this from Python.
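
A sketch with xmllint, which ships with libxml2 on many platforms:

  import subprocess

  def is_valid(xsd_path, xml_path):
      # xmllint returns 0 when the instance validates
      return subprocess.call(
          ['xmllint', '--noout', '--schema', xsd_path, xml_path]) == 0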

--
Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: validating XML

2012-06-14 Thread Dieter Maurer
andrea crotti andrea.crott...@gmail.com writes:
 ...
 The reason is that it has to work on many platforms and without any c module
 installed, the reason of that

Searching for a pure Python solution, you might have a look at PyXB.

It has not been designed to validate XML instances against XML-Schema
(but to map between XML instances and Python objects based on
an XML-Schema description) but it detects many problems in the
XML instances. It does not introduce its own C extensions
(but relies on an XML parser shipped with Python).

 Anyway in a sense it's also quite interesting, and I don't need to implement
 the whole XML, so it should be fine.

The XML is the lesser problem. The big problem is XML-Schema: it is
*very* complex with structure definitions (elements, attributes and
#PCData), inheritance, redefinition, grouping, scoping rules, inclusion,
data types with restrictions and extensions.

Thus if you want to implement a reliable algorithm which for
given XML-schema and XML-instance checks whether the instance is
valid with respect to the schema, then you have a really big task.

Maybe, you have a fixed (and quite simple) schema. Then
you may be able to implement a validator (for the fixed schema).
But I do not understand why you would want such a validation.
If you generate the XML instances, then thoroughly test your
generation process (using any available validator) and then trust it.
If the XML instances come from somewhere else and must be interpreted
by your application, then the important thing is that they are
understood by your application, not that they are valid.
If you get a complaint that your application cannot handle a specific
XML instance, then you validate it in your development environment
(again with any validator available) and if the validation fails,
you have good arguments.


 What I haven't found yet is an explanation of a possible algorithm to use for
 the validation, that I could then implement..

You parse the XML (and get a tree) and then recursively check
that the elements, attributes and text nodes in the tree
conform to the schema (in an abstract sense,
the schema is a collection of content models for the various elements;
each content model tells you how the element content and attributes
should look).
For a simple schema, this is straight forward. If the schema starts
to include foreign schemas, uses extensions, restrictions or redefines,
then it gets considerably more difficult.
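
A toy sketch of the straightforward case (the "schema" is reduced to a
mapping from element names to allowed child element names):

  SCHEMA = {'bibliography': ('entry',),
            'entry': ('coauthored',),
            'coauthored': ()}

  def validate(elem):
      allowed = SCHEMA.get(elem.tag)
      if allowed is None:
          return False   # unknown element
      return all(child.tag in allowed and validate(child)
                 for child in elem)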


--
Dieter

-- 
http://mail.python.org/mailman/listinfo/python-list

