Re: Trying to parse a HUGE(1gb) xml file

2010-12-21 Thread Stefan Behnel

Adam Tauno Williams, 20.12.2010 20:49:

On Mon, 2010-12-20 at 11:34 -0800, spaceman-spiff wrote:

This is a rather long post, but I wanted to include all the details & everything
I have tried so far myself, so please bear with me & read the entire boringly
long post.
I am trying to parse a ginormous (~1GB) xml file.


Do that hundreds of times a day.


0. I am a python & xml n00b, so I have been relying on the excellent
beginner book DIP (Dive_Into_Python3 by MP (Mark Pilgrim); Mark, if
you are reading this, you are AWESOME & so is your witty & humorous
writing style)
1. Almost all examples of parsing xml in python that I have seen start off with
these 4 lines of code.
import xml.etree.ElementTree as etree


Try

import xml.etree.cElementTree as etree

instead. Note the leading "c", which hints at the C implementations of 
ElementTree. It's much faster and much more memory friendly than the Python 
implementation.




tree = etree.parse('*path_to_ginormous_xml*')
root = tree.getroot()  #my huge xml has 1 root at the top level
print root


Yes, this is a terrible technique;  most examples are crap.


2. In the 2nd line of code above, as Mark explains in DIP, the parse
function builds & returns a tree object, in-memory (RAM), which
represents the entire document.
I tried this code, which works fine for a small file (~1MB), but when I
run this simple 4-line py code in a terminal for my HUGE target file
(1GB), nothing happens.
In a separate terminal, I run the top command, & I can see a python
process, with memory (the VIRT column) increasing from 100MB all the
way up to 2100MB.


Yes, this is using DOM.  DOM is evil and the enemy, full-stop.


Actually, ElementTree is not "DOM", it's modelled after the XML Infoset. 
While I agree that DOM is, well, maybe not "the enemy", but not exactly 
beautiful either, ElementTree is really a good thing, likely also in this case.




I am guessing that, as this happens (over the course of 20-30 mins), the
tree representing the document is being slowly built in memory, but even after
30-40 mins, nothing happens.
I don't get an error, segfault or out-of-memory exception.


You need to process the document as a stream of elements; aka SAX.


IMHO, this is the worst advice you can give.

Stefan

--
http://mail.python.org/mailman/listinfo/python-list


Re: Redundant importing of modules

2010-12-21 Thread bruno.desthuilli...@gmail.com
On 21 déc, 03:03, Steve Holden  wrote:
> On 12/20/2010 8:36 PM, Jshgwave wrote:>
> > When writing a function that uses a module such as NumPy, it is tempting
> > to include the statement "import numpy" or "import numpy as np" in the
> > definition of the function, in case the  function is used in a script
> > that hasn't already imported NumPy.

(answering the OP - post didn't show off here on c.l.py):

This is actually totally useless. The global namespace of a function
is the namespace of the module in which it has been defined, not the
namespace of the module where the function is called.
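
A minimal illustration (the module names below are made up):

# shapes.py
import math

def area(r):
    # 'math' is looked up in shapes.py's global namespace when area() runs,
    # no matter which module the call comes from.
    return math.pi * r * r

# script.py -- no 'import math' needed here
import shapes
print shapes.area(2.0)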


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Trying to parse a HUGE(1gb) xml file

2010-12-21 Thread Stefan Behnel

spaceman-spiff, 20.12.2010 21:29:

I am sorry I left out what exactly I am trying to do.

0. Goal: I am looking for a specific element. There are several tens/hundreds of
occurrences of that element in the 1gb xml file.
The contents of the xml are just a dump of config parameters from a packet
switch (although imho, the contents of the xml don't matter).

I need to detect them & then, for each one, I need to copy all the content between the
element's start & end tags & create a smaller xml file.


Then cElementTree's iterparse() is your friend. It allows you to basically
iterate over the XML tags while it's building an in-memory tree from them.
That way, you can either remove subtrees from the tree if you don't need
them (to save memory) or otherwise handle them in any way you like, such as
serialising them into a new file (and then deleting them).
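
A minimal sketch of that approach (the file name and tag name below are made up):

import xml.etree.cElementTree as etree

count = 0
# Grab the root element from the first event so it can be cleared as we go.
context = iter(etree.iterparse('huge.xml', events=('start', 'end')))
event, root = next(context)
for event, elem in context:
    if event == 'end' and elem.tag == 'target':
        count += 1
        # Serialise this subtree into its own small file...
        etree.ElementTree(elem).write('part_%04d.xml' % count)
        # ...then throw away everything parsed so far to keep memory flat.
        root.clear()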


Also note that the iterparse implementation in lxml.etree allows you to 
specify a tag name to restrict the iterator to these tags. That's usually a 
lot faster, but it also means that you need to take more care to clean up 
the parts of the tree that the iterator stepped over. Depending on your 
requirements and the amount of manual code optimisation that you want to 
invest, either cElementTree or lxml.etree may perform better for you.
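
For comparison, a sketch of the lxml.etree variant with a tag filter (again with
made-up names), including the extra cleanup it needs:

from lxml import etree

count = 0
for event, elem in etree.iterparse('huge.xml', tag='target'):
    count += 1
    etree.ElementTree(elem).write('part_%04d.xml' % count)
    # Clean up the parts of the tree that iterparse has already stepped over:
    elem.clear()
    while elem.getprevious() is not None:
        del elem.getparent()[0]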


It seems that you already found the article by Liza Daly about high 
performance XML processing with Python. Give it another read, it has a 
couple of good hints and examples that will help you here.


Stefan

--
http://mail.python.org/mailman/listinfo/python-list


how to inter-working on process in window

2010-12-21 Thread haloha
Hi all


How do I send an ESC key to a process on Windows?
I already have the pid of the process, but I don't know how to send the ESC key
to that process.


Please help
Ha
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: On 07/13/2010 02:18 PM, Adam Mercer wrote:That version of M2Crypto does not

2010-12-21 Thread Robert Schuon
I was getting the same error trying to build M2Crypto 0.20.2 for Python 2.5 on 
a Win 7 laptop, so I pulled down the trunk, and it did build properly using 
minGW and Swig.  However, when I try to "python setup.py install", python 
simply gives the same complaint that python was built in visual studio 2003, 
and will not install M2Crypto.  Any help would be greatly appreciated, as I 
have been trying to get this to work for days.   Does anyone have a build for 
0.20.2 that works with python 2.5?  I found so many other builds, but not that 
one.

Thanks,

Bob S.

> On Tuesday, July 13, 2010 5:18 PM Adam Mercer wrote:

> Hi
> 
> I am trying to build M2Crypto on Mac OS X 10.6.4 against python2.5
> (python2.6 fails in the same way), with SWIG 2.0.0 and OpenSSL 1.0.0a
> and it is failing with the following:
> 
> 105   :info:build swigging SWIG/_m2crypto.i to SWIG/_m2crypto_wrap.c
> 106   :info:build swig -python
> -I/opt/local/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5
> -I/opt/local/include -includeall -o SWIG/_m2crypto_wrap.c
> SWIG/_m2crypto.i
> 107   :info:build SWIG/_bio.i:64: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 108   :info:build SWIG/_rand.i:19: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 109   :info:build SWIG/_evp.i:156: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 110   :info:build SWIG/_dh.i:36: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 111   :info:build SWIG/_rsa.i:43: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 112   :info:build SWIG/_dsa.i:31: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 113   :info:build SWIG/_ssl.i:207: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 114   :info:build SWIG/_x509.i:313: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 115   :info:build SWIG/_pkcs7.i:42: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 116   :info:build SWIG/_pkcs7.i:42: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 117   :info:build SWIG/_util.i:9: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 118   :info:build SWIG/_ec.i:111: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 119   :info:build SWIG/_engine.i:162: Warning 454: Setting a
> pointer/reference variable may leak memory.
> 120   :info:build creating build/temp.macosx-10.6-x86_64-2.5
> 121   :info:build creating build/temp.macosx-10.6-x86_64-2.5/SWIG
> 122   :info:build /usr/bin/gcc-4.2 -fno-strict-aliasing -mno-fused-madd
> -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
> -I/opt/local/Library/Frameworks/Python.framework/Versions/2.5/include/python2.5
> -I/opt/local/include
> -I/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_python_py25-m2crypto/work/M2Crypto-0.20.2/SWIG
> -c SWIG/_m2crypto_wrap.c -o
> build/temp.macosx-10.6-x86_64-2.5/SWIG/_m2crypto_wrap.o -DTHREADING
> 123   :info:build SWIG/_m2crypto_wrap.c: In function 'rand_pseudo_bytes':
> 124   :info:build SWIG/_m2crypto_wrap.c:3899: warning: pointer targets
> in passing argument 1 of 'PyString_FromStringAndSize' differ in
> signedness
> 125   :info:build SWIG/_m2crypto_wrap.c: In function 'pkcs5_pbkdf2_hmac_sha1':
> 126   :info:build SWIG/_m2crypto_wrap.c:3973: warning: pointer targets
> in passing argument 1 of 'PyString_FromStringAndSize' differ in
> signedness
> 127   :info:build SWIG/_m2crypto_wrap.c: In function 'bytes_to_key':
> 128   :info:build SWIG/_m2crypto_wrap.c:4132: warning: pointer targets
> in passing argument 1 of 'PyString_FromStringAndSize' differ in
> signedness
> 129   :info:build SWIG/_m2crypto_wrap.c: In function 'sign_final':
> 130   :info:build SWIG/_m2crypto_wrap.c:4228: warning: pointer targets
> in passing argument 1 of 'PyString_FromStringAndSize' differ in
> signedness
> 131   :info:build SWIG/_m2crypto_wrap.c: In function 'pkey_as_der':
> 132   :info:build SWIG/_m2crypto_wrap.c:4300: warning: pointer targets
> in passing argument 1 of 'PyString_FromStringAndSize' differ in
> signedness
> 133   :info:build SWIG/_m2crypto_wrap.c: In function 'pkey_get_modulus':
> 134   :info:build SWIG/_m2crypto_wrap.c:4333: warning: value computed is not 
> used
> 135   :info:build SWIG/_m2crypto_wrap.c:4358: warning: value computed is not 
> used
> 136   :info:build SWIG/_m2crypto_wrap.c: In function 'AES_crypt':
> 137   :info:build SWIG/_m2crypto_wrap.c:: warning: pointer targets
> in passing argument 1 of 'PyString_FromStringAndSize' differ in
> signedness
> 138   :info:build SWIG/_m2crypto_wrap.c: At top level:
> 139   :info:build SWIG/_m2crypto_wrap.c:5846: error: expected '=', ',',
> ';', 'asm' or '__attribute__' before '*' token
> 140   :info:build SWIG/_m2crypto_wrap.c:5850: error: expected ')' before '*' 
> token
> 141   :info:build SWIG/_m2crypto_wrap.c:5854:

Re: Modifying an existing excel spreadsheet

2010-12-21 Thread Jon Clements
On Dec 20, 9:56 pm, Ed Keith  wrote:
> I have a user supplied 'template' Excel spreadsheet. I need to create a new 
> excel spreadsheet based on the supplied template, with data filled in.
>
> I found the tools here http://www.python-excel.org/, and
> http://sourceforge.net/projects/pyexcelerator/.
>  I have been trying to use the former, since the latter seems to be devoid of 
> documentation (not even any docstrings).
>
> My first thought was to copy the template, open the copy, modify it and save 
> the modifications. But it looks like if I open an existing spreadsheet it 
> must be read only. So I tried to  open the template, copy it to a new 
> spreadsheet and write the new spreadsheet, but I can't seem to copy the 
> images, and it looks like copying the formatting is going to be difficult.
>
> Can anyone give me any tips or advice?
>
> Thanks in advance,
>
>    -EdK
>
> Ed Keith
>
> e_...@yahoo.com
>
> Blog: edkeith.blogspot.com

Have you tried: http://groups.google.com/group/python-excel
 and searching the archives for "template"? Similar questions have
come up before there.
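
For the "copy the template, fill it in, save it" route, here's a minimal sketch
using xlrd + xlutils + xlwt (the file names and cell values are made up, and
note that images and some formatting may still not survive the round trip):

from xlrd import open_workbook
from xlutils.copy import copy

rb = open_workbook('template.xls', formatting_info=True)  # keep cell formats
wb = copy(rb)                        # writable xlwt copy of the template
ws = wb.get_sheet(0)                 # first worksheet
ws.write(2, 1, 'filled-in value')    # row 2, column 1 (0-based)
wb.save('filled_in.xls')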

hth

Jon
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Bug in fixed_point?!

2010-12-21 Thread Robert Kern

On 12/20/10 10:03 PM, C Barrington-Leigh wrote:

I cannot figure out what I'm doing wrong. The following does not
return a fixed point:


from scipy import optimize
xxroot= optimize.fixed_point(lambda xx: exp(-2.0*xx)/2.0, 1.0,
args=(), xtol=1e-12, maxiter=500)
print ' %f solves fixed point, ie f(%f)=%f ?' % (xxroot, xxroot, exp(-2.0*xxroot)/2.0)


You will want to ask scipy questions on the scipy-user mailing list:

  http://www.scipy.org/Mailing_Lists

When you do, please provide the information that Terry Reedy asked for.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: class inheritance

2010-12-21 Thread Ethan Furman

JLundell wrote:

On Saturday, March 13, 2010 9:03:36 AM UTC-8, Jonathan Lundell wrote:

I've got a subclass of fractions.Fraction called Value; it's a mostly
trivial class, except that it overrides __eq__ to mean 'nearly equal'.
However, since Fraction's operations result in a Fraction, not a
Value, I end up with stuff like this:

x = Value(1) + Value(2)

where x is now a Fraction, not a Value, and x == y uses
Fraction.__eq__ rather than Value.__eq__.

This appears to be standard Python behavior (int does the same thing).
I've worked around it by overriding __add__, etc, with functions that
invoke Fraction but coerce the result. But that's tedious; there are a
lot of methods to override.

So I'm wondering: is there a more efficient way to accomplish what I'm
after?


I recently implemented a different approach to this. I've got:

class Rational(fractions.Fraction):

... and some methods of my own, including my own __new__ and __str__ (which is
one of the reasons I need the class). Then, after (outside) the class
definition, comes this code, which was inspired by something similar I noticed
in the Python Cookbook. There are two things going on here. One is, of course,
the automation at import time. The other is that the wrapper gets a Fraction
instance and simply overrides __class__, rather than creating yet another
Rational and unbinding the interim Fraction. Seems to work quite well.


[snip]

Another option is to use a metaclass:

from abc import ABCMeta     # Fraction's metaclass is ABCMeta, so ours derives from it
from fractions import Fraction

class Perpetuate(ABCMeta):
    def __new__(metacls, cls_name, cls_bases, cls_dict):
        if len(cls_bases) > 1:
            raise TypeError("multiple bases not allowed")
        result_class = type.__new__(metacls, cls_name,
                                    cls_bases, cls_dict)
        base_class = cls_bases[0]
        known_attr = set()
        for attr in cls_dict.keys():
            known_attr.add(attr)
        for attr in base_class.__dict__.keys():
            if attr in ('__new__',):
                continue
            code = getattr(base_class, attr)
            if callable(code) and attr not in known_attr:
                setattr(result_class, attr,
                        metacls._wrap(base_class, code))
            elif attr not in known_attr:
                setattr(result_class, attr, code)
        return result_class

    @staticmethod
    def _wrap(base, code):
        def wrapper(*args, **kwargs):
            if args:
                cls = args[0]
            result = code(*args, **kwargs)
            if type(result) == base:
                return cls.__class__(result)
            elif isinstance(result, (tuple, list, set)):
                new_result = []
                for partial in result:
                    if type(partial) == base:
                        new_result.append(cls.__class__(partial))
                    else:
                        new_result.append(partial)
                result = result.__class__(new_result)
            elif isinstance(result, dict):
                for key in result:
                    value = result[key]
                    if type(value) == base:
                        result[key] = cls.__class__(value)
            return result
        wrapper.__name__ = code.__name__
        wrapper.__doc__ = code.__doc__
        return wrapper


then the actual class becomes:

class CloseFraction(Fraction):
    __metaclass__ = Perpetuate
    def __eq__(x, y):
        return abs(x - y) < 1  # season to taste
    def __repr__(x):
        return "CloseFraction(%d, %d)" % (x.numerator, x.denominator)

Perpetuate needs to handle multiple inheritance better, but it might 
meet your needs at this point.


Sample run:
--> n = CloseFraction(3, 2)
--> n
CloseFraction(3, 2)
--> print n
3/2
--> m = CloseFraction(9, 4)
--> m
CloseFraction(9, 4)
--> n == m
True
--> n - m
CloseFraction(-3, 4)
--> n + m
CloseFraction(15, 4)
--> n.real
CloseFraction(3, 2)
--> n.imag
0  # this is an int

Hope this helps!

~Ethan~
--
http://mail.python.org/mailman/listinfo/python-list


Re: help with link parsing?

2010-12-21 Thread Jon Clements
On Dec 20, 7:14 pm, "Littlefield, Tyler"  wrote:
> Hello all,
> I have a question. I guess this worked pre 2.6; I don't remember the
> last time I used it, but it was a while ago, and now it's failing.
> Anyone mind looking at it and telling me what's going wrong? Also, is
> there a quick way to match on a certain site? like links from google.com
> and only output those?
> #!/usr/bin/env python
>
> #This program is free software: you can redistribute it and/or modify it
> under the terms of the GNU General Public License as published
> #by the Free Software Foundation, either version 3 of the License, or
> (at your option) any later version.
>
> #This program is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> #MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> General Public License for more details.
> #
> #You should have received a copy of the GNU General Public License along
> with this program. If not, see
> #http://www.gnu.org/licenses/.
>
> """
> This script will parse out all the links in an html document and write
> them to a textfile.
> """
> import sys,optparse
> import htmllib,formatter
>
> #program class declarations:
> class Links(htmllib.HTMLParser):
>      def __init__(self,formatter):
>          htmllib.HTMLParser.__init__(self, formatter)
>          self.links=[]
>      def start_a(self, attrs):
>          if (len(attrs)>0):
>              for a in attrs:
>                  if a[0]=="href":
>                      self.links.append(a[1])
>                      print a[1]
>                      break
>
> def main(argv):
>      if (len(argv)!=3):
>          print("Error:\n"+argv[0]+" <inputfile> <outputfile>.\nParses <inputfile>
> for all links and saves them to <outputfile>.")
>          return 1
>      lcount=0
>      format=formatter.NullFormatter()
>      html=Links(format)
>      print "Retrieving data:"
>      page=open(argv[1],"r")
>      print "Feeding data to parser:"
>      html.feed(page.read())
>      page.close()
>      print "Writing links:"
>      output=open(argv[2],"w")
>      for i in (html.links):
>          output.write(i+"\n")
>          lcount+=1
>      output.close()
>      print("Wrote "+str(lcount)+" links to "+argv[2]+".");
>      print("done.")
>
> if (__name__ == "__main__"):
>      #we call the main function passing a list of args, and exit with
> the return code passed back.
>      sys.exit(main(sys.argv))
>
> --
>
> Thanks,
> Ty

This doesn't answer your original question, but excluding the command
line handling, how's this do you?:

import lxml.html
from urlparse import urlsplit

doc = lxml.html.parse('http://www.google.com')
print map(urlsplit, doc.xpath('//a/@href'))

[SplitResult(scheme='http', netloc='www.google.co.uk', path='/imghp',
query='hl=en&tab=wi', fragment=''), SplitResult(scheme='http',
netloc='video.google.co.uk', path='/', query='hl=en&tab=wv',
fragment=''), SplitResult(scheme='http', netloc='maps.google.co.uk',
path='/maps', query='hl=en&tab=wl', fragment=''),
SplitResult(scheme='http', netloc='news.google.co.uk', path='/nwshp',
query='hl=en&tab=wn', fragment=''), ...]

Much nicer IMHO, plus the lxml.html has iterlinks() and other
convenience functions for handling HTML.

hth

Jon.

-- 
http://mail.python.org/mailman/listinfo/python-list


[RELEASED] Python 3.2 beta 2

2010-12-21 Thread Georg Brandl
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On behalf of the Python development team, I'm happy to announce the
second beta preview release of Python 3.2.

Python 3.2 is a continuation of the efforts to improve and stabilize the
Python 3.x line.  Since the final release of Python 2.7, the 2.x line
will only receive bugfixes, and new features are developed for 3.x only.

Since PEP 3003, the Moratorium on Language Changes, is in effect, there
are no changes in Python's syntax and built-in types in Python 3.2.
Development efforts concentrated on the standard library and support for
porting code to Python 3.  Highlights are:

* numerous improvements to the unittest module
* PEP 3147, support for .pyc repository directories
* PEP 3149, support for version tagged dynamic libraries
* PEP 3148, a new futures library for concurrent programming
* PEP 384, a stable ABI for extension modules
* PEP 391, dictionary-based logging configuration
* an overhauled GIL implementation that reduces contention
* an extended email package that handles bytes messages
* countless fixes regarding bytes/string issues; among them full
  support for a bytes environment (filenames, environment variables)
* many consistency and behavior fixes for numeric operations
* a sysconfig module to access configuration information
* a pure-Python implementation of the datetime module
* additions to the shutil module, among them archive file support
* improvements to pdb, the Python debugger

For a more extensive list of changes in 3.2, see

http://docs.python.org/3.2/whatsnew/3.2.html

To download Python 3.2 visit:

http://www.python.org/download/releases/3.2/

Please consider trying Python 3.2 with your code and reporting any bugs
you may notice to:

http://bugs.python.org/


Enjoy!

- -- 
Georg Brandl, Release Manager
georg at python.org
(on behalf of the entire python-dev team and 3.2's contributors)

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.11 (GNU/Linux)

iEYEARECAAYFAk0Q/aAACgkQN9GcIYhpnLDf8gCgkLGAsE+T3R505jZc1RxXDYsa
NSsAnRGaFjeTm9o2Z5O8FuIzTUG8t1PT
=hHzz
-END PGP SIGNATURE-
-- 
http://mail.python.org/mailman/listinfo/python-list


seeking pygtk bindings for gtkdatabox

2010-12-21 Thread GrayShark
Hello,
A search for the python bindings for gtkdatabox led nowhere. Does anyone know
who is maintaining/working on/hosting such a package?

Thanks in advance.

Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sending XML to a WEB Service and Getting Response Back

2010-12-21 Thread John Nagle

On 12/20/2010 11:45 PM, Ian Kelly wrote:

On 12/20/2010 11:34 PM, John Nagle wrote:

SOAPpy is way out of date. The last update on SourceForge was in
2001.


2007, actually: http://sourceforge.net/projects/pywebsvcs/files/

And there is repository activity within the past 9 months. Still, point
taken.


   The original SOAPpy was at

http://sourceforge.net/projects/soapy/files/

but was apparently abandoned in 2001. Someone else picked
it up and moved it to

http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/

where it was last updated in 2005.  ZSI was last updated in
2007.  Users are still submitting bug reports, but nobody
is answering.  Somebody posted "Who maintains the pywebsvcs webpage?"
in February 2009, but no one answered them.

There's also "Python SOAP"

http://sourceforge.net/projects/pythonsoap/

abandoned in 2005.

The "suds" module

http://sourceforge.net/projects/python-suds/

was last updated in March, 2010.  That version
will work with Python 2.6, and probably 2.7.
There's very little project activity, but at
least it's reasonably current.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Scanning directories for new files?

2010-12-21 Thread Matty Sarro
Hey everyone.
I'm in the midst of writing a parser to clean up incoming files,
remove extra data that isn't needed, normalize some values, etc. The
base files will be uploaded via FTP.
How does one go about scanning a directory for new files? For now
we're looking to run it as a cron job but eventually would like to
move away from that into making it a service running in the
background.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sending XML to a WEB Service and Getting Response Back

2010-12-21 Thread Anurag Chourasia
Thanks for the response all.

I tried exploring suds (which seems to be the most current option) and I hit
problems right away. I will now try urllib or httplib.

I have asked for help in the suds forum. Hope somebody replies.

When I try to create a client, the error is as follows.

>>> from suds.client import Client
>>> url = 'http://10.251.4.33:8041/DteEnLinea/ws/EnvioGuia.jws'
>>> client = Client(url)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "suds/client.py", line 112, in __init__
self.wsdl = reader.open(url)
  File "suds/reader.py", line 152, in open
d = self.fn(url, self.options)
  File "suds/wsdl.py", line 136, in __init__
d = reader.open(url)
  File "suds/reader.py", line 79, in open
d = self.download(url)
  File "suds/reader.py", line 101, in download
return sax.parse(string=content)
  File "suds/sax/parser.py", line 136, in parse
sax.parse(source)
  File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
  File "/usr/local/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
  File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
self._err_handler.fatalError(exc)
  File "/usr/local/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:62: syntax error
>>> [3] + Stopped (SIGTSTP)python

This seems to be an old problem that persists across versions.

Regards,
Anurag

On Wed, Dec 22, 2010 at 12:40 AM, John Nagle  wrote:
> On 12/20/2010 11:45 PM, Ian Kelly wrote:
>>
>> On 12/20/2010 11:34 PM, John Nagle wrote:
>>>
>>> SOAPpy is way out of date. The last update on SourceForge was in
>>> 2001.
>>
>> 2007, actually: http://sourceforge.net/projects/pywebsvcs/files/
>>
>> And there is repository activity within the past 9 months. Still, point
>> taken.
>
>   The original SOAPpy was at
>
>        http://sourceforge.net/projects/soapy/files/
>
> but was apparently abandoned in 2001. Someone else picked
> it up and moved it to
>
>        http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/
>
> where it was last updated in 2005.  ZSI was last updated in
> 2007.  Users are still submitting bug reports, but nobody
> is answering.  Somebody posted "Who maintains the pywebsvcs webpage?"
> in February 2009, but no one answered them.
>
>    There's also "Python SOAP"
>
> http://sourceforge.net/projects/pythonsoap/
>
> abandoned in 2005.
>
>    The "suds" module
>
> http://sourceforge.net/projects/python-suds/
>
> was last updated in March, 2010.  That version
> will work with Python 2.6, and probably 2.7.
> There's very little project activity, but at
> least it's reasonably current.
>
>                                John Nagle
> --
> http://mail.python.org/mailman/listinfo/python-list
>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Scanning directories for new files?

2010-12-21 Thread Jon Clements
On Dec 21, 7:17 pm, Matty Sarro  wrote:
> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files,
> remove extra data that isn't needed, normalize some values, etc. The
> base files will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now
> we're looking to run it as a cron job but eventually would like to
> move away from that into making it a service running in the
> background.

Not a direct answer, but I would choose the approach of letting the
FTP server know when a new file has been added. For instance:
http://www.pureftpd.org/project/pure-ftpd -

"Any external shell script can be called after a successful upload.
Virus scanners and database archiveal can easily be set up."

Of course, there's loads more servers, that I'm sure will have
callback events or similar.

Although, yes, monitoring the file system is completely possible.

hth

Jon.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Scanning directories for new files?

2010-12-21 Thread Stefan Sonnenberg-Carstens

Am 21.12.2010 20:17, schrieb Matty Sarro:

Hey everyone.
I'm in the midst of writing a parser to clean up incoming files,
remove extra data that isn't needed, normalize some values, etc. The
base files will be uploaded via FTP.
How does one go about scanning a directory for new files? For now
we're looking to run it as a cron job but eventually would like to
move away from that into making it a service running in the
background.

When you say cron, I assume you're running Linux.
One approach would be to os.walk() the directory in question and fill a
dict with the absolute name of the file as key and the output from
stat() as content.

Then re-scan regularly and check for changes in mtime, ctime etc.
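
A minimal sketch of that polling approach (the directory name and interval are
made up):

import os, time

def snapshot(root):
    state = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path)
            except OSError:
                pass                  # file vanished between walk() and stat()
    return state

seen = snapshot('/srv/ftp/incoming')
while True:
    time.sleep(30)
    current = snapshot('/srv/ftp/incoming')
    for path, st in current.items():
        if path not in seen or st.st_mtime != seen[path].st_mtime:
            print 'new or changed:', path    # hand the file to the parser here
    seen = current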

A less resource-consuming approach would be to use Linux' inotify
infrastructure,

which can be used from python https://github.com/seb-m/pyinotify

And, your service is only an import away :-)

https://github.com/seb-m/pyinotify/blob/master/python2/examples/daemon.py
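
A minimal pyinotify sketch (the watched directory is made up; season to taste):

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # Fires once the uploading client has finished writing the file.
        print 'process', event.pathname

wm = pyinotify.WatchManager()
wm.add_watch('/srv/ftp/incoming',
             pyinotify.IN_CLOSE_WRITE | pyinotify.IN_MOVED_TO)
notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()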
--
http://mail.python.org/mailman/listinfo/python-list


Re: Scanning directories for new files?

2010-12-21 Thread Martin Gregorie
On Tue, 21 Dec 2010 14:17:40 -0500, Matty Sarro wrote:

> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files, remove
> extra data that isn't needed, normalize some values, etc. The base files
> will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now we're
> looking to run it as a cron job but eventually would like to move away
> from that into making it a service running in the background.
>
Make sure each file is initially uploaded using a name that the parser
isn't looking for and rename it when the upload is finished. This way the
parser won't try to process a partially loaded file.

If you are uploading to a *nix machine, the rename can also move the file
between directories, provided both directories are in the same filing
system. Under those conditions rename is always an atomic operation with
no copying involved. This would allow you to, say, upload the file to
"temp/myfile" and rename it to "uploaded/myfile", with your parser only
scanning the uploaded directory and, presumably, renaming processed files
to move them to a third directory ready for further processing.

I've used this technique reliably with files arriving via FTP at quite 
high rates.
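
A sketch of that hand-off (the directory names are made up):

import os

# Upload side: write to a name the parser ignores, then rename atomically
# (both paths must be on the same filesystem).
os.rename('temp/myfile', 'uploaded/myfile')

# Parser side: claim each file before processing it.
for name in os.listdir('uploaded'):
    os.rename(os.path.join('uploaded', name),
              os.path.join('processing', name))
    # ... parse the file in 'processing', then move it to a 'done' directory ...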
  

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org   |
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: If/then style question

2010-12-21 Thread Francesco

I'd bet you would stress your point, Steven! But you don't need to persuade me,
I already agree.
I just meant to say that, when the advantage is small, there's no need to
rewrite a working function, and that with modern CPUs if tests take so little
time that even a few redundant ones are not much of a nuisance.
In your working example, the "payload" is just a couple of integer
calculations, which take very little time too, so the overhead due to the
redundant if tests shows up clearly. And even in that not-really-real
situation, 60% overhead just meant less than 3 seconds.
Just for the sake of discussion, I tried to give both functions some plough to
pull, and a worst-case situation too:


>>> t1 = Timer('for x in range(100): print func1(0),',
...  'from __main__ import func1')
>>>
>>> t2 = Timer('for x in range(100): print func2(0),',
...  'from __main__ import func2')
>>>
>>> min(t1.repeat(number=1, repeat=1))
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
53.011015366479114
>>> min(t2.repeat(number=1, repeat=1))
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
47.55442856564332

that accounts for a scant 11% overhead, on more than one million tests per 
cycle.

That said, let me make it really clear that I would heartily prefer func2 to
func1, based both on readability and speed. Thank you for having spent some
time playing with me!

Francesco

On 19/12/2010 1.05, Steven D'Aprano wrote:

Well, let's try it with a working (albeit contrived) example. This is
just an example -- obviously I wouldn't write the function like this in
real life, I'd use a while loop, but to illustrate the issue it will do.

def func1(n):
    result = -1
    done = False
    n = (n+1)//2
    if n%2 == 1:
        result = n
        done = True
    if not done:
        n = (n+1)//2
        if n%2 == 1:
            result = n
            done = True
    if not done:
        n = (n+1)//2
        if n%2 == 1:
            result = n
            done = True
    if not done:
        for i in range(100):
            if not done:
                n = (n+1)//2
                if n%2 == 1:
                    result = n
                    done = True
    return result


def func2(n):
    n = (n+1)//2
    if n%2 == 1:
        return n
    n = (n+1)//2
    if n%2 == 1:
        return n
    n = (n+1)//2
    if n%2 == 1:
        return n
    for i in range(100):
        n = (n+1)//2
        if n%2 == 1:
            return n
    return -1


Not only is the second far more readable than the first, but it's also
significantly faster:


>>> from timeit import Timer
>>> t1 = Timer('for i in range(20): x = func1(i)',
...            'from __main__ import func1')
>>> t2 = Timer('for i in range(20): x = func2(i)',
...            'from __main__ import func2')
>>> min(t1.repeat(number=10, repeat=5))
7.3219029903411865
>>> min(t2.repeat(number=10, repeat=5))
4.530779838562012

The first function does approximately 60% more work than the second, all
of it unnecessary overhead.





--
http://mail.python.org/mailman/listinfo/python-list


Re: Scanning directories for new files?

2010-12-21 Thread GrayShark
On Tue, 21 Dec 2010 14:17:40 -0500, Matty Sarro wrote:

> Hey everyone.
> I'm in the midst of writing a parser to clean up incoming files, remove
> extra data that isn't needed, normalize some values, etc. The base files
> will be uploaded via FTP.
> How does one go about scanning a directory for new files? For now we're
> looking to run it as a cron job but eventually would like to move away
> from that into making it a service running in the background.

You can try pyinotify.
Pyinotify is a Python module for monitoring filesystems changes. Pyinotify 
relies on a Linux Kernel feature (merged in kernel 2.6.13) called inotify. 
inotify is an event-driven notifier, its notifications are exported from 
kernel space to user space through three system calls. pyinotify binds 
these system calls and provides an implementation on top of them offering 
a generic and abstract way to manipulate those functionalities.

I'm assuming you're using Linux. You seem to be at least using UNIX (cron).

read more at: http://pyinotify.sourceforge.net/

Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sending XML to a WEB Service and Getting Response Back

2010-12-21 Thread Ian Kelly

On 12/21/2010 12:10 PM, John Nagle wrote:

The original SOAPpy was at

http://sourceforge.net/projects/soapy/files/

but was apparently abandoned in 2001. Someone else picked
it up and moved it to

http://sourceforge.net/projects/pywebsvcs/files/SOAP.py/


These are unrelated projects, AFAICT.  The former was released as version
0.1 on 4/27/01.  According to the changelog, the first public release of 
the latter was version 0.5 on 4/17/01.


--
http://mail.python.org/mailman/listinfo/python-list



Re: Python 3.2 beta 2

2010-12-21 Thread Luis M . González
I wonder if Unladen Swallow is still being considered for merger with
Python 3.3.
Is it?


On Dec 21, 4:18 pm, Georg Brandl  wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> On behalf of the Python development team, I'm happy to announce the
> second beta preview release of Python 3.2.
>
> Python 3.2 is a continuation of the efforts to improve and stabilize the
> Python 3.x line.  Since the final release of Python 2.7, the 2.x line
> will only receive bugfixes, and new features are developed for 3.x only.
>
> Since PEP 3003, the Moratorium on Language Changes, is in effect, there
> are no changes in Python's syntax and built-in types in Python 3.2.
> Development efforts concentrated on the standard library and support for
> porting code to Python 3.  Highlights are:
>
> * numerous improvements to the unittest module
> * PEP 3147, support for .pyc repository directories
> * PEP 3149, support for version tagged dynamic libraries
> * PEP 3148, a new futures library for concurrent programming
> * PEP 384, a stable ABI for extension modules
> * PEP 391, dictionary-based logging configuration
> * an overhauled GIL implementation that reduces contention
> * an extended email package that handles bytes messages
> * countless fixes regarding bytes/string issues; among them full
>   support for a bytes environment (filenames, environment variables)
> * many consistency and behavior fixes for numeric operations
> * a sysconfig module to access configuration information
> * a pure-Python implementation of the datetime module
> * additions to the shutil module, among them archive file support
> * improvements to pdb, the Python debugger
>
> For a more extensive list of changes in 3.2, see
>
>    http://docs.python.org/3.2/whatsnew/3.2.html
>
> To download Python 3.2 visit:
>
>    http://www.python.org/download/releases/3.2/
>
> Please consider trying Python 3.2 with your code and reporting any bugs
> you may notice to:
>
>    http://bugs.python.org/
>
> Enjoy!
>
> - --
> Georg Brandl, Release Manager
> georg at python.org
> (on behalf of the entire python-dev team and 3.2's contributors)
>
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.4.11 (GNU/Linux)
>
> iEYEARECAAYFAk0Q/aAACgkQN9GcIYhpnLDf8gCgkLGAsE+T3R505jZc1RxXDYsa
> NSsAnRGaFjeTm9o2Z5O8FuIzTUG8t1PT
> =hHzz
> -END PGP SIGNATURE-

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.2 beta 2

2010-12-21 Thread Martin v. Loewis
Am 21.12.2010 22:56, schrieb Luis M. González:
> I wonder if Unladen Swallow is still being considered for merger with
> Python 3.3.
> Is it?

3.2 isn't even released yet, and 3.3 will appear 18 months after it (so
in Summer 2012). It's much too early to tell.

OTOH, to answer your literal question: most certainly. At least you seem
to be considering it, so it's certainly being considered by somebody.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list


Funny __future__ imports

2010-12-21 Thread Daniel da Silva
from __future__ import space_shuttle
DeprecationWarning: will be removed in next release


Post yours!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: True lists in python?

2010-12-21 Thread Arnaud Delobelle
Duncan Booth  writes:

> I guess you might be able to do it with a double-linked list provided 
> that when traversing the list you always keep two nodes around to 
> determine the direction. e.g. instead of asking for node6.nextNode() you 
> ask for node6.nextNode(previous=node1) and then the code can return 
> whichever sibling wasn't given. That would make reversal (assuming you 
> have both nodes) O(1), but would make traversing the list slower.

There used to be a trick to implement doubly linked lists with the same
memory footprint as singly linked ones: instead of each node storing two
pointers (one to the next node, one to the previous one), you just store
one value:

(previous node) xor (next node)

This means that when traversing the list, you need to always remember
which node you are coming from.  But it also makes these lists
kind of symmetrical.
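
Pure Python has no pointer arithmetic, but the same trick can be demonstrated
with indices into a list of nodes; a small sketch (all names made up):

class Node(object):
    def __init__(self, value):
        self.value = value
        self.link = 0                 # xor of the previous and next indices

def build(values):
    nodes = [Node(v) for v in values]
    n = len(nodes)                    # n serves as the "null" index
    for i, node in enumerate(nodes):
        prev_i = i - 1 if i > 0 else n
        next_i = i + 1 if i < n - 1 else n
        node.link = prev_i ^ next_i
    return nodes

def traverse(nodes, start, came_from):
    # Reversing direction is just a matter of swapping the two start indices.
    n = len(nodes)
    i = start
    while i != n:
        yield nodes[i].value
        came_from, i = i, nodes[i].link ^ came_from

nodes = build('abcde')
print list(traverse(nodes, 0, len(nodes)))                # forwards
print list(traverse(nodes, len(nodes) - 1, len(nodes)))   # backwards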

-- 
Arnaud
-- 
http://mail.python.org/mailman/listinfo/python-list


Specialisation / Interests

2010-12-21 Thread Jon Clements
Hi all,

Was thinking tonight (now this morning my time):

What would we say the "long time" posters on c.l.p specialise in - what
do they respond to and offer serious advice on?

For instance:
- Raymond Hettinger for algo's in collections and itertools
- MRAB for regexes (never seen him duck a post where re was (not)
required).
- the "effbot" for PIL & ElementTree
- Mark Hammond for work on win32
- Mark Dickinson for floating point/number theory etc...

Then so many others!...

I'm leaving a huge amount out, so no rudeness intended - but what do you
think, guys and gals?

Cheers,
Jon.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [python-committers] [RELEASED] Python 3.2 beta 2

2010-12-21 Thread Nick Coghlan
On Wed, Dec 22, 2010 at 6:18 AM, Georg Brandl  wrote:
> Since PEP 3003, the Moratorium on Language Changes, is in effect, there
> are no changes in Python's syntax and built-in types in Python 3.2.

Minor nit - we actually did tweak a few of the builtin types a bit
(mostly the stuff to improve Sequence ABC conformance and to make
range objects more list-like)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Funny __future__ imports

2010-12-21 Thread MRAB

On 21/12/2010 22:17, Daniel da Silva wrote:

from __future__ import space_shuttle
DeprecationWarning: will be removed in next release


Post yours!


from __future__ import time_machine
ImportError: time_machine in use by import
--
http://mail.python.org/mailman/listinfo/python-list


Re: Funny __future__ imports

2010-12-21 Thread Emile van Sebille

On 12/21/2010 6:38 PM MRAB said...

On 21/12/2010 22:17, Daniel da Silva wrote:

from __future__ import space_shuttle
DeprecationWarning: will be removed in next release


Post yours!


from __future__ import time_machine
ImportError: time_machine in use by import


from __future__ import improved_realestate_market
ValueError: realestate market depreciated

:)



--
http://mail.python.org/mailman/listinfo/python-list


Re: Sending XML to a WEB Service and Getting Response Back

2010-12-21 Thread John Nagle

On 12/21/2010 11:26 AM, Anurag Chourasia wrote:

Thanks for the response all.

I tried exploring suds (which seems to be the current) and i hit
problems right away. I will now try urllib or httplib.

I have asked for help in the suds forum. Hope somebody replies.

When i try to create a client, the error is as follows.


from suds.client import Client
url = 'http://10.251.4.33:8041/DteEnLinea/ws/EnvioGuia.jws'
client = Client(url)


Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "suds/client.py", line 112, in __init__
 self.wsdl = reader.open(url)
   File "suds/reader.py", line 152, in open
 d = self.fn(url, self.options)
   File "suds/wsdl.py", line 136, in __init__
 d = reader.open(url)
   File "suds/reader.py", line 79, in open
 d = self.download(url)
   File "suds/reader.py", line 101, in download
 return sax.parse(string=content)
   File "suds/sax/parser.py", line 136, in parse
 sax.parse(source)
   File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 107, in parse
 xmlreader.IncrementalParser.parse(self, source)
   File "/usr/local/lib/python2.7/xml/sax/xmlreader.py", line 123, in parse
 self.feed(buffer)
   File "/usr/local/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
 self._err_handler.fatalError(exc)
   File "/usr/local/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
 raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:62: syntax error

[3] + Stopped (SIGTSTP)python


This seems to be an old problem that persists across versions.

Regards,
Anurag


   Try posting a URL that isn't on network 10. That's some local
network at your end.

John Nagle
--
http://mail.python.org/mailman/listinfo/python-list


Re: Bug in fixed_point?!

2010-12-21 Thread C Barrington-Leigh
On Dec 21, 9:36 am, Robert Kern  wrote:
> When you do, please provide the information that Terry Reedy asked for.
>

Sorry; quite right. For completeness I'll post here as well as over on
scipy.

Here's the actual code:
-
from scipy import optimize
from math import exp
xxroot= optimize.fixed_point(lambda xx: exp(-2.0*xx)/2.0, 1.0,
args=(), xtol=1e-12, maxiter=500)
print ' %f solves fixed point, ie f(%f)=%f ?' % (xxroot, xxroot, exp(-2.0*xxroot)/2.0)


Here is the output
--
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
In [1]: run tmp.py
 0.332058 solves fixed point, ie f(0.332058)=0.257364 ?
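
For what it's worth, a naive iteration (just as a cross-check, not a fix)
converges to a genuine fixed point near 0.2836:

from math import exp
x = 1.0
for _ in range(200):
    x = exp(-2.0 * x) / 2.0
print x, exp(-2.0 * x) / 2.0     # both ~0.2836, i.e. f(x) == x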


-- 
http://mail.python.org/mailman/listinfo/python-list